cat /dev/brain

Cutting Off the Internet, Part I

This year I gave the talk "Cutting Off the Internet: Testing Applications that Use Requests" at PyTennessee and PyCon. The recording of the talk is already online with my slides.

At the end of my talk, I promised to write a blog post going into far more detail and covering some more practical examples. As I already mentioned in the talk, the primary reason I wrote betamax was to be able to test github3.py, so most of these will be examples ripped right out of that repository. [1]

If you have already watched the talk (or were there in person -- thank you for coming), you should be able to dive right into the rest of this blog post. If you have not yet watched the talk, you should be okay, but I make no guarantees that I won't reference parts of the talk in the blog post.

All code below should be compatible with Python 2.6, 2.7, 3.3, and 3.4 (and probably 3.2 as well).

Let's Talk About Code

The examples I gave during my talk were pretty simplistic, but let's reproduce them here for referential continuity.

The example we were testing throughout the presentation is:

def url_for(resource):
    return 'https://api.github.com/{0}'.format(resource)

def get_resource(session, resource, params=None, headers=None):
    url = url_for(resource)
    resp = session.get(url, params=params, headers=headers)
    if not resp.ok:
        resp.raise_for_status()
    return resp
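Because get_resource receives its session as an argument, a test can hand it a mock in place of a real requests.Session. Here is a minimal sketch of that idea. It assumes Python 3's unittest.mock (the standalone mock package exposes the same API), and the resource 'orgs/requests' is only an illustrative value.

```python
from unittest import mock


def url_for(resource):
    return 'https://api.github.com/{0}'.format(resource)


def get_resource(session, resource, params=None, headers=None):
    url = url_for(resource)
    resp = session.get(url, params=params, headers=headers)
    if not resp.ok:
        resp.raise_for_status()
    return resp


def test_get_resource_builds_url():
    # The mock stands in for a requests.Session; no network traffic occurs.
    session = mock.Mock()
    session.get.return_value.ok = True

    resp = get_resource(session, 'orgs/requests')

    session.get.assert_called_once_with(
        'https://api.github.com/orgs/requests', params=None, headers=None)
    assert resp is session.get.return_value
    assert not resp.raise_for_status.called


def test_get_resource_checks_status():
    session = mock.Mock()
    session.get.return_value.ok = False

    get_resource(session, 'orgs/requests')

    # A response that is not ok must be asked to raise for its status.
    session.get.return_value.raise_for_status.assert_called_once_with()


test_get_resource_builds_url()
test_get_resource_checks_status()
```

Neither test touches the network, which is the whole point of injecting the session.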

For this blog post, we'll talk about that example but we're also going to lift some code directly out of the source of github3.py.

The first example we'll pull will be the code used to create a new repository:

@requires_auth
def create_repository(self, name, description='', homepage='',
                      private=False, has_issues=True, has_wiki=True,
                      auto_init=False, gitignore_template=''):
    """Create a repository for the authenticated user.

    :param str name: (required), name of the repository
    :param str description: (optional)
    :param str homepage: (optional)
    :param bool private: (optional), If ``True``, create a
        private repository. API default: ``False``
    :param bool has_issues: (optional), If ``True``, enable
        issues for this repository. API default: ``True``
    :param bool has_wiki: (optional), If ``True``, enable the
        wiki for this repository. API default: ``True``
    :param bool auto_init: (optional), auto initialize the repository
    :param str gitignore_template: (optional), name of the git template to
        use; ignored if auto_init = False.
    :returns: :class:`Repository <github3.repos.Repository>`

    .. warning:: ``name`` should be no longer than 100 characters
    """
    url = self._build_url('user', 'repos')
    data = {'name': name, 'description': description,
            'homepage': homepage, 'private': private,
            'has_issues': has_issues, 'has_wiki': has_wiki,
            'auto_init': auto_init,
            'gitignore_template': gitignore_template}
    json = self._json(self._post(url, data=data), 201)
    return self._instance_or_null(Repository, json)

This example has a lot of private methods it calls and uses a special decorator created for github3.py. As you might recall, I strongly advocate using Session objects from Requests, so you would be correct if you guessed that github3.py leverages Session objects heavily. The requires_auth decorator basically checks the saved Session object in order to determine if the user has registered any kind of credentials with which we can authenticate against the GitHub API.

Further, the method calls _post and _json. The private _post method adds some functionality that was only recently added to requests. The _json method raises an exception if the response object doesn't have the specified status code; otherwise it returns the parsed JSON from the response.

Our other example will be slightly simpler: we'll use the method that allows users to get information about an Organization:
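To make the _json behaviour concrete, here is a rough sketch of that kind of check. This is not the code from github3.py: json_or_raise and UnexpectedStatusError are hypothetical names I made up for illustration, and the responses are mocks standing in for requests.Response objects.

```python
from unittest import mock


class UnexpectedStatusError(Exception):
    """Hypothetical exception; github3.py defines its own error types."""


def json_or_raise(response, expected_status):
    """Return the parsed body, or raise if the status code is unexpected."""
    if response.status_code != expected_status:
        raise UnexpectedStatusError(
            'expected {0}, got {1}'.format(expected_status,
                                           response.status_code))
    return response.json()


# Mock responses stand in for requests.Response objects.
created = mock.Mock(status_code=201)
created.json.return_value = {'name': 'new-repository'}
print(json_or_raise(created, 201))

not_found = mock.Mock(status_code=404)
try:
    json_or_raise(not_found, 201)
except UnexpectedStatusError as exc:
    print('raised:', exc)
```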

def organization(self, username):
    """Return an Organization object for the given login name.

    :param str username: (required), login name of the org
    :returns: :class:`Organization <github3.orgs.Organization>`
    """
    url = self._build_url('orgs', username)
    json = self._json(self._get(url), 200)
    return self._instance_or_null(Organization, json)

This time, the big difference is the call to the _get method, whose purpose you can guess.

Collaborators, Dependency Injection, and Mocks

So I didn't name it directly during the talk, but when dealing with collaborators in unit tests, dependency injection is your friend. Some really smart people (like Daniel Rocco) might disagree with me, and I mostly agree with them except in the cases where you need to add tests to an existing code base without significantly refactoring it. So let's get started testing these code samples.

The first thing you might notice is that both of these pieces of code are actually methods. They both live on the same object (github3.github.GitHub), which has a session attribute. That session is an instance of a subclass of requests.Session. What the subclass adds on top is unimportant here. That said, we need to create a mock object to replace our session.

The simplest approach would be to use a plain mock.Mock instance. This will work until we try to test our create_repository method. create_repository is decorated with requires_auth, which inspects the session for credentials. Let's start by testing our organization method, though, and see how far this gets us.

Testing GitHub#organization

First let's have a function which gives us our mocked out session:

import mock

def create_mocked_session():
    return mock.Mock()

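And let's make another function to create a client that uses the mocked out session: [2]

from github3.github import GitHub


def create_client():
    # Swap the real session for the mock so no request leaves the process
    session = create_mocked_session()
    client = GitHub()
    client.session = session
    return client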

Now, we have mocked out our session object. Let's write our unit test for the organization method:

def test_organization():
    """Check the call made to requests to retrieve organization data."""
    client = create_client()
    client.organization('requests')  # Yes, there is a requests organization

    session = client.session
    session.get.assert_called_once_with('https://api.github.com/orgs/requests')

This is a pretty simple test. It works very well and it's very fast. In fact, excepting how the test is set up, this is pretty much a standard unit test in github3.py. This covers the very simple case of most endpoints. Let's posit, however, that during a refactor I miswrote the call to session.get. If I miswrote the assertion in the test too, I'd be testing that the code is actually broken. For example, I might write:

session.gte.assert_called_once_with(...)

But there is no method defined as gte. In that case, it would be nice if our mock object would not naïvely allow us to assert the wrong things. We can have our mock very closely mimic the session object by updating our create_mocked_session function:

def create_mocked_session():
    MockedSession = mock.create_autospec(github3.session.GitHubSession)
    return MockedSession()

By creating an autospec, we're telling the mock library to introspect the class we're mocking out. The resulting mock only allows access to attributes and methods defined on the original class, so the misspelled session.gte would raise an AttributeError instead of silently passing. This is a huge improvement.
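Here is a small sketch of the difference between a plain mock and an autospecced one. To keep it runnable without github3.py installed, it uses a hypothetical FakeSession class in place of github3.session.GitHubSession, and it assumes Python 3's unittest.mock.

```python
from unittest import mock


class FakeSession(object):
    """Hypothetical stand-in for the real session class."""

    def get(self, url, **kwargs):
        raise NotImplementedError

    def post(self, url, data=None, **kwargs):
        raise NotImplementedError


# A plain Mock invents any attribute on demand, so a typo made in both the
# code under test and the assertion goes unnoticed.
loose = mock.Mock()
loose.gte('https://api.github.com/orgs/requests')
loose.gte.assert_called_once_with('https://api.github.com/orgs/requests')

# An autospecced mock only exposes what the real class defines.
strict = mock.create_autospec(FakeSession)()
strict.get('https://api.github.com/orgs/requests')
strict.get.assert_called_once_with('https://api.github.com/orgs/requests')

try:
    strict.gte('https://api.github.com/orgs/requests')
except AttributeError as exc:
    print('autospec rejected the typo:', exc)
```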

Testing GitHub#create_repository

Testing our create_repository method will be a bit more difficult. Let's reproduce the meat of the method again so we can remind ourselves what it looks like.

@requires_auth
def create_repository(self, name, description='', homepage='',
                      private=False, has_issues=True, has_wiki=True,
                      auto_init=False, gitignore_template=''):
    url = self._build_url('user', 'repos')
    data = {'name': name, 'description': description,
            'homepage': homepage, 'private': private,
            'has_issues': has_issues, 'has_wiki': has_wiki,
            'auto_init': auto_init,
            'gitignore_template': gitignore_template}
    json = self._json(self._post(url, data=data), 201)
    return self._instance_or_null(Repository, json)

Now let's take some notes:

  1. create_repository requires the user to be authenticated
  2. This posts JSON data to https://api.github.com/user/repos
  3. The response should be a 201 Created response

So let's start writing our test and see what happens.

def test_create_repository():
    """Check the call made to requests to create a new repository."""
    client = create_client()
    client.create_repository('new-repository')

    session = client.session
    session.post.assert_called_once_with(
        'https://api.github.com/user/repos',
        data={'name': 'new-repository', 'description': '', 'homepage': '',
              'private': False, 'has_issues': True, 'has_wiki': True,
              'auto_init': False, 'gitignore_template': ''},
        headers={'Content-Type': 'application/json'}
    )

This will fail because I omitted an important detail about the _post method. It serializes the data argument to JSON before sending it:

data = json.dumps(data)

So what happens if we do:

def test_create_repository():
    """Check the call made to requests to create a new repository."""
    client = create_client()
    client.create_repository('new-repository')

    session = client.session
    session.post.assert_called_once_with(
        'https://api.github.com/user/repos',
        data=('{"name": "new-repository", "description": "",'
              ' "homepage": "", "private": false, "has_issues": true,'
              ' "has_wiki": true, "auto_init": false,'
              ' "gitignore_template": ""}'),
        headers={'Content-Type': 'application/json'}
    )

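This will fail as well. Why? On the Python versions this post targets, dictionaries do not guarantee any particular key order, so the string that json.dumps produces may not list the keys in the order the test expects. This is especially likely on Python 3, where hash randomization is enabled by default. So what we need to do is write a helper function: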
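The ordering problem is easy to see in isolation. In this sketch I use OrderedDict purely to force two different key orders deterministically; the keys and values are just illustrative.

```python
import json
from collections import OrderedDict

# Two mappings with the same contents but different key order.
first = OrderedDict([('name', 'new-repository'), ('private', False)])
second = OrderedDict([('private', False), ('name', 'new-repository')])

first_json = json.dumps(first)
second_json = json.dumps(second)

# The serialized strings differ even though the data is the same...
assert first_json != second_json
# ...while the parsed data compares equal regardless of key order.
assert json.loads(first_json) == json.loads(second_json)
# Sorting keys is another way to get a stable string to compare against.
assert json.dumps(first, sort_keys=True) == json.dumps(second, sort_keys=True)
```

The helper that follows takes the second route and compares parsed data rather than strings: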

import json


def post_called_with(session, *args, **kwargs):
    """Use to assert post was called with JSON."""
    assert session.post.called is True
    call_args, call_kwargs = session.post.call_args

    # Data passed to assertion
    data = kwargs.pop('data', None)
    # Data passed by the call to post positionally
    #                                URL, 'json string'
    call_args, call_data = call_args[:1], call_args[1]
    # If data is a dictionary (or list) and call_data exists
    if not isinstance(data, str) and call_data:
        call_data = json.loads(call_data)

    assert args == call_args
    assert data == call_data
    assert kwargs == call_kwargs

This will introspect the call arguments and parse the data from JSON back into a dictionary. We then pull data out of the **kwargs to post_called_with to make the assertion. So let's update our test to use this:

def test_create_repository():
    """Check the call made to requests to create a new repository."""
    client = create_client()
    client.create_repository('new-repository')

    session = client.session
    post_called_with(
        session, 'https://api.github.com/user/repos',
        data={'name': 'new-repository', 'description': '', 'homepage': '',
              'private': False, 'has_issues': True, 'has_wiki': True,
              'auto_init': False, 'gitignore_template': ''},
        headers={'Content-Type': 'application/json'}
    )
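One caveat: post_called_with assumes the JSON body is the second positional argument to post. If you are not sure whether your code passes the body positionally or as data=..., a variant along these lines handles both. This is only a sketch; assert_post_json is a hypothetical name, not something that exists in github3.py, and it assumes Python 3's unittest.mock.

```python
import json
from unittest import mock


def assert_post_json(session, url, data, **expected_kwargs):
    """Assert post was called once with url, equivalent JSON, and kwargs."""
    assert session.post.call_count == 1
    call_args, call_kwargs = session.post.call_args
    call_kwargs = dict(call_kwargs)  # copy so the recorded call is untouched

    # The body may arrive as post(url, body) or as post(url, data=body).
    if len(call_args) > 1:
        body = call_args[1]
    else:
        body = call_kwargs.pop('data', None)

    assert call_args[0] == url
    assert json.loads(body) == data
    assert call_kwargs == expected_kwargs


session = mock.Mock()
session.post('https://api.github.com/user/repos',
             json.dumps({'name': 'new-repository', 'private': False}),
             headers={'Content-Type': 'application/json'})
assert_post_json(session, 'https://api.github.com/user/repos',
                 {'private': False, 'name': 'new-repository'},
                 headers={'Content-Type': 'application/json'})
```

Either helper works for the test above as long as it matches how _post actually passes the body.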

And now our tests pass. Now we need to figure out how to make sure the method requires authentication.

import github3
import pytest


def test_create_repository_requires_auth():
    """Check that creating a repository without credentials fails."""
    client = create_client()
    client.session.auth = None

    with pytest.raises(github3.GitHubError):
        client.create_repository('new-repository')

We need to set the auth attribute to None because any attribute we read from a Mock object is itself a Mock instance, and those are "truth-y" in Python. @requires_auth checks the truthiness of the auth attribute, so we need to make that attribute "false-y".
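If the truthiness point seems abstract, this sketch shows it end to end. The requires_auth here is a hypothetical reimplementation based only on the description above (it checks session.auth and nothing else), and AuthenticationRequired and Client are stand-ins I made up so the example runs without github3.py.

```python
import functools
from unittest import mock


class AuthenticationRequired(Exception):
    """Hypothetical error; the real tests expect github3.GitHubError."""


def requires_auth(func):
    """Hypothetical decorator: only run func if session.auth is truthy."""
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        if not self.session.auth:
            raise AuthenticationRequired('credentials are required')
        return func(self, *args, **kwargs)
    return wrapper


class Client(object):
    """Tiny stand-in for the real client class."""

    def __init__(self, session):
        self.session = session

    @requires_auth
    def create_repository(self, name):
        return self.session.post('https://api.github.com/user/repos', name)


session = mock.Mock()
client = Client(session)

# An attribute nobody set is itself a Mock, and Mocks are truthy, so the
# check passes even though no credentials were ever configured.
assert bool(session.auth) is True
client.create_repository('new-repository')
assert session.post.called

# Setting auth to None makes the check fail, which the real test relies on.
session.auth = None
session.post.reset_mock()
try:
    client.create_repository('new-repository')
except AuthenticationRequired:
    print('authentication required')
assert not session.post.called
```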

Interlude

Now you can see why the code in the presentation, while a good example for a 25-minute talk, was a bit too simplistic. These examples show how you can have one fairly simple test sitting next to another, less simple test case. The test cases in the slides show the core concepts. In the next blog post we'll give some real examples using betamax.


[1] I started writing this immediately after the conference but have been constantly side-tracked by other things.
[2] Yes, I know py.test, unittest, etc. have better ways of doing this, but I don't want to bog down the blog post with that information.