cat /dev/brain

API Client Design Decisions

Every API client is different. They are shaped by many things, including:

  • the author's personal tastes
  • the language and its conventions
  • the API itself
  • the underlying HTTP library

People have told me that they love github3.py and how I designed it. Personally, I feel there is room for significant improvement. Many of its deficiencies began as attempts to fix other problems. Let's explore.

Overloaded use of None

None is the favorite sentinel in Python. It can have any number of semantic meanings depending on the context. In github3.py, it also had several of its own meanings:

  • a request the user made for an object returned a 404
  • GitHub's API returned null for a specific attribute
  • the representation of the current object didn't contain the attribute

The first and last cases were what caused me to make some ... questionable design decisions.

None returned because of 404

GitHub's API returns 404 for a few cases:

  1. the route is incorrect, e.g., /userss/foo (note the extra s in userss)
  2. the resource doesn't exist, e.g., /users/does-not-exist-123456789
  3. the user doesn't have permission to see the resource, e.g., /repos/company/private-repo

But the API doesn't tell us why. Because of this, we cannot guide the user towards remediation. When I wrote the library, it seemed best to just return None.

Representing a 404 (on some calls) as None started to irk me. As a user of my own library, I had written a lot of code that looked like

import github3

gh = github3.GitHub()  # anonymous client

user = gh.user('sigmavirus24')

if user is not None:
    pass  # Do something with user

repository = gh.repository('sigmavirus24', 'github3.py')

if repository is not None:
    pass  # Do something with repository

I discovered the Null Object pattern while I was still working predominantly in Ruby. I thought that this could make all of those pesky checks for None go away. I'd finally be able to just code fearlessly while still being able to warn the user if it was necessary and, of course, surely my users would love it too!
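To make the idea concrete, here is a minimal sketch of a null object in Python. The NullObject class below is a hypothetical illustration, not necessarily what github3.py shipped: it is falsy, and any attribute access or call on it simply returns the same object.

```python
class NullObject:
    """Hypothetical stand-in returned instead of None (e.g., after a 404)."""

    def __bool__(self):
        # Falsy, so `if user:` still works as a guard when one is wanted.
        return False

    def __getattr__(self, name):
        # Only called for attributes that don't exist, which is all of them.
        return self

    def __call__(self, *args, **kwargs):
        # Calling any "method" hands back the same null object.
        return self

    def __iter__(self):
        # Iterating over a null object yields nothing.
        return iter(())

    def __repr__(self):
        return "<NullObject>"


user = NullObject()  # pretend a lookup for this user returned a 404
followers = user.followers()  # no AttributeError, no guard required
names = [follower.login for follower in followers]
```

The appeal is that chains of attribute accesses never blow up. The cost is that a missing resource now looks like a perfectly usable object until someone checks its truthiness.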

It turns out that implementing an object that behaves this way is pretty easy. So naturally, I threw it right into github3.py. Unfortunately, people have found this more confusing than not. I don't want to confuse my users. I don't want to burden them with extra cognitive decisions and context. I decided it's probably best to revert to just returning None. If you're wrinkling your nose at that, then let's talk through some alternatives:

  1. raise an exception

    In my opinion this is worse than returning None because then each of those accesses looks like:

    try:
        user = gh.user('sigmavirus24')
    except github3.exceptions.NotFound:
        pass  # Handle 404
    else:
        pass  # Do something with user object
    

    At least by returning None, users can be reasonably sure they'll get a value they can handle.

  2. return an Option or Result type

    If this were Haskell or Rust that would be fantastic. I suspect, however, that this would be as confusing as (if not more confusing than) the Null Object pattern. For example, this would start to look like:

    user = gh.user('sigmavirus24').ok().unwrap()
    

    That provides some assurance that the user will exist. There is no need for the guarding if conditions. But exception handling will have to happen somewhere since it isn't always safe to just unwrap a result.
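For readers who haven't met these types, the Option/Result alternative can be sketched roughly as follows. Every class here (Option, Result, UnwrapError) is a hypothetical illustration loosely modelled on Rust's names; none of it is part of github3.py.

```python
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


class UnwrapError(Exception):
    """Raised when unwrap() is called on an empty Option."""


@dataclass(frozen=True)
class Option(Generic[T]):
    value: Optional[T] = None

    def unwrap(self) -> T:
        # Fine when a value is present, loud when it is not.
        if self.value is None:
            raise UnwrapError("unwrap() called on an empty Option")
        return self.value

    def unwrap_or(self, default: T) -> T:
        # The safe variant: fall back instead of raising.
        return default if self.value is None else self.value


@dataclass(frozen=True)
class Result(Generic[T]):
    value: Optional[T] = None
    error: Optional[Exception] = None

    def ok(self) -> Option[T]:
        # Discard the error and keep only the success value, if any.
        return Option(None if self.error is not None else self.value)


# Placeholder results standing in for what a client call might return.
found = Result(value="sigmavirus24")
missing = Result(error=LookupError("404 Not Found"))

login = found.ok().unwrap()
fallback = missing.ok().unwrap_or("anonymous")
```

Calling missing.ok().unwrap() raises UnwrapError, which is exactly the exception handling that has to happen somewhere.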

Signifying a Missing Attribute Using None

I chose to use None to represent a missing attribute to avoid users having two instances of the same object with different sets of attributes. This is fundamentally a symptom of the GitHub API. For example, as an API user, you can get either all of a single user's information or a subset of several users' information. Specifically, there's a difference between doing:

GET /users/sigmavirus24

And the representation of that user in:

GET /users

The latter is a subset of the data in the former. The former is also a subset of what you receive for information about an authenticated user:

GET /user

I tried to represent all three of these with the same User object. The problem with this is that sometimes some of the attributes are missing. I chose to always initialize the attributes from each of the representations. But how do we distinguish between attributes that were genuinely null and those that were not provided?

To signify the difference, I came up with a sentinel specific to github3.py. Thus Empty was born. Beyond the terrible name, it has only been merged (not even released) for a couple of weeks and has already caused a significant amount of confusion.

At this point, I do think we can learn something from our friends using languages like Haskell and Rust. I have proposed that, instead of trying to handle two or more sets of data in a single class, we split these out into separate classes (without duplicating too much code). For example, after this change if a user wrote:

user = gh.user('sigmavirus24')

They would get a User instance back. If instead a user wrote:

user = next(gh.all_users())

They would receive an instance of a different object, perhaps ListUser, that only has the attributes returned in that part of the response. Finally, if the user wrote:

user = gh.me()

They would receive an instance of an AuthenticatedUser. These would all share methods and have some of the same attributes. Further, users would be able to convert ListUser into a User similar to how they already retrieve the extra attributes.
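Sketched in code, the split might look something like the following. The class layout, the refresh method, the fake_fetch stub, and the payloads are all assumptions made for illustration; the real classes would be built from actual API responses.

```python
class _UserBase:
    """Attributes and methods shared by every user representation."""

    def __init__(self, data):
        self.login = data["login"]
        self.id = data["id"]

    def __repr__(self):
        return f"<{type(self).__name__} [{self.login}]>"


class ListUser(_UserBase):
    """The subset of a user returned when listing users."""

    def refresh(self, fetch):
        # `fetch` is a hypothetical callable mapping a login to a full payload.
        return User(fetch(self.login))


class User(_UserBase):
    """The full public representation of a single user."""

    def __init__(self, data):
        super().__init__(data)
        self.name = data["name"]
        self.company = data["company"]  # None here means the API said null


class AuthenticatedUser(User):
    """The full representation plus details only the owner can see."""

    def __init__(self, data):
        super().__init__(data)
        self.total_private_repos = data["total_private_repos"]


def fake_fetch(login):
    # Stand-in for an HTTP request; the values are placeholders.
    return {"login": login, "id": 1, "name": "Example Name", "company": None}


listed = ListUser({"login": "sigmavirus24", "id": 1})
full = listed.refresh(fake_fetch)
```

With this layout, accessing listed.name raises AttributeError instead of quietly returning a sentinel, so each class states exactly which data is available.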

Conclusion (For Lack of a Better Section Title)

It's taken me several years to learn some basic lessons:

  • Separate out your representations of API responses when they differ (even slightly). This makes explicit what is possible and what data is available.
  • Use your language's empty/null/none sentinel judiciously.
  • Do not try to introduce new and foreign concepts to your users.

Given how much code there currently is in github3.py, this will take me... a while (unless you'd like to help!).

If you learn from my mistakes, you'll be able to incorporate these designs from the start. Your users will be happier, your code will be more maintainable, and your client will be just a tad bit more "self-documenting" (which is not an excuse to write poor documentation).