cat /dev/brain

github3.py 1.0.0a1 released

On August 2nd, 2013 I opened issue 122 on github3.py's issue tracker. The driving force behind the "Roadmap for 1.0" was to clean up an API that I fundamentally disliked. Yesterday, on December 7th, 2014, I released the first alpha version of 1.0. The following is a brief description of why 1.0 is necessary (in my opinion) and how we have come to the point where 1.0 is close to being released. This post could alternatively be titled: "Some of what Ian has done in the last year and four months".

Why is 1.0 necessary?

When I started writing github3.py, I didn't have a firm understanding of APIs, or anything surrounding them. I especially didn't fully understand pagination or how to follow pagination. My initial attempts at writing methods for resources that were paginated ended up using the method names that I had wanted to use for the entire resource. Instead of being able to call .repos() on a GitHub instance, you now had to call .iter_repos() because I had taken on something without fully understanding the nuance.

Further, by August of last year, I realized that any progressive migration to an API that I would actually enjoy using would be very difficult. It was especially apparent because I knew there were methods that needed to be broken up. Their signatures and their usage were too confusing for even me to use properly. In short, I was too stupid to use my own library and I was not happy about that.

Why did it take a year and 4 months to get an alpha ready?

Sadly, I have done a terrible job of growing a community around github3.py. I frankly just do not have a lot of contributors and write most of the code myself. Lately there has been an uncharacteristic up-tick in the number of fresh faces and contributions received from each, but I'm not very confident that will last long.

github3.py also isn't the only project I work on and am responsible for in the commons of Open Source. A lot of other projects have consumed my attention and time for the last year and I don't regret it at all.

Prior to August, I started a job as a consultant at Bendyworks and started working with developers at Moz. Their team used VCR for integration testing of their services that spoke to other APIs. Using VCR, learning its purpose, and realizing how much more reliable it made the codebase inspired me to make Betamax for Python. I wanted to use Betamax with github3.py and I wanted to really change and improve github3.py's test suite. We had bugs that would have been caught (and would have been easier to test) if we had integration tests with cassettes that we could immediately re-record.

At the same time, I realized that users of requests had a really severe need for streaming multipart/form-data uploads of very large files. Recognizing that need, I created requests-toolbelt with Cory Benfield (the other core developer of requests) to supplement the typical requests user experience.

Confidence

Through all of this I continued to learn more Ruby and JavaScript, as I worked with them for at least 32 hours a week. [1] I learned more and more techniques that seemed isolated to those communities and haven't really taken a strong hold in the Python community. The idea of confident programming really enthralled me, especially the null object pattern. It meant you could do things so much more confidently without having to be as paranoid as usual. In all versions of github3.py < 1.0.0a1, you have to write:

import sys

import github3

user = github3.user('sigmavirus24')

if user is None:
    print('User sigmavirus24 could not be found.')
    sys.exit(1)

for repository in user.iter_repos():
    print(str(repository))

If you didn't have that if condition in there, then trying to iterate over the user's repositories would raise an AttributeError because 'NoneType' object has no attribute 'iter_repos'. Starting in 1.0.0a1 you can write, more simply:

import github3

user = github3.user('sigmavirus24')

for repository in user.repositories():
    print(str(repository))

If sigmavirus24 isn't a valid user on GitHub, you'll get back a NullObject. That will respond to .repositories() and return the same NullObject. When iterated over, it essentially provides an empty list. This code is simpler and it's far easier to read. Granted, if you were only checking the truthiness of user (if not user), then your code will still exit as you intended. So code written vaguely will still benefit from the NullObject being a "False-y" value, just like None, and shouldn't break while upgrading (in this one very particular instance).
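To make the behaviour described above concrete, here is a minimal sketch of a null object in Python. It is an illustration written for this post's example, not the actual class shipped in github3.py; the method set is an assumption based only on the behaviour described (responds to any call, iterates as empty, is "False-y").

```python
class NullObject(object):
    """Hypothetical null object: absorbs attribute access and calls."""

    def __getattr__(self, name):
        # Any unknown attribute (e.g. .repositories) is the null object.
        return self

    def __call__(self, *args, **kwargs):
        # Calling it (e.g. .repositories()) returns the same null object.
        return self

    def __iter__(self):
        # Iterating behaves like iterating over an empty list.
        return iter([])

    def __bool__(self):
        # "False-y", so an existing `if not user:` check still works.
        return False

    __nonzero__ = __bool__  # Python 2 spelling of __bool__

    def __repr__(self):
        return '<NullObject>'


user = NullObject()  # stand-in for a lookup that found nothing
names = [str(repository) for repository in user.repositories()]
```

The loop body never runs, so no AttributeError is possible and no explicit None check is needed.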

Testing

Also, while working with Moz, I found that I really liked applying separation of concerns to my tests as well as to my library. This meant that as I began to replace the old tests in github3.py, I created two new classes of tests: unit and integration tests.

The unit tests are never allowed to talk to the internet. All calls to requests are verified using mock and no return value is ever checked. The unit tests purely test the code in that method and the fact that they construct a call to requests a certain way.
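A unit test in this style might look roughly like the sketch below. UserClient and its repositories() method are hypothetical names invented for the example (they are not github3.py classes); the session is a plain mock standing in for a requests session, so nothing touches the network and no return value is inspected.

```python
import unittest
from unittest import mock


class UserClient(object):
    """Hypothetical client; `session` is expected to act like a requests session."""

    def __init__(self, session):
        self.session = session

    def repositories(self, login, per_page=100):
        url = 'https://api.github.com/users/{0}/repos'.format(login)
        return self.session.get(url, params={'per_page': per_page})


class TestUserClientRepositories(unittest.TestCase):
    def test_requests_the_users_repos_url(self):
        """Show that repositories() constructs the expected GET request."""
        session = mock.Mock()  # never talks to the internet

        UserClient(session).repositories('sigmavirus24')

        # Only the shape of the call is verified, not what it returns.
        session.get.assert_called_once_with(
            'https://api.github.com/users/sigmavirus24/repos',
            params={'per_page': 100},
        )
```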

The integration tests actually talk to GitHub's API once. Those interactions with the API are recorded using Betamax, saved, and then replayed in future runs of the tests. [2]
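The record-once, replay-later idea can be shown with a toy example that uses only the standard library. To be clear, this is not Betamax's API; record_once, fake_fetch, and the JSON "cassette" file are all hypothetical names made up to illustrate the principle.

```python
import json
import os
import tempfile


def record_once(fetch, cassette_path):
    """Wrap fetch(url): record each new response, replay known ones."""
    def replaying_fetch(url):
        cassette = {}
        if os.path.exists(cassette_path):
            with open(cassette_path) as fd:
                cassette = json.load(fd)
        if url not in cassette:
            # First run: make the real call and save the response.
            cassette[url] = fetch(url)
            with open(cassette_path, 'w') as fd:
                json.dump(cassette, fd)
        # Later runs: answer from the saved cassette.
        return cassette[url]
    return replaying_fetch


calls = []


def fake_fetch(url):
    """Stand-in for a real HTTP call; counts how often it is used."""
    calls.append(url)
    return {'login': 'sigmavirus24'}


cassette_path = os.path.join(tempfile.mkdtemp(), 'user.json')
fetch = record_once(fake_fetch, cassette_path)
first = fetch('https://api.github.com/users/sigmavirus24')
second = fetch('https://api.github.com/users/sigmavirus24')
```

The second call is answered from the saved file, so the "real" fetch only happens once. Deleting the cassette file forces a fresh recording, which is what makes re-recording after an API change cheap.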

All of this came together to mean that whenever I changed something or added a feature, I wrote unit and integration tests for the change. At times, I even went through and wrote a large number of unit or integration tests (or both) at once just to get them out of the way. I had, however, also changed how I wrote tests. Each test made as few assertions as possible. The test method (or function) name was descriptive and each test had a doc-string to explain its purpose. This meant that my tests were not DRY. This is a large change from the old tests in github3.py which would stick all the assertions about a particular method into one test method. [3]
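As a rough sketch of that testing style, the example below uses a hypothetical Repository class (invented here, not taken from github3.py). Each test repeats its own setup, makes a single assertion, and carries a descriptive name and doc-string.

```python
import unittest


class Repository(object):
    """Hypothetical stand-in for a repository resource."""

    def __init__(self, owner, name):
        self.owner = owner
        self.name = name
        self.full_name = '{0}/{1}'.format(owner, name)

    def __str__(self):
        return self.full_name


class TestRepository(unittest.TestCase):
    def test_full_name_joins_owner_and_name(self):
        """Show that full_name is built as owner/name."""
        repo = Repository('sigmavirus24', 'github3.py')
        self.assertEqual(repo.full_name, 'sigmavirus24/github3.py')

    def test_str_returns_the_full_name(self):
        """Show that str(repository) is the repository's full name."""
        repo = Repository('sigmavirus24', 'github3.py')
        self.assertEqual(str(repo), 'sigmavirus24/github3.py')
```

The setup is deliberately duplicated. When one of these fails, the test name and doc-string alone say what broke.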

Help

Finally, after more than a year of wading through all of this alone, a few people popped up who reinvigorated my enthusiasm and sent pull requests to help with the effort.

  • Barry Morrison sent PRs to help rename a bunch of the iter_* methods that I had not yet converted and forced me into writing tests for those changes during Madison+Ruby.
  • Carol Willing sent PRs to help switch methods to use a NullObject when they were previously returning None.
  • Marc Abramowitz sent PRs that improved the way we used tox and gave me the energy to add flake8 to the testing process on Travis-CI.

Where are we today? (a.k.a. Why isn't this 1.0.0?)

Well, I'm a bit of a perfectionist. I think yesterday's alpha release really could have been a beta release but I'm

  • not that confident that there isn't something broken
  • cautious when trying to communicate the status of somewhat experimental software to users

There are other things that need to be done for a 1.0.0 to be satisfactory in my mind:

  • Most if not all of the items on issue 122 need to be checked off (improved/rewritten documentation, better integration test coverage)
  • The items marked for the 1.0.0a2, 1.0.0b1, and 1.0.0 milestones need to be completed.
  • The 0.99.0 transition version needs to be very very very close to being finished. [4]
  • I need to go through my backlog of 160 or so notifications from the developer.github.com repository and make sure nothing important is missing.

Documentation

Right now we have our experimental documentation on ReadTheDocs but we've also been experimenting with using IPython Notebook for docs. I personally find writing notebooks much easier than trying to write conventional documentation. It feels more engaging than sitting in vim and writing reStructuredText. It also has the added benefit that anyone can run:

git clone https://github.com/sigmavirus24/github3.py
cd github3.py
mkvirtualenv github3.py
pip install .
pip install 'ipython[notebook]'
ipython notebook

And then explore those notebooks in their own browser locally and really run the code without having to copy and paste into a different medium.

I really owe Barry Morrison a great deal of gratitude for this idea though. Without their inspiration, I would never have started experimenting with using IPython Notebook for documentation.

This blog post was too long!

So the key points to take away are:

  • github3.py 1.0.0 is going to be awesome
  • github3.py users who are on older versions will have a migration path (0.99)
  • While the work to get us to this point has been largely done by me, the inspiration and motivation that made that work happen came entirely from the users and I love them for it.
  • I'm trying out some cool stuff including:
    • Using the Null Object Pattern
    • Integration testing
    • Not-DRY (or very wet) tests
    • Using IPython Notebook for documentation alongside auto-generated documentation by Sphinx
  • The sooner you want 1.0.0 out the door, the more you should do to help us get there. Look at what's left to get us to 1.0.0 and send pull requests as soon as you can.

Thank you for reading this and for using github3.py. This project has been a true joy to work on and supporting you, the community, is one of my favorite hobbies.


Footnotes

[1] At Bendyworks, we paired for 4 days a week on client projects and the 5th day was a personal growth day that could be used for improving yourself or the company.
[2] Many Django developers would probably call this "Functional testing". Rubyists usually call this "acceptance testing".
[3] I used to think large test counts were just padding by not being DRY. I've since paid dearly for this foolish way of thinking.
[4] 0.99 is another idea I stole from Rubyists, specifically from RSpec. They recently went from 2.x to 3.x and released 2.99 as a compatibility layer to ease the transition for users.