cat /dev/brain

Retries in Requests

This is the second in what I hope will be a series of explorations of advanced features in requests.

Websites and servers sometimes misbehave. They can misbehave in a number of ways:

  • read errors
  • large numbers of redirects
  • failure to connect
  • 500 errors

What most people don't know is that requests can actually handle this for you, with help from urllib3.

Using Retries in urllib3

Recently urllib3 added the ability to configure retry logic on a PoolManager or on specific requests. For example, you could make a PoolManager that retries every time it receives a 500 error:

from urllib3.util import Retry
from urllib3 import PoolManager

# Retry up to 5 times when the server responds with a 500
retries = Retry(total=5, status_forcelist=[500])
manager = PoolManager(retries=retries)
response = manager.request('GET', 'https://httpbin.org/status/500')

If you run this code, you should see it raise an exception, a MaxRetryError to be specific. You can also create one that deals with redirects:

# Follow at most 5 redirects before giving up
retries = Retry(redirect=5)
manager = PoolManager(retries=retries)
response = manager.request('GET', 'https://httpbin.org/redirect/3')

This will work just fine. But if you tell HTTPbin to redirect more than 5 times, you'll get a MaxRetryError again.
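If you'd rather handle that failure yourself than let it propagate, you can catch MaxRetryError directly. Here's a minimal sketch (the redirect count of 10 is just an arbitrary number larger than our budget):

from urllib3 import PoolManager
from urllib3.exceptions import MaxRetryError
from urllib3.util import Retry

manager = PoolManager(retries=Retry(redirect=5))
try:
    # httpbin will issue more redirects than our budget of 5 allows
    manager.request('GET', 'https://httpbin.org/redirect/10')
except MaxRetryError as exc:
    print('Gave up after too many redirects:', exc.reason)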

Using Retries in requests

You may be wondering now how you can use this with requests. If you're not already familiar with Transport Adapters in requests, you should go read Cory's blog post first (but you're not required to).

Now that you've read that, you should have a better idea of how retries fit into requests. We allow users to specify their own Transport Adapters in requests, but by default we use an HTTPAdapter. An HTTPAdapter can be initialized with a custom number of pool_connections, a custom pool_maxsize, whether the pool blocks (with pool_block), and a custom max_retries value.
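To make those parameters concrete, here's a quick sketch of an HTTPAdapter configured with them and mounted onto a Session (the values here are arbitrary):

from requests.adapters import HTTPAdapter
from requests import Session

s = Session()
adapter = HTTPAdapter(
    pool_connections=10,  # number of connection pools to cache
    pool_maxsize=10,      # maximum connections kept in each pool
    pool_block=False,     # don't block when a pool has no free connections
    max_retries=3,        # a plain integer, as before
)
s.mount('https://', adapter)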

The trick is that max_retries doesn't need to be an integer. For example:

from requests.packages.urllib3.util import Retry
from requests.adapters import HTTPAdapter
from requests import Session, exceptions

s = Session()
# Retry up to 5 times whenever httpbin responds with a 500
s.mount('https://', HTTPAdapter(
    max_retries=Retry(total=5, status_forcelist=[500]),
))

s.get('https://httpbin.org/status/500')

This will raise a RetryError exception because under no conditions will HTTPbin return anything other than a 500 status code. The trick, of course, is now figuring out exactly how you want retries to work in your application.
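Continuing the example above, you can catch that exception yourself, which is what the exceptions import is for. A minimal sketch:

try:
    s.get('https://httpbin.org/status/500')
except exceptions.RetryError as exc:
    # Every attempt came back with a 500, so the retry budget was exhausted
    print('Giving up on httpbin:', exc)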

Of course, if you still want the old behaviour of requests, we'll keep that around for the foreseeable future. By default, requests performs no retries, but you can turn the simple retry behaviour back on by passing an integer as before. So if you're already doing:

s.mount('https://', HTTPAdapter(max_retries=5))

your code will still work.
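Under the hood, that integer is simply converted into a Retry object for you. Roughly speaking (this is a sketch of the idea, not the exact requests internals):

from requests.packages.urllib3.util import Retry

# max_retries=5 behaves roughly like a Retry with a total budget of 5 and
# no status_forcelist, so failed connections are retried but 500-style
# responses are handed back to you as-is.
retries = Retry.from_int(5)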

A non-trivial example

Finally, let's look at a more practical example of how this might work. Let's use PyPI as the server we're connecting to. PyPI sits behind a CDN, which may return a 503 response if there's an issue between the origin servers and a point of presence (POP) server. Usually, if you retry that same request, it will succeed. So if you're using requests to download requests, you might make a request like this:

import requests

r = requests.get('https://pypi.python.org/simple/requests/')

In this case, you might find that r has a status code of 503, but if you retry the request, you'll get a 200. Instead of writing that retry loop yourself, you can specify your own Retry logic and let urllib3 handle it. For example:

from requests.packages.urllib3.util import Retry
from requests.adapters import HTTPAdapter
from requests import Session, exceptions

s = Session()
# Only requests to PyPI use this adapter; retry on 500s and 503s
s.mount('https://pypi.python.org/', HTTPAdapter(
    max_retries=Retry(total=5, status_forcelist=[500, 503]),
))
r = s.get('https://pypi.python.org/simple/requests/')

This will retry the request up to 5 times if it receives either a 500 or 503 response status code. Further, it will only apply if the URL requested starts with https://pypi.python.org/, so if you're requesting other sites over HTTPS, this will not affect them.
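If you want to check which adapter a given URL will end up using, Session.get_adapter will tell you. Continuing the example above:

# The most specific mounted prefix wins
print(s.get_adapter('https://pypi.python.org/simple/requests/'))  # our retrying HTTPAdapter
print(s.get_adapter('https://example.com/'))                      # the default HTTPAdapter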

Hopefully this is a bit of advanced logic that you will not need to use. Most users shouldn't need retry logic like this, and for those who do, the simple retry logic shown above should suffice. There will be a very narrow subset of requests' users who need the Retry object's more niche features, like back-off factors and method whitelists. By allowing users to access urllib3's more powerful features, requests becomes a far more powerful library than its elegant and user-friendly API might initially lead you to believe. As I write more of these, hopefully the hidden powers of requests will continue to impress you as they impress me.