Python requests cookbook¶
For easy automated access to the internet.
Sessions¶
Sessions are an easy way to ensure that metadata like cookies and Referer headers is maintained while scraping a website. A session also provides a persistent ‘environment’ for request settings such as headers and transport adapters.
We can define a session and mount transport adapters with

import requests

session = requests.Session()
adapter = requests.adapters.HTTPAdapter(pool_connections=10, pool_maxsize=10, max_retries=10)  # an integer max_retries only retries failed connections
session.mount('https://', adapter)  # replaces the default HTTPAdapter the session mounts for each scheme
session.mount('http://', adapter)
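Once mounted, every request made through the session uses the adapter's connection pool, and cookies set by the server are replayed automatically on later requests. A quick check of that behaviour, using httpbin.org as a stand-in endpoint:

# the first response sets a cookie; the session sends it back on the next request
session.get('https://httpbin.org/cookies/set/example/1')
resp = session.get('https://httpbin.org/cookies')
print(resp.json())  # {'cookies': {'example': '1'}}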
Headers in sessions¶
So that headers are dynamically updated when visiting different pages during a session, session.headers supports an update method to add new headers and overwrite old ones without wiping out the session's existing defaults
session.headers.update({
    'old-header': 'updated value',
    'some-new-header': 'new value',
})
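For instance, to keep the Referer current as the session moves from page to page, one pattern (a sketch with assumed example URLs) is

page_one = session.get('https://example.com/page-one')
session.headers.update({'Referer': page_one.url})  # the next request appears to come from page one
page_two = session.get('https://example.com/page-two')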
As always, my commonly used headers are
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:70.0) Gecko/20100101 Firefox/70.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'DNT': '1',
    'Connection': 'keep-alive',
    'Referer': '',
    'Upgrade-Insecure-Requests': '1',
}
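These can be set as the session-wide defaults with the same update call

session.headers.update(headers)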
Streaming file content¶
To stream bytes instead of downloading the full file into memory at once, we can use the stream
kwarg. An example implementation is
with requests.get(url, stream=True) as r:
    r.raise_for_status()  # raise an HTTPError on a bad status code
    with open(filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=8192):  # 8 KiB (2^13) chunks
            if chunk:  # skip keep-alive chunks with no content
                f.write(chunk)
Combining this streaming idiom with a session, so that repeated downloads reuse the adapter's pooled connections, can make bulk downloads very fast.
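A minimal sketch of the combination, assuming downloads is a list of (url, filename) pairs:

import requests

session = requests.Session()  # pooled connections are reused across downloads
for url, filename in downloads:
    with session.get(url, stream=True) as r:
        r.raise_for_status()
        with open(filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192):
                if chunk:
                    f.write(chunk)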