Making HTTP requests and parsing in JS and node.js
Contents
Making HTTP requests and parsing in JS and node.js¶
Web scraping is a very powerful tool, and at the heart of the scraping techniques is the humble GET request. These notes are for specific JS implementations of common idioms
Table of Contents¶
Making HTTP requests¶
There are several routes for making HTTP requests in JavaScript. I will cover a few of them here.
HTTP client¶
TODO
XMLHttpRequest¶
The most traditional of the HTTP modules, XMLHttpRequest is a browser asynchronous library supporting many of the common HTTP methods. For node, we install a wrapper around this library npm i xmlhttprequest
designed to emulate the browser HTTP client.
const XMLHttpRequest = require("xmlhttprequest").XMLHttpRequest;
const http = new XMLHttpRequest();
http.open('GET', 'url');
// headers must be set between open() and send()
http.setRequestHeader('someNewHeader', 'value');
http.send();
http.onreadystatechange = (e) => {
console.log(http.responseText)
};
Note that the Ajax specification does not allow User-Agent headers to be changed.
jQuery Ajax in clients¶
TODO
Axios¶
The axios
project is arguably the de-facto standard for making HTTP/HTTPS requests with JS.
const axios = require('axios');
request('url').then(resp => {
// handle resp if success
}).catch(err => {
// handle error
}).then(() => {
// always executed
})
### The (deprecated) request
package
Similar to the Python requests
module, request
is a JS wrapper around. The syntax is, likewise, very similar, e.g.
const request = require('request');
request('url', (err, resp, body) => {
if (!err && resp.statusCode == 200) {
// good request
} else {
// bad request
}
});
Headers and other content can be included in an argument object
const opts = {
url: 'url',
headers : {
'someHeader': 'value',
'anotherHeader': 'otherValue'
}
};
request(opts, (err, resp, body) => {
// ...
});
Parsing response data¶
My aim here was to find a library similar to Python’s BeautifulSoup. From minimal research, I quickly stumbled on a few packages, which I will document here. Additionally, I wanted a library that used the familiar jQuery syntax.
My cookbook of jQuery searches is here (todo), updated as I find new recipes and commands.
Cheerio¶
Cheerio provides jQuery like parsing in node. Syntactically, it is very easy to use, e.g.
const cheerio = require('cheerio');
const $ = cheerio.load('some HTML string');
var href = $('searchquery').find('secondary').href;