Download online data with Python

Category: programming

Python, as a convenient and powerful scripting language, is widely used for scraping data. One common scraping task is downloading data. Let’s say we already know the format of the destination links, and all we need to do is save that online data locally, in files or in a database. There are quite a few ways to achieve this, and each fits its own purpose.
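
Whatever tool does the fetching, the surrounding steps look much the same: expand the known link format into concrete URLs, then store each payload in a file or a database. The sketch below uses only the standard library. The link template `https://example.com/data/{}.csv`, the SQLite table name `downloads` and the helper names are hypothetical, chosen for illustration; the actual download is left to one of the tools discussed in this post.

```python
import os
import sqlite3
import time
import urllib.parse


def build_links(template, values):
    """Expand a known link format such as ".../data/{}.csv" into concrete URLs."""
    return [template.format(value) for value in values]


def local_name(url):
    """Derive a local file name from the last path segment of a URL."""
    name = os.path.basename(urllib.parse.urlparse(url).path)
    return name or "index.html"  # fallback for URLs ending in "/"


def save_to_db(conn, url, content):
    """Store one downloaded payload in SQLite, keyed by its URL."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS downloads "
        "(url TEXT PRIMARY KEY, content BLOB, fetched_at REAL)"
    )
    # INSERT OR REPLACE lets a re-download overwrite the earlier copy.
    conn.execute(
        "INSERT OR REPLACE INTO downloads (url, content, fetched_at) VALUES (?, ?, ?)",
        (url, content, time.time()),
    )
    conn.commit()
```

Keying the table on the URL makes repeated runs idempotent: downloading the same link twice keeps only the latest copy.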

The main options are:

requests
wget
curl
mechanize
selenium

requests

Requests is a great library, and the number of GitHub stars it has received so far shows it. It handles many HTTP details nicely, such as sessions, cookies, redirects and response decoding. There is not much need to go into detail about how good it is: prefer it over urllib3 (the lower-level library Requests itself is built on), because it is simply awesome and user-friendly.
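For saving files, the usual Requests pattern is to stream the body to disk in chunks rather than reading it all into memory. A minimal sketch follows; the function name `download`, the 64 KiB chunk size and the 30-second timeout are assumptions, not values from any particular project.

```python
import requests


def download(url, dest, session=None, chunk_size=64 * 1024, timeout=30):
    """Stream url into the file dest without holding the whole body in memory."""
    http = session or requests  # a Session reuses connections across many downloads
    with http.get(url, stream=True, timeout=timeout) as resp:
        resp.raise_for_status()  # raise requests.HTTPError on 4xx/5xx
        with open(dest, "wb") as fh:
            for chunk in resp.iter_content(chunk_size=chunk_size):
                fh.write(chunk)
    return dest
```

When fetching many links from the same host, create one `requests.Session()` (for example `with requests.Session() as s:`) and pass it as `session=s` so connections are kept alive between downloads.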

wget

Its man page describes it as “Wget - The non-interactive network downloader.” It was built for downloading, and it is reliable. In fact, it can even download an entire website.
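Since wget is a command-line program rather than a Python library, one way to use it from Python is through `subprocess`. The sketch below assumes GNU wget is installed and on `PATH`; the helper names are hypothetical. Passing an argument list instead of a shell string avoids quoting problems with URLs that contain characters such as `&`.

```python
import shutil
import subprocess


def wget_command(url, dest=None, mirror=False):
    """Build the wget argument list for one file or for a whole-site mirror."""
    cmd = ["wget", "-q"]  # -q: no progress output
    if mirror:
        # Recursive copy that also fetches CSS/images and rewrites links for offline use.
        cmd += ["--mirror", "--convert-links", "--page-requisites", "--no-parent"]
        if dest:
            cmd += ["-P", dest]  # directory to save the site into
    elif dest:
        cmd += ["-O", dest]  # exact output file name
    cmd.append(url)
    return cmd


def run_wget(url, dest=None, mirror=False):
    """Run wget, raising CalledProcessError if it exits with a non-zero status."""
    if shutil.which("wget") is None:
        raise RuntimeError("wget is not installed or not on PATH")
    subprocess.run(wget_command(url, dest, mirror), check=True)
```

For example, `run_wget("https://example.com/", "site", mirror=True)` would copy a (hypothetical) site into the `site` directory.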

curl

curl is a command-line tool for making all sorts of HTTP requests. With the -O flag it also saves the response to a local file named after the remote file, and -o <file> lets you choose the file name yourself.
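Like wget, curl can be driven from Python with `subprocess`. This sketch assumes the curl binary is on `PATH`; the helper names are hypothetical. The `-f` flag matters for downloads: without it curl exits successfully and saves the server's error page when it gets, say, a 404.

```python
import shutil
import subprocess


def curl_command(url, dest=None):
    """Build the curl argument list that saves a response to disk."""
    # -f: non-zero exit on HTTP errors, -s/-S: quiet but still print errors, -L: follow redirects
    cmd = ["curl", "-fsSL"]
    if dest:
        cmd += ["-o", dest]  # pick the local file name
    else:
        cmd.append("-O")  # reuse the remote file name in the current directory
    cmd.append(url)
    return cmd


def run_curl(url, dest=None):
    """Run curl, raising CalledProcessError on failure (exit code 22 for HTTP errors with -f)."""
    if shutil.which("curl") is None:
        raise RuntimeError("curl is not installed or not on PATH")
    subprocess.run(curl_command(url, dest), check=True)
```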

mechanize

Mechanize is more like a simple programmatic browser: it can fill in and submit forms and follow links easily. But of course it does not execute JavaScript.
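A typical mechanize flow is to log in through a form and then follow a link to the data you want. The sketch below assumes the third-party `mechanize` package (`pip install mechanize`); the login URL, the form position (`nr=0`), the field names and the link text all depend on the target site and are hypothetical here. The browser is passed in as an argument, so the flow itself stays separate from how the browser is created.

```python
def make_browser():
    """Create a mechanize Browser (third-party: pip install mechanize)."""
    import mechanize  # imported lazily so the function below can be used without it

    br = mechanize.Browser()
    br.set_handle_robots(False)  # ignore robots.txt; only do this where the site allows it
    return br


def login_and_save(br, login_url, fields, link_text, dest):
    """Submit the first form on login_url, follow a link by its text, save the page.

    The form position, field names and link text depend on the target site.
    """
    br.open(login_url)
    br.select_form(nr=0)  # first form on the page
    for name, value in fields.items():
        br[name] = value  # fill in controls of the selected form
    br.submit()
    resp = br.follow_link(text=link_text)  # cookies from the login are reused
    with open(dest, "wb") as fh:
        fh.write(resp.read())
    return dest
```

Usage might look like `login_and_save(make_browser(), "https://example.com/login", {"username": "me", "password": "secret"}, "Monthly report", "report.html")`, with every value adapted to the real site.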

selenium

Selenium is used more for user acceptance testing than for web crawling. But because it works by driving a real browser, it can be used to crawl dynamic pages. If you know the pattern of the Ajax requests a page makes, use requests or curl to call those endpoints directly; but if the page is complicated enough, you will usually have to fall back on Selenium as the browser automation tool.
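
When the Ajax endpoint is known, a plain `requests.get(endpoint, timeout=30).json()` is usually all you need. When it is not, the sketch below shows the Selenium fallback: load the page in a headless browser, wait until a JavaScript-rendered element appears, then take the rendered HTML. It assumes the Selenium 4 Python bindings (`pip install selenium`) and a local Chrome install; the choice of Chrome, the `--headless=new` flag, the CSS selector you pass and the 10-second timeout are assumptions to adapt. The imports sit inside the function so the module can be loaded on machines without Selenium.

```python
def fetch_rendered_html(url, css_selector, timeout=10):
    """Load url in headless Chrome, wait for css_selector to appear, return the rendered HTML."""
    # Third-party: Selenium 4 bindings; a local Chrome install is assumed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # run Chrome without a visible window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until the element exists; raises TimeoutException after `timeout` seconds.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even on errors
```

The returned HTML can then be parsed or saved like any other download; waiting on a specific element is more reliable than sleeping for a fixed time.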