Wget: Downloading from the Internet

How to use Wget to Download Files without a Web Browser

© Mark Alexander Bain

Oct 21, 2008
Use Wget to Download Internet Files, Mark Alexander Bain
This article looks at how to download on-line data, such as Yahoo! Finance's stock quotes, without the need for a web browser - all through the power of Wget.

Everybody has downloaded something from the Internet at some time, whether it be:

  • some new software such as OpenOffice 3.0
  • stock quotes from somewhere such as Yahoo! Finance
  • a document from a company's intranet server

In each case the process will have been the same:

  • open up a web browser (such as Firefox or Internet Explorer)
  • enter the correct url (the web address)
  • download the file onto the user's pc

It's a process that works very well, but there is one drawback: it's interactive, meaning that the user must be present for it to work. That's where Wget is useful - it allows non-interactive downloads to happen with not a user in sight.

Obtaining Wget

Wget is included with most Linux distributions but not with Windows; however, it can be downloaded from the GNU Wget web site at http://www.gnu.org/software/wget/.
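
Before relying on Wget in a script it can be worth checking that it is actually installed. The following is a minimal sketch for a POSIX shell; the function name require_cmd is hypothetical and not part of Wget:

```shell
#!/bin/sh
# Check that a command is available on the PATH.
# `command -v` is the POSIX way to look a command up.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "$1 is not installed" >&2
    return 1
}
```

A script could then start with: require_cmd wget || exit 1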

Using Wget to Download a File from the Internet

It couldn't be easier to use Wget - it's just a matter of opening a terminal window and using wget to download the required document from the Internet. For example, the following command would download the current stock quote for Microsoft from the Yahoo! Finance web site:

wget "http://download.finance.yahoo.com/d/quotes.csv?s=msft&f=sl1&e=.csv"

The user will immediately see the download in progress:

--10:38:55-- http://download.finance.yahoo.com/d/quotes.csv?s=msft&f=sl1&e=.csv
=> `quotes.csv?s=msft&f=sl1&e=.csv.1'
Resolving download.finance.yahoo.com... 76.13.114.90
Connecting to download.finance.yahoo.com|76.13.114.90|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/octet-stream]
[ <=> ] 14 --.--K/s
10:38:55 (724.26 KB/s) - `quotes.csv?s=msft&f=sl1&e=.csv.1' saved [14]

There are a couple of small issues at this point:

  • the file is saved under an unwieldy name taken from the web address: quotes.csv?s=msft&f=sl1&e=.csv
  • the user probably isn't interested in the complicated output from running Wget

Fortunately the solution is simple - run Wget in quiet mode (-q) and send the output to a named file (-O):

wget -q -O quotes.csv "http://download.finance.yahoo.com/d/quotes.csv?s=msft&f=sl1&e=.csv"

This time the resultant file will be named 'quotes.csv', and there will be no output from Wget, just a new file:

$ cat quotes.csv
"MSFT",24.72

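Because the download needs no user, it is easy to repeat for several stock symbols. The sketch below wraps the quiet, named-file form of the command in two helper functions; the names quote_url and fetch_quotes are hypothetical, and the web address is simply the one used in this article:

```shell
#!/bin/sh
# Build the Yahoo! Finance quote address used in this article for one symbol.
quote_url() {
    printf 'http://download.finance.yahoo.com/d/quotes.csv?s=%s&f=sl1&e=.csv\n' "$1"
}

# Download the quote for each symbol given as an argument into <symbol>.csv,
# quietly (-q) and under a chosen file name (-O).
fetch_quotes() {
    for symbol in "$@"; do
        wget -q -O "$symbol.csv" "$(quote_url "$symbol")" ||
            echo "download failed for $symbol" >&2
    done
}
```

Calling fetch_quotes msft novl rht would then attempt to create msft.csv, novl.csv and rht.csv in the current directory.
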
When Downloads go Bad: Coping with Rejection

There are any number of reasons why a download may fail, for example:

  • there may be congestion on the server or on the network
  • there may be a problem with the network connection, for instance the cat pulling out the Ethernet cable

Wget has a few options that may help in such situations:

  • the number of tries - by default Wget will retry up to 20 times; however, this can be overridden using the -t option:
    wget -t 50 -q -O novell.csv "http://download.finance.yahoo.com/d/quotes.csv?s=novl&f=sl1&e=.csv"
    Here Wget will retry 50 times (to retry an infinite number of times, -t should be set to 0)
  • wait between retries - retrying in quick succession may put a heavier load on an already congested network. The --waitretry option sets the maximum delay (in seconds) between retries of a failed download: Wget waits 1 second after the first failure, 2 seconds after the second, and so on up to that maximum. For example, the delay can be capped at 10 seconds (which, according to the Wget manual, is also the default):
    wget --waitretry=10 -q -O novell.csv "http://download.finance.yahoo.com/d/quotes.csv?s=novl&f=sl1&e=.csv"
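
The -t and --waitretry options can be combined in a small wrapper that also reports failure through its exit status. This is a sketch only: the function name fetch_with_retries and its argument order are hypothetical, while the Wget options are the ones described above:

```shell
#!/bin/sh
# Download one web address to a named file, retrying on failure.
# Arguments: output file, web address, number of tries (default 50),
# maximum wait between retries in seconds (default 10).
fetch_with_retries() {
    out=$1
    url=$2
    tries=${3:-50}
    wait_max=${4:-10}
    if wget -t "$tries" --waitretry="$wait_max" -q -O "$out" "$url"; then
        echo "saved $out"
    else
        # In the else branch $? still holds wget's exit status
        status=$?
        echo "giving up on $url (wget exit status $status)" >&2
        return "$status"
    fi
}
```

A non-zero return value lets a calling script decide what to do next, for instance: fetch_with_retries novell.csv "$url" || exit 1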

Finally, if all else fails and only a partial download has occurred, it can be restarted from the point at which it failed by using the -c (continue) option:

wget -c -O redhat.csv "http://download.finance.yahoo.com/d/quotes.csv?s=rht&f=sl1&e=.csv"
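
Resuming is most useful for large files, where starting again would waste time. The sketch below keeps calling wget -c until the download completes or a limit is reached; the function name resume_download, its defaults and the example address are assumptions made for illustration:

```shell
#!/bin/sh
# Keep resuming a download with -c until wget succeeds or the attempts run out.
# Arguments: web address, maximum number of attempts (default 5),
# pause between attempts in seconds (default 30).
resume_download() {
    url=$1
    max_attempts=${2:-5}
    pause=${3:-30}
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # -c continues from the bytes already on disk instead of starting again
        if wget -c -q "$url"; then
            return 0
        fi
        echo "attempt $attempt failed" >&2
        attempt=$((attempt + 1))
        # Pause before the next attempt, but not after the last one
        if [ "$attempt" -le "$max_attempts" ]; then
            sleep "$pause"
        fi
    done
    return 1
}
```

For example: resume_download http://example.com/large-file.iso 10 60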

Downloading a Whole Web Site

As well as downloading individual files, Wget can also be used to download directories, or even whole web sites; that's done by using the recursive option (-r) to mirror the data:

wget -r http://<any old web site>.com
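
A recursive download can grow quickly, so it is usually sensible to put some limits on it. The sketch below adds a few standard Wget options to -r; the function name mirror_site, the default depth of 2 and the one second wait are arbitrary choices for illustration:

```shell
#!/bin/sh
# Recursively download a site, with a few limits to keep the job polite.
# Arguments: start address, recursion depth (default 2).
mirror_site() {
    url=$1
    depth=${2:-2}
    # -r    follow links recursively
    # -l    stop at the given depth
    # -np   never climb to the parent directory
    # -k    rewrite links so the copy can be browsed locally
    # -p    also fetch the images and stylesheets each page needs
    # -w 1  wait one second between requests
    wget -r -l "$depth" -np -k -p -w 1 "$url"
}
```

For example: mirror_site http://example.com/docs/ 3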

Conclusion

Wget is very useful and, as this article has shown, very easy to use; however, there is one word of warning: don't try downloading the whole of the Internet - that needs an awful lot of disk space.


The copyright of the article Wget: Downloading from the Internet in Command Line Programming is owned by Mark Alexander Bain. Permission to republish Wget: Downloading from the Internet in print or online must be granted by the author in writing.

