Wget is a command-line program that lets you retrieve files via HTTP or FTP from a UNIX prompt. People who are familiar with UNIX or Linux often wonder how to use wget in Perl. The simple answer is: don't! OK, if you really want to use wget in Perl, you can always execute it like any other command-line program and capture the output.
For example, the following program executes a simple UNIX command, captures the output, and displays it.
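A minimal sketch of that idea, assuming `ls -l` as a stand-in for whatever command you want to run (the command is only an illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Placeholder command for illustration; any command-line program works.
my $command = 'ls -l';

# qx{} (the same as backticks) runs the command and returns everything
# it wrote to standard output as one string.
my $output = qx{$command};

# $? is -1 if the command could not be started, and non-zero if the
# command itself reported a failure.
die "Could not run '$command': $!\n" if $? == -1;
die "'$command' failed with wait status $?\n" if $? != 0;

print $output;
```

The same pattern should work for wget itself: a command such as `wget -q -O - $url` writes the downloaded page to standard output, so the page ends up in `$output`.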
However, it's better to bypass wget altogether and use the Perl package LWP.
This program retrieves an HTML page using LWP::Simple.
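A sketch of what such a program might look like. The URL is an assumption (it uses the host name that shows up in the error message later in this article), and note that `length` here counts the characters of the decoded text, so the figure can differ slightly from the raw byte count.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple qw(get);

# Assumed URL for illustration; substitute the page you want.
my $url = 'http://news.bbc.co.uk/';

# get() returns the page as a string, or undef if anything went wrong.
my $html = get($url);

if (defined $html) {
    # length() counts characters of the decoded text returned by get().
    printf "Retrieved %d bytes of data.\n", length($html);
}
else {
    warn "Could not retrieve $url\n";
}
```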
Retrieved 84294 bytes of data.
Much cleaner and more portable than executing wget on the command line!
If you want to do anything more complicated than fetching a single HTML file, it is worth the extra trouble of creating an LWP user agent object. The "user agent" acts as a virtual browser: it can accept cookies, pretend to be Internet Explorer, and so on.
Here's a simple example that does the same thing as the code above, but contains better error checking.
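Something along these lines, again assuming the same URL and an arbitrary 30-second timeout (both are illustrative choices, not requirements):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Assumed URL for illustration.
my $url = 'http://news.bbc.co.uk/';

# Create the "virtual browser". The timeout value is an arbitrary choice.
my $ua = LWP::UserAgent->new(timeout => 30);

# get() always returns an HTTP::Response object, even when the
# connection itself fails, so there is always something to inspect.
my $response = $ua->get($url);

if ($response->is_success) {
    # content() returns the raw bytes of the response body.
    printf "Retrieved %d bytes of data.\n", length($response->content);
}
else {
    # status_line() combines the status code and its message.
    print "Error: ", $response->status_line, "\n";
}
```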
Retrieved 84245 bytes of data.
If we mangle the URL, we get a message like this:
Error: 500 Can't connect to news.bbc.co.ukd:80 (Bad hostname 'news.bbc.co.ukd')
A lot of sites won't let you in unless you accept cookies and appear to be a real browser. This is especially true of sites that require you to log in. Let's take a look at a program that accepts website cookies and masquerades as a real browser. By default the cookies are discarded when the program finishes running, but just for kicks we'll save them in a file called "cookies.txt" (you could specify a full file path instead).
We'll also save the retrieved HTML in a file called "save.html".
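Here is one way this could be put together. The URL, the timeout and the browser identification string are assumptions made for illustration; only the two file names come from the text above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;

my $url         = 'http://news.bbc.co.uk/';   # assumed URL
my $cookie_file = 'cookies.txt';
my $save_file   = 'save.html';

# A cookie jar that is written to disk automatically. ignore_discard
# also keeps session cookies that would otherwise be thrown away.
my $cookie_jar = HTTP::Cookies->new(
    file           => $cookie_file,
    autosave       => 1,
    ignore_discard => 1,
);

my $ua = LWP::UserAgent->new(timeout => 30);
$ua->cookie_jar($cookie_jar);

# Illustrative browser identification string; use whatever the site expects.
$ua->agent('Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)');

my $response = $ua->get($url);

if ($response->is_success) {
    my $html = $response->content;

    # Write the raw bytes to disk unchanged.
    open my $out, '>', $save_file or die "Cannot write '$save_file': $!\n";
    binmode $out;
    print {$out} $html;
    close $out or die "Cannot close '$save_file': $!\n";

    printf "Saved %d bytes of data to '%s'.\n", length($html), $save_file;
}
else {
    print "Error: ", $response->status_line, "\n";
}
```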
Saved 83938 bytes of data to 'save.html'.
LWP can also handle FTP and other protocols. Check out the CPAN documentation for more information.
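As a rough sketch of the FTP case, `getstore` from LWP::Simple can download a URL straight into a file. The FTP URL and the local file name below are hypothetical placeholders, so substitute a real location before running it.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple qw(getstore);
use HTTP::Status qw(is_success status_message);

# Hypothetical placeholder URL and file name, for illustration only.
my $url  = 'ftp://ftp.example.com/pub/README';
my $file = 'README.txt';

# getstore() fetches the URL, writes the body to $file, and returns
# an HTTP-style numeric status code, even for FTP URLs.
my $status = getstore($url, $file);

if (is_success($status)) {
    print "Saved $url to '$file'.\n";
}
else {
    print "Error: $status ", status_message($status), "\n";
}
```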