PHP Screenscraping Using Curl
October 22, 2005
Tim was looking to try something new, so I decided to introduce him to PHP's Client URL (cURL) functions. As our example, we looked at querying the USPS site to look up the city and state for a given ZIP code.
For the uninitiated, cURL lets you programmatically simulate a user browsing a website: you can send GET, POST, and PUT requests and maintain cookies and session information. In the following example we use a technique called "screen scraping", which is rarely recommended but is a good skill to have, because sometimes it's the only solution.
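Cookie and session handling is what lets a script walk through multi-page forms or logins the way a browser would. Here is a minimal sketch, assuming PHP with the cURL extension loaded: one handle is reused for every request, and CURLOPT_COOKIEFILE and CURLOPT_COOKIEJAR point at the same file, so cookies the server sets (such as a session ID) are sent back on later requests. The helper names make_session, session_get and session_post are hypothetical, not part of PHP.

```php
<?php
// Sketch: reuse one cURL handle for several requests. Pointing both cookie
// options at the same file lets cookies set by the server (e.g. a session ID)
// be sent back on later requests. Helper names are hypothetical.
function make_session($cookieFile)
{
    $ch = curl_init();
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,        // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,        // follow redirects, as a browser would
        CURLOPT_COOKIEFILE     => $cookieFile, // enable the cookie engine and read saved cookies
        CURLOPT_COOKIEJAR      => $cookieFile, // save cookies when the handle is released
    ));
    return $ch;
}

// GET a URL on the session handle; returns the body, or false on failure.
function session_get($ch, $url)
{
    curl_setopt($ch, CURLOPT_HTTPGET, true); // switch back to GET after any POST
    curl_setopt($ch, CURLOPT_URL, $url);
    return curl_exec($ch);
}

// POST form fields on the session handle; returns the body, or false on failure.
function session_post($ch, $url, array $fields)
{
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($fields)); // URL-encodes the fields
    return curl_exec($ch);
}
```

For example, after `$ch = make_session('/tmp/cookies.txt')`, a `session_post()` to a site's login form followed by a `session_get()` on a members-only page would carry the login cookie across both requests (the path and pages are placeholders).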
The reason screen scraping is frowned upon is that it is extremely fragile: if the webmaster of the site you are accessing makes even a slight change to the page, it can break your parsing. The other reason to shy away from it is that some websites really don't like it. As a rule, if the webmaster of the site you are scraping contacts you and asks you to stop, you should stop immediately. You should also suggest that they provide the information you are scraping as a service, through something like REST or SOAP. It would be very Web 2.0 of them to comply, so it's worth a shot.
Anyway, check out this example code; it's kinda fun.
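<?php

// Read and validate the ZIP code before sending it anywhere.
$zip = isset($_GET['zip']) && is_string($_GET['zip']) ? $_GET['zip'] : '';
if (!preg_match('/\A\d{5}\z/', $zip)) {
    exit('Please supply a five-digit ZIP code.');
}

$ch = curl_init();

curl_setopt($ch, CURLOPT_URL, "http://zip4.usps.com/zip4/zcl_3_results.jsp");

// Submit the lookup form as a POST; http_build_query() URL-encodes the value.
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query(array('zip5' => $zip)));

// Return the response as a string instead of printing it straight to the page.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

$data = curl_exec($ch);

if ($data === false) {
    exit('Request failed: ' . curl_error($ch));
}

// $data now holds the raw HTML of the results page, ready to be parsed.

?>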
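The listing stops once $data holds the raw results page; pulling the city and state out of it is the fragile part described earlier. As a minimal sketch, assuming the DOM extension is available, the function below parses the page with DOMDocument and DOMXPath and returns null when the expected elements are missing, so a page redesign shows up as an explicit failure instead of garbage output. The markup it searches for (span elements with class "city" and "state") and the name parse_city_state are hypothetical placeholders, not the actual USPS page structure.

```php
<?php
// Sketch: extract city and state from a results page. The markup searched for
// (<span class="city"> and <span class="state">) is HYPOTHETICAL; adjust the
// XPath queries to whatever the real page uses.
function parse_city_state($html)
{
    if (trim($html) === '') {
        return null; // DOMDocument::loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // scraped HTML is rarely valid
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return null;
    }

    $xpath = new DOMXPath($doc);
    $city = $xpath->query('//span[@class="city"]')->item(0);
    $state = $xpath->query('//span[@class="state"]')->item(0);

    // If the page layout changed, fail explicitly instead of guessing.
    if ($city === null || $state === null) {
        return null;
    }

    return array(
        'city'  => trim($city->textContent),
        'state' => trim($state->textContent),
    );
}
```

Calling `parse_city_state($data)` and checking for null gives you one obvious place to notice, and fix, a broken scraper.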
Tags: address information, code, curl, php, programming, REST, screen scraping, soap, usps, web20, zip, zip code, zip codes
Comments
12 Responses to “PHP Screenscraping Using Curl”

Good job with the most incomplete piece of code I've ever seen.
This is purely an example of how one might use cURL and output buffering. There is certainly no error checking or particularly useful output… but this is not a chunk of code anyone would drop in and use somewhere. It's an example… From this example you should be able to extrapolate your own useful application, API, or what have you.
If there is some other incompleteness beyond it being short with no error checking, please enlighten people with your insight. I’d love constructive discourse on this. I’d love someone to show me other ways to accomplish this, or better uses of output buffering and/or cURL.
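The reply above describes the example as a demonstration of cURL and output buffering. As a minimal sketch of that variant, assuming PHP with the cURL extension: when CURLOPT_RETURNTRANSFER is not set, curl_exec() prints the response body and returns true, and wrapping the call in ob_start()/ob_get_clean() captures that printed body as a string. The function name fetch_with_output_buffering is hypothetical.

```php
<?php
// Sketch: without CURLOPT_RETURNTRANSFER, curl_exec() prints the response body
// and returns true. Output buffering captures that printed body as a string.
function fetch_with_output_buffering($url)
{
    $ch = curl_init($url);

    ob_start();                 // start capturing anything printed from here on
    $ok = curl_exec($ch);       // the body is written into the output buffer
    $body = ob_get_clean();     // take the buffered text and stop buffering

    return $ok ? $body : false; // false if the transfer failed
}
```

Setting CURLOPT_RETURNTRANSFER is usually the simpler route; buffering mainly matters when you want to capture output from code you would rather not change.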
Incomplete or not, it was handy to see a decent example. Thanks.
Thanks georges, that was my intent…
Hi,
How do you maintain the session with cURL in PHP?
Nice example… pretty black and white. Time to scrape some pages!
gveau
Hi,
Nice Example, keep writing,
Jim
http://www.tatvasoft.com
And if you want to do the same in 5 min without writing a single line of code… have a look at http://www.openkapow.com
Thanks a lot for the excellent example! Great job!
I can’t get it to work. Is the code no longer functioning?
/Tine