PHP Screenscraping Using Curl
October 22, 2005 | 12 Comments
Tim was looking to try something new, so I decided to introduce him to Client URL (CURL) functions. As the example at hand, we looked at hitting the USPS site to lookup city and states based on ZIP code.
For the uninitiated, CURL basically lets you programmatically simulate a user browsing a web site. You can POST, GET, PUT, maintain cookies and session information. In the following example we are using a technique called "screen scraping" which is rarely recommended, but a good skill to have because sometimes its the only solution.
The reason its bad is because it is extremely fragile. If a webmaster of the site you are accessing makes even a slight change, it could break your page parsing. The other reason to shy away from this is some web sites really don't like when you do this. As a rule, if the webmaster of the site you are scraping contacts you and wants you to stop, you should, immediately. Though you should also recommend they provide the info you are scraping as a service through something like REST or SOAP. It would be very Web 2.0 of them to comply, it's worth a shot.
Anyway, check out this example code, it's kinda fun.
-
<?php
-
-
$ch = curl_init();
-
curl_setopt($ch, CURLOPT_URL, "http://zip4.usps.com/zip4/zcl_3_results.jsp");
-
curl_setopt ($ch, CURLOPT_POST, 1);
-
curl_setopt ($ch, CURLOPT_POSTFIELDS, "zip5=".$_GET['zip']);
-
-
$data = curl_exec($ch);
-
-
-
-
-
-
?>
Tags: address information, code, curl, php, programming, REST, screen scraping, soap, usps, web20, zip, zip code, zip codes
