PHP Screenscraping Using Curl

October 22, 2005

Tim was looking to try something new, so I decided to introduce him to PHP's Client URL (cURL) functions. As our example, we hit the USPS site to look up a city and state based on a ZIP code.

For the uninitiated, cURL basically lets you programmatically simulate a user browsing a web site. You can send GET, POST, and PUT requests, and maintain cookies and session information. The following example uses a technique called "screen scraping", which is rarely recommended but is a good skill to have, because sometimes it's the only solution.
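Since cookies and sessions come up a lot, here is a minimal sketch of how a session might be kept across requests. The helper names (`session_curl_options`, `session_request`), the cookie file path, and the example.com URLs in the comments are all made up for illustration; the only real pieces are the `CURLOPT_*` options themselves. Treat it as a starting point under those assumptions, not a finished client.

```php
<?php
// Hypothetical helper: build the cURL options for a request that shares
// one cookie file between calls, so a login cookie survives to the next hit.
function session_curl_options($url, $cookieFile, $postFields = null)
{
    $options = array(
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,        // return the body instead of printing it
        CURLOPT_COOKIEJAR      => $cookieFile, // cookies are saved here when the handle is freed
        CURLOPT_COOKIEFILE     => $cookieFile, // cookies are loaded from here before the request
    );
    if ($postFields !== null) {
        $options[CURLOPT_POST]       = true;
        $options[CURLOPT_POSTFIELDS] = http_build_query($postFields); // URL-encodes each value
    }
    return $options;
}

// Hypothetical helper: run one request with those options.
// Returns the response body as a string, or false if cURL failed.
function session_request($url, $cookieFile, $postFields = null)
{
    $ch = curl_init();
    curl_setopt_array($ch, session_curl_options($url, $cookieFile, $postFields));
    $body = curl_exec($ch);
    curl_close($ch); // release the handle so the cookie jar gets written
    return $body;
}

// Usage (placeholder URLs and field names, not a real site):
// session_request('http://example.com/login', '/tmp/cookies.txt',
//                 array('user' => 'tim', 'pass' => 'secret'));
// $page = session_request('http://example.com/members', '/tmp/cookies.txt');
```

Pointing `CURLOPT_COOKIEJAR` and `CURLOPT_COOKIEFILE` at the same file is what carries the session from one request to the next.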

Screen scraping is discouraged because it is extremely fragile. If the webmaster of the site you are accessing makes even a slight change, it could break your page parsing. The other reason to shy away from it is that some web sites really don't like it when you do this. As a rule, if the webmaster of the site you are scraping contacts you and asks you to stop, you should, immediately. You should also suggest that they provide the info you are scraping as a service through something like REST or SOAP. It would be very Web 2.0 of them to comply, so it's worth a shot.

Anyway, check out this example code. It's kinda fun.

PHP:
  <?php

  // Look up the city and state for the ZIP code passed in as ?zip=...
  $ch = curl_init();
  curl_setopt($ch, CURLOPT_URL, "http://zip4.usps.com/zip4/zcl_3_results.jsp");
  curl_setopt($ch, CURLOPT_POST, 1);
  curl_setopt($ch, CURLOPT_POSTFIELDS, "zip5=" . urlencode($_GET['zip']));

  // curl_exec() prints the response by default, so capture it with output buffering.
  ob_start();
  curl_exec($ch);
  $string = ob_get_contents();
  ob_end_clean();
  curl_close($ch);

  // Cut out the chunk of HTML between two known markers on the results page.
  list(, $second) = explode('Actual City name', $string);
  list($first) = explode('images/spacer.gif', $second);
  $junk = explode("\n", $first);

  // The seventh line of that chunk holds "CITY, ST".
  list($city, $state) = explode(', ', trim(strip_tags($junk[6])));

  $city = ucwords(strtolower($city));

  print $city . ',' . $state;
  ?>
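Because the example leans on output buffering and a fixed line offset, any layout change makes it fail silently. Below is a rough sketch of a more defensive take: it asks cURL to return the body via `CURLOPT_RETURNTRANSFER` and gives back `null` when the markers are missing. The function names `fetch_body` and `parse_city_state`, the 10-second timeout, and the sample HTML in the usage note are my own assumptions; the sample is invented for illustration and is not the real USPS markup.

```php
<?php
// Hypothetical helper: POST to a URL and return the body, or null on failure.
function fetch_body($url, array $postFields)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => http_build_query($postFields),
        CURLOPT_RETURNTRANSFER => true, // hand the body back as a string
        CURLOPT_TIMEOUT        => 10,   // assumed value; tune as needed
    ));
    $body = curl_exec($ch);
    curl_close($ch);
    return is_string($body) ? $body : null;
}

// Hypothetical helper: find "CITY, ST" between two markers.
// Returns array('city' => ..., 'state' => ...) or null if the page changed.
function parse_city_state($html, $startMarker, $endMarker)
{
    $start = strpos($html, $startMarker);
    if ($start === false) {
        return null; // start marker gone: the layout probably changed
    }
    $start += strlen($startMarker);
    $end = strpos($html, $endMarker, $start);
    if ($end === false) {
        return null; // end marker gone
    }
    // Drop the tags and collapse whitespace, so line offsets no longer matter.
    $chunk = strip_tags(substr($html, $start, $end - $start));
    $text  = trim(preg_replace('/\s+/', ' ', $chunk));
    if (!preg_match('/([A-Za-z .-]+),\s*([A-Z]{2})\b/', $text, $m)) {
        return null; // nothing that looks like "CITY, ST"
    }
    return array(
        'city'  => ucwords(strtolower(trim($m[1]))),
        'state' => $m[2],
    );
}
```

With a made-up fragment such as `<td>Actual City name</td><td><b>BEVERLY HILLS, CA</b></td><img src="images/spacer.gif">`, calling `parse_city_state($html, 'Actual City name', 'images/spacer.gif')` should give a city of `Beverly Hills` and a state of `CA`, while a page without the markers gives `null` instead of garbage.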

Tags: web20, web2.0, code, programming, screen scraping, php, soap, rest, curl, usps, address information, zip code, zip, zip codes


Comments

12 Responses to “PHP Screenscraping Using Curl”

  1. brian on October 4th, 2006 11:17 am

    Good job with the most incomplete piece of code I've ever seen.

  2. zbtirrell on October 6th, 2006 8:12 am

    This is purely an example of how one might use cURL and output buffering. There is certainly no error checking or particularly useful output… but this is not a chunk of code anyone would drop in and use somewhere. It’s an example… From this example you should be able to extrapolate your own useful application, API, or what have you.

    If there is some other incompleteness beyond it being short with no error checking, please enlighten people with your insight. I’d love constructive discourse on this. I’d love someone to show me other ways to accomplish this, or better uses of output buffering and/or cURL.

  3. georges on October 16th, 2006 9:50 pm

    Incomplete or not, it was handy to see a decent example. Thanks.

  4. zbtirrell on October 17th, 2006 8:42 am

    Thanks georges, that was my intent…

  5. Ashish Srivastava on November 8th, 2006 7:54 am

    hi,

    How do I maintain a session with cURL in PHP?

  6. Kumi Rauf on November 8th, 2006 11:43 pm

    Nice example… pretty black and white. Time to scrape some pages!

  7. bttrtalwgy on January 23rd, 2007 1:38 pm
  8. Software Outsourcing Company on February 9th, 2007 6:01 am

    Hi,

    Nice Example, keep writing,

    Jim
    http://www.tatvasoft.com

  9. madsh on February 12th, 2007 4:02 pm

    And if you want to do the same in 5 min without writing a single line of code… have a look at http://www.openkapow.com

  10. Lee on May 10th, 2007 8:19 pm

    Thanks a lot for the excellent example! Great job!

  11. Tine Müller on September 6th, 2007 4:12 am

    I can’t get it to work. Is the code no longer functioning?

    /Tine

  12. Jane on November 22nd, 2007 6:14 pm
