Archive for October, 2005

PHP Screenscraping Using Curl

// October 22nd, 2005 // 18 Comments » // Technology Bits

Tim was looking to try something new, so I decided to introduce him to the Client URL (CURL) functions. As the example at hand, we looked at hitting the USPS site to look up cities and states based on ZIP code.

For the uninitiated, CURL basically lets you programmatically simulate a user browsing a web site. You can POST, GET, PUT, and maintain cookies and session information. In the following example we are using a technique called "screen scraping", which is rarely recommended but is a good skill to have, because sometimes it's the only solution.

The reason it's bad is that it is extremely fragile. If the webmaster of the site you are accessing makes even a slight change, it could break your page parsing. The other reason to shy away from this is that some web sites really don't like it when you do this. As a rule, if the webmaster of the site you are scraping contacts you and wants you to stop, you should, immediately. Though you should also recommend they provide the info you are scraping as a service through something like REST or SOAP. It would be very Web 2.0 of them to comply, so it's worth a shot.

Anyway, check out this example code, it's kinda fun.

PHP:
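<?php

// Look up the city and state for a ZIP code by scraping the USPS results page.
$zip = isset($_GET['zip']) ? $_GET['zip'] : '';

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://zip4.usps.com/zip4/zcl_3_results.jsp");
curl_setopt($ch, CURLOPT_POST, 1);
// urlencode() keeps stray characters in the query string out of the POST body.
curl_setopt($ch, CURLOPT_POSTFIELDS, "zip5=" . urlencode($zip));
// Hand the response back as a string instead of printing it to the browser.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

$string = curl_exec($ch);
curl_close($ch);

// Cut out the chunk of HTML that sits between two known markers on the page.
list(, $second) = explode('Actual City name', $string);
list($first) = explode('images/spacer.gif', $second);
$junk = explode("\n", $first);

// The seventh line of that chunk holds "CITY, ST".
list($city, $state) = explode(', ', trim(strip_tags($junk[6])));

$city = ucwords(strtolower($city));

print $city . ',' . $state;
?>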
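
Since the scrape leans on two marker strings and a fixed line offset, one way to soften the fragility mentioned earlier is to check each of those assumptions before trusting the result. What follows is only a minimal sketch: the function name parse_city_state and the choice to return null on failure are my own assumptions for illustration, and the markers are simply the ones used in the code above.

```php
<?php

// Hypothetical helper: returns "City,ST" from the results page HTML,
// or null when the page no longer looks the way the scraper expects.
function parse_city_state($html)
{
    $startMarker = 'Actual City name';
    $endMarker   = 'images/spacer.gif';

    // Both markers must be present, in this order, or the layout has changed.
    $start = strpos($html, $startMarker);
    if ($start === false) {
        return null;
    }
    $start += strlen($startMarker);

    $end = strpos($html, $endMarker, $start);
    if ($end === false) {
        return null;
    }

    // Same slice the original code builds with explode(), split into lines.
    $lines = explode("\n", substr($html, $start, $end - $start));
    if (!isset($lines[6])) {
        return null;
    }

    // Expect exactly "CITY, ST" once the tags are stripped.
    $parts = explode(', ', trim(strip_tags($lines[6])));
    if (count($parts) !== 2 || $parts[0] === '' || $parts[1] === '') {
        return null;
    }

    return ucwords(strtolower($parts[0])) . ',' . $parts[1];
}
```

With something like this in place, the calling page can show a "lookup unavailable" message when the function returns null, rather than printing garbage the day the markup changes.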

web20, web2.0, code, programming, screen scraping, php, soap, rest, curl, usps, address information, zip code, zip, zip codes

Hot Buttered Democracy

// October 21st, 2005 // No Comments » // My Stuff


Emmons took this picture of a sign in a store window in Concord, NH. I don't understand what the deal is, but I love it. Bring on the hot buttered democracy!
hot buttered democracy, democracy, hot buttered, buttered, concord, nh, new hampshire

Writely Does It Right

// October 18th, 2005 // 7 Comments » // Technology Bits

I recently discovered a web-based document or content editor called Writely. Matt at Bork Web tried it out and gave me his opinions on it, so I decided to take a stab at it as well. It is a Web 2.0 application that intends to replace Microsoft Word. I consider this similar to the way Gmail completely changed what we expect out of web-based email clients. Now I want all content editing to work just like Writely. I'm actually writing this blog post in Writely, since they have the ability to post content directly to a blog using the standard blogging APIs. If this works out, it will certainly replace all reliance I had on Ecto.

OK, for testing purposes I'm going to try a couple text styles out just to see how they post through.

List:

  • element 1
  • element 2
    • sub element
  • element 3

Some boldness and a bit of italics, followed by a touch of underlined text.

 

I also placed the image in the top left with the WYSIWYG tool.

web20, web 2.0, web 20, writely, web applications, word processor, word processing, word, microsoft, microsoft word, outlook, gmail, google

Karl Rove Beat Up By Little Girl

// October 17th, 2005 // 1 Comment » // My Stuff

This is apparently old news, but I just heard it on NPR's Wait Wait... Don't Tell Me this weekend.



Karl Rove was quoted as saying this:

As long as I can remember, I’ve always loved politics, at the age of 9 I put a Nixon bumper sticker on the wire basket in the front of my bicycle. Unfortunately the little Catholic girl down the street was a couple years and about 20 pounds on me. She was for Kennedy.

When she saw me on my bike with my bumper sticker for Nixon, she put me on the ground, flattened me out and gave me a bloody nose,

Despite that beating I never lost interest in politics.

I think this explains the origin of his dirty politics and strong commitment to the Republican Party. It's too bad a Democrat is responsible for establishing these convictions, but I do wish that school girl would pay him another visit.
karl rove, rove, republican, democrat, politics, school girl, catholic school girl, political, Nixon

Web 2.0 and the Long Tail

// October 9th, 2005 // 4 Comments » // Technology Bits


Casey started spouting to me weeks ago about "The Long Tail" and its significance in our changing online world. I read a bit and was admittedly intrigued, but hadn't thought too much about it until recently.

After noticing Web 2.0 being listed as a top search term on Technorati for the last few weeks I began looking into what this term actually means. Apparently Tim O'Reilly and others have begun talking about the next phase of the Internet. Where are we going, what is successful, and why? Obviously Google is a key part of all of this, but I was somewhat surprised to find discussion also tying back to the long tail.

At this point I realized I needed to know a bit more about this too, so I stumbled into Chris Anderson's blog, longtail.typepad.com. Anderson discusses at great length Legos, digital music, BitTorrent, software and more. He cites intriguing, detailed examples of how companies are taking advantage of the long tail to make money in ways no one could ever have imagined without the Internet and its grand reach. Go read his site and learn about this; it is very important.

Not surprisingly, as I began to grasp his concepts I was led right back to O'Reilly and conveniently his "What is Web 2.0?" article. This is also a must read. Let me quickly quote a brainstorm list of services that have transitioned from Web 1.0 to Web 2.0:

Web 1.0 --> Web 2.0
DoubleClick --> Google AdSense
Ofoto --> Flickr
Akamai --> BitTorrent
mp3.com --> Napster
Britannica Online --> Wikipedia
personal websites --> blogging
evite --> upcoming.org and EVDB
domain name speculation --> search engine optimization
page views --> cost per click
screen scraping --> web services
publishing --> participation
content management systems --> wikis
directories (taxonomy) --> tagging ("folksonomy")
stickiness --> syndication

Somehow that list really made the Web 2.0 concepts become much clearer. Reading deeper I came upon a comparison that was most clear to me, an ex-advertising programmer:

Overture and Google's success came from an understanding of what Chris Anderson refers to as "the long tail," the collective power of the small sites that make up the bulk of the web's content. DoubleClick's offerings require a formal sales contract, limiting their market to the few thousand largest web sites. Overture and Google figured out how to enable ad placement on virtually any web page. What's more, they eschewed publisher/ad-agency friendly advertising formats such as banner ads and popups in favor of minimally intrusive, context-sensitive, consumer-friendly text advertising.

This was a fantastic hell yeah moment for me, linking back long tail notions to these Web 2.0 ideas.

The next major point he makes is "Harnessing Collective Intelligence", which I compare to the infinite monkey theorem and which is my reason why people should blog. He cites the success of Wikipedia, Flickr, open source web development, and more.

A point I find myself arguing at work constantly, but didn't have words for yet, is also addressed: "End of the Software Release Cycle". Read that section. For software developers this is a complete change in mindset, but more importantly, trainers and managers are going to need to get on board or risk missing the boat.

There is a ton of detail to absorb here, I could keep pulling at my favorite gems, but I'll let you read up yourself. What I will do however is leave you with this final list of core Web 2.0 competencies from the article:

- Services, not packaged software, with cost-effective scalability
- Control over unique, hard-to-recreate data sources that get richer as more people use them
- Trusting users as co-developers
- Harnessing collective intelligence
- Leveraging the long tail through customer self-service
- Software above the level of a single device
- Lightweight user interfaces, development models, AND business models

software development, web development, web, web 2.0, napster, wikipedia, Tim O'Reilly, Chris Anderson, O'Reilly, Anderson, collective intelligence, flickr, long tail, self service, scalability, blog

Buffy – The Chosen Collection

// October 9th, 2005 // 7 Comments » // My Stuff

After getting hooked on Firefly and Serenity, I decided to plunge into Joss Whedon's other TV series. It took me until halfway through the second season of Buffy to get hooked, but now I love the show. Additionally, I moved into the Angel series and that rules as well.

I mention all this specifically because I just discovered today that the complete Buffy series is being repackaged and released in a 40 DVD set called "The Chosen Collection." Retailing at a mere $129.99 on Amazon, I might need to pick this sweet item up.

In fact, anyone who currently has any seasons might be wise to dump them on eBay and use the money to buy this collection.

Check out some more images of the set on TVShowsonDVD.com
joss,joss whedon, whedon, dvd, television, firefly, serenity, buffy the vampire slayer, buffy, vampire, angel, sarah michelle geller, tv, tv series, special collection

Aardman Rules Claymation

// October 9th, 2005 // 2 Comments » // My Stuff

I just saw Wallace and Gromit: Curse of the Were-Rabbit and it was just as great as I hoped it would be. Aardman is the animation studio responsible for the more famous Chicken Run movie as well as a series of Wallace and Gromit shorts dating back to the 80's.

If you like any of their previous work, go see this movie. It's more of the same, and that was exactly what I was looking for.


aardman, animation, claymation, wallace and gromit, chicken run, wallace, gromit, curse of the were-rabbit, movie, movies, entertainment, shorts
