Whoa. I got my brain back today after spending the last four days with some kind of African Sleeping Sickness. I've been messing around with PHP's cURL functions and building agents that surf the web by themselves. I spent a few pathetic hours scratching around in regular expression land trying to extract all the links from any given web page until I found the following one-liner. It takes the URLs of all the links in a string called $result and stuffs them into an array called $arrayoflinks (the URLs themselves end up in $arrayoflinks[1], since preg_match_all puts the full matches in element 0):
preg_match_all("|href=\"?([^\"' >]+)|i", $result, $arrayoflinks);
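Here's a rough sketch of how that plays out on a small chunk of HTML. The sample markup and the $links variable are made up for illustration; in a real spider, $result would presumably be whatever curl_exec() hands back with CURLOPT_RETURNTRANSFER turned on.

```php
<?php
// Hypothetical sample page standing in for a cURL response body.
$result = '<a href="http://example.com/">Home</a> '
        . '<A HREF=/about.html>About</A> '
        . '<a class="x" href="page.php?id=2">Two</a>';

// Grab whatever follows href= (with or without an opening double quote),
// stopping at the first quote, space, or closing angle bracket.
preg_match_all("|href=\"?([^\"' >]+)|i", $result, $arrayoflinks);

// Element 0 holds the full matches (href="..." and all);
// element 1 holds just the captured URLs.
$links = $arrayoflinks[1];

foreach ($links as $link) {
    echo $link, "\n";
}
```

One thing to watch, as far as I can tell: the pattern only allows for an optional double quote after href=, so links written as href='...' with single quotes get skipped. Relative URLs also come back as-is, so a spider would still need to resolve them against the page's address.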
I posted the notes from our July PHP users group meeting about writing spiders. It was one of our best meetings ever. There were a bunch of extroverted geniuses there this time and I learned a lot.
5:44:38 PM