Web Site Crawler And Slowness
Posted by Robert on the 25th of January, 2005 at 12:36 AM GMT0. Tags: Geek, PHP
I'm working on a web site crawler in PHP. Yeah, yeah, yeah, I already kinda wrote one, but this one is better. The first was only meant to index my site and maybe a few others; I'm going to try to make this one a lot cooler. I've already got some path-checking code so that it crawls properly. This one also uses fsockopen instead of fopen, so I can send custom HTTP headers. The crawler will be able to download a site (archiving it to disk), validate certain file types, or index it (which should eventually "figure out" what each page is about).
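Because fsockopen only opens a raw connection, the crawler has to write the HTTP request itself, and that is what makes custom headers possible. Here is a minimal sketch of the idea, not the crawler's actual code: `build_request()` and `fetch_page()` are hypothetical names, it assumes plain HTTP on port 80, and it is written for PHP 7.1 or later.

```php
<?php
// Hypothetical helper: assemble a raw HTTP/1.0 GET request with custom headers.
function build_request(string $host, string $path, array $headers = []): string
{
    $lines = ["GET $path HTTP/1.0", "Host: $host"];
    foreach ($headers as $name => $value) {
        $lines[] = "$name: $value";
    }
    // An empty line (CRLF CRLF) terminates the header block.
    return implode("\r\n", $lines) . "\r\n\r\n";
}

// Hypothetical helper: send the request over a raw socket and split the
// response into its header block and body. Returns null if connecting fails.
function fetch_page(string $host, string $path, array $headers = [], int $timeout = 10): ?array
{
    $fp = fsockopen($host, 80, $errno, $errstr, $timeout);
    if ($fp === false) {
        return null;
    }
    // The fsockopen timeout only covers connecting; this bounds each read too.
    stream_set_timeout($fp, $timeout);
    fwrite($fp, build_request($host, $path, $headers));

    $response = '';
    while (!feof($fp)) {
        $chunk = fread($fp, 8192);
        if ($chunk === false) {
            break;
        }
        $response .= $chunk;
        // Stop instead of spinning forever if the server goes quiet.
        if (stream_get_meta_data($fp)['timed_out']) {
            break;
        }
    }
    fclose($fp);

    // Headers and body are separated by the first empty line.
    $parts = explode("\r\n\r\n", $response, 2);
    return ['headers' => $parts[0], 'body' => $parts[1] ?? ''];
}
```

Asking for HTTP/1.0 is a deliberate simplification: without a keep-alive request the server closes the connection when it is done, and chunked transfer encoding is not used, so reading until EOF yields the whole response. The `User-Agent` value in a call such as `fetch_page('example.com', '/', ['User-Agent' => 'MyCrawler/0.1'])` is only a placeholder.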
Limitations: I'm writing it in PHP, since I'm not willing to invest the time in a compiled language at the moment.
In other news, I noticed the site was sometimes slow. I thought the tracking code was to blame, but it turns out that, as I had feared, the script that downloads page titles from other sites is occasionally VERY slow. So I'm going to rewrite it so that it fetches the title when a post is previewed or submitted (if needed) and inserts the UBB code into the post itself. I had wanted to avoid that, since I wanted every user post to stay untouched in the database, but I'll have to make a concession for this one.
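Fetching the title once, at preview or post time, and storing the result takes the slow network call out of page display. Below is a minimal sketch of the title-to-UBB step only, assuming the UBB code in question is a `[url=...]...[/url]` link; `extract_title()` and `ubb_link()` are hypothetical names, and downloading the HTML is left out.

```php
<?php
// Hypothetical helper: return the text of the first <title> element with
// entities decoded and whitespace collapsed, or null if there is none.
function extract_title(string $html): ?string
{
    if (!preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $m)) {
        return null;
    }
    $title = html_entity_decode($m[1], ENT_QUOTES, 'UTF-8');
    $title = trim(preg_replace('/\s+/', ' ', $title));
    return $title === '' ? null : $title;
}

// Hypothetical helper: build the UBB link stored in the post, falling back
// to the bare URL when no title could be found.
function ubb_link(string $url, ?string $title): string
{
    return '[url=' . $url . ']' . ($title ?? $url) . '[/url]';
}
```

With the title written into the stored post, displaying the page never waits on a remote site; the cost is the one mentioned above, that the post is no longer stored exactly as the user typed it.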