From Jim Gilliam's blog archives
Crawling Inefficiencies

April 18, 2003 2:07 AM

Thanks to Microdoc News for pointing out an error in my post yesterday about Grub's distributed crawling requiring twice as much bandwidth. The local client can identify whether a page has changed and avoid sending unchanged data back to Grub. That's definitely better bandwidth utilization than I expected. I assume it's using a checksum of the file to determine that it's the same; otherwise Grub's server would have to send its own copy of the file down to the client just to compare them.
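Here's a minimal sketch of how that could work (my own illustration in Python, not Grub's actual code; the fetch function and the checksum store are hypothetical):

    import hashlib

    def page_checksum(content):
        # Fingerprint the page body; a digest is enough to detect changes.
        return hashlib.md5(content).hexdigest()

    def crawl(url, fetch, known_checksums):
        # fetch: hypothetical function that downloads the page as bytes.
        # known_checksums: maps each URL to the digest from the last crawl.
        content = fetch(url)
        digest = page_checksum(content)
        if known_checksums.get(url) == digest:
            return None  # unchanged: report just the checksum, not the data
        known_checksums[url] = digest
        return content  # changed: ship the new copy back to the server

The client only ever uploads a few bytes of digest for an unchanged page, which is where the bandwidth savings would come from.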

More from the archive in Emergence, Search.

