From Jim Gilliam's blog

Crawling Inefficiencies

April 18, 2003 02:07 AM

Thanks to Microdoc News for pointing out an error in my post yesterday about Grub's distributed crawling requiring twice as much bandwidth. The local client can tell whether a page has changed, and it doesn't send unchanged data back to Grub. That is definitely better bandwidth utilization than I expected. I assume the client compares a checksum of the file against one that Grub supplies; otherwise Grub would have to send its own copy of the file to the client for the comparison.
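To make the checksum guess concrete, here is a minimal sketch of how a client could decide whether to send a page back. This is only my assumption about the approach, not Grub's actual protocol: the function names, the choice of SHA-256, and the shape of the result are all hypothetical.

```python
import hashlib
from typing import Optional


def page_checksum(body: bytes) -> str:
    """Return a hex digest of the page body (SHA-256 is an arbitrary choice here)."""
    return hashlib.sha256(body).hexdigest()


def crawl_result(fetched_body: bytes, known_checksum: Optional[str]) -> dict:
    """Decide what a hypothetical client would report back to the server.

    known_checksum is the digest the server already holds for this URL,
    or None if the server has never seen the page.
    """
    checksum = page_checksum(fetched_body)
    if known_checksum is not None and checksum == known_checksum:
        # Page is unchanged: report only the small digest, not the body.
        return {"status": "unchanged", "checksum": checksum}
    # Page is new or changed: the full body has to go back to the server.
    return {"status": "changed", "checksum": checksum, "body": fetched_body}
```

With a scheme like this, the server only has to hand the client a short digest per URL, and the full page travels upstream only when the digests differ.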

