2007 7
Pinging vs Crawling and an open source search engine
Published by MartinVarsavsky.net in General with No Comments
I am sure that there's tons of stuff written on the web about the pros and cons of pinging (notifications à la Technorati) vs crawling (programs that scout the web for links à la Google), or listening vs spying. Tonight we had dinner in Madrid with Jimmy Wales, the founder of Wikipedia, and we spoke about some of these. In general pinging beats crawling in everything but thoroughness. Crawling finds all there is to find on the net; pinging finds what wants to be found.
Jimmy described to me a problem that I was not aware of, which is that Ajax pages are hard to crawl. I commented on a problem that he was not aware of, which is that Google is the biggest or one of the biggest consumers of electricity in the world, and that this is, among other things, because crawling is incredibly energy inefficient compared to pinging.
In any case, what was extremely interesting is the concept of an open source search engine. I really hope that Jimmy and his open sourcers make this one work. One of the worst jobs at Google is probably policing results to make sure they are not hacked, as the monetary incentive to hack Google results is huge. Wouldn't it be great to have a community police force rather than some paid employees? This problem is more manageable than the problem of people who tried to hack Wikipedia. If the Wikipedia community dealt successfully with article hacking, search optimization hacking should also be policed more effectively by a community than by a few paid individuals. Wisdom of the crowds at work in search. Intriguing.
In the meantime I mentioned to Jimmy the little search engine that we put together at Fon called Unfolding News. This engine combines crawled sources with pinged sources that are all fresh.
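For readers who like to see the difference in code, here is a rough toy sketch of crawling vs pinging. Everything in it is an assumption made up for illustration (the `TOY_WEB` link graph, the `crawl` function, the `PingIndex` class); it is not how Google, Technorati or Unfolding News actually work. It only shows the trade-off: the crawler reaches every linked page but has to fetch all of them, while the ping index fetches only what announces itself.

```python
from collections import deque

# Hypothetical toy web: each page maps to the pages it links to.
TOY_WEB = {
    "home": ["blog", "about"],
    "blog": ["post-1"],
    "about": [],
    "post-1": [],
}


def crawl(web, seed):
    """Breadth-first crawl: visit every page reachable from the seed."""
    seen = {seed}
    queue = deque([seed])
    fetches = 0
    while queue:
        page = queue.popleft()
        fetches += 1  # one fetch per page, whether it changed or not
        for link in web[page]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen, fetches


class PingIndex:
    """Index that only learns about pages whose publishers notify it."""

    def __init__(self):
        self.pages = set()
        self.fetches = 0

    def ping(self, page):
        # The publisher tells us a page is new or updated; fetch just that one.
        self.pages.add(page)
        self.fetches += 1


crawled, crawl_fetches = crawl(TOY_WEB, "home")

index = PingIndex()
index.ping("post-1")  # only the blog post announces itself

print(sorted(crawled), crawl_fetches)
print(sorted(index.pages), index.fetches)
```

In this toy the crawler has to repeat all of its fetches every time it wants to check for updates, which is a crude stand-in for the energy point above, while the ping index never learns about pages that stay silent.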
Karl-Friedrich Lenz on November 7, 2007 ·
One might also note that Google’s practice of crawling everything without bothering to ask copyright holders for permission first is illegal under current European and Japanese copyright law; therefore pinging is really the only way to build a legal search engine.
Elliott on November 7, 2007 ·
Immediately after seeing this post I saw this article indicating that last year US data centers consumed 1.5% of the USA’s electric power, that this consumption doubled between 2001 and 2006, and that it is expected to double again in the next 5 years: “Taming the Guzzlers That Power the World Wide Web”, NY Times, Nov 7, 2007.
Martin Varsavsky on November 8, 2007 ·
Karl,
Can u give background information on how crawling is illegal in Europe and legal in USA?
Karl-Friedrich Lenz on November 9, 2007 ·
I didn’t want to imply that Google is legal under American copyright law; only that the case is stronger under European and Japanese standards, where there is no chance to muddle the issue.
Google and other search engines copy other people’s content into their databases.
For that to be legal they need either a license or an exception.
Opt-out search engines don’t bother to ask for a license, so they have none.
There is no exception or limitation to be found for search engines in the 2001 copyright directive. See my December 2006 blog post for details.
There is also none under Japanese law. That point is even clearer, since there are proposals now to introduce a search engine exception. That would be unnecessary if such an exception already existed. See my January 2007 blog post for details.
anina.net on November 11, 2007 ·
http://www.unfoldingnews.com/search/fashion/
hi i tried your search engine and i am sorry to say i fail to see what these things have to do with fashion:
Barrington must confront teen drinking N (Providence Journal)
Va. GOP debates direction to take N (Richmond Times Dispatch)
Boise State QB leads No. 19 Broncos to 52-0 shutout of Utah St. N (KSL-TV)
Spain’s King Juan Carlos tells Hugo Chavez to ‘shut up’ during summit N (The Canadian Press)
Trinity vote exposes mistrust of Dallas City Hall N (Dallas Morning News)
Paw Prints: Studies reveal that cats can develop Alzheimer’s, too N (Terre Haute Tribune Star)
New York Paper Reports That L.A. Drug Ring Was Funneling Money to Hezbollah B ( Patterico’s Pontifications )
EDraw Max v3.3 Cracked-iNViSiBLE B (Releaselog | RLSLOG.net)
NpTech Tag Summary: New Tools, Old Cultures, Tis the Season, and Facebook Social Ads B (beth.typepad.com)
most of the things that your search turned up, save 2 links, one about cufflinks and one about a spring collection, had nothing to do with fashion. there was NOT ONE 360Fashion blog in there, and no style.com, no elle.com, nothing.
so, i think your search needs a little more work.
just my fashion perspective….
SlightlyShadySEO on November 7, 2007 ·
I feel like you are underestimating the strength of the people trying to abuse this. Hacking Wikipedia offers very little financial reward, which is why a community police force works.
Hacking a whole search engine has a potentially HUGE financial reward: $100k+ per week (or even per day), easily, for certain high-traffic keywords (“home loans”, “online gambling”, etc.).
That would make people infinitely more aggressive than they ever were with Wikipedia.