Wednesday, November 17, 2010

Week 11 Reading Notes


Web Search Engines (Parts 1 & 2)

I thought these articles were very interesting. Rarely does a day go by (particularly lately) when I don’t spend some time using a web search engine, whether it’s Google or another one. With the advent of mobile technologies and the rapid pace of their development, I can’t imagine that the time we spend using search engines will lessen. Because of that, I think articles like these are invaluable: they explain the workings and background of web search engines in some depth, and they give us context and a vantage point from which to marvel at the technological achievements that surround us.

Current Developments and Future Trends for the OAI Protocol for Metadata Harvesting

Several ideas have emerged across my classes this semester as key concepts, and not the least of them is interoperability. In a world where individual users must access multiple systems simultaneously, protocols like OAI-PMH, which let a service harvest metadata from many separate repositories into one place, are essential to making that kind of access possible. Protocols like this are testaments to human organizational mastery. In the relatively short time since issues like interoperability/interaccessibility were first raised, we’ve come a long way toward permanent solutions.

Deep Web: Surfacing Hidden Value

I love when this kind of thing happens: only last night, Dr. Tomer was talking about the vertical nature of the current Pitt ULS website. It’s not exactly what this article describes, but it’s pretty close…

When the authors noted that the deep web accounts for more than 99% of all information on the internet (19 TB on the surface versus 7,500 TB in the deep web), I had to wonder just what kind of information this is… SEC filings? Medical files? Yes. But there’s so much more, too! I thought the list of the 60 largest deep web sites was really interesting, though it’s a shame that the NOAA link doesn’t work. I was also shocked to see mp3.com on the list.

3 comments:

  1. Current estimates on the deep web are around 91,000 terabytes. Quite a jump from the 2001 estimate in the Bergman paper, eh?

  2. Sometimes the vastness of the Internet scares me. But then I realize how very dependent upon it I am (especially Google), and I become less scared.
