Thursday, September 23, 2010

Week 4 Reading Notes (Representation and Storage)

As some of you have seen, I made the mistake of posting notes for Week 5... Oops. Here are my notes for Week 4.

Wikipedia article on Compression:

Data compression is a fascinating topic for me. I think it is particularly relevant in the context of some discussions that have occurred in and around LIS2000 regarding the digitization of printed materials. Many of us wonder at the effects that the digitization of books will have on the reader's ability to interact with them, but we've already got a good model for predicting certain effects: .mp3 representation of recorded artifacts. If you go to YouTube and search for your favorite song, there's a good chance you'll come up with a file that has a fair amount of 'glassiness' to it, which sounds like a flanger and is usually most noticeable on hi-hats/cymbals and soprano/alto/high-bari backing vocals. (You usually won't run into this if you watch the official music video for a track, but it's very common among videos or tracks posted by amateur enthusiasts.) As near as I can tell, this doesn't bother most people; however, it's just about my biggest pet peeve.

The process of digitization and the use of lossy compression to reduce file size risk degrading the quality of the artifact being digitized. If such quality loss is commonplace and often acceptable in digital music files, it makes sense to me that similar or analogous quality loss will be present in digital print material once it reaches the popularity that audio .mp3s already have... That worries me. To the bibliophile, part of the beauty of the reading experience might have much to do with the intricacies of an antique font, with the fineness of the paper and binding, or even simply with the feel of the page on his/her fingers, much in the same way that the audiophile might find the most pleasure in a listening experience born of a four-tube analog compressor (a different thing from the compression we're talking about in this class), complete with subtle tape hiss in the high midrange and the soft, anomalous transients that only occur with analog equipment. The audiophile can still listen to music on .mp3s, and the bibliophile can still read books, but the experience is qualitatively (and possibly significantly) changed. It might be more convenient, it might even be necessary, but it's just not as good.

On another note, I found the numeric representation of lossy vs. lossless compression to be a very helpful visual aid, but one that is difficult to envision in terms of more complicated multimedia materials. I also found it particularly interesting to learn that the digital representation of an artifact that doesn't show any pattern cannot be (losslessly) compressed. Once I'd read that, it made perfect sense, but it's not something I'd thought of before.
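To convince myself of that last point, I tried a quick experiment of my own (not from the article) with Python's built-in zlib module: a patterned byte string collapses to a tiny fraction of its original size, while the same amount of random bytes barely compresses at all.

```python
import os
import zlib

# A highly patterned byte string vs. the same amount of random bytes.
patterned = b"ABCD" * 25000        # 100,000 bytes with an obvious repeating pattern
random_data = os.urandom(100000)   # 100,000 bytes with (effectively) no pattern

for label, data in [("patterned", patterned), ("random", random_data)]:
    compressed = zlib.compress(data, 9)   # maximum lossless compression
    print(f"{label}: {len(data)} bytes -> {len(compressed)} bytes "
          f"({len(compressed) / len(data):.1%} of original)")

# Typical result: the patterned data shrinks to well under 1% of its original size,
# while the random data stays at roughly 100% (sometimes slightly larger).
```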

Data Compression Basics Article:

I think the best point made in this article is that lossy compression preserves information, but not data. In order for this point to make sense, one must assume that the people viewing or listening to an uncompressed file will all necessarily get the same information from it. It's true that the range of human perception can be well represented by a bell curve, with almost all of our perceptual abilities accounted for within the first two standard deviations from the mean. But from a somewhat pedantic, purist viewpoint, it may well be the case that, for someone with unusual sensitivity, the most relevant or meaningful information is conveyed precisely by the data lost during the compression process.
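A toy sketch of the information/data distinction (my own illustration, not from the article): quantizing a handful of audio samples destroys the original data for good, even though the rough shape a casual listener would hear survives.

```python
# Hypothetical audio samples in the range -1.0 to 1.0.
samples = [0.012, 0.487, 0.501, -0.733, 0.250, -0.249]

def quantize(x, step=0.25):
    """Snap a sample to the nearest multiple of `step`, irreversibly discarding detail."""
    return round(x / step) * step

lossy = [quantize(s) for s in samples]

print(samples)            # the original data
print(lossy)              # different data, broadly similar "shape"
print(samples == lossy)   # False: the original data cannot be recovered
```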

Digitizing Pittsburgh:

I think this is an example of a digitization project that was carried out very well. The images appear to be very high quality, and I wasn't surprised to read of the lengths the team went to in order to ensure reliability and interoperability among the differing institutions' metadata.

Youtube and Libraries:

The link on Courseweb wouldn't work for me, so I thought I'd post one that worked: here it is.

I thought the best part of this article was something it implied rather than something it explicitly stated: new media and new modes of media dissemination can play a vital role in the future of libraries and in the way patrons interact with their library, be it a local public library, a university library, etc. In order for the library as a concept to stay afloat, those of us working in and for libraries will need to keep abreast of social technologies in order to exploit them effectively.

Tuesday, September 21, 2010

Assignment 2: Flickr & Digitization

Here's the link to my photostream on Flickr.

my Flickr photostream

(Please forgive the blurriness of some of the photos. The resolution is good, I'm just a terrible photographer.)

Week 3 Muddiest Point

While I understand the concept of open source software, I'm not sure I get the economics of it. Does the survival of the open source practice simply rely on a sort of 'programmer altruism?' Are there any models that attempt to explain this phenomenon, or is it a notion akin to 'social authorship'?

Week 5 Reading Notes



Database article on Wikipedia

I found that this reading overlapped nicely with readings for LIS 2005 this past week. In fact, this Wikipedia article also fed into readings I’ve done for Music 2111 (Research and Bibliography). The overlap occurs here: breaking down database structure into external, conceptual and internal levels.
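To make those three levels concrete for myself, here's a tiny sketch using Python's built-in sqlite3 module (the table and its values are invented): the CREATE TABLE statement plays the role of the conceptual level, a view stands in for an external level tailored to one kind of user, and the internal level (how SQLite actually lays the data out on disk or in memory) stays hidden entirely.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # the *internal* level (pages, storage) is SQLite's concern

# Conceptual level: the logical structure of the data, independent of any one user's view.
conn.execute("""
    CREATE TABLE documents (
        doc_id      INTEGER PRIMARY KEY,
        title       TEXT NOT NULL,
        custodian   TEXT,
        page_count  INTEGER
    )
""")
conn.execute("INSERT INTO documents VALUES (1, 'Deposition Exhibit 14', 'Smith', 42)")

# External level: a view tailored to one kind of user, hiding the fields they don't need.
conn.execute("""
    CREATE VIEW review_queue AS
    SELECT doc_id, title FROM documents
""")

for row in conn.execute("SELECT * FROM review_queue"):
    print(row)   # (1, 'Deposition Exhibit 14')
```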

While I was working at a law firm in Chicago, I interacted with a database called Concordance on a daily basis. It was a bland program, a black screen with fields for information input. I seem to recall that there was red text every now and again, too. Up until this point, I'd always thought of databases in those terms: mostly monochromatic, lifeless windows into a digital world. The database was simply a digital thing that existed solely on a computer. Now, I think I'm beginning to appreciate databases for what they are: physical collections of history (however mundane or forgettable) that exist somewhere beyond those lifeless windows.

In our reading for LIS 2005, we read an argument that databasing has been a central act in modern society from Proust to IMDB. We create databases of all those things in the world that mean something to us, that allow us to blanket our worlds with meaning. Perhaps it’s the conceptual level of databases that allows us to do this?

When discussing the three levels of databases (external, conceptual, and internal), the Wikipedia article mentioned that accuracy is reduced for the sake of clarity: outliers are removed, and the database is kept pure. (As a side note, this also reminded me a lot of readings for MUSIC 2111, Research and Bibliography, in which it was said that an effective citation structure is necessarily a conceptual structure, one where the odd entries that don't fit well in real practice are left out until the transition to physicality necessitates their inclusion.) I find that this is a common practice in Library Science, and indeed in any science that seeks to treat or explain large-scale phenomena: superimposing generalized conceptualizations simply makes it easier to perceive order.

(And now for something completely different.)

Does anyone have any examples of ‘post-relational database models?’ I’m having a hard time with this one…
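For what it's worth, the examples usually offered for 'post-relational' are object databases and the document and key-value stores now grouped under NoSQL (systems like CouchDB or MongoDB). Here's a rough sketch, with made-up records, of the document flavor of the idea: each record is a free-form document rather than a row in a fixed-column table.

```python
# A toy "document store": records need not share a fixed set of columns.
catalog = {
    "doc-001": {"title": "Pittsburgh in 1911", "type": "photograph", "tags": ["bridge", "morning"]},
    "doc-002": {"title": "Oral history interview", "type": "audio", "duration_min": 47},
}

# Querying means inspecting documents rather than joining rigidly structured tables.
photos = [d for d in catalog.values() if d.get("type") == "photograph"]
print(photos)
```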

Setting the Stage (metadata article)

Is anyone else as fascinated with the idea of user-created metadata, such as tags, as I am? I think it's the democratization of the classification process that attracts me to it so much. There just seems to be such potential for organically, publicly derived classification systems for data! Allowing user-created metadata derived from an open-ended ability to apply adjectives to an object (say, an emotion-related or temporal adjective like 'morning' applied to an artifact that is about neither morning nor emotion directly) could shed so much new light on the ways that information users interact with artifacts of all types, from books, to signs, to images and the things represented in images... This process could yield such a font of data for analysis!
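As a sketch of what that analysis might start to look like (the tags and the photograph here are entirely hypothetical), even a simple count of how often users attach each tag to a single digitized image already says something about how they perceive it:

```python
from collections import Counter

# Hypothetical user-supplied tag sets for one digitized photograph.
user_tags = [
    ["morning", "bridge", "nostalgic"],
    ["bridge", "steel", "morning"],
    ["nostalgic", "Pittsburgh", "bridge"],
]

# Flatten the tag sets and count how often each tag appears.
tag_counts = Counter(tag for tags in user_tags for tag in tags)
print(tag_counts.most_common(3))   # e.g. [('bridge', 3), ('morning', 2), ('nostalgic', 2)]
```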

Dublin Core Data Model article

What strikes me the most about this article and the DCDM idea is the linguistic barrier it potentially faces with regard to its 'internationalization' goal. Even with a drastically limited set of appropriate, agreed-upon modifiers, it seems likely that linguistic barriers will be encountered.

It’s an old-hat notion that different languages have different words with different connotations for similar concepts. (There is, for example, no word for ‘home’ in French—there is only the word ‘maison,’ which is the equivalent of ‘house.’ Similar concepts, but different connotations entirely.) Given situations like this, it seems that DCDM would require the use of an artificial language like Esperanto, or else would require acceptance of certain linguistic barriers that cannot be crossed short of widespread possession of multilingual capabilities on the part of catalogers and users.
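One mitigating detail worth noting: Dublin Core does include a language element, and element values can carry language qualifiers, so a single record can hold parallel values in more than one language. Here's a sketch of what that might look like (my own ad hoc representation with invented values, not the DCDM's actual serialization):

```python
# One Dublin Core-style record with parallel values in two languages.
# The element names (title, subject, language) are standard Dublin Core;
# the dict layout and the values themselves are made up for illustration.
record = {
    "identifier": "example-0001",
    "title": [
        {"value": "La maison au bord du lac", "lang": "fr"},
        {"value": "The house by the lake", "lang": "en"},
    ],
    "subject": [
        {"value": "domestic architecture", "lang": "en"},
        {"value": "architecture domestique", "lang": "fr"},
    ],
    "language": "fr",
}

# A cataloger or user can then pull out the values in the language they read.
english_titles = [t["value"] for t in record["title"] if t["lang"] == "en"]
print(english_titles)
```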

Maybe this is shortsighted on my part? Too nitpicky?

Furthermore, it seems possible (though I’m not making this point as a whole-hearted supporter of it) that such homogenization of classification protocol could serve to diminish the cultural eccentricities we’ve all come to know, love and study as scholarly researchers. Again, maybe this is only an over-simplification on my part.

How do we establish universal classification schemas without overriding distinct cultural schemas? Is this question an over-reaction?