Tuesday, July 22, 2008

Visualizing Text with Wordle

This morning I was inspired to do a bit of word cloud exploration ...

I was prompted by a general tweet from Kevin Lim about "making beautiful word clouds using wordle.net." He had constructed a word cloud based on his del.icio.us entries ...

2691588906_4e4c41e949.jpg

Kevin, in turn, was prompted by Henry Farrell's posting of his text cloud based on his book “The Political Economy of Trust: Institutions, Interests and Inter-Firm Cooperation” ...

wordle-crookedtimber-sm.jpg

Henry was reacting to Steven Poole's posting of the word cloud based on "Unspeak" ...

wordle-uwb.jpg

So, what does it all mean?

At the simplest level, a "word cloud" ("text cloud", "tag cloud" in the case of a blog) displays words whose font size is proportional to that word's frequency of occurrence in the analyzed text. It's a graphical concordance.
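To make the "font size proportional to frequency" idea concrete, here is a minimal sketch in Python. It is not how Wordle itself works (its algorithm is not described here); the function names, the 48-point maximum, and the simple regex tokenizer are all assumptions made for illustration.

```python
import re
from collections import Counter

def word_frequencies(text, stopwords=frozenset()):
    """Count how often each word occurs, ignoring case and any stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stopwords)

def font_sizes(freqs, max_pt=48, top_n=50):
    """Map each of the top_n words to a font size proportional to its count.

    The most frequent word gets max_pt; every other word is scaled
    linearly against it (max_pt and top_n are illustrative defaults).
    """
    top = freqs.most_common(top_n)
    if not top:
        return {}
    peak = top[0][1]  # count of the most frequent word
    return {word: round(max_pt * count / peak) for word, count in top}

if __name__ == "__main__":
    sample = "trust trust trust firms firms cooperation"
    for word, size in font_sizes(word_frequencies(sample)).items():
        print(f"{word}: {size}pt")
```

A real cloud would typically also drop common words ("the", "and") via the stopwords argument and enforce a minimum readable size, which bends strict proportionality a little.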

The placement of the words, however, is not so rigid. Some clouds order the words by size (boring :)), some alphabetically (not as boring), and some use layout algorithms that are not always published (interesting, since one is left to guess the algorithm).

The process can be made more analytical and comparative ... IBM's Many Eyes project can compare two text clouds (recent examples were campaign speeches by Hillary Clinton and Barack Obama ... expect more of this genre this fall).


Henry Farrell, Wordle, http://crookedtimber.org/2008/07/21/wordle/

Kevin Lim, Wordle: Make beautiful textclouds…, http://theory.isthereason.com/?p=2285

Many Eyes, http://services.alphaworks.ibm.com/manyeyes/home

Steven Poole, Can Freedom Possible, http://unspeak.net/can-freedom-possible/

Wordle, http://wordle.net/

Monday, July 21, 2008

Understanding the Technologies behind Google ranking

In a post to the Google blog, Official Google Blog: Technologies behind Google ranking, Amit Singhal, Google Fellow, writes:

As part of our effort to discuss search quality, I want to tell you more about the technologies behind our ranking. The core technology in our ranking system comes from the academic field of Information Retrieval (IR). The IR community has studied search for almost 50 years. It uses statistical signals of word salience, like word frequency, to rank pages. (See "Modern Information Retrieval: A Brief Overview" for a quick overview of IR technology.) IR gave us a solid foundation, and we have built a tremendous system on top using links, page structure, and many other such innovations.

Search in the last decade has moved from "give me what I said" to "give me what I want". User expectations from search have rightly increased. We work hard to fulfill the expectations of each and every user, and to do that we need to better understand the pages, the queries, and our users. Over the last decade we have pushed the technologies for understanding these three components (of the search process) to completely new dimensions.
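The "statistical signals of word salience, like word frequency" mentioned in the quote can be illustrated with the textbook TF-IDF weighting from Information Retrieval. This is only a sketch of that classic idea and says nothing about Google's actual ranking system; the function names, the tokenizer, and the smoothed IDF formula are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def rank(query, docs):
    """Return document indices ordered from best to worst TF-IDF match."""
    counts_per_doc = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    query_terms = set(tokenize(query))
    scored = []
    for i, counts in enumerate(counts_per_doc):
        total = sum(counts.values()) or 1  # avoid dividing by zero on empty docs
        score = 0.0
        for term in query_terms:
            df = sum(1 for c in counts_per_doc if term in c)  # document frequency
            if df == 0:
                continue
            idf = math.log((n + 1) / (df + 1)) + 1  # smoothed inverse document frequency
            score += (counts[term] / total) * idf   # term frequency times idf
        scored.append((score, i))
    # Highest score first; ties fall back to original document order.
    return [i for _, i in sorted(scored, key=lambda s: (-s[0], s[1]))]

if __name__ == "__main__":
    pages = [
        "word clouds show word frequency",
        "search ranking uses links",
        "a cat sat",
    ]
    print(rank("word frequency", pages))
```

The same word counts that drive a word cloud's font sizes drive the term-frequency half of this score; the inverse-document-frequency half discounts words that appear in nearly every page.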
