Wednesday, December 26, 2007

Clustering Tags

Edwin Simpson has published HP technical report HPL-2007-190, "Clustering Tags in Enterprise and Web Folksonomies":
Recently there has been massive growth in the use of tags as a simple, flexible way to categorize resources. Tags are often used collaboratively to help share information using websites such as del.icio.us. However, the number of tags used in such a service is extremely large, so the unstructured nature of tags limits their value when navigating these websites, and prevents users from fully exploiting tags added by others. Clustering similar tags can improve this by adding structure. In this paper we discuss techniques for deriving tag similarity and explain two tag clustering algorithms. We applied the algorithms to two datasets containing tags provided by users with common interests. The first dataset is from a tagging service used by a small group of colleagues and the second is a public, web-based service. The paper examines the effectiveness of both clustering algorithms and their robustness to the different types of data, giving suggestions of possible ways to improve the algorithms.
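
To make "tag similarity" and "tag clustering" a little more concrete, here is a minimal Python sketch. It is not taken from the report and should not be read as either of Simpson's two algorithms: it assumes a simple co-occurrence measure (cosine similarity over the resources each tag appears on) and a naive threshold-based grouping. The function names, the sample bookmarks, and the 0.5 threshold are all made up for illustration.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def tag_similarity(bookmarks):
    """Cosine similarity between tags, based on how often they share a resource.

    `bookmarks` is a list of tag lists, one per tagged resource (assumed format).
    Returns {(tag_a, tag_b): similarity} for pairs that co-occur at least once,
    with tag_a < tag_b alphabetically.
    """
    count = defaultdict(int)  # resources carrying each tag
    both = defaultdict(int)   # resources carrying both tags of a pair
    for tags in bookmarks:
        unique = sorted(set(tags))
        for tag in unique:
            count[tag] += 1
        for a, b in combinations(unique, 2):
            both[(a, b)] += 1
    return {
        (a, b): n / sqrt(count[a] * count[b])
        for (a, b), n in both.items()
    }


def cluster(tags, sim, threshold):
    """Group tags that are linked by a chain of pairs with similarity >= threshold.

    This is plain connected-components grouping via union-find, chosen here
    only because it is short; it is an assumption, not the report's method.
    """
    parent = {t: t for t in tags}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for (a, b), s in sim.items():
        if s >= threshold:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for t in tags:
        groups[find(t)].add(t)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical sample data: each inner list is the tag set of one bookmark.
bookmarks = [
    ["python", "programming"],
    ["python", "programming", "tutorial"],
    ["recipes", "cooking"],
    ["cooking", "recipes", "dinner"],
    ["python", "cooking"],
]

sim = tag_similarity(bookmarks)
all_tags = {t for tags in bookmarks for t in tags}
clusters = cluster(all_tags, sim, threshold=0.5)
print(clusters)
```

With this sample data the single bookmark tagged both "python" and "cooking" gives that pair a similarity of only 1/3, so at a threshold of 0.5 the programming tags and the cooking tags end up in separate groups. A real folksonomy would need more care with very frequent tags and with the choice of threshold.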

1 comment:

jrochkind said...

Nice, that sounds pretty hugely useful.

Are the algorithms actually in the original paper, not kept secret/proprietary? If so, kudos to HP researchers, and certain library R&D efforts would be improved by doing that more often.