Determining semantic similarity of two sets of words that describe two entities is an important problem in web mining (search and recommendation systems), targeted advertisement and domains that need semantic content matching. Traditional Information Retrieval approaches even when extended to include semantics by performing the similarity comparison on concepts instead of words/terms, may not always determine the right matches when there is no direct overlap in the exact concepts that represent the semantics. As the entity descriptions are treated as self-contained units, the relationships that are not explicit in the entity descriptions are usually ignored. We extend this notion of semantic similarity to consider inherent relationships between concepts using ontologies. We propose simple metrics for computing semantic similarity using spreading activation networks with multiple mechanisms for activation (set based spreading and graph based spreading) and concept matching (using bipartite graphs). We evaluate these metrics in the context of matching two user profiles to determine overlapping interests between users. Our similarity computation results show an improvement in accuracy over other approaches, when compared with human-computed similarity. Although the techniques presented here are used to compute similarity between two user profiles, these are applicable to any content matching scenario.
Monday, July 07, 2008
Computing Semantic Similarity Using Ontologies by Rajesh Thiagarajan, Geetha Manjunath, and Markus Stumptner is a new HP Lab Report.