-- Author: admin
-- Posted: 3/14/2007 9:51:00 AM
-- [ontolog-forum] Using Wikipedia as a Folksonomy [repost]
"John F. Sowa" <sowa@bestweb.net> to [ontolog-forum] show details 5:02 am (4 hours ago) The Wikipedia is currently the largest informally defined result of collaborative tagging. Many people have criticized it for its lack of supervision and uneven quality of many of the articles. Yet it does serve as a convenient body of texts that have been classified informally -- over 400 million words grouped in over one million articles. The title of each article is a tag that classifies the article. Following is an article about using Wikipedia as a resource of tagged articles. It contains over 400 million words grouped in over one million articles. The title of each article is a tag that classifies the article. http://www.cs.technion.ac.il/~shaulm/papers/pdf/Gabrilovich-Markovitch-ijcai2007.pdf Computing Semantic Relatedness using Wikipedia-based Semantic Analysis This illustrates the kind of work that can be done with such resources. John Sowa ---------------------------------------------------------------- Computing Semantic Relatedness using Wikipedia-based Explicit Semantic Analysis Evgeniy Gabrilovich and Shaul Markovitch Department of Computer Science Technion—Israel Institute of Technology, 32000 Haifa, Israel Abstract Computing semantic relatedness of natural language texts requires access to vast amounts of common-sense and domain-specific world knowledge. We propose Explicit Semantic Analysis (ESA), a novel method that represents the meaning of texts in a high-dimensional space of concepts derived from Wikipedia. We use machine learning techniques to explicitly represent the meaning of any text as a weighted vector of Wikipedia-based concepts. Assessing the relatedness of texts in this space amounts to comparing the corresponding vectors using conventional metrics (e.g., cosine). 
Compared with the previous state of the art, using ESA results in substantial improvements in correlation of computed relatedness scores with human judgments: from r = 0.56 to 0.75 for individual words and from r = 0.60 to 0.72 for texts. Importantly, due to the use of natural concepts, the ESA model is easy to explain to human users.
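The core idea in the abstract can be illustrated with a toy sketch: map each text to a weighted vector over named concepts, then score relatedness with cosine similarity. The concept names and weights below are invented for illustration; a real ESA index is built from the full Wikipedia corpus with TF-IDF weighting, not hand-coded.

```python
from collections import Counter
from math import sqrt

# Toy "concept index": each word maps to weights over a few
# hypothetical Wikipedia-style concepts (invented numbers).
CONCEPT_INDEX = {
    "bank":  {"Bank (finance)": 0.9, "River": 0.3},
    "money": {"Bank (finance)": 0.8, "Economy": 0.7},
    "river": {"River": 0.9, "Water": 0.6},
    "water": {"River": 0.5, "Water": 0.9},
}

def esa_vector(text):
    """Represent a text as a weighted vector of concepts by
    summing the concept weights of its known words."""
    vec = Counter()
    for word in text.lower().split():
        for concept, weight in CONCEPT_INDEX.get(word, {}).items():
            vec[concept] += weight
    return vec

def cosine(u, v):
    """Conventional cosine similarity between two sparse vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# A financial text and a geographic text share only a weak
# "River" sense of "bank", so their similarity is low.
print(cosine(esa_vector("bank money"), esa_vector("river water")))
print(cosine(esa_vector("bank money"), esa_vector("money bank")))
```

Because the dimensions are human-readable concept titles rather than latent factors, a relatedness score can be explained by listing the concepts two texts share, which is the explainability advantage the abstract notes.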