-- Author: admin
-- Posted: 3/14/2007 9:51:00 AM
-- [ontolog-forum] Using Wikipedia as a Folksonomy [repost]
"John F. Sowa" <sowa@bestweb.net> to [ontolog-forum] show details 5:02 am (4 hours ago) The Wikipedia is currently the largest informally defined result of collaborative tagging. Many people have criticized it for its lack of supervision and uneven quality of many of the articles. Yet it does serve as a convenient body of texts that have been classified informally -- over 400 million words grouped in over one million articles. The title of each article is a tag that classifies the article. Following is an article about using Wikipedia as a resource of tagged articles. It contains over 400 million words grouped in over one million articles. The title of each article is a tag that classifies the article. http://www.cs.technion.ac.il/~shaulm/papers/pdf/Gabrilovich-Markovitch-ijcai2007.pdf Computing Semantic Relatedness using Wikipedia-based Semantic Analysis This illustrates the kind of work that can be done with such resources. John Sowa ---------------------------------------------------------------- Computing Semantic Relatedness using Wikipedia-based Explicit Semantic Analysis Evgeniy Gabrilovich and Shaul Markovitch Department of Computer Science Technion—Israel Institute of Technology, 32000 Haifa, Israel Abstract Computing semantic relatedness of natural language texts requires access to vast amounts of common-sense and domain-specific world knowledge. We propose Explicit Semantic Analysis (ESA), a novel method that represents the meaning of texts in a high-dimensional space of concepts derived from Wikipedia. We use machine learning techniques to explicitly represent the meaning of any text as a weighted vector of Wikipedia-based concepts. Assessing the relatedness of texts in this space amounts to comparing the corresponding vectors using conventional metrics (e.g., cosine). 
Compared with the previous state of the art, using ESA results in substantial improvements in correlation of computed relatedness scores with human judgments: from r = 0.56 to 0.75 for individual words and from r = 0.60 to 0.72 for texts. Importantly, due to the use of natural concepts, the ESA model is easy to explain to human users.
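The core idea in the abstract can be illustrated with a toy sketch: map each text to a weighted vector over named concepts, then score relatedness with cosine similarity. The concept names and weights below are invented for illustration; a real ESA index is built from the full Wikipedia corpus with TF-IDF weighting, not hand-coded.

```python
from collections import Counter
from math import sqrt

# Toy "concept index": each word maps to weights over a few
# hypothetical Wikipedia-style concepts (invented numbers).
CONCEPT_INDEX = {
    "bank":  {"Bank (finance)": 0.9, "River": 0.3},
    "money": {"Bank (finance)": 0.8, "Economy": 0.7},
    "river": {"River": 0.9, "Water": 0.6},
    "water": {"River": 0.5, "Water": 0.9},
}

def esa_vector(text):
    """Represent a text as a weighted vector of concepts by
    summing the concept weights of its known words."""
    vec = Counter()
    for word in text.lower().split():
        for concept, weight in CONCEPT_INDEX.get(word, {}).items():
            vec[concept] += weight
    return vec

def cosine(u, v):
    """Conventional cosine similarity between two sparse vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# A financial text and a geographic text share only a weak
# "River" sense of "bank", so their similarity is low.
print(cosine(esa_vector("bank money"), esa_vector("river water")))
print(cosine(esa_vector("bank money"), esa_vector("money bank")))
```

Because the dimensions are human-readable concept titles rather than latent factors, a relatedness score can be explained by listing the concepts two texts share, which is the explainability advantage the abstract notes.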