Computer Science Forum -- View Post

-  Computer Science Forum  (http://bbs.xml.org.cn/index.asp)
--  『 Semantic Web / Description Logic / Ontology 』  (http://bbs.xml.org.cn/list.asp?boardid=2)
----  Frank van Harmelen's comments on "The Unreasonable Effectiveness of Data" (published in IEEE Intelligent Systems, March/April 2009 issue)  (http://bbs.xml.org.cn/dispbbs.asp?boardid=2&rootid=&id=73781)

--  Author: admin
--  Posted: 4/2/2009 11:33:00 AM

--  Frank van Harmelen's comments on "The Unreasonable Effectiveness of Data" (published in IEEE Intelligent Systems, March/April 2009 issue)
http://blog.larkc.eu/?p=1331

The unreasonable effectiveness of fake controversies

(by Frank van Harmelen)

The Halevy, Norvig & Pereira paper on “[URL=http://www.computer.org/portal/cms_docs_intelligent/intelligent/homepage/2009/x2exp.pdf]The Unreasonable Effectiveness of Data[/URL]” (published in IEEE Intelligent Systems, and posted on the [URL=http://googleresearch.blogspot.com/2009/03/unreasonable-effectiveness-of-data.html]Google Blog[/URL]) has been much discussed in recent days.

I had my finger on the trigger for a response when I stumbled across [URL=http://www.betaversion.org/%7Estefano/linotype/news/275/]Stefano Mazzocchi’s blog[/URL], which phrased my opinion about the piece exactly: Halevy (first author) makes his case by creating a controversy that isn’t really there. He pits a symbolic/structural approach to semantics against a statistical approach, and makes it seem as if the two are entirely mutually exclusive. Obviously that isn’t the case: it’s great if statistical analysis of humongous datasets can unearth important relationships, and I can see no reason why the results of such work could not then be used in structural/symbolic approaches. This is (potentially) a mutually beneficial relationship, not an antagonistic one.

As Mazzocchi rightly points out, it’s rather ironic that the entire Google empire is built on … guess what … analysing a structural/symbolic network (namely the HREF links between webpages), which they then very successfully combine with all kinds of statistical measures. If this combination of structural and statistical approaches works for Web1.0, why suddenly create this fake controversy when we talk about Web3.0?
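To make the point about combining the two kinds of signal concrete, here is a minimal sketch, not Google's actual ranking method. It assumes a toy link graph and toy page texts; the names `pagerank`, `term_frequency`, `combined_score`, `links` and `texts` are all hypothetical and chosen only for illustration. The structural part is a textbook power-iteration PageRank over the HREF graph, and the statistical part is a plain term frequency; multiplying them is just one simple way the two could be mixed.

```python
from collections import Counter


def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank over {page: [pages it links to]}.

    Assumes every link target also appears as a key of `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if outgoing:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


def term_frequency(text, term):
    """Fraction of whitespace-separated words in `text` equal to `term`."""
    words = text.lower().split()
    if not words:
        return 0.0
    return Counter(words)[term.lower()] / len(words)


def combined_score(links, texts, term):
    """Structural signal (link rank) times statistical signal (term frequency)."""
    rank = pagerank(links)
    return {p: rank[p] * term_frequency(texts[p], term) for p in links}


# Toy data, invented for this example.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
texts = {
    "a": "semantic web and linked data",
    "b": "statistics on large data",
    "c": "data data everywhere",
    "d": "nothing relevant here",
}

ranks = pagerank(links)
scores = combined_score(links, texts, "data")
best_page = max(scores, key=scores.get)
```

In this toy graph, neither signal alone decides the ordering: a page that nobody links to still gets a small rank, and a well-linked page that never mentions the query term scores zero.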

The most fruitful way forward would be to investigate how the ace work that Halevy et al. are doing on statistical methods with huge datasets can be combined with approaches that exploit the explicit structure that is available in so many large datasets.

And as an aside: caricaturing the Semantic Web as being about “tagging web-pages” is defaulting to a rhetorical device known as “[URL=http://en.wikipedia.org/wiki/Straw_man]setting up a straw man[/URL]”. Never a strong sign. I’m sure Alon et al. are familiar with Linked Open Data (LOD), but there is no mention of it in their paper…

To finish up, here are some quotes from [URL=http://www.betaversion.org/%7Estefano/linotype/news/275/]Stefano Mazzocchi’s excellent blog entry:[/URL]

What upset me about that paper is not how they say “oh sure, structure is great, but look over here: there is a goldmine in all the sand” (which is something I fully resonate with) but they phrased it as a fight, deterministic vs. statistical, trying to convince people that adding structure is not the way to go, it’s basically a global waste of research resources

….

Google uses all sorts of techniques, statistical and not, and they are very good at mixing them together, but that’s not what you get from the paper. What you get is an undertone of criticism for those who believe that what’s needed is a lot more explicit structure

….

this confrontational undertone is coming across at best as hypocritical and at worst as toxic, especially when coming from the [URL=http://research.google.com/]research heads[/URL] of an [URL=http://www.google.com/]entity[/URL] that so much benefited from non-statistical amplification of minor distributed increases in data structure.

Amen to that.

Attachment: [URL=http://www.computer.org/portal/cms_docs_intelligent/intelligent/homepage/2009/x2exp.pdf]download link for "The Unreasonable Effectiveness of Data"[/URL]

--  Author: Humphrey
--  Posted: 4/2/2009 3:26:00 PM

--
A dedicated discussion on the Semantic Web? More good reading ahead.
