National Research Council Canada
NRC-IIT - Institute for Information Technology

Information Analysis and Retrieval

Lexical Semantics from Web Mining

Even a limited understanding of word meaning (lexical semantics) would enable computers to perform many tasks that are not yet within their capabilities.

Since almost every human activity involves knowledge of word meaning in some way, the more semantic information computers can manage usefully, the more they will be able to assist people in their daily activities.

Using algorithms from machine learning, computational linguistics, natural language processing and statistics, it is becoming possible to extract aspects of word meaning by computationally analyzing huge quantities of text, such as the text available on the web (web mining).
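
A common way to quantify such text statistics is pointwise mutual information (PMI), which compares how often two words appear together with how often they would co-occur if they were independent: PMI(a, b) = log2(P(a, b) / (P(a) P(b))). The sketch below is illustrative only and assumes a tiny in-memory list of documents in place of web-scale text; the function name `cooccurrence_pmi` is hypothetical, and this is not presented as NRC-IIT's implementation.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_pmi(documents):
    """Estimate PMI (in bits) for every word pair that co-occurs in a document.

    Each document is treated as a bag of words, and probabilities are
    estimated from document frequencies: P(w) = df(w) / N and
    P(a, b) = df(a, b) / N. Pairs that never co-occur get no entry.
    """
    n_docs = len(documents)
    word_df = Counter()
    pair_df = Counter()
    for doc in documents:
        words = sorted(set(doc.lower().split()))
        word_df.update(words)
        # combinations() over a sorted list yields alphabetically ordered pairs
        pair_df.update(combinations(words, 2))

    scores = {}
    for (a, b), joint in pair_df.items():
        p_ab = joint / n_docs
        p_a = word_df[a] / n_docs
        p_b = word_df[b] / n_docs
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores


docs = [
    "math statistics",
    "math statistics",
    "food recipe",
    "math food",
]
scores = cooccurrence_pmi(docs)
print(round(scores[("math", "statistics")], 3))  # 0.415
print(round(scores[("food", "math")], 3))  # -0.585
```

In this toy corpus, "math" and "statistics" co-occur more often than chance predicts (positive PMI), while "math" and "food" co-occur less often (negative PMI). At web scale, the same quantities could be computed from counts supplied by an index or search engine rather than by scanning documents directly.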

The NRC Institute for Information Technology (NRC-IIT) has successfully developed algorithms for extracting the following semantic information:

  1. Synonym recognition – for example, "levied" is synonymous with "imposed"

  2. Semantic orientation – for example, "integrity" is a positive, praising word, but "disturb" is a negative, criticizing word

  3. Analogy and metaphor – for example, "traffic in the street" is analogous to "water in the river," in that both "flow"

  4. Lexical cohesion – for example, the terms "math" and "statistics" go together naturally (they "cohere"), but "math" and "food" do not
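
One widely used way to turn co-occurrence statistics into judgments like the first two above is to score word pairs by PMI. The sketch below is a simplified illustration under stated assumptions, not a description of the NRC-IIT algorithms: `make_hit_counter` is a hypothetical stand-in for web search hit counts, built here from a small list of documents; `best_synonym` picks the candidate with the highest PMI with the target word; and `semantic_orientation` scores a word by its association with a few positive paradigm words minus its association with a few negative ones. The paradigm word lists and the smoothing constant are assumptions chosen for illustration.

```python
import math

# Assumed, illustrative paradigm words; a real system would choose these carefully.
POSITIVE = ("good", "excellent")
NEGATIVE = ("bad", "poor")


def make_hit_counter(documents):
    """Hypothetical stand-in for search-engine hit counts.

    Returns (hits, total_docs), where hits(*words) is the number of
    documents that contain every one of the given words.
    """
    doc_sets = [set(doc.lower().split()) for doc in documents]

    def hits(*words):
        return sum(1 for in_doc in doc_sets if all(w in in_doc for w in words))

    return hits, len(doc_sets)


def pmi(hits, total_docs, a, b, smoothing=0.01):
    """Smoothed pointwise mutual information of words a and b, in bits.

    The small smoothing constant avoids log(0) and division by zero
    when a word or word pair never occurs.
    """
    joint = hits(a, b) + smoothing
    return math.log2(joint * total_docs / ((hits(a) + smoothing) * (hits(b) + smoothing)))


def best_synonym(hits, total_docs, target, choices):
    """Return the choice whose PMI with the target word is highest."""
    return max(choices, key=lambda choice: pmi(hits, total_docs, target, choice))


def semantic_orientation(hits, total_docs, word):
    """Positive result suggests a praising word, negative a criticizing one."""
    positive = sum(pmi(hits, total_docs, word, p) for p in POSITIVE)
    negative = sum(pmi(hits, total_docs, word, n) for n in NEGATIVE)
    return positive - negative


corpus = [
    "the tax was levied and imposed on imports",
    "a duty imposed and levied by the state",
    "she believed the story",
    "the fee was imposed on users",
    "they requested a refund",
]
hits, n_docs = make_hit_counter(corpus)
print(best_synonym(hits, n_docs, "levied", ["imposed", "believed", "requested"]))  # imposed
```

The toy corpus only shows the mechanics; scores computed from a handful of sentences are not meaningful estimates. Because the scoring functions accept any `hits` callback, a larger index could be substituted for `make_hit_counter` without changing them.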

Applications

Semantic processing has a very wide range of potential applications. Below is a sample drawn from the areas in which NRC-IIT has already developed semantic algorithms:

  1. Synonym recognition can lead to improved search engines.
    • for example, a query for "cars" will also return a document that mentions only "automobiles"

  2. Semantic orientation can lead to tracking public opinion by analyzing online discussions. For example,
    • politicians could gauge public reaction to policy changes
    • investors could track public opinion about stocks
    • consumers could evaluate reaction to new products

  3. Analogy and metaphor can lead to better online help systems.
    • for example, "I was in Word and the fonts went crazy" does not literally mean that the user was inside the computer, nor that fonts have mental states. Since metaphors are ubiquitous, help systems will be more useful if they are not limited to literal meanings.

  4. Lexical cohesion can lead to better automatic text summarization.
    • for example, automatically generated summaries can be improved by filtering out incoherent phrases and sentences
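
To illustrate how recognized synonyms could feed into search, the sketch below expands a query with synonyms before matching documents. The hand-written `SYNONYMS` table is a hypothetical placeholder for the output of an automatic synonym-recognition step, and matching is simple word overlap rather than a ranked retrieval model.

```python
# Hypothetical synonym table; in practice it would come from an automatic
# synonym-recognition step rather than being written by hand.
SYNONYMS = {
    "cars": {"automobiles"},
    "automobiles": {"cars"},
}


def expand_query(terms, synonyms):
    """Return the query terms plus any known synonyms of each term."""
    expanded = set(terms)
    for term in terms:
        expanded |= synonyms.get(term, set())
    return expanded


def search(query, documents, synonyms):
    """Return documents sharing at least one word with the expanded query."""
    terms = expand_query(query.lower().split(), synonyms)
    return [doc for doc in documents if terms & set(doc.lower().split())]


documents = ["used automobiles for sale", "fresh bread recipes"]
print(search("cars", documents, SYNONYMS))  # ['used automobiles for sale']
print(search("cars", documents, {}))  # []
```

Without the synonym table, the query for "cars" misses the document that mentions only "automobiles"; with it, that document is returned.
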

Research Contact

Dr. Peter Turney
Research Officer
Interactive Information

NRC Institute for Information Technology
1200 Montreal Road
Building M-50, Room C-339
Ottawa, ON K1A 0R6
Telephone: +1 (613) 993-8564
Fax: +1 (613) 952-7151
E-mail: Peter Turney

Business Contact

Randall Milburn
Business Development Officer
Business Development Office, NCR

NRC Institute for Information Technology
1200 Montreal Road
Building M-50, Room 201
Ottawa, ON K1A 0R6
Telephone: +1 (613) 990-6590
Fax: +1 (613) 952-0074
E-mail: Randall.Milburn@nrc-cnrc.gc.ca


Date Modified: 2003-06-16