National Research Council Canada Government of Canada
NRC-IIT - Institute for Information Technology

People-Centred Technologies

Metaxtract: Text Data Structuring Project

The main aim of the Metaxtract project is to enable automatic or semi-automatic semantic annotation of text-rich resources for the Semantic Web, using linguistic techniques. The project, which began in May 2003 and is ongoing, is one activity of the Semantic Web Lab and is part of the Semantic New Brunswick Initiative. One application of the research results aims to make regional businesses more visible.

Research Context

Development of Semantic Web applications depends on having a set of richly annotated data. One of the challenges of migrating from current web content to Semantic Web enabled content is the massive amount of annotation required. Put another way, the advent of the Semantic Web will create a demand for structured data, and today’s web is a vast repository of unstructured text available for structuring. We are currently experimenting with automatic and semi-automatic extraction or validation of data and metadata, using information extraction techniques over text.
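As a minimal sketch of what validating structured data against text could look like, the Python example below checks whether a phone number held in a database record also appears in the text of a web page. It is not the Metaxtract method: the regular expression, the function names and the sample text are all assumptions made for illustration, and a real information extraction pipeline would cover far more than phone numbers.

```python
import re

# Assumed pattern for ten-digit North American phone numbers, with optional
# parentheses around the area code and space, dot or hyphen separators.
PHONE_RE = re.compile(r"\(?(\d{3})\)?[\s.-]?(\d{3})[\s.-]?(\d{4})")


def extract_phones(text):
    """Return the set of phone numbers found in text, as ten-digit strings."""
    return {"".join(groups) for groups in PHONE_RE.findall(text)}


def normalize(phone):
    """Reduce one phone number to ten digits, or None if none is found."""
    match = PHONE_RE.search(phone)
    return "".join(match.groups()) if match else None


def validate_phone(record_phone, page_text):
    """True if the phone number in the record also appears in the page text."""
    expected = normalize(record_phone)
    return expected is not None and expected in extract_phones(page_text)


# Made-up page text, for illustration only.
PAGE = "Call us at (506) 555-0123 or fax 506.555.0199."

if __name__ == "__main__":
    print(validate_phone("506-555-0123", PAGE))
    print(validate_phone("506-555-9999", PAGE))
```

A record whose value is confirmed by the page could be accepted automatically, while a mismatch could be passed to a person for review, which is one way to read "semi-automatic".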

Research Prototype

In a first phase, we developed a prototype that combines structured and unstructured data in a novel way. Our source of facts for the prototype is a provincial government database containing consistent data on more than 2,000 New Brunswick manufacturing companies. This structured data includes typical business information such as contact details, number of employees, sector and product information. Many of the companies also provide a link to their corporate website, and this subset of the web served as our unstructured data source. The resulting prototype is the NBBizMapper.

Our interface makes it possible to run a keyword search over the web content and to return the hits in a structured fashion, organized by location and company name. Location information is displayed graphically so that the distribution of relevant companies can be visualized. This kind of display could be useful for business analysis. Building on this, we hope to develop methods for making regional businesses more visible and for increasing the exchange of goods within the region and with external markets.
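The idea of returning keyword hits organized by location and company name can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the NBBizMapper implementation: the `Company` record, its field names, the substring matching and the sample companies are all hypothetical, and the graphical display of locations is left out.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Company:
    """Structured record joined to web text; field names are hypothetical."""
    name: str
    location: str
    page_text: str  # unstructured text taken from the company website


def keyword_search(companies, keyword):
    """Return {location: [company names]} for pages containing the keyword."""
    needle = keyword.lower()
    hits = defaultdict(list)
    for company in companies:
        # Simple case-insensitive substring match over the page text.
        if needle in company.page_text.lower():
            hits[company.location].append(company.name)
    # Sort locations and company names so the result order is stable.
    return {loc: sorted(names) for loc, names in sorted(hits.items())}


# Fictional sample data, for illustration only.
SAMPLE = [
    Company("Acme Woodworks", "Fredericton", "Custom hardwood furniture and cabinets."),
    Company("Bay Seafoods", "Saint John", "Fresh lobster and scallops shipped daily."),
    Company("Cedar Mills", "Fredericton", "Softwood lumber and hardwood flooring."),
]

if __name__ == "__main__":
    for location, names in keyword_search(SAMPLE, "hardwood").items():
        print(location, names)
```

The grouping key comes from the structured record while the match is made against the unstructured text, which is the combination the prototype description refers to.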

Possible Applications

The approach taken in this project is general enough to apply in many areas. In addition to the initial NBBizMapper prototype, which applies the technology to benefit regional industry, other application areas are under consideration.

  • Applications of Metaxtract
    • Generating Semantic Web content from current Web sources
    • Semi-automatic metadata creation (e.g. for learning objects)
    • Specialized search within consistent subparts of the Web (e.g. expertise search)
  • Applications of the NBBizMapper prototype
    • Regional business development
    • Regional business intelligence

Results could have an impact on the following areas

  • The Semantic Web
  • Metadata
  • Annotation
  • Information Retrieval
  • Information Extraction
  • Human Language Technology

Inquiries about collaborating on this research are most welcome. Anyone interested is asked to get in touch with the research contact listed below.

Research Contact

Dr. Irina Kondratova
Group Leader
People-Centred Technologies

NRC Institute for Information Technology
46 Dineen Drive
Fredericton, NB E3B 9W4
Telephone: +1 (506) 444-0489
Fax: +1 (506) 452-3859
E-mail: Irina.Kondratova@nrc-cnrc.gc.ca

Business Contact

Marc-Alain Mallet
Business Development Officer
Business Development Office, New Brunswick

NRC Institute for Information Technology
46 Dineen Drive
Fredericton, NB E3B 9W4
Telephone: +1 (506) 444-0394
Fax: +1 (506) 452-3859
E-mail: Marc-Alain.Mallet@nrc-cnrc.gc.ca


Date Modified: 2004-06-30