National Research Council Canada
NRC-IIT - Institute for Information Technology
  

PORTAGE

Statistical Machine Translation: The stakes are high

Statistical machine translation (SMT) is a very competitive area of research, pitting teams from some of the world’s top universities and best-known companies (e.g., Google, IBM and Microsoft) against each other. NRC-IIT set itself an ambitious goal: to build a world-class system capable of competing on equal terms with these systems every year. It succeeded: in a surprisingly short time, PORTAGE earned a place among the world’s best SMT systems.

The international visibility of PORTAGE has grown through its participation, since October 2005, in GALE, a multimillion-dollar project sponsored by the US Government's Defense Advanced Research Projects Agency (DARPA). The goal of GALE (Global Autonomous Language Exploitation) is to make foreign-language (Arabic and Chinese) speech and text accessible to monolingual English speakers, particularly in intelligence and military settings.

PORTAGE Project races to a high place among its peers

Since September 2004, NRC researchers have been working on PORTAGE, a project of NRC-IIT’s Interactive Language Technologies Group in Gatineau. The project’s aim is to develop technology that enables a computer to translate from one language to another.

PORTAGE is based on an approach called "statistical machine translation" (SMT), in which a system for translating between two languages is trained very quickly on a bilingual corpus (a collection of texts paired with their translations) for those two languages, using algorithms drawn from machine learning research. The NRC researchers on the project focused on English, French, Arabic and Chinese, but the technology applies to any pair of human languages for which there is a bilingual corpus from which it can 'learn' how to translate. PORTAGE has also been applied successfully to translation between English and each of Spanish, German and Finnish.
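To make the idea of "learning" from a bilingual corpus more concrete, the following Python sketch trains IBM Model 1, a classic textbook word-translation model, with expectation maximization (EM). It is offered only as an illustration of the general statistical approach and is not PORTAGE's method; the function name `train_ibm_model1` and the toy French-English corpus are hypothetical.

```python
from collections import defaultdict


def train_ibm_model1(corpus, iterations=10):
    """Estimate word translation probabilities t(f | e) with EM (IBM Model 1).

    corpus: list of (foreign_tokens, english_tokens) sentence pairs.
    A NULL token is prepended to every English sentence so that a
    foreign word may also be explained by "no English word".
    Returns a dict mapping (foreign_word, english_word) -> probability.
    """
    foreign_vocab = {f for fs, _ in corpus for f in fs}
    # Uniform start: every foreign word is equally likely for each English word.
    uniform = 1.0 / len(foreign_vocab)
    t = defaultdict(lambda: uniform)

    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        # E-step: share each foreign word's count among the English words
        # (plus NULL) of its sentence, in proportion to the current t(f | e).
        for fs, es in corpus:
            es = ["NULL"] + es
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    delta = t[(f, e)] / norm
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: renormalise expected counts into probabilities t(f | e).
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


# Hypothetical toy corpus: "maison" co-occurs with "house" in two sentence pairs.
toy_corpus = [
    (["la", "maison"], ["the", "house"]),
    (["la", "fleur"], ["the", "flower"]),
    (["maison", "bleue"], ["blue", "house"]),
]
table = train_ibm_model1(toy_corpus)
print(max(["la", "maison", "bleue"], key=lambda f: table[(f, "house")]))
```

After training, each English word has a probability distribution over foreign words. EM shifts probability mass toward pairs that co-occur consistently across the corpus, which is why "maison" comes out ahead of "la" and "bleue" as a translation of "house".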

In 2006-07, PORTAGE took part in the following international competitive evaluations of machine translation (MT):

  • The US National Institute of Standards and Technology (NIST) MT evaluation;
  • The NAACL Workshop on Machine Translation (WMT);
  • The TC-STAR Workshop (sponsored by the European Community).

In all of these evaluations, PORTAGE ranked well above the middle of the pack among the world’s best translation systems. In the most prestigious of them, the NIST MT evaluation, the score of the PORTAGE Chinese-to-English system on the official “BLEU” metric rose by about 6% between the summer 2005 evaluation and the summer 2006 evaluation. Its relative ranking rose as well: from the top half of the systems evaluated by NIST in 2005 to the top third or higher in 2006 (8th of 24 systems on one data set, 5th of 24 on another). This dramatic improvement is attributed to intensive research carried out at NRC between the two evaluations. A PORTAGE-based Arabic-to-English system was also submitted to a NIST evaluation for the first time in 2006, ranking slightly above the median system.
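BLEU compares a system's output with human reference translations by counting shared word sequences (n-grams). The Python sketch below assumes the common formulation: the geometric mean of "modified" n-gram precisions for n = 1 to 4, multiplied by a brevity penalty for output shorter than the reference. It is a simplification, assuming one reference per sentence, pre-tokenized input and no smoothing; NIST's official scoring tools handle multiple references and tokenization in their own way, so this sketch would not reproduce official scores exactly. The function names `ngram_counts` and `corpus_bleu` are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU with one reference per sentence and uniform weights.

    hypotheses, references: parallel lists of token lists.
    Returns a score in [0, 1].
    """
    matches = [0] * max_n  # clipped n-gram matches for each order n
    totals = [0] * max_n   # number of hypothesis n-grams for each order n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_ngrams = ngram_counts(hyp, n)
            # Counter intersection keeps min(count in hyp, count in ref):
            # an n-gram is credited at most as often as the reference contains it.
            matches[n - 1] += sum((hyp_ngrams & ngram_counts(ref, n)).values())
            totals[n - 1] += sum(hyp_ngrams.values())
    # Without smoothing, a zero precision at any order makes BLEU zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the n-gram precisions with uniform weights 1/max_n.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: 1 if the output is longer than the reference, else exp(1 - r/c).
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)
```

BLEU is usually reported multiplied by 100. Because the n-gram counts are pooled over the whole test set before the geometric mean is taken, corpus-level BLEU is not simply the average of per-sentence scores.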

NRC is a member of the Nightingale consortium, led by SRI International (California), which is one of the three consortia participating in GALE. As part of it, NRC was initially funded to supply the PORTAGE technology for translation from Arabic and Chinese into English. This funding was renewed for a second year (covering 2007) for Chinese-to-English translation.

The PORTAGE technology was also instrumental in securing NRC’s participation in SMART (Statistical Multilingual Analysis for Retrieval and Translation), a project funded by the European Commission through its “Information Society Technologies” priority, which began on September 1, 2006. The project involves several of Europe’s leading machine learning researchers as well as three private-sector companies.

To encourage Canadian participation in this growing and economically important field, NRC made a research and education license to a version of the PORTAGE source code, called PORTAGEshared, available to Canadian universities for a nominal fee. During 2006-07, the software was licensed to four leading Canadian universities: McGill University, Simon Fraser University, the University of Toronto and the Université de Montréal. Dr. Anoop Sarkar of Simon Fraser University writes:

“[We] have been using PORTAGEshared for our research into semi-supervised learning for machine translation ... We started using PORTAGEshared shortly after it was available to Canadian universities, in December 2006. In our experience, PORTAGEshared provided a convenient platform for experiments since it implements a well understood phrase-based statistical machine translation model which provides state-of-the-art accuracy.”

Because the aim of sharing this expertise is to create an SMT community of practice in Canada, the licensing model for PORTAGEshared encourages interaction among participating universities, the sharing of their experience, and the sharing of any improvements the community makes to the software. Another objective of this licensing program is to train highly qualified personnel in SMT.

Several other research institutions have expressed active interest in PORTAGEshared, and NRC-IIT is beginning to explore commercial licensing opportunities that would meet the needs and ambitions of Canadian industry. The technology’s success is further demonstrated by the publication of nine scientific papers in fiscal year 2006-07, participation in key events and conferences, and the filing of four patent applications.


Date Published: 2007-08-17