Artificial Intelligence
Natural language processing
Semantic web

Identification of the underlying semantic relation in compound nouns

Student: Thierry Bélair

Supervisor: Michel Gagnon

Co-supervisor(s): Caroline Barrière

The import/export business must keep track of many different products, but product descriptions are often disorganized. Noun compounds, which are numerous in these descriptions, could help a machine interpret them better. A noun compound (NC) is a pair of consecutive nouns that together take on a new meaning; the two nouns are linked by a semantic relation. For example, the noun compound olive oil can be interpreted as oil that comes from olives. According to Levi, there are 12 possible relations for noun compounds, including FROM. In this work, we replicated the method Nakov used to assign one of the 12 relations to a noun compound. Starting with an unknown noun compound, queries are automatically sent to Google to retrieve documents in which the two nouns appear. The features that link the two nouns are extracted and collected into a feature vector. A feature is either a verb or a verb with its accompanying particle. By comparing feature vectors, the unknown noun compound can be compared to noun compounds whose relations are known. Unfortunately, Google has not accepted automatically generated queries since 2011. Our goal is to adapt Nakov's approach to identify the semantic relation in a noun compound without using Google Search. We hypothesize that this is possible using text from Wikipedia, Faroo, Yahoo or the Google n-grams. We modified the approach in several ways. In addition to using a different text corpus, we form new noun compounds from synonyms of the original noun compound. We experiment with feature vectors adjusted in different ways, and we try several ways of comparing noun compounds and of choosing the semantic relation output by the system. Our prototype was subjected to three separate evaluations: one using the known noun compounds with the leave-one-out approach, a second using the same noun compounds but with different relations, and a third using 417 new noun compounds.
At best, our success rate reaches 33%, whereas Nakov reports a success rate of 43%. Although this result could be improved in many ways, it remains below the state of the art. We must conclude that, with a smaller corpus, Nakov's method cannot reliably classify noun compounds into one of the 12 possible semantic relations.
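
The comparison step described above, in which an unknown noun compound receives the relation of the known compound with the most similar feature vector, can be illustrated with a minimal Python sketch. It is not the thesis implementation: the use of cosine similarity, the function names, the label OTHER and the toy counts are all assumptions made for illustration, since the abstract does not state which similarity measure or data were used.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse feature vectors (verb -> count).

    Assumption: the abstract does not name the similarity measure;
    cosine is used here only as a common choice.
    """
    dot = sum(a[f] * b[f] for f in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(unknown, labelled):
    """Return the relation of the labelled vector most similar to `unknown`."""
    best_relation, best_score = None, -1.0
    for vector, relation in labelled:
        score = cosine(unknown, vector)
        if score > best_score:
            best_relation, best_score = relation, score
    return best_relation


# Toy, made-up feature counts (hypothetical, not taken from the thesis).
# "OTHER" is a placeholder label, not one of Levi's relations.
labelled = [
    (Counter({"come from": 5, "be made from": 3}), "FROM"),
    (Counter({"contain": 4, "hold": 2}), "OTHER"),
]
unknown = Counter({"come from": 2, "be extracted from": 1})

predicted = classify(unknown, labelled)
print(predicted)
```

In this toy setting the unknown vector shares the feature "come from" only with the first known compound, so that compound's relation is selected. A leave-one-out evaluation would repeat this call once per known compound, removing it from the labelled list each time.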
