The LAMA-WeST Seminar Series - A Copy Mechanism for Handling Knowledge Base Elements in SPARQL Neural Machine Translation
Neural Machine Translation (NMT) models from English to SPARQL are a promising development for SPARQL query generation. However, current architectures are unable to integrate the knowledge base (KB) schema and cannot handle questions about knowledge resources, classes, and properties unseen during training, rendering them unusable outside the scope of topics covered in the training set. Inspired by the performance gains that copy mechanisms have brought to other natural language processing tasks, we propose to integrate a copy mechanism into neural SPARQL query generation as a way to tackle this issue. We illustrate our proposal by adding a copy layer and a dynamic knowledge base vocabulary to two Seq2Seq architectures (CNNs and Transformers). This layer lets the models copy KB elements directly from the questions instead of generating them. We evaluate our approach on state-of-the-art datasets, including datasets referencing unknown KB elements, and measure the accuracy of the copy-augmented architectures. Our results show a considerable increase in performance on all datasets compared to non-copy architectures.
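To make the idea concrete, here is a minimal sketch of how a copy layer can combine generation and copying: the output distribution mixes a generation distribution over the fixed vocabulary with the attention weights over question tokens, so KB elements appearing in the question receive probability mass under dynamic ids outside the fixed vocabulary. All names, shapes, and the mixing weight `p_gen` are illustrative assumptions, not the seminar's actual implementation.

```python
import numpy as np

def copy_mix(p_vocab, attn, source_ids, extended_size, p_gen):
    """Mix a generation distribution with a copy distribution.

    p_vocab       -- generation distribution over the fixed vocabulary
    attn          -- attention weights over the source question tokens
    source_ids    -- ids of the source tokens in the extended vocabulary
                     (unseen KB elements get dynamic ids past the fixed range)
    extended_size -- fixed vocabulary size + number of dynamic KB ids
    p_gen         -- probability of generating rather than copying (assumed
                     to be predicted by the model at each decoding step)
    """
    p_final = np.zeros(extended_size)
    # Generation path: scale the fixed-vocabulary distribution by p_gen.
    p_final[:len(p_vocab)] = p_gen * p_vocab
    # Copy path: route attention mass to each source token's (possibly
    # dynamic) id, accumulating if a token appears more than once.
    for pos, tok in enumerate(source_ids):
        p_final[tok] += (1.0 - p_gen) * attn[pos]
    return p_final
```

With a two-word fixed vocabulary and one dynamic KB id, a question token carrying a KB element can end up with substantial probability even though it was never seen during training, which is the behaviour the copy layer is meant to provide.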
The LAMA-WeST Seminar Series - Structural Embeddings with BERT Matcher: A Representation Learning Approach for Schema Matching
The schema matching task consists of finding different types of relations between two ontologies. Algorithms that find these relations often need a combination of semantic, structural, and lexical inputs from the ontologies. Structural Embeddings with BERT Matcher (SEBMatcher) is a system that leverages all of these inputs by using random walks as its foundation. It employs a two-step approach: unsupervised pretraining of a masked-language-modeling BERT on random walks, followed by supervised training of a BERT classifier on positive and negative mappings. During its participation in the Ontology Alignment Evaluation Initiative (OAEI), SEBMatcher obtained promising results in the tracks it entered.
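The random-walk foundation can be sketched as follows: walks over the ontology graph produce token sequences that a BERT model can then be pretrained on with masked language modeling. The toy graph, node names, and walk parameters below are assumptions for illustration, not SEBMatcher's actual configuration.

```python
import random

def random_walks(graph, walk_length, walks_per_node, seed=0):
    """Generate fixed-length random walks from every node of a graph.

    graph -- dict mapping each node to a list of neighbour nodes
             (e.g. ontology classes and properties linked by axioms)
    """
    rng = random.Random(seed)
    walks = []
    for start in graph:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbors = graph.get(walk[-1], [])
                if not neighbors:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks

# Hypothetical miniature ontology graph.
toy_graph = {
    "Person": ["hasName", "Student"],
    "hasName": [],
    "Student": ["Person"],
}
walks = random_walks(toy_graph, walk_length=4, walks_per_node=2)
```

Each walk is a sequence of ontology element names; joining them into sentences yields a corpus on which MLM pretraining can learn structural context before the supervised mapping-classification step.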
The LAMA-WeST Seminar Series - Digital Twinning to predict radiotherapy replanning for head and neck cancer patients
Head and neck cancer patients undergoing radiotherapy often experience weight loss over the course of treatment due to the effects of radiation. This weight loss can result in significant anatomical changes that require the patient’s treatment to be replanned to ensure that an acceptable dose of radiation is being delivered to the tumour and the nearby radiosensitive organs. Unfortunately, the decision to replan a patient is typically made on short notice to the planning team, which can significantly disrupt the workflow and consequently affect the timeline of other patient treatments. Our goal is therefore to pre-emptively determine if and when a patient will need replanning by predicting how the patient’s anatomy will change over the course of treatment. The proposed project will be carried out in three main steps. First, a variational autoencoder will be trained on patients’ cone-beam CT (CBCT) scans taken throughout treatment to learn latent space representations of the data. Next, the trajectory of each patient’s change in CBCT scans will be mapped in latent space, such that a new patient’s trajectory can be predicted from the trends of past patients who neighbour them in latent space. Finally, we aim to incorporate a digital twin framework whereby patient trajectories will be dynamically updated based on new data collected over the course of treatment.
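The neighbour-based trajectory prediction step can be sketched in a few lines: given latent trajectories of past patients, the new patient's next latent point is extrapolated from the average displacement of the nearest past trajectories at the same treatment timestep. Array shapes, the distance metric, and the neighbour count `k` are assumptions for illustration only.

```python
import numpy as np

def predict_next(past_traj, new_points, k=2):
    """Predict a new patient's next latent point from neighbouring trajectories.

    past_traj  -- array (n_patients, T, d): latent CBCT trajectories of past
                  patients over T treatment timesteps
    new_points -- array (t, d): latent points observed so far for the new
                  patient (t < T)
    k          -- number of nearest past trajectories to average over
    """
    t = len(new_points)
    # Distance from the new patient's latest point to each past patient's
    # point at the same timestep.
    dist = np.linalg.norm(past_traj[:, t - 1] - new_points[-1], axis=1)
    nn = np.argsort(dist)[:k]
    # Average displacement of the neighbours from step t-1 to step t.
    step = (past_traj[nn, t] - past_traj[nn, t - 1]).mean(axis=0)
    return new_points[-1] + step
```

In a digital twin setting, each newly acquired CBCT scan would append a point to `new_points`, and the prediction would be refreshed with the updated trajectory.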
The LAMA-WeST Seminar Series - Language Understanding and the Ubiquity of Local Structure
Recent research has shown that neural language models are surprisingly insensitive to text perturbations such as shuffling the order of words. If word order is unnecessary for natural language understanding on many tasks, what is? We empirically demonstrate that neural language models consistently rely on local structure to build understanding, while global structure often goes unused. These results hold across more than 400 languages. We use this property of neural language models to automatically detect which of those languages are not currently well understood by the current crop of pretrained cross-lingual models, thus providing visibility into where our efforts should go as a research community.
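The distinction between local and global structure can be illustrated with two simple perturbations: a global shuffle destroys word order entirely, while shuffling only within small windows preserves local structure. The window size and example sentence are illustrative, not the perturbations used in the work described above.

```python
import random

def shuffle_global(tokens, seed=0):
    """Shuffle all tokens, destroying both local and global order."""
    rng = random.Random(seed)
    out = tokens[:]
    rng.shuffle(out)
    return out

def shuffle_local(tokens, window=2, seed=0):
    """Shuffle tokens only within fixed-size windows, so each token stays
    near its original neighbours and local structure is largely preserved."""
    rng = random.Random(seed)
    out = []
    for i in range(0, len(tokens), window):
        chunk = tokens[i:i + window]
        rng.shuffle(chunk)
        out.extend(chunk)
    return out

tokens = "the cat sat on the mat".split()
g = shuffle_global(tokens)
l = shuffle_local(tokens, window=2)
```

Comparing model behaviour under these two perturbations is one way to probe whether understanding depends on local order, global order, or neither.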