Multilingual Knowledge Transfer

In many theoretical research contexts, we assume access to large, clean datasets, but this is rarely the case in real-life applications. When applying research to real-life problems, we are confronted with a panoply of challenges such as low-resource datasets, noisy annotations and class imbalance. This is where we really shine.

In the context of legal document analysis, no public datasets are available for confidentiality reasons. This means that we have to go through a long and costly annotation process to build high-quality datasets for each new language our clients use.

This part of our research focuses on transferring knowledge from high-resource languages to low-resource languages, so that we can help our users whatever the language of their documents.

AI Research @ DiliTrust