The Better Innovation Lab team took part in DisTEMIST (DISease TExt Mining Shared Task), a research challenge run as part of BioASQ, the effort on large-scale biomedical semantic indexing and question answering. The challenge focused on discovering mentions of diseases in Spanish-language clinical texts, and the team ranked second in the category for linking diseases to SNOMED Clinical Terms (SNOMED CT).
The challenge was organised by the Barcelona Supercomputing Center, BioASQ, and Plan TL. Its aim was to find and develop automated ways of discovering relevant information in data, enriching that information with domain knowledge, and giving it structure. The organisers provided a set of clinical documents and asked the teams to build a solution that would understand the content of the documents, recognise mentions of diseases, and link those diseases to SNOMED CT.
To do this, each team had to develop a solution with two main components: one that handles the Spanish language and recognises the diseases, and a second that matches the discovered diseases to the corresponding SNOMED CT terms.
The team developed an approach using vector embeddings to transform clinical texts
The Better team, Robert Tovornik and Matic Bernik, together with Luis Marco Ruiz from the Norwegian Centre for E-health Research, used a framework that already provides a Spanish language model, which let them focus on the task of recognising the diseases. They used the challenge dataset to train a named entity recognition (NER) model, which identifies key elements and patterns in a text and locates the entities in question, in this case diseases. Such a model learns these patterns by processing large amounts of text. The team also added a few tricks and tweaks of their own to improve performance and obtain a proficient disease recognition model.
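To make the output of such a model concrete: NER models commonly assign each token a BIO tag (`B-` begins an entity, `I-` continues it, `O` is outside any entity), and those tags are then merged into text spans. The sketch below assumes that tagging scheme and a hypothetical `DISEASE` label; it does not reproduce the team's framework, model, or tweaks, only the generic decoding step that turns per-token predictions into disease mentions.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge per-token BIO tags into (label, text) entity spans."""
    spans: List[Tuple[str, str]] = []
    current_label: Optional[str] = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Close the entity being built, if any, and reset the buffer.
        nonlocal current_label, current_tokens
        if current_label is not None:
            spans.append((current_label, " ".join(current_tokens)))
        current_label, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_label, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_label == tag[2:]:
            current_tokens.append(token)
        elif tag.startswith("I-"):
            # Lenient repair: an I- tag with no matching open entity starts a new one.
            flush()
            current_label, current_tokens = tag[2:], [token]
        else:
            flush()
    flush()
    return spans


# Hypothetical model output for a short Spanish clinical sentence.
tokens = ["Paciente", "con", "diabetes", "mellitus", "tipo", "2", "e", "hipertensión", "."]
tags = ["O", "O", "B-DISEASE", "I-DISEASE", "I-DISEASE", "I-DISEASE", "O", "B-DISEASE", "O"]
print(bio_to_spans(tokens, tags))
# [('DISEASE', 'diabetes mellitus tipo 2'), ('DISEASE', 'hipertensión')]
```

The extracted spans are exactly what the second component then has to link to terminology concepts.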
The second part of the challenge, matching the diseases to clinical terms, is where the team really shone. They developed their own solution that is both accurate and extremely fast, exceeding the speed of the NLP services from Amazon and Microsoft. They set up a model that transforms all texts, both the discovered disease mentions and the SNOMED CT term descriptions, synonyms, and so on, into numerical vectors through a process called vector embedding. They then used Milvus, a vector database they had recently discovered, to store the vectors and run fast similarity comparisons over them to find high-confidence matches.
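The matching idea can be illustrated with a minimal sketch, under heavy assumptions: instead of a learned embedding model it uses a toy character-trigram count vector, and instead of Milvus it performs a brute-force cosine-similarity scan in plain Python. The terminology entries and their codes (`TERM-001` and so on) are hypothetical placeholders, not real SNOMED CT concepts, and the 0.5 threshold for a "high-confidence" match is an arbitrary illustrative value.

```python
import math
from collections import Counter
from typing import Optional, Tuple


def embed(text: str, n: int = 3) -> Counter:
    """Toy stand-in for an embedding model: counts of character n-grams."""
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical terminology entries: (placeholder code, description).
TERMS = [
    ("TERM-001", "diabetes mellitus"),
    ("TERM-002", "hipertensión arterial"),
    ("TERM-003", "asma"),
    ("TERM-004", "neumonía"),
]

# Embed every description once, up front (the role a vector database plays).
INDEX = [(code, desc, embed(desc)) for code, desc in TERMS]


def link(mention: str, threshold: float = 0.5) -> Optional[Tuple[str, str, float]]:
    """Return the most similar entry, or None if it falls below the threshold."""
    query = embed(mention)
    code, desc, vec = max(INDEX, key=lambda entry: cosine(query, entry[2]))
    score = cosine(query, vec)
    return (code, desc, score) if score >= threshold else None


print(link("diabetes melitus"))   # misspelled mention, expected to link to TERM-001
print(link("fractura de fémur"))  # no similar entry in the toy index, so None
```

Because each vector is compared against every entry, this scan grows linearly with the terminology size; a vector database such as Milvus replaces it with an approximate nearest-neighbour index, which is what keeps matching fast against a catalogue as large as SNOMED CT.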
The process of innovation was quite rapid, and as Robert Tovornik, Data Scientist at Better, said, “none of this would have been possible without a tight collaboration with our dear colleague Luis Marco Ruiz. He was the one who, after a short demo from our side, grabbed onto the idea and introduced us to the DisTEMIST challenge. Throughout the challenge, he offered us valuable support, feedback, and extremely important insight into the Spanish language.”
The solution as part of the Better Platform
As this is a big innovation for Better, there are plans to include the solution in the Better Platform. It will most likely start with an English version, offered not as a standalone service but as part of a wider service or as an add-on to Archetype Designer and Studio. “It fits well with the principles of storing data in an organised manner, extending that functionality to unstructured free text. However, other implementations are possible, for example an annotation tool for developers interested in preparing data in a semi-automated way for their customised model solutions,” said Robert Tovornik.
“The achievement validates the importance of good collaboration, teamwork, and innovation. It brings a small victory for the team to celebrate, which with innovations can be rare, as you usually don’t know the clear outcome ahead. And most of all it brings a confirmation that we are steering in the right direction, along with a boost of motivation,” said Robert, adding that he is looking forward to more challenges like this one.