COVID-19: Baby Steps in the Right Direction
16 Apr 2020
The COVID crisis we’re currently living through might well be the imperative of our time, as McKinsey called it recently. Antonio Guterres, United Nations Secretary-General, has since called it the ‘greatest test since WWII’. At KIMO, we believe it is critical that our brightest minds devote time and resources to help alleviate the current situation. We have recently seen efforts from industry players like C3.ai, Microsoft, and Google in collaboration with top universities like Berkeley, but we also believe that opening this challenge to the wider public can produce great results (as NASA has shown repeatedly). The recent CORD-19 Challenge, initiated by the White House and the Allen Institute for AI and hosted by Kaggle, is a great initiative in this direction. In this article, we aim to update you on our progress so far. We can only hope that others will continue our journey, better and faster than we can.
You can jump immediately to the demo here.
First, the nature of the problem. From a machine learning perspective, the CORD-19 challenge is about making sense of more than 45,000 medical articles on COVID-19, SARS, and related coronaviruses, as well as prevention and control measures. The competition is challenging on (at least) three fronts:
- The papers are written for medical professionals and researchers, and contain vocabulary that general-purpose language models (e.g. BERT, Open-GPT2) are not trained on. In addition, the papers are written in different languages (e.g. English, Chinese, Arabic), the language used is not uniform, and not all papers are complete. In short, it is a messy dataset to begin with.
- Question-answering (QA) models are not yet where we want them to be. Although results on simplified benchmarks (like Stanford’s SQuAD2.0) are improving, current models underperform on major elements like ontology (e.g. which things exist) and causality. Headline successes, e.g. Watson winning Jeopardy, have turned out to be less intelligent than they looked from the outside (e.g. most answers were Wikipedia titles). As a result, any QA model should be expected to be far from perfect at this point in time.
- The type of questions posed in the challenge, and the very nature of decision making in the medical profession, require answers to be very precise in order to be useful.
The complexity above is partly captured by a TF-IDF model, in which groups of medical papers are clustered statistically.
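To make the idea of TF-IDF clustering concrete, here is a minimal sketch in plain Python. It is not the model we actually ran: the tokenizer, the greedy grouping rule, the `threshold` value, and all function names are assumptions chosen purely for illustration. A real pipeline over 45,000 papers would more likely use a library vectorizer and a proper clustering algorithm.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        # Weight = term frequency * log(N / document frequency).
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster(docs, threshold=0.1):
    """Greedy single-pass grouping (an illustrative stand-in for real clustering).

    Each document joins the first cluster whose first member is at least
    `threshold` similar to it; otherwise it starts a new cluster.
    """
    vectors = tfidf_vectors(docs)
    clusters = []  # each cluster is a list of document indices
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


docs = [
    "coronavirus spike protein binds ace2 receptor",
    "spike protein mutations in coronavirus",
    "hand washing and masks reduce transmission",
    "masks and distancing reduce transmission in hospitals",
]
print(cluster(docs))
```

On these four toy sentences, the two virology sentences and the two prevention sentences end up in separate groups. Real abstracts are far noisier, which is exactly the difficulty described above.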
The nature of the solution. Let me start with a disclaimer. Our model is far from perfect, and we do not believe that, in its current form, it can be trusted in a medical context. That said, it may be a small step in the right direction that other machine learning enthusiasts can build on.
Below is what we have done. Note that, given the breadth of the challenge, we have chosen to build a QA model utilizing open models. Specifically:
- Base the model on BioBERT: a biomedical language representation model designed for biomedical text mining. BioBERT GitHub.
- For the QA model, use BioBERT pre-trained on SQuAD2.0. This method was used before in the BioASQ challenge. A good paper on this methodology can be found here, with the GitHub repository here.
- API design was done with FastAPI.
So what is the outcome so far? We have turned our results into a quick demo that you can try on our website. The demo takes a specific question and searches the 45,000 papers in the database for the best possible answer. Out of the 45,000, it selects the 5 papers most likely to contain the right answer and returns the answers extracted from them to the user. A short video below shows the model in action. Both the model and the GitHub repository will be made available within 2 days, once we have cleaned up the code and deployed it to GCP.
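The two-stage flow described above (first narrow the collection down to 5 candidate papers, then extract an answer from each) can be sketched as follows. This is a hedged illustration, not our production code: the word-overlap retrieval score, the `toy_reader` stand-in, the paper dictionary layout with `id` and `text` keys, and all function names are assumptions. In the demo, the reader step is where the BioBERT QA model would sit.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(question, text):
    """Fraction of distinct question terms that also occur in the text."""
    q_terms = set(tokenize(question))
    if not q_terms:
        return 0.0
    return len(q_terms & set(tokenize(text))) / len(q_terms)


def retrieve(question, papers, k=5):
    """Stage 1: return the k papers with the highest retrieval score."""
    ranked = sorted(
        papers,
        key=lambda p: overlap_score(question, p["text"]),
        reverse=True,
    )
    return ranked[:k]


def toy_reader(question, text):
    """Hypothetical stand-in for the QA model.

    Returns the sentence that shares the most terms with the question,
    together with its score. A real reader would return an answer span
    and a model confidence instead.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return "", 0.0
    best = max(sentences, key=lambda s: overlap_score(question, s))
    return best, overlap_score(question, best)


def answer(question, papers, reader=toy_reader, k=5):
    """Stage 2: run the reader on each retrieved paper, best answer first."""
    results = []
    for paper in retrieve(question, papers, k):
        text, score = reader(question, paper["text"])
        results.append({"paper_id": paper["id"], "answer": text, "score": score})
    return sorted(results, key=lambda r: r["score"], reverse=True)
```

Because `reader` is just a callable taking a question and a text, the toy function can be swapped for a real QA model without touching the retrieval step.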