
Fighting COVID-19 with Data and AI: A Review of Active Research Groups and Datasets

20 Apr, 2020

AI-Based Systems Detecting COVID-19


1. DAMO Academy (Alibaba Group) Detects Coronavirus Cases in CT Scans

In early February, Alibaba's research academy, DAMO Academy, released an AI-based system that reportedly detects COVID-19 in under 20 seconds with 96% accuracy. The system is a deep computer vision model that takes a patient's CT scan as input and outputs whether the scan shows signs of COVID-19. The model was fine-tuned on more than 5,000 training samples and deployed in more than 26 hospitals across China, where it had helped diagnose over 30,000 cases at the time of writing.
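Headline figures such as 96% accuracy depend heavily on the test set. For screening, sensitivity (how many infected patients are caught) and specificity (how many uninfected patients are correctly cleared) are usually more informative. As a rough illustration, the minimal Python sketch below computes all three metrics from binary labels. The function name `screening_metrics` and the toy labels are hypothetical and have nothing to do with DAMO Academy's actual evaluation.

```python
from typing import Dict, Sequence


# Hypothetical helper for illustration; not DAMO Academy's evaluation code.
def screening_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, sensitivity, and specificity for binary labels.

    Labels are assumed to be 1 for a positive (COVID-19) case and 0 otherwise.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no samples given")
    return {
        "accuracy": (tp + tn) / total,
        # Sensitivity (recall): share of actual positives that were detected.
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        # Specificity: share of actual negatives that were correctly ruled out.
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


if __name__ == "__main__":
    # Hypothetical toy labels: 10 scans, 2 of them truly positive,
    # and a "model" that predicts negative for every scan.
    truth = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    preds = [0] * 10
    print(screening_metrics(truth, preds))
```

In this toy example, a model that labels every scan negative still reaches 80% accuracy while catching none of the positive cases, which is why accuracy alone says little about screening performance.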


2. Lung Infection Quantification

To reduce the time needed to analyze CT scans, researchers built a deep learning system that quantifies the lung infection caused by COVID-19. The model automatically segments both the infected regions and the entire lungs in chest CT scans, so that the extent of the infection can be measured. The authors developed a network named VB-Net, a modification of V-Net, which is a fully convolutional network for volumetric medical image segmentation.
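Once a segmentation network has produced a lung mask and an infection mask for a scan, one simple way to quantify the infection is to count voxels. The sketch below assumes both masks are binary 3D arrays of the same shape (nested Python lists here, to keep it dependency-free) and returns the share of lung voxels marked as infected. The function name and toy masks are hypothetical, and the code illustrates only the quantification step, not VB-Net itself.

```python
from typing import List

Mask3D = List[List[List[int]]]  # indexed as [slice][row][column], values 0 or 1


# Hypothetical helper; illustrates quantification only, not VB-Net.
def percentage_of_infection(lung_mask: Mask3D, infection_mask: Mask3D) -> float:
    """Return the percentage of lung voxels that are also marked as infected.

    Both masks are assumed to have identical shapes.
    """
    lung_voxels = 0
    infected_voxels = 0
    for lung_slice, inf_slice in zip(lung_mask, infection_mask):
        for lung_row, inf_row in zip(lung_slice, inf_slice):
            for lung_v, inf_v in zip(lung_row, inf_row):
                if lung_v:
                    lung_voxels += 1
                    # Only count infection that lies inside the lung mask.
                    if inf_v:
                        infected_voxels += 1
    if lung_voxels == 0:
        raise ValueError("lung mask is empty")
    return 100.0 * infected_voxels / lung_voxels


if __name__ == "__main__":
    # Toy single-slice example: 8 lung voxels, 2 of them infected.
    lung = [[[1, 1, 1, 1], [1, 1, 1, 1]]]
    infection = [[[1, 0, 0, 0], [0, 1, 0, 0]]]
    print(percentage_of_infection(lung, infection))
```

To report an absolute volume instead, the infected voxel count can be multiplied by the physical volume of one voxel, derived from the scan's pixel size and slice spacing.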


3. Abnormal Respiratory Pattern Classification for Large Scale Screening

Clinical observations indicate that people with COVID-19 can show breathing patterns that differ from those of healthy people. Building on this, researchers from East China Normal University collaborated with other research organizations to develop a deep learning-based algorithm that classifies breathing patterns, which could help with the diagnosis, prognosis, and large-scale screening of infected patients.
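The study above uses deep learning, but the basic signal involved can be illustrated with a much simpler heuristic: estimate the breathing rate by counting peaks in a chest-motion signal and flag rates above a normal range. In this sketch, the peak rule, the synthetic sine-wave signal, and the 20 breaths-per-minute cutoff (a commonly cited upper limit for resting adults) are all assumptions, and the function names are hypothetical. It is not the researchers' classifier.

```python
import math
from typing import List


# Heuristic sketch with an assumed threshold; not the study's deep learning model.
def breaths_per_minute(signal: List[float], sample_rate_hz: float) -> float:
    """Estimate respiratory rate by counting peaks in a breathing signal.

    A peak is a sample above the signal mean that is strictly larger than
    both of its neighbours. This assumes a clean, roughly periodic signal.
    """
    if len(signal) < 3:
        raise ValueError("signal too short")
    mean = sum(signal) / len(signal)
    peaks = sum(
        1
        for i in range(1, len(signal) - 1)
        if signal[i] > mean and signal[i] > signal[i - 1] and signal[i] > signal[i + 1]
    )
    duration_min = len(signal) / sample_rate_hz / 60.0
    return peaks / duration_min


def label_rate(rate_bpm: float, upper_normal_bpm: float = 20.0) -> str:
    """Flag rates above an assumed adult upper limit as tachypnea."""
    return "tachypnea" if rate_bpm > upper_normal_bpm else "normal"


def synthetic_breathing(freq_hz: float, sample_rate_hz: float, seconds: int) -> List[float]:
    """Generate an idealized sinusoidal chest-motion signal for testing."""
    n = int(sample_rate_hz * seconds)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate_hz) for i in range(n)]


if __name__ == "__main__":
    # 0.5 Hz breathing sampled at 10 Hz for one minute.
    rate = breaths_per_minute(synthetic_breathing(0.5, 10.0, 60), 10.0)
    print(rate, label_rate(rate))
```

Real sensor data is noisy, so a practical version would need to smooth the signal before counting peaks.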


4. Convolutional Neural Networks for COVID-19 and Pneumonia Screening

Convolutional neural networks (CNNs) have proven highly effective at identifying patterns in images. To speed up screening in China, several research groups collaborated to develop a CNN-based deep learning model that identifies COVID-19 at an early stage from CT scans. For this research, a total of 618 lung CT scans were collected.
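The core operation behind a CNN is sliding a small kernel over an image and computing a weighted sum at each position, so the output responds strongly wherever a pattern resembling the kernel appears. The sketch below shows that operation in plain Python with a hand-picked vertical-edge kernel. In a real CNN the kernel weights are learned from data (such as the CT scans above) rather than set by hand, and the function name here is hypothetical.

```python
from typing import List

Matrix = List[List[float]]


# Illustrative helper (hypothetical name); real CNN kernels are learned, not hand-set.
def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide `kernel` over `image` with stride 1 and no padding.

    Like most deep learning libraries, this computes cross-correlation
    (the kernel is not flipped), which is what CNN layers call "convolution".
    """
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out: Matrix = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            # Weighted sum of the image patch currently under the kernel.
            total = 0.0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        out.append(row)
    return out


if __name__ == "__main__":
    # Dark (0) on the left, bright (1) on the right.
    image = [[0, 0, 1, 1],
             [0, 0, 1, 1],
             [0, 0, 1, 1]]
    vertical_edge = [[-1, 1],
                     [-1, 1]]
    print(conv2d_valid(image, vertical_edge))  # [[0.0, 2.0, 0.0], [0.0, 2.0, 0.0]]
```

The kernel responds only in the middle column of the output, exactly where the dark-to-bright boundary sits; stacking many such learned filters is what lets a CNN pick up the visual patterns of lung lesions.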


5. COVID-19 Identification and Patient Monitoring Using Deep Learning for CT Image Analysis

This research was carried out mainly by RADLogics, a Boston-based company, in collaboration with many other research groups around the world. The goal was to build an AI-based, automated CT image analysis tool that detects coronavirus-positive patients with high accuracy and monitors them throughout treatment.


6. Drug Screening for COVID-19

After studying the RNA sequences available in the GISAID database, the authors concluded that SARS-CoV-2, the virus that causes COVID-19, is highly homologous to SARS-CoV-1. To reach this conclusion, they translated the RNA sequences into protein sequences and built 3D protein models using homology modeling (constructing an atomic-resolution model of a protein from the experimentally determined structure of a related protein). They then used a DFCNN (Deep Fully Convolutional Neural Network) to identify and rank protein-ligand interactions. This makes virtual screening fast, because no docking or molecular dynamics simulation is needed.
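The first step described above, translating RNA into protein, follows the standard genetic code: each three-base codon maps to one amino acid or to a stop signal. The simplified Python sketch below translates a single reading frame starting at the first AUG. The function and table names are hypothetical, and the code ignores complications in real coronavirus genomes, such as the ribosomal frameshift in ORF1ab, so it is an illustration rather than the authors' pipeline.

```python
BASES = "UCAG"
# Standard genetic code (NCBI table 1), codons ordered UUU, UUC, UUA, UUG, UCU, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


# Simplified, hypothetical helper; ignores frameshifts and other genome features.
def translate(rna: str) -> str:
    """Translate an RNA reading frame into a one-letter protein sequence.

    Translation starts at the first AUG codon and stops at the first stop
    codon ('*'); an incomplete trailing codon is ignored.
    """
    rna = rna.upper().replace("T", "U")  # also accept DNA-style input
    start = rna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for pos in range(start, len(rna) - 2, 3):
        # 'X' marks codons containing ambiguous bases such as N.
        aa = CODON_TABLE.get(rna[pos:pos + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("AUGUUUGGCUAA"))  # MFG
```

The resulting amino-acid sequences are what homology modeling and sequence comparison operate on.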


7. Computational Predictions of Protein Structures with AlphaFold

This research mainly addresses the problem of protein folding: predicting a protein's three-dimensional structure. Proteins are large, complex molecules, and their three-dimensional structure can change as they perform different functions. The authors also give an example of why determining the structure of proteins is essential.


8. Prediction of Criticality of Patients with Severe COVID-19

In this research, the authors propose prognostic prediction models based on three clinical indices. The models predict mortality risk and the likely clinical course, helping to distinguish critical cases from severe ones.
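The source does not name the three indices or the model type, so the following is only a generic sketch of how a three-feature prognostic score can work, using a swapped-in logistic-regression-style score: a sigmoid turns a weighted sum of the indices into a risk between 0 and 1, and a threshold separates critical from non-critical cases. The feature names `index_a`, `index_b`, `index_c`, their weights, the bias, and the 0.5 threshold are hypothetical placeholders, not values from the study, and the inputs are assumed to be standardized.

```python
import math
from typing import Dict

# Hypothetical placeholder coefficients for three unnamed indices (not fitted values).
WEIGHTS: Dict[str, float] = {"index_a": 0.8, "index_b": -0.5, "index_c": 0.6}
BIAS = -1.0


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def critical_risk(features: Dict[str, float]) -> float:
    """Return a risk score in [0, 1] from standardized index values."""
    z = BIAS + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return _sigmoid(z)


def triage(features: Dict[str, float], threshold: float = 0.5) -> str:
    """Label a severe case as 'critical' when its risk exceeds the (assumed) threshold."""
    return "critical" if critical_risk(features) > threshold else "non-critical"


if __name__ == "__main__":
    patient = {"index_a": 3.0, "index_b": -2.0, "index_c": 3.0}  # hypothetical values
    print(round(critical_risk(patient), 3), triage(patient))
```

In practice, the weights would be fitted on patient outcomes, and the threshold chosen to balance missed critical cases against false alarms.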


Datasets for COVID-19

The groups above have one thing in common: data! With more data, these algorithms generally get better. To help researchers understand COVID-19, several companies and open-source organizations have released datasets.

Here are a few datasets that are being widely used:

  1. COVID-19 Open Research Dataset Challenge (CORD-19): CORD-19 is a dataset created by the Allen Institute for AI in collaboration with several companies and organizations. It contains over 45,000 scholarly articles about COVID-19, SARS-CoV-2, and related coronaviruses, over 33,000 of them with full text.
  2. COVID-19 Korea Dataset: An open dataset from the Republic of Korea that tracks the travel histories of COVID-19-positive patients. It has been used to build an ML- and web-based platform for visualizing patient routes.
  3. Novel Coronavirus 2019 Dataset: This dataset has daily counts of cases, deaths, and recoveries across different regions, with timestamps.
  4. COVID-19 Chest X-Ray Dataset: This dataset contains chest X-rays of COVID-19 cases. Credit goes to Joseph Paul Cohen for making it openly available on GitHub. With it, one could try building a neural network classifier that detects COVID-19 from X-rays, although the data is quite limited for creating an effective model. In any case, no one should claim diagnostic performance for such a model without a clinical study.
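Time-series datasets like the Novel Coronavirus 2019 Dataset often store cumulative totals, so a common first step is converting them into daily new cases. The sketch below assumes a CSV layout with `ObservationDate` (MM/DD/YYYY), `Province/State`, `Country/Region`, and cumulative `Confirmed` columns; these names are assumptions to check against the actual file before use. The function name is hypothetical, and the region names and counts in `SAMPLE_CSV` are made up for illustration.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Tuple

# Made-up rows in the assumed column layout (not real case counts).
SAMPLE_CSV = """ObservationDate,Province/State,Country/Region,Confirmed,Deaths,Recovered
01/22/2020,North,Region A,10,0,0
01/22/2020,South,Region A,5,0,0
01/23/2020,North,Region A,14,1,0
01/23/2020,South,Region A,9,0,1
01/22/2020,,Region B,3,0,0
"""


# Hypothetical helper; column names and date format are assumptions.
def daily_new_cases(csv_text: str, region: str) -> List[Tuple[str, int]]:
    """Convert cumulative confirmed counts into daily new cases for one region."""
    totals: Dict[datetime, int] = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Country/Region"] == region:
            day = datetime.strptime(row["ObservationDate"], "%m/%d/%Y")
            # Sum province-level rows into one cumulative total per day;
            # parse via float() in case counts are stored as "10.0".
            totals[day] += int(float(row["Confirmed"]))
    result: List[Tuple[str, int]] = []
    previous = 0
    for day in sorted(totals):
        # New cases = this day's cumulative total minus the previous recorded day's.
        result.append((day.strftime("%Y-%m-%d"), totals[day] - previous))
        previous = totals[day]
    return result


if __name__ == "__main__":
    print(daily_new_cases(SAMPLE_CSV, "Region A"))  # [('2020-01-22', 15), ('2020-01-23', 8)]
```

The first day's value equals that day's cumulative total, and if dates are missing, a difference spans more than one day, so the series may need reindexing before analysis.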
