The BMJ study: existing AI models for diagnosing the novel coronavirus are nearly useless


Image source: Unsplash

Author: Zhu Rui

The novel coronavirus poses a serious threat to global health. To relieve the burden on healthcare systems and give patients the best possible care, efficient diagnosis and reliable prognostic information are urgently needed.

In theory, when medical resources are limited, multivariable models that estimate a person's risk of being infected and the likely outcome of infection could help medical staff triage patients. A large number of such prediction models, from rule-based scoring systems to deep learning models, have been released openly and are available for peer analysis.
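To make the idea concrete, here is a minimal sketch of what such a multivariable risk model looks like, assuming a scikit-learn setup; the predictors, data, and outcome below are entirely hypothetical and are not taken from any study in the review.

```python
# Minimal sketch (not from the BMJ review): a multivariable logistic regression
# that scores patient risk from a few hypothetical predictors.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical predictors: age, body temperature, lymphocyte count (synthetic data).
X = np.column_stack([
    rng.normal(60, 15, 500),     # age in years
    rng.normal(37.5, 1.0, 500),  # body temperature in deg C
    rng.normal(1.2, 0.5, 500),   # lymphocyte count in 10^9/L
])
y = rng.integers(0, 2, 500)      # outcome label, e.g. clinical deterioration (synthetic)

model = LogisticRegression(max_iter=1000).fit(X, y)

# Predicted risks can then be used to prioritize (triage) patients.
risk = model.predict_proba(X)[:, 1]
print(risk[:5])
```

In practice the fitted probabilities would be thresholded or ranked to decide which patients need attention first, which is exactly the triage use case the review examines.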

So, what should we expect from these models?


A study published in The BMJ, a top general medical journal, systematically reviewed the existing COVID-19 models, covering three kinds: risk prediction models for the general population, diagnostic models for detecting COVID-19 in people with suspected infection, and prognostic models for patients with COVID-19. Both model development studies and external validation studies were evaluated.

However, the results were not encouraging. It is fair to say that the existing AI models for diagnosing the novel coronavirus are almost useless.

Collection process

The research team collected COVID-19 literature published between January 3, 2020 and May 5, 2020 from databases and preprint servers such as Ovid, bioRxiv, medRxiv, arXiv, PubMed, and Embase. A paper was included in the review if it described a multivariable model or scoring system for a COVID-19-related outcome.

In the end, they collected three types of prediction models: models predicting the risk of COVID-19 in the general population, diagnostic models for detecting infection among suspected patients, and prognostic models for patients with COVID-19. No restrictions were placed on the predictors or outcomes, the reference population (for example, inpatients, outpatients, or the general population), or the prediction horizon (how far ahead the model predicts). Other kinds of studies, such as those modeling disease transmission or mortality, assessing diagnostic test accuracy, or searching for predictive markers, were not considered.

From the second update of the systematic review onward, relevant papers were retrieved with an AI-driven text analysis tool tuned to prioritize sensitivity. The researchers used EPPI-Reviewer to screen titles, abstracts, and full texts in duplicate, and resolved disputed articles through discussion.

The study evaluated the prediction models with a standardized data extraction form based on CHARMS (the CHecklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies) and PROBAST (the Prediction model Risk Of Bias ASsessment Tool).

Through the systematic search, the researchers retrieved 14,209 titles. The whole screening process is shown in the figure below:


PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flow chart of study selection

Findings

For the 107 studies that passed the final screening, the team applied PROBAST, an assessment tool designed specifically for judging the risk of bias in prediction models.

The results showed that 53 studies had a high risk of bias in the participants domain (the reference population), meaning that the population the model was built on may not be representative of its target population. Another 26 studies did not report clearly enough for this risk of bias to be assessed.

Fifteen studies had a high risk of bias in the predictor domain, indicating that the predictor variables may not be usable in the way the model intends, are not clearly defined, or were influenced by the outcome being predicted.

A diagnostic imaging study that relied on a simple scoring rule was judged to present a low risk of bias in its predictors.

Because the papers lack clear information about preprocessing steps (such as image cropping), and because complex machine learning algorithms turn images into predictors in opaque ways, it remains unclear what the predictors of these models actually are, which makes their risk of bias hard to assess. Most models use outcomes that are easy to assess (for example, death or confirmed diagnosis), but in 19 studies there were still concerns about bias introduced by how the outcome was determined, for example through the use of subjective or proxy outcomes (such as severe respiratory infections other than COVID-19).

With the exception of one study, all of the studies had a high risk of bias in the analysis domain.

Many studies had small sample sizes, which increases the risk of overfitting, especially when complex modeling strategies are used. Three studies did not report the predictive performance of their model at all, and four reported only apparent performance (the training set and the test set were the same, with no adjustment for potential overfitting).
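The gap between apparent performance and an honest internal estimate is easy to demonstrate. The sketch below, using synthetic data and a scikit-learn classifier chosen purely for illustration, fits a flexible model on a small sample of pure noise: the AUC measured on the training data itself looks excellent, while a cross-validated estimate sits near chance.

```python
# Illustrative sketch: apparent performance vs. a cross-validated estimate
# on a small, noisy sample (all data here are synthetic).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 20))   # small sample, pure-noise features
y = rng.integers(0, 2, 60)      # outcome unrelated to the features

clf = RandomForestClassifier(n_estimators=200, random_state=1)
clf.fit(X, y)

# Apparent performance: training set and test set are the same.
apparent_auc = roc_auc_score(y, clf.predict_proba(X)[:, 1])

# Honest internal estimate: 5-fold cross-validated predictions.
cv_prob = cross_val_predict(clf, X, y, cv=5, method="predict_proba")[:, 1]
cv_auc = roc_auc_score(y, cv_prob)

print(f"apparent AUC: {apparent_auc:.2f}, cross-validated AUC: {cv_auc:.2f}")
```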

Only 13 studies evaluated calibration, and in two of them the method used to check calibration may have been suboptimal.
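Calibration asks whether predicted risks match observed event rates. A minimal sketch of one common check, again on synthetic data and assuming scikit-learn's calibration_curve helper, groups predictions into risk bins and compares each bin's mean predicted risk with its observed outcome rate:

```python
# Sketch of a calibration check on synthetic data from a deliberately
# miscalibrated "model" that overestimates risk (true rate is 70% of predicted).
import numpy as np
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(2)
p_pred = rng.uniform(0, 1, 1000)      # predicted risks (synthetic)
y_obs = rng.binomial(1, 0.7 * p_pred)  # observed outcomes (synthetic)

# Group predictions into 10 risk bins and compare predicted vs. observed rates.
frac_observed, mean_predicted = calibration_curve(y_obs, p_pred, n_bins=10)
for pred, obs in zip(mean_predicted, frac_observed):
    print(f"mean predicted risk {pred:.2f} -> observed event rate {obs:.2f}")
```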

Only 25 of the models in the review were externally validated (evaluated on a separate data set, with training and test data kept apart), but for 11 of those models the external validation data may not be representative of the target population, and another study used data from before the COVID-19 epidemic; in those cases the model's predictions may turn out differently when it is applied to the intended population. One study did not report the performance statistics commonly used for prognosis (discrimination and calibration).
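For clarity, external validation means evaluating a frozen model, without refitting, on data from a different setting. The following sketch illustrates the idea on synthetic cohorts (the data, features, and use of scikit-learn are assumptions for illustration, not details from the review): discrimination in the "external" cohort can differ from what was seen during development.

```python
# Sketch of external validation: a model developed on one cohort is applied,
# unchanged, to a cohort from a different setting (all data synthetic).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(3)

# Development cohort (e.g. "hospital A").
X_dev = rng.normal(0.0, 1.0, size=(400, 5))
y_dev = (X_dev[:, 0] + rng.normal(0, 1, 400) > 0).astype(int)

# External cohort (e.g. "hospital B") with a shifted case mix.
X_ext = rng.normal(0.5, 1.2, size=(300, 5))
y_ext = (X_ext[:, 0] + rng.normal(0, 1, 300) > 0).astype(int)

# The model is developed once, then evaluated on both cohorts without refitting.
model = LogisticRegression().fit(X_dev, y_dev)

print("development AUC:", roc_auc_score(y_dev, model.predict_proba(X_dev)[:, 1]))
print("external AUC:   ", roc_auc_score(y_ext, model.predict_proba(X_ext)[:, 1]))
```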

But some models did perform well. The studies by Gozes, Fu, Chassagnon, Hu, Kurstjens, and Vaid reported satisfactory predictive performance on an external validation set, but it is not clear how the external validation data were collected or whether they are representative. The studies by Wang, Barda, Guo, Tordjman, and Gong achieved satisfactory results on potentially unbiased validation data sets, but those data sets contain less data than is required for external validation (100). Diaz-Quijano's study also showed good external validation results, but because polymerase chain reaction (PCR) testing was not performed, many patients had to be excluded from the data set.

At present, society may urgently need diagnostic and prognostic models that help medical staff get to work faster and more effectively, and this pressure may push governments and medical institutions to deploy prediction models prematurely.

However, all 145 prediction models carry a substantial risk of bias, and sound evidence from external validation is lacking. In the context of the COVID-19 epidemic, using these models prematurely may do more harm than good.

For now, therefore, the researchers do not recommend using any of these models in practice.

They also recommend that future modeling research focus on validating, comparing, improving, and updating the promising prediction models that already exist, rather than on developing new ones.

Reference:
https://www.bmj.com/content/369/bmj.m1328.long

