Weekly Paper Reading Notes (2) - Review - A Survey on Deep Learning in Medical Image Analysis

Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/Void_worker/article/details/85317382

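I am new to this field and not yet good at writing these notes. If anything is mistranslated or misunderstood, corrections are very welcome.

This paper is divided into four parts: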
Deep learning methods
Deep learning uses in medical imaging
Application areas
Challenges and outlook
Part 1 is not covered here; these notes cover the remaining parts.

Original paper: https://www.sciencedirect.com/science/article/pii/S1361841517301135


1.Deep learning methods

2.Deep learning uses in medical imaging

2.1 Classification

2.1.1 Image/exam classification

Image or exam classification was one of the first areas in which deep learning made a major contribution to medical image analysis.

In exam classification one typically has one or multiple images (an exam) as input with a single diagnostic variable as output (e.g., disease present or not).

Dataset sizes are typically small, which motivates transfer learning.

Two transfer learning strategies were identified:
(1) using a pre-trained network as a feature extractor.
(2) fine-tuning a pre-trained network on medical data.
The former strategy has the extra benefit of not requiring one to train a deep network at all, allowing the extracted features to be easily plugged into existing image analysis pipelines. Both strategies are popular and have been widely applied. Few authors have thoroughly investigated which strategy gives the best result.

Methods:
(1)Initially the focus was on unsupervised pre-training and network architectures like SAEs (stacked auto-encoders) and RBMs (restricted Boltzmann machines).
(2)CNNs (in 2015, 2016, 2017)
The application areas range from brain MRI to retinal imaging and digital pathology to lung computed tomography (CT).
(3)In the more recent papers using CNNs authors also often train their own network architectures from scratch instead of using pre-trained networks.
(4)Three papers used an architecture leveraging the unique attributes of medical data.(3D…)

Summary: in exam classification CNNs are the current standard technique. Especially CNNs pre-trained on natural images have shown surprisingly strong results, challenging the accuracy of human experts in some tasks. Last, authors have shown that CNNs can be adapted to leverage the intrinsic structure of medical images.

2.1.2 Object or lesion classification

Object classification usually focuses on the classification of a small (previously identified) part of the medical image into two or more classes (e.g. nodule classification in chest CT).

For many of these tasks both local information on lesion appearance and global contextual information on lesion location are required for accurate classification.
This combination is typically not possible in generic deep learning architectures.

Methods:
(1)Almost all recent papers prefer the use of end-to-end trained CNNs.
Several authors have used multi-stream architectures to combine local and global information in a multi-scale fashion.
Examples: three CNNs (each of which takes a nodule patch), a combination of CNNs and RNNs (for grading nuclear cataracts), and a 3D CNN (for high-grade gliomas).

(2)In some cases other architectures and approaches are used, such as RBMs (restricted Boltzmann machines), SAEs (stacked auto-encoders) and convolutional sparse auto-encoders (CSAEs). The major difference between a CSAE and a classic CNN is the usage of unsupervised pre-training with sparse auto-encoders.

(3)An interesting approach, especially in cases where object annotation to generate training data is expensive, is the integration of multiple instance learning (MIL) and deep learning.
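
To make the MIL idea in point (3) concrete, here is a minimal sketch under a common assumption that is not spelled out in these notes: an exam is a "bag" of patches, only the bag has a label, and the bag score is the maximum of the per-patch scores (max-pooling MIL). The per-patch probabilities and the function names `bag_score` and `bag_label` are hypothetical; in practice the scores would come from a CNN.

```python
def bag_score(instance_scores):
    """Max-pooling MIL: a bag is as suspicious as its most suspicious instance."""
    if not instance_scores:
        raise ValueError("a bag needs at least one instance")
    return max(instance_scores)


def bag_label(instance_scores, threshold=0.5):
    """Call the bag positive if any instance score reaches the threshold."""
    return int(bag_score(instance_scores) >= threshold)


# Hypothetical per-patch lesion probabilities for two exams.
healthy_exam = [0.05, 0.10, 0.02]
diseased_exam = [0.04, 0.91, 0.08]
print(bag_label(healthy_exam), bag_label(diseased_exam))
```

Only the exam-level label is needed for training in this setting, which is why MIL is attractive when patch-level annotation is expensive.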

Summary:
Object classification sees less use of pre-trained networks compared to exam classification, mostly due to the need for incorporation of contextual or 3D information. Several authors have found innovative solutions to add this information to deep networks with good results, and as such we expect deep learning to become even more prominent for this task in the near future.

2.2 Detection

2.2.1 Organ, region and landmark localization

Anatomical object localization (in space or time), such as organs or landmarks, has been an important pre-processing step in segmentation tasks or in the clinical workflow for therapy planning and intervention.
Localization in medical imaging often requires parsing of 3D volumes.

Methods:
Space:
(1)To solve 3D data parsing with deep learning algorithms, several approaches have been proposed that treat the 3D space as a composition of 2D orthogonal planes.
(2)Other authors try to modify the network learning process to directly predict locations. (Due to its increased complexity, only a few methods addressed the direct localization of landmarks and regions in the 3D image space.)
Time:
(1)CNNs have also been used for the localization of scan planes or key frames in temporal data.
(2)RNNs, particularly LSTM-RNNs, have also been used to exploit the temporal information contained in medical videos, another type of high dimensional data.
(3)Combining an LSTM-RNN with a CNN.
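
As an illustration of the "composition of 2D orthogonal planes" idea in the Space list above, the sketch below extracts the three orthogonal slices through one voxel of a 3D volume. It assumes the volume is a nested list indexed as `volume[z][y][x]`; calling the slices axial, coronal and sagittal is a naming assumption that depends on how the scan is oriented. Each 2D slice could then be passed to an ordinary 2D CNN.

```python
def orthogonal_planes(volume, z, y, x):
    """Return the three orthogonal 2D slices through voxel (z, y, x).

    `volume` is a nested list indexed as volume[z][y][x].
    """
    axial = [row[:] for row in volume[z]]                        # fixed z: (y, x)
    coronal = [plane[y][:] for plane in volume]                  # fixed y: (z, x)
    sagittal = [[row[x] for row in plane] for plane in volume]   # fixed x: (z, y)
    return axial, coronal, sagittal


# Toy 2 x 3 x 4 volume whose voxel values encode their own index.
depth, height, width = 2, 3, 4
volume = [[[100 * z + 10 * y + x for x in range(width)]
           for y in range(height)]
          for z in range(depth)]

axial, coronal, sagittal = orthogonal_planes(volume, z=1, y=2, x=3)
print(len(axial), len(axial[0]))        # slice shape (y, x)
print(len(coronal), len(coronal[0]))    # slice shape (z, x)
print(len(sagittal), len(sagittal[0]))  # slice shape (z, y)
```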

Summary:
Localization through 2D image classification with CNNs seems to be the most popular strategy overall to identify organs, regions and landmarks, with good results.
However, several recent papers expand on this concept by modifying the learning process such that accurate localization is directly emphasized, with promising results.
We expect such strategies to be explored further as they show that deep learning techniques can be adapted to a wide range of localization tasks (e.g. multiple landmarks).
RNNs have shown promise in localization in the temporal domain, and multi-dimensional RNNs could play a role in spatial localization as well.

2.2.2 Object or lesion detection

The detection of objects of interest or lesions in images is a key part of diagnosis and is one of the most labor-intensive tasks for clinicians.
Typically, the tasks consist of the localization and identification of small lesions in the full image space.

Methods:
(1)There has been a long research tradition in computer-aided detection systems that are designed to automatically detect lesions. The first object detection system using CNNs was already proposed in 1995, using a CNN with four layers to detect nodules in x-ray images.

(2)Most of the published deep learning object detection systems still use CNNs to perform pixel (or voxel) classification, after which some form of post-processing is applied to obtain object candidates.
As the classification task performed at each pixel is essentially object classification, CNN architecture and methodology are very similar to object classification.(the incorporation of contextual or 3D information: multi-stream CNNs)

Differences between object detection and object classification:
Because every pixel is classified, the class balance is skewed severely towards the non-object class in a training setting.
-> To add insult to injury, usually the majority of the non-object samples are easy to discriminate, so they contribute little to learning.
-> fCNNs: classifying each pixel in a sliding-window fashion results in orders of magnitude of redundant calculation, which fully convolutional networks avoid.
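
One common way to deal with many easy non-object samples is hard negative mining. The sketch below is only an illustration of that general idea, not a method taken from a specific surveyed paper: it keeps every positive sample and only the negatives that the current model scores highest (the ones it confuses with objects). The scores, labels and the function name `mine_hard_negatives` are hypothetical.

```python
def mine_hard_negatives(scores, labels, num_keep):
    """Keep all positives plus the `num_keep` negatives with the highest scores."""
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    # Negatives with a high predicted object score are the "hard" ones.
    hardest = sorted(negatives, key=lambda i: scores[i], reverse=True)[:num_keep]
    return sorted(positives + hardest)


# Hypothetical model scores for seven candidate pixels; only the last is a lesion.
scores = [0.02, 0.85, 0.01, 0.60, 0.03, 0.40, 0.95]
labels = [0, 0, 0, 0, 0, 0, 1]
selected = mine_hard_negatives(scores, labels, num_keep=2)
print(selected)
```

The selected indices would form the next training batch, so the network spends its updates on the confusing background samples instead of the many trivial ones.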

Summary:
Challenges are similar to those in object classification.
Few papers directly address issues specific to object detection like class imbalance/hard-negative mining or efficient pixel/voxel-wise processing of images.
We expect that more emphasis will be given to those areas in the near future, for example in the application of multi-stream networks in a fully convolutional fashion.

2.3 Segmentation

2.3.1 Organ and substructure segmentation

The segmentation of organs and other substructures in medical images allows quantitative analysis of clinical parameters related to volume and shape, as, for example, in cardiac or brain analysis. Furthermore, it is often an important first step in computer-aided detection pipelines.

The task of segmentation is typically defined as identifying the set of voxels which make up either the contour or the interior of the object(s) of interest.
Segmentation is the most common subject of papers applying deep learning to medical imaging, and as such has also seen the widest variety in methodology.

Methods:
(1)The most well-known, in medical image analysis, of these novel CNN architectures is U-net.
(2)RNNs have recently become more popular for segmentation tasks.
(3)Many authors have also obtained excellent segmentation results with patch-trained neural networks. Most recent papers now use fCNNs in preference over sliding-window-based classification to reduce redundant computation.(fCNNs have also been extended to 3D and have been applied to multiple targets at once)
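
The redundancy argument behind fCNNs can be checked on a toy case. The sketch below uses a hypothetical two-layer 1D network with arbitrary fixed weights (all names and values are assumptions for illustration). It computes the same outputs twice: once patch by patch in sliding-window fashion, and once by running each layer over the whole signal, counting multiplications in both cases.

```python
def conv1d_valid(signal, kernel):
    """'Valid' 1D cross-correlation, as used in CNN layers."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def relu(values):
    return [max(0, v) for v in values]


def sliding_window(signal, k1, k2):
    """Run the two-layer net separately on every patch; count multiplications."""
    patch = len(k1) + len(k2) - 1          # receptive field of one output
    outputs, mults = [], 0
    for start in range(len(signal) - patch + 1):
        hidden = relu(conv1d_valid(signal[start:start + patch], k1))
        out = conv1d_valid(hidden, k2)
        mults += len(hidden) * len(k1) + len(out) * len(k2)
        outputs.append(out[0])
    return outputs, mults


def fully_convolutional(signal, k1, k2):
    """Run each layer once over the whole signal; count multiplications."""
    hidden = relu(conv1d_valid(signal, k1))
    out = conv1d_valid(hidden, k2)
    return out, len(hidden) * len(k1) + len(out) * len(k2)


signal = [(7 * i) % 11 - 5 for i in range(100)]   # toy integer "image"
k1, k2 = [1, -2, 1], [2, 0, -1]                   # arbitrary fixed weights
sw_out, sw_mults = sliding_window(signal, k1, k2)
fc_out, fc_mults = fully_convolutional(signal, k1, k2)
print(sw_out == fc_out, sw_mults, fc_mults)
```

In this tiny 1D case the saving is only about a factor of two (1152 versus 582 multiplications). The gap grows quickly with 2D or 3D inputs and deeper networks, because neighbouring patches share far more intermediate results.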

Challenge:
One challenge with voxel classification approaches is that they sometimes lead to spurious responses.
To combat this, groups have tried to combine fCNNs with graphical models like MRFs and Conditional Random Fields (CRFs) to refine the segmentation output.
In most of the cases, graphical models are applied on top of the likelihood map produced by CNNs or fCNNs and act as label regularizers.
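
To show what "label regularizer on top of a likelihood map" can mean, here is a minimal sketch that swaps in a deliberately simple technique: a Potts-style MRF with 4-neighbour smoothing, optimized by iterated conditional modes (ICM). The surveyed papers typically use more elaborate models and inference, so this is an assumption-laden stand-in; `beta` (the smoothness weight) and the toy probability map are hypothetical.

```python
import math


def icm_refine(prob_map, beta=1.0, iterations=5):
    """Refine a foreground-probability map into labels with a Potts MRF and ICM."""
    h, w = len(prob_map), len(prob_map[0])
    labels = [[int(p >= 0.5) for p in row] for row in prob_map]  # initial guess
    eps = 1e-6
    for _ in range(iterations):
        changed = False
        for y in range(h):
            for x in range(w):
                p = min(max(prob_map[y][x], eps), 1 - eps)
                neighbours = [labels[ny][nx]
                              for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                              if 0 <= ny < h and 0 <= nx < w]
                costs = []
                for lab in (0, 1):
                    unary = -math.log(p if lab == 1 else 1 - p)      # data term
                    pairwise = beta * sum(1 for n in neighbours if n != lab)
                    costs.append(unary + pairwise)
                best = 0 if costs[0] <= costs[1] else 1
                if best != labels[y][x]:
                    labels[y][x] = best
                    changed = True
        if not changed:
            break
    return labels


# Toy likelihood map: a solid 2x2 object plus one isolated spurious response.
prob_map = [
    [0.1, 0.1, 0.1, 0.1, 0.7],
    [0.1, 0.9, 0.9, 0.1, 0.1],
    [0.1, 0.9, 0.9, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1, 0.1],
]
refined = icm_refine(prob_map)
for row in refined:
    print(row)
```

The isolated response in the top-right corner is removed because it disagrees with all of its neighbours, while the compact 2x2 object survives.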

Summary:
Segmentation in medical imaging has seen a huge influx of deep learning related methods. Custom architectures have been created to directly target the segmentation task. These have obtained promising results, rivaling and often improving over results obtained with fCNNs.

2.3.2 Lesion segmentation

Segmentation of lesions combines the challenges of object detection and organ and substructure segmentation in the application of deep learning algorithms.
(1)Global and local context are typically needed to perform accurate segmentation, such that multi-stream networks with different scales or non-uniformly sampled patches are used.
(2)In lesion segmentation we have also seen the application of U-net and similar architectures to leverage both this global and local context.

Challenge:
Class imbalance.
Solutions: (1) adapting the loss function; (2) performing data augmentation on positive samples.
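
Two widely used ways of adapting the loss function to class imbalance are a soft Dice loss and a class-weighted cross-entropy. The sketch below illustrates both on flat lists of per-voxel foreground probabilities. The `smooth` and `pos_weight` parameters are illustrative assumptions, and this is not claimed to be the exact formulation used in any surveyed paper.

```python
import math


def soft_dice_loss(probs, targets, smooth=1.0):
    """1 - Dice overlap between predicted probabilities and binary targets."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    denominator = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + smooth) / (denominator + smooth)


def weighted_bce(probs, targets, pos_weight, eps=1e-7):
    """Binary cross-entropy where errors on positive voxels cost `pos_weight` times more."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1 - eps)
        total -= pos_weight * t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(probs)


# One lesion voxel among many background voxels; the model misses the lesion.
targets = [0] * 9 + [1]
missed = [0.01] * 10
print(soft_dice_loss(missed, targets))
print(weighted_bce(missed, targets, pos_weight=1.0))
print(weighted_bce(missed, targets, pos_weight=9.0))
```

With plain cross-entropy the missed lesion is averaged away by the nine correct background voxels; raising `pos_weight` or using the Dice loss makes the same mistake much more costly.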

Summary:
Thus lesion segmentation sees a mixture of approaches used in object detection and organ segmentation. Developments in these two areas will most likely naturally propagate to lesion segmentation as the existing challenges are also mostly similar.

2.4 Registration

Registration (i.e. spatial alignment) of medical images is a common image analysis task in which a coordinate transform is calculated from one medical image to another. Often this is performed in an iterative framework where a specific type of (non-)parametric transformation is assumed and a predetermined metric is optimized.

Methods:
Researchers have found that deep networks can be beneficial in getting the best possible registration performance.
Broadly speaking, two strategies are prevalent in current literature:
(1) using deep-learning networks to estimate a similarity measure for two images to drive an iterative optimization strategy.
(2) directly predicting transformation parameters using deep regression networks.
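
Strategy (1) can be pictured with a toy 1D example: a similarity measure scores candidate transformations, and an iterative optimizer follows that score. In the sketch below the similarity is a hand-written negative mean squared difference, standing in for the learned similarity network that strategy (1) describes. The transformation is an integer shift and the optimizer is simple hill climbing; all of these are simplifying assumptions.

```python
import math


def similarity(fixed, moving, shift):
    """Negative mean squared difference between fixed[i] and moving[i + shift]."""
    diffs = [(fixed[i] - moving[i + shift]) ** 2
             for i in range(len(fixed))
             if 0 <= i + shift < len(moving)]
    return -sum(diffs) / len(diffs)


def register_translation(fixed, moving, max_shift=15):
    """Hill-climb over integer shifts until no neighbouring shift scores better."""
    shift = 0
    best = similarity(fixed, moving, shift)
    while True:
        candidates = [s for s in (shift - 1, shift + 1) if abs(s) <= max_shift]
        top_score, top_shift = max((similarity(fixed, moving, s), s) for s in candidates)
        if top_score <= best:
            return shift
        best, shift = top_score, top_shift


# Two toy 1D "images": the same bump, displaced by 6 samples.
fixed = [math.exp(-((i - 20) ** 2) / 32) for i in range(50)]
moving = [math.exp(-((i - 26) ** 2) / 32) for i in range(50)]
estimated_shift = register_translation(fixed, moving)
print(estimated_shift)
```

Strategy (2) would instead replace the whole loop with a regression network that maps the image pair directly to the shift.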

Summary:
In contrast to classification and segmentation, the research community seems not to have settled yet on the best way to integrate deep learning techniques in registration methods. Not many papers have yet appeared on the subject and existing ones each have a distinctly different approach.

2.5 Other tasks in medical imaging

2.5.1 Content-based image retrieval

Content-based image retrieval (CBIR) is a technique for knowledge discovery in massive databases and offers the possibility to identify similar case histories, understand rare disorders, and, ultimately, improve patient care.

Challenge:
The major challenge in the development of CBIR methods is extracting effective feature representations from the pixel-level information and associating them with meaningful concepts.

Methods:
All current approaches use (pre-trained) CNNs to extract feature descriptors from medical images.
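
Once feature descriptors are available, retrieval itself reduces to a nearest-neighbour search. The sketch below ranks stored cases by cosine similarity to a query descriptor. The three-dimensional descriptors and the case names are hypothetical stand-ins for CNN features, and cosine similarity is one reasonable choice of metric rather than the one prescribed by the survey.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query, database, top_k=2):
    """Return the names of the `top_k` stored cases most similar to the query."""
    ranked = sorted(database.items(),
                    key=lambda item: cosine_similarity(query, item[1]),
                    reverse=True)
    return [name for name, _ in ranked[:top_k]]


# Hypothetical descriptors, e.g. activations of a pre-trained CNN layer.
database = {
    "case_a": [0.9, 0.1, 0.0],
    "case_b": [0.1, 0.8, 0.3],
    "case_c": [0.8, 0.2, 0.1],
}
query = [1.0, 0.1, 0.0]
print(retrieve(query, database))
```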

Summary:
Content-based image retrieval as a whole has thus not seen many successful applications of deep learning methods yet, but given the results in other areas it seems only a matter of time.
An interesting avenue of research could be the direct training of deep networks for the retrieval task itself.

2.5.2 Image generation and enhancement

A variety of image generation and enhancement methods using deep architectures have been proposed, ranging from removing obstructing elements in images, normalizing images and improving image quality to data completion and pattern discovery.

Methods:
In image generation, 2D or 3D CNNs are used to convert one input image into another. Typically these architectures lack the pooling layers present in classification networks.
With multi-stream CNNs super-resolution images can be generated from multiple low-resolution inputs.

Summary:
Image generation has seen impressive results with very creative applications of deep networks in significantly differing tasks.

2.5.3 Combining image data with reports

The combination of text reports and medical image data has led to two avenues of research:
(1) leveraging reports to improve image classification accuracy
(2) generating text reports from images.
The latter is inspired by recent papers on caption generation for natural images.

Given the wealth of data that is available in PACS systems in terms of images and corresponding diagnostic reports, it seems like an ideal avenue for future deep learning research. One could expect that advances in captioning natural images will in time be applied to these data sets as well.

3.Application areas

We highlight some key contributions and discuss performance of systems on large data sets and on public challenge data sets.
All these challenges are listed on http://www.grand-challenge.org

3.1 Brain

DNNs have been extensively used for brain image analysis in several different application domains. (Table 1)

[Application domains]
A large number of studies address classification of Alzheimer's disease and segmentation of brain tissue and anatomical structures (e.g. the hippocampus). Other important areas are detection and segmentation of lesions (e.g. tumors, white matter lesions, lacunes, micro-bleeds).

Apart from the methods that aim for a scan-level classification (e.g. Alzheimer diagnosis), most methods learn mappings from local patches to representations and subsequently from representations to labels.
[Problem]
However, the local patches might lack the contextual information required for tasks where anatomical information is paramount.

[Solution]
To tackle this, Ghafoorian et al. (2016b) used non-uniformly sampled patches by gradually lowering sampling rate in patch sides to span a larger context. An alternative strategy used by many groups is multiscale analysis and a fusion of representations in a fully connected layer.

[Methods]
Even though brain images are 3D volumes in all surveyed studies, most methods work in 2D, analyzing the 3D volumes slice-by-slice. This is often motivated by either the reduced computational requirements or the thick slices relative to in-plane resolution in some data sets. More recent publications have also employed 3D networks.

[Summary]
In the BRATS, ISLES and MRBrains challenges, the top-ranking teams to date have all used CNNs.

Almost all of the aforementioned methods are concentrating on brain MR images. We expect that other brain imaging modalities such as CT and US can also benefit from deep learning based analysis.

3.2 Eye

Ophthalmic imaging

Most works employ simple CNNs for the analysis of color fundus imaging (CFI).

A wide variety of applications are addressed: segmentation of anatomical structures, segmentation and detection of retinal abnormalities, diagnosis of eye diseases, and image quality assessment.

Kaggle diabetic retinopathy detection competition: over 35,000 color fundus images were provided to train algorithms to predict the severity of disease in 53,000 test images.
The majority of teams use end-to-end CNNs.

3.3 Chest

In thoracic image analysis of both radiography and computed tomography (CT), the detection, characterization, and classification of nodules is the most commonly addressed application.

In chest X-ray, several groups detect multiple diseases with a single system.
In CT the detection of textural patterns indicative of interstitial lung diseases is also a popular research topic.

LUNA16, a challenge for nodule detection in CT: CNN architectures were used by all top performing systems.
The best systems in LUNA16 still rely on nodule candidates computed by rule-based image processing, but systems that use deep networks for candidate detection also performed very well (e.g. U-net).

Kaggle Data Science Bowl 2017: Estimating the probability that an individual has lung cancer from a CT scan

3.4 Digital pathology and microscopy

3.5 Breast

3.6 Cardiac

Deep learning has been applied to many aspects of cardiac image analysis.

[Domains]
MRI is the most researched modality and left ventricle segmentation the most common task.
Other application domains: segmentation, tracking, slice classification, image quality assessment, automated calcium scoring, coronary centerline tracking, and super-resolution.

[Methods]
(1)Most papers used simple 2D CNNs and analyzed the 3D and often 4D data slice by slice.
(2)The exception is Wolterink et al. (2016), where 3D CNNs were used.
(3)DBNs (deep belief networks) are used in four papers, but these all originated from the same author group. The DBNs are only used for feature extraction and are integrated in compound segmentation frameworks.
(4)Two papers combined CNNs with RNNs.

[Challenge]
Kaggle Data Science Bowl 2015: automatically measure end-systolic and end-diastolic volumes in cardiac MRI.

3.7 Abdomen

3.8 Musculoskeletal

3.9 Other

4.Discussion

4.1 Overview

(1) The earliest studies used pre-trained CNNs as feature extractors.
(2) In the last two years, end-to-end trained CNNs have become the preferred approach for medical imaging interpretation (the current standard practice).

4.2 Key aspects of successful deep learning methods

Although CNNs (and derivatives) are now clearly the top performers in most medical image analysis competitions, the exact architecture is not the most important determinant in getting a good solution.

(1)Expert knowledge about the task to be solved can provide advantages that go beyond adding more layers to a CNN.(e.g. novel data preprocessing or augmentation techniques.)
(2)Designing architectures incorporating unique task-specific properties can obtain better results than straightforward CNNs. (e.g. multi-view and multi-scale networks).
Other parts of network design are the network input size and receptive field (i.e. the area in input space that contributes to a single output unit).
(3)Model hyper-parameter optimization (e.g. learning rate, dropout rate) is a highly empirical exercise. It is of secondary importance with respect to performance compared to the previously discussed topics and training data quality.
Solutions: intuition-based random search (often works well enough); Bayesian methods for hyper-parameter optimization (not yet applied in medical image analysis at the time of the survey).
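
The receptive field mentioned under point (2) can be computed layer by layer with the standard recurrence r_out = r_in + (k - 1) * j_in and j_out = j_in * s, where k is the kernel size, s the stride, and j the spacing of neighbouring units measured in input pixels. The sketch below applies this along one axis; the layer lists are hypothetical examples, and dilation and padding are ignored for simplicity.

```python
def receptive_field(layers):
    """Receptive field (in input pixels, along one axis) of a stack of layers.

    `layers` is a list of (kernel_size, stride) pairs, applied in order.
    """
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump   # each layer widens the field by (k - 1) jumps
        jump *= stride              # striding spreads neighbouring units apart
    return rf


three_convs = [(3, 1), (3, 1), (3, 1)]          # three 3x3 convolutions
conv_pool_conv = [(3, 1), (2, 2), (3, 1)]       # conv, 2x2 max-pool, conv
print(receptive_field(three_convs), receptive_field(conv_pool_conv))
```

Comparing this number with the typical lesion or organ size gives a quick check on whether a single output unit can see enough context for the task.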

4.3 Unique challenges in medical image analysis

(1) The lack of large training data sets is often mentioned as an obstacle.
The main challenge is thus not the availability of image data itself, but the acquisition of relevant annotations/labeling for these images.

Turning reports into accurate annotations or structured labels in an automated manner requires sophisticated text-mining methods, which is an important field of study in itself where deep learning is also widely used nowadays.

[Solutions]: training a deep learning segmentation system for 3D segmentation using only sparse 2D segmentations; multiple-instance or active learning approaches; leveraging non-expert labels via crowd-sourcing; highlighting regions of interest by other means, reducing the need for expert experience (e.g. in histopathology one can sometimes use specific immunohistochemical stains).

(2) Label noise
(no consensus was forced)
Training a deep learning system on such data requires careful consideration of how to deal with noise and uncertainty in the reference standard.
[Solutions]: incorporating labeling uncertainty directly in the loss function (an open challenge).

(3) In medical imaging, classification or segmentation is often presented as a binary task: normal versus abnormal, object versus background. However, this is often a gross simplification as both classes can be highly heterogeneous.
[Solutions]:
Turning the deep learning system into a multi-class system by providing it with detailed annotations of all possible subclasses (simply not feasible); tackling this imbalance by incorporating intelligence in the training process itself, by applying selective sampling or hard negative mining (these fail when there is substantial noise in the reference standard).

(4) Class imbalance
In medical imaging, images for the abnormal class might be challenging to find.
[Solutions]: Applying specific data augmentation algorithms.

(5) Physicians often leverage a wealth of data on patient history, age, demographics and others to arrive at better decisions. Some authors have already investigated combining this information into deep learning networks in a straightforward manner. The improvements that were obtained were not as large as expected.
One of the challenges is to balance the number of imaging features in the deep learning network (typically thousands) with the number of clinical features (typically only a handful) to prevent the clinical features from being drowned out.

4.4 Outlook

(1) Several high-profile successes of deep learning in medical imaging have been reported, such as the work by Esteva et al. (2017) and Gulshan et al. (2016) in the fields of dermatology and ophthalmology.
However, ①both focus on small 2D color image classification; ②this also allowed the authors to use networks that were pre-trained on a very well-labeled dataset.
In contrast, ①in most medical imaging tasks 3D gray-scale or multi-channel images are used, for which pre-trained networks or architectures don't exist; ②this data typically has very specific challenges, like anisotropic voxel sizes, small registration errors between varying channels (e.g. in multi-parametric MRI) or varying intensity ranges; ③although many tasks in medical image analysis can be postulated as a classification problem, this might not always be the optimal strategy as it typically requires some form of post-processing with non-deep learning methods.

(2) A key area which can be highly relevant for medical imaging and is receiving (renewed) interest: unsupervised learning.

Unsupervised methods are attractive as ①they allow (initial) network training with the wealth of unlabeled data available in the world, and ②they are analogous to human learning.

Unsupervised strategies: ①variational auto-encoders (VAEs) ②generative adversarial networks (GANs)
The former merges variational Bayesian graphical models with neural networks as encoders/decoders. The latter uses two competing convolutional neural networks where one is generating artificial data samples and the other is discriminating artificial from real samples. Both have stochastic components and are generative networks. Most importantly, they can be trained end-to-end and learn representative features in a completely unsupervised manner.

(3) Deep learning methods have often been described as 'black boxes'. It is often not enough to have a good prediction system. This system also has to be able to articulate itself in a certain way.

Several strategies have been developed to understand what intermediate layers of convolutional networks are responding to: ①deconvolution networks, ②guided back-propagation, ③deep Taylor decomposition. Other approaches: ④tying predictions to textual representations of the image (i.e. captioning), ⑤combining Bayesian statistics with deep networks to obtain true network uncertainty estimates.

(4)We also foresee that deep learning approaches will be used for related, mostly unexplored tasks in medical imaging, such as image reconstruction (Wang, 2016).
