[Computer Vision | Object Detection] arXiv Computer Vision Digest on Object Detection (papers collected on August 29)

Article directory

1. Detection related (18 articles)

1.1 Neural Network Training Strategy to Enhance Anomaly Detection Performance: A Perspective on Reconstruction Loss Amplification

https://arxiv.org/abs/2308.14595

Unsupervised anomaly detection (UAD) is a widely adopted approach in industry due to the rarity of anomalies and the resulting data imbalance. A desirable property of UAD models is contained generalization ability: excelling at reconstructing seen normal patterns while struggling with unseen anomalies. Recent studies have pursued contained generalization of the reconstruction from different perspectives, such as the structure of the neural network (NN) and the design of training strategies. In contrast, we note that contained generalization of the reconstruction can also simply be achieved through a steep-shaped loss landscape. Motivated by this, we propose a loss-landscape-sharpening method that amplifies the reconstruction loss, called Loss AMPlification (LAMP). LAMP deforms the loss landscape into a steeper shape, so the reconstruction error on unseen anomalies becomes larger. Accordingly, anomaly detection performance is improved without any change to the NN architecture. Our findings suggest that LAMP can be easily applied to any reconstruction-error metric in UAD settings where the reconstruction model is trained with anomaly-free samples.
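The amplification idea can be sketched with a toy loss wrapper. The paper defines the exact LAMP formulation; here `-log(1 - x)` merely stands in for any monotone map that steepens the loss landscape as the (normalized) reconstruction error grows:

```python
import math

def amplify_loss(recon_error, eps=1e-6):
    """Illustrative loss amplification: map a normalized reconstruction
    error in [0, 1) through -log(1 - x). Near zero the curve tracks the
    identity, but it grows much faster as the error increases, so unseen
    anomalies (which reconstruct poorly) are penalized disproportionately.
    This is a stand-in, not necessarily the paper's exact formulation."""
    return -math.log(max(1.0 - recon_error, eps))
```

For well-reconstructed normal samples the amplified loss stays close to the raw error (`amplify_loss(0.01)` is about 0.01), while a large error such as 0.9 maps to about 2.3, sharpening the separation between normal and anomalous inputs.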

1.2 SAAN: Similarity-aware attention flow network for change detection with VHR remote sensing images

https://arxiv.org/abs/2308.14570

Change detection (CD) is a fundamental and important task in the field of earth observation. Existing deep-learning-based CD methods usually use a weight-sharing Siamese encoder network to extract bitemporal image features and a decoder network to identify changed regions. However, the performance of these CD methods is still far from satisfactory, and we observe that 1) the deep encoder layers focus on irrelevant background regions, and 2) the model's confidence is inconsistent across different decoder stages. The first problem arises because the deep encoder layers cannot effectively learn from the imbalanced change categories using only a single output supervision, while the second is attributed to the lack of explicit preservation of semantic consistency. To address these issues, we design a novel Similarity-Aware Attention flow Network (SAAN). SAAN incorporates a similarity-guided attention flow module with deeply supervised similarity optimization to achieve effective change detection. Specifically, we address the first issue by explicitly guiding the deep encoder layers to discover semantic relations from the bitemporal input images using deeply supervised similarity optimization: the extracted features are optimized to be semantically similar in unchanged regions and semantically different in changed regions. The second shortcoming is mitigated by the proposed similarity-guided attention flow module, which combines a similarity-guided attention module and an attention flow mechanism to guide the model to focus on discriminative channels and regions. We evaluate the effectiveness and generalization ability of the proposed method by conducting experiments on a wide range of CD tasks. Experimental results demonstrate that our method achieves excellent performance on several CD tasks, with discriminative features and semantic consistency preserved.
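The deeply supervised similarity optimization can be illustrated with a minimal per-pixel objective, assuming cosine similarity between bitemporal feature maps (the paper's actual loss and supervision scheme may differ):

```python
import numpy as np

def similarity_loss(feat_t1, feat_t2, change_mask, margin=0.0):
    """Illustrative similarity objective: per-pixel cosine similarity
    between bitemporal feature maps is pushed toward 1 in unchanged
    regions and below `margin` in changed regions.
    feat_*: (C, H, W) arrays; change_mask: (H, W) array in {0, 1}."""
    eps = 1e-8
    dot = (feat_t1 * feat_t2).sum(axis=0)
    norm = np.linalg.norm(feat_t1, axis=0) * np.linalg.norm(feat_t2, axis=0) + eps
    cos = dot / norm                                        # (H, W) in [-1, 1]
    unchanged = (1.0 - cos) * (1 - change_mask)             # pull together
    changed = np.maximum(cos - margin, 0.0) * change_mask   # push apart
    return (unchanged + changed).mean()
```

Applying such a loss at several encoder depths (deep supervision) is what guides the deeper layers toward change-relevant regions.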

1.3 Face Presentation Attack Detection by Excavating Causal Clues and Adapting Embedding Statistics

https://arxiv.org/abs/2308.14551

Recent face presentation attack detection (PAD) methods leverage domain adaptation (DA) and domain generalization (DG) techniques to address performance degradation on unknown domains. However, DA-based PAD methods require access to unlabeled target data, while most DG-based PAD solutions rely on a priori knowledge, i.e., known domain labels. Moreover, most DA-/DG-based methods are computationally intensive, requiring complex model architectures and/or multi-stage training procedures. This paper proposes to model face PAD as a compound DG task from a causal perspective, linking it to model optimization. We excavate the causal factors hidden in the high-level representation via counterfactual intervention. Furthermore, we introduce a class-guided MixStyle to enrich the feature-level data distribution within a class rather than focusing on domain information. Neither the class-guided MixStyle nor the counterfactual intervention component introduces additional trainable parameters, and both incur negligible computational cost. Extensive cross-dataset experiments and analyses demonstrate the effectiveness and efficiency of our method compared to state-of-the-art PAD approaches. The implementation and trained weights are publicly available.
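The class-guided MixStyle component can be sketched as follows. MixStyle mixes channel-wise feature statistics (mean and standard deviation) between instances; the class guidance, as we read the abstract, restricts mixing partners to instances of the same class. Shapes and details here are illustrative:

```python
import numpy as np

def class_guided_mixstyle(features, labels, alpha=0.1, rng=None):
    """Sketch of class-guided MixStyle (assumed behavior, parameter-free):
    each instance's channel-wise mean/std are mixed with those of another
    instance of the same class, enriching style statistics inside a class
    without using domain labels. features: (N, C, H, W); labels: (N,)."""
    rng = rng or np.random.default_rng(0)
    eps = 1e-6
    mu = features.mean(axis=(2, 3), keepdims=True)       # (N, C, 1, 1)
    sig = features.std(axis=(2, 3), keepdims=True) + eps
    normed = (features - mu) / sig
    lam = rng.beta(alpha, alpha)                         # mixing coefficient
    out = np.empty_like(features)
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        perm = rng.permutation(idx)                      # same-class partner
        mixed_mu = lam * mu[idx] + (1 - lam) * mu[perm]
        mixed_sig = lam * sig[idx] + (1 - lam) * sig[perm]
        out[idx] = normed[idx] * mixed_sig + mixed_mu
    return out
```

Because only existing statistics are re-mixed, the operation adds no trainable parameters, which matches the efficiency claim in the abstract.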

1.4 Group Regression for Query Based Object Detection and Tracking

https://arxiv.org/abs/2308.14481

Group regression is commonly used in 3D object detection to predict the box parameters of similar classes in joint heads, aiming to benefit from class similarity while separating highly dissimilar classes. For query-based perception methods, this has not been feasible so far. We close this gap and propose a method, specifically designed for use in the context of autonomous driving in the 3D domain, that integrates multi-class group regression into existing attention- and query-based perception approaches. We enhance a Transformer-based joint object detection and tracking model with this approach and thoroughly evaluate its behavior and performance. For group regression, the classes of the nuScenes dataset are divided into six groups of similar shape and prevalence, each regressed by a dedicated head. We show that the method is applicable to many existing Transformer-based perception methods and can bring potential benefits. The behavior of query-based group regression is thoroughly analyzed, e.g., in terms of class-switching behavior and the distribution of output parameters. The proposed method offers many possibilities for further research, such as in the direction of deep multi-hypothesis tracking.
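The routing idea — similar classes share one dedicated regression head — can be sketched with a plain dispatch table. The grouping and the heads below are hypothetical stand-ins, not the paper's actual six nuScenes groups or its learned heads:

```python
def make_group_regressor(groups, heads):
    """Route each class to its group's dedicated regression head.
    `groups`: dict group_name -> list of class names;
    `heads`:  dict group_name -> callable(features) -> box parameters."""
    group_of = {cls: g for g, classes in groups.items() for cls in classes}
    def regress(features, cls):
        return heads[group_of[cls]](features)
    return regress

# Hypothetical grouping of a few nuScenes classes by shape/prevalence;
# the heads are stubs standing in for learned regression layers.
groups = {"large_vehicle": ["truck", "bus"],
          "small_object": ["pedestrian", "bicycle"]}
heads = {g: (lambda feats, g=g: {"group": g, "box": feats}) for g in groups}
regress = make_group_regressor(groups, heads)
```

In the query-based setting described in the abstract, `features` would be the decoded query embedding and each head a small MLP shared by its group.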

1.5 Improving the performance of object detection by preserving label distribution

https://arxiv.org/abs/2308.14466

Object detection is the task of localizing and classifying objects in images or videos, and the information obtained through this process plays a crucial role in various computer vision tasks. In object detection, the data for training and validation are often derived from public datasets that are balanced in terms of the number of objects of each class. However, in real-world scenarios it is more common to deal with datasets exhibiting much larger class imbalance, i.e., very different numbers of objects per class, and this imbalance may reduce object detection performance when predicting on unseen test images. Therefore, in this study we propose a method that evenly distributes the classes across the images used for training and validation, addressing the class imbalance problem in object detection. Our proposed method aims to maintain a uniform class distribution through multi-label stratification. We tested it not only on public datasets, which typically exhibit balanced class distributions, but also on custom datasets, which may have imbalanced class distributions. We find that the proposed method is more effective on datasets with severe imbalance and little data. Our results demonstrate that it can be effectively applied to datasets with substantially imbalanced class distributions.
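A greedy form of multi-label stratification can be sketched as follows; this illustrates the general technique, not the authors' exact algorithm. Images carrying globally rare classes are placed first, each going to whichever split proportionally most lacks its classes:

```python
from collections import Counter

def greedy_stratified_split(image_labels, val_fraction=0.2):
    """Greedy multi-label stratification sketch: keep per-class object
    counts in the train/val splits close to the target ratio.
    image_labels: dict image_id -> list of class labels of its objects."""
    total = Counter(c for labels in image_labels.values() for c in labels)
    need_val = {c: total[c] * val_fraction for c in total}
    need_train = {c: total[c] * (1 - val_fraction) for c in total}
    train, val = [], []
    # rare-class-first: handle images whose rarest class is globally rarest
    order = sorted(image_labels,
                   key=lambda i: min(total[c] for c in image_labels[i]))
    for img in order:
        counts = Counter(image_labels[img])
        deficit_val = sum(need_val[c] for c in counts)
        deficit_train = sum(need_train[c] for c in counts)
        # compare deficits relative to each split's target size
        if deficit_val * (1 - val_fraction) >= deficit_train * val_fraction:
            val.append(img)
            for c, n in counts.items():
                need_val[c] -= n
        else:
            train.append(img)
            for c, n in counts.items():
                need_train[c] -= n
    return train, val
```

Iterative stratification of this style is what libraries such as scikit-multilearn implement more carefully; the point here is only that splitting by per-class object counts, not by image count, preserves the label distribution.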

1.6 Bridging Cross-task Protocol Inconsistency for Distillation in Dense Object Detection

https://arxiv.org/abs/2308.14286

Knowledge distillation (KD) has shown potential for learning compact models for dense object detection. However, the commonly used softmax-based distillation ignores the absolute classification scores of individual classes, so an optimum of the distillation loss does not necessarily yield optimal student classification scores for dense object detectors. This cross-task protocol inconsistency is critical, especially for dense object detectors, since the foreground categories are extremely imbalanced. To address the issue of inconsistent protocols between distillation and classification, we propose a novel distillation method with cross-task consistent protocols, tailored for dense object detection. For classification distillation, we address the cross-task protocol inconsistency by formulating the classification logit maps of the teacher and student models as multiple binary classification maps and applying a binary classification distillation loss to each map. For localization distillation, we design an IoU-based localization distillation loss that is free of specific network structures and comparable with existing localization distillation losses. Our proposed method is simple yet effective, and experimental results show that it outperforms existing methods. Code is available at https://github.com/TinyTigerPan/BCKD.
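The classification part can be sketched per class map: instead of softmax KD, each location's logit is treated as an independent binary problem, and the student's sigmoid probability is pulled toward the teacher's with a binary cross-entropy, which preserves absolute per-class scores. A minimal scalar version (the paper adds weighting and a localization term):

```python
import math

def binary_distill_loss(student_logits, teacher_logits):
    """Sketch of binary classification distillation: for one class map,
    use the teacher's sigmoid probability as a soft binary target for
    the student at every location. Unlike softmax KD, the absolute
    per-class score (not just its rank) is transferred."""
    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))
    eps = 1e-7
    loss = 0.0
    for s, t in zip(student_logits, teacher_logits):
        p_t = sigmoid(t)
        p_s = min(max(sigmoid(s), eps), 1 - eps)
        loss += -(p_t * math.log(p_s) + (1 - p_t) * math.log(1 - p_s))
    return loss / len(student_logits)
```

The loss is minimized exactly when the student reproduces the teacher's per-location probabilities, which is the "consistent protocol" the abstract refers to.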

1.7 Intergrated Segmentation and Detection Models for Dentex Challenge 2023

https://arxiv.org/abs/2308.14161

Dental panoramic X-rays are commonly used for dental diagnosis. With the development of deep learning technology, automatic detection of diseases in dental panoramic radiographs can help dentists to diagnose diseases more effectively. Dentex Challenge 2023 is a competition to automatically detect abnormal teeth and their enumerated IDs from dental panoramic radiographs. In this paper, we propose a method that combines segmentation and detection models to detect abnormal teeth and obtain their enumerated IDs. Our code is available at https://github.com/xyzlancehe/DentexSegAndDet.

1.8 Superpixels algorithms through network community detection

https://arxiv.org/abs/2308.14101

Community detection is a powerful tool in complex network analysis, with applications in various research fields. Several image segmentation methods rely on community detection algorithms, e.g., as black boxes, in order to compute an under-segmentation, i.e., a small number of regions that represent the regions of interest of the image. However, to the best of our knowledge, the efficiency of this approach with respect to the purpose of superpixels, namely representing the image at a finer level while preserving as much of the original information as possible, has been ignored so far. The only related work seems to be that of Liu et al. (IET Image Processing, 2022), who developed a superpixel algorithm using the so-called modularity maximization method, leading to relevant results. We follow this line of research by investigating the efficiency of superpixels computed by state-of-the-art community detection algorithms on 4-connected pixel graphs (so-called pixel grids). We first detect communities on such a graph and then apply a simple merging procedure to obtain the desired number of superpixels. As we shall see, according to different widely used metrics, based either on comparison with ground truth or on the superpixels alone, such an approach leads to the computation of relevant superpixels, as emphasized by both qualitative and quantitative experiments. We observe that the choice of community detection algorithm has a strong influence on the number of communities and thus on the merging process. Similarly, small variations on the pixel grid can yield different results both qualitatively and quantitatively. For completeness, we compare our results with those of several state-of-the-art superpixel algorithms evaluated by Stutz et al. (Computer Vision and Image Understanding, 2018).
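The pipeline — build a 4-connected pixel grid, detect communities, then merge — can be imitated in miniature with label propagation, a simple community detection algorithm standing in for the state-of-the-art detectors the paper benchmarks:

```python
import random

def pixel_grid_communities(image, sim_threshold=0.1, iters=10, seed=0):
    """Toy pipeline sketch: build a 4-connected pixel grid whose edges
    link pixels of similar intensity, then run asynchronous label
    propagation so each pixel adopts its neighbors' majority label.
    The resulting communities act as superpixels; a merging step
    (omitted) would then reach a target superpixel count.
    image: 2D list of intensities in [0, 1]."""
    h, w = len(image), len(image[0])
    labels = {(r, c): r * w + c for r in range(h) for c in range(w)}
    def neighbors(r, c):
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if (0 <= rr < h and 0 <= cc < w
                    and abs(image[r][c] - image[rr][cc]) <= sim_threshold):
                yield rr, cc
    rng = random.Random(seed)
    nodes = list(labels)
    for _ in range(iters):
        rng.shuffle(nodes)
        for r, c in nodes:
            counts = {}
            for nb in neighbors(r, c):
                counts[labels[nb]] = counts.get(labels[nb], 0) + 1
            if counts:
                # majority label; ties broken toward the smaller label id
                labels[(r, c)] = max(counts, key=lambda k: (counts[k], -k))
    return labels
```

Because labels only travel along similarity edges, dissimilar regions can never share a community — the property that makes such communities usable as superpixels.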

1.9 A comprehensive review on Plant Leaf Disease detection using Deep learning

https://arxiv.org/abs/2308.14087

Leaf diseases are common and often fatal diseases of plants. Early diagnosis and detection are necessary to improve the prognosis of leaf diseases affecting plants. To predict leaf diseases, several automated systems have been developed using different phytopathological imaging modalities. This paper presents a systematic literature review of deep-learning-based models for the diagnosis of various plant leaf diseases. It discusses the advantages and limitations of different deep learning models, including Vision Transformer (ViT), Deep Convolutional Neural Network (DCNN), Convolutional Neural Network (CNN), residual skip network-based super-resolution for leaf disease detection (RSNSR-LDD), Disease Detection Network (DDN), and YOLO (You Only Look Once). The review also shows that studies on leaf disease detection apply different deep learning models to a number of publicly available datasets, and that existing work compares model performance using metrics such as accuracy, precision, and recall.

1.10 Practical Edge Detection via Robust Collaborative Learning

https://arxiv.org/abs/2308.14084

Edge detection, as a core component of vision-oriented tasks, is to identify object boundaries and salient edges in natural images. An edge detector is desired to be both efficient and accurate for practical use. To achieve this goal, two key issues should be addressed: 1) how to free deep edge models from the inefficient pre-trained backbones utilized by most existing deep learning methods, in order to save computational cost and reduce model size; and 2) how to mitigate the negative influence of noisy and even wrong annotations, which are common in edge detection due to the subjectivity and ambiguity of annotators, so as to improve the robustness and accuracy of edge detection. In this paper, we attempt to address both issues simultaneously by developing a collaborative-learning-based model called PEdger. The rationale behind our PEdger is that the information learned from different training moments and heterogeneous architectures (recurrent and non-recurrent in this work) can be assembled to explore robust knowledge against noisy annotations, even without the help of pre-training on extra data. Extensive ablation studies as well as quantitative and qualitative experimental comparisons on the BSDS500 and NYUD datasets verify the effectiveness of our design and demonstrate its superiority over other competitors in terms of accuracy, speed, and model size. Code can be found at https://github.com/ForawardStar/PEdger.

1.11 DETDet: Dual Ensemble Teeth Detection

https://arxiv.org/abs/2308.14070

The field of dentistry is in the era of digital transformation. Particularly, artificial intelligence is anticipated to play a significant role in digital dentistry. AI holds the potential to significantly assist dental practitioners and elevate diagnostic accuracy. In alignment with this vision, the 2023 MICCAI DENTEX challenge aims to enhance the performance of dental panoramic X-ray diagnosis and enumeration through technological advancement. In response, we introduce DETDet, a Dual Ensemble Teeth Detection network. DETDet encompasses two distinct modules dedicated to enumeration and diagnosis. Leveraging the advantages of teeth mask data, we employ Mask-RCNN for the enumeration module. For the diagnosis module, we adopt an ensemble model comprising DiffusionDet and DINO. To further enhance precision scores, we integrate a complementary module to harness the potential of unlabeled data. The code for our approach will be made accessible at https://github.com/Bestever-choi/Evident

1.12 Hierarchical Contrastive Learning for Pattern-Generalizable Image Corruption Detection

https://arxiv.org/abs/2308.14061

Effective image restoration under large-size corruptions, such as blind image inpainting, requires accurate detection of the corruption-region masks, which remains very challenging due to the diverse shapes and patterns of corruption. In this work, we propose a novel method for automatic corruption detection, which enables blind corruption restoration without known corruption masks. Specifically, we develop a hierarchical contrastive learning framework to detect corrupted regions by capturing the intrinsic semantic distinctions between corrupted and non-corrupted regions. In particular, our model detects corruption masks in a coarse-to-fine manner: it first predicts a coarse mask via contrastive learning in a low-resolution feature space, and then refines the uncertain regions of the mask via high-resolution contrastive learning. A specialized hierarchical interaction mechanism is designed to facilitate knowledge transfer between contrastive learning at different scales, substantially improving modeling performance. The detected multi-scale corruption masks are then leveraged to guide corruption restoration. Because the model detects corrupted regions by learning their contrastive distinctions rather than the semantic patterns of corruption, it generalizes well. Extensive experiments show that our model 1) outperforms other methods on both corruption detection and various image restoration tasks, including blind inpainting and watermark removal, and 2) generalizes across different corruption patterns, such as graffiti, random noise, or other image content. Code and trained weights are available at https://github.com/xyfJASON/HCL.

1.13 Joint Gaze-Location and Gaze-Object Detection

https://arxiv.org/abs/2308.13857

In this paper, we propose an effective and efficient method for joint gaze location detection (GL-D) and gaze object detection (GO-D), i.e., gaze following detection. Current approaches frame GL-D and GO-D as two separate tasks in a multi-stage framework, where human head crops must first be detected and then fed into a subsequent GL-D sub-network, which is further followed by an additional object detector for GO-D. In contrast, we reframe the gaze following detection task as detecting human head locations and their gaze followings simultaneously, aiming to jointly detect human gaze locations and gaze objects in a unified, single-stage pipeline. To this end, we propose GTR, short for Gaze following detection TRansformer, which streamlines the gaze following detection pipeline by eliminating all additional components, achieving for the first time a unified paradigm that brings GL-D and GO-D together in a fully end-to-end manner. GTR enables iterative interaction between global semantics and human head features through a hierarchical structure, inferring the relations of salient objects and human gaze from the global image context, and yields impressive accuracy. Concretely, GTR achieves a 12.1 mAP gain on GazeFollowing (25.1%), an 18.2 mAP gain on VideoAttentionTarget (43.3%), and a 19 mAP improvement on GOO-Real (45.2%). Meanwhile, unlike existing systems that detect gaze followings sequentially because they need human heads as inputs, GTR has the flexibility to comprehend the gaze followings of any number of people simultaneously, resulting in high efficiency: GTR achieves a 9x efficiency improvement, and the relative gap becomes more pronounced as the number of humans grows.

1.14 SOGDet: Semantic-Occupancy Guided Multi-view 3D Object Detection

https://arxiv.org/abs/2308.13794

In the field of autonomous driving, accurate and comprehensive perception of the 3D environment is crucial. Bird's-eye-view (BEV) based methods have emerged as a promising solution for 3D object detection from multi-view images. However, existing 3D object detection methods often ignore the physical context in the environment, such as sidewalks and vegetation, resulting in suboptimal performance. In this paper, we propose a novel method called SOGDet (Semantic-Occupancy-Guided Multi-view 3D Object Detection) that leverages a 3D semantic occupancy branch to improve the accuracy of 3D object detection. In particular, the physical context modeled by semantic occupancy helps the detector perceive the scene in a more holistic view. SOGDet is flexible to use and can be seamlessly integrated with most existing BEV-based methods. To evaluate its effectiveness, we apply it to several state-of-the-art baselines and conduct extensive experiments on the widely used nuScenes dataset. Our results show that SOGDet consistently improves the performance of three baseline methods in terms of nuScenes Detection Score (NDS) and mean Average Precision (mAP). This indicates that combining 3D object detection with 3D semantic occupancy leads to a more comprehensive perception of the 3D environment, which can help build more robust autonomous driving systems. The code is available at: https://github.com/zhouqiu/SOGDet.

1.15 Out-of-distribution detection using normalizing flows on the data manifold

https://arxiv.org/abs/2308.13792

A common approach to out-of-distribution detection involves estimating the underlying data distribution, which assigns lower likelihood values to out-of-distribution data. Normalizing flows are likelihood-based generative models that provide tractable density estimation via dimension-preserving invertible transformations. Conventional normalizing flows are prone to failure in out-of-distribution detection, owing to the well-known curse-of-dimensionality problem of likelihood-based models. According to the manifold hypothesis, real-world data often lie on a low-dimensional manifold. This study investigates the effect of manifold learning on out-of-distribution detection with normalizing flows. We estimate the density on the low-dimensional manifold, coupled with the distance from the manifold, as criteria for out-of-distribution detection; individually, however, each of them is insufficient for this task. Extensive experimental results show that manifold learning improves the out-of-distribution detection ability of the class of likelihood-based models known as normalizing flows. This improvement is achieved without modifying the model structure or using auxiliary out-of-distribution data during training.
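The two criteria — density on the manifold and distance from it — can be combined into a single score. In this sketch a PCA subspace and a Gaussian stand in for the learned manifold and the normalizing-flow density of the paper:

```python
import numpy as np

def ood_score(x, data, n_components=1, lam=1.0):
    """Illustrative combination of the abstract's two signals:
    (a) a density estimate on a low-dimensional manifold and
    (b) the distance from that manifold. The 'manifold' here is a PCA
    subspace and the on-manifold density a Gaussian fit to projected
    training data -- simple stand-ins for a normalizing flow.
    Higher score = more likely out-of-distribution."""
    mean = data.mean(axis=0)
    centered = data - mean
    # principal directions of the training data span the linear manifold
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]                     # (k, D)
    z_train = centered @ basis.T                  # on-manifold coordinates
    mu, var = z_train.mean(axis=0), z_train.var(axis=0) + 1e-6
    z = (x - mean) @ basis.T
    recon = z @ basis + mean
    off_manifold = np.linalg.norm(x - recon)      # distance from manifold
    neg_log_density = 0.5 * (((z - mu) ** 2) / var + np.log(2 * np.pi * var)).sum()
    return neg_log_density + lam * off_manifold
```

Either term alone fails in a way that mirrors the abstract: the density term misses points far off the manifold but projecting near the data, while the distance term misses on-manifold points far from the training support.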

1.16 Zero-Shot Edge Detection with SCESAME: Spectral Clustering-based Ensemble for Segment Anything Model Estimation

https://arxiv.org/abs/2308.13779

This paper proposes a novel zero-shot edge detection method with SCESAME, which stands for Spectral-Clustering-based Ensemble for Segment Anything Model Estimation, based on the recently proposed Segment Anything Model (SAM). SAM is a foundation model for segmentation tasks, and one of its interesting applications is automatic mask generation (AMG), which generates zero-shot segmentation masks for an entire image. AMG can be applied to edge detection, but suffers from over-detecting edges. Edge detection with SCESAME overcomes this problem in three steps: (1) eliminating small generated masks, (2) combining masks by spectral clustering, taking mask positions and overlaps into account, and (3) removing artifacts after edge detection. We conducted edge detection experiments on two datasets, BSDS500 and NYUDv2. Although our zero-shot approach is simple, the experimental results on BSDS500 are almost on par with human performance and with CNN-based methods from seven years ago. In the NYUDv2 experiments, it performs almost as well as recent CNN-based methods. These results indicate that our method has the potential to serve as a strong baseline for future zero-shot edge detection methods. Furthermore, SCESAME is applicable not only to edge detection but also to other downstream zero-shot tasks.
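Steps (1) and (2) can be sketched by clustering masks on their pairwise IoU affinity. This toy version uses the sign of the Fiedler vector for a single spectral bipartition; the paper's clustering and ensembling are more elaborate:

```python
import numpy as np

def merge_masks_spectral(masks, min_area=2):
    """Toy SCESAME-style mask merging: drop tiny masks, build a pairwise
    IoU affinity between the remaining binary masks, split them into two
    groups via a bare-bones spectral bipartition (sign of the Fiedler
    vector of the graph Laplacian), then union each group into one
    merged mask. masks: list of 2D 0/1 numpy arrays."""
    masks = [m for m in masks if m.sum() >= min_area]       # step 1: size filter
    n = len(masks)
    aff = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            inter = np.logical_and(masks[i], masks[j]).sum()
            union = np.logical_or(masks[i], masks[j]).sum()
            aff[i, j] = inter / union if union else 0.0     # IoU as affinity
    lap = np.diag(aff.sum(axis=1)) - aff                    # graph Laplacian
    _, vecs = np.linalg.eigh(lap)
    fiedler = vecs[:, 1]            # eigenvector of 2nd-smallest eigenvalue
    merged = []
    for flag in (True, False):                              # step 2: bipartition
        members = [masks[i] for i in range(n) if (fiedler[i] > 0) == flag]
        if members:
            merged.append(np.logical_or.reduce(members).astype(int))
    return merged
```

Merging heavily overlapping masks before tracing edges is what suppresses the duplicated boundaries that make raw AMG over-detect.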

1.17 Post-Hoc Explainability of BI-RADS Descriptors in a Multi-task Framework for Breast Cancer Detection and Segmentation

https://arxiv.org/abs/2308.14213

Despite recent medical advances, breast cancer remains one of the most prevalent and deadly diseases among women. Although machine learning-based computer-aided diagnosis (CAD) systems have shown potential to help radiologists analyze medical images, the opacity of the best-performing CAD systems has raised concerns about their trustworthiness and interpretability. This paper presents MT-BI-RADS, a novel interpretable deep learning method for tumor detection in breast ultrasound (BUS) images. The method provides three levels of interpretation, enabling radiologists to understand the decision-making process for predicting tumor malignancy. First, the BI-RADS categories output by the proposed model are used by radiologists for BUS image analysis. Second, the model employs multi-task learning to simultaneously segment regions in images corresponding to tumors. Third, the output of the proposed method quantifies the contribution of each BI-RADS descriptor to the prediction of benign or malignant class using post-hoc interpretation with Shapley values.
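The third level of explanation relies on Shapley values. For a handful of BI-RADS descriptors they can even be computed exactly; the sketch below does so for any small feature set and a user-supplied model value function (the descriptor names in the usage are hypothetical):

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value_fn):
    """Exact Shapley values for a small feature set: each feature's
    attribution is its average marginal contribution to value_fn over
    all subsets of the remaining features, with the standard weights.
    Exponential in len(features), so only viable for a few descriptors."""
    n = len(features)
    phi = {}
    for f in features:
        others = [x for x in features if x != f]
        contrib = 0.0
        for k in range(len(others) + 1):
            for subset in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                with_f = value_fn(frozenset(subset) | {f})
                without_f = value_fn(frozenset(subset))
                contrib += weight * (with_f - without_f)
        phi[f] = contrib
    return phi
```

By the efficiency property, the attributions sum to the model output for the full descriptor set, so each descriptor's share of the benign/malignant score is directly readable.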

1.18 Bias in Unsupervised Anomaly Detection in Brain MRI

https://arxiv.org/abs/2308.13861

Unsupervised anomaly detection methods offer a promising and flexible alternative to supervised approaches, holding the potential to revolutionize medical scan analysis and improve diagnostic performance. In the current landscape, it is commonly assumed that differences between a test case and the training distribution are attributed solely to pathological conditions, implying that any divergence indicates an anomaly. However, the presence of other potential sources of distributional shift, including scanner, age, sex, or race, is frequently overlooked. These shifts can significantly affect the accuracy of the anomaly detection task. Prominent instances of such failures have sparked concerns regarding the bias, credibility, and fairness of anomaly detection. This work presents a novel analysis of biases in unsupervised anomaly detection. By examining potential non-pathological distributional shifts between the training and testing distributions, we shed light on the extent of these biases and their influence on anomaly detection results. Moreover, this study examines algorithmic limitations that arise in the face of such shifts, providing valuable insights into the challenges anomaly detection algorithms encounter in accurately learning and capturing the entire range of variability present in the normative distribution. Through this analysis, we aim to enhance the understanding of these biases and pave the way for future improvements in the field. Here, we specifically investigate Alzheimer's disease detection from brain MR imaging as a case study, revealing significant biases related to sex, race, and scanner variation that substantially impact the results. These findings align with the broader goal of improving the reliability, fairness, and effectiveness of anomaly detection in medical imaging.

Origin blog.csdn.net/wzk4869/article/details/132715245