[Computer Vision | Image Segmentation] arXiv Academic Express on Image Segmentation (a collection of papers from August 14)

1. Segmentation | Semantics-related (5 papers)

1.1 Spatial-information Guided Adaptive Context-aware Network for Efficient RGB-D Semantic Segmentation

https://arxiv.org/abs/2308.06024

Efficient RGB-D semantic segmentation has received extensive attention in mobile robotics, where it plays a crucial role in analyzing and recognizing environmental information. Previous studies show that depth information can provide corresponding geometric relationships for objects and scenes, but actual depth data is usually noisy. To avoid adverse effects on segmentation accuracy and computation, it is necessary to design an efficient framework that exploits cross-modal correlations and complementary cues. In this paper, we propose an efficient lightweight encoder-decoder network that reduces the computational parameters while guaranteeing the robustness of the algorithm. Using channel-wise and spatially fused attention modules, our network effectively captures multi-level RGB-D features. A globally guided local affinity context module is proposed to obtain sufficient high-level context information. The decoder utilizes a lightweight residual unit that combines short-range and long-range information with a few redundant computations. Experimental results on the NYUv2, SUN RGB-D, and Cityscapes datasets show that our method achieves a better trade-off among segmentation accuracy, inference time, and parameter count than state-of-the-art methods. The source code is available at https://github.com/MVME-HBUT/SGACNet
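
As a rough illustration of the fused-attention idea above, the following sketch reweights a summed RGB-D feature map first per channel and then per spatial location. This is a minimal PyTorch sketch under our own assumptions; the module and parameter names are ours, not the paper's, and the authors' actual implementation lives at https://github.com/MVME-HBUT/SGACNet.

```python
# Hypothetical channel- and spatial-attention fusion block for RGB-D features,
# in the spirit of the fused attention modules described above. Names and
# hyperparameters are assumptions, not the authors' code.
import torch
import torch.nn as nn

class ChannelSpatialFusion(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel attention: squeeze spatial dims, excite per channel.
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        # Spatial attention: one-channel map from concatenated avg/max pooling.
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        fused = rgb + depth                        # element-wise modality fusion
        fused = fused * self.channel_gate(fused)   # reweight channels
        avg = fused.mean(dim=1, keepdim=True)
        mx = fused.amax(dim=1, keepdim=True)
        attn = self.spatial_gate(torch.cat([avg, mx], dim=1))
        return fused * attn                        # reweight spatial locations

# Usage: fuse 64-channel RGB and depth feature maps of size 56x56.
block = ChannelSpatialFusion(64)
out = block(torch.randn(2, 64, 56, 56), torch.randn(2, 64, 56, 56))
```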

1.2 FoodSAM: Any Food Segmentation

https://arxiv.org/abs/2308.05938

In this paper, we explore the zero-shot capabilities of the Segment Anything Model (SAM) for food image segmentation. To address the lack of class-specific information in SAM-generated masks, we propose a new framework called FoodSAM. This approach combines coarse semantic masks with SAM-generated masks to improve semantic segmentation quality. Furthermore, we observe that ingredients in food can be regarded as independent entities, which motivates us to perform instance segmentation on food images. FoodSAM also extends its zero-shot capability to panoptic segmentation by incorporating an object detector, which enables it to capture non-food object information. Drawing inspiration from the recent success of promptable segmentation, we further extend FoodSAM to promptable segmentation, supporting various prompt variants. FoodSAM thus emerges as an all-encompassing solution capable of segmenting food at multiple levels of granularity. Notably, this framework is the first work to achieve instance, panoptic, and promptable segmentation on food images. Extensive experiments demonstrate the feasibility and impressive performance of FoodSAM, validating SAM's potential as a prominent and influential tool in food image segmentation. We release the code at https://github.com/jamesjg/FoodSAM.
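
A plausible reading of "combines coarse semantic masks with SAM-generated masks" is a majority vote: each class-agnostic SAM mask is assigned the most frequent label it overlaps in the coarse semantic map. The sketch below is our hedged illustration of that idea, not FoodSAM's actual code; function names are assumptions, and the real pipeline is at https://github.com/jamesjg/FoodSAM.

```python
# Hedged sketch of mask-label merging: vote each SAM region to the majority
# label from a coarse semantic segmentation map. Names are illustrative.
import numpy as np

def merge_sam_with_semantic(sam_masks: list[np.ndarray],
                            semantic_map: np.ndarray) -> np.ndarray:
    """sam_masks: list of HxW boolean masks; semantic_map: HxW integer labels."""
    refined = semantic_map.copy()
    for mask in sam_masks:
        labels = semantic_map[mask]
        if labels.size == 0:
            continue
        # Majority vote: the most frequent coarse label inside this SAM region.
        refined[mask] = np.bincount(labels).argmax()
    return refined
```

The intuition is that SAM's region boundaries tend to be sharper than those of the coarse semantic map, so snapping each region to a single label can tighten class boundaries.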

1.3 SegDA: Maximum Separable Segment Mask with Pseudo Labels for Domain Adaptive Semantic Segmentation

https://arxiv.org/abs/2308.05851

Unsupervised domain adaptation (UDA) addresses the label scarcity of target domains by transferring knowledge from label-rich source domains. Typically, the source domain consists of synthetic images whose annotations are readily obtained with well-known computer graphics techniques, whereas annotating real-world images (the target domain) requires time-consuming per-pixel manual labeling. To address this, we propose the SegDA module, which improves the transfer performance of UDA methods by learning maximally separable segment representations. This tackles the problem of distinguishing visually similar classes such as pedestrian/cyclist or sidewalk/road. We employ an equiangular tight frame (ETF) classifier, inspired by neural collapse, to maximally separate the segment classes: source-domain pixel representations collapse to a single vector per class, forming simplex vertices aligned with the maximally separable ETF classifier. We exploit this phenomenon to propose a novel architecture for domain adaptation of the target domain's segment representations. Furthermore, we propose to estimate the label noise of target-domain images and update the decoder with noise correction, which encourages the discovery of pseudo-labeled pixels of previously unidentified classes. We evaluate on four UDA benchmarks covering synthetic-to-real, day-to-night, and sunny-to-adverse-weather scenarios. Our proposed method outperforms prior methods by +2.2 mIoU on GTA -> Cityscapes, +2.0 mIoU on Synthia -> Cityscapes, +5.9 mIoU on Cityscapes -> DarkZurich, and +2.6 mIoU on Cityscapes -> ACDC.
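
For context, a simplex ETF over K classes is a set of K unit vectors whose pairwise cosine similarity is exactly -1/(K-1), the most mutually separated geometry K class prototypes can have. It can be constructed as M = sqrt(K/(K-1)) * U(I_K - (1/K)11^T) for any U with orthonormal columns. The sketch below implements this standard construction; variable names are ours, not the authors'.

```python
# Minimal sketch of a fixed simplex-ETF classifier head, the geometry the
# paper borrows from neural collapse. Standard construction; names are ours.
import torch

def simplex_etf(num_classes: int, feat_dim: int) -> torch.Tensor:
    """Return a (feat_dim x num_classes) matrix whose columns are unit
    vectors with pairwise cosine similarity -1/(num_classes - 1)."""
    assert feat_dim >= num_classes
    # U: orthonormal columns via QR of a random Gaussian matrix.
    u, _ = torch.linalg.qr(torch.randn(feat_dim, num_classes))
    k = num_classes
    center = torch.eye(k) - torch.ones(k, k) / k  # I_K - (1/K) 11^T
    return (k / (k - 1)) ** 0.5 * u @ center

w = simplex_etf(num_classes=19, feat_dim=256)  # e.g. 19 Cityscapes classes
gram = w.T @ w  # diagonal ~ 1, off-diagonal ~ -1/18: maximally separated
```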

1.4 The Multi-modality Cell Segmentation Challenge: Towards Universal Solutions

https://arxiv.org/abs/2308.05864

Cell segmentation is a key step in the quantitative analysis of single cells in microscopy images. Existing cell segmentation methods are often tailored to specific modalities or require manual intervention to specify hyperparameters in different experimental settings. Here, we present a multimodal cell segmentation benchmark comprising over 1,500 labeled images from more than 50 different biological experiments. The top-performing participants developed a Transformer-based deep learning algorithm that not only outperforms existing methods but can also be applied to a wide variety of microscopy images across imaging platforms and tissue types without manual parameter tuning. This benchmark and the improved algorithm provide a promising avenue for more accurate and flexible cell analysis in microscopy imaging.

1.5 Multi-scale Multi-site Renal Microvascular Structures Segmentation for Whole Slide Imaging in Renal Pathology

https://arxiv.org/abs/2308.05782

Segmentation of microvascular structures such as arterioles, venules, and capillaries from whole slide images (WSIs) of human kidney has become a focus in renal pathology. Current manual segmentation techniques are time-consuming and infeasible for large-scale digital pathology images. Although deep learning-based methods provide automatic segmentation, most of them are designed for, and limited to, training on single-site, single-scale data. In this paper, we propose Omni-Seg, a novel single dynamic network that leverages multi-site, multi-scale training data. Unique to our approach, we utilize partially labeled images, where only one tissue type is labeled per training image, to segment microvascular structures. We train a single deep network at multiple magnifications (40x, 20x, 10x, and 5x) using images from two datasets (HuBMAP and NEPTUNE). Experimental results show that Omni-Seg outperforms other methods in terms of both Dice Similarity Coefficient (DSC) and Intersection over Union (IoU). Our proposed method provides renal pathologists with a powerful computational tool for the quantitative analysis of renal microvascular architecture.
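
One general recipe for training a single network on such partially labeled data is to condition the network on the class being segmented and supervise it as binary segmentation for that class alone. The sketch below illustrates this recipe with toy FiLM-style conditioning; it is our assumption-laden illustration of the idea, not the Omni-Seg architecture.

```python
# Hedged sketch: each training image carries a binary mask for exactly one
# tissue class, so the network takes a class index and predicts one binary
# map. All module names and the conditioning scheme are our assumptions.
import torch
import torch.nn as nn

class ClassConditionedSeg(nn.Module):
    def __init__(self, num_classes: int, feat_dim: int = 64):
        super().__init__()
        self.backbone = nn.Sequential(  # stand-in encoder
            nn.Conv2d(3, feat_dim, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_dim, feat_dim, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.class_embed = nn.Embedding(num_classes, feat_dim)
        self.head = nn.Conv2d(feat_dim, 1, 1)  # one binary logit per pixel

    def forward(self, img: torch.Tensor, cls: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(img)
        # Condition features on the class being segmented (FiLM-style scaling).
        scale = self.class_embed(cls)[:, :, None, None]
        return self.head(feats * scale)

model = ClassConditionedSeg(num_classes=3)  # e.g. arteriole/venule/capillary
img = torch.randn(2, 3, 128, 128)
cls = torch.tensor([0, 2])           # each image is labeled for one class only
logits = model(img, cls)             # (2, 1, 128, 128)
loss = nn.functional.binary_cross_entropy_with_logits(
    logits, torch.randint(0, 2, logits.shape).float())
```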
