CVPR2019 TEXT PAPERS

1. [CVPR 2019] Aggregation Cross-Entropy for Sequence Recognition

In this paper, we propose a novel method, aggregation cross-entropy (ACE), for sequence recognition from a brand new perspective. The ACE loss function exhibits performance competitive with CTC and the attention mechanism, with a much quicker implementation (as it involves only four fundamental formulas), faster inference/back-propagation (approximately O(1) in parallel), lower storage requirements (no parameters and negligible runtime memory), and convenient deployment (by replacing CTC with ACE). Furthermore, the proposed ACE loss function exhibits two noteworthy properties: (1) it can be directly applied to 2D prediction by flattening the 2D prediction into a 1D prediction as the input, and (2) it requires only the characters and their counts in the sequence annotation for supervision, which allows it to advance beyond sequence recognition, e.g., to counting problems.

In short: ACE is a loss function for sequence recognition intended as a drop-in replacement for CTC. It has no parameters, is supervised only by which characters occur and how often, and extends to flattened 2D predictions and to counting tasks.
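The abstract does not list the four formulas, so the following is only a minimal sketch of the idea as it is described there: per-step class probabilities are aggregated over time, normalized, and compared by cross-entropy with the normalized character counts from the annotation. The function names (`ace_loss`, `flatten_2d`), the choice of class 0 as the blank, and the small epsilon inside the logarithm are assumptions made for illustration, not the authors' code.

```python
import math
from collections import Counter


def ace_loss(probs, label_ids, blank=0):
    """Sketch of an aggregation cross-entropy loss (illustrative, not official).

    probs:     T x C list of per-step class probabilities (each row sums to 1).
    label_ids: class ids of the characters in the annotation; order is ignored,
               only how often each character occurs matters.
    blank:     id of the blank class (assumed to be 0 here).
    """
    T = len(probs)
    C = len(probs[0])
    if len(label_ids) > T:
        raise ValueError("annotation is longer than the prediction sequence")

    # Character counts; steps not explained by a character are assigned to blank.
    counts = Counter(label_ids)
    counts[blank] = T - len(label_ids)

    loss = 0.0
    for k in range(C):
        # Aggregate class k over all time steps, then normalize by T.
        y_k = sum(row[k] for row in probs) / T
        n_k = counts.get(k, 0) / T
        if n_k > 0:
            # Cross-entropy between normalized counts and aggregated predictions.
            loss -= n_k * math.log(y_k + 1e-10)
    return loss


def flatten_2d(pred_2d):
    """Flatten an H x W x C prediction map into an (H*W) x C sequence."""
    return [cell for row in pred_2d for cell in row]
```

Because the loss only looks at aggregated probabilities and counts, `ace_loss(probs, [1, 2])` and `ace_loss(probs, [2, 1])` give the same value, which is why character order is not needed in the annotation. A 2D prediction map can be passed as `ace_loss(flatten_2d(pred_2d), label_ids)`.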

2. [CVPR 2019] An Alternative Deep Feature Approach to Line Level Keyword Spotting

Keyword spotting (KWS) is defined as the problem of detecting all instances of a given word, provided by the user either as a query word image (Query-by-Example, QbE) or a query word string (Query-by-String, QbS) in a body of digitized documents. Keyword detection is typically preceded by a preprocessing step where the text is segmented into text lines (line-level KWS). Methods following this paradigm are monopolized by test-time computationally expensive handwritten text recognition (HTR)-based approaches; furthermore, they typically cannot handle image queries (QbE). In this work, we propose a time and storage-efficient, deep feature-based approach that enables both the image and textual search options. Three distinct components, all modeled as neural networks, are combined: normalization, feature extraction and representation of image and textual input into a common space. These components, even if designed on word level image representations, collaborate in order to achieve an efficient line level keyword spotting system. The experimental results indicate that the proposed system is on par with state-of-the-art KWS methods.

In short: instead of running full handwritten text recognition at test time, this approach maps query images and query strings into one shared feature space, so the same line-level system supports both QbE and QbS search. The authors report results on par with state-of-the-art KWS methods.
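To make the "common space" idea concrete, here is a rough sketch of how retrieval could work once both query types are embedded as vectors. It is not the paper's pipeline: the embeddings are assumed to be precomputed by some encoder, each text line is assumed to be represented by a list of word-sized window embeddings, and `cosine` and `rank_lines` are hypothetical helper names.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_lines(query_vec, line_windows):
    """Rank text lines by their best-matching window.

    query_vec:    embedding of the query; in a shared space it makes no
                  difference whether it came from an image (QbE) or a string (QbS).
    line_windows: dict mapping a line id to a list of window embeddings
                  (assumed precomputed and stored).
    """
    scores = {
        line_id: max((cosine(query_vec, w) for w in windows), default=0.0)
        for line_id, windows in line_windows.items()
    }
    # Highest similarity first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

With this layout, the expensive encoding of the document collection happens once offline, and a query reduces to one embedding plus a similarity search, which is one plausible reading of the time and storage efficiency claimed in the abstract.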
