17 Text Similarity Algorithms and GIN Indexes in PostgreSQL - pg_similarity

Original article: http://click.aliyun.com/m/22329/

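Tags

PostgreSQL, text similarity, pg_similarity, pg_trgm, rum, fuzzystrmatch, gin, smlar

Background

Combining text similarity algorithms with GIN, PostgreSQL's extensible index framework, makes efficient text retrieval possible for a wide range of similarity algorithms.

Common text similarity search extensions for PostgreSQL include rum, pg_trgm, fuzzystrmatch, pg_similarity, and smlar.

Of these, pg_similarity supports 17 algorithms.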

Introduction

pg_similarity is an extension to support similarity queries on PostgreSQL.

The implementation is tightly integrated into the RDBMS in the sense that it defines operators,
so instead of the traditional operators (= and <>) you can use ~~~ and ~!~ (each of these
operators represents a similarity function).

pg_similarity has three main components:

Functions:

a set of functions that implement similarity algorithms available in the literature.

These functions can be used as UDFs and are the basis for implementing the similarity operators;

Operators:

a set of operators defined on top of the similarity functions.

They use the similarity functions to obtain a similarity value and
compare it to a user-defined threshold to decide whether it is a match or not;

Session Variables:

a set of variables that store similarity function parameters. These variables can be defined at run time.
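
The relationship between the three components can be sketched in Python. This is an illustrative model only, not pg_similarity's code or API: the function jaccard, the settings dictionary, and the key jaccard_threshold are hypothetical names chosen for the example. The function returns a similarity value, the "operator" compares that value to a threshold, and the threshold lives in a setting that can be changed at run time.

```python
def qgrams(text: str, q: int = 3) -> set[str]:
    """Return the set of lowercase q-grams of text, padded with q-1 spaces per side."""
    pad = " " * (q - 1)
    padded = pad + text.lower() + pad
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def jaccard(a: str, b: str) -> float:
    """Similarity function: Jaccard coefficient over q-gram sets, in [0, 1]."""
    grams_a, grams_b = qgrams(a), qgrams(b)
    return len(grams_a & grams_b) / len(grams_a | grams_b)


# Stand-in for a session variable (hypothetical name): adjustable at run time.
settings = {"jaccard_threshold": 0.7}


def is_similar(a: str, b: str) -> bool:
    """Stand-in for a similarity operator: match if the score reaches the threshold."""
    return jaccard(a, b) >= settings["jaccard_threshold"]


strict = is_similar("postgres", "postgre")   # score is 7/12, below 0.7
settings["jaccard_threshold"] = 0.5          # loosen the threshold at run time
relaxed = is_similar("postgres", "postgre")  # same score, now a match
```

The same pair of strings goes from non-match to match without touching the function, which is the design point of keeping the threshold in a separate, session-level setting.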

Reposted from 1369049491.iteye.com/blog/2377504