Nearest neighbor search

Description

Many business scenarios need a measure of similarity between vectors, so a vector similarity index has high practical value. We already have ways to turn words, sentences, pictures, and other information into vectors, which makes such an index useful for relevance search.
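To make "similarity between vectors" concrete, here is a minimal pure-Python sketch of cosine similarity, along with the angular distance sqrt(2 * (1 - cos)) that annoy documents for its 'angular' metric. The function names are illustrative and are not part of annoy or nmslib.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

def angular_distance(u, v):
    """sqrt(2 * (1 - cos)): Euclidean distance between the normalized vectors."""
    cos = cosine_similarity(u, v)
    # max() guards against tiny negative values caused by rounding.
    return math.sqrt(max(0.0, 2.0 * (1.0 - cos)))
```

Identical directions give a distance of 0, and opposite directions give the maximum distance of 2.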

This article builds the index in two ways, once with annoy and once with nmslib:

Annoy index construction:

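from annoy import AnnoyIndex

f = 200  # dimensionality of the sentence vectors
tc_index = AnnoyIndex(f, metric='angular')

# Each line of the input file: "<id> <v1> <v2> ... <vf>"
with open(r"D:\sent_vec", "r", encoding="utf-8") as reader:
    for line in reader:
        linespl = line.strip().split()
        item_id = int(linespl[0])
        vec = [float(v) for v in linespl[1:]]
        tc_index.add_item(item_id, vec)

tc_index.build(5)  # build a forest of 5 trees
tc_index.save(r'D:\index.ann')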
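The loop above assumes that every line holds an integer id followed by exactly f whitespace-separated floats. A small helper along the following lines (hypothetical, not part of annoy) could check that assumption before a vector is added, so that a malformed line fails with a clear message:

```python
def parse_vector_line(line, dim):
    """Parse '<id> <v1> ... <v_dim>' into (id, vector).

    Returns None for blank lines and raises ValueError when the
    number of values does not match the expected dimensionality.
    """
    parts = line.split()
    if not parts:
        return None
    item_id = int(parts[0])
    vec = [float(v) for v in parts[1:]]
    if len(vec) != dim:
        raise ValueError(
            "item {}: expected {} values, got {}".format(item_id, dim, len(vec))
        )
    return item_id, vec
```

Inside the reading loop, the result would be unpacked and passed to add_item, skipping lines for which the helper returns None.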

nmslib index construction:

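import nmslib as nms

tc_index = nms.init(method='hnsw', space='cosinesimil')
first_data = None  # keeps the first vector, e.g. for a test query later

with open(r"D:\sent_vec", "r", encoding="utf-8") as reader:
    for line in reader:
        linespl = line.strip().split()
        item_id = int(linespl[0])
        if item_id % 10000 == 0:
            print("processing {}".format(item_id))
        vec = [float(v) for v in linespl[1:]]
        if first_data is None:
            first_data = vec
        tc_index.addDataPoint(item_id, vec)

# The points are only indexed once createIndex is called.
tc_index.createIndex()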

Brief comment: in overall use, nmslib seems to be a little faster. It also searches the index directly by vector, which makes it friendlier to out-of-vocabulary items, that is, queries that were never added to the index.
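Because both libraries return approximate neighbors, it can help to compare their answers against an exact brute-force search on a small sample. The sketch below is plain Python and uses neither annoy nor nmslib; `exact_neighbors` and the toy data are hypothetical names chosen for illustration. It also shows what searching by vector means: the query does not have to be an item that is already stored.

```python
import heapq
import math

def cosine_distance(u, v):
    """1 - cosine similarity; assumes non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def exact_neighbors(items, query, k):
    """Return up to k (id, distance) pairs closest to query, nearest first.

    items maps id -> vector. Every item is scanned, so this is only
    practical as a correctness check on small samples.
    """
    scored = ((cosine_distance(vec, query), item_id)
              for item_id, vec in items.items())
    return [(item_id, dist) for dist, item_id in heapq.nsmallest(k, scored)]

# Toy data: three 2-dimensional vectors and a query that is not among them.
items = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
print(exact_neighbors(items, [0.9, 0.1], k=2))
```

On this toy data the query is closest to item 0, followed by item 2. The share of ids that an approximate index returns in common with this exact list gives a rough recall estimate.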

Origin blog.csdn.net/cyinfi/article/details/102134979