How to perform vector search and find the semantic similarity of documents in Python?
Last updated 20 Apr 2024
Question
How to perform vector search and find the semantic similarity of documents in Python?
Answer
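To perform vector similarity searches in Python, first create an index over the documents; the queries that recommend similar documents run against this index. The DIM of the vector field must match the dimension of the embeddings produced by your model: for all-distilroberta-v1, which produces 768-dimensional vectors, DIM must be 768 (see the example).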
FT.CREATE vss_index ON HASH PREFIX 1 "doc:" SCHEMA name TEXT content TEXT creation NUMERIC SORTABLE update NUMERIC SORTABLE content_embedding VECTOR FLAT 6 TYPE FLOAT32 DIM 768 DISTANCE_METRIC COSINE
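The 6 after FLAT is easy to misread. It is not a dimension but the number of attribute tokens that follow it (TYPE FLOAT32 DIM 768 DISTANCE_METRIC COSINE, i.e. three name/value pairs). As a sketch, the helpers below (hypothetical names, not part of any Redis client) assemble the same command from its parts and compute that count automatically, so switching to a model with a different dimension only means changing DIM.

```python
def vector_field_args(name, algorithm="FLAT", **attributes):
    """Build the SCHEMA tokens for one VECTOR field.

    The number after the algorithm counts the attribute tokens that follow
    it (each name and each value is one token): three pairs give 6.
    """
    attr_tokens = []
    for key, value in attributes.items():  # keyword order is preserved
        attr_tokens += [key, str(value)]
    return [name, "VECTOR", algorithm, str(len(attr_tokens)), *attr_tokens]


def ft_create_args(index, prefix, text_fields, numeric_sortable, vector_args):
    """Assemble the FT.CREATE argument list for an index over hashes."""
    args = ["FT.CREATE", index, "ON", "HASH", "PREFIX", "1", prefix, "SCHEMA"]
    for field in text_fields:
        args += [field, "TEXT"]
    for field in numeric_sortable:
        args += [field, "NUMERIC", "SORTABLE"]
    return args + vector_args


vec = vector_field_args(
    "content_embedding", TYPE="FLOAT32", DIM=768, DISTANCE_METRIC="COSINE"
)
args = ft_create_args(
    "vss_index", "doc:", ["name", "content"], ["creation", "update"], vec
)
print(" ".join(args))
```

With a redis-py connection, the resulting list could be sent as is with conn.execute_command(*args); the quotes around "doc:" in the CLI version are only shell quoting and are not part of the prefix.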
Modeling documents
Then import the modeling library. To use all-distilroberta-v1, you need the SentenceTransformer class from the sentence-transformers package.
from sentence_transformers import SentenceTransformer
Now produce a vector representation (embedding) of the document. Use the same model the index was sized for to compute the embedding of the document's content, and convert it to FLOAT32 bytes to match TYPE FLOAT32 in the index:
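import numpy as np

content = "This is an arbitrary content"
model = SentenceTransformer('sentence-transformers/all-distilroberta-v1')
# encode() returns a NumPy array; store it as raw FLOAT32 bytes to match the index
embedding = model.encode(content).astype(np.float32).tobytes()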
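The .astype(np.float32).tobytes() call turns the embedding into the raw binary blob that a FLOAT32 vector field stores: 4 bytes per component, so a 768-dimensional vector takes 3072 bytes. NumPy writes the bytes in the machine's native byte order, which is little-endian on common x86 and ARM hardware. The standard-library sketch below (helper names are illustrative) performs the same packing with struct, explicitly little-endian, and shows the round trip back to floats.

```python
import struct

DIM = 768  # must match DIM in the FT.CREATE command


def to_float32_blob(vector):
    """Pack floats as little-endian FLOAT32, 4 bytes per component."""
    return struct.pack(f"<{len(vector)}f", *vector)


def from_float32_blob(blob):
    """Unpack a little-endian FLOAT32 blob back into a tuple of floats."""
    return struct.unpack(f"<{len(blob) // 4}f", blob)


# Stand-in for a real 768-dimensional embedding (values exact in FLOAT32)
vector = [0.5, -0.25] * (DIM // 2)
blob = to_float32_blob(vector)
print(len(blob))  # 3072 = 768 components x 4 bytes
```

Checking the blob length against DIM * 4 is a cheap way to catch a model/index dimension mismatch before writing to Redis.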
Now you can store the embedding, together with the other fields, in a Hash whose key starts with the doc: prefix, so that vss_index indexes it:
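import redis
conn = redis.Redis(host="localhost", port=6379)  # adjust to your deployment
pk = 1  # primary key of the document (placeholder)
doc = {"content_embedding": embedding,  # raw FLOAT32 bytes, 768 x 4 = 3072 bytes
       "name": "Document's title",
       "content": content}  # other fields (e.g. "state": document.state) can be added too
conn.hset("doc:{}".format(pk), mapping=doc)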
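Redis only indexes hashes whose key starts with one of the index prefixes (doc: here), and a FLOAT32 vector field expects a blob of exactly DIM * 4 bytes; a document with a wrongly sized blob is expected to fail indexing rather than appear in search results. A small check before writing catches both mistakes early. build_doc_hash is a hypothetical helper, not part of redis-py, and the all-ones embedding is only a placeholder.

```python
import struct

DIM = 768  # DIM declared for content_embedding in vss_index
BYTES_PER_FLOAT32 = 4


def build_doc_hash(pk, name, content, embedding_blob, dim=DIM):
    """Return (key, mapping) ready for HSET; reject blobs of the wrong size.

    The key uses the "doc:" prefix so that vss_index picks the hash up.
    """
    expected = dim * BYTES_PER_FLOAT32
    if len(embedding_blob) != expected:
        raise ValueError(
            f"embedding is {len(embedding_blob)} bytes, expected {expected}"
        )
    mapping = {
        "name": name,
        "content": content,
        "content_embedding": embedding_blob,
    }
    return f"doc:{pk}", mapping


# Placeholder embedding: 768 FLOAT32 values of 1.0
blob = struct.pack(f"<{DIM}f", *([1.0] * DIM))
key, mapping = build_doc_hash(42, "Document's title", "This is an arbitrary content", blob)
print(key)  # doc:42
# With a redis-py connection: conn.hset(key, mapping=mapping)
```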
Searching for similar documents
To search for documents similar to a given document, first compute its embedding exactly as you did when populating the database, with the same model and the same FLOAT32 conversion.
# Use the same model that produced the stored embeddings
model = SentenceTransformer('sentence-transformers/all-distilroberta-v1')
new_embedding = model.encode(content).astype(np.float32).tobytes()  # content: text of the query document
Then perform the KNN similarity search, which returns the 3 documents closest to the query vector.
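from redis.commands.search.query import Query
# KNN must target the indexed vector field (content_embedding); the distance is returned as "score"
q = Query("*=>[KNN 3 @content_embedding $vec AS score]").sort_by("score").return_fields("name", "score").dialect(2)
res = conn.ft("vss_index").search(q, query_params={"vec": new_embedding})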
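With DISTANCE_METRIC COSINE, the score returned for each hit is a cosine distance (1 minus the cosine similarity), so smaller values mean more similar documents and 0 means the vectors point in the same direction. A FLAT index answers a KNN query by comparing the query with every stored vector. The pure-Python sketch below mimics that exhaustive search on toy 3-dimensional vectors to illustrate the ranking; the function names are illustrative, and this is not how Redis implements it internally.

```python
import math


def cosine_distance(a, b):
    """Cosine distance: 1 - cosine similarity (0 = same direction, 2 = opposite)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def knn_flat(query, vectors, k=3):
    """Exhaustive (FLAT-style) KNN: score every stored vector, keep the k closest."""
    scored = sorted((cosine_distance(query, vec), key) for key, vec in vectors.items())
    return [(key, dist) for dist, key in scored[:k]]


docs = {
    "doc:1": [1.0, 0.0, 0.0],
    "doc:2": [0.0, 1.0, 0.0],
    "doc:3": [1.0, 1.0, 0.0],
    "doc:4": [-1.0, 0.0, 0.0],
}
for key, dist in knn_flat([1.0, 0.0, 0.0], docs):
    print(key, "distance:", round(dist, 4), "similarity:", round(1 - dist, 4))
```

Here doc:1 (distance 0), doc:3 (about 0.2929) and doc:2 (1.0) are returned, while doc:4, which points the opposite way, is left out. To present Redis results as similarities, the same 1 - distance conversion can be applied to the score of each hit.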