Vectorizers

Supported vectorizers

In this document, you will learn how to use RedisVL to create embeddings using the built-in text embedding vectorizers. RedisVL supports:

  1. OpenAI
  2. HuggingFace
  3. Vertex AI
  4. Cohere
Note:
This document was converted from a Jupyter notebook.

Before beginning, be sure of the following:

  1. You have installed RedisVL and have that environment activated.
  2. You have a running Redis instance with the search and query capability.
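If you want to sanity-check the Redis connection before continuing, a minimal ping with the redis client library (assuming a local instance at redis://localhost:6379) looks like this:

import redis

# connect to the local Redis instance (adjust the URL for your setup)
client = redis.Redis.from_url("redis://localhost:6379")
print(client.ping())  # True if the connection succeeds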
# import necessary modules
import os

Create text embeddings

This example shows how to create embeddings for three simple sentences using a number of different text vectorizers in RedisVL.

  • "That is a happy dog"
  • "That is a happy person"
  • "Today is a nice day"

OpenAI

The OpenAITextVectorizer makes it easy to use RedisVL with the embedding models from OpenAI. For this, you will need to install openai.

pip install openai
import getpass

# setup the API Key
api_key = os.environ.get("OPENAI_API_KEY") or getpass.getpass("Enter your OpenAI API key: ")
from redisvl.utils.vectorize import OpenAITextVectorizer

# create a vectorizer
oai = OpenAITextVectorizer(
    model="text-embedding-ada-002",
    api_config={"api_key": api_key},
)

test = oai.embed("This is a test sentence.")
print("Vector dimensions: ", len(test))
test[:10]

Vector dimensions:  1536

[-0.001025049015879631,
 -0.0030993607360869646,
 0.0024536605924367905,
 -0.004484387580305338,
 -0.010331203229725361,
 0.012700922787189484,
 -0.005368996877223253,
 -0.0029411641880869865,
 -0.0070833307690918446,
 -0.03386051580309868]
# Create many embeddings at once
sentences = [
    "That is a happy dog",
    "That is a happy person",
    "Today is a sunny day"
]

embeddings = oai.embed_many(sentences)
embeddings[0][:10]

[-0.01747742109000683,
 -5.228330701356754e-05,
 0.0013870716793462634,
 -0.025637786835432053,
 -0.01985435001552105,
 0.016117358580231667,
 -0.0037306349258869886,
 0.0008945261361077428,
 0.006577865686267614,
 -0.025091219693422318]
# OpenAI also supports asynchronous requests, which you can use to speed up the vectorization process.
embeddings = await oai.aembed_many(sentences)
print("Number of Embeddings:", len(embeddings))

Number of Embeddings: 3
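Top-level await works inside a Jupyter notebook. If you run the same code from a standalone script, wrap the call in asyncio.run instead; a minimal sketch, assuming the oai vectorizer and sentences defined above:

import asyncio

async def main():
    # run the async embedding call outside of a notebook
    embeddings = await oai.aembed_many(sentences)
    print("Number of Embeddings:", len(embeddings))

asyncio.run(main())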

HuggingFace

HuggingFace is a popular natural language processing (NLP) platform that offers a number of pre-trained models you can use off the shelf. RedisVL supports using HuggingFace "Sentence Transformers" to create embeddings from text. To use HuggingFace, you will need to install the sentence-transformers library.

pip install sentence-transformers
os.environ["TOKENIZERS_PARALLELISM"] = "false"
from redisvl.utils.vectorize import HFTextVectorizer

# create a vectorizer
# choose your model from the huggingface website
hf = HFTextVectorizer(model="sentence-transformers/all-mpnet-base-v2")

# embed a sentence
test = hf.embed("This is a test sentence.")
test[:10]

[0.00037810884532518685,
 -0.05080341175198555,
 -0.03514723479747772,
 -0.02325104922056198,
 -0.044158220291137695,
 0.020487844944000244,
 0.0014617963461205363,
 0.031261757016181946,
 0.05605152249336243,
 0.018815357238054276]
# You can also create many embeddings at once
embeddings = hf.embed_many(sentences, as_buffer=True)
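With as_buffer=True, each embedding is returned as a raw bytes buffer rather than a list of floats, which is the format used when loading vectors into a hash index later in this guide. To inspect one of the buffers, you can decode it with numpy; this sketch assumes the default float32 dtype:

import numpy as np

# decode the first buffer back into a float array (assumes float32, the default dtype)
vec = np.frombuffer(embeddings[0], dtype=np.float32)
print(vec.shape)  # (768,) for all-mpnet-base-v2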

VertexAI

Vertex AI is GCP's fully-featured AI platform, which includes a number of pre-trained LLMs. RedisVL supports using Vertex AI to create embeddings from these models. To use Vertex AI, you will first need to install the google-cloud-aiplatform library.

pip install "google-cloud-aiplatform>=1.26"

Next, you need access to a Google Cloud project and service account credentials. Provide the credentials by setting the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of a JSON key file downloaded from your service account on GCP.

Finally, you need to find your project ID and geographic region for VertexAI.

Make sure the following env vars are set:

GOOGLE_APPLICATION_CREDENTIALS=<path to your gcp JSON creds>
GCP_PROJECT_ID=<your gcp project id>
GCP_LOCATION=<your gcp geo region for vertex ai>
from redisvl.utils.vectorize import VertexAITextVectorizer


# create a vectorizer
vtx = VertexAITextVectorizer(api_config={
    "project_id": os.environ.get("GCP_PROJECT_ID") or getpass.getpass("Enter your GCP Project ID: "),
    "location": os.environ.get("GCP_LOCATION") or getpass.getpass("Enter your GCP Location: "),
    "google_application_credentials": os.environ.get("GOOGLE_APPLICATION_CREDENTIALS") or getpass.getpass("Enter your Google App Credentials path: ")
})

# embed a sentence
test = vtx.embed("This is a test sentence.")
test[:10]

[0.04373306408524513,
 -0.05040992051362991,
 -0.011946038343012333,
 -0.043528858572244644,
 0.021510830149054527,
 0.028604144230484962,
 0.014770914800465107,
 -0.01610461436212063,
 -0.0036560404114425182,
 0.013746795244514942]

Cohere

Cohere allows you to implement language AI in your product. The CohereTextVectorizer makes it simple to use RedisVL with the embedding models at Cohere. For this, you will need to install cohere.

pip install cohere
import getpass
# set up the API Key
api_key = os.environ.get("COHERE_API_KEY") or getpass.getpass("Enter your Cohere API key: ")

Special attention needs to be paid to the input_type parameter for each embed call. For example, for embedding queries, you should set input_type='search_query'. For embedding documents, set input_type='search_document'. See the Cohere embeddings documentation for more information.

from redisvl.utils.vectorize import CohereTextVectorizer

# create a vectorizer
co = CohereTextVectorizer(
    model="embed-english-v3.0",
    api_config={"api_key": api_key},
)

# embed a search query
test = co.embed("This is a test sentence.", input_type='search_query')
print("Vector dimensions: ", len(test))
print(test[:10])

# embed a document
test = co.embed("This is a test sentence.", input_type='search_document')
print("Vector dimensions: ", len(test))
print(test[:10])

Vector dimensions:  1024
[-0.010856628, -0.019683838, -0.0062179565, 0.003545761, -0.047943115, 0.0009365082, -0.005924225, 0.016174316, -0.03289795, 0.049194336]
Vector dimensions:  1024
[-0.009712219, -0.016036987, 2.8073788e-05, -0.022491455, -0.041259766, 0.002281189, -0.033294678, -0.00057029724, -0.026260376, 0.0579834]
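The same input_type parameter applies when embedding in batches. A minimal sketch, assuming embed_many forwards the parameter in the same way as embed:

# batch-embed documents with the same input_type parameter
doc_embeddings = co.embed_many(sentences, input_type='search_document')
print("Number of Embeddings:", len(doc_embeddings))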

Learn more about using RedisVL and Cohere together through this dedicated user guide.

Search with provider embeddings

Now that you've created your embeddings, you can use them to search for similar sentences. You will index the same three sentences from above and then query for a semantically similar one.

First, create the schema for your index.

Here's what the schema for the example looks like in YAML for the HuggingFace vectorizer:

version: '0.1.0'

index:
    name: vectorizers
    prefix: doc
    storage_type: hash

fields:
    - name: text
      type: text
    - name: embedding
      type: vector
      attrs:
        dims: 768
        algorithm: flat
        distance_metric: cosine
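If you prefer to define the schema in code instead of a YAML file, the same structure can be passed as a dictionary. This is a minimal sketch, assuming SearchIndex.from_dict accepts the same layout as the YAML above:

from redisvl.index import SearchIndex

# equivalent inline schema definition (mirrors the YAML file above)
schema = {
    "index": {
        "name": "vectorizers",
        "prefix": "doc",
        "storage_type": "hash",
    },
    "fields": [
        {"name": "text", "type": "text"},
        {
            "name": "embedding",
            "type": "vector",
            "attrs": {"dims": 768, "algorithm": "flat", "distance_metric": "cosine"},
        },
    ],
}

index = SearchIndex.from_dict(schema)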
from redisvl.index import SearchIndex

# construct a search index from the schema
index = SearchIndex.from_yaml("./schema.yaml")

# connect to local redis instance
index.connect("redis://localhost:6379")

# create the index (no data yet)
index.create(overwrite=True)
# use the CLI to see the created index
!rvl index listall

22:02:27 [RedisVL] INFO   Indices:
22:02:27 [RedisVL] INFO   1. vectorizers
# load expects an iterable of dictionaries where
# the vector is stored as a bytes buffer

data = [{"text": t,
         "embedding": v}
        for t, v in zip(sentences, embeddings)]

index.load(data)

    ['doc:17c401b679ce43cb82f3ab2280ad02f2',
     'doc:3fc0502bec434b17a3f06e20824b2e59',
     'doc:199f17b0e5d24dcaa1fd4fb41558150c']
from redisvl.query import VectorQuery

# use the HuggingFace vectorizer again to create a query embedding
query_embedding = hf.embed("That is a happy cat")

query = VectorQuery(
    vector=query_embedding,
    vector_field_name="embedding",
    return_fields=["text"],
    num_results=3
)

results = index.query(query)
for doc in results:
    print(doc["text"], doc["vector_distance"])

That is a happy dog 0.160862326622
That is a happy person 0.273598492146
Today is a sunny day 0.744559407234
# cleanup
index.delete()