{
  "id": "vectorizers",
  "title": "Create Embeddings with Vectorizers",
  "url": "https://redis.io/docs/latest/develop/ai/redisvl/user_guide/how_to_guides/vectorizers/",
  "summary": "How to create text embeddings with RedisVL's built-in vectorizers (OpenAI, Azure OpenAI, Hugging Face, Vertex AI, Cohere, VoyageAI, Mistral AI, Amazon Bedrock, and custom vectorizers) and use them for semantic search.",
  "tags": [],
  "last_updated": "2026-05-06T11:49:45+02:00",
  "page_type": "content",
  "content_hash": "692a328efa0b574944221af19fc87c6d18cfec48baeed9034858e1923132cc34",
  "sections": [
    {
      "id": "overview",
      "title": "Overview",
      "role": "overview",
      "text": "This guide demonstrates how to create embeddings using RedisVL's built-in text vectorizers. RedisVL supports multiple embedding providers: OpenAI, Azure OpenAI, Hugging Face, Vertex AI, Cohere, Mistral AI, Amazon Bedrock, VoyageAI, and custom vectorizers."
    },
    {
      "id": "prerequisites",
      "title": "Prerequisites",
      "role": "content",
      "text": "Before you begin, ensure you have:\n- Installed RedisVL: `pip install redisvl`\n- A running Redis instance ([Redis 8+](https://redis.io/downloads/) or [Redis Cloud](https://redis.io/cloud))\n- API keys for the embedding providers you plan to use"
    },
    {
      "id": "what-you-ll-learn",
      "title": "What You'll Learn",
      "role": "content",
      "text": "By the end of this guide, you will be able to:\n- Create embeddings using multiple providers (OpenAI, Hugging Face, Cohere, etc.)\n- Use synchronous and asynchronous embedding methods\n- Batch embed multiple texts efficiently\n- Build custom vectorizers for your own embedding functions\n- Integrate vectorizers with RedisVL indexes for semantic search\n\n[code example]"
    },
    {
      "id": "creating-text-embeddings",
      "title": "Creating Text Embeddings",
      "role": "content",
      "text": "This example shows how to create embeddings for three simple sentences using a number of different text vectorizers in RedisVL.\n\n- \"That is a happy dog\"\n- \"That is a happy person\"\n- \"Today is a sunny day\""
    },
    {
      "id": "openai",
      "title": "OpenAI",
      "role": "content",
      "text": "The ``OpenAITextVectorizer`` makes it simple to use RedisVL with OpenAI's embedding models. For this you will need to install ``openai``.\n\n[code example]\n\n[code example]\n\n[code example]\n\n    Vector dimensions:  1536\n\n    [-0.0011391325388103724,\n     -0.003206387162208557,\n     0.002380132209509611,\n     -0.004501554183661938,\n     -0.010328996926546097,\n     0.012922565452754498,\n     -0.005491119809448719,\n     -0.0029864837415516376,\n     -0.007327961269766092,\n     -0.03365817293524742]\n\n[code example]\n\n    [-0.017466850578784943,\n     1.8471690054866485e-05,\n     0.00129731057677418,\n     -0.02555876597762108,\n     -0.019842341542243958,\n     0.01603139191865921,\n     -0.0037347301840782166,\n     0.0009670283179730177,\n     0.006618348415941,\n     -0.02497442066669464]\n\n[code example]\n\n    Number of Embeddings: 3"
    },
    {
      "id": "azure-openai",
      "title": "Azure OpenAI",
      "role": "content",
      "text": "The ``AzureOpenAITextVectorizer`` is a variation of the OpenAI vectorizer that calls OpenAI models hosted within Azure. If you've already installed ``openai``, then you're ready to use Azure OpenAI.\n\nThe only practical difference between OpenAI and Azure OpenAI is the set of variables required to call the API.\n\n[code example]\n\n[code example]\n\n[code example]"
    },
    {
      "id": "huggingface",
      "title": "Hugging Face",
      "role": "content",
      "text": "[Hugging Face](https://huggingface.co/models) is a popular NLP platform that hosts a number of pre-trained models you can use off the shelf. RedisVL supports using Hugging Face \"Sentence Transformers\" to create embeddings from text. To use Hugging Face, you will need to install the ``sentence-transformers`` library.\n\n[code example]\n\n[code example]\n\n[code example]"
    },
    {
      "id": "vertexai",
      "title": "VertexAI",
      "role": "content",
      "text": "[VertexAI](https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-text-embeddings) is GCP's fully-featured AI platform, which includes a number of pretrained LLMs. RedisVL supports using VertexAI to create embeddings from these models. To use VertexAI, you will first need to install the ``google-cloud-aiplatform`` library.\n\n[code example]\n\n1. Next, you need access to a [Google Cloud Project](https://cloud.google.com/gcp?hl=en) and must provide [access to credentials](https://cloud.google.com/docs/authentication/application-default-credentials). This is accomplished by setting the `GOOGLE_APPLICATION_CREDENTIALS` environment variable to the path of a JSON key file downloaded from your service account on GCP.\n2. Lastly, you need to find your [project ID](https://support.google.com/googleapi/answer/7014113?hl=en) and the [geographic region for VertexAI](https://cloud.google.com/vertex-ai/docs/general/locations).\n\n**Make sure the following env vars are set:**\n\n[code example]\n\n[code example]"
    },
    {
      "id": "cohere",
      "title": "Cohere",
      "role": "content",
      "text": "[Cohere](https://dashboard.cohere.ai/) allows you to implement language AI into your product. The `CohereTextVectorizer` makes it simple to use RedisVL with Cohere's embedding models. For this you will need to install `cohere`.\n\n[code example]\n\n[code example]\n\nSpecial attention needs to be paid to the `input_type` parameter for each `embed` call. For example, for embedding queries you should set `input_type='search_query'`; for embedding documents, set `input_type='search_document'`. See the [Cohere embed reference](https://docs.cohere.com/reference/embed) for more information.\n\n[code example]\n\nLearn more about using RedisVL and Cohere together through [this dedicated user guide](https://docs.cohere.com/docs/redis-and-cohere)."
    },
    {
      "id": "voyageai",
      "title": "VoyageAI",
      "role": "content",
      "text": "[VoyageAI](https://dash.voyageai.com/) allows you to implement language AI into your product. The `VoyageAITextVectorizer` makes it simple to use RedisVL with VoyageAI's embedding models. For this you will need to install `voyageai`.\n\n[code example]\n\n[code example]\n\nSpecial attention needs to be paid to the `input_type` parameter for each `embed` call. For example, for embedding queries you should set `input_type='query'`; for embedding documents, set `input_type='document'`. See the [VoyageAI embeddings documentation](https://docs.voyageai.com/docs/embeddings) for more information.\n\n[code example]"
    },
    {
      "id": "mistral-ai",
      "title": "Mistral AI",
      "role": "content",
      "text": "[Mistral](https://console.mistral.ai/) offers LLM and embedding APIs for you to implement into your product. The `MistralAITextVectorizer` makes it simple to use RedisVL with their embedding models.\nYou will need to install `mistralai`.\n\n[code example]\n\n[code example]"
    },
    {
      "id": "amazon-bedrock",
      "title": "Amazon Bedrock",
      "role": "content",
      "text": "Amazon Bedrock provides fully managed foundation models for text embeddings. Install the required dependencies:\n\n[code example]\n\n#### Configure AWS credentials:\n\n\n[code example]\n\n#### Create embeddings:\n\n\n[code example]"
    },
    {
      "id": "custom-vectorizers",
      "title": "Custom Vectorizers",
      "role": "content",
      "text": "RedisVL supports the use of other vectorizers and provides a class to enable compatibility with any function that generates a vector or vectors from string data.\n\n[code example]\n\nThis enables the use of custom vectorizers with other RedisVL components.\n\n[code example]"
    },
    {
      "id": "search-with-provider-embeddings",
      "title": "Search with Provider Embeddings",
      "role": "content",
      "text": "Now that we've created our embeddings, we can use them to search for similar sentences. We will use the same three sentences from above and search for similar ones.\n\nFirst, we need to create the schema for our index.\n\nHere's what the schema for the example looks like in YAML for the Hugging Face vectorizer:\n\n[code example]\n\n[code example]\n\n[code example]\n\nLoading data into RedisVL is easy. It expects a list of dictionaries. The vector is stored as bytes.\n\n[code example]\n\n[code example]"
    },
    {
      "id": "selecting-your-float-data-type",
      "title": "Selecting your float data type",
      "role": "content",
      "text": "When embedding text as byte arrays, RedisVL supports four floating point data types (`float16`, `float32`, `float64`, and `bfloat16`) and two integer types (`int8` and `uint8`).\nThe dtype set for your vectorizer must match the dtype defined in your search index. If one is not explicitly set, the default is `float32`.\n\n[code example]"
    },
    {
      "id": "next-steps",
      "title": "Next Steps",
      "role": "content",
      "text": "Now that you understand how to create embeddings, explore these related guides:\n\n- [Getting Started](https://redis.io/docs/latest/../getting_started) - Learn the basics of indexes and queries\n- [Rerank Results](https://redis.io/docs/latest/rerankers) - Improve search quality with reranking models\n- [Cache Embeddings](https://redis.io/docs/latest/embeddings_cache) - Cache embedding vectors for faster repeated computations"
    },
    {
      "id": "cleanup",
      "title": "Cleanup",
      "role": "content",
      "text": "[code example]"
    }
  ],
  "examples": [
    {
      "id": "what-you-ll-learn-ex0",
      "language": "python",
      "code": "# import necessary modules\nimport os",
      "section_id": "what-you-ll-learn"
    },
    {
      "id": "openai-ex0",
      "language": "bash",
      "code": "pip install openai",
      "section_id": "openai"
    },
    {
      "id": "openai-ex1",
      "language": "python",
      "code": "import getpass\n\n# setup the API Key\napi_key = os.environ.get(\"OPENAI_API_KEY\") or getpass.getpass(\"Enter your OpenAI API key: \")",
      "section_id": "openai"
    },
    {
      "id": "openai-ex2",
      "language": "python",
      "code": "from redisvl.utils.vectorize import OpenAITextVectorizer\n\n# create a vectorizer\noai = OpenAITextVectorizer(\n    model=\"text-embedding-ada-002\",\n    api_config={\"api_key\": api_key},\n)\n\ntest = oai.embed(\"This is a test sentence.\")\nprint(\"Vector dimensions: \", len(test))\ntest[:10]",
      "section_id": "openai"
    },
    {
      "id": "openai-ex3",
      "language": "python",
      "code": "# Create many embeddings at once\nsentences = [\n    \"That is a happy dog\",\n    \"That is a happy person\",\n    \"Today is a sunny day\"\n]\n\nembeddings = oai.embed_many(sentences)\nembeddings[0][:10]",
      "section_id": "openai"
    },
    {
      "id": "openai-ex4",
      "language": "python",
      "code": "# OpenAI also supports asynchronous requests, which we can use to speed up the vectorization process.\nembeddings = await oai.aembed_many(sentences)\nprint(\"Number of Embeddings:\", len(embeddings))",
      "section_id": "openai"
    },
    {
      "id": "azure-openai-ex0",
      "language": "python",
      "code": "# additionally to the API Key, setup the API endpoint and version\napi_key = os.environ.get(\"AZURE_OPENAI_API_KEY\") or getpass.getpass(\"Enter your AzureOpenAI API key: \")\napi_version = os.environ.get(\"OPENAI_API_VERSION\") or getpass.getpass(\"Enter your AzureOpenAI API version: \")\nazure_endpoint = os.environ.get(\"AZURE_OPENAI_ENDPOINT\") or getpass.getpass(\"Enter your AzureOpenAI API endpoint: \")\ndeployment_name = os.environ.get(\"AZURE_OPENAI_DEPLOYMENT_NAME\", \"text-embedding-ada-002\")\n\n# Skip Azure examples when required values are missing (e.g. CI or Run All without Azure).\n_azure_configured = bool(azure_endpoint and api_key and api_version)",
      "section_id": "azure-openai"
    },
    {
      "id": "azure-openai-ex1",
      "language": "python",
      "code": "from redisvl.utils.vectorize import AzureOpenAITextVectorizer\n\nif not _azure_configured:\n    print(\"Skipping Azure OpenAI example: set AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY and OPENAI_API_VERSION\")\n    az_oai = None\nelse:\n    az_oai = AzureOpenAITextVectorizer(\n        model=deployment_name,  # Must be your CUSTOM deployment name\n        api_config={\n            \"api_key\": api_key,\n            \"api_version\": api_version,\n            \"azure_endpoint\": azure_endpoint,\n        },\n    )\n\n    test = az_oai.embed(\"This is a test sentence.\")\n    print(\"Vector dimensions: \", len(test))\n    test[:10]",
      "section_id": "azure-openai"
    },
    {
      "id": "azure-openai-ex2",
      "language": "python",
      "code": "# Just like OpenAI, AzureOpenAI supports batching embeddings and asynchronous requests.\nsentences = [\n    \"That is a happy dog\",\n    \"That is a happy person\",\n    \"Today is a sunny day\",\n]\n\nif _azure_configured and az_oai is not None:\n    embeddings = await az_oai.aembed_many(sentences)\n    embeddings[0][:10]\nelse:\n    print(\"Skipping: run the Azure cells above with valid configuration.\")",
      "section_id": "azure-openai"
    },
    {
      "id": "huggingface-ex0",
      "language": "bash",
      "code": "pip install sentence-transformers",
      "section_id": "huggingface"
    },
    {
      "id": "huggingface-ex1",
      "language": "python",
      "code": "os.environ[\"TOKENIZERS_PARALLELISM\"] = \"false\"\nfrom redisvl.utils.vectorize import HFTextVectorizer\n\n\n# create a vectorizer\n# choose your model from the huggingface website\nhf = HFTextVectorizer(model=\"sentence-transformers/all-mpnet-base-v2\")\n\n# embed a sentence\ntest = hf.embed(\"This is a test sentence.\")\ntest[:10]",
      "section_id": "huggingface"
    },
    {
      "id": "huggingface-ex2",
      "language": "python",
      "code": "# You can also create many embeddings at once\nembeddings = hf.embed_many(sentences, as_buffer=True)",
      "section_id": "huggingface"
    },
    {
      "id": "vertexai-ex0",
      "language": "bash",
      "code": "pip install \"google-cloud-aiplatform>=1.26\"",
      "section_id": "vertexai"
    },
    {
      "id": "vertexai-ex1",
      "language": "plaintext",
      "code": "GOOGLE_APPLICATION_CREDENTIALS=<path to your gcp JSON creds>\nGCP_PROJECT_ID=<your gcp project id>\nGCP_LOCATION=<your gcp geo region for vertex ai>",
      "section_id": "vertexai"
    },
    {
      "id": "vertexai-ex2",
      "language": "python",
      "code": "from redisvl.utils.vectorize import VertexAITextVectorizer\n\n\n# create a vectorizer\nvtx = VertexAITextVectorizer(api_config={\n    \"project_id\": os.environ.get(\"GCP_PROJECT_ID\") or getpass.getpass(\"Enter your GCP Project ID: \"),\n    \"location\": os.environ.get(\"GCP_LOCATION\") or getpass.getpass(\"Enter your GCP Location: \"),\n    \"google_application_credentials\": os.environ.get(\"GOOGLE_APPLICATION_CREDENTIALS\") or getpass.getpass(\"Enter your Google App Credentials path: \")\n})\n\n# embed a sentence\ntest = vtx.embed(\"This is a test sentence.\")\ntest[:10]",
      "section_id": "vertexai"
    },
    {
      "id": "cohere-ex0",
      "language": "bash",
      "code": "pip install cohere",
      "section_id": "cohere"
    },
    {
      "id": "cohere-ex1",
      "language": "python",
      "code": "import getpass\n# setup the API Key\napi_key = os.environ.get(\"COHERE_API_KEY\") or getpass.getpass(\"Enter your Cohere API key: \")",
      "section_id": "cohere"
    },
    {
      "id": "cohere-ex2",
      "language": "python",
      "code": "from redisvl.utils.vectorize import CohereTextVectorizer\n\n# create a vectorizer\nco = CohereTextVectorizer(\n    model=\"embed-english-v3.0\",\n    api_config={\"api_key\": api_key},\n)\n\n# embed a search query\ntest = co.embed(\"This is a test sentence.\", input_type='search_query')\nprint(\"Vector dimensions: \", len(test))\nprint(test[:10])\n\n# embed a document\ntest = co.embed(\"This is a test sentence.\", input_type='search_document')\nprint(\"Vector dimensions: \", len(test))\nprint(test[:10])",
      "section_id": "cohere"
    },
    {
      "id": "voyageai-ex0",
      "language": "bash",
      "code": "pip install voyageai",
      "section_id": "voyageai"
    },
    {
      "id": "voyageai-ex1",
      "language": "python",
      "code": "import getpass\n# setup the API Key\napi_key = os.environ.get(\"VOYAGE_API_KEY\") or getpass.getpass(\"Enter your VoyageAI API key: \")",
      "section_id": "voyageai"
    },
    {
      "id": "voyageai-ex2",
      "language": "python",
      "code": "from redisvl.utils.vectorize import VoyageAITextVectorizer\n\n# create a vectorizer\nvo = VoyageAITextVectorizer(\n    model=\"voyage-law-2\",  # Please check the available models at https://docs.voyageai.com/docs/embeddings\n    api_config={\"api_key\": api_key},\n)\n\n# embed a search query\ntest = vo.embed(\"This is a test sentence.\", input_type='query')\nprint(\"Vector dimensions: \", len(test))\nprint(test[:10])\n\n# embed a document\ntest = vo.embed(\"This is a test sentence.\", input_type='document')\nprint(\"Vector dimensions: \", len(test))\nprint(test[:10])",
      "section_id": "voyageai"
    },
    {
      "id": "mistral-ai-ex0",
      "language": "bash",
      "code": "pip install mistralai",
      "section_id": "mistral-ai"
    },
    {
      "id": "mistral-ai-ex1",
      "language": "python",
      "code": "from redisvl.utils.vectorize import MistralAITextVectorizer\n\n# expects the MISTRAL_API_KEY environment variable to be set\nmistral = MistralAITextVectorizer()\n\n# embed a sentence using their asynchronous method\ntest = await mistral.aembed(\"This is a test sentence.\")\nprint(\"Vector dimensions: \", len(test))\nprint(test[:10])",
      "section_id": "mistral-ai"
    },
    {
      "id": "amazon-bedrock-ex0",
      "language": "bash",
      "code": "pip install 'redisvl[bedrock]'  # Installs boto3",
      "section_id": "amazon-bedrock"
    },
    {
      "id": "amazon-bedrock-ex1",
      "language": "python",
      "code": "import os\nimport getpass\n\nif \"AWS_ACCESS_KEY_ID\" not in os.environ:\n    os.environ[\"AWS_ACCESS_KEY_ID\"] = getpass.getpass(\"Enter AWS Access Key ID: \")\nif \"AWS_SECRET_ACCESS_KEY\" not in os.environ:\n    os.environ[\"AWS_SECRET_ACCESS_KEY\"] = getpass.getpass(\"Enter AWS Secret Key: \")\n\nos.environ[\"AWS_REGION\"] = \"us-east-1\"  # Change as needed",
      "section_id": "amazon-bedrock"
    },
    {
      "id": "amazon-bedrock-ex2",
      "language": "python",
      "code": "from redisvl.utils.vectorize import BedrockTextVectorizer\n\nbedrock = BedrockTextVectorizer(\n    model=\"amazon.titan-embed-text-v2:0\"\n)\n\n# Single embedding\ntext = \"This is a test sentence.\"\nembedding = bedrock.embed(text)\nprint(f\"Vector dimensions: {len(embedding)}\")\n\n# Multiple embeddings\nsentences = [\n    \"That is a happy dog\",\n    \"That is a happy person\",\n    \"Today is a sunny day\"\n]\nembeddings = bedrock.embed_many(sentences)",
      "section_id": "amazon-bedrock"
    },
    {
      "id": "custom-vectorizers-ex0",
      "language": "python",
      "code": "from redisvl.utils.vectorize import CustomTextVectorizer\n\ndef generate_embeddings(text_input, **kwargs):\n    return [0.101] * 768\n\ncustom_vectorizer = CustomTextVectorizer(generate_embeddings)\n\ncustom_vectorizer.embed(\"This is a test sentence.\")[:10]",
      "section_id": "custom-vectorizers"
    },
    {
      "id": "custom-vectorizers-ex1",
      "language": "python",
      "code": "from redisvl.extensions.cache.llm import SemanticCache\n\ncache = SemanticCache(name=\"custom_cache\", vectorizer=custom_vectorizer)\n\ncache.store(\"this is a test prompt\", \"this is a test response\")\ncache.check(\"this is also a test prompt\")",
      "section_id": "custom-vectorizers"
    },
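    {
      "id": "custom-vectorizers-ex2",
      "language": "python",
      "code": "# Hedged sketch (not from the original page): the custom vectorizer can also\n# wrap a batch function. The `embed_many` keyword assumes the signature\n# CustomTextVectorizer(embed, embed_many=...) from current RedisVL releases.\nfrom redisvl.utils.vectorize import CustomTextVectorizer\n\ndef generate_embeddings(text_input, **kwargs):\n    return [0.101] * 768\n\ndef generate_many(text_inputs, **kwargs):\n    return [[0.101] * 768 for _ in text_inputs]\n\nbatch_vectorizer = CustomTextVectorizer(generate_embeddings, embed_many=generate_many)\n\nbatch_vectorizer.embed_many([\"first sentence\", \"second sentence\"])[0][:10]",
      "section_id": "custom-vectorizers"
    },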
    {
      "id": "search-with-provider-embeddings-ex0",
      "language": "yaml",
      "code": "version: '0.1.0'\n\nindex:\n    name: vectorizers\n    prefix: doc\n    storage_type: hash\n\nfields:\n    - name: text\n      type: text\n    - name: embedding\n      type: vector\n      attrs:\n        dims: 768\n        algorithm: flat\n        distance_metric: cosine",
      "section_id": "search-with-provider-embeddings"
    },
    {
      "id": "search-with-provider-embeddings-ex1",
      "language": "python",
      "code": "from redisvl.index import SearchIndex\n\n# construct a search index from the schema\nindex = SearchIndex.from_yaml(\"./schema.yaml\", redis_url=\"redis://localhost:6379\")\n\n# create the index (no data yet)\nindex.create(overwrite=True)",
      "section_id": "search-with-provider-embeddings"
    },
    {
      "id": "search-with-provider-embeddings-ex2",
      "language": "python",
      "code": "# use the CLI to see the created index\n!rvl index listall",
      "section_id": "search-with-provider-embeddings"
    },
    {
      "id": "search-with-provider-embeddings-ex3",
      "language": "python",
      "code": "from redisvl.redis.utils import array_to_buffer\n\nembeddings = hf.embed_many(sentences)\n\ndata = [{\"text\": t,\n         \"embedding\": array_to_buffer(v, dtype=\"float32\")}\n        for t, v in zip(sentences, embeddings)]\n\nindex.load(data)",
      "section_id": "search-with-provider-embeddings"
    },
    {
      "id": "search-with-provider-embeddings-ex4",
      "language": "python",
      "code": "from redisvl.query import VectorQuery\n\n# use the HuggingFace vectorizer again to create a query embedding\nquery_embedding = hf.embed(\"That is a happy cat\")\n\nquery = VectorQuery(\n    vector=query_embedding,\n    vector_field_name=\"embedding\",\n    return_fields=[\"text\"],\n    num_results=3\n)\n\nresults = index.query(query)\nfor doc in results:\n    print(doc[\"text\"], doc[\"vector_distance\"])",
      "section_id": "search-with-provider-embeddings"
    },
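    {
      "id": "search-with-provider-embeddings-ex5",
      "language": "python",
      "code": "# Hedged tip (not from the original page): derive the vector dimensions from\n# the vectorizer instead of hardcoding them in the schema. len() of a single\n# embedding always works, whatever model you choose.\ndims = len(hf.embed(\"test\"))\nprint(dims)  # 768 for sentence-transformers/all-mpnet-base-v2",
      "section_id": "search-with-provider-embeddings"
    },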
    {
      "id": "selecting-your-float-data-type-ex0",
      "language": "python",
      "code": "vectorizer = HFTextVectorizer(dtype=\"float16\")\n\n# subsequent calls to embed(..., as_buffer=True) and embed_many(..., as_buffer=True) will now encode as float16\nfloat16_bytes = vectorizer.embed(\"test sentence\", as_buffer=True)\n\n# to generate embeddings with a different dtype, instantiate a new vectorizer\nvectorizer_64 = HFTextVectorizer(dtype=\"float64\")\nfloat64_bytes = vectorizer_64.embed(\"test sentence\", as_buffer=True)\n\nfloat16_bytes != float64_bytes",
      "section_id": "selecting-your-float-data-type"
    },
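    {
      "id": "selecting-your-float-data-type-ex1",
      "language": "python",
      "code": "# Hedged sketch (not from the original page): the vectorizer dtype must match\n# the `datatype` attribute of the vector field in the index schema. The schema\n# below is illustrative.\nschema_dict = {\n    \"index\": {\"name\": \"dtype_demo\", \"prefix\": \"demo\"},\n    \"fields\": [\n        {\"name\": \"text\", \"type\": \"text\"},\n        {\n            \"name\": \"embedding\",\n            \"type\": \"vector\",\n            \"attrs\": {\n                \"dims\": 768,\n                \"algorithm\": \"flat\",\n                \"distance_metric\": \"cosine\",\n                \"datatype\": \"float16\",  # must match HFTextVectorizer(dtype=\"float16\")\n            },\n        },\n    ],\n}",
      "section_id": "selecting-your-float-data-type"
    },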
    {
      "id": "cleanup-ex0",
      "language": "python",
      "code": "index.delete()",
      "section_id": "cleanup"
    }
  ]
}
