{
  "id": "vectorizer",
  "title": "Vectorizers",
  "url": "https://redis.io/docs/latest/develop/ai/redisvl/0.8.0/api/vectorizer/",
  "summary": "",
  "content": "\n\n## HFTextVectorizer\n\n\u003ca id=\"hftextvectorizer-api\"\u003e\u003c/a\u003e\n\n### `class HFTextVectorizer(model='sentence-transformers/all-mpnet-base-v2', dtype='float32', cache=None, *, dims=None)`\n\nBases: `BaseVectorizer`\n\nThe HFTextVectorizer class leverages Hugging Face’s Sentence Transformers\nfor generating vector embeddings from text input.\n\nThis vectorizer is particularly useful in scenarios where advanced natural language\nprocessing and understanding are required, and ideal for running on your own\nhardware without usage fees.\n\nYou can optionally enable caching to improve performance when generating\nembeddings for repeated text inputs.\n\nUtilizing this vectorizer involves specifying a pre-trained model from\nHugging Face’s vast collection of Sentence Transformers. These models are\ntrained on a variety of datasets and tasks, ensuring versatility and\nrobust performance across different embedding needs.\n\nRequirements:\n: - The sentence-transformers library must be installed with pip.\n\n```python\n# Basic usage\nvectorizer = HFTextVectorizer(model=\"sentence-transformers/all-mpnet-base-v2\")\nembedding = vectorizer.embed(\"Hello, world!\")\n\n# With caching enabled\nfrom redisvl.extensions.cache.embeddings import EmbeddingsCache\ncache = EmbeddingsCache(name=\"my_embeddings_cache\")\n\nvectorizer = HFTextVectorizer(\n    model=\"sentence-transformers/all-mpnet-base-v2\",\n    cache=cache\n)\n\n# First call will compute and cache the embedding\nembedding1 = vectorizer.embed(\"Hello, world!\")\n\n# Second call will retrieve from cache\nembedding2 = vectorizer.embed(\"Hello, world!\")\n\n# Batch processing\nembeddings = vectorizer.embed_many(\n    [\"Hello, world!\", \"How are you?\"],\n    batch_size=2\n)\n```\n\nInitialize the Hugging Face text vectorizer.\n\n* **Parameters:**\n  * **model** (*str*) – The pre-trained model from Hugging Face’s Sentence\n    Transformers to be used for embedding. 
Defaults to\n    ‘sentence-transformers/all-mpnet-base-v2’.\n  * **dtype** (*str*) – the default datatype to use when embedding text as byte arrays.\n    Used when setting as_buffer=True in calls to embed() and embed_many().\n    Defaults to ‘float32’.\n  * **cache** (*Optional* *[*[*EmbeddingsCache*]() *]*) – Optional EmbeddingsCache instance to cache embeddings for\n    better performance with repeated texts. Defaults to None.\n  * **\\*\\*kwargs** – Additional parameters to pass to the SentenceTransformer\n    constructor.\n  * **dims** (*Annotated* *[* *int* *|* *None* *,* *FieldInfo* *(* *annotation=NoneType* *,* *required=True* *,* *metadata=* *[* *Strict* *(* *strict=True* *)* *,* *Gt* *(* *gt=0* *)* *]* *)* *]*)\n* **Raises:**\n  * **ImportError** – If the sentence-transformers library is not installed.\n  * **ValueError** – If there is an error setting the embedding model dimensions.\n  * **ValueError** – If an invalid dtype is provided.\n\n#### `model_post_init(context, /)`\n\nThis function is meant to behave like a BaseModel method to initialise private attributes.\n\nIt takes context as an argument since that’s what pydantic-core passes when calling it.\n\n* **Parameters:**\n  * **self** (*BaseModel*) – The BaseModel instance.\n  * **context** (*Any*) – The context.\n* **Return type:**\n  None\n\n#### `model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}`\n\nConfiguration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].\n\n#### `property type: str`\n\nReturn the type of vectorizer.\n\n## OpenAITextVectorizer\n\n\u003ca id=\"openaitextvectorizer-api\"\u003e\u003c/a\u003e\n\n### `class OpenAITextVectorizer(model='text-embedding-ada-002', api_config=None, dtype='float32', cache=None, *, dims=None)`\n\nBases: `BaseVectorizer`\n\nThe OpenAITextVectorizer class utilizes OpenAI’s API to generate\nembeddings for text data.\n\nThis vectorizer is designed to interact with OpenAI’s embeddings 
API,\nrequiring an API key for authentication. The key can be provided directly\nin the api_config dictionary or through the OPENAI_API_KEY environment\nvariable. Users must obtain an API key from OpenAI’s website\n([https://api.openai.com/](https://api.openai.com/)). Additionally, the openai python client must be\ninstalled with pip install openai\u003e=1.13.0.\n\nThe vectorizer supports both synchronous and asynchronous operations,\nallowing for batch processing of texts and flexibility in handling\npreprocessing tasks.\n\nYou can optionally enable caching to improve performance when generating\nembeddings for repeated text inputs.\n\n```python\n# Basic usage with OpenAI embeddings\nvectorizer = OpenAITextVectorizer(\n    model=\"text-embedding-ada-002\",\n    api_config={\"api_key\": \"your_api_key\"} # OR set OPENAI_API_KEY in your env\n)\nembedding = vectorizer.embed(\"Hello, world!\")\n\n# With caching enabled\nfrom redisvl.extensions.cache.embeddings import EmbeddingsCache\ncache = EmbeddingsCache(name=\"openai_embeddings_cache\")\n\nvectorizer = OpenAITextVectorizer(\n    model=\"text-embedding-ada-002\",\n    api_config={\"api_key\": \"your_api_key\"},\n    cache=cache\n)\n\n# First call will compute and cache the embedding\nembedding1 = vectorizer.embed(\"Hello, world!\")\n\n# Second call will retrieve from cache\nembedding2 = vectorizer.embed(\"Hello, world!\")\n\n# Asynchronous batch embedding of multiple texts\nembeddings = await vectorizer.aembed_many(\n    [\"Hello, world!\", \"How are you?\"],\n    batch_size=2\n)\n```\n\nInitialize the OpenAI vectorizer.\n\n* **Parameters:**\n  * **model** (*str*) – Model to use for embedding. Defaults to\n    ‘text-embedding-ada-002’.\n  * **api_config** (*Optional* *[* *Dict* *]* *,* *optional*) – Dictionary containing the\n    API key and any additional OpenAI API options. 
Defaults to None.\n  * **dtype** (*str*) – the default datatype to use when embedding text as byte arrays.\n    Used when setting as_buffer=True in calls to embed() and embed_many().\n    Defaults to ‘float32’.\n  * **cache** (*Optional* *[*[*EmbeddingsCache*]() *]*) – Optional EmbeddingsCache instance to cache embeddings for\n    better performance with repeated texts. Defaults to None.\n  * **dims** (*Annotated* *[* *int* *|* *None* *,* *FieldInfo* *(* *annotation=NoneType* *,* *required=True* *,* *metadata=* *[* *Strict* *(* *strict=True* *)* *,* *Gt* *(* *gt=0* *)* *]* *)* *]*)\n* **Raises:**\n  * **ImportError** – If the openai library is not installed.\n  * **ValueError** – If the OpenAI API key is not provided.\n  * **ValueError** – If an invalid dtype is provided.\n\n#### `model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}`\n\nConfiguration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].\n\n#### `property type: str`\n\nReturn the type of vectorizer.\n\n## AzureOpenAITextVectorizer\n\n\u003ca id=\"azureopenaitextvectorizer-api\"\u003e\u003c/a\u003e\n\n### `class AzureOpenAITextVectorizer(model='text-embedding-ada-002', api_config=None, dtype='float32', cache=None, *, dims=None)`\n\nBases: `BaseVectorizer`\n\nThe AzureOpenAITextVectorizer class utilizes AzureOpenAI’s API to generate\nembeddings for text data.\n\nThis vectorizer is designed to interact with AzureOpenAI’s embeddings API,\nrequiring an API key, an AzureOpenAI deployment endpoint and API version.\nThese values can be provided directly in the api_config dictionary with\nthe parameters ‘azure_endpoint’, ‘api_version’ and ‘api_key’ or through the\nenvironment variables ‘AZURE_OPENAI_ENDPOINT’, ‘OPENAI_API_VERSION’, and ‘AZURE_OPENAI_API_KEY’.\nUsers must obtain these values from the ‘Keys and Endpoints’ section in their Azure OpenAI service.\nAdditionally, the openai python client must be installed with pip install 
openai\u003e=1.13.0.\n\nThe vectorizer supports both synchronous and asynchronous operations,\nallowing for batch processing of texts and flexibility in handling\npreprocessing tasks.\n\nYou can optionally enable caching to improve performance when generating\nembeddings for repeated text inputs.\n\n```python\n# Basic usage\nvectorizer = AzureOpenAITextVectorizer(\n    model=\"text-embedding-ada-002\",\n    api_config={\n        \"api_key\": \"your_api_key\", # OR set AZURE_OPENAI_API_KEY in your env\n        \"api_version\": \"your_api_version\", # OR set OPENAI_API_VERSION in your env\n        \"azure_endpoint\": \"your_azure_endpoint\", # OR set AZURE_OPENAI_ENDPOINT in your env\n    }\n)\nembedding = vectorizer.embed(\"Hello, world!\")\n\n# With caching enabled\nfrom redisvl.extensions.cache.embeddings import EmbeddingsCache\ncache = EmbeddingsCache(name=\"azureopenai_embeddings_cache\")\n\nvectorizer = AzureOpenAITextVectorizer(\n    model=\"text-embedding-ada-002\",\n    api_config={\n        \"api_key\": \"your_api_key\",\n        \"api_version\": \"your_api_version\",\n        \"azure_endpoint\": \"your_azure_endpoint\",\n    },\n    cache=cache\n)\n\n# First call will compute and cache the embedding\nembedding1 = vectorizer.embed(\"Hello, world!\")\n\n# Second call will retrieve from cache\nembedding2 = vectorizer.embed(\"Hello, world!\")\n\n# Asynchronous batch embedding of multiple texts\nembeddings = await vectorizer.aembed_many(\n    [\"Hello, world!\", \"How are you?\"],\n    batch_size=2\n)\n```\n\nInitialize the AzureOpenAI vectorizer.\n\n* **Parameters:**\n  * **model** (*str*) – Deployment to use for embedding. Must be the\n    ‘Deployment name’ not the ‘Model name’. 
Defaults to\n    ‘text-embedding-ada-002’.\n  * **api_config** (*Optional* *[* *Dict* *]* *,* *optional*) – Dictionary containing the\n    API key, API version, Azure endpoint, and any other API options.\n    Defaults to None.\n  * **dtype** (*str*) – the default datatype to use when embedding text as byte arrays.\n    Used when setting as_buffer=True in calls to embed() and embed_many().\n    Defaults to ‘float32’.\n  * **cache** (*Optional* *[*[*EmbeddingsCache*]() *]*) – Optional EmbeddingsCache instance to cache embeddings for\n    better performance with repeated texts. Defaults to None.\n  * **dims** (*Annotated* *[* *int* *|* *None* *,* *FieldInfo* *(* *annotation=NoneType* *,* *required=True* *,* *metadata=* *[* *Strict* *(* *strict=True* *)* *,* *Gt* *(* *gt=0* *)* *]* *)* *]*)\n* **Raises:**\n  * **ImportError** – If the openai library is not installed.\n  * **ValueError** – If the AzureOpenAI API key, version, or endpoint are not provided.\n  * **ValueError** – If an invalid dtype is provided.\n\n#### `model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}`\n\nConfiguration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].\n\n#### `property type: str`\n\nReturn the type of vectorizer.\n\n## VertexAITextVectorizer\n\n\u003ca id=\"vertexaitextvectorizer-api\"\u003e\u003c/a\u003e\n\n### `class VertexAITextVectorizer(model='textembedding-gecko', api_config=None, dtype='float32', cache=None, *, dims=None)`\n\nBases: `BaseVectorizer`\n\nThe VertexAITextVectorizer uses Google’s VertexAI Palm 2 embedding model\nAPI to create text embeddings.\n\nThis vectorizer is tailored for use in\nenvironments where integration with Google Cloud Platform (GCP) services is\na key requirement.\n\nUtilizing this vectorizer requires an active GCP project and location\n(region), along with appropriate application credentials. 
These can be\nprovided through the api_config dictionary or by setting the\nGOOGLE_APPLICATION_CREDENTIALS environment variable. Additionally, the vertexai python client must be\ninstalled with pip install google-cloud-aiplatform\u003e=1.26.\n\nYou can optionally enable caching to improve performance when generating\nembeddings for repeated text inputs.\n\n```python\n# Basic usage\nvectorizer = VertexAITextVectorizer(\n    model=\"textembedding-gecko\",\n    api_config={\n        \"project_id\": \"your_gcp_project_id\", # OR set GCP_PROJECT_ID\n        \"location\": \"your_gcp_location\",     # OR set GCP_LOCATION\n    })\nembedding = vectorizer.embed(\"Hello, world!\")\n\n# With caching enabled\nfrom redisvl.extensions.cache.embeddings import EmbeddingsCache\ncache = EmbeddingsCache(name=\"vertexai_embeddings_cache\")\n\nvectorizer = VertexAITextVectorizer(\n    model=\"textembedding-gecko\",\n    api_config={\n        \"project_id\": \"your_gcp_project_id\",\n        \"location\": \"your_gcp_location\",\n    },\n    cache=cache\n)\n\n# First call will compute and cache the embedding\nembedding1 = vectorizer.embed(\"Hello, world!\")\n\n# Second call will retrieve from cache\nembedding2 = vectorizer.embed(\"Hello, world!\")\n\n# Batch embedding of multiple texts\nembeddings = vectorizer.embed_many(\n    [\"Hello, world!\", \"Goodbye, world!\"],\n    batch_size=2\n)\n```\n\nInitialize the VertexAI vectorizer.\n\n* **Parameters:**\n  * **model** (*str*) – Model to use for embedding. Defaults to\n    ‘textembedding-gecko’.\n  * **api_config** (*Optional* *[* *Dict* *]* *,* *optional*) – Dictionary containing the\n    API config details. 
Defaults to None.\n  * **dtype** (*str*) – the default datatype to use when embedding text as byte arrays.\n    Used when setting as_buffer=True in calls to embed() and embed_many().\n    Defaults to ‘float32’.\n  * **cache** (*Optional* *[*[*EmbeddingsCache*]() *]*) – Optional EmbeddingsCache instance to cache embeddings for\n    better performance with repeated texts. Defaults to None.\n  * **dims** (*Annotated* *[* *int* *|* *None* *,* *FieldInfo* *(* *annotation=NoneType* *,* *required=True* *,* *metadata=* *[* *Strict* *(* *strict=True* *)* *,* *Gt* *(* *gt=0* *)* *]* *)* *]*)\n* **Raises:**\n  * **ImportError** – If the google-cloud-aiplatform library is not installed.\n  * **ValueError** – If the API key is not provided.\n  * **ValueError** – If an invalid dtype is provided.\n\n#### `model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}`\n\nConfiguration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].\n\n#### `property type: str`\n\nReturn the type of vectorizer.\n\n## CohereTextVectorizer\n\n\u003ca id=\"coheretextvectorizer-api\"\u003e\u003c/a\u003e\n\n### `class CohereTextVectorizer(model='embed-english-v3.0', api_config=None, dtype='float32', cache=None, *, dims=None)`\n\nBases: `BaseVectorizer`\n\nThe CohereTextVectorizer class utilizes Cohere’s API to generate\nembeddings for text data.\n\nThis vectorizer is designed to interact with Cohere’s /embed API,\nrequiring an API key for authentication. The key can be provided\ndirectly in the api_config dictionary or through the COHERE_API_KEY\nenvironment variable. Users must obtain an API key from Cohere’s website\n([https://dashboard.cohere.com/](https://dashboard.cohere.com/)). 
Additionally, the cohere python\nclient must be installed with pip install cohere.\n\nThe vectorizer supports only synchronous operations, allowing for batch\nprocessing of texts and flexibility in handling preprocessing tasks.\n\nYou can optionally enable caching to improve performance when generating\nembeddings for repeated text inputs.\n\n```python\nfrom redisvl.utils.vectorize import CohereTextVectorizer\n\n# Basic usage\nvectorizer = CohereTextVectorizer(\n    model=\"embed-english-v3.0\",\n    api_config={\"api_key\": \"your-cohere-api-key\"} # OR set COHERE_API_KEY in your env\n)\nquery_embedding = vectorizer.embed(\n    text=\"your input query text here\",\n    input_type=\"search_query\"\n)\ndoc_embeddings = vectorizer.embed_many(\n    texts=[\"your document text\", \"more document text\"],\n    input_type=\"search_document\"\n)\n\n# With caching enabled\nfrom redisvl.extensions.cache.embeddings import EmbeddingsCache\ncache = EmbeddingsCache(name=\"cohere_embeddings_cache\")\n\nvectorizer = CohereTextVectorizer(\n    model=\"embed-english-v3.0\",\n    api_config={\"api_key\": \"your-cohere-api-key\"},\n    cache=cache\n)\n\n# First call will compute and cache the embedding\nembedding1 = vectorizer.embed(\n    text=\"your input query text here\",\n    input_type=\"search_query\"\n)\n\n# Second call will retrieve from cache\nembedding2 = vectorizer.embed(\n    text=\"your input query text here\",\n    input_type=\"search_query\"\n)\n```\n\nInitialize the Cohere vectorizer.\n\nVisit [https://cohere.ai/embed](https://cohere.ai/embed) to learn about embeddings.\n\n* **Parameters:**\n  * **model** (*str*) – Model to use for embedding. 
Defaults to ‘embed-english-v3.0’.\n  * **api_config** (*Optional* *[* *Dict* *]* *,* *optional*) – Dictionary containing the API key.\n    Defaults to None.\n  * **dtype** (*str*) – the default datatype to use when embedding text as byte arrays.\n    Used when setting as_buffer=True in calls to embed() and embed_many().\n    ‘float32’ will use Cohere’s float embeddings, ‘int8’ and ‘uint8’ will map\n    to Cohere’s corresponding embedding types. Defaults to ‘float32’.\n  * **cache** (*Optional* *[*[*EmbeddingsCache*]() *]*) – Optional EmbeddingsCache instance to cache embeddings for\n    better performance with repeated texts. Defaults to None.\n  * **dims** (*Annotated* *[* *int* *|* *None* *,* *FieldInfo* *(* *annotation=NoneType* *,* *required=True* *,* *metadata=* *[* *Strict* *(* *strict=True* *)* *,* *Gt* *(* *gt=0* *)* *]* *)* *]*)\n* **Raises:**\n  * **ImportError** – If the cohere library is not installed.\n  * **ValueError** – If the API key is not provided.\n  * **ValueError** – If an invalid dtype is provided.\n\n#### `model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}`\n\nConfiguration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].\n\n#### `property type: str`\n\nReturn the type of vectorizer.\n\n## BedrockTextVectorizer\n\n\u003ca id=\"bedrocktextvectorizer-api\"\u003e\u003c/a\u003e\n\n### `class BedrockTextVectorizer(model='amazon.titan-embed-text-v2:0', api_config=None, dtype='float32', cache=None, *, dims=None)`\n\nBases: `BaseVectorizer`\n\nThe BedrockTextVectorizer class utilizes Amazon Bedrock’s API to generate\nembeddings for text data.\n\nThis vectorizer is designed to interact with the Amazon Bedrock API,\nrequiring AWS credentials for authentication. 
The credentials can be provided\ndirectly in the api_config dictionary or through environment variables:\n- AWS_ACCESS_KEY_ID\n- AWS_SECRET_ACCESS_KEY\n- AWS_REGION (defaults to us-east-1)\n\nThe vectorizer supports synchronous operations with batch processing and\npreprocessing capabilities.\n\nYou can optionally enable caching to improve performance when generating\nembeddings for repeated text inputs.\n\n```python\n# Basic usage with explicit credentials\nvectorizer = BedrockTextVectorizer(\n    model=\"amazon.titan-embed-text-v2:0\",\n    api_config={\n        \"aws_access_key_id\": \"your_access_key\",\n        \"aws_secret_access_key\": \"your_secret_key\",\n        \"aws_region\": \"us-east-1\"\n    }\n)\n\n# With environment variables and caching\nfrom redisvl.extensions.cache.embeddings import EmbeddingsCache\ncache = EmbeddingsCache(name=\"bedrock_embeddings_cache\")\n\nvectorizer = BedrockTextVectorizer(\n    model=\"amazon.titan-embed-text-v2:0\",\n    cache=cache\n)\n\n# First call will compute and cache the embedding\nembedding1 = vectorizer.embed(\"Hello, world!\")\n\n# Second call will retrieve from cache\nembedding2 = vectorizer.embed(\"Hello, world!\")\n\n# Generate batch embeddings\nembeddings = vectorizer.embed_many([\"Hello\", \"World\"], batch_size=2)\n```\n\nInitialize the AWS Bedrock Vectorizer.\n\n* **Parameters:**\n  * **model** (*str*) – The Bedrock model ID to use. 
Defaults to ‘amazon.titan-embed-text-v2:0’.\n  * **api_config** (*Optional* *[* *Dict* *[* *str* *,* *str* *]* *]*) – AWS credentials and config.\n    Can include: aws_access_key_id, aws_secret_access_key, aws_region.\n    If not provided, will use environment variables.\n  * **dtype** (*str*) – the default datatype to use when embedding text as byte arrays.\n    Used when setting as_buffer=True in calls to embed() and embed_many().\n    Defaults to ‘float32’.\n  * **cache** (*Optional* *[*[*EmbeddingsCache*]() *]*) – Optional EmbeddingsCache instance to cache embeddings for\n    better performance with repeated texts. Defaults to None.\n  * **dims** (*Annotated* *[* *int* *|* *None* *,* *FieldInfo* *(* *annotation=NoneType* *,* *required=True* *,* *metadata=* *[* *Strict* *(* *strict=True* *)* *,* *Gt* *(* *gt=0* *)* *]* *)* *]*)\n* **Raises:**\n  * **ValueError** – If credentials are not provided in config or environment.\n  * **ImportError** – If boto3 is not installed.\n  * **ValueError** – If an invalid dtype is provided.\n\n#### `model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}`\n\nConfiguration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].\n\n#### `property type: str`\n\nReturn the type of vectorizer.\n\n## CustomTextVectorizer\n\n\u003ca id=\"customtextvectorizer-api\"\u003e\u003c/a\u003e\n\n### `class CustomTextVectorizer(embed, embed_many=None, aembed=None, aembed_many=None, dtype='float32', cache=None)`\n\nBases: `BaseVectorizer`\n\nThe CustomTextVectorizer class wraps user-defined embedding methods to create\nembeddings for text data.\n\nThis vectorizer is designed to accept a provided callable text vectorizer and\nprovides a class definition to allow for compatibility with RedisVL.\nThe vectorizer may support both synchronous and asynchronous operations,\nallowing for batch processing of texts, but at a minimum it must implement a\nsynchronous ‘embed()’ 
method.\n\nYou can optionally enable caching to improve performance when generating\nembeddings for repeated text inputs.\n\n```python\n# Basic usage with a custom embedding function\nvectorizer = CustomTextVectorizer(\n    embed=my_vectorizer.generate_embedding\n)\nembedding = vectorizer.embed(\"Hello, world!\")\n\n# With caching enabled\nfrom redisvl.extensions.cache.embeddings import EmbeddingsCache\ncache = EmbeddingsCache(name=\"my_embeddings_cache\")\n\nvectorizer = CustomTextVectorizer(\n    embed=my_vectorizer.generate_embedding,\n    cache=cache\n)\n\n# First call will compute and cache the embedding\nembedding1 = vectorizer.embed(\"Hello, world!\")\n\n# Second call will retrieve from cache\nembedding2 = vectorizer.embed(\"Hello, world!\")\n\n# Asynchronous batch embedding of multiple texts\nembeddings = await vectorizer.aembed_many(\n    [\"Hello, world!\", \"How are you?\"],\n    batch_size=2\n)\n```\n\nInitialize the Custom vectorizer.\n\n* **Parameters:**\n  * **embed** (*Callable*) – a Callable function that accepts a string object and returns a list of floats.\n  * **embed_many** (*Optional* *[* *Callable* *]*) – a Callable function that accepts a list of string objects and returns a list containing lists of floats. Defaults to None.\n  * **aembed** (*Optional* *[* *Callable* *]*) – an asynchronous Callable function that accepts a string object and returns a list of floats. Defaults to None.\n  * **aembed_many** (*Optional* *[* *Callable* *]*) – an asynchronous Callable function that accepts a list of string objects and returns a list containing lists of floats. Defaults to None.\n  * **dtype** (*str*) – the default datatype to use when embedding text as byte arrays.\n    Used when setting as_buffer=True in calls to embed() and embed_many().\n    Defaults to ‘float32’.\n  * **cache** (*Optional* *[*[*EmbeddingsCache*]() *]*) – Optional EmbeddingsCache instance to cache embeddings for\n    better performance with repeated texts. 
Defaults to None.\n* **Raises:**\n  **ValueError** – if embedding validation fails.\n\n#### `model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}`\n\nConfiguration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].\n\n#### `property type: str`\n\nReturn the type of vectorizer.\n\n## VoyageAITextVectorizer\n\n\u003ca id=\"voyageaitextvectorizer-api\"\u003e\u003c/a\u003e\n\n### `class VoyageAITextVectorizer(model='voyage-large-2', api_config=None, dtype='float32', cache=None, *, dims=None)`\n\nBases: `BaseVectorizer`\n\nThe VoyageAITextVectorizer class utilizes VoyageAI’s API to generate\nembeddings for text data.\n\nThis vectorizer is designed to interact with VoyageAI’s /embed API,\nrequiring an API key for authentication. The key can be provided\ndirectly in the api_config dictionary or through the VOYAGE_API_KEY\nenvironment variable. Users must obtain an API key from VoyageAI’s website\n([https://dash.voyageai.com/](https://dash.voyageai.com/)). 
Additionally, the voyageai python\nclient must be installed with pip install voyageai.\n\nThe vectorizer supports both synchronous and asynchronous operations, allowing for batch\nprocessing of texts and flexibility in handling preprocessing tasks.\n\nYou can optionally enable caching to improve performance when generating\nembeddings for repeated text inputs.\n\n```python\nfrom redisvl.utils.vectorize import VoyageAITextVectorizer\n\n# Basic usage\nvectorizer = VoyageAITextVectorizer(\n    model=\"voyage-large-2\",\n    api_config={\"api_key\": \"your-voyageai-api-key\"} # OR set VOYAGE_API_KEY in your env\n)\nquery_embedding = vectorizer.embed(\n    text=\"your input query text here\",\n    input_type=\"query\"\n)\ndoc_embeddings = vectorizer.embed_many(\n    texts=[\"your document text\", \"more document text\"],\n    input_type=\"document\"\n)\n\n# With caching enabled\nfrom redisvl.extensions.cache.embeddings import EmbeddingsCache\ncache = EmbeddingsCache(name=\"voyageai_embeddings_cache\")\n\nvectorizer = VoyageAITextVectorizer(\n    model=\"voyage-large-2\",\n    api_config={\"api_key\": \"your-voyageai-api-key\"},\n    cache=cache\n)\n\n# First call will compute and cache the embedding\nembedding1 = vectorizer.embed(\n    text=\"your input query text here\",\n    input_type=\"query\"\n)\n\n# Second call will retrieve from cache\nembedding2 = vectorizer.embed(\n    text=\"your input query text here\",\n    input_type=\"query\"\n)\n```\n\nInitialize the VoyageAI vectorizer.\n\nVisit [https://docs.voyageai.com/docs/embeddings](https://docs.voyageai.com/docs/embeddings) to learn about embeddings and check the available models.\n\n* **Parameters:**\n  * **model** (*str*) – Model to use for embedding. 
Defaults to \"voyage-large-2\".\n  * **api_config** (*Optional* *[* *Dict* *]* *,* *optional*) – Dictionary containing the API key.\n    Defaults to None.\n  * **dtype** (*str*) – the default datatype to use when embedding text as byte arrays.\n    Used when setting as_buffer=True in calls to embed() and embed_many().\n    Defaults to ‘float32’.\n  * **cache** (*Optional* *[*[*EmbeddingsCache*]() *]*) – Optional EmbeddingsCache instance to cache embeddings for\n    better performance with repeated texts. Defaults to None.\n  * **dims** (*Annotated* *[* *int* *|* *None* *,* *FieldInfo* *(* *annotation=NoneType* *,* *required=True* *,* *metadata=* *[* *Strict* *(* *strict=True* *)* *,* *Gt* *(* *gt=0* *)* *]* *)* *]*)\n* **Raises:**\n  * **ImportError** – If the voyageai library is not installed.\n  * **ValueError** – If the API key is not provided.\n\n#### `model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}`\n\nConfiguration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].\n\n#### `property type: str`\n\nReturn the type of vectorizer.\n",
  "tags": [],
  "last_updated": "2026-04-01T08:10:08-05:00"
}

