Vector similarity

August 11, 2025

Vector similarity is how developers and ML engineers measure how close two data points are in a high-dimensional vector space. Vector similarity powers AI features like RAG and agent memory — and infrastructure like Redis makes it fast and production-ready.

Many AI models convert different data formats, including text, images, and more, into vector embeddings that can be stored, queried, and compared. Vector similarity is a low-level concept, but it’s foundational to building high-performance AI applications.

Before LLMs can use the data they ingest and depend on, that data typically has to be converted into vector embeddings. Vector similarity is a key part of how the LLM, once it’s queried, navigates that data.

Embeddings represent meaning, and most queries rely on stitching meaning together across multiple data points. By measuring vector similarities, LLMs can retrieve conceptually similar items even if they don’t match exactly. As a result, vector similarity allows LLMs to find the most relevant results, recommend the right items, or recall context. Without any concept of similarity, LLMs would struggle to compare, contrast, and combine data points, leaving AI applications slow, inaccurate, or both.

The three most common vector similarity metrics are:

  • Cosine, which measures the angle between vectors, regardless of their magnitude.
  • Dot product, which measures both the direction and magnitude of vector alignment.
  • Euclidean distance, which measures the straight-line distance between vectors.

Different metrics emphasize different properties of the vectors being compared. If the way you calculate vector similarity doesn’t fit your use case, it can become a constraint on the rest of your infrastructure, even limiting the performance of your AI applications.

Why vector similarity matters for real-world AI applications

If you open up ChatGPT as an end user and ask a simple question, the functionality seems clear and complete. But once your business-critical applications, supported by AI to varying degrees, face the real world and the strains of scale, details like vector similarity can become constraints.

Semantic search, for example, uses embeddings to allow search queries to capture conceptually similar items. Exact keyword matches, in contrast, might not return “queen” when you search for “king.” Instacart, for example, uses semantic search to help searchers filter through noisy data across the many kinds of retailers it connects users to. Without vector similarity, semantic search doesn’t work because conceptual similarity becomes impossible to measure.

And let’s return to RAG. RAG features work best when queries return intelligent, useful results. If an employee has a query about personnel records, and the RAG feature is overly literal, not returning records the employee knows are documented somewhere, the usefulness of the feature is diminished.

Similarly, a much-hyped AI approach, AI agents, cannot function without vector similarity. AI agents depend on memory to support their decision-making process. Without vector similarity, the memory that AI agents build as they work is less effective. The pattern repeats, and now, even AI agents can end up confused and unable to compare conceptually similar items.

Even more established AI use cases, such as personalization, recommendation systems, and fraud detection, rely on vector similarity. Without being able to compare items that are conceptually similar but technically different, AI applications can’t personalize features, recommend similar items, or detect when behavior is truly anomalous.

Throughout these examples, the question isn’t just, “Do we support vector similarity?” Infrastructure-level performance becomes a bottleneck as these workloads scale, meaning that the wrong approach to vector similarity can contribute to latency and throughput issues.

Core similarity metrics explained

Understanding the math behind vector comparisons is a core part of supporting, maintaining, and scaling AI applications. You don’t need to become a data scientist per se, but a basic understanding of these similarity metrics is essential for choosing the right tool and avoiding costly performance issues down the line.

If it gets too complex or if you can’t predict which metrics will be best for use cases that might still be developing, you can choose infrastructure options that support all potential paths. RedisVL, for example, is open-source and supports all three core similarity metrics—cosine, dot product, and Euclidean—and integrates with LangChain, SpringAI, and LlamaIndex, making it well-suited for real-time AI use cases.
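
To make that concrete, here is a minimal RedisVL index schema sketch showing where the similarity metric is chosen. The index name, field names, dimensions, and connection URL are illustrative, and the exact schema options can vary by RedisVL version, so check the library docs before adapting it.

```python
from redisvl.index import SearchIndex

# Illustrative schema: swap distance_metric for "ip" (dot product) or "l2"
# (Euclidean) depending on which comparison fits your use case.
schema = {
    "index": {"name": "docs", "prefix": "doc"},
    "fields": [
        {"name": "content", "type": "text"},
        {
            "name": "embedding",
            "type": "vector",
            "attrs": {
                "dims": 384,                  # must match your embedding model
                "distance_metric": "cosine",  # or "ip" / "l2"
                "algorithm": "flat",
                "datatype": "float32",
            },
        },
    ],
}

index = SearchIndex.from_dict(schema, redis_url="redis://localhost:6379")
index.create(overwrite=True)
```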

  • Cosine similarity: considers direction, but not magnitude.
  • Dot product: considers both direction and magnitude.
  • Euclidean distance: considers both direction and magnitude.

Cosine similarity

Cosine similarity measures the angle between vectors, regardless of their magnitude. You can calculate it as the cosine of the angle between the two vectors. A cosine similarity score of 1 indicates perfect similarity, 0 indicates no similarity, and -1 indicates complete dissimilarity.

Because cosine similarity disregards the length of the vectors, focusing only on directionality, it’s most useful when the overall scale of the compared vectors’ values isn’t meaningful. For example, if you’re querying across documents, the fact that one document might be much longer than another isn’t meaningful when looking for substantive similarity.
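
As a quick illustration, here is a small NumPy sketch (with made-up vectors) that computes cosine similarity and shows that scaling a vector does not change the score:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # cos(theta) = (a . b) / (|a| * |b|)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

doc = np.array([0.2, 0.8, 0.1])
query = 2 * doc  # same direction, twice the magnitude

# ~1.0 (up to floating-point error): direction matches, length is ignored
print(cosine_similarity(doc, query))
```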

Cosine similarity can become limiting, however, because it misses differences in scale that can be meaningful in certain contexts.

Dot product

Dot product measures both the direction and magnitude of vector alignment. The dot product is larger when vectors point in the same direction and when one or both vectors have large magnitudes.

Unlike cosine similarity, dot product is sensitive to vector magnitudes. If two vectors point in the same direction, an increase in their lengths will also increase the resulting dot product. Dot product is useful in contexts where magnitude is meaningful, and is often used in collaborative filtering and LLMs trained with dot product loss functions.

In recommendation systems, for example, an embedding might capture user activity, and dot product can represent higher levels of user activity. Recommendation systems relying on dot product can then weigh the preferences of highly active users more than less active users or provide better predictions for future user-item interactions.
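
Here is a toy NumPy sketch of that idea, using made-up activity embeddings, showing how a more active user with the same tastes produces a larger dot product:

```python
import numpy as np

item = np.array([0.3, 0.7, 0.0])         # item embedding (illustrative)
casual_user = np.array([0.2, 0.8, 0.1])  # light activity
power_user = 5 * casual_user             # same tastes, five times the activity

print(np.dot(casual_user, item))  # ~0.62
print(np.dot(power_user, item))   # ~3.1: same direction, larger magnitude, higher score
```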

Euclidean distance

Euclidean distance measures the straight-line distance between vectors and is sensitive to both magnitude and position in space, treating the vectors as though they were points in a geometric space. Euclidean distance accounts for the absolute magnitude of vector components, meaning it directly measures the distance between two vectors in space, including their directions and scales of difference.

To calculate Euclidean distance, you take the difference between the corresponding components of two vectors, square each difference, sum them up, and take the square root. A smaller Euclidean distance means the vectors are very close in terms of all their component values.
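
For example, a short NumPy sketch of that calculation on two illustrative vectors:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 6.0, 3.0])

# square the component differences, sum them, take the square root
distance = np.sqrt(np.sum((a - b) ** 2))

print(distance)               # 5.0
print(np.linalg.norm(a - b))  # same result with NumPy's built-in norm
```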

Euclidean distance is typically the best choice when differences in feature values are meaningful. If a feature compares user profiles with count-based features (i.e., features that rely on the frequency of items, characteristics, or events), Euclidean distance can measure how much those attributes differ.

What makes vector similarity hard to scale

The mathematics at the foundation of vector similarity is relatively simple. You don’t need to be a data scientist, ML engineer, or mathematician to understand the basic mechanics. The nuance, the complexity, and ultimately, the scalability and success of these approaches emerge when you consider vector similarity at production scale.

Performance at scale

In theory, computing vector similarity is just a mathematical calculation, but in practice, computing vector similarity across large vector sets, in production, is a systemic challenge that frequently faces latency bottlenecks.

High-dimensional vectors are expensive to store and process at scale. In a production context, that work can be made even more complex through infrastructural overhead, including index structures and sharding. All of this can increase the memory footprint, which can lead to greater infrastructure costs. These costs can also translate into performance issues, especially in use cases like real-time search, which require sub-millisecond infrastructure.

Throughout, there are tradeoffs between memory and latency. Some approaches use more memory to run faster queries; others reduce memory and increase computation costs; and others compress vectors to reduce the memory costs of distance calculation, even though it might require more computation.
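
As a rough illustration of the memory side of that tradeoff (the vector count and dimensions here are arbitrary), compressing float32 embeddings to int8 cuts storage by roughly 4x at the cost of some precision:

```python
import numpy as np

# one million 384-dimensional embeddings (illustrative sizes)
vectors = np.random.rand(1_000_000, 384).astype(np.float32)
print(f"float32: {vectors.nbytes / 1e9:.2f} GB")    # ~1.54 GB

# naive int8 quantization: trade a little precision for ~4x less memory
scale = np.abs(vectors).max()
quantized = np.round(vectors / scale * 127).astype(np.int8)
print(f"int8:    {quantized.nbytes / 1e9:.2f} GB")  # ~0.38 GB
```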

Operational complexity

Especially given the numerous performance issues and cost tradeoffs mentioned above, operational complexity can make vector similarity hard to scale. There are many open-source options, such as FAISS, a library that supports similarity search and dense vector clustering, but they tend to be complex.

These options tend to be powerful, but the infrastructure setup can be complex, and the complexity costs can accumulate over time, making maintenance difficult. Hosted solutions can often take care of this complexity for you, but there’s a tradeoff. Hosted solutions often introduce lock-in or inflexibility, making it difficult to iterate.

Integration challenges

Similarity search is one of the most frequent occasions where vector similarity comes into play, and it also highlights one of the scalability challenges: integrations. Teams building similarity search often need to plug into existing stacks. They might already be using AI frameworks, such as LangChain, semantic caches, or memory systems.

All of these tools and approaches can be powerful, but different toolstacks can struggle to scale when used in complex multi-modal or agentic workflows. The Redis Vector Library (RedisVL), in contrast, simplifies indexing and querying, making it easier to manage similarity metrics.

How Redis supports scalable vector similarity

Redis has long supported vector storage: Redis 7.2 introduced scalable vector similarity search in 2023, and RedisVL debuted in 2024. In April 2025, Redis announced vector sets, a data type designed specifically for vector similarity. Vector sets integrate fast, scalable vector search for text, image, and audio embeddings into your apps, allowing you to reduce memory use, simplify indexing, and optimize real-time similarity queries.
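
As a rough sketch of what working with vector sets can look like from a client: the command shapes below follow the vector sets announcement, but the key name, embedding values, and exact argument order are illustrative, so check the current Redis docs before relying on them.

```python
import redis

r = redis.Redis()  # assumes a local Redis with vector sets support

# Add a member with an illustrative 4-dimensional embedding to a vector set
emb = [0.12, 0.91, 0.33, 0.05]
r.execute_command("VADD", "products", "VALUES", len(emb), *emb, "item:1")

# Find the members most similar to a query embedding
query = [0.10, 0.88, 0.35, 0.02]
print(r.execute_command("VSIM", "products", "VALUES", len(query), *query,
                        "COUNT", 3, "WITHSCORES"))
```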

Redis is at the forefront of vector storage, regularly introducing new features, and has a proven track record of solving the scalability and complexity issues addressed throughout this article.

Vector search built for real-time AI workflows

Real-time is difficult to achieve reliably, and AI workflows only make real-time harder and infrastructure choices more important. Redis Cloud provides sub-millisecond latency and offers native support for cosine, dot product, and Euclidean vector similarity through RedisVL. With Redis, organizations can use vector search to power RAG, chatbots, semantic caching, and long-term agent memory – all without enduring the kind of latency that would damage the user experience.
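
For example, a minimal RedisVL retrieval sketch along these lines, assuming the illustrative “docs” index from the earlier schema sketch already exists and that your RedisVL version provides SearchIndex.from_existing (names and dimensions are placeholders):

```python
from redisvl.index import SearchIndex
from redisvl.query import VectorQuery

# Reconnect to the illustrative "docs" index created earlier
index = SearchIndex.from_existing("docs", redis_url="redis://localhost:6379")

query = VectorQuery(
    vector=[0.1] * 384,              # stand-in for a real query embedding
    vector_field_name="embedding",
    return_fields=["content"],
    num_results=3,
)

for doc in index.query(query):
    print(doc)
```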

Developer-ready and flexible

Infrastructure is only as good as it is usable. Powerful features hidden behind overly complex interfaces aren’t as powerful, and tools that don’t integrate well with other tools aren’t as useful as they could be.

Redis integrates with a wide range of tools and frameworks developers are already familiar with, including LangChain, LlamaIndex, SpringAI, and more. RedisVL is open source, easy to use, and available through Redis Cloud and hybrid or on-premises environments.

Redis in action

Redis has supported many organizations with their similarity workloads, including:

  • Relevance AI, which helps companies build AI agents, uses Redis to power vector search with sub-millisecond latency, allowing AI agents to retrieve relevant information and generate instant responses.
  • Superlinked, a compute framework and cloud infrastructure provider, used Redis Cloud to build a highly responsive, scalable vector database and index that sustains periods of non-stop heavy usage with 95th-percentile latency of 30ms.
  • Docugami, an AI-powered document engineering platform, uses Redis as a vector database to enable and accelerate generative AI tasks, including RAG, in-context learning, and vector search.

Jacky Koh, Co-Founder and CEO of Relevance AI, summarized the case for Redis best, explaining, “Every millisecond counts, and slow vector searches were limiting our AI agents from delivering instant, accurate responses.” If AI is your goal, then vector search and similarity are your building blocks, making Redis an ideal addition to your stack.

Redis vs. other vector infrastructure options

Redis is not the only vector infrastructure available, but customers frequently test it and find it to be the fastest, most scalable option. For example, Daniel Svonava, co-founder of Superlinked, explained that his team “had very specific requirements for a vector database. We looked at the available options and determined that the Redis Cloud best fit our needs.”

  • Pinecone, a fully managed vector database, is primarily a managed service, whereas Redis supports hosted and self-managed options.
  • FAISS is an open-source option that’s highly performant but doesn’t offer a REST API or native service layer, unlike Redis.
  • Weaviate is another open-source option, but its GraphQL interface can be intimidating, unlike the intuitive interface that Redis offers.
  • Milvus is a specialized vector database, but it requires significant overhead, whereas Redis minimizes overhead while offering fine-tuning options.
  • Elasticsearch offers vector search, but because Elasticsearch isn’t optimized for this use case, Redis can maintain greater performance and recall at scale.

Redis vs. Pinecone

Pinecone is a cloud-native, fully managed vector database. Pinecone is optimized for vector search and supports serverless architecture for scalability and hybrid search to enhance search accuracy. Pinecone is primarily offered as a managed service on cloud platforms, such as AWS and Azure, meaning that Pinecone is limited as a self-hosted option.

Redis, in contrast, supports both hosted and self-managed options, allowing companies full control over how and where they build their vector databases and vector search features.

Redis vs. FAISS, Weaviate, Milvus, and Elasticsearch

There is a wide variety of open source options available to support vector infrastructure, but each poses separate limitations.

FAISS, for example, is highly performant but infrastructure-heavy. FAISS doesn’t offer a REST API or native service layer, meaning it’s only suitable for teams that want to build everything from scratch.

Weaviate includes numerous built-in assumptions and requires more ramp-up for teams to understand. Additionally, its GraphQL interface can be intimidating for teams that haven’t used it before.

Milvus is a specialized vector database that is fairly performant, but it introduces operational overhead that poses tradeoffs to that performance. Milvus also requires separate metadata stores and GPU tuning to reach optimal speed.

Elasticsearch, a familiar option for most teams, does offer vector search via dense vector fields, but Elasticsearch isn’t optimized for this use case. Performance and recall can degrade at scale. As a result, Elasticsearch is often better for hybrid keyword and semantic use cases.

Own your vector search stack with Redis

AI is changing daily, and catching up can be a Pyrrhic victory if you buy into tools that create vendor lock-in. Instead, developers and teams must own their vector search stack to have the freedom and flexibility to move faster, ship better, and develop ever-smarter AI solutions.

Redis gives teams that control without sacrificing performance, allowing teams to blend flexibility, usability, and production readiness so that developers can use vectors however they need to.

Ultimately, as Svonava, co-founder of Superlinked, put it, “Users expect great search and recommendation functionality in every application and website they encounter, yet more than 80 percent of business data is unstructured—stored as text, images, audio, video, or other formats. That’s why vector databases with powerful search features will fuel the next generation of applications.”

If you want to build the next generation of applications, Redis is the infrastructure you need. Ready to own your vector search stack? Check out the Redis Vector Library (RedisVL) and then try Redis free or request a demo.