Building an AI-Powered Video Q&A Application with Redis and LangChain

Will Johnston

Prasan Kumar

What you will learn in this tutorial

This tutorial focuses on building a Q&A engine for video content. It covers the following topics:

How to use OpenAI, Google Gemini, and LangChain to summarize video content and generate vector embeddings
How to use Redis to store and search vector embeddings
How to use Redis as a semantic vector search cache

GITHUB CODE

Below is the command to clone the source code for the application used in this tutorial

Introduction

Before we dive into the details of this tutorial, let's go over a few concepts that are important to understand when building generative AI applications.

Generative AI is a rapidly evolving field that focuses on creating content, whether it's text, images, or even video. It leverages deep learning techniques to generate new, unique outputs based on learned patterns and data.
Retrieval-Augmented Generation (RAG) combines generative models with external knowledge sources to provide more accurate and informed responses. This technique is particularly useful in applications where context-specific information is critical.
LangChain is a powerful library that facilitates the development of applications involving language models. It simplifies tasks such as summarization, question answering, and interaction with generative models like ChatGPT or Google Gemini.
Google Gemini and OpenAI/ChatGPT are generative models that can be used to generate text based on a given prompt. They are useful for applications that require a large amount of text generation, such as summarization or question answering.
Semantic vector search is a technique that uses vector embeddings to find similar items in a database. It is typically combined with RAG to provide more accurate responses to user queries.
Redis is an in-memory database that can be used to store and search vector embeddings. It is particularly useful for applications that require fast, real-time responses.

Our application leverages these technologies to create a unique Q&A platform based on video content. Users can upload YouTube video URLs or IDs, and the application utilizes generative AI to summarize these videos, formulate potential questions, and create a searchable database. This database can then be queried to find answers to user-submitted questions, drawing directly from the video content.

High-level overview of the AI video Q&A application with Redis

Here's how our application uses AI and semantic vector search to answer user questions based on video content:

1. Uploading videos: Users can upload YouTube videos either via links (e.g. https://www.youtube.com/watch?v=LaiQFZ5bXaM) or video IDs (e.g. LaiQFZ5bXaM). The application processes these inputs to retrieve necessary video information. For the purposes of this tutorial, the app is pre-seeded with a collection of videos from the Redis YouTube channel. However, when you run the application you can adjust it to cover your own set of videos.

2. Video processing and AI interaction: Using the YouTube Data API, the application obtains video titles, descriptions, and thumbnails. It also uses SearchAPI.io to retrieve video transcripts. These transcripts are then passed to a large language model (LLM) - either Google Gemini or OpenAI's ChatGPT - for summarization and sample question generation. The LLM also generates vector embeddings for these summaries.

An example summary and sample questions generated by the LLM are shown below:

3. Data storage with Redis: All generated data, including video summaries, potential questions, and vector embeddings, are stored in Redis. The app utilizes Redis's diverse data types for efficient data handling, caching, and quick retrieval.

4. Search and answer retrieval: The frontend, built with Next.js, allows users to ask questions. The application then searches the Redis database using semantic vector similarity to find relevant video content. It further uses the LLM to formulate answers, prioritizing information from video transcripts.

5. Presentation of results: The app displays the most relevant videos along with the AI-generated answers, offering a comprehensive and interactive user experience. It also displays cached results from previous queries using semantic vector caching for faster response times.

Setting Up the Environment

To get started with our AI-powered video Q&A application, you'll first need to set up your development environment. We'll follow the instructions outlined in the project's README.md file.

Requirements

Node.js
Docker
SearchAPI.io API Key
This is used to retrieve video transcripts and is free for up to 100 requests. The application caches the results to help avoid exceeding the free tier.
Google API Key
You must have the following APIs enabled:
YouTube Data API v3
Generative Language API
This is used to retrieve video information and prompt the Google Gemini model. This is not free.
OpenAI API Key
This is used to prompt the OpenAI ChatGPT model. This is not free.

Setting Up Redis

Redis is used as our database to store and retrieve data efficiently. You can start quickly with a cloud-hosted Redis instance by signing up at redis.com/try-free. This is ideal for both development and testing purposes. You can easily store the data for this application within the limitations of the Redis free tier.

Cloning the Repository

First, clone the repository containing our project:

Installing Dependencies

After setting up your Node.js environment, you'll need to install the necessary packages. Navigate to the root of your project directory and run the following command:

This command will install all the dependencies listed in the package.json file, ensuring you have everything needed to run the application.

Configuration

Before running the application, make sure to configure the environment variables. There is a script to automatically generate the .env files for you. Run the following command:

This will generate the following files:

app/.env - This file contains the environment variables for the Next.js application.
app/.env.docker - This file contains overrides for the environment variables when running in Docker.
services/video-search/.env - This file contains the environment variables for the video search service.
services/video-search/.env.docker - This file contains overrides for the environment variables when running in Docker.

By default, you should not need to touch the environment files in the app. However, you will need to configure the environment files in the services/video-search directory.

The services/video-search/.env looks like this:

For Gemini models, you can use the following if you are not sure what to do:

For OpenAI models, you can use the following if you are not sure what to do:

NOTE: Depending on your OpenAI tier you may have to use a different summary model. gpt-3.5 models will be okay.

The _PREFIX environment variables are used to prefix the keys in Redis. This is useful if you want to use the same Redis instance for multiple applications. They have the following defaults:

If you're satisfied with the defaults, you can delete these values from the .env file.
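
As a rough illustration of how a prefix variable with a fallback default might be resolved and applied to a Redis key, consider the sketch below. The variable name EXAMPLE_SUMMARY_PREFIX, the default "summaries:", and both helper functions are hypothetical and are not taken from the project.

```typescript
// Hypothetical helpers: the names and default values are illustrative only.

// Read a prefix from an environment-like object, falling back to a default
// when the variable is missing or empty.
function resolvePrefix(
  env: Record<string, string | undefined>,
  variableName: string,
  fallback: string,
): string {
  const value = env[variableName];
  return value !== undefined && value !== "" ? value : fallback;
}

// Build the final Redis key by prepending the prefix to an identifier.
function buildKey(prefix: string, id: string): string {
  return `${prefix}${id}`;
}

// Example: no override is set, so the (hypothetical) default is used.
const summaryPrefix = resolvePrefix({}, "EXAMPLE_SUMMARY_PREFIX", "summaries:");
const summaryKey = buildKey(summaryPrefix, "LaiQFZ5bXaM");
```

Keeping the prefix in configuration means two applications sharing one Redis instance can use different prefixes and never collide on keys.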

Lastly, the services/video-search/.env.docker file contains overrides for the Redis URL when used in Docker. By default this app sets up a local Redis instance in Docker. If you are using a cloud instance, you can simply add the URL to your .env and delete the override in the .env.docker file.

Running the application

After installing and configuring the application, run the following command to build the Docker images and run containers:

This command builds the app and the video service, and deploys them to Docker. Everything is set up for hot reloading, so if you make changes to the code, the servers restart automatically.

Once the containers are up and running, the application will be accessible via your web browser:

Client: Available at http://localhost (Port 80).
Video search service: Accessible at http://localhost:8000.

This setup allows you to interact with the client-side application through your browser and make requests to the video search service hosted on a separate port.

The video search service doesn't publish a client application. Instead, it exposes a REST API that can be used to interact with the service. You can validate that it is running by checking Docker or by visiting the following URL:

http://localhost:8000/api/healthcheck

You should be up and running now! The rest of this tutorial is focused on how the application works and how to use it, with code examples.

How to build a video Q&A application with Redis and LangChain

Video uploading and processing

Handling video uploads and retrieving video transcripts and metadata

The backend is set up to handle YouTube video links or IDs. The relevant code snippet from the project demonstrates how these inputs are processed.
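
To make the idea concrete, here is a minimal sketch of how a link or a bare ID could be normalized to a video ID. It is not the project's code: the parseVideoId function is hypothetical, and it assumes that video IDs are 11 characters drawn from letters, digits, underscores, and hyphens.

```typescript
// Hypothetical helper, not the project's implementation.
// Assumption: YouTube video IDs are 11 characters from [A-Za-z0-9_-].
const VIDEO_ID_PATTERN = /^[A-Za-z0-9_-]{11}$/;

function parseVideoId(input: string): string | null {
  const trimmed = input.trim();

  // Case 1: the input is already a bare video ID.
  if (VIDEO_ID_PATTERN.test(trimmed)) {
    return trimmed;
  }

  // Case 2: a watch URL such as https://www.youtube.com/watch?v=<id>
  const watchMatch = trimmed.match(/[?&]v=([A-Za-z0-9_-]{11})(?:[&#]|$)/);
  if (watchMatch) {
    return watchMatch[1] ?? null;
  }

  // Case 3: a short URL such as https://youtu.be/<id>
  const shortMatch = trimmed.match(/youtu\.be\/([A-Za-z0-9_-]{11})(?:[?&#/]|$)/);
  if (shortMatch) {
    return shortMatch[1] ?? null;
  }

  // Anything else is treated as invalid input.
  return null;
}
```

With this sketch, both https://www.youtube.com/watch?v=LaiQFZ5bXaM and LaiQFZ5bXaM resolve to the same ID, so everything downstream only has to deal with IDs.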

In the same file you will see two caches:

These caches are used to store the transcripts (as a string) and video metadata (as JSON) in Redis. The cache functions are helper functions that use Redis to store and retrieve data. They look like this:

You will see these functions used elsewhere in the app. They are used to prevent unnecessary API calls, in this case to SearchAPI.io and the YouTube API.
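
The caching idea can be sketched as follows. This is an illustration under stated assumptions rather than the project's helper: the StringStore interface, the InMemoryStore class, and the cached function are hypothetical, and the in-memory store stands in for a Redis client so the example runs without a server. For simplicity it serializes every value as JSON, whereas the application stores transcripts as plain strings.

```typescript
// Minimal key-value interface standing in for a Redis client (hypothetical).
interface StringStore {
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<void>;
}

// In-memory stand-in so the sketch runs without a Redis server.
class InMemoryStore implements StringStore {
  private data = new Map<string, string>();

  async get(key: string): Promise<string | null> {
    return this.data.get(key) ?? null;
  }

  async set(key: string, value: string): Promise<void> {
    this.data.set(key, value);
  }
}

// Return the cached value if present. Otherwise call `fetcher` (for example
// a transcript or metadata API call), store its result, and return it.
async function cached<T>(
  store: StringStore,
  prefix: string,
  id: string,
  fetcher: () => Promise<T>,
): Promise<T> {
  const key = `${prefix}${id}`;
  const hit = await store.get(key);
  if (hit !== null) {
    return JSON.parse(hit) as T;
  }
  const fresh = await fetcher();
  await store.set(key, JSON.stringify(fresh));
  return fresh;
}
```

Calling cached twice with the same prefix and ID invokes the fetcher only once, which is what keeps repeated requests within API quotas.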

Summarizing video content with LangChain, Redis, Google Gemini, and OpenAI ChatGPT

After obtaining the video transcripts and metadata, the transcripts are then summarized using LangChain and the LLMs, both Gemini and ChatGPT. There are a few interesting pieces of code to understand here:

The prompt used to ask the LLM to summarize the video transcript and generate sample questions
The refinement chain used to obtain the summarized video and sample questions
The vector embedding chain that uses the LLM to generate text embeddings and store them in Redis

The LLM summary prompt is split into two parts. This is done to allow analyzing videos where the transcript length is larger than the LLM's accepted context.

The summary prompts are used to create a refinement chain with LangChain. LangChain will automatically handle splitting the video transcript document(s) and calling the LLM accordingly.
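
Conceptually, a refinement chain behaves like the loop sketched below. This is a simplified illustration, not LangChain's implementation: the Llm type, splitTranscript, refineSummary, and the prompt wording are all hypothetical, and the splitter uses a plain character budget where a real text splitter would be more careful about boundaries.

```typescript
// Hypothetical LLM call: takes a prompt and resolves to the model's text output.
type Llm = (prompt: string) => Promise<string>;

// Split a long transcript into chunks of at most `maxChars` characters.
function splitTranscript(transcript: string, maxChars: number): string[] {
  if (maxChars <= 0) {
    throw new Error("maxChars must be positive");
  }
  const chunks: string[] = [];
  for (let start = 0; start < transcript.length; start += maxChars) {
    chunks.push(transcript.slice(start, start + maxChars));
  }
  return chunks;
}

// Refine pattern: summarize the first chunk with an initial prompt, then ask
// the model to update the running summary with each following chunk.
async function refineSummary(
  llm: Llm,
  transcript: string,
  maxChars: number,
): Promise<string> {
  const chunks = splitTranscript(transcript, maxChars);
  if (chunks.length === 0) {
    return "";
  }
  let summary = await llm(`Summarize this transcript:\n${chunks[0]}`);
  for (const chunk of chunks.slice(1)) {
    summary = await llm(
      `Existing summary:\n${summary}\n\nRefine it using this additional transcript:\n${chunk}`,
    );
  }
  return summary;
}
```

The two prompt templates in the loop mirror the two-part prompt: one for the first chunk, and one that carries the existing summary forward, so no single call has to fit the whole transcript.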

Notice the summaryCache is used to first ask Redis if the video has already been summarized. If it has, it will return the summary and skip the LLM. This is a great example of how Redis can be used to cache data and avoid unnecessary API calls. Below is an example video summary with questions.

The vector embedding chain is used to generate vector embeddings for the video summaries. This is done by asking the LLM to generate text embeddings for the summary. The vector embedding chain is defined as follows:

The vector store uses the RedisVectorStore class from LangChain. This class is a wrapper around Redis that allows you to store and search vector embeddings. We are using the HNSW algorithm and the IP distance metric. For more information on the supported algorithms and distance metrics, see the Redis vector store documentation. We pass the embeddings object to the RedisVectorStore constructor. This object is defined as follows:

Or for OpenAI:

The embeddings object is used to generate vector embeddings for the video summaries. These embeddings are then stored in Redis using the vectorStore.

Notice that we first check if we have already generated a vector using the Redis Set VECTOR_SET. If we have, we skip the LLM and use the existing vector. This avoids unnecessary API calls and can speed things up.
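
The check-before-embedding step can be sketched like this. The interface, class, and function names are hypothetical, and the in-memory class stands in for Redis; with a real Redis Set, the membership check and the insert would correspond to the SISMEMBER and SADD commands.

```typescript
// Hypothetical embedding call: text in, vector out.
type Embedder = (text: string) => Promise<number[]>;

// Hypothetical abstraction over "the set of already-embedded video IDs"
// plus the place where vectors are stored.
interface EmbeddingIndex {
  has(videoId: string): Promise<boolean>;
  add(videoId: string, vector: number[]): Promise<void>;
}

// In-memory stand-in so the sketch runs without Redis.
class InMemoryEmbeddingIndex implements EmbeddingIndex {
  private ids = new Set<string>();
  private vectors = new Map<string, number[]>();

  async has(videoId: string): Promise<boolean> {
    return this.ids.has(videoId);
  }

  async add(videoId: string, vector: number[]): Promise<void> {
    this.vectors.set(videoId, vector);
    this.ids.add(videoId);
  }

  vectorFor(videoId: string): number[] | undefined {
    return this.vectors.get(videoId);
  }
}

// Embed a summary only if its video ID has not been embedded before.
// Resolves to true when a new vector was generated, false when skipped.
async function embedSummaryOnce(
  index: EmbeddingIndex,
  embed: Embedder,
  videoId: string,
  summary: string,
): Promise<boolean> {
  if (await index.has(videoId)) {
    return false;
  }
  const vector = await embed(summary);
  await index.add(videoId, vector);
  return true;
}
```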

Redis vector search functionality and AI integration for video Q&A

One of the key features of our application is the ability to search through video content using AI-generated queries. This section will cover how the backend handles search requests and interacts with the AI models.

Converting questions into vectors

When a user submits a question through the frontend, the backend performs the following steps to obtain the answer to the question as well as supporting videos:

We generate a semantically similar question to the one being asked. This helps to find the most relevant videos.
We then use the vectorStore to search for the most relevant videos based on the semantic question.
If we don't find any relevant videos, we search with the original question.
Once we find videos, we call the LLM to answer the question.
Finally, we return the answer and supporting videos to the user.
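
The steps above can be sketched as a single function. This is an outline under assumptions, not the project's code: the SearchDeps interface and its three functions are hypothetical placeholders for the rephrasing prompt, the vector store search, and the answering prompt. Returning a null answer when no videos are found at all is also an assumption.

```typescript
interface VideoHit {
  id: string;
  summary: string;
}

// Hypothetical dependencies, injected so the flow can run without an LLM or Redis.
interface SearchDeps {
  rephrase(question: string): Promise<string>;
  searchVideos(query: string): Promise<VideoHit[]>;
  answer(question: string, videos: VideoHit[]): Promise<string>;
}

interface QaResult {
  answer: string | null;
  videos: VideoHit[];
  usedQuery: string;
}

async function answerQuestion(deps: SearchDeps, question: string): Promise<QaResult> {
  // Step 1: generate a semantically similar question.
  const semanticQuestion = await deps.rephrase(question);

  // Step 2: search with the semantic question first.
  let usedQuery = semanticQuestion;
  let videos = await deps.searchVideos(semanticQuestion);

  // Step 3: fall back to the original question if nothing was found.
  if (videos.length === 0) {
    usedQuery = question;
    videos = await deps.searchVideos(question);
  }

  // Assumption: with no videos at all, skip the LLM and return no answer.
  if (videos.length === 0) {
    return { answer: null, videos, usedQuery };
  }

  // Steps 4 and 5: ask the LLM, then return the answer with supporting videos.
  const answer = await deps.answer(question, videos);
  return { answer, videos, usedQuery };
}
```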

To answer a question, we first generate a semantically similar question to the one being asked. This is done using the QUESTION_PROMPT defined below:

Using this prompt, we generate the semantic question and use it to search for videos. We may also need to search using the original question if we don't find any videos with the semantic question. This is done using the ORIGINAL_QUESTION_PROMPT defined below:

The code above shows the whole process for getting answers from the LLM and returning them to the user. Once relevant videos are identified, the backend uses either Google Gemini or OpenAI's ChatGPT to generate answers. These answers are formulated based on the video transcripts stored in Redis, ensuring they are contextually relevant to the user's query. The ANSWER_PROMPT used to ask the LLM for answers is as follows:

That's it! The backend will now return the answer and supporting videos to the user.

Going further with semantic answer caching

The application we've built in this tutorial is a great starting point for exploring the possibilities of AI-powered video Q&A. However, there are many ways to improve the application and make it more efficient. One such improvement is to use Redis as a semantic vector cache.

In the previous section, we called the LLM to answer every question. This step is a performance bottleneck: LLM response times vary, and a single call can take several seconds. What if there was a way to prevent unnecessary calls to the LLM? This is where semantic vector caching comes in.

What is semantic vector caching?

Semantic vector caching happens when you take the results of a call to an LLM and cache them alongside the vector embedding for the prompt. In the case of our application, we could generate vector embeddings for the questions and store them in Redis with the answer from the LLM. This would allow us to avoid calling the LLM for similar questions that have already been answered.

You might ask: why store the question as a vector rather than as a string? The answer is that storing the question as a vector allows us to perform semantic vector similarity searches. So rather than relying on someone asking the exact same question, we can determine an acceptable similarity score and return answers for similar questions.

How to implement semantic vector caching in Redis

If you're already familiar with storing vectors in Redis, which we have covered in this tutorial, semantic vector caching is an extension of that and operates in essentially the same way. The only difference is that we are storing the question as a vector, rather than the video summary. We are also using the cache-aside pattern. The process is as follows:

When a user asks a question, we perform a vector similarity search for existing answers to the question.
If we find an answer, we return it to the user, avoiding a call to the LLM.
If we don't find an answer, we call the LLM to generate an answer.
We then store the question as a vector in Redis, along with the answer from the LLM.
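
The control flow of this process can be sketched as follows. All names here are hypothetical. In particular, NormalizedTextCache is only a placeholder that matches questions by normalized text so the example runs on its own; a real semantic cache would compare vector embeddings instead.

```typescript
// Hypothetical cache abstraction: look up an answer for a similar question,
// or save a new question and answer pair.
interface AnswerCache {
  findSimilar(question: string): Promise<string | null>;
  save(question: string, answer: string): Promise<void>;
}

// Cache-aside: check the cache first, call the LLM only on a miss,
// then write the new answer back to the cache.
async function answerWithCache(
  cache: AnswerCache,
  askLlm: (question: string) => Promise<string>,
  question: string,
): Promise<{ answer: string; fromCache: boolean }> {
  const cachedAnswer = await cache.findSimilar(question);
  if (cachedAnswer !== null) {
    return { answer: cachedAnswer, fromCache: true };
  }
  const answer = await askLlm(question);
  await cache.save(question, answer);
  return { answer, fromCache: false };
}

// Placeholder cache: treats questions as "similar" when they are equal after
// lowercasing and removing punctuation. This stands in for vector similarity.
class NormalizedTextCache implements AnswerCache {
  private entries = new Map<string, string>();

  private normalize(question: string): string {
    return question.toLowerCase().replace(/[^a-z0-9 ]/g, "").trim();
  }

  async findSimilar(question: string): Promise<string | null> {
    return this.entries.get(this.normalize(question)) ?? null;
  }

  async save(question: string, answer: string): Promise<void> {
    this.entries.set(this.normalize(question), answer);
  }
}
```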

In order to store the question vectors we need to create a new vector store. This will create an index specifically for the question and answer vector. The code looks like this:

The answerVectorStore looks nearly identical to the vectorStore we defined earlier, but it uses a different algorithm and distance metric. This algorithm is better suited for similarity searches for our questions.

The following code demonstrates how to use the answerVectorStore to check if a similar question has already been answered.

The similaritySearchWithScore method finds questions similar to the one being asked and scores them from 0 to 1, where 0 is most similar or "closest". We then filter out any results whose score is above the maximum defined by the maxSimilarityScore environment variable, because a higher score means a less similar question. If any results remain, we return them to the user. Using a max score is crucial here, because we don't want to return inaccurate results.
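
To illustrate how a distance-style score and a maximum threshold interact, here is a small self-contained sketch. It assumes a cosine distance, where 0 means the vectors point in the same direction; the metric actually used by the application's index may differ. The ScoredAnswer type and both functions are hypothetical.

```typescript
type ScoredAnswer = { question: string; answer: string; score: number };

// Cosine distance: 0 when two vectors point the same way, larger as they diverge.
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("vectors must share a non-zero length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error("vectors must be non-zero");
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Keep only results whose score is at or below the threshold (lower is closer),
// ordered from closest to farthest.
function filterByMaxScore(results: ScoredAnswer[], maxScore: number): ScoredAnswer[] {
  return results
    .filter((result) => result.score <= maxScore)
    .sort((left, right) => left.score - right.score);
}
```

A lower threshold makes the cache stricter: fewer cached answers are reused, but the ones that are reused match the new question more closely.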

To complete this process, we need to apply the cache aside pattern and store the question as a vector in Redis. This is done as follows:

When a question is asked, we first check the answer cache. We check both the question and the generated semantic question. If we find an answer, we return it to the user. If we don't find an answer, we call the LLM to generate an answer. We then store the question as a vector in Redis, along with the answer from the LLM. It may look like we're doing more work here than we were without the cache, but keep in mind the LLM is the bottleneck. By doing this, we are avoiding unnecessary calls to the LLM.

Below are a couple of screenshots from the application showing what it looks like when an existing answer to a question is found:

Conclusion

In this tutorial, we've explored how to build an AI-powered video Q&A application using Redis, LangChain, and various other technologies. We've covered setting up the environment, processing video uploads, and implementing search functionality. You also saw how to use Redis as a vector store and semantic vector cache.

NOTE: Not included in this tutorial is an overview of the frontend Next.js app. However, you can find the code in the GitHub repository in the app directory.

Key takeaways

Generative AI can be leveraged to create powerful applications without writing a ton of code.
Redis is highly versatile and efficient in handling AI-generated data and vectors.
LangChain makes it easy to integrate AI models with vector stores.

Remember, Redis offers an easy start with cloud-hosted instances, which you can sign up for at redis.com/try-free. This makes experimenting with AI and Redis more accessible than ever.

We hope this tutorial inspires you to explore the exciting possibilities of combining AI with powerful databases like Redis to create innovative applications.