Vector set embeddings
Index and query embeddings with Redis vector sets
A Redis vector set lets you store a set of unique keys, each with its own associated vector. You can then retrieve keys from the set according to the similarity between their stored vectors and a query vector that you specify.
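As a rough mental model of what "retrieve by similarity" means, the sketch below ranks a few hand-made vectors against a query vector in plain PHP. It assumes cosine similarity as the measure, and the helper names `cosineSimilarity()` and `rankBySimilarity()` and the 2-dimensional sample vectors are made up for illustration. It is not how Redis implements the search internally.

```php
<?php
// Illustrative only: Redis performs this ranking on the server side.
// The helper names and sample vectors below are assumptions for this sketch.

/** Cosine similarity between two equal-length numeric vectors. */
function cosineSimilarity(array $a, array $b): float
{
    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;
    foreach ($a as $i => $value) {
        $dot += $value * $b[$i];
        $normA += $value * $value;
        $normB += $b[$i] * $b[$i];
    }
    return $dot / (sqrt($normA) * sqrt($normB));
}

/** Return the keys of $vectors ordered from most to least similar to $query. */
function rankBySimilarity(array $vectors, array $query): array
{
    $scores = [];
    foreach ($vectors as $key => $vector) {
        $scores[$key] = cosineSimilarity($vector, $query);
    }
    arsort($scores); // highest similarity first, keys preserved
    return array_keys($scores);
}

// Hypothetical 2-dimensional vectors, chosen by hand for the example.
$vectors = [
    'north' => [0.0, 1.0],
    'east' => [1.0, 0.0],
    'north-east' => [1.0, 1.0],
];

$ranked = rankBySimilarity($vectors, [0.9, 0.1]);
echo json_encode($ranked), PHP_EOL;
// ["east","north-east","north"]
```

The query points almost due east, so `east` ranks first and `north` last.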
You can use vector sets to store any type of numeric vector, but they are
particularly optimized to work with text embedding vectors (see
Redis for AI to learn more about text
embeddings). The example below shows how to use the
TransformersPHP library to
generate text embeddings and then store and retrieve them using a vector set
with Predis.
Initialize
Install Predis and TransformersPHP with Composer:
composer require predis/predis codewithkyrian/transformers
In a new PHP file, import the required classes and function. The complete example is listed here, and the following sections explain each step in turn:
<?php

require 'vendor/autoload.php';

use function Codewithkyrian\Transformers\Pipelines\pipeline;
use Predis\Client as PredisClient;

$extractor = pipeline('embeddings', 'Xenova/all-MiniLM-L6-v2');

$peopleData = [
    'Marie Curie' => [
        'born' => 1867,
        'died' => 1934,
        'description' => 'Polish-French chemist and physicist. The only person ever to win two Nobel prizes for two different sciences.',
    ],
    'Linus Pauling' => [
        'born' => 1901,
        'died' => 1994,
        'description' => 'American chemist and peace activist. One of only two people to win two Nobel prizes in different fields (chemistry and peace).',
    ],
    'Freddie Mercury' => [
        'born' => 1946,
        'died' => 1991,
        'description' => 'British musician, best known as the lead singer of the rock band Queen.',
    ],
    'Marie Fredriksson' => [
        'born' => 1958,
        'died' => 2019,
        'description' => 'Swedish multi-instrumentalist, mainly known as the lead singer and keyboardist of the band Roxette.',
    ],
    'Paul Erdos' => [
        'born' => 1913,
        'died' => 1996,
        'description' => 'Hungarian mathematician, known for his eccentric personality almost as much as his contributions to many different fields of mathematics.',
    ],
    'Maryam Mirzakhani' => [
        'born' => 1977,
        'died' => 2017,
        'description' => 'Iranian mathematician. The first woman ever to win the Fields medal for her contributions to mathematics.',
    ],
    'Masako Natsume' => [
        'born' => 1957,
        'died' => 1985,
        'description' => 'Japanese actress. She was very famous in Japan but was primarily known elsewhere in the world for her portrayal of Tripitaka in the TV series Monkey.',
    ],
    'Chaim Topol' => [
        'born' => 1935,
        'died' => 2023,
        'description' => "Israeli actor and singer, usually credited simply as 'Topol'. He was best known for his many appearances as Tevye in the musical Fiddler on the Roof.",
    ],
];

$r = new PredisClient([
    'scheme' => 'tcp',
    'host' => '127.0.0.1',
    'port' => 6379,
    'password' => '',
    'database' => 0,
]);

$r->del('famousPeople');

foreach ($peopleData as $name => $details) {
    $embedding = $extractor($details['description'], normalize: true, pooling: 'mean');
    $r->vadd('famousPeople', $embedding[0], $name);
    $r->vsetattr('famousPeople', $name, [
        'born' => $details['born'],
        'died' => $details['died'],
    ]);
}

$actorsEmbedding = $extractor('actors', normalize: true, pooling: 'mean');
$actorsResults = $r->vsim('famousPeople', $actorsEmbedding[0]);
echo "'actors': " . json_encode($actorsResults), PHP_EOL;
// >>> 'actors': ["Masako Natsume","Chaim Topol","Linus Pauling","Marie Fredriksson","Maryam Mirzakhani","Freddie Mercury","Marie Curie","Paul Erdos"]

$twoActorsResults = $r->vsim('famousPeople', $actorsEmbedding[0], false, false, 2);
echo "'actors (2)': " . json_encode($twoActorsResults), PHP_EOL;
// >>> 'actors (2)': ["Masako Natsume","Chaim Topol"]

$entertainerEmbedding = $extractor('entertainer', normalize: true, pooling: 'mean');
$entertainerResults = $r->vsim('famousPeople', $entertainerEmbedding[0]);
echo "'entertainer': " . json_encode($entertainerResults), PHP_EOL;
// >>> 'entertainer': ["Chaim Topol","Freddie Mercury","Linus Pauling","Marie Fredriksson","Masako Natsume","Paul Erdos","Maryam Mirzakhani","Marie Curie"]

$scienceEmbedding = $extractor('science', normalize: true, pooling: 'mean');
$scienceResults = $r->vsim('famousPeople', $scienceEmbedding[0]);
echo "'science': " . json_encode($scienceResults), PHP_EOL;
// >>> 'science': ["Linus Pauling","Marie Curie","Maryam Mirzakhani","Paul Erdos","Marie Fredriksson","Masako Natsume","Freddie Mercury","Chaim Topol"]

$science2000Results = $r->vsim('famousPeople', $scienceEmbedding[0], false, false, null, null, null, '.died < 2000');
echo "'science2000': " . json_encode($science2000Results), PHP_EOL;
// >>> 'science2000': ["Linus Pauling","Marie Curie","Paul Erdos","Masako Natsume","Freddie Mercury"]
The pipeline() function below creates an embedding generator for the
all-MiniLM-L6-v2
model. This model generates vectors with 384 dimensions, regardless of the
length of the input text:
$extractor = pipeline('embeddings', 'Xenova/all-MiniLM-L6-v2');
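Because the example calls the extractor with `normalize: true`, each embedding should have 384 components and a length close to 1. The following sketch shows one way to check this. The function name `isUnitVector()` and its tolerance are assumptions for illustration, and a synthetic vector stands in for a real embedding so that the snippet runs without downloading the model.

```php
<?php
/**
 * Check that a vector has the expected number of dimensions and unit length.
 * The function name and the tolerance are assumptions made for this sketch.
 */
function isUnitVector(array $vector, int $expectedDim = 384, float $tolerance = 1e-4): bool
{
    if (count($vector) !== $expectedDim) {
        return false;
    }
    $sumOfSquares = 0.0;
    foreach ($vector as $component) {
        $sumOfSquares += $component * $component;
    }
    return abs(sqrt($sumOfSquares) - 1.0) < $tolerance;
}

// Stand-in for a real embedding: 384 equal components scaled to unit length.
$fakeEmbedding = array_fill(0, 384, 1 / sqrt(384));

var_dump(isUnitVector($fakeEmbedding));   // bool(true)
var_dump(isUnitVector([1.0, 0.0, 0.0]));  // bool(false), wrong dimension
```

In the full example, you could pass `$embedding[0]` to the same check before adding it to the vector set.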
Create the data
The example data is an array containing brief descriptions of famous people:
$peopleData = [
    'Marie Curie' => [
        'born' => 1867,
        'died' => 1934,
        'description' => 'Polish-French chemist and physicist. The only person ever to win two Nobel prizes for two different sciences.',
    ],
    'Linus Pauling' => [
        'born' => 1901,
        'died' => 1994,
        'description' => 'American chemist and peace activist. One of only two people to win two Nobel prizes in different fields (chemistry and peace).',
    ],
    'Freddie Mercury' => [
        'born' => 1946,
        'died' => 1991,
        'description' => 'British musician, best known as the lead singer of the rock band Queen.',
    ],
    'Marie Fredriksson' => [
        'born' => 1958,
        'died' => 2019,
        'description' => 'Swedish multi-instrumentalist, mainly known as the lead singer and keyboardist of the band Roxette.',
    ],
    'Paul Erdos' => [
        'born' => 1913,
        'died' => 1996,
        'description' => 'Hungarian mathematician, known for his eccentric personality almost as much as his contributions to many different fields of mathematics.',
    ],
    'Maryam Mirzakhani' => [
        'born' => 1977,
        'died' => 2017,
        'description' => 'Iranian mathematician. The first woman ever to win the Fields medal for her contributions to mathematics.',
    ],
    'Masako Natsume' => [
        'born' => 1957,
        'died' => 1985,
        'description' => 'Japanese actress. She was very famous in Japan but was primarily known elsewhere in the world for her portrayal of Tripitaka in the TV series Monkey.',
    ],
    'Chaim Topol' => [
        'born' => 1935,
        'died' => 2023,
        'description' => "Israeli actor and singer, usually credited simply as 'Topol'. He was best known for his many appearances as Tevye in the musical Fiddler on the Roof.",
    ],
];
Add the data to a vector set
The next step is to connect to Redis and add the data to a new vector set.
The code below iterates through the array, uses the embedding pipeline to
generate a float vector from each description, and then adds the result to a
vector set called famousPeople with vadd().
It then stores the born and died values as element attributes using
vsetattr(), so you can use the metadata
later during queries.
$r = new PredisClient([
    'scheme' => 'tcp',
    'host' => '127.0.0.1',
    'port' => 6379,
    'password' => '',
    'database' => 0,
]);

$r->del('famousPeople');

foreach ($peopleData as $name => $details) {
    $embedding = $extractor($details['description'], normalize: true, pooling: 'mean');
    $r->vadd('famousPeople', $embedding[0], $name);
    $r->vsetattr('famousPeople', $name, [
        'born' => $details['born'],
        'died' => $details['died'],
    ]);
}
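Redis stores element attributes as a JSON string, so the array passed to vsetattr() presumably reaches the server in a form like the one below. This sketch only uses PHP's built-in JSON functions to show that shape for one entry. How Predis serializes the array internally is an assumption here, not something the example confirms.

```php
<?php
// Attribute array for one element, taken from the example data.
$attributes = ['born' => 1867, 'died' => 1934];

// JSON form of the attributes. A filter expression such as '.died < 2000'
// refers to these top-level fields by name.
$json = json_encode($attributes, JSON_THROW_ON_ERROR);
echo $json, PHP_EOL; // {"born":1867,"died":1934}

// Decoding gives back the original values with their integer types,
// so numeric comparisons on the fields behave as expected.
$decoded = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
var_dump($decoded['died'] < 2000); // bool(true)
```

Keeping the years as integers rather than strings matters if you plan to compare them numerically in a filter.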
Query the vector set
You can now query the data in the set. The basic approach is to generate
another embedding vector from the query text and pass it to
vsim(), which returns elements ranked in
order of similarity to that query vector.
Start with a simple query for "actors":
$actorsEmbedding = $extractor('actors', normalize: true, pooling: 'mean');
$actorsResults = $r->vsim('famousPeople', $actorsEmbedding[0]);
echo "'actors': " . json_encode($actorsResults), PHP_EOL;
// >>> 'actors': ["Masako Natsume","Chaim Topol","Linus Pauling","Marie Fredriksson","Maryam Mirzakhani","Freddie Mercury","Marie Curie","Paul Erdos"]
This returns the following list of elements:
'actors': ["Masako Natsume","Chaim Topol","Linus Pauling",
"Marie Fredriksson","Maryam Mirzakhani","Freddie Mercury",
"Marie Curie","Paul Erdos"]
The first two people in the list are the two actors, as expected, but the
remaining results are less directly related. By default, vsim() returns up to
10 elements, which covers the whole of this small set. You can use the count
parameter of vsim() to limit the list to the most relevant few results:
$twoActorsResults = $r->vsim('famousPeople', $actorsEmbedding[0], false, false, 2);
echo "'actors (2)': " . json_encode($twoActorsResults), PHP_EOL;
// >>> 'actors (2)': ["Masako Natsume","Chaim Topol"]
The reason for using text embeddings rather than simple text search is that the embeddings capture semantic information. This allows a query to find elements with a similar meaning even if the text is different. For example, the word "entertainer" doesn't appear in any of the descriptions, but if you use it as a query, the actors and musicians rank highly in the results:
$entertainerEmbedding = $extractor('entertainer', normalize: true, pooling: 'mean');
$entertainerResults = $r->vsim('famousPeople', $entertainerEmbedding[0]);
echo "'entertainer': " . json_encode($entertainerResults), PHP_EOL;
// >>> 'entertainer': ["Chaim Topol","Freddie Mercury","Linus Pauling","Marie Fredriksson","Masako Natsume","Paul Erdos","Maryam Mirzakhani","Marie Curie"]
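For contrast, the sketch below runs a plain substring search over the same descriptions. The helper name `keywordSearch()` is made up for this illustration. It confirms that a literal search for "entertainer" matches nothing, whereas a word that actually appears in the text, such as "singer", does produce matches.

```php
<?php
// Descriptions copied from the example data (name => description).
$descriptions = [
    'Marie Curie' => 'Polish-French chemist and physicist. The only person ever to win two Nobel prizes for two different sciences.',
    'Linus Pauling' => 'American chemist and peace activist. One of only two people to win two Nobel prizes in different fields (chemistry and peace).',
    'Freddie Mercury' => 'British musician, best known as the lead singer of the rock band Queen.',
    'Marie Fredriksson' => 'Swedish multi-instrumentalist, mainly known as the lead singer and keyboardist of the band Roxette.',
    'Paul Erdos' => 'Hungarian mathematician, known for his eccentric personality almost as much as his contributions to many different fields of mathematics.',
    'Maryam Mirzakhani' => 'Iranian mathematician. The first woman ever to win the Fields medal for her contributions to mathematics.',
    'Masako Natsume' => 'Japanese actress. She was very famous in Japan but was primarily known elsewhere in the world for her portrayal of Tripitaka in the TV series Monkey.',
    'Chaim Topol' => "Israeli actor and singer, usually credited simply as 'Topol'. He was best known for his many appearances as Tevye in the musical Fiddler on the Roof.",
];

/** Return the names whose description contains $term (case-insensitive). */
function keywordSearch(array $descriptions, string $term): array
{
    return array_keys(array_filter(
        $descriptions,
        fn (string $text): bool => stripos($text, $term) !== false
    ));
}

echo json_encode(keywordSearch($descriptions, 'entertainer')), PHP_EOL;
// []
echo json_encode(keywordSearch($descriptions, 'singer')), PHP_EOL;
// ["Freddie Mercury","Marie Fredriksson","Chaim Topol"]
```

A substring match depends on the exact wording, while the embedding query above still surfaces performers whose descriptions never use the query word.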
Similarly, if you use "science" as a query, you get the scientists first, followed by the mathematicians:
'science': ["Linus Pauling","Marie Curie","Maryam Mirzakhani",
"Paul Erdos","Marie Fredriksson","Masako Natsume",
"Freddie Mercury","Chaim Topol"]
You can also use
filter expressions
with vsim() to restrict the search further. For example, repeat the
"science" query, but this time limit the results to people who died before the
year 2000:
$scienceEmbedding = $extractor('science', normalize: true, pooling: 'mean');
$science2000Results = $r->vsim('famousPeople', $scienceEmbedding[0], false, false, null, null, null, '.died < 2000');
echo "'science2000': " . json_encode($science2000Results), PHP_EOL;
// >>> 'science2000': ["Linus Pauling","Marie Curie","Paul Erdos","Masako Natsume","Freddie Mercury"]
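To see what the filter does to the ranking, the sketch below applies the same condition locally to the unfiltered "science" results shown earlier. This is only an emulation for illustration. Redis evaluates the filter on the server as part of the search, but for this small set the outcome is the same.

```php
<?php
// Ranking returned by the unfiltered "science" query in the example.
$scienceRanking = [
    'Linus Pauling', 'Marie Curie', 'Maryam Mirzakhani', 'Paul Erdos',
    'Marie Fredriksson', 'Masako Natsume', 'Freddie Mercury', 'Chaim Topol',
];

// Year of death for each person, from the example data.
$died = [
    'Marie Curie' => 1934,
    'Linus Pauling' => 1994,
    'Freddie Mercury' => 1991,
    'Marie Fredriksson' => 2019,
    'Paul Erdos' => 1996,
    'Maryam Mirzakhani' => 2017,
    'Masako Natsume' => 1985,
    'Chaim Topol' => 2023,
];

// Local equivalent of '.died < 2000': keep matching people and
// preserve the similarity order of the original ranking.
$filtered = array_values(array_filter(
    $scienceRanking,
    fn (string $name): bool => $died[$name] < 2000
));

echo json_encode($filtered), PHP_EOL;
// ["Linus Pauling","Marie Curie","Paul Erdos","Masako Natsume","Freddie Mercury"]
```

The filter removes elements without changing the relative order of those that remain, which is why the two non-scientists who died before 2000 still appear at the end of the list.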
More information
See the vector sets docs for more information and code examples. See the Redis for AI section for more details about text embeddings and other AI techniques you can use with Redis.
You may also be interested in vector search. This is a feature of Redis Search that lets you retrieve JSON and hash documents based on vector data stored in their fields.