Embeddings Indices

EmbeddingsIndex

class recoder.embedding.EmbeddingsIndex

An abstract embeddings index from which to fetch embeddings and run nearest neighbor search on the items they represent.

All embeddings indices should implement this interface.

get_embedding(embedding_id)

Returns the embedding of the item with id embedding_id.

get_nns_by_id(embedding_id, n)

Returns the n nearest neighbors of the item with id embedding_id.

get_nns_by_embedding(embedding, n)

Returns the n nearest neighbors of the given embedding.

get_similarity(id1, id2)

Returns the similarity between items id1 and id2.
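To illustrate the interface, here is a minimal exact (brute-force) index that scores every item with cosine similarity. BruteForceEmbeddingsIndex is a hypothetical class written standalone so it runs without recoder installed; a real implementation would presumably subclass recoder.embedding.EmbeddingsIndex instead.

```python
import heapq
import math
from typing import Dict, List, Sequence


def _cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


class BruteForceEmbeddingsIndex:
    """Hypothetical exact index exposing the four EmbeddingsIndex methods."""

    def __init__(self, embeddings: Dict[int, Sequence[float]]):
        # Maps each item id to its embedding vector.
        self.embeddings = embeddings

    def get_embedding(self, embedding_id: int) -> Sequence[float]:
        return self.embeddings[embedding_id]

    def get_nns_by_embedding(self, embedding: Sequence[float], n: int) -> List[int]:
        # Score every item and keep the n most similar (exact, O(N) per query).
        return heapq.nlargest(
            n, self.embeddings, key=lambda i: _cosine(embedding, self.embeddings[i])
        )

    def get_nns_by_id(self, embedding_id: int, n: int) -> List[int]:
        # The query item itself has similarity 1.0, so it normally ranks first.
        return self.get_nns_by_embedding(self.get_embedding(embedding_id), n)

    def get_similarity(self, id1: int, id2: int) -> float:
        return _cosine(self.embeddings[id1], self.embeddings[id2])
```

Scanning every item costs O(N) per query, which is the cost an approximate index such as AnnoyEmbeddingsIndex is meant to avoid.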

AnnoyEmbeddingsIndex

class recoder.embedding.AnnoyEmbeddingsIndex(embeddings=None, id_map=None, n_trees=10, search_k=-1, include_distances=False)

An EmbeddingsIndex based on AnnoyIndex [1] that efficiently executes approximate nearest neighbor search, trading off some accuracy for speed.

The similarity between items is the cosine similarity.

Parameters:
  • embeddings (numpy.array, optional) – the matrix that holds the embeddings, of shape (number of items, embedding size). Required to build the index.
  • id_map (dict, optional) – a dict that maps the items' original ids to the row indices of the embeddings. Useful for fetching embeddings and running nearest neighbor search by original item ids. If not provided, it defaults to an identity map.
  • n_trees (int, optional) – the n_trees parameter used to build the AnnoyIndex.
  • search_k (int, optional) – the search_k parameter used when searching the AnnoyIndex for nearest items.
  • include_distances (bool, optional) – whether to include distances in the results of nearest neighbor searches.

[1]: https://github.com/spotify/annoy
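A minimal sketch of the id_map behavior described in the parameters above, assuming the index also keeps the inverse map so that search results (row indices) can be reported as original item ids. build_id_maps and to_original_ids are hypothetical helpers, not part of recoder.

```python
from typing import Dict, Hashable, List, Optional, Tuple


def build_id_maps(
    num_items: int, id_map: Optional[Dict[Hashable, int]] = None
) -> Tuple[Dict[Hashable, int], Dict[int, Hashable]]:
    """Return (original id -> row index, row index -> original id)."""
    if id_map is None:
        # Default: identity map, so item ids are the row indices themselves.
        id_map = {i: i for i in range(num_items)}
    inverse = {index: item_id for item_id, index in id_map.items()}
    return id_map, inverse


def to_original_ids(indices: List[int], inverse: Dict[int, Hashable]) -> List[Hashable]:
    # Translate the row indices returned by the ANN search back to original ids.
    return [inverse[i] for i in indices]
```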

build(index_file=None)

Builds the embeddings index and saves it to index_file if provided.

Parameters: index_file (str, optional) – the path of the file in which to save the index. Note: the Annoy index is stored in a separate file, which should be in the same directory as index_file.

load(index_file)

Loads the embeddings index from a saved index file.

Parameters: index_file (str) – the path of the index file from which to load the state of the index. Note: the Annoy index is stored in a separate file, which should be in the same directory as index_file.

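The notes on build and load mean a saved index consists of two files that must stay together. The sketch below mimics that layout with the standard library only; save_index, load_index, companion_path and the ".ann" suffix are hypothetical and do not reflect how recoder actually names or serializes its files.

```python
import pickle
from typing import Any, Dict, Tuple

# Hypothetical suffix for the companion file holding the ANN structure.
ANN_SUFFIX = ".ann"


def companion_path(index_file: str) -> str:
    # Appending a suffix keeps the companion file in the same directory as index_file.
    return index_file + ANN_SUFFIX


def save_index(index_file: str, state: Dict[str, Any], ann_bytes: bytes) -> None:
    """Write metadata (e.g. id_map, parameters) to index_file and the ANN data beside it."""
    with open(index_file, "wb") as f:
        pickle.dump(state, f)
    with open(companion_path(index_file), "wb") as f:
        f.write(ann_bytes)


def load_index(index_file: str) -> Tuple[Dict[str, Any], bytes]:
    """Read both files back; raises FileNotFoundError if the companion file is missing."""
    with open(index_file, "rb") as f:
        state = pickle.load(f)
    with open(companion_path(index_file), "rb") as f:
        ann_bytes = f.read()
    return state, ann_bytes
```

Copying only index_file to another directory would make load_index fail, which is the situation the note warns about.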
get_embedding(embedding_id)

Returns the embedding of the item with id embedding_id.

get_nns_by_id(embedding_id, n)

Returns the n nearest neighbors of the item with id embedding_id.

get_nns_by_embedding(embedding, n)

Returns the n nearest neighbors of the given embedding.

get_similarity(id1, id2)

Returns the similarity between items id1 and id2.
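get_similarity is documented as returning cosine similarity, while Annoy's "angular" metric reports the distance sqrt(2(1 - cos(u, v))) [1]. Assuming AnnoyEmbeddingsIndex builds its AnnoyIndex with the angular metric (this section does not say so), a cosine similarity can be recovered from such a distance, for example one returned by AnnoyIndex.get_distance(i, j). The helper names below are hypothetical.

```python
import math


def angular_to_cosine(distance: float) -> float:
    """Convert Annoy's angular distance sqrt(2 * (1 - cos)) to cosine similarity."""
    return 1.0 - distance ** 2 / 2.0


def cosine_to_angular(similarity: float) -> float:
    # Inverse mapping; max() guards against tiny negative values from rounding.
    return math.sqrt(max(0.0, 2.0 * (1.0 - similarity)))
```

A distance of 0 maps to a similarity of 1 (same direction) and a distance of 2 maps to -1 (opposite directions).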

MemCacheEmbeddingsIndex

class recoder.embedding.MemCacheEmbeddingsIndex(embedding_index)

Caches the nearest neighbor search results of an EmbeddingsIndex for each item in memory to avoid repeated computation.

Parameters: embedding_index (EmbeddingsIndex) – the EmbeddingsIndex to query on cache misses.

get_embedding(embedding_id)

Returns the embedding of the item with id embedding_id.

get_nns_by_embedding(embedding, n)

Returns the n nearest neighbors of the given embedding.

get_nns_by_id(embedding_id, n)

Returns the n nearest neighbors of the item with id embedding_id.

get_similarity(id1, id2)

Returns the similarity between items id1 and id2.
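A minimal sketch of the per-item caching idea behind MemCacheEmbeddingsIndex, assuming neighbor results are plain lists of ids (i.e. include_distances is off). MemoizingIndex is a hypothetical class, not recoder's implementation: it keeps the longest neighbor list fetched for each item and answers requests for a smaller n from a prefix of it.

```python
from typing import Dict, List


class MemoizingIndex:
    """Hypothetical wrapper that memoizes get_nns_by_id results per item."""

    def __init__(self, embedding_index):
        self.embedding_index = embedding_index  # queried on cache misses
        self._nns_cache: Dict[object, List] = {}
        self.misses = 0

    def get_nns_by_id(self, embedding_id, n):
        cached = self._nns_cache.get(embedding_id)
        if cached is None or len(cached) < n:
            # Miss, or the cached list is too short: query the wrapped index.
            self.misses += 1
            cached = self.embedding_index.get_nns_by_id(embedding_id, n)
            self._nns_cache[embedding_id] = cached
        # Serve smaller requests from a prefix of the longest list fetched so far.
        return cached[:n]

    def get_nns_by_embedding(self, embedding, n):
        # Arbitrary query vectors are not cached in this sketch.
        return self.embedding_index.get_nns_by_embedding(embedding, n)

    def get_embedding(self, embedding_id):
        return self.embedding_index.get_embedding(embedding_id)

    def get_similarity(self, id1, id2):
        return self.embedding_index.get_similarity(id1, id2)
```

With exact search, a prefix of a longer result equals a fresh top-n query; with approximate search such as Annoy, the two may differ slightly because the search effort depends on n.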