Embeddings Indices
EmbeddingsIndex
AnnoyEmbeddingsIndex
class recoder.embedding.AnnoyEmbeddingsIndex(embeddings=None, id_map=None, n_trees=10, search_k=-1, include_distances=False)
An EmbeddingsIndex based on AnnoyIndex [1] that executes nearest neighbor search efficiently, trading off some accuracy for speed. The similarity between items is the cosine similarity.
Parameters:
- embeddings (numpy.array, optional) – the matrix that holds the embeddings, of shape (number of items, embedding size). Required to build the index.
- id_map (dict, optional) – a dict that maps the items' original ids to the indices of their embeddings. Useful for fetching items and running nearest neighbor search by their original ids. If not provided, it defaults to an identity map.
- n_trees (int, optional) – the n_trees parameter used to build the AnnoyIndex.
- search_k (int, optional) – the search_k parameter used when searching the AnnoyIndex for nearest items.
- include_distances (bool, optional) – whether to include distances in the results returned by nearest neighbor search.
[1]: https://github.com/spotify/annoy
build(index_file=None)
Builds the embeddings index and stores it in index_file, if provided.
Parameters: index_file (str, optional) – the file path where the index is saved.
Note: The Annoy index is stored in a separate file, which should be kept in the same directory as index_file.
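Because the index is split across two files, index_file and its companion Annoy file have to be kept together. The sketch below illustrates such a two-file layout with hypothetical save_index and load_index helpers that use JSON; it is not recoder's actual on-disk format, and the companion file name is an assumption. Storing only the companion's base name inside index_file is what makes "same directory" the only requirement: the pair can be moved anywhere as long as both files move together.

```python
import json
import os


def save_index(index_file, vectors, id_map, n_trees):
    """Hypothetical: write metadata to index_file and vectors to a sibling file."""
    directory = os.path.dirname(os.path.abspath(index_file))
    # Hypothetical companion-file name; the real naming scheme is not documented here.
    data_name = os.path.basename(index_file) + ".vectors.json"
    with open(os.path.join(directory, data_name), "w") as f:
        json.dump(vectors, f)
    with open(index_file, "w") as f:
        # Only the base name is stored, so the companion is resolved relative
        # to wherever index_file currently lives.
        json.dump({"id_map": id_map, "n_trees": n_trees, "data_file": data_name}, f)


def load_index(index_file):
    """Hypothetical: read the metadata, then the companion file from the same directory."""
    with open(index_file) as f:
        meta = json.load(f)
    directory = os.path.dirname(os.path.abspath(index_file))
    with open(os.path.join(directory, meta["data_file"])) as f:
        vectors = json.load(f)
    return vectors, meta["id_map"], meta["n_trees"]
```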
MemCacheEmbeddingsIndex
class recoder.embedding.MemCacheEmbeddingsIndex(embedding_index)
Caches the EmbeddingsIndex nearest neighbor search results for each item in memory, so that repeated searches are not recomputed.
Parameters: embedding_index (EmbeddingsIndex) – the EmbeddingsIndex to query on cache misses.
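The caching behaviour described above can be sketched as a small wrapper, assuming a hypothetical nearest(item_id, n) method on the wrapped index (the actual EmbeddingsIndex method names are not listed in this section). MemCacheIndex and CountingIndex are illustrative names only. In this sketch, results are memoized per (item_id, n) pair, so the wrapped index is queried only on a cache miss.

```python
class CountingIndex:
    """Hypothetical backing index that counts how often it is queried."""

    def __init__(self, neighbors):
        self.neighbors = neighbors
        self.calls = 0

    def nearest(self, item_id, n):
        self.calls += 1
        return self.neighbors[item_id][:n]


class MemCacheIndex:
    """Memoizes nearest neighbor results and delegates misses to the wrapped index."""

    def __init__(self, embedding_index):
        self.embedding_index = embedding_index
        self._cache = {}

    def nearest(self, item_id, n):
        key = (item_id, n)
        if key not in self._cache:
            # Cache miss: query the wrapped index once and remember the result.
            self._cache[key] = self.embedding_index.nearest(item_id, n)
        # Cached results are shared objects; callers should not mutate them.
        return self._cache[key]
```

This cache is unbounded and lives as long as the wrapper; where memory is a concern, a bounded policy such as functools.lru_cache would be a natural alternative.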