enable projection from pre-computed vectors #21
Merged
Hello, Nathan from Notion engineering here, enjoying using embedding-atlas for our use cases so far!
We already have the vector datasets generated and want to visualize them easily. This is in support of evaluating various embedding models without having to re-generate the vectors each time we load the dataset.
The current options are:
However, when we have the vectors pre-computed, our only current path is to manually set the x, y, and neighbors columns, which means I need to do additional data wrangling before loading the data in.
I actually did try this, but I spent quite a bit of time getting things fully working. It would partially work, but things like the nearest neighbor search would not work because there are some internal assumptions about _row_index or __neighbors that I had to try and re-create.
I'm thinking that embedding-atlas actually already takes care of setting up these columns correctly! So if we pass in a vector field and let embedding-atlas run UMAP and set the proper columns, it works out of the box.
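To give a sense of the wrangling the manual path involves, here is a minimal sketch of a brute-force nearest-neighbor computation over pre-computed vectors. The helper names and the output shape (a list of neighbor row indices per row) are assumptions for illustration only; this is not the internal format embedding-atlas expects for __neighbors, which is exactly the part that was hard to re-create by hand.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbors(vectors, k):
    """For each row, return the indices of its k most similar other rows.

    Hypothetical helper: brute force, O(n^2), fine for a toy example only.
    """
    result = []
    for i, v in enumerate(vectors):
        scored = [
            (cosine_similarity(v, w), j)
            for j, w in enumerate(vectors)
            if j != i  # a row is not its own neighbor
        ]
        # Highest similarity first; ties broken by lower row index.
        scored.sort(key=lambda t: (-t[0], t[1]))
        result.append([j for _, j in scored[:k]])
    return result


# Toy pre-computed vectors: rows 0/1 point one way, rows 2/3 another.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
neighbors = nearest_neighbors(vectors, k=1)
print(neighbors)
```

Even in this toy form, the caller still has to keep row indices and neighbor lists consistent with whatever the viewer expects, which is why letting embedding-atlas derive the columns from the vector field is the simpler path.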
Testing
Tested locally; happy to take suggestions on any additional tests you'd like to see.
python -m embedding_atlas.cli tmp/processed.parquet --vector vector --text span_text
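For scripted runs across several embedding models, the same invocation could be assembled from Python. This is a minimal sketch: build_command is a hypothetical helper, and it only uses the --vector and --text flags shown in the command above. It builds and prints the command and leaves actually running it to the caller.

```python
import shlex


def build_command(dataset_path, vector_column, text_column, python="python"):
    """Assemble the CLI invocation as an argv list (hypothetical helper).

    Only the flags that appear in the tested command are used here.
    """
    return [
        python, "-m", "embedding_atlas.cli",
        dataset_path,
        "--vector", vector_column,
        "--text", text_column,
    ]


cmd = build_command("tmp/processed.parquet", "vector", "span_text")
# Print a shell-quoted form; pass `cmd` to subprocess.run(cmd, check=True) to execute it.
print(shlex.join(cmd))
```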