@nxlouie (Contributor) commented Aug 11, 2025

Hello, Nathan from Notion engineering here, enjoying using embedding-atlas for our use cases so far!

We already have the vector datasets generated and want to visualize them easily. This supports evaluating various embedding models without having to regenerate the vectors each time we load the dataset.

The current options are:

  • text -> 2D projection end-to-end taken care of
  • image -> 2D projection end-to-end taken care of

However, when the vectors are pre-computed, our only current path is to manually set the x, y, and neighbors columns, which means I need to do additional data wrangling before loading the data in.

I actually did try this, but I spent quite a bit of time getting things fully working. It would partially work, but things like the nearest neighbor search would not work because there are some internal assumptions about _row_index or __neighbors that I had to try to re-create.

I'm thinking that embedding-atlas already takes care of setting up these columns correctly! So if we pass in a vector column and let embedding-atlas run UMAP and set the proper columns, it works out of the box.
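
To illustrate the kind of manual wrangling described above, here is a minimal sketch of a brute-force nearest-neighbor computation over pre-computed vectors. It is not embedding-atlas code: the helper names, the choice of cosine distance, the exclusion of each row from its own neighbor list, and the `ids`/`distances` layout are all assumptions for illustration, and the internal conventions around _row_index and __neighbors are not reproduced here.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; zero vectors are treated as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def brute_force_neighbors(vectors, k):
    """For each row, list the k closest other rows (hypothetical output layout)."""
    rows = []
    for i, v in enumerate(vectors):
        # Score every other row, sort by distance, keep the k closest.
        scored = sorted(
            (cosine_distance(v, w), j) for j, w in enumerate(vectors) if j != i
        )[:k]
        rows.append(
            {
                "ids": [j for _, j in scored],
                "distances": [d for d, _ in scored],
            }
        )
    return rows


# Toy data: rows 0 and 1 point in nearly the same direction, row 2 does not.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
neighbors = brute_force_neighbors(vectors, k=1)
```

Even in this simplified form, the neighbor indices have to line up with whatever row identifiers the viewer uses internally, which is the part that is easy to get wrong by hand and the motivation for letting the tool compute these columns from the vector column directly.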

Testing

Tested locally with the command below; happy to take suggestions on any additional tests you'd like to see.
python -m embedding_atlas.cli tmp/processed.parquet --vector vector --text span_text

@donghaoren (Collaborator) left a comment

This looks great! Thanks for your contribution!

@donghaoren donghaoren merged commit da53484 into apple:main Aug 11, 2025
4 checks passed