Description
Hi, thank you so much for open sourcing this repo. It's amazing and super useful to quickly get some insight into your data.
I created this Foursquare POI app for Italy with 3M points (GitHub) and it really helps me understand the data better! Operating at the limit of GitHub Pages' free hosting with a 93 MB data file :D
I was trying to figure out what kinds of models can be used in embedding-atlas, as I noticed you only mention the somewhat dated all-MiniLM-L6-v2 in the API docs. I haven't dug into the code too much yet, but at first glance it seems that only the API version (and not the frontend) supports inference, right?
So it would be great if you could:
- add some docs about which models work and what backend is used for inference in the API version (to understand whether it's already optimized for MPS, for example)
- add support for frontend inference, e.g. with transformers.js. They already support WebGPU too. Just note that the batch size is absolutely crucial for speed; have a look at this demo I created to test batching (a rough sketch follows below the numbers). For reference, indexing the whole Bible with a batch size of 128 takes 35 seconds for 95605 chunks (M3 Max):
Batch size: 128
Chunks: 95605
Time passed: 35336.00 ms
Embeddings per second: 2705.60
Batch sizes of 64 or 256 instead roughly double the time needed on my device.
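For illustration, a minimal sketch of what batched frontend embedding could look like with transformers.js v3 (the `@huggingface/transformers` package). The model choice, `BATCH_SIZE` constant, and `embedChunks` helper here are just placeholders to show the batching idea, not anything from embedding-atlas itself:

```ts
import { pipeline } from '@huggingface/transformers';

// Illustrative batch size; tune per device (128 was the sweet spot on my M3 Max).
const BATCH_SIZE = 128;

// Load a feature-extraction pipeline and request WebGPU execution.
const extractor = await pipeline(
  'feature-extraction',
  'Xenova/all-MiniLM-L6-v2',
  { device: 'webgpu' },
);

// Embed text chunks in fixed-size batches, mean-pooled and L2-normalized.
async function embedChunks(chunks: string[]): Promise<number[][]> {
  const vectors: number[][] = [];
  for (let i = 0; i < chunks.length; i += BATCH_SIZE) {
    const batch = chunks.slice(i, i + BATCH_SIZE);
    const output = await extractor(batch, { pooling: 'mean', normalize: true });
    vectors.push(...(output.tolist() as number[][]));
  }
  return vectors;
}
```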
Apart from these standard models, I also wanted to propose @MinishLab's static models (ideally via model2vec-rs), as they are much faster on CPU and even rank higher on MTEB :)