Making citation-map an order of magnitude smaller and faster #5841
devlux76 started this conversation in Show and tell
I'm working on ways to reduce the size of the bulk data so it's easier to deal with.
The citation-map is about 1.2GB uncompressed.
You could add a view like the one below and shrink it down to less than 120MB without any data loss at all.
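A minimal sketch of what such a view could look like (the column names, file path, and ID prefix here are placeholders, not the actual citation-map schema) is to strip the constant URL prefix off each identifier and keep only the numeric part, which is fully reversible:

```sql
-- Sketch only: citing_id / cited_id, the file path, and the
-- 'https://example.org/W' prefix are placeholders for the real schema.
CREATE VIEW citation_map_compact AS
SELECT
    -- Drop the repeated constant prefix and store the numeric remainder as a
    -- BIGINT; the prefix can be re-added on the way out, so nothing is lost.
    CAST(replace(citing_id, 'https://example.org/W', '') AS BIGINT) AS citing_work,
    CAST(replace(cited_id,  'https://example.org/W', '') AS BIGINT) AS cited_work
FROM read_csv_auto('citation-map.csv');
```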
Then you export the view like this:
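Something along these lines, writing the view out as a compressed Parquet file (the output filename is a placeholder):

```sql
-- Parquet is lossless and both much smaller on disk and much faster to scan
-- than the raw CSV.
COPY (SELECT * FROM citation_map_compact)
TO 'citation-map-compact.parquet' (FORMAT PARQUET, COMPRESSION ZSTD);
```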
The above is just using DuckDB on the raw .csv file, but it can easily be adapted to Postgres. It takes about 10 minutes to run on my 2018 MacBook Pro with 32 GB of RAM.
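As a quick usage check, the exported file can then be queried in place without re-importing anything:

```sql
-- DuckDB can scan the Parquet file directly by path.
SELECT count(*) FROM 'citation-map-compact.parquet';
```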
The end result is over a 10x reduction in disk space and makes the data much faster to work with.