Fork arrow2 and get rid of polars #4789

@emilk

While we want to migrate from the arrow2 crate to arrow (#3741), it is a big task that we would rather punt on right now. It is technical debt, but the debt is not going to grow significantly. The gains don't justify the potential rabbit hole of pain it could turn into.

One of the major reasons to migrate away from arrow2 is that DataType has a huge overhead, especially when cloned.
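
To illustrate why cloning a recursive type can be costly, here is a minimal sketch. It uses hypothetical miniature types (`DeepType`, `SharedType` and their fields), not arrow2's actual `DataType`, and it does not claim to show what the linked PR does. Reference counting is simply one common way to make such clones cheap.

```rust
use std::sync::Arc;

// Hypothetical miniature of a recursive schema type; NOT arrow2's real `DataType`.
// Children are owned directly, so `clone()` copies every nested field and name.
#[derive(Clone, Debug, PartialEq)]
enum DeepType {
    Int64,
    Utf8,
    Struct(Vec<DeepField>),
}

#[derive(Clone, Debug, PartialEq)]
struct DeepField {
    name: String,
    data_type: DeepType,
}

// Same shape, but children and names sit behind `Arc`,
// so `clone()` only increments reference counts.
#[derive(Clone, Debug, PartialEq)]
enum SharedType {
    Int64,
    Utf8,
    Struct(Arc<[SharedField]>),
}

#[derive(Clone, Debug, PartialEq)]
struct SharedField {
    name: Arc<str>,
    data_type: SharedType,
}

// Builds a small struct type with two fields (deep-cloning variant).
fn deep_point() -> DeepType {
    DeepType::Struct(vec![
        DeepField { name: "x".to_owned(), data_type: DeepType::Int64 },
        DeepField { name: "label".to_owned(), data_type: DeepType::Utf8 },
    ])
}

// Builds the same struct type with shared (reference-counted) children.
fn shared_point() -> SharedType {
    SharedType::Struct(Arc::from(vec![
        SharedField { name: Arc::from("x"), data_type: SharedType::Int64 },
        SharedField { name: Arc::from("label"), data_type: SharedType::Utf8 },
    ]))
}

// Returns true if both values point at the very same child allocation.
fn shares_children(a: &SharedType, b: &SharedType) -> bool {
    match (a, b) {
        (SharedType::Struct(x), SharedType::Struct(y)) => Arc::ptr_eq(x, y),
        _ => false,
    }
}
```

Cloning the value from `deep_point()` allocates a new `Vec` and new `String`s for every field, while cloning the value from `shared_point()` leaves both copies pointing at the same allocation, which `shares_children` can confirm.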

We have a PR to fix it (jorgecarleitao/arrow2#1469), but it is unmerged because arrow2 is unmaintained.

So: we fork arrow2 as re_arrow2, merge our PR, and solve our immediate memory issue.

Since polars requires arrow2, we need to stop using polars. We only use it in a few tests.


We should revisit migrating away from arrow2 when we start exposing arrow things to the users (e.g. supporting data queries in the SDK) and/or when we want to interface with some data frame crate.

Labels

blocked: can't make progress right now
dependencies: concerning crates, pip packages etc
enhancement: New feature or request
🏹 arrow: Apache Arrow
📉 performance: Optimization, memory use, etc
