-
Notifications
You must be signed in to change notification settings - Fork 317
Narwhals support for CLV aggregation #1809
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1809 +/- ##
===========================================
- Coverage 92.28% 40.45% -51.84%
===========================================
Files 62 62
Lines 7469 7487 +18
===========================================
- Hits 6893 3029 -3864
- Misses 576 4458 +3882 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
OMG! 100% yes! 🥳 |
Indeed, thanks for starting this! Do you think the current pandas functions should still be retained for a time even after this is merged? Also, it seems like the PR description in the original message requires more details. |
I was just doing some comparisons of the two at the moment. However, I think that the new one should just take it's place. |
Maybe @ColtAllen is interested in taking this over? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @williambdean 👋🏼 I just find out this PR 🔥 left a couple of comments that might help😇
if observation_period_end is None: | ||
observation_period_end = transactions[datetime_col].cast(nw.Datetime).max() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if this would work/is supported, but you might try to do:
if observation_period_end is None: | |
observation_period_end = transactions[datetime_col].cast(nw.Datetime).max() | |
if observation_period_end is None: | |
observation_period_end = pl.col("max").max() |
to get the global max datetime value.
This might also help to avoid this requirement:
The LazyFrame like libraries will require a provided observation_end_date. However, that can be found outside of the
) -> IntoFrameT: | ||
transactions = nw.from_native(transactions) | ||
|
||
date = nw.col(datetime_col).cast(nw.Datetime) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very tempting, but consider creating a new column between operations - I would be afraid that for pandas the casting happens multiple times instead of once
|
||
customers = ( | ||
nw.from_native(repeated_transactions) | ||
.group_by(customer_id_col) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some time now, it should be possible to pass an expression so that you can avoid the renaming down in the pipeline, but it's definitely more of a personal preference 😇
.group_by(customer_id_col) | |
.group_by(nw.col(customer_id_col).alias("customer_id")) |
Description
Still a work in progress.
The LazyFrame like libraries will require a provided observation_end_date. However, that can be found outside of the
Still building out the functionality for the:
Related Issue
Checklist
pre-commit.ci autofix
to auto-fix.📚 Documentation preview 📚: https://pymc-marketing--1809.org.readthedocs.build/en/1809/