@williambdean
Contributor

williambdean commented Jul 3, 2025

Description

Still a work in progress.

LazyFrame-like libraries will require observation_end_date to be provided explicitly. However, that can be found outside of the

Still building out the functionality for:

  • removing the first observation: the mean with NaNs might differ between backends
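
To illustrate the backend concern above, here is a minimal, library-free sketch of the two NaN conventions a mean can follow. The function names are hypothetical and are not part of this PR. As far as I know, pandas skips NaN by default (`skipna=True`), while Polars treats NaN as an ordinary float value (distinct from null) that propagates through the mean, so the same column could produce different results.

```python
import math


def mean_skipping_nan(values):
    """Mean that ignores NaN entries (the pandas default, skipna=True)."""
    kept = [v for v in values if not math.isnan(v)]
    return sum(kept) / len(kept) if kept else math.nan


def mean_propagating_nan(values):
    """Mean that lets any NaN poison the result (plain float arithmetic)."""
    return sum(values) / len(values) if values else math.nan


values = [1.0, 3.0, math.nan]
skipped = mean_skipping_nan(values)        # only 1.0 and 3.0 contribute
propagated = mean_propagating_nan(values)  # NaN in the sum makes the result NaN
```

If the backends do diverge this way, one option would be to convert NaN to null (or drop it) before aggregating, so that every backend sees the same input.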

Related Issue

  • Closes #
  • Related to #

📚 Documentation preview 📚: https://pymc-marketing--1809.org.readthedocs.build/en/1809/

codecov bot commented Jul 3, 2025

Codecov Report

Attention: Patch coverage is 26.31579% with 14 lines in your changes missing coverage. Please review.

Project coverage is 40.45%. Comparing base (ca0c420) to head (f4804c8).

Files with missing lines:
  pymc_marketing/clv/utils.py: patch coverage 26.31%, 14 lines missing ⚠️

❗ There is a different number of reports uploaded between BASE (ca0c420) and HEAD (f4804c8).

HEAD has 9 uploads less than BASE:
  BASE (ca0c420): 22 uploads
  HEAD (f4804c8): 13 uploads
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1809       +/-   ##
===========================================
- Coverage   92.28%   40.45%   -51.84%     
===========================================
  Files          62       62               
  Lines        7469     7487       +18     
===========================================
- Hits         6893     3029     -3864     
- Misses        576     4458     +3882     

@juanitorduz
Collaborator

OMG! 100% yes! 🥳

@ColtAllen
Collaborator

ColtAllen commented Jul 15, 2025

> OMG! 100% yes! 🥳

Indeed, thanks for starting this!

Do you think the current pandas functions should still be retained for a time even after this is merged? Also, it seems like the PR description in the original message requires more details.

@williambdean
Contributor Author

> Do you think the current pandas functions should still be retained for a time even after this is merged? Also, it seems like the PR description in the original message requires more details.

I was just doing some comparisons of the two at the moment. However, I think that the new one should just take its place.

@williambdean
Contributor Author

Maybe @ColtAllen is interested in taking this over?

@FBruzzesi left a comment

Hey @williambdean 👋🏼 I just found this PR 🔥 and left a couple of comments that might help 😇

Comment on lines +306 to +307
if observation_period_end is None:
    observation_period_end = transactions[datetime_col].cast(nw.Datetime).max()

I am not sure if this would work/is supported, but you might try to do:

Suggested change
- if observation_period_end is None:
-     observation_period_end = transactions[datetime_col].cast(nw.Datetime).max()
+ if observation_period_end is None:
+     observation_period_end = pl.col("max").max()

to get the global max datetime value.

This might also help to avoid this requirement:

> The LazyFrame like libraries will require a provided observation_end_date. However, that can be found outside of the

) -> IntoFrameT:
    transactions = nw.from_native(transactions)

    date = nw.col(datetime_col).cast(nw.Datetime)

This is very tempting, but consider materializing the cast as a new column before the subsequent operations. I would be afraid that, for pandas, the cast happens every time the expression is reused instead of once.


customers = (
    nw.from_native(repeated_transactions)
    .group_by(customer_id_col)

For some time now, it should be possible to pass an expression to group_by so that you can avoid the renaming further down the pipeline, but it's definitely more of a personal preference 😇

Suggested change
-     .group_by(customer_id_col)
+     .group_by(nw.col(customer_id_col).alias("customer_id"))
