Major updates on utils functions #1927
base: main
Conversation
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
##             main    #1927      +/-   ##
==========================================
- Coverage   91.99%   91.26%    -0.73%
==========================================
  Files          66       66
  Lines        7843     8253      +410
==========================================
+ Hits         7215     7532      +317
- Misses        628      721       +93
Big kudos to @ricardoV94 because all initial draft functions came from him. I'm just a parrot opening the PRs and showing my use cases!
PS: Very open to changing names, I feel the current names are all quite bad 😅
Wow! Just, wow! This will take a moment to process, but looks amazing.
Why one giant PR? 😓
Sorry for that, but I added a good level of tests; maybe try it out over different scenarios and then we can move on! I just wanted to put it all together. @williambdean
There might be a version/behavior change with the latest MLflow. Can you check the version? Maybe an MLflow 3.0 thing.
class MaskedDist:
    """Create a masked deterministic from a Prior over full dims.

    The foal is to reduce the number of parameters in the model by creating a masked deterministic
Sorry for horsing around, but this should be "The goal" 😉
MLflow stuff should be fixed
Thanks a lot, you are the best. I'll follow your guide there and continue adding tests.
full_dims: tuple[str, ...] = (
    (self.dims if isinstance(self.dims, tuple) else (self.dims,))
    if self.dims
    else ()
)
Prior class should already do this
Description
Hey team, major updates in the utility functions. Feel free to jump in, comment, or modify if you think something can be done better!
New Masked Distribution Prior
Now that users have multidimensionality capabilities, models can grow without control. For example, it is currently possible to build a model with dims (date, country, state, city, brand, product, channel) and shape (720, 50, 90, 10, 100, 12).
Even though it is great to bring this flexibility to users, given the huge dimensional space the data will probably not be perfectly balanced, and some values in the matrix may be missing. The MaskedDist allows the user to add a mask over the prior, modifying the tensor graph behind it to avoid sampling those non-existent parameters and optimizing the process.
What it does (in short)
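To make the idea concrete, here is a minimal PyMC/PyTensor sketch of the mechanism, not the PR's exact MaskedDist API: the country/channel grid, the mask, and the variable names are illustrative assumptions.

```python
import numpy as np
import pymc as pm
import pytensor.tensor as pt

# A minimal sketch of the mechanism (not the PR's exact MaskedDist API): only the
# existing cells of a (country, channel) grid get free parameters, and they are
# scattered back into the full grid as a Deterministic.
coords = {"country": ["A", "B"], "channel": ["c1", "c2", "c3"]}
mask = np.array([[True, True, False], [True, False, True]])  # False = combo never observed

with pm.Model(coords=coords) as model:
    # free parameters only for the existing combos (mask.sum() of them instead of 6)
    beta_active = pm.HalfNormal("beta_active", sigma=1, shape=int(mask.sum()))

    # fill the full grid with zeros and place the sampled values where the mask is True
    beta_full = pt.zeros(mask.shape)
    beta_full = pt.set_subtensor(beta_full[mask.nonzero()], beta_active)
    pm.Deterministic("beta", beta_full, dims=("country", "channel"))
```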
Benefits
Users can combine this to modify the params in their functions, such as saturation or adstock. If some panels from the full grid are missing, they can mask the likelihood as well, so as not to sample on dates or date-region-city combos that don't exist.
Reducing the dimensionality brings huge improvements in efficiency. Some examples already show, even in low-dimensional problems, about 30% less computational time given a mask that cuts a few diagonals in our matrix.
Important
Out-of-sample prediction with only test data can't handle the masking around the likelihood. So, in order to predict out of sample, you must pass the full X_train and X_test to the posterior predictive sampling.
Model JAX sampling estimation
Related to the above, I added a JAX helper so users can estimate the minimum sampling time of their models. With it, users can compare the minimum amount of time that a given model will take.
This tool is great for comparison; it helps you get an idea of how heavy your model is and make changes before even properly sampling for the first time.
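The helper itself is not shown here, so the following is only a rough sketch of the underlying idea under assumed settings (a toy model plus an assumed number of draws and evaluations per draw): compile the log-density with the JAX backend, time one evaluation, and extrapolate to a lower bound.

```python
import time
import numpy as np
import pymc as pm

# Rough sketch of the idea (not necessarily the PR's helper). Requires jax installed.
with pm.Model() as model:
    beta = pm.Normal("beta", 0, 1, shape=500)
    pm.Normal("obs", mu=beta.sum(), sigma=1, observed=np.array(3.0))

logp_fn = model.compile_fn(model.logp(), mode="JAX")  # compile the log-density with JAX
point = model.initial_point()

logp_fn(point)  # first call includes JIT compilation, so exclude it from timing
start = time.perf_counter()
n_evals = 200
for _ in range(n_evals):
    logp_fn(point)
seconds_per_eval = (time.perf_counter() - start) / n_evals

# Assumed workload: 1000 tune + 1000 draws and ~2**6 log-density evaluations per draw.
lower_bound = seconds_per_eval * 2000 * 2**6
print(f"Estimated minimum sampling time: {lower_bound:.1f} s")
```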
Merge graphical models
Combines multiple independent PyMC Models into a single Model.
As we move forward in causality and discover new DAGs, one way to collect direct or indirect effects is to create different regressions, each one adjusting for the minimum set of variables. Each regression responds to the same DAG but uncovers different effects per channel depending on the adjustment.
Nevertheless, we probably want to optimize all regressions together, or we want to join the models to perform several operations. But how? Well, let's remember that PyMC models are graphical generative models, so each model can be represented as a function graph, and its nodes/variables can be prefixed (e.g., model1_, model2_). That way we can decide which variable or data to merge on via merge_on="some_var"; that variable is kept unprefixed and shared across the merged models. All outputs and coordinates are merged into a single function graph.
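As a rough illustration of the prefixing and merge_on semantics, here is a sketch with two toy models; the merge call at the end is only a placeholder, since the helper's exact name and signature live in this PR and may differ.

```python
import numpy as np
import pymc as pm

# Two independent toy models that happen to share the same underlying data.
x = np.random.default_rng(0).normal(size=50)
y = 2 * x + np.random.default_rng(1).normal(size=50)

with pm.Model() as model1:
    x_data = pm.Data("shared_x", x)
    beta = pm.Normal("beta", 0, 1)
    pm.Normal("obs", mu=beta * x_data, sigma=1, observed=y)

with pm.Model() as model2:
    x_data = pm.Data("shared_x", x)
    intercept = pm.Normal("intercept", 0, 1)
    pm.Normal("obs", mu=intercept + x_data, sigma=1, observed=y)

# Hypothetical call (name/signature assumed; see the PR's utils for the real one):
# merged = merge_models([model1, model2], merge_on="shared_x")
# After merging, variables get prefixed (model1_beta, model2_intercept, ...) while
# "shared_x" stays unprefixed and is shared by both sub-graphs; coords are unioned.
```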
As a quick example, imagine you have the following DAG:
This structure makes the minimal sufficient sets different for each channel's direct effect on $Y$:
For $X_1$: backdoor via $C_1$; mediated paths via $X_2 (\to X_3)$.
Adjust: $\{C_1, X_2\}$ (conditioning on $X_2$ also blocks the $X_1 \to X_2 \to X_3 \to Y$ path).
For $X_2$: backdoors via $C_2$ and via $X_1$ (since $X_1 \to X_2$ and $X_1 \to Y$); mediated path via $X_3$.
Adjust: $\{C_2, X_1, X_3\}$.
For $X_3$: backdoors via $C_3$ and via $X_2$ (since $X_2 \to X_3$ and $X_2 \to Y$).
Adjust: $\{C_3, X_2\}$.
Meaning, we need three models to uncover the effects:
As you can see, we should be able to zero out the distributions with a mask, to avoid calculating the contributions for the channels we don't want to adjust for in each regression. Nevertheless, even when channels are zeroed out via the priors and not considered during sampling, the data input is the same for all models.
We could merge based on it.
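Below is a hypothetical sketch of that setup: three regressions over the same design matrix, each masking out the coefficients it does not adjust for. The variable names and masking mechanics are illustrative assumptions, and the merge step is only indicated in a comment.

```python
import numpy as np
import pymc as pm
import pytensor.tensor as pt

# Illustrative data: every regression sees the same full design matrix.
rng = np.random.default_rng(1)
n = 100
columns = ["x1", "x2", "x3", "c1", "c2", "c3"]
X_full = rng.normal(size=(n, len(columns)))
y = rng.normal(size=n)

def regression(keep):
    """One regression keeping only `keep` columns active; the rest are masked to zero."""
    mask = np.array([c in keep for c in columns])
    with pm.Model(coords={"obs": range(n), "column": columns}) as m:
        X = pm.Data("X", X_full, dims=("obs", "column"))
        beta_active = pm.Normal("beta_active", 0, 1, shape=int(mask.sum()))
        beta = pt.zeros(len(columns))
        beta = pt.set_subtensor(beta[mask.nonzero()], beta_active)
        pm.Deterministic("beta", beta, dims="column")
        pm.Normal("y_obs", mu=X @ beta, sigma=pm.HalfNormal("sigma", 1), observed=y)
    return m

m1 = regression({"x1", "c1", "x2"})        # direct effect of x1
m2 = regression({"x2", "c2", "x1", "x3"})  # direct effect of x2
m3 = regression({"x3", "c3", "x2"})        # direct effect of x3

# The three graphs could then be merged on the shared data container "X"
# (as sketched in the previous example), so X stays unprefixed and shared.
```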
And finally, optimize.
This opens the door to optimizing different effects (total, indirect, or direct) together. It allows for multi-objective optimization: every model can have a different target variable, but the targets depend on the same inputs, so we want to optimize A and B together.
Conclusion
These examples are just the tip of the iceberg, but they allow users to scale their models in dimensionality and optimize them as much as possible, even in cases with panel imbalance; to estimate sampling time to decide whether MCMC is feasible or VI methods could be a better option; and, in cases where they end up with multiple models, to merge them in order to consolidate their learnings or optimize them together, mixing different outputs.
Related Issue
Checklist
pre-commit.ci autofix to auto-fix.
📚 Documentation preview 📚: https://pymc-marketing--1927.org.readthedocs.build/en/1927/