DoWhy documentation
Date: Nov 22, 2024 Version: main
Related resources: Source Repository | Issues & Ideas | PyWhy Organization | DoWhy on PyPI
Join our Community on Discord: https://discord.gg/cSBGb3vsZb
Much like machine learning libraries have done for prediction, DoWhy is a Python library that aims to spark causal thinking and analysis. DoWhy provides a wide variety of algorithms for effect estimation, causal structure learning, diagnosis of causal structures, root cause analysis, interventions and counterfactuals.
Key differences compared to available causal inference software
DoWhy brings four key differences compared to available software for causal inference:
- Explicit identifying assumptions
Assumptions are first-class citizens in DoWhy.
Each analysis starts by building a causal model. The assumptions can be viewed graphically or in terms of conditional independence statements. Further, in the case of graphical causal models (GCMs), the data-generating process of each node is modeled explicitly. Wherever possible, DoWhy can also automatically test stated assumptions using observed data.
- Separation between identification and estimation
Identification is the causal problem. Estimation is simply a statistical problem.
DoWhy respects this boundary and treats them separately. This focuses the causal inference effort on identification and frees estimation to use any available statistical estimator for a target estimand. In addition, multiple estimation methods can be used for a single identified estimand and vice versa. The same goes for modeling causal mechanisms, where any third-party machine learning package can be used to model the functional relationships.
- Automated validation of assumptions
What happens when key identifying assumptions may not be satisfied?
The most critical, and often skipped, part of causal analysis is checking whether the assumptions made about the causal relationships hold. DoWhy makes it easy to automatically run sensitivity and robustness checks on the obtained estimate, to falsify the given causal graph, or to evaluate fitted causal mechanisms.
- Default parameters for simple application of complex algorithms
Selecting the right set of variables or models is a hard problem.
DoWhy aims to select appropriate parameters by default while allowing users to fully customize each function call and model specification. For instance, DoWhy automatically selects the most appropriate identification method and offers functionality to automatically assign appropriate causal mechanisms.
Finally, DoWhy is easily extensible with a particular focus on supporting other libraries, such as EconML, CausalML, scikit-learn and more. Algorithms are implemented in a modular way, encouraging users to contribute their own or to simply plug in different customized models.
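The separation between identification and estimation, and the idea of a robustness check, can be illustrated with a minimal plain-Python sketch. It deliberately does not use DoWhy's API: every function name below (`simulate`, `naive_difference`, `stratified_estimate`, `regression_estimate`, `placebo_rows`) is hypothetical, and the data are simulated under an assumed graph in which a binary confounder `w` affects both the treatment `t` and the outcome `y`. Under that assumption, the backdoor estimand adjusts for `w`, and two different estimators can target the same estimand.

```python
import random
from statistics import mean


def simulate(n=20000, seed=0):
    """Simulate rows (w, t, y): w confounds t and y; true effect of t on y is 2.0."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w = 1 if rng.random() < 0.5 else 0
        # Treatment is more likely when w == 1, which creates confounding.
        t = 1 if rng.random() < (0.8 if w else 0.2) else 0
        y = 2.0 * t + 3.0 * w + rng.gauss(0.0, 1.0)
        rows.append((w, t, y))
    return rows


def naive_difference(rows):
    """Unadjusted difference in mean outcomes; biased because w is ignored."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return mean(treated) - mean(control)


def stratified_estimate(rows):
    """Estimator 1: per-stratum treated/control contrast, weighted by P(w)."""
    total = 0.0
    for w_val in (0, 1):
        stratum = [(t, y) for w, t, y in rows if w == w_val]
        treated = [y for t, y in stratum if t == 1]
        control = [y for t, y in stratum if t == 0]
        total += (mean(treated) - mean(control)) * len(stratum) / len(rows)
    return total


def regression_estimate(rows):
    """Estimator 2: coefficient on t in least squares y ~ 1 + t + w.

    Computed by centering t and y within each level of w and regressing
    the centered y on the centered t (Frisch-Waugh-Lovell).
    """
    centers = {}
    for w_val in (0, 1):
        stratum = [(t, y) for w, t, y in rows if w == w_val]
        centers[w_val] = (mean([t for t, _ in stratum]),
                          mean([y for _, y in stratum]))
    num = sum((t - centers[w][0]) * (y - centers[w][1]) for w, t, y in rows)
    den = sum((t - centers[w][0]) ** 2 for w, t, _ in rows)
    return num / den


def placebo_rows(rows, seed=1):
    """Robustness check input: replace t with a random permutation of itself."""
    rng = random.Random(seed)
    shuffled = [t for _, t, _ in rows]
    rng.shuffle(shuffled)
    return [(w, t_new, y) for (w, _, y), t_new in zip(rows, shuffled)]


rows = simulate()
print("naive difference:     ", round(naive_difference(rows), 2))
print("stratified (backdoor):", round(stratified_estimate(rows), 2))
print("regression (backdoor):", round(regression_estimate(rows), 2))
print("placebo treatment:    ", round(stratified_estimate(placebo_rows(rows)), 2))
```

In this simulation the naive difference is pulled away from the true effect of 2.0 by the confounder, while both adjusted estimators should land close to it, and the placebo estimate should be close to zero. The causal assumption (adjust for `w`) is stated once; the choice of estimator is a separate, purely statistical decision.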