Getting Started

Installation

The simplest installation is through pip or conda:

pip install dowhy
conda install -c conda-forge dowhy

Further installation scenarios and instructions can be found at Installation.
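
After installing, it can be useful to confirm that the package is visible to the interpreter you are actually using. The following is a minimal sketch using only the Python standard library; the helper name installed_version is our own and not part of DoWhy.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a distribution, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# "dowhy" is the distribution name used in the pip command above.
dowhy_version = installed_version("dowhy")
if dowhy_version is None:
    print("dowhy is not installed in this environment")
else:
    print(f"dowhy {dowhy_version} is installed")
```

If the package is reported as missing even though the install succeeded, pip or conda most likely installed into a different environment than the one running this check.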

“Hello causal inference world”

In this section, we show the “Hello world” version of DoWhy. DoWhy is based on a simple unifying language for causal inference that combines two powerful frameworks: graphical causal models (GCM) and potential outcomes (PO). It uses graph-based criteria and do-calculus for modeling assumptions and identifying a non-parametric causal effect.

To get you started, we introduce two of the many features DoWhy offers.

Effect inference

For effect estimation, DoWhy switches to methods based primarily on potential outcomes. It offers a simple 4-step recipe consisting of modeling, identification, estimation, and refutation:

from dowhy import CausalModel
import dowhy.datasets

# Generate some sample data
data = dowhy.datasets.linear_dataset(
    beta=10,
    num_common_causes=5,
    num_instruments=2,
    num_samples=10000)

# Step 1: Create a causal model from the data and given graph.
model = CausalModel(
    data=data["df"],
    treatment=data["treatment_name"],
    outcome=data["outcome_name"],
    graph=data["gml_graph"])

# Step 2: Identify causal effect and return target estimands
identified_estimand = model.identify_effect()

# Step 3: Estimate the target estimand using a statistical method.
estimate = model.estimate_effect(identified_estimand,
                                 method_name="backdoor.propensity_score_matching")

# Step 4: Refute the obtained estimate using multiple robustness checks.
refute_results = model.refute_estimate(identified_estimand, estimate,
                                       method_name="random_common_cause")

To understand what these four steps mean (and why we need four steps), the best place to learn more is the user guide’s Effect inference chapter. Alternatively, you can dive into the code and explore basic features in Basic Example for Calculating the Causal Effect.
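
To build some intuition for why identification and refutation are separate from estimation, here is a small standard-library sketch. It is not how DoWhy works internally: it uses a made-up dataset with a single binary confounder, and it swaps propensity score matching for plain stratification on that confounder (a simple form of backdoor adjustment). The variable names, effect sizes, and helper functions are all assumptions chosen for illustration.

```python
import random
from collections import defaultdict
from statistics import mean

random.seed(0)
TRUE_EFFECT = 10.0  # mirrors beta=10 in the DoWhy example; everything else is made up

# Toy data: a binary confounder w raises both the chance of treatment t
# and the outcome y, so treated and untreated units are not directly comparable.
rows = []
for _ in range(20000):
    w = int(random.random() < 0.5)
    t = int(random.random() < (0.8 if w else 0.2))
    y = TRUE_EFFECT * t + 5.0 * w + random.gauss(0, 1)
    rows.append({"w": w, "t": t, "y": y})


def diff_in_means(sample):
    """Mean outcome of treated units minus mean outcome of untreated units."""
    treated = [r["y"] for r in sample if r["t"] == 1]
    control = [r["y"] for r in sample if r["t"] == 0]
    return mean(treated) - mean(control)


def adjusted_effect(sample, key):
    """Stratify on key(row) and average the per-stratum effects, weighted by stratum size."""
    strata = defaultdict(list)
    for row in sample:
        strata[key(row)].append(row)
    return sum(diff_in_means(s) * len(s) / len(sample) for s in strata.values())


# Ignoring the confounder gives a biased estimate.
naive = diff_in_means(rows)

# Identification says: adjust for w. Estimation then computes the adjusted effect.
adjusted = adjusted_effect(rows, key=lambda r: r["w"])

# Refutation in the spirit of "random_common_cause": adding an independent
# random covariate to the adjustment set should barely move the estimate.
for r in rows:
    r["random_cause"] = random.randint(0, 1)
refuted = adjusted_effect(rows, key=lambda r: (r["w"], r["random_cause"]))

print(f"naive: {naive:.2f}, adjusted: {adjusted:.2f}, with random cause: {refuted:.2f}")
```

The naive difference overstates the effect because treated units are more likely to have w = 1, while the adjusted estimate lands close to the true value and stays there when an irrelevant covariate is added.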

For estimation of conditional effects, you can also use methods from EconML through the same API; see Conditional Average Treatment Effects (CATE) with DoWhy and EconML.

Graphical causal model-based inference

For features like root cause analysis, structural analysis, and similar, DoWhy uses graphical causal models. The language of graphical causal models again allows a variety of causal questions to be answered. DoWhy’s API for answering these causal questions follows a simple 3-step recipe:

import networkx as nx
import numpy as np
import pandas as pd
from dowhy import gcm

# Let's generate some "normal" data we assume we're given from our problem domain:
X = np.random.normal(loc=0, scale=1, size=1000)
Y = 2 * X + np.random.normal(loc=0, scale=1, size=1000)
Z = 3 * Y + np.random.normal(loc=0, scale=1, size=1000)
data = pd.DataFrame(dict(X=X, Y=Y, Z=Z))

# Step 1: Model our system:
causal_model = gcm.StructuralCausalModel(nx.DiGraph([('X', 'Y'), ('Y', 'Z')]))
gcm.auto.assign_causal_mechanisms(causal_model, data)

# Step 2: Train our causal model with the data from above:
gcm.fit(causal_model, data)

# Step 3: Perform a causal analysis. E.g. we have an:
anomalous_record = pd.DataFrame(dict(X=[.7], Y=[100.0], Z=[303.0]))
# ... and would like to answer the question:
# "Which node is the root cause of the anomaly in Z?":
anomaly_attribution = gcm.attribute_anomalies(causal_model, "Z", anomalous_record)

Again, if this doesn’t entirely make sense yet, we recommend starting with GCM-based inference (Experimental) in the user guide or checking out Basic Example for Graphical Causal Model-Based Intervention.
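
For intuition about the root cause question in Step 3, the sketch below redoes the X → Y → Z example with only the standard library. It is a deliberately simplified stand-in, not DoWhy's attribution method: it assumes linear mechanisms without intercepts, fits each slope by least squares, and scores each node by how unusual its own noise term is in the anomalous record. The helper fit_slope and the z-score style scoring are our own assumptions.

```python
import random
from statistics import mean, pstdev

random.seed(0)
n = 1000

# Same data-generating process as in the example above: X -> Y -> Z.
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [2 * x + random.gauss(0, 1) for x in xs]
zs = [3 * y + random.gauss(0, 1) for y in ys]


def fit_slope(parent, child):
    """Least-squares slope of child on parent, assuming no intercept."""
    return sum(p * c for p, c in zip(parent, child)) / sum(p * p for p in parent)


slope_xy = fit_slope(xs, ys)
slope_yz = fit_slope(ys, zs)

# Noise of each node on the "normal" data: what is left after removing the parent's contribution.
noise = {
    "X": xs,  # X has no parent, so its value is its own noise
    "Y": [y - slope_xy * x for x, y in zip(xs, ys)],
    "Z": [z - slope_yz * y for y, z in zip(ys, zs)],
}

# The anomalous record from the example above, decomposed into per-node noise.
record = {"X": 0.7, "Y": 100.0, "Z": 303.0}
record_noise = {
    "X": record["X"],
    "Y": record["Y"] - slope_xy * record["X"],
    "Z": record["Z"] - slope_yz * record["Y"],
}

# Score each node by how many standard deviations its noise is from normal.
scores = {
    node: abs(record_noise[node] - mean(noise[node])) / pstdev(noise[node])
    for node in noise
}
root_cause = max(scores, key=scores.get)
print(scores, "->", root_cause)
```

Although Z has the largest raw value, almost all of it is explained by its parent Y; the node whose own noise is far outside the normal range is Y, which is why it is flagged here.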

Further resources

There are further resources available: