DoWhy: Interpreters for Causal Estimators#

This is a quick introduction to the use of interpreters in the DoWhy causal inference library. We will load a sample dataset, use different methods to estimate the causal effect of a (pre-specified) treatment variable on a (pre-specified) outcome variable, and demonstrate how to interpret the results.

First, let us enable automatic module reloading and load all required packages.

[1]:
%load_ext autoreload
%autoreload 2
[2]:
import numpy as np
import pandas as pd
import logging

import dowhy
from dowhy import CausalModel
import dowhy.datasets

Now, let us load a dataset. For simplicity, we simulate a dataset with linear relationships between common causes and treatment, and common causes and outcome.

The beta argument sets the true causal effect. Here it is 1, so a good estimator should return a value close to 1.

[3]:
data = dowhy.datasets.linear_dataset(beta=1,
        num_common_causes=5,
        num_instruments = 2,
        num_treatments=1,
        num_discrete_common_causes=1,
        num_samples=10000,
        treatment_is_binary=True,
        outcome_is_binary=False)
df = data["df"]
print(df[df.v0==True].shape[0])
df
8470
[3]:
Z0 Z1 W0 W1 W2 W3 W4 v0 y
0 1.0 0.758179 0.223113 -0.583247 1.561522 0.016196 3 True 3.919456
1 1.0 0.336647 0.339881 0.042399 -1.543854 0.047453 1 True 0.311371
2 1.0 0.019540 -0.224140 -1.984048 0.460944 -0.704755 3 True 1.732886
3 1.0 0.811149 -1.666045 1.904262 0.062447 -0.203613 3 True 2.805750
4 1.0 0.816470 0.409841 -0.913059 -0.064024 0.073619 3 True 2.319611
... ... ... ... ... ... ... ... ... ...
9995 0.0 0.577807 -1.296264 -0.774313 -0.110763 -1.224282 2 True 0.668734
9996 0.0 0.981149 -2.850787 0.037785 -0.597924 -2.228819 1 True -0.898895
9997 0.0 0.747377 -0.952558 0.711825 2.137963 -1.312254 0 True 2.795585
9998 0.0 0.828307 -0.555047 0.259252 0.169997 0.896667 0 True 1.032831
9999 0.0 0.064636 -0.465600 -1.135197 -0.625661 -0.010386 0 False -1.554850

10000 rows × 9 columns

Note that the data is stored in a pandas DataFrame.
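To see why the estimation methods in this notebook are needed, the following minimal sketch simulates a small linear dataset with a single confounder and compares the naive difference in mean outcomes with the true effect. It uses only the Python standard library and does not reproduce dowhy.datasets.linear_dataset. The simulate function, the single confounder w and the logistic treatment model are assumptions made for illustration.

```python
import math
import random
from statistics import mean


def simulate(n=5000, beta=1.0, seed=0):
    """Simulate (w, t, y) rows with one confounder w and true effect beta."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w = rng.gauss(0.0, 1.0)                  # common cause of t and y
        p = 1.0 / (1.0 + math.exp(-w))           # treatment probability rises with w
        t = 1 if rng.random() < p else 0         # binary treatment
        y = beta * t + w + rng.gauss(0.0, 1.0)   # linear outcome
        rows.append((w, t, y))
    return rows


rows = simulate()
treated_outcomes = [y for _, t, y in rows if t == 1]
control_outcomes = [y for _, t, y in rows if t == 0]

# The naive contrast mixes the causal effect with the effect of w.
naive = mean(treated_outcomes) - mean(control_outcomes)
print(f"true effect: 1.0, naive difference in means: {naive:.3f}")
```

Because treated units tend to have larger values of w, the naive difference overstates the effect. The methods below adjust for this imbalance.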

Identifying the causal estimand#

We now input a causal graph in the GML graph format.

[4]:
# With graph
model=CausalModel(
        data = df,
        treatment=data["treatment_name"],
        outcome=data["outcome_name"],
        graph=data["gml_graph"],
        instruments=data["instrument_names"]
        )
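The content of data["gml_graph"] is not printed in this notebook. As a rough illustration of what a GML string for a directed graph looks like, here is a minimal sketch. The helper to_gml and the four-node graph are hypothetical, and using node names as string ids is an assumption about the convention that DoWhy accepts.

```python
def to_gml(nodes, edges):
    """Serialize a directed graph as a GML string (illustrative helper)."""
    parts = ["graph [", "  directed 1"]
    for name in nodes:
        parts.append(f'  node [ id "{name}" label "{name}" ]')
    for source, target in edges:
        parts.append(f'  edge [ source "{source}" target "{target}" ]')
    parts.append("]")
    return "\n".join(parts)


# Hypothetical graph: one instrument, one common cause, treatment, outcome.
nodes = ["Z0", "W0", "v0", "y"]
edges = [("Z0", "v0"), ("W0", "v0"), ("W0", "y"), ("v0", "y")]
gml = to_gml(nodes, edges)
print(gml)
```

A string of this shape could then be passed as the graph argument, in place of data["gml_graph"].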
[5]:
model.view_model()
../_images/example_notebooks_dowhy_interpreter_9_0.png
[6]:
from IPython.display import Image, display
display(Image(filename="causal_model.png"))
../_images/example_notebooks_dowhy_interpreter_10_0.png

We get a causal graph. Next, we identify the causal effect and then estimate it.

[7]:
identified_estimand = model.identify_effect(proceed_when_unidentifiable=True)
print(identified_estimand)
Estimand type: EstimandType.NONPARAMETRIC_ATE

### Estimand : 1
Estimand name: backdoor
Estimand expression:
  d
─────(E[y|W3,W1,W0,W2,W4])
d[v₀]
Estimand assumption 1, Unconfoundedness: If U→{v0} and U→y then P(y|v0,W3,W1,W0,W2,W4,U) = P(y|v0,W3,W1,W0,W2,W4)

### Estimand : 2
Estimand name: iv
Estimand expression:
 ⎡                              -1⎤
 ⎢    d        ⎛    d          ⎞  ⎥
E⎢─────────(y)⋅⎜─────────([v₀])⎟  ⎥
 ⎣d[Z₀  Z₁]    ⎝d[Z₀  Z₁]      ⎠  ⎦
Estimand assumption 1, As-if-random: If U→→y then ¬(U →→{Z0,Z1})
Estimand assumption 2, Exclusion: If we remove {Z0,Z1}→{v0}, then ¬({Z0,Z1}→y)

### Estimand : 3
Estimand name: frontdoor
No such variable(s) found!
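The iv estimand above is a ratio of two derivatives: the effect of the instruments on the outcome divided by their effect on the treatment. In a linear setting with one instrument, this reduces to the ratio cov(Z, y) / cov(Z, v). The sketch below illustrates that ratio on simulated data. It is not DoWhy's estimator, and the continuous treatment, the single instrument z and the unobserved confounder u are assumptions chosen to keep the example short.

```python
import random
from statistics import mean


def covariance(a, b):
    """Sample covariance of two equally long sequences."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


rng = random.Random(1)
beta = 1.0  # true causal effect of v on y
z, v, y = [], [], []
for _ in range(20000):
    u = rng.gauss(0.0, 1.0)                           # unobserved confounder
    z_i = rng.gauss(0.0, 1.0)                         # instrument, independent of u
    v_i = 0.8 * z_i + u + rng.gauss(0.0, 1.0)         # treatment
    y_i = beta * v_i + 2.0 * u + rng.gauss(0.0, 1.0)  # outcome; z acts only via v
    z.append(z_i)
    v.append(v_i)
    y.append(y_i)

# Regressing y on v directly is biased because u is not observed.
ols_estimate = covariance(v, y) / covariance(v, v)
# The instrument isolates variation in v that is unrelated to u.
iv_estimate = covariance(z, y) / covariance(z, v)
print(f"OLS: {ols_estimate:.3f}, IV: {iv_estimate:.3f}")
```

The two assumptions printed with the estimand correspond to the comments in the simulation: z is independent of u (as-if-random) and affects y only through v (exclusion).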

Method 1: Propensity Score Stratification#

We will be using propensity scores to stratify units in the data.

[8]:
causal_estimate_strat = model.estimate_effect(identified_estimand,
                                              method_name="backdoor.propensity_score_stratification",
                                              target_units="att")
print(causal_estimate_strat)
print("Causal Estimate is " + str(causal_estimate_strat.value))
*** Causal Estimate ***

## Identified estimand
Estimand type: EstimandType.NONPARAMETRIC_ATE

### Estimand : 1
Estimand name: backdoor
Estimand expression:
  d
─────(E[y|W3,W1,W0,W2,W4])
d[v₀]
Estimand assumption 1, Unconfoundedness: If U→{v0} and U→y then P(y|v0,W3,W1,W0,W2,W4,U) = P(y|v0,W3,W1,W0,W2,W4)

## Realized estimand
b: y~v0+W3+W1+W0+W2+W4
Target units: att

## Estimate
Mean value: 1.002656737901175

Causal Estimate is 1.002656737901175
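As a rough picture of what stratification does, the sketch below sorts simulated units by propensity score, splits them into equal-sized strata, and averages the within-stratum differences in mean outcome, weighted by the number of treated units (which targets the ATT). This is a simplified illustration, not DoWhy's implementation. The function stratified_att, the choice of 20 strata and the use of the true propensity score instead of a fitted one are all assumptions.

```python
import math
import random
from statistics import mean

rng = random.Random(0)
beta = 1.0  # true causal effect
units = []
for _ in range(5000):
    w = rng.gauss(0.0, 1.0)                  # confounder
    p = 1.0 / (1.0 + math.exp(-w))           # true propensity score (assumed known)
    t = 1 if rng.random() < p else 0
    y = beta * t + w + rng.gauss(0.0, 1.0)
    units.append((p, t, y))


def stratified_att(units, num_strata=20):
    """ATT as a treated-count-weighted mean of within-stratum differences."""
    ordered = sorted(units)                  # tuples sort by propensity score first
    size = math.ceil(len(ordered) / num_strata)
    weighted_sum, total_treated = 0.0, 0
    for start in range(0, len(ordered), size):
        stratum = ordered[start:start + size]
        treated = [y for _, t, y in stratum if t == 1]
        control = [y for _, t, y in stratum if t == 0]
        if not treated or not control:
            continue                         # skip strata without overlap
        weighted_sum += len(treated) * (mean(treated) - mean(control))
        total_treated += len(treated)
    return weighted_sum / total_treated


att = stratified_att(units)
print(f"stratified ATT estimate: {att:.3f}")
```

Within a narrow stratum, treated and control units have similar propensity scores, so their difference in outcomes is much less affected by the confounder.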

Textual Interpreter#

The textual interpreter describes, in words, the effect of a unit change in the treatment variable on the outcome variable.

[9]:
# Textual Interpreter
interpretation = causal_estimate_strat.interpret(method_name="textual_effect_interpreter")
Increasing the treatment variable(s) [v0] from 0 to 1 causes an increase of 1.002656737901175 in the expected value of the outcome [['y']], over the data distribution/population represented by the dataset.

Visual Interpreter#

The visual interpreter plots the change in the standardized mean difference (SMD) before and after propensity-score-based adjustment of the dataset. The formula for the SMD is given below.

\(SMD = \frac{\bar X_{1} - \bar X_{2}}{\sqrt{(S_{1}^{2} + S_{2}^{2})/2}}\)

Here, \(\bar X_{1}\) and \(\bar X_{2}\) are the sample means of a covariate in the treated and control groups, and \(S_{1}^{2}\) and \(S_{2}^{2}\) are the corresponding sample variances.

[10]:
# Visual Interpreter
interpretation = causal_estimate_strat.interpret(method_name="propensity_balance_interpreter")
../_images/example_notebooks_dowhy_interpreter_18_0.png

This plot shows how the SMD decreases from the unadjusted to the stratified units.
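The SMD formula given earlier in this section translates directly into code. The sketch below uses only the standard library. The function name smd and the toy values are illustrative and are not part of DoWhy.

```python
import math
from statistics import mean, variance


def smd(group1, group2):
    """Standardized mean difference between two samples of one covariate."""
    pooled_sd = math.sqrt((variance(group1) + variance(group2)) / 2)
    return (mean(group1) - mean(group2)) / pooled_sd


treated = [1.0, 2.0, 3.0]   # covariate values in the treated group
control = [2.0, 3.0, 4.0]   # covariate values in the control group
print(smd(treated, control))  # → -1.0
```

An SMD close to 0 means the covariate is balanced between the two groups. The plot reports this quantity before and after adjustment.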

Method 2: Propensity Score Matching#

We will be using propensity scores to match units in the data.

[11]:
causal_estimate_match = model.estimate_effect(identified_estimand,
                                              method_name="backdoor.propensity_score_matching",
                                              target_units="atc")
print(causal_estimate_match)
print("Causal Estimate is " + str(causal_estimate_match.value))
*** Causal Estimate ***

## Identified estimand
Estimand type: EstimandType.NONPARAMETRIC_ATE

### Estimand : 1
Estimand name: backdoor
Estimand expression:
  d
─────(E[y|W3,W1,W0,W2,W4])
d[v₀]
Estimand assumption 1, Unconfoundedness: If U→{v0} and U→y then P(y|v0,W3,W1,W0,W2,W4,U) = P(y|v0,W3,W1,W0,W2,W4)

## Realized estimand
b: y~v0+W3+W1+W0+W2+W4
Target units: atc

## Estimate
Mean value: 1.0038102718387791

Causal Estimate is 1.0038102718387791
[12]:
# Textual Interpreter
interpretation = causal_estimate_match.interpret(method_name="textual_effect_interpreter")
Increasing the treatment variable(s) [v0] from 0 to 1 causes an increase of 1.0038102718387791 in the expected value of the outcome [['y']], over the data distribution/population represented by the dataset.

The propensity balance interpreter cannot be used here, since it only supports the propensity score stratification estimator.
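For intuition, matching for the ATC can be sketched as follows: each control unit is paired with the treated unit whose propensity score is closest, and the outcome differences are averaged over the control units. This is a simplified one-nearest-neighbour version with replacement, not DoWhy's implementation. The helper nearest_treated_outcome and the use of the true propensity score are assumptions.

```python
import bisect
import math
import random
from statistics import mean

rng = random.Random(2)
beta = 1.0  # true causal effect
treated, control = [], []
for _ in range(5000):
    w = rng.gauss(0.0, 1.0)                  # confounder
    p = 1.0 / (1.0 + math.exp(-w))           # true propensity score (assumed known)
    t = 1 if rng.random() < p else 0
    y = beta * t + w + rng.gauss(0.0, 1.0)
    (treated if t == 1 else control).append((p, y))

treated.sort()                               # sort treated units by propensity score
treated_scores = [p for p, _ in treated]


def nearest_treated_outcome(score):
    """Outcome of the treated unit whose propensity score is closest to score."""
    i = bisect.bisect_left(treated_scores, score)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(treated)]
    best = min(candidates, key=lambda j: abs(treated_scores[j] - score))
    return treated[best][1]


# ATC: average over control units of (matched treated outcome - own outcome).
atc = mean(nearest_treated_outcome(p) - y for p, y in control)
print(f"matched ATC estimate: {atc:.3f}")
```

Sorting the treated scores once and using a binary search keeps the matching step fast even for large samples.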

Method 3: Weighting#

We will be using (inverse) propensity scores to assign weights to units in the data. DoWhy supports a few different weighting schemes:

1. Vanilla inverse propensity score (IPS) weighting (weighting_scheme="ips_weight")
2. Self-normalized IPS weighting, also known as the Hajek estimator (weighting_scheme="ips_normalized_weight")
3. Stabilized IPS weighting (weighting_scheme="ips_stabilized_weight")
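The three schemes can be sketched in their textbook forms as follows. DoWhy's internal implementation may differ in detail, so this is an illustration under stated assumptions only: the simulated data, the true propensity score p and all variable names are hypothetical.

```python
import math
import random

rng = random.Random(3)
beta = 1.0  # true causal effect
data = []
for _ in range(20000):
    w = rng.gauss(0.0, 1.0)                       # confounder
    p = 1.0 / (1.0 + math.exp(-0.5 * w))          # true propensity score (assumed known)
    t = 1 if rng.random() < p else 0
    y = beta * t + w + rng.gauss(0.0, 1.0)
    data.append((p, t, y))

n = len(data)
p_treated = sum(t for _, t, _ in data) / n        # marginal P(T = 1)

# Plain IPS weights: 1/p for treated units, 1/(1-p) for control units.
ips = [1.0 / p if t == 1 else 1.0 / (1.0 - p) for p, t, _ in data]
# Stabilized weights: IPS weights rescaled by the marginal treatment probability.
stabilized = [p_treated / p if t == 1 else (1.0 - p_treated) / (1.0 - p)
              for p, t, _ in data]


def group_sum(values, group):
    """Sum of values over the units whose treatment equals group."""
    return sum(v for v, (_, t, _) in zip(values, data) if t == group)


weighted_y = [wt * y for wt, (_, _, y) in zip(ips, data)]

# Scheme 1: divide the weighted outcome sums by the sample size.
ate_ips = (group_sum(weighted_y, 1) - group_sum(weighted_y, 0)) / n
# Scheme 2 (Hajek): divide by the sum of weights within each group instead.
ate_normalized = (group_sum(weighted_y, 1) / group_sum(ips, 1)
                  - group_sum(weighted_y, 0) / group_sum(ips, 0))
# Scheme 3: stabilized weights average to roughly 1, which limits extreme values.
mean_stabilized = sum(stabilized) / n

print(f"IPS: {ate_ips:.3f}, normalized IPS: {ate_normalized:.3f}")
print(f"mean stabilized weight: {mean_stabilized:.3f}")
```

Self-normalization usually reduces the variance of the estimate when some propensity scores are close to 0 or 1, at the cost of a small finite-sample bias.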

[13]:
causal_estimate_ipw = model.estimate_effect(identified_estimand,
                                            method_name="backdoor.propensity_score_weighting",
                                            target_units = "ate",
                                            method_params={"weighting_scheme":"ips_weight"})
print(causal_estimate_ipw)
print("Causal Estimate is " + str(causal_estimate_ipw.value))
*** Causal Estimate ***

## Identified estimand
Estimand type: EstimandType.NONPARAMETRIC_ATE

### Estimand : 1
Estimand name: backdoor
Estimand expression:
  d
─────(E[y|W3,W1,W0,W2,W4])
d[v₀]
Estimand assumption 1, Unconfoundedness: If U→{v0} and U→y then P(y|v0,W3,W1,W0,W2,W4,U) = P(y|v0,W3,W1,W0,W2,W4)

## Realized estimand
b: y~v0+W3+W1+W0+W2+W4
Target units: ate

## Estimate
Mean value: 1.2641975291140912

Causal Estimate is 1.2641975291140912
[14]:
# Textual Interpreter
interpretation = causal_estimate_ipw.interpret(method_name="textual_effect_interpreter")
Increasing the treatment variable(s) [v0] from 0 to 1 causes an increase of 1.2641975291140912 in the expected value of the outcome [['y']], over the data distribution/population represented by the dataset.
[15]:
interpretation = causal_estimate_ipw.interpret(method_name="confounder_distribution_interpreter", fig_size=(8,8), font_size=12, var_name='W4', var_type='discrete')
../_images/example_notebooks_dowhy_interpreter_27_0.png
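The idea behind this plot is that inverse propensity weighting should make the distribution of a confounder look similar in the treated and control groups. The sketch below checks this numerically for a discrete confounder. It is not the interpreter's code: the propensity table, the four confounder levels and the helper names are hypothetical.

```python
import random
from collections import defaultdict

rng = random.Random(4)
# Hypothetical P(T = 1 | W4) for each level of a discrete confounder W4.
propensity = {0: 0.2, 1: 0.4, 2: 0.6, 3: 0.8}
units = []
for _ in range(20000):
    w4 = rng.randrange(4)                          # levels 0..3, equally likely
    t = 1 if rng.random() < propensity[w4] else 0
    units.append((w4, t))


def shares(group, weighted):
    """Share of each W4 level within one treatment group."""
    totals = defaultdict(float)
    for w4, t in units:
        if t != group:
            continue
        p = propensity[w4]
        inverse_weight = 1.0 / p if t == 1 else 1.0 / (1.0 - p)
        totals[w4] += inverse_weight if weighted else 1.0
    norm = sum(totals.values())
    return {level: totals[level] / norm for level in sorted(totals)}


def max_gap(weighted):
    """Largest treated-versus-control difference in share over all levels."""
    treated, control = shares(1, weighted), shares(0, weighted)
    return max(abs(treated[level] - control[level]) for level in treated)


gap_before = max_gap(weighted=False)
gap_after = max_gap(weighted=True)
print(f"largest gap before weighting: {gap_before:.3f}, after: {gap_after:.3f}")
```

Before weighting, high levels of the confounder are over-represented among treated units. After weighting, both groups approximate the distribution in the full sample.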