Functional API Preview#

This notebook is part of a set of notebooks that previews the proposed functional API for DoWhy. For details on the new API, check out py-why/dowhy. The API is a work in progress and is updated as we add new functionality. We welcome your feedback through Discord or on the Discussions page. The functional API is designed to be backwards compatible, so the old and new APIs will co-exist and continue to work in upcoming releases. The old API built around CausalModel will gradually be deprecated in favor of the new one.

The current functional API covers:

* Identify Effect:
    * identify_effect(...): Runs the identify-effect algorithm with the defaults; just provide the graph, treatment, and outcome.
    * identify_effect_auto(...): A more configurable version of identify_effect(...).
    * identify_effect_id(...): Identifies the effect using the ID algorithm.
* Refute Estimate:
    * refute_estimate: Runs a set of the refuters below with their default parameters.
    * refute_bootstrap: Refutes an estimate by running it on a random sample of the data containing measurement error in the confounders.
    * refute_data_subset: Refutes an estimate by rerunning it on a random subset of the original data.
    * refute_random_common_cause: Refutes an estimate by introducing a randomly generated confounder (that may have been unobserved).
    * refute_placebo_treatment: Refutes an estimate by replacing the treatment with a randomly generated placebo variable.
    * sensitivity_simulation: Adds an unobserved confounder for refutation (simulation of an unobserved confounder).
    * sensitivity_linear_partial_r2: Adds an unobserved confounder for refutation (linear partial R2: sensitivity analysis for linear models).
    * sensitivity_non_parametric_partial_r2: Adds an unobserved confounder for refutation (non-parametric partial R2: sensitivity analysis for non-parametric models).
    * sensitivity_e_value: Computes the E-value for the point estimate and confidence limits, benchmarks E-values against measured confounders using observed covariate E-values, and plots both.
    * refute_dummy_outcome: Refutes an estimate by replacing the outcome with a simulated (dummy) outcome for which the true causal effect is known.
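
In outline, the pieces chain together as sketched below. This condenses the cells that follow; the graph, dataset, and variable names are all constructed later in this notebook, and default control/treatment values are assumed for the estimate call:

# Condensed sketch of the functional workflow demonstrated in this notebook
estimand = identify_effect(graph, treatment_name, outcome_name, observed_nodes)
estimator = PropensityScoreMatchingEstimator(identified_estimand=estimand).fit(
    data=data["df"], effect_modifier_names=data["effect_modifier_names"]
)
estimate = estimator.estimate_effect(data=data["df"], target_units="ate")
refutation = refute_bootstrap(data["df"], estimand, estimate)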

Import Dependencies#

[1]:
# Config dict to set the logging level
import logging.config

from dowhy import CausalModel  # Used in the backwards-compatibility section at the end of this notebook
from dowhy.causal_estimators.econml import Econml
from dowhy.causal_estimators.propensity_score_matching_estimator import PropensityScoreMatchingEstimator
from dowhy.graph import build_graph

# Functional API imports
from dowhy.causal_identifier import (
    BackdoorAdjustment,
    EstimandType,
    identify_effect,
    identify_effect_auto,
    identify_effect_id,
)
from dowhy.causal_refuters import (
    refute_bootstrap,
    refute_data_subset,
    refute_estimate,
)
from dowhy.datasets import linear_dataset

DEFAULT_LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "loggers": {
        "": {
            "level": "WARN",
        },
    },
}


# Set a fixed random seed for deterministic dataset generation
# and to avoid problems when running tests
import numpy as np
np.random.seed(1)

logging.config.dictConfig(DEFAULT_LOGGING)
# Disabling warnings output
import warnings
from sklearn.exceptions import DataConversionWarning

warnings.filterwarnings(action="ignore", category=DataConversionWarning)

Create the Datasets#

[2]:
# Parameters for creating the Dataset
TREATMENT_IS_BINARY = True
BETA = 10
NUM_SAMPLES = 500
NUM_CONFOUNDERS = 3
NUM_INSTRUMENTS = 2
NUM_EFFECT_MODIFIERS = 2

# Creating a Linear Dataset with the given parameters
data = linear_dataset(
    beta=BETA,
    num_common_causes=NUM_CONFOUNDERS,
    num_instruments=NUM_INSTRUMENTS,
    num_effect_modifiers=NUM_EFFECT_MODIFIERS,
    num_samples=NUM_SAMPLES,
    treatment_is_binary=TREATMENT_IS_BINARY,
)

data_2 = linear_dataset(
    beta=BETA,
    num_common_causes=NUM_CONFOUNDERS,
    num_instruments=NUM_INSTRUMENTS,
    num_effect_modifiers=NUM_EFFECT_MODIFIERS,
    num_samples=NUM_SAMPLES,
    treatment_is_binary=TREATMENT_IS_BINARY,
)

treatment_name = data["treatment_name"]
print(treatment_name)
outcome_name = data["outcome_name"]
print(outcome_name)

# Note: instrument nodes are not passed to build_graph here, so the
# IV estimand will not be found by the functional identifiers below
graph = build_graph(
    action_nodes=treatment_name,
    outcome_nodes=outcome_name,
    effect_modifier_nodes=data["effect_modifier_names"],
    common_cause_nodes=data["common_causes_names"],
)
observed_nodes = data["df"].columns.tolist()
['v0']
y
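
If the instruments are also passed when building the graph, the identifiers below can find the IV estimand as well, just as the CausalModel section at the end of this notebook does. A minimal sketch, assuming build_graph accepts an instrument_nodes argument and that linear_dataset exposes the instruments under the "instrument_names" key (check the dowhy docs for your version):

graph_with_iv = build_graph(
    action_nodes=treatment_name,
    outcome_nodes=outcome_name,
    effect_modifier_nodes=data["effect_modifier_names"],
    common_cause_nodes=data["common_causes_names"],
    instrument_nodes=data["instrument_names"],  # assumed argument and key
)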

Identify Effect - Functional API (Preview)#

[3]:
# Default identify_effect call example:
identified_estimand = identify_effect(graph, treatment_name, outcome_name, observed_nodes)

# identify_effect_auto example with extra parameters:
identified_estimand_auto = identify_effect_auto(
    graph,
    treatment_name,
    outcome_name,
    observed_nodes,
    estimand_type=EstimandType.NONPARAMETRIC_ATE,
    backdoor_adjustment=BackdoorAdjustment.BACKDOOR_EFFICIENT,
)

# identify_effect_id example:
identified_estimand_id = identify_effect_id(
    graph, treatment_name, outcome_name
)  # Note that the return type for identify_effect_id is IDExpression and not IdentifiedEstimand

print(identified_estimand)
Estimand type: EstimandType.NONPARAMETRIC_ATE

### Estimand : 1
Estimand name: backdoor
Estimand expression:
  d
─────(E[y|W1,W0,W2])
d[v₀]
Estimand assumption 1, Unconfoundedness: If U→{v0} and U→y then P(y|v0,W1,W0,W2,U) = P(y|v0,W1,W0,W2)

### Estimand : 2
Estimand name: iv
No such variable(s) found!

### Estimand : 3
Estimand name: frontdoor
No such variable(s) found!
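
The returned IdentifiedEstimand can also be inspected programmatically, for example to retrieve the adjustment set. A small sketch, assuming the get_backdoor_variables accessor available on IdentifiedEstimand in recent dowhy releases:

# Retrieve the backdoor adjustment set found by the identifier
print(identified_estimand.get_backdoor_variables())  # expected: ['W1', 'W0', 'W2'], per the output above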

Estimate Effect - Functional API (Preview)#

[4]:
# Basic effect estimation using the new estimator API
estimator = PropensityScoreMatchingEstimator(
    identified_estimand=identified_estimand,
    test_significance=None,
    evaluate_effect_strength=False,
    confidence_intervals=False,
).fit(
    data=data["df"],
    effect_modifier_names=data["effect_modifier_names"]
)

estimate = estimator.estimate_effect(
    data=data["df"],
    control_value=0,
    treatment_value=1,
    target_units="ate",
)

# Reuse the same fitted estimator with different data
second_estimate = estimator.estimate_effect(
    data=data_2["df"],
    control_value=0,
    treatment_value=1,
    target_units="ate",
)

print(estimate)
print("-----------")
print(second_estimate)
*** Causal Estimate ***

## Identified estimand
Estimand type: EstimandType.NONPARAMETRIC_ATE

## Realized estimand
b: y~v0+W1+W0+W2
Target units: ate

## Estimate
Mean value: 11.32687325588067

-----------
*** Causal Estimate ***

## Identified estimand
Estimand type: EstimandType.NONPARAMETRIC_ATE

## Realized estimand
b: y~v0+W1+W0+W2
Target units: ate

## Estimate
Mean value: 15.620265231373276

[5]:
# EconML estimator example
from econml.dml import DML
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import PolynomialFeatures

estimator = Econml(
    identified_estimand=identified_estimand,
    econml_estimator=DML(
        model_y=GradientBoostingRegressor(),
        model_t=GradientBoostingRegressor(),
        model_final=LassoCV(fit_intercept=False),
        featurizer=PolynomialFeatures(degree=1, include_bias=True),
    ),
).fit(
    data=data["df"],
    effect_modifier_names=data["effect_modifier_names"],
)

estimate_econml = estimator.estimate_effect(
    data=data["df"],
    control_value=0,
    treatment_value=1,
    target_units="ate",
)

print(estimate_econml)
*** Causal Estimate ***

## Identified estimand
Estimand type: EstimandType.NONPARAMETRIC_ATE

## Realized estimand
b: y~v0+W1+W0+W2
Target units: ate

## Estimate
Mean value: 11.32687325588067
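
Any EconML estimator can be wrapped in the same way. A hedged sketch using LinearDML (the nuisance models here are illustrative choices, not recommendations):

from econml.dml import LinearDML
from sklearn.ensemble import GradientBoostingClassifier

estimator_linear_dml = Econml(
    identified_estimand=identified_estimand,
    econml_estimator=LinearDML(
        model_y=GradientBoostingRegressor(),
        model_t=GradientBoostingClassifier(),  # classifier, since the treatment is discrete
        discrete_treatment=True,  # the treatment in this dataset is binary
    ),
).fit(
    data=data["df"],
    effect_modifier_names=data["effect_modifier_names"],
)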

Refute Estimate - Functional API (Preview)#

[6]:
# You can call the refute_estimate function to execute several refuters with their default parameters
# Currently this function does not support sensitivity_* functions
refutation_results = refute_estimate(
    data["df"],
    identified_estimand,
    estimate,
    treatment_name=treatment_name,
    outcome_name=outcome_name,
    refuters=[refute_bootstrap, refute_data_subset],
)

for result in refutation_results:
    print(result)

# Or you can execute the refute methods directly.
# You can swap refute_bootstrap / refute_data_subset for any of the other refuters,
# adding the parameters each refuter requires (see the sketch after the output below)

bootstrap_refutation = refute_bootstrap(data["df"], identified_estimand, estimate)
print(bootstrap_refutation)

data_subset_refutation = refute_data_subset(data["df"], identified_estimand, estimate)
print(data_subset_refutation)
Refute: Bootstrap Sample Dataset
Estimated effect:11.32687325588067
New effect:11.88856219387756
p value:0.54

Refute: Use a subset of data
Estimated effect:11.32687325588067
New effect:11.66112498000895
p value:0.6599999999999999

Refute: Bootstrap Sample Dataset
Estimated effect:11.32687325588067
New effect:11.681262891992505
p value:0.7

Refute: Use a subset of data
Estimated effect:11.32687325588067
New effect:11.743625685672644
p value:0.5800000000000001
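
The other refuters listed at the top of this notebook follow the same calling pattern. A hedged sketch; the keyword names are assumptions based on the dowhy source, so check the documentation for your version:

from dowhy.causal_refuters import refute_placebo_treatment, refute_random_common_cause

# Replace the treatment with a randomly generated placebo variable
placebo_refutation = refute_placebo_treatment(
    data["df"], identified_estimand, estimate, treatment_names=treatment_name
)
print(placebo_refutation)

# Introduce a randomly generated common cause
random_common_cause_refutation = refute_random_common_cause(data["df"], identified_estimand, estimate)
print(random_common_cause_refutation)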

Backwards Compatibility#

This section replicates the same results using only the CausalModel API.

[7]:
# Create Causal Model
causal_model = CausalModel(data=data["df"], treatment=treatment_name, outcome=outcome_name, graph=data["gml_graph"])

Identify Effect#

[8]:
identified_estimand_causal_model_api = (
    causal_model.identify_effect()
)  # graph, treatment, and outcome come from the causal_model object

print(identified_estimand_causal_model_api)
Estimand type: EstimandType.NONPARAMETRIC_ATE

### Estimand : 1
Estimand name: backdoor
Estimand expression:
  d
─────(E[y|W1,W0,W2])
d[v₀]
Estimand assumption 1, Unconfoundedness: If U→{v0} and U→y then P(y|v0,W1,W0,W2,U) = P(y|v0,W1,W0,W2)

### Estimand : 2
Estimand name: iv
Estimand expression:
 ⎡                              -1⎤
 ⎢    d        ⎛    d          ⎞  ⎥
E⎢─────────(y)⋅⎜─────────([v₀])⎟  ⎥
 ⎣d[Z₁  Z₀]    ⎝d[Z₁  Z₀]      ⎠  ⎦
Estimand assumption 1, As-if-random: If U→→y then ¬(U →→{Z1,Z0})
Estimand assumption 2, Exclusion: If we remove {Z1,Z0}→{v0}, then ¬({Z1,Z0}→y)

### Estimand : 3
Estimand name: frontdoor
No such variable(s) found!

Estimate Effect#

[9]:
estimate_causal_model_api = causal_model.estimate_effect(
    identified_estimand_causal_model_api, method_name="backdoor.propensity_score_matching"
)

print(estimate_causal_model_api)
*** Causal Estimate ***

## Identified estimand
Estimand type: EstimandType.NONPARAMETRIC_ATE

### Estimand : 1
Estimand name: backdoor
Estimand expression:
  d
─────(E[y|W1,W0,W2])
d[v₀]
Estimand assumption 1, Unconfoundedness: If U→{v0} and U→y then P(y|v0,W1,W0,W2,U) = P(y|v0,W1,W0,W2)

## Realized estimand
b: y~v0+W1+W0+W2
Target units: ate

## Estimate
Mean value: 11.32687325588067

Refute Estimate#

[10]:
bootstrap_refutation_causal_model_api = causal_model.refute_estimate(
    identified_estimand_causal_model_api, estimate_causal_model_api, "bootstrap_refuter"
)
print(bootstrap_refutation_causal_model_api)

data_subset_refutation_causal_model_api = causal_model.refute_estimate(
    identified_estimand_causal_model_api, estimate_causal_model_api, "data_subset_refuter"
)

print(data_subset_refutation_causal_model_api)
Refute: Bootstrap Sample Dataset
Estimated effect:11.32687325588067
New effect:11.80905615602231
p value:0.5

Refute: Use a subset of data
Estimated effect:11.32687325588067
New effect:11.766914477757974
p value:0.6399999999999999