Example to demonstrate optimized backdoor variable search for Causal Identification

This notebook compares the runtime of causal identification using the vanilla backdoor search with that of the optimized backdoor search, and demonstrates the speedup obtained with the latter.

[1]:

import time
import random
import numpy as np
import pandas as pd
import networkx as nx

import dowhy
from dowhy import CausalModel
from dowhy.utils import graph_operations
import dowhy.datasets

Create Random Graph

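In this section, we create a random directed acyclic graph with a designated number of nodes (10 in this case). Each candidate edge is drawn with probability p = 0.5, and only edges that point from a lower-numbered node to a higher-numbered node are kept, which guarantees that the graph contains no cycles.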

[2]:

n = 10
p = 0.5

G = nx.generators.random_graphs.fast_gnp_random_graph(n, p, directed=True)
graph = nx.DiGraph([(u,v) for (u,v) in G.edges() if u<v])
nodes = []
for i in graph.nodes:
    nodes.append(str(i))
adjacency_matrix = np.asarray(nx.to_numpy_array(graph))
graph_dot = graph_operations.adjacency_matrix_to_graph(adjacency_matrix, nodes)
graph_dot = graph_operations.str_to_dot(graph_dot.source)
print("Graph Generated.")

# The DataFrame is left empty: only its column names (the node labels) are set here.
df = pd.DataFrame(columns=nodes)
print("Dataframe Generated.")

Graph Generated.
Dataframe Generated.
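
The `u<v` filter in the cell above is what makes the random graph acyclic: every kept edge points from a lower-numbered node to a higher-numbered one, so the natural node order is already a topological order. The following sketch illustrates the idea with a similar sampling scheme, using only the standard library. The helper names `random_dag_edges` and `is_acyclic` are hypothetical and are not part of DoWhy or networkx.

```python
import random
from collections import defaultdict, deque


def random_dag_edges(n, p, seed=None):
    """Sample directed edges (u, v) with u < v, keeping each with probability p."""
    rng = random.Random(seed)
    return [(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < p]


def is_acyclic(n, edges):
    """Kahn's algorithm: a graph is acyclic iff all nodes can be removed in topological order."""
    indegree = [0] * n
    children = defaultdict(list)
    for u, v in edges:
        children[u].append(v)
        indegree[v] += 1
    # Start from the nodes that have no incoming edges.
    queue = deque(node for node in range(n) if indegree[node] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    return visited == n


edges = random_dag_edges(10, 0.5, seed=0)
print(is_acyclic(10, edges))                      # True: every edge satisfies u < v
print(is_acyclic(3, [(0, 1), (1, 2), (2, 0)]))    # False: this graph contains a cycle
```

Because acyclicity follows from the edge orientation alone, the check succeeds for any values of `n`, `p`, and `seed`.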

Testing optimized backdoor search

In this section, we compare the runtimes for causal identification using vanilla backdoor search and the optimized backdoor search.

[3]:

start = time.time()

# I. Create a causal model from the data and given graph.
# Sample two distinct nodes so that treatment and outcome never coincide.
treatment, outcome = random.sample(nodes, 2)
model = CausalModel(data=df, treatment=treatment, outcome=outcome, graph=graph_dot)
time1 = time.time()
print("Time taken for initializing model =", time1-start)

# II. Identify causal effect and return target estimands
identified_estimand = model.identify_effect()
time2 = time.time()
print("Time taken for vanilla identification =", time2-time1)

# III. Identify causal effect using the optimized backdoor implementation
identified_estimand = model.identify_effect(optimize_backdoor=True)
end = time.time()
print("Time taken for optimized backdoor identification =", end-time2)

Time taken for initializing model = 0.004637241363525391
Time taken for vanilla identification = 0.00022339820861816406
Time taken for optimized backdoor identification = 0.00013709068298339844

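In this run, the optimized backdoor search completed identification in about 0.14 ms, compared with about 0.22 ms for the vanilla implementation. Because the graph, treatment, and outcome are chosen at random, the exact timings vary from run to run.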
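
A single wall-clock measurement at sub-millisecond scale is sensitive to noise. One way to make such a comparison more robust is to repeat each call several times and compare the medians, measured with `time.perf_counter`. The sketch below uses only the standard library; `median_runtime` is a hypothetical helper, and the commented-out usage assumes that the `model` object from the cell above is in scope.

```python
import statistics
import time


def median_runtime(fn, repeats=20):
    """Call fn() `repeats` times and return the median wall-clock time in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# Stand-in workload so that the sketch runs on its own.
baseline = median_runtime(lambda: sum(range(10_000)))
print(f"Median runtime: {baseline:.6f} s")

# With the notebook's `model` in scope, the comparison might look like this:
# vanilla = median_runtime(lambda: model.identify_effect())
# optimized = median_runtime(lambda: model.identify_effect(optimize_backdoor=True))
```

The median is used rather than the mean so that an occasional slow call, for example one interrupted by the operating system, does not dominate the result.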