Example to demonstrate optimized backdoor variable search for Causal Identification
This notebook compares the runtime of causal identification using the vanilla backdoor search against the optimized backdoor search, and demonstrates the speedup obtained with the latter.
[1]:
import time
import random
import numpy as np
import pandas as pd
import networkx as nx
import dowhy
from dowhy import CausalModel
from dowhy.utils import graph_operations
import dowhy.datasets
Create Random Graph
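In this section, we create a random directed graph with the designated number of nodes (10 in this case), where each possible edge is included with probability p = 0.5. Only edges from a lower-numbered node to a higher-numbered node are kept, which guarantees that the resulting graph is acyclic.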
[2]:
n = 10
p = 0.5
G = nx.generators.random_graphs.fast_gnp_random_graph(n, p, directed=True)
graph = nx.DiGraph([(u,v) for (u,v) in G.edges() if u<v])
# Use string labels, since the graph is passed to DoWhy as a DOT string.
nodes = [str(i) for i in graph.nodes]
adj_matrix = nx.to_numpy_array(graph)
# Build a graph from the adjacency matrix and convert its source to DOT format.
graph_dot = graph_operations.adjacency_matrix_to_graph(adj_matrix, nodes)
graph_dot = graph_operations.str_to_dot(graph_dot.source)
print("Graph Generated.")
df = pd.DataFrame(columns=nodes)
print("Dataframe Generated.")
Graph Generated.
Dataframe Generated.
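The edge filter `u<v` used above is what makes the random graph a DAG: every retained edge points from a lower index to a higher one, so no directed cycle can close. The following is a minimal, standard-library-only sketch of that idea. The helper names `random_dag_edges` and `is_acyclic` are hypothetical and are not part of networkx or DoWhy; the sampling only approximates what `fast_gnp_random_graph` does.

```python
import random
from graphlib import CycleError, TopologicalSorter  # Python 3.9+


def random_dag_edges(n, p, seed=None):
    """Sample each ordered pair (u, v) with probability p, keeping only u < v."""
    rng = random.Random(seed)
    return [
        (u, v)
        for u in range(n)
        for v in range(n)
        if u != v and rng.random() < p and u < v
    ]


def is_acyclic(edges):
    """Return True if the directed edge list contains no cycle."""
    predecessors = {}
    for u, v in edges:
        predecessors.setdefault(u, set())
        predecessors.setdefault(v, set()).add(u)
    try:
        # static_order() raises CycleError when no topological order exists.
        list(TopologicalSorter(predecessors).static_order())
        return True
    except CycleError:
        return False


edges = random_dag_edges(10, 0.5, seed=0)
print(is_acyclic(edges))             # True: all edges go from lower to higher index
print(is_acyclic([(0, 1), (1, 0)]))  # False: a two-node cycle
```

Note that a node with no retained edges never appears in the edge list, which mirrors how the notebook's `graph` can end up with fewer than `n` nodes.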
Testing optimized backdoor search
In this section, we compare the runtimes for causal identification using vanilla backdoor search and the optimized backdoor search.
[3]:
start = time.time()
# I. Create a causal model from the data and given graph.
# Pick two distinct nodes that are present in the graph as treatment and outcome.
treatment, outcome = random.sample(nodes, 2)
model = CausalModel(data=df, treatment=treatment, outcome=outcome, graph=graph_dot)
time1 = time.time()
print("Time taken for initializing model =", time1-start)
# II. Identify causal effect and return target estimands
identified_estimand = model.identify_effect()
time2 = time.time()
print("Time taken for vanilla identification =", time2-time1)
# III. Identify causal effect using the optimized backdoor implementation
identified_estimand = model.identify_effect(optimize_backdoor=True)
end = time.time()
print("Time taken for optimized backdoor identification =", end-time2)
Time taken for initializing model = 0.004637241363525391
Time taken for vanilla identification = 0.00022339820861816406
Time taken for optimized backdoor identification = 0.00013709068298339844
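In this run, the optimized backdoor search completed identification in about 0.14 ms, compared with about 0.22 ms for the vanilla implementation, which is roughly 1.6 times faster. Because the graph, treatment, and outcome are chosen at random, the exact timings vary between runs.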
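Timings of a fraction of a millisecond are sensitive to noise, so a single measurement per method gives only a rough comparison. One way to make the comparison steadier is to repeat each call several times and report the median. The sketch below uses only the standard library; `median_runtime` is a hypothetical helper, and the two placeholder functions stand in for the identification calls, which are not executed here.

```python
import statistics
import time


def median_runtime(fn, repeats=20):
    """Call fn() `repeats` times and return the median elapsed time in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# Placeholder workloads (assumptions for illustration only). In the notebook,
# these would be replaced by:
#   lambda: model.identify_effect()
#   lambda: model.identify_effect(optimize_backdoor=True)
def slow_placeholder():
    return sum(i * i for i in range(20000))


def fast_placeholder():
    return sum(i * i for i in range(2000))


t_slow = median_runtime(slow_placeholder)
t_fast = median_runtime(fast_placeholder)
print(f"slow: {t_slow:.6f}s, fast: {t_fast:.6f}s")
```

`time.perf_counter` is used instead of `time.time` because it is a monotonic, high-resolution clock intended for measuring short intervals.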