dowhy.utils package

Submodules

dowhy.utils.api module

dowhy.utils.api.parse_state(state)[source]

dowhy.utils.cit module

dowhy.utils.cit.compute_ci(r=None, nx=None, ny=None, confidence=0.95)[source]

Compute parametric confidence intervals around a correlation coefficient. See: https://online.stat.psu.edu/stat505/lesson/6/6.3

This is done by applying Fisher’s r-to-z transform: z = 0.5 * ln((1 + r) / (1 - r)) = arctanh(r)

The standard error in z space is 1/sqrt(N - 3), where N is the sample size.

For a two-tailed test, the critical value of the standard normal distribution is the magnitude of stats.norm.ppf((1 - confidence)/2), which is about 1.96 for confidence = 0.95.

The lower and upper confidence limits in z space are calculated with the formula z ± critical value * standard error.

The confidence interval is then converted back to r space with the inverse transform, r = tanh(z).

Parameters

r – Correlation coefficient.
nx – Length of vector x.
ny – Length of vector y.
confidence – Confidence level (0.95 = 95%).

Returns

Array containing the confidence interval.
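
The steps described for compute_ci can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the DoWhy implementation: the name fisher_ci is hypothetical, it takes a single sample size n instead of nx and ny, and it uses statistics.NormalDist in place of scipy.stats.norm.

```python
import math
from statistics import NormalDist


def fisher_ci(r, n, confidence=0.95):
    """Confidence interval for a correlation coefficient via Fisher's z."""
    z = math.atanh(r)                  # r space -> z space
    se = 1.0 / math.sqrt(n - 3)        # standard error in z space
    # Two-tailed critical value, about 1.96 for confidence = 0.95.
    crit = NormalDist().inv_cdf((1 + confidence) / 2)
    lower, upper = z - crit * se, z + crit * se
    return math.tanh(lower), math.tanh(upper)   # z space -> r space


ci_low, ci_high = fisher_ci(r=0.5, n=30)
print(round(ci_low, 3), round(ci_high, 3))
```

Because the interval is symmetric in z space and then mapped through tanh, it is generally not symmetric around r and always stays inside (-1, 1).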

dowhy.utils.cit.conditional_MI(data=None, x=None, y=None, z=None)[source]

Returns the conditional mutual information between X and Y given Z:

I(X, Y | Z) = H(X|Z) - H(X|Y,Z)
            = H(X,Z) - H(Z) - H(X,Y,Z) + H(Y,Z)
            = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)

Parameters

data – Dataset.
x, y, z – Column names from the dataset.

Returns

Conditional mutual information between X and Y given Z.
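
The entropy identity above can be checked on small discrete samples. The sketch below is illustrative only: joint_entropy and conditional_mi_sketch are hypothetical names, the inputs are plain Python sequences rather than dataframe columns, and natural logarithms are an assumption (the log base used by the library is not stated here).

```python
import math
from collections import Counter


def joint_entropy(rows):
    """Shannon entropy (in nats) of the empirical distribution over rows."""
    counts = Counter(rows)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def conditional_mi_sketch(xs, ys, zs):
    """I(X, Y | Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    h_xz = joint_entropy(list(zip(xs, zs)))
    h_yz = joint_entropy(list(zip(ys, zs)))
    h_xyz = joint_entropy(list(zip(xs, ys, zs)))
    h_z = joint_entropy(list(zs))
    return h_xz + h_yz - h_xyz - h_z


# Y is a copy of X and Z is constant, so the result equals H(X) = ln 2.
cmi_copy = conditional_mi_sketch([0, 1, 0, 1], [0, 1, 0, 1], [0, 0, 0, 0])

# X and Y are independent given Z, so the result is zero.
cmi_indep = conditional_mi_sketch([0, 0, 1, 1], [0, 1, 0, 1], [0, 0, 0, 0])
```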

dowhy.utils.cit.entropy(x)[source]

Returns the entropy of a random variable x: H(x) = - Σ p(x) log(p(x))

Parameters: x – Random variable to calculate entropy for.
Returns: Entropy of the random variable.

dowhy.utils.cit.partial_corr(data=None, x=None, y=None, z=None, method='pearson')[source]

Calculates the partial correlation, which is the degree of association between x and y after removing the effect of z. This is done by calculating the correlation coefficient between the residuals of two linear regressions: x ~ z and y ~ z.

See:

1. https://en.wikipedia.org/wiki/Partial_correlation
2. https://onlinelibrary.wiley.com/doi/pdf/10.1111/j.1467-842X.2004.00360.x?casa_token=p_D3joHC8C0AAAAA:qigIZHVfcVi8vsz1j2t7uQYOorrYaF3Tm4lpQOUzqG_J9gJgtFerOyliKBnQPVG187nJxbA-wcbXU3QcOw
3. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4681537/
4. http://parker.ad.siu.edu/Olive/slch6.pdf

Parameters

data – pandas dataframe.
x – Column name in data.
y – Column name in data.
z – String or list.
method – String denoting the correlation type: “pearson” or “spearman”.

Returns

A Python dictionary with the keys:

n – Sample size.
r – Partial correlation coefficient.
CI95 – 95% parametric confidence intervals.
p-val – p-value.
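
The residual-based definition can be illustrated in plain Python. The sketch below is not the library code: the helper names are hypothetical, it handles a single covariate z and Pearson correlation only, and it returns just the coefficient rather than the dictionary described above.

```python
import math


def residuals(target, covariate):
    """Residuals of a simple least-squares fit target ~ covariate."""
    n = len(target)
    mean_t = sum(target) / n
    mean_c = sum(covariate) / n
    sxx = sum((c - mean_c) ** 2 for c in covariate)
    sxy = sum((c - mean_c) * (t - mean_t) for c, t in zip(covariate, target))
    slope = sxy / sxx
    intercept = mean_t - slope * mean_c
    return [t - (intercept + slope * c) for t, c in zip(target, covariate)]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr_sketch(x, y, z):
    """Correlate what is left of x and y after regressing each on z."""
    return pearson(residuals(x, z), residuals(y, z))


z = [-2, -1, 0, 1, 2]
noise = [1, -2, 2, -2, 1]                        # uncorrelated with z
x = [zi + e for zi, e in zip(z, noise)]          # x = z + noise
y = [2 * zi - e for zi, e in zip(z, noise)]      # y = 2z - noise

raw_r = pearson(x, y)                  # positive, driven by the shared z
partial_r = partial_corr_sketch(x, y, z)   # close to -1 once z is removed
```

In this constructed example the marginal correlation is positive while the partial correlation is strongly negative, which is the kind of reversal that removing the effect of z is meant to expose.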

dowhy.utils.cli_helpers module

dowhy.utils.cli_helpers.query_yes_no(question, default=True)[source]

Ask a yes/no question via standard input and return the answer.

Source: https://stackoverflow.com/questions/3041986/apt-command-line-interface-like-yes-no-input

If invalid input is given, the user will be asked until they actually give valid input.

Side Effects: Blocks program execution until valid input (y/n) is given.

Parameters

question (str) – A question that is presented to the user.
default (bool|None) – The value returned when Enter is pressed with no input. When None, there is no default and the question is repeated until a valid answer is given.

Returns

A bool indicating whether user has entered yes or no.
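
The prompting behaviour described above amounts to a loop that only exits on a recognised answer. The following is a rough sketch, not the function's source: ask_yes_no is a hypothetical name, the accepted spellings and prompt hints are assumptions, and the read parameter exists only so the loop can be driven without a terminal.

```python
def ask_yes_no(question, default=True, read=input):
    """Keep asking until the reply is a recognisable yes/no answer."""
    answers = {"y": True, "yes": True, "n": False, "no": False}
    hint = {True: " [Y/n] ", False: " [y/N] ", None: " [y/n] "}[default]
    while True:
        reply = read(question + hint).strip().lower()
        if reply == "" and default is not None:
            return default          # Enter pressed: fall back to the default
        if reply in answers:
            return answers[reply]
        # Anything else is invalid input, so the loop asks again.


# Scripted replies stand in for a user: "maybe" and "" are rejected
# because default is None, and "no" finally ends the loop.
scripted = iter(["maybe", "", "no"])
answer = ask_yes_no("Overwrite file?", default=None, read=lambda prompt: next(scripted))
```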

dowhy.utils.dgp module

class dowhy.utils.dgp.DataGeneratingProcess(**kwargs)[source]

Bases: object

Base class for implementation of data generating process.

Subclasses implement functions that create various data generating processes. All data generating processes are in the package “dowhy.utils.dgps”.

DEFAULT_PERCENTILE = 0.9

convert_to_binary(data, deterministic=False)[source]

generate_data()[source]

generation_process()[source]

dowhy.utils.graph_operations module

dowhy.utils.graph_operations.add_edge(i, j, g)[source]

Adds an edge i –> j to the graph, g. The edge is only added if this addition does NOT cause the graph to have cycles.
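
One common way to enforce the no-cycle condition is to refuse the edge i –> j whenever j can already reach i. The sketch below assumes a plain dictionary of successor sets rather than whatever graph object g actually is, and the function names are hypothetical.

```python
def has_path(graph, start, goal):
    """Depth-first search: is goal reachable from start along directed edges?"""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return False


def add_edge_if_acyclic(graph, i, j):
    """Add i -> j unless j already reaches i, which would close a cycle."""
    if has_path(graph, j, i):
        return False
    graph.setdefault(i, set()).add(j)
    graph.setdefault(j, set())
    return True


dag = {0: {1}, 1: {2}, 2: set()}
added_forward = add_edge_if_acyclic(dag, 0, 2)   # accepted: still acyclic
added_back = add_edge_if_acyclic(dag, 2, 0)      # rejected: 0 -> 1 -> 2 -> 0
```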

dowhy.utils.graph_operations.adjacency_matrix_to_adjacency_list(adjacency_matrix, labels=None)[source]

Convert the adjacency matrix of a graph to an adjacency list.

Parameters

adjacency_matrix – A numpy array representing the graph adjacency matrix.
labels – List of labels.

Returns

Adjacency list as a dictionary.
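
As an illustration of the conversion, the sketch below builds such a dictionary from a nested list. It assumes that a non-zero entry at row i, column j means an edge from node i to node j, and that missing labels default to the row indices; neither convention is confirmed here, and the function name is hypothetical.

```python
def matrix_to_adjacency_list(adjacency_matrix, labels=None):
    """Map each node label to the labels of the nodes it points to."""
    n = len(adjacency_matrix)
    if labels is None:
        labels = list(range(n))      # assumed default: use indices as labels
    return {
        labels[i]: [labels[j] for j in range(n) if adjacency_matrix[i][j]]
        for i in range(n)
    }


matrix = [
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]
adjacency = matrix_to_adjacency_list(matrix, labels=["X", "Y", "Z"])
print(adjacency)   # {'X': ['Y', 'Z'], 'Y': ['Z'], 'Z': []}
```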

dowhy.utils.graph_operations.adjacency_matrix_to_graph(adjacency_matrix, labels=None)[source]

Convert a given graph adjacency matrix to DOT format.

Parameters

adjacency_matrix – A numpy array representing the graph adjacency matrix.
labels – List of labels.

Returns

Graph in DOT format.

dowhy.utils.graph_operations.convert_to_undirected_graph(g)[source]

dowhy.utils.graph_operations.daggity_to_dot(daggity_string)[source]

Converts the input daggity_string to valid DOT graph format.

Parameters: daggity_string – Output graph from Daggity site
Returns: DOT string

dowhy.utils.graph_operations.del_edge(i, j, g)[source]

Deletes the edge i –> j in the graph, g. The edge is only deleted if this removal does NOT cause the graph to be disconnected.

dowhy.utils.graph_operations.find_ancestor(node_set, node_names, adjacency_matrix, node2idx, idx2node)[source]

Finds ancestors of a given set of nodes in a given graph.

Parameters

node_set – Set of nodes whose ancestors must be obtained.
node_names – Name of all nodes in the graph.
adjacency_matrix – Graph adjacency matrix.
node2idx – A dictionary mapping node names to their row or column index in the adjacency matrix.
idx2node – A dictionary mapping the row or column indices in the adjacency matrix to the corresponding node names.

Returns

OrderedSet containing ancestors of all nodes in the node_set.
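
A backward traversal over the adjacency matrix is one way to collect ancestors. The sketch below makes several assumptions that are not confirmed by the text above: an entry at row i, column j marks an edge i –> j, every node counts as its own ancestor, and a plain list stands in for the OrderedSet return value. The node_names argument is omitted because the sketch does not need it.

```python
def ancestors_sketch(node_set, adjacency_matrix, node2idx, idx2node):
    """Collect node_set plus every node with a directed path into it."""
    n = len(adjacency_matrix)
    found = list(node_set)                      # list keeps insertion order
    frontier = [node2idx[name] for name in node_set]
    while frontier:
        child = frontier.pop()
        for parent in range(n):
            name = idx2node[parent]
            if adjacency_matrix[parent][child] and name not in found:
                found.append(name)
                frontier.append(parent)
    return found


# A -> B -> C, with D isolated.
names = ["A", "B", "C", "D"]
node2idx = {name: i for i, name in enumerate(names)}
idx2node = {i: name for i, name in enumerate(names)}
matrix = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
print(ancestors_sketch(["C"], matrix, node2idx, idx2node))   # ['C', 'B', 'A']
```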

dowhy.utils.graph_operations.find_c_components(adjacency_matrix, node_set, idx2node)[source]

Obtain C-components in a graph.

Parameters

adjacency_matrix – Graph adjacency matrix.
node_set – Set of nodes whose ancestors must be obtained.
idx2node – A dictionary mapping the row or column indices in the adjacency matrix to the corresponding node names.

Returns

List of C-components in the graph.

dowhy.utils.graph_operations.find_predecessor(i, j, g)[source]

Finds a predecessor, k, in the path between two nodes, i and j, in the graph, g.

dowhy.utils.graph_operations.get_random_node_pair(n)[source]

Randomly generates a pair of nodes.

dowhy.utils.graph_operations.get_simple_ordered_tree(n)[source]

Generates a simple-ordered tree. The tree is just a directed acyclic graph of n nodes with the structure 0 –> 1 –> …. –> n.

dowhy.utils.graph_operations.induced_graph(node_set, adjacency_matrix, node2idx)[source]

Obtains the induced graph corresponding to a subset of nodes.

Parameters

node_set – Set of nodes whose ancestors must be obtained.
adjacency_matrix – Graph adjacency matrix.
node2idx – A dictionary mapping node names to their row or column index in the adjacency matrix.

Returns

Numpy array representing the adjacency matrix of the induced graph.
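
In the usual graph-theoretic sense, an induced graph keeps only the selected nodes and the edges between them, which for an adjacency matrix means keeping the matching rows and columns. The sketch below shows that idea on nested lists; the function name is hypothetical, and ordering the kept nodes by their original index is an assumption.

```python
def induced_adjacency(node_set, adjacency_matrix, node2idx):
    """Keep only the rows and columns that belong to node_set."""
    keep = sorted(node2idx[name] for name in node_set)
    return [[adjacency_matrix[i][j] for j in keep] for i in keep]


# A -> B, A -> C, B -> C
node2idx = {"A": 0, "B": 1, "C": 2}
matrix = [
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]
print(induced_adjacency({"A", "C"}, matrix, node2idx))   # [[0, 1], [0, 0]]
```

With a NumPy array, the same selection can be written as adjacency_matrix[numpy.ix_(keep, keep)].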

dowhy.utils.graph_operations.is_connected(g)[source]

Checks if the directed acyclic graph is connected.

dowhy.utils.graph_operations.str_to_dot(string)[source]

Converts input string from graphviz library to valid DOT graph format.

Parameters: string – Graph in DOT format.
Returns: DOT string converted to a suitable format for the DoWhy library.

dowhy.utils.ordered_set module

class dowhy.utils.ordered_set.OrderedSet(elements=None)[source]

Bases: object

Python class for ordered set. Code taken from https://github.com/buyalsky/ordered-hash-set/tree/5198b23e01faeac3f5398ab2c08cb013d14b3702.

add(element)[source]

Function to add an element to the set if it does not exist.

Parameters: element – element to be added.

difference(other_set)[source]

Function to remove elements in self._set which are also present in other_set.

Parameters: other_set – The set to obtain difference with. Can be a list, set or OrderedSet.
Returns: New OrderedSet representing the difference of elements in the self._set and other_set.

get_all()[source]

Function to return list of all elements in the set.

Returns: List of all items in the set.

intersection(other_set)[source]

Function to compute the intersection of self._set and other_set.

Parameters: other_set – The set to obtain intersection with. Can be a list, set or OrderedSet.
Returns: New OrderedSet representing the set with elements common to the OrderedSet object and other_set.

is_empty()[source]

Function to determine if the set is empty or not.

Returns: True if the set is empty, False otherwise.

union(other_set)[source]

Function to compute the union of self._set and other_set.

Parameters: other_set – The set to obtain union with. Can be a list, set or OrderedSet.
Returns: New OrderedSet representing the set with elements from the OrderedSet object and other_set.
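
To make the ordering behaviour of these methods concrete, here is a small stand-in class. It is not the OrderedSet source: TinyOrderedSet is a hypothetical name, it relies on dict keys keeping insertion order, and it assumes that results follow the order of the left-hand set.

```python
class TinyOrderedSet:
    """Insertion-ordered set backed by a dict, whose keys keep insertion order."""

    def __init__(self, elements=None):
        self._items = dict.fromkeys(elements or [])

    def add(self, element):
        self._items.setdefault(element, None)   # no effect if already present

    def get_all(self):
        return list(self._items)

    def is_empty(self):
        return not self._items

    @staticmethod
    def _elements(other):
        # Accept a list, a set, or another TinyOrderedSet.
        return other.get_all() if isinstance(other, TinyOrderedSet) else list(other)

    def union(self, other):
        return TinyOrderedSet(self.get_all() + self._elements(other))

    def intersection(self, other):
        common = set(self._elements(other))
        return TinyOrderedSet(e for e in self._items if e in common)

    def difference(self, other):
        removed = set(self._elements(other))
        return TinyOrderedSet(e for e in self._items if e not in removed)


s = TinyOrderedSet(["c", "a", "b"])
s.add("a")                                 # already present, order unchanged
print(s.union(["d", "a"]).get_all())       # ['c', 'a', 'b', 'd']
print(s.difference(["a"]).get_all())       # ['c', 'b']
```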

dowhy.utils.propensity_score module

dowhy.utils.propensity_score.binarize_discrete(data, covariates, variable_types)[source]

dowhy.utils.propensity_score.binary_treatment_model(data, covariates, treatment, variable_types)[source]

dowhy.utils.propensity_score.categorical_treatment_model(data, covariates, treatment, variable_types)[source]

dowhy.utils.propensity_score.continuous_treatment_model(data, covariates, treatment, variable_types)[source]

dowhy.utils.propensity_score.discrete_to_integer(discrete)[source]

dowhy.utils.propensity_score.get_type_string(variables, variable_types)[source]

dowhy.utils.propensity_score.propensity_of_treatment_score(data, covariates, treatment, model='logistic', variable_types=None)[source]

dowhy.utils.propensity_score.state_propensity_score(data, covariates, treatments, variable_types=None)[source]

dowhy.utils.regression module

dowhy.utils.regression.create_polynomial_function(max_degree)[source]

Creates a list of polynomial functions

Parameters: max_degree – degree of the polynomial function to be created
Returns: list of lambda functions
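
Building a list of lambdas in a loop has a well-known pitfall: each lambda must capture its own degree. The sketch below shows one way to do that; the function name is hypothetical, and starting the degrees at 1 is an assumption, since the text does not say which degrees the returned list covers.

```python
def polynomial_functions(max_degree):
    """Return [x -> x**1, ..., x -> x**max_degree]."""
    # `d=d` freezes the current degree in each lambda. A bare
    # `lambda x: x ** d` would make every function use the last value of d.
    return [lambda x, d=d: x ** d for d in range(1, max_degree + 1)]


powers = [f(2) for f in polynomial_functions(3)]
print(powers)   # [2, 4, 8]
```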

dowhy.utils.regression.generate_moment_function(W, g)[source]

Generates and returns the moment function m(W,g) = g(1,W) - g(0,W) for the Average Causal Effect.
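
The formula contrasts the regression function evaluated with treatment set to 1 and to 0 at the same covariates. The sketch below works through it for a made-up outcome model; moment_sketch and outcome_model are hypothetical, and passing treatment and covariates as separate arguments is an assumption about how g is called.

```python
def moment_sketch(w, g):
    """m(W, g) = g(1, W) - g(0, W) for one unit with covariates w."""
    return g(1, w) - g(0, w)


def outcome_model(treatment, w):
    # Hypothetical regression function: the treatment effect is 2 + 0.5 * w.
    return 1.0 + 3.0 * w + treatment * (2.0 + 0.5 * w)


units = [0.0, 2.0, 4.0]
moments = [moment_sketch(w, outcome_model) for w in units]   # [2.0, 3.0, 4.0]
average_effect = sum(moments) / len(moments)                 # 3.0
```

Averaging the per-unit moments gives the Average Causal Effect implied by the model, here the mean of 2 + 0.5 * w over the three units.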

dowhy.utils.regression.get_generic_regressor(cv, X, Y, max_degree=3, estimator_list=None, estimator_param_list=None, numeric_features=None)[source]

Finds the best estimator for regression function (g_s)

Parameters

cv – training and testing data indices obtained after K-folding the dataset
X – regressors data for training the regression model
Y – outcome data for training the regression model
max_degree – degree of the polynomial function used to approximate the regression function
estimator_list – list of estimator objects for finding the regression function
estimator_param_list – list of dictionaries with parameters for tuning respective estimators in estimator_list
numeric_features – list of indices of numeric features in the dataset

Returns

estimator for Riesz Regression function

dowhy.utils.regression.get_numeric_features(X)[source]

Finds the numeric feature columns in a dataset

Parameters: X – pandas dataframe
Returns: list of indices of numeric features
