dowhy.utils package
Submodules
dowhy.utils.api module
dowhy.utils.cit module
- dowhy.utils.cit.compute_ci(r=None, nx=None, ny=None, confidence=0.95)
Compute parametric confidence intervals around a correlation coefficient. See: https://online.stat.psu.edu/stat505/lesson/6/6.3
This is done by applying Fisher’s r-to-z transform $z = \tfrac{1}{2}\ln\frac{1+r}{1-r} = \operatorname{arctanh}(r)$.
The standard error in z space is $1/\sqrt{N-3}$, where N is the sample size.
For a two-tailed interval with $\alpha = 1 - \text{confidence}$, the critical value is the $(1-\alpha/2)$ quantile of the standard normal distribution (about 1.96 for 95%).
The lower and upper confidence limits in z space are $z \pm \text{critical value} \times \text{standard error}$.
The interval is then converted back to r space with $r = \tanh(z)$.
- Parameters
r – correlation coefficient.
nx – length of vector x.
ny – length of vector y.
confidence – confidence level (0.95 = 95%).
- Returns
Array containing the confidence interval.
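The steps above can be sketched with the standard library alone. This is a minimal illustration, not DoWhy's code: `fisher_z_ci` is a hypothetical helper that takes a single sample size `n`, since how `compute_ci` combines `nx` and `ny` is not stated here.

```python
import math
from statistics import NormalDist


def fisher_z_ci(r, n, confidence=0.95):
    """Two-sided parametric CI for a correlation coefficient via Fisher's z.

    Hypothetical helper for illustration; `n` is the sample size.
    """
    z = math.atanh(r)                              # r -> z (Fisher transform)
    se = 1.0 / math.sqrt(n - 3)                    # standard error in z space
    alpha = 1.0 - confidence
    crit = NormalDist().inv_cdf(1.0 - alpha / 2)   # about 1.96 for 95%
    lower, upper = z - crit * se, z + crit * se
    return [math.tanh(lower), math.tanh(upper)]    # back-transform to r space


print(fisher_z_ci(0.5, 103))  # roughly [0.339, 0.632]
```

The interval is symmetric in z space but not in r space, which is why the back-transform is needed.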
- dowhy.utils.cit.conditional_MI(data=None, x=None, y=None, z=None)
Method to return the conditional mutual information between X and Y given Z:
$$I(X, Y \mid Z) = H(X \mid Z) - H(X \mid Y, Z) = H(X,Z) - H(Z) - H(X,Y,Z) + H(Y,Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)$$
- Parameters
data – dataset.
x, y, z – column names from the dataset.
- Returns
Conditional mutual information between X and Y given Z.
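The decomposition above can be checked on small discrete data. In this sketch, assumed for illustration, `data` is a plain dict mapping column names to equal-length lists, entropies use the natural log, and `joint_entropy` and `conditional_mi` are hypothetical helpers rather than DoWhy functions.

```python
import math
from collections import Counter


def joint_entropy(data, columns):
    """Plug-in entropy (natural log) of the joint distribution of `columns`."""
    rows = list(zip(*(data[c] for c in columns)))  # one tuple per sample
    n = len(rows)
    return -sum((k / n) * math.log(k / n) for k in Counter(rows).values())


def conditional_mi(data, x, y, z):
    """I(X, Y | Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z) for discrete columns."""
    return (joint_entropy(data, [x, z]) + joint_entropy(data, [y, z])
            - joint_entropy(data, [x, y, z]) - joint_entropy(data, [z]))


data = {
    "x": [0, 0, 1, 1, 0, 0, 1, 1],
    "y": [0, 0, 1, 1, 0, 0, 1, 1],  # y is a copy of x
    "z": [0, 1, 0, 1, 0, 1, 0, 1],  # z carries no information about x
}
print(conditional_mi(data, "x", "y", "z"))  # ln 2, about 0.693
```

Because y copies x and z is unrelated to both, the result equals H(X) = ln 2.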
- dowhy.utils.cit.entropy(x)
Returns the entropy of a random variable x: $H(x) = -\sum p(x)\log p(x)$.
- Parameters
x – random variable to calculate entropy for.
- Returns
Entropy of the random variable.
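A plug-in estimate of this formula counts how often each value occurs. The sketch below assumes discrete observations in a list and a natural-log default; DoWhy's choice of log base is not stated here, so `plugin_entropy` and its `base` parameter are hypothetical.

```python
import math
from collections import Counter


def plugin_entropy(values, base=math.e):
    """Plug-in estimate of H(x) = -sum p(x) log p(x) from observed values."""
    n = len(values)
    probs = [count / n for count in Counter(values).values()]  # empirical p(x)
    # Written as sum p * log(1/p), which equals -sum p * log(p) but avoids -0.0.
    return sum(p * math.log(1 / p, base) for p in probs)


print(plugin_entropy(["H", "T", "H", "T"]))          # about 0.693 nats (ln 2)
print(plugin_entropy(["H", "T", "H", "T"], base=2))  # 1.0 bit
print(plugin_entropy(["a", "a", "a"]))               # 0.0 for a constant variable
```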
- dowhy.utils.cit.partial_corr(data=None, x=None, y=None, z=None, method='pearson')
Calculate the partial correlation, which is the degree of association between x and y after removing the effect of z. This is done by calculating the correlation coefficient between the residuals of two linear regressions: $x \sim z$ and $y \sim z$. See: https://en.wikipedia.org/wiki/Partial_correlation
- Parameters
data – pandas DataFrame.
x – column name in data.
y – column name in data.
z – column name (string) or list of column names in data.
method – string denoting the correlation type: “pearson” or “spearman”.
- Returns
A Python dictionary with the keys:
n – sample size
r – partial correlation coefficient
CI95 – 95% parametric confidence interval
p-val – p-value
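The residual-based definition can be sketched with NumPy. This covers only the Pearson case and takes arrays rather than a DataFrame; `partial_corr_pearson` is a hypothetical name, and a Spearman variant would typically rank-transform x, y and z before the same steps.

```python
import numpy as np


def partial_corr_pearson(x, y, z):
    """Pearson partial correlation of x and y given z via regression residuals.

    x, y: 1-D arrays of length n; z: array of shape (n,) or (n, k).
    """
    z = np.asarray(z, dtype=float).reshape(len(x), -1)
    design = np.column_stack([np.ones(len(x)), z])      # intercept + covariates
    beta_x, *_ = np.linalg.lstsq(design, x, rcond=None)  # fit x ~ z
    beta_y, *_ = np.linalg.lstsq(design, y, rcond=None)  # fit y ~ z
    res_x = x - design @ beta_x
    res_y = y - design @ beta_y
    return np.corrcoef(res_x, res_y)[0, 1]               # corr of the residuals


rng = np.random.default_rng(0)
z = rng.normal(size=2000)
x = z + rng.normal(size=2000)
y = z + rng.normal(size=2000)
print(np.corrcoef(x, y)[0, 1])        # clearly positive: x and y share the cause z
print(partial_corr_pearson(x, y, z))  # close to 0 once z is regressed out
```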
dowhy.utils.cli_helpers module
- dowhy.utils.cli_helpers.query_yes_no(question, default=True)
Ask a yes/no question via standard input and return the answer.
Source: https://stackoverflow.com/questions/3041986/apt-command-line-interface-like-yes-no-input
If invalid input is given, the user is asked again until they give valid input.
Side effects: blocks program execution until valid input (y/n) is given.
- Parameters
question (str) – A question that is presented to the user.
default (bool|None) – The default value used when Enter is pressed with no input. When None, there is no default value and the query loops until the user answers.
- Returns
A bool indicating whether the user answered yes (True) or no (False).
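The prompt loop described above might look like the following sketch. `ask_yes_no` and its `read` parameter are hypothetical (the injectable reader only makes the loop testable), and the accepted answers are an assumption based on the usual recipe, not a copy of DoWhy's code.

```python
def ask_yes_no(question, default=True, read=input):
    """Prompt until a valid yes/no answer is given; `read` is injectable for testing."""
    valid = {"yes": True, "y": True, "no": False, "n": False}
    if default is None:
        prompt = " [y/n] "
    elif default:
        prompt = " [Y/n] "
    else:
        prompt = " [y/N] "
    while True:
        choice = read(question + prompt).strip().lower()
        if default is not None and choice == "":
            return default                  # bare Enter selects the default
        if choice in valid:
            return valid[choice]
        print("Please respond with 'yes' or 'no' (or 'y' or 'n').")


answers = iter(["maybe", "Y"])
print(ask_yes_no("Continue?", read=lambda _prompt: next(answers)))  # True
```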
dowhy.utils.dgp module
- class dowhy.utils.dgp.DataGeneratingProcess(**kwargs)
Bases: object
Base class for implementations of data generating processes.
Subclasses implement functions that create various data generating processes. All data generating processes are in the package “dowhy.utils.dgps”.
- DEFAULT_PERCENTILE = 0.9
dowhy.utils.graph_operations module
- dowhy.utils.graph_operations.add_edge(i, j, g)
Adds an edge i -> j to the graph g. The edge is only added if this addition does NOT cause the graph to have cycles.
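One way to enforce the acyclicity rule is to reject i -> j whenever j can already reach i. The sketch assumes, for illustration, a graph stored as a dict mapping each node to a set of children; DoWhy's `g` may be a different graph type, and `has_path` and `add_edge_if_acyclic` are hypothetical names.

```python
def has_path(graph, start, goal):
    """Depth-first search: is `goal` reachable from `start` along directed edges?"""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return False


def add_edge_if_acyclic(i, j, graph):
    """Add i -> j unless j already reaches i (which would close a cycle)."""
    if has_path(graph, j, i):
        return False                        # edge rejected
    graph.setdefault(i, set()).add(j)
    graph.setdefault(j, set())
    return True


g = {}
print(add_edge_if_acyclic(0, 1, g))  # True
print(add_edge_if_acyclic(1, 2, g))  # True
print(add_edge_if_acyclic(2, 0, g))  # False: 0 -> 1 -> 2 -> 0 would be a cycle
```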
- dowhy.utils.graph_operations.adjacency_matrix_to_adjacency_list(adjacency_matrix, labels=None)
Convert the adjacency matrix of a graph to an adjacency list.
- Parameters
adjacency_matrix – A numpy array representing the graph adjacency matrix.
labels – List of node labels.
- Returns
Adjacency list as a dictionary.
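A sketch of the conversion, assuming that a nonzero entry `adjacency_matrix[i][j]` encodes an edge i -> j and that the dictionary maps each label to its children; the exact output layout of DoWhy's function may differ, and `matrix_to_adjacency_list` is a hypothetical name.

```python
def matrix_to_adjacency_list(adjacency_matrix, labels=None):
    """Map each node label to the labels of its children.

    Assumes adjacency_matrix[i][j] != 0 encodes a directed edge i -> j.
    """
    n = len(adjacency_matrix)
    labels = list(labels) if labels is not None else list(range(n))
    return {
        labels[i]: [labels[j] for j in range(n) if adjacency_matrix[i][j]]
        for i in range(n)
    }


matrix = [[0, 1, 1],
          [0, 0, 1],
          [0, 0, 0]]
print(matrix_to_adjacency_list(matrix, labels=["A", "B", "C"]))
# {'A': ['B', 'C'], 'B': ['C'], 'C': []}
```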
- dowhy.utils.graph_operations.adjacency_matrix_to_graph(adjacency_matrix, labels=None)
Convert a given graph adjacency matrix to DOT format.
- Parameters
adjacency_matrix – A numpy array representing the graph adjacency matrix.
labels – List of node labels.
- Returns
Graph in DOT format.
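A DOT rendering under the same assumption about edge direction (nonzero `adjacency_matrix[i][j]` means i -> j) might look like this; `matrix_to_dot` is hypothetical, and it declares every node explicitly so that isolated nodes still appear in the output.

```python
def matrix_to_dot(adjacency_matrix, labels=None):
    """Render a directed adjacency matrix as a DOT digraph string."""
    n = len(adjacency_matrix)
    labels = list(labels) if labels is not None else [str(i) for i in range(n)]
    nodes = [f'  "{label}";' for label in labels]          # declare every node
    edges = [
        f'  "{labels[i]}" -> "{labels[j]}";'
        for i in range(n)
        for j in range(n)
        if adjacency_matrix[i][j]                          # nonzero means i -> j
    ]
    return "digraph {\n" + "\n".join(nodes + edges) + "\n}"


print(matrix_to_dot([[0, 1], [0, 0]], labels=["X", "Y"]))
```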
- dowhy.utils.graph_operations.daggity_to_dot(daggity_string)
Converts the input daggity_string to a valid DOT graph string.
- Parameters
daggity_string – Graph string exported from the DAGitty website.
- Returns
DOT string.
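DAGitty exports graphs as `dag { ... }` with one statement per line, whereas DOT expects `digraph { ... }` with `;`-separated statements. The sketch below performs only that rewrite for plain edge statements; it does not translate DAGitty node annotations such as `[exposure]`, and `dagitty_to_dot_sketch` is a hypothetical name, not DoWhy's implementation.

```python
import re


def dagitty_to_dot_sketch(dagitty_string):
    # Keep non-empty lines and join them with DOT statement separators.
    lines = [line.strip() for line in dagitty_string.strip().splitlines() if line.strip()]
    dot = "; ".join(lines)
    # DAGitty graphs start with "dag"; DOT directed graphs start with "digraph".
    dot = re.sub(r"^dag\b", "digraph", dot)
    # Drop the separators that joining placed right after "{" and before "}".
    return dot.replace("{;", "{").replace("; }", " }")


print(dagitty_to_dot_sketch("dag {\nX -> Y\nZ -> X\n}"))
# digraph { X -> Y; Z -> X }
```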
- dowhy.utils.graph_operations.del_edge(i, j, g)
Deletes the edge i -> j in the graph g. The edge is only deleted if this removal does NOT cause the graph to be disconnected.
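Assuming "disconnected" means the graph falls apart when edge directions are ignored (weak connectivity), a delete-then-check approach works. The dict-of-sets representation and both helper names below are assumptions for illustration.

```python
def is_weakly_connected(graph):
    """True if the graph is connected when edge directions are ignored."""
    nodes = set(graph)
    for children in graph.values():
        nodes |= children
    if not nodes:
        return True
    undirected = {v: set() for v in nodes}
    for u, children in graph.items():
        for v in children:
            undirected[u].add(v)
            undirected[v].add(u)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for neighbor in undirected[stack.pop()]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen == nodes


def del_edge_if_connected(i, j, graph):
    """Remove i -> j unless doing so disconnects the graph."""
    graph[i].discard(j)
    if is_weakly_connected(graph):
        return True
    graph[i].add(j)                         # undo: removal would disconnect
    return False


g = {0: {1, 2}, 1: {2}, 2: set()}
print(del_edge_if_connected(0, 2, g))  # True: 0 -> 1 -> 2 still links everything
print(del_edge_if_connected(1, 2, g))  # False: node 2 would be cut off
```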
- dowhy.utils.graph_operations.find_ancestor(node_set, node_names, adjacency_matrix, node2idx, idx2node)
Finds ancestors of a given set of nodes in a given graph.
- Parameters
node_set – Set of nodes whose ancestors must be obtained.
node_names – Names of all nodes in the graph.
adjacency_matrix – Graph adjacency matrix.
node2idx – A dictionary mapping node names to their row or column index in the adjacency matrix.
idx2node – A dictionary mapping the row or column indices in the adjacency matrix to the corresponding node names.
- Returns
OrderedSet containing ancestors of all nodes in the node_set.
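Ancestors can be collected by walking edges backwards from each node in `node_set`. This sketch assumes `adjacency_matrix[i][j] != 0` means node_names[i] -> node_names[j], returns a plain list in `node_names` order instead of an `OrderedSet`, and counts each node as its own ancestor, a common convention in identification algorithms; whether DoWhy does the same is not stated here, and `find_ancestors` is a hypothetical name.

```python
def find_ancestors(node_set, node_names, adjacency_matrix):
    """Return node_set plus all of its ancestors, in node_names order."""
    node2idx = {name: idx for idx, name in enumerate(node_names)}
    found = set(node_set)
    stack = list(node_set)
    while stack:
        child = node2idx[stack.pop()]
        for parent_idx, name in enumerate(node_names):
            if adjacency_matrix[parent_idx][child] and name not in found:
                found.add(name)             # new ancestor discovered
                stack.append(name)
    return [name for name in node_names if name in found]


names = ["U", "X", "Y", "W"]
adj = [[0, 1, 0, 0],   # U -> X
       [0, 0, 1, 0],   # X -> Y
       [0, 0, 0, 0],
       [0, 0, 1, 0]]   # W -> Y
print(find_ancestors({"X"}, names, adj))  # ['U', 'X']
print(find_ancestors({"Y"}, names, adj))  # ['U', 'X', 'Y', 'W']
```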
- dowhy.utils.graph_operations.find_c_components(adjacency_matrix, node_set, idx2node)
Obtain C-components in a graph.
- Parameters
adjacency_matrix – Graph adjacency matrix.
node_set – Set of nodes to partition into C-components.
idx2node – A dictionary mapping the row or column indices in the adjacency matrix to the corresponding node names.
- Returns
List of C-components in the graph.
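C-components are the groups of nodes connected by bidirected (latent-confounder) edges. How the adjacency matrix encodes such edges is not stated here, so this union-find sketch takes them as an explicit list of pairs; `c_components` and that input format are hypothetical.

```python
def c_components(node_set, bidirected_edges):
    """Partition node_set into C-components: groups linked by bidirected paths.

    bidirected_edges: iterable of (a, b) pairs meaning a <-> b.
    """
    parent = {v: v for v in node_set}       # union-find forest

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    for a, b in bidirected_edges:
        if a in parent and b in parent:     # ignore edges leaving node_set
            parent[find(a)] = find(b)

    groups = {}
    for v in node_set:
        groups.setdefault(find(v), set()).add(v)
    return list(groups.values())


comps = c_components({"X", "Y", "Z", "M"}, [("X", "Y"), ("Y", "Z")])
print(sorted(sorted(c) for c in comps))  # [['M'], ['X', 'Y', 'Z']]
```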
- dowhy.utils.graph_operations.find_predecessor(i, j, g)
Finds a predecessor, k, in the path between two nodes, i and j, in the graph, g.
- dowhy.utils.graph_operations.get_simple_ordered_tree(n)
Generates a simple ordered tree. The tree is just a directed acyclic graph of n nodes with the structure 0 -> 1 -> ... -> n.
- dowhy.utils.graph_operations.induced_graph(node_set, adjacency_matrix, node2idx)
Obtains the induced subgraph corresponding to a subset of nodes.
- Parameters
node_set – Set of nodes that induce the subgraph.
adjacency_matrix – Graph adjacency matrix.
node2idx – A dictionary mapping node names to their row or column index in the adjacency matrix.
- Returns
Numpy array representing the adjacency matrix of the induced graph.
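Restricting an adjacency matrix to a node subset amounts to selecting the matching rows and columns, for example with `numpy.ix_`. `induced_subgraph` is a hypothetical name; the sketch keeps the original index order of the selected nodes.

```python
import numpy as np


def induced_subgraph(node_set, adjacency_matrix, node2idx):
    """Adjacency matrix restricted to the rows and columns of node_set."""
    idx = sorted(node2idx[name] for name in node_set)  # keep original order
    return np.asarray(adjacency_matrix)[np.ix_(idx, idx)]


node2idx = {"A": 0, "B": 1, "C": 2}
adj = np.array([[0, 1, 1],
                [0, 0, 1],
                [0, 0, 0]])
print(induced_subgraph({"A", "C"}, adj, node2idx))  # keeps only the edge A -> C
```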
dowhy.utils.ordered_set module
- class dowhy.utils.ordered_set.OrderedSet(elements=None)
Bases: object
Python class for an ordered set. Code taken from https://github.com/buyalsky/ordered-hash-set/tree/5198b23e01faeac3f5398ab2c08cb013d14b3702.
- add(element)
Adds an element to the set if it is not already present.
- Parameters
element – element to be added.
- difference(other_set)
Computes the elements of self._set that are not present in other_set.
- Parameters
other_set – The set to obtain difference with. Can be a list, set or OrderedSet.
- Returns
New OrderedSet representing the difference of elements in the self._set and other_set.
- get_all()
Returns a list of all elements in the set.
- Returns
List of all items in the set.
- intersection(other_set)
Computes the intersection of self._set and other_set.
- Parameters
other_set – The set to obtain intersection with. Can be a list, set or OrderedSet.
- Returns
New OrderedSet representing the set with elements common to the OrderedSet object and other_set.
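Since Python 3.7, plain dicts preserve insertion order, so a compact ordered set can be built on one. This sketch mirrors the method names above but is not the implementation DoWhy copied from ordered-hash-set; `OrderedSetSketch` is hypothetical.

```python
class OrderedSetSketch:
    """Insertion-ordered set backed by a dict (dicts keep insertion order in 3.7+)."""

    def __init__(self, elements=None):
        self._set = dict.fromkeys(elements or [])

    @staticmethod
    def _as_set(other_set):
        # Accept a list, set or OrderedSetSketch, as the methods above do.
        if isinstance(other_set, OrderedSetSketch):
            return set(other_set.get_all())
        return set(other_set)

    def add(self, element):
        self._set[element] = None           # no-op if already present

    def get_all(self):
        return list(self._set)

    def difference(self, other_set):
        other = self._as_set(other_set)
        return OrderedSetSketch(e for e in self._set if e not in other)

    def intersection(self, other_set):
        other = self._as_set(other_set)
        return OrderedSetSketch(e for e in self._set if e in other)


s = OrderedSetSketch(["c", "a", "b"])
s.add("a")
s.add("d")
print(s.get_all())                               # ['c', 'a', 'b', 'd']
print(s.difference(["a"]).get_all())             # ['c', 'b', 'd']
print(s.intersection({"d", "c"}).get_all())      # ['c', 'd']
```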
dowhy.utils.propensity_score module
- dowhy.utils.propensity_score.binary_treatment_model(data, covariates, treatment, variable_types)
- dowhy.utils.propensity_score.categorical_treatment_model(data, covariates, treatment, variable_types)
- dowhy.utils.propensity_score.continuous_treatment_model(data, covariates, treatment, variable_types)
dowhy.utils.regression module
- dowhy.utils.regression.create_polynomial_function(max_degree)
Creates a list of polynomial functions.
- Parameters
max_degree – maximum degree of the polynomial functions to be created.
- Returns
List of lambda functions.
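A list of monomials can be built with lambdas, provided each lambda binds its own degree. The sketch assumes degrees 0 through max_degree inclusive, which the description above does not specify; `create_polynomial_functions` is a hypothetical name.

```python
def create_polynomial_functions(max_degree):
    """Return callables computing x**0, x**1, ..., x**max_degree."""
    # Bind `degree` as a default argument; a bare `lambda x: x ** degree`
    # would look up the variable at call time and see only its final value.
    return [lambda x, degree=degree: x ** degree for degree in range(max_degree + 1)]


funcs = create_polynomial_functions(3)
print([f(2) for f in funcs])  # [1, 2, 4, 8]
```

Without the `degree=degree` default, every lambda would compute x ** max_degree.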
- dowhy.utils.regression.generate_moment_function(W, g)
Generates and returns the moment function $m(W, g) = g(1, W) - g(0, W)$ for the average causal effect.
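Evaluated per sample, the moment function contrasts the regression's prediction under treatment (1) and control (0); averaging it gives a plug-in estimate of the average causal effect. Here `W` is a list of covariate values and `g` a plain callable, both assumptions for illustration, and `moment_function` is a hypothetical name.

```python
def moment_function(W, g):
    """m(W, g) = g(1, W) - g(0, W), evaluated for each covariate value."""
    return [g(1, w) - g(0, w) for w in W]


def g(t, w):
    """Hypothetical fitted outcome regression with a constant effect of 2."""
    return 2.0 * t + 0.5 * w


W = [0.0, 1.0, 3.0]
m = moment_function(W, g)
print(m)                # [2.0, 2.0, 2.0]
print(sum(m) / len(m))  # 2.0, the average causal effect estimate
```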
- dowhy.utils.regression.get_generic_regressor(cv, X, Y, max_degree=3, estimator_list=None, estimator_param_list=None, numeric_features=None)
Finds the best estimator for the regression function (g_s).
- Parameters
cv – training and testing data indices obtained after K-fold splitting of the dataset.
X – regressor data for training the regression model.
Y – outcome data for training the regression model.
max_degree – degree of the polynomial function used to approximate the regression function.
estimator_list – list of estimator objects for finding the regression function.
estimator_param_list – list of dictionaries with parameters for tuning the respective estimators in estimator_list.
numeric_features – list of indices of numeric features in the dataset.
- Returns
Estimator for the Riesz regression function.