dowhy.utils package#
Submodules#
dowhy.utils.api module#
dowhy.utils.cit module#
- dowhy.utils.cit.compute_ci(r=None, nx=None, ny=None, confidence=0.95)[source]#
Compute parametric confidence intervals around a correlation coefficient. See: https://online.stat.psu.edu/stat505/lesson/6/6.3
This is done by applying Fisher’s r-to-z transform: z = 0.5 * ln((1 + r) / (1 - r)) = arctanh(r)
The standard error is 1/sqrt(N - 3), where N is the sample size.
The critical value of the normal distribution for the given confidence level is the magnitude of stats.norm.ppf((1 - confidence)/2) for a two-tailed test.
The lower and upper confidence limits in z space are calculated with the formula z ± critical value * standard error.
The confidence interval is then converted back to r space.
:param r : correlation coefficient
:param nx : length of vector x
:param ny : length of vector y
:param confidence : confidence level (0.95 = 95%)
:returns : array containing confidence interval
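The steps described for compute_ci can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: the name fisher_ci and its argument n are hypothetical, and statistics.NormalDist stands in for stats.norm.

```python
import math
from statistics import NormalDist


def fisher_ci(r, n, confidence=0.95):
    """Confidence interval for a correlation coefficient via Fisher's z.

    Hypothetical helper for illustration; `n` is the sample size.
    """
    z = math.atanh(r)                # r -> z transform
    se = 1.0 / math.sqrt(n - 3)      # standard error in z space
    # Two-tailed critical value; abs() turns the lower-tail quantile
    # into a positive multiplier.
    crit = abs(NormalDist().inv_cdf((1 - confidence) / 2))
    lower, upper = z - crit * se, z + crit * se
    return math.tanh(lower), math.tanh(upper)  # back to r space


low, high = fisher_ci(r=0.5, n=103)
```

Because the interval is symmetric in z space but mapped back through tanh, it is generally not symmetric around r.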
- dowhy.utils.cit.conditional_MI(data=None, x=None, y=None, z=None)[source]#
Method to return the conditional mutual information between X and Y given Z:
I(X, Y | Z) = H(X|Z) - H(X|Y,Z)
= H(X,Z) - H(Z) - H(X,Y,Z) + H(Y,Z)
= H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)
:param data : dataset
:param x,y,z : column names from dataset
:returns : conditional mutual information between X and Y given Z
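The entropy identity above can be checked on small discrete samples. The sketch below assumes plain Python sequences rather than dataset columns and uses the natural logarithm; the helper names joint_entropy and conditional_mi are hypothetical.

```python
import math
from collections import Counter


def joint_entropy(rows):
    """Shannon entropy (natural log) of the empirical distribution of `rows`."""
    counts = Counter(rows)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def conditional_mi(xs, ys, zs):
    """I(X, Y | Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z) from paired samples."""
    h_xz = joint_entropy(list(zip(xs, zs)))
    h_yz = joint_entropy(list(zip(ys, zs)))
    h_xyz = joint_entropy(list(zip(xs, ys, zs)))
    h_z = joint_entropy(list(zs))
    return h_xz + h_yz - h_xyz - h_z


# Y copies X and Z is unrelated to both, so conditioning on Z removes nothing.
xs = [0, 0, 1, 1]
ys = [0, 0, 1, 1]
zs = [0, 1, 0, 1]
cmi_copy = conditional_mi(xs, ys, zs)

# X and Y are both copies of Z, so Z explains all of their dependence.
cmi_explained = conditional_mi(zs, zs, zs)
```

In the first case the result equals H(X); in the second it vanishes, which is the behavior a conditional independence test relies on.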
- dowhy.utils.cit.entropy(x)[source]#
Returns the entropy of a random variable x: H(x) = - Σ p(x) log(p(x))
:param x : random variable to calculate entropy for
:returns : entropy of random variable
- dowhy.utils.cit.partial_corr(data=None, x=None, y=None, z=None, method='pearson')[source]#
Calculate the partial correlation, which is the degree of association between x and y after removing the effect of z. This is done by calculating the correlation coefficient between the residuals of two linear regressions: x ~ z and y ~ z. See: https://en.wikipedia.org/wiki/Partial_correlation
:param data : pandas dataframe
:param x : column name in data
:param y : column name in data
:param z : string or list
:param method : string denoting the correlation type - “pearson” or “spearman”
:returns : a Python dictionary with the keys
n: Sample size
r: Partial correlation coefficient
CI95: 95% parametric confidence intervals
p-val: p-value
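The residual-based definition can be illustrated for a single conditioning variable. This sketch works on plain lists and computes only the Pearson coefficient; the helpers residuals, pearson and partial_correlation are hypothetical names, and the real function also accepts several z columns and returns the dictionary described above.

```python
import math


def residuals(target, covariate):
    """Residuals of the least-squares line target ~ covariate."""
    n = len(target)
    mean_t = sum(target) / n
    mean_c = sum(covariate) / n
    var_c = sum((c - mean_c) ** 2 for c in covariate)
    cov = sum((c - mean_c) * (t - mean_t) for c, t in zip(covariate, target))
    slope = cov / var_c
    intercept = mean_t - slope * mean_c
    return [t - (intercept + slope * c) for t, c in zip(target, covariate)]


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    num = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    den = math.sqrt(
        sum((u - mean_a) ** 2 for u in a) * sum((v - mean_b) ** 2 for v in b)
    )
    return num / den


def partial_correlation(x, y, z):
    """Correlation between the residuals of x ~ z and y ~ z."""
    return pearson(residuals(x, z), residuals(y, z))


# x = z + e and y = 2z - e, where e is uncorrelated with z.
z = [1, 2, 3, 4, 5, 6]
x = [2, 0, 4, 5, 3, 7]
y = [1, 6, 5, 7, 12, 11]
raw = pearson(x, y)
partial = partial_correlation(x, y, z)
```

Here x and y are positively correlated overall because both rise with z, yet once z is regressed out the remaining parts move in opposite directions.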
dowhy.utils.cli_helpers module#
- dowhy.utils.cli_helpers.query_yes_no(question, default=True)[source]#
Ask a yes/no question via standard input and return the answer.
Source: https://stackoverflow.com/questions/3041986/apt-command-line-interface-like-yes-no-input
If invalid input is given, the user will be asked until they actually give valid input.
Side Effects: Blocks program execution until valid input(y/n) is given.
- Parameters:
question(str) – A question that is presented to the user.
default(bool|None) – The default value when enter is pressed with no value. When None, there is no default value and the query will loop.
- Returns:
A bool indicating whether user has entered yes or no.
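The looping behavior described above can be sketched as follows. This is an illustrative version, not the library's code: ask_yes_no and its input_fn parameter are hypothetical, with input_fn added only so the prompt can be driven by scripted replies.

```python
def ask_yes_no(question, default=True, input_fn=input):
    """Keep asking until the reply maps to True or False."""
    answers = {"y": True, "yes": True, "n": False, "no": False}
    # Show which answer an empty reply selects, if any.
    hint = {True: "[Y/n]", False: "[y/N]", None: "[y/n]"}[default]
    while True:
        reply = input_fn(f"{question} {hint} ").strip().lower()
        if reply == "" and default is not None:
            return default
        if reply in answers:
            return answers[reply]
        print("Please answer 'y' or 'n'.")


# Scripted replies stand in for a user: one invalid entry, then "n".
scripted = iter(["maybe", "n"])
result = ask_yes_no("Overwrite file?", input_fn=lambda prompt: next(scripted))
```

With default=None an empty reply matches nothing, so the loop simply asks again.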
dowhy.utils.dgp module#
- class dowhy.utils.dgp.DataGeneratingProcess(**kwargs)[source]#
Bases: object
Base class for implementation of data generating process.
Subclasses implement functions that create various data generating processes. All data generating processes are in the package “dowhy.utils.dgps”.
- DEFAULT_PERCENTILE = 0.9#
dowhy.utils.graph_operations module#
- dowhy.utils.graph_operations.add_edge(i, j, g)[source]#
Adds an edge i –> j to the graph, g. The edge is only added if this addition does NOT cause the graph to have cycles.
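A cycle check of this kind can be sketched on a plain dictionary of successor sets. The graph type is an assumption made for illustration (the type of g is not stated here), and reachable and add_edge_if_acyclic are hypothetical names.

```python
def reachable(graph, start, goal):
    """Depth-first search: is `goal` reachable from `start`?"""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return False


def add_edge_if_acyclic(graph, i, j):
    """Add i -> j unless j already reaches i, which would close a cycle."""
    if reachable(graph, j, i):
        return False
    graph.setdefault(i, set()).add(j)
    graph.setdefault(j, set())
    return True


dag = {}
added_ab = add_edge_if_acyclic(dag, "a", "b")
added_bc = add_edge_if_acyclic(dag, "b", "c")
added_ca = add_edge_if_acyclic(dag, "c", "a")  # rejected: would give a -> b -> c -> a
```

Checking reachability before inserting avoids having to add the edge and roll it back.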
- dowhy.utils.graph_operations.adjacency_matrix_to_adjacency_list(adjacency_matrix, labels=None)[source]#
Convert the adjacency matrix of a graph to an adjacency list.
- Parameters:
adjacency_matrix – A numpy array representing the graph adjacency matrix.
labels – List of labels.
- Returns:
Adjacency list as a dictionary.
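As a rough sketch of the conversion, assuming that a nonzero entry in row i, column j denotes an edge i –> j and using nested lists in place of a numpy array (matrix_to_adjacency_list is a hypothetical name, and the default labels are an assumption):

```python
def matrix_to_adjacency_list(matrix, labels=None):
    """Map each node label to the labels of its direct successors."""
    if labels is None:
        labels = list(range(len(matrix)))  # assumption: fall back to indices
    return {
        labels[i]: [labels[j] for j, weight in enumerate(row) if weight != 0]
        for i, row in enumerate(matrix)
    }


matrix = [
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]
adjacency = matrix_to_adjacency_list(matrix, labels=["X", "Y", "Z"])
```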
- dowhy.utils.graph_operations.adjacency_matrix_to_graph(adjacency_matrix, labels=None)[source]#
Convert a given graph adjacency matrix to DOT format.
- Parameters:
adjacency_matrix – A numpy array representing the graph adjacency matrix.
labels – List of labels.
- Returns:
Graph in DOT format.
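The DOT output can be pictured with a small sketch. It assumes the same edge convention (a nonzero entry in row i, column j means i –> j); matrix_to_dot is a hypothetical name, and the exact text emitted by the library may differ.

```python
def matrix_to_dot(matrix, labels=None):
    """Render an adjacency matrix as a DOT digraph string."""
    if labels is None:
        labels = [str(i) for i in range(len(matrix))]
    lines = ["digraph {"]
    # Declare every node first so isolated nodes still appear.
    lines += [f'  "{name}";' for name in labels]
    for i, row in enumerate(matrix):
        for j, weight in enumerate(row):
            if weight != 0:
                lines.append(f'  "{labels[i]}" -> "{labels[j]}";')
    lines.append("}")
    return "\n".join(lines)


dot = matrix_to_dot([[0, 1], [0, 0]], labels=["X", "Y"])
```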
- dowhy.utils.graph_operations.daggity_to_dot(daggity_string)[source]#
Converts the input daggity_string to valid DOT graph format.
- Parameters:
daggity_string – Output graph from the DAGitty site.
- Returns:
DOT string
- dowhy.utils.graph_operations.del_edge(i, j, g)[source]#
Deletes the edge i –> j in the graph, g. The edge is only deleted if this removal does NOT cause the graph to be disconnected.
- dowhy.utils.graph_operations.find_ancestor(node_set, node_names, adjacency_matrix, node2idx, idx2node)[source]#
Finds ancestors of a given set of nodes in a given graph.
- Parameters:
node_set – Set of nodes whose ancestors must be obtained.
node_names – Name of all nodes in the graph.
adjacency_matrix – Graph adjacency matrix.
node2idx – A dictionary mapping node names to their row or column index in the adjacency matrix.
idx2node – A dictionary mapping the row or column indices in the adjacency matrix to the corresponding node names.
- Returns:
OrderedSet containing ancestors of all nodes in the node_set.
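An ancestor search over an adjacency matrix can be sketched as below. Two assumptions are made for illustration: a nonzero entry in row i, column j means i –> j, and the starting nodes count as their own ancestors. The name ancestors_of is hypothetical, and a list stands in for the OrderedSet.

```python
def ancestors_of(node_set, matrix, node2idx, idx2node):
    """Collect the nodes in `node_set` and everything with a directed path into them."""
    found = []  # insertion-ordered result
    stack = [node2idx[name] for name in node_set]
    while stack:
        child = stack.pop()
        if idx2node[child] in found:
            continue
        found.append(idx2node[child])
        # Any row with a nonzero entry in this column is a parent of `child`.
        for parent, row in enumerate(matrix):
            if row[child] != 0 and idx2node[parent] not in found:
                stack.append(parent)
    return found


names = ["A", "B", "C", "D"]
node2idx = {name: i for i, name in enumerate(names)}
idx2node = {i: name for i, name in enumerate(names)}
# Chain A -> B -> C -> D
matrix = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
result = ancestors_of(["C"], matrix, node2idx, idx2node)
```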
- dowhy.utils.graph_operations.find_c_components(adjacency_matrix, node_set, idx2node)[source]#
Obtain C-components in a graph.
- Parameters:
adjacency_matrix – Graph adjacency matrix.
node_set – Set of nodes whose ancestors must be obtained.
idx2node – A dictionary mapping the row or column indices in the adjacency matrix to the corresponding node names.
- Returns:
List of C-components in the graph.
- dowhy.utils.graph_operations.find_predecessor(i, j, g)[source]#
Finds a predecessor, k, in the path between two nodes, i and j, in the graph, g.
- dowhy.utils.graph_operations.get_simple_ordered_tree(n)[source]#
Generates a simple ordered tree. The tree is just a directed acyclic graph of n nodes with the structure 0 –> 1 –> … –> n-1.
- dowhy.utils.graph_operations.induced_graph(node_set, adjacency_matrix, node2idx)[source]#
Obtains the induced graph corresponding to a subset of nodes.
- Parameters:
node_set – Set of nodes whose ancestors must be obtained.
adjacency_matrix – Graph adjacency matrix.
node2idx – A dictionary mapping node names to their row or column index in the adjacency matrix.
- Returns:
Numpy array representing the adjacency matrix of the induced graph.
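Taking an induced subgraph amounts to keeping only the rows and columns of the selected nodes. The sketch below uses nested lists instead of a numpy array and keeps the nodes in their original index order, which is an assumption; induced_matrix is a hypothetical name.

```python
def induced_matrix(node_set, matrix, node2idx):
    """Adjacency matrix restricted to the rows and columns of `node_set`."""
    keep = sorted(node2idx[name] for name in node_set)
    return [[matrix[i][j] for j in keep] for i in keep]


node2idx = {"A": 0, "B": 1, "C": 2, "D": 3}
# Chain A -> B -> C -> D
matrix = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
sub = induced_matrix({"B", "C", "D"}, matrix, node2idx)
```

Edges that touch a dropped node (here A –> B) disappear, while edges among the kept nodes are preserved.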
dowhy.utils.graphviz_plotting module#
- dowhy.utils.graphviz_plotting.plot_causal_graph_graphviz(causal_graph: Graph, layout_prog: str | None = None, display_causal_strengths: bool = True, causal_strengths: Dict[Tuple[Any, Any], float] | None = None, colors: Dict[Any | Tuple[Any, Any], str] | None = None, filename: str | None = None, display_plot: bool = True, figure_size: Tuple[int, int] | None = None) None [source]#
dowhy.utils.networkx_plotting module#
- dowhy.utils.networkx_plotting.plot_causal_graph_networkx(causal_graph: Graph, layout_prog: str | None = None, causal_strengths: Dict[Tuple[Any, Any], float] | None = None, colors: Dict[Any | Tuple[Any, Any], str] | None = None, filename: str | None = None, display_plot: bool = True, label_wrap_length: int = 3, figure_size: Tuple[int, int] | None = None) None [source]#
dowhy.utils.ordered_set module#
- class dowhy.utils.ordered_set.OrderedSet(elements=None)[source]#
Bases: object
Python class for ordered set. Code taken from buyalsky/ordered-hash-set.
- add(element)[source]#
Function to add an element to the set if it does not exist.
- Parameters:
element – element to be added.
- difference(other_set)[source]#
Function to remove elements in self._set which are also present in other_set.
- Parameters:
other_set – The set to obtain difference with. Can be a list, set or OrderedSet.
- Returns:
New OrderedSet representing the difference of elements in the self._set and other_set.
- get_all()[source]#
Function to return list of all elements in the set.
- Returns:
List of all items in the set.
- intersection(other_set)[source]#
Function to compute the intersection of self._set and other_set.
- Parameters:
other_set – The set to obtain intersection with. Can be a list, set or OrderedSet.
- Returns:
New OrderedSet representing the set with elements common to the OrderedSet object and other_set.
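The methods listed above can be mimicked with a dictionary, whose keys are unique and keep insertion order. SimpleOrderedSet is a hypothetical stand-in written for illustration, not the class documented here, and it assumes hashable elements.

```python
class SimpleOrderedSet:
    """Hypothetical stand-in: unique, hashable elements kept in insertion order."""

    def __init__(self, elements=None):
        # dict keys are unique and remember insertion order (Python 3.7+).
        self._items = dict.fromkeys(elements if elements is not None else [])

    def add(self, element):
        self._items.setdefault(element, None)  # no-op if already present

    def get_all(self):
        return list(self._items)

    def _as_set(self, other):
        # Accept a list, set or another SimpleOrderedSet.
        if isinstance(other, SimpleOrderedSet):
            return set(other.get_all())
        return set(other)

    def difference(self, other):
        exclude = self._as_set(other)
        return SimpleOrderedSet(e for e in self._items if e not in exclude)

    def intersection(self, other):
        keep = self._as_set(other)
        return SimpleOrderedSet(e for e in self._items if e in keep)


s = SimpleOrderedSet(["c", "a", "b"])
s.add("a")  # already present, order unchanged
s.add("d")
remaining = s.difference(["a"]).get_all()
common = s.intersection({"d", "c"}).get_all()
```

Both difference and intersection return a new object and follow the order of the receiving set, not the order of the argument.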
dowhy.utils.plotting module#
- dowhy.utils.plotting.bar_plot(values: Dict[str, float], uncertainties: Dict[str, Tuple[float, float]] | None = None, ylabel: str = '', filename: str | None = None, display_plot: bool = True, figure_size: List[int] | None = None, bar_width: float = 0.8, xticks: List[str] | None = None, xticks_rotation: int = 90, sort_names: bool = False) None [source]#
Convenience function to make a bar plot of the given values with uncertainty bars, if provided. Useful for all kinds of attribution results (including confidence intervals).
- Parameters:
values – A dictionary where the keys are the labels and the values are the values to be plotted.
uncertainties – A dictionary of attributes to be added to the error bars.
ylabel – The label for the y-axis.
filename – An optional filename if the output should be plotted into a file.
display_plot – Optionally specify if the plot should be displayed or not (default to True).
figure_size – The size of the figure to be plotted.
bar_width – The width of the bars.
xticks – Explicitly specify the labels for the bars on the x-axis.
xticks_rotation – Specify the rotation of the labels on the x-axis.
sort_names – If True, the names in the plot are sorted alphabetically. If False, the order as given in values is used.
- dowhy.utils.plotting.plot(causal_graph: Graph, layout_prog: str | None = None, causal_strengths: Dict[Tuple[Any, Any], float] | None = None, colors: Dict[Any | Tuple[Any, Any], str] | None = None, filename: str | None = None, display_plot: bool = True, figure_size: Tuple[int, int] | None = None, **kwargs) None [source]#
Convenience function to plot causal graphs. This function uses different backends based on what’s available on the system. The best result is achieved when using Graphviz as the backend. This requires both the shared system library (e.g. brew install graphviz or apt-get install graphviz) and the Python pygraphviz package (pip install pygraphviz). When graphviz is not available, it will fall back to the networkx backend.
- Parameters:
causal_graph – The graph to be plotted
layout_prog – Defines the layout type. If None is given, the ‘dot’ layout is used for graphviz plots and a customized layout for networkx plots.
causal_strengths – An optional dictionary with Edge -> float entries.
colors – An optional dictionary with color specifications for edges or nodes.
filename – An optional filename if the output should be plotted into a file.
display_plot – Optionally specify if the plot should be displayed or not (default to True).
figure_size – A tuple defining the width and height of the pyplot figure. It is used to modify pyplot’s ‘figure.figsize’ parameter. If None is given, the current/default value is used.
kwargs – Remaining parameters will be passed through to the backend verbatim.
Example usage:
>>> import networkx as nx
>>> from dowhy.utils.plotting import plot
>>> plot(nx.DiGraph([('X', 'Y')]))  # plots X -> Y
>>> plot(nx.DiGraph([('X', 'Y')]), causal_strengths={('X', 'Y'): 0.43})  # annotates arrow with 0.43
>>> plot(nx.DiGraph([('X', 'Y')]), colors={('X', 'Y'): 'red', 'X': 'green'})  # colors X -> Y red and X green
dowhy.utils.propensity_score module#
- dowhy.utils.propensity_score.binary_treatment_model(data, covariates, treatment, variable_types)[source]#
- dowhy.utils.propensity_score.categorical_treatment_model(data, covariates, treatment, variable_types)[source]#
- dowhy.utils.propensity_score.continuous_treatment_model(data, covariates, treatment, variable_types)[source]#
dowhy.utils.regression module#
- dowhy.utils.regression.create_polynomial_function(max_degree)[source]#
Creates a list of polynomial functions
- Parameters:
max_degree – degree of the polynomial function to be created
- Returns:
list of lambda functions
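Building a list of lambdas in a loop has a well-known pitfall: each lambda must capture its own degree. The sketch below shows one way to do that. It assumes one monomial per degree from 1 to max_degree, which the description above does not confirm, and make_polynomial_terms is a hypothetical name.

```python
def make_polynomial_terms(max_degree):
    """Return one function per degree, each raising its input to that power."""
    # Bind `degree` as a default argument; a bare closure would look the
    # variable up at call time and every lambda would use the last degree.
    return [
        lambda x, degree=degree: x ** degree
        for degree in range(1, max_degree + 1)
    ]


terms = make_polynomial_terms(3)
values = [term(2) for term in terms]
```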