dowhy.utils package#

Submodules#

dowhy.utils.api module#

dowhy.utils.api.parse_state(state)[source]#

dowhy.utils.cit module#

dowhy.utils.cit.compute_ci(r=None, nx=None, ny=None, confidence=0.95)[source]#

Computes parametric confidence intervals around a correlation coefficient. See: https://online.stat.psu.edu/stat505/lesson/6/6.3

This is done by applying Fisher’s r to z transform z = .5[ln((1+r)/(1-r))] = arctanh(r)

The Standard error is 1/sqrt(N-3) where N is sample size

The critical value of the standard normal distribution for the given confidence level is calculated for a two-tailed test as stats.norm.ppf(1 - alpha/2), where alpha = 1 - confidence (equivalently, the absolute value of stats.norm.ppf((1 - confidence)/2))

The lower and upper confidence limits in z space are calculated with the formula z ± critical value * standard error

The confidence interval is then converted back to r space

Parameters:
  • r – correlation coefficient

  • nx – length of vector x

  • ny – length of vector y

  • confidence – confidence level (0.95 = 95%)

Returns:

array containing the confidence interval
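The steps above can be sketched with the standard library alone. This is a minimal illustration and not DoWhy's implementation: the function name `fisher_ci` and its single sample-size argument `n` are assumptions made for the example.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def fisher_ci(r, n, confidence=0.95):
    """Confidence interval for a correlation r from n samples (hypothetical helper)."""
    z = atanh(r)                      # Fisher r-to-z transform
    se = 1.0 / sqrt(n - 3)            # standard error in z space
    alpha = 1.0 - confidence
    crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-tailed critical value
    lower, upper = z - crit * se, z + crit * se     # interval in z space
    return tanh(lower), tanh(upper)   # convert back to r space


low, high = fisher_ci(0.5, 30)
print(round(low, 3), round(high, 3))
```

The interval is symmetric in z space but not in r space, which is why the back-transformation with `tanh` is applied to each limit separately.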

dowhy.utils.cit.conditional_MI(data=None, x=None, y=None, z=None)[source]#

Returns the conditional mutual information between X and Y given Z:

I(X, Y | Z) = H(X|Z) - H(X|Y,Z)
            = H(X,Z) - H(Z) - H(X,Y,Z) + H(Y,Z)
            = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)

Parameters:
  • data – dataset

  • x, y, z – column names from the dataset

Returns:

Conditional mutual information between X and Y given Z.
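The entropy decomposition above can be checked on small discrete samples. The sketch below is illustrative only: it takes plain lists instead of a dataset with column names, uses base-2 logarithms, and the names `joint_entropy` and `conditional_mi` are hypothetical.

```python
from collections import Counter
from math import log2


def joint_entropy(*columns):
    """Empirical joint entropy (in bits) of one or more equally long columns."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in Counter(rows).values())


def conditional_mi(x, y, z):
    """I(X, Y | Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return (joint_entropy(x, z) + joint_entropy(y, z)
            - joint_entropy(x, y, z) - joint_entropy(z))


z = [0, 0, 0, 0]
x = [0, 0, 1, 1]
print(conditional_mi(x, x, z))             # X fully determines Y
print(conditional_mi(x, [0, 1, 0, 1], z))  # X and Y independent given Z
```

When Y is a copy of X and Z is constant, the result equals H(X); when X and Y are independent given Z, it is zero.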

dowhy.utils.cit.entropy(x)[source]#

Returns the entropy of a random variable x: H(x) = - Σ p(x) log(p(x))

Parameters:

x – random variable to calculate entropy for

Returns:

entropy of the random variable
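As a small worked instance of the formula, the following sketch estimates p(x) from observed frequencies. The base of the logarithm (2 here, giving bits) and the name `shannon_entropy` are assumptions for illustration.

```python
from collections import Counter
from math import log2


def shannon_entropy(values):
    """Empirical entropy in bits: -sum over distinct values of p * log2(p)."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


uniform = ["a", "b", "a", "b"]   # two equally likely outcomes
skewed = ["a", "a", "a", "b"]    # one outcome dominates
print(shannon_entropy(uniform))  # 1.0
print(shannon_entropy(skewed))
```

The skewed sample has lower entropy than the uniform one because its outcome is easier to predict.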

dowhy.utils.cit.partial_corr(data=None, x=None, y=None, z=None, method='pearson')[source]#

Calculates the partial correlation, which is the degree of association between x and y after removing the effect of z. This is done by calculating the correlation coefficient between the residuals of two linear regressions: x ~ z and y ~ z. See: https://en.wikipedia.org/wiki/Partial_correlation

Parameters:
  • data – pandas dataframe

  • x – Column name in data

  • y – Column name in data

  • z – string or list

  • method – string denoting the correlation type - “pearson” or “spearman”

Returns:

A Python dictionary with the keys:
  • n – Sample size

  • r – Partial correlation coefficient

  • CI95 – 95% parametric confidence intervals

  • p-val – p-value
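The residual-based definition can be sketched for a single control variable using only the standard library. This is an illustration of the technique, not DoWhy's code: it handles one z column, Pearson correlation only, and the helper names `residuals`, `pearson` and `partial_corr_sketch` are hypothetical.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def residuals(target, covariate):
    """Residuals of the least-squares fit target ~ covariate (with intercept)."""
    mt, mc = mean(target), mean(covariate)
    sxx = sum((c - mc) ** 2 for c in covariate)
    slope = sum((c - mc) * (t - mt) for c, t in zip(covariate, target)) / sxx
    intercept = mt - slope * mc
    return [t - (intercept + slope * c) for t, c in zip(target, covariate)]


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))


def partial_corr_sketch(x, y, z):
    """Correlation between the residuals of x ~ z and y ~ z."""
    return pearson(residuals(x, z), residuals(y, z))


z = [1, 2, 3, 4, 5, 6, 7, 8]
x = [2, 1, 4, 3, 7, 5, 8, 9]
y = [1, 3, 2, 5, 4, 8, 6, 7]
print(partial_corr_sketch(x, y, z))
```

For a single control variable the result agrees with the closed form (r_xy - r_xz r_yz) / sqrt((1 - r_xz²)(1 - r_yz²)).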

dowhy.utils.cli_helpers module#

dowhy.utils.cli_helpers.query_yes_no(question, default=True)[source]#

Ask a yes/no question via standard input and return the answer.

Source: https://stackoverflow.com/questions/3041986/apt-command-line-interface-like-yes-no-input

If invalid input is given, the user will be asked until they actually give valid input.

Side Effects: Blocks program execution until valid input(y/n) is given.

Parameters:
  • question(str) – A question that is presented to the user.

  • default(bool|None) – The default value when enter is pressed with no value. When None, there is no default value and the query will loop.

Returns:

A bool indicating whether user has entered yes or no.
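The looping behaviour described above can be sketched as follows. This is a hypothetical re-implementation for illustration, not the library function: the name `ask_yes_no`, the prompt strings, and the injectable `read` argument (used here so the example runs without a terminal) are all assumptions.

```python
def ask_yes_no(question, default=True, read=input):
    """Keep asking until a valid yes/no answer is given (illustrative sketch)."""
    valid = {"y": True, "yes": True, "n": False, "no": False}
    prompt = {True: " [Y/n] ", False: " [y/N] ", None: " [y/n] "}[default]
    while True:
        answer = read(question + prompt).strip().lower()
        if answer == "" and default is not None:
            return default          # enter pressed: fall back to the default
        if answer in valid:
            return valid[answer]
        # Anything else (or empty input with default=None) asks again.


# Scripted answers stand in for keyboard input.
scripted = iter(["maybe", "", "no"])
print(ask_yes_no("Continue?", default=None, read=lambda _: next(scripted)))  # False
```

With `default=None`, both the invalid answer and the empty answer are rejected, so the loop only ends at the third input.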

dowhy.utils.dgp module#

class dowhy.utils.dgp.DataGeneratingProcess(**kwargs)[source]#

Bases: object

Base class for implementation of data generating process.

Subclasses implement functions that create various data generating processes. All data generating processes are in the package “dowhy.utils.dgps”.

DEFAULT_PERCENTILE = 0.9#
convert_to_binary(data, deterministic=False)[source]#
generate_data()[source]#
generation_process()[source]#

dowhy.utils.graph_operations module#

dowhy.utils.graph_operations.add_edge(i, j, g)[source]#

Adds an edge i –> j to the graph, g. The edge is only added if this addition does NOT cause the graph to have cycles.
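The acyclicity check can be illustrated on a plain adjacency dictionary: adding i -> j closes a cycle exactly when j can already reach i. The dictionary representation and the names `reachable` and `add_edge_if_acyclic` are assumptions for this sketch; the library function operates on its own graph object g.

```python
def reachable(graph, start, goal):
    """Return True if goal can be reached from start along directed edges."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return False


def add_edge_if_acyclic(i, j, graph):
    """Add i -> j unless j already reaches i, which would close a cycle."""
    if i == j or reachable(graph, j, i):
        return False
    graph.setdefault(i, set()).add(j)
    graph.setdefault(j, set())
    return True


g = {0: {1}, 1: {2}, 2: set()}
print(add_edge_if_acyclic(0, 2, g))  # True: 0 -> 2 keeps the graph acyclic
print(add_edge_if_acyclic(2, 0, g))  # False: 0 already reaches 2
```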

dowhy.utils.graph_operations.adjacency_matrix_to_adjacency_list(adjacency_matrix, labels=None)[source]#

Convert the adjacency matrix of a graph to an adjacency list.

Parameters:
  • adjacency_matrix – A numpy array representing the graph adjacency matrix.

  • labels – List of labels.

Returns:

Adjacency list as a dictionary.
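A minimal sketch of the conversion, assuming that a non-zero entry in row i, column j denotes an edge i -> j and using nested lists in place of a numpy array. The function name `matrix_to_adjacency_list` is hypothetical.

```python
def matrix_to_adjacency_list(matrix, labels=None):
    """Map each node label to the list of labels it points to."""
    n = len(matrix)
    names = labels if labels is not None else list(range(n))
    return {
        names[i]: [names[j] for j in range(n) if matrix[i][j] != 0]
        for i in range(n)
    }


matrix = [
    [0, 1, 1],  # X -> Z, X -> Y
    [0, 0, 1],  # Z -> Y
    [0, 0, 0],
]
print(matrix_to_adjacency_list(matrix, labels=["X", "Z", "Y"]))
# {'X': ['Z', 'Y'], 'Z': ['Y'], 'Y': []}
```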

dowhy.utils.graph_operations.adjacency_matrix_to_graph(adjacency_matrix, labels=None)[source]#

Convert a given graph adjacency matrix to DOT format.

Parameters:
  • adjacency_matrix – A numpy array representing the graph adjacency matrix.

  • labels – List of labels.

Returns:

Graph in DOT format.
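The following sketch shows one way such a DOT string could be assembled. The exact output layout of the library function is not documented here, so the formatting below, the row-to-column edge direction, and the name `matrix_to_dot` are assumptions.

```python
def matrix_to_dot(matrix, labels=None):
    """Build a DOT digraph from an adjacency matrix given as nested lists."""
    n = len(matrix)
    names = labels if labels is not None else [str(i) for i in range(n)]
    lines = ["digraph {"]
    lines += [f'  "{name}";' for name in names]          # declare every node
    lines += [
        f'  "{names[i]}" -> "{names[j]}";'               # one line per edge
        for i in range(n) for j in range(n) if matrix[i][j] != 0
    ]
    lines.append("}")
    return "\n".join(lines)


print(matrix_to_dot([[0, 1], [0, 0]], labels=["X", "Y"]))
```

Declaring nodes separately keeps isolated nodes in the output even when they have no edges.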

dowhy.utils.graph_operations.convert_to_undirected_graph(g)[source]#
dowhy.utils.graph_operations.daggity_to_dot(daggity_string)[source]#

Converts the input daggity_string to valid DOT graph format.

Parameters:

daggity_string – Output graph from the DAGitty site

Returns:

DOT string

dowhy.utils.graph_operations.del_edge(i, j, g)[source]#

Deletes the edge i –> j in the graph, g. The edge is only deleted if this removal does NOT cause the graph to be disconnected.
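The guarded deletion can be sketched on an adjacency dictionary: remove the edge, test connectivity, and restore the edge if the graph fell apart. Treating "connected" as weak connectivity (edge direction ignored) is an assumption, as are the names `weakly_connected` and `del_edge_if_connected`.

```python
def weakly_connected(graph):
    """True if all nodes are mutually reachable when edge direction is ignored."""
    undirected = {node: set() for node in graph}
    for src, targets in graph.items():
        for dst in targets:
            undirected.setdefault(src, set()).add(dst)
            undirected.setdefault(dst, set()).add(src)
    if not undirected:
        return True
    start = next(iter(undirected))
    stack, seen = [start], {start}
    while stack:
        for nxt in undirected[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(undirected)


def del_edge_if_connected(i, j, graph):
    """Delete i -> j only if the graph stays connected afterwards."""
    if j not in graph.get(i, set()):
        return False
    graph[i].remove(j)
    if weakly_connected(graph):
        return True
    graph[i].add(j)  # deletion would disconnect the graph: restore the edge
    return False


g = {0: {1, 2}, 1: {2}, 2: set()}
print(del_edge_if_connected(0, 2, g))  # True: 0 -> 1 -> 2 still connects all nodes
print(del_edge_if_connected(0, 1, g))  # False: node 0 would become isolated
```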

dowhy.utils.graph_operations.find_ancestor(node_set, node_names, adjacency_matrix, node2idx, idx2node)[source]#

Finds ancestors of a given set of nodes in a given graph.

Parameters:
  • node_set – Set of nodes whose ancestors must be obtained.

  • node_names – Name of all nodes in the graph.

  • adjacency_matrix – Graph adjacency matrix.

  • node2idx – A dictionary mapping node names to their row or column index in the adjacency matrix.

  • idx2node – A dictionary mapping the row or column indices in the adjacency matrix to the corresponding node names.

Returns:

OrderedSet containing ancestors of all nodes in the node_set.
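A breadth-first sketch of the ancestor search over an adjacency matrix. Several details are assumptions made for illustration: entry [i][j] non-zero means i -> j, the starting nodes are included in the result, a plain list stands in for OrderedSet, and the name `find_ancestors_sketch` is hypothetical.

```python
def find_ancestors_sketch(node_set, adjacency_matrix, node2idx, idx2node):
    """Collect node_set and everything with a directed path into it."""
    n = len(adjacency_matrix)
    ancestors = list(node_set)      # insertion order is preserved
    seen = set(ancestors)
    queue = list(ancestors)
    while queue:
        child = node2idx[queue.pop(0)]
        for parent in range(n):
            if adjacency_matrix[parent][child] != 0:
                name = idx2node[parent]
                if name not in seen:
                    seen.add(name)
                    ancestors.append(name)
                    queue.append(name)
    return ancestors


names = ["X", "Z", "Y", "W"]
node2idx = {name: i for i, name in enumerate(names)}
idx2node = {i: name for i, name in enumerate(names)}
matrix = [
    [0, 1, 0, 0],  # X -> Z
    [0, 0, 1, 0],  # Z -> Y
    [0, 0, 0, 0],
    [0, 0, 1, 0],  # W -> Y
]
print(find_ancestors_sketch(["Z"], matrix, node2idx, idx2node))  # ['Z', 'X']
```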

dowhy.utils.graph_operations.find_c_components(adjacency_matrix, node_set, idx2node)[source]#

Obtain C-components in a graph.

Parameters:
  • adjacency_matrix – Graph adjacency matrix.

  • node_set – Set of nodes whose ancestors must be obtained.

  • idx2node – A dictionary mapping the row or column indices in the adjacency matrix to the corresponding node names.

Returns:

List of C-components in the graph.

dowhy.utils.graph_operations.find_predecessor(i, j, g)[source]#

Finds a predecessor, k, in the path between two nodes, i and j, in the graph, g.

dowhy.utils.graph_operations.get_random_node_pair(n)[source]#

Randomly generates a pair of nodes.

dowhy.utils.graph_operations.get_simple_ordered_tree(n)[source]#

Generates a simple-ordered tree. The tree is just a directed acyclic graph of n nodes with the structure 0 –> 1 –> …. –> n.

dowhy.utils.graph_operations.induced_graph(node_set, adjacency_matrix, node2idx)[source]#

Obtains the induced graph corresponding to a subset of nodes.

Parameters:
  • node_set – Subset of nodes for which the induced graph is obtained.

  • adjacency_matrix – Graph adjacency matrix.

  • node2idx – A dictionary mapping node names to their row or column index in the adjacency matrix.

Returns:

Numpy array representing the adjacency matrix of the induced graph.
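Taking an induced graph amounts to keeping only the rows and columns of the selected nodes. The sketch below uses nested lists instead of a numpy array and keeps the nodes in their original index order; both choices, and the name `induced_adjacency`, are assumptions.

```python
def induced_adjacency(node_set, adjacency_matrix, node2idx):
    """Adjacency matrix restricted to the rows and columns of node_set."""
    idx = sorted(node2idx[name] for name in node_set)
    return [[adjacency_matrix[i][j] for j in idx] for i in idx]


node2idx = {"X": 0, "Z": 1, "Y": 2}
matrix = [
    [0, 1, 1],  # X -> Z, X -> Y
    [0, 0, 1],  # Z -> Y
    [0, 0, 0],
]
print(induced_adjacency({"X", "Y"}, matrix, node2idx))  # [[0, 1], [0, 0]]
```

Only the direct edge X -> Y survives; the path through Z disappears because Z is not in the subset.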

dowhy.utils.graph_operations.is_connected(g)[source]#

Checks if the directed acyclic graph is connected.

dowhy.utils.graph_operations.str_to_dot(string)[source]#

Converts input string from graphviz library to valid DOT graph format.

Parameters:

string – Graph in DOT format.

Returns:

DOT string converted to a suitable format for the DoWhy library.

dowhy.utils.graphviz_plotting module#

dowhy.utils.graphviz_plotting.plot_causal_graph_graphviz(causal_graph: Graph, layout_prog: str | None = None, display_causal_strengths: bool = True, causal_strengths: Dict[Tuple[Any, Any], float] | None = None, colors: Dict[Any | Tuple[Any, Any], str] | None = None, filename: str | None = None, display_plot: bool = True, figure_size: Tuple[int, int] | None = None) None[source]#

dowhy.utils.networkx_plotting module#

dowhy.utils.networkx_plotting.plot_causal_graph_networkx(causal_graph: Graph, layout_prog: str | None = None, causal_strengths: Dict[Tuple[Any, Any], float] | None = None, colors: Dict[Any | Tuple[Any, Any], str] | None = None, filename: str | None = None, display_plot: bool = True, label_wrap_length: int = 3, figure_size: Tuple[int, int] | None = None) None[source]#

dowhy.utils.ordered_set module#

class dowhy.utils.ordered_set.OrderedSet(elements=None)[source]#

Bases: object

Python class for ordered set. Code taken from buyalsky/ordered-hash-set.

add(element)[source]#

Function to add an element to the set if it does not exist.

Parameters:

element – element to be added.

difference(other_set)[source]#

Function to remove elements in self._set which are also present in other_set.

Parameters:

other_set – The set to obtain difference with. Can be a list, set or OrderedSet.

Returns:

New OrderedSet representing the difference of elements in the self._set and other_set.

get_all()[source]#

Function to return list of all elements in the set.

Returns:

List of all items in the set.

intersection(other_set)[source]#

Function to compute the intersection of self._set and other_set.

Parameters:

other_set – The set to obtain intersection with. Can be a list, set or OrderedSet.

Returns:

New OrderedSet representing the set with elements common to the OrderedSet object and other_set.

is_empty()[source]#

Function to determine if the set is empty or not.

Returns:

True if the set is empty, False otherwise.

union(other_set)[source]#

Function to compute the union of self._set and other_set.

Parameters:

other_set – The set to obtain union with. Can be a list, set or OrderedSet.

Returns:

New OrderedSet representing the set with elements from the OrderedSet object and other_set.
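The behaviour documented for this class can be approximated with a dictionary, which preserves insertion order in Python 3.7+. The class below is a hypothetical stand-in written for illustration (it is not the library's code), and the resulting element order for each operation is an assumption.

```python
class SimpleOrderedSet:
    """Insertion-ordered set backed by a dict (illustrative sketch)."""

    def __init__(self, elements=None):
        self._items = dict.fromkeys(elements or [])

    @staticmethod
    def _as_list(other):
        # Accept a list, a set, or another SimpleOrderedSet.
        return other.get_all() if isinstance(other, SimpleOrderedSet) else list(other)

    def add(self, element):
        self._items.setdefault(element, None)

    def get_all(self):
        return list(self._items)

    def is_empty(self):
        return not self._items

    def union(self, other_set):
        return SimpleOrderedSet(self.get_all() + self._as_list(other_set))

    def intersection(self, other_set):
        other = set(self._as_list(other_set))
        return SimpleOrderedSet([e for e in self._items if e in other])

    def difference(self, other_set):
        other = set(self._as_list(other_set))
        return SimpleOrderedSet([e for e in self._items if e not in other])


a = SimpleOrderedSet(["Z", "X", "Y"])
print(a.union(["W", "X"]).get_all())         # ['Z', 'X', 'Y', 'W']
print(a.difference({"X"}).get_all())         # ['Z', 'Y']
print(a.intersection(["Y", "Z"]).get_all())  # ['Z', 'Y']
```

Each operation returns a new object and leaves the original untouched, matching the "New OrderedSet" wording in the return descriptions.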

dowhy.utils.plotting module#

dowhy.utils.plotting.bar_plot(values: Dict[str, float], uncertainties: Dict[str, Tuple[float, float]] | None = None, ylabel: str = '', filename: str | None = None, display_plot: bool = True, figure_size: List[int] | None = None, bar_width: float = 0.8, xticks: List[str] | None = None, xticks_rotation: int = 90, sort_names: bool = False) None[source]#

Convenience function to make a bar plot of the given values with uncertainty bars, if provided. Useful for all kinds of attribution results (including confidence intervals).

Parameters:
  • values – A dictionary where the keys are the labels and the values are the values to be plotted.

  • uncertainties – A dictionary of attributes to be added to the error bars.

  • ylabel – The label for the y-axis.

  • filename – An optional filename if the output should be plotted into a file.

  • display_plot – Optionally specify if the plot should be displayed or not (default to True).

  • figure_size – The size of the figure to be plotted.

  • bar_width – The width of the bars.

  • xticks – Explicitly specify the labels for the bars on the x-axis.

  • xticks_rotation – Specify the rotation of the labels on the x-axis.

  • sort_names – If True, the names in the plot are sorted alphabetically. If False, the order given in values is used.

dowhy.utils.plotting.plot(causal_graph: Graph, layout_prog: str | None = None, causal_strengths: Dict[Tuple[Any, Any], float] | None = None, colors: Dict[Any | Tuple[Any, Any], str] | None = None, filename: str | None = None, display_plot: bool = True, figure_size: Tuple[int, int] | None = None, **kwargs) None[source]#

Convenience function to plot causal graphs. This function uses different backends based on what’s available on the system. The best result is achieved when using Graphviz as the backend. This requires both the shared system library (e.g. brew install graphviz or apt-get install graphviz) and the Python pygraphviz package (pip install pygraphviz). When graphviz is not available, it will fall back to the networkx backend.

Parameters:
  • causal_graph – The graph to be plotted

  • layout_prog – Defines the layout type. If None is given, the ‘dot’ layout is used for graphviz plots and a customized layout for networkx plots.

  • causal_strengths – An optional dictionary with Edge -> float entries.

  • colors – An optional dictionary with color specifications for edges or nodes.

  • filename – An optional filename if the output should be plotted into a file.

  • display_plot – Optionally specify if the plot should be displayed or not (default to True).

  • figure_size – A tuple defining the width and height of the pyplot figure. This is used to modify pyplot’s ‘figure.figsize’ parameter. If None is given, the current/default value is used.

  • kwargs – Remaining parameters will be passed through to the backend verbatim.

Example usage:

>>> plot(nx.DiGraph([('X', 'Y')])) # plots X -> Y
>>> plot(nx.DiGraph([('X', 'Y')]), causal_strengths={('X', 'Y'): 0.43}) # annotates arrow with 0.43
>>> plot(nx.DiGraph([('X', 'Y')]), colors={('X', 'Y'): 'red', 'X': 'green'}) # colors X -> Y red and X green

dowhy.utils.plotting.plot_adjacency_matrix(adjacency_matrix: DataFrame, is_directed: bool, filename: str | None = None, display_plot: bool = True) None[source]#
dowhy.utils.plotting.pretty_print_graph(graph: DiGraph) None[source]#

Pretty print the graph edges with time lags.

Parameters:

graph (networkx.Graph) – The networkx graph.

Returns:

None

Return type:

None

dowhy.utils.propensity_score module#

dowhy.utils.propensity_score.binarize_discrete(data, covariates, variable_types)[source]#
dowhy.utils.propensity_score.binary_treatment_model(data, covariates, treatment, variable_types)[source]#
dowhy.utils.propensity_score.categorical_treatment_model(data, covariates, treatment, variable_types)[source]#
dowhy.utils.propensity_score.continuous_treatment_model(data, covariates, treatment, variable_types)[source]#
dowhy.utils.propensity_score.discrete_to_integer(discrete)[source]#
dowhy.utils.propensity_score.get_type_string(variables, variable_types)[source]#
dowhy.utils.propensity_score.propensity_of_treatment_score(data, covariates, treatment, model='logistic', variable_types=None)[source]#
dowhy.utils.propensity_score.state_propensity_score(data, covariates, treatments, variable_types=None)[source]#

dowhy.utils.regression module#

dowhy.utils.regression.create_polynomial_function(max_degree)[source]#

Creates a list of polynomial functions

Parameters:

max_degree – degree of the polynomial function to be created

Returns:

list of lambda functions
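Building a list of lambdas in a loop has a well-known pitfall: a plain closure looks up the loop variable when called, so every function would use its final value. The sketch below binds the degree as a default argument instead. Returning one monomial per degree from 0 to max_degree is an assumption about the shape of the result, and the name `make_polynomial_functions` is hypothetical.

```python
def make_polynomial_functions(max_degree):
    """Return [x**0, x**1, ..., x**max_degree] as a list of callables."""
    # d=degree captures the current value; without it all lambdas share one degree.
    return [lambda x, d=degree: x ** d for degree in range(max_degree + 1)]


funcs = make_polynomial_functions(3)
print([f(2) for f in funcs])  # [1, 2, 4, 8]
```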

dowhy.utils.regression.generate_moment_function(W, g)[source]#

Generates and returns the moment function m(W, g) = g(1, W) - g(0, W) for the Average Causal Effect.
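The moment function can be illustrated with plain Python. In this sketch g is assumed to be a callable g(t, w) that predicts the outcome for treatment value t and covariates w, W is a list of scalar covariate values, and both `moment` and `outcome_model` are hypothetical names.

```python
def moment(W, g):
    """Evaluate m(W, g) = g(1, W) - g(0, W) for each row of W."""
    return [g(1, w) - g(0, w) for w in W]


def outcome_model(t, w):
    """Hypothetical outcome model with a constant treatment effect of 2."""
    return 2.0 * t + 0.5 * w


W = [0.0, 1.0, 4.0]
values = moment(W, outcome_model)
print(sum(values) / len(values))  # 2.0
```

Averaging the per-row differences gives the Average Causal Effect implied by the model g; here the covariate term cancels and only the treatment coefficient remains.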

dowhy.utils.regression.get_numeric_features(X)[source]#

Finds the numeric feature columns in a dataset

Parameters:

X – pandas dataframe

Returns:

list of indices of numeric features

Module contents#