dowhy.gcm package#

Subpackages#

Submodules#

dowhy.gcm.anomaly module#

dowhy.gcm.anomaly.anomaly_scores(causal_model: ~dowhy.gcm.causal_models.ProbabilisticCausalModel, anomaly_data: ~pandas.core.frame.DataFrame, num_samples_conditional: int = 10000, num_samples_unconditional: int = 10000, anomaly_scorer_factory: ~typing.Callable[[], ~dowhy.gcm.anomaly_scorer.AnomalyScorer] = <class 'dowhy.gcm.anomaly_scorers.RescaledMedianCDFQuantileScorer'>) Dict[Any, ndarray][source]#
dowhy.gcm.anomaly.attribute_anomalies(causal_model: InvertibleStructuralCausalModel, target_node: Any, anomaly_samples: DataFrame, anomaly_scorer: AnomalyScorer | None = None, attribute_mean_deviation: bool = False, num_distribution_samples: int = 3000, shapley_config: ShapleyConfig | None = None) Dict[Any, ndarray][source]#

Estimates the contributions of upstream nodes to the anomaly score of the target_node for each sample in anomaly_samples. By default, the anomaly score is based on the information theoretic (IT) score -log(P(g(X) >= g(x))), where g is the anomaly_scorer, X is distributed according to the marginal distribution of the target_node, and x is an observation of the target_node in anomaly_samples. If attribute_mean_deviation is set to True, the contribution to g(x) - E[g(X)] is estimated instead, i.e. the feature relevance for the given scoring function. The underlying algorithm utilizes the reconstructed noise of upstream nodes (including the target_node itself) for the given anomaly_samples. This makes it possible to estimate how much of the anomaly score can be explained by anomalous noise values of upstream nodes.

Note: This function requires that the noise can be recovered from samples, i.e. the causal mechanisms of non-root nodes need to be an InvertibleFunctionalCausalModel (e.g. AdditiveNoiseModel).

Related paper: Janzing, D., Budhathoki, K., Minorics, L., & Bloebaum, P. (2022). Causal structure based root cause analysis of outliers. https://arxiv.org/abs/1912.02724

Parameters:
  • causal_model – The fitted InvertibleStructuralCausalModel.

  • target_node – Target node for which the contributions are estimated.

  • anomaly_samples – Anomalous observations for which the contributions are estimated.

  • anomaly_scorer – Anomaly scorer g. If None is given, a MedianCDFQuantileScorer is used.

  • attribute_mean_deviation – If set to False, the contribution is estimated based on the IT score; if set to True, it is based on the feature relevance with respect to the given scoring function.

  • num_distribution_samples – Number of samples from X, the marginal distribution of the target. These are used for evaluating the tail probability in case of the IT score (attribute_mean_deviation is False) or as samples for randomization in case of feature relevance (attribute_mean_deviation is True).

  • shapley_config – ShapleyConfig for the Shapley estimator.

Returns:

A dictionary that assigns a numpy array to each upstream node including the target_node itself. The i-th entry of an array indicates the contribution of the corresponding node to the anomaly score of the target for the i-th observation in anomaly_samples.
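
Example usage (a minimal sketch; normal_data and anomaly_data are assumed to be pandas DataFrames with columns 'X' and 'Y'):

>>> import networkx as nx
>>> from dowhy import gcm
>>> causal_model = gcm.InvertibleStructuralCausalModel(nx.DiGraph([('X', 'Y')]))
>>> gcm.auto.assign_causal_mechanisms(causal_model, normal_data)
>>> gcm.fit(causal_model, normal_data)
>>> contributions = gcm.attribute_anomalies(causal_model, target_node='Y', anomaly_samples=anomaly_data)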

dowhy.gcm.anomaly.attribute_anomaly_scores(anomaly_samples: ndarray, distribution_samples: ndarray, anomaly_scoring_func: Callable[[ndarray], ndarray], attribute_mean_deviation: bool, shapley_config: ShapleyConfig | None = None) ndarray[source]#

Estimates the contributions of the features for each sample in anomaly_samples to the anomaly score obtained by the anomaly_scoring_func. If attribute_mean_deviation is set to False, the anomaly score is based on the information theoretic (IT) score -log(P(g(X) >= g(x))), where g is the anomaly_scoring_func, X is distributed according to the marginal distribution of the target_node, and x is an observation of the target_node in anomaly_samples. If attribute_mean_deviation is set to True, the contribution to g(x) - E[g(X)] is estimated instead, i.e. the feature relevance for the given scorer.

Note that the anomaly scoring function needs to handle the dimension and modality of the data. An example of a function for multidimensional continuous data would be:

density_estimator = GaussianMixtureDensityEstimator()
density_estimator.fit(original_observations)
anomaly_scoring_func = lambda x, y: estimate_inverse_density_score(x, y, density_estimator)

Related paper: Janzing, D., Budhathoki, K., Minorics, L., & Bloebaum, P. (2022). Causal structure based root cause analysis of outliers. https://arxiv.org/abs/1912.02724

Parameters:
  • anomaly_samples – Samples x for which the contributions are estimated. The dimensionality of these samples doesn’t matter as long as the anomaly_scoring_func supports it.

  • distribution_samples – Samples from the (non-anomalous) distribution X.

  • anomaly_scoring_func – A function g that takes a sample from X as input and returns an anomaly score.

  • attribute_mean_deviation – If set to False, the contribution is estimated based on the IT score; if set to True, it is based on the feature relevance with respect to the given scoring function.

  • shapley_config – ShapleyConfig for the Shapley estimator.

Returns:

A numpy array with the feature contributions to the anomaly score for each sample in anomaly_samples.

dowhy.gcm.anomaly.conditional_anomaly_scores(parent_samples: ~numpy.ndarray, target_samples: ~numpy.ndarray, causal_mechanism: ~dowhy.gcm.causal_mechanisms.ConditionalStochasticModel, anomaly_scorer_factory: ~typing.Callable[[], ~dowhy.gcm.anomaly_scorer.AnomalyScorer] = <class 'dowhy.gcm.anomaly_scorers.MedianCDFQuantileScorer'>, num_samples_conditional: int = 10000) ndarray[source]#

Estimates the conditional anomaly scores based on the expected outcomes of the causal model.

Parameters:
  • parent_samples – Samples from all parents of the target node.

  • target_samples – Samples from the target node.

  • causal_mechanism – Causal mechanism of the target node.

  • anomaly_scorer_factory – A callable that returns an anomaly scorer.

  • num_samples_conditional – Number of samples drawn from the conditional distribution based on the given parent samples. The more samples, the more accurate the results.

Returns:

The conditional anomaly score for each sample in target_samples.

dowhy.gcm.anomaly_scorer module#

class dowhy.gcm.anomaly_scorer.AnomalyScorer[source]#

Bases: ABC

abstract fit(X: ndarray) None[source]#

Fits the anomaly scorer to the given data. Depending on the definition of the scorer, this can imply different things, such as fitting a (parametric) distribution to the data or estimating certain properties such as mean, variance, median etc. that are used for computing a score.

Parameters:

X – Samples from the underlying data distribution.

abstract score(X: ndarray) ndarray[source]#

dowhy.gcm.anomaly_scorers module#

This module contains implementations of different anomaly scorers.

class dowhy.gcm.anomaly_scorers.ITAnomalyScorer(anomaly_scorer: AnomalyScorer)[source]#

Bases: AnomalyScorer

Transforms any anomaly scorer into an information theoretic (IT) score. This means, given a scorer S(x), an anomalous observation x and samples from the distribution of X, this scorer class represents:

score(x) = -log(P(S(X) >= S(x)))

That is, the negative logarithm of the probability of getting the same or a higher score with (random) samples from X than the score obtained for the anomalous observation x. By this, the scores of arbitrarily different anomaly scorers become comparable information theoretic quantities. The new score -log(P(S(X) >= S(x))) can also be read as "The higher the score, the rarer the anomaly event". For instance, if we have S(x) = c, but observe the same or higher scores in 50% or even 100% of all samples in X, then this is not really a rare event, and thus, not an anomaly. As mentioned above, the IT transformation makes arbitrarily different anomaly scorers with potentially completely different scaling comparable. For example, one could compare the IT score of isolation forests with z-scores.

For more details about IT scores, see:

Budhathoki, K., Bloebaum, P., Minorics, L., & Janzing, D. (2022). Causal structure based root cause analysis of outliers. https://arxiv.org/abs/1912.02724

The higher the score, the higher the likelihood that the observation is an anomaly.
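
Example usage (a minimal sketch wrapping a MeanDeviationScorer):

>>> import numpy as np
>>> from dowhy.gcm.anomaly_scorers import ITAnomalyScorer, MeanDeviationScorer
>>> scorer = ITAnomalyScorer(MeanDeviationScorer())
>>> scorer.fit(np.random.normal(0, 1, 1000))
>>> scorer.score(np.array([4.0]))  # a rare event under N(0, 1) yields a high IT score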

fit(X: ndarray) None[source]#

Fits the anomaly scorer to the given data. Depending on the definition of the scorer, this can imply different things, such as fitting a (parametric) distribution to the data or estimating certain properties such as mean, variance, median etc. that are used for computing a score.

Parameters:

X – Samples from the underlying data distribution.

score(X: ndarray) ndarray[source]#
class dowhy.gcm.anomaly_scorers.InverseDensityScorer(density_estimator: DensityEstimator | None = None)[source]#

Bases: AnomalyScorer

Estimates an anomaly score based on 1 / p(x), where x is the data to score. The density value p(x) is estimated using the given density estimator. If None is given, a Gaussian mixture model is used by default.

Note: The given density estimator needs to support the data types, i.e. if the data has categorical values, the density estimator needs to be able to handle that. The default Gaussian model can only handle numeric data.

Note: If the density p(x) is 0, a nan or inf could be returned.
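
Example usage (a minimal sketch with the default Gaussian mixture density estimator):

>>> import numpy as np
>>> from dowhy.gcm.anomaly_scorers import InverseDensityScorer
>>> scorer = InverseDensityScorer()  # defaults to a Gaussian mixture density estimator
>>> scorer.fit(np.random.normal(0, 1, (1000, 2)))
>>> scorer.score(np.array([[5.0, 5.0]]))  # a low density p(x) yields a high score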

fit(X: ndarray) None[source]#

Fits the anomaly scorer to the given data. Depending on the definition of the scorer, this can imply different things, such as fitting a (parametric) distribution to the data or estimating certain properties such as mean, variance, median etc. that are used for computing a score.

Parameters:

X – Samples from the underlying data distribution.

score(X: ndarray) ndarray[source]#
class dowhy.gcm.anomaly_scorers.MeanDeviationScorer[source]#

Bases: AnomalyScorer

Given an anomalous observation x and samples from the distribution of X, this score represents:

score(x) = |x - E[X]| / std[X]

This scores the given sample based on its distance to the mean of X, scaled by the standard deviation of X. For Gaussian variables, this is equivalent to the Z-score.

The higher the score, the higher the deviation of the observation from the mean of X.

fit(X: ndarray) None[source]#

Fits the anomaly scorer to the given data. Depending on the definition of the scorer, this can imply different things, such as fitting a (parametric) distribution to the data or estimating certain properties such as mean, variance, median etc. that are used for computing a score.

Parameters:

X – Samples from the underlying data distribution.

score(X: ndarray) ndarray[source]#
class dowhy.gcm.anomaly_scorers.MedianCDFQuantileScorer[source]#

Bases: AnomalyScorer

Given an anomalous observation x and samples from the distribution of X, this score represents:

score(x) = 1 - 2 * min[P(X > x) + P(X = x) / 2, P(X < x) + P(X = x) / 2]

Here, the value x is considered as part of X for the computation.

When comparing values, two NaNs are considered equal here.

It scores the observation based on the quantile of x with respect to the distribution of X. Here, if the sample x lies in the tail of the distribution, we want a large score. Since we a priori don't know whether the sample falls on the left or the right side of the median of X, we estimate the quantile on both sides and take the minimum. These probabilities are estimated by counting, and since half of the samples lie on each side of the median, we multiply by a factor of two to obtain the two-sided quantile. For example:

X = [-3, -2, -1, 0, 1, 2, 3]
x = 2.5

Then, x falls into the right-sided quantile and only one sample in X is larger than x. Therefore, we get

P(X > x) = 1 / 8
P(X < x) = 6 / 8
P(X = x) = 1 / 8

We divide by 8 here, because we consider x itself. This gives us a score of:

1 - 2 * min[P(X > x) + P(X = x) / 2, P(X < x) + P(X = x) / 2] = 1 - 3 / 8 = 0.625

Note: For equal samples, we contribute half of the count to the left side and half of the count to the right side.

Note: For a statistically more rigorous, but also more conservative version, see RankBasedAnomalyScorer.
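
The derivation above can be verified numerically (a minimal sketch using the example values):

>>> import numpy as np
>>> from dowhy.gcm.anomaly_scorers import MedianCDFQuantileScorer
>>> scorer = MedianCDFQuantileScorer()
>>> scorer.fit(np.array([-3, -2, -1, 0, 1, 2, 3]))
>>> scorer.score(np.array([2.5]))  # expected: 1 - 3 / 8 = 0.625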

fit(X: ndarray) None[source]#

Fits the anomaly scorer to the given data. Depending on the definition of the scorer, this can imply different things, such as fitting a (parametric) distribution to the data or estimating certain properties such as mean, variance, median etc. that are used for computing a score.

Parameters:

X – Samples from the underlying data distribution.

score(X: ndarray) ndarray[source]#
class dowhy.gcm.anomaly_scorers.MedianDeviationScorer[source]#

Bases: AnomalyScorer

Given an anomalous observation x and samples from the distribution of X, this score represents:

score(x) = |x - med[X]| / mad[X]

This scores the given sample based on its distance to the median of X and scaled by the median absolute deviation of X.

The higher the score, the higher the deviation of the observation from the median of X.

fit(X: ndarray) None[source]#

Fits the anomaly scorer to the given data. Depending on the definition of the scorer, this can imply different things, such as fitting a (parametric) distribution to the data or estimating certain properties such as mean, variance, median etc. that are used for computing a score.

Parameters:

X – Samples from the underlying data distribution.

score(X: ndarray) ndarray[source]#
class dowhy.gcm.anomaly_scorers.RankBasedAnomalyScorer[source]#

Bases: AnomalyScorer

Similar to the RescaledMedianCDFQuantileScorer, but this scorer is more directly based on ranks and the assumption of exchangeability.

This scorer computes anomaly scores for test samples by evaluating their ranks within the training samples (and a given sample). For each test sample, the scorer computes its rank from above (number of samples greater than or equal to it) and rank from below (number of samples less than or equal to it). It then calculates a p-value based on these ranks, under the assumption of exchangeability. The p-value then represents the probability of observing a rank as extreme as the observed rank or more extreme.

Specifically, the p-value is computed as the minimum of:

  1. Twice the rank from above divided by the total number of samples.

  2. Twice the rank from below divided by the total number of samples.

  3. 1 (to ensure the p-value is at most 1).

This method is non-parametric and makes no assumptions about the underlying distribution of the data.

The anomaly score is then calculated as the negative log of this p-value (i.e. it is an information-theoretic (IT) score). Higher anomaly scores indicate a lower probability, consequently, a higher likelihood of being an anomaly.

For example:

X = [-3, -2, -1, 0, 1, 2, 3]
x = 2.5

Then,

P(X >= x) = 2 / 8
P(X <= x) = 7 / 8

Note that we count the sample x itself as equal here in both cases. This gives the anomaly score:

-log(min[1, 2 * 7 / 8, 2 * 2 / 8]) = -log(4 / 8) = 0.69314718
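
The derivation above can be verified numerically (a minimal sketch using the example values):

>>> import numpy as np
>>> from dowhy.gcm.anomaly_scorers import RankBasedAnomalyScorer
>>> scorer = RankBasedAnomalyScorer()
>>> scorer.fit(np.array([-3, -2, -1, 0, 1, 2, 3]))
>>> scorer.score(np.array([2.5]))  # expected: -log(4 / 8) ~= 0.693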

fit(X: ndarray) None[source]#

Fits the anomaly scorer to the given data. Depending on the definition of the scorer, this can imply different things, such as fitting a (parametric) distribution to the data or estimating certain properties such as mean, variance, median etc. that are used for computing a score.

Parameters:

X – Samples from the underlying data distribution.

score(X: ndarray) ndarray[source]#
class dowhy.gcm.anomaly_scorers.RescaledMedianCDFQuantileScorer[source]#

Bases: AnomalyScorer

Given an anomalous observation x and samples from the distribution of X, this score represents:

score(x) = -log(2 * min[P(X > x) + P(X = x) / 2, P(X < x) + P(X = x) / 2])

When comparing values, two NaNs are considered equal here.

This is a rescaled version of the score s obtained by the MedianCDFQuantileScorer by calculating the negative log-probability -log(1 - s). This has the advantage that small differences in the probabilities are amplified, especially when they are close to 0. For instance, the difference between the probabilities 0.02 and 0.01 seems to be small and insignificant, but the rescaled difference would be significantly larger: -log(0.02) - log(0.01) ≈ 8.5

The higher the score, the less likely the sample comes from the distribution of X.
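
The relation to the MedianCDFQuantileScorer can be illustrated as follows (a minimal sketch; up to numerical clipping, the rescaled score should approximately equal -log(1 - s)):

>>> import numpy as np
>>> from dowhy.gcm.anomaly_scorers import MedianCDFQuantileScorer, RescaledMedianCDFQuantileScorer
>>> X = np.random.normal(0, 1, 1000)
>>> quantile_scorer = MedianCDFQuantileScorer()
>>> quantile_scorer.fit(X)
>>> rescaled_scorer = RescaledMedianCDFQuantileScorer()
>>> rescaled_scorer.fit(X)
>>> s = quantile_scorer.score(np.array([3.0]))
>>> rescaled_scorer.score(np.array([3.0]))  # approximately -np.log(1 - s)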

fit(X: ndarray) None[source]#

Fits the anomaly scorer to the given data. Depending on the definition of the scorer, this can imply different things, such as fitting a (parametric) distribution to the data or estimating certain properties such as mean, variance, median etc. that are used for computing a score.

Parameters:

X – Samples from the underlying data distribution.

score(X: ndarray) ndarray[source]#

dowhy.gcm.auto module#

class dowhy.gcm.auto.AssignmentQuality(value)[source]#

Bases: Enum

An enumeration.

BEST = 3#
BETTER = 2#
GOOD = 1#
class dowhy.gcm.auto.AutoAssignmentSummary[source]#

Bases: object

Summary class for logging and storing information of the auto assignment process.

add_model_performance(node, model: str, performance: str, metric_name: str)[source]#
add_node_log_message(node: Any, message: str)[source]#
dowhy.gcm.auto.assign_causal_mechanism_node(causal_model: ProbabilisticCausalModel, node: str, based_on: DataFrame, quality: AssignmentQuality) List[Tuple[Callable[[], PredictionModel], float, str]][source]#
dowhy.gcm.auto.assign_causal_mechanisms(causal_model: ProbabilisticCausalModel, based_on: DataFrame, quality: AssignmentQuality = AssignmentQuality.GOOD, override_models: bool = False) AutoAssignmentSummary[source]#

Automatically assigns appropriate causal mechanisms to nodes. If causal mechanisms are already assigned to nodes and override_models is set to False, this function only validates the assignments with respect to the graph structure. That is, the validation checks whether root nodes have StochasticModels and non-root nodes have ConditionalStochasticModels assigned.

The following types of causal mechanisms are considered for the automatic selection:

If root node: An empirical distribution, i.e., the distribution is represented by randomly sampling from the provided data. This provides a flexible and non-parametric way to model the marginal distribution and is valid for all types of data modalities.

If non-root node and the data is continuous: Additive Noise Models (ANM) of the form X_i = f(PA_i) + N_i, where PA_i are the parents of X_i and the unobserved noise N_i is assumed to be independent of PA_i. To select the best model for f, different regression models are evaluated and the model with the smallest mean squared error is selected. Note that minimizing the mean squared error here is equivalent to selecting the best choice of an ANM. See the following paper for more details:

Hoyer, P., Janzing, D., Mooij, J. M., Peters, J., & Schölkopf, B. (2008). Nonlinear causal discovery with additive noise models. Advances in neural information processing systems, 21

If non-root node and the data is discrete: Discrete Additive Noise Models have almost the same definition as non-discrete ANMs, but come with an additional constraint to return discrete values. Note that ‘discrete’ here refers to numerical values with an order. If the data is categorical, consider representing them as strings to ensure proper model selection. See the following paper for more details:

Peters, J., Janzing, D., & Scholkopf, B. (2011). Causal inference on discrete data using additive noise models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(12), 2436-2450.

If non-root node and the data is categorical: A functional causal model based on a classifier, i.e., X_i = f(PA_i, N_i). Here, N_i follows a uniform distribution on [0, 1] and is used to randomly sample a class (category) using the conditional probability distribution produced by a classification model. Here, different model classes are evaluated using the (negative) F1 score and the best performing model class is selected.

The current model zoo is:

With “GOOD” quality:

Numerical:

  • Linear Regressor

  • Linear Regressor with polynomial features

  • Histogram Gradient Boost Regressor

Categorical:

  • Logistic Regressor

  • Logistic Regressor with polynomial features

  • Histogram Gradient Boost Classifier

With “BETTER” quality:

Numerical:

  • Linear Regressor

  • Linear Regressor with polynomial features

  • Gradient Boost Regressor

  • Ridge Regressor

  • Lasso Regressor

  • Random Forest Regressor

  • Support Vector Regressor

  • Extra Trees Regressor

  • KNN Regressor

  • Ada Boost Regressor

Categorical:

  • Logistic Regressor

  • Logistic Regressor with polynomial features

  • Histogram Gradient Boost Classifier

  • Random Forest Classifier

  • Extra Trees Classifier

  • Support Vector Classifier

  • KNN Classifier

  • Gaussian Naive Bayes Classifier

  • Ada Boost Classifier

With “BEST” quality: An auto ML model based on AutoGluon (optional dependency, needs to be installed).

Parameters:
  • causal_model – The causal model whose nodes causal mechanisms are assigned to.

  • based_on – Jointly sampled data corresponding to the nodes of the given graph.

  • quality – AssignmentQuality for the automatic model selection and model accuracy. This changes the type of prediction model and the time spent on the selection. See the docstring for a list of potential models. The options for the quality are:

    • AssignmentQuality.GOOD: Only a small set of models are evaluated.

      Model selection speed: Fast
      Model training speed: Fast
      Model inference speed: Fast
      Model accuracy: Medium

    • AssignmentQuality.BETTER: A larger set of models are evaluated.

      Model selection speed: Medium
      Model training speed: Fast
      Model inference speed: Fast
      Model accuracy: Good

    • AssignmentQuality.BEST: Uses an AutoGluon (auto ML) model with default settings defined by the AutoGluon wrapper. While the model selection itself is fast, the training and inference speed can be significantly slower than in the other options. NOTE: This requires the optional autogluon.tabular dependency.

      Model selection speed: Instant
      Model training speed: Slow
      Model inference speed: Slow-Medium
      Model accuracy: Best

  • override_models – If set to True, existing mechanism assignments are replaced with automatically selected ones. If set to False, the assigned mechanisms are only validated with respect to the graph structure.

Returns:

A summary object containing details about the model selection process.
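
Example usage (a minimal sketch; data is assumed to be a pandas DataFrame with columns 'X', 'Y' and 'Z'):

>>> import networkx as nx
>>> from dowhy import gcm
>>> causal_model = gcm.ProbabilisticCausalModel(nx.DiGraph([('X', 'Y'), ('Y', 'Z')]))
>>> summary = gcm.auto.assign_causal_mechanisms(causal_model, data, quality=gcm.auto.AssignmentQuality.GOOD)
>>> print(summary)  # summarizes which mechanism was selected for each node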

dowhy.gcm.auto.find_best_model(prediction_model_factories: List[Callable[[], PredictionModel]], X: ndarray, Y: ndarray, metric: Callable[[ndarray, ndarray], float] | None = None, max_samples_per_split: int = 20000, model_selection_splits: int = 5, n_jobs: int | None = None) Tuple[Callable[[], PredictionModel], List[Tuple[Callable[[], PredictionModel], float, str]]][source]#
dowhy.gcm.auto.has_linear_relationship(X: ndarray, Y: ndarray, max_num_samples: int = 3000) bool[source]#
dowhy.gcm.auto.select_model(X: ndarray, Y: ndarray, model_selection_quality: AssignmentQuality) Tuple[PredictionModel | ClassificationModel, List[Tuple[Callable[[], PredictionModel], float, str]]][source]#

dowhy.gcm.causal_mechanisms module#

This module implements different causal mechanisms.

class dowhy.gcm.causal_mechanisms.AdditiveNoiseModel(prediction_model: PredictionModel, noise_model: StochasticModel | None = None)[source]#

Bases: PostNonlinearModel

Represents the continuous functional causal model of the form

Y = f(X) + N,

where X is the input (typically, direct causal parents of Y) and the noise N is assumed to be independent of X. This is a special instance of a PostNonlinearModel where the function g is the identity function.

Given joint samples from (X, Y), this model can be fitted by first training a model f (e.g. using least squares regression) and then reconstructing N as the residual N = Y - f(X).
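
Example usage (a minimal sketch with synthetic data and a linear regression model for f):

>>> import numpy as np
>>> from dowhy.gcm.causal_mechanisms import AdditiveNoiseModel
>>> from dowhy.gcm.ml import create_linear_regressor
>>> X = np.random.normal(0, 1, (1000, 1))
>>> Y = 2 * X[:, 0] + np.random.normal(0, 0.1, 1000)
>>> anm = AdditiveNoiseModel(prediction_model=create_linear_regressor())
>>> anm.fit(X, Y)
>>> noise = anm.estimate_noise(Y, X)  # residuals N = Y - f(X)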

Parameters:
  • prediction_model – The prediction model f.

  • noise_model – The StochasticModel to describe the distribution of the noise N.

clone()[source]#
class dowhy.gcm.causal_mechanisms.ClassifierFCM(classifier_model: ClassificationModel | None = None)[source]#

Bases: FunctionalCausalModel, ProbabilityEstimatorModel

Represents the categorical functional causal model of the form

Y = f(X, N),

where X is the input (typically, direct causal parents of Y) and the noise N here is uniform on [0, 1]. The model is mostly based on a standard classification model that outputs probabilities. In order to generate a new random sample given an input x, the return value y is uniformly sampled based on the class probabilities p(y | x). Here, the noise is used to make this sampling process deterministic by using the cumulative distribution functions defined by the given inputs.

property classifier_model: ClassificationModel#
clone()[source]#
draw_noise_samples(num_samples: int) ndarray[source]#

Returns uniformly sampled values on [0, 1].

Parameters:

num_samples – Number of noise samples.

Returns:

Noise samples on [0, 1].

estimate_probabilities(parent_samples: ndarray) ndarray[source]#

Returns the class probabilities for the given parent_samples.

Parameters:

parent_samples – Samples from inputs X.

Returns:

An n x d numpy matrix with class probabilities for each sample, where n is the number of samples and d the number of classes. Here, array entry A[i][j] corresponds to the probability of the j-th class for the i-th sample.

evaluate(parent_samples: ndarray, noise_samples: ndarray) ndarray[source]#

Evaluates the model Y = f(X, N), where X are the parent_samples and N the noise_samples. Here, the cumulative distribution functions are defined by the parent_samples. For instance, let's say we have 2 classes, n = 0.7 and an input x with p(y = 0 | x) = 0.6 and p(y = 1 | x) = 0.4. Then, we get y = 1 as a return value, because p(y = 0 | x) < n <= 1.0, i.e. n falls into the bucket that is spanned by p(y = 1 | x).

Parameters:
  • parent_samples – Samples from the inputs X.

  • noise_samples – Samples from the noise on [0, 1].

Returns:

Class labels Y based on the inputs and noise.
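
The bucketing described above can be illustrated with plain numpy (a sketch of the selection logic, not the internal implementation):

>>> import numpy as np
>>> probs = np.array([0.6, 0.4])  # p(y = 0 | x) and p(y = 1 | x)
>>> noise = 0.7                   # uniform noise sample on [0, 1]
>>> int(np.searchsorted(np.cumsum(probs), noise))  # bucket index, i.e. the sampled class
1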

fit(X: ndarray, Y: ndarray) None[source]#

Fits the underlying classification model.

Parameters:
  • X – Input samples.

  • Y – Target labels.

Returns:

None

get_class_names(class_indices: ndarray) List[str][source]#
class dowhy.gcm.causal_mechanisms.ConditionalStochasticModel[source]#

Bases: ABC

A conditional stochastic model represents a model used for causal mechanisms for non-root nodes in a graphical causal model.

abstract clone()[source]#
abstract draw_samples(parent_samples: ndarray) ndarray[source]#

Draws samples for the fitted model.

abstract fit(X: ndarray, Y: ndarray) None[source]#

Fits the model according to the data.

class dowhy.gcm.causal_mechanisms.DiscreteAdditiveNoiseModel(prediction_model: PredictionModel, noise_model: StochasticModel | None = None)[source]#

Bases: AdditiveNoiseModel

Implements a discrete ANM. That is, it follows a standard ANM of the form Y = f(X) + N, where N is assumed to be independent of X and f is forced to output discrete values. To allow for flexible models, f can be any regression model whose output is rounded to a discrete value accordingly. Note that this remains a valid additive noise model, but it assumes that Y can take any integer value.

Parameters:
  • prediction_model – The prediction model f.

  • noise_model – The StochasticModel to describe the distribution of the noise N.

clone()[source]#
estimate_noise(target_samples: ndarray, parent_samples: ndarray) ndarray[source]#

Reconstruct the noise given samples from (X, Y). This is done by:

  1. Transform Y via the inverse of g: g^-1(Y) = f(X) + N

  2. Return the residual g^-1(Y) - f(X)

Parameters:
  • target_samples – Samples from the target Y.

  • parent_samples – Samples from the input X.

Returns:

The reconstructed noise based on the given samples.

evaluate(parent_samples: ndarray, noise_samples: ndarray) ndarray[source]#

Evaluates the post non-linear model given samples (X, N). This is done by:

  1. Evaluate f(X)

  2. Evaluate f(X) + N

  3. Return g(f(X) + N)

Parameters:
  • parent_samples – Samples from the inputs X.

  • noise_samples – Samples from the noise N.

Returns:

The Y values based on the given samples.

fit(X: ndarray, Y: ndarray) None[source]#

Fits the post non-linear model of the form Y = g(f(X) + N). Here, this consists of three steps given samples from (X, Y):

  1. Transform Y via the inverse of g: g^-1(Y) = f(X) + N

  2. Fit the model for f on (X, g^-1(Y))

  3. Reconstruct N based on the residual N = g^-1(Y) - f(X)

Note that the noise here can be inferred uniquely if the model assumption holds.

Parameters:
  • X – Samples from the input X.

  • Y – Samples from the target Y.

Returns:

None

class dowhy.gcm.causal_mechanisms.FunctionalCausalModel[source]#

Bases: ConditionalStochasticModel

Represents a Functional Causal Model (FCM), a specific type of conditional stochastic model, that is defined as:

Y := f(X, N), N: Noise

abstract draw_noise_samples(num_samples: int) ndarray[source]#
draw_samples(parent_samples: ndarray) ndarray[source]#

Draws samples for the fitted model.

abstract evaluate(parent_samples: ndarray, noise_samples: ndarray) ndarray[source]#
class dowhy.gcm.causal_mechanisms.InvertibleFunctionalCausalModel[source]#

Bases: FunctionalCausalModel, ABC

abstract estimate_noise(target_samples: ndarray, parent_samples: ndarray) ndarray[source]#
class dowhy.gcm.causal_mechanisms.PostNonlinearModel(prediction_model: PredictionModel, noise_model: StochasticModel, invertible_function: InvertibleFunction)[source]#

Bases: InvertibleFunctionalCausalModel

Represents a post nonlinear FCM, i.e. models of the form:

Y = g(f(X) + N),

where X are parent nodes of the target node Y, f an arbitrary prediction model expecting inputs from the parents X, N a noise variable and g an invertible function.

Parameters:
  • prediction_model – The prediction model f.

  • invertible_function – The invertible function g.

  • noise_model – The StochasticModel to describe the distribution of the noise N.

clone()[source]#
draw_noise_samples(num_samples: int) ndarray[source]#

Draws samples from the noise distribution N.

Parameters:

num_samples – Number of noise samples.

Returns:

A numpy array containing num_samples samples from the noise.

estimate_noise(target_samples: ndarray, parent_samples: ndarray) ndarray[source]#

Reconstruct the noise given samples from (X, Y). This is done by:

  1. Transform Y via the inverse of g: g^-1(Y) = f(X) + N

  2. Return the residual g^-1(Y) - f(X)

Parameters:
  • target_samples – Samples from the target Y.

  • parent_samples – Samples from the input X.

Returns:

The reconstructed noise based on the given samples.

evaluate(parent_samples: ndarray, noise_samples: ndarray) ndarray[source]#

Evaluates the post non-linear model given samples (X, N). This is done by:

  1. Evaluate f(X)

  2. Evaluate f(X) + N

  3. Return g(f(X) + N)

Parameters:
  • parent_samples – Samples from the inputs X.

  • noise_samples – Samples from the noise N.

Returns:

The Y values based on the given samples.

fit(X: ndarray, Y: ndarray) None[source]#

Fits the post non-linear model of the form Y = g(f(X) + N). Here, this consists of three steps given samples from (X, Y):

  1. Transform Y via the inverse of g: g^-1(Y) = f(X) + N

  2. Fit the model for f on (X, g^-1(Y))

  3. Reconstruct N based on the residual N = g^-1(Y) - f(X)

Note that the noise here can be inferred uniquely if the model assumption holds.

Parameters:
  • X – Samples from the input X.

  • Y – Samples from the target Y.

Returns:

None

property invertible_function: InvertibleFunction#
property noise_model: StochasticModel#
property prediction_model: PredictionModel#
class dowhy.gcm.causal_mechanisms.ProbabilityEstimatorModel[source]#

Bases: ABC

abstract estimate_probabilities(parent_samples: ndarray) ndarray[source]#
class dowhy.gcm.causal_mechanisms.StochasticModel[source]#

Bases: ABC

A stochastic model represents a model used for causal mechanisms for root nodes in a graphical causal model.

abstract clone()[source]#
abstract draw_samples(num_samples: int) ndarray[source]#

Draws samples for the fitted model.

abstract fit(X: ndarray) None[source]#

Fits the model according to the data.

dowhy.gcm.causal_models module#

This module defines the fundamental classes for graphical causal models (GCMs).

class dowhy.gcm.causal_models.InvertibleStructuralCausalModel(graph: ~dowhy.graph.DirectedGraph | None = None, graph_copier: ~typing.Callable[[~dowhy.graph.DirectedGraph], ~dowhy.graph.DirectedGraph] = <class 'networkx.classes.digraph.DiGraph'>, remove_existing_mechanisms: bool = False)[source]#

Bases: StructuralCausalModel

Represents an invertible structural graphical causal model, as required e.g. by counterfactual_samples(). This is a subclass of StructuralCausalModel and has further restrictions on the class of causal mechanisms. Here, the mechanisms of non-root nodes need to be invertible with respect to the noise, such as PostNonlinearModel.

Parameters:
  • graph – Optional graph object to be used as causal graph.

  • graph_copier – Optional function that can copy a causal graph. Defaults to a networkx.DiGraph constructor.

  • remove_existing_mechanisms – If True, removes existing causal mechanisms assigned to nodes if they exist. Otherwise, does not modify graph.

causal_mechanism(node: Any) StochasticModel | InvertibleFunctionalCausalModel[source]#

Returns the generative causal model of node in the causal graph.

Parameters:

node – Target node whose causal mechanism should be returned.

Returns:

The causal mechanism for this node. A root node is of type StochasticModel, whereas a non-root node is of type InvertibleFunctionalCausalModel.

set_causal_mechanism(target_node: Any, mechanism: StochasticModel | InvertibleFunctionalCausalModel) None[source]#

Assigns the generative causal model of node in the causal graph.

Parameters:
  • target_node – Target node whose causal mechanism is to be assigned.

  • mechanism – Causal mechanism to be assigned. A root node must be a StochasticModel, whereas a non-root node must be an InvertibleFunctionalCausalModel.

class dowhy.gcm.causal_models.ProbabilisticCausalModel(graph: ~dowhy.graph.DirectedGraph | None = None, graph_copier: ~typing.Callable[[~dowhy.graph.DirectedGraph], ~dowhy.graph.DirectedGraph] = <class 'networkx.classes.digraph.DiGraph'>, remove_existing_mechanisms: bool = False)[source]#

Bases: object

Represents a probabilistic graphical causal model, i.e. it combines a graphical representation of causal relationships and a corresponding causal mechanism for each node describing the data generation process. The causal mechanisms can be any general stochastic models.

Parameters:
  • graph – Optional graph object to be used as causal graph.

  • graph_copier – Optional function that can copy a causal graph. Defaults to a networkx.DiGraph constructor.

  • remove_existing_mechanisms – If True, removes existing causal mechanisms assigned to nodes if they exist. Otherwise, does not modify graph.

causal_mechanism(node: Any) StochasticModel | ConditionalStochasticModel[source]#

Returns the generative causal model of node in the causal graph.

Parameters:

node – Target node whose causal mechanism should be returned.

Returns:

The causal mechanism for this node. A root node is of type StochasticModel, whereas a non-root node is of type ConditionalStochasticModel.

clone()[source]#

Clones the causal model, but keeps causal mechanisms untrained.

set_causal_mechanism(node: Any, mechanism: StochasticModel | ConditionalStochasticModel) None[source]#

Assigns the generative causal model of node in the causal graph.

Parameters:
  • node – Target node whose causal model is to be assigned.

  • mechanism – Causal mechanism to be assigned. A root node must be a StochasticModel, whereas a non-root node must be a ConditionalStochasticModel.

class dowhy.gcm.causal_models.StructuralCausalModel(graph: ~dowhy.graph.DirectedGraph | None = None, graph_copier: ~typing.Callable[[~dowhy.graph.DirectedGraph], ~dowhy.graph.DirectedGraph] = <class 'networkx.classes.digraph.DiGraph'>, remove_existing_mechanisms: bool = False)[source]#

Bases: ProbabilisticCausalModel

Represents a structural causal model (SCM), as required e.g. by counterfactual_samples(). As compared to a ProbabilisticCausalModel, an SCM describes the data generation process in non-root nodes by functional causal models.

Parameters:
  • graph – Optional graph object to be used as causal graph.

  • graph_copier – Optional function that can copy a causal graph. Defaults to a networkx.DiGraph constructor.

  • remove_existing_mechanisms – If True, removes existing causal mechanisms assigned to nodes if they exist. Otherwise, does not modify graph.

causal_mechanism(node: Any) StochasticModel | FunctionalCausalModel[source]#

Returns the generative causal model of node in the causal graph.

Parameters:

node – Target node whose causal mechanism should be returned.

Returns:

The causal mechanism for this node. A root node is of type StochasticModel, whereas a non-root node is of type FunctionalCausalModel.

set_causal_mechanism(node: Any, mechanism: StochasticModel | FunctionalCausalModel) None[source]#

Assigns the generative causal model of node in the causal graph.

Parameters:
  • node – Target node whose causal model is to be assigned.

  • mechanism – Causal mechanism to be assigned. A root node must be a StochasticModel, whereas a non-root node must be a FunctionalCausalModel.

dowhy.gcm.causal_models.clone_causal_models(source: HasNodes, destination: HasNodes)[source]#
dowhy.gcm.causal_models.validate_causal_dag(causal_graph: DirectedGraph) None[source]#
dowhy.gcm.causal_models.validate_causal_graph(causal_graph: DirectedGraph) None[source]#
dowhy.gcm.causal_models.validate_causal_model_assignment(causal_graph: DirectedGraph, target_node: Any) None[source]#
dowhy.gcm.causal_models.validate_local_structure(causal_graph: DirectedGraph, node: Any) None[source]#
dowhy.gcm.causal_models.validate_node(causal_graph: DirectedGraph, node: Any) None[source]#
dowhy.gcm.causal_models.validate_node_has_causal_model(causal_graph: HasNodes, node: Any) None[source]#

dowhy.gcm.confidence_intervals module#

This module provides functionality to estimate confidence intervals via bootstrapping.

dowhy.gcm.confidence_intervals.confidence_intervals(estimation_func: ~typing.Callable[[], ~numpy.ndarray] | ~typing.Callable[[], ~typing.Dict[~typing.Any, float]], confidence_level: float = 0.95, num_bootstrap_resamples: int = 20, bootstrap_results_summary_func: ~typing.Callable[[~numpy.ndarray], ~numpy.ndarray] = <function estimate_geometric_median>, n_jobs: int = 1) Tuple[ndarray | Dict[Any, ndarray], ndarray | Dict[Any, ndarray]][source]#

Estimates confidence intervals based on the outputs generated by calling the given estimation_func. Since one result is produced for each repetition, all results can be summarized by the function given in bootstrap_results_summary_func. For instance, use bootstrap_results_summary_func = lambda x: numpy.mean(x, axis=0) to get the mean over all runs. By default, the geometric median is returned.

Currently, the confidence intervals are empirically estimated based on the n-th estimated quantiles (without bias correction) of the results, where the quantiles are determined by the given confidence_level.

NOTE: The outputs of estimation_func are assumed to be pairwise independent. For multidimensional outputs of estimation_func, this could be violated and should be kept in mind. For instance, when evaluating the outcome of interventions in a graph like X -> Y -> Z, the confidence intervals are estimated independently for X, Y and Z, although they have a strong dependency. If estimation_func returns one-dimensional results, as, for instance, when estimating the direct arrow strength, then there should be no problem.

Example usage with numpy array output:

>>> def estimation_func() -> np.ndarray:
>>>     return direct_arrow_strength_of_model(causal_model, parent_data)
>>>
>>> arrow_strengths, confidence_intervals = confidence_intervals(estimation_func)

Example usage with dictionary output:

>>> def estimation_func() -> Dict[Any, float]:
>>>     return distribution_change(
>>>             causal_dag, original_observations, outlier_observations, 'X3')
>>>
>>> mean_contributions, confidence_intervals = confidence_intervals(estimation_func)

More details about the estimation of confidence intervals via bootstrapping can be found here.

Parameters:
  • estimation_func – Function that generates a non-deterministic output for which the confidence interval(s) are estimated.

  • confidence_level – Confidence level of the interval.

  • num_bootstrap_resamples – Number of samples generated by estimation_func, i.e. the number of times it is called. The higher the number, the more accurate the results and intervals, but the slower the runtime.

  • bootstrap_results_summary_func – Function that takes a numpy array with all results as an input and returns a single (potentially multidimensional) value/vector. For instance, the mean or median over all results.

  • n_jobs – Number of parallel jobs. Each repetition can be estimated in parallel. However, since many other functions of the library already run in parallel (such as distribution change), this is set to 1 by default. Only if it is certain that the estimation_func does not run in parallel internally (e.g. when performing interventions) should this be set to a different value.

Returns:

A tuple (summarized result over all repetitions based on bootstrap_results_summary_func, confidence interval for each dimension/variable).

dowhy.gcm.confidence_intervals.estimate_geometric_median(X: ndarray) ndarray[source]#

dowhy.gcm.confidence_intervals_cms module#

This module provides functionality to estimate confidence intervals via bootstrapping the fitting and sampling.

dowhy.gcm.confidence_intervals_cms.fit_and_compute(f: Callable[[ProbabilisticCausalModel | StructuralCausalModel | InvertibleStructuralCausalModel, Any], Dict[Any, ndarray | float]], causal_model: ProbabilisticCausalModel | StructuralCausalModel | InvertibleStructuralCausalModel, bootstrap_training_data: DataFrame, bootstrap_data_subset_size_fraction: float = 0.75, auto_assign_quality: AssignmentQuality | None = None, *args, **kwargs)[source]#

A convenience function when computing confidence intervals specifically for causal queries. This function specifically bootstraps training and sampling.

Example usage:

>>> scores_median, scores_intervals = gcm.confidence_intervals(
>>>     gcm.fit_and_compute(gcm.arrow_strength,
>>>                         causal_model,
>>>                         bootstrap_training_data=data,
>>>                         target_node='Y'))

Parameters:
  • f – The causal query to perform. A causal query is a function taking a graphical causal model as first parameter and an arbitrary number of remaining parameters. It must return a dictionary with attribution-like data.

  • causal_model – A graphical causal model to perform the causal query on. It need not be fitted.

  • bootstrap_training_data – The training data to use when fitting. A random subset from this data set is used in every iteration when calling fit.

  • bootstrap_data_subset_size_fraction – The fractional size of the bootstrap subset relative to the total training data.

  • auto_assign_quality – If a quality is provided, then the existing causal mechanisms in the given causal_model are overridden by new automatically inferred mechanisms based on the provided AssignmentQuality. If None is given, the existing assigned mechanisms are used.

  • args – Args passed through verbatim to the causal queries.

  • kwargs – Keyword args passed through verbatim to the causal queries.

Returns:

A tuple containing (1) the median of causal query results and (2) the confidence intervals.

dowhy.gcm.config module#

dowhy.gcm.config.disable_progress_bars()[source]#
dowhy.gcm.config.enable_progress_bars()[source]#
dowhy.gcm.config.set_default_n_jobs(n_jobs: int) None[source]#

dowhy.gcm.constant module#

dowhy.gcm.density_estimator module#

class dowhy.gcm.density_estimator.DensityEstimator[source]#

Bases: ABC

abstract density(X: ndarray) ndarray[source]#

Returns the density of each input.

abstract fit(X: ndarray) None[source]#

dowhy.gcm.density_estimators module#

This module contains implementations of different density estimators.

class dowhy.gcm.density_estimators.GaussianMixtureDensityEstimator(num_components: int | None = None)[source]#

Bases: DensityEstimator

Represents a density estimator based on a Gaussian mixture model. The estimator uses the sklearn BayesianGaussianMixture model internally.

density(X: ndarray) ndarray[source]#

Returns the density of each input.

fit(X: ndarray) None[source]#
class dowhy.gcm.density_estimators.KernelDensityEstimator1D[source]#

Bases: DensityEstimator

Represents a kernel based density estimator. The estimator uses the sklearn KernelDensity class internally.

density(X: ndarray) ndarray[source]#

Returns the density of each input.

fit(X: ndarray) None[source]#

dowhy.gcm.distribution_change module#

This module defines functions to attribute distribution changes.

dowhy.gcm.distribution_change.distribution_change(causal_model: ~dowhy.gcm.causal_models.ProbabilisticCausalModel, old_data: ~pandas.core.frame.DataFrame, new_data: ~pandas.core.frame.DataFrame, target_node: ~typing.Any, invariant_nodes: ~typing.List[~typing.Any] | None = None, num_samples: int = 2000, difference_estimation_func: ~typing.Callable[[~numpy.ndarray, ~numpy.ndarray], float] = <function auto_estimate_kl_divergence>, independence_test: ~typing.Callable[[~numpy.ndarray, ~numpy.ndarray], float] = <function kernel_based>, conditional_independence_test: ~typing.Callable[[~numpy.ndarray, ~numpy.ndarray, ~numpy.ndarray], float] = <function kernel_based>, mechanism_change_test_significance_level: float = 0.05, mechanism_change_test_fdr_control_method: str | None = 'fdr_bh', auto_assignment_quality: ~dowhy.gcm.auto.AssignmentQuality | None = None, return_additional_info: bool = False, shapley_config: ~dowhy.gcm.shapley.ShapleyConfig | None = None, graph_factory: ~typing.Callable[[~typing.Any], ~dowhy.graph.DirectedGraph] = <class 'networkx.classes.digraph.DiGraph'>) Dict[Any, float] | Tuple[Dict[Any, float], Dict[Any, bool], ProbabilisticCausalModel, ProbabilisticCausalModel][source]#

Attributes the change in the marginal distribution of the target_node to nodes upstream in the causal DAG.

Note that this method creates two copies of the causal DAG. The causal models of one causal DAG are learned from old data and those of another DAG are learned from new data.

Research Paper: Kailash Budhathoki, Dominik Janzing, Patrick Bloebaum, Hoiyi Ng. Why did the distribution change?. Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:1666-1674, 2021.

Parameters:
  • causal_model – Reference causal model.

  • old_data – Joint samples from the ‘old’ distribution.

  • new_data – Joint samples from the ‘new’ distribution.

  • target_node – Target node of interest for attributing the marginal distribution change.

  • invariant_nodes – List of nodes where the mechanism is kept constant regardless of changes in the datasets being analyzed.

  • num_samples – Number of samples used for estimating Shapley values. This can have a significant influence on runtime and accuracy.

  • difference_estimation_func – Function for quantifying the distribution change. This function should expect two inputs which represent samples from two different distributions, e.g. difference in average values.

  • independence_test – Unconditional independence test. This is used to identify mechanism changes in root nodes.

  • conditional_independence_test – Conditional independence test. This is used to identify mechanism changes in non-root nodes.

  • mechanism_change_test_significance_level – A significance level for rejecting the null hypothesis that the causal mechanism of a node has not changed.

  • mechanism_change_test_fdr_control_method – The false discovery rate control method for mechanism change tests. For more options, check out the statsmodels manual.

  • auto_assignment_quality – If set to None, the assigned models from the given causal models are used for the old and new graph. However, they are re-fitted on the given data. If set to a valid assignment quality, new models are automatically assigned to the old and new graph based on the respective data.

  • return_additional_info – If set to True, three additional items are returned: a dictionary indicating whether each node’s mechanism changed, the causal DAG whose causal models are learned from old data, and the causal DAG whose causal models are learned from new data.

  • shapley_config – Configuration for the Shapley estimator.

  • graph_factory – Allows customization in case a graph class different than networkx.DiGraph should be used. This function must copy nodes and edges. Attributes of nodes will be overridden in the copy, so the algorithm is independent of the attribute copy behavior of this factory.

Returns:

By default, if return_additional_info is set to False, only the dictionary containing the contribution of each upstream node is returned. If return_additional_info is set to True, three additional items are returned: a dictionary indicating whether each node's mechanism changed, the causal DAG whose causal models are learned from old data, and the causal DAG whose causal models are learned from new data.
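
Example usage (a minimal sketch; old_data and new_data are assumed to be pandas DataFrames with columns 'X', 'Y' and 'Z'):

>>> import networkx as nx
>>> from dowhy import gcm
>>> causal_model = gcm.ProbabilisticCausalModel(nx.DiGraph([('X', 'Y'), ('Y', 'Z')]))
>>> gcm.auto.assign_causal_mechanisms(causal_model, old_data)
>>> attributions = gcm.distribution_change(causal_model, old_data, new_data, 'Z')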

dowhy.gcm.distribution_change.distribution_change_of_graphs(causal_model_old: ~dowhy.gcm.causal_models.ProbabilisticCausalModel, causal_model_new: ~dowhy.gcm.causal_models.ProbabilisticCausalModel, target_node: ~typing.Any, num_samples: int = 2000, difference_estimation_func: ~typing.Callable[[~numpy.ndarray, ~numpy.ndarray], float] = <function auto_estimate_kl_divergence>, shapley_config: ~dowhy.gcm.shapley.ShapleyConfig | None = None, graph_factory: ~typing.Callable[[~typing.Any], ~dowhy.graph.DirectedGraph] = <class 'networkx.classes.digraph.DiGraph'>) Dict[Any, float][source]#

Attributes the change of the marginal distribution of target_node to upstream nodes based on the distributions generated by the ‘old’ and ‘new’ causal graphs. These graphs are assumed to represent the same causal structure and to be fitted on the respective data.

Note: This method creates a copy of the given causal models, i.e. the original objects will not be modified.

Related paper: Budhathoki, K., Janzing, D., Bloebaum, P., & Ng, H. (2021). Why did the distribution change? arXiv preprint arXiv:2102.13384.

Parameters:
  • causal_model_old – The ProbabilisticCausalModel fitted on the ‘old’ data.

  • causal_model_new – The ProbabilisticCausalModel fitted on the ‘new’ data.

  • target_node – Node of interest for attributing the marginal distribution change.

  • num_samples – Number of samples used for the estimation. This can have a significant influence on the runtime and accuracy.

  • difference_estimation_func – Function for quantifying the distribution change. This function should expect two inputs which represent samples from two different distributions. An example could be the KL divergence.

  • shapley_config – Config for the Shapley estimator.

  • graph_factory – Allows customization in case a graph class different than networkx.DiGraph should be used. This function must copy nodes and edges. Attributes of nodes will be overridden in the copy, so the algorithm is independent of the attribute copy behavior of this factory.

Returns:

A dictionary containing the contributions of upstream nodes to the marginal distribution change in the target node.

dowhy.gcm.distribution_change.estimate_distribution_change_scores(causal_model: ~dowhy.gcm.causal_models.ProbabilisticCausalModel, original_data: ~pandas.core.frame.DataFrame, new_data: ~pandas.core.frame.DataFrame, difference_estimation_func: ~typing.Callable[[~numpy.ndarray, ~numpy.ndarray], ~numpy.ndarray | float] = <function auto_estimate_kl_divergence>, max_num_evaluation_samples: int = 1000, num_joint_samples: int = 500, early_stopping_percentage: float = 0.01, independence_test: ~typing.Callable[[~numpy.ndarray, ~numpy.ndarray], float] = <function kernel_based>, conditional_independence_test: ~typing.Callable[[~numpy.ndarray, ~numpy.ndarray, ~numpy.ndarray], float] = <function kernel_based>, mechanism_change_test_significance_level: float = 0.05, mechanism_change_test_fdr_control_method: str | None = 'fdr_bh') Dict[Any, float][source]#

Given newly observed and original samples from the joint distribution of the given causal graphical model, this method estimates a score for each node that quantifies how much the distribution of the node has changed. For this, it first checks whether the underlying causal mechanism has changed at all and, if this is the case, it estimates the difference between the new and original distributions. The score is based on the quantity measured by the provided difference_estimation_func or 0 if no mechanism change has been detected.

Note that for each parent sample, num_joint_samples conditional samples are generated based on the original and new causal mechanism and evaluated by the given difference_estimation_func function. These results are then averaged over multiple different parent samples.

Parameters:
  • causal_model – The underlying causal model based on the original data.

  • original_data – Samples from the original data.

  • new_data – Samples from the new data.

  • difference_estimation_func – Function for quantifying the distribution change. This function should expect two inputs which represent samples from two different distributions. An example could be the KL divergence.

  • max_num_evaluation_samples – Maximum number of (parent) samples for evaluating the difference in distributions.

  • num_joint_samples – Number of samples generated in a node per parent sample.

  • early_stopping_percentage – If the change in percentage between multiple consecutive runs is below this threshold, the evaluation stops before evaluating all max_num_evaluation_samples.

  • independence_test – Unconditional independence test. This is used to identify mechanism changes in root nodes.

  • conditional_independence_test – Conditional independence test. This is used to identify mechanism changes in non-root nodes.

  • mechanism_change_test_significance_level – A significance level for rejecting the null hypothesis that the causal mechanism of a node has not changed.

  • mechanism_change_test_fdr_control_method – The false discovery rate control method for mechanism change tests. For more options, check out the statsmodels manual.

Returns:

A dictionary assigning a score to each node in the causal graph.

dowhy.gcm.distribution_change.mechanism_change_test(target_original_data: ~numpy.ndarray, target_new_data: ~numpy.ndarray, parents_original_data: ~numpy.ndarray | None = None, parents_new_data: ~numpy.ndarray | None = None, independence_test: ~typing.Callable[[~numpy.ndarray, ~numpy.ndarray], float] = <function kernel_based>, conditional_independence_test: ~typing.Callable[[~numpy.ndarray, ~numpy.ndarray, ~numpy.ndarray], float] = <function kernel_based>) float[source]#

Estimates a p-value for the null hypothesis that the original and new data were generated by the same mechanism. Here, we check the dependency between the samples and binary labels that indicate whether a sample comes from the original or the new data set. If the samples do not provide information for determining whether they come from the original or the new distribution, then it is likely that the mechanism has not changed.

For non-root nodes, samples from the parent variables are needed as conditioning variables. That is, we test the null hypothesis that the data were generated by the same mechanism given the parent samples. By this, we incorporate upstream changes that might have impacted the parents, but not the target node itself.

Parameters:
  • target_original_data – Samples of the node from the original data set.

  • target_new_data – Samples of the node from the new data set.

  • parents_original_data – Samples from parents of the node from the original data set.

  • parents_new_data – Samples from parents of the node from the new data set.

  • independence_test – Unconditional independence test. This is used to identify mechanism changes in nodes without parents.

  • conditional_independence_test – Conditional independence test. This is used to identify mechanism changes in nodes with parents.

Returns:

A p-value for the null hypothesis that the mechanism has not changed.
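As a minimal, illustrative sketch (synthetic data; the expected p-values are approximate, not guaranteed):

    import numpy as np
    from dowhy.gcm.distribution_change import mechanism_change_test

    rng = np.random.default_rng(0)

    # Root node whose marginal distribution shifts between the two data sets.
    x_old = rng.normal(0.0, 1.0, 1000)
    x_new = rng.normal(1.0, 1.0, 1000)
    print(mechanism_change_test(x_old, x_new))  # small p-value expected

    # Non-root node with an unchanged mechanism Y = 2X + noise: given the parent
    # samples, the labels carry no information, so a large p-value is expected.
    y_old = 2 * x_old + rng.normal(0.0, 0.1, 1000)
    y_new = 2 * x_new + rng.normal(0.0, 0.1, 1000)
    print(mechanism_change_test(y_old, y_new, x_old, x_new))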

dowhy.gcm.divergence module#

dowhy.gcm.divergence.auto_estimate_kl_divergence(X: ndarray, Y: ndarray) float[source]#
dowhy.gcm.divergence.estimate_kl_divergence_categorical(X: ndarray, Y: ndarray) float[source]#
dowhy.gcm.divergence.estimate_kl_divergence_continuous_clf(samples_P: ~numpy.ndarray, samples_Q: ~numpy.ndarray, n_splits: int = 5, classifier_model: ~dowhy.gcm.auto.AssignmentQuality | ~typing.Callable[[], ~dowhy.gcm.ml.classification.ClassificationModel] = functools.partial(<function create_logistic_regression_classifier>, max_iter=10000), epsilon: float = 2.220446049250313e-16) float[source]#

Estimates the KL divergence based on the probabilities given by a classifier trained to distinguish between samples from P and Q. This is:

D(P || Q) = int p(x) log(p(x) / q(x)) dx ~= 1/N sum_{x ~ P} log(p(Y = 1 | x) / (1 - p(Y = 1 | x)))

That is, the KL divergence is approximated via the average log ratio of the classifier's probabilities for predicting whether a sample comes from distribution P or Q.

Parameters:
  • samples_P – Samples drawn from P. Can have a different number of samples than Q.

  • samples_Q – Samples drawn from Q. Can have a different number of samples than P.

  • n_splits – Number of splits of the training and test data. The classifier is trained on the training data and evaluated on the test data to obtain the probabilities.

  • classifier_model – Used to estimate the probabilities for the log ratio. This can either be a ClassificationModel or an AssignmentQuality. In the latter, a model is automatically selected based on the best performance on a training set.

  • epsilon – If the probability is either 1 or 0, this value will be used for clipping, i.e., 0 becomes epsilon and 1 becomes 1 - epsilon.

Returns:

Estimated value of the KL divergence D(P||Q).
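A small sketch on two Gaussians whose KL divergence is known analytically (synthetic data; the estimate will only be approximately 0.25):

    import numpy as np
    from dowhy.gcm.divergence import estimate_kl_divergence_continuous_clf

    rng = np.random.default_rng(0)
    samples_p = rng.normal(0.0, 1.0, size=(2000, 2))  # P = N(0, I)
    samples_q = rng.normal(0.5, 1.0, size=(2000, 2))  # Q = N(0.5, I)

    # For isotropic unit-variance Gaussians, D(P||Q) = ||mu_P - mu_Q||^2 / 2 = 0.25.
    print(estimate_kl_divergence_continuous_clf(samples_p, samples_q))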

dowhy.gcm.divergence.estimate_kl_divergence_continuous_knn(X: ndarray, Y: ndarray, k: int = 1, remove_common_elements: bool = True, n_jobs: int = 1) float[source]#

Estimates KL-Divergence using k-nearest neighbours (Wang et al., 2009).

While, in theory, this handles multidimensional inputs, consider using estimate_kl_divergence_continuous_clf for data with more than one dimension.

Q. Wang, S. R. Kulkarni, and S. Verdú, “Divergence estimation for multidimensional densities via k-nearest-neighbor distances”, IEEE Transactions on Information Theory, vol. 55, no. 5, pp. 2392-2405, May 2009.

Parameters:
  • X – (N_1,D) Sample drawn from distribution P_X

  • Y – (N_2,D) Sample drawn from distribution P_Y

  • k – Number of neighbors to consider.

  • remove_common_elements – If True, common values in X and Y are removed. These would otherwise lead to a KNN distance of zero if k is set to 1, which would cause a division by zero error.

  • n_jobs – Number of parallel jobs used for the nearest neighbors model. -1 means it uses all available cores. Note that in most applications, parallelizing this introduces more overhead than it saves, leading to a slower runtime.

Returns:

Estimated value of D(P_X||P_Y).
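A comparable sketch for the kNN estimator, again with an analytically known divergence (synthetic data):

    import numpy as np
    from dowhy.gcm.divergence import estimate_kl_divergence_continuous_knn

    rng = np.random.default_rng(0)
    x = rng.normal(0.0, 1.0, size=(3000, 1))  # samples from P_X = N(0, 1)
    y = rng.normal(1.0, 1.0, size=(3000, 1))  # samples from P_Y = N(1, 1)

    # For equal variances, D(P_X||P_Y) = (mu_X - mu_Y)^2 / 2 = 0.5.
    print(estimate_kl_divergence_continuous_knn(x, y, k=1))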

dowhy.gcm.divergence.estimate_kl_divergence_of_probabilities(X: ndarray, Y: ndarray) float[source]#

Estimates the Kullback-Leibler divergence between each pair of probability vectors (row wise) in X and Y separately and returns the mean over all results.

dowhy.gcm.divergence.is_probability_matrix(X: ndarray) bool[source]#

dowhy.gcm.falsify module#

This module provides functionality to falsify a user-given DAG given observed data.

class dowhy.gcm.falsify.EvaluationResult(summary: dict, significance_level: float, suggestions: dict | None = None)[source]#

Bases: object

Dataclass containing the evaluation result of falsifying a graph using a node-permutation test.

Attributes#

summary : dict

Dictionary containing the summary of the evaluation.

significance_level : float

Significance level based on which we falsify the given DAG.

falsifiable : bool

Whether the given DAG is falsifiable.

falsified : bool

Whether the given DAG is falsified.

significance_level: float#
suggestions: dict | None = None#
summary: dict#
update_significance_level(significance_level: float)[source]#

Update the significance level to decide if we falsify a given DAG.

class dowhy.gcm.falsify.FalsifyConst(value)[source]#

Bases: Enum

An enumeration.

F_GIVEN_VIOLATIONS = 7#
F_PERM_VIOLATIONS = 8#
GIVEN_VIOLATIONS = 5#
LOCAL_VIOLATION_INSIGHT = 9#
MEC = 16#
METHOD = 10#
N_TESTS = 2#
N_VIOLATIONS = 1#
PERM_GRAPHS = 15#
PERM_VIOLATIONS = 6#
P_VALUE = 3#
P_VALUES = 4#
VALIDATE_CM = 14#
VALIDATE_LMC = 11#
VALIDATE_PD = 13#
VALIDATE_TPA = 12#
dowhy.gcm.falsify.apply_suggestions(causal_graph: DirectedGraph, evaluation_result: EvaluationResult, edges_to_keep: List[Tuple[Any, Any]] | None = None)[source]#
dowhy.gcm.falsify.falsify_graph(causal_graph: ~dowhy.graph.DirectedGraph, data: ~pandas.core.frame.DataFrame, suggestions: bool = False, independence_test: ~typing.Callable[[~numpy.ndarray, ~numpy.ndarray], float] = <function kernel_based>, conditional_independence_test: ~typing.Callable[[~numpy.ndarray, ~numpy.ndarray, ~numpy.ndarray], float] = <function kernel_based>, significance_level: float = 0.05, significance_ci: float = 0.05, n_permutations: int | None = None, show_progress_bar: bool | None = None, n_jobs: int | None = None, plot_histogram: bool = False, plot_kwargs: ~typing.Dict | None = None, allow_data_subset: bool = True) EvaluationResult[source]#

Falsify a given DAG using observational data.

This method returns the result of a permutation test to falsify a user-given DAG using observational data. To this end, we construct the test statistic by testing for violations of the local Markov conditions (LMC) implied by the graph using conditional independence (CI) tests. The null distribution is given by the number of LMC violations of random node-permutations of the given graph. Our test can hence be interpreted as testing whether the given graph is significantly better than random in terms of the CIs it entails. To determine whether a given graph is falsifiable by our metric, we implement a second test, which reports whether the given graph is “characteristic” enough in terms of the CIs it entails. For this, we compute how many of the random node permutations lie in the same Markov equivalence class (MEC) as the given graph and conclude that the given graph is falsifiable only if the fraction of permuted DAGs in the same MEC as the given graph is “reasonably” small.

The returned EvaluationResult object has two attributes: falsified and falsifiable:

falsifiable: The given graph lies in a different MEC than >= 1 - significance_level of the permuted DAGs.

falsified: The given graph is falsifiable and violates fewer LMCs than >= 1 - significance_level of the permuted DAGs.

By default, we only run 1 / significance_level permutations, as those are enough to falsify a graph with type I error probability significance_level. If you are interested in a more exact estimate of the p-value or wish to plot a histogram to see how the given DAG compares to random node permutations, you should set n_permutations to some larger value (e.g., 100 or 1000). If n_permutations=-1, we test on all n_nodes! permutations (the default if plot_histogram=True).

Additionally, this method can return suggestions to the user (suggestions=True). This is done by testing for violations of causal minimality via validate_cm.

Related paper:

Eulig, E., Mastakouri, A. A., Blöbaum, P., Hardt, M., & Janzing, D. (2023). Toward Falsifying Causal Graphs Using a Permutation-Based Test. https://arxiv.org/abs/2305.09565

Parameters:
  • causal_graph – A directed acyclic graph (DAG).

  • data – Observations of variables in the DAG.

  • suggestions – Provide suggestions to the user. At the moment the only source of suggestions comes from validating causal minimality (using validate_cm).

  • independence_test – Independence test to use for checking pairwise independencies.

  • conditional_independence_test – Conditional independence test to use.

  • significance_level – Significance level for the permutation test.

  • significance_ci – Significance level for (conditional) independence tests.

  • n_permutations – Number of permutations to perform. If -1 use all n_nodes! permutations.

  • show_progress_bar – Whether to show progress bar over permutations.

  • n_jobs – Number of jobs to use for parallel execution of (conditional) independence tests.

  • plot_histogram – Plot histogram of results from permutation baseline.

  • plot_kwargs – Additional plot arguments to be passed to plot_evaluation_results.

  • allow_data_subset – If True, performs the evaluation even if data is only available for a subset of nodes. If False, raises an error if not all nodes have data available.

Returns:

EvaluationResult
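A minimal usage sketch on synthetic data; the graph, sample sizes, and number of permutations are only illustrative:

    import networkx as nx
    import numpy as np
    import pandas as pd
    from dowhy.gcm.falsify import falsify_graph

    rng = np.random.default_rng(0)
    x = rng.normal(size=1000)
    y = x + rng.normal(scale=0.5, size=1000)
    z = y + rng.normal(scale=0.5, size=1000)
    data = pd.DataFrame({"X": x, "Y": y, "Z": z})

    # The chain X -> Y -> Z matches the generating process and should not be falsified.
    dag = nx.DiGraph([("X", "Y"), ("Y", "Z")])
    result = falsify_graph(dag, data, n_permutations=20)
    print(result.falsified, result.falsifiable)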

dowhy.gcm.falsify.plot_evaluation_results(evaluation_result, figsize=(8, 3), bins=None, title='', savepath='', display=True)[source]#
dowhy.gcm.falsify.plot_local_insights(causal_graph: DirectedGraph, evaluation_result: EvaluationResult | Dict, method: str | None = FalsifyConst.VALIDATE_LMC)[source]#

For some given graph and evaluation result, plot local violations.

Parameters:
  • causal_graph – DiGraph

  • evaluation_result – EvaluationResult

  • method – Method for which to plot violations

dowhy.gcm.falsify.run_validations(causal_graph: ~dowhy.graph.DirectedGraph, data: ~pandas.core.frame.DataFrame, methods: ~typing.Callable | ~typing.Tuple[~typing.Callable, ...] | ~typing.List[~typing.Callable] | None = functools.partial(<function validate_lmc>, independence_test=<function kernel_based>, conditional_independence_test=<function kernel_based>)) Dict[str, Dict][source]#

Validate a given causal graph using observational data and some given methods. If methods are provided, they must be wrapped in a partial object, with their respective parameters. E.g., if one wants to test the local Markov conditions and the pairwise dependencies (unconditional faithfulness), then call

    run_validations(G, data, methods=(
        partial(validate_lmc, independence_test=…, conditional_independence_test=…),
        partial(validate_pd, independence_test=…),
    ))

Parameters:
  • causal_graph – A directed acyclic graph (DAG).

  • data – Observations of variables in the DAG.

  • methods – Method functions wrapped in wrap_partial. E.g. wrap_partial(validate_lmc, data=data, independence_test=…, conditional_independence_test=…). If no methods are provided we run validate_lmc with optional keyword arguments provided to run_validations.

Returns:

Validation summary as dict.
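A runnable variant of the snippet above, reusing dag and data from the falsify_graph sketch and the kernel_based tests that this module uses by default:

    from functools import partial

    from dowhy.gcm.falsify import run_validations, validate_lmc, validate_pd
    from dowhy.gcm.independence_test import kernel_based

    # dag and data as defined in the falsify_graph sketch above.
    summary = run_validations(
        dag,
        data,
        methods=(
            partial(validate_lmc, independence_test=kernel_based,
                    conditional_independence_test=kernel_based),
            partial(validate_pd, independence_test=kernel_based),
        ),
    )
    print(summary)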

dowhy.gcm.falsify.validate_cm(causal_graph: ~dowhy.graph.DirectedGraph, data: ~pandas.core.frame.DataFrame, p_values_memory: ~dowhy.gcm.falsify._PValuesMemory | None = None, independence_test: ~typing.Callable[[~numpy.ndarray, ~numpy.ndarray], float] = <function kernel_based>, conditional_independence_test: ~typing.Callable[[~numpy.ndarray, ~numpy.ndarray, ~numpy.ndarray], float] = <function kernel_based>, significance_level: float = 0.05, n_jobs: int | None = None) Dict[str, int | Dict[tuple, float]][source]#

Function to test causal minimality of a DAG (see [1], Proposition 6.36).

[1] J. Peters, D. Janzing, and B. Schölkopf, Elements of Causal Inference: Foundations and Learning Algorithms. Cambridge, MA, USA: MIT Press, 2017.

Parameters:
  • causal_graph – A directed acyclic graph (DAG).

  • data – Observations of variables in the DAG.

  • p_values_memory – _PValuesMemory object, where results of previously performed tests are stored.

  • independence_test – Independence test to use.

  • conditional_independence_test – Conditional independence test to use.

  • significance_level – Significance level for independence tests.

  • n_jobs – Number of jobs to use for parallel execution of (conditional) independence tests.

Returns:

Validation summary as dict.

dowhy.gcm.falsify.validate_lmc(causal_graph: ~dowhy.graph.DirectedGraph, data: ~pandas.core.frame.DataFrame, p_values_memory: ~dowhy.gcm.falsify._PValuesMemory | None = None, independence_test: ~typing.Callable[[~numpy.ndarray, ~numpy.ndarray], float] = <function kernel_based>, conditional_independence_test: ~typing.Callable[[~numpy.ndarray, ~numpy.ndarray, ~numpy.ndarray], float] = <function kernel_based>, significance_level: float = 0.05, include_unconditional: bool = True, n_jobs: int | None = None) Dict[str, int | Dict[str, float]][source]#

Validate the local Markov condition for a given directed graph. Returns the number of violations and the p-values for each node.

Parameters:
  • causal_graph – A directed acyclic graph (DAG).

  • data – Observations of variables in the DAG.

  • p_values_memory – _PValuesMemory instance, where results of previously performed tests are stored.

  • independence_test – Test to use for unconditional independencies (only used if include_unconditional=True)

  • conditional_independence_test – Conditional independence test to use for checking local Markov condition.

  • significance_level – Significance level for (conditional) independence tests.

  • include_unconditional – Test also unconditional independencies of root nodes.

  • n_jobs – Number of jobs to use for parallel execution of (conditional) independence tests.

Returns:

Outcome of the validation, containing the number of violations in the graph and the p-values/violations for each tuple (node, non_desc).

dowhy.gcm.falsify.validate_pd(causal_graph: ~dowhy.graph.DirectedGraph, data: ~pandas.core.frame.DataFrame, p_values_memory: ~dowhy.gcm.falsify._PValuesMemory | None = None, n_pairs: int = -1, independence_test: ~typing.Callable[[~numpy.ndarray, ~numpy.ndarray], float] = <function kernel_based>, significance_level: float = 0.05, adjacent_only: bool = False, n_jobs: int | None = None) Dict[str, int | Dict[tuple, float]][source]#

Validate pairwise dependencies (pd) for a given causal graph and data. Tests for each node whether it is statistically dependent on all of its ancestors.

Parameters:
  • causal_graph – A directed acyclic graph (DAG).

  • data – Observations of variables in the DAG.

  • p_values_memory – _PValuesMemory object, where results of previously performed tests are stored.

  • n_pairs – Evaluate dependencies for n_pairs <= all pairs in the DAG. If n_pairs=-1, evaluate dependencies for all (ancestor, node) pairs (default).

  • independence_test – Independence test to use for checking pairwise dependencies.

  • significance_level – Significance level for independence tests.

  • adjacent_only – Only test adjacent node pairs.

  • n_jobs – Number of jobs to use for parallel execution of (conditional) independence tests.

Returns:

Summary dict: {n_violations: int, n_tests: int, p_values: {(ancestor, node): float, …}}

dowhy.gcm.falsify.validate_tpa(causal_graph: DirectedGraph, causal_graph_reference: DirectedGraph, include_unconditional: bool = True) Dict[str, int][source]#

Graphical criterion to evaluate which pairwise parental d-separations (parental triples) in causal_graph are violated, assuming causal_graph_reference is the ground-truth graph. If none are violated, then both graphs lie in the same Markov equivalence class. Specifically, we test:

X _|_G’ Y | Z and X _/|_G Y | Z for Y in ND_X^G’, Z = PA_X^G,

where ND_X^G’ denotes the non-descendants of X in G’ and PA_X^G the parents of X in G.

Parameters:
  • causal_graph – Causal graph for which to evaluate parental d-separations (G’)

  • causal_graph_reference – Causal graph where we test if d-separation holds (G)

  • include_unconditional – Test also unconditional independencies of root nodes.

Returns:

Validation summary with number of d-separations implied by causal_graph and number of times these are violated in the graph causal_graph_reference.

dowhy.gcm.feature_relevance module#

This module allows estimating the feature relevance of inputs with respect to a given model. While these models can be black-box prediction models, it is also possible to explain causal mechanisms with respect to the direct parents. In these cases, the noise can be incorporated to represent the part of the generation process that cannot be explained by the parents.

dowhy.gcm.feature_relevance.feature_relevance_distribution(prediction_method: Callable[[ndarray], ndarray], feature_samples: ndarray, subset_scoring_func: Callable[[ndarray, ndarray], ndarray | float], max_num_samples_randomization: int = 5000, max_num_baseline_samples: int = 500, max_batch_size: int = 100, randomize_features_jointly: bool = True, shapley_config: ShapleyConfig | None = None) ndarray[source]#

Estimates the population based feature relevance of the input features for the given prediction_method. This method uses all samples given in feature_samples by comparing the output of the prediction_method given certain features are randomized with the outputs when no features are randomized. The subset_scoring_func defines how these predictions are compared. For instance, the variance of deviations.

If the randomized predictions should instead be compared to the original data, this can be defined via the set function by ignoring the second input parameter (the predicted values using all features) and using the original data instead.

Note: The distribution level relevance is estimated by taking the expectation of the outcome of the set functions when applied to multiple samples. Due to the linearity of the Shapley value estimation, this is equivalent to taking the expectation over the Shapley values.

Related paper: Janzing, D., Minorics, L., & Bloebaum, P. (2020). Feature relevance quantification in explainable AI: A causal problem. In International Conference on Artificial Intelligence and Statistics (pp. 2907-2916). PMLR.

Parameters:
  • prediction_method – A callable that is expected to return a prediction for given samples.

  • feature_samples – Samples from the joint distribution.

  • subset_scoring_func – Set function for estimating the quantity of interest based on the model outcomes. This function expects two inputs; the outcome of the prediction model for some samples if certain features are permuted and the outcome of the model for the same samples when no features were permuted. The set functions represents the comparison between the samples, for instance, the variance of deviations. This is then used as the ‘characteristic function’ in coalition games when estimating the Shapley values.

  • max_num_samples_randomization – Maximum number of samples used for randomizing the features that are not in the subset. Consider increasing this number for more accurate results (if enough samples are available) or reducing it for less memory consumption and faster runtime.

  • max_num_baseline_samples – Maximum number of samples on which the set function is evaluated on. These samples are used as fixed observations for features that are in the subset. For instance, in case of taking the mean as set_function_summary_func, this defines the maximum number of samples used to estimate the mean. Consider increasing this number for more accurate results (if enough samples are available) or reducing it for less memory consumption and faster runtime.

  • max_batch_size – Maximum batch size for estimating the predictions. This has a significant influence on the overall memory usage. If set to -1, all samples are used in one batch.

  • randomize_features_jointly – If set to True, features that are not in a subset are jointly permuted. Note that this still represents an interventional distribution. If set to False, features that are not in a subset are independently permuted. Note: The theory in the linked publication assumes that this is set to True.

  • shapley_config – Config for the Shapley estimator.

Returns:

A numpy array with the feature relevance of each input feature.

dowhy.gcm.feature_relevance.feature_relevance_sample(prediction_method: Callable[[ndarray], ndarray], feature_samples: ndarray, baseline_samples: ndarray, subset_scoring_func: Callable[[ndarray, ndarray], ndarray | float], baseline_target_values: ndarray | None = None, average_set_function: bool = False, max_batch_size: int = 100, randomize_features_jointly: bool = True, shapley_config: ShapleyConfig | None = None) ndarray[source]#

Estimates the feature relevance of the prediction_method for each sample in baseline_samples. This method uses all samples given in feature_samples as ‘background’ samples. That is, they should represent samples from the joint distribution of the input features. The subset_scoring_func defines the comparison between the output of the prediction_method when certain features are randomized and the outputs when no features are randomized. The most common function would be the difference between the expectations.

If the randomized predictions should instead be compared to the original data, this can be defined via the set function by ignoring the second input parameter (the predicted values using all features) and using the original data instead.

Related paper: Janzing, D., Minorics, L., & Bloebaum, P. (2020). Feature relevance quantification in explainable AI: A causal problem. In International Conference on Artificial Intelligence and Statistics (pp. 2907-2916). PMLR.

Parameters:
  • prediction_method – A callable that is expected to return a prediction for given samples.

  • feature_samples – Samples from the joint distribution. These are used as ‘background samples’ to randomize features that are not in a subset.

  • baseline_samples – Samples for which the feature relevance should be estimated.

  • subset_scoring_func – Set function for estimating the quantity of interest based on the model outcomes. This function expects two inputs; the outcome of the prediction model for some samples if certain features are permuted and the outcome of the model for the same samples when no features were permuted. A typical choice for regression models would be the difference between expectations. This is then used as the ‘characteristic function’ in coalition games when estimating the Shapley values.

  • baseline_target_values – These baseline values are compared with the subset-specific outcomes of the prediction method. If set to None (default), the baseline values are the outcomes of the given prediction_method applied to the baseline_samples, i.e. the outcome of the empty subset.

  • max_batch_size – Maximum batch size for estimating the predictions. This has a significant influence on the overall memory usage. If set to -1, all samples are used in one batch.

  • average_set_function – If set to True, the averaged result of the set function applied to each sample of interest is used for estimating the Shapley values. If set to False, Shapley values for each sample of interest are estimated separately.

  • randomize_features_jointly – If set to True, features that are not in a subset are jointly permuted. Note that this still represents an interventional distribution. If set to False, features that are not in a subset are independently permuted. Note: The theory in the linked publication assumes that this is set to True.

  • shapley_config – Config for the Shapley estimator.

Returns:

A numpy array with the feature relevance for each sample in baseline_samples.
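A minimal sketch with a hypothetical linear prediction function and a simple difference-of-means set function (both are illustrative choices, not part of the API):

    import numpy as np
    from dowhy.gcm.feature_relevance import feature_relevance_sample

    rng = np.random.default_rng(0)
    feature_samples = rng.normal(size=(2000, 3))

    def predict(samples: np.ndarray) -> np.ndarray:
        # Linear model in which the second feature dominates.
        return (samples @ np.array([1.0, 3.0, 0.0])).reshape(-1, 1)

    def means_difference(randomized, baseline):
        # Difference between expectations as the set function.
        return np.mean(randomized) - np.mean(baseline)

    relevance = feature_relevance_sample(
        predict, feature_samples, feature_samples[:5], means_difference
    )
    print(relevance)  # one row of feature relevances per baseline sample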

dowhy.gcm.feature_relevance.parent_relevance(causal_model: StructuralCausalModel, target_node: Any, parent_samples: DataFrame | None = None, subset_scoring_func: Callable[[ndarray, ndarray], ndarray | float] | None = None, num_samples_randomization: int = 5000, num_samples_baseline: int = 500, max_batch_size: int = 100, shapley_config: ShapleyConfig | None = None) Tuple[Dict[Any, Any], ndarray][source]#

Estimates the distribution-based relevance of the direct parents of the given target_node. That is, the relevance of the direct parents as input features of the underlying causal model of target_node. Here, the unobserved noise is considered as a direct parent (input) as well. Samples utilized for the estimation are drawn from the given causal graph.

By default, the used subset_scoring_func is based on the variance between Y and Y’, where Y is the output of the causal model and Y’ the output of the model when certain features are randomized. In case of continuous data, the feature relevances add up to Var(Y - Y’).

Note: The feature relevance based on the distribution cannot be directly compared with the feature relevance for single samples. If this is desired, the set function needs to be defined accordingly.

Related paper: Janzing, D., Minorics, L., & Bloebaum, P. (2020). Feature relevance quantification in explainable AI: A causal problem. In International Conference on Artificial Intelligence and Statistics (pp. 2907-2916). PMLR.

Parameters:
  • causal_model – The fitted structural causal model.

  • target_node – Node with the causal model of interest.

  • parent_samples – Samples for the parents of the given target_node. If None is given, new samples are generated based on the graph. These samples are used for randomizing features that are not in the subset.

  • subset_scoring_func – Set function for estimating the quantity of interest based on the model outcomes. This function expects two inputs; the outcome of the causal model for some samples if certain features are permuted and the outcome of the model for the same samples when no features were permuted. The set functions represents the comparison between the samples, for instance, the variance of deviations. This is then used as the ‘characteristic function’ in coalition games when estimating the Shapley values.

  • num_samples_randomization – Number of samples used as background parent samples for evaluating the set function. If no parent_samples are given, this represents the number of generated samples from the joint distribution of the parents and are used for randomizing features that are not in the subset. Consider increasing this number for more accurate results or reducing it for less memory consumption and faster runtime.

  • num_samples_baseline – Number of samples on which the set functions are evaluated on. These samples are used as fixed observations for parents that are in the subset. Consider increasing this number for more accurate results or reducing it for less memory consumption and faster runtime.

  • max_batch_size – Maximum batch size for estimating multiple predictions at once. This has a significant influence on the overall memory usage. If set to -1, all samples are used in one batch.

  • shapley_configShapleyConfig for the Shapley estimator.

Returns:

There are two return values: a dictionary with the feature relevance of each direct parent of the given target_node, and the feature relevance of the noise.
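A sketch of a typical call, assuming parent_relevance is available via the top-level dowhy.gcm re-export (otherwise import it from dowhy.gcm.feature_relevance); the data and graph are illustrative:

    import networkx as nx
    import numpy as np
    import pandas as pd
    from dowhy import gcm

    rng = np.random.default_rng(0)
    x0, x1 = rng.normal(size=2000), rng.normal(size=2000)
    y = 2 * x0 + x1 + rng.normal(scale=0.1, size=2000)
    data = pd.DataFrame({"X0": x0, "X1": x1, "Y": y})

    causal_model = gcm.StructuralCausalModel(nx.DiGraph([("X0", "Y"), ("X1", "Y")]))
    gcm.auto.assign_causal_mechanisms(causal_model, data)
    gcm.fit(causal_model, data)

    relevance, noise_relevance = gcm.parent_relevance(causal_model, target_node="Y")
    print(relevance)        # relevance of each direct parent of Y
    print(noise_relevance)  # relevance of the unobserved noise term of Y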

dowhy.gcm.fitting_sampling module#

This module provides functionality for fitting probabilistic causal models and drawing samples from them.

dowhy.gcm.fitting_sampling.draw_samples(causal_model: ProbabilisticCausalModel, num_samples: int) DataFrame[source]#

Draws new joint samples from the given graphical causal model. This is done by first generating random samples from root nodes and then propagating causal downstream effects through the graph.

Parameters:
  • causal_model – New samples are generated based on the given causal model.

  • num_samples – Number of samples to draw.

Returns:

A pandas data frame where columns correspond to the nodes in the graph and rows to the drawn joint samples.

dowhy.gcm.fitting_sampling.fit(causal_model: ProbabilisticCausalModel, data: DataFrame, return_evaluation_summary: bool = False)[source]#

Fits the causal mechanism of each node to the data. This is done by iterating over the nodes in the graph and fitting their assigned causal mechanisms individually to the data by calling the corresponding fit function. Due to the modularity assumption, we can fit each mechanism in the graph independently of the other mechanisms. For root nodes, the training data is the corresponding column in the provided data. For non-root nodes, the data is based on a node’s parents and the node itself. Before a node is fitted, this function first validates whether the assigned mechanism is valid, i.e., whether a root node follows a StochasticModel and whether a non-root node follows a ConditionalStochasticModel.

The details of fitting a causal mechanism depend on their implementation. For example, if a node follows an additive noise model X_i = f_i(PA_i) + N_i, where N_i is unobserved noise, the fitting involves fitting the function f_i (which could be any scikit-learn regressor) to the data and modeling the distribution N_i based on the residuals X_i - f_i(PA_i). For more details on how each individual mechanism is fitted, refer to the corresponding documentation, since these are individual implementation details.

This function optionally returns a summary of different metrics of the causal mechanisms evaluated via cross-validation. Note that this will use the evaluate_causal_model method. For more detailed and extensive evaluations, consider using the evaluate_causal_model method directly.

Parameters:
  • causal_model – The causal model containing the mechanisms of the node that will be fitted.

  • data – Observations of nodes in the causal model.

  • return_evaluation_summary – If True, returns a summary of the performances of the fitted mechanisms using the evaluate_causal_model method. If False, nothing is returned.

Returns:

Optionally, a CausalModelEvaluationResult summarizing the performances of the causal mechanisms via cross-validation.
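A minimal end-to-end sketch of assigning mechanisms, fitting, and sampling (the graph and data are illustrative):

    import networkx as nx
    import numpy as np
    import pandas as pd
    from dowhy import gcm

    rng = np.random.default_rng(0)
    x = rng.normal(size=1000)
    y = 2 * x + rng.normal(scale=0.2, size=1000)
    data = pd.DataFrame({"X": x, "Y": y})

    causal_model = gcm.ProbabilisticCausalModel(nx.DiGraph([("X", "Y")]))
    gcm.auto.assign_causal_mechanisms(causal_model, data)  # assign mechanisms automatically
    gcm.fit(causal_model, data)                            # fit each mechanism to the data

    print(gcm.draw_samples(causal_model, num_samples=5))   # new joint samples for X and Y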

dowhy.gcm.fitting_sampling.fit_causal_model_of_target(causal_model: ProbabilisticCausalModel, target_node: Any, training_data: DataFrame) None[source]#

Fits only the causal mechanism of the given target node based on the training data.

Parameters:
  • causal_model – The causal model containing the target node.

  • target_node – Target node for which the mechanism is fitted.

  • training_data – Training data for fitting the causal mechanism.

Returns:

None

dowhy.gcm.influence module#

This module provides functions to estimate causal influences.

dowhy.gcm.influence.arrow_strength(causal_model: ProbabilisticCausalModel, target_node: Any, parent_samples: DataFrame | None = None, num_samples_conditional: int = 2000, max_num_runs: int = 5000, tolerance: float = 0.01, n_jobs: int = -1, difference_estimation_func: Callable[[ndarray, ndarray], ndarray | float] | None = None) Dict[Tuple[Any, Any], float][source]#

Computes the causal strength of each edge directed to the target node. The strength of an edge is quantified in terms of distance between conditional distributions of the target node in the original graph and the imputed graph wherein the edge has been removed and the target node is fed a random permutation of the observations of the source node. For more scientific details behind this API, please refer to the research paper below.

Research Paper: Dominik Janzing, David Balduzzi, Moritz Grosse-Wentrup, Bernhard Schölkopf. Quantifying Causal Influences. The Annals of Statistics, Vol. 41, No. 5, 2324-2358, 2013.

Parameters:
  • causal_model – The probabilistic causal model for whose target node we compute the strength of incoming edges.

  • target_node – The target node whose incoming edges’ strength is to be computed.

  • parent_samples – Optional samples from the parents of the target_node. If None are given, they are generated based on the provided causal model. Providing observational data can help to mitigate misspecifications in the graph, such as missing interactions between root nodes or confounders.

  • num_samples_conditional – Sample size to use for estimating the distance between distributions. The more samples, the higher the accuracy.

  • max_num_runs – The maximum number of times to resample and estimate the strength to report the average strength.

  • tolerance – If the percentage change in the estimated strength between two consecutive runs falls below the specified tolerance, the algorithm will terminate before reaching the maximum number of runs. A value of 0.01 would indicate a change of less than 1%. However, in order to minimize the impact of randomness, there must be at least three consecutive runs where the change is below the threshold.

  • n_jobs – The number of jobs to run in parallel. Set it to -1 to use all processors.

  • difference_estimation_func – Optional: How to measure the distance between two distributions. By default, the difference of the variance is estimated for a continuous target node and the KL divergence for a categorical target node.

Returns:

Causal strength of each edge.
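A sketch on synthetic data; with the default variance-based distance, the estimated strengths should roughly mirror the squared coefficients (approximate values):

    import networkx as nx
    import numpy as np
    import pandas as pd
    from dowhy import gcm

    rng = np.random.default_rng(0)
    x0, x1 = rng.normal(size=1000), rng.normal(size=1000)
    y = 3 * x0 + x1 + rng.normal(scale=0.1, size=1000)
    data = pd.DataFrame({"X0": x0, "X1": x1, "Y": y})

    causal_model = gcm.ProbabilisticCausalModel(nx.DiGraph([("X0", "Y"), ("X1", "Y")]))
    gcm.auto.assign_causal_mechanisms(causal_model, data)
    gcm.fit(causal_model, data)

    # Expect roughly {('X0', 'Y'): 9, ('X1', 'Y'): 1}, mirroring the squared coefficients.
    print(gcm.arrow_strength(causal_model, target_node="Y"))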

dowhy.gcm.influence.arrow_strength_of_model(conditional_stochastic_model: ConditionalStochasticModel, input_samples: ndarray, num_samples_from_conditional: int = 2000, max_num_runs: int = 5000, tolerance: float = 0.01, n_jobs: int = -1, difference_estimation_func: Callable[[ndarray, ndarray], ndarray | float] | None = None, input_subsets: List[List[int]] | None = None) ndarray[source]#
dowhy.gcm.influence.intrinsic_causal_influence(causal_model: StructuralCausalModel, target_node: Any, prediction_model: PredictionModel | ClassificationModel | str = 'approx', attribution_func: Callable[[ndarray, ndarray], float] | None = None, num_training_samples: int = 100000, num_samples_randomization: int = 250, num_samples_baseline: int = 1000, max_batch_size: int = -1, auto_assign_quality: AssignmentQuality = AssignmentQuality.GOOD, shapley_config: ShapleyConfig | None = None) Dict[Any, float][source]#

Computes the causal contribution of each upstream noise term of the target node (including the noise of the target itself) to the statistical property (e.g. mean, variance) of the target. We call this contribution intrinsic as noise terms, by definition, do not inherit properties of observed parents. The contribution of each noise term is then the intrinsic causal contribution of the corresponding node. For more scientific details, please refer to the paper below.

Research Paper: Janzing et al. Quantifying causal contributions via structure preserving interventions. arXiv:2007.00714, 2021.

Parameters:
  • causal_model – The structural causal model for whose target node we compute the intrinsic causal influence of its ancestors.

  • target_node – Target node whose statistical property is to be attributed.

  • prediction_model – Prediction model for estimating the functional relationship between subsets of ancestor noise terms and the target node. This can be an instance of a PredictionModel, the string ‘approx’ or the string ‘exact’. With ‘exact’, the underlying causal models in the graph are utilized directly by propagating given noise inputs through the graph, which ensures that generated samples follow the fitted models. In contrast, the ‘approx’ method involves selecting and training a suitable model based on data sampled from the graph. This might lead to deviations from the outcomes of the fitted models, but is faster and can be more robust in certain settings.

  • attribution_func – Optional attribution function to measure the statistical property of the target node. This function expects two inputs; predictions after the randomization of certain features (i.e. samples from noise nodes) and a baseline where no features were randomized. The baseline predictions can be typically ignored if one is interested in uncertainty measures such as entropy or variance, but they might be relevant if, for instance, these shall be estimated based on the residuals. By default, entropy is used if prediction model is a classifier, variance otherwise.

  • num_training_samples – Number of samples drawn from the graphical causal model that are used for fitting the prediction_model (if necessary).

  • num_samples_randomization – Number of noise samples drawn from the graphical causal model that are used for evaluating the set function. Here, these samples are samples from the noise distributions used for randomizing features that are not in the subset.

  • num_samples_baseline – Number of noise samples drawn from the graphical causal model that are used for evaluating the set function. Here, these samples are used as fixed observations for features that are in the subset.

  • max_batch_size – Maximum batch size for estimating the predictions from evaluation samples. This has a significant impact on the overall memory usage. If set to -1, all samples are used in one batch.

  • auto_assign_quality – Auto assign quality for the ‘approx’ prediction_model option.

  • shapley_configShapleyConfig for the Shapley estimator.

Returns:

Intrinsic causal contribution of each ancestor node to the statistical property defined by the attribution_func of the target node.
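A sketch on synthetic data where X and the noise of Y each explain about half of Var(Y), so both contributions should be close to 1 with the default variance attribution (num_training_samples is reduced only to keep the example fast):

    import networkx as nx
    import numpy as np
    import pandas as pd
    from dowhy import gcm

    rng = np.random.default_rng(0)
    x = rng.normal(size=2000)
    y = x + rng.normal(size=2000)
    data = pd.DataFrame({"X": x, "Y": y})

    causal_model = gcm.StructuralCausalModel(nx.DiGraph([("X", "Y")]))
    gcm.auto.assign_causal_mechanisms(causal_model, data)
    gcm.fit(causal_model, data)

    print(gcm.intrinsic_causal_influence(causal_model, target_node="Y",
                                         num_training_samples=10000))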

dowhy.gcm.influence.intrinsic_causal_influence_sample(causal_model: InvertibleStructuralCausalModel, target_node: Any, baseline_samples: DataFrame, noise_feature_samples: DataFrame | None = None, prediction_model: PredictionModel | ClassificationModel | str = 'approx', subset_scoring_func: Callable[[ndarray, ndarray], ndarray | float] | None = None, num_noise_feature_samples: int = 5000, max_batch_size: int = 100, auto_assign_quality: AssignmentQuality = AssignmentQuality.GOOD, shapley_config: ShapleyConfig | None = None) List[Dict[Any, Any]][source]#

Estimates the intrinsic causal impact of upstream nodes on a specified target_node, using the provided baseline_samples as a reference. In this context, observed values are attributed to the noise factors present in upstream nodes. Compared to intrinsic_causal_influence, this method quantifies the influences with respect to single observations instead of the distribution. Note that the current implementation only supports non-categorical data, since the noise terms need to be reconstructed.

Research Paper: Janzing et al. Quantifying causal contributions via structure preserving interventions. arXiv:2007.00714, 2021.

Parameters:
  • causal_model – The fitted invertible structural causal model.

  • target_node – Node of interest.

  • baseline_samples – Samples for which the influence should be estimated.

  • noise_feature_samples – Optional noise samples of upstream nodes used as ‘background’ samples. If None is given, new noise samples are generated based on the graph. These samples are used for randomizing features that are not in the subset.

  • prediction_model – Prediction model for estimating the functional relationship between subsets of ancestor noise terms and the target node. This can be an instance of a PredictionModel, the string ‘approx’ or the string ‘exact’. With ‘exact’, the underlying causal models in the graph are utilized directly by propagating given noise inputs through the graph, which ensures that generated samples follow the fitted models. In contrast, the ‘approx’ method involves selecting and training a suitable model based on data sampled from the graph. This might lead to deviations from the outcomes of the fitted models, but is faster and can be more robust in certain settings.

  • subset_scoring_func – Set function for estimating the quantity of interest based on the model outcomes. This function expects two inputs; the outcome of the model for some samples if certain features are permuted and the outcome of the model for the same samples when no features were permuted. By default, the difference between the means of these samples is estimated.

  • num_noise_feature_samples – If no noise_feature_samples are given, noise samples are drawn from the graph. This parameter indicates how many.

  • max_batch_size – Maximum batch size for estimating multiple predictions at once. This has a significant influence on the overall memory usage. If set to -1, all samples are used in one batch.

  • auto_assign_quality – Auto assign quality for the ‘approx’ prediction_model option.

  • shapley_configShapleyConfig for the Shapley estimator.

Returns:

A list of dictionaries indicating the intrinsic causal influence of a node on the target for a particular sample. That is, each dictionary corresponds to one baseline sample.

dowhy.gcm.model_evaluation module#

class dowhy.gcm.model_evaluation.CausalModelEvaluationResult(mechanism_performances: Dict[str, dowhy.gcm.model_evaluation.MechanismPerformanceResult] | NoneType = None, pnl_assumptions: Dict[Any, Tuple[float, str, float | NoneType]] | NoneType = None, graph_falsification: dowhy.gcm.falsify.EvaluationResult | NoneType = None, overall_kl_divergence: float | NoneType = None, plot_falsification_histogram: bool = True)[source]#

Bases: object

graph_falsification: EvaluationResult | None = None#
mechanism_performances: Dict[str, MechanismPerformanceResult] | None = None#
overall_kl_divergence: float | None = None#
plot_falsification_histogram: bool = True#
pnl_assumptions: Dict[Any, Tuple[float, str, float | None]] | None = None#
class dowhy.gcm.model_evaluation.EvaluateCausalModelConfig(mechanism_evaluation_kfolds: int = 5, baseline_models_regression: ~typing.List[~typing.Callable[[], ~dowhy.gcm.ml.prediction_model.PredictionModel]] | None = None, baseline_models_classification: ~typing.List[~typing.Callable[[], ~dowhy.gcm.ml.prediction_model.PredictionModel]] | None = None, independence_test_invertible: ~typing.Callable[[~numpy.ndarray, ~numpy.ndarray], float] = functools.partial(<function kernel_based>, use_bootstrap=False), significance_level_invertible: float = 0.05, fdr_control_method_invertible: str | None = 'bonferroni', bootstrap_runs_invertible: int = 5, max_num_permutations_falsify: int = 50, independence_test_falsify: ~typing.Callable[[~numpy.ndarray, ~numpy.ndarray], float] = functools.partial(<function kernel_based>, use_bootstrap=False, max_num_samples_run=500), conditional_independence_test_falsify: ~typing.Callable[[~numpy.ndarray, ~numpy.ndarray, ~numpy.ndarray], float] = functools.partial(<function kernel_based>, use_bootstrap=False, max_num_samples_run=500), falsify_graph_significance_level: float = 0.2, n_jobs: int | None = None)[source]#

Bases: object

Config for the causal model evaluation.

Parameters for the causal model evaluation method. See the parameter description for more details.

Parameters:
  • mechanism_evaluation_kfolds – Number of folds for evaluating the causal mechanisms.

  • baseline_models_regression – Baseline models for continuous nodes. The causal mechanisms assigned to the nodes in the graph are compared against additive noise models with these baseline regression models.

  • baseline_models_classification – Baseline models for categorical nodes. The causal mechanisms assigned to the nodes in the graph are compared against these baseline models.

  • independence_test_invertible – A method for testing the independence between inputs and estimated noise of invertible causal mechanisms. This is used to evaluate whether the made model assumptions hold.

  • significance_level_invertible – The significance level for rejecting the null hypothesis that inputs and residuals are independent.

  • fdr_control_method_invertible – The false discovery rate control method when running multiple hypothesis tests. Note that we can assume that the tests are independent.

  • bootstrap_runs_invertible – The independence tests are only run on a small subset of samples. This parameter indicates how many subsets the tests should be performed on. The resulting p-values are aggregated using a family-wise error control method.

  • max_num_permutations_falsify – Number of permutations used for falsifying the given graph structure.

  • independence_test_falsify – A method for testing the independence between two variables used for falsifying the given graph structure. Note that the variables can be multivariate.

  • conditional_independence_test_falsify – A method for testing the independence between two variables given a third one used for falsifying the given graph structure. Note that the variables can be multivariate.

  • falsify_graph_significance_level – Significance level for rejecting the given graph based on the permutation tests. The default of 0.2 here is higher than the usual 0.05. Consider reducing it to be more strict about falsifying the graph.

  • n_jobs – Number of parallel jobs. Whenever the evaluation method supports parallelization, this parameter is used.

class dowhy.gcm.model_evaluation.MechanismPerformanceResult(node_name: Any, is_root: bool, crps: float | NoneType, kl_divergence: float | NoneType, mse: float | NoneType, nmse: float | NoneType, r2: float | NoneType, f1: float | NoneType, count_better_performance: int | NoneType, best_baseline_model: str | NoneType, total_number_baselines: int, best_baseline_performance: float | NoneType)[source]#

Bases: object

dowhy.gcm.model_evaluation.crps(X: ndarray, Y: ndarray, conditional_sampling_method: Callable[[ndarray], ndarray], num_conditional_samples: int = 100, normalize: bool = True) float[source]#

Estimates the (normalized) Continuous Ranked Probability Score (CRPS) based on the given data and generation process. This is used to check the calibration of a probabilistic prediction.

Parameters:
  • X – Observations of the input features.

  • Y – Observations of the corresponding target value.

  • conditional_sampling_method – Method to sample from the conditional given an input sample from X.

  • num_conditional_samples – Number of samples that should be drawn from the conditional to estimate the CRPS.

  • normalize – If True, the target values are normalized in the continuous case by the standard deviation of the expected Y values. By this, the CRPS becomes comparable across different scales.

Returns:

The Continuous Ranked Probability Score.

dowhy.gcm.model_evaluation.evaluate_causal_model(causal_model: ProbabilisticCausalModel, data: DataFrame, max_num_samples: int = -1, evaluate_causal_mechanisms: bool = True, compare_mechanism_baselines: bool = False, evaluate_invertibility_assumptions: bool = True, evaluate_overall_kl_divergence: bool = True, evaluate_causal_structure: bool = True, config: EvaluateCausalModelConfig | None = None) CausalModelEvaluationResult[source]#

Evaluates the given causal model by running different evaluations.

Evaluation of Causal Mechanisms: The quality of the causal mechanisms is assessed using k-fold cross-validation. This means that the models are trained from scratch multiple times, which might take a significant amount of time for larger models. Within each fold, the models are assessed by different metrics. For all models, the continuous ranked probability score (CRPS) normalized by the standard deviation is estimated, an important metric that provides insights into the model performance as well as its calibration. Further, if the node is numerical, the mean squared error (MSE), the normalized MSE (normalized by the variance), and the R2 coefficient are computed. In case of categorical nodes, the F1 score is computed instead. Optionally, the mechanisms’ CRPS are compared with baseline models to see if there are baseline models performing significantly better.

Evaluation of Invertible Functional Causal Model Assumption: Invertible causal mechanisms rely on the assumption that the inputs are independent of the reconstructed noise. That is, assuming there are no hidden confounders, the noise should be independent of the parents of a node. This can be evaluated by testing statistical independence between the reconstructed noise and the used input samples.

Evaluation of Generated Distribution: The distribution generated by the causal model is compared with the observed data using KL divergence. To avoid estimating the KL divergence of high-dimensional data, we approximate it by calculating the mean KL divergence across the individual marginal KL divergences for each node.

Evaluation of the Causal Graph Structure: The causal graph structure is evaluated by running a method to falsify the graph. The method involves conducting independence tests and may consume a significant amount of time for more extensive graphs. The results provide an indication of whether the graph is rejected or not. It’s important to note that a non-rejected graph does not guarantee its correctness. It simply means that the evaluation did not find substantial evidence to refute the causal graph based on the provided data. However, a rejected graph might indicate potential issues with its structure.

The outcomes of these evaluation methods should be interpreted with caution, and bad fits should not be over-interpreted. Nonetheless, the results can offer insights into the performance of the causal model and potential areas for improvement.

Parameters:
  • causal_model – The causal model to evaluate.

  • data – The data used for the evaluation.

  • max_num_samples – The maximum number of samples used for the evaluation. If the runtime is too slow, consider setting this to a smaller value. The default -1 indicates that all samples are used.

  • evaluate_causal_mechanisms – If True, the causal mechanisms are evaluated.

  • compare_mechanism_baselines – If True, the causal mechanisms are compared with baseline models to see if there are model choices that perform significantly better. If False, this comparison is skipped. This is ignored if evaluate_causal_mechanisms is False.

  • evaluate_invertibility_assumptions – If True, the model assumption represented by invertible causal mechanisms is tested.

  • evaluate_overall_kl_divergence – If True, the KL divergence between the generated and the observed data is estimated.

  • evaluate_causal_structure – If True, the causal graph structure is evaluated.

Returns:

A summary of the evaluation.
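A sketch of a restricted evaluation run; the disabled options are only meant to keep the runtime of the example low:

    import networkx as nx
    import numpy as np
    import pandas as pd
    from dowhy import gcm

    rng = np.random.default_rng(0)
    x = rng.normal(size=500)
    y = x + rng.normal(scale=0.2, size=500)
    data = pd.DataFrame({"X": x, "Y": y})

    causal_model = gcm.ProbabilisticCausalModel(nx.DiGraph([("X", "Y")]))
    gcm.auto.assign_causal_mechanisms(causal_model, data)
    gcm.fit(causal_model, data)

    result = gcm.evaluate_causal_model(
        causal_model, data,
        compare_mechanism_baselines=False,  # skip the (slow) baseline comparison
        evaluate_causal_structure=False,    # skip the (slow) graph falsification
    )
    print(result)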

dowhy.gcm.model_evaluation.nmse(y_true: ndarray, y_pred: ndarray, squared: bool = False) float[source]#

Estimates the Normalized Mean Squared Error (NMSE) based on the given samples. That is, the (root) mean squared error normalized by the variance of the observed values.

Parameters:
  • y_true – Observed values.

  • y_pred – Predicted values.

  • squared – If True, returns the normalized MSE; if False, returns the normalized RMSE.

Returns:

The normalized MSE.

dowhy.gcm.shapley module#

This module provides functionality for shapley value estimation.

class dowhy.gcm.shapley.ShapleyApproximationMethods(value)[source]#

Bases: Enum

AUTO: Uses EXACT when the number of players is below 6 and EARLY_STOPPING otherwise.

EXACT: Generates all possible subsets and estimates Shapley values with the corresponding subset weights.

EXACT_FAST: Generates all possible subsets and estimates Shapley values via weighted least squares regression. This can be faster, but, depending on the set function, numerically less stable.

SUBSET_SAMPLING: Randomly samples subsets and estimates Shapley values via weighted least squares regression. Here, only a certain number of randomly drawn subsets are used.

EARLY_STOPPING: Estimates Shapley values based on a few randomly generated permutations. Stops the estimation process when the Shapley values do not change much on average anymore between runs.

PERMUTATION: Estimates Shapley values based on a fixed number of randomly generated permutations. By fine-tuning hyperparameters, this can be potentially faster than the early stopping approach due to a better utilization of the parallelization.

AUTO = (0,)#
EARLY_STOPPING = (3,)#
EXACT = (1,)#
EXACT_FAST = (2,)#
PERMUTATION = (4,)#
SUBSET_SAMPLING = (5,)#
class dowhy.gcm.shapley.ShapleyConfig(approximation_method: ShapleyApproximationMethods = ShapleyApproximationMethods.AUTO, num_permutations: int = 25, num_subset_samples: int = 5000, min_percentage_change_threshold: float = 0.05, n_jobs: int | None = None)[source]#

Bases: object

Config for estimating Shapley values.

Parameters:
  • approximation_method – Type of approximation methods (see ShapleyApproximationMethods).

  • num_permutations – Number of permutations used for approximating the Shapley values. This value is only used for PERMUTATION and EARLY_STOPPING. In both cases, it indicates the maximum number of permutations that are evaluated. Note that EARLY_STOPPING might stop before reaching the number of permutations if the change in Shapley values falls below min_percentage_change_threshold.

  • num_subset_samples – Number of subsets used for the SUBSET_SAMPLING method. This value is not used otherwise.

  • min_percentage_change_threshold – This parameter is only relevant for EARLY_STOPPING and indicates the minimum required change in percentage of the Shapley values between two runs before the estimation stops. For instance, with a value of 0.01 the estimation would stop if all Shapley values change less than 0.01 per run. To mitigate the impact of randomness, the changes need to stay below the threshold for at least 2 consecutive runs.

  • n_jobs – Number of parallel jobs.

dowhy.gcm.shapley.estimate_shapley_values(set_func: Callable[[ndarray], float | ndarray], num_players: int, shapley_config: ShapleyConfig | None = None) ndarray[source]#

Estimates the Shapley values based on the provided set function. A set function here is defined by taking a (subset) of players and returning a certain utility value. This is in the context of attributing the value of the i-th player to a subset of players S by evaluating v(S u {i}) - v(S), where v is the set function and i is not in S. While we use the term ‘player’ here, this is often a certain feature/variable.

The input of the set function is a binary vector indicating which player is part of the set. For instance, given 4 players (1,2,3,4) and a subset only contains players 1,2,4, then this is indicated by the vector [1, 1, 0, 1]. The function is expected to return a numeric value based on this input.

Note: The set function can be arbitrary and can resemble computationally complex operations. Keep in mind that the estimation of Shapley values can become computationally expensive and requires a lot of memory. If the runtime is too slow, consider changing the default config.

Parameters:
  • set_func – A set function that expects a binary vector as input which specifies which player is part of the subset.

  • num_players – Total number of players.

  • shapley_config – A config object for indicating the approximation method and other parameters. If None is given, a default config is used. For faster runtime or more accurate results, consider creating a custom config.

Returns:

A numpy array representing the Shapley values for each player, i.e. there are as many Shapley values as num_players. The i-th entry belongs to the i-th player. Here, the set function defines which index belongs to which player and is responsible for keeping this consistent.
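A sketch with an additive game whose Shapley values are known exactly: if the utility of a coalition is the sum of fixed per-player values, each player's Shapley value equals its own value.

    import numpy as np
    from dowhy.gcm.shapley import (
        ShapleyApproximationMethods,
        ShapleyConfig,
        estimate_shapley_values,
    )

    player_values = np.array([1.0, 2.0, 3.0])

    def set_func(subset: np.ndarray) -> float:
        # subset is a binary membership vector; the utility is additive in the players.
        return float(player_values @ subset)

    config = ShapleyConfig(approximation_method=ShapleyApproximationMethods.EXACT)
    print(estimate_shapley_values(set_func, num_players=3, shapley_config=config))
    # Expected: [1. 2. 3.]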

dowhy.gcm.stats module#

dowhy.gcm.stats.estimate_ftest_pvalue(X_training_a: ndarray, X_training_b: ndarray, Y_training: ndarray, X_test_a: ndarray, X_test_b: ndarray, Y_test: ndarray) float[source]#

Estimates the p-value for the null hypothesis that the same regression error can be achieved with fewer parameters. That is, a linear model trained on data set A with d features has the same performance (in terms of squared error), relative to the number of features, as a model trained on data set B with k features, where k < d. Here, both data sets need to have the same target values. A small p-value would indicate that the model performances are significantly different.

Note that all given test samples are utilized in the f-test.

See https://en.wikipedia.org/wiki/F-test#Regression_problems for more details.

Parameters:
  • X_training_a – Input training samples for model A.

  • X_training_b – Input training samples for model B. These samples should have fewer features than the samples in X_training_a.

  • Y_training – Target training values.

  • X_test_a – Test samples for model A.

  • X_test_b – Test samples for model B.

  • Y_test – Test values.

Returns:

A p-value in [0, 1].

dowhy.gcm.stats.marginal_expectation(prediction_method: Callable[[ndarray], ndarray], feature_samples: ndarray, baseline_samples: ndarray, baseline_feature_indices: List[int], return_averaged_results: bool = True, feature_perturbation: str = 'randomize_columns_jointly', max_batch_size: int = -1) ndarray[source]#

Estimates the marginal expectation for samples in baseline_samples when randomizing features that are not part of baseline_feature_indices. That is, this function estimates

y^i = E[Y | do(x^i_s)] := int_x_s’ E[Y | x^i_s, x_s’] p(x_s’) d x_s’,

where x^i_s is the i-th sample from baseline_noise_samples, s denotes the baseline_feature_indices and x_s’ ~ X_s’ denotes the randomized features that are not in s. For an approximation of the integral, the given prediction_method is evaluated multiple times for the same x^i_s, but different x_s’ ~ X_s’.

Parameters:
  • prediction_method – Prediction method of interest. This should expect a numpy array as input for making predictions.

  • feature_samples – Samples from the joint distribution. These are used for randomizing the features that are not in baseline_feature_indices.

  • baseline_samples – Samples for which the marginal expectation should be estimated.

  • baseline_feature_indices – Column indices of the features in s. The values of these features remain constant when estimating the expectation.

  • return_averaged_results – If set to True, the expectation over all evaluated samples for the i-th sample in baseline_samples is returned. If set to False, all corresponding results for the i-th sample are returned.

  • feature_perturbation – Type of feature perturbation: ‘randomize_columns_independently’: Each feature not in s is randomly permuted separately. ‘randomize_columns_jointly’: All features not in s are jointly permuted. Note that this still represents an interventional distribution.

  • max_batch_size – Maximum batch size for estimating the predictions. This has a significant influence on the overall memory usage. If set to -1, all samples are used in one batch.

Returns:

If return_averaged_results is True, a numpy array where the i-th entry is the estimated marginal expectation of x^i_s when randomizing the remaining features. If return_averaged_results is False, a two-dimensional numpy array where the i-th row contains all predictions for x^i_s when randomizing the remaining features.
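
A minimal sketch of the call, using a made-up additive prediction function:

>>> import numpy as np
>>> from dowhy.gcm.stats import marginal_expectation
>>> rng = np.random.default_rng(0)
>>> feature_samples = rng.normal(size=(1000, 3))
>>> def predict(samples):
...     return np.sum(samples, axis=1)  # toy model: Y = X0 + X1 + X2
>>> # Keep feature 0 fixed for the first five samples; randomize features 1 and 2:
>>> expectations = marginal_expectation(predict,
...                                     feature_samples=feature_samples,
...                                     baseline_samples=feature_samples[:5],
...                                     baseline_feature_indices=[0])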

dowhy.gcm.stats.merge_p_values_average(p_values: ndarray | List[float], randomization: bool = False) float[source]#

A statistically sound method to merge multiple potentially dependent p-values into one. This is a statistically improved (i.e., more powerful) version of the “twice the average” rule, following Theorem 5.3 (second equation, F_UA) in

M. Gasparin, R. Wang, and A. Ramdas, Combining exchangeable p-values, arXiv:2404.03484, 2024.

Note that if randomization is False, we have u = 1 here. Generally, randomization requires fewer assumptions but leads to non-deterministic behavior.

Parameters:
  • p_values – A list or array of p-values.

  • randomization – If True, u is taken uniformly at random from [0, 1] (non-deterministic). If False, u is set to 1 (deterministic). Randomization is generally more powerful but yields non-deterministic results.

Returns:

A single p-value based on the given p-values.

dowhy.gcm.stats.merge_p_values_fdr(p_values: ndarray | List[float], fdr_method: str = 'fdr_bh') float[source]#

Merges p-values to represent the global null hypothesis that all hypotheses represented by the p-values are true.

Here, we first adjust the given p-values based on the provided false discovery rate (FDR) control method, and then return the minimum.

Parameters:
  • p_values – A list or array of p-values.

  • fdr_method – The false discovery rate control method. For various options, please refer to the documentation of statsmodels.stats.multitest.multipletests.

Returns:

The minimum p-value after adjusting based on the given FDR method.

dowhy.gcm.stats.merge_p_values_quantile(p_values: ndarray | List[float], p_values_scaling: ndarray | None = None, quantile: float = 0.5) float[source]#

Applies a quantile based approach to merge multiple potentially dependent p-values to one. This is based on the approach described in:

Meinshausen, N., Meier, L. and Buehlmann, P., p-values for high-dimensional regression, J. Amer. Statist. Assoc., 104, 1671–1681, 2009

Parameters:
  • p_values – A list or array of p-values.

  • p_values_scaling – An optional list of scaling factors for each p-value.

  • quantile – The quantile used for the p-value adjustment. By default, this is the median (0.5).

Returns:

The p-value that lies on the quantile threshold. Note that this is the quantile based on scaled values p_values / quantile.
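
To contrast the three merging strategies documented above, a short sketch with arbitrary p-values:

>>> from dowhy.gcm.stats import (merge_p_values_average, merge_p_values_fdr,
...                              merge_p_values_quantile)
>>> p_values = [0.01, 0.2, 0.4, 0.04]
>>> merge_p_values_average(p_values)   # deterministic variant (u = 1)
>>> merge_p_values_fdr(p_values)       # minimum of the FDR-adjusted p-values
>>> merge_p_values_quantile(p_values)  # median-based merging (quantile = 0.5)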

dowhy.gcm.stats.permute_features(feature_samples: ndarray, features_to_permute: List[int] | ndarray, randomize_features_jointly: bool) ndarray[source]#

dowhy.gcm.stochastic_models module#

This module defines multiple implementations of the abstract class StochasticModel.

class dowhy.gcm.stochastic_models.BayesianGaussianMixtureDistribution[source]#

Bases: StochasticModel

clone()[source]#
draw_samples(num_samples: int) ndarray[source]#

Draws samples for the fitted model.

fit(X: ndarray) None[source]#

Fits the model according to the data.

class dowhy.gcm.stochastic_models.EmpiricalDistribution[source]#

Bases: StochasticModel

An implementation of a stochastic model that uniformly samples from the given data samples. By randomly returning a sample from the training data set, this model is a parameter-free representation of the marginal distribution of the training data. However, it will not generate unseen data points. For that, consider BayesianGaussianMixtureDistribution.

clone()[source]#
property data: ndarray#
draw_samples(num_samples: int) ndarray[source]#

Draws samples for the fitted model.

fit(X: ndarray) None[source]#

Fits the model according to the data.
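
A short sketch contrasting the two models on arbitrary data:

>>> import numpy as np
>>> from dowhy.gcm.stochastic_models import (BayesianGaussianMixtureDistribution,
...                                          EmpiricalDistribution)
>>> X = np.random.normal(size=(1000, 1))
>>> empirical = EmpiricalDistribution()
>>> empirical.fit(X)
>>> resampled = empirical.draw_samples(10)  # 10 rows resampled uniformly from X
>>> mixture = BayesianGaussianMixtureDistribution()
>>> mixture.fit(X)
>>> generated = mixture.draw_samples(10)    # can generate values not present in X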

class dowhy.gcm.stochastic_models.ScipyDistribution(scipy_distribution: rv_continuous | rv_discrete | None = None, **parameters)[source]#

Bases: StochasticModel

Represents any parametric distribution that can be modeled by scipy.

Initializes a stochastic model that allows sampling from a parametric distribution implemented in Scipy.

For instance, to use a beta distribution with parameters a=2 and b=0.5:

ScipyDistribution(stats.beta, a=2, b=0.5)

Or a Gaussian distribution with mean 0 and standard deviation 2:

ScipyDistribution(stats.norm, loc=0, scale=2)

Note that the parameter names need to coincide with the parameter names in the corresponding Scipy implementations. See https://docs.scipy.org/doc/scipy/tutorial/stats.html for more information.

Parameters:
  • scipy_distribution – A continuous or discrete parametric distribution implemented in Scipy.

  • parameters – Set of parameters of the parametric distribution.

clone()[source]#
draw_samples(num_samples: int) ndarray[source]#

Draws samples for the fitted model.

static find_suitable_continuous_distribution(distribution_samples: ndarray, divergence_threshold: float = 0.01) Tuple[rv_continuous, Dict[str, float]][source]#

Tries to find the best fitting continuous parametric distribution of given samples. This is done by fitting different parametric models and selecting the one with the smallest KL divergence between observed and generated samples.

fit(X: ndarray) None[source]#

Fits the model according to the data.

static map_scipy_distribution_parameters_to_names(scipy_distribution: rv_continuous | rv_discrete, parameters: Tuple[float]) Dict[str, float][source]#

Helper function to obtain a mapping from parameter name to parameter value. Depending on whether the distribution is discrete or continuous, there are slightly different parameter names. The given parameters are assumed to follow the order as provided by the scipy fit function.

Parameters:
  • scipy_distribution – The scipy distribution.

  • parameters – The values of the corresponding parameters of the distribution. Here, it is expected to follow the same order as defined by the scipy fit function.

Returns:

A dictionary that maps a parameter name to its value.

property parameters: Dict[str, float]#
property scipy_distribution: rv_continuous | rv_discrete | None#
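
Putting these pieces together, a usage sketch (the reshape assumes draw_samples returns a column vector):

>>> import numpy as np
>>> from scipy import stats
>>> from dowhy.gcm.stochastic_models import ScipyDistribution
>>> model = ScipyDistribution(stats.norm, loc=0, scale=2)  # fixed parameters
>>> samples = model.draw_samples(1000)
>>> model = ScipyDistribution(stats.norm)  # parameters are estimated when fitting
>>> model.fit(samples)
>>> # Search over several parametric families for the best fitting one:
>>> distribution, parameters = ScipyDistribution.find_suitable_continuous_distribution(
...     samples.reshape(-1))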

dowhy.gcm.uncertainty module#

Functions to estimate uncertainties such as entropy, KL divergence etc.

dowhy.gcm.uncertainty.estimate_entropy_discrete(X: ndarray) float[source]#

Estimates the entropy assuming the data in X is discrete.

Parameters:

X – Discrete samples.

Returns:

Entropy of X.

dowhy.gcm.uncertainty.estimate_entropy_kmeans(X: ndarray) float[source]#

Related paper: Kozachenko, L., & Leonenko, N. (1987). Sample estimate of the entropy of a random vector. Problemy Peredachi Informatsii, 23(2), 9–16.

dowhy.gcm.uncertainty.estimate_entropy_of_probabilities(X: ndarray) float[source]#

Estimates the entropy of each probability vector (row wise) in X separately and returns the mean over all results.

dowhy.gcm.uncertainty.estimate_entropy_using_discretization(X: ndarray, bin_width: float = 1) float[source]#
dowhy.gcm.uncertainty.estimate_gaussian_entropy(X: ndarray) float[source]#

Entropy with respect to standardized variables.

dowhy.gcm.uncertainty.estimate_variance(X: ndarray) float[source]#
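
A brief sketch of typical calls (the column-vector shapes are an assumption):

>>> import numpy as np
>>> from dowhy.gcm.uncertainty import (estimate_entropy_discrete,
...                                    estimate_gaussian_entropy, estimate_variance)
>>> X_discrete = np.random.choice([0, 1, 2], size=(1000, 1))
>>> estimate_entropy_discrete(X_discrete)
>>> X_continuous = np.random.normal(size=(1000, 1))
>>> estimate_gaussian_entropy(X_continuous)
>>> estimate_variance(X_continuous)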

dowhy.gcm.unit_change module#

This module provides the APIs for attributing the change in the output value of a deterministic mechanism for a statistical unit.

class dowhy.gcm.unit_change.LinearPredictionModel[source]#

Bases: object

abstract property coefficients: ndarray#
class dowhy.gcm.unit_change.SklearnLinearRegressionModel(sklearn_mdl: LinearModel)[source]#

Bases: SklearnRegressionModel, LinearPredictionModel

property coefficients: ndarray#
dowhy.gcm.unit_change.unit_change(background_df: DataFrame, foreground_df: DataFrame, input_column_names: List[str], background_mechanism: PredictionModel, foreground_mechanism: PredictionModel | None = None, shapley_config: ShapleyConfig | None = None) DataFrame[source]#

This function attributes the change in the output value of a deterministic mechanism for a statistical unit to each input, and optionally to the mechanism itself if foreground_mechanism is provided. The technical method is described in the following research paper: Kailash Budhathoki, George Michailidis, Dominik Janzing. Explaining the root causes of unit-level changes. arXiv, 2022.

Parameters:
  • background_df – The background dataset.

  • foreground_df – The foreground dataset.

  • input_column_names – The names of the input columns.

  • background_mechanism – The background mechanism. If the mechanism does not change, then this mechanism is used for attribution.

  • foreground_mechanism – The foreground mechanism. If provided, the method also attributes the output change to the change in the mechanism.

  • shapley_config – The configuration for calculating Shapley values.

Returns:

A dataframe containing the contributions of each input and optionally the mechanism to the change in the output values of the deterministic mechanism(s) for given inputs.
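
A sketch with a single, fixed linear mechanism, so that only the inputs are attributed (the data and mechanism are made up):

>>> import numpy as np
>>> import pandas as pd
>>> from sklearn.linear_model import LinearRegression
>>> from dowhy.gcm.unit_change import SklearnLinearRegressionModel, unit_change
>>> rng = np.random.default_rng(0)
>>> X_train = rng.normal(size=(200, 2))
>>> mechanism = SklearnLinearRegressionModel(
...     LinearRegression().fit(X_train, X_train @ np.array([2.0, -1.0])))
>>> background_df = pd.DataFrame({'X0': [1.0, 2.0], 'X1': [3.0, 4.0]})
>>> foreground_df = pd.DataFrame({'X0': [1.5, 2.5], 'X1': [3.0, 5.0]})
>>> attributions = unit_change(background_df, foreground_df, ['X0', 'X1'], mechanism)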

dowhy.gcm.unit_change.unit_change_linear(background_mechanism: LinearPredictionModel, background_df: DataFrame, foreground_mechanism: LinearPredictionModel, foreground_df: DataFrame, input_column_names: List[str]) DataFrame[source]#

Calculates the contributions of mechanism and each input to the change in the output values of a linear deterministic mechanism.

Parameters:
  • background_mechanism – The linear background mechanism.

  • background_df – The background data.

  • foreground_mechanism – The linear foreground mechanism.

  • foreground_df – The foreground data.

  • input_column_names – The names of the input columns in both dataframes.

Returns:

A pandas dataframe with attributions to each cause for the change in each output row of provided dataframes.

dowhy.gcm.unit_change.unit_change_linear_input_only(mechanism: LinearPredictionModel, background_df: DataFrame, foreground_df: DataFrame, input_column_names: List[str]) DataFrame[source]#

Calculates the contributions of each input to the change in the output values of a linear deterministic mechanism.

Parameters:
  • mechanism – The linear mechanism.

  • background_df – The background data.

  • foreground_df – The foreground data.

  • input_column_names – The names of the input (features) columns in both dataframes.

Returns:

A pandas dataframe with attributions to each cause for the change in each output row of provided dataframes.

dowhy.gcm.unit_change.unit_change_nonlinear(background_mechanism: PredictionModel, background_df: DataFrame, foreground_mechanism: PredictionModel, foreground_df: DataFrame, input_column_names: List[str], shapley_config: ShapleyConfig | None = None) DataFrame[source]#

Calculates the contributions of mechanism and each input to the change in the output values of a non-linear deterministic mechanism. The technical method is described in the following research paper: Kailash Budhathoki, George Michailidis, Dominik Janzing. Explaining the root causes of unit-level changes. arXiv, 2022.

Parameters:
  • background_mechanism – The background mechanism.

  • background_df – The background data.

  • foreground_mechanism – The foreground mechanism.

  • foreground_df – The foreground data.

  • input_column_names – The names of the input (features) columns in both dataframes.

  • shapley_config – The configuration for calculating Shapley values.

Returns:

A pandas dataframe with attributions to each cause for the change in each output row of provided dataframes.
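
A sketch with both a background and a foreground mechanism, so that the mechanism change itself is also attributed. Any fitted PredictionModel can be used; here, linear regressors from dowhy.gcm.ml serve as stand-ins:

>>> import numpy as np
>>> import pandas as pd
>>> from dowhy.gcm.ml import create_linear_regressor
>>> from dowhy.gcm.unit_change import unit_change_nonlinear
>>> rng = np.random.default_rng(0)
>>> X_train = rng.normal(size=(200, 2))
>>> background_mechanism = create_linear_regressor()
>>> background_mechanism.fit(X_train, X_train @ np.array([2.0, -1.0]))  # old mechanism
>>> foreground_mechanism = create_linear_regressor()
>>> foreground_mechanism.fit(X_train, X_train @ np.array([3.0, -1.0]))  # changed mechanism
>>> background_df = pd.DataFrame({'X0': [1.0], 'X1': [2.0]})
>>> foreground_df = pd.DataFrame({'X0': [1.5], 'X1': [2.0]})
>>> unit_change_nonlinear(background_mechanism, background_df,
...                       foreground_mechanism, foreground_df, ['X0', 'X1'])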

dowhy.gcm.unit_change.unit_change_nonlinear_input_only(mechanism: PredictionModel, background_df: DataFrame, foreground_df: DataFrame, input_column_names: List[str], shapley_config: ShapleyConfig | None = None) DataFrame[source]#

Calculates the contributions of each input to the change in the output values of a non-linear deterministic mechanism. The technical method is a modification of the attribution method described in the following research paper, without mechanism as a player: Kailash Budhathoki, George Michailidis, Dominik Janzing. Explaining the root causes of unit-level changes. arXiv, 2022.

Parameters:
  • mechanism – The mechanism.

  • background_df – The background data.

  • foreground_df – The foreground data.

  • input_column_names – The names of the input (features) columns in both dataframes.

  • shapley_config – The configuration for calculating Shapley values.

Returns:

A pandas dataframe with attributions to each cause for the change in each output row of provided dataframes.

dowhy.gcm.validation module#

Contains methods to reject a causal graph and to validate causal mechanisms such as post non-linear models.

class dowhy.gcm.validation.RejectionResult(value)[source]#

Bases: Enum

An enumeration.

NOT_REJECTED#
REJECTED#
dowhy.gcm.validation.refute_causal_structure(causal_graph: ~dowhy.graph.DirectedGraph, data: ~pandas.core.frame.DataFrame, independence_test: ~typing.Callable[[~numpy.ndarray, ~numpy.ndarray], float] = <function kernel_based>, conditional_independence_test: ~typing.Callable[[~numpy.ndarray, ~numpy.ndarray, ~numpy.ndarray], float] = <function kernel_based>, significance_level: float = 0.05, fdr_control_method: str | None = 'fdr_bh') Tuple[RejectionResult, Dict[str, Dict[str, Dict[str, bool | float | Dict[str, bool | float]]]]][source]#

Validates the assumptions in a causal graph against data. To this end, at each node, we test if the node is dependent on each of its parents, and test the local Markov condition. Note that valid local Markov conditions also imply a valid global Markov condition.

Parameters:
  • causal_graph – A directed acyclic graph (DAG).

  • data – Observations of variables in the DAG.

  • independence_test – Independence test to use for checking edge dependencies.

  • conditional_independence_test – Conditional independence test to use for checking local Markov condition.

  • significance_level – Significance level for (conditional) independence tests.

  • fdr_control_method – Method for false discovery rate (FDR) control. For various options, please refer to the documentation of statsmodels.stats.multitest.multipletests.

Returns:

Outcome of the validation process. The first element of the tuple indicates whether the graph is valid w.r.t. given data, and the second element gives the summary of tests at each node. An example for X->Y->Z:

[True, {'X': {'local_markov_test': {}, 'edge_dependence_test': {}},
        'Y': {'local_markov_test': {}, 'edge_dependence_test': {'X': {'p_value': 0.5, 'fdr_adjusted_p_value': 0.5, 'success': True}}},
        'Z': {'local_markov_test': {'p_value': 0.0, 'fdr_adjusted_p_value': 0.5, 'success': False},
              'edge_dependence_test': {'Y': {'p_value': 0.5, 'fdr_adjusted_p_value': 0.5, 'success': True}}}}]
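
For illustration, a sketch on synthetic data for the chain X -> Y -> Z:

>>> import networkx as nx
>>> import numpy as np
>>> import pandas as pd
>>> from dowhy.gcm.validation import refute_causal_structure
>>> causal_graph = nx.DiGraph([('X', 'Y'), ('Y', 'Z')])
>>> X = np.random.normal(size=1000)
>>> Y = 2 * X + np.random.normal(size=1000)
>>> Z = 3 * Y + np.random.normal(size=1000)
>>> data = pd.DataFrame({'X': X, 'Y': Y, 'Z': Z})
>>> result, summary = refute_causal_structure(causal_graph, data)
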
dowhy.gcm.validation.refute_invertible_model(causal_model: ~dowhy.gcm.causal_models.InvertibleStructuralCausalModel, data: ~pandas.core.frame.DataFrame, independence_test: ~typing.Callable[[~numpy.ndarray, ~numpy.ndarray], float] = <function kernel_based>, significance_level: float = 0.05, fdr_control_method: str | None = None) RejectionResult[source]#

Validates the assumption that the structural causal models can be represented by an InvertibleFunctionalCausalModel (e.g. the causal mechanisms are AdditiveNoiseModels and/or PostNonlinearModels). For this, it is checked whether the residual of a causal mechanism is independent of the mechanism’s input (i.e. we assume causal sufficiency here). For instance, PostNonlinearModels represent

Y = f(g(X) + N),

where f is invertible (g does not need to be), X are the parents of Y and N is (assumed to be) independent noise. The latter point is important here. For given data, we can then reconstruct N and perform an independence test between X and N.

Note that this method only validates the causal mechanisms and not the graph structure.

For the case of post non-linear models, see the following paper for more details:

Zhang, K., and A. Hyvärinen. On the Identifiability of the Post-Nonlinear Causal Model. 25th Conference on Uncertainty in Artificial Intelligence (UAI 2009). AUAI Press, 2009.

Parameters:
  • causal_model – A fitted invertible structural causal model.

  • data – Observations of variables in the DAG.

  • independence_test – Independence test to use for checking if residual and input are dependent.

  • significance_level – Significance level for deciding whether input and residual are dependent.

  • fdr_control_method – Method for false discovery rate (FDR) control. For various options, please refer to the documentation of statsmodels.stats.multitest.multipletests.

Returns:

The outcome of the validation. The causal model cannot be rejected if all causal mechanisms are consistent with the invertible model assumption.
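
A sketch on synthetic data, assuming an additive noise mechanism for the non-root node:

>>> import networkx as nx
>>> import numpy as np
>>> import pandas as pd
>>> from dowhy.gcm import (AdditiveNoiseModel, EmpiricalDistribution,
...                        InvertibleStructuralCausalModel, fit)
>>> from dowhy.gcm.ml import create_linear_regressor
>>> from dowhy.gcm.validation import refute_invertible_model
>>> X = np.random.normal(size=1000)
>>> data = pd.DataFrame({'X': X, 'Y': 2 * X + np.random.normal(size=1000)})
>>> causal_model = InvertibleStructuralCausalModel(nx.DiGraph([('X', 'Y')]))
>>> causal_model.set_causal_mechanism('X', EmpiricalDistribution())
>>> causal_model.set_causal_mechanism('Y', AdditiveNoiseModel(create_linear_regressor()))
>>> fit(causal_model, data)
>>> refute_invertible_model(causal_model, data)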

dowhy.gcm.whatif module#

This module provides functionality to answer what-if questions.

dowhy.gcm.whatif.average_causal_effect(causal_model: ProbabilisticCausalModel, target_node: Any, interventions_alternative: Dict[Any, Callable[[ndarray], float | ndarray]], interventions_reference: Dict[Any, Callable[[ndarray], float | ndarray]], observed_data: DataFrame | None = None, num_samples_to_draw: int | None = None) float[source]#

Estimates the average causal effect (ACE) on the target of two different sets of interventions. The interventions can be specified through the parameters interventions_alternative and interventions_reference. For example, if the alternative intervention is do(T := 1) and the reference intervention is do(T := 0), then the average causal effect is given by ACE = E[Y | do(T := 1)] - E[Y | do(T := 0)]:

>>> average_causal_effect(causal_model, 'Y', {'T': lambda _ : 1}, {'T': lambda _ : 0})

We can also specify more complex interventions on multiple nodes:

>>> average_causal_effect(causal_model,
...                       'Y',
...                       {'T': lambda _ : 1, 'X0': lambda x : x + 1},
...                       {'T': lambda _ : 0, 'X0': lambda x : x * 2})

In the above, we would estimate ACE = E[Y | do(T := 1), do(X0 := X0 + 1)] - E[Y | do(T := 0), do(X0 := X0 * 2)].

Note: The target node can be a continuous real-valued variable or a categorical variable with at most two classes (i.e. binary).

Parameters:
  • causal_model – The probabilistic causal model we perform this intervention on.

  • target_node – Target node for which the ACE is estimated.

  • interventions_alternative – Dictionary defining the interventions for the alternative values.

  • interventions_reference – Dictionary defining the interventions for the reference values.

  • observed_data – Factual data that we observe for the nodes in the causal graph. By default, new data is sampled using the causal model. If observational data is available, providing it might improve the accuracy by mitigating issues due to a misspecified graph and/or causal models.

  • num_samples_to_draw – Number of samples drawn from the causal model for estimating ACE if no observed data is given.

Returns:

The estimated average causal effect (ACE).

dowhy.gcm.whatif.counterfactual_samples(causal_model: StructuralCausalModel | InvertibleStructuralCausalModel, interventions: Dict[Any, Callable[[ndarray], float | ndarray]], observed_data: DataFrame | None = None, noise_data: DataFrame | None = None) DataFrame[source]#

Estimates counterfactual data for observed data if we were to perform specified interventions. This function implements the 3-step process for computing counterfactuals by Pearl (see https://ftp.cs.ucla.edu/pub/stat_ser/r485.pdf).

Parameters:
  • causal_model – The (invertible) structural causal model we perform this intervention on. If noise_data is None and observed_data is provided, this must be an invertible structural causal model; otherwise, it can be either a structural causal model or an invertible one.

  • interventions – Dictionary containing the interventions we want to perform keyed by node name. An intervention is a function that takes a value as input and returns another value. For example, {‘X’: lambda x: 2} mimics the atomic intervention do(X:=2).

  • observed_data – Factual data that we observe for the nodes in the causal graph.

  • noise_data – Data of noise terms corresponding to nodes in the causal graph. If not provided, these have to be estimated from observed data. Then we require causal models of nodes to be invertible.

Returns:

Estimated counterfactual data.
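
An end-to-end sketch on synthetic data (the model, data, and observed unit are made up):

>>> import networkx as nx
>>> import numpy as np
>>> import pandas as pd
>>> from dowhy.gcm import (AdditiveNoiseModel, EmpiricalDistribution,
...                        InvertibleStructuralCausalModel, fit)
>>> from dowhy.gcm.ml import create_linear_regressor
>>> from dowhy.gcm.whatif import counterfactual_samples
>>> X = np.random.normal(size=1000)
>>> data = pd.DataFrame({'X': X, 'Y': 2 * X + np.random.normal(scale=0.1, size=1000)})
>>> causal_model = InvertibleStructuralCausalModel(nx.DiGraph([('X', 'Y')]))
>>> causal_model.set_causal_mechanism('X', EmpiricalDistribution())
>>> causal_model.set_causal_mechanism('Y', AdditiveNoiseModel(create_linear_regressor()))
>>> fit(causal_model, data)
>>> # What would Y have been for this unit, had X been 2 instead of the observed 1?
>>> counterfactual_samples(causal_model,
...                        {'X': lambda x: 2},
...                        observed_data=pd.DataFrame({'X': [1.0], 'Y': [2.1]}))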

dowhy.gcm.whatif.interventional_samples(causal_model: ProbabilisticCausalModel, interventions: Dict[Any, Callable[[ndarray], float | ndarray]], observed_data: DataFrame | None = None, num_samples_to_draw: int | None = None) DataFrame[source]#

Performs interventions on nodes in the causal graph.

Parameters:
  • causal_model – The probabilistic causal model we perform this intervention on.

  • interventions – Dictionary containing the interventions we want to perform, keyed by node name. An intervention is a function that takes a value as input and returns another value. For example, {‘X’: lambda x: 2} mimics the atomic intervention do(X:=2). A soft intervention can be formulated as {‘X’: lambda x: 0.2 * x}.

  • observed_data – Optionally, data on which to perform interventions. If None is given, data is generated based on the generative models.

  • num_samples_to_draw – Sample size to draw from the interventional distribution.

Returns:

Samples from the interventional distribution.
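
A sketch mirroring the counterfactual example above, but drawing samples from the interventional distribution instead:

>>> import networkx as nx
>>> import numpy as np
>>> import pandas as pd
>>> from dowhy.gcm import (AdditiveNoiseModel, EmpiricalDistribution,
...                        ProbabilisticCausalModel, fit)
>>> from dowhy.gcm.ml import create_linear_regressor
>>> from dowhy.gcm.whatif import interventional_samples
>>> X = np.random.normal(size=1000)
>>> data = pd.DataFrame({'X': X, 'Y': 2 * X + np.random.normal(size=1000)})
>>> causal_model = ProbabilisticCausalModel(nx.DiGraph([('X', 'Y')]))
>>> causal_model.set_causal_mechanism('X', EmpiricalDistribution())
>>> causal_model.set_causal_mechanism('Y', AdditiveNoiseModel(create_linear_regressor()))
>>> fit(causal_model, data)
>>> samples = interventional_samples(causal_model,
...                                  {'X': lambda x: 2},  # atomic intervention do(X:=2)
...                                  num_samples_to_draw=100)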

Module contents#

The gcm sub-package provides features built on top of graphical causal model (GCM) based inference.