dodiscover.cd.BaseConditionalDiscrepancyTest

class dodiscover.cd.BaseConditionalDiscrepancyTest

Abstract class for any conditional discrepancy test.

Conditional discrepancy (CD) tests are used in constraint-based causal discovery algorithms. This class interface is intentionally lightweight so that anyone can wrap a CD-testing function in a class with a specific API.
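
As a rough illustration of wrapping a function in such a class, the sketch below uses a stand-in abstract base (`ToyCDTestBase`, hypothetical) that mirrors only the `test(df, group_col, y_vars, x_vars)` signature documented on this page. It does not import dodiscover, a plain dict of column lists stands in for the DataFrame, and the statistic `mean_gap` and the placeholder p-value are assumptions made purely for illustration.

```python
from abc import ABC, abstractmethod
from statistics import mean


class ToyCDTestBase(ABC):
    """Hypothetical stand-in mirroring the documented ``test`` signature."""

    @abstractmethod
    def test(self, df, group_col, y_vars, x_vars):
        """Return a (statistic, pvalue) tuple."""


def mean_gap(y, group):
    """Toy statistic: absolute gap between the two group means of y."""
    y0 = [v for v, g in zip(y, group) if g == 0]
    y1 = [v for v, g in zip(y, group) if g == 1]
    return abs(mean(y1) - mean(y0))


class MeanGapTest(ToyCDTestBase):
    """Wraps ``mean_gap`` behind the class API (ignores x_vars for brevity)."""

    def test(self, df, group_col, y_vars, x_vars):
        (y_col,) = y_vars  # this toy handles exactly one outcome column
        stat = mean_gap(df[y_col], df[group_col])
        pvalue = float("nan")  # placeholder: no null distribution is built here
        return stat, pvalue


# A dict of column lists stands in for a DataFrame in this sketch.
data = {"y": [1.0, 2.0, 3.0, 5.0], "group": [0, 0, 1, 1], "x": [0.1, 0.2, 0.3, 0.4]}
stat, pvalue = MeanGapTest().test(data, "group", {"y"}, {"x"})
```

The point of the sketch is only the shape of the API: a plain function becomes usable wherever an object with a `test` method returning a statistic and a p-value is expected.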

Methods

compute_null(e_hat, X, Y[, null_reps, ...])

Estimate null distribution using propensity weights.

test(df, group_col, y_vars, x_vars)

Compute conditional discrepancy test.

compute_null(e_hat, X, Y, null_reps=1000, random_state=None)

Estimate null distribution using propensity weights.

Parameters:
e_hat : array-like of shape (n_samples,)

The predicted propensity score for group_ind == 1.

X : array-like of shape (n_samples, n_features_x)

The X (covariates) array.

Y : array-like of shape (n_samples, n_features_y)

The Y (outcomes) array.

null_reps : int, optional

Number of times to sample the null, by default 1000.

random_state : int, optional

Random generator, or random seed, by default None.

Returns:
null_dist : array-like of shape (null_reps,)

The null distribution of test statistics.
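
One way to read "using propensity weights" is that a group label is redrawn for every sample as a Bernoulli draw with success probability `e_hat`, and the statistic is recomputed on each redraw. The sketch below follows that reading using only the standard library. The helper names (`toy_statistic`, `sketch_compute_null`) are hypothetical, the covariates X enter only through `e_hat`, and this is an assumption about the approach rather than the library's implementation.

```python
import random
from statistics import mean


def toy_statistic(y, group):
    """Toy statistic: absolute gap between group means (0.0 if a group is empty)."""
    y0 = [v for v, g in zip(y, group) if g == 0]
    y1 = [v for v, g in zip(y, group) if g == 1]
    if not y0 or not y1:
        return 0.0
    return abs(mean(y1) - mean(y0))


def sketch_compute_null(e_hat, y, null_reps=1000, random_state=None):
    """Redraw group labels from Bernoulli(e_hat) and recompute the statistic."""
    rng = random.Random(random_state)
    null_dist = []
    for _ in range(null_reps):
        # Each sample joins group 1 with probability equal to its propensity score.
        resampled = [1 if rng.random() < p else 0 for p in e_hat]
        null_dist.append(toy_statistic(y, resampled))
    return null_dist


e_hat = [0.2, 0.4, 0.5, 0.6, 0.8, 0.5]
y = [0.1, 0.4, 0.35, 0.8, 0.9, 0.5]
null_dist = sketch_compute_null(e_hat, y, null_reps=200, random_state=0)
```

Fixing `random_state` makes the redraws, and therefore the sampled null statistics, reproducible.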

abstract test(df, group_col, y_vars, x_vars)

Compute conditional discrepancy test.

Tests the null hypothesis \(P(Y | X, group) = P(Y | X)\), that is, whether Y is conditionally independent of the group label (which denotes the distribution) given X.

Equivalently, the test asks whether \(P_i(Y|X) = P_j(Y|X)\), where \(P_i(.)\) and \(P_j(.)\) denote the distributions of two different groups or environments, as indicated by group_col.

Parameters:
df : pd.DataFrame

The dataframe containing the dataset.

y_vars : Set of column

The outcome (Y) columns in df.

group_col : column

A column in df that indicates which group (distribution) each sample belongs to, encoded as 0 or 1.

x_vars : Set of column, optional

The covariate (X) columns in df to condition on.

Returns:
Tuple[float, float]

The test statistic and the p-value.
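
A p-value can be obtained from an observed statistic and a sampled null distribution by counting how often the null statistics are at least as large as the observed one. The sketch below shows that generic Monte Carlo calculation with a +1 correction. `monte_carlo_pvalue` is a hypothetical helper, and whether the library computes its p-value exactly this way is not stated on this page.

```python
def monte_carlo_pvalue(stat, null_dist):
    """One-sided Monte Carlo p-value with the usual +1 correction."""
    # Count null statistics at least as extreme as the observed one.
    exceed = sum(1 for s in null_dist if s >= stat)
    return (1 + exceed) / (1 + len(null_dist))


# Illustrative numbers only, not output from any real test.
null_dist = [0.1, 0.3, 0.2, 0.5, 0.4, 0.05, 0.25, 0.35, 0.15]
pvalue = monte_carlo_pvalue(0.45, null_dist)
```

The +1 in the numerator and denominator keeps the estimate strictly positive, which avoids reporting a p-value of exactly zero from a finite number of null samples.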