dodiscover.cd.BaseConditionalDiscrepancyTest
- class dodiscover.cd.BaseConditionalDiscrepancyTest
Abstract class for any conditional discrepancy test.
All conditional discrepancy (CD) tests are used in constraint-based causal discovery algorithms. The class interface is intentionally lightweight, so that any function implementing a CD test can be wrapped in a class that exposes this common API.
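As a rough sketch of what "wrapping a function in a class" can look like, the block below defines a hypothetical stand-in base class, `CDTestSketch` (not dodiscover's actual class), with an abstract `test` method mirroring the documented signature. A plain permutation-test function, `mean_diff_perm_test` (also hypothetical), is then wrapped by a subclass. The `(statistic, p-value)` return value is an assumption made for illustration, and the wrapper ignores `x_vars`, so it is only a marginal test, not a genuinely conditional one.

```python
from abc import ABC, abstractmethod
from typing import Optional, Set, Tuple

import numpy as np
import pandas as pd


class CDTestSketch(ABC):
    """Hypothetical stand-in for the documented interface (not dodiscover's class)."""

    @abstractmethod
    def test(
        self,
        df: pd.DataFrame,
        group_col: str,
        y_vars: Set[str],
        x_vars: Optional[Set[str]] = None,
    ) -> Tuple[float, float]:
        """Return an (assumed) (statistic, p-value) pair."""


def mean_diff_perm_test(y: np.ndarray, group: np.ndarray, n_perm: int = 500,
                        seed: int = 0) -> Tuple[float, float]:
    """Plain function: permutation test on |mean(y | group=1) - mean(y | group=0)|."""
    rng = np.random.default_rng(seed)

    def stat(g: np.ndarray) -> float:
        return abs(y[g == 1].mean() - y[g == 0].mean())

    observed = stat(group)
    # Shuffling the labels breaks any link between group and y.
    null = np.array([stat(rng.permutation(group)) for _ in range(n_perm)])
    # Add-one correction keeps the p-value strictly positive.
    pvalue = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return float(observed), float(pvalue)


class MeanDiffCDTest(CDTestSketch):
    """Wraps mean_diff_perm_test; ignores x_vars, so it is only a marginal test."""

    def test(self, df, group_col, y_vars, x_vars=None):
        (y_col,) = y_vars  # this sketch supports exactly one outcome column
        y = df[y_col].to_numpy(dtype=float)
        group = df[group_col].to_numpy()
        return mean_diff_perm_test(y, group)
```

Because `test` is abstract, the base class itself cannot be instantiated; only subclasses that implement `test` can be.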
Methods
- compute_null(e_hat, X, Y[, null_reps, ...]): Estimate null distribution using propensity weights.
- test(df, group_col, y_vars, x_vars): Compute conditional discrepancy test.
- compute_null(e_hat, X, Y, null_reps=1000, random_state=None)
Estimate null distribution using propensity weights.
- Parameters:
- e_hat : array-like of shape (n_samples,)
The predicted propensity score for group_ind == 1.
- X : array-like of shape (n_samples, n_features_x)
The X (covariates) array.
- Y : array-like of shape (n_samples, n_features_y)
The Y (outcomes) array.
- null_reps : int, optional
Number of times to sample the null distribution, by default 1000.
- random_state : int, optional
Random generator or random seed, by default None.
- Returns:
- null_dist : array-like of shape (null_reps,)
The null distribution of test statistics, with one statistic per null replicate.
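One common way to use propensity scores for a null distribution, sketched here under stated assumptions rather than as dodiscover's exact implementation, is to redraw each sample's group label from Bernoulli(`e_hat[i]`) and recompute the test statistic on the relabeled data, repeating `null_reps` times. The statistic is passed in as a callable; `mean_diff` is a hypothetical placeholder that ignores X. In practice `e_hat` would come from a classifier predicting the group from X.

```python
from typing import Callable, Optional

import numpy as np


def compute_null_sketch(
    statistic: Callable[[np.ndarray, np.ndarray, np.ndarray], float],
    e_hat: np.ndarray,
    X: np.ndarray,
    Y: np.ndarray,
    null_reps: int = 1000,
    random_state: Optional[int] = None,
) -> np.ndarray:
    """Propensity-based null: redraw labels from Bernoulli(e_hat), recompute statistic."""
    rng = np.random.default_rng(random_state)
    e_hat = np.asarray(e_hat, dtype=float)
    null_dist = np.empty(null_reps)
    for i in range(null_reps):
        # Under H0 the label is independent of Y given X, so each sample's
        # label can be redrawn from its own propensity without touching Y.
        group = rng.binomial(1, e_hat)
        null_dist[i] = statistic(X, Y, group)
    return null_dist


def mean_diff(X: np.ndarray, Y: np.ndarray, group: np.ndarray) -> float:
    """Hypothetical statistic: L1 distance between group means of Y (ignores X)."""
    Y = np.asarray(Y, dtype=float).reshape(len(group), -1)
    if group.min() == group.max():  # one group is empty in this draw
        return 0.0
    return float(np.abs(Y[group == 1].mean(axis=0) - Y[group == 0].mean(axis=0)).sum())
```

Given an observed statistic, a permutation-style p-value can then be formed as `(1 + np.sum(null_dist >= observed)) / (1 + len(null_dist))`.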
- abstract test(df, group_col, y_vars, x_vars)
Compute conditional discrepancy test.
Tests the null hypothesis \(P(Y \mid X, \mathrm{group}) = P(Y \mid X)\), that is, whether Y is conditionally independent of the group label given X.
Equivalently, the test asks whether \(P_i(Y \mid X) = P_j(Y \mid X)\), where \(P_i(\cdot)\) and \(P_j(\cdot)\) denote the distributions of different groups (or environments), as indicated by group_col.
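To make the role of conditioning on X concrete, the following small linear-Gaussian simulation (an illustrative assumption, not part of the API) lets the group shift X while Y depends on X only, so the null holds even though the marginal distribution of Y differs between groups. Adding a direct group effect on Y violates the null. The "residual gap" after regressing Y on X is a crude heuristic used only for illustration, not a formal CD test; `residual_gap` is a hypothetical helper.

```python
import numpy as np


def residual_gap(n: int = 2000, effect: float = 0.0, seed: int = 0):
    """Simulate group -> X -> Y, plus an optional direct group -> Y effect.

    Returns (marginal_gap, conditional_gap): the difference in mean Y between
    groups, and the same difference after regressing Y on X only.
    """
    rng = np.random.default_rng(seed)
    group = rng.binomial(1, 0.5, size=n)
    X = group + rng.normal(size=n)               # group shifts the covariate
    Y = X + effect * group + rng.normal(size=n)  # effect=0 means H0 holds
    slope, intercept = np.polyfit(X, Y, deg=1)   # linear fit of Y on X
    resid = Y - (slope * X + intercept)
    marginal = Y[group == 1].mean() - Y[group == 0].mean()
    conditional = resid[group == 1].mean() - resid[group == 0].mean()
    return float(marginal), float(conditional)
```

With `effect=0.0` the marginal gap is close to 1 while the conditional gap is near 0; with `effect=1.0` the conditional gap is clearly nonzero.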
- Parameters:
- df : pd.DataFrame
The dataframe containing the dataset.
- group_col : column
A column in df that indicates which group (distribution) each sample belongs to, coded as '0' or '1'.
- y_vars : set of column
The set of columns in df holding the outcome variables Y.
- x_vars : set of column, optional
The set of columns in df holding the conditioning variables X.
- Returns: