2.5.1. dodiscover.ci.ClassifierCITest

class dodiscover.ci.ClassifierCITest(clf, metric=<function accuracy_score>, bootstrap=False, n_iter=20, correct_bias=True, threshold=0.03, test_size=0.3, random_state=None)

Methods

`generate_train_test_data`(df, x_vars, y_vars[, z_covariates, k])	Generate a training and testing dataset for CCIT.
`test`(df, x_vars, y_vars[, z_covariates])	Test whether x_vars and y_vars are conditionally independent given z_covariates.

generate_train_test_data(df, x_vars, y_vars, z_covariates=None, k=1)

Generate a training and testing dataset for CCIT.

This takes a conditional independence problem given a dataset and converts it to a binary classification problem.

Parameters:

df (pd.DataFrame): The dataframe containing the dataset.
x_vars (Set of column): A set of columns in df.
y_vars (Set of column): A set of columns in df.
z_covariates (Set, optional): A set of columns in df, by default None. If None, the problem is treated as a standard (unconditional) independence test.
k (int): The number of nearest neighbors, searched in subspaces of the data, used by the conditional permutation step to generate a conditionally independent distribution. By default 1.

Returns:

X_train, Y_train, X_test, Y_test (Tuple[array_like, array_like, array_like, array_like]): The training and testing features and labels to be used in binary classification. Each dataset consists of samples from both the joint and the conditionally independent distributions. Y_train and Y_test contain only 1’s and 0’s: a value of 1 marks a sample from the original joint distribution, and a value of 0 marks a sample from the shuffled distribution.
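The conversion described above can be illustrated with a dependency-free sketch. It is not dodiscover's implementation: the helper names `nearest_neighbor_shuffle` and `make_classification_data` are hypothetical, z is assumed to be a single numeric column, k is fixed at 1, shuffling y (rather than x) is an arbitrary choice, and the final train/test split is omitted.

```python
import random


def nearest_neighbor_shuffle(rows):
    """Swap in the y value of each row's nearest neighbor in z (k=1).

    rows is a list of at least two (x, y, z) tuples with a scalar z.
    Exchanging y between rows with similar z breaks any direct x-y
    link while roughly preserving how each variable relates to z.
    """
    shuffled = []
    for i, (x_i, _, z_i) in enumerate(rows):
        # Index of the closest *other* row in z (first one wins on ties).
        j = min(
            (idx for idx in range(len(rows)) if idx != i),
            key=lambda idx: abs(rows[idx][2] - z_i),
        )
        shuffled.append((x_i, rows[j][1], z_i))
    return shuffled


def make_classification_data(rows, seed=0):
    """Return (features, labels) for a binary classification problem."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    half = len(rows) // 2
    joint = rows[:half]                               # label 1: original samples
    permuted = nearest_neighbor_shuffle(rows[half:])  # label 0: shuffled samples
    return joint + permuted, [1] * len(joint) + [0] * len(permuted)


# Toy data in which x and y both depend on z but not directly on each other.
rng = random.Random(42)
samples = []
for _ in range(200):
    z = rng.gauss(0.0, 1.0)
    samples.append((z + rng.gauss(0.0, 0.1), z + rng.gauss(0.0, 0.1), z))

features, labels = make_classification_data(samples)
```

Because the toy x and y are conditionally independent given z, a classifier trained on `features` and `labels` would be expected to perform close to chance.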

test(df, x_vars, y_vars, z_covariates=None)

Test whether x_vars and y_vars are conditionally independent given z_covariates.

Parameters:

df (pd.DataFrame): The dataframe containing the dataset.
x_vars (Set of column): A set of columns in df.
y_vars (Set of column): A set of columns in df.
z_covariates (Set, optional): A set of columns in df, by default None. If None, a standard (unconditional) independence test is run.

Returns:

Tuple[float, float]: The test statistic and the p-value.
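The idea behind a classifier-based test is that when x_vars and y_vars are conditionally independent given z_covariates, original and shuffled samples are indistinguishable, so held-out accuracy stays near 0.5. The sketch below shows only that intuition as a decision rule. The function names and the `margin` argument are hypothetical (loosely analogous to `threshold`), and the library's bias correction, bootstrap and p-value computation are not reproduced.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true 0/1 labels."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and equally long")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def exceeds_chance(y_true, y_pred, margin=0.03):
    """Return (accuracy - 0.5, flag).

    flag is True when accuracy beats the 0.5 chance level by more than
    margin, which would hint at dependence between x and y given z.
    """
    excess = accuracy(y_true, y_pred) - 0.5
    return excess, excess > margin


y_true = [1, 1, 1, 1, 0, 0, 0, 0]      # 1 = original sample, 0 = shuffled sample
guessing = [1, 0, 1, 0, 1, 0, 1, 0]    # correct on 4 of 8 samples
separating = [1, 1, 1, 0, 0, 0, 0, 0]  # correct on 7 of 8 samples

print(exceeds_chance(y_true, guessing))    # (0.0, False)
print(exceeds_chance(y_true, separating))  # (0.375, True)
```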