2.5.1. dodiscover.ci.ClassifierCITest
- class dodiscover.ci.ClassifierCITest(clf, metric=<function accuracy_score>, bootstrap=False, n_iter=20, correct_bias=True, threshold=0.03, test_size=0.3, random_state=None)
Methods
- generate_train_test_data(df, x_vars, y_vars[, z_covariates, k]): Generate a training and testing dataset for CCIT.
- test(df, x_vars, y_vars[, z_covariates]): Abstract method for all conditional independence tests.
- generate_train_test_data(df, x_vars, y_vars, z_covariates=None, k=1)
Generate a training and testing dataset for CCIT.
Given a dataset, this method converts a conditional independence problem into a binary classification problem.
- Parameters:
- df : pd.DataFrame
  The dataframe containing the dataset.
- x_vars : Set of column
  A set of columns in df.
- y_vars : Set of column
  A set of columns in df.
- z_covariates : Set, optional
  A set of columns in df, by default None. If None, then the test should run a standard independence test.
- k : int
  The number of nearest neighbors searched in the conditioning subspace during the conditional permutation step, which generates samples from a conditionally independent distribution. By default, 1.
- Returns:
- X_train, Y_train, X_test, Y_test : Tuple[array_like, array_like, array_like, array_like]
  The X_train, Y_train, X_test, Y_test arrays to be used in binary classification, where each dataset consists of samples from both the joint and the conditionally independent distributions. Y_train and Y_test contain only 1s and 0s: a value of 1 marks a sample from the original joint distribution, and a value of 0 marks a sample from the shuffled distribution.
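To make the conversion concrete, here is a minimal NumPy sketch of how a conditional independence question can be recast as binary classification, loosely following the nearest-neighbor conditional permutation idea described above with k = 1. It is not dodiscover's implementation: the helper name `make_ccit_dataset`, its arguments, the 50/50 split into joint and permuted halves, and the feature layout `[x, y, z]` are all assumptions made for illustration.

```python
import numpy as np


def make_ccit_dataset(x, y, z=None, test_size=0.3, seed=None):
    """Hypothetical helper (not part of dodiscover): recast a test of
    X independent of Y given Z as a binary classification problem.

    Label 1: rows drawn from the original joint distribution.
    Label 0: rows whose y value is borrowed from the 1-nearest neighbor in
    z-space (or randomly permuted when z is None), which mimics a sample from
    a distribution where X and Y are conditionally independent given Z.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    x = np.asarray(x, dtype=float).reshape(n, -1)
    y = np.asarray(y, dtype=float).reshape(n, -1)
    z = np.empty((n, 0)) if z is None else np.asarray(z, dtype=float).reshape(n, -1)

    # Split the rows into two halves at random.
    idx = rng.permutation(n)
    joint_idx, perm_idx = idx[: n // 2], idx[n // 2 :]

    if z.shape[1] == 0:
        # No conditioning set: a plain shuffle of y breaks any X-Y dependence.
        y_perm = y[rng.permutation(perm_idx)]
    else:
        # For each row in the second half, find its nearest neighbor in
        # z-space among the first half and borrow that neighbor's y value.
        diff = z[perm_idx][:, None, :] - z[joint_idx][None, :, :]
        nearest = joint_idx[np.argmin((diff ** 2).sum(axis=2), axis=1)]
        y_perm = y[nearest]

    joint = np.hstack([x[joint_idx], y[joint_idx], z[joint_idx]])
    shuffled = np.hstack([x[perm_idx], y_perm, z[perm_idx]])
    features = np.vstack([joint, shuffled])
    labels = np.concatenate([np.ones(len(joint)), np.zeros(len(shuffled))])

    # Shuffle the labeled rows, then hold out a test fraction.
    order = rng.permutation(len(labels))
    features, labels = features[order], labels[order]
    n_test = int(round(test_size * len(labels)))
    return features[n_test:], labels[n_test:], features[:n_test], labels[:n_test]
```

Because the label-0 rows keep each sample's own z while swapping in a neighbor's y, any remaining difference between the two classes has to come from dependence between X and Y that Z does not explain; a classifier that separates them well is therefore evidence against conditional independence.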
- test(df, x_vars, y_vars, z_covariates=None)
Abstract method for all conditional independence tests.
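The docstring only states that `test` is abstract. As a rough illustration of what a classifier-based test can do with labeled joint/shuffled data, the sketch below trains a classifier to separate label-1 rows from label-0 rows and treats near-chance held-out accuracy as evidence of (conditional) independence. The brute-force 1-nearest-neighbor classifier and the helper names `one_nn_accuracy` and `looks_independent` are hypothetical. Comparing accuracy against `0.5 + threshold` is an assumption that borrows only the name and default of the `threshold=0.03` constructor argument, and the `bootstrap`, `n_iter`, and `correct_bias` options are not modeled.

```python
import numpy as np


def one_nn_accuracy(X_train, y_train, X_test, y_test):
    """Held-out accuracy of a brute-force 1-nearest-neighbor classifier."""
    sq_dists = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    predictions = y_train[np.argmin(sq_dists, axis=1)]
    return float(np.mean(predictions == y_test))


def looks_independent(X_train, y_train, X_test, y_test, threshold=0.03):
    """Hypothetical decision rule (an assumption, not dodiscover's statistic):
    with balanced labels, chance accuracy is 0.5, so accept (conditional)
    independence when the classifier beats chance by at most `threshold`."""
    accuracy = one_nn_accuracy(X_train, y_train, X_test, y_test)
    return accuracy <= 0.5 + threshold, accuracy


# Features the classifier cannot tell apart give chance-level accuracy.
same = np.zeros((4, 1))
labels = np.array([0, 1, 0, 1])
print(looks_independent(same, labels, same, labels))  # (True, 0.5)
```

Any classifier and metric could stand in for the 1-nearest-neighbor model and accuracy here; the constructor's `clf` and `metric` arguments suggest the real class lets the caller choose both.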
- Parameters:
- df : pd.DataFrame
  The dataframe containing the dataset.
- x_vars : Set of column
  A set of columns in df.
- y_vars : Set of column
  A set of columns in df.
- z_covariates : Set, optional
  A set of columns in df, by default None. If None, then the test should run a standard independence test.
- Returns: