2.2.1. pywhy_stats.conditional_ksample.kcd

Kernel (conditional) discrepancy test.

Also known as a conditional k-sample test, where the null hypothesis is that the conditional distributions are equal across different population groups.

2.2.1.1. Returns

PValueResult: The result of the test, which includes the test statistic and the p-value.

Functions

condind(X, Y, group_ind[, kernel_X, ...])

Test whether Y conditioned on X is invariant across the groups.

condind(X, Y, group_ind, kernel_X=None, kernel_Y=None, null_sample_size=1000, normalize_data=True, propensity_model=None, propensity_weights=None, centered=True, n_jobs=None, random_seed=None)

Test whether Y conditioned on X is invariant across the groups.

To test conditional independence on continuous data, this function uses kernels [1], which are computationally efficient. Specifically, it tests the (conditional) invariance null hypothesis

P_{Z=1}(Y|X) = P_{Z=0}(Y|X)

where Z is the group indicator passed as group_ind.
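
To make the null hypothesis concrete, the sketch below generates synthetic data in which the conditional distribution of Y given X is either the same in both groups or shifted in group 1. The data-generating process, the helper name simulate_groups, and its parameters are illustrative assumptions and are not part of pywhy_stats.

```python
import numpy as np


def simulate_groups(n_samples=200, shift=0.0, seed=0):
    """Draw (X, Y, group_ind) for two groups (hypothetical helper).

    With shift == 0 the mechanism Y | X is identical in both groups, so the
    null P_{Z=1}(Y|X) = P_{Z=0}(Y|X) holds. A nonzero shift changes Y | X in
    group 1 only, which violates the null.
    """
    rng = np.random.default_rng(seed)
    group_ind = rng.integers(0, 2, size=n_samples)  # Z in {0, 1}
    # The marginal of X may differ between groups without violating the null.
    X = rng.normal(loc=0.5 * group_ind, scale=1.0, size=n_samples).reshape(-1, 1)
    noise = rng.normal(scale=0.1, size=(n_samples, 1))
    Y = np.sin(X) + shift * group_ind.reshape(-1, 1) + noise
    return X, Y, group_ind
```

Note that only the conditional distribution of Y given X matters here: the two groups may have different distributions of X and the null can still hold.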

Parameters:

X : array_like of shape (n_samples, n_features_x)
    Data for variable X, which can be multidimensional.
Y : array_like of shape (n_samples, n_features_y)
    Data for variable Y, which can be multidimensional.
group_ind : array_like of shape (n_samples,)
    Data for the group indicator Z, which assigns each sample to one of two groups, labeled 0 or 1.
kernel_X : Callable[[array_like], array_like]
    The kernel function for X. By default, the RBF kernel is used for continuous data and the delta kernel for categorical data. Note that only string values are currently treated as categorical data. Kernels can be specified in the same way as for pairwise_kernels(), with the addition that the ‘delta’ kernel is supported for categorical data.
kernel_Y : Callable[[array_like], array_like]
    The kernel function for Y. By default, the RBF kernel is used for continuous data and the delta kernel for categorical data. Note that only string values are currently treated as categorical data. Kernels can be specified in the same way as for pairwise_kernels(), with the addition that the ‘delta’ kernel is supported for categorical data.
null_sample_size : int
    The number of samples to generate for the bootstrap distribution used to approximate the p-value, by default 1000.
normalize_data : bool
    Whether the data should be standardized to unit variance, by default True.
propensity_model : Optional[sklearn.base.BaseEstimator], optional
    The propensity model used to estimate the propensity scores, by default None.
propensity_weights : Optional[array_like], optional
    The propensity weights to use, by default None, which means that the propensity scores are estimated with propensity_model.
centered : bool
    Whether the kernel matrix should be centered, by default True.
n_jobs : Optional[int], optional
    The number of jobs to run in parallel, by default None.
random_seed : Optional[int], optional
    Random seed, by default None.
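
A minimal usage sketch, based on the signature documented above, might look as follows. The synthetic data and the helper names make_inputs and run_kcd_example are hypothetical. The attribute names statistic and pvalue on the returned PValueResult are an assumption drawn from the Returns description, so check them against the installed version.

```python
import numpy as np


def make_inputs(n_samples=100, seed=0):
    """Build synthetic inputs with the documented shapes (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, 1))
    group_ind = rng.integers(0, 2, size=n_samples)  # each sample in group 0 or 1
    # Y depends on X in the same way in both groups, so the null holds.
    Y = X**2 + rng.normal(scale=0.1, size=(n_samples, 1))
    return X, Y, group_ind


def run_kcd_example(n_samples=100, seed=0):
    """Run the test on the synthetic inputs; requires pywhy_stats."""
    # Imported here so that make_inputs can be used without the package.
    from pywhy_stats.conditional_ksample import kcd

    X, Y, group_ind = make_inputs(n_samples, seed)
    result = kcd.condind(
        X,
        Y,
        group_ind,
        null_sample_size=200,  # fewer bootstrap samples than the default 1000
        random_seed=seed,      # fixed seed for reproducibility
    )
    # Assumption: PValueResult exposes `statistic` and `pvalue` attributes.
    return result.statistic, result.pvalue
```

Because the data are generated under the null, a large p-value would be the expected outcome in most runs, although any single run can differ.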

Notes

Any callable can be given to create the kernel matrix. For instance, to use a particular kernel from sklearn:

kernel_X = sklearn.metrics.pairwise.polynomial_kernel

In addition, we implement an efficient delta kernel. The delta kernel can be selected by passing the string ‘delta’ as the kernel argument (kernel_X or kernel_Y).
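
To illustrate what a delta kernel computes, here is a small NumPy sketch of a single-argument callable matching the Callable[[array_like], array_like] annotation above. It is not the library's implementation; the function name delta_kernel and the example labels are assumptions made for illustration.

```python
import numpy as np


def delta_kernel(X):
    """Return the (n_samples, n_samples) delta-kernel matrix of X.

    Entry (i, j) is 1.0 when rows i and j are identical and 0.0 otherwise.
    Illustrative only; not the pywhy_stats implementation.
    """
    X = np.asarray(X)
    if X.ndim == 1:
        X = X.reshape(-1, 1)
    # Compare every pair of rows feature by feature, then require all to match.
    return np.all(X[:, None, :] == X[None, :, :], axis=-1).astype(float)


# Three categorical samples; the first and third share the same label.
labels = np.array([["a"], ["b"], ["a"]])
K = delta_kernel(labels)
```

The resulting matrix has ones wherever two samples carry the same category, which is why this kernel suits string-valued (categorical) data where a distance-based kernel such as the RBF kernel is not meaningful.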

References