2.2.1. pywhy_stats.conditional_ksample.kcd

Kernel (conditional) discrepancy test.

Also known as a conditional k-sample test, where the null hypothesis is that the conditional distributions are equal across different population groups.

2.2.1.1. Returns

PValueResult

The result of the test, which includes the test statistic and the p-value.

Functions

condind(X, Y, group_ind[, kernel_X, ...])

Test whether Y conditioned on X is invariant across the groups.

condind(X, Y, group_ind, kernel_X=None, kernel_Y=None, null_sample_size=1000, normalize_data=True, propensity_model=None, propensity_weights=None, centered=True, n_jobs=None, random_seed=None)

Test whether Y conditioned on X is invariant across the groups.

For testing conditional independence on continuous data, we leverage kernels [1] that are computationally efficient. This specifically tests the (conditional) invariance null hypothesis :math:`P_{Z=1}(Y|X) = P_{Z=0}(Y|X)`.
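To make the null hypothesis concrete, the following sketch builds synthetic data in which the conditional distribution of Y given X is either the same in both groups or differs between them. It uses only NumPy; the variable names (`Y_null`, `Y_alt`, `slope`) and the linear data-generating model are illustrative assumptions, not part of the library.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 200

# Group indicator Z: each sample belongs to group 0 or group 1.
group_ind = rng.integers(0, 2, size=n_samples)

# Covariate X. Its marginal distribution may differ by group
# without violating the conditional invariance null.
X = rng.normal(loc=group_ind, scale=1.0, size=n_samples).reshape(-1, 1)

noise = rng.normal(scale=0.1, size=(n_samples, 1))

# Null holds: Y depends on X in the same way in both groups.
Y_null = 2.0 * X + noise

# Null violated: the slope of Y on X changes with the group.
slope = np.where(group_ind == 1, 2.0, -2.0).reshape(-1, 1)
Y_alt = slope * X + noise
```

These arrays follow the shapes documented for `X`, `Y`, and `group_ind`, so they could be passed as `condind(X, Y_null, group_ind)` or `condind(X, Y_alt, group_ind)`. One would expect the test to reject the null only in the second case, although the actual p-values depend on the kernels and sample size.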
Parameters:
X : array_like of shape (n_samples, n_features_x)

Data for variable X, which can be multidimensional.

Y : array_like of shape (n_samples, n_features_y)

Data for variable Y, which can be multidimensional.

group_ind : array_like of shape (n_samples,)

Data for the group indicator Z. This assigns each sample to a group indicated by 0 or 1.

kernel_X : Callable[[array_like], array_like]

The kernel function for X. By default, the RBF kernel is used for continuous data and the delta kernel for categorical data. Note that we currently only consider string values as categorical data. Kernels can be specified in the same way as for pairwise_kernels(), with the addition that a ‘delta’ kernel is supported for categorical data.

kernel_Y : Callable[[array_like], array_like]

The kernel function for Y. By default, the RBF kernel is used for continuous data and the delta kernel for categorical data. Note that we currently only consider string values as categorical data. Kernels can be specified in the same way as for pairwise_kernels(), with the addition that a ‘delta’ kernel is supported for categorical data.

null_sample_size : int

The number of samples to generate for the bootstrap distribution used to approximate the p-value, by default 1000.

normalize_data : bool

Whether the data should be standardized to unit variance, by default True.

propensity_model : Optional[sklearn.base.BaseEstimator], optional

The propensity model to use to estimate the propensity score, by default None.

propensity_weights : Optional[array_like], optional

The propensity weights to use, by default None, which means that the propensity scores will be estimated from the propensity_model.

centered : bool

Whether the kernel matrix should be centered, by default True.

n_jobs : Optional[int], optional

The number of jobs to run in parallel, by default None.

random_seed : Optional[int], optional

Random seed, by default None.
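The normalize_data and centered options correspond to two common preprocessing steps in kernel tests. The sketch below shows one standard way to perform them with NumPy: scaling each column to unit variance, and double-centering a kernel matrix as H K H with H = I - 11^T / n. The helper names `standardize` and `center_kernel` are hypothetical, and whether the library applies exactly these formulas is an assumption.

```python
import numpy as np


def standardize(data):
    """Scale each column to zero mean and unit variance."""
    data = np.asarray(data, dtype=float)
    std = data.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled
    return (data - data.mean(axis=0)) / std


def center_kernel(K):
    """Double-center a square kernel matrix: H K H with H = I - 11^T / n."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H


X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
Xs = standardize(X)

# A linear kernel is used here only to keep the example short.
K = Xs @ Xs.T
Kc = center_kernel(K)
```

After centering, every row and column of `Kc` sums to zero (up to floating-point error), which is the property that centering is meant to enforce.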

Notes

Any callable can be given to create the kernel matrix. For instance, to use a particular kernel from sklearn:

kernel_X = :func:`sklearn.metrics.pairwise.polynomial_kernel`

In addition, we implement an efficient delta kernel. The delta kernel can be selected by passing the string ‘delta’ as the kernel argument.
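As an illustration of what a delta kernel computes on categorical (string) data, here is a minimal NumPy sketch. The function `delta_kernel` below is a hypothetical stand-in written for this example, not the library's implementation; it returns 1.0 where two samples are identical in every feature and 0.0 otherwise.

```python
import numpy as np


def delta_kernel(X, Y=None):
    """Return K with K[i, j] = 1.0 if row i of X equals row j of Y, else 0.0."""
    X = np.asarray(X)
    Y = X if Y is None else np.asarray(Y)
    # Treat 1-D input as a single categorical feature.
    if X.ndim == 1:
        X = X.reshape(-1, 1)
    if Y.ndim == 1:
        Y = Y.reshape(-1, 1)
    # Compare every row of X with every row of Y across all features.
    return np.all(X[:, None, :] == Y[None, :, :], axis=-1).astype(float)


labels = np.array([["a"], ["b"], ["a"]])
K = delta_kernel(labels)
```

Because the function accepts a single array and returns a kernel matrix, it has the same general shape as the callables described for kernel_X and kernel_Y, so a custom kernel of this form could be supplied in the same way as any other callable.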

References