2.3.1. pywhy_stats.conditional_ksample.bregman

Bregman (conditional) discrepancy test.

Also known as a conditional k-sample test, where the null hypothesis is that the conditional distributions are equal across different population groups. The Bregman test measures conditional divergence using correntropy.

2.3.1.1. Returns

PValueResult: The result of the test, which includes the test statistic and pvalue.

Functions

condind(X, Y, group_ind[, kernel, ...])

Test whether Y conditioned on X is invariant across the groups.

condind(X, Y, group_ind, kernel=None, null_sample_size=1000, propensity_model=None, propensity_weights=None, centered=False, n_jobs=None, random_seed=None)

Test whether Y conditioned on X is invariant across the groups.

For testing conditional independence on continuous data, we compute Bregman divergences [1]. Specifically, this tests the (conditional) invariance null hypothesis:

$$P_{Z=1}(Y \mid X) = P_{Z=0}(Y \mid X)$$
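
To make the null hypothesis concrete, the sketch below simulates two datasets with NumPy: one in which the conditional distribution of Y given X is the same in both groups, and one in which it shifts with Z. The data-generating process, the variable names other than `X` and `group_ind`, and the effect size are illustrative assumptions and are not part of pywhy_stats.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 500

# Group indicator Z in {0, 1}. X is allowed to depend on Z (covariate shift).
group_ind = rng.integers(0, 2, size=n_samples)
X = rng.normal(loc=0.5 * group_ind, scale=1.0, size=n_samples).reshape(-1, 1)
noise = rng.normal(scale=0.5, size=(n_samples, 1))

# Null holds: Y depends on X only, so P_{Z=1}(Y|X) = P_{Z=0}(Y|X).
Y_null = 2.0 * X + noise

# Null violated: the mechanism that generates Y shifts with the group.
Y_alt = 2.0 * X + 1.5 * group_ind.reshape(-1, 1) + noise

# The residual Y - 2X isolates the part of Y that X does not explain.
resid_null = (Y_null - 2.0 * X).ravel()
resid_alt = (Y_alt - 2.0 * X).ravel()
gap_null = resid_null[group_ind == 1].mean() - resid_null[group_ind == 0].mean()
gap_alt = resid_alt[group_ind == 1].mean() - resid_alt[group_ind == 0].mean()
```

Under the null the between-group gap in residuals is close to zero even though the distribution of X differs across groups. In the second dataset the gap is close to the simulated shift of 1.5.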

Parameters:

X (array_like of shape (n_samples, n_features_x)): Data for variable X, which can be multidimensional.
Y (array_like of shape (n_samples, n_features_y)): Data for variable Y, which can be multidimensional.
group_ind (array_like of shape (n_samples,)): Group indicator Z, which assigns each sample to a group labeled 0 or 1.
kernel_X (Callable[[array_like], array_like]): The kernel function for X. By default, the RBF kernel is used for continuous data and the delta kernel for categorical data. Note that only string values are currently treated as categorical data.
kernel_Y (Callable[[array_like], array_like]): The kernel function for Y. By default, the RBF kernel is used for continuous data and the delta kernel for categorical data. Note that only string values are currently treated as categorical data.
null_sample_size (int): The number of samples to generate for the bootstrap distribution used to approximate the p-value, by default 1000.
propensity_model (Optional[sklearn.base.BaseEstimator], optional): The propensity model used to estimate the propensity score, by default None.
propensity_weights (Optional[array_like], optional): The propensity weights to use, by default None, which means that the propensity scores are estimated from the propensity_model.
centered (bool): Whether the kernel matrix should be centered, by default False.
n_jobs (Optional[int], optional): The number of jobs to run in parallel, by default None.
random_seed (Optional[int], optional): Random seed, by default None.
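
A minimal usage sketch follows, based only on the signature shown above. The helper names `make_data` and `run_condind` are hypothetical. The attribute names `statistic` and `pvalue` on the returned PValueResult are an assumption inferred from the description of the return value. The call is wrapped in a function with a local import so that the data-generation part runs even when pywhy_stats is not installed.

```python
import numpy as np


def make_data(n_samples=200, seed=0):
    """Simulate data in which Y depends on X only, so the null holds."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, 1))
    group_ind = rng.integers(0, 2, size=n_samples)  # group labels in {0, 1}
    Y = X + rng.normal(scale=0.5, size=(n_samples, 1))
    return X, Y, group_ind


def run_condind(X, Y, group_ind, seed=0):
    """Run the Bregman test. Assumes pywhy_stats is installed."""
    from pywhy_stats.conditional_ksample import bregman

    result = bregman.condind(
        X,
        Y,
        group_ind,
        null_sample_size=200,  # fewer null samples than the default, for speed
        random_seed=seed,      # fixes the randomness of the null distribution
    )
    # Assumed attribute names on PValueResult.
    return result.statistic, result.pvalue


X, Y, group_ind = make_data()
# statistic, pvalue = run_condind(X, Y, group_ind)
```

With data generated under the null as above, a large p-value would be the expected outcome, although any single run can differ.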

Notes

Any callable can be given to create the kernel matrix. For instance, to use a particular kernel from sklearn:

kernel_X = sklearn.metrics.pairwise.polynomial_kernel
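
A custom kernel only needs to map the data to a square kernel matrix. The sketch below writes an RBF kernel and a delta kernel by hand with NumPy to show the expected shape of such a callable. The function names and the `gamma` default are hypothetical, and these are not the implementations pywhy_stats uses internally.

```python
import numpy as np


def rbf_kernel_matrix(data, gamma=1.0):
    """Gaussian (RBF) kernel matrix of shape (n_samples, n_samples)."""
    data = np.asarray(data, dtype=float)
    if data.ndim == 1:
        data = data.reshape(-1, 1)
    sq_norms = (data ** 2).sum(axis=1)
    # Pairwise squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * data @ data.T
    np.maximum(sq_dists, 0.0, out=sq_dists)  # clip tiny negatives from rounding
    return np.exp(-gamma * sq_dists)


def delta_kernel_matrix(data):
    """Delta kernel for categorical data: 1 where two samples match, else 0."""
    data = np.asarray(data)
    if data.ndim == 1:
        data = data.reshape(-1, 1)
    # Two samples match only if all of their features are equal.
    matches = (data[:, None, :] == data[None, :, :]).all(axis=-1)
    return matches.astype(float)


K_cont = rbf_kernel_matrix([[0.0], [1.0], [3.0]])
K_cat = delta_kernel_matrix(["a", "b", "a"])
```

Both functions take a single array and return a symmetric matrix with ones on the diagonal, which matches the `Callable[[array_like], array_like]` type given for the kernel parameters.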

References