2.3.1. pywhy_stats.conditional_ksample.bregman

Bregman (conditional) discrepancy test.

This is also known as a conditional k-sample test: the null hypothesis is that the conditional distributions are equal across population groups. The Bregman test measures conditional divergence using correntropy.

2.3.1.1. Returns

PValueResult

The result of the test, which includes the test statistic and pvalue.

Functions

condind(X, Y, group_ind[, kernel, ...])

Test whether Y conditioned on X is invariant across the groups.

condind(X, Y, group_ind, kernel=None, null_sample_size=1000, propensity_model=None, propensity_weights=None, centered=False, n_jobs=None, random_seed=None)

Test whether Y conditioned on X is invariant across the groups.

For testing conditional independence on continuous data, we compute Bregman divergences [1]. This specifically tests the (conditional) invariance null hypothesis:

P_{Z=1}(Y|X) = P_{Z=0}(Y|X)
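
To make the null hypothesis concrete, the following is a minimal sketch of simulated data in which it holds and in which it fails. The data-generating process, the variable names `Y_invariant` and `Y_shifted`, and the helper `group_slope` are illustrative assumptions, not part of pywhy_stats. Note that the distribution of X may differ between the groups without violating the null; only the mechanism from X to Y matters.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 500

# Group indicator Z in {0, 1}.
group_ind = rng.integers(0, 2, size=n_samples)

# The covariate distribution P(X) is allowed to differ between groups.
X = rng.normal(loc=1.5 * group_ind, scale=1.0).reshape(-1, 1)

noise = rng.normal(scale=0.5, size=(n_samples, 1))

# Null holds: Y depends on X through the same mechanism in both groups.
Y_invariant = 2.0 * X + noise

# Null violated: the slope of Y on X changes with the group.
slope = np.where(group_ind == 1, 2.0, -1.0).reshape(-1, 1)
Y_shifted = slope * X + noise


def group_slope(y, g):
    """Least-squares slope of y on X within group g (illustrative helper)."""
    mask = group_ind == g
    return np.polyfit(X[mask, 0], y[mask, 0], deg=1)[0]
```

Comparing `group_slope(Y_invariant, 0)` with `group_slope(Y_invariant, 1)` should give nearly equal values, whereas the same comparison on `Y_shifted` should not. Arrays shaped like these are what a conditional invariance test of this kind would take as X, Y and the group indicator.
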
Parameters:
X : array_like of shape (n_samples, n_features_x)

Data for variable X, which can be multidimensional.

Y : array_like of shape (n_samples, n_features_y)

Data for variable Y, which can be multidimensional.

group_ind : array_like of shape (n_samples,)

Data for the group indicator Z. This assigns each sample to a group, indicated by 0 or 1.

kernel_X : Callable[[array_like], array_like]

The kernel function for X. By default, the RBF kernel is used for continuous and the delta kernel for categorical data. Note that we currently only consider string values as categorical data.

kernel_Y : Callable[[array_like], array_like]

The kernel function for Y. By default, the RBF kernel is used for continuous and the delta kernel for categorical data. Note that we currently only consider string values as categorical data.

null_sample_size : int

The number of bootstrap samples used to approximate the null distribution, from which the p-value is computed, by default 1000.

propensity_model : Optional[sklearn.base.BaseEstimator], optional

The propensity model to use to estimate the propensity score, by default None.

propensity_weights : Optional[array_like], optional

The propensity weights to use, by default None, which means that the propensity scores will be estimated from the propensity_model.

centered : bool

Whether the kernel matrix should be centered, by default False.

n_jobs : Optional[int], optional

The number of jobs to run in parallel, by default None.

random_seed : Optional[int], optional

Random seed, by default None.
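
The role of null_sample_size can be illustrated with a generic Monte Carlo p-value. The sketch below is an assumption for illustration only: it uses a hypothetical stand-in statistic (`mean_gap`) and plain permutation of the group labels, which may differ from the resampling scheme and the statistic that the library actually uses.

```python
import numpy as np


def monte_carlo_pvalue(observed_stat, null_stats):
    """Fraction of null statistics at least as large as the observed one.

    The +1 terms count the observed statistic as one draw from the null,
    so the estimate is never exactly zero.
    """
    null_stats = np.asarray(null_stats, dtype=float)
    return (1.0 + np.sum(null_stats >= observed_stat)) / (1.0 + null_stats.size)


def mean_gap(y, z):
    """Hypothetical stand-in statistic: absolute difference in group means."""
    return abs(y[z == 1].mean() - y[z == 0].mean())


rng = np.random.default_rng(0)
null_sample_size = 1000

# Two groups whose outcome distributions are clearly different.
z = np.repeat([0, 1], 100)
y = rng.normal(loc=2.0 * z, scale=1.0)

observed = mean_gap(y, z)
# Resample the group labels to mimic "group membership carries no information".
null_stats = [mean_gap(y, rng.permutation(z)) for _ in range(null_sample_size)]
pvalue = monte_carlo_pvalue(observed, null_stats)
```

With this estimator the smallest attainable p-value is 1 / (null_sample_size + 1), so a larger null_sample_size gives finer resolution at the cost of more computation.
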

Notes

Any callable can be given to create the kernel matrix. For instance, to use a particular kernel from sklearn:

kernel_X = sklearn.metrics.pairwise.polynomial_kernel
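
As a minimal sketch of the note above, a kernel callable with fixed hyperparameters can be built with functools.partial around scikit-learn's polynomial_kernel. The name `poly_kernel` and the chosen degree and coef0 are illustrative assumptions. The callable takes a single data array and returns the Gram matrix, which matches the Callable[[array_like], array_like] type listed in the parameters.

```python
from functools import partial

import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

# Fix the kernel hyperparameters so the callable takes only the data array.
poly_kernel = partial(polynomial_kernel, degree=2, coef0=1.0)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))

# Calling with one array returns the (n_samples, n_samples) Gram matrix.
K = poly_kernel(X)
```

Such a callable would presumably be passed through whichever kernel argument the installed version exposes; the signature above lists `kernel`, while the parameter descriptions mention kernel_X and kernel_Y, so check the version in use before relying on either name.
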

References