3.3.1. dodiscover.cd.BregmanCDTest

class dodiscover.cd.BregmanCDTest(metric='rbf', distance_metric='euclidean', kwidth=None, null_reps=1000, n_jobs=None, propensity_model=None, propensity_est=None, random_state=None)

Bregman divergence conditional discrepancy test.

Tests the equality of conditional distributions using a kernel approach to estimate Bregman divergences outlined in [1].

Parameters:
metric : str, optional

The kernel metric, by default ‘rbf’.

distance_metric : str, optional

The distance metric, by default ‘euclidean’.

kwidth : float, optional

The width of the kernel, by default None, in which case it is estimated using the default procedure in dodiscover.ci.kernel_utils.compute_kernel().

null_reps : int, optional

Number of times to sample null distribution, by default 1000.

n_jobs : int, optional

Number of CPUs to use, by default None.

propensity_model : callable, optional

The propensity model used to estimate propensity scores among the groups. If None (default), sklearn.linear_model.LogisticRegression is used. The propensity_model passed in must implement a predict_proba method. See https://scikit-learn.org/stable/glossary.html#term-predict_proba for more information.

propensity_est : array_like of shape (n_samples, n_groups), optional

The propensity estimates for each group. The number of columns must match the number of distinct groups in the group_col of the data passed to test(). If None (default), a propensity model is built using the propensity_model argument.

random_state : int, optional

Random seed, by default None.
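To make the kwidth and metric parameters more concrete, the following is a minimal sketch of an RBF kernel whose width falls back to the median of pairwise Euclidean distances when no width is given. Whether compute_kernel() uses exactly this rule, and this exact parametrization of the kernel, is an assumption. The helpers median_heuristic_width and rbf_kernel_matrix are hypothetical names and are not part of dodiscover. The sketch uses only the Python standard library.

```python
import math
from statistics import median


def median_heuristic_width(points):
    """Median of all pairwise Euclidean distances (hypothetical helper)."""
    dists = [
        math.dist(p, q)
        for i, p in enumerate(points)
        for q in points[i + 1:]
    ]
    return median(dists)


def rbf_kernel_matrix(points, kwidth=None):
    """K[i][j] = exp(-||x_i - x_j||^2 / (2 * kwidth^2)); parametrization assumed."""
    if kwidth is None:
        # Assumed fallback: estimate the width from the data itself.
        kwidth = median_heuristic_width(points)
    return [
        [math.exp(-math.dist(p, q) ** 2 / (2.0 * kwidth ** 2)) for q in points]
        for p in points
    ]


points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
width = median_heuristic_width(points)
K = rbf_kernel_matrix(points)
```

Passing an explicit kwidth skips the estimation step, which mirrors the role of a non-None kwidth in the constructor.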

Notes

Currently, only tests between two groups are supported. Therefore, df[group_col] must contain only binary indicators, and propensity_est must contain exactly two columns.
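The two-group restriction can be checked before running the test. The sketch below is one way to do so with plain Python sequences. The helper check_two_group_inputs is hypothetical and not part of dodiscover, and the library may validate its inputs differently.

```python
def check_two_group_inputs(group_values, propensity_est=None):
    """Raise ValueError unless the inputs describe exactly two groups.

    Hypothetical helper: `group_values` is a sequence of group indicators and
    `propensity_est` an optional sequence of per-sample rows.
    """
    labels = set(group_values)
    if not labels <= {0, 1}:
        raise ValueError(f"group column must contain only 0/1, got {labels}")
    if propensity_est is not None:
        if len(propensity_est) != len(group_values):
            raise ValueError("propensity_est needs one row per sample")
        if any(len(row) != 2 for row in propensity_est):
            raise ValueError("propensity_est must have exactly two columns")
    return True


groups = [0, 1, 1, 0]
propensity = [[0.7, 0.3], [0.4, 0.6], [0.2, 0.8], [0.5, 0.5]]
ok = check_two_group_inputs(groups, propensity)
```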

Methods

compute_null(e_hat, X, Y[, null_reps, ...])

Estimate null distribution using propensity weights.

test(df, group_col, y_vars, x_vars)

Compute conditional discrepancy test.

compute_null(e_hat, X, Y, null_reps=1000, random_state=None)

Estimate null distribution using propensity weights.

Parameters:
e_hat : array-like of shape (n_samples,)

The predicted propensity score for group_ind == 1.

X : array-like of shape (n_samples, n_features_x)

The X (covariates) array.

Y : array-like of shape (n_samples, n_features_y)

The Y (outcomes) array.

null_reps : int, optional

Number of times to sample null, by default 1000.

random_state : int, optional

Random generator, or random seed, by default None.

Returns:
null_dist : array-like of shape (null_reps,)

The null distribution of test statistics.
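One plausible reading of "estimate null distribution using propensity weights" is a conditional resampling scheme: in each repetition, every sample's group label is redrawn as a Bernoulli variable with success probability e_hat, and the statistic is recomputed on the relabelled data. The sketch below illustrates that idea only. It is an assumption about the procedure and not the library's implementation. The statistic mean_difference is a simple stand-in for the Bregman divergence estimate, it ignores X, and both function names are hypothetical.

```python
import random


def mean_difference(y, groups):
    """Stand-in statistic: |mean(Y | group=1) - mean(Y | group=0)|."""
    y1 = [v for v, g in zip(y, groups) if g == 1]
    y0 = [v for v, g in zip(y, groups) if g == 0]
    if not y1 or not y0:
        return 0.0  # degenerate resample with a single group
    return abs(sum(y1) / len(y1) - sum(y0) / len(y0))


def sketch_compute_null(e_hat, y, statistic, null_reps=1000, random_state=None):
    """Recompute `statistic` on labels resampled from the propensity scores."""
    rng = random.Random(random_state)
    null_dist = []
    for _ in range(null_reps):
        # Each label is drawn independently with P(group == 1) = e_hat[i].
        resampled = [1 if rng.random() < p else 0 for p in e_hat]
        null_dist.append(statistic(y, resampled))
    return null_dist


e_hat = [0.2, 0.8, 0.5, 0.5, 0.3, 0.7]
y = [1.0, 2.0, 1.5, 2.5, 0.5, 3.0]
null_dist = sketch_compute_null(e_hat, y, mean_difference, null_reps=200, random_state=0)
```

Resampling labels from the propensity scores, rather than permuting them uniformly, keeps each sample's chance of landing in a group tied to its covariates.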

test(df, group_col, y_vars, x_vars)

Compute conditional discrepancy test.

Tests the null hypothesis \(P(Y | X, group) = P(Y | X)\), that is, whether Y is conditionally independent of the group indicator (which denotes the distribution) given X.

Another way of viewing this test is testing whether or not \(P_i(Y|X) = P_j(Y|X)\), where \(P_i(.)\) and \(P_j(.)\) denote distributions from different groups or environments denoted by the group_col.

Parameters:
df : pd.DataFrame

The dataframe containing the dataset.

y_vars : Set of column

The columns in df to use as the outcome variables Y.

group_col : column

A column in df that indicates which group (distribution) each sample belongs to, coded as ‘0’ or ‘1’.

x_vars : Set of column, optional

The columns in df to use as the covariates X, the conditioning set.

Returns:
Tuple[float, float]

Test statistic and pvalue.
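A p-value of this kind is typically obtained by comparing the observed statistic with the sampled null distribution. The sketch below shows one common Monte Carlo convention, with a +1 correction so that the p-value is never exactly zero. Whether BregmanCDTest applies this correction is an assumption, and monte_carlo_pvalue is a hypothetical helper.

```python
def monte_carlo_pvalue(stat, null_dist):
    """Share of null statistics at least as large as `stat`, with +1 correction."""
    exceed = sum(1 for s in null_dist if s >= stat)
    return (1 + exceed) / (1 + len(null_dist))


null_dist = [0.1, 0.4, 0.2, 0.9, 0.3, 0.5, 0.05, 0.7, 0.6]
pvalue = monte_carlo_pvalue(0.65, null_dist)
```

With more null repetitions (null_reps), the smallest attainable p-value shrinks, which is why the default of 1000 gives a finer resolution than a few dozen draws.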