3.3.1. dodiscover.cd.BregmanCDTest
- class dodiscover.cd.BregmanCDTest(metric='rbf', distance_metric='euclidean', kwidth=None, null_reps=1000, n_jobs=None, propensity_model=None, propensity_est=None, random_state=None)
Bregman divergence conditional discrepancy test.
Tests the equality of conditional distributions using a kernel approach to estimate Bregman divergences outlined in [1].
- Parameters:
- metric : str, optional
The kernel metric, by default ‘rbf’.
- distance_metric : str, optional
The distance metric, by default ‘euclidean’.
- kwidth : float, optional
The width of the kernel, by default None, which we will then estimate using the default procedure in dodiscover.ci.kernel_utils.compute_kernel().
- null_reps : int, optional
Number of times to sample null distribution, by default 1000.
- n_jobs : int, optional
Number of CPUs to use, by default None.
- propensity_model : callable(), optional
The propensity model to estimate propensity scores among the groups. If None (default), will use sklearn.linear_model.LogisticRegression. The propensity_model passed in must implement a predict_proba method in order to be used. See https://scikit-learn.org/stable/glossary.html#term-predict_proba for more information.
- propensity_est : array_like of shape (n_samples, n_groups,), optional
The propensity estimates for each group. Must match the cardinality of the group_col in the data passed to the test function. If None (default), will build a propensity model using the argument in propensity_model.
- random_state : int, optional
Random seed, by default None.
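To make the `predict_proba` requirement and the `(n_samples, n_groups)` shape of `propensity_est` concrete, here is a minimal sketch. `MarginalPropensityModel` is a hypothetical stand-in, not part of dodiscover: it ignores the covariates and predicts each group's marginal frequency. The scikit-learn-style `fit(X, y)` method is an assumption; the parameter description above only states that `predict_proba` must exist.

```python
import numpy as np


class MarginalPropensityModel:
    """Hypothetical model: predicts each group's marginal frequency, ignoring X."""

    def fit(self, X, y):
        # Assumed scikit-learn-style fit; stores the observed group frequencies.
        y = np.asarray(y)
        self.classes_, counts = np.unique(y, return_counts=True)
        self.class_prior_ = counts / counts.sum()
        return self

    def predict_proba(self, X):
        # One row per sample, one column per group; every row sums to 1.
        n_samples = np.asarray(X).shape[0]
        return np.tile(self.class_prior_, (n_samples, 1))


rng = np.random.default_rng(0)
X = rng.normal(size=(8, 2))                 # toy covariates
group = np.array([0, 0, 0, 1, 1, 0, 1, 0])  # binary group indicator

propensity_est = MarginalPropensityModel().fit(X, group).predict_proba(X)
print(propensity_est.shape)
```

An array with this layout, one column per group, is the form the `propensity_est` description above calls for.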
Notes
Currently, only testing between two groups is supported. Therefore, df[group_col] must contain only binary indicators and propensity_est must contain only two columns.
Methods
- compute_null(e_hat, X, Y[, null_reps, ...]): Estimate null distribution using propensity weights.
- test(df, group_col, y_vars, x_vars): Compute conditional discrepancy test.
- compute_null(e_hat, X, Y, null_reps=1000, random_state=None)
Estimate null distribution using propensity weights.
- Parameters:
- e_hat : array-like of shape (n_samples,)
The predicted propensity score for group_ind == 1.
- X : array-like of shape (n_samples, n_features_x)
The X (covariates) array.
- Y : array-like of shape (n_samples, n_features_y)
The Y (outcomes) array.
- null_reps : int, optional
Number of times to sample null, by default 1000.
- random_state : int, optional
Random generator, or random seed, by default None.
- Returns:
- null_dist : array-like of shape (n_samples,)
The null distribution of test statistics.
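The text above does not spell out the resampling scheme, so the following is only a sketch of one way a propensity-based null distribution could be simulated: group labels are redrawn as Bernoulli(e_hat) and a statistic is recomputed on each draw. `toy_statistic` and `sketch_null` are hypothetical names, the statistic is a placeholder rather than the Bregman divergence estimate, the sketch returns one value per repetition (an assumption), and the p-value formula is the usual permutation-test convention rather than something taken from the library.

```python
import numpy as np


def toy_statistic(group, Y):
    # Placeholder statistic (NOT the Bregman divergence): gap between group means of Y.
    # Assumes both groups are non-empty in the given labelling.
    return float(np.abs(Y[group == 1].mean() - Y[group == 0].mean()))


def sketch_null(e_hat, Y, null_reps=1000, random_state=None):
    # Hypothetical helper: redraw group labels from the propensity scores
    # and recompute the statistic once per repetition.
    rng = np.random.default_rng(random_state)
    null_dist = np.empty(null_reps)
    for i in range(null_reps):
        group = rng.binomial(1, e_hat)  # one Bernoulli(e_hat) draw per sample
        null_dist[i] = toy_statistic(group, Y)
    return null_dist


rng = np.random.default_rng(0)
n_samples = 200
X = rng.normal(size=(n_samples, 1))
# Toy propensity for group 1, clipped away from 0 and 1.
e_hat = np.clip(1 / (1 + np.exp(-X[:, 0])), 0.2, 0.8)
group = rng.binomial(1, e_hat)
Y = X + rng.normal(size=(n_samples, 1))  # Y depends on X only

observed = toy_statistic(group, Y)
null_dist = sketch_null(e_hat, Y, null_reps=200, random_state=1)
# Common convention: add one to numerator and denominator to avoid a zero p-value.
pvalue = (1 + np.sum(null_dist >= observed)) / (1 + len(null_dist))
```

Because the labels are redrawn from `e_hat` rather than permuted uniformly, each simulated labelling keeps the dependence of group membership on X, which is the point of using propensity scores here.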
- test(df, group_col, y_vars, x_vars)
Compute conditional discrepancy test.
Tests the null hypothesis \(P(Y | X, group) = P(Y | X)\), that is, whether Y is conditionally independent of the group indicator given X.
Equivalently, the test asks whether \(P_i(Y|X) = P_j(Y|X)\), where \(P_i(.)\) and \(P_j(.)\) denote the distributions of different groups or environments, as indicated by group_col.
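A toy simulation may help to picture the hypothesis. The sketch below is an illustration with made-up data, not the test itself: it generates one outcome whose conditional distribution given X is the same in both groups and one where it differs, then compares per-group regression slopes as a crude proxy. The linear model, variable names, and `slope` helper are all assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
# Group membership depends on X, so the groups have different X distributions.
group = rng.binomial(1, 1 / (1 + np.exp(-x)))

# Null holds: Y depends on X only, so P(Y | X, group) = P(Y | X).
y_null = 2.0 * x + rng.normal(size=n)
# Null fails: the slope of Y on X differs between the two groups.
y_alt = (2.0 + 1.5 * group) * x + rng.normal(size=n)


def slope(x, y):
    # Least-squares slope of y on x (first coefficient of a degree-1 fit).
    return np.polyfit(x, y, 1)[0]


gap_null = slope(x[group == 1], y_null[group == 1]) - slope(x[group == 0], y_null[group == 0])
gap_alt = slope(x[group == 1], y_alt[group == 1]) - slope(x[group == 0], y_alt[group == 0])
print(f"slope gap under the null: {gap_null:.2f}")
print(f"slope gap under the alternative: {gap_alt:.2f}")
```

In both scenarios the marginal distribution of Y differs between groups, because X does. Only the second scenario changes the relationship between Y and X, which is the kind of difference the conditional test targets.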
- Parameters:
- df : pd.DataFrame
The dataframe containing the dataset.
- y_vars : Set of column
A column in df.
- group_col : column
A column in df that indicates, with a ‘0’ or ‘1’, which group (distribution) each sample belongs to.
- x_vars : Set of column, optional
A column in df.
- Returns: