dodiscover.constraint.LearnInterventionSkeleton

class dodiscover.constraint.LearnInterventionSkeleton(ci_estimator, cd_estimator, sep_set=None, alpha=0.05, min_cond_set_size=0, max_cond_set_size=None, max_combinations=None, condsel_method=ConditioningSetSelection.NBRS, second_stage_condsel_method=ConditioningSetSelection.PDS, keep_sorted=False, max_path_length=None, known_intervention_targets=False, n_jobs=None)

Learn a skeleton from observational and interventional data.

An interventional skeleton is a skeleton learned from observational and/or interventional data. It is the usual skeleton among the observed variables, augmented with F-nodes that represent interventions and with the edges from those F-nodes.

Parameters:

ci_estimator (BaseConditionalIndependenceTest): The conditional independence test function.
cd_estimator (BaseConditionalDiscrepancyTest): The conditional discrepancy test function.
sep_set (dictionary of dictionary of list of set): Mapping from each node to each other node to a list of separating sets of variables. If None, an empty dictionary of dictionary of list of sets is initialized.
alpha (float, optional): The significance level for the conditional independence test, by default 0.05.
min_cond_set_size (int): The minimum number of variables in the conditioning set, by default 0.
max_cond_set_size (int, optional): The maximum number of variables in the conditioning set, by default None. Used to limit the computation spent on the algorithm.
max_combinations (int, optional): The maximum number of conditional independence tests to run from the set of possible conditioning sets. By default None, which means the algorithm checks all possible conditioning sets. If max_combinations=n is set, then for every conditioning set size ‘p’, at most ‘n’ CI tests are run before ‘p’ is incremented. For controlling the size of ‘p’, see min_cond_set_size and max_cond_set_size. This can be used in conjunction with the keep_sorted parameter to test only the “strongest” dependences.
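condsel_method (ConditioningSetSelection): The method used to select conditioning sets for the conditional independence tests, by default ConditioningSetSelection.NBRS.
second_stage_condsel_method (ConditioningSetSelection): The method used to select conditioning sets in the second stage of skeleton discovery, by default ConditioningSetSelection.PDS. Must be one of (‘pds’, ‘pds_path’).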
keep_sorted (bool): Whether to keep the considered conditioning set variables in sorted dependency order, by default False. If True, the existing dependencies of each variable are sorted from strongest to weakest (i.e. from the largest to the smallest CI test statistic value). This can be used in conjunction with the max_combinations parameter to test only the “strongest” dependences.
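max_path_length (int, optional): The maximum length of any discriminating path, or None if unlimited.
known_intervention_targets (bool): Whether the intervention targets of each dataset are known, by default False. See Notes.
n_jobs (int, optional): Number of CPUs to use, by default None.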
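How min_cond_set_size, max_cond_set_size, max_combinations and keep_sorted interact can be sketched in plain Python. This is a minimal illustration under our own assumptions, not dodiscover's implementation: candidate_conditioning_sets, empty_sep_set and record_separating_set are hypothetical helpers, and the "strengths" passed in stand for previously computed CI test statistics. The separating-set container mirrors the "dictionary of dictionary of list of set" shape described for sep_set.

```python
from collections import defaultdict
from itertools import combinations, islice
from typing import DefaultDict, Dict, Hashable, Iterable, Iterator, List, Optional, Set, Tuple


# Hypothetical helper (not part of dodiscover): enumerate candidate conditioning
# sets size by size, honoring the size bounds and the per-size cap.
def candidate_conditioning_sets(
    strengths: Dict[Hashable, float],
    min_cond_set_size: int = 0,
    max_cond_set_size: Optional[int] = None,
    max_combinations: Optional[int] = None,
    keep_sorted: bool = False,
) -> Iterator[Tuple[Hashable, ...]]:
    # strengths maps each candidate variable to an assumed dependence strength
    # (e.g. a CI test statistic); larger means a stronger dependence.
    variables = list(strengths)
    if keep_sorted:
        # Strongest dependences first, so a capped enumeration reaches them first.
        variables.sort(key=lambda v: strengths[v], reverse=True)
    largest = len(variables)
    if max_cond_set_size is not None:
        largest = min(largest, max_cond_set_size)
    for size in range(min_cond_set_size, largest + 1):
        combos: Iterable[Tuple[Hashable, ...]] = combinations(variables, size)
        if max_combinations is not None:
            # At most max_combinations sets of this size before moving to size + 1.
            combos = islice(combos, max_combinations)
        yield from combos


SepSet = DefaultDict[Hashable, DefaultDict[Hashable, List[Set[Hashable]]]]


def empty_sep_set() -> SepSet:
    # node -> other node -> list of separating sets
    return defaultdict(lambda: defaultdict(list))


def record_separating_set(
    sep_set: SepSet, x: Hashable, y: Hashable, cond: Iterable[Hashable]
) -> None:
    # Store the separating set symmetrically so either lookup order finds it.
    cond_set = set(cond)
    sep_set[x][y].append(set(cond_set))
    sep_set[y][x].append(set(cond_set))
```

With strengths {"A": 0.2, "B": 3.1, "C": 1.4}, max_cond_set_size=2, max_combinations=2 and keep_sorted=True, the candidates are (), ("B",), ("C",), ("B", "C"), ("B", "A"): the weakest single candidate ("A",) is never tried, which is the trade-off the two parameters are meant to expose.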

Notes

With interventional data, one may either know the intervention targets of each experimental dataset, or not know the explicit targets. If the intervention targets are known, the skeleton discovery algorithm of [1] is used; that is, we learn the skeleton of an AugmentedPAG. Otherwise, when the intervention targets are unknown, the skeleton discovery algorithm described in [2] is used. To define intervention targets, one must use the dodiscover.InterventionalContextBuilder.
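For known targets, our reading of the augmented-graph construction in [1] is that each pair of datasets with intervention targets I_i and I_j gets its own F-node, which points to the variables in the symmetric difference I_i Δ I_j, the observational dataset having the empty target set. The sketch below builds those F-node edges and attaches them to a given skeleton. It illustrates the graph construction only and is not how dodiscover learns the skeleton (the learner tests F-node adjacencies from data); f_node_targets, augment_skeleton and the ("F", i, j) node labels are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Hashable, List, Set, Tuple


def f_node_targets(
    targets: List[Set[Hashable]],
) -> Dict[Tuple[int, int], FrozenSet[Hashable]]:
    # targets[k] holds the intervened variables of dataset k; set() = observational.
    # Each pair (i, j) of datasets gets one F-node pointing to I_i Δ I_j.
    return {
        (i, j): frozenset(targets[i] ^ targets[j])
        for i, j in combinations(range(len(targets)), 2)
    }


def augment_skeleton(
    skeleton: Dict[Hashable, Set[Hashable]], targets: List[Set[Hashable]]
) -> Dict[Hashable, Set[Hashable]]:
    # Copy the skeleton among observed variables (undirected adjacency sets).
    augmented = {node: set(nbrs) for node, nbrs in skeleton.items()}
    for (i, j), children in f_node_targets(targets).items():
        f_node = ("F", i, j)  # hypothetical label for the F-node of pair (i, j)
        augmented[f_node] = set(children)
        for child in children:
            augmented.setdefault(child, set()).add(f_node)
    return augmented
```

For targets [set(), {"X"}, {"X", "Y"}], the F-node of datasets 1 and 2 is adjacent only to Y, because X is intervened on in both of those datasets.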

References

Methods

evaluate_edge(data, conditional_test_func, X, Y, Z=None): Test the edge between X and Y for conditional independence given Z (X ⊥ Y | Z).

learn_graph

ci_estimator: Callable[[Column, Column, Set[Column]], Tuple[float, float]]

evaluate_edge(data, conditional_test_func, X, Y, Z=None)

Test the edge between X and Y for conditional independence given Z (X ⊥ Y | Z).

Parameters:

data (pd.DataFrame): The dataset.
X (column): A column in data.
Y (column): A column in data.
Z (set, optional): A set of columns in data, by default None.

Returns:

test_stat (float): Test statistic.
pvalue (float): The p-value.