dodiscover.constraint.LearnSemiMarkovianSkeleton
- class dodiscover.constraint.LearnSemiMarkovianSkeleton(ci_estimator, sep_set=None, alpha=0.05, min_cond_set_size=0, max_cond_set_size=None, max_combinations=None, condsel_method=ConditioningSetSelection.NBRS, second_stage_condsel_method=ConditioningSetSelection.PDS, keep_sorted=False, max_path_length=None, n_jobs=None)
Learn a skeleton from a semi-Markovian causal model.
The skeleton is learned by testing edges against candidate separating sets drawn from the “possibly d-separating” (PDS) sets, or from the PDS sets restricted to nodes that lie on a path between the two nodes [1]. This algorithm requires a collider-oriented PAG as input, which provides the information needed to compute the PDS set for any given nodes. See Notes for more details.
- Parameters:
- ci_estimator
BaseConditionalIndependenceTest
The conditional independence test function.
- sep_set
dictionary of dictionary of list of set
Mapping node to other nodes to separating sets of variables. If None, then an empty dictionary of dictionary of list of sets will be initialized.
- alpha
float, optional
The significance level for the conditional independence test, by default 0.05.
- min_cond_set_size
int
The minimum number of variables in a conditioning set, by default 0.
- max_cond_set_size
int, optional
Maximum size of the conditioning set, by default None. Used to limit the computation spent on the algorithm.
- max_combinations
int, optional
The maximum number of conditional independence tests to run from the set of possible conditioning sets. By default None, which means the algorithm will check all possible conditioning sets. If max_combinations=n is set, then for every conditioning set size ‘p’, at most ‘n’ CI tests are run before ‘p’ is incremented. To control the size of ‘p’, see min_cond_set_size and max_cond_set_size. This can be used in conjunction with the keep_sorted parameter to only test the “strongest” dependences.
- condsel_method
ConditioningSetSelection
The method to use for determining conditioning sets when testing conditional independence in the first stage. See LearnSkeleton for details.
- second_stage_condsel_method
ConditioningSetSelection | None
The method to use for determining conditioning sets when testing conditional independence in the second stage. Must be one of (‘pds’, ‘pds_path’). See Notes for more details. If None, then no second stage skeleton discovery phase will be run.
- keep_sorted
bool
Whether or not to keep the considered conditioning set variables in sorted dependency order, by default False. If True, the existing dependencies of each variable are sorted from strongest to weakest (i.e. largest CI test statistic value to lowest). This can be used in conjunction with the max_combinations parameter to only test the “strongest” dependences.
- max_path_length
int, optional
The maximum length of any discriminating path, or None if unlimited.
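As an illustration of how min_cond_set_size, max_cond_set_size and max_combinations interact, the sketch below enumerates conditioning sets by increasing size ‘p’ and caps the number tried per size. The helper name candidate_cond_sets is hypothetical and this is not the library's implementation, only a minimal standard-library model of the behavior described above.

```python
from itertools import combinations, islice


def candidate_cond_sets(possible, min_size=0, max_size=None, max_combinations=None):
    """Yield conditioning sets by increasing size p, at most max_combinations per size.

    Hypothetical helper for illustration; not part of dodiscover.
    """
    possible = list(possible)
    # With no explicit maximum, the size is bounded by the number of candidates.
    upper = len(possible) if max_size is None else min(max_size, len(possible))
    for p in range(min_size, upper + 1):
        # islice(..., None) applies no cap, mirroring max_combinations=None.
        yield from islice(combinations(possible, p), max_combinations)


sets = list(candidate_cond_sets(["a", "b", "c"], max_size=2, max_combinations=2))
print(sets)  # [(), ('a',), ('b',), ('a', 'b'), ('a', 'c')]
```

If the candidates are ordered from strongest to weakest dependence first, as keep_sorted is described to do, the cap keeps only the sets built from the strongest dependences at each size.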
Notes
To learn the skeleton of a Semi-Markovian causal model, one approach is to consider the possibly d-separating (PDS) set, which is a superset of the d-separating sets in the true causal model. Knowing the PDS set requires knowledge of the skeleton and orientation of certain edges. Therefore, we first learn an initial skeleton by checking conditional independences with respect to node neighbors. From this, one can orient certain colliders. The resulting PAG can now be used to enumerate the PDS sets for each node, which are now conditioning candidates to check for conditional independence.
For visual examples, see Figures 16, 17 and 18 in [1]. Also, see the RFCI paper for other examples [2].
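The first stage described above (testing each edge against subsets of a node's current neighbors) can be sketched with a toy oracle. Everything here is an assumption made for illustration: the function names, the three-variable chain A -> B -> C, and the oracle that returns a (statistic, p-value) pair. It does not implement the PDS-based second stage and is not the library's code.

```python
from itertools import combinations


def oracle_ci_test(x, y, cond):
    """Toy oracle for the chain A -> B -> C: only A and C are independent, given B."""
    independent = {x, y} == {"A", "C"} and "B" in cond
    # Return (test statistic, p-value); a high p-value means "independent".
    return (0.0, 1.0) if independent else (1.0, 0.0)


def learn_skeleton_nbrs(nodes, ci_test, alpha=0.05):
    """Neighbor-based skeleton search starting from the complete graph."""
    adj = {n: set(nodes) - {n} for n in nodes}
    sep_set = {}
    size = 0
    # Continue while some node still has enough other neighbors to condition on.
    while any(len(adj[x]) - 1 >= size for x in nodes):
        for x in nodes:
            for y in sorted(adj[x]):
                candidates = sorted(adj[x] - {y})
                for cond in combinations(candidates, size):
                    _, pvalue = ci_test(x, y, set(cond))
                    if pvalue > alpha:
                        # Independence found: drop the edge, remember the separating set.
                        adj[x].discard(y)
                        adj[y].discard(x)
                        sep_set[frozenset((x, y))] = set(cond)
                        break
        size += 1
    return adj, sep_set


adj, sep_set = learn_skeleton_nbrs(["A", "B", "C"], oracle_ci_test)
print(adj)
print(sep_set)
```

In a semi-Markovian model, conditioning on neighbors alone can leave extra edges in place, which is why the second stage retests the remaining edges against PDS-based candidates.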
Different methods for learning the skeleton:
There are different ways to learn the skeleton that are valid under various assumptions. The value of condsel_method completely defines how one selects the conditioning set.
References
- Attributes:
- adj_graph_
nx.Graph
The graph discovered from data, stored as an undirected graph. Each edge carries two attributes: the smallest value of the test statistic encountered (key name ‘test_stat’) and the largest p-value seen when testing whether ‘x’ and ‘y’ are independent given some conditioning set (key name ‘pvalue’).
- sep_set_
dictionary of dictionary of list of set
Mapping node to other nodes to separating sets of variables.
- context_
Context
The result context. Encodes causal assumptions.
- min_cond_set_size_
int
The inferred minimum conditioning set size.
- max_cond_set_size_
int
The inferred maximum conditioning set size.
- max_combinations_
int
The inferred maximum number of combinations of ‘Z’ to test per \(X \perp Y | Z\).
- n_iters_
int
The number of iterations run while learning the skeleton.
- max_path_length_
int
The inferred maximum length that any single discriminating path is allowed to have.
- n_jobs
int, optional
Number of CPUs to use, by default None.
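To make the shapes of sep_set_ and the adj_graph_ edge attributes concrete, here is a minimal sketch using plain dictionaries. The helper names are hypothetical, the symmetric storage of separating sets is an assumption, and a plain dict stands in for the nx.Graph edge data; only the key names ‘test_stat’ and ‘pvalue’ come from the description above.

```python
from collections import defaultdict

# Nested mapping: node -> other node -> list of separating sets.
sep_set = defaultdict(lambda: defaultdict(list))


def record_separating_set(sep_set, x, y, cond):
    """Store a separating set for the pair (assumed here to be kept in both directions)."""
    sep_set[x][y].append(set(cond))
    sep_set[y][x].append(set(cond))


# Stand-in for edge attributes: undirected edge -> {'test_stat', 'pvalue'}.
edge_stats = {}


def update_edge(edge_stats, x, y, test_stat, pvalue):
    """Keep the smallest test statistic and the largest p-value seen for an edge."""
    key = frozenset((x, y))
    prev = edge_stats.get(key, {"test_stat": float("inf"), "pvalue": 0.0})
    edge_stats[key] = {
        "test_stat": min(prev["test_stat"], test_stat),
        "pvalue": max(prev["pvalue"], pvalue),
    }


record_separating_set(sep_set, "A", "C", {"B"})
update_edge(edge_stats, "A", "B", test_stat=12.5, pvalue=0.001)
update_edge(edge_stats, "A", "B", test_stat=3.2, pvalue=0.04)
print(sep_set["A"]["C"])
print(edge_stats[frozenset(("A", "B"))])
```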
Methods
evaluate_edge(data, conditional_test_func, X, Y)
Test any specific edge for X || Y | Z.
learn_graph
- ci_estimator
Callable[[Column, Column, Set[Column]], Tuple[float, float]]
- evaluate_edge(data, conditional_test_func, X, Y, Z=None)
Test any specific edge for X || Y | Z.