dodiscover.constraint.PC
- class dodiscover.constraint.PC(ci_estimator, alpha=0.05, min_cond_set_size=None, max_cond_set_size=None, max_combinations=None, condsel_method=ConditioningSetSelection.NBRS, apply_orientations=True, keep_sorted=False, max_iter=1000, n_jobs=None)
Peter and Clark (PC) algorithm for causal discovery.
Assumes causal sufficiency, that is, all confounders in the causal graph are observed variables. See [1] for full details on the algorithm.
- Parameters:
- ci_estimator
BaseConditionalIndependenceTest
The conditional independence test function. The arguments of the estimator should be data, node, node to compare, conditioning set of nodes, and any additional keyword arguments. It must implement the test function, which accepts the data, a set of X nodes, a set of Y nodes, and an optional set of Z nodes, and returns an ordered tuple of the test statistic and p-value associated with the null hypothesis \(X \perp Y | Z\).
- alpha
float, optional
The significance level for the conditional independence test, by default 0.05.
- min_cond_set_size
int, optional
Minimum size of the conditioning set, by default None, which is treated as 0. Used to constrain the computation spent on the algorithm.
- max_cond_set_size
int, optional
Maximum size of the conditioning set, by default None. Used to limit the computation spent on the algorithm.
- max_combinations
int, optional
The maximum number of conditional independence tests to run from the set of possible conditioning sets. By default None, which means the algorithm will check all possible conditioning sets. If max_combinations=n is set, then for every conditioning set size ‘p’, at most ‘n’ CI tests are run before ‘p’ is incremented. To control the size of ‘p’, see min_cond_set_size and max_cond_set_size. This can be used in conjunction with the keep_sorted parameter to test only the “strongest” dependences.
- condsel_method
ConditioningSetSelection
The method to use for selecting the conditioning set. Must be one of (‘neighbors’, ‘complete’, ‘neighbors_path’). See Notes for more details.
- apply_orientations
bool
Whether or not to apply orientation rules given the learned skeleton graph and separating set per pair of variables. If True (default), will apply Meek’s orientation rules R0-3, orienting colliders and certain arrowheads [2].
- keep_sorted
bool
Whether or not to keep the considered conditioning set variables in sorted dependency order. If True, will sort the existing dependencies of each variable from strongest to weakest (i.e. largest CI test statistic value to lowest). By default False. The conditioning set is chosen lexicographically based on the sorted test statistic values of ‘ith Pa(X) -> X’, for each possible parent node of ‘X’. This can be used in conjunction with the max_combinations parameter to test only the “strongest” dependences.
- max_iter
int
The maximum number of iterations through the graph to apply orientation rules.
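The interaction between min_cond_set_size, max_cond_set_size, and max_combinations described above can be pictured with a minimal sketch. The helper name iter_conditioning_sets is hypothetical and is not part of dodiscover; it only assumes that conditioning sets are enumerated size by size, and that at most max_combinations sets are tried per size.

```python
from itertools import combinations, islice


def iter_conditioning_sets(candidates, min_size=0, max_size=None, max_combinations=None):
    """Yield (size, conditioning_set) pairs, smallest sets first.

    Hypothetical helper for illustration only; not a dodiscover function.
    """
    candidates = list(candidates)
    if max_size is None:
        max_size = len(candidates)
    max_size = min(max_size, len(candidates))
    for size in range(min_size, max_size + 1):
        # islice(..., None) yields every combination; otherwise at most
        # `max_combinations` sets are produced before the size is incremented.
        for cond_set in islice(combinations(candidates, size), max_combinations):
            yield size, set(cond_set)


# Three candidate variables, sets of size 0..2, at most two sets per size.
sets = list(iter_conditioning_sets(["a", "b", "c"], max_size=2, max_combinations=2))
```

Because combinations preserves the input order, passing candidates already sorted from strongest to weakest dependence means the cap keeps the sets built from the strongest candidates, which is the idea behind pairing max_combinations with keep_sorted.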
References
- Attributes:
- graph_
EquivalenceClass
The equivalence class of graphs discovered.
- separating_sets_
dict of dict of list of set
The dictionary of separating sets: a nested dictionary mapping each variable name to the variable it is compared with, and then to the sets of variables in the graph that separate the two.
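As a rough picture of the "dict of dict of list of set" shape, the following sketch builds such a nested mapping by hand. The helper add_sep_set is hypothetical, and storing each entry symmetrically is an assumption made here for illustration, not something this page states.

```python
from collections import defaultdict

# sep_sets[x][y] is a list of sets; each set separates x from y.
sep_sets = defaultdict(lambda: defaultdict(list))


def add_sep_set(x, y, cond_set):
    """Record a separating set in both directions (assumed, for illustration)."""
    sep_sets[x][y].append(set(cond_set))
    sep_sets[y][x].append(set(cond_set))


add_sep_set("x", "z", {"y"})   # x and z are independent given y
add_sep_set("x", "w", set())   # x and w are marginally independent
```

An empty set in the list records marginal independence, so it should not be confused with a missing entry.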
Methods
convert_skeleton_graph(graph)
Convert skeleton graph as undirected networkx Graph to CPDAG.
evaluate_edge(data, X, Y[, Z])
Test any specific edge for X || Y | Z.
learn_graph(data, context)
Fit constraint-based discovery algorithm on dataset 'data'.
learn_skeleton(data, context[, sep_set])
Learns the skeleton of a causal DAG using pairwise (conditional) independence testing.
orient_edges(graph)
Orient edges in a skeleton graph to estimate the causal DAG, or CPDAG.
orient_unshielded_triples(graph, sep_set)
Orient colliders given a graph and separation set.
- convert_skeleton_graph(graph)
Convert skeleton graph as undirected networkx Graph to CPDAG.
- Parameters:
- graph
nx.Graph
Converts a skeleton graph to the representation needed for PC algorithm, a CPDAG.
- Returns:
- graph
EquivalenceClass
The CPDAG class.
- evaluate_edge(data, X, Y, Z=None)
Test any specific edge for X || Y | Z.
- learn_graph(data, context)
Fit constraint-based discovery algorithm on dataset ‘data’.
- Parameters:
- data
Union[pd.DataFrame, Dict[Set, pd.DataFrame]]
Either a pandas dataframe constituting the endogenous (observed) variables as columns and samples as rows, or a dictionary of different sampled distributions with keys as the distribution names and values as the dataset as a pandas dataframe.
- context
Context
The context of the causal discovery problem.
- Raises:
RuntimeError
If ‘data’ is a dictionary, then all datasets should have the same set of column names (nodes).
Notes
Control over the constraints imposed by the algorithm can be passed into the class constructor.
- learn_skeleton(data, context, sep_set=None)
Learns the skeleton of a causal DAG using pairwise (conditional) independence testing.
- Parameters:
- data
pd.DataFrame
The dataset.
- context
Context
A context object.
- sep_set
SeparatingSet
The separating set.
- Returns:
- skel_graph
nx.Graph
The undirected graph of the causal graph’s skeleton.
- sep_set
SeparatingSet
The separating set per pairs of variables.
Notes
Learning the skeleton of a causal DAG uses (conditional) independence testing to determine which variables are (in)dependent. This specific algorithm exhaustively compares pairs of adjacent variables.
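The exhaustive pairwise search can be sketched in a few lines. This is a simplified illustration, not the library's implementation: learn_skeleton_sketch and is_independent are hypothetical names, the oracle stands in for a CI test whose p-value is compared against alpha, and separating sets are kept in a flat dictionary keyed by node pair for brevity.

```python
from itertools import combinations


def learn_skeleton_sketch(nodes, is_independent, max_cond_set_size=None):
    """Return (adjacency, sep_sets) for a toy PC-style skeleton search.

    `is_independent(x, y, cond_set)` is a hypothetical oracle standing in for
    a CI test thresholded at alpha (independent when pvalue > alpha).
    """
    nodes = list(nodes)
    adj = {n: set(nodes) - {n} for n in nodes}  # start from the complete graph
    sep_sets = {}
    limit = len(nodes) - 2 if max_cond_set_size is None else max_cond_set_size
    for size in range(limit + 1):
        for x in nodes:
            for y in sorted(adj[x]):
                # Condition on subsets of the current neighbors of x, excluding y.
                for cond in combinations(sorted(adj[x] - {y}), size):
                    if is_independent(x, y, set(cond)):
                        adj[x].discard(y)
                        adj[y].discard(x)
                        sep_sets[frozenset((x, y))] = set(cond)
                        break
    return adj, sep_sets


def chain_oracle(a, b, cond):
    """Independence oracle for the chain x -> y -> z: only x and z separate, given y."""
    return {a, b} == {"x", "z"} and "y" in cond


adj, sep_sets = learn_skeleton_sketch(["x", "y", "z"], chain_oracle)
```

For the chain, the edge between x and z is removed once y enters the conditioning set, and {y} is recorded as their separating set.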
- orient_edges(graph)
Orient edges in a skeleton graph to estimate the causal DAG, or CPDAG.
The orientation rules applied are known as the Meek rules [2]. They are deterministic in the sense that they are logical characterizations of what edges must be present given the rest of the local graph structure.
- Parameters:
- graph
EquivalenceClass
A skeleton graph. If None, then will initialize PC using a complete graph. By default None.
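To give a flavor of how such a rule propagates orientations, here is a minimal sketch of a single Meek rule, R1: if a -> b and b - c, with a and c non-adjacent, then orient b -> c. The function apply_meek_r1 and the edge-set representation are assumptions made for illustration; they are not the data structures this class uses, and the remaining rules are omitted.

```python
def apply_meek_r1(directed, undirected, max_iter=1000):
    """Repeatedly apply Meek rule R1 until nothing changes or max_iter is hit.

    `directed` holds (tail, head) tuples; `undirected` holds 2-element sets.
    Illustrative representation only.
    """
    directed = set(directed)
    undirected = {frozenset(e) for e in undirected}

    def adjacent(u, v):
        return (u, v) in directed or (v, u) in directed or frozenset((u, v)) in undirected

    for _ in range(max_iter):
        changed = False
        for a, b in sorted(directed):
            for edge in sorted(undirected, key=sorted):
                if b in edge:
                    (c,) = edge - {b}
                    # Orient b -> c only when a and c are not adjacent.
                    if c != a and not adjacent(a, c):
                        undirected.discard(edge)
                        directed.add((b, c))
                        changed = True
        if not changed:
            break
    return directed, undirected


# a -> b, with b - c and c - d still undirected.
directed, undirected = apply_meek_r1({("a", "b")}, [{"b", "c"}, {"c", "d"}])
```

The loop mirrors the role of max_iter: each pass may create new directed edges that enable further orientations in the next pass.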
- orient_unshielded_triples(graph, sep_set)
Orient colliders given a graph and separation set.
- Parameters:
- graph
EquivalenceClass
The CPDAG.
- sep_set
Dict[Dict[Set[Set[Any]]]]
The separating set between any two nodes.
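Collider orientation from unshielded triples can be illustrated as follows. This is a sketch under simplifying assumptions: orient_colliders is a hypothetical function, the graph is a plain adjacency dictionary, and separating sets are stored in a flat dictionary keyed by node pair rather than the nested structure documented above.

```python
from itertools import combinations


def orient_colliders(adj, sep_sets):
    """Return directed edges (u, c) implied by unshielded colliders u -> c <- v."""
    arrows = set()
    for c in adj:
        for u, v in combinations(sorted(adj[c]), 2):
            if v in adj[u]:
                continue  # u and v are adjacent, so the triple is shielded
            # c is a collider when it is absent from the set separating u and v.
            if c not in sep_sets.get(frozenset((u, v)), set()):
                arrows.add((u, c))
                arrows.add((v, c))
    return arrows


# Skeleton x - z - y where x and y were found marginally independent.
collider_adj = {"x": {"z"}, "y": {"z"}, "z": {"x", "y"}}
collider_arrows = orient_colliders(collider_adj, {frozenset(("x", "y")): set()})

# Same skeleton, but x and y are separated by z: no collider is oriented.
chain_arrows = orient_colliders(collider_adj, {frozenset(("x", "y")): {"z"}})
```

In the first case both edges point into z; in the second the triple stays unoriented at this stage.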