dodiscover.constraint.FCI

class dodiscover.constraint.FCI(ci_estimator, alpha=0.05, min_cond_set_size=None, max_cond_set_size=None, max_combinations=None, condsel_method=ConditioningSetSelection.NBRS, apply_orientations=True, keep_sorted=False, max_iter=1000, max_path_length=None, selection_bias=True, pds_condsel_method=ConditioningSetSelection.PDS, n_jobs=None)

The Fast Causal Inference (FCI) algorithm for causal discovery.

A complete constraint-based causal discovery algorithm for observational data [1] that allows for latent confounders and, optionally, selection bias.

Parameters:
ci_estimator : Callable

The conditional independence test function. The arguments of the estimator should be data, node, node to compare, conditioning set of nodes, and any additional keyword arguments.
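To illustrate the calling convention described above, here is a minimal sketch of a Fisher-z partial-correlation test written as a plain function. It is hypothetical and not one of dodiscover's own tests: the name `fisher_z_test`, the dict-of-lists stand-in for a DataFrame, and the `(test_stat, pvalue)` return value (mirroring `evaluate_edge`) are all assumptions.

```python
import math
import random
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical stand-in for a pandas DataFrame: column name -> samples.
Data = Dict[str, List[float]]


def _pearson(a: List[float], b: List[float]) -> float:
    """Sample Pearson correlation of two equal-length columns."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def _partial_corr(data: Data, x: str, y: str, z: List[str]) -> float:
    """Partial correlation of x and y given z, via the recursive formula."""
    if not z:
        return _pearson(data[x], data[y])
    w, rest = z[0], z[1:]
    r_xy = _partial_corr(data, x, y, rest)
    r_xw = _partial_corr(data, x, w, rest)
    r_yw = _partial_corr(data, y, w, rest)
    return (r_xy - r_xw * r_yw) / math.sqrt((1 - r_xw**2) * (1 - r_yw**2))


def fisher_z_test(
    data: Data, x: str, y: str, z: Optional[Set[str]] = None
) -> Tuple[float, float]:
    """Hypothetical CI test: (data, node, node to compare, conditioning set) -> (stat, pvalue)."""
    cond = sorted(z or set())
    n = len(data[x])
    r = _partial_corr(data, x, y, cond)
    r = max(min(r, 1 - 1e-12), -1 + 1e-12)  # keep atanh finite
    stat = math.sqrt(n - len(cond) - 3) * abs(math.atanh(r))
    pvalue = math.erfc(stat / math.sqrt(2))  # two-sided normal p-value
    return stat, pvalue


# Usage on a simulated linear chain X -> Y -> Z, where X and Z are independent given Y.
rng = random.Random(0)
xs = [rng.gauss(0, 1) for _ in range(2000)]
ys = [v + rng.gauss(0, 1) for v in xs]
zs = [v + rng.gauss(0, 1) for v in ys]
data = {"X": xs, "Y": ys, "Z": zs}
marginal = fisher_z_test(data, "X", "Z")            # strong dependence, tiny p-value
conditional = fisher_z_test(data, "X", "Z", {"Y"})  # statistic close to zero
```

The Fisher-z test is only a reasonable choice for roughly linear-Gaussian data; it is used here because it fits in a few lines of standard-library Python.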

alpha : float, optional

The significance level for the conditional independence test, by default 0.05.

min_cond_set_size : int, optional

Minimum size of the conditioning set, by default None, which is treated as 0. Used to constrain the computation spent on the algorithm.

max_cond_set_size : int, optional

Maximum size of the conditioning set, by default None. Used to limit the computation spent on the algorithm.

max_combinations : int, optional

The maximum number of conditional independence (CI) tests to run from the set of possible conditioning sets. By default None, which means the algorithm checks all possible conditioning sets. If max_combinations=n, then for each conditioning set size p, at most n CI tests are run before p is incremented. To control the range of p, see min_cond_set_size and max_cond_set_size. Combined with the keep_sorted parameter, this can be used to test only the “strongest” dependences.
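The interplay of min_cond_set_size, max_cond_set_size and max_combinations can be sketched as a lazy generator. This is a hypothetical illustration of the behavior described above, not dodiscover's internal code; the function name and the list-of-names input are assumptions.

```python
from itertools import combinations, islice
from typing import Iterator, List, Optional, Set


def conditioning_sets(
    candidates: List[str],
    min_size: int = 0,
    max_size: Optional[int] = None,
    max_combinations: Optional[int] = None,
) -> Iterator[Set[str]]:
    """Yield conditioning sets by increasing size p, capped at max_combinations per size."""
    upper = len(candidates) if max_size is None else min(max_size, len(candidates))
    for p in range(min_size, upper + 1):
        combos = combinations(candidates, p)
        if max_combinations is not None:
            # At most n candidate sets of size p are tried before moving on to p + 1.
            combos = islice(combos, max_combinations)
        for combo in combos:
            yield set(combo)


capped = list(
    conditioning_sets(["A", "B", "C", "D"], min_size=1, max_size=2, max_combinations=3)
)
# Sizes 1 and 2, three sets each: {A}, {B}, {C}, {A, B}, {A, C}, {A, D}
```

Because the generator is lazy, a caller can stop consuming it as soon as a test finds independence, so the cap only bounds the worst case.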

condsel_method : ConditioningSetSelection

The method to use for selecting the conditioning sets, by default ConditioningSetSelection.NBRS. Must be one of (‘neighbors’, ‘complete’, ‘neighbors_path’). See Notes for more details.

apply_orientations : bool

Whether to apply orientation rules given the learned skeleton graph and the separating set of each pair of variables. If True (default), Zhang’s orientation rules R0-R10 are applied, orienting colliders and certain arrowheads and tails [1].

keep_sorted : bool

Whether to keep the considered conditioning set variables sorted by dependence strength, by default False. If True, the existing dependencies of each variable are sorted from strongest to weakest (i.e. from the largest CI test statistic value to the smallest). The conditioning set is then chosen lexicographically based on the sorted test statistic values of ‘ith Pa(X) -> X’, for each possible parent node of ‘X’. Combined with the max_combinations parameter, this can be used to test only the “strongest” dependences.
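A small sketch of why sorting pairs well with a combination cap: once candidates are ordered from strongest to weakest, `itertools.combinations` emits subsets in lexicographic order of position, so the first few sets are built from the strongest candidates. The statistics below are made-up values and `sort_by_strength` is a hypothetical helper, not part of dodiscover.

```python
from itertools import combinations, islice
from typing import Dict, List, Tuple


def sort_by_strength(stats: Dict[str, float]) -> List[str]:
    """Order candidate conditioning variables from strongest to weakest dependence."""
    return sorted(stats, key=stats.get, reverse=True)


# Hypothetical CI test statistics between X and each of its candidate parents.
stats = {"A": 0.4, "B": 7.2, "C": 3.1, "D": 0.9}
order = sort_by_strength(stats)  # ['B', 'C', 'D', 'A']
# With a cap of 2 conditioning sets of size 2, only sets built from the
# strongest candidates are tested.
tested: List[Tuple[str, ...]] = list(islice(combinations(order, 2), 2))
```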

max_iter : int

The maximum number of iterations through the graph to apply orientation rules, by default 1000.
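A bound like max_iter typically guards a fixpoint loop: rules are swept repeatedly until a full sweep changes nothing. The sketch below shows that pattern under assumed names; `toy_rule` is a placeholder and not one of Zhang's rules, and the edge-mark dictionary is a hypothetical PAG encoding.

```python
from typing import Callable, Dict, List, Tuple

# Endpoint marks: marks[(a, b)] is the mark at b on the edge a - b ("o", ">" or "-").
Marks = Dict[Tuple[str, str], str]
Rule = Callable[[Marks], bool]  # returns True if it changed any mark


def apply_until_fixpoint(marks: Marks, rules: List[Rule], max_iter: int = 1000) -> int:
    """Sweep all rules; stop when a sweep changes nothing or max_iter sweeps have run."""
    for sweep in range(1, max_iter + 1):
        changed = False
        for rule in rules:
            if rule(marks):
                changed = True
        if not changed:
            return sweep
    return max_iter


def toy_rule(marks: Marks) -> bool:
    """Toy rule (not an FCI rule): turn one circle mark into an arrowhead per call."""
    for key, mark in marks.items():
        if mark == "o":
            marks[key] = ">"
            return True
    return False


marks = {("A", "B"): "o", ("B", "A"): "o", ("B", "C"): "o"}
sweeps = apply_until_fixpoint(marks, [toy_rule])  # 3 productive sweeps + 1 quiet sweep
limited = {("A", "B"): "o", ("B", "A"): "o", ("B", "C"): "o"}
capped = apply_until_fixpoint(limited, [toy_rule], max_iter=2)  # stops with one "o" left
```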

max_path_length : int, optional

The maximum length of any discriminating path, or None if unlimited.

selection_bias : bool

Whether to account for selection bias within the causal PAG, by default True. See [1].

pds_condsel_method : ConditioningSetSelection

The method to use for selecting the conditioning sets using Possible-D-SEP (PDS), by default ConditioningSetSelection.PDS. Must be one of (‘pds’, ‘pds_path’). See Notes for more details.
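For intuition about PDS-based conditioning sets, here is a hypothetical sketch of computing Possible-D-SEP(x) on a partially oriented graph, using the usual definition: a node is included if some path from x reaches it on which every interior node is either a collider on the path or forms a triangle with its two path neighbors. The search runs breadth-first over (previous node, current node) pairs. The edge-mark encoding and function names are assumptions; this is not dodiscover's `ConditioningSetSelection.PDS` code.

```python
from collections import deque
from typing import Dict, Set, Tuple

# marks[(a, b)] is the endpoint mark at b on the edge a - b: "o", ">" or "-".
Marks = Dict[Tuple[str, str], str]


def add_edge(marks: Marks, a: str, b: str, mark_at_a: str, mark_at_b: str) -> None:
    marks[(b, a)] = mark_at_a
    marks[(a, b)] = mark_at_b


def possible_d_sep(marks: Marks, x: str) -> Set[str]:
    """Nodes reachable from x along paths whose every interior node is a collider
    on the path or forms a triangle with its two path neighbors."""
    nbrs: Dict[str, Set[str]] = {}
    for a, b in marks:
        nbrs.setdefault(a, set()).add(b)
    pds: Set[str] = set(nbrs.get(x, set()))  # every adjacent node qualifies
    queue = deque((x, n) for n in nbrs.get(x, set()))
    visited = set(queue)
    while queue:
        a, b = queue.popleft()  # (previous node, current node) on a path from x
        for c in nbrs.get(b, set()):
            if c in (a, x):
                continue
            is_collider = marks[(a, b)] == ">" and marks[(c, b)] == ">"
            is_triangle = (a, c) in marks
            if is_collider or is_triangle:
                pds.add(c)
                if (b, c) not in visited:
                    visited.add((b, c))
                    queue.append((b, c))
    return pds


marks: Marks = {}
add_edge(marks, "A", "B", "o", ">")  # A o-> B
add_edge(marks, "C", "B", "o", ">")  # C o-> B, so B is a collider on A *-> B <-* C
add_edge(marks, "C", "D", "o", "o")  # C o-o D
```

Here Possible-D-SEP(A) is {B, C}: C is reached through the collider at B, while D is not, because C is neither a collider nor in a triangle on the path A, B, C, D.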

Notes

Although the algorithm is called “fast causal inference”, in practice it is expensive in terms of the number of conditional independence tests it must run.
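A back-of-the-envelope sketch of where the cost comes from: if every pair of variables were tested against every subset of the remaining variables, the count would grow exponentially. This crude upper bound ignores FCI's adjacency pruning and early stopping (which reduce the count) and its Possible-D-SEP stage (which adds tests); the helper name is hypothetical.

```python
import math
from typing import Optional


def max_ci_tests(n_vars: int, max_cond_set_size: Optional[int] = None) -> int:
    """Upper bound on CI tests if every pair is tested against every subset
    (up to max_cond_set_size) of the remaining n_vars - 2 variables."""
    k_max = n_vars - 2 if max_cond_set_size is None else min(max_cond_set_size, n_vars - 2)
    per_pair = sum(math.comb(n_vars - 2, k) for k in range(k_max + 1))
    return math.comb(n_vars, 2) * per_pair


print(max_ci_tests(10))     # 45 pairs * 2**8 subsets = 11520
print(max_ci_tests(20))     # 190 pairs * 2**18 subsets = 49807360
print(max_ci_tests(20, 2))  # 190 pairs * (1 + 18 + 153) subsets = 32680
```

Capping the conditioning set size, as max_cond_set_size does, turns the exponential sum into a polynomial one.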

References

Methods

evaluate_edge(data, X, Y[, Z])

Test any specific edge for X || Y | Z.

learn_graph(data, context)

Fit the constraint-based discovery algorithm on dataset ‘data’.

learn_skeleton(data, context[, sep_set])

Learns the skeleton of a causal DAG using pairwise (conditional) independence testing.

orient_edges(graph)

Apply orientations to edges using logical rules.

orient_unshielded_triples(graph, sep_set)

Orient colliders given a graph and separation set.

convert_skeleton_graph

evaluate_edge(data, X, Y, Z=None)

Test any specific edge for X || Y | Z.

Parameters:
data : pd.DataFrame

The dataset.

X : column

A column in data.

Y : column

A column in data.

Z : set, optional

A set of columns in data, by default None.

Returns:
test_stat : float

Test statistic.

pvalue : float

The p-value.

learn_graph(data, context)

Fit the constraint-based discovery algorithm on dataset ‘data’.

Parameters:
data : Union[pd.DataFrame, Dict[Set, pd.DataFrame]]

Either a pandas DataFrame with the endogenous (observed) variables as columns and samples as rows, or a dictionary of different sampled distributions, with the distribution names as keys and the corresponding pandas DataFrames as values.

context : Context

The context of the causal discovery problem.

Raises:
RuntimeError

If ‘data’ is a dictionary and its datasets do not all have the same set of column names (nodes).
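The dictionary form requires every dataset to share one node set. A minimal sketch of such a check, assuming a hypothetical helper `shared_nodes` that receives only the column names of each dataset (with pandas this could be built as `{name: df.columns for name, df in data.items()}`):

```python
from typing import Dict, Hashable, Iterable, Set


def shared_nodes(columns_per_dataset: Dict[str, Iterable[Hashable]]) -> Set[Hashable]:
    """Return the common node set, raising RuntimeError if any dataset's columns differ."""
    if not columns_per_dataset:
        raise ValueError("no datasets given")
    items = iter(columns_per_dataset.items())
    first_name, first_cols = next(items)
    expected = set(first_cols)
    for name, cols in items:
        found = set(cols)
        if found != expected:
            raise RuntimeError(
                f"dataset {name!r} has columns {sorted(map(str, found))}, "
                f"but {first_name!r} has {sorted(map(str, expected))}"
            )
    return expected


# Column order may differ; only the set of names matters.
nodes = shared_nodes({"observational": ["X", "Y", "Z"], "intervened": ["Z", "Y", "X"]})
```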

Notes

Control over the constraints imposed by the algorithm can be passed into the class constructor.

learn_skeleton(data, context, sep_set=None)

Learns the skeleton of a causal DAG using pairwise (conditional) independence testing.

The skeleton is encoded as an undirected graph, networkx.Graph.

Parameters:
data : pd.DataFrame

The dataset.

context : Context

A context object.

sep_set : dict of dict of list of set, optional

The separating set, by default None.

Returns:
skel_graph : nx.Graph

The undirected graph of the causal graph’s skeleton.

sep_set : dict of dict of list of set

The separating set per pairs of variables.

Notes

Learning the skeleton of a causal DAG uses (conditional) independence testing to determine which variables are (in)dependent. This specific algorithm exhaustively compares pairs of adjacent variables.
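The adjacency search described above can be sketched as a PC-style loop: start from the complete graph and, for growing conditioning set sizes, test each adjacent pair against subsets of the current neighbors, removing the edge and recording the separating set (as a dict of dict of list of set) when the p-value exceeds alpha. This hypothetical sketch covers only that initial stage; FCI additionally re-tests edges using Possible-D-SEP sets. The oracle test stands in for a real CI estimator.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Optional, Set, Tuple

# Hypothetical CI test signature: (x, y, conditioning set) -> (test_stat, pvalue).
CITest = Callable[[str, str, Set[str]], Tuple[float, float]]
SepSet = Dict[str, Dict[str, List[Set[str]]]]


def skeleton_sketch(
    nodes: List[str],
    ci_test: CITest,
    alpha: float = 0.05,
    max_cond_set_size: Optional[int] = None,
) -> Tuple[Set[FrozenSet[str]], SepSet]:
    adj = {v: set(nodes) - {v} for v in nodes}  # start from the complete graph
    sep_set: SepSet = {v: {} for v in nodes}
    limit = len(nodes) - 2 if max_cond_set_size is None else max_cond_set_size
    for size in range(limit + 1):
        for x in nodes:
            for y in sorted(adj[x]):  # snapshot, because adj[x] shrinks below
                # Condition only on the current neighbors of x other than y.
                for cond in combinations(sorted(adj[x] - {y}), size):
                    _, pvalue = ci_test(x, y, set(cond))
                    if pvalue > alpha:  # independent: drop edge, record separating set
                        adj[x].discard(y)
                        adj[y].discard(x)
                        sep_set[x].setdefault(y, []).append(set(cond))
                        sep_set[y].setdefault(x, []).append(set(cond))
                        break
    edges = {frozenset((a, b)) for a in adj for b in adj[a]}
    return edges, sep_set


def chain_oracle(x: str, y: str, z: Set[str]) -> Tuple[float, float]:
    """Stand-in test for X -> Y -> Z: only X and Z are independent, and only given Y."""
    if {x, y} == {"X", "Z"} and "Y" in z:
        return 0.0, 1.0
    return 10.0, 0.0


edges, sep_set = skeleton_sketch(["X", "Y", "Z"], chain_oracle)
# Expected skeleton: X - Y - Z, with {Y} recorded as the separating set of X and Z.
```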

orient_edges(graph)

Apply orientations to edges using logical rules.

Parameters:
graph : EquivalenceClass

Causal graph.

Raises:
NotImplementedError

All constraint-based discovery algorithms must implement this.

orient_unshielded_triples(graph, sep_set)

Orient colliders given a graph and separating set.

Parameters:
graph : EquivalenceClass

The partial ancestral graph (PAG).

sep_set : SeparatingSet

The separating set between any two nodes.
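The collider step (rule R0 in Zhang's numbering) can be sketched as follows: for every unshielded triple x *-* z *-* y, where x and y are not adjacent, if z appears in no separating set of (x, y), orient x *-> z <-* y. The edge-mark dictionary and helper names are hypothetical stand-ins for EquivalenceClass and SeparatingSet, not dodiscover's data structures.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

# marks[(a, b)] is the endpoint mark at b on the edge a - b: "o", ">" or "-".
Marks = Dict[Tuple[str, str], str]
SepSet = Dict[str, Dict[str, List[Set[str]]]]


def orient_colliders(marks: Marks, sep_set: SepSet) -> Marks:
    """For each unshielded triple x *-* z *-* y with z in no separating set
    of (x, y), orient x *-> z <-* y."""
    nbrs: Dict[str, Set[str]] = {}
    for a, b in marks:
        nbrs.setdefault(a, set()).add(b)
    for z, adjacent in nbrs.items():
        for x, y in combinations(sorted(adjacent), 2):
            if (x, y) in marks:
                continue  # x and y are adjacent, so the triple is shielded
            if not any(z in s for s in sep_set.get(x, {}).get(y, [])):
                marks[(x, z)] = ">"
                marks[(y, z)] = ">"
    return marks


def circle_edges(pairs: List[Tuple[str, str]]) -> Marks:
    """Build an all-circle PAG skeleton (a o-o b) from a list of node pairs."""
    marks: Marks = {}
    for a, b in pairs:
        marks[(a, b)] = "o"
        marks[(b, a)] = "o"
    return marks


# X and Y were separated by the empty set, so Z becomes a collider: X o-> Z <-o Y.
collider = orient_colliders(
    circle_edges([("X", "Z"), ("Y", "Z")]), {"X": {"Y": [set()]}, "Y": {"X": [set()]}}
)
# Here Z separates X and Y, so the triple stays unoriented.
chain = orient_colliders(
    circle_edges([("X", "Z"), ("Y", "Z")]), {"X": {"Y": [{"Z"}]}, "Y": {"X": [{"Z"}]}}
)
```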