dodiscover.toporder.CAM
- class dodiscover.toporder.CAM(alpha=0.05, prune=True, n_splines=10, splines_degree=3, pns=False, pns_num_neighbors=None, pns_threshold=1)
The CAM (Causal Additive Model) algorithm for causal discovery.
CAM [1] iteratively builds a topological ordering by leaf additions. It then prunes the fully connected DAG consistent with the inferred topological order. The method assumes an additive noise model with Gaussian noise terms.
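To make the second step concrete, the following is a minimal sketch of the fully connected DAG implied by a topological order, i.e. the dense graph that pruning starts from. It is illustrative only and not dodiscover's implementation: the helper name `dense_dag_from_order` is hypothetical, and the conventions that `A[i][j] == 1` encodes the edge `i -> j` and that earlier nodes in `order` are potential parents of later ones are assumptions.

```python
def dense_dag_from_order(order):
    """Return the fully connected DAG consistent with `order` as nested lists.

    Assumed convention: A[i][j] == 1 encodes the directed edge i -> j, and
    nodes appearing earlier in `order` may be parents of later nodes.
    """
    n = len(order)
    A = [[0] * n for _ in range(n)]
    for pos, parent in enumerate(order):
        # Every node that comes later in the order is a candidate child.
        for child in order[pos + 1:]:
            A[parent][child] = 1
    return A


# Hypothetical order over three nodes: 2 precedes 0, which precedes 1.
A_dense = dense_dag_from_order([2, 0, 1])
for row in A_dense:
    print(row)
```

A graph over d nodes built this way has d(d-1)/2 edges, which is why a pruning step is needed to recover a sparse DAG.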
- Parameters:
- alpha: float, optional
Alpha cutoff value for variable selection with hypothesis testing over regression coefficients, default is 0.05.
- prune: bool, optional
If True (default), apply CAM-pruning after finding the topological order.
- n_splines: int, optional
Number of splines to use for the feature function, default is 10. Automatically decreased in case of insufficient samples.
- splines_degree: int, optional
Order of spline to use for the feature function, default is 3.
- pns: bool, optional
If True, perform Preliminary Neighbour Search (PNS) before the CAM pruning step, default is False. Allows scaling CAM pruning and ordering to large graphs.
- pns_num_neighbors: int, optional
Number of neighbors to use for PNS. If None (default), use all variables.
- pns_threshold: float, optional
Threshold to use for PNS, default is 1.
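As a rough illustration of how `alpha` acts as a cutoff, the sketch below keeps a candidate parent only when the p-value of its regression term falls below `alpha`. The p-values are made-up inputs, `select_parents` is a hypothetical helper, and the strict `<` comparison is an assumption; the actual test statistics are computed inside the library and are not reproduced here.

```python
def select_parents(p_values, alpha=0.05):
    """Keep candidate parents whose p-value is below the `alpha` cutoff.

    `p_values` maps a candidate parent to the p-value of its regression term.
    """
    return sorted(node for node, p in p_values.items() if p < alpha)


# Made-up p-values for three candidate parents of a single node.
p_values = {"x0": 0.001, "x1": 0.20, "x2": 0.03}

print(select_parents(p_values))              # default cutoff of 0.05
print(select_parents(p_values, alpha=0.01))  # stricter cutoff keeps fewer parents
```

Lowering `alpha` makes the selection more conservative, so fewer edges survive pruning.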
Notes
Prior knowledge about the included and excluded directed edges in the output DAG is supported. It is not possible to provide explicit constraints on the relative positions of nodes in the topological ordering. However, explicitly including a directed edge in the DAG defines an implicit constraint on the relative position of the nodes in the topological ordering (i.e. if directed edge (i, j) is encoded in the graph, node i will precede node j in the output order).
References
Methods
learn_graph(data_df, context): Fit the topological-order-based causal discovery algorithm on the input data.
prune(X, A_dense, G_included, G_excluded): Prune spurious edges from the dense adjacency matrix A_dense.
- learn_graph(data_df, context)
Fit the topological-order-based causal discovery algorithm on the input data.
- Parameters:
- data_df: pd.DataFrame
DataFrame of the input data.
- context: Context
The context of the causal discovery problem.
- prune(X, A_dense, G_included, G_excluded)
Prune spurious edges from the dense adjacency matrix A_dense.
Use sparse regression over the data matrix X for variable selection over the edges in the dense (potentially fully connected) adjacency matrix A_dense.
- Parameters:
- X: np.ndarray of shape (n_samples, n_nodes)
Matrix of the data.
- A_dense: np.ndarray of shape (n_nodes, n_nodes)
Dense adjacency matrix to be pruned.
- G_included: nx.DiGraph
Graph with edges that are required to be included in the output. It encodes assumptions and prior knowledge about the causal graph.
- G_excluded: nx.DiGraph
Graph with edges that are required to be excluded from the output. It encodes assumptions and prior knowledge about the causal graph.
- Returns:
- A: np.ndarray
The pruned adjacency matrix output of the causal discovery algorithm.
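The role of `G_included` and `G_excluded` can be sketched as a post-processing step on an adjacency matrix. This is a simplified illustration and not the library's implementation: it uses plain sets of `(i, j)` tuples instead of `nx.DiGraph` objects and nested lists instead of `np.ndarray`, the helper name `apply_edge_constraints` is hypothetical, and `A[i][j] == 1` is assumed to encode the edge `i -> j`.

```python
def apply_edge_constraints(A, included, excluded):
    """Return a copy of A with required edges forced in and forbidden edges removed.

    A[i][j] == 1 is assumed to encode the directed edge i -> j.
    """
    overlap = included & excluded
    if overlap:
        raise ValueError(f"Edges both included and excluded: {sorted(overlap)}")
    constrained = [row[:] for row in A]  # copy so the input is left untouched
    for i, j in included:
        constrained[i][j] = 1
    for i, j in excluded:
        constrained[i][j] = 0
    return constrained


# Hypothetical result of variable selection over three nodes.
A_selected = [
    [0, 1, 0],
    [0, 0, 0],
    [1, 0, 0],
]
A = apply_edge_constraints(A_selected, included={(2, 1)}, excluded={(0, 1)})
for row in A:
    print(row)
```

In this sketch the required edge `2 -> 1` is added even though variable selection did not pick it, and the forbidden edge `0 -> 1` is dropped even though it was selected.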