1. Causal Graphs in PyWhy

Pywhy-graphs provides data structures and methods for storing causal graphs.

The classes rely heavily on NetworkX and follow a similar API. We are generally “networkx-compliant” in the sense that we produce similar outputs for similar inputs and use similarly named functions in the graph API. However, we also extend the API to account for various classes of graphs that are not covered in networkx.

The choice of graph class depends on the structure of the graph you want to represent. For reference on concepts repeated across the API, see Glossary of Common Terms and API Elements.

1.1. Which graph class should I use?

Note that we do not implement a causal DAG without latent confounders, because that can be represented with a networkx.DiGraph with acyclicity constraints.

| Pywhy_graph Class | Edge Types | Latent confounders |
|---|---|---|
| ADMG | directed, bidirected, undirected | Yes |

We also represent common equivalence classes of causal graphs.

| Pywhy_graph Class | Edge Types | Latent confounders |
|---|---|---|
| CPDAG | directed, undirected | No |
| PAG | directed, bidirected, undirected, circle | Yes |

To represent interventions, we provide augmented graphs, which add “F-nodes” that stand for interventions [1].

| Pywhy_graph Class | Edge Types | Known Target |
|---|---|---|
| AugmentedGraph | directed, undirected, bidirected | Yes |
| AugmentedPAG | directed, undirected, bidirected, circle | Yes |

Finally, we also support time-series and create graphs that represent stationary time-series causal processes.

| Pywhy_graph Class | Edge Types | Analogous non-time-series graph |
|---|---|---|
| StationaryTimeSeriesGraph | undirected | nx.Graph |
| StationaryTimeSeriesDiGraph | directed | nx.DiGraph |
| StationaryTimeSeriesMixedEdgeGraph | directed, undirected, bidirected | MixedEdgeGraph |
| StationaryTimeSeriesCPDAG | directed, undirected | CPDAG |
| StationaryTimeSeriesPAG | directed, bidirected, circle | PAG |

1.2. pywhy_graphs.classes: Causal graph types

class pywhy_graphs.classes.ADMG(incoming_directed_edges=None, incoming_bidirected_edges=None, incoming_undirected_edges=None, directed_edge_name: str = 'directed', bidirected_edge_name: str = 'bidirected', undirected_edge_name: str = 'undirected', **attr)[source]

Acyclic directed mixed graph (ADMG).

A causal graph with two main edge types: bidirected and traditional directed edges. Directed edges represent causal relations, as in a causal DAG, while bidirected edges indicate the presence of a latent confounder.

Parameters:
incoming_directed_edges : input directed edges (optional, default: None)

Data to initialize directed edges. All arguments that are accepted by networkx.DiGraph are accepted.

incoming_bidirected_edges : input bidirected edges (optional, default: None)

Data to initialize bidirected edges. All arguments that are accepted by networkx.Graph are accepted.

incoming_undirected_edges : input undirected edges (optional, default: None)

Data to initialize undirected edges. All arguments that are accepted by networkx.Graph are accepted.

directed_edge_name : str

The name for the directed edges. By default ‘directed’.

bidirected_edge_name : str

The name for the bidirected edges. By default ‘bidirected’.

undirected_edge_name : str

The name for the undirected edges. By default ‘undirected’.

attr : keyword arguments, optional (default= no attributes)

Attributes to add to graph as key=value pairs.

Notes

Edge Type Subgraphs

Under the hood, the data is stored in two kinds of networkx graphs: networkx.Graph for the non-directed edges and networkx.DiGraph for the directed edges. Non-directed edges in an ADMG can be bidirected edges, which stand for latent confounders, or undirected edges, which stand for selection bias.

  • Directed edges (<-, ->, indicating causal relationship) = networkx.DiGraph

    The subgraph of directed edges may be accessed by the ADMG.sub_directed_graph. Their edges in networkx format can be accessed by ADMG.directed_edges and the corresponding name of the edge type by ADMG.directed_edge_name.

  • Bidirected edges (<->, indicating latent confounder) = networkx.Graph

    The subgraph of bidirected edges may be accessed by the ADMG.sub_bidirected_graph. Their edges in networkx format can be accessed by ADMG.bidirected_edges and the corresponding name of the edge type by ADMG.bidirected_edge_name.

  • Undirected edges (–, indicating selection bias) = networkx.Graph

    The subgraph of undirected edges may be accessed by the ADMG.sub_undirected_graph. Their edges in networkx format can be accessed by ADMG.undirected_edges and the corresponding name of the edge type by ADMG.undirected_edge_name.

By definition, no cycles may exist due to the directed edges. However, beyond that multiple types of edges between the same pairs of nodes are possible.
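As a rough illustration of this constraint, the sketch below checks acyclicity over the directed edges only, so a bidirected edge between the same pair of nodes does not count as a cycle. It works on plain Python edge lists rather than on the ADMG class, and `has_directed_cycle` is a hypothetical helper, not part of pywhy-graphs.

```python
from collections import defaultdict, deque


def has_directed_cycle(directed_edges):
    """Return True if the directed edges contain a cycle (Kahn's algorithm)."""
    children = defaultdict(set)
    indegree = defaultdict(int)
    nodes = set()
    for u, v in directed_edges:
        nodes.update((u, v))
        if v not in children[u]:
            children[u].add(v)
            indegree[v] += 1

    # Repeatedly remove nodes that have no incoming directed edge.
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)

    # Any node that could not be removed sits on a directed cycle.
    return removed != len(nodes)


# Hypothetical ADMG-like edge sets: x -> y and x <-> y coexist on one node pair.
directed = [("x", "y"), ("y", "z")]
bidirected = [("x", "y")]  # ignored by the cycle check on purpose

acyclic_ok = not has_directed_cycle(directed)
# Adding z -> x closes a directed cycle, which an ADMG does not allow.
cyclic = has_directed_cycle(directed + [("z", "x")])
```

With networkx graphs, the same check on the directed subgraph can be done with `networkx.is_directed_acyclic_graph`.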

class pywhy_graphs.classes.CPDAG(incoming_directed_edges=None, incoming_undirected_edges=None, directed_edge_name: str = 'directed', undirected_edge_name: str = 'undirected', **attr)[source]

Completed partially directed acyclic graphs (CPDAG).

CPDAGs generalize causal DAGs by allowing undirected edges. Undirected edges imply uncertainty in the orientation of the causal relationship. For example, A - B, can be A -> B or A <- B, allowing for a Markov equivalence class of DAGs for each CPDAG.

Parameters:
incoming_directed_edges : input directed edges (optional, default: None)

Data to initialize directed edges. All arguments that are accepted by networkx.DiGraph are accepted.

incoming_undirected_edges : input undirected edges (optional, default: None)

Data to initialize undirected edges. All arguments that are accepted by networkx.Graph are accepted.

directed_edge_name : str

The name for the directed edges. By default ‘directed’.

undirected_edge_name : str

The name for the undirected edges. By default ‘undirected’.

attr : keyword arguments, optional (default= no attributes)

Attributes to add to graph as key=value pairs.

Notes

A CPDAG represents a Markov equivalence class of causal DAGs. The implicit assumption in these causal graphs is that the Structural Causal Model (SCM) is Markovian, which implies causal sufficiency: there is no unobserved latent confounder. This allows CPDAGs to be learned with score-based approaches (such as the “GES” algorithm) and constraint-based approaches (such as the PC algorithm) to causal structure learning.

One should not use CPDAGs if they suspect their data has unobserved latent confounders.

Edge Type Subgraphs

Under the hood, the data is stored in two kinds of networkx graphs: networkx.Graph for the non-directed edges and networkx.DiGraph for the directed edges. Non-directed edges in a CPDAG are undirected edges, which stand for uncertainty about the direction of the corresponding directed edge.

  • Directed edges (<-, ->, indicating causal relationship) = networkx.DiGraph

    The subgraph of directed edges may be accessed by the CPDAG.sub_directed_graph. Their edges in networkx format can be accessed by CPDAG.directed_edges and the corresponding name of the edge type by CPDAG.directed_edge_name.

  • Undirected edges (–, indicating uncertainty) = networkx.Graph

    The subgraph of undirected edges may be accessed by the CPDAG.sub_undirected_graph. Their edges in networkx format can be accessed by CPDAG.undirected_edges and the corresponding name of the edge type by CPDAG.undirected_edge_name.

By definition, no cycles may exist due to the directed edges.
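To make the link between a CPDAG and its Markov equivalence class concrete, here is a minimal sketch that enumerates the DAGs obtained by orienting each undirected edge. It relies on the standard characterization that Markov-equivalent DAGs share the same skeleton and the same v-structures, and it assumes the input is a valid CPDAG. The helpers are hypothetical and operate on plain edge lists, not on the CPDAG class.

```python
from itertools import product


def is_acyclic(nodes, edges):
    """Check that the directed edges contain no cycle by peeling off source nodes."""
    remaining = set(nodes)
    edges = set(edges)
    while remaining:
        sources = {
            n for n in remaining
            if not any(v == n and u in remaining for u, v in edges)
        }
        if not sources:
            return False
        remaining -= sources
    return True


def v_structures(directed, adjacent):
    """Collect unshielded colliders a -> b <- c, where a and c are not adjacent."""
    found = set()
    for a, b in directed:
        for c, b2 in directed:
            if b2 == b and a < c and frozenset((a, c)) not in adjacent:
                found.add((a, b, c))
    return found


def enumerate_dags(directed, undirected):
    """List the DAGs obtained by orienting every undirected edge of a CPDAG."""
    nodes = {n for edge in directed + undirected for n in edge}
    adjacent = {frozenset(edge) for edge in directed + undirected}
    target = v_structures(directed, adjacent)
    dags = []
    for flips in product((False, True), repeat=len(undirected)):
        oriented = [(v, u) if flip else (u, v)
                    for (u, v), flip in zip(undirected, flips)]
        candidate = directed + oriented
        # Keep orientations that stay acyclic and add no new v-structure.
        if is_acyclic(nodes, candidate) and v_structures(candidate, adjacent) == target:
            dags.append(sorted(candidate))
    return dags


# CPDAG consisting of the chain A - B - C with no directed edges.
members = enumerate_dags(directed=[], undirected=[("A", "B"), ("B", "C")])
```

For this chain, the orientation A -> B <- C is rejected because it would introduce a new v-structure, which leaves three DAGs in the equivalence class.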

class pywhy_graphs.classes.PAG(incoming_directed_edges=None, incoming_undirected_edges=None, incoming_bidirected_edges=None, incoming_circle_edges=None, directed_edge_name: str = 'directed', undirected_edge_name: str = 'undirected', bidirected_edge_name: str = 'bidirected', circle_edge_name: str = 'circle', **attr)[source]

Partial ancestral graph (PAG).

PAGs are a Markov equivalence class with mixed edges of directed, bidirected, undirected and edges with circle endpoints. In terms of graph functionality, they essentially extend the definition of an ADMG with edges with circular endpoints.

Parameters:
incoming_directed_edges : input directed edges (optional, default: None)

Data to initialize directed edges. All arguments that are accepted by networkx.DiGraph are accepted.

incoming_undirected_edges : input undirected edges (optional, default: None)

Data to initialize undirected edges. All arguments that are accepted by networkx.Graph are accepted.

incoming_bidirected_edges : input bidirected edges (optional, default: None)

Data to initialize bidirected edges. All arguments that are accepted by networkx.Graph are accepted.

incoming_circle_edges : input circular endpoint edges (optional, default: None)

Data to initialize edges with circle endpoints. All arguments that are accepted by networkx.DiGraph are accepted.

directed_edge_name : str

The name for the directed edges. By default ‘directed’.

undirected_edge_name : str

The name for the undirected edges. By default ‘undirected’.

bidirected_edge_name : str

The name for the bidirected edges. By default ‘bidirected’.

circle_edge_name : str

The name for the circle edges. By default ‘circle’.

attr : keyword arguments, optional (default= no attributes)

Attributes to add to graph as key=value pairs.

Notes

A PAG represents a Markov equivalence class of causal ADMGs. The implicit assumption in these causal graphs is that the Structural Causal Model (SCM) is Semi-Markovian, such that latent confounders may be present. This allows PAGs to be learned with constraint-based approaches (such as the FCI algorithm) to causal structure learning.

Edge Type Subgraphs

Under the hood, the data is stored in two types of networkx graphs: networkx.Graph and networkx.DiGraph.

  • Directed edges (<-, ->, indicating causal relationship) = networkx.DiGraph

    The subgraph of directed edges may be accessed by PAG.sub_directed_graph. Their edges in networkx format can be accessed by PAG.directed_edges and the corresponding name of the edge type by PAG.directed_edge_name.

  • Bidirected edges (<->, indicating latent confounder) = networkx.Graph

    The subgraph of bidirected edges may be accessed by PAG.sub_bidirected_graph. Their edges in networkx format can be accessed by PAG.bidirected_edges and the corresponding name of the edge type by PAG.bidirected_edge_name.

  • Undirected edges (–, indicating selection bias) = networkx.Graph

    The subgraph of undirected edges may be accessed by PAG.sub_undirected_graph. Their edges in networkx format can be accessed by PAG.undirected_edges and the corresponding name of the edge type by PAG.undirected_edge_name.

  • Circle edges (-o, o-, indicating uncertainty) = networkx.DiGraph

    The subgraph of circle edges may be accessed by PAG.sub_circle_graph. Their edges in networkx format can be accessed by PAG.circle_edges and the corresponding name of the edge type by PAG.circle_edge_name.

How different edges are represented in the PAG

Compared to a pywhy_graphs.classes.ADMG, a pywhy_graphs.classes.CPDAG or a networkx.DiGraph, a PAG is more complex in that it generalizes the endpoints an edge can take, which greatly increases the number of possible edges between two nodes. The main complication arises in edges with circle endpoints. Rather than store every possible edge type as a separate networkx graph, we use a set of rules that map a combination of the above edge-type subgraphs to a certain edge.

Bidirected and undirected edges are represented by one networkx graph (networkx.Graph). They are simple in that they do not require pairing with another edge-type subgraph.

  • x <-> y: is a bidirected edge present? (Note that, by definition of a PAG, no other edge can be present between x and y.)

  • x - y: is an undirected edge present? (Note that no other edge should be present in any other direction, so an undirected edge is similar to a bidirected edge in that it represents only one kind of edge.)

Edges with arrowheads, tails and circular endpoints are represented by another networkx graph (networkx.DiGraph). They complicate matters because the PAG.sub_directed_graph and PAG.sub_circle_graph can be combined in different ways to result in different edges between x and y.

Without loss of generality, we will be dealing with the ordered tuple (x, y). If you want the other direction of the edge, you can just flip the order of x and y. For example, x <- y would just be y -> x, so we will only discuss the -> edge. The following rules dictate what sort of edge we are dealing with:

  • x o-o y: is a circle edge present in both directions? There are only edges present in the PAG.sub_circle_graph between x and y.

  • x o-> y: is there a circle edge one way and a directed edge the other way? There is an edge from the PAG.sub_circle_graph and one from the PAG.sub_directed_graph between x and y in opposite directions.

  • x o- y: is there only one circle edge? In this special case, we do not use the PAG.sub_undirected_graph to represent the tail endpoint at y. There is only one edge in the PAG.sub_circle_graph between x and y.
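The rules above can be sketched as a small lookup over the edge-type subgraphs. This is an illustration on plain Python sets, not the library's implementation: `describe_edge` is a hypothetical helper, and it assumes that a circle edge stored as (u, v) places the circle endpoint at v, which is the convention implied by the x o-> y rule.

```python
def describe_edge(x, y, directed, circle, bidirected=(), undirected=()):
    """Describe the edge between x and y, read from x to y.

    `directed` and `circle` are sets of ordered pairs (u, v); `bidirected` and
    `undirected` are collections of unordered pairs. Returns None if no rule
    matches, in which case the caller can retry with x and y swapped.
    """
    pair = frozenset((x, y))
    if pair in {frozenset(e) for e in bidirected}:
        return f"{x} <-> {y}"
    if pair in {frozenset(e) for e in undirected}:
        return f"{x} - {y}"

    # Assumed convention: a circle edge (u, v) puts the circle endpoint at v.
    circle_at_x = (y, x) in circle
    circle_at_y = (x, y) in circle
    arrow_at_y = (x, y) in directed

    if circle_at_x and circle_at_y:
        return f"{x} o-o {y}"
    if circle_at_x and arrow_at_y:
        return f"{x} o-> {y}"
    if arrow_at_y:
        return f"{x} -> {y}"
    if circle_at_x:
        return f"{x} o- {y}"
    return None


# Toy subgraphs covering the three circle-endpoint cases.
directed = {("a", "b")}
circle = {("b", "a"), ("c", "d"), ("d", "c"), ("f", "e")}

ab = describe_edge("a", "b", directed, circle)
cd = describe_edge("c", "d", directed, circle)
ef = describe_edge("e", "f", directed, circle)
```

Here `ab` combines a directed edge with a circle edge in the opposite direction, `cd` has circle edges both ways, and `ef` has a single circle edge.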

1.3. pywhy_graphs.classes.timeseries: Causal graph types for time-series (alpha)

Currently, we have alpha support for time-series causal graphs. This means that their internals and API will very likely change over the next few versions.

Support for time-series is implemented in the form of more structured graph classes, which differ from the other graph classes in the following ways:

  • max lag: Every graph has a keyword argument input max_lag, specifying the maximum lag that the time-series graph can represent.

  • time-series node (tsnode): Every graph’s nodes are required to be a 2-tuple, with the variable name as the first element and the lag as the second element.

  • time-ordered: All edges are time-ordered, unless the underlying graph is an undirected networkx.Graph. Time-ordered edges mean that there are no directed edges pointing from the present to the past, so there are no edges of the form (('x', -t), ('y', -t')), where t < t'. For example, a directed edge of the form (('x', -3), ('y', -4)) is not allowed.

  • selection bias (undirected edges): There is no support for undirected edges, or selection bias in time-series causal graphs at this moment.
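The node format and the time-ordering rule can be sketched with two small checks. Both functions are hypothetical helpers written for illustration on plain tuples; they are not part of the pywhy-graphs API.

```python
def is_tsnode(node):
    """Check the documented node shape: a 2-tuple of (variable name, lag)."""
    return isinstance(node, tuple) and len(node) == 2 and isinstance(node[1], int)


def is_time_ordered(edge):
    """Return True if a directed edge does not point from the present into the past.

    Lags are written as non-positive integers, so an edge (('x', -t), ('y', -t'))
    with t < t' starts at a later time point than it ends and is rejected.
    """
    (_, lag_from), (_, lag_to) = edge
    return lag_from <= lag_to


forward = is_time_ordered((("x", -3), ("y", -2)))    # past -> present
backward = is_time_ordered((("x", -3), ("y", -4)))   # present -> past
```

The second edge is the disallowed example from the list above, so `backward` evaluates to False.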

Some graphs also embody the implicit assumption of “stationarity”, which means all edges are repeated over time. For example: if we assume stationarity, and know the edge (('x', -3), ('y', -2)) exists in the graph and the maximum lag is 4, then the following edges also exist in the graph:

  • (('x', -4), ('y', -3))

  • (('x', -2), ('y', -1))

  • (('x', -1), ('y', 0))
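The list above can be reproduced by shifting the edge in time and keeping every shift that stays within the window from -max_lag to 0. This is a minimal sketch on plain tuples; `homologous_edges` is a hypothetical helper, not a pywhy-graphs function.

```python
def homologous_edges(edge, max_lag):
    """Return the time-shifted copies of an edge that fit within [-max_lag, 0]."""
    (u, lag_u), (v, lag_v) = edge
    shifted = []
    for shift in range(-max_lag, max_lag + 1):
        new_u, new_v = lag_u + shift, lag_v + shift
        inside_window = -max_lag <= new_u <= 0 and -max_lag <= new_v <= 0
        # Skip the zero shift, which is the original edge itself.
        if shift != 0 and inside_window:
            shifted.append(((u, new_u), (v, new_v)))
    return shifted


copies = homologous_edges((("x", -3), ("y", -2)), max_lag=4)
```

For the edge (('x', -3), ('y', -2)) and a maximum lag of 4, `copies` contains exactly the three edges listed above.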

Stationarity implies that all edge additions/removals propagate to other homologous edges [2]. This property can be turned off in StationaryTimeSeriesCPDAG and StationaryTimeSeriesPAG by calling the set_stationarity function. This may be useful for example in causal discovery, where we are modifying edges, but do not want the modifications to propagate to homologous edges.

Note that stationarity in the Markov equivalence class of the causal graphs has some subtle differences that impact the causal assumptions encoded in the MEC. All other functionalities are similar. See [3] for a characterization of assumptions within a time-series causal graph.

class pywhy_graphs.classes.timeseries.TimeSeriesGraph(incoming_graph_data=None, max_lag: int = 1, **attr)[source]

A class to imbue an undirected graph with time-series structure.

This should not be used directly. See BaseTimeSeriesGraph for documentation on the functionality of time-series graphs.

class pywhy_graphs.classes.timeseries.TimeSeriesDiGraph(incoming_graph_data=None, max_lag: int = 1, **attr)[source]

A class to imbue a directed graph with time-series structure.

See BaseTimeSeriesGraph for documentation on the functionality of time-series graphs.

class pywhy_graphs.classes.timeseries.TimeSeriesMixedEdgeGraph(graphs=None, edge_types=None, max_lag=1, **attr)[source]

A class to imbue a mixed-edge graph with time-series structure.

This should not be used directly.

class pywhy_graphs.classes.timeseries.StationaryTimeSeriesCPDAG(incoming_directed_edges=None, incoming_undirected_edges=None, directed_edge_name: str = 'directed', undirected_edge_name: str = 'undirected', stationary: bool = True, **attr)[source]

Completed partially directed acyclic graphs (CPDAG).

CPDAGs generalize causal DAGs by allowing undirected edges. Undirected edges imply uncertainty in the orientation of the causal relationship. For example, A - B, can be A -> B or A <- B, allowing for a Markov equivalence class of DAGs for each CPDAG.

Parameters:
incoming_directed_edges : input directed edges (optional, default: None)

Data to initialize directed edges. All arguments that are accepted by networkx.DiGraph are accepted.

incoming_undirected_edges : input undirected edges (optional, default: None)

Data to initialize undirected edges. All arguments that are accepted by networkx.Graph are accepted.

directed_edge_name : str

The name for the directed edges. By default ‘directed’.

undirected_edge_name : str

The name for the undirected edges. By default ‘undirected’.

attr : keyword arguments, optional (default= no attributes)

Attributes to add to graph as key=value pairs.

Notes

A CPDAG represents a Markov equivalence class of causal DAGs. The implicit assumption in these causal graphs is that the Structural Causal Model (SCM) is Markovian, which implies causal sufficiency: there is no unobserved latent confounder. This allows CPDAGs to be learned with score-based approaches (such as the “GES” algorithm) and constraint-based approaches (such as the PC algorithm) to causal structure learning.

One should not use CPDAGs if they suspect their data has unobserved latent confounders.

class pywhy_graphs.classes.timeseries.StationaryTimeSeriesDiGraph(incoming_graph_data=None, max_lag: int = 1, stationary: bool = True, check_time_direction: bool = True, **attr)[source]

Stationary time-series directed graph.

A stationary graph is one where lagged edges repeat themselves over time. Edges connecting to nodes in time point “t=0” are all the relevant edges needed to depict the time-series graph.

Time-series graph nodes are defined as a cross-product of variables and a time-index. Nodes are always a tuple of variables and the lag. For example, a node could be ('x', -1) indicating the ‘x’ variable at ‘-1’ lag.

Parameters:
incoming_graph_data : input graph (optional, default: None)

Data to initialize graph. If None (default) an empty graph is created. The data can be any format that is supported by the to_networkx_graph() function, currently including edge list, dict of dicts, dict of lists, NetworkX graph, 2D NumPy array, SciPy sparse matrix, or PyGraphviz graph.

max_lag : int, optional

The max lag, by default 1.

attr : keyword arguments, optional (default= no attributes)

Attributes to add to graph as key=value pairs.

Attributes:
stationary : bool

Whether or not the graph is stationary.

check_time_direction : bool

Whether or not to check that time directionality is valid, by default True. May be set to False for undirected graphs.

Notes

A stationary time-series graph is one in which edges over time are repeated. In order to properly query for d-separation, one needs to query up to 2 times the maximum lag.

A ts-graph’s nodes are defined uniquely by its set of variables and the maximum-lag parameter. For example, given ('x', 'y', 'z') as the set of variables and a maximum lag of 2, there would be 9 total nodes in the graph, consisting of the cross-product of ('x', 'y', 'z') and the lags (0, 1, 2). Nodes are automatically added or deleted depending on the value of the maximum lag in the graph.
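The node count in this example follows directly from the cross-product. The sketch below builds the node set with the standard library only; writing each lag as a non-positive integer follows the ('x', -1) node example given earlier and is otherwise an assumption of this sketch.

```python
from itertools import product

variables = ("x", "y", "z")
max_lag = 2

# One node per (variable, lag) pair, with lags 0, 1, ..., max_lag stored as 0, -1, ..., -max_lag.
nodes = [(var, -lag) for var, lag in product(variables, range(max_lag + 1))]
n_nodes = len(nodes)
```

With three variables and a maximum lag of 2, this yields the 9 nodes described above.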

class pywhy_graphs.classes.timeseries.StationaryTimeSeriesGraph(incoming_graph_data=None, max_lag: int = 1, stationary: bool = True, **attr)[source]

Stationary time-series graph without directionality on edges.

This class should not be used directly.

Included for completeness to enable modeling and working with nx.Graph like objects with time-series structure. By the time-ordering assumption, all lagged edges must point forward in time. This serves as an API layer to allow for non-directed edges in time (i.e. circular edges among nodes in a ts-PAG).

Parameters:
incoming_graph_data : iterable, optional

The graph data to set, by default None.

max_lag : int, optional

Maximum lag, by default 1.

class pywhy_graphs.classes.timeseries.StationaryTimeSeriesMixedEdgeGraph(graphs=None, edge_types=None, max_lag: int | None = None, **attr)[source]

A mixed-edge causal graph for stationary time-series.

Parameters:
graphs : List of Graph | DiGraph

A list of networkx single-edge graphs.

edge_types : List of str

A list of names for each edge type.

max_lag : int, optional

The maximum lag, by default None.

attr : keyword arguments, optional (default= no attributes)

Attributes to add to graph as key=value pairs.

class pywhy_graphs.classes.timeseries.StationaryTimeSeriesPAG(incoming_directed_edges=None, incoming_circle_edges=None, incoming_bidirected_edges=None, incoming_undirected_edges=None, circle_edge_name: str = 'circle', directed_edge_name: str = 'directed', bidirected_edge_name: str = 'bidirected', undirected_edge_name: str = 'undirected', stationary: bool = True, **attr)[source]