2.7.1. pywhy_graphs.functional.make_graph_multidomain

pywhy_graphs.functional.make_graph_multidomain(G: DiGraph, n_domains: int = 2, n_nodes_with_s_nodes: int | Tuple[int] = 1, n_invariances_to_try: int = 1, node_mean_lims: List[float] | None = None, node_std_lims: List[float] | None = None, edge_functions: List[Callable[[float], float]] | None = None, edge_weight_lims: List[float] | None = None, random_state=None) → DiGraph

Convert an existing linear Gaussian DAG to a multi-domain selection diagram model.

The multi-domain selection diagram model generalizes the regular causal diagram: S-nodes represent possible changes in the mechanism of the node they point to. In particular, a missing S-node edge to a specific node implies that the distribution of that node is invariant across the corresponding domains. For example, if you have a graph \(X \rightarrow Y\), then the S-node edge \(S^{1,2} \rightarrow Y\) represents a change in the distribution of \(Y\) between domains 1 and 2. If there is no edge \(S^{1,2} \rightarrow Y\), then the distribution of \(Y\) is invariant across domains 1 and 2.

Parameters:

G (NetworkX DiGraph): The graph to sample data from. The graph will be modified in-place to get the weights and functions of the edges.
n_domains (int): The number of domains to split the graph into. By default 2.
n_nodes_with_s_nodes (int | tuple[int]): The number of nodes to have S-node edges. By default 1. If a tuple, a number is sampled uniformly between the two values.
n_invariances_to_try (int): The number of invariances to try to set by deleting S-nodes. By default 1. More S-nodes than specified by this parameter may be deleted if the S-nodes are inconsistent. See Notes for details.
node_mean_lims (Optional[List[float]], optional): The lower and upper bounds of the mean of the Gaussian random variable, by default None, which defaults to a mean of 0.
node_std_lims (Optional[List[float]], optional): The lower and upper bounds of the std of the Gaussian random variable, by default None, which defaults to a std of 1.
edge_functions (List[Callable[[float], float]], optional): The set of edge functions that take in an iid sample from the parent and compute a transformation (possibly nonlinear), such as (lambda x: x**2, lambda x: x), by default None, which defaults to the identity function lambda x: x.
edge_weight_lims (Optional[List[float]], optional): The lower and upper bounds of the edge weight, by default None, which defaults to a weight of 1.
random_state (int, optional): Random seed, by default None.
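
To make the roles of edge_functions, the edge weights, and the Gaussian noise parameters concrete, here is a minimal sketch of how a child node's value could be generated from its parents. It assumes an additive model (weighted, transformed parent values plus Gaussian noise); the helper name sample_child and its arguments are hypothetical and are not part of pywhy_graphs, whose internal sampling code may differ.

```python
import random


def sample_child(parent_values, edge_functions, edge_weights,
                 noise_mean, noise_std, rng):
    """Hypothetical helper: sum_i w_i * f_i(parent_i) + Gaussian noise.

    This is an assumed additive model, not the library's implementation.
    """
    signal = sum(
        w * f(x)
        for x, f, w in zip(parent_values, edge_functions, edge_weights)
    )
    return signal + rng.gauss(noise_mean, noise_std)


rng = random.Random(0)
x = rng.gauss(0.0, 1.0)  # exogenous parent with mean 0 and std 1
# One parent, a nonlinear edge function, and an edge weight of 2.
y = sample_child([x], [lambda v: v ** 2], [2.0], 0.0, 1.0, rng)
```

With noise_std set to 0 the noise term vanishes, so the child reduces to the weighted, transformed parent value, which is a convenient way to sanity-check a chosen edge function.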

Returns:

G (NetworkX DiGraph): NetworkX graph with the edge weights and functions set, and with the node attributes 'parent_functions' and 'gaussian_noise_function' set. Moreover, the graph attribute 'linear_gaussian' is set to True.

See also

make_graph_linear_gaussian: Create a linear Gaussian graph

Notes

To determine the missing S-node structure, we first construct all possible S-nodes given the number of domains, n_domains. The total number of S-nodes will then be \(\binom{n_{domains}}{2}\), one per unordered pair of domains. Then, we randomly sample a subset of nodes in the graph to have S-node edges. The remaining nodes will be missing S-node edges. Then, among the nodes with S-node edges, we randomly sample a subset of S-nodes to be missing edges.
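
The counting step above can be sketched with the standard library alone. The representation of an S-node as a 1-indexed pair of domains (i, j) with i < j, and the helper name all_s_nodes, are assumptions made for illustration; the library may store S-nodes differently.

```python
import random
from itertools import combinations
from math import comb


def all_s_nodes(n_domains):
    """Enumerate S-nodes as unordered domain pairs (i, j) with i < j."""
    return list(combinations(range(1, n_domains + 1), 2))


s_nodes = all_s_nodes(3)
print(s_nodes)  # [(1, 2), (1, 3), (2, 3)]
print(len(s_nodes) == comb(3, 2))  # True

# Randomly choose which graph nodes receive S-node edges
# (here a single node, mirroring n_nodes_with_s_nodes=1).
rng = random.Random(0)
nodes_with_s_nodes = rng.sample(["X", "Y", "Z"], 1)
```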

At this stage, the remaining S-node edges may still be inconsistent. For example, if we have the S-node edge \(S^{1,2} \rightarrow Y\) among 3 domains, then at least one of the other S-node edges, \(S^{1,3} \rightarrow Y\) or \(S^{2,3} \rightarrow Y\), must also be present; otherwise \(Y\) must have no S-node edges at all. This is because missing \(S^{2,3} \rightarrow Y\) and \(S^{1,3} \rightarrow Y\) edges imply that the distribution of \(Y\) is invariant across domains 2 and 3 and across domains 1 and 3, which in turn implies that it is invariant across domains 1 and 2, contradicting \(S^{1,2} \rightarrow Y\). To fix this, for each node with S-node connections, we delete a random set of S-nodes, construct the connected components of the S-node domains, and then remove any remaining S-nodes as needed to keep the graph consistent.
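
The transitivity argument can be checked mechanically. The sketch below is an illustration under stated assumptions, not the library's implementation: for a single node, S-node edges are represented as domain pairs (i, j) with i < j, a missing pair is treated as an invariance, domains are grouped into invariance classes with a small union-find, and any S-node edge whose two domains land in the same class is dropped. The helper names invariance_classes and make_consistent are hypothetical.

```python
from itertools import combinations


def invariance_classes(n_domains, present):
    """Map each domain to a representative of its invariance class.

    Two domains are linked whenever the S-node edge between them is missing.
    """
    parent = {d: d for d in range(1, n_domains + 1)}

    def find(d):
        # Follow parent pointers to the root, compressing the path.
        while parent[d] != d:
            parent[d] = parent[parent[d]]
            d = parent[d]
        return d

    for i, j in combinations(range(1, n_domains + 1), 2):
        if (i, j) not in present:  # missing S-node edge => invariance
            parent[find(i)] = find(j)
    return {d: find(d) for d in parent}


def make_consistent(n_domains, present):
    """Drop S-node edges that contradict an implied invariance."""
    cls = invariance_classes(n_domains, present)
    return {(i, j) for (i, j) in present if cls[i] != cls[j]}


# Only S^{1,2} -> Y among 3 domains: inconsistent, so it is removed.
print(make_consistent(3, {(1, 2)}))  # set()
```

In the three-domain example, keeping both (1, 2) and (1, 3) is consistent: domains 2 and 3 form one invariance class and domain 1 forms another, so neither edge is removed.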
