2.7.1. pywhy_graphs.functional.make_graph_multidomain
- pywhy_graphs.functional.make_graph_multidomain(G: DiGraph, n_domains: int = 2, n_nodes_with_s_nodes: int | Tuple[int] = 1, n_invariances_to_try: int = 1, node_mean_lims: List[float] | None = None, node_std_lims: List[float] | None = None, edge_functions: List[Callable[[float], float]] | None = None, edge_weight_lims: List[float] | None = None, random_state=None) → DiGraph
Convert an existing linear Gaussian DAG to a multi-domain selection diagram model.
The multi-domain selection diagram model generalizes the regular causal diagram: S-nodes represent possible changes in the mechanism of the node they point to. In particular, a missing S-node edge into a node implies that the distribution of that node is invariant across the corresponding domains. For example, given the graph \(X \rightarrow Y\), the S-node edge \(S^{1,2} \rightarrow Y\) represents a possible change in the distribution of \(Y\) between domains 1 and 2. If there is no edge \(S^{1,2} \rightarrow Y\), then the distribution of \(Y\) is invariant across domains 1 and 2.
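As a rough illustration of how missing S-node edges encode invariances, the sketch below stores a selection diagram as a plain set of edges. The `s_node` label and the `is_invariant` helper are hypothetical names introduced here for illustration; they are not part of pywhy_graphs, and the library's internal representation may differ.

```python
# Plain-Python sketch; NOT the pywhy_graphs data structure.
# An S-node for the domain pair (i, j) is modeled as the tuple ("S", i, j).

def s_node(i, j):
    """Return a hypothetical label for the S-node of domains i and j."""
    return ("S", min(i, j), max(i, j))


def is_invariant(edges, node, i, j):
    """True if no S-node for domains (i, j) points to `node`."""
    return (s_node(i, j), node) not in edges


# Causal edge X -> Y plus the S-node edge S^{1,2} -> Y.
edges = {("X", "Y"), (s_node(1, 2), "Y")}

y_invariant = is_invariant(edges, "Y", 1, 2)  # S^{1,2} -> Y is present
x_invariant = is_invariant(edges, "X", 1, 2)  # no S-node edge into X
```

Here `y_invariant` is `False` because the mechanism of `Y` may change between domains 1 and 2, while `x_invariant` is `True` because no S-node points to `X`.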
- Parameters:
- G : NetworkX DiGraph
The graph to sample data from. The graph will be modified in-place to get the weights and functions of the edges.
- n_domains : int
The number of domains to split the graph into. By default 2.
- n_nodes_with_s_nodes : int | tuple[int]
The number of nodes to have S-node edges. By default 1. If a tuple, the number is sampled uniformly between the two values.
- n_invariances_to_try : int
The number of invariances to try to set by deleting S-nodes. By default 1. More S-nodes than what is specified by this parameter may be deleted if there are inconsistencies in the S-nodes. See Notes for details.
- node_mean_lims : Optional[List[float]], optional
The lower and upper bounds of the mean of the Gaussian random variable, by default None, which defaults to a mean of 0.
- node_std_lims : Optional[List[float]], optional
The lower and upper bounds of the std of the Gaussian random variable, by default None, which defaults to a std of 1.
- edge_functions : List[Callable[[float], float]], optional
The set of edge functions that take in an iid sample from the parent and compute a transformation (possibly nonlinear), such as (lambda x: x**2, lambda x: x), by default None, which defaults to the identity function lambda x: x.
- edge_weight_lims : Optional[List[float]], optional
The lower and upper bounds of the edge weight, by default None, which defaults to a weight of 1.
- random_state : int, optional
Random seed, by default None.
- Returns:
- G : NetworkX DiGraph
NetworkX graph with the edge weights and functions set, and with the node attributes 'parent_functions' and 'gaussian_noise_function' set. Moreover, the graph attribute 'linear_gaussian' is set to True.
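To make the roles of the edge functions, edge weights, and Gaussian noise concrete, here is a minimal sketch of how a single node value could be generated under an additive-noise reading of the parameters above. The `sample_node` helper and its arguments are hypothetical; how pywhy_graphs actually stores and evaluates 'parent_functions' and 'gaussian_noise_function' is not shown in this page and is assumed here.

```python
import random


def sample_node(parent_values, parent_functions, weights, mean, std, rng):
    """Assumed additive form: sum_k w_k * f_k(parent_k) + N(mean, std)."""
    signal = sum(
        weights[name] * parent_functions[name](value)
        for name, value in parent_values.items()
    )
    return signal + rng.gauss(mean, std)


rng = random.Random(0)

# Root node X has no parents, so it is pure Gaussian noise.
x = rng.gauss(0.0, 1.0)

# Y depends on X through the identity edge function with weight 1.
y = sample_node({"X": x}, {"X": lambda v: v}, {"X": 1.0}, 0.0, 1.0, rng)

# With zero noise the value is fully determined: 0.5 * (2.0 ** 2) = 2.0.
y_det = sample_node({"X": 2.0}, {"X": lambda v: v**2}, {"X": 0.5}, 0.0, 0.0, rng)
```

In this reading, node_mean_lims and node_std_lims would bound the `mean` and `std` drawn for each node, and edge_weight_lims would bound each entry of `weights`.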
See also
make_graph_linear_gaussian
Create a linear Gaussian graph
Notes
To determine the missing S-node structure, we first construct all possible S-nodes given the number of domains, n_domains. The total number of S-nodes will then be \(\binom{n_{domains}}{2}\). Then, we randomly sample a subset of nodes in the graph to have S-node edges. The remaining nodes will be missing S-node edges. Then, among the nodes with S-node edges, we randomly sample a subset of S-nodes to be missing edges.
At this stage, the S-node edges that remain may still be inconsistent. For example, if we have the S-node edge \(S^{1,2} \rightarrow Y\) among 3 domains, then at least one of the other S-nodes must also point to \(Y\); otherwise \(Y\) must have no S-node edges at all. This is because missing \(S^{2,3} \rightarrow Y\) and \(S^{1,3} \rightarrow Y\) implies that the distribution of \(Y\) is invariant across domains 2 and 3 and across domains 1 and 3, which in turn implies that it is invariant across domains 1 and 2, contradicting \(S^{1,2} \rightarrow Y\). To fix this, for each node with S-node connections, we delete a random set of S-nodes, construct the connected components of the corresponding domains, and then remove any remaining S-nodes as needed to keep the graph consistent.
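The counting and consistency argument in these notes can be sketched in plain Python. This is an illustrative reconstruction, not the library's code: the helper names are hypothetical, domains are assumed to be numbered from 1, and the repair step is deterministic, whereas the procedure described above deletes a random set of S-nodes first.

```python
from itertools import combinations
from math import comb


def all_s_nodes(n_domains):
    """All unordered domain pairs (i, j) with i < j, domains numbered from 1."""
    return list(combinations(range(1, n_domains + 1), 2))


def invariance_components(n_domains, s_nodes_into_node):
    """Group domains that are linked by missing S-node edges (invariances)."""
    parent = {d: d for d in range(1, n_domains + 1)}

    def find(d):
        # Union-find lookup with path halving.
        while parent[d] != d:
            parent[d] = parent[parent[d]]
            d = parent[d]
        return d

    for i, j in all_s_nodes(n_domains):
        if (i, j) not in s_nodes_into_node:
            # Missing S^{i,j} -> node means domains i and j are invariant.
            parent[find(i)] = find(j)
    return {d: find(d) for d in parent}


def make_consistent(n_domains, s_nodes_into_node):
    """Drop S-node edges whose two domains are already implied invariant."""
    comp = invariance_components(n_domains, s_nodes_into_node)
    return {(i, j) for (i, j) in s_nodes_into_node if comp[i] != comp[j]}


n_pairs = len(all_s_nodes(3))                 # equals comb(3, 2)
lone = make_consistent(3, {(1, 2)})           # S^{1,2} alone is inconsistent
pair = make_consistent(3, {(1, 2), (1, 3)})   # already consistent
```

With three domains, a lone \(S^{1,2} \rightarrow Y\) is removed because the two missing S-nodes place all three domains in one invariance component, while the pair \(S^{1,2}\) and \(S^{1,3}\) is kept unchanged.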