dowhy.causal_refuters.overrule package

Subpackages

Submodules

dowhy.causal_refuters.overrule.ruleset module

Ruleset estimator class for OverRule.

This module implements the boolean ruleset estimator from OverRule [1]. Code is adapted (with some simplifications) from clinicalml/overlap-code, under the MIT License.

[1] Oberst, M., Johansson, F., Wei, D., Gao, T., Brat, G., Sontag, D., & Varshney, K. (2020). Characterization of Overlap in Observational Studies. In S. Chiappa & R. Calandra (Eds.), Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics (Vol. 108, pp. 788–798). PMLR. https://arxiv.org/abs/1907.04138

class dowhy.causal_refuters.overrule.ruleset.BCSRulesetEstimator(n_ref_multiplier: float = 1.0, lambda0: float = 0.0, lambda1: float = 0.0, cat_cols: List | None = None, negations: bool = True, num_thresh: int = 9, seed: int | None = None, ref_range: Dict[str, Dict] | None = None, thresh_override: Dict | None = None, **kwargs)

Bases: object

Ruleset estimator based on Boolean Rules with Column Generation.

Follows a scikit-learn-style interface, with a few additional methods.

Initializes the estimator.

**kwargs are passed to OverlapBooleanRule (see ./BCS/overlap_boolean_rule.py for description of arguments)

Parameters:
  • n_ref_multiplier (float, optional) – Reference sample count multiplier, only used for estimating support, defaults to 1.0, but should be set to zero for Overlap rules

  • lambda0 (float, optional) – Regularization on the # of rules, defaults to 0.0

  • lambda1 (float, optional) – Regularization on the # of literals, defaults to 0.0

  • cat_cols (Optional[List], optional) – Set of categorical columns, defaults to None

  • negations (bool, optional) – Include negation of literals, defaults to True

  • num_thresh (int, optional) – Number of thresholds used to discretize each continuous variable, defaults to 9 (which yields deciles)

  • seed (int, optional) – Random seed for reference samples, only used for estimating support, defaults to None

  • ref_range (Optional[Dict], optional) – Manual override of the range for reference samples, given as a dictionary of the form ref_range = {c: {"is_binary": True/False, "min": min_value, "max": max_value}}, defaults to None

  • thresh_override (Optional[Dict], optional) – Manual override of the thresholds for continuous features, given as a dictionary such as thresh_override = {column_name: np.linspace(0, 100, 10)}. The override is applied only to continuous features with more than num_thresh unique values. Defaults to None
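The two override dictionaries can be sketched as plain Python data. The column names ("age", "smoker") and their ranges below are hypothetical, and the small linspace helper only stands in for np.linspace so that the sketch has no dependencies. It shows the expected shapes, not how the estimator consumes them.

```python
from typing import Dict, List


def linspace(start: float, stop: float, num: int) -> List[float]:
    """Pure-Python stand-in for np.linspace(start, stop, num)."""
    if num == 1:
        return [float(start)]
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


# Hypothetical columns: one continuous ("age") and one binary ("smoker").
ref_range: Dict[str, Dict] = {
    "age": {"is_binary": False, "min": 18.0, "max": 90.0},
    "smoker": {"is_binary": True, "min": 0, "max": 1},
}

# Ten evenly spaced candidate thresholds for "age", mirroring the
# np.linspace(0, 100, 10) form quoted in the parameter description.
thresh_override: Dict[str, List[float]] = {"age": linspace(0, 100, 10)}

print(len(thresh_override["age"]))  # prints 10
```

Both dictionaries would then be passed as the ref_range and thresh_override keyword arguments of the constructor.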

fit(x, o=None)

Fit rules for characterizing either support (if o is not provided) or overlap, in which case o should be a vector that marks overlap with 1 and non-overlap with 0.

This function is primarily a wrapper around the OverlapBooleanRule estimator, making sure that features are binarized before being fed into the ruleset estimator, constructing reference samples for the support characterization, and so on.

Parameters:
  • x (Pandas DataFrame or Numpy Array, shape (n, d)) – Samples of covariates

  • o (Pandas DataFrame or Numpy Array, shape (n, )) – Binary indicator for whether or not a sample belongs in the overlap region, defaults to None. If provided, should have the same length as x
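The binarization step mentioned above can be illustrated with a minimal, dependency-free sketch. It assumes quantile-based thresholds and one literal per threshold (plus its negation). The helper names are hypothetical and the library's own feature binarizer may differ in its details.

```python
import statistics
from typing import Dict, List


def quantile_thresholds(values: List[float], num_thresh: int = 9) -> List[float]:
    # statistics.quantiles with n = num_thresh + 1 returns num_thresh cut points,
    # so the default of 9 gives the nine decile boundaries.
    return statistics.quantiles(values, n=num_thresh + 1)


def binarize(values: List[float], thresholds: List[float], name: str,
             negations: bool = True) -> Dict[str, List[int]]:
    """Turn one continuous column into 0/1 columns, one per literal."""
    columns: Dict[str, List[int]] = {}
    for t in thresholds:
        columns[f"{name} <= {t:.3f}"] = [int(v <= t) for v in values]
        if negations:
            columns[f"{name} > {t:.3f}"] = [int(v > t) for v in values]
    return columns


ages = [float(a) for a in range(20, 70)]  # 50 hypothetical samples
cuts = quantile_thresholds(ages)
literals = binarize(ages, cuts, "age")
print(len(cuts), len(literals))  # prints 9 18
```

A rule learner then only has to combine these binary columns, which is why continuous inputs are discretized before fitting.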

get_params(deep=False)

Return estimator parameters.

init_estimator_()

Initialize rule set estimator and feature binarizer.

predict(x)

Predict whether or not each sample in x lies in the overlap region (1 = True).

Parameters:

x (Pandas DataFrame or Numpy Array, shape (n, d)) – Samples of covariates

predict_rules(x)

Predict rules activated by x.

Parameters:

x (Pandas DataFrame or Numpy Array, shape (n, d)) – Samples of covariates

Returns:

Matrix with binary values, of shape (n, r), where r is the total number of rules considered by the estimator, and where 1 indicates that the sample matches the rule, and 0 indicates otherwise.

Return type:

Numpy Array, shape (n, r)
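The shape of this output can be illustrated with a small sketch. It assumes that each rule is a conjunction (AND) of atoms and that a sample counts as covered when at least one rule matches, i.e. a rule set in disjunctive normal form. The tuple representation of atoms and both helper names are hypothetical, and the hand-written rule list stands in for whatever rules the estimator actually considers.

```python
import operator
from typing import Dict, List, Tuple

Atom = Tuple[str, str, float]  # (feature, operator, value), hypothetical layout

OPS = {"<=": operator.le, ">": operator.gt, ">=": operator.ge,
       "<": operator.lt, "==": operator.eq}


def rule_matrix(samples: List[Dict[str, float]],
                ruleset: List[List[Atom]]) -> List[List[int]]:
    """Return an (n, r) 0/1 matrix: entry [i][j] is 1 if sample i matches rule j."""
    return [
        [int(all(OPS[op](row[f], v) for f, op, v in rule)) for rule in ruleset]
        for row in samples
    ]


def covered(matrix: List[List[int]]) -> List[int]:
    # Under the DNF assumption, a sample is covered if any rule fires.
    return [int(any(row)) for row in matrix]


ruleset = [
    [("age", ">", 30.0), ("age", "<=", 60.0)],  # rule 0: 30 < age <= 60
    [("bmi", "<=", 25.0)],                      # rule 1: bmi <= 25
]
samples = [
    {"age": 40.0, "bmi": 30.0},
    {"age": 70.0, "bmi": 22.0},
    {"age": 70.0, "bmi": 30.0},
]

matrix = rule_matrix(samples, ruleset)
print(matrix)           # prints [[1, 0], [0, 1], [0, 0]]
print(covered(matrix))  # prints [1, 1, 0]
```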

rules(as_str: bool = False, transform: Callable[[str, float], float] | None = None, fmt: str = '%.3f', labels: Dict[str, str] = {})

Return rules learned by the estimator.

Parameters:
  • as_str (bool, optional) – Return a string if True, otherwise a dictionary, defaults to False

  • transform (Optional[Callable[[str, float], float]], optional) – A function that takes key-value pairs for rules and thresholds and transforms the value. This function is used to re-scale standardized data, defaults to None

  • fmt (str, optional) – Formatting string for float values, for printing rules with thresholds, defaults to “%.3f”

  • labels (Dict[str, str], optional) – Dictionary mapping from original feature names to display names when printing rules, any feature not specified here will default to the original name, defaults to {}
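A transform with the documented signature Callable[[str, float], float] might look like the sketch below, which maps thresholds learned on standardized data back to original units. The per-feature means and standard deviations are hypothetical, and it is an assumption that the first argument passed by the estimator is the plain feature name.

```python
from typing import Callable, Dict

# Hypothetical statistics that were used to standardize the training data.
MEANS: Dict[str, float] = {"age": 45.0, "bmi": 26.0}
STDS: Dict[str, float] = {"age": 12.0, "bmi": 4.0}


def unstandardize(feature: str, value: float) -> float:
    """Map a standardized threshold back to the feature's original scale."""
    return value * STDS[feature] + MEANS[feature]


transform: Callable[[str, float], float] = unstandardize
fmt = "%.3f"

# A threshold of 0.5 standard deviations above the mean age:
print(fmt % transform("age", 0.5))  # prints 51.000
```

Passing such a function as transform, together with a labels mapping, keeps printed rules readable without refitting on unscaled data.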

set_params(**params)

Set estimator parameters.

dowhy.causal_refuters.overrule.utils module

Utilities for learning boolean rules.

This module provides helper functions for the boolean ruleset estimator from OverRule [1]. Code is adapted (with some simplifications) from clinicalml/overlap-code, under the MIT License.

[1] Oberst, M., Johansson, F., Wei, D., Gao, T., Brat, G., Sontag, D., & Varshney, K. (2020). Characterization of Overlap in Observational Studies. In S. Chiappa & R. Calandra (Eds.), Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics (Vol. 108, pp. 788–798). PMLR. https://arxiv.org/abs/1907.04138

dowhy.causal_refuters.overrule.utils.fatom(f: str, o: str, v: str | float | None, fmt: str = '%.3f') → str

Format an “atom”, i.e., a single literal in a Boolean Rule.

Parameters:
  • f (str) – Feature name

  • o (str) – Operator, one of [“<=”, “>”, “>=”, “<”, “==”, “not”, “”]

  • v (Optional[Union[str, float]]) – Value of comparison for [“<=”, “>”, “>=”, “<”, “==”]

  • fmt (str) – Formatting string for floats, defaults to “%.3f”

Returns:

Formatted atom

Return type:

str
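The operator cases listed above can be illustrated with a small sketch. The function name format_atom and the exact string layout (spacing, parentheses) are assumptions made for illustration and need not match what fatom returns.

```python
from typing import Optional, Union


def format_atom(f: str, o: str, v: Optional[Union[str, float]],
                fmt: str = "%.3f") -> str:
    """Illustrative formatter for a single literal (not the library's code)."""
    if o in ("<=", ">", ">=", "<", "=="):
        # Floats go through the format string; other values are shown as-is.
        value = fmt % v if isinstance(v, float) else str(v)
        return f"({f} {o} {value})"
    if o == "not":
        return f"not {f}"
    if o == "":
        return f
    raise ValueError(f"Unknown operator: {o!r}")


print(format_atom("age", "<=", 30.0))       # prints (age <= 30.000)
print(format_atom("smoker", "not", None))   # prints not smoker
print(format_atom("smoker", "", None))      # prints smoker
```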

dowhy.causal_refuters.overrule.utils.rule_str(C: List, fmt: str = '%.3f') → str

Convert a rule into a string.

Parameters:
  • C (List) – List of rules, where each element is a list (a single rule) containing a set of atoms.

  • fmt (str) – Formatting string for floats, defaults to “%.3f”

Returns:

Formatted rule

Return type:

str
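As a rough illustration of the nested structure of C, the sketch below joins the atoms of each rule with AND and the rules with OR. The (feature, operator, value) tuple layout, the joining keywords and the name render_rules are all assumptions. The string that rule_str actually produces may be laid out differently.

```python
from typing import List, Tuple, Union

Atom = Tuple[str, str, Union[str, float]]  # hypothetical atom layout


def render_rules(rules: List[List[Atom]], fmt: str = "%.3f") -> str:
    """Render a list of rules, each a list of atoms, as one string."""

    def render_atom(atom: Atom) -> str:
        f, o, v = atom
        value = fmt % v if isinstance(v, float) else str(v)
        return f"{f} {o} {value}"

    clauses = ["(" + " AND ".join(render_atom(a) for a in rule) + ")"
               for rule in rules]
    return " OR ".join(clauses)


C = [
    [("age", ">", 30.0), ("age", "<=", 60.0)],
    [("bmi", "<=", 25.0)],
]
print(render_rules(C, fmt="%.1f"))
# prints (age > 30.0 AND age <= 60.0) OR (bmi <= 25.0)
```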

dowhy.causal_refuters.overrule.utils.sampleUnif(x, n: int = 10000, seed: int | None = None)

Generate samples from a uniform distribution between the minimum and maximum of each column of the sample x.

These are used for estimation of support, as the number of samples included under the rules gives a measure of volume. This function is specialized to continuous variables, while sample_reference handles the general case, calling this function where necessary.

Parameters:
  • x (Pandas Dataframe or Numpy Array) – 2D array of samples, where each column corresponds to a feature.

  • n (int, optional) – Number of samples to draw, defaults to 10000

  • seed (int, optional) – Random seed for uniform sampling, defaults to None
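The idea of uniform reference samples as a volume measure can be sketched with the standard library alone. The function below is a hypothetical stand-in, not the library's implementation: it draws each column independently between that column's observed minimum and maximum, then uses the share of reference samples satisfying a rule as an estimate of the rule's relative volume.

```python
import random
from typing import List, Optional


def sample_uniform_box(x: List[List[float]], n: int = 10000,
                       seed: Optional[int] = None) -> List[List[float]]:
    """Draw n points uniformly from the bounding box of the rows of x."""
    rng = random.Random(seed)
    columns = list(zip(*x))  # transpose rows into columns
    bounds = [(min(col), max(col)) for col in columns]
    return [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n)]


x = [[0.0, 10.0], [1.0, 20.0], [0.5, 15.0]]  # three samples, two features
ref = sample_uniform_box(x, n=1000, seed=0)

# Share of the box covered by the rule "first feature <= 0.5"; since the first
# feature ranges over [0, 1], this should be close to one half.
volume = sum(1 for a, _ in ref if a <= 0.5) / len(ref)
print(round(volume, 2))
```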

dowhy.causal_refuters.overrule.utils.sample_reference(x, n: int | None = None, cat_cols: List[str] = [], seed: int | None = None, ref_range: Dict | None = None)

Generate samples from a uniform distribution over the columns of x.

Parameters:
  • x (Pandas Dataframe or Numpy Array) – 2D array of samples, where each column corresponds to a feature.

  • n (Optional[int], optional) – Number of samples to draw, defaults to the same number as the samples provided.

  • cat_cols (List[str], optional) – Set of categorical columns, defaults to []

  • seed (int, optional) – Random seed for uniform sampling, defaults to None

  • ref_range (Optional[Dict], optional) – Manual override of the range for reference samples, given as a dictionary of the form ref_range = {c: {“is_binary”: True/False, “min”: min_value, “max”: max_value}}
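How these parameters might interact can be sketched as follows. The handling shown here is an assumption made for illustration: categorical columns are drawn uniformly from their observed values, binary ref_range entries from their two endpoints, and continuous columns from either the ref_range bounds or the observed minimum and maximum. The function name and the row-as-dictionary input are hypothetical.

```python
import random
from typing import Dict, List, Optional


def sample_reference_sketch(rows: List[Dict[str, float]], n: Optional[int] = None,
                            cat_cols: Optional[List[str]] = None,
                            seed: Optional[int] = None,
                            ref_range: Optional[Dict[str, Dict]] = None
                            ) -> List[Dict[str, float]]:
    rng = random.Random(seed)
    n = len(rows) if n is None else n  # default: as many as the input samples
    cat_cols = cat_cols or []
    ref_range = ref_range or {}
    out: List[Dict[str, float]] = [dict() for _ in range(n)]
    for col in rows[0]:
        values = [r[col] for r in rows]
        spec = ref_range.get(col)
        if spec is not None and spec["is_binary"]:
            choices = [spec["min"], spec["max"]]
            draws = [rng.choice(choices) for _ in range(n)]
        elif col in cat_cols:
            choices = sorted(set(values))
            draws = [rng.choice(choices) for _ in range(n)]
        else:
            lo = spec["min"] if spec is not None else min(values)
            hi = spec["max"] if spec is not None else max(values)
            draws = [rng.uniform(lo, hi) for _ in range(n)]
        for sample, value in zip(out, draws):
            sample[col] = value
    return out


rows = [{"age": 20.0, "sex": 0}, {"age": 60.0, "sex": 1}]
ref = sample_reference_sketch(
    rows, n=200, cat_cols=["sex"], seed=1,
    ref_range={"age": {"is_binary": False, "min": 18.0, "max": 90.0}},
)
print(len(ref))  # prints 200
```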

Module contents