dowhy.causal_refuters.overrule package#
Subpackages#
- dowhy.causal_refuters.overrule.BCS package
- Submodules
- dowhy.causal_refuters.overrule.BCS.beam_search module
- dowhy.causal_refuters.overrule.BCS.load_process_data_BCS module
- dowhy.causal_refuters.overrule.BCS.overlap_boolean_rule module
OverlapBooleanRule
OverlapBooleanRule.compute_conjunctions()
OverlapBooleanRule.fit()
OverlapBooleanRule.get_objective_value()
OverlapBooleanRule.get_params()
OverlapBooleanRule.greedy_round_()
OverlapBooleanRule.predict()
OverlapBooleanRule.predict_()
OverlapBooleanRule.predict_rules()
OverlapBooleanRule.round_()
OverlapBooleanRule.set_params()
- Module contents
Submodules#
dowhy.causal_refuters.overrule.ruleset module#
Ruleset estimator class for OverRule.
This module implements the boolean ruleset estimator from OverRule [1]. Code is adapted (with some simplifications) from clinicalml/overlap-code, under the MIT License.
[1] Oberst, M., Johansson, F., Wei, D., Gao, T., Brat, G., Sontag, D., & Varshney, K. (2020). Characterization of Overlap in Observational Studies. In S. Chiappa & R. Calandra (Eds.), Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics (Vol. 108, pp. 788–798). PMLR. https://arxiv.org/abs/1907.04138
- class dowhy.causal_refuters.overrule.ruleset.BCSRulesetEstimator(n_ref_multiplier: float = 1.0, lambda0: float = 0.0, lambda1: float = 0.0, cat_cols: List | None = None, negations: bool = True, num_thresh: int = 9, seed: int | None = None, ref_range: Dict[str, Dict] | None = None, thresh_override: Dict | None = None, **kwargs)[source]#
Bases: object
Ruleset estimator based on Boolean Rules with Column Generation.
Follows the scikit-learn estimator interface, with a few additional methods.
Initializes the estimator.
**kwargs are passed to OverlapBooleanRule (see ./BCS/overlap_boolean_rule.py for description of arguments)
- Parameters:
n_ref_multiplier (float, optional) – Multiplier for the number of reference samples, used only when estimating support, defaults to 1.0. Set it to 0 when learning overlap rules
lambda0 (float, optional) – Regularization on the # of rules, defaults to 0.0
lambda1 (float, optional) – Regularization on the # of literals, defaults to 0.0
cat_cols (Optional[List], optional) – Set of categorical columns, defaults to None
negations (bool, optional) – Include negation of literals, defaults to True
num_thresh (int, optional) – Number of thresholds used to discretize each continuous variable, defaults to 9 (nine thresholds split the data into deciles)
seed (int, optional) – Random seed for reference samples, only used for estimating support, defaults to None
ref_range (Optional[Dict], optional) – Manual override of the range for reference samples, given as a dictionary of the form ref_range = {c: {"is_binary": True/False, "min": min_value, "max": max_value}}, defaults to None
thresh_override (Optional[Dict], optional) – Manual override of the thresholds for continuous features, given as a dictionary such as thresh_override = {column_name: np.linspace(0, 100, 10)}. The override is only applied to continuous features with more than num_thresh unique values, defaults to None
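To make the link between num_thresh = 9 and deciles concrete, the following sketch computes nine quantile cut points for a single continuous column using only the Python standard library. It is an illustration under assumptions, not the estimator's own binarization code: the ages column is made up, and the use of statistics.quantiles with the inclusive method is a choice made for this example.

```python
import statistics

# Hypothetical continuous feature with 101 values: 0, 1, ..., 100.
ages = list(range(101))

# Nine cut points split the data into ten equal-sized bins (deciles).
num_thresh = 9
thresholds = statistics.quantiles(ages, n=num_thresh + 1, method="inclusive")

# Each threshold t can then yield binary literals such as "age <= t" and "age > t".
print(len(thresholds))
print(thresholds)
```

A thresh_override entry for the same column would replace these data-driven cut points with a hand-picked grid.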
- fit(x, o=None)[source]#
Fit rules that characterize either support (if o is not provided) or overlap, in which case o should be a vector that marks overlap with 1 and non-overlap with 0.
This function is primarily a wrapper around the OverlapBooleanRule estimator, making sure that features are binarized before being fed into the ruleset estimator, constructing reference samples for the support characterization, and so on.
- Parameters:
x (Pandas DataFrame or Numpy Array, shape (n, d)) – Samples of covariates
o (Pandas DataFrame or Numpy Array, shape (n, )) – Binary indicator for whether or not a sample belongs in the overlap region, defaults to None. If provided, should have the same length as x
- predict(x)[source]#
Predict whether or not each sample in x lies in the overlap region (1 = True).
- Parameters:
x (Pandas DataFrame or Numpy Array, shape (n, d)) – Samples of covariates
- predict_rules(x)[source]#
Predict which rules are activated by each sample in x.
- Parameters:
x (Pandas DataFrame or Numpy Array, shape (n, d)) – Samples of covariates
- Returns:
Matrix with binary values, of shape (n, r), where r is the total number of rules considered by the estimator, and where 1 indicates that the sample matches the rule, and 0 indicates otherwise.
- Return type:
Numpy Array, shape (n, r)
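The shape of this return value can be pictured with a small standalone sketch. It does not call the estimator: the two rules, the feature names, and the rule_matches helper are all hypothetical, and each rule is assumed to be a conjunction of (feature, operator, threshold) atoms.

```python
import operator

# Comparison operators that may appear in an atom.
OPS = {"<=": operator.le, ">": operator.gt, ">=": operator.ge,
       "<": operator.lt, "==": operator.eq}

# Two hypothetical rules; each rule is a conjunction of atoms.
rules = [
    [("age", "<=", 65.0), ("bmi", ">", 18.5)],
    [("age", ">", 65.0)],
]

# Three hypothetical samples (n = 3).
samples = [
    {"age": 40.0, "bmi": 22.0},
    {"age": 70.0, "bmi": 17.0},
    {"age": 50.0, "bmi": 18.0},
]


def rule_matches(rule, sample):
    """Return True if the sample satisfies every atom in the rule."""
    return all(OPS[op](sample[feature], threshold) for feature, op, threshold in rule)


# Binary activation matrix of shape (n, r): one row per sample, one column per rule.
activation = [[int(rule_matches(rule, s)) for rule in rules] for s in samples]
print(activation)
```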
- rules(as_str: bool = False, transform: Callable[[str, float], float] | None = None, fmt: str = '%.3f', labels: Dict[str, str] = {})[source]#
Return rules learned by the estimator.
- Parameters:
as_str (bool, optional) – Return a string if True, otherwise a dictionary, defaults to False
transform (Optional[Callable[[str, float], float]], optional) – A function that takes key-value pairs for rules and thresholds and transforms the value. This function is used to re-scale standardized data, defaults to None
fmt (str, optional) – Formatting string for float values, for printing rules with thresholds, defaults to "%.3f"
labels (Dict[str, str], optional) – Dictionary mapping from original feature names to display names when printing rules, any feature not specified here will default to the original name, defaults to {}
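As an illustration of the transform argument, the sketch below defines a function with the documented Callable[[str, float], float] shape that maps a threshold on a standardized scale back to original units. The means and stds dictionaries and the unstandardize name are hypothetical, and the sketch assumes that the first argument is the feature name.

```python
# Hypothetical statistics recorded when the features were standardized.
means = {"age": 40.0, "bmi": 25.0}
stds = {"age": 15.0, "bmi": 4.0}


def unstandardize(feature: str, value: float) -> float:
    """Map a threshold on the standardized scale back to original units."""
    return means[feature] + stds[feature] * value


# A standardized threshold of 1.0 on "age" is one standard deviation above the mean.
print("%.3f" % unstandardize("age", 1.0))
print("%.3f" % unstandardize("bmi", -0.5))
```

Under these assumptions, a function of this shape would be supplied as transform=unstandardize so that printed rules show thresholds in the original units.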
dowhy.causal_refuters.overrule.utils module#
Utilities for learning boolean rules.
This module implements the boolean ruleset estimator from OverRule [1]. Code is adapted (with some simplifications) from clinicalml/overlap-code, under the MIT License.
[1] Oberst, M., Johansson, F., Wei, D., Gao, T., Brat, G., Sontag, D., & Varshney, K. (2020). Characterization of Overlap in Observational Studies. In S. Chiappa & R. Calandra (Eds.), Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics (Vol. 108, pp. 788–798). PMLR. https://arxiv.org/abs/1907.04138
- dowhy.causal_refuters.overrule.utils.fatom(f: str, o: str, v: str | float | None, fmt: str = '%.3f') → str [source]#
Format an “atom”, i.e., a single literal in a Boolean Rule.
- Parameters:
f (str) – Feature name
o (str) – Operator, one of ["<=", ">", ">=", "<", "==", "not", ""]
v (Optional[Union[str, float]]) – Value to compare against, used with ["<=", ">", ">=", "<", "=="]
fmt (str) – Formatting string for floats, defaults to "%.3f"
- Returns:
Formatted atom
- Return type:
str
- dowhy.causal_refuters.overrule.utils.rule_str(C: List, fmt: str = '%.3f') → str [source]#
Convert a rule into a string.
- Parameters:
C (List) – List of rules, where each element is a list (a single rule) containing a set of atoms.
fmt (str) – Formatting string for floats, defaults to "%.3f"
- Returns:
Formatted rule
- Return type:
str
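The sketch below shows one way a list of rules, each a list of atoms, could be rendered as text. It is not the output format of fatom or rule_str: the format_atom and format_rules helpers, the bracket and AND/OR notation, and the assumption that rules are combined as a disjunction of conjunctions are all illustrative choices.

```python
def format_atom(feature, op, value=None, fmt="%.3f"):
    """Hypothetical stand-in for an atom formatter."""
    if op in ("<=", ">", ">=", "<", "=="):
        # Floats go through the format string; other values are shown as-is.
        shown = fmt % value if isinstance(value, float) else str(value)
        return f"({feature} {op} {shown})"
    if op == "not":
        return f"(not {feature})"
    # Empty operator: the feature itself is the literal.
    return f"({feature})"


def format_rules(rules, fmt="%.3f"):
    """Join atoms with AND inside a rule, and rules with OR."""
    clauses = [
        " AND ".join(format_atom(*atom, fmt=fmt) for atom in rule)
        for rule in rules
    ]
    return " OR ".join(f"[{clause}]" for clause in clauses)


# Two hypothetical rules, each a list of (feature, operator, value) atoms.
rules = [
    [("age", "<=", 65.0), ("smoker", "not", None)],
    [("bmi", ">", 30.25)],
]
text = format_rules(rules)
print(text)
```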
- dowhy.causal_refuters.overrule.utils.sampleUnif(x, n: int = 10000, seed: int | None = None)[source]#
Generate samples from a uniform distribution between the minimum and maximum of each column of x.
These are used for estimation of support, as the number of samples included under the rules gives a measure of volume. This function is specialized to continuous variables, while sample_reference handles the general case, calling this function where necessary.
- Parameters:
x (Pandas Dataframe or Numpy Array) – 2D array of samples, where each column corresponds to a feature.
n (int, optional) – Number of samples to draw, defaults to 10000
seed (int, optional) – Random seed for uniform sampling, defaults to None
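The idea of using uniform reference samples as a volume measure can be sketched with the standard library alone. This is not the sampleUnif implementation: sample_uniform_box is a hypothetical helper, the observed data is made up, and the rule being checked is chosen only for illustration.

```python
import random


def sample_uniform_box(rows, n=10000, seed=None):
    """Draw n points uniformly from the box spanned by each column's min and max."""
    rng = random.Random(seed)
    columns = list(zip(*rows))
    bounds = [(min(col), max(col)) for col in columns]
    return [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n)]


# Hypothetical observed data with two continuous features.
observed = [[0.0, 10.0], [1.0, 30.0], [0.25, 20.0]]
reference = sample_uniform_box(observed, n=10000, seed=0)

# The fraction of reference samples that satisfy a rule estimates the share
# of the box volume that the rule covers (here, the rule "x0 <= 0.5").
covered = sum(1 for x0, _ in reference if x0 <= 0.5)
volume_fraction = covered / len(reference)
print(volume_fraction)
```

Because the first feature ranges over [0, 1], the printed fraction should be close to one half.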
- dowhy.causal_refuters.overrule.utils.sample_reference(x, n: int | None = None, cat_cols: List[str] = [], seed: int | None = None, ref_range: Dict | None = None)[source]#
Generate samples from a uniform distribution over the columns of x.
- Parameters:
x (Pandas Dataframe or Numpy Array) – 2D array of samples, where each column corresponds to a feature.
n (Optional[int], optional) – Number of samples to draw, defaults to None, in which case as many samples are drawn as there are rows in x
cat_cols (List[str], optional) – Set of categorical columns, defaults to an empty list
seed (int, optional) – Random seed for uniform sampling, defaults to None
ref_range (Optional[Dict], optional) – Manual override of the range for reference samples, given as a dictionary of the form ref_range = {c: {"is_binary": True/False, "min": min_value, "max": max_value}}
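A dictionary in the ref_range form described above could be assembled as in the following sketch. The build_ref_range helper, the column names, and the rule used to decide is_binary (all values in {0, 1}) are assumptions made for illustration, not part of the module.

```python
def build_ref_range(rows):
    """Hypothetical helper: derive a ref_range-style dict from a list of row dicts."""
    ref_range = {}
    for column in rows[0]:
        values = [row[column] for row in rows]
        ref_range[column] = {
            # Assumption: a column counts as binary if it only contains 0 and 1.
            "is_binary": set(values) <= {0, 1},
            "min": min(values),
            "max": max(values),
        }
    return ref_range


# Made-up data with one continuous and one binary column.
rows = [
    {"age": 34.0, "smoker": 0},
    {"age": 71.5, "smoker": 1},
    {"age": 52.0, "smoker": 0},
]
ref_range = build_ref_range(rows)

# Widen the age range by hand, for example to cover a broader population.
ref_range["age"].update({"min": 18.0, "max": 90.0})
print(ref_range)
```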