dowhy.causal_refuters.overrule.BCS package

Submodules

dowhy.causal_refuters.overrule.BCS.load_process_data_BCS module

Code for Binarizing Features.

This module implements feature binarization for the Boolean ruleset estimator from OverRule [1]. Code is adapted (with some simplifications) from clinicalml/overlap-code, under the MIT License.

[1] Oberst, M., Johansson, F., Wei, D., Gao, T., Brat, G., Sontag, D., & Varshney, K. (2020). Characterization of Overlap in Observational Studies. In S. Chiappa & R. Calandra (Eds.), Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics (Vol. 108, pp. 788–798). PMLR. https://arxiv.org/abs/1907.04138

class dowhy.causal_refuters.overrule.BCS.load_process_data_BCS.FeatureBinarizer(colCateg: List[str] = [], numThresh: int = 9, negations: bool = False, threshStr: bool = False, threshOverride: Dict = {}, **kwargs)

Bases: TransformerMixin

Transformer for binarizing categorical and ordinal (including continuous) features.

Note that all features are converted into binary variables before learning Boolean rules.

Initialize the transformer for binarizing categorical and ordinal (including continuous) features.

Parameters:
  • colCateg (List[str], optional) – List of categorical columns, defaults to []. Columns with ‘object’ dtype are automatically treated as categorical.

  • numThresh (int, optional) – Number of quantile thresholds to binarize ordinal features, defaults to 9

  • negations (bool, optional) – Include negations, defaults to False

  • threshStr (bool, optional) – Convert thresholds to strings, defaults to False

  • threshOverride (Dict, optional) – Dictionary overriding the quantile thresholds for individual columns, defaults to {}. Formatted as {colname: cuts}, where cuts is an array of cut points such as one returned by np.linspace.
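To make numThresh and threshOverride concrete, here is a minimal NumPy sketch of how the cut points for a single ordinal column could be chosen. The helper quantile_thresholds is hypothetical and not part of dowhy; it assumes the thresholds are the numThresh interior quantiles (at 1/(numThresh+1), ..., numThresh/(numThresh+1)) of the non-missing values, with duplicates dropped. The actual FeatureBinarizer may compute its quantiles differently.

```python
import numpy as np


def quantile_thresholds(values, num_thresh=9, override=None):
    """Return cut points for one ordinal column (hypothetical sketch).

    Assumes thresholds are the num_thresh interior quantiles of the
    non-missing values, de-duplicated and sorted.
    """
    if override is not None:
        # threshOverride-style behaviour: user-supplied cut points win
        return np.asarray(override, dtype=float)
    values = np.asarray(values, dtype=float)
    values = values[~np.isnan(values)]  # ignore missing values
    # drop the 0 and 1 quantiles, keep num_thresh interior ones
    qs = np.linspace(0, 1, num_thresh + 2)[1:-1]
    return np.unique(np.quantile(values, qs))


x = np.array([0.2, 1.5, 3.1, 4.8, 7.0, 9.9])
print(quantile_thresholds(x, num_thresh=3))                   # three quantile cuts
print(quantile_thresholds(x, override=np.linspace(0, 10, 5)))  # cuts 0, 2.5, 5, 7.5, 10
```

With heavily tied data several quantiles can coincide, so in this sketch fewer than num_thresh distinct cuts may survive; supplying an explicit grid through the override fixes the cuts regardless of the data.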

fit(X)

Fit to data, including the learning of thresholds where appropriate.

Sets the following internal variables:

  • maps – dictionary of mappings for unary/binary columns

  • enc – dictionary of OneHotEncoders for categorical columns

  • thresh – dictionary of lists of thresholds for ordinal columns

  • NaN – list of ordinal columns containing NaN values

Parameters:

X (pd.DataFrame) – Original features as a pandas DataFrame
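The internal variables set by fit can be illustrated with a small pure-Python sketch. plan_binarization is a hypothetical helper, not part of dowhy. It assumes columns with at most two distinct values go to maps, columns listed in colCateg or holding strings (standing in for pandas ‘object’ dtype) are treated as categorical, and all remaining columns are ordinal, with thresholds taken from statistics.quantiles. The categories dictionary stands in for enc and holds plain category lists rather than OneHotEncoders.

```python
import math
import statistics


def plan_binarization(data, col_categ=(), num_thresh=9):
    """Hypothetical sketch of the bookkeeping fit() is documented to produce.

    data: dict mapping column name -> list of values (None or NaN = missing).
    Returns (maps, categories, thresh, nan_cols).
    """

    def is_missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    maps, categories, thresh, nan_cols = {}, {}, {}, []
    for col, values in data.items():
        observed = [v for v in values if not is_missing(v)]
        uniq = sorted(set(observed))
        if len(uniq) <= 2:
            # unary/binary column: map each observed value to 0 or 1
            maps[col] = {v: i for i, v in enumerate(uniq)}
        elif col in col_categ or any(isinstance(v, str) for v in observed):
            # categorical column (strings stand in for 'object' dtype)
            categories[col] = uniq
        else:
            # ordinal column: num_thresh interior quantiles, de-duplicated
            cuts = statistics.quantiles(observed, n=num_thresh + 1)
            thresh[col] = sorted(set(cuts))
            if len(observed) < len(values):
                nan_cols.append(col)  # ordinal column with missing values
    return maps, categories, thresh, nan_cols


data = {"sex": ["F", "M", "F"], "color": ["red", "green", "blue"],
        "age": [20.0, 35.0, float("nan"), 50.0]}
print(plan_binarization(data, num_thresh=1))
```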

transform(X: DataFrame) → DataFrame

Transform data into binary features.

Parameters:

X (pd.DataFrame) – Original features as a pandas DataFrame

Returns:

A – Binary feature DataFrame
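The sketch below shows what the binary output of transform could look like for a single column. binarize_column is hypothetical; it assumes an ordinal column yields one "<=" indicator per threshold, a categorical column one "==" indicator per category, and that negations=True appends the complement of each indicator (">" or "!=" respectively). Missing values are not handled, and the real column labels may differ from the (feature, operation, value) tuples used here.

```python
def binarize_column(name, values, thresholds=None, categories=None, negations=False):
    """Hypothetical sketch of transform() for one column without missing values.

    Returns a dict mapping (feature, operation, value) -> list of 0/1 indicators.
    """
    if thresholds is not None:
        # ordinal: one "<=" indicator per threshold
        base = [("<=", t, [int(v <= t) for v in values]) for t in thresholds]
        neg_op = ">"
    elif categories is not None:
        # categorical: one "==" indicator per category
        base = [("==", c, [int(v == c) for v in values]) for c in categories]
        neg_op = "!="
    else:
        raise ValueError("pass either thresholds or categories")
    out = {}
    for op, val, bits in base:
        out[(name, op, val)] = bits
        if negations:
            # the negated literal is the complement of the base indicator
            out[(name, neg_op, val)] = [1 - b for b in bits]
    return out


print(binarize_column("age", [20, 35, 50], thresholds=[30, 40], negations=True))
print(binarize_column("color", ["red", "blue", "red"], categories=["blue", "red"]))
```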

dowhy.causal_refuters.overrule.BCS.overlap_boolean_rule module

OverlapBooleanRule.

This module implements the Boolean ruleset estimator from OverRule [1]. Code is adapted (with some simplifications) from clinicalml/overlap-code, under the MIT License.

[1] Oberst, M., Johansson, F., Wei, D., Gao, T., Brat, G., Sontag, D., & Varshney, K. (2020). Characterization of Overlap in Observational Studies. In S. Chiappa & R. Calandra (Eds.), Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics (Vol. 108, pp. 788–798). PMLR. https://arxiv.org/abs/1907.04138

class dowhy.causal_refuters.overrule.BCS.overlap_boolean_rule.OverlapBooleanRule(alpha=0.95, lambda0=0.01, lambda1=0.01, K=20, D=20, B=10, iterMax=10, eps=1e-06, silent=False, verbose=False, solver='ECOS', rounding='greedy_sweep')

Bases: object

Overlap Boolean Rule estimator in the style of scikit-learn.

Learn Boolean rules in Disjunctive Normal Form (DNF) to describe the positive class.

Parameters:
  • alpha (float, optional) – Fraction of the positive samples to ensure are included in the rules, defaults to 0.95

  • lambda0 (float, optional) – Regularization on the number of rules, defaults to 1e-2

  • lambda1 (float, optional) – Regularization on the number of literals, defaults to 1e-2

  • K (int, optional) – Maximum number of results returned by beam search, defaults to 20

  • D (int, optional) – Maximum number of extra rules per beam search iteration, defaults to 20

  • B (int, optional) – Width of beam search, defaults to 10

  • iterMax (int, optional) – Maximum number of iterations of column generation, defaults to 10

  • eps (float, optional) – Numerical tolerance on comparisons, defaults to 1e-6

  • silent (bool, optional) – Silence non-optimizer output, defaults to False

  • verbose (bool, optional) – Verbose optimizer output, defaults to False

  • solver (str, optional) – Linear programming solver used by CVXPY to solve the LP relaxation, defaults to ‘ECOS’

  • rounding (str, optional) – Strategy to perform rounding, either ‘greedy’ or ‘greedy_sweep’, defaults to ‘greedy_sweep’
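A rule set in disjunctive normal form is an OR of conjunctions, each conjunction being an AND of binary literals. The sketch below evaluates such a rule set on one record and scores its complexity. The (feature, operation, value) literal format and the additive penalty (lambda0 per rule plus lambda1 per literal) are assumptions chosen to mirror the parameter descriptions above; they are not the exact objective optimized by OverlapBooleanRule, and satisfies and complexity are hypothetical names.

```python
import operator

# comparison operators a literal may use (assumed format)
OPS = {"<=": operator.le, ">": operator.gt, "==": operator.eq, "!=": operator.ne}


def satisfies(record, rule_set):
    """DNF semantics: True if ANY conjunction has ALL of its literals satisfied."""
    return any(
        all(OPS[op](record[feature], value) for feature, op, value in conj)
        for conj in rule_set
    )


def complexity(rule_set, lambda0=0.01, lambda1=0.01):
    """Assumed additive penalty: lambda0 per rule plus lambda1 per literal."""
    return lambda0 * len(rule_set) + lambda1 * sum(len(conj) for conj in rule_set)


# (age <= 40 AND color == red) OR (age > 60)
rules = [[("age", "<=", 40), ("color", "==", "red")], [("age", ">", 60)]]
print(satisfies({"age": 30, "color": "red"}, rules))   # True
print(satisfies({"age": 50, "color": "red"}, rules))   # False
print(complexity(rules))                                # 2 rules, 3 literals
```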

compute_conjunctions(X)

Compute the conjunctions of features specified in self.z.
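One way to evaluate every conjunction at once, assuming X is a 0/1 matrix of binarized features and z is a 0/1 matrix with one column per conjunction marking its literals, is a single matrix product: a sample satisfies a conjunction exactly when it violates none of that conjunction's literals. This is a standalone NumPy sketch of the idea, not the dowhy method itself.

```python
import numpy as np


def compute_conjunctions(X, z):
    """Evaluate every conjunction on every sample (standalone sketch).

    X: (n_samples, n_features) 0/1 matrix of binarized features.
    z: (n_features, n_conjunctions) 0/1 matrix; z[j, k] = 1 if feature j
       is a literal of conjunction k.
    Returns an (n_samples, n_conjunctions) 0/1 matrix A with A[i, k] = 1
    iff sample i has every literal of conjunction k switched on.
    """
    X = np.asarray(X)
    z = np.asarray(z)
    # (1 - X) @ z counts how many literals of conjunction k sample i violates;
    # the conjunction holds exactly when that count is zero.
    return 1 - (np.dot(1 - X, z) > 0).astype(int)


X = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]
z = [[1, 0], [1, 0], [0, 1]]  # conjunction 0 = f0 AND f1, conjunction 1 = f2
print(compute_conjunctions(X, z))
```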

fit(X: DataFrame, y: ndarray | DataFrame)

Fit model to training data.

Parameters:
  • X – Pandas DataFrame containing covariates

  • y – Labels: +1 for overlap or support samples (depending on which rules are being learned), 0 for non-overlap samples, and -1 for background samples. y should contain only +1/0 when learning overlap rules, or only +1/-1 when learning support rules.
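The label convention can be checked before fitting. The hypothetical helper rule_type_from_labels below infers from the set of values in y whether it describes an overlap problem (+1/0) or a support problem (+1/-1), and rejects anything else, such as a vector mixing 0 and -1. It is an illustration of the documented convention, not a check that dowhy necessarily performs.

```python
import numpy as np


def rule_type_from_labels(y):
    """Hypothetical helper: infer which kind of rules a label vector can train."""
    labels = set(np.unique(np.asarray(y)).tolist())
    if labels <= {1, 0}:
        return "overlap"  # +1 = overlap, 0 = non-overlap
    if labels <= {1, -1}:
        return "support"  # +1 = support, -1 = background
    raise ValueError(f"labels must be a subset of {{1, 0}} or {{1, -1}}, got {sorted(labels)}")


print(rule_type_from_labels([1, 0, 1]))    # overlap
print(rule_type_from_labels([1, -1, -1]))  # support
```

A vector containing only +1 matches both conventions; this sketch reports "overlap" in that case.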

get_objective_value(X, o, rounded=True)

get_params()

Return the estimator parameters.

greedy_round_(X: DataFrame, y: ndarray | DataFrame, xi: float = 0.5, use_lp: bool = False)

Round the rule coefficients to integer values.

For DNF, this starts with no conjunctions and adds them greedily based on a cost that penalizes the inclusion of any negative samples and rewards the inclusion of new positive samples, stopping once at least an alpha fraction of the positive samples is covered.

Parameters:
  • X – Pandas DataFrame containing covariates

  • y – Labels: +1 for overlap or support samples (depending on which rules are being learned), 0 for non-overlap samples, and -1 for background samples. y should contain only +1/0 when learning overlap rules, or only +1/-1 when learning support rules.

  • xi – Reward for including positive samples, relative to the cost (1) of including negative samples

  • use_lp – Restrict to those conjunctions whose LP coefficients are positive. Note that the LP affects the result regardless, since only the rules generated by column generation are considered here.
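The greedy procedure described above can be sketched as a standalone function. greedy_round is hypothetical and operates on a precomputed conjunction matrix A (one column per candidate conjunction, for example those produced by column generation). The cost of adding conjunction k is the number of negatives it covers minus xi times the number of positives it newly covers, and selection stops once at least an alpha fraction of the positives is covered or no candidate adds coverage. Tie-breaking (first index wins) and skipping conjunctions that add no new positives are choices made for this sketch.

```python
import numpy as np


def greedy_round(A, y, alpha=0.95, xi=0.5):
    """Greedy selection of conjunctions (hypothetical sketch).

    A: (n_samples, n_conjunctions) 0/1 matrix, A[i, k] = 1 if conjunction k covers sample i.
    y: labels; +1 marks positives, any other value (0 or -1) is a negative.
    Returns the indices of the selected conjunctions, in selection order.
    """
    A = np.asarray(A, dtype=int)
    pos = np.asarray(y) == 1
    n_pos = pos.sum()
    covered = np.zeros(A.shape[0], dtype=bool)
    selected = []
    while covered[pos].sum() < alpha * n_pos:
        best, best_cost = None, np.inf
        for k in range(A.shape[1]):
            if k in selected:
                continue
            hits = A[:, k].astype(bool)
            new_pos = (hits & pos & ~covered).sum()
            if new_pos == 0:
                continue  # adds no positives, cannot help reach alpha
            neg = (hits & ~pos).sum()  # penalize ANY negative it includes
            cost = neg - xi * new_pos  # reward only NEW positives
            if cost < best_cost:
                best, best_cost = k, cost
        if best is None:
            break  # no remaining conjunction adds coverage
        selected.append(best)
        covered |= A[:, best].astype(bool)
    return selected


A = [[1, 0, 1], [1, 0, 1], [1, 0, 1], [0, 1, 1], [1, 0, 1], [0, 0, 1]]
y = [1, 1, 1, 1, 0, 0]
print(greedy_round(A, y))  # [0, 1]
```

Raising xi makes newly covered positives more valuable, so broader conjunctions that also admit negatives become cheaper; this is the trade-off that round_ sweeps over when scoring="greedy_sweep".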

predict(X)

Predict whether points belong to the overlap region.

predict_(X, w)

Predict whether points belong to the overlap region.

predict_rules(X)

Predict whether points belong to the overlap region.
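Assuming the rounded coefficients w form a 0/1 vector over the conjunctions, membership in the region reduces to checking whether any selected conjunction covers a sample, i.e. whether the DNF evaluates to True. The sketch below uses the hypothetical name predict_from_conjunctions and takes the conjunction matrix A directly instead of raw features.

```python
import numpy as np


def predict_from_conjunctions(A, w):
    """Return 1 for samples covered by at least one selected conjunction.

    A: (n_samples, n_conjunctions) 0/1 conjunction matrix.
    w: (n_conjunctions,) 0/1 vector marking the selected conjunctions.
    """
    return (np.dot(np.asarray(A), np.asarray(w)) > 0).astype(int)


A = [[1, 0], [0, 1], [0, 0]]
print(predict_from_conjunctions(A, [1, 0]))  # only the first sample is covered
```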

round_(X: DataFrame, y: ndarray | DataFrame, scoring: str = 'greedy', xi=None, use_lp: bool = True)

Round the rule coefficients to integer values via a greedy approach, either using a fixed reward for including positive samples (scoring="greedy") or choosing that reward to maximize balanced accuracy when classifying positive vs. negative samples (scoring="greedy_sweep").

Parameters:
  • X – Pandas DataFrame containing covariates

  • y – Labels: +1 for overlap or support samples (depending on which rules are being learned), 0 for non-overlap samples, and -1 for background samples. y should contain only +1/0 when learning overlap rules, or only +1/-1 when learning support rules.

  • xi – Reward for including positive samples, relative to the cost (1) of including negative samples. For scoring="greedy", this should be a single value; for scoring="greedy_sweep", an array of values, which defaults to np.logspace(np.log10(0.01), 0.5, 20).

  • use_lp – Restrict to those conjunctions whose LP coefficients are positive. Note that the LP affects the result regardless, since only the rules generated by column generation are considered here.
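For scoring="greedy_sweep", the choice of xi can be sketched as a search over the documented default grid, np.logspace(np.log10(0.01), 0.5, 20), which runs from 0.01 to 10**0.5. balanced_accuracy and sweep_xi are hypothetical helpers; round_fn stands in for whatever maps a reward xi to 0/1 predictions on the training samples (for instance, greedy rounding followed by prediction). This sketch keeps the first xi that achieves the highest balanced accuracy; how dowhy breaks ties is not specified here.

```python
import numpy as np


def balanced_accuracy(y_true, y_pred):
    """Mean of the true-positive and true-negative rates (label 1 = positive)."""
    y_true = np.asarray(y_true) == 1
    y_pred = np.asarray(y_pred) == 1
    tpr = (y_pred & y_true).sum() / y_true.sum()
    tnr = (~y_pred & ~y_true).sum() / (~y_true).sum()
    return 0.5 * (tpr + tnr)


def sweep_xi(round_fn, y, xis=None):
    """Return (best_xi, best_score) over a grid of rewards (hypothetical sketch).

    round_fn: callable mapping xi -> 0/1 predictions on the training samples.
    """
    if xis is None:
        xis = np.logspace(np.log10(0.01), 0.5, 20)  # documented default grid
    scores = [balanced_accuracy(y, round_fn(xi)) for xi in xis]
    best = int(np.argmax(scores))  # first index with the highest score
    return xis[best], scores[best]


y = [1, 1, 0, 0]
toy_round = lambda xi: [1, 1, 0, 0] if xi > 1 else [1, 1, 1, 1]  # toy stand-in
print(sweep_xi(toy_round, y))
```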

set_params(**params)

Set the estimator parameters.
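get_params and set_params follow the scikit-learn convention of exposing constructor arguments as a dictionary. The class below is a hypothetical, trimmed-down sketch of that convention (covering only three of the documented parameters), not the dowhy implementation; it assumes each constructor argument is stored under an attribute of the same name.

```python
class RuleEstimatorSketch:
    """Hypothetical sketch of scikit-learn-style parameter access."""

    _param_names = ("alpha", "lambda0", "lambda1")

    def __init__(self, alpha=0.95, lambda0=0.01, lambda1=0.01):
        # store each constructor argument under an attribute of the same name
        self.alpha = alpha
        self.lambda0 = lambda0
        self.lambda1 = lambda1

    def get_params(self):
        """Return the constructor arguments as a dict."""
        return {name: getattr(self, name) for name in self._param_names}

    def set_params(self, **params):
        """Update known parameters in place and return self for chaining."""
        for name, value in params.items():
            if name not in self._param_names:
                raise ValueError(f"unknown parameter: {name}")
            setattr(self, name, value)
        return self


est = RuleEstimatorSketch().set_params(alpha=0.9)
copy = RuleEstimatorSketch(**est.get_params())
print(copy.get_params())
```

Because get_params returns exactly the constructor arguments, a fresh, unfitted copy can be built by passing them back to the constructor, as in the usage lines above.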

Module contents