dowhy.causal_estimators package#
Submodules#
dowhy.causal_estimators.causalml module#
- class dowhy.causal_estimators.causalml.Causalml(identified_estimand: IdentifiedEstimand, causalml_estimator: _CausalmlEstimator | str, test_significance: bool | str = False, evaluate_effect_strength: bool = False, confidence_intervals: bool = False, num_null_simulations: int = 1000, num_simulations: int = 399, sample_size_fraction: int = 1, confidence_level: float = 0.95, need_conditional_estimates: bool | str = 'auto', num_quantiles_to_discretize_cont_cols: int = 5, **kwargs)[source]#
Bases:
CausalEstimator
Wrapper class for estimators from the causalml library.
Supports additional parameters as listed below. For specific parameters of each estimator, refer to the CausalML docs.
- Parameters:
identified_estimand – probability expression representing the target identified estimand to estimate.
causalml_estimator – Instance of a causalml estimator class, or the fully qualified name of a causalml estimator class.
test_significance – Binary flag or a string indicating whether to test significance and by which method. All estimators support test_significance=”bootstrap” that estimates a p-value for the obtained estimate using the bootstrap method. Individual estimators can override this to support custom testing methods. The bootstrap method supports an optional parameter, num_null_simulations. If False, no testing is done. If True, significance of the estimate is tested using the custom method if available, otherwise by bootstrap.
evaluate_effect_strength – (Experimental) whether to evaluate the strength of effect
confidence_intervals – Binary flag or a string indicating whether the confidence intervals should be computed and which method should be used. All methods support estimation of confidence intervals using the bootstrap method by using the parameter confidence_intervals=”bootstrap”. The bootstrap method takes in two arguments (num_simulations and sample_size_fraction) that can be optionally specified in the params dictionary. Estimators may also override this to implement their own confidence interval method. If this parameter is False, no confidence intervals are computed. If True, confidence intervals are computed by the estimator’s specific method if available, otherwise through bootstrap
num_null_simulations – The number of simulations for testing the statistical significance of the estimator
num_simulations – The number of simulations for finding the confidence interval (and/or standard error) for an estimate
sample_size_fraction – The size of the sample for the bootstrap estimator
confidence_level – The confidence level of the confidence interval estimate
need_conditional_estimates – Boolean flag indicating whether conditional estimates should be computed. Defaults to True if there are effect modifiers in the graph
num_quantiles_to_discretize_cont_cols – The number of quantiles into which a numeric effect modifier is split, to enable estimation of conditional treatment effect over it.
kwargs – (optional) Additional estimator-specific parameters
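The bootstrap parameters above (num_simulations, sample_size_fraction, confidence_level) can be illustrated with a small percentile-bootstrap sketch. This is not DoWhy's implementation: the function name, the difference-in-means estimator, and the toy data are assumptions chosen only to show how the three parameters interact.

```python
import random
import statistics


def bootstrap_confidence_interval(treated, control, num_simulations=399,
                                  sample_size_fraction=1, confidence_level=0.95,
                                  seed=0):
    """Percentile bootstrap interval for a difference in means (illustrative)."""
    rng = random.Random(seed)
    # sample_size_fraction scales the size of each bootstrap resample.
    n_treated = max(1, int(len(treated) * sample_size_fraction))
    n_control = max(1, int(len(control) * sample_size_fraction))
    estimates = []
    for _ in range(num_simulations):
        # Resample each group with replacement and re-estimate the effect.
        t_sample = rng.choices(treated, k=n_treated)
        c_sample = rng.choices(control, k=n_control)
        estimates.append(statistics.fmean(t_sample) - statistics.fmean(c_sample))
    estimates.sort()
    # Take the empirical quantiles that leave (1 - confidence_level) / 2 in each tail.
    alpha = 1 - confidence_level
    lower_idx = int((alpha / 2) * (num_simulations - 1))
    upper_idx = int((1 - alpha / 2) * (num_simulations - 1))
    return estimates[lower_idx], estimates[upper_idx]


# Toy outcomes for treated and control units (made-up numbers).
treated = [5.1, 6.0, 5.8, 6.3, 5.5, 6.1, 5.9, 6.4]
control = [4.0, 4.4, 3.9, 4.6, 4.2, 4.1, 4.5, 3.8]
low, high = bootstrap_confidence_interval(treated, control)
print(f"95% interval for the difference in means: ({low:.2f}, {high:.2f})")
```

A larger num_simulations gives a smoother estimate of the interval end points at the cost of more re-estimations.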
- estimate_effect(data: DataFrame, treatment_value: Any = 1, control_value: Any = 0, target_units=None, **_)[source]#
- Parameters:
data – DataFrame containing the data on which the treatment effect is to be estimated.
treatment_value – Value of the treatment variable for which the effect is to be estimated.
control_value – Value of the treatment variable that denotes its absence (usually 0).
target_units – The units for which the treatment effect should be estimated. It can be a DataFrame that contains values of the effect_modifiers, in which case the effect is estimated only for this new data. It can also be a lambda function that can be used as an index for the data (pandas DataFrame) to select the required rows.
- fit(data: DataFrame, effect_modifier_names: List[str] | None = None)[source]#
Fits the estimator with data for effect estimation.
- Parameters:
data – DataFrame containing the data.
effect_modifier_names – Variables on which to compute separate effects, or return a heterogeneous effect function. Not all methods support this currently.
dowhy.causal_estimators.distance_matching_estimator module#
- class dowhy.causal_estimators.distance_matching_estimator.DistanceMatchingEstimator(identified_estimand: IdentifiedEstimand, test_significance: bool | str = False, evaluate_effect_strength: bool = False, confidence_intervals: bool = False, num_null_simulations: int = 1000, num_simulations: int = 399, sample_size_fraction: int = 1, confidence_level: float = 0.95, need_conditional_estimates: bool | str = 'auto', num_quantiles_to_discretize_cont_cols: int = 5, num_matches_per_unit: int = 1, distance_metric: str = 'minkowski', **kwargs)[source]#
Bases:
CausalEstimator
Simple matching estimator for binary treatments based on a distance metric.
Supports additional parameters as listed below.
- Parameters:
identified_estimand – probability expression representing the target identified estimand to estimate.
test_significance – Binary flag or a string indicating whether to test significance and by which method. All estimators support test_significance=”bootstrap” that estimates a p-value for the obtained estimate using the bootstrap method. Individual estimators can override this to support custom testing methods. The bootstrap method supports an optional parameter, num_null_simulations. If False, no testing is done. If True, significance of the estimate is tested using the custom method if available, otherwise by bootstrap.
evaluate_effect_strength – (Experimental) whether to evaluate the strength of effect
confidence_intervals – Binary flag or a string indicating whether the confidence intervals should be computed and which method should be used. All methods support estimation of confidence intervals using the bootstrap method by using the parameter confidence_intervals=”bootstrap”. The bootstrap method takes in two arguments (num_simulations and sample_size_fraction) that can be optionally specified in the params dictionary. Estimators may also override this to implement their own confidence interval method. If this parameter is False, no confidence intervals are computed. If True, confidence intervals are computed by the estimator’s specific method if available, otherwise through bootstrap
num_null_simulations – The number of simulations for testing the statistical significance of the estimator
num_simulations – The number of simulations for finding the confidence interval (and/or standard error) for an estimate
sample_size_fraction – The size of the sample for the bootstrap estimator
confidence_level – The confidence level of the confidence interval estimate
need_conditional_estimates – Boolean flag indicating whether conditional estimates should be computed. Defaults to True if there are effect modifiers in the graph
num_quantiles_to_discretize_cont_cols – The number of quantiles into which a numeric effect modifier is split, to enable estimation of conditional treatment effect over it.
num_matches_per_unit – The number of matches per data point. Default=1.
distance_metric – Distance metric to use. Default is "minkowski", which with p=2 corresponds to the Euclidean distance.
kwargs – (optional) Additional estimator-specific parameters
- Valid_Dist_Metric_Params = ['p', 'V', 'VI', 'w']#
- estimate_effect(data: DataFrame, treatment_value: Any = 1, control_value: Any = 0, target_units=None, **_)[source]#
- fit(data: DataFrame, effect_modifier_names: List[str] | None = None, exact_match_cols=None)[source]#
Fits the estimator with data for effect estimation.
- Parameters:
data – DataFrame containing the data.
effect_modifier_names – Variables on which to compute separate effects, or return a heterogeneous effect function. Not all methods support this currently.
exact_match_cols – List of column names whose values should be exactly matched. Typically used for columns with discrete values.
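The idea behind distance-based matching can be sketched in a few lines of plain Python. The sketch below is an illustration under stated assumptions, not the class's actual code: the helper names, the (covariates, outcome) tuple layout, and the toy data are hypothetical, and it estimates only the average effect on the treated.

```python
def minkowski_distance(a, b, p=2):
    """Minkowski distance between two covariate vectors; p=2 is Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def matching_att(treated, control, num_matches_per_unit=1, p=2):
    """Average effect on the treated via nearest-neighbour matching.

    Each unit is a (covariates, outcome) pair.
    """
    effects = []
    for covariates, outcome in treated:
        # Pick the closest control units in covariate space.
        nearest = sorted(
            control,
            key=lambda unit: minkowski_distance(covariates, unit[0], p),
        )[:num_matches_per_unit]
        matched_outcome = sum(y for _, y in nearest) / len(nearest)
        effects.append(outcome - matched_outcome)
    return sum(effects) / len(effects)


# Toy data: two treated units and three control units (made-up numbers).
treated_units = [((1.0, 2.0), 10.0), ((3.0, 1.0), 12.0)]
control_units = [((1.1, 2.1), 8.0), ((2.9, 0.8), 9.0), ((5.0, 5.0), 20.0)]
att = matching_att(treated_units, control_units)
print(att)
```

Here each treated unit is compared with its single nearest control, so the distant control unit at (5.0, 5.0) never contributes to the estimate.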
dowhy.causal_estimators.econml module#
- class dowhy.causal_estimators.econml.Econml(identified_estimand: IdentifiedEstimand, econml_estimator: _EconmlEstimator | str, test_significance: bool | str = False, evaluate_effect_strength: bool = False, confidence_intervals: bool = False, num_null_simulations: int = 1000, num_simulations: int = 399, sample_size_fraction: int = 1, confidence_level: float = 0.95, need_conditional_estimates: bool | str = 'auto', num_quantiles_to_discretize_cont_cols: int = 5, **kwargs)[source]#
Bases:
CausalEstimator
Wrapper class for estimators from the EconML library.
Supports additional parameters as listed below. For init and fit parameters of each estimator, refer to the EconML docs.
- Parameters:
identified_estimand – probability expression representing the target identified estimand to estimate.
econml_estimator – Instance of an econml estimator class.
test_significance – Binary flag or a string indicating whether to test significance and by which method. All estimators support test_significance=”bootstrap” that estimates a p-value for the obtained estimate using the bootstrap method. Individual estimators can override this to support custom testing methods. The bootstrap method supports an optional parameter, num_null_simulations. If False, no testing is done. If True, significance of the estimate is tested using the custom method if available, otherwise by bootstrap.
evaluate_effect_strength – (Experimental) whether to evaluate the strength of effect
confidence_intervals – Binary flag or a string indicating whether the confidence intervals should be computed and which method should be used. All methods support estimation of confidence intervals using the bootstrap method by using the parameter confidence_intervals=”bootstrap”. The bootstrap method takes in two arguments (num_simulations and sample_size_fraction) that can be optionally specified in the params dictionary. Estimators may also override this to implement their own confidence interval method. If this parameter is False, no confidence intervals are computed. If True, confidence intervals are computed by the estimator’s specific method if available, otherwise through bootstrap
num_null_simulations – The number of simulations for testing the statistical significance of the estimator
num_simulations – The number of simulations for finding the confidence interval (and/or standard error) for an estimate
sample_size_fraction – The size of the sample for the bootstrap estimator
confidence_level – The confidence level of the confidence interval estimate
need_conditional_estimates – Boolean flag indicating whether conditional estimates should be computed. Defaults to True if there are effect modifiers in the graph
num_quantiles_to_discretize_cont_cols – The number of quantiles into which a numeric effect modifier is split, to enable estimation of conditional treatment effect over it.
kwargs – (optional) Additional estimator-specific parameters
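To give a feel for what num_null_simulations controls, the following sketch builds a null distribution by repeatedly shuffling the treatment labels and re-estimating the effect. It is one common simulation-based test, shown under assumptions: the function names, the difference-in-means estimator, and the data are hypothetical, and DoWhy's own procedure may differ in detail.

```python
import random
import statistics


def difference_in_means(outcomes, treatments):
    treated = [y for y, t in zip(outcomes, treatments) if t == 1]
    control = [y for y, t in zip(outcomes, treatments) if t == 0]
    return statistics.fmean(treated) - statistics.fmean(control)


def null_simulation_p_value(outcomes, treatments, num_null_simulations=1000, seed=0):
    """Two-sided p-value from a simulated null distribution (illustrative)."""
    rng = random.Random(seed)
    observed = difference_in_means(outcomes, treatments)
    shuffled = list(treatments)
    extreme = 0
    for _ in range(num_null_simulations):
        # Shuffling breaks any link between treatment and outcome.
        rng.shuffle(shuffled)
        if abs(difference_in_means(outcomes, shuffled)) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (extreme + 1) / (num_null_simulations + 1)


# Toy data (made-up numbers): four treated and four control units.
outcomes = [5.1, 6.0, 5.8, 6.3, 4.0, 4.4, 3.9, 4.6]
treatments = [1, 1, 1, 1, 0, 0, 0, 0]
observed_effect, p_value = null_simulation_p_value(outcomes, treatments)
print(f"estimate={observed_effect:.3f}, p-value={p_value:.3f}")
```

The resolution of the p-value is limited by the number of simulations: with 1000 null simulations the smallest value this sketch can return is 1/1001.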
- effect(df: DataFrame, *args, **kwargs) ndarray [source]#
Pointwise estimated treatment effect, output shape n_units x n_treatment_values (not counting control).
- Parameters:
df – Features of the units to evaluate.
args – Passed through to the underlying estimator.
kwargs – Passed through to the underlying estimator.
- effect_inference(df: DataFrame, *args, **kwargs)[source]#
Inference (uncertainty) results produced by the underlying EconML estimator.
- Parameters:
df – Features of the units to evaluate.
args – Passed through to the underlying estimator.
kwargs – Passed through to the underlying estimator.
- effect_interval(df: DataFrame, *args, **kwargs) ndarray [source]#
Pointwise confidence intervals for the estimated treatment effect.
- Parameters:
df – Features of the units to evaluate.
args – Passed through to the underlying estimator.
kwargs – Passed through to the underlying estimator.
- effect_tt(df: DataFrame, treatment_value, *args, **kwargs)[source]#
Effect of the actual treatment that was applied to each unit ("effect of Treatment on the Treated").
- Parameters:
df – Features of the units to evaluate.
args – Passed through to estimator.effect().
kwargs – Passed through to estimator.effect().
- estimate_effect(data: DataFrame, treatment_value: Any = 1, control_value: Any = 0, target_units=None, **_)[source]#
- Parameters:
data – DataFrame containing the data on which the treatment effect is to be estimated.
treatment_value – Value of the treatment variable for which the effect is to be estimated. It can optionally be a sequence of different values of the treatment variable.
control_value – Value of the treatment variable that denotes its absence (usually 0).
target_units – The units for which the treatment effect should be estimated. It can be a DataFrame that contains values of the effect_modifiers, in which case the effect is estimated only for this new data. It can also be a lambda function that can be used as an index for the data (pandas DataFrame) to select the required rows.
- fit(data: DataFrame, effect_modifier_names: List[str] | None = None, **kwargs)[source]#
Fits the estimator with data for effect estimation.
- Parameters:
data – DataFrame containing the data.
effect_modifier_names – Variables on which to compute separate effects, or return a heterogeneous effect function. Not all methods support this currently.
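The role of num_quantiles_to_discretize_cont_cols can be pictured as follows: a numeric effect modifier is cut into quantile bins, and a separate effect is estimated within each bin. The sketch below is a simplified illustration, not the library's code; the helper names, the per-bin difference-in-means estimator, and the synthetic data are assumptions.

```python
import statistics
from collections import defaultdict


def quantile_bins(values, num_quantiles=5):
    """Assign each value to a quantile bin numbered 0 .. num_quantiles - 1."""
    # statistics.quantiles returns the num_quantiles - 1 interior cut points.
    cut_points = statistics.quantiles(values, n=num_quantiles)
    return [sum(v > c for c in cut_points) for v in values]


def conditional_effects(modifier, treatment, outcome, num_quantiles=5):
    """Difference in mean outcome (treated minus control) within each bin."""
    groups = defaultdict(lambda: {0: [], 1: []})
    bins = quantile_bins(modifier, num_quantiles)
    for b, t, y in zip(bins, treatment, outcome):
        groups[b][t].append(y)
    return {
        b: statistics.fmean(g[1]) - statistics.fmean(g[0])
        for b, g in sorted(groups.items())
    }


# Synthetic data: the true effect grows with the modifier (0.5 * x).
modifier = list(range(1, 21))
treatment = [x % 2 for x in modifier]
outcome = [2 + t * 0.5 * x for x, t in zip(modifier, treatment)]
effects_by_bin = conditional_effects(modifier, treatment, outcome)
print(effects_by_bin)
```

Each bin must contain both treated and control units for its conditional effect to be defined, which is one reason to keep the number of quantiles small on small datasets.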
dowhy.causal_estimators.generalized_linear_model_estimator module#
- class dowhy.causal_estimators.generalized_linear_model_estimator.GeneralizedLinearModelEstimator(identified_estimand: IdentifiedEstimand, test_significance: bool | str = False, evaluate_effect_strength: bool = False, confidence_intervals: bool = False, num_null_simulations: int = 1000, num_simulations: int = 399, sample_size_fraction: int = 1, confidence_level: float = 0.95, need_conditional_estimates: bool | str = 'auto', num_quantiles_to_discretize_cont_cols: int = 5, glm_family: Any | None = None, predict_score: bool = True, **kwargs)[source]#
Bases:
RegressionEstimator
Compute effect of treatment using a generalized linear model such as logistic regression.
Implementation uses statsmodels.api.GLM. Needs an additional parameter, “glm_family” to be specified in method_params. The value of this parameter can be any valid statsmodels.api families object. For example, to use logistic regression, specify “glm_family” as statsmodels.api.families.Binomial().
For a list of args and kwargs, see documentation for CausalEstimator.
- Parameters:
identified_estimand – probability expression representing the target identified estimand to estimate.
test_significance – Binary flag or a string indicating whether to test significance and by which method. All estimators support test_significance=”bootstrap” that estimates a p-value for the obtained estimate using the bootstrap method. Individual estimators can override this to support custom testing methods. The bootstrap method supports an optional parameter, num_null_simulations. If False, no testing is done. If True, significance of the estimate is tested using the custom method if available, otherwise by bootstrap.
evaluate_effect_strength – (Experimental) whether to evaluate the strength of effect
confidence_intervals – Binary flag or a string indicating whether the confidence intervals should be computed and which method should be used. All methods support estimation of confidence intervals using the bootstrap method by using the parameter confidence_intervals=”bootstrap”. The bootstrap method takes in two arguments (num_simulations and sample_size_fraction) that can be optionally specified in the params dictionary. Estimators may also override this to implement their own confidence interval method. If this parameter is False, no confidence intervals are computed. If True, confidence intervals are computed by the estimator’s specific method if available, otherwise through bootstrap
num_null_simulations – The number of simulations for testing the statistical significance of the estimator
num_simulations – The number of simulations for finding the confidence interval (and/or standard error) for an estimate
sample_size_fraction – The size of the sample for the bootstrap estimator
confidence_level – The confidence level of the confidence interval estimate
need_conditional_estimates – Boolean flag indicating whether conditional estimates should be computed. Defaults to True if there are effect modifiers in the graph
num_quantiles_to_discretize_cont_cols – The number of quantiles into which a numeric effect modifier is split, to enable estimation of conditional treatment effect over it.
glm_family – statsmodels family for the generalized linear model. For example, use statsmodels.api.families.Binomial() for logistic regression or statsmodels.api.families.Poisson() for count data.
predict_score – For models that have a binary output, whether to output the model’s score or the binary output based on the score.
kwargs – (optional) Additional estimator-specific parameters
- fit(data: DataFrame, effect_modifier_names: List[str] | None = None)[source]#
Fits the estimator with data for effect estimation.
- Parameters:
data – DataFrame containing the data.
effect_modifier_names – Variables on which to compute separate effects, or return a heterogeneous effect function. Not all methods support this currently.
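One way to read the predict_score option is shown in the sketch below, which computes an average effect from a logistic model either on the probability scale or on the thresholded binary prediction. It is only an interpretation of the parameter description: the coefficients are invented rather than fitted with statsmodels, and the function names and the 0.5 threshold are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def average_effect(confounders, intercept, treatment_coef, confounder_coef,
                   predict_score=True):
    """Mean difference in predicted outcome between treatment=1 and treatment=0."""
    def predict(t, w):
        score = sigmoid(intercept + treatment_coef * t + confounder_coef * w)
        # Either keep the predicted probability or threshold it to 0/1.
        return score if predict_score else float(score >= 0.5)

    diffs = [predict(1, w) - predict(0, w) for w in confounders]
    return sum(diffs) / len(diffs)


# Hypothetical coefficients of an already-fitted logistic regression.
confounder_values = [-1.0, 0.0, 2.0]
params = dict(intercept=-0.5, treatment_coef=1.2, confounder_coef=0.5)
effect_score = average_effect(confounder_values, predict_score=True, **params)
effect_binary = average_effect(confounder_values, predict_score=False, **params)
print(f"score scale: {effect_score:.3f}, binary scale: {effect_binary:.3f}")
```

The two scales can give quite different numbers, because thresholding discards how far each predicted probability is from 0.5.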
dowhy.causal_estimators.instrumental_variable_estimator module#
- class dowhy.causal_estimators.instrumental_variable_estimator.InstrumentalVariableEstimator(identified_estimand: IdentifiedEstimand, iv_instrument_name: List | Dict | str | None = None, test_significance: bool | str = False, evaluate_effect_strength: bool = False, confidence_intervals: bool = False, num_null_simulations: int = 1000, num_simulations: int = 399, sample_size_fraction: int = 1, confidence_level: float = 0.95, need_conditional_estimates: bool | str = 'auto', num_quantiles_to_discretize_cont_cols: int = 5, **kwargs)[source]#
Bases:
CausalEstimator
Compute effect of treatment using the instrumental variables method.
This is also a superclass that can be inherited by other specific methods.
Supports additional parameters as listed below.
- Parameters:
identified_estimand – probability expression representing the target identified estimand to estimate.
iv_instrument_name – Name of the specific instrumental variable to be used. Needs to be one of the IVs identified in the identification step. Default is to use all the IV variables from the identification step.
test_significance – Binary flag or a string indicating whether to test significance and by which method. All estimators support test_significance=”bootstrap” that estimates a p-value for the obtained estimate using the bootstrap method. Individual estimators can override this to support custom testing methods. The bootstrap method supports an optional parameter, num_null_simulations. If False, no testing is done. If True, significance of the estimate is tested using the custom method if available, otherwise by bootstrap.
evaluate_effect_strength – (Experimental) whether to evaluate the strength of effect
confidence_intervals – Binary flag or a string indicating whether the confidence intervals should be computed and which method should be used. All methods support estimation of confidence intervals using the bootstrap method by using the parameter confidence_intervals=”bootstrap”. The bootstrap method takes in two arguments (num_simulations and sample_size_fraction) that can be optionally specified in the params dictionary. Estimators may also override this to implement their own confidence interval method. If this parameter is False, no confidence intervals are computed. If True, confidence intervals are computed by the estimator’s specific method if available, otherwise through bootstrap
num_null_simulations – The number of simulations for testing the statistical significance of the estimator
num_simulations – The number of simulations for finding the confidence interval (and/or standard error) for an estimate
sample_size_fraction – The size of the sample for the bootstrap estimator
confidence_level – The confidence level of the confidence interval estimate
need_conditional_estimates – Boolean flag indicating whether conditional estimates should be computed. Defaults to True if there are effect modifiers in the graph
num_quantiles_to_discretize_cont_cols – The number of quantiles into which a numeric effect modifier is split, to enable estimation of conditional treatment effect over it.
kwargs – (optional) Additional estimator-specific parameters
- estimate_effect(data: DataFrame, treatment_value: Any = 1, control_value: Any = 0, target_units=None, **_)[source]#
- Parameters:
data – DataFrame containing the data on which the treatment effect is to be estimated.
treatment_value – Value of the treatment variable for which the effect is to be estimated.
control_value – Value of the treatment variable that denotes its absence (usually 0).
target_units – The units for which the treatment effect should be estimated. It can be a DataFrame that contains values of the effect_modifiers, in which case the effect is estimated only for this new data. It can also be a lambda function that can be used as an index for the data (pandas DataFrame) to select the required rows.
- fit(data: DataFrame, effect_modifier_names: List[str] | None = None)[source]#
Fits the estimator with data for effect estimation.
- Parameters:
data – DataFrame containing the data.
effect_modifier_names – Variables on which to compute separate effects, or return a heterogeneous effect function. Not all methods support this currently.
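For intuition about the instrumental variables method, the classic single-instrument (Wald) estimator divides the instrument-outcome covariance by the instrument-treatment covariance. The sketch below is a textbook illustration on synthetic data, not this class's code; the function names and the data-generating process are assumptions.

```python
def covariance(xs, ys):
    """Sample covariance of two equally long sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)


def wald_iv_estimate(instrument, treatment, outcome):
    """Ratio of instrument-outcome to instrument-treatment covariance."""
    return covariance(instrument, outcome) / covariance(instrument, treatment)


# Synthetic data: u is an unobserved confounder, the true effect of t on y is 2.
instrument = [0, 0, 0, 0, 1, 1, 1, 1]
u = [0, 1, 0, 1, 0, 1, 0, 1]
treatment = [z + c for z, c in zip(instrument, u)]
outcome = [2 * t + 3 * c for t, c in zip(treatment, u)]

iv_estimate = wald_iv_estimate(instrument, treatment, outcome)
# Naive regression slope of outcome on treatment, biased by the confounder.
naive_estimate = covariance(treatment, outcome) / covariance(treatment, treatment)
print(f"IV estimate: {iv_estimate:.2f}, naive estimate: {naive_estimate:.2f}")
```

Because the instrument is unrelated to the confounder in this construction, the ratio recovers the true effect while the naive slope does not.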
dowhy.causal_estimators.linear_regression_estimator module#
- class dowhy.causal_estimators.linear_regression_estimator.LinearRegressionEstimator(identified_estimand: IdentifiedEstimand, test_significance: bool | str = False, evaluate_effect_strength: bool = False, confidence_intervals: bool = False, num_null_simulations: int = 1000, num_simulations: int = 399, sample_size_fraction: int = 1, confidence_level: float = 0.95, need_conditional_estimates: bool | str = 'auto', num_quantiles_to_discretize_cont_cols: int = 5, **kwargs)[source]#
Bases:
RegressionEstimator
Compute effect of treatment using linear regression.
Fits a regression model for estimating the outcome using treatment(s) and confounders. For a univariate treatment, the treatment effect is equivalent to the coefficient of the treatment variable.
Simple method to show the implementation of a causal inference method that can handle multiple treatments and heterogeneity in treatment. Requires a strong assumption that all relationships from (T, W) to Y are linear.
- Parameters:
identified_estimand – probability expression representing the target identified estimand to estimate.
test_significance – Binary flag or a string indicating whether to test significance and by which method. All estimators support test_significance=”bootstrap” that estimates a p-value for the obtained estimate using the bootstrap method. Individual estimators can override this to support custom testing methods. The bootstrap method supports an optional parameter, num_null_simulations. If False, no testing is done. If True, significance of the estimate is tested using the custom method if available, otherwise by bootstrap.
evaluate_effect_strength – (Experimental) whether to evaluate the strength of effect
confidence_intervals – Binary flag or a string indicating whether the confidence intervals should be computed and which method should be used. All methods support estimation of confidence intervals using the bootstrap method by using the parameter confidence_intervals=”bootstrap”. The bootstrap method takes in two arguments (num_simulations and sample_size_fraction) that can be optionally specified in the params dictionary. Estimators may also override this to implement their own confidence interval method. If this parameter is False, no confidence intervals are computed. If True, confidence intervals are computed by the estimator’s specific method if available, otherwise through bootstrap
num_null_simulations – The number of simulations for testing the statistical significance of the estimator
num_simulations – The number of simulations for finding the confidence interval (and/or standard error) for an estimate
sample_size_fraction – The size of the sample for the bootstrap estimator
confidence_level – The confidence level of the confidence interval estimate
need_conditional_estimates – Boolean flag indicating whether conditional estimates should be computed. Defaults to True if there are effect modifiers in the graph
num_quantiles_to_discretize_cont_cols – The number of quantiles into which a numeric effect modifier is split, to enable estimation of conditional treatment effect over it.
kwargs – (optional) Additional estimator-specific parameters
- fit(data: DataFrame, effect_modifier_names: List[str] | None = None)[source]#
Fits the estimator with data for effect estimation.
- Parameters:
data – DataFrame containing the data.
effect_modifier_names – Variables on which to compute separate effects, or return a heterogeneous effect function. Not all methods support this currently.
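The statement that the effect of a univariate treatment equals its regression coefficient can be checked on a small noiseless example. The sketch below recovers that coefficient by residualizing both treatment and outcome on a single confounder (the Frisch-Waugh-Lovell approach), which yields the same coefficient as a joint linear regression. It is an illustration, not the estimator's implementation; the helper names and data are assumptions.

```python
def mean(xs):
    return sum(xs) / len(xs)


def slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    mean_x, mean_y = mean(xs), mean(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


def residuals(xs, ys):
    """Residuals of ys after a simple linear regression on xs."""
    b = slope(xs, ys)
    a = mean(ys) - b * mean(xs)
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def treatment_coefficient(treatment, confounder, outcome):
    """Coefficient of treatment in a regression of outcome on treatment and confounder."""
    t_res = residuals(confounder, treatment)
    y_res = residuals(confounder, outcome)
    return slope(t_res, y_res)


# Noiseless linear data: y = 1 + 2*t + 3*w, so the true effect of t is 2.
confounder = [0, 1, 2, 3, 4, 5]
treatment = [0, 0, 1, 0, 1, 1]
outcome = [1 + 2 * t + 3 * w for t, w in zip(treatment, confounder)]

adjusted = treatment_coefficient(treatment, confounder, outcome)
unadjusted = slope(treatment, outcome)
print(f"adjusted: {adjusted:.2f}, unadjusted: {unadjusted:.2f}")
```

The unadjusted slope is inflated because the confounder is correlated with both treatment and outcome, which is the bias the regression adjustment removes under the linearity assumption.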
dowhy.causal_estimators.propensity_score_estimator module#
- class dowhy.causal_estimators.propensity_score_estimator.PropensityScoreEstimator(identified_estimand: IdentifiedEstimand, test_significance: bool | str = False, evaluate_effect_strength: bool = False, confidence_intervals: bool = False, num_null_simulations: int = 1000, num_simulations: int = 399, sample_size_fraction: int = 1, confidence_level: float = 0.95, need_conditional_estimates: bool | str = 'auto', num_quantiles_to_discretize_cont_cols: int = 5, propensity_score_model: Any | None = None, propensity_score_column: str = 'propensity_score', **kwargs)[source]#
Bases:
CausalEstimator
Base class for estimators that estimate effects based on propensity of treatment assignment.
Supports additional parameters as listed below.
- Parameters:
identified_estimand – probability expression representing the target identified estimand to estimate.
test_significance – Binary flag or a string indicating whether to test significance and by which method. All estimators support test_significance=”bootstrap” that estimates a p-value for the obtained estimate using the bootstrap method. Individual estimators can override this to support custom testing methods. The bootstrap method supports an optional parameter, num_null_simulations. If False, no testing is done. If True, significance of the estimate is tested using the custom method if available, otherwise by bootstrap.
evaluate_effect_strength – (Experimental) whether to evaluate the strength of effect
confidence_intervals – Binary flag or a string indicating whether the confidence intervals should be computed and which method should be used. All methods support estimation of confidence intervals using the bootstrap method by using the parameter confidence_intervals=”bootstrap”. The bootstrap method takes in two arguments (num_simulations and sample_size_fraction) that can be optionally specified in the params dictionary. Estimators may also override this to implement their own confidence interval method. If this parameter is False, no confidence intervals are computed. If True, confidence intervals are computed by the estimator’s specific method if available, otherwise through bootstrap
num_null_simulations – The number of simulations for testing the statistical significance of the estimator
num_simulations – The number of simulations for finding the confidence interval (and/or standard error) for an estimate
sample_size_fraction – The size of the sample for the bootstrap estimator
confidence_level – The confidence level of the confidence interval estimate
need_conditional_estimates – Boolean flag indicating whether conditional estimates should be computed. Defaults to True if there are effect modifiers in the graph
num_quantiles_to_discretize_cont_cols – The number of quantiles into which a numeric effect modifier is split, to enable estimation of conditional treatment effect over it.
propensity_score_model – Model used to compute propensity score. Can be any classification model that supports fit() and predict_proba() methods. If None, LogisticRegression is used.
propensity_score_column – Column name that stores the propensity score. Default=’propensity_score’
kwargs – (optional) Additional estimator-specific parameters
- construct_symbolic_estimator(estimand)[source]#
A symbolic string that conveys what each estimator does. For instance, linear regression is expressed as y ~ bx + e
- fit(data: DataFrame, effect_modifier_names: List[str] | None = None)[source]#
Fits the estimator with data for effect estimation.
- Parameters:
data – DataFrame containing the data.
effect_modifier_names – Variables on which to compute separate effects, or return a heterogeneous effect function. Not all methods support this currently.
dowhy.causal_estimators.propensity_score_matching_estimator module#
- class dowhy.causal_estimators.propensity_score_matching_estimator.PropensityScoreMatchingEstimator(identified_estimand: IdentifiedEstimand, test_significance: bool | str = False, evaluate_effect_strength: bool = False, confidence_intervals: bool = False, num_null_simulations: int = 1000, num_simulations: int = 399, sample_size_fraction: int = 1, confidence_level: float = 0.95, need_conditional_estimates: bool | str = 'auto', num_quantiles_to_discretize_cont_cols: int = 5, propensity_score_model: Any | None = None, propensity_score_column: str = 'propensity_score', **kwargs)[source]#
Bases:
PropensityScoreEstimator
Estimate effect of treatment by finding matching treated and control units based on propensity score.
Straightforward application of the back-door criterion.
Supports additional parameters as listed below.
- Parameters:
identified_estimand – probability expression representing the target identified estimand to estimate.
test_significance – Binary flag or a string indicating whether to test significance and by which method. All estimators support test_significance=”bootstrap” that estimates a p-value for the obtained estimate using the bootstrap method. Individual estimators can override this to support custom testing methods. The bootstrap method supports an optional parameter, num_null_simulations. If False, no testing is done. If True, significance of the estimate is tested using the custom method if available, otherwise by bootstrap.
evaluate_effect_strength – (Experimental) whether to evaluate the strength of effect
confidence_intervals – Binary flag or a string indicating whether the confidence intervals should be computed and which method should be used. All methods support estimation of confidence intervals using the bootstrap method by using the parameter confidence_intervals=”bootstrap”. The bootstrap method takes in two arguments (num_simulations and sample_size_fraction) that can be optionally specified in the params dictionary. Estimators may also override this to implement their own confidence interval method. If this parameter is False, no confidence intervals are computed. If True, confidence intervals are computed by the estimator’s specific method if available, otherwise through bootstrap
num_null_simulations – The number of simulations for testing the statistical significance of the estimator
num_simulations – The number of simulations for finding the confidence interval (and/or standard error) for an estimate
sample_size_fraction – The size of the sample for the bootstrap estimator
confidence_level – The confidence level of the confidence interval estimate
need_conditional_estimates – Boolean flag indicating whether conditional estimates should be computed. Defaults to True if there are effect modifiers in the graph
num_quantiles_to_discretize_cont_cols – The number of quantiles into which a numeric effect modifier is split, to enable estimation of conditional treatment effect over it.
propensity_score_model – Model used to compute propensity score. Can be any classification model that supports fit() and predict_proba() methods. If None, LogisticRegression is used.
propensity_score_column – Column name that stores the propensity score. Default=’propensity_score’
kwargs – (optional) Additional estimator-specific parameters
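The matching idea described for this estimator can be illustrated with a minimal, library-independent sketch. It is not DoWhy's implementation: the helper name `att_by_propensity_matching` and the toy data are hypothetical, propensity scores are assumed to be precomputed, and matching is 1-nearest-neighbour with replacement, estimating the effect on the treated units only.

```python
from typing import List, Sequence


def att_by_propensity_matching(
    treatment: Sequence[int],
    outcome: Sequence[float],
    propensity: Sequence[float],
) -> float:
    """Hypothetical helper: effect on the treated via 1-nearest-neighbour matching."""
    treated = [i for i, t in enumerate(treatment) if t == 1]
    control = [i for i, t in enumerate(treatment) if t == 0]
    if not treated or not control:
        raise ValueError("need at least one treated and one control unit")

    differences: List[float] = []
    for i in treated:
        # Match with replacement: pick the control unit closest in propensity score.
        j = min(control, key=lambda c: abs(propensity[c] - propensity[i]))
        differences.append(outcome[i] - outcome[j])
    return sum(differences) / len(differences)


# Toy data, made up for illustration; scores are assumed to be precomputed.
treatment = [1, 1, 0, 0, 0]
propensity = [0.80, 0.30, 0.75, 0.35, 0.50]
outcome = [5.0, 3.0, 3.0, 1.0, 2.0]

att = att_by_propensity_matching(treatment, outcome, propensity)
```

In this toy data each treated unit is paired with the control unit whose score is nearest, and the estimate is the mean of the paired outcome differences.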
- construct_symbolic_estimator(estimand)[source]#
A symbolic string that conveys what each estimator does. For instance, linear regression is expressed as y ~ bx + e
- estimate_effect(data: DataFrame, treatment_value: Any = 1, control_value: Any = 0, target_units=None, **_)[source]#
- fit(data: DataFrame, effect_modifier_names: List[str] | None = None)[source]#
Fits the estimator with data for effect estimation.
- Parameters:
data – data frame containing the data
treatment – name of the treatment variable
outcome – name of the outcome variable
effect_modifiers – Variables on which to compute separate effects, or return a heterogeneous effect function. Not all methods support this currently.
dowhy.causal_estimators.propensity_score_stratification_estimator module#
- class dowhy.causal_estimators.propensity_score_stratification_estimator.PropensityScoreStratificationEstimator(identified_estimand: IdentifiedEstimand, test_significance: bool | str = False, evaluate_effect_strength: bool = False, confidence_intervals: bool = False, num_null_simulations: int = 1000, num_simulations: int = 399, sample_size_fraction: int = 1, confidence_level: float = 0.95, need_conditional_estimates: bool | str = 'auto', num_quantiles_to_discretize_cont_cols: int = 5, num_strata: str | int = 'auto', clipping_threshold: int = 10, propensity_score_model: Any | None = None, propensity_score_column: str = 'propensity_score', **kwargs)[source]#
Bases:
PropensityScoreEstimator
Estimate effect of treatment by stratifying the data into bins of units with similar propensity scores, so that common causes are approximately balanced within each bin.
Straightforward application of the back-door criterion.
Supports additional parameters as listed below.
- Parameters:
identified_estimand – probability expression representing the target identified estimand to estimate.
test_significance – Binary flag or a string indicating whether to test significance and by which method. All estimators support test_significance=”bootstrap” that estimates a p-value for the obtained estimate using the bootstrap method. Individual estimators can override this to support custom testing methods. The bootstrap method supports an optional parameter, num_null_simulations. If False, no testing is done. If True, significance of the estimate is tested using the custom method if available, otherwise by bootstrap.
evaluate_effect_strength – (Experimental) whether to evaluate the strength of effect
confidence_intervals – Binary flag or a string indicating whether the confidence intervals should be computed and which method should be used. All methods support estimation of confidence intervals using the bootstrap method by using the parameter confidence_intervals=”bootstrap”. The bootstrap method takes in two arguments (num_simulations and sample_size_fraction) that can be optionally specified in the params dictionary. Estimators may also override this to implement their own confidence interval method. If this parameter is False, no confidence intervals are computed. If True, confidence intervals are computed by the estimator’s specific method if available, otherwise through bootstrap
num_null_simulations – The number of simulations for testing the statistical significance of the estimator
num_simulations – The number of simulations for finding the confidence interval (and/or standard error) for an estimate
sample_size_fraction – The size of the sample for the bootstrap estimator
confidence_level – The confidence level of the confidence interval estimate
need_conditional_estimates – Boolean flag indicating whether conditional estimates should be computed. Defaults to True if there are effect modifiers in the graph
num_quantiles_to_discretize_cont_cols – The number of quantiles into which a numeric effect modifier is split, to enable estimation of conditional treatment effect over it.
num_strata – Number of bins by which data will be stratified. Default is automatically determined.
clipping_threshold – Minimum number of treated or control units per stratum. Default=10
propensity_score_model – The model used to compute propensity score. Can be any classification model that supports fit() and predict_proba() methods. If None, use LogisticRegression model as the default.
propensity_score_column – Column name that stores the propensity score. Default=’propensity_score’
kwargs – (optional) Additional estimator-specific parameters
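As a rough illustration of how num_strata and clipping_threshold interact, the sketch below ranks units by a precomputed propensity score, cuts the ranking into equal-sized strata, skips strata with too few treated or control units, and averages the within-stratum differences weighted by stratum size. The function `stratified_effect` and the data are hypothetical, and DoWhy's own procedure (including the 'auto' choice of strata) may differ in detail.

```python
from typing import List, Sequence


def stratified_effect(
    treatment: Sequence[int],
    outcome: Sequence[float],
    propensity: Sequence[float],
    num_strata: int = 2,
    clipping_threshold: int = 1,
) -> float:
    """Hypothetical helper: size-weighted mean of within-stratum differences."""
    n = len(treatment)
    # Rank units by propensity score and cut the ranking into equal-sized strata.
    order = sorted(range(n), key=lambda i: propensity[i])
    strata: List[List[int]] = [[] for _ in range(num_strata)]
    for rank, i in enumerate(order):
        strata[rank * num_strata // n].append(i)

    weighted_sum, kept = 0.0, 0
    for units in strata:
        treated = [outcome[i] for i in units if treatment[i] == 1]
        control = [outcome[i] for i in units if treatment[i] == 0]
        # Skip strata with too few treated or control units.
        if len(treated) < clipping_threshold or len(control) < clipping_threshold:
            continue
        diff = sum(treated) / len(treated) - sum(control) / len(control)
        weighted_sum += diff * len(units)
        kept += len(units)
    if kept == 0:
        raise ValueError("no stratum satisfies the clipping threshold")
    return weighted_sum / kept


# Toy data, made up for illustration.
propensity = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
treatment = [0, 1, 0, 1, 1, 0, 1, 1]
outcome = [1.0, 3.0, 1.0, 3.0, 7.0, 4.0, 7.0, 7.0]

effect_all_strata = stratified_effect(treatment, outcome, propensity)
effect_clipped = stratified_effect(treatment, outcome, propensity, clipping_threshold=2)
```

With clipping_threshold=2 the upper stratum, which contains a single control unit, is dropped, so the estimate rests on the lower stratum alone.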
- construct_symbolic_estimator(estimand)[source]#
A symbolic string that conveys what each estimator does. For instance, linear regression is expressed as y ~ bx + e
- estimate_effect(data: DataFrame, treatment_value: Any = 1, control_value: Any = 0, target_units=None, **_)[source]#
- fit(data: DataFrame, effect_modifier_names: List[str] | None = None)[source]#
Fits the estimator with data for effect estimation.
- Parameters:
data – data frame containing the data
treatment – name of the treatment variable
outcome – name of the outcome variable
effect_modifiers – Variables on which to compute separate effects, or return a heterogeneous effect function. Not all methods support this currently.
dowhy.causal_estimators.propensity_score_weighting_estimator module#
- class dowhy.causal_estimators.propensity_score_weighting_estimator.PropensityScoreWeightingEstimator(identified_estimand: IdentifiedEstimand, test_significance: bool | str = False, evaluate_effect_strength: bool = False, confidence_intervals: bool = False, num_null_simulations: int = 1000, num_simulations: int = 399, sample_size_fraction: int = 1, confidence_level: float = 0.95, need_conditional_estimates: bool | str = 'auto', num_quantiles_to_discretize_cont_cols: int = 5, min_ps_score: float = 0.05, max_ps_score: float = 0.95, weighting_scheme: str = 'ips_weight', propensity_score_model: Any | None = None, propensity_score_column: str = 'propensity_score', **kwargs)[source]#
Bases:
PropensityScoreEstimator
Estimate effect of treatment by weighting the data by inverse probability of occurrence.
Straightforward application of the back-door criterion.
Supports additional parameters as listed below.
- Parameters:
identified_estimand – probability expression representing the target identified estimand to estimate.
test_significance – Binary flag or a string indicating whether to test significance and by which method. All estimators support test_significance=”bootstrap” that estimates a p-value for the obtained estimate using the bootstrap method. Individual estimators can override this to support custom testing methods. The bootstrap method supports an optional parameter, num_null_simulations. If False, no testing is done. If True, significance of the estimate is tested using the custom method if available, otherwise by bootstrap.
evaluate_effect_strength – (Experimental) whether to evaluate the strength of effect
confidence_intervals – Binary flag or a string indicating whether the confidence intervals should be computed and which method should be used. All methods support estimation of confidence intervals using the bootstrap method by using the parameter confidence_intervals=”bootstrap”. The bootstrap method takes in two arguments (num_simulations and sample_size_fraction) that can be optionally specified in the params dictionary. Estimators may also override this to implement their own confidence interval method. If this parameter is False, no confidence intervals are computed. If True, confidence intervals are computed by the estimator’s specific method if available, otherwise through bootstrap
num_null_simulations – The number of simulations for testing the statistical significance of the estimator
num_simulations – The number of simulations for finding the confidence interval (and/or standard error) for an estimate
sample_size_fraction – The size of the sample for the bootstrap estimator
confidence_level – The confidence level of the confidence interval estimate
need_conditional_estimates – Boolean flag indicating whether conditional estimates should be computed. Defaults to True if there are effect modifiers in the graph
num_quantiles_to_discretize_cont_cols – The number of quantiles into which a numeric effect modifier is split, to enable estimation of conditional treatment effect over it.
min_ps_score – Lower bound used to clip the propensity score. Default=0.05
max_ps_score – Upper bound used to clip the propensity score. Default=0.95
weighting_scheme – Weighting method to use. Can be inverse propensity score (“ips_weight”, default), stabilized IPS score (“ips_stabilized_weight”), or normalized IPS score (“ips_normalized_weight”).
propensity_score_model – The model used to compute propensity score. Can be any classification model that supports fit() and predict_proba() methods. If None, use LogisticRegression model as the default. Default=None
propensity_score_column – Column name that stores the propensity score. Default=’propensity_score’
kwargs – (optional) Additional estimator-specific parameters
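The three weighting_scheme options and the clipping bounds can be sketched with textbook definitions of the weights. This is an assumption-laden illustration rather than DoWhy's code: `ips_weights` and `weighted_effect` are hypothetical helpers, the scores are assumed to be precomputed, and the exact formulas used by the library may differ.

```python
from typing import List, Sequence


def ips_weights(
    treatment: Sequence[int],
    propensity: Sequence[float],
    min_ps_score: float = 0.05,
    max_ps_score: float = 0.95,
    weighting_scheme: str = "ips_weight",
) -> List[float]:
    """Hypothetical helper: per-unit weights under textbook definitions."""
    # Clip scores so that no unit receives an extreme weight.
    ps = [min(max(p, min_ps_score), max_ps_score) for p in propensity]
    # Plain inverse propensity weights: 1/p if treated, 1/(1-p) if control.
    weights = [1.0 / p if t == 1 else 1.0 / (1.0 - p) for t, p in zip(treatment, ps)]
    if weighting_scheme == "ips_weight":
        return weights
    if weighting_scheme == "ips_stabilized_weight":
        # Scale by the marginal probability of the unit's own treatment status.
        p_treated = sum(treatment) / len(treatment)
        return [
            w * (p_treated if t == 1 else 1.0 - p_treated)
            for t, w in zip(treatment, weights)
        ]
    if weighting_scheme == "ips_normalized_weight":
        # Rescale so that weights sum to one within each treatment group.
        total_t = sum(w for t, w in zip(treatment, weights) if t == 1)
        total_c = sum(w for t, w in zip(treatment, weights) if t == 0)
        return [w / (total_t if t == 1 else total_c) for t, w in zip(treatment, weights)]
    raise ValueError(f"unknown weighting scheme: {weighting_scheme}")


def weighted_effect(
    treatment: Sequence[int], outcome: Sequence[float], weights: Sequence[float]
) -> float:
    """Difference of weighted outcome means between treated and control units."""
    def weighted_mean(group: int) -> float:
        num = sum(w * y for t, y, w in zip(treatment, outcome, weights) if t == group)
        den = sum(w for t, w in zip(treatment, weights) if t == group)
        return num / den

    return weighted_mean(1) - weighted_mean(0)


# Toy data, made up for illustration; the last score is clipped from 0.02 to 0.05.
treatment = [1, 1, 0, 0]
propensity = [0.8, 0.5, 0.5, 0.02]
outcome = [4.0, 2.0, 1.0, 0.0]

plain = ips_weights(treatment, propensity)
stabilized = ips_weights(treatment, propensity, weighting_scheme="ips_stabilized_weight")
normalized = ips_weights(treatment, propensity, weighting_scheme="ips_normalized_weight")
```

Because `weighted_effect` divides by the sum of weights within each group, the three schemes give the same estimate in this sketch; they differ only in the scale of the individual weights.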
- construct_symbolic_estimator(estimand)[source]#
A symbolic string that conveys what each estimator does. For instance, linear regression is expressed as y ~ bx + e
- estimate_effect(data: DataFrame, treatment_value: Any = 1, control_value: Any = 0, target_units=None, **_)[source]#
- fit(data: DataFrame, effect_modifier_names: List[str] | None = None)[source]#
Fits the estimator with data for effect estimation.
- Parameters:
data – data frame containing the data
treatment – name of the treatment variable
outcome – name of the outcome variable
effect_modifiers – Variables on which to compute separate effects, or return a heterogeneous effect function. Not all methods support this currently.
dowhy.causal_estimators.regression_discontinuity_estimator module#
- class dowhy.causal_estimators.regression_discontinuity_estimator.RegressionDiscontinuityEstimator(identified_estimand: IdentifiedEstimand, test_significance: bool | str = False, evaluate_effect_strength: bool = False, confidence_intervals: bool = False, num_null_simulations: int = 1000, num_simulations: int = 399, sample_size_fraction: int = 1, confidence_level: float = 0.95, need_conditional_estimates: bool | str = 'auto', num_quantiles_to_discretize_cont_cols: int = 5, rd_variable_name: str | None = None, rd_threshold_value: float | None = None, rd_bandwidth: float | None = None, **kwargs)[source]#
Bases:
CausalEstimator
Compute effect of treatment using the regression discontinuity method.
Estimates effect by transforming the problem to an instrumental variables problem.
Supports additional parameters as listed below.
- Parameters:
identified_estimand – probability expression representing the target identified estimand to estimate.
test_significance – Binary flag or a string indicating whether to test significance and by which method. All estimators support test_significance=”bootstrap” that estimates a p-value for the obtained estimate using the bootstrap method. Individual estimators can override this to support custom testing methods. The bootstrap method supports an optional parameter, num_null_simulations. If False, no testing is done. If True, significance of the estimate is tested using the custom method if available, otherwise by bootstrap.
evaluate_effect_strength – (Experimental) whether to evaluate the strength of effect
confidence_intervals – Binary flag or a string indicating whether the confidence intervals should be computed and which method should be used. All methods support estimation of confidence intervals using the bootstrap method by using the parameter confidence_intervals=”bootstrap”. The bootstrap method takes in two arguments (num_simulations and sample_size_fraction) that can be optionally specified in the params dictionary. Estimators may also override this to implement their own confidence interval method. If this parameter is False, no confidence intervals are computed. If True, confidence intervals are computed by the estimator’s specific method if available, otherwise through bootstrap
num_null_simulations – The number of simulations for testing the statistical significance of the estimator
num_simulations – The number of simulations for finding the confidence interval (and/or standard error) for an estimate
sample_size_fraction – The size of the sample for the bootstrap estimator
confidence_level – The confidence level of the confidence interval estimate
need_conditional_estimates – Boolean flag indicating whether conditional estimates should be computed. Defaults to True if there are effect modifiers in the graph
num_quantiles_to_discretize_cont_cols – The number of quantiles into which a numeric effect modifier is split, to enable estimation of conditional treatment effect over it.
rd_variable_name – Name of the variable on which the discontinuity occurs. This is the instrument.
rd_threshold_value – Threshold at which the discontinuity occurs.
rd_bandwidth – Distance from the threshold within which confounders can be considered the same between treatment and control. Considered band is (threshold +- bandwidth)
kwargs – (optional) Additional estimator-specific parameters
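To make the instrumental-variable reading of rd_variable_name, rd_threshold_value and rd_bandwidth concrete, the following sketch keeps only units inside the band, treats "above the threshold" as a binary instrument, and applies the Wald ratio. The helper `rd_wald_estimate` and the data are hypothetical, the band is assumed to be inclusive, and DoWhy's estimator may handle these details differently.

```python
from typing import List, Sequence


def rd_wald_estimate(
    rd_values: Sequence[float],
    treatment: Sequence[float],
    outcome: Sequence[float],
    rd_threshold_value: float,
    rd_bandwidth: float,
) -> float:
    """Hypothetical helper: Wald ratio using 'above threshold' as the instrument."""
    above_t: List[float] = []
    above_y: List[float] = []
    below_t: List[float] = []
    below_y: List[float] = []
    for x, t, y in zip(rd_values, treatment, outcome):
        # Keep only units inside the band threshold +- bandwidth (inclusive here).
        if abs(x - rd_threshold_value) > rd_bandwidth:
            continue
        if x > rd_threshold_value:
            above_t.append(t)
            above_y.append(y)
        else:
            below_t.append(t)
            below_y.append(y)
    if not above_t or not below_t:
        raise ValueError("need units on both sides of the threshold within the band")

    def mean(values: Sequence[float]) -> float:
        return sum(values) / len(values)

    # Jump in treatment uptake at the threshold (the first stage).
    uptake_jump = mean(above_t) - mean(below_t)
    if uptake_jump == 0:
        raise ValueError("treatment uptake does not change at the threshold")
    # Jump in the outcome divided by the jump in treatment uptake.
    return (mean(above_y) - mean(below_y)) / uptake_jump


# Toy data, made up for illustration; the first and last units fall outside the band.
rd_values = [40, 45, 48, 49, 51, 52, 55, 70]
treatment = [0, 0, 0, 1, 1, 1, 1, 1]
outcome = [3.0, 10.0, 10.0, 12.0, 12.0, 12.0, 12.0, 30.0]

effect = rd_wald_estimate(rd_values, treatment, outcome, rd_threshold_value=50, rd_bandwidth=5)
```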
- estimate_effect(data: DataFrame, treatment_value: Any = 1, control_value: Any = 0, target_units=None, **_)[source]#
- fit(data: DataFrame, effect_modifier_names: List[str] | None = None)[source]#
Fits the estimator with data for effect estimation.
- Parameters:
data – data frame containing the data
treatment – name of the treatment variable
outcome – name of the outcome variable
effect_modifiers – Variables on which to compute separate effects, or return a heterogeneous effect function. Not all methods support this currently.
dowhy.causal_estimators.regression_estimator module#
- class dowhy.causal_estimators.regression_estimator.RegressionEstimator(identified_estimand: IdentifiedEstimand, test_significance: bool | str = False, evaluate_effect_strength: bool = False, confidence_intervals: bool = False, num_null_simulations: int = 1000, num_simulations: int = 399, sample_size_fraction: int = 1, confidence_level: float = 0.95, need_conditional_estimates: bool | str = 'auto', num_quantiles_to_discretize_cont_cols: int = 5, **kwargs)[source]#
Bases:
CausalEstimator
Compute effect of treatment using some regression function.
Fits a regression model for estimating the outcome using treatment(s) and confounders.
Base class for all regression models, inherited by LinearRegressionEstimator and GeneralizedLinearModelEstimator.
- Parameters:
identified_estimand – probability expression representing the target identified estimand to estimate.
test_significance – Binary flag or a string indicating whether to test significance and by which method. All estimators support test_significance=”bootstrap” that estimates a p-value for the obtained estimate using the bootstrap method. Individual estimators can override this to support custom testing methods. The bootstrap method supports an optional parameter, num_null_simulations. If False, no testing is done. If True, significance of the estimate is tested using the custom method if available, otherwise by bootstrap.
evaluate_effect_strength – (Experimental) whether to evaluate the strength of effect
confidence_intervals – Binary flag or a string indicating whether the confidence intervals should be computed and which method should be used. All methods support estimation of confidence intervals using the bootstrap method by using the parameter confidence_intervals=”bootstrap”. The bootstrap method takes in two arguments (num_simulations and sample_size_fraction) that can be optionally specified in the params dictionary. Estimators may also override this to implement their own confidence interval method. If this parameter is False, no confidence intervals are computed. If True, confidence intervals are computed by the estimator’s specific method if available, otherwise through bootstrap
num_null_simulations – The number of simulations for testing the statistical significance of the estimator
num_simulations – The number of simulations for finding the confidence interval (and/or standard error) for an estimate
sample_size_fraction – The size of the sample for the bootstrap estimator
confidence_level – The confidence level of the confidence interval estimate
need_conditional_estimates – Boolean flag indicating whether conditional estimates should be computed. Defaults to True if there are effect modifiers in the graph
num_quantiles_to_discretize_cont_cols – The number of quantiles into which a numeric effect modifier is split, to enable estimation of conditional treatment effect over it.
kwargs – (optional) Additional estimator-specific parameters
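The roles of num_simulations, sample_size_fraction and confidence_level in a bootstrap interval can be sketched as follows. This uses the percentile bootstrap, one common variant, and is not necessarily how DoWhy constructs its intervals; the function `bootstrap_confidence_interval`, its `statistic` and `seed` arguments, and the data are hypothetical.

```python
import random
from typing import Callable, Sequence, Tuple


def bootstrap_confidence_interval(
    sample: Sequence[float],
    statistic: Callable[[Sequence[float]], float],
    num_simulations: int = 399,
    sample_size_fraction: float = 1,
    confidence_level: float = 0.95,
    seed: int = 0,
) -> Tuple[float, float]:
    """Hypothetical helper: percentile bootstrap interval for a statistic."""
    rng = random.Random(seed)
    # Each resample draws this many observations with replacement.
    size = max(1, int(len(sample) * sample_size_fraction))
    estimates = sorted(
        statistic(rng.choices(sample, k=size)) for _ in range(num_simulations)
    )
    # Read the interval off the sorted bootstrap estimates.
    alpha = 1.0 - confidence_level
    lower = estimates[int((alpha / 2) * (num_simulations - 1))]
    upper = estimates[int((1 - alpha / 2) * (num_simulations - 1))]
    return lower, upper


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


# Toy per-unit effect estimates, made up for illustration.
unit_effects = [1.5, 2.0, 2.5, 1.0, 3.0, 2.0, 2.5, 1.5]
lower, upper = bootstrap_confidence_interval(unit_effects, mean)
```

A larger num_simulations gives a smoother set of bootstrap estimates at the cost of run time, while a sample_size_fraction below 1 makes each resample smaller than the original data.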
- estimate_effect(data: DataFrame, treatment_value: Any = 1, control_value: Any = 0, target_units=None, need_conditional_estimates=None, **_)[source]#
- fit(data: DataFrame, effect_modifier_names: List[str] | None = None)[source]#
Fits the estimator with data for effect estimation.
- Parameters:
data – data frame containing the data
treatment – name of the treatment variable
outcome – name of the outcome variable
effect_modifiers – Variables on which to compute separate effects, or return a heterogeneous effect function. Not all methods support this currently.
- interventional_outcomes(data_df: DataFrame, treatment_val)[source]#
Applies an intervention treatment_val to all rows in data_df, then uses self.model to predict outcomes. If data_df is None, will use self._data instead. If no model exists, one will be created. The outcomes of all samples are returned, allowing analysis of individual predictions in counterfactual treatment scenarios.
- Parameters:
data_df – data frame containing the data
treatment_val – value for the treatment variable
- Returns:
A list of outcome predictions.
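The intervene-then-predict step described for interventional_outcomes can be sketched without the library. Everything here is a hypothetical stand-in: `LinearOutcomeModel` plays the role of an already fitted regression model with made-up coefficients, rows are plain dictionaries with a treatment `t` and a confounder `w`, and the free function below only mirrors the idea, not DoWhy's method.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class LinearOutcomeModel:
    """Hypothetical stand-in for a fitted regression model."""

    intercept: float
    treatment_coef: float
    confounder_coef: float

    def predict(self, rows: Sequence[Dict[str, float]]) -> List[float]:
        return [
            self.intercept + self.treatment_coef * r["t"] + self.confounder_coef * r["w"]
            for r in rows
        ]


def interventional_outcomes(
    rows: Sequence[Dict[str, float]], treatment_val: float, model: LinearOutcomeModel
) -> List[float]:
    """Set the treatment to treatment_val in every row, then predict outcomes."""
    # Copy each row so that the observed data is left untouched.
    intervened = [{**row, "t": treatment_val} for row in rows]
    return model.predict(intervened)


# Toy rows and coefficients, made up for illustration.
rows = [{"t": 0, "w": 1.0}, {"t": 1, "w": 2.0}, {"t": 1, "w": 0.5}]
model = LinearOutcomeModel(intercept=1.0, treatment_coef=2.0, confounder_coef=0.5)

treated = interventional_outcomes(rows, 1, model)
control = interventional_outcomes(rows, 0, model)
average_effect = sum(a - b for a, b in zip(treated, control)) / len(rows)
```

Comparing the two prediction lists row by row gives per-unit counterfactual contrasts; their mean is the average effect implied by the model.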
dowhy.causal_estimators.two_stage_regression_estimator module#
- class dowhy.causal_estimators.two_stage_regression_estimator.TwoStageRegressionEstimator(identified_estimand: IdentifiedEstimand, test_significance: bool | str = False, evaluate_effect_strength: bool = False, confidence_intervals: bool = False, num_null_simulations: int = 1000, num_simulations: int = 399, sample_size_fraction: int = 1, confidence_level: float = 0.95, need_conditional_estimates: bool | str = 'auto', num_quantiles_to_discretize_cont_cols: int = 5, first_stage_model: CausalEstimator | Type[CausalEstimator] | None = None, second_stage_model: CausalEstimator | Type[CausalEstimator] | None = None, **kwargs)[source]#
Bases:
CausalEstimator
Compute treatment effect whenever the effect is fully mediated by another variable (front-door) or when there is an instrument available.
Currently only supports a linear model for the effects.
Supports additional parameters as listed below.
- Parameters:
identified_estimand – probability expression representing the target identified estimand to estimate.
test_significance – Binary flag or a string indicating whether to test significance and by which method. All estimators support test_significance=”bootstrap” that estimates a p-value for the obtained estimate using the bootstrap method. Individual estimators can override this to support custom testing methods. The bootstrap method supports an optional parameter, num_null_simulations. If False, no testing is done. If True, significance of the estimate is tested using the custom method if available, otherwise by bootstrap.
evaluate_effect_strength – (Experimental) whether to evaluate the strength of effect
confidence_intervals – Binary flag or a string indicating whether the confidence intervals should be computed and which method should be used. All methods support estimation of confidence intervals using the bootstrap method by using the parameter confidence_intervals=”bootstrap”. The bootstrap method takes in two arguments (num_simulations and sample_size_fraction) that can be optionally specified in the params dictionary. Estimators may also override this to implement their own confidence interval method. If this parameter is False, no confidence intervals are computed. If True, confidence intervals are computed by the estimator’s specific method if available, otherwise through bootstrap
num_null_simulations – The number of simulations for testing the statistical significance of the estimator
num_simulations – The number of simulations for finding the confidence interval (and/or standard error) for an estimate
sample_size_fraction – The size of the sample for the bootstrap estimator
confidence_level – The confidence level of the confidence interval estimate
need_conditional_estimates – Boolean flag indicating whether conditional estimates should be computed. Defaults to True if there are effect modifiers in the graph
num_quantiles_to_discretize_cont_cols – The number of quantiles into which a numeric effect modifier is split, to enable estimation of conditional treatment effect over it.
first_stage_model – First stage estimator to be used. Default is linear regression.
second_stage_model – Second stage estimator to be used. Default is linear regression.
kwargs – (optional) Additional estimator-specific parameters
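For the front-door case with linear models, the two stages can be sketched as two simple regressions whose coefficients are multiplied. This is a minimal illustration under stated assumptions, not the library's implementation: the helpers `slope`, `residuals` and `front_door_effect` are hypothetical, there is a single mediator, and the second stage adjusts for the treatment by partialling it out of the mediator.

```python
from typing import List, Sequence


def slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Ordinary least squares slope of y on x (with an intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)


def residuals(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Residuals of y after regressing it on x (with an intercept)."""
    b = slope(x, y)
    a = sum(y) / len(y) - b * sum(x) / len(x)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def front_door_effect(
    treatment: Sequence[float], mediator: Sequence[float], outcome: Sequence[float]
) -> float:
    """Hypothetical helper: product of first- and second-stage coefficients."""
    # First stage: effect of the treatment on the mediator.
    first_stage = slope(treatment, mediator)
    # Second stage: effect of the mediator on the outcome, holding the treatment
    # fixed. Partialling the treatment out of the mediator gives the same
    # coefficient as a regression of the outcome on mediator and treatment.
    second_stage = slope(residuals(treatment, mediator), outcome)
    return first_stage * second_stage


# Toy data, made up so that mediator = 2*t + noise and outcome = 3*mediator + 5*t.
treatment = [0.0, 0.0, 1.0, 1.0]
mediator = [-1.0, 1.0, 1.0, 3.0]
outcome = [-3.0, 3.0, 8.0, 14.0]

effect = front_door_effect(treatment, mediator, outcome)
```

The term 5*t in the toy outcome stands in for a confounding path between treatment and outcome, which is why the second stage must hold the treatment fixed.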
- DEFAULT_FIRST_STAGE_MODEL#
alias of
LinearRegressionEstimator
- DEFAULT_SECOND_STAGE_MODEL#
alias of
LinearRegressionEstimator
- construct_symbolic_estimator(first_stage_symbolic, second_stage_symbolic, total_effect_symbolic=None, estimand_type=None)[source]#
- estimate_effect(data: DataFrame, treatment_value: Any = 1, control_value: Any = 0, target_units=None, **_)[source]#
- fit(data: DataFrame, effect_modifier_names: List[str] | None = None, **_)[source]#
Fits the estimator with data for effect estimation.
- Parameters:
data – data frame containing the data
treatment – name of the treatment variable
outcome – name of the outcome variable
iv_instrument_name – Name of the specific instrumental variable to be used. Needs to be one of the IVs identified in the identification step. Default is to use all the IV variables from the identification step.
effect_modifiers – Variables on which to compute separate effects, or return a heterogeneous effect function. Not all methods support this currently.