diff --git a/.github/ISSUE_TEMPLATE/task-template.md b/.github/ISSUE_TEMPLATE/task-template.md
new file mode 100644
index 0000000..805c6f6
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE/task-template.md
@@ -0,0 +1,10 @@
+---
+name: Task template
+about: Tasks related to launching and organizing the KDD 2023 workshop
+title: "[Task] "
+labels: ''
+assignees: ''
+
+---
+
+* Deadline:
diff --git a/.github/ISSUE_TEMPLATE/upgrade.md b/.github/ISSUE_TEMPLATE/upgrade.md
index f854fcb..b60dfa8 100644
--- a/.github/ISSUE_TEMPLATE/upgrade.md
+++ b/.github/ISSUE_TEMPLATE/upgrade.md
@@ -1,8 +1,8 @@
 ---
 name: "[fastpages] Automated Upgrade"
-about: "Trigger a PR for upgrading fastpages"
+about: Trigger a PR for upgrading fastpages
 title: "[fastpages] Automated Upgrade"
-labels: fastpages-automation
+labels: ''
 assignees: ''
 
 ---
diff --git a/.gitignore b/.gitignore
index 6cdd784..f0c1dce 100644
--- a/.gitignore
+++ b/.gitignore
@@ -11,4 +11,5 @@ _notebooks/.ipynb_checkpoints
 # Local Netlify folder
 .netlify
 .tweet-cache
+.DS_Store
 __pycache__
diff --git a/README.md b/README.md
index 4af745a..f8a9359 100755
--- a/README.md
+++ b/README.md
@@ -1,120 +1,193 @@
 [//]: # (This template replaces README.md when someone creates a new repo with the fastpages template.)
-![](https://github.com/causal-machine-learning/kdd2021-tutorial/workflows/CI/badge.svg) -![](https://github.com/causal-machine-learning/kdd2021-tutorial/workflows/GH-Pages%20Status/badge.svg) +![](https://github.com/causal-machine-learning/kdd2023-workshop/workflows/CI/badge.svg) +![](https://github.com/causal-machine-learning/kdd2023-workshop/workflows/GH-Pages%20Status/badge.svg) [![](https://img.shields.io/static/v1?label=fastai&message=fastpages&color=57aeac&labelColor=black&style=flat&logo=)](https://github.com/fastai/fastpages) -# **Causal Inference and Machine Learning in Practice with EconML and CausalML**: Industrial Use Cases at Microsoft, TripAdvisor, Uber +--- +layout: home +search_exclude: true +image: images/logo.png +--- -## **Schedule** +# **Causal Inference and Machine Learning in Practice**: Use cases for Product, Brand, Policy and Beyond -### Time +## **Schedule** -* 4:00 AM - 7:00 AM August 15, 2021 [SGT](https://www.timeanddate.com/worldclock/converter.html?iso=20210814T200000&p1=236&p2=tz_pt&p3=tz_et) -* 4:00 PM - 7:00 PM August 14, 2021 [EDT](https://www.timeanddate.com/worldclock/converter.html?iso=20210814T200000&p1=236&p2=tz_pt&p3=tz_et) -* 1:00 PM - 4:00 PM August 14, 2021 [PDT](https://www.timeanddate.com/worldclock/converter.html?iso=20210814T200000&p1=236&p2=tz_pt&p3=tz_et) +* Long Beach Convention & Entertainment Center, 300 E Ocean Blvd, Long Beach, CA 90802 +([Map](https://goo.gl/maps/1N3XGEovGgJqXAV98)) +* Date: August 7, 2023 (Monday) +* Time: 1:00 - 5:00 PM Pacific Time -### Live Zoom Link +## **Abstract** -To be shared within the KDD 21 Virtual Platform during the conference. +The increasing demand for data-driven decision-making has led to the rapid growth of machine learning applications in +various industries. However, the ability to draw causal inferences from observational data remains a crucial challenge. 
+In recent years, causal inference has emerged as a powerful tool for understanding the effects of interventions in +complex systems. Combining causal inference with machine learning has the potential to provide a deeper understanding of +the underlying mechanisms and to develop more effective solutions to real-world problems. -## **Abstract** +This workshop aims to bring together researchers and practitioners from academia and industry to share their experiences +and insights on applying causal inference and machine learning techniques to real-world problems in the areas of +product, brand, policy, and beyond. The workshop welcomes original research that covers machine learning theory, deep +learning, causal inference, and online learning. Additionally, the workshop encourages topics that address scalable +system design, algorithm bias, and interpretability. -In recent years, both academic research and industry applications see an increased effort in using machine learning methods to measure granular causal effects and design optimal policies based on these causal estimates. Open source packages such as [CausalML](https://github.com/uber/causalml) and [EconML](https://github.com/microsoft/econml) provide a unified interface for applied researchers and industry practitioners with a variety of machine learning methods for causal inference. The tutorial will cover the topics including conditional treatment effect estimators by meta-learners and tree-based algorithms, model validations and sensitivity analysis, optimization algorithms including policy leaner and cost optimization. In addition, the tutorial will demonstrate the production of these algorithms in industry use cases. +Through keynote talks, panel discussions, and contributed talks and posters, the workshop will provide a forum for +discussing the latest advances and challenges in applying causal inference and machine learning to real-world problems. 
+The workshop will also offer opportunities for networking and collaboration among researchers and practitioners working +in industry, government, and academia. -## **Target Audience and Prerequisites for the Tutorial** +## **Paper Submission** -Anyone who is interested in causal inference and machine learning, especially economists/statisticians/data scientists who want to learn how to combine causal inference and machine learning with real industry use cases incorporated in large scaled machine learning systems at companies such as Microsoft, TripAdvisor and Uber. -The tutorial assumes some basic knowledge in statistical methods, machine learning algorithms and the Python programming language. +Please submit your paper to the [CMT portal](https://cmt3.research.microsoft.com/CMLKDD2023) site, and check the [Call for +Paper](https://causal-machine-learning.github.io/kdd2023-workshop/cfp/) page for details on important dates and +submission guidelines. ## **Outline** -| **Title** | **Duration** | Slides | Code | -|-----------|--------------|--------|------| -| **Introduction to Causal Inference** | 20 minutes | [Slides](https://drive.google.com/file/d/1O1oVU3nX7ThzCrUxlFK-OJsxF3Bz8Khl/view?usp=sharing) | | -| **Case Studies Part 1 by CausalML** | | | | -| Introduction to CausalML| 15 minutes | [Slides](https://docs.google.com/presentation/d/1D-cqqwKyWseVoNQEH-d0TS9wqK55l3bSn3PaLFxNEWI/edit?usp=sharing) | | -| Case Study #1: Causal Impact Analysis with Observational Data: CeViChE at Uber | 30 minutes | [Slides](https://docs.google.com/presentation/d/1FvRtis2fm4c2R7XmRKWMTtZaZjUObW1fGxpNmapmjKI/edit?usp=sharing)| [Notebook](https://colab.research.google.com/drive/1ySwg9BIYWS5oLQ5haorMyiIbyiCJ431J?usp=sharing) | -| Case Study #2: Targeting Optimization: Bidder at Uber | 30 minutes |[Slides](https://drive.google.com/file/d/1QJJUCo4LH5kGQP3kaJlG1RdhjhaJWp-5/view?usp=sharing) 
|[Notebook](https://colab.research.google.com/drive/1fnZEHIAcNxrvSxFrlO1hRTHO7sazXbo0?usp=sharing) | -| **Case Studies Part 2 by EconML** | | | | -| Introduction to EconML| 15 minutes | [Slides](https://drive.google.com/file/d/1gt4KNznrYbwdryi9jGcC0-hDCNg7mBNE/view?usp=sharing) | [Notebook](https://colab.research.google.com/drive/1m2Ob7dc1JalEb6FIzSG1tx0qW491-YNc?usp=sharing) | -| Case Study #3: Customer Segmentation at TripAdvisor with Recommendation A/B Tests | 30 minutes | [Slides](https://drive.google.com/file/d/1yyIu_3epIVXbwzJj658Iv4vxHGjtPh8n/view?usp=sharing) | [Notebook](https://colab.research.google.com/drive/1nUhkLVpanv-gm_oA7FbValhpDpEs02wR#scrollTo=qk4_f4tx5gZz) | -| Case Study #4: Long-Term Return-on-Investment at Microsoft via Short-Term Proxies | 30 minutes | [Slides](https://drive.google.com/file/d/1FEKXFHHATntHjsEymXnEw6GAiUGMm8sG/view?usp=sharing) | [Notebook](https://colab.research.google.com/drive/1Ow7ArXRn1NJq47OLvchi26RRTdm94yv8?usp=sharing) | +| **Title** | **Speaker** | **Time (Duration)** | Link | +|-----------|-------------|--------------|------| +| **Introduction** | Organizers | 1:00 - 1:10 PM (10 minutes) | | +| **Invited Talk:** COG: Creative Optimality Gap for Video Advertising | [Raif Rustamov](#raif-rustamov-amazon) (Amazon) | 1:10 - 1:30 PM (20 minutes) | [Slides](https://drive.google.com/file/d/1ehHzNj-EDlhpQzCOhlJ9oE7Lnmf2Fpmc/view?usp=sharing)| +| **Invited Talk:** The Value of Last-Mile Delivery in Online Retail | [Ruomeng Cui](#ruomeng-cui-emory-university) (Emory) | 1:30 - 1:50 PM (20 minutes) | [Slides](https://drive.google.com/file/d/19w3ay80K1xBqceZj6DgBtyJgPx2OrYe6/view?usp=drive_link)| +| Leveraging Causal Uplift Modeling for Budget Constrained Benefits Allocation | Dmitri Goldenberg, Javier Albert (Booking.com) | 1:50 - 2:05 PM (15 minutes) | [Slides](https://docs.google.com/presentation/d/1Fz720lBj8DDsviLYNAIyBLVE7TN8qFoVtxPzbfAvo_I/edit?usp=drive_link)| +| Ensemble Method for Estimating Individualized Treatment 
Effects | Kevin Wu Han, Han Wu (Stanford) | 2:05 - 2:20 PM (15 minutes) | [Slides](https://drive.google.com/file/d/1KLAFpKYw5mlJQ7cNyLamS1LkuGScABSb/view?usp=sharing), [Paper](https://drive.google.com/file/d/1dzsIGgQ4cF2ltetH16A4Zz-L-pZ6tHBK/view?usp=drive_link) | +| A Scalable and Debiased Approach to Dynamic Pricing with Causal Machine Learning and Optimization | Nicolò Cosimo Albanese, Fabian Furrer, Marco Guerriero (Amazon AWS) | 2:20 - 2:35 PM (15 minutes) | [Slides](https://drive.google.com/file/d/1uSqpV51qxNcEyCrgzhYo2nr44tfn717z/view?usp=drive_link), [Paper](https://drive.google.com/file/d/1VlW-zgrCfaKi5CtGYkQbkhWw7JVXhPtw/view?usp=drive_link) | +| An IPW-based Unbiased Ranking Metric in Two-sided Markets | Keisho Oh, Naoki Nishimura (Recruit Co), Minje Sung, Ken Kobayashi, Kazuhide Nakata (Tokyo Institute of Technology) | 2:35 - 2:50 PM (15 minutes) | [Slides](https://drive.google.com/file/d/1XLQCMUNy79jmYb0y1AB7ImRflq4csh6Y/view?usp=drive_link), [Paper](https://drive.google.com/file/d/1Bshr6dwFB-E2H2K64g9OfERNUQcfP0eb/view?usp=drive_link)| +| **Break & Poster Session** | | 3:00 - 3:30 PM (30 minutes) | | +| **Invited Talk:** Unit Selection Based on Counterfactual Logic | [Ang Li](#ang-li-university-of-california-los-angeles) (UCLA) | 3:30 - 3:50 PM (20 minutes) | [Slides](https://drive.google.com/file/d/1P-it4MNrYbnWNUgodagU69oubVtpo_WV/view?usp=drive_link)| +| **Invited Talk:** Towards Automating the Causal Machine Learning Pipeline | [Vasilis Syrgkanis](#vasilis-syrgkanis-stanford-universityeconml) (Stanford/EconML) | 3:50 - 4:10 PM (20 minutes) | [Slides](https://www.dropbox.com/scl/fi/w2p1cnghhqp1qc377o4yu/auto_debiased.pptx?rlkey=8k4bdcrastqmzlao8n5bng258&dl=0)| +| Power and Pre-treatment Fit: Optimizing Synthetic Control Method for Quasi-experiments | Ali O Polat (Shipt) | 4:10 - 4:25 PM (15 minutes) | [Slides](https://drive.google.com/file/d/1F9bCmoCYg7AeNxKXekCze7a4ELXs2Ohv/view?usp=drive_link), 
[Paper](https://drive.google.com/file/d/1rDsCmwl23HELiD-P_Qq9TUZZ_3opbEji/view?usp=drive_link)| +| Dynamic Causal Structure Discovery and Causal Effect Estimation | Jianian Wang, Rui Song (NCSU) | 4:25 - 4:40 PM (15 minutes) | [Slides](https://drive.google.com/file/d/13MzjyZLGxfGNoAg3TMRF0Es-TqtnRVSH/view?usp=drive_link), [Paper](https://drive.google.com/file/d/1928QcX3PdFan_gkeJl-mxEgl7kQ9KupN/view?usp=drive_link)| +| Hierarchical Clustering As a Novel Solution to the Notorious Multicollinearity Problem in Observational Causal Inference | Yufei Wu, Zhiying Gu, Alex Deng, Jacob Zhu (Airbnb) | 4:40 - 4:55 PM (15 minutes) | [Slides](https://drive.google.com/file/d/1r0xXFFflSVDFNtGEypUGh845CeycF51q/view?usp=drive_link), [Paper](https://drive.google.com/file/d/1HQekrFF1vNNrO7TF4aX43850-Dc6UHgo/view?usp=drive_link)| + +## **Invited Speakers** + +### Raif Rustamov, Amazon + +#### Title: COG: Creative Optimality Gap for Video Advertising + +#### Bio + +Raif Rustamov is a Senior Applied Scientist at Amazon where he focuses on brand advertising science including relevance +modeling, representation learning, and causal inference. He previously worked as a Principal Inventive Scientist in AI +and Data Science at AT&T Labs conducting research on recommender systems, customer segmentation, identity for +cross-device advertising, and location analytics. Raif has a PhD in Applied and Computational Mathematics from Princeton +University and has taught at Purdue and Drew Universities, as well as worked as a research associate at Stanford +University. + +#### Abstract -## **Presentation Abstracts** +Video creatives play a crucial role in shaping consumer experiences and brand perceptions, but quantifying their impact +on shopper experience remains a complex challenge. In this talk, we introduce the Creative Optimality Gap (COG), +a metric developed to assess the relative optimality of video creatives using causal-inferential machine learning +methodology. 
Our main contributions include the development of the COG metric through the use of conditional individual +treatment effects projected on interpretable video features, the introduction of a meta-learner for its computation, +and the incorporation of model uncertainty to avoid false positives. Our work advances the understanding of video +creative effectiveness and provides a valuable tool for optimizing ad performance. -### Introduction to Causal Inference +### Ruomeng Cui, Emory University -We will give an overview of basic concepts in causal inference. A quick refresher on the main tools and terminology of causal inference: correlation vs causation, average, conditional, and individual treatment effects, causal inference via randomization, Causal inference using instrumental variables, Causal inference via unconfoundedness. +#### Title: The Value of Last-Mile Delivery in Online Retail -### Introduction to CasualML +#### Bio -We will provide an overview of CausalML, an open source Python package that provides a suite of uplift modeling and causal inference methods using machine learning algorithms based on recent research. We will introduce the main components of CausalML: (1) inference with causal machine learning algorithms (e.g. meta-learners, uplift trees, CEVAE, dragonnet), (2) validation/analysis methods (e.g. synthetic data generation, AUUC, sensitivity analysis, interpretability), (3) optimization methods (e.g. policy optimization, value optimization, unit selection). +Ruomeng Cui is an Associate Professor of Operations Management at the Goizueta Business School, Emory University (on leave). She currently is a full-time Amazon Visiting Academic at Amazon, working in the supply chain domain. Her research focuses on causal inference, machine learning and data-driven modeling, with applications in retail, supply chains, and platforms. 
She currently serves as an associate editor for Manufacturing & Service Operations Management and Production and Operations Management. She received her Ph.D. in Operations Management from the Kellogg School of Management, Northwestern University and B.Sc in Industrial Engineering from Tsinghua University.
-### Case #1: Causal Impact Analysis with Observational Data at Uber
+#### Abstract
-As an introductory case study for using causal inference, we will cover the use case of understanding the causal impact from observational data in the context of cross sell at Uber. We emphasize that simple comparisons of users who make cross purchase or not will produce biased estimates and that can be demonstrated in the causal inference framework. We show the use of different causal estimation methodologies through propensity score matching and meta learners to estimate the causal impact. In addition, we will use sensitivity analysis to show the robustness of the estimates.
+Last-mile delivery has become increasingly important in the online retail industry. In this study, we examine the economic
+value of last-mile delivery. To do so, we conducted a quasi-experiment in collaboration with Cainiao, Alibaba's
+logistics subsidiary, where home delivery was launched at some pickup stations in 2021. This allowed us to
+comprehensively evaluate the causal impact of last-mile delivery. Using a difference-in-differences identification
+method, we found that last-mile delivery significantly increases sales and customer spending on the retail platform. To
+optimally prioritize limited delivery capacity, we employed causal machine learning to target the most responsive
+customers. Our findings suggest that online retailers should carefully weigh the costs and benefits of last-mile
+delivery and tailor their logistics strategies accordingly.
-### Case #2: Targeting Optimization: Bidder at Uber +### Ang Li, University of California, Los Angeles -We will introduce the audience selection method with uplift modeling in online RTB, which aims to estimate heterogeneous treatment effects for advertising. It has been studied to provide a superior return on investment by selecting the most incremental users for a specific campaign. To examine the effectiveness of uplift modeling in the context of real-time bidding, we conducted the comparative analysis of four different meta-learners on real campaign data. We adapted an explore-exploit set up for offline training and online evaluation. We will also introduce how we use Targeted Maximum Likelihood Estimation (TMLE) based Average Treatment Effect (ATE) as ground truth for evaluation. +#### Title: Unit Selection Based on Counterfactual Logic -### Introduction to EconML +#### Bio -We will provide an overview of recent methodologies that combine machine learning with causal inference and the significant statistical power that machine learning brings to causal inference estimation methods. We will outline the structure and capabilities of the EconML package and describe some of the key causal machine learning methodologies that are implemented (e.g. double machine learning, causal forests, deepiv, doubly robust learning, dynamic double machine learning). We will also outline approaches to confidence interval construction (e.g. bootstrap, bootstrap-of-little-bags, debiased lasso), interpretability (shap values, tree interpreters) and policy learning (doubly robust policy learning). +Dr. Li is set to join the Florida State University Department of Computer Science as an assistant professor in August. +He is currently a post-doctoral researcher in the Department of Computer Science at UCLA under the guidance of Prof. +Judea Pearl. 
His primary research areas are causal inference, artificial intelligence, and causality-based
+decision-making, with a focus on building causal models that estimate treatment effects (interventions) and evaluating
+what would have happened if an individual had taken a treatment (counterfactuals). He is also interested in
+decision-making modeling using knowledge of treatment effects and counterfactuals. Prior to his post-doc, Dr. Li
+obtained his Ph.D. at UCLA with Prof. Judea Pearl and his M.S. degree at the University of Minnesota Twin Cities.
-### Case #3: Customer Segmentation at TripAdvisor with Recommendation A/B Tests
+#### Abstract
-We examine the scenario in which we wish to learn heterogeneous treatment effects (CATE), but observational data is biased and direct experimental data (e.g. A/B test) is plagued by imperfect compliance. In this setup, TripAdvisor would like to know whether joining a membership program compels users to spend more time engaging with the website and purchasing more products. The usual approach, a direct A/B test, is infeasible: the website cannot force users to comply and become members, hence the imperfect compliance that can bias calculations. The solution is to use an alternative A/B test that was originally designed to measure whether an easier sign-up process would promote user membership. This A/B test plays the role of an instrument that nudges users to sign up for membership. We introduce EconML’s IntentToTreatDRIV estimator which can leverage this repurposed A/B test to both learn the effect of membership on user engagement and understand how these effects vary with customer features. We show how this novel methodology led to extracting key business insights and helped TripAdvisor understand and differentiate how customers engage with their platform.
+The unit selection problem aims to identify a set of individuals who are most likely to
+exhibit a desired mode of behavior, which is defined in counterfactual terms.
A typical +example is that of selecting individuals who would respond one way if encouraged and a +different way if not encouraged. Unlike previous works on this problem, which rely on ad-hoc +heuristics, we approach this problem formally, using counterfactual logic, to properly capture +the nature of the desired behavior. This formalism enables us to derive an informative +selection criterion which integrates experimental and observational data. We show that a +more accurate selection criterion can be achieved when structural information is available +in the form of a causal diagram. We further discuss data availability issue regarding the +derivation of the selection criterion without the observational or experimental data. We +demonstrate the superiority of this criterion over A/B-test-based approaches. -### Case #4: Long-Term Return-on-Investment at Microsoft via Short-Term Proxies +### Vasilis Syrgkanis, Stanford University/EconML -In this case study, we talk about using observational data to measure the long term Return-on-Investment of some types of dollar value investments Microsoft gives to the enterprise customers. There are many challenges for this setting, for instance, we don't have enough period of data to identify a long term ROI, we should control the effect coming from the future investment and we are in a high dimensional data space. We then propose a surrogate based approach assuming the long-term effect is channeled through some short-term proxies and employ a dynamic adjustment to the surrogate model in order to get rid of the effect from future investment, finally apply double machine learning (DML) techniques to estimate the ROI. We apply this methodology to answer the questions like what is the average long-run ROI on each type of the investment? What types of customers have a higher ROI to a specific investment? And how different incentives impact the different solution areas. 
Finally we will showcase how you could use EconML to solve similar problems by only a few lines of code. +#### Title: Towards Automating the Causal Machine Learning Pipeline +#### Bio -## **Tutors** +Vasilis Syrgkanis is an Assistant Professor in Management Science and Engineering and (by courtesy) in Computer Science, +in the School of Engineering at Stanford University. His research interests are in the areas of machine learning, causal +inference, econometrics, online and reinforcement learning, game theory/mechanism design and algorithm design. Until +August 2022, he was a Principal Researcher at Microsoft Research, New England, where he was a member of the EconCS and +StatsML groups. During his time at Microsoft, he co-led the project on Automated Learning and Intelligence for Causation +and Economics (ALICE) and was a co-founder of EconML, an open-source python package for causal machine learning. He +received his Ph.D. in Computer Science from Cornell University. + +#### Abstract -### Presenters +## **Accepted Papers** + +### For Oral Presentation +1. Leveraging Causal Uplift Modeling for Budget Constrained Benefits Allocation, Dmitri Goldenberg (Booking.com)*; Javier Albert (Booking.com); [Slides](https://docs.google.com/presentation/d/1Fz720lBj8DDsviLYNAIyBLVE7TN8qFoVtxPzbfAvo_I/edit?usp=drive_link) +2. Ensemble Method for Estimating Individualized Treatment Effects, Kevin Wu Han (Stanford University)*; Han Wu (Stanford University); [Slides](https://drive.google.com/file/d/1KLAFpKYw5mlJQ7cNyLamS1LkuGScABSb/view?usp=sharing), [Paper](https://drive.google.com/file/d/1dzsIGgQ4cF2ltetH16A4Zz-L-pZ6tHBK/view?usp=drive_link) +3. 
A Scalable and Debiased Approach to Dynamic Pricing with Causal Machine Learning and Optimization, Nicolò Cosimo Albanese (AWS)*; Fabian Furrer (AWS); Marco Guerriero (AWS); [Slides](https://drive.google.com/file/d/1uSqpV51qxNcEyCrgzhYo2nr44tfn717z/view?usp=drive_link), [Paper](https://drive.google.com/file/d/1VlW-zgrCfaKi5CtGYkQbkhWw7JVXhPtw/view?usp=drive_link)
+4. An IPW-based Unbiased Ranking Metric in Two-sided Markets, Keisho Oh (Recruit Co., Ltd.)*; Naoki Nishimura (Recruit Co., Ltd.); Minje Sung (Tokyo Institute of Technology); Ken Kobayashi (Tokyo Institute of Technology); Kazuhide Nakata (Department of Industrial Engineering and Economics, Tokyo Institute of Technology); [Slides](https://drive.google.com/file/d/1XLQCMUNy79jmYb0y1AB7ImRflq4csh6Y/view?usp=drive_link), [Paper](https://drive.google.com/file/d/1Bshr6dwFB-E2H2K64g9OfERNUQcfP0eb/view?usp=drive_link)
+5. Power and Pre-treatment Fit: Optimizing Synthetic Control Method for Quasi-experiments, Ali O Polat (Shipt Inc.)*; [Slides](https://drive.google.com/file/d/1F9bCmoCYg7AeNxKXekCze7a4ELXs2Ohv/view?usp=drive_link), [Paper](https://drive.google.com/file/d/1rDsCmwl23HELiD-P_Qq9TUZZ_3opbEji/view?usp=drive_link)
+6. Dynamic Causal Structure Discovery and Causal Effect Estimation, Jianian Wang (North Carolina State University)*; Rui Song (North Carolina State University); [Slides](https://drive.google.com/file/d/13MzjyZLGxfGNoAg3TMRF0Es-TqtnRVSH/view?usp=drive_link), [Paper](https://drive.google.com/file/d/1928QcX3PdFan_gkeJl-mxEgl7kQ9KupN/view?usp=drive_link)
+7. Hierarchical Clustering As a Novel Solution to the Notorious Multicollinearity Problem in Observational Causal Inference, Yufei Wu (Airbnb)*; Zhiying Gu (Airbnb); Alex Deng (Airbnb); Jacob Zhu (Airbnb); [Slides](https://drive.google.com/file/d/1r0xXFFflSVDFNtGEypUGh845CeycF51q/view?usp=drive_link), [Paper](https://drive.google.com/file/d/1HQekrFF1vNNrO7TF4aX43850-Dc6UHgo/view?usp=drive_link)
+
+### For Poster Presentation
+8.
Community Detection-Enhanced Causal Structural Learning, Yuhe Gao (North Carolina State University)*; Hengrui Cai (University of California Irvine); Sheng Zhang (North Carolina State University); Rui Song (North Carolina State University); [Poster](https://drive.google.com/file/d/1dC_WLUhOJletn-kDJd1GPUJDyHwpMblG/view?usp=drive_link), [Paper](https://drive.google.com/file/d/11vnSFYDQZ1ZjztaM_eW7J9jd5tKH7y6Y/view?usp=drive_link)
+9. ACE: Active Learning for Causal Inference with Expensive Experiments, Difan Song (Georgia Institute of Technology)*; Simon Mak (Duke University); C.F. Jeff Wu (Georgia Institute of Technology); [Paper](https://drive.google.com/file/d/1-fCLcT4RYAHJlsSBUuuAwsuqHDf9B4Qt/view?usp=drive_link)
+10. Evaluate the Impact of Similar Products Ad Group Recommendations with Causal Inference, Jamie Chen (Amazon)*; Zuqi Shang (Amazon); Raif Rustamov (Amazon); [Poster](https://drive.google.com/file/d/13UTLTuJKBA5hIFZ0g1HolLHQpAXhqcX4/view?usp=drive_link), [Paper](https://drive.google.com/file/d/1ahjN_fNvDjmdwD0btW4UvgxaKgCB32kP/view?usp=drive_link)
+11. Machine Learning based Framework for Robust Price-Sensitivity Estimation with Application to Airline Pricing, Shahin Boluki (Pros Inc)*; Ravi Kumar (PROS); [Poster](https://drive.google.com/file/d/1AcRtI5NtovYWMEOLnQ7dE_IRgyHVukqf/view?usp=drive_link), [Paper](https://drive.google.com/file/d/1n5mL1quGS5jWVZXXqUrL349CJH-I2BOW/view?usp=drive_link)
+12.
OpportunityFinder: A Framework for Automated Causal Inference, Huy Nguyen (Amazon)*; Prince Grover (Amazon); Devashish Khatwani (Amazon); [Poster](https://drive.google.com/file/d/1MH_vV5MDafqAOVLnQRIgmG1IFGXZx05r/view?usp=drive_link), [Paper](https://drive.google.com/file/d/1_yAoohM0jG0uPi7om9_6igVQJI1JM-Ce/view?usp=drive_link) -* Jing Pan, Uber, CausalML -* Yifeng Wu, Uber, CausalML -* Huigang Chen, Facebook, CausalML -* Totte Harinen, Toyota Research Institute, CausalML -* Paul Lo, Uber, CausalML -* Greg Lewis, Microsoft Research, EconML -* Vasilis Syrgkanis, Microsoft Research, EconML -* Miruna Oprescu, Microsoft Research, EconML -* Maggie Hei, Microsoft Research, EconML -### Contributors +## **Organizers** -* Jeong-Yoon Lee, Netflix Research, CausalML +* Chu Wang, Amazon +* Yingfei Wang, University of Washington +* Xinwei Ma, UC San Diego +* [Zeyu Zheng](mailto:zyzheng@berkeley.edu), UC Berkeley, Amazon - main contact + +### [CausalML](https://github.com/uber/causalml) Team + +* Jing Pan, Snap, CausalML +* Yifeng Wu, Uber, CausalML +* Huigang Chen, Meta, CausalML +* Totte Harinen, AirBnB, CausalML +* Paul Lo, Snap, CausalML +* [Jeong-Yoon Lee](mailto:jeong@uber.com), Uber, CausalML - main contact * Zhenyu Zhao, Tencent, CausalML -* Keith Battocchi, Microsoft Research, EconML -* Eleanor Dillon, Microsoft Research, EconML +### [EconML](https://github.com/py-why/EconML) Team -## **References** - -1. Künzel, Sören R., et al. "Metalearners for estimating heterogeneous treatment effects using machine learning." Proceedings of the national academy of sciences 116.10 (2019): 4156-4165. ([paper](https://www.pnas.org/content/pnas/116/10/4156.full.pdf)) -2. Chernozhukov, Victor, et al. "Double/debiased/neyman machine learning of treatment effects." American Economic Review 107.5 (2017): 261-65. ([paper](https://arxiv.org/pdf/1701.08687)) -3. Nie, Xinkun, and Stefan Wager. "Quasi-oracle estimation of heterogeneous treatment effects." 
arXiv preprint arXiv:1712.04912 (2017) ([paper](https://arxiv.org/pdf/1712.04912)) -4. Tso, Fung Po, et al. "DragonNet: a robust mobile internet service system for long-distance trains." IEEE transactions on mobile computing 12.11 (2013): 2206-2218. ([paper](https://eprints.gla.ac.uk/56409/1/56409.pdf)) -5. Louizos, Christos, et al. "Causal effect inference with deep latent-variable models." arXiv preprint arXiv:1705.08821 (2017) ([paper](https://arxiv.org/pdf/1705.08821)) -6. Wager, Stefan, and Susan Athey. "Estimation and inference of heterogeneous treatment effects using random forests." Journal of the American Statistical Association 113.523 (2018): 1228-1242. ([paper](https://www.tandfonline.com/doi/pdf/10.1080/01621459.2017.1319839)) -7. Oprescu, Miruna, et al. "EconML: A Machine Learning Library for Estimating Heterogeneous Treatment Effects." ([repo](https://github.com/microsoft/EconML)) -8. Chen, Huigang, et al. "Causalml: Python package for causal machine learning." arXiv preprint arXiv:2002.11631 (2020) ([repo](https://github.com/uber/causalml)) -9. Yao, Liuyi, et al. "A survey on causal inference." arXiv preprint arXiv:2002.02770 (2020). ([paper](https://arxiv.org/pdf/2002.02770.pdf)) -10. Goldenberg, Dmitri, et al. "Personalization in Practice: Methods and Applications." Proceedings of the 14th ACM International Conference on Web Search and Data Mining. 2021 ([paper](https://drive.google.com/drive/folders/1c_khoTDRbkoRY5OiaxEfUxRQkyNv3FeK)) -11. Blackwell, Matthew. "A selection bias approach to sensitivity analysis for causal effects." Political Analysis 22.2 (2014): 169-182. ([paper](https://www.cambridge.org/core/journals/political-analysis/article/selection-bias-approach-to-sensitivity-analysis-for-causal-effects/788C169FAF5482452566811136D4F9B4)) -12. Athey, Susan, and Stefan Wager. "Efficient policy learning." arXiv preprint arXiv:1702.02896 (2017). ([paper](https://arxiv.org/pdf/1702.02896.pdf)) -13. Sharma, Amit, and Emre Kiciman. 
"Causal Inference and Counterfactual Reasoning." Proceedings of the 7th ACM IKDD CoDS and 25th COMAD. 2020. 369-370. ([paper](https://dl.acm.org/doi/abs/10.1145/3371158.3371231)) -14. Li, Ang, and Judea Pearl. "Unit selection based on counterfactual logic." Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence. 2019 ([paper](https://par.nsf.gov/biblio/10180278)) -15. Kennedy, Edward H. "Optimal doubly robust estimation of heterogeneous causal effects." arXiv preprint arXiv:2004.14497 (2020) ([paper](https://arxiv.org/pdf/2004.14497.pdf)) -16. Gruber, Susan, and Mark J. Van Der Laan. "Targeted maximum likelihood estimation: A gentle introduction." (2009) ([paper](https://biostats.bepress.com/cgi/viewcontent.cgi?article=1255&context=ucbbiostat)) -17. D. Foster, V. Syrgkanis. Orthogonal Statistical Learning. Proceedings of the 32nd Annual Conference on Learning Theory (COLT), 2019 ([paper](https://arxiv.org/pdf/1901.09036.pdf)) -18. V. Syrgkanis, V. Lei, M. Oprescu, M. Hei, K. Battocchi, G. Lewis. Machine Learning Estimation of Heterogeneous Treatment Effects with Instruments. Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS), 2019 ([paper](https://arxiv.org/pdf/1905.10176.pdf)) -19. M. Oprescu, V. Syrgkanis and Z. S. Wu. Orthogonal Random Forest for Causal Inference. Proceedings of the 36th International Conference on Machine Learning (ICML), 2019 ([paper](http://proceedings.mlr.press/v97/oprescu19a/oprescu19a.pdf)) -20. Jason Hartford, Greg Lewis, Kevin Leyton-Brown, and Matt Taddy. Deep IV: A flexible approach for counterfactual prediction. Proceedings of the 34th International Conference on Machine Learning, ICML'17, 2017 ([paper](http://proceedings.mlr.press/v70/hartford17a/hartford17a.pdf)) -21. Battocchi, K., Dillon, E., Hei, M., Lewis, G., Oprescu, M., & Syrgkanis, V. (2021). Estimating the Long-Term Effects of Novel Treatments. arXiv preprint arXiv:2103.08390. 
([paper](https://arxiv.org/pdf/2103.08390.pdf)) -22. Lewis, G., & Syrgkanis, V. (2020). Double/Debiased Machine Learning for Dynamic Treatment Effects. arXiv preprint arXiv:2002.07285. ([paper](https://arxiv.org/pdf/2002.07285.pdf)) +* Fabio Vera, Microsoft Research, EconML +* Eleanor Dillon, Microsoft Research, EconML +* Keith Battocchi, Microsoft Research, EconML diff --git a/_action_files/settings.ini b/_action_files/settings.ini index d803172..8becb3a 100644 --- a/_action_files/settings.ini +++ b/_action_files/settings.ini @@ -7,7 +7,7 @@ description = Writing a library entirely in notebooks keywords = jupyter notebook author = Sylvain Gugger and Jeremy Howard author_email = info@fast.ai -baseurl = /kdd2021-tutorial +baseurl = /kdd2023-workshop title = nbdev copyright = fast.ai license = apache2 diff --git a/_config.yml b/_config.yml index 6995623..da2a486 100644 --- a/_config.yml +++ b/_config.yml @@ -6,11 +6,11 @@ # https://learn-the-web.algonquindesign.ca/topics/markdown-yaml-cheat-sheet/#yaml # https://learnxinyminutes.com/docs/yaml/ -title: EconML/CausalML KDD 2021 Tutorial -description: EconML/CausalML KDD 2021 Tutorial +title: KDD 2023 Workshop - Causal Inference and Machine Learning in Practice +description: KDD 2023 Workshop - Causal Inference and Machine Learning in Practice github_username: causal-machine-learning # you can comment the below line out if your repo name is not different than your baseurl -github_repo: "kdd2021-tutorial" +github_repo: "kdd2023-workshop" # OPTIONAL: override baseurl and url if using a custom domain # Note: leave out the trailing / from this value. @@ -34,13 +34,13 @@ url: "https://causal-machine-learning.github.io" # the base hostname & protocol # # 3. You must replace the parameter `baseurl` in _action_files/settings.ini with the same value as you set here but WITHOUT QUOTES. # -baseurl: "/kdd2021-tutorial" # the subpath of your site, e.g. "/blog". +baseurl: "/kdd2023-workshop" # the subpath of your site, e.g. "/blog". 
# Github and twitter are optional: minima: social_links: - twitter: CausalMachine - github: causal-machine-learning + - { platform: github, user_url: "https://github.com/causal-machine-learning/" } + - { platform: twitter, user_url: "https://twitter.com/CausalMachine" } # Set this to true to get LaTeX math equation support use_math: true @@ -100,7 +100,7 @@ plugins: paginate: 15 paginate_path: /page:num/ -remote_theme: jekyll/minima +remote_theme: jekyll/minima@6513ea8b9c1c4909b6aa79926d52a1f7d865c5e7 titles_from_headings: enabled: true diff --git a/_fastpages_docs/_setup_pr_template.md b/_fastpages_docs/_setup_pr_template.md index 894e80a..2bb5d67 100644 --- a/_fastpages_docs/_setup_pr_template.md +++ b/_fastpages_docs/_setup_pr_template.md @@ -4,22 +4,22 @@ Hello :wave: @causal-machine-learning! Thank you for using fastpages! 1. Create an ssh key-pair. Open this utility. Select: `RSA` and `4096` and leave `Passphrase` blank. Click the blue button `Generate-SSH-Keys`. -2. Navigate to this link and click `New repository secret`. Copy and paste the **Private Key** into the `Value` field. This includes the "---BEGIN RSA PRIVATE KEY---" and "--END RSA PRIVATE KEY---" portions. **In the `Name` field, name the secret `SSH_DEPLOY_KEY`.** +2. Navigate to this link and click `New repository secret`. Copy and paste the **Private Key** into the `Value` field. This includes the "---BEGIN RSA PRIVATE KEY---" and "--END RSA PRIVATE KEY---" portions. **In the `Name` field, name the secret `SSH_DEPLOY_KEY`.** -3. Navigate to this link and click the `Add deploy key` button. Paste your **Public Key** from step 1 into the `Key` box. In the `Title`, name the key anything you want, for example `fastpages-key`. Finally, **make sure you click the checkbox next to `Allow write access`** (pictured below), and click `Add key` to save the key. +3. Navigate to this link and click the `Add deploy key` button. Paste your **Public Key** from step 1 into the `Key` box. 
In the `Title`, name the key anything you want, for example `fastpages-key`. Finally, **make sure you click the checkbox next to `Allow write access`** (pictured below), and click `Add key` to save the key. ![](https://raw.githubusercontent.com/fastai/fastpages/master/_fastpages_docs/_checkbox.png) ### What to Expect After Merging This PR -- GitHub Actions will build your site, which will take 2-3 minutes to complete. **This will happen anytime you push changes to the master branch of your repository.** You can monitor the logs of this if you like on the [Actions tab of your repo](https://github.com/causal-machine-learning/kdd2021-tutorial/actions). +- GitHub Actions will build your site, which will take 2-3 minutes to complete. **This will happen anytime you push changes to the master branch of your repository.** You can monitor the logs of this if you like on the [Actions tab of your repo](https://github.com/causal-machine-learning/kdd2023-workshop/actions). - Your GH-Pages Status badge on your README will eventually appear and be green, indicating your first successful build. -- You can monitor the status of your site in the GitHub Pages section of your [repository settings](https://github.com/causal-machine-learning/kdd2021-tutorial/settings). +- You can monitor the status of your site in the GitHub Pages section of your [repository settings](https://github.com/causal-machine-learning/kdd2023-workshop/settings). 
If you are not using a custom domain, your website will appear at: -#### https://causal-machine-learning.github.io/kdd2021-tutorial +#### https://causal-machine-learning.github.io/kdd2023-workshop ## Optional: Using a Custom Domain diff --git a/_pages/cfp.md b/_pages/cfp.md new file mode 100644 index 0000000..451a37e --- /dev/null +++ b/_pages/cfp.md @@ -0,0 +1,62 @@ +--- +permalink: /cfp/ +layout: default +title: Call for Paper +search_exclude: true +--- + +# **Call for Paper** + +## **Important Dates** + +All deadlines are at 11:59 PM [AoE](https://www.timeanddate.com/time/zones/aoe). +* ~~April 30th, 2023: CMT submission portal opens~~ +* ~~May 23rd, 2023 **May 30th, 2023**: Abstract submission deadline **(extended)**~~ +* ~~June 9th, 2023 **June 16th, 2023**: Workshop paper submission deadline **(extended)**~~ +* ~~July 10th, 2023: Paper decision notifications~~ +* ~~July 24th, 2023: Camera-ready deadline~~ +* August 7th, 2023: Workshop + +## **Submission Link** + +* CMT submission portal: [https://cmt3.research.microsoft.com/CMLKDD2023/](https://cmt3.research.microsoft.com/CMLKDD2023/) + +## **Aim and Scope** + +The workshop aims to bring together researchers and practitioners from academia and industry to share their experiences +and insights on applying causal inference and machine learning techniques to real-world problems in the areas of +product, brand, policy, and beyond. + +We welcome papers on a variety of topics, including but not limited to the following: +* Industry use cases where causal inference and machine learning are used in practice +* Challenges and opportunities for using causal inference and machine learning in industry settings +* Techniques for incorporating causal inference into machine learning models +* Methodologies for evaluating causal machine-learning models in practice + +We encourage submissions from researchers and practitioners working in industry, government, and academia. 
We welcome +papers that present new research results, works in progress, or case studies that showcase the application of causal +inference and machine learning techniques to real-world problems. + +All submissions will be peer-reviewed by the program committee, and accepted papers will be presented as contributed +talks or posters during the workshop. + +## **Submission and Formatting Instructions** + +* Submissions are single-blind: author names and affiliations should be listed. +* Submissions are limited to 6 pages (excluding references), must be in PDF, and use the ACM Conference Proceeding +template (two-column format). +* The recommended setting for LaTeX documents is: +`\documentclass[sigconf, review]{acmart}`. +* Additional supplemental material focused on reproducibility can be provided. Proofs, pseudo-code, and code may also be +included in the supplement, which has no explicit page limit. +* The supplementary material should be included in the same PDF file as the main manuscript. The main body of the paper +should be self-contained since reviewers are not required to read the supplementary material. The supplementary material +will not be included in the proceedings. +* Submissions violating these formatting requirements will be desk-rejected. +* The Word template guideline can be found [here](https://www.acm.org/publications/proceedings-template). +* The LaTeX/Overleaf template guideline can be found +[here](https://www.overleaf.com/latex/templates/association-for-computing-machinery-acm-sig-proceedings-template/bmvfhcdnxfty). + +For any questions or inquiries, please contact the workshop organizers at [jeong@uber.com](mailto:jeong@uber.com) and +[zyzheng@berkeley.edu](mailto:zyzheng@berkeley.edu). We +look forward to your submissions!
diff --git a/images/logo.png b/images/logo.png index 64c392f..4ef1a18 100644 Binary files a/images/logo.png and b/images/logo.png differ diff --git a/images/raif.png b/images/raif.png new file mode 100644 index 0000000..e627f58 Binary files /dev/null and b/images/raif.png differ diff --git a/images/vasilis.png b/images/vasilis.png new file mode 100644 index 0000000..3e31460 Binary files /dev/null and b/images/vasilis.png differ diff --git a/index.html b/index.html index f2a19d7..14bd462 100644 --- a/index.html +++ b/index.html @@ -4,117 +4,185 @@ image: images/logo.png --- -# **Causal Inference and Machine Learning in Practice with EconML and CausalML**: Industrial Use Cases at Microsoft, TripAdvisor, Uber +# **Causal Inference and Machine Learning in Practice**: Use cases for Product, Brand, Policy and Beyond ## **Schedule** -### Time +* Long Beach Convention & Entertainment Center, 300 E Ocean Blvd, Long Beach, CA 90802 +([Map](https://goo.gl/maps/1N3XGEovGgJqXAV98)) +* Date: August 7, 2023 (Monday) +* Time: 1:00 - 5:00 PM Pacific Time -* 4:00 AM - 7:00 AM August 15, 2021 [SGT](https://www.timeanddate.com/worldclock/converter.html?iso=20210814T200000&p1=236&p2=tz_pt&p3=tz_et) -* 4:00 PM - 7:00 PM August 14, 2021 [EDT](https://www.timeanddate.com/worldclock/converter.html?iso=20210814T200000&p1=236&p2=tz_pt&p3=tz_et) -* 1:00 PM - 4:00 PM August 14, 2021 [PDT](https://www.timeanddate.com/worldclock/converter.html?iso=20210814T200000&p1=236&p2=tz_pt&p3=tz_et) - -### Live Zoom Link +## **Abstract** -To be shared within the KDD 21 Virtual Platform during the conference. +The increasing demand for data-driven decision-making has led to the rapid growth of machine learning applications in +various industries. However, the ability to draw causal inferences from observational data remains a crucial challenge. +In recent years, causal inference has emerged as a powerful tool for understanding the effects of interventions in +complex systems. 
Combining causal inference with machine learning has the potential to provide a deeper understanding of +the underlying mechanisms and to develop more effective solutions to real-world problems. -## **Abstract** +This workshop aims to bring together researchers and practitioners from academia and industry to share their experiences +and insights on applying causal inference and machine learning techniques to real-world problems in the areas of +product, brand, policy, and beyond. The workshop welcomes original research that covers machine learning theory, deep +learning, causal inference, and online learning. Additionally, the workshop encourages topics that address scalable +system design, algorithm bias, and interpretability. -In recent years, both academic research and industry applications see an increased effort in using machine learning methods to measure granular causal effects and design optimal policies based on these causal estimates. Open source packages such as [CausalML](https://github.com/uber/causalml) and [EconML](https://github.com/microsoft/econml) provide a unified interface for applied researchers and industry practitioners with a variety of machine learning methods for causal inference. The tutorial will cover the topics including conditional treatment effect estimators by meta-learners and tree-based algorithms, model validations and sensitivity analysis, optimization algorithms including policy leaner and cost optimization. In addition, the tutorial will demonstrate the production of these algorithms in industry use cases. +Through keynote talks, panel discussions, and contributed talks and posters, the workshop will provide a forum for +discussing the latest advances and challenges in applying causal inference and machine learning to real-world problems. +The workshop will also offer opportunities for networking and collaboration among researchers and practitioners working +in industry, government, and academia. 
-## **Target Audience and Prerequisites for the Tutorial** +## **Paper Submission** -Anyone who is interested in causal inference and machine learning, especially economists/statisticians/data scientists who want to learn how to combine causal inference and machine learning with real industry use cases incorporated in large scaled machine learning systems at companies such as Microsoft, TripAdvisor and Uber. -The tutorial assumes some basic knowledge in statistical methods, machine learning algorithms and the Python programming language. +Please submit your paper to the [CMT portal](https://cmt3.research.microsoft.com/CMLKDD2023) site, and check the [Call for +Paper](https://causal-machine-learning.github.io/kdd2023-workshop/cfp/) page for details on important dates and +submission guidelines. ## **Outline** -| **Title** | **Duration** | Slides | Code | -|-----------|--------------|--------|------| -| **Introduction to Causal Inference** | 20 minutes | [Slides](https://drive.google.com/file/d/1O1oVU3nX7ThzCrUxlFK-OJsxF3Bz8Khl/view?usp=sharing) | | -| **Case Studies Part 1 by CausalML** | | | | -| Introduction to CausalML| 15 minutes | [Slides](https://drive.google.com/file/d/1ukFsX0QU0kdlQHv_VG_F8QJZtqZ86cwy/view) | | -| Case Study #1: Causal Impact Analysis with Observational Data: CeViChE at Uber | 30 minutes | [Slides](https://docs.google.com/presentation/d/1FvRtis2fm4c2R7XmRKWMTtZaZjUObW1fGxpNmapmjKI/edit?usp=sharing)| [Notebook](https://colab.research.google.com/drive/1ySwg9BIYWS5oLQ5haorMyiIbyiCJ431J?usp=sharing) | -| Case Study #2: Targeting Optimization: Bidder at Uber | 30 minutes |[Slides](https://drive.google.com/file/d/1QJJUCo4LH5kGQP3kaJlG1RdhjhaJWp-5/view?usp=sharing) |[Notebook](https://colab.research.google.com/drive/1fnZEHIAcNxrvSxFrlO1hRTHO7sazXbo0?usp=sharing) | -| **Case Studies Part 2 by EconML** | | | | -| Introduction to EconML| 15 minutes | [Slides](https://drive.google.com/file/d/1gt4KNznrYbwdryi9jGcC0-hDCNg7mBNE/view?usp=sharing) | 
[Notebook](https://colab.research.google.com/drive/1m2Ob7dc1JalEb6FIzSG1tx0qW491-YNc?usp=sharing) | -| Case Study #3: Customer Segmentation at TripAdvisor with Recommendation A/B Tests | 30 minutes | [Slides](https://drive.google.com/file/d/1yyIu_3epIVXbwzJj658Iv4vxHGjtPh8n/view?usp=sharing) | [Notebook](https://colab.research.google.com/drive/1nUhkLVpanv-gm_oA7FbValhpDpEs02wR#scrollTo=qk4_f4tx5gZz) | -| Case Study #4: Long-Term Return-on-Investment at Microsoft via Short-Term Proxies | 30 minutes | [Slides](https://drive.google.com/file/d/1FEKXFHHATntHjsEymXnEw6GAiUGMm8sG/view?usp=sharing) | [Notebook](https://colab.research.google.com/drive/1Ow7ArXRn1NJq47OLvchi26RRTdm94yv8?usp=sharing) | +| **Title** | **Speaker** | **Time (Duration)** | Link | +|-----------|-------------|--------------|------| +| **Introduction** | Organizers | 1:00 - 1:10 PM (10 minutes) | | +| **Invited Talk:** COG: Creative Optimality Gap for Video Advertising | [Raif Rustamov](#raif-rustamov-amazon) (Amazon) | 1:10 - 1:30 PM (20 minutes) | | +| **Invited Talk:** The Value of Last-Mile Delivery in Online Retail | [Ruomeng Cui](#ruomeng-cui-emory-university) (Emory) | 1:30 - 1:50 PM (20 minutes) | | +| Leveraging Causal Uplift Modeling for Budget Constrained Benefits Allocation | Dmitri Goldenberg, Javier Albert (Booking.com) | 1:50 - 2:05 PM (15 minutes) | | +| Ensemble Method for Estimating Individualized Treatment Effects | Kevin Wu Han, Han Wu (Stanford) | 2:05 - 2:20 PM (15 minutes) | | +| A Scalable and Debiased Approach to Dynamic Pricing with Causal Machine Learning and Optimization | Nicolò Cosimo Albanese, Fabian Furrer, Marco Guerriero (Amazon AWS) | 2:20 - 2:35 PM (15 minutes) | | +| An IPW-based Unbiased Ranking Metric in Two-sided Markets | Keisho Oh, Naoki Nishimura (Recruit Co), Minje Sung, Ken Kobayashi, Kazuhide Nakata (Tokyo Institute of Technology) | 2:35 - 2:50 PM (15 minutes) | | +| **Break & Poster Session** | | 3:00 - 3:30 PM (30 minutes) | | +| **Invited Talk:** Unit 
Selection Based on Counterfactual Logic | [Ang Li](#ang-li-university-of-california-los-angeles) (UCLA) | 3:30 - 3:50 PM (20 minutes) | | +| **Invited Talk:** Towards Automating the Causal Machine Learning Pipeline | [Vasilis Syrgkanis](#vasilis-syrgkanis-stanford-universityeconml) (Stanford/EconML) | 3:50 - 4:10 PM (20 minutes) | | +| Power and Pre-treatment Fit: Optimizing Synthetic Control Method for Quasi-experiments | Ali O Polat (Shipt) | 4:10 - 4:25 PM (15 minutes) | | +| Dynamic Causal Structure Discovery and Causal Effect Estimation | Jianian Wang, Rui Song (NCSU) | 4:25 - 4:40 PM (15 minutes) | | +| Hierarchical Clustering As a Novel Solution to the Notorious Multicollinearity Problem in Observational Causal Inference | Yufei Wu, Zhiying Gu, Alex Deng, Jacob Zhu (Airbnb) | 4:40 - 4:55 PM (15 minutes) | | + +## **Invited Speakers** + +### Raif Rustamov, Amazon + +#### Title: COG: Creative Optimality Gap for Video Advertising + +#### Bio + +Raif Rustamov is a Senior Applied Scientist at Amazon where he focuses on brand advertising science including relevance +modeling, representation learning, and causal inference. He previously worked as a Principal Inventive Scientist in AI +and Data Science at AT&T Labs conducting research on recommender systems, customer segmentation, identity for +cross-device advertising, and location analytics. Raif has a PhD in Applied and Computational Mathematics from Princeton +University and has taught at Purdue and Drew Universities, as well as worked as a research associate at Stanford +University. + +#### Abstract -## **Presentation Abstracts** +Video creatives play a crucial role in shaping consumer experiences and brand perceptions, but quantifying their impact +on shopper experience remains a complex challenge. In this talk, we introduce the Creative Optimality Gap (COG), +a metric developed to assess the relative optimality of video creatives using causal-inferential machine learning +methodology. 
Our main contributions include the development of the COG metric through the use of conditional individual +treatment effects projected on interpretable video features, the introduction of a meta-learner for its computation, +and the incorporation of model uncertainty to avoid false positives. Our work advances the understanding of video +creative effectiveness and provides a valuable tool for optimizing ad performance. -### Introduction to Causal Inference +### Ruomeng Cui, Emory University -We will give an overview of basic concepts in causal inference. A quick refresher on the main tools and terminology of causal inference: correlation vs causation, average, conditional, and individual treatment effects, causal inference via randomization, Causal inference using instrumental variables, Causal inference via unconfoundedness. +#### Title: The Value of Last-Mile Delivery in Online Retail -### Introduction to CasualML +#### Bio -We will provide an overview of CausalML, an open source Python package that provides a suite of uplift modeling and causal inference methods using machine learning algorithms based on recent research. We will introduce the main components of CausalML: (1) inference with causal machine learning algorithms (e.g. meta-learners, uplift trees, CEVAE, dragonnet), (2) validation/analysis methods (e.g. synthetic data generation, AUUC, sensitivity analysis, interpretability), (3) optimization methods (e.g. policy optimization, value optimization, unit selection). +Ruomeng Cui is an Associate Professor of Operations Management at the Goizueta Business School, Emory University (on leave). She is currently a full-time Amazon Visiting Academic, working in the supply chain domain. Her research focuses on causal inference, machine learning and data-driven modeling, with applications in retail, supply chains, and platforms.
She currently serves as an associate editor for Manufacturing & Service Operations Management and Production and Operations Management. She received her Ph.D. in Operations Management from the Kellogg School of Management, Northwestern University and her B.Sc. in Industrial Engineering from Tsinghua University. -### Case #1: Causal Impact Analysis with Observational Data at Uber +#### Abstract -As an introductory case study for using causal inference, we will cover the use case of understanding the causal impact from observational data in the context of cross sell at Uber. We emphasize that simple comparisons of users who make cross purchase or not will produce biased estimates and that can be demonstrated in the causal inference framework. We show the use of different causal estimation methodologies through propensity score matching and meta learners to estimate the causal impact. In addition, we will use sensitivity analysis to show the robustness of the estimates. +Last-mile delivery has become increasingly important in the online retail industry. In this work, we study the economic +value of last-mile delivery. To do so, we conducted a quasi-experiment in collaboration with Cainiao, Alibaba's +logistics subsidiary, where home delivery was launched at some pickup stations in 2021. This allowed us to +comprehensively evaluate the causal impact of last-mile delivery. Using a difference-in-differences identification +method, we found that last-mile delivery significantly increases sales and customer spending on the retail platform. To +optimally prioritize limited delivery capacity, we employed causal machine learning to target the most responsive +customers. Our findings suggest that online retailers should carefully weigh the costs and benefits of last-mile +delivery and tailor their logistic strategies accordingly.
-### Case #2: Targeting Optimization: Bidder at Uber +### Ang Li, University of California, Los Angeles -We will introduce the audience selection method with uplift modeling in online RTB, which aims to estimate heterogeneous treatment effects for advertising. It has been studied to provide a superior return on investment by selecting the most incremental users for a specific campaign. To examine the effectiveness of uplift modeling in the context of real-time bidding, we conducted the comparative analysis of four different meta-learners on real campaign data. We adapted an explore-exploit set up for offline training and online evaluation. We will also introduce how we use Targeted Maximum Likelihood Estimation (TMLE) based Average Treatment Effect (ATE) as ground truth for evaluation. +#### Title: Unit Selection Based on Counterfactual Logic -### Introduction to EconML +#### Bio -We will provide an overview of recent methodologies that combine machine learning with causal inference and the significant statistical power that machine learning brings to causal inference estimation methods. We will outline the structure and capabilities of the EconML package and describe some of the key causal machine learning methodologies that are implemented (e.g. double machine learning, causal forests, deepiv, doubly robust learning, dynamic double machine learning). We will also outline approaches to confidence interval construction (e.g. bootstrap, bootstrap-of-little-bags, debiased lasso), interpretability (shap values, tree interpreters) and policy learning (doubly robust policy learning). +Dr. Li is set to join the Florida State University Department of Computer Science as an assistant professor in August. +He is currently a post-doctoral researcher in the Department of Computer Science at UCLA under the guidance of Prof. +Judea Pearl. 
His primary research area is causal inference, artificial intelligence, and causality-based +decision-making, with a focus on building causal models that estimate treatment effects (interventions) and evaluating +what would have happened if an individual had taken a treatment (counterfactuals). He is also interested in +decision-making modeling using knowledge of treatment effects and counterfactuals. Prior to his post-doc, Dr. Li +obtained his Ph.D. at UCLA with Prof. Judea Pearl and his M.S. degree at the University of Minnesota Twin Cities. -### Case #3: Customer Segmentation at TripAdvisor with Recommendation A/B Tests +#### Abstract -We examine the scenario in which we wish to learn heterogeneous treatment effects (CATE), but observational data is biased and direct experimental data (e.g. A/B test) is plagued by imperfect compliance. In this setup, TripAdvisor would like to know whether joining a membership program compels users to spend more time engaging with the website and purchasing more products. The usual approach, a direct A/B test, is infeasible: the website cannot force users to comply and become members, hence the imperfect compliance that can bias calculations. The solution is to use an alternative A/B test that was originally designed to measure whether an easier sign-up process would promote user membership. This A/B test plays the role of an instrument that nudges users to sign up for membership. We introduce EconML’s IntentToTreatDRIV estimator which can leverage this repurposed A/B test to both learn the effect of membership on user engagement and understand how these effects vary with customer features. We show how this novel methodology led to extracting key business insights and helped TripAdvisor understand and differentiate how customers engage with their platform. +The unit selection problem aims to identify a set of individuals who are most likely to +exhibit a desired mode of behavior, which is defined in counterfactual terms. 
A typical +example is that of selecting individuals who would respond one way if encouraged and a +different way if not encouraged. Unlike previous works on this problem, which rely on ad-hoc +heuristics, we approach this problem formally, using counterfactual logic, to properly capture +the nature of the desired behavior. This formalism enables us to derive an informative +selection criterion which integrates experimental and observational data. We show that a +more accurate selection criterion can be achieved when structural information is available +in the form of a causal diagram. We further discuss data availability issue regarding the +derivation of the selection criterion without the observational or experimental data. We +demonstrate the superiority of this criterion over A/B-test-based approaches. -### Case #4: Long-Term Return-on-Investment at Microsoft via Short-Term Proxies +### Vasilis Syrgkanis, Stanford University/EconML -In this case study, we talk about using observational data to measure the long term Return-on-Investment of some types of dollar value investments Microsoft gives to the enterprise customers. There are many challenges for this setting, for instance, we don't have enough period of data to identify a long term ROI, we should control the effect coming from the future investment and we are in a high dimensional data space. We then propose a surrogate based approach assuming the long-term effect is channeled through some short-term proxies and employ a dynamic adjustment to the surrogate model in order to get rid of the effect from future investment, finally apply double machine learning (DML) techniques to estimate the ROI. We apply this methodology to answer the questions like what is the average long-run ROI on each type of the investment? What types of customers have a higher ROI to a specific investment? And how different incentives impact the different solution areas. 
Finally we will showcase how you could use EconML to solve similar problems by only a few lines of code. +#### Title: Towards Automating the Causal Machine Learning Pipeline +#### Bio -## **Tutors** +Vasilis Syrgkanis is an Assistant Professor in Management Science and Engineering and (by courtesy) in Computer Science, +in the School of Engineering at Stanford University. His research interests are in the areas of machine learning, causal +inference, econometrics, online and reinforcement learning, game theory/mechanism design and algorithm design. Until +August 2022, he was a Principal Researcher at Microsoft Research, New England, where he was a member of the EconCS and +StatsML groups. During his time at Microsoft, he co-led the project on Automated Learning and Intelligence for Causation +and Economics (ALICE) and was a co-founder of EconML, an open-source Python package for causal machine learning. He +received his Ph.D. in Computer Science from Cornell University. + +#### Abstract -### Presenters +## **Accepted Papers** -* Jing Pan, Uber, CausalML -* Yifeng Wu, Uber, CausalML -* Huigang Chen, Facebook, CausalML -* Totte Harinen, Toyota Research Institute, CausalML -* Paul Lo, Uber, CausalML -* Greg Lewis, Microsoft Research, EconML -* Vasilis Syrgkanis, Microsoft Research, EconML -* Miruna Oprescu, Microsoft Research, EconML -* Maggie Hei, Microsoft Research, EconML +### For Oral Presentation +1. Leveraging Causal Uplift Modeling for Budget Constrained Benefits Allocation, Dmitri Goldenberg (Booking.com)*; Javier Albert (Booking.com) +2. Ensemble Method for Estimating Individualized Treatment Effects, Kevin Wu Han (Stanford University)*; Han Wu (Stanford University) +3. A Scalable and Debiased Approach to Dynamic Pricing with Causal Machine Learning and Optimization, Nicolò Cosimo Albanese (AWS)*; Fabian Furrer (AWS); Marco Guerriero (AWS) +4.
An IPW-based Unbiased Ranking Metric in Two-sided Markets, Keisho Oh (Recruit Co., Ltd.)*; Naoki Nishimura (Recruit Co., Ltd.); Minje Sung (Tokyo Institute of Technology); Ken Kobayashi (Tokyo Institute of Technology); Kazuhide Nakata (Department of Industrial Engineering and Economics, Tokyo Institute of Technology) +5. Power and Pre-treatment Fit: Optimizing Synthetic Control Method for Quasi-experiments, Ali O Polat (Shipt Inc.)* +6. Dynamic Causal Structure Discovery and Causal Effect Estimation, Jianian Wang (North Carolina State University)*; Rui Song (North Carolina State University) +7. Hierarchical Clustering As a Novel Solution to the Notorious Multicollinearity Problem in Observational Causal Inference, Yufei Wu (Airbnb)*; Zhiying Gu (Airbnb); Alex Deng (Airbnb); Jacob Zhu (Airbnb) + +### For Poster Presentation +8. Community Detection-Enhanced Causal Structural Learning, Yuhe Gao (North Carolina State University)*; Hengrui Cai (University of California Irvine); Sheng Zhang (North Carolina State University); Rui Song (North Carolina State University) +9. ACE: Active Learning for Causal Inference with Expensive Experiments, Difan Song (Georgia Institute of Technology)*; Simon Mak (Duke University); C.F. Jeff Wu (Georgia Institute of Technology) +10. Extracting Causal Insights from Microsoft Feedback Hub using LLMs and In-context Learning, Sara Abdali (University of California, Riverside)*; Anjali Parikh (Microsoft); Steve Lim (Microsoft) +11. Evaluate the Impact of Similar Products Ad Group Recommendations with Causal Inference, Jamie Chen (Amazon)*; Zuqi Shang (Amazon); Raif Rustamov (Amazon) +12. Machine Learning based Framework for Robust Price-Sensitivity Estimation with Application to Airline Pricing, Shahin Boluki (Pros Inc)*; Ravi Kumar (PROS) +13.
OpportunityFinder: A Framework for Automated Causal Inference, Huy Nguyen (Amazon)*; Prince Grover (Amazon); Devashish Khatwani (Amazon) + + +## **Organizers** -### Contributors +* Chu Wang, Amazon +* Yingfei Wang, University of Washington +* Xinwei Ma, UC San Diego +* [Zeyu Zheng](mailto:zyzheng@berkeley.edu), UC Berkeley, Amazon - main contact -* Jeong-Yoon Lee, Netflix Research, CausalML +### [CausalML](https://github.com/uber/causalml) Team + +* Jing Pan, Snap, CausalML +* Yifeng Wu, Uber, CausalML +* Huigang Chen, Meta, CausalML +* Totte Harinen, AirBnB, CausalML +* Paul Lo, Snap, CausalML +* [Jeong-Yoon Lee](mailto:jeong@uber.com), Uber, CausalML - main contact * Zhenyu Zhao, Tencent, CausalML -* Keith Battocchi, Microsoft Research, EconML -* Eleanor Dillon, Microsoft Research, EconML +### [EconML](https://github.com/py-why/EconML) Team -## **References** - -1. Künzel, Sören R., et al. "Metalearners for estimating heterogeneous treatment effects using machine learning." Proceedings of the national academy of sciences 116.10 (2019): 4156-4165. ([paper](https://www.pnas.org/content/pnas/116/10/4156.full.pdf)) -2. Chernozhukov, Victor, et al. "Double/debiased/neyman machine learning of treatment effects." American Economic Review 107.5 (2017): 261-65. ([paper](https://arxiv.org/pdf/1701.08687)) -3. Nie, Xinkun, and Stefan Wager. "Quasi-oracle estimation of heterogeneous treatment effects." arXiv preprint arXiv:1712.04912 (2017) ([paper](https://arxiv.org/pdf/1712.04912)) -4. Tso, Fung Po, et al. "DragonNet: a robust mobile internet service system for long-distance trains." IEEE transactions on mobile computing 12.11 (2013): 2206-2218. ([paper](https://eprints.gla.ac.uk/56409/1/56409.pdf)) -5. Louizos, Christos, et al. "Causal effect inference with deep latent-variable models." arXiv preprint arXiv:1705.08821 (2017) ([paper](https://arxiv.org/pdf/1705.08821)) -6. Wager, Stefan, and Susan Athey. 
"Estimation and inference of heterogeneous treatment effects using random forests." Journal of the American Statistical Association 113.523 (2018): 1228-1242. ([paper](https://www.tandfonline.com/doi/pdf/10.1080/01621459.2017.1319839)) -7. Oprescu, Miruna, et al. "EconML: A Machine Learning Library for Estimating Heterogeneous Treatment Effects." ([repo](https://github.com/microsoft/EconML)) -8. Chen, Huigang, et al. "Causalml: Python package for causal machine learning." arXiv preprint arXiv:2002.11631 (2020) ([repo](https://github.com/uber/causalml)) -9. Yao, Liuyi, et al. "A survey on causal inference." arXiv preprint arXiv:2002.02770 (2020). ([paper](https://arxiv.org/pdf/2002.02770.pdf)) -10. Goldenberg, Dmitri, et al. "Personalization in Practice: Methods and Applications." Proceedings of the 14th ACM International Conference on Web Search and Data Mining. 2021 ([paper](https://drive.google.com/drive/folders/1c_khoTDRbkoRY5OiaxEfUxRQkyNv3FeK)) -11. Blackwell, Matthew. "A selection bias approach to sensitivity analysis for causal effects." Political Analysis 22.2 (2014): 169-182. ([paper](https://www.cambridge.org/core/journals/political-analysis/article/selection-bias-approach-to-sensitivity-analysis-for-causal-effects/788C169FAF5482452566811136D4F9B4)) -12. Athey, Susan, and Stefan Wager. "Efficient policy learning." arXiv preprint arXiv:1702.02896 (2017). ([paper](https://arxiv.org/pdf/1702.02896.pdf)) -13. Sharma, Amit, and Emre Kiciman. "Causal Inference and Counterfactual Reasoning." Proceedings of the 7th ACM IKDD CoDS and 25th COMAD. 2020. 369-370. ([paper](https://dl.acm.org/doi/abs/10.1145/3371158.3371231)) -14. Li, Ang, and Judea Pearl. "Unit selection based on counterfactual logic." Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence. 2019 ([paper](https://par.nsf.gov/biblio/10180278)) -15. Kennedy, Edward H. "Optimal doubly robust estimation of heterogeneous causal effects." 
arXiv preprint arXiv:2004.14497 (2020) ([paper](https://arxiv.org/pdf/2004.14497.pdf)) -16. Gruber, Susan, and Mark J. Van Der Laan. "Targeted maximum likelihood estimation: A gentle introduction." (2009) ([paper](https://biostats.bepress.com/cgi/viewcontent.cgi?article=1255&context=ucbbiostat)) -17. D. Foster, V. Syrgkanis. Orthogonal Statistical Learning. Proceedings of the 32nd Annual Conference on Learning Theory (COLT), 2019 ([paper](https://arxiv.org/pdf/1901.09036.pdf)) -18. V. Syrgkanis, V. Lei, M. Oprescu, M. Hei, K. Battocchi, G. Lewis. Machine Learning Estimation of Heterogeneous Treatment Effects with Instruments. Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS), 2019 ([paper](https://arxiv.org/pdf/1905.10176.pdf)) -19. M. Oprescu, V. Syrgkanis and Z. S. Wu. Orthogonal Random Forest for Causal Inference. Proceedings of the 36th International Conference on Machine Learning (ICML), 2019 ([paper](http://proceedings.mlr.press/v97/oprescu19a/oprescu19a.pdf)) -20. Jason Hartford, Greg Lewis, Kevin Leyton-Brown, and Matt Taddy. Deep IV: A flexible approach for counterfactual prediction. Proceedings of the 34th International Conference on Machine Learning, ICML'17, 2017 ([paper](http://proceedings.mlr.press/v70/hartford17a/hartford17a.pdf)) -21. Battocchi, K., Dillon, E., Hei, M., Lewis, G., Oprescu, M., & Syrgkanis, V. (2021). Estimating the Long-Term Effects of Novel Treatments. arXiv preprint arXiv:2103.08390. ([paper](https://arxiv.org/pdf/2103.08390.pdf)) -22. Lewis, G., & Syrgkanis, V. (2020). Double/Debiased Machine Learning for Dynamic Treatment Effects. arXiv preprint arXiv:2002.07285. 
([paper](https://arxiv.org/pdf/2002.07285.pdf)) +* Fabio Vera, Microsoft Research, EconML +* Eleanor Dillon, Microsoft Research, EconML +* Keith Battocchi, Microsoft Research, EconML diff --git a/notebooks/kdd_intro_to_EconML.ipynb b/notebooks/kdd_intro_to_EconML.ipynb deleted file mode 100644 index 91746dd..0000000 --- a/notebooks/kdd_intro_to_EconML.ipynb +++ /dev/null @@ -1,3473 +0,0 @@ -{ - "nbformat": 4, - "nbformat_minor": 0, - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.8.5" - }, - "colab": { - "name": "kdd_intro_to_EconML.ipynb", - "provenance": [], - "collapsed_sections": [], - "include_colab_link": true - } - }, - "cells": [ - { - "cell_type": "markdown", - "metadata": { - "id": "view-in-github", - "colab_type": "text" - }, - "source": [ - "\"Open" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "d_YD-YF59idL" - }, - "source": [ - "![EconML-Logo-MSFT-color.png]()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "jYAHRpIF-BRW" - }, - "source": [ - "# **KDD2021 Tutorial:** [Causal Inference and Machine Learning in Practice with EconML and CausalML: Industrial Use Cases at Microsoft, TripAdvisor, Uber](https://causal-machine-learning.github.io/kdd2021-tutorial/)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "AYlHqLQf-PjP" - }, - "source": [ - "# Introduction to [EconML](https://github.com/microsoft/EconML)\n", - "\n", - "A python library for estimation of heterogeneous treatment effects with Machine Learning.\n", - "\n", - "**Presentation:** [Introduction to EconML](https://drive.google.com/file/d/1gt4KNznrYbwdryi9jGcC0-hDCNg7mBNE/view?usp=sharing)\n", - "\n", - "**Github:** 
https://github.com/microsoft/EconML\n", - "\n", - "**Documentation:** https://econml.azurewebsites.net/\n", - "\n", - "By the Microsoft Research project [ALICE (Automated Learning and Intelligence for Causation and Economics)](https://www.microsoft.com/en-us/research/project/alice/)" - ] - }, - { - "cell_type": "code", - "metadata": { - "id": "cEqRBSJV9dTX" - }, - "source": [ - "#!pip install econml" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "metadata": { - "id": "yPw9zWWm3H9e" - }, - "source": [ - "!pip install git+https://github.com/microsoft/EconML.git@mehei/driv#egg=econml" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "metadata": { - "id": "PKxsq0Mv9cGD" - }, - "source": [ - "import numpy as np\n", - "from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier\n", - "from sklearn.linear_model import LassoCV, Lasso\n", - "from sklearn.preprocessing import PolynomialFeatures\n", - "from sklearn.model_selection import train_test_split\n", - "import matplotlib.pyplot as plt\n", - "import scipy\n", - "import warnings\n", - "warnings.simplefilter('ignore')" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "metadata": { - "id": "WNvGZiKF9cGG" - }, - "source": [ - "def gen_data(n, discrete=False):\n", - " X = np.random.normal(0, 1, size=(n, 2))\n", - " W = np.random.normal(0, 1, size=(n, 2))\n", - " if discrete:\n", - " T = np.random.binomial(1, scipy.special.expit(W[:, 0]))\n", - " else:\n", - " T = W[:, 0] + np.random.normal(0, 1, size=(n,))\n", - " y = (X[:, 0] + 1) * T + W[:, 0] + np.random.normal(0, 1, size=(n,))\n", - " return y, T, X, W\n", - "\n", - "def gen_data_iv(n):\n", - " X = np.random.normal(0, 1, size=(n, 2))\n", - " W = np.random.normal(0, 1, size=(n, 2))\n", - " U = np.random.normal(0, 1, size=(n,))\n", - " Z = np.random.normal(0, 1, size=(n,))\n", - " T = Z + W[:, 0] + U\n", - " y = (X[:, 0] + 1) * T + W[:, 0] + U\n", - " 
return y, T, Z, X, W" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "7xzAqDMb9cGH" - }, - "source": [ - "# 1. Estimation under Exogeneity" - ] - }, - { - "cell_type": "code", - "metadata": { - "id": "TuRauvKA9cGH" - }, - "source": [ - "y, T, X, W = gen_data(1000)" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "nHhjI8Kb9cGI", - "outputId": "059fd233-bd8e-4de3-fa61-1a042b381aba" - }, - "source": [ - "from econml.dml import NonParamDML\n", - "\n", - "est = NonParamDML(model_y=RandomForestRegressor(), # Any ML model for E[Y|X,W]\n", - " model_t=RandomForestRegressor(), # Any ML model for E[T|X,W]\n", - " model_final=RandomForestRegressor(max_depth=2), # Any ML model for CATE\n", - " discrete_treatment=False, # categorical or continuous treatment\n", - " cv=2, # number of crossfit folds\n", - " mc_iters=1) # repetitions of cross-fitting for stability\n", - "\n", - "est.fit(y, T, X=X, W=W, cache_values=True) # fit the CATE model" - ], - "execution_count": null, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "" - ] - }, - "metadata": { - "tags": [] - }, - "execution_count": 6 - } - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "Gc4fpcMr9cGJ" - }, - "source": [ - "#### Personalized effect estimates on test samples" - ] - }, - { - "cell_type": "code", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "9rqqLxnt9cGJ", - "outputId": "cd71c483-e9e1-48f8-fab7-e8534777a008" - }, - "source": [ - "# personalized effect for each sample from going from treatment 0 to treatment level 1\n", - "est.effect(X[:5], T0=0, T1=1)" - ], - "execution_count": null, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "array([ 1.54592291, 0.87946172, -0.47128723, 0.81285758, 3.1442986 ])" - ] - }, - 
"metadata": { - "tags": [] - }, - "execution_count": 7 - } - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "qC23J2R09cGJ" - }, - "source": [ - "#### ML model diagnostics" - ] - }, - { - "cell_type": "code", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "yE2v-YzX9cGK", - "outputId": "6d074083-9a40-4222-d7bb-f2d158fc2df0" - }, - "source": [ - "# fitted nuisance models for each cross-fitting fold and out-of-sample scores\n", - "est.models_y, est.nuisance_scores_y" - ], - "execution_count": null, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "([[RandomForestRegressor(bootstrap=True, ccp_alpha=0.0, criterion='mse',\n", - " max_depth=None, max_features='auto', max_leaf_nodes=None,\n", - " max_samples=None, min_impurity_decrease=0.0,\n", - " min_impurity_split=None, min_samples_leaf=1,\n", - " min_samples_split=2, min_weight_fraction_leaf=0.0,\n", - " n_estimators=100, n_jobs=None, oob_score=False,\n", - " random_state=None, verbose=0, warm_start=False),\n", - " RandomForestRegressor(bootstrap=True, ccp_alpha=0.0, criterion='mse',\n", - " max_depth=None, max_features='auto', max_leaf_nodes=None,\n", - " max_samples=None, min_impurity_decrease=0.0,\n", - " min_impurity_split=None, min_samples_leaf=1,\n", - " min_samples_split=2, min_weight_fraction_leaf=0.0,\n", - " n_estimators=100, n_jobs=None, oob_score=False,\n", - " random_state=None, verbose=0, warm_start=False)]],\n", - " [[0.5409335662361056, 0.48599679070328405]])" - ] - }, - "metadata": { - "tags": [] - }, - "execution_count": 8 - } - ] - }, - { - "cell_type": "code", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "JXOd9Rfa9cGK", - "outputId": "dfbebf0a-19e9-4f15-9182-e22318015360" - }, - "source": [ - "est.models_t, est.nuisance_scores_t" - ], - "execution_count": null, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain": [ - 
"([[RandomForestRegressor(bootstrap=True, ccp_alpha=0.0, criterion='mse',\n", - " max_depth=None, max_features='auto', max_leaf_nodes=None,\n", - " max_samples=None, min_impurity_decrease=0.0,\n", - " min_impurity_split=None, min_samples_leaf=1,\n", - " min_samples_split=2, min_weight_fraction_leaf=0.0,\n", - " n_estimators=100, n_jobs=None, oob_score=False,\n", - " random_state=None, verbose=0, warm_start=False),\n", - " RandomForestRegressor(bootstrap=True, ccp_alpha=0.0, criterion='mse',\n", - " max_depth=None, max_features='auto', max_leaf_nodes=None,\n", - " max_samples=None, min_impurity_decrease=0.0,\n", - " min_impurity_split=None, min_samples_leaf=1,\n", - " min_samples_split=2, min_weight_fraction_leaf=0.0,\n", - " n_estimators=100, n_jobs=None, oob_score=False,\n", - " random_state=None, verbose=0, warm_start=False)]],\n", - " [[0.4282760371961908, 0.3782367983417111]])" - ] - }, - "metadata": { - "tags": [] - }, - "execution_count": 9 - } - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "CLbqvUvr9cGK" - }, - "source": [ - "#### CATE model diagnostics" - ] - }, - { - "cell_type": "code", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "IdiMMpsQ9cGL", - "outputId": "6b6277ca-32b3-4d29-f5f4-37b765551e39" - }, - "source": [ - "# in-sample goodness-of-fit score for the final cate model\n", - "print(est.score_)" - ], - "execution_count": null, - "outputs": [ - { - "output_type": "stream", - "text": [ - "1.4717907756757855\n" - ], - "name": "stdout" - } - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "968ktBK_9cGL" - }, - "source": [ - "#### Nuisance quantity diagnostics" - ] - }, - { - "cell_type": "code", - "metadata": { - "id": "KDwBlm-i9cGL" - }, - "source": [ - "# calculated residuals for each training sample\n", - "yres, Tres, X_cache, W_cache = est.residuals_" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "Jkc1WSi19cGL" - }, - 
"source": [ - "# 2. Estimation with Instruments" - ] - }, - { - "cell_type": "code", - "metadata": { - "id": "8pjyjlRQ9cGM" - }, - "source": [ - "y, T, Z, X, W = gen_data_iv(2000)" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "metadata": { - "id": "l5gr5sGS9cGM" - }, - "source": [ - "from econml.iv.dml import OrthoIV\n", - "\n", - "est = OrthoIV(model_y_xw=RandomForestRegressor(), # ML model for E[Y|X,W]\n", - " model_t_xw=RandomForestRegressor(), # ML model for E[T|X,W]\n", - " model_z_xw=RandomForestRegressor(), # ML model for E[Z|X,W]\n", - " discrete_treatment=False, # categorical/continuous treatment\n", - " discrete_instrument=False, # categorical/continuous instrument\n", - " cv=2, # number of crossfit folds\n", - " mc_iters=1) # repetitions of cross-fitting for stability" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "metadata": { - "id": "b_k1NpEp9cGM", - "colab": { - "base_uri": "https://localhost:8080/" - }, - "outputId": "8b057fc5-bd87-43fa-ea75-33ee736a2a88" - }, - "source": [ - "est.fit(y, T, Z=Z, X=X, W=W, cache_values=True)" - ], - "execution_count": null, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "" - ] - }, - "metadata": { - "tags": [] - }, - "execution_count": 14 - } - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "a0zr3m3r9cGM" - }, - "source": [ - "#### Personalized effect estimates on test samples" - ] - }, - { - "cell_type": "code", - "metadata": { - "id": "2wAnhhq99cGN", - "colab": { - "base_uri": "https://localhost:8080/" - }, - "outputId": "2061468c-6969-4b1a-86fa-ba16a8f6b49c" - }, - "source": [ - "est.effect(X, T0=0, T1=1)" - ], - "execution_count": null, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "array([-0.32390668, 2.23728003, 0.70346148, ..., 0.85984079,\n", - " 0.17539819, 1.07325619])" - ] - }, - "metadata": { - "tags": [] - }, - "execution_count": 15 - } - ] 
- }, - { - "cell_type": "markdown", - "metadata": { - "id": "D9ESfbo29cGN" - }, - "source": [ - "#### ML model diagnostics" - ] - }, - { - "cell_type": "code", - "metadata": { - "id": "9JlbW4an9cGN", - "colab": { - "base_uri": "https://localhost:8080/" - }, - "outputId": "f77e36a8-295f-4dff-e11d-21057357e4c8" - }, - "source": [ - "est.models_y_xw, est.nuisance_scores_y_xw" - ], - "execution_count": null, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "([[RandomForestRegressor(bootstrap=True, ccp_alpha=0.0, criterion='mse',\n", - " max_depth=None, max_features='auto', max_leaf_nodes=None,\n", - " max_samples=None, min_impurity_decrease=0.0,\n", - " min_impurity_split=None, min_samples_leaf=1,\n", - " min_samples_split=2, min_weight_fraction_leaf=0.0,\n", - " n_estimators=100, n_jobs=None, oob_score=False,\n", - " random_state=None, verbose=0, warm_start=False),\n", - " RandomForestRegressor(bootstrap=True, ccp_alpha=0.0, criterion='mse',\n", - " max_depth=None, max_features='auto', max_leaf_nodes=None,\n", - " max_samples=None, min_impurity_decrease=0.0,\n", - " min_impurity_split=None, min_samples_leaf=1,\n", - " min_samples_split=2, min_weight_fraction_leaf=0.0,\n", - " n_estimators=100, n_jobs=None, oob_score=False,\n", - " random_state=None, verbose=0, warm_start=False)]],\n", - " [[0.3493220983540998, 0.2621905464991312]])" - ] - }, - "metadata": { - "tags": [] - }, - "execution_count": 16 - } - ] - }, - { - "cell_type": "code", - "metadata": { - "id": "e1YCQWZK9cGO", - "colab": { - "base_uri": "https://localhost:8080/" - }, - "outputId": "71292840-3f77-4985-fdf3-855ad7d4f07a" - }, - "source": [ - "est.models_t_xw, est.nuisance_scores_t_xw" - ], - "execution_count": null, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "([[RandomForestRegressor(bootstrap=True, ccp_alpha=0.0, criterion='mse',\n", - " max_depth=None, max_features='auto', max_leaf_nodes=None,\n", - " max_samples=None, 
min_impurity_decrease=0.0,\n", - " min_impurity_split=None, min_samples_leaf=1,\n", - " min_samples_split=2, min_weight_fraction_leaf=0.0,\n", - " n_estimators=100, n_jobs=None, oob_score=False,\n", - " random_state=None, verbose=0, warm_start=False),\n", - " RandomForestRegressor(bootstrap=True, ccp_alpha=0.0, criterion='mse',\n", - " max_depth=None, max_features='auto', max_leaf_nodes=None,\n", - " max_samples=None, min_impurity_decrease=0.0,\n", - " min_impurity_split=None, min_samples_leaf=1,\n", - " min_samples_split=2, min_weight_fraction_leaf=0.0,\n", - " n_estimators=100, n_jobs=None, oob_score=False,\n", - " random_state=None, verbose=0, warm_start=False)]],\n", - " [[0.24243204274642238, 0.22952337252664082]])" - ] - }, - "metadata": { - "tags": [] - }, - "execution_count": 17 - } - ] - }, - { - "cell_type": "code", - "metadata": { - "id": "cN5Ga_YH9cGO", - "colab": { - "base_uri": "https://localhost:8080/" - }, - "outputId": "fca11455-0e2c-4881-96a9-0e07a67d4461" - }, - "source": [ - "est.models_z_xw, est.nuisance_scores_z_xw" - ], - "execution_count": null, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "([[RandomForestRegressor(bootstrap=True, ccp_alpha=0.0, criterion='mse',\n", - " max_depth=None, max_features='auto', max_leaf_nodes=None,\n", - " max_samples=None, min_impurity_decrease=0.0,\n", - " min_impurity_split=None, min_samples_leaf=1,\n", - " min_samples_split=2, min_weight_fraction_leaf=0.0,\n", - " n_estimators=100, n_jobs=None, oob_score=False,\n", - " random_state=None, verbose=0, warm_start=False),\n", - " RandomForestRegressor(bootstrap=True, ccp_alpha=0.0, criterion='mse',\n", - " max_depth=None, max_features='auto', max_leaf_nodes=None,\n", - " max_samples=None, min_impurity_decrease=0.0,\n", - " min_impurity_split=None, min_samples_leaf=1,\n", - " min_samples_split=2, min_weight_fraction_leaf=0.0,\n", - " n_estimators=100, n_jobs=None, oob_score=False,\n", - " random_state=None, verbose=0, 
warm_start=False)]],\n", - " [[-0.10057707531258632, -0.08476759034439452]])" - ] - }, - "metadata": { - "tags": [] - }, - "execution_count": 18 - } - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "oNkC5W9i9cGP" - }, - "source": [ - "#### CATE model diagnostics" - ] - }, - { - "cell_type": "code", - "metadata": { - "id": "zENiyXuV9cGP", - "colab": { - "base_uri": "https://localhost:8080/" - }, - "outputId": "621aba3d-e433-4c89-e0dc-1879afd565f6" - }, - "source": [ - "print(est.score_)" - ], - "execution_count": null, - "outputs": [ - { - "output_type": "stream", - "text": [ - "5.018208071305707e-16\n" - ], - "name": "stdout" - } - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "slJuNhWU9cGP" - }, - "source": [ - "#### Nuisance quantity diagnostics" - ] - }, - { - "cell_type": "code", - "metadata": { - "id": "1o5aP05y9cGP" - }, - "source": [ - "yres, Tres, Zres, Xc, Wc, Zc = est.residuals_" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "WrsMl9pm9cGQ" - }, - "source": [ - "# 3. 
Inference" - ] - }, - { - "cell_type": "code", - "metadata": { - "id": "oeTIUL6_9cGQ" - }, - "source": [ - "y, T, X, W = gen_data(1000)" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "K0kinaK__Djv" - }, - "source": [ - "### Generic Bootstrap Inference" - ] - }, - { - "cell_type": "code", - "metadata": { - "id": "rSGauZ1k9cGQ" - }, - "source": [ - "from econml.dml import NonParamDML\n", - "from econml.sklearn_extensions.linear_model import WeightedLasso\n", - "\n", - "est = NonParamDML(model_y=Lasso(alpha=.1), # Any ML model for E[Y|X,W]\n", - " model_t=Lasso(alpha=.1), # Any ML model for E[T|X,W]\n", - " model_final=WeightedLasso(alpha=.1), # Any ML model for CATE that accepts `sample_weight` at fit\n", - " discrete_treatment=False, # categorical or continuous treatment\n", - " cv=2, # number of crossfit folds\n", - " mc_iters=1) # repetitions of cross-fitting for stability" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "lckeXBSV9cGR", - "outputId": "d1cecd80-0022-4e16-c4a9-dc1081f474f3" - }, - "source": [ - "est.fit(y, T, X=X, W=W, inference='bootstrap') # fit the CATE model" - ], - "execution_count": null, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "" - ] - }, - "metadata": { - "tags": [] - }, - "execution_count": 23 - } - ] - }, - { - "cell_type": "code", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 235 - }, - "id": "p-RP37i-9cGR", - "outputId": "2126735a-91d6-4908-eb8a-b36bcae1cbc8" - }, - "source": [ - "est.effect_inference(X[:5], T0=0, T1=1).summary_frame()" - ], - "execution_count": null, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/html": [ - "
" - ], - "text/plain": [ - " point_estimate stderr zstat pvalue ci_lower ci_upper\n", - "X \n", - "0 1.270 0.039 32.620 0.0 1.210 1.334\n", - "1 1.707 0.051 33.634 0.0 1.631 1.787\n", - "2 2.417 0.079 30.428 0.0 2.304 2.559\n", - "3 1.374 0.041 33.700 0.0 1.312 1.445\n", - "4 -0.302 0.070 -4.338 0.0 -0.449 -0.216" - ] - }, - "metadata": { - "tags": [] - }, - "execution_count": 24 - } - ] - }, - { - "cell_type": "code", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 251 - }, - "id": "9tGAONKF9cGS", - "outputId": "cce9a4d0-d2a9-4991-87aa-90504695a8a9" - }, - "source": [ - "est.ate_inference(X)" - ], - "execution_count": null, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/html": [ - "\n", - "\n", - "\n", - " \n", - "\n", - "\n", - " \n", - "\n", - "
Uncertainty of Mean Point Estimate
mean_point stderr_mean zstat pvalue ci_mean_lower ci_mean_upper
0.989 0.061 16.1 0.0 0.888 1.09
\n", - "\n", - "\n", - "\n", - " \n", - "\n", - "\n", - " \n", - "\n", - "
Distribution of Point Estimate
std_point pct_point_lower pct_point_upper
1.041 -0.704 2.734
\n", - "\n", - "\n", - "\n", - " \n", - "\n", - "\n", - " \n", - "\n", - "
Total Variance of Point Estimate
stderr_point ci_point_lower ci_point_upper
1.042 -0.698 2.737


Note: The stderr_mean is a conservative upper bound." - ], - "text/plain": [ - "" - ] - }, - "metadata": { - "tags": [] - }, - "execution_count": 25 - } - ] - }, - { - "cell_type": "code", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "ab22GTTm9cGS", - "outputId": "b26f4c08-1938-43fb-c81b-36ca177254f8" - }, - "source": [ - "from econml.inference import BootstrapInference\n", - "est.fit(y, T, X=X, W=W,\n", - " inference=BootstrapInference(n_bootstrap_samples=100,\n", - " bootstrap_type='normal'))" - ], - "execution_count": null, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "" - ] - }, - "metadata": { - "tags": [] - }, - "execution_count": 26 - } - ] - }, - { - "cell_type": "code", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 235 - }, - "id": "MK02FTmf9cGS", - "outputId": "3d1bb9f4-cbcf-45ac-930b-f6d837b407b0" - }, - "source": [ - "est.effect_inference(X[:5], T0=0, T1=1).summary_frame()" - ], - "execution_count": null, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/html": [ - "
" - ], - "text/plain": [ - " point_estimate stderr zstat pvalue ci_lower ci_upper\n", - "X \n", - "0 1.267 0.046 27.591 0.0 1.192 1.343\n", - "1 1.704 0.055 30.978 0.0 1.613 1.794\n", - "2 2.414 0.079 30.623 0.0 2.284 2.543\n", - "3 1.372 0.048 28.879 0.0 1.294 1.450\n", - "4 -0.303 0.073 -4.125 0.0 -0.424 -0.182" - ] - }, - "metadata": { - "tags": [] - }, - "execution_count": 27 - } - ] - }, - { - "cell_type": "code", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 251 - }, - "id": "qIKkO36S9cGT", - "outputId": "574a2f94-2cca-463f-adf6-b2193d1d0e6c" - }, - "source": [ - "est.ate_inference(X)" - ], - "execution_count": null, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/html": [ - "\n", - "\n", - "\n", - " \n", - "\n", - "\n", - " \n", - "\n", - "
Uncertainty of Mean Point Estimate
mean_point stderr_mean zstat pvalue ci_mean_lower ci_mean_upper
0.986 0.065 15.226 0.0 0.88 1.093
\n", - "\n", - "\n", - "\n", - " \n", - "\n", - "\n", - " \n", - "\n", - "
Distribution of Point Estimate
std_point pct_point_lower pct_point_upper
1.039 -0.704 2.73
\n", - "\n", - "\n", - "\n", - " \n", - "\n", - "\n", - " \n", - "\n", - "
Total Variance of Point Estimate
stderr_point ci_point_lower ci_point_upper
1.041 -0.703 2.735


Note: The stderr_mean is a conservative upper bound." - ], - "text/plain": [ - "" - ] - }, - "metadata": { - "tags": [] - }, - "execution_count": 28 - } - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "LH0Uo_yJ--G_" - }, - "source": [ - "### Tailored Valid Inference" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ZfUSZoW-4TJQ" - }, - "source": [ - "#### Heteroskedasticity-robust OLS inference for linear CATE models $\\theta(x)=\\langle\\theta, \\phi(x)\\rangle$" - ] - }, - { - "cell_type": "code", - "metadata": { - "id": "uPiPxMU19cGT" - }, - "source": [ - "from econml.dml import LinearDML\n", - "\n", - "est = LinearDML(model_y=RandomForestRegressor(), # Any ML model for E[Y|X,W]\n", - " model_t=RandomForestRegressor(), # Any ML model for E[T|X,W]\n", - " featurizer=PolynomialFeatures(degree=2, include_bias=False)) # any featurizer for " - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "qAQKnPA59cGU", - "outputId": "77583b37-6895-4c2e-c636-d948a6de8798" - }, - "source": [ - "est.fit(y, T, X=X, W=W)" - ], - "execution_count": null, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "" - ] - }, - "metadata": { - "tags": [] - }, - "execution_count": 30 - } - ] - }, - { - "cell_type": "code", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 353 - }, - "id": "Cy73CEcm9cGU", - "outputId": "749cefb2-54d6-475a-d630-4a8d9beba4f6" - }, - "source": [ - "est.summary()" - ], - "execution_count": null, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/html": [ - "\n", - "\n", - "\n", - " \n", - "\n", - "\n", - " \n", - "\n", - "\n", - " \n", - "\n", - "\n", - " \n", - "\n", - "\n", - " \n", - "\n", - "\n", - " \n", - "\n", - "
Coefficient Results
point_estimate stderr zstat pvalue ci_lower ci_upper
X0 1.001 0.034 29.398 0.0 0.945 1.057
X1 0.042 0.032 1.302 0.193 -0.011 0.094
X0^2 0.011 0.025 0.447 0.655 -0.029 0.052
X0 X1 0.032 0.027 1.203 0.229 -0.012 0.076
X1^2 -0.024 0.019 -1.269 0.204 -0.056 0.007
\n", - "\n", - "\n", - "\n", - " \n", - "\n", - "\n", - " \n", - "\n", - "
CATE Intercept Results
point_estimate stderr zstat pvalue ci_lower ci_upper
cate_intercept 0.99 0.046 21.503 0.0 0.914 1.066


A linear parametric conditional average treatment effect (CATE) model was fitted:
$Y = \\Theta(X)\\cdot T + g(X, W) + \\epsilon$
where for every outcome $i$ and treatment $j$ the CATE $\\Theta_{ij}(X)$ has the form:
$\\Theta_{ij}(X) = \\phi(X)' coef_{ij} + cate\\_intercept_{ij}$
where $\\phi(X)$ is the output of the `featurizer` or $X$ if `featurizer`=None. Coefficient Results table portrays the $coef_{ij}$ parameter vector for each outcome $i$ and treatment $j$. Intercept Results table portrays the $cate\\_intercept_{ij}$ parameter.
" - ], - "text/plain": [ - "\n", - "\"\"\"\n", - " Coefficient Results \n", - "===========================================================\n", - " point_estimate stderr zstat pvalue ci_lower ci_upper\n", - "-----------------------------------------------------------\n", - "X0 1.001 0.034 29.398 0.0 0.945 1.057\n", - "X1 0.042 0.032 1.302 0.193 -0.011 0.094\n", - "X0^2 0.011 0.025 0.447 0.655 -0.029 0.052\n", - "X0 X1 0.032 0.027 1.203 0.229 -0.012 0.076\n", - "X1^2 -0.024 0.019 -1.269 0.204 -0.056 0.007\n", - " CATE Intercept Results \n", - "====================================================================\n", - " point_estimate stderr zstat pvalue ci_lower ci_upper\n", - "--------------------------------------------------------------------\n", - "cate_intercept 0.99 0.046 21.503 0.0 0.914 1.066\n", - "--------------------------------------------------------------------\n", - "\n", - "A linear parametric conditional average treatment effect (CATE) model was fitted:\n", - "$Y = \\Theta(X)\\cdot T + g(X, W) + \\epsilon$\n", - "where for every outcome $i$ and treatment $j$ the CATE $\\Theta_{ij}(X)$ has the form:\n", - "$\\Theta_{ij}(X) = \\phi(X)' coef_{ij} + cate\\_intercept_{ij}$\n", - "where $\\phi(X)$ is the output of the `featurizer` or $X$ if `featurizer`=None. Coefficient Results table portrays the $coef_{ij}$ parameter vector for each outcome $i$ and treatment $j$. Intercept Results table portrays the $cate\\_intercept_{ij}$ parameter.\n", - "\"\"\"" - ] - }, - "metadata": { - "tags": [] - }, - "execution_count": 38 - } - ] - }, - { - "cell_type": "code", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 235 - }, - "id": "og2hubAj9cGV", - "outputId": "5f0c835f-dcff-4f9c-d1be-dc6741da2a10" - }, - "source": [ - "est.effect_inference(X[:5], T0=0, T1=1).summary_frame()" - ], - "execution_count": null, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
point_estimatestderrzstatpvalueci_lowerci_upper
X
01.2610.04726.7320.01.1841.339
11.7850.06826.2740.01.6731.897
22.4160.08329.0690.02.2802.553
31.4180.04829.8280.01.3401.497
4-0.2720.064-4.2660.0-0.378-0.167
\n", - "
" - ], - "text/plain": [ - " point_estimate stderr zstat pvalue ci_lower ci_upper\n", - "X \n", - "0 1.261 0.047 26.732 0.0 1.184 1.339\n", - "1 1.785 0.068 26.274 0.0 1.673 1.897\n", - "2 2.416 0.083 29.069 0.0 2.280 2.553\n", - "3 1.418 0.048 29.828 0.0 1.340 1.497\n", - "4 -0.272 0.064 -4.266 0.0 -0.378 -0.167" - ] - }, - "metadata": { - "tags": [] - }, - "execution_count": 32 - } - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "aMqhTNEb4Zt4" - }, - "source": [ - "#### Debiased Lasso Inference for high-dimensional linear CATE models $\\theta(x)=\\langle\\theta, \\phi(x)\\rangle$" - ] - }, - { - "cell_type": "code", - "metadata": { - "id": "dtjN8aC99cGY" - }, - "source": [ - "from econml.dml import SparseLinearDML\n", - "\n", - "est = SparseLinearDML(featurizer=PolynomialFeatures(degree=3, include_bias=False))" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "W3xElnFJ9cGY", - "outputId": "89e6d659-d695-49c0-927c-66e3128c3e13" - }, - "source": [ - "est.fit(y, T, X=X, W=W)" - ], - "execution_count": null, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "" - ] - }, - "metadata": { - "tags": [] - }, - "execution_count": 40 - } - ] - }, - { - "cell_type": "code", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 437 - }, - "id": "6XHdjW9I9cGY", - "outputId": "b8ad3ad6-7006-47ac-c431-d90c109cf0e3" - }, - "source": [ - "est.summary()" - ], - "execution_count": null, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/html": [ - "\n", - "\n", - "\n", - " \n", - "\n", - "\n", - " \n", - "\n", - "\n", - " \n", - "\n", - "\n", - " \n", - "\n", - "\n", - " \n", - "\n", - "\n", - " \n", - "\n", - "\n", - " \n", - "\n", - "\n", - " \n", - "\n", - "\n", - " \n", - "\n", - "\n", - " \n", - "\n", - "
Coefficient Results
point_estimate stderr zstat pvalue ci_lower ci_upper
X0 0.99 0.065 15.182 0.0 0.882 1.097
X1 0.012 0.07 0.179 0.858 -0.102 0.127
X0^2 0.071 0.022 3.168 0.002 0.034 0.107
X0 X1 -0.012 0.034 -0.337 0.736 -0.068 0.045
X1^2 -0.019 0.027 -0.686 0.493 -0.063 0.026
X0^3 0.004 0.013 0.328 0.743 -0.017 0.026
X0^2 X1 0.026 0.025 1.025 0.305 -0.016 0.067
X0 X1^2 0.021 0.026 0.806 0.42 -0.022 0.065
X1^3 0.001 0.016 0.054 0.957 -0.026 0.027
\n", - "\n", - "\n", - "\n", - " \n", - "\n", - "\n", - " \n", - "\n", - "
CATE Intercept Results
point_estimate stderr zstat pvalue ci_lower ci_upper
cate_intercept 0.945 0.051 18.628 0.0 0.862 1.029


A linear parametric conditional average treatment effect (CATE) model was fitted:
$Y = \\Theta(X)\\cdot T + g(X, W) + \\epsilon$
where for every outcome $i$ and treatment $j$ the CATE $\\Theta_{ij}(X)$ has the form:
$\\Theta_{ij}(X) = \\phi(X)' coef_{ij} + cate\\_intercept_{ij}$
where $\\phi(X)$ is the output of the `featurizer` or $X$ if `featurizer`=None. Coefficient Results table portrays the $coef_{ij}$ parameter vector for each outcome $i$ and treatment $j$. Intercept Results table portrays the $cate\\_intercept_{ij}$ parameter.
" - ], - "text/plain": [ - "\n", - "\"\"\"\n", - " Coefficient Results \n", - "=============================================================\n", - " point_estimate stderr zstat pvalue ci_lower ci_upper\n", - "-------------------------------------------------------------\n", - "X0 0.99 0.065 15.182 0.0 0.882 1.097\n", - "X1 0.012 0.07 0.179 0.858 -0.102 0.127\n", - "X0^2 0.071 0.022 3.168 0.002 0.034 0.107\n", - "X0 X1 -0.012 0.034 -0.337 0.736 -0.068 0.045\n", - "X1^2 -0.019 0.027 -0.686 0.493 -0.063 0.026\n", - "X0^3 0.004 0.013 0.328 0.743 -0.017 0.026\n", - "X0^2 X1 0.026 0.025 1.025 0.305 -0.016 0.067\n", - "X0 X1^2 0.021 0.026 0.806 0.42 -0.022 0.065\n", - "X1^3 0.001 0.016 0.054 0.957 -0.026 0.027\n", - " CATE Intercept Results \n", - "====================================================================\n", - " point_estimate stderr zstat pvalue ci_lower ci_upper\n", - "--------------------------------------------------------------------\n", - "cate_intercept 0.945 0.051 18.628 0.0 0.862 1.029\n", - "--------------------------------------------------------------------\n", - "\n", - "A linear parametric conditional average treatment effect (CATE) model was fitted:\n", - "$Y = \\Theta(X)\\cdot T + g(X, W) + \\epsilon$\n", - "where for every outcome $i$ and treatment $j$ the CATE $\\Theta_{ij}(X)$ has the form:\n", - "$\\Theta_{ij}(X) = \\phi(X)' coef_{ij} + cate\\_intercept_{ij}$\n", - "where $\\phi(X)$ is the output of the `featurizer` or $X$ if `featurizer`=None. Coefficient Results table portrays the $coef_{ij}$ parameter vector for each outcome $i$ and treatment $j$. 
Intercept Results table portrays the $cate\\_intercept_{ij}$ parameter.\n", - "\"\"\"" - ] - }, - "metadata": { - "tags": [] - }, - "execution_count": 41 - } - ] - }, - { - "cell_type": "code", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 235 - }, - "id": "nYXaiMBX9cGZ", - "outputId": "34e12072-ae19-40c5-d75f-72d96ff4682c" - }, - "source": [ - "est.effect_inference(X[:5], T0=0, T1=1).summary_frame()" - ], - "execution_count": null, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
point_estimatestderrzstatpvalueci_lowerci_upper
X
01.2440.06120.4580.0001.1441.344
11.7430.10217.1200.0001.5751.910
22.5280.09227.5530.0002.3772.679
31.3660.06122.5210.0001.2661.466
4-0.2360.085-2.7780.005-0.376-0.096
\n", - "
" - ], - "text/plain": [ - " point_estimate stderr zstat pvalue ci_lower ci_upper\n", - "X \n", - "0 1.244 0.061 20.458 0.000 1.144 1.344\n", - "1 1.743 0.102 17.120 0.000 1.575 1.910\n", - "2 2.528 0.092 27.553 0.000 2.377 2.679\n", - "3 1.366 0.061 22.521 0.000 1.266 1.466\n", - "4 -0.236 0.085 -2.778 0.005 -0.376 -0.096" - ] - }, - "metadata": { - "tags": [] - }, - "execution_count": 42 - } - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "9kI_3o004cnQ" - }, - "source": [ - "#### Bootstrap-of-Little-Bags inference for forests CATE models $\\theta(x)$" - ] - }, - { - "cell_type": "code", - "metadata": { - "id": "JVKLLU5H9cGZ" - }, - "source": [ - "y, T, X, W = gen_data(2000, discrete=True)" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "TzNqV3HW9cGZ", - "outputId": "e829ef4f-3016-431a-91d3-4864c9cca580" - }, - "source": [ - "from econml.dml import CausalForestDML\n", - "\n", - "est = CausalForestDML(discrete_treatment=True,\n", - " criterion='mse', n_estimators=1000)\n", - "est.tune(y, T, X=X, W=W)\n", - "est.fit(y, T, X=X, W=W, cache_values=True)" - ], - "execution_count": null, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "" - ] - }, - "metadata": { - "tags": [] - }, - "execution_count": 44 - } - ] - }, - { - "cell_type": "code", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 451 - }, - "id": "PhogESjH9cGa", - "outputId": "97fea945-cc4e-4641-cb7b-e20f8e61064d" - }, - "source": [ - "est.summary()" - ], - "execution_count": null, - "outputs": [ - { - "output_type": "stream", - "text": [ - "Population summary of CATE predictions on Training Data\n" - ], - "name": "stdout" - }, - { - "output_type": "execute_result", - "data": { - "text/html": [ - "\n", - "\n", - "\n", - " \n", - "\n", - "\n", - " \n", - "\n", - "
Uncertainty of Mean Point Estimate
mean_point stderr_mean zstat pvalue ci_mean_lower ci_mean_upper
1.099 0.205 5.362 0.0 0.762 1.436
\n", - "\n", - "\n", - "\n", - " \n", - "\n", - "\n", - " \n", - "\n", - "
Distribution of Point Estimate
std_point pct_point_lower pct_point_upper
1.086 -0.862 2.947
\n", - "\n", - "\n", - "\n", - " \n", - "\n", - "\n", - " \n", - "\n", - "
Total Variance of Point Estimate
stderr_point ci_point_lower ci_point_upper
1.105 -0.862 2.965
\n", - "\n", - "\n", - "\n", - " \n", - "\n", - "\n", - " \n", - "\n", - "
Doubly Robust ATE on Training Data Results
point_estimate stderr zstat pvalue ci_lower ci_upper
ATE 1.095 0.056 19.388 0.0 1.002 1.187
\n", - "\n", - "\n", - "\n", - " \n", - "\n", - "\n", - " \n", - "\n", - "
Doubly Robust ATT(T=0) on Training Data Results
point_estimate stderr zstat pvalue ci_lower ci_upper
ATT 1.075 0.078 13.719 0.0 0.947 1.204
\n", - "\n", - "\n", - "\n", - " \n", - "\n", - "\n", - " \n", - "\n", - "
Doubly Robust ATT(T=1) on Training Data Results
point_estimate stderr zstat pvalue ci_lower ci_upper
ATT 1.116 0.081 13.721 0.0 0.982 1.25


Note: The stderr_mean is a conservative upper bound." - ], - "text/plain": [ - "\n", - "\"\"\"\n", - " Uncertainty of Mean Point Estimate \n", - "===============================================================\n", - "mean_point stderr_mean zstat pvalue ci_mean_lower ci_mean_upper\n", - "---------------------------------------------------------------\n", - " 1.099 0.205 5.362 0.0 0.762 1.436\n", - " Distribution of Point Estimate \n", - "=========================================\n", - "std_point pct_point_lower pct_point_upper\n", - "-----------------------------------------\n", - " 1.086 -0.862 2.947\n", - " Total Variance of Point Estimate \n", - "==========================================\n", - "stderr_point ci_point_lower ci_point_upper\n", - "------------------------------------------\n", - " 1.105 -0.862 2.965\n", - " Doubly Robust ATE on Training Data Results \n", - "=========================================================\n", - " point_estimate stderr zstat pvalue ci_lower ci_upper\n", - "---------------------------------------------------------\n", - "ATE 1.095 0.056 19.388 0.0 1.002 1.187\n", - " Doubly Robust ATT(T=0) on Training Data Results \n", - "=========================================================\n", - " point_estimate stderr zstat pvalue ci_lower ci_upper\n", - "---------------------------------------------------------\n", - "ATT 1.075 0.078 13.719 0.0 0.947 1.204\n", - " Doubly Robust ATT(T=1) on Training Data Results \n", - "=========================================================\n", - " point_estimate stderr zstat pvalue ci_lower ci_upper\n", - "---------------------------------------------------------\n", - "ATT 1.116 0.081 13.721 0.0 0.982 1.25\n", - "---------------------------------------------------------\n", - "\n", - "Note: The stderr_mean is a conservative upper bound.\n", - "\"\"\"" - ] - }, - "metadata": { - "tags": [] - }, - "execution_count": 45 - } - ] - }, - { - "cell_type": "code", - "metadata": { - "colab": { - 
"base_uri": "https://localhost:8080/", - "height": 235 - }, - "id": "bZzv-0bo9cGa", - "outputId": "26e8e29b-691e-4fa4-d469-7b00ee463ac4" - }, - "source": [ - "est.effect_inference(X[:5], T0=0, T1=1).summary_frame()" - ], - "execution_count": null, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
point_estimatestderrzstatpvalueci_lowerci_upper
X
01.9090.2308.2930.0001.5312.288
10.3080.1132.7350.0060.1230.493
2-0.1130.204-0.5530.580-0.4490.223
31.2770.1926.6650.0000.9621.593
42.3950.22510.6390.0002.0242.765
\n", - "
" - ], - "text/plain": [ - " point_estimate stderr zstat pvalue ci_lower ci_upper\n", - "X \n", - "0 1.909 0.230 8.293 0.000 1.531 2.288\n", - "1 0.308 0.113 2.735 0.006 0.123 0.493\n", - "2 -0.113 0.204 -0.553 0.580 -0.449 0.223\n", - "3 1.277 0.192 6.665 0.000 0.962 1.593\n", - "4 2.395 0.225 10.639 0.000 2.024 2.765" - ] - }, - "metadata": { - "tags": [] - }, - "execution_count": 46 - } - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "qqO9by3p9cGa" - }, - "source": [ - "# 4. Causal Scoring" - ] - }, - { - "cell_type": "code", - "metadata": { - "id": "9HwG--DU9cGa" - }, - "source": [ - "y, T, X, W = gen_data(2000, discrete=True)" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "Rr5El3LR9cGa" - }, - "source": [ - "#### Multitude of approaches for CATE estimation to select from" - ] - }, - { - "cell_type": "code", - "metadata": { - "id": "je0AkjG99cGb" - }, - "source": [ - "from econml.dml import DML, LinearDML, SparseLinearDML, NonParamDML\n", - "from econml.metalearners import XLearner, TLearner, SLearner, DomainAdaptationLearner\n", - "from econml.dr import DRLearner\n", - "\n", - "reg = lambda: RandomForestRegressor(min_samples_leaf=10)\n", - "clf = lambda: RandomForestClassifier(min_samples_leaf=10)\n", - "# A multitude of possible approaches for CATE estimation under conditional exogeneity\n", - "models = [('ldml', LinearDML(model_y=reg(), model_t=clf(), discrete_treatment=True,\n", - " linear_first_stages=False, cv=3)),\n", - " ('sldml', SparseLinearDML(model_y=reg(), model_t=clf(), discrete_treatment=True,\n", - " featurizer=PolynomialFeatures(degree=2, include_bias=False),\n", - " linear_first_stages=False, cv=3)),\n", - " ('xlearner', XLearner(models=reg(), cate_models=reg(), propensity_model=clf())),\n", - " ('dalearner', DomainAdaptationLearner(models=reg(), final_models=reg(),\n", - " propensity_model=clf())),\n", - " ('slearner', SLearner(overall_model=reg())),\n", - " 
('tlearner', TLearner(models=reg())),\n", - " ('drlearner', DRLearner(model_propensity=clf(), model_regression=reg(),\n", - " model_final=reg(), cv=3)),\n", - " ('rlearner', NonParamDML(model_y=reg(), model_t=clf(), model_final=reg(),\n", - " discrete_treatment=True, cv=3)),\n", - " ('dml3dlasso', DML(model_y=reg(), model_t=clf(), model_final=LassoCV(),\n", - " discrete_treatment=True,\n", - " featurizer=PolynomialFeatures(degree=3),\n", - " linear_first_stages=False, cv=3))\n", - "]" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "it8Mb8219cGb" - }, - "source": [ - "#### Split the data in train and validation" - ] - }, - { - "cell_type": "code", - "metadata": { - "id": "dsqFr4Cz9cGb" - }, - "source": [ - "XW = np.hstack([X, W])\n", - "XW_train, XW_val, T_train, T_val, Y_train, Y_val = train_test_split(XW, T, y, test_size=.4)" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "9_X0_AQS9cGc" - }, - "source": [ - "#### Fit all CATE models on train data" - ] - }, - { - "cell_type": "code", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "fNWePOEi9cGc", - "outputId": "1e887546-43fe-4b1f-f238-06ec0d15ab2b" - }, - "source": [ - "from joblib import Parallel, delayed\n", - "\n", - "def fit_model(name, model):\n", - " return name, model.fit(Y_train, T_train, X=XW_train)\n", - "\n", - "models = Parallel(n_jobs=-1, verbose=1)(delayed(fit_model)(name, mdl) for name, mdl in models)" - ], - "execution_count": null, - "outputs": [ - { - "output_type": "stream", - "text": [ - "[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.\n", - "[Parallel(n_jobs=-1)]: Done 9 out of 9 | elapsed: 17.7s finished\n" - ], - "name": "stderr" - } - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "5HrON5yV9cGc" - }, - "source": [ - "#### Train the scorer on the validation data" - ] - }, - { - "cell_type": 
"code", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "B4r0CXBP9cGc", - "outputId": "aae0216f-9ed1-4846-d0d4-eaff59d12e77" - }, - "source": [ - "from econml.score import RScorer\n", - "\n", - "# The causal scorer itself must be fit, here on the held-out validation set\n", - "scorer = RScorer(model_y=reg(), model_t=clf(),\n", - " discrete_treatment=True, cv=3,\n", - " mc_iters=3, mc_agg='median')\n", - "scorer.fit(Y_val, T_val, X=XW_val)" - ], - "execution_count": null, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "" - ] - }, - "metadata": { - "tags": [] - }, - "execution_count": 51 - } - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "VaApR6459cGd" - }, - "source": [ - "#### Evaluate each of the trained CATE models on the validation data" - ] - }, - { - "cell_type": "code", - "metadata": { - "id": "WX92N2EP9cGd" - }, - "source": [ - "# Then we can evaluate every trained CATE model\n", - "rscore = [scorer.score(mdl) for _, mdl in models]" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "E2ZWa1Mu9cGd" - }, - "source": [ - "#### Calculate the ideal score (root-PEHE) of each model, since we know the ground truth" - ] - }, - { - "cell_type": "code", - "metadata": { - "id": "1e2SvCV89cGd" - }, - "source": [ - "expected_te_val = 1 + XW_val[:, 0]" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "metadata": { - "id": "yPF1_UOM9cGe" - }, - "source": [ - "rootpehe = [np.sqrt(np.mean((expected_te_val.flatten() - mdl.effect(XW_val).flatten())**2))\n", - " for _, mdl in models]" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "a4RafR039cGe" - }, - "source": [ - "#### Qualitatively different performance of each method" - ] - }, - { - "cell_type": "code", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 948 - }, - "id": "fkIgNP999cGe", - 
"outputId": "3aaa0a23-a756-4857-9daa-0f94c68b7280" - }, - "source": [ - "plt.figure(figsize=(16, 16))\n", - "rows = int(np.ceil(len(models) / 3))\n", - "for it, (name, mdl) in enumerate(models):\n", - " plt.subplot(rows, 3, it + 1)\n", - " plt.title('{}. RScore: {:.3f}, Root-PEHE: {:.3f}'.format(name, rscore[it], rootpehe[it]))\n", - " plt.scatter(XW_val[:, 0], mdl.effect(XW_val), label='{}'.format(name))\n", - " plt.plot(XW_val[:, 0], 1 + XW_val[:, 0], 'b--', label='True effect')\n", - " plt.ylabel('Treatment Effect')\n", - " plt.xlabel('x')\n", - " plt.legend()\n", - "plt.show()" - ], - "execution_count": null, - "outputs": [ - { - "output_type": "display_data", - "data": { - "image/png": "\n", - "text/plain": [ - "
" - ] - }, - "metadata": { - "tags": [], - "needs_background": "light" - } - } - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "hFddbtTU9cGf" - }, - "source": [ - "#### RScore correlates well with ideal score\n", - "\n", - "Higher `Rscore` implies smaller `PEHE`" - ] - }, - { - "cell_type": "code", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 279 - }, - "id": "i2w9dPuI9cGf", - "outputId": "3adc25ea-2f29-46e2-be2e-4a5162994be3" - }, - "source": [ - "plt.scatter(rootpehe, rscore)\n", - "plt.xlabel('rpehe')\n", - "plt.ylabel('rscore')\n", - "plt.show()" - ], - "execution_count": null, - "outputs": [ - { - "output_type": "display_data", - "data": { - "image/png": "\n", - "text/plain": [ - "
" - ] - }, - "metadata": { - "tags": [], - "needs_background": "light" - } - } - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "r8eRryFl9cGf" - }, - "source": [ - "#### Choose CATE model with larger Rscore" - ] - }, - { - "cell_type": "code", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "wTaPVNH39cGf", - "outputId": "bb85de97-45cd-4a07-d659-38bc56183bfb" - }, - "source": [ - "mdl, score = scorer.best_model([mdl for _, mdl in models])\n", - "mdl" - ], - "execution_count": null, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "" - ] - }, - "metadata": { - "tags": [] - }, - "execution_count": 57 - } - ] - }, - { - "cell_type": "code", - "metadata": { - "id": "cULnL_mZ9cGg" - }, - "source": [ - "rootpehe_best = np.sqrt(np.mean((expected_te_val.flatten() - mdl.effect(XW_val).flatten())**2))" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 295 - }, - "id": "kLwPBbku9cGg", - "outputId": "125c9a16-1d42-4800-8ee5-c77c24d330f1" - }, - "source": [ - "plt.figure()\n", - "plt.title('RScore: {:.3f}, Root-PEHE: {:.3f}'.format(score, rootpehe_best))\n", - "plt.scatter(XW_val[:, 0], mdl.effect(XW_val), label='best')\n", - "plt.plot(XW_val[:, 0], 1 + XW_val[:, 0], 'b--', label='True effect')\n", - "plt.ylabel('Treatment Effect')\n", - "plt.xlabel('x')\n", - "plt.legend()\n", - "plt.show()" - ], - "execution_count": null, - "outputs": [ - { - "output_type": "display_data", - "data": { - "image/png": "\n", - "text/plain": [ - "
" - ] - }, - "metadata": { - "tags": [], - "needs_background": "light" - } - } - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "AWpMuY_49cGg" - }, - "source": [ - "# 5. Interpretation" - ] - }, - { - "cell_type": "code", - "metadata": { - "id": "PkjmyQgj9cGg" - }, - "source": [ - "y, T, X, W = gen_data(2000, discrete=True)" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "S53DNzMd9cGg" - }, - "source": [ - "#### Fit any CATE model" - ] - }, - { - "cell_type": "code", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "XoHzf8Lo9cGg", - "outputId": "e5b36880-2294-4fbb-d4f7-b4e2535f1d9b" - }, - "source": [ - "from econml.dml import NonParamDML\n", - "\n", - "est = NonParamDML(model_y=RandomForestRegressor(min_samples_leaf=10), # Any ML model for E[Y|X,W]\n", - " model_t=RandomForestClassifier(min_samples_leaf=10), # Any ML model for E[T|X,W]\n", - " model_final=RandomForestRegressor(max_depth=2), # Any ML model for CATE\n", - " discrete_treatment=True, # categorical or continuous treatment\n", - " cv=5, # number of crossfit folds\n", - " mc_iters=1) # repetitions of cross-fitting for stability\n", - "\n", - "est.fit(y, T, X=X, W=W, cache_values=True) # fit the CATE model" - ], - "execution_count": null, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "" - ] - }, - "metadata": { - "tags": [] - }, - "execution_count": 61 - } - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "2N5UElbB9cGh" - }, - "source": [ - "#### Interpret its behavior with a single Tree" - ] - }, - { - "cell_type": "code", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "YnMz_1Hr9cGh", - "outputId": "cbca0155-0c49-424b-9366-c0496df4e3b5" - }, - "source": [ - "from econml.cate_interpreter import SingleTreeCateInterpreter\n", - "\n", - "intrp = SingleTreeCateInterpreter(max_depth=1)\n", - "intrp.interpret(est, 
X)" - ], - "execution_count": null, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "" - ] - }, - "metadata": { - "tags": [] - }, - "execution_count": 62 - } - ] - }, - { - "cell_type": "code", - "metadata": { - "id": "JVQosW2K9cGh" - }, - "source": [ - "intrp.export_graphviz(out_file='cate_tree.dot')" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 248 - }, - "id": "Rvq9jg649cGh", - "outputId": "b43e7d52-09d1-45cd-b21f-08d2f51cfa55" - }, - "source": [ - "intrp.plot()" - ], - "execution_count": null, - "outputs": [ - { - "output_type": "display_data", - "data": { - "image/png": "\n", - "text/plain": [ - "
" - ] - }, - "metadata": { - "tags": [], - "needs_background": "light" - } - } - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "WCtJ48zI9cGi" - }, - "source": [ - "#### Make tree-based policy recommendations from CATE model" - ] - }, - { - "cell_type": "code", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "SZL7vB1_9cGi", - "outputId": "b085ab61-7631-4959-b5e8-ed458e8e80d1" - }, - "source": [ - "from econml.cate_interpreter import SingleTreePolicyInterpreter\n", - "\n", - "intrp = SingleTreePolicyInterpreter(max_depth=1)\n", - "intrp.interpret(est, X, sample_treatment_costs=0.2)" - ], - "execution_count": null, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "" - ] - }, - "metadata": { - "tags": [] - }, - "execution_count": 65 - } - ] - }, - { - "cell_type": "code", - "metadata": { - "id": "-tWTr9ha9cGi" - }, - "source": [ - "intrp.export_graphviz(out_file='policy_tree.dot')" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 279 - }, - "id": "J90REc4o9cGi", - "outputId": "44988977-d39f-4fec-cca2-fe02014c6bc9" - }, - "source": [ - "intrp.plot()" - ], - "execution_count": null, - "outputs": [ - { - "output_type": "display_data", - "data": { - "image/png": "\n", - "text/plain": [ - "
" - ] - }, - "metadata": { - "tags": [], - "needs_background": "light" - } - } - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "YMzp0si39cGj" - }, - "source": [ - "#### Interpret CATE model with SHAP" - ] - }, - { - "cell_type": "code", - "metadata": { - "id": "FS1Uy2V-9cGj" - }, - "source": [ - "shap_values = est.shap_values(X)" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 194 - }, - "id": "cXdMGuBY9cGj", - "outputId": "f980abf2-c1ed-4305-f76c-48fae99331d8" - }, - "source": [ - "import shap\n", - "\n", - "# effect heterogeneity feature importances with summary plot\n", - "shap.summary_plot(shap_values['Y0']['T0_1'])" - ], - "execution_count": null, - "outputs": [ - { - "output_type": "display_data", - "data": { - "image/png": "\n", - "text/plain": [ - "
" - ] - }, - "metadata": { - "tags": [], - "needs_background": "light" - } - } - ] - }, - { - "cell_type": "code", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 193 - }, - "id": "rowyWQ_l9cGj", - "outputId": "0dc74436-6ca5-417b-e45c-b260c20b6e19" - }, - "source": [ - "shap.initjs()\n", - "# explain the heterogeneity of the effect of any single sample\n", - "shap.force_plot(shap_values['Y0']['T0_1'][0])" - ], - "execution_count": null, - "outputs": [ - { - "output_type": "display_data", - "data": { - "text/html": [ - "
" - ], - "text/plain": [ - "" - ] - }, - "metadata": { - "tags": [] - } - }, - { - "output_type": "execute_result", - "data": { - "text/html": [ - "\n", - "
\n", - "
\n", - " Visualization omitted, Javascript library not loaded!
\n", - " Have you run `initjs()` in this notebook? If this notebook was from another\n", - " user you must also trust this notebook (File -> Trust notebook). If you are viewing\n", - " this notebook on github the Javascript has been stripped for security. If you are using\n", - " JupyterLab this error is because a JupyterLab extension has not yet been written.\n", - "
\n", - " " - ], - "text/plain": [ - "" - ] - }, - "metadata": { - "tags": [] - }, - "execution_count": 70 - } - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "dA1nRbFK9cGk" - }, - "source": [ - "# 5. Validation and Sensitivity" - ] - }, - { - "cell_type": "code", - "metadata": { - "id": "Nds7Fwyg9cGl" - }, - "source": [ - "y, T, Z, X, W = gen_data_iv(2000)" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "-zHcWkF19cGl" - }, - "source": [ - "#### Instantiate any CATE model" - ] - }, - { - "cell_type": "code", - "metadata": { - "id": "0nMA97Jh9cGl" - }, - "source": [ - "from econml.iv.dml import OrthoIV\n", - "\n", - "est = OrthoIV(model_y_xw=RandomForestRegressor(), # ML model for E[Y|X,W]\n", - " model_t_xw=RandomForestRegressor(), # ML model for E[T|X,W]\n", - " model_z_xw=RandomForestRegressor(), # ML model for E[Z|X,W]\n", - " discrete_treatment=False, # categorical/continuous treatment\n", - " discrete_instrument=False, # categorical/continuous instrument\n", - " cv=2, # number of crossfit folds\n", - " mc_iters=1) # repetitions of cross-fitting for stability" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "Z91lWEiQ9cGl" - }, - "source": [ - "#### Enable DoWhy capabilities" - ] - }, - { - "cell_type": "code", - "metadata": { - "id": "QEe9XMai9cGm" - }, - "source": [ - "import dowhy\n", - "est = est.dowhy" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "BIWgh7Ce9cGm" - }, - "source": [ - "#### Then fit" - ] - }, - { - "cell_type": "code", - "metadata": { - "id": "roPf-t439cGm", - "colab": { - "base_uri": "https://localhost:8080/" - }, - "outputId": "4d24cac9-58fa-44fa-87e9-f8e57ec68176" - }, - "source": [ - "est.fit(y, T, Z=Z, X=X, W=W, cache_values=True)" - ], - "execution_count": null, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain":
[ - "" - ] - }, - "metadata": { - "tags": [] - }, - "execution_count": 74 - } - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "5tlyZMcP9cGn" - }, - "source": [ - "#### Use it as a normal EconML cate estimator" - ] - }, - { - "cell_type": "code", - "metadata": { - "id": "lHr9Ooaa9cGn", - "colab": { - "base_uri": "https://localhost:8080/", - "height": 290 - }, - "outputId": "e86a4da8-cbac-4271-ae99-8d312fd1949c" - }, - "source": [ - "est.summary()" - ], - "execution_count": null, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/html": [ - "\n", - "\n", - "\n", - " \n", - "\n", - "\n", - " \n", - "\n", - "\n", - " \n", - "\n", - "
Coefficient Results
point_estimate stderr zstat pvalue ci_lower ci_upper
X0 0.99 0.035 27.933 0.0 0.932 1.049
X1 -0.012 0.027 -0.441 0.659 -0.057 0.033
\n", - "\n", - "\n", - "\n", - " \n", - "\n", - "\n", - " \n", - "\n", - "
CATE Intercept Results
point_estimate stderr zstat pvalue ci_lower ci_upper
cate_intercept 1.018 0.026 38.577 0.0 0.975 1.061


A linear parametric conditional average treatment effect (CATE) model was fitted:
$Y = \\Theta(X)\\cdot T + g(X, W) + \\epsilon$
where for every outcome $i$ and treatment $j$ the CATE $\\Theta_{ij}(X)$ has the form:
$\\Theta_{ij}(X) = \\phi(X)' coef_{ij} + cate\\_intercept_{ij}$
where $\\phi(X)$ is the output of the `featurizer` or $X$ if `featurizer`=None. Coefficient Results table portrays the $coef_{ij}$ parameter vector for each outcome $i$ and treatment $j$. Intercept Results table portrays the $cate\\_intercept_{ij}$ parameter.
" - ], - "text/plain": [ - "\n", - "\"\"\"\n", - " Coefficient Results \n", - "========================================================\n", - " point_estimate stderr zstat pvalue ci_lower ci_upper\n", - "--------------------------------------------------------\n", - "X0 0.99 0.035 27.933 0.0 0.932 1.049\n", - "X1 -0.012 0.027 -0.441 0.659 -0.057 0.033\n", - " CATE Intercept Results \n", - "====================================================================\n", - " point_estimate stderr zstat pvalue ci_lower ci_upper\n", - "--------------------------------------------------------------------\n", - "cate_intercept 1.018 0.026 38.577 0.0 0.975 1.061\n", - "--------------------------------------------------------------------\n", - "\n", - "A linear parametric conditional average treatment effect (CATE) model was fitted:\n", - "$Y = \\Theta(X)\\cdot T + g(X, W) + \\epsilon$\n", - "where for every outcome $i$ and treatment $j$ the CATE $\\Theta_{ij}(X)$ has the form:\n", - "$\\Theta_{ij}(X) = \\phi(X)' coef_{ij} + cate\\_intercept_{ij}$\n", - "where $\\phi(X)$ is the output of the `featurizer` or $X$ if `featurizer`=None. Coefficient Results table portrays the $coef_{ij}$ parameter vector for each outcome $i$ and treatment $j$. Intercept Results table portrays the $cate\\_intercept_{ij}$ parameter.\n", - "\"\"\"" - ] - }, - "metadata": { - "tags": [] - }, - "execution_count": 75 - } - ] - }, - { - "cell_type": "code", - "metadata": { - "id": "o1z0z-E79cGo", - "colab": { - "base_uri": "https://localhost:8080/", - "height": 235 - }, - "outputId": "5cf7bc63-16c6-4eb4-9cd7-3ecd7a7128fd" - }, - "source": [ - "est.effect_inference(X[:5]).summary_frame()" - ], - "execution_count": null, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
point_estimatestderrzstatpvalueci_lowerci_upper
X
00.2860.0456.3620.00.2120.360
12.7300.07138.6180.02.6132.846
20.6780.03022.8660.00.6300.727
30.7600.03819.9510.00.6970.823
42.5390.08629.6050.02.3982.680
\n", - "
" - ], - "text/plain": [ - " point_estimate stderr zstat pvalue ci_lower ci_upper\n", - "X \n", - "0 0.286 0.045 6.362 0.0 0.212 0.360\n", - "1 2.730 0.071 38.618 0.0 2.613 2.846\n", - "2 0.678 0.030 22.866 0.0 0.630 0.727\n", - "3 0.760 0.038 19.951 0.0 0.697 0.823\n", - "4 2.539 0.086 29.605 0.0 2.398 2.680" - ] - }, - "metadata": { - "tags": [] - }, - "execution_count": 76 - } - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "cAPR5eMZ9cGo" - }, - "source": [ - "#### But now we also have DoWhy capabilities: Sensitivity Analysis" - ] - }, - { - "cell_type": "code", - "metadata": { - "id": "9aUCWN489cGo" - }, - "source": [ - "ref_res = est.refute_estimate(method_name=\"add_unobserved_common_cause\",\n", - " effect_strength_on_treatment=0.05,\n", - " effect_strength_on_outcome=0.5)" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "metadata": { - "id": "7bTLzM_H9cGp", - "colab": { - "base_uri": "https://localhost:8080/" - }, - "outputId": "7b5fdb67-e26d-46f0-b46d-572001977533" - }, - "source": [ - "print(ref_res)" - ], - "execution_count": null, - "outputs": [ - { - "output_type": "stream", - "text": [ - "Refute: Add an Unobserved Common Cause\n", - "Estimated effect:1.0314669223854354\n", - "New effect:0.9983858330683743\n", - "\n" - ], - "name": "stdout" - } - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "MuaLIl6X9cGp" - }, - "source": [ - "# 6.
Policy Learning" - ] - }, - { - "cell_type": "code", - "metadata": { - "id": "Z5NSUCzK9cGp" - }, - "source": [ - "y, T, X, W = gen_data(2000, discrete=True)" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "_oGYT0HP9cGp" - }, - "source": [ - "#### Fit a Doubly Robust policy tree" - ] - }, - { - "cell_type": "code", - "metadata": { - "id": "PLk_-21M9cGp" - }, - "source": [ - "from econml.policy import DRPolicyTree\n", - "\n", - "est = DRPolicyTree(max_depth=2, min_impurity_decrease=0.01, honest=True)" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "DVBwu-oh9cGq", - "outputId": "8972c75a-1793-4611-959c-708e85ea5796" - }, - "source": [ - "est.fit(y, T, X=X, W=W)" - ], - "execution_count": null, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "" - ] - }, - "metadata": { - "tags": [] - }, - "execution_count": 81 - } - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "WNpPMq0Z9cGq" - }, - "source": [ - "#### Visualize treatment policy" - ] - }, - { - "cell_type": "code", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 279 - }, - "id": "4YnPJkJW9cGq", - "outputId": "5b9b02ea-01dd-4c3a-d598-4fb61b8e02ce" - }, - "source": [ - "est.plot()" - ], - "execution_count": null, - "outputs": [ - { - "output_type": "display_data", - "data": { - "image/png": "\n", - "text/plain": [ - "
" - ] - }, - "metadata": { - "tags": [], - "needs_background": "light" - } - } - ] - }, - { - "cell_type": "code", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "JjBym5Iw9cGr", - "outputId": "2fdf00f7-9f6f-494e-be9a-84b85ad7d30c" - }, - "source": [ - "est.feature_importances_" - ], - "execution_count": null, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "array([1., 0.])" - ] - }, - "metadata": { - "tags": [] - }, - "execution_count": 83 - } - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "L7q7UxZG9cGr" - }, - "source": [ - "#### Produce recommended treatments" - ] - }, - { - "cell_type": "code", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "sZ_E-Tjk9cGr", - "outputId": "4773b57f-bc7e-4f70-f65c-50ad9471041e" - }, - "source": [ - "est.predict(X[:100])" - ], - "execution_count": null, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "array([1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1,\n", - " 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1,\n", - " 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1,\n", - " 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1,\n", - " 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1])" - ] - }, - "metadata": { - "tags": [] - }, - "execution_count": 84 - } - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "Zb3MFbCq9cGs" - }, - "source": [ - "#### Fit a Doubly Robust policy forest" - ] - }, - { - "cell_type": "code", - "metadata": { - "id": "N4CAMtPQ9cGs" - }, - "source": [ - "from econml.policy import DRPolicyForest\n", - "\n", - "est = DRPolicyForest(n_estimators=100, max_depth=2,\n", - " min_impurity_decrease=0.01, honest=True)" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "jK5vBCCm9cGs", - 
"outputId": "1b97cc54-6324-42c0-e403-fba3dc3dee14" - }, - "source": [ - "est.fit(y, T, X=X, W=W)" - ], - "execution_count": null, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "" - ] - }, - "metadata": { - "tags": [] - }, - "execution_count": 86 - } - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "lhiGgCKD9cGt" - }, - "source": [ - "#### Produce recommended treatments" - ] - }, - { - "cell_type": "code", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "lAGuJ1wB9cGt", - "outputId": "b1590aad-d1d1-4657-8e65-012ae7abec58" - }, - "source": [ - "est.predict(X[:100])" - ], - "execution_count": null, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "array([1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1,\n", - " 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1,\n", - " 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1,\n", - " 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1,\n", - " 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])" - ] - }, - "metadata": { - "tags": [] - }, - "execution_count": 87 - } - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "3xWXtC8k9cGt" - }, - "source": [ - "#### Plot one of the trees" - ] - }, - { - "cell_type": "code", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 279 - }, - "id": "Xen6bm7Y9cGt", - "outputId": "25f978fc-a5d0-4395-9bb2-a0b089ff2828" - }, - "source": [ - "est.plot(0)" - ], - "execution_count": null, - "outputs": [ - { - "output_type": "display_data", - "data": { - "image/png": "\n", - "text/plain": [ - "
" - ] - }, - "metadata": { - "tags": [], - "needs_background": "light" - } - } - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "HInWzLf-84ta" - }, - "source": [ - "#### Plot decisions as function of covariates" - ] - }, - { - "cell_type": "code", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 265 - }, - "id": "0EdmyR-39cGt", - "outputId": "992dd9e8-abed-4f2b-a3cd-2d12e507ed05" - }, - "source": [ - "plt.scatter(X[:, 0], est.predict(X))\n", - "plt.show()" - ], - "execution_count": null, - "outputs": [ - { - "output_type": "display_data", - "data": { - "image/png": "\n", - "text/plain": [ - "
" - ] - }, - "metadata": { - "tags": [], - "needs_background": "light" - } - } - ] - }, - { - "cell_type": "code", - "metadata": { - "id": "c90e1Sei80dv" - }, - "source": [ - "" - ], - "execution_count": null, - "outputs": [] - } - ] -} \ No newline at end of file