EconML Causal Discovery Example

Posted by: ★排泄ケア相談員の認定試験問題集★ 7/10/2025

EconML

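Overview: EconML is a library developed by the ALICE team at Microsoft Research. It applies machine learning techniques to estimate heterogeneous treatment effects from observational data, and sits at the intersection of economics and machine learning.
Features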

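Estimation methods

Double Machine Learning (linear, sparse linear, generic ML)
Dynamic Double Machine Learning
Causal Forests
Orthogonal Random Forests
Meta-Learners (XLearner, SLearner, TLearner)
Doubly Robust Learners (linear, sparse linear, non-parametric)
Double Machine Learning with instrumental variables (orthogonal, non-parametric)
Doubly Robust Machine Learning with instrumental variables (linear, sparse linear, non-parametric, linear ITT)
Deep Instrumental Variables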
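
To make the first item in the list concrete, here is a minimal sketch of the idea behind Double Machine Learning: residualize the outcome and the treatment on the confounders with cross-fitting, then regress one set of residuals on the other. This is not EconML's API. It uses only NumPy, plain least squares as the nuisance models, and a synthetic dataset invented for illustration (the names `fit_linear` and `dml_linear`, the coefficients, and the true effect of 10 are all assumptions).

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns a predict function."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ coef

def dml_linear(Y, T, W, n_folds=2, seed=0):
    """Cross-fitted residual-on-residual estimate of a constant treatment effect."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(Y)), n_folds)
    y_res = np.empty(len(Y))
    t_res = np.empty(len(Y))
    for k, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
        # Nuisance models are fit on the other folds and applied out of fold.
        y_res[test_idx] = Y[test_idx] - fit_linear(W[train_idx], Y[train_idx])(W[test_idx])
        t_res[test_idx] = T[test_idx] - fit_linear(W[train_idx], T[train_idx])(W[test_idx])
    # Final stage: regress outcome residuals on treatment residuals.
    return float(t_res @ y_res / (t_res @ t_res))

# Hypothetical synthetic data: W confounds both T and Y, true effect is 10.
rng = np.random.default_rng(42)
n = 5000
W = rng.normal(size=(n, 4))
T = W @ np.array([1.0, -0.5, 0.3, 0.8]) + rng.normal(size=n)
Y = 10.0 * T + W @ np.array([2.0, 1.0, -1.0, 0.5]) + rng.normal(size=n)

naive = float(T @ Y / (T @ T))   # slope of Y on T alone, ignoring W
theta_hat = dml_linear(Y, T, W)
print(f"naive slope: {naive:.3f}, DML estimate: {theta_hat:.3f}")
```

The naive slope absorbs part of the confounders' influence, while the residual-on-residual estimate should land close to the true effect. In practice the nuisance models would be flexible learners rather than least squares, which is where the "generic ML" variant comes in.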

Interpretability

Tree interpretation of CATE models
Policy interpretation of CATE models
SHAP value interpretation of CATE models

Causal model selection and cross-validation

Causal model selection with RScorer
First-stage model selection
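
As a rough illustration of what scoring a causal model means, the sketch below computes an R-loss style score: how much better a candidate CATE function explains the outcome residuals than the best constant effect does. It is written in the spirit of the R-learner loss and is not a copy of EconML's RScorer, whose details may differ. The function name `r_score` and the simulated residuals are assumptions made for this example; a real workflow would obtain the residuals from fitted first-stage models.

```python
import numpy as np

def r_score(y_res, t_res, cate_pred):
    """1 - (R-loss of the candidate CATE) / (R-loss of the best constant effect)."""
    loss = np.mean((y_res - cate_pred * t_res) ** 2)
    const = (t_res @ y_res) / (t_res @ t_res)      # best constant effect
    base = np.mean((y_res - const * t_res) ** 2)
    return 1.0 - loss / base

# Hypothetical residuals: the true effect varies with x as tau(x) = 1 + 2x.
rng = np.random.default_rng(0)
n = 4000
x = rng.normal(size=n)
t_res = rng.normal(size=n)
y_res = (1.0 + 2.0 * x) * t_res + 0.5 * rng.normal(size=n)

score_good = r_score(y_res, t_res, 1.0 + 2.0 * x)   # captures the heterogeneity
score_flat = r_score(y_res, t_res, np.full(n, 1.0))  # ignores the heterogeneity
print(f"heterogeneous model: {score_good:.3f}, constant model: {score_flat:.3f}")
```

A score near 1 means the candidate explains most of the effect heterogeneity, and a score at or below 0 means it does no better than a single constant effect.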

Inference

Effect inference summary
Population summary
Parameter inference summary

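Policy learning

Doubly robust policy learning with DRPolicyTree and DRPolicyForest

Applications: used for recommendation A/B testing, customer segmentation, multi-investment attribution analysis, and similar problems. See the EconML documentation for details.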

!pip install dowhy
!pip install econml
%load_ext autoreload
%autoreload 2
import numpy as np
import pandas as pd
import logging

import dowhy
from dowhy import CausalModel
import dowhy.datasets

import econml
import warnings
warnings.filterwarnings('ignore')

BETA = 10
# Synthetic linear dataset whose true treatment effect is known
data = dowhy.datasets.linear_dataset(BETA, num_common_causes=4, num_samples=10000,
                                     num_instruments=2, num_effect_modifiers=2,
                                     num_treatments=1,
                                     treatment_is_binary=False,
                                     num_discrete_common_causes=2,
                                     num_discrete_effect_modifiers=0,
                                     one_hot_encode=False)
df = data['df']
print(df.head())
print("True causal estimate is", data["ate"])

# Build the causal model from the graph bundled with the dataset
model = CausalModel(data=data["df"],
                    treatment=data["treatment_name"], outcome=data["outcome_name"],
                    graph=data["gml_graph"])
# view_model() renders the graph and saves it as causal_model.png
model.view_model()
from IPython.display import Image, display
display(Image(filename="causal_model.png"))
Output

         X0         X1   Z0        Z1         W0         W1  W2  W3         v0  \
0 -1.628404   0.563526  1.0  0.994155   0.952486  -0.899368   1   2  34.761490
1  0.046193  -0.490712  1.0  0.408218   0.249081  -1.913044   0   3  19.561881
2  0.147865   1.882448  1.0  0.280972   1.178634  -2.756998   2   1  18.687156
3 -2.653401   1.554569  0.0  0.285960   0.141632   0.302597   2   1  18.199990
4  0.421991   0.286150  1.0  0.233265  -0.357421  -2.827680   3   3  26.263581

            y
0  234.154258
1  176.895732
2  317.697445
3  121.505832
4  337.973534
True causal estimate is 11.471666628263067

Source: https://www.pywhy.org/dowhy/v0.11/example_notebooks/dowhy_causal_discovery_example.html

import dowhy
from dowhy import CausalModel

import numpy as np
import pandas as pd
import graphviz
import networkx as nx

np.set_printoptions(precision=3, suppress=True)
np.random.seed(0)

# Load the Auto MPG dataset (whitespace-delimited, no header row)
data_mpg = pd.read_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data-original',
                       sep=r'\s+', header=None,
                       names=['mpg', 'cylinders', 'displacement',
                              'horsepower', 'weight', 'acceleration',
                              'model year', 'origin', 'car name'])

def make_graph(adjacency_matrix, labels=None):
    idx = np.abs(adjacency_matrix) > 0.01
    dirs = np.where(idx)
    d = graphviz.Digraph(engine='dot')
    names = labels if labels else [f'x{i}' for i in range(len(adjacency_matrix))]
    for name in names:
        d.node(name)
    for to, from_, coef in zip(dirs[0], dirs[1], adjacency_matrix[idx]):
        d.edge(names[from_], names[to], label=str(coef))
    return d

def str_to_dot(string):
    '''
    Converts input string from graphviz library to valid DOT graph format.
    '''
    graph = string.strip().replace('\n', ';').replace('\t','')
    graph = graph[:9] + graph[10:-2] + graph[-1] # Removing unnecessary characters from string
    return graph

# Drop rows with missing values and the columns not used for discovery
data_mpg.dropna(inplace=True)
data_mpg.drop(['model year', 'origin', 'car name'], axis=1, inplace=True)
print(data_mpg.shape)
print(data_mpg.head())

from causallearn.search.ConstraintBased.PC import pc

labels = [str(col) for col in data_mpg.columns]
data = data_mpg.to_numpy()

# Run the PC algorithm with causal-learn's default settings
cg = pc(data)

# Visualization using pydot
from causallearn.utils.GraphUtils import GraphUtils
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import io

pyd = GraphUtils.to_pydot(cg.G, labels=labels)
tmp_png = pyd.create_png(f="png")
fp = io.BytesIO(tmp_png)
img = mpimg.imread(fp, format='png')
plt.axis('off')
plt.imshow(img)
plt.show()

Output

(392, 6)
    mpg  cylinders  displacement  horsepower  weight  acceleration
0  18.0        8.0         307.0       130.0  3504.0          12.0
1  15.0        8.0         350.0       165.0  3693.0          11.5
2  18.0        8.0         318.0       150.0  3436.0          11.0
3  16.0        8.0         304.0       150.0  3433.0          12.0
4  17.0        8.0         302.0       140.0  3449.0          10.5

Depth=3, working on node 5: 100% 6/6 [00:00<00:00, 167.53it/s]
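
The PC algorithm run above decides which edges to remove by testing conditional independence between pairs of variables. As far as I know, causal-learn's `pc` uses a Fisher-z partial correlation test by default, so here is a minimal NumPy sketch of that kind of test. The function name `fisher_z_pvalue` and the three-variable chain A -> B -> C are assumptions made up for this example, and the sketch is not causal-learn's own implementation.

```python
import math
import numpy as np

def fisher_z_pvalue(data, i, j, cond=()):
    """p-value for H0: column i is independent of column j given columns `cond`."""
    n = data.shape[0]
    idx = [i, j, *cond]
    corr = np.corrcoef(data[:, idx], rowvar=False)
    prec = np.linalg.inv(corr)
    # Partial correlation of i and j given cond, read off the precision matrix.
    r = -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])
    r = min(max(r, -0.999999), 0.999999)
    z = 0.5 * math.log((1 + r) / (1 - r))           # Fisher z-transform
    stat = math.sqrt(n - len(cond) - 3) * abs(z)
    # Two-sided p-value under a standard normal reference distribution.
    return math.erfc(stat / math.sqrt(2))

# Hypothetical chain A -> B -> C: A and C are dependent, but not given B.
rng = np.random.default_rng(0)
n = 2000
A = rng.normal(size=n)
B = 2.0 * A + rng.normal(size=n)
C = -1.0 * B + rng.normal(size=n)
chain = np.column_stack([A, B, C])

p_marginal = fisher_z_pvalue(chain, 0, 2)
p_conditional = fisher_z_pvalue(chain, 0, 2, cond=(1,))
print("A vs C:", p_marginal, "| A vs C given B:", p_conditional)
```

PC would keep the A-C edge while only the marginal test is considered and drop it once conditioning on B makes the dependence vanish. The test assumes roughly linear, Gaussian relationships, which is worth keeping in mind when reading the graph learned from the Auto MPG columns.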