PFP descriptorsとShapley値を組み合わせた原子レベルでの予測の解釈 – Matlantis株式会社

PFP descriptorsはPFPの最終層の情報を出力する機能です。各原子に周辺環境を表現するための256次元のベクトルが割り当てられており、このベクトルを用いることで、高精度で材料の物性予測できることが報告されています[1]。

ここでは、PFP descriptorsとshapley値を用いて、機械学習モデルの予測を解釈する手法について解説します。

本内容は第86回応用物理学会秋季学術講演会"PFP descriptorsとShapley値を組み合わせた原子レベルでの予測の解釈"で発表した内容です。実装はMatlantisのExample内で公開しており、お使いいただけます。

Exampleは以下で確認できます。
Example Launcher -> Matlantis Examples -> Miscellaneous -> dielectric_prediction_with_descriptors.ipynb

Shapley値とは

Shapley値は、得られた利得を、各プレイヤーの貢献度に応じて公平に分配するための計算方法です。

中心的なアイデアは、「もしその人がチームに参加したら、全体の利益はどれくらい増えたか？」を、あらゆるパターンで考え、その平均値をそのプレイヤーの貢献度（取り分）とする、というものです。

具体例

A, B, Cの３人がおり、参加プレイヤーによって得られる利得がTable1のように定義されているとします。A, B, C３人が参加した際、利得をどのように分配すればいいかを考えます。

プレイヤー	利得
誰も参加しない	0
A	1
B	2
C	3
A, B	6
A, C	7
B, C	8
A, B, C	15

Table1. プレイヤーと利得の関係

あるメンバーの貢献度を、そのプレイヤーが追加された際に得られる利得がどれだけ増えるかを基準に考えます。しかし、既に入っているメンバーによって、増える利得は変わります。

例えば、誰もいない（利得:0）ところにAが参加した(利得:1)場合、Aの貢献度は1になり、Bがいる(利得:2)ところにAが参加した場合、Aの貢献度は4になります。

そのため、既に入っているメンバーの全パターンを考え、平均することで各メンバーの貢献度を定義します。

具体的な計算はTable2のようになります。Table2からもわかるように、sharpley値は、各メンバーの貢献度の合計が全メンバーが参加したときの利得と一致するという良い性質を有しています。

参加順	Aの貢献度	Bの貢献度	Cの貢献度
A -> B -> C	1	5	9
A -> C-> B	1	8	6
B -> A -> C	4	2	9
B -> C -> A	7	2	6
C -> A -> B	4	8	3
C -> B -> A	7	5	3
平均	4	5	6

Table2. 各プレイヤーの貢献度

一般化と近似計算

上記操作を一般化すると以下の式になります。

phi_iはプレイヤーiの貢献度、Nはプレイヤーの全体集合、Sは部分集合、|S|は部分集合に入っているプレイヤー数、|N|は全プレイヤー数、νは利得を計算するための価値関数です。

Shapley値は良い性質を有しますが、全プレイヤー数がnのとき、計算オーダーはO(2ⁿ)と非常に高いです。そこでShapley値を以下のShapley Kernelで重み付けた線形回帰で近似する手法が提案されました[2]。

z'は各プレイヤーが参加しているかを表すone hot vector、Mは全プレイヤー数です。

Shapley kernelで近似できることの証明は論文をご参照ください。以下のコードを実行することで、上記例をShapley kernelで計算できることを確認できます。np.hstack((np.ones((X.shape[0], 1)), X))は、全原子がアクティブではない場合の価値関数の値を計算するために入れています。

import numpy as np
import itertools
import math

coalition_to_value = {
    (0, 0, 0): 0,
    (1, 0, 0): 1,  # A
    (0, 1, 0): 2,  # B
    (0, 0, 1): 3,  # C
    (1, 1, 0): 6,  # A, B
    (1, 0, 1): 7,  # A, C
    (0, 1, 1): 8,  # B, C
    (1, 1, 1): 15, # A, B, C
}

M = 3

coalitions_binary = list(itertools.product([0, 1], repeat=M))
X = np.array(coalitions_binary)

y = np.array([coalition_to_value[c] for c in coalitions_binary])

def shapley_kernel_weight(M, s):
    if s == 0 or s == M:
        return 1e9 
    else:
        combinations_term = math.comb(M, s)
        denominator = combinations_term * s * (M - s)
        kernel_weight = (M - 1) / denominator
        return kernel_weight

kernel_weights = np.array([shapley_kernel_weight(M, np.sum(c)) for c in coalitions_binary])

X_matrix = np.hstack((np.ones((X.shape[0], 1)), X))
    
sqrt_weights = np.sqrt(kernel_weights)
X_weighted = X_matrix * sqrt_weights[:, np.newaxis]
y_weighted = y * sqrt_weights

coeffs, _, _, _ = np.linalg.lstsq(X_weighted, y_weighted, rcond=None)

phi_0 = coeffs[0]
shapley_values = coeffs[1:]

print(f"{phi_0=}")
print(f"{shapley_values=}")

PFP descriptorsとShapley値を用いた機械学習モデルの解釈

Shapley値のNを解釈したい材料のPFP descriptors、iを解釈したい原子とすれば、各原子の予測値に対するShapley値を計算することができます。公開している実装は以下です。

value_functionはShapley値の定義式のνに相当するものです。

この関数はPFP descriptorsと参加するプレイヤーを意味するactive_atom_mask、学習済みの機械モデルmodelを引数に取ります。参加しているプレイヤー（原子）以外をmaskして、activeな原子数×256の行列になっているので、何かしらの方法でベクトルに変換し、機械学習モデルに入力して予測値（利得）を計算します。ここでは、各次元で合計してベクトル化しており、誰も参加していない場合は0ベクトルとしています。

calc_kernnel_shapではshapley kernelを用いた重み付き線形回帰でShapley値を計算している。引数はPFP descriptorsであるatom_features、価値関数value_function、学習済みモデルmodel、サンプリング回数n_kernel_samplesです。

#1では全原子が参加している場合と参加しない場合の計算を行っています。

#2では、いくつかの原子がmaskされたデータをサンプリングして計算しています。

#3では、ここまでにサンプリングしたデータに対して重み付き線形回帰を実行しています。

def value_function(all_atom_features, active_atoms_mask, model):
    active_features = all_atom_features[active_atoms_mask]

    if active_features.shape[0] == 0:
        input_value = np.zeros_like(all_atom_features[0])
    else:
        input_value = np.sum(active_features, axis=0)

    value = model.predict(input_value.reshape(1, -1))

    return float(value.item())


def calc_kernel_shap(atom_features, value_function, model, n_kernel_samples=None):
    M = atom_features.shape[0] # Total number of atoms
    if M == 0:
        raise ValueError("The input 'atom_features' contains no atoms (number of atoms is 0).")

    if n_kernel_samples is None:
        n_kernel_samples = min(2**M, 2*M+2048)
    else:
        n_kernel_samples = min(2**M, n_kernel_samples)
    
    Z_prime = np.zeros((n_kernel_samples, M))
    y_target = np.zeros(n_kernel_samples)
    kernel_weights = np.zeros(n_kernel_samples)

    # 1. Set up special coalitions: all atoms included and no atoms included
    # Coalition with all atoms included
    Z_prime[0, :] = 1
    y_target[0] = value_function(atom_features, np.ones(M, dtype=bool), model)
    kernel_weights[0] = 1e9

    # Coalition with no atoms included
    Z_prime[1, :] = 0
    y_target[1] = value_function(atom_features, np.zeros(M, dtype=bool), model)
    kernel_weights[1] = 1e9

    # 2. Sample random coalitions
    # Use a set to store tuples of already generated random coalitions to avoid duplicates
    sampled_coalitions_set = set()
    num_sampled_so_far = 2 # The first two coalitions are fixed

    while num_sampled_so_far < n_kernel_samples:
        z_prime_candidate_arr = np.random.randint(0, 2, M)
        s = np.sum(z_prime_candidate_arr)

        if s == 0 or s == M: # Skip cases where all atoms are on/off, as they are already handled
            continue

        z_prime_candidate_tuple = tuple(z_prime_candidate_arr)
        if z_prime_candidate_tuple in sampled_coalitions_set:
            continue # Resample if this coalition is a duplicate
        
        sampled_coalitions_set.add(z_prime_candidate_tuple)
        Z_prime[num_sampled_so_far, :] = z_prime_candidate_arr
        
        y_target[num_sampled_so_far] = value_function(atom_features, z_prime_candidate_arr.astype(bool), model)
        
        # Calculate the Shapley kernel weight
        combinations_term = math.comb(M, s)
        denominator = combinations_term * s * (M - s)
        kernel_weights[num_sampled_so_far] = (M - 1) / denominator
            
        num_sampled_so_far += 1
    
    # 3. Perform weighted linear regression
    # Prepend a column of ones to Z_prime to create the design matrix X for the intercept term
    X_matrix = np.hstack((np.ones((n_kernel_samples, 1)), Z_prime))
    
    sqrt_weights = np.sqrt(kernel_weights)
    X_weighted = X_matrix * sqrt_weights[:, np.newaxis] # Element-wise multiplication using broadcasting
    y_weighted = y_target * sqrt_weights

    coeffs, _, _, _ = np.linalg.lstsq(X_weighted, y_weighted, rcond=None)
    shapley_values = coeffs[1:]  # The first coefficient is the intercept (phi_0), the rest are the Shapley values
    phi_0 = coeffs[0]
    
    return shapley_values, phi_0

分子を解釈するときに原子単位ではなく、官能基単位で解釈したいときがあるかと思います。

Shapley値のNを解釈したい分子のPFP descriptors、iを解釈したい官能基とすれば、各官能基の予測値に対するShapley値を計算することができます。実装としては以下のようになります。functional_group設定して、官能基単位でmaskできる。こうすることで、厳密解に必要なsample数が2**原子数から2**官能基数になるので、粗視化されることによる解釈性の向上だけでなく、計算コストの低減も見込めます。

from itertools import chain

def calc_group_kernel_shap(atom_features, value_function, model, functional_group=None, n_kernel_samples=None):
    
    atoms_number = atom_features.shape[0]
    
    # process functional_group
    processed_functional_group = []
    if functional_group:
        # copy
        processed_functional_group = [list(g) for g in functional_group]
        
        # index set in functional_group
        grouped_atoms = set(chain.from_iterable(processed_functional_group))
    else:
        grouped_atoms = set()

    all_atoms = set(range(atoms_number))
    
    ungrouped_atoms = all_atoms - grouped_atoms
    for atom_idx in sorted(list(ungrouped_atoms)):
        processed_functional_group.append([atom_idx])
    
    # the number of functional groups
    M = len(processed_functional_group)

    # n_kernel_samples
    max_coalitions = 2**M
    if n_kernel_samples is None:
        n_kernel_samples = min(max_coalitions, 2 * M + 2048)
    else:
        n_kernel_samples = min(max_coalitions, n_kernel_samples)
    
    if n_kernel_samples < 2:
        n_kernel_samples = 2

    Z_prime = np.zeros((n_kernel_samples, M))
    y_target = np.zeros(n_kernel_samples)
    kernel_weights = np.zeros(n_kernel_samples)

    # 1. Set up special coalitions: all functional groups included and no functional groups included
    Z_prime[0, :] = 1
    y_target[0] = value_function(atom_features, np.ones(atoms_number, dtype=bool), model)
    kernel_weights[0] = 1e9

    Z_prime[1, :] = 0
    y_target[1] = value_function(atom_features, np.zeros(atoms_number, dtype=bool), model)
    kernel_weights[1] = 1e9

    # 2. Sample random coalitions
    sampled_coalitions_set = {tuple(Z_prime[0]), tuple(Z_prime[1])}
    num_sampled_so_far = 2

    while num_sampled_so_far < n_kernel_samples:
        z_prime_candidate_arr = np.random.randint(0, 2, M)
        s = np.sum(z_prime_candidate_arr)

        if s == 0 or s == M:
            continue

        z_prime_candidate_tuple = tuple(z_prime_candidate_arr)
        if z_prime_candidate_tuple in sampled_coalitions_set:
            continue
        
        sampled_coalitions_set.add(z_prime_candidate_tuple)
        Z_prime[num_sampled_so_far, :] = z_prime_candidate_arr
        
        # convert functional group mask to atom mask
        active_atoms_mask = np.zeros(atoms_number, dtype=bool)
        active_group_indices = np.where(z_prime_candidate_arr == 1)[0]
        for group_idx in active_group_indices:
            atom_indices_in_group = processed_functional_group[group_idx]
            active_atoms_mask[atom_indices_in_group] = True
        
        y_target[num_sampled_so_far] = value_function(atom_features, active_atoms_mask, model)
        
        # Calculate the Shapley kernel weight
        combinations_term = math.comb(M, s)
        denominator = combinations_term * s * (M - s)
        kernel_weights[num_sampled_so_far] = (M - 1) / denominator if denominator > 0 else 0
            
        num_sampled_so_far += 1
    
    # 3. Perform weighted linear regression
    X_matrix = np.hstack((np.ones((n_kernel_samples, 1)), Z_prime))
    
    sqrt_weights = np.sqrt(kernel_weights)
    X_weighted = X_matrix * sqrt_weights[:, np.newaxis]
    y_weighted = y_target * sqrt_weights

    coeffs, _, _, _ = np.linalg.lstsq(X_weighted, y_weighted, rcond=None)
    
    phi_0 = coeffs[0]
    group_shapley_values = coeffs[1:]

    # 4. Divide the Shapley Value of functional groups into each atom.
    atom_shapley_values = np.zeros(atoms_number)
    for i, group_shap in enumerate(group_shapley_values):
        group_atom_indices = processed_functional_group[i]
        num_atoms_in_group = len(group_atom_indices)
        if num_atoms_in_group > 0:
            atom_shapley_values[group_atom_indices] = group_shap / num_atoms_in_group
            
    return atom_shapley_values, phi_0

References

[1] Z. Mao et. al., npj Comput. Mater. 10, 265 (2024).

[2] S. M. Lundberg, S. I. Lee., Adv. Neural Inf. Process. Syst. 30 (2017).

Shapley値とは

具体例

一般化と近似計算

PFP descriptorsとShapley値を用いた機械学習モデルの解釈

References

関連記事