PFP descriptors are a feature that extracts the output of the final layers of a PFP model. Each atom is assigned a 256-dimensional vector representing its local chemical environment. It has been reported that using these vectors enables highly accurate prediction of material properties [1].
Here, we explain a method for interpreting the predictions of machine learning models using PFP descriptors and Shapley values.
This content is based on a presentation titled "Atomic Level Explanation via PFP Descriptors and Shapley Values" which was given at the 86th JSAP (The Japan Society of Applied Physics) Autumn Meeting. The implementation is in the examples section of Matlantis and is ready for you to use.
The example can be found below.
Example Launcher -> Matlantis Examples -> Miscellaneous -> dielectric_prediction_with_descriptors.ipynb
What are Shapley Values?
Shapley values provide a method for fairly dividing a total gain among players according to their individual contributions.
The core idea is to consider every possible order in which a player could have joined the team and calculate the average of their marginal contributions—that is, "how much did the total gain increase when this person joined?" This average is then taken as that player's contribution.
A Concrete Example
Suppose there are three players (A, B, and C) and the gain generated by any combination of these players is defined as shown in Table 1. Let's consider how the gain should be divided when all three players (A, B, and C) participate.
| Players | Gain |
| --- | --- |
| (none) | 0 |
| A | 1 |
| B | 2 |
| C | 3 |
| A, B | 6 |
| A, C | 7 |
| B, C | 8 |
| A, B, C | 15 |
A player's contribution is determined by how much the gain increases when that player is added. However, this increase in gain varies depending on the members who are already present.
For example, if player A joins when no one else is present (gain: 0) and the gain becomes 1, A's contribution is 1. But if A joins when B is already present (gain: 2) and the total gain becomes 6, A's contribution is 4.
Therefore, we define each player's contribution by considering every possible order of entry and averaging the resulting marginal contributions.
A specific calculation is shown in Table 2. As Table 2 shows, the Shapley value has the desirable property that the sum of the players' contributions (4 + 5 + 6 = 15) equals the total gain when all members participate.
| Order of entry | A's contribution | B's contribution | C's contribution |
| --- | --- | --- | --- |
| A -> B -> C | 1 | 5 | 9 |
| A -> C -> B | 1 | 8 | 6 |
| B -> A -> C | 4 | 2 | 9 |
| B -> C -> A | 7 | 2 | 6 |
| C -> A -> B | 4 | 8 | 3 |
| C -> B -> A | 7 | 5 | 3 |
| Average | 4 | 5 | 6 |
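As a quick check, the averages in Table 2 can be reproduced by directly enumerating all orders of entry. The following is a minimal sketch using only the gains from Table 1; it should print 4.0, 5.0, and 6.0 for A, B, and C, respectively.

import itertools
import numpy as np

# Gains from Table 1, keyed by the set of participating players
gains = {
    frozenset(): 0,
    frozenset("A"): 1, frozenset("B"): 2, frozenset("C"): 3,
    frozenset("AB"): 6, frozenset("AC"): 7, frozenset("BC"): 8,
    frozenset("ABC"): 15,
}

players = ["A", "B", "C"]
contributions = {p: [] for p in players}

# For every order of entry, record each player's marginal contribution
for order in itertools.permutations(players):
    coalition = set()
    for p in order:
        before = gains[frozenset(coalition)]
        coalition.add(p)
        after = gains[frozenset(coalition)]
        contributions[p].append(after - before)

# Averaging over all orders of entry gives the Shapley values
for p in players:
    print(p, np.mean(contributions[p]))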
Generalization and Approximate Calculation
Generalizing the above operation yields the following formula:

$$\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|! \, (|N| - |S| - 1)!}{|N|!} \bigl( \nu(S \cup \{i\}) - \nu(S) \bigr)$$

Here, phi_i represents the contribution of player i, N is the set of all players, S is a subset of players not containing i, |S| is the number of players in the subset, |N| is the total number of players, and ν is the value function used to calculate the gain.
Although the Shapley value has desirable properties, its computational complexity is very high: with n players, the number of coalitions to evaluate grows as O(2^n). To address this, a method has been proposed to approximate the Shapley value using a weighted linear regression with the Shapley kernel shown below [2].

$$\pi(z') = \frac{M - 1}{\binom{M}{|z'|} \, |z'| \, (M - |z'|)}$$

Here, z' is a binary vector indicating whether each player is participating, |z'| is the number of participating players, and M is the total number of players.
For the proof that the Shapley value can be approximated with the Shapley kernel, please refer to the paper [2]. You can confirm that the above example is reproduced by the Shapley kernel by running the code below.
The line np.hstack((np.ones((X.shape[0], 1)), X)) prepends a column of ones (an intercept term) so that the regression can also fit phi_0, the value of the value function when no players are participating.
import numpy as np
import itertools
import math

# Table 1: the gain for every coalition of players A, B, and C
coalition_to_value = {
    (0, 0, 0): 0,
    (1, 0, 0): 1,   # A
    (0, 1, 0): 2,   # B
    (0, 0, 1): 3,   # C
    (1, 1, 0): 6,   # A, B
    (1, 0, 1): 7,   # A, C
    (0, 1, 1): 8,   # B, C
    (1, 1, 1): 15,  # A, B, C
}

M = 3  # Total number of players

# Enumerate all 2^M coalitions as binary membership vectors
coalitions_binary = list(itertools.product([0, 1], repeat=M))
X = np.array(coalitions_binary)
y = np.array([coalition_to_value[c] for c in coalitions_binary])

def shapley_kernel_weight(M, s):
    # The Shapley kernel is infinite for the empty and full coalitions;
    # a large finite weight enforces these two constraints in the regression.
    if s == 0 or s == M:
        return 1e9
    combinations_term = math.comb(M, s)
    denominator = combinations_term * s * (M - s)
    kernel_weight = (M - 1) / denominator
    return kernel_weight

kernel_weights = np.array([shapley_kernel_weight(M, np.sum(c)) for c in coalitions_binary])

# Prepend a column of ones so the regression can also fit the intercept phi_0
X_matrix = np.hstack((np.ones((X.shape[0], 1)), X))

# Weighted least squares: scale rows by sqrt(weight) and solve with lstsq
sqrt_weights = np.sqrt(kernel_weights)
X_weighted = X_matrix * sqrt_weights[:, np.newaxis]
y_weighted = y * sqrt_weights
coeffs, _, _, _ = np.linalg.lstsq(X_weighted, y_weighted, rcond=None)

phi_0 = coeffs[0]
shapley_values = coeffs[1:]
print(f"{phi_0=}")
print(f"{shapley_values=}")
Atomic Level Explanation via PFP Descriptors and Shapley Values
If we set N in the Shapley value formula to be the PFP descriptors of the material we want to interpret, and i to be the specific atom of interest, we can calculate the Shapley value for each atom's contribution to the prediction. The public implementation is provided below.
The value_function corresponds to the value function ν in the Shapley value definition. It takes the PFP descriptors, an active_atoms_mask (indicating the participating players), and the pre-trained machine learning model as arguments. It masks out the non-participating atoms, leaving a matrix of size (number of active atoms) × 256. This matrix is then aggregated into a single vector, in this case by summing along each dimension, and passed to the machine learning model to compute the predicted value (the gain). If no atoms are participating, a zero vector is used as the model input.
The calc_kernel_shap function calculates the Shapley values using a weighted linear regression with the Shapley kernel. Its arguments are atom_features (the PFP descriptors), the value_function, the pre-trained model, and n_kernel_samples (the number of coalitions to evaluate, including the two fixed base cases).
#1: It first calculates the outcomes for the two base cases: when all atoms are participating and when no atoms are participating.
#2: It then samples data for various coalitions by masking certain subsets of atoms and calculates their outcomes.
#3: Finally, it performs a weighted linear regression on all the sampled data to compute the final Shapley values.
def value_function(all_atom_features, active_atoms_mask, model):
    # Keep only the descriptors of participating (active) atoms
    active_features = all_atom_features[active_atoms_mask]
    if active_features.shape[0] == 0:
        # Empty coalition: feed the model a zero vector
        input_value = np.zeros_like(all_atom_features[0])
    else:
        # Aggregate atomic descriptors into a single vector by summing
        input_value = np.sum(active_features, axis=0)
    value = model.predict(input_value.reshape(1, -1))
    return float(value.item())

def calc_kernel_shap(atom_features, value_function, model, n_kernel_samples=None):
    M = atom_features.shape[0]  # Total number of atoms
    if M == 0:
        raise ValueError("The input 'atom_features' contains no atoms (number of atoms is 0).")
    if n_kernel_samples is None:
        n_kernel_samples = min(2**M, 2 * M + 2048)
    else:
        n_kernel_samples = min(2**M, n_kernel_samples)

    Z_prime = np.zeros((n_kernel_samples, M))
    y_target = np.zeros(n_kernel_samples)
    kernel_weights = np.zeros(n_kernel_samples)

    # 1. Set up special coalitions: all atoms included and no atoms included
    # Coalition with all atoms included
    Z_prime[0, :] = 1
    y_target[0] = value_function(atom_features, np.ones(M, dtype=bool), model)
    kernel_weights[0] = 1e9
    # Coalition with no atoms included
    Z_prime[1, :] = 0
    y_target[1] = value_function(atom_features, np.zeros(M, dtype=bool), model)
    kernel_weights[1] = 1e9

    # 2. Sample random coalitions
    # Use a set to store tuples of already generated coalitions to avoid duplicates
    sampled_coalitions_set = set()
    num_sampled_so_far = 2  # The first two coalitions are fixed
    while num_sampled_so_far < n_kernel_samples:
        z_prime_candidate_arr = np.random.randint(0, 2, M)
        s = np.sum(z_prime_candidate_arr)
        if s == 0 or s == M:  # Skip all-on/all-off coalitions, as they are already handled
            continue
        z_prime_candidate_tuple = tuple(z_prime_candidate_arr)
        if z_prime_candidate_tuple in sampled_coalitions_set:
            continue  # Resample if this coalition is a duplicate
        sampled_coalitions_set.add(z_prime_candidate_tuple)
        Z_prime[num_sampled_so_far, :] = z_prime_candidate_arr
        y_target[num_sampled_so_far] = value_function(atom_features, z_prime_candidate_arr.astype(bool), model)
        # Calculate the Shapley kernel weight
        combinations_term = math.comb(M, s)
        denominator = combinations_term * s * (M - s)
        kernel_weights[num_sampled_so_far] = (M - 1) / denominator
        num_sampled_so_far += 1

    # 3. Perform weighted linear regression
    # Prepend a column of ones to Z_prime to create the design matrix X for the intercept term
    X_matrix = np.hstack((np.ones((n_kernel_samples, 1)), Z_prime))
    sqrt_weights = np.sqrt(kernel_weights)
    X_weighted = X_matrix * sqrt_weights[:, np.newaxis]  # Element-wise multiplication using broadcasting
    y_weighted = y_target * sqrt_weights
    coeffs, _, _, _ = np.linalg.lstsq(X_weighted, y_weighted, rcond=None)
    phi_0 = coeffs[0]  # The first coefficient is the intercept (phi_0)
    shapley_values = coeffs[1:]  # The remaining coefficients are the Shapley values
    return shapley_values, phi_0
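As a usage illustration, here is a minimal sketch with a toy additive model. The SumModel class and the random descriptors below are hypothetical stand-ins for illustration only, not part of the Matlantis example. Because this model simply sums the entries of the aggregated descriptor vector, each atom's Shapley value should match the sum of that atom's own descriptor.

class SumModel:
    """Hypothetical toy model: predicts the sum of the input vector's entries."""
    def predict(self, X):
        return X.sum(axis=1)

rng = np.random.default_rng(seed=0)
atom_features = rng.random((5, 256))  # 5 atoms with 256-dimensional descriptors

shapley_values, phi_0 = calc_kernel_shap(atom_features, value_function, SumModel())
print(shapley_values)             # Should be close to each atom's own contribution...
print(atom_features.sum(axis=1))  # ...i.e., the sum of that atom's descriptor vector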
When interpreting molecules, you may want to do so at the level of functional groups rather than individual atoms.
By setting N in the Shapley value formula to be the PFP descriptors of the molecule of interest, and i to be the functional group of interest, we can calculate the Shapley value for each functional group's contribution to the prediction. The implementation is as follows.
You can set the functional_group parameter to perform masking at the functional-group level. This reduces the number of coalitions required for an exact solution from 2^(number of atoms) to 2^(number of groups). Therefore, in addition to improving interpretability through coarse-graining, it also significantly reduces computational cost.
from itertools import chain

def calc_group_kernel_shap(atom_features, value_function, model, functional_group=None, n_kernel_samples=None):
    atoms_number = atom_features.shape[0]

    # Process functional_group: copy the given groups, then add every
    # ungrouped atom as its own single-atom group
    processed_functional_group = []
    if functional_group:
        processed_functional_group = [list(g) for g in functional_group]
        # Set of atom indices that already belong to a functional group
        grouped_atoms = set(chain.from_iterable(processed_functional_group))
    else:
        grouped_atoms = set()
    all_atoms = set(range(atoms_number))
    ungrouped_atoms = all_atoms - grouped_atoms
    for atom_idx in sorted(ungrouped_atoms):
        processed_functional_group.append([atom_idx])

    M = len(processed_functional_group)  # The number of functional groups

    # Determine how many coalitions to evaluate
    max_coalitions = 2**M
    if n_kernel_samples is None:
        n_kernel_samples = min(max_coalitions, 2 * M + 2048)
    else:
        n_kernel_samples = min(max_coalitions, n_kernel_samples)
    if n_kernel_samples < 2:
        n_kernel_samples = 2

    Z_prime = np.zeros((n_kernel_samples, M))
    y_target = np.zeros(n_kernel_samples)
    kernel_weights = np.zeros(n_kernel_samples)

    # 1. Set up special coalitions: all functional groups included and none included
    Z_prime[0, :] = 1
    y_target[0] = value_function(atom_features, np.ones(atoms_number, dtype=bool), model)
    kernel_weights[0] = 1e9
    Z_prime[1, :] = 0
    y_target[1] = value_function(atom_features, np.zeros(atoms_number, dtype=bool), model)
    kernel_weights[1] = 1e9

    # 2. Sample random coalitions of functional groups
    sampled_coalitions_set = {tuple(Z_prime[0]), tuple(Z_prime[1])}
    num_sampled_so_far = 2
    while num_sampled_so_far < n_kernel_samples:
        z_prime_candidate_arr = np.random.randint(0, 2, M)
        s = np.sum(z_prime_candidate_arr)
        if s == 0 or s == M:  # Already handled as the two fixed coalitions
            continue
        z_prime_candidate_tuple = tuple(z_prime_candidate_arr)
        if z_prime_candidate_tuple in sampled_coalitions_set:
            continue  # Resample if this coalition is a duplicate
        sampled_coalitions_set.add(z_prime_candidate_tuple)
        Z_prime[num_sampled_so_far, :] = z_prime_candidate_arr
        # Convert the functional-group mask into an atom-level mask
        active_atoms_mask = np.zeros(atoms_number, dtype=bool)
        active_group_indices = np.where(z_prime_candidate_arr == 1)[0]
        for group_idx in active_group_indices:
            atom_indices_in_group = processed_functional_group[group_idx]
            active_atoms_mask[atom_indices_in_group] = True
        y_target[num_sampled_so_far] = value_function(atom_features, active_atoms_mask, model)
        # Calculate the Shapley kernel weight
        combinations_term = math.comb(M, s)
        denominator = combinations_term * s * (M - s)
        kernel_weights[num_sampled_so_far] = (M - 1) / denominator
        num_sampled_so_far += 1

    # 3. Perform weighted linear regression
    X_matrix = np.hstack((np.ones((n_kernel_samples, 1)), Z_prime))
    sqrt_weights = np.sqrt(kernel_weights)
    X_weighted = X_matrix * sqrt_weights[:, np.newaxis]
    y_weighted = y_target * sqrt_weights
    coeffs, _, _, _ = np.linalg.lstsq(X_weighted, y_weighted, rcond=None)
    phi_0 = coeffs[0]
    group_shapley_values = coeffs[1:]

    # 4. Divide each functional group's Shapley value equally among its member atoms
    atom_shapley_values = np.zeros(atoms_number)
    for i, group_shap in enumerate(group_shapley_values):
        group_atom_indices = processed_functional_group[i]
        num_atoms_in_group = len(group_atom_indices)
        if num_atoms_in_group > 0:
            atom_shapley_values[group_atom_indices] = group_shap / num_atoms_in_group

    return atom_shapley_values, phi_0
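Here is a minimal usage sketch for the group-level variant, again with the hypothetical SumModel and the rng from the previous example. Atoms 0-2 form one group and atoms 3-4 another; atom 5 is left ungrouped and automatically becomes a singleton group.

atom_features = rng.random((6, 256))  # 6 atoms with 256-dimensional descriptors
functional_group = [[0, 1, 2], [3, 4]]  # Atom 5 becomes its own group

atom_shapley_values, phi_0 = calc_group_kernel_shap(
    atom_features, value_function, SumModel(), functional_group=functional_group
)
# Each group's Shapley value is divided equally among its member atoms,
# so atoms 0-2 share one value and atoms 3-4 share another.
print(atom_shapley_values)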
References
[1] Z. Mao et al., npj Comput. Mater. 10, 265 (2024).
[2] S. M. Lundberg and S.-I. Lee, Adv. Neural Inf. Process. Syst. 30 (2017).