PFP descriptors are a feature that extracts the output of the final layers of a PFP model. Each atom is assigned a 256-dimensional vector representing its local chemical environment. It has been reported that using these vectors enables highly accurate prediction of material properties [1].
Here, we explain a method for interpreting the predictions of machine learning models using PFP descriptors and Shapley values.
This content is based on a presentation titled "Atomic Level Explanation via PFP Descriptors and Shapley Values" which was given at the 86th JSAP (The Japan Society of Applied Physics) Autumn Meeting. The implementation is in the examples section of Matlantis and is ready for you to use.
The example can be found below.
Example Launcher -> Matlantis Examples -> Miscellaneous -> dielectric_prediction_with_descriptors.ipynb
What are Shapley Values?
Shapley values provide a method for fairly dividing a total gain among players according to their individual contributions.
The core idea is to consider every possible order in which a player could have joined the team and calculate the average of their marginal contributions—that is, "how much did the total gain increase when this person joined?" This average is then taken as that player's contribution.
A Concrete Example
Suppose there are three players (A, B, and C) and the gain generated by any combination of these players is defined as shown in Table 1. Let's consider how the gain should be divided when all three players (A, B, and C) participate.
| Players | Gain |
| --- | --- |
| (none) | 0 |
| A | 1 |
| B | 2 |
| C | 3 |
| A, B | 6 |
| A, C | 7 |
| B, C | 8 |
| A, B, C | 15 |
A player's contribution is determined by how much the gain increases when that player is added. However, this increase in gain varies depending on the members who are already present.
For example, if player A joins when no one else is present (gain: 0) and the gain becomes 1, A's contribution is 1. But if A joins when B is already present (gain: 2) and the total gain becomes 6, A's contribution is 4.
Therefore, we define each player's contribution by considering every possible order of entry and averaging the resulting marginal contributions.
A specific calculation is shown in Table 2. As Table 2 shows, the Shapley value has the desirable property that the sum of the players' contributions (4 + 5 + 6 = 15) equals the total gain when all members participate.
| Order of entry | A's contribution | B's contribution | C's contribution |
| --- | --- | --- | --- |
| A -> B -> C | 1 | 5 | 9 |
| A -> C -> B | 1 | 8 | 6 |
| B -> A -> C | 4 | 2 | 9 |
| B -> C -> A | 7 | 2 | 6 |
| C -> A -> B | 4 | 8 | 3 |
| C -> B -> A | 7 | 5 | 3 |
| Average | 4 | 5 | 6 |
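As a quick check, the averages in Table 2 can be reproduced by directly enumerating all orders of entry. The following is a minimal sketch using only the gains from Table 1; it should print 4.0, 5.0, and 6.0 for A, B, and C, respectively.

import itertools
import numpy as np

# Gains from Table 1, keyed by the set of participating players
gains = {
    frozenset(): 0,
    frozenset("A"): 1, frozenset("B"): 2, frozenset("C"): 3,
    frozenset("AB"): 6, frozenset("AC"): 7, frozenset("BC"): 8,
    frozenset("ABC"): 15,
}

players = ["A", "B", "C"]
contributions = {p: [] for p in players}

# For every order of entry, record each player's marginal contribution
for order in itertools.permutations(players):
    coalition = set()
    for p in order:
        before = gains[frozenset(coalition)]
        coalition.add(p)
        after = gains[frozenset(coalition)]
        contributions[p].append(after - before)

# Averaging over all orders of entry gives the Shapley values
for p in players:
    print(p, np.mean(contributions[p]))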
Generalization and Approximate Calculation
Generalizing the above operation yields the following formula:

$$\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|! \, (|N| - |S| - 1)!}{|N|!} \bigl( \nu(S \cup \{i\}) - \nu(S) \bigr)$$

Here, phi_i represents the contribution of player i, N is the set of all players, S is a subset of players not containing i, |S| is the number of players in the subset, |N| is the total number of players, and ν is the value function used to calculate the gain.
Although the Shapley value has desirable properties, its computational complexity is very high: with n players, the number of coalitions to evaluate grows as O(2^n). To address this, a method has been proposed to approximate the Shapley value using a weighted linear regression with the Shapley kernel shown below [2].

$$\pi(z') = \frac{M - 1}{\binom{M}{|z'|} \, |z'| \, (M - |z'|)}$$

Here, z' is a binary vector indicating whether each player is participating, |z'| is the number of participating players, and M is the total number of players.
For the proof that the Shapley value can be approximated with the Shapley kernel, please refer to the paper [2]. You can confirm that the above example is reproduced by the Shapley kernel by running the code below.
The line np.hstack((np.ones((X.shape[0], 1)), X)) prepends a column of ones (an intercept term) so that the regression can also fit phi_0, the value of the value function when no players are participating.
import numpy as np
import itertools
import math

# Table 1: the gain for every coalition of players A, B, and C
coalition_to_value = {
    (0, 0, 0): 0,
    (1, 0, 0): 1,   # A
    (0, 1, 0): 2,   # B
    (0, 0, 1): 3,   # C
    (1, 1, 0): 6,   # A, B
    (1, 0, 1): 7,   # A, C
    (0, 1, 1): 8,   # B, C
    (1, 1, 1): 15,  # A, B, C
}

M = 3  # Total number of players

# Enumerate all 2^M coalitions as binary membership vectors
coalitions_binary = list(itertools.product([0, 1], repeat=M))
X = np.array(coalitions_binary)
y = np.array([coalition_to_value[c] for c in coalitions_binary])

def shapley_kernel_weight(M, s):
    # The Shapley kernel is infinite for the empty and full coalitions;
    # a large finite weight enforces these two constraints in the regression.
    if s == 0 or s == M:
        return 1e9
    combinations_term = math.comb(M, s)
    denominator = combinations_term * s * (M - s)
    kernel_weight = (M - 1) / denominator
    return kernel_weight

kernel_weights = np.array([shapley_kernel_weight(M, np.sum(c)) for c in coalitions_binary])

# Prepend a column of ones so the regression can also fit the intercept phi_0
X_matrix = np.hstack((np.ones((X.shape[0], 1)), X))

# Weighted least squares: scale rows by sqrt(weight) and solve with lstsq
sqrt_weights = np.sqrt(kernel_weights)
X_weighted = X_matrix * sqrt_weights[:, np.newaxis]
y_weighted = y * sqrt_weights
coeffs, _, _, _ = np.linalg.lstsq(X_weighted, y_weighted, rcond=None)

phi_0 = coeffs[0]
shapley_values = coeffs[1:]
print(f"{phi_0=}")
print(f"{shapley_values=}")
Atomic Level Explanation via PFP Descriptors and Shapley Values
If we set N in the Shapley value formula to be the PFP descriptors of the material we want to interpret, and i to be the specific atom of interest, we can calculate the Shapley value for each atom's contribution to the prediction. The public implementation is provided below.
The value_function corresponds to the value function ν in the Shapley value definition. It takes the PFP descriptors, an active_atoms_mask (indicating the participating players), and the pre-trained machine learning model as arguments. It masks out the non-participating atoms, leaving a matrix of size (number of active atoms) × 256. This matrix is then aggregated into a single vector, in this case by summing along each dimension, and passed to the machine learning model to compute the predicted value (the gain). If no atoms are participating, a zero vector is used as the model input.
The calc_kernel_shap function calculates the Shapley values using a weighted linear regression with the Shapley kernel. Its arguments are atom_features (the PFP descriptors), the value_function, the pre-trained model, and n_kernel_samples (the number of coalitions to evaluate, including the two fixed base cases).
#1: It first calculates the outcomes for the two base cases: when all atoms are participating and when no atoms are participating.
#2: It then samples data for various coalitions by masking certain subsets of atoms and calculates their outcomes.
#3: Finally, it performs a weighted linear regression on all the sampled data to compute the final Shapley values.
def value_function(all_atom_features, active_atoms_mask, model):
    # Keep only the descriptors of participating (active) atoms
    active_features = all_atom_features[active_atoms_mask]
    if active_features.shape[0] == 0:
        # Empty coalition: feed the model a zero vector
        input_value = np.zeros_like(all_atom_features[0])
    else:
        # Aggregate atomic descriptors into a single vector by summing
        input_value = np.sum(active_features, axis=0)
    value = model.predict(input_value.reshape(1, -1))
    return float(value.item())

def calc_kernel_shap(atom_features, value_function, model, n_kernel_samples=None):
    M = atom_features.shape[0]  # Total number of atoms
    if M == 0:
        raise ValueError("The input 'atom_features' contains no atoms (number of atoms is 0).")
    if n_kernel_samples is None:
        n_kernel_samples = min(2**M, 2 * M + 2048)
    else:
        n_kernel_samples = min(2**M, n_kernel_samples)

    Z_prime = np.zeros((n_kernel_samples, M))
    y_target = np.zeros(n_kernel_samples)
    kernel_weights = np.zeros(n_kernel_samples)

    # 1. Set up special coalitions: all atoms included and no atoms included
    # Coalition with all atoms included
    Z_prime[0, :] = 1
    y_target[0] = value_function(atom_features, np.ones(M, dtype=bool), model)
    kernel_weights[0] = 1e9
    # Coalition with no atoms included
    Z_prime[1, :] = 0
    y_target[1] = value_function(atom_features, np.zeros(M, dtype=bool), model)
    kernel_weights[1] = 1e9

    # 2. Sample random coalitions
    # Use a set to store tuples of already generated coalitions to avoid duplicates
    sampled_coalitions_set = set()
    num_sampled_so_far = 2  # The first two coalitions are fixed
    while num_sampled_so_far < n_kernel_samples:
        z_prime_candidate_arr = np.random.randint(0, 2, M)
        s = np.sum(z_prime_candidate_arr)
        if s == 0 or s == M:  # Skip all-on/all-off coalitions, as they are already handled
            continue
        z_prime_candidate_tuple = tuple(z_prime_candidate_arr)
        if z_prime_candidate_tuple in sampled_coalitions_set:
            continue  # Resample if this coalition is a duplicate
        sampled_coalitions_set.add(z_prime_candidate_tuple)
        Z_prime[num_sampled_so_far, :] = z_prime_candidate_arr
        y_target[num_sampled_so_far] = value_function(atom_features, z_prime_candidate_arr.astype(bool), model)
        # Calculate the Shapley kernel weight
        combinations_term = math.comb(M, s)
        denominator = combinations_term * s * (M - s)
        kernel_weights[num_sampled_so_far] = (M - 1) / denominator
        num_sampled_so_far += 1

    # 3. Perform weighted linear regression
    # Prepend a column of ones to Z_prime to create the design matrix X for the intercept term
    X_matrix = np.hstack((np.ones((n_kernel_samples, 1)), Z_prime))
    sqrt_weights = np.sqrt(kernel_weights)
    X_weighted = X_matrix * sqrt_weights[:, np.newaxis]  # Element-wise multiplication using broadcasting
    y_weighted = y_target * sqrt_weights
    coeffs, _, _, _ = np.linalg.lstsq(X_weighted, y_weighted, rcond=None)
    phi_0 = coeffs[0]  # The first coefficient is the intercept (phi_0)
    shapley_values = coeffs[1:]  # The remaining coefficients are the Shapley values
    return shapley_values, phi_0
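As a usage illustration, here is a minimal sketch with a toy additive model. The SumModel class and the random descriptors below are hypothetical stand-ins for illustration only, not part of the Matlantis example. Because this model simply sums the entries of the aggregated descriptor vector, each atom's Shapley value should match the sum of that atom's own descriptor.

class SumModel:
    """Hypothetical toy model: predicts the sum of the input vector's entries."""
    def predict(self, X):
        return X.sum(axis=1)

rng = np.random.default_rng(seed=0)
atom_features = rng.random((5, 256))  # 5 atoms with 256-dimensional descriptors

shapley_values, phi_0 = calc_kernel_shap(atom_features, value_function, SumModel())
print(shapley_values)             # Should be close to each atom's own contribution...
print(atom_features.sum(axis=1))  # ...i.e., the sum of that atom's descriptor vector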
When interpreting molecules, you may want to do so at the level of functional groups rather than individual atoms.
By setting N in the Shapley value formula to be the PFP descriptors of the molecule of interest, and i to be the functional group of interest, we can calculate the Shapley value for each functional group's contribution to the prediction. The implementation is as follows.
You can set the functional_group parameter to perform masking at the functional-group level. This reduces the number of coalitions required for an exact solution from 2^(number of atoms) to 2^(number of groups). Therefore, in addition to improving interpretability through coarse-graining, it also significantly reduces computational cost.
from itertools import chain

def calc_group_kernel_shap(atom_features, value_function, model, functional_group=None, n_kernel_samples=None):
    atoms_number = atom_features.shape[0]

    # Process functional_group: copy the given groups, then add every
    # ungrouped atom as its own single-atom group
    processed_functional_group = []
    if functional_group:
        processed_functional_group = [list(g) for g in functional_group]
        # Set of atom indices that already belong to a functional group
        grouped_atoms = set(chain.from_iterable(processed_functional_group))
    else:
        grouped_atoms = set()
    all_atoms = set(range(atoms_number))
    ungrouped_atoms = all_atoms - grouped_atoms
    for atom_idx in sorted(ungrouped_atoms):
        processed_functional_group.append([atom_idx])

    M = len(processed_functional_group)  # The number of functional groups

    # Determine how many coalitions to evaluate
    max_coalitions = 2**M
    if n_kernel_samples is None:
        n_kernel_samples = min(max_coalitions, 2 * M + 2048)
    else:
        n_kernel_samples = min(max_coalitions, n_kernel_samples)
    if n_kernel_samples < 2:
        n_kernel_samples = 2

    Z_prime = np.zeros((n_kernel_samples, M))
    y_target = np.zeros(n_kernel_samples)
    kernel_weights = np.zeros(n_kernel_samples)

    # 1. Set up special coalitions: all functional groups included and none included
    Z_prime[0, :] = 1
    y_target[0] = value_function(atom_features, np.ones(atoms_number, dtype=bool), model)
    kernel_weights[0] = 1e9
    Z_prime[1, :] = 0
    y_target[1] = value_function(atom_features, np.zeros(atoms_number, dtype=bool), model)
    kernel_weights[1] = 1e9

    # 2. Sample random coalitions of functional groups
    sampled_coalitions_set = {tuple(Z_prime[0]), tuple(Z_prime[1])}
    num_sampled_so_far = 2
    while num_sampled_so_far < n_kernel_samples:
        z_prime_candidate_arr = np.random.randint(0, 2, M)
        s = np.sum(z_prime_candidate_arr)
        if s == 0 or s == M:  # Already handled as the two fixed coalitions
            continue
        z_prime_candidate_tuple = tuple(z_prime_candidate_arr)
        if z_prime_candidate_tuple in sampled_coalitions_set:
            continue  # Resample if this coalition is a duplicate
        sampled_coalitions_set.add(z_prime_candidate_tuple)
        Z_prime[num_sampled_so_far, :] = z_prime_candidate_arr
        # Convert the functional-group mask into an atom-level mask
        active_atoms_mask = np.zeros(atoms_number, dtype=bool)
        active_group_indices = np.where(z_prime_candidate_arr == 1)[0]
        for group_idx in active_group_indices:
            atom_indices_in_group = processed_functional_group[group_idx]
            active_atoms_mask[atom_indices_in_group] = True
        y_target[num_sampled_so_far] = value_function(atom_features, active_atoms_mask, model)
        # Calculate the Shapley kernel weight
        combinations_term = math.comb(M, s)
        denominator = combinations_term * s * (M - s)
        kernel_weights[num_sampled_so_far] = (M - 1) / denominator
        num_sampled_so_far += 1

    # 3. Perform weighted linear regression
    X_matrix = np.hstack((np.ones((n_kernel_samples, 1)), Z_prime))
    sqrt_weights = np.sqrt(kernel_weights)
    X_weighted = X_matrix * sqrt_weights[:, np.newaxis]
    y_weighted = y_target * sqrt_weights
    coeffs, _, _, _ = np.linalg.lstsq(X_weighted, y_weighted, rcond=None)
    phi_0 = coeffs[0]
    group_shapley_values = coeffs[1:]

    # 4. Divide each functional group's Shapley value equally among its member atoms
    atom_shapley_values = np.zeros(atoms_number)
    for i, group_shap in enumerate(group_shapley_values):
        group_atom_indices = processed_functional_group[i]
        num_atoms_in_group = len(group_atom_indices)
        if num_atoms_in_group > 0:
            atom_shapley_values[group_atom_indices] = group_shap / num_atoms_in_group

    return atom_shapley_values, phi_0
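Here is a minimal usage sketch for the group-level variant, again with the hypothetical SumModel and the rng from the previous example. Atoms 0-2 form one group and atoms 3-4 another; atom 5 is left ungrouped and automatically becomes a singleton group.

atom_features = rng.random((6, 256))  # 6 atoms with 256-dimensional descriptors
functional_group = [[0, 1, 2], [3, 4]]  # Atom 5 becomes its own group

atom_shapley_values, phi_0 = calc_group_kernel_shap(
    atom_features, value_function, SumModel(), functional_group=functional_group
)
# Each group's Shapley value is divided equally among its member atoms,
# so atoms 0-2 share one value and atoms 3-4 share another.
print(atom_shapley_values)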
References
[1] Z. Mao et al., npj Comput. Mater. 10, 265 (2024).
[2] S. M. Lundberg and S.-I. Lee, Adv. Neural Inf. Process. Syst. 30 (2017).