Tips for Adsorption Structure Search – Matlantis Corporation

The pfcc-extras library provides three specialized functions for adsorption structure exploration, tailored to different system types: Slab models (adstructure_search_for_slab), Cluster models (adstructure_search_for_cluster), and Porous materials (adstructure_search_for_porous).

While these functions work well with their default settings, tuning a few parameters to match your specific system can make the search more efficient. This article explains the settings that have the greatest impact.

1. Sampler Settings

The sampler is the core engine of the search: it decides which candidate structure to try in each trial. By tuning the settings of TPESampler (the sampler used in our examples), you can steer the search toward high-quality structures more effectively.

prior_weight

Role: A weight that controls how much influence the prior distribution, which is centered on the middle of the search domain, has relative to the trials observed so far.
Tuning Tip: Increasing this value favors "exploration of unknown regions," while decreasing it favors "exploitation" (searching around promising areas already found). In adsorption searches, the "surface" often occupies only a small fraction of the total search space. Setting prior_weight below its default of 1.0 (e.g., 0.5) can help the sampler concentrate on promising regions near the surface.

n_startup_trials

Role: The number of initial trials performed using purely random sampling before Bayesian optimization begins.
Tuning Tip: In systems with large vacuum layers, the sampler may fail to find even a single "adsorbed" state if the number of startup trials is too low, which causes the subsequent optimization to stall. We recommend raising n_startup_trials above the Optuna default of 10 so that the surface is reliably encountered early on.
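One rough way to choose n_startup_trials is to estimate what fraction of the search space corresponds to adsorbed states and ask how many purely random trials are needed to hit it at least once with high probability. The sketch below assumes each random trial lands in that region independently with probability p, a simplification; the helper `startup_trials_needed` and the example fractions are hypothetical, not part of pfcc-extras or Optuna.

```python
import math


def startup_trials_needed(surface_fraction: float, confidence: float = 0.95) -> int:
    """Smallest n with P(at least one random trial hits the surface) >= confidence.

    Assumes independent uniform trials, each hitting the adsorbing region
    with probability p = surface_fraction:
        1 - (1 - p) ** n >= confidence
        =>  n >= log(1 - confidence) / log(1 - p)
    """
    if not 0.0 < surface_fraction < 1.0:
        raise ValueError("surface_fraction must be in (0, 1)")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - surface_fraction))


print(startup_trials_needed(0.10))  # → 29
print(startup_trials_needed(0.02))  # → 149
```

For comparison, with the default of 10 startup trials and a 10% surface fraction, the chance of never sampling the surface during startup is 0.9^10, roughly 35%. The result can be passed as `optuna.samplers.TPESampler(n_startup_trials=...)`.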

multivariate

Role: Determines whether variables are sampled independently or by considering their correlations.
Tuning Tip: Setting multivariate=True often improves sampling quality, because correlated parameters are modeled jointly instead of one at a time. When working with large molecules or complex surfaces, where each trial is computationally expensive, this setting helps make each trial more "informed" and deliberate. Note that Optuna marks this option as experimental.

seed

Role: Fixes the random number generator to improve reproducibility (e.g., TPESampler(seed=10)).
Note on Reproducibility: If you use parallel processing to speed up the search, the calculations become non-deterministic, so full reproducibility is difficult even with a fixed seed. For reproducibility, set n_jobs=1 to run trials sequentially. Note, however, that PFP inference on the GPU introduces small numerical differences between runs, so exact bit-for-bit reproducibility may still not be achievable.

Note: While Optuna supports various other samplers, TPESampler is generally sufficient for most use cases. If you wish to experiment with others, please refer to the Optuna documentation.

2. Callback Settings

As of pfcc-extras v0.13.0, you can pass Optuna callbacks to the search functions. This makes early stopping possible: instead of guessing the ideal number of trials (n_trials) in advance, you can set a high n_trials value and use a Terminator to stop the search automatically once it converges, saving computational resources.

Implementation Example: Stopping when progress stagnates

In the following example, the search will automatically terminate if the best score is not updated for 150 consecutive trials.

import optuna
from optuna.terminator import Terminator, TerminatorCallback, BestValueStagnationEvaluator
# adstructure_search_for_slab is provided by pfcc-extras

# Define criteria to stop if there is no improvement for 150 trials
improvement_evaluator = BestValueStagnationEvaluator(max_stagnation_trials=150)
terminator = Terminator(improvement_evaluator=improvement_evaluator)

adstructure_search_for_slab(
    calc_mode       = calc_mode,
    …,
    n_trials        = 1000, # Set a high limit
    sampler         = optuna.samplers.TPESampler(),
    callbacks = [TerminatorCallback(terminator)] # Enable automatic stopping
)

You can also use other improvement_evaluator types or define an error_evaluator. For a full list of the available evaluators, please refer to the Optuna documentation.
