The pfcc-extras library provides three specialized functions for adsorption structure exploration, tailored to different system types: Slab models (adstructure_search_for_slab), Cluster models (adstructure_search_for_cluster), and Porous materials (adstructure_search_for_porous).
While these functions perform well with default settings, you can improve exploration efficiency and result quality by tuning parameters to match your specific system. This article explains the settings that have the most impact.
1. Sampler Settings
The Sampler is the core engine of your search. By refining the TPESampler settings (as used in our examples), you can guide the search toward high-quality structures more effectively.
prior_weight
Role: The weight given to the sampler's prior, a broad distribution centered on the search domain, relative to the results of completed trials.
Tuning Tip: Increasing this value prioritizes "exploration of unknown regions," while decreasing it prioritizes "exploitation" (searching around promising areas). In adsorption searches, the "surface" often occupies a small fraction of the total search space. Setting this lower than the default 1.0 (e.g., 0.5) can help the sampler focus more intensely on promising regions near the surface.
n_startup_trials
Role: The number of initial trials performed using purely random sampling before Bayesian optimization begins.
Tuning Tip: In systems with large vacuum layers, random sampling may not produce a single "adsorbed" state if the startup trial count is too low, leaving the subsequent optimization with nothing to build on, so it stalls. Optuna's default is 10; for such systems we recommend raising it so that the surface is encountered early on.
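One rough way to choose this number, assuming the startup trials are independent and uniformly random: if a fraction p of the search space leads to an adsorbed state, the chance that at least one of n startup trials lands there is 1 - (1 - p)^n. The helper below (a hypothetical name, not part of Optuna or pfcc-extras) inverts that formula.

```python
import math


def startup_trials_needed(surface_fraction, confidence=0.95):
    """Smallest n such that 1 - (1 - surface_fraction)**n >= confidence.

    Assumes independent, uniformly random startup trials.
    """
    if not 0.0 < surface_fraction < 1.0:
        raise ValueError("surface_fraction must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - surface_fraction))


print(startup_trials_needed(0.10))  # 29
print(startup_trials_needed(0.05))  # 59
```

The fraction p is itself an estimate (for example, the thickness of the near-surface region divided by the height of the search box), so the result is a starting point rather than a guarantee.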
multivariate
Role: Determines whether variables are sampled independently or by considering their correlations.
Tuning Tip: Setting multivariate=True lets the sampler account for correlations between variables, which improves sampling quality. When dealing with large molecules or complex surfaces where the computational cost per trial is high, this setting helps make each trial more "informed" and deliberate.
seed
Role: Fixes the random number generator to improve reproducibility (e.g., TPESampler(seed=10)).
Note on Reproducibility: If you are using parallel processing to increase speed, calculations become non-deterministic, making full reproducibility difficult even with a fixed seed. For reproducibility, set n_jobs=1 to run trials sequentially. Note, however, that PFP inference involves minor GPU-related numerical errors, so exact bit-for-bit reproducibility can still be challenging.
Note: While Optuna supports various other samplers, TPESampler is generally sufficient for most use cases. If you wish to experiment with others, please refer to the Optuna documentation.
2. Callback Settings
As of pfcc-extras v0.13.0, you can use Optuna callbacks, which makes early stopping possible. Instead of guessing the ideal number of trials (n_trials) beforehand, you can set a high n_trials value and use a Terminator to stop the search automatically once it converges, saving computational resources.
Implementation Example: Stopping when progress stagnates
In the following example, the search will automatically terminate if the best score is not updated for 150 consecutive trials.
import optuna
from optuna.terminator import BestValueStagnationEvaluator, Terminator, TerminatorCallback

# Define criteria to stop if there is no improvement for 150 trials
improvement_evaluator = BestValueStagnationEvaluator(max_stagnation_trials=150)
terminator = Terminator(improvement_evaluator=improvement_evaluator)

adstructure_search_for_slab(
    calc_mode=calc_mode,
    …,
    n_trials=1000,  # Set a high limit
    sampler=optuna.samplers.TPESampler(),
    callbacks=[TerminatorCallback(terminator)],  # Enable automatic stopping
)

You can also use other improvement_evaluator types or define an error_evaluator. For a full list of implemented evaluators, please refer to the Optuna documentation.