Dear Admin,
I am fine-tuning the MACE foundation model "mace-mp-0b3-medium.model" (multi-head fine-tuning) on ~3000 interface configurations obtained by optimizing various interface structures between two materials. The reference data were generated with an LCAO-PAW calculator. However, the training loss is higher than the validation loss, and it does not easily drop below 2. I am using the latest version of QuantumATK (Y-2026.03).
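To give a sense of what a loss value of about 2 means with the weights used in the script below (energy_weight=1, forces_weight=100.0, stress_weight=1), here is a minimal sketch. It assumes the total loss is a plain weighted mean-squared error, with each term equal to weight × RMSE². The exact loss that QuantumATK/MACE uses (for example Huber terms or a different per-atom normalization) may differ. The function name and the error values are hypothetical and were chosen only for illustration.

```python
"""Rough scale of a weighted energy/forces/stress loss (simplified sketch).

Assumption: total loss = w_E * RMSE_E**2 + w_F * RMSE_F**2 + w_S * RMSE_S**2,
i.e. a plain weighted mean-squared error. The loss actually used by
QuantumATK/MACE may differ (e.g. Huber terms or other normalization).
"""


def weighted_loss(e_rmse_per_atom, f_rmse, s_rmse, w_e=1.0, w_f=100.0, w_s=1.0):
    """Return (total, per-term dict) for given RMSEs under the MSE assumption."""
    terms = {
        "energy": w_e * e_rmse_per_atom ** 2,
        "forces": w_f * f_rmse ** 2,
        "stress": w_s * s_rmse ** 2,
    }
    return sum(terms.values()), terms


# Hypothetical errors (eV/atom, eV/Angstrom, eV/Angstrom^3), not taken from the post.
total, terms = weighted_loss(e_rmse_per_atom=0.01, f_rmse=0.14, s_rmse=0.005)
for name, value in terms.items():
    print(f"{name:>7s}: {value:.4f}")
print(f"  total: {total:.4f}")

# Force RMSE that, on its own, gives a loss of 2 when forces_weight = 100.
print(f"force RMSE for loss = 2: {(2.0 / 100.0) ** 0.5:.3f} eV/Angstrom")
```

Under this assumption, the forces term dominates. A loss that stays near 2 would correspond to a force RMSE of roughly 0.14 eV/Å, so the absolute loss value mainly reflects the chosen forces_weight rather than the overall fit quality.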
# %% MACEFittingParameters
model_parameters = MACEModelParameters(
foundation_model_path='/home/MHeadFineTune/AB_PAW/Model2/mace-mp-0b3-medium.model'
)
replay_finetuning_settings = MACEReplayFinetuningSettings(
replay_data_filepath='/home/MHeadFineTune/AB_PAW/Model2/mp_traj_combined.xyz',
number_of_samples=10000,
replay_subselect_method=MLParameterOptions.REPLAY_SUBSELECT.RANDOM,
replay_filtering_type=MLParameterOptions.REPLAY_FILTERING.COMBINATIONS,
)
dataset_parameters = ForceFieldDatasetParameters(
dataset_name=None,
validation_fraction=0.2,
isolated_atom_energies=None,
energy_key='REF_energy',
forces_key='REF_forces',
stress_key='REF_stress',
# loss weights: forces are weighted 100x relative to energy and stress
energy_weight=1,
forces_weight=100.0,
stress_weight=1,
compute_stress=True,
forces_cap=None,
replay_finetuning_settings=replay_finetuning_settings,
)
training_parameters = TrainingParameters(
experiment_name='AB_replay_finetuning',
batch_size=5,
max_number_of_epochs=200,
patience=50,
device=Automatic,
random_seed=123,
number_of_workers=0,
default_dtype=MLParameterOptions.DTYPE.FLOAT64,
learning_rate=0.005,
weight_decay=5e-07,
restart_from_last_checkpoint=True,
scheduler_patience=5,
gradient_clipping_threshold=100,
save_all_available_model_formats=True,
additional_parameters=None,
)
mace_fitting_parameters = MACEFittingParameters(
model_parameters=model_parameters,
dataset_parameters=dataset_parameters,
training_parameters=training_parameters,
)
nlsave('GSiO2hBN_Train_model_with_MultiHFine.hdf5', mace_fitting_parameters)
# %% MachineLearnedForceFieldTrainer
machine_learned_force_field_trainer = MachineLearnedForceFieldTrainer(
fitting_parameters=mace_fitting_parameters,
training_sets=combined_training_set_training_set_0,
calculator=cam_AB_training_set_lcao_calculator_0,
train_test_split=0.9,
random_seed=None,
save_model_evaluator=True,
)
machine_learned_force_field_trainer.train()
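The script sets two split parameters (validation_fraction=0.2 and train_test_split=0.9) and also adds number_of_samples=10000 replay configurations. The sketch below estimates the resulting dataset sizes. It rests on three assumptions that the QuantumATK documentation does not confirm:

- train_test_split is applied first.
- validation_fraction is then taken from the training portion.
- The replay samples are added to the training data only.

The helper split_sizes is hypothetical.

```python
"""Back-of-envelope dataset sizes for the settings in the script above.

Assumptions (not confirmed by QuantumATK documentation):
- train_test_split=0.9 sends 90% of the interface configurations to training
  and keeps 10% as a held-out test set;
- validation_fraction=0.2 is then taken from that training portion;
- the replay samples (number_of_samples) are added to the training data only.
"""


def split_sizes(n_configs, train_test_split, validation_fraction, n_replay):
    """Hypothetical helper: estimated subset sizes under the assumptions above."""
    n_train_pool = round(n_configs * train_test_split)
    n_test = n_configs - n_train_pool
    n_valid = round(n_train_pool * validation_fraction)
    n_train_own = n_train_pool - n_valid
    return {
        "train (interface)": n_train_own,
        "train (replay)": n_replay,
        "validation (interface)": n_valid,
        "test (interface)": n_test,
    }


sizes = split_sizes(n_configs=3000, train_test_split=0.9,
                    validation_fraction=0.2, n_replay=10000)
for name, count in sizes.items():
    print(f"{name:>24s}: {count}")

ratio = sizes["train (replay)"] / sizes["train (interface)"]
print(f"replay : interface training ratio = {ratio:.1f} : 1")
```

If these assumptions hold, replay structures would outnumber the interface structures in the training set by roughly 4.6 to 1. The training and validation losses would then be averaged over different data and need not be directly comparable.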