Well
This is how I found out:
I designed a test system that could converge, but not easily. In this case it was a copper dimer relaxation.
As I said, I first increased the mesh cutoff and saw that it improved convergence but gave the same results.
To find out why the higher mesh cutoff was needed, I suspected that the reason was a rapidly varying
core charge. So I opened VNL, opened the Custom Analyzer for the pseudopotential, and saw that the
core charge was indeed rapidly varying. Hence I tried using an even finer radial grid sampling, but this improved nothing.
From this I learned that it was only a matter of the iteration not being stable enough for copper. To improve
the iteration, you can either tune the iteration parameters or improve the starting guess.
Therefore I calculated the Mulliken populations for the copper dimer using a mesh cutoff, and I saw that electrons
had moved from their starting orbitals into the split orbitals. Then I created a new basis in which I manually filled
up the split orbitals, and used this as the starting guess. However, it was still unstable, and hence I knew that this was
only a matter of improving the stability of the iterations.
When it comes to the iterations, there are only two parameters of concern: the number of history steps and the diagonal mixing parameter.
The best choice of diagonal mixing parameter is known to be quite system dependent (for instance, a good value for molecules is 0.8, but for devices good values are 0.05 - 0.1). However, I found that tuning this parameter did not improve anything significantly, and if I increased it too much, it blew the calculation to pieces.
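To illustrate why too large a mixing parameter can blow an iteration to pieces, here is a minimal Python sketch of plain linear mixing on a made-up one-dimensional fixed-point problem. The names linear_mixing and toy_map, the toy map itself, and the thresholds are assumptions chosen for illustration only; this is not the ATK mixer and says nothing about good values for a real system.

```python
def linear_mixing(f, x0, beta, max_steps=200, tol=1e-8):
    """Iterate x <- x + beta * (f(x) - x).

    Returns (x, steps, converged). Gives up early if the residual explodes.
    """
    x = x0
    for step in range(1, max_steps + 1):
        residual = f(x) - x
        if abs(residual) < tol:
            return x, step, True       # self-consistent: f(x) == x within tol
        if abs(residual) > 1e12:
            return x, step, False      # the iteration has blown up
        x = x + beta * residual        # mix in a fraction beta of the new value
    return x, max_steps, False


def toy_map(x):
    # Hypothetical stand-in for one SCF cycle; its fixed point is x = 1.
    return -3.0 * x + 4.0


if __name__ == "__main__":
    for beta in (0.1, 0.8):
        x, steps, converged = linear_mixing(toy_map, 0.0, beta)
        print(f"beta={beta}: converged={converged} after {steps} steps")
```

For this toy map each update multiplies the error by 1 - 4*beta, so the small value converges while any value above 0.5 diverges. Where that threshold lies depends entirely on the map, which fits with the best value being system dependent.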
The only parameter left was the number of history steps. I quickly found out that doubling it had no effect, but reducing it improved things.
To understand why this has a positive effect on the convergence, I had to think for a short while.
But I know now why. The mixers are in general designed to punish any rapid changes in the electronic structure between SCF cycles. However, copper apparently has to make some dramatic rearrangements of its electrons to suit its natural atomic environment, and only after a series of SCF cycles does it come to a self-consistent result. The number of history steps dictates for how many cycles the mixer remembers these dramatic changes, so if the number of history steps is too large, the convergence is "punished" for a long time for these dramatic changes, making it unstable.
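The memory effect described above can be pictured with a small Python sketch. It does not implement a real mixer; it only models the history as a fixed-length buffer and counts for how many SCF cycles one early, dramatic step stays inside it. The function name and the residual values are made up for illustration.

```python
from collections import deque


def cycles_remembering_outlier(residual_norms, history_steps):
    """Count the SCF cycles during which the largest residual is still
    held in a history buffer of length history_steps."""
    history = deque(maxlen=history_steps)  # oldest entry drops out when full
    worst = max(residual_norms)
    count = 0
    for norm in residual_norms:
        history.append(norm)
        if worst in history:
            count += 1
    return count


if __name__ == "__main__":
    # Hypothetical residuals: one dramatic first step, then a steady decrease.
    norms = [5.0, 0.9, 0.7, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05, 0.02]
    for n in (3, 8):
        print(n, "history steps ->", cycles_remembering_outlier(norms, n), "cycles")
```

With a short history the dramatic first step is forgotten after a few cycles, while a long history keeps feeding it into every later mixing step.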
So in short: the number of history steps can be too large for some systems.