To avoid any more confusion, let me take it from the start.
First of all, it is important to understand that the orbitals in the Huckel theory do not have to be physically reasonable orbitals like the ones we know from LCAO-DFT. The orbitals that are used to construct the overlap and the Hamiltonian should be seen as a parameterization of the offsite Hamiltonian elements.
If we look in the reference manual for the Huckel model ( http://www.quantumwise.com/documents/manuals/latest/ReferenceManual/index.html/chap.atkse.html#sect1.atkse.parameters ), we can see this from the fact that the onsite overlap is defined as a Kronecker delta. If the Huckel orbitals were basis functions of a normalized basis set, this condition would be unnecessary. In fact, these basis functions can in extreme cases be so strange, and so long ranged, that they are truncated to zero, so that the functions are not continuous. This is also why a real-space visualization of an eigenstate from Huckel sometimes does not look so pretty.
When you are optimizing the basis set, or otherwise using Huckel, you will from time to time encounter this error message:
"Diagonalization error, overlap matrix not positive definite. This may be caused by atoms that are too close to each other or situated in equivalent positions, or (in the Extended Huckel model) a too low value of interaction_max_range"
This comes from the fact that the basis functions (i.e. the parameterization of the offsite Hamiltonian) have become so "strange" that the constructed overlap matrix is no longer positive definite, and it is then not possible to calculate the eigenvalues and eigenvectors of the Hamiltonian.
Summa summarum, this is why your results depend on W even if you have a single STO: you are scaling the offsite Hamiltonian elements with this weight, and hence it affects the results directly.
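To make the link between the weight and the error message more concrete, here is a minimal toy sketch in plain Python. It is not ATK code and does not use the ATK API. It assumes a hypothetical two-site model in which each site carries one orbital equal to W times a normalized STO, so that the offsite overlap scales as W*W while the onsite overlap stays fixed at 1 (the Kronecker delta). The function names and the numbers are made up for illustration only.

```python
import math


def is_positive_definite(matrix):
    """Try a Cholesky factorization of a symmetric matrix.

    The factorization only succeeds (all pivots > 0) when the matrix
    is positive definite, so this doubles as a definiteness test.
    """
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - partial
                if pivot <= 0.0:
                    return False
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return True


def two_site_overlap(normalized_offsite_overlap, weight):
    """Toy overlap matrix for two sites (an assumption, not the ATK formula).

    The onsite elements are fixed to 1, while the offsite element is the
    overlap of the normalized STOs scaled by weight**2.
    """
    offsite = weight * weight * normalized_offsite_overlap
    return [[1.0, offsite], [offsite, 1.0]]


# With W = 1 the offsite overlap (0.6) is below 1 and the matrix is fine.
well_behaved = is_positive_definite(two_site_overlap(0.6, 1.0))
# With W = 1.5 the offsite overlap becomes 1.35, larger than the onsite value.
broken = is_positive_definite(two_site_overlap(0.6, 1.5))
```

In this toy model the second case fails the test because the offsite overlap exceeds the onsite one, which is the kind of situation in which a generalized eigenvalue solver gives up.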
Personally, whenever I have to create a Huckel basis set and I only have a single STO, I would never touch the weight. This is of course a limitation, since it restricts the kind of offsite Hamiltonian matrix elements I can describe, but I can still obtain good real-space projections since the orbitals look good. Cerda, who has made many of the good basis sets we include in ATK, has a different philosophy and used this weight to generate quite good basis sets. The price is that real-space properties can sometimes look weird. This is the reason for my statement about only having one parameter.
If I have two STOs, I will only allow values of C1 and C2 for which the basis function is normalized, for the same reason as above, effectively giving me three free parameters.
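As a rough sketch of what this normalization constraint means, the snippet below rescales a pair of coefficients so that the combined radial function has unit norm. It assumes that C1 and C2 multiply two individually normalized Slater orbitals with the same principal quantum number n and exponents zeta1 and zeta2; under that assumption the overlap between the two STOs is (2*sqrt(zeta1*zeta2)/(zeta1+zeta2))**(2n+1), and the condition reads C1^2 + C2^2 + 2*C1*C2*S12 = 1. The function names and the example numbers are hypothetical and are not taken from ATK.

```python
import math


def sto_overlap(n, zeta1, zeta2):
    """Overlap of two normalized STO radial functions sharing the same n."""
    return (2.0 * math.sqrt(zeta1 * zeta2) / (zeta1 + zeta2)) ** (2 * n + 1)


def squared_norm(n, zeta1, zeta2, c1, c2):
    """Squared norm of c1*STO(zeta1) + c2*STO(zeta2)."""
    s12 = sto_overlap(n, zeta1, zeta2)
    return c1 * c1 + c2 * c2 + 2.0 * c1 * c2 * s12


def normalize_double_zeta(n, zeta1, zeta2, c1, c2):
    """Rescale (c1, c2) so that the double-zeta function has unit norm.

    Only the ratio c1/c2 survives, so together with zeta1 and zeta2
    this leaves three free parameters.
    """
    norm_sq = squared_norm(n, zeta1, zeta2, c1, c2)
    if norm_sq <= 0.0:
        raise ValueError("coefficients describe a vanishing function")
    scale = 1.0 / math.sqrt(norm_sq)
    return c1 * scale, c2 * scale


# Example with made-up numbers: n = 2, two exponents and a trial ratio.
c1, c2 = normalize_double_zeta(2, 1.6, 0.9, 0.7, 0.5)
check = squared_norm(2, 1.6, 0.9, c1, c2)
```

With the coefficients tied together like this, an optimizer would vary zeta1, zeta2 and the ratio C1/C2, and the overall scale is never allowed to act as a hidden weight.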
I hope this makes it all clearer; otherwise I will try to explain it in another way.