Author Topic: Inquiring about memory settings and running multiple tasks on a single cluster

Offline fanjiaping

  • Heavy QuantumATK user
  • ***
  • Posts: 67
  • Reputation: 0
Dear sir:
  We are running the ATK package on a cluster with 22 nodes. However, we find that we can run only one task at a time: if we submit a new task, the current job is killed automatically. Furthermore, each of our calculations is time-consuming and expensive; a single ATK task uses a great amount of memory. Could you kindly modify the default environment settings for us so that we can use the resources properly? Or are there options we can use to monitor the running processes?

    Any suggestions would be greatly appreciated.
 Thanks in advance.

Offline Anders Blom

  • QuantumATK Staff
  • Supreme QuantumATK Wizard
  • *****
  • Posts: 5428
  • Country: dk
  • Reputation: 89
Do I understand it correctly that you have a purchased license? If so, I suggest you contact your sales office to discuss these points, as they can help more specifically.

Which version of ATK are you running? 10.8 is very memory hungry, but this has changed a lot in 11.2.

Also, it matters precisely how you submit the job. If you let several MPI processes run on the same node, each one uses the same amount of memory, so the total memory load on that machine is multiplied by the number of processes. You can control this with the flag "-npernode 1" if your mpiexec supports it (if not, it will probably balance the load automatically, but it's best to check carefully).
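To make the memory multiplication concrete, here is a minimal Python sketch. The node size and per-process memory are hypothetical numbers chosen for illustration, not measurements from any ATK run, and the helper names are made up for this example.

```python
def memory_per_node_gb(procs_per_node: int, mem_per_proc_gb: float) -> float:
    """Total memory load on one node when every MPI process needs the same amount."""
    return procs_per_node * mem_per_proc_gb


def max_procs_per_node(node_mem_gb: float, mem_per_proc_gb: float) -> int:
    """Largest number of processes whose combined memory still fits on one node."""
    return int(node_mem_gb // mem_per_proc_gb)


# Hypothetical values (assumptions for this example only).
NODE_MEM_GB = 16.0
MEM_PER_PROC_GB = 6.0

for procs in (1, 2, 4):
    load = memory_per_node_gb(procs, MEM_PER_PROC_GB)
    status = "fits" if load <= NODE_MEM_GB else "exceeds node memory"
    print(f"{procs} process(es) per node: {load:.0f} GB ({status})")

print("max processes per node:", max_procs_per_node(NODE_MEM_GB, MEM_PER_PROC_GB))
```

With these assumed numbers, four processes on one node would need more memory than the node has, which is the situation that "-npernode 1" is meant to avoid.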

Offline fanjiaping

Yes, I have a purchased license! I recently ran into the problem below (output from the compute cluster):

Fatal error in MPI_Allreduce: Message truncated, error stack:
MPI_Allreduce(773)................: MPI_Allreduce(sbuf=0x1e4a9450, rbuf=0x2d55b718, count=613, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD) failed
MPIR_Reduce(759)..................:
MPIR_Reduce_redscat_gather(406)...:
MPIDI_CH3U_Receive_data_found(129): Message from rank 5 and tag 11 truncated; 4912 bytes received but buffer size is 4904

and this is the message from the .log file:

+------------------------------------------------------------------------------+
| Optimization step =  0 E = -1.2254e+05 eV Maximum force =  2.0744e+00 eV/Ang |
+------------------------------------------------------------------------------+
+------------------------------------------------------------------------------+
|                                                                              |
| Device Calculation  [Started Wed Mar 23 07:44:11 2011]                       |
|                                                                              |
+------------------------------------------------------------------------------+
+------------------------------------------------------------------------------+
|                                                                              |
| Device Density Matrix Calculation   [Started Wed Mar 23 07:45:00 2011]       |
|                                                                              |
+------------------------------------------------------------------------------+
| Left electrode chemical potential  = -0.090023 Ha                            |
| Right electrode chemical potential = -0.090023 Ha                            |
+------------------------------------------------------------------------------+
rank 4 in job 1  compute-0-9_56863   caused collective abort of all ranks
  exit status of rank 4: killed by signal 9

I can't handle this by myself! Please help me!

Offline fanjiaping

 :'( !
We asked the sales office to upgrade our ATK to the latest version, but they told us that the latest version can't be installed on our cluster! So we have to continue working with ATK 10.8.2. But ATK 10.8.2 keeps causing problems, such as buffer size errors and jobs that terminate normally but without the expected results (e.g. I want to get ten transmission spectra but only six are produced, and I'm sure there is no problem in my script)!
« Last Edit: March 25, 2011, 03:25 by fanjiaping »

Offline fanjiaping

 :'(!
Can anyone suggest how to solve the issues described above?

Offline Anders Blom

Your first step should be to upgrade to ATK 11.2, which as far as I can judge should be no problem for you; your license should support it, and there are no additional system requirements for 11.2 compared to 10.8.

That should reduce the memory needed for the calculations substantially, and also speed things up. The error message above looks a bit weird (i.e. perhaps not only related to memory), but the only way to troubleshoot is to test with 11.2, so that any potential problem can be addressed in that version; there is no point in fixing it in 10.8.
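As a rough check on why the message looks like more than a plain memory problem, the numbers in the MPI error can be decoded with a short Python sketch. It assumes MPI_DOUBLE is 8 bytes and that the cluster runs Linux; it only restates what the log says and does not identify the root cause.

```python
import signal

DOUBLE_BYTES = 8  # assumed size of MPI_DOUBLE (true on common platforms)

expected_count = 613   # "count=613" in the MPI_Allreduce call
buffer_bytes = 4904    # "buffer size is 4904"
received_bytes = 4912  # "4912 bytes received"

# The receive buffer matches the requested count exactly ...
expected_bytes = expected_count * DOUBLE_BYTES
# ... but the message from rank 5 carries one more double than that.
received_count = received_bytes // DOUBLE_BYTES

print("buffer holds", expected_bytes // DOUBLE_BYTES, "doubles,",
      "message carried", received_count)

# "killed by signal 9": look up the signal name (POSIX systems such as Linux).
signal_name = signal.Signals(9).name
print("signal 9 is", signal_name)
```

So the ranks apparently disagreed on the element count by one double, which is not what simply running out of memory looks like. Signal 9 is SIGKILL, which a process cannot catch; on clusters it is often sent by the kernel's out-of-memory killer or by a batch scheduler, but the log alone does not say which.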

Offline fanjiaping

Thanks for putting pressure on the sales office. We have installed the latest version of ATK and will try again. Hope everything goes well! ;D