Topic: Efficiency of multiprocessing and multithreading

Offline Anirban Basak

  • Heavy QuantumATK user
  • ***
  • Posts: 25
  • Country: in
  • Reputation: 0
Efficiency of multiprocessing and multithreading
« on: August 30, 2011, 07:53 »
Hi,

We are planning to buy QuantumWise SE and DFT licenses and need help selecting workstations. We plan to buy two workstations. After reading the manuals and numerous posts here, I have gathered some ideas about the parallelization of ATK calculations. However, I would like to clarify some questions about the efficient use of OpenMP threading on multicore processors. Say I want to use a single node with 12 cores (for example, two Dell hex-core processors, each with 12 MB L3 cache, 2.4-3.3 GHz clock, 1333-1033 MHz RAM speed) and 24-36 GB of DDR3 system RAM.

For system A, with 60-100 atoms (usually carbon) and a DoubleZetaPolarized basis set, and system B, with nearly 3000 atoms (of various types) and a DoubleZetaPolarized basis set:

(1) Can OpenMP effectively use all the cores, so that when the parallelized part of the calculation runs, all 12 cores are used for threading with the highest efficiency? (Please consider the limited bandwidth of the CPU cache.)

(2) If (1) is not good enough, can MPICH2 multiprocessing improve the picture? How many MPI processes should be started for optimum performance?

(3) If (1) is not good enough, can reducing the number of cores and running only OpenMP significantly improve the efficiency of processor usage (for example, two quad-core processors or one hex-core processor)?

I look forward to your reply, and thank you in advance.

Offline Anders Blom

  • QuantumATK Staff
  • Supreme QuantumATK Wizard
  • *****
  • Posts: 5575
  • Country: dk
  • Reputation: 96
Re: Efficiency of multiprocessing and multithreading
« Reply #1 on: August 30, 2011, 13:52 »
Thank you for these questions, and for making them so explicit. Even so, it is not entirely simple to answer conclusively, but I will try.

The MPI benefit is generally larger than that of pure OpenMP threading, except for problems that are strongly dominated by diagonalization (large bulk cells or molecules). ATK can parallelize in MPI over k-points and energy points. From a general perspective, your best option is to combine the two methods by using 2 workstations, each with two quad-core processors. This will allow you to run at least 4 MPICH2 processes, each of which can thread on 4 cores, for maximum performance. This will require 1 master and 3 slave licenses.
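The arithmetic behind this recommendation can be sketched in a few lines of Python. This is only an illustration: `Layout` and `plan_layout` are hypothetical names, not part of ATK or MPICH2, and the sketch assumes one license per MPI process (one master, the rest slaves), as the 4-process example above suggests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    total_cores: int
    mpi_processes: int
    threads_per_process: int
    master_licenses: int
    slave_licenses: int


def plan_layout(nodes, sockets_per_node, cores_per_socket, threads_per_process):
    """Split the available cores into MPI processes that each run OpenMP threads.

    Assumption (not an ATK rule): every MPI process needs one license,
    the first being the master and all others slaves.
    """
    total_cores = nodes * sockets_per_node * cores_per_socket
    if threads_per_process <= 0 or total_cores % threads_per_process != 0:
        raise ValueError("threads_per_process must evenly divide the core count")
    mpi_processes = total_cores // threads_per_process
    return Layout(
        total_cores=total_cores,
        mpi_processes=mpi_processes,
        threads_per_process=threads_per_process,
        master_licenses=1,
        slave_licenses=mpi_processes - 1,
    )


# Two workstations, each with two quad-core processors, 4 threads per process.
layout = plan_layout(nodes=2, sockets_per_node=2, cores_per_socket=4,
                     threads_per_process=4)
print(layout)
```

Changing `threads_per_process` to 2 in the same call gives 8 processes, which shows how the license count grows when more MPI processes are used on the same hardware.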

The added benefit of 2 more cores per socket (hex-core processors) is extremely small, and they may be rather expensive, so the gain in performance per unit cost is small.

So, to summarize in relation to your points:

1. Not fully, no, except in a few cases.
2. Yes, absolutely. At least 4; in some cases (the small carbon system, transmission spectra, etc.) perhaps even 8 or more.
3. No, the combination of OpenMP and MPI is best (and the advantage of hex-core over quad-core is negligible).
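To see why the useful number of MPI processes depends on the system, here is a minimal sketch that assumes independent tasks (k-points or energy points) of equal cost are dealt out round-robin to the MPI processes. The functions `distribute` and `ideal_speedup` are hypothetical and do not describe ATK's actual scheduling; communication and the serial parts of the calculation are ignored.

```python
def distribute(n_tasks, n_processes):
    """Round-robin assignment of task indices (e.g. k-points) to process ranks."""
    if n_tasks < 1 or n_processes < 1:
        raise ValueError("n_tasks and n_processes must be at least 1")
    return [list(range(rank, n_tasks, n_processes)) for rank in range(n_processes)]


def ideal_speedup(n_tasks, n_processes):
    """Upper bound on MPI speedup: total tasks divided by the busiest rank's share."""
    busiest = max(len(chunk) for chunk in distribute(n_tasks, n_processes))
    return n_tasks / busiest


# Many tasks benefit from more processes; a single task does not benefit at all.
for n_tasks in (1, 9, 64):
    for n_processes in (4, 8):
        print(n_tasks, n_processes, ideal_speedup(n_tasks, n_processes))
```

Under these assumptions, a calculation with only one k-point gains nothing from extra MPI processes, which is where threading within each process becomes the only source of speedup.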

Offline Anirban Basak

Re: Efficiency of multiprocessing and multithreading
« Reply #2 on: September 6, 2011, 09:00 »
Thank you so much for making me understand the multiprocessing efficiency so well. :)

However, we are in the process of getting a quotation and have encountered the term 'cluster licence'.

What does a cluster licence mean?

Is it a group of 1 master and several slave licences?

Or is it a single licence capable of multiprocessing?
(If so, does it include the MPICH2 feature, and how many cores can it run on?)

Offline Anders Blom

Re: Efficiency of multiprocessing and multithreading
« Reply #3 on: September 6, 2011, 09:55 »
Yes, cluster means 1 master and many slaves (the precise number depends on your actual cluster size and, ultimately, the price).