

Messages - abhola

1
One more observation: if I run in serial mode, without MPI, then

the 16-CPU machine takes 48 minutes per loop, and
the 8-CPU machine takes 52 minutes per loop,

even though the 8-CPU machine has faster CPUs and more cache. How can that be?
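As a rough reading aid, the serial numbers can be compared against a naive expectation. The sketch below assumes, purely for illustration, that serial run time scales only with clock speed and ignores memory bandwidth, cache, and microarchitecture differences; `clock_scaled_time` is a hypothetical helper, and the MHz values are taken from the /proc/cpuinfo dumps in this thread.

```python
def clock_scaled_time(t_ref, mhz_ref, mhz_target):
    """Naive assumption: run time is inversely proportional to clock speed."""
    return t_ref * mhz_ref / mhz_target

t16 = 48.0  # minutes per loop, serial, 16-CPU machine (E7340 @ 2400.090 MHz)
t8 = 52.0   # minutes per loop, serial, 8-CPU machine (E5450 @ 3000.116 MHz)

predicted_t8 = clock_scaled_time(t16, 2400.090, 3000.116)
print(f"predicted 8-CPU serial time: {predicted_t8:.1f} min")  # prints 38.4 min
print(f"observed / predicted: {t8 / predicted_t8:.2f}")        # prints 1.35
```

Under this simple model the 8-CPU machine would be expected to need roughly 38 minutes per loop, so the observed 52 minutes is about 35% slower than clock speed alone would suggest.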


2
Yes, the difference is reproducible in every run. Nothing is running on the machines except ATK.
Yes, I meant MPI nodes.


Regards
Anshu

3
Yes, I am starting 8 threads on the 8-CPU cluster and 16 threads on the 16-CPU cluster. I also tried setting the MKL variable, but nothing seems to help :(

Regards
Anshu

4
Thanks for replying.

On the 8-CPU system we have:
# free -m
             total       used       free     shared    buffers     cached
Mem:         14033        741      13292          0          5         96
-/+ buffers/cache:        639      13394
Swap:         1983         65       1918

On the 16-CPU system it is:

             total       used       free     shared    buffers     cached
Mem:         16051        329      15721          0         82        123
-/+ buffers/cache:        123      15928
Swap:         1983         51       1932
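The "-/+ buffers/cache" row in this (older procps) `free` output is derived from the "Mem:" row: memory held by programs is used minus buffers minus cache, and memory available is free plus buffers plus cache. A minimal sketch reproducing that row from the figures above (the helper name `cache_adjusted` is hypothetical); the results match `free` to within 1 to 2 MB, since `free -m` rounds each value from kB separately.

```python
def cache_adjusted(used, free, buffers, cached):
    """Reproduce free's '-/+ buffers/cache' row (all values in MB)."""
    held_by_programs = used - buffers - cached
    available = free + buffers + cached
    return held_by_programs, available

# Values copied from the "Mem:" rows of the free -m output above
eight_cpu = cache_adjusted(741, 13292, 5, 96)
sixteen_cpu = cache_adjusted(329, 15721, 82, 123)
print("8-CPU system: ", eight_cpu)
print("16-CPU system:", sixteen_cpu)
```

On both machines well under 1 GB of the 14 to 16 GB is held by programs at the time the snapshot was taken, and swap use is small.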

The CPU configurations are different: the 8-CPU system has a better configuration than the 16-CPU system,
yet the 8-CPU system takes much more time.

PER CPU CONFIG in 16 CPU system
------------------------------------
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Xeon(R) CPU           E7340  @ 2.40GHz
stepping        : 11
cpu MHz         : 2400.090
cache size      : 4096 KB
physical id     : 6
siblings        : 4
core id         : 3
cpu cores       : 4
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
bogomips        : 4800.39
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
---------------------------------------------
PER CPU CONFIG in 8 CPU system
---------------------------------------------
processor       : 7
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Xeon(R) CPU           E5450  @ 3.00GHz
stepping        : 6
cpu MHz         : 3000.116
cache size      : 6144 KB
physical id     : 1
siblings        : 4
core id         : 3
cpu cores       : 4
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
bogomips        : 6000.19
clflush size    : 64
cache_alignment : 64
address sizes   : 38 bits physical, 48 bits virtual
power management:
----------------------------------------------------------

The boards are the same.
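The dumps above show a single CPU entry per machine. To compare how the logical CPUs are spread across sockets, the full /proc/cpuinfo could be summarized. A minimal sketch, assuming the standard Linux layout of one blank-line-separated block per logical CPU; `parse_cpuinfo` and `summarize` are hypothetical helpers, not part of ATK or MPICH2.

```python
def parse_cpuinfo(text):
    """Split /proc/cpuinfo text into one dict per logical CPU."""
    cpus, current = [], {}
    for line in text.splitlines():
        if not line.strip():          # a blank line ends one CPU block
            if current:
                cpus.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")  # split on the first colon only
        current[key.strip()] = value.strip()
    if current:
        cpus.append(current)
    return cpus

def summarize(cpus):
    """Count logical CPUs and sockets; report model, clock and cache of the first CPU."""
    sockets = {cpu.get("physical id") for cpu in cpus}
    first = cpus[0]
    return {
        "logical_cpus": len(cpus),
        "sockets": len(sockets),
        "cores_per_socket": int(first.get("cpu cores", "0")),
        "model": " ".join(first.get("model name", "").split()),  # collapse padding spaces
        "mhz": float(first.get("cpu MHz", "0")),
        "cache": first.get("cache size", ""),
    }
```

On each machine, `summarize(parse_cpuinfo(open("/proc/cpuinfo").read()))` would report the number of sockets and cores per socket alongside the clock and cache figures quoted above.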

Regards
Anshu

5
Hi,

I am benchmarking a two-probe configuration using the ATK software.

We have two systems:
   1) 16 CPUs, with per-CPU configuration
      cpu MHz         : 2400.090
      cache size      : 4096 KB

   2) 8 CPUs, with per-CPU configuration
      cpu MHz         : 3000.116
      cache size      : 6144 KB

The problem is that the run takes approximately 12 hours on the 16-CPU system, but much more than 12 * 2 (24) hours on the 8-CPU system.
On the 16-CPU machine one loop takes 3 to 4 minutes, but on the 8-CPU machine it takes 18 to 40 minutes.
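For comparison, a naive estimate of what the 8-CPU machine "should" take can be written down. This sketch assumes perfect linear scaling with both CPU count and clock speed, which real MPI runs rarely achieve; `predicted_parallel_time` is a hypothetical helper, and the inputs are the figures from this post.

```python
def predicted_parallel_time(t_ref, cpus_ref, mhz_ref, cpus_target, mhz_target):
    """Naive estimate: run time scales inversely with CPU count and clock speed."""
    return t_ref * (cpus_ref / cpus_target) * (mhz_ref / mhz_target)

# Per-loop time on the 16-CPU machine (3 to 4 minutes)
for t16_loop in (3.0, 4.0):
    t8_loop = predicted_parallel_time(t16_loop, 16, 2400.090, 8, 3000.116)
    print(f"{t16_loop:.0f} min on 16 CPUs -> about {t8_loop:.1f} min expected on 8 CPUs")

# Whole run: about 12 hours on the 16-CPU machine
total = predicted_parallel_time(12.0, 16, 2400.090, 8, 3000.116)
print(f"12 h on 16 CPUs -> about {total:.1f} h expected on 8 CPUs")
```

Because each 8-CPU core is clocked 25% higher, this model predicts a factor of about 1.6 rather than 2: roughly 4.8 to 6.4 minutes per loop and about 19.2 hours in total, against the observed 18 to 40 minutes per loop.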

We are using MPICH2.

Can someone please tell me whether this is expected behaviour,
or whether there is some configuration or system problem, or something else we can try?

Regards
Anshu
