Topics - chem-william

1
Hi,

I'm trying to calculate the phonon transmission of different junctions with gold electrodes and a molecule in the middle.

I'm a bit concerned about the resulting transmission, as I get substantial transmission at ω = 0 (T > 1.0).
I looked at the settings the dynamical matrix was calculated with, and they show the following:
Code
repetitions                        = (1, 1, 3)
This makes sense, since I set the repetitions to (1, 1, 1) in the dynamical-matrix settings. However, I suspect that this supercell is too small for the interatomic force constants to decay to zero, and that this is why I get such high transmission at ω = 0.

Is there any way to use a larger supercell for the electrodes only, without repeating the central region? Repeating the central region substantially increases the time it takes to calculate the dynamical matrix, so I'd like to avoid it.
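The ω → 0 behaviour can be sanity-checked on a toy model where the answer is known analytically. The sketch below is not QuantumATK code and says nothing about the gold junction itself: it assumes a hypothetical 1D harmonic chain (spring constant k, host mass m, arbitrary units) with a single atom of mass M standing in for the molecule, and evaluates the Caroli formula T = Γ_L Γ_R |G|² with the analytic self-energy of a semi-infinite chain. In this model T(ω) tends to 1, the number of propagating lead modes, as ω → 0 for any M. More generally T(ω) is bounded by the number of propagating lead modes at that frequency, which for a 3D electrode with several acoustic branches can be larger than 1.

```python
import math


def chain_transmission(omega, k=1.0, m=1.0, M=3.0):
    """Phonon transmission through a single atom of mass M embedded in an
    infinite 1D harmonic chain (nearest-neighbour springs k, host mass m).

    Caroli formula: T = Gamma_L * Gamma_R * |G|^2, using the analytic retarded
    self-energy Sigma = -k exp(iq) of a semi-infinite chain (toy model only).
    """
    if omega <= 0.0:
        # At omega = 0 exactly, T is 0/0; only the limit omega -> 0+ is meaningful.
        raise ValueError("omega must be positive")
    omega_max = 2.0 * math.sqrt(k / m)  # top of the lead phonon band
    if omega >= omega_max:
        return 0.0  # no propagating lead modes above the band edge
    q = 2.0 * math.asin(omega / omega_max)  # lead dispersion: m omega^2 = 4 k sin^2(q/2)
    sigma = -k * complex(math.cos(q), math.sin(q))  # self-energy of one lead
    gamma = -2.0 * sigma.imag  # Gamma = i (Sigma - Sigma*) = 2 k sin(q)
    # Both leads couple identically to the central atom, hence 2 * sigma.
    green = 1.0 / (M * omega**2 - 2.0 * k - 2.0 * sigma)
    return gamma * gamma * abs(green) ** 2


if __name__ == "__main__":
    # T -> 1 (one propagating channel) as omega -> 0, whatever the defect mass M.
    for omega in (1e-3, 0.1, 0.5, 1.0, 1.5, 1.9):
        print(f"omega = {omega:6.3f}   T = {chain_transmission(omega):.4f}")
```

With the default M = 3m the sketch gives T(ω = 1) = 3/7 ≈ 0.43, while the perfect chain (M = m) gives exactly 1 at every in-band frequency.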

2
Hi everyone

At our university, we have access to two partitions, A and B, and we use SLURM as the workload manager.

If we submit a job that runs on multiple nodes to partition A, everything works fine. If we submit the same job to partition B, it fails with the following error:
Code
Fri Aug 12 08:15:30 CEST 2022
node642.cluster:UCM:a1d1:b570b740: 19751 us(19751 us):  create_ah: ERR Invalid argument
node642.cluster:UCM:a1d1:b570b740: 19760 us(9 us): UCM connect: snd ERR -> cm_lid 0 cm_qpn 10e r_psp 8104 p_sz=24
srun: Job step aborted: Waiting up to 602 seconds for job step to finish.
srun: error: node643: task 3: Killed
srun: launch/slurm: _step_signal: Terminating StepId=35302717.0
node642.cluster:UCM:a1d2:2000c740: 20600 us(20600 us):  create_ah: ERR Invalid argument
node642.cluster:UCM:a1d2:2000c740: 20613 us(13 us): UCM connect: snd ERR -> cm_lid 0 cm_qpn 10e r_psp 8104 p_sz=24
node642.cluster:UCM:a1d3:8fc04740: 22407 us(22407 us):  create_ah: ERR Invalid argument
node642.cluster:UCM:a1d3:8fc04740: 22418 us(11 us): UCM connect: snd ERR -> cm_lid 0 cm_qpn 10e r_psp 8104 p_sz=24
[0:node642][../../src/mpid/ch3/channels/nemesis/netmod/dapl/dapl_conn_rc.c:247] error(0x30000): ofa-v2-mlx5_0-1u: could not connect DAPL endpoints: DAT_INSUFFICIENT_RESOURCES()
Fatal error in MPI_Init: Internal MPI error!, error stack:
MPIR_Init_thread(805).................: fail failed
MPID_Init(1859).......................: channel initialization failed
MPIDI_CH3_Init(147)...................: fail failed
dapl_rc_setup_all_connections_20(1434): generic failure with errno = 16
(unknown)(): Internal MPI error!
[1:node642][../../src/mpid/ch3/channels/nemesis/netmod/dapl/dapl_conn_rc.c:247] error(0x30000): ofa-v2-mlx5_0-1u: could not connect DAPL endpoints: DAT_INSUFFICIENT_RESOURCES()
Fatal error in MPI_Init: Internal MPI error!, error stack:
MPIR_Init_thread(805).................: fail failed
MPID_Init(1859).......................: channel initialization failed
MPIDI_CH3_Init(147)...................: fail failed
dapl_rc_setup_all_connections_20(1434): generic failure with errno = 16
(unknown)(): Internal MPI error!
[2:node642][../../src/mpid/ch3/channels/nemesis/netmod/dapl/dapl_conn_rc.c:247] error(0x30000): ofa-v2-mlx5_0-1u: could not connect DAPL endpoints: DAT_INSUFFICIENT_RESOURCES()
Fatal error in MPI_Init: Internal MPI error!, error stack:
MPIR_Init_thread(805).................: fail failed
MPID_Init(1859).......................: channel initialization failed
MPIDI_CH3_Init(147)...................: fail failed
dapl_rc_setup_all_connections_20(1434): generic failure with errno = 16
(unknown)(): Internal MPI error!
slurmstepd: error: *** STEP 35302717.0 ON node642 CANCELLED AT 2022-08-12T08:15:31 ***
srun: error: node642: tasks 0-2: Killed


We can reproduce the problem with the following minimal script (test_mpi.py):
Code
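from __future__ import print_function
import socket

# processIsMaster() is provided by the atkpython environment; no import is needed.
if processIsMaster():
    print("Master node:", end=' ')
else:
    print("Slave node:", end=' ')
# Every MPI process reports the host it runs on.
print(socket.gethostname())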

It is submitted with the following job script:
Code
#!/bin/bash

#SBATCH --ntasks=4
#SBATCH --cpus-per-task=2
#SBATCH --nodes=2
#SBATCH --time=0:10:00
#SBATCH --partition=B

date

module load kemi
module load ATK

srun -n4 --mpi=pmi2 atkpython test_mpi.py

As far as I've been told, neither partition has InfiniBand; both use RoCE only.
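When comparing failing runs on partition B with working runs on A, it may help to reduce a long log like the one above to the per-rank errors and the killed tasks. The sketch below is a hypothetical helper (the names summarize_log, RANK_ERROR and KILLED are made up here), and its regular expressions assume exactly the line formats shown in that log; other MPI or SLURM versions may format these lines differently.

```python
import re

# Per-rank MPI error lines, assumed to look like
# [0:node642][../../src/.../dapl_conn_rc.c:247] error(0x30000): ofa-v2-mlx5_0-1u: could not ...
RANK_ERROR = re.compile(
    r"^\[(?P<rank>\d+):(?P<host>[^\]]+)\]\[[^\]]*\]\s+error\([^)]*\):\s+"
    r"(?P<provider>[^:]+):\s+(?P<message>.+)$"
)
# srun's summary of killed tasks, e.g. "srun: error: node642: tasks 0-2: Killed"
KILLED = re.compile(r"^srun: error: (?P<node>[^:\s]+): tasks? (?P<tasks>[\d,-]+): Killed$")


def summarize_log(text):
    """Return ({rank: (host, provider, message)}, {node: task_range}) for a job log."""
    rank_errors, killed = {}, {}
    for line in text.splitlines():
        line = line.strip()
        match = RANK_ERROR.match(line)
        if match:
            rank_errors[int(match.group("rank"))] = (
                match.group("host"), match.group("provider"), match.group("message"))
            continue
        match = KILLED.match(line)
        if match:
            killed[match.group("node")] = match.group("tasks")
    return rank_errors, killed


if __name__ == "__main__":
    excerpt = """\
srun: error: node643: task 3: Killed
[0:node642][../../src/mpid/ch3/channels/nemesis/netmod/dapl/dapl_conn_rc.c:247] error(0x30000): ofa-v2-mlx5_0-1u: could not connect DAPL endpoints: DAT_INSUFFICIENT_RESOURCES()
srun: error: node642: tasks 0-2: Killed
"""
    errors, killed = summarize_log(excerpt)
    for rank, (host, provider, message) in sorted(errors.items()):
        print(f"rank {rank} on {host} via {provider}: {message}")
    print("killed:", killed)
```

Running the same helper on logs from both partitions makes it easy to see whether only the ranks on one node report the DAPL provider error.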
