Topics - chem-william

I'm trying to calculate the phonon transmission of different junctions with gold electrodes and a molecule in the middle.

I'm a bit concerned about the result, because I get substantial transmission at omega = 0 (transmission > 1.0).
I took a look at the settings the dynamical matrix was calculated with, and they include the following:
repetitions                        = (1, 1, 3)
That makes sense, given that I told it to make only 1, 1, 1 repetitions in the settings for the dynamical matrix. However, I suspect that this is not enough for the force constants to decay to zero, and that this is why I get such a high transmission at omega = 0.

Is there any way to make a bigger supercell for the electrodes only, without also repeating the central region? I'd like to avoid repeating the central region, as it substantially increases the time needed to calculate the dynamical matrix.
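One way to test the suspicion about too few repetitions is to check the acoustic sum rule on the force constants: translational invariance requires that, for every atom and pair of Cartesian directions, the force constants summed over all atoms vanish. Truncating the interaction range can break this. The sketch below is plain Python and does not use any QuantumATK call; the function name, the nested-list layout (3N x 3N, atom-major) and the toy two-atom system are all assumptions made for illustration, and it expects force constants rather than the mass-weighted dynamical matrix.

```python
def acoustic_sum_rule_residual(force_constants, n_atoms):
    """Return max |sum_j Phi[row][3*j + b]| over all rows and directions b.

    force_constants is assumed to be a 3N x 3N nested list ordered as
    (atom 0: x, y, z), (atom 1: x, y, z), ...
    """
    worst = 0.0
    for row in range(3 * n_atoms):
        for b in range(3):
            total = sum(force_constants[row][3 * j + b] for j in range(n_atoms))
            worst = max(worst, abs(total))
    return worst


# Toy system (hypothetical): two atoms joined by a spring of stiffness k along x.
k = 1.0
phi = [[0.0] * 6 for _ in range(6)]
phi[0][0] = phi[3][3] = k
phi[0][3] = phi[3][0] = -k

# Same matrix with one coupling dropped, mimicking a truncated interaction.
phi_truncated = [row[:] for row in phi]
phi_truncated[0][3] = 0.0

print(acoustic_sum_rule_residual(phi, 2))
print(acoustic_sum_rule_residual(phi_truncated, 2))
```

A residual that is clearly nonzero compared with the typical size of the force constants would suggest that the supercell cuts off interactions that have not yet decayed.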

Hi everyone

At our university, we have access to two different partitions, A and B. We use SLURM as the workload manager.

If we submit a job to A that runs on multiple nodes, everything is fine. If we submit the same job to B, we get the following error:
Fri Aug 12 08:15:30 CEST 2022
node642.cluster:UCM:a1d1:b570b740: 19751 us(19751 us):  create_ah: ERR Invalid argument
node642.cluster:UCM:a1d1:b570b740: 19760 us(9 us): UCM connect: snd ERR -> cm_lid 0 cm_qpn 10e r_psp 8104 p_sz=24
srun: Job step aborted: Waiting up to 602 seconds for job step to finish.
srun: error: node643: task 3: Killed
srun: launch/slurm: _step_signal: Terminating StepId=35302717.0
node642.cluster:UCM:a1d2:2000c740: 20600 us(20600 us):  create_ah: ERR Invalid argument
node642.cluster:UCM:a1d2:2000c740: 20613 us(13 us): UCM connect: snd ERR -> cm_lid 0 cm_qpn 10e r_psp 8104 p_sz=24
node642.cluster:UCM:a1d3:8fc04740: 22407 us(22407 us):  create_ah: ERR Invalid argument
node642.cluster:UCM:a1d3:8fc04740: 22418 us(11 us): UCM connect: snd ERR -> cm_lid 0 cm_qpn 10e r_psp 8104 p_sz=24
[0:node642][../../src/mpid/ch3/channels/nemesis/netmod/dapl/dapl_conn_rc.c:247] error(0x30000): ofa-v2-mlx5_0-1u: could not connect DAPL endpoints: DAT_INSUFFICIENT_RESOURCES()
Fatal error in MPI_Init: Internal MPI error!, error stack:
MPIR_Init_thread(805).................: fail failed
MPID_Init(1859).......................: channel initialization failed
MPIDI_CH3_Init(147)...................: fail failed
dapl_rc_setup_all_connections_20(1434): generic failure with errno = 16
(unknown)(): Internal MPI error!
[1:node642][../../src/mpid/ch3/channels/nemesis/netmod/dapl/dapl_conn_rc.c:247] error(0x30000): ofa-v2-mlx5_0-1u: could not connect DAPL endpoints: DAT_INSUFFICIENT_RESOURCES()
Fatal error in MPI_Init: Internal MPI error!, error stack:
MPIR_Init_thread(805).................: fail failed
MPID_Init(1859).......................: channel initialization failed
MPIDI_CH3_Init(147)...................: fail failed
dapl_rc_setup_all_connections_20(1434): generic failure with errno = 16
(unknown)(): Internal MPI error!
[2:node642][../../src/mpid/ch3/channels/nemesis/netmod/dapl/dapl_conn_rc.c:247] error(0x30000): ofa-v2-mlx5_0-1u: could not connect DAPL endpoints: DAT_INSUFFICIENT_RESOURCES()
Fatal error in MPI_Init: Internal MPI error!, error stack:
MPIR_Init_thread(805).................: fail failed
MPID_Init(1859).......................: channel initialization failed
MPIDI_CH3_Init(147)...................: fail failed
dapl_rc_setup_all_connections_20(1434): generic failure with errno = 16
(unknown)(): Internal MPI error!
slurmstepd: error: *** STEP 35302717.0 ON node642 CANCELLED AT 2022-08-12T08:15:31 ***
srun: error: node642: tasks 0-2: Killed

We have the following minimal reproducing script:
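from __future__ import print_function
import socket

# processIsMaster() is provided by the atkpython environment.
if processIsMaster():
    print("Master node:", end=' ')
else:
    print("Slave node:", end=' ')
# Assumed final line: report which host this process runs on.
print(socket.gethostname())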

that gets submitted using the following script:

#!/bin/bash
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=2
#SBATCH --nodes=2
#SBATCH --time=0:10:00
#SBATCH --partition=B


module load kemi
module load ATK

srun -n4 --mpi=pmi2 atkpython

As far as I've been told, neither partition has InfiniBand, only RoCE.
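A possible way to narrow this down is to compare what each task sees on partition A and on partition B. The sketch below assumes that the MPI library bundled with ATK is Intel MPI (the dapl netmod in the error stack points that way, but it is not confirmed here); the `I_MPI_*` names are Intel MPI settings and are only meaningful under that assumption. The function name `fabric_report` is hypothetical. It uses only the standard library, so it can run under plain python without calling MPI_Init.

```python
import os
import socket

# Variables worth comparing between partitions. The I_MPI_* entries only
# matter if Intel MPI is the library in use (an assumption, see above).
KEYS = (
    "SLURM_JOB_PARTITION",
    "SLURM_JOB_NODELIST",
    "SLURM_PROCID",
    "I_MPI_FABRICS",
    "I_MPI_DAPL_PROVIDER",
)


def fabric_report(env=None):
    """Collect the hostname and selected environment variables for this task."""
    env = os.environ if env is None else env
    report = {"hostname": socket.gethostname()}
    for key in KEYS:
        report[key] = env.get(key, "<unset>")
    return report


if __name__ == "__main__":
    for key, value in fabric_report().items():
        print("{}={}".format(key, value))
```

Running this with srun on both partitions and diffing the output shows whether the fabric settings differ. If the library is indeed an Intel MPI version that still ships DAPL, exporting `I_MPI_FABRICS=shm:tcp` before the srun line is a common way to test whether the RDMA layer is the culprit; that would be a diagnostic rather than a fix, since TCP is slower.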
