Those keywords indeed only work in newer versions. On the other hand, there are good reasons why they were introduced: they improve the results quite a bit. Yes, for 16 atoms the calculation will easily parallelize to 48 MPI processes. However, I cannot recall whether version 13.8 parallelizes over displacements. You should notice that quickly, though. If it does, the log file will be a bit messy, because all the DFT calculations run at the same time and write their output to the same file (this was improved a lot in the 2016 edition). If it does not, it will parallelize over k-points instead, and you still get some benefit.
PS: The latest 13.8 edition is 13.8.2, which has many bug fixes over the earlier 13.8 releases.