14 Mplus CPU Use
14.1 Test Program
The CPU performance test program for Mplus was developed after consulting the Mplus User’s Guide (8th ed., 2017, pp. 708-710). Mplus's MIXTURE analysis has been optimized to use multiple processors when they are available.
The test problem consisted of running a latent growth model in three latent classes, with 9170 observations. The data were a random subset of an SSCC user’s research data. On Winstat the actual (hardware) limit is 16 processors. On Linstat there are 36 physical cores.
```
Title: 03 classes;
Data:
File is csa9170.dat;
Variable:
Names are
idnew cbsa4
pwt0 pbt0 pht0 pat0
pwt1 pbt1 pht1 pat1
pwt2 pbt2 pht2 pat2
pwt3 pbt3 pht3 pat3
pwt4 pbt4 pht4 pat4 ;
Missing are all (-9999) ;
Usevariables are
pbt0 pht0 pat0
pbt1 pht1 pat1
pbt2 pht2 pat2
pbt3 pht3 pat3
pbt4 pht4 pat4 ;
AUXILIARY = idnew ;
CLASSES = c(3);
Analysis:
TYPE = MIXTURE;
STARTS = 200 20;
STSEED = 218783;
!OPTSEED = 46371;
Processor=12;
MODEL:
%OVERALL%
ib sb | pbt0@0 pbt1@1 pbt2@2 pbt3@3 pbt4@4 ;
ih sh | pht0@0 pht1@1 pht2@2 pht3@3 pht4@4 ;
ia sa | pat0@0 pat1@1 pat2@2 pat3@3 pat4@4 ;
ib ih ia sb sh sa;
OUTPUT: tech11 ;
```
I used the MplusAutomation package in R to generate, run, and analyze the Mplus jobs.
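The job generation step can be illustrated with a small sketch. This is not the MplusAutomation workflow I used, only a minimal Python stand-in that assumes the sole thing varying between jobs is the `Processor` line of the Analysis block. The file-name pattern `lgm_pNN_rNN.inp` and the function names are hypothetical.

```python
from string import Template

# Analysis block of the test program, with the processor count left open.
ANALYSIS_TEMPLATE = Template(
    "Analysis:\n"
    "TYPE = MIXTURE;\n"
    "STARTS = 200 20;\n"
    "STSEED = 218783;\n"
    "Processor=${n_proc};\n"
)


def analysis_block(n_proc):
    """Return the Analysis block requesting n_proc processors."""
    if n_proc < 1:
        raise ValueError("n_proc must be a positive integer")
    return ANALYSIS_TEMPLATE.substitute(n_proc=n_proc)


def job_inputs(header, model, cpu_counts, reps):
    """Map a (hypothetical) input file name to its Mplus input text.

    header: everything before the Analysis block (Title, Data, Variable).
    model:  everything after it (MODEL and OUTPUT).
    """
    jobs = {}
    for n_proc in cpu_counts:
        text = header + analysis_block(n_proc) + model
        for rep in range(1, reps + 1):
            jobs[f"lgm_p{n_proc:02d}_r{rep:02d}.inp"] = text
    return jobs


if __name__ == "__main__":
    # Abbreviated header and model; the full text is the test program above.
    header = "Title: 03 classes;\n"
    model = "OUTPUT: tech11 ;\n"
    jobs = job_inputs(header, model, cpu_counts=[4, 8, 12, 16], reps=15)
    print(len(jobs), "input files would be written")
```

Writing each entry of the returned dictionary to disk gives one input file per repetition, so that every run produces its own output file.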
14.2 Number of CPUs
I did 15 repetitions of the latent class problem at each of 4, 8, 12, and 16 CPUs on Winstat, setting the number of processors requested in the Mplus Analysis block. On Linstat I tested 8, 16, 24, and 32 CPUs. The times (in seconds) to complete each latent class problem on Winstat were taken with no other active users (one disconnected user). The times on Linstat were taken with 3 cores in use by other users.
CPUs | 4 | 8 | 12 | 16 | 24 | 32 |
---|---|---|---|---|---|---|
Winstat mean (sec) | 121.6 | 83.7 | 79.1 | 84.5 | - | - |
Winstat sd (sec) | 1.15 | 3.17 | 3.25 | 4.84 | - | - |
Linstat mean (sec) | - | 78.8 | - | 65.8 | 48.3 | 53.8 |
Linstat sd (sec) | - | 7.97 | - | 4.21 | 8.55 | 9.67 |
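One way to read these means is as speedup relative to the smallest CPU count tested on each server. The sketch below is my own illustration, not part of the original analysis. It assumes "scaling efficiency" means speedup divided by the factor by which the CPU request grew; the function names are hypothetical.

```python
# Mean seconds per problem, copied from the table above.
WINSTAT_MEAN_SEC = {4: 121.6, 8: 83.7, 12: 79.1, 16: 84.5}
LINSTAT_MEAN_SEC = {8: 78.8, 16: 65.8, 24: 48.3, 32: 53.8}


def relative_speedup(mean_sec, baseline_cpus):
    """Speedup of each CPU count relative to the baseline CPU count."""
    base = mean_sec[baseline_cpus]
    return {cpus: base / t for cpus, t in mean_sec.items()}


def scaling_efficiency(mean_sec, baseline_cpus):
    """Speedup divided by the growth in CPUs requested (1.0 = ideal scaling)."""
    speedup = relative_speedup(mean_sec, baseline_cpus)
    return {cpus: s / (cpus / baseline_cpus) for cpus, s in speedup.items()}


if __name__ == "__main__":
    for name, table, base in [("Winstat", WINSTAT_MEAN_SEC, 4),
                              ("Linstat", LINSTAT_MEAN_SEC, 8)]:
        speedup = relative_speedup(table, base)
        efficiency = scaling_efficiency(table, base)
        for cpus in sorted(table):
            print(f"{name} {cpus:2d} CPUs: speedup {speedup[cpus]:.2f}, "
                  f"efficiency {efficiency[cpus]:.2f}")
```

On Winstat, for example, going from 4 to 8 CPUs gives a speedup of about 1.45 rather than the ideal 2, and the speedup at 16 CPUs is lower than at 12.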
The Linstat results for 32 CPUs are measurably (p=.00x) slower than those for 24 CPUs, suggesting there is a penalty for requesting more cores than are available. The Winstat results for 16 CPUs are measurably slower than those for 12 CPUs (p=.00x), suggesting that there is a penalty for requesting all of the available CPUs.
14.3 Competing for CPUs
Next these tests were performed on a Winstat server where another Mplus program was also requesting many CPUs. Two copies of the program (modified to produce separate output files) were launched in quick succession. These were performed with both 8 CPU and 16 CPU requests.
Times for each latent class problem were compared between competing runs with the same CPU request, using a t-test. There was no measurable difference between the first run launched and the second run. Competing programs were affected equally, so the times are pooled here.
Competing on Winstat | 8 CPUs (mean sec) | 16 CPUs (mean sec) | Problems | Time for 30 problems @ 8 CPUs each |
---|---|---|---|---|
1 program | 83.7 | 84.5 | 15 | \((83.7\times 15)\times 2 = 2511\) sec |
2 programs | 112.0 | 448.3 | 30 | \((112.0\times 15) = 1680\) sec |
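The arithmetic in the last column can be checked with a short sketch. It assumes that one program run alone processes its 15 problems back to back, and that two competing programs each process their 15 problems over the same stretch of wall-clock time. The function names are hypothetical, and the 16 CPU totals are my own extension of the same arithmetic to the 16 CPU column.

```python
PROBLEMS_PER_PROGRAM = 15


def sequential_total(mean_sec, n_programs, problems=PROBLEMS_PER_PROGRAM):
    """Wall-clock seconds when programs are run one after another."""
    return mean_sec * problems * n_programs


def concurrent_total(mean_sec, problems=PROBLEMS_PER_PROGRAM):
    """Wall-clock seconds when the programs run side by side.

    mean_sec is the pooled mean per problem observed under competition.
    """
    return mean_sec * problems


if __name__ == "__main__":
    # 8 CPUs requested per program (16 requested in total when competing).
    print("8 CPUs, sequential:", round(sequential_total(83.7, 2), 1))
    print("8 CPUs, concurrent:", round(concurrent_total(112.0), 1))
    # 16 CPUs requested per program (32 requested in total when competing).
    print("16 CPUs, sequential:", round(sequential_total(84.5, 2), 1))
    print("16 CPUs, concurrent:", round(concurrent_total(448.3), 1))
```

With 8 CPUs per program the concurrent runs finish 30 problems in about two thirds of the sequential time. With 16 CPUs per program the concurrent runs take well over twice as long as running the two programs one after the other.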
Adding CPUs speeds processing of individual problems, as long as the available CPUs are not all in use. Having to compete for CPUs slows down individual problems, which is no surprise.
There is a substantial penalty when the number of CPUs requested exceeds the number available. When two programs each requested 16 CPUs on a 16-processor server, the mean time per problem rose from 84.5 to 448.3 seconds.
Requesting exactly as many CPUs as are available results in greater computational efficiency: more work is accomplished in less time. However, requesting more CPUs than exist can produce substantial inefficiency!
Note: additional discussion of Mplus’s problems when requesting too many cores is available at
http://www.statmodel.com/discussion/messages/11/2261.html?1504051922