Introduction: The make tool accepts a -jX option that sets the number of jobs it may run concurrently while building. Used well, this option can greatly reduce compilation time, especially now that multi-core processors are common. But what is the optimal number of concurrent jobs? Many people suggest one more than the number of cores in your system (N+1), but is this really the best choice?
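As a rough illustration of the N+1 guideline, the sketch below builds the make command line from the detected CPU count. The helper names suggested_jobs and make_command are made up for this example. Note that os.cpu_count() reports logical CPUs, so a hyper-threaded single-core chip counts as 2.

```python
import os

def suggested_jobs(extra=1):
    """Return the job count under the common 'cores + extra' guideline."""
    # os.cpu_count() counts logical CPUs and may return None if unknown.
    cores = os.cpu_count() or 1
    return cores + extra

def make_command(jobs):
    """Build the make invocation as an argument list (hypothetical helper)."""
    return ["make", "-j" + str(jobs)]

if __name__ == "__main__":
    # Prints something like "make -j5" on a four-core machine.
    print(" ".join(make_command(suggested_jobs())))
```

Passing extra=0 gives the plain N setting, which makes it easy to compare N against N+1 on a given machine.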
Experiment Setup: To measure the best -jX setting, we perform a controlled, repeated compilation of a large software package on a handful of different machines, recording the wall-clock time of each run. The machines are all Linux machines that I have access to, such as my work desktop or public machines at the University of Minnesota. No other users were logged in to any of the machines during the tests. For a large compilation, I used the stock Linux kernel version 2.6.26, ran make oldconfig, and accepted the default for every config question. I then created a .tar.gz file of this directory, which was freshly extracted for each trial. I wrote a small Python script to manage the execution of the test trials:

```python
import os
import time

J_VALUES = ('1', '2', '3', '4', '5', '10', '15')
NUM_TRIALS = 5

def doTest(command):
    """Run command and return its wall-clock duration in milliseconds."""
    start = int(time.time() * 1000)
    os.system(command)
    end = int(time.time() * 1000)
    return end - start

# Lookup on j to get the list of times for that setting
times = {}
for jay in J_VALUES:
    times[jay] = []

for trial in range(NUM_TRIALS):
    for jay in J_VALUES:
        # Start every run from a freshly extracted source tree
        os.system("tar xzf test.tar.gz")
        os.chdir("test")
        cmd = "make -j" + jay
        print(cmd)
        t = doTest(cmd)
        times[jay].append(t)
        print("round", trial, ", jay", jay, ", time:", t)
        os.chdir("../")
        os.system("rm -rf test")
        # Append to the log after every run so partial results are kept
        logfile = open("logfile.txt", "a")
        logfile.write("trial " + str(trial) + ", -j" + jay + "\t" + str(t) + "\n")
        logfile.close()

print("Final Results:")
for jay in times:
    print(jay, times[jay])
```
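Each line the script appends to logfile.txt has the form "trial R, -jJ", a tab, then the time in milliseconds. A minimal sketch of reading such lines back into per-setting lists of seconds could look like this. The function name parse_log is an assumption for illustration, and the sample lines are made-up stand-ins (Core 2 Duo table values converted to milliseconds), not real log output.

```python
import re
from collections import defaultdict

# Matches lines such as "trial 0, -j2\t934040"
LINE_RE = re.compile(r"trial (\d+), -j(\d+)\t(\d+)")

def parse_log(lines):
    """Group logged times by j-value, converting milliseconds to seconds."""
    times = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line)
        if match:  # silently skip anything that is not a result line
            jay = int(match.group(2))
            times[jay].append(int(match.group(3)) / 1000.0)
    return dict(times)

# Illustrative sample lines only
sample = [
    "trial 0, -j1\t1563550\n",
    "trial 0, -j2\t934040\n",
    "trial 1, -j1\t1566710\n",
]
print(parse_log(sample))
```

With a real log, the same function can be fed an open file object, since iterating over a file yields its lines.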
For each machine, 5 rounds were run, and each round tested -j1, -j2, -j3, -j4, -j5, -j10, and -j15, each on a freshly extracted kernel source tree. The AMD Phenom 9600 was additionally tested at every value from -j1 through -j20, plus -j50 and -j100. The script records milliseconds; the tables below report times in seconds. The Ratio column is the mean time divided by the mean time at -j1, so lower is better.
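The Mean and Ratio columns in the results can be recomputed from the trial times. The sketch below does this for three rows of the Intel Core 2 Duo table, assuming the ratio is the mean divided by the -j1 mean. The function name summarize is made up for this example. Because the published trial times are rounded, a recomputed mean may differ from the table in the last digit.

```python
from statistics import mean

# Trial times in seconds, copied from the Intel Core 2 Duo results table
core2duo = {
    1: [1563.55, 1566.71, 1573.13, 1588.73, 1566.76],
    2: [934.04, 928.42, 930.93, 932.28, 910.51],
    4: [876.05, 882.28, 879.75, 877.92, 897.07],
}

def summarize(times):
    """Return {j: (mean_seconds, ratio_to_j1)} for each j-value."""
    baseline = mean(times[1])
    return {j: (mean(t), mean(t) / baseline) for j, t in sorted(times.items())}

for jay, (avg, ratio) in summarize(core2duo).items():
    print("-j%d  mean %.2f s  ratio %.3f" % (jay, avg, ratio))
```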
Processor Type | Clock Speed (GHz) | Total Cores | RAM (GB) |
---|---|---|---|
Intel Core 2 Duo | 2.66 | 2 | 4 |
Intel Pentium 4 HT | 2.80 | 2 | 1 |
AMD Athlon 64 | 2.40 | 1 | 1 |
AMD Phenom 9600 | 2.30 | 4 | 4 |

Intel Core 2 Duo 2.66 GHz - 4 GB RAM

J-value | Ratio | Mean | Trial 1 | Trial 2 | Trial 3 | Trial 4 | Trial 5 |
---|---|---|---|---|---|---|---|
1 | 1.000 | 1571.78 | 1563.55 | 1566.71 | 1573.13 | 1588.73 | 1566.76 |
2 | 0.590 | 927.23 | 934.04 | 928.42 | 930.93 | 932.28 | 910.51 |
3 | 0.571 | 898.05 | 876.78 | 886.85 | 895.93 | 899.50 | 931.41 |
4 | 0.562 | 882.63 | 876.05 | 882.28 | 879.75 | 877.92 | 897.07 |
5 | 0.567 | 890.84 | 876.40 | 891.63 | 894.12 | 908.87 | 883.18 |
10 | 0.564 | 886.21 | 877.78 | 877.78 | 889.93 | 881.38 | 884.69 |
15 | 0.563 | 884.25 | 889.44 | 879.82 | 875.16 | 894.38 | 882.43 |

Intel Pentium 4 HT 2.8 GHz - 1 GB RAM

J-value | Ratio | Mean | Trial 1 | Trial 2 | Trial 3 | Trial 4 | Trial 5 |
---|---|---|---|---|---|---|---|
1 | 1.000 | 1951.20 | 1448.32 | 1430.30 | 1464.20 | 1455.47 | 1442.27 |
2 | 0.612 | 1194.54 | 1195.88 | 1195.81 | 1201.96 | 1193.21 | 1185.84 |
3 | 0.594 | 1158.28 | 1164.67 | 1175.82 | 1159.54 | 1146.35 | 1145.04 |
4 | 0.579 | 1130.43 | 1138.71 | 1122.99 | 1149.40 | 1105.01 | 1136.02 |
5 | 0.586 | 1143.74 | 1129.01 | 1121.67 | 1118.04 | 1112.10 | 1237.91 |
10 | 0.563 | 1097.62 | 1089.20 | 1097.69 | 1108.96 | 1092.51 | 1099.77 |
15 | 0.555 | 1083.38 | 1093.31 | 1082.10 | 1103.21 | 1076.49 | 1061.80 |

Note: the five -j1 trial times in this table average about 1448 s, while the listed mean is 1951.20 s. The Ratio column is consistent with the 1951.20 figure, so one of the two was likely mistranscribed.

AMD Athlon 64 2.4 GHz - 1 GB RAM

J-value | Ratio | Mean | Trial 1 | Trial 2 | Trial 3 | Trial 4 | Trial 5 |
---|---|---|---|---|---|---|---|
1 | 1.000 | 2373.06 | 2343.05 | 2392.02 | 2370.22 | 2382.22 | 2377.78 |
2 | 1.000 | 2373.15 | 2347.67 | 2394.21 | 2376.37 | 2374.41 | 2373.11 |
3 | 1.015 | 2409.76 | 2346.25 | 2388.86 | 2391.08 | 2391.10 | 2531.51 |
4 | 1.022 | 2424.66 | 2370.27 | 2422.24 | 2391.12 | 2394.57 | 2545.08 |
5 | 1.020 | 2420.12 | 2374.85 | 2341.87 | 2418.54 | 2410.81 | 2464.54 |
10 | 1.048 | 2487.96 | 2631.05 | 2452.99 | 2534.49 | 2425.15 | 2396.14 |
15 | 1.033 | 2450.99 | 2458.87 | 2461.69 | 2470.71 | 2464.98 | 2398.69 |

AMD Phenom 9600 Quad-Core 2.3 GHz - 4 GB RAM

J-value | Ratio | Mean | Trial 1 | Trial 2 | Trial 3 | Trial 4 | Trial 5 |
---|---|---|---|---|---|---|---|
1 | 1.000 | 2334.28 | 2324.32 | 2332.98 | 2329.27 | 2333.37 | 2351.47 |
2 | 0.523 | 1219.84 | 1218.67 | 1222.81 | 1217.27 | 1223.58 | 1216.85 |
3 | 0.378 | 883.32 | 880.72 | 885.88 | 881.46 | 888.82 | 879.70 |
4 | 0.320 | 747.70 | 742.92 | 749.22 | 754.37 | 742.08 | 749.89 |
5 | 0.304 | 708.56 | 707.91 | 704.23 | 704.69 | 711.56 | 714.39 |
6 | 0.300 | 699.42 | 694.91 | 695.51 | 716.76 | 695.57 | 694.35 |
7 | 0.296 | 691.91 | 696.52 | 692.04 | 693.56 | 690.13 | 687.31 |
8 | 0.296 | 691.22 | 691.63 | 688.94 | 690.94 | 691.47 | 693.15 |
9 | 0.299 | 698.70 | 685.88 | 707.08 | 713.69 | 693.01 | 693.85 |
10 | 0.294 | 686.88 | 684.27 | 685.17 | 684.08 | 690.92 | 689.97 |
11 | 0.297 | 694.00 | 694.97 | 694.70 | 690.69 | 697.35 | 692.31 |
12 | 0.296 | 690.55 | 692.49 | 691.70 | 688.47 | 695.27 | 684.84 |
13 | 0.297 | 692.28 | 690.92 | 690.43 | 691.43 | 697.45 | 691.18 |
14 | 0.298 | 694.87 | 690.86 | 697.26 | 695.00 | 698.22 | 692.99 |
15 | 0.297 | 692.34 | 691.21 | 693.13 | 692.98 | 692.50 | 691.89 |
16 | 0.299 | 697.48 | 700.81 | 699.70 | 699.88 | 692.26 | 694.74 |
17 | 0.298 | 696.66 | 693.57 | 696.44 | 695.61 | 700.75 | 696.96 |
18 | 0.300 | 699.83 | 703.93 | 700.16 | 695.62 | 698.61 | 700.85 |
19 | 0.299 | 696.93 | 696.49 | 697.09 | 693.98 | 697.44 | 699.63 |
20 | 0.299 | 698.11 | 697.89 | 703.05 | 699.34 | 696.30 | 693.95 |
50 | 0.307 | 715.58 | 731.12 | 727.75 | 704.66 | 706.27 | 713.12 |
100 | 0.325 | 758.86 | 763.61 | 754.51 | 759.11 | 755.39 | 761.70 |
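
It appears that the N+1 guideline isn't really that important. On the single-core Athlon 64, -j2 took the same time as -j1, and larger values were slightly slower (up to about 5% at -j10). On the multi-core machines, going from N to N+1 jobs cut the mean build time by only about 3 to 5 percent (for example, a ratio of 0.590 versus 0.571 on the Core 2 Duo, and 0.320 versus 0.304 on the Phenom). Values well above N+1 performed at least as well, until very large values such as -j50 and -j100 on the Phenom became measurably slower.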
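
To put a number on the difference between N and N+1, the sketch below computes the percentage of build time saved by the extra job, using the Ratio values from the tables above. The function name percent_saved is made up for this example, and the core counts are the Total Cores figures from the machine table.

```python
# (cores N, ratio at -jN, ratio at -j(N+1)), copied from the results tables
machines = {
    "Intel Core 2 Duo": (2, 0.590, 0.571),
    "Intel Pentium 4 HT": (2, 0.612, 0.594),
    "AMD Athlon 64": (1, 1.000, 1.000),
    "AMD Phenom 9600": (4, 0.320, 0.304),
}

def percent_saved(ratio_n, ratio_n_plus_1):
    """Percent of the -jN build time saved by running -j(N+1) instead."""
    return 100.0 * (ratio_n - ratio_n_plus_1) / ratio_n

for name, (cores, ratio_n, ratio_next) in machines.items():
    saved = percent_saved(ratio_n, ratio_next)
    print("%-20s -j%d -> -j%d: %.1f%% saved" % (name, cores, cores + 1, saved))
```

A result near zero means the extra job made no measurable difference on that machine.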
Copyright © 2004 - 2025, Matthew L. Beckler, CC BY-SA 3.0
Last modified: 2009-12-16 01:27:54 PM (EST)