Parallelizing over CPU threads
Launch the compiled executable with the option "--nbCPUThreads n", where n is the number of threads to use.
Here is the weierstrass example (in the "examples" directory downloaded with EASEA) on an 8-core Intel Core i7-9700K CPU, first run without the option:
$ ./weierstrass --nbGen 10
...
9 9.83 20480 9.16e+01 1.04e+02 4.12e+00 1.17e+02
finished computation at Sun May 3 13:16:58 2020
elapsed time: 9.83656s
EASEA LOG [INFO]: Seed: 1588504609
When launched with 2 threads, the result of the parallelized algorithm is the following:
$ ./weierstrass --nbCPUThreads 2 --nbGen 10 --seed 1588504609
...
9 4.95 20480 9.16e+01 1.04e+02 4.12e+00 1.17e+02
finished computation at Sun May 3 17:21:44 2020
elapsed time: 4.9501s
EASEA LOG [INFO]: Seed: 1588504609
Note that, when launched with the same seed, the parallel algorithm gives results identical to those of the sequential one (only the fitness function is run in parallel). The speedup factor is x1.987 (9.83656s / 4.9501s).
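To illustrate why the results do not depend on the number of threads, here is a minimal Python sketch of a seeded evolutionary loop in which only the fitness evaluation is handed to a thread pool. It is not EASEA's implementation: the fitness function, the selection and mutation steps, and all names are hypothetical. Because of Python's global interpreter lock it shows the determinism, not the speedup.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def fitness(genome):
    # Hypothetical stand-in for an expensive fitness function (sum of squares).
    return sum(x * x for x in genome)


def evaluate(population, n_threads):
    # pool.map returns results in input order, whichever thread finishes first.
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(fitness, population))


def run(seed, n_threads, pop_size=64, genome_len=8, generations=5):
    # Every random draw happens in the main thread, so the sequence of
    # random numbers is the same for any thread count.
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1) for _ in range(genome_len)]
                  for _ in range(pop_size)]
    best = None
    for _ in range(generations):
        scores = evaluate(population, n_threads)  # the only parallel step
        ranked = sorted(zip(scores, population), key=lambda pair: pair[0])
        best = ranked[0][0]  # minimization: lowest score is best
        parents = [genome for _, genome in ranked[:pop_size // 2]]
        # Each child is a mutated copy of a randomly chosen parent.
        population = [[x + rng.gauss(0, 0.1) for x in rng.choice(parents)]
                      for _ in range(pop_size)]
    return best


if __name__ == "__main__":
    # Same seed, different thread counts: the best fitness is identical.
    print(run(1588504609, 1) == run(1588504609, 4))
```

The design point is that the random number generator is never touched by the worker threads, so the thread count changes only how long the evaluation takes.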
Now, with 4 threads:
$ ./weierstrass --nbCPUThreads 4 --nbGen 10 --seed 1588504609
...
9 2.51 20480 9.16e+01 1.04e+02 4.12e+00 1.17e+02
finished computation at Sun May 3 17:25:07 2020
elapsed time: 2.51069s
... the speedup factor is x3.918.
With 8 threads (remember that this is an 8-core CPU):
$ ./weierstrass --nbCPUThreads 8 --nbGen 10 --seed 1588504609
...
9 1.23 20480 9.16e+01 1.04e+02 4.12e+00 1.17e+02
finished computation at Sun May 3 17:28:48 2020
elapsed time: 1.23833s
... the speedup factor is x7.943.
If we increase the number of threads beyond the number of cores (here, 20 threads):
$ ./weierstrass --nbCPUThreads 20 --nbGen 10 --seed 1588504609
...
9 1.23 20480 9.16e+01 1.04e+02 4.12e+00 1.17e+02
EASEA LOG [INFO]: Stopping criterion is reached
finished computation at Sun May 3 17:30:16 2020
elapsed time: 1.23324s
... the speedup factor is nearly identical: x7.976.
Please note that asking for more threads than there are cores does not always leave the speedup unchanged...
On a MacBook Air 8.1 with a Dual-Core Intel Core i5 (1 processor, 2 cores, hyper-threading enabled), the execution times are:
1 thread: 10.5831s
2 threads: 6.42092s, speedup = x1.648
3 threads: 5.4376s, speedup = x1.946
4 threads: 4.98705s, speedup = x2.122 (!)
5 threads: 4.93427s, speedup = x2.145 (!!)
6 threads: 4.83617s, speedup = x2.188 (!!!)
7 threads: 4.90261s, speedup = x2.158
8 threads: 4.90446s, speedup = x2.157
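Some variability is normal (the system is running many other processes in parallel, and temperature is also a factor), but it is interesting to see that on this 2-core processor, the best speedup, which is greater than 2 (!), was obtained with... 6 threads. Hyper-threading, which exposes 4 logical processors on this machine, presumably accounts for part of the gain beyond x2.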
There must be some serious scheduling optimization going on down there... :-)
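The speedup factors quoted above are simply the 1-thread time divided by the n-thread time. The short Python sketch below recomputes them from the elapsed times listed in this section and reports the best thread count for each machine. It assumes that the run without "--nbCPUThreads" is the 1-thread baseline; the function and variable names are illustrative only.

```python
def speedups(times):
    """times maps a thread count to elapsed seconds; key 1 is the baseline."""
    base = times[1]
    return {n: base / t for n, t in sorted(times.items())}


# Elapsed times (seconds) copied from the runs reported in this section.
i7_9700k = {1: 9.83656, 2: 4.9501, 4: 2.51069, 8: 1.23833, 20: 1.23324}
macbook_air = {1: 10.5831, 2: 6.42092, 3: 5.4376, 4: 4.98705,
               5: 4.93427, 6: 4.83617, 7: 4.90261, 8: 4.90446}

for name, times in (("i7-9700K", i7_9700k), ("MacBook Air", macbook_air)):
    s = speedups(times)
    best = max(s, key=s.get)
    print(f"{name}: best speedup x{s[best]:.3f} with {best} threads")
    for n, value in s.items():
        # Parallel efficiency = speedup divided by the number of threads.
        print(f"  {n:2d} threads: x{value:.3f} (efficiency {value / n:.2f})")
```

The efficiency column makes the plateau visible: once the thread count exceeds what the hardware can run at the same time, the speedup stops growing and the efficiency per thread drops.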
In conclusion, the general advice would be to ask for more threads than there are cores in the processor, and let the system deal with the scheduling and load balancing.
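Since the best thread count depends on the machine, it can be worth measuring it. The following Python sketch runs a command once per thread count and keeps the fastest. It is only a sketch: the helper names are hypothetical, it measures wall-clock time for the whole process (including start-up) rather than reading the "elapsed time" line printed by the program, and a single run per setting is subject to the variability mentioned above.

```python
import subprocess
import time


def time_command(argv):
    # Wall-clock duration of one run; the program's own output is discarded.
    start = time.perf_counter()
    subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def sweep(base_argv, thread_counts, flag="--nbCPUThreads"):
    """Return {n: elapsed seconds} for base_argv run with `flag n` appended."""
    return {n: time_command([*base_argv, flag, str(n)]) for n in thread_counts}


def best_thread_count(timings):
    # The thread count with the smallest elapsed time.
    return min(timings, key=timings.get)


# Example (assumes ./weierstrass has been compiled in the current directory):
# timings = sweep(["./weierstrass", "--nbGen", "10", "--seed", "1588504609"],
#                 range(1, 9))
# print(best_thread_count(timings), timings)
```

Fixing the seed, as in the commented example, keeps the amount of work the same across runs, so the timings stay comparable.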