EASEA and EASEA-CLOUD are Free Open Source Software (under GNU Affero v3 General Public License) developed by the SONIC (Stochastic Optimisation and Nature Inspired Computing) group of the BFO team at Université de Strasbourg. Through the Strasbourg Complex Systems Digital Campus, the platforms are shared with the UNESCO CS-DC UniTwin and E-laboratory on Complex Computational Ecosystems (ECCE).
EASEA (EAsy Specification of Evolutionary Algorithms) is an Artificial Evolution platform that allows scientists with only basic skills in computer science to implement evolutionary algorithms and to exploit the massive parallelism of many-core architectures in order to optimize virtually any real-world problems (continous, discrete, combinatorial, mixed and more (with Genetic Programming)), typically allowing for speedups up to x500 on a $3,000 machine, depending on the complexity of the evaluation function of the inverse problem to be solved.
Then, for very large problems, EASEA can also exploit computational ecosystems as it can parallelize (using an embedded island model) over loosely coupled heterogenous machines (Windows, Linux or Macintosh, with or without GPGPU cards, provided that they have internet access) a grid of computers or a Cloud.
In September 2015 in Strasbourg, the EASEA platform consists in
- 15 PCs (room J4) with 2x GTX560TI
- (352 cores, 1,26 Tflops) = 10560 cores, 38 Tflops
- 20 PCs (room J5) with GTX680
- (1536 cores, 3,1 Tflops) = 30720 cores, 62 Tflops
- 30 PCs (room T40) with GTX770
- (1536 cores, 3,2 Tflops) = 48080 cores, 96 Tflops
- 20 PCs (room T21) with GTX980
- (2048 cores, 4,6 Tflops) = 40960 cores, 92 Tflops
Total : 85 multi-core PCs, 340 CPU cores, 7 CPU Tflops, 100 GPGPU cards, 130 320 GPU cores, 288 GPU Tflops 295 total Tflops
In September 2015 (date of the last investment), this equated to the computing power of the 236th fastest machine in the world as recorded by the top 500 machine list.
When EASEA meets massive parallelism
The EASEA platform is the outcome from the emergence of low cost ($50 to $1000) massively parallel GPGPU (General Purpose Graphic Processing Unit) cards developed by the gaming industry and the "old" EASEA language that was developed in 1999 by Pierre Collet at INRIA, along with Evelyne Lutton, Marc Schoenauer and Jean Louchet (Collet, 2000, 2001).
A brief history of EASEA
Back in 1998, a French INRIA "Cooperative Research Action" ARC called EVOLAB started between four French research laboratories with the aim to come up with a software platform that would help out non expert programmers to try out evolutionary algorithms to optimise their applied problems.
A first prototype of the EASEA (EAsy Specification for Evolutionary Algorithms, pronounce [i:zi:]) language was demonstrated during the EA'99 conference in Dunkerque, and new versions regularly came out on its web-based sourceforge software repository until 2003.
In the meantime, the EASEA language (along with GUIDE (Collet, 2003), its dedicated Graphic User Interface) was used for many research projects and at least 4 PhDs, as well as for teaching evolutionary computation in several universities. Before 2000, the output of the EASEA compiler was a C++ source file that was using either the GALib or the EO libraries for their implementation of the evolutionary operators.
By 2000, the DREAM (Distributed Resource Evolutionary Algorithm Machine) European Project was starting (Arenas, 2002), and it needed a programming language. EASEA became the programming language of the project, and it was modified so as to output java sourcecode for the DREAM, while feeding on the same input files, for this was another nice feature of the EASEA language: the same .ez file could be compiled to create code for GALib, EO (Lutton, 2002) or then the DREAM, depending on a single option flag. This functionality allowed replicability across platforms and computers, as a published .ez program could be recompiled and executed with minimal effort on a Mac, a PC running under Windows or Linux, or any other machine where one of the target GALib / EO / DREAM libraries was installed.
Development of the EASEA language stopped with the DREAM project in 2003, and even though it was not supported anymore, EASEA is still regularly downloaded from its sourceforge repository, with more than 5,000 downloads since March 2001.
Development of the EASEA language stopped with the end of the DREAM project in 2003 and even though it was not supported anymore, EASEA was still regularly downloaded, with more than 5,000 downloads since March 2001.
With the advent of GPGPUs, its ability to generate a fully functional and compilable source code from a simple C-like description of the evolutionary operators (namely initialiser, evaluator, crossover operator and mutator) that were needed for a particular problem made it look like a wonderful solution to allow non-expert programmers to use GPGPUs. Then, the possibility to interconnect several EASEA programs as part of a whole optimization run thanks to an island model makes it a platform of choice to exploit clusters of machines, whether they host GPGPU cards or not.
Finally, because communication between islands is very scarce and thanks to the stochastic nature of evolutionary algorithms, it is possible to run the same EASEA program on loosely connected machines over the internet (and why not interconnect several clusters over the internet, for even more computing power).
A brief history of GPGPU cards
Back in 1965, Moore predicted that the evolution of technology would allow to double the number of transistors per square millimeter of silicium every year, and later on every two years (Moore, 1965). At first, most people mistook this law for the doubling of computer power every two years. This was not a big mistake because even though being able to put twice as many transistors on the same surface did not imply that computers would run twice as fast, it happened so that clock frequency (and more generally single core computer chips) more or less did.
But what was true in the past may not hold forever, and one must observe that it is now many years since the clock frequency of personal computers is stuck below 4GHz and now even 3GHz (independently of the verification of Moore's law). Therefore, even if clock speed is not everything, and certainly not the best performance indicator, the speed of a single core of a CPU does not increase as much as before, even though (thanks to Moore's law) the number of cores that chips can hold still increases at the same rate.
Even though a dual-core chip does not make the computer run twice as fast, manufacturers can claim that their power is multiplied by two, which is only correct if an application is run that is parallel enough so that it can use both cores at the same time. In computer graphics, many algorithms such as vertex or fragment shaders are inherently parallel, so graphics cards have been able to fully benefit from Moore's law, by massively multiplying the cores on the card to calculate in parallel the colour of many fragments. In 2005, some people realised that such cards, containing as many as 128 cores and claiming a computer power of several hundred gigaflops for less than a thousand dollars could be used for other calculations than vertex shading.
In November 2006, NVidia launched CUDA (Compute Unified Device Architecture), that provides a Software Development Kit and Application Programming Interface that allowed a programmer to use a variation of the C language to code algorithms for GeForce 8 series GPUs. Then NVidia came out with TESLA calculators, which are graphics cards without a graphic output, totally dedicated to scientific processing, that (in 2009) boasted 4.3 TeraFlops for around $8.000.
Exploiting massive parallelism to optimize any kinds of problems
Unfortunately, most human-designed algorithms cannot exploit parallelism as they are sequential by essence, meaning that they rely on instructions that feed on the result of previous calculations. Right now, algorithms that can benefit from massively parallel hardware are quite rare, and most of the time, it is not the whole algorithm that can be parallelized, but very small parts of it.
So, apart from their primary use in the games industry and some rare scientific applications, it may seem that massively parallel hardware cannot be used to solve generic problems. Fortunately, this is where evolutionary algorithms (and the EASEA platform) chime in, as the workflow of image rendering algorithms (that lead to the development of GPGPU cards) is virtually the same as that of evolutionary algorithms: for graphic rendering, the same pixel shader algorithm must be executed in parallel on millions of different pixels of the image, whereas in standard artificial evolution, the same "fitness function" can be executed in parallel to evaluate thousands of "individuals" of the "population".
GPGPU hardware and the EASEA compiler
In terms of hardware, why is it possible for GPGPU chips to host several hundred cores (480 cores for an nVidia GTX480 card in 2010) where "standard" CPUs are still limited to 4 cores (Intel Core I7 in 2010) ?
Well, GPU chip designers exploit the fact that the same rendering algorithm must be used on millions of different pixels or vertices. This allow to drastically simplify the architecture, since, for instance, one Fetch and Dispatch unit (which provides code instructions to arithmetic and logic units ALUs) can be shared by many ALUs (that must then execute the same instruction at the same time, but on different data). This special kind of parallelism falls in Flynn's Single Instruction Multiple Data (SIMD) category. On the opposite, multi-core CPUs cannot sharing the same Fetch and Dispatch units because they must be able to execute different instructions (i.e. a different code) in parallel. Multi-core CPUs fall in Flynn's Multiple Instruction Multiple Data (MIMD) category.
Then, since rendering algorithms work on millions of pixels, and only hundreds of cores are available, the same core will need to execute the very same program over and over again on thousands of different pixels, meaning that heavy pipelining can be envisaged. This allows to do without a memory cache and use the saved space for yet more cores: high latency memory is not a problem if enough threads can be stacked up on one core. Once more, a solution that is acceptable due to the specificites of graphics rendering algorithms is not acceptable for one core of a MIMD multi-core CPUs, which needs cache memory as more standard algorithms are not often required to run several hundred times on different data.
All in all, the hardware specificities of GPGPU cards makes them hard to program in an efficient way, especially for applied scientists who may not be computer geeks, and as for evolutionary algorithms, this is where the EASEA language can provide some help.
In its original implementation, the EASEA compiler was wrapping a complete evolutionary algorithm around basic functions representing a problem to be optimized. The modernized compiler of the EASEA massively parallel platform still does the same, but automatically parallelizes the evaluation of the population over possibly several GPGPU cards in a quite efficient way.
Running the EASEA platform
Basically, in order to implement an evolutionary algorithm with the EASEA language, the user only needs to write problem-related code in a my_project.ez file, containing:
- a description of the structure of a potential solution (individual),
- an initialisation operator in order to create the initial population,
- a function to evaluate the quality of an individual (fitness function),
- a crossover operator, allowing to create an offspring out of two parents, and
- a mutation operator, allowing to mutate the created offspring.
Once this is done EASEA compiles the project_name.ez file into a set of c++ files that wrap a complete evolutionary algorithm around the provided code.
If compiled with the -cuda option, the EASEA compiler automatically parallelises the code for nVidia cards allowing speedups depending on the type of card hosted by the computer and the speed of the host CPU.
The theoretical limit on speedup is Amdalh's law, that states that it resides in the sequential portion of the algorithm. Indeed, if a program is 99% parallel and 1% sequential, then, supposing you can parallelize it perfectly and on an infinite number of cores, the maximum speedup you can get is x100. In order to obtain a x1000 speedup, one would need the parallel part of the program to represent as much as 99.9% of the code, which is quite difficult to achieve. Therefore, the reasonable speedup to be expected lies between x100 and x1000 (Maitre 2009a, 2009b, 2010a, 2010b, Kruger 2010).
Then, you can parallelize the same run using an island model over a large number of (identical or different) machines connected to the internet for even more power.
If your computer uses an nVidia graphics card or if you plan to buy one, this page is a good reference to evaluate the kind of computing power you can expect in TFlops.
Prof Pierre Collet.