Starting with EASEA

De
Aller à la navigation Aller à la recherche

Note for users : .ez files are compatible with a c++ syntax, meaning that colour syntaxing can be used in modern editors.
For vi / vim / macvim users, you can add the follwing line:

autocmd BufRead,BufNewFile  *.ez set filetype=cpp

in your .vimrc or .gvimrc file for an automatic colour syntaxing of .ez files.



Compilation and execution of the supplied weierstrass.ez example file

Once you have installed the EASEA platform using the instructions found on the Installing EASEA page, go to the examples/weierstrass directory, in which you should find a weierstrass.ez file. Before we analyse the contents of this file, please make sure that everything runs fine.

  • In the examples/weierstrass directory, start by compiling the weierstrass.ez file by typing:
    $ easena weierstrass
    If all goes well, you should receive congratulations for succeeding in creating target files with no warnings.
  • The target files in question are C++ source files that need to be compiled by your preferred c++ compiler (gnu if possible). In order to help you in this task, EASEA has created a complete makefile for you, so in practice, compilation is obtained by simply typing:
    $ make
  • Compilation should take place without a warning. In case you are using a version of EASEA that requests the libboost library and you get an error message where the linker complains about not finding libboost, then you can install it with the following line:
    $ sudo apt-get install libboost-program-options1.40-dev
    The program should now compile without a glitch.

It is now time to test the minimization of the Weierstrass function by typing:
$ ./weierstrass

As a result, the following lines should appear:

Population initialisation (Generation 0)...
GEN TIME EVAL BEST AVG STDEV
0 0.155148 100 1.33e+02 1.50e+02 6.81e+00
1 0.281657 200 1.30e+02 1.45e+02 6.67e+00
2 0.407462 300 1.26e+02 1.41e+02 4.93e+00
...
97 13.315471 9800 1.05e+02 1.14e+02 3.79e+00
98 13.456843 9900 1.05e+02 1.15e+02 3.87e+00
99 13.608257 10000 1.05e+02 1.14e+02 3.44e+00
Current generation 100 Generational limit : 100
Stopping criterion reached : 0

Note that artificial evolution being a stochastic process, the values you obtain should be different from those above (and also different across runs).

What is displayed above is respectively the generation number, time spent since the beginning of the run, number of times the evaluation function has been called, fitness obtained by the best individual of the generation, average fitness of the population and standard deviation.

Generation 0 is in fact the initialisation of the population. Asking for 100 generations will result in the creation of the initial population + 99 generations, as seen above.

Contents of an EASEA .ez file

Typically, in order to implement an evolutionary algorithm to solve a problem, one needs to provide a representation for an individual, along with four functions that are specific to that problem:

An initialisation function
Its job is to initialise the genome of each individual in the population. A purely random initialisation is recommended unless one can reduce the search space by narrowing down the possibilities to feasible individuals only.
A crossover function
The user should code here how a child (child) should be created out of two selected parents (parent1 and parent2).
A mutation function
One a child has been created thanks to the crossover function, most evolutionary algorithms provide for a way to mutate it in order to explore other areas than those sampled by the genetic pool that can be found in the population.
An evaluation function
Finally, the most important function of all is the evaluation function, that should be able to tell whether an individual is good or not. It is worth mentioning that evolutionary algorithms can cope with a partial order among individuals. If the reduction selector is implemented using an anti-tournament, one can discard the worst individual among several (arity of the anti-tournament), meaning that one only needs to be able to tell if an individual is better or worse than another. This does not require the ability to give an absolute rating to each and every individual.

This is about all that is necessary to write into a .ez specification file to get EASEA to wrap a complete evolutionary algorithm around those functions.

User declarations

In order to help the developer, EASEA allows to define global variables as well as macro definitions, that are all stored into a section called User declarations. This section is implemented on the very top of the generated c++ file, meaning that the aforementioned functions have access to all global variables declared here.

Default run parameters

Then, the .ez file provides a section that can be used to store default parameters that allow to specialise the evolutionary algorithm that will evolve the individuals. EASEA was designed so that it is able to emulate all kinds of evolutionary algorithms, from simple Genetic Algorithms to Evolution Strategies (including CMA-ES), as well as Genetic Programming and memetic algorithms. A future version will allow to specify multi-objective evolutionary algorithms.

Going through weierstrass.ez

The weierstrass.ez file is about the simplest one can write to get an evolutionary algorithm running.

User declarations

\User declarations :
// This section contains user declarations
#define SIZE 100         // Number of dimensions of the problem
#define ITER 120         // Number of iterations of the Weierstrass loop
#define MAX(x,y) ((x)>(y)?(x):(y))
#define MIN(x,y) ((x)<(y)?(x):(y))
float fMUT_PER_GENE=0.1; // probability of mutation per gene
\end

The code starts with some macro-definitions, to define the size of the arrays to use (dimension of the problems) and the number of iterations in the Weierstrass loop.

The Weierstrass function is quite interesting to implement for testing purposes: it makes it possible to explore how very large genomes behave (a 1000 dimensions problem will require to optimise 1000 real values, i.e. 4KB in single precision or 8KB in double precision), as well as how much time is taken by the algorithm itself, compared with the execution of the evaluation function, as it is possible to change the complexity of the evaluation function by simply modifying the number of iterations that are required to evaluate the fitness of an individual.

As far as results are concerned, it is a very irregular curve that gives any optimisation algorithms a very hard time in finding the global optimum (0 on all dimensions of the problem).

A MAX and a MIN operators are macro-defined, and a global floating point variable (that contains the mutation probability per gene) is declared and initialised, to be used in the mutation function.

User classes

\User classes :
  // Contains the declaration of the structure of the Genome.
  // GenomeClass is a predefined keyword that is the name of the Genome Class.
GenomeClass { 
  float x[SIZE];
}
\end

All EASEA programs must declare GenomeClass in the \User classes: section. Generally speaking, EASEA syntax looks like simplified C++. The GenomeClass can contain several variables of different types, as well as uni-dimensional arrays. Predefined base types are bool, int, float, double, char as well as pointers, but users can define other classes of their own, that can be used in the GenomeClass.

This is how multi-dimensional arrays are declared: in order to define an (n x m) array, one needs to define an array of n objects of type T, itself defined as an array of m objects of another type. This will be illustrated in the listsort.ez example.

For the Weierstrass problem, an individual contains an array of SIZE floats, SIZE being the number of dimensions of the problem.

GenomeClass initialiser

\GenomeClass::initialiser:
  // Initialisation function (keyword "initializer" is also accepted)
  // "Genome" is the genome of the current individual to be initialised
  // "random" is a function provided by EASEA
  for(int i=0; i<SIZE; i++ ) 
  	Genome.x[i] = random(-1.0,1.0);
\end

The contents of this section is quite straightforward. Note that EASEA implements a user-friendly random(a,b) function (cf. EASEA defined functions section of the user manual), that returns a random value within [a,b[ of the same type as its two parameters. random(-1.0,1.0) will return a float within [-1,1[.

GenomeClass crossover

\GenomeClass::crossover:
  // How to creat "child" out of "parent1" and "parent2".
  // By default, "child" is a clone of "parent1".
  for (int i=0; i<SIZE; i++) {
    float alpha = random(0.,1.); // barycentric crossover
    child.x[i] = alpha*parent1.x[i] + (1.-alpha)*parent2.x[i];
  }
\end

This function uses three predefined variables: child, parent1 and parent2. Both parents are selected by the evolutionary engine according to what is specified in the parameters section, and child is a clone of parent1, to simplify the task of users who would like to implement a multipoint crossover.

Here, a basic barycentric crossover is implemented as Weierstrass is a real valued problem.

Note that the probability of calling the crossover function is defined in the parameters section. This value (between 0 and 1) is stored in the XOVER_PROB global variable (cf. EASEA defined variables) that can be read and modified (if the user wants to change the crossover probability depending on the value of currentGeneration with reference to the predefined number of generations stored in NB_GEN).

If the crossover function is not called to create a child, then, the child is a clone of a selected parent.

GenomeClass mutator

\GenomeClass::mutator :
  // Mutation function for genome "Genome".
  // Must return the number of mutations
  int NbMut=0;
  for (int i=0; i<SIZE; i++)
    if (tossCoin(fMUT_PER_GENE)){
      NbMut++;
      Genome.x[i] += random(-0.1,0.1);
      Genome.x[i] = MIN(1,Genome.x[i]); 
      Genome.x[i] = MAX(-1,Genome.x[i]);
    }
  return NbMut;
\end

In this function, Genome contains the recently created child to be mutated. Note that the bool tosscoin(float) EASEA defined function is used to flip a biased coin. fMUT_PER_GENE contains the probability for the tosscoin function to return TRUE.

The MIN and MAX macro-definitions defined in the User declarations: section are used to make sure that the mutation does not create values that are outside of an [-1,1] interval.

The mutator returns the number of mutated genes for statistics and optimisation purposes: if a child was created through parent cloning and none of the genes of the child get mutated, then, it can retain the fitness value of the parent.

The probability of calling the mutation function on a newly born child is defined in the parameters section of the file. It can be accessed both for reading and writing through the MUT_PROB EASEA defined variable. A common practice consists in decreasing the MUT_PROB value as the number of generations (accessible through the currentGeneration variable) increases.

GenomeClass evaluator

\GenomeClass::evaluator :
  // Must return an evaluation of the quality of the "Genome"
  float val[SIZE];
  float res = 0., b = 2., h = 0.25;
  for (int i = 0;i<SIZE; i++) {
    val[i] = 0.;
    for (int k=0;k<ITER;k++)
      val[i] += pow(b,-(float)k*h) * sin(pow(b,(float)k)*Genome.x[i]);
    res += Abs(val[i]);
  }
  return res;
\end

Then the essential function is the one that returns the fitness of the genome. Here, it computes the Weierstrass function according to the Hölder parameter (controlling the irregularity of the function) and a number of iterations, for each of the dimensions of the problem. The evaluation function must return a real value reflecting the fitness of the individual.

Run parameters

\Default run parameters:     // Please let the parameters appear in this order
  Number of generations: 100 // Value stored in the global NB_GEN variable
  Time limit: 0              // In seconds, 0 to deactivate
  Population size: 100       // Value stored in the global POP_SIZE variable
  Offspring size: 100%       // Percentage of POP_SIZE, or integer value
  Mutation probability: 1    // MUT_PROB (prob. to call the mutation function)
  Crossover probability: 1   // XOVER_PROB (prob. to call the Xover function)
  Evaluator goal: minimise   // Other possibility is "Maximise"
  Selection operator: Tournament 2
  Surviving parents: 0%      //percentage or absolute  
  Surviving offspring: 100%
  Reduce parents operator: Tournament 2
  Reduce offspring operator: Tournament 2
  Final reduce operator: Tournament 2

  Elitism: Strong                // Weak or Strong
  Elite: 1                       // Nb of elite individuals
  Print stats: true              // true/false
  Generate csv stats file: false // true/false
  Generate gnuplot script: false // true/false
  Generate R script: false       // true/false
  Plot stats: false              // true/false

  Remote island model: false     // true/false
  IP file: ip.txt                // File containing all the remote island's IPs

  Save population: true          // Population saved in project_name.pop
  Start from file: false         // and read from project_name.pop
\end

Finally, the <.ez> file ends with a list of parameters that determine the behaviour of the evolutionary engine. A single pass parsing is made through this list so please, respect the order of the parameters.

Number of generations
This is a stopping criterion on the number of generations. This integer value is accessible in the program through the NB_GEN EASEA defined variable.
Time limit
Another stopping criterion, in seconds. A value of 0 deactivates this criterion.
Population size
The value of this parameter is accessible within the program through the POP_SIZE EASEA defined variable. Right now, it cannot be modified. The population size remains constant.
Offspring size
This is either an integer value (indicating how many children must be created through crossover and mutation at each generation), or a percentage of the population size. A % sign must be added to the percentage value for the parser to differentiate with a number of children.
Note that the number of offspring can be different from the population size. In case of a generational GA, offspring size is 100% of the population size. In a Steady State GA, only one child will be created per generation, and in Evolution Strategies, the number of children can be smaller or larger than the population size.
Mutation probability
As indicated, this is the probability (between 0 and 1) to call the mutation function. A common practice is to keep this value to 1, meaning that the mutation function is called 100% of the time, leaving it to the function to mutate or not the genes individually, as done in this example. The mutation probability is available throughout the program through the MUT_PROB EASEA defined variable.
Crossover probability
Same as the mutation probability, for the crossover function. In case of a generational GA, using a probability lower than 1 allows some parents to make it to the next generations, through children that are their clones. This allows to mimic an Evolution Strategiy Plus, that picks the new generation among a subpopulation made of parents + children.
Evaluator goal
This parameter tells the evolutionary engine whether it should try to minimise or maximise the fitness of the individuals. Selectors will work differently depending on the value of this parameter, that can be tested in the program thanks to the MINIMISE boolean EASEA defined variable. It is not possible to modify this variable during the run.
Selection operator
This parameter determines how parents will be selected in order to create children. It can take the following values: tournament, random, roulette, deterministic. Note that roulette is not compatible with a minimisation of the fitness of the individuals.
Tournament expects a parameter that determines the arity of the tournament. If the parameter is smaller than 1, a stochastic tournament is implemented, that returns the best of 2 individuals with a probability within [.5,1]. Therefore, tournament .5 is equivalent to random and tournament 1 is equivalent to tournament 2.
Surviving parents
This is the first of four parameters that are confusing to the beginner.
EASEA is not restricted to Genetic Algorithms (Generational or Steady-State algorithms that evolve bitstrings) or Evolution Strategies (real valued algorithms that create a temporary population made of parents+children before they select the next generation), but can do both as well as a mix of both.
The way to do this is to allow the user to specify exactly what he wants. The Surviving parents parameter allows to tell how many parents should be included in the temporary population that will be used to create the new generation. In case of a Steady State algorithm (that creates only one child per generation), the number of surviving parents should be set to POP_SIZE-1. In order to make things clear, the mainstream evolutionary algorithms page discusses the different algorithms and proposes standard parameter settings to emulate them.
Surviving offspring
This parameter tells how many children should be included in the tempoprary population that will be used to create the new generation. In an Evolutionary Strategy +, this number is typically smaller than the population size. Please refer to the mainstream evolutionary algorithms to understand clearly how things work.
Reduce parents operator
Allows to tell which reduction operator (among tournament, random, roulette, deterministic to use to select parents to be included in the temporary population. Notice that this is a reductor, and not a selector. The difference between a reductor and a selector is that a reductor cannot select several times the same individual. Even though they share the same name (because they work on the same principle), a reduction tournament is not implemented the same way as a selection tournament.
Reduce offspring operator
Same as above, but to select which children will be part of the temporary population.
Final reduce operator
Reductor to be used to create the new generation out of the temporary population made of children+parents. Please refer to the mainstream evolutionary algorithms to understand clearly how things work.
Elitism
Strong or weak. In the case of strong elitism, some of the best parents are immediately saved into the new generation as a very first step. Then, the algorithm continues, but whatever parents are already in the next generation will remain there. In the case of weak elitism, no parents are cloned into the next generation to start with. Elitism will be performed after the temporary population made of parents + children is created. Some of the best individuals of this temporary population will be transferred to the new generation in a deterministic way. Please refer to the mainstream evolutionary algorithms to understand clearly how things work.
Print stats
True/false. Displays infos on the terminal at each generation.
Generate csv stats file
True/false
Generate gnuplot script
True/false
Generate R script
True/false
Plot stats
True/false
Remote island model
True/false. If true, each run will try to periodically send individuals to some other runs on machines whose ip identifier is listed in the file. Right now, options are hard-coded, but this will change soon.
IP file
must provide a file name containing a list of IPs of machines that will run the same project
Save population
True/false. Saves the population in project_name.pop for a future restart.
Start from file
True/false. Bypasses the initialisation function. The population is initialised from the file project_name.pop that was created during a previously saved run, allowing for a restart with different parameters (or why not a different evaluation function !