== Tutorial ==
 
=== 1) The onemax problem ===
 
Please find below a basic file implementing the onemax problem (starting from a population of individuals made of totally random arrays of 1's and 0's, evolve an individual made of only 1's).
 
The onemax problem is quite difficult for artificial evolution because the evaluation function (sum of the bits composing the individuals) is not very informative and cannot guide the crossover function.
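To make the evaluation function concrete, here is a minimal C++ sketch of the onemax fitness, assuming a 1000-bit genome as in this tutorial (an illustration of the principle only, not the code generated by easena; the name <code>onemaxFitness</code> is hypothetical):

 #include <array>
 #include <cstddef>
 
 constexpr std::size_t GENOME_SIZE = 1000;
 
 // onemax fitness: simply count the number of 1 bits in the genome.
 // The best possible individual scores GENOME_SIZE.
 int onemaxFitness(const std::array<bool, GENOME_SIZE>& genome) {
        int sum = 0;
        for (bool bit : genome)
                sum += bit ? 1 : 0;
        return sum;
 }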
 
 
Find here the [[onemax.ez code]] to be copied into a file called "onemax.ez" (already in the set of examples).
 
 
The goal of the exercise is, starting from this EASEA file implementing a basic onemax for individuals made of an array of 1000 boolean values, to get the algorithm to find the onemax (an individual with 1000 1's) in 10 000 evaluations (note that the cost of artificial evolution must be measured in number of evaluations, not number of generations).
 
 
Copy the contents of [[basic onemax]] into a file called "onemax.ez" and compile it with the "easena" compiler, using the following command line:<br>
 
<code>$ easena onemax.ez</code><br>
 
If you look at the contents of your directory, you will find that easena has created many other C++ source files out of your onemax.ez file. Because easena is nice to its users, it has also created a Makefile, so that you can now compile the created source files with:<br>
 
<code>$ make</code><br>
 
You can then run the <code>onemax</code> executable file with:
 
<code>$ ./onemax</code>
 
 
Because artificial evolution is a stochastic optimization algorithm, if you run it again, you will find another value. This means that no repeatability is guaranteed (you may find a wonderful solution one day and never find it again). This also means that bug chasing could be difficult if your code hangs once in a while.
 
 
Fortunately, random numbers are created thanks to a deterministic random number generator using a seed number as a starting point. If you run <code>$ ./onemax --seed 1</code> several times, you will always find value 7.30e+02.
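The principle can be illustrated in a few lines of standard C++ (a sketch of the idea only; EASEA's internal generator is not necessarily <code>std::mt19937</code>):

 #include <iostream>
 #include <random>
 
 int main() {
        std::mt19937 rng(1); // deterministic generator seeded with 1
        std::uniform_real_distribution<double> dist(0.0, 1.0);
        for (int i = 0; i < 3; i++)
                std::cout << dist(rng) << '\n'; // same three values on every run
 }

Running this program twice prints exactly the same numbers, because the whole pseudo-random sequence is determined by the seed.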
 
 
This is wonderful because you can track the quality of the results obtained by your algorithm by running it with seeds 1, 2, 3, ... and get back to a particular result by running the algorithm again with the seed value that gave the interesting result.
 
 
Because of the variability of results coming out of a stochastic algorithm, and because it is possible to control this thanks to seed values, it is necessary to get some inspiration from the biologists and copy their good practice of keeping a laboratory notebook, where all parameters '''and seeds''' should be written down in order to be able to find good parameters again.
 
 
Back to onemax.ez: the 7.30e+02 (obtained with <code>--seed 1</code>) isn't that great. Could you find better parameters that would allow reaching value 1000 in <= 10 000 evaluations?
 
 
[[experiment]]
 
 
=== 2) Faux-Fourier transform ===
 
The idea of the onemax tutorial was for you to get a really simple example of an evolutionary algorithm running, and have fun tuning the different parameters to see their effect. Let's get more serious now, by trying to find out how a periodic signal decomposes into a sum of sines.
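One natural encoding for this problem (a sketch under assumptions, not the solution developed in this tutorial; the names <code>Sine</code> and <code>fitness</code> are hypothetical) is a genome made of (amplitude, frequency, phase) triplets, evaluated by the squared error between the reconstructed sum of sines and the sampled target signal:

 #include <cmath>
 #include <cstddef>
 #include <vector>
 
 struct Sine { double amplitude, frequency, phase; }; // one gene
 
 // Hypothetical evaluator: squared error between the candidate sum of
 // sines and the target signal sampled with time step dt (lower is better).
 double fitness(const std::vector<Sine>& genome,
                const std::vector<double>& target, double dt) {
        double error = 0.0;
        for (std::size_t i = 0; i < target.size(); i++) {
                double t = i * dt, value = 0.0;
                for (const Sine& s : genome)
                        value += s.amplitude * std::sin(s.frequency * t + s.phase);
                double diff = value - target[i];
                error += diff * diff;
        }
        return error;
 }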
 
 
As always in computer science, the best way to start a new project is to start from a previous working project, so let's start from the previous "optimised" good-onemax.ez by copying it to faux_fourier.ez (<code>$ cp good-onemax.ez faux_fourier.ez</code>).
 
Beware: using "faux-fourier.ez" will get you a compilation error, because the file name is reused to build identifiers in the generated C++ sources, and '-' is not a valid character in a C++ identifier (hence the underscore).
 
 
== Reproducibility ==
 
Running evolutionary algorithms is a discipline that is very close to biology laboratory work, because if you use a different random seed each time you launch an experiment, you will always get different results.
 
 
Therefore, many tools used to manage wet labs can be of great inspiration when running evolutionary algorithms. Let me quote Rules 5, 6 and 7 from Santiago Schnell's excellent "Ten Simple Rules for a Computational Biologist’s Laboratory Notebook" that can be found here:<br>
 
Schnell S (2015) Ten Simple Rules for a Computational Biologist’s Laboratory Notebook. PLoS Comput Biol 11(9): e1004385. doi:10.1371/journal.pcbi.1004385
 
<b>Rule 5: Every Entry Should Be Recorded with a Date, Subject, and Protocol</b><br>

The most logical organization of a lab notebook is chronological. Each entry should contain the date it was made and subject of the entry. If you make distinct sets of entries in the same day, you should separate them by using heading titles and leave sufficient space between the entries [1]. The titles of your entries are important. They should be short, sharp, and informative, as you will use them to build a table of contents for your lab notebook. If you are using a paper lab notebook, you will need to write the date and time stamp and make your entries legible, written in permanent ink and in a language accessible to everyone in the laboratory. If you use an electronic lab notebook, the date and time stamps will be entered automatically for each new subject.<br>

You should also include a brief background for each entry [2]. It could be a short synopsis of your thought process that explains why this entry is important for your scientific work. You can support it with references published in the literature, ideas learned from a research talk, or a previous model developed by your laboratory. Then, record everything you do for this specific entry. If you make a mistake in your paper lab notebook, put a thin line through the mistake and write the new information next to it [1]. Never erase or destroy an entry, because errors or mistakes are an important part of the scientific process.<br>
 
<b>Rule 6: Keep a Record of How Every Result Was Produced</b><br>

The gold standard of science is reproducibility [7]. You need to keep a record of how every result was produced in your in silico experiments, statistical analyses, and mathematical or computational models. Noting the sequence of steps taken allows for a result or analysis to be reproduced. For every step in your model, analysis, or experiment, you should record every detail that will influence its execution [8]. This includes the preparation of your wet or dry experiment, preprocessing, execution with intermediate steps, analysis, and postprocessing of results [1,2]. You should also store the raw data for every figure. This will allow you to have the exact values for the visualization of your results. It will also give you the opportunity to redraw figures to improve their quality or ensure visual consistency for publication.<br>

If a result is obtained through a mathematical model, algorithm, or computer program, you need to include the equations and name and version of the algorithm or computer program, as well as the initial conditions and parameters used in the model. In many instances, a statistical or computational analysis creates intermediate data [9]. You should record intermediate results because they can reveal discrepancies with your final results, particularly if your analysis is time consuming or readily executable. At the same time, they can help you track any inconsistencies in your analysis or algorithms [8]. Electronic lab notebooks can be very convenient for storing data and linking to computer mathematical models, algorithms, and software stored in cloud mass storage systems.<br>
 
<b>Rule 7: Use Version Control for Models, Algorithms, and Computer Code</b><br>

As a mathematical and computational biologist, you will be updating your models, algorithms, or computer programs frequently. You will also create scripts containing initial conditions and parameters to run analyses or simulations. Changes in your models, algorithms, programs, or scripts could drastically change your results. If you do not systematically archive changes, it will be very difficult or impossible to track the codes that you used to generate certain results [8,9]. Nowadays, there are version control systems to track the evolution of algorithms and computer programs or changes in scripts. Bitbucket, Git, Subversion, and Mercurial are among the most widely used version-control systems. You should use a standardized name system to identify changes. If you have a paper lab notebook, you should record the name and location of the scripts. Those using electronic lab notebooks can add links to each version of their scripts or programs.<br>
 
<b>References</b><br>

1. Thomson JA (2007). How to Start—and Keep—a Laboratory Notebook: Policy and Practical Guidelines. In: Intellectual Property Management in Health and Agricultural Innovation: A Handbook of Best Practices (eds. Krattiger A, Mahoney RT, Nelsen L, et al.). MIHR: Oxford, U.K., pp. 763–771.<br>
2. National Institutes of Health, Office of Intramural Training and Education's Webinar on Keeping a Lab Notebook. https://www.training.nih.gov/OITEtutorials/OITENotebook/Notebook.html<br>
3. Sandefur CI (2014). Blogging for electronic record keeping and collaborative research. Am J Physiol Gastrointest Liver Physiol. 307:G1145–G1146. doi: 10.1152/ajpgi.00384.2014 PMID: 25359540<br>
4. De Polavieja Lab. Open LabBook. http://www.neural-circuits.org/open-labbook<br>
5. Todoroki S, Konishi T, Inoue S (2006). Blog-based research notebook: personal informatics workbench for high-throughput experimentation. Appl Surf Sci 252: 2640–2645.<br>
6. Ruggiero P, Heckathorn MA (2012). Data backup options. Department of Homeland Security, United States Computer Readiness Plan. https://www.us-cert.gov/sites/default/files/publications/data_backup_options.pdf<br>
7. Jasny BR, Chin G, Chong L, Vignieri S (2011). Data replication & reproducibility. Again, and again, and again .... Introduction. Science. 334(6060):1225. doi: 10.1126/science.334.6060.1225 PMID: 22144612<br>
8. Sandve GK, Nekrutenko A, Taylor J, Hovig E (2013). Ten simple rules for reproducible computational research. PLoS Comput Biol. 9(10):e1003285. doi: 10.1371/journal.pcbi.1003285 PMID: 24204232<br>
9. Peng RD (2011). Reproducible research in computational science. Science. 334:1226–7. doi: 10.1126/science.1213847 PMID: 22144613<br>
10. U.S. Department of Health & Human Services, Office of Research Integrity. Notebook and data management. http://ori.hhs.gov/education/products/wsu/data.html<br>
11. U.S. Department of Health & Human Services, Office of Research Integrity. Data Management Responsibilities. http://ori.hhs.gov/education/products/unh_round1/www.unh.edu/rcr/DataMgt-Responsibilities01.htm<br>
 
 
 
== Examples ==

Some working examples are already present in the <tt>examples/</tt> directory. As a new user of EASENA, you may find it useful to walk through the proposed examples while changing various parameters.

=== Your first Multi-Objective Optimization Problem example ===

As a simple example, we will show how to define the 3-objective DTLZ1 problem using the NSGA-II algorithm. First of all, we go to the test folder (<tt>examples/dtlz1/</tt>):

<code>$ cd examples/dtlz1/</code>

Now you can begin defining your multi-objective problem in the *.ez file. In this example, the problem has to be defined in the dtlz1.ez file as follows:

I. First, in section \User declarations:

1. The header files for the Problem Interface and the Genetic Operators must be included:

  #include <problems/CProblem.h>                                // header for problem description
  #include <operators/crossover/continuous/CsbxCrossover.h>     // header for SBX crossover operator
  #include <operators/mutation/continuous/CGaussianMutation.h>  // header for Gaussian mutation operator (CPolynomialMutation.h is also available)


2. Then you have to define the main parameters of the problem (the number of decision variables and the number of objectives) as follows:

  #define NB_VARIABLES 10  // here we set 10 variables
  #define NB_OBJECTIVES 3  // here we set 3 objectives


3. The genetic operator parameters have to be defined:

  #define XOVER_DIST_ID 20 // crossover distribution index
  #define MUT_DIST_ID 20   // mutation distribution index

4. Now it is time to use the Problem constructor, which is responsible for initializing the problem. A description of the constructor parameters is shown below:

  /*
   * param[in 1] - number of objectives
   * param[in 2] - number of decision variables
   * param[in 3] - problem boundary
   */
  TP m_problem(NB_OBJECTIVES, NB_VARIABLES, TBoundary(NB_VARIABLES, std::make_pair<TT, TT>(0, 1)));

With this constructor you declare a problem consisting of 3 objectives and 10 decision variables, each with min boundary = 0 and max boundary = 1.

5. Finally, the genetic operators must be defined:

  typedef easea::operators::crossover::continuous::sbx::CsbxCrossover<TT, TRandom &> TCrossover; // type of crossover
  typedef easea::operators::mutation::continuous::pm::CGaussianMutation<TT, TRandom &> TMutation; // type of mutation

To define the crossover operator parameters:

  /*
   * param[in 1] - random generator
   * param[in 2] - probability
   * param[in 3] - problem boundary
   * param[in 4] - distribution index
   */
  TCrossover crossover(m_generator, 1, m_problem.getBoundary(), XOVER_DIST_ID);

To define the mutation operator parameters:

  /*
   * param[in 1] - random generator
   * param[in 2] - probability
   * param[in 3] - problem boundary
   * param[in 4] - distribution index
   */
  TMutation m_mutation(m_generator, 1. / m_problem.getBoundary().size(), m_problem.getBoundary(), MUT_DIST_ID);

Note that the mutation probability is written <code>1. / m_problem.getBoundary().size()</code> (a floating-point division: the integer expression 1/size() would evaluate to 0), i.e. each decision variable is mutated with probability 1/n, one variable per individual on average.

II. Accepted notations (important to know!):

TI - type of individual;<br>
TI::m_variable - vector of decision variables of every individual;<br>
TI::m_objective - vector of objective functions of every individual;<br>
TT - type of decision variables;<br>
TIter - iterator over decision variables.<br>

You can find an example of how to use them in the next section.

III. In order to define some extra functions, section \User functions has to be used. As an example, the code is shown below:
 template <typename TT, typename TIter>
 TT userFunction1(TIter begin, TIter end)
 {
        TT sum = 0;
        for (TIter it = begin; it != end; ++it)
        {
                const TT tmp = *it - 0.5;
                sum += tmp * tmp - cos(20 * PI * tmp);
        }
        return 100 * (distance(begin, end) + sum);
 }
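For reference, this userFunction1 computes the auxiliary <math>g</math> function of the DTLZ1 benchmark over the variables in the range [begin, end):

<math>g(\mathbf{x}) = 100\left(k + \sum_{i=1}^{k}\left((x_i - 0.5)^2 - \cos\left(20\pi (x_i - 0.5)\right)\right)\right)</math>

where <math>k</math> = distance(begin, end) is the number of variables in the range.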

IV. In order to define the problem evaluation function, section \GenomeClass::evaluator has to be used, for example as below:

\GenomeClass::evaluator :

 // uses Genome to evaluate the quality of the individual
       const size_t pVariables = getNumberOfObjectives() - 1;
       const TT g = (1 + userFunction1(TI::m_variable.begin() + pVariables, TI::m_variable.end())) * 0.5;
       userFunction2(TI::m_variable.begin(), TI::m_variable.begin() + pVariables, TI::m_objective.begin(), TI::m_objective.end(), g);
       return 1;

\end


V. Compilation and running

After following the instructions above, the problem is defined, and you can compile and run your program. Select the MOEA (possible options: -nsgaii, -nsgaiii, -asrea, -cdas) by changing the compile.sh script, then run it:

<code>$ ./compile.sh</code>

If you have successfully compiled your .ez file, you will find .cpp, .h, Makefile, executable and .prm files. By modifying the .prm file, you can set the number of generations (nbGen) and the population size (popSize; nbOffspring must be set to the same value).

Now you can run the selected MOEA to solve your problem and plot the results:

<code>$ ./launch.sh</code>

After execution, you will find the following files in the same folder:

- .png - a figure of the obtained Pareto Front<br>
- objectives - the values of the objective functions (an approximation of the Pareto optimal set)

VI. Performance Metrics

When a multi-objective algorithm is run on a multi-objective problem, it outputs an approximation of the Pareto optimal set. If the true Pareto Front (PF, the set of optimal solutions) is known, this approximation can be used to measure the quality of the algorithm on that particular problem.

  1. Hypervolume (HV), maximisation: it provides the volume of the objective space that is dominated by a PF. HV is bounded by a reference point, which is usually set by finding a worst-case value for each objective. It reflects both the convergence towards the PF and the diversity of the obtained solution set. The main disadvantage of the HV is its computational complexity, O(n^(m-1)), where n is the size of the non-dominated set and m is the number of objectives.
  2. Generational Distance (GD), minimization: it measures the average Euclidean distance between the solutions obtained by the algorithm and those in the Pareto Optimal Front (see the sketch after this list).
  3. Inverted Generational Distance (IGD), minimization: it is an inverted variation of the Generational Distance that i) calculates the minimum Euclidean distance between an obtained solution and the real PF and ii) measures both the diversity and the convergence of the obtained set towards the PF (if enough members of the PF are known).
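As an illustration of how such a metric is computed, here is a minimal C++ sketch of the Generational Distance, assuming both fronts are available as plain vectors of objective vectors (a sketch only, not the implementation shipped with EASENA):

 #include <algorithm>
 #include <cmath>
 #include <cstddef>
 #include <limits>
 #include <vector>
 
 using Point = std::vector<double>; // one objective vector
 
 // Generational Distance: average Euclidean distance from each obtained
 // solution to its nearest neighbour on the true Pareto Front.
 double generationalDistance(const std::vector<Point>& obtained,
                             const std::vector<Point>& trueFront) {
        double total = 0.0;
        for (const Point& p : obtained) {
                double best = std::numeric_limits<double>::max();
                for (const Point& q : trueFront) {
                        double d2 = 0.0;
                        for (std::size_t k = 0; k < p.size(); k++)
                                d2 += (p[k] - q[k]) * (p[k] - q[k]);
                        best = std::min(best, d2);
                }
                total += std::sqrt(best);
        }
        return total / obtained.size();
 }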

To use these performance metrics, add the following lines to your ez-file:

  #define QMETRICS
  #define PARETO_TRUE_FILE "pareto-true.dat"

where "pareto-true.dat" is a file with the Pareto Optimal Front (in the same folder as the ez-file). See the examples in examples/zdt(4,6).ez and examples/dtlz(1-3).ez.

=== Weierstrass ===

Here is a complete EASEA program, as found in the <tt>examples/</tt> directory.

In file weierstrass.ez:

 /*_________________________________________________________
    Test functions
    log normal adaptive mutation
    Selection operator: Tournament
 __________________________________________________________*/

\User declarations :

  #define SIZE 100
  #define X_MIN -1.
  #define X_MAX 1.
  #define ITER 120
  #define Abs(x) ((x) < 0 ? -(x) : (x))
  #define MAX(x,y) ((x)>(y)?(x):(y))
  #define MIN(x,y) ((x)<(y)?(x):(y))
  #define SIGMA 1. /* mutation parameter */
  #define PI 3.141592654


float pMutPerGene=0.1;

\end

\User functions:

  #include <math.h>

 __device__ __host__ inline static float SQR(float d) {
        return (d*d);
 }

 __device__ __host__ inline float rosenbrock( float const *x) {
        float qualitaet;
        int i;
        int DIM = SIZE;
        qualitaet = 0.0;
        for( i = DIM-2; i >= 0; --i)
                qualitaet += 100.*SQR(SQR(x[i])-x[i+1]) + SQR(1.-x[i]);
        return ( qualitaet);
 } /* f_rosenbrock() */

 __device__ __host__ inline float Weierstrass(float x[SIZE], int n) // multidimensional Weierstrass, h = 0.35
 {
        float res = 0.;
        float val[SIZE];
        float b = 2.;
        float h = 0.35;
        for (int i = 0; i < n; i++) {
                val[i] = 0.;
                for (int k = 0; k < ITER; k++)
                        val[i] += pow(b,-(float)k*h) * sin(pow(b,(float)k)*x[i]);
                res += Abs(val[i]);
        }
        return (res);
 }
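In mathematical form, the fitness computed above is the truncated Weierstrass-like sum

<math>f(\mathbf{x}) = \sum_{i=1}^{n} \left| \sum_{k=0}^{ITER-1} b^{-kh} \sin(b^{k} x_i) \right|, \qquad b = 2,\; h = 0.35,</math>

where the inner series is cut off after ITER = 120 terms.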

 /* Generates a normally distributed random value with variance 1 and 0 mean.
    Algorithm based on "gasdev" from Numerical Recipes, pg. 203. */
 float gauss()
 {
        static int iset = 0;
        static float gset = 0.0; // must be static: it stores the spare deviate for the next call
        float v1 = 0.0, v2 = 0.0, r = 0.0;
        float factor = 0.0;

        if (iset) {
                iset = 0;
                return gset;
        }
        else {
                do {
                        v1 = (float)random(0.,1.) * 2.0 - 1.0;
                        v2 = (float)random(0.,1.) * 2.0 - 1.0;
                        r = v1 * v1 + v2 * v2;
                } while (r > 1.0);
                factor = sqrt(-2.0 * log(r) / r);
                gset = v1 * factor;
                iset = 1;
                return (v2 * factor);
        }
 }
\end
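This is the classical polar (Marsaglia) variant of the Box-Muller transform: a point <math>(v_1, v_2)</math> is drawn uniformly in the unit disc, and both <math>v_1\sqrt{-2\ln r / r}</math> and <math>v_2\sqrt{-2\ln r / r}</math>, with <math>r = v_1^2 + v_2^2</math>, are independent standard normal deviates; one is returned immediately and the other is cached in gset for the next call.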

\User CUDA: \end

\Before everything else function: { } \end

\After everything else function:

 //cout << "After everything else function called" << endl;

\end

\At the beginning of each generation function:{ } \end

\At the end of each generation function:

 //cout << "At the end of each generation function called" << endl;

\end

\At each generation before reduce function:

 //cout << "At each generation before replacement function called" << endl;

\end

\User classes :

GenomeClass {
        float x[SIZE];
        float sigma[SIZE]; // self-adaptive mutation parameter
}
\end

\GenomeClass::display:
 /* for( size_t i=0 ; i<SIZE ; i++){ */
 /*      // cout << Genome.x[i] << ":" << Genome.sigma[i] << "|"; */
 /*      printf("%.02f:%.02f|",Genome.x[i],Genome.sigma[i]); */
 /* } */
\end

\GenomeClass::initialiser : // "initializer" is also accepted

 for(int i=0; i<SIZE; i++ ) {
        Genome.x[i] = (float)random(X_MIN,X_MAX);
        Genome.sigma[i] = (float)random(0.,0.5);
 }
\end

\GenomeClass::crossover :

 for (int i=0; i<SIZE; i++)
 {
   float alpha = (float)random(0.,1.); // barycentric crossover
    child.x[i] = alpha*parent1.x[i] + (1.-alpha)*parent2.x[i];
 }

\end

\GenomeClass::mutator : // Must return the number of mutations

 int NbMut = 0;
 float pond = 1./sqrt((float)SIZE);
 for (int i=0; i<SIZE; i++)
        if (tossCoin(pMutPerGene)){
                NbMut++;
                Genome.sigma[i] = Genome.sigma[i] * exp(SIGMA*pond*(float)gauss());
                Genome.sigma[i] = MIN(0.5,Genome.sigma[i]);
                Genome.sigma[i] = MAX(0.,Genome.sigma[i]);
                Genome.x[i] += Genome.sigma[i]*(float)gauss();
                Genome.x[i] = MIN(X_MAX,Genome.x[i]); // to avoid going out of bounds
                Genome.x[i] = MAX(X_MIN,Genome.x[i]);
        }
 return NbMut;
\end
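The mutation above is the standard log-normal self-adaptation rule from Evolution Strategies: each gene's step size is first multiplied by <math>e^{\tau N(0,1)}</math> with learning rate <math>\tau = SIGMA/\sqrt{SIZE}</math> (then clamped to [0, 0.5]), and the gene itself is perturbed by <math>\sigma_i N(0,1)</math> (then clamped to [X_MIN, X_MAX]).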

\GenomeClass::evaluator : // Returns the score
{
        float Score = 0.0;
        Score = Weierstrass(Genome.x, SIZE);
        //Score = rosenbrock(Genome.x);
        return Score;
}
\end

\User Makefile options: \end

\Default run parameters : // Please let the parameters appear in this order

 Number of generations : 100   	// NB_GEN
 Time limit: 0 			// In seconds, 0 to deactivate
 Population size : 2048			//POP_SIZE
 Offspring size : 2048
 Mutation probability : 1       // MUT_PROB
 Crossover probability : 1      // XOVER_PROB
 Evaluator goal : minimise      // Maximise
 Selection operator: Tournament 2.0
 Surviving parents: 100%		// percentage or absolute
 Surviving offspring: 100%
 Reduce parents operator: Tournament 2
 Reduce offspring operator: Tournament 2
 Final reduce operator: Tournament 2
 Elitism: Strong			//Weak or Strong
 Elite: 1
 Print stats: true				//Default: 1
 Generate csv stats file:false			
 Generate gnuplot script:false
 Generate R script:false
 Plot stats:true				//Default: 0
 Remote island model: true
 IP file: ip.txt 			//File containing all the remote island's IP
 Server port : 2929
 Migration probability: 0.33
 Save population: false
 Start from file:false

\end