This shows you the differences between two versions of the page.


students:phd_2019 [2019/05/10 20:38] blay [Context] |
students:phd_2019 [2019/05/10 20:40] (current) blay [Objectives] |
||
---|---|---|---|

Line 16: | Line 16: | ||

===== Objectives ===== | ===== Objectives ===== | ||

- | The construction of a portfolio requires covering a space of experiments broad enough to "cover" all the problems that may be submitted to it. | + | The construction of a portfolio requires covering a space of experiments broad enough to "cover" all the problems that may be submitted to it. \\ |

- | However, (1) the space of problems and solutions exhibits very great “diversity” ((Variability arises in pretreatment, algorithms, datasets, evaluation criteria and experimental results, and each of these has several variants. For example, in OpenML, 61 dataset meta-features are calculated for each dataset downloaded[17]. There are more than a hundred classification algorithms[5], etc.)) [16], even within a single class of problems such as classification [9]. | + | However, (1) the space of problems and solutions exhibits very great “diversity” ((Variability arises in pretreatment, algorithms, datasets, evaluation criteria and experimental results, and each of these has several variants. For example, in OpenML, 61 dataset meta-features are calculated for each dataset downloaded[17]. There are more than a hundred classification algorithms[5], etc.)) [16], even within a single class of problems such as classification [9].\\ |

- | (2) The resources required for ML experiments are massive (time, memory, energy)((The theoretical number of experiments needed to study p pretreatments, n algorithms and d datasets is 2^p*n*d. For 10 preprocessing algorithms, 100 classification algorithms and 100 datasets, even if each experiment lasts only one minute, it would take more than 7000 days of execution time.)). | + | (2) The resources required for ML experiments are massive (time, memory, energy)((The theoretical number of experiments needed to study p pretreatments, n algorithms and d datasets is 2^p*n*d. For 10 preprocessing algorithms, 100 classification algorithms and 100 datasets, even if each experiment lasts only one minute, it would take more than 7000 days of execution time.)). \\ |

- | (3) As the ML domain is particularly productive, the portfolio must be able to evolve to integrate new algorithms. | + | (3) As the ML domain is particularly productive, the portfolio must be able to evolve to integrate new algorithms.\\ |

(4) To cope with the mass of data, the transformation of experimental results into knowledge requires the implementation of automatic analysis procedures. | (4) To cope with the mass of data, the transformation of experimental results into knowledge requires the implementation of automatic analysis procedures. | ||

- | The objective of this thesis is, therefore, to propose different paradigms for constructing a portfolio of machine-learning workflows that meets these requirements: relevance of the results while taming the space of experiments, explanation of the knowledge derived from meta-learning (the meta-learning should not be a black box), and automation of selection processes. | + | The objective of this thesis is, therefore, to propose different paradigms for constructing a portfolio of machine-learning workflows that meets these requirements: relevance of the results while taming the space of experiments, explanation of the knowledge derived from meta-learning (the meta-learning should not be a black box), and automation of selection processes. \\ |
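The order of magnitude quoted in the footnote to point (2) can be checked with a short calculation. The sketch below is only an illustration. The function names are hypothetical, and it assumes that the 2^p factor counts every subset of the p pretreatments (each one either applied or skipped), which the footnote does not state explicitly.

```python
def theoretical_experiments(p: int, n: int, d: int) -> int:
    """Experiment count from the footnote: 2^p * n * d.

    Assumption (not stated in the source): 2^p is the number of
    subsets of the p pretreatments, each one applied or skipped.
    """
    return (2 ** p) * n * d


def execution_days(experiments: int, minutes_per_experiment: float = 1.0) -> float:
    """Convert a number of sequential experiments into days of execution time."""
    minutes_per_day = 60 * 24
    return experiments * minutes_per_experiment / minutes_per_day


# Figures from the footnote: 10 pretreatments, 100 algorithms, 100 datasets,
# one minute per experiment.
count = theoretical_experiments(p=10, n=100, d=100)
days = execution_days(count)
print(f"{count} experiments, about {days:.0f} days")
```

With the footnote's figures this gives 10,240,000 experiments, or roughly 7,111 days of sequential execution, consistent with the "more than 7000 days" quoted above.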

The PhD work will be organized to provide contributions in the following directions: \\ | The PhD work will be organized to provide contributions in the following directions: \\ | ||

1- A representation of experiments in the form of graphs [10] and exploitation of these structures by adapted learning algorithms [3,15]; in particular, this approach should be exploited to reduce the search space and explain the choices made [4];\\ | 1- A representation of experiments in the form of graphs [10] and exploitation of these structures by adapted learning algorithms [3,15]; in particular, this approach should be exploited to reduce the search space and explain the choices made [4];\\ |