Revisions compared for students:phd_mlws: 2017/05/20 09:09 by blay [Objectives] and 2017/05/20 09:31 by blay [Machine Learning Workflow System]
====== Machine Learning Workflow System ======
  
This subject is proposed as part of the [[http://rockflows.i3s.unice.fr/|ROCKFlows]] project involving the following researchers: [[mireilleblayfornarino.i3s.unice.fr|Mireille Blay-Fornarino]], [[http://www.i3s.unice.fr/~mosser/start|Sébastien Mosser]] and [[http://www.i3s.unice.fr/~precioso/|Frédéric Precioso]].
===== Context =====
For many years, Machine Learning research has focused on designing new algorithms for solving similar kinds of problem instances (Kotthoff, 2016). However, researchers have long recognized that no single algorithm gives the best performance across all problem instances; for example, the No Free Lunch theorem (Wolpert, 1996) states that the best classifier will not be the same on every dataset. Consequently, a "winner-take-all" approach should not lead us to neglect algorithms that, while uncompetitive on average, may offer excellent performance on particular problem instances. Rice characterized this as the "algorithm selection problem" (Rice, 1976).
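The contrast between a "winner-take-all" choice and per-instance algorithm selection can be illustrated with a minimal sketch. The algorithm names, dataset names, and scores below are hypothetical values invented for illustration only; they are not measurements from the ROCKFlows project or from any cited work.

```python
from statistics import mean

# Hypothetical accuracy scores, invented purely for illustration
# (not experimental results): scores[algorithm][dataset].
scores = {
    "algo_a": {"d1": 0.90, "d2": 0.88, "d3": 0.60},
    "algo_b": {"d1": 0.85, "d2": 0.86, "d3": 0.70},
    "algo_c": {"d1": 0.55, "d2": 0.50, "d3": 0.95},
}


def winner_take_all(scores):
    """Return the single algorithm with the best average score."""
    return max(scores, key=lambda algo: mean(scores[algo].values()))


def per_instance_selection(scores):
    """Return, for each dataset, the algorithm that scores best on it."""
    datasets = next(iter(scores.values())).keys()
    return {d: max(scores, key=lambda algo: scores[algo][d]) for d in datasets}


overall = winner_take_all(scores)          # best on average
selected = per_instance_selection(scores)  # best per dataset

print("winner-take-all:", overall)
print("per-instance:", selected)
```

With these toy numbers, the algorithm that is best on average is not the best on any individual dataset, while the algorithm with the worst average is the best choice on one of them.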
A Machine Learning (ML) Workflow can be defined as a tuple (h,p,c), where h represents a hyper-parameter tuning strategy, p represents a set of preprocessing techniques applied to the dataset, and c is an ML algorithm used to learn a model from the processed data and then to make predictions on new data.
The construction of a Machine Learning Workflow depends upon two main aspects:
  * The structural characteristics (size, quality, and nature) of the collected data.
  * How the results will be used.
This task is highly complex because of the growing number of available algorithms and the difficulty of choosing the right preprocessing techniques together with the right algorithms and of tuning their parameters correctly. When deciding which algorithm to use, data scientists often consider only the families of algorithms in which they are experts, and may leave aside algorithms that are more "exotic" to them but could perform better on the problem they are trying to solve.
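The (h,p,c) tuple defined above, and the way the number of candidate workflows grows with each available option, can be sketched as follows. The class name ''Workflow'', the function ''enumerate_workflows'', and the catalogue entries are hypothetical names chosen for illustration; representing p as an ordered tuple of step names is also an assumption of this sketch, not something the definition prescribes.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Workflow:
    """One ML workflow as the tuple (h, p, c)."""
    h: str              # hyper-parameter tuning strategy
    p: Tuple[str, ...]  # preprocessing techniques applied to the dataset
    c: str              # ML algorithm used to learn and predict


# Hypothetical catalogue of options (illustrative names only).
tuning_strategies = ["default", "grid_search", "random_search"]
preprocessing_chains = [(), ("normalize",), ("impute", "normalize")]
algorithms = ["decision_tree", "svm", "naive_bayes", "knn"]


def enumerate_workflows(
    hs: Sequence[str],
    ps: Sequence[Tuple[str, ...]],
    cs: Sequence[str],
) -> List[Workflow]:
    """Build every (h, p, c) combination from the given options."""
    return [Workflow(h, p, c) for h, p, c in product(hs, ps, cs)]


workflows = enumerate_workflows(tuning_strategies, preprocessing_chains, algorithms)

# The candidate space is the product of the three option counts.
print(len(workflows), "candidate workflows")
```

Even this small catalogue yields 3 x 3 x 4 candidate workflows, and each additional algorithm or preprocessing chain multiplies the total, which is one way to see why exhaustive manual exploration quickly becomes impractical.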
  
students/phd_mlws.txt · Last modified: 2017/05/28 20:03 by blay