students:phd_mlws
Differences
This shows you the differences between two versions of the page.
students:phd_mlws [2017/05/20 07:06] – created blay | students:phd_mlws [2017/05/20 07:32] – [Machine Learning Workflow System] blay | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== Machine Learning Workflow System ====== | ====== Machine Learning Workflow System ====== | ||
+ | This subject is proposed as part of the [[http:// | ||
===== Context ===== | ===== Context ===== | ||
For many years, Machine Learning research has focused on designing new algorithms for solving similar kinds of problem instances (Kotthoff, 2016). However, researchers have long recognized that a single algorithm will not give the best performance across all problem instances; for example, the No Free Lunch theorem (Wolpert, 1996) states that the best classifier will not be the same on every dataset. Consequently, | For many years, Machine Learning research has focused on designing new algorithms for solving similar kinds of problem instances (Kotthoff, 2016). However, researchers have long recognized that a single algorithm will not give the best performance across all problem instances; for example, the No Free Lunch theorem (Wolpert, 1996) states that the best classifier will not be the same on every dataset. Consequently, | ||
Line 8: | Line 9: | ||
A Machine Learning (ML) Workflow can be defined as a tuple (h,p,c) where h represents the hyper-parameter tuning strategy, | A Machine Learning (ML) Workflow can be defined as a tuple (h,p,c) where h represents the hyper-parameter tuning strategy, | ||
The construction of a Machine Learning Workflow depends upon two main aspects: | The construction of a Machine Learning Workflow depends upon two main aspects: | ||
- | The structural characteristics (size, quality, and nature) of the collected data | + | |
- | How the results will be used | + | * How the results will be used. |
This task is highly complex because of the increasing number of available algorithms, the difficulty of choosing the correct preprocessing techniques together with the right algorithms, and the need to tune their parameters correctly. To decide which algorithm to choose, data scientists often consider families of algorithms in which they are experts, and may set aside algorithms that are more “exotic” to them but could perform better for the problem they are trying to solve. | This task is highly complex because of the increasing number of available algorithms, the difficulty of choosing the correct preprocessing techniques together with the right algorithms, and the need to tune their parameters correctly. To decide which algorithm to choose, data scientists often consider families of algorithms in which they are experts, and may set aside algorithms that are more “exotic” to them but could perform better for the problem they are trying to solve. | ||
Line 24: | Line 25: | ||
The thesis must address the following challenges: the relevance and quality of predictions, and scalability to manage the very large number of ML workflows. | The thesis must address the following challenges: the relevance and quality of predictions, and scalability to manage the very large number of ML workflows. | ||
To meet these challenges, attention should be paid to the following aspects: | To meet these challenges, attention should be paid to the following aspects: | ||
- | Handling Variabilities: | + | * //Handling Variabilities: |
- | Architecture of the portfolio to automatically manage (1) running experiments, (2) collecting experiment results, (3) analyzing results, and (4) evolving the algorithm base. It must support the management of execution errors, incremental analyses, and the identification of the context of experiments. | + | ||
- | Handling Scalability of Portfolio: | + | * //Handling Scalability of Portfolio: |
- | Ensuring global consistency of Portfolio and Software Product Line. Such a system is enriched by additions to the portfolio and experiment feedback. As " | + | * //Ensuring global consistency// of Portfolio and Software Product Line. Such a system is enriched by additions to the portfolio and experiment feedback. As " | ||
- | We have two years of experience on this subject, which has enabled us to (I) eliminate some approaches (e.g. modeling knowledge as a system of constraints because it generates on our current basis more than 6 billion constraints), | + | ||
+ | We have two years of experience on this subject, which has enabled us to (I) eliminate some approaches (e.g. modeling knowledge as a system of constraints because it generates on our current basis more than 6 billion constraints), | ||
The thesis must investigate research on algorithm selection, taking into account the automatic composition of workflows and support for dynamic evolution. It is therefore a thesis in software engineering research, but one that addresses one of the most central current problems in machine learning. | The thesis must investigate research on algorithm selection, taking into account the automatic composition of workflows and support for dynamic evolution. It is therefore a thesis in software engineering research, but one that addresses one of the most central current problems in machine learning. |
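The four portfolio duties listed above (running experiments, collecting results, analyzing them, and evolving the algorithm base, with support for execution errors and incremental analysis) can be illustrated with a minimal Python sketch. This is only an illustration under stated assumptions, not the project's architecture: the names `Experiment`, `Portfolio`, `run_all` and `best_per_dataset` are hypothetical, and the "algorithms" are toy functions that return a stand-in score rather than real learners.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical names throughout: this is a toy model of a portfolio,
# not an implementation taken from the proposal.
ScoreFn = Callable[[List[float]], float]


@dataclass
class Experiment:
    algorithm: str
    dataset: str
    score: Optional[float] = None  # None when the run failed
    error: Optional[str] = None    # execution error message, if any


@dataclass
class Portfolio:
    algorithms: Dict[str, ScoreFn] = field(default_factory=dict)
    results: List[Experiment] = field(default_factory=list)

    def add_algorithm(self, name: str, fn: ScoreFn) -> None:
        """(4) Evolution of the algorithm base."""
        self.algorithms[name] = fn

    def run_all(self, datasets: Dict[str, List[float]]) -> None:
        """(1) Run experiments and (2) collect results.

        Pairs that already have a recorded result are skipped, so a
        second call only runs what is new (incremental behavior).
        """
        done = {(e.algorithm, e.dataset) for e in self.results}
        for d_name, data in datasets.items():
            for a_name, fn in self.algorithms.items():
                if (a_name, d_name) in done:
                    continue
                try:
                    exp = Experiment(a_name, d_name, score=fn(data))
                except Exception as exc:  # record the failure, keep going
                    exp = Experiment(a_name, d_name, error=str(exc))
                self.results.append(exp)

    def best_per_dataset(self) -> Dict[str, str]:
        """(3) Analyze results: best-scoring algorithm for each dataset."""
        best: Dict[str, Experiment] = {}
        for e in self.results:
            if e.score is None:
                continue
            if e.dataset not in best or e.score > best[e.dataset].score:
                best[e.dataset] = e
        return {d: e.algorithm for d, e in best.items()}


# Toy stand-ins for learning algorithms: each returns a fake "score".
def first(data: List[float]) -> float:
    return data[0]


def last(data: List[float]) -> float:
    return data[-1]


def broken(data: List[float]) -> float:
    raise ValueError("unsupported data")


datasets = {"d1": [0.2, 0.4], "d2": [0.9, 0.1]}
portfolio = Portfolio()
portfolio.add_algorithm("first", first)
portfolio.add_algorithm("broken", broken)
portfolio.run_all(datasets)

portfolio.add_algorithm("last", last)  # the algorithm base evolves
portfolio.run_all(datasets)            # only the new pairs are run
print(portfolio.best_per_dataset())
```

In this toy run the best algorithm differs between the two datasets, which echoes the point made in the Context section that no single algorithm wins everywhere. Failed runs are stored as results with an error message instead of aborting the whole batch.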
students/phd_mlws.txt · Last modified: 2017/05/28 18:03 by blay