students:phd_2019

Differences

This shows you the differences between two versions of the page.

students:phd_2019 [2019/05/10 20:31]
blay [References]
students:phd_2019 [2019/05/10 20:38]
blay [Meta-learning in a Portfolio of Machine Learning Workflows]
Line 3: Line 3:
 // //
  
-Recent advances in Machine Learning (ML) have brought new solutions for the problems of prediction, decision, and identification. ML is impacting almost all domains of science and industry, but determining the right ML workflow for a given problem remains a key question. To let non-experts as well as experts benefit from ML's potential, recent years have seen an increasing effort from big data companies (Amazon AWS, Microsoft Azure, Google AutoML...) to provide any user with simple platforms for designing their own ML workflow. However, none of these solutions considers the design of ML workflows as a generic process intended to capture common processing patterns between workflows (even across workflows targeting different application contexts). These platforms either propose a set of dedicated solutions for given classes of problems (e.g. AutoML Vision, AutoML Natural Language, AutoML Translation...) or propose a recipe to build your own ML workflow from scratch (e.g. MS Azure Machine Learning Studio, RapidMiner).
+Recent advances in Machine Learning (ML) have brought new solutions for the problems of prediction, decision, and identification. ML is impacting almost all domains of science and industry, but determining the right ML workflow for a given problem remains a key question. To let non-experts as well as experts benefit from ML's potential, recent years have seen an increasing effort from big data companies (Amazon AWS, Microsoft Azure, Google AutoML...) to provide any user with simple platforms for designing their own ML workflow. However, none of these solutions considers the design of ML workflows as a generic process intended to capture common processing patterns between workflows (even across workflows targeting different application contexts). These platforms either propose a set of dedicated solutions for given classes of problems (e.g. AutoML Vision, AutoML Natural Language, AutoML Translation...)
+or propose a recipe to build your own ML workflow from scratch (e.g. MS Azure Machine Learning Studio, RapidMiner).\\
 Is it then possible to envision the meta-learning process of designing ML workflows as a systematic approach that analyzes past experiences to identify, explain and predict the right choices?
 This PhD thesis will address this issue by combining research on software architectures (including product lines) and meta-learning, to bring ML workflow design to the next level by producing explanations of algorithm choices and by reducing the complexity of portfolio exploration through the identification of common patterns between workflows.
Line 16: Line 17:
 ===== Objectives =====
  The construction of a portfolio requires covering a space of experiments broad enough to "cover" all the problems that may be submitted to it.
-However, (1) the space of problems and solutions presents a very great ((The sources of variability are preprocessing, algorithms, datasets, evaluation criteria, and experimental results. Each has several variants. For example, in OpenML, for each dataset downloaded, 61 dataset meta-features are calculated [18]. There are more than a hundred classification algorithms [5], etc.)) “diversity” [16], even within a single class of problems such as classification [9].
+However, (1) the space of problems and solutions presents a very great ((The sources of variability are preprocessing, algorithms, datasets, evaluation criteria, and experimental results. Each has several variants. For example, in OpenML, for each dataset downloaded, 61 dataset meta-features are calculated [17]. There are more than a hundred classification algorithms [5], etc.)) “diversity” [16], even within a single class of problems such as classification [9].
 (2) The resources required for ML experiments are massive (time, memory, energy)((The theoretical number of experiments needed to study p preprocessing steps, n algorithms and d datasets is 2^p*n*d. For 10 preprocessing algorithms, 100 classification algorithms and 100 datasets, even if each experiment lasted only one minute, it would take more than 7000 days of execution time.)).
 (3) As the ML domain is particularly productive, the portfolio must be able to evolve to integrate new algorithms.
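
The cost estimate in the footnote to point (2) can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function names are hypothetical, the 2^p factor is read here as "every subset of the p preprocessing steps may be switched on or off" (an assumption about the footnote's intent), and experiments are assumed to run one after another with no parallelism.

```python
def experiment_count(p: int, n: int, d: int) -> int:
    """Theoretical number of experiments, following the 2^p*n*d formula.

    Assumption: 2**p counts the on/off combinations of p preprocessing
    steps; each combination is paired with every algorithm and dataset.
    """
    return (2 ** p) * n * d


def execution_days(p: int, n: int, d: int, minutes_per_experiment: float = 1.0) -> float:
    """Sequential execution time in days (assumes no parallel runs)."""
    total_minutes = experiment_count(p, n, d) * minutes_per_experiment
    return total_minutes / (60 * 24)


# Figures from the footnote: 10 preprocessing algorithms,
# 100 classification algorithms, 100 datasets, one minute per experiment.
count = experiment_count(10, 100, 100)
days = execution_days(10, 100, 100)
print(f"{count} experiments, about {days:.0f} days")
```

With the footnote's figures this gives 10,240,000 experiments and roughly 7,111 days, consistent with the "more than 7000 days" stated in the text. Adding a single extra preprocessing step doubles both numbers, which is why the portfolio cannot be built by exhaustive exploration.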
students/phd_2019.txt · Last modified: 2019/05/10 20:40 by blay