Learning variability of Machine Learning Workflows

By Mireille Blay-Fornarino and Frédéric Precioso

Depending on the data set and the objectives, different machine learning workflows perform differently; no single workflow dominates all the others, as captured by the no free lunch theorem [17]. Is it then possible to envision meta-learning as a systematic approach to analyzing past experiences in order to identify, explain, and predict the right choices? This PhD thesis will address this question by connecting research on software architectures (including software product lines) with research on meta-learning.

Context

Advances in Machine Learning (ML) have brought new solutions to problems of prediction, decision, and identification. To determine the right ML workflow for a given problem, numerous parameters have to be taken into account: the kind of data, the expected performance (error, accuracy, time, memory footprint), the choice of the algorithms, and their judicious composition [14,11]. To help with this task, Microsoft Azure Machine Learning, Amazon AWS, and RapidMiner Auto Model [12] provide tools for assembling ML components. Faced with the complexity of choosing the “right” assembly, meta-learning offers an attractive alternative: learning from the problems of the past. The algorithm selection problem is one of its applications [5]: given a dataset, identify which learning algorithm (and which hyperparameter setting) performs best on it. Algorithm portfolios generalize this problem and automate the construction of selection models [8]. The immediate goal is the same: to predict the results of the algorithms on a given problem without executing them. Even if, within a portfolio, some selection models are built by meta-learning [1], the purpose is different: the portfolio is based on the systematic acquisition of knowledge about the algorithms it contains. Research then focuses on the quality of the acquired knowledge and the return on it, on the acquisition process itself, and on the construction of selection models over time.
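To make the algorithm selection problem concrete, the sketch below (a minimal illustration in Python with scikit-learn; the meta-features, candidate algorithms, and values are hypothetical, and this is not the approach prescribed by the thesis) trains a meta-model that maps dataset meta-features to the algorithm that performed best in past experiments:

<code python>
# Minimal meta-learning sketch for algorithm selection (illustrative only).
# Assumption: meta-features and "best algorithm" labels come from past
# experiments (e.g., an OpenML-like experiment database).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# One row per past dataset, e.g., [n_instances, n_features, class_entropy].
meta_features = np.array([
    [1000,  20, 0.8],
    [500,   10, 0.3],
    [20000, 50, 0.9],
    [300,    5, 0.5],
])
# Label: the algorithm that performed best on that dataset in past runs.
best_algorithm = np.array(["svm", "naive_bayes", "random_forest", "naive_bayes"])

# The meta-model learns the mapping: meta-features -> best algorithm.
meta_model = RandomForestClassifier(n_estimators=100, random_state=0)
meta_model.fit(meta_features, best_algorithm)

# For a new dataset, predict a promising algorithm without running them all.
new_dataset_meta = np.array([[800, 15, 0.6]])
print(meta_model.predict(new_dataset_meta))
</code>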

However, these solutions focus on recommending a single algorithm, whereas it is widely recognized that the quality of the results can be markedly improved by selecting the right workflow, i.e., a complete chain of pre-processing operators, learning algorithms, and post-processing steps. One of the additional challenges is then the growth of the search space. Meta-learning and portfolio approaches are based on the premise that the relationship between data and a good algorithm can be learned. For this, it is essential to learn from past experiences, for example from databases of experiments (e.g., OpenML [13]); the accuracy of the selection models then depends on how well the problem space is covered. This is why different solutions combine meta-learning and online learning [7,2]. For example, auto-sklearn builds on previous experience to reduce the combinatorics, and then learns directly from the problem at hand by automatically confronting a predefined set of classification pipelines built on top of the scikit-learn library [6]. All these solutions rely on pre-selected algorithms and on statistics, without trying to build new knowledge or to capitalize on the observed performances.
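As an illustration of what a “workflow” is and of the search space that such tools explore, the following sketch (assuming scikit-learn; the component choices are arbitrary examples, not those studied in the thesis) enumerates pipelines combining one pre-processing operator and one classifier and scores each by cross-validation:

<code python>
# Sketch of a workflow (pipeline) search space: pre-processing x classifier.
# Illustrative only; AutoML tools search much larger spaces and reuse past
# experiments to prune them instead of evaluating everything.
from itertools import product

from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

preprocessors = [("scaler", StandardScaler()), ("pca", PCA(n_components=10))]
classifiers = [("svc", SVC()), ("rf", RandomForestClassifier(n_estimators=100))]

# Each (pre-processing, classifier) pair is one candidate workflow.
for (pre_name, pre), (clf_name, clf) in product(preprocessors, classifiers):
    workflow = Pipeline([(pre_name, pre), (clf_name, clf)])
    score = cross_val_score(workflow, X, y, cv=5).mean()
    print(f"{pre_name} -> {clf_name}: accuracy = {score:.3f}")
</code>

Even this toy example shows the combinatorial growth: every additional pre-processing operator or algorithm multiplies the number of candidate workflows, which is precisely what meta-learning aims to avoid executing exhaustively.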

Objectives

The construction of a portfolio requires a space of experiments broad enough to cover all the problems that may be submitted to it. Since the field is particularly productive, the portfolio must be able to evolve to integrate new algorithms. The space of problems and solutions exhibits a very great “diversity” 1) [16], even within a single class of problems such as classification [9]. The resources required for ML experiments are massive (time, memory, energy) 2). To cope with the mass of data, the transformation of experimental results into knowledge requires automatic analysis procedures. The objective of this thesis is therefore to propose paradigms for constructing a portfolio of machine-learning workflows that meet these requirements: relevance of the results while taming the space of experiments, explanation of the knowledge produced by meta-learning (meta-learning must not be a black box), and automation of the selection processes. The PhD work will be organized to provide contributions in the following directions:
1- A representation of experiments in the form of graphs [10] and the exploitation of these structures by adapted learning algorithms [3,15]; in particular, this approach should be exploited to reduce the search space and to explain the choices made [4] (see the sketch after this list);
2- An automation of the analysis of experimental results to identify properties and patterns of workflows, such as similarities and cliques or independent sets corresponding to algorithms that, statistically, should always or never be used together;
3- A systematic exploitation of this structure to reduce the number of executions, to drive workflow compositions, to manage the feedback loop, and to justify choices.
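As a minimal sketch of direction 1, past experiments could be stored as a graph linking datasets and workflow components, which can then be queried for patterns such as the co-occurrence of components in workflows. The schema, node names, and the use of the networkx library below are assumptions made only for illustration:

<code python>
# Sketch: representing past experiments as a graph (illustrative schema).
# Nodes: datasets and workflow components; edges carry experiment metadata.
import networkx as nx

G = nx.Graph()

# Hypothetical past experiments: (dataset, workflow components, accuracy).
experiments = [
    ("dataset_A", ["standard_scaler", "svm"], 0.91),
    ("dataset_A", ["pca", "random_forest"], 0.87),
    ("dataset_B", ["standard_scaler", "naive_bayes"], 0.78),
]

for dataset, components, accuracy in experiments:
    G.add_node(dataset, kind="dataset")
    for comp in components:
        G.add_node(comp, kind="component")
        G.add_edge(dataset, comp, accuracy=accuracy)
    # Components used together in the same workflow are also linked.
    for i, c1 in enumerate(components):
        for c2 in components[i + 1:]:
            G.add_edge(c1, c2, co_used=True)

# Example query: which components have ever been used together in a workflow?
component_pairs = [(u, v) for u, v, d in G.edges(data=True) if d.get("co_used")]
print(component_pairs)
</code>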


References

1. Camillieri C, Parisi L, Blay-Fornarino M, Precioso F, Riveill M, Vaz JC (2016) Towards a Software Product Line for Machine Learning Workflows: Focus on Supporting Evolution. Proc. 10th Work. Model. Evol. co-located with ACM/IEEE 19th Int. Conf. MODELS 2016, pp 65–70

2. Clavera, I. et al. Learning to Adapt in Dynamic, Real-World Environments through Meta-Reinforcement Learning. in International Conference on Learning Representations (2019).

3. CSIRO's Data61, StellarGraph Machine Learning Library. GitHub Repository (2018).

4. Duffau C, Camillieri C, Blay-Fornarino M (2017) Improving confidence in experimental systems through automated construction of argumentation diagrams. ICEIS 2017

5. Fernández-Delgado, M., Cernadas, E., Barro, S. & Amorim, D. Do we Need Hundreds of Classifiers to Solve Real World Classification Problems? J. Mach. Learn. Res. 15, 3133–3181 (2014).

6. Feurer, M., Klein, A., Eggensperger, K., Springenberg, J., Blum, M., & Hutter, F. (2015). Efficient and Robust Automated Machine Learning. Advances in Neural Information Processing Systems 28. (pp. 2962-2970).

7. Gagliolo, M. & Schmidhuber, J. Learning dynamic algorithm portfolios. in Annals of Mathematics and Artificial Intelligence (2006). doi:10.1007/s10472-006-9036-z

8. Gomes, C. P., & Selman, B. (2001). Algorithm portfolios. Artif. Intell., 126(1–2), 43–62.

9. Rice, J. R. (1976). The Algorithm Selection Problem. Advances in Computers, 15(C), 65–118.

10. Robinson, I., Webber, J. & Eifrem, E. Graph Databases: New Opportunities for Connected Data. O’Reilly Media, Inc. (2014).

11. Serban F, Vanschoren J, Kietz J-U, Bernstein A (2013) A survey of intelligent assistants for data analysis. ACM Comput Surv.

12. Van Rijn, J. N. & Vanschoren, J. Sharing RapidMiner workflows and experiments with OpenML. in CEUR Workshop Proceedings (2015).

13. Vanschoren, J., van Rijn, J. N., Bischl, B. & Torgo, L. OpenML: Networked science in machine learning. ACM SIGKDD Explor. Newsl. (2013).

14. Wolpert DH, Macready WG (1997) No free lunch theorems for optimization. IEEE Trans Evol Comput.

15. BakIr, G. et al. (Eds.) (2007). Predicting Structured Data. Cambridge, MA: MIT Press.

16. Pohl, K., Böckle, G. & van der Linden, F. J. Software Product Line Engineering: Foundations, Principles and Techniques. (Springer-Verlag, 2005).

17. Wolpert, D. H. & Macready, W. G. No free lunch theorems for optimization. IEEE Trans. Evol. Comput. (1997).

18. Bilalli, B., Abelló, A. & Aluja-Banet, T. On the predictive power of meta-features in OpenML. Int. J. Appl. Math. Comput. Sci. 27, (2017).

1)
The variability concerns pre-processing, algorithms, datasets, evaluation criteria, and experimental results; each of these has several variants. For example, in OpenML, 61 meta-features are computed for each downloaded dataset [18], and there are more than a hundred classification algorithms [5], etc.
2)
The theoretical number of experiments needed to study p pre-processing operators, n algorithms, and d data sets is 2^p × n × d. For 10 pre-processing operators, 100 classification algorithms, and 100 data sets, this is 2^10 × 100 × 100 = 10,240,000 experiments; even if each experiment lasted only one minute, it would take more than 7,000 days of execution time.