Meta-learning in a Portfolio of Machine Learning Workflows

By Mireille Blay-Fornarino and Frédéric Precioso

Recent advances in Machine Learning (ML) have brought new solutions to problems of prediction, decision, and identification. ML is impacting almost all domains of science and industry, but determining the right ML workflow for a given problem remains a key question. To let non-experts benefit from the potential of ML, recent years have seen an increasing effort from the big data companies (Amazon AWS, Microsoft Azure, Google AutoML…) to provide any user with simple platforms for designing their own ML workflow. However, none of these solutions treats the design of ML workflows as a generic process intended to capture common processing patterns between workflows (even between workflows targeting different application contexts). These platforms either propose a set of dedicated solutions for given classes of problems (e.g., AutoML Vision, AutoML Natural Language, AutoML Translation…) or propose a recipe to build your own ML workflow from scratch (e.g., MS Azure Machine Learning Studio, RapidMiner).

Is it then possible to envision the design of ML workflows as a systematic meta-learning process that analyzes past experiences to identify, explain, and predict the right choices? This PhD thesis will address this question by bringing together research on software architectures (including product lines) and meta-learning. The aim is to take ML workflow design to the next level by producing explanations of algorithm choices and by reducing the complexity of portfolio exploration through the identification of common patterns between workflows.

Context

Advances in Machine Learning (ML) have brought new solutions to problems of prediction, decision, and identification. To determine the right ML workflow for a given problem, numerous parameters have to be taken into account: the kind of data, the expected prediction performance (error, accuracy, time, memory space), the choice of the algorithms, and their judicious composition [14, 11].
To help with this task, Microsoft Azure Machine Learning, Amazon AWS, and RapidMiner Auto Model [12] provide ML component assembly tools. However, given the complexity of choosing the “right” assembly, meta-learning offers an attractive solution: learning from the problems of the past. The algorithm selection problem is one of its applications [5]: given a dataset, identify which learning algorithm (and which hyperparameter setting) performs best on it.
The algorithm portfolio approach generalizes the problem and automates the construction of selection models [8]. The immediate goal is the same: to predict the results of the algorithms on a given problem without executing them. Even if, in the portfolio, some selection models are built by meta-learning [1], the purpose is different. The portfolio is based on the systematic acquisition of knowledge about the algorithms it contains. The research then focuses on the quality of this knowledge and how it is returned, on the acquisition process itself, and on the construction of selection models over time.
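
As an illustration of the algorithm selection problem described above, the following is a minimal sketch, assuming a nearest-neighbour selection model over dataset meta-features. The meta-features, their values, the algorithm names, and the function `recommend` are all hypothetical; a real selection model would be trained on a database of experiments and would be considerably richer.

```python
import math

# Hypothetical record of past experiments (all values are invented):
# dataset meta-features -> algorithm that performed best on that dataset.
# Meta-features: (log10 of instance count, feature count, class entropy).
PAST_EXPERIMENTS = [
    ((2.0, 4.0, 1.58), "decision_tree"),
    ((4.7, 784.0, 3.32), "neural_network"),
    ((3.0, 20.0, 1.00), "random_forest"),
    ((2.3, 30.0, 0.95), "svm"),
]


def make_scaler(vectors):
    """Return a function that min-max scales a vector, column by column."""
    columns = list(zip(*vectors))
    lows = [min(column) for column in columns]
    # A constant column gets a span of 1.0 to avoid dividing by zero.
    spans = [(max(column) - min(column)) or 1.0 for column in columns]

    def scale(vector):
        return tuple((x - lo) / span for x, lo, span in zip(vector, lows, spans))

    return scale


def recommend(meta_features, experiments=PAST_EXPERIMENTS):
    """Predict the best algorithm for a new dataset without running any algorithm."""
    scale = make_scaler([features for features, _ in experiments])
    query = scale(meta_features)
    # Pick the past dataset whose scaled meta-features are closest to the query.
    _, algorithm = min(experiments, key=lambda e: math.dist(scale(e[0]), query))
    return algorithm


print(recommend((4.5, 700.0, 3.0)))  # neural_network
```

The quality of such a recommendation depends entirely on how well the stored experiments cover the space of problems.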

However, these solutions focus on recommending a single algorithm, while it has been widely recognized that the quality of the results can be markedly improved by selecting the right workflow, i.e., a complete chain of pre-processing operators, algorithms, and post-processing steps. One of the additional challenges is then the growth of the search space. Meta-learning and portfolio approaches are based on the premise that the relationship between data and a good algorithm can be learned. For this, it is essential to learn from past experience, for example from databases of experiments (e.g., OpenML [13]); the accuracy of the selection models then depends on how well the problem space is covered. This is why different solutions combine meta-learning and online learning [7, 2]. For example, auto-sklearn builds on previous experiments to reduce the combinatorics and then learns directly from the problem by automatically comparing a predefined set of classification pipelines built on the scikit-learn library [6]. All these solutions are based on pre-selected algorithms and the use of statistics, without trying to build new knowledge or capitalize on the observed performance.
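
The combination of meta-learning and online learning mentioned above can be sketched as follows. This is only an illustration under stated assumptions: it is not auto-sklearn's API or its optimization procedure, and the names `warm_start_search`, `prior_scores`, `evaluate`, and all scores are hypothetical. Past experience ranks candidate workflows, and only the top-ranked ones are actually executed on the new problem.

```python
from typing import Callable, Dict, Tuple

Workflow = Tuple[str, ...]  # e.g. ("scale", "pca", "svm")


def warm_start_search(
    prior_scores: Dict[Workflow, float],
    evaluate: Callable[[Workflow], float],
    budget: int,
) -> Tuple[Workflow, float]:
    """Run only the `budget` workflows that past experience ranks highest."""
    if budget < 1 or not prior_scores:
        raise ValueError("need a positive budget and at least one candidate")
    # Meta-learning step: shortlist by scores observed on past problems.
    shortlist = sorted(prior_scores, key=prior_scores.get, reverse=True)[:budget]
    # Online step: measure the shortlisted workflows on the new problem.
    results = {workflow: evaluate(workflow) for workflow in shortlist}
    best = max(results, key=results.get)
    return best, results[best]


# Invented scores from past experiments on similar datasets.
prior = {
    ("scale", "svm"): 0.81,
    ("pca", "knn"): 0.64,
    ("scale", "pca", "random_forest"): 0.86,
    ("naive_bayes",): 0.70,
}
# Invented scores on the new problem, standing in for real training runs.
observed = {
    ("scale", "svm"): 0.90,
    ("pca", "knn"): 0.95,
    ("scale", "pca", "random_forest"): 0.84,
    ("naive_bayes",): 0.60,
}
executed = []


def evaluate(workflow: Workflow) -> float:
    executed.append(workflow)
    return observed[workflow]


best, score = warm_start_search(prior, evaluate, budget=2)
print(best, score, len(executed))  # ('scale', 'svm') 0.9 2
```

In this invented example the workflow that would have scored best on the new problem is never executed because its prior score is low, which illustrates why the accuracy of the selection depends on the coverage of past experiments.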

Objectives

The construction of a portfolio requires covering a space of experiments broad enough to “cover” all the problems that may be submitted to it.
However, (1) the space of problems and solutions presents very great “diversity” [16] (see note 1), even within a single class of problems such as classification [9].
(2) The resources required for ML experiments are massive in time, memory, and energy (see note 2).
(3) As the ML domain is particularly productive, the portfolio must be able to evolve to integrate new algorithms.
(4) To cope with the mass of data, the transformation of experimental results into knowledge requires the implementation of automatic analysis procedures.
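
To make requirement (2) concrete, the experiment count given in note 2, 2^p*n*d, can be checked with a short sketch. It assumes, as the 2^p factor suggests, that each preprocessing operator is either present or absent in a workflow and that ordering is ignored; the function names are hypothetical.

```python
from itertools import combinations


def count_experiments(p: int, n: int, d: int) -> int:
    """Experiments for p optional preprocessors, n algorithms, d datasets."""
    return (2 ** p) * n * d


def enumerate_workflows(preprocessors, algorithms):
    """Yield every (subset of preprocessors) + (one algorithm) combination."""
    for size in range(len(preprocessors) + 1):
        for subset in combinations(preprocessors, size):
            for algorithm in algorithms:
                yield subset + (algorithm,)


# Figures from note 2: 10 preprocessors, 100 algorithms, 100 datasets,
# one minute per experiment.
total = count_experiments(p=10, n=100, d=100)
days = total / (60 * 24)
print(total, round(days, 1))  # 10240000 7111.1

# Small enumeration check: 2^3 subsets * 2 algorithms = 16 workflows.
print(len(list(enumerate_workflows(["scale", "pca", "impute"], ["svm", "knn"]))))  # 16
```

Adding a single preprocessing operator doubles the total, which is why exhaustive exploration is not an option.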

The objective of this thesis is, therefore, to propose different paradigms for constructing a portfolio of machine-learning workflows that meets these requirements: relevance of the results while taming the space of experiments, explanation of the knowledge obtained from meta-learning (meta-learning must not be a black box), and automation of the selection processes.
The PhD work will be organized to provide contributions in the following directions:
1- A representation of experiments in the form of graphs [10] and the exploitation of these structures by suitable learning algorithms [3, 15]. In particular, this approach should be exploited to reduce the search space and explain the choices made [4];
2- An automation of the analysis of experimental results to identify properties and patterns of workflows, such as similarities, or cliques and stable sets that correspond to algorithms that statistically must always or never be used together;
3- A systematic exploitation of this structure to reduce the number of executions, to drive the composition of workflows, to manage the feedback loop, and to justify choices.
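
The kind of pattern targeted by the second direction can be sketched on a toy example. The sketch assumes each well-performing workflow is recorded as a set of component names, and it uses strict always/never rules where a real analysis would rely on statistical thresholds over many experiments. The function name and the workflows are hypothetical. The resulting pairs are the edges of two graphs over components, in which groups of mutually "always together" or "never together" components could then be searched for.

```python
from itertools import combinations


def co_usage_patterns(workflows):
    """Return (always, never): component pairs always or never used together."""
    workflows = [frozenset(workflow) for workflow in workflows]
    components = sorted(set().union(*workflows))
    always, never = [], []
    for a, b in combinations(components, 2):
        with_a = sum(1 for workflow in workflows if a in workflow)
        with_b = sum(1 for workflow in workflows if b in workflow)
        together = sum(1 for workflow in workflows if a in workflow and b in workflow)
        if together == 0:
            never.append((a, b))   # the two components never co-occur
        elif together == with_a == with_b:
            always.append((a, b))  # each component appears only with the other
    return always, never


# Invented set of workflows assumed to have performed well in past experiments.
good_workflows = [
    {"scale", "pca", "svm"},
    {"scale", "pca", "knn"},
    {"one_hot", "random_forest"},
]
always, never = co_usage_patterns(good_workflows)
print(always)      # [('one_hot', 'random_forest'), ('pca', 'scale')]
print(len(never))  # 9
```

Pairs in `never` let a portfolio skip whole families of workflows without executing them, while pairs in `always` can be offered as an explanation of why a component was selected.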

References

1. Camillieri C, Parisi L, Blay-Fornarino M, Precioso F, Riveill M, Vaz JC (2016) Towards a Software Product Line for Machine Learning Workflows: Focus on Supporting Evolution. Proc. 10th Work. Model. Evol. co-located with ACM/IEEE 19th Int. Conf. MODELS 2016, pp 65–70

2. Clavera, I. et al. Learning to Adapt in Dynamic, Real-World Environments through Meta-Reinforcement Learning. in International Conference on Learning Representations (2019).

3. CSIRO's Data61, StellarGraph Machine Learning Library. GitHub Repository (2018).

4. Duffau C, Camillieri C, Blay-Fornarino M (2017) Improving confidence in experimental systems through automated construction of argumentation diagrams. ICEIS 2017

5. Fernández-Delgado, M., Cernadas, E., Barro, S. & Amorim, D. Do we Need Hundreds of Classifiers to Solve Real World Classification Problems? J. Mach. Learn. Res. 15, 3133–3181 (2014).

6. Feurer, M., Klein, A., Eggensperger, K., Springenberg, J., Blum, M., & Hutter, F. (2015). Efficient and Robust Automated Machine Learning. Advances in Neural Information Processing Systems 28. (pp. 2962-2970).

7. Gagliolo, M. & Schmidhuber, J. Learning dynamic algorithm portfolios. in Annals of Mathematics and Artificial Intelligence (2006). doi:10.1007/s10472-006-9036-z

8. Gomes, C. P., & Selman, B. (2001). Algorithm portfolios. Artif. Intell., 126(1–2), 43–62.

9. Rice, J. R. (1976). The Algorithm Selection Problem. Advances in Computers, 15(C), 65–118.

10. Robinson, I., Webber, J. & Eifrem, E. Graph Databases: New Opportunities for Connected Data. O’Reilly Media, Inc. (2014).

11. Serban F, Vanschoren J, Kietz J-U, Bernstein A (2013) A survey of intelligent assistants for data analysis. ACM Comput Surv.

12. Van Rijn, J. N. & Vanschoren, J. Sharing RapidMiner workflows and experiments with OpenML. in CEUR Workshop Proceedings (2015).

13. Vanschoren, J., van Rijn, J. N., Bischl, B. & Torgo, L. OpenML: Networked science in machine learning. ACM SIGKDD Explor. Newsl. (2013).

14. Wolpert DH, Macready WG (1997) No free lunch theorems for optimization. IEEE Trans Evol Comput.

15. Bakır, G., & Neural Information Processing Systems Foundation. (2007). Predicting Structured Data. Cambridge, Mass.: MIT Press.

16. Pohl, K., Böckle, G. & van der Linden, F. J. Software Product Line Engineering: Foundations, Principles and Techniques. (Springer-Verlag, 2005).

17. Bilalli, B., Abelló, A. & Aluja-Banet, T. On the predictive power of meta-features in OpenML. Int. J. Appl. Math. Comput. Sci. 27, (2017).

1) The subjects of variability include preprocessing, algorithms, datasets, evaluation criteria, and experimental results. Each subject has several variants. For example, in OpenML, 61 meta-features are computed for each uploaded dataset [17], and there are more than a hundred classification algorithms [5].
2) The theoretical number of experiments needed to study p preprocessing operators, n algorithms, and d datasets is 2^p*n*d. For 10 preprocessing operators, 100 classification algorithms, and 100 datasets, even if each experiment lasted only one minute, this would take more than 7000 days of execution time.