Author(s): Michel Lang, Bernd Bischl, Jakob Richter, Patrick Schratz, Martin Binder, Florian Pfisterer, Raphael Sonabend, Marc Becker, Sebastian Fischer
A modern object-oriented machine learning framework. Successor of mlr.
Relationship with data.table
mlr3 was designed to integrate closely with data.table for efficient data handling in machine learning workflows. There are two main ways mlr3 is related to data.table:
Data Backend: mlr3 uses data.table as the default in-memory data backend for Task objects, so the data underlying a task is stored and managed as a data.table. Users can also apply data.table syntax directly within mlr3 workflows: task$data() returns a data.table, which can then be used for preprocessing, feature engineering, and subsetting without further conversion.
Result Storage: mlr3 makes results such as predictions, resampling outcomes, and benchmark results available as data.table objects, for example by calling as.data.table() on them.
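Both points can be sketched briefly. The snippet below is a minimal illustration, assuming the mlr3, data.table, rpart, and palmerpenguins packages are installed; the variable names and the particular aggregations are chosen for illustration only.

```r
library(mlr3)
library(data.table)

# Data backend: task$data() returns a data.table
task = tsk("penguins")
dt = task$data()

# Ordinary data.table syntax works on the returned object
by_species = dt[, .(mean_body_mass = mean(body_mass, na.rm = TRUE)), by = species]
print(by_species)

# Result storage: a prediction can be converted to a data.table
learner = lrn("classif.rpart")
learner$train(task)
pred_dt = as.data.table(learner$predict(task))

# Tabulate truth against predicted class with data.table grouping
confusion = pred_dt[, .N, by = .(truth, response)]
print(confusion)
```

The same as.data.table() conversion is available for resampling and benchmark results, so downstream analysis can stay within data.table.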
The mlr3 universe includes a wide range of tools taking you from basic ML to complex experiments. To get started, here is an example of the simplest functionality – training a model and making predictions.
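A minimal sketch of such a workflow follows, assuming the mlr3, rpart, and palmerpenguins packages are installed. The seed and the default train/test split produced by partition() are illustrative choices rather than details stated in this text.

```r
library(mlr3)

set.seed(1)  # illustrative seed so the random split is reproducible

# Load the penguins classification task and split it into train and test rows
task = tsk("penguins")
split = partition(task)

# Train a decision tree on the training rows only
learner = lrn("classif.rpart")
learner$train(task, row_ids = split$train)

# Predict on the held-out rows and score with classification accuracy
prediction = learner$predict(task, row_ids = split$test)
acc = prediction$score(msr("classif.acc"))
print(acc)
```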
In this example, we trained a decision tree on a subset of the penguins dataset, made predictions on the rest of the data and then evaluated these with the accuracy measure.
The mlr3 interface also lets you run more complicated experiments in just a few lines of code:
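One possible version of such an experiment is sketched below, assuming the mlr3verse meta-package and the ranger and kknn packages are installed. The task names, the tuning range for the number of trees, the grid resolution, and the choice of base learners are illustrative assumptions, not values stated in this text.

```r
library(mlr3verse)

set.seed(1)  # illustrative seed

# Two classification tasks (assumed choice)
tasks = tsks(c("german_credit", "sonar"))

# Random forest with automated tuning of num.trees, wrapped in a
# "robustify" pipeline that imputes missing values and handles factor levels
glrn_rf_tuned = as_learner(ppl("robustify") %>>% auto_tuner(
  tnr("grid_search", resolution = 5),
  lrn("classif.ranger", num.trees = to_tune(200, 500)),
  rsmp("holdout")
))
glrn_rf_tuned$id = "RF"

# Stacked ensemble: two base learners feeding a logistic regression
glrn_stack = as_learner(ppl("robustify") %>>% ppl("stacking",
  lrns(c("classif.rpart", "classif.kknn")),
  lrn("classif.log_reg")
))
glrn_stack$id = "Stack"

# Benchmark both learners on both tasks with three-fold cross-validation
learners = c(glrn_rf_tuned, glrn_stack)
bmr = benchmark(benchmark_grid(tasks, learners, rsmp("cv", folds = 3)))

# Mean accuracy per task and learner
aggr = bmr$aggregate(msr("classif.acc"))
print(aggr)
```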
In this more complex example, we selected two tasks and two learners, used automated tuning to optimize the number of trees in the random forest learner, and employed a machine learning pipeline that imputes missing data, consolidates factor levels, and stacks models. We also showed basic features like loading learners and choosing resampling strategies for benchmarking. Finally, we compared the performance of the models using the mean accuracy with three-fold cross-validation.