Two useful modules to help you find the best ML model for your task

FLAML and LazyPredict are two packages designed to quickly train and test machine learning models from scikit-learn so that you can determine which is the best type of model for learning from your data.

Both are easily pip-installable and require only a few lines of code. For a classification task, all you need with LazyPredict is:

from lazypredict.Supervised import LazyClassifier
clf = LazyClassifier(verbose=0, ignore_warnings=True, custom_metric=None)
# fit() trains a suite of scikit-learn classifiers and scores each one on the test set
models, predictions = clf.fit(X_train, X_test, y_train, y_test)
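
The returned models object is (as of recent LazyPredict versions) a pandas DataFrame ranking every fitted classifier by its evaluation metrics, so a minimal way to inspect the results is simply:

print(models)  # leaderboard of classifiers sorted by their test-set scores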

while for FLAML, this is all that is needed:

from flaml import AutoML
automl = AutoML()
# fit() searches over candidate estimators and their hyperparameters for the given task
automl.fit(X_train, y_train, task="classification")
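
After fitting, the winning model can be inspected and used directly. The attribute names below follow FLAML's documented API, though they may vary slightly between versions:

print(automl.best_estimator)     # name of the best model type found
print(automl.best_config)        # its tuned hyperparameters
y_pred = automl.predict(X_test)  # predictions from the best model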

While the ideas behind the two packages are very similar, there are a few key differences:

  1. Models trained: FLAML searches over hyperparameters for each candidate model, whereas LazyPredict trains every model only with its default parameters
  2. Flexibility: FLAML is a much more flexible and customisable package – you can set a time budget, the optimization metric (including custom metrics), the list of estimators, the search space, the validation split strategy and more (a configuration sketch follows this list). Additionally, tasks beyond simple classification and regression can be specified.
  3. Test data: as mentioned above, in FLAML you can specify the strategy used to split validation data from the original dataset, but you cannot supply a specific test set. LazyPredict, by contrast, requires explicit test data. The ‘best’ model from a simple application of LazyPredict is therefore the one that performs best on your chosen test data, whereas the ‘best’ model from FLAML is the one best fitted to a subsection of your training data. Where you have a more complex data splitting strategy (e.g. splits based on molecular/sequence similarity), LazyPredict will give you a better idea of which model generalizes, whereas FLAML may point you to the model most likely to overfit the training data.
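
To illustrate points 2 and 3, here is a rough sketch of a customised FLAML run. The keyword arguments shown (time_budget, metric, estimator_list, eval_method, n_splits) follow FLAML's documented settings, but exact names and defaults may differ between versions:

from flaml import AutoML

automl = AutoML()
automl.fit(
    X_train, y_train,
    task="classification",
    time_budget=60,                 # stop the search after 60 seconds
    metric="roc_auc",               # optimization metric; a custom callable can also be passed
    estimator_list=["lgbm", "rf"],  # restrict the search to these model types
    eval_method="cv",               # carve validation folds out of the training data
    n_splits=5,                     # number of cross-validation folds
)

Note that even here the validation folds are taken from X_train; to judge generalization under a bespoke splitting strategy, you still need to score automl.predict() on your own held-out test set.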
