Gluon Train

This notebook builds a prediction model using AutoGluon, automating data processing, feature engineering, model selection, ensembling, and hyperparameter tuning.

AutoGluon combines several ML models using stack ensembling. Supported models include:

  • Neural networks (MXNet, FastAI)
  • Gradient Boosting Models (LightGBM, CatBoost, XGBoost)
  • Random Forests
  • Extremely Randomized Trees
  • k-Nearest Neighbors

Workflow Example

Find a sample workflow in Treasure Boxes.

+gluon_train:
  ml_train>:
    notebook: gluon_train
    model_name: gluon_model
    input_table: ml_dataset.bank_marketing
    target_column: loan
    time_limit: 3 * 60 # soft time limit in seconds (3 minutes)

Parameters

| Parameter name | Console Name | Description | Default |
|---|---|---|---|
| docker.task_mem | Docker Task Mem | Task memory size. Available: 64g, 128g (default), 256g, 384g, 512g (tier dependent). | 128g |
| input_table | Input Table | TD table used for EDA as dbname.table_name. | - |
| target_column | Target Column | Column name used for the label. | - |
| model_name | Model Name | Prediction model name. | - |
| problem_type | Problem Type | One of binary, multiclass, regression, quantile. Inferred if not specified. | None |
| oversampling_threshold | Oversampling Threshold | Threshold rate of minority class for SMOTE oversampling (binary only). 0 disables. | 0.001 |
| proba_calibration | Proba Calibration | Run probability calibration after oversampling. | True |
| eval_metric | Eval Metric | Automatically selected if not specified. | None |
| ignore_columns | Ignore Columns | Columns to ignore. | time |
| time_limit | Time Limit | Soft training limit in seconds (max 24h). Hint to AutoGluon. | 60 * 60 |
| sampling_threshold | Sampling Threshold | Threshold used for sampling. See executed notebook. | 10_000_000 |
| export_leaderboard | Export Leaderboard | Export leaderboard as TD table if specified. | None |
| export_feature_importance | Export Feature Importance | Export feature importance as TD table if specified. | None |
| exclude_models | Exclude Model | Models to ignore. | KNN |
| hide_table_contents | Hide Table Contents | Suppress showing table contents. | False |
| share_model | Share Model | Share trained models in an account. | False |
| refit_full | Refit Full | Retrain models on all data. Choices: best, false, default. | default |
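
As a sketch of how the optional parameters fit into a workflow, the task below extends the sample workflow with explicit export and refit settings. The export table names (ml_dataset.gluon_leaderboard and ml_dataset.gluon_feature_importance) are hypothetical, and writing them as dbname.table_name is an assumption carried over from input_table, so check the executed notebook for the accepted format.

```yaml
+gluon_train:
  ml_train>:
    notebook: gluon_train
    model_name: gluon_model
    input_table: ml_dataset.bank_marketing
    target_column: loan
    time_limit: 10 * 60   # soft limit of 10 minutes, written like the 60 * 60 default
    refit_full: best      # one of the documented choices: best, false, default
    # Hypothetical output tables; the dbname.table_name form is assumed
    export_leaderboard: ml_dataset.gluon_leaderboard
    export_feature_importance: ml_dataset.gluon_feature_importance
```

Parameters left out of the task keep the defaults listed in the table.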

Accepted eval_metric values:

  • Binary & Multiclass: accuracy, balanced_accuracy, f1, f1_macro, f1_micro, f1_weighted, average_precision, precision, precision_macro, precision_micro, precision_weighted, recall, recall_macro, recall_micro, recall_weighted, log_loss (multiclass default), pac_score
  • Binary only: roc_auc (binary default), roc_auc_ovo_macro
  • Regression: root_mean_squared_error (default), mean_squared_error, mean_absolute_error, median_absolute_error, r2
  • Quantile Regression: pinball_loss (default)
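
To illustrate overriding the automatic choices, the following sketch pins problem_type and eval_metric for a binary task. It reuses the table and column from the sample workflow and assumes that loan is a two-class column. The task name and model name are hypothetical, and f1 is picked only as an example from the binary list above.

```yaml
+gluon_train_f1:
  ml_train>:
    notebook: gluon_train
    model_name: gluon_model_f1   # hypothetical model name
    input_table: ml_dataset.bank_marketing
    target_column: loan
    problem_type: binary         # skips inference; assumes loan has two classes
    eval_metric: f1              # replaces the binary default, roc_auc
```

Leaving both parameters unset falls back to inference of the problem type and the per-type default metric.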

For more details, see the AutoGluon documentation.