This notebook builds a prediction model using AutoGluon, automating data processing, feature engineering, model selection, ensembling, and hyperparameter tuning.
AutoGluon combines several ML models using stack ensembling. Supported models include:
- Neural networks (MXNet, FastAI)
- Gradient Boosting Models (LightGBM, CatBoost, XGBoost)
- Random Forests
- Extremely Randomized Trees
- k-Nearest Neighbors
Find a sample workflow in Treasure Boxes.

    +gluon_train:
      ml_train>:
        notebook: gluon_train
        model_name: gluon_model
        input_table: ml_dataset.bank_marketing
        target_column: loan
        time_limit: 3 * 60 # soft time limit in seconds

| Parameter name | Console Name | Description | Default |
|---|---|---|---|
| docker.task_mem | Docker Task Mem | Task memory size. Available: 64g, 128g (default), 256g, 384g, 512g (tier dependent). | 128g |
| input_table | Input Table | TD table used for training, specified as dbname.table_name. | - |
| target_column | Target Column | Column name used for the label. | - |
| model_name | Model Name | Prediction model name. | - |
| problem_type | Problem Type | One of binary, multiclass, regression, quantile. Inferred if not specified. | None |
| oversampling_threshold | Oversampling Threshold | Minority-class rate threshold for SMOTE oversampling (binary classification only). Set to 0 to disable. | 0.001 |
| proba_calibration | Proba Calibration | Run probability calibration after oversampling. | True |
| eval_metric | Eval Metric | Automatically selected if not specified. | None |
| ignore_columns | Ignore Columns | Columns to ignore. | time |
| time_limit | Time Limit | Soft training limit in seconds (max 24h). Hint to AutoGluon. | 60 * 60 |
| sampling_threshold | Sampling Threshold | Threshold used for sampling. See executed notebook. | 10_000_000 |
| export_leaderboard | Export Leaderboard | Export leaderboard as TD table if specified. | None |
| export_feature_importance | Export Feature Importance | Export feature importance as TD table if specified. | None |
| exclude_models | Exclude Model | Models to exclude from training. | KNN |
| hide_table_contents | Hide Table Contents | Suppress showing table contents. | False |
| share_model | Share Model | Share trained models in an account. | False |
| refit_full | Refit Full | Retrain models on all data. Choices: best, false, default. | default |
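
To make the table more concrete, the sketch below shows one plausible way a few of these parameters could be passed on to AutoGluon. The mapping is an assumption for illustration, not the notebook's actual code: `build_autogluon_args` and `train` are hypothetical helpers, and only `label`, `problem_type`, `eval_metric`, `time_limit`, and `excluded_model_types` are taken from AutoGluon's public `TabularPredictor` API.

```python
from typing import Any, Dict, Tuple

# The table caps time_limit at 24 hours.
MAX_TIME_LIMIT = 24 * 60 * 60


def build_autogluon_args(params: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Translate workflow parameters into TabularPredictor arguments.

    Hypothetical mapping: the real notebook may handle these differently.
    """
    init_kwargs = {
        "label": params["target_column"],
        # None lets AutoGluon infer the problem type and pick a metric.
        "problem_type": params.get("problem_type"),
        "eval_metric": params.get("eval_metric"),
    }
    fit_kwargs = {
        # Defaults mirror the table: 60 * 60 seconds and KNN excluded.
        "time_limit": min(int(params.get("time_limit", 60 * 60)), MAX_TIME_LIMIT),
        "excluded_model_types": list(params.get("exclude_models", ["KNN"])),
    }
    return init_kwargs, fit_kwargs


def train(train_data, params: Dict[str, Any]):
    """Fit a predictor; requires the autogluon package to be installed."""
    from autogluon.tabular import TabularPredictor

    init_kwargs, fit_kwargs = build_autogluon_args(params)
    return TabularPredictor(**init_kwargs).fit(train_data, **fit_kwargs)
```

Under these assumptions, the sample workflow's `target_column: loan` and `time_limit: 3 * 60` would become `label="loan"` and `time_limit=180`, with the remaining arguments left at the defaults listed in the table.
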
Accepted eval_metric values:
- Binary & Multiclass: accuracy, balanced_accuracy, f1, f1_macro, f1_micro, f1_weighted, average_precision, precision, precision_macro, precision_micro, precision_weighted, recall, recall_macro, recall_micro, recall_weighted, log_loss (multiclass default), pac_score
- Binary only: roc_auc (binary default), roc_auc_ovo_macro
- Regression: root_mean_squared_error (default), mean_squared_error, mean_absolute_error, median_absolute_error, r2
- Quantile Regression: pinball_loss (default)
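
The lists above can be expressed as a small lookup that checks an `eval_metric` against a `problem_type` before launching a workflow. This is a minimal sketch that simply mirrors the documented lists and defaults; `resolve_eval_metric` is a hypothetical helper, not part of the notebook or of AutoGluon.

```python
from typing import Optional

# Metrics accepted for both binary and multiclass problems, as listed above.
_CLASSIFICATION = {
    "accuracy", "balanced_accuracy", "f1", "f1_macro", "f1_micro", "f1_weighted",
    "average_precision", "precision", "precision_macro", "precision_micro",
    "precision_weighted", "recall", "recall_macro", "recall_micro",
    "recall_weighted", "log_loss", "pac_score",
}

ACCEPTED_METRICS = {
    # The binary-only metrics follow the list in this document.
    "binary": _CLASSIFICATION | {"roc_auc", "roc_auc_ovo_macro"},
    "multiclass": set(_CLASSIFICATION),
    "regression": {
        "root_mean_squared_error", "mean_squared_error",
        "mean_absolute_error", "median_absolute_error", "r2",
    },
    "quantile": {"pinball_loss"},
}

# Defaults as documented above.
DEFAULT_METRIC = {
    "binary": "roc_auc",
    "multiclass": "log_loss",
    "regression": "root_mean_squared_error",
    "quantile": "pinball_loss",
}


def resolve_eval_metric(problem_type: str, eval_metric: Optional[str] = None) -> str:
    """Return the metric to use, falling back to the documented default."""
    if problem_type not in ACCEPTED_METRICS:
        raise ValueError(f"unknown problem_type: {problem_type!r}")
    if eval_metric is None:
        return DEFAULT_METRIC[problem_type]
    if eval_metric not in ACCEPTED_METRICS[problem_type]:
        raise ValueError(f"{eval_metric!r} is not accepted for {problem_type}")
    return eval_metric
```

For example, `resolve_eval_metric("regression", "r2")` passes the metric through unchanged, while `resolve_eval_metric("multiclass", "roc_auc")` raises a `ValueError` because `roc_auc` is listed as binary only.
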
For more details, see the AutoGluon documentation.