# Gluon Train

This notebook builds a prediction model using [AutoGluon](https://auto.gluon.ai/stable/index.md), automating data processing, feature engineering, model selection, ensembling, and hyperparameter tuning.

AutoGluon combines several ML models using stack ensembling. Supported models include:

* Neural Networks (MXNet, FastAI)
* Gradient Boosting Models (LightGBM, CatBoost, XGBoost)
* Random Forests
* Extremely Randomized Trees
* k-Nearest Neighbors


## Workflow Example

Find a sample workflow in [Treasure Boxes](https://github.com/treasure-data/treasure-boxes/blob/automl/machine-learning-box/automl/ml_experiment.dig).


```
+gluon_train:
  ml_train>:
    notebook: gluon_train
    model_name: gluon_model
    input_table: ml_dataset.bank_marketing
    target_column: loan 
    time_limit: 3 * 60 # soft time limit in seconds
```

## Parameters

| Parameter name | Console Name | Description | Default |
|  --- | --- | --- | --- |
| docker.task_mem | Docker Task Mem | Task memory size. Available: 64g, 128g (default), 256g, 384g, 512g (tier dependent). | 128g |
| input_table | Input Table | TD table used for training, specified as dbname.table_name. | - |
| target_column | Target Column | Column name used for the label. | - |
| model_name | Model Name | Prediction model name. | - |
| problem_type | Problem Type | One of binary, multiclass, regression, quantile. Inferred if not specified. | None |
| oversampling_threshold | Oversampling Threshold | Minority-class rate threshold for applying SMOTE oversampling (binary classification only). Set to 0 to disable. | 0.001 |
| proba_calibration | Proba Calibration | Run probability calibration after oversampling. | True |
| eval_metric | Eval Metric | Automatically selected if not specified. | None |
| ignore_columns | Ignore Columns | Columns to ignore. | time |
| time_limit | Time Limit | Soft time limit for training, in seconds (max 24h). Passed to AutoGluon as a hint rather than a hard cutoff. | 60 * 60 |
| sampling_threshold | Sampling Threshold | Threshold used for sampling. See executed notebook. | 10_000_000 |
| export_leaderboard | Export Leaderboard | Export leaderboard as TD table if specified. | None |
| export_feature_importance | Export Feature Importance | Export feature importance as TD table if specified. | None |
| exclude_models | Exclude Model | Models to exclude from training. | KNN |
| hide_table_contents | Hide Table Contents | Suppress showing table contents. | False |
| share_model | Share Model | Share trained models in an account. | False |
| refit_full | Refit Full | Retrain models on all data. Choices: best, false, default. | default |

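As a rough illustration of how the optional parameters in the table combine, the sketch below extends the workflow example with a larger memory size, an explicit problem type, and exported result tables. The export table names are hypothetical, and both the nested `docker:` form of `docker.task_mem` and the `dbname.table_name` format for the export parameters are assumptions, so check them against the sample workflow in Treasure Boxes before use.

```yaml
+gluon_train:
  ml_train>:
    docker:
      task_mem: 256g                 # assumed nesting for docker.task_mem; availability is tier dependent
    notebook: gluon_train
    model_name: gluon_model
    input_table: ml_dataset.bank_marketing
    target_column: loan
    problem_type: binary             # set explicitly instead of relying on inference
    time_limit: 30 * 60              # soft time limit in seconds
    refit_full: best                 # one of: best, false, default
    export_leaderboard: ml_dataset.gluon_leaderboard                  # hypothetical table name
    export_feature_importance: ml_dataset.gluon_feature_importance   # hypothetical table name
```

Parameters left out of the task keep the defaults listed in the table.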

Accepted eval_metric values:

* Binary & Multiclass: accuracy, balanced_accuracy, f1, f1_macro, f1_micro, f1_weighted, average_precision, precision, precision_macro, precision_micro, precision_weighted, recall, recall_macro, recall_micro, recall_weighted, log_loss (multiclass default), pac_score
* Binary only: roc_auc (binary default), roc_auc_ovo_macro
* Regression: root_mean_squared_error (default), mean_squared_error, mean_absolute_error, median_absolute_error, r2
* Quantile Regression: pinball_loss (default)

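The metric lists above can be applied as in the following minimal sketch, which trains a regression model and overrides the default metric. The task name, model name, table, and target column are hypothetical placeholders, not part of the sample workflow.

```yaml
+gluon_train_regression:
  ml_train>:
    notebook: gluon_train
    model_name: gluon_regression_model     # hypothetical model name
    input_table: ml_dataset.house_prices   # hypothetical table
    target_column: price                   # hypothetical numeric label column
    problem_type: regression
    eval_metric: mean_absolute_error       # replaces the default root_mean_squared_error
```

The chosen metric should come from the list that matches problem_type. If eval_metric is omitted, the default for the problem type is used.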

For more details, see the [AutoGluon documentation](https://auto.gluon.ai/0.3.1/api/autogluon.predictor.md?highlight=eval_metric#module-0).