ML experiment tracking is the process of organizing, recording, and analyzing the results of machine learning experiments. This document explains how to create a workflow to enable ML experiment tracking.
You can find the complete ML experiment tracking workflow code in Treasure Boxes.
Table of Contents
- Track ML Experiments
- Record Evaluation Results for each Model
- Detect Drift in Model Performance over Time
As a best practice, in an end-to-end data processing workflow, track each ML experiment with a "track_experiment" task that follows the train task. The track_experiment task issues a SQL query that records ML experiment information and the model name in a TD table named "automl_experiments". Sample workflow code is as follows:
+create_db_tbl_if_not_exists:
  td_ddl>:
  create_databases:
    - '${output_database}'
  create_tables:
    - automl_experiments
    - automl_eval_results

+train:
  ml_train>:
    docker:
      task_mem: 128g
    notebook: gluon_train
    model_name: 'gluon_model_${session_id}'
    input_table: '${input_database}.${train_data_table}'
    target_column: '${target_column}'
    time_limit: '${fit_time_limit}'
    share_model: true
    export_leaderboard: '${output_database}.leaderboard_${train_data_table}'
    export_feature_importance: '${output_database}.feature_importance_${train_data_table}'

+track_experiment:
  td>: queries/track_experiment.sql
  insert_into: '${output_database}.automl_experiments'
  last_executed_notebook: '${automl.last_executed_notebook}'
  user_id: '${automl.last_executed_user_id}'
  user_email: '${automl.last_executed_user_email}'
  model_name: 'gluon_model_${session_id}'
  shared_model: '${automl.shared_model}'
  task_attempt_id: '${attempt_id}'
  session_time: '${session_local_time}'
  engine: presto

The above workflow code generates the following example content in the automl_experiments table:
| task_attempt_id | session_time | user_id | user_email | model_name | shared_model | notebook_url |
|---|---|---|---|---|---|---|
| 849779333 | 2023-05-18 7:19:18 | 7776 | xxx@treasure-data.com | gluon_model_161722236 | b4a568da-e6f3-4057-b694-e2e19bf0e924 | https://console.treasuredata.com/app/workflows/automl/notebook/4a3c431b3aea4705b32a47d85ca46368 |
| 849772621 | 2023-05-18 7:08:30 | 7776 | xxx@treasure-data.com | gluon_model_161721046 | 94ad5d0e-89ac-4836-99c4-2bc8f975ccbe | https://console.treasuredata.com/app/workflows/automl/notebook/b390b932d4a64fd3a2dc3b75503430fb |
| 849768123 | 2023-05-18 7:01:13 | 7777 | yyy@treasure-data.com | gluon_model_161720337 | 4f2351a3-dd8c-418e-8057-4c8ec9a90cbe | https://console.treasuredata.com/app/workflows/automl/notebook/e8b3319c982345a48ff74db0003d7c9c |
| 849760942 | 2023-05-18 6:49:50 | 7776 | xxx@treasure-data.com | gluon_model_161718676 | 93e68b09-1a2f-4049-bb89-2bfe596ca9b3 | https://console.treasuredata.com/app/workflows/automl/notebook/b02959b1469e4b9c86ec6c6809acc5ff |
| 849753199 | 2023-05-18 6:36:36 | 7776 | xxx@treasure-data.com | gluon_model_161717236 | a7e456d3-8fcf-4173-afb7-f2d58bb985cd | https://console.treasuredata.com/app/workflows/automl/notebook/d3dcbbab99774bd594106a496ec2b2ab |
In the table, each record contains the model name, details of the user who created the model, the session time when the model was created, and a link to the generated notebook.
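The contents of queries/track_experiment.sql are not shown in this document. As a rough illustration of the idea only, the following Python sketch uses an in-memory SQLite database as a local stand-in for the automl_experiments table. The function names (create_experiments_table, track_experiment, latest_model) are hypothetical, the column names are taken from the example table above, and session times are zero-padded here so that they sort correctly as text.

```python
import sqlite3

# Column names follow the example automl_experiments table above.
COLUMNS = (
    "task_attempt_id", "session_time", "user_id", "user_email",
    "model_name", "shared_model", "notebook_url",
)


def create_experiments_table(conn):
    """Create a local stand-in for the automl_experiments table (assumed schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS automl_experiments ("
        "task_attempt_id INTEGER, session_time TEXT, user_id INTEGER, "
        "user_email TEXT, model_name TEXT, shared_model TEXT, notebook_url TEXT)"
    )


def track_experiment(conn, record):
    """Insert one experiment record; `record` maps each column name to a value."""
    placeholders = ", ".join("?" for _ in COLUMNS)
    conn.execute(
        f"INSERT INTO automl_experiments ({', '.join(COLUMNS)}) "
        f"VALUES ({placeholders})",
        [record[column] for column in COLUMNS],
    )


def latest_model(conn, user_id):
    """Return the most recently recorded model name for a user, or None."""
    row = conn.execute(
        "SELECT model_name FROM automl_experiments "
        "WHERE user_id = ? ORDER BY session_time DESC LIMIT 1",
        (user_id,),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
create_experiments_table(conn)
track_experiment(conn, {
    "task_attempt_id": 849772621,
    "session_time": "2023-05-18 07:08:30",  # zero-padded so text ordering works
    "user_id": 7776,
    "user_email": "xxx@treasure-data.com",
    "model_name": "gluon_model_161721046",
    "shared_model": "94ad5d0e-89ac-4836-99c4-2bc8f975ccbe",
    "notebook_url": "https://console.treasuredata.com/app/workflows/automl/notebook/b390b932d4a64fd3a2dc3b75503430fb",
})
track_experiment(conn, {
    "task_attempt_id": 849779333,
    "session_time": "2023-05-18 07:19:18",
    "user_id": 7776,
    "user_email": "xxx@treasure-data.com",
    "model_name": "gluon_model_161722236",
    "shared_model": "b4a568da-e6f3-4057-b694-e2e19bf0e924",
    "notebook_url": "https://console.treasuredata.com/app/workflows/automl/notebook/4a3c431b3aea4705b32a47d85ca46368",
})
print(latest_model(conn, 7776))
```

Keeping one row per task attempt, as the example table does, makes it possible to trace any model name back to the user and notebook that produced it.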
You can optionally record each model's quality using an evaluation dataset. The following example workflow records model quality using AUROC (area under the ROC curve), a standard evaluation measure for classification problems. The record_evaluation task records evaluation results in the automl_eval_results table.
+predict:
  ml_predict>:
    docker:
      task_mem: 64g
    notebook: gluon_predict
    model_name: 'gluon_model_${session_id}'
    input_table: '${input_database}.${test_data_table}'
    output_table: '${output_database}.predicted_${test_data_table}_${session_id}'

+evaluation:
  td>: queries/auc.sql
  table: '${output_database}.predicted_${test_data_table}_${session_id}'
  target_column: '${target_column}'
  positive_class: ' >50K'
  store_last_results: true
  engine: hive

+record_evaluation:
  td>: queries/record_evaluation.sql
  insert_into: '${output_database}.automl_eval_results'
  engine: presto
  model_name: 'gluon_model_${session_id}'
  test_table: '${input_database}.${test_data_table}'
  session_time: '${session_local_time}'
  auc: '${td.last_results.auc}'

Treasure Data's Hive execution engine supports Hivemall, which provides a number of evaluation measures. See the Hivemall documentation for details.
Example content in "automl_eval_results" table:
| session_time | model_name | test_table | auroc |
|---|---|---|---|
| 2023-06-06 6:21:40 | gluon_model_164947310 | ml_datasets.gluon_test | 0.9226243033 |
| 2023-06-14 6:49:22 | gluon_model_166350110 | ml_datasets.gluon_test | 0.9299335758 |
| 2023-06-15 7:35:30 | gluon_model_166532223 | ml_datasets.gluon_test | 0.9300292252 |
| 2023-05-18 7:19:18 | gluon_model_161722236 | ml_datasets.gluon_test | 0.9238149699 |
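To make the auroc column concrete: AUROC can be read as the probability that a randomly chosen positive example receives a higher predicted score than a randomly chosen negative one. The following Python sketch computes it directly from that pairwise definition. It is an illustration only and is not how queries/auc.sql or Hivemall computes the value; the function name auroc and the sample labels and scores are made up for this example.

```python
def auroc(labels, scores, positive_class):
    """Pairwise AUROC: fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    This is O(P * N) and meant for illustration, not for large datasets.
    """
    positives = [s for l, s in zip(labels, scores) if l == positive_class]
    negatives = [s for l, s in zip(labels, scores) if l != positive_class]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical predictions; ' >50K' matches the positive_class used in the workflow.
labels = [" >50K", " <=50K", " >50K", " <=50K"]
scores = [0.9, 0.4, 0.35, 0.1]
print(auroc(labels, scores, positive_class=" >50K"))  # 3 of 4 pairs ranked correctly
```

A value of 1.0 means every positive example is scored above every negative one, and 0.5 corresponds to random ranking.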
"Drift" is a term used in machine learning to describe how the performance of a machine learning model gradually degrades or becomes stale over time. There are two main types of drift: data drift, where the distribution of the input data changes, and concept drift, where the relationship between the inputs and the target changes. Both can lead to a decline in the performance of a machine learning model.
Using the following workflow tasks, you can record each model's accuracy and quality to detect drift in data and model performance. You can use a scheduled workflow job to keep track of model performance and issue a warning if the model performance drifts.
There are several schemes for drift detection. The following example workflow identifies a degradation in ML model performance using an evaluation measure. When drift is detected, it triggers an alert email:
# timezone: PST
# schedule:
#   daily>: 07:00:00

+evaluation:
  td>: queries/auc.sql
  table: '${output_database}.predicted_${test_data_table}_${session_id}'
  target_column: '${target_column}'
  positive_class: ' >50K'
  store_last_results: true
  engine: hive

+alert_if_drift_detected:
  if>: '${td.last_results.auc < 0.93}'
  _do:
    mail>:
      data: 'Detect drift in model performance. AUC was ${td.last_results.auc}.'
    subject: Drift detected
    to:
      - me@example.com
    bcc:
      - foo@example.com
      - bar@example.com

You can schedule workflow executions for drift detection. When drift is detected, you can send an alert email or rebuild the model using a conditional operator.
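The fixed threshold used in the alert_if_drift_detected task is one scheme; another is to compare the latest score against earlier runs. The following Python sketch shows both checks side by side. It is a minimal illustration under stated assumptions: the function names and the 0.01 tolerance are hypothetical, the 0.93 threshold mirrors the workflow condition, and the AUROC values are taken from the example automl_eval_results table.

```python
from statistics import mean


def below_threshold(latest_auc, threshold=0.93):
    """Fixed-threshold check, mirroring the `if>` condition in the workflow."""
    return latest_auc < threshold


def dropped_from_baseline(history, latest_auc, tolerance=0.01):
    """Relative check (assumed scheme): flag drift when the latest AUC is more
    than `tolerance` below the mean of previously recorded AUC values."""
    if not history:
        return False  # no baseline yet, so nothing to compare against
    return mean(history) - latest_auc > tolerance


# AUROC values from the example automl_eval_results table, oldest first.
history = [0.9238149699, 0.9226243033, 0.9299335758]
latest = 0.9300292252

print(below_threshold(latest))                 # False: 0.9300292252 is not below 0.93
print(dropped_from_baseline(history, latest))  # False: latest is above the baseline mean
print(dropped_from_baseline(history, 0.90))    # True: well below the baseline mean
```

A fixed threshold is simple to express in a workflow condition, while a baseline comparison adapts to each model's own history at the cost of needing previously recorded results.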