ML Experiment Tracking and Model Management

ML experiment tracking is the process of organizing, recording, and analyzing the results of machine learning experiments. This document explains how to create a workflow to enable ML experiment tracking.

You can find the complete ML experiment tracking workflow code in Treasure Boxes.

Table of Contents

Track ML Experiments
Record Evaluation Results for each Model
Detect Drift in Model Performance over Time

Track ML Experiments

As a best practice, track each ML experiment in your end-to-end data processing workflow by adding a "track_experiment" task after the train task. The track_experiment task issues a SQL query that records the ML experiment information and the model name in a TD table named "automl_experiments". Sample workflow code is as follows:

+create_db_tbl_if_not_exists:
  td_ddl>: null
  create_databases:
    - '${output_database}'
  create_tables:
    - automl_experiments
    - automl_eval_results
+train:
  ml_train>:
    docker:
      task_mem: 128g
    notebook: gluon_train
    model_name: 'gluon_model_${session_id}'
    input_table: '${input_database}.${train_data_table}'
    target_column: '${target_column}'
    time_limit: '${fit_time_limit}'
    share_model: true
    export_leaderboard: '${output_database}.leaderboard_${train_data_table}'
    export_feature_importance: '${output_database}.feature_importance_${train_data_table}'
+track_experiment:
  td>: queries/track_experiment.sql
  insert_into: '${output_database}.automl_experiments'
  last_executed_notebook: '${automl.last_executed_notebook}'
  user_id: '${automl.last_executed_user_id}'
  user_email: '${automl.last_executed_user_email}'
  model_name: 'gluon_model_${session_id}'
  shared_model: '${automl.shared_model}'
  task_attempt_id: '${attempt_id}'
  session_time: '${session_local_time}'
  engine: presto

The above workflow code generates the following example content in the automl_experiments table:

task_attempt_id	session_time	user_id	user_email	model_name	shared_model	notebook_url
849779333	2023-05-18 7:19:18	7776	xxx@treasure-data.com	gluon_model_161722236	b4a568da-e6f3-4057-b694-e2e19bf0e924	https://console.treasuredata.com/app/workflows/automl/notebook/4a3c431b3aea4705b32a47d85ca46368
849772621	2023-05-18 7:08:30	7776	xxx@treasure-data.com	gluon_model_161721046	94ad5d0e-89ac-4836-99c4-2bc8f975ccbe	https://console.treasuredata.com/app/workflows/automl/notebook/b390b932d4a64fd3a2dc3b75503430fb
849768123	2023-05-18 7:01:13	7777	yyy@treasure-data.com	gluon_model_161720337	4f2351a3-dd8c-418e-8057-4c8ec9a90cbe	https://console.treasuredata.com/app/workflows/automl/notebook/e8b3319c982345a48ff74db0003d7c9c
849760942	2023-05-18 6:49:50	7776	xxx@treasure-data.com	gluon_model_161718676	93e68b09-1a2f-4049-bb89-2bfe596ca9b3	https://console.treasuredata.com/app/workflows/automl/notebook/b02959b1469e4b9c86ec6c6809acc5ff
849753199	2023-05-18 6:36:36	7776	xxx@treasure-data.com	gluon_model_161717236	a7e456d3-8fcf-4173-afb7-f2d58bb985cd	https://console.treasuredata.com/app/workflows/automl/notebook/d3dcbbab99774bd594106a496ec2b2ab

In the table, each record contains the model name, details of the user who created the model, the session time when the model was created, and a link to the generated notebook.

Record Evaluation Results for each Model

You can optionally record each model's quality using an evaluation dataset. The following example workflow records model quality using AUROC (area under the ROC curve), a standard evaluation measure for classification problems. The record_evaluation task records the evaluation results in the automl_eval_results table.

+predict:
  ml_predict>:
    docker:
      task_mem: 64g
    notebook: gluon_predict
    model_name: 'gluon_model_${session_id}'
    input_table: '${input_database}.${test_data_table}'
    output_table: '${output_database}.predicted_${test_data_table}_${session_id}'
+evaluation:
  td>: queries/auc.sql
  table: '${output_database}.predicted_${test_data_table}_${session_id}'
  target_column: '${target_column}'
  positive_class: ' >50K'
  store_last_results: true
  engine: hive
+record_evaluation:
  td>: queries/record_evaluation.sql
  insert_into: '${output_database}.automl_eval_results'
  engine: presto
  model_name: 'gluon_model_${session_id}'
  test_table: '${input_database}.${test_data_table}'
  session_time: '${session_local_time}'
  auc: '${td.last_results.auc}'

Treasure Data's Hive execution engine supports Hivemall, which provides a number of evaluation measures. See the Hivemall documentation for details.
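
To make the AUROC measure concrete, the following is a minimal Python sketch of what the metric computes: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. It is an illustration only, not the contents of queries/auc.sql or the Hivemall implementation. The function name `auroc`, the sample scores, and the negative label ' <=50K' are assumptions made for this example.

```python
def auroc(labels, scores, positive_class):
    """Pairwise AUROC: fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as 0.5.
    This O(n^2) form is for illustration; it is not tuned for large tables.
    """
    pos = [s for y, s in zip(labels, scores) if y == positive_class]
    neg = [s for y, s in zip(labels, scores) if y != positive_class]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predictions: predicted probability of the positive class ' >50K'.
labels = [" >50K", " <=50K", " >50K", " <=50K"]
scores = [0.9, 0.4, 0.35, 0.2]

print(auroc(labels, scores, positive_class=" >50K"))  # 0.75
```

Three of the four positive/negative pairs are ordered correctly, so the result is 0.75. A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking.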

Example content in "automl_eval_results" table:

session_time	model_name	test_table	auroc
2023-06-06 6:21:40	gluon_model_164947310	ml_datasets.gluon_test	0.9226243033
2023-06-14 6:49:22	gluon_model_166350110	ml_datasets.gluon_test	0.9299335758
2023-06-15 7:35:30	gluon_model_166532223	ml_datasets.gluon_test	0.9300292252
2023-05-18 7:19:18	gluon_model_161722236	ml_datasets.gluon_test	0.9238149699
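
Because both tables carry a model_name column, the experiment records and the evaluation results can be related through it. The following Python sketch shows the idea on a few rows copied from the example tables above. In practice this would more likely be a SQL join in Treasure Data; the helper name `join_on_model_name` and the in-memory row format are assumptions made for illustration.

```python
# A few rows taken from the example automl_experiments table.
experiments = [
    {"task_attempt_id": 849779333, "user_email": "xxx@treasure-data.com",
     "model_name": "gluon_model_161722236"},
    {"task_attempt_id": 849768123, "user_email": "yyy@treasure-data.com",
     "model_name": "gluon_model_161720337"},
]

# A few rows taken from the example automl_eval_results table.
eval_results = [
    {"model_name": "gluon_model_164947310", "auroc": 0.9226243033},
    {"model_name": "gluon_model_161722236", "auroc": 0.9238149699},
]


def join_on_model_name(experiments, eval_results):
    """Inner join: keep evaluation rows whose model has an experiment record."""
    by_model = {e["model_name"]: e for e in experiments}
    return [
        (r["model_name"], by_model[r["model_name"]]["user_email"], r["auroc"])
        for r in eval_results
        if r["model_name"] in by_model
    ]


joined = join_on_model_name(experiments, eval_results)
print(joined)
# [('gluon_model_161722236', 'xxx@treasure-data.com', 0.9238149699)]
```

Only gluon_model_161722236 appears in both sample sets, so it is the only row in the joined output.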

Detect Drift in Model Performance over Time

"Drift" is a term used in machine learning to describe how the performance of a model gradually degrades or becomes stale over time. There are two main types of drift: data drift and concept drift. Both can lead to a decline in model performance.

Using the following workflow tasks, you can record each model's accuracy and quality to detect drift in data and model performance. You can use a scheduled workflow job to track model performance and issue a warning if the performance drifts.

There are several schemes for drift detection. The following example workflow identifies a degradation in ML model performance using an evaluation measure and triggers an alert email when drift is detected:

# timezone: PST
# schedule:
#  daily>: 07:00:00
+evaluation:
  td>: queries/auc.sql
  table: '${output_database}.predicted_${test_data_table}_${session_id}'
  target_column: '${target_column}'
  positive_class: ' >50K'
  store_last_results: true
  engine: hive
+alert_if_drift_detected:
  if>: '${td.last_results.auc < 0.93}'
  _do:
    mail>:
      data: 'Detect drift in model performance. AUC was ${td.last_results.auc}.'
    subject: Drift detected
    to:
      - me@example.com
    bcc:
      - foo@example.com
      - bar@example.com

You can schedule workflow executions for drift detection. When drift is detected, you can send an alert email or rebuild the model using a conditional operator.
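
The decision logic behind such a check can be sketched in a few lines of Python. The first function mirrors the fixed threshold used in the workflow's if> condition (AUC below 0.93). The second shows one alternative scheme, flagging a drop relative to a baseline score. The function names and the `max_drop` parameter are hypothetical and chosen only for this illustration; they are not part of the workflow or of any Treasure Data API.

```python
def below_threshold(auc, threshold=0.93):
    """Fixed-threshold check, like the if> condition in the workflow."""
    return auc < threshold


def dropped_from_baseline(auc, baseline_auc, max_drop=0.01):
    """Alternative scheme (assumption): flag a drop larger than max_drop."""
    return (baseline_auc - auc) > max_drop


def alert_body(auc):
    """Build the same message body that the mail> task sends."""
    return f"Detect drift in model performance. AUC was {auc}."


# AUROC values taken from the example automl_eval_results table.
latest_auc = 0.9226243033
baseline_auc = 0.9300292252

if below_threshold(latest_auc):
    print(alert_body(latest_auc))
```

With these values the fixed-threshold check fires, while the baseline check with the default `max_drop` of 0.01 does not, because the drop is only about 0.0074. Which scheme and tolerance are appropriate depends on how noisy the evaluation measure is for your data.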
