
ML Experiment Tracking and Model Management

ML experiment tracking is the process of organizing, recording, and analyzing the results of machine learning experiments. This document explains how to create a workflow to enable ML experiment tracking.

You can find the complete ML experiment tracking workflow code in Treasure Boxes.

Track ML Experiments

As a best practice, track each ML experiment as part of an end-to-end data processing workflow by adding a "track_experiment" task after the train task. The track_experiment task issues a SQL query that records the ML experiment information and the model name into a TD table named "automl_experiments". Sample workflow code is as follows:

+create_db_tbl_if_not_exists:
  td_ddl>:
  create_databases:
    - '${ output_database}'
  create_tables:
    - automl_experiments
    - automl_eval_results
+train:
  ml_train>:
    docker:
      task_mem: 128g
    notebook: gluon_train
    model_name: 'gluon_model_${session_id}'
    input_table: '${input_database}.${train_data_table}'
    target_column: '${target_column}'
    time_limit: '${fit_time_limit}'
    share_model: true
    export_leaderboard: '${output_database}.leaderboard_${train_data_table}'
    export_feature_importance: '${output_database}.feature_importance_${train_data_table}'
+track_experiment:
  td>: queries/track_experiment.sql
  insert_into: '${output_database}.automl_experiments'
  last_executed_notebook: '${automl.last_executed_notebook}'
  user_id: '${automl.last_executed_user_id}'
  user_email: '${automl.last_executed_user_email}'
  model_name: 'gluon_model_${session_id}'
  shared_model: '${automl.shared_model}'
  task_attempt_id: '${attempt_id}'
  session_time: '${session_local_time}'
  engine: presto
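The task above references queries/track_experiment.sql, which ships with the Treasure Boxes example. A minimal sketch of such a query might look like the following; each ${...} variable is supplied by the parameters of the +track_experiment task, and the exact column list is an assumption based on the table layout this workflow produces:

```sql
-- Hypothetical sketch of queries/track_experiment.sql; the actual query is
-- part of the Treasure Boxes example. Each ${...} variable is supplied as a
-- parameter of the +track_experiment task.
SELECT
  '${task_attempt_id}'        AS task_attempt_id,
  '${session_time}'           AS session_time,
  '${user_id}'                AS user_id,
  '${user_email}'             AS user_email,
  '${model_name}'             AS model_name,
  '${shared_model}'           AS shared_model,
  '${last_executed_notebook}' AS notebook_url
```

The td> operator's insert_into parameter appends the query result to the automl_experiments table, so the query itself only needs to project one row per experiment.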

The above workflow code generates the following example content in the automl_experiments table:

| task_attempt_id | session_time | user_id | user_email | model_name | shared_model | notebook_url |
|---|---|---|---|---|---|---|
| 849779333 | 2023-05-18 7:19:18 | 7776 | xxx@treasure-data.com | gluon_model_161722236 | b4a568da-e6f3-4057-b694-e2e19bf0e924 | https://console.treasuredata.com/app/workflows/automl/notebook/4a3c431b3aea4705b32a47d85ca46368 |
| 849772621 | 2023-05-18 7:08:30 | 7776 | xxx@treasure-data.com | gluon_model_161721046 | 94ad5d0e-89ac-4836-99c4-2bc8f975ccbe | https://console.treasuredata.com/app/workflows/automl/notebook/b390b932d4a64fd3a2dc3b75503430fb |
| 849768123 | 2023-05-18 7:01:13 | 7777 | yyy@treasure-data.com | gluon_model_161720337 | 4f2351a3-dd8c-418e-8057-4c8ec9a90cbe | https://console.treasuredata.com/app/workflows/automl/notebook/e8b3319c982345a48ff74db0003d7c9c |
| 849760942 | 2023-05-18 6:49:50 | 7776 | xxx@treasure-data.com | gluon_model_161718676 | 93e68b09-1a2f-4049-bb89-2bfe596ca9b3 | https://console.treasuredata.com/app/workflows/automl/notebook/b02959b1469e4b9c86ec6c6809acc5ff |
| 849753199 | 2023-05-18 6:36:36 | 7776 | xxx@treasure-data.com | gluon_model_161717236 | a7e456d3-8fcf-4173-afb7-f2d58bb985cd | https://console.treasuredata.com/app/workflows/automl/notebook/d3dcbbab99774bd594106a496ec2b2ab |

In the table, each record contains the model name, details of the user who created the model, the session time when the model was created, and a link to the generated notebook.

Record Evaluation Results for each Model

You can optionally record each model's quality using an evaluation dataset. The following workflow is an example that records model quality using AUROC (area under the ROC curve), a standard evaluation measure for classification problems. The record_evaluation task records evaluation results in the automl_eval_results table.

+predict:
  ml_predict>:
    docker:
      task_mem: 64g
    notebook: gluon_predict
    model_name: 'gluon_model_${session_id}'
    input_table: '${input_database}.${test_data_table}'
    output_table: '${output_database}.predicted_${test_data_table}_${session_id}'
+evaluation:
  td>: queries/auc.sql
  table: '${output_database}.predicted_${test_data_table}_${session_id}'
  target_column: '${target_column}'
  positive_class: ' >50K'
  store_last_results: true
  engine: hive
+record_evaluation:
  td>: queries/record_evaluation.sql
  insert_into: '${output_database}.automl_eval_results'
  engine: presto
  model_name: 'gluon_model_${session_id}'
  test_table: '${input_database}.${test_data_table}'
  session_time: '${session_local_time}'
  auc: '${td.last_results.auc}'
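The +evaluation task runs queries/auc.sql on the Hive engine. A minimal sketch using Hivemall's auc() aggregate might look like this; the `prob` column name is an assumption about the output of the predict task, and the actual query ships with the Treasure Boxes example:

```sql
-- Hypothetical sketch of queries/auc.sql; the actual query is part of the
-- Treasure Boxes example. Assumes the +predict task wrote a `prob` column
-- holding the predicted probability of the positive class.
SELECT auc(prob, label) AS auc
FROM (
  SELECT
    prob,
    if(${target_column} = '${positive_class}', 1, 0) AS label
  FROM ${table}
  ORDER BY prob DESC   -- Hivemall's auc() expects rows sorted by score, descending
) t
```

With store_last_results: true, the single-row result becomes available to later tasks as ${td.last_results.auc}.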

Treasure Data's Hive execution engine supports Hivemall, which provides a number of evaluation measures. See the Hivemall documentation for details.
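Similarly, queries/record_evaluation.sql only needs to project the +record_evaluation task's parameters into a single row. A hypothetical sketch (the actual query is part of the Treasure Boxes example):

```sql
-- Hypothetical sketch of queries/record_evaluation.sql; variables come from
-- the +record_evaluation task parameters, and insert_into appends the row
-- to the automl_eval_results table.
SELECT
  '${session_time}' AS session_time,
  '${model_name}'   AS model_name,
  '${test_table}'   AS test_table,
  ${auc}            AS auroc
```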

Example content in "automl_eval_results" table:

| session_time | model_name | test_table | auroc |
|---|---|---|---|
| 2023-06-06 6:21:40 | gluon_model_164947310 | ml_datasets.gluon_test | 0.9226243033 |
| 2023-06-14 6:49:22 | gluon_model_166350110 | ml_datasets.gluon_test | 0.9299335758 |
| 2023-06-15 7:35:30 | gluon_model_166532223 | ml_datasets.gluon_test | 0.9300292252 |
| 2023-05-18 7:19:18 | gluon_model_161722236 | ml_datasets.gluon_test | 0.9238149699 |

Detect Drift in Model Performance over Time

"Drift" is a term used in machine learning to describe how the performance of a model gradually degrades or becomes stale over time. There are two main types of drift: data drift and concept drift. Both can lead to a decline in the performance of a machine learning model.

Using the following workflow tasks, you can record each model's accuracy and quality to detect drift in data and model performance. You can use a scheduled workflow job to keep track of model performance and raise a warning if the model performance drifts.

There are several schemes for drift detection. The following example workflow identifies a degradation in ML model performance using an evaluation measure and triggers an alert email when drift is detected:

# timezone: PST
# schedule:
#   daily>: 07:00:00
+evaluation:
  td>: queries/auc.sql
  table: '${output_database}.predicted_${test_data_table}_${session_id}'
  target_column: '${target_column}'
  positive_class: ' >50K'
  store_last_results: true
  engine: hive
+alert_if_drift_detected:
  if>: '${td.last_results.auc < 0.93}'
  _do:
    mail>:
      data: 'Detect drift in model performance. AUC was ${td.last_results.auc}.'
    subject: Drift detected
    to:
      - me@example.com
    bcc:
      - foo@example.com
      - bar@example.com
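Instead of a fixed threshold such as 0.93, you could compare the latest AUC against a historical baseline computed from the automl_eval_results table. A hypothetical Presto query for this scheme (the 0.02 tolerance is an arbitrary assumption, and the column names follow the examples above):

```sql
-- Hypothetical drift check against a rolling baseline: flags drift when the
-- most recent AUROC falls more than 0.02 below the average of earlier runs.
-- The 0.02 tolerance is an illustrative assumption. Runs on the Presto engine.
WITH ranked AS (
  SELECT
    auroc,
    row_number() OVER (ORDER BY session_time DESC) AS rn
  FROM ${output_database}.automl_eval_results
)
SELECT
  max(CASE WHEN rn = 1 THEN auroc END) AS latest_auc,
  avg(CASE WHEN rn > 1 THEN auroc END) AS baseline_auc,
  max(CASE WHEN rn = 1 THEN auroc END)
    < avg(CASE WHEN rn > 1 THEN auroc END) - 0.02 AS drift_detected
FROM ranked
```

With store_last_results: true on this task, a subsequent if> task could branch on ${td.last_results.drift_detected}.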

You can schedule workflow executions for drift detection. When drift is detected, you can send an alert email or rebuild the model using a conditional operator.