This document describes how to implement incremental prediction and provides best practices for meeting low-latency requirements.
There are three main patterns for prediction:
- Online
- Offline
- Incremental

- Online prediction: Typically implemented as a REST API that takes a record in JSON as input and returns a prediction result. Each prediction takes milliseconds to seconds to complete. The response for each API call can be expensive to compute when it requires multiple joins, for example, resolving user attributes from a cookie ID. Ensemble models might not meet low-latency requirements for REST API calls.
- Offline prediction: Typically scheduled as a daily batch job. It can achieve high throughput, but each batch of predictions generally takes tens of minutes to several hours to complete.
- Incremental (mini-batch) prediction: Similar to batch prediction, except that the work is split into smaller batches. Each mini-batch of predictions takes seconds to minutes to complete. It cannot satisfy strict real-time requirements, but it fits semi-real-time scenarios where a few minutes of latency is acceptable.
If the prediction results are stored in a key-value store, lookups can be served with millisecond latency, as sketched below.
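For example, the latest score for each user can be materialized periodically with a query like the following and bulk-loaded into a key-value store keyed by userid. This is a minimal sketch; the table and column names anticipate the LTV example in the next section, and the loading step itself depends on the store you use.

-- Sketch: latest score per user, suitable for bulk-loading into a
-- key-value store keyed by userid
SELECT
  userid,                       -- key
  td_last(score, time) AS score -- value: the most recent prediction score
FROM
  ${predicted_table}
GROUP BY
  userid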
Implement incremental mini-batch predictions
Consider the use case of assigning scores to new users who requested a sales callback for an insurance product. The number of users is large, so they need to be prioritized by predicted LTV (Lifetime Value) scores.
Many new customers need to be processed in each five-minute batch, and some existing customers may have additional records, so new LTV scores are calculated for all users who accessed in the last five minutes. Each prediction run must complete within five minutes, before the next batch starts.
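The workflow below appends one row per scored user on every run, so a user can accumulate multiple score records over time. For reference, here is a minimal sketch of the assumed layout of ${predicted_table}; the column names are taken from the queries below, and in practice the table is populated by the prediction step rather than created by hand.

-- Assumed layout of ${predicted_table} (illustrative sketch only)
CREATE TABLE IF NOT EXISTS predicted_table (
  time   bigint,  -- unix timestamp of the prediction run
  userid varchar, -- id of the scored user
  score  double   -- predicted LTV score
)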
Assuming the prediction results are stored in ${predicted_table}, you can retrieve user details sorted by the latest LTV scores using the following SQL query:
Get latest LTV score for each user
WITH scores AS (
  SELECT
    userid,
    td_last(score, time) AS score -- the latest prediction score per user
  FROM
    ${predicted_table}
  GROUP BY
    userid
  HAVING
    td_last(score, time) >= 0.7 -- filter users by the threshold on their latest score
)
SELECT
  l.userid, l.score,
  r.user_name, r.user_age, r.user_email, r.user_tel_no
FROM
  scores l
JOIN user_info r ON (l.userid = r.userid)
ORDER BY
  score DESC
-- LIMIT 100

The following workflow can be used to schedule mini-batch predictions that append LTV scores for users who accessed in the last 5 minutes.
A workflow to run mini-batch prediction
# predict every 5 min
timezone: Asia/Tokyo

schedule:
  cron>: */5 * * * *

# List users who accessed in the last 5 minutes
+list_users:
  td>: queries/list_users.sql
  store_last_results: true

# Run mini-batch predictions
+gluon_predict:
  ml_predict>:
  notebook: gluon_predict
  model_name: ${model_name}
  input_table: ${input_database}.${input_table}
  output_table: ${output_database}.${output_table}
  output_mode: append # append scores to output_table
  rowid_column: userid
  rowid_filter: ${td.last.results.users}

It is assumed that the prediction model specified in ${model_name} is created on a monthly basis and that a verified, well-performing model is used for prediction.
In "queries/list_users.sql ", the ids of users who accessed in the last 5 minutes can be listed as follows:
queries/list_users.sql
SELECT
  -- comma-separated list of users: '111','222'
  array_join(transform(array_agg(DISTINCT userid), x -> '''' || x || ''''), ',') AS users
FROM
  session
WHERE
  time >= TD_TIME_ADD(TD_SCHEDULED_TIME(), '-5m', 'JST') -- access in the last 5 minutes

The "rowid_column" and "rowid_filter" parameters are used to issue the following SQL query, and only the matched rows are used for prediction.
SELECT * FROM ${input_database}.${input_table}
WHERE ${rowid_column} IN (${rowid_filter})
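For example, with rowid_column set to userid as above and two users returned by list_users.sql, the issued query would expand as follows (mydb and user_features are hypothetical values for ${input_database} and ${input_table}):

-- Hypothetical expansion: ${td.last.results.users} evaluated to '111','222'
SELECT * FROM mydb.user_features
WHERE userid IN ('111','222')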