The memory usage of a Gluon Train notebook is driven primarily by the size of the input dataset, in terms of both records (rows) and columns.
When choosing an instance for training, consider both dimensions and allocate an appropriate amount of memory.
To help you estimate memory requirements for different datasets, this page lists sample memory usage for various numbers of records and columns from a specific reference dataset.
- Input Dataset: Criteo 1TB Dataset (https://ailab.criteo.com/download-criteo-1tb-click-logs-dataset/)
- Problem Type: Classification
- Time Limit: 6 * 60 * 60 seconds (= 6 hours)
Specifying a large table (such as one containing 100 million records) as input may lead to an out-of-memory error. In that case, configure the sampling_threshold parameter in the Gluon Train notebook.
Treasure Data AutoML uses 10M (10 million records) for sampling_threshold by default, which is usually sufficient; you typically do not need the whole dataset for training.
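How sampling_threshold is exposed depends on your notebook version; the snippet below is a minimal sketch that assumes the parameter is set as a plain variable in a notebook cell, with the pandas-based downsampling shown only to illustrate the effect of the cap, not the notebook's internal implementation.

```python
# Illustrative sketch: capping the number of training records.
# `sampling_threshold` corresponds to the notebook parameter described above;
# the sampling logic here is an assumption for illustration only.
import pandas as pd

sampling_threshold = 10_000_000  # default: 10M records


def sample_if_needed(df: pd.DataFrame, threshold: int = sampling_threshold) -> pd.DataFrame:
    """Downsample the training table when it exceeds the threshold."""
    if len(df) > threshold:
        return df.sample(n=threshold, random_state=42)
    return df
```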
Memory usage by number of records (39 columns):

| number of records (millions) | number of columns | memory usage (GiB) |
|---|---|---|
| 1 | 39 | 27.7 |
| 10 | 39 | 122.3 |
| 20 | 39 | 135.3 |
| 30 | 39 | 199.8 |
| 40 | 39 | 266.5 |
| 50 | 39 | 338.3 |
Memory usage by number of columns (50 million records):

| number of records (millions) | number of columns | memory usage (GiB) |
|---|---|---|
| 50 | 20 | 206 |
| 50 | 39 | 338.3 |
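If your dataset falls between the measured points above, a simple interpolation over the first table (39 columns) can give a ballpark memory figure for instance sizing. The helper below is an illustrative sketch based only on the numbers in that table, not an official sizing formula.

```python
# Illustrative only: interpolate the measured memory usage (GiB) for the
# 39-column samples above to ballpark an instance size.
MEASURED_39_COLS = [  # (records in millions, memory usage in GiB)
    (1, 27.7), (10, 122.3), (20, 135.3), (30, 199.8), (40, 266.5), (50, 338.3),
]


def estimate_memory_gib(records_millions: float) -> float:
    """Linear interpolation between measured points; values outside the range are clamped."""
    points = MEASURED_39_COLS
    if records_millions <= points[0][0]:
        return points[0][1]
    if records_millions >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= records_millions <= x1:
            return y0 + (y1 - y0) * (records_millions - x0) / (x1 - x0)
    return points[-1][1]


print(estimate_memory_gib(25))  # ~167.6 GiB, between the 20M and 30M samples
```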