Memory Usage and Dataset Sizes

The memory usage of a Gluon Train notebook is driven primarily by the size of the dataset, both the number of records (rows) and the number of columns.

When choosing an instance for training, consider both dimensions and allocate memory accordingly.

To help you estimate memory requirements for your own datasets, this page lists sample memory usage measured for a specific dataset at various numbers of records and columns.

Assumptions

Memory usage when varying the number of records

Specifying a large table (such as one containing 100 million records) as input may lead to an out-of-memory error. In such cases, it is recommended to configure the sampling_threshold parameter within the Gluon Train notebook.

Treasure Data AutoML uses 10M for sampling_threshold by default, and that is usually enough; you may not need the whole dataset for training.
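
As a rough illustration of what this parameter does, the sketch below downsamples a pandas DataFrame to the default 10M-row cap before training. Only the sampling_threshold name and its 10M default come from this page; the helper function and variable names are assumptions, and the notebook itself is expected to apply the sampling once the parameter is set.

```python
import pandas as pd

# Only the `sampling_threshold` parameter name and its 10M default come from
# this page; the helper below and its usage are illustrative assumptions.
sampling_threshold = 10_000_000

def maybe_sample(df: pd.DataFrame, threshold: int = sampling_threshold,
                 seed: int = 42) -> pd.DataFrame:
    """Randomly downsample a DataFrame when it has more rows than `threshold`."""
    if len(df) > threshold:
        return df.sample(n=threshold, random_state=seed)
    return df

# e.g. cap a 100M-record training table at 10M rows before training
# train_df = maybe_sample(train_df)
```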

| number of records (million) | number of columns | memory usage (GiB) |
| --- | --- | --- |
| 1 | 39 | 27.7 |
| 10 | 39 | 122.3 |
| 20 | 39 | 135.3 |
| 30 | 39 | 199.8 |
| 40 | 39 | 266.5 |
| 50 | 39 | 338.3 |

Memory usage when varying the number of columns

| number of records (million) | number of columns | memory usage (GiB) |
| --- | --- | --- |
| 50 | 20 | 206 |
| 50 | 39 | 338.3 |
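
To turn these samples into a rough planning number, the measurements in the first table can be linearly interpolated. The sketch below (assuming NumPy, and only for a 39-column dataset like the one measured above) gives a ballpark estimate; actual usage also depends on column types and AutoML settings, so leave extra headroom.

```python
import numpy as np

# Measurements from the first table above (39-column dataset).
records_millions = np.array([1, 10, 20, 30, 40, 50])
memory_gib = np.array([27.7, 122.3, 135.3, 199.8, 266.5, 338.3])

def estimate_memory_gib(n_records_millions: float) -> float:
    """Linearly interpolate memory usage for a 39-column dataset (ballpark only)."""
    return float(np.interp(n_records_millions, records_millions, memory_gib))

print(estimate_memory_gib(25))  # ~167.6 GiB, midway between the 20M and 30M samples
```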