# Memory Usage and Dataset Sizes

The memory usage of a Gluon Train notebook is primarily determined by the dataset's size, in terms of both records (rows) and columns. When choosing an instance for training, consider both dimensions and allocate an appropriate amount of memory. To help you estimate memory requirements for your own datasets, this page lists sample memory usage for various numbers of records and columns from a specific dataset.

### Assumptions

* Input Dataset: Criteo 1TB Dataset ([https://ailab.criteo.com/download-criteo-1tb-click-logs-dataset/](https://ailab.criteo.com/download-criteo-1tb-click-logs-dataset/))
* Problem Type: Classification
* Time Limit: 6 * 60 * 60 seconds (= 6 hours)

### Memory usage when varying the number of records

Specifying a large table (such as one containing 100 million records) as input may lead to an out-of-memory error. In that case, it is recommended to configure the *sampling_threshold* parameter within the Gluon Train notebook. Treasure Data AutoML uses 10 million records as the default *sampling_threshold*, which is usually sufficient; you typically do not need the whole dataset for training. A sketch of how such a threshold might be applied is shown at the end of this page.

| Number of records (millions) | Number of columns | Memory usage (GiB) |
| --- | --- | --- |
| 1 | 39 | 27.7 |
| 10 | 39 | 122.3 |
| 20 | 39 | 135.3 |
| 30 | 39 | 199.8 |
| 40 | 39 | 266.5 |
| 50 | 39 | 338.3 |

![Memory usage (GiB) versus number of records](https://lh7-us.googleusercontent.com/l3mPjs6iEhm_nVQxrTSKs4CYOvWtgXqGMK037jyOPcbf7nxvuRk7nSWYEHq_x-Lg8-nR5xhp1ytuRvvMQllsMmXzyanRb1UCsar5cuECWGZbDg7GSy0PE4k1FdtUA3fgKczOQ6jdJWzGi6LV-N8XfYo)

### Memory usage when varying the number of columns

| Number of records (millions) | Number of columns | Memory usage (GiB) |
| --- | --- | --- |
| 50 | 20 | 206 |
| 50 | 39 | 338.3 |
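
The following is a minimal sketch of how a record-count threshold like *sampling_threshold* might be applied before training. It is not the Gluon Train notebook's actual implementation; it assumes plain AutoGluon (`TabularPredictor`) as the training backend, a hypothetical local file `criteo_sample.csv`, and a label column named `label`.

```python
# Sketch only: downsample a large table to a record threshold before training,
# mirroring the intent of the sampling_threshold parameter described above.
import pandas as pd
from autogluon.tabular import TabularPredictor

sampling_threshold = 10_000_000  # 10M records, the default mentioned above

# Hypothetical extract of the Criteo dataset; replace with your own data source.
df = pd.read_csv("criteo_sample.csv")

# Sample down only when the table exceeds the threshold, keeping memory bounded.
if len(df) > sampling_threshold:
    df = df.sample(n=sampling_threshold, random_state=0)

predictor = TabularPredictor(label="label", problem_type="binary").fit(
    train_data=df,
    time_limit=6 * 60 * 60,  # 6 hours, matching the assumption above
)
```

Random sampling keeps the class distribution roughly intact for a dataset of this size, which is why capping the record count usually has little effect on model quality while substantially reducing memory usage.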