Time Series Forecasting

Time series forecasting is the task of fitting a model to historical, time-stamped data in order to predict future values. This notebook trains time series models and forecasts future values using FLAML. The supported estimators are listed under the estimators parameter in the Parameters section.

This notebook also runs additional EDA steps and hold-out tests.
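A hold-out test for a time series is usually chronological: the most recent rows are kept aside and the model is scored on them. The following is a minimal sketch of that idea using only the standard library. It is not the notebook's own implementation; the function name `holdout_split` and the sample values are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Row = Tuple[str, float]  # (timestamp, value); hypothetical row layout


def holdout_split(rows: Sequence[Row], holdout_size: int) -> Tuple[List[Row], List[Row]]:
    """Split chronologically: the last `holdout_size` rows become the test set."""
    if not 0 < holdout_size < len(rows):
        raise ValueError("holdout_size must be between 1 and len(rows) - 1")
    # ISO-formatted timestamps (YYYY-MM-DD) sort chronologically as strings.
    ordered = sorted(rows, key=lambda r: r[0])
    return ordered[:-holdout_size], ordered[-holdout_size:]


# Illustrative, unsorted monthly series.
series = [
    ("1960-03-01", 419.0),
    ("1960-01-01", 417.0),
    ("1960-02-01", 391.0),
    ("1960-04-01", 461.0),
]
train, test = holdout_split(series, holdout_size=1)
```

Unlike a random split, this never lets the model see rows that are later than the ones it is evaluated on.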

Assumed Input Table

This notebook assumes that the training input table has the following format.

| tstamp | .. | value |
|---|---|---|
| 2022/04/21 10:00 | .. | 50 |
| 2022/04/21 10:00 | .. | 30 |
| 2022/04/21 11:00 | .. | 70 |
| 2022/04/21 11:00 | .. | 30 |
| 2022/04/21 12:00 | .. | 100 |
| 2022/04/21 12:00 | .. | 30 |

By default, we assume tstamp_column="tstamp" and target_column="value" but you can specify any column names for them.

Optionally, you can provide exogenous variables. For instance, when forecasting daily sales for a drugstore chain, you can specify exogenous_columns: weather, promotions, store_type as auxiliary features that help explain daily sales.

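| tstamp | weather | promotions | store_type | sales |
|---|---|---|---|---|
| 1960-12-01 | cloudy | 2 | city_large | 459 |
| 1961-01-01 | sunny | 1 | country_small | 935 |
| ... | ... | ... | ... | ... |
| 1965-12-01 | rainy | 0 | city_small | 886 |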
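To make the interplay of exogenous_columns, ignore_columns, tstamp_column, and target_column concrete, here is a hedged sketch of how a feature list could be resolved. The function `resolve_exogenous` is hypothetical, and it assumes that column lists are comma-separated strings and that "*" excludes the timestamp, target, and ignored columns. The notebook's actual behavior may differ.

```python
from typing import List, Sequence


def resolve_exogenous(
    table_columns: Sequence[str],
    exogenous_columns: str,
    tstamp_column: str = "tstamp",
    target_column: str = "value",
    ignore_columns: str = "time",
) -> List[str]:
    """Return the columns to use as exogenous features.

    Assumption: "*" means every table column except the timestamp,
    the target, and anything listed in ignore_columns.
    """
    ignored = {c.strip() for c in ignore_columns.split(",") if c.strip()}
    reserved = {tstamp_column, target_column} | ignored
    if exogenous_columns.strip() == "*":
        requested = list(table_columns)
    else:
        requested = [c.strip() for c in exogenous_columns.split(",") if c.strip()]
    missing = [c for c in requested if c not in table_columns]
    if missing:
        raise ValueError(f"columns not found in table: {missing}")
    # Preserve the requested order while dropping reserved columns.
    return [c for c in requested if c not in reserved]


columns = ["time", "tstamp", "weather", "promotions", "store_type", "sales"]
features = resolve_exogenous(columns, "*", target_column="sales")
```

With the drugstore table above and target_column set to sales, the wildcard resolves to weather, promotions, and store_type.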

Sample Output

If forecast_length=30 is specified, the notebook forecasts 30 records beyond the end of the training data. If test_table is provided instead, the notebook forecasts values for the rows of the test data. The test_table must contain at least the tstamp_column ("tstamp" by default). The target_column ("value" by default) is added to the output_table.
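As an illustration of what forecasting forecast_length records beyond the training data means, the sketch below generates the timestamps that would follow the last training row for a given frequency code (D, W, M, Q, Y, as listed for forecast_freq in the Parameters section). The helpers `add_months` and `future_timestamps` are hypothetical, and the choice to keep the day of month of the last timestamp is an assumption, not the notebook's documented behavior.

```python
import calendar
from datetime import datetime, timedelta
from typing import List


def add_months(ts: datetime, months: int) -> datetime:
    """Shift a timestamp by whole months, clamping the day to the month's length."""
    index = ts.year * 12 + (ts.month - 1) + months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    day = min(ts.day, calendar.monthrange(year, month)[1])
    return ts.replace(year=year, month=month, day=day)


def future_timestamps(last: datetime, forecast_length: int, freq: str) -> List[datetime]:
    """Return `forecast_length` timestamps that follow `last` at the given frequency."""
    if forecast_length < 1:
        raise ValueError("forecast_length must be a positive integer")
    steps = range(1, forecast_length + 1)
    if freq == "D":
        return [last + timedelta(days=k) for k in steps]
    if freq == "W":
        return [last + timedelta(weeks=k) for k in steps]
    months_per_step = {"M": 1, "Q": 3, "Y": 12}
    if freq in months_per_step:
        return [add_months(last, k * months_per_step[freq]) for k in steps]
    raise ValueError(f"unsupported frequency: {freq!r}")


# Three monthly steps after a training set that ends on 1960-12-01.
future = future_timestamps(datetime(1960, 12, 1), forecast_length=3, freq="M")
```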

Note that if tstamp_column does not contain valid datetime values, a pesudo_tstamp column is used instead and is included in the output in addition to these columns.

| tstamp | value |
|---|---|
| 1960-12-01 | 0.29304519295692444 |
| 1961-01-01 | 0.00487339636310935 |
| ... | ... |
| 1965-12-01 | 0.5266873240470886 |

The notebook also visualizes the forecasted results.

Workflow Example

A sample workflow is available in Treasure Boxes.

+run_ts_forecast:
  ipynb>:
    notebook: ts_forecast
    train_table: ml_datasets.ts_airline
    tstamp_column: period
    forecast_length: 30
    output_table: ml_test.ts_airline_predicted

Parameters

| Parameter name | Parameter on Console | Description | Default Value |
|---|---|---|---|
| docker.task_mem | Docker Task Mem | Task memory size. Available values are 64g, 128g (default), 256g, 384g, or 512g, depending on your contracted tier. | 128g |
| train_table | Train Table | TD table used for training, specified as dbname.table_name. | - |
| forecast_length | Forecast Length | Length of the forecasting output. Either test_table or forecast_length is required. | - |
| forecast_freq | Forecast Freq | Explicit frequency for forecasting. Accepted values: D (daily), W (weekly), M (monthly), Q (quarterly), Y (yearly). If not specified, the value is inferred from the data. | - |
| test_table | Test Table | TD table name used for prediction. Either test_table or forecast_length is required. | - |
| tstamp_column | Tstamp Column | Timestamp column used to sort the time series data. | tstamp |
| target_column | Target Column | Column name used for the label. | value |
| output_table | Output Table | TD table name to which the prediction result is exported. | - |
| output_mode | Output Mode | Output mode for exporting output_table: overwrite/replace or append. Usually there is no need to specify this; use "append" for semi-realtime prediction. | overwrite |
| exogenous_columns | Exogenous Columns | Columns that can be used as prediction input. Use "*" to select all columns in the train_table. | - |
| ignore_columns | Ignore Columns | Columns to ignore as exogenous variables. | time |
| estimators | Estimators | Estimators used for time series forecasting. Supported estimators: prophet,arima,lgbm | prophet,arima,lgbm,xgboost,xgb_limitdepth |
| time_limit | Time Limit | Soft limit for the training time budget, in seconds. | 60 * 60 |
| sampling_threshold | Sampling Threshold | Threshold used for sampling the training data. | 10_000_000 |
| hide_table_contents | Hide Table Contents | Suppress showing table contents. | false |
| calibration | Calibration | If true, the output value will be calibrated. | false |
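The rules in the table above (defaults, the accepted forecast_freq codes, and the requirement that either test_table or forecast_length is given) can be summarized in a small validation sketch. The function `resolve_params` and the constant names are hypothetical and only mirror the table. They are not part of the notebook or of FLAML.

```python
from typing import Any, Dict

# Defaults copied from the Parameters table; parameters without a default are omitted.
DEFAULTS: Dict[str, Any] = {
    "tstamp_column": "tstamp",
    "target_column": "value",
    "output_mode": "overwrite",
    "ignore_columns": "time",
    "time_limit": 60 * 60,  # seconds
    "sampling_threshold": 10_000_000,
    "hide_table_contents": False,
    "calibration": False,
}
ALLOWED_FREQS = {"D", "W", "M", "Q", "Y"}
ALLOWED_OUTPUT_MODES = {"overwrite", "replace", "append"}


def resolve_params(user_params: Dict[str, Any]) -> Dict[str, Any]:
    """Merge user parameters over the defaults and check the documented rules."""
    params = {**DEFAULTS, **user_params}
    if not params.get("test_table") and not params.get("forecast_length"):
        raise ValueError("either test_table or forecast_length is required")
    freq = params.get("forecast_freq")
    if freq is not None and freq not in ALLOWED_FREQS:
        raise ValueError(f"forecast_freq must be one of {sorted(ALLOWED_FREQS)}")
    if params["output_mode"] not in ALLOWED_OUTPUT_MODES:
        raise ValueError(f"output_mode must be one of {sorted(ALLOWED_OUTPUT_MODES)}")
    return params


# Parameters taken from the workflow example.
params = resolve_params({
    "train_table": "ml_datasets.ts_airline",
    "tstamp_column": "period",
    "forecast_length": 30,
    "output_table": "ml_test.ts_airline_predicted",
})
```

Here the user-supplied tstamp_column overrides the default, while target_column falls back to "value" and time_limit to one hour.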