Treasure Workflow supports predicting time-series values, such as sales revenue or page views, with Facebook Prophet. Machine learning algorithms can run from a custom Python script as part of your scheduled workflows; these examples use Facebook Prophet in a custom Python script for time series analysis and sales prediction.

Prophet is a procedure for forecasting time series data. It is robust to missing data and shifts in the trend. Details of the data used in these examples can be found in Prophet's official documentation.
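Before diving into the workflow examples, here is a minimal, self-contained sketch of the Prophet API they build on. The toy values below are made up purely for illustration:

import pandas as pd
from fbprophet import Prophet

# Prophet expects a dataframe with a 'ds' (datestamp) column and a
# 'y' (target value) column. These 90 daily values are made up.
df = pd.DataFrame({
    'ds': pd.date_range('2024-01-01', periods=90, freq='D'),
    'y': [100 + (i % 7) * 5 for i in range(90)],
})

m = Prophet()
m.fit(df)

# Extend the frame 30 days into the future and predict.
future = m.make_future_dataframe(periods=30)
forecast = m.predict(future)
print(forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail())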


These examples use Facebook Prophet to run time series analysis and predict continuous sales values from past data. Pick the example that matches your access to S3 and Slack.

Example Workflow to Predict Future Sales with S3 and Slack

The workflow (a rough Python sketch follows the list):

  • Fetches past sales data from Treasure Data

  • Builds a model with Prophet

  • Predicts future sales and writes the results back to Treasure Data

  • Uploads predicted figures to Amazon S3

  • Sends a notification to Slack
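The real implementation lives in predict.py in the example repository. For orientation only, here is a hedged sketch of how those five steps could fit together in Python; the timeseries, retail_sales, and predicted_sales names match this example, while the environment variable names, S3 key, and Slack message are illustrative assumptions rather than the repository's exact code:

import os

import boto3
import requests
import pytd
import pytd.pandas_td as td
from fbprophet import Prophet

# 1. Fetch past sales data from Treasure Data. create_engine reads the
#    API key from the TD_API_KEY environment variable.
engine = td.create_engine('presto:timeseries')
df = td.read_td('select ds, y from retail_sales order by ds', engine)

# 2. Build a model with Prophet.
m = Prophet().fit(df)

# 3. Predict future sales and write the results back to Treasure Data.
future = m.make_future_dataframe(periods=12, freq='M')
forecast = m.predict(future)
result = forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].copy()
result['ds'] = result['ds'].astype(str)  # keep the timestamp column portable
client = pytd.Client(apikey=os.environ['TD_API_KEY'],
                     endpoint=os.environ['TD_API_ENDPOINT'],
                     database='timeseries')
client.load_table_from_dataframe(result, 'predicted_sales',
                                 writer='bulk_import', if_exists='overwrite')

# 4. Upload the forecast figure to Amazon S3 (the key name is a placeholder).
fig = m.plot(forecast)
fig.savefig('/tmp/forecast.png')
boto3.client('s3').upload_file('/tmp/forecast.png',
                               os.environ['S3_BUCKET'], 'forecast.png')

# 5. Send a notification to the Slack incoming webhook.
requests.post(os.environ['SLACK_WEBHOOK_URL'],
              json={'text': 'Sales forecast updated in timeseries.predicted_sales'})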

Prerequisites

  • Make sure that the custom scripts feature is enabled for your TD account.

  • Download and install the TD Toolbelt and the TD Toolbelt Workflow module. For more information, see TD Workflow Quickstart.

  • Basic knowledge of Treasure Workflow's syntax

  • An S3 bucket and associated credentials

  • A Slack webhook URL (you can verify it with the snippet after this list)
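To confirm the webhook works before running the workflow, you can post a test message to it. This is a minimal sketch; the URL is the same placeholder used in the commands below, so substitute your own:

import requests

# Replace the placeholder with your own incoming-webhook URL.
resp = requests.post(
    'https://hooks.slack.com/services/XXXXXXX/XXXXXX/XXXXXX',
    json={'text': 'Webhook test from the Prophet example'})
print(resp.status_code)  # 200 means Slack accepted the message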

Running the Example Workflow

  1. Download the project from this repository

  2. In a Terminal window, change to the timeseries directory. For example:

cd timeseries
  3. Run data.sh to prepare the example data. It creates a database called timeseries and a table called retail_sales.

$ ./data.sh  


  4. Run the example workflow as follows (see the note on reading secrets after the commands):


$ td workflow push prophet
$ td workflow secrets \
    --project prophet \
    --set apikey \
    --set endpoint \
    --set s3_bucket \
    --set aws_access_key_id \
    --set aws_secret_access_key \
    --set slack_webhook_url
# Set secrets from STDIN like: apikey=x/xxxxx, endpoint=https://api.treasuredata.com, s3_bucket=$S3_BUCKET,
#   aws_access_key_id=AAAAAAAAAA, aws_secret_access_key=XXXXXXXXX, slack_webhook_url=https://hooks.slack.com/services/XXXXXXX/XXXXXX/XXXXXX
# Where XXXXXXX/XXXXXX/XXXXXX is the value of the Slack URL where you want information to be populated automatically.
$ td workflow start prophet predict_sales --session now
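These commands only store the secrets; how they reach the script depends on the workflow definition. As a hedged sketch: Treasure Workflow custom scripts typically receive secrets as environment variables mapped in the .dig file (for example, _env: TD_API_KEY: ${secret:apikey}), so the Python side might read them roughly like this. The variable names are illustrative assumptions, not necessarily those used by predict_sales.dig:

import os

# Assumes the .dig file maps each secret into the container environment,
# e.g. _env: TD_API_KEY: ${secret:apikey}. Check predict_sales.dig for the
# actual variable names.
apikey = os.environ['TD_API_KEY']
endpoint = os.environ['TD_API_ENDPOINT']
s3_bucket = os.environ['S3_BUCKET']
slack_webhook_url = os.environ['SLACK_WEBHOOK_URL']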


A notification of the prediction results is sent to your Slack channel.



Prediction results are stored in the predicted_sales table in the timeseries database.

Validate predicted sales values in TD Console with SQL as follows:

select
  ds, yhat, yhat_lower, yhat_upper
from
  timeseries.predicted_sales
order by
  ds desc
limit 100;
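If you'd rather validate from Python than from TD Console, the same query can run through pytd.pandas_td, the library the example script already uses. A minimal sketch, assuming TD_API_KEY and TD_API_SERVER are set in your environment:

import pytd.pandas_td as td

# Query the prediction table through Presto.
engine = td.create_engine('presto:timeseries')
predicted = td.read_td("""
select
  ds, yhat, yhat_lower, yhat_upper
from
  predicted_sales
order by
  ds desc
limit 100
""", engine)
print(predicted.head())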

You can see the predicted sales numbers in the yhat column.


Example Workflow to Predict Future Sales without S3 or Slack

This example is for when you do not have access to S3 or Slack.

The workflow:

  • Fetches past sales data from Treasure Data

  • Builds a model with Prophet

  • Predicts future sales and writes the results back to Treasure Data

Prerequisites

  • Make sure the custom scripts feature is enabled for your TD account.

  • Download and install the TD Toolbelt and the TD Toolbelt Workflow module. For more information, see TD Workflow Quickstart.

  • Basic knowledge of Treasure Workflow's syntax

Running the Example Workflow

  1. Download the project from this repository

  2. In a Terminal window, change to the timeseries directory. For example:

    cd timeseries
  3. Run data.sh to prepare the example data. It creates a database called timeseries and a table called retail_sales.

$ ./data.sh 
  4. Run the example workflow as follows:

$ td workflow push prophet
$ td workflow secrets \
    --project prophet \
    --set apikey \
    --set endpoint
# Set secrets from STDIN like: apikey=x/xxxxx, endpoint=https://api.treasuredata.com
$ td workflow start prophet predict_sales_simple --session now

Predictions are stored in Treasure Data.

Review the Workflow Custom Python Script

Make sure you are signed into a git account that has permission to see Treasure Data workflow samples.

Review the contents of the directory:

  • predict_sales.dig - Example workflow for sales prediction and notification to Slack.

  • predict.py - Custom Python script with Prophet. It also uploads figures to Amazon S3.

If you don't need to send a notification to Slack, you can remove the "+send_graph" step from predict_sales.dig.

This example uses a sales dataset. You can use your own data in Treasure Data after modifying the query in the read_td function.

You must prepare the ds (datestamp) column and the y column. The y column holds the target numerical values to forecast, such as sales figures or website page views (PVs).

Here is example code for page view logs:

import pytd.pandas_td as td
from fbprophet import Prophet

engine = td.create_engine('presto:sample_datasets')

start_date = '2014-01-01'
end_date = '2018-12-31'
df = td.read_td(f"""
select
  TD_TIME_FORMAT(TD_DATE_TRUNC('minute', time), 'yyyy-MM-dd HH:mm:ss') as ds,
  count(1) as y
from
  www_access
where
  TD_TIME_RANGE(time, '{start_date}', '{end_date}', 'PDT')
group by
  1
order by
  1
""", engine)

m = Prophet(changepoint_prior_scale=0.01).fit(df)
future = m.make_future_dataframe(periods=5, freq='M')
fcst = m.predict(future)
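Assuming the snippet above ran successfully, fcst holds one row per timestamp, including the five future periods. A short usage note (not part of the original script) to inspect the forecast:

# yhat is the point forecast; yhat_lower and yhat_upper bound Prophet's
# uncertainty interval for each timestamp.
print(fcst[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail())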


