# Time Series Analysis for Sales Prediction With Prophet Examples

Treasure Workflow provides for the prediction of time-series values, such as sales revenue or page views, using Facebook Prophet. Machine learning algorithms can be run from a custom Python script as part of your scheduled workflows.

Use [Facebook Prophet](https://facebook.github.io/prophet) in your Python custom script for time series analysis and sales data prediction. Prophet is a procedure for forecasting time series data. It is robust to missing data and to shifts in the trend. Details of the data used in these examples can be found in [Prophet's official documentation](https://facebook.github.io/prophet/docs/non-daily_data.md#monthly-data).

![](/assets/image-20191007-173703.ef21fb3c08a9110670a4b3b5da0c2bacab34f7d6070295e67de7c1b0bb1e1ba7.2e4f387b.png)

These examples use Facebook Prophet's time series analysis to predict continuous sales values from past data. Depending on your access to S3 and Slack, pick the example to use.

## Example Workflow to Predict Future Sales with S3 and Slack

The workflow:

* Fetches past sales data from Treasure Data
* Builds a model with Prophet
* Predicts future sales and writes the results back to Treasure Data
* Uploads predicted figures to Amazon S3
* Sends a notification to Slack

### Prerequisites

* Make sure that the custom scripts feature is enabled for your TD account.
* Download and install the TD Toolbelt and the TD Toolbelt Workflow module. For more information, see [TD Workflow Quickstart](/products/customer-data-platform/data-workbench/workflows/treasure-workflow-quick-start-using-td-toolbelt-in-a-cli).
* Basic knowledge of Treasure Workflow's syntax
* An S3 bucket and associated credentials
* Your Slack [webhook](https://api.slack.com/incoming-webhooks) URL

### Running the Example Workflow
1. Download the project from this [repository](https://github.com/treasure-data/treasure-boxes/tree/master/machine-learning-box/sales-prediction).
2. In the command line Terminal window, change directory to *timeseries*. For example:

   ```bash
   cd timeseries
   ```

3. Run *data.sh* to prepare example data. It creates a database called timeseries and a table called retail_sales.

   ```bash
   ./data.sh
   ```

4. Run the example workflow as follows:

   ```bash
   td workflow push prophet
   td workflow secrets \
     --project prophet \
     --set apikey \
     --set endpoint \
     --set s3_bucket \
     --set aws_access_key_id \
     --set aws_secret_access_key \
     --set slack_webhook_url
   # Set secrets from STDIN like: apikey=x/xxxxx, endpoint=https://api.treasuredata.com, s3_bucket=$S3_BUCKET,
   # aws_access_key_id=AAAAAAAAAA, aws_secret_access_key=XXXXXXXXX, slack_webhook_url=https://hooks.slack.com/services/XXXXXXX/XXXXXX/XXXXXX
   # Where XXXXXXX/XXXXXX/XXXXXX is the value of the Slack URL where you want information to be populated automatically.
   td workflow start prophet predict_sales --session now
   ```

A notification of prediction results is sent to your Slack.

Prediction results are stored in the predicted_sales table in the timeseries database. Validate predicted sales values in TD Console with SQL as follows:

```sql
SELECT ds, yhat, yhat_lower, yhat_upper
FROM timeseries.predicted_sales
ORDER BY ds DESC
LIMIT 100
```

You can see the predicted sales values (yhat), along with their lower and upper uncertainty bounds, in these columns.

## Example Workflow to Predict Future Sales without S3 or Slack

This example is for when you do not have access to S3 or Slack.

The workflow:

* Fetches past sales data from Treasure Data
* Builds a model with Prophet
* Predicts future sales and writes the results back to Treasure Data

### Prerequisites

* Make sure the custom scripts feature is enabled for your TD account.
* Download and install the TD Toolbelt and the TD Toolbelt Workflow module.
  For more information, see [TD Workflow Quickstart](/products/customer-data-platform/data-workbench/workflows/treasure-workflow-quick-start-using-td-toolbelt-in-a-cli).
* Basic knowledge of Treasure Workflow's syntax

### Running the Example Workflow

1. Download the project from this [repository](https://github.com/treasure-data/treasure-boxes/tree/master/machine-learning-box/sales-prediction).
2. In the command line Terminal window, change directory to *timeseries*. For example:

   ```bash
   cd timeseries
   ```

3. Run *data.sh* to prepare example data. It creates a database called timeseries and a table called retail_sales.

   ```bash
   ./data.sh
   ```

4. Run the example workflow as follows:

   ```bash
   td workflow push prophet
   td workflow secrets \
     --project prophet \
     --set apikey \
     --set endpoint
   # Set secrets from STDIN like: apikey=x/xxxxx, endpoint=https://api.treasuredata.com
   td workflow start prophet predict_sales_simple --session now
   ```

Predictions are stored in Treasure Data.

### Review the Workflow Custom Python Script

Make sure you are signed into a git account that has permission to see Treasure Data workflow samples. Review the contents of the directory:

* [predict_sales.dig](https://github.com/treasure-data/workflow-examples/blob/master/machine-learning/time_series/predict_sales.dig) - Example workflow for sales prediction and notification to Slack.
* [predict.py](https://github.com/treasure-data/workflow-examples/blob/master/machine-learning/time_series/predict.py) - Custom Python script with Prophet. It also uploads figures to Amazon S3.

If you don't need to send a notification to Slack, you can remove the *+send_graph* step in predict_sales.dig.

This example uses a dataset for sales data. You can use your own data in Treasure Data after modifying the query in the read_td function. You must prepare the *ds* (datestamp) column and the *y* column. The y column represents the target numerical values for forecasting, such as sales values or website page views (PVs).
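Concretely, the two-column frame Prophet consumes can be built from any table that has a date and a numeric measure. A sketch with pandas (the `sale_date` and `amount` column names are hypothetical):

```python
import pandas as pd

# Hypothetical raw sales records; column names are illustrative
raw = pd.DataFrame({
    "sale_date": ["2021-01-01", "2021-01-01", "2021-01-02"],
    "amount": [120.0, 80.0, 95.0],
})

# Prophet expects exactly two columns: 'ds' (datestamp) and 'y' (numeric target)
df = (
    raw.assign(sale_date=pd.to_datetime(raw["sale_date"]))
       .groupby("sale_date", as_index=False)["amount"].sum()
       .rename(columns={"sale_date": "ds", "amount": "y"})
)
print(df)
```

In the example workflow this same reshaping is done inside the SQL query instead, so that the result of read_td already has the ds/y schema.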
Here is example code for page view logs:

```python
import pytd.pandas_td as td
from prophet import Prophet  # `from fbprophet import Prophet` on older versions

engine = td.create_engine('presto:sample_datasets')
start_date = '2014-01-01'
end_date = '2018-12-31'

# Aggregate page views per minute into the ds/y schema Prophet expects
df = td.read_td(f"""
select
  TD_TIME_FORMAT(TD_DATE_TRUNC('minute', time), 'yyyy-MM-dd HH:mm:ss') as ds,
  count(1) as y
from
  www_access
where
  TD_TIME_RANGE(time, '{start_date}', '{end_date}', 'PDT')
group by
  1
order by
  1
""", engine)

m = Prophet(changepoint_prior_scale=0.01).fit(df)
future = m.make_future_dataframe(periods=5, freq='M')
fcst = m.predict(future)
```
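Before writing a forecast back to Treasure Data, you may want a quick sanity check of its accuracy. One simple approach is to join predictions to held-out actuals on the ds column and compute the mean absolute error. The frames below are hypothetical stand-ins for the output of m.predict() and for a holdout window of actual values:

```python
import pandas as pd

# Hypothetical forecast rows, as m.predict() would return them
fcst = pd.DataFrame({
    "ds": pd.to_datetime(["2019-01-31", "2019-02-28", "2019-03-31"]),
    "yhat": [400.0, 420.0, 410.0],
})

# Hypothetical actual values held out from training
holdout = pd.DataFrame({
    "ds": pd.to_datetime(["2019-01-31", "2019-02-28", "2019-03-31"]),
    "y": [395.0, 430.0, 405.0],
})

# Join on the datestamp and compute mean absolute error
eval_df = holdout.merge(fcst, on="ds")
mae = (eval_df["y"] - eval_df["yhat"]).abs().mean()
print(round(mae, 2))  # 6.67
```

Prophet also ships a diagnostics module with built-in cross-validation utilities, which may be preferable for more systematic evaluation.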