# Chapter 13: Workflows

## The Problem

You've built segments, journeys, and queries. But someone still needs to run the daily data refresh. Someone needs to execute the weekly report. Someone needs to coordinate the data pipeline that feeds your segments.

That someone shouldn't be you clicking buttons every morning.

## The Key Idea

**Core concept:** Workflows automate recurring tasks. AI writes workflow definitions that run on schedule without human intervention.

Think of workflows as the invisible machinery that keeps your marketing operations running—data flowing, tables updating, reports generating.

## What Workflows Do

Common marketing workflows:

| Task | Frequency | What Happens |
| --- | --- | --- |
| Update customer metrics | Daily | Run SQL to calculate latest values |
| Prepare segment data | Daily | Transform raw data into segment-ready tables |
| Generate reports | Weekly | Calculate metrics, create summary tables |
| Archive old data | Monthly | Move historical records |

## The Workflow Pattern

Just like segments and journeys, workflows follow pull → edit → push:

```mermaid
flowchart LR
    A[Pull] --> B[Edit]
    B --> C[Push]
    C --> D[Run]
```

```bash
# Pull existing workflow project
tdx wf pull myproject

# Edit locally, then push changes
tdx wf push

# Run the workflow manually
tdx wf run myproject
```

## Creating Your First Workflow

Tell AI what you need automated:

```
> "Create a workflow that updates customer metrics every morning at 6am"
```

AI generates a workflow file (`workflows/daily-metrics/daily-metrics.dig`):

```yaml
timezone: America/New_York

schedule:
  daily>: 06:00:00

_export:
  td:
    database: marketing
    engine: presto

+update_customer_metrics:
  td>: queries/update_customer_metrics.sql
  create_table: customer_metrics
```

## Understanding the Workflow

```yaml
timezone: America/New_York
```

All times are interpreted in this timezone.

```yaml
schedule:
  daily>: 06:00:00
```

Run every day at 6:00 AM.

```yaml
_export:
  td:
    database: marketing
    engine: presto
```

Default settings for all `td>` tasks in this workflow.

```yaml
+update_customer_metrics:
  td>: queries/update_customer_metrics.sql
  create_table: customer_metrics
```

A task that runs a SQL file and creates or replaces a table with the results.

## The td> Operator

The `td>` operator runs SQL queries. It's the core of TD workflows:

```yaml
# Run SQL from a file
+from_file:
  td>: queries/analysis.sql
  create_table: results

# Run inline SQL
+inline_query:
  td>:
    query: |
      SELECT
        customer_id,
        SUM(amount) as total_spent
      FROM orders
      WHERE TD_TIME_RANGE(time, '${session_date}', NULL)
      GROUP BY 1
  insert_into: daily_totals
```

**Output options:**

- `create_table:` — Replace a table with the results
- `insert_into:` — Append results to an existing table
- `store_last_results: true` — Store results for conditional logic

## Task Sequences

Tasks run in order. Later tasks wait for earlier ones:

```yaml
+step_1_extract:
  td>: queries/extract_orders.sql
  create_table: orders_staging

+step_2_transform:
  td>: queries/calculate_metrics.sql
  create_table: customer_metrics

+step_3_aggregate:
  td>: queries/build_summary.sql
  create_table: daily_summary
```

Step 2 waits for step 1. Step 3 waits for step 2.
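To see why that ordering matters, here is a minimal sketch of what `queries/calculate_metrics.sql` might contain (the column names and aggregations are illustrative assumptions, not taken from the actual project). Step 2 reads `orders_staging`, the table step 1 creates, so it cannot start until step 1 finishes:

```sql
-- queries/calculate_metrics.sql (illustrative sketch)
-- Reads orders_staging, which +step_1_extract created, and produces
-- the per-customer metrics that +step_2_transform writes to customer_metrics.
SELECT
  customer_id,
  COUNT(*) AS order_count,
  SUM(amount) AS total_spent,
  MAX(order_date) AS last_order_date
FROM orders_staging
GROUP BY 1
```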
## Parallel Tasks

Run independent tasks simultaneously:

```yaml
+calculate_metrics:
  _parallel: true

  +revenue_metrics:
    td>: queries/revenue_metrics.sql
    create_table: revenue_metrics

  +engagement_metrics:
    td>: queries/engagement_metrics.sql
    create_table: engagement_metrics

  +product_metrics:
    td>: queries/product_metrics.sql
    create_table: product_metrics

+combine_results:
  td>: queries/combine_metrics.sql
  create_table: all_metrics
```

All three metrics queries run at the same time, then `combine_results` runs after them.

## Scheduling Options

**Daily:**

```yaml
schedule:
  daily>: 06:00:00
```

**Hourly:**

```yaml
schedule:
  hourly>: 00:00
```

**Weekly:**

```yaml
schedule:
  weekly>: Mon,06:00:00
```

**Multiple times per day:**

```yaml
schedule:
  cron>: 0 6,12,18 * * *
```

**Specific days:**

```yaml
schedule:
  cron>: 0 9 1,15 * *   # 9 AM on the 1st and 15th of each month
```

## Session Variables

Built-in variables for dates:

```yaml
+daily_report:
  td>:
    query: |
      SELECT * FROM events
      WHERE TD_TIME_RANGE(time, '${session_date}',
                          TD_TIME_ADD('${session_date}', '1d'))
  create_table: daily_events_${session_date_compact}
```

| Variable | Example Value | Use Case |
| --- | --- | --- |
| `${session_date}` | 2024-12-15 | Date filtering |
| `${session_time}` | 2024-12-15T06:00:00+00:00 | Exact timestamp |
| `${session_date_compact}` | 20241215 | Table naming |
| `${last_session_date}` | 2024-12-14 | Previous run date |

## Variables and Parameters

Make workflows reusable:

```yaml
_export:
  td:
    database: marketing
  target_days: 30

+recent_customers:
  td>:
    query: |
      SELECT * FROM customers
      WHERE last_purchase_date >= DATE_ADD('day', -${target_days}, CURRENT_DATE)
  create_table: recent_customers
```

Run with different parameters:

```bash
tdx wf run daily-metrics --param target_days=90
```

## Error Handling

What happens when something fails?

```yaml
+critical_update:
  td>: queries/update_metrics.sql
  create_table: customer_metrics
  _retry: 3

_error:
  +log_failure:
    td>:
      query: |
        INSERT INTO workflow_errors
        VALUES ('${session_id}', 'critical_update failed', CURRENT_TIMESTAMP)
```

If `critical_update` fails after 3 retries, the `_error` block runs.

## Retry Logic

Automatically retry transient failures:

```yaml
+api_dependent_query:
  _retry:
    limit: 3
    interval: 60
  td>: queries/external_data.sql
  create_table: external_metrics
```

Try up to 3 times, waiting 60 seconds between attempts.
**Exponential backoff:**

```yaml
+flaky_task:
  _retry:
    limit: 5
    interval: 30
    interval_type: exponential
  td>: queries/large_query.sql
```

## Conditional Execution

Run tasks based on query results:

```yaml
+check_data:
  td>:
    query: |
      SELECT COUNT(*) as cnt FROM new_orders
      WHERE date = '${session_date}'
  store_last_results: true

+process_if_data:
  if>: ${td.last_results.cnt > 0}
  _do:
    +process_orders:
      td>: queries/process_orders.sql
      create_table: processed_orders
```

## A Complete Marketing Workflow

Here's a realistic daily workflow:

```yaml
timezone: America/New_York

schedule:
  daily>: 05:00:00

_export:
  td:
    database: marketing
    engine: presto

# Step 1: Update base metrics
+update_metrics:
  td>: queries/update_customer_metrics.sql
  create_table: customer_metrics

# Step 2: Prepare segment data (parallel)
+prepare_segments:
  _parallel: true

  +high_value_data:
    td>: queries/high_value_customers.sql
    create_table: segment_high_value

  +recent_buyers_data:
    td>: queries/recent_buyers.sql
    create_table: segment_recent_buyers

  +at_risk_data:
    td>: queries/at_risk_customers.sql
    create_table: segment_at_risk

# Step 3: Build summary report
+daily_summary:
  td>:
    query: |
      SELECT
        '${session_date}' as report_date,
        (SELECT COUNT(*) FROM segment_high_value) as high_value_count,
        (SELECT COUNT(*) FROM segment_recent_buyers) as recent_buyers_count,
        (SELECT COUNT(*) FROM segment_at_risk) as at_risk_count,
        CURRENT_TIMESTAMP as generated_at
  insert_into: daily_reports

# Global error handler
_error:
  +log_error:
    td>:
      query: |
        INSERT INTO workflow_errors
        VALUES ('daily-marketing', '${session_id}', '${task_name}', CURRENT_TIMESTAMP)
```

## Managing Workflows

**List workflow projects:**

```bash
tdx wf projects
```

**Pull a project to edit locally:**

```bash
tdx wf pull daily-metrics
# Creates: workflows/daily-metrics/
```

**Push changes back:**

```bash
tdx wf push daily-metrics
```

**View sessions and task status:**

```bash
tdx wf sessions daily-metrics
tdx wf attempt tasks
```

**View logs for a specific task:**

```bash
tdx wf attempt logs +update_metrics
```

**Run manually:**

```bash
tdx wf run daily-metrics
```

## Mental Model: Assembly Line

Think of workflows as an assembly line:

```
[Raw Data] → [Extract] → [Transform] → [Load] → [Ready for Segments]
```

Each station does one job. The line runs automatically on schedule. If a station breaks, the line stops and logs the error.

## Pitfalls

**"The workflow didn't run."** Check the schedule and timezone:

```
> "When is daily-metrics scheduled to run next?"
```

**"A task is taking too long."** Large queries might need optimization. Check execution time:

```bash
tdx wf attempt tasks
```

**"The query failed with a timeout."** Add retry logic and consider breaking the work into smaller queries:

```yaml
+large_query:
  _retry: 2
  td>: queries/optimized_query.sql
```

## What You've Learned

- Workflows automate recurring SQL tasks
- The `td>` operator runs queries and creates tables
- Tasks run sequentially by default, or in parallel with `_parallel: true`
- Schedules can be daily, hourly, weekly, or custom cron
- Session variables like `${session_date}` make queries dynamic
- Error handling with `_error` blocks prevents silent failures
- Retry logic handles transient issues

## Next Step

Your data pipelines run automatically. [Chapter 14](/treasure-code/book/14-agents) shows you how to build AI agents—giving your team members AI-powered assistants that can answer questions about customer data without needing Claude Code.

*Your workflows are automated. Next, you'll build AI agents for your team.*