# Agent Commands

Manage LLM agents with YAML/Markdown configuration files.


```bash
tdx agent <command>   # Full command
tdx agents            # Shorthand alias for `tdx agent list`
```

## Overview

The `tdx agent` commands provide complete management of LLM agents:

| Command | Description |
|  --- | --- |
| [`list`](#list) | List agents in the current project |
| [`show`](#show) | Show agent details |
| [`pull`](#pull) | Export agents from an LLM project to local YAML/Markdown files |
| [`push`](#push) | Import local agent files to an LLM project |
| [`clone`](#clone) | Clone a project to a new project |
| [`test`](#test) | Run automated tests against an agent |
| [`create`](#create) | Create a new agent (prefer `pull`/`push` workflow) |
| [`update`](#update) | Update an existing agent (prefer `pull`/`push` workflow) |
| [`delete`](#delete) | Delete an agent |


## Typical Usage


```bash
# 1. Create a new LLM project (or use an existing one)
tdx llm project create "My LLM Project" --description "Data analysis agents"

# 2. Pull project to local files (auto-sets context)
tdx agent pull "My LLM Project"
# Creates: agents/my-llm-project/

# 3. Edit YAML/Markdown files locally
# - Edit agents/my-llm-project/my-agent/prompt.md for system prompt
# - Edit agents/my-llm-project/my-agent/agent.yml for configuration

# 4. Preview changes before pushing
tdx agent push --dry-run

# 5. Push changes to the project
tdx agent push

# 6. Add tests and run them
# - Create agents/my-llm-project/my-agent/test.yml
tdx agent test ./agents/my-llm-project/my-agent/

# 7. Clone to another environment
tdx agent clone ./agents/my-llm-project/ --name "My Project (Staging)" --profile staging
```

**Recommended Workflow:**

Use the **pull/push workflow** for agent development:

- Edit agents as YAML/Markdown files in your favorite editor
- Version control with Git for history and collaboration
- Use `--dry-run` to preview changes before pushing
- Add `test.yml` to validate agent behavior with automated tests


The `create`/`update` commands are available for quick one-off changes, but the file-based workflow is recommended for maintainability.

## Folder Structure

When you pull agents from a project, the following folder structure is created:


```
agents/
└── {project-name}/                    # Normalized project name (kebab-case)
    ├── tdx.json                       # Project configuration
    ├── {agent-name}/                  # Normalized agent name
    │   ├── prompt.md                  # System prompt (editable markdown)
    │   ├── agent.yml                  # Agent configuration
    │   ├── starter_message.md         # Optional: multiline starter message
    │   └── test.yml                   # Optional: automated test definitions
    ├── knowledge_bases/               # Knowledge base definitions
    │   ├── {kb-name}.yml              # Table-based KB (queries TD database)
    │   └── {kb-name}.md               # Text-based KB (plain text content)
    ├── prompts/                       # Prompt definitions
    │   └── {prompt-name}.yml
    └── integrations/                  # Chat integration definitions
        └── {service-type}.yml         # e.g., chat_generic.yml
```

**Name Collisions:** If multiple agents have names that normalize to the same folder name (e.g., "My Agent" and "my-agent" both become `my-agent`), the pull operation appends numeric suffixes to avoid conflicts: `my-agent`, `my-agent-2`, `my-agent-3`, etc. The actual agent name is preserved in the `name:` field of `agent.yml`.
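
The suffixing rule above can be sketched in shell. The exact normalization that `tdx` applies is not specified here, so `normalize_name` (lowercase, runs of non-alphanumerics collapsed to a hyphen) is an assumption, and `in_list` and `assign_folders` are hypothetical helper names used only for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the folder-naming rule; not the tdx implementation.

# Assumed normalization: lowercase, collapse runs of non-alphanumeric
# characters into a single hyphen, trim leading/trailing hyphens.
normalize_name() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//'
}

# Succeeds if "$2" appears as a whole entry in the space-delimited list "$1".
in_list() {
  case "$1" in
    *" $2 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Print one folder name per agent name, appending -2, -3, ... when the
# normalized name has already been taken by an earlier agent.
assign_folders() {
  local used=" " name base candidate n
  for name in "$@"; do
    base=$(normalize_name "$name")
    candidate=$base
    n=2
    while in_list "$used" "$candidate"; do
      candidate="$base-$n"
      n=$((n + 1))
    done
    used="$used$candidate "
    printf '%s\n' "$candidate"
  done
}

assign_folders "My Agent" "my-agent" "My  Agent"
```

With these three names the sketch prints `my-agent`, `my-agent-2`, and `my-agent-3`, matching the sequence described above.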

**Integration Support:**

**Safe integrations** (Generic Chat, Agent Console, Parent Segment) are included in pull/push/clone operations. These are stored in `integrations/{service-type}.yml`.

**Sensitive integrations** (Slack, Webhook) are **not** synced because they contain secrets (Slack signing secrets, webhook credentials) that should not be version controlled. Configure these manually in the TD console for each environment.

## File Formats

### tdx.json

Project configuration file located in the project root folder.


```json
{
  "llm_project": "My LLM Project"
}
```

### agent.yml

Agent configuration file with model settings, tools, outputs, and variables.


```yaml
name: My Support Agent
description: Customer support assistant

# Model configuration
model: claude-4.5-sonnet
temperature: 0.7
max_tool_iterations: 5
reasoning_effort: medium    # none, minimal, low, medium, high

# Starter message (short inline, or use starter_message.md for long text)
starter_message: Hello! How can I help you today?

# Output definitions
outputs:
  - name: resolution_status
    function_name: get_resolution_status
    function_description: Returns the status of issue resolution
    json_schema: |
      {"type": "object", "properties": {"status": {"type": "string"}}}

# Tools - give the agent access to knowledge bases, other agents, etc.
tools:
  - type: knowledge_base
    target: '@ref(type: "knowledge_base", name: "support-kb")'
    target_function: SEARCH
    function_name: search_knowledge
    function_description: Search the support knowledge base

  - type: agent
    target: '@ref(type: "agent", name: "escalation-agent")'
    target_function: CHAT
    function_name: escalate_issue
    function_description: Escalate to senior support

# Variables (runtime inputs from knowledge base)
variables:
  - name: customer_context
    target_knowledge_base: '@ref(type: "knowledge_base", name: "customer-kb")'
    target_function: LOOKUP
    function_arguments: |
      {"query": "{{customer_id}}"}
```

#### Tools Configuration

Tools give agents access to external resources like knowledge bases, other agents, web search, image generation, and parent segment data. Each tool entry uses the following fields:

| Field | Description |
|  --- | --- |
| `type` | Tool type: `knowledge_base`, `agent`, `web_search`, `image_gen`, `parent_segment_kb` |
| `target` | Reference to the resource using `@ref(...)` syntax. Not needed for `parent_segment_kb` (singleton). |
| `target_function` | Function to call: `SEARCH`, `LOOKUP`, `CHAT`, `TEXT_TO_IMAGE`, etc. |
| `function_name` | Name exposed to the LLM (what the agent calls) |
| `function_description` | Description shown to the LLM |
| `output_mode` | Optional for agent tools: `RETURN` (default, returns complete response) or `SHOW` (shows response to user) |


**Knowledge Base Tool** - Search or look up data in a knowledge base.

The `knowledge_base` type works with both table-based and text-based knowledge bases. The system automatically resolves the correct type based on the name:


```yaml
tools:
  # Table-based knowledge base (backed by TD table)
  - type: knowledge_base
    target: '@ref(type: "knowledge_base", name: "product-catalog")'
    target_function: SEARCH      # SEARCH, LOOKUP, LIST_COLUMNS
    function_name: search_products
    function_description: Search the product catalog for items

  # Text-based knowledge base (document/text content)
  - type: knowledge_base
    target: '@ref(type: "knowledge_base", name: "company-docs")'
    target_function: READ_TEXT   # READ_TEXT for text KBs
    function_name: read_docs
    function_description: Read company documentation
```

**Agent Tool** - Call another agent as a sub-agent:


```yaml
tools:
  - type: agent
    target: '@ref(type: "agent", name: "sql-expert")'
    target_function: CHAT
    function_name: run_sql_query
    function_description: Execute SQL queries using the SQL expert agent
    output_mode: RETURN         # RETURN (default) or SHOW
```

**Web Search Tool** - Search the web for real-time information:


```yaml
tools:
  - type: web_search
    target: '@ref(type: "web_search_tool", name: "web-search")'
    target_function: SEARCH
    function_name: search_web
    function_description: Search the web for current information
```

**Image Generation Tool** - Generate or modify images:


```yaml
tools:
  - type: image_gen
    target: '@ref(type: "image_generator", name: "image-gen")'
    target_function: TEXT_TO_IMAGE  # TEXT_TO_IMAGE, OUTPAINT, INPAINT, IMAGE_VARIATION, REMOVE_BACKGROUND
    function_name: generate_image
    function_description: Generate an image from a text description
```

**Parent Segment KB Tool** - Access CDP parent segment data (singleton per project, no `target` needed):


```yaml
tools:
  - type: parent_segment_kb
    target_function: LIST_SEGMENT_FOLDERS  # See below for all functions
    function_name: list_folders
    function_description: List segment folders in the project
```

Available `target_function` values for `parent_segment_kb`:

| Function | Description |
|  --- | --- |
| `LIST_SEGMENT_FOLDERS` | List all segment folders |
| `LIST_BY_FOLDER` | List segments in a folder |
| `LIST_ATTRIBUTES` | List available attributes |
| `LIST_BEHAVIORS` | List available behaviors |
| `GET_SEGMENT` | Get segment details |
| `GET_JOURNEY` | Get journey details |
| `GET_AUDIENCE` | Get audience details |
| `GET_QUERY` | Get query details |
| `QUERY_DATA_DIRECT` | Query segment data directly |
| `QUERY_SEGMENT_ANALYTICS` | Query segment analytics |


**Singleton Tool:** Unlike other tools, `parent_segment_kb` is a singleton: each project has exactly one, so no `target` reference is needed. The tool automatically connects to the project's parent segment knowledge base.

#### Variables Configuration

Variables inject data from knowledge bases into the agent's context at runtime:


```yaml
variables:
  - name: user_profile           # Variable name used in prompt
    target_knowledge_base: '@ref(type: "knowledge_base", name: "users-kb")'
    target_function: LOOKUP      # LOOKUP for exact match, SEARCH for semantic
    function_arguments: |
      {"query": "{{user_id}}"}   # Template with runtime values
```

Use variables in `prompt.md` with `{{variable_name}}` syntax.

### prompt.md

System prompt file containing the agent's instructions. This is a plain markdown file that can be edited with any text editor.


```markdown
You are a helpful customer support agent for our e-commerce platform.

## Your Role

- Assist customers with order inquiries
- Provide product information
- Handle account issues

## Guidelines

Always be polite, professional, and empathetic.
If you cannot resolve an issue, escalate to a human agent.
```

### starter_message.md

Optional file for multiline starter messages. A short starter message can go directly in the `starter_message` field of `agent.yml` instead.

### knowledge_bases/{name}.yml (Table-based)

Table-based knowledge base that queries a Treasure Data database.


```yaml
name: Support KB
type: database
database: customer_data
tables:
  - name: faq
    td_query: SELECT * FROM faq
    enable_data: true
    enable_data_index: true
```

### knowledge_bases/{name}.md (Text-based)

Text-based knowledge base containing plain text content. Uses YAML frontmatter for metadata.


```markdown
---
name: Product FAQ
---

# Frequently Asked Questions

## What is your return policy?
We offer a 30-day return policy for all unused items...

## How do I track my order?
You can track your order by logging into your account...
```

**Text KB Name:** If the `name` field is omitted from the frontmatter, the filename (without `.md`) is used as the knowledge base name.
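
The fallback rule above can be illustrated with a small shell helper. This is a sketch only: `kb_name` is a hypothetical function, it reads just a plain unquoted `name:` line, and it is not how `tdx` itself parses frontmatter.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: resolve the knowledge base name for a text KB file.

# Print the `name:` value from the YAML frontmatter if present,
# otherwise the filename without its `.md` extension.
kb_name() {
  local file=$1 name
  name=$(awk '
    NR == 1 && $0 != "---" { exit }   # file has no frontmatter block
    NR > 1 && $0 == "---"  { exit }   # reached the end of the frontmatter
    NR > 1 && /^name:/ {
      sub(/^name:[ \t]*/, "")         # keep only the value
      print
      exit
    }
  ' "$file")
  if [ -n "$name" ]; then
    printf '%s\n' "$name"
  else
    basename "$file" .md
  fi
}

# Example (path is illustrative):
# kb_name agents/my-llm-project/knowledge_bases/product-faq.md
```

A helper like this can be handy when checking that `@ref(type: "knowledge_base", name: "...")` entries match the names the local files will produce.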

Both table-based and text-based knowledge bases can be referenced using the same `@ref` syntax:


```yaml
tools:
  - type: knowledge_base
    target: '@ref(type: "knowledge_base", name: "Support KB")'      # Table-based
    # ...
  - type: knowledge_base
    target: '@ref(type: "knowledge_base", name: "Product FAQ")'     # Text-based
    # ...
```

### prompts/{name}.yml

Prompt template configuration.


```yaml
name: greeting-prompt
agent: '@ref(type: "agent", name: "support-agent")'
system_prompt: |
  Generate a personalized greeting...
template: |
  Customer Name: {{customer_name}}
json_schema_hint: |
  {"type": "object", "properties": {"customer_name": {"type": "string"}}}
```

### integrations/{service-type}.yml

Chat integration configuration for Generic Chat, Agent Console, or Parent Segment integrations.


```yaml
service_type: chat_generic
name: generic-chat-integration
chat_welcome_message: "Welcome! How can I help you today?"
chat_ignore_managed_actions: false
actions:
  - prompt: '@ref(type: "prompt", name: "support-prompt")'
    chat_widget_type: button
    chat_widget_label: Ask Support
    ui_tags:
      - "lang:en"
      - "lang:ja"
```

| Field | Description |
|  --- | --- |
| `service_type` | Integration type: `chat_generic`, `chat_agent_console`, or `chat_parent_segment` |
| `name` | Display name for the integration |
| `chat_welcome_message` | Welcome message shown when chat starts |
| `chat_ignore_managed_actions` | Whether to ignore managed actions |
| `actions` | List of actions/buttons shown in the chat widget |


**Supported Integration Types:**

Only **safe** integration types are synced:

- `chat_generic` - Generic chat widget
- `chat_agent_console` - Agent console integration
- `chat_parent_segment` - Parent segment integration


`webhook` and `slack` integrations contain secrets and must be configured manually in the TD console.

## Reference Syntax

Use `@ref(...)` to reference other resources by name:


```yaml
# Reference a knowledge base
target: '@ref(type: "knowledge_base", name: "my-kb")'

# Reference another agent
target: '@ref(type: "agent", name: "my-agent")'

# Reference a prompt
prompt: '@ref(type: "prompt", name: "my-prompt")'
```

This allows resources to be referenced by name instead of UUID, making configurations portable across environments.
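
Because references are plain strings, the references a project uses can be listed with standard tools, for example to check that every referenced resource exists before pushing. The sketch below assumes references are written exactly as in the examples above (same spacing and double quotes); `list_refs` is a hypothetical helper, not a `tdx` command.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: list the distinct @ref(...) references in a project.

# Print each distinct reference found under the directory "$1"
# as "type: name", sorted alphabetically.
list_refs() {
  grep -rhoE '@ref\(type: "[^"]+", name: "[^"]+"\)' "$1" \
    | sed -E 's/@ref\(type: "([^"]+)", name: "([^"]+)"\)/\1: \2/' \
    | sort -u
}

# Example (path is illustrative):
# list_refs ./agents/my-llm-project/
```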

## Commands

### list

List agents in the current project.


```bash
tdx agent list [pattern]
tdx agents [pattern]          # Shorthand alias
```

**Options:**

- `-w, --web`: Show console URLs for each agent


**Examples:**


```bash
# List all agents in current project (uses llm_project context)
tdx agent list
tdx agents                    # Same as above

# Filter agents by pattern
tdx agent list "support-*"
tdx agents "support-*"        # Same as above

# List agents in a specific project
tdx agent list "my-project/support-*"

# Show with console URLs
tdx agent list -w
tdx agents -w                 # Same as above
```

### show

Show detailed information about a specific agent.


```bash
tdx agent show <agent-name>
```

**Examples:**


```bash
# Show agent details
tdx agent show "Support Agent"

# Show agent in JSON format
tdx agent show "Support Agent" --format json
```

### create

Create a new agent in the current project.

**Prefer pull/push Workflow:**

For complex agents with tools, knowledge bases, or long prompts, use the **pull/push workflow** instead:

1. Create a minimal agent with this command
2. Run `tdx agent pull` to export to YAML/Markdown
3. Edit files locally and `tdx agent push` to update


```bash
tdx agent create <name> [options]
```

**Options:**

- `--system-prompt <text>`: System prompt/instructions
- `--model <name>`: Model type (default: `claude-4.5-sonnet`)
- `--starter-message <text>`: Initial greeting message
- `--max-tool-iterations <n>`: Maximum tool iterations (default: 4)
- `--temperature <n>`: Temperature 0.0-2.0 (default: 0.7)


**Examples:**


```bash
# Create basic agent
tdx agent create "My Agent"

# Create with system prompt
tdx agent create "SQL Expert" \
  --system-prompt "You are an expert in SQL and data analysis."

# Create with all options
tdx agent create "Data Analyst" \
  --system-prompt "Help users analyze data." \
  --model "claude-4.5-sonnet" \
  --starter-message "Hello! I can help you analyze your data." \
  --max-tool-iterations 8 \
  --temperature 0.5

# Create agent in a specific project
tdx agent create "MyProject/My Agent"
```

### update

Update an existing agent.

**Prefer pull/push Workflow:**

For significant changes, use the **pull/push workflow** instead:


```bash
tdx agent pull "My Project"    # Export to YAML/Markdown
# Edit files locally
tdx agent push                  # Push changes
```

This provides better version control and allows editing complex prompts in your favorite editor.


```bash
tdx agent update <agent-name> [options]
```

**Options:**

- `--name <text>`: New agent name
- `--prompt <text>`: Agent prompt/instructions
- `--description <text>`: Agent description
- `--starter-message <text>`: Starter message


**Examples:**


```bash
# Update agent name
tdx agent update "Old Name" --name "New Name"

# Update system prompt
tdx agent update "My Agent" --prompt "Updated instructions..."

# Update multiple fields
tdx agent update "My Agent" \
  --description "Updated description" \
  --starter-message "New greeting!"
```

### delete

Delete an agent.


```bash
tdx agent delete <agent-name>
```

**Examples:**


```bash
# Delete an agent
tdx agent delete "My Agent"
```

### pull

Pull agents and resources from an LLM project to local files. Shows a YAML diff preview and asks for confirmation before writing files.


```bash
# Pull from current directory (uses tdx.json or context)
tdx agent pull

# Pull all resources from a project
tdx agent pull <project> [options]

# Pull from an existing local directory
tdx agent pull <local-dir> [options]

# Pull a specific agent
tdx agent pull <project> <agent-name> [options]
```

**Options:**

- `-o, --output <dir>`: Output directory (default: `agents/{project-name}/`)
- `--dry-run`: Preview changes without writing files
- `-f, --force`: Overwrite local changes without confirmation
- `-y, --yes`: Skip confirmation prompts


**Examples:**


```bash
# Pull from current directory (if inside a project folder with tdx.json)
tdx agent pull

# Pull entire project
tdx agent pull "My LLM Project"

# Pull to specific directory
tdx agent pull "My LLM Project" -o ./my-agents

# Pull from existing local directory
tdx agent pull ./agents/my-project/

# Pull specific agent
tdx agent pull "My LLM Project" "Support Agent"

# Preview what would be pulled (no files written)
tdx agent pull "My LLM Project" --dry-run

# Pull without confirmation prompt
tdx agent pull "My LLM Project" --yes
```

**Output:**


```
Pull summary for 'My LLM Project':
  + 4 new | ~ 3 changed | = 13 unchanged
  Agents: 1 new, 2 updated, 5 unchanged
  Knowledge Bases: 1 new, 0 updated, 2 unchanged
  Text Knowledge Bases: 1 new, 0 updated, 1 unchanged
  Prompts: 0 new, 1 updated, 3 unchanged
  Integrations: 1 new, 0 updated, 1 unchanged
  Target: agents/my-llm-project/

Changes to agent 'Support Agent':
────────────────────────────────────────────────────────────────
- temperature: 0.7
+ temperature: 0.5
────────────────────────────────────────────────────────────────

Write 7 files? [y/N]
```

### push

Push local agent files to an LLM project. Shows a YAML diff preview comparing local vs remote, asks for confirmation, and displays the console URL after push.


```bash
# Push all resources (path defaults to the current directory)
tdx agent push [path] [options]

# Push from specific directory
tdx agent push ./agents/my-project/ [options]

# Push specific agent
tdx agent push ./agents/my-project/support-agent/ [options]
```

**Options:**

- `--dry-run`: Preview changes without pushing
- `-f, --force`: Push without confirmation
- `-y, --yes`: Skip confirmation prompts


**Examples:**


```bash
# Push from current directory
tdx agent push

# Push from specific directory
tdx agent push ./agents/my-project/

# Push specific agent
tdx agent push ./agents/my-project/support-agent/

# Preview push changes (no changes made)
tdx agent push --dry-run

# Push without confirmation prompt
tdx agent push --yes
```

**Output:**


```
Push summary for 'My LLM Project':
  + 2 new | ~ 3 changed | = 15 unchanged
  Agents: 0 new, 2 updated, 8 unchanged
  Knowledge Bases: 1 new, 0 updated, 2 unchanged
  Text Knowledge Bases: 1 new, 0 updated, 1 unchanged
  Prompts: 0 new, 0 updated, 2 unchanged
  Integrations: 0 new, 1 updated, 1 unchanged
  Source: agents/my-llm-project/

Changes to agent 'Support Agent':
────────────────────────────────────────────────────────────────
- temperature: 0.5
+ temperature: 0.7
────────────────────────────────────────────────────────────────

Push 5 resources? [y/N]

✔ Pushed 5 resources to 'My LLM Project'
Project: https://console-next.us01.treasuredata.com/app/af/12345/ag
```

When pushing a single agent, the chat URL is shown:


```
✔ Agent updated successfully
Agent: Support Agent
Chat: https://console-next.us01.treasuredata.com/app/af/12345/ag/67890/tc
```

### clone

Clone an LLM project to create a new project with all agents, knowledge bases, prompts, and safe integrations.


```bash
# Clone from remote project
tdx agent clone <project> --name <new-name>

# Clone from local directory
tdx agent clone <local-dir> --name <new-name>

# Clone from context (llm_project)
tdx agent clone --name <new-name>
```

**Options:**

- `-n, --name <name>`: Name for the new project (required)
- `--dry-run`: Preview what would be cloned without making changes
- `-y, --yes`: Skip confirmation prompts
- `--profile <name>`: Target profile for the new project (for cross-profile cloning)


**Examples:**


```bash
# Clone a project to a new name
tdx agent clone "Production Agents" --name "Staging Agents"

# Clone from local files (useful for cross-profile deployment)
tdx agent clone ./agents/my-project/ --name "New Project" --profile production

# Preview what would be cloned
tdx agent clone "My Project" --name "My Project Copy" --dry-run

# Clone without confirmation
tdx agent clone "My Project" --name "My Project Copy" --yes
```

**Output:**


```
Clone "Production Agents" to new project "Staging Agents"? [y/N] y
✔ Project cloned successfully
Source: Production Agents
New project: Staging Agents
New project ID: 12345

Summary:
  Agents: 5 created
  Knowledge Bases: 2 created
  Text Knowledge Bases: 1 created
  Prompts: 3 created
  Integrations: 1 created

Context set: llm_project = Staging Agents
Project: https://console-next.us01.treasuredata.com/app/af/12345/ag
```

**Cross-Profile Cloning:**

To clone a project to a different profile (e.g., from default to production):

1. First pull the project locally: `tdx agent pull "MyProject"`
2. Then clone from local files: `tdx agent clone ./agents/my-project/ --name "MyProject-Prod" --profile production`
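
The two steps above could be wrapped in a single shell function. This is a sketch under assumptions: `promote_project` is a hypothetical name, it uses only flags listed in this document (`-o`, `--yes`, `--name`, `--profile`), and it assumes that `pull -o <dir>` writes the project files directly into `<dir>` so that `clone` can read them from there.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: pull a project, then clone it into another profile.

# Usage: promote_project <project> <new-name> <target-profile>
promote_project() {
  local project=$1 new_name=$2 profile=$3 dir rc
  dir=$(mktemp -d) || return 1

  # Step 1: export the project to a scratch directory.
  # Step 2: create the new project in the target profile from those files.
  tdx agent pull "$project" -o "$dir" --yes &&
    tdx agent clone "$dir" --name "$new_name" --profile "$profile" --yes
  rc=$?

  rm -rf "$dir"   # the scratch copy is no longer needed
  return "$rc"
}

# Example:
# promote_project "MyProject" "MyProject-Prod" production
```

Skipping the prompts with `--yes` suits scripted runs; drop it to review the diff preview interactively.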


**Integration Support in Clone:**

**Safe integrations** (Generic Chat, Agent Console, Parent Segment) are included in clone operations.

**Sensitive integrations** (Slack, Webhook) are **not** cloned because they contain secrets. Configure these manually in the TD console for each environment.

### test

Run automated tests against an agent using YAML test definitions. Tests are evaluated by a judge agent (Claude Sonnet 4.5) for binary pass/fail results.


```bash
# Run tests from current agent directory
tdx agent test

# Run tests from a specific path
tdx agent test <path>

# Run tests from test.yml file
tdx agent test ./agents/my-project/my-agent/test.yml
```

**Options:**

- `-n, --name <name>`: Filter to specific test(s) by name (can be repeated)
- `--tags <tags>`: Filter to tests with specific tags (comma-separated)
- `--dry-run`: Parse and validate test definitions without running
- `--no-eval`: Run conversations without evaluation (useful for debugging)
- `--reeval`: Re-evaluate last test run with updated criteria (skip conversation generation)


**Examples:**


```bash
# Run all tests in current agent directory
tdx agent test

# Run tests from specific agent
tdx agent test ./agents/my-project/my-agent/

# Run only specific tests
tdx agent test --name "greeting_test" --name "context_test"

# Run tests with specific tags
tdx agent test --tags "smoke"
tdx agent test --tags "smoke,regression"

# Validate test file without running
tdx agent test --dry-run

# Run without evaluation (just execute conversations)
tdx agent test --no-eval

# Re-evaluate last test run with updated criteria
tdx agent test --reeval

# Re-evaluate specific tests only
tdx agent test --reeval --name "greeting_test"
```

**Output:**


```
Running tests for my-agent...
Setting up evaluator agent...

Test 1/3: greeting_test
  Round 1: Hello → ✓ PASS

Test 2/3: calculation_test
  Round 1: What is 2+2? → ✓ PASS

Test 3/3: context_test
  Round 1: My name is Alice → ✓ PASS
  Round 2: What's my name? → ✓ PASS

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Results: 3 passed, 0 failed
Duration: 12.3s
```

### Test File Format

Create a `test.yml` file in your agent directory alongside `agent.yml`:


```
agents/
└── my-project/
    └── my-agent/
        ├── agent.yml
        ├── prompt.md
        └── test.yml      # Test definitions
```

#### Flat Format (Single-Round Tests)

For simple tests with a single user input and criteria:


```yaml
tests:
  - name: greeting_test
    tags: [smoke, core]
    user_input: Hello
    criteria: Should respond with a friendly greeting

  - name: calculation_test
    tags: [regression]
    user_input: What is 2 + 2?
    criteria: Should respond with the correct answer (4)

  - name: help_request
    user_input: How do I reset my password?
    criteria: Should provide clear password reset instructions
```

The `tags` field is optional and can be used to categorize tests for filtering with `--tags`.

#### Rounds Format (Multi-Round Tests)

For tests requiring multiple conversation turns:


```yaml
tests:
  - name: context_memory_test
    tags: [memory, core]
    rounds:
      - user_input: My name is Alice
        criteria: Should acknowledge the name
      - user_input: What's my name?
        criteria: Should remember and respond with "Alice"

  - name: multi_step_task
    tags: [workflow, regression]
    rounds:
      - user_input: I want to analyze sales data
        criteria: Should ask clarifying questions about the data
      - user_input: It's in the sales_2024 table
        criteria: Should acknowledge and proceed with analysis
      - user_input: Show me top products
        criteria: Should provide a list of top-selling products
```

#### Mixed Format

You can combine both formats in the same file:


```yaml
tests:
  # Simple single-round tests
  - name: basic_greeting
    user_input: Hi there!
    criteria: Should greet back politely

  - name: simple_question
    user_input: What time is it?
    criteria: Should explain it cannot tell time or ask for context

  # Complex multi-round test
  - name: data_analysis_workflow
    rounds:
      - user_input: I need help with customer segmentation
        criteria: Should ask about the data source and segmentation goals
      - user_input: Use the customers table, segment by purchase frequency
        criteria: Should propose a segmentation approach
      - user_input: Looks good, proceed
        criteria: Should execute the segmentation and show results
```
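
Whichever format you use, the test files can be checked in bulk before committing. The sketch below loops over the folder layout described earlier and runs `tdx agent test <path> --dry-run` for each agent that has a `test.yml`. `validate_all_tests` is a hypothetical helper, and it assumes that `--dry-run` exits non-zero when a definition is invalid.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: validate every test.yml under an agents/ tree.

# Usage: validate_all_tests [root-dir]   (root-dir defaults to "agents")
# Returns non-zero if validation fails for any agent.
validate_all_tests() {
  local root=${1:-agents} file failed=0
  for file in "$root"/*/*/test.yml; do
    [ -f "$file" ] || continue   # the glob matched nothing
    tdx agent test "$(dirname "$file")" --dry-run || failed=1
  done
  return "$failed"
}

# Example:
# validate_all_tests ./agents
```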

### Writing Good Criteria

Criteria should be clear and specific about what constitutes a pass:


```yaml
# ✓ Good - specific and measurable
criteria: Should respond with the number 4

# ✓ Good - describes expected behavior
criteria: Should ask for the customer's email address before proceeding

# ✓ Good - includes negative constraints
criteria: Should provide help without mentioning competitor products

# ✗ Bad - too vague
criteria: Should give a good response

# ✗ Bad - subjective
criteria: Should be helpful and friendly
```

### How Evaluation Works

1. **Test Execution**: Each round sends the `user_input` to your agent and waits for a response
2. **History Retrieval**: After all rounds complete, the full conversation history is fetched from the API
3. **Evaluation**: For each round, the judge agent (Claude Sonnet 4.5) evaluates whether the agent's response meets the `criteria`
4. **Results**: Each round gets a PASS or FAIL result with a reason


The evaluator agent is automatically created in your default project (`tdx_default_<username>`) the first time you run tests.

### Re-evaluating Tests

When iterating on evaluation criteria, you can skip conversation generation and re-evaluate the last test run:

1. **Initial Run**: Execute tests normally to generate conversations

```bash
tdx agent test
```
2. **Edit Criteria**: Modify `test.yml` to refine your criteria
3. **Re-evaluate**: Run with `--reeval` to test new criteria against cached conversations

```bash
tdx agent test --reeval
```


The cached conversations are stored in `.cache/tdx/last_agent_test_run.json` relative to your working directory. The cache includes:

- Chat IDs for conversation URLs
- Agent responses for each round
- Original test metadata


**Handling new tests:** If you add new tests to `test.yml` that weren't in the cached run, they will be executed normally (generating new conversations) while existing tests use the cache.

**Criteria Development Workflow:** Use `--reeval` to rapidly iterate on criteria without waiting for new conversations. Once criteria are stable, run a fresh test to verify end-to-end behavior.
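
One way to script this loop is to re-evaluate only when a cached run exists and fall back to a full run otherwise. This is a minimal sketch: `test_or_reeval` is a hypothetical helper, and it relies only on the cache path given above being relative to the working directory.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: choose between a full run and a re-evaluation.

# Extra arguments (for example --name or --tags) are passed through.
test_or_reeval() {
  local cache=.cache/tdx/last_agent_test_run.json
  if [ -f "$cache" ]; then
    # A previous run is cached: re-evaluate it against the current criteria.
    tdx agent test --reeval "$@"
  else
    # No cache yet: run the conversations to create one.
    tdx agent test "$@"
  fi
}

# Example:
# test_or_reeval --name "greeting_test"
```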

## Workflow

### Initial Setup

1. Pull a project to create local files:

```bash
tdx agent pull "My LLM Project"
```
2. This creates the folder structure with all agents and resources


### Making Changes

1. Edit the YAML/Markdown files with your favorite editor
2. Preview changes:

```bash
tdx agent push --dry-run
```
3. Push changes:

```bash
tdx agent push
```
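
For scripted use, the preview and push steps can be chained so that nothing is pushed if the preview fails. This is a sketch, not an official workflow: `push_with_preview` is a hypothetical helper, and it assumes `tdx agent push --dry-run` exits non-zero on error.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: preview a push, then push without prompting.

# Usage: push_with_preview [path]   (path defaults to the current directory)
push_with_preview() {
  local path=${1:-.}
  tdx agent push "$path" --dry-run || {
    echo "dry run failed; nothing pushed" >&2
    return 1
  }
  tdx agent push "$path" --yes
}

# Example:
# push_with_preview ./agents/my-project/
```

In an interactive session, omit `--yes` so the diff preview and confirmation prompt still appear before anything changes.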


### Testing Agents

1. Create a `test.yml` file in your agent directory:

```yaml
tests:
  - name: basic_functionality
    user_input: Hello, can you help me?
    criteria: Should respond with a helpful greeting
```
2. Run tests:

```bash
tdx agent test
```
3. Review results and iterate on your agent's prompt until tests pass


### Version Control

The YAML/Markdown format is designed for version control:

- Human-readable diffs
- Easy code reviews
- Branch and merge workflows
- Git history for audit trails
- Test definitions versioned alongside agent code


## Comparison with Legacy Commands

| Feature | `agent pull/push/clone` | `llm project backup/restore` (deprecated) |
|  --- | --- | --- |
| Format | YAML/Markdown | JSON |
| Human-editable | Yes | No |
| Git-friendly | Yes | No |
| Selective sync | Yes (single agent) | No (full project) |
| Cross-profile clone | Yes | No |
| Safe integrations | Included (Generic Chat, etc.) | Included |
| Sensitive integrations | Not included (Slack, Webhook) | Included |
| Use case | Development workflow | Legacy disaster recovery |


**Deprecated Commands:**

The `tdx llm project backup/restore` commands are deprecated. Use:

- `tdx agent pull` instead of `backup`
- `tdx agent push` instead of `restore`
- `tdx agent clone` for copying projects