Agents developed by Treasure Data and included in our product capabilities generate responses to inputs (user prompts) based on system prompts and two fundamental technologies: Foundational Models, including Large Language Models (LLMs), and Retrieval Augmented Generation (RAG), which grounds answers in specific customer data.
AI Agents always attempt to satisfy the system and user prompts, within the system parameters (such as the specific LLM model being used and its “temperature”) with which they have been programmed and which aim to constrain them to respond in a particular manner or within certain boundaries. Another important parameter is the knowledge base (if any) that the Agent is instructed to use when producing its output.
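As an illustration only, the constraining parameters described above can be sketched as a configuration object. The names below are hypothetical and do not reflect Treasure Data's actual API; they simply show how a model choice, temperature, system prompt, and knowledge base fit together:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AgentConfig:
    """Hypothetical sketch of the parameters that constrain an agent's output."""
    model: str                              # the specific LLM in use (placeholder ID below)
    temperature: float = 0.0                # lower values yield more deterministic responses
    system_prompt: str = ""                 # instructions bounding how the agent may respond
    knowledge_base: Optional[str] = None    # dataset the agent is instructed to ground on

# Example configuration (all values illustrative)
config = AgentConfig(
    model="claude-example-model",
    temperature=0.2,
    system_prompt="Answer only using the connected customer dataset.",
    knowledge_base="customer_events",
)
```

In this sketch, leaving `knowledge_base` as `None` would correspond to an agent that is not instructed to consult any knowledge base at all.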
These knowledge and context “sources” are then used to generate an output which, depending on how the agents are configured, could be analyses, suggestions, or additional queries or code to be executed by the user or another agent. These outputs may or may not be tailored to the individual user, based on the system prompts set up in the agents, the context the user provides in the user prompt, and the connected data that powers the agents.
The Large Language Models (LLMs) that Treasure Data’s Agents are built on are trained by Anthropic on general knowledge. These models function similarly to those on Anthropic’s Claude.ai website. For more information on how they work, please see this research: https://www.anthropic.com/research/mapping-mind-language-model
For Treasure Data’s purposes, no additional pre-training or post-training/fine-tuning has occurred or will occur with respect to this LLM. This is an important part of Treasure Data’s commitment to data privacy and governance: it means that data from multiple sources is neither commingled nor accessible across different customers. To reemphasize, Treasure Data does not apply any pre- or post-training using any type of customer data; we use only secure, out-of-the-box LLM models.
Put another way, the LLM that Treasure Data uses does not have customer data integrated within the model itself. Instead, Treasure Data’s agents access and use customer data through a tool setup that uses a Retrieval Augmented Generation (RAG) architecture, combined with context about the environment and its objectives supplied through system prompting. RAG “grounds” the answers that the Agent provides in actual customer data in Treasure Data. All RAG executions happen inside Treasure Data’s environment. The Treasure Data Agents convert natural-language queries and requests into computer-language queries that are then executed against the customer dataset, just as if a human were to write the query and run it themselves. The out-of-the-box system prompts used in the agents contain general instructions and do not include any proprietary customer information.
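A minimal sketch of the RAG pattern described above, assuming a toy in-memory dataset and a naive keyword-overlap retriever (both stand-ins for Treasure Data's actual retrieval and query machinery): relevant customer records are fetched at query time and inserted into the prompt, so the data is never trained into the model itself.

```python
def retrieve(query: str, dataset: list[str], top_k: int = 2) -> list[str]:
    """Rank records by keyword overlap with the query (stand-in for real retrieval)."""
    q_terms = set(query.lower().split())
    scored = sorted(
        dataset,
        key=lambda rec: len(q_terms & set(rec.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_grounded_prompt(query: str, dataset: list[str]) -> str:
    """Combine retrieved records with the user query; the LLM never stores the data."""
    context = "\n".join(retrieve(query, dataset))
    return f"Context (customer data):\n{context}\n\nQuestion: {query}"

# Toy dataset; in practice, queries execute against the customer dataset
# inside the customer's Treasure Data environment.
records = [
    "customer 42 purchased sneakers in March",
    "customer 7 opened the spring email campaign",
    "customer 42 abandoned a cart in April",
]
prompt = build_grounded_prompt("What did customer 42 purchase?", records)
```

The grounding context lives only in the prompt for that request; discarding the prompt leaves nothing of the customer data behind in the model.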
When interacting with the Agent, you can see the supervisor-agent-to-subagent-to-supervisor-agent communication, queries, and chain of thought being executed, for your validation and explainability. If you wish, this allows you to run the queries yourself and audit the agent’s work on the spot.
Information that the Agent has accessed or derived is not shared across Treasure Data accounts and is not accessible within a Treasure Data instance outside that individual chat session, but all interactions with agents are auditable and traceable by security professionals and administrators with the appropriate levels of access. Finally, no logging or telemetry is sent to the model providers.
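To make the session-scoped audit trail concrete, here is one illustrative way such supervisor/subagent interactions could be recorded. The schema is hypothetical and is not Treasure Data's actual logging format:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentTraceEvent:
    """One step in a supervisor/subagent exchange, retained for audit within a session."""
    session_id: str
    source: str    # e.g. "supervisor" or "subagent"
    target: str
    kind: str      # "prompt", "query", or "response"
    content: str

trace: list[AgentTraceEvent] = []

def log_event(event: AgentTraceEvent) -> None:
    """Append to the session-scoped trace; nothing is sent to the model provider."""
    trace.append(event)

log_event(AgentTraceEvent(
    session_id="session-1",
    source="supervisor",
    target="subagent",
    kind="query",
    content="SELECT count(*) FROM events",
))
```

Because each event carries a `session_id`, an administrator with appropriate access could filter the trail to a single chat session without exposing events from any other session.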
If you want to understand why a Treasure Data agent made a decision or provided a particular response in a session, you can often ask the agent directly, and it will explain the reasoning behind its decision or answer. This can be further validated by referring to the supervisor-agent-to-subagent-to-supervisor-agent communication described above.