AutoML (Automated Machine Learning) automates parts of the end‑to‑end ML process, broadening usage beyond ML experts and accelerating delivery of high‑quality models.
AutoML automates key sub‑tasks:
- Data pre-processing and cleaning
- Exploratory Data Analysis (EDA)
- Feature engineering
- Model selection & training
- Model evaluation
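To make these sub-tasks concrete, here is a minimal, standard-library-only sketch of the kind of loop an AutoML system runs: clean the data, derive a feature, train several candidate models, evaluate each on held-out data, and keep the best one. It is an illustration only, not Treasure Data's implementation; every name in it (`preprocess`, `add_features`, `MeanModel`, `LinearModel`, `auto_ml`) is hypothetical, and real AutoML libraries search far larger model and hyperparameter spaces.

```python
import random
import statistics


def preprocess(rows):
    """Data cleaning stand-in: drop rows that contain missing values."""
    return [r for r in rows if None not in r]


def add_features(rows):
    """Feature engineering stand-in: add x squared as a second feature."""
    return [(x, x * x, y) for x, y in rows]


class MeanModel:
    """Baseline candidate: always predicts the mean training target."""

    def fit(self, X, y):
        self.mean = statistics.fmean(y)
        return self

    def predict(self, X):
        return [self.mean for _ in X]


class LinearModel:
    """Candidate: least-squares line fitted on one feature column."""

    def __init__(self, col):
        self.col = col

    def fit(self, X, y):
        xs = [row[self.col] for row in X]
        mx, my = statistics.fmean(xs), statistics.fmean(y)
        var = sum((x - mx) ** 2 for x in xs)
        cov = sum((x - mx) * (t - my) for x, t in zip(xs, y))
        self.slope = cov / var if var else 0.0
        self.intercept = my - self.slope * mx
        return self

    def predict(self, X):
        return [self.intercept + self.slope * row[self.col] for row in X]


def mse(y_true, y_pred):
    """Model evaluation: mean squared error on held-out data."""
    return statistics.fmean([(a - b) ** 2 for a, b in zip(y_true, y_pred)])


def auto_ml(rows, seed=0):
    """Run the whole loop and return (best_model_name, scores_by_name)."""
    rows = add_features(preprocess(rows))
    random.Random(seed).shuffle(rows)  # deterministic train/test split
    split = int(len(rows) * 0.8)
    train, test = rows[:split], rows[split:]
    X_train, y_train = [r[:-1] for r in train], [r[-1] for r in train]
    X_test, y_test = [r[:-1] for r in test], [r[-1] for r in test]

    # Model selection: train every candidate, score it, keep the lowest error.
    candidates = {
        "mean": MeanModel(),
        "linear_x": LinearModel(0),
        "linear_x2": LinearModel(1),
    }
    scores = {
        name: mse(y_test, model.fit(X_train, y_train).predict(X_test))
        for name, model in candidates.items()
    }
    return min(scores, key=scores.get), scores


# Toy data (made up for this sketch): y = x^2, plus one row with a missing target.
rows = [(float(x), float(x * x)) for x in range(20)] + [(5.0, None)]
best, scores = auto_ml(rows)
print(best, scores)
```

Because the toy target is exactly quadratic, the candidate fitted on the engineered `x * x` feature has near-zero held-out error and is the one selected. That shows why feature engineering and model selection are evaluated together and not in isolation.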
The following image illustrates these sub‑tasks (green dotted line box):
Treasure Data (TD) provides AutoML as a feature that is configured within the Treasure Workflow environment. A set of workflow operators configures and activates the AutoML processes, so the existing Workflow capabilities for workflow management, permission management, scheduling, execution, notifications, and logs apply to AutoML tasks as well.
When executed, each AutoML operator creates a segregated execution environment that runs a specific Python notebook and outputs the resulting models and database tables. The notebooks are prepared and maintained by Treasure Data.
Because each AutoML task runs in its own isolated environment, it can complete securely and efficiently without interruption from other tasks while drawing on the resources of the underlying cloud execution environment. TD offers several packages for provisioning cluster resources, so customers can scale capacity to match their speed and data size requirements.
After the workflow execution is complete, the workflow environment links to the corresponding log files and notebooks, enabling users to proceed with analysis and validation of the execution results. In this way, the AutoML framework provides transparency and visibility, allowing technical users (Data Scientists or ML engineers) to inspect the technical details of the AutoML model and its outputs.
Key benefits of AutoML in Treasure Data:
- Executes inside the secure Treasure Data environment with direct access to existing clean, unified, and enriched data.
- Integrates directly with existing Treasure Data components, so all data processing can remain within the environment without external tools or processing.
- Empowers teams to make data-driven decisions by letting them quickly build, train, and deploy ML models, with simplified configuration and optimization of the algorithms in the AutoML libraries.
- Reduces the time needed to create ML models, so technical users can focus on creating business value.
- Reduces or avoids dependency on technical teams or external vendors to build and use powerful ML models.
- Provides a dedicated execution environment with access to higher-capacity, higher-performance resources to support more complex machine learning tasks.