Developer Blog

Technical articles, best practices, and engineering insights from the Treasure Data developer team.

Ryu Kobayashi, Shohei Okumiya · July 18, 2025

Leveraging Query ReExecution for Smooth Hive 4 Migration

How Apache Hive's Query ReExecution feature recovers failing queries and enables smooth migration to Hive 4.

Hive, Open source

Data Team (Ansel Lin, Satoshi Akama) · July 1, 2024

Orchestrate dbt with Treasure Workflow Episode 2

Advanced dbt practices with Treasure Workflow including node selection, batch materialization, and data mesh.

dbt, Workflow

Toru Takahashi · May 14, 2024

Upcoming Evolution of Treasure Data Query Engines

Migration from Presto to Trino and upgrade from Hive 2 to Hive 4 with performance improvements.

trino, Hive, presto

Worker Team (Johan Gustavsson, Kwangshin Oh, Ryo Wada, Takashi Kurihara, and You Yamagata) · March 30, 2024

Journey to Containers in Core Services Worker Platform

Evolution of Worker Platform from stateful processes to container-based architecture.

Containers, Worker Platform

Yish Lim · February 1, 2024

Automatic Customer Segmentation with Machine Learning

Auto-Segmentation using K-Means clustering with feature prioritization and Shapley Values.

segmentation, machine learning

Serhii Himadieiev · January 1, 2024

Testing Distributed Components of Storage Engine

Asynchronous test executor architecture using SQS, DynamoDB, and S3 for distributed storage testing.

system test, plazma

Gary Lucas · December 1, 2023

Leveraging feedback is a skill!

Techniques for receiving and incorporating feedback effectively for career growth.

communication

KuoHuei (Ansel) Lin · November 1, 2023

Orchestrate dbt with Treasure Workflow

Integration of dbt Core with Treasure Workflow including setup, containerization, and deployment.

dbt, Workflow

Tom Walsh · October 1, 2023

The Zero Bug Policy

Fix important bugs immediately or close them: a practical bug management technique.

Biswadip Paul · September 1, 2023

Integrating Kafka with Treasure Data

Leverage Kafka with HTTP Sink Connector to connect to the Treasure Data CDP.

real-time, kafka, ingest

Toru Takahashi · August 1, 2023

Visual Studio Code extension for Treasure Data

Boost Your Data Analysis Workflow with TD Query Tool for VS Code.

developer-tool, vscode, presto, Hive

Shohei Okumiya (@okumin) · July 1, 2023

Hive Table scan optimization

20-30% performance improvements through parallelized S3 I/O optimization.

Hive

Toru Takahashi · June 16, 2023

Continuous Deployment of Treasure Workflow with Azure DevOps

Repository setup, Azure Pipeline configuration, and deployment procedures for TD Workflow.

Workflow, cicd, azure

Kazuki Ito · June 1, 2023

How to prepare simple test data for Hive and Presto

Techniques for preparing test data without creating test tables using single-row and multi-row queries.

Best Practice, Hive, presto

Kohki · May 1, 2023

Debugging unexpected _1 columns on data connector import

Why unexpected _1 columns appear in data connector imports and how to fix them.

Best Practice, Debug

Dai Mikurube · April 1, 2023

Embulk in TD, and in the future

History of Embulk in Treasure Data, technical challenges, and open-source strategy lessons.

Embulk, Open source

Shohei Okumiya (@okumin) · March 1, 2023

Implementing the Hive Distributed Profiling System

Using Java Flight Recorder and d3-flame-graph for distributed Hive performance analysis.

Hive

TATSUNO "Taz" Yasuhiro · February 1, 2023

#TDTechTalk: 5 challenges in CDP

The first TD in-person meet-up in three years.

meetup

Austin · January 1, 2023

Fuzzy Matching

Fuzzy matching techniques using RLIKE, Levenshtein algorithm, and SOUNDEX with SQL examples.

data engineering, id unification