Developer Blog

Technical articles, best practices, and engineering insights from the Treasure Data developer team.

Leveraging Query ReExecution for Smooth Hive 4 Migration
Ryu Kobayashi, Shohei Okumiya | July 18, 2025

How Apache Hive's Query ReExecution feature recovers failing queries and enables smooth migration to Hive 4.

Hive, Open source
Orchestrate dbt with Treasure Workflow Episode 2
Data Team (Ansel Lin, Satoshi Akama) | July 1, 2024

Advanced dbt practices with Treasure Workflow including node selection, batch materialization, and data mesh.

dbt, Workflow
Upcoming Evolution of Treasure Data Query Engines
Toru Takahashi | May 14, 2024

Migration from Presto to Trino and upgrade from Hive 2 to Hive 4 with performance improvements.

trino, Hive, presto
Journey to Containers in Core Services Worker Platform
Worker Team (Johan Gustavsson, Kwangshin Oh, Ryo Wada, Takashi Kurihara, and You Yamagata) | March 30, 2024

Evolution of Worker Platform from stateful processes to container-based architecture.

Containers, Worker Platform
Automatic Customer Segmentation with Machine Learning
Yish Lim | February 1, 2024

Auto-Segmentation using K-Means clustering with feature prioritization and Shapley Values.

segmentation, machine learning
Testing Distributed Components of Storage Engine
Serhii Himadieiev | January 1, 2024

Asynchronous test executor architecture using SQS, DynamoDB, and S3 for distributed storage testing.

system test, plazma
Leveraging feedback is a skill!
Gary Lucas | December 1, 2023

Techniques for receiving and incorporating feedback effectively for career growth.

communication
Orchestrate dbt with Treasure Workflow
KuoHuei (Ansel) Lin | November 1, 2023

Integration of dbt Core with Treasure Workflow including setup, containerization, and deployment.

dbt, Workflow
The Zero Bug Policy
Tom Walsh | October 1, 2023

Fix important bugs immediately or close them — a practical bug management technique.

Integrating Kafka with Treasure Data
Biswadip Paul | September 1, 2023

Leverage Kafka with HTTP Sink Connector to connect to the Treasure Data CDP.

real-time, kafka, ingest
Visual Studio Code extension for Treasure Data
Toru Takahashi | August 1, 2023

Boost Your Data Analysis Workflow with TD Query Tool for VS Code.

developer-tool, vscode, presto, Hive
Hive Table scan optimization
Shohei Okumiya (@okumin) | July 1, 2023

20-30% performance improvements through parallelized S3 I/O optimization.

Hive
Continuous Deployment of Treasure Workflow with Azure DevOps
Toru Takahashi | June 16, 2023

Repository setup, Azure Pipeline configuration, and deployment procedures for TD Workflow.

Workflow, cicd, azure
How to prepare simple test data for Hive and Presto
Kazuki Ito | June 1, 2023

Techniques for preparing test data without creating test tables using single-row and multi-row queries.

Best Practice, Hive, presto
Debugging unexpected _1 column's on data connector import
Kohki | May 1, 2023

Why unexpected _1 columns appear in data connector imports and how to fix them.

Best Practice, Debug
Embulk in TD, and in the future
Dai Mikurube | April 1, 2023

History of Embulk in Treasure Data, technical challenges, and open-source strategy lessons.

Embulk, Open source
Implementing the Hive Distributed Profiling System
Shohei Okumiya (@okumin) | March 1, 2023

Using Java Flight Recorder and d3-flame-graph for distributed Hive performance analysis.

Hive
#TDTechTalk : 5 challenges in CDP
TATSUNO "Taz" Yasuhiro | February 1, 2023

The first TD in-person meet-up in three years.

meetup
Fuzzy Matching
Austin | January 1, 2023

Fuzzy matching techniques using RLIKE, Levenshtein algorithm, and SOUNDEX with SQL examples.

data engineering, id unification