Data Cleaning

After importing data into Treasure Data, you may see unexpected or incorrect data in the results. Sometimes you can correct this by modifying your import settings. Other times you will need additional processing to clean up your data. Some common cleanup activities are:

  • Deduplication: removing duplicate rows or columns from the Treasure Data copies of your databases.
  • Data Normalization: standardizing how similar kinds of data are represented in Treasure Data. For example, a telephone number might be imported as 555-567-8911, as 5555678911, or as 00 1 555 567 8911. You will want a single representation for telephone numbers across all your databases, especially if you plan to use them as a key or for ID unification.
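The following is a minimal sketch of row deduplication, written in plain Python only to illustrate the logic. Inside Treasure Data you would more likely express this as a SQL query. The function name `dedupe_rows` and the sample rows are hypothetical. It keeps the first occurrence of each row. It treats rows as duplicates either when every column matches or, if `key_columns` is given, when only those columns match. It assumes column values are hashable (strings, numbers, and so on).

```python
from typing import Any, Dict, Iterable, List, Optional, Sequence

Row = Dict[str, Any]


def dedupe_rows(rows: Iterable[Row], key_columns: Optional[Sequence[str]] = None) -> List[Row]:
    """Remove duplicate rows, keeping the first occurrence (hypothetical helper).

    key_columns=None: rows are duplicates only if all columns and values match.
    key_columns given: rows are compared on those columns only.
    """
    seen = set()
    result: List[Row] = []
    for row in rows:
        # Compare on every column (sorted for a stable order) unless keys are specified.
        cols = key_columns if key_columns is not None else sorted(row)
        key = tuple((c, row.get(c)) for c in cols)
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


# Hypothetical sample data: the first two rows are exact duplicates.
rows = [
    {"id": 1, "phone": "+15555678911"},
    {"id": 1, "phone": "+15555678911"},
    {"id": 2, "phone": "+15555678911"},
]

print(dedupe_rows(rows))           # full-row comparison keeps ids 1 and 2
print(dedupe_rows(rows, ["phone"]))  # key comparison keeps only the first row
```

Comparing on a key column only works reliably after that column has been normalized. Otherwise two spellings of the same phone number will not be recognized as duplicates.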
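Normalizing the telephone numbers from the example above can be sketched as follows. This assumes every number is a North American (NANP, country code 1) number and that E.164 form (`+1` followed by 10 digits) is the chosen standard. Both are assumptions made for illustration. The function `normalize_us_phone` is hypothetical, not a Treasure Data API. Inputs it cannot interpret return `None`, so they can be reviewed instead of silently stored.

```python
import re
from typing import Optional


def normalize_us_phone(raw: str) -> Optional[str]:
    """Normalize a North American phone number to E.164 form (+1XXXXXXXXXX).

    Hypothetical helper; assumes NANP numbers only and returns None otherwise.
    """
    digits = re.sub(r"\D", "", raw)  # keep digits only: drops spaces, dashes, parentheses, "+"
    if digits.startswith("00"):
        digits = digits[2:]  # "00" is the international dialing prefix in many countries
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # strip the country code 1
    if len(digits) != 10:
        return None  # not a 10-digit NANP number; leave for manual review
    return "+1" + digits


samples = ["555-567-8911", "5555678911", "00 1 555 567 8911"]
for s in samples:
    print(s, "->", normalize_us_phone(s))  # all three map to +15555678911
```

Storing one canonical form makes the column usable as a key: the three variants from the example all become the same value.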