Treasure Machine Learning (ML) is based on Apache Hivemall, a scalable machine learning library that runs on Apache Hive. Hivemall is designed to scale with both the number of training instances and the number of features. For more information, see the Hivemall User Guide.

Supported Algorithms

Hivemall provides machine learning and feature engineering functionality through Hive user-defined functions (UDFs), user-defined aggregate functions (UDAFs), and user-defined table-generating functions (UDTFs).

Classification

  • Perceptron

  • Passive Aggressive (PA, PA1, PA2)

  • Confidence Weighted (CW)

  • Adaptive Regularization of Weight Vectors (AROW)

  • Soft Confidence Weighted (SCW1, SCW2)

  • AdaGradRDA (with hinge loss)

  • RandomForest

  • Factorization Machines

Regression

  • Logistic Regression using Stochastic Gradient Descent

  • AdaGrad / AdaDelta (with logistic loss)

  • Passive Aggressive Regression (PA1, PA2)

  • AROW regression

  • RandomForest

  • Factorization Machines


k-Nearest Neighbor

  • MinHash (LSH with Jaccard index)

  • b-Bit MinHash

  • Brute-force search using cosine similarity
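The two k-nearest-neighbor approaches above can be illustrated with a short Python sketch. It is not Hivemall's implementation: seeded SHA-1 hashes stand in for the random permutations MinHash relies on, sparse vectors are assumed to be plain `{feature: weight}` dictionaries, and every function name here is hypothetical.

```python
import hashlib
import math
from typing import Dict, List, Set


def _seeded_hash(seed: int, item: str) -> int:
    """64-bit hash of item under a given seed (stands in for one random permutation)."""
    digest = hashlib.sha1(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(items: Set[str], num_hashes: int = 128) -> List[int]:
    """For each seed, keep the minimum hash value over all items in the set."""
    return [min(_seeded_hash(seed, item) for item in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a: List[int], sig_b: List[int]) -> float:
    """The fraction of agreeing signature slots estimates |A & B| / |A | B|."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Exact Jaccard index, for comparison with the MinHash estimate."""
    return len(a & b) / len(a | b)


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors stored as {feature: weight}."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def brute_force_knn(query: Dict[str, float],
                    candidates: Dict[str, Dict[str, float]],
                    k: int) -> List[str]:
    """Score every candidate against the query and return the ids of the k most similar."""
    scored = [(cosine(query, vec), cid) for cid, vec in candidates.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # highest similarity first, ties by id
    return [cid for _, cid in scored[:k]]
```

For example, `jaccard({"a", "b", "c"}, {"b", "c", "d"})` is 0.5, and the estimate from two 128-slot signatures should land close to it. b-Bit MinHash would store only the lowest b bits of each signature slot to save space, which requires a corrected estimator not shown here.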

Feature Engineering

  • Feature hashing (MurmurHash, SHA1)

  • Feature scaling (Min-Max Normalization, Z-Score)

  • Feature instance amplifier that reduces the number of training iterations

  • TF-IDF vectorizer
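The formulas behind feature scaling, feature hashing, and TF-IDF are compact enough to sketch directly. The Python below is an illustration under stated assumptions, not Hivemall's code: it hashes with SHA-1 from the standard library (MurmurHash would need a third-party package), the bucket count `num_buckets` is a hypothetical default, and the TF-IDF weighting shown is one common variant whose smoothing may differ from Hivemall's.

```python
import hashlib
import math


def min_max_normalize(x: float, min_value: float, max_value: float) -> float:
    """Min-Max normalization: map x from [min_value, max_value] onto [0, 1]."""
    if max_value == min_value:
        return 0.0  # constant feature: nothing to rescale (a convention chosen for this sketch)
    return (x - min_value) / (max_value - min_value)


def zscore(x: float, mean: float, stddev: float) -> float:
    """Z-Score standardization: distance from the mean in units of standard deviation."""
    if stddev == 0.0:
        return 0.0
    return (x - mean) / stddev


def hash_feature(name: str, num_buckets: int = 2 ** 20) -> int:
    """Feature hashing: map an arbitrary feature name to a bucket index in [0, num_buckets)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_buckets


def tf_idf(term_freq: float, doc_freq: int, num_docs: int) -> float:
    """One common TF-IDF variant: tf * ln(N / df). Other smoothings exist."""
    return term_freq * math.log(num_docs / doc_freq)
```

Feature hashing trades a small risk of collisions for a fixed-size feature space, so a model never needs a dictionary of every feature name it has seen.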

Hivemall Evaluation UDFs

Evaluation UDFs are useful for evaluating the accuracy of your machine learning model.

Binary Classification Metrics


auc(probability, truth_label)
fmeasure(truth_label, predicted_label)


See the Hivemall user guide.
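To make the two binary classification signatures concrete, here is a minimal Python sketch of what each metric computes. It is illustrative only, not Hivemall's implementation: the function names mirror the UDF names for readability, labels are assumed to be 1 (positive) and 0 (negative), and `beta` is a hypothetical parameter that defaults to the F1 score.

```python
from typing import Sequence


def auc(probabilities: Sequence[float], truth_labels: Sequence[int]) -> float:
    """ROC AUC: probability that a random positive is scored above a random negative.

    Ties count as half a correct ordering. This O(P * N) pairwise form favors clarity.
    """
    pos = [p for p, y in zip(probabilities, truth_labels) if y == 1]
    neg = [p for p, y in zip(probabilities, truth_labels) if y != 1]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    score = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                score += 1.0
            elif p == n:
                score += 0.5
    return score / (len(pos) * len(neg))


def fmeasure(truth_labels: Sequence[int], predicted_labels: Sequence[int],
             beta: float = 1.0) -> float:
    """F-measure: weighted harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(truth_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t != 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p != 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

For example, with probabilities `[0.9, 0.8, 0.3, 0.1]` and labels `[1, 0, 1, 0]`, three of the four positive/negative pairs are ordered correctly, so `auc` returns 0.75.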

Ranking Measures


auc(recommend_list, truth_list, k)
precision_at(recommend_list, truth_list, k)
recall_at(recommend_list, truth_list, k)
mrr(recommend_list, truth_list, k)
average_precision(recommend_list, truth_list, k)
hitrate(recommend_list, truth_list, k)


See the Hivemall user guide.
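The ranking measures all compare the top `k` entries of a recommendation list against a set of relevant items. The Python sketch below shows one common definition of each; it is not Hivemall's implementation, and normalization conventions (particularly for average precision, divided here by the number of hits in the top `k`) differ between libraries. The ranking variant of `auc` is omitted.

```python
from typing import Hashable, Sequence


def _hits(recommend_list: Sequence[Hashable], truth_list: Sequence[Hashable], k: int) -> int:
    """Number of relevant items among the top-k recommendations."""
    truth = set(truth_list)
    return sum(1 for item in recommend_list[:k] if item in truth)


def precision_at(recommend_list, truth_list, k: int) -> float:
    """Share of the top-k recommendations that are relevant (assumes k >= 1)."""
    return _hits(recommend_list, truth_list, k) / k


def recall_at(recommend_list, truth_list, k: int) -> float:
    """Share of all relevant items that appear in the top k."""
    return _hits(recommend_list, truth_list, k) / len(set(truth_list))


def hitrate(recommend_list, truth_list, k: int) -> float:
    """1.0 if at least one relevant item appears in the top k, else 0.0."""
    return 1.0 if _hits(recommend_list, truth_list, k) > 0 else 0.0


def mrr(recommend_list, truth_list, k: int) -> float:
    """Reciprocal rank of the first relevant item in the top k (0.0 if none)."""
    truth = set(truth_list)
    for rank, item in enumerate(recommend_list[:k], start=1):
        if item in truth:
            return 1.0 / rank
    return 0.0


def average_precision(recommend_list, truth_list, k: int) -> float:
    """Mean of precision@rank taken at each rank where a relevant item appears."""
    truth = set(truth_list)
    hits, total = 0, 0.0
    for rank, item in enumerate(recommend_list[:k], start=1):
        if item in truth:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0
```

For example, with `recommend_list = ["a", "b", "c", "d", "e"]`, `truth_list = ["b", "d", "f"]`, and `k = 4`, two relevant items appear at ranks 2 and 4, so precision@4 is 0.5, recall@4 is 2/3, and MRR is 0.5.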