Treasure Machine Learning (ML) is based on Apache Hivemall, a scalable machine learning library that runs on Apache Hive. Hivemall is designed to scale with both the number of training instances and the number of features. For more information, see the Hivemall User Guide.
Hivemall provides machine learning and feature engineering functionality through Hive user-defined functions (UDFs), user-defined aggregate functions (UDAFs), and user-defined table-generating functions (UDTFs).
Perceptron
Passive Aggressive (PA, PA1, PA2)
Confidence Weighted (CW)
Adaptive Regularization of Weight Vectors (AROW)
Soft Confidence Weighted (SCW1, SCW2)
AdaGradRDA (with hinge loss)
RandomForest
Factorization Machines
Logistic Regression using Stochastic Gradient Descent
AdaGrad / AdaDelta (with logistic loss)
Passive Aggressive Regression (PA1, PA2)
AROW regression
RandomForest
Factorization Machines
Example: MovieLens rating prediction
Matrix Factorization (SGD, AdaGrad)
Example: MovieLens rating prediction
Minhash (LSH with Jaccard index)
Minhash (LSH with Jaccard index)
b-Bit minhash
Brute-force search using cosine similarity
Feature hashing (MurmurHash, SHA1)
Feature scaling (Min-Max Normalization, Z-Score)
Feature instance amplifier, which reduces the number of training iterations
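The feature scaling and feature hashing items above can be illustrated outside of Hive. The following is a minimal Python sketch of the underlying arithmetic, not Hivemall's implementation: the function names, the use of SHA-1 for hashing, and the bucket count of 2^20 are assumptions chosen for illustration only.

```python
import hashlib
from statistics import mean, pstdev


def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Center values on their mean and divide by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def hash_feature(name, num_buckets=1 << 20):
    """Map a feature name to a bucket index in [0, num_buckets).

    SHA-1 is used here because it ships with the Python standard library;
    the bucket count is an arbitrary illustrative choice.
    """
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


if __name__ == "__main__":
    ages = [20, 30, 40]
    print(min_max_normalize(ages))
    print(z_score(ages))
    print(hash_feature("country=JP"))
```

Hashing lets a model use a fixed-size weight vector regardless of how many distinct feature names appear in the data, at the cost of occasional collisions between unrelated features.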
Evaluation UDFs are useful for evaluating the accuracy of your machine learning model.
auc(probability, truth_label)
fmeasure(truth_label, predicted_label)
See the Hivemall User Guide.
auc(recommend_list, truth_list, k)
precision_at(recommend_list, truth_list, k)
recall_at(recommend_list, truth_list, k)
mrr(recommend_list, truth_list, k)
average_precision(recommend_list, truth_list, k)
hitrate(recommend_list, truth_list, k)
See the Hivemall User Guide.
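To make the meaning of several of these metrics concrete, here is a minimal Python sketch using their common textbook definitions. The function names and argument order mirror the UDF signatures listed above, but the bodies are assumptions: Hivemall's handling of edge cases (for example, an empty list, or k larger than the list) may differ, so treat the Hivemall User Guide as authoritative.

```python
def fmeasure(truth_labels, predicted_labels):
    """F1 score for binary labels, where 1 is positive and 0 is negative."""
    pairs = list(zip(truth_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def _hits(recommend_list, truth_list, k):
    """Count how many of the top-k recommended items are relevant."""
    truth = set(truth_list)
    return sum(1 for item in recommend_list[:k] if item in truth)


def precision_at(recommend_list, truth_list, k):
    """Fraction of the top-k recommendations that are relevant (divides by k)."""
    return _hits(recommend_list, truth_list, k) / k if k > 0 else 0.0


def recall_at(recommend_list, truth_list, k):
    """Fraction of the relevant items that appear in the top-k recommendations."""
    n_truth = len(set(truth_list))
    return _hits(recommend_list, truth_list, k) / n_truth if n_truth else 0.0


def mrr(recommend_list, truth_list, k):
    """Reciprocal rank of the first relevant item in the top k, or 0 if none."""
    truth = set(truth_list)
    for rank, item in enumerate(recommend_list[:k], start=1):
        if item in truth:
            return 1.0 / rank
    return 0.0


def hitrate(recommend_list, truth_list, k):
    """1.0 if at least one top-k recommendation is relevant, otherwise 0.0."""
    return 1.0 if _hits(recommend_list, truth_list, k) > 0 else 0.0


if __name__ == "__main__":
    recommended = ["a", "b", "c", "d"]
    relevant = ["b", "d", "e"]
    print(precision_at(recommended, relevant, 3))
    print(recall_at(recommended, relevant, 3))
    print(mrr(recommended, relevant, 3))
    print(hitrate(recommended, relevant, 3))
```

The ranking functions take the recommended list in ranked order (best first), which is why only the first k items are inspected.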