# Sentiment Analysis with TensorFlow

Treasure Data Workflow provides an easy way to leverage custom Python scripts for sentiment analysis with TensorFlow and to export the resulting model to Amazon S3. Machine learning algorithms can be run as part of your scheduled workflows using custom Python scripts. This article introduces the steps to run a sentiment analysis algorithm within a Treasure Data Workflow. The algorithm classifies movie-review text as positive or negative using [TensorFlow](https://www.tensorflow.org/) and [TensorFlow Hub](https://tfhub.dev/). See [the official document](https://www.tensorflow.org/hub/tutorials/text_classification_with_tf_hub) for background.

## Sentiment Analysis Using Custom Python Scripts

There are two versions of the workflow discussed in this article:

* Example workflow using TensorFlow with Amazon S3
* Example workflow using TensorFlow without Amazon S3

### Example Workflow using TensorFlow with Amazon S3

The workflow:

* Fetches review data from Treasure Data
* Builds a model with TensorFlow
* Stores the model on S3
* Predicts polarities for unknown review data and writes them back to Treasure Data

#### Prerequisites

* The custom scripts feature is enabled for your TD account.
* The TD Toolbelt and the TD Toolbelt Workflow module are downloaded and installed.
* Basic knowledge of Treasure Data Workflow syntax
* An AWS S3 bucket
* S3 secrets (access key ID and secret access key)

#### Run the Example Workflow

1. Download the sentiment-analysis project from this [repository](https://github.com/treasure-data/treasure-boxes/tree/master/machine-learning-box/sentiment-analysis).
2. In a terminal window, change directory to sentiment-analysis.
3. Run *data.sh* to ingest training and test data into Treasure Data. About 80 million records are fetched to build the model. The script also creates a database named *sentiment* and tables named *movie_review_train* and *movie_review_test* to store the data.
    For example:

    ```bash
    $ ./data.sh
    ```

    Assume that the input table is:

    | rowid | sentence | sentiment | polarity |
    | --- | --- | --- | --- |
    | 1-10531 | "Bela Lugosi revels in his role as European horticulturist (sic) Dr. Lorenz in this outlandish... | 2 | 0 |
    | 1-10960 | Fragmentaric movie about a couple of people in Austria during a heatwave. This kind of... | 3 | 0 |
    | 1-24370 | I viewed the movie together with my arrogant, film critic friend, my wife and her female friend. So... | 7 | 1 |

4. Push the example workflow as follows:

    ```bash
    td workflow push sentiment
    ```

5. Set secrets from STDIN, for example: apikey=x/xxxxx, endpoint=https://api.treasuredata.com, s3_bucket=my_bucket, aws_access_key_id=AAAAAAAAAA, aws_secret_access_key=XXXXXXXXX

    ```bash
    td workflow secrets \
        --project sentiment \
        --set apikey \
        --set endpoint \
        --set s3_bucket \
        --set aws_access_key_id \
        --set aws_secret_access_key
    ```

6. Start the analysis:

    ```bash
    td workflow start sentiment sentiment-analysis --session now
    ```

Results of the script are stored in the *test_predicted_polarities* table in Treasure Data. To view the table:

1. Log into TD Console.
2. Search for the *sentiment* database.
3. Locate the *test_predicted_polarities* table.

The prediction results are stored in this table as shown below:

| rowid | predicted_polarity |
| --- | --- |
| 1-21643 | 0 |
| 1-22967 | 1 |

### Example Workflow using TensorFlow without Amazon S3

The workflow:

* Fetches review data from Treasure Data
* Builds a model with TensorFlow
* Predicts polarities for unknown review data and writes them back to Treasure Data

#### Prerequisites

* The custom scripts feature is enabled for your TD account.
* The TD Toolbelt and the TD Toolbelt Workflow module are downloaded and installed.
* Basic knowledge of Treasure Data Workflow syntax

#### Run the Example Workflow

1. Download the sentiment-analysis project from this [repository](https://github.com/treasure-data/treasure-boxes/tree/master/machine-learning-box/sentiment-analysis).
2. In a terminal window, change directory to sentiment-analysis. For example:

    ```bash
    cd sentiment-analysis
    ```

3. Run *data.sh* to ingest training and test data into Treasure Data. About 80 million records are fetched to build the model. The script also creates a database named *sentiment* and tables named *movie_review_train* and *movie_review_test* to store the data.

    ```bash
    $ ./data.sh
    ```

    Assume that the input table is as follows:

    | rowid | sentence | sentiment | polarity |
    | --- | --- | --- | --- |
    | 1-10531 | "Bela Lugosi revels in his role as European horticulturist (sic) Dr. Lorenz in this outlandish... | 2 | 0 |
    | 1-10960 | Fragmentaric movie about a couple of people in Austria during a heatwave. This kind of... | 3 | 0 |
    | 1-24370 | I viewed the movie together with my arrogant, film critic friend, my wife and her female friend. So... | 7 | 1 |

4. Run the example workflow as follows:

    ```bash
    td workflow push sentiment
    ```

5. Add secrets from STDIN, for example: apikey=x/xxxxx, endpoint=https://api.treasuredata.com

    ```bash
    td workflow secrets \
        --project sentiment \
        --set apikey \
        --set endpoint
    ```

6. Start the analysis:

    ```bash
    td workflow start sentiment sentiment-analysis-simple --session now
    ```

Results of the script are stored in the *test_predicted_polarities* table in Treasure Data. To view the table:

1. Log into TD Console.
2. Search for the *sentiment* database.
3. Locate the *test_predicted_polarities* table.
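The *predicted_polarity* column holds a binary label. As a rough sketch of how such a label is typically derived from a classifier's output (the `threshold_polarity` helper below is hypothetical and not taken from sentiment.py), the positive-class probability is thresholded at 0.5:

```python
def threshold_polarity(positive_prob, threshold=0.5):
    """Map a positive-class probability to a 0/1 polarity label.

    positive_prob: classifier score in [0, 1] for the positive class.
    Returns 1 (positive review) when the score meets the threshold,
    otherwise 0 (negative review).
    """
    return 1 if positive_prob >= threshold else 0
```

The 0.5 threshold is the conventional default for a balanced binary classifier; a real pipeline might tune it on validation data.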
The prediction results should be similar to the following:

| rowid | predicted_polarity |
| --- | --- |
| 1-21643 | 0 |
| 1-22967 | 1 |

## Review the Workflow Custom Python Script

Review the contents of the sentiment-analysis directory:

* [sentiment-analysis.dig](https://github.com/treasure-data/workflow-examples/blob/master/machine-learning/sentiment-analysis/sentiment-analysis.dig) - The TD Workflow YAML file for sentiment analysis with TensorFlow.
* [sentiment.py](https://github.com/treasure-data/workflow-examples/blob/master/machine-learning/sentiment-analysis/sentiment.py) - The custom Python script with TensorFlow. It builds a prediction model from existing data and predicts the polarity of unknown data.

In this example, we use [a pre-trained model from TensorFlow Hub](https://tfhub.dev/google/nnlm-en-dim128/1) for word embedding of English text:

```python
embedded_text_feature_column = hub.text_embedding_column(
    key="sentence",
    module_spec="https://tfhub.dev/google/nnlm-en-dim128/1"
)
```

If you want to change this model to another one, for example the [Japanese model](https://tfhub.dev/google/nnlm-ja-dim128/1), you can modify it as follows:

```python
embedded_text_feature_column = hub.text_embedding_column(
    key="sentence",
    module_spec="https://tfhub.dev/google/nnlm-ja-dim128/1"
)
```

Before word embedding, you need to prepare tokenized sentences for Japanese. Because this custom script also saves the trained TensorFlow model to Amazon S3, you can build your own prediction server with [TensorFlow Serving](https://www.tensorflow.org/serving/).
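To illustrate the note above about preparing tokenized sentences: the nnlm-* Hub modules split their input on whitespace, so Japanese text must first be segmented by a morphological analyzer (such as MeCab) and then re-joined with spaces before it reaches the embedding column. A minimal sketch of the re-joining step (the `prepare_sentences` helper is hypothetical, not part of sentiment.py, and assumes tokenization has already happened):

```python
def prepare_sentences(tokenized_reviews):
    """Join pre-tokenized Japanese reviews into space-separated strings.

    tokenized_reviews: a list of reviews, each a list of tokens produced
    by a morphological analyzer. The returned strings are suitable for a
    whitespace-tokenizing embedding module.
    """
    return [" ".join(tokens) for tokens in tokenized_reviews]
```

For example, `prepare_sentences([["この", "映画", "は", "面白い"]])` yields `["この 映画 は 面白い"]`, which the embedding column can then split back into tokens.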
To change the *serving_input_receiver_fn*, modify the following code:

```python
feature_spec = tf.feature_column.make_parse_example_spec([embedded_text_feature_column])
serving_input_receiver_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(feature_spec)
estimator.export_saved_model(EXPORT_DIR_BASE, serving_input_receiver_fn)
```

See the [TensorFlow documentation](https://www.tensorflow.org/guide/saved_model#using_savedmodel_with_estimators) for details.
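One detail worth knowing when uploading the export to S3: `export_saved_model` writes each model into a fresh subdirectory of `EXPORT_DIR_BASE` named with a Unix timestamp, so the upload step needs to locate the newest export first. A minimal sketch of that lookup (the `latest_export_dir` helper is hypothetical, not part of sentiment.py):

```python
import os

def latest_export_dir(export_dir_base):
    """Return the most recent SavedModel directory under export_dir_base.

    export_saved_model() writes each export into a subdirectory named
    with a Unix timestamp (e.g. <base>/1712345678/); the largest
    timestamp is the newest export.
    """
    exports = [d for d in os.listdir(export_dir_base)
               if d.isdigit() and os.path.isdir(os.path.join(export_dir_base, d))]
    if not exports:
        raise FileNotFoundError("no exported models under %s" % export_dir_base)
    return os.path.join(export_dir_base, max(exports, key=int))
```

The directory returned here is what you would sync to your S3 bucket or point TensorFlow Serving at.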