# House Price Prediction Example

This example predicts house prices with a linear regression predictor. Treasure Workflow can predict continuous values, such as a price or energy consumption, using linear regression.

• regression.dig is a TD workflow script for regression
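As a rough illustration of what a linear regression predictor does, the sketch below fits a one-variable least-squares line in plain Python. It is not the code the workflow runs. The helper name `fit_line` is hypothetical, and the `rm` and `medv` values are the four sample rows shown in the Input section, which are far too few for a meaningful model.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Sample rows from the Input table: average rooms (rm) and median value (medv).
rm = [6.575, 6.421, 7.185, 6.998]
medv = [24.0, 21.6, 34.7, 33.4]

intercept, slope = fit_line(rm, medv)

# Predict the median value for a hypothetical home with 6.8 rooms on average.
predicted = intercept + slope * 6.8
print(f"predicted medv: {predicted:.2f}")
```

The workflow itself uses more explanatory variables than this single-feature sketch, but the idea is the same: learn coefficients from known prices, then apply them to new rows.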

# About Feature Selection with the Filter Methods

Filter methods select variables independently of the model. They rely only on general characteristics of the data, such as each variable's correlation with the variable to predict. Filter methods discard the least informative variables, and the remaining variables are then used in a classification or regression model. These methods are computationally efficient and robust to overfitting.

Filter methods tend to select redundant variables when they do not consider the relationships between variables. More elaborate filter methods reduce this problem by removing variables that are highly correlated with each other.
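The two ideas above can be sketched in a few lines of Python: rank features by the absolute value of their Pearson correlation with the target, then skip any feature that is highly correlated with one already selected. This is a minimal sketch on made-up toy data, not the workflow's implementation. The function names, the toy columns, and the `redundancy_threshold` value of 0.95 are all assumptions chosen for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (norm_x * norm_y)


def filter_select(features, target, k, redundancy_threshold=0.95):
    """Select up to k features by |correlation with target|, skipping redundant ones."""
    ranked = sorted(
        features,
        key=lambda name: abs(pearson(features[name], target)),
        reverse=True,
    )
    selected = []
    for name in ranked:
        if len(selected) == k:
            break
        # A candidate is redundant if it nearly duplicates a selected feature.
        redundant = any(
            abs(pearson(features[name], features[chosen])) > redundancy_threshold
            for chosen in selected
        )
        if not redundant:
            selected.append(name)
    return selected


# Toy data (hypothetical): "a_near" almost duplicates "a"; "noise" is weakly related.
target = [1, 2, 3, 4, 5, 6]
features = {
    "a": [1, 2, 3, 4, 5, 6],
    "a_near": [1, 2, 3, 4, 5, 7],
    "b": [1, 3, 2, 5, 4, 6],
    "noise": [3, 1, 4, 1, 5, 2],
}

selected = filter_select(features, target, k=2)
print(selected)  # ['a', 'b']
```

Here `a_near` ranks second by correlation with the target but is dropped because it is almost identical to `a`, so the less correlated but more informative `b` is kept instead.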

# Building and Running the Workflow

To build and run the workflow:

1. Obtain The Boston Housing Dataset.

2. Prepare the sample data set.
`$ ./data.sh`

3. Push the workflow to Treasure Data:
`$ td wf push regressor`

4. Run the workflow:
`$ td wf start regressor regression --session now -p apikey=${YOUR_TD_API_KEY}`

# Input

In this workflow, we use The Boston Housing Dataset.

This workflow assumes a table as follows:

| crim `double` | zn `double` | indus `double` | chas `int` | nox `double` | rm `double` | age `double` | dis `double` | rad `int` | tax `int` | ptratio `double` | b `double` | lstat `double` | medv `double` |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.00632 | 18 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.09 | 1 | 296 | 15.3 | 396.9 | 4.98 | 24 |
| 0.02731 | 0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242 | 17.8 | 396.9 | 9.14 | 21.6 |
| 0.02729 | 0 | 7.07 | 0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2 | 242 | 17.8 | 392.83 | 4.03 | 34.7 |
| 0.03237 | 0 | 2.18 | 0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3 | 222 | 18.7 | 394.63 | 2.94 | 33.4 |

`medv`, the median value of owner-occupied homes in $1000s, is the target value for regression. `chas` and `rad` are categorical values, and the other features are quantitative.

By default, the workflow keeps the 4 columns most strongly correlated with `medv`. This technique, feature selection with a filter method, is intended to improve prediction accuracy. To change the explanatory variables, modify the following file:

• vectorize_log1p_features.sql

# Output

This workflow outputs the predicted house prices in the predictions table as follows:

| rowid `string` | predicted_price `double` |
|---|---|
| 1-10 | 33.97034232809395 |
| 1-121 | 30.3377696027913 |
