# Exploratory Data Analysis

This notebook runs Exploratory Data Analysis (EDA) on the table specified by the *input_table* parameter.

Supported analytics methods:

* Basic EDA based on [Pandas](https://pandas.pydata.org/) DataFrame
* Profiling reports based on [Pandas Profiling](https://github.com/ydataai/pandas-profiling)
* EDA based on [Sweetviz](https://github.com/fbdesignpro/sweetviz)
* Missing data visualization based on [missingno](https://github.com/ResidentMario/missingno)


Some example visualizations from the EDA Notebook are shown below:

![](/assets/image2023-2-23_20-40-6.7980bacfddabf6fe2b99ec8e79a5a0aee8cd804a6f474b0de93ffb5a5e985c51.3cb60505.png)

![](/assets/image2023-2-23_20-41-52.4239a194a4abb2f3fbceedbc78bd50d55ffedfe6fd60ddce5b7c48ec833e07a8.3cb60505.png)

![](/assets/image2023-2-23_20-38-52.8bbcf51dde2b996a379aadbf8fc583646ba6085e6a0a0d6e8faa7eebfb0c83dd.3cb60505.png)

### EDA Workflow Example

A sample workflow is available [in Treasure Boxes](https://github.com/treasure-data/treasure-boxes/blob/automl/machine-learning-box/automl/eda.dig).


```yaml
+run_eda:
  ipynb>:
    notebook: EDA
    input_table: ml_datasets.bank_marketing
    eda: all
    sampling_threshold: 1000000
```

### Parameters

| Parameter name | Parameter on Console | Description | Default Value |
|  --- | --- | --- | --- |
| docker.task_mem | Docker Task Mem | Task memory size. Available values are 64g, 128g, 256g, 384g, or 512g, depending on your contracted tier. | 128g |
| input_table | Input Table | TD table to analyze, specified as `dbname.table_name`. | - |
| target_column | Target Column | Name of the column used as the label. | None |
| ignore_columns | Ignore Columns | Columns to exclude from EDA. | time |
| sampling_threshold | Sampling Threshold | Threshold used for sampling. See the executed notebook for details. | 10_000_000 |
| eda | Eda | `all`, or a comma-separated list of the EDA types to run. | all |
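
As a rough illustration of how the parameters in this table might be combined in one task, the sketch below extends the sample workflow. It rests on several assumptions: that `docker.task_mem` is written as a nested `docker:` block, that `y` is the label column of the sample table (a hypothetical name), and that 256g is available on your contracted tier. Check each of these against your own environment before using it.

```yaml
+run_eda:
  ipynb>:
    docker:
      task_mem: 256g            # assumption: docker.task_mem expressed as a nested key
    notebook: EDA
    input_table: ml_datasets.bank_marketing
    target_column: y            # hypothetical label column; replace with your own
    ignore_columns: time        # the default value, written out explicitly
    sampling_threshold: 1000000 # lower than the 10_000_000 default
    eda: all                    # run every supported EDA type
```

Any parameter left out of the task falls back to the default value listed in the table.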