# Exploratory Data Analysis

This notebook runs Exploratory Data Analysis (EDA) on the table specified by the *input_table* parameter.

Supported analytics methods:

* Basic EDA based on [Pandas](https://pandas.pydata.org/) DataFrame
* [Pandas Profiling](https://github.com/ydataai/pandas-profiling)
* EDA based on [Sweetviz](https://github.com/fbdesignpro/sweetviz)
* Missing data visualization based on [missingno](https://github.com/ResidentMario/missingno)

Some example visualizations from the EDA Notebook are shown below:

![](/assets/image2023-2-23_20-40-6.7980bacfddabf6fe2b99ec8e79a5a0aee8cd804a6f474b0de93ffb5a5e985c51.3cb60505.png)

![](/assets/image2023-2-23_20-41-52.4239a194a4abb2f3fbceedbc78bd50d55ffedfe6fd60ddce5b7c48ec833e07a8.3cb60505.png)

![](/assets/image2023-2-23_20-38-52.8bbcf51dde2b996a379aadbf8fc583646ba6085e6a0a0d6e8faa7eebfb0c83dd.3cb60505.png)

### EDA Workflow Example

A sample workflow is available [in Treasure Boxes](https://github.com/treasure-data/treasure-boxes/blob/automl/machine-learning-box/automl/eda.dig).

```yaml
+run_eda:
  ipynb>:
    notebook: EDA
    input_table: ml_datasets.bank_marketing
    eda: all
    sampling_threshold: 1000000
```

### Parameters

| Parameter name | Parameter on Console | Description | Default Value |
| --- | --- | --- | --- |
| docker.task_mem | Docker Task Mem | Task memory size. Available values are 64g, 128g (default), 256g, 384g, or 512g, depending on your contracted tier. | 128g |
| input_table | Input Table | TD table used for EDA, specified as dbname.table_name. | - |
| target_column | Target Column | Name of the column used as the label. | None |
| ignore_columns | Ignore Columns | Columns to ignore for EDA. | time |
| sampling_threshold | Sampling Threshold | Threshold used for sampling. See the executed notebook for details. | 10_000_000 |
| eda | Eda | `all`, or a comma-separated list of the EDA types to run. | all |
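
To give a rough idea of how the parameters above could interact in a Pandas-based EDA pass, here is a minimal sketch. It is not the notebook's actual implementation: the helper name `basic_eda`, the toy data, and the assumption that `sampling_threshold` caps the number of rows by random sampling are all illustrative. The executed notebook remains the reference for the real sampling behavior.

```python
import pandas as pd


def basic_eda(df, target_column=None, ignore_columns=("time",),
              sampling_threshold=10_000_000, seed=0):
    """Hypothetical helper: per-column summary of a DataFrame.

    The argument names mirror the notebook parameters, but the logic
    is an assumption made for illustration only.
    """
    # Drop ignored columns that are actually present in the table.
    df = df.drop(columns=[c for c in ignore_columns if c in df.columns])

    # Assumption: tables with more rows than the threshold are randomly
    # sampled down to exactly `sampling_threshold` rows.
    if len(df) > sampling_threshold:
        df = df.sample(n=sampling_threshold, random_state=seed)

    # One row per column: dtype, missing values, and cardinality.
    summary = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "missing_ratio": df.isna().mean(),
        "unique": df.nunique(),
    })

    report = {
        "rows": len(df),
        "summary": summary,
        "describe": df.describe(include="all"),
    }

    # Class balance of the label column, if one was given.
    if target_column is not None and target_column in df.columns:
        report["target_distribution"] = (
            df[target_column].value_counts(normalize=True)
        )
    return report


# Toy data standing in for a table such as ml_datasets.bank_marketing.
toy = pd.DataFrame({
    "time": [1, 2, 3, 4, 5, 6],
    "age": [30, 41, float("nan"), 25, 52, 38],
    "job": ["admin", "technician", "admin", None, "services", "admin"],
    "y": ["no", "yes", "no", "no", "yes", "no"],
})

report = basic_eda(toy, target_column="y", sampling_threshold=4)
print(report["rows"])
print(report["summary"])
print(report["target_distribution"])
```

In this sketch the `time` column is dropped before sampling, so the summary covers only `age`, `job`, and `y`. Passing a `sampling_threshold` at or above the row count leaves the data unsampled.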