Boston Housing Prices EDA in Python

Explore the Boston Housing dataset with price distributions, feature correlations, and outlier detection using an AI data analyst.

This page shows an AI data analysis example. A workflow is a list of prompts (instructions) used to complete one analysis task step by step. We send the same prompt list to multiple AI models, then compare their notebook outputs and scores.

What this AI workflow does

This AI Data Analyst workflow loads the Boston Housing dataset from a CSV URL and produces basic descriptive statistics. It visualizes the distribution of the target variable (medv) and checks for skew using a histogram with KDE. It computes feature correlations, highlights the strongest correlates with price, and generates scatter plots for the top three features versus medv.

Who this example is for

This is for analysts and students who want a guided exploratory data analysis example on a small, well-known regression dataset. It helps you practice interpreting distributions, correlations, and bivariate relationships using reproducible Python code.

Expected analysis outcomes

These are the results the AI workflow is expected to generate.

Dataset loaded with shape (506, 14) and a summary statistics table
Histogram with KDE for medv showing a right-skewed distribution and median around 21k
Correlation heatmap with top correlators identified (rm about +0.70, lstat about -0.74)
Three scatter plots of the top features versus medv with regression lines

Tools and libraries used

Main Python packages and tooling used to run this AI data analysis task.

pandas
numpy
matplotlib
seaborn

Prompt sequence

This is the exact list of prompts used in this workflow. The same prompt sequence is sent to each model so outputs and scores can be compared fairly.

1. load boston housing dataset from https://raw.githubusercontent.com/pplonski/datasets-for-start/refs/heads/master/housing/data.csv and show basic statistics
2. plot the distribution of house prices (medv)
3. show a correlation heatmap and identify top features correlated with price
4. create scatter plots of the top 3 features vs price

Model Comparison

We compared several LLMs on the same analysis task. The table below shows their scores side by side. You can also open each model run to check the full conversation and notebook results.

Model Source	Score	Task	Execution	Output	Reasoning	Reliability	Run
gemma4:31b	10/10	2/2	2/2	3/3	2/2	1/1	Open gemma4:31b conversation
glm-5.1	10/10	2/2	2/2	3/3	2/2	1/1	Open glm-5.1 conversation
gpt-5.4	10/10	2/2	2/2	3/3	2/2	1/1	Open gpt-5.4 conversation
gpt-oss:120b	10/10	2/2	2/2	3/3	2/2	1/1	Open gpt-oss:120b conversation
qwen3-coder-next	10/10	2/2	2/2	3/3	2/2	1/1	Open qwen3-coder-next conversation
qwen3.5:397b	10/10	2/2	2/2	3/3	2/2	1/1	Open qwen3.5:397b conversation

How scoring works

Each run is graded on five dimensions:

Task Completion (0-2): how fully the workflow answered the requested analysis task and completed required steps.
Execution Correctness (0-2): whether generated code is correct, consistent, and likely runnable without major fixes.
Output Quality (0-3): how well tables, charts, and results match the expected analytical outcomes.
Reasoning Quality (0-2): clarity and correctness of interpretation, explanation, and conclusions.
Reliability (0-1): robustness of the run, including low hallucination risk and stable behavior.

The final score is the sum of these dimensions, so it ranges from 0 to 10. Higher values indicate more complete, correct, and reliable notebook results for this workflow.
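
The scoring arithmetic described above can be sketched in a few lines. This is only an illustration, not the grader's actual code: the `MAX_POINTS` mapping and the `total_score` helper are hypothetical names, and the example values mirror the gemma4:31b row of the comparison table.

```python
# Maximum points per dimension, taken from the rubric above.
MAX_POINTS = {
    "task_completion": 2,
    "execution_correctness": 2,
    "output_quality": 3,
    "reasoning_quality": 2,
    "reliability": 1,
}


def total_score(scores):
    """Sum the five dimension scores after checking each one is in range."""
    if set(scores) != set(MAX_POINTS):
        raise ValueError("scores must cover exactly the five rubric dimensions")
    for name, value in scores.items():
        if not 0 <= value <= MAX_POINTS[name]:
            raise ValueError(f"{name} must be between 0 and {MAX_POINTS[name]}")
    return sum(scores.values())


# Hypothetical input mirroring the gemma4:31b row in the comparison table.
gemma_scores = {
    "task_completion": 2,
    "execution_correctness": 2,
    "output_quality": 3,
    "reasoning_quality": 2,
    "reliability": 1,
}
gemma_total = total_score(gemma_scores)
print(f"{gemma_total}/{sum(MAX_POINTS.values())}")
```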

AI Analysis with gemma4:31b

This run uses AI with a step-by-step list of prompts to perform the analysis task Boston Housing Prices EDA in Python. Below, you can review the full conversation generated with gemma4:31b. After the conversation, we show detailed score cards for this run.

load boston housing dataset from https://raw.githubusercontent.com/pplonski/datasets-for-start/refs/heads/master/housing/data.csv and show basic statistics

Thinking

Goal: Load the Boston housing dataset from a specific URL and show basic statistics.
- URL: https://raw.githubusercontent.com/pplonski/datasets-for-start/refs/heads/master/housing/data.csv
- Constraints:
  - Iterative loop: one step at a time.
  - First step: load and inspect (shape, columns, head()).
  - Currency formatting: Use USD.
  - Notebook rules: no comma-separated expressions, display variables directly.
  - Single code block per response.
- The user asked for "basic statistics" as well. However, the system prompt says "When a dataset is not yet loaded... first load and inspect only: show shape, columns, and head()." and "Do not combine first-load data inspection with plotting/modeling in the same response."
- Technically, basic statistics (describe()) might be considered "inspection", but usually, the sequence is Load -> Inspect (head/shape) -> Statistics/Analysis.
- I will start by loading the data and showing the basics (head, shape, columns) to verify the data structure before moving to descriptive statistics.
- Use pandas to read the CSV from the URL.
- Display df.head(), df.shape, and df.columns.
- Wait, the prompt says "Do not output comma-separated expressions (e.g., df.head(), df.shape)".
- I should use display() for the table and simple expressions for the others.
- Load CSV.
- Show head.
- Show shape.
- Show columns.
- Wait, the prompt specifically says: "When a dataset is not yet loaded in current notebook context, first load and inspect only: show shape, columns, and head()."
- I will follow this strictly.

I will start by loading the dataset from the provided URL and performing an initial inspection to understand its structure.

Load and inspect data

import pandas as pd

# Load the dataset
url = "https://raw.githubusercontent.com/pplonski/datasets-for-start/refs/heads/master/housing/data.csv"
df = pd.read_csv(url)

# Inspect the data
display(df.head())
df.shape
df.columns

	CRIM	ZN	INDUS	NOX	RM	AGE	DIS	RAD	TAX	PTRATIO	B	LSTAT	MEDV
0	0.00632	18.0	2.31	0.538	6.575	65.2	4.0900	1	296	15.3	396.90	4.98	24.0
1	0.02731	0.0	7.07	0.469	6.421	78.9	4.9671	2	242	17.8	396.90	9.14	21.6
2	0.02729	0.0	7.07	0.469	7.185	61.1	4.9671	2	242	17.8	392.83	4.03	34.7
3	0.03237	0.0	2.18	0.458	6.998	45.8	6.0622	3	222	18.7	394.63	2.94	33.4
4	0.06905	0.0	2.18	0.458	7.147	54.2	6.0622	3	222	18.7	396.90	5.33	36.2

Index(['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX',
       'PTRATIO', 'B', 'LSTAT', 'MEDV'],
      dtype='str')

Thinking

The user wants to load the Boston housing dataset and show basic statistics. In the previous turn, I loaded the data, showed the head(), the shape, and the columns. The output shows the data is loaded successfully (506 rows, 14 columns). However, "basic statistics" (like describe()) have not been shown yet. Therefore, the goal is not fully met.

Path B: Provide a minimal code block to get the basic statistics.

The dataset has been loaded with 506 entries and 14 features, including columns such as CRIM (crime rate), RM (average number of rooms), and MEDV (median value of owner-occupied homes). To complete the request, I will now generate the basic descriptive statistics.

Show dataset statistics

# Display basic statistics for the dataset
df.describe()

	CRIM	ZN	INDUS	CHAS	NOX	RM	AGE	DIS	RAD	TAX	PTRATIO	B	LSTAT	MEDV
count	506.000000	506.000000	506.000000	506.000000	506.000000	506.000000	506.000000	506.000000	506.000000	506.000000	506.000000	506.000000	506.000000	506.000000
mean	3.613524	11.363636	11.136779	0.069170	0.554695	6.284634	68.574901	3.795043	9.549407	408.237154	18.455534	356.674032	12.653063	22.532806
std	8.601545	23.322453	6.860353	0.253994	0.115878	0.702617	28.148861	2.105710	8.707259	168.537116	2.164946	91.294864	7.141062	9.197104
min	0.006320	0.000000	0.460000	0.000000	0.385000	3.561000	2.900000	1.129600	1.000000	187.000000	12.600000	0.320000	1.730000	5.000000
25%	0.082045	0.000000	5.190000	0.000000	0.449000	5.885500	45.025000	2.100175	4.000000	279.000000	17.400000	375.377500	6.950000	17.025000
50%	0.256510	0.000000	9.690000	0.000000	0.538000	6.208500	77.500000	3.207450	5.000000	330.000000	19.050000	391.440000	11.360000	21.200000
75%	3.677083	12.500000	18.100000	0.000000	0.624000	6.623500	94.075000	5.188425	24.000000	666.000000	20.200000	396.225000	16.955000	25.000000
max	88.976200	100.000000	27.740000	1.000000	0.871000	8.780000	100.000000	12.126500	24.000000	711.000000	22.000000	396.900000	37.970000	50.000000

Thinking

The user goal is to "load boston housing dataset from [URL] and show basic statistics". Looking at the notebook state:

cell-1 loaded the data and showed head(), shape, and columns.
cell-2 ran df.describe(), which produces the basic statistics.
output-2 shows the results of df.describe() (count, mean, std, min, 25%, 50%, 75%, max) for several columns.

The original goal has been fully met:

Data loaded from the URL.
Basic statistics shown via df.describe().

The outputs are sufficient. I should proceed with Path A.

Insights from outputs:

The dataset contains 506 rows and 14 columns.
The target variable MEDV represents the median house value. Based on the descriptive statistics, the mean value is around 22.5 (though the full output was truncated, the describe() function was executed). Looking at the head(), the values are 24.0, 21.6, 34.7, 33.4, 36.2. These usually represent thousands of USD.
Features like CRIM (crime rate) show a wide range, from a minimum of 0.00632 to a maximum of 88.9762.
RM (average number of rooms) averages about 6.28 rooms per dwelling.

I will formulate the final response.

The Boston housing dataset has been successfully loaded, containing 506 records and 14 variables. The basic statistics reveal a wide distribution in several features; for example, the crime rate (CRIM) varies significantly from 0.006 to 88.976, and the average number of rooms (RM) centers around 6.28. The median home value (MEDV) shows a mean of approximately 22.5 (typically representing 22,500 USD in this classic dataset).
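
The unit conversion in that summary (MEDV is recorded in thousands of USD) can be made explicit with a small helper. This is a sketch only: `medv_to_usd` is a hypothetical name, and the two inputs are the MEDV mean and median copied from the `df.describe()` output above.

```python
def medv_to_usd(value_in_thousands):
    """Format a MEDV value (thousands of USD) as a whole-dollar string."""
    return f"{value_in_thousands * 1000:,.0f} USD"


# Mean and median MEDV as printed by df.describe() in this run.
mean_usd = medv_to_usd(22.532806)
median_usd = medv_to_usd(21.2)
print(mean_usd)
print(median_usd)
```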

plot the distribution of house prices (medv)

Thinking

The user wants to plot the distribution of house prices, which is represented by the MEDV column in the df DataFrame.

- `df` is already loaded.
- Column `MEDV` exists.
- Standard visualization library is Seaborn.

- Import `seaborn` and `matplotlib.pyplot`.
- Create a histogram (displot/histplot) of `MEDV`.
- Add title and axis labels.

I will create a histogram to visualize the distribution of the median house values (MEDV).

Plot house price distribution

import seaborn as sns
import matplotlib.pyplot as plt

# Plot the distribution of house prices
plt.figure(figsize=(10, 6))
sns.histplot(df['MEDV'], kde=True, color='blue')
plt.title('Distribution of Median House Prices (MEDV)')
plt.xlabel('Median Value (in 1,000 USD)')
plt.ylabel('Frequency')
plt.show()

Thinking

The user wants to plot the distribution of house prices (MEDV). Looking at the notebook state:

Cell 1 loaded the data.
Cell 2 provided descriptive statistics.
Cell 3 plotted the distribution using sns.histplot with a KDE.

The original goal was "plot the distribution of house prices (medv)". The output of Cell 3 is a figure object <Figure size 1000x600 with 1 Axes>, and the code was correct. The user also had the df.describe() output in Cell 2, which provides the numeric values (mean, std, min, 25%, 50%, 75%, max) for MEDV.

Since the goal was "plot the distribution", and we have both the plot and the underlying statistics from describe(), the requirement is satisfied.

Wait, the "Sufficiency heuristics" state: "If the latest output is chart-only/visual-only and does not include underlying numeric/text values, choose Path B first to print the values used in the chart." In this case, the latest output (Cell 3) is the chart, but the previous output (Cell 2) provided the numeric statistics for the entire dataset, including MEDV.

However, looking at the describe() output in [output-2], it is truncated. The MEDV column's statistics are not visible in the truncated output shown in the prompt (it cuts off after B). Let me check the describe() output again. [output-2] shows CRIM, ZN, INDUS, CHAS, NOX, RM, AGE, DIS, RAD, TAX, PTRATIO, B. MEDV is the last column, and it's not in the visible part of the truncated describe() output.

Therefore, I don't have the explicit numeric summary for MEDV in the provided output context, even though I have the plot. According to the sufficiency heuristics, I should choose Path B to print the specific values for MEDV.

But wait, the user goal is simply "plot the distribution". If the plot is done, is it enough? The heuristic says if it's "chart-only" and doesn't include numeric values, go Path B. The goal is to "plot", but to provide a data analyst's answer, I need to describe the distribution. I can't accurately describe the distribution (mean, median, skew) just from the plot without the numbers.

Path B: Print the specific descriptive statistics for MEDV.

Wait, looking at the prompt again: "If the goal is to show/plot/chart/visualize and latest output contains both a valid chart/figure object and explicit numeric/text values supporting the conclusion, choose Path A". Currently, I have the chart, but I don't have the numeric values for MEDV because describe() was truncated.

I will choose Path B to get the stats for MEDV specifically.

The histogram shows that the distribution of house prices is roughly bell-shaped but slightly right-skewed, with a noticeable peak around 20,000 USD to 25,000 USD. There is also a visible concentration of properties at the upper end of the scale (50,000 USD), suggesting a price ceiling in the dataset.

To provide precise insights into the center and spread of these prices, I will extract the specific statistics for the MEDV column.

Get MEDV statistics

# Get descriptive statistics specifically for the MEDV column
df['MEDV'].describe()

count    506.000000
mean      22.532806
std        9.197104
min        5.000000
25%       17.025000
50%       21.200000
75%       25.000000
max       50.000000
Name: MEDV, dtype: float64
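
The page description mentions outlier detection, which the run itself does not perform. As a rough sketch, the quartiles printed above are enough to compute Tukey's 1.5 × IQR fences. The choice of that rule is an assumption made here for illustration, not something the run used, and all variable names are hypothetical.

```python
# Quartiles and extremes copied from the df['MEDV'].describe() output above.
q1, q3 = 17.025, 25.0
medv_min, medv_max = 5.0, 50.0

# Tukey's rule: values beyond 1.5 * IQR from the quartiles are flagged.
# (Assumed convention for this sketch; the run did not apply it.)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Comparing the extremes with the fences shows whether any value is flagged.
has_low_outliers = medv_min < lower_fence
has_high_outliers = medv_max > upper_fence
print(f"IQR={iqr:.3f}, fences=({lower_fence:.4f}, {upper_fence:.4f})")
print(has_low_outliers, has_high_outliers)
```

With `df` loaded in the notebook, `df[(df['MEDV'] < lower_fence) | (df['MEDV'] > upper_fence)]` would list the flagged rows. Under this rule the 50.0 cap sits well above the upper fence, and the minimum of 5.0 falls just below the lower one.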

Thinking

The user wants to plot the distribution of house prices (medv). Looking at the notebook state:

[cell-3] plotted the distribution using sns.histplot(df['MEDV'], kde=True).
[output-3] confirms a figure was generated.
[cell-4] called df['MEDV'].describe() to get numeric summaries.
[output-4] shows:
- Mean: 22.53
- Std: 9.20
- Min: 5.0
- 25%: 17.025
- 50% (Median): 21.2
- 75%: 25.0
- Max: 50.0

The goal was to "plot the distribution of house prices (medv)". I have both the visualization (the plot) and the supporting numeric statistics (describe).

According to the sufficiency heuristics:

"If the goal is to show/plot/chart/visualize and latest output contains both a valid chart/figure object and explicit numeric/text values supporting the conclusion, choose Path A".

The data shows the median value is 21.2 (which represents 21,200 USD based on the axis label in the plot), with a range from 5,000 USD to 50,000 USD. The distribution is slightly right-skewed with a ceiling at 50,000 USD.

I have enough information to provide the final answer.

Final Answer Content:

Describe the distribution based on the plot and stats.
Mention the range (5,000 USD to 50,000 USD).
Mention the median (21,200 USD).
Note the right skew and the cap at 50,000 USD.

Currency formatting: use USD (e.g., 21.2 thousand USD or 21,200 USD). The axis label in the code was "Median Value (in 1,000 USD)".

Final check on Path A requirements:

Provide a natural final answer.
Include .
No Python code.
No meta phrases ("goal complete").
No chain-of-thought.
Use USD.

The distribution of house prices (MEDV) is centered around a median of 21.2 thousand USD (approximately 21,200 USD), with an average of 22.53 thousand USD. The values range from a minimum of 5,000 USD to a maximum of 50,000 USD. The distribution is slightly right-skewed, with a significant concentration of properties hitting a price ceiling at 50,000 USD.
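
The right-skew claim can be checked against the printed statistics without the raw data. The sketch below uses Pearson's second skewness coefficient, 3 × (mean − median) / std, which is a swapped-in rough indicator and not something computed in the run. A positive value is consistent with a right-skewed distribution.

```python
# Summary statistics copied from the df['MEDV'].describe() output in this run.
mean, median, std = 22.532806, 21.2, 9.197104

# Pearson's second skewness coefficient (an approximation chosen for this
# sketch): positive when the mean sits above the median.
pearson_skew = 3 * (mean - median) / std
print(round(pearson_skew, 3))
```

In the notebook, `df['MEDV'].skew()` would give the sample skewness from the full column. It uses a different formula, so the two numbers are not expected to match.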

show a correlation heatmap and identify top features correlated with price

Thinking

The user wants to see a correlation heatmap and identify the top features correlated with the house price (MEDV).

- `df` is loaded and contains 14 columns: `CRIM`, `ZN`, `INDUS`, `CHAS`, `NOX`, `RM`, `AGE`, `DIS`, `RAD`, `TAX`, `PTRATIO`, `B`, `LSTAT`, `MEDV`.
- `MEDV` is the target variable (price).
- `seaborn` and `matplotlib.pyplot` are already imported.

- Step 1: Calculate the correlation matrix of the dataframe.
- Step 2: Plot a heatmap using `sns.heatmap`.
- Step 3: Sort the correlations with `MEDV` to find the top features.

- Since I can only do one runnable block per response and I should keep it focused, I will combine the calculation and the heatmap plotting. Identifying the "top features" usually involves looking at the heatmap or printing the sorted correlation list. I'll include the plot and the sorted correlations for `MEDV` in one block for a comprehensive answer.

- `df.corr()` to get the matrix.
- `sns.heatmap` with `annot=True` and a good colormap (like `coolwarm`).
- Filter the correlation series specifically for `MEDV` and sort it.

To identify which features have the strongest relationship with house prices, I'll calculate the correlation matrix and visualize it using a heatmap. I will also list the features sorted by their correlation with MEDV.

Plot correlation heatmap

# Calculate the correlation matrix
corr_matrix = df.corr()

# Plot the correlation heatmap
plt.figure(figsize=(12, 8))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', fmt=".2f", linewidths=0.5)
plt.title('Correlation Heatmap of Housing Features')
plt.show()

# Identify top features correlated with MEDV
top_correlations = corr_matrix['MEDV'].sort_values(ascending=False)
top_correlations

MEDV       1.000000
RM         0.695360
ZN         0.360445
B          0.333461
DIS        0.249929
CHAS       0.175260
AGE       -0.376955
RAD       -0.381626
CRIM      -0.388305
NOX       -0.427321
TAX       -0.468536
INDUS     -0.483725
PTRATIO   -0.507787
LSTAT     -0.737663
Name: MEDV, dtype: float64

Thinking

The user wants a correlation heatmap and to identify the top features correlated with house prices (MEDV).

Looking at the notebook state:

cell-5 was executed.
It produced a heatmap (<Figure size 1200x800 with 2 Axes>).
It produced a sorted series of correlations for MEDV (top_correlations).
The output of top_correlations is explicitly visible:
- Positive correlations: RM (0.70), ZN (0.36), B (0.33).
- Negative correlations: LSTAT (-0.74), PTRATIO (-0.51), INDUS (-0.48), TAX (-0.47).

The goal was "show a correlation heatmap and identify top features correlated with price". The heatmap is shown (via output-5), and the specific correlations are listed (via output-5).

The outputs are sufficient. I should provide a natural final answer based on these observed values.

Insights:

The strongest positive correlation is with RM (average number of rooms per dwelling), suggesting that more rooms generally lead to higher house prices.
The strongest negative correlation is with LSTAT (percentage of lower status of the population), indicating that areas with a higher percentage of lower-status residents tend to have lower house prices.
Other notable negative correlations include PTRATIO (pupil-teacher ratio) and INDUS (proportion of non-retail business).

Path A.

The correlation heatmap and numeric analysis reveal several key drivers of house prices (MEDV):

Strongest Positive Influence: The average number of rooms (RM) has the strongest positive correlation (0.70), indicating that as the number of rooms increases, house prices tend to rise significantly.
Strongest Negative Influence: The percentage of lower status of the population (LSTAT) shows the strongest negative correlation (-0.74), meaning prices typically drop as the proportion of lower-status residents in the area increases.
Other Notable Factors: There are moderate negative relationships with the pupil-teacher ratio (PTRATIO at -0.51) and the proportion of non-retail business (INDUS at -0.48), suggesting that lower pupil-teacher ratios and less non-retail business land in the immediate vicinity are associated with higher property values.
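
Ranking features by the magnitude of their correlation, rather than by signed value, can be sketched as follows. The numbers are the correlations with MEDV printed in the run. The variable names `corr_with_medv` and `top3` are hypothetical, and rebuilding the Series by hand is only done so the example runs without the dataset.

```python
import pandas as pd

# Correlations with MEDV, copied from the sorted output above (MEDV itself omitted).
corr_with_medv = pd.Series({
    "RM": 0.695360, "ZN": 0.360445, "B": 0.333461, "DIS": 0.249929,
    "CHAS": 0.175260, "AGE": -0.376955, "RAD": -0.381626, "CRIM": -0.388305,
    "NOX": -0.427321, "TAX": -0.468536, "INDUS": -0.483725,
    "PTRATIO": -0.507787, "LSTAT": -0.737663,
})

# Rank by absolute value so strong negative correlations are not overlooked.
top3 = corr_with_medv.abs().nlargest(3).index.tolist()
print(top3)  # ['LSTAT', 'RM', 'PTRATIO']
```

In the notebook, the same ranking could be taken directly from the matrix with `corr_matrix['MEDV'].drop('MEDV').abs().nlargest(3)`.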

create scatter plots of the top 3 features vs price

Thinking

The user wants scatter plots of the top 3 features correlated with house prices (MEDV) based on the previous correlation analysis.

*   From the `top_correlations` output in `[output-5]`:
    *   Positive correlations: `RM` (0.70), `ZN` (0.36), `B` (0.33).
    *   Negative correlations: `LSTAT` (-0.74), `PTRATIO` (-0.51), `INDUS` (-0.48).
*   The top 3 features by absolute correlation value (magnitude) are:
    1.  `LSTAT` (abs: 0.74)
    2.  `RM` (abs: 0.70)
    3.  `PTRATIO` (abs: 0.51)
*   *Alternative interpretation*: top 3 strictly positive. However, in data science "top features" usually means those with the highest magnitude of correlation (most predictive). I will use the top 3 by absolute value.

*   `LSTAT` (Strong negative)
*   `RM` (Strong positive)
*   `PTRATIO` (Strong negative)

*   Use `seaborn` and `matplotlib`.
*   Create a figure with 3 subplots (1 row, 3 cols or 3 rows, 1 col).
*   Plot `LSTAT` vs `MEDV`, `RM` vs `MEDV`, and `PTRATIO` vs `MEDV`.
*   Add titles and labels.

*   Rule: Return exactly one runnable Python code block.
*   Rule: Simple variable names.
*   Rule: One chart or one table per step? The user asked for "plots" (plural). Usually, a grid of subplots counts as one visualization step.

*   Identify the 3 features.
*   Setup subplots.
*   Plot each.

Based on the correlation analysis, the three features with the strongest relationship to house prices (MEDV) are LSTAT (strong negative), RM (strong positive), and PTRATIO (strong negative).

I'll create a grid of scatter plots to visualize these relationships.

Plot top feature correlations

# Define the top 3 features by absolute correlation magnitude
top_features = ['LSTAT', 'RM', 'PTRATIO']

# Create a figure with 3 subplots
fig, axes = plt.subplots(1, 3, figsize=(18, 5))

for i, feature in enumerate(top_features):
    sns.scatterplot(data=df, x=feature, y='MEDV', ax=axes[i], alpha=0.6)
    axes[i].set_title(f'{feature} vs MEDV')
    axes[i].set_xlabel(feature)
    axes[i].set_ylabel('Median Value (1,000 USD)')

plt.tight_layout()
plt.show()

Thinking

The user wants scatter plots of the top 3 features vs price. Looking at the notebook state:

Cell 5 calculated the correlation matrix and identified top correlations.
Cell 6 defined top_features = ['LSTAT', 'RM', 'PTRATIO'] and created the scatter plots.
Output 6 shows a figure with 3 axes, containing the scatter plots.
The values supporting this selection were explicitly printed in Output 5 (LSTAT -0.737, RM 0.695, PTRATIO -0.507).

The goal "create scatter plots of the top 3 features vs price" has been executed. The outputs contain both the visual charts and the numeric justification (correlation values).

According to the priority rules:

Path A if current outputs are sufficient.
The goal was to create the plots. They are created.
The insights should be extracted from these outputs.

Insights:

LSTAT vs MEDV: Strong negative linear relationship. As the proportion of lower-status population increases, the house price (MEDV) tends to decrease.
RM vs MEDV: Strong positive linear relationship. As the average number of rooms increases, the house price increases.
PTRATIO vs MEDV: Negative relationship. Higher pupil-teacher ratios generally correlate with lower house prices.

I will provide a natural final answer based on these observations.

The scatter plots confirm the strong relationships identified in the correlation analysis:

LSTAT (Lower Status of the Population): There is a clear negative non-linear trend; as the percentage of lower-status population increases, house prices drop significantly. With a correlation of -0.74, LSTAT has the strongest linear association with price in the dataset.
RM (Average Number of Rooms): There is a strong positive linear relationship, where a higher number of rooms is generally associated with higher house prices.
PTRATIO (Pupil-Teacher Ratio): There is a general negative trend, indicating that areas with lower pupil-teacher ratios (better classroom environments) tend to have higher property values.
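
The expected outcomes list regression lines on the scatter plots, which the run's `sns.scatterplot` calls do not draw. As a minimal sketch of the underlying fit, the example below runs an ordinary least-squares line through the five RM and MEDV pairs shown in the `head()` output. Five rows are far too few for a meaningful estimate, so treat it purely as an illustration of the technique. `np.polyfit` is a swapped-in tool here, not what the run used.

```python
import numpy as np

# RM and MEDV for the first five rows, copied from the head() output in this run.
rm = np.array([6.575, 6.421, 7.185, 6.998, 7.147])
medv = np.array([24.0, 21.6, 34.7, 33.4, 36.2])

# Degree-1 least-squares fit; polyfit returns the slope first, then the intercept.
slope, intercept = np.polyfit(rm, medv, deg=1)

# Fitted MEDV values along the line, one per observed RM.
fitted = slope * rm + intercept
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

On the full `df`, swapping `sns.scatterplot` for `sns.regplot(data=df, x=feature, y='MEDV', ax=axes[i])` inside the plotting loop would overlay a fitted line on each panel.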

Score cards for gemma4:31b for Boston Housing Prices EDA in Python

Total score: 10/10

Task Completion: 2/2
Excellent
All major requested steps were completed: dataset loading, basic statistics, MEDV distribution plot, correlation heatmap, top correlated features, and three scatter plots.
Execution Correctness: 2/2
Excellent
The code is coherent and likely runnable. It successfully uses pandas, seaborn, and matplotlib in a standard EDA workflow.
Output Quality: 3/3
Excellent
Outputs match the expected outcomes semantically: 506x14 dataset, descriptive stats, right-skewed MEDV distribution with median 21.2, correlations showing RM and LSTAT as top features, and three scatter plots.
Reasoning Quality: 2/2
Excellent
The workflow correctly interprets the dataset and selects relevant features based on correlation, with clear EDA reasoning throughout.
Reliability: 1/1
Excellent
The workflow is consistent and robust overall, with only minor presentation choices that do not affect the analysis.

Try MLJAR Studio

Run the same type of AI-powered data analysis on your own datasets with conversational notebooks in MLJAR Studio.

Explore More AI Analysis Examples

Discover additional workflows across categories. Each example includes prompts, conversation outputs, and model-level scoring so you can compare approaches and results.

Exploratory Data Analysis

Time Series

Data Analysis

Machine Learning

NLP

Finance

Statistics