Load sample dataset

Read an example dataset into a pandas DataFrame. Datasets are loaded from the GitHub repository datasets-for-start, so you need an internet connection to load them.

Binary classification datasets:

Adult dataset - Predict whether an individual's income exceeds $50K/year based on census data.
Breast Cancer dataset - Predict the presence of breast cancer based on various medical attributes.
Credit Scoring dataset - Predict the likelihood of a customer defaulting on a loan.
Employee Attrition dataset - Predict whether an employee will leave the company based on various factors.
Pima Indians Diabetes - Predict the onset of diabetes based on diagnostic measurements.
SPECT dataset - Predict heart disease based on SPECT imaging data.
Titanic dataset - Predict the survival of passengers based on various features such as age, gender, and class.
Higgs dataset - Predict whether a particle collision event produces a Higgs boson or not.
Bank Marketing dataset - Predict customer behavior in a marketing campaign.

Multiclass classification datasets:

Iris dataset - Classify iris flowers into three different species based on their physical attributes.
Wine dataset - Classify wines into different categories based on their chemical properties.

Regression datasets:

Housing dataset - Predict housing prices based on various features of the houses.
House prices dataset - Predict the final price of homes based on various features and attributes.
California Housing - Predict the final price of homes in California.

Exploratory data analysis datasets:

IMDB Top 1000 - A comprehensive collection of IMDb's top 1000 movies, including ratings, genres, and release information.
World Happiness Report 2023 - Country-level happiness indicators from 2023, covering economic, social, and governance factors.
US House Prices 1950-2024 - Historical US housing market data (1950-2024), featuring home prices and economic indicators.


Required packages

You need the packages below to use the code generated by the recipe. All packages are automatically installed in MLJAR Studio.

pandas>=1.0.0

Interactive recipe

You can use the interactive recipe below to generate code. This recipe is available in MLJAR Studio.

Python code

# Python code will be here

Code explanation

The code uses the pandas read_csv() function, which can read a CSV file directly from a URL. All datasets are available in the GitHub repository datasets-for-start.
After the DataFrame is loaded, we display the shape of the data: the number of rows and the number of columns.
Finally, we display the first rows of the DataFrame.
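
The steps described above could look roughly like the sketch below. It is not the code the recipe generates. The helper name load_dataset and the DATASET_URL placeholder are assumptions made for illustration, and a small in-memory CSV stands in for a real dataset so the example runs without a network connection.

```python
import io

import pandas as pd

# Hypothetical placeholder: replace with the raw CSV URL of the dataset
# you want from the datasets-for-start repository.
DATASET_URL = "https://example.com/path/to/data.csv"


def load_dataset(source):
    """Read a CSV from a URL, file path, or file-like object.

    Prints the number of rows and columns, then returns the DataFrame.
    """
    df = pd.read_csv(source)
    rows, cols = df.shape
    print(f"Loaded data shape: {rows} rows, {cols} columns")
    return df


# Offline stand-in for a remote CSV file (made-up values for illustration).
sample_csv = io.StringIO(
    "sepal_length,sepal_width,species\n"
    "5.1,3.5,setosa\n"
    "7.0,3.2,versicolor\n"
    "6.3,3.3,virginica\n"
)

df = load_dataset(sample_csv)

# Display the first rows of the DataFrame.
print(df.head())

# With an internet connection, the same helper accepts a URL:
# df = load_dataset(DATASET_URL)
```

Because read_csv() accepts URLs, local paths, and file-like objects alike, the same helper works for a remote dataset and for a local copy.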

Example Python notebooks

Please find inspiration in example notebooks

Read data cookbook

Code recipes from Read data cookbook.
