Read data
Load sample dataset
Read an example dataset into a pandas DataFrame. The datasets are loaded from the GitHub repository datasets-for-start, so you need an internet connection to load them.
Binary classification datasets:
- Adult dataset - Predict whether an individual's income exceeds $50K/year based on census data.
- Breast Cancer dataset - Predict the presence of breast cancer based on various medical attributes.
- Credit Scoring dataset - Predict the likelihood of a customer defaulting on a loan.
- Employee Attrition dataset - Predict whether an employee will leave the company based on various factors.
- Pima Indians Diabetes - Predict the onset of diabetes based on diagnostic measurements.
- SPECT dataset - Predict heart disease based on SPECT imaging data.
- Titanic dataset - Predict the survival of passengers based on various features such as age, gender, and class.
Multiclass classification datasets:
- Iris dataset - Classify iris flowers into three different species based on their physical attributes.
- Wine dataset - Classify wines into different categories based on their chemical properties.
Regression datasets:
- Housing dataset - Predict housing prices based on various features of the houses.
- House prices dataset - Predict the final price of homes based on various features and attributes.
Exploratory data analysis datasets:
- IMDB Top 1000 The Best 1000 - A comprehensive collection of IMDb's top 1000 movies, including ratings, genres, and release information.
- World Happiness Report 2023 - Country-level happiness indicators from 2023, covering economic, social, and governance factors.
- US House Prices 1950-2024 - Historical US housing market data (1950-2024), featuring home prices and economic indicators.
Required packages
You need the packages below to use the code generated by this recipe. All packages are installed automatically in MLJAR Studio.
pandas>=1.0.0
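Outside MLJAR Studio, you may want to confirm that the installed pandas meets this requirement. The snippet below is a minimal sketch; `meets_minimum` is a hypothetical helper written for this illustration, not part of pandas or MLJAR Studio, and it compares only the major and minor version numbers.

```python
import pandas as pd


def meets_minimum(version, minimum=(1, 0)):
    """Return True if the 'major.minor' part of version is at least minimum."""
    # Only the first two components are compared; this is enough for ">=1.0.0".
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= minimum


print(f"pandas {pd.__version__} installed, requirement met: "
      f"{meets_minimum(pd.__version__)}")
```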
Interactive recipe
You can use the interactive recipe below to generate code. This recipe is available in MLJAR Studio.
Python code
# Python code will be here
Code explanation
- We use the pandas package and its read_csv() function, which reads CSV files from a URL. All datasets are available in the GitHub repository datasets-for-start.
- After the DataFrame is loaded, we display the shape of the data: the number of rows and the number of columns.
- We display the first rows of the DataFrame.
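The steps above can be sketched as follows. This is an illustration under stated assumptions, not the exact code the recipe generates: `load_dataset` is a hypothetical helper name, and the small inline CSV stands in for a real dataset so the example runs without a network connection.

```python
import io

import pandas as pd


def load_dataset(source, n_preview=5):
    """Read a CSV from a URL, path, or file-like object and summarize it."""
    # read_csv accepts a URL string as well as local paths and file objects.
    df = pd.read_csv(source)
    n_rows, n_cols = df.shape
    print(f"Shape: {n_rows} rows, {n_cols} columns")
    # Show the first rows of the DataFrame.
    print(df.head(n_preview))
    return df


# Offline stand-in data (made up for this sketch, not a real dataset).
sample_csv = io.StringIO("age,income\n39,<=50K\n50,>50K\n38,<=50K\n")
df = load_dataset(sample_csv)
```

To load one of the datasets listed above, pass the URL of its raw CSV file in the datasets-for-start repository to `load_dataset` in place of `sample_csv`.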