Read data

Load sample dataset

Read an example dataset into a pandas DataFrame. The datasets are loaded from the GitHub repository datasets-for-start, so you need an internet connection to load them.

Binary classification datasets:

  • Adult dataset - Predict whether an individual's income exceeds $50K/year based on census data.
  • Breast Cancer dataset - Predict the presence of breast cancer based on various medical attributes.
  • Credit Scoring dataset - Predict the likelihood of a customer defaulting on a loan.
  • Employee Attrition dataset - Predict whether an employee will leave the company based on various factors.
  • Pima Indians Diabetes - Predict the onset of diabetes based on diagnostic measurements.
  • SPECT dataset - Predict heart disease based on SPECT imaging data.
  • Titanic dataset - Predict the survival of passengers based on various features such as age, gender, and class.

Multiclass classification datasets:

  • Iris dataset - Classify iris flowers into three different species based on their physical attributes.
  • Wine dataset - Classify wines into different categories based on their chemical properties.

Regression datasets:

  • Housing dataset - Predict housing prices based on various features of the houses.
  • House prices dataset - Predict the final price of homes based on various features and attributes.

Exploratory data analysis datasets:

  • IMDB Top 1000 - A comprehensive collection of IMDb's top 1000 movies, including ratings, genres, and release information.
  • World Happiness Report 2023 - Country-level happiness indicators from 2023, covering economic, social, and governance factors.
  • US House Prices 1950-2024 - Historical US housing market data (1950-2024), featuring home prices and economic indicators.

Required packages

You need the packages below to use the code generated by the recipe. All packages are automatically installed in MLJAR Studio.

pandas>=1.0.0

Interactive recipe

You can use the interactive recipe to generate the code. This recipe is available in MLJAR Studio.

Python code

# Python code will be here

Code explanation

  1. We use the pandas package and its read_csv() function, which can read CSV files directly from a URL. All datasets are available in the GitHub repository datasets-for-start.
  2. After the DataFrame is loaded, we display the shape of the data: the number of rows and the number of columns.
  3. We display the first rows of the DataFrame.
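
The three steps above can be sketched roughly as follows. This is an illustrative sketch, not the exact code generated by the recipe. The helper name load_dataset and the small in-memory CSV are assumptions made so that the example runs without a network connection. The commented-out URL is a hypothetical placeholder, to be replaced with the raw CSV address of a dataset from the datasets-for-start repository.

```python
import io

import pandas as pd


def load_dataset(source):
    """Read a CSV from a URL, file path, or file-like object into a DataFrame.

    Hypothetical helper for illustration; not part of pandas or MLJAR Studio.
    """
    # Step 1: read_csv() accepts a URL as well as a local path or buffer.
    df = pd.read_csv(source)
    # Step 2: shape is a (number of rows, number of columns) tuple.
    print(f"Loaded data shape: {df.shape}")
    return df


# Placeholder URL (an assumption): substitute the raw CSV address of a
# dataset from the datasets-for-start repository.
# df = load_dataset("https://example.com/path/to/data.csv")

# Offline stand-in with made-up rows so the sketch runs without internet.
sample_csv = io.StringIO(
    "sepal_length,sepal_width,species\n"
    "5.1,3.5,setosa\n"
    "7.0,3.2,versicolor\n"
    "6.3,3.3,virginica\n"
)
df = load_dataset(sample_csv)

# Step 3: display the first rows (head() returns up to 5 rows by default).
print(df.head())
```

Passing a URL string instead of the in-memory buffer works the same way, since read_csv() handles both kinds of input.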