Load sample dataset
Read an example dataset into a pandas DataFrame. Datasets are loaded from the GitHub repository datasets-for-start, so you need an internet connection to load them.
Binary classification datasets:
- Adult dataset - predict whether an individual's income exceeds $50K/year based on census data.
- Breast Cancer dataset - predict the presence of breast cancer based on various medical attributes.
- Credit Scoring dataset - predict the likelihood of a customer defaulting on a loan.
- Employee Attrition dataset - predict whether an employee will leave the company based on various factors.
- Pima Indians Diabetes dataset - predict the onset of diabetes based on diagnostic measurements.
- SPECT dataset - predict heart disease based on SPECT imaging data.
- Titanic dataset - predict the survival of passengers based on various features such as age, gender, and class.
Multiclass classification datasets:
- Iris dataset - classify iris flowers into three different species based on their physical attributes.
- Wine dataset - classify wines into different categories based on their chemical properties.
Regression datasets:
- Housing dataset - predict housing prices based on various features of the houses.
- House prices dataset - predict the final price of homes based on various features and attributes.
Required packages
You need the packages below to use the code generated by this recipe. All packages are automatically installed in MLJAR Studio.
pandas>=1.0.0
Interactive recipe
You can use the interactive recipe below to generate code. This recipe is available in MLJAR Studio.
Python code
# Python code will be here
Code explanation
- We use the pandas `read_csv()` function to read a CSV file from a URL. All datasets are available in the GitHub repository datasets-for-start.
- After the DataFrame is loaded, we display its shape: the number of rows and the number of columns.
- We display the first rows of the DataFrame.
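The steps above can be sketched roughly as follows. This is a minimal illustration, not the exact code the recipe generates: the helper name `load_dataset` is hypothetical, and the URL layout of the datasets-for-start repository is an assumption that should be checked against the repository itself. An in-memory CSV stands in for the remote file so the example also runs without an internet connection.

```python
import io

import pandas as pd

# Assumption: raw-file URL layout of the datasets-for-start repository.
# Verify the path in the repository before relying on it.
ADULT_URL = (
    "https://raw.githubusercontent.com/pplonski/"
    "datasets-for-start/master/adult/data.csv"
)


def load_dataset(source):
    """Read a CSV from a URL, local path, or file-like object (hypothetical helper)."""
    df = pd.read_csv(source)
    # Report the shape: number of rows and number of columns.
    rows, cols = df.shape
    print(f"Loaded data shape: {rows} rows, {cols} columns")
    return df


# Offline stand-in for a remote CSV file (made-up sample rows).
sample_csv = io.StringIO(
    "age,workclass,income\n"
    "39,State-gov,<=50K\n"
    "50,Private,>50K\n"
)

df = load_dataset(sample_csv)

# Display the first rows of the DataFrame.
print(df.head())

# With an internet connection, the same helper accepts a URL:
# df = load_dataset(ADULT_URL)
```

Because `read_csv()` accepts URLs, local paths, and file-like objects alike, the same call works for any of the datasets listed above once the correct file URL is known.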
Example Python notebooks
Find inspiration in these example notebooks:
- Train Random Forest regressor
The `scikit-learn` provides implementation of [Random Forest](/glossary/random-forest/) ...
- Visualize Decision Tree
The Decision Tree algorithm's structure is human-readable, a key advantage. In ...
- Decision Tree features importance
`Scikit-learn's` permutation importance assesses the impact of each feature on ...
- Train Decision Tree classifier
Classification is a task of predicting discrete target labels. The Python `scikit-learn` ...
- Matplotlib scatter plot
I enjoy using `matplotlib` for crafting impressive scatter plots in my notebooks. ...
- Train Decision Tree on Iris data set
Python is a great choice for Machine Learning projects, because of rich ML packages ...
- Train Decision Tree regressor
Train a Decision Tree Regressor using scikit-learn. This machine learning algorithm ...
- Train Random Forest classifier
Python implementation of Random Forest algorithm available in `scikit-learn` package ...
- Save and load Decision Tree
`Scikit-learn` provides Decision Tree algorithms for classification (`DecisionTreeClassifier`) ...
- Tune Decision Tree classifier
This notebook demonstrates tuning a Decision Tree model. We'll find the best hyperparameters ...
Read data cookbook
Code recipes from Read data cookbook.