Split data into train and test subsets

Split data into train and test subsets. Specify the train size as a ratio of the whole data set; typical values are in the range 0.6 to 0.9, but yours might vary. The remaining samples are used for testing. Advanced options let you control data shuffling and stratification. You can also set a random seed to make the split reproducible.
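To illustrate what the stratification and random seed options do, here is a minimal sketch that uses scikit-learn's train_test_split directly. The data, the helper name split, and the chosen values (train size 0.75, seed 42) are made up for this example and are not taken from the recipe.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced data: 80 rows of class 0 and 20 rows of class 1.
X = pd.DataFrame({"feature": range(100)})
y = pd.Series([0] * 80 + [1] * 20, name="target")


def split(seed):
    """Illustrative helper: shuffled, stratified 75/25 split with a fixed seed."""
    return train_test_split(
        X, y, train_size=0.75, shuffle=True, stratify=y, random_state=seed
    )


X_train, X_test, y_train, y_test = split(seed=42)

# Stratification keeps the class proportions similar in both subsets.
print("train class share:", y_train.value_counts(normalize=True).to_dict())
print("test class share:", y_test.value_counts(normalize=True).to_dict())

# Repeating the split with the same seed selects the same rows.
X_train_again, _, _, _ = split(seed=42)
print("same rows with same seed:", X_train.index.equals(X_train_again.index))
```

Without stratify, a shuffled split of a small or imbalanced data set can leave the test subset with very few samples of the rare class, which is the main reason to turn this option on for classification targets.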

ml, machine-learning, split

Required packages

You need the packages below to use the code generated by the recipe. All packages are automatically installed in MLJAR Studio.

scikit-learn>=1.5.0

Interactive recipe

You can use the interactive recipe below to generate code. This recipe is available in MLJAR Studio.

The recipe below assumes that the following variables are available in your notebook:

df_1 (type DataFrame)
df_2 (type DataFrame)

Python code

# Python code will be here

Code explanation

Split data into train and test subsets.
Display shapes of new data sets.
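The two steps above could look roughly like the following sketch. It is not the exact code generated by the recipe: it assumes that df_1 holds the features and df_2 holds the target, fills both with made-up values, and uses hypothetical output names (X_train, X_test, y_train, y_test) with a train size of 0.8.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for the notebook variables:
# df_1 is assumed to hold the features, df_2 the target.
df_1 = pd.DataFrame({"feature_a": range(10), "feature_b": range(10, 20)})
df_2 = pd.DataFrame({"target": [0, 1] * 5})

# Split both DataFrames with the same row assignment; 80% of rows go to train.
X_train, X_test, y_train, y_test = train_test_split(
    df_1, df_2, train_size=0.8, shuffle=True, random_state=42
)

# Display shapes of the new data sets.
print(f"X_train shape {X_train.shape}, X_test shape {X_test.shape}")
print(f"y_train shape {y_train.shape}, y_test shape {y_test.shape}")
```

Passing both DataFrames to a single train_test_split call keeps feature rows and target rows aligned, so each row in X_train has its matching row in y_train.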

Example Python notebooks

Find inspiration in the example notebooks:

Train Decision Tree on Iris data set

Data wrangling cookbook

Code recipes from Data wrangling cookbook.
