Data wrangling

Split data into train and test subsets

This recipe splits data into train and test subsets. Specify the train size ratio; typical values range from 0.6 to 0.9, but yours might vary. The remaining samples are used for testing. Advanced options let you control shuffling and stratification, and you can set a random seed to make the split reproducible.
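To illustrate what stratification and the random seed do, here is a minimal sketch. It assumes the split is performed with scikit-learn's `train_test_split`, which this page does not confirm; the toy DataFrame and the column names `feature` and `target` are hypothetical.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 rows, 70 samples of class 0 and 30 of class 1
df = pd.DataFrame({
    "feature": range(100),
    "target": [0] * 70 + [1] * 30,
})

# Stratified split: both subsets keep the 70/30 class balance
train_a, test_a = train_test_split(
    df,
    train_size=0.8,          # 80% of rows go to the train subset
    shuffle=True,            # shuffle rows before splitting
    stratify=df["target"],   # preserve class proportions in both subsets
    random_state=42,         # fixed seed makes the split repeatable
)

# Repeating the call with the same seed selects the same rows
train_b, test_b = train_test_split(
    df, train_size=0.8, shuffle=True, stratify=df["target"], random_state=42
)

print(train_a["target"].value_counts().to_dict())
print(test_a["target"].value_counts().to_dict())
print(train_a.index.equals(train_b.index))
```

Without `stratify`, a small or imbalanced dataset can end up with class proportions in the test subset that differ noticeably from those in the full data.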


Required packages

You need the packages below to use the code generated by the recipe. All packages are automatically installed in MLJAR Studio.


Interactive recipe

You can use the interactive recipe below to generate code. This recipe is available in MLJAR Studio.

In the recipe below, we assume that you have the following variables available in your notebook:

  • df_1 (type DataFrame)
  • df_2 (type DataFrame)

Python code

# Python code will be here
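The code generated by the recipe is not reproduced on this page. As a rough sketch of what such a split might look like, the example below assumes scikit-learn's `train_test_split`; the contents of `df_1`, the column names `x1`, `x2`, and `y`, and the 0.75 train size are hypothetical.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for a DataFrame such as df_1 in your notebook
df_1 = pd.DataFrame({
    "x1": range(20),
    "x2": range(20, 40),
    "y": [0, 1] * 10,
})

# Separate the feature columns from the target column (assumed layout)
X = df_1[["x1", "x2"]]
y = df_1["y"]

# Split data into train and test subsets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.75, random_state=42
)

# Display shapes of the new data sets
print(f"X_train shape: {X_train.shape}")
print(f"X_test shape: {X_test.shape}")
print(f"y_train shape: {y_train.shape}")
print(f"y_test shape: {y_test.shape}")
```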

Code explanation

  1. Split data into train and test subsets.
  2. Display the shapes of the new data sets.