Split data into train and test subsets
This recipe splits data into train and test subsets. Specify the train size ratio: typical values range from 0.6 to 0.9, but yours might vary. The remaining samples are used for testing. You can control data shuffling and stratification in Advanced options, and you can set a random seed to make the split reproducible.
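The options described above map onto parameters of scikit-learn's train_test_split. The following is a minimal sketch, not the code the recipe generates: the toy DataFrame, the column names, the 0.75 ratio, and the seed value 42 are all assumptions made for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 rows, imbalanced binary target (80 zeros, 20 ones)
df = pd.DataFrame({
    "feature": range(100),
    "target": [0] * 80 + [1] * 20,
})

X = df[["feature"]]
y = df["target"]

# train_size=0.75 keeps 75% of the rows for training; the rest form the test set.
# shuffle=True randomizes row order before splitting (the scikit-learn default).
# stratify=y keeps the class proportions of y in both subsets.
# random_state fixes the seed so the same split is produced on every run.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.75, shuffle=True, stratify=y, random_state=42
)

print(X_train.shape, X_test.shape)
print(y_train.mean(), y_test.mean())
```

With stratification, both subsets keep the 20% share of the positive class. Without it, a small test set could end up with very few positive samples by chance.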
Required packages
You need the packages below to use the code generated by this recipe. All packages are installed automatically in MLJAR Studio.
scikit-learn>=1.5.0
Interactive recipe
You can use the interactive recipe below to generate code. This recipe is available in MLJAR Studio.
In the recipe below, we assume that you have the following variables available in your notebook:
- df_1 (type DataFrame)
- df_2 (type DataFrame)
Python code
# Python code will be here
Code explanation
- Split data into train and test subsets.
- Display shapes of new data sets.
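The two steps above can be sketched as follows. This is an illustrative example rather than the exact code the recipe generates: the contents of df_1 and df_2, their roles as features and target, the 0.8 ratio, and the seed are assumptions.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for the notebook variables df_1 and df_2
df_1 = pd.DataFrame({"a": range(10), "b": range(10, 20)})  # assumed role: features
df_2 = pd.DataFrame({"label": [0, 1] * 5})                 # assumed role: target

# Split data into train and test subsets (80% train, 20% test)
df_1_train, df_1_test, df_2_train, df_2_test = train_test_split(
    df_1, df_2, train_size=0.8, random_state=42
)

# Display shapes of the new data sets
print(f"df_1_train shape: {df_1_train.shape}")
print(f"df_1_test shape: {df_1_test.shape}")
print(f"df_2_train shape: {df_2_train.shape}")
print(f"df_2_test shape: {df_2_test.shape}")
```

Passing both DataFrames in one call keeps their rows aligned: each row of df_1_train has the same index as the matching row of df_2_train.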
Example Python notebooks
Please find inspiration in example notebooks