Are hyper-parameters really important in Machine Learning?
Look at some titles of recent questions posted on Quora or Stack Overflow:
It seems that one of the most problematic topics for machine-learning self-learners is understanding the difference between parameters and hyper-parameters. The concept of hyper-parameters is very important, because these values directly influence the overall performance of ML algorithms.
The simplest definition of hyper-parameters is that they are a special kind of parameter that cannot be inferred from the data.
Imagine, for instance, a neural network. As you probably know, a neural network learns by tuning the weights of its artificial neurons so that the network gives the best output label for the given input data. However, architectures of neural networks vary depending on the task. There are many things to consider: the number of layers, the size of each layer, the number of connections, etc. These values must be chosen before the network is tuned, so in this case they are called hyper-parameters.
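To make the distinction concrete, here is a minimal sketch using scikit-learn (my choice of library for illustration, not something the text above prescribes): the network architecture is fixed by us up front, while the weights are ordinary parameters inferred from the data.

from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Hyper-parameters: chosen by us before the model sees any data.
net = MLPClassifier(hidden_layer_sizes=(32, 16),  # architecture
                    learning_rate_init=0.001,     # optimiser setting
                    max_iter=500, random_state=0)

# Parameters: the weights, inferred from the data during fit().
net.fit(X, y)
print([w.shape for w in net.coefs_])  # the learned weight matrices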
The process of tuning hyper-parameters is called meta-optimisation (which is simply the optimisation of another optimisation method) and can be realised via grid search, random search or Bayesian optimisation.
At this point you should raise your hand and ask: how do we pick good hyper-parameter values? Well, while standard model parameters can be inferred from data (e.g. using gradient descent for an ANN), for hyper-parameters this is not the case. To be honest: we just need to guess them. But we can do this in an intelligent way, so another answer to this question is: the best hyper-parameters are the ones that give the highest score on validation data.
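Here is a minimal sketch of such intelligent guessing using scikit-learn’s GridSearchCV (the candidate values below are arbitrary examples of my own): we enumerate combinations and let cross-validation tell us which one scores best on held-out data.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, random_state=0)

# Candidate values -- we cannot infer them from the data, so we guess a grid.
grid = {"max_features": [0.5, "sqrt"],
        "min_samples_leaf": [1, 4, 16]}

search = GridSearchCV(RandomForestClassifier(random_state=0),
                      grid, cv=5, scoring="roc_auc")
search.fit(X, y)  # every combination is scored on validation folds

print(search.best_params_, search.best_score_)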
I would like to present an example, using the mljar.com web service, to convince you how crucial hyper-parameters are when training ML models.
For the purpose of this demo we will use the “Give Me Some Credit” data from a Kaggle competition to implement our own credit scoring tool. The data is publicly available at https://www.kaggle.com/c/GiveMeSomeCredit/data and consists of numerical variables describing features of a credit borrower such as age, monthly income, number of open loans and more. Obviously we also need a target attribute. It is hidden under the variable SeriousDlqin2yrs, which tells whether a person experienced a 90 days past due delinquency or worse (1 - yes, 0 - no).
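If you would like to peek at the data locally before uploading it, here is a short pandas sketch; the file name cs-training.csv and the median imputation are my assumptions for this illustration.

import pandas as pd

# Assumes the training file from the Kaggle page, cs-training.csv,
# has been downloaded to the working directory.
df = pd.read_csv("cs-training.csv")

# The first unnamed column is just the row number -- drop it.
df = df.drop(columns=df.columns[0])

y = df["SeriousDlqin2yrs"]                # target: 1 = delinquent, 0 = not
X = df.drop(columns=["SeriousDlqin2yrs"])
X = X.fillna(X.median())                  # simple fill for missing values
print(X.shape, y.mean())                  # dataset size, fraction of positives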
First of all, log in to the mljar.com service using your username and password. We start by adding a new project:
The service automatically asks you to upload a dataset:
After the upload is completed:
Open the dataset panel, set the SeriousDlqin2yrs usage to target and change its type to categorical. You can also unselect the first Unnamed variable, as it only contains the row numbers from the CSV file.
The next step is to accept the column usage with the green button above the attributes table. When you see the Attributes Selected message, everything up to this point has worked fine.
Now it’s time to start an experiment. Go to the Experiment tab in the panel on the left. Here you can create a new experiment, so don’t hesitate. You can leave most of the settings at their defaults, but remember to select the uploaded dataset from the dropdown list. Next, to focus our attention on just one algorithm, let’s pick the Random Forest binary classifier. Let’s also set the Metric to Area Under Curve. Note that the tuning mode is set to Sport, which basically means that about 10 to 15 different hyper-parameter sets will be tested during this experiment instance.
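For intuition, here is a local approximation of such a tuning mode (a sketch of my own, not mljar’s actual implementation or search space): randomly sample 15 hyper-parameter sets and score each one with AUC.

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# X, y as prepared in the pandas sketch above.
space = {"max_features": [0.3, 0.5, 0.7],
         "min_samples_split": [2, 4, 8, 16],
         "min_samples_leaf": [1, 4, 16, 64],
         "criterion": ["gini", "entropy"]}

search = RandomizedSearchCV(RandomForestClassifier(n_estimators=100),
                            space, n_iter=15, cv=5,
                            scoring="roc_auc", random_state=0)
search.fit(X, y)
print(search.best_score_)  # AUC of the best of the 15 sampled sets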
Now you can switch to Results. Even if your experiment is not finished yet, we can already make some non-trivial observations. Pick any model from the list below and scroll down. You should see a listing like this:
Algorithm Random Forest
max_features: 0.5
min_samples_split: 4
criterion: entropy
min_samples_leaf: 16
All of these values are hyper-parameters of the Random Forest. When you select another model, you should see the same variables with different values assigned. The Sport tuning mode trains up to 15 models with different hyper-parameters and tests them using cross-validation (or a train/test split, if you select that method).
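Conveniently, these names match the arguments of scikit-learn’s RandomForestClassifier, so you can plug a reported set straight into a local model (assuming the values carry the same meaning in mljar, which the naming suggests):

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# The hyper-parameter set reported above, as keyword arguments.
model = RandomForestClassifier(max_features=0.5,
                               min_samples_split=4,
                               criterion="entropy",
                               min_samples_leaf=16,
                               n_estimators=100)

# X, y as prepared earlier; score the model the same way the experiment does.
print(cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean())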
After a while, when you refresh your scoreboard, you should see that all models have finished learning and you can select the best one. In my case the best model scored 86.4% and had the hyper-parameters presented in the table below.
For comparison, I also present the hyper-parameters of the weakest model, which scored 84.7% (still not too bad). As you can see, increasing the minimum leaf size (min_samples_leaf) was a good strategy for this model.
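You can rerun such a comparison locally. The two dictionaries below are placeholders of my own (they differ only in min_samples_leaf, mirroring the observation above); substitute the actual values from your own scoreboard.

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Placeholder hyper-parameter sets -- replace with your scoreboard values.
candidates = {
    "best":    dict(max_features=0.5, min_samples_split=4,
                    criterion="entropy", min_samples_leaf=16),
    "weakest": dict(max_features=0.5, min_samples_split=4,
                    criterion="entropy", min_samples_leaf=1),
}

for name, params in candidates.items():
    model = RandomForestClassifier(n_estimators=100, **params)
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(name, "AUC =", round(auc, 3))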
In this example I showed you Random Forest training with meta-optimisation. The take-home message from this lesson is that hyper-parameters are extremely important when tuning models. It is worth considering a whole set of values, because usually only a very specific combination of them gives good performance. Who knows, maybe if I had chosen the Perfect tuning mode in my mljar experiment, I would have gotten even better results? You are welcome to test it yourself!
You can read more about hyper-parameters in these excellent sources:
https://en.wikipedia.org/wiki/Hyperparameter_(machine_learning)
https://www.quora.com/What-are-hyperparameters-in-machine-learning