Tensorflow vs Scikit-learn

Have you ever wonder what is the difference between Tensorflow and Sckit-learn? Which one is better? Have you ever needed Tensorflow when you already use Scikit-learn?

Both packages are from the Machine Learning world. The Tensorflow is a library for differentiable programming. It allows constructing Machine Learning algorithms such as Neural Networks. It is used in Deep Learning. It was developed by Google. At the time of writing, its GitHub repository has 149k stars.

The Scikit-learn is a library that contains ready algorithms for Machine Learning, which can be used to solve tasks like: classification, regression, clustering. It has also a set of methods for data preparation. It was designed to cooperate with packages like NumPy, SciPy, Pandas, Matplotlib. Its GitHub repository has 42.5k stars.

The Tensorflow was designed to construct Deep Neural Networks which can work with various data formats: tabular data, images, text, audio, videos. On the other hand, the Scikit-learn is rather for the tabular data.

Multi Layer Perceptron

In the case of tabular data, a popular architecture of Neural Network (NN) is a Multi-Layer Perceptron (MLP). In Tensorflow you can, of course, build almost any type of NN. The interesting fact is that the MLP algorithm is also available in Scikit-learn. There are available two algorithms:

for classification: MLPClassifier
for regression: MLPRegressor

Let's compare them with Tensorflow! :)

Tensorflow vs Scikit-learn compared on tabular data

The implmentation of MLP Neural Network with Keras and Tensorflow

In the comparison, I will use simple MLP architecture with 2 hidden layers and Adam optimizer.

To use Tensorflow, I will use Keras which provides higher-level API abstraction with ready NN layers.

The code to construct the MLP with Tensorflow and Keras (TF version == 2.2.0, Keras version == 2.3.1):

# data iformation
input_dim = 12 # number of input features
number_of_classes = 3 # number of classes
ml_task = "multiclass_classification" # task that we are going to solve
# initial hyperparameters
dense_1_size = 32
dense_2_size = 16
learning_rate = 0.05

# define model architecture
model = Sequential()
model.add(Dense(dense_1_size, activation="relu", input_dim=input_dim))
model.add(Dense(dense_2_size, activation="relu"))
if ml_task == "multiclass_classification":
    model.add(Dense(number_of_classes, activation="softmax"))
elif ml_task == "binary_classification":
    model.add(Dense(1, activation="sigmoid"))
else: # regression
    model.add(Dense(1))

# compile the model
opt = Adam(learning_rate=learning_rate)
if ml_task == "multiclass_classification":
    model.compile(optimizer=opt, loss="categorical_crossentropy")
elif ml_task == "binary_classification":
    model.compile(optimizer=opt, loss="binary_crossentropy")
else: # regression
    model.compile(optimizer=opt, loss="mean_squared_error")

And the code to train the MLP with early stopping on 10% of the train data:

# X and y are training data

batch_size = min(200, X.shape[0])
epochs = 500
      
stratify = y if ml_task != "regression" else None
X_train, X_vald, y_train, y_vald = train_test_split(
    X, y, test_size=0.1, shuffle=True, stratify=stratify
)
# set callbacks
es = EarlyStopping(monitor="val_loss", mode="min", verbose=0, patience=10)
mc = ModelCheckpoint(
    "best_model.h5",
    monitor="val_loss",
    mode="min",
    verbose=0,
    save_best_only=True,
)

model.fit(
    X_train,
    y_train,
    validation_data=(X_vald, y_vald),
    batch_size=batch_size,
    epochs=500,
    verbose=False,
    callbacks=[es, mc],
)

The implementation of the MLP Neural Network with Scikit-learn

The Scikit-learn version == 0.23.2. The code to create the model:

# initial hyperparameters
dense_1_size = 32
dense_2_size = 16
learning_rate = 0.05
epochs = 500 

# the model
model = MLPClassifier(
    hidden_layer_sizes=(dense_1_size, dense_2_size),
    activation="relu",
    solver="adam",
    learning_rate="constant",
    learning_rate_init=learning_rate, 
    early_stopping=True,
    max_iter=epochs
)

If you need the MLP for regression just change the MLPClassifier to MLPRegressor.

The code to train the model:

# X and y are training data
model.fit(X, y)

That's all! As you can see, you need much more code with Tensorflow+Keras - it is a much flexible library in terms of constructing Neural Networks and that's why you need to define the whole architecture by yourself. The implementation of MLP from Scikit-learn is more like an off-the-shelf algorithm.

Let's take some data!

I get the data for comparison from Penn Machine Learning Benchamrks. They have a lot of example datasets there. I considered only datasets with 1k or more rows. The methodology of comparison was simple:

take the dataset, split it, 75% of data is used for training and 25% for testing,
train different Neural Networks architectures, and select the best one (based on 5-fold cross-validation on train data), use the best model to compute predictions on test samples,
the data preparation (converting categoricals, target scaling in regression) is handled by AutoML mljar-supervised,
the hyperparameters tuning is handled by AutoML mljar-supervised,
for classification, I've used logloss and for regression mean squared error metrics.

I used below set of hyperparameters for both implementations of MLP:

nn_params = {
    "dense_1_size": [16, 32, 64],
    "dense_2_size": [4, 8, 16, 32],
    "learning_rate": [0.01, 0.05, 0.08, 0.1],
}

The AutoML was drawing values of hyperparameters from the defined set. A model was skipped, if the drawn values were repeated. There were trained up to 13 different Neural Networks for each implementation for each dataset.

The classification task

There were used 66 datasets in classification (both binary and multiclass). The results are reported in the table below. The time reported is in seconds - it is a total time used for checking all drawn architectures.

| Dataset |:-----------------------------------------------------------| | Epistasis_2_1000atts_0.4H | 1600 | 1000 | | Epistasis_2_20atts_0.1H | 1600 | 20 | | Epistasis_2_20atts_0.4H | 1600 | 20 | | Epistasis_3_20atts_0.2H | 1600 | 20 | | Heterogeneity_50 | 1600 | 20 | 2 | 0.662284 | Heterogeneity_75 | 1600 | 20 | 2 | 0.685651 | Hill_Valley_with_noise | Hill_Valley_without_noise | adult | agaricus_lepiota | allbp | allhyper | allhypo | allrep | ann_thyroid | car | car_evaluation | chess | churn | clean2 | cmc | coil2000 | connect_4 | contraceptive | credit_g | dis | dna | fars | flare | german | hypothyroid | kr_vs_kp | krkopt | led24 | led7 | letter | magic | mfeat_factors | mfeat_fourier | mfeat_karhunen | mfeat_morphological | mfeat_pixel | mfeat_zernike | mnist | mofn_3_7_10 | mushroom | nursery | optdigits | page_blocks | parity5+5 | pendigits | phoneme | poker | ring | satimage | segmentation | shuttle | sleep | solar_flare_2 | spambase | splice | texture | twonorm | waveform_21 | waveform_40 | wine_quality_red | nrows | ncols | classes | Sklearn_logloss | TF_logloss | Sklearn_time | TF_time | --------:|--------:|------------:|--------------:|-------------:|-----------:|----------:| 2 | 0.692272 | 0.693278 | 111.71 | 140.36 | 2 | 0.69291 | 0.694732 | 27.42 | 111.56 | 2 | 0.654114 | 0.666128 | 20.41 | 95.89 | 2 | 0.69219 | 0.693055 | 18.5 | 97.85 | | 0.692669 | 24.51 | 102.42 | | 0.684641 | 22.04 | 150.61 | | 1212 | 100 | 2 | 0.701181 | 0.679262 | 23.4 | 183.42 | | 1212 | 100 | 2 | 0.693029 | 0.693106 | 22.19 | 151.83 | | 48842 | 14 | 2 | 0.31302 | 0.308794 | 225.9 | 529.94 | | 8145 | 22 | 2 | 0.000476217 | 1.28456e-08 | 74.34 | 1417.37 | | 3772 | 29 | 3 | 0.105024 | 0.0873737 | 24.18 | 261.19 | | 3771 | 29 | 4 | 0.0543183 | 0.0540823 | 40.24 | 239.92 | | 3770 | 29 | 3 | 0.15851 | 0.182609 | 36.58 | 300.35 | | 3772 | 29 | 4 | 0.0965378 | 0.0877159 | 33.73 | 289.45 | | 7200 | 21 | 3 | 0.0663306 | 0.0522971 | 118.12 | 281.09 | | 1728 | 6 | 4 | 0.0565009 | 0.0270733 | 25.95 | 336.2 | | 1728 | 21 | 4 | 0.0355119 | 0.0162453 | 26.77 | 364.57 | | 3196 | 36 | 2 | 0.0381836 | 0.0262048 | 36.16 | 433.17 | | 5000 | 20 | 2 | 0.232703 | 0.233059 | 45.63 | 342.09 | | 6598 | 168 | 2 | 0.00358971 | 1.72954e-05 | 106.99 | 1065.38 | | 1473 | 9 | 3 | 0.872536 | 0.865697 | 12.78 | 255.28 | | 9822 | 85 | 2 | 0.206189 | 0.205792 | 103.68 | 394.26 | | 67557 | 42 | 3 | 0.454379 | 0.449654 | 874.97 | 1194.23 | | 1473 | 9 | 3 | 0.872536 | 0.864501 | 13.88 | 284.1 | | 1000 | 20 | 2 | 0.54315 | 0.539984 | 19.76 | 310.27 | | 3772 | 29 | 2 | 0.0700903 | 0.0487778 | 27.56 | 325.68 | | 3186 | 180 | 3 | 0.159211 | 0.149867 | 45.03 | 415.93 | | 100968 | 29 | 8 | 0.469888 | 0.470934 | 612.58 | 1017.39 | | 1066 | 10 | 2 | 0.393395 | 0.388351 | 21.64 | 413.38 | | 1000 | 20 | 2 | 0.518058 | 0.521267 | 35.91 | 756.61 | | 3163 | 25 | 2 | 0.0643045 | 0.0698073 | 84.57 | 900.39 | | 3196 | 36 | 2 | 0.048205 | 0.0500323 | 29.96 | 136.31 | | 28056 | 6 | 18 | 16.0305 | 17.489 | 322.73 | 438.99 | | 3200 | 24 | 10 | 0.847339 | 0.852022 | 28.64 | 97.79 | | 3200 | 7 | 10 | 0.830557 | 0.822725 | 21.63 | 131.69 | | 20000 | 16 | 26 | 20.9291 | 20.7423 | 230.99 | 263.22 | | 19020 | 10 | 2 | 0.310426 | 0.307443 | 172.18 | 270.02 | | 2000 | 216 | 10 | 13.1879 | 12.7521 | 40.47 | 134.59 | | 2000 | 76 | 10 | 5.41445 | 6.88417 | 56.89 | 160.14 | | 2000 | 64 | 10 | 11.1031 | 11.0863 | 30.74 | 308.79 | | 2000 | 6 | 10 | 6.74621 | 8.46319 | 36.51 | 495.13 | | 2000 | 240 | 10 | 0.105948 | 0.130936 | 88.05 | 534.58 | | 2000 | 47 | 10 | 8.05229 | 8.27279 | 45.2 | 273.91 | | 70000 | 784 | 10 | 0.106536 | 0.115172 | 1844.9 | 2386.88 | | 1324 | 10 | 2 | 0.0100037 | 1.38658e-07 | 24.3 | 620.48 | | 8124 | 22 | 2 | 0.00113378 | 2.32726e-08 | 66.92 | 1643.03 | | 12958 | 8 | 4 | 0.00837833 | 0.00787663 | 218.48 | 413.13 | | 5620 | 64 | 10 | 0.0614936 | 0.0626042 | 58.47 | 279.3 | | 5473 | 10 | 5 | 0.111967 | 0.109957 | 38.02 | 302.59 | | 1124 | 10 | 2 | 0.366705 | 0.296488 | 17.58 | 347.32 | | 10992 | 16 | 10 | 0.0244666 | 0.0216054 | 85.26 | 346.54 | | 5404 | 5 | 2 | 0.304275 | 0.299198 | 47.83 | 437.41 | | 1025010 | 10 | 10 | 0.00452098 | 0.00726474 | 4062.23 | 2916.7 | | 7400 | 20 | 2 | 0.0829392 | 0.0794716 | 90.69 | 614.34 | | 6435 | 36 | 6 | 0.242709 | 0.22644 | 63.6 | 365.15 | | 2310 | 19 | 7 | 0.118182 | 0.110797 | 34.36 | 313.02 | | 58000 | 9 | 7 | 0.00396879 | 0.00399241 | 254.09 | 783.73 | | 105908 | 13 | 5 | 0.63446 | 0.62662 | 412.82 | 1260.44 | | 1066 | 12 | 6 | 0.567718 | 0.549793 | 23.22 | 454.14 | | 4601 | 57 | 2 | 0.164812 | 0.162715 | 55.17 | 511.57 | | 3188 | 60 | 3 | 0.314676 | 0.365399 | 46.27 | 309.57 | | 5500 | 40 | 11 | 15.8874 | 20.6093 | 58.74 | 573.67 | | 7400 | 20 | 2 | 0.0600894 | 0.0581619 | 46.37 | 717.49 | | 5000 | 21 | 3 | 0.293883 | 0.297824 | 45.55 | 544.52 | | 5000 | 40 | 3 | 0.290914 | 0.291565 | 47.11 | 536.87 | | 1599 | 11 | 6 | 0.930231 | 0.954956 | 34.55 | 437.98 |

The results can be plotted as scatter plot: Tensorflow vs Scikit-learn compared on classification

Let's zoom to the [0,1] range:

Tensorflow vs Scikit-learn compared on classification Zoomed

There were 66 datasets and the Tensorflow implementation was 39 times better than Scikit-learn implementation. The differences weren't huge.

It's worth to take a look at times of computation. All computations were on the CPU. The mean time of computation for Scikit-learn was 177 seconds while for Tensorflow it was 508 seconds. The Scikit-learn is much faster. Maybe because TF is intended to be used on GPU rather than CPU? I don't know.

The regression task

There were 48 datasets in the regression task. The results are in the table below:

| Dataset | nrows | ncols | Scikit_MSE | TF_MSE | Scikit_time | TF_time | |:---------------------------------|--------:|--------:|-----------------:|-----------------:|-----------:|----------:| | 1028_SWD | 1000 | 10 | 0.351288 | 0.350476 | 16.9 | 84.53 | | 1029_LEV | 1000 | 4 | 0.404629 | 0.398672 | 19.1 | 64.97 | | 1030_ERA | 1000 | 4 | 2.51881 | 2.5505 | 17.34 | 92.35 | | 1191_BNG_pbc | 1000000 | 18 | 688631 | 688662 | 3567.38 | 3557.82 | | 1193_BNG_lowbwt | 31104 | 9 | 207709 | 208230 | 135.2 | 224.09 | | 1196_BNG_pharynx | 1000000 | 10 | 85241.6 | 85288.7 | 3562.32 | 3290.37 | | 1199_BNG_echoMonths | 17496 | 9 | 135.59 | 135.655 | 86.55 | 152.94 | | 1201_BNG_breastTumor | 116640 | 9 | 91.819 | 91.538 | 416.2 | 879.4 | | 1203_BNG_pwLinear | 177147 | 10 | 7.64677 | 7.64574 | 431.18 | 874.16 | | 1595_poker | 1025010 | 10 | 0.0828265 | 0.0928364 | 3371.3 | 3165.36 | | 197_cpu_act | 8192 | 21 | 6.04194 | 6.18341 | 102.1 | 197.32 | | 201_pol | 15000 | 48 | 6.82901 | 6.484 | 128.53 | 220.26 | | 215_2dplanes | 40768 | 10 | 1.00462 | 1.00613 | 149.9 | 395.09 | | 218_house_8L | 22784 | 8 | 7.6172e+08 | 7.66826e+08 | 153.39 | 389.73 | | 225_puma8NH | 8192 | 8 | 10.2505 | 10.3683 | 39.81 | 255.74 | | 227_cpu_small | 8192 | 12 | 7.90558 | 8.14906 | 47.62 | 295.11 | | 294_satellite_image | 6435 | 36 | 0.490969 | 0.482076 | 79.77 | 324.38 | | 344_mv | 40768 | 10 | 0.00101928 | 0.00076863 | 191.04 | 563.72 | | 4544_GeographicalOriginalofMusic | 1059 | 117 | 0.274018 | 0.252322 | 24.36 | 212.47 | | 503_wind | 6574 | 14 | 9.51609 | 9.43174 | 35.83 | 337.39 | | 529_pollen | 3848 | 4 | 2.00028 | 1.98782 | 20.59 | 320.87 | | 537_houses | 20640 | 8 | 2.8563e+09 | 2.78972e+09 | 130.66 | 449.33 | | 562_cpu_small | 8192 | 12 | 7.90558 | 8.03778 | 46.5 | 391.88 | | 564_fried | 40768 | 10 | 1.06119 | 1.04008 | 199.99 | 648.26 | | 573_cpu_act | 8192 | 21 | 6.04194 | 6.11835 | 97.13 | 351.69 | | 574_house_16H | 22784 | 16 | 9.59662e+08 | 9.6518e+08 | 172.56 | 545.2 | | 583_fri_c1_1000_50 | 1000 | 50 | 0.784129 | 0.767178 | 20.21 | 275.6 | | 586_fri_c3_1000_25 | 1000 | 25 | 0.515327 | 0.645629 | 18.89 | 435.76 | | 588_fri_c4_1000_100 | 1000 | 100 | 0.84392 | 0.917454 | 18.02 | 333.99 | | 589_fri_c2_1000_25 | 1000 | 25 | 0.449973 | 0.572697 | 20.68 | 352.22 | | 590_fri_c0_1000_50 | 1000 | 50 | 0.341366 | 0.373753 | 23.86 | 358.61 | | 592_fri_c4_1000_25 | 1000 | 25 | 0.527293 | 0.618391 | 19.53 | 524.52 | | 593_fri_c1_1000_10 | 1000 | 10 | 0.0616339 | 0.106806 | 23.56 | 351.52 | | 595_fri_c0_1000_10 | 1000 | 10 | 0.0737259 | 0.0729692 | 15.99 | 353.94 | | 598_fri_c0_1000_25 | 1000 | 25 | 0.223486 | 0.234454 | 22.21 | 356.71 | | 599_fri_c2_1000_5 | 1000 | 5 | 0.0255253 | 0.0254379 | 23.48 | 376.14 | | 606_fri_c2_1000_10 | 1000 | 10 | 0.0528226 | 0.0531012 | 25.89 | 391.02 | | 607_fri_c4_1000_50 | 1000 | 50 | 0.792365 | 0.777099 | 20.87 | 560.02 | | 608_fri_c3_1000_10 | 1000 | 10 | 0.0529629 | 0.0580307 | 25.5 | 353.62 | | 609_fri_c0_1000_5 | 1000 | 5 | 0.0420409 | 0.043429 | 15.71 | 463.82 | | 612_fri_c1_1000_5 | 1000 | 5 | 0.0313801 | 0.0311381 | 24.51 | 526.56 | | 618_fri_c3_1000_50 | 1000 | 50 | 0.8721 | 0.881979 | 18.3 | 517.28 | | 620_fri_c1_1000_25 | 1000 | 25 | 0.713895 | 0.748985 | 15.09 | 370.89 | | 622_fri_c2_1000_50 | 1000 | 50 | 0.814633 | 0.876445 | 21.74 | 486.49 | | 623_fri_c4_1000_10 | 1000 | 10 | 0.0580734 | 0.059318 | 26.21 | 461.91 | | 628_fri_c3_1000_5 | 1000 | 5 | 0.0664999 | 0.0591807 | 19.43 | 465.6 | | banana | 5300 | 2 | 0.287642 | 0.286359 | 39.67 | 741.51 | | titanic | 2201 | 3 | 0.662838 | 0.659142 | 24.62 | 719.5 |

The results can be plotted as scatter plot:

Tensorflow vs Scikit-learn compared on regression

Let's zoom the [0,100] range:

Tensorflow vs Scikit-learn compared on regression Zoomed

Surprise, surprise! The Scikit-learn MLPRegressor was 28 times out of 48 datasets better than Tensorflow! Again, as in classification, the differences aren't huge. In time comparison, by average it is 286 seconds for Scikit-learn and 586 seconds for Tensorflow.

Summary

The Tensorflow library is intended to be used to define Deep Neural Networks. All algorithms are defined by the user manually. The high-level packages as Keras can help to speed up the process of NN construction. The library can be used with a variety of data types: tabular, images, text, audio.
The Scikit-learn package has ready algorithms to be used for classification, regression, clustering ... It works mainly with tabular data.
When comparing Tensorflow vs Scikit-learn on tabular data with classic Multi-Layer Perceptron and computations on CPU, the Scikit-learn package works very well. It has similar or better results and is very fast.
When you are using Scikit-learn and need classic MLP architecture, in my opinion, there is no need to grab Tensorflow (which is, by the way, quite a big package over 500MB on PyPi)

If you are developing Machine Learning models and want to save time, you definitely should try our AutoML mljar-supervised! It is amazing :)