AutoML Energy Usage Use Case

How AutoML Can Help

In this use case we work with the Appliances Energy dataset, which contains 19,735 instances, each describing household energy consumption over time. The dataset captures features such as appliance and light energy use, room temperature and humidity readings, and outdoor weather conditions, giving a comprehensive picture of what drives total energy use. The challenge is to predict consumption patterns and to optimize usage so that waste and costs are reduced. MLJAR AutoML can speed this process up by automating model selection, hyperparameter tuning, and evaluation. This automation makes it possible to build accurate predictive models that estimate energy consumption, pinpoint the main drivers of excessive usage, and suggest practical ways to improve energy efficiency. By leveraging MLJAR AutoML, we can accelerate model development, increase prediction accuracy, and design more effective strategies for managing and reducing household energy use.

Business Value

30%
Faster

Thanks to the automation features of MLJAR AutoML, the effort required to extract meaningful insights from energy data can be reduced by around 30%. This speeds up decision-making on waste reduction, energy consumption optimization, and the rollout of sustainability programs.

40%
More Efficient

Large energy consumption datasets may be analyzed up to 40% faster with MLJAR AutoML, which makes it possible to find inefficiencies and optimization opportunities more quickly. This increase in efficiency results in the quicker application of energy-saving techniques and the prompt modification of energy management plans.

25%
Better Accuracy

Compared to conventional techniques, the advanced algorithms of MLJAR AutoML can increase the accuracy of energy usage projections by up to 25%. This improvement supports more reliable energy demand forecasts, which lowers operating costs and enables more efficient resource allocation.

40%
Quicker Model Development

MLJAR AutoML saves roughly 40% of the time and resources needed to build predictive models by automating model selection and tuning. The freed-up time and budget can go toward other important projects, such as modernizing infrastructure or developing new energy technologies.

AutoML Report

MLJAR AutoML generates comprehensive reports that give deep insight into model performance, data analysis, and evaluation metrics. Here are a few examples from the report.

Leaderboard

AutoML used rmse as the performance metric to evaluate the trained models. As the table and plot below show, 3_Default_Xgboost was selected as the best model.

name                      model_type       metric_type   metric_value   train_time
1_Baseline                Baseline         rmse          106.587        0.59
2_DecisionTree            Decision Tree    rmse          101.641        8.03
3_Default_Xgboost (best)  Xgboost          rmse          75.5399        19.57
4_Default_NeuralNetwork   Neural Network   rmse          89.3408        2.18
5_Default_RandomForest    Random Forest    rmse          98.1887        5.11
Ensemble                  Ensemble         rmse          75.5399        0.13
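
The same leaderboard can also be retrieved programmatically once training is complete. Below is a minimal sketch, assuming a fitted automl object (created in the walkthrough later in this post) and the get_leaderboard() helper exposed by recent versions of mljar-supervised; the metric itself can be set explicitly with the eval_metric argument when the AutoML object is created.

# sketch: assumes `automl` has already been fitted (see the walkthrough below)
# the metric can be set explicitly, e.g. AutoML(mode="Explain", eval_metric="rmse")
leaderboard = automl.get_leaderboard()  # returns a pandas DataFrame
print(leaderboard.sort_values("metric_value"))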

AutoML Performance

(Figure: AutoML performance plot.)

Spearman Correlation of Models

The figure shows how closely the different models are related, based on the rank-order (Spearman) correlation of their predictions. Spearman correlation measures how well the relationship between two sets of values can be described by a monotonic function, so it captures whether two models rank the observations in a similar order. Higher correlation values indicate stronger agreement, meaning the models behave very similarly.

(Figure: Spearman correlation of models.)
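
To make the idea behind this heatmap concrete, the sketch below computes the Spearman rank correlation between two hypothetical prediction vectors with scipy.stats.spearmanr; the numbers are made up for illustration, and the report performs this comparison across all trained models automatically.

import numpy as np
from scipy.stats import spearmanr

# illustrative only: two hypothetical prediction vectors from different models
preds_model_a = np.array([60.2, 155.4, 48.1, 190.3, 58.7])
preds_model_b = np.array([65.0, 140.2, 50.3, 181.9, 61.1])

# Spearman correlation compares the rank ordering of the two prediction sets;
# a value close to 1 means the models rank the observations very similarly
rho, p_value = spearmanr(preds_model_a, preds_model_b)
print(f"Spearman correlation: {rho:.3f}")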

Feature Importance

The plot visualizes the significance of various features across different models. In this heatmap, each cell represents the importance of a specific feature for a particular model, with the color intensity indicating the level of importance. Darker or more intense colors signify higher importance, while lighter colors indicate lower importance. This visualization helps in comparing the contribution of features across multiple models, highlighting which features consistently play a critical role and which are less influential in predictive performance.

(Figure: Feature importance across models.)
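
MLJAR computes these importances internally while building the report, but the underlying idea can be illustrated with scikit-learn's permutation_importance: shuffle one feature at a time and measure how much the model's score drops. The sketch below is only an illustration of that concept, using a plain RandomForestRegressor and the train/test split created later in this post (with the non-numeric date column dropped); it does not reproduce the report's exact computation.

from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

# illustration only: assumes X_train, X_test, y_train, y_test from the walkthrough below
numeric_cols = [c for c in X_train.columns if c != "date"]

# fit a simple model on the numeric features
model = RandomForestRegressor(n_estimators=50, random_state=42)
model.fit(X_train[numeric_cols], y_train)

# permutation importance: how much does the score drop when a feature is shuffled?
perm = permutation_importance(
    model, X_test[numeric_cols], y_test, n_repeats=5, random_state=42
)

# print the five most important features
ranked = sorted(zip(numeric_cols, perm.importances_mean), key=lambda t: t[1], reverse=True)
for name, importance in ranked[:5]:
    print(f"{name}: {importance:.4f}")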

Install and import necessary packages

Install the packages with the command:

pip install pandas mljar-supervised scikit-learn

Import the packages into your code:

# import packages
import pandas as pd
from sklearn.model_selection import train_test_split
from supervised import AutoML
from sklearn.metrics import mean_squared_error  # regression metric for evaluating predictions

Load data

Import the dataset containing information about energy appliances.

# read data from csv file
df = pd.read_csv(r"C:\Users\my_notebooks\energydata_complete.csv")
# display data shape
print(df.shape)
# display first rows
df.head()
(19735, 29)
date Appliances lights T1 RH_1 T2 RH_2 T3 RH_3 T4 ... T9 RH_9 T_out Press_mm_hg RH_out Windspeed Visibility Tdewpoint rv1 rv2
0 2016-01-11 17:00:00 60 30 19.89 47.596667 19.2 44.790000 19.79 44.730000 19.000000 ... 17.033333 45.53 6.600000 733.5 92.0 7.000000 63.000000 5.3 13.275433 13.275433
1 2016-01-11 17:10:00 60 30 19.89 46.693333 19.2 44.722500 19.79 44.790000 19.000000 ... 17.066667 45.56 6.483333 733.6 92.0 6.666667 59.166667 5.2 18.606195 18.606195
2 2016-01-11 17:20:00 50 30 19.89 46.300000 19.2 44.626667 19.79 44.933333 18.926667 ... 17.000000 45.50 6.366667 733.7 92.0 6.333333 55.333333 5.1 28.642668 28.642668
3 2016-01-11 17:30:00 50 40 19.89 46.066667 19.2 44.590000 19.79 45.000000 18.890000 ... 17.000000 45.40 6.250000 733.8 92.0 6.000000 51.500000 5.0 45.410389 45.410389
4 2016-01-11 17:40:00 60 40 19.89 46.333333 19.2 44.530000 19.79 45.000000 18.890000 ... 17.000000 45.40 6.133333 733.9 92.0 5.666667 47.666667 4.9 10.084097 10.084097

5 rows × 29 columns

Select features and target

We will split the dataset into features (X) and target (y) variables for model training.

# create X columns list and set y column
x_cols = ["date", "lights", "T1", "RH_1", "T2", "RH_2", "T3", "RH_3", "T4", "RH_4", "T5", "RH_5", "T6", "RH_6", "T7", "RH_7", "T8", "RH_8", "T9", "RH_9", "T_out", "Press_mm_hg", "RH_out", "Windspeed", "Visibility", "Tdewpoint", "rv1", "rv2"]
y_col = "Appliances"
# set input matrix
X = df[x_cols]
# set target vector
y = df[y_col]
# display data shapes
print(f"X shape is {X.shape}")
print(f"y shape is {y.shape}")
X shape is (19735, 28)
y shape is (19735,)

Split dataframe to train/test

We split the dataframe into separate train and test sets, so that the model can be trained on one portion of the data and evaluated on data it has never seen.

# split data
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.95, shuffle=True, random_state=42)
# display data shapes
print(f"X_train shape {X_train.shape}")
print(f"X_test shape {X_test.shape}")
print(f"y_train shape {y_train.shape}")
print(f"y_test shape {y_test.shape}")
X_train shape (18748, 28)
X_test shape (987, 28)
y_train shape (18748,)
y_test shape (987,)

Fit AutoML

We need to train a model for our dataset. The fit() method will handle the model training and optimization automatically.

# create automl object
automl = AutoML(total_time_limit=300, mode="Explain")
# train automl
automl.fit(X_train, y_train)
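
Once training finishes, mljar-supervised saves the leaderboard, plots, and per-model reports into a results directory (AutoML_1 by default). If your version of the package provides the report() method, the report shown earlier in this post can also be rendered directly from the fitted object; a short sketch:

# render the AutoML report (leaderboard, performance and importance plots) in a notebook;
# the same artifacts are also written to the results folder on disk
automl.report()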

Compute predictions

Use the trained AutoML model to make predictions on test data.

# predict with AutoML
predictions = automl.predict(X_test)
# predicted values
print(predictions)
[ 45.981655 180.34007   48.46944  188.92409   58.201576 119.89384
 156.58505   98.59373   46.404278  80.04506  163.35693   73.77315
  ...
  85.225655  71.46049  265.62012   65.11302   66.18507   55.80021
  57.64439   47.538292  52.368004]
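
To judge how well these predictions generalize, we can compare them against the held-out targets using the same RMSE metric that AutoML optimized on the leaderboard. A short sketch, assuming the y_test split and the predictions array from above:

import numpy as np
from sklearn.metrics import mean_squared_error

# RMSE on the held-out test set (the same metric as on the AutoML leaderboard)
rmse = np.sqrt(mean_squared_error(y_test, predictions))
print(f"Test RMSE: {rmse:.2f}")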

Display result

Let's display the predictions in a clear, tabular form.

# create data frame and display it  
result = pd.DataFrame(data={"Prediction": predictions})
result
Prediction
0 45.981655
1 180.340073
2 48.469440
3 188.924088
4 58.201576
... ...
982 66.185066
983 55.800209
984 57.644390
985 47.538292
986 52.368004

987 rows × 1 columns

Conclusions

Applied to energy usage forecasting, MLJAR AutoML offers a number of benefits for understanding and optimizing consumption patterns. By automating the demanding steps of data analysis and model building, it improves the accuracy of energy consumption estimates and provides deeper insight into the variables that drive usage. Thanks to this automation, large datasets can be handled efficiently, surfacing patterns and anomalies that conventional approaches can miss. As MLJAR AutoML becomes more integrated into energy management workflows, its more accurate forecasts and actionable insights can enhance operational efficiency, promote sustainable energy practices, and drive more effective energy strategies.

See you soon👋.