AutoML Energy Usage Use Case
How AutoML Can Help
In this use case we work with the Appliances Energy dataset, a large dataset of 19,735 instances, each describing the energy consumption of household appliances over time. By capturing features such as appliance types, usage patterns, and temporal data, it provides a comprehensive picture of how various devices affect total energy use. The challenge is to predict energy consumption patterns and optimize usage to reduce waste and expense. MLJAR AutoML can speed up this process by automating model selection, hyperparameter tuning, and evaluation. With this automation, we can build accurate predictive models that estimate energy consumption, pinpoint the main causes of excessive usage, and provide practical guidance for improving energy efficiency. By leveraging MLJAR AutoML, we can accelerate model development, increase prediction accuracy, and create more effective strategies for managing and lowering household energy use.
Business Value
30%
Faster
Thanks to the automation features of MLJAR AutoML, the effort required to extract meaningful insights from energy data can be reduced by around 30%. This enables faster decision-making around waste reduction, energy consumption optimization, and sustainability initiatives.
40%
More Efficient
Large energy consumption datasets can be analyzed up to 40% faster with MLJAR AutoML, making it possible to find inefficiencies and optimization opportunities more quickly. This efficiency gain translates into faster roll-out of energy-saving measures and timely adjustment of energy management plans.
25%
Better Accuracy
Compared to conventional techniques, the advanced algorithms of MLJAR AutoML can increase the accuracy of energy usage forecasts by up to 25%. This improved accuracy supports more precise energy demand forecasting, which lowers operating costs and enables more efficient resource allocation.
40%
Quicker Model Development
By automating model selection and tuning, MLJAR AutoML saves roughly 40% of the time and resources needed to build predictive models. These savings can be redirected to other important projects, such as modernizing infrastructure or developing new energy-related technology.
AutoML Report
MLJAR AutoML generates comprehensive reports that provide deep insights into model performance, data analysis, and evaluation metrics. Here are a few examples from this use case.
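For context, these reports are produced automatically during training; nothing extra has to be configured. Below is a minimal sketch of how they can be reached, assuming an illustrative results_path folder name (the report() helper may not exist in older mljar-supervised releases):

from supervised import AutoML

# results_path controls where the generated report lives; "AutoML_energy" is
# just an example name (if omitted, a folder such as AutoML_1 is created)
automl = AutoML(results_path="AutoML_energy", mode="Explain")

# after automl.fit(X_train, y_train) (shown later in this post), the report
# files (leaderboard, per-model READMEs, plots) appear under AutoML_energy/
# and, in recent versions, can be rendered inline in a notebook:
# automl.report()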
Leaderboard
AutoML used rmse as the performance metric to evaluate the trained models. As the table and plot below show, 3_Default_Xgboost was selected as the best model.
Best model | name | model_type | metric_type | metric_value | train_time
---|---|---|---|---|---
 | 1_Baseline | Baseline | rmse | 106.587 | 0.59
 | 2_DecisionTree | Decision Tree | rmse | 101.641 | 8.03
the best | 3_Default_Xgboost | Xgboost | rmse | 75.5399 | 19.57
 | 4_Default_NeuralNetwork | Neural Network | rmse | 89.3408 | 2.18
 | 5_Default_RandomForest | Random Forest | rmse | 98.1887 | 5.11
 | Ensemble | Ensemble | rmse | 75.5399 | 0.13
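The same leaderboard can also be pulled back into Python once training is finished. A short sketch, assuming the fitted automl object created later in this post (get_leaderboard() is available in recent mljar-supervised versions):

# fetch the leaderboard as a pandas DataFrame from the fitted AutoML object
leaderboard = automl.get_leaderboard()

# sort by the evaluation metric so the best model appears first
print(leaderboard.sort_values("metric_value")[
    ["name", "model_type", "metric_type", "metric_value", "train_time"]
])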
AutoML Performance
Spearman Correlation of Models
The figure depicts the relationships between the trained models based on their rank-order correlation. Spearman correlation measures how well the relationship between two models' predictions can be described by a monotonic function, so it reflects how similarly the models rank the samples. Higher correlation values indicate stronger agreement, meaning the models behave similarly across the data.
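To make the idea concrete, here is a small illustrative sketch of how a rank-order correlation between two models' predictions could be computed with SciPy (installed alongside scikit-learn). The prediction vectors below are made-up numbers, not values from the report:

import numpy as np
from scipy.stats import spearmanr

# hypothetical predictions of two models on the same five test samples
preds_model_a = np.array([46.0, 180.3, 48.5, 188.9, 58.2])
preds_model_b = np.array([50.1, 175.0, 52.3, 190.4, 61.0])

# Spearman correlation compares rank orderings: a value close to 1 means
# both models order the samples almost identically
rho, p_value = spearmanr(preds_model_a, preds_model_b)
print(f"Spearman correlation: {rho:.3f}")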
Feature Importance
The plot visualizes the significance of various features across different models. In this heatmap, each cell represents the importance of a specific feature for a particular model, with the color intensity indicating the level of importance. Darker or more intense colors signify higher importance, while lighter colors indicate lower importance. This visualization helps in comparing the contribution of features across multiple models, highlighting which features consistently play a critical role and which are less influential in predictive performance.
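MLJAR computes these importance values for each trained model; a rough, standalone approximation of the same idea using scikit-learn's permutation importance is sketched below. It assumes the X_train/X_test/y_train/y_test split built later in this post and drops the non-numeric date column only for this illustration:

from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

# fit a simple stand-in model on the numeric features
X_tr = X_train.drop(columns=["date"])
X_te = X_test.drop(columns=["date"])
model = RandomForestRegressor(n_estimators=50, random_state=42)
model.fit(X_tr, y_train)

# permutation importance: how much the test score drops when a feature is shuffled
perm = permutation_importance(model, X_te, y_test, n_repeats=5, random_state=42)
top = sorted(zip(X_tr.columns, perm.importances_mean), key=lambda t: -t[1])[:5]
for name, score in top:
    print(f"{name}: {score:.4f}")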
Install and import necessary packages
Install the packages with the command:
pip install pandas mljar-supervised scikit-learn
Import the packages into your code:
# import packages
import pandas as pd
from sklearn.model_selection import train_test_split
from supervised import AutoML
from sklearn.metrics import mean_squared_error
Load data
Import the dataset containing information about energy appliances.
# read data from csv file
df = pd.read_csv(r"C:\Users\my_notebooks\energydata_complete.csv")
# display data shape
print(df.shape)
# display first rows
df.head()
(19735, 29)
 | date | Appliances | lights | T1 | RH_1 | T2 | RH_2 | T3 | RH_3 | T4 | ... | T9 | RH_9 | T_out | Press_mm_hg | RH_out | Windspeed | Visibility | Tdewpoint | rv1 | rv2
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
0 | 2016-01-11 17:00:00 | 60 | 30 | 19.89 | 47.596667 | 19.2 | 44.790000 | 19.79 | 44.730000 | 19.000000 | ... | 17.033333 | 45.53 | 6.600000 | 733.5 | 92.0 | 7.000000 | 63.000000 | 5.3 | 13.275433 | 13.275433 |
1 | 2016-01-11 17:10:00 | 60 | 30 | 19.89 | 46.693333 | 19.2 | 44.722500 | 19.79 | 44.790000 | 19.000000 | ... | 17.066667 | 45.56 | 6.483333 | 733.6 | 92.0 | 6.666667 | 59.166667 | 5.2 | 18.606195 | 18.606195 |
2 | 2016-01-11 17:20:00 | 50 | 30 | 19.89 | 46.300000 | 19.2 | 44.626667 | 19.79 | 44.933333 | 18.926667 | ... | 17.000000 | 45.50 | 6.366667 | 733.7 | 92.0 | 6.333333 | 55.333333 | 5.1 | 28.642668 | 28.642668 |
3 | 2016-01-11 17:30:00 | 50 | 40 | 19.89 | 46.066667 | 19.2 | 44.590000 | 19.79 | 45.000000 | 18.890000 | ... | 17.000000 | 45.40 | 6.250000 | 733.8 | 92.0 | 6.000000 | 51.500000 | 5.0 | 45.410389 | 45.410389 |
4 | 2016-01-11 17:40:00 | 60 | 40 | 19.89 | 46.333333 | 19.2 | 44.530000 | 19.79 | 45.000000 | 18.890000 | ... | 17.000000 | 45.40 | 6.133333 | 733.9 | 92.0 | 5.666667 | 47.666667 | 4.9 | 10.084097 | 10.084097 |
5 rows × 29 columns
Select features and target
We will split the dataset into features (X) and target (y) variables for model training.
# create X columns list and set y column
x_cols = ["date", "lights", "T1", "RH_1", "T2", "RH_2", "T3", "RH_3", "T4", "RH_4", "T5", "RH_5", "T6", "RH_6", "T7", "RH_7", "T8", "RH_8", "T9", "RH_9", "T_out", "Press_mm_hg", "RH_out", "Windspeed", "Visibility", "Tdewpoint", "rv1", "rv2"]
y_col = "Appliances"
# set input matrix
X = df[x_cols]
# set target vector
y = df[y_col]
# display data shapes
print(f"X shape is {X.shape}")
print(f"y shape is {y.shape}")
X shape is (19735, 28)
y shape is (19735,)
Split dataframe to train/test
We split the dataframe into train and test sets so that we have separate data for training the model and for evaluating it. This ensures we can assess the model's performance on unseen data.
# split data
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.95, shuffle=True, random_state=42)
# display data shapes
print(f"X_train shape {X_train.shape}")
print(f"X_test shape {X_test.shape}")
print(f"y_train shape {y_train.shape}")
print(f"y_test shape {y_test.shape}")
X_train shape (18748, 28)
X_test shape (987, 28)
y_train shape (18748,)
y_test shape (987,)
Fit AutoML
We need to train a model for our dataset. The fit() method will handle the model training and optimization automatically.
# create automl object
automl = AutoML(total_time_limit=300, mode="Explain")
# train automl
automl.fit(X_train, y_train)
Compute predictions
Use the trained AutoML model to make predictions on test data.
# predict with AutoML
predictions = automl.predict(X_test)
# predicted values
print(predictions)
[ 45.981655 180.34007 48.46944 188.92409 58.201576 119.89384
156.58505 98.59373 46.404278 80.04506 163.35693 73.77315
...
85.225655 71.46049 265.62012 65.11302 66.18507 55.80021
57.64439 47.538292 52.368004]
Display result
We want to see our predictions clearly, so let's put them into a data frame.
# create data frame and display it
result = pd.DataFrame(data={"Prediction": predictions})
result
 | Prediction
---|---
0 | 45.981655 |
1 | 180.340073 |
2 | 48.469440 |
3 | 188.924088 |
4 | 58.201576 |
... | ... |
982 | 66.185066 |
983 | 55.800209 |
984 | 57.644390 |
985 | 47.538292 |
986 | 52.368004 |
987 rows × 1 columns
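Since rmse was the optimized metric, it is also worth checking the error on the held-out test set. A quick sketch using scikit-learn's mean_squared_error imported earlier:

import numpy as np
from sklearn.metrics import mean_squared_error

# root mean squared error between true and predicted appliance energy use
rmse = np.sqrt(mean_squared_error(y_test, predictions))
print(f"Test RMSE: {rmse:.2f}")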
Conclusions
When applied to energy usage forecasting, MLJAR AutoML provides a number of benefits for understanding and optimizing energy consumption trends. By automating the difficult steps of data analysis and model building, it improves the accuracy of energy consumption estimates and offers deeper insights into the variables that influence energy usage. Thanks to this automation, large datasets can be handled efficiently, revealing patterns and anomalies that conventional approaches may miss. By delivering more accurate forecasts and actionable insights, MLJAR AutoML can enhance operational efficiency, promote sustainable energy practices, and drive more effective energy strategies as it becomes more integrated into energy management workflows.
See you soon👋.