AutoML Mobile Price Use Case

How AutoML Can Help

The Mobile Price dataset describes mobile phones by characteristics such as RAM, internal memory, screen size, and processor clock speed, which are used to predict each device's price range. It shows how individual specifications relate to the cost of a phone, which is useful to manufacturers and buyers alike. MLJAR AutoML simplifies building predictive models for this task by automating model selection, tuning, and evaluation, which helps produce accurate price range predictions with less manual effort. The automation also helps identify the key factors that drive pricing, so stakeholders can make informed decisions about pricing strategy and market positioning.

Business Value

25%
More Accurate

Compared with traditional methods, AutoML can improve price prediction accuracy by up to 25%, leading to pricing strategies that better reflect customer demand and market conditions.

30%
Cost Reduction

By automating the model-building process, MLJAR AutoML can decrease the costs associated with data science and model development by approximately 30%, freeing up resources for other strategic initiatives.

40%
Accelerated Model Development

MLJAR AutoML can reduce the time required to develop and deploy predictive models for mobile prices by up to 40%, allowing businesses to quickly adapt to market changes and optimize pricing strategies.

AutoML Report

MLJAR AutoML generates detailed reports that cover model performance, data analysis, and evaluation metrics. Here are a few examples.

Leaderboard

AutoML used logloss as the performance metric to evaluate the trained models; lower values are better. As the table and graph below show, the Ensemble achieved the lowest logloss and was therefore chosen as the best model.

Best model	name	model_type	metric_type	metric_value	train_time
	1_Baseline	Baseline	logloss	1.38629	0.56
	2_DecisionTree	Decision Tree	logloss	0.614034	14.36
	3_Linear	Linear	logloss	0.222379	7.27
	4_Default_Xgboost	Xgboost	logloss	0.216422	9.33
	5_Default_NeuralNetwork	Neural Network	logloss	0.31977	1.1
	6_Default_RandomForest	Random Forest	logloss	0.437807	7.86
the best	Ensemble	Ensemble	logloss	0.180543	0.24
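
For reference, logloss is the average negative log of the probability that a model assigns to the true class. The sketch below is plain Python with a hypothetical helper, `multiclass_logloss`, that is not part of MLJAR. It shows that a model which always predicts a probability of 0.25 for each of four classes scores ln(4), which rounds to 1.38629, the same value as the Baseline row above. It is an illustration under that assumption, not a description of how MLJAR computes its baseline internally.

```python
import math

def multiclass_logloss(y_true, probabilities, eps=1e-15):
    """Mean negative log-probability assigned to the true class."""
    total = 0.0
    for label, row in zip(y_true, probabilities):
        # Clip the probability so that log(0) can never occur
        p = min(max(row[label], eps), 1.0 - eps)
        total += -math.log(p)
    return total / len(y_true)

# Hypothetical true labels for four price ranges (0..3)
y_true = [0, 1, 2, 3, 1, 2]

# A model that is equally unsure about every class
uniform = [[0.25, 0.25, 0.25, 0.25] for _ in y_true]

baseline_loss = multiclass_logloss(y_true, uniform)
print(round(baseline_loss, 5))  # 1.38629
```

Any model in the leaderboard with a logloss below this value is doing better than uniform guessing over four classes.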

Figure: AutoML performance

Spearman Correlation of Models

The heatmap shows the pairwise Spearman correlation coefficients between the models' predictions. Each cell measures how strongly the predictions of two models are monotonically related: values near 1 mean that the two models rank data points in almost the same order, while values near 0 mean there is little or no relationship. The heatmap therefore shows how similar the models' predictions are to each other, not how accurate each model is.

Figure: Spearman correlation of models
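
To make the coefficient concrete, here is a minimal sketch of Spearman correlation in plain Python: rank both sets of predictions, then compute the Pearson correlation of the ranks. The helper names and the scores are hypothetical and chosen only for illustration; this is not MLJAR's own code.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(a, b):
    """Pearson correlation of two equally long lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def spearman(a, b):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(a), average_ranks(b))

# Hypothetical scores from three models for the same four phones
model_a = [0.10, 0.40, 0.35, 0.80]
model_b = [1.0, 16.0, 9.0, 64.0]   # different scale, same ordering
model_c = [0.9, 0.2, 0.3, 0.1]     # reversed ordering

same_order = spearman(model_a, model_b)
reversed_order = spearman(model_a, model_c)
print(same_order, reversed_order)
```

Because only the ordering matters, `model_a` and `model_b` correlate perfectly even though their values are on very different scales.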

Feature Importance

The plot visualizes the significance of various features across different models. In this heatmap, each cell represents the importance of a specific feature for a particular model, with the color intensity indicating the level of importance. Darker or more intense colors signify higher importance, while lighter colors indicate lower importance. This visualization helps in comparing the contribution of features across multiple models, highlighting which features consistently play a critical role and which are less influential in predictive performance.

Figure: Feature importance across models
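
One common, model-agnostic way to obtain importance scores like these is permutation importance: shuffle one feature at a time and measure how much the model's score drops. The text above does not say exactly how the heatmap values were computed, so the following is only a sketch of the general idea. The toy model, the data, and the function names are all hypothetical.

```python
import random

def accuracy(model, rows, labels):
    """Fraction of rows for which the model predicts the true label."""
    hits = sum(1 for row, label in zip(rows, labels) if model(row) == label)
    return hits / len(labels)

def permutation_importance(model, rows, labels, n_features, repeats=20, seed=0):
    """Mean drop in accuracy after shuffling each feature column."""
    rng = random.Random(seed)
    base = accuracy(model, rows, labels)
    importances = []
    for j in range(n_features):
        drops = []
        for _ in range(repeats):
            column = [row[j] for row in rows]
            rng.shuffle(column)
            # Rebuild the rows with feature j replaced by its shuffled values
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(rows, column)]
            drops.append(base - accuracy(model, shuffled, labels))
        importances.append(sum(drops) / repeats)
    return importances

# Toy data with two columns: [ram, noise]
rows = [[500, 7], [900, 3], [1200, 9], [1500, 1],
        [2500, 4], [3000, 8], [3500, 2], [3900, 6]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

def toy_model(row):
    """Hypothetical model that looks only at the first feature (ram)."""
    return 1 if row[0] > 2000 else 0

importances = permutation_importance(toy_model, rows, labels, n_features=2)
print(importances)
```

The feature the toy model relies on gets a positive score, while the ignored feature scores exactly zero, which is the pattern a darker versus lighter cell in the heatmap expresses.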

Install and import necessary packages

Install the packages with the command:

pip install pandas mljar-supervised

Import the packages into your code:

# import packages
import pandas as pd
from supervised import AutoML

Load training data

Import the dataset containing information about mobile prices.

# read data from csv file
train = pd.read_csv(r"C:\Users\my_notebooks\train.csv")
# display data shape
print(train.shape)
# display first rows
train.head()

(2000, 21)

	battery_power	blue	clock_speed	dual_sim	fc	four_g	int_memory	m_dep	mobile_wt	n_cores	...	px_height	px_width	ram	sc_h	sc_w	talk_time	three_g	touch_screen	wifi	price_range
0	842	0	2.2	0	1	0	7	0.6	188	2	...	20	756	2549	9	7	19	0	0	1	1
1	1021	1	0.5	1	0	1	53	0.7	136	3	...	905	1988	2631	17	3	7	1	1	0	2
2	563	1	0.5	1	2	1	41	0.9	145	5	...	1263	1716	2603	11	2	9	1	1	0	2
3	615	1	2.5	0	0	0	10	0.8	131	6	...	1216	1786	2769	16	8	11	1	0	0	2
4	1821	1	1.2	0	13	1	44	0.6	141	2	...	1208	1212	1411	8	2	15	1	1	0	1

5 rows × 21 columns

Select X,y for ML training

We will split the training set into features (X_train) and target (y_train) variables for model training.

# create X columns list and set y column
x_cols = ["battery_power", "blue", "clock_speed", "dual_sim", "fc", "four_g", "int_memory", "m_dep", "mobile_wt", "n_cores", "pc", "px_height", "px_width", "ram", "sc_h", "sc_w", "talk_time", "three_g", "touch_screen", "wifi"]
y_col = "price_range"
# set input matrix
X_train = train[x_cols]
# set target vector
y_train = train[y_col]
# display data shapes
print(f"X_train shape is {X_train.shape}")
print(f"y_train shape is {y_train.shape}")

X_train shape is (2000, 20)
y_train shape is (2000,)

Fit AutoML

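Next, we train models on our dataset. The fit() method handles model training and tuning automatically. Here total_time_limit=300 caps the overall training time at 300 seconds, and mode="Explain" selects the mode aimed at quick data exploration and explanation.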

# create automl object
automl = AutoML(total_time_limit=300, mode="Explain")
# train automl
automl.fit(X_train, y_train)
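
If you prefer to state the settings explicitly instead of relying on defaults, the constructor arguments can be collected in a dictionary first. The sketch below uses `AutoML` parameters from mljar-supervised (`total_time_limit`, `mode`, `ml_task`, `eval_metric`, `random_state`); the specific values and the helper name `build_automl` are assumptions chosen for this example, not settings taken from the run above.

```python
# Explicit settings for a four-class price range problem (values are assumptions)
automl_params = {
    "total_time_limit": 300,                 # overall training budget in seconds
    "mode": "Explain",                       # other modes include "Perform" and "Compete"
    "ml_task": "multiclass_classification",  # price_range has classes 0..3
    "eval_metric": "logloss",                # the metric shown in the leaderboard
    "random_state": 42,                      # fixed seed for repeatable runs
}

def build_automl(params):
    """Create an AutoML object; requires mljar-supervised to be installed."""
    from supervised import AutoML  # imported lazily so the dict can be inspected without it
    return AutoML(**params)

# Usage (not executed here):
# automl = build_automl(automl_params)
# automl.fit(X_train, y_train)
```

Keeping the parameters in one place makes it easier to rerun the experiment with a longer time limit or a different mode and compare the resulting leaderboards.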

Load test data

Import the dataset on which we will make predictions.

# read data from csv file
test = pd.read_csv(r"C:\Users\my_notebooks\test.csv")
# display data shape
print(test.shape)
# display first rows
test.head()

(1000, 21)

	id	battery_power	blue	clock_speed	dual_sim	fc	four_g	int_memory	m_dep	mobile_wt	...	pc	px_height	px_width	ram	sc_h	sc_w	talk_time	three_g	touch_screen	wifi
0	1	1043	1	1.8	1	14	0	5	0.1	193	...	16	226	1412	3476	12	7	2	0	1	0
1	2	841	1	0.5	1	4	1	61	0.8	191	...	12	746	857	3895	6	0	7	1	0	0
2	3	1807	1	2.8	0	1	0	27	0.9	186	...	4	1270	1366	2396	17	10	10	0	1	1
3	4	1546	0	0.5	1	18	1	25	0.5	96	...	20	295	1752	3893	10	0	7	1	1	0
4	5	1434	0	1.4	0	11	1	49	0.5	108	...	18	749	810	1773	15	8	7	1	0	1

5 rows × 21 columns
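
Before predicting, it can help to confirm that the test set contains every feature used in training and to see which columns are extra. The sketch below uses a hypothetical helper, `compare_columns`, and shortened column lists for brevity; the real frames have more columns.

```python
def compare_columns(train_columns, test_columns, target):
    """Return training features, features missing from test, and extra test columns."""
    features = [c for c in train_columns if c != target]
    missing = [c for c in features if c not in test_columns]
    extra = [c for c in test_columns if c not in features]
    return features, missing, extra

# Shortened, illustrative column lists (not the full 21 columns)
train_columns = ["battery_power", "ram", "px_height", "px_width", "price_range"]
test_columns = ["id", "battery_power", "ram", "px_height", "px_width"]

features, missing, extra = compare_columns(train_columns, test_columns, "price_range")
print("missing from test:", missing)
print("extra in test:", extra)
```

With the real data, the call would be `compare_columns(list(train.columns), list(test.columns), "price_range")`. An empty `missing` list means every training feature is available for prediction.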

Compute predictions

Use the trained AutoML model to predict the price range of each mobile phone in the test set.

# predict with AutoML using only the training feature columns (the id column is left out)
predictions = automl.predict(test[x_cols])
# predicted values
print(predictions)

[3 3 2 3 1 3 3 1 3 0 3 3 0 0 2 0 2 1 3 2 1 3 1 1 3 0 2 0 2 0 2 0 3 0 1 1 3
 1 2 1 1 2 0 0 0 1 0 3 1 2 1 0 3 0 3 1 3 1 1 3 3 2 0 1 1 1 2 3 1 2 1 2 2 3
 ...
 2 1 1 2 2 3 3 0 2 1 2 1 3 1 1 3 0 2 0 0 3 3 2 0 0 0 0 3 2 3 3 0 0 2 1 0 ]
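
A quick way to summarize such an array is to count how many phones fall into each predicted class. The sketch below uses only the first ten values printed above as a sample, so the counts describe that sample and not the full test set.

```python
from collections import Counter

# First ten predicted classes from the output above
sample_predictions = [3, 3, 2, 3, 1, 3, 3, 1, 3, 0]

counts = Counter(sample_predictions)
for label in sorted(counts):
    share = counts[label] / len(sample_predictions)
    print(f"class {label}: {counts[label]} phones ({share:.0%})")
```

If `predictions` is a NumPy array, as the printed output suggests, `Counter(predictions.tolist())` would give the counts for all test rows.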

Display result

To show clearly which price range (Cost) was predicted for each mobile phone (id), we map the numeric classes to readable labels.

# create mapping dict
pred_mapping = {0: 'low cost', 1: 'medium cost', 2: 'high cost', 3: 'very high cost'}
# convert list using comprehension
cat_list = [pred_mapping[x] for x in predictions]
# create data frame and display it
result = pd.DataFrame(data={"Cost": cat_list}, index=test["id"])
result

id	Cost
1	very high cost
2	very high cost
3	high cost
4	very high cost
5	medium cost
...	...
996	high cost
997	medium cost
998	low cost
999	high cost
1000	high cost

1000 rows × 1 columns
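
As an alternative to the list comprehension, the same table can be built with `Series.map` and then exported as CSV. The sketch below uses the first five ids and predictions shown above as sample data; the variable names are chosen for this example, and the in-memory buffer stands in for a real file path.

```python
import io
import pandas as pd

pred_mapping = {0: "low cost", 1: "medium cost", 2: "high cost", 3: "very high cost"}

# First five ids and predicted classes from the outputs above
sample_ids = [1, 2, 3, 4, 5]
sample_predictions = [3, 3, 2, 3, 1]

# Map numeric classes to labels and keep the phone id as the index
result = (
    pd.Series(sample_predictions, index=pd.Index(sample_ids, name="id"), name="Cost")
    .map(pred_mapping)
    .to_frame()
)

# Write the table as CSV text; pass a file path instead of the buffer to save it
buffer = io.StringIO()
result.to_csv(buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

One difference from the comprehension is that `map` turns a class missing from the dictionary into NaN instead of raising a KeyError, which may or may not be what you want.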

Conclusions

MLJAR AutoML offers clear advantages for pricing strategy and market responsiveness in the mobile price use case. By automating the training, tuning, and comparison of models on mobile phone data, it improves price prediction accuracy and supports strategic decision-making. The automation also speeds up model development and lowers the related costs, so businesses can react quickly to market dynamics and customer trends. As a result, they can set more accurate prices and stay competitive in a fast-changing market. Wider adoption of MLJAR AutoML could reshape pricing strategies by combining higher productivity with data-driven insights.

See you soon👋.