AutoML Housing Use Case

Discover MLJAR AutoML's transformative potential in the housing market. See how automated machine learning enhances property analysis, refines price predictions, and improves market trend insights.

This notebook was created with MLJAR Studio

MLJAR Studio is a Python code editor with interactive code recipes and a local AI assistant.
The code recipes UI is displayed at the top of each code cell.


How AutoML Can Help

MLJAR AutoML simplifies housing market analysis by automating the selection and training of machine learning models. It can help with tasks such as predicting property prices, identifying market trends, and evaluating investment opportunities. The house-prices dataset used here contains 1,460 samples describing property characteristics such as lot and living area, number of bedrooms and bathrooms, neighborhood, and year of construction. We use these features to predict each house's sale price (the SalePrice column) by learning the relationship between property characteristics and market value. By automating this kind of analysis, AutoML helps companies produce accurate forecasts, make informed decisions, and act on market opportunities with confidence.

Let's make some predictions 🤗!

# import packages
import pandas as pd
from sklearn.model_selection import train_test_split
from supervised import AutoML

Load data

We will read the house prices data from a CSV file hosted on GitHub into a pandas DataFrame.

# load example dataset
df = pd.read_csv("https://raw.githubusercontent.com/pplonski/datasets-for-start/master/house_prices/data.csv")
# display DataFrame shape
print(f"Loaded data shape {df.shape}")
# display first rows
df.head()
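Before training, it can help to see which columns contain missing values. The sketch below is illustrative only: it uses a small hypothetical DataFrame (`toy_df`) with a few column names borrowed from this dataset, and the same `isna().sum()` pattern can be applied to `df`.

```python
import numpy as np
import pandas as pd

# hypothetical toy data; column names are borrowed from the house prices dataset
toy_df = pd.DataFrame({
    "LotFrontage": [65.0, np.nan, 68.0, np.nan],
    "PoolQC": [np.nan, np.nan, np.nan, "Gd"],
    "SalePrice": [208500, 181500, 223500, 140000],
})

# count missing values per column, keep only columns that have any,
# and list the most incomplete columns first
missing = toy_df.isna().sum()
missing = missing[missing > 0].sort_values(ascending=False)
print(missing)
```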

Split DataFrame into train/test

We split the DataFrame into a training set (95% of the rows) and a test set (the remaining 5%, or 73 of the 1,460 rows). The model is trained only on the training set, so the test set lets us check how it performs on data it has not seen. Setting random_state makes the shuffled split reproducible.

# split data
train, test = train_test_split(df, train_size=0.95, shuffle=True, random_state=42)
# display data shapes
print(f"All data shape {df.shape}")
print(f"Train shape {train.shape}")
print(f"Test shape {test.shape}")
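To see what the split guarantees, here is a minimal sketch on a hypothetical 100-row toy DataFrame using the same `train_test_split` settings as above. It checks that every row lands in exactly one of the two sets and that a fixed `random_state` reproduces the same split.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# hypothetical toy DataFrame with 100 rows
toy_df = pd.DataFrame({"LotArea": range(100), "SalePrice": range(100, 200)})

# same settings as the notebook: 95% train, shuffled, fixed seed
train_a, test_a = train_test_split(toy_df, train_size=0.95, shuffle=True, random_state=42)
train_b, test_b = train_test_split(toy_df, train_size=0.95, shuffle=True, random_state=42)

# every row is in exactly one of the two sets
no_overlap = set(train_a.index).isdisjoint(test_a.index)
covers_all = len(train_a) + len(test_a) == len(toy_df)

# the same random_state yields the same split on every run
reproducible = train_a.index.equals(train_b.index) and test_a.index.equals(test_b.index)

print(f"Train rows: {len(train_a)}, test rows: {len(test_a)}")
print(f"Disjoint: {no_overlap}, complete: {covers_all}, reproducible: {reproducible}")
```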

Select X, y for ML training

We separate the training set into the feature matrix (X) and the target vector (y), which holds the SalePrice values we want to predict.

# create X columns list and set y column
# ("Id" is only a row identifier, not a property feature, so it is left out)
x_cols = ["MSSubClass", "MSZoning", "LotFrontage", "LotArea", "Street", "Alley", "LotShape", "LandContour", "Utilities", "LotConfig", "LandSlope", "Neighborhood", "Condition1", "Condition2", "BldgType", "HouseStyle", "OverallQual", "OverallCond", "YearBuilt", "YearRemodAdd", "RoofStyle", "RoofMatl", "Exterior1st", "Exterior2nd", "MasVnrType", "MasVnrArea", "ExterQual", "ExterCond", "Foundation", "BsmtQual", "BsmtCond", "BsmtExposure", "BsmtFinType1", "BsmtFinSF1", "BsmtFinType2", "BsmtFinSF2", "BsmtUnfSF", "TotalBsmtSF", "Heating", "HeatingQC", "CentralAir", "Electrical", "1stFlrSF", "2ndFlrSF", "LowQualFinSF", "GrLivArea", "BsmtFullBath", "BsmtHalfBath", "FullBath", "HalfBath", "BedroomAbvGr", "KitchenAbvGr", "KitchenQual", "TotRmsAbvGrd", "Functional", "Fireplaces", "FireplaceQu", "GarageType", "GarageYrBlt", "GarageFinish", "GarageCars", "GarageArea", "GarageQual", "GarageCond", "PavedDrive", "WoodDeckSF", "OpenPorchSF", "EnclosedPorch", "3SsnPorch", "ScreenPorch", "PoolArea", "PoolQC", "Fence", "MiscFeature", "MiscVal", "MoSold", "YrSold", "SaleType", "SaleCondition"]
y_col = "SalePrice"
# set input matrix
X = train[x_cols]
# set target vector
y = train[y_col]
# display data shapes
print(f"X shape is {X.shape}")
print(f"y shape is {y.shape}")
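Typing out dozens of column names by hand is error-prone. A minimal alternative, assuming every column except the `Id` identifier and the target should be used as a feature, is to build the feature list from the DataFrame's columns. The sketch below runs on a hypothetical toy DataFrame; `excluded` is an illustrative name, not part of any library.

```python
import pandas as pd

# hypothetical toy frame with the same kind of columns as the housing data
toy_df = pd.DataFrame({
    "Id": [1, 2, 3],
    "LotArea": [8450, 9600, 11250],
    "Neighborhood": ["CollgCr", "Veenker", "CollgCr"],
    "SalePrice": [208500, 181500, 223500],
})

y_col = "SalePrice"
# columns that should not be features: the row identifier and the target itself
excluded = {"Id", y_col}
# keep the original column order for everything else
x_cols = [c for c in toy_df.columns if c not in excluded]

X = toy_df[x_cols]
y = toy_df[y_col]
print(x_cols)
print(f"X shape is {X.shape}, y shape is {y.shape}")
```

Deriving the list this way also keeps the target out of the features automatically, which avoids accidentally training on the value being predicted.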

Fit AutoML

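Next, we train models on our dataset. The fit() method handles model training and optimization automatically. Here we use Compete mode, which targets the best possible predictive performance, and limit the total training time to 300 seconds with total_time_limit=300.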

# create automl object
automl = AutoML(total_time_limit=300, mode="Compete")
# train automl
automl.fit(X, y)
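The fit() call hides a search over many candidate models. As a rough illustration of the underlying idea only (this is not MLJAR's implementation, which also tunes hyperparameters), the sketch below uses plain scikit-learn: it scores three hypothetical candidate models with 5-fold cross-validation on synthetic data and keeps the one with the lowest RMSE.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# synthetic regression data stands in for the housing features
X_toy, y_toy = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# a small, hand-picked set of candidates; AutoML explores a much larger space
candidates = {
    "ridge": Ridge(alpha=1.0),
    "decision_tree": DecisionTreeRegressor(max_depth=5, random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

scores = {}
for name, model in candidates.items():
    # scikit-learn reports negated RMSE so that "higher is better"; flip the sign back
    cv = cross_val_score(model, X_toy, y_toy, cv=5, scoring="neg_root_mean_squared_error")
    scores[name] = -cv.mean()

# the best candidate is the one with the lowest average validation RMSE
best_name = min(scores, key=scores.get)
print(scores)
print(f"Best model: {best_name}")
```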

Predict

Generate sale price predictions for the test set with the trained AutoML model.

# predict with AutoML
# pass only the feature columns used for training (test also contains SalePrice)
predictions = automl.predict(test[x_cols])
# predicted values
print(predictions)
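Because the test set still holds the true SalePrice values, the predictions can be scored. A minimal sketch with hypothetical prices is shown below; in this notebook, `y_true` would correspond to `test[y_col]` and `y_pred` to `predictions`. RMSE and MAE are both expressed in the same units as the price, which makes them easy to interpret.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# hypothetical true and predicted sale prices
y_true = np.array([200000, 150000, 300000])
y_pred = np.array([210000, 140000, 290000])

# RMSE: square root of the mean squared error; penalizes large misses more
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
# MAE: average absolute difference between prediction and true price
mae = mean_absolute_error(y_true, y_pred)
print(f"RMSE: {rmse:.0f}, MAE: {mae:.0f}")  # RMSE: 10000, MAE: 10000
```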

Conclusions

Adopting MLJAR AutoML in the housing industry can bring substantial benefits. As this notebook shows, a few lines of code are enough to train and optimize models that predict sale prices from property characteristics. The same approach supports property valuation, market trend analysis, and more personalized customer experiences. By automating complex data analysis and operational tasks, AutoML lets real estate professionals make informed decisions and spend more time on strategic initiatives. As this technology advances, its role in improving efficiency and accuracy in the housing sector is likely to keep growing, paving the way for a smarter, more responsive real estate market.

See you soon 👋.


Packages used in automl-housing-use-case.ipynb

List of packages that need to be installed in your Python environment to run this notebook. Please note that MLJAR Studio automatically installs and imports required modules for you.

pandas>=1.0.0

scikit-learn>=1.5.0

mljar-supervised>=1.1.7