Use case on the phishing-websites dataset

Dataset: PhishingWebsites

Machine Learning Task: Binary classification

This is the Phishing Websites dataset. Although many articles on predicting phishing websites have been published in recent years, no reliable training dataset has been made publicly available, perhaps because the literature does not agree on the definitive features that characterize phishing webpages. This makes it difficult to build a dataset that covers all possible features. In this dataset, the authors highlight the features that have proved sound and effective in predicting phishing websites.

Available at OpenML:

Category: Web

# Rows: 11,055

# Columns: 30

Target: Result


Nominal: having_IP_Address, URL_Length, Shortining_Service, having_At_Symbol, double_slash_redirecting, Prefix_Suffix, having_Sub_Domain, SSLfinal_State, Domain_registeration_length, Favicon, port, HTTPS_token, Request_URL, URL_of_Anchor, Links_in_tags, SFH, Submitting_to_email, Abnormal_URL, Redirect, on_mouseover, ...


Area Under ROC Curve (AUC)

[Figure: Phishing Websites AUC]
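As a rough illustration of what the AUC figure measures, here is a minimal plain-Python sketch of ROC AUC using the rank-sum (Mann-Whitney U) formulation: the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one, with ties counting one half. It is not the code used to produce the figure. It assumes labels are encoded as 0/1 with 1 as the positive class (map the dataset's own Result values accordingly), and the labels and scores in the usage line are made up.

```python
def roc_auc(y_true, scores):
    """ROC AUC via the rank-sum (Mann-Whitney U) formulation.

    Assumes y_true holds 0/1 labels (1 = positive class) and scores are
    real-valued, higher meaning "more likely positive". Tied scores share
    their average rank, which counts each positive/negative tie as 1/2.
    """
    if len(y_true) != len(scores):
        raise ValueError("y_true and scores must have the same length")
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j over the run of equal scores starting at sorted position i.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = n - n_pos  # every non-1 label is treated as negative
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    # U statistic of the positives, normalized by the number of pos/neg pairs.
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Hypothetical labels and classifier scores, for illustration only.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The rank-based form sorts once instead of comparing every positive/negative pair, which keeps it practical at the dataset's 11,055 rows.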

Accuracy (ACC)

[Figure: Phishing Websites ACC]
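Accuracy is simply the fraction of examples whose predicted label equals the true label. A minimal plain-Python sketch, assuming hard label predictions; the example labels below are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical example: always predicting the majority class.
print(accuracy([1, 1, 1, 1, 0], [1, 1, 1, 1, 1]))  # 0.8
```

The usage line shows why accuracy alone can flatter a model: a classifier that never predicts the minority class still scores 0.8 here.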

Balanced Accuracy (BALACC)

[Figure: Phishing Websites BALACC]
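Balanced accuracy averages the recall (true-positive rate) of each class, so both classes count equally regardless of how many examples each has. A minimal plain-Python sketch with hypothetical labels:

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


# Same hypothetical majority-class predictor as in the accuracy sketch.
print(balanced_accuracy([1, 1, 1, 1, 0], [1, 1, 1, 1, 1]))  # 0.5
```

On that majority-class predictor, balanced accuracy drops to 0.5 (chance level for two classes), because the minority class has a recall of 0.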

Cross-Entropy Loss (LOGLOSS)

[Figure: Phishing Websites LOGLOSS]
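For binary classification, cross-entropy loss is $-\frac{1}{N}\sum_i \left[y_i \log p_i + (1 - y_i)\log(1 - p_i)\right]$, where $p_i$ is the predicted probability of the positive class. Lower is better. A minimal plain-Python sketch, assuming 0/1 labels; the clipping constant `eps` is an assumption added to avoid `log(0)`, and the example values are hypothetical.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary cross-entropy; p_pred are positive-class probabilities."""
    if len(y_true) != len(p_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip so log() stays finite
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# Hypothetical labels and predicted probabilities.
print(round(log_loss([1, 0], [0.9, 0.2]), 4))  # 0.1643
```

Unlike accuracy, this metric penalizes confident mistakes heavily: a predicted probability near 0 for a true positive costs about 34.5 with the clipping above.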
