Use case on the cnae-9 dataset

Dataset cnae-9

Machine Learning Task: Multiclass classification

This is the cnae-9 data set. It contains 1,080 documents of free-text business descriptions of Brazilian companies, categorized into a subset of 9 categories. The original texts were preprocessed to obtain the current data set: first, only letters were kept, and then prepositions were removed from the texts. Next, the words were transformed into their canonical form. Finally, each document was represented as a vector, where the weight of each word is its frequency in the document. This data set is highly sparse.
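
The preprocessing described above can be sketched roughly as follows. This is a minimal illustration, not the pipeline actually used to build cnae-9: the preposition list, the function names, and the two sample documents are all made up for the example, and the canonical-form (lemmatization) step is left out because the source does not say how it was done.

```python
import re
from collections import Counter

# Hypothetical, tiny preposition list for illustration only; the list
# actually used to build cnae-9 is not given in the source.
PREPOSITIONS = {"de", "da", "do", "em", "para", "com", "por"}


def tokenize(text):
    """Lowercase the text, keep only runs of letters, drop prepositions."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    return [w for w in words if w not in PREPOSITIONS]


def term_frequency_matrix(documents):
    """Return (vocabulary, rows); rows[i][j] counts vocabulary[j] in document i."""
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({w for doc in tokenized for w in doc})
    rows = []
    for doc in tokenized:
        counts = Counter(doc)  # missing words count as 0
        rows.append([counts[w] for w in vocabulary])
    return vocabulary, rows


# Invented sample descriptions, not taken from the data set.
docs = [
    "Comércio de roupas, comércio de calçados",
    "Transporte de cargas 2024",
]
vocabulary, rows = term_frequency_matrix(docs)
for row in rows:
    print(row)
```

Each row has one entry per vocabulary word, so most entries are zero once the vocabulary grows, which is why a representation like this ends up highly sparse.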

Available at OpenML:

Category: Business

# Rows: 1,080

# Columns: 856

Target: Class


Numeric: V1, V2, V3, V4, V5, V6, V7, V8, V9, V10, V11, V12, V13, V14, V15, V16, V17, V18, V19, V20, ...


Cross-Entropy Loss (LOGLOSS)

Figure: cnae-9 LOGLOSS results

Accuracy (ACC)

Figure: cnae-9 ACC results

Balanced Accuracy (BALACC)

Figure: cnae-9 BALACC results
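
For reference, the three metrics named above can be computed as in the sketch below. It uses the standard definitions (mean negative log-probability of the true class, fraction of correct predictions, and mean per-class recall); the function names and the toy labels and probabilities are invented for illustration and are not results from the cnae-9 benchmark.

```python
import math
from collections import defaultdict


def log_loss(y_true, probabilities, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    probabilities[i] maps each class label to its predicted probability.
    """
    losses = []
    for label, probs in zip(y_true, probabilities):
        p = min(max(probs[label], eps), 1 - eps)  # clip to avoid log(0)
        losses.append(-math.log(p))
    return sum(losses) / len(losses)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    total = defaultdict(int)
    hits = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / total[c] for c in total) / len(total)


# Toy two-class data, invented for illustration only.
y_true = [1, 1, 1, 2]
probabilities = [
    {1: 0.8, 2: 0.2},
    {1: 0.6, 2: 0.4},
    {1: 0.3, 2: 0.7},
    {1: 0.1, 2: 0.9},
]
# Predicted label = class with the highest predicted probability.
y_pred = [max(p, key=p.get) for p in probabilities]

print("LOGLOSS:", log_loss(y_true, probabilities))
print("ACC:", accuracy(y_true, y_pred))
print("BALACC:", balanced_accuracy(y_true, y_pred))
```

Lower is better for LOGLOSS, while higher is better for ACC and BALACC. In the toy data, accuracy and balanced accuracy differ because the classes are not equally frequent.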

