Text classification problem #1

This competition is a supervised text classification task.
The goal is to classify each given text document into one of two categories.
Each document is represented as a sequence of encoded words.
A bag-of-words vectorization of each document is also provided for easy access to the data.

About the word encoding:
Each word is encoded according to a specific encoding rule.
For example, “Fly”, “Flying” and “flight” are encoded as “F86155”, “F86155b43” and “f86155j152”, respectively.
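
To illustrate what a bag-of-words vectorization of encoded documents looks like, here is a minimal sketch. The file format of the provided data is not described on this page, so the documents are hard-coded token lists built from the three example codes above. The function name `bag_of_words` is hypothetical, and each code is treated as an opaque, case-sensitive token because the encoding rule itself is not specified.

```python
from collections import Counter


def bag_of_words(documents):
    """Return (vocabulary, count_vectors) for a list of tokenized documents.

    Hypothetical helper: the competition's own vectorization may differ.
    """
    # One column per distinct token, sorted so the column order is reproducible.
    vocabulary = sorted({token for doc in documents for token in doc})
    column = {token: i for i, token in enumerate(vocabulary)}

    vectors = []
    for doc in documents:
        row = [0] * len(vocabulary)
        # Count how often each encoded word occurs in this document.
        for token, count in Counter(doc).items():
            row[column[token]] = count
        vectors.append(row)
    return vocabulary, vectors


# Toy documents made of the example codes (assumed input, not real data).
docs = [
    ["F86155", "F86155b43", "F86155"],
    ["f86155j152", "F86155b43"],
]
vocab, X = bag_of_words(docs)
print(vocab)
print(X)
```

Word order is discarded in this representation: only the per-document counts remain, which is what makes it a convenient starting point for standard classifiers.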

Problem type: Classification
Evaluation metric: Area under the ROC curve (AUC)
Competition status: Completed
Started: 2014/11/19 00:00 (Japan Standard Time)
Ends: 2014/12/31 23:59 (Japan Standard Time)
Public/Private: Public
Invitation setting: Open to everyone
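
The evaluation metric listed above, area under the ROC curve (AUC), can be read as the probability that a randomly chosen positive document receives a higher score than a randomly chosen negative one. The sketch below computes it with the rank-sum (Mann-Whitney) formulation in plain Python. The function name `roc_auc` and the 0/1 label convention are assumptions; the organizers' exact implementation is not stated on this page.

```python
def roc_auc(labels, scores):
    """AUC via average ranks; tied scores count as half a win.

    labels: sequence of 0 (negative) or 1 (positive), assumed convention.
    scores: sequence of real-valued predictions, higher means more positive.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    n_pos = sum(1 for label in labels if label == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined unless both classes are present")

    rank_sum_pos = 0.0
    i = 0
    while i < n:
        # Find the run of tied scores starting at position i.
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        # 1-based ranks i+1 .. j share their average rank.
        average_rank = (i + 1 + j) / 2.0
        for k in range(i, j):
            if pairs[k][1] == 1:
                rank_sum_pos += average_rank
        i = j

    # Mann-Whitney U statistic for the positives, normalized to [0, 1].
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Because AUC depends only on the ordering of the scores, a submission does not need calibrated probabilities; any monotone transformation of the scores gives the same result.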

Winners' Report

The 1st place winner (n.otani) and the 2nd place winner (Vagif) have kindly shared their solutions.

Download dataset and submission

You can download the dataset and make submissions only while the competition is running.

Final ranking

Final rank  Nickname                Final score  Intermediate score
1           n.otani                 0.99555      0.99736
2           Vagif                   0.98939      0.99275
3           Ryo                     0.98121      0.99247
4           ichyo                   0.96923      0.97312
5           gene                    0.86007      0.89639
6           emn                     0.77791      0.82194
7           zb.chen                 0.77534      0.81780
8           yfukuda                 0.74905      0.79368
9           WindyG                  0.74329      0.77672
10          Tokumei                 0.74146      0.77464
11          r_takahama              0.71841      0.74613
12          University of Big Data  0.67896      0.65746
12          cocomoff                0.67896      0.65746
12          yu_jinen                0.67896      0.65746
15          m.binou                 0.67573      0.70119
16          TP                      0.67408      0.71077
17          yamaguchi               0.65954      0.68692
18          showwin                 0.65885      0.68133
19          Japan30                 0.63450      0.65681

This leaderboard is calculated on the latest submissions.
The intermediate scores are calculated using 50% of the test dataset, and the final scores are calculated using the other 50%.
Final ranks are determined according to the final scores.
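
The table above shows three entries tied at rank 12, followed by rank 15, which is consistent with standard competition ranking: tied scores share the best rank and the following ranks are skipped. The sketch below reproduces that behavior on a five-row excerpt of the leaderboard. The function name `competition_ranks` is hypothetical, the tie rule is inferred from the table rather than documented, and the ranks it prints start at 1 because only a subset of the rows is used.

```python
def competition_ranks(final_scores):
    """Map nickname -> rank, with higher final scores ranked first.

    Tied scores share one rank and the next ranks are skipped
    (assumed rule, inferred from the leaderboard).
    """
    ordered = sorted(final_scores.items(), key=lambda item: item[1], reverse=True)
    ranks = {}
    previous_score = None
    previous_rank = 0
    for position, (name, score) in enumerate(ordered, start=1):
        # Reuse the previous rank on a tie, otherwise use the 1-based position.
        rank = previous_rank if score == previous_score else position
        ranks[name] = rank
        previous_score, previous_rank = score, rank
    return ranks


# Excerpt of the final scores (rows ranked 11 to 15 in the full table).
scores = {
    "r_takahama": 0.71841,
    "University of Big Data": 0.67896,
    "cocomoff": 0.67896,
    "yu_jinen": 0.67896,
    "m.binou": 0.67573,
}
ranks = competition_ranks(scores)
for name, rank in ranks.items():
    print(rank, name)
```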

[Figure: Scores over time (Final score)]
