Sports vs Politics Classifier

A comprehensive machine learning study comparing five classic models with four feature encodings on the AG News dataset (sports vs politics).

📊 Dataset

AG News filtered to sports and politics (the World class): 60,000 documents (30,000 sports, 30,000 politics).
🔤 Features

Four feature extraction approaches compared: BoW unigram, BoW bigram, TF-IDF 1-gram, TF-IDF 1-2 gram.
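The four encodings map directly onto scikit-learn's vectorizers. A minimal sketch (the dictionary keys and toy documents below are illustrative assumptions, not the study's actual configuration):

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# The four encodings compared in the study (parameters beyond ngram_range assumed default).
vectorizers = {
    "BoW (unigram)":     CountVectorizer(ngram_range=(1, 1)),
    "BoW (1-2 gram)":    CountVectorizer(ngram_range=(1, 2)),
    "TF-IDF (1-gram)":   TfidfVectorizer(ngram_range=(1, 1)),
    "TF-IDF (1-2 gram)": TfidfVectorizer(ngram_range=(1, 2)),
}

docs = ["the team won the match", "the senate passed the bill"]
X = vectorizers["BoW (1-2 gram)"].fit_transform(docs)
print(X.shape)  # (2, 15): 7 distinct unigrams + 8 distinct bigrams
```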
🤖 Models

Five classic machine learning algorithms: Logistic Regression, Linear SVM, KNN, Naive Bayes, Random Forest.
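A plausible scikit-learn instantiation of the five models; the hyperparameters shown here are assumptions, since the study does not list its actual settings:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import RandomForestClassifier

# Hyperparameters are illustrative defaults, not the study's tuned values.
models = {
    "Logistic Reg.": LogisticRegression(max_iter=1000),
    "Linear SVM":    LinearSVC(),
    "KNN":           KNeighborsClassifier(n_neighbors=5),
    "Naive Bayes":   MultinomialNB(),
    "Random Forest": RandomForestClassifier(n_estimators=100),
}
```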
📈 Metrics

Comprehensive evaluation on the held-out test set: Accuracy, Precision, Recall, F1 Score.
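All four metrics correspond to standard scikit-learn functions. A toy sketch (the labels are made up, and the averaging scheme, weighted here, is an assumption about how the study aggregates the two classes):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [0, 0, 1, 1, 1, 0]  # toy ground-truth labels (0 = sports, 1 = politics)
y_pred = [0, 1, 1, 1, 0, 0]  # toy predictions

acc  = accuracy_score(y_true, y_pred)                        # 4 of 6 correct
prec = precision_score(y_true, y_pred, average="weighted")
rec  = recall_score(y_true, y_pred, average="weighted")
f1   = f1_score(y_true, y_pred, average="weighted")
print(acc, prec, rec, f1)
```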

🏆 Top Results

Best Configuration: TF-IDF (1-2 gram) + SVM

Accuracy: 0.9774 | Precision: 0.9781 | Recall: 0.9774 | F1 Score: 0.9774
Rank | Vectorizer        | Model          | Accuracy | Precision | Recall | F1 Score
-----|-------------------|----------------|----------|-----------|--------|---------
#1   | TF-IDF (1-2 gram) | SVM            | 0.9774   | 0.9781    | 0.9774 | 0.9774
#2   | TF-IDF (1-gram)   | SVM            | 0.9762   | 0.9762    | 0.9762 | 0.9762
#3   | BoW (1-2 gram)    | Logistic Reg.  | 0.9751   | 0.9751    | 0.9751 | 0.9751
#4   | BoW (1-2 gram)    | Naive Bayes    | 0.9741   | 0.9742    | 0.9741 | 0.9741
#5   | TF-IDF (1-gram)   | Logistic Reg.  | 0.9736   | 0.9736    | 0.9736 | 0.9736
#6   | BoW (1-2 gram)    | SVM            | 0.9734   | 0.9734    | 0.9734 | 0.9734
#7   | TF-IDF (1-2 gram) | Logistic Reg.  | 0.9734   | 0.9734    | 0.9734 | 0.9734
#8   | TF-IDF (1-gram)   | Naive Bayes    | 0.9732   | 0.9732    | 0.9732 | 0.9732
#9   | BoW (unigram)     | Logistic Reg.  | 0.9732   | 0.9732    | 0.9732 | 0.9732
#10  | TF-IDF (1-2 gram) | Naive Bayes    | 0.9729   | 0.9729    | 0.9729 | 0.9729
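The winning configuration fits naturally into a scikit-learn pipeline. A minimal sketch on toy stand-in documents (the real study trains on AG News; hyperparameters are assumed defaults):

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Toy stand-ins for the AG News sports/politics documents.
texts = [
    "the striker scored twice in the final",
    "parliament debated the new budget bill",
    "the coach praised the defense after the match",
    "the president vetoed the proposed law",
]
labels = ["sports", "politics", "sports", "politics"]

# Best configuration from the results table: TF-IDF (1-2 gram) + linear SVM.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(texts, labels)
print(clf.predict(["the goalkeeper made a late save"]))
```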

💡 Key Findings

Model Performance Analysis

🎯 Linear Models Excel

Logistic Regression and SVM achieved the highest average F1 scores (~0.973), demonstrating their effectiveness on high-dimensional sparse text data. The best configuration (TF-IDF 1-2 gram + SVM) reached an F1 score of 97.74%.

📊 TF-IDF Significantly Better

TF-IDF representations outperformed Bag of Words by 5.11% F1 on average, showing that inverse document frequency weighting helps distinguish topical keywords from words that are common across all documents.

🎲 Multiple Strong Configurations

The top five configurations achieved F1 scores between 97.36% and 97.74%, indicating that several vectorizer/model combinations are viable for this classification task.

📉 KNN Struggles with High Dimensionality

KNN performed poorly (F1 = 0.64-0.80), suffering from the curse of dimensionality where distance metrics become unreliable in sparse high-dimensional feature spaces.
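The distance-concentration effect behind KNN's weak scores can be demonstrated in a few lines: for random points, the relative spread of pairwise distances shrinks as dimensionality grows, so the "nearest" neighbors are barely nearer than anything else. A small NumPy sketch (random uniform data, not the study's features):

```python
import numpy as np

rng = np.random.default_rng(0)
spreads = {}
for d in (2, 10_000):
    X = rng.random((200, d))                      # 200 random points in d dimensions
    dists = np.linalg.norm(X[0] - X[1:], axis=1)  # distances from one point to the rest
    spreads[d] = (dists.max() - dists.min()) / dists.mean()
print(spreads)  # relative spread of distances shrinks sharply as d grows
```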

⚠️ Dataset Transition: BBC → AG News

We initially used the BBC News dataset but found it produced an unrealistic 100% accuracy across multiple models: its highly distinct topical vocabulary makes the classes trivially separable. We switched to AG News (60K documents), where sports and politics share enough vocabulary for a more realistic evaluation.

📋 Dataset Statistics

Total Documents: 60,000
Sports: 30,000 (50.0%)
Politics: 30,000 (50.0%)
Training Set: 42,000 (70%)
Test Set: 18,000 (30%)