- **Dataset** — AG News, filtered to sports vs. politics (the Sports class, with the World class as a politics proxy)
- **Features** — four feature-extraction approaches compared (Bag of Words and TF-IDF, each with unigram and unigram+bigram variants)
- **Models** — five classic machine-learning algorithms
- **Metrics** — accuracy, precision, recall, and F1 on a held-out test set
## 🏆 Top Results

**Best Configuration:** TF-IDF (1-2 gram) + SVM
| Rank | Vectorizer | Model | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|---|---|
| #1 | TF-IDF (1-2 gram) | SVM | 0.9774 | 0.9781 | 0.9774 | 0.9774 |
| #2 | TF-IDF (1-gram) | SVM | 0.9762 | 0.9762 | 0.9762 | 0.9762 |
| #3 | BoW (1-2 gram) | Logistic Reg. | 0.9751 | 0.9751 | 0.9751 | 0.9751 |
| #4 | BoW (1-2 gram) | Naive Bayes | 0.9741 | 0.9742 | 0.9741 | 0.9741 |
| #5 | TF-IDF (1-gram) | Logistic Reg. | 0.9736 | 0.9736 | 0.9736 | 0.9736 |
| #6 | BoW (1-2 gram) | SVM | 0.9734 | 0.9734 | 0.9734 | 0.9734 |
| #7 | TF-IDF (1-2 gram) | Logistic Reg. | 0.9734 | 0.9734 | 0.9734 | 0.9734 |
| #8 | TF-IDF (1-gram) | Naive Bayes | 0.9732 | 0.9732 | 0.9732 | 0.9732 |
| #9 | BoW (1-gram) | Logistic Reg. | 0.9732 | 0.9732 | 0.9732 | 0.9732 |
| #10 | TF-IDF (1-2 gram) | Naive Bayes | 0.9729 | 0.9729 | 0.9729 | 0.9729 |
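The top-ranked configuration can be sketched with scikit-learn. This is a minimal illustration, not the experiment's exact setup: the hyperparameters are library defaults, and the tiny corpus below stands in for the filtered AG News data.

```python
# Sketch of the #1 configuration: TF-IDF over 1-2 grams feeding a linear SVM.
# Hyperparameters are assumptions (scikit-learn defaults), not the settings
# used in the reported experiments.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),  # unigrams + bigrams
    ("svm", LinearSVC()),
])

# Tiny stand-in corpus; the real data is the filtered AG News split.
train_texts = [
    "the team won the championship game last night",
    "the striker scored twice in the cup final",
    "parliament passed the new budget bill today",
    "the president signed the trade agreement",
]
train_labels = ["sports", "sports", "politics", "politics"]

pipeline.fit(train_texts, train_labels)
preds = pipeline.predict(["lawmakers debated the budget bill"])
```

Swapping the vectorizer or estimator in the pipeline is how the other table rows (BoW, Logistic Regression, Naive Bayes) would be reproduced.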
## 💡 Key Findings

### Model Performance Analysis

#### 🎯 Linear Models Excel
Logistic Regression and SVM achieved the highest average F1 scores (~0.973), demonstrating the effectiveness of linear models on high-dimensional, sparse text data. The best configuration (TF-IDF bigrams + SVM) reached an F1 score of 97.74%.
#### 📊 TF-IDF Significantly Better
TF-IDF representations outperformed Bag of Words by 5.11% F1, showing that inverse-document-frequency weighting helps distinguish topical keywords from common function words.
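The effect of IDF weighting can be seen directly on a toy corpus: a word that appears in every document ("the") keeps its full relative weight under raw counts but is down-weighted relative to a rarer topical word ("senate") under TF-IDF. The corpus below is illustrative, not the AG News data.

```python
# Contrast Bag-of-Words counts with TF-IDF weights on an illustrative corpus.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = [
    "the senate debated the election",
    "the team lost the match",
    "the election results surprised the senate",
]

bow = CountVectorizer().fit(docs)
tfidf = TfidfVectorizer().fit(docs)  # same tokenizer, so same vocabulary

bow_row = bow.transform([docs[0]]).toarray()[0]
tfidf_row = tfidf.transform([docs[0]]).toarray()[0]

i_the = bow.vocabulary_["the"]        # appears in all three documents
i_senate = bow.vocabulary_["senate"]  # appears in two documents

# The relative weight of "the" vs. "senate" shrinks once IDF is applied:
bow_ratio = bow_row[i_the] / bow_row[i_senate]      # raw counts: 2 vs 1
tfidf_ratio = tfidf_row[i_the] / tfidf_row[i_senate]
```

`tfidf_ratio` comes out smaller than `bow_ratio` because the IDF factor penalizes "the" for occurring in every document.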
#### 🎲 Multiple Strong Configurations
The top 5 configurations achieved F1 scores between 97.36% and 97.74%, so several feature/model combinations are viable for this task rather than a single dominant one.
#### 📉 KNN Struggles with High Dimensionality
KNN performed poorly (F1 = 0.64-0.80), suffering from the curse of dimensionality: distances become uninformative in sparse, high-dimensional feature spaces.
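The distance-concentration effect behind this finding is easy to demonstrate numerically: the relative contrast between the nearest and farthest neighbor of a query collapses as dimensionality grows, so "nearest" stops being meaningful. This sketch uses dense random points rather than actual TF-IDF vectors.

```python
# Illustrate the curse of dimensionality: relative contrast
# (d_max - d_min) / d_min between a query and random points shrinks
# sharply as the number of dimensions grows.
import numpy as np

rng = np.random.default_rng(0)

def relative_contrast(dim: int, n_points: int = 200) -> float:
    points = rng.random((n_points, dim))
    query = rng.random(dim)
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()

contrast_low = relative_contrast(2)        # large: neighbors are distinct
contrast_high = relative_contrast(10_000)  # small: distances concentrate
```

In 10,000 dimensions all pairwise distances cluster around the same value, which is why KNN's neighbor rankings become unreliable on sparse bigram features with vocabularies of that size or larger.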
## ⚠️ Dataset Transition: BBC → AG News
We initially used the BBC News dataset but found it produced an unrealistic 100% accuracy across multiple models: its topical vocabulary is so distinct per class that the classes are trivially separable. We switched to AG News (60K documents) for a more realistic evaluation, since its sports and politics articles share vocabulary.
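The filtering step described above can be sketched as below. The integer label mapping (0 = World, 1 = Sports, 2 = Business, 3 = Sci/Tech) follows the commonly distributed AG News release, and the sample rows are hypothetical stand-ins for the real records.

```python
# Sketch of reducing AG News to a two-class sports-vs-politics task,
# using the World class as a politics proxy. Label mapping assumed:
# 0 = World, 1 = Sports, 2 = Business, 3 = Sci/Tech.
KEEP = {0: "politics", 1: "sports"}

rows = [  # hypothetical stand-ins for real AG News records
    {"text": "UN council votes on new sanctions", "label": 0},
    {"text": "Striker signs record transfer deal", "label": 1},
    {"text": "Tech firm posts quarterly earnings", "label": 3},
]

filtered = [
    {"text": r["text"], "label": KEEP[r["label"]]}
    for r in rows
    if r["label"] in KEEP
]
```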