Spam Message AI Classifier

Features

Spam vs Ham text classification
NLP text vectorization using CountVectorizer
Multinomial Naive Bayes machine learning model
Data visualizations using Matplotlib and Seaborn
Word cloud generation for spam messages
Confusion matrix heatmap
Learning curve visualization
Interactive terminal message predictor
Accuracy evaluation and testing pipeline

Tech Stack

Python
Pandas
NumPy
Scikit-learn
Matplotlib
Seaborn
WordCloud
Jupyter Notebook

Repository Structure

Spam-Message-AI-Classifier/
│
├── Spam_Message_AI_Model.ipynb
├── images/
│   ├── dataset_distribution.png
│   ├── spam_wordcloud.png
│   ├── confusion_matrix.png
│   └── learning_curve.png
├── LICENSE
└── README.md

Dataset

This project uses the SMS Spam Collection Dataset, in which every message is labeled as one of two classes:

Ham = Normal message
Spam = Unwanted or promotional message

Dataset source:

https://raw.githubusercontent.com/justmarkham/pycon-2016-tutorial/master/data/sms.tsv
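A minimal loading sketch with pandas, assuming the file at that URL is tab-separated with no header row (label in the first column, message text in the second) and that labels are encoded as ham = 0 and spam = 1. The README does not say how the notebook cleans the data, so dropping empty rows and duplicate messages here is an illustrative assumption. The helper names `load_sms`, `clean_and_encode`, and `LABEL_MAP` are hypothetical.

```python
import pandas as pd

# URL from the Dataset section; assumed to be tab-separated with no header row.
SMS_URL = "https://raw.githubusercontent.com/justmarkham/pycon-2016-tutorial/master/data/sms.tsv"

# Assumed numeric encoding (hypothetical; check the notebook): ham = 0, spam = 1.
LABEL_MAP = {"ham": 0, "spam": 1}


def load_sms(source=SMS_URL):
    """Read the raw TSV into a DataFrame with 'label' and 'message' columns."""
    return pd.read_csv(source, sep="\t", header=None, names=["label", "message"])


def clean_and_encode(df):
    """Drop empty rows and duplicate messages, then add a numeric 'label_num' column."""
    df = df.dropna().drop_duplicates(subset="message").reset_index(drop=True)
    df["label_num"] = df["label"].map(LABEL_MAP)
    return df
```

Usage: `sms = clean_and_encode(load_sms())`, then `sms["label"].value_counts()` shows how many ham and spam messages remain.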

Project Workflow

Load and clean dataset
Convert labels into numerical format
Split data into training and testing sets
Transform text into word-count vectors using CountVectorizer
Train Multinomial Naive Bayes model
Evaluate accuracy and confusion matrix
Visualize results and important spam words
Predict custom user messages
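The split, vectorize, train, and evaluate steps above can be sketched end to end with scikit-learn. This is an illustrative version, not the notebook's code: it assumes numeric labels (0 = ham, 1 = spam), a stratified 25% test split, and a fixed `random_state`, none of which the README specifies. The function name `train_and_evaluate` is hypothetical.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB


def train_and_evaluate(messages, labels, test_size=0.25, random_state=1):
    """Split, vectorize, fit Multinomial Naive Bayes, and score on the held-out set.

    labels are assumed to be numeric: 0 = ham, 1 = spam.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        messages, labels, test_size=test_size, random_state=random_state, stratify=labels
    )

    # Learn the vocabulary from the training split only so test words do not leak in.
    vect = CountVectorizer()
    X_train_dtm = vect.fit_transform(X_train)
    X_test_dtm = vect.transform(X_test)  # words unseen during training are ignored

    model = MultinomialNB()
    model.fit(X_train_dtm, y_train)

    y_pred = model.predict(X_test_dtm)
    acc = accuracy_score(y_test, y_pred)
    # Rows are true labels, columns are predicted labels, both ordered [ham, spam].
    cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
    return vect, model, acc, cm
```

Fitting the vectorizer only on the training messages keeps the accuracy estimate honest, because the model never sees test vocabulary during training.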

Example Predictions

Input: Congratulations! You won a free iPhone!
Prediction: Spam

Input: Hey, are we still meeting after school?
Prediction: Not Spam
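Predictions like the two above, plus the interactive terminal predictor listed under Features, could look roughly like this. It is a sketch assuming a scikit-learn `Pipeline` of `CountVectorizer` and `MultinomialNB` trained on numeric labels (1 = spam); `build_classifier`, `predict_message`, and `interactive_predictor` are hypothetical names, and the notebook's own predictor may be structured differently.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline


def build_classifier(messages, labels):
    """Fit vectorizer and classifier together (labels: 0 = ham, 1 = spam)."""
    clf = make_pipeline(CountVectorizer(), MultinomialNB())
    clf.fit(messages, labels)
    return clf


def predict_message(clf, text):
    """Return the README's wording for a single message."""
    return "Spam" if clf.predict([text])[0] == 1 else "Not Spam"


def interactive_predictor(clf):
    """Classify messages typed in the terminal until a blank line or end of input."""
    while True:
        try:
            text = input("Message (blank to quit): ").strip()
        except EOFError:
            break
        if not text:
            break
        print("Prediction:", predict_message(clf, text))
```

Wrapping both steps in one pipeline means raw strings go in and the vectorizer fitted during training is reused automatically at prediction time.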

Installation

Clone the repository:

git clone https://github.com/DagaVedant/Spam-Message-AI-Classifier.git
cd Spam-Message-AI-Classifier

Install dependencies:

pip install pandas numpy matplotlib seaborn scikit-learn wordcloud notebook

Run the notebook:

jupyter notebook Spam_Message_AI_Model.ipynb

Screenshots / Images

Dataset Distribution Graph
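A graph like this one can be drawn with a plain Matplotlib bar chart. This sketch assumes the label column holds the strings "ham" and "spam"; the notebook may use a Seaborn count plot instead, and `plot_label_distribution` is a hypothetical name. The suggested output path matches `images/dataset_distribution.png` from the repository structure.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; omit this line inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd


def plot_label_distribution(labels, out_path=None):
    """Bar chart of ham vs. spam counts; optionally saved to out_path."""
    # Fixed order so ham is always the first bar, even if one class is missing.
    counts = pd.Series(labels).value_counts().reindex(["ham", "spam"], fill_value=0)
    fig, ax = plt.subplots(figsize=(5, 4))
    ax.bar(counts.index, counts.values, color=["tab:blue", "tab:red"])
    ax.set_xlabel("Label")
    ax.set_ylabel("Number of messages")
    ax.set_title("Dataset distribution")
    if out_path is not None:
        fig.savefig(out_path, bbox_inches="tight")  # e.g. "images/dataset_distribution.png"
    return counts, fig
```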

Spam Word Cloud

Confusion Matrix
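A confusion matrix heatmap like this can be produced with scikit-learn and Seaborn, both listed in the Tech Stack. The sketch assumes numeric labels (0 = ham, 1 = spam); the color map, figure size, and the name `plot_confusion_heatmap` are illustrative choices, not taken from the notebook.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; omit this line inside Jupyter
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix


def plot_confusion_heatmap(y_true, y_pred, out_path=None):
    """Annotated heatmap of the confusion matrix (rows: true label, columns: predicted)."""
    cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
    fig, ax = plt.subplots(figsize=(4, 4))
    sns.heatmap(
        cm, annot=True, fmt="d", cmap="Blues", cbar=False,
        xticklabels=["Ham", "Spam"], yticklabels=["Ham", "Spam"], ax=ax,
    )
    ax.set_xlabel("Predicted label")
    ax.set_ylabel("True label")
    if out_path is not None:
        fig.savefig(out_path, bbox_inches="tight")  # e.g. "images/confusion_matrix.png"
    return cm, fig
```

The off-diagonal cells are the mistakes: top right counts ham flagged as spam, bottom left counts spam that slipped through.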

Learning Curve
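A cross-validated learning curve of this kind can be computed with scikit-learn's `learning_curve`. This is a sketch assuming numeric labels, 5 stratified folds, and five training-set sizes from 20% to 100%; `compute_learning_curve` and `plot_learning_curve` are hypothetical names, and the notebook's settings may differ.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; omit this line inside Jupyter
import matplotlib.pyplot as plt
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import StratifiedKFold, learning_curve
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline


def compute_learning_curve(messages, labels, n_splits=5):
    """Mean training and cross-validation accuracy at increasing training-set sizes."""
    # Vectorizer inside the pipeline: each fold learns its vocabulary from its own training part.
    pipe = make_pipeline(CountVectorizer(), MultinomialNB())
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)
    sizes, train_scores, val_scores = learning_curve(
        pipe, messages, labels, cv=cv, scoring="accuracy",
        train_sizes=np.linspace(0.2, 1.0, 5), shuffle=True, random_state=42,
    )
    # Score arrays have shape (n_sizes, n_splits); average across folds.
    return sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)


def plot_learning_curve(sizes, train_mean, val_mean, out_path=None):
    """Plot both accuracy curves against the number of training messages."""
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.plot(sizes, train_mean, "o-", label="Training accuracy")
    ax.plot(sizes, val_mean, "o-", label="Cross-validation accuracy")
    ax.set_xlabel("Training messages")
    ax.set_ylabel("Accuracy")
    ax.set_title("Learning curve")
    ax.legend()
    if out_path is not None:
        fig.savefig(out_path, bbox_inches="tight")  # e.g. "images/learning_curve.png"
    return fig
```

A shrinking gap between the two curves as the training set grows suggests that more data is helping the model generalize rather than memorize.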

Model Performance

The model is evaluated using:

Accuracy Score
Confusion Matrix
Cross-Validation Learning Curves

The classifier is a beginner-level NLP spam-detection project: it shows how a bag-of-words representation combined with Multinomial Naive Bayes can process and classify text data efficiently.

Future Improvements

Add TF-IDF vectorization
Use deep learning models like LSTMs or Transformers
Deploy as a web app using Flask or Streamlit
Add real-time SMS filtering
Improve preprocessing and feature engineering
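The TF-IDF item could be tried by swapping `TfidfVectorizer` in for `CountVectorizer` inside an otherwise identical pipeline and comparing cross-validated accuracy. A sketch, assuming numeric labels and 5-fold cross-validation; `compare_vectorizers` is a hypothetical helper, and no result is claimed for the real dataset.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline


def compare_vectorizers(messages, labels, cv=5):
    """Mean cross-validated accuracy for raw counts vs. TF-IDF weights, same classifier."""
    results = {}
    for name, vectorizer in [("counts", CountVectorizer()), ("tf-idf", TfidfVectorizer())]:
        pipe = make_pipeline(vectorizer, MultinomialNB())
        # With an integer cv and a classifier, scikit-learn uses stratified folds.
        scores = cross_val_score(pipe, messages, labels, cv=cv, scoring="accuracy")
        results[name] = scores.mean()
    return results
```

MultinomialNB is defined for counts, but it accepts any non-negative features, so fractional TF-IDF weights work in practice; keeping everything else fixed isolates the effect of the weighting scheme.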

License

This project is open-source and available under the MIT License.

Author

Developed as a machine learning and NLP project focused on spam message detection and text classification for my CS Club.