- Spam vs Ham text classification
- NLP text vectorization using CountVectorizer
- Multinomial Naive Bayes machine learning model
- Data visualizations using Matplotlib and Seaborn
- Word cloud generation for spam messages
- Confusion matrix heatmap
- Learning curve visualization
- Interactive terminal message predictor
- Accuracy evaluation and testing pipeline
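
The spam word cloud is built from how often each word appears in spam messages. A minimal standard-library sketch of that counting step follows. The regex tokenizer, the small stopword set, and the three sample messages are hypothetical stand-ins, not the notebook's own choices.

```python
import re
from collections import Counter

# Hypothetical stopword set for illustration; the notebook may use a different one.
STOPWORDS = {"a", "an", "the", "to", "you", "your", "is", "for", "and", "of", "now"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def spam_word_frequencies(messages, labels, top_n=10):
    """Count words across messages labelled 'spam', skipping stopwords."""
    counts = Counter()
    for text, label in zip(messages, labels):
        if label == "spam":
            counts.update(t for t in tokenize(text) if t not in STOPWORDS)
    return counts.most_common(top_n)


# Hypothetical sample messages.
messages = [
    "WINNER! You have won a free prize, call now",
    "Free entry to win a prize! Text WIN now",
    "Are we still meeting after school?",
]
labels = ["spam", "spam", "ham"]
print(spam_word_frequencies(messages, labels, top_n=3))
# [('free', 2), ('prize', 2), ('win', 2)]
```

The resulting pairs could be turned into a dict and passed to `WordCloud().generate_from_frequencies(...)` from the `wordcloud` package to draw the cloud.
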

Technologies used:

- Python
- Pandas
- NumPy
- Scikit-learn
- Matplotlib
- Seaborn
- WordCloud
- Jupyter Notebook

Project structure:

spam-message-ai-classifier/
│
├── Spam_Message_AI_Model.ipynb
├── images/
│ ├── dataset_distribution.png
│ ├── spam_wordcloud.png
│ ├── confusion_matrix.png
│ └── learning_curve.png
├── README.md
└── requirements.txt

This project uses the SMS Spam Collection Dataset:
- Ham: a normal, legitimate message
- Spam: an unwanted or promotional message

Dataset source:
https://raw.githubusercontent.com/justmarkham/pycon-2016-tutorial/master/data/sms.tsv

Workflow:

- Load and clean dataset
- Convert labels into numerical format
- Split data into training and testing sets
- Transform text into word-count vectors with CountVectorizer
- Train Multinomial Naive Bayes model
- Evaluate the model with an accuracy score and a confusion matrix
- Visualize results and important spam words
- Predict custom user messages
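
The workflow above can be sketched with scikit-learn as follows. This is a minimal sketch, not the notebook's exact code: it trains on a tiny hypothetical sample so it runs offline, the ham = 0 / spam = 1 encoding is an assumption, and the commented-out loader assumes the TSV has no header row and two tab-separated columns (label, message).

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# To train on the real dataset instead (assumes a headerless, tab-separated
# file with a label column and a message column):
# import pandas as pd
# url = "https://raw.githubusercontent.com/justmarkham/pycon-2016-tutorial/master/data/sms.tsv"
# df = pd.read_csv(url, sep="\t", header=None, names=["label", "message"])
# messages, labels = df["message"].tolist(), df["label"].tolist()

# Tiny hypothetical sample so the sketch runs offline.
spam = [
    "Congratulations! You won a free prize, call now",
    "You have won a free iPhone, claim your prize today",
    "Free entry: win cash now, text WIN to claim",
    "Congratulations, you won free tickets! Reply now",
    "URGENT! You won a free holiday, call to claim",
    "Win a free phone now! Text CLAIM today",
    "Congratulations! Claim your free cash prize now",
    "You have been selected to win a free gift card",
]
ham = [
    "Hey, are we still on for lunch?",
    "Are we meeting at the library after school?",
    "Can you send me the notes from class?",
    "Hey, see you at practice later",
    "We are meeting at six, don't be late",
    "Did you finish the homework for tomorrow?",
    "Hey, call me when you get home",
    "Are you coming to the club meeting after school?",
]
messages = spam + ham
labels = ["spam"] * len(spam) + ["ham"] * len(ham)

# Convert labels into numbers (assumed encoding: ham -> 0, spam -> 1).
y = [1 if label == "spam" else 0 for label in labels]

# Hold out 25% of the messages for testing, keeping the spam/ham ratio.
X_train, X_test, y_train, y_test = train_test_split(
    messages, y, test_size=0.25, random_state=42, stratify=y
)

# Learn the vocabulary from training text only, then count words in both sets.
vectorizer = CountVectorizer()
X_train_counts = vectorizer.fit_transform(X_train)
X_test_counts = vectorizer.transform(X_test)

model = MultinomialNB()
model.fit(X_train_counts, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test_counts))
print(f"Accuracy: {accuracy:.2f}")


def predict_message(text: str) -> str:
    """Label one custom message in the same Spam / Not Spam style as the examples."""
    prediction = model.predict(vectorizer.transform([text]))[0]
    return "Spam" if prediction == 1 else "Not Spam"


for text in ["Congratulations! You won a free iPhone!", "Hey, are we still meeting after school?"]:
    print(f"Input: {text}\nPrediction: {predict_message(text)}")
```

Fitting the vectorizer on the training split only keeps test-set words out of the vocabulary, so the accuracy reflects messages the model has not seen.
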

Example predictions:

- Input: "Congratulations! You won a free iPhone!" → Prediction: Spam
- Input: "Hey, are we still meeting after school?" → Prediction: Not Spam

Clone the repository:

`git clone https://github.com/DagaVedant/Spam-Message-AI-Classifier.git`

`cd Spam-Message-AI-Classifier`

Install dependencies:

`pip install pandas numpy matplotlib seaborn scikit-learn wordcloud notebook`

Run the notebook:

`jupyter notebook`

The model is evaluated using:
- Accuracy score
- Confusion matrix
- Cross-validated learning curves
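
A sketch of these three evaluation steps with scikit-learn, assuming the same CountVectorizer + MultinomialNB setup. Here the two are wrapped in a Pipeline (an assumption, not necessarily how the notebook is organised) so that cross-validation refits the vocabulary on each training fold. The toy messages are hypothetical and generated in a loop so the example runs offline; the Matplotlib/Seaborn plotting is left out.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import learning_curve, train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: 20 spam (1) and 20 ham (0) messages, alternating.
spam_templates = ["win a free prize now", "free cash offer call now",
                  "claim your free gift today", "you won a free holiday"]
ham_templates = ["are we meeting after school", "see you at practice later",
                 "can you send the notes", "call me when you get home"]
messages, labels = [], []
for i in range(20):
    messages += [spam_templates[i % 4], ham_templates[i % 4]]
    labels += [1, 0]
y = np.array(labels)

X_train, X_test, y_train, y_test = train_test_split(
    messages, y, test_size=0.25, random_state=0, stratify=y
)
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))
# Rows are true labels and columns are predicted labels, both ordered [ham (0), spam (1)].
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
print(cm)

# Learning curve: 5-fold cross-validated accuracy at increasing training-set sizes.
train_sizes, train_scores, val_scores = learning_curve(
    make_pipeline(CountVectorizer(), MultinomialNB()),
    messages,
    y,
    cv=5,
    train_sizes=np.linspace(0.2, 1.0, 5),
    scoring="accuracy",
)
for size, tr, va in zip(train_sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"{int(size):>3} samples: train={tr:.2f} validation={va:.2f}")
```

In the notebook, `cm` would feed a Seaborn heatmap and the mean train/validation scores would be plotted against `train_sizes`; a widening gap between the two curves points to overfitting.
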
The classifier performs well as a beginner-level NLP spam detector and demonstrates how a bag-of-words representation (CountVectorizer) combined with Multinomial Naive Bayes can classify text data efficiently.
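
To make the Naive Bayes step concrete, here is a from-scratch sketch of the scoring rule a multinomial Naive Bayes classifier applies to word counts: a log class prior plus the sum of log word likelihoods, with add-one (Laplace) smoothing. It uses only the standard library for illustration. The project itself uses scikit-learn's `MultinomialNB`, whose default `alpha=1.0` is also add-one smoothing; the simple tokenizer and the six training messages here are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into word tokens (a simplified stand-in for CountVectorizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


class TinyMultinomialNB:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, messages, labels):
        self.classes = sorted(set(labels))
        self.vocab = set()
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(messages, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        docs_per_class = Counter(labels)
        self.log_prior = {c: math.log(docs_per_class[c] / len(labels)) for c in self.classes}
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_score(self, text, c):
        """log P(c) + sum over known words of log P(word | c)."""
        denom = self.total_words[c] + len(self.vocab)
        score = self.log_prior[c]
        for token in tokenize(text):
            if token in self.vocab:  # words never seen in training are ignored
                score += math.log((self.word_counts[c][token] + 1) / denom)
        return score

    def predict(self, text):
        """Return the class with the highest log score."""
        return max(self.classes, key=lambda c: self.log_score(text, c))


# Hypothetical training data.
train_messages = [
    "You won a free prize, call now",
    "Free entry: win cash now",
    "Congratulations, claim your free gift",
    "Are we still meeting after school?",
    "See you at practice later",
    "Can you send me the notes?",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]
model = TinyMultinomialNB().fit(train_messages, train_labels)

for msg in ["Congratulations! You won a free iPhone!", "Hey, are we still meeting after school?"]:
    print(msg, "->", "Spam" if model.predict(msg) == "spam" else "Not Spam")
```

Working in log space turns the product of many small probabilities into a sum, which avoids floating-point underflow on long messages.
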

Possible future improvements:

- Add TF-IDF vectorization
- Use deep learning models like LSTMs or Transformers
- Deploy as a web app using Flask or Streamlit
- Add real-time SMS filtering
- Improve preprocessing and feature engineering
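
Adding TF-IDF could be as small as replacing CountVectorizer with scikit-learn's `TfidfVectorizer`. A minimal sketch, assuming the vectorizer and model are wrapped in a Pipeline (the notebook may call them separately) and using hypothetical toy data in place of the SMS Spam Collection:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy training data; replace with the SMS Spam Collection messages.
messages = [
    "You won a free prize, call now",
    "Free entry: win cash now",
    "Congratulations, claim your free gift",
    "Are we still meeting after school?",
    "See you at practice later",
    "Can you send me the notes?",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# TF-IDF down-weights words that occur in many messages; MultinomialNB accepts
# these non-negative fractional features as well as raw counts.
tfidf_model = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("nb", MultinomialNB()),
])
tfidf_model.fit(messages, labels)
print(tfidf_model.predict([
    "Congratulations! You won a free iPhone!",
    "Hey, are we still meeting after school?",
]))
```

Keeping both steps in one Pipeline means the same object can be passed to `train_test_split`-based evaluation or `learning_curve` without leaking test-set vocabulary into the vectorizer.
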
This project is open-source and available under the MIT License.
Developed for my CS Club as a machine learning and NLP project focused on spam message detection and text classification.



