miracle01 committed on
Commit
80a9c82
1 Parent(s): d0d3778

Upload 6 files

Naive_Bayes_Spam_Detection.joblib ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e389ad0221c97b8034a27857fcc0fb707e4712dc73f46e22b20bb769a7ae35cc
+ size 1062583
README.md CHANGED
@@ -1,12 +1,13 @@
  ---
- title: Taiwo Spam Detection Project Hnd2
- emoji: 📚
+ title: SpamClassifierNaiveBayes
+ emoji: 😻
  colorFrom: blue
- colorTo: indigo
+ colorTo: red
  sdk: streamlit
- sdk_version: 1.32.2
+ sdk_version: 1.29.0
  app_file: app.py
  pinned: false
+ license: apache-2.0
  ---

  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
app.py ADDED
@@ -0,0 +1,79 @@
+ from joblib import load
+ from sklearn.feature_extraction.text import TfidfVectorizer  # not called directly; scikit-learn must be installed to unpickle the vectorizer
+ import numpy as np
+ import streamlit as st
+
+ info = [
+     {"title": "NAME", "detail": "AKINBITAN TAIWO EMMANUEL"},
+     {"title": "MATRIC NO", "detail": "HNDCOM/22/032"},
+     {"title": "CLASS", "detail": "HND2"},
+     {"title": "LEVEL", "detail": "400L"},
+     {"title": "PROJECT SUPERVISOR", "detail": ""},
+ ]
+ st.title("Project Information")
+
+ for item in info:
+     st.write(f"{item['title']}: {item['detail']}")
+
+ st.image('fcahpt.jpg', caption='Federal College of Animal Health and Production Technology')
+ st.header('Spam Detection using Naive Bayes Classifier')
+ st.write('This spam detector was developed in Python using a Naive Bayes classifier.')
+ vectorizer = load('tfidf_vectorizer.joblib')  # fitted TF-IDF vectorizer
+ user_input = st.text_area("Enter some text:", "")
+ if user_input.strip():  # text_area returns "" (never None) when empty, so test for real content
+     x = vectorizer.transform([user_input])
+     model = load('Naive_Bayes_Spam_Detection.joblib')  # trained classifier
+     pred = model.predict(x)
+     if pred[0] == 1:
+         st.markdown("<b>Prediction: <span style='color:red'>The entered text is likely to be spam; be careful.</span></b>", unsafe_allow_html=True)
+     elif pred[0] == 0:
+         st.markdown("<b>Prediction: <span style='color:green'>The entered text is not spam and appears safe.</span></b>", unsafe_allow_html=True)
+     else:
+         st.write('Error, try again')
+
+ st.header("Project Description")
+ st.markdown("""
+ Spam detection with a Naive Bayes classifier is a classic and effective approach for automatically identifying spam emails or messages.
+ The sections below describe how it works:
+ """)
+
+ st.header("1. Data Collection and Preprocessing:")
+ st.markdown("""
+ - The process begins with collecting a dataset of emails or messages labeled as spam or non-spam (ham).
+ - Each message undergoes preprocessing steps such as removing HTML tags, punctuation, and stopwords (commonly occurring words like "and", "the", etc.).
+ - The text is then tokenized and transformed into numerical representations using techniques like TF-IDF (Term Frequency-Inverse Document Frequency) or Count Vectorization.
+ """)
+
+ st.header("2. Understanding Naive Bayes Classifier:")
+ st.markdown("""
+ - Naive Bayes is a probabilistic classification algorithm based on Bayes' theorem, which gives the probability of one event given that another event has occurred.
+ - The "naive" assumption in Naive Bayes is that the features are conditionally independent given the class label. This simplifies the calculation and makes the algorithm computationally efficient.
+ """)
+
+ st.header("3. Training the Naive Bayes Model:")
+ st.markdown("""
+ - The dataset is split into training and testing sets.
+ - During training, the Naive Bayes classifier learns the probability distribution of words or features given each class (spam or ham).
+ - It calculates the prior probabilities of spam and ham messages and the likelihood of each word occurring in spam and ham messages.
+ - These probabilities are estimated from the training data by maximum likelihood estimation, usually with smoothing (for example Laplace smoothing) so that unseen words do not receive zero probability.
+ """)
+
+ st.header("4. Classification:")
+ st.markdown("""
+ - Once the model is trained, it can classify new, unseen messages.
+ - Given a new message, the classifier calculates the probability that it belongs to each class (spam or ham) using Bayes' theorem.
+ - By default, the message is assigned to the class with the highest probability. Alternatively, a message can be flagged as spam only when its spam probability exceeds a predefined threshold, and treated as ham otherwise.
+ """)
+
+ st.header("5. Model Evaluation:")
+ st.markdown("""
+ - The performance of the Naive Bayes classifier is evaluated using metrics such as accuracy, precision, recall, and F1-score on a separate test dataset.
+ - These metrics help assess how well the model generalizes to unseen data and how effectively it distinguishes between spam and non-spam messages.
+ """)
+
+ st.header("6. Deployment and Fine-Tuning:")
+ st.markdown("""
+ - Once the model is trained and evaluated, it can be deployed for real-world use.
+ - Deployment may involve integrating the model into email systems or messaging platforms to automatically filter spam messages.
+ - Periodic updates and fine-tuning of the model may be necessary to adapt to changing spamming techniques and patterns.
+ """)
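Review note: sections 2 to 4 of the description in app.py explain that Naive Bayes learns class priors and per-word likelihoods, then picks the most probable class. The sketch below illustrates that idea from scratch. It is not the code used to train the uploaded model. The toy messages, the function names `train_nb` and `predict_nb`, whitespace tokenization, and Laplace smoothing with `alpha=1.0` are all assumptions made for illustration.

```python
import math
from collections import Counter


def train_nb(messages, labels, alpha=1.0):
    """Estimate log priors and Laplace-smoothed log likelihoods per class."""
    vocab = {w for m in messages for w in m.lower().split()}
    classes = sorted(set(labels))
    doc_counts = Counter(labels)
    word_counts = {c: Counter() for c in classes}
    for message, label in zip(messages, labels):
        word_counts[label].update(message.lower().split())

    # Prior: fraction of training messages in each class.
    log_prior = {c: math.log(doc_counts[c] / len(labels)) for c in classes}

    # Likelihood: smoothed relative frequency of each word within a class.
    log_like = {}
    for c in classes:
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_like[c] = {
            w: math.log((word_counts[c][w] + alpha) / total) for w in vocab
        }
    return log_prior, log_like


def predict_nb(text, log_prior, log_like):
    """Return the class with the highest log-posterior score."""
    scores = {}
    for c in log_prior:
        # Words never seen in training are skipped (an assumption of this sketch).
        scores[c] = log_prior[c] + sum(
            log_like[c][w] for w in text.lower().split() if w in log_like[c]
        )
    return max(scores, key=scores.get)


# Hypothetical toy data: 1 = spam, 0 = ham.
messages = [
    "win free prize now",
    "free cash win",
    "meeting at noon",
    "lunch at noon tomorrow",
]
labels = [1, 1, 0, 0]

log_prior, log_like = train_nb(messages, labels)
spam_guess = predict_nb("free prize", log_prior, log_like)
ham_guess = predict_nb("lunch at noon", log_prior, log_like)
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, which is why the scores are sums of logarithms rather than products.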
fcahpt.jpg ADDED
requirements.txt ADDED
@@ -0,0 +1,5 @@
+ scikit-learn
+ joblib
+ streamlit
+ numpy
+ pandas
tfidf_vectorizer.joblib ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2250f89134c52246b8898de941d5d36273433b5df1840d12379e459967e8e819
+ size 1150476
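Review note: the commit ships two fitted artifacts (`tfidf_vectorizer.joblib` and `Naive_Bayes_Spam_Detection.joblib`) but not the script that produced them. The sketch below shows one plausible way such files could be created, evaluated, and reloaded. The choice of `MultinomialNB`, the default `TfidfVectorizer` settings, the toy data, and the temporary output directory are assumptions; the commit does not state which Naive Bayes variant or settings were actually used.

```python
import os
import tempfile

from joblib import dump, load
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data: 1 = spam, 0 = ham.
train_texts = [
    "win a free prize now",
    "claim your free cash reward",
    "urgent offer, win money today",
    "are we still meeting at noon",
    "lunch tomorrow with the team",
    "please review the attached report",
]
train_labels = [1, 1, 1, 0, 0, 0]

# Fit the vectorizer on training text only, then train the classifier.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts)
model = MultinomialNB()
model.fit(X_train, train_labels)

# Evaluate on held-out messages (far too few here for meaningful scores).
test_texts = ["free prize waiting, claim now", "see you at lunch tomorrow"]
test_labels = [1, 0]
test_pred = model.predict(vectorizer.transform(test_texts))
precision = precision_score(test_labels, test_pred, zero_division=0)
recall = recall_score(test_labels, test_pred, zero_division=0)
f1 = f1_score(test_labels, test_pred, zero_division=0)

# Persist both objects under the file names that app.py loads.
out_dir = tempfile.mkdtemp()
vec_path = os.path.join(out_dir, "tfidf_vectorizer.joblib")
model_path = os.path.join(out_dir, "Naive_Bayes_Spam_Detection.joblib")
dump(vectorizer, vec_path)
dump(model, model_path)

# Reload and confirm the round trip gives the same predictions.
loaded_vectorizer = load(vec_path)
loaded_model = load(model_path)
reloaded_pred = loaded_model.predict(loaded_vectorizer.transform(test_texts))
```

The vectorizer and the model have to be saved as a pair, because the classifier only understands the feature columns produced by the exact vectorizer it was trained with. Since requirements.txt does not pin versions, it may also be worth checking that the scikit-learn version in the Space matches the one used for training, as pickled estimators are not guaranteed to load across versions.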