Introduction
Text classification is a common task in Natural Language Processing (NLP) in which we assign categories to text, for example labeling a sentence as positive or negative in sentiment, or a message as spam or not spam. In this article, you’ll learn how to build a simple text classification program in Python using the NLTK library.
What is Text Classification?
Text classification is the process of categorizing text into predefined labels. It is widely used in:
- Sentiment analysis
- Spam detection
- Topic labeling
- Chatbots and recommendation systems
Approach to Build a Text Classifier
We will create a simple classifier using NLTK. The steps include:
- Preparing a small dataset
- Converting text into features
- Training a Naive Bayes classifier
- Testing it on a new sentence
Requirements
Install NLTK if not already installed:
pip install nltk
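Download the tokenizer data that word_tokenize needs:
import nltk
nltk.download('punkt')
# Newer NLTK releases look for 'punkt_tab' instead; downloading both covers either case
nltk.download('punkt_tab')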
Python Program for Text Classification
import nltk
from nltk.tokenize import word_tokenize
from nltk.classify import NaiveBayesClassifier
# Sample dataset (sentence, label)
data = [
    ("I love this product", "positive"),
    ("This is amazing", "positive"),
    ("I feel great today", "positive"),
    ("I hate this", "negative"),
    ("This is terrible", "negative"),
    ("I feel bad", "negative"),
]
# Feature extraction function
def extract_features(sentence):
    # Lowercase and tokenize, then mark each word as present
    words = word_tokenize(sentence.lower())
    return {word: True for word in words}
# Prepare training data
training_data = [(extract_features(text), label) for (text, label) in data]
# Train classifier
classifier = NaiveBayesClassifier.train(training_data)
# Test sentence
test_sentence = "I love this"
features = extract_features(test_sentence)
# Predict
result = classifier.classify(features)
print("Sentence:", test_sentence)
print("Predicted Sentiment:", result)
Output
Sentence: I love this
Predicted Sentiment: positive
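The predicted label alone does not show how confident the classifier is. As a minimal sketch, the snippet below uses NLTK's prob_classify to print a probability for each label. It makes one simplifying assumption: plain whitespace splitting replaces word_tokenize, so the example runs without any downloaded tokenizer data.

```python
from nltk.classify import NaiveBayesClassifier

# Same toy dataset as in the program above
data = [
    ("I love this product", "positive"),
    ("This is amazing", "positive"),
    ("I feel great today", "positive"),
    ("I hate this", "negative"),
    ("This is terrible", "negative"),
    ("I feel bad", "negative"),
]

def extract_features(sentence):
    # Assumption: whitespace splitting stands in for word_tokenize,
    # so this sketch needs no downloaded tokenizer data
    return {word: True for word in sentence.lower().split()}

training_data = [(extract_features(text), label) for text, label in data]
classifier = NaiveBayesClassifier.train(training_data)

# prob_classify returns a probability distribution over the labels
dist = classifier.prob_classify(extract_features("I love this"))
for label in sorted(dist.samples()):
    print(label, round(dist.prob(label), 3))
print("Most likely label:", dist.max())
```

A probability close to 0.5 suggests the classifier is near guessing, which is likely to happen often with a dataset this small.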
Explanation of the Code
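- A small dataset of labeled sentences is created as (sentence, label) pairs
- Each sentence is lowercased, tokenized, and converted into a feature dictionary that maps every word it contains to True
- A Naive Bayes classifier is trained on these feature dictionaries and their labels
- A new sentence goes through the same feature extraction and is then classified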
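To make the training step less of a black box, here is a rough pure-Python sketch of the idea behind Naive Bayes on this dataset. It is a simplified illustration, not NLTK's actual implementation: the names train and log_scores, the smoothing constant alpha=0.5, and the use of whitespace splitting instead of word_tokenize are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict

data = [
    ("I love this product", "positive"),
    ("This is amazing", "positive"),
    ("I feel great today", "positive"),
    ("I hate this", "negative"),
    ("This is terrible", "negative"),
    ("I feel bad", "negative"),
]

def extract_features(sentence):
    # Assumption: whitespace splitting stands in for word_tokenize
    return {word: True for word in sentence.lower().split()}

def train(samples, alpha=0.5):
    """Count how many sentences of each label contain each word."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    for text, label in samples:
        label_counts[label] += 1
        for word in extract_features(text):
            word_counts[label][word] += 1
    return label_counts, word_counts, alpha

def log_scores(model, sentence):
    """Return log(prior) + sum of log P(word present | label) per label."""
    label_counts, word_counts, alpha = model
    total = sum(label_counts.values())
    vocabulary = {w for counts in word_counts.values() for w in counts}
    scores = {}
    for label, n in label_counts.items():
        score = math.log(n / total)  # log prior of the label
        for word in extract_features(sentence):
            if word not in vocabulary:
                continue  # skip words never seen during training
            # Smoothed estimate, so unseen (label, word) pairs are not zero
            p = (word_counts[label][word] + alpha) / (n + 2 * alpha)
            score += math.log(p)
        scores[label] = score
    return scores

model = train(data)
scores = log_scores(model, "I love this")
prediction = max(scores, key=scores.get)
print(scores)
print("Predicted:", prediction)
```

In this sketch, "i" and "this" occur equally often in both classes, so they cancel out and the word "love" alone tips the decision toward positive.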
How to Improve This Model
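- Use a larger dataset, since six sentences are far too few for the classifier to generalize
- Remove stop words such as "is" and "this", which appear in both classes here and add little signal
- Apply stemming or lemmatization so that related word forms share one feature
- Try more advanced models such as logistic regression or deep learning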
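As one example of these improvements, stop-word removal can be sketched as a small change to the feature extraction function. The STOP_WORDS set below is a hypothetical, hand-picked list for illustration only, and whitespace splitting again stands in for word_tokenize; a real project would more likely use a curated list such as the one in NLTK's stopwords corpus.

```python
# Hypothetical, hand-picked stop-word list for illustration only
STOP_WORDS = {"i", "this", "is", "a", "an", "the", "it", "to", "of"}

def extract_features(sentence, stop_words=STOP_WORDS):
    # Assumption: whitespace splitting stands in for word_tokenize
    words = sentence.lower().split()
    # Keep only the words that are not in the stop-word list
    return {word: True for word in words if word not in stop_words}

print(extract_features("This is amazing"))  # {'amazing': True}
print(extract_features("I hate this"))      # {'hate': True}
```

For sentiment tasks it is usually worth keeping negation words such as "not" out of the stop-word list, because dropping them can flip the meaning of a sentence.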
Why Text Classification is Important
- Helps in analyzing large text data
- Useful in social media monitoring
- Improves user experience in applications
- Automates decision-making processes