Python Program for Text Classification of a Given Sentence Using NLTK

Introduction

Text classification is a common task in Natural Language Processing (NLP) in which a piece of text is assigned to one of several categories, for example labeling a sentence as positive or negative in sentiment, or an email as spam or not spam. In this article, you’ll learn how to build a simple text classification program in Python using the NLTK library.



What is Text Classification?

Text classification is the process of categorizing text into predefined labels. It is widely used in:

  • Sentiment analysis
  • Spam detection
  • Topic labeling
  • Chatbots and recommendation systems

Approach to Build a Text Classifier
We will create a simple classifier using NLTK. The steps include:

  • Preparing a small dataset
  • Converting text into features
  • Training a Naive Bayes classifier
  • Testing it on a new sentence

Requirements
Install NLTK if not already installed:

pip install nltk

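Download the tokenizer models that word_tokenize() relies on. Recent NLTK releases look for punkt_tab, while older ones use punkt, so downloading both covers either case:

import nltk
nltk.download('punkt')
nltk.download('punkt_tab')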

Python Program for Text Classification

import nltk
from nltk.tokenize import word_tokenize
from nltk.classify import NaiveBayesClassifier

# Sample dataset (sentence, label)
data = [
    ("I love this product", "positive"),
    ("This is amazing", "positive"),
    ("I feel great today", "positive"),
    ("I hate this", "negative"),
    ("This is terrible", "negative"),
    ("I feel bad", "negative"),
]

# Feature extraction: lowercase and tokenize the sentence, then mark
# each word as present (a bag-of-words feature dictionary)
def extract_features(sentence):
    words = word_tokenize(sentence.lower())
    return {word: True for word in words}

# Prepare training data
training_data = [(extract_features(text), label) for (text, label) in data]

# Train classifier
classifier = NaiveBayesClassifier.train(training_data)

# Test sentence
test_sentence = "I love this"
features = extract_features(test_sentence)

# Predict
result = classifier.classify(features)

print("Sentence:", test_sentence)
print("Predicted Sentiment:", result)

Output

Sentence: I love this
Predicted Sentiment: positive

Explanation of the Code

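  • A small dataset of six sentences, each labeled positive or negative, is created
  • Each sentence is lowercased, tokenized, and converted into a feature dictionary that maps every word to True (a bag-of-words representation)
  • A Naive Bayes classifier is trained on these labeled feature dictionaries
  • A new sentence is converted the same way and assigned the most probable label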
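To look inside the trained model, NLTK's NaiveBayesClassifier also offers prob_classify(), which returns a probability for each label instead of only the winner, and show_most_informative_features(), which prints the word features that best separate the classes. The sketch below retrains on the same six sentences. As an assumption to avoid the tokenizer download, it splits on whitespace with str.split() instead of word_tokenize(); for these punctuation-free sentences both give the same tokens.

```python
from nltk.classify import NaiveBayesClassifier

# Same toy dataset as the program above
data = [
    ("I love this product", "positive"),
    ("This is amazing", "positive"),
    ("I feel great today", "positive"),
    ("I hate this", "negative"),
    ("This is terrible", "negative"),
    ("I feel bad", "negative"),
]

def extract_features(sentence):
    # Assumption: whitespace splitting stands in for word_tokenize() so no
    # punkt download is needed; the tokens match for these sentences.
    return {word: True for word in sentence.lower().split()}

training_data = [(extract_features(text), label) for (text, label) in data]
classifier = NaiveBayesClassifier.train(training_data)

features = extract_features("I love this")
print(features)  # {'i': True, 'love': True, 'this': True}

# Probability of each label, not just the most likely one
dist = classifier.prob_classify(features)
for label in dist.samples():
    print(label, round(dist.prob(label), 3))

# Words whose presence or absence most strongly separates the labels
classifier.show_most_informative_features(5)
```

Here "i" and "this" occur equally often in both classes, so the decision rests on "love", which appears only in a positive sentence.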

How to Improve This Model

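  • Use a larger, more varied dataset
  • Remove stop words (very common words such as "the" or "is") so the model focuses on informative words
  • Apply stemming or lemmatization so that different forms of a word, such as "love" and "loving", count as the same feature
  • Try more advanced models such as logistic regression or deep learning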
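Stop-word removal and stemming only change the feature function, not the classifier. The sketch below is one possible variant, not the article's own code: it drops stop words and reduces each word to its stem with NLTK's PorterStemmer, so "love", "loved" and "loving" all become the feature "love". The tiny STOP_WORDS set and the name extract_features_normalized are hypothetical, chosen for illustration; NLTK's full English list is available from nltk.corpus.stopwords.words('english') after nltk.download('stopwords'), and lemmatization with nltk.stem.WordNetLemmatizer works the same way after nltk.download('wordnet').

```python
from nltk.classify import NaiveBayesClassifier
from nltk.stem import PorterStemmer

data = [
    ("I love this product", "positive"),
    ("This is amazing", "positive"),
    ("I feel great today", "positive"),
    ("I hate this", "negative"),
    ("This is terrible", "negative"),
    ("I feel bad", "negative"),
]

# Hypothetical, deliberately tiny stop-word set for illustration only
STOP_WORDS = {"i", "am", "is", "it", "this", "the", "a"}
stemmer = PorterStemmer()

def extract_features_normalized(sentence):
    # Lowercase and split on whitespace (assumption: stands in for
    # word_tokenize), drop stop words, then keep each word's Porter stem.
    words = sentence.lower().split()
    return {stemmer.stem(word): True for word in words if word not in STOP_WORDS}

training_data = [(extract_features_normalized(text), label) for (text, label) in data]
classifier = NaiveBayesClassifier.train(training_data)

features = extract_features_normalized("I am loving it")
print(features)                        # {'love': True}
print(classifier.classify(features))   # positive
```

With the original extract_features(), "am", "loving" and "it" never occur in the training sentences, so NLTK's Naive Bayes ignores them and only "i" remains, which appears equally often in both classes. Stemming lets the unseen form "loving" reuse the evidence from "love".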

Why Text Classification is Important

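  • Makes it practical to analyze large volumes of text
  • Supports social media monitoring, such as tracking sentiment about a product
  • Improves user experience in applications, for example by filtering spam
  • Automates decisions that would otherwise require manual review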

Conclusion

Text classification using NLTK is a great starting point for beginners in NLP. With just a few lines of code, you can build a basic model that categorizes sentences. As you progress, you can enhance it with more data and advanced techniques.
