Python Text Classification Using NLTK with Example Code (Beginner Guide)

Introduction

Text classification is one of the most important tasks in Natural Language Processing (NLP). Whether you're building a spam filter, sentiment analyzer, or chatbot, classification helps machines understand human language.

Beginners often struggle to implement it in Python. In this guide, you'll build a simple text classification model with NLTK, step by step, with complete code and its output.

Concept Explanation

Text classification is the process of assigning predefined labels to text data.

Common use cases include:

Sentiment analysis (positive/negative)
Spam detection (spam/not spam)
Topic classification
Chatbots and recommendation systems

How it works:

Convert text into features (words)
Train a machine learning model
Predict category for new text
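
To make the first step concrete, here is a minimal sketch of turning a sentence into word features using plain Python. The helper name to_features is hypothetical, and splitting on whitespace is a simplification; the full program in this guide uses NLTK's tokenizer instead.

```python
def to_features(sentence):
    """Turn a sentence into a bag-of-words feature dictionary."""
    # Lowercase and split on whitespace (assumption: no punctuation handling here).
    return {word: True for word in sentence.lower().split()}

features = to_features("I love this product")
print(features)
```

Each word becomes a key, and the classifier later learns which keys tend to appear with which label.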

Program Code

pip install nltk

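import nltk
# Tokenizer data used by word_tokenize
nltk.download('punkt')
# NLTK 3.9 and later look for 'punkt_tab' instead
nltk.download('punkt_tab')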

import nltk
from nltk.tokenize import word_tokenize
from nltk.classify import NaiveBayesClassifier

# Sample dataset
data = [
    ("I love this product", "positive"),
    ("This is amazing", "positive"),
    ("I feel great today", "positive"),
    ("I hate this", "negative"),
    ("This is terrible", "negative"),
    ("I feel bad", "negative")
]

# Feature extraction
def extract_features(sentence):
    words = word_tokenize(sentence.lower())
    return {word: True for word in words}

# Training data
training_data = [(extract_features(text), label) for (text, label) in data]

# Train model
classifier = NaiveBayesClassifier.train(training_data)

# Test sentence
test_sentence = "I love this"
features = extract_features(test_sentence)

# Prediction
result = classifier.classify(features)

print("Sentence:", test_sentence)
print("Predicted Sentiment:", result)

Sample Output

Sentence: I love this
Predicted Sentiment: positive

Step-by-Step Explanation

A small dataset of labeled sentences is created
Each sentence is tokenized and converted into a dictionary of word features
A Naive Bayes classifier is trained on those features
A new sentence is classified based on the learned patterns
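
To see what "learned patterns" means, you can ask the trained classifier for label probabilities and for the words it finds most informative. The sketch below rebuilds the same classifier so it runs on its own. As an assumption for brevity, it splits on whitespace instead of calling word_tokenize, so no tokenizer download is needed.

```python
from nltk.classify import NaiveBayesClassifier

data = [
    ("I love this product", "positive"),
    ("This is amazing", "positive"),
    ("I feel great today", "positive"),
    ("I hate this", "negative"),
    ("This is terrible", "negative"),
    ("I feel bad", "negative"),
]

def extract_features(sentence):
    # Whitespace split stands in for word_tokenize in this sketch.
    return {word: True for word in sentence.lower().split()}

classifier = NaiveBayesClassifier.train(
    [(extract_features(text), label) for text, label in data]
)

# Probability the model assigns to each label for one sentence
dist = classifier.prob_classify(extract_features("I love this"))
for label in dist.samples():
    print(label, round(dist.prob(label), 3))

# Words whose presence shifts the prediction the most
classifier.show_most_informative_features(5)
```

With only six training sentences the probabilities are rough, but they show which words drive a prediction.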

Time Complexity

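Both costs grow with the amount of text:

Training: roughly O(n × m), where n = number of samples and m = average words per sentence
Prediction: O(m × k) for one sentence, where k = number of labels (effectively O(m) for the two labels used here)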

Real-World Applications

Email spam filtering
Sentiment analysis on social media
Customer feedback analysis
News categorization

Common Mistakes

Using very small datasets
Not cleaning text data
Not removing stop words (common words such as "the" or "is")
Not testing on data the model has not seen
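
The cleaning and stop-word mistakes above can be avoided with a small preprocessing step. This is a minimal sketch under stated assumptions: the helper name clean_features and the tiny STOP_WORDS set are made up for illustration (NLTK also ships a full stop word corpus that needs a separate download), and PorterStemmer is used because it needs no extra data.

```python
import string
from nltk.stem import PorterStemmer

# Tiny hand-written stop word list, for illustration only.
STOP_WORDS = {"i", "a", "an", "the", "is", "are", "this", "these", "to", "of", "and"}
stemmer = PorterStemmer()

def clean_features(sentence):
    # Lowercase, strip punctuation, and split on whitespace.
    text = sentence.lower().translate(str.maketrans("", "", string.punctuation))
    tokens = text.split()
    # Drop stop words, then reduce each remaining word to its stem.
    return {stemmer.stem(tok): True for tok in tokens if tok not in STOP_WORDS}

cleaned = clean_features("These products are great!")
print(cleaned)
```

For sentiment tasks, be careful not to put negations such as "not" in the stop word list, since removing them can flip the meaning of a sentence.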

Best Practices

Use larger datasets for better accuracy
Apply preprocessing (stopword removal, stemming)
Use cross-validation
Try advanced models after basics
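
As a rough illustration of cross-validation, the sketch below splits the sample data into k folds, trains on the other folds, and averages the accuracy. The cross_validate helper is hypothetical and written by hand here. It splits on whitespace instead of using word_tokenize, and six sentences are far too few for a meaningful score.

```python
from nltk.classify import NaiveBayesClassifier, accuracy

data = [
    ("I love this product", "positive"),
    ("This is amazing", "positive"),
    ("I feel great today", "positive"),
    ("I hate this", "negative"),
    ("This is terrible", "negative"),
    ("I feel bad", "negative"),
]

def extract_features(sentence):
    # Whitespace split stands in for word_tokenize in this sketch.
    return {word: True for word in sentence.lower().split()}

def cross_validate(samples, k=3):
    """Return the mean accuracy over k folds."""
    scores = []
    for i in range(k):
        # Fold i holds every k-th sample, starting at index i.
        test = samples[i::k]
        train = [s for j, s in enumerate(samples) if j % k != i]
        classifier = NaiveBayesClassifier.train(train)
        scores.append(accuracy(classifier, test))
    return sum(scores) / k

featuresets = [(extract_features(text), label) for text, label in data]
score = cross_validate(featuresets, k=3)
print("Mean accuracy:", score)
```

With this data order, each fold happens to contain one positive and one negative sentence. On a real dataset, shuffle the samples before splitting so every fold contains a mix of labels.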

Frequently Asked Questions

1. What is NLTK used for?
NLTK is a Python library used for Natural Language Processing tasks like tokenization, classification, and sentiment analysis.

2. Which algorithm is used here?
This guide uses NLTK's NaiveBayesClassifier, a Naive Bayes classifier that is simple and fast to train.

3. Can I use this for real projects?
Yes, but you should train it with larger datasets for better performance.

4. What is tokenization?
It is the process of splitting text into words or tokens.
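
Here is a quick illustration using wordpunct_tokenize, a regex-based NLTK tokenizer that needs no downloaded data. It is shown only as an example; the program in this guide uses word_tokenize, which may split some text differently.

```python
from nltk.tokenize import wordpunct_tokenize

# Splits into runs of word characters and runs of punctuation
tokens = wordpunct_tokenize("This is amazing!")
print(tokens)
```

Note that the punctuation mark becomes its own token.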

Conclusion

Text classification using NLTK is one of the easiest ways to get started with NLP. With just a few lines of Python code, you can build a working sentiment analyzer.

Once you're comfortable with this, you can move to advanced models like Logistic Regression, SVM, or deep learning for better accuracy and real-world applications.

SureToCode – Practical Coding Tutorials for Java, PHP, Web Development and AI
