Python Program to Remove Stop Words from a Text File Using NLTK

on April 17, 2026

Introduction

When working with text data, some words like “the”, “is”, “in”, and “and” appear very frequently but do not add much meaning. These are called stop words. Removing them is an important step in text preprocessing, especially in Natural Language Processing (NLP). In this article, you’ll learn how to read a passage from a text file and remove stop words using Python and the NLTK library.


What are Stop Words?

Stop words are common words that are usually filtered out before processing text. Examples include:

the, is, in, at, on, and, a, an

Removing these words shifts the focus to the content-bearing words and reduces the amount of text to process in tasks like text analysis, search, and machine learning.
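To make the idea concrete, here is a minimal sketch that filters a sentence against the small example list above. The hand-written set, the sample sentence, and the whitespace split are simplifications chosen for illustration; the full program later in this article uses NLTK's stop word list and tokenizer instead.

```python
# Illustrative only: a tiny hand-written stop word set taken from the examples above.
EXAMPLE_STOP_WORDS = {"the", "is", "in", "at", "on", "and", "a", "an"}

sentence = "The cat is sleeping on a mat in the garden"

# Split on whitespace and keep words whose lowercase form is not a stop word.
kept = [word for word in sentence.split() if word.lower() not in EXAMPLE_STOP_WORDS]

print(" ".join(kept))  # prints: cat sleeping mat garden
```

Lowercasing each word before the lookup is what lets "The" at the start of the sentence match "the" in the set.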

Requirements

Before running the program, install NLTK:


pip install nltk

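Also, download the stop word list and the tokenizer data that the program uses:


import nltk
nltk.download('stopwords')
nltk.download('punkt')
nltk.download('punkt_tab')  # tokenizer data used by word_tokenize in recent NLTK releases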

Python Program to Remove Stop Words


import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Ensure required data is downloaded
nltk.download('punkt')
nltk.download('punkt_tab')  # tokenizer data used by word_tokenize in recent NLTK releases
nltk.download('stopwords')

# Load stop words
stop_words = set(stopwords.words('english'))

# Read text from file
with open("input.txt", "r", encoding="utf-8") as file:
    text = file.read()

# Tokenize text into words
words = word_tokenize(text)

# Remove stop words
filtered_words = [word for word in words if word.lower() not in stop_words]

# Join words back into sentence
filtered_text = " ".join(filtered_words)

# Concatenate instead of passing two arguments, so print() adds no leading space
print("Original Text:\n" + text)
print("\nText After Removing Stop Words:\n" + filtered_text)

Example Input (input.txt)


This is a simple example to demonstrate how to remove stop words from a text file using Python.

Output


Original Text:
This is a simple example to demonstrate how to remove stop words from a text file using Python.

Text After Removing Stop Words:
simple example demonstrate remove stop words text file using Python .
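
Notice that the period survives as a separate token in the output above: the tokenizer emits punctuation as its own token, and "." is not a stop word. If that is unwanted, one possible approach is to also drop tokens that contain no letters or digits. The helper name `drop_stop_words_and_punctuation` and the sample inputs below are hypothetical stand-ins; in the program above you would pass the `words` list and the NLTK `stop_words` set.

```python
def drop_stop_words_and_punctuation(tokens, stop_words):
    """Keep tokens that are not stop words and contain at least one letter or digit."""
    return [
        token
        for token in tokens
        if token.lower() not in stop_words
        and any(ch.isalnum() for ch in token)
    ]


# Hand-written stand-ins for the tokenizer output and the stop word set.
sample_tokens = ["This", "is", "a", "simple", "example", ",", "using", "Python", "."]
sample_stop_words = {"this", "is", "a"}

print(drop_stop_words_and_punctuation(sample_tokens, sample_stop_words))
# prints: ['simple', 'example', 'using', 'Python']
```

Checking for at least one alphanumeric character, rather than requiring every character to be a letter, keeps tokens such as "3.10" or "well-known" while still discarding pure punctuation.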


Explanation of the Code

The program loads the English stop word list from NLTK.
It reads the text from a file named input.txt.
It tokenizes the text into individual words and punctuation marks.
It filters out every token whose lowercase form appears in the stop word list.
Finally, it joins the remaining tokens with spaces and prints the cleaned text.
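
The steps above can be sketched as a reusable function that also saves the result to a file. This is one possible arrangement, not part of the original program: the function name `clean_file`, the output path, and the pluggable `tokenize` parameter are assumptions made for illustration. The default tokenizer is a plain whitespace split so the sketch runs without any NLTK data.

```python
from pathlib import Path


def clean_file(input_path, output_path, stop_words, tokenize=str.split):
    """Read a text file, drop stop words, write the result, and return it."""
    text = Path(input_path).read_text(encoding="utf-8")
    tokens = tokenize(text)
    # Compare the lowercase form so capitalized stop words are removed too.
    kept = [token for token in tokens if token.lower() not in stop_words]
    cleaned = " ".join(kept)
    Path(output_path).write_text(cleaned, encoding="utf-8")
    return cleaned
```

To reproduce the behavior of the program above, you would call it with `set(stopwords.words('english'))` as `stop_words` and `word_tokenize` as `tokenize`.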

Why Removing Stop Words is Important

Reduces the number of tokens to store and process
Speeds up later processing steps
Focuses analysis on meaningful words
Can improve NLP model performance on tasks where common words carry little signal
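
These benefits depend on the task. NLTK's English list includes negations such as "not" and "no", and removing them can change the meaning of a sentence, which matters for tasks like sentiment analysis. A minimal sketch of adjusting the list with set operations follows. The base set here is a small hand-written stand-in for `set(stopwords.words('english'))`, and the names `keep` and `extra` and the sample tokens are hypothetical.

```python
# Small hand-written stand-in for set(stopwords.words('english')).
base_stop_words = {"the", "is", "a", "not", "no", "this"}

# Words to keep even though the base list treats them as stop words.
keep = {"not", "no"}
# Domain-specific words to treat as stop words in addition to the base list.
extra = {"movie"}

custom_stop_words = (base_stop_words - keep) | extra

tokens = ["This", "movie", "is", "not", "good"]
filtered = [t for t in tokens if t.lower() not in custom_stop_words]
print(filtered)  # prints: ['not', 'good']
```

With the unmodified base set, the same tokens would reduce to "movie good", which loses the negation.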

Conclusion

Removing stop words is a simple yet powerful preprocessing step in NLP. Using Python and NLTK, you can easily clean text data from files and prepare it for further analysis. This method is useful for beginners and can be extended for more advanced text processing tasks.
