Introduction
When working with text data, some words like “the”, “is”, “in”, and “and” appear very frequently but do not add much meaning. These are called stop words. Removing them is an important step in text preprocessing, especially in Natural Language Processing (NLP). In this article, you’ll learn how to read a passage from a text file and remove stop words using Python and the NLTK library.
What are Stop Words?
Stop words are common words that are usually filtered out before processing text. Examples include: the, is, in, at, on, and, a, an.
Removing these words helps focus on meaningful content and improves performance in tasks like text analysis, search, and machine learning.
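To make the idea concrete before bringing in NLTK, here is a minimal sketch that filters a sentence against a tiny hand-picked stop word set. The set and the helper name drop_stop_words are assumptions for illustration only; NLTK's English list is much longer.

```python
# A tiny, hand-picked stop word set for illustration only;
# NLTK's English list is much longer.
STOP_WORDS = {"the", "is", "in", "at", "on", "and", "a", "an"}


def drop_stop_words(sentence):
    """Return the words of `sentence` that are not in STOP_WORDS."""
    # Lowercase each word before the lookup so "The" matches "the".
    return [w for w in sentence.split() if w.lower() not in STOP_WORDS]


print(drop_stop_words("The cat is on the mat"))
# → ['cat', 'mat']
```

Using a set rather than a list keeps each membership check fast, which is the same reason the program later wraps the NLTK list in set().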
Requirements
Before running the program, install NLTK:
pip install nltk
Also, download the stopwords dataset (the program below additionally downloads the punkt tokenizer data that word_tokenize needs):
import nltk
nltk.download('stopwords')
Python Program to Remove Stop Words
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
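# Ensure required data is downloaded
# (newer NLTK releases may ask for 'punkt_tab' instead of 'punkt';
# if word_tokenize raises a LookupError, download the resource it names)
nltk.download('punkt')
nltk.download('stopwords')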
# Load stop words
stop_words = set(stopwords.words('english'))
# Read text from file
with open("input.txt", "r", encoding="utf-8") as file:
    text = file.read()
# Tokenize text into words
words = word_tokenize(text)
# Remove stop words
filtered_words = [word for word in words if word.lower() not in stop_words]
# Join words back into sentence
filtered_text = " ".join(filtered_words)
print("Original Text:\n", text)
print("\nText After Removing Stop Words:\n", filtered_text)
Example Input (input.txt)
This is a simple example to demonstrate how to remove stop words from a text file using Python.
Output
Original Text:
This is a simple example to demonstrate how to remove stop words from a text file using Python.
Text After Removing Stop Words:
simple example demonstrate remove stop words text file using Python .
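Notice the stray "." at the end of the output: word_tokenize returns punctuation as separate tokens, and punctuation is not in the stop word list. One possible way to drop it is sketched below. The token list is hard-coded to match what word_tokenize returns for the sample sentence so the snippet runs without NLTK, the stop word set is only the subset that occurs in the sample, and clean_tokens is a hypothetical helper name.

```python
# Tokens as word_tokenize would return them for the sample sentence
# (hard-coded here so the snippet runs without NLTK).
tokens = ["This", "is", "a", "simple", "example", "to", "demonstrate",
          "how", "to", "remove", "stop", "words", "from", "a", "text",
          "file", "using", "Python", "."]

# Only the NLTK English stop words that occur in the sample (assumed subset).
stop_words = {"this", "is", "a", "to", "how", "from"}


def clean_tokens(tokens, stop_words):
    """Drop stop words and tokens that contain no letters or digits."""
    return [t for t in tokens
            if t.lower() not in stop_words and any(ch.isalnum() for ch in t)]


print(" ".join(clean_tokens(tokens, stop_words)))
# → simple example demonstrate remove stop words text file using Python
```

Checking for at least one alphanumeric character, rather than requiring every character to be a letter, keeps tokens such as "3.14" or "state-of-the-art" while still discarding pure punctuation.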
Explanation of the Code
- The program reads text from a file named input.txt
- It tokenizes the text into individual words
- It loads English stop words using NLTK
- It filters out words that match the stop words list
- Finally, it joins and prints the cleaned text
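The steps above can also be wrapped in one reusable function. This is a sketch under stated assumptions: the name remove_stop_words_from_file and its tokenize parameter are illustrative and not part of NLTK, and the default tokenizer is plain str.split so the example runs without any downloads.

```python
import tempfile
from pathlib import Path


def remove_stop_words_from_file(path, stop_words, tokenize=str.split):
    """Read `path`, tokenize its text, and return it without stop words."""
    text = Path(path).read_text(encoding="utf-8")   # read the file
    tokens = tokenize(text)                         # split into words
    kept = [t for t in tokens if t.lower() not in stop_words]  # filter
    return " ".join(kept)                           # join back together


# Demo with a temporary file and a tiny illustrative stop word set.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "input.txt"
    sample.write_text("This is a simple example", encoding="utf-8")
    result = remove_stop_words_from_file(sample, {"this", "is", "a"})

print(result)
# → simple example
```

To get the same behavior as the program in this article, you could pass tokenize=word_tokenize and stop_words=set(stopwords.words('english')) instead of the defaults used in the demo.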
Why Removing Stop Words is Important
- Reduces data size
- Improves processing speed
- Focuses on meaningful words
- Can improve NLP model performance in tasks such as search and text classification
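A quick way to see the size reduction for yourself is to count tokens before and after filtering. The sentence and the stop word subset below are made up for illustration, so the exact percentage will differ on real text.

```python
from collections import Counter

# Made-up sample text and an illustrative subset of stop words.
tokens = "the cat sat on the mat and the dog sat in the sun".split()
stop_words = {"the", "on", "and", "in"}

kept = [t for t in tokens if t not in stop_words]
reduction = 1 - len(kept) / len(tokens)

print(len(tokens), len(kept))            # → 13 6
print(f"{reduction:.0%}")                # → 54%

# The most frequent token changes from a stop word to a content word.
print(Counter(tokens).most_common(1))    # → [('the', 4)]
print(Counter(kept).most_common(1))      # → [('sat', 2)]
```

Before filtering, the most common token is a stop word; afterwards, the frequency counts reflect the words that carry the meaning.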
