Build a Command-Line Text Analyzer in Python - Step by Step
Build a Command-Line Text Analyzer in Python that reads a text file, counts lines or words, displays frequent words, detect long lines.
Introduction
Whether you're a developer, writer, or student, analyzing text files can help uncover patterns and insights fast. Instead of doing it manually, why not build your own command-line text analyzer in Python?
In this guide, we'll build a tool that:
- Reads a text file
- Counts lines and words
- Displays frequent words
- Detects long lines
Let's dive in.
Prerequisites
- Python installed on your machine (3.6+ recommended).
- Create a new folder to keep the project related files in it. I'll name it as
command-line-text-analyzer
. - Open the above newly created folder in Visual Studio Code (VSCode).
- Create a new file named
text-analyzer.py
.
Step 1: Accept Command-Line Arguments
To analyze any file from the terminal, we need to accept its path and an optional max line length. We'll use argparse
for this.
import argparse
def main():
parser = argparse.ArgumentParser(description="Command-line Text Analyzer")
parser.add_argument("file", help="Path to the text file")
parser.add_argument("--length", type=int, default=80, help="Max line length to check for long lines")
args = parser.parse_args()
print(f"Analyzing file: {args.file}")
print(f"Long line threshold: {args.length} characters")
if __name__ == "__main__":
main()
Explanation:
-
argparse.ArgumentParser() lets us define command-line arguments.
-
file is required; --length is optional (defaults to 80).
-
Run this with:
Terminalpython text-analyzer.py dummy_text_file.txt --length 100
Create a sample text files for testing. I am using dummy_text_file.txt
with some random text and long_story.txt
with multiple paragraphs.
Step 2: Read the File Safely
Let’s read the file contents into memory and handle missing file errors gracefully.
def read_file(filepath):
try:
with open(filepath, 'r', encoding='utf-8') as f:
return f.readlines()
except FileNotFoundError:
print(f"Error: File not found - {filepath}")
return []
Explanation:
- This reads all lines into a list.
- It uses UTF-8 encoding for compatibility.
- If the file doesn’t exist, it prints an error and returns an empty list.
Update main()
to use it:
lines = read_file(args.file)
if not lines:
return
Step 3: Count Lines
This one’s easy! Just use len() on the list of lines.
def count_lines(lines):
return len(lines)
In main()
:
line_count = count_lines(lines)
print(f"Total lines: {line_count}")
Step 4: Count Words
We want to count all words, stripping punctuation so “hello!” becomes just “hello”.
import string
def count_words(lines):
words = []
for line in lines:
clean_line = line.translate(str.maketrans('', '', string.punctuation))
words.extend(clean_line.strip().split())
return len(words), words
Explanation:
str.maketrans('', '', string.punctuation)
removes all punctuation.split()
breaks the line into words.- We return:
- total word count
- list of all words (for frequency analysis)
Step 5: Analyze Word Frequency
We’ll use collections.Counter
to tally up word counts.
from collections import Counter
def word_frequencies(words):
return Counter(word.lower() for word in words)
Explanation:
- Converts all words to lowercase (so “Python” and “python” are the same).
- Returns a dictionary-like object with counts.
To show the top 10 words:
for word, count in freqs.most_common(10):
print(f"{word}: {count}")
Step 6: Detect Long Lines
Now let’s find lines that are longer than the user-specified threshold.
def detect_long_lines(lines, max_length=80):
return [i + 1 for i, line in enumerate(lines) if len(line) > max_length]
Explanation:
- We return line numbers (1-based) for each line that’s too long.
enumerate()
gives us both index and content.
Step 7: Display Everything Nicely
Let’s wrap up by printing all results cleanly.
def display_results(file_path, lines, word_count, freqs, long_lines):
print(f"\nAnalysis of: {file_path}")
print(f"Total Lines: {len(lines)}")
print(f"Total Words: {word_count}")
print("\nTop 10 Frequent Words:")
for word, count in freqs.most_common(10):
print(f" {word}: {count}")
if long_lines:
print(f"\nLines longer than threshold: {len(long_lines)}")
print("Line numbers:", long_lines)
else:
print("\nNo lines exceed the specified length.")
Then update main()
:
word_count, words = count_words(lines)
freqs = word_frequencies(words)
long_lines = detect_long_lines(lines, args.length)
display_results(args.file, lines, word_count, freqs, long_lines)
Final Code (All Together)
You can now paste everything into a file named text_analyzer.py
.
import argparse
from collections import Counter
import string
def read_file(filepath):
try:
with open(filepath, 'r', encoding='utf-8') as f:
return f.readlines()
except FileNotFoundError:
print(f"File not found: {filepath}")
return []
def count_lines(lines):
return len(lines)
def count_words(lines):
words = []
for line in lines:
line = line.translate(str.maketrans('', '', string.punctuation))
words.extend(line.strip().split())
return len(words), words
def word_frequencies(words):
return Counter(word.lower() for word in words)
def detect_long_lines(lines, max_length=80):
return [i + 1 for i, line in enumerate(lines) if len(line) > max_length]
def display_results(file_path, lines, word_count, freqs, long_lines):
print(f"\nAnalysis of: {file_path}")
print(f"Total Lines: {len(lines)}")
print(f"Total Words: {word_count}")
print("\nTop 10 Frequent Words:")
for word, count in freqs.most_common(10):
print(f" {word}: {count}")
if long_lines:
print(f"\nLines longer than threshold: {len(long_lines)}")
print("Line numbers:", long_lines)
else:
print("\nNo lines exceed the specified length.")
def main():
parser = argparse.ArgumentParser(description="Command-line Text Analyzer")
parser.add_argument("file", help="Path to the text file")
parser.add_argument("--length", type=int, default=80, help="Max line length to check for long lines")
args = parser.parse_args()
lines = read_file(args.file)
if not lines:
return
line_count = count_lines(lines)
word_count, words = count_words(lines)
freqs = word_frequencies(words)
long_lines = detect_long_lines(lines, args.length)
display_results(args.file, lines, word_count, freqs, long_lines)
if __name__ == "__main__":
main()
Testing the Analyzer
python text-analyzer.py dummy_text_file.txt
python text-analyzer.py dummy_text_file.txt --length 100
python text-analyzer.py long_story.txt
python text-analyzer.py long_story.txt --length 50
Conclusion
Congratulations! You’ve built a simple yet powerful command-line text analyzer in Python. This tool can help you quickly analyze text files, count words, and detect long lines.