
Build a Command-Line Text Analyzer in Python - Step by Step

Jagdish Kumawat
Founder @ Dewiride

Build a command-line text analyzer in Python that reads a text file, counts its lines and words, lists the most frequent words, and detects long lines.

Introduction

Whether you're a developer, writer, or student, analyzing text files can help uncover patterns and insights fast. Instead of doing it manually, why not build your own command-line text analyzer in Python?

In this guide, we'll build a tool that:

  • Reads a text file
  • Counts lines and words
  • Displays frequent words
  • Detects long lines

Let's dive in.

Prerequisites

  1. Python installed on your machine (3.6+ recommended).
  2. Create a new folder for the project files. I'll name it command-line-text-analyzer.
  3. Open the new folder in Visual Studio Code (VS Code).
  4. Create a new file named text-analyzer.py.

Project Structure in VS Code

Step 1: Accept Command-Line Arguments

To analyze any file from the terminal, we need to accept its path and an optional max line length. We'll use argparse for this.

text-analyzer.py
import argparse

def main():
    parser = argparse.ArgumentParser(description="Command-line Text Analyzer")
    # Required positional argument: the file to analyze
    parser.add_argument("file", help="Path to the text file")
    # Optional flag; argparse converts the value to int
    parser.add_argument("--length", type=int, default=80, help="Max line length to check for long lines")
    args = parser.parse_args()

    print(f"Analyzing file: {args.file}")
    print(f"Long line threshold: {args.length} characters")

if __name__ == "__main__":
    main()

Explanation:

  • argparse.ArgumentParser() lets us define command-line arguments.

  • file is required; --length is optional (defaults to 80).

  • Run this with:

    Terminal
    python text-analyzer.py dummy_text_file.txt --length 100
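Because parse_args() also accepts an explicit list of strings, you can check the argument handling without typing commands in a terminal. The sketch below moves the parser into a helper named build_parser(); that name is hypothetical and is not used elsewhere in this tutorial.

```python
import argparse

def build_parser():
    # Hypothetical helper: defines the same arguments as main() above
    parser = argparse.ArgumentParser(description="Command-line Text Analyzer")
    parser.add_argument("file", help="Path to the text file")
    parser.add_argument("--length", type=int, default=80,
                        help="Max line length to check for long lines")
    return parser

parser = build_parser()

# Simulates: python text-analyzer.py dummy_text_file.txt --length 100
args = parser.parse_args(["dummy_text_file.txt", "--length", "100"])
print(args.file, args.length)  # dummy_text_file.txt 100

# Without --length, the default of 80 applies
args = parser.parse_args(["dummy_text_file.txt"])
print(args.length)  # 80
```

A non-integer value such as --length abc makes argparse print a usage error and exit, so the rest of the program can rely on args.length being an int.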

Create some sample text files for testing. I'm using dummy_text_file.txt, which contains some random text, and long_story.txt, which has multiple paragraphs.

Sample Text Files

📄Download Sample TXT Files Zip

Step 2: Read the File Safely

Let’s read the file contents into memory and handle missing file errors gracefully.

text-analyzer.py
def read_file(filepath):
    try:
        # UTF-8 decoding; readlines() returns a list of lines
        with open(filepath, 'r', encoding='utf-8') as f:
            return f.readlines()
    except FileNotFoundError:
        print(f"Error: File not found - {filepath}")
        return []

Explanation:

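  • This reads all lines into a list. Each line keeps its trailing newline character.
  • It opens the file as UTF-8, so the result doesn't depend on the operating system's default encoding.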
  • If the file doesn’t exist, it prints an error and returns an empty list.
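Two details about read_file() are worth seeing in action. First, readlines() keeps each line's trailing "\n". Second, a file that is not valid UTF-8 raises UnicodeDecodeError, which read_file() does not catch. The sketch below is a hypothetical variant, read_file_tolerant(), that also reports decoding and other I/O errors; it is one possible approach, not part of the tutorial's final code.

```python
import os
import tempfile

def read_file_tolerant(filepath):
    """Hypothetical variant of read_file() that also reports decoding and OS errors."""
    try:
        with open(filepath, "r", encoding="utf-8") as f:
            return f.readlines()
    except FileNotFoundError:
        # Must come before OSError, because it is a subclass of OSError
        print(f"Error: File not found - {filepath}")
    except UnicodeDecodeError:
        print(f"Error: {filepath} is not valid UTF-8 text")
    except OSError as exc:
        # For example, permission denied or the path is a directory
        print(f"Error: Could not read {filepath} ({exc})")
    return []

# Write a small UTF-8 file, then read it back
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt", delete=False) as tmp:
    tmp.write("first line\nsecond line\n")
    path = tmp.name

lines = read_file_tolerant(path)
print(lines)  # ['first line\n', 'second line\n'] (newlines are kept)
os.remove(path)
```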

Update main() to use it:

text-analyzer.py
lines = read_file(args.file)
if not lines:
    return  # nothing to analyze (missing or empty file)

Step 3: Count Lines

This one’s easy! Just use len() on the list of lines.

text-analyzer.py
def count_lines(lines):
    return len(lines)

In main():

text-analyzer.py
line_count = count_lines(lines)
print(f"Total lines: {line_count}")
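Because read_file() loads the whole file with readlines(), len(lines) is all we need here. For very large files, you might prefer to count lines while streaming through the file instead. The sketch below shows that approach with a hypothetical count_lines_streaming() helper that is not part of the tutorial's tool.

```python
import os
import tempfile

def count_lines_streaming(filepath):
    """Hypothetical alternative: count lines without loading the whole file."""
    with open(filepath, "r", encoding="utf-8") as f:
        # Iterating over a file object yields one line at a time
        return sum(1 for _ in f)

with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt", delete=False) as tmp:
    # The last line has no trailing newline, but it is still counted
    tmp.write("one\ntwo\nthree")
    path = tmp.name

print(count_lines_streaming(path))  # 3
os.remove(path)
```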

Step 4: Count Words

We want to count all words, stripping punctuation so “hello!” becomes just “hello”.

text-analyzer.py
import string

def count_words(lines):
    words = []
    for line in lines:
        # Delete every character listed in string.punctuation
        clean_line = line.translate(str.maketrans('', '', string.punctuation))
        # Split on whitespace and collect the words
        words.extend(clean_line.strip().split())
    return len(words), words

Explanation:

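  • str.maketrans('', '', string.punctuation) builds a table that deletes every ASCII punctuation character, and translate() applies it to the line.
  • split() breaks the cleaned line into words wherever there is whitespace.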
  • We return:
    • total word count
    • list of all words (for frequency analysis)
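To see exactly what this approach keeps and removes, the sketch below runs the same logic on two sample lines (the translation table is built once instead of per line, which doesn't change the result). Note that string.punctuation contains only ASCII characters, so contractions lose their apostrophe and curly quotes survive. The regex-based count_words_regex() is a hypothetical alternative shown for comparison only; it is not part of the tutorial's tool.

```python
import re
import string

# Same logic as count_words() above, with the table built once
TABLE = str.maketrans("", "", string.punctuation)

def count_words(lines):
    words = []
    for line in lines:
        words.extend(line.translate(TABLE).split())
    return len(words), words

# Hypothetical alternative: extract word-like runs, keeping inner apostrophes
WORD_RE = re.compile(r"\w+(?:['’]\w+)*")

def count_words_regex(lines):
    words = [w for line in lines for w in WORD_RE.findall(line)]
    return len(words), words

sample = ["Hello, world! Hello again.\n", "Don't stop: it's long-term “work”.\n"]

total, words = count_words(sample)
print(total)  # 9
print(words)  # ['Hello', 'world', 'Hello', 'again', 'Dont', 'stop', 'its', 'longterm', '“work”']

print(count_words_regex(sample))
# (10, ['Hello', 'world', 'Hello', 'again', "Don't", 'stop', "it's", 'long', 'term', 'work'])
```

Neither approach is universally right: deleting punctuation is simple and predictable, while the regex version keeps contractions intact and splits hyphenated words.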

Step 5: Analyze Word Frequency

We’ll use collections.Counter to tally up word counts.

text-analyzer.py
from collections import Counter

def word_frequencies(words):
    # Lowercase first so different capitalizations are counted together
    return Counter(word.lower() for word in words)

Explanation:

  • Converts all words to lowercase (so “Python” and “python” are the same).
  • Returns a dictionary-like object with counts.

To show the top 10 words:

text-analyzer.py
for word, count in freqs.most_common(10):
    print(f"{word}: {count}")
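Here is a small worked example of how the Counter behaves. most_common() returns (word, count) pairs from highest to lowest count, and words with equal counts keep the order in which they were first seen, as documented for collections.Counter. The word list is made up for illustration.

```python
from collections import Counter

def word_frequencies(words):
    # Same as above: lowercase every word before counting
    return Counter(word.lower() for word in words)

words = ["Python", "is", "fun", "python", "IS", "PYTHON", "easy"]
freqs = word_frequencies(words)

print(freqs["python"])       # 3
print(freqs["missing"])      # 0 (unseen words count as zero, no KeyError)
print(freqs.most_common(3))  # [('python', 3), ('is', 2), ('fun', 1)]
```

"fun" and "easy" both appear once, and "fun" is listed first because it was encountered first.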

Step 6: Detect Long Lines

Now let’s find lines that are longer than the user-specified threshold.

text-analyzer.py
def detect_long_lines(lines, max_length=80):
    # Measure each line without the trailing newline kept by readlines()
    return [i + 1 for i, line in enumerate(lines) if len(line.rstrip('\n')) > max_length]

Explanation:

  • We return line numbers (1-based) for each line that’s too long.
  • enumerate() gives us both index and content.
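One boundary case deserves a check: readlines() keeps the trailing "\n", so a line with exactly 80 visible characters has a len() of 81. The version below strips the newline before measuring, and the test data makes the difference visible. Because files opened in text mode translate "\r\n" to "\n" by default, rstrip("\n") also covers Windows line endings.

```python
def detect_long_lines(lines, max_length=80):
    # Ignore the "\n" that readlines() keeps at the end of each line
    return [i + 1 for i, line in enumerate(lines) if len(line.rstrip("\n")) > max_length]

# Line 1 has exactly 80 visible characters, line 2 has 81
lines = ["x" * 80 + "\n", "y" * 81 + "\n", "short\n"]

print(len(lines[0]))            # 81: the newline is counted too
print(detect_long_lines(lines))  # [2]

# For comparison, measuring the raw string also flags line 1
print([i + 1 for i, line in enumerate(lines) if len(line) > 80])  # [1, 2]
```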

Step 7: Display Everything Nicely

Let’s wrap up by printing all results cleanly.

text-analyzer.py
def display_results(file_path, lines, word_count, freqs, long_lines):
    print(f"\nAnalysis of: {file_path}")
    print(f"Total Lines: {len(lines)}")
    print(f"Total Words: {word_count}")

    print("\nTop 10 Frequent Words:")
    for word, count in freqs.most_common(10):
        print(f" {word}: {count}")

    if long_lines:
        print(f"\nLines longer than threshold: {len(long_lines)}")
        print("Line numbers:", long_lines)
    else:
        print("\nNo lines exceed the specified length.")

Then update main():

text-analyzer.py
word_count, words = count_words(lines)
freqs = word_frequencies(words)
long_lines = detect_long_lines(lines, args.length)

display_results(args.file, lines, word_count, freqs, long_lines)
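Because display_results() only prints, it helps to capture its output when you want to check it automatically. The sketch below uses contextlib.redirect_stdout from the standard library; the file name demo.txt and the sample data are made up for illustration.

```python
import io
from collections import Counter
from contextlib import redirect_stdout

def display_results(file_path, lines, word_count, freqs, long_lines):
    # Same function as above
    print(f"\nAnalysis of: {file_path}")
    print(f"Total Lines: {len(lines)}")
    print(f"Total Words: {word_count}")
    print("\nTop 10 Frequent Words:")
    for word, count in freqs.most_common(10):
        print(f" {word}: {count}")
    if long_lines:
        print(f"\nLines longer than threshold: {len(long_lines)}")
        print("Line numbers:", long_lines)
    else:
        print("\nNo lines exceed the specified length.")

# Collect everything printed inside the with-block into a string
buffer = io.StringIO()
with redirect_stdout(buffer):
    display_results("demo.txt", ["a b\n", "b\n"], 3, Counter({"b": 2, "a": 1}), [])

output = buffer.getvalue()
print(output)
```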

Final Code (All Together)

You can now paste everything into a file named text-analyzer.py.

text-analyzer.py
import argparse
from collections import Counter
import string

def read_file(filepath):
    # Returns the file's lines (newlines included), or [] if it is missing
    try:
        with open(filepath, 'r', encoding='utf-8') as f:
            return f.readlines()
    except FileNotFoundError:
        print(f"File not found: {filepath}")
        return []

def count_lines(lines):
    return len(lines)

def count_words(lines):
    # Build the punctuation-deleting table once instead of on every line
    table = str.maketrans('', '', string.punctuation)
    words = []
    for line in lines:
        line = line.translate(table)
        words.extend(line.strip().split())
    return len(words), words

def word_frequencies(words):
    # Lowercase so "Python" and "python" are counted together
    return Counter(word.lower() for word in words)

def detect_long_lines(lines, max_length=80):
    # Measure without the trailing newline; report 1-based line numbers
    return [i + 1 for i, line in enumerate(lines) if len(line.rstrip('\n')) > max_length]

def display_results(file_path, lines, word_count, freqs, long_lines):
    print(f"\nAnalysis of: {file_path}")
    print(f"Total Lines: {len(lines)}")
    print(f"Total Words: {word_count}")
    print("\nTop 10 Frequent Words:")
    for word, count in freqs.most_common(10):
        print(f" {word}: {count}")
    if long_lines:
        print(f"\nLines longer than threshold: {len(long_lines)}")
        print("Line numbers:", long_lines)
    else:
        print("\nNo lines exceed the specified length.")

def main():
    parser = argparse.ArgumentParser(description="Command-line Text Analyzer")
    parser.add_argument("file", help="Path to the text file")
    parser.add_argument("--length", type=int, default=80, help="Max line length to check for long lines")
    args = parser.parse_args()

    lines = read_file(args.file)
    if not lines:
        return  # nothing to analyze (missing or empty file)

    line_count = count_lines(lines)
    word_count, words = count_words(lines)
    freqs = word_frequencies(words)
    long_lines = detect_long_lines(lines, args.length)

    display_results(args.file, lines, word_count, freqs, long_lines)

if __name__ == "__main__":
    main()

Testing the Analyzer

Terminal
python text-analyzer.py dummy_text_file.txt
python text-analyzer.py dummy_text_file.txt --length 100
python text-analyzer.py long_story.txt
python text-analyzer.py long_story.txt --length 50
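If you don't have the sample ZIP, you can generate a small test file yourself. The script below is a hypothetical helper: the file name generated_sample.txt and its contents are made up. Its fourth line is 91 characters long, so running python text-analyzer.py generated_sample.txt with the default threshold should flag line 4.

```python
from pathlib import Path

# Hypothetical sample content; line 4 is longer than 80 characters
sample_lines = [
    "The quick brown fox jumps over the lazy dog.",
    "The dog sleeps.",
    "A fox is quick; a dog is lazy.",
    "This line is deliberately made long enough to cross the default eighty character threshold.",
    "",
    "The end.",
]

path = Path("generated_sample.txt")
# One line per list item, ending with a final newline
path.write_text("\n".join(sample_lines) + "\n", encoding="utf-8")
print(f"Wrote {len(sample_lines)} lines to {path}")
```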

Sample Output

[Screenshots: Sample Output 1, 2, 3, and 4]

Conclusion

Congratulations! You’ve built a small command-line text analyzer in Python. It reads a text file, counts its lines and words, lists the most frequent words, and flags lines longer than a threshold you choose.
