Text and Document Processing
How to find and replace across files, extract information from documents, and convert between formats using Claude Code
From files to what's inside them
So far in this module, you've worked on files themselves — renaming them, moving them into folders, converting their formats. Now you're shifting to what's inside those files.
Finding and replacing text across dozens of reports. Pulling key details out of invoices or contracts. Converting a batch of documents from one format to another.
These are the tasks that eat up hours of your week if you do them by hand. Open a file, search for the old text, replace it, save, close, open the next one. Or worse — read through a stack of invoices, copy the total from each one into a spreadsheet, and hope you don't miss any.
Claude Code handles these the same way it handles file operations: you describe what you want, and it does the repetitive part.
Find and replace across many files
The most common text processing task is updating the same thing across a batch of files.
Say your team just finished Q3, and you have 30 report files that still say "Q3 2025." Every one of them needs to say "Q4 2025" instead.
In a word processor, you'd open each file, use Find and Replace, save, close, repeat. Thirty times.
With Claude Code:
In this folder, find every file that contains "Q3 2025" and replace it with "Q4 2025". Show me how many files were changed and how many replacements were made in each one.
Claude Code scans the files, makes the replacements, and reports back with a summary: "Updated 24 files. 3 files had multiple replacements. 6 files didn't contain the text."
That summary matters. It tells you what happened without you having to check every file.
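If you're curious what the repetitive part looks like, here is a minimal Python sketch of a literal find-and-replace with a per-file count. It is an illustration, not what Claude Code actually runs: the function names, the `*.txt` pattern, and the UTF-8 encoding are all assumptions made for the example.

```python
from pathlib import Path


def replace_in_folder(folder, old, new, pattern="*.txt"):
    """Replace `old` with `new` in each matching file.

    Returns {filename: number_of_replacements}, including files with 0.
    The "*.txt" pattern and UTF-8 encoding are assumptions for this sketch.
    """
    counts = {}
    for path in sorted(Path(folder).glob(pattern)):
        text = path.read_text(encoding="utf-8")
        found = text.count(old)
        if found:
            # Only rewrite files that actually contain the old text.
            path.write_text(text.replace(old, new), encoding="utf-8")
        counts[path.name] = found
    return counts


def summarize(counts):
    """Build a one-line summary in the same shape as the report above."""
    changed = sum(1 for n in counts.values() if n > 0)
    multiple = sum(1 for n in counts.values() if n > 1)
    untouched = sum(1 for n in counts.values() if n == 0)
    return (
        f"Updated {changed} files. "
        f"{multiple} files had multiple replacements. "
        f"{untouched} files didn't contain the text."
    )
```

Calling `summarize(replace_in_folder("reports", "Q3 2025", "Q4 2025"))` on a hypothetical `reports` folder would produce a summary line like the one quoted above.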
When replacements aren't straightforward
Some replacements require judgment, not just a literal swap.
"Change all references to the old product name to the new one — but only in the marketing materials, not in the legal documents."
Claude Code can handle this. It reads the files, infers which ones are marketing materials (from their filenames, folder location, or content), and makes the changes only where appropriate.
But here's the honest part: it will make judgment calls you might disagree with.
A file called brand-overview-legal-review.docx — is that marketing or legal?
Claude Code will guess.
The fix is the same as with bulk file operations: tell Claude Code to show you the plan before making changes.
Show me which files you'd change and what each replacement looks like, but don't modify anything yet. I want to review first.
Review the list. Approve the ones that look right. Flag the ones that need a different approach.
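The "show me first" idea can be sketched as a dry run: collect every line that would change, and write nothing. This is a hypothetical helper for illustration only; the function name, the `*.txt` pattern, and the tuple layout are assumptions.

```python
from pathlib import Path


def preview_replacements(folder, old, new, pattern="*.txt"):
    """List planned changes as (filename, line_number, before, after).

    Nothing is written to disk; this only reads the files.
    """
    planned = []
    for path in sorted(Path(folder).glob(pattern)):
        lines = path.read_text(encoding="utf-8").splitlines()
        for number, line in enumerate(lines, start=1):
            if old in line:
                planned.append((path.name, number, line, line.replace(old, new)))
    return planned
```

A review list built this way gives you one row per affected line, so a file you disagree with is easy to strike before anything is modified.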
Extracting information from documents
This is the one that saves the most time. Instead of reading through a stack of documents yourself, you tell Claude Code what to pull out.
Say you have 40 invoices from different vendors, all in different formats. Some are PDFs, some are Word documents, some are plain text emails you saved as files. You need a spreadsheet with the vendor name, invoice number, date, and total amount from each one.
Read every file in this folder. Each one is an invoice. Extract the vendor name, invoice number, date, and total amount from each. Save the results as a CSV file called invoice-summary.csv.
Claude Code opens each file, finds the relevant information (even though every invoice looks different), and builds your spreadsheet.
A few things to know about extraction:
It works best on text-based files. Plain text, Word documents, and many PDFs work well. Scanned documents (PDFs that are really just images of paper) are a different story — Claude Code can't read text from an image. If your invoices are scans, you'll need OCR software (optical character recognition — a tool that reads text from images) to convert them first. Claude Code can help you set this up, but it's an extra step.
It handles inconsistent formats well — often better than you'd expect. One invoice has "Total: $5,000" at the bottom. Another has "Amount Due: 5000.00" in the middle. A third lists "Grand Total" in bold. Claude Code recognizes these variations and pulls the right number from each.
But it's not perfect. When a document has ambiguous data — multiple dollar amounts, or a vendor name that could be interpreted different ways — Claude Code picks the most likely answer. It's usually right, but not always. For anything that matters (like financial data you're submitting), spot-check the results. Open a few of the original files and compare them to the extracted data.
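To see why inconsistent labels are workable but ambiguous amounts are risky, here is a small rule-based sketch in Python. It is far simpler than what Claude Code does, and it differs in one deliberate way: instead of picking the most likely answer, it flags anything ambiguous for review. The label list and function names are assumptions chosen to match the three examples above.

```python
import re
from decimal import Decimal

# Labels taken from the examples above; a real batch would likely need more.
TOTAL_PATTERN = re.compile(
    r"\b(?:grand\s+total|amount\s+due|total)\b\s*:?\s*\$?\s*(\d[\d,]*(?:\.\d+)?)",
    re.IGNORECASE,
)


def find_totals(text):
    """Return every amount that follows a recognized 'total' label."""
    return [Decimal(m.group(1).replace(",", "")) for m in TOTAL_PATTERN.finditer(text)]


def extract_total(text):
    """Return (amount, needs_review).

    One distinct amount: return it. Zero or several: return None and
    flag the document so a person looks at it.
    """
    candidates = set(find_totals(text))
    if len(candidates) == 1:
        return candidates.pop(), False
    return None, True
```

Note that "Subtotal" is not treated as a total here, because the pattern requires the label to start at a word boundary.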
The verification habit
In Module 3, you learned to verify data analysis results. The same habit applies here.
After Claude Code extracts information from 40 invoices, pick five at random. Open the originals. Check the extracted values.
If four out of five match, you're in good shape — scan the rest for obvious outliers. If two out of five are wrong, the extraction isn't reliable enough for this batch. Tell Claude Code what went wrong and ask it to try a different approach.
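The spot-check routine is simple enough to write down. The sketch below picks the random sample and applies the rule of thumb from this section (at most one mismatch in five is acceptable). The function names and the optional `seed` parameter are assumptions for illustration.

```python
import random


def pick_sample(filenames, k=5, seed=None):
    """Pick up to k filenames at random, without repeats.

    Passing a seed makes the choice repeatable, which helps if you
    want a colleague to check the same files.
    """
    rng = random.Random(seed)
    names = sorted(filenames)
    return rng.sample(names, min(k, len(names)))


def looks_reliable(mismatches, max_mismatches=1):
    """Apply the rule of thumb: 4 of 5 matching is fine, 2 wrong is not."""
    return mismatches <= max_mismatches
```

You still do the comparison yourself by opening the originals; the code only chooses which ones to open and records the verdict.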
Worked example: categorizing customer feedback
Here's a realistic scenario you can follow along with.
Your customer success team sends you a folder of 50 text files — each one is a piece of customer feedback collected over the past month.
The filenames are unhelpful: feedback-001.txt through feedback-050.txt.
You need a summary that tells you how many are bug reports, how many are feature requests, how many are praise, and how many are complaints.
Start Claude Code in the folder where you saved the files:
Read all the text files in this folder. Each one contains a piece of customer feedback. Categorize each file as one of: bug report, feature request, praise, or complaint. Then create a summary report that shows: (1) the count for each category, (2) a one-line summary of each piece of feedback, and (3) two representative quotes from each category. Save the report as feedback-summary.md.
Claude Code reads every file, decides what category each one falls into, and writes the summary report.
What you get back looks something like this:
Created feedback-summary.md with analysis of 50 feedback files:
- Bug reports: 14
- Feature requests: 18
- Praise: 11
- Complaints: 7
Each file is categorized with a one-line summary.
Representative quotes included for each category.
Open feedback-summary.md and review it.
Are the categories reasonable?
Did a glowing testimonial get labeled as a complaint?
Does the count add up to 50?
If something's off:
Files feedback-012 and feedback-037 are categorized as complaints, but they're actually feature requests: the tone is frustrated, but they're asking for something new. Please recategorize them and update the counts.
Claude Code fixes the two files and updates the totals.
The whole thing — reading 50 files, categorizing them, generating a report — takes a few minutes. Doing it by hand would take over an hour, and you'd probably miscount something.
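The "does the count add up" check from the review step is easy to automate if you have the per-file categories in hand. Here is a minimal sketch, assuming a hypothetical dictionary that maps each filename to its category; the function name and the message wording are made up for the example.

```python
from collections import Counter

ALLOWED = {"bug report", "feature request", "praise", "complaint"}


def check_report(categories, expected_files):
    """Tally categories and list anything that looks wrong.

    `categories` maps filename -> category. Returns (counts, problems),
    where an empty problems list means the basic checks passed.
    """
    counts = Counter(categories.values())
    problems = []
    if len(categories) != expected_files:
        problems.append(f"expected {expected_files} files, found {len(categories)}")
    for label in sorted(set(counts) - ALLOWED):
        problems.append(f"unexpected category: {label}")
    return counts, problems
```

With the numbers from the example report (14, 18, 11, and 7), the tally comes to 50 and the problems list is empty. This only catches bookkeeping errors; whether a glowing testimonial was mislabeled is still something you have to read for.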
Converting document formats
The other common task is converting files from one format to another.
Some conversions you might need:
- A set of Word documents into PDFs for sharing
- Markdown files into formatted web pages
- A collection of text files into a single combined document
- Plain text into a structured format like a spreadsheet
Convert all the Word documents in this folder to PDF format. Save the PDFs in a new folder called "pdf-versions" and keep the same filenames.
For document conversion, Claude Code often uses a tool called pandoc, a format converter that handles over 20 document types. If pandoc isn't on your machine yet, Claude Code will ask to install it. It's a standard, widely used document converter, so the request is safe to approve.
Some conversions don't need any extra tools at all. Claude Code handles plain text formats natively (Markdown to HTML, for example). Office document formats (Word to PDF, for instance) usually require pandoc or something similar.
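For the curious, a batch Word-to-PDF conversion with pandoc might look roughly like the Python sketch below. It is an assumption about one reasonable approach, not a record of what Claude Code runs. The function names are hypothetical; the pandoc invocation itself (`pandoc input -o output`) is the standard form. Keep in mind that pandoc produces PDFs through a separate PDF engine (a LaTeX installation, for example), so that may need installing too.

```python
import shutil
import subprocess
from pathlib import Path


def pandoc_command(source, out_dir):
    """Build the pandoc command that converts one file to PDF.

    The output keeps the source filename, with a .pdf extension.
    """
    source = Path(source)
    target = Path(out_dir) / source.with_suffix(".pdf").name
    return ["pandoc", str(source), "-o", str(target)]


def convert_folder(folder, out_name="pdf-versions"):
    """Convert every .docx in `folder`, saving PDFs in a subfolder."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not installed or not on PATH")
    out_dir = Path(folder) / out_name
    out_dir.mkdir(exist_ok=True)
    for source in sorted(Path(folder).glob("*.docx")):
        # check=True stops the batch at the first failed conversion.
        subprocess.run(pandoc_command(source, out_dir), check=True)
```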
Combining files
A variation on format conversion is combining multiple files into one.
Say you have 12 monthly reports as separate files and you want a single annual report:
Combine all the monthly report files in this folder into a single document called annual-report-2025.md. Put them in chronological order. Add a heading for each month before its section.
Claude Code reads the files, works out the chronological order (from filenames, dates in the content, or both), and stitches them together with headings.
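When the filenames already carry the date, the stitching step is mechanical. The sketch below assumes a hypothetical naming scheme of `2025-01.md` through `2025-12.md`, so sorting by name gives chronological order. Real folders are rarely that tidy, which is where Claude Code's ability to read dates from the content helps.

```python
import calendar
from pathlib import Path


def combine_monthly_reports(folder, output_name="annual-report-2025.md"):
    """Join files named 2025-MM.md into one document with month headings.

    The 2025-MM.md naming scheme is an assumption for this sketch.
    """
    folder = Path(folder)
    sections = []
    for path in sorted(folder.glob("2025-[0-9][0-9].md")):
        month = int(path.stem.split("-")[1])
        body = path.read_text(encoding="utf-8").strip()
        # calendar.month_name[1] is "January" in the default locale.
        sections.append(f"## {calendar.month_name[month]}\n\n{body}\n")
    output = folder / output_name
    output.write_text("\n".join(sections), encoding="utf-8")
    return output
```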
The same safety habits apply
Everything you learned about safety on the previous page still holds.
Work on a copy of the folder if the files are important. Test on a small batch first before running across all files. Review the results before using them.
Text operations carry one additional risk: unlike file moves and renames, text replacements change what's inside the files. If Claude Code replaces the wrong text, the original wording is overwritten in place. The rewind feature (press Escape twice) can undo these changes, and checkpoints persist even after you close Claude Code, so you can roll back later. But for anything high-stakes, a backup copy is still the simplest safety net.
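A backup copy takes one step. You can ask Claude Code to make one, copy the folder in your file manager, or use a few lines of Python like the sketch below. The function name and the timestamped folder name are assumptions chosen for the example.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_folder(folder):
    """Copy `folder` to a timestamped sibling folder and return its path.

    Example result: reports -> reports-backup-20250101-093000
    """
    source = Path(folder).resolve()
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = source.with_name(f"{source.name}-backup-{stamp}")
    # copytree fails if the target already exists, so nothing is overwritten.
    shutil.copytree(source, target)
    return target
```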
Tip: For high-stakes text replacements (legal documents, financial reports, anything going to clients), use Plan mode first. Press Shift+Tab twice to enter Plan mode. Claude Code will analyze the files and describe its plan without touching anything. Review the plan, then approve it or ask for changes.
What's next
You've covered file operations, text processing, and document handling. Next up: mail merge and template-based tasks, where you combine a template with data to generate many personalized files at once.