Claude Code for Non-Developers
Working with Data

Exercise: Clean, analyze, and chart sales data

Put your data skills to work by using Claude Code to clean a messy CSV, answer business questions, create a chart, and verify the numbers

What you'll practice

This exercise pulls together the whole module: exploring data, asking questions, cleaning up a mess, charting, and checking the results.

You'll work with a messy sales CSV, the kind of file you might export from a CRM or get from a colleague. Inconsistent formatting, duplicate rows, missing values. Your job is to clean it, answer three business questions, generate a chart, and verify one number by hand.

Before you start

You need two things:

  1. Claude Code installed and working.
  2. A project folder to work in. You'll create the sample data in the next step.

Tip: If you have a real data file from your work, use that instead of the sample. The prompts below still apply. Swap in your filename and adjust the questions to match your data.

Step 1: Set up your project folder

Create a folder for this exercise and start Claude Code:

Mac or Linux:

mkdir ~/Documents/sales-exercise
cd ~/Documents/sales-exercise
claude

Windows (PowerShell):

mkdir ~\Documents\sales-exercise
cd ~\Documents\sales-exercise
claude

Now ask Claude Code to create a messy data file:

Create a file called raw_sales.csv with about 200 rows of realistic sales data. Include these columns: order_id, customer_name, order_date, product, quantity, unit_price, region, status.

Make the data messy on purpose:
- Mix up date formats: some as 2025-03-15, some as 03/15/2025, some as March 15, 2025
- Inconsistent region names: use "West", "west", "WEST", "West Coast", and "Pacific" for the same region. Do the same kind of thing for East, Central, and South.
- About 15 rows with missing customer names
- About 20 duplicate rows (exact copies of other rows)
- Status values should be a mix of "Shipped", "shipped", "SHIPPED", "Delivered", "Pending", "Cancelled"
- Products: Enterprise License, Professional Plan, Starter Kit, Add-on Storage, Training Package

Save it as raw_sales.csv.

Claude Code will generate the file. Before you do anything else, look at what you're working with:

Describe raw_sales.csv. How many rows? What columns? What problems do you see?

Read the summary. It should flag the inconsistent dates, duplicate rows, mixed-up region names, and missing values. This is the "explore first" habit from this module. Understand the data before touching it.

Step 2: Clean the data

Now fix the problems. Ask Claude Code to work on a copy so the original stays untouched:

Make a copy of raw_sales.csv called clean_sales.csv, then clean it up:

1. Standardize all dates to YYYY-MM-DD format
2. Standardize region names: map all variations to West, East, Central, or South
3. Standardize status values: map everything to Shipped, Pending, or Cancelled (treat "Delivered" as "Shipped")
4. Remove exact duplicate rows
5. Keep rows with missing customer names but fill them with "Unknown"

Tell me what you changed before saving the file.

Claude Code will describe what it plans to change before doing anything. Read through it. Does the cleanup make sense?

  • Did it map the region names correctly? ("West Coast" and "Pacific" should both become "West.")
  • Did it handle all the date formats?
  • How many duplicates did it remove?
  • How many missing names did it fill?

If something looks off, say so. "You mapped 'Pacific' to South, but it should be West. Fix that in clean_sales.csv."

Once the cleanup looks right, move on.

Heads up: Always check the row count after cleaning. Ask: "How many rows in raw_sales.csv vs. clean_sales.csv?" The cleaned file should have fewer rows (because you removed duplicates) but every non-duplicate row from the original should still be there.

Step 3: Answer three business questions

Now put the clean data to work. Ask Claude Code these three questions, one at a time:

Question 1:

Using clean_sales.csv, what is the best-selling product by total revenue? Show me revenue for each product.

Question 2:

Which month had the highest total revenue? Show monthly totals.

Question 3:

What is the average order value across all orders? And what's the average order value by region?

After each answer, stop and read it. Does the breakdown make sense? Do the numbers add up?

If a product or region is missing from the results, ask Claude Code why. That's not a mistake on your part. Follow-up questions are how this works.

Tip: Want to drill deeper? Try a follow-up: "Break down Question 1 by region — which product sells best in each region?" Follow-ups in the same conversation are where Claude Code is at its best.

Step 4: Create a chart

Pick whichever of the three answers would work better as a chart than a table. Monthly revenue is a good candidate.

Create a bar chart of monthly revenue from clean_sales.csv. Label each bar with the month name. Title it "Monthly Revenue - 2025". Save it as monthly_revenue.png.

Claude Code will create the chart and save it as an image file. Open it:

Mac:

open monthly_revenue.png

Windows:

start monthly_revenue.png

Look at the chart. If it's not quite right, ask for changes: "Make the bars blue instead of the default color" or "Add dollar amounts on top of each bar" or "Make the text bigger."

Keep going until it looks like something you'd actually put in a slide deck.

Step 5: Verify one number

This is the step that matters most. Pick one number from your analysis. The total revenue for the best-selling product, the highest month's total, the overall average order value — any of them.

Open clean_sales.csv in a spreadsheet (Excel, Google Sheets, Numbers, LibreOffice Calc). Filter or sort to check the number yourself.

Say Claude Code told you the best-selling product is "Enterprise License" with $84,250 in revenue.

  1. Open clean_sales.csv in your spreadsheet app.
  2. Filter the product column to show only "Enterprise License."
  3. Multiply quantity by unit_price for a few rows to confirm the math.
  4. Sum the total and compare it to what Claude Code reported.

Does the number match? If it's close but not exact, ask Claude Code to explain: "How did you calculate total revenue for Enterprise License? Show me the formula you used."

If it's way off, that's not a failure — that's the verification step doing its job. Go back and check whether the cleaning step introduced an error, or whether Claude Code miscounted.

Tip: You don't need to verify every number. One is enough to build confidence or catch a problem early. Make it a habit.

What you just did

You cleaned a messy data file, answered business questions, created a chart, and checked the numbers. That's the whole data workflow from this module in one sitting:

  1. Explore the data before touching it.
  2. Clean it on a copy, not the original.
  3. Ask business questions in plain language.
  4. Create a chart that communicates the answer.
  5. Check at least one number by hand.

The questions change, the files change. The workflow doesn't.

If something went wrong

Error when creating the CSV? Check that you're in the right folder. Type pwd to see where you are.

Cleaning changed something it shouldn't have? Press Escape twice to open the rewind menu and undo the file changes. Then be more specific: "Don't change the product names, only standardize regions, dates, and statuses."

Chart didn't appear? Claude Code may need to install a library. It will ask permission to install matplotlib or a similar tool. Approve it and try again.

Verified number didn't match? Good — that means the verification step worked. Ask Claude Code to walk through its calculation step by step.

Going further (optional)

Done early? Here are a few ways to push further:

  • Write a CLAUDE.md in your project folder with a data dictionary: list each column name and what it means. Run the analysis again and see if the answers come back more accurate on the first try.
  • Ask Claude Code to generate a one-page summary report with the three answers and the chart embedded.
  • Combine files: ask Claude Code to create a customers.csv with customer details (name, email, company, signup_date), then merge it with your sales data. "Which company has the highest total spend?"
  • Export to Excel: "Save clean_sales.csv as clean_sales.xlsx with the columns auto-sized."

What's next

That's Module 3. You can explore, clean, analyze, chart, and verify data files with Claude Code.

Module 4 is about automating repetitive tasks. Instead of doing the same data cleanup by hand every week, you'll turn it into something that runs on its own.

On this page