On-Demand Tutorial

Exploring Data with ChatGPT:
A Step-by-Step Guide

ChatGPT isn’t just for writing and conversation; it’s also a powerful tool for exploring and analyzing data. In this guide, Brandon Krakowsky, Director of Data Science & Research at the Wharton AI and Analytics Initiative (WAIAI), walks you through how to use ChatGPT to perform exploratory data analysis (EDA) on a retail dataset. You’ll learn how to upload CSV files, preview your data, summarize it, and even export analysis-ready code, all from within ChatGPT. To access the data files used in this tutorial, click here.

1. Prepare Your Data

For this example, we’ll use three CSV files representing a retail business:

  • Orders data – detailed transactions including order date, shipping information, product details, discounts, and profit.
  • Customer data – segmentation and demographic information such as region and postal code.
  • Returns data – a list of returned orders identified by order ID.

You can adapt this process for any set of CSV files (or other similar formats) — just make sure your files are cleanly formatted and ready to upload.

2. Load the Files into ChatGPT

Once logged into ChatGPT (the Plus plan is recommended for faster performance and data handling), upload your CSV files directly to the chat.

Start with a simple prompt like:

“I’d like to analyze my customer, orders, and returns data. Please load each CSV file.”

ChatGPT will preview your datasets and provide quick summaries — showing how many rows and columns each file has and highlighting key fields.
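
Behind the scenes, a load step like this typically comes down to a few pandas calls. The sketch below is an illustration, not the code ChatGPT actually runs: the helper name load_tables, the column names, and the inline sample rows are all assumptions standing in for the real files.

```python
import io
import pandas as pd

# Hypothetical stand-ins for the uploaded files. In practice you would pass
# a file path such as "orders.csv" to pd.read_csv instead of a text buffer.
ORDERS_CSV = """Order ID,Customer ID,Order Date,Sales
O-1,C-1,2024-01-05,120.50
O-2,C-2,2024-01-07,35.00
O-3,C-1,2024-02-11,89.99
"""
CUSTOMERS_CSV = """Customer ID,Segment,Region
C-1,Consumer,East
C-2,Corporate,West
"""
RETURNS_CSV = """Order ID,Returned
O-2,Yes
"""


def load_tables(sources):
    """Read each CSV source into a DataFrame, keyed by table name."""
    return {name: pd.read_csv(src) for name, src in sources.items()}


tables = load_tables({
    "orders": io.StringIO(ORDERS_CSV),
    "customers": io.StringIO(CUSTOMERS_CSV),
    "returns": io.StringIO(RETURNS_CSV),
})

# Quick preview: row count, column count, and column names per table.
for name, df in tables.items():
    rows, cols = df.shape
    print(f"{name}: {rows} rows x {cols} columns -> {list(df.columns)}")
```

Keeping the tables in a dictionary makes it easy to repeat the same check across all three files.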

3. Generate Data Summaries

After the data loads, you can ask ChatGPT to create structured summaries for each file:

“Please summarize each dataset with column names, data types, missing values, unique values, and basic statistics.”

ChatGPT will return neatly formatted tables showing column names, data types, missing and unique values, and summary statistics like min, max, mean, median, and standard deviation.
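
If you want to reproduce this kind of summary yourself, a minimal pandas sketch could look like the following. The summarize helper and the sample columns are assumptions made for illustration; the code ChatGPT generates may be organized differently.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: data type, missing count, unique count."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "unique": df.nunique(),  # missing values are not counted as a value
    })


# Hypothetical sample of an orders table with one missing Sales value.
orders = pd.DataFrame({
    "Order ID": ["O-1", "O-2", "O-3", "O-4"],
    "Sales": [120.5, 35.0, None, 89.5],
    "Quantity": [2, 1, 3, 1],
})

summary = summarize(orders)

# describe() covers numeric columns: count, mean, std, min, quartiles, max.
# The "50%" row is the median.
stats = orders.describe()

print(summary)
print(stats)
```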

4. Understand Data Types

As part of the summary, ChatGPT will identify each column’s data type. Some of the most common include:

  • object – Text or string fields (e.g., customer names, IDs, regions)
  • int64 – Whole numbers (e.g., postal codes or quantities)
  • float64 – Decimal numbers (e.g., sales, discounts, profits)
  • datetime64 – Dates (e.g., order date, ship date)

If ChatGPT identifies a date as an “object,” you can correct it by prompting:

“Please convert order date and ship date columns to datetime format.”

This ensures ChatGPT can later perform time-based filtering or create visualizations over time.
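
In pandas, that conversion is usually done with pd.to_datetime. The sketch below assumes the columns are named "Order Date" and "Ship Date" and that dates are stored as YYYY-MM-DD text; adjust both to match your own file.

```python
import pandas as pd

# Hypothetical sample: dates loaded as plain text, one of them unparseable.
orders = pd.DataFrame({
    "Order Date": ["2024-01-05", "2024-01-07", "2024-02-11"],
    "Ship Date": ["2024-01-08", "2024-01-09", "not available"],
})

for col in ["Order Date", "Ship Date"]:
    # errors="coerce" turns values that do not match the format into NaT
    # (the datetime equivalent of a missing value) instead of raising.
    orders[col] = pd.to_datetime(orders[col], format="%Y-%m-%d", errors="coerce")

# With real datetime columns, time-based filtering becomes straightforward.
january = orders[orders["Order Date"].dt.month == 1]

print(orders.dtypes)
print(january)
```

Counting the NaT values after conversion is a quick way to see how many dates failed to parse.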

5. Inspect the Data

It’s always a good idea to take a quick look at the actual data:

“Show the first and last five rows for each table.”

This gives you a tangible sense of what the data looks like and helps you spot formatting issues, unexpected nulls, or mismatched columns.
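
The pandas equivalent of this prompt is a pair of one-liners. The table below is a made-up stand-in for the orders file, with hypothetical column names.

```python
import pandas as pd

# Hypothetical 12-row orders table.
orders = pd.DataFrame({
    "Order ID": [f"O-{i}" for i in range(1, 13)],
    "Quantity": list(range(1, 13)),
})

first_five = orders.head(5)  # rows 1 through 5
last_five = orders.tail(5)   # rows 8 through 12

print(first_five)
print(last_five)
```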

6. Merge the Data

Once everything looks correct, you can combine your datasets. For example:

“Merge the customer and orders data, and check if all orders have matching customer info.”

ChatGPT will automatically perform the join, report how many unique customers and orders exist, and confirm whether any records are missing from the merge.
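
A merge with a match check can be sketched in pandas as follows. This assumes both tables share a "Customer ID" key, which may not match your column names, and it uses a left join so that every order is kept even when its customer is missing.

```python
import pandas as pd

# Hypothetical sample data: order O-4 points at a customer that is not
# present in the customer table.
orders = pd.DataFrame({
    "Order ID": ["O-1", "O-2", "O-3", "O-4"],
    "Customer ID": ["C-1", "C-2", "C-1", "C-9"],
})
customers = pd.DataFrame({
    "Customer ID": ["C-1", "C-2", "C-3"],
    "Region": ["East", "West", "South"],
})

merged = orders.merge(
    customers,
    on="Customer ID",
    how="left",              # keep every order
    validate="many_to_one",  # raise if a customer ID repeats in customers
    indicator=True,          # add a _merge column describing each match
)

# Orders whose customer ID was not found in the customer table.
unmatched = merged[merged["_merge"] == "left_only"]

print(f"Unique orders: {merged['Order ID'].nunique()}")
print(f"Unique customers: {merged['Customer ID'].nunique()}")
print(f"Orders without customer info: {len(unmatched)}")
```

The validate argument is a useful guard: if the customer table contained duplicate IDs, the merge would stop with an error rather than silently multiplying order rows.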

7. Export Your Work

At any point, you can ask ChatGPT to show or export the code it’s running behind the scenes. Try prompting:

“Please show all the code generated so far and provide a downloadable Python script and Jupyter notebook file.”

ChatGPT will provide downloadable files so you can continue your analysis in your local environment, or tweak the code manually if needed.

By Brandon Krakowsky, Director, Data Science and Research, Wharton AI & Analytics Initiative

Pro Tips for a Smoother Workflow

A few quick ways to get even more out of ChatGPT during your analysis:

  • Name files clearly – Simple filenames like orders_2024.csv or returns.csv help ChatGPT keep things straight during merges.
  • Check row counts after merging – Ask ChatGPT to confirm that no rows were dropped or duplicated.
  • Use data visualizations early – You can prompt, “Plot sales over time” or “Show a bar chart of orders by region” to spot patterns right away.
  • Generate quick documentation – Try, “Create a markdown data dictionary for this dataset” for an instant, shareable summary.

These small prompts can save time and keep your analysis consistent from project to project.
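
Two of these tips, checking row counts after a merge and preparing data for a quick chart, can be sketched in pandas as shown below. The column names and sample values are assumptions, and the returns table contains a deliberate duplicate to show how a merge can inflate row counts.

```python
import pandas as pd

# Hypothetical sample data.
orders = pd.DataFrame({
    "Order ID": ["O-1", "O-2", "O-3", "O-4"],
    "Order Date": pd.to_datetime(
        ["2024-01-05", "2024-01-20", "2024-02-11", "2024-02-28"]
    ),
    "Region": ["East", "West", "East", "East"],
    "Sales": [100.0, 50.0, 25.0, 75.0],
})
# O-2 is listed twice on purpose.
returns = pd.DataFrame({"Order ID": ["O-2", "O-2"], "Returned": ["Yes", "Yes"]})

# A naive merge duplicates O-2, so the result has more rows than orders.
naive = orders.merge(returns, on="Order ID", how="left")
print(f"orders: {len(orders)} rows, naive merge: {len(naive)} rows")

# Dropping duplicate keys first keeps one row per order.
clean = orders.merge(returns.drop_duplicates("Order ID"), on="Order ID", how="left")
assert len(clean) == len(orders), "merge changed the number of rows"

# Aggregations behind "Plot sales over time" and "orders by region".
monthly_sales = clean.groupby(clean["Order Date"].dt.to_period("M"))["Sales"].sum()
orders_by_region = clean["Region"].value_counts()

print(monthly_sales)
print(orders_by_region)
```

If matplotlib is installed, calling monthly_sales.plot() or orders_by_region.plot(kind="bar") turns these aggregates into the corresponding charts.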