Gain unprecedented access to individual-level datasets with real-world business contexts.
How It Works
Who is Eligible?
All Wharton and Penn faculty and students who will use the data for academic purposes (research, capstone and course projects, and independent study) are eligible, though access varies by dataset.
Browse Catalog
Available datasets are listed alphabetically below. Click on the dataset title or the “+” icon to see additional information about the data and access eligibility.
Request Data
To request access, please complete this form. If you are interested in accessing multiple datasets, you must submit a form for each dataset.
Available Datasets
Annalect, from AI at Wharton
A unique and comprehensive dataset from Annalect, the data management division of the Omnicom Group, a leading global advertising and marketing communications services company. This dataset includes exposures to email and online display advertisements from a travel company, as well as conversions at the company’s website. Researchers will be able to track exposures, clicks, and conversions for 10,000 individual users (tracked by cookies) for ~60 days. As tourism consumers typically shop over the course of several weeks, this gives researchers the opportunity to explore how customers search for information about a highly considered product and how advertising affects the path to purchase.
Data includes:
- Details about the exposure, including the type of ad, description, and size of the creative, and the campaign the creative was part of
- Information about whether the user clicked on the ad, and if that click eventually led to a conversion
- The type of conversion the user engaged in, such as exploring products or receiving a purchase confirmation
Who can access: anyone at Penn, for the purpose of academic research
Barnes Foundation, from AI at Wharton
The Barnes Foundation is a world-renowned nonprofit cultural and educational institution committed to transforming lives through art by sharing its unparalleled art collection, exhibitions, classes, and public programs with the widest audience possible.
Data includes:
- Data on 300,000 customers, including members and non-members
- Transactions
- All purchase points, product info, and purchase channel
- Historic product calendar and financial spreadsheets
- List of promotions for members and non-members
- Calendar of print mail campaigns
Who can access: anyone at Penn, for the purpose of academic research
Chicago Policing Data, from Research on Policing Reform and Accountability
Four years of activity data from the Chicago Police Department, accompanied by Chicago census and crime data and shapefiles for Chicago census block groups, police districts, and police beats.
Data includes:
- Information on over 33,000 police officers, who collectively reported the following between January 2012 and January 2016:
  - Over 3 million shifts
  - Over 1 million stops
  - Over 300,000 arrests
  - Over 9,000 uses of force
Who can access: anyone at Penn, for the purpose of academic research
Clientivity, from AI at Wharton
Clientivity is a hotel booking software platform that empowers users to create, manage, and earn commission from personal, group, and corporate travel. The dataset includes funnel statistics, partner and end-user demographics, and hotel pricing trends.
Data includes:
- 12,000 active partners
- 53,000 partnering hotels, including location, star rating, and review count
Who can access: anyone at Penn, for the purpose of academic research
Coqovins, from AI at Wharton
Coqovins is a virtual sommelier that makes personalized wine recommendations through a chatbot at participating wine stores. The dataset includes wine attributes, wine reviews, and wine details.
Data includes:
- 1,600 individual wine reviews
- 9,100 wine attributes
- 26,000 wine label details
Who can access: anyone at Penn, for the purpose of academic research
CoreLogic, from Wharton Real Estate and Wharton Finance
CoreLogic is the trusted source for property intelligence, with deep knowledge of powerful economic, social, and environmental forces that promote healthy housing markets and thriving communities.
Data includes:
- 10 million observations of property listing data
- More than 600 variables describing listing details (e.g., listing date and price, listing office and agent, commission rate offered to buyer’s agent)
- Property characteristics (number of bedrooms and bathrooms, remarks from sellers)
- Transaction details when a sale occurs (sale price and date, purchasing office and agent)
- Mortgage loan-level data (including origination characteristics and monthly performance data)
Who can access: Wharton faculty, PhD students supervised by Wharton faculty, and Wharton-affiliated researchers, subject to approval
DataView, from Penn Wharton Budget Model
DataView is a powerful new tool that simplifies collecting, visualizing, and analyzing government and other public data.
- Search across millions of data series available from dozens of sources
- Transform and combine data as desired using simple “point and click”
- Visualize your data with graphs, tables, scatterplots, and animated US maps
- Test your ideas using integrated regression analysis
- Create an account to save your work for later as well as share with others
Who can access: anyone at Penn
eMAXX Bond Holders, from Lippincott Library
Historical quarterly bond holder information, covering Q3 of 1998 through 2022. Subscription covers North American and Pacific bonds in the following market sectors:
- Asset-Backed Securities (ABS)/Collateralized Debt Obligations (CDO)
- Corporate
- Government
- Mortgage-Backed Securities (MBS)
- Municipal
Who can access: anyone at Penn, for the purpose of academic research
Expedia, from AI at Wharton
Expedia, the largest online travel company in the world, provided a dataset that details events leading up to conversion (or failure to convert) for approximately 10,000 U.S.-based users searching for hotels in each of four geographic markets (Cancun, NYC, Paris, and Budapest).
Data includes:
- Information about how the user arrived at Expedia
- What promotional pages they have viewed
- Details of their search query, such as dates and number of travelers
- Which hotels were displayed in search results, which hotels were clicked on, and which hotels were purchased
Who can access: anyone at Penn, for the purpose of academic research
Experian, from Wharton Real Estate and Wharton Finance
Experian is an American–Irish multinational consumer credit reporting company.
Data includes:
- De-identified credit bureau records
- Mortgage, auto, student loan, and credit card balances
- Credit card limit information
- Credit scores
- Credit inquiries
- Delinquencies, bankruptcies, and judgments
Please provide a budget code with your request. While costs are low, access to Experian data is billed based on scale of usage.
Who can access: Wharton faculty, PhD students supervised by Wharton faculty, and Wharton-affiliated researchers, subject to approval
Felix, from AI at Wharton
Felix is a chat-based platform for sending money from the U.S. to Mexico. Felix Technologies Inc. is a technology company with the mission to make cross-border payments to Latin America as easy as sending a message on WhatsApp. The data includes anonymous credit card transactions and associated risk data.
Data includes:
- Individual customer transactions, including:
  - Customer and card identifier
  - Amount of the transaction
  - Date and time
  - Scores to determine if the transaction (and customer) was fraudulent
  - Whether the transaction was flagged as fraudulent
Who can access: anyone at Penn, for the purpose of academic research
FTSE Russell, from Wharton Computing
FTSE Russell is a leading global provider of benchmarks, analytics, and data solutions with multi-asset capabilities.
Data includes:
- London Stock Exchange Holdings data for several of Russell’s indexes, including the Russell 1000 and Russell 2000
Who can access: anyone within Wharton
Fuel Cycle/Rent-A-Center, from AI at Wharton
Fuel Cycle is an all-in-one research platform that combines both qualitative and quantitative data to power real-time business decisions. Rent-A-Center stores offer name-brand furniture, electronics, appliances, computers, and smartphones through flexible rental purchase agreements that allow the customer to obtain ownership of the merchandise at the conclusion of an agreed-upon rental period.
Data includes:
- Product performance data from Rent-A-Center: rental agreement and rent-to-own performance metrics for eight TV models
- Customer data, including demographics and customer status (new, active, reactivated)
- Rental agreements, including purchase amounts, discounts, and whether it was a single agreement or if the TV was packaged with other items
- Transaction-level data associated with rental agreements, including product info, rate/price changes, whether the product was new or used, and sales channel
- Store information
- Survey data from Fuel Cycle: results from three separate surveys that collected data on specific TV models
Who can access: anyone at Penn, for the purpose of academic research
GDELT 1.0 Events, from Wharton Computing
Supported by Google Jigsaw, the GDELT Project monitors the world’s broadcast, print, and web news from nearly every corner of every country in over 100 languages and identifies the people, locations, organizations, themes, sources, emotions, counts, quotes, images and events driving our global society every second of every day, creating a free open platform for computing on the entire world.
Who can access: Wharton faculty, PhD students, postdocs, and staff. Penn and Wharton undergraduate and MBA students may access these open-source data here.
Hachette Book Group, from AI at Wharton
Hachette Book Group is a leading trade book publisher based in New York and a division of Hachette Livre (a Lagardère Company).
Data includes:
- Information for ~2,200 books that generated significant traffic during a 12-month period, including:
  - Sales data, including shipments, aggregated point of sales (weekly), and affiliate marketing sales data
  - Social analytics data, including traffic from social media sites to website
  - Analytics for website pages related to books, including clicks, demographics, and visitor counts
  - Email campaign data
  - Book product metadata, including book information, current price, page count, genre, and ISBN
  - NPD BookScan (for sales data from competitors)
  - Online ad stats
  - Marketing spend/budgets
Who can access: anyone at Penn, for the purpose of academic research
Hertz, from AI at Wharton
The Hertz Corporation is a world leader in retail rental cars and equipment. This dataset includes employee engagement surveys linked to Hertz locations in the U.S. and Canada, transactions of rental cars in those locations, and customer satisfaction surveys for those transactions. These data are longitudinal over a two-year window, providing opportunities for research from a variety of different angles. Studies of organizational behavior, customer loyalty and engagement, geographic retail transactions, upselling/add-on behavior, and customer segmentation are all possible in this rich and detailed dataset.
Data includes:
- Over 68,000 responses to a semi-annual employee engagement survey
- Over 3,000 rental locations in the U.S. and Canada, all uniquely identified across data
- Over 80,000 responses to a post-transaction customer satisfaction survey with detailed transaction data for the corresponding rental
Who can access: anyone at Penn, for the purpose of academic research
Historical Tweet Database, from Wharton Computing
In partnership with The Annenberg School, Wharton Computing has compiled a dataset of historical Tweets. Collection began in April 2012 and concluded in November 2022. This data is queryable using SQL. For more info, see here.
Data includes:
- Over 14 billion tweet objects, representing about 1% of total Twitter volume
Who can access: anyone within Wharton or Annenberg, for the purpose of academic research
International Gaming Company, from AI at Wharton
An anonymous major sports video game franchise has provided data covering a three-year period, including annual releases of new versions and purchase incidences of virtual currency during that time.
Data includes:
- Records on approximately 60,000 players covering up to three years of player behavior
- Over 1.6 million unique game session records, including player ID, session duration, and game console used
- Over 46,000 purchase incidences, including player ID, game console used, and timestamp of purchase
Who can access: anyone at Penn, for the purpose of academic research
LexisNexis Corporate Affiliations, from Lippincott Library
International directory of corporate structure information for public and private companies. It reports firm details including location, size, executives and directors, and links to parent or subsidiary firms. Available coverage begins in 1993.
Who can access: anyone at Penn, for the purpose of academic research
Nielsen, from Wharton Computing
Nielsen is a global leader in audience measurement, data and analytics, shaping the future of media.
The James M. Kilts Center for Marketing at Chicago Booth and the Nielsen Company have partnered to make two consumer marketing datasets available to US-based academic researchers.
These datasets are available for purchase to interested Penn parties at a substantial discount.
Who can access: tenured and tenure-track faculty, PhD students, and postdoctoral researchers at an accredited academic institution. Each eligible researcher accessing the data must register with and be approved by the Kilts Center.
Police Officer Registry, from Research on Policing Reform and Accountability
This is a dataset of 220,000 sworn law enforcement officers from 98 of the 100 largest policing agencies in the U.S. (ranked by number of sworn officers), representing one-third of all police officers nationwide.
Data includes:
- Officer names, ranks, and agencies; all obtained from public records
- Agency-level estimates, using both public and commercial data, on party identification, political participation (turnout), household income, age, race/ethnicity, and gender
- Aggregate party identification, political participation (turnout), household income, age, race/ethnicity, and gender for officers’ home U.S. Census tracts, as well as for civilians at large in the jurisdiction
Who can access: anyone at Penn, for the purpose of academic research
Philadelphia Orchestra & Kimmel Center, from Wharton AI & Analytics Initiative
The 2021 union of the Philadelphia Orchestra (PO) and Kimmel Center (KC) brought together two heralded institutions in the Philadelphia performing arts community. With a campus located in the heart of the Avenue of the Arts, the partnered organizations play an integral role in the development and showcase of Philadelphia culture. The data includes customer profile data, customer purchase data, and email marketing/journey data for 2017-present.
Data includes:
- Customer profiles, including customer number, age, gender, and subscription status for email opt-in and newsletters
- Customer purchase data, including customer number, order information, and contribution with order
- Email journey and email marketing data, including email description and the customer numbers that were sent, opened, did not open, clicked through, or unsubscribed from each email
Who can access: anyone at Penn, for the purpose of academic research
Quick Service Restaurant Chain, from AI at Wharton
An anonymous independent purchasing cooperative that serves as a supplier to a major quick service restaurant chain has provided a unique dataset, including individual transactions from approximately 2,300 restaurant locations across four geographic regions and all purchases made by 5,000 random individual customers over the course of two years. In addition to typical transaction data, the data also includes detailed information about what products each customer purchased and customer survey results – allowing a comprehensive view of the product and service quality for each customer purchase.
Data includes:
- Franchise point of sale transactions, including details on which menu item(s) were purchased, quantities of each item, payment information, and any discounts/promotions applied to the order
- Metadata on specific restaurants, including open/close dates and store type (such as street store vs. food court storefront)
- Survey responses submitted by customers linked to individual restaurants
Who can access: anyone at Penn, for the purpose of academic research
Reed Smith, from AI at Wharton
Reed Smith is a dynamic international law firm, dedicated to helping clients move their businesses forward. The firm has more than 1,700 lawyers in 28 offices throughout the United States, Europe, the Middle East, and Asia.
Data includes:
- Timecard data over three years, including task descriptions and codes, hours worked, amount billed, and information about the attorney
- Legal matter data for 8,000-10,000 clients over three years, including types of work, tags, industry, and geography
Who can access: anyone at Penn, for the purpose of academic research
TRACE, from Wharton Computing
The Trade Reporting and Compliance Engine is the FINRA-developed vehicle that facilitates the mandatory reporting of over-the-counter transactions in eligible fixed income securities.
Who can access: anyone at Penn who is approved by FINRA upon application
Add Your Dataset
If you have a dataset that you would like to add to this catalog, simply complete our Data Intake Form. Please note that iWRDS datasets must have a data user agreement with the provider that allows for data use by Wharton faculty and students for educational purposes. If you have any questions, contact ai-analytics@wharton.upenn.edu.
Additional Resources
Business Databases, from Lippincott Library
120+ business databases provided by the Wharton School’s Lippincott Library.
Data Repository, from WRDS
600+ datasets from more than 50 vendors available for users at all experience levels.
CANDOR Corpus, from BetterUp & OID
7+ million-word, 850-hour corpus of audio, video, and transcripts from 1,656 recorded conversations.