Women in Data Science @ Penn Conference 2025 Workshop

Friday, April 4
1:00 — 3:00 p.m.

Jon M. Huntsman Hall | 3730 Walnut Street, Room 340, Philadelphia, PA 19104

From Data to Discovery: Exploring AI with a Patent Case Study, ChatGPT, and Generative Models

Led by Linda Zhao, Professor of Statistics and Data Science, and Xinyu Xie, Ph.D. Student, both from the University of Pennsylvania. The session will be moderated by Lynn Wu, Associate Professor of Operations, Information and Decisions at the Wharton School.

In today’s data-driven landscape, vast unstructured data sources, such as documents, texts, and electronic health records (EHRs), demand advanced AI tools to unlock their full potential. Generative AI, powered by Large Language Models (LLMs), is becoming indispensable for processing and extracting insights from complex language-based data. Evaluating patents and transforming healthcare through EHR analysis are just two examples of how AI is reshaping industries.

This interactive workshop, centered on a patent approval case study, will introduce students to the full pipeline of solving real-world problems using AI and data science. Participants will explore:

How to identify key problems that lend themselves to AI-driven solutions.
How to collect and prepare data, build models, and run state-of-the-art algorithms.
How to validate models and interpret their results.

The workshop will include live demonstrations using RMarkdown, and students will have the opportunity to interact with tools like ChatGPT and HuggingFace, a platform hosting thousands of pre-trained Transformer models. Due to package limitations, certain code segments will be demonstrated rather than run by participants, but key concepts and steps will be clearly explained.

The workshop will provide a deep dive into how LLMs evolved from basic concepts such as regression (understanding how weights and coefficients are estimated) to Neural Networks (NN), and finally to powerful Transformer-based models like those found on HuggingFace. By the end of the session, students will have a solid understanding of how modern AI technologies—like ChatGPT and Transformers—help tackle large, unstructured datasets for applications such as patent approvals, EHR analysis, and beyond.

*All attendees should bring a laptop or device. This event is open to all.
