Automating RxSense’s Client Onboarding Process

About RxSense
RxSense partners with PBMs (Pharmacy Benefit Managers) to provide cheaper prescription drugs to consumers. Through its proprietary platforms, RxSense streamlines pricing, claims processing, and data analytics to increase transparency and reduce costs across the pharmaceutical supply chain. Their motto, “Connecting the Pharmacy Ecosystem,” reinforces their desire to connect the players in the space.
The PBM space is complex and often opaque, with multiple stakeholders (insurers, pharmacies, and drug manufacturers) all interacting through intermediaries. One chief complexity is that many of RxSense’s PBM clients don’t know how their claims are processed. Proprietary third-party software, established decades ago, creates unknowns that RxSense has to work through. Deciphering how a benefits plan is set up for a client can take an analyst one month.
Objective
Our key problem statement was as follows: develop an automation tool for prospective clients by predicting benefits intake values in order to facilitate more rapid onboarding and benefits plan creation.
To address this problem, we needed to answer the following questions:
- Can we determine a new client’s copay structure without manual intake?
- Can we reverse-engineer plan design from historical claims using intelligent parsing and machine learning?
- How can we predict copay and other plan components (e.g. deductible, max dollar per Rx)?
Ultimately, in answering these questions, we aimed to save RxSense time and resources by cutting a one-month manual process down to a shorter, automated one.
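To make the idea of reverse-engineering plan design from historical claims concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the project's actual method: the field names (`group_id`, `tier`, `patient_pay`), the sample records, and the "most common amount" heuristic are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical claim records; field names are illustrative, not RxSense's schema.
claims = [
    {"group_id": "G1", "tier": "generic", "patient_pay": 10.0},
    {"group_id": "G1", "tier": "generic", "patient_pay": 10.0},
    {"group_id": "G1", "tier": "generic", "patient_pay": 4.37},  # e.g. drug cost below the copay
    {"group_id": "G1", "tier": "brand", "patient_pay": 35.0},
    {"group_id": "G1", "tier": "brand", "patient_pay": 35.0},
    {"group_id": "G2", "tier": "generic", "patient_pay": 5.0},
    {"group_id": "G2", "tier": "generic", "patient_pay": 5.0},
]


def infer_flat_copays(claims):
    """Guess a flat copay per (group, tier) as the most common patient-pay amount."""
    amounts = defaultdict(Counter)
    for claim in claims:
        key = (claim["group_id"], claim["tier"])
        amounts[key][claim["patient_pay"]] += 1

    inferred = {}
    for key, counter in amounts.items():
        amount, count = counter.most_common(1)[0]
        # Share of claims that agree with the guess: a rough confidence signal.
        support = count / sum(counter.values())
        inferred[key] = {"copay": amount, "support": support}
    return inferred


inferred = infer_flat_copays(claims)
for key, guess in sorted(inferred.items()):
    print(key, guess)
```

A low `support` value would flag a group and tier where a single flat copay does not explain the claims, which is where more flexible models or analyst review would be needed.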
Approach
We approached the problem in three main stages.
First, we sought to understand the problem and review the dataset. RxSense provided two central files: a Historical Claims file (with 322,910 historical claims and 55 features) and a Benefit Intake file (125 columns and 70 unique group IDs). We spent the first two weeks understanding the makeup of these files and how they fit into RxSense’s regular workstream. Once we understood the files, we needed to clean them: unstructured text logic, complex plan variability, and sparsely filled intake columns proved difficult to overcome.
Second, we conducted individual and group exploratory data analysis to determine whether the data contained patterns. We isolated key groups and divided them among the team, starting small by assigning one group and one column of that group (copay) to each analyst. We discerned several patterns at the micro level, including a mini-batch k-means clustering that showed inherent groupings within the historical claims data. Our analysis then evolved into more advanced macro-level analysis, including a decision-tree analysis that reached 81% accuracy. This stage demonstrated that there were patterns embedded in the data and that we could advance to modeling.
Third, we built a series of machine-learning models that predicted key columns (see the “copay modeling plan” visualization) with high accuracy. Within those columns, we did not limit the types of models we explored. We used classification, regression, and clustering, and found that their performance varied between the claims level and the group level.
Solution
Our solution was a machine-learning model that predicts six of the most important columns with >90% accuracy in a matter of minutes.
This solution relies on a series of regression, classification, and clustering models. Specifically, the regression models achieved a group-level MAE (mean absolute error) as low as 1.2. The classification models performed very well at the claims level, consistently reaching accuracy above 90%.
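For readers less familiar with these metrics, the sketch below shows one way claims-level accuracy and group-level MAE could be computed. The sample values, the group IDs, and the choice to aggregate claim-level predictions with a median before comparing against the intake value are assumptions made for illustration; they are not taken from the project.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical (actual, predicted) copay-type labels, one pair per claim.
claim_labels = [
    ("flat", "flat"),
    ("flat", "flat"),
    ("coinsurance", "flat"),
    ("flat", "flat"),
]

# Hypothetical predicted copay amounts per claim, tagged with a group ID.
claim_predictions = [
    ("G1", 10.0), ("G1", 10.0), ("G1", 12.0),
    ("G2", 30.0), ("G2", 34.0),
]

# Hypothetical "true" copay per group, as it would appear in a benefit intake.
intake_copay = {"G1": 10.0, "G2": 35.0}


def claims_level_accuracy(labels):
    """Fraction of claims whose predicted label matches the actual label."""
    return sum(actual == predicted for actual, predicted in labels) / len(labels)


def group_level_mae(predictions, truth):
    """Aggregate claim predictions to one value per group, then average the absolute errors."""
    by_group = defaultdict(list)
    for group_id, value in predictions:
        by_group[group_id].append(value)
    # Assumption: the median of claim-level predictions represents the group.
    errors = [abs(median(values) - truth[group_id]) for group_id, values in by_group.items()]
    return mean(errors)


accuracy = claims_level_accuracy(claim_labels)
mae = group_level_mae(claim_predictions, intake_copay)
print(f"claims-level accuracy: {accuracy:.2f}")
print(f"group-level MAE: {mae:.2f}")
```

Here MAE is in the same units as the predicted column, so an MAE of 1.2 on a dollar-denominated column would mean predictions are off by $1.20 per group on average.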
We also worked to ensure that our model is modular and scalable. Each benefit component (NP, Generic, DAW, Deductible, and Max $) is modeled in a separate, reusable notebook. These notebooks are easily adapted to new copay types or coverage rules and can be rerun to generate new models as additional data becomes available.
Our solution also hinges on key copay parsing logic (visualization below) that helped us overcome issues with unstructured text logic in the claims data. As additional client data is received, the model pipeline can be retrained and should become more accurate.
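The parsing logic itself is described only in the visualization, so the following is a purely hypothetical sketch of what turning unstructured copay text into structured fields can look like. The input formats (such as "$10" or "20% min $5 max $50"), the output field names, and the `parse_copay` helper are all assumptions for illustration and do not reproduce the project's rules.

```python
import re

# Building blocks for dollar amounts ("$10", "$ 12.50") and percentages ("20%").
DOLLAR = r"\$\s*(\d+(?:\.\d+)?)"
DOLLAR_RE = re.compile(DOLLAR)
PERCENT_RE = re.compile(r"(\d+(?:\.\d+)?)\s*%")
MIN_RE = re.compile(r"min(?:imum)?\s*(?:of\s*)?" + DOLLAR)
MAX_RE = re.compile(r"max(?:imum)?\s*(?:of\s*)?" + DOLLAR)


def parse_copay(text):
    """Parse a free-text copay rule into a structured dict (hypothetical formats only)."""
    cleaned = text.strip().lower()
    result = {"type": None, "amount": None, "percent": None, "min": None, "max": None}

    minimum = MIN_RE.search(cleaned)
    maximum = MAX_RE.search(cleaned)
    if minimum:
        result["min"] = float(minimum.group(1))
    if maximum:
        result["max"] = float(maximum.group(1))

    # Remove min/max clauses so their dollar amounts are not mistaken for a flat copay.
    remainder = MAX_RE.sub("", MIN_RE.sub("", cleaned))

    percent = PERCENT_RE.search(remainder)
    flat = DOLLAR_RE.search(remainder)
    if percent:
        result["type"] = "coinsurance"
        result["percent"] = float(percent.group(1))
    elif flat:
        result["type"] = "flat"
        result["amount"] = float(flat.group(1))
    return result


for raw in ["$10", "20% min $5 max $50", "N/A"]:
    print(repr(raw), "->", parse_copay(raw))
```

Text that matches none of the patterns comes back with `type` set to `None`, which gives a simple way to route unrecognized rules to an analyst rather than guessing.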
In terms of next steps, RxSense will continue to validate our work and integrate it into their pipeline. There was an expressed desire to reactivate this team and project to continue the integration work, perhaps during next semester’s AI & Analytics Accelerator.
Impact
Our tool transforms a laborious one-month process into one that can be completed in a few minutes. This efficiency gain will have multiple business impacts on RxSense.
First, our quick and accurate model will save RxSense time and resources. These savings will not be squandered: the time once spent understanding copay logic can instead go toward more complex matters, such as which drugs are authorized and business trends. Since our model was >90% accurate, more time and resources can be devoted to the remaining 10% to ensure anomalies are accounted for.
Second, these resource savings will give RxSense a competitive advantage in the client-onboarding experience, increasing its revenue. If onboarding is easier, RxSense can onboard clients more quickly and thus handle a larger volume of clients. This will build what CEO and Founder Rick Bates desired: a formidable moat around the business.
In terms of broader economic impacts, we anticipate that this tool, once fully operationalized, could encourage larger competitors like Amazon either to invest more heavily in R&D to duplicate our results or to pursue a merger or acquisition. Efficiency in the PBM space would also benefit patients around the US by illuminating cost savings, helping to drive down drug prices in the long term.
Data Visualizations



About the AI & Analytics Accelerator
The AI & Analytics Accelerator, part of the Wharton AI & Analytics Initiative, partners with organizations to develop cutting-edge AI and data-driven solutions for real-world challenges. Through collaboration with Wharton faculty, researchers, and students, the Accelerator transforms complex data into actionable insights, driving innovation across industries.