Datathon Case Study

Discovering New Customers and Game Usage via New Data Lenses


About Electronic Arts

Electronic Arts Inc. (EA) is a global leader in digital interactive entertainment. EA develops and delivers games, content, and online services for Internet-connected consoles, mobile devices, and personal computers. EA has more than 300 million registered players around the world.




About the Datathon

As video game companies move toward subscription services, Electronic Arts (EA) is looking for innovative ways to improve its game recommendations and match players with games they are likely to enjoy. Twenty-five self-formed Penn/Wharton student teams had one week to analyze user data from Origin Access, EA’s all-you-can-play subscription service, including users’ activity on games in the subscription catalogue and on individually purchased games.



EA was interested in identifying multiple value-added insights around:

(1) more diverse and consolidated game attributes, (2) newly discovered attributes of new and experienced players, (3) maximization of game usage and subscription services, (4) converting free trial users to monthly and annual subscriptions, and (5) increasing the number of games users play. Students were free to use their judgement in choosing the areas that would unlock the most value from the available data. They were also encouraged to find novel ways to extract variables from publicly available datasets (e.g., integrating sentiment data from social platforms like Reddit and Twitter or user ratings from websites like VGChartz and Amazon).



On the fifth and final day, the student teams shared their findings with EA executives at the Datathon symposium held on Wharton’s campus. They presented a diverse set of solutions and recommendations using a variety of machine learning models, including but not limited to random forests, k-means clustering, support-vector machines, and neural networks.

In the end, EA chose the team that used an ensemble of models: a support-vector machine model, a random forest model, and a k-means clustering algorithm. The team made the case for a simple and easily implementable subscription strategy of increasing the promotion efforts on the 1-month subscription. In addition, their strategy of recommending the most popular and attractive games to new users and genre-based preferred games to registered users was both innovative and implementable. The team’s presentation was marked by a sharp visual representation of their approach and a clear description of their results and recommended strategies.

The Winning Team



The winning team used two supervised machine learning models – a support-vector machine (SVM) model and a random forest model – for prediction analysis and an unsupervised learning algorithm – k-means clustering – to group players and games.

Using the initial six datasets shown in the diagram below, the team joined and cleaned the data in Python and R. They then aggregated the user subscription windows, game sessions, and game entitlements (licenses) to generate a single DataFrame with one row per unique user ID, including customer features such as country, subscription type and duration, franchises played, etc. After processing the data, the team was left with nearly 10,000 unique users. The team also joined and aggregated the game-related datasets into a single DataFrame describing the games. Its features included game franchise, game genre, duration in the Origin Access Vault game platform, and the average time users spent on the game. In the end, the team had one DataFrame for users and one DataFrame for games to use in a game recommendation model.
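The user-level aggregation described above can be illustrated with a minimal pandas sketch. This is not the team's code: the three tiny tables, their column names, and the build_user_table helper are all hypothetical stand-ins chosen only to show how several event-level datasets might be collapsed into one row per user.

```python
import pandas as pd

# Hypothetical miniature stand-ins for three of the provided datasets.
subscriptions = pd.DataFrame({
    "user_id": [1, 1, 2, 3],
    "country": ["US", "US", "DE", "BR"],
    "plan": ["monthly", "annual", "monthly", "monthly"],
    "days_subscribed": [30, 365, 60, 30],
})
sessions = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3],
    "franchise": ["franchise_a", "franchise_b", "franchise_a",
                  "franchise_c", "franchise_c"],
    "hours": [5.0, 2.5, 8.0, 1.0, 3.0],
})
entitlements = pd.DataFrame({
    "user_id": [1, 2, 2],
    "game": ["game_a", "game_a", "game_b"],
})


def build_user_table(subscriptions, sessions, entitlements):
    """Collapse event-level tables into one row per user (assumed schema)."""
    subs = subscriptions.groupby("user_id").agg(
        country=("country", "first"),
        last_plan=("plan", "last"),
        total_days_subscribed=("days_subscribed", "sum"),
    )
    play = sessions.groupby("user_id").agg(
        total_hours=("hours", "sum"),
        franchises_played=("franchise", "nunique"),
    )
    owned = entitlements.groupby("user_id").agg(
        games_owned=("game", "nunique"),
    )
    # Left joins keep every subscriber, even those with no sessions or licenses.
    users = subs.join(play, how="left").join(owned, how="left")
    users = users.fillna(
        {"total_hours": 0.0, "franchises_played": 0, "games_owned": 0}
    )
    count_cols = ["franchises_played", "games_owned"]
    users[count_cols] = users[count_cols].astype(int)
    return users.reset_index()


users = build_user_table(subscriptions, sessions, entitlements)
print(users)
```

The same pattern (group by the game identifier instead of the user identifier) would give a games table like the one the team describes.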

Then, the team fed the user-based data into several classifiers, including their SVM model and random forest, to predict the most favored game genre for each person. They used k-means clustering analysis to cluster users and games based on common attributes of each (see Figure 1).

Finally, they used this information to filter and rank game recommendations for each player.

Figure 1. The winning team used a support-vector machine model and a random forest model for the prediction analysis, and a k-means clustering algorithm to group the players.
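As a rough illustration of the modeling step, the sketch below fits an SVM and a random forest to predict a favorite genre and runs k-means on the same features, using scikit-learn. Everything here is an assumption made for the example: the synthetic two-feature data, the genre labels, the choice of two clusters, and the hyperparameters are not taken from the team's work.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic user features: [hours played, days subscribed], two loose groups.
X = np.vstack([
    rng.normal(loc=[2.0, 30.0], scale=1.0, size=(20, 2)),
    rng.normal(loc=[10.0, 300.0], scale=1.0, size=(20, 2)),
])
# Made-up favorite-genre labels for the two groups.
y = np.array(["sports"] * 20 + ["shooter"] * 20)

# Standardize so both features contribute on a comparable scale.
scaler = StandardScaler().fit(X)
X_scaled = scaler.transform(X)

# Supervised models: predict each user's favorite genre.
svm = SVC(kernel="rbf").fit(X_scaled, y)
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_scaled, y)

# Unsupervised model: group users with similar attributes.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_scaled)

new_user = scaler.transform(np.array([[9.5, 310.0]]))
svm_genre = svm.predict(new_user)[0]
forest_genre = forest.predict(new_user)[0]
new_user_cluster = int(kmeans.predict(new_user)[0])
print(svm_genre, forest_genre, new_user_cluster)
```

In practice the predictions from the two classifiers could be compared or combined, and the number of clusters would be chosen from the data rather than fixed in advance.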


The winning team proposed a recommendation system for both experienced players and new players.

For experienced players, the team recommended that EA create a prediction model identifying a player’s favorite genre and a clustering algorithm to group players based upon common attributes. EA should then determine the intersection of the games popular among players within the cluster and the favorite genre of the player of interest, excluding the games the player has already played. Finally, EA should rank the games according to popularity (based upon average time spent playing a specific game) and recommend these games to the player.
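The filter-and-rank logic for experienced players can be sketched in plain Python. The function name, the dictionary-based inputs, and the sample players and games below are hypothetical; the sketch assumes the favorite genre and the cluster assignments have already been produced by the models.

```python
from collections import defaultdict


def recommend_for_experienced(player, favorite_genre, cluster_of,
                              play_hours, genre_of, top_n=3):
    """Rank unplayed games of the favorite genre by average hours among cluster peers."""
    # Peers are the other players assigned to the same cluster.
    peers = {p for p, c in cluster_of.items()
             if c == cluster_of[player] and p != player}
    already_played = set(play_hours.get(player, {}))

    # Collect the hours each peer spent on each game.
    hours_by_game = defaultdict(list)
    for peer in peers:
        for game, hours in play_hours.get(peer, {}).items():
            hours_by_game[game].append(hours)

    # Keep games in the favorite genre that the player has not played yet.
    candidates = {
        game: sum(hours) / len(hours)
        for game, hours in hours_by_game.items()
        if genre_of[game] == favorite_genre and game not in already_played
    }
    # Most popular first; ties broken alphabetically for a stable order.
    ranked = sorted(candidates, key=lambda g: (-candidates[g], g))
    return ranked[:top_n]


# Made-up sample data.
cluster_of = {"ann": 0, "bob": 0, "cat": 0, "dan": 1}
genre_of = {"game_a": "sports", "game_b": "sports",
            "game_c": "shooter", "game_d": "sports"}
play_hours = {
    "ann": {"game_a": 10.0},
    "bob": {"game_a": 4.0, "game_b": 6.0, "game_c": 9.0},
    "cat": {"game_b": 2.0, "game_d": 5.0},
    "dan": {"game_d": 50.0},
}

print(recommend_for_experienced("ann", "sports", cluster_of, play_hours, genre_of))
# → ['game_d', 'game_b']
```

Here game_a is dropped because the player has already played it, game_c is dropped because it is outside the favorite genre, and dan's hours are ignored because dan belongs to a different cluster.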

For new players, the team suggested that EA recommend games based upon player location and game popularity in the new player’s surrounding area.
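For new players, who have no play history, a location-based fallback might look like the following sketch. It assumes that "location" means the player's country and that popularity is again measured as average hours played; both are illustrative choices, as are the function name and the sample data.

```python
from collections import defaultdict


def recommend_for_new_player(country, player_country, play_hours, top_n=3):
    """Rank games by average hours played among existing players in one country."""
    hours_by_game = defaultdict(list)
    for player, games in play_hours.items():
        if player_country.get(player) != country:
            continue
        for game, hours in games.items():
            hours_by_game[game].append(hours)

    average = {game: sum(h) / len(h) for game, h in hours_by_game.items()}
    # Most popular first; ties broken alphabetically for a stable order.
    return sorted(average, key=lambda g: (-average[g], g))[:top_n]


# Made-up sample data.
player_country = {"ann": "US", "bob": "US", "cat": "DE"}
play_hours = {
    "ann": {"game_a": 10.0, "game_b": 2.0},
    "bob": {"game_a": 4.0, "game_c": 8.0},
    "cat": {"game_b": 20.0},
}

print(recommend_for_new_player("US", player_country, play_hours, top_n=2))
# → ['game_c', 'game_a']
```

A country with no existing players returns an empty list, so a real system would need a further fallback such as global popularity.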


The predictive model gives EA an alternative way to improve its game recommendation system and make it more personalized. The clustering data visualization also reflects certain patterns of user preferences and psychology. EA will also use this model to better analyze customer data and prioritize user features within personalized marketing communications.