At SūmerSports, we’ve taken steps to open source our football analytics work to grow the space from both a talent and a knowledge perspective. For the 2025 Big Data Bowl, Sūmer released a reduced version of its transformer model, adapted to the competition task, which ultimately powered my winning submission. That release also led to multiple success stories across the community and exciting growth for the Sūmer team. See Sūmer’s blog post detailing last year’s success.
This year, Sūmer and I are continuing that path.
The Challenge: Decoding the Art of Coverage
The NFL Big Data Bowl 2026 presents a unique challenge: analyzing and predicting player trajectories while the ball is in the air, using player tracking data and supplementary coverage data. Historically, contestants have worked with positional data for all 22 players on the field at 10Hz, along with a much larger dataset. This year, contestants have roughly half the players and fewer frames to work with, which slightly reduces the complexity of modeling spatial-temporal interactions.
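To make the prediction task concrete, here is a minimal constant-velocity baseline: a common starting point for trajectory prediction, not anything Sūmer or the competition prescribes. It assumes frames arrive at 10Hz and that the direction angle follows the Next Gen Stats convention (0° along the +y axis, increasing clockwise). The function name and arguments are hypothetical, so map them to the actual tracking columns before use.

```python
import math

FRAME_DT = 0.1  # seconds between frames at 10Hz


def constant_velocity_rollout(x, y, speed, direction_deg, n_frames):
    """Extrapolate a player's (x, y) for n_frames, assuming velocity never changes.

    Assumes direction_deg uses the NGS convention: 0 degrees points along +y
    and angles increase clockwise, so +90 degrees points along +x.
    """
    theta = math.radians(direction_deg)
    vx = speed * math.sin(theta)  # yards per second along x
    vy = speed * math.cos(theta)  # yards per second along y
    return [
        (x + vx * FRAME_DT * k, y + vy * FRAME_DT * k)
        for k in range(1, n_frames + 1)
    ]


# A player at (50, 20) moving at 5 yd/s toward +x, projected 3 frames ahead.
print(constant_velocity_rollout(50.0, 20.0, 5.0, 90.0, 3))
```

Any learned model should beat this baseline; the gap between the two is a quick sanity check on whether the model is using player interactions at all.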
The Supplementary Dataset: Enriching the Analytics Landscape
Beyond Basic Tracking: Context Matters
SūmerSports’ supplementary dataset represents years of research, engineering, and validation, adding crucial context to raw tracking data. We worked hard to develop highly generalizable models so that we can provide inference-time solutions directly to you.
1. Player-Level Data
- Alignment: Where a player is lined up at the snap. Coupled with season position, it can help differentiate between an on-ball and an off-ball linebacker, or a TE lined up in the slot vs. in-line.
- Targeted Defender: A boolean flag indicating whether a defender was responsible for covering the targeted receiver over the course of the play, allowing models and analysts to distinguish ball-hawk from pursuit trajectories.
- Coverage Responsibility: A prediction of the player’s zone or man responsibility over the play. A deep-third defender in Cover 3 and a hook/curl defender play differently... maybe you can find out how?
2. Frame-Level Data
- Coverage Scheme: Possibly the most influential data we offer: a frame-level prediction of the coverage scheme called on the play, which can differ from frame to frame. For example, Cover 3 at the snap vs. Cover 2 Man at the throw.
The data is available at Sūmer Supplement.
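Most models need the categorical player-level fields above turned into numeric features. The sketch below one-hot encodes them with pandas; the column names (`alignment`, `coverage_responsibility`, `targeted_defender`) and label values are hypothetical placeholders, not the supplement's actual schema.

```python
import pandas as pd

# Hypothetical column names and label values; check them against the
# actual supplement schema before use.
CATEGORICAL_COLS = ["alignment", "coverage_responsibility"]


def encode_player_context(players: pd.DataFrame) -> pd.DataFrame:
    """One-hot encode categorical supplement columns and cast the targeted flag to 0/1."""
    encoded = pd.get_dummies(players, columns=CATEGORICAL_COLS, dtype=int)
    encoded["targeted_defender"] = encoded["targeted_defender"].astype(int)
    return encoded


players = pd.DataFrame(
    {
        "nfl_id": [101, 102, 103],
        "alignment": ["OFF_BALL_LB", "SLOT", "DEEP"],
        "coverage_responsibility": ["HOOK_CURL", "MAN", "DEEP_THIRD"],
        "targeted_defender": [False, True, False],
    }
)
print(encode_player_context(players).columns.tolist())
```

One-hot encoding keeps each alignment or responsibility as its own input dimension; with many categories, a learned embedding per label is a common alternative.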
Data Investigation
We are also sharing a sample exploratory data analysis notebook, in which I used our supplementary data to approach a major analytics question: “How do disguises evolve over the course of a play?”
Here’s an example of one of my findings, highlighting a prominent shift in the coverage distribution in the training data, from showing Cover 3 (a 1-high shell) to Cover 2 (a 2-high shell).
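A shift like this can be measured by comparing the share of plays in each scheme at the snap vs. at the throw. The pandas sketch below assumes hypothetical columns `game_id`, `play_id`, `frame_id` and `coverage`, and treats each play's first frame as the snap and its last frame as the throw; the actual notebook may identify those moments differently.

```python
import pandas as pd


def snap_vs_throw_distribution(frames: pd.DataFrame) -> pd.DataFrame:
    """Share of plays showing each coverage scheme at the first vs. last frame.

    Assumes hypothetical columns game_id, play_id, frame_id and coverage, and
    treats each play's first frame as the snap and its last frame as the throw.
    """
    ordered = frames.sort_values(["game_id", "play_id", "frame_id"])
    per_play = ordered.groupby(["game_id", "play_id"])["coverage"]
    at_snap = per_play.first().value_counts(normalize=True).rename("at_snap")
    at_throw = per_play.last().value_counts(normalize=True).rename("at_throw")
    # Schemes seen at only one of the two moments get a share of 0.
    return pd.concat([at_snap, at_throw], axis=1).fillna(0.0)


# Toy data: play 10 shows Cover 3 but rotates to Cover 2; play 20 stays in Cover 3.
demo = pd.DataFrame(
    {
        "game_id": [1, 1, 1, 1, 1],
        "play_id": [10, 10, 10, 20, 20],
        "frame_id": [1, 2, 3, 1, 2],
        "coverage": ["Cover 3", "Cover 3", "Cover 2", "Cover 3", "Cover 3"],
    }
)
print(snap_vs_throw_distribution(demo))
```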

I also included a sample of our visualization and a guide on how to properly join the supplementary data with the BDB training data:

The full data investigation is available at SūmerSports Supplement Investigation.
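The join described above can be sketched as two left merges: player-level rows keyed by play and player, frame-level rows keyed by play and frame. The key names below are assumptions, so confirm them against the BDB and supplement files; the `validate="many_to_one"` argument is a standard pandas `merge` option that raises if a supplement table has duplicate keys.

```python
import pandas as pd

# Hypothetical key names; confirm them against the BDB and supplement files.
PLAYER_KEYS = ["game_id", "play_id", "nfl_id"]
FRAME_KEYS = ["game_id", "play_id", "frame_id"]


def join_supplement(tracking, player_supp, frame_supp):
    """Left-join player-level and frame-level supplement rows onto tracking rows.

    validate="many_to_one" makes pandas raise if a supplement table has
    duplicate keys, which would otherwise silently duplicate tracking rows.
    """
    merged = tracking.merge(player_supp, on=PLAYER_KEYS, how="left", validate="many_to_one")
    merged = merged.merge(frame_supp, on=FRAME_KEYS, how="left", validate="many_to_one")
    # With unique right-hand keys, a left join preserves the tracking row count.
    assert len(merged) == len(tracking)
    return merged


tracking = pd.DataFrame(
    {
        "game_id": [1, 1, 1, 1],
        "play_id": [10, 10, 10, 10],
        "nfl_id": [1, 1, 2, 2],
        "frame_id": [1, 2, 1, 2],
        "x": [30.0, 30.4, 40.0, 39.7],
    }
)
player_supp = pd.DataFrame(
    {"game_id": [1, 1], "play_id": [10, 10], "nfl_id": [1, 2],
     "coverage_responsibility": ["DEEP_THIRD", "HOOK_CURL"]}
)
frame_supp = pd.DataFrame(
    {"game_id": [1, 1], "play_id": [10, 10], "frame_id": [1, 2],
     "coverage": ["Cover 3", "Cover 2"]}
)
print(join_supplement(tracking, player_supp, frame_supp))
```

Rows without a supplement match come through with NaN values rather than being dropped, which makes missing coverage labels easy to spot.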
Transformer Architecture
Last year, Sūmer made a groundbreaking contribution by open sourcing its transformer architecture, which powered the 2025 winning submission by Smit Bajaj (now a Quantitative Analyst with the Philadelphia Eagles) and me: Exposing Coverage Tells in the Presnap.
This year, we’ve updated that same notebook to predict Man vs. Zone coverage using the 2026 dataset. The results are included below.

For those interested in digging deeper into the transformer model:
- Attention Is All You Need, for Sports Tracking Data
- The workshop that Sūmer’s Director of Deep Learning Research, Udit Ranasaria, and I presented at the Carnegie Mellon Sports Analytics Conference, built around our public GitHub repository:
- SportsTrackingTransformer, or
- Our Kaggle notebook, Modeling with Transformers 2026
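For a Man vs. Zone task like the one in the updated notebook, one option is to collapse the frame-level scheme predictions into a binary target. The scheme strings and the man/zone grouping below are illustrative assumptions, not the supplement's actual vocabulary or the notebook's code, so check the real label set first.

```python
from typing import List, Optional

# Illustrative scheme names and grouping; the supplement's actual labels may differ.
MAN_SCHEMES = {"Cover 0", "Cover 1", "Cover 2 Man"}
ZONE_SCHEMES = {"Cover 2", "Cover 3", "Cover 4", "Cover 6"}


def man_zone_label(scheme: str) -> Optional[int]:
    """Return 1 for man, 0 for zone, or None for schemes outside both sets."""
    if scheme in MAN_SCHEMES:
        return 1
    if scheme in ZONE_SCHEMES:
        return 0
    return None  # unknown schemes are left for you to handle explicitly


def play_label(frame_schemes: List[str]) -> Optional[int]:
    """Label a play by its last frame's scheme, i.e. the call as of the throw."""
    return man_zone_label(frame_schemes[-1]) if frame_schemes else None


# A disguised play: shows Cover 3 early, plays Cover 2 Man by the throw.
print(play_label(["Cover 3", "Cover 3", "Cover 2 Man"]))
```

Labeling from the last frame means a disguised play is scored by what the defense actually played; labeling from the snap frame instead would answer a different question about what the defense showed.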
A Word of Advice
The transformer notebook was instrumental to Smit’s and my success last year, but the real innovation was our own. I’ve told dozens of competitors that these resources are meant as a launching pad, tools to accelerate your modeling process, and that we’ve intentionally left some questions unanswered.
This architecture aims to solve the player-ordering problem and to provide a foundation for your model’s spatial understanding, but we want you to experiment and innovate with your modeling and presentations, since that is what will separate the best submissions from the rest.
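One way to see why attention over players sidesteps the ordering problem: without positional encodings, self-attention is permutation-equivariant, so shuffling the input players only shuffles the output rows, and any pooled summary is unchanged. The NumPy sketch below checks this for a single attention head with random weights; it illustrates the property and is not the SportsTrackingTransformer code.

```python
import numpy as np


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over the player axis.

    x has shape (n_players, d). No positional encoding is added, so the
    output rows follow whatever order the input rows are in.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v


rng = np.random.default_rng(0)
n_players, d = 11, 8  # e.g. 11 players, 8 features each
x = rng.normal(size=(n_players, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))

perm = rng.permutation(n_players)
out = self_attention(x, w_q, w_k, w_v)
out_perm = self_attention(x[perm], w_q, w_k, w_v)

# Shuffling the players only shuffles the output rows (equivariance) ...
assert np.allclose(out[perm], out_perm)
# ... so a pooled summary is identical regardless of player order (invariance).
assert np.allclose(out.mean(axis=0), out_perm.mean(axis=0))
```

Because the model never sees an arbitrary player index, there is no need to invent a canonical ordering (by position, by distance to the ball, and so on) before feeding the tracking data in.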
Good luck!
The open-source nature of this release ensures that the entire sports analytics community can benefit from these advances, potentially revolutionizing how we understand and analyze America's most popular sport. As the competition progresses, it will be exciting to see how other teams build upon this foundation and what new insights emerge from this rich combination of advanced modeling and comprehensive data.
Join the discussion and contribute to advancing football analytics at the NFL Big Data Bowl 2026 competition page. For any further questions, reach out to Vishakh at @VishakhS74 or vishakh.sandwar[at]sumersports.com.


