Professional cycling is a team sport, with a clearly defined roster for each race (startlist) and coaches directing strategy both pre-race and during the race. As such, teammates and the team a rider is on matter significantly for success. This is probably most apparent in the final kilometers of a bunch sprint race where teams jockey for position, attempting to deliver their fast man to the finish line in the best position to win the race.
Evaluating sprinters within this ecosystem is difficult. Javi Angulo has a recent piece that uses the Glicko method (based on head-to-head results) to rate the top sprinters as of the end of 2021. Teun van Erp and Rob Lamberts (with an assist from multiple Tour de France stage winner Marcel Kittel) use video and power analysis to examine the determinants of sprint performance in a 2021 study. Besides these detailed analyses, we can use stats like win rate, podium rate, average rank, or PCS points won to judge sprint performance at a more superficial level.
But what about measuring the impact on sprint performance of teammates?
I’ve designed a handful of methods which could shed some light on this subject. These methods all have clear flaws, most notably that we do not know team strategy or the roles within each team.
Simple win or loss calculations
At the most basic, we can assume teammates share equal credit for victories in bunch sprints (and equal penalty for not winning). Therefore we can calculate each rider’s team win rate in bunch sprints (I’ve included all 2.1/1.1 or higher races where 20+ riders finished within 3 seconds of the winner).
For 2021, the top results are above. If you follow World Tour rosters closely you’ll realize that all ten of these riders are Quick Step riders – the team which year-on-year dominates the sport. The results are not so Quick Step biased in every year, but they show the impact of multi-collinearity (a fancy statistics way of saying that it’s difficult to tease out the unique impact of multiple different factors when you rarely observe them apart from each other).
Eg, in 2020, the top five riders are Arnaud Demare and his sprint train. At least three of them + Demare raced together in bunch sprints 24 times in 2020 (16 times with all five riders together). Demare raced in a bunch sprint just one other time and the other riders were in just five bunch sprints apart from Demare. So was Demare’s extreme success in 2020 (11 wins in 25 races) mostly his dominance, mostly his teammates’ dominance, or a mix?
We can look at larger samples of seasons to try to capture changes in team personnel. However, even going back to 2018, Demare has raced 112 bunch sprints, of which 66 came with at least three of his four sprint lieutenants in the lineup and another 35 with two of the four. FDJ has raced 234 bunch sprints without Demare and won 4% of them, versus 26% of the 112 races with Demare.
With all four of his helpers, Demare has won 37%. With three of the four, he’s won 23%, with just two he’s won 17%, but with 0 or 1 (only the remaining 11 races) he’s won 27%. So while it looks like more sprint helpers/more familiar helpers means better results, we’re not much closer to saying how important Demare is vs his helpers vs his team.
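The split by number of helpers present can be sketched as below. This is a minimal illustration on made-up rows, not the data behind the figures above; the field names (`riders`, `won`) and the rider labels are hypothetical.

```python
from collections import defaultdict

def win_rate_by_helper_count(rows, leader, helpers):
    """Team win rate in the leader's races, grouped by how many helpers started."""
    helpers = set(helpers)
    tally = defaultdict(lambda: [0, 0])  # helpers present -> [wins, races]
    for row in rows:
        lineup = set(row["riders"])
        if leader not in lineup:
            continue  # only races the leader actually started
        n_present = len(helpers & lineup)
        tally[n_present][0] += int(row["won"])
        tally[n_present][1] += 1
    return {n: wins / races for n, (wins, races) in sorted(tally.items())}

# Hypothetical toy rows: one row per bunch sprint for a single team.
toy_rows = [
    {"riders": ["leader", "h1", "h2"], "won": True},
    {"riders": ["leader", "h1", "h2"], "won": False},
    {"riders": ["leader", "h1"], "won": False},
    {"riders": ["h1", "h2"], "won": True},  # leader absent, ignored
]
helper_split = win_rate_by_helper_count(toy_rows, "leader", ["h1", "h2"])
print(helper_split)
```

With samples as small as the ones discussed here, each bucket can hold only a handful of races, so the rates are noisy.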
We can expand this analysis and just credit riders for team wins where they were present in the bunch at the end of the race. For example, Tim Declercq was only in the bunch for 5 of 24 bunch sprints (21%) in the 2021 season vs 72% for Michael Morkov. Perhaps if we’re analysing who has the most impact on bunch sprints, we should ignore riders who weren’t present? Of course, it’s likely that without Declercq some of those bunch sprints would have turned into breakaway victories, or a late attack would’ve been launched, or Quick Step would’ve burnt other riders essential to the sprint train.
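Both versions of the simple calculation (equal credit for every starter, or credit only for riders who finished in the bunch) can be sketched in a few lines. The row layout, the field names (`riders`, `in_bunch`, `bunch_size`, `won`) and the rider labels are assumptions for illustration; `bunch_size` stands for the number of riders finishing within 3 seconds of the winner.

```python
from collections import defaultdict

def team_win_rates(rows, min_bunch=20, require_in_bunch=False):
    """Share of bunch sprints won by a rider's team, per rider."""
    starts = defaultdict(int)
    wins = defaultdict(int)
    for row in rows:
        if row["bunch_size"] < min_bunch:
            continue  # not a bunch sprint under the 20+ rider rule
        credited = row["in_bunch"] if require_in_bunch else row["riders"]
        for rider in credited:
            starts[rider] += 1
            wins[rider] += int(row["won"])
    return {rider: wins[rider] / starts[rider] for rider in starts}

# Hypothetical toy rows: one row per (race, team).
toy_races = [
    {"riders": ["sprinter", "leadout", "domestique"],
     "in_bunch": ["sprinter", "leadout"], "won": True, "bunch_size": 45},
    {"riders": ["sprinter", "leadout", "domestique"],
     "in_bunch": ["sprinter", "leadout", "domestique"], "won": False, "bunch_size": 60},
    {"riders": ["leadout", "domestique"],
     "in_bunch": ["leadout"], "won": True, "bunch_size": 8},  # filtered out
]
equal_credit = team_win_rates(toy_races)
bunch_only = team_win_rates(toy_races, require_in_bunch=True)
print(equal_credit)
print(bunch_only)
```

In the toy data the domestique gets half credit under equal sharing but none under the in-bunch rule, which is exactly the trade-off described for a rider like Declercq.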
Who Does the Team Trust?
Too much of sports analytics is results oriented, but there is loads of information that is not captured by the result. Teams are privy to training data, injury data, interpersonal relationships, and much else that we can access by looking at how team directors choose their lineups. Eg, analytical models based on results would probably tell you Cavendish shouldn’t have been in the start list for the Tour de France, but Quick Step had seen enough from Cav to think he was the best option to sprint.
We can look at start lists to see who teams trust to race alongside their best sprinters. To do this analysis, I built a simple rating system for sprinters based on their finishing positions which easily discriminates between the best sprinters (eg, Bennett, Ewan, Van Aert in 2020-21) and also-rans. I then looked at how riders in a team were deployed alongside sprinters. Eg, if Quick Step sends Morkov to a race with Sam Bennett, but Pieter Serry to race with a lesser sprinter like Alvaro Hodeg, that might say something in the aggregate.
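One way to turn that idea into a number is to average, for each rider, the rating of the best sprinter in every lineup they were part of. The sketch below assumes a dictionary of sprinter ratings already exists; the ratings, rider labels and function name are hypothetical, and the actual rating system is not reproduced here.

```python
def deployment_scores(lineups, sprinter_rating):
    """Average rating of the best sprinter in each lineup a rider started in."""
    totals = {}  # rider -> (sum of best-sprinter ratings, number of races)
    for lineup in lineups:
        # Riders without a rating are treated as 0.0 (an assumption).
        best = max(sprinter_rating.get(rider, 0.0) for rider in lineup)
        for rider in lineup:
            running_sum, n_races = totals.get(rider, (0.0, 0))
            totals[rider] = (running_sum + best, n_races + 1)
    return {rider: s / n for rider, (s, n) in totals.items()}

# Hypothetical ratings and lineups.
toy_ratings = {"top_sprinter": 0.9, "lesser_sprinter": 0.3}
toy_lineups = [
    ["top_sprinter", "helper_a"],
    ["lesser_sprinter", "helper_b"],
    ["top_sprinter", "helper_a", "helper_b"],
]
scores = deployment_scores(toy_lineups, toy_ratings)
print(scores)
```

A helper who only ever starts with the top sprinter scores as high as the sprinter does, while one who is split between lineups lands in between.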
For 2020-21, Sam Bennett raced with the highest quality best sprinter (obviously as he is one of the best in the world and was the best sprinter on his team in every race he participated in). More interestingly, Morkov was the clear 2nd place rider as he raced with Bennett in 42 of 62 bunch sprints (and 42 of 46 bunch sprints that Bennett participated in). At the bottom of the list are riders who were mainly deployed alongside weaker sprinters or in lineups without a clear sprinter like Honore, Serry, and Vansevenant.
For the full peloton, the top 10 riders who were not the best sprinter on their team in a majority of races are below:
We see Morkov appear along with three of Caleb Ewan’s dedicated sprint train from Lotto Soudal and four of FDJ’s previously discussed sprint train for Demare. Theuns typically rides with Stuyven and/or Pedersen and Consonni was the main support for Elia Viviani.
At the other end of the list are riders typically not deployed with top sprinters at least on teams that have one.
We can also simply look at which riders raced alongside the top sprinters in the highest percentage of bunch sprint races from 2020-21. Eg, for Sam Bennett the importance of Morkov is obvious as Morkov featured with him in 91% of bunch sprints vs 48% for the next highest Quick Step rider. Caleb Ewan relied on both Roger Kluge and Jasper de Buyst for 84%+ of his bunch sprints. The aforementioned FDJ sprint train represents four of the eleven most common pairings between top sprinters and a teammate. The other standout was the pairing of Gaviria-Richeze for Team UAE. Richeze missed just one of Gaviria’s bunch sprints in 2020-21.
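The pairing percentage is just the share of a sprinter's bunch sprints in which a given teammate also started. A minimal sketch, with hypothetical rider labels and toy lineups sized to match the 42-of-46 Bennett and Morkov count quoted earlier:

```python
def pairing_share(lineups, sprinter):
    """Share of the sprinter's races in which each teammate also started."""
    total = 0
    together = {}
    for lineup in lineups:
        if sprinter not in lineup:
            continue
        total += 1
        for rider in lineup:
            if rider != sprinter:
                together[rider] = together.get(rider, 0) + 1
    if total == 0:
        return {}
    return {rider: count / total for rider, count in together.items()}

# 46 toy lineups for the sprinter, 42 of which include the lead-out rider.
toy_lineups = [["sprinter", "leadout"]] * 42 + [["sprinter", "other"]] * 4
shares = pairing_share(toy_lineups, "sprinter")
print(round(shares["leadout"] * 100))
```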
Advanced Statistical Models
Statistical models for teasing out multi-collinearity do exist. One promising approach used in sports like basketball or hockey is regularization (where coefficients are penalized using Lasso or Ridge regression). Running lasso regression on this type of data essentially produces coefficient estimates which are often zero if the model can’t determine that the term is significantly impacting the results.
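To illustrate how Lasso pushes weak coefficients to exactly zero, here is a small coordinate-descent sketch in plain Python. It is a simplification under stated assumptions: it minimises squared error plus an L1 penalty with no intercept, on a made-up two-column matrix, and is not the model fitted in this post. The function names and the penalty value are hypothetical.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; return 0.0 inside the dead zone."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0

def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimise (1/2n) * ||y - X b||^2 + alpha * ||b||_1 by cycling over columns."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of column j with the partial residual
            z = 0.0    # mean squared value of column j
            for i in range(n):
                pred_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            beta[j] = soft_threshold(rho, alpha) / z if z > 0 else 0.0
    return beta

# Toy data: the outcome equals column 0; column 1 carries no extra signal.
X_toy = [[1, 0], [1, 1], [0, 1], [0, 0], [1, 0], [0, 1]]
y_toy = [1, 0 + 1, 0, 0, 1, 0]
beta = lasso_coordinate_descent(X_toy, y_toy, alpha=0.1)
print(beta)
```

The informative column keeps a shrunken coefficient while the uninformative one is set to exactly zero, which is the behaviour described above.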
To set up the data, I’ve filtered first for riders with 60+ bunch sprints in the last four seasons (2017-20) and then set-up a matrix with a 1 in the rider’s column if they were in that race or a 0 if not. This produces a matrix with over 400 riders. The regression is just run on a binary outcome of a win for the team or not (could run a similar regression of finishing position or podium as well). I’ve also controlled for quality of the best sprinter on the team for each race and the level (.1, Pro, or World Tour) of the race.
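The set-up step can be sketched as follows. The row fields (`riders`, `best_sprinter_rating`, `level`, `won`) and the dummy encoding of race level are assumptions for illustration; the post uses a 60-race threshold, while the toy example lowers it so the tiny sample keeps some columns.

```python
from collections import Counter

def build_design_matrix(rows, min_races=60):
    """One 0/1 column per qualifying rider, plus the two control variables."""
    counts = Counter(rider for row in rows for rider in row["riders"])
    columns = sorted(rider for rider, c in counts.items() if c >= min_races)
    X, y = [], []
    for row in rows:
        present = set(row["riders"])
        indicators = [1 if rider in present else 0 for rider in columns]
        # Race level as two dummies, with the .1 level as the baseline.
        level = [int(row["level"] == "Pro"), int(row["level"] == "WT")]
        X.append(indicators + [row["best_sprinter_rating"]] + level)
        y.append(1 if row["won"] else 0)  # binary outcome: team won or not
    return columns, X, y

# Hypothetical toy rows: one row per (race, team).
toy_rows = [
    {"riders": ["a", "b"], "best_sprinter_rating": 0.9, "level": "WT", "won": True},
    {"riders": ["a", "c"], "best_sprinter_rating": 0.4, "level": ".1", "won": False},
    {"riders": ["a", "b"], "best_sprinter_rating": 0.9, "level": "Pro", "won": False},
]
columns, X, y = build_design_matrix(toy_rows, min_races=2)
print(columns)
print(X)
```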
Training the model on 2017-20 and testing on 2021 yields interesting results, though honestly some are nonsensical at the individual rider level (the top impact rider is Damien Howson – a climber who apparently improves his team’s chances of winning a sprint from 6% to 17%). Many of the other top impacts seem believable; eg, I could believe Davide Ballerini increases his team’s chances of winning from 6% to 12% because someone has to be responsible for Quick Step’s incredible ability to win sprints year-after-year.
Turning to the predictions at race level in 2021, Quick Step’s squad for stage 1 of the Volta ao Algarve was seen as the best positioned of any squad all season to win a race. That is driven by Sam Bennett being really good, but also by the model viewing riders like Jakobsen, Morkov, Asgreen, Archbold, and Ballerini as all having a strong impact on winning. The model gives them a 47% probability of winning stage 1. With Bennett, but with six neutral teammates, they would have only a 15% probability.
To evaluate the model I compared its predictions to those from a neutral model (just considering the ability of the best sprinter). If this model says anything valuable, we’ll see a difference between the quality of its predictions and those of the neutral model. We can also compare to a completely naive model which just assigns every team a probability of 1 / N teams in the race.
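The comparison metric is the mean squared error between predicted win probabilities and the 0/1 outcomes. A minimal sketch on a single made-up four-team race, where the "model" probabilities are invented purely for illustration:

```python
def mean_squared_error(probs, outcomes):
    """Average squared gap between predicted probability and the 0/1 result."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

def naive_probs(n_teams):
    """Naive baseline: every team gets a 1 / N chance of winning."""
    return [1.0 / n_teams] * n_teams

# Hypothetical race with four teams; the first team wins.
outcomes = [1, 0, 0, 0]
model_probs = [0.5, 0.3, 0.1, 0.1]  # made-up predictions

naive_mse = mean_squared_error(naive_probs(4), outcomes)
model_mse = mean_squared_error(model_probs, outcomes)
print(naive_mse, model_mse)
```

Lower is better, and a model only earns its keep if it beats the baselines by a meaningful margin across many races.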
| Metric | Lasso Model | Neutral Model | Naive Model |
|---|---|---|---|
| Mean Square Error | 0.0393 | 0.0414 | 0.0445 |
The mean square error of these three different predictions shows the neutral model improves on the naive model by 0.0031, or about 7% of the way towards perfect. So knowing how good the best sprinter on the team is vs knowing nothing is worth about a 7% gain.
The Lasso model, which judges the impact of individual riders, is worth another 4.5% of improvement over the neutral model. So knowing the impact of all riders on the probability of winning is about two thirds as valuable as knowing the ability of the best sprinter.
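The percentages can be recomputed from the table as a share of the naive model's error. Note that the table values are rounded, so the second figure comes out slightly different from the one quoted; that the quoted value was computed from unrounded errors is an assumption.

```python
# Mean square errors as reported (rounded) in the table.
lasso_mse, neutral_mse, naive_mse = 0.0393, 0.0414, 0.0445

# Improvement expressed as a fraction of the naive model's error.
neutral_gain = (naive_mse - neutral_mse) / naive_mse
lasso_gain = (neutral_mse - lasso_mse) / naive_mse

# How much the teammate information adds relative to the sprinter information.
relative_value = (neutral_mse - lasso_mse) / (naive_mse - neutral_mse)

print(round(neutral_gain, 3), round(lasso_gain, 3), round(relative_value, 2))
```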
My sense is this model is not ready for primetime as there is no correlation between the coefficient produced by the lasso model and how often riders are deployed with the best sprinters. It seems likely that judging a rider’s ability to impact sprint results based off how their team deploys them yields the most useful information.