Difficulty of achieving GC results in different races

This is the time of year for cycling teams to plan their riders’ programs for the new year and for media/fans to speculate about which races riders will go for in 2022. Part of that process is trying to figure out where riders are best suited to get results – especially in the grand tours (of which we know all three routes now). On the horizon, there’s also been discussion around potential relegation of teams from the top level World Tour and how teams can best optimize their schedules to avoid that relegation.

A lot of the work I did with professional golfers was related to scheduling: where they would be able to play their best golf and where that best golf would be most rewarded by the arcane point system in professional golf. I’ve applied that type of approach below to identify: 1) how difficult it is to achieve different results in stage races and 2) where those results are disproportionately rewarded by cycling’s own arcane points system.

First, it’s not much more difficult to achieve results in the three three-week long grand tours than it is in the week-long stage races in the World Tour. Generally, it’s the same set of riders competing for those results whether it’s the Tour of the Basque Country or the Vuelta a Espana.

Second, grand tour success is heavily rewarded relative to other races. GC positions which are equally difficult to achieve can be rewarded 2x more in grand tours relative to those other World Tour stage races and sometimes 3x more in grand tours relative to other lower level stage races.

These two findings explain why teams and riders compete so much for minor top 10 placings in grand tours even when those minor placings are ~10 minutes back of the GC leader.

Difficulty of achieving GC results

The easiest way to compare the difficulty of achieving a GC result in one race vs another is to simply compare results within the same rider/season. Eg, Tadej Pogacar raced five stage races in 2021, coming in 1st in UAE Tour, 1st in Tirreno Adriatico, 3rd in Basque Country, 1st in Slovenia, and 1st in Tour de France. Based on those five finishes and completely ignoring any context around them, we might judge the Basque Country race as the toughest, as Pogacar failed to win there.

However, we have hundreds of similar comparisons between these races just from the last eight seasons of results (2014-21). 227 riders have ridden Basque Country and Tour de France in the same season in that period. 123 have ridden Basque Country and Tirreno Adriatico, 36 have ridden Basque Country and UAE Tour, and 30 have ridden Basque Country and Slovenia. We can leverage those comparisons to judge the relative difficulty between each pair of races.

Race difficulty comparisons for Tour of Basque Country (2014-21)

Above I’ve shown these aggregate difficulty comparisons for Tour of Basque Country and the ~40 races with at least 30 comparisons in 2014-21. They’re ordered by difficulty where the last column value is the expected finishing position in Race A (Basque Country) given a 5th place GC finish on Race B. Eg, if a rider finishes 5th in the Tour de France they would be expected to achieve an equivalent of 4.3 in Basque Country.
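One way such a paired comparison could be computed is sketched below. It is a minimal illustration, not the method behind the table: the data are made up, and averaging the difference in log ranks across shared rider-seasons is an assumption of this sketch.

```python
import math
from statistics import mean

# Hypothetical GC results: (rider, season) -> {race: finishing rank}.
results = {
    ("rider_a", 2021): {"basque": 3, "slovenia": 1},
    ("rider_b", 2021): {"basque": 20, "slovenia": 6},
    ("rider_c", 2020): {"basque": 45, "slovenia": 12},
    ("rider_d", 2020): {"basque": 8},  # rode only one of the two races
}

def expected_rank(results, race_a, race_b, rank_b=5):
    """Expected rank in race_a for a rider who finished rank_b in race_b.

    Assumption of this sketch: the typical gap between the two races is the
    mean difference in log rank over rider-seasons that contain both races.
    """
    diffs = [
        math.log(r[race_a]) - math.log(r[race_b])
        for r in results.values()
        if race_a in r and race_b in r
    ]
    if not diffs:
        raise ValueError("no shared rider-seasons for this pair of races")
    return rank_b * math.exp(mean(diffs)), len(diffs)

estimate, n_pairs = expected_rank(results, "basque", "slovenia")
print(f"{n_pairs} comparisons, expected Basque rank for 5th in Slovenia: {estimate:.1f}")
```

A value above 5 means the comparison race is easier than Basque Country, and a value below 5 means it is tougher.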

Pogacar’s 2021 races are highlighted in red where Tour de France is the toughest, UAE and Tirreno are similar difficulty to Basque Country (expected finishes of 5.2 and 5.5), and Slovenia is viewed as much easier with expected finish of 18th in Basque Country for someone finishing 5th in Slovenia.

This method confirms the primacy in difficulty of the Tour de France as every comparison race is easier to achieve results in than the Tour. However, it also shows the two other grand tours are not any more difficult to achieve results in than the bigger week-long World Tour races like Basque Country, Tour of Catalonia, Tirreno Adriatico, Paris-Nice, and the Dauphine. A 5th in the Giro d’Italia is worth about 5.6 in those five races on average. A 5th in the Vuelta a Espana is similarly worth about a 4.6 in those five races on average. A 5th in the Tour de France is worth a 3.8.

Scaling all races versus those five week-long stage races shows the following hierarchy:

Below I’ve included my top 20 GC riders entering the Tour of the Basque Country in April and whether they raced Basque Country and the three grand tours in 2021. 12 of the top 20 raced Basque Country – including the two clear best – while 14 of the top 20 raced the Tour de France, only 5 of the top 20 raced the Giro, and 12 of the top 20 raced the Vuelta.

UCI Points

There are two popular point systems in professional cycling: the unofficial ProCyclingStats points – which I’ve referenced before – and the official UCI points – which determine team eligibility for the World Tour and other qualifications. The point system is explained well here by INRNG, but basically the UCI decides how to group races together and assigns them different points (eg, the Tour de France is its own category awarding between 18-25% more points for a given placing than the Giro or Vuelta and more than 3x more points than the minor World Tour stage races).

We can use those scales to make equivalencies between what the UCI thinks are similar finishes. Eg, a 12th place finish in the Giro or Vuelta is worth 8th place in Catalonia or Basque Country, 13th place in the Tour de France, and 7th in UAE Tour. Moving outside the World Tour races, that 12th place is worth 2nd place in a 2.1 stage race (Tour of Sicily, Route Occitanie) and 4th in a 2.Pro stage race (Arctic Race Norway, Tour of Denmark).

The UCI is saying 5th place in the Tour de France is worth 1st place in those big week-long stage races, 4th place in the Giro/Vuelta, and more than 1st place in every other stage race.

Rewards vs Difficulty

We can combine the difficulty measures from my research with the UCI point scales to find which races over- and under-reward finishing highly. I use my research from above to find equivalent performances and then look at how those are rewarded in each race. Eg, I found it roughly equal in difficulty to finish 5th at the Vuelta a Espana and at Basque Country. However, the Vuelta rewards finishing 5th with 2x the UCI points of Basque Country.
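The comparison can be sketched in a few lines. The points scales below are made-up placeholders, not the real UCI tables, and the linear interpolation for fractional equivalent ranks (such as 4.6) is an assumption of this sketch.

```python
import math

# Made-up points scales keyed by GC position (NOT the real UCI tables).
points = {
    "grand_tour": {4: 460, 5: 380, 6: 320},
    "week_long": {4: 250, 5: 190, 6: 160},
}

def points_at(scale, rank):
    """Points for a possibly fractional rank, interpolating between places."""
    lo, hi = math.floor(rank), math.ceil(rank)
    if lo == hi:
        return scale[lo]
    frac = rank - lo
    return scale[lo] * (1 - frac) + scale[hi] * frac

def relative_reward(points, race_a, rank_a, race_b, rank_b):
    """How many times more points race_a pays for an equally hard result."""
    return points_at(points[race_a], rank_a) / points_at(points[race_b], rank_b)

# Equally difficult 5th places, rewarded very differently.
ratio = relative_reward(points, "grand_tour", 5, "week_long", 5)
print(f"grand tour pays {ratio:.1f}x the week-long race")
```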

Relative reward of 5th in Basque Country vs equivalent finish in other races

A 5th in Basque Country earns between 50% and 75% as many points as the equivalent finish in eleven World Tour stage races, but 2x as many as the equivalent finish in minor 2.1-level stage races like Valencia, Besseges, and Alpes Maritimes. These races attract difficult fields, but are shorter/lower level so they receive fewer points for equivalently difficult results.

The Sweet Spot

So earlier I said the sweet spot in terms of rewards was the grand tours, with the inverse being some of these week-long stage races at World Tour level. That is without factoring in the time the race takes (grand tours require 24 days of racing with rest leading in and recovery time leading out while the Basque Country is only 6 days of racing with many riders taking just a week before and after between races). Obviously if you’re optimizing at the team level with this data, you’ll factor the greater time commitment for grand tours and the position of races on the schedule into planning.

Many of these effects are driven by different strength of fields in different races. I’ve shown GC ratings for riders before (others like PCS have similar rankings). Aggregating those ratings by race yields this plot where the x axis shows the strength of the riders in that race. Eg, Tour de France has the strongest field of GC riders, followed by the Vuelta. Part of the reason results are difficult to achieve in Catalonia and Basque Country is that they rank 3rd and 4th in strength of their riders. The Giro shows up as having a relatively weaker field (more comparable to the week-long stage races) which means the rewards in terms of UCI points are higher for equivalent positions.

Finding the Sprint MVPs

Professional cycling is a team sport, with a clearly defined roster for each race (startlist) and coaches directing strategy both pre-race and during the race. As such, teammates and the team a rider is on matter significantly for success. This is probably most apparent in the final kilometers of a bunch sprint race where teams jockey for position, attempting to deliver their fast man to the finish line in the best position to win the race.

Evaluating sprinters within this ecosystem is difficult. Javi Angulo has a recent piece using the Glicko method (based on head-to-head results) to rate sprinters as of the end of 2021. Teun van Erp and Rob Lamberts (with an assist from multi-TDF stage winner Marcel Kittel) use video and power analysis to analyze the determinants of sprint performance in a 2021 study. Besides these detailed analyses, we can use stats like win rate/podium rate/average rank/PCS points won to judge sprint performance at a more superficial level.

But what about measuring the impact on sprint performance of teammates?

I’ve designed a handful of methods which could illuminate our knowledge on this subject. These methods all have clear flaws – most notably that we do not know team strategy/roles within team.

Simple win or loss calculations

At the most basic, we can assume teammates share equal credit for victories in bunch sprints (and equal penalty for not winning). Therefore we can calculate each rider’s team win rate in bunch sprints (I’ve included all 2.1/1.1 or higher races where 20+ riders finished within 3 seconds of the winner).
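The equal-credit calculation is simple enough to sketch directly. The race data below are hypothetical; only the counting rule (every starter shares the team's win or loss) comes from the text.

```python
from collections import defaultdict

# Hypothetical bunch sprints for one team: (riders on the start list, team won?).
sprints = [
    ({"a", "b", "c"}, True),
    ({"a", "b"}, False),
    ({"b", "c"}, True),
    ({"a", "d"}, False),
]

def team_win_rates(sprints):
    """Share of bunch sprints won by the rider's team when the rider started."""
    starts = defaultdict(int)
    wins = defaultdict(int)
    for riders, won in sprints:
        for rider in riders:
            starts[rider] += 1
            wins[rider] += int(won)  # equal credit for every teammate
    return {rider: wins[rider] / starts[rider] for rider in starts}

rates = team_win_rates(sprints)
for rider, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(rider, f"{rate:.0%}")
```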

For 2021, the top results are above. If you follow World Tour rosters closely you’ll realize that all ten of these riders are Quick Step riders – the team which year-on-year dominates the sport. The results are not so Quick Step biased in every year, but they show the impact of multi-collinearity (a fancy statistics way of saying that it’s difficult to tease out the unique impact of multiple different factors when you rarely observe them apart from each other).

Eg, in 2020, the top five riders are Arnaud Demare and his sprint train. At least three of them + Demare raced together in bunch sprints 24 times in 2020 (16 times with all five riders together). Demare raced in a bunch sprint just one other time and the other riders were in just five bunch sprints apart from Demare. So was Demare’s extreme success in 2020 (11 wins in 25 races) mostly his dominance, mostly his teammates dominance, or a mix?

We can look at larger samples of seasons to try to look at changes in team personnel. However, looking at Demare’s team going back to 2018 still sees 112 bunch sprints where 66 came with at least three of his sprint lieutenants in the lineup and another 35 with two of the four riders present. FDJ has raced 234 bunch sprints without Demare and won 4% of the bunch sprints vs 26% in 112 races with Demare.

With all four of his helpers, Demare has won 37%. With three of the four, he’s won 23%, with just two he’s won 17%, but with 0 or 1 he’s won 27%. So while it looks like more sprint helpers/more familiar helpers means better results, we’re not much closer to saying how important Demare is vs his helpers vs his team.

We can expand this analysis and just credit riders for team wins where they were present in the bunch at the end of the race. For example, Tim Declercq was only in the bunch for 5 of 24 bunch sprints (21%) in the 2021 season vs 72% for Michael Morkov. Perhaps if we’re analysing who has the most impact on bunch sprints, we should ignore riders who weren’t present? Of course, it’s likely that without Declercq some of those bunch sprints would have turned into breakaway victories, a late attack would’ve been launched, or Quick Step would’ve burnt other riders essential to the sprint train.

Who Does the Team Trust?

Too much of sports analytics is results oriented, but there is loads of information that is not captured by the result. Teams are privy to training data, injury data, interpersonal relationships, and much else that we can access by looking at how team directors choose their lineups. Eg, analytical models based on results would probably tell you Cavendish shouldn’t have been in the start list for the Tour de France, but Quick Step had seen enough from Cav to think he was the best option to sprint.

We can look at start lists to see who teams trust to race alongside their best sprinters. To do this analysis, I built a simple rating system for sprinters based on their finishing positions which easily discriminates between the best sprinters (eg, Bennett, Ewan, Van Aert in 2020-21) and also-rans. I then looked at how riders in a team were deployed alongside sprinters. Eg, if Quick Step sends Morkov to a race with Sam Bennett, but Pieter Serry to race with a lesser sprinter like Alvaro Hodeg, that might say something in the aggregate.
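A rough sketch of the deployment idea follows. The ratings, rider names, and start lists are hypothetical, and scoring each rider by the average rating of the best sprinter in their lineups is one plausible reading of the approach rather than the exact calculation used here.

```python
from statistics import mean

# Hypothetical sprinter ratings (higher = better) and one team's start lists.
rating = {"top_sprinter": 9.0, "second_sprinter": 5.0}
lineups = [
    ["top_sprinter", "leadout", "helper"],
    ["top_sprinter", "leadout"],
    ["second_sprinter", "helper"],
    ["helper", "climber"],  # no rated sprinter in this lineup
]

def deployment_scores(lineups, rating):
    """Average rating of the best sprinter each rider lined up alongside."""
    seen = {}
    for lineup in lineups:
        # Unrated riders count as 0.0 (an assumption of this sketch).
        best = max((rating.get(r, 0.0) for r in lineup), default=0.0)
        for rider in lineup:
            seen.setdefault(rider, []).append(best)
    return {rider: mean(values) for rider, values in seen.items()}

scores = deployment_scores(lineups, rating)
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```

In this toy data the lead-out rider scores highest among the helpers because both of their starts came alongside the team's top sprinter.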

For 2020-21, Sam Bennett raced with the highest quality best sprinter (obviously as he is one of the best in the world and was the best sprinter on his team in every race he participated in). More interestingly, Morkov was the clear 2nd place rider as he raced with Bennett in 42 of 62 bunch sprints (and 42 of 46 bunch sprints that Bennett participated in). At the bottom of the list are riders who were mainly deployed alongside weaker sprinters or in lineups without a clear sprinter like Honore, Serry, and Vansevenant.

For the full peloton, the top 10 of riders who were not the best sprinter on their team a majority of races is below:

We see Morkov appear along with three of Caleb Ewan’s dedicated sprint train from Lotto Soudal and four of FDJ’s previously discussed sprint train for Demare. Theuns typically rides with Stuyven and/or Pedersen and Consonni was the main support for Elia Viviani.

At the other end of the list are riders typically not deployed with top sprinters at least on teams that have one.

We can also simply look at which riders raced alongside the top sprinters in the highest percentage of bunch sprint races from 2020-21. Eg, for Sam Bennett the importance of Morkov is obvious as Morkov featured with him in 91% of bunch sprints vs 48% for the next highest Quick Step rider. Caleb Ewan relied on both Roger Kluge and Jasper de Buyst for 84%+ of his bunch sprints. The aforementioned FDJ sprint train represents four of the eleven most common pairings between top sprinters and a teammate. The other standout was the pairing of Gaviria-Richeze for Team UAE. Richeze missed just one of Gaviria’s bunch sprints in 2020-21.

Advanced Statistical Models

Statistical models for teasing out multi-collinearity do exist. One promising approach used in sports like basketball or hockey is regularization (where coefficients are penalized using Lasso or Ridge regression). Running lasso regression on this type of data essentially produces coefficient estimates which are often zero if the model can’t determine that the term is significantly impacting the results.
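To see why Lasso coefficients so often land on exactly zero, it helps to look at the soft-thresholding step that sits at the core of the method. In the textbook special case of uncorrelated, standardized predictors, the Lasso estimate is just the least-squares coefficient passed through this function. This is a simplified illustration, not the model fitted here, and the penalty value is arbitrary.

```python
def soft_threshold(z, lam):
    """Shrink z towards zero by lam, returning exactly 0.0 when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

# Hypothetical unpenalized effects for three riders and an arbitrary penalty.
raw_effects = {"strong_leadout": 0.50, "weak_signal": 0.10, "drag": -0.50}
penalty = 0.20
shrunk = {rider: soft_threshold(z, penalty) for rider, z in raw_effects.items()}
print(shrunk)
```

Effects smaller than the penalty are set to zero, while larger effects survive but are pulled towards zero by the size of the penalty.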

To set up the data, I’ve filtered first for riders with 60+ bunch sprints in the last four seasons (2017-20) and then set-up a matrix with a 1 in the rider’s column if they were in that race or a 0 if not. This produces a matrix with over 400 riders. The regression is just run on a binary outcome of a win for the team or not (could run a similar regression of finishing position or podium as well). I’ve also controlled for quality of the best sprinter on the team for each race and the level (.1, Pro, or World Tour) of the race.
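The presence matrix described above could be built along these lines. This is a minimal sketch with hypothetical start lists; the 60-sprint filter is the default, lowered in the toy call so the example has something to show.

```python
from collections import Counter

def presence_matrix(lineups, min_sprints=60):
    """Build one 0/1 row per team-race with a column per qualifying rider."""
    counts = Counter(rider for lineup in lineups for rider in set(lineup))
    columns = sorted(r for r, n in counts.items() if n >= min_sprints)
    rows = [[int(rider in lineup) for rider in columns] for lineup in lineups]
    return columns, rows

# Hypothetical start lists for three bunch sprints.
lineups = [["a", "b", "c"], ["a", "b"], ["b", "d"]]
columns, rows = presence_matrix(lineups, min_sprints=2)
print(columns)
for row in rows:
    print(row)
```

Each row would then be paired with the binary win outcome and the control variables (best sprinter quality, race level) before fitting the regression.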

Training the model on 2017-20 and testing on 2021 yields interesting results, with honestly some nonsensical results at the individual rider level (the top impact rider is Damien Howson – a climber who apparently improves his team’s chances of winning a sprint from 6% to 17%). Many of the other top impacts seem believable; eg, I could believe Davide Ballerini increases his team’s chances of winning from 6% to 12% because someone has to be responsible for Quick Step’s incredible ability to win sprints year-after-year.

Turning to the predictions at race level in 2021, Quick Step’s squad for stage 1 of the Volta ao Algarve was seen as the best positioned to win of any lineup all season. That is driven by Sam Bennett being really good, but also the model viewing riders like Jakobsen, Morkov, Asgreen, Archbold, and Ballerini as all having strong impact on winning. They are given a 47% probability of winning stage 1. With Bennett, but with six neutral teammates, they would have only a 15% probability.

To evaluate the model I compared it to the predictions with a neutral model (just considering ability of best sprinter). If this model says anything valuable, we’ll see a difference between the quality of prediction for model and the neutral model. We can also compare to a completely naive model which just assigns every team a probability of 1 / N teams in race.

| Metric | Lasso Model | Neutral Model | Naive Model |
|---|---|---|---|
| Mean Square Error | 0.0393 | 0.0414 | 0.0445 |

The mean square errors of these three predictions show the neutral model improves on the naive model by 0.0031, or about 7% of the way towards perfect. So knowing how good the best sprinter on the team is vs knowing nothing is worth about a 7% gain.

The Lasso model which judges impact of individual riders is worth another 4.5% of improvement over the neutral model. So knowing the impact of all riders on probability of winning is about two thirds as valuable as knowing the ability of best sprinter.
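The arithmetic behind these comparisons can be checked from the mean square errors reported in the table. The sketch assumes that "improvement towards perfect" means the reduction in error divided by the naive model's error. Because the table values are rounded, the Lasso step comes out a little above the 4.5% quoted.

```python
# Mean square errors as reported in the table.
mse = {"lasso": 0.0393, "neutral": 0.0414, "naive": 0.0445}

def gain(better, worse, baseline):
    """Error reduction as a share of the baseline error (0 = perfect)."""
    return (worse - better) / baseline

neutral_gain = gain(mse["neutral"], mse["naive"], mse["naive"])
lasso_gain = gain(mse["lasso"], mse["neutral"], mse["naive"])
ratio = lasso_gain / neutral_gain

print(f"neutral vs naive: {neutral_gain:.1%}")
print(f"lasso vs neutral: {lasso_gain:.1%}")
print(f"lasso gain relative to neutral gain: {ratio:.2f}")
```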

No correlation between metrics

My sense is this model is not ready for primetime as there is no correlation between the coefficient produced by the lasso model and how often riders are deployed with the best sprinters. It seems likely that judging a rider’s ability to impact sprint results based off how their team deploys them yields the most useful information.

Evaluating Riders: Log Rank

Evaluating rider performance in professional cycling is a hard problem. While more advanced statistics like climbing times, segment times, survival with leading group, and others are available for certain races and certain riders, for most races and certainly for anything historical we’re left with something like this PCS result table: finishing rank in race, maybe UCI points, PCS points, and time gaps.

So any rider performance statistic has to be based on one of those three data-points: time gaps, points, or finishing rank. Each has its place.

Time gaps are a very poor way to evaluate success in a bunch sprint where 100 riders might finish on the same time, but they can be a good way to evaluate success on a mountain stage with an uphill finish.

PCS points have been developed into a widely used evaluative method which recognizes that success in cycling can be achieved in a wide array of competitions (GC, race wins, jersey competitions) and has dozens of different scales which are used for different quality of races, but fundamentally their point scales are opinions on the value of different results relative to each other.

Finally, finishing rank is often used to count victories, podiums, or top 10 finishes across the season, but is plagued by vastly different difficulty levels to achieve certain results (how good is 3rd in a World Tour race relative to 1st in a .1 race?). Ranks are also notoriously difficult to average; Wout Van Aert’s transcendent 2021 Tour de France yielded an average rank of 25th for a return of 3 stage wins, just behind Enric Mas’s average for his 6th on GC with nary a stage podium finish.

In recent months, I’ve developed my own tweaks to use finishing rank as an evaluative method, producing a stat I’m calling Log Rank. The handful of keys to make it work are:

1. All finishing ranks in a race are transformed by taking the natural logarithm. This produces a value system where the difference between finishing 1st vs 5th is large, while the difference between finishing 50th vs 100th is not as large. The red dots below show equal gaps between results; so 1st and 3rd are separated about as much as 3rd and 7th/8th. However, 1st and 7th/8th are separated by the same amount as 7th/8th and 55th. I think this is a fairly intuitive appraisal of the value of different finishing positions.

2. Using these transformed ranks, taking averages is much easier. For example, Wout Van Aert’s final week of the Tour de France, where he finished 25th, 40th, 36th, 43rd, 1st, 1st (average 24th), is transformed into 3.2, 3.7, 3.6, 3.8, 0, 0 (average log rank of 2.4), which can be re-transformed back into an average rank of 11th (by taking e^x where x = average log rank). Basically, this says we care way more about Van Aert’s two victories than the fact he finished outside the top 20 in those other races. In fact, he could have finished 50th in those four stages (new average of 34th), but his log rank would only change to about 14th (e^2.61 ≈ 13.6).

3. The difficulty of different races is found by an objective system which looks at how difficult it is to achieve certain results in different levels of races. For example, in recent seasons it is roughly similar difficulty to achieve a 10th place in an U23 2.2 level race as a 27th place in a World Tour race. Using a host of these types of comparisons, I’ve created a Strength of Peloton rating system to judge all levels of races against each other based on the difficulty of achieving certain levels of results. All that needs to be said here is that results are adjusted based on what type of races they were achieved in. For example, Ethan Hayter and Tadej Pogacar achieved very similar raw finishing ranks in 2021, but Pogacar did so against the 4th toughest pelotons and Hayter only around the 600th toughest.
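The first two steps can be reproduced with Van Aert's final-week finishes from the example above. This sketch leaves out the Strength of Peloton adjustment in step 3, since its details are not given here. Averaging log ranks and exponentiating is the same as taking the geometric mean of the ranks.

```python
import math
from statistics import mean

def average_log_rank(ranks):
    """Mean of the natural log of each finishing rank."""
    return mean(math.log(rank) for rank in ranks)

def log_rank_position(ranks):
    """Average log rank converted back to a finishing position (e^x)."""
    return math.exp(average_log_rank(ranks))

final_week = [25, 40, 36, 43, 1, 1]   # finishes quoted in the text
what_if = [50, 50, 50, 50, 1, 1]      # the 50th-place scenario from the text

print(f"raw average rank: {mean(final_week):.1f}")
print(f"average log rank: {average_log_rank(final_week):.2f}")
print(f"re-transformed:   {log_rank_position(final_week):.1f}")
print(f"what-if scenario: {log_rank_position(what_if):.1f}")
```

The two wins contribute log(1) = 0 each, which is why they pull the average down so strongly.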

2021 Log Rank Rankings

Applying those three steps yields the following top 10 for all 2021 results, just averaging all race results (ignoring time trials):

| Rider | Average Log Rank |
|---|---|
| Wout Van Aert | 4.3 |
| Tadej Pogacar | 4.9 |
| Mathieu Van Der Poel | 5.0 |
| Primoz Roglic | 6.9 |
| Sam Bennett | 8.3 |
| Sonny Colbrelli | 9.2 |
| Ethan Hayter | 10.0 |
| Jasper Philipsen | 10.2 |
| David Gaudu | 10.4 |
| Julian Alaphilippe | 11.6 |

Building on Log Rank

The next challenge was to build on this basic Log Rank to add in parcours level impacts of things like the climbing difficulty and whether the race ended in a bunch sprint. For example, Enric Mas raced 66 times on the road in non-time trials in 2021. If we’re judging how good of a rider he is we probably don’t care about where he finished in the flatter stages which littered the Tour de France and Vuelta a Espana. However, we care a lot about how he performed in the tougher climbing stages of those races and others.

To find the impact of climbing difficulty and a bunch sprint finish I set up a mixed effects model which can be run over results from a given period of time (eg, July 2019 to June 2021 to predict performance going into the 2021 Tour de France). The model was specified using three random effects for individual riders, attempting to find a) their general level of ability to finish with a good finishing rank in races, b) the impact of climbing difficulty on their finishes, and c) the impact of the race ending in a bunch sprint on their finishes.

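library(lme4)  # provides lmer()
# `results` is a placeholder name for the race results data frame
lmer(log_rnk ~ (1 + climb_difficulty | rider) +
  (0 + bunch_sprint | rider), data = results)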

Using this model, we would expect a sprinter like Sam Bennett who struggles in the hills and mountains, but generally ranks highly in terms of finishing rank, to have a smaller individual coefficient (indicating that he generally achieves high finishes), a larger climbing difficulty coefficient (indicating that as races get tougher in terms of climbing his finish rank gets larger/worse), and a negative bunch sprint coefficient (indicating that he finishes with better ranks when the race ends in a bunch sprint vs a smaller group).
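To make the three coefficients concrete, here is how a predicted finishing rank could be assembled from them. The coefficient values and the climb difficulty scale are hypothetical, chosen only to mimic a sprinter's profile. The sketch also assumes each rider's effects have already been combined with the model's overall intercept.

```python
import math
from dataclasses import dataclass

@dataclass
class RiderEffects:
    intercept: float  # general level in log rank (lower = better finishes)
    climb: float      # change in log rank per unit of climb difficulty
    sprint: float     # change in log rank when the race ends in a bunch sprint

def predicted_rank(effects, climb_difficulty, bunch_sprint):
    """Predicted finishing rank from the three rider-level effects."""
    log_rank = (
        effects.intercept
        + effects.climb * climb_difficulty
        + effects.sprint * int(bunch_sprint)
    )
    return math.exp(log_rank)

# Hypothetical sprinter: good general level, hurt by climbing, helped by sprints.
sprinter = RiderEffects(intercept=2.5, climb=0.3, sprint=-1.2)
flat_sprint = predicted_rank(sprinter, climb_difficulty=0, bunch_sprint=True)
mountain_day = predicted_rank(sprinter, climb_difficulty=5, bunch_sprint=False)
print(f"flat bunch sprint: {flat_sprint:.1f}, mountain stage: {mountain_day:.1f}")
```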

The model results for July 2019 to June 2021 show Bennett with about the 50th best general ability to finish highly (a above), the 20th worst impact of climbing difficulty (b above), and the 2nd best bunch sprint impact (c above). Overall, he would be expected to finish with an average rank of 3.7 in a flat, bunch sprint race – 2nd best in the world, behind Wout Van Aert (3.1) and in front of Caleb Ewan (3.8).

We can similarly look for hilly races not ending in bunch sprints (prototypical classics race) where Mathieu Van Der Poel had the best prediction at that time at 6.8 – essentially tied with Wout Van Aert – and ahead of Roglic, Pogacar, Van Avermaet, and Alaphilippe.

The top predictions in high mountains race were unsurprisingly the three main recent grand tour winners: Pogacar, Roglic, and Bernal. They were followed by Mikel Landa and Adam Yates.

Where Does Separation Occur?

This post will not cover particularly novel ground if you pay any attention to professional cycling. Many of the conclusions are obvious. However, if you find yourself in the know already, trust that this post is the necessary building block for more interesting work.

Fundamentally, cycling races can be viewed as a war of attrition. As I laid out during this post and Dr. Seiler discusses in his video here, races normally feature a long stretch of steady efforts, before the pace is ramped up towards the end. This steadily increasing pace towards the back-end of the race is what creates separation between riders in most races. The exception is in some primarily flatter races which just do not feature the type of topography which results in time gaps and so the peloton finishes together in a bunch sprint. In all other races, separation is typically created – particularly on hills and/or mountainous sections of the race – but also on cobbled roads, gravel/poorly surfaced roads, in crosswinds, etc.

So that is a fan’s understanding, informed by some limited studying of power outputs on significant climbs across a large sample of races. However, we can leverage an even larger data-set of individual race segments on all types of flat, uphill, downhill, poor surface roads, etc. I’ve gathered a data-set of rider speeds on different length race segments primarily from 2020 professional season to do just that. There are 22,500 unique segments in this data-set covering 177 races.

What Produces Separation?

The metric of choice for showing separation is the difference between the 90th percentile speed and the 10th percentile speed on a segment, divided by the median speed over that segment. Eg, if the 90th percentile is 27 km/h, the 10th percentile is 20 km/h, and the median is 22 km/h, the Separation Factor is (27 - 20) / 22, or about 32%. That is fairly high among all segments, where the mean is 12% and the median is 7%. The max Separation Factor for the average race is around 48% – typically a short segment.
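The Separation Factor is easy to compute from a list of rider speeds on a segment. A minimal sketch follows; the choice of the "inclusive" percentile method is an assumption, as the exact percentile convention is not specified here.

```python
from statistics import median, quantiles

def separation_from_percentiles(p90, p10, med):
    """(90th percentile speed - 10th percentile speed) / median speed."""
    return (p90 - p10) / med

def separation_factor(speeds):
    """Separation Factor for one segment from rider speeds in km/h."""
    # quantiles(n=10) returns the nine deciles: the first is the 10th
    # percentile and the last is the 90th percentile.
    deciles = quantiles(speeds, n=10, method="inclusive")
    return separation_from_percentiles(deciles[-1], deciles[0], median(speeds))

# The worked example from the text.
example = separation_from_percentiles(p90=27, p10=20, med=22)
print(f"{example:.0%}")

# A hypothetical segment with eleven riders between 10 and 20 km/h.
print(f"{separation_factor(list(range(10, 21))):.0%}")
```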

Essentially treat the Separation Factor as the percentage difference in speed between riders racing the fastest and those racing the slowest. On the nine decisive final climbs in the 2020 Tour de France the Separation Factors averaged 29%, ranging between 18% for Col de la Loze and 39% for Orcieres-Merlette.

Separation factor average by gradient

Separation is primarily created by higher gradients. This is maybe the most blindingly obvious statement I’ve ever made, but there it is. Flatter or downhill segments create very little separation among the group on average, while uphill segments create increasingly more as the gradient increases from about 3% to over 10%.

When comparing segments on cobbles vs similar gradient segments on normal roads, the rougher roads show a highly statistically significant difference: a Separation Factor about 5 to 8 percentage points larger for cobbled sections, depending on how it is modelled (a model with gradient included tends to diminish the impact as many cobbled/white road sections are also uphill). The impact here is roughly a Separation Factor of 9% for a flat, non-cobbled segment vs 14% or higher for a flat, cobbled segment.

Separation factor by percentage thru race

And replicating the work done previously showing that power varied more in later stages of the races, segments further towards the end of a race provide for more separation than those earlier in the race, with the most significant increase in roughly the last third of the race.

Where is Separation Largest?

Clearly segments further through the race have the highest separation between fastest and slowest riders. But where does the moment with the largest separation occur in these races? For this sample of 177 races, the key moment on average is 88% through the race, with about 40% of races having this key moment in the last 3% of the race (or last 5km for a typical 180km race). It’s important to note a segment is counted as occurring based on where it ends within a race.
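Locating the key moment in a race is a matter of finding the segment with the largest Separation Factor and recording where it ends. A small sketch with hypothetical segment data:

```python
def key_moment(segments, race_km):
    """Share of the race completed where the max-separation segment ends."""
    end_km, _ = max(segments, key=lambda seg: seg[1])
    return end_km / race_km

# Hypothetical segments for a 180 km race: (end of segment in km, Separation Factor).
segments = [(40.0, 0.03), (110.0, 0.09), (176.0, 0.41), (180.0, 0.22)]
share = key_moment(segments, race_km=180.0)
print(f"key moment at {share:.0%} of the race")
```

Using the end of the segment matches the convention noted above, so a long final climb is counted at its summit rather than at its base.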

Where segment of max separation moment occurs in races

Again, to any fan of cycling the knowledge that the largest time gaps occur near the end of a race – particularly on summit finish climbs – is not novel. However, this data does show how rare it is for the segments which produce the largest time gaps to occur anywhere in the first half of the race.

Survival Probability (2020 TDF)

In recent years the Tour de France has added the live tracking feature to their online/second-screen coverage of the Tour. This telemetry data shows the position of every rider on the course (absent any errors/malfunctions/bike changes) throughout the race – including information about their speed, the road conditions, and wind conditions.

So far this has largely been exploited only as a social media activation tool for NTT (eg, on Twitter @letourdata). But knowing the position of every rider with their speed is obviously powerful information. For example, who was pulling in the lead group to try to extend the gap on stage 7 of this year’s Tour de France? How large was the group at the bottom of each final climb? How much time did Zakarin lose to the leaders on stage 8 on descents? Which Jumbo Visma domestique drove the pace the hardest on the climbs?

Leveraging this data, I’ve analyzed ten of the hilly or mountainous stages of this year’s Tour de France to look at the probability of staying with the front group (defined as the group with Primoz Roglic as he was in yellow for the lion’s share of these stages) over the stage. I’ve decided to ignore riders who spend the stage in the breakaway, but anyone who attacked away from Roglic (eg, Pogacar in stage 8) counts as surviving as well.
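A survival probability at a single checkpoint could be computed roughly as follows. The snapshot data, the rider type labels, and the 5 second cutoff for counting as "with the group" are all assumptions of this sketch rather than details of the tracking data.

```python
from collections import defaultdict

# Hypothetical snapshot at one checkpoint:
# (rider type, seconds behind Roglic, spent the stage in the breakaway?)
snapshot = [
    ("Climber", 0.0, False),
    ("Climber", -12.0, False),   # attacked away from Roglic: counts as surviving
    ("Climber", 95.0, False),
    ("Domestique", 240.0, False),
    ("Domestique", 2.0, False),
    ("Puncheur", -180.0, True),  # breakaway rider: ignored
]

def survival_by_type(snapshot, max_gap=5.0):
    """Share of each rider type still with (or ahead of) the yellow jersey group."""
    alive = defaultdict(int)
    total = defaultdict(int)
    for rider_type, gap, in_break in snapshot:
        if in_break:
            continue
        total[rider_type] += 1
        alive[rider_type] += int(gap <= max_gap)
    return {t: alive[t] / total[t] for t in total}

survival = survival_by_type(snapshot)
print(survival)
```

Repeating this at every kilometre of a stage gives the survival curves by rider type.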

Survival Probability for Notable Stages

Survival probability with Roglic by rider type (Stage 8 2020 TDF)

Stage 8 was a short 141 km stage with three major climbs – Col de Mente at 82 km to the finish, Port de Bales at 37 km to the finish, and Col de Peyresourde at 11 km to the finish. Col de Mente did little to shake up the peloton and almost all riders were able to come together to the bottom of the Port de Bales – a 12 km HC climb. That was where the major selection on the stage came; by the end of the climb, fewer than 20% of the riders in every rider type except Climbers were still with the GC group. About 60% of Climbers survived Port de Bales with the GC group.

The selection for climbers came largely on the Peyresourde and about 30% of climbers survived with Roglic to the end of that climb (with nine riders finishing on the same time from the GC group).

Survival probability with Roglic by rider type (Stage 4 2020 TDF)

Compare that with Stage 4. Stage 4 was not particularly selective before the final climb with a handful of category 3/4 climbs leading up to the 1st category climb to Orcieres-Merlette. At the end of the final warm-up climb at 20 km to go at least 50% of domestiques and sprint train riders were still there along with upwards of 75% of puncheurs, mountain helpers, and climbers. The non-climbers were distanced quickly on the final climb, but it wasn’t until the final few kilometers of the stage that the selection was made among climbers and even then over 60% of them came to the line with Roglic (with 16 riders finishing on the same time).

Survival probability with Roglic by rider type (Stage 17 2020 TDF)

Stage 17 had two HC climbs – the Col de Madeleine summit came with 64 km left and the race finished on the Col de la Loze. The major selection here came very early on the Madeleine where already only half of climbers were left in the front group with 5 km to go on that climb. Riders were steadily distanced on Col de la Loze until the leaders came over the line with massive time gaps. The first six riders came in alone and there were 17 different groups in the top 20 riders.

Most Selective Climbs

The four most selective climbs for Climber rider types were Col de la Loze on Stage 17 (50% of climbers left at the beginning of the climb vs 12% at the end), Montee du plateau des Glieres on Stage 18 (72% at the beginning and 22% at the end), Col de Peyresourde on Stage 8 (54% at the beginning and 27% at the end), and Col de la Madeleine on Stage 17 (100% at the beginning and 50% at the end).
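One way to put a single number on "selective" is the share of climbers at the foot of a climb who are still there at the top. The sketch below uses only the percentages quoted above; treating that retention ratio as the selectivity measure is an assumption for illustration, not necessarily the ranking rule used for the list.

```python
# Share of Climber-type riders with the GC group at the start and end of
# each climb, as quoted in the text (start_share, end_share).
climbs = {
    "Col de la Loze (Stage 17)": (0.50, 0.12),
    "Plateau des Glieres (Stage 18)": (0.72, 0.22),
    "Col de Peyresourde (Stage 8)": (0.54, 0.27),
    "Col de la Madeleine (Stage 17)": (1.00, 0.50),
}


def retention(start_share, end_share):
    """Fraction of riders present at the foot of the climb who survive it.

    ASSUMPTION: lower retention = more selective climb.
    """
    return end_share / start_share


# Most selective first; ties keep the order in which they were listed.
ranked = sorted(climbs, key=lambda name: retention(*climbs[name]))

for name in ranked:
    print(f"{name}: {retention(*climbs[name]):.0%} of climbers retained")
```

On this measure Col de la Loze kept under a quarter of the climbers who started it, while Peyresourde and the Madeleine each kept half.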

For the full peloton, Col de la Loze (Stage 17) was the most selective overall, cutting the peloton down to less than a sixth of its size before the climb. The Madeleine (Stage 17), Port de Bales (Stage 8 penultimate climb), and Glieres (Stage 18) were the next most selective – each reducing the peloton to a fifth of its prior size.

With enough data it would be interesting to tease out the most important factors to make a climb selective. Is it the length, the gradient, a combination of the two, the position in the stage? Based on this limited sample of 50+ climbs, the two most important factors are the length in kilometers (long climbs are more selective) and the overall difficulty in terms of vertical gain (gradient * length). The difference in length and vertical gain between a typical HC climb like the Col de la Madeleine and a 1st category climb like the Orcieres-Merlette climb is about five times more important than the difference in distance to the finish between a climb 100 km from the finish and one which is a summit finish. However, that is a very weak claim with only a dozen stages' worth of data.

Other Survival Probabilities

Stage 6 2020 TDF
Stage 9 2020 TDF
Stage 12 2020 TDF
Stage 13 2020 TDF
Stage 14 2020 TDF
Stage 15 2020 TDF
Stage 16 2020 TDF
Stage 18 2020 TDF

Power Output Throughout Race

In recent posts I’ve explored race level weighted average power from top level riders. I’ve shown that power outputs are higher on tougher climbing stages, higher for riders in breakaways, lower on very hot race days, higher in one day races than stage races, and higher for higher placed riders. I’ve also dug into power output by rider types, showing that climbers have the widest spread between their max power output on climbing stages and their lowest power output on flatter stages.

Next I’m going to explore power outputs over the course of a race by exploiting power files for climbs throughout a race. I have over 15,000 unique rider/climb power files showing power output, gradient, distance, and position of each climb throughout a stage. These cover over 300 riders, for nearly 200 unique races, and over 650 different climbs within those races.

Where does power output diverge?

A simple model of a bike race is explained clearly by sports scientist Stephen Seiler in this YouTube video. Over the course of the race, riders raise the level of the race by raising power output. This steadily winnows the pack down. In a very selective race it may winnow down to 1 or 2 riders; in a less selective race you may go to the line with most of the peloton remaining. In the case of the 2020 World Championship Road Race covered in that video, the race was very selective, leading to a final group with six of the best riders in the world.

Average power output on climbs by position in the race

Exploiting these power files, we can draw a curve of the power required throughout the average stage. Interestingly, the curve does not follow this simple model of steadily increasing power. There’s a spike in the first quarter of the race where presumably the breakaway is being established, but power declines in the last 20 percent of the race.

However, this does not invalidate Seiler’s point as this graph considers all riders in the peloton. Of course, as riders are shed by the peloton because they cannot maintain the steadily increasing pace, they drop their power output and continue on to the finish at some lower pace. This is best illustrated by the gruppetto concept in stage races; each rider doesn’t struggle to the finish at the best pace they can maintain. Instead, they are happy to reduce their power output and save their energy for another day.

When I re-create the chart above, stratified by finishing position (top 10, 11-25, 26-50, 51-100, and outside the top 100), we see a clear divergence in power over the course of a race. For those in the top 10 we see a steadily increasing curve from around 4.80-5.00 watts/kg in the first half of the race to about 6.00 watts/kg in the final stages. For riders finishing outside the top 100 we see a steady decline in power output from around 4.70-4.80 watts/kg in the first half to around 4.20 watts/kg in the final stages.

Power output throughout race by finish position

That divergence really appears around 60-70% through the race. Riders in the top 50 but not top 25 can hang on until around 65% through the race, while riders in the top 25 but not top 10 can stick until about 80% through. Of course, nothing is ever this cut and dried considering different parcours with more or fewer climbs, but this gives an idea of the averages across pro races.

Power output by rider type

Leveraging the rider cluster types introduced here and further discussed here we can draw similar curves for six basic rider types. You can see three clear groups: climbers, mountain helpers/puncheurs, and sprinters/sprint train/domestiques. These show obvious trends where non-climbers are declining in power in the last third of the race and climbers are increasing in power over that time.

Pre-defined rider clusters show obvious trends in power output across the race


Most interesting is how this model can be applied to identify the toughest obstacle for a rider in a race. A lot of cycling commentary focuses on whether certain riders can overcome a tough climb with the final group – often because that rider has a faster sprint and will beat those riders if they can stick around. These findings can be applied to identify where that most critical obstacle to overcome is located within a race.

To do so, we have to establish a baseline level of power output. This should not be the average level or the level maintained by the peloton at the start of the race, but instead some lower level that is theoretically the floor for a rider in the World Tour/Pro Conti peloton. I would propose using that roughly 4.00 watts/kg level that riders outside the top 100 finishers maintain in closing stages of a race.

We can then scale power outputs relative to that baseline 4.00 watts/kg level. To stick with the final group we can look at what is required for top 10 or top 25 finishers on final climbs. On average, this is in excess of 5.50 watts/kg up to 6.00 watts/kg. This is about 1.5-2.0 watts/kg higher than our baseline. We can also look at what is maintained by riders in the first half to two thirds of the race. That 4.70-4.80 watts/kg level is about 0.7-0.8 watts/kg higher than our baseline. To bring it all together, the curve below approximates the power required to get over the average climb with the lead group based on that climb’s position within the race – all scaled relative to a final climb.

We can see the first 60% of the race or so requires about 35-40% of the power over baseline compared to the final climb. This increases steadily then from that point. The first 25% is likely slightly higher as the breakaway is established here in many races.
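The scaling described above can be sketched in a few lines. The 4.00 watts/kg baseline and the 4.70-4.80 and 6.00 watts/kg levels come from the text; the shape of the curve (flat for the first 60% of the race, then a straight line up to the final climb) is an assumption made for illustration, and it ignores the slightly higher demand in the first 25% of the race.

```python
BASELINE_WKG = 4.0      # floor level proposed in the text
FINAL_CLIMB_WKG = 6.0   # upper end of the top-10 range on final climbs
EARLY_WKG = 4.75        # midpoint of the 4.70-4.80 early-race range


def relative_requirement(position, flat_until=0.60):
    """Power over baseline needed at `position` (0 to 1 through the race),
    expressed as a fraction of what a final climb requires.

    ASSUMPTION: constant until `flat_until`, then linear up to 1.0 at the
    finish. The real curve is estimated from power files, not a formula.
    """
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must be between 0 and 1")
    early = (EARLY_WKG - BASELINE_WKG) / (FINAL_CLIMB_WKG - BASELINE_WKG)
    if position <= flat_until:
        return early
    progress = (position - flat_until) / (1.0 - flat_until)
    return early + progress * (1.0 - early)


def adjusted_difficulty(raw_difficulty, position):
    """Scale a climb's raw difficulty by where it sits in the race."""
    return raw_difficulty * relative_requirement(position)
```

With these inputs an early climb is weighted at 0.375 of a final climb, which sits inside the 35-40% range quoted above, and a climb 80% of the way through the race is weighted at about 0.69.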

An example from 2019 Il Lombardia is shown below with climb difficulty estimates for each climb. These climbing difficulties are based on gradient, distance, and elevation of each climb so that a typical category 1 climb in a grand tour is around 10.0 and a typical category 2 climb in a grand tour is around a 5.0.

Climbing difficulty for climbs in 2019 Il Lombardia (adjusted for position in race in red)

Lombardia had six climbs in 2019, two of the 1st category difficulty, three of 2nd category difficulty, and one of the third category difficulty. However, because of how the easiest climb (Battaglia) was placed in the race, it was likely to require more power to overcome than the three 2nd category climbs in the first 180 km. The Sormano climb is the objectively tougher climb without considering position compared to Civiglio or others, but Civiglio’s later position in the race makes it a much more equal comparison.

Power Output by Rider Types

In my last post I introduced simple rider clusters based on a handful of features calculated for each rider from race result data. These clusters divided riders into six groups – sprinters, sprint train, puncheurs, domestiques, climbers, and mountain helpers. Three of these groups are leaders who are more likely to be going for race wins and three are helpers who are assisting the leaders. One follow-up that became possible was to analyze power outputs based on these rider types.

By Time Duration

Leveraging over seven thousand power files, I can link rider types to power outputs over different time durations. I chose to look at 10, 30, 60, 120, 300, 600, 1200, and 2400 seconds which covers the full spread of efforts from sprints to longer efforts like the final of a one day classic or high mountain climb. For each power file I extracted the best power output from these time durations, calculated watts per kg using weights from procyclingstats.com, and adjusted those relative to average for all riders. I had data from 169 riders with at least 10 power files from 2019 and 2020.
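A minimal sketch of the extraction step, assuming each power file is a list of watts sampled once per second (the sampling rate and the function names are assumptions, not details from the original analysis). For each duration it finds the best average power over any continuous window of that length and converts it to watts per kg.

```python
DURATIONS = (10, 30, 60, 120, 300, 600, 1200, 2400)  # seconds, from the text


def best_mean_power(watts, window):
    """Highest average power over any `window` consecutive 1-second samples."""
    if window <= 0 or window > len(watts):
        return None  # file is shorter than the requested duration
    running = sum(watts[:window])
    best = running
    for i in range(window, len(watts)):
        # Slide the window forward by one sample.
        running += watts[i] - watts[i - window]
        best = max(best, running)
    return best / window


def power_curve(watts, weight_kg, durations=DURATIONS):
    """Best watts/kg for every duration that fits inside the file."""
    curve = {}
    for duration in durations:
        best = best_mean_power(watts, duration)
        if best is not None:
            curve[duration] = best / weight_kg
    return curve


# Toy 60-second file: 50 s at 300 W followed by a 10 s sprint at 900 W.
toy_file = [300] * 50 + [900] * 10
curve = power_curve(toy_file, weight_kg=75.0)
print(curve)
```

Dividing each rider's value at a given duration by the average across all riders then gives the "relative to average" figures used in the charts.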

An example from four riders with 2019-20 data

Example curves for four riders are shown above. Smith packs a better sprint than the other three riders, but tails off on efforts longer than two minutes. Kamna excels on 10 minute plus efforts. De Gendt is second best at pretty much all points. Declercq is well off the highest outputs between 1-5 minutes, but is close on longer efforts.

29% of the data came from domestiques, 29% from mountain helpers, 16% from sprint trains (so 74% from helpers), 12% from puncheurs, 8% from sprinters, and 6% from climbers (so 26% from leaders).

I looked at both the 80th percentile of power output (so the better performances for a rider) and the median. As you would expect, the 80th percentile data produced wider spread between power outputs versus average.

Power outputs by time duration for rider clusters

Sprinters produced over 10% more power than the average rider in 10 second efforts, while puncheurs and sprint train riders were both above average here.

Puncheurs were consistently above average at all time periods, while domestiques were consistently below average.

Climbers peaked with about 7% more power than average in 20+ minute efforts, while mountain helpers were about 4% higher than average. At 10 second efforts, climbers were about 13 percentage points behind sprinters, while at 1200 seconds climbers were about 13 percentage points ahead of sprinters.

None of this is earth-shattering information; if anything, this shows the validity of rider clusters based simply on result data because we’re seeing the expected power outputs. Classifying riders as members of sprint train or mountain helpers is a valid distinction; they are producing different power outputs over different time durations.

By Stage Types

We can also break down overall power output in a race based on the type of stage it is. I’ve simply broken down the races into three types: those ending in a bunch sprint (20+ rider group), non-bunch sprints on hilly parcours, and non-bunch sprints on mountainous parcours. The dividing line between hilly and mountainous is roughly Fleche Wallonne or Giro dell’Emilia.

The metric here is relative weighted average power – so power output relative to a rider’s own average across all races. In this case, 120% is basically max effort – the efforts required of a winning breakaway rider or top 5 in mountain stage – and 80% is a low effort day like a flat bunch sprint finish in a grand tour. For example, the three big breakaway days for Neilson Powless in the 2020 Tour de France were 114%, 116%, and 114% efforts, while he did 82%, 89%, and 78% on three flatter days in the bunch.

Climbers have the widest gap between efforts in mountain races and bunch sprint races

Of course, flatter stages require lower power outputs in general for all riders as discussed in a previous post. But, we can identify some significant differences between the clusters. Climbers are clearly different from other clusters in their mountain/bunch sprint outputs, while mountain helpers are clearly different from sprinters/domestiques/sprint trains.

Climber types have the widest gap between performance by parcours. In bunch sprint races they produce ~92% of their average weighted average power. In mountainous races, they are over 105% of their average weighted average power. Domestiques have the narrowest gap between bunch sprint days and mountain days.

Simple Rider Clustering

Cycling is fundamentally a team sport, and like all team sports it has roles/positions which riders fill in each race. Unlike most team sports however, those roles/positions are not explicitly stated prior to the race by teams. Confusing things further, cycling teams compete at different strength races regularly. A rider who is a helper at a World Tour level race could easily be the protected leader in a lower level 1.1 race. The challenge to successfully define which position/role each rider fulfills on their team can be collapsed into answering two questions: 1) which parcours fits a rider (sprint finishes, hills, mountains) and 2) are they typically the leader or a helper (do they finish as the top rider in their team often or rarely?).

Cluster analysis is regularly used in other team sports to define roles – even in sports with more defined positions. This paper from the Sloan Sports Analytics Conference from 2012 discusses clustering based on roles in the context of the NBA. This talk from Opta Pro Forum in 2015 discusses clustering based on player types in the context of football. There have been many more advanced and refined attempts at clustering in both (and other) sports since. Clustering is most easily done either with the K-means method or with hierarchical clustering. Both operate by feeding certain features for each row of data into the algorithm. For K-means, you have to pre-define the number of clusters you’re looking for (this can be optimized so it’s not necessarily arbitrary), but for hierarchical a tree is built which steadily divides the data into smaller and smaller clusters.

Clustering in Pro Cycling

K-Means is the method I’ll use here. The key to using K-means (and any clustering method) is defining the best features for your data so that there are obvious ways for the algorithm to divide the data. For this, I’ve defined season long average values for 2017-2020 for four statistics:

  1. % of points earned in bunch sprint finishes (of all points earned) – where points earned decay from 1st place (earning the most) to a cut-off between 15th and 50th place (earning the least), with the cut-off depending on the strength of the peloton
  2. Overall points per race-day – with the same definition of points
  3. % of race-days finishing as #1 rider on your team (must also finish in top 20 in the race)
  4. Difficulty of the parcours weighted by points earned – where tougher mountain stages are high difficulty and flat stages are low difficulty

These four features define 1) whether a rider earns points in sprint finishes, 2) whether they are finishing high in races, 3) whether they are leading the team, and 4) whether they fit best on flatter, hillier, or mountainous races. We can generate other features like how often a rider is in the breakaway, their performance in time trials, whether they’re successful in tough conditions, or how strong the races they participate in are, but these four give a good start and have strong data availability going back 3+ years.

Performing the Clustering

K-Means can be optimized using several methods (elbow, silhouette, etc) to find the correct number of clusters. Sometimes the number will be obvious and sometimes a small range is appropriate. For this data, between 4 and 7 clusters was the best fit. After fitting the model, six produced the most explainable clusters.
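To make the procedure concrete, here is a small pure-Python sketch of K-means (Lloyd's algorithm) on the four features. The rider rows are invented toy values already scaled to 0-1, and the deterministic farthest-point seeding is a simplification chosen so the example is reproducible; neither is taken from the original analysis, which would more likely use a library implementation.

```python
import math


def farthest_point_init(rows, k):
    """ASSUMPTION: deterministic seeding. Start from the first row, then
    repeatedly add the row farthest from its nearest chosen centroid."""
    centroids = [rows[0]]
    while len(centroids) < k:
        centroids.append(
            max(rows, key=lambda r: min(math.dist(r, c) for c in centroids))
        )
    return centroids


def kmeans(rows, k, iterations=50):
    """Lloyd's algorithm on a list of equal-length feature tuples."""
    centroids = farthest_point_init(rows, k)
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assignment step: each row joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(row, centroids[c]))
            for row in rows
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                updated.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                updated.append(centroids[c])  # keep an empty cluster in place
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


def inertia(rows, labels, centroids):
    """Within-cluster sum of squares, the quantity behind the elbow method."""
    return sum(math.dist(r, centroids[l]) ** 2 for r, l in zip(rows, labels))


# Hypothetical rider-seasons: (sprint share, points per race-day,
# share of days as team leader, parcours difficulty).
rows = [
    (0.90, 0.80, 0.70, 0.10), (0.85, 0.70, 0.60, 0.15),  # sprinter-like
    (0.05, 0.80, 0.70, 0.90), (0.10, 0.70, 0.65, 0.85),  # climber-like
    (0.10, 0.10, 0.05, 0.40), (0.15, 0.05, 0.10, 0.45),  # domestique-like
]
labels, centroids = kmeans(rows, 3)
print(labels)
for k in (2, 3, 4):
    fit_labels, fit_centroids = kmeans(rows, k)
    print(k, round(inertia(rows, fit_labels, fit_centroids), 4))
```

Printing the inertia for a range of k values is the raw material for the elbow method: the right number of clusters is roughly where adding another cluster stops reducing inertia by much.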

The six clusters produced can be broadly defined as three leader clusters and three helper clusters with the three levels corresponding to mountainous or flatter parcours.

  1. Sprinters – the easiest cluster to define; these riders are most successful in bunch sprints in flatter races and are often the leader
  2. Climbers – these riders get few points in bunch sprints; rather they earn points in mountainous finishes and are often the leader of the team
  3. Puncheurs – these riders are best on hillier parcours and can win from the bunch or in smaller groups
  4. Climbing helper – these riders earn fewer points and are leaders less often, but are more often successful in mountainous/hilly stages
  5. Sprint train – these riders earn points often in bunch sprints finishes, but are rarely leaders
  6. Domestiques – this is the catch-all group for riders who aren’t successful in mountain/hilly stages, nor do they earn bunch sprint points often; these can be road captains or super-strong men like Tim Declercq whose work is done before the pointy end of the race.

Cluster | % of Riders | Example (2019)
Sprinter | 9% | Caleb Ewan
Climber | 8% | Egan Bernal
Puncheur | 11% | Alberto Bettiol
Climbing helper | 19% | Marc Soler
Sprint train | 20% | Max Richeze
Domestique | 34% | Luke Rowe
Distribution of clusters in World Tour / Pro Conti riders

So about 28% of riders fit into one of the three leader clusters, another 39% in the two specialized helper clusters, and 34% in the more generic domestique cluster. Said more clearly, in an eight man grand tour team you’ll normally have two protected riders, three specialized helpers, and three less specialized domestiques.
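The eight-man team split follows directly from those shares, as a quick check shows. The shares are the ones quoted above (they sum to 101% because of rounding); the role labels are just shorthand for this sketch.

```python
TEAM_SIZE = 8  # riders in a grand tour team

# Shares of riders by role, as quoted in the text.
shares = {
    "leaders": 0.28,
    "specialized helpers": 0.39,
    "domestiques": 0.34,
}

expected = {role: share * TEAM_SIZE for role, share in shares.items()}
rounded = {role: round(count) for role, count in expected.items()}

for role in shares:
    print(f"{role}: {expected[role]:.2f} riders, rounds to {rounded[role]}")
```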

Visualizing Clusters

This visual lays out how this looks at the team level with colors denoting clusters, % of races as leader on x-axis, and parcours fit on y-axis. Below is Bora Hansgrohe – one of the most successful teams in the World Tour in 2019.

Bora Hansgrohe team plot in 2019

They had three primary sprinters in 2019 who are clustered on the lower right and two climbers in the upper right. They have a number of puncheurs of whom Max Schachmann is the prime example. The clustering isn’t perfect here; Formolo is more of a climbing helper and Postlberger is more involved in the sprint train, but because of the mixed roles they get classified here. Muhlberger is certainly a climbing helper though. In the bottom left are numerous support riders of which Schwarzmann, Archbold, Burghardt, and Selig are seen as sprint train and most of the rest are domestiques. You can argue Bodnar and Oss are more likely sprint train than not (and Oss is clustered with sprint train for 2017, 2018, and 2020).

In general though, these plots give a strong overview of which roles riders are fulfilling in a team for a given season.

A generic plot of where all riders fell in 2019 is below.

Plot of six clusters for 2019 World Tour and Pro-Conti riders


This clustering has numerous applications like:

  1. does having more sprint train domestiques predict more success for sprinters / same for climbers and their helpers?
  2. how does power output differ across clusters on different stage types?
  3. which types of riders are most successful on different parcours?
  4. which teams are most and least balanced (high or low percentage of riders as leader clusters vs helper clusters)?

The Impact of Temperature on Relative Power Output

In my recent posts I’ve introduced the concept of relative power output where a rider’s weighted average power in a particular race is compared to the average of all of their races to create a rider specific relative measure. Given sufficient sample size both of races for individual riders and riders in the data-set we can show what factors contribute to higher or lower power output on stages. So far, it looks like races with a lot of climbing, time trials, shorter races in general, high finishing position on the stage, and being in the breakaway all lead to higher relative power output.

Another significant factor is the temperature the race is ridden at. I have temperature data for >95% of race-days in my data-set. The average temperature is about 20.5 C and 11% of race-days have an average temperature over 30 C.

To find the impact of temperature, we can leverage the relative power output model built in a recent post. That model considers factors like the length and climbing difficulty of a stage, as well as the finishing position of the riders. It produces predictions, and we can train the temperature model on the residuals between those predictions and the actual power output on the stage. For example, stage 5 of the UAE Tour in 2019 is predicted to have a relative power output of 93% of a rider’s average weighted average power (eg, 256 watts if their average weighted average power is 275 watts). We can train the temperature impact model on the residual between that prediction (93%) and the actual (73%).

The ideal temperature is about 13 degrees Celsius (55 degrees Fahrenheit); this is where relative power output has peaked for the pro peloton. Higher temperatures have shown extreme impacts on the relative power output with a race at 30 C coming in about 3% lower than average and the hottest days like 2020 Strade Bianche impacting relative power output by -10%!

Incorporating temperature into the model shows that for every 1 degree Celsius away from 13 C relative power output drops by 0.4 percentage points, such that the 2020 Strade Bianche race would be expected to have 10.6% lower relative power output than the average race. Adding temperature also increases the R^2 of the model from 0.25 to 0.29; it also improves the model fit out-of-sample with R^2 increasing and SE dropping from 0.10 to 0.09.
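The temperature term can be written as a simple function. The 13 C optimum and the 0.4 percentage points per degree are from the text; applying the same slope on both the cold and hot side of 13 C is my reading of "away from 13 C" and should be treated as an assumption.

```python
IDEAL_TEMP_C = 13.0     # temperature where relative power output peaks
DROP_PER_DEGREE = 0.4   # percentage points lost per degree C from ideal


def temperature_penalty(temp_c):
    """Expected drop in relative power output, in percentage points,
    compared with a race ridden at the ideal temperature.

    ASSUMPTION: symmetric V-shape around the ideal temperature.
    """
    return DROP_PER_DEGREE * abs(temp_c - IDEAL_TEMP_C)


def penalty_vs_average_day(temp_c, average_temp_c=20.5):
    """Drop relative to a race at the data-set's average temperature."""
    return temperature_penalty(temp_c) - temperature_penalty(average_temp_c)


print(temperature_penalty(30.0))      # vs an ideal 13 C day
print(penalty_vs_average_day(30.0))   # vs an average 20.5 C day
```

Under this form a 30 C race sits 6.8 points below an ideal day and 3.8 points below an average 20.5 C day, and a 10.6 point drop corresponds to a race about 26.5 degrees away from the optimum.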

Power Output in Breakaways

In two recent posts covering relative power output in the 2020 Tour de France and showing the impact of stage characteristics on power output, I made the claim that power output is higher for riders in the breakaway than otherwise. This is likely not a controversial statement for anyone, but in this post I’ll show that breakaway riders are required to produce more power than normal on the days they ride ahead of the peloton.

My data-set comes from Pro Cycling Stats, which has collected the kilometers each rider spent ahead of the peloton for all World Tour races in 2020. There are 54 race days in this data-set where at least one rider rode ahead of the peloton. I linked this data to my stage level power output data. For power output, I’m using the riders’ relative power output compared to their average power output on all stages. For example, James Knox had a normalized power of 258 watts in stage 5 of 2020 UAE Tour which was 101% of his average normalized power (255 watts) in all stages.
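The relative power calculation itself is a one-liner. In the sketch below the 258 watt stage and the 255 watt average are the James Knox figures from the text; the other two stage values are invented purely so that the list averages to 255.

```python
def relative_power(stage_np, all_stage_nps):
    """Stage normalized power as a fraction of the rider's own average
    normalized power across all stages in the data-set."""
    average = sum(all_stage_nps) / len(all_stage_nps)
    return stage_np / average


# Hypothetical season: only the 258 W stage is real, the rest are filler
# chosen to give the 255 W average quoted in the text.
knox_stages = [250, 258, 257]
ratio = relative_power(258, knox_stages)
print(f"{ratio:.0%}")
```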

In total, I have 1417 rider race-days with any breakaway out-front where I have power output (across 54 unique races). About 10% of those race-days have >0% of the stage in the breakaway.

On average, the riders in the breakaway have done 106% of their average normalized power while spending an average of 35% of the stage ahead of the peloton. The riders not in the breakaway have done 98.5% of their average normalized power.

Impact of percentage of stage in breakaway on relative power output

The graph above shows how much higher relative power output is depending on percentage of stage spent in the breakaway. Riders who spend a higher percentage of the stage in the break have higher relative power output. This relationship holds within stages as percentage of time in the breakaway is positively associated with power output in 49 of 54 stages measured – with a median coefficient of 0.14. That means a rider who spends the entire stage in the breakaway will, compared with one who spends none of it there, output about 14 percentage points more of their average normalized power.
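A sketch of the within-stage calculation: fit a simple least-squares line per stage (relative power against share of the stage spent in the break), then take the median slope across stages. The three stages of data below are invented for illustration, and a plain one-variable regression is an assumption about how the per-stage coefficients were estimated.

```python
from statistics import median


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs; None if xs has no variation."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return None  # nobody (or everybody) was in the break
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical stages: (share of stage in breakaway, relative power output).
stages = {
    "stage_a": [(0.0, 0.98), (0.5, 1.05), (1.0, 1.12)],
    "stage_b": [(0.0, 0.99), (0.4, 1.03), (0.8, 1.07)],
    "stage_c": [(0.0, 0.97), (0.5, 1.07), (1.0, 1.17)],
}

slopes = {}
for name, riders in stages.items():
    xs = [share for share, _ in riders]
    ys = [power for _, power in riders]
    slope = ols_slope(xs, ys)
    if slope is not None:
        slopes[name] = slope

median_slope = median(slopes.values())
positive_stages = sum(1 for s in slopes.values() if s > 0)
print(round(median_slope, 2), positive_stages, "of", len(slopes))
```

A slope of 0.14 reads as: moving from 0% to 100% of the stage in the breakaway is associated with relative power output rising by 0.14, or 14 percentage points of a rider's average normalized power.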