Where Does Separation Occur?

This post will not cover particularly novel ground if you pay any attention to professional cycling. Many of the conclusions are obvious. However, if you find yourself in the know already, trust that this post is the necessary building block for more interesting work.

Fundamentally, cycling races can be viewed as a war of attrition. As I laid out in this post and Dr. Seiler discusses in his video here, races normally feature a long stretch of steady efforts before the pace is ramped up towards the end. This steadily increasing pace towards the back-end of the race is what creates separation between riders in most races. The exception is some primarily flatter races which just do not feature the type of topography that results in time gaps, so the peloton finishes together in a bunch sprint. In all other races, separation is typically created on hills and/or mountainous sections of the race, but also on cobbled roads, gravel/poorly surfaced roads, in crosswinds, etc.

So that is a fan’s understanding, informed by some limited studying of power outputs on significant climbs across a large sample of races. However, we can leverage an even larger data-set of individual race segments on all types of flat, uphill, downhill, poor surface roads, etc. I’ve gathered a data-set of rider speeds on different length race segments, primarily from the 2020 professional season, to do just that. There are 22,500 unique segments in this data-set covering 177 races.

What Produces Separation?

The metric of choice for showing separation is the difference between the 90th percentile speed and the 10th percentile speed on a segment, divided by the median speed over that segment. Eg, if 90th percentile is 27 km/h, 10th percentile is 20 km/h, and median is 22 km/h the Separation Factor is about 32%. That is fairly high among all segments, where the mean is 12% and the median is 7%. The max Separation Factor for the average race is around 48%, typically on a short segment.

Essentially treat the Separation Factor as the percentage difference in speed between riders racing the fastest and those racing the slowest. On the nine decisive final climbs in the 2020 Tour de France the Separation Factors averaged 29%, ranging between 18% for Col de la Loze and 39% for Orcieres-Merlette.
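As a minimal sketch of the arithmetic (not the code used for this post), the Separation Factor can be computed as below. The function names and the choice of decile cut points from Python's statistics module are my own assumptions; the only numbers taken from the text are the 27/20/22 km/h example.

```python
from statistics import median, quantiles


def separation_from_summary(p90, p10, med):
    """Separation Factor from already-computed summary speeds (km/h)."""
    return (p90 - p10) / med


def separation_factor(speeds_kmh):
    """Separation Factor from the raw rider speeds on one segment."""
    # quantiles(..., n=10) returns the nine decile cut points:
    # the first is the 10th percentile and the last is the 90th.
    deciles = quantiles(speeds_kmh, n=10, method="inclusive")
    p10, p90 = deciles[0], deciles[-1]
    return separation_from_summary(p90, p10, median(speeds_kmh))


# Worked example from the text: 27 km/h, 20 km/h and 22 km/h.
example = separation_from_summary(p90=27.0, p10=20.0, med=22.0)
print(f"{example:.1%}")  # about 32%
```

The percentile method matters little on segments with a full peloton, but on small groups a different interpolation rule would shift the result slightly.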

Separation factor average by gradient

Separation is primarily created by higher gradients. This is maybe the most blindingly obvious statement I’ve ever made, but there it is. Flatter or downhill segments create very little separation among the group on average, while uphill segments create increasingly more as the gradient increases from about 3% to over 10%.

When comparing segments on cobbles vs similar gradient segments on normal roads, the rougher roads show a highly statistically significant increase in Separation Factor of about 5 to 8 percentage points, depending on how it is modelled (a model with gradient included tends to diminish the impact as many cobbled/white road sections are also uphill). The impact here is roughly a Separation Factor of 9% for a flat, non-cobbled segment vs 14% or higher for a flat, cobbled segment.

Separation factor by percentage thru race

And replicating the work done previously showing that power varied more in later stages of the races, segments further towards the end of a race provide for more separation than those earlier in the race, with the most significant increase in roughly the last third of the race.

Where is Separation Largest?

Clearly segments further through the race have the highest separation between fastest and slowest riders. But where does the moment with the largest separation occur in these races? For this sample of 177 races, the key moment on average is 88% through the race, with about 40% of races having this key moment in the last 3% of the race (or last 5km for a typical 180km race). It’s important to note a segment is counted as occurring based on where it ends within a race.
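To make the "key moment" definition concrete, here is a small sketch under stated assumptions: each segment is a dictionary with a hypothetical end_km and separation field, and the segment is placed in the race by where it ends, as described above. The three segments are invented for illustration.

```python
def key_moment(segments, race_length_km):
    """Fraction of the race completed when the most selective segment ends."""
    most_selective = max(segments, key=lambda seg: seg["separation"])
    return most_selective["end_km"] / race_length_km


# Hypothetical 180 km race with three measured segments.
segments = [
    {"end_km": 45.0, "separation": 0.04},
    {"end_km": 120.0, "separation": 0.11},
    {"end_km": 176.0, "separation": 0.31},
]

position = key_moment(segments, race_length_km=180.0)
in_last_3_percent = position >= 0.97
```

For a 180 km race the last 3% is the final 5.4 km, so a segment ending at 176 km falls inside it.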

Where segment of max separation moment occurs in races

Again, to any fan of cycling the knowledge that the largest time gaps occur near the end of a race – particularly on summit finish climbs – is not novel. However, this data does show how rare it is for the segments which produce the largest time gaps to occur anywhere in the first half of the race.

Survival Probability (2020 TDF)

In recent years the Tour de France has added the live tracking feature to their online/second-screen coverage of the Tour. This telemetry data shows the position of every rider on the course (absent any errors/malfunctions/bike changes) throughout the race – including information about their speed, the road conditions, and wind conditions.

So far this has largely been exploited only as a social media activation tool for NTT (eg, on Twitter @letourdata). But knowing the position of every rider with their speed is obviously powerful information. For example, who was pulling in the lead group to try to extend the gap on stage 7 of this year’s Tour de France? How large was the group at the bottom of each final climb? How much time did Zakarin lose to the leaders on stage 8 on descents? Which Jumbo Visma domestique drove the pace the hardest on the climbs?

Leveraging this data, I’ve analyzed ten of the hilly or mountainous stages of this year’s Tour de France to look at the probability of staying with the front group (defined as the group with Primoz Roglic as he was in yellow for the lion’s share of these stages) over the stage. I’ve decided to ignore riders who spend the stage in the breakaway, but anyone who attacked away from Roglic (eg, Pogacar in stage 8) counts as surviving as well.
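A rough sketch of how such a survival curve could be built from time gaps to the reference rider is below. Everything in it is an assumption for illustration: the 10 second threshold, the rule that a distanced rider never counts as surviving again, the input layout, and the rider labels. Riders ahead of the reference rider have negative gaps and therefore count as surviving, and breakaway riders are skipped.

```python
from collections import defaultdict


def survival_by_type(gaps, rider_types, breakaway=(), max_gap_s=10.0):
    """Share of each rider type still with the reference rider per checkpoint.

    gaps: {rider: [seconds behind the reference rider at each checkpoint]}
    """
    n_checkpoints = len(next(iter(gaps.values())))
    totals = defaultdict(int)
    alive = defaultdict(lambda: [0] * n_checkpoints)
    for rider, series in gaps.items():
        if rider in breakaway:
            continue  # breakaway riders are ignored entirely
        rider_type = rider_types[rider]
        totals[rider_type] += 1
        for i, gap in enumerate(series):
            if gap > max_gap_s:
                break  # assumption: once distanced, dropped for good
            alive[rider_type][i] += 1
    return {t: [n / totals[t] for n in alive[t]] for t in totals}


# Invented riders and gaps at three checkpoints.
gaps = {"A": [0, 0, -5], "B": [0, 30, 5], "C": [2, 4, 60], "D": [-120, -90, -30]}
rider_types = {"A": "climber", "B": "climber", "C": "domestique", "D": "climber"}
result = survival_by_type(gaps, rider_types, breakaway={"D"})
```

Treating a dropped rider as dropped for good keeps the curve monotonic; allowing riders to chase back on would need a different rule.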

Survival Probability for Notable Stages

Survival probability with Roglic by rider type (Stage 8 2020 TDF)

Stage 8 was a short 141 km stage with three major climbs: Col de Mente at 82 km to the finish, Port de Bales at 37 km to the finish, and Col de Peyresourde at 11 km to the finish. Col de Mente did little to shake up the peloton and almost all riders were able to come together at the bottom of the Port de Bales, a 12 km HC climb. That was where the major selection on the stage came; by the end of the climb, fewer than 20% of the riders in all rider types except Climbers were still with the GC group. About 60% of Climbers survived Port de Bales with the GC group.

The selection for climbers came largely on the Peyresourde and about 30% of climbers survived with Roglic to the end of that climb (with nine riders finishing on the same time from the GC group).

Survival probability with Roglic by rider type (Stage 4 2020 TDF)

Compare that with Stage 4. Stage 4 was not particularly selective before the final climb with a handful of category 3/4 climbs leading up to the 1st category climb to Orcieres-Merlette. At the end of the final warm-up climb at 20 km to go at least 50% of domestiques and sprint train riders were still there along with upwards of 75% of puncheurs, mountain helpers, and climbers. The non-climbers were distanced quickly on the final climb, but it wasn’t until the final few kilometers of the stage that the selection was made among climbers and even then over 60% of them came to the line with Roglic (with 16 riders finishing on the same time).

Survival probability with Roglic by rider type (Stage 17 2020 TDF)

Stage 17 had two HC climbs – the Col de Madeleine summit came with 64 km left and the race finished on the Col de la Loze. The major selection here came very early on the Madeleine where already only half of climbers were left in the front group with 5 km to go on that climb. Riders were steadily distanced on Col de la Loze until the leaders came over the line with massive time gaps. The first six riders came in alone and there were 17 different groups in the top 20 riders.

Most Selective Climbs

The four most selective climbs for Climber rider types were Col de la Loze on Stage 17 (50% of climbers at beginning vs 12% at end), Montee du plateau des Glieres on Stage 18 (72% of climbers were left at beginning of climb and 22% at end), Col de Peyresourde on Stage 8 (54% at beginning and 27% at end), and Col de Madeleine on Stage 17 (100% at beginning and 50% at end).

For the full peloton, Col de la Loze (Stage 17) was the most selective overall, cutting the peloton down to less than a sixth of its size before the climb. The Madeleine (Stage 17), Port de Bales (Stage 8 penultimate climb), and Glieres (Stage 18) were the next most selective – each reducing the peloton to a fifth of its prior size.
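One simple way to compare the four Climber figures quoted above is the share of climbers retained over each climb (end share divided by start share). This retention ratio is my own illustrative measure, not necessarily the one used to produce the ranking; the percentages are the ones given in the text.

```python
# (share of climbers present at the bottom, share present at the top)
climbs = {
    "Col de la Loze (Stage 17)": (0.50, 0.12),
    "Plateau des Glieres (Stage 18)": (0.72, 0.22),
    "Col de Peyresourde (Stage 8)": (0.54, 0.27),
    "Col de Madeleine (Stage 17)": (1.00, 0.50),
}

# Retention: fraction of the climbers who started the climb in the group
# and were still there at the top. Lower means more selective.
retention = {name: end / start for name, (start, end) in climbs.items()}
ranked = sorted(retention.items(), key=lambda item: item[1])

for name, value in ranked:
    print(f"{name}: {value:.0%} retained")
```

On this measure the Loze keeps roughly a quarter of its climbers and the Glieres under a third, while the Peyresourde and the Madeleine each keep half.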

With enough data it would be interesting to tease out the most important factors to make a climb selective. Is it the length, the gradient, a combination of the two, the position in the stage? Based on this limited sample of 50+ climbs, the two most important factors are the length in kilometers (long climbs are more selective) and the overall difficulty in terms of vertical gain (gradient * length). The difference in length and vertical gain between a typical HC climb like the Col de Madeleine and a 1st category climb like the Orcieres-Merlette climb is about five times more important than the difference in distance to the finish between a climb 100 km from the finish and one which is a summit finish. However, that is a very weak claim with only a dozen stages worth of data.

Other Survival Probabilities

Stage 6 2020 TDF
Stage 9 2020 TDF
Stage 12 2020 TDF
Stage 13 2020 TDF
Stage 14 2020 TDF
Stage 15 2020 TDF
Stage 16 2020 TDF
Stage 18 2020 TDF

Power Output Throughout Race

In recent posts I’ve explored race level weighted average power from top level riders. I’ve shown that power outputs are higher on tougher climbing stages, higher for riders in breakaways, lower on very hot race days, higher in one day races than stage races, and higher for higher placed riders. I’ve also dug into power output by rider types, showing that climbers have the widest spread between their max power output on climbing stages and their lowest power output on flatter stages.

Next I’m going to explore power outputs over the course of a race by exploiting power files for climbs throughout a race. I have over 15,000 unique rider/climb power files showing power output, gradient, distance, and position of each climb throughout a stage. These cover over 300 riders, for nearly 200 unique races, and over 650 different climbs within those races.

Where does power output diverge?

A simple model of a bike race is explained clearly by sports scientist Stephen Seiler in this YouTube video. Over the course of the race, riders raise the level of the race by raising power output. This steadily winnows the pack down. In a very selective race it may winnow down to 1 or 2 riders; in a less selective race you may go to the line with most of the peloton remaining. In the case of the 2020 World Championship Road Race in the video above, the race was very selective, leading to a final group with six of the best riders in the world.

Average power output on climbs by position in the race

Exploiting these power files, we can draw a curve of the power required throughout the average stage. Interestingly, the curve does not follow this simple model of steadily increasing power. There’s a spike in the first quarter of the race where presumably the breakaway is being established, but power declines in the last 20 percent of the race.

However, this does not invalidate Seiler’s point as this graph considers all riders in the peloton. Of course, as riders are shed by the peloton because they cannot maintain the steadily increasing pace, they drop their power output and continue on to the finish at some lower pace. This is best illustrated by the gruppetto concept in stage races; each rider doesn’t struggle to the finish at the best pace they can maintain. Instead, they are happy to reduce their power output and save their energy for another day.

When I re-create the chart above, only stratified by finishing position (top 10, 11-25, 26-50, 51-100, and outside the top 100), we see a clear divergence in power over the course of a race. For those in the top 10 we see a steadily increasing curve from around 4.80-5.00 watts/kg in the first half of the race to about 6.00 watts/kg in the final stages. For riders finishing outside the top 100 we see a steady decline in power output from around 4.70-4.80 watts/kg in the first half to around 4.20 watts/kg in the final stages.

Power output throughout race by finish position

That divergence really appears around 60-70% through the race. Riders in the top 50 but not top 25 can hang on until around 65% through the race, while riders in the top 25 but not top 10 can stick until about 80% through. Of course, nothing is ever this cut and dried considering different parcours with more or fewer climbs, but this gives an idea of the averages across pro races.
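The stratification described above can be sketched as a simple group-and-average. The tuple layout, the five equal-width race-position bins, and the sample efforts are all hypothetical; only the finishing-position bands come from the text.

```python
from collections import defaultdict
from statistics import mean


def finish_group(position):
    """Map a finishing position to the bands used in the chart."""
    for upper, label in ((10, "top 10"), (25, "11-25"), (50, "26-50"), (100, "51-100")):
        if position <= upper:
            return label
    return "outside top 100"


def power_curve(efforts, n_bins=5):
    """Average watts/kg per (finish band, race-position bin).

    efforts: iterable of (finish_position, race_fraction, watts_per_kg),
    where race_fraction is how far through the race the climb sits (0 to 1).
    """
    buckets = defaultdict(list)
    for position, fraction, wkg in efforts:
        race_bin = min(int(fraction * n_bins), n_bins - 1)
        buckets[(finish_group(position), race_bin)].append(wkg)
    return {key: mean(values) for key, values in buckets.items()}


# Invented climb efforts for two riders' worth of finishing positions.
efforts = [(3, 0.25, 4.9), (5, 0.95, 6.0), (8, 0.90, 5.8), (140, 0.30, 4.7), (150, 0.92, 4.2)]
curve = power_curve(efforts)
```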

Power output by rider type

Leveraging the rider cluster types introduced here and further discussed here we can draw similar curves for six basic rider types. You can see three clear groups: climbers, mountain helpers/puncheurs, and sprinters/sprint train/domestiques. These show obvious trends where non-climbers are declining in power in the last third of the race and climbers are increasing in power over that time.

Pre-defined rider clusters show obvious trends in power output across the race

Implications

Most interesting is how this model can be applied to identify the toughest obstacle for a rider in a race. A lot of cycling commentary focuses on whether certain riders can overcome a tough climb with the final group, often because that rider has a faster sprint and will beat those riders if they can stick around. These findings can be applied to identify where that most critical obstacle to overcome is located within a race.

To do so, we have to establish a baseline level of power output. This should not be the average level or the level maintained by the peloton at the start of the race, but instead some lower level that is theoretically the floor for a rider in the World Tour/Pro Conti peloton. I would propose using that roughly 4.00 watts/kg level that riders outside the top 100 finishers maintain in closing stages of a race.

We can then scale power outputs relative to that baseline 4.00 watts/kg level. To stick with the final group we can look at what is required for top 10 or top 25 finishers on final climbs. On average, this is in excess of 5.50 watts/kg up to 6.00 watts/kg. This is about 1.5-2.0 watts/kg higher than our baseline. We can also look at what is maintained by riders in the first half to two thirds of the race. That 4.70-4.80 watts/kg level is about 0.7-0.8 watts/kg higher than our baseline. To bring it all together, the curve below approximates the power required to get over the average climb with the lead group based on that climb’s position within the race – all scaled relative to a final climb.

We can see the first 60% of the race or so requires about 35-40% of the power over baseline compared to the final climb. From that point it increases steadily. The first 25% is likely slightly higher as the breakaway is established here in many races.
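The scaling behind that 35-40% figure is simple enough to write down. In this sketch the 4.00 watts/kg baseline and the 4.70-4.80 early-race level come from the text; using 6.00 watts/kg as the final-climb reference is my assumption (the text gives a 5.50-6.00 range), and the function name is hypothetical.

```python
BASELINE_WKG = 4.0      # floor proposed in the text
FINAL_CLIMB_WKG = 6.0   # assumption: upper end of the 5.50-6.00 range


def share_of_final_effort(wkg, final_wkg=FINAL_CLIMB_WKG, baseline=BASELINE_WKG):
    """Power above baseline, as a share of what a final climb demands."""
    return (wkg - baseline) / (final_wkg - baseline)


early_low = share_of_final_effort(4.7)   # about 0.35
early_high = share_of_final_effort(4.8)  # about 0.40
```

Taking 5.50 watts/kg as the reference instead would raise the early-race share to roughly 47-53%, so the choice of final-climb level matters.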

An example from 2019 Il Lombardia is shown below with climb difficulty estimates for each climb. These climbing difficulties are based on gradient, distance, and elevation of each climb so that a typical category 1 climb in a grand tour is around 10.0 and a typical category 2 climb in a grand tour is around a 5.0.

Climbing difficulty for climbs in 2019 Il Lombardia (adjusted for position in race in red)

Lombardia had six climbs in 2019, two of the 1st category difficulty, three of 2nd category difficulty, and one of the third category difficulty. However, because of how the easiest climb (Battaglia) was placed in the race, it was likely to require more power to overcome than the three 2nd category climbs in the first 180 km. The Sormano climb is the objectively tougher climb without considering position compared to Civiglio or others, but Civiglio’s later position in the race makes it a much more equal comparison.

Power Output by Rider Types

In my last post I introduced simple rider clusters based on a handful of features calculated for each rider from race result data. These clusters divided riders into six groups – sprinters, sprint train, puncheurs, domestiques, climbers, and mountain helpers. Three of these groups are leaders who are more likely to be going for race wins and three are helpers who are assisting the leaders. One follow-up that became possible was to analyze power outputs based on these rider types.

By Time Duration

Leveraging over seven thousand power files, I can link rider types to power outputs over different time durations. I chose to look at 10, 30, 60, 120, 300, 600, 1200, and 2400 seconds which covers the full spread of efforts from sprints to longer efforts like the final of a one day classic or high mountain climb. For each power file I extracted the best power output from these time durations, calculated watts per kg using weights from procyclingstats.com, and adjusted those relative to average for all riders. I had data from 169 riders with at least 10 power files from 2019 and 2020.
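For readers who have not worked with power files, the "best power output" for a duration is the highest average over any window of that length. The sketch below assumes one power sample per second and uses an invented effort and a hypothetical 75 kg rider; it is an illustration of the idea, not the extraction code used here.

```python
def best_mean_power(watts, window_s):
    """Highest average power over any window_s consecutive 1 Hz samples."""
    if window_s > len(watts):
        return None  # the file is shorter than the requested duration
    running = sum(watts[:window_s])
    best = running
    for i in range(window_s, len(watts)):
        # Slide the window forward one second at a time.
        running += watts[i] - watts[i - window_s]
        best = max(best, running)
    return best / window_s


# Invented file: steady riding, a 10 second sprint, then a harder tempo.
watts = [200] * 60 + [900] * 10 + [250] * 60
weight_kg = 75.0  # hypothetical rider weight

best_10s = best_mean_power(watts, 10)
best_30s = best_mean_power(watts, 30)
best_10s_wkg = best_10s / weight_kg
```

Dividing each rider's value by the average across all riders at the same duration then gives the relative figures plotted in the curves.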

An example from four riders with 2019-20 data

Example curves for four riders are shown above. Smith packs a better sprint than the other three riders, but tails off on efforts longer than two minutes. Kamna excels on 10 minute plus efforts. De Gendt is second best at pretty much all points. Declercq is well off the highest outputs between 1-5 minutes, but is close on longer efforts.

29% of the data came from domestiques, 29% from mountain helpers, 16% from sprint trains (so 74% from helpers), 12% from puncheurs, 8% from sprinters, and 6% from climbers (so 26% from leaders).

I looked at both the 80th percentile of power output (so the better performances for a rider) and the median. As you would expect, the 80th percentile data produced a wider spread in power outputs relative to average.

Power outputs by time duration for rider clusters

Sprinters produce over 10% more power than average riders in 10 second efforts – while puncheurs and sprint train riders were both above average here.

Puncheurs were consistently above average at all time periods, while domestiques were consistently below average.

Climbers peaked with about 7% more power than average in 20+ minute efforts, while mountain helpers were about 4% higher than average. At 10 second efforts, climbers were about 13 percentage points behind sprinters, while at 1200 seconds climbers were about 13 percentage points ahead of sprinters.

None of this is earth-shattering information; if anything, this shows the validity of rider clusters based simply on result data because we’re seeing the expected power outputs. Classifying riders as members of sprint train or mountain helpers is a valid distinction; they are producing different power outputs over different time durations.

By Stage Types

We can also break down overall power output in a race based on the type of stage it is. I’ve simply broken down the races into three types: those ending in a bunch sprint (20+ rider group), non-bunch sprints on hilly parcours, and non-bunch sprints on mountainous parcours. The dividing line between hilly and mountainous is roughly Fleche Wallonne or Giro dell’Emilia.

The metric here is relative weighted average power – so power output relative to a rider’s own average across all races. In this case, 120% is basically max effort – the efforts required of a winning breakaway rider or top 5 in mountain stage – and 80% is a low effort day like a flat bunch sprint finish in a grand tour. For example, the three big breakaway days for Neilson Powless in the 2020 Tour de France were 114%, 116%, and 114% efforts, while he did 82%, 89%, and 78% on three flatter days in the bunch.

Climbers have the widest gap between efforts in mountain races and bunch sprint races

Of course, flatter stages require lower power outputs in general for all riders as discussed in a previous post. But, we can identify some significant differences between the clusters. Climbers are clearly different from other clusters in their mountain/bunch sprint outputs, while mountain helpers are clearly different from sprinters/domestiques/sprint trains.

Climber types have the widest gap between performance by parcours. In bunch sprint races they produce ~92% of their average weighted average power. In mountainous races, they are over 105% of their average weighted average power. Domestiques have the narrowest gap between bunch sprint days and mountain days.

Simple Rider Clustering

Cycling is fundamentally a team sport, and like all team sports it has roles/positions which riders fill in each race. Unlike most team sports however, those roles/positions are not explicitly stated prior to the race by teams. Confusing things further, cycling teams compete at different strength races regularly. A rider who is a helper at a World Tour level race could easily be the protected leader in a lower level 1.1 race. The challenge to successfully define which position/role each rider fulfills on their team can be collapsed into answering two questions: 1) which parcours fits a rider (sprint finishes, hills, mountains) and 2) are they typically the leader or a helper (do they finish as the top rider in their team often or rarely?).

Cluster analysis is regularly used in other team sports to define roles – even in sports with more defined positions. This paper from the Sloan Sports Analytics Conference from 2012 discusses clustering based on roles in the context of the NBA. This talk from Opta Pro Forum in 2015 discusses clustering based on player types in the context of football. There have been many more advanced and refined attempts at clustering in both (and other) sports since. Clustering is most easily done either with the K-means method or with hierarchical clustering. Both operate by feeding certain features for each row of data into the algorithm. For K-means, you have to pre-define the number of clusters you’re looking for (this can be optimized so it’s not necessarily arbitrary), but for hierarchical a tree is built which steadily divides the data into smaller and smaller clusters.

Clustering in Pro Cycling

K-Means is the method I’ll use here. The key to using K-means (and any clustering method) is defining the best features for your data so that there are obvious ways for the algorithm to divide the data. For this, I’ve defined season long average values for 2017-2020 for four statistics:

  1. % of points earned in bunch sprint finishes (of all points earned), where points decay from the most for 1st place to the least at a cut-off between 15th and 50th place, depending on the strength of the peloton
  2. Overall points per race-day – with the same definition of points
  3. % of race-days finishing as #1 rider on your team (must also finish in top 20 in the race)
  4. Difficulty of the parcours weighted by points earned – where tougher mountain stages are high difficulty and flat stages are low difficulty

These four features define 1) whether a rider earns points in sprint finishes, 2) whether they are finishing high in races, 3) whether they are leading the team, and 4) whether they fit best on flatter, hillier, or mountainous races. We can generate other features like how often a rider is in the breakaway, their performance in time trials, whether they’re successful in tough conditions, or how strong the races they participate in are, but this gives a good start and has strong data availability going back 3+ years.

Performing the Clustering

K-Means can be optimized using several methods (elbow, silhouette, etc) to find the correct number of clusters. Sometimes the number will be obvious and sometimes a small range is appropriate. For this data, between 4 and 7 clusters was the best fit. After fitting the model, six produced the most explainable clusters.
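For anyone unfamiliar with the method, here is a bare-bones K-means sketch in plain Python. It is not the implementation used for these clusters: the deterministic farthest-point start is a simplification of my own, the six riders are invented, and the four columns only mimic the feature list above (sprint share of points, points per race-day, leader share, parcours difficulty). Features are standardized first so no single column dominates the distance.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Scale every feature column to mean 0 and standard deviation 1."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stdevs = [pstdev(col) or 1.0 for col in columns]  # guard constant columns
    return [[(x - m) / s for x, m, s in zip(row, means, stdevs)] for row in rows]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def initial_centroids(rows, k):
    """Deterministic farthest-point start (a simplification, not k-means++)."""
    centroids = [list(rows[0])]
    while len(centroids) < k:
        farthest = max(rows, key=lambda r: min(sq_dist(r, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(rows, k, max_iter=100):
    """Lloyd's algorithm: assign to nearest centroid, then recompute centroids."""
    centroids = initial_centroids(rows, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(row, centroids[c])) for row in rows]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [row for row, label in zip(rows, labels) if label == c]
            if members:
                centroids[c] = [mean(col) for col in zip(*members)]
    return labels, centroids


# Invented riders: three sprinter-like rows, then three domestique-like rows.
riders = [
    [0.90, 12.0, 0.60, 1.5], [0.85, 11.0, 0.55, 1.8], [0.95, 13.0, 0.65, 1.2],
    [0.10, 1.0, 0.05, 4.0], [0.05, 2.0, 0.02, 4.5], [0.15, 1.5, 0.08, 3.5],
]
labels, centroids = kmeans(standardize(riders), k=2)
```

In practice a library implementation with several random restarts is the safer choice, and the elbow or silhouette checks mentioned above would be run across a range of k values.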

The six clusters produced can be broadly defined as three leader clusters and three helper clusters with the three levels corresponding to mountainous or flatter parcours.

  1. Sprinters – the easiest cluster to define; these riders are most successful in bunch sprints in flatter races and are often the leader
  2. Climbers – these riders get few points in bunch sprints; rather they earn points in mountainous finishes and are often the leader of the team
  3. Puncheurs – these riders are best on hillier parcours and can win from the bunch or in smaller groups
  4. Climbing helper – these riders earn fewer points and are leaders less often, but are more often successful in mountainous/hilly stages
  5. Sprint train – these riders earn points often in bunch sprint finishes, but are rarely leaders
  6. Domestiques – this is the catch-all group for riders who aren’t successful in mountain/hilly stages, nor do they earn bunch sprint points often; these can be road captains or super-strong men like Tim Declercq whose work is done before the pointy end of the race.

Cluster            % of Riders    Example (2019)
Sprinter           9%             Caleb Ewan
Climber            8%             Egan Bernal
Puncheur           11%            Alberto Bettiol
Climbing helper    19%            Marc Soler
Sprint train       20%            Max Richeze
Domestique         34%            Luke Rowe
Distribution of clusters in World Tour / Pro Conti riders

So about 28% of riders fit into one of the three leader clusters, another 39% in the two specialized helper clusters, and 34% in the more generic domestique cluster. Said more clearly, in an eight man grand tour team you’ll normally have two protected riders, three specialized helpers, and three less specialized domestiques.
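The eight-rider split follows directly from those shares. A quick check, using only the percentages quoted above (which sum to 101% because of rounding):

```python
shares = {"leaders": 0.28, "specialized helpers": 0.39, "domestiques": 0.34}
team_size = 8

# Expected riders per role in an eight-rider team, before and after rounding.
expected = {role: share * team_size for role, share in shares.items()}
rounded = {role: round(count) for role, count in expected.items()}

for role in shares:
    print(f"{role}: {expected[role]:.2f} -> {rounded[role]}")
```

The expected counts of about 2.2, 3.1, and 2.7 round to two, three, and three riders.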

Visualizing Clusters

This visual lays out how this looks at the team level with colors denoting clusters, % of races as leader on x-axis, and parcours fit on y-axis. Below is Bora Hansgrohe – one of the most successful teams in the World Tour in 2019.

Bora Hansgrohe team plot in 2019

They had three primary sprinters in 2019 who are clustered on the lower right and two climbers in the upper right. They have a number of puncheurs of whom Max Schachmann is the prime example. The clustering isn’t perfect here; Formolo is more of a climbing helper and Postlberger is more involved in the sprint train, but because of the mixed roles they get classified here. Muhlberger is certainly a climbing helper though. In the bottom left are numerous support riders of which Schwarzmann, Archbold, Burghardt, and Selig are seen as sprint train and most of the rest are domestiques. You can argue Bodnar and Oss are more likely sprint train than not (and Oss is clustered with sprint train for 2017, 2018, and 2020).

In general though, these plots give a strong overview of which roles riders are fulfilling in a team for a given season.

A generic plot of where all riders fell in 2019 is below.

Plot of six clusters for 2019 World Tour and Pro-Conti riders

Applications

This clustering has numerous applications like:

  1. does having more sprint train domestiques predict more success for sprinters / same for climbers and their helpers?
  2. how does power output differ across clusters on different stage types?
  3. which types of riders are most successful on different parcours?
  4. which teams are most and least balanced (high or low percentage of riders as leader clusters vs helper clusters)?

The Impact of Temperature on Relative Power Output

In my recent posts I’ve introduced the concept of relative power output where a rider’s weighted average power in a particular race is compared to the average of all of their races to create a rider specific relative measure. Given sufficient sample size both of races for individual riders and riders in the data-set we can show what factors contribute to higher or lower power output on stages. So far, it looks like races with a lot of climbing, time trials, shorter races in general, high finishing position on the stage, and being in the breakaway lead to higher relative power output.

Another significant factor is the temperature the race is ridden at. I have temperature data for >95% of race-days in my data-set. The average temperature is about 20.5 C and 11% of race-days have an average temperature over 30 C.

To find the impact of temperature, we can leverage the relative power output model built in a recent post. That model considers factors like the length and climbing difficulty of a stage, as well as the finishing position of the riders. It produces predictions, and we can train the temperature model on the residuals between those predictions and the actual power output on the stage. For example, stage 5 of the UAE Tour in 2019 is predicted to have a relative power output of 93% of a rider’s average weighted average power (eg, 256 watts if their average weighted average power is 275 watts). We can train the temperature impact model on the residual between that prediction (93%) and the actual (73%).

The ideal temperature is about 13 degrees Celsius (57 degrees Fahrenheit); this is where relative power output has peaked for the pro peloton. Higher temperatures have shown extreme impacts on the relative power output with a race at 30 C coming in about 3% lower than average and the hottest days like 2020 Strade Bianche impacting relative power output by -10%!

Incorporating temperature into the model shows that for every 1 degree Celsius away from 13 C relative power output drops by 0.4 percentage points, such that the 2020 Strade Bianche race would be expected to have 10.6% lower relative power output than the average race. Adding temperature also increases the R^2 of the model from 0.25 to 0.29; it also improves the model fit out-of-sample, with R^2 increasing and SE dropping from 0.10 to 0.09.
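Read literally, that coefficient describes a V-shaped penalty around 13 C. The sketch below encodes that reading; treating the effect as symmetric on the cold and hot sides is my assumption, the function names are hypothetical, and the temperature it backs out for a 10.6 point drop is an implied value, not a measured one.

```python
IDEAL_TEMP_C = 13.0
POINTS_PER_DEGREE = 0.4  # percentage points of relative power per degree C


def temperature_effect(temp_c):
    """Change in relative power output (percentage points) vs the 13 C ideal."""
    return -POINTS_PER_DEGREE * abs(temp_c - IDEAL_TEMP_C)


def implied_hot_temperature(effect_points):
    """Temperature above the ideal that would produce a given drop."""
    return IDEAL_TEMP_C + abs(effect_points) / POINTS_PER_DEGREE


effect_at_30 = temperature_effect(30.0)             # about -6.8 points
effect_at_average = temperature_effect(20.5)        # about -3.0 points
implied_strade = implied_hot_temperature(-10.6)     # about 39.5 C
```

Under this reading a 30 C race sits 6.8 points below a race at the 13 C ideal, and 3.8 points below a race at the 20.5 C data-set average.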

Power Output in Breakaways

In two recent posts covering relative power output in the 2020 Tour de France and showing the impact of stage characteristics on power output, I made the claim that power output is higher for riders in the breakaway than otherwise. This is likely not a controversial statement for anyone, but in this post I’ll show that breakaway riders are required to produce more power than normal on the days they ride ahead of the peloton.

My data-set comes from Pro Cycling Stats which have collected kilometers before the peloton for all World Tour races in 2020. There are 54 race days in this data-set where at least one rider rode ahead of the peloton. I linked this data to my stage level power output data. For power output, I’m using the riders’ relative power output compared to their average power output on all stages. For example, James Knox had a normalized power of 258 watts in stage 5 of 2020 UAE Tour which was 101% of his average normalized power (255 watts) in all stages.
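The relative measure is just a ratio of one stage to the rider's own average. A minimal sketch using the James Knox figures quoted above (the function name is hypothetical):

```python
def relative_power(stage_np_watts, average_np_watts):
    """Stage normalized power as a share of the rider's own average."""
    return stage_np_watts / average_np_watts


knox_stage5 = relative_power(258, 255)
print(f"{knox_stage5:.0%}")  # about 101%
```

Because every rider is compared with themselves, a 101% day means the same thing for a climber and a sprinter even though their raw watts differ.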

In total, I have 1417 rider race-days with any breakaway out-front where I have power output (across 54 unique races). About 10% of those race-days have >0% of the stage in the breakaway.

On average, the riders in the breakaway have done 106% of their average normalized power while spending an average of 35% of the stage ahead of the peloton. The riders not in the breakaway have done 98.5% of their average normalized power.

Impact of percentage of stage in breakaway on relative power output

The graph above shows how much higher relative power output is depending on percentage of stage spent in the breakaway. Riders who spend a higher percentage of the stage in the break have higher relative power output. This relationship holds within stages, as percentage of time in the breakaway is positively associated with power output in 49 of 54 stages measured, with a median coefficient of 0.14. That means a rider who spends the entire stage in the breakaway vs one who spends none of the stage in the breakaway will output 14 percentage points more of their average normalized power.
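A within-stage coefficient like this can be estimated with an ordinary least squares slope per stage, then summarized with the median. The sketch below assumes that is roughly what was done; the two stages and their rider values are invented so that the slopes are known in advance.

```python
from statistics import mean, median


def ols_slope(x, y):
    """Least squares slope of y on x for a single stage."""
    mean_x, mean_y = mean(x), mean(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx


# Share of the stage spent in the breakaway for five riders (0 = none, 1 = all).
breakaway_share = [0.0, 0.0, 0.35, 0.8, 1.0]

# Invented relative power outputs built from known slopes of 0.14 and 0.10.
stage_a = [0.98 + 0.14 * s for s in breakaway_share]
stage_b = [0.95 + 0.10 * s for s in breakaway_share]

slopes = [ols_slope(breakaway_share, stage) for stage in (stage_a, stage_b)]
median_slope = median(slopes)
```

A slope of 0.14 means moving from no time in the break to the full stage in the break adds 14 percentage points of relative power.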

Relative Power Output by Stage Characteristics

In my last post I dug into the Tour de France power data shared by @Velofacts, specifically adding to his analysis by breaking-down the relative power output of each rider compared to themselves. In other words, instead of judging the power output of Thomas de Gendt relative to other riders who have different skill-sets, judge relative to his own level. Some of the key findings were: 1) the flat sprint stages saw significantly lower relative power output than the big mountain days, 2) tough days where the peloton pushed hard like stage 7 saw comparable power output to the mountain days, and 3) riders saw peak power output when they were in the break.

This post will take that analysis further and determine what stage characteristics lead to high or low relative power outputs across pro races. To do that, I’ve collected nearly 10,000 individual stages linked to specific races for pro riders across the World and continental tours for 2019 and 2020. This data-set includes 292 unique riders with 98% of the data coming from riders with at least 10 races (the minimum to include in modelling below). The average normalized power for this sample was 278 watts (4.09 watts/kg) with the 10th to 90th percentile represented by 232 to 321 watts (3.49 to 4.67 watts/kg).

Model Creation

I separated 2019 from 2020 with 2019 acting as the training set and 2020 as the test set so we’ll be able to judge how predictive the model is on data it has not seen. I built two models: a simple linear regression with easy to interpret effects and then a gradient boosted tree model (xgboost) which should theoretically have better performance with worse interpretability.

I linked the race level power files to my existing data-set of stage results which include variables like whether a race was a time trial, one day race, and/or grand tour, what the climbing difficulty of the stage was, whether the stage ended with an uphill finish, what class of race it was (World Tour or lower levels), but also finishing position data from riders.

To build the linear model, I included:

one_day_race, time_trial, length of stage (km), natural log of finish position, climb_difficulty of stage, and rider_DNF (did not finish race). I also included an interaction between log finish position and climb difficulty with the idea that there is probably a larger difference in power output by finish position on tougher stages.

The model was built to predict the relative power output on the stage calculated in the form of: eg, 300 watts on stage / 285 watts on average = 1.053 relative power output.

The linear model achieved in-sample R^2 of 0.25 with a standard error of 0.10. Obviously predicting power output is a high variance task. Five of the seven variables were judged significant at p < 0.01 level (rider DNF was not significant and the finish position/climb difficulty interaction was significant at p <0.05 level).

The coefficients were:

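| Variable                                          | Coefficient | SE     |
|---------------------------------------------------|-------------|--------|
| Intercept                                         | 1.096       | 0.01   |
| Natural log of finish position (1)                | -0.014      | 0.002  |
| climb_difficulty (2)                              | 0.007       | 0.001  |
| time_trial                                        | 0.138       | 0.009  |
| one_day_race                                      | 0.055       | 0.003  |
| length in km                                      | -0.0005     | <0.001 |
| Natural log of finish position * climb_difficulty | -0.0005     | <0.001 |
| Rider DNF                                         | -0.009      | 0.01   |

R^2 = 0.25, SE = 0.10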

(1): Actually the natural log of (rank + 1). Since LN(1) = 0, using the raw rank would zero out the interaction term for stage winners.

(2): Climbing difficulty is judged on a scale starting at 0 where the toughest mountain stages are around 30. Classic races like Flanders and Strade Bianche come in at 4-5, hillier races like Liege-Bastogne-Liege and Fleche Wallonne at 8-10, grand tour mountain stages typically start at 12 and up.

Practical Impacts

A rider finishing 1st on a mountain stage (climb difficulty = 15) will be estimated to have 9.4% higher power output than the same rider finishing 150th on that mountain stage. On a flat stage, the 1st place rider will have about 6.3% higher power output than the same rider finishing 150th.
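The comparison above can be roughly reproduced from the rounded coefficients in the table. The sketch below is not the fitted model itself: it assumes a flat stage has a climb difficulty of 0 and uses an arbitrary 180 km stage length (which cancels out of the gap), and `predict_relative_power` is a name chosen for illustration.

```python
import math

# Rounded coefficients copied from the coefficient table in this post.
COEF = {
    "intercept": 1.096,
    "log_finish": -0.014,
    "climb_difficulty": 0.007,
    "time_trial": 0.138,
    "one_day_race": 0.055,
    "length_km": -0.0005,
    "log_finish_x_climb": -0.0005,
    "rider_dnf": -0.009,
}


def predict_relative_power(finish_position, climb_difficulty, length_km,
                           time_trial=False, one_day_race=False,
                           rider_dnf=False):
    """Predicted relative power from the rounded linear-model coefficients."""
    log_finish = math.log(finish_position + 1)  # footnote (1): ln(rank + 1)
    return (
        COEF["intercept"]
        + COEF["log_finish"] * log_finish
        + COEF["climb_difficulty"] * climb_difficulty
        + COEF["time_trial"] * time_trial
        + COEF["one_day_race"] * one_day_race
        + COEF["length_km"] * length_km
        + COEF["log_finish_x_climb"] * log_finish * climb_difficulty
        + COEF["rider_dnf"] * rider_dnf
    )


def first_vs_150th_gap(climb_difficulty, length_km=180):
    """Difference in predicted relative power between 1st and 150th place."""
    first = predict_relative_power(1, climb_difficulty, length_km)
    last = predict_relative_power(150, climb_difficulty, length_km)
    return first - last


mountain_gap = first_vs_150th_gap(15)  # climb difficulty = 15
flat_gap = first_vs_150th_gap(0)       # assumed flat stage, difficulty = 0
print(f"mountain: {mountain_gap:.3f}, flat: {flat_gap:.3f}")
# prints mountain: 0.093, flat: 0.061
```

The rounded coefficients give gaps of about 0.093 and 0.061. Those are close to the 9.4% and 6.3% quoted above; the small differences presumably come from rounding in the table and from my assumption about what counts as a flat stage.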

One day races are raced with 5.5% higher relative power – which matches the findings of van Erp and Sanders that one day races are ridden at a higher intensity than stage races.

Time trials obviously have much higher normalized power as they are much shorter races. In this case, 14% higher relative power. Related, stage length plays a small role with shorter stages = greater power output. As time trials are shorter, much of this impact comes from time trials, but shorter stages like Stage 20 of the 2019 Tour de France have much higher normalized power than longer stages.

Testing

Testing this model on the 2020 data shows similar out-of-sample fit – R^2 of 0.23 and a standard error of 0.10. Again, predicting relative power at the stage level is a high variance endeavor!

The highest predicted power output in the test set (>2750 races in 2020) was Thomas De Gendt’s stage 20 time trial in the 2020 Tour de France which was predicted at 121.9% of his average normalized power. The La Planche des Belles Filles time trial had almost all of the elements of a high power output stage: short, a time trial, with a lot of climbing. De Gendt finished 20th so our predicted power for the higher finishing riders would have been even higher. De Gendt actually recorded 135% of his average normalized power!

The highest road race prediction was Pierre Latour at Mont Ventoux Challenge – a one day race with two significant climbs where Latour finished 4th. The prediction was 118.5% of Latour’s average normalized power, but he only produced 105%. The residuals for ten riders with power files from that race showed it as the 3rd largest negative difference between predicted and actual power – indicating the race required much less power than predicted by the model.

The flip-side of that was Stage 7 of the Tour de France where Bora attempted to make the race extremely difficult to shed Peter Sagan’s sprint rivals. Later the stage exploded in the crosswinds. Overall, it ranked as the 6th largest positive difference between predicted and actual power. Parcours and race type play a significant role in power output in a race, but how the race is ridden is a huge factor.
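One way to rank races as over- or under-estimated is to average the residual (actual minus predicted relative power) across the riders with power files in each race. The sketch below assumes that approach. The first two rows use figures quoted in this post, and the remaining rows are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# (race, rider, predicted, actual) relative power.
rows = [
    ("TdF stage 20 TT", "De Gendt", 1.219, 1.35),
    ("Mont Ventoux Challenge", "Latour", 1.185, 1.05),
    ("Mont Ventoux Challenge", "hypothetical rider A", 1.10, 1.00),
    ("hypothetical flat stage", "hypothetical rider B", 0.95, 0.97),
]


def mean_residual_by_race(rows):
    """Average (actual - predicted) per race; negative means over-estimated."""
    residuals = defaultdict(list)
    for race, _rider, predicted, actual in rows:
        residuals[race].append(actual - predicted)
    return {race: mean(values) for race, values in residuals.items()}


by_race = mean_residual_by_race(rows)
# Most over-estimated race first, most under-estimated race last.
ranked = sorted(by_race.items(), key=lambda item: item[1])
for race, residual in ranked:
    print(f"{race}: {residual:+.3f}")
```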

Gradient Boosted Model

Gradient boosted models build hundreds or thousands of small decision trees in sequence, with each new tree fitted to the errors left by the trees before it, and combine them to derive predictions. In this case, I used the same training and testing data and the same variables with the xgboost package in R.

Optimizing for root mean square error gave me an error of 0.088 on the training data and 0.099 on the testing data. Out of sample, the R^2 was 0.24 – not much improved on the linear model. Based on that lack of significant improvement from the boosted model, it makes sense to rely on the more easily interpreted linear model.
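For reference, the two fit metrics used to compare the models can be computed as below. This is a sketch using the common definitions (RMSE, and R^2 as one minus the ratio of residual to total sum of squares); I am assuming those definitions rather than reproducing the R output, and the numbers are made up.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between actual and predicted values."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(actual, predicted):
    """1 - SS_res / SS_tot, with SS_tot taken around the mean of `actual`."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical held-out relative power values and model predictions.
actual = [1.05, 0.95, 1.20, 0.90, 1.00]
predicted = [1.02, 0.98, 1.10, 0.96, 1.01]
print(round(rmse(actual, predicted), 3), round(r_squared(actual, predicted), 2))
```

Computed out of sample this way, R^2 can be negative if the model does worse than simply predicting the test-set mean.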

Predictions

To end, here are the top 10 over-estimated and under-estimated stages by the model for 2020. As mentioned, the Mont Ventoux Challenge was one of the most over-estimated in power output alongside four of the flatter Tour de France stages, the pan flat Milano-Torino race, a flat stage from Tirreno-Adriatico, a Tour of Portugal time trial, and – surprisingly – Milano-Sanremo.

On the under-estimated side, there’s a handful of minor French and Spanish stage races from February along with the World Championship road race, a Binckbank Tour stage where Mathieu van der Poel won from 60km out, and the aforementioned stage 7 of the Tour de France.

Intensity in Tour de France 2020

Twitter user Velofacts does great work compiling and sharing power data from pro riders on Strava. He’s collected stage level normalized power data from about 60 Tour de France riders and what looks to be nearly 1000 different stages. He’s analyzed the data by calculating watts/kg which shows FDJ domestique Sebastien Reichenbach as the rider who has generated the most watts/kg at just under 4.0 average over the race. He has also calculated stage level averages which shows the peak power output stage was stage 9 (306 watts) and that mountain stages were generally raced at 275-300 watts (with Bora’s demolition of the peloton in stage 7 and ensuing splits in crosswinds coming in as the 5th toughest stage).

Related to my recent post on measuring intensity using rating of perceived exertion and power data, we can utilize this data to calculate a rider’s relative power output across the race. van Erp and Sanders have found power output in grand tours is not on average any higher than even lower level one day races; however, this can be explained by riders pacing themselves throughout, mixing big efforts with a similar number of lesser efforts. Is this obvious in the data?

Relative power outputs in 2020 Tour de France

Of Velofacts’s 60-some riders I’ve chosen 33 of the most interesting riders who made an impact on the Tour and calculated their relative power output (each stage divided by their race average). I’ve also classified them roughly into three groups to show what role they played in the race: guys like Dries Devenyns, Roger Kluge, and Tim Declercq were ‘Workers’, Sepp Kuss, Harold Tejada, and Reichenbach were ‘Climbers’, and Simon Geschke, Quentin Pacher, and Carlos Verona were ‘Breakaway’.

Relative power output averages by rider type in 2020 Tour de France (data from @Velofacts)

You can see the whole peloton got a break on stages 3, 5, 11, and 21 with all of the groups having much lower average power outputs. Climbers posted their peak relative power output in Stage 9 at 16% higher than their Tour average. Breakaway men peaked in Stages 9 and 16. Workers peaked in relative power on the difficult mountain days between stages 16-18, but also had to generate equal effort in Stage 7. Interestingly, the workers had less variable power outputs overall with a standard deviation of 7% vs 9-10% for the other two groups of riders.
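As a sketch of the calculation behind these figures, the snippet below divides each stage by the rider's race average and then takes the standard deviation of the resulting ratios. The wattages are made up, and the equal weighting of stages in the race average is my assumption.

```python
from statistics import mean, pstdev


def relative_by_stage(stage_watts):
    """Each stage's normalized power divided by the rider's race average."""
    race_average = mean(list(stage_watts.values()))
    return {stage: watts / race_average for stage, watts in stage_watts.items()}


# Made-up normalized power (watts) by stage number for one hypothetical rider.
hypothetical_rider = {3: 230, 7: 290, 9: 310, 11: 235, 16: 300, 21: 225}

relative = relative_by_stage(hypothetical_rider)
variability = pstdev(list(relative.values()))

for stage, ratio in relative.items():
    print(f"stage {stage}: {ratio - 1:+.0%} vs race average")
print(f"standard deviation: {variability:.1%}")
```

A lower standard deviation corresponds to the steadier stage-to-stage profile seen for the workers above.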

Selected riders with relative power output by stage (data from @Velofacts)

Some interesting items above:

  1. The days riders were in breakaways are obvious – especially the ones in the first half of the race on easier days. Pacher’s stage 4 breakaway was 16% higher than his average and Ladagnous’s in stage 11 was equal to his race average, but about 15% higher than the average for other riders!
  2. Similarly, we have three of the top 5 on Stage 16 in this data-set – stage winner Kamna was +19%, Geschke +21%, and Reichenbach +19%. Those are three of the top 6 relative efforts in the entire race.
  3. The upper range in terms of efforts looks like about +20%. Harold Tejada and Sepp Kuss both hit those figures in Stage 9, while Geschke and Pacher did in Stage 16. The standard deviation across all 619 stages analyzed by me is about 10%.
  4. The three riders who varied the most between stages were Neilson Powless, Simon Geschke, and Sepp Kuss. Powless and Geschke were both involved in several large breakaways, racking up the 3rd and 10th most kilometers in breakaways according to Pro Cycling Stats. Kuss was Roglic’s top climbing domestique and as such had four efforts of 10% or higher than his average as well as three of around 20% below his average.
  5. Tim Declercq’s monster effort at the front of the peloton on Stage 10 is obvious. That was Declercq’s peak effort in the race. He spent about 67% of the stage at the front of the peloton – by far the highest total of the day.

Rating of Perceived Exertion (RPE)

Continuing my exploration of the recent pro cycling analytics papers, today I’m going to dig into three related papers on measuring intensity to monitor fatigue. The goal is to apply these findings to build an intensity metric that can be applied globally to see which riders have experienced higher or lower intensities at a given point in the season.

I will examine:

The datasets here are nine riders from a single cycling team within the 2016 Giro (paper C), twelve riders from a single cycling team within the 2016 Giro and/or 2016 Vuelta (paper A), and twenty riders (presumably from the same team) in a range of World Tour and lower (HC, Level 1) level races (paper B). Paper A also included a baseline training data-set from two weeks prior to each race. The authors gathered power output, heart rate, and rating of perceived exertion (RPE) data from each race and calculated intensity metrics.

Rating of perceived exertion (RPE) is of particular interest as it provides a data point which is not publicly available in the way that power data (for example from Strava) or riding speed is for many pros. For those unfamiliar, RPE is simply the athlete’s assessment of the difficulty of their workout on a scale of 1-10 where 10 is the most difficult.

The RPE was obtained 30 min after the exercise bout based on the question: “How hard was your workout?”

pg 2 Sanders and Heijboer (2017)

Intensity in Grand Tours

Paper A analyzes the intensity metrics in four groups: a baseline two weeks prior to a grand tour and then week 1, week 2, and week 3 of grand tours. They find the intensity as measured by RPE increases from 3.5 in baseline training to 6.0 in week 1, 7.0 in week 2, and 7.4 in week 3 – where week 3 is significantly different from week 1 (and all three weeks from the baseline). Power output – both mean watts and normalized power in watts – differed significantly in weeks 2 and 3 from week 1.

This matches what we typically see in grand tours where the first week is easier than subsequent weeks. Eg, of eight stages in the 2016 Giro classified as mountain stages by ProCyclingStats only one was in week 1 (stages 1-7), while three were in week 2 (stages 8-14) and four in week 3 (stages 15-21). This was similar to the 2016 Vuelta where all seven mountain stages came in the final two weeks.

Paper C digs into the differences between different stage types using the same type of data-set, just from the 2016 Giro. They divide stages into four types: flat, semi-mountainous, mountainous, and time trials which seem – based on sample size – to largely correspond to the aforementioned PCS categorization. A mountain stage had to have 35km+ of total climbing and/or a 10km+ finishing climb, while a flat stage could not have more than 13km of climbing and could not end uphill.

RPE by stage types showed flat stages easier at 5.8, semi-mountainous/hilly at 6.5, mountain stages at 7.8, and time trials at 6.8. The gap between mountain and flat stages was significant. Power output also increased significantly between each of the three road stages with mountain > semi-mountainous > flat.

So we have some basic findings:

  • Baseline training leading into a grand tour (and presumably in taper mode) is about a 3.5 on RPE
  • Flat stages – as are typical in the first weeks of grand tours – rate around 6.0 in RPE
  • Hillier stages rate around 6.5
  • Time trials will be rated around 6.5-7.0 – presumably higher for those riding them with intent to compete for podium/in team time trials
  • Mountain stages will rate closer to 8.0

Influence of Category

This is the most interesting paper of the bunch as it leverages the vast array of races a World Tour team will enter throughout the year to attempt to tease out intensity differences by category. Pro cycling is organized with the highest level being the World Tour, made up of the ~35-40 most elite events, with the grand tours and the five one-day monuments at the top of the heap. Below that level are three additional levels of .HC, .1, and .2 races. A World Tour team will typically compete only in the first two levels in a season with maybe a quarter to half the teams in a given race at the .HC level being World Tour teams and a lower percentage being World Tour teams in Level 1 races.

A note, the RPE values in this study are collected on a different 6-20 scale from the 1-10 scale used in the other two papers.

The authors show two sets of results utilizing RPE: one focusing on one-day races comparing monuments (the five most prestigious one day races) with three other levels of one-day races (World Tour, HC, Level 1) and another for grand tours compared to three other levels of stage races (World Tour, HC, Level 1).

They find a RPE of approximately 18 for the monuments, vs 17 for World Tour races and 16 for HC/Level 1 races. The monuments differ significantly from each of the three lower levels and World Tour also differs significantly from Level 1 races. Monuments tend to be much longer races (268 km on average vs 219 km average for World Tour and <200 km average for HC/Level 1 races) which can explain the differences in intensity.

We do not see a similar stratification for stage races. Of stage races, the grand tours actually average the lowest RPE (14.5), and they stand out as significantly lower in terms of max/mean heart rate (power output is not significantly higher or lower). This likely has to do with team strategy, which I can’t explain better than the authors:

When comparing single-day races with multi-day races, it is clear that for all the race categories the single-day races are higher in volume, load and intensity compared to the multi-day races. Race regulations are an important contributor to this. Volume and load are higher competing in single-day races because race regulations allow longer races within all the single-day race categories compared to the multi-day race categories.

Furthermore, the higher intensities within the single-day races could be caused by differences in race tactics between the single-day and multi-day races. In a single-day race, a cycling team has one goal and that is to finish as high as possible and thus the whole team (race leader and domestiques) will work without any necessity to hold back for other days to come. Within a multi-day race, a team has different goals per stage and this will depend on their overall goal. For example, when a team brings a sprinter as a team leader to a multi-day race, on the flat stages the support riders will likely have to work on the front of the peloton which will result in an increased exercise intensity and load whilst the support riders for a climber will have a higher exercise load on the climbing stages when working for their leader.

Overall race length (i.e. number of stages) can be a cause for the slightly higher intensity measures (absolute and relative PO, IF) in the 2.1 race category compared to higher level multi-day stage races. On average, the lower category races are shorter and some have only two-race days. The more days a multi-day race consists of, the more riders will most likely aim to spread their energy over multiple days (and aim to minimise energy expenditure on days where it’s possible).

pg 11-12 van Erp and Sanders (2019)

Some more findings:

  • One day races see significant stratification between monuments > other World Tour races > other one day races where monuments are about 8% higher RPE than World Tour and 12% higher than HC/Level 1.
  • Stage races overall do not see this stratification – likely because of strategic pacing the riders implement over the length of the race. I presume stages ridden most competitively will be similar to one day races, while those ridden less competitively will be lower than average. Overall, the level for stage races is about 8% below the HC/Level 1 one-day races.

Implementing an intensity measure globally

These papers provide a solid foundation for a global RPE metric. Difficulty of the profile can increase the RPE by about 33% between a typical flat stage and a typical mountain stage in a grand tour. In addition, monuments and World Tour one-day races will be raced with between 4% and 12% higher intensity than HC/Level 1 one-day races. Stage races on average will not be raced differently by level, however certain stages will be ridden with higher intensity than others such that the combined average is approximately 8% below a HC/Level 1 one-day race.
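As a quick arithmetic check on those percentages, the sketch below computes the relative differences from the rounded RPE values quoted earlier in this post. The helper name `pct_higher` is mine.

```python
def pct_higher(value, baseline):
    """Percentage by which `value` exceeds `baseline`."""
    return (value / baseline - 1) * 100


# Rounded findings on the 1-10 scale: typical flat vs typical mountain stage.
flat_rpe, mountain_rpe = 6.0, 8.0
# Approximate one-day race values on the 6-20 scale.
monument, world_tour, lower_level = 18, 17, 16

print(round(pct_higher(mountain_rpe, flat_rpe)))      # 33
print(round(pct_higher(monument, lower_level), 1))    # 12.5
print(round(pct_higher(world_tour, lower_level), 1))  # 6.2
```

The rounded inputs land near, but not exactly on, the 4% to 12% range quoted above, which was presumably computed from the papers' unrounded values.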

There’s a big missing piece here: how do we estimate which stages in a stage race were ridden with higher vs lower intensity?

Different riders will ride differently depending on their team orders/role; eg, a domestique for a team focused on their sprinter like Quick Step will surely have higher RPE on a flat stage where he is responsible for bringing back the breakaway or leading out a sprinter than on a mountain stage where he isn’t protecting a GC leader. If riders could be classified as primarily flat or primarily climbing riders this would be an easier determination to make.

We could also use finishing position and/or presence in a breakaway to estimate higher RPE than normal. Eg, Michal Kwiatkowski’s stage 18 victory in the Tour de France came in a long breakaway over four high mountains where he was in a small group for most of the stage. He certainly had a higher RPE than in stage 17 when he finished 130th in the gruppetto on a similar high mountain stage.

2020 Road World Championships

This idea of recent rider-specific intensity is particularly relevant this week. The Road World Championships are being held with the men’s road race on Sunday – just a week after the Tour de France. While the startlist isn’t completely final, a large majority of the contenders per the betting odds competed at the Tour – meaning 21 days of racing in the last 30 days as of this Sunday. Typically worlds are held two weeks following the Vuelta a Espana (and two months after the Tour) which means the amount of racing in many of the contenders’ legs will be higher than normal.

Top contenders like Jakob Fuglsang, Thomas Pidcock, and Diego Ulissi did not ride the Tour. Fuglsang rode several one-day classics – high RPE events – in August followed by a week-long stage race in mid-September. Pidcock rode the U23 Giro d’Italia at the turn of August into September. Ulissi has ridden two stage races for a combined ten race days’ worth of effort in the past month – most spent riding as one of the leaders. In each case, these riders have roughly half as many race days in their legs as co-favorites Wout van Aert and Julian Alaphilippe.