Power Output by Rider Types

In my last post I introduced simple rider clusters based on a handful of features calculated for each rider from race result data. These clusters divided riders into six groups: sprinters, sprint train, puncheurs, domestiques, climbers, and mountain helpers. Three of these groups are leaders, who are more likely to be going for race wins, and three are helpers, who assist the leaders. One follow-up this makes possible is analyzing power output by rider type.

By Time Duration

Leveraging over seven thousand power files, I can link rider types to power outputs over different time durations. I chose to look at 10, 30, 60, 120, 300, 600, 1200, and 2400 seconds, which cover the full spread of efforts, from sprints to longer efforts like the final of a one-day classic or a high mountain climb. For each power file I extracted the best power output over each of these durations, converted it to watts per kg using weights from procyclingstats.com, and expressed the result relative to the average for all riders. I had data from 169 riders with at least 10 power files from 2019 and 2020.
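The per-file step described above could be sketched as follows. This is an illustration under stated assumptions, not the exact pipeline used here: it assumes each power file is a list of watts sampled once per second with no dropouts, and the function names are hypothetical. Real files would need gap handling, and the post doesn't specify exactly how the all-rider average was formed.

```python
import numpy as np

# Durations (seconds) examined in the post.
DURATIONS_S = [10, 30, 60, 120, 300, 600, 1200, 2400]


def best_mean_power(watts, duration_s):
    """Highest average power over any window of `duration_s` consecutive samples.

    Assumes 1 Hz samples with no gaps; returns None if the file is too short.
    """
    watts = np.asarray(watts, dtype=float)
    if len(watts) < duration_s:
        return None
    # A cumulative sum turns every rolling-window total into one subtraction.
    csum = np.concatenate(([0.0], np.cumsum(watts)))
    window_totals = csum[duration_s:] - csum[:-duration_s]
    return float(window_totals.max() / duration_s)


def power_profile_wkg(watts, weight_kg):
    """Best power at each duration for one file, converted to watts per kg."""
    profile = {}
    for d in DURATIONS_S:
        best = best_mean_power(watts, d)
        if best is not None:
            profile[d] = best / weight_kg
    return profile


def relative_to_field(rider_wkg, field_avg_wkg):
    """Rider W/kg divided by the all-rider average at each duration (1.0 = average)."""
    return {d: v / field_avg_wkg[d] for d, v in rider_wkg.items() if d in field_avg_wkg}
```

The cumulative-sum trick keeps each file to a single pass per duration, which matters when processing thousands of files.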

An example from four riders with 2019-20 data

Examples of the curves produced for four riders are shown above. Smith packs a better sprint than the other three riders, but tails off on efforts longer than two minutes. Kamna excels on efforts of 10 minutes or more. De Gendt is second best at pretty much every duration. Declercq is well off the highest outputs between one and five minutes, but is close on longer efforts.

29% of the data came from domestiques, 29% from mountain helpers, 16% from sprint trains (so 74% from helpers), 12% from puncheurs, 8% from sprinters, and 6% from climbers (so 26% from leaders).

I looked at both the 80th percentile of power output (the better performances for a rider) and the median. As you would expect, the 80th percentile data produced a wider spread of power outputs relative to average.
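A minimal sketch of the two summaries, assuming each rider's per-file W/kg values at one duration are already collected (for example with a routine like the one for the previous step). The names are hypothetical, and weighting each rider equally in the all-rider average is an assumption; the post doesn't say how riders were weighted.

```python
import numpy as np
from collections import defaultdict


def rider_statistic(files_wkg, q):
    """Summarize each rider's per-file W/kg at one duration.

    q=50 gives the median; q=80 gives the rider's better performances.
    """
    return {rider: float(np.percentile(vals, q)) for rider, vals in files_wkg.items()}


def cluster_vs_average(rider_stat, rider_cluster):
    """Mean of each cluster's riders, as a fraction above/below the all-rider mean (0.10 = +10%)."""
    overall = float(np.mean(list(rider_stat.values())))
    by_cluster = defaultdict(list)
    for rider, value in rider_stat.items():
        by_cluster[rider_cluster[rider]].append(value)
    return {c: float(np.mean(v)) / overall - 1 for c, v in by_cluster.items()}
```

Running `cluster_vs_average` once with `q=50` and once with `q=80` gives the two views compared in the text.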

Power outputs by time duration for rider clusters

Sprinters produced over 10% more power than the average rider in 10-second efforts, while puncheurs and sprint train riders were also above average here.

Puncheurs were consistently above average at all time periods, while domestiques were consistently below average.

Climbers peaked at about 7% more power than average in efforts of 20 minutes or more, while mountain helpers were about 4% above average. In 10-second efforts, climbers were about 13 percentage points behind sprinters, while at 1200 seconds climbers were about 13 percentage points ahead of sprinters.

None of this is earth-shattering information; if anything, it shows the validity of rider clusters based simply on results data, because we’re seeing the expected power outputs. Classifying riders as sprint train members or mountain helpers is a valid distinction; they are producing different power outputs over different time durations.

By Stage Types

We can also break down overall power output in a race by stage type. I’ve broken the races into three types: those ending in a bunch sprint (a group of 20+ riders), non-bunch sprints on hilly parcours, and non-bunch sprints on mountainous parcours. The dividing line between hilly and mountainous is roughly Fleche Wallonne or Giro dell’Emilia.

The metric here is relative weighted average power, i.e. power output relative to a rider’s own average across all races. In this case, 120% is basically a maximal effort (what is required of a winning breakaway rider or a top-5 finisher on a mountain stage) and 80% is a low-effort day like a flat bunch sprint finish in a grand tour. For example, Neilson Powless’s three big breakaway days in the 2020 Tour de France were 114%, 116%, and 114% efforts, while he did 82%, 89%, and 78% on three flatter days in the bunch.
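The metric can be computed with a couple of dictionary passes. This is a sketch assuming one weighted average power value per rider per race, with the rider's average taken as a simple unweighted mean over their races (the post doesn't say whether longer races count for more). Names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def relative_power(rows):
    """rows: iterable of (rider, race_id, weighted_avg_power_watts).

    Returns {(rider, race_id): that race's power / the rider's mean across all their races}.
    """
    rows = list(rows)
    by_rider = defaultdict(list)
    for rider, _, wap in rows:
        by_rider[rider].append(wap)
    rider_avg = {rider: mean(values) for rider, values in by_rider.items()}
    return {(rider, race): wap / rider_avg[rider] for rider, race, wap in rows}
```

A value of 1.14 then reads as a "114% effort" in the sense used above.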

Climbers have the widest gap between efforts in mountain races and bunch sprint races

Of course, flatter stages require lower power output from all riders, as discussed in a previous post. But we can identify some significant differences between the clusters. Climbers are clearly different from the other clusters in their mountain versus bunch sprint outputs, while mountain helpers are clearly different from sprinters, domestiques, and sprint trains.

Climbers have the widest gap in output by parcours. In bunch sprint races they produce ~92% of their average weighted average power; in mountainous races they produce over 105%. Domestiques have the narrowest gap between bunch sprint days and mountain days.
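The gap comparison amounts to averaging relative power per cluster and stage type, then subtracting. A minimal sketch, assuming each rider-race record already carries its cluster, a stage-type label, and its relative power; the labels and function name are hypothetical, and the test data is made up rather than taken from the post.

```python
from collections import defaultdict
from statistics import mean


def parcours_gap(records):
    """records: (cluster, stage_type, relative_power), stage_type in
    {'bunch_sprint', 'hilly', 'mountain'}.

    Returns {cluster: mean relative power on mountain days minus on bunch sprint days},
    skipping clusters without data for both stage types.
    """
    cells = defaultdict(list)
    for cluster, stage_type, rel in records:
        cells[(cluster, stage_type)].append(rel)
    gaps = {}
    for cluster in sorted({c for c, _ in cells}):
        sprint = cells.get((cluster, "bunch_sprint"))
        mountain = cells.get((cluster, "mountain"))
        if sprint and mountain:
            gaps[cluster] = mean(mountain) - mean(sprint)
    return gaps
```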

Simple Rider Clustering

Cycling is fundamentally a team sport, and like all team sports it has roles/positions which riders fill in each race. Unlike most team sports, however, those roles/positions are not explicitly stated by teams before the race. Confusing things further, cycling teams regularly compete in races of differing strength. A rider who is a helper at a World Tour race could easily be the protected leader in a lower level 1.1 race. The challenge of defining which role each rider fulfills on their team can be reduced to two questions: 1) which parcours fits a rider (sprint finishes, hills, mountains), and 2) are they typically the leader or a helper (do they often or rarely finish as the top rider on their team)?

Cluster analysis is regularly used in other team sports to define roles, even in sports with more defined positions. This paper from the 2012 Sloan Sports Analytics Conference discusses clustering based on roles in the context of the NBA. This talk from the 2015 Opta Pro Forum discusses clustering based on player types in the context of football. There have been many more advanced and refined attempts at clustering in both (and other) sports since. Clustering is most easily done with either the K-means method or hierarchical clustering. Both work by feeding certain features for each row of data into the algorithm. For K-means, you have to pre-define the number of clusters you’re looking for (this can be optimized, so it’s not necessarily arbitrary), whereas hierarchical clustering builds a tree that steadily divides the data into smaller and smaller clusters.
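To make the K-means description concrete, here is a minimal from-scratch sketch of the standard Lloyd's algorithm in NumPy. It illustrates the technique and is not necessarily the implementation used for the clusters in this post; in practice a library routine with several random restarts (or k-means++ initialization) is the usual choice.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid updates."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize with k distinct rows chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every centroid, shape (n_points, k).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it in place if it has none).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment against the final centroids.
    labels = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    return labels, centroids
```

The fixed `k` argument is the "pre-define the number of clusters" step; hierarchical clustering has no such parameter until you decide where to cut the tree.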

Clustering in Pro Cycling

K-means is the method I’ll use here. The key to using K-means (and any clustering method) is defining the best features for your data, so that there are obvious ways for the algorithm to divide it. For this, I’ve defined season-long average values for 2017-2020 for four statistics:

  1. % of points earned in bunch sprint finishes (of all points earned), where points decay from 1st place, which earns the most, down to a cut-off between 15th and 50th place (depending on the strength of the peloton), which earns the least
  2. Overall points per race-day – with the same definition of points
  3. % of race-days finishing as #1 rider on your team (must also finish in top 20 in the race)
  4. Difficulty of the parcours weighted by points earned – where tougher mountain stages are high difficulty and flat stages are low difficulty
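K-means works on Euclidean distances, so features on very different scales (a percentage, points per race-day, a difficulty score) are usually standardized first; the post doesn't say whether or how this was done. The sketch below shows a common z-score approach, plus a purely hypothetical linear version of the decaying points scale in feature 1. The linear shape and `max_points=100` are assumptions, not the scale actually used.

```python
import numpy as np


def finish_points(position, cutoff, max_points=100.0):
    """HYPOTHETICAL points scale: linear decay from max_points for 1st to the least at `cutoff`.

    The post only says points decay to a cut-off between 15th and 50th place depending on
    peloton strength; the linear shape and max_points are assumptions.
    """
    if position < 1 or position > cutoff:
        return 0.0
    return max_points * (cutoff - position + 1) / cutoff


def standardize(features):
    """Z-score each column (rider rows x feature columns) so no feature dominates distances."""
    X = np.asarray(features, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # guard against a constant column
    return (X - X.mean(axis=0)) / std
```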

These four features define 1) whether a rider earns points in sprint finishes, 2) whether they are finishing high in races, 3) whether they are leading the team, and 4) whether they fit best on flatter, hillier, or mountainous races. We could generate other features, like how often a rider is in the breakaway, their performance in time trials, whether they’re successful in tough conditions, or how strong the races they participate in are, but these four give a good start and have strong data availability going back 3+ years.

Performing the Clustering

The number of clusters for K-means can be chosen using several methods (elbow, silhouette, etc.). Sometimes the number will be obvious and sometimes a small range is appropriate. For this data, between 4 and 7 clusters was the best fit. After fitting the models, six produced the most explainable clusters.
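One way to run the elbow and silhouette checks is with scikit-learn's `KMeans` and `silhouette_score`. This is a sketch, not necessarily how the range above was found; the helper name is hypothetical and the demo data is synthetic, with three obvious groups rather than rider features.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def scan_k(X, k_values, seed=0):
    """Fit K-means for each k; return {k: (inertia, mean silhouette)}.

    Inertia (within-cluster sum of squares) always falls as k grows, so look for the
    "elbow"; silhouette is highest for the best-separated clustering. Needs 2 <= k < n_samples.
    """
    results = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        results[k] = (float(model.inertia_), float(silhouette_score(X, model.labels_)))
    return results


# Usage on synthetic data with three obvious groups (not the rider data):
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [6.0, 6.0], [0.0, 6.0]])
X_demo = np.vstack([c + rng.normal(scale=0.4, size=(50, 2)) for c in centers])
scores = scan_k(X_demo, range(2, 7))
best_k = max(scores, key=lambda k: scores[k][1])  # k with the highest silhouette
```

On real data the two criteria often point to a range rather than a single k, which is why interpretability decided between 4 and 7 here.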

The six clusters produced can be broadly defined as three leader clusters and three helper clusters, with the clusters in each group differing by how mountainous or flat the parcours that suits them is.

  1. Sprinters – the easiest cluster to define; these riders are most successful in bunch sprints in flatter races and are often the leader
  2. Climbers – these riders get few points in bunch sprints; rather they earn points in mountainous finishes and are often the leader of the team
  3. Puncheurs – these riders are best on hillier parcours and can win from the bunch or in smaller groups
  4. Climbing helper – these riders earn fewer points and are leaders less often, but are more often successful in mountainous/hilly stages
  5. Sprint train – these riders often earn points in bunch sprint finishes, but are rarely leaders
  6. Domestiques – this is the catch-all group for riders who aren’t successful in mountain/hilly stages and don’t often earn bunch sprint points; these can be road captains or super-strong men like Tim Declercq, whose work is done before the pointy end of the race.
| Cluster | % of Riders | Example (2019) |
| --- | --- | --- |
| Sprinter | 9% | Caleb Ewan |
| Climber | 8% | Egan Bernal |
| Puncheur | 11% | Alberto Bettiol |
| Climbing helper | 19% | Marc Soler |
| Sprint train | 20% | Max Richeze |
| Domestique | 34% | Luke Rowe |

Distribution of clusters in World Tour / Pro Conti riders

So about 28% of riders fit into one of the three leader clusters, another 39% into the two specialized helper clusters, and 34% into the more generic domestique cluster. Put another way, an eight-man grand tour team will normally have two protected riders, three specialized helpers, and three less specialized domestiques.
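The eight-rider translation is just each role's share multiplied by eight and rounded. The shares below are the rounded cluster percentages from the distribution table, so they sum to 101% rather than 100%.

```python
# Cluster shares from the distribution table, grouped into the three roles.
shares = {
    "leaders": 0.09 + 0.08 + 0.11,           # sprinters + climbers + puncheurs
    "specialized helpers": 0.19 + 0.20,      # climbing helpers + sprint train
    "domestiques": 0.34,
}
team_size = 8
expected_riders = {role: share * team_size for role, share in shares.items()}
# leaders ≈ 2.2, specialized helpers ≈ 3.1, domestiques ≈ 2.7 riders per eight-man team
rounded = {role: round(n) for role, n in expected_riders.items()}
```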

Visualizing Clusters

This visual shows how the clusters look at the team level, with colors denoting clusters, % of races as leader on the x-axis, and parcours fit on the y-axis. Below is Bora Hansgrohe, one of the most successful teams in the World Tour in 2019.

Bora Hansgrohe team plot in 2019

They had three primary sprinters in 2019, clustered on the lower right, and two climbers in the upper right. They have a number of puncheurs, of whom Max Schachmann is the prime example. The clustering isn’t perfect here; Formolo is more of a climbing helper and Postlberger is more involved in the sprint train, but because of their mixed roles they are classified as puncheurs. Muhlberger is certainly a climbing helper though. In the bottom left are numerous support riders, of whom Schwarzmann, Archbold, Burghardt, and Selig are classified as sprint train and most of the rest as domestiques. You could argue Bodnar and Oss are more likely sprint train than not (and Oss is clustered with the sprint train for 2017, 2018, and 2020).

In general though, these plots give a strong overview of which roles riders are fulfilling in a team for a given season.

A generic plot of where all riders fell in 2019 is below.

Plot of six clusters for 2019 World Tour and Pro-Conti riders

Applications

This clustering has numerous applications like:

  1. does having more sprint train domestiques predict more success for sprinters / same for climbers and their helpers?
  2. how does power output differ across clusters on different stage types?
  3. which types of riders are most successful on different parcours?
  4. which teams are most and least balanced (high or low percentage of riders as leader clusters vs helper clusters)?

The Impact of Temperature on Relative Power Output

In my recent posts I’ve introduced the concept of relative power output, where a rider’s weighted average power in a particular race is compared to the average of all their races to create a rider-specific relative measure. Given a sufficient sample, both of races per rider and of riders in the data-set, we can show which factors contribute to higher or lower power output on stages. So far, it looks like a lot of climbing, time trials, shorter races in general, a high finishing position on the stage, and being in the breakaway all lead to higher relative power output.

Another significant factor is the temperature the race is ridden at. I have temperature data for >95% of race-days in my data-set. The average temperature is about 20.5 C and 11% of race-days have an average temperature over 30 C.

To find the impact of temperature, we can leverage the relative power output model built in a recent post. That model considers factors like the length and climbing difficulty of a stage, as well as the finishing position of the riders. It produces predictions, and we can train the temperature model on the residuals between those predictions and the actual power output on the stage. For example, stage 5 of the 2019 UAE Tour is predicted to have a relative power output of 93% of a rider’s average weighted average power (e.g., 256 watts if their average weighted average power is 275 watts). The actual figure was 73%, so the temperature impact model is trained on the residual between the two (−20 percentage points).
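In code, the two-stage setup amounts to computing a residual for each rider-stage and using it as the target for the next model. A trivial sketch with the stage 5 numbers from the text (the function name is hypothetical):

```python
def residual(actual_rel, predicted_rel):
    """Part of a stage's relative power the base model didn't explain; the temperature model's target."""
    return actual_rel - predicted_rel


predicted, actual = 0.93, 0.73          # 2019 UAE Tour stage 5, from the text
r = residual(actual, predicted)         # about -0.20, i.e. -20 percentage points
expected_watts = predicted * 275        # about 256 W for a rider averaging 275 W
```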

The ideal temperature is about 13 degrees Celsius (about 55 degrees Fahrenheit); this is where relative power output has peaked for the pro peloton. Higher temperatures have had a strong negative impact on relative power output, with a race at 30 C coming in about 3% lower than average and the hottest days, like the 2020 Strade Bianche, cutting relative power output by 10%!

Incorporating temperature into the model shows that for every 1 degree Celsius away from 13 C, relative power output drops by 0.4 percentage points, such that the 2020 Strade Bianche would be expected to have relative power output 10.6 percentage points lower than the average race. Adding temperature also increases the R^2 of the model from 0.25 to 0.29, and it improves the out-of-sample fit as well, with R^2 increasing and SE dropping from 0.10 to 0.09.
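A sketch of how the optimum and the per-degree drop could be estimated from the residuals: a V-shaped least-squares fit on the distance from a candidate optimum, with a grid search over that optimum. This is an assumed approach; the post reports the result (13 C, 0.4 percentage points per degree) but not the fitting procedure, and the function name and candidate grid are hypothetical.

```python
import numpy as np


def fit_temperature_effect(temps_c, residuals, candidate_optima=np.arange(5.0, 25.5, 0.5)):
    """Fit residual = a + b * |T - T0| by least squares for each candidate optimum T0.

    Returns (T0, b) for the candidate with the smallest squared error; b < 0 means
    relative power falls as temperature moves away from T0 in either direction.
    """
    temps_c = np.asarray(temps_c, dtype=float)
    residuals = np.asarray(residuals, dtype=float)
    best = None
    for t0 in candidate_optima:
        distance = np.abs(temps_c - t0)
        A = np.column_stack([np.ones_like(distance), distance])
        coef, *_ = np.linalg.lstsq(A, residuals, rcond=None)
        sse = float(np.sum((residuals - A @ coef) ** 2))
        if best is None or sse < best[0]:
            best = (sse, float(t0), float(coef[1]))
    return best[1], best[2]
```

A slope of about -0.004 corresponds to the 0.4 percentage points per degree quoted above.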

Power Output in Breakaways

In two recent posts covering relative power output in the 2020 Tour de France and showing the impact of stage characteristics on power output, I made the claim that power output is higher for riders in the breakaway than otherwise. This is likely not a controversial statement for anyone, but in this post I’ll show that breakaway riders are required to produce more power than normal on the days they ride ahead of the peloton.

My data-set comes from Pro Cycling Stats, which has collected kilometers ridden ahead of the peloton for all World Tour races in 2020. There are 54 race days in this data-set where at least one rider rode ahead of the peloton. I linked this data to my stage-level power output data. For power output, I’m using each rider’s relative power output compared to their average power output on all stages. For example, James Knox had a normalized power of 258 watts in stage 5 of the 2020 UAE Tour, which was 101% of his average normalized power (255 watts) across all stages.

In total, I have power output for 1417 rider race-days on stages where a breakaway was out front (across 54 unique races). In about 10% of those rider race-days, the rider spent some part of the stage in the breakaway.

On average, the riders in the breakaway have done 106% of their average normalized power while spending an average of 35% of the stage ahead of the peloton. The riders not in the breakaway have done 98.5% of their average normalized power.

Impact of percentage of stage in breakaway on relative power output

The graph above shows how much higher relative power output is depending on the percentage of the stage spent in the breakaway. Riders who spend a higher percentage of the stage in the break have higher relative power output. This relationship holds within stages: the percentage of the stage spent in the breakaway is positively associated with power output in 49 of the 54 stages measured, with a median coefficient of 0.14. That means a rider who spends the entire stage in the breakaway will output about 14 percentage points more of their average normalized power than one who spends none of the stage in the breakaway.
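The within-stage check can be sketched as one simple linear fit per stage followed by the median of the slopes. This assumes a plain least-squares fit of relative power on the fraction of the stage in the break with no other controls; the post doesn't specify the model, and the function names are hypothetical. The test data is synthetic.

```python
import numpy as np
from collections import defaultdict


def per_stage_slopes(rows):
    """rows: (stage_id, break_fraction in [0, 1], relative_power).

    Fits relative_power = a + b * break_fraction separately for each stage and returns {stage_id: b}.
    Stages where nobody differs in break_fraction are skipped (the slope is undefined).
    """
    by_stage = defaultdict(lambda: ([], []))
    for stage, frac, rel in rows:
        by_stage[stage][0].append(frac)
        by_stage[stage][1].append(rel)
    slopes = {}
    for stage, (x, y) in by_stage.items():
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        if x.max() == x.min():
            continue
        slope, _intercept = np.polyfit(x, y, 1)
        slopes[stage] = float(slope)
    return slopes


def summarize(slopes):
    """Median slope and the number of stages with a positive association."""
    values = list(slopes.values())
    return float(np.median(values)), sum(v > 0 for v in values)
```

Because the break fraction runs from 0 to 1, a slope of 0.14 is directly the difference between a full-stage breakaway rider and a rider who never left the bunch.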

Relative Power Output by Stage Characteristics

In my last post I dug into the Tour de France power data shared by @Velofacts, adding to his analysis by breaking down the relative power output of each rider compared to themselves. In other words, instead of judging the power output of Thomas de Gendt relative to other riders with different skill-sets, judge it relative to his own level. Some of the key findings were: 1) the flat sprint stages saw significantly lower relative power output than the big mountain days, 2) tough days where the peloton pushed hard, like stage 7, saw power output comparable to the mountain days, and 3) riders saw peak power output when they were in the break.

This post will take that analysis further and determine what stage characteristics lead to high or low relative power outputs across pro races. To do that, I’ve collected nearly 10,000 individual stages linked to specific races for pro riders across the World and continental tours for 2019 and 2020. This data-set includes 292 unique riders with 98% of the data coming from riders with at least 10 races (the minimum to include in modelling below). The average normalized power for this sample was 278 watts (4.09 watts/kg) with the 10th to 90th percentile represented by 232 to 321 watts (3.49 to 4.67 watts/kg).

Model Creation

I separated 2019 from 2020, with 2019 acting as the training set and 2020 as the test set, so we can judge how predictive the model is on data it hasn’t seen. I built two models: a simple linear regression with easy-to-interpret effects, and a gradient boosted tree model (xgboost), which should theoretically perform better at the cost of interpretability.

I linked the race-level power files to my existing data-set of stage results, which includes variables like whether a race was a time trial, one-day race, and/or grand tour, what the climbing difficulty of the stage was, whether the stage ended with an uphill finish, and what class of race it was (World Tour or lower levels), as well as riders’ finishing positions.

To build the linear model, I included:

one_day_race, time_trial, length of stage (km), natural log of finish position, climb_difficulty of stage, and rider_DNF (did not finish the race). I also included an interaction between log finish position and climb difficulty, on the idea that there is probably a larger difference in power output by finish position on tougher stages.

The model was built to predict the relative power output on the stage, calculated as, e.g., 300 watts on the stage / 285 watts on average = 1.053 relative power output.
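The post doesn't say which tool fit the linear model (the boosted model used R's xgboost package). As a neutral illustration, here is the same specification as ordinary least squares in NumPy: an intercept, the six variables, and the finish position by climb difficulty interaction, with finish position entering as ln(rank + 1). Function names are hypothetical and the columns are ordered to match the coefficient table.

```python
import numpy as np


def design_matrix(one_day, tt, length_km, rank, climb, dnf):
    """Build the regression columns in the coefficient table's order."""
    ln_rank = np.log(np.asarray(rank, dtype=float) + 1)  # ln(rank + 1), so the winner isn't 0
    climb = np.asarray(climb, dtype=float)
    return np.column_stack([
        np.ones_like(ln_rank),            # intercept
        ln_rank,                          # natural log of finish position
        climb,                            # climb_difficulty
        np.asarray(tt, dtype=float),      # time_trial
        np.asarray(one_day, dtype=float), # one_day_race
        np.asarray(length_km, dtype=float),
        ln_rank * climb,                  # finish position x climb difficulty interaction
        np.asarray(dnf, dtype=float),     # rider DNF
    ])


def fit_ols(X, y):
    """Least-squares coefficients and in-sample R^2."""
    y = np.asarray(y, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return coef, float(r2)
```

Significance tests and standard errors, as reported in the table, would need a statistics package or the usual OLS variance formula on top of this.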

The linear model achieved an in-sample R^2 of 0.25 with a standard error of 0.10; predicting power output is obviously a high-variance task. Five of the seven variables were significant at the p < 0.01 level (rider DNF was not significant, and the finish position/climb difficulty interaction was significant at the p < 0.05 level).

The coefficients were:

| Variable | Coefficient | SE |
| --- | --- | --- |
| Intercept | 1.096 | 0.01 |
| Natural log of finish position (1) | -0.014 | 0.002 |
| climb_difficulty (2) | 0.007 | 0.001 |
| time_trial | 0.138 | 0.009 |
| one_day_race | 0.055 | 0.003 |
| length in km | -0.0005 | <0.001 |
| Natural log of finish position * climb_difficulty | -0.0005 | <0.001 |
| Rider DNF | -0.009 | 0.01 |

R^2 = 0.25, SE = 0.10

(1): Actually the natural log of (rank + 1), to allow for the interaction term, since ln(1) = 0.

(2): Climbing difficulty is judged on a scale starting at 0, where the toughest mountain stages are around 30. Classic races like Flanders and Strade Bianche come in at 4-5, hillier races like Liege-Bastogne-Liege and Fleche Wallonne at 8-10, and grand tour mountain stages typically rate 12 and up.

Practical Impacts

A rider finishing 1st on a mountain stage (climb difficulty = 15) will be estimated to have 9.4% higher power output than the same rider finishing 150th on that mountain stage. On a flat stage, the 1st place rider will have about 6.3% higher power output than the same rider finishing 150th.
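These figures follow from the coefficient table: for the same rider on the same stage, only the ln(rank + 1) terms differ, so the gap is (finish-position coefficient + interaction × climb difficulty) × the difference in ln(rank + 1). A sketch, assuming a flat stage has a climb difficulty of 0. With the rounded published coefficients this gives about 9.3% and 6.1%, slightly below the 9.4% and 6.3% quoted, presumably because the text used unrounded coefficients.

```python
import math

# Rounded coefficients from the table above.
B_LN_RANK = -0.014
B_LN_RANK_X_CLIMB = -0.0005


def rank_gap(rank_better, rank_worse, climb_difficulty):
    """Predicted relative-power difference between two finishing positions on the same stage.

    Only the ln(rank + 1) terms differ between the two riders, so everything else cancels.
    """
    delta_ln = math.log(rank_better + 1) - math.log(rank_worse + 1)
    return (B_LN_RANK + B_LN_RANK_X_CLIMB * climb_difficulty) * delta_ln


mountain_gap = rank_gap(1, 150, climb_difficulty=15)  # about 0.093
flat_gap = rank_gap(1, 150, climb_difficulty=0)       # about 0.061 (flat assumed = 0)
```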

One-day races are ridden at 5.5% higher relative power, which matches the finding of van Erp and Sanders that one-day races are ridden at a higher intensity than stage races.

Time trials obviously have much higher normalized power, as they are much shorter races; in this case, 14% higher relative power. Relatedly, stage length plays a small role, with shorter stages producing greater power output. As time trials are shorter, much of this effect comes from time trials, but shorter road stages like Stage 20 of the 2019 Tour de France also have much higher normalized power than longer stages.

Testing

Testing this model on the 2020 data shows similar out-of-sample fit – R^2 of 0.23 and a standard error of 0.10. Again, predicting relative power at the stage level is a high variance endeavor!

The highest predicted power output in the test set (>2750 rider-races in 2020) was Thomas De Gendt’s stage 20 time trial in the 2020 Tour de France, predicted at 121.9% of his average normalized power. The La Planche des Belles Filles time trial had almost all of the elements of a high power output stage: short, a time trial, and a lot of climbing. De Gendt finished 20th, so our predicted power for the higher-finishing riders would have been even higher. De Gendt actually recorded 135% of his average normalized power!

The highest road race prediction was Pierre Latour at the Mont Ventoux Challenge, a one-day race with two significant climbs where Latour finished 4th. The prediction was 118.5% of Latour’s average normalized power, but he only produced 105%. Across the ten riders with power files from that race, the residuals made it the 3rd largest negative difference between predicted and actual power, indicating the race required much less power than the model predicted.

The flip side of that was Stage 7 of the Tour de France, where Bora attempted to make the race extremely difficult to shed Peter Sagan’s sprint rivals; later, the stage exploded in the crosswinds. Overall, it ranked as the 6th largest positive difference between predicted and actual power. Parcours and race type play a significant role in power output, but how the race is ridden is a huge factor.

Gradient Boosted Model

Gradient boosted models fit hundreds or thousands of decision trees in sequence, each new tree trained to correct the errors of the trees before it, and combine them to learn which variables matter most and derive predictions. In this case, I used the same training and testing data and the same variables with the xgboost package in R.

Optimizing for root mean square error gave an error of 0.088 on the training data and 0.099 on the testing data. Out of sample, the R^2 was 0.24, not much of an improvement on the linear model. Given the lack of significant improvement from the boosted model, it makes sense to rely on the more easily interpreted linear model.
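The comparison rests on two standard metrics computed on the 2020 test set for each model. A minimal sketch of both; note that out-of-sample R^2 here uses the test set's own mean, so it can go negative for a model worse than predicting that mean.

```python
import numpy as np


def rmse(y, y_hat):
    """Root mean square error."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def r_squared(y, y_hat):
    """1 - SS_residual / SS_total, with SS_total around the mean of y."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2))
```

Calling both on the same held-out predictions from each model keeps the linear versus boosted comparison like-for-like.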

Predictions

To end, here are the top 10 over-estimated and under-estimated stages by the model for 2020. As mentioned, the Mont Ventoux Challenge was one of the most over-estimated in power output, alongside four of the flatter Tour de France stages, the pan-flat Milano-Torino, a flat stage from Tirreno-Adriatico, a Tour of Portugal time trial, and, surprisingly, Milano-Sanremo.

On the under-estimated side, there are a handful of minor French and Spanish stage races from February, along with the World Championship road race, a BinckBank Tour stage where Mathieu van der Poel won from 60km out, and the aforementioned stage 7 of the Tour de France.

Intensity in Tour de France 2020

Twitter user Velofacts does great work compiling and sharing power data from pro riders on Strava. He’s collected stage-level normalized power data from about 60 Tour de France riders, covering what looks to be nearly 1000 rider-stages. He’s analyzed the data by calculating watts/kg, which shows FDJ domestique Sebastien Reichenbach as the rider with the highest average over the race, at just under 4.0 watts/kg. He has also calculated stage-level averages, which show that the peak power output stage was stage 9 (306 watts) and that mountain stages were generally raced at 275-300 watts (with Bora’s demolition of the peloton in stage 7 and the ensuing crosswind splits coming in as the 5th toughest stage).

Related to my recent post on measuring intensity using relative perceived effort and power data, we can use this data to calculate each rider’s relative power output across the race. van Erp and Sanders found that power output in grand tours is not, on average, any higher than in even lower level one-day races; however, this could be explained by riders pacing themselves throughout, balancing big efforts with lesser ones. Is this obvious in the data?

Relative power outputs in 2020 Tour de France

Of Velofacts’s 60-odd riders, I’ve chosen 33 of the most interesting riders who made an impact on the Tour and calculated their relative power output (each stage divided by their race average). I’ve also roughly classified them into three groups to show what role they played in the race: guys like Dries Devenyns, Roger Kluge, and Tim Declercq were ‘Workers’; Sepp Kuss, Harold Tejada, and Reichenbach were ‘Climbers’; and Simon Geschke, Quentin Pacher, and Carlos Verona were ‘Breakaway’.

Relative power output averages by rider type in 2020 Tour de France (data from @Velofacts)

You can see the whole peloton got a break on stages 3, 5, 11, and 21 with all of the groups having much lower average power outputs. Climbers posted their peak relative power output in Stage 9 at 16% higher than their Tour average. Breakaway men peaked in Stages 9 and 16. Workers peaked in relative power on the difficult mountain days between stages 16-18, but also had to generate equal effort in Stage 7. Interestingly, the workers had less variable power outputs overall with a standard deviation of 7% vs 9-10% for the other two groups of riders.
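The per-stage figures and the group spread can be sketched as below. This assumes each rider's stage normalized power values are in a list with no missing stages (real data would need to drop abandons and missing files), and it pools all of a group's rider-stages into one population standard deviation; the post doesn't say whether the 7% and 9-10% figures were pooled or averaged per rider. Names are hypothetical.

```python
import numpy as np


def relative_to_race_average(stage_np):
    """stage_np: {rider: [normalized power per stage]}. Divide each stage by the rider's Tour average."""
    return {rider: np.asarray(v, dtype=float) / np.mean(v) for rider, v in stage_np.items()}


def group_spread(relative, groups):
    """Pooled standard deviation of relative power for each rider group (e.g. Workers, Climbers)."""
    pooled = {}
    for rider, rel in relative.items():
        pooled.setdefault(groups[rider], []).extend(rel.tolist())
    return {g: float(np.std(v)) for g, v in pooled.items()}
```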

Selected riders with relative power output by stage (data from @Velofacts)

Some interesting items above:

  1. The days riders were in breakaways are obvious – especially the ones in the first half of the race on easier days. Pacher’s stage 4 breakaway was 16% higher than his average and Ladagnous’s in stage 11 was equal to his race average, but about 15% higher than the average for other riders!
  2. Similarly, we have three of the top 5 on Stage 16 in this data-set – stage winner Kamna was +19%, Geschke +21%, and Reichenbach +19%. Those are three of the top 6 relative efforts in the entire race.
  3. The upper range in terms of efforts looks like about +20%. Harold Tejada and Sepp Kuss both hit those figures in Stage 9, while Geschke and Pacher did in Stage 16. The standard deviation across all 619 rider-stages I analyzed is about 10%.
  4. The three riders who varied the most between stages were Neilson Powless, Simon Geschke, and Sepp Kuss. Powless and Geschke were both involved in several large breakaways, racking up the 3rd and 10th most kilometers in breakaways according to Pro Cycling Stats. Kuss was Roglic’s top climbing domestique, and as such had four efforts 10% or more above his average as well as three around 20% below it.
  5. Tim Declercq’s monster effort at the front of the peloton on Stage 10 is obvious; it was his peak effort of the race. He spent about 67% of the race in front of the peloton, by far the highest total of the day.

Relative Perceived Exertion (RPE)

Continuing my exploration of the recent pro cycling analytics papers, today I’m going to dig into three related papers on measuring intensity to monitor fatigue. The goal is to apply these findings to build an intensity metric that can be applied globally to see which riders have experienced higher or lower intensities at a given point in the season.

I will examine:

The datasets here are nine riders from a single cycling team in the 2016 Giro (paper C), twelve riders from a single team in the 2016 Giro and/or 2016 Vuelta (paper A), and twenty riders (presumably from the same team) across a range of World Tour and lower level (HC, Level 1) races (paper B). Paper A also included a baseline training data-set from the two weeks before each race. The authors gathered power output, heart rate, and relative perceived exertion data from each race and calculated intensity metrics.

Relative perceived exertion (RPE) is of particular interest as it provides a data point which is not publicly available in the ways that power data (for example from Strava) or riding speed is for many pros. For those unfamiliar, RPE is simply the athlete’s assessment of the difficulty of their workout on a scale of 1-10 where 10 is the most difficult.

The RPE was obtained 30 min after the exercise bout based on the question: “How hard was your workout?”

Sanders and Heijboer (2017), p. 2

Intensity in Grand Tours

Paper A analyzes the intensity metrics in four groups: a baseline two weeks prior to a grand tour and then week 1, week 2, and week 3 of grand tours. They find the intensity as measured by RPE increases from 3.5 in baseline training to 6.0 in week 1, 7.0 in week 2, and 7.4 in week 3 – where week 3 is significantly different from week 1 (and all three weeks from the baseline). Power output – both mean watts and normalized power in watts – differed significantly in weeks 2 and 3 from week 1.

This matches what we typically see in grand tours, where the first week is easier than the following weeks. For example, of the eight stages in the 2016 Giro classified as mountain stages by ProCyclingStats, only one was in week 1 (stages 1-7), while three were in week 2 (stages 8-14) and four in week 3 (stages 15-21). The 2016 Vuelta was similar, with all seven of its mountain stages coming in the final two weeks.

Paper C digs into the differences between stage types using the same type of data-set, just from the 2016 Giro. They divide stages into four types: flat, semi-mountainous, mountainous, and time trials, which seem (based on sample size) to largely correspond to the aforementioned PCS categorization. A mountain stage had to have 35km+ of total climbing and/or a 10km+ finishing climb, while a flat stage could not have more than 13km of climbing and could not end uphill.

RPE by stage types showed flat stages easier at 5.8, semi-mountainous/hilly at 6.5, mountain stages at 7.8, and time trials at 6.8. The gap between mountain and flat stages was significant. Power output also increased significantly between each of the three road stages with mountain > semi-mountainous > flat.

So we have some basic findings:

  • Baseline training leading into a grand tour (and presumably in taper mode) is about a 3.5 on RPE
  • Flat stages – as are typical in the first weeks of grand tours – rate around 6.0 in RPE
  • Hillier stages rate around 6.5
  • Time trials will be rated around 6.5-7.0 – presumably higher for those riding them with intent to compete for podium/in team time trials
  • Mountain stages will rate closer to 8.0

Influence of Category

This is the most interesting paper of the bunch, as it leverages the vast array of races a World Tour team enters throughout the year to tease out intensity differences by category. Pro cycling is organized with the World Tour at the highest level: the ~35-40 most elite events, with the grand tours and the five one-day monuments at the top of the heap. Below that level are three additional levels: .HC, .1, and .2 races. A World Tour team will typically compete only in the first two of those levels in a season, with maybe a quarter to half of the teams in a given .HC race being World Tour teams, and a lower share of World Tour teams in Level 1 races.

Note that the RPE values in this study are collected on a different 6-20 scale from the 1-10 scale used in the other two papers.

The authors show two sets of results using RPE: one focusing on one-day races, comparing monuments (the five most prestigious one-day races) with three other levels of one-day races (World Tour, HC, Level 1), and another comparing grand tours with three other levels of stage races (World Tour, HC, Level 1).

They find an RPE of approximately 18 for the monuments, vs 17 for World Tour races and 16 for HC/Level 1 races. The monuments differ significantly from each of the three lower levels, and World Tour races also differ significantly from Level 1 races. Monuments tend to be much longer races (268 km on average vs 219 km for World Tour and <200 km for HC/Level 1 races), which may explain the differences in intensity.

We do not see a similar stratification for stage races. Of the stage races, the grand tours actually average the lowest RPE (14.5), and they stand out as significantly lower in max/mean heart rate (power output is not significantly different in either direction). This likely comes down to team strategy, which the authors explain better than I can:

When comparing single-day races with multi-day races, it is clear that for all the race categories the single-day races are higher in volume, load and intensity compared to the multi-day races. Race regulations are an important contributor to this. Volume and load are higher competing in single-day races because race regulations allow longer races within all the single-day race categories compared to the multi-day race categories.

Furthermore, the higher intensities within the single-day races could be caused by differences in race tactics between the single-day and multi-day races. In a single-day race, a cycling team has one goal and that is to finish as high as possible and thus the whole team (race leader and domestiques) will work without any necessity to hold back for other days to come. Within a multi-day race, a team has different goals per stage and this will depend on their overall goal. For example, when a team brings a sprinter as a team leader to a multi-day race, on the flat stages the support riders will likely have to work on the front of the peloton which will result in an increased exercise intensity and load whilst the support riders for a climber will have a higher exercise load on the climbing stages when working for their leader.

Overall race length (i.e. number of stages) can be a cause for the slightly higher intensity measures (absolute and relative PO, IF) in the 2.1 race category compared to higher level multi-day stage races. On average, the lower category races are shorter and some have only two-race days. The more days a multi-day race consists of, the more riders will most likely aim to spread their energy over multiple days (and aim to minimise energy expenditure on days where it’s possible).

pg 11-12 van Erp and Sanders (2019)

Some more findings:

  • One day races see significant stratification between monuments > other World Tour races > other one day races where monuments are about 8% higher RPE than World Tour and 12% higher than HC/Level 1.
  • Stage races overall do not see this stratification – likely because of strategic pacing the riders implement over the length of the race. I presume stages ridden most competitively will be similar to one day races, while those ridden less competitively will be lower than average. Overall, the level for stage races is about 8% below the HC/Level 1 one-day races.

Implementing an intensity measure globally

These papers provide a solid foundation for a global RPE metric. Difficulty of the profile can increase RPE by about 33% between a typical flat stage and a typical mountain stage in a grand tour. In addition, monuments and World Tour one-day races will be raced with between 4% and 12% higher intensity than HC/Level 1 one-day races. Stage races on average are not raced differently by level; however, certain stages are ridden with higher intensity than others, such that the combined average is approximately 8% below an HC/Level 1 one-day race.
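As a rough illustration of how the profile and category effects might be combined into one number, here is a minimal Python sketch. It assumes the two effects simply multiply, which none of the papers test, and it applies the grand tour stage-type figures (1-10 scale, listed earlier) to one-day races as a further assumption. The names `STAGE_RPE`, `CATEGORY_FACTOR`, and `relative_intensity` are hypothetical.

```python
# Grand tour stage RPE on the 1-10 scale, from the figures above.
# Time trials use the midpoint of the quoted 6.5-7.0 range (an assumption).
STAGE_RPE = {"flat": 6.0, "hilly": 6.5, "time_trial": 6.75, "mountain": 8.0}

# Approximate intensity relative to an HC/Level 1 one-day race (from the text).
CATEGORY_FACTOR = {
    "monument": 1.12,
    "world_tour_one_day": 1.04,
    "hc_level1_one_day": 1.00,
}


def relative_intensity(stage_type: str, category: str) -> float:
    """Hypothetical index where 1.0 = a flat HC/Level 1 one-day race.

    Assumes the profile effect (measured in grand tours, relative to a flat
    stage) and the race-category effect multiply together.
    """
    profile_factor = STAGE_RPE[stage_type] / STAGE_RPE["flat"]
    return profile_factor * CATEGORY_FACTOR[category]


print(round(relative_intensity("mountain", "hc_level1_one_day"), 3))  # 1.333
print(round(relative_intensity("flat", "monument"), 2))  # 1.12
```

Stage races are deliberately left out: their per-stage adjustment is exactly the missing piece discussed next.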

There’s a big missing piece here: how do we estimate which stages in a stage race were ridden with higher vs lower intensity?

Different riders will ride differently depending on their team orders/role; eg, a domestique for a team focused on their sprinter like Quick Step will surely have higher RPE on a flat stage where he is responsible for bringing back the breakaway or leading out a sprinter than on a mountain stage where he isn’t protecting a GC leader. If riders could be classified as primarily flat or primarily climbing riders this would be an easier determination to make.

We could also use finishing position and/or presence in a breakaway to estimate higher RPE than normal. Eg, Michal Kwiatkowski’s stage 18 victory in the Tour de France came in a long breakaway over four high mountains where he was in a small group for most of the stage. He certainly had a higher RPE than in stage 17 when he finished 130th in the gruppetto on a similar high mountain stage.

2020 Road World Championships

This idea of recent rider-specific intensity is particularly relevant this week. The Road World Championships are being held with the men’s road race on Sunday – just a week after the Tour de France. While the startlist isn’t completely final, a large majority of the contenders per the betting odds competed at the Tour – meaning 21 days of racing in the last 30 days as of this Sunday. Worlds are typically held two weeks after the Vuelta a Espana (and two months after the Tour), so this year many of the contenders will have more recent racing in their legs than normal.

Top contenders like Jakob Fuglsang, Thomas Pidcock, and Diego Ulissi did not ride the Tour. Fuglsang rode several one-day classics – high RPE events – in August followed by a week-long stage race in mid-September. Pidcock rode the U23 Giro d’Italia at the turn of August into September. Ulissi has ridden two stage races for a combined ten race days in the past month – most spent riding as one of the leaders. In each case, these riders have roughly half the race days in their legs of co-favorites Wout van Aert and Julian Alaphilippe.

Tom Dumoulin in Grand Tours (van Erp et al 2019)

Some of the best pro cycling research in the last few years has come out of Team Sunweb thanks to sports scientists Teun van Erp and Dajo Sanders. They have had access to Sunweb’s power files for several seasons of racing and have written several detailed analyses of the differences between men’s and women’s racing, the relationships between different training load measures across different stage types, the influence of race category and results on intensity, and several others.

The most interesting work was done by van Erp in collaboration with three other researchers and was published in November 2019 in Medicine and Science in Sports and Exercise as "Load, Intensity, and Performance Characteristics in Multiple Grand Tours." The pdf can be accessed at that link.

Their work analyzes four grand tours in which Tom Dumoulin was the GC leader for Sunweb/Giant-Alpecin: the 2015 Vuelta, 2017 Giro, 2018 Giro, and 2018 TDF. As the paper notes, Dumoulin finished 6th, 1st, 2nd, and 2nd in those tours and won at least one stage in each. Three other grand tours from the period, in which he DNF’d, were excluded, as the focus was on his performance while contending for GC throughout.

Their data set was Dumoulin’s power data on the finishing climbs throughout the tours: 33 climbs ranging from short efforts like Mur de Bretagne in stage 6 of the 2018 TDF to longer efforts like Mount Etna in the Giro. They supplemented the power files with information about the climbs (gradient, distance) and stage conditions (temperature, altitude climbed prior to the final climb).

The main findings were that the three different grand tours had broadly similar requirements to win in terms of load and intensity characteristics. The power requirements over 33 final climbs in those tours averaged 5.9 watts/kg +/- 0.6. And those power outputs were impacted significantly by the duration of the climb and the amount of climbing prior to the climb on the stage.

Ross Tucker and others in less formal analyses have shown before that watts/kg in the high 5s/low 6s are required to contend for grand tours in the mountains, so it is great to see that replicated with actual power data from a World Tour team. Ross quotes the work of Twitter climb-timing expert Ammattipyoraily in a 2015 article here showing estimated watts/kg for Tour winners using Michele Ferrari’s equation; he showed Armstrong averaging 5.92 watts/kg or higher in his last six Tour wins, with more modern winners like Contador, Nibali, Wiggins, and Froome in the 5.87-6.07 watts/kg range.
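Ferrari’s estimate is commonly quoted as watts/kg ≈ VAM / (200 + 10 × gradient%), where VAM is the rate of vertical ascent in meters per hour. A minimal Python sketch of that commonly quoted form (Ammattipyoraily’s exact variant may differ); the function names and the example climb are hypothetical:

```python
def vam(elevation_gain_m: float, climb_time_min: float) -> float:
    """Vertical ascent speed (VAM) in meters per hour."""
    return elevation_gain_m / (climb_time_min / 60)


def ferrari_wkg(vam_m_per_h: float, gradient_pct: float) -> float:
    """Estimated watts/kg from VAM using the commonly quoted Ferrari formula."""
    return vam_m_per_h / (200 + 10 * gradient_pct)


# Hypothetical climb: 1,000 m of elevation gain in 35 minutes at an average 8%.
print(round(ferrari_wkg(vam(1000, 35), 8), 2))  # 6.12
```

The estimate ignores wind, drafting, and rider/bike mass, which is why it is only a rough guide compared with actual power files.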

Mike Puchowicz has also posted graphs of relative power output for the top contenders in 2013 and 2014 TDFs here. You can dig into Ross’s archives on the Tour analysis here and read Mike’s work at Veloclinic.

Returning to the paper, the most interesting part of this work is when they analyze a range of variables and how they impact the power output on a climb. Obviously I would love to see this with more than one rider, but their results fit the smell test and give us some coefficients. Their three factors influencing power output on final climbs are:

  1. duration of the climb (length is negatively associated with power output)
  2. gradient of the climb (steepness is positively associated with power output)
  3. total elevation gain (TEG) before mountain (a lot of preliminary climbing is negatively associated with power output)

Duration enters the model as the log of the climb’s length in minutes, such that a 15-minute effort is ridden about 0.8 watts/kg higher than a 45-minute effort. For gradient (in %), a 5% climb is ridden about 0.6 watts/kg lower than a 10% climb. And for total elevation gain, a climb that follows a comparatively flat stage (TEG of 1000 meters) is ridden about 0.45 watts/kg higher than one that follows a comparatively mountainous stage (TEG of 3000 meters).

This certainly fits with what you would assume; a short, steep climb at the end of a flatter stage (think the typical wall finish in a Vuelta) would be ridden at a high watts/kg, while a long, grinding alpine climb at the end of a tougher climbing day would be ridden at lower watts/kg.
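Those three differences can be turned into slopes for a simple additive model and used to compare two climbs. A minimal Python sketch: the slopes are back-calculated from the rounded differences quoted above, not taken from the paper’s coefficient table, and the model’s intercept is unknown, so the code only predicts the watts/kg gap between two climbs. The names and example climbs are hypothetical.

```python
import math

# Slopes back-calculated from the quoted differences (assumed additive model):
#   15 vs 45 minutes     -> 0.8 W/kg  => -0.8 / ln(3) per unit of ln(minutes)
#   5% vs 10% gradient   -> 0.6 W/kg  => +0.12 W/kg per gradient point
#   TEG 1000 vs 3000 m   -> 0.45 W/kg => -0.000225 W/kg per meter
B_LOG_DURATION = -0.8 / math.log(3)
B_GRADIENT = 0.6 / 5
B_TEG = -0.45 / 2000


def wkg_difference(climb_a: dict, climb_b: dict) -> float:
    """Predicted watts/kg of climb_a minus climb_b; the intercept cancels out."""
    return (
        B_LOG_DURATION * (math.log(climb_a["minutes"]) - math.log(climb_b["minutes"]))
        + B_GRADIENT * (climb_a["gradient_pct"] - climb_b["gradient_pct"])
        + B_TEG * (climb_a["teg_m"] - climb_b["teg_m"])
    )


# Hypothetical: a short, steep finish after a flatter day vs a long alpine
# climb after a mountainous day.
wall = {"minutes": 15, "gradient_pct": 10, "teg_m": 1000}
alpine = {"minutes": 45, "gradient_pct": 5, "teg_m": 3000}
print(round(wkg_difference(wall, alpine), 2))  # 1.85
```

The 1.85 watts/kg gap is just the three quoted effects added together, which is what an additive model implies for these two extreme cases.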

They also show Dumoulin’s maximum power profile over each race, with 20-minute efforts of 6.0-6.2 watts/kg and 60-minute efforts of 5.1-5.6 watts/kg across the four races (page 11). They also plot each of the 33 finishing climbs with the climb duration (X axis) and watts/kg (Y axis) to show his relative power output (page 12). He likely peaked in these four races in stage 14 of the 2018 Giro on Mount Zoncolan, where he was bang-on 6.0 watts/kg for 40 minutes in finishing 5th on the stage.
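A maximum (mean-maximal) power profile like this comes from sliding a fixed-length window across a ride’s power data and keeping the best average for each duration. A minimal Python sketch, assuming clean 1-second samples with no dropouts (real power files need resampling and gap handling first); the function names and the example ride are hypothetical:

```python
from typing import Dict, Optional, Sequence


def best_average_power(watts: Sequence[float], window_s: int) -> Optional[float]:
    """Highest mean power over any run of `window_s` consecutive 1 Hz samples.

    Returns None when the ride is shorter than the window.
    """
    if window_s <= 0 or len(watts) < window_s:
        return None
    window_sum = sum(watts[:window_s])
    best = window_sum
    for i in range(window_s, len(watts)):
        # Slide the window forward one second: add the newest sample, drop the oldest.
        window_sum += watts[i] - watts[i - window_s]
        best = max(best, window_sum)
    return best / window_s


def power_profile(
    watts: Sequence[float], weight_kg: float, durations_s: Sequence[int] = (1200, 3600)
) -> Dict[int, Optional[float]]:
    """Best watts/kg for each duration (defaults: 20 and 60 minutes)."""
    profile: Dict[int, Optional[float]] = {}
    for d in durations_s:
        best = best_average_power(watts, d)
        profile[d] = None if best is None else best / weight_kg
    return profile


# Hypothetical ride: 100 s at 200 W, a 30 s surge at 400 W, then 100 s at 200 W.
ride = [200.0] * 100 + [400.0] * 30 + [200.0] * 100
print(power_profile(ride, weight_kg=80.0, durations_s=(30, 60)))  # {30: 5.0, 60: 3.75}
```

Taking the best over each race (rather than each climb) is what makes this a race-level profile.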

General Classification Model

Winning the GC in a stage race should be considered as distinct from stage-by-stage success. Twelve of the last 30 TDF winners have done so with zero or one stage wins and eleven of the last 30 TDF winners have done so without winning a non-time trial stage.

To measure a rider’s GC ability, I collected results from the major stage races for the last 30 years. Riders were awarded points based on finishing positions with five different (arbitrary) scales for groups of races (A: Tour de France, B: Vuelta/Giro, C: Switzerland/Dauphine/Paris-Nice, D: races like Tour of Catalonia, Tirreno-Adriatico, Tour of Basque Country, etc, E: all other races of significance to predicting a grand tour GC). That final point is critical; stage races without apparent ability to transfer to winning a grand tour GC (Tour Down Under, Four Days of Dunkirk, old Dubai Tour) were not considered.

Which races are most strongly correlated with Tour de France results?

I matched all rider results (for their careers) in GC races with all of their Tour de France results (for their careers). Eg, Chris Froome’s 2018 Giro victory is matched with each of his Tour de France entries. After filtering for riders with 15+ GC races and at least one GC victory, I grouped by race (Giro, Vuelta, etc) and ran a Spearman correlation of GC results in that race with GC results in the TDF. Spearman correlations measure how strongly one ranking is associated with another ranking.
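For reference, a Spearman correlation is the Pearson correlation of the two rank vectors, with tied values sharing their average rank. A minimal, dependency-free Python sketch; the placings are hypothetical, standing in for one race’s GC results paired with the same riders’ TDF GC results:

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)


# Hypothetical pairs: (GC place in race X, GC place in a TDF) for five riders.
race_x = [1, 3, 2, 10, 5]
tdf = [2, 4, 1, 12, 6]
print(round(spearman(race_x, tdf), 3))  # 0.9
```

In practice `scipy.stats.spearmanr` computes the same statistic, also using average ranks for ties.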

Most strongly associated with TDF results:

Spearman correlation of each race’s GC results with Tour de France GC results

Results in the Tour de France (in other years) are most predictive of results in other TDF years. Closely following are the other two grand tours and the two most important Tour de France warm-ups (Dauphine/Switzerland). The strong correlation of the Tour of California is surprising, but that race was dominated by American riders like Floyd Landis and Levi Leipheimer in its early years, and two TDF winners (Egan Bernal and Bradley Wiggins) have also won it.

The other major week-long races follow at various degrees of moderate to weak correlation. It’s clear that winning a grand tour is something distinct from winning a week long or five day long race. The non-existent correlation of the Tour de l’Avenir (the U23 Tour de France) is interesting; Egan Bernal’s 2019 Tour de France victory was the only one won by a Tour de l’Avenir winner since it became a U25 only race in 1992.

Calculating GC Rankings

The points scales were designed with A races awarding 25 points to the winner, B races 20 points, C races 12 points, D races 10 points, and E races 8 points. The value of points decays over time with a weight equal to 1 / (days_since + 730), within a five-year window (meaning results since the 2014 Tour de France would be considered when predicting the 2019 Tour de France).

I looked at numerous calculation methods (average points per race, total points, best performance, ignoring weighting, etc), but settled on counting just the top five results for each rider. Eg, for Egan Bernal entering the 2019 Tour de France his top five results were #1 in 2019 Switzerland, #1 in 2018 California, #1 in 2019 Paris-Nice, #3 in 2019 Catalonia, and #2 in 2018 Romandie. Bernal ranked 3rd best behind Geraint Thomas and Vincenzo Nibali and ahead of Jakob Fuglsang and Nairo Quintana going into last year’s Tour.
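A minimal Python sketch of the scoring described above, assuming each result is stored as (race date, race group, finishing place). Only the winner’s points per group are given, so `result_points` uses a made-up linear placing scale, and the five-year window is approximated as 5 × 365 days; `gc_score` and the other names are hypothetical, not the code behind these rankings.

```python
from datetime import date
from typing import Iterable, Tuple

# Winner's points by race group, as stated above.
WINNER_POINTS = {"A": 25, "B": 20, "C": 12, "D": 10, "E": 8}
WINDOW_DAYS = 5 * 365  # five-year window (approximate; ignores leap days)


def result_points(group: str, place: int) -> float:
    """Hypothetical placing scale: full points for 1st, falling linearly to 0 at 11th."""
    return WINNER_POINTS[group] * max(0, 11 - place) / 10


def gc_score(results: Iterable[Tuple[date, str, int]], as_of: date, top_n: int = 5) -> float:
    """Sum of a rider's top_n time-weighted results before `as_of`.

    Each result is (race_date, group, place); weight = 1 / (days_since + 730).
    """
    weighted = []
    for race_date, group, place in results:
        days_since = (as_of - race_date).days
        if 0 <= days_since <= WINDOW_DAYS:  # skip future and too-old results
            weighted.append(result_points(group, place) / (days_since + 730))
    return sum(sorted(weighted, reverse=True)[:top_n])


rider = [
    (date(2019, 6, 23), "C", 1),   # hypothetical C-race win 13 days earlier
    (date(2018, 7, 29), "A", 15),  # 15th scores 0 on the hypothetical scale
    (date(2013, 7, 21), "A", 1),   # outside the five-year window
]
print(gc_score(rider, as_of=date(2019, 7, 6)))
```

Keeping only the top five weighted results rewards a rider’s best recent form without penalizing a heavy or light race schedule.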

Tour de France Competitiveness

Last year’s Tour was one of the most wide-open, with the fourth-lowest point total for the #1-ranked rider since 1992 (only 2007, 2006, and 1999 were lower). The previous year’s Tour was the peak in this regard, with Chris Froome coming off four wins in five years and holding the other two grand tour titles. His total eclipsed those of Miguel Indurain in 1993, 1994, and 1995 and Alberto Contador in 2011.

The most competitive Tour in terms of the average points for the top 15 riders was 2016. All of the top 10 riders in GC points at the time of that Tour started the race (though this did not turn out to be a particularly competitive race with both Nibali and Contador performing poorly).

Top five riders by GC points on each Tour de France startlist

Above is a graph of the top five riders on each TDF startlist in terms of their GC points ranking. The peaks of Indurain, Armstrong, Contador, and Froome are visible, as well as the weaker transition periods that accompanied Armstrong’s and Indurain’s departures from the sport.

Should the 2020 Tour de France actually run as scheduled, we would be in store for another very competitive race with no clear favorite: Froome, Bernal, Primoz Roglic, and Thomas are all within 10 GC points of each other, similar to the competitive situation last year.

Predicting Tour de France success

Predicting podium success even for the best rider entering each year has not been a slam dunk. Since 1992 only 18 of 28 ‘best GC riders’ have finished on the podium – though the list of failures has been mostly among the weaker ‘best GC riders’ during those transition periods – including Alex Zulle in 1997 and 1998, Damiano Cunego in 2006, and Alexandre Vinokourov in 2007.

A logistic regression model predicting podium success using 1) the natural log of the GC rank entering the race, 2) the best GC performance for each rider, and 3) whether the rider had ridden the Giro d’Italia found each to be a significant predictor.

The coefficients showed a rider who entered with #1 ranking, had won the previous year TDF, and had ridden the Giro (the situation Chris Froome found himself in in 2018) would be expected to finish on the podium about 61% of the time. If not riding the Giro, that number would be 75%.

A rider in Egan Bernal’s position going into 2019 (#3 ranking, best finish having just won the Tour of Switzerland, and not having ridden the Giro) is predicted for about a 32% chance at the podium.
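The two probabilities quoted for Froome’s 2018 situation are enough to back out the size of the Giro term: in a logistic regression the other predictors are identical between those two cases, so they cancel on the log-odds scale. A minimal Python check (function names are illustrative; because the quoted probabilities are rounded, the implied coefficient is approximate):

```python
import math


def logit(p: float) -> float:
    """Log-odds of probability p."""
    return math.log(p / (1 - p))


def inv_logit(z: float) -> float:
    """Probability from log-odds z."""
    return 1 / (1 + math.exp(-z))


# Froome 2018 scenario: 61% having ridden the Giro vs 75% without it.
giro_coef = logit(0.61) - logit(0.75)
print(round(giro_coef, 2))  # -0.65
print(round(math.exp(giro_coef), 2))  # 0.52, i.e. the Giro roughly halves the odds

# Hypothetical: the same Giro penalty applied to Bernal's 32% scenario.
print(round(inv_logit(logit(0.32) + giro_coef), 2))  # 0.2
```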

The limitations of this model are obvious:

  1. It doesn’t take team orders into account. Roberto Heras ranked 3rd, 2nd, and 4th in GC ranking in the 2001-2003 years riding in support of Lance Armstrong while never sniffing the podium. Also, riders who enter aiming for stages after big efforts in the Giro (eg, Simon Yates and Vincenzo Nibali in 2019) will be overrated.
  2. Abandons and DNFs are considered equally alongside finishes outside the top 3. Who knows what Chris Froome’s form would have been if not knocked out in 2014, but all the model sees is the #1 rider in the world defending his 2013 victory as not finishing on the podium.
  3. Young riders are probably underrated; Ullrich entered the 1996 Tour de France with no GC success in his career (though he already had a 3rd place in the World Championship Time Trial at age 20), yet finished a comfortable 2nd behind team leader Bjarne Riis.

Comparing with Archived Odds

Sports Odds History has pre-race odds available dating back to 2009 from Westgate sportsbook.

The top three favorites (ML odds):

2019: Thomas (+225), Bernal (+550), Fuglsang (+550)

2018: Froome (+150), Porte (+400), Quintana (+800)… Thomas was +1400

2017: Froome (+125), Porte (+175), Quintana (+600)

2016: Froome (+110), Quintana (+200), Contador (+450)

2015: Froome (+175), Quintana (+225), Contador (+350)

2014: Froome (-111), Contador (+150), Nibali (+900)

2013: Froome (-154), Contador (+250), Joaquim Rodriguez (+1800)

2012: Wiggins (+110), Evans (+200), Menchov (+1600)

2011: Contador (-167), A. Schleck (+210), Evans (+2000)

2010: Contador (-200), A. Schleck (+700), Armstrong (+800)

2009: Contador (+100), Armstrong (+250), Leipheimer (+400)
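Moneyline odds convert to an implied win probability in a standard way: +X means a 100 stake returns X in profit, so the implied probability is 100 / (X + 100); -X means a stake of X returns 100, so it is X / (X + 100). A minimal Python sketch; note these are odds to win (not to podium), and the bookmaker’s margin means the implied probabilities across a full field sum to more than 100%.

```python
def implied_probability(moneyline: int) -> float:
    """Convert American (moneyline) odds to the bookmaker's implied probability."""
    if moneyline > 0:
        return 100 / (moneyline + 100)
    if moneyline < 0:
        return -moneyline / (-moneyline + 100)
    raise ValueError("moneyline odds cannot be 0")


# 2019 favorites from the list above.
for rider, ml in [("Thomas", 225), ("Bernal", 550), ("Fuglsang", 550)]:
    print(rider, round(implied_probability(ml), 3))
# Thomas 0.308
# Bernal 0.154
# Fuglsang 0.154
```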

The podium model’s top pick matches the bookmaker favorite every year except 2015 (the model ranked Contador #1, Froome #2, Nibali #3, Quintana #4) and 2016 (Nibali #1, Froome #2, Contador #3, Quintana #4).

Used as a diagnostic tool to judge which riders should have been considered favorites in each race, this GC model has value. As a forward-looking predictive method, it is certainly less useful than the current bookmaker prices ahead of each race.

Climbing Recap thru Pyrenees

Before the GC battle recommences on Thursday with three straight Alpine stages, let’s recap the climbing battles through the four mountain stages so far (really three that the GC contenders actually contested). Below is a chart of the top climbers so far in terms of time lost to Thibaut Pinot in each of the four mountain stages. Remarkably, Pinot has lost time to just one GC contender on one stage: when Geraint Thomas took two seconds in stage 6.

Top climbers: time lost to Pinot in each mountain stage through stage 15

The likes of Adam Yates, Fabio Aru, Bardet, Quintana, and Dan Martin can’t even fit on the chart. Those five lost 37 seconds to Pinot on average on La Planche des Belles Filles, and each has lost at least 2.5 minutes to Pinot in each of stages 14 and 15. The top 11 on this graph averaged losses of 18 seconds to Pinot on La Planche des Belles Filles, and only Uran (stage 15) and Porte (stage 14) have lost more than two minutes to Pinot on stages 14 and 15.

So far, Landa has been next best. In stage 14 he was part of the most select group on the Tourmalet until Pinot broke away with less than 500m to the line. In stage 15, he broke away on the penultimate climb and only Pinot could catch him on the final climb. And in stage 6 he launched an attack which kept him away for a few kilometers and then wasn’t dropped at any point by the attacks in the final kilometer on that stage. He has a competitive Giro in his legs which may hamper him in the final week, but has shown the ability to attack from the pack multiple times.

Both Buchmann (3rd best) and Bernal (4th best) had great results in their tune-up races in June. Buchmann finished even with Jakob Fuglsang ahead of all other GC contenders in the only true mountain stage, while Bernal dominated in Switzerland where he was easily the top climber in three mountain stages.

This presentation strips out the impacts of the time trials, the crosswinds, and the punchy breakaways Alaphilippe launched in stages 3 and 8. It shows Alaphilippe, Thomas, and Kruijswijk have been almost dead-even across three mountain contests. Each has looked shaky once: Kruijswijk lost around 30 seconds in stage 6, Thomas in stage 14, and Alaphilippe in stage 15.

Projecting losses

The plots below show the time each rider lost to the leader (Pinot on both days) against an estimate, from GPS data and the broadcast, of how far from the finish the rider lost contact with the leader. The point is to measure time losses per kilometer; this isn’t designed to predict, for example, Romain Bardet’s losses after being dropped on the Col du Soulor in stage 14.

Stage 14: time lost to Pinot by distance from the finish when dropped

Stage 15: time lost to Pinot by distance from the finish when dropped

In stage 14 up the Tourmalet, the GC men lost about 24 seconds per kilometer to Pinot and the final selection (Landa, Bernal, Alaphilippe, Buchmann, Kruijswijk). Riders began to be dropped almost immediately, with Yates and Martin going before 10km left, Quintana and Aru around 10km left, and more when FDJ blew up the group with a bit over 5km to go. The Tourmalet is at least 6-7% for every one of its last nine kilometers, so time losses were close to linear; Valverde was the largest deviation (the regression line predicts about 90 seconds, but he lost a bit over 50 seconds).

Stage 15 was both a shorter climb and less steep in the finishing kilometers. The GC men lost about 14 seconds per kilometer to Pinot. The linear fit (with the intercept forced to 0) doesn’t fit this data nearly as well as for stage 14. This is probably because Prat d’Albis is very steep in the middle and less so in the closing kilometers (eg, the stretch between 9km and 5km to go where Pinot attacked averages over 9% vs 5% in the final two kilometers). So Bernal and Buchmann lost only ~5 seconds/km vs more than double that for those jettisoned around 6-8km to go.
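Forcing the intercept to 0 encodes the assumption that a rider dropped right at the line loses nothing, and the least-squares slope then has a closed form, sum(x·y) / sum(x²). A minimal Python sketch; the three riders are made-up numbers, not the stage 14 or 15 data:

```python
from typing import Sequence


def slope_through_origin(km_from_finish: Sequence[float], seconds_lost: Sequence[float]) -> float:
    """Least-squares slope b for seconds_lost ≈ b * km_from_finish (no intercept).

    Minimizing sum((y - b*x)^2) over b gives b = sum(x*y) / sum(x*x).
    """
    sxy = sum(x * y for x, y in zip(km_from_finish, seconds_lost))
    sxx = sum(x * x for x in km_from_finish)
    if sxx == 0:
        raise ValueError("need at least one non-zero distance")
    return sxy / sxx


# Hypothetical riders dropped at 2, 5 and 9 km to go.
km = [2.0, 5.0, 9.0]
lost = [50.0, 118.0, 218.0]
b = slope_through_origin(km, lost)
print(round(b, 1))  # 24.1 seconds lost per km
```

Because the fit has no intercept, a climb that eases near the top (like Prat d’Albis) shows up as systematic misses: riders dropped late sit below the line, and riders dropped on the steep middle sit above it.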

Prat d’Albis climb profile

Projecting forward, and splitting the difference between these two rates of loss, maybe we can figure on 20 seconds/km lost on mountaintop finishes. However, stage 18 up and down the Galibier finishes with 20km of downhill, which may allow Alaphilippe to claw back some time if he’s dropped earlier. Stage 19 ends with an odd category 1 climb with steep ramps to start, a gentler middle, a steep end, and then a flat-ish few kilometers to the finish. Stage 20 is the only true summit finish, but even that climb has numerous flatter sections. Alaphilippe’s road to winning in Paris may only require him to survive until the final kilometers of these climbs if none of the five chasers has a big attack in them.