Stage 6 Tour de France Preview

This is an interesting hilly stage that could play out in several different ways, after a start to the Tour that has largely followed the script.

Headline Numbers

These are outputs of machine learning models trained on top-level races in the last six years.

Climbing difficulty: 5.4 (on a scale where 0 = flat and 20+ = high mountains)

Probability of morning break winning: 32%

Probability of ending in bunch sprint: 8% (20 or more riders finishing together in the winning group)

Length: 220 kilometers, the longest stage of this year’s Tour

Similar Stages

Recent similar stages in terms of parcours and uphill finish include Stages 1 and 2 in 2021, Stage 3 in 2019, Stages 5 and 6 in 2018, Stage 3 in 2017, and Stage 2 in 2016. Each had a hilly parcours without any significant climbing, but ended in a fairly short uphill finish. Those seven stages were won three times by Sagan, twice by Alaphilippe, and once each by MVDP and Dan Martin. So these stages tend to be won by ultra-elite punchy riders.

However, the yellow jersey was in play, at least hypothetically, for a number of riders entering those stages, which kept their teams engaged in A) limiting who got away in the morning breakaway and B) trying to pull that break back. In those seven stages, between 4 and 7 riders got into the morning breakaways, and the groups were mostly riders from second-tier teams, with Tim Wellens in 2019 being the best of the bunch.

In this case, only four other riders are within 30 seconds of Wout Van Aert. Of them, Lampaert and Boasson Hagen don’t have a chance of taking the jersey from Van Aert out of the peloton tomorrow. Powless perhaps has a very outside shot. Pogacar probably has no interest after his efforts on the cobbles and with a mountaintop finish looming on Stage 7. That leaves no team particularly interested in chasing to get into yellow.

In terms of the stage win, this is not a finish for sprinters. The seven comparison stages were contested by final groups of 40 or fewer riders, where Sagan, Michael Matthews, and Sonny Colbrelli were the most “sprinter-like” riders in the bunch. Translating that to 2022, Ewan, Groenewegen, Philipsen, and Jakobsen probably have no shot tomorrow. That means four fewer teams who will be interested in chasing things down; rather, they’ll want to fire riders up the road in the breakaway.

That leaves almost every team outside the six with GC favorites (UAE, Jumbo, INEOS, Movistar, Bora, and FDJ) interested in making sure they are represented in the breakaway. In the first week of the last four Tours it has been extremely rare for a double-digit number of riders to get into the morning breakaway. We’ve seen it only twice: once on Stage 6 in 2019 (summit finish) and once on Stage 7 in 2021. Large first-week breakaways are rare, but this could be the day.

How it could play out

I would amend the model’s estimate of the breakaway’s win probability upward to something like 80%. The remaining 20% covers Jumbo-Visma stubbornly bringing the breakaway back to keep Van Aert in yellow or chase another stage win, or something odd happening with the morning break so that a lot of strong teams don’t get someone in it.

Stage 7 from last year might be the template for how the stage goes. That was also the longest stage in the race, after a first week where two hilltop finishes and a time trial had established a hierarchy. It had more climbing overall and less at the very end, but not dramatically so. In the end, nearly 30 riders got away in the first 50 kilometers of the stage, including major one-day/classics riders and puncheurs like Asgreen, Van Der Poel, Van Aert, Mohoric, Stuyven, and Kragh Andersen. The GC group motored in five minutes down, still 30 riders strong.

My model, which is designed to predict which riders will get in the breakaway based on tomorrow’s parcours, predicts the following chances (it is trained on past breakaways, so it considers both desire and ability to get into the break). Altogether, this model predicts 9 riders in the break. So if we see something like 18 riders, we could multiply these chances by 2. I’ve italicized anyone probably on team duties.

15%: Wellens, MVDP
14%: Gougeard
13%: Bonnamour
12%: Cort, Politt, Perez, Gallopin
11%: Storer, Clarke, Dewulf
10%: Rolland, Teuns, Oliveira, Bouchard
9%: Van Moer, Lafay, Mohoric, Soler, Gilbert, Simmons
8%: Goossens, Tratnik, Mollema, Sanchez, Houle, Van Der Hoorn, Benoot, Turgis, Martin, Kamna, Woods, Dillier
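The rescaling described above (doubling each chance if the break turns out twice as large as the model expects) can be sketched in a few lines. This is only an illustration: the function name and the cap at 100% are my own assumptions, and only a few riders from the list are included.

```python
def scale_break_chances(chances, expected_size, observed_size):
    """Scale each rider's break probability by observed/expected break size.

    The cap at 1.0 is an assumption added so a scaled value stays a valid probability.
    """
    factor = observed_size / expected_size
    return {rider: min(1.0, p * factor) for rider, p in chances.items()}


# A few model chances copied from the list above.
chances = {"Wellens": 0.15, "MVDP": 0.15, "Gougeard": 0.14, "Bonnamour": 0.13}

# The model expects 9 riders; suppose 18 actually get away.
scaled = scale_break_chances(chances, expected_size=9, observed_size=18)
for rider, p in scaled.items():
    print(f"{rider}: {p:.0%}")
```

A simple linear rescale like this is only reasonable while the individual chances are small; for riders already near 50% it would overstate things, hence the cap.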

Notably on this list there isn’t a Quick Step rider as Cattaneo, Honore, Bagioli, and Asgreen are all 6-7%. Surely they will be represented in any move which has a chance to stick.

In addition, it’s possible we see non-traditional breakaway specialists try to get into the move. This could be someone like Sagan, Michael Matthews, or even Jasper Stuyven if he gets any freedom from the peloton.

Combining this intuition about the break’s chances with support from the numbers, I have the following probabilities for tomorrow, given an 80% chance of the morning breakaway winning:

Rider | Win Probability
Van Der Poel Mathieu | 21.2%
Wellens Tim | 4.2%
Mohoric Matej | 4.0%
Woods Michael | 3.5%
Matthews Michael | 3.1%
Bonnamour Franck | 2.5%
Teuns Dylan | 2.1%
Pogacar Tadej | 1.9%
Roglic Primoz | 1.9%
Van Aert Wout | 1.9%
Martin Guillaume | 1.9%
Stuyven Jasper | 1.8%
Mollema Bauke | 1.8%
Ciccone Giulio | 1.6%
Vuillermoz Alexis | 1.6%
Asgreen Kasper | 1.6%
Bagioli Andrea | 1.6%
Lutsenko Alexey | 1.5%
Honore Mikkel Frolich | 1.3%
Barguil Warren | 1.2%
Guerreiro Ruben | 1.1%
Sagan Peter | 1.0%
Kron Andreas | 1.0%
Bardet Romain | 0.9%

MVDP at 21% is pretty rich given his mediocre form so far, but given his overall ability and propensity to get into breakaways I can make sense of it. I doubt that 2.5% for Bonnamour to get his first pro win is close to right.

GC in 2022 Tour de France

There are a number of big stories in the general classification of this year’s race. Obviously Pogacar is going for his third straight Tour, all before turning 24. He’ll get another showdown with Primoz Roglic after their tight battle in 2020 and Roglic crashing out of the race in 2021. Jumbo-Visma also has another contender beyond Roglic in last year’s runner-up, Jonas Vingegaard. And, a bit under the radar, INEOS Grenadiers haven’t gone more than three grand tours without winning GC since 2015 (a period during which they’ve won 9 of 21 GC titles).

Pogacar vs Roglic?

Pogacar is the bookmakers’ favorite, significantly, with a price of about 1.77 for/1.90 against at Pinnacle (implied about 52% to win the Tour). Both Roglic and Vingegaard are implied around 12-14% based on odds of about 5.0. Vingegaard has shortened significantly since a strong performance in the warm-up Dauphine. In March, Roglic had roughly two thirds of the pair’s combined win probability, but that has moved to about 50/50, or even advantage Vingegaard, in the last two weeks. So while this race will certainly be billed as Roglic vs Pogacar, Vingegaard is coming in strong enough to have equal odds, which is notable given the team will certainly defer to Roglic a bit.

What percentage of Roglic+Vingegaard win probability does Roglic have?
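For readers who want to see where a number like 52% comes from, here is a minimal sketch of turning a two-way price (for/against) into an implied probability. Normalizing the two raw probabilities so they sum to one is a common way to strip the bookmaker's margin; it is an assumption on my part rather than the exact method used for the figures above, and the function name is hypothetical.

```python
def implied_probability(price_for, price_against):
    """Implied win probability from decimal prices on both sides of a market.

    Each raw probability is 1 / price; dividing by their sum removes the
    bookmaker's margin proportionally (an assumed de-vig method).
    """
    raw_for = 1.0 / price_for
    raw_against = 1.0 / price_against
    return raw_for / (raw_for + raw_against)


# Pogacar's quoted prices: about 1.77 for and 1.90 against.
pogacar = implied_probability(1.77, 1.90)
print(f"Pogacar implied win probability: {pogacar:.1%}")
```

The same function works for any yes/no market where both sides are quoted.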

Of course, the question must be asked whether Pogacar deserves to be such a massive favorite. He is likely the biggest Tour de France pre-race favorite since Chris Froome in 2013 (I’ve seen about equal odds for Froome in 2013 as for Pogacar this year). In fact, that’s still true looking at all grand tours back to that 2013 Tour, so Pogacar is as big a grand tour favorite as we’ve seen in 26 straight races.

If you average each rider’s top 7 GC performances in the last three years, using the Pro Cycling Stats points system, Roglic and Pogacar come out well ahead of the competition, but close to one another. Their 300+ points on this scale is approximately where Roglic stood entering last year’s race and similar to where Chris Froome stood entering the 2017 edition. Froome entered the 2018 Tour at over 400 points on this scale.

Rider | Average PCS Points of Top 7 GC Performances
Primoz Roglic | 333
Tadej Pogacar | 306
Geraint Thomas | 202
Adam Yates | 189
Enric Mas | 184
Jonas Vingegaard | 179
Top 6 riders in 2022 TDF by this method
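The "average of top 7 GC performances" figure is straightforward to compute. The sketch below shows the idea; the points values in the example are invented purely for illustration and are not any rider's real PCS results.

```python
def top_n_average(points, n=7):
    """Average of a rider's n best GC results (by points) in the window."""
    if not points:
        raise ValueError("rider has no GC results in the window")
    best = sorted(points, reverse=True)[:n]
    return sum(best) / len(best)


# Hypothetical points from nine GC results over three years.
example_points = [400, 380, 350, 320, 300, 290, 270, 150, 90]
average = top_n_average(example_points)
print(average)
```

Taking only the best seven means a couple of abandoned or sacrificed races do not drag a rider's number down.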

In fact, whatever way you slice and dice it, Roglic and Pogacar have gained points in GC races at nearly identical rates. So why is Pogacar such a massive favorite?

My hypothesis is that Pogacar has shown himself more capable of putting out truly dominant performances. I’ve generated a quick method to find the most dominant stage race performances in recent years. What I’ve done is strip out riders in the breakaway, and then take the average seconds gained over other top riders in the stage. We’re trying to identify performances like Chris Froome’s multi-mountain raid in Stage 19 of the 2018 Giro where he won by 180 seconds over 2nd place.

Indeed, among grand tour stages since 2018, Froome’s victory in that stage rates #1 with a weighted average of 273 seconds over the chasers. Pogacar’s Stage 8 victory in last year’s Tour ranks #2, and Richard Carapaz on Stage 14 of 2019 Giro ranks #3. That’s a pretty good short-list of dominating efforts – basically multi-climb mountain raids.
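As a rough sketch of the dominance measure described above: drop anyone who was in the breakaway, then take a weighted average of the seconds gained on the remaining top riders. The weights and the example times here are hypothetical, since the exact weighting is not spelled out in this piece.

```python
def dominance_score(rider_time, rivals):
    """Weighted average of seconds gained over rivals who were not in the break.

    rivals: list of (finish_time_seconds, was_in_breakaway, weight) tuples.
    The weights are an assumption (e.g. heavier for stronger rivals).
    """
    eligible = [(t, w) for t, in_break, w in rivals if not in_break]
    total_weight = sum(w for _, w in eligible)
    if total_weight == 0:
        raise ValueError("no eligible rivals to compare against")
    return sum((t - rider_time) * w for t, w in eligible) / total_weight


# Invented stage: times are seconds relative to the rider being rated (time 0).
rivals = [
    (180, False, 3),  # strongest rival, 180 seconds back
    (200, False, 2),
    (240, False, 1),
    (-30, True, 1),   # breakaway rider who finished ahead: ignored
]
score = dominance_score(0, rivals)
print(round(score, 1))
```

Excluding breakaway riders keeps a stage hunter who was let go from distorting the gap between the GC contenders.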

Expanding out from grand tours, Pogacar also has dominant performances (more than 60 seconds gained) on Stage 6 of 2022 Tirreno Adriatico, Stage 20 of 2019 Vuelta, the Stage 20 Time Trial in 2020 Tour de France, Stage 5 of 2021 Tirreno Adriatico, and Stage 9 of 2019 Vuelta. That’s a total of six massive efforts in three years, which won him four races and produced his shock podium at the 2019 Vuelta.

In the same time period, Roglic has just three similar efforts, all in the final week of the 2021 Vuelta on Stages 17, 20, and 21 (and that Stage 20 effort was a group effort with other riders). Roglic just has not shown the ability to produce massive race-winning efforts nearly as often as Pogacar, relying more on a very strong time trial and late attacks on climbs.

Of course, looking physiologically, it’s also possible Pogacar just has more watts available for longer than Roglic. ChronosWatts.com produced a phenomenal article back in March comparing the two riders over their careers, looking at their best climbing performances and the times and watts per kg they produced. I’ve overlaid their two graphs, with Roglic represented by the green trend line and Pogacar in blue.

Frederic Portoleau’s conclusion:

The 2 Slovenians have a very similar level in the mountains for durations of effort of less than 25 minutes. For the long climbs, a small advantage for Pogacar. On a climb like Alpe d’Huez Pogacar must be able to achieve a time of 38min30sec or a little less in the event of maximum effort. Roglic for his part, has the potential to climb Alpe d’Huez in 39 min.

Frederic Portoleau from https://www.chronoswatts.com/news/203/
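To put the quoted Alpe d’Huez estimates on the same percentage footing used elsewhere in this piece, the gap between two climbing times can be expressed as a share of the slower time. This is a small sketch with hypothetical helper names; only the two times come from the quote.

```python
def to_seconds(minutes, seconds=0):
    """Convert a minutes/seconds climbing time to total seconds."""
    return minutes * 60 + seconds


def relative_time_gap(faster_s, slower_s):
    """Time saved by the faster rider as a fraction of the slower rider's time."""
    return (slower_s - faster_s) / slower_s


# Quoted potential: Pogacar 38min30sec, Roglic 39min.
gap = relative_time_gap(to_seconds(38, 30), to_seconds(39))
print(f"{gap:.2%}")
```

Thirty seconds over a roughly 39-minute climb works out to a little over one percent.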

Perhaps you can argue Roglic is faster on the sub-10-minute climbs, which might allow him to steal some time on the finishes of stages 6, 8, 9, and 14, but overall they are even on efforts like those faced on La Planche des Belles Filles on Stage 7 and Peyragudes on Stage 17. Pogacar has the advantage on climbs like Col du Granon (Stage 11), Alpe d’Huez (Stage 12), and Hautacam (Stage 18), at least using these historical values.

The same site has published summaries of the 2020 and 2021 Tours covering the major climbs. Unfortunately they don’t include the La Planche des Belles Filles time trial in 2020 or the full Ventoux ascent in 2021, where Pogacar was dropped by Vingegaard and lost 40 seconds. They also include non-competitive climbs like Stages 6/16 in 2020 where GC riders were not riding full gas. Looking at the summary Average Standard Watts, Pogacar beat out Roglic by maybe 0.5% in 2020 and Vingegaard by about 2% in 2021. The missing data works against us here, but using their times in lieu of power estimates, Pogacar rode 8% faster than Roglic on La Planche des Belles Filles and Vingegaard rode the full Ventoux ascent 1% faster than Pogacar. Combining those values with the other climbs says Pogacar has been about 1.5% better over the last two Tours. That seems like enough to call him a clear favorite.

Teammates

The Pogacar vs Jumbo-Visma battle won’t just be confined to those three riders; depending on tactics we could see one of Jumbo-Visma, UAE Team Emirates, or even INEOS Grenadiers try to control the race by leveraging their teams. In fact, we could see this early on potentially windy stages like 2, 3, and 4 or the cobbled stage 5.

I’ve plotted team quality with time trial strength on the x-axis and classics/one-day strength on the y-axis. This might give us an indication of who is best set up to support their riders on the flat or hilly days where wind or cobbles are in play.

Jumbo-Visma is the clear leader here, as they have very strong time trial riders supporting Roglic, like Van Aert and Vingegaard, while also having strong classics riders like Laporte and Benoot. UAE is one of five strong teams behind Jumbo-Visma, along with INEOS, Quick Step, Bahrain, and BORA. Advantage Jumbo-Visma, but this isn’t a chasm like the one between Jumbo and Movistar.

Moving to the mountains, during the last two Tours there’s been a lot of talk about Pogacar’s team not being strong enough to support him, while Jumbo-Visma has been seen as a super team with multiple GC contenders lining up to support Roglic. Measuring a climbing domestique’s ability to support GC riders is still definitely not a solved problem, but I’ve tried leveraging my rider ratings which identify how good riders are at racing certain parcours based on their finishing position. Lower values below indicate better expected finish positions across the top 4 support riders on each squad.

Team | Average of #2-5 Climbers | Team Climbing Rank
2020 Jumbo-Visma | 12.4 | 1st
2020 UAE | 15.6 | 4th
2020 INEOS | 15.8 | 5th
2021 Jumbo-Visma | 11.8 | 3rd
2021 INEOS | 11.0 | 1st
2021 UAE | 16.5 | 4th
2022 Jumbo-Visma | 9.7 | 1st
2022 UAE | 12.8 | 2nd
2022 INEOS | 15.5 | 4th
Expected climbing performance rank by top 4 climbing domestiques
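The "Average of #2-5 Climbers" column can be reproduced with a short helper: sort a team's climbing ratings from best (lowest) to worst, skip the best rider (treated as the leader), and average the next four. The ratings in the example are made up for illustration; they are not my actual rider ratings.

```python
def support_climber_average(ratings):
    """Average expected finish position of a team's #2 through #5 climbers.

    ratings: expected finish positions on climbing parcours (lower is better).
    The best-rated rider is assumed to be the leader and is excluded.
    """
    ordered = sorted(ratings)
    support = ordered[1:5]
    if len(support) < 4:
        raise ValueError("need at least five rated riders")
    return sum(support) / len(support)


# Hypothetical eight-rider squad.
team_ratings = [3.0, 8.0, 9.0, 10.0, 12.0, 25.0, 40.0, 60.0]
team_average = support_climber_average(team_ratings)
print(team_average)
```

Ignoring riders six through eight reflects the fact that flat-stage helpers are not expected to be around when the climbing gets serious.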

In 2020, Jumbo-Visma had by far the best climbing domestiques to back up Roglic and they rode a defensive race which delivered Roglic to the final time trial with a minute advantage. Roglic’s teammates could only watch as Pogacar made up the difference and won the Tour.

Last year, Jumbo-Visma again had a wide advantage over UAE, though INEOS was strongest. But a strong team mattered less after Pogacar’s incredible first week, and UAE had an easy job protecting a five-minute lead after nine stages.

This year, Jumbo-Visma will again have an advantage over UAE, but only because of how much stronger their lineup is this year. Both squads have improved vs 2020 and 2021. Sepp Kuss is likely the best pure climbing domestique in the race – ranking 14th in my climber rating – which will allow Jumbo-Visma to have something like three of the final 15 riders in the lead group. UAE added veterans Marc Soler and George Bennett over the offseason which should give Pogacar’s team something like 5 riders in the last 40 riders in the lead group compared to just two in 2021.

Other Contenders

Based on betting odds and making reasonable assumptions about where the vig is on the GC winner market, books are pricing Pogacar, Roglic, and Vingegaard at something like 80% for one of them to win. That leaves about a 20% chance of a big surprise whether from a former winner like Geraint Thomas, a perennial contender like Yates or Quintana, or one of the younger crowd of podium contenders like Vlasov or Enric Mas.

Contenders other than Roglic, Pogacar, Jonas | Implied Probability of Winning
Geraint Thomas | 3%
Daniel Felipe Martinez | 2.5%
Aleksandr Vlasov | 2%
Ben O’Connor | 1.5%
Enric Mas | 1%
Jack Haig, Damiano Caruso, Adam Yates | <1%
Jakob Fuglsang, Romain Bardet, Nairo Quintana | <1%
Alexey Lutsenko, David Gaudu, Rigoberto Uran | <1%

This gives INEOS perhaps a 6-7% chance of winning their first Tour in three years. Thomas has just won the Tour de Suisse, one of the two big warm-up races, but only after the favorite Vlasov left with a positive Covid test. Martinez had an incredible spring with a win in the Tour of the Basque Country and podiums at two other big races, but looked undercooked at the Suisse warm-up and has a best grand tour GC result of just 5th. The final INEOS rider, Adam Yates, ranks as the third-best performing rider on climbing stages in the race, but has just two 4th places in his GC career, largely due to a poor time trial and a big drop-off in performance in the later stages of races.

Of the remaining riders, Mas, O’Connor, and Haig will likely be done in by the 40km time trial on stage 20 where they could easily lose two minutes plus to the Slovenians/Vingegaard. O’Connor was the final rider dropped by Roglic/Vingegaard on the final stage of the Dauphine tune-up race and while he finished 4th last year, he benefitted from gaining 6.5 minutes on other GC riders in a breakaway and likely wasn’t the 4th best rider in the race.

Vlasov has a string of strong week-long GC performances in the spring including a massive win in the Tour de Romandie mountain time trial, but he left Tour de Suisse with Covid. If he’s back on form he has a decent podium chance as his team support ranks 3rd best in the mountains with a strong squad of Austrian/German climbers.

Sprinters in 2022 Tour de France

One of the biggest stories leading into the 2022 Tour de France was whether Quick Step would select young sprinting star Fabio Jakobsen or 34-time Tour de France stage winner Mark Cavendish to lead them on sprint stages. That was resolved today with Jakobsen’s selection. That significantly clears up what should be a very compelling sprint battle between two young stars, Jakobsen and Alpecin’s Jasper Philipsen, and a host of veterans including Caleb Ewan, Dylan Groenewegen, Peter Sagan, and Wout van Aert.

Who are the best sprinters?

I’ve written this year about my Bunch Sprint Model, which evaluates sprinting success based on finishing position solely in sprints a rider contests, while also considering the strength of the opposition a rider sprints against (again counting only those opposing sprinters who contested the sprint). You can read more about methodology and results at this link. Think of this model as looking to identify the best sprinters if they all had a chance to sprint against one another.

Back in February this model loved Fabio Jakobsen due to a very high hit rate in sprints he actually contested. Since then, Jakobsen has continued to sprint at a high level with six wins on a variety of parcours, while Jasper Philipsen performed very well at UAE Tour and then hasn’t done much since. This model evaluates the two of them as neck-and-neck on top of the sprinting world.

Behind them, the model rates Cavendish 3rd and Olav Kooij 4th. Cavendish failed to be selected, while Kooij also couldn’t earn selection on Jumbo-Visma’s GC-focused squad. Caleb Ewan (5th) and Wout Van Aert (6th) are the other two elite sprinters at this year’s Tour. Van Aert has only participated in a bunch sprint seven times in 2022, but six have been podiums.

Further down the list are veterans like Alexander Kristoff, Dylan Groenewegen, Peter Sagan, and Mads Pedersen. Kristoff landed Stage 1 and the yellow jersey in 2020, but hasn’t won a World Tour sprint since. Sagan has had multiple Covid bouts, but finally landed his first win in the Tour de Suisse a few weeks ago. Pedersen might be more of a factor on the more classics-like finishes, as three of his five 2022 victories have come on either uphill finishes or finishes with a small climb right before the line.

Groenewegen has been a bit in the wilderness due to his suspension, being eclipsed by younger riders at Jumbo-Visma, and his subsequent transfer to Bike Exchange. He’s won five times this year, but has only a single podium finish in World Tour sprints. He’s actually won his last three contested sprints across three sub-World Tour races, but was dropped on several climbs in the Dauphine and left that race without contesting a sprint.

The only other sprinters it makes sense to mention are Team DSM’s Alberto Dainese and Bike Exchange’s second sprinter Michael Matthews. Dainese took a shock victory on Stage 11 of the Giro, but doesn’t have another finish better than 5th in a sprint all year. Matthews is really more of a tough-parcours sprinter at this point in his career, as his only wins since 2020 have come in the one-day Bretagne Classic and on a tough stage of the Volta a Catalunya this spring.

Ewan’s Lotto Soudal Team Changes Strategy

Caleb Ewan’s Lotto Soudal team has a well established approach to grand tours since landing Ewan in 2019. They’ve brought the 6th, 2nd, and 2nd heaviest lineups to the last three Tours de France and 4th, 5th, and 4th heaviest to the 2019, 2021, and 2022 Giros – driven by big engines like Roger Kluge, Jasper De Buyst, and Thomas De Gendt. Beyond the size of Ewan’s teammates, they relied on experienced riders to back Ewan, regularly trotting out lineups where 3-4 riders had ridden 25+ bunch sprints with Ewan in recent seasons. That element will be different in 2022 as while they will again have one of the heaviest starting squads, Ewan’s teammates have very little experience supporting him in sprints. Riders like Kluge, De Buyst, and recent additions Michael Schwarzmann and Rudiger Selig were left out in favor of more classics focused engines like Frederik Frison and Florian Vermeersch. It will be interesting to see if their tactics shift more towards Ewan surfing wheels rather than utilizing a big sprint train.

Philipsen’s Rise

Jasper Philipsen was always a highly touted sprinter, landing three World Tour stage wins before his 23rd birthday, but he was squeezed out of UAE Team Emirates by veteran sprinters and the team’s GC focus around Tadej Pogacar and transferred to Alpecin for the 2021 season. There he has blossomed into potentially the best sprinter in the pro peloton thanks to a massive 2021 season.

His 2021 story was inextricably tied to Mark Cavendish as he was Cav’s main opponent in his breakout Tour of Turkey in 2021 (Cav landed four wins to Philipsen’s two) and again he kept coming up short to Cavendish in the Tour de France where Philipsen reeled off six stage podiums, but couldn’t score a win. Philipsen followed up the Tour with two Vuelta stage wins and four one day race wins, including on cobbled terrain. If the cobbled stage 5 turns out to be less vicious than expected it wouldn’t be a surprise to see Philipsen sprinting for the win as he’s handled similar terrain in the past.

Bike Exchange All in For Groenewegen

Before Bike Exchange’s team announcement there was every possibility of them sending a balanced team to chase GC or stages with Simon Yates, but instead they’ve gone all in on Groenewegen and Matthews: of their announced team, only Nick Schultz is anything of a climber. They will likely have the heaviest lineup at the Tour at 73.75 kg; that would also be the heaviest of any team at the Tour since Lotto Soudal’s 2016 team built around Andre Greipel and his sprint train. Bike Exchange will hope that extra power will allow them to keep Groenewegen up front during potentially windy stages 2, 3, and 4.

Team | Average Weight of Riders in KG
Bike Exchange | 73.8
Alpecin | 73.0
Lotto Soudal | 72.6
Quick Step | 71.5
Bahrain | 70.8

Fabio Jakobsen + Michael Morkov

Michael Morkov’s dominance as a leadout man has been well established in recent years as he’s guided Sam Bennett and Mark Cavendish to back-to-back green jersey wins and six Tour de France stage wins. Of course combining him with Fabio Jakobsen should produce good results. However their success in 2022 has been massive with Jakobsen winning five of seven sprints he has contested with Morkov in the lineup.

It’s difficult to know which are the best leadout riders on Quick Step as the team is just phenomenally well drilled overall, but it seems like Jakobsen has ridden more with the ‘B’ team than the elites. Since last July when Jakobsen started sprinting again he’s ridden most often with Florian Senechal and Bert Van Lerberghe in bunch sprints. On the other hand, Cavendish has been most often deployed with Morkov and Davide Ballerini.

I’ve written in the past about how tough it is to evaluate sprint helpers and that the best guide may just be to look at how teams deploy riders in different races. With a start in 2022, Morkov will now have raced four straight Tours for Quick Step (2019-22) – as well as the 2022 Giro with Cavendish. Compare that record to Senechal and Van Lerberghe. Senechal has ridden just two grand tours (neither the TDF) in that time period, while Van Lerberghe has similarly raced just two grand tours and won’t feature in this TDF either. While Quick Step’s full sprint train might be a bit lighter than past years, combining Jakobsen with Morkov could still produce tons of success.

Van Aert’s Green Jersey Bid

Van Aert appears to be a massive favorite for the green jersey points competition as his odds – even after a minor injury last week – sit at 1.65 (implied at around 56%). He benefits from a race bereft of many true sprint stages (only four are evaluated in the Tour regulations as flat stages: 2, 3, 19, 21) where true sprinters could challenge him, while there are also a lot of classics-esque and medium mountain stages where he should find great success from a reduced peloton.

Year | Stages evaluated as flat by organizers
2022 | 4
2021 | 8
2020 | 7
2019 | 7
2018 | 7

The graph above shows the result of a model which considers whether a rider was able to survive in the group and sprint for the win (finish in the top 25 in a bunch sprint race) depending on the climbing difficulty of the parcours. Something like stage 6 into Longwy or stage 8 in Lausanne would rate a 5.5 on this scale, while the flat stages 2 and 3 would rate <1. Stage 10, which features a long, shallow drag to the finish in Megeve, would rate just off this scale at about 8.5.

Van Aert, Matthews, and Philipsen show fairly consistent ability to stick with the front group as climbing intensifies, but the other sprinters show degraded abilities on tougher terrain – particularly Jakobsen and Groenewegen.

It will be a huge advantage for Van Aert that he has been >85% to survive to the sprint finish regardless of the difficulty of the climbing that day. His ability to survive on those tougher sprint days like stages 4, 6, 8, 13, and 15 and to even get into the break on harder days will make him tough to defeat.

Looking at the last 10 points competitions, we can split up the source of points between finish-line sprints and intermediate sprints. Finish-line points can be accrued by being a great bunch sprinter, while intermediate points can be accrued by getting into breakaways or by tactically out-sprinting opponents on the road. In the last 10 competitions, the green jersey winner ranked 1st in points from finish-line sprints 8 times and 2nd twice. The record on intermediate sprints was more mixed with Kittel, Cavendish, Bennett, and Sagan twice taking green without gathering the most intermediate sprint points. We haven’t recently seen someone take green by dominating intermediate sprints and not being one of the two best on finish-line sprints. Will Van Aert be one of the two best sprinters on finish-line sprints?

If not, the market shows Jakobsen and Philipsen with the best odds among pure sprinters. Jakobsen won the points jersey at the 2021 Vuelta and might look to follow Cavendish and Bennett as Quick Step green jersey winners. Philipsen was fairly close to Jakobsen in the points competition when he abandoned the Vuelta, but he managed only 4th in the green jersey race in the 2021 Tour de France due to hardly contesting intermediate sprints.

A Better Bunch Sprint Model

I introduced a very basic model for rating riders two months ago which simply took the natural logarithm of finishing rank in each race to make the stat Log Rank. At the end of that piece, I introduced a way to model Log Rank over long time periods to find whether riders a) achieve better or worse finishing ranks overall, b) achieve better or worse ranks in bunch sprint finishes, and c) achieve better or worse ranks in races with a lot of climbing. That ranking model does a good job of distinguishing riders who are expected to perform better or worse in bunch sprints, but not a great job at distinguishing truly great from merely good sprinters.
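For reference, the Log Rank stat itself is just the natural logarithm of the finishing position. A minimal sketch follows; the helper names and the sample ranks are hypothetical.

```python
import math


def log_rank(finish_rank):
    """Natural log of a finishing position (1st place gives 0.0)."""
    if finish_rank < 1:
        raise ValueError("finishing rank must be 1 or greater")
    return math.log(finish_rank)


def mean_log_rank(finish_ranks):
    """Average Log Rank over a set of races."""
    return sum(log_rank(r) for r in finish_ranks) / len(finish_ranks)


# Invented results: a win, a podium, a top 10, and a day lost in the pack.
ranks = [1, 3, 10, 120]
for r in ranks:
    print(r, round(log_rank(r), 2))
print(round(mean_log_rank(ranks), 2))
```

The log compresses the back of the field: the difference between 1st and 3rd counts for about as much as the difference between 40th and 120th.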

The issues with that Log Rank model are: 1) it considers all different parcours of races in building the overall impact data point, not just races ending in bunch sprints, 2) it considers all bunch sprints for a rider, even those where a heavier sprinter was jettisoned on a climb and failed to participate in the sprint finish, 3) it considers bunch sprints where a rider was present in the bunch, but was actually helping a teammate (eg, Davide Ballerini often sprints for himself in smaller races, but is in the sprint train for bigger ones), and 4) it doesn’t consider the quality of the sprinters participating alongside each rider in the sprint (ie, the competition on that day may be much reduced by tougher parcours, mechanicals, crashes, or splits in the bunch).

So how do we account for these issues? First, we want to evaluate sprinters based only on bunch sprint finishes. Anything which doesn’t end in a bunch sprint is ignored by this new model. Second, we want to ignore any race for a rider unless they finished with the first group in the sprint AND in the top 25 positions; meeting both conditions indicates they were capable of sprinting. Third, we want to ignore any race where a rider wasn’t the top finisher on the team. Many riders working as a lead-out man can rack up 10th place finishes, which can pollute our understanding of them as sprinters in races where they compete as team leader. And fourth, we consider the cumulative strength of the sprinting field which meets these first three criteria, based on the simple Log Rank model outputs.
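The first three criteria amount to a filter over a race's results. Here is one way that filter could look; the record layout, field names, and example riders are hypothetical and only meant to make the rules concrete.

```python
from dataclasses import dataclass


@dataclass
class SprintResult:
    rider: str
    team: str
    finish_rank: int
    in_first_group: bool  # finished in the front group that contested the sprint


def qualifying_results(results, race_ended_in_bunch_sprint):
    """Return the results that count toward a rider's sprint rating."""
    # Rule 1: only races that end in a bunch sprint count.
    if not race_ended_in_bunch_sprint:
        return []
    # Rule 2: the rider must be in the first group AND in the top 25.
    capable = [r for r in results if r.in_first_group and r.finish_rank <= 25]
    # Rule 3: keep only the best-placed rider from each team.
    best_per_team = {}
    for r in capable:
        current = best_per_team.get(r.team)
        if current is None or r.finish_rank < current.finish_rank:
            best_per_team[r.team] = r
    return sorted(best_per_team.values(), key=lambda r: r.finish_rank)


results = [
    SprintResult("Leader A", "Team A", 1, True),
    SprintResult("Leadout A", "Team A", 10, True),   # teammate placed ahead: dropped
    SprintResult("Leader B", "Team B", 2, True),
    SprintResult("Dropped C", "Team C", 80, False),  # not in the sprint: dropped
]
qualified = qualifying_results(results, race_ended_in_bunch_sprint=True)
print([r.rider for r in qualified])
```

The fourth criterion, field strength, is then computed over whatever this filter keeps.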

Determining strength of sprinting field

How does point #4 above work in practice? Seventeen sprinters in UAE Tour stage 1 on Sunday met these criteria, including the top 13 finishers. My basic Log Rank model predicts the following finishing positions in a generic strong race for those seventeen riders.

Rider | Predicted Rank
Jasper Philipsen | 3.0
Arnaud Demare | 3.2
Sam Bennett | 3.3
Pascal Ackermann | 4.0
Dylan Groenewegen | 4.9
Elia Viviani | 6.9
Mark Cavendish | 7.0
Marijn van den Berg | 9.3
Olav Kooij | 9.6
Marc Sarreau | 10.1
Rudy Barbier | 10.8
Max Kanter | 16.5
Emils Liepins | 27.7
Jonathan Milan | 29.7
Tom Devriendt | 34.5
Michael Schwarzmann | 35.2
Jonathan Canaveral | 47.3
Qualifying sprinters from UAE Tour Stage 1 (2022)

A lot of very talented sprinters were in this race, including seven with an expected finishing rank of 7.0 or better. Compare that to stage 1 of the Tour of Oman, where the top sprinters were Fernando Gaviria (5.0) and Mark Cavendish (7.0), with no one else at a predicted rank better than 10.0.

To determine the cumulative strength of the sprinting field, I just take the reciprocal of each rider’s predicted rank (1 / predicted rank) and add them together. A top sprinter like Bennett or Philipsen will contribute about 1/3 or 0.33 points, while someone with a very low prediction like Canaveral or Schwarzmann will contribute about 1/40 or 0.03 points.
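Applied to the seventeen qualifying riders listed for UAE Tour stage 1, the calculation looks like the sketch below. It assumes the reciprocal is taken of the predicted finishing rank (as the 1/3 example implies), and the function name is my own.

```python
# Predicted ranks copied from the UAE Tour stage 1 table above.
predicted_ranks = {
    "Jasper Philipsen": 3.0, "Arnaud Demare": 3.2, "Sam Bennett": 3.3,
    "Pascal Ackermann": 4.0, "Dylan Groenewegen": 4.9, "Elia Viviani": 6.9,
    "Mark Cavendish": 7.0, "Marijn van den Berg": 9.3, "Olav Kooij": 9.6,
    "Marc Sarreau": 10.1, "Rudy Barbier": 10.8, "Max Kanter": 16.5,
    "Emils Liepins": 27.7, "Jonathan Milan": 29.7, "Tom Devriendt": 34.5,
    "Michael Schwarzmann": 35.2, "Jonathan Canaveral": 47.3,
}


def field_strength(ranks):
    """Sum of reciprocals of predicted ranks for the qualifying sprinters."""
    return sum(1.0 / rank for rank in ranks)


strength = field_strength(predicted_ranks.values())
print(round(strength, 2))

# Contribution of the top-rated sprinter versus the lowest-rated one.
print(round(1.0 / predicted_ranks["Jasper Philipsen"], 2))
print(round(1.0 / predicted_ranks["Jonathan Canaveral"], 2))
```

Because each rider adds 1/rank, a handful of elite sprinters dominates the total while the long tail of outsiders barely moves it.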

The top races for sprinters tend to be the Tour de France, Milano-Sanremo, Paris-Nice, and UAE Tour with cumulative strength of sprinting fields around 3.0 to 4.0 depending on the specific day. World Tour races in general average just under 2.0, with a wide range, while .Pro races average just above 1.0, again with a wide range. The lowest pro races at .1 level tend to average just below 1.0 with hardly any rating better than 1.5.

With that data calculated, it is simple to specify a model using this strength of sprinting field and rider to predict both finishing rank and whether a rider won the sprint. Both of these models find 1) the impact of individual rider on success metric and 2) a potentially non-linear impact of the cumulative strength of sprint field.

To Predict Finishing Rank:
gam(log(finish_rnk) ~ rider + s(strength_sprint_field))

To Predict Win:
gam(win ~ rider + s(strength_sprint_field))

I ran both models for this example on data since the start of 2020, only considering riders who participated in at least 16 sprints meeting the criteria laid out above. This ranged from Wout Van Aert with 19 sprints to Philipsen/Ackermann with 45.

Who is the top sprinter in early 2022?

Both models produce similar results given the data. Fabio Jakobsen is seen as the most likely sprinter to win a given race and the sprinter who will finish with the best finishing position overall. For example, in a typical World Tour level sprint the models predict Jakobsen to win 53% of the time and finish in an average position of 1.9. Wout Van Aert is predicted 2nd in win probability at 44% and 4th in finishing rank at 2.7. Sam Bennett is tied with Caleb Ewan for 3rd in win probability at 38%, but slightly ahead of him for 2nd place in finishing rank at 2.2. Ewan is predicted at 2.6 in finishing rank.

Those four comprise a fairly clear top group, with Jakobsen clearly the #1 sprinter in the world. Behind those four are guys like Philipsen, Groenewegen, Cavendish, and Demare. As a sign of his diminished form in recent years, Peter Sagan ranks outside the top 25 in predicted win probability and 15th in predicted finishing rank.

Fabio Jakobsen

Looking at the data in this way it’s obvious why Jakobsen is the top predicted sprinter while ranking only fifth in the PCS Sprinter Ranking and 14th (!) in my own basic Log Rank model. Jakobsen had three week-long stage races in his comeback from serious injury last year where he didn’t compete as a sprinter. In short, the basic Log Rank model sees a guy who was “awful” at sprinting for a dozen sprints. But, when we restrict just to races where he was the team leader and he was in the sprint pack, the graph below shows he has been dominant.

Jakobsen is winning nearly 70% of his sprints where he is the leader and is contesting the sprint since the start of 2020. That blows everyone else away, with Van Aert and Ewan managing only a mid 40% win rate in that time. Jakobsen has raced lesser competition than guys like Van Aert, Ewan, and Bennett, but he’s dominated that competition.

One of the big stories of this and last cycling season is Mark Cavendish’s return to massive success with Quick Step, including tying the record for career Tour de France stage wins. He has twelve wins since the start of 2021 – including two this season – and easily rates as a top 10 sprinter in the world right now. Because he and Jakobsen race for the same team, only one of them is likely to make the Tour de France team where Quick Step sprinters have been steered to 14 sprint stages in the last five races. Unfortunately for Cavendish, Jakobsen isn’t simply just another top 10 sprinter – he’s the best in the world right now.

Too Many Leaders – Analyzing Team Depth

Pro cycling teams have to juggle a lot of goals: for the season, for a stage race, for an individual race. They also need to juggle ambitions of ~30 riders of varying levels of experience and skill. In most races, teams only ride for a designated leader or maybe 2-3 designated leaders. Based on the parcours and who is performing best, teams decide who are the protected leaders and who will be riding in support in each race.

As we move into a new season, teams have hired new riders and let others go. I had a go at making some basic projections on how teams strengthened or weakened their squad with transfers, age based regression, and natural regression/progression in points earned. One of my caveats in that article was that the projections did not account for team strategies or rider schedules based on transfers. There are only so many leadership positions to go around and teams who hire more leaders are at risk of needing to demote some leaders to support roles in certain races.

In this piece I define “leader” as the top finisher for a team in a race. Of course the top finisher is not always the rider(s) who were designated as the leader at the beginning of the race. However, the top six riders in % of races as leader in 2021 were Nairo Quintana, David Gaudu, Giacomo Nizzolo, Guillaume Martin, Aleksandr Vlasov, and Tadej Pogacar so I think it’s a reasonable proxy.
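To make that definition concrete, here is a minimal Python sketch that marks the top finisher per team per race as the leader and then computes each rider's share of races as leader. The race, team, and rider names are hypothetical placeholders, and the tuple layout is an assumption of this sketch rather than the format of the underlying data.

```python
from collections import Counter

# Hypothetical results: (race, team, rider, finishing position).
results = [
    ("Race A", "Team X", "Rider 1", 3),
    ("Race A", "Team X", "Rider 2", 15),
    ("Race A", "Team Y", "Rider 3", 1),
    ("Race B", "Team X", "Rider 2", 7),
    ("Race B", "Team X", "Rider 1", 22),
    ("Race B", "Team Y", "Rider 3", 4),
    ("Race B", "Team Y", "Rider 4", 9),
]

def leader_counts(rows):
    """Count how often each rider was the best finisher on their team."""
    best = {}  # (race, team) -> (position, rider)
    for race, team, rider, pos in rows:
        key = (race, team)
        if key not in best or pos < best[key][0]:
            best[key] = (pos, rider)
    return Counter(rider for _, rider in best.values())

def leader_share(rows):
    """Share of each rider's race starts in which they led their team."""
    starts = Counter(rider for _, _, rider, _ in rows)
    leads = leader_counts(rows)
    return {rider: leads[rider] / n for rider, n in starts.items()}

print(leader_counts(results))
print(leader_share(results))
```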

How This Plays Out For Teams

A quick example: UAE Team Emirates transferred in five major signings who spent at least some races in 2021 as their team’s leader – sprinters Pascal Ackermann and Alvaro Hodeg and climbers Joao Almeida, George Bennett, and Marc Soler. They transferred out four major riders who spent some races as leaders – sprinter Alexander Kristoff, climbers Joe Dombrowski and David De La Cruz, and puncher Sven Erik Bystrom. Five in, four out. The riders leaving were UAE’s #1 rider on a race day 52 times. The riders coming in were their team’s #1 rider on a race day 76 times. UAE also hired wunderkind climber Juan Ayuso who was the leader in a race – primarily at U23 level – 17 times in 2021. In total, they raced 233 times in 2021 and the riders on their team for 2022 were the #1 rider on their team 288 times – a surplus of 55 races.

We can repeat that same calculation for the other 17 World Tour teams and actually most teams have a surplus; 11 have at least 7% more leaders in their team than 2021 race days, another 5 are within +/- 3%, and only Lotto Soudal (6% fewer) and DSM (13% fewer) fall clearly short of parity. Overall, the surplus is 12% at World Tour and 8% at Pro Tour level. This makes total sense. Teams tend to discard riders who don’t have the capacity to be leaders anymore and hire those that do as a natural progression of the sport. However, some teams are legitimately going to be squeezed for leadership opportunities in 2022 – even if we don’t see Covid related cancellations like the prior two years.

EF Education is probably the most over-subscribed in terms of leaders. They hired riders who were team leaders 102 times in 2021, but got rid of riders who were team leaders just 36 times in 2021 – a surplus of 66 races. Their issues might not be as extreme as represented here as many of their additions come from non-World Tour level teams and/or are developing riders who might need a year before becoming full-fledged leaders. In fact, only 77% of the 2021 leaders on EF came while racing for a World Tour level team (86% is the average for the full World Tour).

All Leaders Aren’t Equal

We need to account for the difference between acquiring a leader like Giacomo Nizzolo (who finished 1st on his team in 60% of races at World Tour level) and one like Marijn Van Den Berg (moving to aforementioned EF Education) who led his team in 45% of races at U23 level. If we arbitrarily assign a weight of 1x for leaders while riding for World Tour teams, 0.67x for leaders while riding for Pro Tour teams, and 0.33x for any other leaders, we can get a better idea of how much competition there will be for leadership roles. At EF, they now rank third with a surplus of about 18%. The World Tour in general averages a 3% surplus by this method.
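The surplus calculation, with and without those level weights, can be sketched in a few lines of Python. The UAE figures (233 race days, 288 leader days) come from the text above; the per-level split in the second example is made up purely to show the effect of the weights, and the function name and dictionary keys are my own.

```python
# Weights quoted in the text; the level labels are names chosen for this sketch.
LEVEL_WEIGHTS = {"world_tour": 1.0, "pro_tour": 0.67, "other": 0.33}

def leader_surplus(team_race_days, leader_days_by_level, weights=None):
    """Roster's 2021 leader days relative to the team's 2021 race days.

    Returns 0.0 at parity, positive for a surplus of leaders.
    With weights=None every leader day counts fully.
    """
    if weights is None:
        total = sum(leader_days_by_level.values())
    else:
        total = sum(weights[level] * days
                    for level, days in leader_days_by_level.items())
    return total / team_race_days - 1.0

# UAE Team Emirates, unweighted: 288 leader days against 233 race days.
uae_unweighted = leader_surplus(233, {"all_levels": 288})

# Hypothetical roster split by the level at which the leader days were earned.
example_roster = {"world_tour": 150, "pro_tour": 60, "other": 45}
example_unweighted = leader_surplus(200, example_roster)
example_weighted = leader_surplus(200, example_roster, LEVEL_WEIGHTS)

print(f"UAE unweighted surplus: {uae_unweighted:.1%}")
print(f"Example: {example_unweighted:.1%} unweighted, {example_weighted:.1%} weighted")
```

In the made-up example the surplus shrinks sharply once lower-level leader days are discounted, which is the same direction of change described for EF Education.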

Doing that weighting shows BORA and Jumbo Visma as the two teams with the most competition for leadership. BORA ranked 4th best in adding talent through transfers per ProCyclingStats and 2nd best at adding talent by my projections. They added climbers Aleksandr Vlasov, Jai Hindley, and Sergio Higuita who combined to lead their team 62 times in 2021, and sprinters Danny Van Poppel and Sam Bennett who combined to lead 41 times. Sprints-wise, they should be fine as they’re also losing Peter Sagan and Pascal Ackermann (47 races as leaders) and Van Poppel has also said he’s switching to support Bennett.

Where BORA will see the squeeze is in general classification and hilly/mountain stage leadership. Just filtering to leadership in hilly/mountainous races, BORA rode 94 races in 2021, while their currently employed riders were leaders of their team in such races 146 times! That’s a greater than 50% surplus – far beyond any other World Tour team.

BORA’s 2022 squad with 2021 data from hilly/mountainous races

Flip that around to flatter/classics races and Jumbo Visma looks to be the team with the most issues with too many leaders. Despite moving star sprinter Dylan Groenewegen onwards, they still face a tight squeeze. They have a surplus of 36% due to adding Christophe Laporte (punchy sprinter), Tosh Van Der Sande (leadout man), and Tiesj Benoot (classics rider). What looks most likely is that those three will simply sacrifice more of their own ambitions to support Wout Van Aert in classics and young sprinters like David Dekker and Olav Kooij in flatter races.

I wrote in my 2022 team projections piece about DSM’s losses in the transfer market. They were especially hard hit in the climbers/GC riders department where they lost Michael Storer, Jai Hindley, Tiesj Benoot, and Ilan Van Wilder. Those four combined to lead in 34 of DSM’s 102 hilly/mountainous races in 2021 and the other transferred out riders combined for 9 more, for a total of 42% of DSM’s races being led by riders leaving the team. They only added a sprinter – John Degenkolb – from a World Tour team, with the rest of their additions coming from lower level squads. Still on the team is Romain Bardet (leader in 63% of hilly/mountainous races he entered), but no one else who led in more than 20% of their hilly/mountainous races. In races without Bardet, they’ll be handing out leadership opportunities to their wide array of young climbing talent and hoping for quick development.

Team DSM’s 2022 squad with 2021 data from hilly/mountainous races

Competition for Leadership vs Depth/Optionality

The flip-side of framing this as an issue of too many leaders is that talented riders who were leaders in smaller teams can now move up and support superstars like Van Aert. The team also has cover in case of injury; for Jumbo Visma, if Van Aert suffers an injury their spring classics season isn’t completely ruined as they can plug in competent classics riders like Benoot or Laporte.

BORA just released their preliminary plans for the three grand tours, but they also have the option within those plans to either leave off a rider who is struggling with form or choose to fully back a rider in strong form for GC. Between Buchmann, Vlasov, Kelderman, Hindley, and Schachmann they have riders who have finished 4th in Tour de France, 4th in the Giro, 2nd/3rd in the same Giro, and won a World Tour stage race in back-to-back years. And that ignores Higuita, Konrad, and Kamna who have won grand tour stages in the last three years. There’s definitely option value there in knowing that you can select the best of that bunch for your main focus in major races.

Projected Team Points for 2022

It’s not really the start of sports season if an analyst doesn’t produce projections, so I’ve whipped up some basic points projections for the cycling World Tour and Pro Tour teams.

A few points on methodology:

  1. I’ve used the PCS Points from ProCyclingStats.com at the rider-level to build these.
  2. I’ve built a very basic model for projecting points which only knows what a rider did the previous season, how old they are (age matters!), and whether their team is at World Tour or not. Only riders competing in the following year on a World Tour or Pro Tour (and ProConti for past years) level were modelled. Obviously what happened in 2019 and 2020 is relevant, but I will leave a more advanced model to next season.
  3. All riders with a Pro Tour or World Tour contract as of start of this January were predicted for 2022, with their projected points aggregated to determine the collective points projections for each team.
  4. That’s it. I did nothing to account for 2021 injuries, changes in how riders would be deployed across races, and any #gainz which may have occurred over the off-season. This is certainly wrong as many riders who missed large chunks of 2021 will race full schedules in 2022 (Caleb Ewan, Remco Evenepoel, etc.) and we’re already seeing injuries to riders like Mathieu Van Der Poel which will affect points earned in 2022.

Rider Level Projections

A model which just considers the previous year’s performance + age and level of team will tend to produce projections which closely match the rankings from the previous year. The same top five from 2021 is projected to be the top five in 2022, while young stars like Remco Evenepoel (17th in 2021 to 11th in 2022) and Ethan Hayter (26th in 2021 to 17th in 2022) are projected to improve their ranking. Older riders are projected to decline with Alejandro Valverde (12th to 37th) and Mark Cavendish (22nd to 32nd) being the sharpest expected declines.

The graph above describes how riders tend to retain points from year 1 to year 2. Peak-age riders tend to regress about 20%, or said another way, they retain about 80% of their points the next year. Riders who score the highest number of points tend to regress more in year 2, while those scoring closer to zero points in year 1 regress less. Younger riders tend to hold onto their points the most (though even here the highest-scoring riders tend to regress more). Older riders fall off significantly, with a rider scoring 1000 points in year 1 at age 35 retaining more like 70% of their points in year 2.

However, these rider projections are fairly dumb; a projection system which ignores Mark Cavendish doing nothing for four seasons before resurrecting his career is probably not going to make great specific projections for riders. Where I hope the projections do well is at the aggregate team-level where the errors of predicting 25-30 individual riders can cancel each other out.

Team Level Projections

Based on individual rider projections/performances, I created three different team totals: 1) 2021 points earned by the team, 2) the 2021 points earned by the riders employed for 2022, and 3) the projected 2022 points earned by the riders employed for 2022. This way I can calculate who hired the best new riders vs who lost the best riders vs who has riders most primed to improve or decline. Delta due to rider development shows how riders are expected to earn points differently in 2022 vs 2021 due to age or regression. Delta due to Transfers shows how teams added either better or worse riders based on 2021 points. Eg, EF Education hired better riders based on 2021, while Lotto Soudal hired worse riders. However, Lotto is expected to improve due to age in 2022.
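The three totals and the two deltas can be sketched as follows in Python. The riders, teams, and point values here are hypothetical placeholders, and the field names are my own; the per-rider projections would come from the projection model described earlier, which this sketch does not reproduce.

```python
from dataclasses import dataclass

@dataclass
class Rider:
    name: str
    team_2021: str
    team_2022: str
    points_2021: float
    projected_2022: float  # output of the rider-level projection model

def team_deltas(riders, team):
    """Split a team's projected change into transfer and development parts."""
    earned_2021 = sum(r.points_2021 for r in riders if r.team_2021 == team)
    roster_2021 = sum(r.points_2021 for r in riders if r.team_2022 == team)
    roster_proj = sum(r.projected_2022 for r in riders if r.team_2022 == team)
    return {
        "points_2021": earned_2021,                      # total 1
        "roster_points_2021": roster_2021,               # total 2
        "projected_2022": roster_proj,                   # total 3
        "delta_transfers": roster_2021 - earned_2021,    # better or worse hires
        "delta_development": roster_proj - roster_2021,  # age and regression
    }

# Hypothetical riders, for illustration only.
riders = [
    Rider("Rider A", "Team X", "Team X", 500, 420),  # stays, expected to regress
    Rider("Rider B", "Team X", "Team Y", 300, 280),  # transfers out
    Rider("Rider C", "Team Y", "Team X", 400, 380),  # transfers in
    Rider("Rider D", "Team X", "Team X", 100, 150),  # young, expected to improve
]

print(team_deltas(riders, "Team X"))
```

The two deltas add up to the total projected change (projected 2022 minus 2021 points), so a team can gain on transfers and still be projected to decline overall.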

World Tour Team Projections for 2022

Of the 18 World Tour teams, I see EF Education improving the most versus 2021 – primarily due to transfers. They signed four top 200 riders in my PCS Points projections (equivalent to a ~top 10 rider on the average team) including Esteban Chaves (projected as their 2nd best rider in 2022). They have only lost three riders who signed with a Pro Tour or World Tour team – headlined by Sergio Higuita (103rd best rider in 2021).

BORA-Hansgrohe is another who looks set to improve significantly due to incoming transfers. They signed the 26th, 29th, 86th, and 88th best riders in my 2022 projections with Sam Bennett hoping to return to his ‘best sprinter in the world’ form. They also added the aforementioned Higuita and Aleksandr Vlasov. BORA loses two strong riders in Peter Sagan and Pascal Ackermann, but they should come out ahead on aggregate.

Lotto-Soudal and Team DSM should improve primarily from internal development of younger riders. DSM is by far the youngest team in the World Tour and has a lot of the early-20s riders who tend to improve significantly. DSM do have to deal with significant losses due to transfers as they were the hardest hit team in percentage terms. Lotto also has a lot of younger riders and does not have any significant regression candidates as their top scoring rider in 2021 was Tim Wellens at only 65th in PCS Points. Caleb Ewan will also presumably have a healthier season (21st and 11th in 2019-20 PCS Points).

My model also projects Quick Step to not lead the World Tour in PCS Points in 2022 (falling just short of UAE Team Emirates by 200 points). They have led in total points accumulated every year since 2013, but the model sees significant riders lost (Joao Almeida in particular ranked 5th in 2021) and significant decline from its current crop of riders (Cavendish in particular). However, the model doesn’t know Quick Step basically got half a season each out of two very promising young riders in Remco Evenepoel and Fabio Jakobsen. My bet is the Belgians manage to pull off their tenth straight #1 ranking by the end of the year.

Among Pro Tour teams, three teams stood out in 2021: Alpecin Fenix out-earned twelve World Tour teams, while Arkea Samsic and Team TotalEnergies earned points like the weakest World Tour squads. The projections see modest regression for both Alpecin and Arkea driven by regression for their top performers and not particularly strong transfers. Team TotalEnergies added Peter Sagan – once the best rider in the world – and should be improved by 30% due to their quality of transfers, but they also are a quite old team which means their gains will probably be more modest in the end.

Pro Tour Projections for 2022

Among others, Kern Pharma is a very young team which should improve due to aging of their riders. They also signed Hector Carretero from Movistar World Tour team who would’ve ranked third on their team in points in 2021. Along with that, they lose only a single rider from 2021.

Uno-X is a team which the projections aren’t particularly high on, but which may be able to improve in ways the models are ignorant of. They are adding Tobias and Anders Halland Johannessen – two elite U23 riders who finished 1st/2nd (Tobias) and 7th/8th (Anders) in the two major U23 races in France and Italy. The U23 points scales on PCS are probably underweighted relative to the difficulty of those races so the Johannessens are better positioned to earn points. Not accounting for new opportunities / lesser opportunities for transferred riders is another blind-spot of my model.

Impact of Aging on Performance

Like all physical competitions, cycling is impacted by aging. Younger riders improve their race craft, get access to better coaching/training, and physically mature. Older riders suffer injuries and physical deterioration and succumb to mental pressures of years spent training, travelling, and competing. Younger riders get faster and smarter. Older riders get slower and more worn-down.

Research on many team sports indicates varied “peak” ages for players between early 20s and 30s for different sports. For example, this Baseball Prospectus piece reviews three different approaches and finds somewhere between 26-28 as peak age for hitters. This CJ Turturo piece examines the impact of aging in NHL hockey and finds age 22 as peak for forwards, age 24 for defenders, and age 27 for goalies (part II of that document). Others in studies quoted by Turturo have found 25 for forwards, 22 for defenders, and 24 for all skaters. In 2013, I found golfers peak in their early 30s, which makes sense as golf is less of a physically demanding sport compared to baseball or hockey. In a later study, I found different aging curves for different skillsets within golf.

I applied similar methodology to these studies above to identify the aging curve in cycling, from which we can derive a peak age and determine how much we should expect young cyclists to improve and old cyclists to decline. Using the delta method where a rider season is compared to the following rider season identified a peak around 26-27 with riders improving before that age and declining after that age. Using the GAM method where a curve is fit to all rider careers identified 26-28 as the peak with riders improving before those ages and declining after 28. The two methods differ in the steepness of the aging curves; delta method shows a steeper curve of improvement < age 25, while GAM method shows a less steep curve of improvement at those ages and a much sharper decline from age 35 onwards.

Methodology and Data

I gathered PCS points per season (raw total) for each rider between 2010 and 2021. PCS points are awarded for race finishes, GC finishes, and points/mountains jersey finishes. The top points scorers tend to reflect who is considered the top riders, but in my opinion they overweight success in one day races and underweight success in stage races (in the individual stages). Nevertheless, they are a well-accepted and discussed data point which is available consistently going back over a decade.

One thing to consider is accumulating PCS points is part performance and part opportunity. A rider who at age 22 races for a Continental level team as the leader in U23 races and at age 23 races for a World Tour team as a domestique will have fewer opportunities to earn points (though improved performance may cancel that out and there are always freaks like Pogacar and Evenepoel).

I also adjusted points earned in 2020 and 2021 to account for the impact of Coronavirus on races being held. 2020 had 14% fewer points earned and 2021 had 3% fewer points earned than an average season.

Important to note: I am using age on June 30th of that season as the age for that season when binning, but otherwise am using continuous ages relative to that June 30th date. Eg, Peter Sagan (January 1990 DOB) is considered as a discrete age 32 in 2022 (as he will be 32 on June 30th) and a continuous age of 32.4 in 2022 (as he will be 32 and 5 months on June 30th). Some other websites report current age and/or use discrete ages which will make ages look lower.
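A minimal Python sketch of the two adjustments described above: scaling up Covid-affected seasons and computing discrete and continuous ages at June 30. Dividing by (1 - shortfall) is one plausible way to apply the 14% and 3% figures and is an assumption here, as is the 365.25-day year. The text only gives January 1990 for Sagan; the exact day used below (January 26) is supplied for the example.

```python
from datetime import date

# Share of points "missing" relative to an average season, from the text.
COVID_SHORTFALL = {2020: 0.14, 2021: 0.03}

def adjust_points(points, season):
    """Scale a season's points up to an average-season equivalent (assumed method)."""
    return points / (1.0 - COVID_SHORTFALL.get(season, 0.0))

def season_ages(dob, season):
    """Return (discrete age, continuous age) as of June 30 of the season."""
    ref = date(season, 6, 30)
    # Subtract one year if the birthday falls after June 30.
    discrete = ref.year - dob.year - ((ref.month, ref.day) < (dob.month, dob.day))
    continuous = round((ref - dob).days / 365.25, 1)
    return discrete, continuous

sagan_dob = date(1990, 1, 26)
print(season_ages(sagan_dob, 2022))  # (32, 32.4)
print(round(adjust_points(860, 2020)))  # 1000
```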

DELTA METHOD

For the delta method, I simply compared points accumulated by a rider in year 1 to those accumulated by that rider in year 2. I used the rider’s age on June 30 of each season to determine the rider’s age for that season. The delta method just measures the change between year 1 and year 2, averages across all riders at that age, and ascribes the total average change to aging. My yearly age samples for seasons in the mid-20s were over 1500 and were over 500 for all seasons between 20-33 and over 100 for all seasons between 19-38.
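A minimal Python sketch of the delta method as described: pair each rider season with the same rider's following season, group by age in year 1, and compute the average change. Aggregating as a ratio of summed points (rather than averaging per-rider ratios) is an assumption of this sketch, and the rider seasons below are hypothetical.

```python
from collections import defaultdict

def delta_method(seasons):
    """seasons: iterable of (rider, season, age, points).

    Returns {age: fractional change in points from that age to the next},
    using only riders observed in consecutive seasons.
    """
    by_key = {(rider, season): (age, pts) for rider, season, age, pts in seasons}
    totals = defaultdict(lambda: [0.0, 0.0])  # age -> [year 1 sum, year 2 sum]
    for (rider, season), (age, pts) in by_key.items():
        following = by_key.get((rider, season + 1))
        if following is None:
            continue  # no next season to compare against
        totals[age][0] += pts
        totals[age][1] += following[1]
    return {age: y2 / y1 - 1.0
            for age, (y1, y2) in sorted(totals.items()) if y1 > 0}

# Hypothetical rider seasons, for illustration only.
seasons = [
    ("Rider A", 2019, 22, 100), ("Rider A", 2020, 23, 140), ("Rider A", 2021, 24, 150),
    ("Rider B", 2019, 22, 200), ("Rider B", 2020, 23, 270),
]

for age, change in delta_method(seasons).items():
    print(f"age {age} -> {age + 1}: {change:+.0%}")
```

Only riders who appear in both seasons enter each comparison, so riders who drop out of the sample never register a decline.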

Riders improved their PCS points from 19-20, 20-21, and 21-22 by an average of 88% (eg, 100 points to 188 points). Age 22-23 and 23-24 earned improvements of an average of 38%, followed by 15% from age 24-25. At that point, performance was fairly steady from 25-26 to 28-29 at between up 5% and down 4%. The peak age seems to be from 26 into 27.

Performance starts declining more significantly as a rider moves into their 30s (an average decline of 12% per year for 29-30, 30-31, 31-32, 32-33, and 33-34). The sharper declines follow that, averaging 25% down per year from age 34 onwards.

Aging impacts at each age produced by Delta Method

GAM METHOD

For the GAM method, I built a non-linear model which aims to approximate the average aging curve for the full population of riders across their career. The model is in the form of PCS_PTS ~ s(age) + rider so that the overall model finds the average curve over a career; the rider term allows for the height of the curve to vary between massively successful riders like Froome and Cancellara and lower level riders who have scored few points. I included all seasons where riders were between 19 and 38 years old (the ages for which I had > 100 rider samples) and all riders with 4+ seasons in my 12 year sample (using 4+ or 6+ seasons did not impact results).

The aging curve produced was very similar to the one from the delta method. What differed was that the growth curve for riders at 23-24 and under was much shallower (an average of 30% from 19-20, 20-21, and 21-22 instead of the 88% from the delta method, and an average of 15% from 22-23 and 23-24 instead of 38% from the delta method). The decline curve was also sharper after age 35, with 35-36, 36-37, and 37-38 showing an average decline of 48% instead of the 25% shown by the delta method.

Aging curve at each age based on GAM Method

This GAM method graph should be interpreted slightly differently as the average progression for a rider throughout their career. The Delta method graph just shows the average change in points season to season at each age. Notably, older ages feature better riders (eg, age 34 is actually the peak for average PCS Points per season because you’ve filtered down to riders who have aged better than the average rider).

What does this mean for 2022?

Summarizing the results of these two approaches, we can see 1) riders tend to improve in earning PCS Points thru age 24 into age 25, 2) riders tend to earn similar PCS Points from age 25 through age 29, and 3) riders start declining in PCS Points earned from 30 onwards, accelerating from age 32-33 onwards.

Among top riders who are in that age 32-34 range we have sprinters like Elia Viviani, Giacomo Nizzolo, and Matteo Trentin, punchier riders like Diego Ulissi and Ion Izagirre, and climbers like Mikel Landa, Primoz Roglic, and Rafal Majka. The most prominent rider who switched teams over the winter was Peter Sagan who will be 32 for the entire season. Some of these riders will decline – some precipitously – while others will fend off age and produce just as strong a season as 2021.

In the aggregate though, these aging curves suggest teams which are more comprised of 30+ year old riders will fall-off more than those with younger riders. Among the World Tour teams, Israel Start-up Nation had the oldest roster in 2021 and now again in 2022 with their 30.8 year old average. Based on this aging curve, their riders are set to decline by 5% on average from their 2021 point totals. Since last year, their major additions were Nizzolo (age 33), Jakob Fuglsang (age 37), and Hugo Houle (age 31).

Team TotalEnergies races at the Pro Tour level and is the team which signed Peter Sagan (along with several of his mid to late 30s support riders). Their average team age ballooned from 28.9 in 2021 to 30.6 in 2022. They are also in-line for a 5% overall decline in their performance versus 2021. Those aren’t huge declines, but considering the salaries being paid to stars like Nizzolo, Fuglsang, and Sagan and the performances expected, they will be fighting against that current to produce.

Average Team age in 2021 and 2022

The younger teams most likely to improve collectively in 2022 mostly race at that Pro Tour level. Equipo Kern Pharma, Sport Vlaanderen, UNO-X, and Bardiani will all average under 25 years old in 2022. Those four are projected to improve just by aging by 7-9% in 2022 versus their 2021 performance.

However, the most interesting team is Team DSM in the World Tour. DSM has added eight riders in 2022 – six of them under 25 – while they lost their two oldest riders from 2021. They are the only World Tour team with an average age under 27 in 2022 (25.7 years old). They are expected to improve collectively by around 5% versus 2021 performance by the aging curve. Riders like Kevin Vermaerke, Thymen Arensman, Mark Donovan, and Andreas Leknessund all fit the bill of having previous World Tour experience + being aged 23 and under.

Difficulty of achieving GC results in different races

This is the time of year for cycling teams to plan their riders’ programs for the new year and for media/fans to speculate about which races riders will go for in 2022. Part of that process is trying to figure out where riders are best suited to get results – especially in the grand tours (of which we know all three routes now). On the horizon, there’s also been discussion around potential relegation of teams from the top level World Tour and how teams can best optimize their schedules to avoid that relegation.

A lot of the work I did with professional golfers was related to scheduling: where they would be able to play their best golf and where that best golf would be most rewarded by the arcane point system in professional golf. I’ve applied that type of approach below to identify: 1) how difficult it is to achieve different results in stage races and 2) where those results are disproportionately rewarded by cycling’s own arcane points system.

First, it’s not much more difficult to achieve results in the three three-week long grand tours than it is in the week-long stage races in the World Tour. Generally, it’s the same set of riders competing for those results whether it’s the Tour of the Basque Country or the Vuelta a Espana.

Second, grand tour success is heavily rewarded relative to other races. GC positions which are equally difficult to achieve can be rewarded 2x more in grand tours relative to those other World Tour stage races and sometimes 3x more in grand tours relative to other lower level stage races.

These two findings explain why teams and riders compete so much for minor top 10 placings in grand tours even when those minor placings are ~10 minutes back of the GC leader.

Difficulty of achieving GC results

The easiest way to compare the difficulty of achieving a GC result in one race vs another is to simply compare results within the same rider/season. Eg, Tadej Pogacar raced five stage races in 2021 coming in 1st in UAE Tour, 1st in Tirreno Adriatico, 3rd in Basque Country, 1st in Slovenia, and 1st in Tour de France. Based on those five finishes and completely ignoring any context around them, we might judge Basque Country race as the toughest as Pogacar failed to win there.

However, we have hundreds of similar comparisons between these races just from the last decade of results. 227 riders have ridden Basque Country and Tour de France in the same season in the last eight years. 123 have ridden Basque Country and Tirreno Adriatico, 36 have ridden Basque Country and UAE Tour, and 30 have ridden Basque Country and Slovenia. We can leverage those comparisons to judge the relative difficulty between each pair of two races.
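One way to sketch such a pairwise comparison in Python: collect riders who rode both races in the same season, then average the gap between their finishes. The text does not spell out how the gap is converted into an "expected finish", so the log-position averaging below is an assumption of this sketch, and the results are hypothetical.

```python
import math
from collections import defaultdict

def paired_finishes(results, race_a, race_b):
    """results: iterable of (rider, season, race, gc_position).

    Returns (position in A, position in B) for every rider-season with both races.
    """
    by_rider_season = defaultdict(dict)
    for rider, season, race, pos in results:
        by_rider_season[(rider, season)][race] = pos
    return [(r[race_a], r[race_b]) for r in by_rider_season.values()
            if race_a in r and race_b in r]

def expected_finish_in_a(pairs, finish_in_b=5):
    """Expected position in race A for a given finish in race B (assumed log scale)."""
    mean_log_gap = sum(math.log(a) - math.log(b) for a, b in pairs) / len(pairs)
    return math.exp(math.log(finish_in_b) + mean_log_gap)

# Hypothetical GC results, for illustration only.
results = [
    ("Rider 1", 2021, "Race A", 4), ("Rider 1", 2021, "Race B", 8),
    ("Rider 2", 2021, "Race A", 10), ("Rider 2", 2021, "Race B", 20),
    ("Rider 3", 2020, "Race A", 3), ("Rider 3", 2020, "Race B", 6),
    ("Rider 4", 2021, "Race A", 12),  # rode only one of the two races
]

pairs = paired_finishes(results, "Race A", "Race B")
print(len(pairs), round(expected_finish_in_a(pairs), 1))  # 3 2.5
```

In this toy data every rider finished twice as high in Race A as in Race B, so a 5th place in Race B maps to about 2.5 in Race A, meaning Race B is the harder race to get a result in.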

Race difficulty comparisons for Tour of Basque Country (2014-21)

Above I’ve shown these aggregate difficulty comparisons for Tour of Basque Country and the ~40 races with at least 30 comparisons in 2014-21. They’re ordered by difficulty where the last column value is the expected finishing position in Race A (Basque Country) given a 5th place GC finish on Race B. Eg, if a rider finishes 5th in the Tour de France they would be expected to achieve an equivalent of 4.3 in Basque Country.

Pogacar’s 2021 races are highlighted in red where Tour de France is the toughest, UAE and Tirreno are similar difficulty to Basque Country (expected finishes of 5.2 and 5.5), and Slovenia is viewed as much easier with expected finish of 18th in Basque Country for someone finishing 5th in Slovenia.

This method confirms the primacy in difficulty of the Tour de France as every comparison race is easier to achieve results in than the Tour. However, it also shows the two other grand tours are not any more difficult to achieve results in than the bigger week-long World Tour races like Basque Country, Tour of Catalonia, Tirreno Adriatico, Paris-Nice, and the Dauphine. A 5th in the Giro d’Italia is worth about 5.6 in those five races on average. A 5th in the Vuelta a Espana is similarly worth about a 4.6 in those five races on average. A 5th in the Tour de France is worth a 3.8.

Scaling all races versus those five week-long stage races shows the following hierarchy:

Below I’ve included my top 20 GC riders entering the Tour of the Basque Country in April and whether they raced Basque Country and the three grand tours in 2021. 12 of the top 20 raced Basque Country – including the two clear best – while 14 of the top 20 raced the Tour de France, only 5 of the top 20 raced the Giro, and 12 of the top 20 raced the Vuelta.

UCI Points

There are two popular point systems in professional cycling: the unofficial ProCyclingStats points – which I’ve referenced before – and the official UCI points – which determine team eligibility for the World Tour and other qualifications. The point system is explained well here by INRNG, but basically the UCI decides how to group races together and assigns them different points (eg, the Tour de France is its own category awarding between 18-25% more points for a given placing than the Giro or Vuelta and more than 3x more points than the minor World Tour stage races).

We can use those scales to make equivalencies between what the UCI thinks are similar finishes. Eg, a 12th place finish in the Giro or Vuelta is worth 8th place in Catalonia or Basque Country, 13th place in the Tour de France, and 7th in UAE Tour. Moving outside the World Tour races, that 12th place is worth 2nd place in a 2.1 stage race (Tour of Sicily, Route Occitanie) and 4th in a 2.Pro stage race (Arctic Race Norway, Tour of Denmark).
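A minimal sketch of how an equivalency like this can be read off two point scales: take the points for a placing in one race and find the placing in the other race whose points are closest. The two scales below are made-up placeholders, not the real UCI tables, and `equivalent_place` is a hypothetical helper name.

```python
def equivalent_place(place, scale_from, scale_to):
    """Return the 1-based placing in scale_to whose points are closest
    to the points awarded for `place` in scale_from."""
    target = scale_from[place - 1]
    best_index = min(range(len(scale_to)),
                     key=lambda i: abs(scale_to[i] - target))
    return best_index + 1

# Placeholder point scales (index 0 = 1st place); NOT the real UCI values.
big_race = [1000, 800, 600, 500, 400, 300]
small_race = [500, 400, 300, 250, 200, 150]

# 5th in the big race pays 400 points, which matches 2nd in the small race.
print(equivalent_place(5, big_race, small_race))
```

With real scales the match is rarely exact, so the nearest placing is only an approximation.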

The UCI is saying 5th place in the Tour de France is worth 1st place in those big week-long stage races, 4th place in the Giro/Vuelta, and more than 1st place in every other stage race.

Rewards vs Difficulty

We can combine the difficulty measure from my research with the UCI point scales to find which races over and under-reward finishing highly. I use my research from above to find equivalent performances and then look to see how those are rewarded between races. Eg, I found it roughly equal in difficulty to finish 5th at Vuelta a Espana and Basque Country. However, the Vuelta rewards finishing 5th with 2x the UCI points that Basque Country does.
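To make the comparison concrete, here is a rough sketch of the ratio being described: points for a finish in one race divided by points for the equally difficult finish in another. The point scales and the assumed equivalence are placeholders for illustration, and `relative_reward` is a hypothetical name.

```python
def relative_reward(points_a, place_a, points_b, equivalent_place_b):
    """Points for a finish in race A divided by the points for the
    equally difficult finish in race B (nearest whole placing)."""
    reward_a = points_a[place_a - 1]
    reward_b = points_b[round(equivalent_place_b) - 1]
    return reward_a / reward_b

# Placeholder scales (index 0 = 1st place); not the real UCI tables.
race_a = [500, 400, 300, 250, 200, 150]
race_b = [1000, 800, 600, 500, 400, 300]

# Assumption for the example: 5th in race A is as hard as 5th in race B.
ratio = relative_reward(race_a, 5, race_b, 5.0)
print(f"Race A pays {ratio:.0%} of race B for an equally hard result")
```

A ratio below 1 means race A under-rewards the result relative to race B.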

Relative reward of 5th in Basque Country vs equivalent finish in other races

Basque Country is rewarded with 50-75% as many points as equivalent finishes in eleven World Tour stage races, but with 2x as many points as equivalent finishes in the minor 2.1 level stage races like Valencia, Besseges, and Alpes Maritimes. Those minor races attract difficult fields, but are shorter/lower level so they receive fewer points for equivalently difficult results.

The Sweet Spot

So earlier I said the sweet spot in terms of rewards were grand tours with the inverse being some of these week-long stage races at World Tour level. That is without factoring in the time the race takes (grand tours require 24 days of racing with rest leading in and recovery time leading out while the Basque Country is only 6 days of racing with many riders taking just a week before and after between races). Obviously if you’re optimizing at the team level with this data, you’ll factor greater time commitment for grand tours and the position of races on the schedule into planning.

Many of these effects are driven by different strength of fields in different races. I’ve shown GC ratings for riders before (others like PCS have similar rankings). Aggregating those ratings by race yields this plot where the x axis shows the strength of the riders in that race. Eg, Tour de France has the strongest field of GC riders, followed by the Vuelta. Part of the reason results are difficult to achieve in Catalonia and Basque Country is that they rank 3rd and 4th in strength of their riders. The Giro shows up as having a relatively weaker field (more comparable to the week-long stage races) which means the rewards in terms of UCI points are higher for equivalent positions.

Finding the Sprint MVPs

Professional cycling is a team sport, with a clearly defined roster for each race (startlist) and coaches directing strategy both pre-race and during the race. As such, teammates and the team a rider is on matter significantly for success. This is probably most apparent in the final kilometers of a bunch sprint race where teams jockey for position, attempting to deliver their fast man to the finish line in the best position to win the race.

Evaluating sprinters within this eco-system is difficult. Javi Angulo has a recent piece using the Glicko method (based on head-to-head results) to rate the top sprinters at the end of 2021. Teun van Erp and Rob Lamberts (with an assist from multiple Tour de France stage winner Marcel Kittel) use video and power analysis to analyze the determinants of sprint performance in a 2021 study. Besides these detailed analyses, we can use stats like win rate/podium rate/average rank/PCS points won to judge sprint performance at a more superficial level.

But what about measuring the impact on sprint performance of teammates?

I’ve designed a handful of methods which could illuminate our knowledge on this subject. These methods all have clear flaws – most notably that we do not know team strategy/roles within team.

Simple win or loss calculations

At the most basic, we can assume teammates share equal credit for victories in bunch sprints (and equal penalty for not winning). Therefore we can calculate each rider’s team win rate in bunch sprints (I’ve included all 2.1/1.1 or higher races where 20+ riders finished within 3 seconds of the winner).
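As a minimal sketch of the equal-credit idea, assuming each bunch sprint is stored as a team lineup plus a win flag (the data layout and rider labels here are made up for illustration):

```python
from collections import defaultdict

def team_win_rates(races):
    """races: list of dicts with 'lineup' (riders on the team's start
    list) and 'won' (True if the team won the bunch sprint).
    Every rider on the lineup gets equal credit for the result."""
    starts = defaultdict(int)
    wins = defaultdict(int)
    for race in races:
        for rider in race["lineup"]:
            starts[rider] += 1
            if race["won"]:
                wins[rider] += 1
    return {rider: wins[rider] / starts[rider] for rider in starts}

# Toy data with made-up rider labels.
races = [
    {"lineup": ["A", "B", "C"], "won": True},
    {"lineup": ["A", "B"], "won": False},
    {"lineup": ["A", "C"], "won": True},
]
rates = team_win_rates(races)
print(sorted(rates.items(), key=lambda item: item[1], reverse=True))
```

Rider C tops this toy list purely because C was never on a losing lineup, which is exactly the attribution problem with this approach.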

For 2021, the top results are above. If you follow World Tour rosters closely you’ll realize that all ten of these riders are Quick Step riders – the team which year-on-year dominates the sport. The results are not so Quick Step biased in every year, but they show the impact of multi-collinearity (a fancy statistics way of saying that it’s difficult to tease out the unique impact of multiple different factors when you rarely observe them apart from each other).

Eg, in 2020, the top five riders are Arnaud Demare and his sprint train. At least three of them + Demare raced together in bunch sprints 24 times in 2020 (16 times with all five riders together). Demare raced in a bunch sprint just one other time and the other riders were in just five bunch sprints apart from Demare. So was Demare’s extreme success in 2020 (11 wins in 25 races) mostly his dominance, mostly his teammates’ dominance, or a mix?

We can look at larger samples of seasons to try to look at changes in team personnel. However, looking at Demare’s team going back to 2018 still sees 112 bunch sprints where 66 came with at least three of his sprint lieutenants in the lineup and another 35 with two of the four riders present. FDJ has raced 234 bunch sprints without Demare and won 4% of the bunch sprints vs 26% in 112 races with Demare.

With all four of his helpers, Demare has won 37%. With three of the four, he’s won 23%, with just two he’s won 17%, but with 0 or 1 he’s won 27%. So while it looks like more sprint helpers/more familiar helpers means better results, we’re not much closer to saying how important Demare is vs his helpers vs his team.
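A breakdown like this can be sketched by counting how many of the named helpers were on each start list and grouping the results. The lineup format and the rider labels below are assumptions for illustration only.

```python
from collections import defaultdict

def win_rate_by_helper_count(races, helpers):
    """Group a sprinter's bunch sprints by how many of the named helpers
    were on the start list, then compute the win rate in each group."""
    counts = defaultdict(lambda: [0, 0])  # helpers present -> [wins, races]
    for race in races:
        present = len(helpers & set(race["lineup"]))
        counts[present][0] += int(race["won"])
        counts[present][1] += 1
    return {n: wins / total for n, (wins, total) in sorted(counts.items())}

# Toy data: "S" is the sprinter, "H1" and "H2" are helpers (made-up labels).
helpers = {"H1", "H2"}
sprints = [
    {"lineup": ["S", "H1", "H2"], "won": True},
    {"lineup": ["S", "H1", "H2"], "won": False},
    {"lineup": ["S", "H1"], "won": False},
    {"lineup": ["S"], "won": True},
]
by_helpers = win_rate_by_helper_count(sprints, helpers)
print(by_helpers)
```

With small groups the rates are noisy, which is one reason the 0-or-1 helper bucket can look better than the two-helper bucket.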

We can expand this analysis and just credit riders for team wins where they were present in the bunch at the end of the race. For example, Tim Declercq was only in the bunch for 5 of 24 bunch sprints (21%) in the 2021 season vs 72% for Michael Morkov. Perhaps if we’re analysing who has the most impact on bunch sprints, we should ignore riders who weren’t present? Of course, it’s likely that without Declercq some of those bunch sprints would have turned into either breakaway victories or a late attack would’ve been launched or Quick Step would’ve burnt other riders essential to the sprint train.

Who Does the Team Trust?

Too much of sports analytics is results oriented, but there is loads of information that is not captured by the result. Teams are privy to training data, injury data, interpersonal relationships, and much else that we can access by looking at how team directors choose their lineups. Eg, analytical models based on results would probably tell you Cavendish shouldn’t have been in the start list for the Tour de France, but Quick Step had seen enough from Cav to think he was the best option to sprint.

We can look at start lists to see who teams trust to race alongside their best sprinters. To do this analysis, I built a simple rating system for sprinters based on their finishing positions which easily discriminates between the best sprinters (eg, Bennett, Ewan, Van Aert in 2020-21) and also-rans. I then looked at how riders in a team were deployed alongside sprinters. Eg, if Quick Step sends Morkov to a race with Sam Bennett, but Pieter Serry to race with a lesser sprinter like Alvaro Hodeg, that might say something in the aggregate.

For 2020-21, Sam Bennett raced with the highest quality best sprinter (obviously as he is one of the best in the world and was the best sprinter on his team in every race he participated in). More interestingly, Morkov was the clear 2nd place rider as he raced with Bennett in 42 of 62 bunch sprints (and 42 of 46 bunch sprints that Bennett participated in). At the bottom of the list are riders who were mainly deployed alongside weaker sprinters or in lineups without a clear sprinter like Honore, Serry, and Vansevenant.

For the full peloton, the top 10 riders who were not the best sprinter on their team in a majority of races are below:

We see Morkov appear along with three of Caleb Ewan’s dedicated sprint train from Lotto Soudal and four of FDJ’s previously discussed sprint train for Demare. Theuns typically rides with Stuyven and/or Pedersen and Consonni was the main support for Elia Viviani.

At the other end of the list are riders typically not deployed with top sprinters at least on teams that have one.

We can also simply look at which riders raced alongside the top sprinters in the highest percentage of bunch sprint races from 2020-21. Eg, for Sam Bennett the importance of Morkov is obvious as Morkov featured with him in 91% of bunch sprints vs 48% for the next highest Quick Step rider. Caleb Ewan relied on both Roger Kluge and Jasper de Buyst for 84%+ of his bunch sprints. The aforementioned FDJ sprint train represents four of the eleven most common pairings between top sprinters and a teammate. The other standout was the pairing of Gaviria-Richeze for Team UAE. Richeze missed just one of Gaviria’s bunch sprints in 2020-21.
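The pairing percentage is just the share of a sprinter's bunch sprints in which a given teammate was also on the start list. A small sketch, using the 42-of-46 Bennett/Morkov count mentioned earlier as the example (the lineup data structure is an assumption):

```python
def pairing_share(races, sprinter, teammate):
    """Share of the sprinter's bunch sprints where the teammate also started."""
    sprinter_races = [r for r in races if sprinter in r["lineup"]]
    if not sprinter_races:
        return 0.0
    together = sum(1 for r in sprinter_races if teammate in r["lineup"])
    return together / len(sprinter_races)

# Rebuild the counts from the text: 46 Bennett bunch sprints, 42 with Morkov,
# plus 20 Morkov bunch sprints without Bennett (62 in total for Morkov).
lineups = (
    [{"lineup": {"Bennett", "Morkov"}}] * 42
    + [{"lineup": {"Bennett"}}] * 4
    + [{"lineup": {"Morkov"}}] * 20
)
share = pairing_share(lineups, "Bennett", "Morkov")
print(f"{share:.0%}")
```

Note the measure is not symmetric: from Morkov's side the share is 42 of 62.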

Advanced Statistical Models

Statistical models for teasing out multi-collinearity do exist. One promising approach used in sports like basketball or hockey is regularization (where coefficients are penalized using Lasso or Ridge regression). Running lasso regression on this type of data essentially produces coefficient estimates which are often zero if the model can’t determine that the term is significantly impacting the results.
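One way to see why lasso produces exact zeros: in the simplified textbook case where the predictors are orthonormal, the lasso estimate is the ordinary least-squares coefficient passed through a soft-thresholding step. The sketch below illustrates that step only; it is not the fitting code used for the results here, and the coefficient and penalty values are arbitrary.

```python
def soft_threshold(coefficient, penalty):
    """Shrink a coefficient toward zero by `penalty`; anything whose
    absolute value is at most `penalty` becomes exactly zero."""
    if coefficient > penalty:
        return coefficient - penalty
    if coefficient < -penalty:
        return coefficient + penalty
    return 0.0

# Arbitrary example values: two clear effects and two weak ones.
unpenalized = [0.8, 0.05, -0.3, -0.02]
penalized = [soft_threshold(c, 0.1) for c in unpenalized]
print(penalized)
```

The weak effects are zeroed out while the stronger ones survive in shrunken form, which is the behaviour described above.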

To set up the data, I’ve filtered first for riders with 60+ bunch sprints in the last four seasons (2017-20) and then set-up a matrix with a 1 in the rider’s column if they were in that race or a 0 if not. This produces a matrix with over 400 riders. The regression is just run on a binary outcome of a win for the team or not (could run a similar regression of finishing position or podium as well). I’ve also controlled for quality of the best sprinter on the team for each race and the level (.1, Pro, or World Tour) of the race.
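A rough sketch of that data set-up, assuming each row is one team in one bunch sprint with its lineup and a win flag. The extra controls (best sprinter quality, race level) are left out, and the function and field names are hypothetical.

```python
def build_design_matrix(team_races, min_races=60):
    """Return (columns, X, y): one indicator column per rider with at
    least `min_races` appearances, and a 0/1 team-win outcome."""
    appearances = {}
    for race in team_races:
        for rider in race["lineup"]:
            appearances[rider] = appearances.get(rider, 0) + 1
    columns = sorted(r for r, n in appearances.items() if n >= min_races)
    X = [[1 if rider in race["lineup"] else 0 for rider in columns]
         for race in team_races]
    y = [1 if race["won"] else 0 for race in team_races]
    return columns, X, y

# Toy data with made-up labels; the threshold is lowered so it has an effect.
team_races = [
    {"lineup": ["A", "B"], "won": True},
    {"lineup": ["A", "C"], "won": False},
    {"lineup": ["A", "B"], "won": False},
]
columns, X, y = build_design_matrix(team_races, min_races=2)
print(columns, X, y)
```

Rider C falls below the appearance threshold and gets no column, mirroring the 60+ bunch sprint filter.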

Training the model on 2017-20 and testing on 2021 yields interesting results, though honestly some are nonsensical at the individual rider level (the top impact rider is Damien Howson – a climber who apparently improves his team’s chances of winning a sprint from 6% to 17%). Many of the other top impacts seem believable; eg, I could believe Davide Ballerini increases his team’s chances of winning from 6% to 12% because someone has to be responsible for Quick Step’s incredible ability to win sprints year-after-year.

Turning to the predictions at race level in 2021, Quick Step’s squad for stage 1 of the Volta ao Algarve was seen as the best positioned of any team in any race of the season. That is driven by Sam Bennett being really good, but also the model viewing riders like Jakobsen, Morkov, Asgreen, Archbold, and Ballerini as all having strong impact on winning. The model gives them a 47% probability of winning stage 1. With Bennett, but with six neutral teammates, they would have only a 15% probability.

To evaluate the model I compared it to the predictions with a neutral model (just considering ability of best sprinter). If this model says anything valuable, we’ll see a difference between the quality of prediction for model and the neutral model. We can also compare to a completely naive model which just assigns every team a probability of 1 / N teams in race.
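For reference, a minimal sketch of this kind of comparison: score each team's predicted win probability against the 0/1 outcome and compare with the naive 1 / N baseline. The race and the model probabilities below are made up for illustration.

```python
def mean_squared_error(predictions, outcomes):
    """Average squared gap between predicted win probability and the
    actual result (1 = team won, 0 = team did not win)."""
    return sum((p - o) ** 2 for p, o in zip(predictions, outcomes)) / len(outcomes)

def naive_predictions(n_teams):
    """Every team gets the same probability of 1 / N."""
    return [1 / n_teams] * n_teams

# One toy race with four teams; the second team wins.
outcomes = [0, 1, 0, 0]
model_predictions = [0.1, 0.6, 0.2, 0.1]  # made-up probabilities

naive_mse = mean_squared_error(naive_predictions(4), outcomes)
model_mse = mean_squared_error(model_predictions, outcomes)
print(naive_mse, model_mse)
```

Lower is better, and a perfect set of predictions would score 0.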

Metric              Lasso Model   Neutral Model   Naive Model
Mean Square Error   0.0393        0.0414          0.0445

The mean square errors of these three predictions show the neutral model improves on the naive model by 0.0031, or about a 7% improvement towards perfect. So knowing how good the best sprinter on the team is vs knowing nothing is worth about a 7% gain.

The Lasso model which judges impact of individual riders is worth another 4.5% of improvement over the neutral model. So knowing the impact of all riders on probability of winning is about two thirds as valuable as knowing the ability of best sprinter.
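The arithmetic behind these comparisons can be reproduced from the table. This sketch assumes the percentages are measured against the naive model's error (with a perfect score being 0); because the table values are rounded, the results may differ slightly from the figures quoted in the text.

```python
# Mean square errors as printed in the table (rounded to four decimals).
lasso_mse, neutral_mse, naive_mse = 0.0393, 0.0414, 0.0445

neutral_gain = naive_mse - neutral_mse   # best-sprinter info vs knowing nothing
lasso_gain = neutral_mse - lasso_mse     # rider-level info on top of that

neutral_gain_pct = neutral_gain / naive_mse
lasso_gain_pct = lasso_gain / naive_mse
relative_value = lasso_gain / neutral_gain

print(f"Neutral vs naive: {neutral_gain:.4f} ({neutral_gain_pct:.1%})")
print(f"Lasso vs neutral: {lasso_gain:.4f} ({lasso_gain_pct:.1%})")
print(f"Lasso gain relative to neutral gain: {relative_value:.2f}")
```

The last ratio is what backs the "about two thirds as valuable" reading.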

No correlation between metrics

My sense is this model is not ready for primetime as there is no correlation between the coefficient produced by the lasso model and how often riders are deployed with the best sprinters. It seems likely that judging a rider’s ability to impact sprint results based off how their team deploys them yields the most useful information.

Evaluating Riders: Log Rank

Evaluating rider performance in professional cycling is a hard problem. While more advanced statistics like climbing times, segment times, survival with leading group, and others are available for certain races and certain riders, for most races and certainly for anything historical we’re left with something like this PCS result table: finishing rank in race, maybe UCI points, PCS points, and time gaps.

So any rider performance statistic has to be based on one of those three data-points: time gaps, points, or finishing rank. Each has its place.

Time gaps are a very poor way to evaluate success in a bunch sprint where 100 riders might finish on the same time, but they can be a good way to evaluate success on a mountain stage with an uphill finish.

PCS points have been developed into a widely used evaluative method which recognizes that success in cycling can be achieved in a wide array of competitions (GC, race wins, jersey competitions) and has dozens of different scales which are used for different quality of races, but fundamentally their point scales are opinions on the value of different results relative to each other.

Finally, finishing rank is often used to count victories, podiums, or top 10 finishes across the season, but is plagued by vastly different difficulty levels to achieve certain results (how good is 3rd in a World Tour race relative to 1st in a .1 race?). Ranks are also notoriously difficult to average; Wout Van Aert’s transcendent 2021 Tour de France yielded an average rank of 25th for a return of 3 stage wins, just behind the average rank of Enric Mas, who finished 6th on GC with nary a stage podium finish.

In recent months, I’ve developed my own tweaks to use finishing rank as an evaluative method, producing a stat I’m calling Log Rank. The handful of keys to make it work are:

1. All finishing ranks in a race are transformed by taking the natural logarithm. This produces a value system where the difference between finishing 1st vs 5th is large, while the difference between finishing 50th vs 100th is not as large. The red dots below show equal gaps between results; so 1st and 3rd are separated about as much as 3rd and 7th/8th. Likewise, 1st and 7th/8th are separated by the same amount as 7th/8th and 55th. I think this is a fairly intuitive appraisal of the value of different finishing positions.

2. Using these transformed ranks, taking averages is much easier. For example, Wout Van Aert’s final week of the Tour de France, where he finished 25th, 40th, 36th, 43rd, 1st, 1st (average 24th), is transformed into 3.2, 3.7, 3.6, 3.8, 0, 0 (average log rank of 2.4), which can be re-transformed back into an average rank of 11th (by taking e^x where x = average log rank). Basically, this says we care way more about Van Aert’s two victories than the fact he finished outside the top 20 in those other races. In fact, he could have finished 50th in those four stages (new average of 34th), but his log rank would only move back to between 13th and 14th (13.6).

3. The difficulty of different races is found by an objective system which looks at how difficult it is to achieve certain results in different levels of races. For example, in recent seasons it is roughly similar difficulty to achieve a 10th place in an U23 2.2 level race as a 27th place in a World Tour race. Using a host of these types of comparisons, I’ve created a Strength of Peloton rating system to judge all levels of races against each other based on the difficulty to achieve certain levels of results. All that needs to be said here is that results are adjusted based on what type of races they were achieved in. For example, Ethan Hayter and Tadej Pogacar achieved very similar raw finishing ranks in 2021, but Pogacar did so against the 4th toughest pelotons and Hayter only around the 600th toughest.
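The first two steps above can be sketched in a few lines, using Van Aert's final-week results from step 2. The Strength of Peloton adjustment in step 3 is not included, and the function names are just illustrative.

```python
import math

def average_log_rank(ranks):
    """Mean of the natural log of each finishing rank."""
    return sum(math.log(r) for r in ranks) / len(ranks)

def log_rank_as_position(ranks):
    """Average log rank re-transformed back to the rank scale (e^x)."""
    return math.exp(average_log_rank(ranks))

final_week = [25, 40, 36, 43, 1, 1]

print(round(sum(final_week) / len(final_week)))   # plain average rank: 24
print(round(average_log_rank(final_week), 1))     # average log rank: 2.4
print(round(log_rank_as_position(final_week)))    # re-transformed rank: 11
```

Because the two wins contribute a log rank of 0, they pull the average down far more than the four mid-pack finishes push it up.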

2021 Log Rank Rankings

Applying those three steps yields the following top 10 for all 2021 results, just averaging all race results (ignoring time trials):

Rider                   Average Log Rank
Wout Van Aert           4.3
Tadej Pogacar           4.9
Mathieu Van Der Poel    5.0
Primoz Roglic           6.9
Sam Bennett             8.3
Sonny Colbrelli         9.2
Ethan Hayter            10.0
Jasper Philipsen        10.2
David Gaudu             10.4
Julian Alaphilippe      11.6

Building on Log Rank

The next challenge was to build on this basic Log Rank to add in parcours level impacts of things like the climbing difficulty and whether the race ended in a bunch sprint. For example, Enric Mas raced 66 times on the road in non-time trials in 2021. If we’re judging how good of a rider he is we probably don’t care about where he finished in the flatter stages which littered the Tour de France and Vuelta a Espana. However, we care a lot about how he performed in the tougher climbing stages of those races and others.

To find the impact of climbing difficulty and a bunch sprint finish I set up a mixed effects model which can be run over results from a given period of time (eg, July 2019 to June 2021 to predict performance going into the 2021 Tour de France). The model was specified using three random effects involving individual riders, attempting to find a) their general level of ability to finish with a good finishing rank in races, b) the impact of climbing difficulty on their finishes, and c) the impact of the race ending in a bunch sprint on their finishes.

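library(lme4)
# `results` is a placeholder name for the per-rider, per-race data frame
fit <- lmer(log_rnk ~ (1 + climb_difficulty | rider) +
 (0 + bunch_sprint | rider), data = results)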

Using this model, we would expect a sprinter like Sam Bennett who struggles in the hills and mountains, but generally ranks highly in terms of finishing rank to have a smaller individual coefficient (indicating that he generally achieves high finishes), a larger climbing difficulty coefficient (indicating that as races get tougher in terms of climbing his finish rank gets larger/worse), and a negative bunch sprint coefficient (indicating that he finishes with better ranks when the race ends in a bunch sprint vs smaller group).

The model results for July 2019 to June 2021 show Bennett with about the 50th best general ability to finish highly (a above), the 20th worst impact of climbing difficulty (b above), and the 2nd best bunch sprint impact (c above). Overall, he would be expected to finish with an average rank of 3.7 in a flat, bunch sprint race – 2nd best in the world, behind Wout Van Aert (3.1) and in front of Caleb Ewan (3.8).
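To show how three rider-level effects turn into an expected rank for a given type of race, here is a simplified sketch: add the effects up on the log scale and take e^x. Every coefficient value below is made up for illustration and is not an output of the model; the function name and the shared `intercept` term are assumptions too.

```python
import math

def predicted_rank(intercept, rider_effect, climb_effect, sprint_effect,
                   climb_difficulty, bunch_sprint):
    """Combine a rider's effects on the log-rank scale, then convert
    back to a finishing rank with e^x."""
    log_rank = (intercept
                + rider_effect
                + climb_effect * climb_difficulty
                + sprint_effect * (1 if bunch_sprint else 0))
    return math.exp(log_rank)

# Hypothetical sprinter: good overall, hurt by climbing, helped by sprints.
sprinter = dict(intercept=3.6, rider_effect=-1.0,
                climb_effect=0.2, sprint_effect=-0.8)

flat_sprint = predicted_rank(**sprinter, climb_difficulty=0, bunch_sprint=True)
mountain_day = predicted_rank(**sprinter, climb_difficulty=10, bunch_sprint=False)
print(round(flat_sprint, 1), round(mountain_day, 1))
```

The same rider profile produces a top result on a flat sprint day and an anonymous finish in the mountains, which is the pattern described for a sprinter like Bennett.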

We can similarly look for hilly races not ending in bunch sprints (prototypical classics race) where Mathieu Van Der Poel had the best prediction at that time at 6.8 – essentially tied with Wout Van Aert – and ahead of Roglic, Pogacar, Van Avermaet, and Alaphilippe.

The top predictions in high mountain races were unsurprisingly the three main recent grand tour winners: Pogacar, Roglic, and Bernal. They were followed by Mikel Landa and Adam Yates.