General Classification Model

Winning the GC in a stage race should be considered distinct from stage-by-stage success. Twelve of the last 30 TDF winners won with zero or one stage win, and eleven of the last 30 did so without winning a non-time-trial stage.

To measure a rider’s GC ability, I collected results from the major stage races for the last 30 years. Riders were awarded points based on finishing positions with five different (arbitrary) scales for groups of races (A: Tour de France, B: Vuelta/Giro, C: Switzerland/Dauphine/Paris-Nice, D: races like Tour of Catalonia, Tirreno-Adriatico, Tour of Basque Country, etc, E: all other races of significance to predicting a grand tour GC). That final point is critical; stage races without apparent ability to transfer to winning a grand tour GC (Tour Down Under, Four Days of Dunkirk, old Dubai Tour) were not considered.

Which races are most strongly correlated with Tour de France results?

I matched all rider results (for their careers) in GC races with all of their Tour de France results (for their careers). Eg, Chris Froome’s 2018 Giro victory is matched with each of his Tour de France entries. After filtering for riders with 15+ GC races and at least one GC victory, I grouped by race (Giro, Vuelta, etc) and ran a Spearman correlation of GC results in that race with GC results in the TDF. Spearman correlations measure how strongly one ranking is associated with another ranking.
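The matching and correlation steps can be sketched roughly as follows. This is a minimal illustration, not the code behind the analysis: the rider names and finishing positions are made up, and the helpers `pair_results`, `average_ranks`, and `spearman` are hypothetical names. Spearman's rho is computed here as the Pearson correlation of the two rank vectors, with tied values sharing their average rank.

```python
from itertools import product
from statistics import mean

def pair_results(race_positions, tdf_positions):
    """Match every result in one race with every TDF result of the same rider."""
    return list(product(race_positions, tdf_positions))

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Made-up career results (finishing positions) for two hypothetical riders.
riders = {
    "rider_a": {"giro": [1, 3], "tdf": [2, 1]},
    "rider_b": {"giro": [8], "tdf": [10, 15]},
}

# Pool the matched (Giro position, TDF position) pairs across riders.
pairs = []
for results in riders.values():
    pairs.extend(pair_results(results["giro"], results["tdf"]))

rho = spearman([g for g, _ in pairs], [t for _, t in pairs])
print(f"Giro vs TDF Spearman rho: {rho:.2f}")
```

Because each race result is paired with every TDF entry, riders with long careers contribute many more pairs than riders with short ones, which is worth keeping in mind when reading the correlations.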

Most strongly associated with TDF results:


Results in the Tour de France (in other years) are the most predictive of results in any given Tour. Closely following are the other two grand tours and the two most important Tour de France warm-ups (Dauphine/Switzerland). The strong correlation of the Tour of California is surprising, but that race was dominated in its early years by American riders like Floyd Landis and Levi Leipheimer, and two TDF winners (Egan Bernal and Bradley Wiggins) have also won it.

The other major week-long races follow with correlations ranging from moderate to weak. It’s clear that winning a grand tour is something distinct from winning a week-long or five-day race. The non-existent correlation of the Tour de l’Avenir (the U23 Tour de France) is interesting; Egan Bernal’s 2019 Tour de France victory was the only one won by a Tour de l’Avenir winner since it became a U25-only race in 1992.

Calculating GC Rankings

The points scales were designed with A races awarding 25 points to the winner, B races 20 points, C races 12 points, D races 10 points, and E races 8 points. The value of points decays over time with a weight equal to 1 / (days_since + 730) and a five-year window (meaning results since the 2014 Tour de France would be considered when predicting the 2019 Tour de France).
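As a rough sketch of the decay, the weight can be computed like this. The winner's points per race group and the 1 / (days_since + 730) formula come from the text above; the exact length of the window in days (`WINDOW_DAYS`) and the function names are assumptions for illustration.

```python
from datetime import date

# Winner's points per race group, as listed above.
WINNER_POINTS = {"A": 25, "B": 20, "C": 12, "D": 10, "E": 8}

# Assumption: the five-year window is approximated as 5 * 365 days.
WINDOW_DAYS = 5 * 365

def decay_weight(result_date, race_date):
    """Weight of a past result when rating riders ahead of race_date."""
    days_since = (race_date - result_date).days
    if days_since < 0 or days_since > WINDOW_DAYS:
        return 0.0  # future results and results outside the window are ignored
    return 1.0 / (days_since + 730)

def weighted_points(points, result_date, race_date):
    """Points for a result, scaled by the time-decay weight."""
    return points * decay_weight(result_date, race_date)

# Hypothetical dates: a win on the eve of the race vs. the same win two years earlier.
race_day = date(2019, 7, 6)
fresh = weighted_points(WINNER_POINTS["A"], date(2019, 7, 6), race_day)
two_years_old = weighted_points(WINNER_POINTS["A"], date(2017, 7, 6), race_day)
print(fresh / two_years_old)  # a brand-new result counts twice as much
```

The 730-day offset means the decay is gentle: a result has to be two years old before it loses half of its value.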

I looked at numerous calculation methods (average points per race, total points, best performance, ignoring weighting, etc), but settled on counting just the top five results for each rider. Eg, for Egan Bernal entering the 2019 Tour de France his top five results were #1 in 2019 Switzerland, #1 in 2018 California, #1 in 2019 Paris-Nice, #3 in 2019 Catalonia, and #2 in 2018 Romandie. Bernal ranked 3rd, behind Geraint Thomas and Vincenzo Nibali and ahead of Jakob Fuglsang and Nairo Quintana, going into last year’s Tour.
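A top-five score of this kind could be computed as below. This is a sketch under two assumptions: that "top five" means the five largest time-weighted point values, and that the rider's score is their sum. The function name `gc_score` and the input numbers are illustrative only.

```python
import heapq

def gc_score(weighted_results, top_n=5):
    """Sum a rider's top_n weighted results; riders with fewer results keep them all."""
    return sum(heapq.nlargest(top_n, weighted_results))

# Made-up weighted point values for one rider (not real data).
example_results = [3, 1, 4, 1, 5, 9, 2, 6]
print(gc_score(example_results))  # 9 + 6 + 5 + 4 + 3 = 27
```

Counting only the best five results rewards peak performances without penalizing riders who race often and collect many minor placings.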

Tour de France Competitiveness

Last year’s Tour was one of the most wide-open, with the fourth-lowest point total for the #1 ranked rider since 1992 (only 2007, 2006, and 1999 were lower). The previous year’s Tour was the peak in this regard, with Chris Froome coming off four wins in five years and holding the other two grand tour titles. Froome’s total eclipsed those of Miguel Indurain in 1993, 1994, and 1995 and Alberto Contador in 2011.

The most competitive Tour in terms of the average points for the top 15 riders was 2016. All of the top 10 riders in GC points at the time of that Tour started the race (though this did not turn out to be a particularly competitive race with both Nibali and Contador performing poorly).
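The two competitiveness measures used in this section (the point total of the #1 ranked starter and the average for the top 15 starters) are simple to compute from a startlist. The sketch below assumes a plain list of GC point totals per starter; `startlist_strength` is a hypothetical helper and the numbers are invented.

```python
def startlist_strength(gc_points, top_n=15):
    """Return the best rider's points and the mean of the top_n riders' points."""
    if not gc_points:
        raise ValueError("startlist has no riders")
    ranked = sorted(gc_points, reverse=True)
    top = ranked[:top_n]
    return {"best": ranked[0], "top_n_mean": sum(top) / len(top)}

# Invented GC point totals for a small startlist.
print(startlist_strength([40, 10, 30, 20], top_n=3))
# {'best': 40, 'top_n_mean': 30.0}
```

A low "best" value points to a wide-open race with no dominant favorite, while a high top-15 mean points to a deep field.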


Above is a graph of the top five riders on each TDF startlist in terms of their GC points ranking. The peaks of Indurain, Armstrong, Contador, and Froome are visible, as well as the weaker transition periods that accompanied both Armstrong and Indurain’s departures from the sport.

Should the 2020 Tour de France actually run as scheduled, we would be in store for another very competitive race with no clear favorite: Froome, Bernal, Primoz Roglic, and Thomas are all within 10 GC points of each other, similar to the competitive situation last year.

Predicting Tour de France success

Predicting podium success even for the best rider entering each year has not been a slam dunk. Since 1992 only 18 of 28 ‘best GC riders’ have finished on the podium – though the list of failures has been mostly among the weaker ‘best GC riders’ during those transition periods – including Alex Zulle in 1997 and 1998, Damiano Cunego in 2006, and Alexandre Vinokourov in 2007.

A logistic regression model predicting podium success using 1) the natural log of the GC rank entering the race, 2) the best GC performance for each rider, and 3) whether the rider had ridden the Giro d’Italia showed all three to be significant predictors.

The coefficients showed that a rider who entered with the #1 ranking, had won the previous year’s TDF, and had ridden the Giro (the situation Chris Froome found himself in for 2018) would be expected to finish on the podium about 61% of the time. If not riding the Giro, that number would be 75%.

A rider in Egan Bernal’s position going into 2019 (#3 ranking, best finish having just won the Tour of Switzerland, and not having ridden the Giro) is predicted to have about a 32% chance of reaching the podium.
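For readers unfamiliar with logistic regression, the sketch below shows how three predictors of this kind turn into a podium probability. The fitted coefficients are not given in this post, so the values here are placeholders, with one exception: the 75% and 61% figures quoted above imply a Giro coefficient of roughly -0.65 on the log-odds scale, which the code derives from those two rounded numbers. The encoding of "best GC performance" is also an assumption; in this example its contribution is folded into the intercept.

```python
import math

def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))

def podium_probability(gc_rank, best_performance, rode_giro, coef):
    """Logistic model: the log-odds are a linear function of the three predictors."""
    z = (coef["intercept"]
         + coef["log_rank"] * math.log(gc_rank)
         + coef["best_performance"] * best_performance
         + coef["rode_giro"] * (1 if rode_giro else 0))
    return 1.0 / (1.0 + math.exp(-z))

# Giro effect implied by the two quoted probabilities (61% vs 75%).
implied_giro_coef = logit(0.61) - logit(0.75)

# PLACEHOLDER coefficients, not the fitted values. The intercept is chosen so a
# #1 ranked defending champion who skips the Giro lands on the quoted 75%;
# log_rank and best_performance are zeroed because their values are unknown.
coef = {
    "intercept": logit(0.75),
    "log_rank": 0.0,
    "best_performance": 0.0,
    "rode_giro": implied_giro_coef,
}

p_no_giro = podium_probability(1, 0, False, coef)
p_giro = podium_probability(1, 0, True, coef)
print(f"{p_no_giro:.2f} without the Giro, {p_giro:.2f} with it")
```

Using the log of the rank means the gap between #1 and #2 matters far more than the gap between #11 and #12, which fits how quickly podium chances fall away down the rankings.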

The limitations of this model are obvious:

  1. It doesn’t take team orders into account. Roberto Heras ranked 3rd, 2nd, and 4th in GC ranking from 2001 to 2003 while riding in support of Lance Armstrong and never sniffing the podium. Also, riders who enter aiming for stages after big efforts in the Giro (eg, Simon Yates and Vincenzo Nibali in 2019) will be overrated.
  2. Abandons and DNFs are considered equally alongside finishes outside the top 3. Who knows what Chris Froome’s form would have been if not knocked out in 2014, but all the model sees is the #1 rider in the world defending his 2013 victory as not finishing on the podium.
  3. Young riders are probably underrated; Jan Ullrich entered the 1996 Tour de France with no GC success in his career (though he already had a 3rd place in the World Championship Time Trial at age 20), but easily finished 2nd behind team leader Bjarne Riis.

Comparing with Archived Odds

Sports Odds History has pre-race odds available dating back to 2009 from Westgate sportsbook.

The top three favorites (moneyline odds):

2019: Thomas (+225), Bernal (+550), Fuglsang (+550)

2018: Froome (+150), Porte (+400), Quintana (+800)… Thomas was +1400

2017: Froome (+125), Porte (+175), Quintana (+600)

2016: Froome (+110), Quintana (+200), Contador (+450)

2015: Froome (+175), Quintana (+225), Contador (+350)

2014: Froome (-111), Contador (+150), Nibali (+900)

2013: Froome (-154), Contador (+250), Joaquim Rodriguez (+1800)

2012: Wiggins (+110), Evans (+200), Menchov (+1600)

2011: Contador (-167), A. Schleck (+210), Evans (+2000)

2010: Contador (-200), A. Schleck (+700), Armstrong (+800)

2009: Contador (+100), Armstrong (+250), Leipheimer (+400)
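To read these prices as probabilities, American moneyline odds can be converted with the standard formula: positive odds +X imply 100 / (X + 100), and negative odds -X imply X / (X + 100). The sketch below applies this to the 2019 prices listed above; `implied_probability` is an illustrative helper name. The results include the bookmaker's margin, so they overstate each rider's true chance somewhat.

```python
def implied_probability(moneyline):
    """Convert American moneyline odds to the implied win probability."""
    if moneyline == 0:
        raise ValueError("moneyline odds cannot be zero")
    if moneyline > 0:
        return 100 / (moneyline + 100)
    return -moneyline / (-moneyline + 100)

# 2019 favorites from the list above.
odds_2019 = {"Thomas": 225, "Bernal": 550, "Fuglsang": 550}
for rider, ml in odds_2019.items():
    print(f"{rider}: {implied_probability(ml):.1%}")
```

By this measure Contador at -200 in 2010 was priced at about a two-in-three chance, while Thomas at +225 in 2019 was priced at a little under one in three.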

The podium model matches the bookmaker favorite every year except 2015 (Contador #1, Froome #2, Nibali #3, Quintana #4) and 2016 (Nibali #1, Froome #2, Contador #3, Quintana #4).

Used as a diagnostic tool to judge which riders should have been considered favorites in each race, this GC model has value. As a forward-looking predictive method, it is certainly less useful than the bookmaker prices available ahead of each race.
