Supercomputer predicts 2026 Cheltenham Gold Cup result

irishracing.com news

Published at Mar 9, 2026, 13:35

Cheltenham, 14 March 2025: Inothewayurthinkin and Mark Walsh win the Gold Cup
© Healy Racing Photos

Later this week, Cheltenham racecourse will be filled with the sound of thundering hooves and raucous spectators as some of the world’s best National Hunt horses contend across 28 races.

Punters will, of course, be weighing up every permutation to predict the winner of the prestigious Cheltenham Gold Cup, especially with the Willie Mullins-trained Fact To File now set to run in the Ryanair Chase and two-time Gold Cup winner Galopin des Champs withdrawn. The experts at irishracing.com have fired up their supercomputer to assess which horse has the best chance of winning the Gold Cup.

The Algorithm Driving The Supercomputer

The data experts at irishracing formulated a complex, industry-leading algorithm to assess the chances of the horses competing in the 2026 Gold Cup.

The algorithm worked through the following steps before running the simulations and generating a winning probability for each competing horse.

1. Data Collection & Extraction

The project utilized data sourced from IrishRacing, consisting of two primary groups:

Historical Winners: Data on past victors of specific Cheltenham races to establish “success profiles.”

Current Contenders: The 2026 entries for whom we calculated winning probabilities.

Data was initially stored in nested JSON structures.

We developed a custom Python extraction pipeline to:

- Recursively read JSON files from a directory.

- Dynamically identify race-specific columns (e.g., using regex to find “Form ge X furlongs”).

- Flatten complex nested attributes into a tabular format.
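The three extraction steps above can be sketched roughly as follows. This is a minimal illustration rather than the pipeline itself: the function names, the dotted-key naming, the `source_file` column, and the exact regex are all assumptions made for the example, as is the idea that each file holds one record or a list of records.

```python
import json
import re
from pathlib import Path

# Hypothetical pattern for race-specific columns such as "Form ge 26 furlongs".
FORM_COLUMN = re.compile(r"Form ge (\d+) furlongs$")


def flatten(record, prefix=""):
    """Flatten nested dicts into one dict whose keys are dotted paths."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def load_rows(directory):
    """Recursively read every *.json file under `directory` into flat rows."""
    rows = []
    for path in sorted(Path(directory).rglob("*.json")):
        with path.open(encoding="utf-8") as handle:
            data = json.load(handle)
        # Assumption: a file holds either a single record or a list of records.
        records = data if isinstance(data, list) else [data]
        for record in records:
            row = flatten(record)
            row["source_file"] = path.stem  # kept so rows can be grouped by race
            rows.append(row)
    return rows


def form_columns(row):
    """Return the keys in a row that look like 'Form ge X furlongs' columns."""
    return [key for key in row if FORM_COLUMN.search(key)]
```

The resulting list of flat dictionaries can be handed directly to a tabular library, or written out as CSV, for the pre-processing stage.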

2. Pre-Processing & Feature Engineering

Before modeling, raw data underwent extensive cleaning and transformation:

Odds Conversion: Fractional starting prices (e.g., “5/1”) were parsed and converted into Decimal Odds for mathematical processing.

Weight Normalization: Horse weights were converted from the traditional stones-pounds format into kilograms.

Feature Extraction:

- Extracted Horse, Sire, and Dam countries from text strings.

- Derived a Distance Fitness boolean (is_dist_fit) to indicate if a horse has previous form over the race’s specific distance.

String Cleaning: Used regex and iterative replacement to clean race names from filenames, ensuring 100% consistency for grouping.
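To make the conversions above concrete, here is a small sketch of the odds, weight, and distance-fitness transformations. The helper names, the handling of "evens", and the way missing form is represented are assumptions for illustration; only the is_dist_fit name comes from the description above.

```python
from fractions import Fraction

LB_PER_STONE = 14
KG_PER_LB = 0.45359237  # exact definition of the international pound


def fractional_to_decimal(price):
    """Convert a fractional price such as '5/1' to decimal odds (stake included)."""
    text = price.strip().lower()
    if text in {"evens", "evs"}:  # assumption: even money may appear as text
        return 2.0
    numerator, denominator = text.split("/")
    return float(Fraction(int(numerator), int(denominator)) + 1)


def stones_pounds_to_kg(weight):
    """Convert a 'stones-pounds' weight such as '11-10' to kilograms."""
    stones, pounds = (int(part) for part in weight.split("-"))
    return (stones * LB_PER_STONE + pounds) * KG_PER_LB


def is_dist_fit(form_at_distance):
    """True when any form is recorded over the race distance.

    Assumption: missing form arrives as None, an empty string, or '-'.
    """
    return form_at_distance is not None and form_at_distance.strip() not in {"", "-"}
```

For example, a price of "5/1" becomes decimal odds of 6.0, because the decimal format includes the returned stake.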

3. Machine Learning Methodology

The model utilizes a Relative Profile Matching approach based on Cosine Similarity and Softmax Normalization.

A. Feature Vectorization

We combined numerical features (Age, Betting Position, Odds, Weight) with categorical features (Breeding and Trainer countries). To make these “machine-readable”:

One-Hot Encoding (OHE): Categorical data was converted into binary vectors to ensure the model doesn’t assume an arbitrary order between countries.

Standardization: Using StandardScaler, we normalized all features to a mean of 0 and a standard deviation of 1, preventing high-value features (like odds) from dominating the calculation.
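As a rough illustration of the two transformations, the sketch below reproduces them without any third-party dependency. It is not the StandardScaler implementation itself, only the same arithmetic (population standard deviation, with constant columns mapped to zero); the function names are hypothetical.

```python
from statistics import fmean, pstdev


def one_hot(values):
    """Return the sorted categories and one binary vector per input value."""
    categories = sorted(set(values))
    vectors = [[1.0 if value == cat else 0.0 for cat in categories] for value in values]
    return categories, vectors


def standardize(column):
    """Rescale a numeric column to mean 0 and standard deviation 1."""
    mean = fmean(column)
    spread = pstdev(column)  # population standard deviation
    if spread == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(value - mean) / spread for value in column]


# Illustrative values only, not data from the model.
categories, country_vectors = one_hot(["IRE", "FR", "IRE"])
scaled_odds = standardize([2.0, 4.0, 6.0])
```

Because every column ends up on the same scale, a feature measured in large units such as odds no longer outweighs a 0/1 country indicator simply because of its magnitude.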

B. Weighted Nearest Neighbors

Rather than comparing a horse to an “average” winner, we used a Nearest Neighbors approach.

Cosine Similarity: We calculated the cosine of the angle between the feature vector of each current contender and that of every historical winner, so a higher value indicates a closer match.

Max Similarity Score: Each horse was assigned a score based on its “closest” historical match. This acknowledges that there is more than one way to win a race (e.g., a “Young Improver” profile vs. a “Veteran” profile).

Feature Weighting: We applied manual weights to emphasize the most predictive variables, specifically giving the highest priority to Betting Position (0.9) and Distance Fitness (0.5).
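The scoring described in this section can be sketched as below. Two things here are assumptions rather than details from the methodology: weights are applied by scaling each feature before the similarity is computed, and the weight of 0.2 for the third feature is a placeholder, since only the 0.9 and 0.5 values are stated above.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def apply_weights(vector, weights):
    """Scale each feature by its weight (assumed weighting scheme)."""
    return [x * w for x, w in zip(vector, weights)]


def max_similarity(contender, winners, weights):
    """Score a contender by its closest match among the historical winners."""
    weighted = apply_weights(contender, weights)
    return max(
        cosine_similarity(weighted, apply_weights(winner, weights))
        for winner in winners
    )


# Feature order: betting position, is_dist_fit, age (all already standardized).
# 0.9 and 0.5 come from the text; 0.2 is a hypothetical placeholder.
WEIGHTS = [0.9, 0.5, 0.2]
```

Taking the maximum rather than the mean is what lets a contender score highly by resembling any one winning profile, instead of having to resemble a blend of all of them.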

C. Intra-Race Probability (Softmax)

To turn similarity scores into a “Chance of Winning,” we applied a Softmax function localized to each specific race name.

This ensures that the total probability for all horses in a single race sums to 100%.

This approach accounts for the “strength of the field”; a horse might be an elite match for history, but if it is racing against five other “elite” matches, its individual chance of winning is lower.
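A per-race Softmax of this kind could look like the following sketch. The function names, the tuple layout of the input, and the horse names in the example are hypothetical, and the scores are illustrative rather than model output.

```python
import math
from collections import defaultdict


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def win_chances(entries):
    """Map (race, horse, score) tuples to {race: {horse: percent}}."""
    by_race = defaultdict(list)
    for race, horse, score in entries:
        by_race[race].append((horse, score))
    chances = {}
    for race, runners in by_race.items():
        probabilities = softmax([score for _, score in runners])
        chances[race] = {
            horse: 100 * p for (horse, _), p in zip(runners, probabilities)
        }
    return chances


example = win_chances([
    ("Gold Cup", "Horse A", 0.9),
    ("Gold Cup", "Horse B", 0.9),
    ("Another Race", "Horse C", 0.1),
])
```

In this example the two Gold Cup runners have identical scores, so each receives half of that race's probability, while the sole runner in the other race receives all of its race's probability regardless of its lower score.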

4. Final Output

The final model produced a Winning Chance Score, expressed as a percentage between 0 and 100, for every horse competing in each race.

Supercomputer predictions for Boodles Cheltenham Gold Cup

As per our supercomputer’s simulations, the Ben Pauling-trained The Jukebox Man is the favoured horse with a winning probability of 12.96 percent. The simulations point to a closely contested battle with Willie Mullins’ Gaelic Warrior, who has a win probability of 12.71 percent. Jango Baie and Banbridge shouldn’t be far away either, with winning chances of 11.81 percent and 11.27 percent respectively.

Last year’s winner of the Gold Cup, Inothewayurthinkin, is given next to no chance of repeating the feat by our supercomputer with a winning probability of 0.09 percent.
