Advanced Horse Race Tactics using Trakus Coordinate Data

The impact of drafting percentage, path efficiency, race strategy and speed fluctuation on horse race tactics.

Author’s Note

This post first appeared as a Kaggle notebook submission for the Big Data Derby 2022 analytics competition on 10 November 2022, where it won the 3rd place prize.

Executive Summary

In this report, we will:

  1. Extract four performance-impacting features (drafting proportion, path efficiency, race strategy, speed fluctuation) from the coordinate data
  2. Analyse these features in relation to race distance and race surface/going
  3. Determine the predictive power of these features using a Multinomial Logit model
  4. Recommend racing strategies which we think will help horse owners, trainers and jockeys

Key Takeaways

  • Jockeys and trainers should choose race tactics that prioritise increasing path efficiency and decreasing speed fluctuation over drafting percentage and racing strategy.
  • Public odds significantly undervalue horses with lower speed fluctuations. If these undervalued horses can be identified before the race, this inefficiency can be exploited by placing exotic bets covering these horses.

Introduction

The most groundbreaking information in the NYTHA/NYRA dataset is the Trakus coordinate data recorded for each horse during the race. Our focus is to extract features from the coordinate data which will help horse owners, trainers and jockeys make better decisions on their racing strategies.

The four features we will generate and analyse are the drafting proportion, path efficiency, race strategy (early runner or sustainer, etc.) and speed fluctuation of a horse during a race. Although horse trainers and jockeys may not be able to control exactly how much drafting benefit they receive, jockeys can certainly attempt to choose strategies that offer more drafting opportunities.

We supplement the Trakus coordinate dataset with the Big Data Derby 2022: Global Horse IDs and places dataset compiled by Mark Green.

Data Preprocessing

Coordinate Reference System Transformation

We use GeoPandas to transform the Trakus coordinate data from the geographic latitude/longitude coordinate reference system (EPSG:4326), which is based on the Earth’s center of mass and measures positions in degrees, to a projected coordinate reference system for New York Long Island (EPSG:32118), which measures positions in metres. Working in metres enables more accurate distance calculations.
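As a rough illustration of this reprojection, here is a minimal sketch, assuming the samples sit in a pandas DataFrame with `latitude` and `longitude` columns. The two coordinate pairs are made-up values near a New York track, not rows from the dataset, and the variable names are ours.

```python
import geopandas as gpd
import pandas as pd

# Two hypothetical samples (illustrative values only, not taken from the dataset).
samples = pd.DataFrame(
    {
        "latitude": [40.6720, 40.6720],
        "longitude": [-73.8300, -73.8290],
    }
)

# Build point geometries in EPSG:4326; note the x=longitude, y=latitude order.
gdf = gpd.GeoDataFrame(
    samples,
    geometry=gpd.points_from_xy(samples["longitude"], samples["latitude"]),
    crs="EPSG:4326",
)

# Reproject to EPSG:32118, whose coordinates are expressed in metres.
gdf_m = gdf.to_crs(epsg=32118)

# Planar distance between the two samples, now directly in metres.
gap_m = gdf_m.geometry.iloc[0].distance(gdf_m.geometry.iloc[1])
```

Once the geometries are in a metre-based system, ordinary planar distances between consecutive samples can be used without any further conversion from degrees.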

Finish Line Truncation

The first Trakus sample provided for each race represents the starting position of the horses at the beginning of the race run-up. However, the final Trakus sample of each race does not represent the point where the horses cross the finish line; it is recorded some time after the race has finished. Coordinates recorded after a horse crosses the finish line are irrelevant to our analysis.

To remove all coordinates recorded after a horse has finished the race, we trace each horse’s path as a line, intersect it with the finish line, and discard the coordinates beyond the intersection. To locate the finish line, we pinpointed its exact coordinates on Google Maps.
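The truncation step can be sketched in plain Python as follows. This is not the notebook's own implementation: the function names, the straight-line example path and the finish-line endpoints are all hypothetical, and the sketch assumes coordinates already projected to metres.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def segment_intersection(p: Point, q: Point, a: Point, b: Point) -> Optional[Point]:
    """Return the point where segment p-q crosses segment a-b, or None."""
    rx, ry = q[0] - p[0], q[1] - p[1]
    sx, sy = b[0] - a[0], b[1] - a[1]
    denom = rx * sy - ry * sx
    if denom == 0:  # parallel or degenerate segments: treat as no crossing
        return None
    t = ((a[0] - p[0]) * sy - (a[1] - p[1]) * sx) / denom
    u = ((a[0] - p[0]) * ry - (a[1] - p[1]) * rx) / denom
    if 0 <= t <= 1 and 0 <= u <= 1:
        return (p[0] + t * rx, p[1] + t * ry)
    return None


def truncate_at_finish(path: List[Point], finish_a: Point, finish_b: Point) -> List[Point]:
    """Keep samples up to the first finish-line crossing, ending at the crossing point."""
    for i in range(len(path) - 1):
        crossing = segment_intersection(path[i], path[i + 1], finish_a, finish_b)
        if crossing is not None:
            return path[: i + 1] + [crossing]
    return list(path)  # the path never reaches the line: leave it unchanged


# Hypothetical straight run along the x-axis, sampled every 4 m.
path = [(0.0, 0.0), (4.0, 0.0), (8.0, 0.0), (12.0, 0.0), (16.0, 0.0)]
finish_a, finish_b = (10.0, -5.0), (10.0, 5.0)
truncated = truncate_at_finish(path, finish_a, finish_b)
```

Here `truncated` keeps the first three samples and ends at the interpolated crossing point `(10.0, 0.0)`. Cutting at the first crossing is a simplification: on an oval course a longer race may pass the finish line before the final lap, in which case the last crossing (or a lap count) would be needed instead.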

Figure: horse paths truncated at the finish line.

Feature Engineering

We outline four features generated from the Trakus coordinate data that we use to determine horse performance:

  • Drafting Percentage: percentage of the race where a horse has a drafting benefit.
  • Racing Strategy: whether a horse prefers to lead from the front, settle in the middle of the pack, or start slow.
  • Path Efficiency: the efficiency of the path the horse takes around the course.
  • Speed Fluctuation: the amount of variation in velocity the horse experiences once the horse has settled into the race.

How much drafting benefit does each horse receive?

To generate a feature representing drafting benefit, we start from the fundamentals of drafting. Drafting occurs when a horse follows closely behind another horse. The resulting reduction in aerodynamic drag is an essential element of horse racing because it lets horses save energy that they can expend closer to the finish line.

The coordinate data lets us look slightly ahead of each horse to see whether another horse occupies that space. We draw a circle with a 2 metre radius centered 3 metres ahead of the horse’s current position; if another horse’s coordinate lies inside that circle, the horse is treated as receiving a drafting benefit.
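The look-ahead check can be sketched as below. The 3 metre offset and 2 metre radius come from the description above; everything else is an assumption made for illustration, in particular that "ahead" means the direction from the horse's previous sample to its current one. The function names and the three example horses are hypothetical.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]

LOOK_AHEAD_M = 3.0  # distance from the horse to the circle centre (from the text)
RADIUS_M = 2.0      # radius of the look-ahead circle (from the text)


def look_ahead_centre(prev: Point, curr: Point, distance: float = LOOK_AHEAD_M) -> Point:
    """Centre of the look-ahead circle, `distance` metres ahead along the direction of travel.

    Assumption: the heading is estimated from the previous and current samples.
    """
    dx, dy = curr[0] - prev[0], curr[1] - prev[1]
    step = math.hypot(dx, dy)
    if step == 0:  # stationary horse: no heading, so fall back to its own position
        return curr
    return (curr[0] + distance * dx / step, curr[1] + distance * dy / step)


def has_draft(horse: str, prev: Dict[str, Point], curr: Dict[str, Point]) -> bool:
    """True if any other horse's current position lies inside this horse's look-ahead circle."""
    centre = look_ahead_centre(prev[horse], curr[horse])
    return any(
        math.dist(centre, pos) <= RADIUS_M
        for other, pos in curr.items()
        if other != horse
    )


# Hypothetical positions (in metres) of three horses at two consecutive samples.
prev = {"A": (0.0, 0.0), "B": (3.5, 0.5), "C": (0.0, 6.0)}
curr = {"A": (4.0, 0.0), "B": (7.5, 0.5), "C": (4.0, 6.0)}
drafting = {horse: has_draft(horse, prev, curr) for horse in curr}
```

In this example only horse A is flagged, because B sits inside the circle centered 3 metres ahead of A. Averaging such a flag over all of a horse's samples in a race would give one possible drafting percentage.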

In the figure below:

  • red points represent horses with draft benefit
  • black points represent horses without draft benefit
  • yellow circles represent look-ahead regions used to check for the presence of drafting