Update on Back End System development
I have finished developing the back end system
for UK Horse Racing, and am over the moon with what I’ve added in the last
week. More on that later.
It’s a bit of a beast, with over 7,100 individual query steps
in the database, and 10GB of data across all of the tables.
There is data back to 1st January 2012, which
means there are over 400,000 individual runners over that period.
In terms of what’s left to do, there is one big section that
I haven’t completed yet: the advanced pattern spotting section.
I don’t think it’s essential for the moment, and I need to give
more thought to how I implement it, so I am happy to leave it
for now. We will do it at some stage, however.
There are other more advanced concepts that I’d like to
build, such as a proper neural network to process the data, but they are not a priority
for now.
All that’s left really is a final run through of all the
tables and processes to ensure we have data integrity, plus four micro
strategy areas that I want to expand upon.
The four are:
Micro Strategies
ACE Strategies
100% Strategies
Final Strategies
They all work in a similar way but happen at different
stages of the process and therefore make use of different data.
It will add significant value to the process as a whole to
expand upon them, and I will do so over the next week to ten days. It’s not
significant development, just one query for every strategy I add.
The Final Strategies query uses the steps I’ve added in the
last week, and has some very successful strategies. Here is the summary table:
| Metric ID | Runs | Wins | Places | Win % | Place % |
| ZZ001 | 9 | 9 | 9 | 100.00 | 100.00 |
| ZZ002 | 422 | 352 | 380 | 83.41 | 90.05 |
| ZZ003 | 52 | 45 | 48 | 86.54 | 92.31 |
| ZZ004 | 50 | 48 | 49 | 96.00 | 98.00 |
| ZZ006 | 46 | 36 | 40 | 78.26 | 86.96 |
| ZZ007 | 29 | 26 | 28 | 89.66 | 96.55 |
| ZZ008 | 11 | 11 | 11 | 100.00 | 100.00 |
| ZZ009 | 11 | 11 | 11 | 100.00 | 100.00 |
| ZZ010 | 33 | 31 | 31 | 93.94 | 93.94 |
| ZZ011 | 15 | 15 | 15 | 100.00 | 100.00 |
| ZZ012 | 15 | 15 | 15 | 100.00 | 100.00 |
| ZZ013 | 42 | 37 | 39 | 88.10 | 92.86 |
| ZZ014 | 152 | 126 | 136 | 82.89 | 89.47 |
| ZZ016 | 132 | 125 | 127 | 94.70 | 96.21 |
| ZZ017 | 176 | 160 | 165 | 90.91 | 93.75 |
| ZZ018 | 183 | 165 | 171 | 90.16 | 93.44 |
| ZZ019 | 70 | 62 | 68 | 88.57 | 97.14 |
| ZZ021 | 32 | 29 | 31 | 90.63 | 96.88 |
| ZZ022 | 28 | 24 | 25 | 85.71 | 89.29 |
| ZZ023 | 29 | 29 | 29 | 100.00 | 100.00 |
| ZZ024 | 22 | 20 | 22 | 90.91 | 100.00 |
| ZZ025 | 178 | 158 | 168 | 88.76 | 94.38 |
| ZZ026 | 16 | 16 | 16 | 100.00 | 100.00 |
| ZZ027 | 7 | 7 | 7 | 100.00 | 100.00 |
| ZZ028 | 10 | 10 | 10 | 100.00 | 100.00 |
| ZZ029 | 16 | 16 | 16 | 100.00 | 100.00 |
| ZZ030 | 14 | 14 | 14 | 100.00 | 100.00 |
I’ve only been able to run it for a few weeks’ worth of
history data, so the number of runs is a little small at the moment.
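To put the small-sample caveat in numbers, here is a minimal sketch in Python. It recomputes a win % from the runs and wins in the summary table, then adds a Wilson score lower bound as one common way of showing how much a percentage can be trusted at a given number of runs. The function names and the choice of the Wilson interval are my own assumptions for illustration; they are not part of the back end system.

```python
import math


def strike_rate(successes: int, runs: int) -> float:
    """Percentage of runs that were successes, rounded to two decimals."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    return round(100 * successes / runs, 2)


def wilson_lower_bound(wins: int, runs: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval, as a percentage.

    z = 1.96 corresponds to a 95% interval (an illustrative choice).
    """
    if runs <= 0:
        raise ValueError("runs must be positive")
    p = wins / runs
    centre = p + z * z / (2 * runs)
    margin = z * math.sqrt(p * (1 - p) / runs + z * z / (4 * runs * runs))
    return 100 * (centre - margin) / (1 + z * z / runs)


# Rows taken from the summary table: (metric id, runs, wins)
rows = [("ZZ001", 9, 9), ("ZZ002", 422, 352)]

for metric_id, runs, wins in rows:
    observed = strike_rate(wins, runs)
    cautious = wilson_lower_bound(wins, runs)
    print(f"{metric_id}: observed {observed:.2f}%, lower bound {cautious:.1f}%")
```

With only 9 runs, a perfect record still has a lower bound of roughly 70%, which is why the longer history run matters before reading too much into the 100% rows.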
Once I am happy that we have enough micro strategies then
I’ll start the lengthy process of re-running all of the historic data.
I’ll do that mostly overnight, and will keep it separate
from the main process so we can run both in parallel and bring them together
when the historic data process reaches the present day.
In terms of what I’ve added in the last week or so, it’s
proving to be very good as you can see from the above table.
What it does is use a combination of some of my favourite
elements:
a) Creating our own metrics/ratings using the underlying data we have
b) Creating blended metrics that consolidate multiple metrics or events together
c) Creating a metric store of extreme metrics – for instance where the horse has the highest speed rating in the race, or for the day. These are then pooled, and win percentages are calculated.
d) Using the metric store data in combination to create double metrics, either using the same data or with different metric stores.
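As a rough illustration of element c), the sketch below flags the runner with the highest speed rating in each race, pools those flagged runners, and works out a win % for the pool. The sample data, the field layout and the metric name `TOP_SPEED_IN_RACE` are all hypothetical; the real process runs as database queries, not Python.

```python
# Hypothetical runners: (race_id, horse, speed_rating, won)
runners = [
    ("R1", "A", 92, True),
    ("R1", "B", 88, False),
    ("R2", "C", 75, False),
    ("R2", "D", 81, True),
    ("R3", "E", 90, False),
    ("R3", "F", 85, True),
]


def top_rated_per_race(rows):
    """Return {race_id: (horse, rating, won)} for the highest-rated runner.

    If two runners tie, the one seen first is kept (an arbitrary choice).
    """
    best = {}
    for race_id, horse, rating, won in rows:
        if race_id not in best or rating > best[race_id][1]:
            best[race_id] = (horse, rating, won)
    return best


def pooled_win_pct(flagged):
    """Win % across every runner that carried the extreme metric."""
    runs = len(flagged)
    if runs == 0:
        return 0.0
    wins = sum(1 for _, _, won in flagged.values() if won)
    return round(100 * wins / runs, 2)


flagged = top_rated_per_race(runners)
metric_store = {"TOP_SPEED_IN_RACE": pooled_win_pct(flagged)}
print(metric_store)
```

In this toy data the top-rated horse wins two of the three races, so the pooled win % for the metric is 66.67.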
All of the above is processed overall and for similar races,
using a scenario id that I apply to every race. I haven’t used the scenario
related data much in the current process, because we don’t have enough history
processed for the new steps to make it useful. I’ll look at that again when the
history run has completed. In theory the scenario data should be more accurate
because it looks at:
The type of race – flat or jumps
The distance of the race
The going
The Class of the race
Special features – such as just for Two Year Olds.
So speed ratings might be more relevant to shorter races for
example.
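One way to picture a scenario id is as a key built from the five race attributes listed above, so that races sharing a key count as "similar". The sketch below is only an assumption about how such a key could look: the `Race` fields, the distance band edges and the key format are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Race:
    race_type: str            # "flat" or "jumps"
    distance_furlongs: float
    going: str                # e.g. "Good to Firm"
    race_class: int
    special: str = ""         # e.g. "2yo" for two-year-olds only


def distance_band(furlongs: float) -> str:
    """Bucket the distance; the band edges here are illustrative only."""
    if furlongs < 7:
        return "sprint"
    if furlongs < 12:
        return "middle"
    return "staying"


def scenario_id(race: Race) -> str:
    """Join the race attributes into a single key for grouping similar races."""
    parts = [
        race.race_type,
        distance_band(race.distance_furlongs),
        race.going.lower().replace(" ", "_"),
        f"c{race.race_class}",
        race.special or "open",
    ]
    return "|".join(parts)


example = Race("flat", 6, "Good to Firm", 4, "2yo")
print(scenario_id(example))
```

Win percentages can then be pooled per key as well as overall, which is how a speed-based metric could score differently for sprints than for staying races.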
The idea for this final process came from the mega ratings I
created. One of those groups I called meta, and I build it from some of the
output from the metric store processes.
This is going beyond the underlying raw data and is based on
the win% of the metrics in the store and how many apply to a particular horse.
In this respect it’s a bit like a neural net when you put it
all together, because it “learns” what is effective by constantly updating the
win%.
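The "constantly updating" part can be sketched as a running tally per metric: each new result adds to the runs and wins, and the win % is recalculated from those totals. The class and attribute names below are hypothetical and only show the idea.

```python
class MetricRecord:
    """Running totals for one metric in the store (hypothetical structure)."""

    def __init__(self) -> None:
        self.runs = 0
        self.wins = 0

    def update(self, won: bool) -> None:
        """Fold one new race result into the totals."""
        self.runs += 1
        self.wins += int(won)

    @property
    def win_pct(self) -> float:
        """Current win %, or 0.0 before any results have been recorded."""
        return 100 * self.wins / self.runs if self.runs else 0.0


record = MetricRecord()
history = []
for result in [True, True, False, True]:
    record.update(result)
    history.append(record.win_pct)

print(history)
```

Each result shifts the percentage, so a metric that stops working drifts down and eventually drops out of any selection based on a win % cut-off.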
I create a number of these meta metrics and then use them in
conjunction with other raw data to create double metrics.
I run three processes:
A) Banker – this uses the best attributes for the horse: won last race, speed superiority over others in the race, etc.
B) Perfect – this is a special set of metrics from each data store, picking only those with the highest win %
C) Winning – this uses the meta metrics that I created in the mega rating process.
By creating these double metrics we are making the most of
the meta metrics we’ve created and the best of the other data that we hold.
It does create a lot of metrics: if you have 50 in one set
and 100 in the other, then that gives you 5,000 possible combinations. But when
we count how many apply to each horse, we only count them if the win% for that
combination is above 85%, so we’re only counting the most significant ones.
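The combination count and the 85% filter can be sketched as follows. The metric ids, the win percentages and the function name are made up for the example; only the 50 x 100 = 5,000 arithmetic and the "above 85%" rule come from the description above.

```python
from itertools import product

# Two hypothetical metric sets of the sizes mentioned in the text.
set_a = [f"A{i:02d}" for i in range(50)]
set_b = [f"B{i:03d}" for i in range(100)]
all_combinations = list(product(set_a, set_b))
print(len(all_combinations))  # 50 * 100 = 5000

WIN_PCT_THRESHOLD = 85.0


def significant_count(horse_a, horse_b, combo_win_pct,
                      threshold=WIN_PCT_THRESHOLD):
    """Count the double metrics that apply to a horse and beat the threshold.

    horse_a / horse_b: metric ids from each set that apply to the horse.
    combo_win_pct: {(a_id, b_id): win %} for combinations with history.
    Combinations with no history are treated as 0% and never counted.
    """
    return sum(
        1
        for pair in product(horse_a, horse_b)
        if combo_win_pct.get(pair, 0.0) > threshold
    )


# Made-up win percentages for a handful of combinations.
combo_win_pct = {
    ("A01", "B002"): 91.4,
    ("A01", "B010"): 80.0,
    ("A03", "B002"): 85.0,  # exactly 85 is not "above 85%"
}

count = significant_count({"A01", "A03"}, {"B002", "B010"}, combo_win_pct)
print(count)
```

Four combinations apply to this horse, but only one clears the threshold, so the horse’s count is 1.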
As an example from the Perfect Combination table, here are
the ones with the most runs:
| Metric ID | Runs | Wins | Places | Win % | Place % |
| D169402 | 612 | 608 | 612 | 99.34641 | 100 |
| D172602 | 348 | 318 | 326 | 91.37931 | 93.67816 |
| D216402 | 346 | 318 | 326 | 91.90752 | 94.21966 |
| D169425 | 331 | 327 | 331 | 98.79154 | 100 |
| D168602 | 326 | 310 | 326 | 95.09203 | 100 |
| D169418 | 326 | 324 | 326 | 99.38651 | 100 |
| D169417 | 306 | 304 | 306 | 99.34641 | 100 |
| D169414 | 292 | 288 | 292 | 98.63013 | 100 |
| D169416 | 288 | 288 | 288 | 100 | 100 |
| D167202 | 284 | 258 | 270 | 90.84507 | 95.07042 |
| D169202 | 282 | 250 | 270 | 88.65248 | 95.74468 |
| D168002 | 280 | 266 | 280 | 95 | 100 |
| D169002 | 266 | 254 | 262 | 95.48872 | 98.49624 |
| D168802 | 266 | 254 | 262 | 95.48872 | 98.49624 |
| D173402 | 260 | 248 | 250 | 95.38461 | 96.15385 |
| D152402 | 254 | 246 | 246 | 96.8504 | 96.8504 |
| D159602 | 220 | 210 | 220 | 95.45454 | 100 |
| D159802 | 184 | 176 | 180 | 95.65218 | 97.82609 |
| D172617 | 174 | 159 | 163 | 91.37931 | 93.67816 |
| D172618 | 174 | 159 | 163 | 91.37931 | 93.67816 |
| D216417 | 173 | 159 | 163 | 91.90752 | 94.21966 |
| D216418 | 173 | 159 | 163 | 91.90752 | 94.21966 |
| D169219 | 172 | 164 | 172 | 95.34883 | 100 |
| D168619 | 170 | 154 | 170 | 90.58823 | 100 |
| D160002 | 164 | 162 | 164 | 98.78049 | 100 |
| D213802 | 162 | 138 | 148 | 85.18519 | 91.35802 |
| D149802 | 160 | 136 | 146 | 85 | 91.25 |
| D168625 | 156 | 143 | 155 | 91.66667 | 99.35898 |
| D169403 | 155 | 155 | 155 | 100 | 100 |
| D214002 | 152 | 132 | 140 | 86.84211 | 92.10526 |
| D214202 | 152 | 132 | 140 | 86.84211 | 92.10526 |
| D150002 | 150 | 130 | 138 | 86.66666 | 92 |
| D150202 | 150 | 130 | 138 | 86.66666 | 92 |
| D172625 | 148 | 142 | 145 | 95.94595 | 97.97297 |
| D216425 | 148 | 142 | 145 | 95.94595 | 97.97297 |
| D168618 | 145 | 145 | 145 | 100 | 100 |
| D168614 | 144 | 144 | 144 | 100 | 100 |
| D169419 | 144 | 144 | 144 | 100 | 100 |
| D169225 | 142 | 131 | 137 | 92.25352 | 96.47887 |
| D168617 | 139 | 139 | 139 | 100 | 100 |
| D167225 | 138 | 131 | 135 | 94.92754 | 97.82609 |
The really exciting thing about this is how much potential
it has for our FOREX development.
From here I will be working on three strands:
1) Expanding our Horse Racing process to other areas – really
just a case of getting the necessary raw data and plugging it into the system.
Areas will be Hong Kong, Japan, USA, Australia and then
every country that has Horse Racing and where we can get the data we need.
2) Recreating my local FOREX process and then getting it built
strategically
I need to go back to the beginning of the FOREX process and
rebuild it based upon what I’ve learnt over the last week or so.
We are going to be able to create a process that works
unlike anything I have ever seen for FOREX/Investment. This is going to give
us a serious competitive advantage and I can’t wait to start building it.
I will build a version locally and then get the developers
to convert that to the strategic database. A local version is never going
to be powerful enough to process the FOREX data, but it means I can develop and
test, and the developers just need to convert it.
3) Soccer
This is the next biggest potential market for us, and it is
huge across the globe with many different markets. It’s not just about whether
a match is won, drawn or lost; there are markets for the number of goals,
corners and just about everything else in a match.
It’s a team sport so is different to the other two main
strands, but the size of the market means we have to try and see if these new
processes can work for such a sport.
That is the next focus of work, and it will give us sufficient
income streams to then work one by one through every available Investment and
sport niche.