Thursday, 9 July 2015

Status Report - Back End Development

Update on Back End System development


I have finished developing the back end system for UK Horse Racing, and I am over the moon with what I’ve added in the last week (more on that later).

It’s a bit of a beast, with over 7,100 individual query steps in the database and 10GB of data across all of the tables.

The data goes back to 1st January 2012 and covers over 400,000 individual runners.

In terms of what’s left to do, there is one big section that I haven’t completed yet: the advanced pattern spotting section.

I don’t think it’s essential for the moment, and I need to give more thought to how I implement it, so I am happy to leave it for now. We will do it at some stage, however.

There are other, more advanced concepts that I’d like to build, such as a proper neural network to process the data, but they are not a priority for now.

All that’s really left is a final run-through of all the tables and processes to ensure data integrity, plus four micro strategy areas that I want to expand upon.

The four are:

Micro Strategies
ACE Strategies
100% Strategies
Final Strategies

They all work in a similar way but happen at different stages of the process and therefore make use of different data.

Expanding them will add significant value to the process as a whole, and I will do so over the next week to ten days. It isn’t significant development: just one query for every strategy I add.
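To illustrate what “one query for every strategy” can look like, here is a minimal sketch in Python using an in-memory SQLite table. The table layout, column names and strategy IDs are hypothetical stand-ins rather than the real schema. The point is only that each strategy is a single filter, and the Runs/Wins/Places/Win %/Place % summary is produced the same way for all of them.

```python
import sqlite3

# Hypothetical runners table: one row per runner. Column names are illustrative only.
SCHEMA = """
CREATE TABLE runners (
    race_id    INTEGER,
    horse      TEXT,
    speed_rank INTEGER,  -- 1 = highest speed rating in the race
    won_last   INTEGER,  -- 1 if the horse won its previous race
    won        INTEGER,  -- 1 if the horse won this race
    placed     INTEGER   -- 1 if the horse placed in this race
)
"""

# Each strategy is one WHERE clause, so adding a strategy means adding one query.
# The clauses are trusted internal constants, which is why they are formatted in directly.
STRATEGIES = {
    "EXAMPLE_A": "speed_rank = 1",
    "EXAMPLE_B": "speed_rank = 1 AND won_last = 1",
}


def summarise(conn, strategy_id, where):
    """Return (id, runs, wins, places, win %, place %) for one strategy."""
    runs, wins, places = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(won), 0), COALESCE(SUM(placed), 0) "
        f"FROM runners WHERE {where}"
    ).fetchone()
    win_pct = round(100.0 * wins / runs, 2) if runs else 0.0
    place_pct = round(100.0 * places / runs, 2) if runs else 0.0
    return strategy_id, runs, wins, places, win_pct, place_pct


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO runners VALUES (?, ?, ?, ?, ?, ?)",
    [  # toy rows for illustration, not real results
        (1, "A", 1, 1, 1, 1),
        (1, "B", 2, 0, 0, 1),
        (2, "C", 1, 0, 0, 1),
        (2, "D", 2, 1, 1, 1),
        (3, "E", 1, 1, 1, 1),
    ],
)

for sid, clause in STRATEGIES.items():
    print(summarise(conn, sid, clause))
```

Adding a new strategy in this sketch is just one more entry in `STRATEGIES`; the summary query never changes.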

The Final Strategies query uses the steps I’ve added in the last week and already has some very successful strategies. Here is the summary table:

Metric ID   Runs  Wins  Places   Win %  Place %
ZZ001          9     9       9  100.00   100.00
ZZ002        422   352     380   83.41    90.05
ZZ003         52    45      48   86.54    92.31
ZZ004         50    48      49   96.00    98.00
ZZ006         46    36      40   78.26    86.96
ZZ007         29    26      28   89.66    96.55
ZZ008         11    11      11  100.00   100.00
ZZ009         11    11      11  100.00   100.00
ZZ010         33    31      31   93.94    93.94
ZZ011         15    15      15  100.00   100.00
ZZ012         15    15      15  100.00   100.00
ZZ013         42    37      39   88.10    92.86
ZZ014        152   126     136   82.89    89.47
ZZ016        132   125     127   94.70    96.21
ZZ017        176   160     165   90.91    93.75
ZZ018        183   165     171   90.16    93.44
ZZ019         70    62      68   88.57    97.14
ZZ021         32    29      31   90.63    96.88
ZZ022         28    24      25   85.71    89.29
ZZ023         29    29      29  100.00   100.00
ZZ024         22    20      22   90.91   100.00
ZZ025        178   158     168   88.76    94.38
ZZ026         16    16      16  100.00   100.00
ZZ027          7     7       7  100.00   100.00
ZZ028         10    10      10  100.00   100.00
ZZ029         16    16      16  100.00   100.00
ZZ030         14    14      14  100.00   100.00

I’ve only been able to run it over a few weeks’ worth of historical data, so the numbers of runs are still a little small.

Once I am happy that we have enough micro strategies, I’ll start the lengthy process of re-running all of the historical data.

I’ll do that mostly overnight and keep it separate from the main process, so we can run both in parallel and bring them together when the historical run reaches the present day.

As you can see from the table above, what I’ve added in the last week or so is proving to be very good.

What it does is use a combination of some of my favourite elements:

a)       Creating our own metrics/ratings using the underlying data we have
b)       Creating blended metrics that consolidate multiple metrics or events together
c)       Creating a metric store of extreme metrics – for instance, where the horse has the highest speed rating in the race or for the day. These are then pooled, and a win % is calculated for each.
d)       Using the metric store data in combination, to create double metrics, either using the same data or with different metric stores.
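As a rough illustration of point c), the sketch below flags runners that have the highest speed rating in their race or on their day, pools those flags across all races and works out a win % for each extreme metric. The record layout and metric names are assumptions made for the example, and ties simply flag every joint-top runner.

```python
from collections import defaultdict

# Hypothetical runner records: (race_id, day, horse, speed_rating, won). Toy data only.
runners = [
    ("R1", "2015-07-01", "A", 95, True),
    ("R1", "2015-07-01", "B", 88, False),
    ("R2", "2015-07-01", "C", 91, False),
    ("R2", "2015-07-01", "D", 85, True),
    ("R3", "2015-07-02", "E", 99, True),
    ("R3", "2015-07-02", "F", 70, False),
]


def extreme_flags(runners):
    """Yield (metric_id, won) for each runner that is top-rated in its race or on its day."""
    best_in_race = defaultdict(lambda: float("-inf"))
    best_on_day = defaultdict(lambda: float("-inf"))
    for race, day, _, rating, _ in runners:
        best_in_race[race] = max(best_in_race[race], rating)
        best_on_day[day] = max(best_on_day[day], rating)
    for race, day, _, rating, won in runners:
        if rating == best_in_race[race]:  # joint-top runners are all flagged
            yield "TOP_SPEED_IN_RACE", won
        if rating == best_on_day[day]:
            yield "TOP_SPEED_ON_DAY", won


def build_metric_store(runners):
    """Pool the flags and return {metric_id: (runs, wins, win %)}."""
    counts = defaultdict(lambda: [0, 0])  # metric_id -> [runs, wins]
    for metric_id, won in extreme_flags(runners):
        counts[metric_id][0] += 1
        counts[metric_id][1] += int(won)
    return {m: (runs, wins, round(100.0 * wins / runs, 2)) for m, (runs, wins) in counts.items()}


print(build_metric_store(runners))
```

Point d) then works on the output of stores like this one, pairing up metrics rather than reading the raw runner data again.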


All of the above is processed both overall and for similar races, using a scenario id that I apply to every race. I haven’t used the scenario-related data much in the current process, because we don’t yet have enough history processed through the new steps to make it useful; I’ll look at it again when the history run has completed. In theory the scenario data should be more accurate, because it takes into account:

The type of race – flat or jumps
The distance of the race
The going
The Class of the race
Special features – such as races just for two-year-olds.

For example, speed ratings might be more relevant to shorter races.
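One way to build such a scenario id is to combine the five attributes into a single key, as sketched below. The distance bands, field names and key format are hypothetical, since the post doesn’t say how the real id is constructed. Win % can then be grouped by scenario id as well as overall.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Race:
    code: str           # type of race: "FLAT" or "JUMPS"
    distance_f: float   # distance in furlongs
    going: str          # e.g. "GOOD", "SOFT"
    race_class: int     # class of the race (Class 1 is the top level)
    two_yo_only: bool   # special feature: restricted to two-year-olds


def distance_band(furlongs: float) -> str:
    # Hypothetical cut-offs chosen for illustration only.
    if furlongs <= 7:
        return "SPRINT"
    if furlongs <= 12:
        return "MIDDLE"
    return "STAY"


def scenario_id(race: Race) -> str:
    """Combine the five race attributes into one key for grouping similar races."""
    feature = "2YO" if race.two_yo_only else "OPEN"
    return f"{race.code}|{distance_band(race.distance_f)}|{race.going}|C{race.race_class}|{feature}"


print(scenario_id(Race("FLAT", 6, "GOOD", 4, True)))  # FLAT|SPRINT|GOOD|C4|2YO
```

Banding the distance, rather than using the exact trip, keeps each scenario large enough to produce a meaningful win %.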

The idea for this final process came from the mega ratings I created. I called one of those groups meta, and built it from some of the output of the metric store processes.

This goes beyond the underlying raw data: it is based on the win % of the metrics in the store and how many of them apply to a particular horse.

In this respect it’s a bit like a neural net when you put it all together, because it “learns” what is effective by constantly updating the win %.

I create a number of these meta metrics and then use them in conjunction with other raw data to create double metrics.
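The “learning” described above can be sketched as a running win % per store metric that is updated after every result, with a meta metric derived from the store metrics that apply to a horse. The 80% cut-off and the count-plus-average form of the meta metric are assumptions for illustration only, not the actual mega rating calculation.

```python
class MetricStats:
    """Running runs/wins for one store metric, updated after every result."""

    def __init__(self) -> None:
        self.runs = 0
        self.wins = 0

    def update(self, won: bool) -> None:
        # Each new result nudges the win %, which is how the process keeps "learning".
        self.runs += 1
        self.wins += int(won)

    @property
    def win_pct(self) -> float:
        return 100.0 * self.wins / self.runs if self.runs else 0.0


def meta_score(applicable, store, min_win_pct=80.0):
    """Hypothetical meta metric: how many of a horse's store metrics clear the cut-off,
    and the average win % of those that do."""
    strong = [
        store[m].win_pct
        for m in applicable
        if m in store and store[m].win_pct >= min_win_pct
    ]
    average = round(sum(strong) / len(strong), 2) if strong else 0.0
    return len(strong), average


store = {"TOP_SPEED_IN_RACE": MetricStats(), "WON_LAST": MetricStats()}
for won in (True, True, False, True):      # 3 from 4 = 75%
    store["TOP_SPEED_IN_RACE"].update(won)
for won in (True,) * 9 + (False,):         # 9 from 10 = 90%
    store["WON_LAST"].update(won)

horse = ["TOP_SPEED_IN_RACE", "WON_LAST", "NOT_IN_STORE"]
print(meta_score(horse, store))  # (1, 90.0): only WON_LAST clears 80%

for _ in range(5):                         # five more wins: 8 from 9 = 88.89%
    store["TOP_SPEED_IN_RACE"].update(True)
print(meta_score(horse, store))  # (2, 89.44): both now clear 80%
```

The same horse scores differently once the store has seen more results, without any change to the rules themselves.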


I run three processes:

A)     Banker – this uses the best attributes for the horse, such as having won its last race or speed superiority over the others in the race.

B)     Perfect – this is a special set of metrics from each data store, picking only those with the highest win %.

C)     Winning – this uses the meta metrics that I created in the mega rating process.


By creating these double metrics we are making the most of the meta metrics we’ve created and the best of the other data that we hold.

It does create a lot of metrics: if you have 50 in one set and 100 in the other, that gives you 5,000 possible combinations. However, when we count how many apply to each horse, we only count a combination if its win % is above 85%, so we’re only counting the most significant ones.
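The combination count and the 85% filter can be sketched with `itertools.product`. The metric names and win % figures below are toy placeholders, and treating an unseen pair as 0% is an assumption of the sketch.

```python
from itertools import product


def double_metrics(set_a, set_b):
    """Every pairing of one metric from each set."""
    return [f"{a}+{b}" for a, b in product(set_a, set_b)]


def count_strong_doubles(horse_a, horse_b, double_win_pct, threshold=85.0):
    """Count the double metrics that apply to a horse, skipping any whose
    historical win % is not above the threshold (unknown pairs count as 0%)."""
    return sum(
        1
        for a, b in product(horse_a, horse_b)
        if double_win_pct.get(f"{a}+{b}", 0.0) > threshold
    )


set_a = [f"A{i:02d}" for i in range(50)]   # 50 metrics in one set
set_b = [f"B{j:03d}" for j in range(100)]  # 100 in the other
print(len(double_metrics(set_a, set_b)))   # 5000

# Toy win % figures for the pairs that apply to one horse.
win_pct = {"A01+B007": 91.4, "A02+B007": 62.0, "A01+B010": 85.0}
print(count_strong_doubles(["A01", "A02"], ["B007", "B010"], win_pct))  # 1
```

Note that a pair sitting at exactly 85.0 is not counted, matching “above 85%”.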

As an example, here are the combinations from the Perfect Combination table with the most runs:

Metric ID   Runs  Wins  Places   Win %  Place %
D169402      612   608     612   99.35   100.00
D172602      348   318     326   91.38    93.68
D216402      346   318     326   91.91    94.22
D169425      331   327     331   98.79   100.00
D168602      326   310     326   95.09   100.00
D169418      326   324     326   99.39   100.00
D169417      306   304     306   99.35   100.00
D169414      292   288     292   98.63   100.00
D169416      288   288     288  100.00   100.00
D167202      284   258     270   90.85    95.07
D169202      282   250     270   88.65    95.74
D168002      280   266     280   95.00   100.00
D169002      266   254     262   95.49    98.50
D168802      266   254     262   95.49    98.50
D173402      260   248     250   95.38    96.15
D152402      254   246     246   96.85    96.85
D159602      220   210     220   95.45   100.00
D159802      184   176     180   95.65    97.83
D172617      174   159     163   91.38    93.68
D172618      174   159     163   91.38    93.68
D216417      173   159     163   91.91    94.22
D216418      173   159     163   91.91    94.22
D169219      172   164     172   95.35   100.00
D168619      170   154     170   90.59   100.00
D160002      164   162     164   98.78   100.00
D213802      162   138     148   85.19    91.36
D149802      160   136     146   85.00    91.25
D168625      156   143     155   91.67    99.36
D169403      155   155     155  100.00   100.00
D214002      152   132     140   86.84    92.11
D214202      152   132     140   86.84    92.11
D150002      150   130     138   86.67    92.00
D150202      150   130     138   86.67    92.00
D172625      148   142     145   95.95    97.97
D216425      148   142     145   95.95    97.97
D168618      145   145     145  100.00   100.00
D168614      144   144     144  100.00   100.00
D169419      144   144     144  100.00   100.00
D169225      142   131     137   92.25    96.48
D168617      139   139     139  100.00   100.00
D167225      138   131     135   94.93    97.83
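The Win % and Place % columns are simply wins and places divided by runs, multiplied by 100. A small sketch of that calculation, plus the “most runs” ordering used for this table, follows. The four rows are copied from the Perfect Combination table; the class and function names are hypothetical, and percentages are rounded to two decimal places.

```python
from typing import List, NamedTuple


class MetricSummary(NamedTuple):
    metric_id: str
    runs: int
    wins: int
    places: int

    @property
    def win_pct(self) -> float:
        return round(100.0 * self.wins / self.runs, 2)

    @property
    def place_pct(self) -> float:
        return round(100.0 * self.places / self.runs, 2)


def top_by_runs(rows: List[MetricSummary], n: int) -> List[MetricSummary]:
    """Largest samples first, as in the Perfect Combination table."""
    return sorted(rows, key=lambda r: r.runs, reverse=True)[:n]


rows = [  # four rows copied from the Perfect Combination table
    MetricSummary("D213802", 162, 138, 148),
    MetricSummary("D169416", 288, 288, 288),
    MetricSummary("D169402", 612, 608, 612),
    MetricSummary("D172602", 348, 318, 326),
]

for r in top_by_runs(rows, 2):
    print(r.metric_id, r.runs, r.wins, r.places, r.win_pct, r.place_pct)
```

Sorting by runs puts the largest samples first, which matters while the history run is still filling in.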

The really exciting thing about this is how much potential it has for our FOREX development.

From here I will be working on three strands:

1)       Expanding our Horse Racing process to other areas – really just a case of getting the necessary raw data and plugging it into the system.

The areas will be Hong Kong, Japan, the USA and Australia, and then every other country that has horse racing and where we can get the data we need.

2)       Recreating my local FOREX process and then getting it built strategically

I need to go back to the beginning of the FOREX process and rebuild it based upon what I’ve learnt over the last week or so.

We are going to be able to create a process that works unlike anything I have ever seen for FOREX/Investment. This is going to give us a serious competitive advantage, and I can’t wait to start building it.

I will build a version locally and then get the developers to convert it to the strategic database. A local version is never ever going to be powerful enough to process the FOREX data, but it means I can develop and test, and the developers just need to convert it.

3)       Soccer

This is the next biggest potential market for us, and it is huge across the globe, with many different markets. It’s not just about whether a match is won, drawn or lost; there are markets for the number of goals, corners and just about everything else in a match.

It’s a team sport, so it is different to the other two main strands, but the size of the market means we have to try to see whether these new processes can work for such a sport.


That is the focus of the next stage of work, and it will give us sufficient income streams to then work one by one through every available investment and sport niche.


