Update on Back End System development

I have finished developing the back end system for UK Horse Racing, and I’m over the moon with what I’ve added in the last week. More on that later.

It’s a bit of a beast, with over 7,100 individual query steps in the database and 10GB of data across all of the tables.

There is data back to 1st January 2012, which means there are over 400,000 individual runners over that period.

In terms of what’s left to do, there is one big section that I haven’t completed yet: the advanced pattern spotting section.

I don’t think it’s essential for the moment, and I need to give more thought to how I implement it, so I’m happy to leave it for now. We will do it at some stage, however.

There are other more advanced concepts that I’d like to build, such as a proper neural network to process the data, but they are not a priority for now.

All that’s left really is a final run through of all the tables and processes to ensure we have data integrity, plus four micro strategy areas that I want to expand upon.

The four are:

Micro Strategies

ACE Strategies

100% Strategies

Final Strategies

They all work in a similar way but happen at different stages of the process and therefore make use of different data.

Expanding upon them will add significant value to the process as a whole, and I will do so over the next week to ten days. It’s not significant development, just one query for every strategy I add.

The Final Strategies query uses the steps I’ve added in the last week and has some very successful strategies. Here is the summary table:

Metric ID	Runs	Wins	Places	Win %	Place %
ZZ001	9	9	9	100.00	100.00
ZZ002	422	352	380	83.41	90.05
ZZ003	52	45	48	86.54	92.31
ZZ004	50	48	49	96.00	98.00
ZZ006	46	36	40	78.26	86.96
ZZ007	29	26	28	89.66	96.55
ZZ008	11	11	11	100.00	100.00
ZZ009	11	11	11	100.00	100.00
ZZ010	33	31	31	93.94	93.94
ZZ011	15	15	15	100.00	100.00
ZZ012	15	15	15	100.00	100.00
ZZ013	42	37	39	88.10	92.86
ZZ014	152	126	136	82.89	89.47
ZZ016	132	125	127	94.70	96.21
ZZ017	176	160	165	90.91	93.75
ZZ018	183	165	171	90.16	93.44
ZZ019	70	62	68	88.57	97.14
ZZ021	32	29	31	90.63	96.88
ZZ022	28	24	25	85.71	89.29
ZZ023	29	29	29	100.00	100.00
ZZ024	22	20	22	90.91	100.00
ZZ025	178	158	168	88.76	94.38
ZZ026	16	16	16	100.00	100.00
ZZ027	7	7	7	100.00	100.00
ZZ028	10	10	10	100.00	100.00
ZZ029	16	16	16	100.00	100.00
ZZ030	14	14	14	100.00	100.00

I’ve only been able to run it for a few weeks’ worth of historic data, so the number of runs is a little small at the moment.

Once I am happy that we have enough micro strategies then I’ll start the lengthy process of re-running all of the historic data.

I’ll do that mostly overnight, and will keep it separate from the main process so we can run both in parallel and bring them together when the historic data process reaches the present day.

In terms of what I’ve added in the last week or so, it’s proving to be very good as you can see from the above table.

What it does is use a combination of some of my favourite elements:

a) Creating our own metrics/ratings using the underlying data we have

b) Creating blended metrics that consolidate multiple metrics or events together

c) Creating a metric store of extreme metrics – for instance where the horse has the highest speed rating in the race, or for the day. These are then pooled, and the win % is calculated for each.

d) Using the metric store data in combination, to create double metrics, either using the same data or with different metric stores.
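To make the metric store idea in (c) a little more concrete, here is a minimal sketch in Python. It is not the actual database queries; the runner records, the field layout and the metric name TOP_SPEED_IN_RACE are all made up for illustration. It flags the runner with the highest speed rating in each race, pools those runners, and works out the win % for the pool.

```python
# Hypothetical runner records: (race_id, horse, speed_rating, won).
# The real system reads these from database tables; this data is invented.
runners = [
    ("R1", "Alpha", 92, True),
    ("R1", "Bravo", 85, False),
    ("R1", "Charlie", 78, False),
    ("R2", "Delta", 88, False),
    ("R2", "Echo", 81, True),
    ("R3", "Foxtrot", 95, True),
    ("R3", "Golf", 90, False),
]


def top_rated_per_race(rows):
    """Return one row per race: the runner with the highest speed rating."""
    best = {}
    for row in rows:
        race_id, _horse, rating, _won = row
        if race_id not in best or rating > best[race_id][2]:
            best[race_id] = row
    return list(best.values())


def win_percent(rows):
    """Win % of a pool of runner rows, rounded to 2 decimal places."""
    if not rows:
        return 0.0
    wins = sum(1 for row in rows if row[3])
    return round(100.0 * wins / len(rows), 2)


# Pool every "top speed rating in the race" runner and store the result.
pool = top_rated_per_race(runners)
metric_store = {
    "TOP_SPEED_IN_RACE": {
        "runs": len(pool),
        "wins": sum(1 for row in pool if row[3]),
        "win_pct": win_percent(pool),
    }
}
print(metric_store)
```

With the invented data, two of the three top rated runners won, so the pooled win % comes out at 66.67.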

All of the above is processed overall and for similar races, using a scenario id that I apply to every race. I haven’t used the scenario related data much in the current process, because we don’t have enough history processed for the new steps to make it useful. I’ll look at that again when the history run has completed. In theory the scenario data should be more accurate because it looks at:

The type of race – flat or jumps

The distance of the race

The going

The Class of the race

Special features – such as just for Two Year Olds.

So speed ratings might be more relevant to shorter races for example.
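As a rough illustration of how a scenario id could be put together from those five elements, here is a small Python sketch. The Race fields, the distance bands and the id format are all assumptions made for the example, not how the back end actually builds it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Race:
    # All field names here are hypothetical.
    race_type: str            # "flat" or "jumps"
    distance_furlongs: float
    going: str                # e.g. "Good to Firm"
    race_class: int           # e.g. 1 to 7
    special: str = ""         # e.g. "2yo" for Two Year Olds only


def distance_band(furlongs):
    """Group distances into bands (band edges are an assumption)."""
    if furlongs < 7:
        return "sprint"
    if furlongs < 10:
        return "mile"
    if furlongs < 14:
        return "middle"
    return "staying"


def scenario_id(race):
    """Build one key so that similar races share the same scenario."""
    parts = [
        race.race_type.lower(),
        distance_band(race.distance_furlongs),
        race.going.lower().replace(" ", "-"),
        f"c{race.race_class}",
        race.special.lower() or "none",
    ]
    return "|".join(parts)


print(scenario_id(Race("Flat", 5, "Good to Firm", 4, "2yo")))
print(scenario_id(Race("Jumps", 16, "Soft", 3)))
```

Any two races that produce the same key are treated as similar, so win % figures can be calculated per scenario as well as overall.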

The idea for this final process came from the mega ratings I created. One of those groups I called meta, and I built it from some of the output from the metric store processes.

This is going beyond the underlying raw data and is based on the win% of the metrics in the store and how many apply to a particular horse.

In this respect it’s a bit like a neural net when you put it all together, because it “learns” what is effective by constantly updating the win%.
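The “constantly updating” part can be pictured as a running tally per metric. This is only a sketch in Python with a hypothetical MetricRecord class; the real process does this with queries over the database tables.

```python
class MetricRecord:
    """Running tally for one metric in the store (hypothetical structure)."""

    def __init__(self):
        self.runs = 0
        self.wins = 0

    def update(self, won):
        """Fold one new race result into the tally."""
        self.runs += 1
        if won:
            self.wins += 1

    @property
    def win_pct(self):
        """Current win %, or 0.0 before any runs are recorded."""
        if not self.runs:
            return 0.0
        return round(100.0 * self.wins / self.runs, 2)


record = MetricRecord()
for won in [True, True, False, True]:
    record.update(won)
print(record.runs, record.wins, record.win_pct)
```

Each new result shifts the win % a little, so a metric that stops working drifts down over time and one that keeps winning stays near the top.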

I create a number of these meta metrics and then use them in conjunction with other raw data to create double metrics.

I run three processes:

A) Banker – this uses the best attributes for the horse – won last race, speed superiority over others in the race etc

B) Perfect – This is a special set of metrics from each data store, picking only those with the highest win %

C) Winning – This uses the meta metrics that I created in the mega rating process.
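For the Perfect process, the selection step could look something like the Python sketch below. The store names, metric ids, win % figures and the top_n cut-off are all invented for the example; how many metrics are actually taken from each store is not stated above.

```python
def pick_perfect(stores, top_n=2):
    """From each store, keep the ids of the top_n metrics by win %."""
    selected = {}
    for store_name, metrics in stores.items():
        ranked = sorted(metrics.items(), key=lambda item: item[1], reverse=True)
        selected[store_name] = [metric_id for metric_id, _pct in ranked[:top_n]]
    return selected


# Hypothetical stores mapping metric id -> win %.
stores = {
    "speed": {"S1": 88.0, "S2": 95.5, "S3": 61.2},
    "form": {"F1": 72.0, "F2": 90.1},
}
perfect = pick_perfect(stores)
print(perfect)
```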

By creating these double metrics we are making the most of the meta metrics we’ve created and the best of the other data that we hold.

It does create a lot of metrics: if you have 50 in one set and 100 in the other, that gives you 5,000 possible combinations. But when we count how many apply to each horse, we only count them if the win% for that combination is above 85%, so we’re only counting the most significant ones.

As an example from the Perfect Combination table, here are the ones with the most runs:
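The arithmetic and the counting rule can be sketched in a few lines of Python. The metric ids, the win % figures and the count_significant helper are hypothetical, and treating the 85% cut-off as strictly greater than 85 is my reading of “above 85%”.

```python
from itertools import product

# Two hypothetical metric sets: 50 ids in one, 100 in the other.
set_a = [f"A{i:03d}" for i in range(50)]
set_b = [f"B{i:03d}" for i in range(100)]

# Every pairing of one metric from each set is a candidate double metric.
combos = list(product(set_a, set_b))
print(len(combos))  # 50 * 100 = 5000


def count_significant(horse_metrics, combo_win_pct, threshold=85.0):
    """Count double metrics where both halves apply to the horse
    and the combination's win % is above the threshold."""
    count = 0
    for (metric_a, metric_b), win_pct in combo_win_pct.items():
        applies = metric_a in horse_metrics and metric_b in horse_metrics
        if applies and win_pct > threshold:
            count += 1
    return count


# Invented win % figures for three combinations.
combo_win_pct = {
    ("A001", "B002"): 91.4,
    ("A001", "B003"): 80.0,   # applies to the horse but is below the cut-off
    ("A004", "B002"): 99.3,   # above the cut-off but A004 does not apply
}
horse_metrics = {"A001", "B002", "B003"}
significant = count_significant(horse_metrics, combo_win_pct)
print(significant)
```

Only the first combination both applies to the horse and clears the cut-off, so the count for this horse is 1.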

Metric ID	Runs	Wins	Places	Win %	Place %
D169402	612	608	612	99.34641	100
D172602	348	318	326	91.37931	93.67816
D216402	346	318	326	91.90752	94.21966
D169425	331	327	331	98.79154	100
D168602	326	310	326	95.09203	100
D169418	326	324	326	99.38651	100
D169417	306	304	306	99.34641	100
D169414	292	288	292	98.63013	100
D169416	288	288	288	100	100
D167202	284	258	270	90.84507	95.07042
D169202	282	250	270	88.65248	95.74468
D168002	280	266	280	95	100
D169002	266	254	262	95.48872	98.49624
D168802	266	254	262	95.48872	98.49624
D173402	260	248	250	95.38461	96.15385
D152402	254	246	246	96.8504	96.8504
D159602	220	210	220	95.45454	100
D159802	184	176	180	95.65218	97.82609
D172617	174	159	163	91.37931	93.67816
D172618	174	159	163	91.37931	93.67816
D216417	173	159	163	91.90752	94.21966
D216418	173	159	163	91.90752	94.21966
D169219	172	164	172	95.34883	100
D168619	170	154	170	90.58823	100
D160002	164	162	164	98.78049	100
D213802	162	138	148	85.18519	91.35802
D149802	160	136	146	85	91.25
D168625	156	143	155	91.66667	99.35898
D169403	155	155	155	100	100
D214002	152	132	140	86.84211	92.10526
D214202	152	132	140	86.84211	92.10526
D150002	150	130	138	86.66666	92
D150202	150	130	138	86.66666	92
D172625	148	142	145	95.94595	97.97297
D216425	148	142	145	95.94595	97.97297
D168618	145	145	145	100	100
D168614	144	144	144	100	100
D169419	144	144	144	100	100
D169225	142	131	137	92.25352	96.47887
D168617	139	139	139	100	100
D167225	138	131	135	94.92754	97.82609

The really exciting thing about this is how much potential it has for our FOREX development.

From here I will be working on three strands:

1) Expanding our Horse Racing process to other areas – really just a case of getting the necessary raw data and plugging it into the system.

Areas will be – Hong Kong, Japan, USA, Australia and then every country that has Horse Racing and where we can get the data we need.

2) Recreating my local FOREX process and then getting it built strategically

I need to go back to the beginning of the FOREX process and rebuild it based upon what I’ve learnt over the last week or so.

We are going to be able to create a process that works unlike anything I have ever seen for FOREX/Investment. This is going to give us a serious competitive advantage and I can’t wait to start building it.

I will build a version locally and then get the developers to convert that to the strategic database. A local version is never going to be powerful enough to process the FOREX data, but it means I can develop and test, and the developers just need to convert it.

3) Soccer

This is the next biggest potential market for us, and it is huge across the globe with many different markets. It’s not just about whether a match is won, drawn or lost; there are markets for the number of goals, corners and just about everything else in a match.

It’s a team sport so is different to the other two main strands, but the size of the market means we have to try and see if these new processes can work for such a sport.

That is the focus of work next, and will give us sufficient income streams to then work one by one through every available Investment and sport niche.

launchupdates

Thursday, 9 July 2015

Status Report - Back End Development
