Friday, 9 November 2012

Analytics - Correlation (Article 1)

Correlation (Article 1 of an anticipated few...)

==================================
Correlation for Positions & Bonus Points.
by Matt
==================================

Hello football fans, this is the first article for the Football^3 website (other than links to articles I have written for other websites). So without further ado, let's jump in at the deep end.

A few weeks ago I came across a cool footy-stats website called 'Shots on Target' and noticed an article on correlation. That article inspired me to spend a good few hours creating my own correlation tables and to take their correlation work a step further. Rather than looking at overall correlation, I have split the correlation out by position, and towards the end of this article I look at how correlation relates to Bonus Points. So let's get on with it, here are the tables I spoke about...


Right, questions...
i) Where was the data sourced from?
- The columns from Assists to MoM were sourced from www.whoscored.com
- Bonus Points & Total Points were sourced from www.fantasy.premierleague.com
 (Gameweek 9 - 2012)

ii) What are the abbreviations for the columns? 
SpG - Shots per Game.
PS% - Passing Success.
AW - Aerial Battles Won.
MoM - Man of the Match.
(I asked WhoScored.com how the MoMs were calculated and received the following reply. Make of it what you will.)
"Hi Matt,
All of our ratings and man of the match awards are statistically calculated using our own unique calculations, with over 200 raw statistics (positively and negatively) affecting them. We do not use any other sites for our ratings, which are highly respected in our field.
Regards,
WhoScored.com"


iii) What does the correlation show?
Well, where do we start... I'm hoping that people read this article and point out more ideas that I may have missed, but let's start with a few snippets I have picked out.

The most positive correlation is Goals/Total Points for Strikers, with a massive correlation coefficient of 0.92. This shows that Goals and Total Points are very strongly related (well done), even more so than Shots per Game and Total Points. Admittedly I do not have data such as Shots on Target here; that may require actually buying data. However, 0.92 is higher than any other correlation on the 'Shots on Target' website. This suggests that there is value in splitting the data out by position, and that the correlation for Shots on Target should be even higher when split by position (kudos to them!).
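
For anyone who wants to try the per-position split themselves, here is a minimal sketch of the idea in Python. It assumes the data is a list of rows with a 'position' field; the function names, field names and player numbers are all made up for illustration and are not the data or the spreadsheet method used for the tables in this article.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one column never varies
    return cov / sqrt(var_x * var_y)


def correlation_by_position(rows, x_key, y_key):
    """Group rows by their 'position' field, then correlate x_key with y_key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["position"]].append((row[x_key], row[y_key]))
    return {
        pos: pearson([x for x, _ in pairs], [y for _, y in pairs])
        for pos, pairs in groups.items()
    }


# Made-up rows for illustration only; these are NOT the article's data.
players = [
    {"position": "FWD", "goals": 0, "total_points": 14},
    {"position": "FWD", "goals": 2, "total_points": 25},
    {"position": "FWD", "goals": 5, "total_points": 44},
    {"position": "FWD", "goals": 7, "total_points": 58},
    {"position": "DEF", "goals": 0, "total_points": 30},
    {"position": "DEF", "goals": 1, "total_points": 22},
    {"position": "DEF", "goals": 0, "total_points": 18},
    {"position": "DEF", "goals": 1, "total_points": 35},
]

by_position = correlation_by_position(players, "goals", "total_points")
for pos, r in by_position.items():
    print(pos, round(r, 2))
```

With these invented numbers the forwards come out with a much higher Goals/Total Points correlation than the defenders, which is the kind of difference an overall correlation would blur together.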

There seems to be little or no correlation between Passing Success and Total Points, so players like Arsenal's Arteta (92% pass success rate, at the moment the highest in Europe) and Liverpool's Allen are truly in the brown stuff when it comes to Fantasy Football picks, but we knew that already, right? Perhaps I should include team win ratios to see if passing success or other metrics are related to match outcomes. Anyway, I digress; point noted.

iv) Bonus Points.


For me this is the most interesting part. Bonus Points are most strongly correlated with Total Points, which would indicate that the guys at FPL are awarding bonus points to the players that score the most points in the Gameweek, rather than to players who play well. This almost makes players like John Obi Mikel redundant for fantasy football purposes, but we already knew that, right? (Please note that, unlike other websites, I did not remove appearance points, as I thought keeping them would give players like John Obi Mikel more chance of showing a good correlation. I could separate this out, but it has been done elsewhere numerous times.)

v) Random Correlations, e.g. between GK Assists and Bonus Points
I believe this is for two direct reasons and two indirect reasons:


Direct Reasons
1) The only keeper to have an assist when I did the modelling was Cesar (QPR), and that week Cesar got bonus points too.
2) Most of the keepers don't have bonus points. I didn't have the saves data (and couldn't be bothered to collate it, as the manual part, VLOOKUPs etc., takes the longest), but if I did have the saves data I should imagine its correlation with Bonus Points would be around 0.5. Most keepers don't get bonus points and don't get assists, so those two are pretty correlated, whereas goalkeepers generally don't get bonus points but do get points for saves, so those are not so correlated... Nice, hey? It shows that each number needs to be interpreted!

Indirect Reasons
3) I think this one is a biggy: keepers generally won't be among the top three players out of 22 in a game points-wise, hence their total Gameweek points aren't as large. As the correlation clearly shows, you have to collect points to get the bonus points.
4) I don't think the data is right for keepers or, well, more data is needed. There are too many zeros, and I need more keepers to actually get BPs this season. Hence I cannot draw strong conclusions from this data set for keepers; more analysis with larger data sets is needed.
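
To see how much a single keeper can drive a correlation when nearly everything else is zero, here is a small sketch in Python. The ten "keepers" and their numbers are invented purely for illustration (they are not the real Gameweek data), and the pearson helper is just the textbook formula written out by hand.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one column never varies
    return cov / sqrt(var_x * var_y)


# Ten hypothetical keepers: only the first has an assist.
assists = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

# Case A: the keeper with the assist also picked up bonus points.
bonus_with_overlap = [2, 1, 0, 0, 0, 0, 0, 0, 0, 0]
r_with_overlap = pearson(assists, bonus_with_overlap)

# Case B: same data, except the assisting keeper got no bonus points.
bonus_without_overlap = [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
r_without_overlap = pearson(assists, bonus_without_overlap)

print("assist keeper got bonus points:", round(r_with_overlap, 2))
print("assist keeper got none:        ", round(r_without_overlap, 2))
```

In this toy set, changing one keeper's bonus points flips the result from a strong positive correlation to a slightly negative one, which is why a column full of zeros needs careful interpretation.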


Splitting the data out by position and trying to understand which correlations are significant is an age-old problem. The reason is that 'significance' is wholly subjective, as I have well learnt from modelling. Just because things are highly correlated doesn't mean they are related; this could just be coincidental correlation. Moreover, a small correlation could actually be highly significant. Long-term analysis is needed, hence why I titled this... Article 1 of an anticipated few.
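
One common way to put a rough number on "could this just be coincidence?" is a permutation test: shuffle one column many times and count how often the shuffled data correlates at least as strongly as the real data. The sketch below is only an illustration of that idea, not something I ran on the tables in this article; the function names, the number of shuffles and the sample numbers are all assumptions made up for the example.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one column never varies
    return cov / sqrt(var_x * var_y)


def permutation_p_value(xs, ys, n_shuffles=2000, seed=0):
    """Share of shuffles whose |r| is at least as large as the observed |r|."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        # small tolerance so ties are not lost to floating-point rounding
        if abs(pearson(xs, shuffled)) >= observed - 1e-12:
            hits += 1
    # +1 in numerator and denominator counts the observed ordering itself
    return (hits + 1) / (n_shuffles + 1)


# Made-up numbers for illustration only.
xs_large = list(range(12))
ys_large = [1, 3, 4, 8, 7, 11, 12, 15, 15, 19, 21, 22]

# The same relationship, but seen through only the first four players.
xs_small = xs_large[:4]
ys_small = ys_large[:4]

p_large = permutation_p_value(xs_large, ys_large)
p_small = permutation_p_value(xs_small, ys_small)

print("12 players: r =", round(pearson(xs_large, ys_large), 2), "p =", p_large)
print(" 4 players: r =", round(pearson(xs_small, ys_small), 2), "p =", p_small)
```

Both samples show a high correlation, but with only four players a shuffle matches it far more often, so the same-looking coefficient deserves much less trust. Where to draw the line on the resulting number is still a judgment call.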


As for further analysis of correlation, I've been really mulling over what to do with it. I think I'm going to have to wait until I have more data... and more decent metrics. It just takes too long to collate all the metrics at the moment, as they come from different places. Other than the few points mentioned above, the model didn't throw up anything I didn't know already.


I'll leave you with the overall correlation across all positions, and with which positions the maximum and minimum correlations apply to, i.e. each individual position compared against the others. Please comment or get in touch if you think this was cool or have anything to add.


@Mattistician





