Sunday, March 29, 2020

11 Questions with @Minor_LS


The pursuit of wisdom in any walk of life quickly reveals that what you think you know is not nearly enough to get you to where you want to go. As I'm starting out on my football scouting journey, I challenged myself to reach out to people already working in various roles in football and ask them to answer a short list of questions. My goal wasn't to get answers but to gather relevant perspectives on the game within the game.

Here is Minor_LS:

How did your first opportunity in football come about?

When I started my football analytics project, there weren't really any freely available tidy databases on Finnish football to query, so what I started out doing was manually collecting the data I was interested in from video clips. The Finnish league had a kind of InStat integration on their website from which I was able to view tagged clips of the major actions of every game in the league, which made the process less painstaking and tedious (which isn't to say it wasn't slow). 

After having collected a season's worth, I started writing about the data I had collected while continuing the data collection process. After about a year or so of blogging, I felt an urge to see how far I was from the type of stuff being done in the professional game over here so I put out some feelers, and ended up joining a team in the men's top tier for that season while simultaneously doing a project for another team in the women's top tier.

Those opportunities basically came about from being proactive, but were really the result of a longer process of putting myself out there - essentially building a base level of written material. Subsequent opportunities in football have been instigated by someone being in contact with me, either through my blog or through some other form of exposure. 

What attracted you to data/analytics? What's more intriguing now: names, numbers or strategy?

I've always been interested in sports data in one form or another. When I was younger, I used to collect ice-hockey cards, and I'd do stuff like write the statistics from the back of the cards into a notebook, or sort the cards in different orders or group them in certain ways. A natural progression from that was obviously Championship Manager/Football Manager, which was something of an obsession of mine throughout my years in school. When I graduated from business school, I ended up writing my thesis about stats based football player evaluation, so it's fair to say that sports data has been one of the dominant narratives of my life.

I think the most intriguing part of football analytics is applying data to solve problems, whether the problem is tracking player development, or figuring out how to measure some particular part of the game or thinking about some more strategic questions. Football analytics, for me, has been a superb resource for personal development, and a great motivator to keep learning new things, and it has almost exclusively revolved around having a concrete problem to solve. 

Which data metric has been the most profound to you? What caught your eye?

I think the answer to this question is probably expected goals, but to make a short answer long, I'm not sure that it's true either. I remember being amazed by xg when it first started appearing in blogs sometime around 2013-2014 or thereabouts, partly because it offered a solution to a fairly large knowledge gap, but also partly because, to me, it was the first glimpse into the level of detail that was available in the data at the time. 

That being said, I'm not sure the actual insight that xg provides is that profound: the closer you are to goal, the more likely you are to score - this is something that anyone who has ever played the game should be pretty aware of (the valuable insight that xg does provide, however, is, in theory, how much more likely you are). Compare, for example, to baseball, where a lot of the insights derived from data have been fairly unintuitive (OBP is more important than AVG, for one, or striking out a lot is ok as long as you're doing damage with your hits). Which isn't to say that it isn't a great top level tool for all football analysis - in theory, xg works better the more you aggregate - just that it felt more revolutionary when it first appeared on the scene than it has ended up being. 

What I mean by this is that although xg absolutely was a revolutionary and brilliant addition to the football analysis toolkit, it still isn't fully understood how it should be used at the different levels of the game, or whether its use creates more clarity than confusion for the decision makers at clubs. Imagine, for example, a coach who, instead of trying to maximize the total xg accumulated in a match, focuses only on creating a particular type of high xg chance, thereby disregarding all other potential goal opportunities. Or a coach who forbids his players from taking shots unless they surpass some arbitrary xg threshold. Both coaches will think that they are making optimal decisions based on the highest quality data available, but in fact they probably aren't.

Any type of data is only as good as the interpretation of it, and that is a bit problematic because what ends up happening is that instead of challenging previously held assumptions, metrics end up reinforcing them because how they should be applied is so ambiguous, and because there aren't enough checks and balances for the interpretations being made. This might not be true higher up in the game, where staffs are larger and there are more specialists involved in the football operations, but in the lower leagues this is a reality (the above two examples are taken from real conversations I've had with coaches).
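
As a rough illustration of the threshold example above, here is a minimal Python sketch. The shot values and the 0.10 cut-off are made up for illustration and are not taken from any real match or model. It also assumes shots are independent and that a forbidden shot is simply lost rather than turned into a better chance later, which is the part a coach would dispute.

```python
from math import prod

# Hypothetical per-shot xg values for one match (illustrative, not real data).
shots = [0.03, 0.05, 0.08, 0.12, 0.35]

# Arbitrary cut-off a coach might impose (an assumption for this sketch).
THRESHOLD = 0.10


def expected_goals(xgs):
    """Total xg: the sum of the per-shot scoring probabilities."""
    return sum(xgs)


def prob_at_least_one_goal(xgs):
    """Chance of scoring at least once, treating shots as independent."""
    return 1 - prod(1 - p for p in xgs)


# Shots that would still be allowed under the threshold rule.
allowed = [p for p in shots if p >= THRESHOLD]

print("total xg, all shots:    ", round(expected_goals(shots), 2))
print("total xg, allowed only: ", round(expected_goals(allowed), 2))
print("P(>=1 goal), all shots: ", round(prob_at_least_one_goal(shots), 3))
print("P(>=1 goal), allowed:   ", round(prob_at_least_one_goal(allowed), 3))
```

Under these assumptions the unrestricted shot list always has the higher total, because every shot adds a non-negative amount. Whether the rule helps in practice depends on how often a declined shot becomes a better one, which this sketch deliberately does not model.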

What is the biggest misconception in data? How do you combat/counter this?

This touches on what I wrote above. I think the biggest misconception in data is often how it should best be applied. When you talk with a football decision maker about something like xg, the first question is almost always about the information that isn't included in the calculation: does it take into account who takes the shot, does it take into account the positioning of the goalkeeper/other players etc. These are all very valid questions, but they certainly aren't the ones that should decide whether the model is usable or not. If your metrics need to be perfect before you can use them, you're going to end up using no metrics and making poorer decisions as a result.

Fundamentally, I think this stems from a fear of the data, fear of being replaced by it, but I also think this is a misconception. Data can inform you, but it always needs to be interpreted. In the future, people in football aren't going to be replaced by algorithms, they're not even going to be replaced by people who can write algorithms, they're going to be replaced by people who know how to interpret the output of algorithms - and this, I think, is really the thing people should be afraid of, but is also something that they can totally affect themselves. And that starts with using imperfect tools to get slight edges, while understanding the limitations of the tools and where other tools/expertise should be applied in their stead.

If you could start over, what skill would you build on first?

What most people probably answer to this question is that they would have learned to code earlier. I'm not sure that would have worked for me - I managed to learn R only once I really needed to, and in the meantime, I managed to learn Excel to a good extent. Once I got the hang of R, picking up SQL was fairly straightforward. The way I see it, the most difficult thing to learn about SQL or R is how the data behaves, how to structure it in such a way that you can combine different dimensions etc - these are concepts that you can get comfortable with in Excel, which will help you further down the road. Personally, for the things I started with, Excel really was the optimal tool, and mastering it really helped me learn to think about data in such a way that I could fairly easily transition to R/SQL.

Looking back, I think this progression was pretty well optimized for the path I was travelling, but today, you maybe wouldn't need to start with Excel, because there are far more places from which to access football data (like the free StatsBomb datasets or the academic data dump of historical event data that did the rounds).

Do you see player development or player recruitment as more important? Why?

It depends on your landscape. In Finland, for example, player recruitment is probably the single most difficult thing for clubs to do well, simply because we're the metaphorical bottom of the barrel. Even if you were able to identify players to recruit, that's only a small part of the problem, as players simply won't come here unless they have no other options. This emphasizes the role of agents in the game, as teams end up doing less actual rigorous player identification and rely more on whoever they know is available to them. I believe that the best solution to this problem is to make an effort to recruit from underdeveloped, undervalued markets using data - depending on which Finnish club you are, that could mean the Finnish second tier (which, amazingly, is pretty overlooked by teams in the first tier), the Swedish second tier, Estonia etc.

In the long term, football analytics is probably going to pivot more toward player development (if we are following the path of baseball), and that's also where the biggest gains are going to be found. Imagine if, instead of having to convince a reluctant player to join you, you could make your own youth players better in a targeted way by finding and removing inefficiencies to make them faster, more agile or better at reading the game - forget figuring out WAR for football, this is truly the only thing that could turn the economics of the game on its head. This is happening in baseball as we speak: previously mediocre players are suddenly becoming world beaters by analyzing themselves as players and working hard to make themselves better, finding the small things that previously separated them from the All Stars - which really came about from having the tools to analyze what makes a player good. In football, we don't have the tools yet - mostly because the game is more complex, and at least partly because we still don't really know exactly what makes a good player good.

What is your favorite sports moment? Why?

I remember the way my dad celebrated when Finland won the Ice Hockey World Championship in 1995 (I was 7 or 8), and I don't think I've ever had a moment like that myself. I don't want to come off as emotionally stunted or anything, but I don't really have a very emotional connection to sports anymore. In terms of the best spectacle I've experienced live, though, I'd have to say HJK beating Schalke at home in 2011, with a young Teemu Pukki scoring twice. It was a rare Finnish football moment in which the stadium was packed and the end result surpassed any expectations.

What coach/player/team inspires you? Why?

I've always been an Arsene Wenger apologist. I admire the way he managed to carry himself in one of the most high profile jobs in football, staying true to his values until he stepped down at Arsenal. 

What advice would you give to someone wanting to get into data/analytics?

Don't get too wedded to the idea of sports analytics. In any analytics work, domain knowledge is crucial, but there is a ton to learn about the actual process of analysing data to solve a problem by working in other areas.

Have the mindset of always trying things that you haven't done before. I've learned a ton just by trying to figure out how exactly others have done certain things and then trying to replicate it with my own data set. 

Work on your writing and publish your stuff. I've gotten a lot of interesting opportunities just from putting myself out there.

Don't make the mistake of being too dogmatic about data. Most of the times you 'find something' in the data, it's because you messed something up. People in football don't know everything, but most of the time, they know more than you - respect that. Football is a very complex game to analyze, avoid making sweeping conclusions about it. 

When you watch Moneyball (as you do), marvel at the way the supposedly smart people laugh off defense, which was disregarded mostly because they couldn't model it at the time - assume that you're doing the same about things in your field.

Also, related to Moneyball, the biggest thing to learn from the movie (and I'm specifically referencing the movie, although this also applies to the book, to a lesser extent) is that, in reality, the Athletics' stats-based draft strategy was mediocre at best, that starters Zito, Hudson and Mulder and position players Chavez and Tejada, who made a huge contribution to the 20-game winning streak, are barely mentioned at all, and that Carlos Pena had a better career than Scott Hatteberg. This invalidates neither the results nor the process of that organisation, but it shows the importance of telling a story. You as a person, and football analytics as a movement, should strive to tell better stories (or stories better).
  
Who is your favorite athlete of all time? Why?

I'm a Finn, so Jari Litmanen is difficult to surpass. He finished third in the Ballon d'Or voting in 1995! Before Teemu Pukki's move to Norwich, I'm pretty sure most of even the sagest football analytics gurus would have struggled to name two Finnish players without technological help.

What other sport/hobby/discipline do you feel improves your work as an analyst? Why?

I enjoy reading about and watching baseball. 
I play a lot of team sports and I try to use them as a means to try to understand how difficult it is to make decisions in a (for me) fast paced, constantly moving environment. Realising that I struggle with it, knowing myself and my limitations, gives good insight into how difficult it must be for professional athletes in a truly competitive environment. It's good exercise as well!
