Why coaches should learn to code.
Part 2: Which language should I learn?
Alan Couzens, M.Sc. (Sports Science)
Oct 3rd, 2017
It’s been more than a year since I wrote my first post on “Why coaches should learn to code”. It turned out to be a bit of a controversial post. Some thought I was under-valuing, or even overtly disrespecting, the 'gut feel' methods of 'old school' coaches who had 'put in their time', and suggested that, as my own time continues to stack up, my opinions might change. And, since we all know that, in computer years, one year is equivalent to a decade :-) I figured, on both counts, it was high time for an update.
Am I still of the opinion that all (old and young) coaches should learn to code? If so, would I change anything regarding the best route to do so from that previous post? And, what new ways have I found coding useful in my day to day life as a coach?
In short – yes(!!!), somewhat, lots (!!!) :-)
The extended version…
Do coaches, in the modern era, need to learn to code?
I’m not sure if it’s the swarm effect of Twitter or what, but over that short space of a year, the number of folks specializing in that juncture between sports and data science seems to have exploded! Only yesterday, @willkirousis directed me to this great post from one of those fine folks who is taking their sports science to the next level by applying themselves full-force to the new field of sports data analytics…
And Jacquie isn’t a one-off. In today’s world, the majority of professional teams in all sports, from the NBA & MLB in the U.S. and the Premier League in the U.K. to the National Rugby League &, as Jacquie confirms, the AFL in Australia, are employing sports data science professionals to ‘crunch the numbers’ on key performance indicators and 'mine' this data for those critical 'gold nugget' insights that make up the 1 or 2% that separates the best from the rest.
This growth in data science makes a lot of sense. As new hardware (GPS/positional devices, power meters, cameras, HRV trackers/health wearables etc) comes onto the market, it produces more and more data that, assuming it is useful/competitively advantageous data, demands analysis!
And, if the implementation in team sports is anything to go by, the organizations that are implementing sports data analytics are finding it competitively advantageous in a multitude of ways, from illness and injury prediction/prevention, to performance modelling, to in-play strategy optimization. These are all fields in which endurance athletes could benefit from further insight just as much, if not more!
And endurance sports are at the forefront of adopting these tools & generating more data! However, in our (less financially lucrative) world, there is rarely a budget to hire a specialist in sports analytics. So, to keep up with the advantages that an understanding of data analytics brings, 'data scientist' is another one of those roles that the savvy coach needs to add a hook for on the proverbial hat rack.
And, while it might be a bit intimidating, with the tools available today, getting started isn’t as hard as you might think! The first, and maybe most important, question you’ll face as you start to explore the wonderful world of data science (& the topic of this blog post) is…
Which computer language should I learn?
As I outlined in the previous post, a ‘few years’ after my exposure to the BASIC language as a ‘complicated’ youth, I took up where I left off by learning PHP.
PHP is primarily a server-side web programming language. As a web programming language it gives coaches the ability to ‘talk to’ the huge amount of data that is available on today’s internet. I spoke in the previous post about how easy it is to ‘query’ this data, i.e. to ask it questions. If you have a database on a server (or an API that can connect to another site's database/server) that you can talk to & you know a server-side language like PHP, the opportunities for ‘interrogating’ that database with questions are endless!
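To make 'asking a database questions' a little more concrete, here is a minimal sketch. It is written in Python (the language this post goes on to recommend) rather than PHP, and it uses the standard-library sqlite3 module with a throwaway in-memory database. The table name, column names and workout numbers are all invented for illustration.

```python
import sqlite3

# Hypothetical training log: every name and number here is invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE workouts (athlete TEXT, sport TEXT, hours REAL)")
conn.executemany(
    "INSERT INTO workouts VALUES (?, ?, ?)",
    [
        ("Ann", "bike", 3.0),
        ("Ann", "run", 1.0),
        ("Bob", "bike", 2.0),
        ("Bob", "swim", 1.5),
        ("Bob", "run", 0.5),
    ],
)

# The 'question': how many hours has each athlete logged in total?
total_hours = dict(
    conn.execute(
        "SELECT athlete, SUM(hours) FROM workouts GROUP BY athlete ORDER BY athlete"
    ).fetchall()
)
conn.close()

print(total_hours)
```

Changing the question is just a matter of changing the SELECT statement, e.g. grouping by sport instead of by athlete.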
By using a language like PHP, you can ask your program to pull any data that you want to take a closer look at. PHP does a great job of fetching data. However, it’s when we get to these ‘closer looks’, that additional step where we pull together a bunch of fetched data to mine it for insight and maybe even use it to predict the outcomes of different decisions/courses of action, that other languages really start to come into their own.
There are 2 big players in the world of computer languages that have developed a specialty for 'taking a closer look' at data. The first is the language that Jacquie mentions in her post: R, an open source language that first appeared in 1993 (and was released as open source in 1995), built almost exclusively for statistically analyzing and graphing complex data-sets. The second is Python, another, more general purpose, open source language that has recently found a really strong foothold in the data science community, due largely to the (growing) plethora of powerful statistics and machine learning libraries that are ridiculously easy to import into your code.
In fact, Python's growth in data science has been so ballistic in recent years that it's now beginning to surpass R as the data science 'language of choice'.
And for good reason: it’s just so easy to work with!
I’ve provided a few examples of just how easy this is in previous blogs. For example, in my recent blog that analyzed my team's long term data to deduce the optimal periodization model for Ironman athletes, I was able to import modules that give immediate access to incredibly powerful ‘model building’ machine learning algorithms, like neural networks, random forests & decision trees (which I used in another post on injury prediction) with just a few lines of code…
In PHP, I had to code some of these algorithms from scratch, but in Python just about any machine learning algorithm you can think of already has a module (or 10) built for it, and making use of it is as simple as writing ‘import…’ (whatever module you want to use).
The value of the ease with which you can 'plug and play' algorithms can't be overstated. If you want to compare 5 different algorithms to see which most accurately represents your real world data, you can import and test them all on your dataset in the space of, literally, a couple of lines of code for each model. Furthermore, you can fine-tune the hyperparameters of those models just as quickly & easily. Not only does this result in easier model building, it also results in better models that you can be more confident represent the data well.
But Python’s strengths don’t end there! Python is a general purpose language, meaning it is just as ‘at home’ talking on the web with a server (just like PHP) or indeed, talking with files on your computer. Heck, if you get bored with crunching numbers (as if! :-) you can even use it to build computer games! With Python, you can find, collate, analyze and model your data end-to-end...
- ...you can build a ‘spider’ to crawl the web and 'scrape' data on any question/topic you like.
- ...that same Python script can then pull all of that information into easy-to-work-with numerical arrays.
- ...the same Python script can then collate these arrays and feed ‘traditional’ stats to you.
- ...the same Python script can then also use this data to build a predictive model that best describes the data.
- ...the same Python script can then live on the web and provide the insights of this predictive model to the world on your website!
THAT is the power of Python!
Bottom line, as a 'jack of all trades' language, a language that can hunt down & find the data, bring it to you with your slippers each morning, then rapidly & efficiently crunch the numbers on said data, then wrap it up in a nice easily digestible package, Python has a lot going for it and, for these reasons, it’s absolutely what I’m spending most of my time coding in these days. But no language is perfect. So what are the drawbacks?
As someone who is still very wet behind my Python ears (do snakes even have ears? :-) here are some quick impressions of how it compares to other languages.
1. It covers a lot of ground…
Because it’s so versatile, it can take a bit more ‘digging’ to get to specific information on the area/task that you want to accomplish. For example, if you open a PHP book, it will generally talk about the web and databases. If you open a Python book, it might cover anything from the 1’s and 0’s of binary code, to machine learning, to building bots and scrapers, to network hacking, to web frameworks like Flask or Django, to using math and science libraries like Numpy & Scipy etc etc. In other words, Python programming covers A LOT of ground! But, on the positive, if you’re anything like me, it is a language that will definitely keep your interest!
2. It is generally less well supported on hosted web servers (although this is slowly improving)
Most traditional web hosts still employ PHP as the server-side default language on their servers. If you want to use Python for server-side scripting, you may have to learn how to access and configure your server via SSH, i.e. via a 'secure shell' remote command line. This 'being forced into learning new, more broad computer skills' could just as easily be seen as a positive of the Python language. In Python you'll find yourself spending more time working from the command line which, on another positive, just feels much more 'hacker cool' than spending your time programming in fancy web interfaces :-)
3. It is less forgiving than a language like PHP…
You will trip (& have to deal with) more errors in Python than in some other languages. Because of its versatility, Python doesn’t assume a lot. You have to be clear with it. For instance, when joining numbers to a ‘string’ of letters, you need to tell it that you’re joining these 2 things together, not adding numbers. E.g. 'Alan' + 1 creates a ‘what are you trying to do here? You can't add a number to a word, Dufus!’ error :-) More correctly, 'Alan' + str(1) tells the computer that you want it to join 2 strings of characters together to make one larger string; in this case, ‘Alan’ and ‘1’ join to make 'Alan1'.
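Here is that string-joining rule as a tiny runnable sketch. The variable names are just for illustration; the exact wording of the error message varies between Python versions, so it is printed rather than quoted.

```python
name = "Alan"

try:
    label = name + 1          # str + int: Python refuses to guess what you mean
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)

label = name + str(1)         # convert the number to a string first, then join
print(label)                  # Alan1

label_f = f"{name}{1}"        # an f-string does the conversion for you
print(label_f)                # Alan1
```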
It is also a ‘structure based’ language: it draws inferences from the structure of your code. Unlike PHP, you can’t just put stuff wherever you want(!). If you mess up how your blocks of code are indented, even by a space, it will trip an error. You need to keep your code ‘neat and tidy’. That is a challenge for folks with my personality type but, once you get into the habit, it is a huge advantage when going back over your code to fix errors and in keeping your code streamlined and simple.
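To see the indentation rule in action without crashing a script, the sketch below hands two small snippets to Python's built-in compile() function as strings. The snippets and the check() helper are invented for illustration; the only difference between the two is one extra space in front of the last line.

```python
good = (
    "for lap in range(3):\n"
    "    pace = 90 + lap\n"
    "    print(lap, pace)\n"
)

# Same code, but the last line has one extra space of indentation.
bad = (
    "for lap in range(3):\n"
    "    pace = 90 + lap\n"
    "     print(lap, pace)\n"
)

def check(source):
    """Return 'ok' if Python accepts the source, else the error's class name."""
    try:
        compile(source, "<example>", "exec")
        return "ok"
    except SyntaxError as err:   # IndentationError is a subclass of SyntaxError
        return type(err).__name__

results = (check(good), check(bad))
print(results)
```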
Which brings us to the huge positive impression…
4. You will write better, shorter, more efficient code!
Because of the imposed discipline of the structural constraints coupled with the concise & explicit syntax, you will find yourself writing cleaner, clearer, code. Because of the easy readability of Python code, you will also find yourself making more use of those elements that make your code significantly shorter & more efficient (things like loops and functions) and this will lead to much simpler, better structured scripts. More generally, it will lead to a ‘style’ of coding that you will take with you to any additional language that you take on. My PHP coding has improved a lot since learning Python.
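As a small illustration of how a function plus a loop keeps things short, here is a sketch that summarizes two training weeks. The numbers and names are invented; the point is only that the summary logic is written once and reused for however many weeks you have.

```python
# Invented numbers: minutes run on each day of two training weeks.
week_1 = [45, 0, 60, 30, 0, 90, 120]
week_2 = [50, 0, 65, 30, 0, 100, 130]

def weekly_summary(minutes):
    """Total hours and the longest single session, in one reusable function."""
    return {"hours": sum(minutes) / 60, "longest_min": max(minutes)}

# One loop handles any number of weeks; no copy-and-paste per week.
summaries = [weekly_summary(week) for week in (week_1, week_2)]

for number, summary in enumerate(summaries, start=1):
    print(f"Week {number}: {summary['hours']:.2f} h, longest {summary['longest_min']} min")
```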
For this reason, if I were to do it again, after getting a handle on the basic web languages of HTML and CSS, I would start my server side journey with Python.
Hopefully, I’ve whetted your appetite enough by this point to get you asking the question: where do I start?
For books, the best ‘starter’ book in my opinion is, without question, “Automate the Boring Stuff with Python” by Al Sweigart. It offers a great mix of language basics, and it also dives right into a bunch of cool real world applications for Python programming.
Following that, the best intermediate book that I’ve come across is Programming Python by Mark Lutz. This is more of a reference book and will cover just about every use case that comes up in a really easy to follow, conversational tone. Great book.
For data science/machine learning, Introduction to Machine Learning with Python: A Guide for Data Scientists by Andreas Mueller & Sarah Guido is a readily applied, easy to follow intro book to using Python for data analysis, focused around the powerful scikit-learn library. At the intermediate level, Jake VanderPlas' Python Data Science Handbook is a fantastic reference text that covers all aspects of data science in Python in a good amount of depth, & Sebastian Raschka's "Python Machine Learning" book offers a really interesting & engaging mix of some of the theoretical/mathematical basis behind the various algorithms along with practical implementation of them in the real world. For an introduction into how to implement the "big daddy" algorithms of Deep Learning in Python, "Hands-On Machine Learning with Scikit-Learn and TensorFlow" is a 'next level' read that does a fantastic job of introducing how to use TensorFlow for more advanced Deep Learning applications.
In sports science we are on the precipice of a huge paradigm shift – a shift away from drawing (incredibly limited) conclusions about what ‘works’ and what doesn’t from 10 week studies on a small group of moderately fit college aged subjects to drawing much better, much more population representative conclusions from the mass of real world, real life, long term, individual data that is being accumulated effortlessly every day from every person on the planet who dons a wearable.
Frankly, we need all hands on deck to learn from & make the best use of this data! It’s time to ‘skill up’ and get familiar with those tools (like Python) that will enable you to be on the side of the ‘new world’ folks who actually understand data & are able to use it intelligently to make informed, objective, key decisions, vs the 'old school' coaches who you'll find on the bench lamenting about 'the good old days' when coaching was all about 'gut decisions'. It's 2017, we have the tools, it's time to actually use them to make better, safer, smarter decisions. It’s time to…