Python for Coaches: Part 1 -- Getting the data (fit files)

Alan Couzens, M.Sc.(Sports Science)

In the spirit of showing the benefit of coding for coaches and sports scientists, I'm going to try to share more snippets of working code so that you can play around with Python and, hopefully, learn some useful tips and tricks. If you'd like to follow along with the interactive version and code along yourself, you can head here & save an interactive copy of the blog below (hosted on Google's Colab) to play with your own data.

But, first things first, after you've installed Python on your machine (https://www.python.org/downloads/), there are a few different ways that you can run code. Maybe the easiest is from the command line e.g.

python_command_line.png

In the above, I start Python just by typing "python" at the command prompt and then I can run any code I want directly in the shell that comes up. The shell is marked by the >>> prompt. I can type any command that Python recognizes there & then hit enter and Python will run it. Here I'm telling it to print "Hello World" and, after I hit enter, that's what it does.

All of the code below will work via this method. However, a better approach, when you have long sections of code that you want to run together, is to write all of the code in one 'script' and then run that script as one unit. While there are many different editors available (I use VS Code - https://code.visualstudio.com/), you can write your code in pretty much any text editor. Here is a Python script written in the run-of-the-mill Notepad installed on just about every PC...

python_script.png

Then, to run it, I just go back to my command line and type "python" followed by the name of the script (assuming the script is saved in the same folder that my command line is pointing to - in this case Users\alan) and Python executes each line of my script.
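
In case the screenshot doesn't come through, here is a minimal sketch of what a script like this might contain. The file name hello.py and the numbers in it are made up for illustration; they are not the contents of the script in the image.

```python
# hello.py -- a hypothetical example script (not the one in the screenshot)

# Each line runs in order, top to bottom, just as if you had typed it in the shell.
message = "Hello World"
print(message)

# A script can hold as many steps as you like,
# e.g. a quick average of three made-up power readings (watts).
average_power = (250 + 260 + 270) / 3
print(average_power)
```

Saved as hello.py in the folder your command line is pointing to, it would be run by typing "python hello.py".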

So those are the most common ways to run python on your machine. But, what if you don't even have Python on your machine? The environment that I'm writing this blog in (Google Colab) is a great online resource for writing, sharing and running code. It brings the script and the terminal that executes the script together in one place. All you need to do to run the code in a cell is hit the "play" icon in the top left corner of each code block...

One thing to note if you're following along in the Colab notebook: you have to run each of the code cells in order for the code to work. So, if you're playing around and you get an error, just make sure you've run all of the preceding code cells. So, let's get started...

The first step in doing any sort of "Data Science" is having data :-) Getting data is a strength of the Python language because it is very versatile and can work in many environments. For example, Python can talk to any files on your computer, it can talk to spreadsheets, it can talk to databases, it can talk to APIs that live on the web or it can even "crawl" the web and scrape data from web sites. Each of these avenues represents a viable data source for coaches as well. Strava would be an example of a company that has a great public API that you can plug into. I'll cover how to talk to APIs in a coming post. However, let's start at the beginning with data that we all have access to - workout files. In this case, we'll look at one of the most common file types - Garmin .fit files.

To do that, we're going to learn about the most powerful little secret about the Python programming language: There are some really smart dudes and dudettes programming in Python and building FREE libraries that we can plug into! This is, IMO, the biggest strength of Python. Just like Apple's "there's an app for that", when it comes to Python, there's a package for that! So let's start with the first package: fitparse, a package designed specifically for the task of parsing Garmin fit files. To install it (or pretty much any package), all we need to do is...

In [2]:
!pip install fitparse
Collecting fitparse
  Downloading https://files.pythonhosted.org/packages/61/ed/5637fe96c56d55dfa6317ab16745f9a83ef2690d093cee8f0b59f983675f/fitparse-1.2.0.tar.gz (65kB)
     |████████████████████████████████| 71kB 3.5MB/s 
Building wheels for collected packages: fitparse
  Building wheel for fitparse (setup.py) ... done
  Created wheel for fitparse: filename=fitparse-1.2.0-cp36-none-any.whl size=68201 sha256=961649acc0700cfacf3e021c0af710a2884aa01c5aeddc082bab0bde9a866b05
  Stored in directory: /root/.cache/pip/wheels/ab/49/9e/b93b1eb4daf3734db8766fc8afd3f2e8571bda3938a7b01a5d
Successfully built fitparse
Installing collected packages: fitparse
Successfully installed fitparse-1.2.0

If you click the little play icon, Python will go ahead and grab that package and install it on this runtime. Pretty cool, eh? No downloading folders & moving them to different directories and configuring paths. For most packages, that's all you have to do: type "pip install" and then the package name.

OK. So, we've installed the package. Now how do I use it to read my fit file? First things first, there is a particular class within the fitparse package that defines each individual fit file as its own object. So, we're going to go ahead and import that class so that we can define our file...

In [3]:
from fitparse import FitFile

This class is what we're going to use to define the fit file that we've uploaded as a FitFile object, so that we can use all of the cool analysis methods that make up the fitparse package on our file. I uploaded a file called 'KonaBike.fit' via the files icon in the left menu on the screen so I'm going to reference it here.

If you're viewing this blog in Colab, you won't see it, as it disappears with each new runtime, but feel free to upload your own fit file and reference it in the next block of code. Just replace 'KonaBike.fit' below with the name of your file. So, let's instantiate our file as a FitFile object...

In [4]:
fit_file = FitFile('KonaBike.fit')

OK, cool so now we have created a variable called 'fit_file' and within this variable we've instantiated one of these FitFile objects that points to the file that we've uploaded. Now let's have a look at the data from our fit file...

In [7]:
for record in fit_file.get_messages("record"):
    # Records can contain multiple pieces of data (ex: timestamp, latitude, longitude, etc)
    for data in record:
        # Print the name and value of the data (and the units if it has any)
        if data.units:
            print(f"{data.name}, {data.value}, {data.units}")
        else:
            print(f"{data.name} {data.value}")
Streaming output truncated to the last 5000 lines.
enhanced_altitude, 41.39999999999998, m
enhanced_speed, 7.658, m/s
heart_rate, 145, bpm
position_lat, 234589001, semicircles
position_long, -1861274262, semicircles
power, 297, watts
speed, 7.658, m/s
temperature, 30, C
timestamp 2016-10-08 22:50:50
record (#20)
altitude, 41.799999999999955, m
cadence, 70, rpm
distance, 176422.02, m
enhanced_altitude, 41.799999999999955, m
enhanced_speed, 7.76, m/s
heart_rate, 144, bpm
position_lat, 234588288, semicircles
position_long, -1861273801, semicircles
power, 293, watts
speed, 7.76, m/s
temperature, 30, C
timestamp 2016-10-08 22:50:51
record (#20)
altitude, 42.200000000000045, m
cadence, 70, rpm
distance, 176429.62, m
enhanced_altitude, 42.200000000000045, m
enhanced_speed, 7.587, m/s
heart_rate, 145, bpm
position_lat, 234587580, semicircles
position_long, -1861273363, semicircles
power, 266, watts
speed, 7.587, m/s
temperature, 30, C
timestamp 2016-10-08 22:50:52
record (#20)
altitude, 42.200000000000045, m
cadence, 70, rpm
distance, 176437.22, m
enhanced_altitude, 42.200000000000045, m
enhanced_speed, 7.633, m/s
heart_rate, 145, bpm
position_lat, 234586873, semicircles
position_long, -1861272929, semicircles
power, 270, watts
speed, 7.633, m/s
temperature, 30, C
timestamp 2016-10-08 22:50:53
record (#20)
altitude, 42.200000000000045, m
cadence, 70, rpm
distance, 176445.22, m
enhanced_altitude, 42.200000000000045, m
enhanced_speed, 7.635, m/s
heart_rate, 145, bpm
position_lat, 234586135, semicircles
position_long, -1861272459, semicircles
power, 270, watts
speed, 7.635, m/s
temperature, 30, C
timestamp 2016-10-08 22:50:54
....

Pretty cool, eh? With just a few lines of code, we have access to all of the data in the file. Let's walk through the above code step by step..

"for record in fit_file.get_messages("record"):" Here we've introduced a really common pattern in Python - the loop. Whenever you see "for x in y" it means: for that list of y things, loop over each thing in the list and call it "x". Specifically in this case, we're using the get_messages method (that you can read about in the fitparse documentation) to grab all of the messages in the fit file of type "record" and then step through them one by one.
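
To see the loop pattern on its own, here's a small sketch that uses a plain list rather than a fit file. The power values are copied from the sample output above; the variable names are just for illustration.

```python
# A handful of power readings (watts), copied from the sample output above
power_readings = [297, 293, 266, 270, 270]

# "for watts in power_readings" visits each item in the list in turn, calling it "watts"
total = 0
for watts in power_readings:
    total = total + watts

print(total)
```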

As noted in the comment in the code (comments are denoted by # & are not executed - you can write anything you want after a # to help to give more details on your code): each of these records can contain multiple data sources so we throw in another loop to loop through each of the 'data' in the 'record'.

Another thing that you probably noticed is that the code is indented at different levels, with the second 'for' loop indented 4 spaces after the first. This 'nesting' is very important in the Python language. In the case of loops, the things to do for each item in the loop are indented to tell Python which lines belong to that loop, i.e. Python runs through every item in the inner loop first and then moves on to the next item in the outer loop.
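
Here's a stripped-down sketch of the same nesting that you can run without a fit file. The two "records" are stand-ins built from plain Python lists (an assumption for illustration; real fitparse records are objects, not lists).

```python
# Two stand-in "records", each holding the names of a few data points
records = [
    ["cadence", "heart_rate", "power"],
    ["cadence", "heart_rate", "power"],
]

visited = []
for record in records:               # outer loop: one record at a time
    for name in record:              # inner loop: every data point in that record
        visited.append(name)         # indented twice, so it runs once per data point
    visited.append("end of record")  # indented once, so it runs once per record

print(visited)
```

Moving a line in or out by one level of indentation changes which loop it belongs to, which is why the spacing matters.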

Then we throw in another really common piece of code - the conditional. It goes "If" this happens do that "else" do this instead. In our case, "If" the data point has a "units" variable, print it with the units, otherwise, print it without the units.
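
A quick sketch of the conditional by itself. I'm assuming here that a data point without units has its units set to None, which Python treats as "false" in an if statement; the list of units is made up for the example.

```python
# Units for three data points; None stands in for "no units" (an assumption for this sketch)
units_list = ["bpm", "watts", None]

with_units = 0
without_units = 0
for units in units_list:
    if units:                  # true for "bpm" and "watts"
        with_units += 1
    else:                      # None counts as false, so we land here
        without_units += 1

print(with_units, without_units)
```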

Finally, you'll probably note something a bit different about the print statement. If we put an 'f' in front of what we want to print, we can then put variables directly in the string of text (encased in curly brackets) e.g. if we assigned a variable - name = "Alan" and we typed print(f"My name is {name}") We would see "My name is Alan".
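
Here's a short sketch of the same idea using values from the sample output above. The second part shows that you can also put a calculation and a rounding instruction inside the curly brackets; the m/s to km/h conversion is my own example, not something the fit file code does.

```python
name = "power"
value = 297
units = "watts"

# Variables in curly brackets are swapped for their values
line = f"{name}, {value}, {units}"
print(line)

# Calculations work too; ":.1f" rounds the result to one decimal place
speed_kph = f"{7.658 * 3.6:.1f} km/h"
print(speed_kph)
```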

As you can see, when we run our loop over the fit file, it gives us a lot of data! But what if we only want to pull particular bits and pieces from the file? With a simple change to our if statement, let's only pull the power data...

In [16]:
for record in fit_file.get_messages("record"):
    # Records can contain multiple pieces of data (ex: timestamp, latitude, longitude, etc)
    for data in record:
        # Only print the data point if it is the power reading
        if data.name == 'power':
            print(f"{data.name}, {data.value}, {data.units}")
Streaming output truncated to the last 5000 lines.
power, 246, watts
power, 233, watts
power, 259, watts
power, 227, watts
power, 217, watts
power, 249, watts
power, 228, watts
power, 245, watts
power, 231, watts
power, 211, watts
power, 200, watts
power, 212, watts
power, 228, watts
power, 221, watts
power, 230, watts
power, 221, watts
power, 236, watts
power, 219, watts
power, 233, watts
power, 217, watts
power, 207, watts
power, 225, watts
power, 227, watts
power, 238, watts
power, 243, watts
power, 258, watts
power, 260, watts
power, 287, watts
power, 256, watts
power, 313, watts
power, 307, watts
power, 281, watts
power, 298, watts
power, 303, watts
power, 300, watts
power, 319, watts
power, 312, watts
power, 302, watts
power, 292, watts
power, 251, watts
power, 354, watts
power, 336, watts
power, 303, watts
power, 332, watts
power, 336, watts
power, 317, watts
power, 340, watts
power, 332, watts
power, 309, watts
power, 334, watts
power, 304, watts
power, 277, watts
power, 268, watts
power, 253, watts
power, 270, watts
power, 283, watts
power, 302, watts
power, 291, watts
power, 294, watts
power, 270, watts
power, 268, watts
power, 273, watts
power, 268, watts
power, 251, watts
power, 269, watts
power, 264, watts
power, 280, watts
power, 273, watts
power, 278, watts
power, 259, watts
power, 269, watts
power, 272, watts
power, 281, watts
power, 241, watts
power, 264, watts
power, 246, watts
power, 271, watts
power, 273, watts
power, 295, watts
power, 287, watts
power, 281, watts
power, 282, watts
power, 274, watts
power, 281, watts
power, 275, watts
power, 248, watts
power, 246, watts
power, 257, watts
....


Here we changed our if statement to only include the data that is named 'power' (if data.name == 'power'). Let's say we want to not just view these numbers but put them all in a list so we can perform some further analysis: average power, normalized power, rolling averages, bins by power zone etc. Easy peasy...

In [17]:
power = []
for record in fit_file.get_messages("record"):
    # Records can contain multiple pieces of data (ex: timestamp, latitude, longitude, etc)
    for data in record:
        # Keep only the power readings and add each one to our list
        if data.name == 'power':
            power.append(data.value)
print(power)
[0, 46, 232, 321, 355, 340, 304, 253, 129, 116, 199, 145, 232, 260, 254, 268, 284, 285, 297, 339, 325, 351, 296, 331, 455, 454, 443, 409, 375, 341, 340, 342, 317, 283, 366, 351, 308, 308, 269, 273, 311, 260, 258, 280, 159, 297, 257, 246, 218, 256, 183, 317, 294, 294, 243, 259, 257, 257, 226, 233, 229, 256, 281, 249, 242, 246, 236, 205, 236, 274, 214, 234, 249, 303, 256, 231, 244, 235, 224, 255, 280, 289, 307, 266, 251, 235, 237, 243, 255, 281, 220, 239, 262, 270, 254, 253, 251, 244, 210, 256, 230, 241, 252, 252, 255, 283, 277, 268, 268, 273, 252, 220, 212, 198, 226, 221, 209,... ]

Here we created an empty list and named it 'power' then, in our loop, we appended each power value to that list and printed the whole thing out at the end. Now, what if we want to visualize our list? Well, there's a package for that (actually several) but let's use matplotlib. First we import it as we did for fitparse above (only, this time it has a really long name so we're just going to tell python to call it plt).

In [19]:
import matplotlib.pyplot as plt

Then we pass our list of power values to it, tell it that we want a bar chart and ask it to 'show'.

In [22]:
x = []
for i in range(len(power)):
    x.append(i)

plt.bar(x, power)
plt.show()

For matplotlib's bar chart, we need to tell our graph what the x values are and what the y values are. So, first of all we build a list called x and then we count through each data point (each second) in the power data and add its position to that list (0, 1, 2, 3... etc., since Python starts counting at 0).
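
As a side note, Python's built-in range can produce the same list in one line. Here's a short sketch comparing the two, using the first few values from the power list printed above (the names sample_power and x_short are just for this example).

```python
sample_power = [0, 46, 232, 321, 355]   # first few values from the list printed above

# The loop version: add each position to the list one at a time
x = []
for i in range(len(sample_power)):
    x.append(i)

# One-line equivalent: range counts from 0 up to (but not including) len(sample_power)
x_short = list(range(len(sample_power)))

print(x)
print(x_short)
```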

Then we pass our new list of x values and power numbers to matplotlib's 'bar' function and ask it to 'show' and it plots our chart for us, with power in watts on the y axis and time in seconds on the x.
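
Once the values are in a list, the follow-up analysis mentioned earlier is only a few lines away. Here's a minimal sketch of average power and a rolling average, using the first ten values from the power list printed above. The 5-sample window is an arbitrary choice to keep the example small (on a real file you'd more likely use a longer window), and the variable names are just for illustration.

```python
# First ten power values (watts) from the list printed above
sample_power = [0, 46, 232, 321, 355, 340, 304, 253, 129, 116]

# Average power: add everything up and divide by the number of samples
average_power = sum(sample_power) / len(sample_power)

# Rolling average: slide a fixed-size window along the list and average each chunk
window = 5
rolling = []
for i in range(len(sample_power) - window + 1):
    chunk = sample_power[i:i + window]
    rolling.append(sum(chunk) / window)

print(average_power)
print(rolling)
```

The rolling list is shorter than the original because the first full window only becomes available once 5 samples have been seen.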

While cool & all, you're probably thinking that's a lot of work to just get data that I could easily get with a click from Training Peaks! And you're right. The real power comes not from looking at individual data points or sessions but pulling entire years or careers(!) of training together into BIG DATA sets. I'll show you how to roll through hundreds and thousands of files in exported folders next time.

Until then...

Train smart,

AC