The Health Effects of Being a YC Founder: A 3-Year Data Analysis (Part 1)
“Guys remember to sleep well, and please workout. Life still goes on during your start-up” - Harj Taggar, YC group partner
At the beginning of August, I took my first few days off from building Laudspeaker all year, and had a rare few moments to reflect and catch up on sleep. It had been a fun but long and hard 8 months, and the year had been speeding past. I had a look at my long list of personal projects, and the time felt right to tackle one of them: looking at all the health data I had been collecting over the last few years.
I had started wearing a WHOOP (a heart rate and sleep monitor) in 2022, and recently learnt that Apple Health had also been tracking my activity (even though I had not explicitly turned it on) for at least 5 years. I love jumping into data, so I decided to whip up a Jupyter notebook, which I will share at the end of the series.
Collecting and Understanding My WHOOP and Apple Data
I started off by exporting my WHOOP data. WHOOP makes it very easy to export the data it tracks. As of August 2024, the data export consists of 4 CSV files: journal_entries.csv, physiological_cycles.csv, sleeps.csv and workouts.csv. Journal entries and workouts are only as valuable as you make them, as they involve a decent amount of manual data input via the WHOOP app, while physiological_cycles.csv and sleeps.csv are very rich sources of information.
After creating a Google Colab notebook (a hosted version of a Jupyter notebook), I loaded the data into pandas DataFrames and started off with a few visualizations to build up some intuition:
You can see the WHOOP health variables I visualized here:
I tend to prefer to get a sense of the data before any thorough cleaning or transformation, so looking at summary statistics or a few plots typically helps. In this case describe() is your friend:
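A minimal sketch of that first look, assuming a DataFrame shaped roughly like the WHOOP export. The column names and numbers below are made up for illustration and are not the real export:

```python
import pandas as pd

# Hypothetical stand-in for the WHOOP physiological data; the real export
# has many more columns and different column names.
physiological = pd.DataFrame({
    "resting_heart_rate": [52, 54, 53, 55, 51],
    "sleep_hours": [7.5, 5.0, 8.2, 6.1, 4.8],
})

# describe() reports count, mean, std, min, quartiles and max per numeric column.
summary = physiological.describe()

# Coefficient of variation (std / mean) compares how much each metric moves
# relative to its own scale.
variation = summary.loc["std"] / summary.loc["mean"]

print(summary)
print(variation.sort_values(ascending=False))
```

Dividing the standard deviation by the mean is just one way to compare spread across metrics with different units; eyeballing the std and quartile rows works too.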
A few things jumped out at me here: a lot of my health data did not vary all that much, including resting heart rate, average heart rate and skin temperature, while there was a lot more variation in my sleep habits.
WHOOP’s strain measure seems to be some combination of calories burnt (which makes sense) and max heart rate. You can see the day strain vs calorie burn here:
The correlation matrix helps suggest relationships between different variables as well:
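As a rough sketch of how such a matrix can be computed with pandas, here is an example on synthetic data. The column names and the relationship between strain and calories are assumptions made up for the example, not values from the WHOOP export:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic data: strain is built to track calories, skin temperature is independent.
calories = rng.normal(2500, 300, size=200)
df = pd.DataFrame({
    "calories": calories,
    "strain": calories / 250 + rng.normal(0, 0.5, size=200),
    "skin_temp": rng.normal(33.5, 0.3, size=200),
})

# Pairwise Pearson correlation between every pair of numeric columns.
corr = df.corr()
print(corr.round(2))
```

A high value only suggests that two variables move together; it says nothing about which one drives the other.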
Visualizing My Apple Health Data
In my case, since I just have an Apple phone (no watch), Apple Health records my step count, my total distance walked or run, their estimates of calories burnt, my walking speed, and a few other things like exposure to headphone audio.
Unlike the WHOOP, there are potentially multiple measurements a day, and all the different measurement types are recorded in the same file, making the export XML much larger than the WHOOP equivalent. You’ll also notice that each measurement includes which Apple mobile device made it (the device and sourceVersion columns).
I transformed the data in a few different ways. For example, I accumulated the steps and calories into daily sums, and for the audio exposure I calculated a weighted average sound pressure figure for each day.
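A minimal sketch of those two transformations on a tiny made-up sample. It assumes the records are already in a DataFrame with type, startDate, endDate and value columns, and it assumes the weights for the audio average are the duration of each measurement, which is one reasonable choice rather than a confirmed detail:

```python
import pandas as pd

STEP = "HKQuantityTypeIdentifierStepCount"
AUDIO = "HKQuantityTypeIdentifierHeadphoneAudioExposure"

# Tiny made-up sample in the shape of Apple Health records.
records = pd.DataFrame({
    "type": [STEP, STEP, STEP, AUDIO, AUDIO],
    "startDate": ["2024-08-01 08:00:00", "2024-08-01 18:00:00", "2024-08-02 09:00:00",
                  "2024-08-01 10:00:00", "2024-08-01 12:00:00"],
    "endDate": ["2024-08-01 08:10:00", "2024-08-01 18:30:00", "2024-08-02 09:20:00",
                "2024-08-01 10:30:00", "2024-08-01 12:10:00"],
    "value": ["900", "2500", "1800", "70", "82"],
})
records["startDate"] = pd.to_datetime(records["startDate"])
records["endDate"] = pd.to_datetime(records["endDate"])
records["value"] = pd.to_numeric(records["value"])
records["day"] = records["startDate"].dt.date

# Daily step totals: several measurements per day collapse into one sum.
daily_steps = records[records["type"] == STEP].groupby("day")["value"].sum()

# Daily audio exposure: average of the readings weighted by minutes of listening.
audio = records[records["type"] == AUDIO].copy()
audio["minutes"] = (audio["endDate"] - audio["startDate"]).dt.total_seconds() / 60
audio["weighted"] = audio["value"] * audio["minutes"]
sums = audio.groupby("day")[["weighted", "minutes"]].sum()
daily_audio = sums["weighted"] / sums["minutes"]

print(daily_steps)
print(daily_audio)
```

Note that this averages the decibel readings directly. Because decibels are logarithmic, an energy-based average would give a somewhat different figure.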
I had a lot of step data, going back to the end of 2016, with a gap from mid-2018 to mid-2019 (not sure why), and then continuous data from then until now. (* I walk a lot more in NY than before)
We’ll revisit the following graphs later:
Seasonality and Day of the Week Effects
I imagined there were some seasonality or repeating time effects in my data, and wanted to figure out what they were. There are a number of ways of doing this, ranging from more involved statistical approaches to simply plotting monthly, weekly and yearly breakdowns and looking for patterns.
I used both, opting for Fourier transforms to find the frequencies at which time effects showed up, and also plotting the time series with different cuts.
A Fourier transform of step data showed a peak around 0.14
Peaks in the magnitude plot represent dominant frequencies (seasonal cycles) in the time series. For example, with daily data and frequencies measured in cycles per day, a peak at a frequency of 1/365 suggests a yearly seasonality, while a peak at 1/7 suggests a weekly seasonality. In this case we see a peak right around 0.14 ≈ 1/7, suggesting a weekly pattern to my walking.
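To illustrate how a weekly pattern shows up as a peak near 1/7, here is a small sketch using NumPy's FFT on synthetic daily step counts. The data is generated for the example (more steps on two days of each week, plus noise) and is not my real step data:

```python
import numpy as np

rng = np.random.default_rng(1)
n_days = 364  # 52 whole weeks, so 1/7 falls exactly on a frequency bin
days = np.arange(n_days)

# Synthetic daily steps: a higher count on two days of every week, plus noise.
steps = 6000 + 4000 * (days % 7 >= 5) + rng.normal(0, 500, n_days)

# Subtract the mean so the zero-frequency term does not dominate the spectrum.
spectrum = np.abs(np.fft.rfft(steps - steps.mean()))
freqs = np.fft.rfftfreq(n_days, d=1.0)  # in cycles per day

peak_freq = freqs[np.argmax(spectrum)]
print(f"dominant frequency: {peak_freq:.3f} cycles/day")
print(f"period: {1 / peak_freq:.1f} days")
```

Real data rarely spans a whole number of weeks, so the peak tends to be smeared over neighbouring bins rather than landing on exactly 1/7.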
Looking at the weekly step count (ignore the y-axis labels) shows I walk a lot more on the weekend than during the week, which is true:
Fourier transforms of my sleep data were much noisier
Plotting the Fourier transforms for my sleep data suggested no clear repeating pattern. To confirm this, I calculated the signal-to-noise ratio of the top 5 Fourier frequencies, which also suggested noise.
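There are several ways to define a signal-to-noise ratio for a spectrum. The sketch below uses one simple definition, chosen for illustration: the power of each of the strongest frequency bins divided by the median power across all bins. The helper name top_k_snr and both series are made up for the example:

```python
import numpy as np

def top_k_snr(series, k=5):
    """Return (frequencies, snr) for the k strongest bins of a daily series.

    SNR here is bin power divided by the median power across all
    non-zero-frequency bins (an illustrative definition, not a standard one).
    """
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    power = np.abs(np.fft.rfft(x))[1:] ** 2   # drop the zero-frequency bin
    freqs = np.fft.rfftfreq(len(x), d=1.0)[1:]
    top = np.argsort(power)[::-1][:k]         # indices of the k largest powers
    return freqs[top], power[top] / np.median(power)

rng = np.random.default_rng(2)
days = np.arange(364)
noisy_sleep = 7 + rng.normal(0, 1, days.size)  # no repeating pattern
weekly_steps = 6000 + 4000 * (days % 7 >= 5) + rng.normal(0, 500, days.size)

_, snr_sleep = top_k_snr(noisy_sleep)
_, snr_steps = top_k_snr(weekly_steps)
print("sleep-like noise:", snr_sleep.round(1))
print("weekly steps:    ", snr_steps.round(1))
```

With pure noise the top ratios stay within roughly an order of magnitude of the median, while a genuine weekly cycle stands out by a much larger factor.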
Explicitly plotting sleep by month didn’t help highlight patterns in my sleep either
A Timeline of Data
So far I had been quite surprised by the lack of regularity in my data. Even though the data didn’t clearly show it, I definitely felt certain weeks and months were much harder than others. The daily graphs were too detailed and noisy, so I wondered if another cut of the data would make more sense.
I decided to average my data on a weekly basis and develop a Year - Month - Week timeline. This level of granularity was right for me; as soon as I plotted measurements, I could remember and clearly see what happened to my body during specific incidents. The data was telling a much clearer story. For example, check out this graph showing how long I stay awake in bed before falling asleep:
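A minimal sketch of weekly averaging with pandas, using synthetic data. The awake_minutes column and the label format are assumptions made up for the example; the labelling scheme (year, month and week of the month of each week's last day) is just one way to build a Year - Month - Week axis:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)

# 8 weeks of made-up daily values, starting on a Monday.
dates = pd.date_range("2023-01-02", periods=56, freq="D")
daily = pd.DataFrame(
    {"awake_minutes": rng.normal(20, 8, len(dates)).clip(min=0)},
    index=dates,
)

# Average each Monday-to-Sunday week; "W" bins end on Sunday by default.
weekly = daily.resample("W").mean()

# Label each week by the year, month and week-of-month of its last day.
weekly["label"] = [
    f"{d.year}-{d.month:02d}-W{(d.day - 1) // 7 + 1}" for d in weekly.index
]
print(weekly)
```

Plotting the weekly column against these labels gives one point per week instead of one per day, which smooths out much of the day-to-day noise.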
Each of these plots highlights a specific chapter in the story of Laudspeaker. For example, in the plot above there is a clear spike in my time awake before falling asleep. I am usually an efficient sleeper, but like most people, when I’m worried or particularly stressed it can take much longer, and I remembered that when we were pivoting from Tachyon Transfer it was hard for me to sleep with all the uncertainty around the direction of the company.
Something else that was clear was that sleep consistency (do you go to bed at the same time) had been poor ever since starting my entrepreneurial journey, but it was much worse immediately after we launched Laudspeaker’s MVP in October of 2022, and again when we closed a large client in Korea and I had to work Korean hours while living in New York.
The period of working Korean hours had a number of effects, including slowly increasing my resting heart rate.
You can see the drop in heart rate variability more clearly (I stopped working out when we closed our largest customer, and shifted from running to weight lifting for 6 months).
This was also reflected in fewer calories being burnt (look at the moving average), driven by a reduction in the number of steps I took every day, as I was glued to the computer programming all day.
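For reference, a moving average like this can be computed with pandas' rolling window. The sketch below uses made-up calorie numbers and a 7-day window, which is an assumption chosen for the example:

```python
import pandas as pd

# Made-up daily calorie estimates that step down halfway through.
calories = pd.Series(
    [2600, 2700, 2500, 2650, 2550, 2600, 2700,
     2200, 2100, 2250, 2150, 2200, 2100, 2150],
    index=pd.date_range("2023-03-01", periods=14, freq="D"),
    dtype=float,
)

# Trailing 7-day mean; min_periods=1 lets the first week use whatever days exist so far.
moving_avg = calories.rolling(window=7, min_periods=1).mean()
print(moving_avg.round(0))
```

A longer window gives a smoother line but reacts more slowly to a real change, so the step down shows up gradually over the following week.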
You can also see life milestones: the period when I met my now girlfriend is in this plot (but finding it is left as an exercise for the reader).
Other variables didn’t move much (and probably shouldn’t):
A common pattern I noticed with the long hours associated with closing our biggest customer was that my time asleep would lag each brutal week, but then I would try to make up for it (without realizing), leading to spikes in my recovery:
Thanks for reading! I’m planning on following up on this with an analysis of other personal metrics I track too.
Methods and Code Snippets:
Whoop Data
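import io
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# Assumes a Google Colab session: files.upload() opens a file picker and
# returns a dict mapping each uploaded filename to its raw bytes.
from google.colab import files
uploaded = files.upload()
physiological = pd.read_csv(io.BytesIO(uploaded['physiological_cycles.csv']))
journal_entries = pd.read_csv(io.BytesIO(uploaded['journal_entries.csv']))
sleeps = pd.read_csv(io.BytesIO(uploaded['sleeps.csv']))
workouts = pd.read_csv(io.BytesIO(uploaded['workouts.csv']))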
Peeking at the data:
physiological.head()
physiological.columns
which you can see below:
The physiological file logs each day of measurement with a cycle start and end time, and tracks heart rate data, sleep data, skin temperature, estimated calorie burn, and the WHOOP-specific measures strain and recovery. Strain is a measure of how hard your body has worked that day, and recovery is supposed to tell you how ready your body is for a workout, and how hard that workout should be.
In my personal experience, strain and recovery are directionally useful in that a high strain typically does correlate with a feeling that I have worked out hard, and a low recovery means I find it hard to work out later. On the other hand, I don’t find there to be much of a difference between a recovery score of 70% and 90%.
Most of the sleep data is included in the physiological file (only the naps column seems to be unique), so I didn’t analyse that separately, and I also skipped workouts in this post. Workouts include breakdowns of heart rate zones, letting you know how much of your workout was in a specific heart rate band, e.g. 30% of a workout was in zone 2 (60–70% of your maximum heart rate).
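To make the zone breakdown concrete, here is a small sketch that computes the share of a workout spent in each zone from raw heart rate samples. Only the zone 2 band (60–70%) comes from the text above; the other band edges, the maximum heart rate and the samples are assumptions made up for the example, and WHOOP's own export already provides the breakdown:

```python
import pandas as pd

max_hr = 190  # hypothetical maximum heart rate

# Made-up heart rate samples, one per minute of a short workout.
heart_rate = pd.Series([100, 118, 125, 130, 128, 140, 150, 160, 172, 120])

# Assumed five-zone model in 10% bands of maximum heart rate.
edges = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
labels = ["zone 1", "zone 2", "zone 3", "zone 4", "zone 5"]

# right=False makes each band include its lower edge and exclude its upper edge.
zones = pd.cut(heart_rate / max_hr, bins=edges, labels=labels, right=False)

# Fraction of samples that fall in each zone.
share = zones.value_counts(normalize=True).sort_index()
print(share)
```

Samples below 50% of the maximum, or exactly at the maximum, fall outside these bands and are left out of the shares.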
Apple Health Data
Your Apple data can be exported as an XML file and read like this:
import xml.etree.ElementTree as ET
import datetime as dt
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
tree = ET.parse('export.xml')
The XML file keeps every measurement (row) in what Apple calls a record, which you can see like this:
root = tree.getroot()
record_list = [x.attrib for x in root.iter('Record')]
record_list
It’s more useful to turn it into a DataFrame:
record_data = pd.DataFrame(record_list)
and you can peek at the data like this:
record_data.head()
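Every attribute comes out of the XML as a string, so dates and values need converting before any arithmetic. A minimal sketch of that clean-up, using a tiny inline sample in place of export.xml (the sample records are made up, and the date format is assumed to match the export):

```python
import xml.etree.ElementTree as ET
import pandas as pd

# Tiny made-up stand-in for export.xml.
sample_xml = """<HealthData>
  <Record type="HKQuantityTypeIdentifierStepCount" sourceName="iPhone" unit="count"
          startDate="2024-08-01 08:00:00 -0400" endDate="2024-08-01 08:10:00 -0400" value="900"/>
  <Record type="HKQuantityTypeIdentifierStepCount" sourceName="iPhone" unit="count"
          startDate="2024-08-01 18:00:00 -0400" endDate="2024-08-01 18:30:00 -0400" value="2500"/>
  <Record type="HKQuantityTypeIdentifierDistanceWalkingRunning" sourceName="iPhone" unit="km"
          startDate="2024-08-01 08:00:00 -0400" endDate="2024-08-01 08:10:00 -0400" value="0.7"/>
</HealthData>"""

root = ET.fromstring(sample_xml)
record_data = pd.DataFrame([r.attrib for r in root.iter("Record")])

# Convert the date columns, assuming the "YYYY-MM-DD HH:MM:SS +ZZZZ" layout.
for col in ["startDate", "endDate"]:
    record_data[col] = pd.to_datetime(record_data[col], format="%Y-%m-%d %H:%M:%S %z")

# Values that are not numeric become NaN instead of raising an error.
record_data["value"] = pd.to_numeric(record_data["value"], errors="coerce")

# Shorten the long type identifiers so they are easier to filter on.
record_data["type"] = record_data["type"].str.replace(
    "HKQuantityTypeIdentifier", "", regex=False
)

steps = record_data[record_data["type"] == "StepCount"]
print(steps[["startDate", "value"]])
```

From here, each measurement type can be filtered into its own DataFrame and aggregated by day.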
Other Resources:
There were many resources online that were helpful with my analysis. I’d like to thank them below:
https://sekarwrites.com/visualizing-apple-health-data-with-python/
https://betterprogramming.pub/individualizing-my-whoop-4-0-data-to-better-my-athletic-performance-38a27da8fab2