This is me in the bike portion of the Lavaman triathlon in Hawaii. I'm riding my $1200 bike and wearing a $40 helmet, passing riders on $2000+ bikes and in funny helmets.
The joys of passing people in a triathlon
A data visualization of the race
August 21, 2015
I squirt some water into my mouth. It's vaguely lemon-lime. As I go to return the water bottle to its cage on my bike, it slips out of my hand, hitting the pavement and bouncing into the bushes. Agh. Rookie mistake, Jimmy. I get back on my bike, but some people I passed a few minutes ago zoom past me.

“Just pass them. Again,” I tell myself, getting back on my bike.

I had just finished the swim portion of Lavaman, an Olympic-distance triathlon, and I had some catching up to do. For those who keep score, I swim a mile in 40 minutes. If the swim were the entire race, according to the official results, I would've ranked 923/1074, or in the bottom 15% of competitors. As a 27-year-old, I was ashamed. I'm a turtle in the ocean, and not the rad, fast kind.

Because I knew I'd have a slow swim time, I made it my mission to pass as many people as possible on the bike. After passing one person, I'd focus squarely on the back of the next person. Each kill gave me an indescribable boost of pride and power. It's a bit of schadenfreude mixed with a self-confidence injection mixed with the lactic acid pain in my legs. Other than the cocktail of emotions at the finish line, passing people is the most satisfying part of the race.

The term "kill" comes from Ragnar running races, where competitors proudly display how many kills their team has accrued. A little morbid, yes, but endurance sports lower testosterone levels, so I'm not worried.
The most satisfying kills, by far, were those who were in the aero position on expensive carbon tri bikes, wearing aero helmets. I passed about 72 people in my commuter helmet, alloy road bike, and knock-off Ray-bans[…].

Please do not mistake my boasting for disrespect. I want everybody to race a triathlon, and in a sport where not having the best, most expensive gear feels like a handicap, I want to reassure the non-elite. It's still the engine that matters the most.
No, I did not count to 72 during the race. I forgot to count. With a little noggin-scratching and key-clacking, I figured this out from the results, which are neatly broken down by time for every participant into the triathlon's five segments (swim, bike, run, and the two transitions, T-1 and T-2). At the end of each leg, I ranked everybody. When I compared it to the ranking of the previous leg, I could see who passed whom. Here are some interesting bits I found.
More than a thousand people compete in Lavaman. As a result, the race staggers the start times; these waves are determined by age and sex. It means that passing somebody during the race doesn't necessarily mean you have a better time than them (but it sure feels good). Photo by Natalie Schwab.
If everyone started at once, I would have passed 500 people

On my way to computing how many people I actually passed, I found out how many people I would have passed if everyone started at the same time (i.e. no wave starts). It turns out I would've passed over 500 people, the third highest number of kills in this theoretical scenario.

My actual number of kills is much lower, mainly because I was in the second wave after the pro/elite. I had fewer opportunities for kills, since the majority of people didn't start the race until after I did. At the same time, my number is inflated because my swim time was so bad. I was passed by athletes who started 15 minutes later.
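To make the difference between the two scenarios concrete, here is a rough Python sketch that counts passes on the clock, with or without wave offsets. The athletes, offsets, and split times are invented for illustration, and the helper names (`clock_times`, `count_passes`) are hypothetical rather than taken from the original analysis.

```python
from itertools import accumulate

# Hypothetical data: wave offset in minutes after the first gun,
# plus split times in minutes for swim, T-1, bike, T-2, run.
athletes = {
    "me":    {"offset": 5,  "splits": [40, 3, 70, 2, 50]},
    "early": {"offset": 0,  "splits": [38, 4, 85, 3, 60]},
    "late":  {"offset": 15, "splits": [25, 2, 65, 2, 45]},
}


def clock_times(offset, splits):
    """Clock time at the start of the race and at the end of every leg."""
    return [offset + t for t in accumulate(splits, initial=0)]


def count_passes(me, athletes, same_start=False):
    """Return (people I passed, people who passed me) over the whole race.

    With same_start=True the wave offsets are ignored, which models the
    theoretical race where everybody starts together.
    """
    def checkpoints(a):
        return clock_times(0 if same_start else a["offset"], a["splits"])

    mine = checkpoints(athletes[me])
    passed = passed_by = 0
    for name, a in athletes.items():
        if name == me:
            continue
        theirs = checkpoints(a)
        for i in range(1, len(mine)):
            # Ahead of me when the leg began, behind me when it ended.
            if theirs[i - 1] < mine[i - 1] and theirs[i] > mine[i]:
                passed += 1
            # Behind me when the leg began, ahead of me when it ended.
            elif theirs[i - 1] > mine[i - 1] and theirs[i] < mine[i]:
                passed_by += 1
    return passed, passed_by


print(count_passes("me", athletes))                   # actual wave starts
print(count_passes("me", athletes, same_start=True))  # everyone together
```

With a shared start, nobody is strictly ahead of anybody at time zero, so this sketch never records a pass during the swim.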

THE RESULTS, ANIMATED
This is a simplified version of the race based on the official results. The animation shows every competitor (each dot) going through each leg (including the transitions, T-1 and T-2) at their average speeds. Athletes with incomplete official results (e.g. due to malfunctioning ankle band) could not be displayed.
Interactive controls: start everyone together or in waves, and choose how to color the dots. Below the animation are the estimated total kills for each leg (how many people you passed and how many passed you), shown either for the actual wave-start race or for the scenario where everyone starts at the same time.
In the case where everyone starts at the same time, it's impossible to tell how many people you passed in the swim, or who you simply started in front of.
With wave starts, the swim figure is the number of people who started in a later wave and passed you during the swim.
Age hardly matters
Kevin Moats, 61, placed 31st with a time of 2:15:55. He passed me on the swim (even though I had a ten-minute head start) and finished the race 45 minutes faster. I'd like to shake his hand. Photo by Wagner Araujo.

When I first started training for the triathlon, I was surprised at the range of both age and body types of its participants, but it makes sense that the breadth of sports - swimming, running, biking - allows for a larger variety of physiques and fitness levels.

This was most evident on race day, when almost every age from 15 to 75 was represented. It's both inspiring and humbling to be passed by somebody your parents' age. In fact, I met a young woman who was racing with her mother for the third year in a row. How much does age matter, really? It turns out it matters, but only a tiny amount. Age was a statistically significant predictor, but it explains less than 2% of the variance in finishing time.

Age vs finish times
The points are all over the place. That means that, given a participant's age, this line would be very imprecise at predicting the person's completion time (r² = 1.8%). You might also notice the line isn't very steep, meaning that even if it did precisely predict completion time, a 70-year-old would on average be only 20 minutes slower than a 25-year-old. But despite the high variance, age is still a significant predictor of rank and completion time (p < .001). So age does matter, but there are other things that matter, things not captured in the official results, that explain the other 98% of the variance (I would guess, for example, how much one trains). If we examine men and women separately, it's virtually the same line. Their r² values are both low (6.4% and 1.5%, respectively) but significant. In other words, given that you're entering Lavaman (regardless of sex), age is a poor predictor of completion time. Obviously, someone age 70+ entering Lavaman is likely to be much more fit than the average 70+ year old.
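For readers who want to see where a slope and an r² like these come from, here is a small least-squares sketch in plain Python. The (age, finish time) pairs are invented, so the printed numbers will not match the article's, and the function name `linear_fit` is only illustrative. The original regressions were run in R (see the list at the end), and this sketch does not compute a p-value.

```python
# Hypothetical (age, finish time in minutes) pairs, not the real results.
data = [(25, 180), (32, 150), (41, 200), (48, 165),
        (55, 210), (63, 175), (70, 205)]


def linear_fit(points):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (slope, intercept, r_squared), where r_squared is the share of
    the variance in y that the fitted line accounts for.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


slope, intercept, r_squared = linear_fit(data)
print(f"slope: {slope:.2f} min per year of age")
print(f"predicted gap, age 70 vs 25: {slope * (70 - 25):.1f} min")
print(f"r squared: {r_squared:.1%}")
```

A shallow slope and a low r² are separate findings: the first says the average effect of age is small, the second says individual times scatter widely around the line.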
Which sport is the most important?

What makes triathlons interesting as a race is that all three sports, and their transitions, matter. It's about the challenge of training for all three and then completing one after the next. But what segment of the race best predicts your rank? All the data screams, “Running!” It has the most variance (sd = 17min), meaning run times are more spread out than the times of any other leg. Whether we're looking at time or rank, the running leg made the most difference.

Not surprisingly, of the three sports, age affected running the most. (And surprisingly, age does not significantly affect the bike portion.) One factor in this might be that the later waves (older folks) experience more heat, slowing them down in the run, especially.

Keep in mind that just because the run was the biggest predictor of rank, it doesn't mean that's where you should train. I, being in the 14th percentile in the swim, for example, can gain some serious minutes with standard swim training, whereas improving my run will get harder and harder.

The leg that most affects rank
Each vertical line is an athlete. The center diagonal line represents athletes' overall rank (the x-axis). The length of the green and blue lines is how much higher or lower that athlete ranked in a certain leg compared to overall rank. What this graph shows is that, of all the legs, the rank in the running leg most closely matches the overall rank. You can see this by clicking on the different legs. The running leg has the least spread, unlike T-1 and T-2, which are the weakest predictors of overall rank. The bike portion is a close second.
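One simple way to put a number on the "spread" in that chart is the average distance between an athlete's rank in a leg and their overall rank. The sketch below does this in Python for four invented athletes. The metric and the names `ranks` and `mean_rank_gap` are assumptions chosen for illustration, not necessarily what the chart or the original code computes.

```python
LEGS = ["swim", "t1", "bike", "t2", "run"]

# Hypothetical split times in minutes, one value per leg (not real results).
splits = {
    "A": [30, 3, 70, 2, 45],
    "B": [40, 2, 75, 4, 50],
    "C": [35, 5, 80, 3, 60],
    "D": [45, 4, 72, 5, 65],
}


def ranks(times):
    """Map each athlete to a rank, 1 being the fastest.

    Ties are broken by insertion order, which is fine for a sketch.
    """
    ordered = sorted(times, key=times.get)
    return {name: position + 1 for position, name in enumerate(ordered)}


def mean_rank_gap(splits):
    """Average absolute difference between leg rank and overall rank, per leg."""
    overall = ranks({name: sum(t) for name, t in splits.items()})
    gaps = {}
    for i, leg in enumerate(LEGS):
        leg_rank = ranks({name: t[i] for name, t in splits.items()})
        total = sum(abs(leg_rank[name] - overall[name]) for name in splits)
        gaps[leg] = total / len(splits)
    return gaps


gaps = mean_rank_gap(splits)
for leg in LEGS:
    print(f"{leg}: {gaps[leg]:.2f}")
print("closest to overall rank:", min(gaps, key=gaps.get))
```

A smaller average gap means the leg's ranking tracks the final ranking more closely, which corresponds to shorter green and blue lines in the chart.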

Below I've included all the data and code used to create these visualizations. Want to see who exactly you passed? Want to apply this kind of visualization to another race? Think younger people are better at transitions? See for yourself.

1. Official Lavaman 2015 race results
2. Official Lavaman 2015 race results, extended (no longer online)
3. CSV of filtered race results, with additional columns of computed kills
4. Python code that takes (2) and outputs (3).
5. D3.js code to make the race visualization and scatterplot (three files)
6. R output of linear regressions that form the basis of my statistical interpretations
7. To my surprise, this article was shortlisted for the Kantar Information Is Beautiful Awards