Math IA
Throughout my study of mathematics, I have always been absorbed by its applications in
everyday life. As someone hoping to go into the field of medicine, I am curious about how
I could use math to predict likely outcomes through graphs and correlations. My attention
was caught when I recently looked into how I could use statistics to create a model that
predicts the likely outcomes of a bodily variable.
When looking at my own life, one such hobby that would exemplify a predicted bodily
As one of my core hobbies, running has been a crucial part of my exercise and
enjoyment of life. Specifically, long distance running without stopping has always
fascinated me through its difficulty, as it requires me to pace myself to avoid the
tiredness and fatigue that come with long distances. Heart rate is often linked to the
level of strain that the body experiences, and it is an important dependent variable for
runners to monitor when running at faster paces (pace being the speed at which one runs)
for longer periods of time. With this in mind, an equation relating heart rate to pace,
giving the heart rate reached at a certain pace, is crucial for determining the level of
challenge that is suitable for me. My goal for this investigation is to mathematically
determine whether there is a relationship between the pace at which one runs and heart
rate. Ultimately, the final goal is to find a mathematical equation that predicts my
heart rate from the pace at which I run.
Aim and approach
As indicated in the introduction, the aim of this paper is to mathematically
determine an equation describing how my heart rate is affected by the pace at which I run.
In order to achieve this, I will need to collect data from an experiment, as there is no
data on the internet that is able to create an accurate data set of my heart rate in
homeostasis while running at different paces. For that reason, I will be needing to
To specify, heart rate will be measured during the homeostasis of running at a specific
pace, which means that the heart rate has adjusted to supply the extra oxygen to the
body’s increased demand while running at a certain pace. This thus indicates that there
should be a time buffer present before the heart rate measurements are taken.
As a means to approach the aim, I will highlight the independent variable as being the
pace at which I run at and the dependent variable to be my heart rate during
measure the paces for which I will be running at so that I can collect data. As well, I will
be utilizing a Xiaomi Mi Smart Band 5 in order to record the heart rate data.
My first thought was to run for around 5 minutes and record a single data point for
each pace. However, I realized that many variables sway the recording of heart rate and
could cause an outlier to occur. To address that, I will record a data point every 15
seconds and average the recordings in order to reduce the effect of outliers on my data.
Furthermore, I realized that pace is continuous, so I would need a trial for every speed
the treadmill could be set to, which would take an almost impossible amount of time and
effort. As a result, I will change from continuous data to discrete data by running a
select range of paces, going up by a fixed amount until I reach a maximum, as a means to
lower the number of trials I have to do.
From previous experience and knowledge, I anticipate that heart rate will increase to
match the level of strain the body has to endure, and thus increase whenever the running
pace increases. At faster paces, I have to slow down sooner, suggesting to me that there
will be a non-linear positive relationship between pace and heart rate, as the strain
upon the body is seemingly exponentially amplified until one cannot run any faster. This
further suggests that the slope of the equation increases the faster one runs. In
addition, the y-intercept of the equation is the resting heart rate, allowing for the
easy collection of this important data point.
My hypothetical expected equation would be an exponential equation likened to
$y = h(b)^{cx} + a$, where $y$ is the average heart rate in the last minute after four minutes of
running, $x$ is the pace at which I run at with $h$, $b$ and $c$ being the factors I need to
manipulate in order to stretch and dilate the graph and $a$ being used to translate the
graph up in order to fulfill the asymptote requirement for a resting heart beat.
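The hypothesized form can be written as a small function to make the role of each parameter concrete. This is only a sketch: the function name and the parameter values in the example are illustrative assumptions, not fitted results.

```python
def exponential_model(x, h, b, c, a):
    """Evaluate y = h * b**(c * x) + a for a pace x in km/h."""
    return h * b ** (c * x) + a

# Illustrative parameter values only (assumptions, not values from the data).
# At x = 0 the exponential term equals h, so the y-intercept is h + a.
print(exponential_model(0, h=1, b=2, c=0.5, a=60))  # 61.0
# Increasing x by 2 with c = 0.5 doubles the exponential term (2**1 = 2).
print(exponential_model(2, h=1, b=2, c=0.5, a=60))  # 62.0
```

Written this way, $h$ and $c$ control the stretch of the curve, while $a$ only moves it up or down.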
As a means to test my hypothetical equation, I ran the experiment. The data ranges from
7 kilometers an hour (km/h) to 14 km/h, as that is the speed range at which I usually
run: the minimum of 7 km/h accommodates the time constraint of the experiment, while
14 km/h is the fastest I could run on the treadmill safely. As well, I decided to go up
by 0.5 km/h with each trial, generating 15 trials and thus 15 data points, because it
was a good medium between data fidelity and time investment. In addition, an extra point at x=0 will
be recorded, as that would constitute the heart rate at rest and being a y-intercept to
As well to clarify, I will be rounding to two decimal places, as any more would be
For my data collection, the variables that I need to collect data for are heart rate and
pace. For the dependent variable, heart rate, I will use the heart rate monitor of my
friend's Mi Smart Band 5 smartwatch. As well, I will have a friend record the heart rate
every 15 seconds as I read it out to them. All of this will be done running on a treadmill
for 5 minutes, or 300 seconds, in order to standardize the time that I run at each pace.
Furthermore, 5 minutes is around the amount of time that it usually takes to reach the
increased heart rate, with the slope of heart rate's increase usually dropping off to zero
by around four minutes. For the resting data point, I will take three days of rest once all
the trials have been completed and record my heart rate for a minute, with a 15 second
interval between readings to match my experiment. Then the four data points will be
averaged to get the y-intercept. Some controlled variables that I must account for are the
watch I am using, the placement of the watch, the model of the treadmill, the incline of
the treadmill being flat, and the intervals at which I am recording
heart rate. As well, some inconsistencies in the data can be attributed to the
under strain that can affect the heart rate measurement. One way to counteract this
confounding factor would be to take 30 minute breaks in between each trial. As well, I
found that the watch's heart rate monitor can be quite fiddly, which may be another
confounding factor that affects the accuracy of the test. As a result of these confounding
factors, the data may have outliers, which can mislead the final conclusion.
Unfortunately, due to time constraints, only a single trial can be run for every pace. As
well, due to the large data table, the raw data will be included in the appendix.
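The recording schedule described above can be laid out in a few lines. This is a sketch under one assumption that the text does not state: that the first reading is taken at 15 seconds rather than at 0 seconds. The variable names are illustrative.

```python
# Assumed schedule: one reading every 15 s, from t = 15 s up to t = 300 s.
INTERVAL_S = 15
TRIAL_LENGTH_S = 300

reading_times = list(range(INTERVAL_S, TRIAL_LENGTH_S + 1, INTERVAL_S))

# Readings that fall in the last minute of the trial (after the 240 s mark).
last_minute_times = [t for t in reading_times if t > TRIAL_LENGTH_S - 60]

# Paces from 7 to 14 km/h in steps of 0.5 km/h, built from whole numbers of
# half km/h so that no floating point drift accumulates.
paces_kmh = [half / 2 for half in range(14, 29)]

print(len(reading_times))   # 20
print(last_minute_times)    # [255, 270, 285, 300]
print(len(paces_kmh))       # 15
```

Under this assumption the last minute contains exactly four readings, which matches the four recordings that are averaged for each pace.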
After doing the data collection, I had to redo the slowest three trials, as they had
abnormally high heart rates compared to the rest. As well, due to the time constraints, I
had to do 9 trials in a day, resulting in some outliers with higher heart rates in the
paces from 10 km/h to 13 km/h.
In the following table, I have calculated the average heart rates for the last minute of the
experiment using the formula $(a_1 + a_2 + a_3 + \dots + a_n)/n$, with $a_1, \dots, a_n$ being the
heart rates added together and $n$ being the number of heart rates that are being added up.
This will reduce the ability for an outlier to affect the data, and will result in more
consistency.
The first calculation, for 7 km/h, is demonstrated by inserting the last four recordings,
which sum to 523:
$523/4 = 130.75$
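The averaging step can be sketched in code as follows. The four readings below are placeholders, not the recorded values (those are in the appendix); they were chosen only so that they sum to 523, as in the calculation above.

```python
def average(readings):
    """Arithmetic mean (a_1 + a_2 + ... + a_n) / n."""
    return sum(readings) / len(readings)

# Placeholder readings: NOT the recorded data, only values that sum to 523.
placeholder_last_minute = [130, 131, 131, 131]

print(round(average(placeholder_last_minute), 2))  # 130.75
```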
Table for average heart rate in bpm in the last minute vs pace
Table 1: The average heart rates (BPM) for the last minute at each pace
From here, I could create a scatter plot graph and analyse the trend
Fig 1: Scatterplot graph of the Average heart rate for the last minute vs pace (excludes resting data point)
As well, I was able to calculate the resting heart rate to be around 67.5
Fig 2: table for the graph averaging of the resting heart rate
In order to determine the best form of graph for the data, I would need to utilize my
and compare the r-squared values of each form to determine the best model. For my
case, I want to explore the differences between exponential and linear functions to
model my data, and ultimately choose one to determine what the relationship of the
data is. I will limit the domain of the equations to be [0, 14] in following with the context
of the situation as we would not be able to achieve a negative speed as well as the
One way to determine how well an equation fits the data is to approximate the line of
best fit for that type of graph and determine its r-squared value. The r-squared value is
a number that quantifies the proportion of the variation in one variable that is explained
by its relationship with the other under a specific equation. In this case, the value
describes the percentage of the variation in heart rate that is explained by the pace at
which I run. This gives a quantifiable measure of which model best describes the
relationship. The r-squared calculations are included in the appendix with the raw data.
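For reference, one common definition of r-squared is $1 - SS_{res}/SS_{tot}$, and a sketch of it is given here. It is an assumption that this is the definition used for the values in this paper; the numbers in the example are toy values, not the heart rate data.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    # Squared differences between each observation and the model's prediction.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Squared differences between each observation and the mean observation.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

# Toy numbers for illustration only; they are not the heart rate data.
toy_observed = [1.0, 2.0, 3.0, 4.0]
perfect = r_squared(toy_observed, [1.0, 2.0, 3.0, 4.0])    # every point predicted exactly
mean_only = r_squared(toy_observed, [2.5, 2.5, 2.5, 2.5])  # model that always predicts the mean

print(perfect, mean_only)  # 1.0 0.0
```

A model that predicts every point exactly scores 1, while a model that does no better than always guessing the mean scores 0.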
Exponential model
To start, I will first investigate an exponential curve, as that was my initial hypothesis
when speculating on the results. We will be using the previously stated equation
$y = h(b)^{cx} + a$ to represent our data. In the context of the situation, $y$ represents
the average bpm in the last minute, $x$ the pace at which I run, $h$, $b$ and $c$ are
parameters we can manipulate to fit our graph, and $a$ adjusts our graph's asymptote via
vertical translation. However, for the sake of simplicity, we will set $b$ to two and $h$
to one in order to reduce the parameters I need to solve for to just $c$ and $a$.
To start, I first determined whether this was a positive or negative correlation. By
referring to Fig 1, I determined that there would be a positive slope and thus a positive
leading coefficient. In addition, we can get the $a$ value of the equation as being 66.5 by
substituting the resting point $(0, 67.5)$ into the equation, where $2^{c(0)} = 1$:
$67.5 = a + 1$
$a = 66.5$
Then, using the evidence previously gathered, I plotted the equation $y = 2^{x} + 66.5$,
as it fulfills the positive slope requirement and passes through the y-intercept.
When looking at the graph, it is apparent that the model did not fit the data because the
slope's magnitude was too large. Thus I needed to adjust the slope by determining a proper
$c$ value for the equation. To do this, I chose a point for the graph to pass through,
substituted its x- and y-values into the equation, isolated $c$, and substituted the $c$
value back into the equation to achieve a more accurate representation of the data. The
point I chose was $(12, 175)$, as that seemed like a reasonable point for the graph to pass
through. The following equations show this calculation:
$175 = 2^{c(12)} + 66.5$
$108.5 = 2^{c(12)}$
$\log_2 108.5 = c(12)$
$(1/12)\log_2 108.5 = c$
$(1/12)(6.761551232444) = c$
$c \approx 0.56$
Thus subbing the value of $c$ into our previous equation we get $y = 2^{(0.56)x} + 66.5$ and
the following graph.
Fig 4: second attempt at creating an exponential representation of the data
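The rearrangement for $c$ can be checked numerically. The sketch below uses the chosen point $(12, 175)$ and $a = 66.5$ from the text; the function name is an illustrative assumption.

```python
import math

def solve_c(point_x, point_y, a):
    """Solve point_y = 2**(c * point_x) + a for c."""
    return math.log2(point_y - a) / point_x

c = solve_c(12, 175, a=66.5)
print(round(c, 2))  # 0.56

# With the unrounded c, the curve passes through the chosen point again.
print(abs(2 ** (c * 12) + 66.5 - 175) < 1e-9)  # True
```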
Inferring from this graph, although the curve passes through both the y-intercept and
the chosen point, it underestimates the values to the left of the point and overestimates
the values to the right of it. In order to fix that, I would have to make the leading
coefficient smaller. However, that would change the $c$ value as well, leaving two
unknowns to solve for, so it would be impossible to isolate a single variable. In the end,
this model had a relatively poor r-squared value of 0.16, denoting that the graph only
explains 16% of the variation in heart rate.
Another method we could try would be to estimate a curve by following the parameters
and evidence that the data suggests, and to tweak different variables until they match
the data. Using the evidence we gathered from Figures 3 and 4, we can again start off
with $y = 2^{x}$. First, I realized that the growth of the slope was too fast, and that I
needed to make the $c$ value smaller. Furthermore, while the growth needed to be reduced,
I realized that to balance out that change I would need some other factor to raise the
graph without affecting how quickly the slope grows. Thus I deduced that I needed an $h$
value larger than one. From there, I adjusted the $h$ and $c$ values through trial and
error and arrived at a curve that looked like it could fit the data. At that point, I had
the equation $y = 85(2^{(0.1)x})$, producing a graph like this.
Figure 5: The graph of an exponential curve created through trial and error that matches my data.
I then deduced that I would have to shift the graph down, and I decided to translate it
down to go through the y-intercept, as that would guarantee it passes through at least
one point. To get the vertical shift downwards, I subtract the y-intercept of the graph
from the y-intercept of the data and sub that into the $a$ value:
$a = 67.5 - 85$
$a = -17.5$
All together, we get an equation of $y = 85(2^{(0.1)x}) - 17.5$ that produced the following
graph.
Figure 6: Graph of the final equation with the use of trial and error to produce an exponentially modelled graph
As well, it has an r-squared value of 0.65, an increase of 0.49 (49 percentage points) in
explaining the relationship between heart rate and pace compared to the previous attempt
at creating an exponential equation.
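The vertical shift used for the trial and error curve can be reproduced with a short sketch. The values $h = 85$, $b = 2$, $c = 0.1$ and the resting heart rate of 67.5 come from the text; the function and variable names are illustrative assumptions.

```python
def trial_and_error_model(x, h=85, b=2, c=0.1, a=0.0):
    """Evaluate y = h * b**(c * x) + a with the trial and error parameters."""
    return h * b ** (c * x) + a

RESTING_HR = 67.5  # resting heart rate (bpm), the y-intercept of the data

# Shift needed so that the curve passes through (0, RESTING_HR): 67.5 - 85.
shift = RESTING_HR - trial_and_error_model(0)

print(shift)                               # -17.5
print(trial_and_error_model(0, a=shift))   # 67.5
```

Because the shift is computed from the curve's own value at $x = 0$, the same step works for any other choice of $h$, $b$ and $c$.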
Linear model
Another model that looks like it could fit the data is a linear equation. To model
this, I will be using the equation $y = mx + b$, where $y$ represents the average heart
rate in the last minute, $x$ the pace at which one runs, and $b$ the vertical translation
of the graph. To start, we can immediately take the $b$ value to be 67.5, as it equals the
y-intercept, which is also the average resting heart rate. The other value we need to
calculate is $m$, the slope, which describes how pace changes heart rate. The $m$ value
can be calculated using the formula $m = (y - y_1)/(x - x_1)$, where $x$ and $y$ are the
coordinates of one point that the line passes through and $x_1$ and $y_1$ are the
coordinates of a different point. For one point, we use the y-intercept $(0, 67.5)$, as we
want the equation to pass through it. For the other point, by referring to Fig 3, we can
see that the point that would create a line best suited to the data is $(12, 175)$, as
that results in a fairly even split of points on either side of the line. The following
calculation gives the $m$ value:
$m = (175 - 67.5)/(12 - 0)$
$m = 107.5/12$
$m \approx 8.96$
Subbing the $m$ and $b$ values into the model, the final equation for a linear model would
be $y = 8.96x + 67.5$.
Fig 7: Graph of the linear model for the relationship between pace and heart rate
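The slope calculation and the resulting linear model can be sketched as follows. The two points $(0, 67.5)$ and $(12, 175)$ are taken from the text; the function names are illustrative assumptions.

```python
def slope_through(p1, p2):
    """Slope of the straight line through two points (x1, y1) and (x2, y2)."""
    (x1, y1), (x2, y2) = p1, p2
    return (y2 - y1) / (x2 - x1)

# Slope rounded to two decimal places, as in the rest of the paper.
m = round(slope_through((0, 67.5), (12, 175)), 2)

def linear_model(x, slope=m, intercept=67.5):
    """Predicted heart rate (bpm) at pace x (km/h) for y = slope * x + intercept."""
    return slope * x + intercept

print(m)                 # 8.96
print(linear_model(0))   # 67.5
```

Since $m$ is rounded, the model gives about 175.02 at $x = 12$ rather than exactly 175.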
I personally like this graph, as it sits in the middle of all the data points and
looks like it passes through five of them. As well, its r-squared value of 0.86, meaning
it explains 86% of the variance in heart rate, is the largest of all the r-squared values
discussed so far in this paper. Overall, it surprised me that this was a better fit than
my initial hypothesis of an exponential relationship.
To end off my investigation, I will be verifying the two models that I have created using
computer generated models as well as key points such as the y-intercept, and ultimately
decide on the best equation.
Exponential model
I plugged all the values from my table, including the resting value, into the calculator
and used the exponential regression function to find the curve of best fit for an
exponential model. The equation for the graph was $y = (77.22646)(1.06931)^{x}$ and had
the highest r-squared value so far of 0.92. Interestingly, there was no vertical
translation present, unlike the route I took while creating the first exponential model. As
well, it has a y-intercept of 77.2, indicating a resting heart rate of 77.2 bpm, which is
off by 9.7 bpm. Furthermore, its base being close to one suggests that the data is
more suited to a linear equation than to an exponential one like I had first
hypothesized. Overall, I personally think that the computer generated equation is the
best representation of an exponential model, as it has the highest r-squared value of the
exponential models.
Linear model
To verify my linear equation, I used the linear regression function on my calculator to
find the best fit for the data. This gave an r-squared value of 0.94, slightly higher than
that of the best fit for an exponential model. As well, its y-intercept was 72.7, which is
closer to the resting value of 67.5 than the exponential model's y-intercept by about 4.5
units. As well, because the equation is linear, the r-value can be used as the correlation
coefficient, describing both the strength of the relationship, through how close its
magnitude is to 1, and whether the relationship is positive or negative. With that in
mind, the r-value of 0.972 has a magnitude quite close to 1, revealing a strong
correlation between heart rate and pace. As well, the r-value being positive suggests that
the data is positively correlated, and thus we can conclude that heart rate and pace have
a strong positive linear correlation. Compared with my own linear model, its higher
r-squared value suggests that it is indeed a more accurate predictor of my heart rate.
However, for the resting heart rate, my own line of best fit is more accurate, which
diminishes some of the regression's significance.
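For reference, a least-squares linear regression and its correlation coefficient can be computed by hand as sketched below. This is an assumption about what the calculator's linear regression function does, and the points used are toy values on the line $y = 2x + 1$, not the heart rate data.

```python
import math

def linear_regression(xs, ys):
    """Return (slope, intercept, r) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient
    return slope, intercept, r

# Toy points lying exactly on y = 2x + 1 (illustration only).
toy_x = [0, 1, 2, 3]
toy_y = [1, 3, 5, 7]
slope, intercept, r = linear_regression(toy_x, toy_y)
print(slope, intercept, r)  # 2.0 1.0 1.0

# For a linear fit, r-squared is the square of r: 0.972 squared rounds to 0.94.
print(round(0.972 ** 2, 2))  # 0.94
```

The last line shows that the reported r-value of 0.972 is consistent with the reported r-squared value of 0.94.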
Evaluating the best equation, we can first eliminate the exponential equation, as the
r-squared value of its computer generated best fit (0.92) is lower than that of the
computer generated linear best fit (0.94), suggesting that the data follows a linear
relationship. That leaves two possibilities: my own personally created equation and the
computer generated one. Between the two, I chose my equation $y = 8.96x + 67.5$ as the
better predictor for heart rate: although its r-squared value was a bit smaller, it
predicts the resting heart rate more accurately, a value that I deem to be the most
important due to the accuracy in its acquisition.
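The y-intercept comparison behind this choice can be laid out as a short sketch. The intercepts are the values quoted in the text; the dictionary keys and variable names are illustrative assumptions.

```python
RESTING_HR = 67.5  # measured resting heart rate (bpm)

# y-intercepts quoted in the text for each candidate model
intercepts = {
    "exponential regression": 77.2,
    "linear regression": 72.7,
    "hand-fitted linear model": 67.5,
}

# Distance of each model's y-intercept from the measured resting heart rate.
errors = {name: round(abs(value - RESTING_HR), 1) for name, value in intercepts.items()}
best = min(errors, key=errors.get)

print(errors["exponential regression"])  # 9.7
print(errors["linear regression"])       # 5.2
print(best)                              # hand-fitted linear model
```

This only ranks the models by their resting value; the r-squared values discussed above rank them by overall fit.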
To reiterate, the purpose of this paper was to evaluate the relationship between heart
rate and the pace at which I run, and ultimately to determine a mathematical equation to
estimate heart rate at different running paces. With that being said, I can conclude that
the data best follows a positive linear correlation. Although this contradicted my initial
hypothesis of the relationship being exponential, I was still able to reach an equation
that could best model the relationship, thus answering the aim that I had set out to
achieve.
However, this paper did not come without regrets and limitations. For one, I am missing
a large amount of data in the range between 0.5 km/h and 6.5 km/h, as the time
constraint for this experiment restricted me from gathering more data. As well, the
confounding variable that is my body played a major role in the experiment's results, as
I was sick during the break in which I had decided to gather the data, resulting in a
month long gap between when some of the data was collected. Another issue was that the
time constraint had me gathering 9 data points in a day, resulting in the abnormally high
numbers seen in the recordings from 10 km/h to 13 km/h. Lastly, I would have liked to
include a quadratic equation to see if it matched my data, but I was unable to.
In conclusion, heart rate follows a direct linear relationship with the pace at which one
runs, and can be modelled best using the equation $y = 8.96x + 67.5$.
Bibliography:
- (Graphs and computer generated regressions were done on) Desmos | Beautiful
https://www.ncl.ac.uk/webtemplate/ask-assets/external/maths-resources/statistics/regression-and-correlation/simple-linear-regression.html
- (Heart rate increases as you run faster, and good estimate for workload on body)
Chertoff, J. (2024, February 8). What’s my ideal running heart rate? Healthline.
https://www.healthline.com/health/running-heart-rate#:~:text=During%20aerobic%20exercise%20like%20running,how%20hard%20you%27re%20working
Appendix:
Regression Calculations for Each Graph