Reliability, Validity and Norm References of Standing Broad Jump
All content following this page was uploaded by Zarizi Ab Rahman on 17 July 2021.
Zarizi Ab Rahman1; Azlan Ahmad Kamal2; Mohad Anizu Mohd Noor3; Soh Kim Geok4; Alnedral5
1 Faculty of Education, Universiti Teknologi MARA, Selangor Branch, Selangor, Malaysia.
2 Faculty of Education, Universiti Teknologi MARA, Selangor Branch, Selangor, Malaysia.
3 Faculty of Sport Science and Recreation, Universiti Teknologi MARA, Selangor, Malaysia.
3 mohadanizu@uitm.edu.my
4 Faculty of Educational Studies, Universiti Putra Malaysia, 43400 Serdang, Selangor, Malaysia.
5 Coaching Department, Faculty of Sport Science, Universitas Negeri Padang, Indonesia.
Abstract
The Standing Broad Jump (SBJ) is a field test used to assess leg power. This study aimed to determine
the reliability and validity of the SBJ and to develop norm references for adolescents. Evidence of
reliability, validity, and norms needs to be established in the particular population to support the
interpretation of the scores. The study involved 60 subjects and six raters for the reliability and
validity analyses, and 417 subjects for norm development. The intraclass correlation coefficient (ICC)
was used to determine inter-rater reliability, and test-retest with Pearson correlation was used to
determine the reliability of the instrument. An independent-samples t-test was used to determine
validity, and the standard deviation method was used to construct the norm references. The ICC was
high among male raters (.96) and female raters (.99). The consistency of the instrument was excellent
among male and female subjects (r = .96 and r = .90). For male subjects, the independent-samples
t-test was significant, t(58) = 3.395, p = 0.01, showing a difference between elite (M = 2.0871) and
non-elite athletes (M = 1.897). For female subjects, the difference was also significant, t(58) =
7.324, p < .001. These differences provide construct validity evidence for the SBJ in this population.
The study also produced new SBJ norms: superior (M = 2.54 and above, F = 2.06 and above), excellence
(M = 2.26-2.53, F = 1.74-2.05), good (M = 1.97-2.25, F = 1.40-1.73), average (M = 1.69-1.96, F =
1.06-1.39), and poor (M = 1.68 and below, F = 1.05 and below). The results suggest that the SBJ is
reliable and valid, with norm references, for assessing leg power. This study will also help to
enhance the quality of Physical Education teachers, locally and abroad. Quality teachers, in turn,
should produce pupils who are balanced intellectually, spiritually, emotionally, and physically. This
study also provides a new direction for other researchers to conduct further studies, especially in
terms of methodology and population.
Key-words: Objectivity, Reliability, Construct Validity, Norm Reference, Elite, Non-elite, Known
Group Method.
Muscular power is the ability to generate maximum force in the fastest possible time (Miller,
2014). The importance of muscular power is well established in human sports performance (Taipale,
Mikkola, Vesterinen, Nummela, & Hakkinen, 2013; Ronnestad, Kojedal, Losnegard, Kvamme, &
Raastad, 2012). Muscular power is also essential for health outcomes among youth and is associated
with bone health: it increases bone mass (Reid & Fielding, 2012; Ginty, Rennie, Mills, Stear, Jones, &
Prentice, 2005), which protects against osteoporosis and other bone diseases. Daily life activities
such as walking, climbing stairs, or standing from a seated position require muscular leg power.
Hence, muscular power is important not only for athletes but also for the general population. Good
muscular power is also required in occupations that involve a lot of walking, climbing stairs, and
similar activity.
The Standing Broad Jump (SBJ), or Standing Long Jump (SLJ), is a field test used to assess
explosive leg power, or the ability to apply force in a horizontal direction (Madruga-Parera, Bishop,
Fort-Vanmeerhaeghe, Beltran-Valls, Skok, & Romero-Rodriguez, 2020; Stauffer, Nagle, Goss, &
Robertson, 2010). Although lab tests such as the Wingate test cycle ergometer provide accurate
measurement, they lack feasibility. Therefore, field tests can be used to estimate muscular power.
The SBJ test uses a simple, time-efficient protocol and does not require complicated
equipment (Chung, Chow, & Chung, 2013; Burr, Jamnik, Baker, Macpherson, Gledhill, & McGuire,
2008). Furthermore, the test has been proposed by AAHPER Youth Fitness to assess leg power
(Morrow, Mood, Disch, & Kang, 2015). The Standing Broad Jump has been used in the Physical Fitness
Test as one of the instruments to assess muscular power in the selection process for candidates of
the Physical Education program in the Faculty of Education, Universiti Teknologi MARA (UiTM).
During the four years of the Physical Education program, students are not only involved in
classroom learning but also need to complete practical activities outside the classroom,
such as outdoor education, sports, games, athletics, and other curriculum activities. Upon graduation,
they will become Physical Education teachers in schools. Therefore, it is essential for the
candidates who take the Physical Education course to have a high level of physical fitness, including
lower body muscular power, to perform their daily tasks effectively as students of Physical
Education. Data collection on lower body muscular power is critical to ensure that the selection
process fulfils the Physical Education program's criteria.
Valid and reliable tests of motor competence such as SBJ are necessary to allow researchers
or practitioners to identify motor competencies, skill deficiencies and determine the effectiveness of
2. Literature Review
The Standing Broad Jump is still used in Malaysia's education system to assess students' leg power
in schools, colleges, and universities, especially in Physical Education and sport science classes.
The Ministry of Education in Malaysia also uses the SBJ as one of the instruments to assess leg power
in the selection process for the Physical Education program in the Institute of Teacher Education in
Malaysia (MOE, 2019). Wakai & Linthorne (2005) divided SBJ performance into three parts: (a) the
take-off distance, defined as the horizontal distance between the take-off line and the jumper's
centre of mass at the instant of take-off, (b) the flight distance, which is the horizontal distance
travelled by the centre of mass while airborne, and (c) the landing distance, defined as the distance
between the centre of mass and the heels of the feet at the instant of landing. Reliability is one of
the critical characteristics of a good test. A reliable measure is consistently unchanged over a short
period of time (Baumgartner, Jackson, Mahar & Rowe, 2007). For example, if an individual whose power
ability has not changed is measured twice on two consecutive days, the two scores will be identical or
consistent. Reliability is vital, and for a measurement to have validity, it must be reliable
(Baumgartner, Jackson, Mahar & Rowe, 2007). Scholtes, Terwee, & Poolman (2011) and Khoo & Li
(2016) stated that the types of reliability include consistency across different sets of items from
the same measurement instrument (internal consistency), across time (test-retest), across different
persons (variation between two or more raters), and on the same occasion (intra-rater). Baumgartner,
Jackson, Mahar, and Rowe (2007) and Miller (2014) also stated that lack of agreement among scorers,
lack of consistent performance by the individual tested, and failure of an instrument to measure
consistently are the sources of measurement error. The higher the error in any assessment
information, the less reliable it is, and the less likely it is to be useful. Hence, the lower the
measurement error, the higher the reliability, and the better the quality of the measurement
instrument. A few methods
The newer definition of validity is framed in terms of validity evidence (Baumgartner, Jackson,
Mahar & Rowe, 2007). Validity evidence refers to empirical evidence that supports the adequacy and
appropriateness of interpretations and actions based on test scores or other modes of assessment
(Messick, 1989b). Cronbach (1971) asserted that what needs to be valid is the meaning or
interpretation of the scores. Baumgartner, Jackson, Mahar, and Rowe (2007) and Cronbach (1971)
also stated that we do not validate the test itself but collect evidence to validate the
interpretations made from the test score. Validity evidence must be collected to support the
interpretation of the scores, either logically or statistically. Construct validity evidence for
physical activity measures can be investigated by several methods. Mahar & Rowe (2002) stated that
some parts of validity theory in psychology and educational measurement do not seem to fit the
different types of research in exercise science. Therefore, a strong method of construct validation
that fits a wide variety of constructs and contexts, especially those relating to the study of
physical activity, needs to be established.
Baumgartner, Jackson, Mahar & Rowe (2007) suggested that construct validity evidence for
exercise science and Physical Education can be determined from the judgment of experts in an area
related to the variables, from a comparison of a group's performance before and after instruction
or training, and from statistical procedures, namely factor analysis, to identify constructs and the
tests that yield scores leading to valid interpretation. Miller (2013) also reported comparing the
mean difference between elite and non-elite performers as one of the procedures to determine
construct validity evidence. That method is also known as known-difference group validity, referring
to a test that discriminates between two groups known to differ on the variable of interest
(Davidson, 2014; Mahar & Rowe, 2002). This type of evidence is similar to the “known groups” method
originated by Cronbach and Meehl (1955). Mahar and Rowe (2002) asserted that construct validity
evidence exists if two or more populations differ on a construct. This should be reflected in
significant mean differences on a measure of that construct. Group differences are determined using a
parametric or nonparametric statistic that allows group comparisons, such as a t‐test or analysis of
variance (ANOVA) with post‐hoc analysis (McConnell, Kolopack, & Davis, 2001).
Previous researchers in exercise science and Physical Education have used several methods
to determine the reliability and construct validity evidence of the SBJ. Reid, Dolan, & De Beliso
(2017) used interclass and intraclass reliability coefficients (ICC) to determine the test battery's
Reliability and validity evidence for the SBJ was determined by the ICC method, test-retest, and
the comparative design suggested by Miller (2014) and Baumgartner, Jackson, Mahar, and Rowe (2007).
A total of sixty subjects (30 males, 30 females) and six assistant raters (three males and three
females) from the Physical Education Program at the Faculty of Education were involved in this
study. The ICC method was used to obtain inter-rater reliability, that is, the agreement between
raters. Each of the raters was given intensive training on test administration to enhance the
objectivity of the raters. The raters were provided with the same protocol before they administered
the test. Each rater measured the SBJ of all the subjects separately and independently on the same
day. A time interval was given to the raters after each test was completed.
Test-retest with a 24-hour time interval and Pearson correlation was used to determine the
instrument's reliability. The short time interval was chosen because no fatigue effects have been
found for the SBJ (Artero, Girela-Rejon, Mora, Sjostrom, & Ruiz, 2010) and because a short interval
enhances the reliability of the instrument (Bishop, 2008).
Evidence of construct validity, on the other hand, was obtained by comparing mean scores
between 30 elite athletes and 30 non-elite male and female athletes. The elite players were those
representing the campus in an inter-campus tournament, while the non-elite players represented the
faculty in an inter-faculty handball tournament. Handball players were selected because handball
requires explosive leg power for throwing the ball with power and speed, demands which are met
through jumping and physical contact with the opponent (Akilan, & Chittibabu, 2014). Elite
athletes are known to have greater ability than non-elite athletes. Hence, the instrument has
construct validity evidence whenever the elite athletes' mean score is superior to the non-elite
athletes' (Miller, 2014; Baumgartner, Jackson, Mahar, & Rowe, 2007; Thomas & Nelson, 2001). An
independent t-test was conducted to determine the significant difference between the two groups.
Norm-referenced grading with the standard deviation method was used to establish the norms in this
study. The total number of selected subjects was 417 (Male = 207, Female = 210). According to Morrow,
Jackson, Disch, and Mood (2005), the sample size needed for norm development should be at least 200
people for each variable. All subjects selected for norm development were male and female adolescents
aged 19-22 who were candidates undergoing the fitness test for admission to the Physical
Education program in the Faculty of Education, UiTM, for the 2019 and 2020 intakes. Permission from
the parents or guardians and a declaration of health status were received before the test
administration. Data in this study were obtained with the SBJ test. The test was performed on a hard
surface, and participants
Data were analysed using SPSS for Windows ver. 26.0. Descriptive statistical methods were
used to obtain the mean and standard deviation. The ICC was used for inter-rater reliability
(objectivity), test-retest with Pearson correlation for reliability evidence, an independent t-test
to establish construct validity evidence, and the standard deviation method for norm development.
Table 1 shows a high degree of reliability among the male raters for SBJ measurements. The
average measure ICC was .968 with a 95% confidence interval from .941 to .984 (F(29,58) = 30.639,
p < .001). A high degree of reliability was also found among the female raters for
SBJ measurements. The average measure ICC was .996 with a 95% confidence interval from .993 to
.998 (F(29,58) = 269.457, p < .001).
[Table 1: 95% Confidence Interval; F Test with True Value 0]
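As a rough illustration of how an average-measures ICC of this kind can be computed, the sketch below runs a two-way ANOVA decomposition on a subjects-by-raters table. It assumes the consistency definition, under which ICC = (MS_subjects - MS_error) / MS_subjects = 1 - 1/F; the study does not state which ICC model was selected in SPSS, and the function names and demonstration scores are hypothetical.

```python
def icc_average_consistency(scores):
    """Average-measures, consistency-type ICC from a subjects x raters table.

    scores: list of rows, one row per subject, one column per rater.
    Returns (icc, f_ratio). Assumes the raters do not agree perfectly,
    since a zero error mean square would leave F undefined.
    """
    n = len(scores)        # number of subjects
    k = len(scores[0])     # number of raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares: subjects, raters, and residual error.
    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ss_raters = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_subjects - ss_raters

    ms_subjects = ss_subjects / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    f_ratio = ms_subjects / ms_error
    icc = (ms_subjects - ms_error) / ms_subjects
    return icc, f_ratio


def icc_from_f(f_ratio):
    """Recover the same coefficient from a reported F ratio (consistency type assumed)."""
    return 1.0 - 1.0 / f_ratio


# Hypothetical jump scores for four subjects, each scored by three raters.
demo = [
    [1.80, 1.82, 1.81],
    [2.10, 2.12, 2.10],
    [1.95, 1.96, 1.97],
    [2.30, 2.31, 2.29],
]
icc, f_ratio = icc_average_consistency(demo)
print(round(icc, 3), round(f_ratio, 1))

# Coefficients implied by the F ratios reported in the text.
print(round(icc_from_f(30.639), 3), round(icc_from_f(269.457), 3))
```

Under this assumption, the reported F ratios of 30.639 and 269.457 imply coefficients of about .967 and .996, close to the reported values of .968 and .996. The degrees of freedom (29, 58) are also consistent with 30 subjects scored by three raters.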
Table 2 shows the reliability evidence for the SBJ for male and female subjects. The findings
indicate that the SBJ instrument has high test-retest reliability in both male and female subjects
(r = .96 and r = .90).
                                  Male                 Female
                              Test     Retest      Test     Retest
Test    Pearson Correlation   1        .965**      1        .900**
        Sig. (2-tailed)                .000                 .000
        N                     30       30          30       30
Retest  Pearson Correlation   .965**   1           .900**   1
        Sig. (2-tailed)       .000                 .000
        N                     30       30          30       30
**. Correlation is significant at the 0.01 level (2-tailed).
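The test-retest coefficient reported above is a Pearson correlation between the first and second administration. A minimal sketch of that calculation follows; the function name and the six pairs of scores are hypothetical and are not the study's data.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical test and retest scores for six subjects, measured 24 hours apart.
test_scores = [1.85, 2.10, 1.95, 2.30, 2.05, 1.70]
retest_scores = [1.88, 2.07, 1.99, 2.28, 2.10, 1.72]
r = pearson_r(test_scores, retest_scores)
print(round(r, 3))
```

A coefficient near 1 means that subjects keep nearly the same rank order and spacing across the two sessions, which is the sense in which the instrument is described as consistent.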
Table 3 shows the descriptive results of the SBJ test for elite and non-elite athletes. Among
males, the mean for elite athletes (M = 2.0871, SD = 0.236) was higher than for non-elite athletes
(M = 1.897, SD = 0.192). Among females, the mean for elite athletes (M = 1.597, SD = 0.147) was
also higher than for non-elite athletes (M = 1.337, SD = 0.125).
An independent-samples t-test was used to determine construct validity evidence based on the
comparison between elite and non-elite athletes (Tables 4 and 5). For male subjects, the
test was significant, t(58) = 3.395, p = 0.01, showing a significant difference between elite
(M = 2.0871, SD = .2365) and non-elite athletes (M = 1.897, SD = .1925). The difference was also
significant among female subjects, t(58) = 7.324, p < .001, with elite athletes (M = 1.5971,
SD = .1473) scoring higher than non-elite athletes (M = 1.337, SD = .1252).
These differences show that the SBJ has construct validity evidence in this population.
Table 5 - Independent Sample T Test for Construct Validity Evidence for Female Subjects
SBJ (equal variances assumed)
  Levene's Test for Equality of Variances
    F                                        0.030
    Sig.                                     0.862
  t-test for Equality of Means
    t                                        7.324
    df                                       58
    Sig. (2-tailed)                          0.000
    Mean Difference                          0.25951
    Std. Error Difference                    0.03543
    95% Confidence Interval of the Difference
      Lower                                  0.16514
      Upper                                  0.35388
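To make the known-groups comparison concrete, the sketch below recomputes an equal-variance independent-samples t statistic from the published group means and standard deviations. The group size of 30 per group is an assumption inferred from df = 58, and the function name is hypothetical.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t statistic (equal variances assumed) from summary data.

    Returns (t, df, standard_error_of_the_difference).
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se_diff = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / se_diff, df, se_diff


# Elite vs non-elite, using the rounded means and SDs reported in the text.
# n = 30 per group is assumed from df = 58; it is not stated per group in the text.
t_male, df_male, se_male = pooled_t(2.0871, 0.2365, 30, 1.897, 0.1925, 30)
t_female, df_female, se_female = pooled_t(1.5971, 0.1473, 30, 1.337, 0.1252, 30)
print(round(t_male, 2), df_male)
print(round(t_female, 2), df_female)
```

Because the published means and standard deviations are rounded, the recomputed statistics land near the reported values of 3.395 and 7.324 without matching them exactly.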
Table 6 shows the descriptive statistical analysis of the SBJ test. The mean score and standard
deviation for all male subjects were M = 2.11, SD = 0.287, and for all female subjects
M = 1.56, SD = 0.339. The standard deviation method assumes that the data are normally
distributed. Hence, skewness and kurtosis were used to assess normality. Hair, Black, Babin,
& Anderson (2010) and Byrne (2010) asserted that data are considered normal if skewness is between
‐2 and +2. Kurtosis values between -2 and +2 are also considered acceptable evidence of a normal
distribution (George & Mallery, 2010). Table 10 showed that the distribution of the data can be
considered normal. Therefore, the standard deviation method can be used for norm development in this
study.
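The ±2 rule for skewness and kurtosis can be sketched as follows. This is a minimal illustration using uncorrected moment formulas; SPSS applies small-sample corrections, so its output differs slightly. The function names and the sample values are hypothetical.

```python
def skewness_and_kurtosis(values):
    """Moment-based skewness and excess kurtosis (uncorrected population formulas)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n   # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n   # fourth central moment
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0            # 0 for a normal distribution
    return skew, excess_kurtosis


def within_normal_range(values, limit=2.0):
    """True when both skewness and excess kurtosis fall inside [-limit, +limit]."""
    skew, kurt = skewness_and_kurtosis(values)
    return abs(skew) <= limit and abs(kurt) <= limit


# Hypothetical jump scores, roughly symmetric around their mean.
sample = [1.62, 1.75, 1.88, 1.95, 2.02, 2.10, 2.11, 2.18, 2.25, 2.33, 2.47, 2.60]
print(within_normal_range(sample))

# A single extreme value is enough to push skewness outside the accepted range.
print(within_normal_range([1.0] * 20 + [100.0]))
```

The rule is a screening heuristic: it flags strongly asymmetric or heavy-tailed score distributions before the standard deviation method is applied, but it is not a formal test of normality.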
Table 7 shows the arrangement for the standard deviation method suggested by Miller
(2014) for five grades. Based on the findings, Table 7 presents five categories (superior,
excellence, good, average, and poor) for male and female adolescents aged 17 to 22.
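The standard deviation method places grade boundaries at fixed distances from the mean, measured in standard deviation units. The sketch below assumes boundaries at the mean ± 0.5 SD and ± 1.5 SD. These multipliers are not stated in the text; they are inferred from the reported bands, and the function names are hypothetical.

```python
def sd_method_cutoffs(mean, sd):
    """Lower bounds of the four upper grades; anything lower is graded 'poor'.

    The 0.5 SD and 1.5 SD multipliers are an assumption inferred from the
    reported bands, not values quoted from the source.
    """
    return {
        "superior": mean + 1.5 * sd,
        "excellence": mean + 0.5 * sd,
        "good": mean - 0.5 * sd,
        "average": mean - 1.5 * sd,
    }


def grade(score, mean, sd):
    """Return the grade label for one score, checking the highest band first."""
    for label, lower_bound in sd_method_cutoffs(mean, sd).items():
        if score >= lower_bound:
            return label
    return "poor"


# Means and standard deviations reported for the norm sample.
male = sd_method_cutoffs(2.11, 0.287)
female = sd_method_cutoffs(1.56, 0.339)
for label in male:
    print(label, round(male[label], 4), round(female[label], 4))

print(grade(2.00, 2.11, 0.287))
```

With the reported male statistics (M = 2.11, SD = 0.287), these multipliers give boundaries of 2.5405, 2.2535, 1.9665, and 1.6795, close to the reported male band limits. The female boundaries (2.0685, 1.7295, 1.3905, 1.0515) agree with the reported limits to within about 0.01, which suggests rounding differences or unrounded inputs in the original computation.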
These findings show that the ICC for male raters (.96) and female raters (.99) was excellent.
Portney & Watkins (2000) suggested that ICC values less than 0.5 indicate poor reliability, values
between 0.5 and 0.75 indicate moderate reliability, values between 0.75 and 0.9 indicate good
reliability, and values greater than 0.90 indicate excellent reliability. Baumgartner, Jackson,
Mahar, & Rowe (2007) stated that the inter-scorer objectivity coefficient should be at least .80.
Hence, this study showed that different raters could administer the SBJ protocol without producing
different scores. The instrument's consistency was determined by test-retest, and the Pearson
correlation was very high for both male subjects (r = .96) and female subjects (r = .90). Miller
(2014) suggested interpreting the correlation coefficient for reliability as follows: ±.80 to 1.00
(very high), ±.60 to .79 (high), ±.40 to .59 (moderate), ±.20 to .39 (low), and below .20 (extremely
low). The instrument's high reliability can be attributed to the short time interval (24 hours)
between the test and retest. Bishop (2008) suggested that the shorter the time interval, the higher
the reliability of the instrument. Longer time intervals may allow physical changes that affect the
measurement process. The independent-samples t-test showed significant differences in leg power
between elite and non-elite players. This finding indicates that the SBJ test can discriminate
between subjects' abilities in terms of leg power. One characteristic of a good test is that it
discriminates between students' abilities (Jacob & Rothstein, 2016). The validity of an instrument
determines whether it measures the construct and yields valid interpretations. Known-difference
group evidence can establish construct validity if two or more populations differ on a construct
(Mahar & Rowe, 2002). The analysis of the norm development for the SBJ presents Standing Broad Jump
scores by gender. The grades provided in this study allow for comparisons of leg power with other
populations in the same categories. For example, the average standing broad jump scores obtained in
this sample differ little from those reported in Saint-Maurice, Laurson, Kaj, & Csanyi (2015) and
Sharma (2014). The SBJ norms also allow the proper selection of candidates for Physical Education
programs because the objectivity, reliability, and construct validity of the test were determined in
this population. The grades should therefore provide meaningful interpretation in this population.
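The interpretation bands cited in the discussion above can be written as simple lookup functions. This is an illustrative sketch only: the text gives the band limits but not how values falling exactly on a boundary are assigned, so the boundary handling below is an assumption, and the function names are hypothetical.

```python
def icc_category(icc):
    """ICC bands attributed in the text to Portney & Watkins (2000).

    Assumption: a value exactly on a boundary is placed in the higher band,
    except 0.90, which counts as 'good' because 'excellent' is 'greater than 0.90'.
    """
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"


def correlation_category(r):
    """Reliability-coefficient bands attributed in the text to Miller (2014).

    Uses the magnitude of r, since the bands are given with a +/- sign.
    """
    size = abs(r)
    if size >= 0.80:
        return "very high"
    if size >= 0.60:
        return "high"
    if size >= 0.40:
        return "moderate"
    if size >= 0.20:
        return "low"
    return "extremely low"


# Coefficients reported in the study.
print(icc_category(0.968), icc_category(0.996))
print(correlation_category(0.965), correlation_category(0.900))
```

Applied to the reported coefficients, both ICC values fall in the "excellent" band and both test-retest correlations fall in the "very high" band, matching the interpretation given in the discussion.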
The main conclusion that can be drawn is that an SBJ test with established reliability,
construct validity, and up-to-date norms will improve the evaluation and interpretation of results
for all candidates being selected as Physical Education teacher candidates in Malaysia
as well as abroad. A proper selection process will ensure that the candidates are healthy and fit
to fulfil the requirements of the Physical Education program throughout the study period. Only
quality teacher candidates can equip themselves with all the skills required throughout the study.
Furthermore, these findings will also help Physical Education teachers to serve better in school
and to contribute significantly to all pupils in achieving the aims of education worldwide. Moreover,
quality teachers should prepare pupils who are balanced intellectually, spiritually, emotionally, and
physically, especially given the volatility, uncertainty, complexity, and ambiguity (VUCA) of a
rapidly changing and hyper-connected world. Every teacher needs to be equipped with all the necessary
skills to face the VUCA world and to provide high-quality teaching and learning in school. This study
also provides a new direction for other researchers to conduct future research, especially with
different methodologies and populations.
References
Akilan, N., & Chittibabu, B. (2014). Comparison of leg explosive power between volleyball and
handball players. Paripex Indian Journal Research, 3, 55-56.
Almuzaini, K. S., & Fleck, S. J. (2008). Modification of the standing long jump test enhances ability
to predict anaerobic performance. The Journal of Strength & Conditioning Research, 22(4), 1265-1272.
Ayan-Perez, C., Cancela-Carral, J. M., Lago-Ballesteros, J., & Martínez-Lemos, I. (2017). Reliability
of sargent jump test in 4- to 5-year-old children. Perceptual and Motor Skills, 124(1), 39-57.
Baumgartner, T. A., & Jackson, A. S. (1998). Measurement for Evaluation in Physical Education and
Exercise Science (Ed. 6). WCB/McGraw-Hill.
Bishop, P.A. (2008). Measurement and Evaluation in Physical Activity Applications. Arizona:
Holcomb Hathaway.
Bulten, R., King-Dowling, S., & Cairney, J. (2019). Assessing the validity of standing long jump to
predict muscle power in children with and without motor delays. Paediatric Exercise Science, 31(4),
432-437.
Burr, J. F., Jamnik, R. K., Baker, J., Macpherson, A., Gledhill, N., & McGuire, E. J. (2008).
Relationship of physical fitness test results and hockey playing potential in elite-level ice hockey
players. The Journal of Strength & Conditioning Research, 22(5), 1535-1543.
Byrne, B. M. (2010). Structural Equation Modelling with AMOS: Basic Concepts, Applications, And
Programming (Multivariate Applications Series). New York: Taylor & Francis Group.