Cattle Weight Estimation Using Linear Regression A
Decree of the Director General of Higher Education, Research, and Technology, No. 158/E/KPT/2021
Validity period from Volume 5 Number 2 of 2021 to Volume 10 Number 1 of 2026
JURNAL RESTI
(Rekayasa Sistem dan Teknologi Informasi)
Vol. 8 No. 1 (2024) 72 - 79 e-ISSN: 2580-0760
Abstract
The global cattle farming industry provides benefits as a food source, a livelihood, an economic contribution, a means of land
and environmental restoration, and an energy source. Predicting cow weight helps farmers monitor animal development.
For traders, knowing the animal's weight makes it easier to calculate the price of the meat they buy. The
authors propose estimating cattle weight using linear regression and random forest regression. Linear regression can interpret
the linear relationship between dependent and independent variables, and random forest regression can generalize the data
well. The dataset used in this study consists of ten variables: live body weight, withers height, sacrum height, chest depth,
chest width, maclocks width, hip joint width, oblique body length, oblique back length, and chest circumference. The aim is to
find the model that produces the smallest MAE value. The results show that the linear regression algorithm estimates
cattle weight with the best performance. This model produces a mean absolute error (MAE) of 0.35 kg, a mean
absolute percentage error (MAPE) of 0.07%, a root mean square error (RMSE) of 0.5 kg, and an R² of 0.99. Each variable
correlates strongly with live weight, and the study contributes to research in computer vision and machine learning.
Keywords: cattle; machine learning; linear regression; random forest regressor; prediction model
How to Cite: Anjar Setiawan, Ema Utami, and Dhani Ariatmanto, “Cattle Weight Estimation Using Linear Regression and
Random Forest Regressor”, J. RESTI (Rekayasa Sist. Teknol. Inf.), vol. 8, no. 1, pp. 72 - 79, Feb. 2024.
DOI: https://doi.org/10.29207/resti.v8i1.5494
importance of these animals makes it easier to calculate the price of the meat they buy. Several studies have applied machine learning (ML) and deep learning (DL) to predict animal weights as a form of technological innovation. This research estimates cow weights using two machine-learning models.
The two models examined in this research are linear regression and random forest regressor (RFR). The advantage of RFR is that it can generalize the data well [8]. Random forests can produce accurate models for classification and regression [9]. The random forest technique is robust to data complexity. It is based on ensemble learning, using many randomly generated decision trees to produce accurate predictions. The strength of random forest lies in its capacity to reduce overfitting, increase model stability, and offer practical solutions in various classification and prediction scenarios [9]. Many fields that rely on advanced data processing, including bioinformatics, finance, and health, have used this approach effectively [3]. Random forests can offer new perspectives in the investigation of predictive models, support the reliability of research, and offer reliable answers to problems posed by the complexity of modern data [8]. As a result, this research has the potential to contribute to advancing data analysis techniques and expanding knowledge regarding the capabilities and constraints of random forests.
Meanwhile, linear regression can interpret the linear relationship between the dependent and independent variables [8]. Linear regression is capable of prediction, relationship analysis, variable selection, model evaluation, and causal inference [10]. The influence of the independent variable on the dependent variable can be evaluated through the use and understanding of linear regression [11]. It can also be investigated whether linear regression measures the linear relationship between independent and dependent variables [12]. This approach aims to measure the extent to which changes in one variable are associated with changes in other variables. The p-value of the regression coefficient is an essential guide in determining the relationship between variables, thereby supporting the validity of research findings [13]. By including control variables in the model, linear regression can be used as a reliable and comprehensive analysis method that increases the validity of research findings by controlling for other variables [13]. Machine learning in this research provides benefits in increasing prediction accuracy, prediction model adaptability, and time and resource efficiency [14]. In the process, machine learning can produce alternative models for predicting cow weight that are more accurate and efficient [14].
One previous study predicted cow weight with deep learning using a convolutional neural network (CNN) algorithm. Its best model produced a mean absolute error (MAE) of 23.19 kg, and the algorithms and data models used can still be improved [15]. The researchers in the present study found that the linear regression algorithm produced the best MAE of 0.35 kg, using data from 100 cows, feature selection, and 50-fold cross-validation. This shows that the linear regression algorithm can outperform the other models tested.
This research contributes a method for estimating cattle weight from nine cattle body dimensions using computer vision techniques and regression algorithms. The method can help monitor cows' weight precisely and effectively. The findings of this study suggest that it may be helpful in real-world situations, particularly in livestock management and rearing. Additionally, this research highlights how machine learning and computer vision are applied in agriculture and animal husbandry. This study also shows how linear regression can be used for predictive modeling and for reliably estimating livestock weights. This study also emphasizes the importance of live weight as a predictor variable in assessing livestock dimensions to increase prediction accuracy.
Prediction of cow weight based on measurements from images of the cow area using the random forest algorithm provides the best performance, with an MAE of 13.44 kg and a correlation coefficient of 0.75 [8]. Another study predicted sheep weight from images using machine-learning regression algorithms. The experimental results show that the RFR method produces better error values, with an MAE of 3.099 kg, than the other machine-learning regression algorithms [8], [16]. Using a training dataset (70%), a test dataset (30%), and a validation dataset (20% of the training dataset), a further study used the Stacking Regressor algorithm to produce the best performance in predicting pig weight, with an MAE of 4.331 and a MAPE of 4.296 on the test dataset. The researchers used a dataset of 340 pigs, and the proposed model could predict pig weights in the 86 to 113 kg range.
In another experiment, the artificial neural network (ANN) method achieved an R² of 0.7 and an RMSE of 42 kg. However, the evaluation results using 3D images of live animals and the ANN algorithm show that there is still potential to improve the R² and RMSE values. These findings present a challenge to improve prediction accuracy and model optimization using 3D images and ANN algorithms [17].
In recent research on predicting the Economic Index (EI) and the Calving Interval (CI) in cattle, it was found that the best model for predicting EI is the Neural Network Machine Learning Algorithm (NN MLA)
with a Mean Absolute Error (MAE) of 20.72 and Root Mean Square Error (RMSE) of 29.35. Meanwhile, the best model for CI prediction uses the Gradient Boosting Machine Learning Algorithm (GB MLA), with an MAE of 0.79 and an RMSE of 1.27 [18]. However, the results of that study highlight that the data used did not cover a sufficient number of cattle. By expanding and varying the training dataset, increased prediction accuracy can be achieved. These findings show the potential for further development in optimizing predictions to increase the efficiency of economic indices and regulate calving intervals in cattle.
Another study uses a deep-learning algorithm to estimate a pig's body weight from images of the pig's back taken from above. The algorithm combines R-CNN object detection with a regression neural network, producing weight estimates with a Mean Absolute Error (MAE) of 0.644 kg and a relative error of 0.374%. This algorithm can identify and localize the pig's position and accurately predict the pig's body weight even if the image's overlapping area is less than 30%. However, variations in pig body posture can affect the accuracy of body weight estimation. With the addition of training data, overall accuracy can be improved, opening up opportunities for implementing a more efficient non-contact pig weighing system [19].
Based on the background and literature explained above, the training data and models of previous research could still be improved. Therefore, linear regression and random forest regressor (RFR) methods were used to predict cow weight in this research. The resulting model produces a mean absolute error (MAE) of 0.35 kg, a mean absolute percentage error (MAPE) of 0.07%, a root mean square error (RMSE) of 0.5 kg, and an R² of 0.99. This research aims to obtain smaller MAE values and to contribute to the study of computer vision and machine learning. This paper consists of four sections: after this introduction, the research method is explained, followed by a discussion of the research results. The final section closes with conclusions.
2. Research Methods
2.1 Research Workflow
The research flow stages are shown in Figure 1, divided into four stages: data collection, preprocessing, machine learning scenarios, and evaluation. In the first stage, the full-cow-promer dataset obtained from Kaggle is used. The second stage is preprocessing, where the processes carried out are data reduction, cleaning, labeling, normalization, feature selection, and 50-fold cross-validation. The third stage is a machine learning scenario, where a design is created to determine the best accuracy using data balancing with a linear regression algorithm and a random forest regressor. Finally, the fourth stage is evaluation and analysis, which includes evaluating the results, conducting research, and drawing conclusions based on the experiments.
Figure 1. Research Flow Diagram
2.2 Dataset Collection
The Full Cow Promer (FCP) dataset, obtained from Kaggle, is a cattle dataset from private farms in the Nizhny Novgorod region of Russia and is used in this research [19]. The dataset consists of 100 records with ten variables: live weight, withers height, sacrum height, chest depth, chest width, width in maclocks, hip joint width, oblique body length, oblique hind length, and chest girth. The dataset contains cow body measurements taken manually using a measuring tape and recorded in centimeters [11].
2.3 Data Preprocessing
Data reduction is carried out to reduce the complexity and size of the data collected. The reduction aims to eliminate irrelevant cow data. By reducing the amount of cattle data analyzed, researchers can focus on the most critical and relevant data [8].
Data cleaning is carried out to ensure data quality. The aim is to remove invalid, incomplete, and irrelevant data, which in turn supports accurate and reliable research results [3].
Data labeling is carried out to provide classifications for each cow's data. The goal is to identify and differentiate data based on specific attributes. Labeling data in this research is essential for more focused and relevant grouping, modeling, and statistical analysis [20].
Data normalization is carried out to convert data into a standard form, making it easier to process and analyze cattle data. Data normalization aims to eliminate scale
differences to ensure that each attribute has a balanced contribution to obtaining more accurate research results [16].
Feature selection aims to identify the most relevant and significant subset of features in the cattle dataset. It reduces data dimensions, increases computational efficiency, eliminates redundant components, and improves the performance of prediction models [3].
K-fold cross-validation is carried out to test model performance more accurately and reliably by dividing the data into k subsets of the same size. The purpose of k-fold cross-validation is also to help evaluate the stability and generalization of the model on never-before-seen cow data [8].
2.4 Machine Learning Scenario
This research predicts cow weight using two machine-learning models: random forest regressor (RFR) and linear regression. Splitting the dataset using 50-fold cross-validation balances computational efficiency and reliable performance estimation. Applying data normalization can avoid bias in the model and ensure that each feature has a balanced contribution to the learning process [20]. This research uses random forest because this model can generalize data with good performance [8]. Random forests can produce accurate models for classification and regression [9]. Random forests can reduce model overfitting, keep computing time efficient, balance data weights, and select relevant features. Estimation is essential in selecting the most relevant parts to improve the performance of a more accurate cow weight prediction model [3].
RFR can process large amounts of data with efficient computing time [21]. Random forests are robust to outliers and noise, handle high-dimensional data effectively, capture non-linear relationships, and provide estimates of feature importance [9]. RFR can provide better cow weight prediction results with high accuracy [3].
Random forest can handle missing values well [21]. Random forest training can be easily parallelized, allowing efficient and accelerated use of computing resources [9]. The random forest output is obtained from the majority result of the individual decision trees [9]. For an RF consisting of Z trees, where Y is the indicator function and a_n is the n-th tree of the RF, the prediction for an input x over candidate outputs c is defined as Equation 1.
$$\hat{y}(x) = \arg\max_{c} \sum_{n=1}^{Z} Y\big(a_n(x) = c\big) \qquad (1)$$
The advantage of linear regression compared to other methods is that it can interpret the linear relationship between dependent and independent variables [4]. It is capable of prediction, relationship analysis, variable selection, model evaluation, and causal inference [6]. In feature selection, linear regression can be used by analyzing the significance of coefficients to test the assumptions of linearity, independence, homoscedasticity, and normality of residuals [8]. Linear regression tends to be stable and can provide exemplary performance in cases where the relationship between the independent and dependent variables is linear [8]. Linear regression provides a good overview of the linear relationship between variables. If the relationship is linear, linear regression can provide accurate estimates [14].
Compared with more complex machine learning models, linear regression provides an interpretable benchmark against which to compare the performance of other models. Machine learning in this research offers benefits in increasing prediction accuracy, adaptability of prediction models, and time and resource efficiency [14]. In the process, machine learning can produce alternative models for predicting cow weights that are more accurate and efficient. The multiple linear regression algorithm finds the best prediction line [22]. Its components are A, the dependent variable or predicted value; b, a constant; Z, the independent variables; and c, the regression coefficients. From this equation, a line can be drawn to predict the dependent variable based on the independent variables, namely Equation 2.
$$A = b + c_1 Z_1 + c_2 Z_2 + \dots + c_n Z_n \qquad (2)$$
2.5 Model Evaluation
In this research, model performance evaluation was carried out to determine the best of the two models that have been built, namely linear regression and random forest regression. Model performance is measured by the mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE), and R² values, which measure the accuracy of the predictions made by the two models. The models created in the previous stage are evaluated with these metrics, defined as Equations 3, 4, 5, and 6, where y_i is the actual weight, ŷ_i is the predicted weight, ȳ is the mean of the actual weights, and n is the number of samples.
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \qquad (3)$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2} \qquad (4)$$
$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \qquad (5)$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2} \qquad (6)$$
3. Results and Discussions
The data describe cattle belonging to private farms in the Nizhny Novgorod region, Russia. All selected cows were measured manually in the pen [11]. The nine body measurements shown in Figure 2 were taken manually by an expert using a measuring tape and recorded in centimeters.
Figure 2 shows each cow's dimensions: (1) withers height, (2) hip height, (3) chest depth, (4) heart girth, (5) ilium width, (6) hip joint width, (7) oblique body length, (8) hip length, (9) chest width.
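The four evaluation metrics used in this study (MAE, RMSE, MAPE, and R²) can be illustrated with a short plain-Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names and the three sample weights are hypothetical and are not taken from the dataset.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error, in the unit of the target (kg here)."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error, in the unit of the target (kg here)."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error, returned as a percentage."""
    return 100.0 * sum(abs((y - p) / y) for y, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical live weights (kg) and model predictions, for illustration only.
actual = [400.0, 500.0, 600.0]
predicted = [402.0, 499.0, 597.0]

print(f"MAE  = {mae(actual, predicted):.2f} kg")
print(f"RMSE = {rmse(actual, predicted):.2f} kg")
print(f"MAPE = {mape(actual, predicted):.2f} %")
print(f"R2   = {r_squared(actual, predicted):.4f}")
```

Because MAE and RMSE are expressed in kilograms while MAPE and R² are scale-free, reporting all four together shows both the absolute and the relative size of the weight estimation error.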
Markers were made on the cow's body using white paint during manual measurements. The automatic system then derives the cow's body parameters from anatomical features. Bone protrusions and depressions on the surface of the cow's body can be measured as anatomical markers [23].
The researchers' main objective is to find a suitable model for cow weight prediction, namely the model that produces the smallest mean absolute error (MAE) among the linear regression and random forest regressor methods.
Table 1 shows the abbreviations and definitions for the estimated body size of cattle.
Table 1. Abbreviations and definitions of cow body size
Abbreviation | Definition
Withers height (WH) | Vertical distance from the highest point on the withers to the highest point on the bottom of the toe
Hip height (HH) | The vertical distance from the highest point, the hip bone, to the lowest point, the ground at the level of the hind legs
Chest depth (CD) | The vertical distance from the back to the base of the chest in the farthest-reaching section of the chest
Heart girth (HG) | Body circumference at a point just posterior to the front leg and shoulder and perpendicular to the body axis
Ilium width (IW) | Distance between the outermost points of the ilium bone perpendicular to the base
Hip joint width (HJW) | Comparison of two hip joint points that aren't moving forward quickly
Oblique body length (OBL) | From the internal posterior ischium to the anterior humerus' extremity
Hip length (HL) | From the posterior extreme of the internal ischium to the outermost point of the ilium
Chest width (CW) | Posterior shoulder perpendicular to the back distance between corner points
Figure 3 shows performance metrics results with nine variables and 50-fold cross-validation using the linear regression algorithm.
Figure 3. Performance Metrics Linear Regression Algorithm with Nine Variables and 50-fold Cross Validation
Figure 3 shows that the linear regression algorithm achieves an MAE of 0.52 kg, a MAPE of 0.12%, an RMSE of 0.73 kg, and an R-square of 0.99.
Figure 4 shows performance metrics results with eight variables and 50-fold cross-validation using the linear regression algorithm.
Figure 4. Performance Metrics Linear Regression Algorithm with Eight Variables and 50-fold Cross Validation
Figure 4 shows that the linear regression algorithm achieves an MAE of 0.53 kg, a MAPE of 0.12%, an RMSE of 0.73 kg, and an R-square of 0.99.
Figure 5 shows performance metrics results with seven variables and 50-fold cross-validation using the linear regression algorithm.
Figure 5 shows that the linear regression algorithm is known to have an accuracy level of MAE values of 0.79
Figure 11. Performance Metrics Random Forest Regressor Algorithm with Six Variables and 50-fold Cross Validation
Figure 11 shows that the random forest regressor algorithm achieves an MAE of 2.2 kg, a MAPE of 0.51%, an RMSE of 2.2 kg, and an R-square of 0.94.
Figure 12 shows performance metrics results with five variables and 50-fold cross-validation using the random forest regressor algorithm.
Figure 12. Performance Metrics Random Forest Regressor Algorithm with Five Variables and 50-fold Cross Validation
Figure 12 shows that the random forest regressor algorithm achieves an MAE of 2.4 kg, a MAPE of 0.54%, an RMSE of 2.4 kg, and an R-square of 0.95.
3.3 Relationship Between Variables
Figure 13 shows a pattern of positive relationships between the nine variables and the live weight variable.
Figure 13. Variable-Pattern Relationships
It can be seen from Figure 13 that the variables (1) withers height, (2) hip height, (3) chest depth, (4) heart girth, (5) ilium width, (6) hip joint width, (7) oblique body length, (8) hip length, and (9) chest width, each marked with a different color, correlate strongly with the Live Weight variable.
3.4 Best Evaluation Model Value
Table 2 shows the results of the best evaluation model values with five variables and 50-fold cross-validation using the linear regression algorithm.
Table 2. Best Evaluation Model Value
MAE | MAPE | RMSE | R-square
0.35 kg | 0.07% | 0.5 kg | 0.99
Table 2 shows that the linear regression model is superior in predicting outcomes. With the best MAE of 0.35 kg, a MAPE of 0.07%, an RMSE of 0.5 kg, and an R-square reaching 0.99, linear regression is more accurate than random forest.
This advantage is due to the simple and linear nature of linear regression, which effectively captures the relationship between input and output variables. These results are in line with the literature highlighting the usefulness of linear regression in cases where the relationships between variables tend to be linear. With a more straightforward approach, linear regression may be more agile and efficient, avoiding the overfitting that may occur in more complex models such as random forests.
Although random forests have advantages in dealing with data complexity and non-linear patterns, this research shows that for specific datasets, linear regression is more suitable and provides more accurate and stable results. These findings contribute to our understanding of the contexts in which linear regression may be a superior choice in predictive modeling.
4. Conclusions
The cattle weight prediction experiment using the linear regression method produced the best mean absolute error (MAE) of 0.35 kg, mean absolute percentage error (MAPE) of 0.07%, root mean square error (RMSE) of 0.5 kg, and R-square of 0.99 compared to the random forest regressor method, and the correlation between the variables and cow weight is very strong. These results confirm that linear regression not only provides accurate predictions but is also stable and consistent in measuring the variability between predictions and actual data, which supports more effective and efficient weight estimation in the cattle farming industry. This research focuses only on minimizing the mean absolute error (MAE), so model optimization has not been carried out. This opens opportunities for further research to test the model in different environmental conditions or with various cattle breeds.
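The 50-fold cross-validation protocol behind the reported figures can be sketched as follows. This is a minimal plain-Python illustration under stated assumptions, not the authors' code: the helper names `k_fold_indices` and `fit_line`, the single chest girth predictor, and the synthetic, exactly linear data are all hypothetical. With 100 records and 50 folds, each test fold holds only two animals.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and split them into k (train, test) pairs."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    return [(sorted(set(indices) - set(test)), sorted(test)) for test in folds]


def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic data for illustration only: 100 hypothetical chest girths (cm)
# and live weights (kg) that follow an exact linear rule.
girth = [150.0 + 0.5 * i for i in range(100)]
weight = [5.0 * g - 400.0 for g in girth]

splits = k_fold_indices(len(girth), k=50)

fold_mae = []
for train, test in splits:
    # Fit on the 98 training animals, then score on the 2 held-out animals.
    slope, intercept = fit_line([girth[i] for i in train], [weight[i] for i in train])
    errors = [abs(weight[i] - (slope * girth[i] + intercept)) for i in test]
    fold_mae.append(sum(errors) / len(errors))

mean_mae = sum(fold_mae) / len(fold_mae)
print(f"folds: {len(splits)}, test size per fold: {len(splits[0][1])}")
print(f"mean MAE across folds: {mean_mae:.6f} kg")
```

Because the synthetic weights are an exact linear function of girth, the cross-validated MAE is essentially zero; real measurements would leave a residual error, such as the values reported in Table 2.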