Article
Development of Data-Driven Machine Learning Models for the
Prediction of Casting Surface Defects
Shikun Chen 1,*,† and Tim Kaufmann 2,*,†
Abstract: This paper presents an approach for the application of machine learning to the prediction and understanding of casting-surface-related defects. It demonstrates how production data from a steel and cast iron foundry can be used to create models for predicting casting surface defects. The data used for model creation were collected from a medium-sized steel and cast iron foundry in which components ranging from 1 to 100 kg in weight are produced from wear- and heat-resistant cast iron and steel materials. They include all production-relevant data from the melting and casting process and from mold production and sand preparation, as well as component-quality data from the quality management department. The data were linked together, and information on the identity and number of components scrapped due to the casting surface defect of metal penetration was added to the dataset. Six different machine learning algorithms were trained, and the models' outputs were interpreted using the SHAP framework.
Keywords: casting defects; metal penetrations; steel casting; cast iron; machine learning; SHAP; data-driven process modeling
1. Introduction
In the coming years, the foundry industry will have to face ever greater challenges in terms of resource and energy optimization due to the expected ecological and economic change towards a climate-neutral Europe. In order to curb climate change and prevent a further increase in the earth's temperature, the European Union launched the "European Green Deal" in 2019. This political and economic program aims to extend the current target of reducing the European Union's CO2 emissions by 40 percent (by 2030 compared to 1990 levels) to a reduction of 50 to 55 percent [1]. For this reason, sustainability and resource conservation are moving to the center of the future business orientation of the manufacturing industry, especially the foundry industry. The foundry industry, which is one of the most energy-intensive industries in Germany, employs 70,000 people in around 600 iron, steel, and non-ferrous metal foundries. The economic importance of this sector, which is mostly structured as medium-sized enterprises, results from its classic supplier function and its role as a key sector for the automotive and mechanical engineering industries. From this leading position, foundries have to face a variety of technological, economic, and sustainability-related challenges, as well as increasing competitive pressure: competition from emerging markets, increased demands for flexibility, rising delivery and quality pressure, the strategic goal of a CO2-neutral Europe, and increasing globalization are forcing new approaches to maintain competitiveness. Against this backdrop, sustainably safeguarding future viability requires the consistent further development of process technology and control through the application of intelligent, adaptive manufacturing techniques. There is great potential in the digitization of the process chain to make manufacturing more resilient, sustainable, and flexible. The continuing dynamic development of
digitization and the associated methods and technologies have a considerable influence
on the requirements and designs of production systems. For efficient use of production
data, data analysis with machine learning methods or by means of deep learning and
transfer learning approaches and a reflection of the analysis results back into production is
essential, whereby so-called cyber-physical production systems can be built [2]. By using
cyber-physical production systems, the optimization of individual processes or parameters,
as well as the optimization of the process chain is possible [3]. Thus, a sustainable process
design is achievable in order to meet increasingly complex process requirements. Various
approaches for the application of AI methods and assistance apps in the foundry industry
already exist, such as [4–8]. However, a holistic ecosystem of assistance applications for the complete foundry production system does not yet exist. One possible lever for achieving the above-mentioned goals of resource and energy optimization is the reduction of scrap rates. High scrap rates weigh doubly on the resource and energy balance, as components that have already been cast must be melted down and manufactured again. This case study, therefore, investigates to what extent machine learning can help model the problem of casting surface defects (exemplified by metal penetration) in a foundry and draw conclusions for the manufacturing process.
2. Machine Learning
Machine Learning (ML) is regarded as programming computers to optimize a performance criterion using sample data or experience [10]. Machine learning concentrates on the development of computer programs that can access data and employ them to learn for themselves automatically. Machine learning has received much attention in recent years: growing volumes and varieties of available data, together with cheaper and more powerful computational processing, make it possible to build more complex, robust, and accurate systems that are capable of tasks associated with intelligence, with applications in computer vision, natural language processing, the automotive industry, manufacturing, foundries, etc.
In practice, machine learning algorithms can be divided into three main categories
based on their purpose. Supervised learning occurs when an ML model learns from sample
data and associated target responses that can consist of numeric values or string labels,
such as classes or tags, in order to later predict the correct response when posed with new examples [11]. Supervised learning is frequently applied in applications where historical data predict likely future events. Generally, supervised learning problems are categorized into classification and regression. In regression problems, the model tries to predict results within a continuous output based on continuous functions; for example, it can predict the price of a house according to its size and living area. In contrast to regression, in classification the output is predicted as a discrete value, such as yes or no, true or false, positive or negative, etc. For instance, given an image, the task is to correctly classify whether the content is composed of cats or dogs (or a combination of both).
Unlike supervised learning, which is trained using labeled examples, an unsupervised
learning model is prepared by deducing structures present in the input data without
any associated response, leaving the algorithm to determine the data patterns on its
own [11]. A typical example is a recommendation system, which can determine segments of customers with similar characteristics who can then be treated similarly in marketing campaigns. Popular techniques include clustering and dimensionality reduction. Reinforcement learning is concerned with how to interact with the environment and choose suitable actions in a given situation so as to maximize the reward [12]. This type of learning is often used for robotics, gaming, and navigation since
these applications must make decisions and the decisions bear consequences.
In the present case, the prediction of the casting quality of steel components produced by the sand casting process, the appropriate algorithms belong to supervised learning, and specifically to the category of regression. The basic workflow of a machine learning project is given below; the machine learning models applied in this research are explained in Section 2.2.
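As a hedged illustration, the following minimal sketch (in Python with scikit-learn) shows such a basic workflow: loading a tabular dataset, splitting off a test set, training a regressor, and scoring it. The file name, the target column scrap_rate, and the default ExtraTreesRegressor settings are placeholder assumptions for illustration, not the exact setup used in this study.

```python
# Minimal workflow sketch; file and column names are hypothetical.
import pandas as pd
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import train_test_split

df = pd.read_csv("foundry_production_data.csv")   # hypothetical dataset file
X = df.drop(columns=["scrap_rate"])               # hypothetical target column
y = df["scrap_rate"]

# Hold out a test set so the final evaluation uses unseen samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = ExtraTreesRegressor(random_state=42)
model.fit(X_train, y_train)
print("R2 on the test set:", model.score(X_test, y_test))
```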
To measure model performance for regression problems, the following metrics are introduced:
• Mean absolute error (MAE): is a loss metric corresponding to the expected value of
the absolute error loss. If ŷi is the predicted value of the i-th sample, and yi is the
corresponding true value, n is the number of samples, the MAE estimated over n is
defined as:
$$\mathrm{MAE}(y, \hat{y}) = \frac{1}{n} \sum_{i=0}^{n-1} |y_i - \hat{y}_i| \qquad (1)$$
• Mean squared error (MSE): is a loss metric corresponding to the expected value of the
squared error. MSE estimated over n is defined as:
$$\mathrm{MSE}(y, \hat{y}) = \frac{1}{n} \sum_{i=0}^{n-1} (y_i - \hat{y}_i)^2 \qquad (2)$$
• Root mean square error (RMSE): is the square root of value obtained from MSE:
$$\mathrm{RMSE}(y, \hat{y}) = \sqrt{\mathrm{MSE}(y, \hat{y})} \qquad (3)$$
• R2 score: represents the proportion of variance of y that has been explained by the
independent variables in the model. It provides an indication of fitting goodness
and, therefore, a measure of how well unseen samples are likely to be predicted by
the model:
$$R^2(y, \hat{y}) = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \qquad (4)$$
• Root mean squared logarithmic error (RMSLE): computes a risk metric corresponding
to the expected value of the root squared logarithmic error:
$$\mathrm{RMSLE}(y, \hat{y}) = \sqrt{\frac{1}{n} \sum_{i=0}^{n-1} \left(\log_e(1 + y_i) - \log_e(1 + \hat{y}_i)\right)^2} \qquad (5)$$
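As a cross-check of the definitions above, the following sketch implements Equations (1)–(5) directly in NumPy; scikit-learn provides equivalent functions in sklearn.metrics.

```python
import numpy as np

def mae(y, y_hat):
    # Mean absolute error, Equation (1)
    return np.mean(np.abs(y - y_hat))

def mse(y, y_hat):
    # Mean squared error, Equation (2)
    return np.mean((y - y_hat) ** 2)

def rmse(y, y_hat):
    # Root mean squared error, Equation (3)
    return np.sqrt(mse(y, y_hat))

def r2(y, y_hat):
    # Coefficient of determination, Equation (4)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def rmsle(y, y_hat):
    # Root mean squared logarithmic error, Equation (5);
    # assumes non-negative targets and predictions
    return np.sqrt(np.mean((np.log1p(y) - np.log1p(y_hat)) ** 2))
```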
For the ridge regression model, the objective is to minimize a penalized residual sum of squares, $\min_w \|Xw - y\|_2^2 + \alpha \|w\|_2^2$, where X is the input data, $w = (w_1, \ldots, w_p)$ is the coefficient vector of the linear model, y denotes the target value, and $\alpha \geq 0$ is the complexity parameter that controls the amount of shrinkage.
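A minimal sketch of this model with scikit-learn, reusing the train/test split from the workflow sketch above; the alpha value mirrors the tuned value reported later in Table 3, but is otherwise an assumption.

```python
# Ridge regression sketch; alpha plays the role of the shrinkage parameter.
from sklearn.linear_model import Ridge

ridge = Ridge(alpha=0.96)      # alpha >= 0: larger values mean stronger shrinkage
ridge.fit(X_train, y_train)    # X: input data, y: target values
print(ridge.coef_)             # fitted coefficient vector w = (w1, ..., wp)
```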
Figure 2. The structure of a random forest. The decision trees run in parallel with no interaction among them. A random forest operates by constructing several decision trees during training time and outputting the mean of the individual trees' predictions as the overall prediction.
2.2.4. XGBoost
XGBoost is short for "Extreme Gradient Boosting" [19]; like random forest, it is a tree-based ensemble method. In addition, XGBoost incorporates another popular method called boosting [17], in which successive trees give extra weight to samples that were incorrectly predicted by earlier trees, and the final prediction is obtained by a weighted vote of all trees [19]. It is called gradient boosting because it uses a gradient descent algorithm to minimize the loss when adding new models. By combining the strengths of these two approaches, XGBoost has an excellent capacity for solving supervised learning problems.
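A minimal sketch of such a gradient-boosted model with the xgboost library; the hyperparameter values are illustrative assumptions, not the tuned values from Table 3.

```python
# Gradient boosting sketch with XGBoost; values are illustrative.
from xgboost import XGBRegressor

xgb = XGBRegressor(
    n_estimators=300,    # number of boosting rounds (trees added sequentially)
    learning_rate=0.05,  # shrinks each tree's contribution to the ensemble
    max_depth=7,         # depth of the individual trees
    subsample=0.8,       # row subsampling adds randomness, similar to bagging
)
xgb.fit(X_train, y_train)
```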
2.2.5. CatBoost
CatBoost [20] is another ensemble algorithm that builds on boosting but is designed to work well with categorical data. Compared to XGBoost, CatBoost also handles missing numeric values. CatBoost distinguishes itself through symmetric trees, which use the same split in all nodes of each level, making it much faster than XGBoost. CatBoost calculates the residuals for each data point using a model trained on other data, so that different residuals are obtained for each data point. These residuals are used as targets, and the overall model is trained for the chosen number of iterations.
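A minimal CatBoost sketch under the same assumptions; the cat_features argument names columns to be treated as categorical, and the column name used here is hypothetical.

```python
# CatBoost sketch; categorical columns are handled natively via cat_features.
from catboost import CatBoostRegressor

cb = CatBoostRegressor(
    iterations=500,      # number of symmetric trees to build
    learning_rate=0.03,
    depth=5,             # symmetric trees: one shared split per level
    verbose=False,
)
cb.fit(X_train, y_train, cat_features=["alloy_grade"])  # hypothetical column
```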
SHAP explains an individual prediction with an additive interpretation model, $g(z') = \phi_0 + \sum_{i=1}^{M} \phi_i z'_i$, in which $\phi_i$ is the contribution (Shapley value) of each feature and $\phi_0$ is the constant of the interpretation model (namely the mean prediction over all training samples) [24].
Lundberg and Lee [24] demonstrate that SHAP is better aligned with human intuition
than other ML interpretation methods. There are three benefits worth mentioning here.
1. The first is global interpretability—the collective SHAP values can show how much each predictor contributes, positively or negatively, to the target variable. Features with a positive sign push the prediction higher, whereas features with a negative sign push it lower. In particular, the importance of a feature i is defined by SHAP as follows:
$$\phi_i = \frac{1}{|N|!} \sum_{S \subseteq N \setminus \{i\}} |S|! \,(|N| - |S| - 1)! \left[ f(S \cup \{i\}) - f(S) \right] \qquad (8)$$
Here, f(S) is the output of the ML model to be interpreted using a set S of features, and N is the complete set of all features. The contribution of feature i (φi) is determined as the average of its contributions over all possible permutations of the feature set. Furthermore, this equation considers the order of features, which influences the observed changes in a model's output in the presence of correlated features [30].
2. The second benefit is local interpretability—each observation obtains its own set of SHAP values, which greatly increases transparency. We can explain why a case receives its prediction and what each predictor contributes to it. Traditional variable-importance algorithms only show results across the entire population, not for each individual case. Local interpretability enables us to pinpoint and contrast the impacts of the factors.
3. Third, SHAP offers a model-agnostic approximation of the Shapley values, which can be calculated for any ML model, whereas other methods use linear or logistic regression models as surrogate models.
The core idea behind Shapley-value-based explanation of machine learning models is to use the fair allocation results from cooperative game theory to allocate credit for a model's output f(x) among its input features. Furthermore, the SHAP values of all input features always sum up to the difference between the baseline model output and the current model output for the prediction being explained, which means we can explicitly check the contribution of every feature to the individual prediction.
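This additivity property can be checked directly. The following sketch assumes a fitted tree ensemble model, such as the ET regressor from the earlier sketch, and uses the shap library's TreeExplainer.

```python
# SHAP sketch: compute per-feature contributions and verify additivity.
import numpy as np
import shap

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_train)  # one row of phi_i per sample

# Baseline (phi_0) plus the sum of contributions equals the model output
# for each prediction (expected_value is a scalar for single-output regression).
reconstructed = explainer.expected_value + shap_values.sum(axis=1)
assert np.allclose(reconstructed, model.predict(X_train))
```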
Feature Mean Std
C (wt%) 1.68 1.24
Si (wt%) 1.12 0.55
Mn (wt%) 0.53 0.23
P (wt%) 0.02 0.01
S (wt%) 0.01 0.01
Cr (wt%) 20.83 6.16
Ni (wt%) 15.25 20.89
Mo (wt%) 0.59 1.47
Cu (wt%) 0.07 0.07
Al (wt%) 0.13 0.46
Ti (wt%) 0.02 0.04
B (wt%) 0.00 0.00
Nb (wt%) 0.39 0.57
V (wt%) 0.30 0.58
W (wt%) 0.79 1.94
Co (wt%) 0.54 1.39
melt duration (min) 112.54 157.44
holding time (min) 94.68 48.40
melt energy (kWh) 622.48 126.80
liquid heel (kg) 90.31 200.39
charged material (kg) 911.67 237.19
scrap (kg) 25.21 35.44
[. . . ] [. . . ] [. . . ]
magnitude in the variance. Third, transformations are applied to some columns so that the data can be represented by a normal or approximately normal distribution. Fourth, three columns that have too many unique values or a few extreme values outside the expected range are discretized into categorical values using a pre-defined number of bins. Fifth, feature columns that are highly linearly correlated with other feature columns are dropped. Finally, to extend the scope of independent variables, new polynomial features that might help capture hidden relationships are created; a sketch of these steps is given below. After pre-processing the raw input data, the number of independent variables increases from 51 to 282.
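A minimal sketch of these pre-processing steps with pandas and scikit-learn; the column names, correlation threshold, and bin count are illustrative assumptions, not the exact choices made in this study.

```python
# Pre-processing sketch: transform, discretize, drop correlated columns,
# and generate polynomial features. Names and thresholds are hypothetical;
# X is assumed to be an all-numeric pandas DataFrame at this stage.
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer, PolynomialFeatures, PowerTransformer

# Transform skewed columns towards an approximately normal distribution.
skewed = ["melt_duration", "holding_time"]                     # hypothetical
X[skewed] = PowerTransformer().fit_transform(X[skewed])

# Discretize columns with many unique or extreme values into bins.
binner = KBinsDiscretizer(n_bins=5, encode="ordinal", strategy="quantile")
X[["liquid_heel"]] = binner.fit_transform(X[["liquid_heel"]])  # hypothetical

# Drop features that are highly linearly correlated with another feature.
corr = X.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
X = X.drop(columns=[c for c in upper.columns if (upper[c] > 0.95).any()])

# Create polynomial interaction features to capture hidden relationships.
poly = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
X_poly = poly.fit_transform(X)
```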
Table 2. The results on the training dataset for the six models. ET outperforms the other five models.
After the hyperparameter tuning of every ML model, the more important hyperparameters are shown in Table 3; a minimal tuning sketch is given after the table. These hyperparameters are then applied to each ML model to make predictions.
Model Hyperparameters
GB max_features = 0.785, min_samples_leaf = 1, n_estimators = 287, min_samples_split = 5
CatBoost learning_rate = 0.03, depth = 5, random_strength = 35, l2_leaf_reg = 0.2
ET max_depth = 11, max_features = 0.4, n_estimators = 137, min_samples_leaf = 2
XGBoost max_depth = 7, learning_rate = 0.0131, n_estimators = 292, subsample = 0.638
Random forest max_depth = 6, ccp_alpha = 0, n_estimators = 300, max_features = 0.5164
Ridge alpha = 0.96, copy_X = True, normalize = True, solver = auto, tol = 0.001
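The tuning sketch below uses cross-validated randomized search for the ET model; the parameter ranges are illustrative assumptions rather than the exact search space used in the study.

```python
# Hyperparameter tuning sketch with cross-validated randomized search.
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import RandomizedSearchCV

param_dist = {
    "n_estimators": range(100, 400),
    "max_depth": range(3, 15),
    "max_features": [0.3, 0.4, 0.5, 0.7],
    "min_samples_leaf": [1, 2, 4],
}
search = RandomizedSearchCV(
    ExtraTreesRegressor(random_state=42),
    param_distributions=param_dist,
    n_iter=50,          # number of random configurations to evaluate
    scoring="r2",
    cv=5,               # 5-fold cross-validation on the training set
)
search.fit(X_train, y_train)
print(search.best_params_)
```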
The ML models are evaluated on the training set using the aforementioned scoring metrics. The ET algorithm shows the best performance on all five metrics compared to the other five models. After training, the model with the best hyperparameters of each ML algorithm is used to predict the final output on the testing dataset. Table 4 shows the metrics of the six models on the testing set.
Table 4. The results on the testing dataset for the six models. As in the training set, ET has the best performance.
The results on the testing set are very similar to those on the training set: the ET algorithm outperforms the other five models on all scoring metrics, so it can be concluded that ET gave the best response and an excellent result in the testing phase. The ET algorithm is therefore chosen as the model for interpretation and prediction.
Figure 4 demonstrates a residuals plot for the ET model. It shows the difference
between the observed value of the target variable (y) and the predicted value (ŷ).
Figure 4. The residuals plot shows the difference between residuals on the vertical axis and the
dependent variable on the horizontal axis, allowing us to detect regions within the target that might
be susceptible to more or less error. The histogram on the right side of the plot is a common way to check that the residuals are normally distributed.
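A plot in the spirit of Figure 4 can be produced as follows; this matplotlib sketch plots residuals against predicted values with a marginal histogram and assumes the fitted model and held-out test split from the earlier sketches.

```python
# Residuals plot sketch: scatter of residuals plus marginal histogram.
import matplotlib.pyplot as plt

y_pred = model.predict(X_test)
residuals = y_test - y_pred

fig, (ax_res, ax_hist) = plt.subplots(
    1, 2, figsize=(8, 4), sharey=True, gridspec_kw={"width_ratios": [3, 1]}
)
ax_res.scatter(y_pred, residuals, s=10, alpha=0.5)
ax_res.axhline(0, color="black", linewidth=1)   # zero-error reference line
ax_res.set_xlabel("predicted value")
ax_res.set_ylabel("residual")
ax_hist.hist(residuals, bins=30, orientation="horizontal")  # normality check
plt.tight_layout()
plt.show()
```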
is horizontally stacked and summed up. Overall, it indicates that para_1 is the most important feature, followed by the attributes para_2 and para_3; these three features have greater mean absolute SHAP values than the others. Changing the value of para_1 alters the predicted output on average by 4.26. In contrast, the remaining features contribute only insignificantly to the final prediction: the sum of the top three parameters is higher than the sum of the other 271 features combined. The result is shown in Figure 5.
Figure 5. SHAP feature importance of ET model for the training data set. After pre-processing
(polynomial feature engineering) of input features, the number of features increased from 51 to 282.
The feature importance plot is useful, but contains no information beyond the im-
portance. For a more informative plot, the summary plot (Figure 6) provides feature
importance with feature effects. Each point on the summary plot is a Shapley value for a
feature and an instance. The position on the y-axis is determined by the feature and on the
x-axis by the Shapley value. The color represents the value of the feature from low to high.
Overlapping points are jittered in the y-axis direction, and the features are ordered according to their importance. From Figure 6, we can tell that high values of the attributes para_1 (excessive content of organic binders in the mold material) and para_2 (exceedingly high fine-particle content in the mold material) have a positive impact on the final prediction: high feature values are shown in red, and positive SHAP values lie to the right on the x-axis. Similarly, it can be assumed that the attribute para_3 is negatively correlated with the target variable.
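Plots of the kind shown in Figures 5 and 6 can be generated with the shap library's built-in summary plots, assuming the explainer and SHAP values computed in the earlier sketch.

```python
import shap

# Global importance bar plot (mean absolute SHAP value per feature), cf. Figure 5.
shap.summary_plot(shap_values, X_train, plot_type="bar")

# Beeswarm summary plot combining importance with feature effects, cf. Figure 6.
shap.summary_plot(shap_values, X_train)
```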
Conclusions
In the foundry industry, the reasons for poor-quality production results are often not well known or understood, or appear seemingly random. Machine learning can help with this, as it is a conclusive and powerful tool for analyzing big and complex data. The foundry environment, with its data from multiple sub-processes such as melting, sand preparation, and mold production, fits very well into the definition of complex data. However, for practical use in the daily production environment, and even for some kind of automated control loop that helps adjust the manufacturing process, many challenges and problems in the foundry environment need to be resolved. One
challenge for the effective use of machine learning in this environment is the lack of
data collection and access. The foundry industry is characterized by a strong mixture
of new and existing plants in a factory environment that has often grown historically.
In these heterogeneous brownfield environments, complete digitization of production
is associated with challenges, such as analog communication paths, lack of connectivity,
media disruptions, and a large number of different machine controls [31]. This results in the following pain points for the foundry industry:
• data acquisition is often still performed manually, even in key processes;
• fully automated, cross-system data acquisition exists only in the rarest cases;
• media discontinuities exist between existing systems and manual information acquisition;
• data quality is often highly heterogeneous and not sufficiently good for data-driven optimization;
• process data are often not used effectively, or only to a limited extent;
• process optimization is based on assumptions, experience, and trial-and-error, so multi-parametric relationships are difficult or impossible to recognize.
This leads to high scrap rates, often low process stability and, therefore, poor resource and energy efficiency. In the future, foundries will
need very flexible, resilient, and efficient production. The pain points described above give
rise to the overriding problem that a large number of efficiency potentials are currently not
being exploited. On the one hand, these are of an economic nature, since production costs
can be reduced, for example, through process optimization and a reduction in the scrap rate.
On the other hand, there is considerable potential in the area of ecological sustainability,
which can be leveraged, for example, through process optimization for reduced energy
consumption, but also through targeted energy-optimized plant layout. In order to exploit
these efficiency potentials and, as described above, to meet the increasing requirements, the complete digitization of production and the use of machine learning seem to be the next step in optimizing production processes.
References
1. Europäische Kommission. Mitteilung der Kommission an das Europäische Parlament, den Europäischen Rat, den Rat, den Europäischen Wirtschafts- und Sozialausschuss und den Ausschuss der Regionen: Der Europäische Grüne Deal; 2019. Available online: https://eur-lex.europa.eu/legal-content/DE/TXT/?uri/ (accessed on 1 October 2021).
2. Thiede, S.; Juraschek, M.; Herrmann, C. Implementing cyber-physical production systems in learning factories. Procedia Cirp
2016, 54, 7–12. [CrossRef]
3. Lee, J.; Noh, S.D.; Kim, H.J.; Kang, Y.S. Implementation of cyber-physical production systems for quality prediction and operation
control in metal casting. Sensors 2018, 18, 1428. [CrossRef] [PubMed]
4. Ferguson, M.; Ak, R.; Lee, Y.T.T.; Law, K.H. Automatic localization of casting defects with convolutional neural networks.
In Proceedings of the 2017 IEEE International Conference on Big Data (Big Data), Boston, MA, USA, 11–14 December 2017;
pp. 1726–1735.
5. Fernández, J.M.M.; Cabal, V.Á.; Montequin, V.R.; Balsera, J.V. Online estimation of electric arc furnace tap temperature by using
fuzzy neural networks. Eng. Appl. Artif. Intell. 2008, 21, 1001–1012. [CrossRef]
6. Tsoukalas, V. An adaptive neuro-fuzzy inference system (ANFIS) model for high pressure die casting. Proc. Inst. Mech. Eng. Part
B J. Eng. Manuf. 2011, 225, 2276–2286. [CrossRef]
7. Dučić, N.; Jovičić, A.; Manasijević, S.; Radiša, R.; Ćojbašić, Ž.; Savković, B. Application of Machine Learning in the Control of
Metal Melting Production Process. Appl. Sci. 2020, 10, 6048. [CrossRef]
8. Bouhouche, S.; Lahreche, M.; Boucherit, M.; Bast, J. Modeling of ladle metallurgical treatment using neural networks. Arab. J. Sci.
Eng. 2004, 29, 65–84.
9. Hasse, S. Guß-und Gefügefehler: Erkennung, Deutung und Vermeidung von Guß-und Gefügefehlern bei der Erzeugung von Gegossenen
Komponenten; Fachverlag Schiele & Schoen: Berlin, Germany, 2003.
10. Parsons, S. Introduction to machine learning by Ethem Alpaydin, MIT Press, 0-262-01211-1, 400 pp. Knowl. Eng. Rev. 2005,
20, 432–433. [CrossRef]
11. Mueller, J.P.; Massaron, L. Machine Learning for Dummies; John Wiley & Sons: Hoboken, NJ, USA, 2021.
12. Bishop, C.M. Pattern recognition. Mach. Learn. 2006, 128, 89–93.
13. Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 6–11 July 2015; Bach, F., Blei, D., Eds.; PMLR: Lille, France, 2015; Volume 37, pp. 448–456.
14. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [CrossRef]
15. Breiman, L.; Friedman, J.H.; Olshen, R.A.; Stone, C.J. Classification and Regression Trees; Wadsworth International Group: Belmont, CA, USA, 1984.
16. Breiman, L. Bagging predictors. Mach. Learn. 1996, 24, 123–140. [CrossRef]
17. Freund, Y.; Schapire, R.E. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst.
Sci. 1997, 55, 119–139. [CrossRef]
18. Geurts, P.; Ernst, D.; Wehenkel, L. Extremely randomized trees. Mach. Learn. 2006, 63, 3–42. [CrossRef]
19. Chen, T.; Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining; ACM: New York, NY, USA, 2016; pp. 785–794.
20. Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. CatBoost: Unbiased boosting with categorical features.
arXiv 2017, arXiv:1706.09516.
21. Ribeiro, M.T.; Singh, S.; Guestrin, C. "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; ACM: New York, NY, USA, 2016; pp. 1135–1144.
22. Shrikumar, A.; Greenside, P.; Kundaje, A. Learning important features through propagating activation differences. In Proceedings
of the 34th International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; Precup, D., Teh, Y.W., Eds.;
PMLR, 2017; Volume 70, pp. 3145–3153.
23. Antwarg, L.; Miller, R.M.; Shapira, B.; Rokach, L. Explaining Anomalies Detected by Autoencoders Using SHAP. arXiv 2019,
arXiv:1903.02407.
24. Lundberg, S.; Lee, S.I. A unified approach to interpreting model predictions. arXiv 2017, arXiv:1705.07874.
25. Štrumbelj, E.; Kononenko, I. Explaining prediction models and individual predictions with feature contributions. Knowl. Inf.
Syst. 2014, 41, 647–665. [CrossRef]
26. Datta, A.; Sen, S.; Zick, Y. Algorithmic transparency via quantitative input influence: Theory and experiments with learning
systems. In Proceedings of the 2016 IEEE Symposium on Security and Privacy (SP), San Jose, CA, USA, 22–26 May 2016;
pp. 598–617.
27. Bach, S.; Binder, A.; Montavon, G.; Klauschen, F.; Müller, K.R.; Samek, W. On pixel-wise explanations for non-linear classifier
decisions by layer-wise relevance propagation. PLoS ONE 2015, 10, e0130140. [CrossRef]
28. Lipovetsky, S.; Conklin, M. Analysis of regression in game theory approach. Appl. Stoch. Model. Bus. Ind. 2001, 17, 319–330.
[CrossRef]
29. Shapley, L.S. A value for n-person games. Contrib. Theory Games 1953, 2, 307–317.
30. Rodríguez-Pérez, R.; Bajorath, J. Interpretation of machine learning models using shapley values: Application to compound
potency and multi-target activity predictions. J. Comput.-Aided Mol. Des. 2020, 34, 1013–1026. [CrossRef]
31. Beganovic, T. Gießerei 4.0—Wege für das Brownfield. 3. Symposium Gießerei 4.0. 2018. Available online: https:
//s3-eu-west-1.amazonaws.com/editor.production.pressmatrix.com/emags/139787/pdfs/original/cc65a080-6f72-4d1e-
90b1-8361995504d0.pdf (accessed on 1 October 2021).