
Applied Intelligence (2024) 54:1751–1769

https://doi.org/10.1007/s10489-024-05276-6

Reinforcement learning based agents for improving layouts of automotive crash structures

Jens Trilling1 · Axel Schumacher1 · Ming Zhou2

Accepted: 5 January 2024 / Published online: 18 January 2024
© The Author(s) 2024

Correspondence: Jens Trilling, jtrilling@uni-wuppertal.de; www.oms.uni-wuppertal.de · Axel Schumacher, schumacher@uni-wuppertal.de; www.oms.uni-wuppertal.de · Ming Zhou, zhou@altair.com

1 Optimization of Mechanical Structures, University of Wuppertal, Gaußstraße 20, 42119 Wuppertal, NRW, Germany
2 Altair Engineering, Main Street, Irvine 92614, CA, USA

Abstract
The topology optimization of crash structures in automotive and aeronautical applications is challenging. Purely mathematical methods struggle due to the complexity of determining the sensitivities of the relevant objective functions and constraints with respect to the design variables. For this reason, the Graph- and Heuristic-based Topology optimization (GHT) was developed, which controls the optimization process with rules derived from expert knowledge. In order to extend the collected expert rules, the use of reinforcement learning (RL) agents for deriving a new optimization rule is proposed in this paper. This heuristic is designed in such a way that it can be applied to many different models and load cases. An environment is introduced in which agents interact with a randomized graph to improve cells of the graph by inserting edges. The graph is derived from a structural frame model. Cells represent localized parts of the graph and delineate the areas where agents can insert edges. A newly developed shape preservation metric is presented to evaluate the performance of topology changes made by agents. This metric evaluates how much a cell has deformed by comparing its shape in the deformed and undeformed state. The training process of the agents is described and their performance is evaluated in the training environment. It is shown how the agents and the environment can be integrated as a new heuristic into the GHT. An optimization of the frame model and a vehicle rocker model with the enhanced GHT is carried out to assess its performance in practical optimizations.

Keywords Structural optimization · Topology optimization · Automotive crash · Artificial intelligence · Reinforcement learning

1 Introduction

In the last decades, engineers have significantly improved the crashworthiness of vehicles due to ever-increasing requirements for the passive safety of automobiles imposed by legislation and consumer protection. The simulation of increasingly complex crash models using the finite element method (FEM) helps in the development process. The simulation of crash-loaded structures is subject to high nonlinearities due to large structural deformations, contact phenomena between adjacent components, plasticity in the material model and failure of individual components. In addition, simulation results from crash analyses suffer from numerical noise. Individual simulations of crash models can take hours to days depending on the model complexity and the available computing resources. In this context, the use of algorithms for automatic optimization of structures is difficult. Though sensitivities can be calculated numerically through finite differences, this is not only time consuming, but also unreliable due to the complexity of crash simulation outlined above. In this context, sensitivities are the derivatives of the objective functions and constraints of the optimization problem with respect to the design variables.

For improving the layout of a structure, algorithms of topology optimization can be used. In topology optimization, the mechanical structure is improved by adjusting the shape and position of structural components. The development of methods for topology optimization of crash models is a current research topic. Thereby, the approaches
are manifold. One possibility is to obtain useful gradients for the optimization as shown in [1], where topological derivatives via the adjoint equilibrium equation are determined.

The Equivalent Static Loads Method (ESL) transforms the loads of a nonlinear problem into several static problems. The load is determined such that the displacements in an equivalent linear simulation correspond to those in a specific time step of the nonlinear simulation. Sensitivities can be determined for the static problems, which are then used to perform the topology optimization [2–4].

In addition to the calculation of usable sensitivities, there are also methods for topology optimization that do not use direct sensitivity information. Instead, these use engineering knowledge that guides the optimization in the form of heuristic rules. The rules of cellular automata as used in Hybrid Cellular Automata (HCA) [5] can be used to form three-dimensional voxel based structures. In this process, the cellular automaton redistributes material such that each voxel in the model is equally utilized for the particular load case. As a criterion for this, the internal energy density of the respective voxels is used. The process of the Graph- and Heuristic-based Topology optimization (GHT) is also driven by heuristic update rules [6–9]. Those updates are performed on a mathematical graph consisting of nodes and edges, which describes the cross section of an extrusion profile. With the GHT, the real objective and constraints can be considered in the optimization process.

This work presents a reinforcement learning (RL) based approach for topology optimization of crash loaded structures using the GHT to improve local structural cells with respect to their stiffness. The RL model is integrated into the GHT optimization process and functions as an additional heuristic applicable to many different models and load cases. This allows for a more diverse design generation during the optimization, which in return should result in better optima at the cost of a higher simulation count.

The core concept presented here is the underlying RL environment, which defines the interface between the agent and the crash model that is to be improved. For this, a shape preservation metric is proposed that describes the stiffness of a cell by measuring how much the undeformed and deformed cells differ from each other geometrically. While the GHT process is already well described in the literature [6–9], the RL based approach is completely new and is investigated in this paper. The key contributions of this work are

• the combination of the two research fields of RL and crash optimization,
• the support of the GHT with a new RL based heuristic that increases the stiffness of local cells,
• the concept of cells and their advantages and disadvantages as an interface between the graph based structures and the RL model,
• the implementation of an environment the agents are trained on,
• the calculation of the shape preservation metric describing the stiffness of a cell and
• the assessment of the performance of the trained agents in practical optimizations.

The paper is structured as follows. Section 2 presents related literature and concepts that are used in this paper or have directly influenced the work. In Section 3, the requirements for the RL models, also called agents, the implementation of the RL environment and the training of the agents are introduced. It also describes how the environment and the agents are integrated into the GHT process. The best trained model is selected and evaluated in Section 4 within the training environment. To assess the performance of the agent based heuristic within practical topology optimizations, a frame model and a rocker model are studied with the GHT in Section 5. Finally, the results and findings are summarized in Section 6.

2 Related work

Section 2.1 gives an overview of different approaches to integrate artificial intelligence (AI) into crash simulation and crash optimization. These studies collectively highlight the intersection of AI with crash simulations, illustrating the growing trend in this research area. However, while they offer valuable insights, they do not directly lay the groundwork for the present work. This is followed by Section 2.2, where an introduction to RL is given. Lastly, Section 2.3 discusses how the GHT works, as it is the framework for the RL based method presented in this paper.

2.1 Use of artificial intelligence in crash simulation and optimization

Crash simulation and optimization are integral aspects of modern automotive and aeronautical crash safety analysis. Leveraging AI and machine learning techniques has recently gained momentum in the field of crash analysis, allowing for enhanced computational and predictive capabilities.

A primary focus of recent studies has been the application of dimensionality reduction and clustering techniques for analyzing crash simulations. In [10] a clustering algorithm to discern structural instabilities from reduced crash simulation data is incorporated. The study delves into clustering of nodal displacements derived from finite element (FE) models spanning different simulations. By subsequently
processing these clusters through dimensionality reduction techniques, inherent characteristics of the simulation runs are unraveled, obviating the need for manual sifting of the data. The practicality of their approach is underscored by its effective application to a longitudinal rail example.

[11] presented an automated evaluation technique aimed at discerning anomalous crash behaviours within sets of crash simulations. By calculating an outlier score for each simulation state via a k-nearest-neighbour strategy, the study aggregates these results into a consolidated score for individual simulations. By averaging these scores for a given simulation, the method facilitates the distinction between regular and outlier simulations. The effectiveness of this method is underscored by its high precision and notable recall when evaluated on five distinct datasets.

A geometric approach is presented in [12]. Primarily one-dimensional structures are embedded by a representative regression line used to analyze the deformation behavior of different crash models. Those regression lines are parameterized as Bézier curves. Simulation responses are then projected onto the regression line and smoothed with a kernel density smoothing. Leveraging a discretized version of the smoothed data, it is possible to effectively identify and categorize distinct deformation patterns and find the most influencing parameters regarding the deformation modes through data mining techniques. This method is validated on different important structural components in a full frontal crash.

Crash simulations are intrinsically time dependent. The use of time data for AI in crash simulations is therefore suggestive. The study from [13] offers a novel data analysis methodology for efficiently post-processing bundles of FE data from numerical simulations. Similar to the Fourier transform, which decomposes temporal signals into their individual frequency spectra, [13] propose a method that characterises the geometry of structures using spectral coefficients. The coefficients with the highest values are decisive for the representation of the original geometry. By selecting these predominant spectral coefficients, the geometry can consequently be represented in a low-dimensional way. The method is successfully validated on a full frontal crash by analyzing the behaviour of the vehicle's support structure.

In a simpler approach, [14] bring forth the concept of Oriented Bounding Boxes (OBBs). Those are cuboids that encapsulate FE components at minimum volumes throughout the simulation. This geometric abstraction enabled the estimation of size, rotation and translation of crash structures over time. Moreover, their method, which uses a Long Short-Term Memory (LSTM) [15] autoencoder to generate a low dimensional representation of the original data, paves the way for predicting and stopping simulations that exhibit undesirable deformation modes. The method is validated on 196 simulations with varying material properties in a full frontal crash by analyzing different crash relevant components.

In [16], the impact point in low speed crashes is identified based on the time history of sensor data with conventional feature extracting algorithms. The impact points are classified by 8 different positions around the vehicle. From 3176 extracted features of the time series, the 9 most important features are chosen and passed into a decision tree. Using this method, a cross-validated accuracy of 76 % for the given dataset has been achieved.

2.2 Reinforcement learning overview

RL [17] is a subset of AI that describes the process of learning tasks in an unknown and often dynamic environment. An agent performs actions within the environment over time according to its policy. The actions are selected by the agent depending on an observation of the environment, i.e. the agent's perception of the current state of the environment, with the goal of maximizing a cumulative reward. Applying the action to the environment and generating a new observation based on the new state of the environment is called a step. Depending on the task of the agent, a few steps up to a theoretically infinite number of steps are performed until the environment reaches a terminal state. The steps from the initial state of the environment to the final state are called an episode. For each step performed, a numerical reward is given to the agent. This way the agent learns to understand how beneficial the chosen action has been for the previous state. After an episode, the environment is reset and a new episode starts. This iterative concept of stepping through the environment is referred to as the RL loop.

In many state-of-the-art RL algorithms, the agent itself is a function approximator, usually an artificial neural network (ANN). Depending on the actual algorithm used, the ANN is trained to either predict the value of the observed state or predict an action distribution over the possible actions that maximizes a given objective directly. The value of a state is the expected return starting in the current state and following the current policy. The return is a possibly discounted sum of the rewards gained in each performed step. Choosing an action is then done by finding the action that maximizes the value in the current state.

In case of actor-critic algorithms [18], which are used in this paper, both the action distribution and the state value are approximated in two distinct ANNs. The network predicting the action distribution is called the actor and the network predicting the value of the states is called the critic. The policy given by the actor network is updated with a gradient based approach with information provided by the critic [18]. This combination of both algorithmic approaches enables a much higher sampling efficiency compared to their individual counterparts.

When using an RL model that has been trained with an actor-critic approach, only the actor is necessary for the decision making given an observed state. This interaction between the actor and the environment is visualized in Fig. 1. The best actions will be sampled from the probability distribution over the actions. The environment acts according to the given action and answers with a new state, which will be the foundation for the observation passed next to the actor model.

Fig. 1 The RL loop showing the interaction between the ANN based actor model and the environment

In contrast to supervised learning, where an ANN is trained on a given dataset, the RL training is driven by a trial and error approach. The agent steps through the environment according to its policy, collecting data as it steps. This is the main advantage of using RL, especially for complex mechanical problems. For training an RL agent, no information about optimized structures must be known beforehand.

This poses the problem described in the literature as the exploration-exploitation trade-off [17]. While training, the agent should act according to its current policy, such that it is able to use the already learned knowledge about the environment to reach favourable states. This is called exploitation. At the same time, the agent needs to try out new actions to avoid getting stuck in local optima by only acting according to the current policy. This is called exploration.

Typical examples where RL is frequently applied are robotics [20] and video games [19, 21]. New observations can be obtained fast and in a simple and structured format, like scalar sensor data in robotics and a visual representation of the environment in games. Thus, RL is a suitable method for training agents in these application fields.

An example of an application of RL in mechanical problems is given in [22, 23]. In the mentioned work, the volume for planar steel frames is calculated with RL considering stresses, displacements and other engineering relevant constraints. There, the cross sectional size is chosen from a list of discrete sizes. The steel frames are represented by a graph. A tailored graph embedding is used to preprocess the graph data into an RL suitable format.

In this work, Python is used for implementing the heuristic and the RL training. The most important Python modules used are stable-baselines3 [24], which is an RL framework, gym [25], which enables a standardized implementation of environments, networkx [26] for processing graph data, numpy [27] for numerical operations on arrays and qd cae [28] for parsing the simulation results. The crash simulations are carried out with Ls-Dyna [29].
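To make the interplay of these modules concrete, the following minimal sketch shows how a trained actor could be queried against a gym environment in the sense of the RL loop of Fig. 1. The environment id, the model file name and the four-value step signature of the classic gym API are illustrative assumptions, not the authors' published code.

import gym
from stable_baselines3 import PPO

# "GhtCellEnv-v0" is a placeholder id for the custom environment described in Section 3
env = gym.make("GhtCellEnv-v0")
model = PPO.load("rls_agent_4_sided")  # a previously trained agent (hypothetical file name)

obs = env.reset()
done = False
while not done:
    # the actor network proposes an action for the current observation
    action, _ = model.predict(obs, deterministic=True)
    # the environment inserts the edge, reruns the crash simulation and
    # answers with a new observation, a reward and a termination flag
    obs, reward, done, info = env.step(action)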
2.3 Crash optimization with the graph- and heuristic-based topology optimization

The GHT is a shape and topology optimization method for crash structures. There are possibilities to optimize the cross section of an extrusion profile (GHT-2D) [6, 9] as well as the layout of different combined profiles (GHT-3D) [7].

In this work, the focus will be on the GHT-2D. For all following references to the GHT, it is always the GHT-2D that is referred to. In the GHT, the cross sections of extrusion profiles are described by graphs. The graph nodes contain the relevant coordinates that describe the geometry of the profile. Edges between the nodes represent the walls of the extruded model. For the automatic translation of a graph into an FE simulation model, the GHT-internal mesher GRAMB (Graph based Mechanics Builder) can be used. This software has been initiated by [8] and further developed in [6]. An example of a graph and its FE counterpart is given in Fig. 2.

Fig. 2 Example of a GHT graph consisting of nodes and edges describing the cross section of the profile that is converted into an FE model of an extruded profile using GRAMB

As described in [7, 30], the use of a graph allows for

• an easy way of manipulating the structure,
• generating an interpretable structure with little effort,
• a simple check of the manufacturing constraints and
• high quality meshes due to the automatic FE meshing with GRAMB for every design.

Starting from an initial design, the graph is modified over several iterations using heuristics. Heuristics are expert knowledge condensed into formulas that analyze the mechanical behavior of the structure from the simulation. Within an iteration, these heuristics suggest a new topology. If desired, a dimensioning or shape optimization of each new structural proposal is subsequently performed. The heuristics operate in parallel and in competition. Only the best designs are passed on to the next iteration. These new designs are the basis for the following iterations.

The heuristics used in this work have been developed in [6, 7] and are listed below.

• Delete Needless Walls (DNW) is a heuristic where unimportant edges are deleted. Here, the energy density of the structure walls is the criterion whether an edge is classified as unimportant.
• Support Buckling Walls (SBW) identifies FE nodes that are moving rapidly towards each other, detecting that the structure has a buckling tendency. These areas are supported with an additional wall.
• Balance Energy Density (BED) provides a homogeneous distribution of the absorbed energy in the structure by connecting low and high energy areas.
• Use Deformation Space (UDS) has the variants compression (UDSC) and tension (UDST). For this purpose, deformation spaces moving towards and away from each other are identified and supported by a wall.
• Split Long Edges (SLE) reduces the buckling tendency by splitting and connecting the longest edge with another long edge in the graph.

3 Reinforcement learning based heuristic generation

In the following, the framework of the RL based GHT heuristic is formulated. The design of the training environment for the heuristic is described in detail in Section 3.1. Section 3.2 describes how the heuristic is integrated into the GHT optimization process.

In this investigation, the aim of the heuristic is to improve the local stiffness of the structure, hence the heuristic name RLS, which is an abbreviation for "Reinforcement Learning Stiffness".

3.1 Environment implementation

The environment definition is the most important part when training an agent with RL. It defines what the agent should learn based on the received rewards and how the interaction between the agent and the environment is implemented. Therefore, the main concept of the environment is shown in Section 3.1.1. Section 3.1.2 explains the interaction between the agent and the environment, i.e. the actions the agent can take and the observations the agent receives. The random generation of models and load cases is shown in Section 3.1.3. When the agent inserts an edge into the graph locally, a reward is calculated, which indicates whether the newly added edge improved the local performance of the model. How the reward for each step is calculated is shown in Section 3.1.4. Lastly, in Section 3.1.5, the training process for the agent is described.

3.1.1 Environment concept

The first step in implementing the environment is to clarify what the environment should achieve. It is supposed to give a framework for an agent to increase the stiffness of a structure by manipulating the topology of the graph locally. This should be achieved by sequentially inserting edges into the graph. An optimal topology proposal by the trained agent acting in the environment is desirable, but not mandatory. This is because the RLS heuristic, which is derived from the proposed environment, works in competition to the other heuristics listed in Section 2.3. Therefore, suboptimal topologies are sorted out in the optimization process.

Figure 3 gives an overview of stepping through an episode of the environment. The environment is split into two main modules. When an episode starts, the reset module is activated. This reset module handles a randomized generation of a GHT graph representing the cross section of an extrusion profile, which is then translated into a finite element model and simulated in a randomized load case. Local parts of the graph, referred to as cells, are identified. Edges will be inserted directly into those cells. Based on the results of the FE analysis, an observation consisting of the mechanical properties of the initial and deformed simulation model is built. Using this initial observation, the agent is able to choose its first action, which is passed into the step module. The step module contains similar procedures as the reset module. Additionally, the topology of the cell inside the graph is modified first using the given action. The cell is evaluated based on its updated topology. Then the reward and a termination flag that decides whether the episode should terminate are calculated. All of these concepts will be explained in more depth throughout this section.

Fig. 3 Overview of stepping through an episode of the environment

Before explaining the concepts further, a simulation model must be defined with which the agent can interact. The simulation model is an FE model that is used in the environment to generate the observations based on all necessary crash responses. Due to the fact that RL needs many function calls by its nature, a finite element model must be chosen that computes in a few seconds. Therefore, a simplified version of a frame simulation model, which was originally presented in [6] and then modified in [30], is used. The graph representation of the simulation model is shown in Fig. 4. It is important to notice that the entire method is dependent on the given simulation model. How well the trained agent is able to generalize the mechanical behaviour in GHT optimizations must be further investigated.

Fig. 4 Graph representation of the frame simulation model in its default load case. The semi-transparent box highlights the fully constrained side of the model. x̂ and ŷ refer to a local profile coordinate system

Typical for an extrusion profile, it is made from aluminum and has an initial wall thickness of 3 mm and a weight of 20.25 g. The rigid sphere has a velocity of 10 m/s and a mass of 105.1 g. The extrusion depth is set to 5 mm. Likewise, the mean element edge length is set to 5 mm, which means that there is only one row of elements in the extrusion direction. Because of this, phenomena in the extrusion direction cannot be represented. This compromise has to be made to save simulation time. A randomized derivative of this frame model is generated for the agent to train on, which will be explained in more detail in Section 3.1.3.

In order to develop the training environment for the RL agent, it must be clarified in which format the environment passes the observations to the agent and in which format the agent's actions are given. Due to the complexity of the mechanical problem, ANNs will be used as function approximators for the agent. The use of ANNs for training on structured data, such as tabular data like images and fixed-length vectors, is straightforward. Since the raw data obtained from the GHT either has a graph structure or originates from FE simulations with different numbers of nodes and elements for every design, the data is considered to be unstructured.

To circumvent these problems, the environment identifies a local part of the graph, called a cell, which is optimized by the agent. By looking at cells spanned by a fixed number of nodes, the mechanical state of the cell can be described by fixed-length vectors. These vectors can directly be used for the observation.

An additional advantage of this approach is that the agent can identify generalizable behavior in the simulation data, due to similarities between all cells and their mechanical behaviour. It is also possible for the agent to recognize patterns from the training data on different models and load cases, as long as the underlying mechanics are similar enough to the behaviour of the frame model. The disadvantages of this approach are that the structure can only be optimized locally and that inserted edges are bound to the nodes along the cell boundaries. An example of a graph and a valid cell is shown in Fig. 5. The different size of the graph in Fig. 5 compared to the default graph in Fig. 4 is due to a randomization process which is applied onto the default graph to generate a variety of different graphs.

Fig. 5 Example of an extrusion profile with a cell of 4 nodes that is to be optimized by the agent. The cell is chosen randomly from all possible cells

To increase the number of possible topologies in the cell, the edges along the boundaries of the cell are split. Some splitting nodes of the cell might already exist in the graph, due to a connection onto the side of a cell from the outer graph. In this case, no additional edge splitting is required, since the specific side of the cell is already split.

A cell is considered valid if all of the following criteria are fulfilled.

• The cell shape must match the shape which the agent was trained for. In this example, the cell is a quadrilateral consisting of 4 nodes spanning the cell and 8 nodes in total along the cell boundary. This ensures that all vectors for the observation can be represented with vectors of fixed length.
• The cell shape must be convex. This ensures that edges added by the agent will always be contained inside the cell.
• The initial cell must be empty, i.e. there must be no edges or nodes in it.
• The cell must be large enough, i.e. it is checked whether the area of the cell is greater than 5% of the area spanned by the entire graph.
• The cell must initially absorb a minimum amount of energy. This is achieved by calculating the shape preservation value, which will be explained in more detail in Section 3.1.4. This ensures that the cell is deformed and it makes sense to improve the cell.
it makes sense improving the cell. The observation is composed of various geometric param-
eters and simulation responses. For the evaluation of the
This cell based method is preferred over approaches that simulation data, a section plane is generated centered along
process graph data as the observation directly. Examples the profile perpendicular to its extrusion direction. Along this
of such approaches include Graph Neural Networks (GNN) cut, the FE nodes and elements are extracted to form a new
[31], Graph Convolutional Networks (GCN) [32] and graph graph, the evaluation graph, which describes the deformation
embedding algorithms like the graph2vec algorithm [33]. and other responses of the cell at the given section in every
Those methods can parse graphs of various sizes, but are point in time [34]. Since the agent only needs information
not used in this paper, since they introduce difficulties for the about the cells behaviour, the evaluation graph only contains
given task. nodes and edges describing the deformation of the cell, but
not the rest of the graph. Figure 6 shows an example of an
• It is unclear how the action of an agent, which is usually a evaluation graph in its deformed state.
scalar or a vector, is translated into a corresponding edge Although the evaluation graph keeps track of the cells
to be activated in an arbitrary graph. behaviour for every point in time, only specific points in time
• Difficulties in training the agent on the mechanical are used for the generation of the observation. Most com-
behaviour of the entire graph would also arise due to monly these are responses in the undeformed and a specific

The observation is composed of various geometric parameters and simulation responses. For the evaluation of the simulation data, a section plane is generated centered along the profile perpendicular to its extrusion direction. Along this cut, the FE nodes and elements are extracted to form a new graph, the evaluation graph, which describes the deformation and other responses of the cell at the given section at every point in time [34]. Since the agent only needs information about the cell's behaviour, the evaluation graph only contains nodes and edges describing the deformation of the cell, but not the rest of the graph. Figure 6 shows an example of an evaluation graph in its deformed state.

Fig. 6 Example of an evaluation graph representing the cell's deformation. The nodes of the evaluation graph are derived from corresponding FE nodes at the center of the considered profile along its extrusion axis. The edges show the connectivity of the graph nodes similar to the FE elements connecting the FE nodes in the underlying FE model

Although the evaluation graph keeps track of the cell's behaviour for every point in time, only specific points in time are used for the generation of the observation. Most commonly these are responses in the undeformed and a specific deformed state. The deformed state corresponds to the point in time where the internal energy of the cell is maximized. This point is called the evaluation time. Specific responses are also evaluated at an individual point in time for each edge of the cell, i.e. when their respective response value is maximized.

By reducing the problem to the consideration of one cell in a graph, the structural properties and responses of the mechanical model can be reduced to single vectors of fixed length for each property. The simulation responses are processed using the evaluation graph and translated into node and edge properties of the corresponding cell with a fixed number of nodes along its cell border.

The following features define the observation that is passed to the agent for analysis of the current state:

• The edge lengths of the cell's edges. Those ensure that the agent has a sense of the size of the model. This feature allows a better assessment of the buckling tendency of the corresponding walls.
• The wall thicknesses of the cell's edges. This should help the agent to further comprehend the buckling tendency. Due to a mass constraint on the model, the wall thickness gets smaller for every inserted edge. This information helps the agent to estimate whether a new edge can be inserted without going below the minimum allowed wall thickness.
• The manufacturability of the structure. This includes a flag for an intersection check, a check for any unresolved intersections inside the graph, a flag for edge lengths and edge thicknesses of members and a flag for distances and angles between members. The exact manufacturing constraints are discussed in Section 3.1.3.
• Whether graph nodes on the boundary of the cell are connected to the rest of the graph. This is important information, since those connections might be a source of additional support for the inserted member and the cell.
• A reduced vector representation of the adjacency matrix of the cell. The adjacency matrix and the derived vector describe the connectivity of nodes in a graph. With this vector, the agent always knows the current topology of the cell [34].
• Coordinates and displacements of the graph nodes at the evaluation time. Those coordinates and displacements are given with respect to a local coordinate system of the cell and give the agent a sense of the scale of the cell and its deformation. Only the graph nodes along the border of the cell are evaluated, since those are sufficient to analyse the stiffness of the cell.
• The internal energy density for each edge of the cell at the evaluation time and at the individual times. It tells the agent how much energy the specific member absorbed in the crash.
• A cross sectional image (96 px by 96 px) of the cell in its undeformed and deformed state derived from the evaluation graph. This enables the agent to comprehend the entire cell deformation, independent of the mesh used in the FE model [34]. A Convolutional Neural Network (CNN) [35] is used to process the image data.

For edge properties like edge lengths, a feature vector consists of entries along the frame boundary and all edges the agent can theoretically insert. For all edges that are already inserted into the cell, the structural responses are written into the vector. For all other edges not yet inserted, a response type specific dummy value is used instead. Since added edges can intersect, the two original intersecting edges have to be removed and replaced with four split edges. The environment keeps track of those edge splittings and aggregates the responses, such that the vectors remain of fixed length. An example of such an aggregated edge feature vector mapped onto the cell is given in Fig. 7. The inserted edges in the cell intersect, but the intersection is not resolved, i.e. no node is added at the intersection point, to emphasize that the feature vector length remains constant. Since the number of nodes of the cell is fixed to the number of nodes along the cell border and inserting edges will not introduce new nodes into the cell as shown, node feature arrays like nodal displacements consist of as many entries as nodes along the border. In case the model is not manufacturable, no FE results are available. The response values from the simulations are then substituted with dummy values for the observation.

Fig. 7 Example of an aggregated internal energy feature vector, where each of its entries is mapped onto its corresponding edge of the cell. The semi-transparent box highlights the fully constrained side of the model

All features that are not inherently bound to any known interval are normalized, e.g. the edge lengths, coordinates, displacements and energies. The mean and standard deviation, which are used when normalizing, are not known before the training. Therefore, the normalization is done by calculating the running mean and running standard deviation of the observation value distributions while training.
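The following sketch illustrates both mechanisms, a fixed-length edge feature vector padded with dummy values and a running mean/standard deviation normalizer that is updated during training. It is an illustration of the idea, not the authors' implementation.

import numpy as np

def edge_feature_vector(values_by_edge, candidate_edges, dummy=0.0):
    # one entry per insertable edge; edges not present in the cell get the dummy value
    return np.array([values_by_edge.get(e, dummy) for e in candidate_edges])

class RunningNormalizer:
    def __init__(self, size, eps=1e-8):
        self.mean = np.zeros(size)
        self.var = np.ones(size)
        self.count = eps
        self.eps = eps

    def update(self, x):
        # Welford-style incremental update of mean and variance
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.var += (delta * (x - self.mean) - self.var) / self.count

    def normalize(self, x):
        return (x - self.mean) / np.sqrt(self.var + self.eps)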
3.1.3 Model and load case generation

During the training, the agent should be able to analyze as many different deformation modes of various cells as possible. This enables the trained agent to perform useful actions for similar deformation states in GHT optimizations.

The frame model shown in Section 3.1.1 is used as a foundation to derive randomized models for training the agent. Based on the model's graph, about 30000 different graph topologies from previous GHT optimizations were identified to serve as the basis for the randomized model generator. A randomly selected edge segment along the outer frame of the structure is constrained and another edge segment is chosen as the impact segment. An edge segment is a series of edges with the same orientation. The velocity of the spherical impactor is also randomly selected. Here, the sphere can either move perpendicular towards the selected edge segment, or the sphere can move in the direction of the center of gravity of the graph. With additional random adjustment of the size, the wall thickness and the orientation of the graph in space, an unlimited number of models and different load cases can be built, which will result in a large variety of different deformation modes. Figure 8 shows a randomized model in a random load case.

Fig. 8 Example of a randomized graph and load case built by the environment. The semi-transparent box highlights the fully constrained side of the model

The manufacturing constraints used for training the agents are given in Table 1. Those are derived from [30] and adapted to suit the environment. The edge distance of two edge pairs is calculated for all edge pairs that do not share a node. Since the edges in the cell border are split, their distance to other edges changes. This results in smaller distances than in the unsplit cell, although it is geometrically identical. For this specific reason, the minimum distance between edges d is set to the small value of 4 mm.

Table 1 Manufacturing constraints used for training the agent

Edge length l               l ≥ 10 mm
Distance between edges d    d ≥ 4 mm
Connection angle α          α ≥ 15°
Wall thickness twall        1 mm ≤ twall ≤ 4 mm

3.1.4 Reward function

The definition of the reward function is essential because it controls what the agent learns in an RL problem. As mentioned earlier, the goal for the agent is to improve the overall stiffness of the model by increasing the stiffness of a cell. For this purpose, a shape preservation measure Ã is presented, which is derived from the deformation of the structure based on the evaluation graph. The evaluation graphs in the undeformed state and in the deformed state at the evaluation time step are superimposed at their respective center of gravity. If the structure preserves its shape, then both cell boundaries lie exactly on top of each other. Otherwise, difference areas emerge, which are summed up and normalized. This process is independent of any inserted edges and works for empty cells and for cells with edges in it. Since rigid body translation and rotation have no influence on the shape preservation of the cell, these are eliminated when creating the evaluation graph. Figure 9 shows the superposition of the evaluation graphs in more detail.

Fig. 9 The superposition of an empty deformed and undeformed cell to calculate the shape preservation value Ã

The following original formula for calculating the shape preservation measure Ã is given by

Ã = ( Σ_j A^(j)_{t_eval} ) / ( A_{t_0} + A_{t_eval} ) .   (1)

The area spanned by the evaluation graph at a given simulation point in time is given by A_t. Difference areas between the superimposed evaluation graphs of the undeformed state (at time t_0) and the deformed state (at the evaluation time t_eval) are given by A^(j), where j is the index of the considered difference area.

The shape preservation measure value is bound between 0 and 1 due to the normalization in Ã with A_{t_0} + A_{t_eval}. A value of 0 implies that the shape of the cell did not change from the initial state to the deformed state and a value of 1 means that the structure collapsed into a point, i.e. the structure is infinitely weak. In the case when Ã = 0, no difference areas emerge, setting the numerator to 0 in the formula for Ã. For the collapsed cell, Ã = 1 is true due to the fact that the cell has a cross sectional area of A_{t_eval} = 0. Then only one difference area A^(1) emerges. With A^(1) = A_{t_0} one can see that the value of Ã is indeed 1.
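Assuming the closed cell boundaries of the evaluation graph are available as polygons, Eq. (1) could be computed for an empty cell as sketched below. The shapely package is used here only for brevity and is not one of the modules listed in Section 2.2; rigid body rotation is assumed to be removed beforehand when the evaluation graph is created.

from shapely.geometry import Polygon
from shapely.affinity import translate

def shape_preservation(boundary_t0, boundary_teval):
    p0 = Polygon(boundary_t0)      # undeformed cell boundary
    p1 = Polygon(boundary_teval)   # deformed cell boundary at the evaluation time
    # superimpose both outlines at their respective centers of gravity
    p1 = translate(p1,
                   xoff=p0.centroid.x - p1.centroid.x,
                   yoff=p0.centroid.y - p1.centroid.y)
    # the symmetric difference collects all difference areas A^(j)
    diff_area = p0.symmetric_difference(p1).area
    return diff_area / (p0.area + p1.area)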
One advantage of using such a metric is that its values are not subject to significant noise, unlike section forces or other crash relevant responses, as they are entirely based on the displacements of the cell nodes in the evaluation graph. In addition, the values of the metric are normalized so that they are model and load case independent. The simple behavior of the metric simplifies the training for the agent.

This measure is also used to identify if an empty cell is a candidate for optimization with the agent. An Ã ≥ 0.03 identifies the cell as being deformed and therefore it makes sense to optimize it. A value of Ã ≤ 0.01 terminates the episode, since the deformation of the cell is small.

With this measure of how well the current cell performs, it is possible to reward or punish the agent by how much the cell performance improved compared to the cell from the previous step for a given episode. A relative improvement is considered instead of the absolute improvement to ensure that all resulting rewards have a similar range independent of the actual load case and deformation of the cell. The relative improvement is given by the formula

δ = clip( (Ã_{s_{i−1}} − Ã_{s_i}) / Ã_{s_0} , −3, 3 ) ,   (2)

where s_i refers to the evaluation of the metric value at the current environment step. The clipping is just a safety precaution. It is clipped for numerical stability to avoid any outliers generating too small or large rewards.

Using this improvement, the reward function r evaluates to

r = p + δ    if the model is manufacturable,
r = p − 1    else,   (3)

which is to be maximized by the agent. p is a penalty defined by the user to penalize every step through the environment. In this work it is set to p = −0.05, which means that the added edge by the agent must at least generate a 5% increase in performance to be considered useful. In case the model derived from the graph with the newly added edge is not manufacturable, the agent is penalized with a value of p − 1.
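Eqs. (2) and (3) translate into a few lines. The sketch below assumes that the shape preservation values of the previous step, the current step and the initial cell are passed in directly.

import numpy as np

def reward(A_prev, A_curr, A_initial, manufacturable, p=-0.05):
    # penalize non-manufacturable proposals with p - 1
    if not manufacturable:
        return p - 1.0
    # relative, clipped improvement of the shape preservation value, Eq. (2)
    delta = np.clip((A_prev - A_curr) / A_initial, -3.0, 3.0)
    # constant step penalty plus improvement, Eq. (3)
    return p + delta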

3.1.5 Training of the agents

Before the agents can be trained, it must be determined which algorithm will be used. In this work, the PPO (proximal policy optimization) algorithm [36] is used, since it is a state-of-the-art algorithm that has a good sample complexity and outperforms other algorithms. It is also able to handle discrete actions, i.e. discrete edge numbers that translate into the edges that are inserted into the cell, and continuous observation spaces.

It is necessary to determine the best hyperparameter settings, such as the architecture of the underlying ANN, to achieve the best possible performance with the PPO algorithm. Since the best hyperparameter settings are not known beforehand, one would usually do a large hyperparameter tuning, which is too time consuming for the given task. Instead, 12 agents with different hyperparameters are trained simultaneously on the identical task and after training the best agent is manually picked. Table 2 shows the parameters and their set of values which is sampled from to generate the batch of 12 agents. The parameters chosen are the ones that are expected to significantly impact the training behaviour of the agents. Other parameters exist, but are not shown, since they have not been tuned and are set to the default values used in stable-baselines3.

Table 2 Hyperparameters of the PPO algorithm and their sets of values that are sampled from to generate the batch of agents to train

PPO parameter             Possible values
Learning rate             {7E−5, 1E−4, 3E−4, 7E−4}
Batch size                {32, 64, 128}
Gamma                     {0.9, 0.95, 0.99, 1.0}
Number of rollout steps   {256}¹

¹ This value is set constant in this hyperparameter tuning and differs from the default value of 2048 in stable-baselines3. The value is set smaller compared to its default value to update the agent more frequently, since the collection of rollouts is computationally expensive

The learning rate is a hyperparameter that determines the step size at which the parameters of the policy network are updated during training via stochastic gradient ascent algorithms. The batch size parameter specifies how many samples of observations and actions will be used to compute the policy gradient and update the policy network. Gamma is a discount factor used in RL algorithms to balance the importance of immediate and future rewards. A Gamma of 0 results in an agent with myopic behaviour, i.e. an agent that does not look into the future and only uses its current state for decision making. A Gamma of 1 means that future rewards are as important as the immediate reward. Using a value close to 0 would help in this environment due to the high uncertainty in the crash simulations. However, this approach might only find mediocre optima. A value closer to 1 would help in finding better optima due to the future planning that is involved in the decision making. In the end, this might result in a worse performance, depending on the unknown uncertainty of the environment and how well the agent can predict the future. The number of rollout steps determines the number of steps taken in the environment before computing the policy update.

The possible hyperparameter values for the policy that is sampled from to generate the batch of 12 agents are given in Table 3.

Table 3 Hyperparameters of the policy and their set of values that is sampled from to generate the batch of agents to train

Policy parameter        Possible values
Shared layer neurons    {64, 128, 256, 512}
Actor network layers    {2, 3}
Actor network neurons   {128, 256, 512, 1024}
Critic network layers   {2, 3}
Critic network neurons  {128, 256, 512, 1024}
CNN architecture        {Nature CNN¹}

¹ The Nature CNN has been introduced in [37] and has been successfully trained to play classic Atari 2600 games with reinforcement learning methods

The policy parameters define the size of the underlying neural network. Since the PPO is considered an actor-critic algorithm, a neural network for the actor and a neural network for the critic is built with the respective number of hidden layers and neurons. Both the actor and the critic network share the same preceding layer with the specified number of neurons. For the CNN architecture, the default CNN in stable-baselines3 is used, which is the Nature CNN. The hyperparameters of the algorithm and the policy resulting in the best agents will be shown in Section 4, where the top performing agents are selected and evaluated.

Three different cell types are trained. These include cells with 3 sides, with 4 sides and with 5 sides. Every cell type needs an individual agent for training, since the action and observation space is determined by it. In total, 36 agents are trained in parallel.

In order to expedite the training process, parallel computation across eight environments for each agent is used. This approach ensures a rapid provision of data points, enhancing the efficiency of the training phase. With these settings, training a single agent on a compute cluster takes approximately a month. This high computation effort is justified by the fact that the agent only needs to be trained once and can then be used in the new heuristic for a wide variety of problems without any significant additional investment.
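A training setup along these lines could look as follows with stable-baselines3, here filled with the values reported for the 4 sided cell in Tables 2, 4 and 5. The environment factory make_cell_env, the training budget and the exact net_arch syntax (which depends on the stable-baselines3 version) are assumptions, not the authors' code.

from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import SubprocVecEnv

# eight parallel copies of the training environment of Section 3.1
env = SubprocVecEnv([make_cell_env for _ in range(8)])

model = PPO(
    "MultiInputPolicy",        # dictionary observation with feature vectors and images
    env,
    learning_rate=1e-4,
    batch_size=32,
    gamma=0.99,
    n_steps=256,               # rollout steps per environment before each policy update
    # shared layer of 64 neurons, three actor and three critic layers of 128 neurons each
    policy_kwargs=dict(net_arch=[64, dict(pi=[128, 128, 128], vf=[128, 128, 128])]),
)
model.learn(total_timesteps=1_000_000)
model.save("rls_agent_4_sided")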

3.2 Integration of the agents and the environment into the GHT process

In the previous sections, the structure of the training environment is explained. The environment can almost be directly used as the new RLS heuristic. Differences between the environment used for the RLS heuristic and the training environment are
• the selection of actions, which is now done by the trained agent based on the normalized observations,
• the FE simulation model to optimize, which is given in the GHT process and not randomly generated by the environment and
• the cell selection process.

While in training mode, the cell to optimize was chosen randomly from the set of valid cells inside the graph. In the heuristic counterpart, a cell selection scheme is deployed. According to this scheme, the shape preservation measure Ã for all empty cells within the current graph is calculated. The more a cell is deformed, the more suitable it is to be optimized. At the same time, larger cells should be preferred, since they are more likely to show an influence on the global structural behavior. Therefore, the shape preservation measure of the cell is weighted with the corresponding spanned area of the cell. This results in the importance λ of a cell

λ = Ã · A .   (4)

The cell with the highest λ in the given graph will be chosen for the heuristic activation.

It can happen that the agent worsens the cell performance compared to the previous step due to a wrong decision. Therefore, the heuristic chooses the structure that generates the best shape preservation value of the cell over all steps in the episode.
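A sketch of this selection scheme, assuming the valid empty cells are available together with their shape preservation values and spanned areas:

def select_cell(cells):
    # cells: list of (cell, shape_preservation_value, spanned_area) tuples
    best_cell, best_importance = None, 0.0
    for cell, shape_pres, area in cells:
        importance = shape_pres * area   # Eq. (4): lambda = A_tilde * A
        if importance > best_importance:
            best_cell, best_importance = cell, importance
    return best_cell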
4 Evaluation and selection of the agents in the training environment

In the following, the best performing agents out of the 12 trained agents per cell type are shown. Tables 4 and 5 show the hyperparameters for the best performing agents for every cell type. The parameters given are the ones that have been tuned in the hyperparameter tuning.

Table 4 PPO hyperparameters of the best performing agents that differ from the stable-baselines3 defaults

PPO parameter             3 sided cell   4 sided cell   5 sided cell
Learning rate             7E−05          1E−04          7E−05
Batch size                128            32             128
Gamma                     1.0            0.99           1.0
Number of rollout steps   256            256            256

Table 5 Policy hyperparameters of the best performing agents that differ from the stable-baselines3 defaults

Policy parameter        3 sided cell   4 sided cell   5 sided cell
Shared layer neurons    512            64             256
Actor network layers    2              3              3
Actor network neurons   512            128            512
Critic network layers   3              3              2
Critic network neurons  128            128            1024
CNN architecture        Nature CNN     Nature CNN     Nature CNN

The corresponding training history for the best performing agents is given in Figs. 10 and 11 with a mean return and a mean episode length respectively. In the rollout plots, data is collected and averaged over the last 100 training episodes. Since those episodes are determined randomly and the actions from the agent are sampled from the probability distribution given by the actor network, values at different steps are not directly comparable. In the evaluation plots, 15 different models and load cases are averaged and compared between steps. Those models and load cases are always the same for a specific cell type and the agent always chooses the most promising action according to its current policy. This allows for a better comparison of the agent's performance between the steps.

It can be seen from the plots that all agents were able to achieve a significant improvement of the return, especially in the first steps. The mean return is higher for cell types with a smaller number of cell sides. The episode length for all agents after training is close to 2, which approximates the number of inserted edges in the graph. It is not the exact number of inserted edges, because it is possible that the agent
Reinforcement learning based agents... 1763

Fig. 10 Mean return on rollout


and evaluation data for agents
training on three different cell
types

Since the mean return only measures the performance quantitatively but not qualitatively, examples of initial cell deformation behaviour compared to their improved counterparts are shown in Fig. 12. The figure includes the shape preservation value Ã for the given examples.

In all examples, the agent manages to improve the structure performance in terms of the shape preservation metric. It is consistent with the episode length evaluation that all agents in these examples insert one or two edges. All example structures can be manufactured, which is not granted due to the high number of possible invalid edge combinations in the cell. Since these are only non-representative examples, it must be mentioned that the agents often, but not always, make reasonable decisions. It is noticeable that the shape preservation value is always close to 0 for the final cell. While this is desirable, it is not always possible, e.g., if the cell has to absorb a lot of energy due to a direct impact of the sphere on the cell.

Fig. 11 Mean episode length on rollout and evaluation data for agents training on three different cell types

Fig. 12 Examples of cell optimizations performed by the agents for different cell types, where SC is the selected cell in a given graph. The circled numbers indicate the order in which the agents inserted the edges. The semi-transparent box highlights the fully constrained side of the model

The final 3 sided cell has a shape preservation value Ã = 0.02. Only one edge is inserted by the agent. For the 4 and 5 sided cells, the intermediate steps and the corresponding cells where only one edge is inserted are also discussed. For the 4 sided cell, the agent has an Ã = 0.016 with only the diagonal edge inserted first. This diagonal edge supports the cell stiffness by utilizing the deformation space along the compression direction of the deformed cell. From an engineering point of view, it might make more sense to support the cell in the tension direction to avoid buckling of the edge. Although the diagonal edge does not buckle in this example, the agent has learned to avoid the risk of buckling and inserts a supporting second edge, which is reasonable for cells that absorb more energy. This means that the agent failed to recognize that the episode could have been terminated earlier. For use in a later optimization, where the overall stiffness of a structure is to be improved, a correct recognition of the terminal step would have been advantageous, as the overall wall thickness would have remained larger due to the mass constraint.

Similar behaviour can be observed for the cell with 5 sides. The shape preservation value, where only the first edge connecting the blue and orange edge segment with the pink and red edges is inserted, is Ã = 0.031. This is only slightly worse than the shape preservation of the final cell. Although the structural performance improved to Ã = 0.026 with the final cell, the reward received for this action and final cell is slightly negative due to the penalization of inserting an edge, implying that the agent should not have inserted the second edge.
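The observation that an Ã-improving insertion can still earn a slightly negative reward is a consequence of the edge penalty. A heavily simplified illustration is given below; this is not the exact reward definition used for training, only a form that reproduces the behaviour described for the 5 sided cell, with an invented penalty value.

```python
def illustrative_reward(prev_shape_preservation, new_shape_preservation,
                        edge_inserted, edge_penalty=0.01):
    """Reward = improvement of the shape preservation value minus a fixed
    penalty for every inserted edge (illustrative values only)."""
    improvement = prev_shape_preservation - new_shape_preservation
    return improvement - (edge_penalty if edge_inserted else 0.0)

# 5 sided cell example from the text: Ã improves from 0.031 to 0.026,
# but the small gain does not outweigh the edge penalty.
print(illustrative_reward(0.031, 0.026, edge_inserted=True))  # -0.005, slightly negative
```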
5 Performance of the agents in topology optimization processes

In this section the RLS heuristic is used in practical GHT optimizations. When dealing with an optimization task with the GHT, it is important to note that the agent with the RLS heuristic can insert multiple edges sequentially in one heuristic call. Conventional heuristics can insert or delete only one edge per call. This gives the RLS heuristic an advantage that must be factored into the evaluation.
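To make the role of the RLS heuristic in this process explicit, a simplified sketch of one GHT iteration is given below. The function names (heuristic.apply, run_crash_simulation) and the selection logic are placeholders, not the actual GHT implementation; the point is only that every design proposal costs one FE simulation and that the RLS heuristic may change several edges within a single call.

```python
def ght_iteration(designs, heuristics, n_keep=5):
    """One simplified GHT iteration: every heuristic proposes a new design for
    every current design, each proposal is evaluated by a crash simulation
    (one function call), and the best n_keep designs survive."""
    proposals = list(designs)
    for design in designs:
        for heuristic in heuristics:
            candidate = heuristic.apply(design)   # the RLS call may insert several edges
            if candidate is None:                 # heuristic found no valid change
                continue
            candidate.objective = run_crash_simulation(candidate)  # one FE function call
            proposals.append(candidate)
    proposals.sort(key=lambda d: d.objective)     # minimization of the displacement
    return proposals[:n_keep]
```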
5.1 Optimization of a frame model

The load case and the frame model for the GHT optimization have already been shown in Fig. 4. The objective of the optimization is to minimize the displacement of the rigid sphere in the global y-direction Δy, while satisfying manufacturing constraints and keeping the mass m of the model constant. The global y-direction is identical to the local y-direction ŷ of the profile shown in Fig. 4. It is important to notice that the optimization is performed with heuristic activations only and no shape optimization is done at any point. The number of designs passed into the next generation is set to 5. A maximum of 10 iterations is allowed. In the following, the optimization problem is formulated:

min Δy
subject to  l ≥ 10 mm,
            d ≥ 10 mm,
            α ≥ 15°,
            1 mm ≤ t_wall ≤ 4 mm,
            m = 20.25 g.    (5)
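The constraints of problem (5) can be checked independently of the solver. The small helper below is hypothetical and not part of the GHT code; it only makes the feasibility test explicit, where l, d and α denote the geometric quantities restricted by the manufacturing constraints and t_wall the wall thickness.

```python
def is_feasible_frame_design(l_min, d_min, alpha_min_deg, t_wall, mass,
                             mass_target=20.25, tol=1e-6):
    """Feasibility check for optimization problem (5) of the frame model."""
    return (l_min >= 10.0 and                # l >= 10 mm
            d_min >= 10.0 and                # d >= 10 mm
            alpha_min_deg >= 15.0 and        # alpha >= 15 degrees
            1.0 <= t_wall <= 4.0 and         # 1 mm <= t_wall <= 4 mm
            abs(mass - mass_target) <= tol)  # m = 20.25 g
```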


In Fig. 13, the optimization path that leads to the optimized design is shown. More simulations and iterations were carried out than shown, but those did not lead to a better design. It can be seen that the RLS heuristic found a valid design in iterations 1 to 6. The design proposed by RLS in iteration 1 is part of the path to the optimized design. In iteration 7, the RLS heuristic was not successful in finding a valid design, due to either cells not reaching the threshold of the shape preservation value Ã or cells making the graph not manufacturable due to the edge splitting process. All designs of the optimization fulfill the constraints.

Fig. 13 Optimization path that leads to the optimized frame design with the RLS heuristic active

The optimized design reduces the objective Δy from 72.36 mm to 7.46 mm. The greatest impact on structural performance comes from combining the RLS and DNW heuristics from iterations 1 to 3 by changing the shape of the overall structure to a triangular shape. This is initiated by the RLS heuristic in iteration 1, where a diagonal edge is inserted into the graph.

Comparing the findings to an optimization without the RLS heuristic active shows that the GHT finds a much more complex structure with a worse objective value. This can be seen in a side-by-side comparison of the deformed states of the initial design, the optimized design with the RLS heuristic active and the optimized design without the RLS heuristic in Fig. 14.

Fig. 14 Comparison of the initial and the optimized frame designs with and without the RLS heuristic active in the optimization. The rigid impactor is shown in gray

It can be observed that the edges must be much thinner in the optimized design without the RLS heuristic active in order to keep the mass of the model constant. The objective of the optimized design without the RLS heuristic is improved from 72.36 mm to 14.65 mm compared to the initial design. The lower performance can be attributed to the fact that none of the already existing heuristics inserted an edge diagonally in the frame. If a shape optimization was active, the GHT without the RLS heuristic would also find a similar triangular shape.

In total, 273 function calls, i.e. FE simulations, were carried out in the optimization with the RLS heuristic active. Those split up into 185 function calls from the conventional GHT heuristics and an additional 73 function calls from the RLS heuristic. In the current implementation of the interface between the GHT and the RLS heuristic, more than the strictly necessary simulations are performed. The number of function calls of the RLS heuristic can be reduced to 31 if the interface between the GHT and the RLS heuristic is optimized. In the optimization without the RLS heuristic, 230 simulations are performed.

5.2 Optimization of a rocker model

So far, the frame model has been studied, which was also used to train the agent. How the agent-based RLS heuristic performs in other models and load cases is examined in this section.

Since the agent is designed to improve the stiffness of a structure, a relevant vehicle component is selected for occupant protection in a side crash. In such side impacts, there is little deformation space until the occupant is struck. It is vital that the vehicle occupants are protected from excessive intrusion by the opposing vehicle or impactor.

Therefore, the performance of the RLS heuristic is investigated based on a model of a rocker in a side crash against a rigid pole, which is presented in Fig. 15.

The rocker is made out of aluminum, has an initial wall thickness of 3.5 mm and an extrusion length of 600 mm, resulting in a mass of 2.801 kg. The energy of a moving rigid wall is introduced into the rocker through the seat cross member. The rigid wall has a mass of 85 kg and an initial velocity in negative y-direction of 8.056 m/s.


Fig. 15 Rocker model in a side crash against a rigid pole
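For orientation (this number is not stated in the text), the kinetic energy that the rigid wall introduces into the rocker follows directly from the quoted mass and initial velocity:

E_kin = 1/2 · m · v² = 0.5 · 85 kg · (8.056 m/s)² ≈ 2.76 kJ.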

The objective is to find a topology that minimizes the displacement Δy of the rigid wall and therefore increases the stiffness of the rocker. All manufacturing constraints must be fulfilled and the mass of the model must be equal for all designs. The number of concurrent designs is set to 5 and the maximum number of iterations allowed is set to 12. No shape optimization is performed during the optimization. The exact optimization problem is formulated as:

min Δy
subject to  l ≥ 20 mm,
            d ≥ 10 mm,
            α ≥ 15°,
            1.5 mm ≤ t_wall ≤ 3.5 mm,
            m = 2.801 kg.    (6)

The design variations that lead to the optimized structure are shown in Fig. 16. Along this path, the RLS heuristic is activated twice: once in iteration 2 and once in iteration 5. In iterations 3 and 4, the agent proposed the same topology change as in iteration 5, but the activation of other heuristics resulted in a better overall performance of the structure. The design proposals of the RLS heuristic in iterations 6 and 8 increase the shape preservation value of the respective cell, but are not useful with respect to the stiffness of the rocker structure.

The performance of the structure from iteration 4 to 5 gets worse when the RLS heuristic is activated. In iteration 6, the DNW heuristic removes part of an edge added by the RLS heuristic, causing the structure to perform better in the long run. Similar to the optimization of the frame model, the combination of the RLS and DNW heuristics works well.

The optimization was able to improve the objective Δy from 68.92 mm to 29.95 mm. Comparing this to the optimization with the RLS heuristic inactive, a slightly worse improvement from 68.92 mm to 31.53 mm is achieved. A direct comparison of the deformed models is given in Fig. 17.

Fig. 16 Optimization path that leads to the optimized rocker design with the RLS heuristic active

Fig. 17 Comparison of the initial and the optimized rocker designs with and without the RLS heuristic active in the optimization


Table 6  Summary of the results of the RLS heuristic in two distinct GHT optimizations

Model    RLS active   Init. objective   Opt. objective   # RLS function calls   # RLS used
Frame    No           72.36 mm          14.65 mm         −                      −
Frame    Yes          72.36 mm          7.46 mm          73                     1
Rocker   No           68.92 mm          31.53 mm         −                      −
Rocker   Yes          68.92 mm          29.95 mm         83                     2
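Expressed as relative reductions of the objective, the numbers in Table 6 can be reproduced with a few lines (the values are hard-coded from the table; the formatting is purely illustrative):

```python
# Relative reduction of the objective, computed from Table 6.
initial = {"Frame": 72.36, "Rocker": 68.92}
optimized = {("Frame", "without RLS"): 14.65, ("Frame", "with RLS"): 7.46,
             ("Rocker", "without RLS"): 31.53, ("Rocker", "with RLS"): 29.95}

for (model, variant), value in optimized.items():
    reduction = 100.0 * (1.0 - value / initial[model])
    print(f"{model:6s} {variant:11s} {reduction:5.1f} % reduction")
# Frame  without RLS  79.8 % reduction
# Frame  with RLS     89.7 % reduction
# Rocker without RLS  54.3 % reduction
# Rocker with RLS     56.5 % reduction
```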

Although the optimized structure with the active RLS heuristic performs slightly better, the emerging pattern of the inserted supporting walls in an offset manner is similar. The optimization with the RLS heuristic was able to fill more space with this pattern successfully.

The optimization with the RLS heuristic active is based on 296 function calls. 213 of those function calls are from the conventional GHT heuristics and an additional 83 function calls are from the RLS heuristic. With an improved implementation of the interface between the RLS heuristic and the GHT, the number of function calls of the RLS heuristic can be reduced to 37. The number of function calls in the optimization without the RLS heuristic is 166.

6 Discussion and conclusion

In this paper, a novel heuristic for the topology optimization of crash structures with the GHT was presented. For this purpose, RL was used to train agents that can locally improve the stiffness of structures. Within the training environment, the agents were able to make plausible decisions about the topology of the cells. It was more difficult for the agents to differentiate whether an episode could be terminated early.

The trained agents have been used as a new RL-based heuristic in two GHT optimizations. Firstly, an optimization of a frame model was performed, where the new heuristic was able to direct the optimization to a better design compared to the optimization without the new heuristic. Secondly, an optimization of an application-oriented rocker model was performed. The differences between the designs with and without the new heuristic were smaller compared to the frame model optimization.

Table 6 summarizes the results of those two optimizations. In both optimizations the optimized structures performed better with the RLS heuristic active. The use of a new heuristic increases the number of function calls in an optimization. Especially with the RL heuristic, where after every added edge the performance of the cell must be evaluated, many additional simulations must be performed.

Given the results shown, it is valid to state that the presented heuristic is able to help the GHT in the optimization process at the cost of an increased number of function calls. The underlying agents perform reasonably from an engineering perspective with respect to the goal of stiffening the cells. It is not guaranteed that the heuristic will always improve the optimization results. The displacements in crash simulations, which are assumed to play a major role in the decision process of the agent, behave well from a mechanical point of view, and therefore one could assume that the agent's decisions are fairly robust. Further research needs to be conducted to substantiate this assumption.

Accordingly, there are some things that should be further explored in future work. To enhance the design diversity of the cells, it is useful to extend the edge splitting process with more nodes along one edge. But there are limitations to this, as very short edges are generated that do not fulfill the manufacturing constraints. Also, only one simulation model for training has been considered, with a limited amount of diversity in the load case. It is unclear how different training models will affect the performance in real optimizations. Objective functions other than stiffness were also not investigated in this paper. In crash development, force levels are often used as an optimization objective for crash load cases. An additional RL-based heuristic that makes the structure more compliant instead of stiffer could help in those optimizations.

Author Contributions
• Conceptualization: Jens Trilling, Axel Schumacher, Ming Zhou
• Methodology: Jens Trilling, Axel Schumacher, Ming Zhou
• Implementation: Jens Trilling
• Investigation: Jens Trilling
• Writing - original draft preparation: Jens Trilling
• Writing - review and editing: Axel Schumacher, Ming Zhou
• Supervision: Axel Schumacher

Funding Open Access funding enabled and organized by Projekt DEAL.

Data availability The finite element models generated and analysed during the current study are available from the corresponding author on reasonable request.

Declarations

Compliance with Ethical Standards The data used in this study was exclusively generated by the authors. No third parties were involved in the data generation. No research involving human participants or animals has been performed.

Competing interests The authors have no competing interests to declare that are relevant to the content of this article.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
