Article
BEM-SM: A BERT-Encoder Model with Symmetry Supervision
Module for Solving Math Word Problem
Yijia Zhang † , Tiancheng Zhang *,† , Peng Xie, Minghe Yu and Ge Yu
School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China;
2171991@stu.neu.edu.cn (Y.Z.)
* Correspondence: tczhang@mail.neu.edu.cn
† These authors contributed equally to this work.
Abstract: To find solutions to math word problems, some modules have been designed to check the generated expressions, but they neither take into account the symmetry between math word problems and their corresponding mathematical expressions, nor do they exploit the strength of pretrained language models in natural language understanding tasks. Moreover, designing fine-tuning tasks for pretrained language models that encourage cooperation with other modules to improve the performance of math word problem solvers remains an unaddressed problem. To solve these problems, in this paper we propose a BERT-based model for solving math word problems with a supervision module. Building on pretrained language models, we present a fine-tuning task that predicts the number of different operators in an expression in order to learn the potential relationships between problems and expressions. Meanwhile, a supervision module is designed to check incorrectly generated expressions and to improve the model's performance by optimizing the encoder. A series of experiments is conducted on three datasets, and the experimental results demonstrate the effectiveness of our model and the designs of its components.
Table 1. Examples of MWPs. Here, ✗ indicates that the mathematical expression inferred by GTS is incorrect and cannot solve the problem of Example 2, while ✓ indicates that the mathematical expression inferred by our model is correct; this expression can be used to calculate the solution of Example 2.

Example 1
  Problem:    A manuscript has 1250 words. Fang typed 32 words in the morning and another 30 words in the afternoon. How many words are left to type?
  Expression: x = 1250 − 32 − 30
  Answer:     1188

Example 2
  Problem:    A manuscript has 1250 words, and Fang types an average of 32 words per minute. After she has typed for 30 min, how many words are left to type?
  Expression: x = 1250 − 32 ∗ 30
  GTS: x = 1250 − 32 − 30 (✗)    Ours: x = 1250 − 32 ∗ 30 (✓)
Although these methods, which focus only on expression generation, have shown good performance on MWP tasks, none of them can ensure that the generated expressions are consistent with the mathematical logic of the problems, nor do they design fine-tuning tasks that adapt pretrained language models to a supervision module so that the module can better check the symmetry between the expression generated by the decoder and the problem text. For example, as shown in Table 1, Example 2 involves only a small change in the wording of the problem, but a much larger change in its logic. The GTS model simply generates the expression directly from the problem text; it fails to infer that the relationship between the quantities "32" and "30" is multiplication, and therefore infers an incorrect expression.
To address these issues, we propose a BERT-encoder model with a supervision module for solving math word problems (BEM-SM). To further confirm the correctness of the generated expression, we design a multihead-attention-based supervision module that supervises the symmetry between the generated expression and the problem text. First, we train the classifier in advance to learn the matching expression of each problem. Then, during the solution process, it checks whether each problem matches the expression generated by the decoder. When a mismatch occurs, we improve the model by optimizing the problem representations computed by the encoder that led to the incorrect solutions. In addition, to exploit the advantages of pretrained language models in natural language understanding, we adopt them to obtain contextual representations of the problems. Moreover, we propose a fine-tuning task that predicts the number of different operators in an expression and cooperates with the supervision module to identify incorrect expressions generated by the decoding module. The main contributions of this paper are summarized as follows:
• We design a multihead-attention-based supervision module to check the generated
incorrect expressions and further improve the performance of the model by optimizing
the encoder.
• We present a fine-tuning task to predict the number of different operators in each
expression, which enables the model to better understand the potential connection
between problems and expressions.
• We conducted extensive experiments on three datasets, and the experimental results show that our model performs better than the current state-of-the-art methods.
2. Related Works
2.1. Math Word Problem
The automatic solution of math word problems, as one of the tools to assist students
in learning, has received widespread attention in recent years. The development of the
MWP task is closely related to the evolution of natural language processing techniques.
Early MWP techniques mainly include rule-based matching methods [13] and statistical
learning-based methods [14], which can only solve problems with limited scenarios due to
their heavy reliance on manual labor and their poor flexibility [15]. In 2017, Wang et al. [1]
proposed the first deep neural network solver, marking the beginning of solving techniques
based on deep learning methods. After that, several deep learning solver models based on
the seq2seq structure have been proposed, such as [1–6], which adopt neural networks to directly convert problem text sequences into expression sequences. Liu et al. [7] proposed a solver model with a seq-tree structure, which generates expression trees by decoding the tree structure top-down. Xie et al. [8] proposed another tree-structured model that generates expression trees based on a goal-driven approach. This tree decoding approach
has better generation results than decoding in sequence form. Zhang et al. [9] combined
the advantages of [8] and proposed a graph-tree model to convert problem information
into a graph structure to encode the quantitative relations in the form of a graph structure
for rich problem representations. Zhang et al. [10] proposed a teacher–student network
with multiple decoders to guide the inference of knowledge, in which a multiple decoding
structure composed of basic and perturbation decoders could obtain multiple solutions.
Wu et al. [11] designed a knowledge-aware solution model based on the seq-tree structure,
which combined commonsense knowledge from an external knowledge base in the problem
encoding part. Lin et al. [12] proposed a hierarchical solution model to understand
and analyze the problem from a word–sentence–problem level, and adopted a pointer
generation network at the decoder to guide the model to replicate the existing information
and infer additional information.
3. Model
3.1. Problem Statement
The input of a math word problem solver is a text sequence of length n, denoted as P = {p_1, p_2, ..., p_n}, where p_i is either a natural-language word or a number. The output is a mathematical expression of length m, denoted as A = {a_1, a_2, ..., a_m}, where a_i belongs to one of the following three parts. The first part, denoted V_num, consists of the numbers that appear in the problem text. The second part, denoted V_con, consists of external auxiliary constants that may be needed to solve math word problems, such as 1 and π. The third part, denoted V_op, consists of operators such as '+', '−', etc.
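As a concrete illustration, the sketch below (our own; the variable names and the choice of constants and operators are assumptions, not the authors' released code) assembles the decoder's output vocabulary from these three parts for Example 2 of Table 1.

```python
# Hypothetical sketch of the decoder's output vocabulary for Example 2 in Table 1.
# V_num: numbers copied from the problem text, V_con: auxiliary constants, V_op: operators.
problem = ("A manuscript has 1250 words, and Fang types an average of 32 words per minute. "
           "After she has typed 30 min, how many words are left to type?")

V_num = ["1250", "32", "30"]        # quantities extracted from the problem text
V_con = ["1", "3.14"]               # external auxiliary constants such as 1 and pi
V_op  = ["+", "-", "*", "/", "^"]   # mathematical operators

# The decoder predicts tokens a_1, ..., a_m from the union of the three parts.
output_vocab = V_op + V_con + V_num
target_expression = ["-", "1250", "*", "32", "30"]  # prefix form of x = 1250 - 32 * 30
assert all(tok in output_vocab for tok in target_expression)
```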
Figure 1. Overall architecture of the BEM-SM (The numbers in the Decoder Module are used to
explain how our decoding process is carried out).
Then, the overall representation vector $\bar{Z}$ of the problem is obtained by averaging the word vectors in matrix Z, that is, by taking the average over all columns of Z:

$\bar{Z} = \frac{1}{n} \sum_{i=1}^{n} z_i \qquad (2)$
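A minimal PyTorch-style sketch of Equation (2), assuming the encoder is a MacBERT checkpoint loaded with the transformers library; the mask handling is our own simplification and is not taken from the paper.

```python
import torch
from transformers import BertTokenizer, BertModel

# Hypothetical encoding step: obtain word vectors Z and their mean Z_bar (Equation (2)).
tokenizer = BertTokenizer.from_pretrained("hfl/chinese-macbert-base")
encoder = BertModel.from_pretrained("hfl/chinese-macbert-base")

inputs = tokenizer("A manuscript has 1250 words ...", return_tensors="pt")
Z = encoder(**inputs).last_hidden_state          # shape: (1, n, hidden)
mask = inputs["attention_mask"].unsqueeze(-1)    # ignore padding positions
Z_bar = (Z * mask).sum(dim=1) / mask.sum(dim=1)  # average over the n word vectors
```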
We enable the encoder to recognize the mathematical relationships present in the problem by introducing the operator prediction task. The decoding module and the supervision module can thus receive a more accurate problem representation, avoiding the generation of incorrect solutions. To verify this idea, we selected fifteen problems corresponding to the equation templates "N/N", "N ∗ N/N", and "N ∗ N + N", encoded them with both the original MacBERT and the MacBERT fine-tuned with our approach, and used t-SNE to reduce the dimensionality of the resulting problem representations; the results are shown as the blue, green, and red points in Figure 2. By observing the distribution of the points corresponding to the different problems, we can see that the fine-tuned encoder better separates problem representations with different numbers of operators: problems with similar mathematical expression templates obtain more similar vector representations, while the representations of unrelated problems are pushed further apart. This makes the matching relationships between the problem representation vectors and the expression representation vectors more apparent and the concatenated feature vectors more discriminative, which reduces the classification difficulty of the supervision module.
Figure 2. Scatter chart of problem representation. Here, points of different colors represent dimen-
sionally reduced representations of problems with different equation templates.
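To make the fine-tuning targets concrete, the following sketch (our own illustration, not the authors' code; the operator set is an assumption) derives operator-count labels from an expression string; these counts serve as the regression targets of the operator prediction task.

```python
from collections import Counter

OPERATORS = ["+", "-", "*", "/", "^"]

def operator_counts(expression: str) -> list[int]:
    """Count how many times each operator appears in a mathematical expression.
    The resulting vector is used as the target of the operator prediction task."""
    counts = Counter(ch for ch in expression if ch in OPERATORS)
    return [counts[op] for op in OPERATORS]

# Templates used in the t-SNE study of Figure 2:
print(operator_counts("N/N"))      # [0, 0, 0, 1, 0]
print(operator_counts("N*N/N"))    # [0, 0, 1, 1, 0]
print(operator_counts("N*N+N"))    # [1, 0, 1, 0, 0]
```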
The decoder decomposes the problem-solving goal into sub-goals combined by an operator in a top-down recursive way. The decoder module in Figure 1 illustrates an expression tree. Here, the feature vector $q_{root}$ of the root node of the expression tree is initialized with the vector $\bar{Z}$, which reflects the overall information of the problem computed by the encoder:

$q_{root} = \bar{Z} \qquad (4)$

After that, the attention mechanism is used to compute the context vector $\hat{c}$, and the mathematical symbol prediction is carried out from $q_{root}$ and $\hat{c}$, shown as steps 1, 3, 5, 7, and 9 in Figure 1:

$\hat{c} = \mathrm{Attention}(q, Z) \qquad (5)$
$\tilde{y} = \mathrm{Predict}(q, \hat{c}) \qquad (6)$

If the predicted token $\tilde{y}$ is a number or a constant, the subtree representation is built directly from $\tilde{y}$. If the predicted token $\tilde{y}$ is an operator, the goal needs to be decomposed into left and right sub-goals, and the left sub-goal $q_{left}$ is computed from the predicted token $\tilde{y}$ of this node, the context vector $\hat{c}$, and the goal vector $q$, shown as steps 2 and 6 in Figure 1:

$q_{left} = \mathrm{LeftChild}(q, \hat{c}, \tilde{y}) \qquad (7)$

$\tilde{y}_{left} = \mathrm{Predict}(q_{left}, \hat{c}_{left}) \qquad (8)$
Then we continue to construct the expression tree with the left child node as the root node in pre-order until the predicted mathematical symbol of a node is a number or a constant. The construction of the right sibling node of that node is then carried out. In addition to the features required to build the node, this step also uses the subtree representation $t_{left}$ of the left sibling node, shown as steps 4 and 8 in Figure 1. The prediction of the right sibling node is then made from the right child goal $q_{right}$ and its context vector $\hat{c}_{right}$:

$q_{right} = \mathrm{RightChild}(q, \hat{c}, \tilde{y}, t_{left}) \qquad (9)$

$\tilde{y}_{right} = \mathrm{Predict}(q_{right}, \hat{c}_{right}) \qquad (10)$
If the predicted symbol is an operator, the goal decomposition of the node continues until the predicted symbol is a number or a constant. The subtree representation $t$ of each node is then constructed upward layer by layer; the subtree representation of an operator node is built from the subtree representations of its left and right subtrees, as shown by steps 10 and 11 in Figure 1:

$t = \mathrm{SubTreeNum}(\tilde{y}), \quad \text{if } \tilde{y} \in V_{num} \cup V_{con} \qquad (11)$

$t = \mathrm{SubTreeOp}(t_{left}, t_{right}, \tilde{y}), \quad \text{if } \tilde{y} \in V_{op} \qquad (12)$
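The following condensed sketch shows how the pre-order decoding loop ties Equations (4)–(12) together. It is a structural illustration only: the calls attention, predict, LeftChild, RightChild, SubTreeNum, and SubTreeOp stand in for the learned networks of the decoder and are not runnable as-is.

```python
# Structural sketch of pre-order expression-tree construction (Equations (4)-(12)).
def decode_node(q, Z):
    c = attention(q, Z)                    # Eq. (5): context vector for the current goal
    y = predict(q, c)                      # Eq. (6): predict a number, constant, or operator
    if y in V_num or y in V_con:
        return SubTreeNum(y)               # Eq. (11): leaf subtree representation
    # y is an operator: decompose the goal into left and right sub-goals
    q_left = LeftChild(q, c, y)            # Eq. (7)
    t_left = decode_node(q_left, Z)        # recurse on the left child first (pre-order)
    q_right = RightChild(q, c, y, t_left)  # Eq. (9): right goal also sees the left subtree
    t_right = decode_node(q_right, Z)
    return SubTreeOp(t_left, t_right, y)   # Eq. (12): merge subtrees bottom-up

# The root goal is initialized with the problem representation (Eq. (4)): q_root = Z_bar.
```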
The expression representation is obtained by applying multihead attention over the embedded expression A:

$h_A = \mathrm{MultiheadAttention}(A, A, A) \qquad (13)$

Here we average $h_A$ and concatenate it with the problem representation $\bar{Z}$. The concatenated vector is then fed into the classifier to determine the consistency between the problem representation and the expression representation:

$\bar{h}_A = \frac{1}{m} \sum_{i=1}^{m} h_{A_i} \qquad (14)$

$u = \mathrm{FC}([\bar{Z} : \bar{h}_A]) \qquad (15)$

In Equation (15), $\bar{Z}$ and $\bar{h}_A$ are the mean vectors of $Z$ and $h_A$, and $[:]$ denotes the concatenation operation.
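A minimal PyTorch sketch of the supervision module described by Equations (13)–(15); the hidden size, number of heads, and classifier layout are our own assumptions, since the paper does not release this code.

```python
import torch
import torch.nn as nn

class SupervisionModule(nn.Module):
    """Checks whether a problem representation matches an expression representation."""
    def __init__(self, hidden: int = 768, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden, num_heads, batch_first=True)
        self.classifier = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, A: torch.Tensor, Z_bar: torch.Tensor) -> torch.Tensor:
        # Eq. (13): self-attention over the expression embeddings A, shape (batch, m, hidden)
        h_A, _ = self.attn(A, A, A)
        # Eq. (14): average the attended expression vectors
        h_A_bar = h_A.mean(dim=1)
        # Eq. (15): concatenate with the problem representation and classify
        u = self.classifier(torch.cat([Z_bar, h_A_bar], dim=-1))
        return torch.sigmoid(u)  # probability that problem and expression match
```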
To improve the classification performance and to avoid errors caused by the limitations of the classifier, we use a negative sampling algorithm similar to that in [23], which provides the classifier with both the real mathematical expression A corresponding to each problem and a negative example expression A_neg generated from A. As shown in Algorithm 1, the parameter λ in the algorithm is a probability threshold between 0 and 1 that decides whether to change a mathematical symbol in A. In this paper, λ is set to 0.1.
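Algorithm 1 is not reproduced here; the sketch below captures the negative-example generation as we understand it from the description above. Details such as the replacement vocabulary and the fallback when no symbol is changed are our assumptions.

```python
import random

OPERATORS = ["+", "-", "*", "/", "^"]

def make_negative(expression: list[str], lam: float = 0.1) -> list[str]:
    """Generate a negative example A_neg by randomly perturbing symbols of A.
    Each symbol is changed with probability lam (set to 0.1 in the paper)."""
    negative = list(expression)
    for i, token in enumerate(negative):
        if random.random() < lam and token in OPERATORS:
            # swap the operator for a different one (numbers could be swapped analogously)
            negative[i] = random.choice([op for op in OPERATORS if op != token])
    if negative == expression:
        # ensure the negative example differs from A in at least one operator
        ops = [i for i, tok in enumerate(negative) if tok in OPERATORS]
        j = random.choice(ops) if ops else 0
        negative[j] = random.choice([op for op in OPERATORS if op != negative[j]])
    return negative

A = ["-", "1250", "*", "32", "30"]
A_neg = make_negative(A)
```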
In Equation (16), MSE stands for the mean squared error, $O_{pre}$ represents the number of each individual mathematical symbol in the expression predicted by the model, and $O_{truth}$ is the actual number of each individual mathematical symbol in the expression.
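A hedged PyTorch sketch of the operator-count prediction loss in Equation (16); the regression head, its input, and the function name are our own illustration rather than the paper's implementation.

```python
import torch
import torch.nn as nn

num_operator_types = 5                        # e.g., +, -, *, /, ^
op_head = nn.Linear(768, num_operator_types)  # hypothetical regression head on top of Z_bar

def operator_prediction_loss(Z_bar: torch.Tensor, O_truth: torch.Tensor) -> torch.Tensor:
    """Mean squared error between predicted and true operator counts (Equation (16))."""
    O_pre = op_head(Z_bar)                    # predicted count of each operator type
    return nn.functional.mse_loss(O_pre, O_truth.float())
```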
Loss Function for Supervision Module: The goal of the supervision module is to
determine whether the problem representation and the expression representation match,
and here our learning goal is to minimize the binary-cross-entropy loss:
$L_{supervision} = -\log P(u = 1 \mid A, \bar{Z}) - \log P(u = 0 \mid A_{neg}, \bar{Z}) \qquad (17)$
Loss Function for the Encoder–Decoder Module: The goal of the encoder–decoder module is to maximize the probability of generating the corresponding mathematical expression given the problem; in addition, under the guidance of the supervision module, the encoder should maximize the consistency between the representation $\bar{Z}$ and the answer expression A. Therefore, the loss function to be minimized is:

$L_{Encoder\text{--}Decoder} = -\log P(A \mid P) - \alpha \cdot \log P(u = 1 \mid A, \bar{Z}) \qquad (18)$
Parameter α is the weight of the loss of the supervision module during training. In
this paper, α is set to 0.05.
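Putting the two objectives together, a simplified sketch of Equations (17) and (18); the probability and generation-loss inputs are placeholders for the actual modules, and the wiring is our own interpretation.

```python
import torch

alpha = 0.05  # weight of the supervision signal in the encoder-decoder loss

def supervision_loss(p_match_pos: torch.Tensor, p_match_neg: torch.Tensor) -> torch.Tensor:
    """Eq. (17): binary cross-entropy over a positive pair (A, Z_bar)
    and a negative pair (A_neg, Z_bar)."""
    return -torch.log(p_match_pos) - torch.log(1.0 - p_match_neg)

def encoder_decoder_loss(log_p_expression: torch.Tensor, p_match_pos: torch.Tensor) -> torch.Tensor:
    """Eq. (18): expression generation loss plus the supervision term weighted by alpha."""
    return -log_p_expression - alpha * torch.log(p_match_pos)
```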
4. The Experiment
4.1. Datasets
In this paper, we use three publicly available datasets in this field as experimental
data. Among them, Math23K [1] and Ape210K [24] are Chinese datasets. MathQA [25]
is an English dataset that consists of large-scale complex English math word problems.
The Math23K dataset contains a total of 23,162 problems. Ape210K is a larger and more
complex MWP dataset, containing a total of 210,488 problems with 56,532 templates. Due
to the presence of noisy data in Ape210K, we used the Ape-clean dataset obtained by
Liang et al. [26] after data filtering, which contains 81,225 questions.
4.3. Baselines
We compared our method with some typical MWP solution models proposed so far:
• Group-ATT [5]: uses different functional multi-head attentions to extract the various relevant features of the problem.
• GTS [8]: generates expression trees in a goal-driven manner and is a widely used MWP benchmark model.
4.4. Evaluation
As with other benchmarking methods, we use the solution accuracy as an evaluation
metric. For the Math23k dataset, we use two validation methods, with the first using the
standard division given by the dataset, denoted Math23K, and the second using a fivefold
cross-validation method, denoted Math23K*. For the Ape-clean dataset, the division given
in [26] is used, which contains 79,388 training problems and 1837 testing problems. We
integrate the training set of Ape-clean and the remaining 129,263 questions from Ape210K
for fine-tuning the BERT. Furthermore, for the MathQA dataset, we use the standard
division given by the dataset, denoted MathQA, which contains 23,703 training problems
and 3540 testing problems.
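Answer accuracy is typically computed by evaluating the generated expression and comparing it with the ground-truth answer; the following sketch of that check is our own, and the numerical tolerance is an assumption chosen for illustration.

```python
def answer_accuracy(predicted_expressions: list[str], gold_answers: list[float],
                    tol: float = 1e-4) -> float:
    """Fraction of problems whose generated expression evaluates to the correct answer."""
    correct = 0
    for expr, gold in zip(predicted_expressions, gold_answers):
        try:
            value = eval(expr)   # expressions contain only numbers and operators
        except Exception:
            continue             # unevaluable expressions count as wrong
        if abs(value - gold) < tol:
            correct += 1
    return correct / len(predicted_expressions)

# Example: the two problems of Table 1
print(answer_accuracy(["1250 - 32 - 30", "1250 - 32 * 30"], [1188, 290]))  # 1.0
```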
Table 2. Comparison of the accuracy of the answers solved by the BEM-SM and the baseline method.
These results prove that our model can learn the potential relationship between problems and expressions by predicting the number of different operators, and can further improve the accuracy of the generated expressions through the supervision module.
We also conducted ablation experiments on the supervision module, and the results are shown in Tables 3 and 4, where BEM-SM w/o SM indicates that the model does not employ the supervision module. When the supervision module is not used, the performance of the model decreases for all three validation methods, which proves that the supervision module, designed to optimize the problem representation, can optimize our encoder and improve the rationality of the expressions generated by the model.
We also conducted experiments on the number of negative expressions, and the results are shown in Table 6. The model performs best when five negative expressions are generated for each correct expression.
Table 6. Expression accuracy (Ex-Acc) and answer accuracy (An-Acc) with different numbers of negative expressions generated for each correct expression.

Number of Negative Samples    Ex-Acc    An-Acc
1                             0.719     0.841
3                             0.719     0.843
5                             0.720     0.845
10                            0.715     0.842
In addition, we tested the performance of the model on problems with different expression lengths, as shown in Figure 3. The green broken line represents the proportion of problems with each expression length. It can be observed that BEM-SM exhibits the best solution performance in all cases, and the gap between our model and the other models becomes larger for long expressions (11+). To some extent, this shows that our model has more of an advantage on more complex problems.
[Figure 3 plots answer accuracy (left axis, Acc) and problem proportion (right axis, Proportion) against expression length (3-, 5, 7, 9, 11+) for GTS, Graph2Tree, and BEM-SM.]
Figure 3. Model accuracy on items of different expression lengths.
5. Conclusions
In this paper, we propose a BERT-based model with a supervision module for the
automatic solving of math word problems. We designed a multihead-attention-based
supervision module, which makes the encoder generate a more accurate problem repre-
sentation by checking the consistency between the problem representation and expression
representation to improve the solution accuracy. Based on pretrained models, we also
designed a fine-tuning task to predict the number of different operators to better ascertain
the relationship between the problems and expressions, which could improve the solution
accuracy. Our experimental results on three datasets also demonstrate the effectiveness of
our model and the design of its various components.
In the future, the supervision module's ability to improve model performance, together with its flexibility, could make it a third core component of MWP models alongside the encoding and decoding modules. In addition, designing other fine-tuning tasks to further improve the solution performance of the model is also worth studying.
On this basis, we can also try our modules and other fine-tuning tasks on other pretrained
language models.
Author Contributions: Conceptualization, Y.Z.; methodology, Y.Z.; software, Y.Z.; validation, Y.Z.,
and P.X.; formal analysis, T.Z.; investigation, Y.Z.; resources, T.Z.; data curation, Y.Z.; writing—
original draft preparation, Y.Z.; writing—review and editing, Y.Z.; visualization, Y.Z.; supervision,
T.Z.; project administration, T.Z., M.Y. and G.Y.; funding acquisition, T.Z. All authors have read and
agreed to the published version of the manuscript.
Funding: This work is supported by National Natural Science Foundation of China under Grant
(Nos.62272093, 62137001).
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: All data used in this manuscript was downloaded from Github.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Wang, Y.; Liu, X.; Shi, S. Deep neural solver for math word problems. In Proceedings of the 2017 Conference on Empirical
Methods in Natural Language Processing, Copenhagen, Denmark, 7–11 September 2017; pp. 845–854.
2. Wang, L.; Wang, Y.; Cai, D.; Zhang, D.; Liu, X. Translating a math word problem to an expression tree. arXiv 2018, arXiv:1811.05632.
3. Wang, L.; Zhang, D.; Zhang, J.; Xu, X.; Gao, L.; Dai, B.T.; Shen, H.T. Template-based math word problem solvers with recursive
neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February
2019; Volume 33, pp. 7144–7151.
4. Chiang, T.R.; Chen, Y.N. Semantically-aligned equation generation for solving and reasoning math word problems. arXiv 2018,
arXiv:1811.00720.
5. Li, J.; Wang, L.; Zhang, J.; Wang, Y.; Dai, B.T.; Zhang, D. Modeling intra-relation in math word problems with different functional
multi-head attentions. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence,
Italy, 28 July–2 August 2019; pp. 6162–6167.
6. Meng, Y.; Rumshisky, A. Solving math word problems with double-decoder transformer. arXiv 2019, arXiv:1908.10924.
7. Liu, Q.; Guan, W.; Li, S.; Kawahara, D. Tree-structured decoding for solving math word problems. In Proceedings of the
2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural
Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 2370–2379.
8. Xie, Z.; Sun, S. A goal-driven tree-structured neural model for math word problems. In Proceedings of the Twenty-Eighth
International Joint Conference on Artificial Intelligence (IJCAI-19), Macao, China, 10–16 August 2019; pp. 5299–5305.
9. Zhang, J.; Wang, L.; Lee, R.K.W.; Bin, Y.; Wang, Y.; Shao, J.; Lim, E.P. Graph-to-tree learning for solving math word problems. In
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020.
10. Zhang, J.; Lee, R.K.W.; Lim, E.P.; Qin, W.; Wang, L.; Shao, J.; Sun, Q. Teacher–student networks with multiple decoders for solving math word problem. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI-20), Yokohama, Japan, 11–17 July 2020.
11. Wu, Q.; Zhang, Q.; Fu, J.; Huang, X.J. A knowledge-aware sequence-to-tree network for math word problem solving. In
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, 16–20 November
2020; pp. 7137–7146.
12. Lin, X.; Huang, Z.; Zhao, H.; Chen, E.; Liu, Q.; Wang, H.; Wang, S. HMS: A hierarchical solver with dependency-enhanced understanding for math word problem. In Proceedings of the AAAI Conference on Artificial Intelligence, Online, 2–9 February 2021; Volume 35, pp. 4232–4240.
13. Mukherjee, A.; Garain, U. A review of methods for automatic understanding of natural language mathematical problems. Artif.
Intell. Rev. 2008, 29, 93–122. [CrossRef]
14. Liang, C.C.; Hsu, K.Y.; Huang, C.T.; Li, C.M.; Miao, S.Y.; Su, K.Y. A tag-based statistical English math word problem solver with understanding, reasoning and explanation. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI-16), New York, NY, USA, 9–15 July 2016; pp. 4254–4255.
15. Zhang, D.; Wang, L.; Zhang, L.; Dai, B.T.; Shen, H.T. The gap of semantic parsing: A survey on automatic math word problem
solvers. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 42, 2287–2305. [CrossRef] [PubMed]
16. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv.
Neural Inf. Process. Syst. 2017, 30, 1–11.
17. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805.
18. Sun, Y.; Wang, S.; Li, Y.; Feng, S.; Tian, H.; Wu, H.; Wang, H. ERNIE 2.0: A continual pre-training framework for language understanding. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 8968–8975.
19. Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A robustly optimized BERT pretraining approach. arXiv 2019, arXiv:1907.11692.
20. Lan, Z.; Chen, M.; Goodman, S.; Gimpel, K.; Sharma, P.; Soricut, R. ALBERT: A lite BERT for self-supervised learning of language representations. arXiv 2019, arXiv:1909.11942.
21. Clark, K.; Luong, M.-T.; Le, Q.V.; Manning, C.D. ELECTRA: Pre-training text encoders as discriminators rather than generators. arXiv 2020, arXiv:2003.10555.
22. Cui, Y.; Che, W.; Liu, T.; Qin, B.; Wang, S.; Hu, G. Revisiting pre-trained models for Chinese natural language processing. arXiv 2020, arXiv:2004.13922.
23. Liang, Z.; Zhang, X. Solving math word problems with teacher supervision. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI-21), Montreal, QC, Canada, 19–27 August 2021; pp. 3522–3528.
24. Zhao, W.; Shang, M.; Liu, Y.; Wang, L.; Liu, J. Ape210K: A large-scale and template-rich dataset of math word problems. arXiv 2020, arXiv:2009.11506.
25. Amini, A.; Gabriel, S.; Lin, P.; Koncel-Kedziorski, R.; Choi, Y.; Hajishirzi, H. MathQA: Towards interpretable math word problem solving with operation-based formalisms. arXiv 2019, arXiv:1905.13319.
26. Liang, Z.; Zhang, J.; Shao, J.; Zhang, X. MWP-BERT: A strong baseline for math word problems. arXiv 2021, arXiv:2107.13435.
27. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.