
Session 1: DNN for CAD MLCAD '20, November 16–20, 2020, Virtual Event, Iceland

DAVE: Deriving Automatically Verilog from English


Hammond Pearce (hammond.pearce@nyu.edu), Benjamin Tan (benjamin.tan@nyu.edu), and Ramesh Karri (rkarri@nyu.edu)
New York University, Brooklyn, USA

ABSTRACT
Specifications for digital systems are provided in natural language, and engineers undertake significant efforts to translate these into the programming languages understood by compilers for digital systems. Automating this process allows designers to work with the language in which they are most comfortable — the original natural language — and focus instead on other downstream design challenges. We explore the use of state-of-the-art machine learning (ML) to automatically derive Verilog snippets from English via fine-tuning GPT-2, a natural language ML system. We describe our approach for producing a suitable dataset of novice-level digital design tasks and provide a detailed exploration of GPT-2, finding encouraging translation performance across our task sets (94.8 % correct), with the ability to handle both simple and abstract design tasks.

CCS CONCEPTS
• Computing methodologies → Machine translation; • Hardware → Hardware description languages and compilation.

ACM Reference Format:
Hammond Pearce, Benjamin Tan, and Ramesh Karri. 2020. DAVE: Deriving Automatically Verilog from English. In 2020 ACM/IEEE Workshop on Machine Learning for CAD (MLCAD '20), November 16–20, 2020, Virtual Event, Iceland. ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/3380446.3430634

1 INTRODUCTION
In pursuit of simplifying and accelerating digital design, a machine-driven design flow with "no humans in the loop" is a long-term goal of projects such as OpenROAD¹. Typically, the starting point is human-prepared hardware specifications in a Hardware Description Language (HDL) such as Verilog. However, manually producing HDL to match a given specification (e.g. in Fig. 1) requires significant domain knowledge and is challenging to write error-free. As such, there is an opportunity for automatic translation to increase productivity and reduce the burdens on human designers. Given the successful adoption of Machine Learning (ML) throughout the Integrated Circuit (IC) Computer-Aided Design (CAD) flow (e.g., [7, 14, 20]), we are motivated to investigate if state-of-the-art ML can help in even earlier design stages.

ML has recently made great strides in Natural Language Processing (NLP). Advances in Deep Learning (DL) have included new architectures such as LSTMs [15], RNNs [8], and Transformers [18]. These architectures have led to models such as BERT [3] and GPT-2 [12] which demonstrate capability in language modelling, language translation (e.g., English to French), reading comprehension/understanding (e.g., answering questions from the CoQA [13] dataset), and information storage/retrieval. In fact, GPT-2 made headlines [6] for initially being "too dangerous" to release given the "quality" of its text generation. Can we harness this power to produce hardware from task descriptions (like in Fig. 1)?

Towards the goal of fully automated design from natural language, we investigate the adaptation of a pre-trained natural language model to perform English to Verilog "translation". Using transfer learning [10], we fine-tune the recently presented GPT-2 for this task by training it on a custom dataset of Task/Result pairs, as in Fig. 1. The tasks are somewhat akin to novice-level "textbook" problems (i.e., similar to those found in a classic textbook [17]). We validate our approach by presenting a set of "unseen" tasks to translate, and measure the quality of the output. Our contributions are:
• DAVE, a pre-trained GPT-2 model that can translate natural language into a Verilog implementation.
• A method to automatically generate a large quantity of English specification/Verilog pairs for fine-tuning DAVE.
• Exploration and evaluation of fine-tuning DAVE.
• Rating DAVE in translating complex descriptive tasks besides those presented in simple prescriptive forms.

The rest of the paper is as follows. Section 2 provides background and discusses related work. Section 3 describes our experimental approach and dataset preparation. Section 4 presents the results of our experimental investigation. Section 5 concludes.

2 BACKGROUND AND RELATED WORK
ML-CAD. ML techniques, including DL, have shown promising results across numerous applications, including across the CAD domain. Recent work spans the design flow, from early-stage hardware cost estimations [14], through logic synthesis [20], and physical

1 https://theopenroadproject.org/

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
MLCAD '20, November 16–20, 2020, Virtual Event, Iceland
© 2020 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-7519-1/20/11...$15.00
https://doi.org/10.1145/3380446.3430634

TASK: Write sequential code for a call button (e.g., in an airplane or hospital). If the call button 'b' is pressed (= 1) then the call light 'l' should turn on (= 1). The output call light 'l' should turn off (= 0) when the synchronous cancel button 'r' is pressed (= 1).
RESULT:
// assume clock clk
reg l;
always @(posedge clk) begin
    if (r) begin
        l <= 0;
    end else if (b) begin
        l <= 1;
    end
end

Figure 1: Example “Task” and Corresponding Verilog
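To make the semantics of this example concrete, the register's clocked behaviour can be modelled in a few lines of Python (an illustrative cross-check, not part of the paper's tooling; the function name is invented):

```python
def call_light_step(l: int, b: int, r: int) -> int:
    """One posedge-clk update of the call light 'l', mirroring the
    Verilog of Fig. 1: the synchronous cancel 'r' wins over button 'b'."""
    if r:
        return 0        # cancel button pressed: light turns off
    if b:
        return 1        # call button pressed: light turns on
    return l            # otherwise the light holds its last value

# The light latches on after 'b' is released, until 'r' cancels it.
l = 0
l = call_light_step(l, b=1, r=0)   # press call button  -> light on
l = call_light_step(l, b=0, r=0)   # release the button -> light stays on
l = call_light_step(l, b=0, r=1)   # press cancel       -> light off
```

Note that, as in the Verilog, the reset branch is checked first, so pressing cancel and call together still clears the light.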


design [7]. We explore the use of transfer learning [10] to teach a DL-based model to produce Verilog by framing it as a machine translation problem. Transfer learning provides the ability to learn new tasks without large quantities of labelled data in a target domain.

GPT-2. We use GPT-2 [12] as our starting point, given its state-of-the-art performance in zero-shot task settings. GPT-2 is based on the decoder part of the Transformer, a neural network encoder-decoder architecture with a self-attention mechanism [18]. At the core of the GPT-2 approach is language modelling, which can be framed as unsupervised distribution estimation from some set of examples (x_1, x_2, ..., x_n), where each example is composed of variable-length sequences of symbols (s_1, s_2, ..., s_n) [12]. This statistical model of language is thus the joint probability distribution of the symbols in the language (as the product of the conditional probabilities for each symbol given the preceding sequence [1]). Put simply, the model learns to answer the following: given some sequence of symbols, what is the most likely next symbol in the sequence?

Different tasks can be specified in a language itself, e.g., {"translate to french", "english text", "french text"} [12]. Radford et al. speculate that a model with sufficiently large capacity can learn to perform tasks demonstrated in natural language without explicit supervision. In other words, given a general system which produces p(output|input), a condition can be introduced to model some task p(output|input, task). By training GPT-2 on a large, unlabelled dataset (∼8 million webpages), Radford et al. demonstrated that the trained model could perform well on numerous tasks without fine-tuning. The trained model then provides a good starting point for performance in specific tasks following fine-tuning [11]. Fundamentally, GPT-2's pre-trained, implicit capability to process natural language can be directed towards specific tasks. We attempt to harness this capability by fine-tuning GPT-2 for translating natural language descriptions to Verilog.

Natural Language → Code. The challenges in translating specifications into computer code have driven research in natural language programming [9]. Recent work has shown that there is a finite limit to the number of unique ways one can express certain programming structures (e.g. for-loops) in natural language, and as such it is possible to extract this information and transform it into its corresponding computer code [9]. Other related works use NLP techniques, including rule-based processing, for formal system modeling [4], generating hardware assertions [5], and for enhancing documentation by automatically extracting software development tasks and associating them with the relevant paragraphs [16]. While showing promising results, there are limitations on how flexible the natural language descriptions can be with respect to structure. Earlier work involves designing separate components to perform specific tasks such as identifying "steps", "loops", and "comments" from natural text [9]. To our knowledge, DL techniques to generate HDL from natural language have not been explored.

3 FINE-TUNING GPT-2 FOR VERILOG
3.1 Problem definition
In this work, we focus on an early-stage CAD problem: interpreting a high-level, informal description of functionality and producing the corresponding concrete specification. For small designs, designers can craft an RTL specification directly after identifying the necessary inputs, outputs, and the relationships between them from a short description of a task. While previous works use algorithmic approaches such as parse-tree generation and sub-tree matching [21] to identify the salient elements of the natural language description for populating templates, we re-cast the problem holistically as translation. As we describe next, we prepare examples of task descriptions with varying descriptiveness, and examine GPT-2's ability to produce Verilog after transfer learning [10].

3.2 Dataset Preparation
In this work, we fine-tune GPT-2 to produce DAVE, aiming for the ability to translate natural language (i.e., English) into Verilog. GPT-2 is designed to process contiguous text sequences, so we adopt the approach proposed in [11] to represent the English–Verilog translation task as an ordered sequence in the format 'TASK: <English Text> RESULT: <Verilog Code>'.

Open-source Verilog code can be found online, but is unstructured, with varying quality and complexity. For this initial study, we design a custom dataset generation tool inspired by the sort of template-based, random auto-marking Q&A systems used in teaching settings (e.g., the OASIS Question Engine²). Rather than produce thousands of Task/Result pairs manually, we prepare several natural language templates which encapsulate different task scenarios. An example generation process is shown in Fig. 2.

In step (1) our tool generates a Task/Result metastructure, a descriptor for the type of task (e.g., an assignment) and relevant information for that task (e.g., variable names, operators). Possible metastructure tasks include combinational signal assignments, registers, sequence generators, or a multi-set of these. Then, in step (2), the tool randomly chooses a suitable template for the task that encapsulates all information in English and Verilog. In step (3), the tool "fills in" these templates, translating arguments where necessary (e.g. the OR operator is 'or' in English and '|' in Verilog). Finally, in step (4), the tool saves the generated Task/Result pair.

Structurally, we organise our templates into the different task classes they describe—(combinational) assignments, registers, and sequence generators. We then categorise them further as either prescriptive or descriptive. Prescriptive templates are like the example presented in Fig. 2. We conjecture that these should be trivial to translate—simple substitutions and word-reordering is all that is required to convert from the English to Verilog. Descriptive templates, meanwhile, are more like the example presented in

Figure 2: The Task/Result Generation Process. (1) Generate a Task/Result metastructure (e.g. Type: SimpleAssignment; I1: a, I2: b, O: c, Operator: OR). (2) Select suitable templates from the template repository (e.g. "Define combinational code to return {{.I1}} {{.Op}} {{.I2}} in {{.O}};" and "assign {{.O}} = {{.I1}} {{.Op}} {{.I2}};"). (3) Fill the templates via equivalence routines ("Define combinational code to return 'a' OR 'b' in 'c'." / "assign c = a | b;"). (4) Store the combination of filled templates as a {Task, Result} pair (TASK: Define combinational code to return 'a' OR 'b' in 'c'. RESULT: assign c = a | b;).

2 https://www.oasisqe.com/


Table 1: Template-based Dataset Information. (pX → prescriptive; dX → descriptive; X is the task type)
Columns per task: # templates for training | # non-training templates | samples per template.

Assignment (a)
  pa — 17 training, 2 non-training, 2000 samples/template.
    Example: "Given inputs 'a' and 'b', take the nor of these and return the result in 'c'."
    Model Verilog: assign c = !(a | b);
  da — 3 training, 1 non-training, 4000 samples/template.
    Example: "A house has three active-low alarm detector triggered sensors 'a', 'b', 'c'. Write combinatorial logic for a active-high light 'l' which activates when any of the detectors are triggered."
    Model Verilog: assign l = !(a & b & c);

Register (r)
  pr — 9 training, 2 non-training, 3000 samples/template.
    Example: "Define a 4-bit register 'q' with input 'a' nand 'b', enable 'e' defined as 'b' xnor 'r', an asynchronous reset 'r', and a clock 'c'."
    Model Verilog:
      assign e = b ~^ r; reg q;
      always @(posedge c or posedge r) begin
          if (r) begin q <= 0; end
          else if (e) begin q <= !(a & b); end
      end
  dr — 3 training, 1 non-training, 4000 samples/template.
    Example: "Design the code for an alarm system. When the panic mode 'm' is selected (= 0) the alarm system 'a' should activate (= 1) and should only deactivate (= 0) when the active-low synchronous cancel button 'c' is selected (= 1)."
    Model Verilog:
      // assume clock clk
      reg a;
      always @(posedge clk) begin
          if (c) begin a <= 0; end
          else if (!m) begin a <= 1; end
      end

Sequence Generator (g)
  pg — 4 training, 2 non-training, 4000 samples/template.
    Example: "Define sequential code which will produce the repeating sequence [0, 1, 0] on output 'u'. It should advance on clock 'c' whenever enable 'e' is present, and a synchronous reset 'r' should reset the sequence back to the first element."
    Model Verilog:
      enum {s0, s1, s2} state; reg u;
      always @(posedge c) begin
          if (r) begin state <= s0; u <= 'b0; end
          else begin
              unique case (state)
                  s0: if (e) begin state <= s1; u <= 'b0; end
                  s1: if (e) begin state <= s2; u <= 'b1; end
                  s2: if (e) begin state <= s0; u <= 'b0; end
              endcase
          end
      end

Multi-task (M-T)
  N/A training/non-training templates, 5250 samples.
    Example: "Write a 6-bit register 'ar' with input defined as 'gv' modulo 'lj', enable 'q', synchronous reset 'r' defined as 'yxo' greater than or equal to 'm', and clock 'p'. A vault door has three active-low secret switch pressed sensors 'et', 'lz', 'l'. Write combinatorial logic for a active-high lock 's' which opens when all of the switches are pressed. Write a 6-bit register 'w' with input 'se' and 'md', enable 'mmx', synchronous reset 'nc' defined as 'tfs' greater than 'w', and clock 'xx'."
    Model Verilog:
      assign r = yxo >= m; reg [5:0] ar;
      always @(posedge p) begin
          if (r) begin ar <= 0; end
          else if (q) begin ar <= gv % lj; end
      end
      assign s = !(et | lz | l);
      assign nc = tfs > w; reg [5:0] w;
      always @(posedge xx) begin
          if (nc) begin w <= 0; end
          else if (mmx) begin w <= se & md; end
      end

Fig. 1. They are more complex to translate, and a human designer would implicitly perform intermediate steps—such as understanding that a given input is being used as an enable signal or as a reset. Multi-task templates are random concatenations of two to four assignment/register templates. Table 1 provides additional examples of the different task types generated from the various templates.

While at first glance this template-based approach for dataset generation might appear to restrict DAVE's ability to generalize over English descriptions, this dataset is only used for fine-tuning the language model. As GPT-2 is pre-trained over the large WebText dataset [12], we theorize that DAVE should retain at least some ability to process natural language features such as synonyms and different word/clause orders. To validate this hypothesis, we hold out a subset of templates for use during testing and evaluation. Table 1 has information about the final dataset, including the number of "Trained" and "Non-Trained" (held-out) templates for all task types.

In our evaluation, we initially query DAVE with new task instances based on Trained templates to observe its baseline ability to perform "familiar" tasks (i.e., produce Verilog from English descriptions that are similar to the training data). To study the generalizability of the approach, we query DAVE with new task instances based on Non-Trained templates, i.e., such Task/Result pairs are presented to the language model during validation.

While the number of templates might appear low in certain cases (e.g., # of Descriptive vs. Prescriptive assignments), the task instances of the given templates vary significantly from each other due to the addition or omission of optional clauses in the natural text during data generation. A template that describes a register design task may have a clause describing a reset signal, and if the template is used for a metastructure with no reset signal, that entire clause is omitted. As such, a given template identifier refers only to the overall sentence structure used in a Task, the unique pattern of compulsory words within that template, such as introductory remarks (e.g. "Describe combinatorial logic to..."), and individual words used within that template (e.g. conjunctions, prepositions).

Descriptive templates have randomly generated settings such as "an attendant call button". These are generated from cascaded sub-templates, increasing the entropy of each individual Task/Result pair. Register and Sequence Generator templates are allowed to recursively embed the basic templates (prescriptive assignments): a register might define a signal (e.g. an enable) as a function (e.g. 'a' nand 'b') rather than as a pre-set input (e.g. 'c').
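To make the four-step flow of Fig. 2 concrete, the assignment case can be sketched in a few lines of Python (the language of the authors' tooling). This is an illustrative reconstruction, not the paper's actual generator; the operator table and function names are invented for the example:

```python
import random

# English operator name -> Verilog operator (step (3), "equivalence routines").
# Illustrative subset only; the paper's tool covers many more operators.
OPERATORS = {"OR": "|", "AND": "&", "XOR": "^"}

def fill_templates(i1, i2, o, op):
    """Steps (2)-(3): render one English/Verilog template pair."""
    task = f"Define combinational code to return '{i1}' {op} '{i2}' in '{o}'."
    result = f"assign {o} = {i1} {OPERATORS[op]} {i2};"
    return task, result

def generate_pair(rng=None):
    """Step (1): build a metastructure; step (4): store the filled pair in
    the 'TASK: ... RESULT: ...' sequence format used for fine-tuning."""
    rng = rng or random.Random(0)
    meta = {"i1": "a", "i2": "b", "o": "c", "op": rng.choice(sorted(OPERATORS))}
    task, result = fill_templates(**meta)
    return f"TASK: {task} RESULT: {result}"
```

For the metastructure of Fig. 2 (I1: a, I2: b, O: c, Operator: OR), `fill_templates("a", "b", "c", "OR")` reproduces the pair shown in the figure; optional clauses and the recursive sub-templates described above would extend `fill_templates` with further randomised fragments.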


Multi-tasks combine other types of tasks and are difficult to categorise. We randomly generate 5,250 multi-task samples, of which 5000 are used for fine-tuning.

3.3 Experimental Platform
After we generate a suitable corpus of Task/Result pairs according to the method described in Section 3.2, we fine-tune the 345 million parameter GPT-2 model on a high-performance computing node with 2 Intel Xeon E5-2698 v4 @ 2.20GHz cores, 20 GB of RAM, and an NVIDIA V100 32 GB graphics card, over all categories of Task/Result pairs simultaneously (i.e. the same trained model is used to decode each type of Task). Our fine-tuning script is modified from [19]. We use the Python programming environment, with pytorch version 1.5.0, tensorflow version 2.2, and aitextgen version 0.2.3. Underlying these we use cuda version 10.1 and cudnn version 7.6.

To fine-tune GPT-2, we leave the hyper-parameters at their suggested defaults (learning rate 1e-4, weight decay 0.05, adam epsilon 1e-8) and perform fine-tuning for 7500 steps. The training data covers a random sample of 95 % of the generated samples of each Trained template category, with 5 % held back for evaluating the model. To evaluate model "goodness", we use the same computing resources as for training and use default GPT-2 output generation parameters (temperature 0.7, top_p 0.9, and top_k 0/disabled).

4 EXPERIMENTAL INVESTIGATION
4.1 Overview
The purpose of this work is to explore the potential for general-purpose language models in translating system specifications provided in English to their hardware implementations in the Verilog HDL. As such, we are interested in measuring the quality of the generated Verilog. This raises an obvious question—how should one define "quality"? In this work we are interested in a language model which can perform design tasks of a similar difficulty to those posed in a textbook [17].

However, there are no automated systems to quantify how well a specification has been implemented in its corresponding Verilog if it is "almost" correct. Formal equivalence checking is an option, but requires that the design is at least syntactically compliant. This presents a challenge as we wish to quantify the quality of DAVE's Verilog generation. However, given that we generate Task/Result pairs with a template engine, we have a baseline 'canonical' response that we can compare DAVE's output against. This allows us to introduce the equivalence between the two generators as a measure of quality, discussed in Section 4.1.1. Where DAVE's output is not equivalent, we manually examine the result qualitatively.

An important part of our evaluation is to examine DAVE's performance over unfamiliar texts. Otherwise, it could be argued that the language model has simply learned a kind of pattern recognition over the Task/Result pairs, and is just using string relocation techniques to score highly during validation. If this notion were applied to a student, we might say that they had learned to produce Verilog by rote, rather than through understanding.

This examination is provided through the Non-Trained Templates. Recall that these are unfamiliar to DAVE, i.e., they were not seen during fine-tuning, and DAVE has had no opportunity to learn/memorize their syntax and structure. We seek insight from DAVE's performance over these tasks as evidence that the GPT-2 language model offers promise for our intended translation purpose. We discuss details in Section 4.4.

4.1.1 A measure of equality. There are numerous ways to implement a given specification in any programming language. Take the example from Fig. 2: while it provides the correct answer as assign c = a | b;, it could be equivalently specified as assign c = b | a;. This becomes even more of an issue when implementing larger, more complex, and more descriptive specifications.

While there are ways of quantifying identical code (e.g., comparing abstract syntax trees), we opt for a simpler comparison of DAVE's outputs against the template tool using a sequence equivalence metric. This is because the generated Verilog code should be relatively short and simple. More precisely, we define correctness of the generated text as its distance to the template-provided "correct" answer (excluding white-space characters from both) as measured by their Ratcliff-Obershelp similarity [2]. This means that if DAVE returns assign c = a | b; as the correct answer to the prompt in Fig. 2, it scores 1.00—i.e., the result is fully correct. However, despite being functionally equivalent, a result of assign c = b | a; scores only 0.833.

While this metric is simple, manual inspection of the results that did not have the expected score of 1.00 revealed no examples where DAVE had performed small but functionally equivalent changes (e.g., inverting the order of variables compared to their order in the specification). That the output has a deterministic ordering to the variables is not a surprising result, as the template engine that DAVE is fine-tuned from has a deterministic order to the Verilog code that it produces. We provide insights from our investigation in three parts: DAVE's performance on prescriptive (Section 4.2), descriptive (Section 4.3), and multi-tasks (Section 4.4).

4.2 Translation of Prescriptive Specifications
DAVE's performance on prescriptive tasks is presented in Table 2, with the Non-Trained templates indicated. Each row contains information on the number of template samples used for fine-tuning, the number of template samples used for validation, the number DAVE returned correctly, and (where applicable) the average Ratcliff-Obershelp (R-O) similarity of returned incorrect answers compared to the correct answer.

With regards to assignments, DAVE performs well on tasks based on Trained (e.g., pa00³) templates, getting 99.7 % of all samples correct across this validation category. It performs slightly worse on tasks drawn from Non-Trained templates (e.g., pa18⁴), scoring 96.5 % correct. DAVE scores well on Trained register templates (e.g., pr00⁵) (99.2 % correct). Likewise, DAVE performed well with the Non-Trained Templates in this category (e.g. pr11⁶), with 98.7 % correct. While DAVE did well in Trained Sequence Generators (e.g. pg01⁷) with 99.5 % correct across the samples, it performed poorly

3 pa00 example: "Put the result of 'a' nand 'b' in 'c'."
4 pa18: "Assign into output 'c' the result of 'a' xor 'b'."
5 pr00: "Define a 8-bit register 'a' with input 'a' defined as 'b' and 'c', enable 'e', and clock 'c'."
6 pr11: "Given input 'a', enable 'e' defined as 'd' nxor 'f', an asynchronous reset 'r' (being 'x' or 'y'), make a 7-bit register 'q'."
7 pg01: "Define sequential code which will produce the repeating sequence [00, 10, 10] on the 2-bit output 'q'. It should advance on each tick of a clock 'c' whenever enable defined as 'a' nxor 'b' is present."

30
Session 1: DNN for CAD MLCAD '20, November 16–20, 2020, Virtual Event, Iceland

Table 2: Testing DAVE on Prescriptive Tasks (* marks Non-Trained, held-out templates)

Type            Template  # Trained  # Validated  # Correct  Avg. Error R-O
Assignment      pa00      1900       100          99         0.947
                pa01      1900       100          100        –
                pa02      1900       100          100        –
                pa03      1900       100          100        –
                pa04      1900       100          100        –
                pa05      1900       100          100        –
                pa06      1900       100          97         0.951
                pa07      1900       100          100        –
                pa08      1900       100          100        –
                pa09      1900       100          100        –
                pa10      1900       100          100        –
                pa11      1900       100          100        –
                pa12      1900       100          100        –
                pa13      1900       100          100        –
                pa14      1900       100          99         0.947
                pa15      1900       100          100        –
                pa16      1900       100          100        –
                pa17 *    0          100          95         0.956
                pa18 *    0          100          98         0.898
Register        pr00      2850       150          148        0.981
                pr01      2850       150          149        0.993
                pr02      2850       150          149        0.973
                pr03      2850       150          150        –
                pr04      2850       150          148        0.990
                pr05      2850       150          147        0.982
                pr06      2850       150          148        0.993
                pr07      2850       150          149        0.983
                pr08      2850       150          150        –
                pr09      2850       150          150        –
                pr10 *    0          150          149        0.960
                pr11 *    0          150          147        0.965
Seq. Generator  pg01      3800       200          200        –
                pg02      3800       200          199        0.996
                pg03      3800       200          200        –
                pg04      3800       200          197        0.984
                pg05 *    0          200          200        –
                pg06 *    0          200          143        0.889

Table 3: Testing DAVE on Descriptive and Multi-Tasks (* marks Non-Trained, held-out templates)

Type        Template     # Trained  # Validated  # Correct  Avg. Error R-O
Assignment  da00         3800       200          200        –
            da01         3800       200          199        0.952
            da02         3800       200          196        0.956
            da03 *       0          200          200        –
Register    dr00         3800       200          200        –
            dr01         3800       200          195        0.985
            dr02         3800       200          199        0.992
            dr03         3800       200          198        0.988
            dr04 *       0          200          196        0.987
M-T         Trained      5000       250          130        0.907
            Non-Trained  0          250          103        0.817

with the Non-Trained template pg06⁸, bringing the overall percentage correct for Non-Trained Templates down to 85.6 %.

Discussion. One would expect DAVE to perform well on tasks produced from Trained templates, given that these most resemble the training data. This held true for all three major categories. One might also expect that DAVE would perform worse on task prompts generated from Non-Trained templates in comparison to prompts generated from the Trained templates. Our hypothesis is that the GPT-2 pre-training should allow DAVE to generalise and produce the correct Verilog even in unseen tasks.

This holds for Assignments and Registers, but did not entirely hold with the Non-Trained Sequence Generator templates, specifically with pg06. Closer investigation of this template revealed that almost all of DAVE's errors (>95 %) stem from mis-classification of enable and reset signals. This was unexpected as DAVE did not have this issue over tasks based on any other Sequence Generator template. One theory is that the issue may stem from the difference between pg06 and the other templates—perhaps it is too unique. To evaluate this, we compared the R-O similarity of templates pg05 (which scored 100 %) and pg06 with the Trained pg templates. We found that pg05 was closest to pg01 (similarity 0.820), whereas pg06 was closest to pg03 (similarity 0.777). These numbers are similar enough that we would have expected pg06 to score better. Further formal analysis is an avenue for our future work. It is likely that providing a greater variety of Sequence Generator templates during training would help DAVE produce more accurate results.

8 pg06: "Produce a design that generates a 3-bit output 'uy' with the sequence: [110, 100, 101, 100]. The output changes with each rising edge of a clock if the enable signal 'a' less than 'b' is asserted. Whenever an asynchronous reset 'r' is asserted, the design should output the first element of the sequence."

4.3 Translation of Descriptive Specifications
Table 3 presents DAVE's performance over Descriptive Tasks. While this category has fewer templates, each template has more opportunities for entropy due to the presence of optional clauses and implicit intermediate signals. We also design these templates to be more "difficult"—they invoke requirements such as 'active-high' and 'active-low' qualifiers to their variables, terms that DAVE needs to recognise and accommodate in the generated Verilog.

Somewhat surprisingly, DAVE performs better on Descriptive Tasks than on the Prescriptive Tasks, with 99.2 % correct Assignments and 99.0 % Registers over the Trained Templates. For the Non-Trained templates, the Assignments scored 100 % correct and Registers scored 98 %. To check that this high score was not due to the Non-Trained templates da03 and dr04 being structurally similar to the Trained templates, we compare R-O similarities. da03 is most similar to da01, with a score of 0.686. dr04 is most similar to dr02, with a score of 0.703. While these values might seem high, consider the Sequence Generator template pg06, which scored 0.777 yet DAVE gave the correct answer only 71.5 % of the time.

Discussion. On a number of occasions, we were particularly impressed that DAVE was able to derive the Boolean combinations for certain operations. Take this example from da00: "A car has four active-low door open sensors 'a', 'b', 'c', 'd'. Write combinatorial logic for a active-low light 'l' which illuminates when any of the doors are open." From that prompt, DAVE is able to correctly generate the output assign l = a & b & c & d;, i.e., it appears to associate 'any' and 'doors', as well as understand the relationship between 'any' and the two 'active-low' qualifiers. Another example of DAVE "understanding" keywords is the generated Verilog for dr00, which we present in Fig. 1. DAVE can correctly implement both synchronous and asynchronous resets, as well as infer clocks for memory elements when no clocks are explicitly specified.

4.4 Translation of Multiple Tasks
For insight into how DAVE can handle the processing of multiple tasks simultaneously, we also provided a multi-task metastructure consisting of 2–4 registers and assignments in a single Task prompt. These are presented in Table 3 under M-T. We divide Multi-tasks


into two broad categories—those made purely from Trained templates (of which 5000 were presented during the fine-tuning process), and those made only from Non-Trained templates. Multi-tasks performed worse than the individual templates (Trained correct 52 % of the time, and Non-Trained 41.2 %). Upon manual inspection, DAVE was generating the correct Verilog structures and syntax in the outputs, usually only getting variable names/operators incorrect. This is reflected in the Average Error R-O, which is high given the answer lengths. It is likely that the difficulties DAVE is facing with multi-tasks stem from the naïve concatenation of tasks. In future work we will explore multi-tasks where the "sub-tasks" are related.

4.5 Discussion and Limitations
The results presented are promising. DAVE has shown a clear ability to produce syntactically correct Verilog (in our tests, it rarely, if ever, produced outputs that could not compile—errors were almost always related to operator choice and/or variable names). DAVE is capable of producing code with complex relationships between inputs and outputs, and even with intermediate signals. In total, DAVE returned the correct answer in 94.8 % of all validation tests.

That said, our work has limitations. Firstly, other than inferring clocks, we do not yet ask DAVE to create a signal that was not already named or otherwise described (e.g., we never provide code such as "Output 'a' nor 'b'"; it is always "Output 'a' nor 'b' in 'c'."). Likewise, we never rely on any form of creativity in the generated results—our training data suggests that there is only one path forward to the implementation for a given task template. That is, our templates had a many-to-one relationship with the Verilog they described, despite there being different ways to express functionally identical Verilog. These are the focus of our ongoing studies.

DAVE inherits some technical limitations of GPT-2: the model can only generate outputs of up to 1024 tokens (i.e., words and symbols). As longer snippets of code can potentially run into this limit, we had to limit certain inputs—sequence generators were capped at no more than 4 elements, and our multi-tasks were prevented from using long-winded descriptive register templates.

5 CONCLUSIONS
This paper set out to explore the potential use of ML for translating natural language specifications into their corresponding Verilog HDL. We adopted the GPT-2 language model and fine-tuned it over a large number of English/Verilog Task/Result pairs to produce DAVE. We investigated DAVE's performance over sets of English-to-Verilog Tasks based on familiar and unfamiliar templates. In general, DAVE's performance exceeded our expectations; it was able to produce Verilog in response to simple, prescriptive prompts, as well as to show success in acquiring the advanced capabilities required to solve more descriptive settings. Our future work will investigate the use of larger GPT-2 models for DAVE, increasing the complexity and length of the tasks, and tuning DAVE for specific tasks such as security assertion generation from natural language collateral.

ACKNOWLEDGMENTS
H. Pearce is supported by the National Science Foundation grant CMMI-1932264. B. Tan and R. Karri are supported in part by the Office of Naval Research under Award Number N00014-18-1-2058. This work was supported in part by NYU CCS.

REFERENCES
[1] Bengio, Y., Ducharme, R., Vincent, P., and Jauvin, C. A Neural Probabilistic Language Model. Journal of Machine Learning Research 3 (2003), 1137–1155.
[2] Black, P. E. Ratcliff/Obershelp pattern recognition. Dictionary of Algorithms and Data Structures, 2004.
[3] Devlin, J., Chang, M., Lee, K., and Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. CoRR abs/1810.04805 (2018).
[4] Drechsler, R., Harris, I. G., and Wille, R. Generating formal system models from natural language descriptions. In IEEE Int. High Level Design Validation and Test Workshop (HLDVT) (2012), pp. 164–165.
[5] Harris, C. B., and Harris, I. G. GLAsT: Learning formal grammars to translate natural language specifications into hardware assertions. In Design, Automation Test in Europe Conf. Exhibition (DATE) (2016), pp. 966–971.
[6] Hern, A. New AI fake text generator may be too dangerous to release, say creators. The Guardian (2019).
[7] Kahng, A. B. Machine Learning Applications in Physical Design: Recent Results and Directions. In Int. Symp. Physical Design (ISPD) (2018), pp. 68–73.
[8] Liu, P., Qiu, X., and Huang, X. Recurrent neural network for text classification with multi-task learning. CoRR abs/1605.05101 (2016).
[9] Mihalcea, R., Liu, H., and Lieberman, H. NLP (Natural Language Processing) for NLP (Natural Language Programming). In Computational Linguistics and Intelligent Text Processing (2006), A. Gelbukh, Ed., Springer Berlin Heidelberg, pp. 319–330.
[10] Pan, S. J., and Yang, Q. A Survey on Transfer Learning. IEEE Transactions on Knowledge and Data Engineering 22, 10 (Oct. 2010), 1345–1359.
[11] Radford, A., Narasimhan, K., Salimans, T., and Sutskever, I. Improving Language Understanding by Generative Pre-Training, 2018.
[12] Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., and Sutskever, I. Language models are unsupervised multitask learners, 2019.
[13] Reddy, S., Chen, D., and Manning, C. D. CoQA: A conversational question answering challenge. Transactions of the Association for Computational Linguistics 7 (2019), 249–266.
[14] Servadei, L., Zennaro, E., Devarajegowda, K., Manzinger, M., Ecker, W., and Wille, R. Accurate Cost Estimation of Memory Systems Inspired by Machine Learning for Computer Vision. In Design, Automation Test in Europe Conf. Exhibition (DATE) (Mar. 2019), pp. 1277–1280.
[15] Sundermeyer, M., Schlüter, R., and Ney, H. LSTM neural networks for language modeling. In Conf. Int. Speech Communication Assoc. (2012).
[16] Treude, C., Robillard, M. P., and Dagenais, B. Extracting Development Tasks to Navigate Software Documentation. IEEE Transactions on Software Engineering 41, 6 (June 2015), 565–581.
[17] Vahid, F. Digital Design with RTL Design, VHDL, and Verilog. John Wiley & Sons, Mar. 2010.
[18] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. Attention is All you Need. In Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, Eds. Curran Associates, Inc., 2017, pp. 5998–6008.
[19] Woolf, M. minimaxir/aitextgen: A robust Python tool for text-based AI training and generation using GPT-2.
[20] Yu, C., Xiao, H., and De Micheli, G. Developing synthesis flows without human knowledge. In Design Automation Conf. (DAC) (2018).
[21] Zhao, J., and Harris, I. G. Automatic Assertion Generation from Natural Language Specifications Using Subtree Analysis. In Design, Automation Test in Europe Conf. Exhibition (DATE) (Mar. 2019), pp. 598–601.
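As an aside, the R-O (Ratcliff-Obershelp [2]) similarity scores used in Section 4 can be reproduced with Python's standard library: difflib.SequenceMatcher implements the same gestalt pattern-matching approach. A minimal sketch follows; the two template strings below are illustrative placeholders, not the paper's actual pg templates:

```python
from difflib import SequenceMatcher

def ro_similarity(a: str, b: str) -> float:
    """Ratcliff-Obershelp-style similarity in [0, 1], via difflib."""
    return SequenceMatcher(None, a, b).ratio()

# Hypothetical stand-ins for two sequence-generator templates.
pg_a = "Produce a design that generates a 3-bit output with the sequence"
pg_b = "Produce a design that outputs a 3-bit value following the sequence"

print(f"similarity: {ro_similarity(pg_a, pg_b):.3f}")
```

Note that ratio() is not symmetric in general, so pairwise template comparisons should fix the argument order consistently.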

TRY DAVE!
Click here9 for instructions to run DAVE freely within Google Colab.
9 https://colab.research.google.com/drive/1aDSMDWL5hieB3_Th9ZdddDMAKQ2DjWxW
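Generated outputs can also be sanity-checked mechanically. As one example (our own check, not part of the paper's evaluation), the active-low door/light output from Section 4, assign l = a & b & c & d;, can be verified exhaustively against its specification: the active-low light 'l' must be asserted whenever any active-low door input is asserted.

```python
from itertools import product

# Active-low encoding: 0 = asserted (door open / light on), 1 = deasserted.
for a, b, c, d in product((0, 1), repeat=4):
    l = a & b & c & d                  # DAVE's generated expression
    any_door_open = 0 in (a, b, c, d)  # at least one door asserted
    # The light must be on (l == 0) exactly when any door is open.
    assert (l == 0) == any_door_open

print("assign l = a & b & c & d; matches the active-low spec in all 16 cases")
```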
