design [7]. We explore the use of transfer learning [10] to teach a DL-based model to produce Verilog by framing it as a machine translation problem. Transfer learning provides the ability to learn new tasks without large quantities of labelled data in a target domain.

GPT-2. We use GPT-2 [12] as our starting point, given its state-of-the-art performance in zero-shot task settings. GPT-2 is based on the decoder part of the Transformer, a neural network encoder-decoder architecture with a self-attention mechanism [18]. At the core of the GPT-2 approach is language modelling, which can be framed as unsupervised distribution estimation from some set of examples (x_1, x_2, ..., x_n), where each example is composed of variable-length sequences of symbols (s_1, s_2, ..., s_n) [12]. This statistical model of language is thus the joint probability distribution of the symbols in the language (as the product of the conditional probabilities for each symbol given the preceding sequence [1]). Put simply, the model learns to answer the following: given some sequence of symbols, what is the most likely next symbol in the sequence?
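Written out, this is the standard autoregressive factorization (a restatement of the modelling objective from [12]; the equation itself does not appear in the original text):

    % Language modelling as unsupervised distribution estimation: the joint
    % probability of a symbol sequence factorizes into per-symbol
    % conditionals on the preceding symbols.
    p(s_1, s_2, \ldots, s_n) = \prod_{i=1}^{n} p(s_i \mid s_1, \ldots, s_{i-1})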
Different tasks can be specified in a language itself, e.g., {“translate to french”, “english text”, “french text”} [12]. Radford et al. speculate that a model with sufficiently large capacity can learn to perform tasks demonstrated in natural language without explicit supervision. In other words, given a general system which produces p(output|input), a condition can be introduced to model some task p(output|input, task). By training GPT-2 on a large, unlabelled dataset (∼8 million webpages), Radford et al. demonstrated that the trained model could perform well on numerous tasks without fine-tuning. The trained model then provides a good starting point for performance in specific tasks following fine-tuning [11]. Fundamentally, GPT-2’s pre-trained, implicit capability to process natural language can be directed towards specific tasks. We attempt to harness this capability by fine-tuning GPT-2 for translating natural language descriptions to Verilog.
Natural Language → Code. The challenges in translating specifications into computer code have driven research in natural language programming [9]. Recent work has shown that there is a finite limit to the number of unique ways one can express certain programming structures (e.g., for-loops) in natural language, and as such it is possible to extract this information and transform it into its corresponding computer code [9]. Other related works use NLP techniques, including rule-based processing, for formal system modeling [4], generating hardware assertions [5], and enhancing documentation by automatically extracting software development tasks and associating them with the relevant paragraphs [16]. While these approaches show promising results, there are limitations on how flexible the natural language descriptions can be with respect to structure. Earlier work involves designing separate components to perform specific tasks such as identifying “steps”, “loops”, and “comments” from natural text [9]. To our knowledge, DL techniques to generate HDL from natural language have not been explored.
3 FINE-TUNING GPT-2 FOR VERILOG
3.1 Problem definition
In this work, we focus on an early-stage CAD problem: interpreting a high-level, informal description of functionality and producing the corresponding concrete specification. For small designs, designers can craft an RTL specification directly after identifying the necessary inputs, outputs, and the relationships between them from a short description of a task. While previous works use algorithmic approaches such as parse-tree generation and sub-tree matching [21] to identify the salient elements of the natural language description for populating templates, we re-cast the problem holistically as translation. As we describe next, we prepare examples of task descriptions with varying descriptiveness, and examine GPT-2’s ability to produce Verilog after transfer learning [10].

[Figure 2: The Task/Result Generation Process. (1) Generate Task/Result metastructure: Type: SimpleAssignment; I1: a; I2: b; O: c; Operator: OR. (2) Select suitable templates from the repository (English: “Define combinational code to return {{.I1}} {{.Op}} {{.I2}} in {{.O}}.”; Verilog: “assign {{.O}} = {{.I1}} {{.Op}} {{.I2}};”; linked by equivalence routines). (3) Fill templates: “Define combinational code to return ‘a’ OR ‘b’ in ‘c’.” / “assign c = a | b;”. (4) Store combination of templates as a {Task, Result} pair: TASK: Define combinational code to return ‘a’ OR ‘b’ in ‘c’. RESULT: assign c = a | b;]

3.2 Dataset Preparation
In this work, we fine-tune GPT-2 to produce DAVE, aiming for the ability to translate natural language (i.e., English) into Verilog. GPT-2 is designed to process contiguous text sequences, so we adopt the approach proposed in [11] to represent the English–Verilog translation task as an ordered sequence in the format ‘TASK: <English Text> RESULT: <Verilog Code>’.
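As a concrete illustration of this serialization, a pair can be flattened as follows; the sketch is ours (the function name is illustrative, not the paper’s tooling):

    # Illustrative sketch: serializing an English/Verilog pair into the flat
    # 'TASK: ... RESULT: ...' sequence format described above.
    def to_training_example(task: str, result: str) -> str:
        return f"TASK: {task} RESULT: {result}"

    # Example pair from Fig. 2:
    print(to_training_example(
        "Define combinational code to return 'a' OR 'b' in 'c'.",
        "assign c = a | b;"))
    # TASK: Define combinational code to return 'a' OR 'b' in 'c'. RESULT: assign c = a | b;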
sequence generators. We then categorise them further as either
3 FINE-TUNING GPT-2 FOR VERILOG prescriptive or descriptive. Prescriptive templates are like the ex-
3.1 Problem definition ample presented in Fig. 2. We conjecture that these should be trivial
to translate—simple substitutions and word-reordering is all that
In this work, we focus on an early-stage CAD problem: interpreting is required to convert from the English to Verilog. Descriptive
a high-level, informal description of functionality and producing templates, meanwhile, are more like the example presented in
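A minimal sketch of these four steps, under the assumption that the tool works roughly as Fig. 2 depicts (the names, operator table, and template strings are our illustration; the authors’ actual generator may differ):

    import random

    # Hypothetical re-creation of the four-step Task/Result generation process.
    OPERATORS = {"OR": ("or", "|"), "AND": ("and", "&"), "XOR": ("xor", "^")}

    ENGLISH_TEMPLATES = [
        "Define combinational code to return '{i1}' {op} '{i2}' in '{o}'.",
        "Put the result of '{i1}' {op} '{i2}' in '{o}'.",
    ]
    VERILOG_TEMPLATE = "assign {o} = {i1} {vop} {i2};"

    def generate_pair(rng: random.Random) -> str:
        # Step (1): generate a Task/Result metastructure -- task type,
        # randomly drawn signal names, and an operator.
        i1, i2, o = rng.sample("abcdefgh", 3)
        op = rng.choice(list(OPERATORS))
        # Step (2): randomly select a suitable English template.
        template = rng.choice(ENGLISH_TEMPLATES)
        # Step (3): fill the templates, translating the operator into its
        # English and Verilog spellings.
        eng_op, verilog_op = OPERATORS[op]
        task = template.format(i1=i1, i2=i2, o=o, op=eng_op)
        result = VERILOG_TEMPLATE.format(i1=i1, i2=i2, o=o, vop=verilog_op)
        # Step (4): store the combination as a {Task, Result} pair.
        return f"TASK: {task} RESULT: {result}"

    print(generate_pair(random.Random(0)))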
Structurally, we organise our templates into the different task classes they describe—(combinational) assignments, registers, and sequence generators. We then categorise them further as either prescriptive or descriptive. Prescriptive templates are like the example presented in Fig. 2. We conjecture that these should be trivial to translate—simple substitutions and word-reordering are all that is required to convert from the English to the Verilog. Descriptive templates, meanwhile, are more like the example presented in Fig. 1.

² https://www.oasisqe.com/
Table 1: Template-based Dataset Information. (pX → prescriptive; dX → descriptive; X is the task type)

Task                     # for Training  # Non-Training  Samples/Template
Assignment (pa)          17              2               2000
  Example of Task in English (fragment): “... and ‘b’, take the nor of
  these and return the result in ‘c’.”
  Model Verilog:
    assign c = !(a | b);

Register (pr)            (counts not recovered)
  Example of Task in English (fragment): “... as ‘b’ xnor ‘r’, an
  asynchronous reset ‘r’, and a clock ‘c’.”
  Model Verilog (fragment):
    ... else if (e) begin q <= !(a & b); end
    end

Register (dr)            3               1               4000
  Example of Task in English: “Design the code for an alarm system. When
  the panic mode ‘m’ is selected (= 0) the alarm system ‘a’ should
  activate (= 1) and should only deactivate (= 0) when the active-low
  synchronous cancel button ‘c’ is selected (= 1).”
  Model Verilog:
    // assume clock clk
    reg a;
    always @(posedge clk) begin
      if (c) begin a <= 0; end
      else if (!m) begin a <= 1; end
    end

Sequence Generator (pg)  4               2               4000
  Example of Task in English: “Define sequential code which will produce
  the repeating sequence [0, 1, 0] on output ‘u’. It should advance on
  clock ‘c’ whenever enable ‘e’ is present, and a synchronous reset ‘r’
  should reset the sequence back to the first element.”
  Model Verilog:
    enum { s0, s1, s2 } state; reg u;
    always @(posedge c) begin
      if (r) begin state <= s0; u <= 'b0; end
      else begin
        unique case (state)
          s0: if (e) begin state <= s1; u <= 'b0; end
          s1: if (e) begin state <= s2; u <= 'b1; end
          s2: if (e) begin state <= s0; u <= 'b0; end
        endcase
      end
    end

Multi-task (–)           N/A             N/A             5250
  Example of Task in English (fragment): “... reset ‘r’ defined as ‘yxo’
  greater than or equal to ‘m’, and clock ‘p’. A vault door has three
  active-low secret switch pressed sensors ‘et’, ‘lz’, ‘l’. Write
  combinatorial logic for an active-high lock ‘s’ which opens when all of
  the switches are pressed. Write a 6-bit register ‘w’ with input ‘se’
  and ‘md’, enable ‘mmx’, synchronous reset ‘nc’ defined as ‘tfs’ greater
  than ‘w’, and clock ‘xx’.”
  Model Verilog (fragment):
    ... if (r) begin ar <= 0; end
    else if (q) begin ar <= gv % lj; end
    end
    assign s = !(et | lz | l);
    assign nc = tfs > w; reg [5:0] w;
    always @(posedge xx) begin
      if (nc) begin w <= 0; end
      else if (mmx) begin w <= se & md; end
    end
They are more complex to translate, and a human designer would implicitly perform intermediate steps—such as understanding that a given input is being used as an enable signal or as a reset. Multi-task templates are random concatenations of two to four assignment/register templates. Table 1 provides additional examples of the different task types generated from the various templates.

While at first glance this template-based approach for dataset generation might appear to restrict DAVE’s ability to generalize over English descriptions, this dataset is only used for fine-tuning the language model. As GPT-2 is pre-trained over the large WebText dataset [12], we theorize that DAVE should retain at least some ability to process natural language features such as synonyms and different word/clause orders. To validate this hypothesis, we hold out a subset of templates for use during testing and evaluation. Table 1 has information about the final dataset, including the number of “Trained” and “Non-Trained” (held-out) templates for all task types.

In our evaluation, we initially query DAVE with new task instances based on Trained templates to observe its baseline ability to perform “familiar” tasks (i.e., produce Verilog from English descriptions that are similar to the training data). To study generalizability of the approach, we query DAVE with new task instances based on Non-Trained templates, i.e., such Task/Result pairs are presented to the language model during validation.

While the number of templates might appear low in certain cases (e.g., # of Descriptive vs. Prescriptive assignments), the task instances of the given templates vary significantly from each other due to the addition or omission of optional clauses in the natural text during data generation. A template that describes a register design task may have a clause describing a reset signal, and if the template is used for a metastructure with no reset signal, that entire clause is omitted. As such, a given template identifier refers only to the overall sentence structure used in a Task, the unique pattern of compulsory words within that template, such as introductory remarks (e.g. “Describe combinatorial logic to...”), and individual words used within that template (e.g. conjunctions, prepositions). Descriptive templates have randomly generated settings such as “an attendant call button”. These are generated from the cascaded sub-templates, increasing the entropy of each individual Task/Result pair. Register and Sequence Generator templates are allowed to recursively define the basic template (prescriptive assignments). A register might define a signal (e.g. an enable) as a function (e.g. ‘a’ nand ‘b’) rather than as a pre-set input (e.g. ‘c’).
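To illustrate how optional clauses and recursive signal definitions raise per-template entropy, here is a toy example (ours, not the paper’s generator):

    import random

    # Toy illustration: a register template whose enable signal is either a
    # pre-set input or a recursively nested assignment sub-template, and
    # whose reset clause is optional.
    def enable_clause(rng: random.Random) -> str:
        if rng.random() < 0.5:
            return "enable 'e'"                      # pre-set input
        return "enable 'e' defined as 'a' nand 'b'"  # nested sub-template

    def register_task(rng: random.Random) -> str:
        reset = ", a synchronous reset 'r'," if rng.random() < 0.5 else ""
        # A metastructure without a reset signal simply omits that clause.
        return (f"Define an 8-bit register 'q' with input 'd', "
                f"{enable_clause(rng)}{reset} and clock 'c'.")

    print(register_task(random.Random(1)))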
Multi-tasks combine other types of tasks and are difficult to categorise. We randomly generate 5,250 multi-task samples, of which 5000 are used for fine-tuning. We discuss details in Section 4.4.
3.3 Experimental Platform
After we generate a suitable corpus of Task/Result pairs according to the method described in Section 3.2, we fine-tune the 345 million parameter GPT-2 model on a high-performance computing node with 2 Intel Xeon E5-2698 v4 @ 2.20GHz cores, 20 GB of RAM, and an NVIDIA V100 32 GB graphics card, over all categories of Task/Result pairs simultaneously (i.e., the same trained model is used to decode each type of Task). Our fine-tuning script is modified from [19]. We use the Python programming environment, with pytorch version 1.5.0, tensorflow version 2.2, and aitextgen version 0.2.3. Underlying these, we use cuda version 10.1 and cudnn version 7.6.

To fine-tune GPT-2, we leave the hyper-parameters at their suggested defaults (learning rate 1e-4, weight decay 0.05, adam epsilon 1e-8) and perform fine-tuning for 7500 steps. The training data covers a random sample of 95% of the generated samples of each Trained template category, with 5% held back for evaluating the model. To evaluate model “goodness”, we use the same computing resources as for training and use the default GPT-2 output generation parameters (temperature 0.7, top_p 0.9, and top_k 0/disabled).
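For reference, a sketch of this recipe assuming the aitextgen API that the authors’ base script builds on [19]; the training file name is our placeholder, and aitextgen labels the ~345M-parameter GPT-2 checkpoint “355M”:

    from aitextgen import aitextgen

    # Load the pre-trained GPT-2 checkpoint and move it to the GPU.
    ai = aitextgen(tf_gpt2="355M", to_gpu=True)

    # Fine-tune on the Task/Result corpus (one 'TASK: ... RESULT: ...'
    # sequence per line) with the hyper-parameters reported above.
    ai.train(
        "task_result_pairs.txt",  # placeholder file name (ours)
        num_steps=7500,
        learning_rate=1e-4,
        weight_decay=0.05,
        adam_epsilon=1e-8,
    )

    # Query the fine-tuned model with the default decoding parameters.
    print(ai.generate_one(
        prompt="TASK: Define combinational code to return 'a' OR 'b' in 'c'. RESULT:",
        temperature=0.7,
        top_p=0.9,
        top_k=0,  # 0 disables top-k filtering
    ))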
4 EXPERIMENTAL INVESTIGATION
4.1 Overview
The purpose of this work is to explore the potential for general-purpose language models in translating system specifications provided in English to their hardware implementations in the Verilog HDL. As such, we are interested in measuring the quality of the generated Verilog. This raises an obvious question—how should one define “quality”? In this work we are interested in a language model which can perform design tasks of a similar difficulty to those posed in a textbook [17].

However, there are no automated systems to quantify how well a specification has been implemented in its corresponding Verilog if it is “almost” correct. Formal equivalence checking is an option, but requires that the design is at least syntactically compliant. This presents a challenge, as we wish to quantify the quality of DAVE’s Verilog generation. However, given that we generate Task/Result pairs with a template engine, we have a baseline ‘canonical’ response that we can compare DAVE’s output against. This allows us to introduce the equivalence between the two generators as a measure of quality, discussed in Section 4.1.1. Where DAVE’s output is not equivalent, we manually examine the result qualitatively.

An important part of our evaluation is to examine DAVE’s performance over unfamiliar texts. Otherwise, it could be argued that the language model has simply learned a kind of pattern recognition over the Task/Result pairs, and is just using string relocation techniques to score highly during validation. If this notion were applied to a student, we might say that they had learned to produce Verilog by rote, rather than through understanding. This examination is provided through the Non-Trained templates. Recall that these are unfamiliar to DAVE, i.e., they were not seen during fine-tuning, and DAVE has had no opportunity to learn/memorize their syntax and structure. We seek insight from DAVE’s performance over these tasks as evidence that the GPT-2 language model offers promise for our intended translation purpose.
4.1.1 A measure of equality. There are numerous ways to implement a given specification in any programming language. Take the example from Fig. 2: while it provides the correct answer as assign c = a | b;, it could be equivalently specified as assign c = b | a;. This becomes even more of an issue when implementing larger, more complex, and more descriptive specifications.

While there are ways of quantifying identical code (e.g., comparing abstract syntax trees), we opt for a simpler comparison of DAVE’s outputs against the template tool using a sequence equivalence metric. This is because the generated Verilog code should be relatively short and simple. More precisely, we define correctness of the generated text as its distance to the template-provided “correct” answer (excluding white-space characters from both) as measured by their Ratcliff-Obershelp similarity [2]. This means that if DAVE returns assign c = a | b; as the correct answer to the prompt in Fig. 2, it scores 1.00—i.e., the result is fully correct. However, despite being functionally equivalent, a result of assign c = b | a; scores only 0.833.
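The metric is straightforward to reproduce: Python’s difflib.SequenceMatcher implements Ratcliff-Obershelp (“gestalt”) matching, so a sketch of the scoring (our code, not the authors’ released implementation) is:

    from difflib import SequenceMatcher

    def _strip_ws(s: str) -> str:
        # Exclude all white-space characters before comparison.
        return "".join(s.split())

    def ro_similarity(generated: str, canonical: str) -> float:
        """Ratcliff-Obershelp similarity of two code strings, white-space
        excluded, mirroring the correctness metric described above."""
        return SequenceMatcher(None, _strip_ws(generated),
                               _strip_ws(canonical)).ratio()

    print(ro_similarity("assign c = a | b;", "assign c = a | b;"))  # 1.0
    print(ro_similarity("assign c = b | a;", "assign c = a | b;"))  # ~0.833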
While this metric is simple, manual inspection of the results that did not have the expected score of 1.00 revealed no examples where DAVE had performed small but functionally equivalent changes (e.g., inverting the order of variables compared to their order in the specification). That the output has a deterministic ordering to the variables is not a surprising result, as the template engine that DAVE is fine-tuned from has a deterministic order to the Verilog code that it produces. We provide insights from our investigation in three parts: DAVE’s performance on prescriptive (Section 4.2), descriptive (Section 4.3), and multi-tasks (Section 4.4).

4.2 Translation of Prescriptive Specifications
DAVE’s performance on prescriptive tasks is presented in Table 2, with Non-Trained templates marked with an asterisk. Each row contains information on the number of template samples used for fine-tuning, the number of template samples used for validation, the number DAVE returned correctly, and (where applicable) the average Ratcliff-Obershelp (R-O) similarity of returned incorrect answers compared to the correct answer.

With regards to assignments, DAVE performs well on tasks based on Trained (e.g., pa00³) templates, getting 99.7% of all samples correct across this validation category. It performs slightly worse on tasks drawn from Non-Trained templates (e.g., pa18⁴), scoring 96.5% correct. DAVE scores well on Trained register templates (e.g., pr00⁵), with 99.2% correct. Likewise, DAVE performed well with the Non-Trained templates in this category (e.g., pr11⁶), with 98.7% correct. While DAVE did well on Trained Sequence Generators (e.g., pg01⁷), with 99.5% correct across the samples, it performed poorly [...] was closest to pg03 (similarity 0.777). These numbers are similar enough that we would have expected pg06 to score better. Further formal analysis is an avenue for our future work. It is likely that providing a greater variety of Sequence Generator templates during training would help DAVE produce more accurate results.

³ pa00 example: “Put the result of ‘a’ nand ‘b’ in ‘c’.”
⁴ pa18: “Assign into output ‘c’ the result of ‘a’ xor ‘b’.”
⁵ pr00: “Define a 8-bit register ‘a’ with input ‘a’ defined as ‘b’ and ‘c’, enable ‘e’, and clock ‘c’.”
⁶ pr11: “Given input ‘a’, enable ‘e’ defined as ‘d’ nxor ‘f’, an asynchronous reset ‘r’ (being ‘x’ or ‘y’) make a 7-bit register ‘q’.”
⁷ pg01: “Define sequential code which will produce the repeating sequence [00, 10, 10] on the 2-bit output ‘q’. It should advance on each tick of a clock ‘c’ whenever enable defined as ‘a’ nxor ‘b’ is present.”
Table 2: Testing DAVE on Prescriptive Tasks (* = Non-Trained template)

Template Type   Name    # Trained  # Validated  # Correct  Avg. Error R-O
Assign.         pa00    1900       100          99         0.947
                pa01    1900       100          100        –
                pa02    1900       100          100        –
                pa03    1900       100          100        –
                pa04    1900       100          100        –
                pa05    1900       100          100        –
                pa06    1900       100          97         0.951
                pa07    1900       100          100        –
                ...
                pa09    1900       100          100        –
                pa10    1900       100          100        –
                pa11    1900       100          100        –
                pa12    1900       100          100        –
                pa13    1900       100          100        –
                pa14    1900       100          99         0.947
                pa15    1900       100          100        –
                pa16    1900       100          100        –
                pa17*   0          100          95         0.956
                pa18*   0          100          98         0.898
Register        pr00    2850       150          148        0.981
                pr01    2850       150          149        0.993
                pr02    2850       150          149        0.973
                pr03    2850       150          150        –
                pr04    2850       150          148        0.990
                pr05    2850       150          147        0.982
                pr06    2850       150          148        0.993
                pr07    2850       150          149        0.983
                pr08    2850       150          150        –
                pr09    2850       150          150        –
                pr10*   0          150          149        0.960
                pr11*   0          150          147        0.965
Seq. Generator  pg01    3800       200          200        –
                ...
4.3 Translation of Descriptive Specifications
Table 3 presents DAVE’s performance over Descriptive Tasks. While this category has fewer templates, each template has more opportunities for entropy due to the presence of optional clauses and implicit intermediate signals. We also design these templates to be more “difficult”—they invoke requirements such as ‘active-high’ and ‘active-low’ qualifiers to their variables, terms that DAVE needs to recognise and accommodate in the generated Verilog.

Table 3: Testing DAVE on Descriptive and Multi-Tasks (* = Non-Trained)

Template Type  Name          # Trained  # Validated  # Correct  Avg. Error R-O
Assignment     da00          3800       200          200        –
               da01          3800       200          199        0.952
               da02          3800       200          196        0.956
               da03*         0          200          200        –
Register       dr00          3800       200          200        –
               dr01          3800       200          195        0.985
               dr02          3800       200          199        0.992
M-T            Trained       5000       250          130        0.907
               Non-Trained*  0          250          103        0.817
4.4 Translation of Multi-Task Specifications
[...] into two broad categories—those made purely from Trained templates (of which 5000 were presented during the fine-tuning process), and those made only from Non-Trained templates. Multi-tasks performed worse than the individual templates (Trained correct 52% of the time, and Non-Trained 41.2%). Upon manual inspection, DAVE was generating the correct Verilog structures and syntax in the outputs, usually only getting variable names/operators incorrect. This is reflected in the Average Error R-O, which is high given the answer lengths. It is likely that the difficulties DAVE is facing with multi-tasks stem from the naïve concatenation of tasks. In future we will explore multi-tasks where the “sub-tasks” are related.

4.5 Discussion and Limitations
The results presented are promising. DAVE has shown a clear ability to produce syntactically correct Verilog (in our tests, it rarely, if ever, produced outputs that could not compile—errors were almost always related to operator choice and/or variable names). DAVE is capable of producing code with complex relationships between inputs and outputs, and even with intermediate signals. In total, DAVE returned the correct answer in 94.8% of all validation tests.

That said, our work has limitations. Firstly, other than inferring clocks, we do not yet ask DAVE to create a signal that was not already named or otherwise described (e.g., we never provide a task such as “Output ‘a’ nor ‘b’”; it is always “Output ‘a’ nor ‘b’ in ‘c’.”). Likewise, we never rely on any form of creativity in the generated results—our training data suggests that there is only one path forward to the implementation for a given task template. That is, our templates had a many-to-one relationship with the Verilog they described, despite there being different ways to express functionally identical Verilog. These are the focus of our ongoing studies.

DAVE inherits some technical limitations of GPT-2: the model can only generate outputs of up to 1024 tokens (i.e., words and symbols). As longer snippets of code can potentially run into this limit, we had to limit certain inputs—sequence generators were capped at no more than 4 elements, and our multi-tasks were prevented from using long-winded descriptive register templates.

5 CONCLUSIONS
This paper set out to explore the potential use of ML for translating natural language specifications into their corresponding Verilog HDL. We adopted the GPT-2 language model and fine-tuned it over a large number of English/Verilog Task/Result pairs to produce DAVE. We investigated DAVE’s performance over sets of English-to-Verilog Tasks based on familiar and unfamiliar templates. In general, DAVE’s performance exceeded our expectations: it was able to produce Verilog in response to simple, prescriptive prompts, and showed success in acquiring the advanced capabilities required to solve more descriptive tasks. Our future work will investigate the use of larger GPT-2 models for DAVE, increasing the complexity and length of the tasks, and tuning DAVE for specific tasks such as security assertion generation from natural language collateral.

TRY DAVE!
Click here⁹ for instructions to run DAVE freely within Google Colab.

⁹ https://colab.research.google.com/drive/1aDSMDWL5hieB3_Th9ZdddDMAKQ2DjWxW

ACKNOWLEDGMENTS
H. Pearce is supported by the National Science Foundation grant CMMI-1932264. B. Tan and R. Karri are supported in part by the Office of Naval Research under Award Number #N00014-18-1-2058. This work was supported in part by NYU CCS.

REFERENCES
[1] Bengio, Y., Ducharme, R., Vincent, P., and Jauvin, C. A Neural Probabilistic Language Model. Journal of Machine Learning Research 3 (2003), 1137–1155.
[2] Black, P. E. Ratcliff-Obershelp pattern recognition. Dictionary of Algorithms and Data Structures, 2004.
[3] Devlin, J., Chang, M., Lee, K., and Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. CoRR abs/1810.04805 (2018).
[4] Drechsler, R., Harris, I. G., and Wille, R. Generating formal system models from natural language descriptions. In IEEE Int. High Level Design Validation and Test Workshop (HLDVT) (2012), pp. 164–165.
[5] Harris, C. B., and Harris, I. G. GLAsT: Learning formal grammars to translate natural language specifications into hardware assertions. In Design, Automation & Test in Europe Conf. & Exhibition (DATE) (2016), pp. 966–971.
[6] Hern, A. New AI fake text generator may be too dangerous to release, say creators. The Guardian (2019).
[7] Kahng, A. B. Machine Learning Applications in Physical Design: Recent Results and Directions. In Int. Symp. Physical Design (ISPD) (2018), pp. 68–73.
[8] Liu, P., Qiu, X., and Huang, X. Recurrent neural network for text classification with multi-task learning. CoRR abs/1605.05101 (2016).
[9] Mihalcea, R., Liu, H., and Lieberman, H. NLP (Natural Language Processing) for NLP (Natural Language Programming). In Computational Linguistics and Intelligent Text Processing (2006), A. Gelbukh, Ed., Springer Berlin Heidelberg, pp. 319–330.
[10] Pan, S. J., and Yang, Q. A Survey on Transfer Learning. IEEE Transactions on Knowledge and Data Engineering 22, 10 (Oct. 2010), 1345–1359.
[11] Radford, A., Narasimhan, K., Salimans, T., and Sutskever, I. Improving Language Understanding by Generative Pre-Training, 2018.
[12] Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., and Sutskever, I. Language models are unsupervised multitask learners, 2019.
[13] Reddy, S., Chen, D., and Manning, C. D. CoQA: A conversational question answering challenge. Transactions of the Association for Computational Linguistics 7 (2019), 249–266.
[14] Servadei, L., Zennaro, E., Devarajegowda, K., Manzinger, M., Ecker, W., and Wille, R. Accurate Cost Estimation of Memory Systems Inspired by Machine Learning for Computer Vision. In Design, Automation & Test in Europe Conf. & Exhibition (DATE) (Mar. 2019), pp. 1277–1280.
[15] Sundermeyer, M., Schlüter, R., and Ney, H. LSTM neural networks for language modeling. In Conf. Int. Speech Communication Assoc. (2012).
[16] Treude, C., Robillard, M. P., and Dagenais, B. Extracting Development Tasks to Navigate Software Documentation. IEEE Transactions on Software Engineering 41, 6 (June 2015), 565–581.
[17] Vahid, F. Digital Design with RTL Design, VHDL, and Verilog. John Wiley & Sons, Mar. 2010.
[18] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. Attention is All you Need. In Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, Eds. Curran Associates, Inc., 2017, pp. 5998–6008.
[19] Woolf, M. minimaxir/aitextgen: A robust Python tool for text-based AI training and generation using GPT-2. https://github.com/minimaxir/aitextgen
[20] Yu, C., Xiao, H., and De Micheli, G. Developing synthesis flows without human knowledge. In Design Automation Conf. (DAC) (2018).
[21] Zhao, J., and Harris, I. G. Automatic Assertion Generation from Natural Language Specifications Using Subtree Analysis. In Design, Automation & Test in Europe Conf. & Exhibition (DATE) (Mar. 2019), pp. 598–601.