0% found this document useful (0 votes)
490 views691 pages

Qualcomm Hexagon: Programmer's Reference Manual

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
490 views691 pages

Qualcomm Hexagon: Programmer's Reference Manual

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 691

Qualcomm Hexagon V73

Programmer’s Reference Manual

80-N2040-53 Rev. AA
October 3, 2022

All Qualcomm products mentioned herein are products of Qualcomm Technologies, Inc. and/or its subsidiaries.

Qualcomm and Hexagon are trademarks or registered trademarks of Qualcomm Incorporated. Other product and brand names may
be trademarks or registered trademarks of their respective owners.
This technical data may be subject to U.S. and international export, re-export, or transfer (“export”) laws. Diversion contrary to U.S.
and international law is strictly prohibited.

Qualcomm Technologies, Inc.


5775 Morehouse Drive
San Diego, CA 92121
U.S.A.

© 2022 Qualcomm Technologies, Inc. and/or its subsidiaries. All rights reserved.
Contents

Figures ..............................................................................................................................14
Tables ................................................................................................................................15

1 Introduction ................................................................................................... 17
1.1 Hexagon V73 processor architecture ......................................................................................17
1.1.1 Memory .........................................................................................................................17
1.1.1.1 Cache memory ...............................................................................................17
1.1.1.2 Virtual memory..............................................................................................17
1.1.2 Registers ........................................................................................................................17
1.1.3 Instruction sequencer....................................................................................................18
1.1.4 Execution units ..............................................................................................................18
1.1.5 Load/store units.............................................................................................................18
1.2 Instruction set .........................................................................................................................19
1.2.1 Addressing modes .........................................................................................................19
1.2.2 Program flow .................................................................................................................20
1.2.3 Instruction pipeline.................................................................................................20
1.3 Technical assistance ................................................................................................................20

2 Registers ........................................................................................................ 21
2.1 General registers .....................................................................................................................21
2.2 Control registers ......................................................................................................................23
2.2.1 Program counter............................................................................................................26
2.2.2 Loop registers ................................................................................................................27
2.2.3 User status register........................................................................................................27
2.2.4 Modifier registers ..........................................................................................................30
2.2.5 Predicate registers .........................................................................................................30
2.2.6 Circular start registers....................................................................................................32
2.2.7 User general pointer register.........................................................................................32
2.2.8 Global pointer................................................................................................................32
2.2.9 Cycle count registers......................................................................................................33
2.2.10 Frame limit register......................................................................................................34
2.2.11 Frame key register .......................................................................................................34
2.2.12 Packet count registers..................................................................................................34
2.2.13 Qtimer registers ...........................................................................................................35

80-N2040-53 Rev. AA 2
Qualcomm Hexagon V73 Programmer’s Reference Manual

3 Instructions.................................................................................................... 36
3.1 Hexagon processor instruction syntax.....................................................................................36
3.1.1 Numeric operands .........................................................................................................37
3.1.2 Terminology ...................................................................................................................38
3.1.3 Register operands ..........................................................................................................39
3.2 Instruction classes ...................................................................................................................41
3.3 Instruction packets ..................................................................................................................42
3.3.1 Packet execution semantics...........................................................................................42
3.3.2 Sequencing semantics ...................................................................................................43
3.3.3 Resource constraints......................................................................................................43
3.3.4 Grouping constraints .....................................................................................................44
3.3.5 Dependency constraints ................................................................................................45
3.3.6 Ordering constraints ......................................................................................................45
3.3.7 Alignment constraints....................................................................................................46
3.4 Instruction intrinsics................................................................................................................46
3.5 Compound instructions...........................................................................................................47
3.6 Duplex instructions .................................................................................................................47

4 Data Processing ............................................................................................ 48


4.1 Data types ...............................................................................................................................48
4.1.1 Fixed-point data.............................................................................................................48
4.1.1.1 Scalar operations..................................................................................................48
4.1.1.2 Vector operations.................................................................................................49
4.1.2 Floating-point data ........................................................................................................50
4.1.2.1 Floating-point operations.....................................................................................50
4.1.3 Complex data .................................................................................................................50
4.1.4 Vector data ....................................................................................................................50
4.2 Instruction options ..................................................................................................................52
4.2.1 Fractional scaling ...........................................................................................................52
4.2.2 Saturation ......................................................................................................................52
4.2.3 Arithmetic rounding ......................................................................................................52
4.2.4 Convergent rounding .....................................................................................................53
4.2.5 Scaling for divide and square-root.................................................................................53
4.3 XTYPE operations ....................................................................................................................54
4.3.1 ALU.................................................................................................................................54
4.3.2 Bit manipulation ............................................................................................................54
4.3.3 Complex .........................................................................................................................55
4.3.4 Floating point.................................................................................................................55
4.3.5 Multiply..........................................................................................................................56
4.3.6 Permute .........................................................................................................................58
4.3.7 Predicate........................................................................................................................58
4.3.8 Shift................................................................................................................................59

80-N2040-53 Rev. AA 3
Qualcomm Hexagon V73 Programmer’s Reference Manual

4.4 ALU32 operations....................................................................................................................60


4.5 Vector operations....................................................................................................................60
4.6 CR operations ..........................................................................................................................62
4.7 Compound operations ............................................................................................................62
4.8 Special operations ...................................................................................................................63
4.8.1 H.264 CABAC processing................................................................................................63
4.8.1.1 CABAC implementation........................................................................................64
4.8.1.2 Code example.......................................................................................................65
4.8.2 IP Internet checksum .....................................................................................................66
4.8.2.1 Code example.......................................................................................................67
4.8.3 Software-defined radio..................................................................................................68
4.8.3.1 Rake despreading .................................................................................................68
4.8.3.2 Polynomial operations .........................................................................................69

5 Memory........................................................................................................... 70
5.1 Memory model........................................................................................................................70
5.1.1 Address space ................................................................................................................70
5.1.2 Byte order ......................................................................................................................70
5.1.3 Alignment ......................................................................................................................71
5.2 Memory loads .........................................................................................................................72
5.3 Memory stores ........................................................................................................................73
5.4 Dual stores ..............................................................................................................................74
5.5 Slot 1 store with slot 0 load.....................................................................................................74
5.6 New-value stores.....................................................................................................................74
5.7 Mem-ops .................................................................................................................................75
5.8 Addressing modes ...................................................................................................................75
5.8.1 Absolute.........................................................................................................................76
5.8.2 Absolute-set...................................................................................................................76
5.8.3 Absolute with register offset .........................................................................................76
5.8.4 Global pointer relative ...................................................................................................76
5.8.5 Indirect...........................................................................................................................78
5.8.6 Indirect with offset ........................................................................................................78
5.8.7 Indirect with register offset ...........................................................................................78
5.8.8 Indirect with auto-increment immediate ......................................................................79
5.8.9 Indirect with auto-increment register ...........................................................................79
5.8.10 Circular with auto-increment immediate ....................................................................80
5.8.11 Circular with auto-increment register .........................................................................81
5.8.12 Bit-reversed with auto-increment register ..................................................................82
5.9 Conditional load/stores...........................................................................................................83
5.10 Cache memory ......................................................................................................................84
5.10.1 Uncached memory ......................................................................................................85
5.10.2 Tightly coupled memory..............................................................................................85
5.10.3 Cache maintenance operations ...................................................................................85

80-N2040-53 Rev. AA 4
Qualcomm Hexagon V73 Programmer’s Reference Manual

5.10.4 L2 cache operations .....................................................................................................86


5.10.5 Cache line zero.............................................................................................................86
5.10.6 Cache prefetch.............................................................................................................87
5.10.6.1 Hardware-based instruction cache prefetching .................................................87
5.10.6.2 Software-based data cache prefetching.............................................................87
5.10.6.3 Software-based L2fetch .....................................................................................87
5.10.6.4 Hardware-based data cache prefetching ...........................................................89
5.11 Memory ordering ..................................................................................................................90
5.12 Atomic operations .................................................................................................................91

6 Conditional Execution ................................................................................. 93


6.1 Scalar predicates .....................................................................................................................93
6.1.1 Generating scalar predicates .........................................................................................93
6.1.2 Consuming scalar predicates .........................................................................................95
6.1.3 Auto-AND predicates .....................................................................................................96
6.1.4 Dot-new predicates........................................................................................................97
6.1.5 Dependency constraints ................................................................................................98
6.2 Vector predicates ....................................................................................................................98
6.2.1 Vector compare .............................................................................................................98
6.2.2 Vector mux instruction ................................................................................................100
6.2.3 Using vector conditionals ............................................................................................100
6.3 Predicate operations .............................................................................................................101

7 Software Stack............................................................................................. 102


7.1 Stack structure ......................................................................................................................102
7.2 Stack frames ..........................................................................................................................103
7.3 Stack protection ....................................................................................................................103
7.3.1 Stack bounds checking.................................................................................................103
7.3.2 Stack smashing protection...........................................................................................104
7.4 Stack registers .......................................................................................................................104
7.5 Stack instructions ..................................................................................................................105

8 Program Flow .............................................................................................. 106


8.1 Conditional instructions ........................................................................................................106
8.2 Hardware loops .....................................................................................................................106
8.2.1 Loop setup ...................................................................................................................108
8.2.2 Loop end ......................................................................................................................109
8.2.3 Loop execution ............................................................................................................110
8.2.4 Pipelined hardware loops ............................................................................................110
8.2.5 Loop restrictions ..........................................................................................................112
8.3 Software branches ................................................................................................................113
8.3.1 Jumps...........................................................................................................................114

80-N2040-53 Rev. AA 5
Qualcomm Hexagon V73 Programmer’s Reference Manual

8.3.2 Calls..............................................................................................................................114
8.3.3 Returns ........................................................................................................................115
8.3.4 Extended branches ......................................................................................................116
8.3.5 Branches to and from packets .....................................................................................116
8.4 Speculative jumps .................................................................................................................117
8.5 Compare jumps .....................................................................................................................118
8.5.1 New-value compare jumps ..........................................................................................118
8.6 Register transfer jumps .........................................................................................................120
8.7 Dual jumps ............................................................................................................................121
8.8 Hint indirect jump target.......................................................................................................121
8.9 Pauses ...................................................................................................................................122
8.10 Exceptions ...........................................................................................................................122

9 PMU Events.................................................................................................. 125


9.1 V73 processor event symbols................................................................................................125

10 Instruction Encoding ................................................................................ 141


10.1 Instructions .........................................................................................................................141
10.2 Sub-instructions ..................................................................................................................143
10.3 Duplexes ..............................................................................................................................145
10.4 Instruction classes ...............................................................................................................147
10.5 Instruction packets ..............................................................................................................148
10.6 Loop packets .......................................................................................................................149
10.7 Immediate values ................................................................................................................150
10.8 Scaled immediate values.....................................................................................................150
10.9 Constant extenders .............................................................................................................151
10.10 New-value operands .........................................................................................................154
10.11 Instruction mapping ..........................................................................................................154

11 Instruction Set ........................................................................................... 155


11.1 ALU32 ..................................................................................................................................156
11.1.1 ALU32 ALU .................................................................................................................156
Logical operations ..........................................................................................158
Negate ............................................................................................................160
NOP ................................................................................................................161
Subtract ..........................................................................................................162
Sign extend.....................................................................................................164
Transfer immediate........................................................................................165
Transfer register.............................................................................................167
Vector add halfwords .....................................................................................168
Vector average halfwords ..............................................................................169
Vector subtract halfwords..............................................................................170

80-N2040-53 Rev. AA 6
Qualcomm Hexagon V73 Programmer’s Reference Manual

Zero extend ....................................................................................................171


11.1.2 ALU32 PERM ..............................................................................................................172
Mux ................................................................................................................174
Shift word by 16 .............................................................................................176
Pack high and low halfwords..........................................................................177
11.1.3 ALU32 PRED ...............................................................................................................178
Conditional shift halfword..............................................................................180
Conditional combine ......................................................................................182
Conditional logical operations........................................................................183
Conditional subtract.......................................................................................185
Conditional sign extend..................................................................................186
Conditional transfer .......................................................................................188
Conditional zero extend .................................................................................189
Compare.........................................................................................................191
Compare to general register ..........................................................................193
11.2 CR ........................................................................................................................................194
Corner detection acceleration .......................................................................196
Logical reductions on predicates....................................................................197
Looping instructions .......................................................................................198
Add to PC........................................................................................................200
Pipelined loop instructions.............................................................................201
Logical operations on predicates ...................................................................203
User control register transfer.........................................................................205
11.3 JR .........................................................................................................................................206
Hinted call subroutine from register ..............................................................207
Hint an indirect jump address ........................................................................208
Jump to address from register .......................................................................209
Hinted jump to address from register............................................................210
11.4 J ...........................................................................................................................................211
Compare and jump.........................................................................................213
Jump to address .............................................................................................217
Jump to address conditioned on new predicate ............................................218
Jump to address condition on register value .................................................219
Transfer and jump ..........................................................................................221
11.5 LD ........................................................................................................................................222
Load-acquire doubleword ..............................................................................224
Load doubleword conditionally......................................................................225
Load byte........................................................................................................227
Load byte conditionally ..................................................................................229
Load byte into shifted vector .........................................................................231
Load half into shifted vector ..........................................................................234
Load halfword ................................................................................................237
Load halfword conditionally...........................................................................239

80-N2040-53 Rev. AA 7
Qualcomm Hexagon V73 Programmer’s Reference Manual

Memory copy .................................................................................................241


Piecemeal memory copy ................................................................................242
Load unsigned byte ........................................................................................243
Load unsigned byte conditionally...................................................................245
Load unsigned halfword .................................................................................247
Load unsigned halfword conditionally ...........................................................249
Load word.......................................................................................................251
Load-acquire word .........................................................................................253
Load word conditionally.................................................................................254
Deallocate stack frame...................................................................................256
Deallocate frame and return..........................................................................258
Load and unpack bytes to halfwords .............................................................260
11.6 MEMOP ...............................................................................................................................268
Operation on memory halfword ....................................................................270
Operation on memory word ..........................................................................271
11.7 NV........................................................................................................................................272
11.7.1 NV J ............................................................................................................................272
11.7.2 NV ST..........................................................................................................................276
Store new-value byte conditionally................................................................278
Store new-value halfword ..............................................................................280
Store new-value halfword conditionally ........................................................282
Store new-value word ....................................................................................284
Store new-value word conditionally ..............................................................286
11.8 ST.........................................................................................................................................288
Store-release doubleword..............................................................................290
Store doubleword conditionally.....................................................................291
Store byte.......................................................................................................293
Store byte conditionally .................................................................................295
Store halfword................................................................................................298
Store halfword conditionally ..........................................................................301
Release ...........................................................................................................305
Store word......................................................................................................306
Store-release word.........................................................................................308
Store word conditionally ................................................................................309
Allocate stack frame.......................................................................................312
11.9 SYSTEM................................................................................................................................314
11.9.1 SYSTEM USER.............................................................................................................314
Store conditional ............................................................................................315
Zero a cache line ............................................................................................316
Memory barrier..............................................................................................317
Breakpoint......................................................................................................318
Data cache prefetch .......................................................................................319
Data cache maintenance user operations......................................................320

80-N2040-53 Rev. AA 8
Qualcomm Hexagon V73 Programmer’s Reference Manual

Send value to DIAG trace ...............................................................................322


Instruction cache maintenance user operations............................................323
Instruction synchronization............................................................................324
L2 cache prefetch ...........................................................................................325
Pause ..............................................................................................................328
Memory thread synchronization....................................................................329
Send value to ETM trace ................................................................................330
Trap ................................................................................................................331
Unpause .........................................................................................................332
11.10 XTYPE.................................................................................................................................333
11.10.1 XTYPE ALU................................................................................................................333
Absolute value word ......................................................................................334
Add and accumulate.......................................................................................335
Add doublewords ...........................................................................................337
Add halfword..................................................................................................339
Add or subtract doublewords with carry .......................................................341
Clip to unsigned..............................................................................................342
Logical doublewords ......................................................................................343
Logical-logical doublewords ...........................................................................345
Logical-logical words ......................................................................................346
Maximum words ............................................................................................349
Maximum doublewords .................................................................................350
Minimum words .............................................................................................351
Minimum doublewords..................................................................................352
Modulo wrap..................................................................................................353
Negate ............................................................................................................354
Round .............................................................................................................355
Subtract doublewords ....................................................................................358
Subtract and accumulate words.....................................................................359
Subtract halfword...........................................................................................360
Sign extend word to doubleword...................................................................362
Vector absolute value halfwords....................................................................363
Vector absolute value words..........................................................................364
Vector absolute difference bytes ...................................................................365
Vector absolute difference halfwords............................................................366
Vector absolute difference words..................................................................367
Vector add compare and select maximum bytes...........................................368
Vector add compare and select maximum halfwords ...................................369
Vector add halfwords .....................................................................................371
Vector add halfwords with saturate and pack to unsigned bytes..................373
Vector reduce add unsigned bytes.................................................................374
Vector reduce add halfwords .........................................................................376
Vector add bytes ............................................................................................378

80-N2040-53 Rev. AA 9
Qualcomm Hexagon V73 Programmer’s Reference Manual

Vector add words ...........................................................................................379


Vector average halfwords ..............................................................................380
Vector average unsigned bytes ......................................................................382
Vector average words ....................................................................................383
Vector clip to unsigned...................................................................................385
Vector conditional negate..............................................................................386
Vector maximum bytes ..................................................................................388
Vector maximum halfwords ...........................................................................389
Vector reduce maximum halfwords...............................................................390
Vector reduce maximum words .....................................................................392
Vector maximum words .................................................................................394
Vector minimum bytes...................................................................................395
Vector minimum halfwords............................................................................397
Vector reduce minimum halfwords ...............................................................398
Vector reduce minimum words......................................................................400
Vector minimum words..................................................................................402
Vector sum of absolute differences unsigned bytes ......................................403
Vector subtract halfwords..............................................................................405
Vector subtract bytes .....................................................................................407
Vector subtract words....................................................................................408
11.10.2 XTYPE BIT .................................................................................................................409
Count population ...........................................................................................411
Count trailing..................................................................................................412
Extract bit field ...............................................................................................413
Insert bit field .................................................................................................416
Interleave/deinterleave .................................................................................418
Linear feedback-shift iteration .......................................................................419
Masked parity.................................................................................................420
Bit reverse ......................................................................................................421
Set/clear/toggle bit ........................................................................................422
Split bit field ...................................................................................................424
Table index .....................................................................................................426
11.10.3 XTYPE COMPLEX ......................................................................................................428
Complex add/sub words ................................................................................431
Complex multiply ...........................................................................................433
Complex multiply real or imaginary ...............................................................437
Complex multiply with round and pack .........................................................439
Complex multiply 32 × 16...............................................................................441
Complex multiply real or imaginary 32-bit.....................................................443
Vector complex multiply real or imaginary ....................................................447
Vector complex conjugate .............................................................................450
Vector complex rotate ...................................................................................451
Vector reduce complex multiply by scalar .....................................................453

80-N2040-53 Rev. AA 10
Qualcomm Hexagon V73 Programmer’s Reference Manual

Vector reduce complex multiply by scalar with round and pack ...................456
Vector reduce complex rotate .......................................................................458
11.10.4 XTYPE FP ..................................................................................................................461
Floating point addition ...................................................................................461
Classify floating point value ...........................................................................462
Compare floating point value.........................................................................464
Convert floating point value to other format.................................................466
Convert integer to floating point value ..........................................................467
Convert floating point value to integer ..........................................................469
Floating point extreme value assistance ........................................................472
Floating point fused multiply-add ..................................................................473
Floating point fused multiply-add with scaling ..............................................474
Floating point reciprocal square root approximation ....................................475
Floating point fused multiply-add for library routines ...................................476
Create floating-point constant .......................................................................478
Floating point maximum ................................................................................479
Floating point minimum.................................................................................480
Floating point multiply ...................................................................................481
Floating point reciprocal approximation........................................................482
Floating point subtraction ..............................................................................483
11.10.5 XTYPE MPY...............................................................................................................484
Vector multiply word by signed half (32 × 16) ...............................................488
Vector multiply word by unsigned half (32 × 16) ...........................................492
Multiply signed halfwords..............................................................................496
Multiply unsigned halfwords..........................................................................503
Polynomial multiply words.............................................................................508
Vector reduce multiply word by signed half (32 × 16) ...................................510
Multiply and use upper result ........................................................................512
Multiply and use full result.............................................................................515
Vector dual multiply .......................................................................................517
Vector dual multiply with round and pack .....................................................520
Vector reduce multiply bytes .........................................................................522
Vector dual multiply signed by unsigned bytes..............................................524
Vector multiply even halfwords .....................................................................526
Vector multiply halfwords..............................................................................528
Vector multiply halfwords with round and pack............................................530
Vector multiply halfwords signed by unsigned ..............................................532
Vector reduce multiply halfwords..................................................................534
Vector multiply bytes .....................................................................................536
Vector polynomial multiply halfwords ...........................................................538
11.10.6 XTYPE PERM.............................................................................................................540
Saturate..........................................................................................................542
Swizzle bytes ..................................................................................................544

80-N2040-53 Rev. AA 11
Qualcomm Hexagon V73 Programmer’s Reference Manual

Vector align ....................................................................................................545


Vector round and pack ...................................................................................547
Vector saturate and pack ...............................................................................549
Vector saturate without pack.........................................................................552
Vector shuffle .................................................................................................555
Vector splat bytes...........................................................................................557
Vector splat halfwords ...................................................................................558
Vector splice...................................................................................................559
Vector sign extend .........................................................................................561
Vector truncate ..............................................................................................563
Vector zero extend.........................................................................................565
11.10.7 XTYPE PRED..............................................................................................................567
Compare byte.................................................................................................569
Compare half..................................................................................................571
Compare doublewords...................................................................................573
Compare bit mask ..........................................................................................574
Mask generate from predicate.......................................................................575
Check for TLB match.......................................................................................576
Predicate transfer...........................................................................................577
Test bit............................................................................................................578
Vector compare halfwords.............................................................................579
Vector compare bytes for any match.............................................................581
Vector compare bytes ....................................................................................582
Vector compare words...................................................................................584
Viterbi pack even and odd predicate bits.......................................................586
Vector mux .....................................................................................................587
11.10.8 XTYPE SHIFT .............................................................................................................588
Shift by immediate .........................................................................................589
Shift by immediate and accumulate...............................................................591
Shift by immediate and add ...........................................................................594
Shift by immediate and logical .......................................................................595
Shift right by immediate with rounding .........................................................598
Shift left by immediate with saturation .........................................................600
Shift by register ..............................................................................................601
Shift by register and accumulate....................................................................604
Shift by register and logical ............................................................................607
Shift by register with saturation.....................................................................610
Vector shift halfwords by immediate.............................................................611
Vector arithmetic shift halfwords with round................................................612
Vector arithmetic shift halfwords with saturate and pack.............................613
Vector shift halfwords by register..................................................................615
Vector shift words by immediate ...................................................................617
Vector shift words by register ........................................................................618

80-N2040-53 Rev. AA 12
Qualcomm Hexagon V73 Programmer’s Reference Manual

Vector shift words with truncate and pack ....................................................620


Instruction Index ............................................................................................................622
Intrinsics Index ...............................................................................................................650

80-N2040-53 Rev. AA 13
Qualcomm Hexagon V73 Programmer’s Reference Manual

Figures
Figure 1-1 Hexagon V73 processor architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .19
Figure 1-2 Vector instruction example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .23
Figure 1-3 Instruction classes and combinations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .26
Figure 1-4 Register field symbols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .29
Figure 2-1 General registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .32
Figure 2-2 Control registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .35
Figure 3-1 Packet grouping combinations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .51
Figure 4-1 Vector byte operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .58
Figure 4-2 Vector halfword operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .58
Figure 4-3 Vector word operation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .58
Figure 4-4 64-bit shift and add/sub/logical . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .67
Figure 4-5 Vector halfword shift right. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .70
Figure 5-1 Hexagon processor byte order . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .80
Figure 5-2 L2fetch instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .99
Figure 6-1 Vector byte compare . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .110
Figure 6-2 Vector halfword compare . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .110
Figure 6-3 Vector mux instruction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .111
Figure 7-1 Stack structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .115
Figure 10-1 Instruction packet encoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .158

80-N2040-53 Rev. AA 14
Qualcomm Hexagon V73 Programmer’s Reference Manual

Tables
Table 1-1 Register symbols. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .28
Table 1-2 Register bit field symbols. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .29
Table 1-3 Instruction operands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .30
Table 1-4 Data symbols. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .31
Table 2-1 General register aliases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .34
Table 2-2 General register pairs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .34
Table 2-3 Aliased control registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .36
Table 2-4 Control register pairs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .37
Table 2-5 Loop registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .38
Table 2-6 User status register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .39
Table 2-7 Modifier registers (indirect auto-increment addressing) . . . . . . . . . . . . . . . . . . . . . .41
Table 2-8 Modifier registers (circular addressing) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .41
Table 2-9 Modifier registers (bit-reversed addressing) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .42
Table 2-10 Predicate registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .42
Table 2-11 Circular start registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .43
Table 2-12 User general pointer register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Table 2-13 Global pointer register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .43
Table 2-14 Cycle count registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .44
Table 2-15 Frame limit register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .44
Table 2-16 Frame key register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .45
Table 2-17 Packet count registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .45
Table 2-18 Qtimer registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .46
Table 3-1 Instruction symbols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .47
Table 3-2 Instruction classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .48
Table 4-1 Single-precision multiply options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .65
Table 4-2 Double precision multiply options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .65
Table 4-3 Control register transfer instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .71
Table 5-1 Memory alignment restrictions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .81
Table 5-2 Load instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .81
Table 5-3 Store instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .82
Table 5-4 Mem-ops . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .84
Table 5-5 Addressing modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .84
Table 5-6 Offset ranges (global pointer relative) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .86
Table 5-7 Offset ranges (indirect with offset). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .87
Table 5-8 Increment ranges (indirect with auto-inc immediate) . . . . . . . . . . . . . . . . . . . . . . . .88
Table 5-9 Increment ranges (circular with auto-inc immediate). . . . . . . . . . . . . . . . . . . . . . . . .89
Table 5-10 Increment ranges (circular with auto-inc register) . . . . . . . . . . . . . . . . . . . . . . . . . . .91
Table 5-11 Addressing modes (conditional load/store) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .93
Table 5-12 Conditional offset ranges (indirect with offset) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .94
Table 5-13 Cache instructions (user-level) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .96
Table 5-14 Memory ordering instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .100

80-N2040-53 Rev. AA 15
Qualcomm Hexagon V73 Programmer’s Reference Manual

Table 5-15 Atomic instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .101


Table 6-1 Scalar predicate-generating instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .104
Table 6-2 Vector mux instruction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .111
Table 6-3 Predicate register instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .113
Table 7-1 Stack registers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .117
Table 7-2 Stack instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .118
Table 8-1 Loop instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .121
Table 8-2 Software pipelined loop. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .125
Table 8-3 Software pipelined loop (using spNloop0) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .126
Table 8-4 Software branch instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .127
Table 8-5 Jump instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .128
Table 8-6 Call instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .128
Table 8-7 Return instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .129
Table 8-8 Speculative jump instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .131
Table 8-9 Compare jump instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .133
Table 8-10 New-value compare jump instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .134
Table 8-11 Register transfer jump instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .135
Table 8-12 Dual jump instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .135
Table 8-13 Jump hint instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .136
Table 8-14 Pause instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .137
Table 8-15 V73 exceptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .138
Table 9-1 V73 processor events symbols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .141
Table 10-1 Instruction fields. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .151
Table 10-2 Sub-instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .153
Table 10-3 Sub-instruction registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .154
Table 10-4 Duplex instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .155
Table 10-5 Duplex ICLASS field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .155
Table 10-6 Instruction class encoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .157
Table 10-7 Loop packet encoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .159
Table 10-8 Scaled immediate encoding (indirect offsets) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .160
Table 10-9 Constant extender encoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .161
Table 10-10 Constant extender instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .162
Table 10-11 Instruction mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .165
Table 11-1 Instruction operand symbols. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .166
Table 11-2 Instruction behavior symbols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .167

80-N2040-53 Rev. AA 16
1 Introduction

The Qualcomm Hexagon™ processor is a general-purpose digital signal processor designed for
high performance and low power across a wide variety of multimedia and modem applications.
V73 is a member of the sixth generation of the Hexagon processor architecture.

1.1 Hexagon V73 processor architecture

1.1.1 Memory
The Hexagon processor features a unified byte-addressable memory. This memory has a single
32-bit virtual address space, which holds both instructions and data. It operates in little-endian
mode.
The load/store architecture supports a complete set of addressing modes for both compiler code
generation and DSP application programming.

1.1.1.1 Cache memory


Memory accesses are cached or uncached. Separate L1 instruction and data caches exist for
program code and data. A unified L2 cache is partly or wholly configured as tightly-coupled
memory (TCM).

1.1.1.2 Virtual memory


Memory is addressed virtually, with virtual-to-physical memory mapping handled by a resident
OS. Virtual memory supports the implementation of memory management and memory
protection in a hardware-independent manner.

1.1.2 Registers
The Hexagon processor has two sets of registers: General registers and Control registers.
The general registers include thirty-two 32-bit registers (named R0 through R31), which are
accessed either as single registers or as aligned 64-bit register pairs. The general registers contain
all data, including pointer, scalar, vector, and accumulator data.
The control registers include special-purpose registers such as program counter, status register,
loop registers, and so on.

80-N2040-53 Rev. AA 17
Qualcomm Hexagon V73 Programmer’s Reference Manual Introduction

1.1.3 Instruction sequencer


The instruction sequencer processes packets of one to four instructions in each cycle. If a packet
contains more than one instruction, the instructions execute in parallel.
The instruction combinations allowed in a packet are limited to the instruction types that can
execute in parallel in the four execution units (shown in Figure 1-1).
Memory
(unified address space)

Load/ Load/
64
4x32 bit Store 64 Store
128 Instructions S3: X Unit
XTYPE Instructions
ALU32 Instructions
J Instructions
Sequencer CR Instructions

Packets of
1-4 instructions
S2: X Unit
XTYPE Instructions
ALU32 Instructions
J Instructions
JR Instructions
General
Control Registers
Hardware Loop Regs
Registers
Modifier Registers S1: Load/Store
Status Register Unit R0-R31
Program Counter LD Instructions
Predicate Registers ST Instructions
User General Pointer ALU32 Instructions
Global Pointer
Circular Start Registers

S0: Load/Store
Unit
LD Instructions
ST Instructions
ALU32 Instructions
MEMOP Instructions
NV Instructions
SYSTEM Instructions

Figure 1-1 Hexagon V73 processor architecture

1.1.4 Execution units


The dual execution units are identical: each includes a 64-bit shifter and a vector
multiply/accumulate unit with four 16 × 16 multipliers to support both scalar and vector
instructions.
These units also perform 32- and 64-bit ALU instructions, and jump and loop instructions.

NOTE: Each execution unit supports floating-point instructions.

1.1.5 Load/store units


The two load/store units can operate on signed or unsigned bytes, halfwords (16-bit), words (32-
bit), or double words (64-bit).
To increase the number of instruction combinations allowed in packets, the load units also
support 32-bit ALU instructions.

80-N2040-53 Rev. AA 18
Qualcomm Hexagon V73 Programmer’s Reference Manual Introduction

1.2 Instruction set


For the Hexagon processor to achieve large amounts of work per cycle, the instruction set is
designed with the following properties:
■ Static grouping (VLIW) architecture
■ Static fusing of simple dependent instructions
■ Extensive compound instructions
■ A large set of SIMD and application-specific instructions
To support efficient compilation, the instruction set is designed to be orthogonal with respect to
registers, addressing modes, and load/store access size.

1.2.1 Addressing modes


The Hexagon processor supports the following memory addressing modes:
■ 32-bit absolute
■ 32-bit absolute-set
■ Absolute with register offset
■ Global pointer relative
■ Indirect
■ Indirect with offset
■ Indirect with register offset
■ Indirect with auto-increment (immediate or register)
■ Circular with auto-increment (immediate or register)
■ Bit-reversed with auto-increment register
For example:
R2 = memw(##myvariable)
R2 = memw(R3=##myvariable)
R2 = memw(R4<<#3+##myvariable)
R2 = memw(GP+#200)
R2 = memw(R1)
R2 = memw(R3+#100)
R2 = memw(R3+R4<<#2)
R2 = memw(R3++#4)
R2 = memw(R0++M1)
R0 = memw(R2++#8:circ(M0))
R0 = memw(R2++I:circ(M0))
R2 = memw(R0++M1:brev)

Auto-increment with register addressing uses one of the two dedicated address-modify registers
M0 and M1 (which are part of the control registers).

NOTE: Atomic memory operations (load locked/store conditional) are supported to implement multi-
thread synchronization.

80-N2040-53 Rev. AA 19
Qualcomm Hexagon V73 Programmer’s Reference Manual Introduction

1.2.2 Program flow


The Hexagon processor supports zero-overhead hardware loops. For example:
loop0(start,#3) // loop 3 times
start:
{ R0 = mpyi(R0,R0) } :endloop0

The loop instructions support nestable loops, with few restrictions on their use.
Software branches use a predicated branch mechanism. Explicit compare instructions generate a
predicate bit, which is then tested by conditional branch instructions. For example:
P1 = cmp.eq(R2, R3)
if (P1) jump end

Jumps and subroutine calls are conditional or unconditional, and support both PC-relative and
register indirect addressing modes. For example:
jump end
jumpr R1
call function
callr R2

The subroutine call instructions store the return address in register R31. Subroutine returns are
performed using a jump indirect instruction through this register. For example:
jumpr R31 // Subroutine return

Two program flow instructions can be grouped into one packet.

1.2.3 Instruction pipeline


Pipeline hazards are resolved by the hardware: instruction scheduling is not constrained by
pipeline restrictions.

1.3 Technical assistance


For assistance or clarification on information in this document, submit a case to Qualcomm
Technologies, Inc. (QTI) at https://createpoint.qti.qualco For assistance or clarification on
information in this document, open a technical support case at https://support.qualcomm.com/.
You will need to register for a Qualcomm ID account and your company must have support
enabled to access our Case system.
Other systems and support resources are listed on https://qualcomm.com/support.
If you need further assistance, you can send an email to qualcomm.support@qti.qualcomm.com.

80-N2040-53 Rev. AA 20
2 Registers

General registers are used for general-purpose computation, including address generation, and
scalar and vector arithmetic.
Control registers support special-purpose processor features such as hardware loops and
predicates.

2.1 General registers


The Hexagon processor has thirty-two 32-bit general-purpose registers (named R0 through R31).
These registers store operands in all of the instructions:
■ Memory addresses for load/store instructions
■ Data operands for arithmetic/logic instructions
■ Vector operands for vector instructions
For example:
R1 = memh(R0) // Load from address R0
R4 = add(R2,R3) // Add
R28 = vaddh(R11,R10) // Vector add halfword

R3 R2 R1 R0

..
R3:2 R1:0

.
R31 R30 R29 R28

R31:30 R29:28

Figure 2-1 General registers

80-N2040-53 Rev. AA 21
Qualcomm Hexagon V73 Programmer’s Reference Manual Registers

Aliased registers
Three of the general registers – R29 through R31 – support subroutines (Section 8.3.2) and the
Software Stack. The subroutine and stack instructions implicitly modify the registers. They have
symbol aliases that indicate when these registers are accessed as subroutine and stack registers.
For example:
SP = add(SP, #-8) // SP is alias of R29
allocframe // Modifies SP (R29) and FP (R30)
call init // Modifies LR (R31)

Table 2-1 General register aliases


Register Alias Name Description
R29 SP Stack pointer Points to topmost element of stack in memory.
R30 FP Frame pointer Points to current procedure frame on stack.

Used by external debuggers to examine the stack


and determine call sequence, parameters, local
variables, and so on.
R31 LR Link register Stores return address of a subroutine call.

Register pairs
The general registers can be specified as register pairs that represent a single 64-bit register. For
example:
R1:0 = memd(R3) // Load doubleword
R7:6 = valignb(R9:8,R7:6, #2) // Vector align

NOTE: The first register in a register pair must always be odd-numbered, and the second must be the
next lower register.

Table 2-1 General register pairs


Register Register pair
R0 R1:0
R1
R2 R3:2
R3
R4 R5:4
R5
R6 R7:6
R7

R24 R25:24
R25
R26 R27:26
R27

80-N2040-53 Rev. AA 22
Qualcomm Hexagon V73 Programmer’s Reference Manual Registers

Table 2-1 General register pairs


Register Register pair
R28 R29:28
R29 (SP)
R30 (FP) R31:30 (LR:FP)
R31 (LR)

2.2 Control registers


The Hexagon processor includes a set of 32-bit control registers that provide access to processor
features such as the program counter, hardware loops, and vector predicates.
Unlike general registers, control registers are used as instruction operands only in the following
cases:
■ Instructions that require a specific control register as an operand
■ Register transfer instructions
For example:
R2 = memw(R0++M1) // Auto-increment addressing mode (M1)
R9 = PC // Get program counter (PC)
LC1 = R3 // Set hardware loop count (LC1)

NOTE: When a control register is used in a register transfer, the other operand must be a general
register.

80-N2040-53 Rev. AA 23
Qualcomm Hexagon V73 Programmer’s Reference Manual Registers

LC0 SA0 UPCYCLELO


Loop registers Cycle count registers
LC1 SA1 UPCYCLEHI

PC Program counter FRAMELIMIT Stack bounds register

USR User status register FRAMEKEY Stack smash register

M0 PKTCOUNTLO
Modifier registers Packet count registers
M1 PKTCOUNTHI

P3:0 Predicate registers UTIMERLO


Qtimer registers
UGP User general pointer UTIMERHI

GP Global pointer

CS0
Circular start registers
CS1

Figure 2-2 Control registers

80-N2040-53 Rev. AA 24
Qualcomm Hexagon V73 Programmer’s Reference Manual Registers

Aliased registers
The control registers have numeric aliases (C0 through C31).

Table 2-2 Aliased control registers


Register Alias Name
SA0 C0 Loop start address register 0
LC0 C1 Loop count register 0
SA1 C2 Loop start address register 1
LC1 C3 Loop count register 1
P3:0 C4 Predicate registers 3:0
reserved C5 –
M0 C6 Modifier register 0
M1 C7 Modifier register 1
USR C8 User status register
PC C9 Program counter
UGP C10 User general pointer
GP C11 Global pointer
CS0 C12 Circular start register 0
CS1 C13 Circular start register 1
UPCYCLELO C14 Cycle count register (low)
UPCYCLEHI C15 Cycle count register (high)
UPCYCLE C15:14 Cycle count register
FRAMELIMIT C16 Frame limit register
FRAMEKEY C17 Frame key register
PKTCOUNTLO C18 Packet count register (low)
PKTCOUNTHI C19 Packet count register (high)
PKTCOUNT C19:18 Packet count register
reserved C20-29 –
UTIMERLO C30 Qtimer register (low)
UTIMERHI C31 Qtimer register (high)
UTIMER C31:30 Qtimer register

NOTE: The control register numbers (0 through 31) specify the control registers in Instruction Encodings.

80-N2040-53 Rev. AA 25
Qualcomm Hexagon V73 Programmer’s Reference Manual Registers

Register pairs
The control registers can be specified as register pairs that represent a single 64-bit register.
Control registers specified as pairs must use their numeric aliases. For example:
C1:0 = R5:4 // C1:0 specifies the LC0/SA0 register pair

NOTE: The first register in a control register pair must always be odd-numbered, and the second must be
the next lower register.

Table 2-2 Control register pairs


Register Register pair
C0 C1:0
C1
C2 C3:2
C3
C4 C5:4
C5
C6 C7:6
C7

C30 C31:30
C31

2.2.1 Program counter


The program counter (PC) register points to the next instruction packet to execute (Section 3.3). It
is modified implicitly by instruction execution, but can be read directly. For example:
R7 = PC // Get program counter

NOTE: The PC register is read-only: writing to it has no effect.

80-N2040-53 Rev. AA 26
Qualcomm Hexagon V73 Programmer’s Reference Manual Registers

2.2.2 Loop registers


The Hexagon processor includes two sets of loop registers to support nested Hardware loops.
Each hardware loop is implemented with a pair of registers containing the loop count and loop
start address. The loop registers are modified implicitly by the loop instruction, but can also be
accessed directly. For example:
loop0(start, R4) // Modifies LC0 and SA0 (LC0=R4, SA0=&start)
LC1 = R22 // Set loop1 count
R9 = SA1 // Get loop1 start address

Table 2-3 Loop registers


Register Name Description
LC0, LC1 Loop count Number of loop iterations to execute.
SA0, SA1 Loop start address Address of first instruction in loop.

2.2.3 User status register


The user status register (USR) stores processor status and control bits that user programs can
access. The status bits contain the status results of certain instructions, while the control bits
contain user-settable processor modes for hardware prefetching. For example:
R9:8 = vaddw(R9:8, R3:2):sat // Vector add words
R6 = USR // Get saturation status

USR stores the following status and control values:


■ Cache prefetch enable
■ Cache prefetch status
■ Floating point modes
■ Floating point status
■ Hardware loop configuration (Section 8.2)
■ Sticky Saturation overflow

NOTE: A user control register transfer to USR cannot be grouped in an instruction packet with a Floating
point instruction.

When a transfer to USR changes the enable trap bits [29:25], an isync instruction (Section 5.11)
must execute before the new exception programming can take effect.

80-N2040-53 Rev. AA 27
Qualcomm Hexagon V73 Programmer’s Reference Manual Registers

Table 2-4 User status register


Name RW Bits Field Description
USR 32 User status register
R 31 PFA L2 prefetch active.
1: l2fetch instruction in progress
0: l2fetch finished (or inactive)
Set when nonblocking l2fetch instruction is prefetching
requested data.
Remains set until l2fetch prefetch operation completes
(or inactive).
R 30 reserved Return 0 if read.
Reserved for future expansion. To remain compatible
with future processor versions, software should always
write this field with the same value read from the field.
RW 29 FPINEE Enable trap on IEEE inexact.
RW 28 FPUNFE Enable trap on IEEE underflow.
RW 27 FPOVFE Enable trap on IEEE overflow.
RW 26 FPDBZE Enable trap on IEEE divide-by-zero.
RW 25 FPINVE Enable trap on IEEE invalid.
R 24 reserved Reserved
RW 23:22 FPRND Rounding mode for floating point instructions.
00: Round to nearest, ties to even (default)
01: Toward zero
10: Downward (toward negative infinity)
11: Upward (toward positive infinity)
R 21:20 reserved Return 0 if read.
Reserved for future expansion. To remain compatible
with future processor versions, software should always
write this field with the same value read from the field.
R 19:18 reserved Reserved
R 17 reserved Return 0 if read.
Reserved for future expansion. To remain compatible
with future processor versions, software should always
write this field with the same value read from the field.
RW 16:15 HFI L1 instruction prefetch.
00: Disable
01: Enable (1 line)
10: Enable (2 lines)

80-N2040-53 Rev. AA 28
Qualcomm Hexagon V73 Programmer’s Reference Manual Registers

Table 2-4 User status register (cont.)


Name RW Bits Field Description
RW 14:13 HFD L1 data cache prefetch.
Four levels are defined from disabled to aggressive.
Implementation defines how to interpret these levels.
00: disable
01: conservative
10: moderate
11: aggressive
RW 12 PCMME Enable packet counting in Monitor mode.
RW 11 PCGME Enable packet counting in Guest mode.
RW 10 PCUME Enable packet counting in User mode.
RW 9:8 LPCFGE Hardware loop configuration.
Number of loop iterations (0 to 3) remaining before
pipeline predicate should be set.
R 7:6 reserved Return 0 if read.
Reserved for future expansion. To remain compatible
with future processor versions, software should always
write this field with the same value read from the field.
RW 5 FPINPF Floating-point IEEE inexact sticky fag.
RW 4 FPUNFF Floating-point IEEE underflow sticky flag.
RW 3 FPOVFF Floating-point IEEE overflow sticky flag.
RW 2 FPDBZF Floating-point IEEE divide-by-zero sticky flag.
RW 1 FPINVF Floating-point IEEE invalid sticky flag.
RW 0 OVF Sticky saturation overflow.
1: saturation occurred
0: no saturation
Set when saturation occurs while executing an
instruction that specifies optional saturation.
Remains set until explicitly cleared by a USR = Rs
instruction.

80-N2040-53 Rev. AA 29
Qualcomm Hexagon V73 Programmer’s Reference Manual Registers

2.2.4 Modifier registers


The modifier registers (M0 to M1) are used in the following addressing modes.

Indirect auto-increment
In Indirect with auto-increment register addressing the modifier registers store a signed 32-bit
value that specifies the increment (or decrement) value.

Table 2-5 Modifier registers used in indirect auto-increment addressing


Register Name Description
M0, M1 Increment Signed auto-increment value.

Circular
In circular addressing (Section 5.8.10) the modifier registers store the circular buffer length and
related “I” values.

Table 2-6 Modifier registers as used in circular addressing


Name RW Bits Field Description
M0, M1 32 Circular buffer specifier.
RW 31:28 I[10:7] I value (MSB - see Section 5.8.11)
RW 27:24 0x0
RW 23:17 I[6:0] I value (LSB)
RW 16:0 Length Circular buffer length

Bit-reversed
In bit-reversed addressing (Section 5.8.12) the modifier registers store a signed 32-bit value that
specifies the increment (or decrement) value.

Table 2-7 Modifier registers as used in bit-reversed addressing


Register Name Description
M0, M1 Increment Signed auto-increment value.

2.2.5 Predicate registers


The predicate registers (P0 through P3) store the status results of the scalar and vector compare
instructions (Chapter 6). For example:
P1 = cmp.eq(R2, R3) // Scalar compare
if (P1) jump end // Jump to address (conditional)
R8 = P1 // Get compare status (P1 only)
P3:0 = R4 // Set compare status (P0-P3)

The four predicate registers can be specified as a register quadruple (P3:0) that represents a
single 32-bit register.

80-N2040-53 Rev. AA 30
Qualcomm Hexagon V73 Programmer’s Reference Manual Registers

NOTE: Unlike the other control registers, the predicate registers are only 8 bits wide because vector
compares return a maximum of eight status results.

Table 2-3 Predicate registers


Register Bits Description
P0, P1,P2, P3 8 Compare status results.
P3:0 32 Compare status results.
31:24 P3 register
23:16 P2 register
15:8 P1 register
7:0 P0 register

80-N2040-53 Rev. AA 31
Qualcomm Hexagon V73 Programmer’s Reference Manual Registers

2.2.6 Circular start registers


The circular start registers (CS0 through CS1) store the start address of a circular buffer in circular
addressing (Section 5.8.10). For example:
CS0 = R5 // Set circ start register
M0 = R7 // Set modifier register
R0 = memb(R2++#4:circ(M0)) // Load from circ buffer pointed
// to by CS0 with size/K vals in M0

Table 2-8 Circular start registers


Register Name Description
CS0, CS1 Circular start Circular buffer start address.

2.2.7 User general pointer register


The user general pointer (UGP) register is a general-purpose control register. For example:
R9 = UGP // Get UGP
UGP = R3 // Set UGP

NOTE: UGP typically stores the address of thread local storage.

Table 2-4 User general pointer register


Register Name Description
UGP User general pointer General-purpose control register.

2.2.8 Global pointer


The global pointer (GP) is used in GP-relative addressing. For example:
GP = R7 // Set GP
R2 = memw(GP+#200) // GP-relative load

Table 2-9 Global pointer register


Name R/W Bits Field Description
GP 32 Global pointer register
R/W 31:6 GDP Global data pointer (Section 5.8.4).
R 5:0 reserved Return 0 if read.
Reserved for future expansion. To remain forward-
compatible with future processor versions, software
should always write this field with the same value
read from the field.

80-N2040-53 Rev. AA 32
Qualcomm Hexagon V73 Programmer’s Reference Manual Registers

2.2.9 Cycle count registers


The cycle count registers (UPCYCLELO to UPCYCLEHI) store a 64-bit value containing the current
number of processor cycles executed since the Hexagon processor was last reset. For example:
R5 = UPCYCLEHI // Get cycle count (high)
R4 = UPCYCLELO // Get cycle count (low)
R5:4 = UPCYCLE // Get cycle count

NOTE: The RTOS must grant permission to access these registers. Without this permission, reading these
registers from user code returns zero.

Table 2-5 Cycle count registers


Register Name Description
UPCYCLELO Cycle count (low) Processor cycle count (low 32 bits)
UPCYCLEHI Cycle count (high) Processor cycle count (high 32 bits)
UPCYCLE Cycle count Processor cycle count (64 bits)

80-N2040-53 Rev. AA 33
Qualcomm Hexagon V73 Programmer’s Reference Manual Registers

2.2.10 Frame limit register


The frame limit register (FRAMELIMIT) stores the low address of the memory area reserved for
the software stack (Section 7.3.1). For example:
R9 = FRAMELIMIT // Get frame limit register
FRAMELIMIT = R3 // Set frame limit register

Table 2-10 Frame limit register


Register Name Description
FRAMELIMIT Frame limit Low address of software stack area.

2.2.11 Frame key register


The frame key register (FRAMEKEY) stores the key value that XOR-scrambles return addresses
when they are stored on the software stack (Section 7.3.2). For example:
R2 = FRAMEKEY // Get frame key register
FRAMEKEY = R1 // Set frame key register

Table 2-11 Frame key register


Register Name Description
FRAMEKEY Frame key Key to scramble return addresses
stored on software stack.

2.2.12 Packet count registers


The packet count registers (PKTCOUNTLO to PKTCOUNTHI) store a 64-bit value containing the
current number of instruction packets executed since a PKTCOUNT register was last written to.
For example:
R9 = PKTCOUNTHI // Get packet count (high)
R8 = PKTCOUNTLO // Get packet count (low)
R9:8 = PKTCOUNT // Get packet count

Packet counting can be configured to operate only in specific sets of processor modes (for
example, User mode only, or Guest and Monitor modes only). Bits [12:10] in the User status
register control the configuration for each mode.
Packets with exceptions are not counted as committed packets.

80-N2040-53 Rev. AA 34
Qualcomm Hexagon V73 Programmer’s Reference Manual Registers

NOTE: Each hardware thread has its own set of packet count registers.

The RTOS must grant permission to access these registers. Without this permission, reading these
registers from user code returns zero.

When a value is written to a PKTCOUNT register, the 64-bit packet count value is incremented
before the value is stored in the register.

Table 2-12 Packet count registers


Register Name Description
PKTCOUNTLO Packet count (low) Processor packet count (low 32 bits)
PKTCOUNTHI Packet count (high) Processor packet count (high 32 bits)
PKTCOUNT Cycle count Processor packet count (64 bits)

2.2.13 Qtimer registers


The Qtimer registers (UTIMERLO to UTIMERHI) provide access to the Qtimer global reference
count value. They enable Hexagon software to read the 64-bit time value without having to
perform an expensive advanced high-performance bus (AHB) load. For example:
R5 = UTIMERHI // Get Qtimer reference count (high)
R4 = UTIMERLO // Get Qtimer reference count (low)
R5:4 = UTIMER // Get Qtimer reference count

These registers are read-only – hardware automatically updates these registers to contain the
current Qtimer value.

NOTE: The RTOS must grant permission to access these registers. Without this permission, reading these
registers from user code returns zero.

Table 2-13 Qtimer registers


Register Name Description
UTIMERLO Qtimer (low) Qtimer global reference count (low 32 bits)
UTIMERHI Qtimer (high) Qtimer global reference count (high 32 bits)
UTIMER Qtimer Qtimer global reference count (64 bits)

80-N2040-53 Rev. AA 35
3 Instructions

Instruction encoding is described in Chapter 10. For detailed descriptions of the Hexagon
processor instructions, see Chapter 11.

3.1 Hexagon processor instruction syntax


NOTE: The notation described here does not appear in actual assembly language instructions. It is used
only to specify the instruction syntax and behavior.

Most Hexagon processor instructions have the following syntax:


dest = instr_name(source1,source2,...)[:option1][:option2]...

The item specified on the left-hand side (LHS) of the equation is assigned the value specified by
the right-hand side (RHS). For example:
R2 = add(R3,R1) // Add R3 and R1, assign result to R2
■ Courier font is used for instructions

■ Square brackets enclose optional items (for example, [:sat], means that saturation is
optional)
■ Braces indicate a choice of items (for example, {Rs,#s16}, means that either Rs or a signed
16-bit immediate can be used)

Table 3-1 Instruction symbols


Symbol Example Meaning
= R2 = R3 Assignment of RHS to LHS
# R1 = #1 Immediate constant value
## ##2147483647 32-bit immediate constant value
0x 0xBABE Hexadecimal number prefix
MEMxx R2 = MEMxx(R3) Access memory. xx specifies the size and
type of access.
; R2 = R3; R4 = R5; Instruction delimiter, or end of instruction
{…} {R2 = R3; R5 = R6} Instruction packet delimiter; indicates a
group of parallel instructions
(…) R2 = memw(R0 + #100) Source list delimiter
:endloopX :endloop0 Loop end
X specifies loop instruction (0 or 1)

80-N2040-53 Rev. AA 36
Qualcomm Hexagon V73 Programmer’s Reference Manual Instructions

Table 3-1 Instruction symbols (cont.)


Symbol Example Meaning
:t if (P0.new) jump:t target Direction hint (jump taken)
:nt if (!P1.new) jump:nt target Direction hint (jump not taken)
:sat R2 = add(R1,R2):sat Saturate result
:rnd R2 = mpy(R1.H,R2.H):rnd Round result
:carry R5:4=add(R1:0,R3:2,P1):carry Predicate used as carry input and output
:<<16 R2 = add(R1.L,R2.L):<<16 Shift result left by halfword
:mem_noshuf {memw(R5) = R2; Inhibit load/store reordering (Section 5.5)
R3 = memh(R6)}:mem_noshuf

3.1.1 Numeric operands


Table 3-2 lists the notation that describes numeric operands in the syntax and behavior of
instructions:

Table 3-2 Instruction operands


Symbol Meaning Min Max Example
#uN Unsigned N-bit immediate value 0 2N-1 –
#sN Signed N-bit immediate value -2N-1 2N-1-1 –
#mN Signed N-bit immediate value -(2N-1-1) 2N-1-1 –
#uN:S Unsigned N-bit immediate value representing 0 (2N-1) × 2S –
integral multiples of 2S in specified range
#sN:S Signed N-bit immediate value representing (-2N-1) × 2S (2N-1-1) × –
integral multiples of 2S in specified range
2S
#rN:S Same as #sN:S, but value is offset from PC (-2N-1) × 2S (2N-1-1) × –
of current packet
2S
usatN Saturate value to unsigned N-bit number 0 2N-1 usat16(Rs)

satN Saturate value to signed N-bit number -2N-1 2N-1-1 sat16(Rs)

sxt x->y Sign-extend value from x to y bits – – sxt32->64(Rs)

zxt x->y Zero-extend value from x to y bits – – zxt32->64(Rs)

>>> Logical right shift – – Rss >>> offset

The #uN, #sN, and #mN symbols specify immediate operands in instructions. The # symbol
appears in the actual instruction to indicate the immediate operand.
The #rN symbol specifies loop and branch destinations in instructions. The # symbol does not
appear in the actual instruction; instead, the entire #rN symbol (including its :S suffix) is
expressed as a loop or branch symbol whose numeric value is determined by the assembler and
linker. For example:

80-N2040-53 Rev. AA 37
Qualcomm Hexagon V73 Programmer’s Reference Manual Instructions

call my_proc // instructi


ple

The :S suffix indicates that the S least-significant bits in a value are implied zero bits and
therefore not encoded in the instruction. The implied zero bits are called scale bits.
For example, #s4:2 denotes a signed immediate operand represented by four bits encoded in the
instruction, and two scale bits. The possible values for this operand are -32, -28, -24, -20, -16, -12,
-8, -4, 0, 4, 8, 12, 16, 20, 24, and 28.
The ## symbol specifies a 32-bit immediate operand in an instruction (including a loop or branch
destination). The ## symbol indicates the operand in the actual instruction.
Examples of operand symbols:
Rd = add(Rs,#s16) // #s16 -> signed 16-bit imm value
Rd = memw(Rs++#s4:2) // #s4:2 -> scaled signed 4-bit imm value
call #r22:2 // #r22:2 -> scaled 22-bit PC-rel addr value
Rd = ##u32 // ##u32 -> unsigned 32-bit imm value

NOTE: When an instruction contains more than one immediate operand, the operand symbols are
specified in upper and lower case (for example, #uN and #UN) to indicate where they appear in
the instruction encodings

3.1.2 Terminology
Table 3-3 lists the symbols Hexagon processor instruction names use to specify the supported
data types.

Table 3-3 Data symbols


Size Symbol Type
8-bit B Byte
8-bit UB Unsigned byte
16-bit H Half word
16-bit UH Unsigned half word
32-bit W Word
32-bit UW Unsigned word
64-bit D Double word

80-N2040-53 Rev. AA 38
Qualcomm Hexagon V73 Programmer’s Reference Manual Instructions

3.1.3 Register operands


The following notation describes register operands in the syntax and behavior of instructions:
Rds[.elst]

The ds field indicates the register operand type and bit size (as defined in Table 3-4).

Table 3-4 Register symbols


Symbol Operand type Size (in bits)
d Destination 32
dd 64
s First source 32
ss 64
t Second source 32
tt 64
u Third source 32
uu 64
x Source and destination 32
xx 64

Examples of ds field (describing instruction syntax):


Rd = neg(Rs) // Rd -> 32-bit dest, Rs 32-bit source
Rd = xor(Rs,Rt) // Rt -> 32-bit second source
Rx = insert(Rs,Rtt) // Rx -> both source and dest

Examples of ds field (describing instruction behavior):


Rdd = Rss + Rtt // Rdd, Rss, Rtt -> 64-bit registers

The optional elst (element size and type) field specifies parts of a register when the register is
used as a vector. It can specify the following values:
■ A signed or unsigned byte, halfword, or word within the register (as defined in Figure 3-1)
■ A bit field within the register (as defined in Table 3-5)
Examples of elst field:
EA = Rt.h[1] // .h[1] -> bit field 31:16 in Rt
Pd = (Rss.u64 > Rtt.u64) // .u64 -> unsigned 64-bit value
Rd = mpyu(Rs.L,Rt.H) // .L/.H -> low/high 16-bit fields

NOTE: The control and predicate registers use the same notation as the general registers, but are written
as Cx and Px (respectively) instead of Rx.

80-N2040-53 Rev. AA 39
Qualcomm Hexagon V73 Programmer’s Reference Manual Instructions

Rds.elst
s, t, u 32-bit source register
d 32-bit register destination
x 32-bit register source/destination
ss, tt, uu 64-bit source register pair
dd 64-bit register destination
Rds.elst xx 64-bit register source/destination

.b[7] .b[6] .b[5] .b[4] .b[3] .b[2] .b[1] .b[0] Signed bytes

.ub[7] .ub[6] .ub[5] .ub[4] .ub[3] .ub[2] .ub[1] .ub[0] Unsigned bytes

.h[3] .h[2] .h[1] .h[0] Signed halfwords

.uh[3] .uh[2] .uh[1] .uh[0] Unsigned halfwords

.w[1] .w[0] Signed words

.uw[1] .uw[0] Unsigned words

Figure 3-1 Register field symbols

Table 3-5 Register bit field symbols


Symbol Meaning
.sN Bits [N-1:0] are treated as a N-bit signed number.
For example, R0.s16 means that the least significant 16-bits of R0
are treated as a 16-bit signed number.
.uN Bits [N-1:0] are treated as a N-bit unsigned number.
.H The most-significant 16 bits of a 32-bit register.
.L The least-significant 16 bits of a 32-bit register.

80-N2040-53 Rev. AA 40
Qualcomm Hexagon V73 Programmer’s Reference Manual Instructions

3.2 Instruction classes


The Hexagon processor instructions are assigned to specific instruction classes. Classes determine
the combinations of instructions that can be written in parallel (Section 3.3). presents an
overview of the instruction classes and how they can be grouped together.
Instruction classes logically correspond with instruction types, so they serve as mnemonics for
looking up specific instructions. For instance, the ALU32 class contains ALU instructions that
operate on 32-bit operands.

Table 3-6 Instruction classes and subclasses


Class Subclass Description
XTYPE – Various operations
XTYPE ALU 64-bit ALU operations
XTYPE BIT Bit operations
XTYPE COMPLEX Complex math (using real and
imaginary numbers)
XTYPE FP Floating point operations
XTYPE MPY Multiply operations
XTYPE PERM Vector permute and format conversion
(pack, splat, swizzle)
XTYPE PRED Predicate operations
XTYPE SHIFT Shift operations
(with optional ALU operations)
ALU32 – 32-bit ALU operations
ALU32 ALU Arithmetic and logical
ALU32 PERM Permute
ALU32 PRED Predicate operations
CR – Control register access, loops
JR – Jumps (register indirect addressing mode)
J – Jumps (PC-relative addressing mode)
LD – Memory load operations
MEMOP – Memory operations
NV – New-value operations
NV J New-value jumps
NV ST New-value stores
ST – Memory store operations;
allocate stack frame
SYSTEM – Operating system access
SYSTEM USER Application-level access

80-N2040-53 Rev. AA 41
Qualcomm Hexagon V73 Programmer’s Reference Manual Instructions

3.3 Instruction packets


Instructions can be grouped into very long instruction word (VLIW) packets for parallel execution,
with each packet containing from one to four instructions. Packets of varying length can be freely
mixed in a program.
Vector instructions operate on single instruction multiple data (SIMD) vectors.
Instruction packets must be explicitly specified in software. They are expressed in assembly
language by enclosing groups of instructions in curly braces.
For example, two instructions grouped in a packet:
{ R0 = R1; R2 = R3 }

Four instructions grouped in a packet:


{ R8 = memh(R3++#2);
R12 = memw(R1++#4);
R = mpy(R10,R6):<<1:sat;
R7 = add(R9,#2);
}

Packets have various restrictions on the allowable instruction combinations. The primary
restriction is determined by the instruction class of the instructions in a packet. In particular,
packet formation is subject to the following constraints:
■ Resource constraints determine how many instructions of a specific type can appear in a
packet. The Hexagon processor has a fixed number of execution units: each instruction
executes on a particular type of unit, and each unit can process at most one instruction at a
time. Thus, for example, because the Hexagon processor contains only two load units, an
instruction packet with three load instructions is invalid.
■ Grouping constraints are a small set of rules that apply above and beyond the resource
constraints.
■ Dependency constraints ensure that no write-after-write hazards exist in a packet.
■ Ordering constraints dictate the ordering of instructions within a packet.
■ Alignment constraints dictate the placement of packets in memory.

NOTE: The Hexagon processor executes individual instructions (which are not explicitly grouped in
packets) as packets containing a single instruction.

3.3.1 Packet execution semantics


Packets are defined to have parallel execution semantics. The execution behavior of a packet is
defined as follows:
■ First, instructions in the packet read their source registers in parallel.
■ Next, instructions in the packet execute.
■ Finally, instructions in the packet write their destination registers in parallel.

80-N2040-53 Rev. AA 42
Qualcomm Hexagon V73 Programmer’s Reference Manual Instructions

For example, consider the following packet:


{ R2 = R3; R3 = R2; }

In the first phase, registers R3 and R2 are read from the register file. Then, after execution, R2 is
written with the old value of R3 and R3 is written with the old value of R2. The result of this
packet is the swap of the values of R2 and R3.

NOTE: Dual stores, Dual jumps, New-value stores, New-value compare jumps, and Dot-new predicates
have non-parallel execution semantics.

3.3.2 Sequencing semantics


Packets of any length can freely mix in code. A packet is considered an atomic unit: in essence, a
single large instruction. From the program perspective, a packet either executes to completion or
not at all; it never partially executes. For example, if a packet causes a memory exception, the
exception point is established before the packet.
A packet containing multiple load/store instructions can require service from the external system.
For instance, consider a packet that performs two load operations that both miss in the cache.
The packet requires the memory system to supply the data:
■ From the memory system perspective, the two resulting load requests are processed serially.
■ From the program perspective, however, both load operations must complete before the
packet can complete.
Thus, the packet is atomic from the program perspective.
Packets have a single PC address, which is the address of the start of the packet. Branches cannot
be performed into the middle of a packet.
Architecturally, packets execute to completion – including updating all registers and memory –
before the next packet begins. As a result, application programs are not exposed to any pipeline
artifacts.

3.3.3 Resource constraints


A packet cannot use more hardware resources than are physically available on the processor. For
instance, because the Hexagon processor has only two load units, a packet with three load
instructions is invalid. The behavior of such a packet is undefined. The assembler automatically
rejects packets that oversubscribe the hardware resources.
The processor supports up to four parallel instructions. The instructions are executed in four
parallel pipelines, which are referred to as slots. The four slots are named Slot 0, Slot 1, Slot 2, and
Slot 3.

NOTE: The endloopN instructions (Section 8.2.2) do not use any slots.

Each instruction belongs to specific Instruction classes. For example, jumps belong to instruction
class J, while loads belong to instruction class LD. An instruction’s class determines which slot it
can execute in.

80-N2040-53 Rev. AA 43
Qualcomm Hexagon V73 Programmer’s Reference Manual Instructions

Figure 3-2 shows which instruction classes can be assigned to each of the four slots.
Slot 0 Slot 1 Slot 2 Slot 3
LD instructions LD instructions
XTYPE instructions XTYPE instructions
ST instructions ST instructions
ALU32 instructions ALU32 instructions
ALU32 instructions ALU32 instructions
J instructions J instructions
MEMOP instructions Some J instructions
JR instructions CR instructions
NV instructions
SYSTEM instructions
Some J instructions

XTYPE instructions (32/64-bit) J instructions


Arithmetic, logical, bit manipulation Jump/call PC-relative
Multiply (Integer, fractional, complex)
Floating-point operations
JR instructions
Permute / vector permute operations
Predicate operations Jump/call register
Shift / shift with add/sub/logical
Vector byte ALU LD instructions
Vector halfword (ALU, shift, multiply) Loads (8/16/32/64-bit)
Vector word (ALU, shift) Deallocframe

ALU32 instructions ST instructions


Arithmetic / logical (32-bit) Stores (8/16/32/64-bit)
Vector halfword Allocframe

CR instructions MEMOP instructions


Control-register transfers Operation on memory (8/16/32-bit)
Hardware loop setup
Predicate logicals & reductions SYSTEM instructions
Prefetch
NV instructions Cache maintenance
New-value jumps Bus operations
New-value stores

Figure 3-2 Packet grouping combinations

3.3.4 Grouping constraints


A small number of restrictions determines what constitutes a valid packet. The assembler ensures
that packets follow valid grouping rules. If a packet executes that violates a grouping rule, the
behavior is undefined. The following rules must be followed:
■ Dot-new conditional instructions (Section 6.1.4) must be grouped in a packet with an
instruction that generates dot-new predicates.
■ ST-class instructions can be placed in Slot 1. In this case Slot 0 normally must contain a second
ST-class instruction (Section 5.4).
■ J-class instructions can be placed in Slots 2 or 3. However, only certain combinations of
program flow instructions (J or JR) can be grouped together in a packet (Section 8.7).
Otherwise, at most one program flow instruction is allowed in a packet. Some Jump and
compare-Jump instructions can execute on slots 0 or 1, excluding calls, such as the following:
❒ Instructions of the form “Pd=cmp.xx(); if(Pd.new)jump:hint <target>”
❒ Instructions of the form “If(Pd[.new]) jump[:hint] <target>”
❒ The “jump<target>” instruction

80-N2040-53 Rev. AA 44
Qualcomm Hexagon V73 Programmer’s Reference Manual Instructions

■ JR-class instructions can be placed in Slot 2. However, when encoded in a duplex jumpr
instruction, R31 can be placed in Slot 0 (Section 10.3).
■ Restrictions limit the instructions that can appear in a packet at the setup or end of a
hardware loop (Section 8.2.4).
■ A user control register transfer to the control register USR cannot be grouped with a floating
point instruction (Section 2.2.3).
■ The SYSTEM-class instructions include prefetch, cache operations, bus operations, load
locked, and store conditional instructions (Section 5.10). These instructions have the
following grouping rules:
❒ The brkpt, trap, pause, icinva, isync, and syncht instructions are solo instructions. They
must not be grouped with other instructions in a packet.
❒ The memw_locked, memd_locked, l2fetch, and trace instructions must execute on Slot 0.
They must be grouped only with ALU32 or (non-FP) XTYPE instructions.
❒ The dccleana, dcinva, dccleaninva, and dczeroa instructions must execute on Slot 0. Slot 1
must be empty or an ALU32 instruction.

3.3.5 Dependency constraints


Instructions in a packet cannot write to the same destination register. The assembler
automatically flags such packets as invalid. If the processor executes a packet with two writes to
the same general register, an error exception is raised.
If the processor executes a packet that performs multiple writes to the same predicate or control
register, the behavior is undefined. Three special cases exist for this rule:
■ Conditional writes are allowed to target the same destination register only if at most one of
the writes is actually performed (Section 6.1.5).
■ The overflow flag in the status register has defined behavior when multiple instructions write
to it (Section 2.2.3). Do not group instructions that write to the entire user status register (for
example, USR = R2) in a packet with any instruction that writes to a bit in the user status
register.
■ Multiple compare instructions are allowed to target the same predicate register to perform a
logical AND of the results (Section 6.1.3).

3.3.6 Ordering constraints


In assembly code, instructions can appear in a packet in any order (with the exception of Dual
jumps. The assembler automatically encodes instructions in the packet in the proper order.
In the binary encoding of a packet, the instructions must be ordered from Slot 3 down to Slot 0. If
the packet contains less than four instructions, any unused slot is skipped – a NOP is unnecessary
as the hardware handles the proper spacing of the instructions.
In memory, instructions in a packet must appear in strictly decreasing slot order. Additionally, if an
instruction can go in a higher-numbered slot, and that slot is empty, it must be moved into the
higher-numbered slot.

80-N2040-53 Rev. AA 45
Qualcomm Hexagon V73 Programmer’s Reference Manual Instructions

For example, if a packet contains three instructions and slot 1 is not used, encode the instructions
in the packet as follows:
■ Slot 3 instruction at lowest address
■ Slot 2 instruction follows Slot 3 instruction
■ Slot 0 instructions at the last (highest) address
If a packet contains a single load or store instruction, that instruction must go in Slot 0, which is
the highest address. As an example, a packet containing both LD and ALU32 instructions must be
ordered so the LD is in Slot 0 and the ALU32 in another slot.

3.3.7 Alignment constraints


Packets have the following constraints on their placement or alignment in memory:
■ Packets must be word-aligned (32-bit). If the processor executes an improperly aligned
packet, it raises an error exception (Section 8.10).
■ Packets should not wrap the 4 GB address space. If address wraparound occurs, the processor
behavior is undefined.
No other core-based restrictions exist for code placement or alignment.
If the processor branches to a packet that crosses a 16-byte address boundary, the resulting
instruction fetch stalls for one cycle. Packets that are jump targets or loop body entries can be
explicitly aligned to ensure this does not occur (Section 8.3.5).

3.4 Instruction intrinsics


To support efficient coding of the time-critical sections of a program (without resorting to
assembly language), the C compilers support intrinsics that directly express Hexagon processor
instructions from within C code.
For example:
int main()
{
long long v1 = 0xFFFF0000FFFF0000LL;
long long v2 = 0x0000FFFF0000FFFFLL;
long long result;

// Find the minimum for each half-word in 64-bit vector


result = Q6_P_vminh_PP(v1,v2);
}

Intrinsics are defined for most of the Hexagon processor instructions.

80-N2040-53 Rev. AA 46
Qualcomm Hexagon V73 Programmer’s Reference Manual Instructions

3.5 Compound instructions


The Hexagon processor supports compound instructions, which encode pairs of common
operations in a single instruction. For example, each of the following is a single compound
instruction:
dealloc_return // Deallocate frame and return
R2 &= and(R1, R0) // And and and
R7 = add(R4, sub(#15, R3)) // Subtract and add
R3 = sub(#20, asl(R3, #16)) // Shift and subtract
R5 = add(R2, mpyi(#8, R4)) // Multiply and add
{ // Compare and jump
P0 = cmp.eq (R2, R5)
if (P0.new) jump:nt target
}
{ // Register transfer and jump
R2 = #15
jump target
}

Compound instructions reduce code size and improves code performance.

NOTE: Compound instructions (with the exception of X-and-jump, as shown above) have distinct
assembly syntax from the instructions they are composed of.

3.6 Duplex instructions


To reduce code size the Hexagon processor supports duplex instructions, which encode pairs of
common instructions in a 32-bit instruction container.
Unlike Compound instructions, duplex instructions do not have distinctive syntax – in assembly
code they appear identical to the instructions they are composed of. The assembler is responsible
for recognizing when a pair of instructions can be encoded as a single duplex rather than a pair of
regular instruction words.
To fit two instructions into a single 32-bit word, Duplexes are limited to a subset of the most
common instructions (load, store, branch, ALU), and the most common register operands.

80-N2040-53 Rev. AA 47
4 Data Processing

The Hexagon processor provides a rich set of operations for processing scalar and vector data.
Instructions can perform a wide variety of operations on fixed-point or floating-point data. The
fixed-point operations support scalar and vector data in a variety of sizes. The floating-point
operations support single-precision data.
This chapter presents an overview of the operations provided by the following Hexagon processor
instruction classes:
■ XTYPE – General-purpose data operations
■ ALU32 – Arithmetic/logical operations on 32-bit data

4.1 Data types


The Hexagon processor provides operations for processing the following data types.

4.1.1 Fixed-point data


The Hexagon processor provides operations to process 8-, 16-, 32-, or 64-bit fixed-point data. The
data is either integer or fractional, and in signed or unsigned format.

4.1.1.1 Scalar operations


The Hexagon processor includes the following scalar operations on fixed-point data:
■ Multiplication of 16-bit, 32-bit, and complex data
■ Addition and subtraction of 16-, 32-, and 64-bit data (with and without saturation)
■ Logical operations on 32- and 64-bit data (AND, OR, XOR, NOT)
■ Shifts on 32- and 64-bit data (arithmetic and logical)
■ Min/max, negation, absolute value, parity, norm, swizzle
■ Compares of 8-, 16-, 32-, and 64-bit data
■ Sign and zero extension (8- and 16- to 32-bit, 32- to 64-bit)
■ Bit manipulation
■ Predicate operations

80-N2040-53 Rev. AA 48
Qualcomm Hexagon V73 Programmer’s Reference Manual Data Processing

4.1.1.2 Vector operations


The Hexagon processor includes the following vector operations on fixed-point data:
■ Multiplication (halfwords, word by half, vector reduce, dual multiply)
■ Addition and subtraction of word and halfword data
■ Shifts on word and halfword data (arithmetic and logical)
■ Min/max, average, negative average, absolute difference, absolute value
■ Compares of word, halfword, and byte data
■ Reduce, sum of absolute differences on unsigned bytes
■ Special-purpose data arrangement (such as pack, splat, shuffle, align, saturate, splice,
truncate, complex conjugate, complex rotate, zero extend)

NOTE: Certain vector operations support automatic scaling, saturation, and rounding.

For example, the following instruction performs a vector operation:


R1:0 += vrmpyh(R3:2,R5:4)

It is defined to perform the following operations in one cycle:


R1:0 += ((R2.L * R4.L) +
(R2.H * R4.H) +
(R3.L * R5.L) +
(R3.H * R5.H)
)

Figure 4-1 shows a schematic of this instruction type.


Rss

Rtt

* * * *
32 32
32 32

Add

64

Rdd
64-bit register pair

Figure 4-1 Vector instruction example

80-N2040-53 Rev. AA 49
Qualcomm Hexagon V73 Programmer’s Reference Manual Data Processing

4.1.2 Floating-point data


The Hexagon processor provides operations to process 32-bit floating-point numbers. The
numbers are stored in IEEE single-precision floating-point format.
Per the IEEE standard, certain floating-point values are defined to represent positive or negative
infinity, as well as Not-a-Number (NaN), which represents values that have no mathematical
meaning.
Floating-point numbers can be held in a general register.

4.1.2.1 Floating-point operations


The Hexagon processor includes the following operations on floating-point data:
■ Addition and subtraction
■ Multiplication (with optional scaling)
■ Min/max/compare
■ Reciprocal/square root approximation
■ Format conversion

4.1.3 Complex data


The Hexagon processor provides operations to process 32- or 64-bit complex data.
Complex numbers include a signed real portion and a signed imaginary portion. Given two
complex numbers (a + bi) and (c + di), the complex multiply operations computes both the real
portion (ac - bd) and the imaginary portion (ad + bc) in a single instruction.
Complex numbers can be packed in a general register or register pair. When packed, the
imaginary portion occupies the most-significant portion of the register or register pair.

4.1.4 Vector data


The Hexagon processor provides operations to process 64-bit vector data.
Vector data types pack multiple data items – bytes, halfwords, or words – into 64-bit registers.
Vector data operations are common in video and image processing.
Eight 8-bit bytes can be packed into a 64-bit register.

80-N2040-53 Rev. AA 50
Qualcomm Hexagon V73 Programmer’s Reference Manual Data Processing

Rss

Rtt

Op Op Op Op Op Op Op Op

Rdd

Figure 4-1 Vector byte operation example

Four 16-bit halfword values can be packed in a single 64-bit register pair.

Rss

Rtt

Op Op Op Op

Rdd

Figure 4-2 Vector halfword operation example

Two 32-bit word values can be packed in a single 64-bit register pair.

Rss

Rtt

Op Op

Rdd

Figure 4-3 Vector word operation example

80-N2040-53 Rev. AA 51
Qualcomm Hexagon V73 Programmer’s Reference Manual Data Processing

4.2 Instruction options


Some instructions support optional scaling, saturation, and rounding. There are no mode bits
controlling these options – instead, they are explicitly specified as part of the instruction name.
The options are described in this section.

4.2.1 Fractional scaling


In fractional data format, data is treated as fixed-point fractional values whose range is
determined by the word length and radix point position.
Fractional scaling is specified in an instruction by adding the:<<1 specifier. For example:
R3:2 = cmpy(R0,R1):<<1:sat

When two fractional numbers are multiplied, the product must be scaled to restore the original
fractional data format. The Hexagon processor allows specification of the fractional scaling of the
product in the instruction for shifts of 0 and 1. Perform a shift of 1 for Q1.15 numbers, perform a
shift of 0 for integer multiplication.

4.2.2 Saturation
Certain instructions are available in saturating form. If a saturating arithmetic instruction has a
result which is smaller than the minimum value, the result is set to the minimum value. Similarly,
if the operation has a result which is greater than the maximum value, the result is set to the
maximum value.
Saturation is specified in an instruction by adding the:sat specifier. For example:
R2 = abs(R1):sat

The open virtualization format (OVF) bit in the User status register is set whenever a saturating
operation saturates to the maximum or minimum value. It remains set until explicitly cleared by a
control register transfer to USR. For vector-type saturating operations, if any of the individual
elements of the vector saturate, OVF is set.

4.2.3 Arithmetic rounding


Certain signed multiply instructions support optional arithmetic rounding (also known as biased
rounding). The arithmetic rounding operation takes a double precision fractional value and adds
0x8000 to the low 16-bits (least significant 16-bit halfword).
Rounding is specified in an instruction by adding the :rnd specifier. For example:
R2 = mpy(R1.h,R2.h):rnd

NOTE: Arithmetic rounding can accumulate numerical errors, especially when the number to round is
exactly 0.5. This happens most frequently when dividing by 2 or averaging.

80-N2040-53 Rev. AA 52
Qualcomm Hexagon V73 Programmer’s Reference Manual Data Processing

4.2.4 Convergent rounding


To address the problem of error accumulation in Convergent rounding, the Hexagon processor
includes four instructions that support positive and negative averaging with a convergent
rounding option.
These instructions work as follows:
1. Compute (A + B) or (A - B) for AVG and NAVG respectively.
2. Based on the two least-significant bits of the result, add a rounding constant as follows:
❒ If the two LSBs are 00, add 0
❒ If the two LSBs are 01, add 0
❒ If the two LSBs are 10, add 0
❒ If the two LSBs are 11, add 1
3. Shift the result right by one bit.

4.2.5 Scaling for divide and square-root


On the Hexagon processor, floating point divide and square-root operations are implemented in
software using library functions. To enable the efficient implementation of these operations, the
processor supports special variants of the multiply-accumulate instruction, named scale FMA.
Scale FMA supports optional scaling of the product generated by the floating-point fused
multiply-add instruction.
Scaling is specified in the instruction by adding the :scale specifier and a predicate register
operand. For example:
R3 += sfmpy(R0,R1,P2):scale

For single precision, the scaling factor is two raised to the power specified by the contents of the
predicate register (which is treated as an 8-bit two's complement value). For double precision, the
predicate register value is doubled before use as a power of two.

NOTE: Do not use scale FMA instructions outside of divide and square-root library routines. No
guarantee is provided that future versions of the Hexagon processor will implement these
instructions using the same semantics. Future versions assume only that compatibility for scale
FMA is limited to the needs of divide and square-root library routines.

80-N2040-53 Rev. AA 53
Qualcomm Hexagon V73 Programmer’s Reference Manual Data Processing

4.3 XTYPE operations


The XTYPE instruction class includes most of the data-processing operations performed by the
Hexagon processor. These operations are categorized by their operation type.

4.3.1 ALU
XTYPE ALU operations modify 8-, 16-, 32-, and 64-bit data. These operations include:
■ Add and subtract with and without saturation
■ Add and subtract with accumulate
■ Absolute value
■ Logical operations
■ Min, max, negate instructions
■ Register transfers of 64-bit data
■ Word to doubleword sign extension
■ Comparisons

4.3.2 Bit manipulation


XTYPE BIT manipulation operations modify bit fields in a register or register pair. These operations
include:
■ Bit field insert
■ Bit field signed and unsigned extract
■ Count leading and trailing bits
■ Compare bit masks
■ Set/clear/toggle bit
■ Test bit operation
■ Interleave/deinterleave bits
■ Bit reverse
■ Split bit field
■ Masked parity and linear feedback shift
■ Table index formation

80-N2040-53 Rev. AA 54
Qualcomm Hexagon V73 Programmer’s Reference Manual Data Processing

4.3.3 Complex
XTYPE COMPLEX operations manipulate complex numbers. These operations include:
■ Complex add and subtract
■ Complex multiply with optional round and pack
■ Vector complex multiply
■ Vector complex conjugate
■ Vector complex rotate
■ Vector reduce complex multiply real or imaginary

4.3.4 Floating point


XTYPE FP operations manipulate single-precision floating point numbers. These operations
include:
■ Addition and subtraction
■ Multiplication (with optional scaling)
■ Min/max/compare
■ Format conversion
The Hexagon floating-point operations are defined to support the IEEE floating-point standard.
However, certain IEEE-required operations – such as divide and square root – are not supported
directly. Instead, special instructions are defined to support the implementation of the required
operations as library routines. These instructions include:
■ A special version of the fused multiply-add instruction (designed specifically for use in library
routines)
■ Reciprocal/square root approximations (which compute the approximate initial values used in
reciprocal and reciprocal-square-root routines)
■ Extreme value assistance (which adjusts input values if they cannot produce correct results
using convergence algorithms)

NOTE: The special floating-point instructions are not intended for use directly in user code – use the
special floating-point instructions only in the floating point library.

Format conversion

The floating-point conversion instructions sfmake and dfmake convert an unsigned 10-bit
immediate value into the corresponding floating-point value.
The immediate value must be encoded so bits [5:0] contain the significand, and bits [9:6] the
exponent. The exponent value is added to the initial exponent value (bias - 6).
For example, to generate the single-precision floating point value 2.0, bits [5:0] must be set to 0,
and bits [9:6] set to 7. Performing the sfmake operation on this immediate value yields the
floating point value 0x40000000, which is 2.0.

80-N2040-53 Rev. AA 55
Qualcomm Hexagon V73 Programmer’s Reference Manual Data Processing

NOTE: The conversion instructions are designed to handle common floating point values, including most
integers and many basic fractions (1/2, 3/4, and so on).

Rounding

The Hexagon User status register includes the FPRND field, which specifies the IEEE-defined
floating-point rounding mode.

Exceptions

The Hexagon user status register includes five status fields, which work as sticky flags for the five
IEEE-defined exception conditions: inexact, overflow, underflow, divide by zero, and invalid. A
sticky flag is set when the corresponding exception occurs, and remains set until explicitly
cleared.
The user status register also includes five mode fields which specify whether to perform an
operating-system trap if one of the floating-point exceptions occur. For every instruction packet
containing a floating-point operation, if a floating-point sticky flag and the corresponding trap-
enable bit are both set, a floating-point trap is generated. After the packet commits, the Hexagon
processor then automatically traps to the operating system.

NOTE: Non-floating-point instructions never generate a floating-point trap, regardless of the state of the
sticky flag and trap-enable bits.

4.3.5 Multiply
Multiply operations support fixed-point multiplication, including both single- and double-
precision multiplication, and polynomial multiplication.

Single precision

In single-precision arithmetic a 16-bit value is multiplied by another 16-bit value. These operands
can come from the high portion or low portion of any register. Depending on the instruction, the
result of the 16 × 16 operation can optionally be accumulated, saturated, rounded, or shifted left
by 0 to 1 bits.
The instruction set supports operations on signed × signed, unsigned × unsigned, and signed ×
unsigned data.

80-N2040-53 Rev. AA 56
Qualcomm Hexagon V73 Programmer’s Reference Manual Data Processing

Table 4-1 summarizes the options available for 16 × 16 single precision multiplications. The
symbols used in the table are as follows:
■ SS – Perform signed × signed multiply
■ UU – Perform unsigned × unsigned multiply
■ SU – Perform signed × unsigned multiply
■ A+ – Result added to accumulator
■ A- – Result subtracted from accumulator
■ 0 – Result not added to accumulator

Table 4-1 Single-precision multiply options


Multiply Result Sign Accumulate Sat Rnd Scale
16 × 16 32 SS A+, A- Yes No 0-1

16 × 16 32 SS 0 Yes Yes 0-1

16 × 16 64 SS A+, A- No No 0-1

16 × 16 64 SS 0 No Yes 0-1

16 × 16 32 UU A+, A-, 0 No No 0-1

16 × 16 64 UU A+, A-, 0 No No 0-1

16 × 16 32 SU A+, 0 Yes No 0-1

Double precision

Double precision instructions are available for both 32 × 32 and 32 × 16 multiplication:


■ For 32 × 32 multiplication the result is either 64 or 32 bits. The 32-bit result is either the high
or low portion of the 64-bit product.
■ For 32 × 16 multiplication the result is always taken as the upper 32 bits.
The operands are either signed or unsigned.

Table 4-2 Double precision multiply options


Multiply Result Sign Accumulate Sat Rnd Scale
32 × 32 64 SS, UU A+, A-, 0 No No 0

32 × 32 32 (upper) SS, UU 0 No Yes 0

32 × 32 32 (low) SS, UU A+, 0 No No 0

32 × 16 32 (upper) SS, UU A+, 0 Yes Yes 0-1

32 × 32 32 (upper) SU 0 No No 0

80-N2040-53 Rev. AA 57
Qualcomm Hexagon V73 Programmer’s Reference Manual Data Processing

Polynomial

Polynomial XTYPE MPY instructions are available for both words and vector halfwords.
These instructions are useful for many algorithms including scramble code generation,
cryptographic algorithms, convolutional, and Reed Solomon code.

4.3.6 Permute
XTYPE PERM operations perform various operations on vector data, including arithmetic, format
conversion, and rearrangement of vector elements. Many types of conversions are supported:
■ Swizzle bytes
■ Vector shuffle
■ Vector align
■ Vector saturate and pack
■ Vector splat bytes
■ Vector splice
■ Vector sign extend halfwords
■ Vector zero extend bytes
■ Vector zero extend halfwords
■ Scalar saturate to byte, halfword, word
■ Vector pack high and low halfwords
■ Vector round and pack
■ Vector splat halfwords

4.3.7 Predicate
XTYPE PRED operations modify predicate source data. The categories of instructions available
include:
■ Vector mask generation
■ Predicate transfers
■ Viterbi packing

80-N2040-53 Rev. AA 58
Qualcomm Hexagon V73 Programmer’s Reference Manual Data Processing

4.3.8 Shift
Scalar XTYPE SHIFT operations perform a variety of 32 and 64-bit shifts followed by an optional
add/sub or logical operation. Figure 4-4 shows the general operation.

Rss

# / Rt

Shift
amount

64-bit shifter

64-bit add/sub/logical

Rxx

Figure 4-4 64-bit shift and add/sub/logical

Four shift types are supported:


■ ASR – Arithmetic shift right
■ ASL – Arithmetic shift left
■ LSR – Logical shift right
■ LSL – Logical shift left
In register-based shifts, the Rt register is a signed two’s-complement number. If this value is
positive, the instruction opcode tells the direction of shift (right or left). If this value is negative,
the shift direction indicated by the opcode is reversed.
When arithmetic right shifts are performed, the sign bit is shifted in, whereas logical right shifts
shift in zeros. Left shifts always shift in zeros.
Some shifts are available with saturation and rounding options.

80-N2040-53 Rev. AA 59
Qualcomm Hexagon V73 Programmer’s Reference Manual Data Processing

4.4 ALU32 operations


The ALU32 instruction class includes general arithmetic/logical operations on 32-bit data:
■ Add, subtract, negate without saturation on 32-bit data
■ Logical operations such as AND, OR, XOR, AND with immediate, and OR with immediate
■ Scalar 32-bit compares
■ Combine halfwords, combine words, combine with immediates, shift halfwords, and Mux
■ Conditional add, combine, logical, subtract, and transfer.
■ NOP
■ Sign and zero-extend bytes and halfwords
■ Transfer immediates and registers
■ Vector add, subtract, and average halfwords

NOTE: ALU32 instructions can execute on any slot (Section 3.3.3).

Chapter 6 describes the conditional execution and compare instructions.

4.5 Vector operations


Vector operations support arithmetic operations on vectors of bytes, halfwords, and words.
The vector operations belong to the XTYPE instruction class (except for vector add, subtract, and
average halfwords, which are ALU32).

Vector byte operations


The vector byte operations process packed vectors of signed or unsigned bytes. They include the
following operations:
■ Vector add and subtract signed or unsigned bytes
■ Vector min and max signed or unsigned bytes
■ Vector compare signed or unsigned bytes
■ Vector average unsigned bytes
■ Vector reduce add unsigned bytes
■ Vector sum of absolute differences unsigned bytes

80-N2040-53 Rev. AA 60
Qualcomm Hexagon V73 Programmer’s Reference Manual Data Processing

Vector halfword operations

The vector halfword operations process packed 16-bit halfwords. They include the following
operations:
■ Vector add and subtract halfwords
■ Vector average halfwords
■ Vector compare halfwords
■ Vector min and max halfwords
■ Vector shift halfwords
■ Vector dual multiply
■ Vector dual multiply with round and pack
■ Vector multiply even halfwords with optional round and pack
■ Vector multiply halfwords
■ Vector reduce multiply halfwords
For example, Figure 4-5 shows the operation of the vector arithmetic shift right halfword (vasrh)
instruction. In this instruction, each 16-bit half-word is shifted right by the same amount which is
specified in a register or with an immediate value. Because the shift is arithmetic, the bits shifted
in are copies of the sign bit.

Shift Amount Rt / #u4

lost lost lost lost Rss

s/z ext s/z ext s/z ext s/z ext Rdd

Figure 4-5 Vector halfword shift right

Vector word operations


The vector word operations process packed vectors of two words. They include the following
operations:
■ Vector add and subtract words
■ Vector average words
■ Vector compare words
■ Vector min and max words
■ Vector shift words with optional truncate and pack

80-N2040-53 Rev. AA 61
Qualcomm Hexagon V73 Programmer’s Reference Manual Data Processing

For more information on vector operations see Section 11.1.1 and Section 11.10.1.

4.6 CR operations
The CR instruction class includes operations that access the Control registers.

Table 4-3 Control register transfer instructions


Syntax Operation
Rd = Cs Move control register to / from a general register.
Cd = Rs
NOTE: PC is not a valid destination register.
Rdd = Css Move control register pair to / from a general register pair.
Cdd = Rss
NOTE: PC is not a valid destination register.

NOTE: In register-pair transfers, control registers must be specified using their numeric alias names – see
Section 2.2 for details.

4.7 Compound operations


The instruction set includes a number of instructions that perform multiple logical or arithmetic
operations in a single instruction. They include the following operations:
■ AND/OR with inverted input
■ Compound logical register
■ Compound logical predicate
■ Compound add-subtract with immediates
■ Compound shift-operation with immediates (arithmetic or logical)
■ Multiply-add with immediates
For more information see Section 11.10.1.

80-N2040-53 Rev. AA 62
Qualcomm Hexagon V73 Programmer’s Reference Manual Data Processing

4.8 Special operations


The instruction set includes a number of special-purpose instructions to support specific
applications.

4.8.1 H.264 CABAC processing


H.264/AVC is adopted in a diverse range of multimedia applications:
■ HD-DVDs
■ HDTV broadcasting
■ Internet video streaming
Context Adaptive Binary Arithmetic Coding (CABAC) is one of the two alternative entropy coding
methods specified in the H.264 main profile. CABAC offers superior coding efficiency at the
expense of greater computational complexity. The Hexagon processor includes a dedicated
instruction (decbin) to support CABAC decoding.
Binary arithmetic coding is based on the principle of recursive interval subdivision, and its state is
characterized by two quantities:
■ The current interval range
■ The current offset in the current code interval
The offset is read from the encoded bit stream. When decoding a bin, the interval range is
subdivided in two intervals based on the estimation of the probability pLPS of least probable
symbol (LPS): one interval with width of rLPS = range x pLPS, and another with width of rMPS =
range x pMPS = range -rLPS, where MPS stands for most probable symbol.
Depending on which subinterval the offset falls into, the decoder decides whether the bin is
decoded as MPS or LPS, after which the two quantities are iteratively updated, as shown in
Figure 4-1.
MPS occurs LPS occurs

rLPS rLPS
range

range

rMPS rMPS
offset
offset

rangeNew = rMPS rangeNew = rLPS


offsetNew = offset offsetNew = offset-rMPS

Figure 4-1 Arithmetic decoding for one bin

80-N2040-53 Rev. AA 63
Qualcomm Hexagon V73 Programmer’s Reference Manual Data Processing

4.8.1.1 CABAC implementation


In H.264 range is a 9-bit quantity, and offset is 9 bits in regular mode and 10 bits in bypass mode
during the whole decoding process. The calculation of rLPS is approximated by a 64 × 4 table of
256 bytes, where the range and the context state (selected for the bin to decode) address the
lookup table. To maintain the precision of the whole decoding process, the new range must be
renormalized to ensure that the most significant bit is always 1, and that the offset is
synchronously refilled from the bit stream.
To simplify the renormalization/refilling process, the decoding scheme shown in Figure 4-2
significantly reduces the frequency of renormalization and refilling bits from the bit-stream, while
also being suitable for DSP implementation.
range: 32 bit register 18=29-bitpos
0000 0000 0001 xxxx xxxx 0000 0000 0000
bitpos=11 9 12=23-bitpos
offset: 32 bit register
0000 0000 000x xxxx xxxx xxxx xxxx xxxx
Decode Decision
(ctxIdx, range, offset)

bitpos=Count_leading_zeros(range)
rLPS=lutLPS[ctxIdx->state][(range>>(29-bitpos))&3]<<(23-bitpos)
rMPS=range-rLPS

Yes offset >= rMPS No

bin = !ctxIdx->valMPS
bin = ctxIdx->valMPS
range = rLPS
range = rMPS
offset = offset - rMPS

ctxIdx->state == 0 Yes

ctxIdx->valMPS=!ctxIdx->valMPS
No

ctxIdx->state = ctxIdx->state =
TransIndexLPS(ctxIdx->state) TransIndexMPS(ctxIdx->state)

Renormalization1
(range, offset)

Done

Figure 4-2 CABAC decoding engine for regular bin

80-N2040-53 Rev. AA 64
Qualcomm Hexagon V73 Programmer’s Reference Manual Data Processing

The Hexagon processor can use the decbin instruction to decode one regular bin in two cycles
(not counting the bin refilling process).
For more information on the decbin instruction see Section 11.10.6.
For example:
Rdd = decbin(Rss,Rtt)

INPUT: Rss and Rtt register pairs as:


Rtt.w1[5:0] = state
Rtt.w1[8] = valMPS
Rtt.w0[4:0] = bitpos
Rss.w0 = range
Rss.w1 = offset

OUTPUT: Rdd register pair is packed as


Rdd.w0[5:0] = state
Rdd.w0[8] = valMPS
Rdd.w0[31:23] = range
Rdd.w0[22:16] = '0'
Rdd.w1 = offset (normalized)

OUTPUT: P0
P0 = (bin)

4.8.1.2 Code example


H264CabacGetBinNC:
/****************************************************************
* Non-conventional call:
* Input: R1:0 = offset : range , R2 = dep, R3 = ctxIdx,
* R4 = (*ctxIdx), R5 = bitpos
*
* Return:
* R1: 0 - offset : range
* P0 - (bin)
*****************************************************************/

// Cycle #1
{ R1:0= decbin(R1:0,R5:4) // Decoding one bin
R6 = asl(R22,R5) // Where R22 = 0x100
}

// Cycle #2
{ memb(R3) = R0 // Save context to *ctxIdx
R1:0 = vlsrw(R1:0,R5) // Re-align range and offset
P1 = cmp.gtu(R6,R1) // Need refill? i.e., P1= (range<0x100)
IF (!P1.new) jumpr:t LR // Return
}
RENORM_REFILL:
...

80-N2040-53 Rev. AA 65
Qualcomm Hexagon V73 Programmer’s Reference Manual Data Processing

4.8.2 IP Internet checksum


The key features of the Internet checksum1 include:
■ The checksum can be summed in any order
■ Carries can be accumulated using an accumulator larger than size being added, and added
back in at any time
Using standard data-processing instructions, the Internet checksum can be computed at 8 bytes
per cycle in the main loop, by loading words and accumulating into doublewords. After the loop,
the upper word is added to the lower word; then the upper halfword is added to the lower
halfword, and any carries are added back in.
The Hexagon processor supports a dedicated instruction (vradduh) that computes the Internet
checksum at a rate of 16 bytes per cycle.
The vradduh instruction accepts the halfwords of the two input vectors, adds them all together,
and places the result in a 32-bit destination register. This operation can both compute the sum of
16 bytes of input while preserving the carries, and accumulate carries at the end of computation.
For more information on the vradduh instruction, see Vector reduce add halfwords.

NOTE: This operation utilizes the maximum load bandwidth available in the Hexagon processor.

1 See RFC 1071 (http://www.faqs.org/rfcs/rfc1071.html)

80-N2040-53 Rev. AA 66
Qualcomm Hexagon V73 Programmer’s Reference Manual Data Processing

4.8.2.1 Code example


.text
.global fast_ip_check
// Assumes data is 8-byte aligned
// Assumes data is padded at least 16 bytes afterwords with 0's.
// input R0 points to data
// input R1 is length of data
// returns IP checksum in R0

fast_ip_check:
{
R1 = lsr(R1,#4) // 16-byte chunks, rounded down, +1
R9:8 = combine(#0,#0)
R3:2 = combine(#0,#0)
}
{
loop0(1f,R1)
R7:6 = memd(R0+#8)
R5:4 = memd(R0++#16)
}
.falign
1:
{
R7:6 = memd(R0+#8)
R5:4 = memd(R0++#16)
R2 = vradduh(R5:4,R7:6) // Accumulate 8 halfwords
R8 = vradduh(R3:2,R9:8) // Accumulate carries
}:endloop0
// Drain pipeline
{
R2 = vradduh(R5:4,R7:6)
R8 = vradduh(R3:2,R9:8)
R5:4 = combine(#0,#0)
}
{
R8 = vradduh(R3:2,R9:8)
R1 = #0
}
// May have some carries to add back in
{
R0 = vradduh(R5:4,R9:8)
}
// Possible for one more to pop out
{
R0 = vradduh(R5:4,R1:0)
}
{
R0 = not(R0)
jumpr LR
}

80-N2040-53 Rev. AA 67
Qualcomm Hexagon V73 Programmer’s Reference Manual Data Processing

4.8.3 Software-defined radio


The Hexagon processor includes six special-purpose instructions which support the
implementation of software-defined radio. The instructions greatly accelerate the following
algorithms.

4.8.3.1 Rake despreading


A fundamental operation in despreading is the PN multiply operation. In this operation the
received complex chips are compared against a pseudo-random sequence of QAM constellation
points and accumulated.
Figure 4-3 shows the vrcrotate instruction that performs this operation. The products are
summed to form a soft 32-bit complex symbol. The instruction has both accumulating and non-
accumulating versions.
xx += vrcrotate(Rss,Rt,#0)

Rt

1 j -1 -j 1 j -1 -j 1 j -1 -j 1 j -1 -j

mux mux mux mux

Im3 Re3 Im2 Re2 Im1 Re1 Im0 Re0 Rs

* * * *

+ +

I R Rxx

Figure 4-3 Vector reduce complex rotate

For more information on the vrcrotate instruction, see Vector reduce complex rotate.

NOTE: Using this instruction the Hexagon processor can process 5.3 chips per cycle, and a 12-finger
WCDMA user requires only 15 MHz.

80-N2040-53 Rev. AA 68
Qualcomm Hexagon V73 Programmer’s Reference Manual Data Processing

4.8.3.2 Polynomial operations


The polynomial multiply instructions support the following operations:
■ Scramble code generation (at a rate of 8 symbols per cycle for WCDMA)
■ Cryptographic algorithms (such as elliptic curve)
■ CRC checks (at a rate of 21 bits per cycle)
■ Convolutional encoding
■ Reed-Solomon codes
The four versions of this instruction support 32 × 32 and vector 16 × 16 multiplication both with
and without accumulation, as shown in Figure 4-4.

Rxx += vpmpyh(Rs,Rt)
Rxx += pmpyw(Rs,Rt)

Rs

Rs

Rt

Rt
16 × 16
carryless 16 × 16

32 × 32
carryless
polynomial
multiply * * carryless
polynomial

*
multiply
polynomial
multiply

XOR XOR

XOR

Rxx Rxx

Figure 4-4 Polynomial multiply

For more information on the pmpy instructions, see Polynomial multiply words.

80-N2040-53 Rev. AA 69
5 Memory

The Hexagon processor features a load/store architecture, where numeric and logical instructions
operate on registers. Explicit load instructions move operands from memory to registers, while
store instructions move operands from registers to memory. A small number of instructions
(known as mem-ops) perform numeric and logical operations directly on memory.
The address space is unified: all accesses target the same linear address space, which contains
both instructions and data.

5.1 Memory model


This section describes the memory model for the Hexagon processor.

5.1.1 Address space


The Hexagon processor has a 32-bit byte-addressable memory address space. The entire 4G
linear address space is addressable by the user application. A virtual-to-physical address
translation mechanism is provided.

5.1.2 Byte order


The Hexagon processor is a little-endian machine: the lowest address byte in memory is held in
the least significant byte of a register, as shown in Figure 5-1.

Address Contents
0 A Register Contents
1 B
31 0
2 C
- - - A Load byte
3 D

4 E - - B A Load Halfword

5 F
D C B A Load word
6 G 63

7 H H G F E D C B A Load doubleword

Figure 5-1 Hexagon processor byte order

80-N2040-53 Rev. AA 70
Qualcomm Hexagon V73 Programmer’s Reference Manual Memory

5.1.3 Alignment
Even though the Hexagon processor memory is byte-addressable, instructions and data must be
aligned in memory on specific address boundaries:
■ Instructions and instruction packets must be 32-bit aligned
■ Data must be aligned to its native access size.
Any unaligned memory access causes a memory-alignment exception.
Use the Permute instructions in applications that must reference unaligned vector data. The loads
and stores still must be memory-aligned; however, the permute instructions enable easy
rearrangement of the data in registers.

Table 5-1 Memory alignment restrictions


Data type Size (bits) Exception when
Byte 8 Never
Unsigned byte
Halfword 16 LSB[0] != 0 1
Unsigned halfword
Word 32 LSB[1:0] != 00
Unsigned word
Doubleword 64 LSB[2:0] != 000
Instruction 32 LSB[1:0] != 00
Instruction packet
1 LSB = Least significant bits of address

80-N2040-53 Rev. AA 71
Qualcomm Hexagon V73 Programmer’s Reference Manual Memory

5.2 Memory loads


Memory can be loaded in byte, halfword, word, or doubleword sizes. The data types supported
are signed or unsigned. The syntax used is memXX, where XX denotes the data type.

Table 5-2 Load instructions


Source Destination Data
Syntax Comment
size (bits) size (bits) placement
Rd = memub(Rs) 8 32 Low 8 bits Zero-extend 8 to 32 bits
Rd = memb(Rs) 8 32 Low 8 bits Sign-extend 8 to 32 bits
Rd = memuh(Rs) 16 32 Low 16 bits Zero-extend 16 to 32 bits
Rd = memh(Rs) 16 32 Low 16 bits Sign-extend 16 to 32 bits
Rd = memubh(Rs) 16 32 Bytes 0 and 2 Bytes 1 and 3 zeroed 1
Rd = membh(Rs) 16 32 Bytes 0 and 2 Bytes 1 and 3 sign-extended
Rd = memw(Rs) 32 32 All 32 bits Load word
Rdd = memubh(Rs) 32 64 Bytes 0,2,4,6 Bytes 1,3,5,7 zeroed
Rdd = membh(Rs) 32 64 Bytes 0,2,4,6 Bytes 1,3,5,7 sign-extended
Rdd = memd(Rs) 64 64 All 64 bits Load doubleword
Ryy = memh_fifo(Rs) 16 64 High 16 bits Shift vector and load halfword
deallocframe 64 64 All 64 bits See Chapter 7
dealloc_return 64 64 All 64 bits See Chapter 7
1 The memubh and membh instructions load contiguous bytes from memory (either 2 or 4 bytes) and unpack these bytes
into a vector of halfwords. The instructions are useful when bytes are used as input into halfword vector operations,
which is common in video and image processing..

NOTE: The memory load instructions belong to instruction class LD, and can execute only in slots 0 or 1.

80-N2040-53 Rev. AA 72
Qualcomm Hexagon V73 Programmer’s Reference Manual Memory

5.3 Memory stores


Memory can be stored in byte, halfword, word, or doubleword sizes. The syntax used is memX,
where X denotes the data type.

Table 5-3 Store instructions


Source Destination
Syntax size size Comment
(bits) (bits)
memb(Rs) = Rt 32 8 Store byte (bits 7:0)
memb(Rs) = #s8 8 8 Store byte
memh(Rs) = Rt 32 16 Store lower half (bits 15:0)
memh(Rs) = Rt.H 32 16 Store upper half (bits 31:16)
memh(Rs) = #s8 8 16 Sign-extend 8 to 16 bits
memw(Rs) = Rt 32 32 Store word
memw(Rs) = #s8 8 32 Sign-extend 8 to 32 bits
memd(Rs) = Rtt 64 64 Store doubleword
allocframe(#u11) 64 64 See Chapter 7

NOTE: The memory store instructions belong to instruction class ST, and can execute only in slot 0 or –
when part of a dual store – slot 1.

80-N2040-53 Rev. AA 73
Qualcomm Hexagon V73 Programmer’s Reference Manual Memory

5.4 Dual stores


Two memory store instructions can appear in the same instruction packet. The resulting
operation is considered a dual store. For example:
{
memw(R5) = R2 // Dual store
memh(R6) = R3
}

Unlike most packetized operations, dual stores do not execute in parallel (Section 3.3.1). Instead,
the store instruction in Slot 1 effectively executes first, followed by the store instruction in Slot 0.

NOTE: The store instructions in a dual store must belong to instruction class ST, and can execute only in
Slots 0 and 1.

5.5 Slot 1 store with slot 0 load


A slot 1 store operation with a slot 0 load operation can appear in a packet. The packet attribute
:mem_noshuf inhibits the instruction reordering that is otherwise done by the assembler. For
example:
{
memw(R5) = R2 // Slot 1 store
R3 = memh(R6) // Slot 0 load
}:mem_noshuf

Unlike most packetized operations, these memory operations do not execute in parallel
(Section 3.3.1). Instead, the store instruction in Slot 1 effectively executes first, followed by the
load instruction in Slot 0. If the addresses of the two operations are overlapping, the load receives
the newly stored data.

5.6 New-value stores


A memory store instruction can store a register that is assigned a new value in the same
instruction packet (Section 3.3). This feature is expressed in assembly language by appending the
suffix “.new” to the source register. For example:
{
R2 = memh(R4+#8) // load halfword
memw(R5) = R2.new // store newly-loaded value
}

New-value store instructions have the following restrictions:


■ If an instruction uses auto-increment or absolute-set addressing mode (Section 5.8), its
address register cannot be used as the new-value register.
■ If an instruction produces a 64-bit result, its result registers cannot be used as the new-value
register.
■ If the instruction that sets a new-value register is conditional (Section 6.1.2), it must always
execute.

80-N2040-53 Rev. AA 74
Qualcomm Hexagon V73 Programmer’s Reference Manual Memory

NOTE: The new-value store instructions belong to instruction class NV, and can execute only in Slot 0.

5.7 Mem-ops
Mem-ops perform basic arithmetic, logical, and bit operations directly on memory operands,
without the need for a separate load or store. Mem-ops can be performed on byte, halfword, or
word sizes.

Table 5-4 Mem-ops


Syntax Operation
memXX(Rs+#u6) [+-|&] = Rt Arithmetic/logical on memory
memXX(Rs+#u6) [+-] = #u5 Arithmetic on memory
memXX(Rs+#u6) = clrbit(#u5) Clear bit in memory
memXX(Rs+#u6) = setbit(#u5) Set bit in memory

NOTE: The mem-op instructions belong to instruction class MEMOP, and can execute only in slot 0.

5.8 Addressing modes

Table 5-5 Addressing modes

Mode Syntax Operation 1


Absolute memXX(##address) EA = address
Absolute-set memXX(Re=##address) EA = address
Re = address
Absolute with register offset memXX(Ru<<#u2+##U32) EA = imm + (Ru << #u2)
Global pointer relative memXX(GP+#immediate) EA = GP + immediate
memXX(#immediate)
Indirect memXX(Rs) EA = Rs
Indirect with offset memXX(Rs+#s11) EA = Rs + imm
Indirect with register offset memXX(Rs+Ru<<#u2) EA = Rs + (Ru << #u2)
Indirect with auto-increment memXX(Rx++#s4) EA = Rx;
immediate Rx += (imm)
Indirect with auto-increment memXX(Rx++Mu) EA = Rx;
register Rx += Mu
Circular with auto-increment memXX(Rx++#s4:circ(Mu)) EA = Rx;
immediate Rx = circ_add(Rx,imm,Mu)
Circular with auto-increment memXX(Rx++I:circ(Mu)) EA = Rx;
register Rx = circ_add(Rx,I,Mu)
Bit-reversed with auto-increment memXX(Rx++Mu:brev) EA = Rx.H + bit_reverse(Rx.L)
register Rx += Mu
1 EA (effective address) is equivalent to VA (virtual address).

80-N2040-53 Rev. AA 75
Qualcomm Hexagon V73 Programmer’s Reference Manual Memory

5.8.1 Absolute
The absolute addressing mode uses a 32-bit constant value as the effective memory address. For
example:
R2 = memw(##100000) // Load R2 with word from addr 100000
memw(##200000) = R4 // Store R4 to word at addr 200000

5.8.2 Absolute-set
The absolute-set addressing mode assigns a 32-bit constant value to the specified general
register, then uses the assigned value as the effective memory address. For example:
R2 = memw(R1=##400000) // Load R2 with word from addr 400000
// and load R1 with value 400000
memw(R3=##600000) = R4 // Store R4 to word at addr 600000
// and load R3 with value 600000

5.8.3 Absolute with register offset


The absolute with register offset addressing mode performs an arithmetic left shift of a 32-bit
general register value by the amount specified in a 2-bit unsigned immediate value, and then
adds the shifted result to an unsigned 32-bit constant value to create the 32-bit effective memory
address. For example:
R2 = memh(R3 << #3 + ##100000) // load R2 with signed halfword
// from addr [100000 + (R3 << 3)]

The 32-bit constant value is the base address, and the shifted result is the byte offset.

NOTE: This addressing mode is useful for loading an element from a global table, where the immediate
value is the name of the table, and the register holds the index of the element.

5.8.4 Global pointer relative


The global pointer relative addressing mode adds an unsigned offset value to the Hexagon
processor global data pointer GP to create the 32-bit effective memory address. This addressing
mode accesses global and static data in C.
Global pointer relative addresses can be expressed two ways in assembly language:
■ By explicitly adding an unsigned offset value to register GP
■ By specifying only an immediate value as the instruction operand
For example:
R2 = memh(GP+#100) // Load R2 with signed halfword
// from [GP + 100 bytes]

R3 = memh(#2000) // Load R3 with signed halfword

// from [GP + #2000 - _SDA_BASE]

80-N2040-53 Rev. AA 76
Qualcomm Hexagon V73 Programmer’s Reference Manual Memory

Specifying only an immediate value causes the assembler and linker to automatically subtract the
value of the special symbol _SDA_BASE_ from the immediate value, and use the result as the
effective offset from GP.
The global data pointer is programmed in the GDP field of register GP (Section 2.2.8). This field
contains an unsigned 26-bit value that specifies the most significant 26 bits of the 32-bit global
data pointer. The least significant 6 bits of the pointer are always defined as zero.
The memory area referenced by the global data pointer is known as the global data area. It can be
up to 512 KB in length, and – because of the way the global data pointer is defined – must be
aligned to a 64-byte boundary in virtual memory.
When expressed in assembly language, the offset values used in global pointer relative addressing
always specify byte offsets from the global data pointer. The offsets must be integral multiples of
the size of the instruction data type.

Table 5-6 Offset ranges (global pointer relative)


Offset must be
Data type Offset range
multiple of
doubleword 0 ... 524280 8
word 0 ... 262140 4
halfword 0 ... 131070 2
byte 0 ... 65535 1

NOTE: When using global pointer relative addressing, the immediate operand should be a symbol in the
.sdata or .sbss section to ensure that the offset is valid.

80-N2040-53 Rev. AA 77
Qualcomm Hexagon V73 Programmer’s Reference Manual Memory

5.8.5 Indirect
The indirect addressing mode uses a 32-bit value stored in a general register as the effective
memory address. For example:
R2 = memub(R1) // load R2 with unsigned byte from addr R1

5.8.6 Indirect with offset


The indirect with offset addressing mode adds a signed offset value to a general register value to
create the 32-bit effective memory address. For example:
R2 = memh(R3 + #100) // load R2 with signed halfword
// from [R3 + 100 bytes]

When expressed in assembly language, the offset values always specify byte offsets from the
general register value. The offsets must be integral multiples of the size of the instruction data
type.

Table 5-7 Offset ranges (indirect with offset)


Offset must be
Data type Offset range
multiple of
doubleword -8192 ... 8184 8
word -4096 ... 4092 4
halfword -2048 ... 2046 2
byte -1024 ... 1023 1

NOTE: The offset range is smaller for conditional instructions (Section 5.9).

5.8.7 Indirect with register offset


The indirect with register offset addressing mode adds a 32-bit general register value to the result
created by performing an arithmetic left shift of a second 32-bit general register value by the
amount specified in a 2-bit unsigned immediate value, forming the 32-bit effective memory
address. For example:
R2 = memh(R3+R4<<#1) // load R2 with signed halfword
// from [R3 + (R4 << 1)]

The register values always specify byte addresses.

80-N2040-53 Rev. AA 78
Qualcomm Hexagon V73 Programmer’s Reference Manual Memory

5.8.8 Indirect with auto-increment immediate


The indirect with auto-increment immediate addressing mode uses a 32-bit value stored in a
general register to specify the effective memory address. However, after the address is accessed,
a signed value (known as the increment) is added to the register so it specifies a different memory
address (which is accessed in a subsequent instruction). For example:
R2 = memw(R3++#4) // R3 contains the effective address
// R3 is then incremented by 4

When expressed in assembly language, the increment values always specify byte offsets from the
general register value. The offsets must be integral multiples of the size of the instruction data
type.

Table 5-8 Increment ranges (indirect with auto-increment immediate)


Increment Increment must be
Data type
range multiple of
doubleword -64 ... 56 8
word -32 ... 28 4
halfword -16 ... 14 2
byte -8 ... 7 1

5.8.9 Indirect with auto-increment register


The indirect with auto-increment register addressing mode is functionally equivalent to indirect
with auto-increment immediate, but uses a modifier register Mx (Section 2.2.4) instead of an
immediate value to hold the increment. For example:
R2 = memw(R0++M1) // The effective addr is the value of R0.
// Next, M1 is added to R0 and the result
// is stored in R0.

When auto-incrementing with a modifier register, the increment is a signed 32-bit value which is
added to the general register. This offers two advantages over auto-increment immediate:
■ A larger increment range
■ Variable increments (since the modifier register can be programmed at runtime)
The increment value always specifies a byte offset from the general register value.

NOTE: The signed 32-bit increment range is identical for all instruction data types (doubleword, word,
halfword, byte).

80-N2040-53 Rev. AA 79
Qualcomm Hexagon V73 Programmer’s Reference Manual Memory

5.8.10 Circular with auto-increment immediate


The circular with auto-increment immediate addressing mode is a variant of indirect with auto-
increment addressing – it accesses data buffers in a modulo wrap-around fashion. Circular
addressing is commonly used in data stream processing.
Circular addressing is expressed in assembly language with the address modifier “:circ(Mx)”,
where Mx specifies a modifier register that is programmed to specify the circular buffer
(Section 2.2.4). For example:
R0 = memb(R2++#4:circ(M0)) // Load from R2 in circ buf specified
// by M0
memw(R2++#8:circ(M1)) = R0 // Store to R2 in circ buf specified
// by M1

Circular addressing is set up by programming the following elements:


■ The Length field of the Mx register is set to the length (in bytes) of the circular buffer to
access. A circular buffer can be from 4 to (128K-1) bytes long.
■ Bits 27:24 of the Mx register are always set to 0.
■ The circular start register CSx that corresponds to Mx (CS0 for M0, CS1 for M1) is set to the
start address of the circular buffer.
In circular addressing, after memory is accessed at the address specified in the general register,
the general register is incremented by the immediate increment value and then modulo’d by the
circular buffer length to implement wrap-around access of the buffer.
When expressed in assembly language, the increment values always specify byte offsets from the
general register value. The offsets must be integral multiples of the size of the instruction data
type.

Table 5-9 Increment ranges (circular with auto-increment immediate addressing)


Increment Increment must be
Data type
range multiple of
doubleword -64 ... 56 8
word -32 ... 28 4
halfword -16 ... 14 2
byte -8 ... 7 1

When programming a circular buffer, the following rules apply:


■ The start address must be aligned to the native access size of the buffer elements.
■ ABS(Increment) < Length. The absolute value of the increment must be less than the buffer
length.
■ Access size < (Length-1). The memory access size (1 for byte, 2 for halfword, 4 for word, 8 for
doubleword) must be less than (Length-1).
■ Buffers must not wrap around in the 32-bit address space.

NOTE: If any of these rules are not followed, the execution result is undefined.

80-N2040-53 Rev. AA 80
Qualcomm Hexagon V73 Programmer’s Reference Manual Memory

The following example sets up and accesses a 150-byte circular buffer:


R4.H = #0 // M0[27:24]= 0x0
R4.L = #150 // length = 150
M0 = R4
R2 = ##cbuf // start addr = cbuf
CS0 = R2
R0 = memb(R2++#4:circ(M0)) // Load byte from circ buf
// specified by M0/CS0
// inc R2 by 4 after load
// wrap R2 around if >= 150

The following C function describes the behavior of the circular add function:
unsigned int
fcircadd(unsigned int pointer, int offset,
unsigned int M_reg, unsigned int CS_reg)
{
unsigned int length;
int new_pointer, start_addr, end_addr;

length = (M_reg&0x01ffff); // Lower 17 bits gives buffer size


new_pointer = pointer+offset;
start_addr = CS_reg;
end_addr = CS_reg + lenth;
if (new_pointer >= end_addr) {
new_pointer -= length;
} else if (new_pointer < start_addr) {
new_pointer += length;
}
return (new_pointer);
}

5.8.11 Circular with auto-increment register


The circular with auto-increment register addressing mode is functionally equivalent to circular
with auto-increment immediate, but uses a register instead of an immediate value to hold the
increment.
Register increments are specified in circular addressing instructions by using the symbol I as the
increment (instead of an immediate value). For example:
R0 = memw(R2++I:circ(M1)) // Load byte with incr of I*4 from
// circ buf specified by M1/CS1

When auto-incrementing with a register, the increment is a signed 11-bit value that is added to
the general register. This offers two advantages over circular addressing with immediate
increments:
■ Larger increment ranges
■ Variable increments (since the increment register can be programmed at runtime)
The circular register increment value is programmed in the I field of the modifier register Mx
(Section 2.2.4) as part of setting up the circular data access. This register field holds the signed 11-
bit increment value.

80-N2040-53 Rev. AA 81
Qualcomm Hexagon V73 Programmer’s Reference Manual Memory

Increment values are expressed in units of the buffer element data type, and are automatically
scaled at runtime to the proper data access size.

Table 5-10 Increment ranges (circular with auto-increment register addressing)


Increment Increment must be
Data type
range multiple of
doubleword -8192 ... 8184 8
word -4096 ... 4092 4
halfword -2048 ... 2046 2
byte -1024 ... 1023 1

When programming a circular buffer (with either a register or immediate increment), all the rules
that apply to circular addressing must be followed – for details see Section 5.8.10.

NOTE: If any of these rules are not followed, the execution result is undefined.

5.8.12 Bit-reversed with auto-increment register


The bit-reversed with auto-increment register addressing mode is a variant of indirect with auto-
increment addressing – it accesses data buffers using an address value which is the bit-wise
reversal of the value stored in the general register. Bit-reversed addressing is used in fast Fourier
transforms (FFT) and Viterbi encoding.
The bit-wise reversal of a 32-bit address value is defined as follows:
■ The lower 16 bits are transformed by exchanging bit 0 with bit 15, bit 1 with bit 14, and so on.
■ The upper 16 bits remain unchanged.
Bit-reversed addressing is expressed in assembly language with the address modifier “:brev”. For
example:
R2 = memub(R0++M1:brev) // The address is (R0.H | bitrev(R0.L))
// The orginal R0 (not reversed) is added
// to M1 and written back to R0

The initial values for the address and increment must be set in bit-reversed form, with the
hardware bit-reversing the bit-reversed address value to form the effective address.
The buffer length for a bit-reversed buffer must be an integral power of 2, with a maximum length
of 64K bytes.
To support bit-reversed addressing, buffers must be properly aligned in memory. A bit-reversed
buffer is properly aligned when its starting byte address is aligned to a power of 2 greater than or
equal to the buffer size (in bytes). For example:
int bitrev_buf[256] __attribute__((aligned(1024)));

The bit-reversed buffer declared above is aligned to 1024 bytes because the buffer size is 1024
bytes (256 integer words × 4 bytes), and 1024 is an integral power of 2.

80-N2040-53 Rev. AA 82
Qualcomm Hexagon V73 Programmer’s Reference Manual Memory

The buffer location pointer for a bit-reversed buffer must be initialized so the least-significant 16
bits of the address value are bit-reversed.
The increment value must be initialized to the following value:
bitreverse(buffer_size_in_bytes / 2)

...where bitreverse is defined as bit-reversing the least-significant 16 bits while leaving the
remaining bits unchanged.

NOTE: To simplify the initialization of the bit-reversed pointer, bit-reversed buffers can be aligned to a
64K byte boundary. This initializes the bit-reversed pointer to the base address of the bit-reversed
buffer, with no bit-reversing required for the least-significant 16 bits of the pointer value (which
are set to 0 by the 64K alignment).

Because buffers allocated on the stack only have an alignment of 8 bytes or less, in most cases bit-
reversed buffers should not be declared on the stack.
After a bit-reversed memory access is complete, the general register is incremented by the
register increment value. The value in the general register is never affected by the bit-reversal
that is performed as part of the memory access.

NOTE: The Hexagon processor supports only register increments for bit-reversed addressing – it does not
support immediate increments.

5.9 Conditional load/stores


Some load and store instructions can be executed conditionally based on predicate values which
were set in a previous instruction. The compiler generates conditional loads and stores to
increase instruction-level parallelism.
Conditional loads and stores are expressed in assembly language with the instruction prefix “if
(pred_expr)”, where pred_expr specifies a predicate register expression (Section 6.1). For
example:
if (P0) R0 = memw(R2) // Conditional load
if (!P2) memh(R3 + #100) = R1 // Conditional store
if (P1.new) R3 = memw(R3++#4) // Conditional load

Not all addressing modes are supported in conditional loads and stores. Table 5-11 shows which
modes are supported.
Table 5-11 Addressing modes (conditional load/store)
Addressing mode Conditional
Absolute Yes
Absolute-set No
Absolute with register offset No

80-N2040-53 Rev. AA 83
Qualcomm Hexagon V73 Programmer’s Reference Manual Memory

Table 5-11 Addressing modes (conditional load/store)


Addressing mode Conditional
Global pointer relative No
Indirect Yes
Indirect with offset Yes
Indirect with register offset Yes
Indirect with auto-increment immediate Yes
Indirect with auto-increment register No
Circular with auto-increment immediate No
Circular with auto-increment register No
Bit-reversed with auto-increment register No

When a conditional load or store instruction uses Indirect with offset addressing mode, the offset
range is smaller than the range normally defined for indirect-with-offset addressing.

Table 5-12 Conditional and normal offset ranges (indirect with offset addressing)
Offset range Offset range Offset must be
Data type
(conditional) (normal) multiple of
doubleword 0 ... 504 -8192 ... 8184 8
word 0 ... 252 -4096 ... 4092 4
halfword 0 ... 126 -2048 ... 2046 2
byte 0 ... 63 -1024 ... 1023 1

NOTE: For more information on conditional execution see Chapter 6.

5.10 Cache memory


The Hexagon processor has a cache-based memory architecture:
■ A level 1 instruction cache holds recently-fetched instructions.
■ A level 1 data cache holds recently-accessed data memory.
Load/store operations that access memory through the level 1 caches are referred to as cached
accesses.
Load/stores that bypass the level 1 caches are referred to as uncached accesses.
The memory management unit (MMU) of the Hexagon processor can configure specific memory
areas to perform cached or uncached accesses. The operating system is responsible for
programming the MMU.
Two types of caching are supported (as cache modes):
■ Write-through caching keep the cache data consistent with external memory by always
writing to the memory any data that is stored in the cache.

80-N2040-53 Rev. AA 84
Qualcomm Hexagon V73 Programmer’s Reference Manual Memory

■ Write-back caching stores data in the cache without being immediately written to external
memory. Cached data that is inconsistent with external memory is referred to as dirty.
The Hexagon processor includes dedicated cache maintenance instructions that push dirty data
out to external memory.

5.10.1 Uncached memory


In some cases, load/store operations need to bypass the cache memories and be serviced
externally (for example, when accessing memory-mapped I/O, registers, and peripheral devices,
or other system defined entities). The operating system is responsible for configuring the MMU to
generate uncached memory accesses.
Uncached memory is categorized into two distinct types:
■ Device-type is for accessing memory that has side-effects (such as a memory-mapped FIFO
peripheral). The hardware ensures that interrupts do not cancel a pending device access. The
hardware does not reorder device accesses. Mark peripheral control registers as device-type.
■ Uncached-type is for memory-like memory. No side effects are associated with an access. The
hardware can load from uncached memory multiple times. The hardware can reorder
uncached accesses.
For instruction accesses, device-type memory is functionally identical to uncached-type memory.
For data accesses, they are different.
Code can execute directly from the L2 cache, bypassing the L1 cache.

5.10.2 Tightly coupled memory


The Hexagon processor supports tightly-coupled instruction memory at Level 1, which is defined
as memory with similar access properties to the instruction cache.
Tightly-coupled memory is also supported at level 2, which is defined as backing store to the
primary caches.

5.10.3 Cache maintenance operations


The Hexagon processor includes dedicated cache maintenance instructions that invalidate cache
data or push dirty data out to external memory.
The cache maintenance instructions operate on specific memory addresses. If the instruction
causes an address error (due to a privilege violation), the processor raises an exception.

NOTE: The exception to this rule is the dcfetch operation, which never causes a processor exception.

Whenever maintenance operations are performed on the instruction cache, the isync instruction
(Section 5.11) must execute immediately afterwards. This instruction ensures that subsequent
instructions observe the maintenance operations.

80-N2040-53 Rev. AA 85
Qualcomm Hexagon V73 Programmer’s Reference Manual Memory

Table 5-13 Cache instructions (user-level)


Permitted In
Syntax Operation
packet
icinva(Rs) Solo 1 Instruction cache invalidate.
Look up instruction cache at address Rs.
If the address is in the cache, invalidate it.
dccleaninva(Rs) Slot 1 Data cache clean and invalidate.
empty or Look up data cache at address Rs.
ALU32 only If the address is in the cache and has dirty data, flush
that data out to memory. The cache line is then
invalidated, whether or not dirty data was written.
dccleana(Rs) Slot 1 Data cache clean.
empty or Look up data cache at address Rs.
ALU32 only If the address is in the cache and has dirty data, flush
that data out to memory.
dcinva(Rs) Slot 1 Equivalent to dccleaninva(Rs).
empty or
ALU32 only
dcfetch(Rs) Normal 2 Data cache prefetch.
Prefetch data at address Rs into the data cache.
NOTE - This instruction does not cause an exception.
l2fetch(Rs,Rt) ALU32 or L2 cache prefetch.
XTYPE only Prefetch data from memory specified by Rs and Rt
into L2 cache.
1 Solo means that the instruction must not be grouped with other instructions in a packet.
2 Normal means that the normal instruction-grouping constraints apply.

5.10.4 L2 cache operations


Cache maintenance operations operate on both the L1 and L2 caches.
The data cache coherency operations (including clean, invalidate, and clean and invalidate) affect
both the L1 and L2 caches, and ensure that the memory hierarchy remains coherent.
However, the instruction cache invalidate operation affects only the L1 cache. Therefore,
invalidating instructions that might be in the L1 or L2 caches requires a two-step procedure:
1. Use icinva to invalidate instructions from the L1 cache.
2. Use dcinva separately to invalidate instructions from the L2 cache.

5.10.5 Cache line zero


The Hexagon processor includes the instruction dczeroa. This instruction allocates a line in the L1
data cache and clears it (by storing all zeros). The behavior is as follows:
■ The Rs register value must be 32-byte aligned. If it is unaligned, the processor raises an
unaligned error exception.

80-N2040-53 Rev. AA 86
Qualcomm Hexagon V73 Programmer’s Reference Manual Memory

■ For a cache hit, the specified cache line is cleared (written with all zeros) and made dirty.
■ For a cache miss, the specified cache line is not fetched from external memory. Instead, the
line is allocated in the data cache, cleared, and made dirty.
This instruction is useful in optimizing write-only data. It allows for the use of write-back pages –
which are the most power and performance efficient – without the need to initially fetch the line
to write. This removes unnecessary read bandwidth and latency.

NOTE: The dczeroa operation has the same exception behavior as write-back stores.

A packet with dczeroa must have slot 1 either empty or containing an ALU32 instruction.

5.10.6 Cache prefetch


The Hexagon processor supports the following types of cache prefetching.

5.10.6.1 Hardware-based instruction cache prefetching


L1 and L2 instruction cache prefetching can be enabled or disabled on a per-thread basis by
setting the HFI field in the User status register.

5.10.6.2 Software-based data cache prefetching


The Hexagon processor includes the dcfetch instruction, which queries the L1 data cache based
on the address specified in the instruction:
■ If the address is present in the cache, no action is taken.
■ If the cache line for the address is missing, the processor attempts to fill the cache line from
the next level of memory. The thread does not stall, but rather continues executing while the
cache line fill occurs in the background.
■ If the address is invalid, no exception is generated and the dcfetch instruction is treated as a
NOP.

5.10.6.3 Software-based L2fetch


More powerful L2 prefetching – of data or instructions – is provided by the l2fetch instruction,
which specifies an area of memory that is prefetched by the Hexagon processor’s hardware
prefetch engine. l2fetch specifies two registers (Rs and Rt) as operands. Rs contains the 32-bit
virtual start address of the memory area to prefetch. Rt contains three bit fields that further
specify the memory area:
■ Rt[15:8] – Width, specifies the width (in bytes) of a block of memory to fetch.
■ Rt[7:0] – Height, specifies the number of Width-sized blocks to fetch.
■ Rt[31:16] – Stride, specifies an unsigned byte offset that increments the pointer after each
Width-sized block is fetched.

80-N2040-53 Rev. AA 87
Qualcomm Hexagon V73 Programmer’s Reference Manual Memory

The l2fetch instruction is nonblocking: it initiates a prefetch operation that is performed in the
background by the prefetch engine while the thread continues to execute Hexagon processor
instructions.
The prefetch engine requests all lines in the specified memory area. If the line(s) of interest are
already resident in the L2 cache, the prefetch engine performs no action. If the lines are not in the
L2 cache, the prefetch engine attempts to fetch them.
The prefetch engine makes a best effort to prefetch the requested data, and attempts to perform
prefetching at a lower priority than demand fetches. This prevents the prefetch engine from
adding bus traffic when the system is under a heavy load.
If a program executes an l2fetch instruction while the prefetch operation from a previous l2fetch
is still active, the prefetch engine halts the current prefetch operation.

NOTE: Executing l2fetch with any bit field operand programmed to zero cancels prefetch activity.

The status of the current prefetch operation is maintained in the PFA field of the user status
register. This field can determine whether a prefetch operation has completed.
With respect to MMU permissions and error checking, the l2fetch instruction behaves similarly to
a load instruction. If the virtual address causes a processor exception, the exception is taken. This
differs from the dcfetch instruction, which is treated as a NOP in the presence of a
translation/protection error.

NOTE: Prefetches are dropped when the generated prefetch address resides on a different page than the
start address. The programmer must use sufficiently large pages to ensure this does not occur.

80-N2040-53 Rev. AA 88
Qualcomm Hexagon V73 Programmer’s Reference Manual Memory

Figure 5-2 shows two examples of using the l2fetch instruction. The first shows a box prefetch,
where a 2D range of memory is defined within a larger frame. The second example shows a
prefetch for a large linear memory area of size (Lines * 128).
L2FETCH for box prefetch L2FETCH for large linear prefetch

31 31 16 15 8 7 0
16 15 8 7 0

Rt Stride Width Height Rt 128 128 Lines

Rs Start Address Rs Start Address

Stride

Width

Prefetch
Height 128* Lines
Area

Figure 5-2 L2fetch instruction

5.10.6.4 Hardware-based data cache prefetching


L1 data cache prefetching is enabled or disabled on a per-thread basis by setting the HFD field in
the User status register.
When data cache prefetching is enabled, the Hexagon processor observes patterns of data cache
misses, and attempts to predict future misses based on any recurring patterns of misses where
the addresses are separated by a constant stride. If such patterns are found, the processor
attempts to automatically prefetch future cache lines.
Data cache prefetching is user-enabled at four levels of aggressiveness:
■ HFD = 00: No prefetching
■ HFD = 01: Prefetch up to four lines for misses originating from a load, with a post-update
addressing mode that occurs within a hardware loop
■ HFD = 10: Prefetch up to four lines for misses originating from loads that occur within a
hardware loop
■ HFD = 11: Prefetch up to eight lines for misses originating from loads

80-N2040-53 Rev. AA 89
Qualcomm Hexagon V73 Programmer’s Reference Manual Memory

5.11 Memory ordering


Some devices might require synchronization of stores and loads when they are accessed. In this
case, a set of processor instructions enable programmer control of the synchronization and
ordering of memory accesses.

Table 5-14 Memory ordering instructions


Syntax Operation
isync Instruction synchronize.
This instruction should execute after any instruction cache maintenance
operation.
syncht Synchronize transactions.
Perform heavyweight synchronization. Ensure that previous program
transactions (for example, memw_locked, cached and uncached load/store) are
complete before execution resumes past this instruction.
The syncht instruction ensures that outstanding memory operations from all
threads are complete before the syncht instruction is committed.
barrier Set memory barrier.
Ensure proper ordering between the program accesses performed before the
instruction and those performed after the instruction.
All accesses before the barrier are globally observable before any access
occurring after the barrier can be observed.
The barrier instruction ensures that all outstanding memory operations from the
thread executing the barrier are complete before the instruction is committed.

Data memory accesses and program memory accesses are treated separately and held in
separate caches. Software should ensure coherency between data and program code if necessary.
For example, with generated or self-modified code, the modified code is placed in the data cache
and can be inconsistent with program cache. The software must explicitly force modified data
cache lines to memory (either by using a write-through policy, or through explicit cache clean
instructions). Use a barrier instruction to ensure completion of the stores. Finally, invalidate
relevant instruction cache contents so the new instructions can be refetched.
Here is the recommended code sequence to change and execute an instruction:
ICINVA(R1) // Clear code from instruction cache
ISYNC // Ensure that ICINVA is finished
MEMW(R1)=R0 // Write the new instruction
DCCLEANINVA(R1) // Force data out of data cache
SYNCHT // Ensure that it is in memory
JUMPR R1 // Can now execute code at R1

NOTE: The memory-ordering instructions must not be grouped with other instructions in a packet,
otherwise the behavior is undefined.

This code sequence differs from the one used in previous processor versions.

80-N2040-53 Rev. AA 90
Qualcomm Hexagon V73 Programmer’s Reference Manual Memory

5.12 Atomic operations


The Hexagon processor includes a load locked / store conditional (LL/SC) mechanism to provide
the atomic read-modify-write operation that is necessary to implement synchronization
primitives such as semaphores and mutexes.
These primitives synchronize the execution of different software programs running concurrently
on the Hexagon processor. They can also provide atomic memory support between the Hexagon
processor and external blocks.

Table 5-15 Atomic instructions


Syntax Description
Rd = memw_locked(Rs) Load locked word.
Reserve lock on word at address Rs.
memw_locked(Rs,Pd) = Rt Store conditional word.
If no other atomic operation has been performed at
the address (that is, atomicity is ensured), perform the
store to the word at address Rs and return TRUE in
Pd; otherwise return FALSE.
TRUE indicates that the LL and SC operations were
performed atomically.
Rdd = memd_locked(Rs) Load locked doubleword.
Reserve lock on doubleword at address Rs.
memd_locked(Rs,Pd) = Rtt Store conditional doubleword.
If no other atomic operation has been performed at
the address (that is, atomicity is ensured), perform the
store to the doubleword at address Rs and return
TRUE in Pd; otherwise return FALSE.
TRUE indicates that the LL and SC operations have
been performed atomically.

Here is the recommended code sequence to acquire a mutex:


// Assume mutex address is held in R0
// Assume R1,R3,P0,P1 are scratch

lockMutex:
R3 = #1
lock_test_spin:
R1 = memw_locked(R0) // Do normal test to wait
P1 = cmp.eq(R1,#0) // for lock to be available
if (!P1) jump lock_test_spin
memw_locked(R0,P0) = r3 // Do store conditional (SC)
if (!P0) jump lock_test_spin // was LL and SC done atomically?

Here is the recommended code sequence to release a mutex:


// Assume mutex address is held in R0
// Assume R1 is scratch

R1 = #0
memw(R0) = R1

80-N2040-53 Rev. AA 91
Qualcomm Hexagon V73 Programmer’s Reference Manual Memory

Atomic memX_locked operations are supported for external accesses that use the advanced
extensible interface (AXI) bus and support atomic operations. To perform load-locked operations
with external memory, the operating system must define the memory page as uncacheable,
otherwise the processor behavior is undefined.
If a load locked operation is performed on an address that does not support atomic operations,
the behavior is undefined.
For atomic operations on cacheable memory, the page attributes must be set to cacheable and
write-back, otherwise the behavior is undefined. Cacheable memory must be used when threads
must synchronize with each other.

NOTE: External memX_locked operations are not supported on the AHB. If they are performed on the
AHB, the behavior is undefined.

80-N2040-53 Rev. AA 92
6 Conditional Execution

The Hexagon processor uses a conditional execution model based on compare instructions that
set predicate bits in one of four 8-bit predicate registers (P0 through P3). These predicate bits can
conditionally execute certain instructions.
Conditional scalar operations examine only the least-significant bit in a predicate register, while
conditional vector operations examine multiple bits in the register.
Branch instructions are the main consumers of the predicate registers.

6.1 Scalar predicates


Scalar predicates are 8-bit values in conditional instructions to represent truth values:
■ 0xFF represents true
■ 0x00 represents false
The Hexagon processor provides the four 8-bit predicate registers P0 through P3 to hold scalar
predicates (Section 2.2.5). These registers are assigned values by the predicate-generating
instructions, and examined by the predicate-consuming instructions.

6.1.1 Generating scalar predicates


The following instructions generate scalar predicates:
■ Compare byte, halfword, word, doubleword
■ Compare single- and double-precision floating point
■ Classify floating-point value
■ Compare bitmask
■ Bounds check
■ TLB match
■ Store conditional

80-N2040-53 Rev. AA 93
Qualcomm Hexagon V73 Programmer’s Reference Manual Conditional Execution

Table 6-1 Scalar predicate-generating instructions


Syntax Operation
Pd = cmpb.eq(Rs,{Rt,#u8}) Equal (signed).
Pd = cmph.eq(Rs,{Rt,#s8}) Compare register Rs to Rt or a signed immediate for
Pd = [!]cmp.eq(Rs,{Rt,#s10}) equality. Assign Pd the resulting truth value.
Pd = cmp.eq(Rss,Rtt)
Pd = sfcmp.eq(Rs,Rt)
Pd = dfcmp.eq(Rss,Rtt)
Pd = cmpb.gt(Rs,{Rt,#s8} Greater than (signed).
Pd = cmph.gt(Rs,{Rt,#s8}) Compare register Rs to Rt or a signed immediate for
Pd = [!]cmp.gt(Rs,{Rt,#s10}) signed greater than. Assign Pd the resulting truth
Pd = cmp.gt(Rss,Rtt) value.
Pd = sfcmp.gt(Rs,Rt)
Pd = dfcmp.gt(Rss,Rtt)
Pd = cmpb.gtu(Rs,{Rt,#u7}) Greater than (unsigned).
Pd = cmph.gtu(Rs,{Rt,#u7}) Compare register Rs to Rt or an unsigned immediate
Pd = [!]cmp.gtu(Rs,{Rt,#u9}) for unsigned greater than. Assign Pd the resulting
Pd = cmp.gtu(Rss,Rtt) truth value.
Pd = cmp.ge(Rs,#s8) Greater than or equal (signed).
Pd = sfcmp.ge(Rs,Rt) Compare register Rs to Rt or a signed immediate for
Pd = dfcmp.ge(Rss,Rtt) signed greater than or equal. Assign Pd the resulting
truth value.
Pd = cmp.geu(Rs,#u8) Greater than or equal (unsigned).
Compare register Rs to an unsigned immediate for
unsigned greater than or equal. Assign Pd the
resulting truth value.
Pd = cmp.lt(Rs,Rt) Less than (signed).
Compare register Rs to Rt for signed less than.
Assign Pd the resulting truth value.
Pd = cmp.ltu(Rs,Rt) Less than (unsigned).
Compare register Rs to Rt for unsigned less than.
Assign Pd the resulting truth value.
Pd = sfcmp.uo(Rs,Rt) Unordered (signed).
Pd = dfcmp.uo(Rss,Rtt) Determine if register Rs or Rt is set to the value NaN.
Assign Pd the resulting truth value.
Pd=sfclass(Rs,#u5) Classify value (signed).
Pd=dfclass(Rss,#u5) Determine if register Rs is set to any of the specified
classes. Assign Pd the resulting truth value.
Pd = [!]tstbit(Rs,{Rt,#u5}) Test if bit set.
Rt or an unsigned immediate specifies a bit position.
Test if the bit in Rs that is specified by the bit position
is set. Assign Pd the resulting truth value.
Pd = [!]bitsclr(Rs,{Rt,#u6}) Test if bits clear.
Rt or an unsigned immediate specifies a bitmask.
Test if the bits in Rs that are specified by the bitmask
are all clear. Assign Pd the resulting truth value.

80-N2040-53 Rev. AA 94
Qualcomm Hexagon V73 Programmer’s Reference Manual Conditional Execution

Table 6-1 Scalar predicate-generating instructions (cont.)


Pd = [!]bitsset(Rs,Rt) Test if bits set.
Rt specifies a bitmask.
Test if the bits in Rs that are specified by the bitmask
are all set. Assign Pd the resulting truth value.
memw_locked(Rs,Pd) = Rt Store conditional.
memd_locked(Rs,Pd) = Rtt If no other atomic operation has been performed at
the address (atomicity is ensured), perform the store
to the word at address Rs. Assign Pd the resulting
truth value.
Pd = boundscheck(Rs,Rtt) Bounds check.
Determine if Rs falls in the numeric range defined by
Rtt. Assign Pd the resulting truth value.
Pd = tlbmatch(Rss,Rt) Determine if TLB entry in Rss matches the ASID:PPN
specified in Rt. Assign Pd the resulting truth value.

NOTE: One of the compare instructions (cmp.eq) includes a variant that stores a binary predicate value
(0 or 1) in a general register not a predicate register.

6.1.2 Consuming scalar predicates


Certain instructions can be conditionally executed based on the value of a scalar predicate (or
alternatively specify a scalar predicate as an input to their operation).
The conditional instructions that consume scalar predicates examine only the least-significant bit
of the predicate value. In the simplest case, this bit value directly determines whether the
instruction executes:
■ 1 indicates that the instruction executes
■ 0 indicates that the instruction does not execute
If a conditional instruction includes the operator ! in its predicate expression, the logical negation
of the bit value determines whether the instruction is executed.
Conditional instructions are expressed in assembly language with the instruction prefix “if
(pred_expr)”, where pred_expr specifies the predicate expression. For example:
if (P0) jump target // Jump if P0 is true
if (!P2) R2 = R5 // Assign register if !P2 is true
if (P1) R0 = sub(R2,R3) // Conditionally subtract if P1
if (P2) R0 = memw(R2) // Conditionally load word if P2

The following instructions can be used as conditional instructions:


■ Jumps and calls (Section 8.3)
■ Many load and store instructions (Section 5.9)
■ Logical instructions (including AND/OR/XOR)
■ Shift halfword
■ 32-bit add/subtract by register or short immediate

80-N2040-53 Rev. AA 95
Qualcomm Hexagon V73 Programmer’s Reference Manual Conditional Execution

■ Sign and zero extend


■ 32-bit register transfer and 64-bit combine word
■ Register transfer immediate
■ Deallocate frame and return
When a conditional load or store executes and the predicate expression is false, the instruction is
canceled (including any exceptions that might occur). For example, if a conditional load uses an
address with a memory permission violation, and the predicate expression is false, the load does
not execute and the exception is not raised.
The mux instruction accepts a predicate as one of its basic operands:
Rd = mux(Ps,Rs,Rt)

The mux instruction selects either Rs or Rt based on the least significant bit in Ps. If the least-
significant bit in Ps is a 1, Rd is set to Rs, otherwise it is set to Rt.

6.1.3 Auto-AND predicates


If multiple compare instructions in a packet write to the same predicate register, the result is the
logical AND of the individual compare results. For example:
{
P0 = cmp(A) // If A && B, jump
P0 = cmp(B)
if (P0.new) jump:T taken_path
}

To perform the corresponding OR operation, the following instructions can compute the negation
of an existing compare (using De Morgan’s law):
■ Pd = !cmp.{eq,gt}(Rs, {#s10,Rt} )
■ Pd = !cmp.gtu(Rs, {#u9,Rt} )
■ Pd = !tstbit(Rs, {#u5,Rt} )
■ Pd = !bitsclr(Rs, {#u6,Rt} )
■ Pd = !bitsset(Rs,Rt)

Auto-AND predicates have the following restrictions:


■ If a packet contains endloopN, it cannot perform an auto-AND with predicate register P3.
■ If a packet contains a register transfer from a general register to a predicate register, no other
instruction in the packet can write to the same predicate register. As a result, a register
transfer to P3:0 or C5:4 cannot be grouped with any other predicate-writing instruction.
■ The instructions spNloop0, decbin, tlbmatch, memw_locked, memd_locked, add:carry,
sub:carry, sfcmp, and dfcmp cannot be grouped with another instruction that sets the same
predicate register.

NOTE: A register transfer from a predicate register to a predicate register has the same auto-AND
behavior as a compare instruction.

80-N2040-53 Rev. AA 96
Qualcomm Hexagon V73 Programmer’s Reference Manual Conditional Execution

6.1.4 Dot-new predicates


The Hexagon processor can generate and use a scalar predicate in the same instruction packet
(Section 3.3). This feature is expressed in assembly language by appending the suffix “.new” to
the specified predicate register. For example:
if (P0.new) R3 = memw(R4)

The following C statement and the corresponding assembly code that is generated from it by the
compiler is an example of how to use dot-new predicates.

C statement
if (R2 == 4)
R3 = *R4;
else
R5 = 5;

Assembly code
{
P0 = cmp.eq(R2,#4)
if (P0.new) R3 = memw(R4)
if (!P0.new) R5 = #5
}

In the assembly code, a scalar predicate is generated and then consumed twice within the same
instruction packet.
The following conditions apply to using dot-new predicates:
■ The predicate must be generated by an instruction in the same packet. The assembler
normally enforces this restriction, but if the processor executes a packet that violates this
restriction, the execution result is undefined.
■ A single packet can contain both the dot-new and normal forms of predicates. The normal
form examines the old value in the predicate register, rather than the newly-generated value.
For example:
{
P0 = cmp.eq(R2,#4)
if (P0.new) R3 = memw(R4) // Use newly-generated P0 value
if (P0) R5 = #5 // Use previous P0 value
}

80-N2040-53 Rev. AA 97
Qualcomm Hexagon V73 Programmer’s Reference Manual Conditional Execution

6.1.5 Dependency constraints


Two instructions in an instruction packet should not write to the same destination register
(Section 3.3.5). An exception to this rule is when the two instructions are conditional, and only
one of them ever has the predicate expression value true when the packet executes.
For example, the following packet is valid as long as P2 and P3 never both evaluate to true when
the packet is executed:
{
if (P2) R3 = #4 // P2, P3, or both must be false
if (P3) R3 = #7
}

Because predicate values change at runtime, the programmer is responsible for ensuring that
such packets are always valid during program execution. If they are invalid, the processor takes
the following actions:
■ When writing to general registers, an error exception is raised.
■ When writing to predicate or control registers, the result is undefined.

6.2 Vector predicates


The predicate registers are also used for conditional vector operations. Unlike scalar predicates,
vector predicates contain multiple truth values which are generated by vector predicate-
generating operations.
For example, a vector compare instruction compares each element of a vector and assigns the
compare results to a predicate register. Each bit in the predicate vector contains a truth value
indicating the outcome of a separate compare performed by the vector instruction.
The vector mux instruction uses a vector predicate to selectively merge elements from two
separate vectors into a single destination vector. This operation is useful for enabling the
vectorization of loops with control flow (branches).
The vector instructions that use predicates are described in the following sections.

6.2.1 Vector compare


A vector compare instruction inputs two 64-bit vectors, performs separate compares for each pair
of vector elements, and generates a predicate value which contains a bit vector of truth values.
In Figure 6-1 two 64-bit vectors of bytes (contained in Rss and Rtt) are being compared. The result
is assigned as a vector predicate to the destination register Pd.
In the example vector predicate shown in Figure 6-1, every other compare result in the predicate
is true (for example, 1).

80-N2040-53 Rev. AA 98
Qualcomm Hexagon V73 Programmer’s Reference Manual Conditional Execution

Rss

Rtt

cmp cmp cmp cmp cmp cmp cmp cmp

1 0 1 0 1 0 1 0 Pd
7 0

Figure 6-1 Vector byte compare

Figure 6-2 shows how a vector halfword compare generates a vector predicate. Two 64-bit
vectors of halfwords are being compared. The result is assigned as a vector predicate to the
destination register Pd.
Because a vector halfword compare yields only four truth values, each truth value is encoded as
two bits in the generated vector predicate.

Rss

Rtt

cmp cmp cmp cmp

1 1 0 0 1 1 0 0 Pd
7 0

Figure 6-2 Vector halfword compare

80-N2040-53 Rev. AA 99
Qualcomm Hexagon V73 Programmer’s Reference Manual Conditional Execution

6.2.2 Vector mux instruction


A vector mux instruction conditionally selects the elements from two vectors. The instruction
takes as input two source vectors and a predicate register. For each byte in the vector, the
corresponding bit in the predicate register is used to choose from one of the two input vectors.
The combined result is written to the destination register.

Rss

Rtt

mux mux mux mux mux mux mux mux


P[7] P[6] P[5] P[4] P[3] P[2] P[1] P[0]

Rdd

Figure 6-3 Vector mux instruction

Table 6-2 Vector mux instruction


Syntax Operation
Rdd = vmux(Ps,Rss,Rtt) Select bytes from Rss and Rtt

Changing the order of the source operands in a mux instruction enables formation of both senses
of the result. For example:
R1:0 = vmux(P0,R3:2,R5:4) // Choose bytes from R3:2 if true
R1:0 = vmux(P0,R5:4,R3:2) // Choose bytes from R3:2 if false

NOTE: By replicating the predicate bits generated by word or halfword compares, the vector mux
instruction can select words or halfwords.

6.2.3 Using vector conditionals


Vector conditional support is used to vectorize loops with conditional statements.
Consider the following C statement:
for (i=0; i<8; i++) {
if (A[i]) {
B[i] = C[i];
}
}

Assuming arrays of bytes, this code can be vectorized as follows:


R1:0 = memd(R_A) // R1:0 holds A[7]-A[0]
R3 = #0 // Clear R3:2
R2 = #0
P0 = vcmpb.eq(R1:0,R3:2) // Compare bytes in A to zero

80-N2040-53 Rev. AA 100


Qualcomm Hexagon V73 Programmer’s Reference Manual Conditional Execution

R5:4 = memd(R_B) // R5:4 holds B[7]-B[0]


R7:6 = memd(R_C) // R7:6 holds C[7]-C[0]
R3:2 = vmux(P0,R7:6,R5:4) // if (A[i]) B[i]=C[i]
memd(R_B) = R3:2 // store B[7]-B[0]

6.3 Predicate operations


The Hexagon processor provides a set of operations for manipulating and moving predicate
registers.

Table 6-3 Predicate register instructions


Syntax Operation
Pd = Ps Transfer predicate Ps to Pd
Pd = Rs Transfer register Rs to predicate Pd
Rd = Ps Transfer predicate Ps to register Rd
Pd = and(Ps,[!]Pt) Set Pd to bitwise AND of Ps and [NOT] Pt
Pd = or(Ps,[!]Pt) Set Pd to bitwise OR of Ps and [NOT] Pt
Pd = and(Ps, and(Pt,[!]Pu) Set Pd to AND of Ps and (AND of Pt and [NOT] Pu)
Pd = and(Ps, or(Pt,[!]Pu) Set Pd to AND of Ps and (OR of Pt and [NOT] Pu)
Pd = or(Ps, and(Pt,[!]Pu) Set Pd to OR of Ps and (AND of Pt and [NOT] Pu)
Pd = or(Ps, or(Pt,[!]Pu) Set Pd to OR of Ps and (OR of Pt and [NOT] Pu)
Pd = not(Ps) Set Pd to bitwise inversion of Ps
Pd = xor(Ps,Pt) Set Pd to bitwise exclusive OR of Ps and Pt
Pd = any8(Ps) Set Pd to 0xFF if any bit in Ps is 1, 0x00 otherwise
Pd = all8(Ps) Set Pd to 0x00 if any bit in Ps is 0, 0xFF otherwise

NOTE: These instructions belong to instruction class CR.

Predicate registers can be transferred to and from the general registers either individually or as
register quadruples (Section 2.2.5).

80-N2040-53 Rev. AA 101


7 Software Stack

The Hexagon processor includes dedicated registers and instructions to support a call stack for
subroutine execution.
The stack structure follows standard C conventions.

7.1 Stack structure


The stack is defined to grow from high addresses to low addresses. The stack pointer register SP
points to the data element that is currently on the top of the stack.
Figure 7-1 Stack structure
Stack in memory

Saved LR
Saved FP

Higher address
Procedure local
data on stack

Stack frame

Saved LR
Saved FP FP register

Procedure local
data on stack

SP register
Lower address

Unallocated stack

80-N2040-53 Rev. AA 102


Qualcomm Hexagon V73 Programmer’s Reference Manual Software Stack

NOTE: The Hexagon processor supports three dedicated stack instructions: allocframe, deallocframe,
and dealloc_return (Section 7.5).

The SP address must always remain 8-byte aligned for the stack instructions to work properly.

7.2 Stack frames


The stack stores stack frames, which are data structures that store state information on the active
subroutines in a program (for example, those that were called but have not yet returned). Each
stack frame corresponds to an active subroutine in the program.
A stack frame contains the following elements:
■ The local variables and data used by the subroutine
■ The return address for the subroutine call (pushed from the link register LR)
■ The address of the previous stack frame allocated on the stack (pushed from the frame
pointer register FP)
The frame pointer register FP always contains the address of the saved frame pointer in the
current stack frame. It facilitates debugging by enabling a debugger to examine the stack in
memory and easily determine the call sequence, function parameters, and so on.

NOTE: For leaf functions it is often unnecessary to save FP and LR. In this case, FP contains the frame
pointer of the calling function, not the current function.

7.3 Stack protection


The Hexagon processor supports the following features to protect the integrity of the software
stack.

7.3.1 Stack bounds checking


Stack bounds checking prevents a stack frame from being allocated past the lower boundary of
the software stack.
FRAMELIMIT is a 32-bit control register that stores a memory address that specifies the lower
bound of the memory area reserved for the software stack. When the allocframe instruction
allocates a new stack frame, it compares the new stack pointer value in SP with the stack bound
value in FRAMELIMIT. If SP is less than FRAMELIMIT, the Hexagon processor raises exception 0x27
(Section 8.10).

NOTE: Stack bounds checking is performed when the processor is in User and Guest modes, but not in
Monitor mode.

80-N2040-53 Rev. AA 103


Qualcomm Hexagon V73 Programmer’s Reference Manual Software Stack

7.3.2 Stack smashing protection


Stack smashing is a technique malicious code uses to gain control over an executing program.
Malicious code causes buffer overflows to occur in a procedure’s local data, with the goal of
modifying the subroutine return address stored in a stack frame so it points to the malicious code
instead of the intended return code.
Stack smashing protection prevents this from occurring by scrambling the subroutine return
address when a new stack frame is allocated, and then unscrambling the return address when the
frame is deallocated. Because the value in FRAMEKEY changes regularly and varies from device to
device, it becomes difficult to precalculate a malicious return address.
FRAMEKEY is a 32-bit control register which scrambles return addresses stored on the stack:
■ In the allocframe instruction, the 32-bit return address in link register LR is XOR-scrambled
with the value in FRAMEKEY before it is stored in the new stack frame.
■ In deallocframe and dealloc_return, the return address loaded from the stack frame is
unscrambled with the value in FRAMEKEY before it is stored in LR.
After a processor reset, the default value of FRAMEKEY is 0. If this value is not changed, stack
smashing protection is effectively disabled.

NOTE: Each hardware thread has its own instance of the FRAMEKEY register.

7.4 Stack registers


Table 7-1 Stack registers
Register Name Description Alias
SP Stack pointer Points to topmost stack element in memory R29
FP Frame pointer Points to previous stack frame on stack R30
LR Link register Contains return address of subroutine call R31
FRAMELIMIT Frame limit register Contains lowest address of stack area C16
FRAMEKEY Frame key register Contains scrambling key for return addresses C17

NOTE: SP, FP, and LR are aliases of three General registers. These general registers are conventionally
dedicated for use as stack registers.

80-N2040-53 Rev. AA 104


Qualcomm Hexagon V73 Programmer’s Reference Manual Software Stack

7.5 Stack instructions


The Hexagon processor includes the allocframe and deallocframe instructions to efficiently
allocate and deallocate stack frames on the call stack.

Table 7-2 Stack instructions


Syntax Operation
allocframe(#u11:3) Allocate stack frame.
This instruction is used after a call. It first XORs the values in LR
and FRAMEKEY, and pushes the resulting scrambled return
address and FP to the top of stack.
Next, it subtracts an unsigned immediate from SP to allocate
room for local variables. If the resulting SP is less than
FRAMELIMIT, the processor raises exception 0x27. Otherwise,
SP is set to the new value, and FP is set to the address of the
old frame pointer on the stack.
The immediate operand as expressed in assembly syntax
specifies the byte offset. This value must be 8-byte aligned. The
valid range is from 0 to 16 KB.
deallocframe Deallocate stack frame.
Use this instruction before a return to free a stack frame. It first
loads the saved FP and LR values from the address at FP, and
XORs the restored LR with the value in FRAMEKEY to
unscramble the return address. SP is then pointed back to the
previous frame.
dealloc_return Subroutine return with stack frame deallocate.
Perform deallocframe operation, and then perform subroutine
return (Section 8.3.3) to the target address loaded from LR by
deallocframe.

NOTE: The allocframe and deallocframe instructions load and store the LR and FP registers on the stack
as a single aligned 64-bit register pair (LR:FP).

80-N2040-53 Rev. AA 105


8 Program Flow

The Hexagon processor supports the following program flow facilities.

8.1 Conditional instructions


Many Hexagon processor instructions can conditionally execute. For example:
if (P0) R0 = memw(R2) // Conditionally load word if P0
if (!P1) jump label // Conditionally jump if not P1

The following instructions can be specified as conditional:


■ Jumps and calls
■ Many load and store instructions
■ Logical instructions (including AND/OR/XOR)
■ Shift halfword
■ 32-bit add/subtract by register or short immediate
■ Sign and zero extend
■ 32-bit register transfer and 64-bit combine word
■ Register transfer immediate
■ Deallocate frame and return
For more information, see Section 5.9 and Chapter 6.

8.2 Hardware loops


The Hexagon processor includes hardware loop instructions that perform loop branches with zero
overhead. For example:
loop0(start,#3) // loop 3 times
start:
{ R0 = mpyi(R0,R0) } :endloop0

Two sets of hardware loop instructions are provided – loop0 and loop1 – to nest hardware loops
one level deep. For example:
// Sum the rows of a 100x200 matrix.

loop1(outer_start,#100)
outer_start:

80-N2040-53 Rev. AA 106


Qualcomm Hexagon V73 Programmer’s Reference Manual Program Flow

R0 = #0
loop0(inner_start,#200)
inner_start:
R3 = memw(R1++#4)
{ R0 = add(R0,R3) }:endloop0
{ memw(R2++#4) = R0 }:endloop1

Use the hardware loop instructions as follows:


■ For non-nested loops, loop0 is used.
■ For nested loops, loop0 is used for the inner loop, and loop1 for the outer loop.

NOTE: If a program must create loops nested more than one level deep, the two innermost loops can be
implemented as hardware loops, with the remaining outer loops implemented as software
branches.

Each hardware loop is associated with a pair of dedicated loop registers:


■ The loop start address register SAn is set to the address of the first instruction in the loop
(which is typically expressed in assembly language as a label).
■ The loop count register LCn is set to a 32-bit unsigned value which specifies the number of
loop iterations to perform. When the PC reaches the end of the loop, LCn is examined to
determine whether to repeat or exit the loop.
The hardware loop setup instruction sets both of these registers at once – typically there is no
need to set them individually. However, because the loop registers completely specify the
hardware loop state, saving and restoring the registers (either automatically by a processor
interrupt or manually by the programmer) enables a suspended hardware loop to resume
normally once its loop registers are reloaded with the saved values.
The Hexagon processor provides two sets of loop registers for the two hardware loops:
■ SA0 and LC0 are used by loop0
■ SA1 and LC1 are used by loop1

80-N2040-53 Rev. AA 107


Qualcomm Hexagon V73 Programmer’s Reference Manual Program Flow

Table 8-1 Hardware loop instructions


Syntax Description
loopN(start, Rs) Hardware loop with register loop count.
Set registers SAn and LCn for hardware loop N:
■ SAn is assigned the specified start address of the loop.
■ LCn is assigned the value of general register Rs.
NOTE: The loop start operand is encoded as a PC-relative
immediate value.
loopN(start, Hardware loop with immediate loop count.
#count) Set registers SAn and LCn for hardware loop N:
■ SAn is assigned the specified start address of the loop.
■ LCn is assigned the specified immediate value (0 to 1023).
NOTE: The loop start operand is encoded as a PC-relative
immediate value.
:endloopN Hardware loop end instruction.
Performs the following operation:
if (LCn > 1) {PC = SAn; LCn = LCn-1}

NOTE: This instruction appears in assembly as a suffix appended to the


last packet in the loop. It is encoded in the last packet.
SAn = Rs Set loop start address to general register Rs
LCn = Rs Set loop count to general register Rs

NOTE: The loop instructions are assigned to instruction class CR.

8.2.1 Loop setup


To set up a hardware loop, the loop registers SAn and LCn must be set to the proper values. This is
done in two ways:
■ A loopN instruction
■ Register transfers to SAn and LCn
The loopN instruction performs all the work of setting SAn and LCn. For example:
loop0(start,#3) // SA0=&start, LC0=3
start:
{ R0 = mpyi(R0,R0) } :endloop0

In this example, the hardware loop (consisting of a single multiply instruction) executes three
times. The loop0 instruction sets register SA0 to the address value of label start, and LC0 to 3.
Loop counts are limited to the range 0 to1023 when they are expressed as immediate values in
loopN. If the desired loop count exceeds this range, it must be specified as a register value. For
example:
Using loopN:
R1 = #20000;
loop0(start,R1) // LC0=20000, SA0=&start

80-N2040-53 Rev. AA 108


Qualcomm Hexagon V73 Programmer’s Reference Manual Program Flow

start:
{ R0 = mpyi(R0,R0) } :endloop0

Using register transfers:


R1 = #20000
LC0 = R1 // LC0=20000
R1 = #start
SA0 = R1 // SA0=&start
start:
{ R0 = mpyi(R0,R0) } :endloop0

If a loopN instruction is located too far from its loop start address, the PC-relative offset value that
specifies the start address can exceed the maximum range of the instruction’s start-address
operand. If this occurs, either move the loopN instruction closer to the loop start, or specify the
loop start address as a 32-bit constant (Section 10.9).
For example, using 32-bit constants:
R1 = #20000;
loop0(##start,R1) // LC0=20000, SA0=&start
...

8.2.2 Loop end


The loop end instruction indicates the last packet in a hardware loop. It is expressed in assembly
language by appending the packet with the symbol “:endloopN”, where N specifies the
hardware loop (0 or 1). For example:
loop0(start,#3)
start:
{ R0 = mpyi(R0,R0) } :endloop0 // last packet in loop

The last instruction in the loop must always be expressed in assembly language as a packet (using
curly braces), even if it is the only instruction in the packet.
Nested hardware loops can specify the same instruction as the end of both the inner and outer
loops. For example:
// Sum the rows of a 100x200 matrix.
// Software pipeline the outer loop.

p0 = cmp.gt(R0,R0) // p0 = false
loop1(outer_start,#100)
outer_start:
{ if (p0) memw(R2++#4) = R0
p0 = cmp.eq(R0,R0) // p0 = true
R0 = #0
loop0(inner_start,#200) }
inner_start:
R3 = memw(R1++#4)
{ R0 = add(R0,R3) }:endloop0:endloop1
memw(R2++#4) = R0

80-N2040-53 Rev. AA 109


Qualcomm Hexagon V73 Programmer’s Reference Manual Program Flow

Though endloopN behaves like a regular instruction (by implementing the loop test and branch),
it does not execute in any instruction slot, and does not count as an instruction in the packet.
Therefore a single instruction packet which is marked as a loop end can perform up to six
operations:
■ Four regular instructions (the normal limit for an instruction packet)
■ The endloop0 test and branch
■ The endloop1 test and branch

NOTE: The endloopN instruction is encoded in the instruction packet (Section 10.6).

8.2.3 Loop execution


After a hardware loop is set up, the loop body always executes at least once regardless of the
specified loop count (because the loop count is not examined until the last instruction in the
loop). Therefore, if a loop must be optionally executed zero times, it must be preceded with an
explicit conditional branch. For example:
loop0(start,R1)
P0 = cmp.eq(R1,#0)
if (P0) jump skip
start:
{ R0 = mpyi(R0,R0) } :endloop0
skip:

In this example a hardware loop is set up with the loop count in R1, but if the value in R1 is zero a
software branch skips over the loop body.
After the loop end instruction of a hardware loop is executed, the Hexagon processor examines
the value in the corresponding loop count register:
■ If the value is greater than 1, the processor decrements the loop count register and performs
a zero-cycle branch to the loop start address.
■ If the value is less than or equal to 1, the processor resumes program execution at the
instruction immediately following the loop end instruction.

NOTE: Because nested hardware loops can share the same loop end instruction, the processor can
examine both loop count registers in a single operation.

8.2.4 Pipelined hardware loops


Software pipelined loops are common for VLIW architectures such as the Hexagon processor.
They offer increased code performance in loops by overlapping multiple loop iterations.
A software pipeline has three sections:
■ A prologue in which the loop is primed
■ A kernel (or steady-state) portion
■ An epilogue which drains the pipeline

80-N2040-53 Rev. AA 110


Qualcomm Hexagon V73 Programmer’s Reference Manual Program Flow

A simple example is shown in Table 8-2.

Table 8-2 Software pipelined loop

int foo(int *A, int *result)


{
int i;
for (i=0;i<100;i++) {
result[i]= A[i]*A[i];
}
}

foo:
{ R3 = R1
loop0(.kernel,#98) // Decrease loop count by 2
}
R1 = memw(R0++#4) // First prologue stage
{ R1 = memw(R0++#4) // Second prologue stage
R2 = mpyi(R1,R1)
}
.falign
.kernel:
{ R1 = memw(R0++#4) // Kernel
R2 = mpyi(R1,R1)
memw(R3++#4) = R2
}:endloop0
{ R2 = mpyi(R1,R1) // First epilogue stage
memw(R3++#4) = R2
}
memw(R3++#4) = R2 // Second epilogue stage
jumpr lr

In Table 8-2, the kernel section of the pipelined loop performs three iterations of the loop in
parallel:
■ The load for iteration N+2
■ The multiply for iteration N+1
■ The store for iteration N
One drawback to software pipelining is the extra code necessary for the prologue and epilogue
sections of a pipelined loop.
To address this issue, the Hexagon processor provides the spNloop0 instruction, where the “N” in
the instruction name indicates a digit in the range 1 to 3. For example:
P3 = sp2loop0(start,#10) // Set up pipelined loop

The spNloop0 instruction is a variant of the loop0 instruction: it sets up a normal hardware loop
using SA0 and LC0, but also performs the following additional operations:
■ When the spNloop0 instruction executes, it assigns the truth value false to the predicate
register P3.

80-N2040-53 Rev. AA 111


Qualcomm Hexagon V73 Programmer’s Reference Manual Program Flow

■ After the associated loop executes N times, P3 is automatically set to true.


This feature (known as automatic predicate control) enables the store instructions in the kernel
section of a pipelined loop to conditionally execute by P3 and thus – because of the way
spNloop0 controls P3 – not execute during the pipeline warm-up. This can reduce the code size of
many software pipelined loops by eliminating the need for prologue code.
The spNloop0 instruction cannot be used to eliminate the epilogue code from a pipelined loop;
however, in some cases it is possible to do this through the use of programming techniques.
Typically, the issue affecting the removal of epilogue code is load safety. If the kernel section of a
pipelined loop can safely access past the end of its arrays – either because it is known as safe, or
because the arrays have been padded at the end – epilogue code is unnecessary. However, if load
safety cannot be ensured, explicit epilogue code is required to drain the software pipeline.
Table 8-3 shows how spNloop0 and load safety simplify the code shown in Table 8-2.

Table 8-3 Software pipelined loop (using spNloop0)

int foo(int *A, int *result)


{
int i;
for (i=0;i<100;i++) {
result[i]= A[i]*A[i];
}
}

foo:
{ // load safety assumed
P3 = sp2loop0(.kernel,#102) // Set up pipelined loop
R3 = R1
}
.falign
.kernel:
{ R1 = memw(R0++#4) // Kernel
R2 = mpyi(R1,R1)
if (P3) memw(R3++#4) = R2
}:endloop0

jumpr lr

NOTE: The count value that spNloop0 uses to control the P3 setting is stored in the user status register
USR.LPCFG.

8.2.5 Loop restrictions


Hardware loops have the following restrictions:
■ The loop setup packet in loopN or spNloop0 (Section 8.2.4) cannot contain a speculative
indirect jump, new-value compare jump, or dealloc_return.

80-N2040-53 Rev. AA 112


Qualcomm Hexagon V73 Programmer’s Reference Manual Program Flow

■ The last packet in a hardware loop cannot contain any program flow instructions (including
jumps or calls).
■ The loop end packet in loop0 cannot contain any instruction that changes SA0 or LC0.
Similarly, the loop end packet in loop1 cannot contain any instruction that changes SA1 or
LC1.
■ The loop end packet in spNloop0 cannot contain any instruction that changes P3.

NOTE: SA1 and LC1 can be changed at the end of loop0, while SA0 and LC0 can be changed at the end of
loop1.

8.3 Software branches


Unlike hardware loops, software branches use an explicit instruction to perform a branch
operation. Software branches include jumps, calls, and returns.
The target address for branch instructions is specified as register indirect or PC-relative offsets.
PC-relative offsets are normally less than 32 bits, but can be specified as 32 bits by using the
appropriate syntax in the target operand (Section 8.3.4).
Branch instructions are unconditional or conditional, with the execution of conditional
instructions controlled by a predicate expression.

Table 8-4 Software branch instructions


Syntax Operation
[if (pred_expr)] jump label Branch to address specified by register Rs or PC-
[if (pred_expr)] jumpr Rs relative offset.
Can be conditionally executed.
[if (pred_expr)] call label Branch to address specified by register Rs or PC-
[if (pred_expr)] callr Rs relative offset.
Store subroutine return address in link register LR.
Can be conditionally executed.
[if (pred_expr)] jumpr LR Branch to subroutine return address contained in link
register LR.
Can be conditionally executed.

80-N2040-53 Rev. AA 113


Qualcomm Hexagon V73 Programmer’s Reference Manual Program Flow

8.3.1 Jumps
Jump instructions change the program flow to a target address, which are specified by either a
register or a PC-relative immediate value. Jump instructions can be conditional based on the
value of a predicate expression.

Table 8-5 Jump instructions


Syntax Operation
jump label Direct jump.
Branch to address specified by label.
Label is encoded as PC-relative signed immediate value.
jumpr Rs Indirect jump.
Branch to address contained in general register Rs.
if ([!]Ps) jump label Conditional jump.
if ([!]Ps) jumpr Rs Perform jump if predicate expression evaluates to true.

NOTE: Conditional jumps can be specified as speculative (Section 8.4).

8.3.2 Calls
Call instructions jump to subroutines. The instruction performs a jump to the target address and
also stores the return address in the link register LR.
The forms of call are functionally similar to jump instructions and include both PC-relative and
register indirect in both unconditional and conditional forms.

Table 8-6 Call instructions


Syntax Operation
call label Direct subroutine call.
Branch to address specified by label, and store return address in
register LR. Label is encoded as PC-relative signed immediate value.
callr Rs Indirect subroutine call.
Branch to address contained in general register Rs, and store return
address in register LR.
if ([!]Ps) call label Conditional call.
if ([!]Ps) callr Rs If predicate expression evaluates to true, perform subroutine call to
specified target address.

80-N2040-53 Rev. AA 114


Qualcomm Hexagon V73 Programmer’s Reference Manual Program Flow

8.3.3 Returns
Return instructions return from a subroutine. The instruction performs an indirect jump to the
subroutine return address stored in link register LR.
Returns are implemented as jump register indirect instructions, and support both unconditional
and conditional forms.

Table 8-7 Return instructions


Syntax Operation
jumpr LR Subroutine return.
Branch to subroutine return address contained in link register
LR.
if ([!]Ps) jumpr LR Conditional subroutine return.
If predicate expression evaluates to true, perform subroutine
return to specified target address.
dealloc_return Subroutine return with stack frame deallocate.
Perform deallocframe operation (Section 7.5) and then
perform subroutine return to the target address loaded by
deallocframe from the link register.
if ([!]Ps) Conditional subroutine return with stack frame deallocate.
dealloc_return If predicate expression evaluates to true, perform
deallocframe and subroutine return to the target address
loaded by deallocframe from the link register.

NOTE: The link register LR is an alias of general register R31. Therefore subroutine returns can be
performed with the instruction jumpr R31.

The conditional subroutine returns (including dealloc_return) can be specified as speculative


(Section 8.4).

80-N2040-53 Rev. AA 115


Qualcomm Hexagon V73 Programmer’s Reference Manual Program Flow

8.3.4 Extended branches


When a jump or call instruction specifies a PC-relative offset as the branch target, the offset value
is normally encoded in significantly less than 32 bits. This can limit the ability for programs to
specify “long” branches, which span a large range of the processor’s memory address space.
To support long branches, the jump and call instructions have special versions that encode a full
32-bit value as the PC-relative offset.

NOTE: Such instructions use an extra word to store the 32-bit offset (Section 10.9).

The size of a PC-relative branch offset is expressed in assembly language by optionally prefixing
the target label with the symbol “##” or “#”:
■ “##” specifies that the assembler must use a 32-bit offset
■ “#” specifies that the assembler must not use a 32-bit offset
■ No “#” specifies that the assembler use a 32-bit offset only if necessary
For example:
jump ##label // 32-bit offset
call #label // Non 32-bit offset
jump label // Offset size determined by assembler

8.3.5 Branches to and from packets


Instruction packets are atomic: even if they contain multiple instructions, they are referenced
only by the address of the first instruction in the packet. Therefore, branches to a packet can
target only the packet’s first instruction.
Packets can contain up to two branches (Section 8.7). The branch destination can target the
current packet or the beginning of another packet.
A branch does not interrupt the execution of the current packet: all of the instructions in the
packet execute, even if they appear in the assembly source after the branch instruction.
If a packet is at the end of a hardware loop, it cannot contain a branch instruction.

80-N2040-53 Rev. AA 116


Qualcomm Hexagon V73 Programmer’s Reference Manual Program Flow

8.4 Speculative jumps


Conditional instructions normally depend on predicates that are generated in a previous
instruction packet. However, Dot-new predicates enable conditional instructions to use a
predicate generated in the same packet that contains the conditional instruction.
When dot-new predicates are used with a conditional jump, the resulting instruction is called a
speculative jump. For example:
{
P0 = cmp.eq(R9,#16) // single-packet compare-and-jump
IF (P0.new) jumpr:t R11 // ... enabled by use of P0.new
}

Speculative jumps require the programmer to specify a direction hint in the jump instruction,
indicating whether the conditional jump is expected.
The hint initializes the dynamic branch predictor of the Hexagon processor. Whenever the
predictor is wrong, the speculative jump instruction takes two cycles to execute instead of one
(due to a pipeline stall).
Hints can improve program performance by indicating how speculative jumps are expected to
execute over the course of a program: the more often the specified hint indicates how the
instruction actually executes, the better the performance.
Hints are expressed in assembly language by appending the suffix “:t” or “:nt” to the jump
instruction symbol. For example:
■ jump:t – The jump instruction is most often taken
■ jump:nt – The jump instruction is most often not taken
In addition to dot-new predicates, speculative jumps also accept conditional arithmetic
expressions (=0, !=0, >=0, <=0) involving the general register Rs.

Table 8-8 Speculative jump instructions


Syntax Operation
if ([!]Ps.new) jump:t label Speculative direct jump.
if ([!]Ps.new) jump:nt label If predicate expression evaluates to true, jump to address
specified by label.
if ([!]Ps.new) jumpr:t Rs Speculative indirect jump.
if ([!]Ps.new) jumpr:nt Rs If predicate expression evaluates to true, jump to address in
register Rs.
if (Rs == #0) jump:t label Speculative direct jump.
if (Rs == #0) jump:nt label If predicate Rs = 0 is true, jump to address specified by label.
if (Rs != #0) jump:t label Speculative direct jump.
if (Rs != #0) jump:nt label If predicate Rs != 0 is true, jump to address specified by label.
if (Rs >= #0) jump:t label Speculative direct jump.
if (Rs >= #0) jump:nt label If predicate Rs >= 0 is true, jump to address specified by label.
if (Rs <= #0) jump:t label Speculative direct jump.
if (Rs <= #0) jump:nt label If predicate Rs <= 0 is true, jump to address specified by label.

80-N2040-53 Rev. AA 117


Qualcomm Hexagon V73 Programmer’s Reference Manual Program Flow

NOTE: The hints :t and :nt interact with the predicate value to determine the instruction cycle count.

Speculative indirect jumps are not supported with register Rs predicates.

8.5 Compare jumps


To reduce code size, the Hexagon processor supports a compound instruction that combines a
compare with a speculative jump in a single 32-bit instruction.
For example:
{
p0 = cmp.eq (R2,R5) // Single-instr compare-and-jump
if (p0.new) jump:nt target // Enabled by compound instr
}

The register operands used in a compare jump are limited to R0 through R7 or R16 through R23
(Table 10-3).
The compare and jump instructions that are used in a compare jump are limited to the
instructions listed in Table 8-9. The compare can use predicate P0 or P1, while the jump must
specify the same predicate that is set in the compare.
A compare jump instruction is expressed in assembly source as two independent compare and
jump instructions in a packet. The assembler translates the two instructions into a single
compound instruction.

Table 8-9 Compare jump instructions


Compare Instruction Jump Instruction
Pd = cmp.eq (Rs, Rt) IF (Pd.new) jump:t label
Pd = cmp.gt (Rs, Rt) IF (Pd.new) jump:nt label
Pd = cmp.gtu (Rs, Rt) IF (!Pd.new) jump:t label
Pd = cmp.eq (Rs,#U5) IF (!Pd.new) jump:nt label
Pd = cmp.gt (Rs,#U5)
Pd = cmp.gtu (Rs,#U5)
Pd = cmp.eq (Rs,#-1)
Pd = cmp.gt (Rs,#-1)
Pd = tstbit (Rs, #0)

8.5.1 New-value compare jumps


A compare jump instruction can access a register that is assigned a new value in the same
instruction packet (Section 3.3). This feature is expressed in assembly language by the following
changes:
■ Appending the suffix “.new” to the new-value register in the compare
■ Rewriting the compare jump so its constituent compare and jump operations appear as a
single conditional instruction
For example:
// load-compare-and-jump packet enabled by new-value compare jump

80-N2040-53 Rev. AA 118


Qualcomm Hexagon V73 Programmer’s Reference Manual Program Flow

{
R0 = memw(R2+#8)
if (cmp.eq(R0.new,#0)) jump:nt target
}

New-value compare jump instructions have the following restrictions:


■ They are limited to the instruction forms listed in Table 8-10.
■ They cannot be combined with another jump instruction in the same packet.
■ If an instruction produces a 64-bit result or performs a floating-point operation, its result
registers cannot be used as the new-value register.
■ If an instruction uses auto-increment or absolute-set addressing mode (Section 5.8), its
address register cannot be used as the new-value register.
■ If the instruction that sets a new-value register is conditional (Section 6.1.2), it must always
execute.
If the specified jump direction hint is wrong (Section 8.4), a new-value compare jump takes three
cycles to execute instead of one. While this penalty is one cycle longer than in a regular
speculative jump, the overall performance is still better than using a regular speculative jump
(which must execute an extra packet in all cases).

NOTE: New-value compare jump instructions are assigned to instruction class NV, which can execute
only in Slot 0. The instruction that assigns the new value must execute in Slot 1, 2, or 3.

Table 8-10 New-value compare jump instructions


if ([!]cmp.eq (Rs.new, Rt)) jump:[hint] label
if ([!]cmp.gt (Rs.new, Rt)) jump:[hint] label
if ([!]cmp.gtu (Rs.new, Rt)) jump:[hint] label

if ([!]cmp.gt (Rs, Rt.new)) jump:[hint] label


if ([!]cmp.gtu (Rs, Rt.new)) jump:[hint] label
if ([!]cmp.eq (Rs.new, #u5)) jump:[hint] label
if ([!]cmp.gt (Rs.new, #u5)) jump:[hint] label
if ([!]cmp.gtu (Rs.new ,#u5)) jump:[hint] label

if ([!]cmp.eq (Rs.new, #-1)) jump:[hint] label


if ([!]cmp.gt (Rs.new, #-1)) jump:[hint] label

if ([!]tstbit (Rs.new, #0)) jump:[hint] label

80-N2040-53 Rev. AA 119


Qualcomm Hexagon V73 Programmer’s Reference Manual Program Flow

8.6 Register transfer jumps


To reduce code size, the Hexagon processor supports a compound instruction that combines a
register transfer with an unconditional jump in a single 32-bit instruction.
For example:
{
jump target // Jump to label “target”
R1 = R2 // Assign contents of reg R2 to R1
}

The source and target register operands in the register transfer are limited to R0 through R7 or
R16 through R23 (Table 10-3).
The target address in the jump is a scaled 9-bit PC-relative address value (as opposed to the 22-bit
value in the regular unconditional jump instruction).
A register transfer jump instruction is expressed in assembly source as two independent
instructions in a packet. The assembler translates the instructions into a single compound
instruction.

Table 8-11 Register transfer jump instructions


Syntax Operation
jump label; Register transfer jump.
Rd=Rs
Perform register transfer and branch to address specified by label.
Label is encoded as PC-relative 9-bit signed immediate value.
jump label; Register transfer immediate jump.
Rd=#u6
Perform register transfer (of 6-bit unsigned immediate value) and
branch to address specified by label.
Label is encoded as PC-relative 9-bit signed immediate value.

80-N2040-53 Rev. AA 120


Qualcomm Hexagon V73 Programmer’s Reference Manual Program Flow

8.7 Dual jumps


Two software branch instructions (referred to here as “jumps”) can appear in the same
instruction packet, under the conditions listed in Table 8-12.
The first jump is defined as the jump instruction at the lower address, and the second jump as the
jump instruction at the higher address.
Unlike most packetized operations, dual jumps are not executed in parallel (Section 3.3.1).
Instead, the two jumps are processed in a well-defined order in a packet:
1. The predicate in the first jump is evaluated.
2. If the first jump is taken, the second jump is ignored.
3. If the first jump is not taken, the second jump is performed.

Table 8-12 Dual jump instructions


First jump Second jump
Instruction Description
in packet? in packet?
jump Direct jump No Yes
if ([!]Ps[.new]) jump Conditional jump Yes Yes
call Direct calls No Yes
if ([!]Ps) call
Pd=cmp.xx ; if ([!]Pd.new) Compare jump Yes Yes
jump
if ([!]cmp.xx(Rs.new, Rt)) New-value compare jump No No
jump
jumpr Indirect jumps No No
if ([!]Ps[.new]) jumpr Indirect calls
callr dealloc_return
if ([!]Ps) callr
dealloc_return
if ([!]Ps[.new])
dealloc_return
endloopN Hardware loop end No No

NOTE: If a call is ignored in a dual jump, the link register LR is not changed.

8.8 Hint indirect jump target


Because it obtains the jump target address from a register, the jumpr instruction (Section 8.3.1)
normally causes the processor to stall for one cycle.
To avoid the stall penalty caused by a jumpr instruction, the Hexagon processor supports the
hintjr jump hint instruction, which can be specified before the jumpr instruction.

80-N2040-53 Rev. AA 121


Qualcomm Hexagon V73 Programmer’s Reference Manual Program Flow

The hintjr instruction indicates that the program is about to execute a jumpr to the address
contained in the specified register.

Table 8-13 Speculative jump hint instruction


Syntax Operation
hintjr(Rs) Informs the processor that the jumpr(Rs) instruction is about to be
performed.

NOTE: To prevent a stall, the hintjr instruction must execute at least 2 packets before the corresponding
jumpr instruction.

The hintjr instruction is not needed for jumpr instructions used as returns (Section 8.3.3),
because in this case the Hexagon processor automatically predicts the jump targets based on the
most recent nested call instructions.

8.9 Pauses
Pauses suspend the execution of a program for a period of time, and put it into low-power mode.
The program remains suspended for the duration specified in the instruction.
The pause instruction accepts an unsigned 8-bit immediate operand which specifies the pause
duration in terms of cycles. The maximum possible duration is 263 cycles (255+8).
Hexagon processor interrupts cause a program to exit the paused state before its specified
duration has elapsed.
The pause instruction is useful for implementing user-level low-power synchronization operations
(such as spin locks).

Table 8-14 Pause instruction


Syntax Operation
pause(#u8) Suspend program in low-power mode for specified cycle
duration.

8.10 Exceptions
Exceptions are internally-generated disruptions to the program flow.
The Hexagon processor OS handles fatal exceptions by terminating the execution of the
application system. The user is responsible for fixing the problem and recompiling their
applications.
The error messages generated by exceptions include the following information to assist in
locating the problem:
■ Cause code – Hexadecimal value indicating the type of exception
■ User IP – PC value indicating the instruction executed when the exception occurred
■ Bad VA – Virtual address indicating the data accessed when the exception occurred

80-N2040-53 Rev. AA 122


Qualcomm Hexagon V73 Programmer’s Reference Manual Program Flow

NOTE: The cause code, user IP, and Bad VA values are stored in the Hexagon processor system control
registers SSR[7:0], ELR, and BADVA respectively.

If multiple exceptions occur simultaneously, the exception with the lowest error code value has
the highest exception priority.
If a packet contains multiple loads, or a load and a store, and both operations have an exception
of any type, all slot 1 exceptions process before any slot 0 exception is processed.

Table 8-15 V73 exceptions


Cause
Event type Event description Notes
code
0x0 Reset Software thread reset. Non-maskable,
highest priority
0x01 Precise, Unrecoverable BIU error (bus error, timeout, L2 Non-maskable
unrecoverable parity error, and so on).
0x03 Precise, Double exception (exception occurs while Non-maskable
unrecoverable SSR[EX]=1).
0x11 Precise Privilege violation: User/Guest mode execute to Non-maskable
page with no execute permissions (X=0).
0x12 Precise Privilege violation: User mode execute to a Non-maskable
page with no user permissions (X=1, U=0).
0x15 Precise Invalid packet. Non-maskable
0x16 Precise Illegal execution of coprocessor instruction. Non-maskable
0x17 Precise Instruction cache error. Non-maskable
0x1A Precise Privilege violation: Executing a guest mode Non-maskable
instruction in user mode.
0x1B Precise Privilege violation: Executing a supervisor Non-maskable
instruction in User/Guest mode.
0x1D Precise, Packet with multiple writes to the same Non-maskable
unrecoverable destination register.
0x1E Precise, Program counter values that are not properly Non-maskable
unrecoverable aligned.
0x20 Precise Load to misaligned address. Non-maskable
0x21 Precise Store to misaligned address. Non-maskable
0x22 Precise Privilege violation: User/Guest mode read to Non-maskable
page with no read permission (R=0).
0x23 Precise Privilege violation: User/Guest mode write to Non-maskable
page with no write permissions (W=0).
0x24 Precise Privilege violation: User mode read to page with Non-maskable
no user permission (R=1, U=0).
0x25 Precise Privilege violation: User mode write to page with Non-maskable
no user permissions (W=1, U=0).
0x26 Precise Coprocessor VMEM address error. Non-maskable
0x27 Precise Stack overflow: Allocframe instruction exceeded Non-maskable,
FRAMELIMIT.

80-N2040-53 Rev. AA 123


Qualcomm Hexagon V73 Programmer’s Reference Manual Program Flow

Table 8-15 V73 exceptions (cont.)


Cause
Event type Event description Notes
code
0x42 Imprecise Data abort. Non-maskable
0x43 Imprecise NMI. Non-maskable
0x44 Imprecise Multiple TLB match. Non-maskable
0x45 Imprecise Livelock exception. Non-maskable
0x60 TLB miss-X Missing fetch address on PC-page. Non-maskable
0x61 TLB miss-X Missing fetch on second page from packet that Non-maskable
spans pages.
0x62 TLB miss-X icinva. Non-maskable
Reserved
0x70 TLB miss-RW Memory read. Non-maskable
0x71 TLB miss-RW Memory write. Non-maskable
Reserved
#u8 Trap0 Software Trap0 instruction. Non-maskable
#u8 Trap1 Software Trap1 instruction. Non-maskable
Reserved
0x80 Debug Single-step debug exception.
Reserved
0xBF Floating-point Execution of floating-point instruction resulted in Non-maskable
exception.
0xC0 Interrupt0 General external interrupt. Maskable, highest
priority general
interrupt
0xC1 Interrupt 1 General external interrupt Maskable

0xC2 Interrupt 2 General external interrupt VIC0 interface


0xC3 Interrupt 3 General external interrupt VIC1 interface
0xC4 Interrupt 4 General external interrupt VIC2 interface
0xC5 Interrupt 5 General external interrupt VIC3 interface
0xC6 Interrupt 6 General external interrupt
0xC7 Interrupt 7 General external interrupt Lowest-priority
interrupt

80-N2040-53 Rev. AA 124


9 PMU Events

The Hexagon processor can collect execution statistics on the applications it executes. The
statistics summarize the various types of Hexagon processor events that occurred while the
application was running.
Execution statistics are collected in hardware or software:
■ Statistics are collected in hardware with the performance monitor unit (PMU), which is
defined as part of the Hexagon processor architecture.
■ Statistics are collected in software using the Hexagon simulator. The simulator statistics are
presented in the same format used by the PMU.
Execution statistics are expressed in terms of processor events. This chapter defines the event
symbols, along with their associated numeric codes.

NOTE: Because the types of execution events vary across the Hexagon processor versions, different
types of statistics are collected for each version. This chapter lists the event symbols defined for
version V73.

9.1 V73 processor event symbols


Table 9-1 defines the symbols that represent processor events for the V73 Hexagon processor.

Table 9-1 V73 processor events symbols


Event Symbol Definition
0x1 COUNTER0_OVERFLOW Event detected by counter1 to build an effective 64-bit counter.
0x2 COUNTER2_OVERFLOW Event detected by counter3 to build an effective 64-bit counter.
0x3 COMMITTED_PKT_ANY Number of packets that are committed by any thread. Packets
are executed.
0x4 COMMITTED_PKT_BSB Number of packets that are committed two cycles after an ear-
lier packet in the same thread.
0x5 COUNTER4_OVERFLOW Event detected by counter5 to build an effective 64-bit counter.
0x6 COUNTER6_OVERFLOW Event detected by counter7 to build an effective 64-bit counter.
0x7 COMMITTED_PKT_B2B Number of packets that are committed one cycle after the ear-
lier packet in the same thread.
0x8 COMMITTED_PKT_SMT Number of packets that are committed on the SMT threads. In-
cludes the second, third, and fourth packets that are commit-
ted in one cycle.
0xa CYCLES_5_THREAD_RUNNING Processor cycles that exactly five threads are running. Run-
ning means the threads are not in the Wait or Stop state.

80-N2040-53 Rev. AA 125


Qualcomm Hexagon V73 Programmer’s Reference Manual PMU Events

Table 9-1 V73 processor events symbols


Event Symbol Definition
0xb CYCLES_6_THREAD_RUNNING Processor cycles that exactly six threads are running. Running
means the threads are not in the Wait or Stop state.
0xc COMMITTED_PKT_T0 Number of packets that are committed by thread 0. Packets
are executed.
0xd COMMITTED_PKT_T1 Number of packets that are committed by thread 1. Packets
are executed.
0xe COMMITTED_PKT_T2 Number of packets that are committed by thread 2. Packets
are executed.
0xf COMMITTED_PKT_T3 Number of packets that are committed by thread 3. Packets
are executed.
0x10 COMMITTED_PKT_T4 Number of packets that are committed by thread 4. Packets
are executed.
0x11 COMMITTED_PKT_T5 Number of packets that are committed by thread 5. Packets
are executed.
0x12 ICACHE_DEMAND_MISS Number of I-cache cacheable demand primary or secondary
misses. Includes secondary misses.
0x13 DCACHE_DEMAND_MISS Number of D-cache cacheable demand primary or secondary
misses. Includes dczero stalls. Excludes uncacheables,
prefetches, and no-allocate store misses.
0x14 DCACHE_STORE_MISS Number of D-cache cacheable store misses.
0x15 COMMITTED_PKT_T6 Thread 6 committed a packet. Packets are executed.
0x16 COMMITTED_PKT_T7 Thread 7 committed a packet. Packets are executed.
0x17 CU_PKT_READY_NOT_DISPATCHED Packets were ready at the CU scheduler but were not sched-
uled because either the scheduler’s thread was not picked or
there was an inter-cluster resource conflict.
0x18 COMMITTED_PKT_5_THREAD_RUNNING Number of committed packets with five threads running. Run-
ning means the threads are not in Wait or Stop mode.
0x19 COMMITTED_PKT_6_THREAD_RUNNING Number of committed packets with six threads running. Run-
ning means the threads are not in Wait or Stop mode.
0x1a COMMITTED_PKT_7_THREAD_RUNNING Number of committed packets with seven threads running.
Running means the threads are not in Wait or Stop mode.
0x1b COMMITTED_PKT_8_THREAD_RUNNING Number of committed packets with eight threads running. Run-
ning means the threads are not in Wait or Stop mode.
0x1c IU_L1S_ACCESS Number of IU L1S loads. Includes demands or prefetches.
0x1d IU_L1S_PREFETCH Number of IU L1S prefetches.
0x1e IU_L1S_AXIS_STALL Number of IU L1S stalls due to an AXI slave.
0x1f IU_L1S_NO_GRANT IU request to L1S, and no grant from the vector unit.
0x20 ANY_IU_REPLAY Any IU stall other than an I-cache miss. Includes a jump regis-
ter stall, fetchcross stall, ITLB miss stall, and so on. Excludes
a CU replay.
0x21 ANY_DU_REPLAY Any DU replay. Includes a bank conflict, store buffer full, and
so on. Excludes a stall due to a cache miss.

80-N2040-53 Rev. AA 126


Qualcomm Hexagon V73 Programmer’s Reference Manual PMU Events

Table 9-1 V73 processor events symbols


Event Symbol Definition
0x22 CYCLES_7_THREAD_RUNNING Processor cycles that exactly seven threads are running. Run-
ning means the threads are not in Wait or Stop mode.
0x23 ISSUED_PACKETS Speculatively issued packets were delivered from an IU.
0x24 LOOPCACHE_PACKETS Committed packets were cloned from the packet queue during
a pinned hardware loop.
0x25 COMMITTED_PKT_1_THREAD_RUNNING Number of committed packets with one thread running. Run-
ning means the thread is not in Wait or Stop mode.
0x26 COMMITTED_PKT_2_THREAD_RUNNING Number of committed packets with two threads running. Run-
ning means the threads are not in Wait or Stop mode.
0x27 COMMITTED_PKT_3_THREAD_RUNNING Number of committed packets with three threads running.
Running means the threads are not in Wait or Stop mode.
0x28 THREAD_LMH_THROTTLE For a specific thread, the sustained power exceeds the limits
management threshold and limits budget threshold. Results in
throttling that is based on thread priority.
0x29 LMH_THROTTLE Throttling is based on the value of the peak current that is over
the current limits of LMH.
0x2a COMMITTED_INSTS Number of committed instructions. Increments by up to eight
per cycle. Duplex of two instructions counts as two instruc-
tions. Does not include end loops.
0x2b COMMITTED_TC1_INSTS Number of committed TC1 class instructions. Increments by
up to eight per cycle. Duplex of two TC1 instructions counts as
two separate TC1 instructions. Does not include NOP instruc-
tions.
0x2c COMMITTED_PRIVATE_INSTS Number of committed instructions that have per-cluster (pri-
vate) execution resources. Increments by up to eight per cycle.
Duplex of two private instructions counts as two private in-
structions.
0x2d GLOBAL_POWERLIMITS_OVER Sustained global power that exceeds the overall global limits
management threshold and limits the budget threshold. Caus-
es the thread-specific LMH to engage.
0x2e CYCLES_8_THREAD_RUNNING Processor cycles that exactly eight threads are running. Run-
ning means the threads are not in Wait or Stop mode.
0x2f COMMITTED_PKT_4_THREAD_RUNNING Number of committed packets with four threads running. Run-
ning means the threads are not in Stop or Wait mode.
0x30 COMMITTED_LOADS Number of committed load instructions. Includes cached and
uncached. Increments by two for dual loads. Excludes
prefetches, memory operations, and coprocessor loads.
0x31 COMMITTED_STORES Number of committed store instructions. Includes cached and
uncached. Increments by two for dual stores. Excludes mem-
ory operations and coprocessor stores.
0x32 COMMITTED_MEMOPS Number of committed memory operations instructions.
Cached or uncached.
0x33 COMMITTED_NOPS Number of committed NOPs.
0x34 ISSUED_INSTS Speculatively issued instructions delivered from the IU.

80-N2040-53 Rev. AA 127


Qualcomm Hexagon V73 Programmer’s Reference Manual PMU Events

Table 9-1 V73 processor events symbols


Event Symbol Definition
0x35 DISPATCHED_PACKETS Number of packets that the CU dispatched. NOP instructions
are squashed.
0x36 DISPATCHED_INSTS Number of instructions that the CU dispatched.
0x37 COMMITTED_PROGRAM_FLOW_INSTS Number of committed packets that contain a program flow in-
struction. Includes CR jumps, endloop, J, JR, dealloc_return,
system/trap, superset of event 56. Dual jumps count as two
jumps.
0x38 COMMITTED_PKT_CHANGED_FLOW Number of committed packets that resulted in a change of
flow. Any taken jump. Includes endloop and dealloc_return.
0x39 COMMITTED_PKT_ENDLOOP Number of committed packets containing an end loop that was
taken.
0x3a PST_USED_P0P1BUSY Number of times a store port was used when p0 and p1 were
both occupied. Only increments when store port is present.
0x3b CYCLES_1_THREAD_RUNNING Processor cycles that exactly one thread is running. Running
means the thread is not in Wait or Stop mode.
0x3c CYCLES_2_THREAD_RUNNING Processor cycles that exactly two threads are running. Run-
ning means the threads are not in Wait or Stop mode.
0x3d CYCLES_3_THREAD_RUNNING Processor cycles that exactly three threads are running. Run-
ning means the threads are not in Wait or Stop mode.
0x3e CYCLES_4_THREAD_RUNNING Processor cycles that exactly four threads are running. Run-
ning means the threads are not in Wait or Stop mode.
0x3f AXI_LINE128_READ_REQUEST Number of 128-byte line read requests issued by the primary
AXI master. Includes all interleaved requests.
0x40 AXI_READ_REQUEST All read requests issued by the primary AXI master. Includes
full lines, partial lines, and all interleaved requests.
0x41 AXI_LINE32_READ_REQUEST Number of 32-byte line read requests issued by the primary
AXI master. Includes all interleaved requests.
0x42 AXI_WRITE_REQUEST All write requests issued by the primary AXI master. Includes
full lines, partial lines, and all interleaved requests.
0x43 AXI_LINE32_WRITE_REQUEST Number of 32-byte line write requests issued by the primary
AXI master. Includes all interleaved requests. All bytes are val-
id.
0x44 AHB_READ_REQUEST Number of read requests issued by the AHB master.
0x45 AHB_WRITE_REQUEST Number of write requests issued by the AHB master.
0x46 AXI_LINE128_WRITE_REQUEST Number of 128-byte line write requests issued by the primary
AXI master. Includes all interleaved requests. All bytes are val-
id.
0x47 AXI_SLAVE_MULTI_BEAT_ACCESS Number of AXI slave multi-beat accesses.
0x48 AXI_SLAVE_SINGLE_BEAT_ACCESS Number of AXI slave single-beat accesses.
0x49 AXI2_READ_REQUEST All read requests issued by the secondary AXI master. In-
cludes full lines and partial lines.
0x4a AXI2_LINE32_READ_REQUEST Number of 32-byte line read requests issued by the secondary
AXI master.

80-N2040-53 Rev. AA 128


Qualcomm Hexagon V73 Programmer’s Reference Manual PMU Events

Table 9-1 V73 processor events symbols


Event Symbol Definition
0x4b AXI2_WRITE_REQUEST All write requests issued by the secondary AXI master. In-
cludes full lines and partial lines.
0x4c AXI2_LINE32_WRITE_REQUEST Number of 32-byte line write requests issued by the secondary
AXI master.
0x4d AXI2_CONGESTION Secondary AXI command or data queue is full. An operation is
stuck at the head of the secondary AXI master command
queue.
0x50 COMMITTED_FPS Number of committed floating point instructions. Increments
by two for dual floating-point operations. Excludes conver-
sions.
0x51 REDIRECT_BIMODAL_MISPREDICT Mispredicted bimodal branch direction caused a control flow
redirect.
0x52 REDIRECT_TARGET_MISPREDICT Mispredicted branch target caused a control flow redirect. In-
cludes an RAS mispredict, and HintJR mispredict. Excludes
indirect jumps and calls other than JUMPR R31 returns. Ex-
cludes direction mispredicts.
0x53 REDIRECT_LOOP_MISPREDICT Mispredicted hardware loop end caused a control flow redi-
rect. Can only happen when the loop has few packets and the
loop count is 2 or less.
0x54 REDIRECT_MISC Control flow is redirected for a reason other than events 81,
82, and 83. Includes exceptions, traps, interrupts, non-R31
jumps, multiple initialization loops in flight, and so on.
0x55 AXI_LINE256_WRITE_REQUEST Number of 256-byte line write requests issued by the AXI mas-
ter. All bytes are valid.
0x56 NUM_PACKET_CRACKED Number of packets that cracked.
0x58 JTLB_MISS Instruction or data address translation request was missed in
the JTLB.
0x5a COMMITTED_PKT_RETURN Number of committed return instructions. Includes canceled
returns.
0x5b COMMITTED_PKT_INDIRECT_JUMP Number of committed indirect jumps or call instructions. In-
cludes canceled instructions. Does not include JUMPR R31
returns.
0x5c COMMITTED_BIMODAL_BRANCH_INSTS Number of committed bimodal branches. Includes *.old and
*.new. Increments by two for dual jumps.
0x5f VTCM_FIFO_FULL_CYCLES Cycles cluster can be issued if the VTCM FIFO queue is full.
0x60 DU_STORE_BUFFER_COALESCED Number of times an incoming store was coalesced into an ex-
isting valid store buffer entry. Each valid store buffer entry is
dword-aligned.
0x61 DU_L1S_LOAD_ACCESS Number of scalar load accesses to L1S.
0x62 ICACHE_ACCESS Number of I-cacheline fetches.
0x63 BTB_HIT Number of branch target buffer hits.
0x64 BTB_MISS Number of branch target buffer misses.
0x65 IU_DEMAND_SECONDARY_MISS Number of I-cache secondary misses.

80-N2040-53 Rev. AA 129


Qualcomm Hexagon V73 Programmer’s Reference Manual PMU Events

Table 9-1 V73 processor events symbols


Event Symbol Definition
0x67 FAST_FETCH_KILLED Number of fast fetches that were killed (after an I-cache ac-
cess).
0x69 FETCHED_PACKETS_DROPPED Number of packets that are dropped because the IU cannot
deliver them to the CU.
0x6b IU_PREFETCHES_SENT_TO_L2 Number of IU prefetches sent to the L2 cache. Includes cache-
lines not dropped by the L2 cache. Excludes replayed
prefetches and only counts prefetches the L2 accepts. Ex-
cludes IU prefetches that are sent to L2 ITCM.
0x6c ITLB_MISS Number of ITLB misses that go to JTLB.
0x72 FETCH_2_CYCLE Number of two-cycle fetches in an IU (returns, loop end, fall
through, BTB).
0x73 FETCH_3_CYCLE Number of three-cycle fetches in an IU.
0x75 L2_IU_SECONDARY_MISS Number of L2 secondary misses from an IU.
0x76 L2_IU_ACCESS Number of L2 cacheable access from an IU. Includes any ac-
cess to the L2 cache that was the result of an IU command, ei-
ther demand or L1 prefetch access. Excludes any prefetches
generated in the L2 cache. Excludes L2fetch, TCM accesses,
and uncacheables. Address must target the primary AXI mas-
ter.
0x77 L2_IU_MISS Number of L2 misses from an IU. Of the events qualified by
0x76, the event that resulted in an L2 miss (demand miss or
L1 prefetch miss). An L2 miss is any condition that prevents
the immediate return of data to the IU, excluding pipeline con-
flicts.
0x78 L2_IU_PREFETCH_ACCESS Number of prefetches from an IU to the L2 cache. Any IU
prefetch access sent to the L2 cache. Access must be L2 ca-
cheable and target the primary AXI. Does not include L2 fetch-
generated accesses.
0x79 L2_IU_PREFETCH_MISS Number of L2 misses that were IU prefetches. Of the events
qualified by 0x78, the events that resulted in an L2 miss.
0x7c L2_DU_READ_ACCESS Number of L2 cacheable read accesses from a DU. Any read
access from the DU that might cause a lookup in the L2 cache.
Includes loads, L1 prefetches, dcfetches. Excludes the initial
L2fetch command, uncacheables, TCM accesses, and copro-
cessor loads. Must target the primary AXI master.
0x7d L2_DU_READ_MISS Number of L2 read misses from a DU. Of the events qualified
by 0x7C, any event that resulted in an L2 miss (that is, the line
was not previously allocated in the L2 cache and will be
fetched from the backing memory).
0x7e L2FETCH_ACCESS Number of L2 fetch accesses from a DU. Any access to the L2
cache from the L2 prefetch engine that was initiated by pro-
gramming the L2Fetch engine.
0x7f L2FETCH_MISS Number of L2 fetch misses from a programmed inquiry. Of the
events qualified by 0x7E, the event that resulted in an L2 miss
(that is, the line was not previously allocated in the L2 cache
and will be fetched from the backing memory).

80-N2040-53 Rev. AA 130


Qualcomm Hexagon V73 Programmer’s Reference Manual PMU Events

Table 9-1 V73 processor events symbols


Event Symbol Definition
0x81 L2_ACCESS All requests to the L2 cache. Does not include internally gen-
erated accesses like L2 fetch, however the programming of
the L2Fetch engine is counted. All accesses to odd interleave
or even interleave are counted. Can be L2 cacheable or TCM.
0x82 L2_PIPE_CONFLICT_STALL Request is not taken by the L2 cache due to a pipe conflict.
The conflict can be a tag bank, data bank, or other pipeline
conflict.
0x83 L2_TAG_ARRAY_CONFLICT Of the items in event 130, the items caused by a conflict with
the tag array.
0x87 TCM_DU_ACCESS Number of TCM accesses from a DU. DU access to the L2
TCM space. Excludes HVX requests.
0x88 TCM_DU_READ_ACCESS Number of TCM read accesses from a DU. DU read access to
the L2 TCM space. Includes HVX requests.
0x89 TCM_IU_ACCESS Number of TCM accesses from an IU. IU access to the L2
TCM space.
0x8a L2_CASTOUT L2 cache evicts a dirty line due to an allocation. This event is
not triggered on cache operations.
0x8b L2_DU_STORE_ACCESS Number of L2 cacheable store access from a DU. Any store
access from the DU that might cause a lookup in the L2 cache.
Excludes cache operations, uncacheables, TCM, and copro-
cessor stores. Must target the primary AXI master.
0x8c L2_DU_STORE_MISS Number of L2 misses from a DU. Of the events qualified by
0x8B, the events that resulted in a miss. Specifically, the cases
where the line is not in the cache or a coalesce buffer.
0x8d L2_DU_PREFETCH_ACCESS Number of L2 prefetch accesses from a DU. Of the events
qualified by 0x7C, the events that are dcfetch and
dhwprefetch. These L2 cacheable events target the primary
AXI master.
0x8e L2_DU_PREFETCH_MISS Number of L2 prefetch misses from a DU. Of the events qual-
ified by 0x8D, the events that missed the L2 cache.
0x90 L2_DU_LOAD_SECONDARY_MISS Number of L2 load secondary misses from a DU. Hit a busy
line in the scoreboard, which prevented a return. A busy con-
dition can include pipeline bubbles caused by back-to-back
loads, like L1 UC loads.
0x91 L2FETCH_COMMAND Number of L2fetch commands. Excludes L2 fetch stop com-
mands.
0x92 L2FETCH_COMMAND_KILLED L2 fetch command was killed because a stop command was
issued. Increments once for each L2 fetch command that is
killed. If multiple commands are queued to the L2Fetch en-
gine, the kill of each command will be recorded.
0x93 L2FETCH_COMMAND_OVERWRITE L2 fetch command was overwritten. Kills an old L2 fetch com-
mand and replaces it with a new command.
0x94 L2FETCH_ACCESS_CREDIT_FAIL L2 fetch access cannot get a credit. L2 fetch is blocked due to
a missing L2 fetch or L2 evict credit.
0x95 AXI_SLAVE_READ_BUSY AXI slave read access hit a busy line.
0x96 AXI_SLAVE_WRITE_BUSY AXI slave write access hit a busy line.

80-N2040-53 Rev. AA 131


Qualcomm Hexagon V73 Programmer’s Reference Manual PMU Events

Table 9-1 V73 processor events symbols


Event Symbol Definition
0x97 L2_ACCESS_EVEN Of the events in 0x81, number of accesses made to the even
L2 cache.
0x98 CLADE_HIGH_PRIO_L2_ACCESS Number of IU or DU requests for a high-priority CLADE region.
Not counted for an L2 fetch.
0x99 CLADE_LOW_PRIO_L2_ACCESS Number of IU or DU requests for a low-priority CLADE region.
Not counted for an L2 fetch.
0x9a CLADE_HIGH_PRIO_L2_MISS Number of CLADE high-priority L2 accesses that missed in the
L2 cache.
0x9b CLADE_LOW_PRIO_L2_MISS Number of CLADE low-priority L2 accesses that missed in the
L2 cache.
0x9c CLADE_HIGH_PRIO_EXCEPTION CLADE high-priority decode that had an exception.
0x9d CLADE_LOW_PRIO_EXCEPTION CLADE low-priority decode that had an exception.
0x9e AXI2_SLAVE_READ_BUSY AXI secondary slave read access hit a busy line.
0x9f AXI2_SLAVE_WRITE_BUSY AXI secondary slave write access hit a busy line.
0xa0 ANY_DU_STALL Any DU stall. Increments once when the thread has a DU stall
(D-cache miss or DTLB miss).
0xa1 DU_BANK_CONFLICT_REPLAY DU bank conflict replay. Dual memory access to same bank,
but different lines.
0xa2 DU_CREDIT_REPLAY Number of times a packet took a replay because insufficient
QoS DU credits were available.
0xa3 L2_FIFO_FULL_REPLAY Number of L2 even or odd FIFO full replays.
0xa4 DU_STORE_BUFFER_FULL_REPLAY Number of DU replays because a demand load access hit in
the store buffer.
0xa7 DU_SNOOP_REQUEST Number of DU snoop requests that were accepted.
0xa8 DU_FILL_REPLAY Fill has an index conflict with an instruction from the same
thread in the pipeline. Fills and demands might be from differ-
ent threads if there is a prefetch from the deferral queue, or if
a fill has not be acknowledged for very long and forces itself
into the pipeline.
0xa9 PST_3STORETYPE_SBCONF_REPLAY Number of times a packet on the lower priority cluster had to
replay because one cluster had a dual store and the other
cluster had a single store or a memop. Only increments when
store port is present.
0xac DU_READ_TO_L2 Number of DU reads to L2 cache. Total of everything that
brings data from the L2 array. Includes prefetches (dcfetch
and hwprefetch). Excludes coprocessor loads.
0xad DU_WRITE_TO_L2 Number of DU writes to L2 cache. Total of everything that is
written out of the DU to the L2 array. Includes dczeroa. Ex-
cludes dcclean, dccleaninv, tag writes, and coprocessor
stores.
0xae PST_3LDST_L2FIFOCONF_REPLAY Number of times a packet on lower priority cluster had to re-
play because one cluster had a dual uncacheable loads and
the other cluster had a single store or the one cluster had a
dual store and other cluster had a load miss. Only increments
when store port is present.

80-N2040-53 Rev. AA 132


Qualcomm Hexagon V73 Programmer’s Reference Manual PMU Events

Table 9-1 V73 processor events symbols


Event Symbol Definition
0xaf DCZERO_COMMITTED Dczeroa instruction was committed.
0xb0 L2ITCM_IU_READ Number of ITCM accesses from an IU. Includes IU demand
fetches and IU prefetches. This event is not included in any
other L2 events.
0xb1 L2ITCM_DU_READ Number of L2 ITCM read accesses from a DU. Includes all de-
mands and prefetches. This event is not included in any other
L2 events.
0xb2 L2ITCM_DU_WRITE Number of L2 ITCM write accesses from a DU. Includes stores
and dczeroa events. Does not include any cache operations.
This event is not included in any other L2 events.
0xb3 DTLB_MISS DTLB miss that goes to JTLB. When both slots miss to differ-
ent pages, increments by two. When both slots miss to the
same page, only counts S1 because S1 goes first and fills for
S0.
0xb4 L2ITCM_BIMODAL_WRITES_SUCCESS Number of successful bimodal writes into L2 ITCM.
0xb6 STORE_BUFFER_HIT_REPLAY Store buffer hit is replayed because a packet with two stores is
going to the same bank but different cachelines, followed by a
load from an address that was pushed into the store buffer.
0xb7 STORE_BUFFER_FORCE_REPLAY Store buffer must drain, forcing the current packet to replay.
Typically occurs on a cache index match between the current
packet and store buffer. Can also a store buffer timeout.
0xb8 TAG_WRITE_CONFLICT_REPLAY Number of inter-cluster tag write conflicts.
0xb9 SMT_BANK_CONFLICT Number of inter-thread SMT bank conflicts.
0xba PORT_CONFLICT_REPLAY Number of all port conflict replays, including the same cluster
replays caused by high-priority fills and store buffer force
drains. Includes inter-cluster replays.
0xbb L2ITCM_BIMODAL_WRITES_DROPPED Number of bimodal writes into L2 ITCM that were dropped.
0xbc L2ITCM_IU_PREFETCH_READ Number of ITCM accesses from IU prefetches. Includes only
prefetches from an IU. This event is included in event 176. It is
not included in any other L2 events.
0xbd PAGE_CROSS_REPLAY Page cross from a valid packet that caused a replay. Excludes
pdkill packets. Counts twice if both slots cause a page cross.
0xbe PST_STORE_SENTON_OTHPORT Number of times a slot 0 store was sent to the other store buf-
fer because the other cluster had a slot 1 store or memop. Only
increments when store port is present.
0xbf DU_DEMAND_SECONDARY_MISS Number of DU demand secondary misses.
0xc0 DU_MISC_REPLAY All DU replays not counted by other replay events. This event
counts every time ANY_DU_REPLAY counts and no other DU
replay event counts.
0xc1 GUARDBUF_SETMATCH_CRACKING_RE- Number of replays taken by a younger access due to an index
PLAY match with a guard buffer entry. If the younger access hits in
D$, only the guard buffer entry of the other thread is checked.
Otherwise, an index match with either thread’s guard buffer
entry results in this replay.

80-N2040-53 Rev. AA 133


Qualcomm Hexagon V73 Programmer’s Reference Manual PMU Events

Table 9-1 V73 processor events symbols


Event Symbol Definition
0xc2 DU_STATE_REPLAY Number of times an access replayed because the access one
cycle ahead of it was allocating or invalidating a way in the
same index.
0xc3 DCFETCH_COMMITTED Number of dcfetches that were committed. Includes hits and
drops. Does not include convert-to-prefetches.
0xc4 DCFETCH_HIT Number of dcfetch hits in D-cache. Includes hitting valid or re-
served lines.
0xc5 DCFETCH_MISS Number of dcfetches missed in L1 cache. Counts the dcfetch-
es issued to L2 FIFO.
0xc6 DCACHE_EVICTION_IN_PIPE_REPLAY Number of replays taken by a packet because an eviction in
progress is occupying the pipe.
0xc7 STBUF_MATCH_PARTIAL_CRACK_RE- Number of replays taken by a partial crack store due to a
PLAY dword match with an existing store buffer entry.
0xc8 DU_LOAD_UNCACHEABLE Load instructions with addresses uncacheable in the L1
cache
0xc9 DU_DUAL_LOAD_UNCACHEABLE Packets where both loads have addresses uncache-
able in the L1 cache
0xca DU_STORE_UNCACHEABLE Store instructions with addresses uncacheble in the L1 cache
0xcb DU_STORE_RELEASE_CREDIT_STALL Stall occurs because there are not enough credits from the
store release.
0xcd AXI_LINE256_READ_REQUEST Number of 256-byte line read requests issued by the AXI mas-
ter. All bytes are valid.
0xce AXI_LINE64_READ_REQUEST Number of 64-byte line read requests issued by the primary
AXI master. Includes all interleaved requests.
0xcf AXI_LINE64_WRITE_REQUEST Number of 64-byte line write requests issued by the primary
AXI master. Includes all interleaved requests. All bytes are val-
id.
0xd1 AHB_8_READ_REQUEST Number of 8-byte read requests issued by the AHB.
0xd3 L2FETCH_COMMAND_PAGE_TERMINA- L2fetch command terminated because it cannot get a page
TION translation from VA to PA. Includes terminations due to per-
mission errors. That is, an address translation can fail because
the VA to PA is not in the TLB, or the properties in the transla-
tion are not acceptable and the command terminates.
0xd5 L2_DU_STORE_COALESCE Number of events from 139 that were coalesced
0xd6 L2_STORE_LINK Number of times a new store links to something else in the
scoreboard.
0xd7 L2_SCOREBOARD_70_PERCENT_FULL Increments by one for every cycle where the L2 scoreboard is
at least 70% full. For a 32-entry scoreboard, 23 or more entries
are consumed. This event continues to count even if the score-
board is more than 80% full. For more than one interleave, this
event considers only the scoreboard that has the most entries
consumed.

80-N2040-53 Rev. AA 134


Qualcomm Hexagon V73 Programmer’s Reference Manual PMU Events

Table 9-1 V73 processor events symbols


Event Symbol Definition
0xd8 L2_SCOREBOARD_80_PERCENT_FULL Increments by one for every cycle where the L2 scoreboard is
at least 80% full. For a 32-entry scoreboard, 26 or more entries
are consumed. This event continues to count even if the score-
board is more than 90% full. For more than one interleave, this
event considers only the scoreboard that has the most entries
consumed.
0xd9 L2_SCOREBOARD_90_PERCENT_FULL Increments by one for every cycle where the L2 scoreboard is
at least 90% full. For a 32-entry scoreboard, 29 or more entries
are consumed. For more than one interleave, this event con-
siders only the scoreboard that has the most entries con-
sumed.
0xda L2_SCOREBOARD_FULL_REJECT L2 scoreboard is too full to accept a selector request, and the
selector has a request.
0xdc L2_EVICTION_BUFFERS_FULL Counts every cycle when all eviction buffers in any interleave
are occupied.
0xdd AHB_MULTI_BEAT_READ_REQUEST Number of 32-byte multi-beat read requests issued by the
AHB.
0xdf L2_DU_LOAD_SECOND- Of the events in 0x90, the events where the primary miss was
ARY_MISS_ON_SW_PREFETCH a DC fetch or L2 fetch.
0xe0 L2FETCH_DROP L2 fetch data dropped because a previous eviction has not
completed.
0xe5 THREAD_OFF_PVIEW_CYCLES Cycles cluster cannot commit because a thread is in the Off or
Wait state.
0xe6 ARCH_LOCK_PVIEW_CYCLES Cycles cluster cannot commit due to a kernel lock or TLB lock.
0xe7 REDIRECT_PVIEW_CYCLES Cycles cluster cannot commit because of redirects such as
branch mispredicts.
0xe8 IU_NO_PKT_PVIEW_CYCLES Cycles cluster cannot commit because the interrupt queue is
empty.
0xe9 DU_CACHE_MISS_PVIEW_CYCLES Cycles cluster cannot commit due to a D-cache cacheable
miss.
0xea DU_BUSY_OTHER_PVIEW_CYCLES Cycles cluster cannot commit due to a DU replay, DU bubble,
or DTLB miss.
0xeb CU_BUSY_PVIEW_CYCLES Cycles cluster cannot commit due to a register interlock, reg-
ister port conflict, bubbles due to a timing class such as
tc_3stall, no B2B HVX, or HVX FIFO is full.
0xec SMT_DU_CONFLICT_PVIEW_CYCLES Cycles cluster cannot commit due to a DU resource conflict.
0xed COPROC_BUSY_PVIEW_CYCLES Cycles cluster cannot commit due to a D-cache uncacheable
access.
0xee DU_UNCACHED_PVIEW_CYCLES Cycles cluster cannot commit due to a DU resource conflict.
0xef SYSTEM_BUSY_PVIEW_CYCLES Cycles cluster cannot commit due to system level stalls, in-
cluding DMA synchronization, ETM is full, Qtimer read is not
ready, AXI bus is busy, and global cache operations synchro-
nization.
0xf1 AXI_LINE128_READ_REQUEST_EVEN Number of 128-byte line read requests issued by the even in-
terleaved AXI master.

80-N2040-53 Rev. AA 135


Qualcomm Hexagon V73 Programmer’s Reference Manual PMU Events

Table 9-1 V73 processor events symbols


Event Symbol Definition
0xf2 AXI_READ_REQUEST_EVEN All read requests issued by the even interleaved AXI master.
0xf3 AXI_LINE32_READ_REQUEST_EVEN Number of 32-byte line read requests issued by the even-inter-
leaved AXI master.
0xf4 AXI_WRITE_REQUEST_EVEN All write requests issued by the even-interleaved AXI master.
0xf5 AXI_LINE32_WRITE_REQUEST_EVEN Number of 32-byte line write requests issued by the even-in-
terleaved AXI master. All bytes are valid.
0xf6 AXI_LINE128_WRITE_REQUEST_EVEN Number of 128-byte line write requests issued by the even-in-
terleaved AXI master. All bytes are valid.
0xf8 AXI_LINE64_READ_REQUEST_EVEN Number of 64-byte line read requests issued by the even-inter-
leaved AXI master.
0xf9 AXI_LINE64_WRITE_REQUEST_EVEN Number of 64-byte line write requests issued by the even-in-
terleaved AXI master. All bytes are valid.
0xfa AXI_WR_CONGESTION_EVEN Even-interleaved AXI write command or data queue is full, and
an operation is stuck at the head of the even interleaved AXI
master command queue.
0xfb AXI_INCOMPLETE_WRITE_RE- L2 line-sized write was made to the even-interleaved AXI mas-
QUEST_EVEN ter, but not all bytes were valid. Includes segmented writes.
Excludes WT stores. This event captures the number of writes
coalesced at a line level.
0xfc AXI_LINE256_READ_REQUEST_EVEN Number of 256-byte line read requests issued by an even-in-
terleaved AXI master. All bytes are valid.
0xfd AXI_LINE256_WRITE_REQUEST_EVEN Number of 256-byte line write requests issued by an even-in-
terleaved AXI master. All bytes are valid.
0xfe CYCLES_3_COPROC_THREADS_ONE_- Number of processor cycles during which a cluster has three
CLUSTER threads in Run mode with the coprocessor bit (SSR.XE) en-
abled.
0x11b HVXLD_L2_SECONDARY_MISS Of the events in 0xFB, the events where the load cannot be re-
turned due to the immediately prior access for the line being a
pending load or pending L2Fetch
0x2fa L2_CLEAN_CASTOUT Number of clean line evictions from L2 cache. Triggers when
L2 cache evicts a line due to an allocation. Not triggered on
cache operations.
0x2fb AXI3_READ_REQUEST All read requests issued by the tertiary AXI master. Includes
full lines and partial lines.
0x2fc AXI3_LINE32_READ_REQUEST Number of 32-byte line read requests issued by the tertiary
AXI master.
0x2fd AXI3_WRITE_REQUEST All write requests issued by the tertiary AXI master. Includes
full lines and partial lines.
0x2fe AXI3_LINE32_WRITE_REQUEST Number of 32-byte line write requests issued by the tertiary
AXI master.
0x2ff AXI3_RD_CONGESTION Tertiary AXI read command queue is full, and an operation is
stuck at the head of the primary AXI master command queue.
Includes all interleaved requests.
0x300 CYCLES_1_PACKET_COMMITTED Number of cycles when one packet is committed.

80-N2040-53 Rev. AA 136


Qualcomm Hexagon V73 Programmer’s Reference Manual PMU Events

Table 9-1 V73 processor events symbols


Event Symbol Definition
0x301 CYCLES_2_PACKET_COMMITTED Number of cycles when two packets are committed.
0x302 CYCLES_3_PACKET_COMMITTED Number of cycles when three packets are committed.
0x303 CYCLES_4_PACKET_COMMITTED Number of cycles when four packets are committed.
0x304 SMT_CLUSTER0 Number of cycles when more than one packet is committed in
cluster 0.
0x305 SMT_CLUSTER1 Number of cycles when more than one packet is committed in
cluster 1.
0x306 SMT_INTERCLUSTER Number of cycles when packets are committed on both clus-
ters.
0x307 SMT_CONFLICT_- Number of cases when a packet is SMT-able without slot re-
FOR_REG_READ_OR_CU_FWD source conflicts, but it cannot go due to s0/s1 register read/CU
forwarding.
0x308 COMMITTED_PKT_2_THREAD_RUN- Number of committed packets with two threads running on the
NING_2T_PLUS_0T same cluster. Running means the threads are not in the Wait
or Stop state.
0x309 COMMITTED_PKT_2_THREAD_RUN- Number of committed packets with two threads running on the
NING_1T_PLUS_1T different cluster. Running means the threads are not in the
Wait or Stop state.
0x30a COMMITTED_PKT_3_THREAD_RUN- Number of committed packets with three threads running on
NING_3T_PLUS_0T the same cluster. Running means the threads are not in the
Wait or Stop state.
0x30b COMMITTED_PKT_3_THREAD_RUN- Number of committed packets with three threads running, two
NING_2T_PLUS_1T threads on one cluster and 1 on another cluster. Running
means the threads are not in the Wait or Stop state.
0x30c COMMITTED_PKT_4_THREAD_RUN- Number of committed packets with four threads running on
NING_4T_PLUS_0T same cluster. Running means the threads are not in the Wait
or Stop state.
0x30d COMMITTED_PKT_4_THREAD_RUN- Number of committed packets with four threads running, three
NING_3T_PLUS_1T threads on one cluster and one thread running on another
cluster. Running means the threads are not in the Wait or Stop
state.
0x30e COMMITTED_PKT_4_THREAD_RUN- Number of committed packets with four threads running, two
NING_2T_PLUS_2T threads on one cluster and two threads running on another
cluster. Running means the threads are not in the Wait or Stop
state.
0x30f COMMITTED_PKT_5_THREAD_RUN- Number of committed packets with 5 threads running, four
NING_4T_PLUS_1T threads on one cluster and one thread running on another
cluster. Running means the threads are not in the Wait or Stop
state.
0x310 COMMITTED_PKT_5_THREAD_RUN- Number of committed packets with 5 threads running, three
NING_3T_PLUS_2T threads on one cluster and two threads running on another
cluster. Running means the threads are not in the Wait or Stop
state.

80-N2040-53 Rev. AA 137


Qualcomm Hexagon V73 Programmer’s Reference Manual PMU Events

Table 9-1 V73 processor events symbols


Event Symbol Definition
0x311 COMMITTED_PKT_6_THREAD_RUN- Number of committed packets with six threads running, four
NING_4T_PLUS_2T threads on one cluster and two threads running on another
cluster. Running means the threads are not in the Wait or Stop
state.
0x312 COMMITTED_PKT_6_THREAD_RUN- Number of committed packets with six threads running, three
NING_3T_PLUS_3T threads on one cluster and three threads running on another
cluster. Running means the threads are not in the Wait or Stop
state.
0x313 ICACHE_DE- Number of iprfetches initiated on demand misses.
MAND_MISS_PREFETCH_MISS
0x314 SIMPLE_PACKET Number of committed simple packets, which can be dis-
patched on in-cluster SMT threads. Includes eligible packets
that are committed on both primary and in-cluster SMT
threads.
0x315 AXI3_LINE64_WRITE_REQUEST Number of 64-byte line write requests issued by the primary
AXI3 master. Includes all interleaved requests. All bytes are
valid.
0x316 AXI3_LINE64_READ_REQUEST Number of 64-byte line read requests issued by the primary
AXI3 master. Includes all interleaved requests
0x317 AXI3_WR_CONGESTION Tertiary AXI write command or data queue is full, and an oper-
ation is stuck at the head of the primary AXI3 master command
queue. Includes all interleaved requests.
0x318 AXI3_INCOMPLETE_WRITE_REQUEST L2 line-sized write was made to the AXI3 master, but not all
bytes were valid. Includes segmented writes. Excludes WT
stores.
0x319 ICACHE_DATA_REPLAY Number of I-cache data replays due to incorrect way predic-
tions.
0x31c SMT_PKT_PICKED_BUT_NOT_COM- In-cluster SMT thread is picked, but not committed.
MIT_PVIEW_CYCLES
0x31d SMT_PKT_IQ_NO_PKT_PVIEW_CYCLES In-cluster SMT thread is not picked because no packet is in IQ
on the SMT thread.
0x31e SMT_PKT_NOT_SIMPLE_PVIEW_CYCLES In-cluster SMT thread is not picked because no simple packet
is on the SMT thread.
0x31f SMT_PKT_NOT_READY_PVIEW_CYCLES In-cluster SMT thread is not picked because no simple packet
is ready for dispatch on the SMT thread.
0x320 SMT_PKT_SLOT_CONFLICT_PVIEW_CY- In-cluster SMT thread is not picked due to a slot conflict be-
CLES tween the primary thread and SMT thread.
0x321 SMT_PKT_REG_FWD_BLOCK_PVIEW_- In-cluster SMT thread is not picked due to a register and for-
CYCLES ward block on the SMT thread.
0x322 CLADE2_EB_FULL CLADE2 can use up to two eviction buffer entries. Indicates
that both entries are used and CLADE2 is congested.
0x323 CLADE2_RD_REQ Number of L2 cache read requests in the CLADE2 region.
0x324 CLADE2_RDCACHE_MISS Number of L2 cache read request misses in the CLADE2 re-
gion.
0x325 CLADE2_WR_REQ Number of L2 cache write requests in the CLADE2 region.

80-N2040-53 Rev. AA 138


Qualcomm Hexagon V73 Programmer’s Reference Manual PMU Events

Table 9-1 V73 processor events symbols


Event Symbol Definition
0x326 CLADE2_WRCACHE_MISS Number of L2 cache write request misses in the CLADE2 re-
gion.
0x327 AXI_EWD_REQUEST L2 cache eviction of clean data to the AXI master bus.
0x328 AXI_EWD_REQUEST_EVEN L2 cache eviction of clean data to the even interleaved AXI
master bus.
0x329 AXI_CMO_REQUEST Cache maintenance operation request from the QDSP6 core
to AXIM.
0x32a AXI_CMO_REQUEST_EVEN Cache maintenance operation request from the QDSP6 core
to AXIM Interleave0.
0x32b ICACHE_DE- Number of iprfetches initiated on a demand miss in IU0.
MAND_MISS_PREFETCH_MISS_IU0
0x32c ICACHE_DE- Number of iprfetches initiated on a demand miss in IU1.
MAND_MISS_PREFETCH_MISS_IU1
0x330 VMEM_ST_SMT_DU_PORT_CONLICT_RE- Number of times any packet takes a replay because CU didn’t
PLAY allocate a port at schedule time and it didn’t arbitrate for the
port in DU based on the state bit (VMEMSttoVTCM) but need-
ed a port as it mapped to L2.
0x333 DU_SPF_DTLBPGCROSS Number of stopping prefetching due to page cross
0x334 DU_SPF_DCACHE_HIT Number of prefetch requests hitting in L1D$
0x335 DU_SPF_DCACHE_MISS Number of prefetch requests missing in L1D$
0x336 DU_SPF_L2FIFOFULL_RETRY Number of prefetch retry on L2FIFO queue full
0x337 DU_SPF_L2BUFFULL_RETRY Number of prefetch retry on L2 credits/AQoS busy
0x338 DU_SPF_CONFLICT_RETRY Number of cycles that prefetches losing arbitration
0x350 DU_NUM_WAY_PREDICTIONS Number of times DU did way predictions for loads
0x351 DU_WAY_PRED_REPLAYS Number of times DU replayed due to way misprediction
0x352 DU_BANKCONFLICTREPLAY_INVALID Number of time bank conflict replay was prevented due to way
prediction
0xbbf DU_CACHE_MISS_L2HIT_PVIEW_CYCLES Cycles cluster cannot commit due to D-cache cacheable miss
hitting in L2.
0xbc0 DU_CACHE_MISS_TCM_PVIEW_CYCLES Cycles cluster cannot commit due to D-cache cacheable miss
going to TCM.
0xbc1 DU_CACHE_MISS_AXI_PVIEW_CYCLES Cycles cluster cannot commit due to D-cache cacheable miss
going to AXI.
0xbc2 DU_CACHE_MISS_AHB_PVIEW_CYCLES Cycles cluster cannot commit due to D-cache cacheable miss
going to AHB.
0xbc3 DU_CACHE_MISS_AXI2_PVIEW_CYCLES Cycles cluster cannot commit due to D-cache cacheable miss
going to AXI2.
0xbc4 ICACHE_DEMAND_MISS_L2HIT_PRI –
0xbc5 ICACHE_DEMAND_MISS_L2MISS_PRI –
0xbc6 ICACHE_DEMAND_MISS_L2TCMHIT_PRI –
0xbc7 ICACHE_DEMAND_MISS_L2ITCMHIT_PRI –

80-N2040-53 Rev. AA 139


Qualcomm Hexagon V73 Programmer’s Reference Manual PMU Events

Table 9-1 V73 processor events symbols


Event Symbol Definition
0xbc8 ICACHE_IPREFETCHES_SENT_L2HIT –
0xbc9 ICACHE_IPREFECHES_SENT_L2MISS –
0xbca ICACHE_IPREFETCHES_SENT_L2TCM –
0xbcb ICACHE_IPREFETCHES_SENT_L2ITCM –

80-N2040-53 Rev. AA 140


10 Instruction Encoding

10.1 Instructions
Hexagon processor instructions are encoded in a 32-bit instruction word. The instruction word
format varies according to the instruction type.
The instruction words contain two types of bit fields:
■ Common fields appear in every processor instruction, and are defined the same in all
instructions.
■ Instruction-specific fields appear only in some instructions, or vary in definition across the
instruction set.

Table 10-1 Instruction bit fields


Name Description Type
ICLASS Instruction class Common
Parse Packet / loop bits

80-N2040-53 Rev. AA 141


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Encoding

Table 10-1 Instruction bit fields


Name Description Type
MajOp Major opcode Instruction-
Maj specific
MinOp Minor opcode
Min
RegType Register type (32-bit, 64-bit)
Type Operand type (byte, halfword, and so on)
Amode Addressing mode
dn Destination register operand
sn Source register operand
tn Source register operand #2
xn Source and destination register operand
un Predicate or modifier register operand
sH Source register bit field (Rs.H or Rs.L)
tH Source register #2 bit field (Rt.H or Rt.L)
UN Unsigned operand
Rs No source register read
P Predicate expression
PS Predicate sense (Pu or !Pu)
DN Dot-new predicate
PT Predict taken
sm Supervisor mode only

NOTE: In some cases, instruction-specific fields encode instruction attributes other than the ones
described for the fields in Table 10-1.

Reserved bits

Some instructions contain reserved bits that do not currently encode instruction attributes.
Always set these bits to 0 to ensure compatibility with any future changes in the instruction
encoding.

NOTE: Reserved bits appear as ‘-’ characters in the instruction encoding tables.

80-N2040-53 Rev. AA 142


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Encoding

10.2 Sub-instructions
To reduce code size, the Hexagon processor supports the encoding of certain pairs of instructions
in a single 32-bit container. Instructions encoded this way are sub-instructions, and the containers
are duplexes (Section 10.3).
Sub-instructions are limited to certain commonly-used instructions:
■ Arithmetic and logical operations
■ Register transfer
■ Loads and stores
■ Stack frame allocation/deallocation
■ Subroutine return
Table 10-2 lists the sub-instructions along with the group identifiers that encode them in
duplexes.
Sub-instructions can access only a subset of the general registers (R0 to R7, R16 to R23).
Table 10-3 lists the sub-instruction register encodings.

NOTE: Certain sub-instructions implicitly access registers such as SP (R29).

Table 10-2 Sub-instructions


Group Instruction Description
L1 Rd = memw(Rs+#u4:2) Word load
L1 Rd = memub(Rs+#u4:0) Unsigned byte load
Group Instruction Instruction
L2 Rd = memh/memuh(Rs+#u3:1) Halfword loads
L2 Rd = memb(Rs+#u3:0) Signed byte load
L2 Rd = memw(r29+#u5:2) Load word from stack
L2 Rdd = memd(r29+#u5:3) Load pair from stack
L2 deallocframe Deallocate stack frame
L2 if ([!]P0) dealloc_return Deallocate stack frame and return
if ([!]P0.new) dealloc_return:nt
L2 jumpr R31 Return
if ([!]P0) jumpr R31
if ([!]P0.new) jumpr:nt R31
Group Instruction Instruction
S1 memw(Rs+#u4:2) = Rt Store word
S1 memb(Rs+#u4:0) = Rt Store byte
Group Instruction Instruction
S2 memh(Rs+#u3:1) = Rt Store halfword
S2 memw(r29+#u5:2) = Rt Store word to stack
S2 memd(r29+#s6:3) = Rtt Store pair to stack
S2 memw(Rs+#u4:2) = #U1 Store immediate word #0 or #1

80-N2040-53 Rev. AA 143


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Encoding

Table 10-2 Sub-instructions (cont.)


Group Instruction Description
S2 memb(Rs+#u4) = #U1 Store immediate byte #0 or #1
S2 allocframe(#u5:3) Allocate stack frame
Group Instruction Instruction
A Rx = add(Rx,#s7) Add immediate
A Rd = Rs Transfer
A Rd = #u6 Set to unsigned immediate
A Rd = #-1 Set to -1
A if ([!]P0[.new]) Rd = #0 Conditional clear
A Rd = add(r29,#u6:2) Add immediate to stack pointer
A Rx = add(Rx,Rs) Register add
A P0 = cmp.eq(Rs,#u2) Compare register equal immediate
A Rdd = combine(#0,Rs) Combine zero and register into pair
A Rdd = combine(Rs,#0) Combine register and zero into pair
A Rdd = combine(#u2,#U2) Combine immediates into pair
A Rd = add(Rs,#1) Add and subtract 1
Rd = add(Rs,#-1)
A Rd = sxth/sxtb/zxtb/zxth(Rs) Sign- and zero-extends
A Rd = and(Rs,#1) And with 1

Table 10-3 Sub-instruction registers


Register Encoding
Rs,Rt,Rd,Rx 0000 = R0
0001 = R1
0010 = R2
0011 = R3
0100 = R4
0101 = R5
0110 = R6
0111 = R7
1000 = R16
1001 = R17
1010 = R18
1011 = R19
1100 = R20
1101 = R21
1110 = R22
1111 = R23

80-N2040-53 Rev. AA 144


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Encoding

Table 10-3 Sub-instruction registers


Register Encoding
Rdd,Rtt 000 = R1:0
001 = R3:2
010 = R5:4
011 = R7:6
100 = R17:16
101 = R19:18
110 = R21:20
111 = R23:22

10.3 Duplexes
A duplex is encoded as a 32-bit instruction with bits [15:14] set to 00. The sub-instructions that
comprise a duplex are encoded as 13-bit fields in the duplex.
An instruction packet can contain one duplex and up to two other (non-duplex) instructions. The
duplex must always appear as the last word in a packet.
The sub-instructions in a duplex always execute in slot 0 and slot 1.

Table 10-4 Duplex instruction encoding


Bits Name Description
15:14 Parse bits 00 = Duplex type, ends the packet and indicates that
word contains two sub-instructions
12:0 Sub-instruction low Encodes slot 0 sub-instruction
28:16 Sub-instruction high Encodes slot 1 sub-instruction
31:29, 13 4-bit ICLASS Indicates which group the low and high sub-instructions
belong to.

The duplex ICLASS field values that specify the group of each sub-instruction in a duplex are
shown in Table 10-5

Table 10-5 Duplex ICLASS field


ICLASS Low slot 0 subinsn type High slot 1 subinsn type
0x0 L1-type L1-type
0x1 L2-type L1-type
0x2 L2-type L2-type
0x3 A-type A-type
0x4 L1-type A-type
0x5 L2-type A-type
0x6 S1-type A-type
0x7 S2-type A-type
0x8 S1-type L1-type
0x9 S1-type L2-type
0xA S1-type S1-type

80-N2040-53 Rev. AA 145


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Encoding

Table 10-5 Duplex ICLASS field


ICLASS Low slot 0 subinsn type High slot 1 subinsn type
0xB S2-type S1-type
0xC S2-type L1-type
0xD S2-type L2-type
0xE S2-type S2-type
0xF Reserved Reserved

Duplexes have the following grouping constraints:


■ Constant extenders expand the range of an instruction’s immediate operand to 32 bits, and
can expand the following sub-instructions:
– Rx = add(Rx,#s7)
– Rd = #u6
A duplex can contain only one constant-extended instruction, and it must appear in the slot 1
position.
■ When the sub-instructions are treated as 13-bit unsigned integer values for two instructions
with the same sub-instruction group in a duplex, the instruction corresponding to the
numerically smaller value must be encoded in the slot 1 position of the duplex.1
■ Sub-instructions must conform to any slot assignment grouping rules that apply to the
individual instructions, even if a duplex pattern exists that violates those assignments. One
exception to this rule exists:
– jumpr R31 must appear in the Slot 0 position

1 The sub-instruction register and immediate fields are assumed to be 0 when performing this comparison.

80-N2040-53 Rev. AA 146


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Encoding

10.4 Instruction classes


The instruction class (Section 3.2) is encoded in the four most-significant bits of the instruction
word (31:28). These bits are referred to as the instruction’s ICLASS field. The Slots column in
Table 10-6 indicates which slots can receive the instruction class.

Table 10-6 Instruction class encoding


Encoding Instruction class Slots
0000 Constant extender (Section 10.9) –
0001 J 2,3
0010 J 2,3
0011 LD 0,1
ST
0100 LD 0,1
ST
(conditional or GP-relative)
0101 J 2,3
0110 CR 3
0111 ALU32 0,1,2,3
1000 XTYPE 2,3
1001 LD 0,1
1010 ST 0
1011 ALU32 0,1,2,3
1100 XTYPE 2,3
1101 XTYPE 2,3
1110 XTYPE 2,3
1111 ALU32 0,1,2,3

For details on encoding the individual class types, see Chapter 11.

80-N2040-53 Rev. AA 147


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Encoding

10.5 Instruction packets


Instruction packets are encoded using two bits of the instruction word (15:14), which are referred
to as the Parse field of the instruction word. The field values have the following definitions:
■ ‘11’ indicates that an instruction is the last instruction in a packet (the instruction word at the
highest address).
■ ‘01’ or ‘10’ indicate that an instruction is not the last instruction in a packet.
■ ‘00’indicates a duplex.
If any sequence of four consecutive instructions occurs without one of them containing ‘11’ or
‘00’, the processor raises an error exception (illegal opcode).

31 16

15 0
P P

Packet / loop parse bits:


01, 10 = not end of packet
11 = end of packet
00 = duplex

Figure 10-1 Parse field instruction packet encoding

The following examples show how to use the Parse field to encode instruction packets:
{ A ; B}
01 11 // Parse fields of instructions A,B

{ A ; B ; C}
01 01 11 // Parse fields of instructions A,B,C

{ A ; B ; C ; D}
01 01 01 11 // Parse fields of instructions A,B,C,D

80-N2040-53 Rev. AA 148


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Encoding

10.6 Loop packets


In addition to encoding the last instruction in a packet, the Parse field of the instruction word
(Section 10.5) encodes the last packet in a hardware loop.
The Hexagon processor supports two Hardware loops, labeled 0 and 1. The last packet in these
loops is subject to the following restrictions:
■ The last packet in a hardware loop 0 must contain two or more instruction words.
■ The last packet in a hardware loop 1 must contain three or more instruction words.
If the last packet in a loop is expressed in assembly language with fewer than the required
number of words, the assembler automatically adds one or two NOP instructions to the encoded
packet so it contains the minimum required number of instruction words.
The Parse fields in a packet’s first and second instruction words (the words at the lowest
addresses) encode whether the packet is the last packet in a hardware loop.
Table 10-7 Parse field loop packet encoding
Parse field in Parse field in
Packet
first Instruction second Instruction
Not last in loop 01 or 11 01 or 11 1
Last in loop 0 10 01 or 11
Last in loop 1 01 10
Last in loops 0 & 1 10 10
1 Not applicable for single-instruction packets.

The following examples show how to use the Parse field to encode loop packets:
{ A B}:endloop0
10 11 // Parse fields of instrs A,B

{ A B C}:endloop0
10 01 11 // Parse fields of instrs A,B,C

{ A B C D}:endloop0
10 01 01 11 // Parse fields of instrs A,B,C,D

{ A B C}:endloop1
01 10 11 // Parse fields of instrs A,B,C

{ A B C D}:endloop1
01 10 01 11 // Parse fields of instrs A,B,C,D

{ A B C}:endloop0:endloop1
10 10 11 // Parse fields of instrs A,B,C

{ A B C D}:endloop0:endloop1
10 10 01 11 // Parse fields of instrs A,B,C,D

80-N2040-53 Rev. AA 149


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Encoding

10.7 Immediate values


To conserve encoding space, the Hexagon processor often stores immediate values in instruction
fields that are smaller (in bit size) than the values actually needed in the instruction operation.
When an instruction operates on one of its immediate operands, the processor automatically
extends the immediate value to the bit size required by the operation:
■ Signed immediate values are sign-extended
■ Unsigned immediate values are zero-extended

10.8 Scaled immediate values


To minimize the number of bits in instruction words to store certain immediate values, the
Hexagon processor stores the values as scaled immediate values. Use scaled immediate values
when an immediate value must represent integral multiples of a power of 2 in a specific range.
For example, consider an instruction operand whose possible values are the following:
-32, -28, -24, -20, -16, -12, -8, -4, 0, 4, 8, 12, 16, 20, 24, 28
Encoding the full range of integers -32...28 normally requires 6 bits. However, if the operand is
stored as a scaled immediate, it can first be shifted right by two bits, storing only the four
remaining bits in the instruction word. When the operand is fetched from the instruction word,
the processor automatically shifts the value left by two bits to recreate the original operand
value.

NOTE: The scaled immediate value in the example above is represented notationally as #s4:2.

Scaled immediate values commonly encode address offsets that apply to data types of varying
size. For example, Table 10-8 shows how to use the byte offsets in immediate-with-offset
addressing mode that are stored as 11-bit scaled immediate values. This enables the offsets to
span the same range of data elements regardless of the data type.

Table 10-8 Scaled immediate encoding (indirect offsets)


Offset size Scale Offset size Offset range Offset range
Data type
(stored) bits (effective) (bytes) (elements)
byte 11 0 11 -1024 ... 1023 -1024 ... 1023
halfword 11 1 12 -2048 ... 2046 -1024 ... 1023
word 11 2 13 -4096 ... 4092 -1024 ... 1023
doubleword 11 3 14 -8192 ... 8184 -1024 ... 1023

80-N2040-53 Rev. AA 150


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Encoding

10.9 Constant extenders


To support the use of 32-bit operands in a number of instructions, the Hexagon processor defines
constant extenders, which are an instruction word that exists solely to extend the bit range of an
immediate or address operand that is contained in an adjacent instruction in a packet.
For example, the absolute addressing mode specifies a 32-bit constant value as the effective
address. Instructions using this addressing mode are encoded in a single packet containing both
the normal instruction word and a second word with a constant extender that increases the range
of the instruction’s normal constant operand to a full 32 bits.

NOTE: Constant extended operands can encode symbols.

A constant extender is encoded as a 32-bit instruction with the 4-bit ICLASS field set to 0 and the
2-bit Parse field set to its usual value (Section 10.5). The remaining 26 bits in the instruction word
store the data bits that are prepended to an operand as small as 6 bits to create a full 32-bit value.

Table 10-9 Constant extender encoding


Bits Name Description
31:28 ICLASS Instruction class = 0000
27:16 Extender high High 12 bits of 26-bit constant extension
15:14 Parse Parse bits
13:0 Extender low Low 14 bits of 26-bit constant extension

Within a packet, a constant extender must be positioned immediately before the instruction that
it extends: in terms of memory addresses, the extender word must reside at address
(<instr_address> - 4).
The constant extender effectively serves as a prefix for an instruction: it does not execute in a
slot, nor does it consume any slot resources. All packets must contain four or fewer words, and
the constant extender occupies one word.
If the instruction operand to extend is longer than 6 bits, the overlapping bits in the base
instruction must be encoded as zeros. The value in the constant extender always supplies the
upper 26 bits.
The Regclass field in Table 10-10 lists the values to set bits [27:24] to in the instruction word to
identify the instruction as one that might include a constant extender.

NOTE: When the base instruction encodes two constant operands, the extended immediate is the one
specified in the table.

Constant extenders appear in disassembly listings as Hexagon instructions with the name
immext.

NOTE: If a constant extender is encoded in a packet for an instruction that does not accept a constant
extender, the execution result is undefined. The assembler normally ensures that only valid
constant extenders are generated.

80-N2040-53 Rev. AA 151


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Encoding

Table 10-10 Constant extender instructions


ICLASS Regclass Instructions
LD ---1 Rd = mem{b,ub,h,uh,w,d}(##U32)
if ([!]Pt[.new]) Rd = mem{b,ub,h,uh,w,d} (Rs + ##U32)
// predicated loads
LD ---- Rd = mem{b,ub,h,uh,w,d} (Rs + ##U32)
Rd = mem{b,ub,h,uh,w,d} (Re=##U32)
Rd = mem{b,ub,h,uh,w,d} (Rt<<#u2 + ##U32)
if ([!]Pt[.new]) Rd = mem{b,ub,h,uh,w,d} (##U32)
ST ---0 mem{b,h,w,d}(##U32) = Rs[.new] // GP-stores
if ([!]Pt[.new]) mem{b,h,w,d}(Rs + ##U32) = Rt[.new]
// predicated stores
ST ---- mem{b,h,w,d}(Rs + ##U32) = Rt[.new]
mem{b,h,w,d}(Rd=##U32) = Rt[.new]
mem{b,h,w,d}(Ru<<#u2 + ##U32) = Rt[.new]
if ([!]Pt[.new]) mem{b,h,w,d}(##U32) = Rt[.new]
MEMOP ---- [if [!]Ps] memw(Rs + #u6) = ##U32 // constant store
memw(Rs + Rt<<#u2) = ##U32 // constant store
NV ---- if (cmp.xx(Rs.new,##U32)) jump:hint target
ALU32 ---- Rd = ##u32
Rdd = combine(Rs,##u32)
Rdd = combine(##u32,Rs)
Rdd = combine(##u32,#s8)
Rdd = combine(#s8,##u32)
Rd = mux (Pu, Rs,##u32)
Rd = mux (Pu, ##u32, Rs)
Rd = mux(Pu,##u32,#s8)
if ([!]Pu[.new]) Rd = add(Rs,##u32)
if ([!]Pu[.new]) Rd = ##u32
Pd = [!]cmp.eq (Rs,##u32)
Pd = [!]cmp.gt (Rs,##u32)
Pd = [!]cmp.gtu (Rs,##u32)
Rd = [!]cmp.eq(Rs,##u32)
Rd = and(Rs,##u32)
Rd = or(Rs,##u32)
Rd = sub(##u32,Rs)
ALU32 ---- Rd = add(Rs,##s32)
XTYPE 00-- Rd = mpyi(Rs,##u32)
Rd += mpyi(Rs,##u32)
Rd -= mpyi(Rs,##u32)
Rx += add(Rs,##u32)
Rx -= add(Rs,##u32)
ALU32 ---- 1 Rd = ##u32
Rd = add(Rs,##s32)
J 1--- jump (PC + ##s32)
call (PC + ##s32)
if ([!]Pu) call (PC + ##s32)
CR ---- Pd = spNloop0(PC+##s32,Rs/#U10)
loop0/1 (PC+##s32,#Rs/#U10)

80-N2040-53 Rev. AA 152


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Encoding

Table 10-10 Constant extender instructions (cont.)


ICLASS Regclass Instructions
XTYPE 1--- Rd = add(pc,##s32)
Rd = add(##u32,mpyi(Rs,#u6))
Rd = add(##u32,mpyi(Rs,Rt))
Rd = add(Rs,add(Rt,##u32))
Rd = add(Rs,sub(##u32,Rt))
Rd = sub(##u32,add(Rs,Rt))
Rd = or(Rs,and(Rt,##u32))
Rx = add/sub/and/or (##u32,asl/asr/lsr(Rx,#U5))
Rx = add/sub/and/or (##u32,asl/asr/lsr(Rs,Rx))
Rx = add/sub/and/or (##u32,asl/asr/lsr(Rx,Rs))
Pd = cmpb/h.{eq,gt,gtu} (Rs,##u32)
1 Constant extension is only for a Slot 1 sub-instruction.

Encoding 32-bit address operands in load/stores

Two methods exist for encoding a 32-bit absolute address in a load or store instruction:
1. For unconditional load/stores, the GP-relative load/store instruction is used. The assembler
encodes the absolute 32-bit address as follows:
❒ The upper 26 bits are encoded in a constant extender
❒ The lower 6 bits are encoded in the 6 operand bits contained in the GP-relative
instruction
In this case the 32-bit value encoded must be a plain address, and the value stored in the GP
register is ignored.

NOTE: When a constant extender is explicitly specified with a GP-relative load/store, the
processor ignores the value in GP and creates the effective address directly from the
32-bit constant value.
2. For conditional load/store instructions that have their base address encoded only by a 6-bit
immediate operand, a constant extender must be explicitly specified; otherwise, the
execution result is undefined. The assembler ensures that these instructions always include a
constant extender.
This case applies also to instructions that use the absolute-set addressing mode or absolute-
plus-register-offset addressing mode.

Encoding 32-bit immediate operands

The immediate operands of certain instructions use scaled immediates (Section 10.8) to increase
their addressable range. When using constant extenders, scaled immediates are not scaled by the
processor. Instead, the assembler must encode the full 32-bit unscaled value as follows:
■ The upper 26 bits are encoded in the constant extender
■ The lower six 6 bits are encoded in the base instruction in the least-significant bit positions of
the immediate operand field.
■ Any overlapping bits in the base instruction are encoded as zeros.

80-N2040-53 Rev. AA 153


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Encoding

Encoding 32-bit jump/call target addresses

When a jump/call has a constant extender, the resulting target address is forced to a 32-bit
alignment (bits 1:0 in the address are cleared by hardware). The resulting jump/call operation
never causes an alignment violation.

10.10 New-value operands


Instructions that include a new-value register operand specify in their encodings which
instruction in the packet has its destination register accessed as the new-value register.
New-value consumers include a 3-bit instruction field named Nt that specifies this information.
■ Nt[0] is reserved and must always be encoded as zero. A nonzero value produces undefined
results.
■ Nt[2:1] encodes the distance (in instructions) from the producer to the consumer, as follows:
❒ Nt[2:1] = 00 // reserved
❒ Nt[2:1] = 01 // producer is +1 instruction ahead of consumer
❒ Nt[2:1] = 10 // producer is +2 instructions ahead of consumer
❒ Nt[2:1] = 11 // producer is +3 instructions ahead of consumer

“ahead” is defined here as the instruction encoded at a lower memory address than the
consumer instruction, not counting empty slots or constant extenders. For example, the following
producer/consumer relationship is encoded with Nt[2:1] set to 01.
...
<producer instruction word>
<consumer constant extender word>
<consumer instruction word>
...

NOTE: Instructions with 64-bit register pair destinations cannot produce new-values. The assembler
flags this case with an error, as the result is undefined.

10.11 Instruction mapping


Some Hexagon processor instructions are encoded by the assembler as variants of other
instructions. This is done for operations that are functionally equivalent to other instructions, but
are still defined as separate instructions because of their programming utility as common
operations.
Table 10-11 Instructions mapped to other instructions
Instruction Mapping
Rd = not(Rs) Rd = sub(#-1,Rs)
Rd = neg(Rs) Rd = sub(#0,Rs)
Rdd = Rss Rdd = combine(Rss.H32, Rss.L32)

80-N2040-53 Rev. AA 154


11 Instruction Set

This chapter describes the instruction set for version 7 of the Hexagon processor.
The instructions are listed alphabetically within instruction categories. The following information
is provided for each instruction:
■ Instruction name
■ A brief description of the instruction
■ A high-level functional description (syntax and behavior) with possible operand types
■ Instruction class and slot information for grouping instructions in packets
■ C intrinsic functions that provide access to the instruction
■ Instruction encoding

80-N2040-53 Rev. AA 155


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

11.1 ALU32
The ALU32 instruction class includes instructions that perform arithmetic and logical operations
on 32-bit data.
ALU32 instructions are executable on any slot.

11.1.1 ALU32 ALU


The ALU32 ALU instruction subclass includes instructions that perform arithmetic and logical
operations on individual 32-bit items.

Add
Add a source register either to another source register or to a signed 16-bit immediate value.
Store the result in destination register. Source and destination registers are 32 bits. If the result
overflows 32 bits, it wraps around. Optionally saturate result to a signed value between
0x80000000 and 0x7fffffff.
For 64-bit versions of this operation, see the XTYPE add instructions.

Syntax Behavior
Rd=add(Rs,#s16) apply_extension(#s);
Rd=Rs+#s;
Rd=add(Rs,Rt) Rd=Rs+Rt;
Rd=add(Rs,Rt):sat Rd=sat32(Rs+Rt);

Class: ALU32 (slots 0,1,2,3)

Notes

■ If saturation occurs during execution of this instruction (a result is clamped to either


maximum or minimum values), the OVF bit in the status register is set. OVF remains set until
explicitly cleared by a transfer to the status register.

Intrinsics

Rd=add(Rs,#s16) Word32 Q6_R_add_RI(Word32 Rs, Word32 Is16)


Rd=add(Rs,Rt) Word32 Q6_R_add_RR(Word32 Rs, Word32 Rt)
Rd=add(Rs,Rt):sat Word32 Q6_R_add_RR_sat(Word32 Rs, Word32
Rt)

80-N2040-53 Rev. AA 156


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS s5 Parse d5
1 0 1 1 i i i i i i i s s s s s P P i i i i i i i i i d d d d d Rd=add(Rs,#s16)
ICLASS P MajOp MinOp s5 Parse t5 d5
1 1 1 1 0 0 1 1 0 0 0 s s s s s P P - t t t t t - - - d d d d d Rd=add(Rs,Rt)
1 1 1 1 0 1 1 0 0 1 0 s s s s s P P - t t t t t - - - d d d d d Rd=add(Rs,Rt):sat

Field name Description


MajOp Major opcode
MinOp Minor opcode
P Predicated
ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 157


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Logical operations
Perform bitwise logical operations (AND, OR, XOR, NOT) either on two source registers or on a
source register and a signed 10-bit immediate value. Store result in destination register. Source
and destination registers are 32 bits.
For 64-bit versions of these operations, see the XTYPE logical instructions.

Syntax Behavior
Rd=and(Rs,#s10) apply_extension(#s);
Rd=Rs&#s;
Rd=and(Rs,Rt) Rd=Rs&Rt;
Rd=and(Rt,~Rs) Rd = (Rt & ~Rs);
Rd=not(Rs) Assembler mapped to: "Rd=sub(#-1,Rs)"
Rd=or(Rs,#s10) apply_extension(#s);
Rd=Rs|#s;
Rd=or(Rs,Rt) Rd=Rs|Rt;
Rd=or(Rt,~Rs) Rd = (Rt | ~Rs);
Rd=xor(Rs,Rt) Rd=Rs^Rt;

Class: ALU32 (slots 0,1,2,3)

Intrinsics

Rd=and(Rs,#s10) Word32 Q6_R_and_RI(Word32 Rs, Word32 Is10)


Rd=and(Rs,Rt) Word32 Q6_R_and_RR(Word32 Rs, Word32 Rt)
Rd=and(Rt,~Rs) Word32 Q6_R_and_RnR(Word32 Rt, Word32 Rs)
Rd=not(Rs) Word32 Q6_R_not_R(Word32 Rs)
Rd=or(Rs,#s10) Word32 Q6_R_or_RI(Word32 Rs, Word32 Is10)
Rd=or(Rs,Rt) Word32 Q6_R_or_RR(Word32 Rs, Word32 Rt)
Rd=or(Rt,~Rs) Word32 Q6_R_or_RnR(Word32 Rt, Word32 Rs)
Rd=xor(Rs,Rt) Word32 Q6_R_xor_RR(Word32 Rs, Word32 Rt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS Rs MajOp MinOp s5 Parse d5
0 1 1 1 0 1 1 0 0 0 i s s s s s P P i i i i i i i i i d d d d d Rd=and(Rs,#s10)
0 1 1 1 0 1 1 0 1 0 i s s s s s P P i i i i i i i i i d d d d d Rd=or(Rs,#s10)
ICLASS P MajOp MinOp s5 Parse t5 d5

80-N2040-53 Rev. AA 158


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 1 0 0 0 1 0 0 0 s s s s s P P - t t t t t - - - d d d d d Rd=and(Rs,Rt)
1 1 1 1 0 0 0 1 0 0 1 s s s s s P P - t t t t t - - - d d d d d Rd=or(Rs,Rt)
1 1 1 1 0 0 0 1 0 1 1 s s s s s P P - t t t t t - - - d d d d d Rd=xor(Rs,Rt)
1 1 1 1 0 0 0 1 1 0 0 s s s s s P P - t t t t t - - - d d d d d Rd=and(Rt,~Rs)
1 1 1 1 0 0 0 1 1 0 1 s s s s s P P - t t t t t - - - d d d d d Rd=or(Rt,~Rs)

Field name Description


MajOp Major opcode
MinOp Minor opcode
Rs No Rs read
P Predicated
ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 159


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Negate
Perform arithmetic negation on a source register. Store result in destination register. Source and
destination registers are 32 bits.
For 64-bit and saturating versions of this instruction, see the XTYPE-class negate instructions.

Syntax Behavior
Rd=neg(Rs) Assembler mapped to: "Rd=sub(#0,Rs)"

Class: N/A

Intrinsics

Rd=neg(Rs) Word32 Q6_R_neg_R(Word32 Rs)

80-N2040-53 Rev. AA 160


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

NOP
Perform no operation. This instruction is used for padding and alignment.
Within a packet, it can be positioned in any slot 0 through 3.

Syntax Behavior
nop

Class: ALU32 (slots 0,1,2,3)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS Rs MajOp Parse
0 1 1 1 1 1 1 1 - - - - - - - - P P - - - - - - - - - - - - - - nop

Field name Description


MajOp Major opcode
Rs No Rs read
ICLASS Instruction class
Parse Packet/loop parse bits

80-N2040-53 Rev. AA 161


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Subtract
Subtract a source register from either another source register or from a signed 10-bit immediate
value. Store the result in the destination register. Source and destination registers are 32 bits. If
the result underflows 32 bits, it wraps around. Optionally saturate result to a signed value
between 0x8000_0000 and 0x7fff_ffff.
For 64-bit versions of this operation, see the XTYPE subtract instructions.

Syntax Behavior
Rd=sub(#s10,Rs) apply_extension(#s);
Rd=#s-Rs;
Rd=sub(Rt,Rs) Rd=Rt-Rs;
Rd=sub(Rt,Rs):sat Rd=sat32(Rt - Rs);

Class: ALU32 (slots 0,1,2,3)

Notes
■ If saturation occurs during execution of this instruction (a result is clamped to either
maximum or minimum values), the OVF bit in the status register is set. OVF remains set until
explicitly cleared by a transfer to the status register.

Intrinsics

Rd=sub(#s10,Rs) Word32 Q6_R_sub_IR(Word32 Is10, Word32 Rs)


Rd=sub(Rt,Rs) Word32 Q6_R_sub_RR(Word32 Rt, Word32 Rs)
Rd=sub(Rt,Rs):sat Word32 Q6_R_sub_RR_sat(Word32 Rt, Word32
Rs)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS Rs MajOp MinOp s5 Parse d5
0 1 1 1 0 1 1 0 0 1 i s s s s s P P i i i i i i i i i d d d d d Rd=sub(#s10,Rs)
ICLASS P MajOp MinOp s5 Parse t5 d5
1 1 1 1 0 0 1 1 0 0 1 s s s s s P P - t t t t t - - - d d d d d Rd=sub(Rt,Rs)
1 1 1 1 0 1 1 0 1 1 0 s s s s s P P - t t t t t - - - d d d d d Rd=sub(Rt,Rs):sat

Field name Description


MajOp Major opcode
MinOp Minor opcode
Rs No Rs read
P Predicated
ICLASS Instruction class
Parse Packet/loop parse bits

80-N2040-53 Rev. AA 162


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Field name Description


d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 163


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Sign extend
Sign-extend the least-significant byte or halfword from the source register and place the 32-bit
result in the destination register.
Rd=sxth(Rs) Rd=sxtb(Rs)

Rs Rs

Sign-extend Rd Sign-extend Rd

Syntax Behavior
Rd=sxtb(Rs) Rd = sxt8->32(Rs);

Rd=sxth(Rs) Rd = sxt16->32(Rs);

Class: ALU32 (slots 0,1,2,3)

Intrinsics

Rd=sxtb(Rs) Word32 Q6_R_sxtb_R(Word32 Rs)


Rd=sxth(Rs) Word32 Q6_R_sxth_R(Word32 Rs)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS Rs MajOp MinOp s5 Parse C d5
0 1 1 1 0 0 0 0 1 0 1 s s s s s P P 0 - - - - - - - - d d d d d Rd=sxtb(Rs)
0 1 1 1 0 0 0 0 1 1 1 s s s s s P P 0 - - - - - - - - d d d d d Rd=sxth(Rs)

Field name Description


MajOp Major opcode
MinOp Minor opcode
Rs No Rs read
C Conditional
ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s

80-N2040-53 Rev. AA 164


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Transfer immediate
Assign an immediate value to a 32-bit destination register.
Two types of assignment are supported. The first sign-extends a 16-bit signed immediate value to
32 bits. The second assigns a 16-bit unsigned immediate value to either the upper or lower 16 bits
of the destination register, leaving the other 16 bits unchanged.

Rd=#s16
Sign-Extend 16-bit immediate

Rx.H=#u16
16-bit immediate Unchanged

Rx.L=#u16
Unchanged 16-bit immediate

Syntax Behavior
Rd=#s16 apply_extension(#s);
Rd=#s;
Rdd=#s8 if ("#s8<0") {
Assembler mapped to: "Rdd=combine(#-1,#s8)";
} else {
Assembler mapped to: "Rdd=combine(#0,#s8)";
}
Rx.[HL]=#u16 Rx.h[01]=#u;

Class: ALU32 (slots 0,1,2,3)

Intrinsics

Rd=#s16 Word32 Q6_R_equals_I(Word32 Is16)


Rdd=#s8 Word64 Q6_P_equals_I(Word32 Is8)
Rx.H=#u16 Word32 Q6_Rh_equals_I(Word32 Rx, Word32
Iu16)
Rx.L=#u16 Word32 Q6_Rl_equals_I(Word32 Rx, Word32
Iu16)

80-N2040-53 Rev. AA 165


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS Rs MajOp MinOp x5 Parse
0 1 1 1 0 0 0 1 i i 1 x x x x x P P i i i i i i i i i i i i i i Rx.L=#u16
0 1 1 1 0 0 1 0 i i 1 x x x x x P P i i i i i i i i i i i i i i Rx.H=#u16
ICLASS Rs MajOp MinOp Parse d5
0 1 1 1 1 0 0 0 i i - i i i i i P P i i i i i i i i i d d d d d Rd=#s16

Field name Description


MajOp Major opcode
MinOp Minor opcode
Rs No Rs read
ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
x5 Field to encode register x

80-N2040-53 Rev. AA 166


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Transfer register
Transfer a source register to a destination register. Source and destination registers are either 32
bits or 64 bits.

Syntax Behavior
Rd=Rs Rd=Rs;
Rdd=Rss Assembler mapped to:
"Rdd=combine(Rss.H32,Rss.L32)"

Class: ALU32 (slots 0,1,2,3)

Intrinsics

Rd=Rs Word32 Q6_R_equals_R(Word32 Rs)


Rdd=Rss Word64 Q6_P_equals_P(Word64 Rss)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS Rs MajOp MinOp s5 Parse C d5
0 1 1 1 0 0 0 0 0 1 1 s s s s s P P 0 - - - - - - - - d d d d d Rd=Rs

Field name Description


MajOp Major opcode
MinOp Minor opcode
Rs No Rs read
C Conditional
ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s

80-N2040-53 Rev. AA 167


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector add halfwords


Add the two 16-bit halfwords of Rs to the two 16-bit halfwords of Rt. The results are optionally
saturated to signed or unsigned 16-bit values.

Syntax Behavior
Rd=vaddh(Rs,Rt)[:sat] for (i=0;i<2;i++) {
Rd.h[i]=[sat16](Rs.h[i]+Rt.h[i]);
}
Rd=vadduh(Rs,Rt):sat for (i=0;i<2;i++) {
Rd.h[i]=usat16(Rs.uh[i]+Rt.uh[i]);
}

Class: ALU32 (slots 0,1,2,3)

Notes

■ If saturation occurs during execution of this instruction (a result is clamped to either


maximum or minimum values), the OVF bit in the Status Register is set. OVF remains set until
explicitly cleared by a transfer to the status register.

Intrinsics

Rd=vaddh(Rs,Rt) Word32 Q6_R_vaddh_RR(Word32 Rs, Word32 Rt)


Rd=vaddh(Rs,Rt):sat Word32 Q6_R_vaddh_RR_sat(Word32 Rs, Word32 Rt)
Rd=vadduh(Rs,Rt):sat Word32 Q6_R_vadduh_RR_sat(Word32 Rs, Word32 Rt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS P MajOp MinOp s5 Parse t5 d5
1 1 1 1 0 1 1 0 0 0 0 s s s s s P P - t t t t t - - - d d d d d Rd=vaddh(Rs,Rt)
1 1 1 1 0 1 1 0 0 0 1 s s s s s P P - t t t t t - - - d d d d d Rd=vaddh(Rs,Rt):sat
1 1 1 1 0 1 1 0 0 1 1 s s s s s P P - t t t t t - - - d d d d d Rd=vadduh(Rs,Rt):sat

Field name Description


MajOp Major opcode
MinOp Minor opcode
P Predicated
ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 168


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector average halfwords


The vavgh instruction adds the two 16-bit halfwords of Rs to the two 16-bit halfwords of Rd, and
shifts the result right by 1 bit. Optionally, a rounding constant is added before shifting.
The vnavgh instruction subtracts the two 16-bit halfwords of Rt from the two 16-bit halfwords of
Rs, and shifts the result right by 1 bit. For vector negative average with rounding, see the XTYPE
VNAVGH instruction (Vector average halfwords).

Syntax Behavior
Rd=vavgh(Rs,Rt) for (i=0;i<2;i++) {
Rd.h[i]=((Rs.h[i]+Rt.h[i])>>1);
}
Rd=vavgh(Rs,Rt):rnd for (i=0;i<2;i++) {
Rd.h[i]=((Rs.h[i]+Rt.h[i]+1)>>1);
}
Rd=vnavgh(Rt,Rs) for (i=0;i<2;i++) {
Rd.h[i]=((Rt.h[i]-Rs.h[i])>>1);
}

Class: ALU32 (slots 0,1,2,3)

Intrinsics

Rd=vavgh(Rs,Rt) Word32 Q6_R_vavgh_RR(Word32 Rs, Word32 Rt)


Rd=vavgh(Rs,Rt):rnd Word32 Q6_R_vavgh_RR_rnd(Word32 Rs, Word32
Rt)
Rd=vnavgh(Rt,Rs) Word32 Q6_R_vnavgh_RR(Word32 Rt, Word32 Rs)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS P MajOp MinOp s5 Parse t5 d5
1 1 1 1 0 1 1 1 - 0 0 s s s s s P P - t t t t t - - - d d d d d Rd=vavgh(Rs,Rt)
1 1 1 1 0 1 1 1 - 0 1 s s s s s P P - t t t t t - - - d d d d d Rd=vavgh(Rs,Rt):rnd
1 1 1 1 0 1 1 1 - 1 1 s s s s s P P - t t t t t - - - d d d d d Rd=vnavgh(Rt,Rs)

Field name Description


MajOp Major opcode
MinOp Minor opcode
P Predicated
ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 169


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector subtract halfwords


Subtract each of the two halfwords in 32-bit vector Rs from the corresponding halfword in vector
Rt. Optionally, saturate each 16-bit addition to either a signed or unsigned 16-bit value.
Applying saturation to the vsubh instruction clamps the result to the signed range 0x8000 to
0x7fff, whereas applying saturation to vsubuh ensures that the unsigned result is in the range 0 to
0xffff. When saturation is not needed, use vsubh.

Syntax Behavior
Rd=vsubh(Rt,Rs)[:sat] for (i=0;i<2;i++) {
Rd.h[i]=[sat16](Rt.h[i]-Rs.h[i]);
}
Rd=vsubuh(Rt,Rs):sat for (i=0;i<2;i++) {
Rd.h[i]=usat16(Rt.uh[i]-Rs.uh[i]);
}

Class: ALU32 (slots 0,1,2,3)

Notes

■ If saturation occurs during execution of this instruction (a result is clamped to either


maximum or minimum values), the OVF bit in the status register is set. OVF remains set until
explicitly cleared by a transfer to the status register.

Intrinsics

Rd=vsubh(Rt,Rs) Word32 Q6_R_vsubh_RR(Word32 Rt, Word32 Rs)


Rd=vsubh(Rt,Rs):sat Word32 Q6_R_vsubh_RR_sat(Word32 Rt, Word32 Rs)
Rd=vsubuh(Rt,Rs):sat Word32 Q6_R_vsubuh_RR_sat(Word32 Rt, Word32 Rs)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS P MajOp MinOp s5 Parse t5 d5
1 1 1 1 0 1 1 0 1 0 0 s s s s s P P - t t t t t - - - d d d d d Rd=vsubh(Rt,Rs)
1 1 1 1 0 1 1 0 1 0 1 s s s s s P P - t t t t t - - - d d d d d Rd=vsubh(Rt,Rs):sat
1 1 1 1 0 1 1 0 1 1 1 s s s s s P P - t t t t t - - - d d d d d Rd=vsubuh(Rt,Rs):sat

Field name Description


MajOp Major opcode
MinOp Minor opcode
P Predicated
ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 170


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Zero extend
Zero-extend the least significant byte or halfword from Rs and place the 32-bit result in Rd.
Rd=zxth(Rs) Rd=zxtb(Rs)

Rs Rs

0x0000 Rd 0x000000 Rd

Syntax Behavior
Rd=zxtb(Rs) Assembler mapped to: "Rd=and(Rs,#255)"
Rd=zxth(Rs) Rd = zxt16->32(Rs);

Class: ALU32 (slots 0,1,2,3)

Intrinsics

Rd=zxtb(Rs) Word32 Q6_R_zxtb_R(Word32 Rs)


Rd=zxth(Rs) Word32 Q6_R_zxth_R(Word32 Rs)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS Rs MajOp MinOp s5 Parse C d5
0 1 1 1 0 0 0 0 1 1 0 s s s s s P P 0 - - - - - - - - d d d d d Rd=zxth(Rs)

Field name Description


MajOp Major opcode
MinOp Minor opcode
Rs No Rs read
C Conditional
ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s

80-N2040-53 Rev. AA 171


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

11.1.2 ALU32 PERM


The ALU32 PERM instruction subclass includes instructions that rearrange or perform format
conversion on vector data types.

Combine words into doubleword


Combine halfwords or words into larger values.
In a halfword combine, either the high or low halfword of the first source register is transferred to
the most-significant halfword of the destination register, while either the high or low halfword of
the second source register is transferred to the least-significant halfword of the destination
register. Source and destination registers are 32 bits.
In a word combine, the first source register is transferred to the most-significant word of the
destination register, while the second source register is transferred to the least-significant word of
the destination register. Source registers are 32 bits and the destination register is 64 bits.
In a variant of word combine, signed 8-bit immediate values (instead of registers) are transferred
to the most- and least-significant words of the 64-bit destination register. Optionally, one of the
immediate values can be 32 bits.

Rd=combine(Rt.[HL],Rs.[HL])

Rt.H Rt.L Rs.H Rs.L


Rdd=combine(Rs,Rt)

Mux Mux Rs Rt

Rd Rdd

Syntax Behavior
Rd=combine(Rt.[HL],Rs.[HL]) Rd = (Rt.uh[01]<<16) | Rs.uh[01];
Rdd=combine(#s8,#S8) apply_extension(#s);
Rdd.w[0]=#S;
Rdd.w[1]=#s;
Rdd=combine(#s8,#U6) apply_extension(#U);
Rdd.w[0]=#U;
Rdd.w[1]=#s;
Rdd=combine(#s8,Rs) apply_extension(#s);
Rdd.w[0]=Rs;
Rdd.w[1]=#s;
Rdd=combine(Rs,#s8) apply_extension(#s);
Rdd.w[0]=#s;
Rdd.w[1]=Rs;

80-N2040-53 Rev. AA 172


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Syntax Behavior
Rdd=combine(Rs,Rt) Rdd.w[0]=Rt;
Rdd.w[1]=Rs;

Class: ALU32 (slots 0,1,2,3)

Intrinsics

Rd=combine(Rt.H,Rs.H) Word32 Q6_R_combine_RhRh(Word32 Rt, Word32 Rs)


Rd=combine(Rt.H,Rs.L) Word32 Q6_R_combine_RhRl(Word32 Rt, Word32 Rs)
Rd=combine(Rt.L,Rs.H) Word32 Q6_R_combine_RlRh(Word32 Rt, Word32 Rs)
Rd=combine(Rt.L,Rs.L) Word32 Q6_R_combine_RlRl(Word32 Rt, Word32 Rs)
Rdd=combine(#s8,#S8) Word64 Q6_P_combine_II(Word32 Is8, Word32 IS8)
Rdd=combine(#s8,Rs) Word64 Q6_P_combine_IR(Word32 Is8, Word32 Rs)
Rdd=combine(Rs,#s8) Word64 Q6_P_combine_RI(Word32 Rs, Word32 Is8)
Rdd=combine(Rs,Rt) Word64 Q6_P_combine_RR(Word32 Rs, Word32 Rt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS Rs MajOp MinOp s5 Parse d5
0 1 1 1 0 0 1 1 - 0 0 s s s s s P P 1 i i i i i i i i d d d d d Rdd=combine(Rs,#s8)
0 1 1 1 0 0 1 1 - 0 1 s s s s s P P 1 i i i i i i i i d d d d d Rdd=combine(#s8,Rs)
ICLASS Rs MajOp MinOp Parse d5
0 1 1 1 1 1 0 0 0 I I I I I I I P P I i i i i i i i i d d d d d Rdd=combine(#s8,#S8)
0 1 1 1 1 1 0 0 1 - - I I I I I P P I i i i i i i i i d d d d d Rdd=combine(#s8,#U6)
ICLASS P MajOp MinOp s5 Parse t5 d5
1 1 1 1 0 0 1 1 1 0 0 s s s s s P P - t t t t t - - - d d d d d Rd=combine(Rt.H,Rs.H)
1 1 1 1 0 0 1 1 1 0 1 s s s s s P P - t t t t t - - - d d d d d Rd=combine(Rt.H,Rs.L)
1 1 1 1 0 0 1 1 1 1 0 s s s s s P P - t t t t t - - - d d d d d Rd=combine(Rt.L,Rs.H)
1 1 1 1 0 0 1 1 1 1 1 s s s s s P P - t t t t t - - - d d d d d Rd=combine(Rt.L,Rs.L)
1 1 1 1 0 1 0 1 0 - - s s s s s P P - t t t t t - - - d d d d d Rdd=combine(Rs,Rt)

Field name Description


MajOp Major opcode
MinOp Minor opcode
Rs No Rs read
P Predicated
ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 173


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Mux
Select between two source registers based on the least-significant bit of a predicate register. If the
bit is 1, transfer the first source register to the destination register; otherwise, transfer the second
source register. Source and destination registers are 32 bits.
In a variant of mux, signed 8-bit immediate values are used instead of registers for either or both
source operands.
For 64-bit versions of this instruction, see the XTYPE vmux (Vector mux) instruction.

Syntax Behavior
Rd=mux(Pu,#s8,#S8) PREDUSE_TIMING;
apply_extension(#s);
Rd = (Pu[0] ? #s : #S);
Rd=mux(Pu,#s8,Rs) PREDUSE_TIMING;
apply_extension(#s);
Rd = (Pu[0] ? #s : Rs);
Rd=mux(Pu,Rs,#s8) PREDUSE_TIMING;
apply_extension(#s);
Rd = (Pu[0] ? Rs : #s);
Rd=mux(Pu,Rs,Rt) PREDUSE_TIMING;
Rd = (Pu[0] ? Rs : Rt);

Class: ALU32 (slots 0,1,2,3)

Intrinsics

Rd=mux(Pu,#s8,#S8) Word32 Q6_R_mux_pII(Byte Pu, Word32 Is8, Word32


IS8)
Rd=mux(Pu,#s8,Rs) Word32 Q6_R_mux_pIR(Byte Pu, Word32 Is8, Word32 Rs)
Rd=mux(Pu,Rs,#s8) Word32 Q6_R_mux_pRI(Byte Pu, Word32 Rs, Word32 Is8)
Rd=mux(Pu,Rs,Rt) Word32 Q6_R_mux_pRR(Byte Pu, Word32 Rs, Word32 Rt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS Rs MajOp u2 s5 Parse d5
0 1 1 1 0 0 1 1 0 u u s s s s s P P 0 i i i i i i i i d d d d d Rd=mux(Pu,Rs,#s8)
0 1 1 1 0 0 1 1 1 u u s s s s s P P 0 i i i i i i i i d d d d d Rd=mux(Pu,#s8,Rs)
ICLASS Rs u1 Parse d5
0 1 1 1 1 0 1 u u I I I I I I I P P I i i i i i i i i d d d d d Rd=mux(Pu,#s8,#S8)
ICLASS P MajOp s5 Parse t5 u2 d5
1 1 1 1 0 1 0 0 - - - s s s s s P P - t t t t t - u u d d d d d Rd=mux(Pu,Rs,Rt)

80-N2040-53 Rev. AA 174


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Field name Description


MajOp Major opcode
MinOp Minor opcode
Rs No Rs read
P Predicated
ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
u1 Field to encode register u
u2 Field to encode register u

80-N2040-53 Rev. AA 175


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Shift word by 16
ASLH performs an arithmetic left shift of the 32-bit source register by 16 bits (one halfword). The
lower 16 bits of the destination are zero-filled.

Rs

0x0000 Rd

ASRH performs an arithmetic right shift of the 32-bit source register by 16 bits (one halfword).
The upper 16 bits of the destination are sign-extended.

Rs

Sign-extend Rd

Syntax Behavior
Rd=aslh(Rs) Rd=Rs<<16;
Rd=asrh(Rs) Rd=Rs>>16;

Class: ALU32 (slots 0,1,2,3)

Intrinsics

Rd=aslh(Rs) Word32 Q6_R_aslh_R(Word32 Rs)


Rd=asrh(Rs) Word32 Q6_R_asrh_R(Word32 Rs)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS Rs MajOp MinOp s5 Parse C d5
0 1 1 1 0 0 0 0 0 0 0 s s s s s P P 0 - - - - - - - - d d d d d Rd=aslh(Rs)
0 1 1 1 0 0 0 0 0 0 1 s s s s s P P 0 - - - - - - - - d d d d d Rd=asrh(Rs)

Field name Description


MajOp Major opcode
MinOp Minor opcode
Rs No Rs read
C Conditional
ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s

80-N2040-53 Rev. AA 176


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Pack high and low halfwords


Pack together the most-significant halfwords from Rs and Rt into the most-significant word of
register pair Rdd, and the least-significant halfwords from Rs and Rt into the least-significant
halfword of Rdd.
Rs Rt

Rdd

Syntax Behavior
Rdd=packhl(Rs,Rt) Rdd.h[0]=Rt.h[0];
Rdd.h[1]=Rs.h[0];
Rdd.h[2]=Rt.h[1];
Rdd.h[3]=Rs.h[1];

Class: ALU32 (slots 0,1,2,3)

Intrinsics

Rdd=packhl(Rs,Rt) Word64 Q6_P_packhl_RR(Word32 Rs, Word32 Rt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS P MajOp MinOp s5 Parse t5 d5
1 1 1 1 0 1 0 1 1 - - s s s s s P P - t t t t t - - - d d d d d Rdd=packhl(Rs,Rt)

Field name Description


MajOp Major opcode
MinOp Minor opcode
P Predicated
ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 177


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

11.1.3 ALU32 PRED


The ALU32 PRED instruction subclass includes instructions that perform conditional arithmetic
and logical operations based on the values stored in a predicate register, and which produce
predicate results. They are executable on any slot.

Conditional add
If the least-significant bit of predicate Pu is set, add a 32-bit source register to either another
register or an immediate value. The result is placed in 32-bit destination register. If the predicate
is false, the instruction does nothing.
Syntax Behavior
if ([!]Pu[.new]) if([!]Pu[.new][0]){
Rd=add(Rs,#s8) apply_extension(#s);
Rd=Rs+#s;
} else {
NOP;
}
if ([!]Pu[.new]) if([!]Pu[.new][0]){
Rd=add(Rs,Rt) Rd=Rs+Rt;
} else {
NOP;
}

Class: ALU32 (slots 0,1,2,3)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

ICLASS Rs MajOp PS u2 s5 Parse D d5


N
0 1 1 1 0 1 0 0 0 u u s s s s s P P 0 i i i i i i i i d d d d d if (Pu) Rd=add(Rs,#s8)
0 1 1 1 0 1 0 0 0 u u s s s s s P P 1 i i i i i i i i d d d d d if (Pu.new) Rd=add(Rs,#s8)
0 1 1 1 0 1 0 0 1 u u s s s s s P P 0 i i i i i i i i d d d d d if (!Pu) Rd=add(Rs,#s8)

d if
(!Pu.new)
0 1 1 1 0 1 0 0 1 u u s s s s s P P 1 i i i i i i i i d d d d
Rd=add(Rs,#s8)

ICLASS P MajOp MinOp s5 Parse D t5 PS u2 d5


N
1 1 1 1 1 0 1 1 0 - 0 s s s s s P P 0 t t t t t 0 u u d d d d d if (Pu) Rd=add(Rs,Rt)
1 1 1 1 1 0 1 1 0 - 0 s s s s s P P 0 t t t t t 1 u u d d d d d if (!Pu) Rd=add(Rs,Rt)
1 1 1 1 1 0 1 1 0 - 0 s s s s s P P 1 t t t t t 0 u u d d d d d if (Pu.new) Rd=add(Rs,Rt)
1 1 1 1 1 0 1 1 0 - 0 s s s s s P P 1 t t t t t 1 u u d d d d d if (!Pu.new) Rd=add(Rs,Rt)

Field name Description


MajOp Major opcode
MinOp Minor opcode
Rs No Rs read
DN Dot-new
PS Predicate sense
P Predicated

80-N2040-53 Rev. AA 178


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
u2 Field to encode register u

80-N2040-53 Rev. AA 179


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Conditional shift halfword


Conditionally shift a halfword.
The aslh instruction performs an arithmetic left shift of the 32-bit source register by 16 bits (one
halfword). The lower 16 bits of the destination are zero-filled.

Rs

0x0000 Rd

The asrh instruction performs an arithmetic right shift of the 32-bit source register by 16 bits (one
halfword). The upper 16 bits of the destination are sign-extended.

Rs

Sign-extend Rd

Syntax Behavior
if ([!]Pu[.new]) Rd=aslh(Rs) if([!]Pu[.new][0]){
Rd=Rs<<16;
} else {
NOP;
}
if ([!]Pu[.new]) Rd=asrh(Rs) if([!]Pu[.new][0]){
Rd=Rs>>16;
} else {
NOP;
}

Class: ALU32 (slots 0,1,2,3)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS Rs MajOp MinOp s5 Parse C S dn u2 d5
0 1 1 1 0 0 0 0 0 0 0 s s s s s P P 1 - 0 0 u u - - - d d d d d if (Pu) Rd=aslh(Rs)
0 1 1 1 0 0 0 0 0 0 0 s s s s s P P 1 - 0 1 u u - - - d d d d d if (Pu.new) Rd=aslh(Rs)
0 1 1 1 0 0 0 0 0 0 0 s s s s s P P 1 - 1 0 u u - - - d d d d d if (!Pu) Rd=aslh(Rs)
0 1 1 1 0 0 0 0 0 0 0 s s s s s P P 1 - 1 1 u u - - - d d d d d if (!Pu.new) Rd=aslh(Rs)
0 1 1 1 0 0 0 0 0 0 1 s s s s s P P 1 - 0 0 u u - - - d d d d d if (Pu) Rd=asrh(Rs)
0 1 1 1 0 0 0 0 0 0 1 s s s s s P P 1 - 0 1 u u - - - d d d d d if (Pu.new) Rd=asrh(Rs)
0 1 1 1 0 0 0 0 0 0 1 s s s s s P P 1 - 1 0 u u - - - d d d d d if (!Pu) Rd=asrh(Rs)
0 1 1 1 0 0 0 0 0 0 1 s s s s s P P 1 - 1 1 u u - - - d d d d d if (!Pu.new) Rd=asrh(Rs)

80-N2040-53 Rev. AA 180


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Field name Description


MajOp Major opcode
MinOp Minor opcode
Rs No Rs read
C Conditional
S Predicate sense
dn Dot-new
ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
u2 Field to encode register u

80-N2040-53 Rev. AA 181


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Conditional combine
If the least-significant bit of predicate Pu is set, the most-significant word of destination Rdd is
taken from the first source register Rs, while the least-significant word is taken from the second
source register Rt. If the predicate is false, this instruction does nothing.

Syntax Behavior
if ([!]Pu[.new]) if ([!]Pu[.new][0]) {
Rdd=combine(Rs,Rt) Rdd.w[0]=Rt;
Rdd.w[1]=Rs;
} else {
NOP;
}

Class: ALU32 (slots 0,1,2,3)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

ICLASS P MajOp MinOp s5 Parse D t5 PS u2 d5


N
1 1 1 1 1 1 0 1 0 0 0 s s s s s P P 0 t t t t t 0 u u d d d d d if (Pu) Rdd=combine(Rs,Rt)

1 1 1 1 1 1 0 1 0 0 0 s s s s s P P 0 t t t t t 1 u u d d d d d if (!Pu)
Rdd=combine(Rs,Rt)
if (Pu.new)
1 1 1 1 1 1 0 1 0 0 0 s s s s s P P 1 t t t t t 0 u u d d d d d Rdd=combine(Rs,Rt)

1 1 1 1 1 1 0 1 0 0 0 s s s s s P P 1 t t t t t 1 u u d d d d d if (!Pu.new)
Rdd=combine(Rs,Rt)

Field name Description


DN Dot-new
MajOp Major opcode
MinOp Minor opcode
P Predicated
PS Predicate sense
ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
u2 Field to encode register u

80-N2040-53 Rev. AA 182


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Conditional logical operations


If the least-significant bit of predicate Pu is set, do a logical operation on the source values. The
result is placed in 32-bit destination register. If the predicate is false, the instruction does nothing.

Syntax Behavior
if ([!]Pu[.new]) if([!]Pu[.new][0]){
Rd=and(Rs,Rt) Rd=Rs&Rt;
} else {
NOP;
}
if ([!]Pu[.new]) if([!]Pu[.new][0]){
Rd=or(Rs,Rt) Rd=Rs|Rt;
} else {
NOP;
}
if ([!]Pu[.new]) if([!]Pu[.new][0]){
Rd=xor(Rs,Rt) Rd=Rs^Rt;
} else {
NOP;
}

Class: ALU32 (slots 0,1,2,3)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
D
ICLASS P MajOp MinOp s5 Parse N t5 PS u2 d5

1 1 1 1 1 0 0 1 - 0 0 s s s s s P P 0 t t t t t 0 u u d d d d d if (Pu) Rd=and(Rs,Rt)
1 1 1 1 1 0 0 1 - 0 0 s s s s s P P 0 t t t t t 1 u u d d d d d if (!Pu) Rd=and(Rs,Rt)
1 1 1 1 1 0 0 1 - 0 0 s s s s s P P 1 t t t t t 0 u u d d d d d if (Pu.new) Rd=and(Rs,Rt)
1 1 1 1 1 0 0 1 - 0 0 s s s s s P P 1 t t t t t 1 u u d d d d d if (!Pu.new) Rd=and(Rs,Rt)
1 1 1 1 1 0 0 1 - 0 1 s s s s s P P 0 t t t t t 0 u u d d d d d if (Pu) Rd=or(Rs,Rt)
1 1 1 1 1 0 0 1 - 0 1 s s s s s P P 0 t t t t t 1 u u d d d d d if (!Pu) Rd=or(Rs,Rt)
1 1 1 1 1 0 0 1 - 0 1 s s s s s P P 1 t t t t t 0 u u d d d d d if (Pu.new) Rd=or(Rs,Rt)
1 1 1 1 1 0 0 1 - 0 1 s s s s s P P 1 t t t t t 1 u u d d d d d if (!Pu.new) Rd=or(Rs,Rt)
1 1 1 1 1 0 0 1 - 1 1 s s s s s P P 0 t t t t t 0 u u d d d d d if (Pu) Rd=xor(Rs,Rt)
1 1 1 1 1 0 0 1 - 1 1 s s s s s P P 0 t t t t t 1 u u d d d d d if (!Pu) Rd=xor(Rs,Rt)
1 1 1 1 1 0 0 1 - 1 1 s s s s s P P 1 t t t t t 0 u u d d d d d if (Pu.new) Rd=xor(Rs,Rt)
1 1 1 1 1 0 0 1 - 1 1 s s s s s P P 1 t t t t t 1 u u d d d d d if (!Pu.new) Rd=xor(Rs,Rt)

Field name Description


DN Dot-new
MajOp Major opcode
MinOp Minor opcode
P Predicated
PS Predicate sense
ICLASS Instruction class
Parse Packet/loop parse bits

80-N2040-53 Rev. AA 183


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Field name Description


d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
u2 Field to encode register u

80-N2040-53 Rev. AA 184


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Conditional subtract
If the least-significant bit of predicate Pu is set, subtract a 32-bit source register Rt from register
Rs. The result is placed in a 32-bit destination register. If the predicate is false, the instruction
does nothing.

Syntax Behavior
if ([!]Pu[.new]) if([!]Pu[.new][0]){
Rd=sub(Rt,Rs) Rd=Rt-Rs;
} else {
NOP;
}

Class: ALU32 (slots 0,1,2,3)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

ICLASS P MajOp MinOp s5 Parse D t5 PS u2 d5


N
1 1 1 1 1 0 1 1 0 - 1 s s s s s P P 0 t t t t t 0 u u d d d d d if (Pu) Rd=sub(Rt,Rs)
1 1 1 1 1 0 1 1 0 - 1 s s s s s P P 0 t t t t t 1 u u d d d d d if (!Pu) Rd=sub(Rt,Rs)
1 1 1 1 1 0 1 1 0 - 1 s s s s s P P 1 t t t t t 0 u u d d d d d if (Pu.new) Rd=sub(Rt,Rs)
1 1 1 1 1 0 1 1 0 - 1 s s s s s P P 1 t t t t t 1 u u d d d d d if (!Pu.new) Rd=sub(Rt,Rs)

Field name Description


DN Dot-new
MajOp Major opcode
MinOp Minor opcode
P Predicated
PS Predicate sense
ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
u2 Field to encode register u

80-N2040-53 Rev. AA 185


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Conditional sign extend


Conditionally sign-extend the least-significant byte or halfword from Rs and put the 32-bit result
in Rd.
Rd=sxth(Rs) Rd=sxtb(Rs)

Rs Rs

Sign-extend Rd Sign-extend Rd

Syntax Behavior
if ([!]Pu[.new]) Rd=sxtb(Rs) if([!]Pu[.new][0]){
Rd=sxt8->32(Rs);
} else {
NOP;
}
if ([!]Pu[.new]) Rd=sxth(Rs) if([!]Pu[.new][0]){
Rd=sxt16->32(Rs);
} else {
NOP;
}

Class: ALU32 (slots 0,1,2,3)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS Rs MajOp MinOp s5 Parse C S dn u2 d5
0 1 1 1 0 0 0 0 1 0 1 s s s s s P P 1 - 0 0 u u - - - d d d d d if (Pu) Rd=sxtb(Rs)
0 1 1 1 0 0 0 0 1 0 1 s s s s s P P 1 - 0 1 u u - - - d d d d d if (Pu.new) Rd=sxtb(Rs)
0 1 1 1 0 0 0 0 1 0 1 s s s s s P P 1 - 1 0 u u - - - d d d d d if (!Pu) Rd=sxtb(Rs)
0 1 1 1 0 0 0 0 1 0 1 s s s s s P P 1 - 1 1 u u - - - d d d d d if (!Pu.new) Rd=sxtb(Rs)
0 1 1 1 0 0 0 0 1 1 1 s s s s s P P 1 - 0 0 u u - - - d d d d d if (Pu) Rd=sxth(Rs)
0 1 1 1 0 0 0 0 1 1 1 s s s s s P P 1 - 0 1 u u - - - d d d d d if (Pu.new) Rd=sxth(Rs)
0 1 1 1 0 0 0 0 1 1 1 s s s s s P P 1 - 1 0 u u - - - d d d d d if (!Pu) Rd=sxth(Rs)
0 1 1 1 0 0 0 0 1 1 1 s s s s s P P 1 - 1 1 u u - - - d d d d d if (!Pu.new) Rd=sxth(Rs)

Field name Description


MajOp Major opcode
MinOp Minor opcode
Rs No Rs read
C Conditional
S Predicate sense
dn Dot-new
ICLASS Instruction class

80-N2040-53 Rev. AA 186


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Field name Description


Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
u2 Field to encode register u

80-N2040-53 Rev. AA 187


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Conditional transfer
If the LSB of predicate Pu is set, transfer register Rs or a signed immediate into destination Rd. If
the predicate is false, this instruction does nothing.

Syntax Behavior
if ([!]Pu[.new]) apply_extension(#s);
Rd=#s12 if ([!]Pu[.new][0]) Rd=#s;
else NOP;
if ([!]Pu[.new]) Rd=Rs Assembler mapped to: "if ([!]Pu[.new]) Rd=add(Rs,#0)"
if ([!]Pu[.new]) Assembler mapped to: "if ([!]Pu[.new])
Rdd=Rss Rdd=combine(Rss.H32,Rss.L32)"

Class: ALU32 (slots 0,1,2,3)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
D
ICLASS Rs MajOp PS u2 Parse N d5

0 1 1 1 1 1 1 0 0 u u 0 i i i i P P 0 i i i i i i i i d d d d d if (Pu) Rd=#s12
0 1 1 1 1 1 1 0 0 u u 0 i i i i P P 1 i i i i i i i i d d d d d if (Pu.new) Rd=#s12
0 1 1 1 1 1 1 0 1 u u 0 i i i i P P 0 i i i i i i i i d d d d d if (!Pu) Rd=#s12
0 1 1 1 1 1 1 0 1 u u 0 i i i i P P 1 i i i i i i i i d d d d d if (!Pu.new) Rd=#s12

Field name Description


MajOp Major opcode
MinOp Minor opcode
Rs No Rs read
DN Dot-new
PS Predicate sense
ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
u2 Field to encode register u

80-N2040-53 Rev. AA 188


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Conditional zero extend


Conditionally zero-extend the least-significant byte or halfword from Rs and put the 32-bit result
in Rd.
Rd=zxth(Rs) Rd=zxtb(Rs)

Rs Rs

0x0000 Rd 0x000000 Rd

Syntax Behavior
if ([!]Pu[.new]) Rd=zxtb(Rs) if([!]Pu[.new][0]){
Rd=zxt8->32(Rs);
} else {
NOP;
}
if ([!]Pu[.new]) Rd=zxth(Rs) if([!]Pu[.new][0]){
Rd=zxt16->32(Rs);
} else {
NOP;
}

Class: ALU32 (slots 0,1,2,3)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS Rs MajOp MinOp s5 Parse C S dn u2 d5
0 1 1 1 0 0 0 0 1 0 0 s s s s s P P 1 - 0 0 u u - - - d d d d d if (Pu) Rd=zxtb(Rs)
0 1 1 1 0 0 0 0 1 0 0 s s s s s P P 1 - 0 1 u u - - - d d d d d if (Pu.new) Rd=zxtb(Rs)
0 1 1 1 0 0 0 0 1 0 0 s s s s s P P 1 - 1 0 u u - - - d d d d d if (!Pu) Rd=zxtb(Rs)
0 1 1 1 0 0 0 0 1 0 0 s s s s s P P 1 - 1 1 u u - - - d d d d d if (!Pu.new) Rd=zxtb(Rs)
0 1 1 1 0 0 0 0 1 1 0 s s s s s P P 1 - 0 0 u u - - - d d d d d if (Pu) Rd=zxth(Rs)
0 1 1 1 0 0 0 0 1 1 0 s s s s s P P 1 - 0 1 u u - - - d d d d d if (Pu.new) Rd=zxth(Rs)
0 1 1 1 0 0 0 0 1 1 0 s s s s s P P 1 - 1 0 u u - - - d d d d d if (!Pu) Rd=zxth(Rs)
0 1 1 1 0 0 0 0 1 1 0 s s s s s P P 1 - 1 1 u u - - - d d d d d if (!Pu.new) Rd=zxth(Rs)

Field name Description


MajOp Major opcode
MinOp Minor opcode
Rs No Rs read
C Conditional
S Predicate sense
dn Dot-new
ICLASS Instruction class
Parse Packet/loop parse bits

80-N2040-53 Rev. AA 189


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Field name Description


d5 Field to encode register d
s5 Field to encode register s
u2 Field to encode register u

80-N2040-53 Rev. AA 190


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Compare
The register form compares two 32-bit registers for unsigned greater than, greater than, or equal.
The immediate form compares a register against a signed or unsigned immediate value. The 8-bit
predicate register Pd is set to all 1's or all 0's depending on the result. For 64-bit versions of this
instruction, see the XTYPE compare instructions.

Syntax Behavior
Pd=[!]cmp.eq(Rs,#s10) apply_extension(#s);
Pd=Rs[!]=#s ? 0xff : 0x00;
Pd=[!]cmp.eq(Rs,Rt) Pd=Rs[!]=Rt ? 0xff : 0x00;
Pd=[!]cmp.gt(Rs,#s10) apply_extension(#s);
Pd=Rs<=#s ? 0xff : 0x00;
Pd=[!]cmp.gt(Rs,Rt) Pd=Rs<=Rt ? 0xff : 0x00;
Pd=[!]cmp.gtu(Rs,#u9) apply_extension(#u);
Pd=Rs.uw[0]<=#u.uw[0] ? 0xff : 0x00;
Pd=[!]cmp.gtu(Rs,Rt) Pd=Rs.uw[0]<=Rt.uw[0] ? 0xff : 0x00;
Pd=cmp.ge(Rs,#s8) Assembler mapped to: "Pd=cmp.gt(Rs,#s8-1)"
Pd=cmp.geu(Rs,#u8) if ("#u8==0") {
Assembler mapped to:
"Pd=cmp.eq(Rs,Rs)";
} else {
Assembler mapped to:
"Pd=cmp.gtu(Rs,#u8-1)";
}
Pd=cmp.lt(Rs,Rt) Assembler mapped to: "Pd=cmp.gt(Rt,Rs)"
Pd=cmp.ltu(Rs,Rt) Assembler mapped to: "Pd=cmp.gtu(Rt,Rs)"

Class: ALU32 (slots 0,1,2,3)

Intrinsics

Pd=!cmp.eq(Rs,#s10) Byte Q6_p_not_cmp_eq_RI(Word32 Rs, Word32 Is10)


Pd=!cmp.eq(Rs,Rt) Byte Q6_p_not_cmp_eq_RR(Word32 Rs, Word32 Rt)
Pd=!cmp.gt(Rs,#s10) Byte Q6_p_not_cmp_gt_RI(Word32 Rs, Word32 Is10)
Pd=!cmp.gt(Rs,Rt) Byte Q6_p_not_cmp_gt_RR(Word32 Rs, Word32 Rt)
Pd=!cmp.gtu(Rs,#u9) Byte Q6_p_not_cmp_gtu_RI(Word32 Rs, Word32 Iu9)
Pd=!cmp.gtu(Rs,Rt) Byte Q6_p_not_cmp_gtu_RR(Word32 Rs, Word32 Rt)
Pd=cmp.eq(Rs,#s10) Byte Q6_p_cmp_eq_RI(Word32 Rs, Word32 Is10)
Pd=cmp.eq(Rs,Rt) Byte Q6_p_cmp_eq_RR(Word32 Rs, Word32 Rt)
Pd=cmp.ge(Rs,#s8) Byte Q6_p_cmp_ge_RI(Word32 Rs, Word32 Is8)
Pd=cmp.geu(Rs,#u8) Byte Q6_p_cmp_geu_RI(Word32 Rs, Word32 Iu8)
Pd=cmp.gt(Rs,#s10) Byte Q6_p_cmp_gt_RI(Word32 Rs, Word32 Is10)

80-N2040-53 Rev. AA 191


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Pd=cmp.gt(Rs,Rt) Byte Q6_p_cmp_gt_RR(Word32 Rs, Word32 Rt)


Pd=cmp.gtu(Rs,#u9) Byte Q6_p_cmp_gtu_RI(Word32 Rs, Word32 Iu9)
Pd=cmp.gtu(Rs,Rt) Byte Q6_p_cmp_gtu_RR(Word32 Rs, Word32 Rt)
Pd=cmp.lt(Rs,Rt) Byte Q6_p_cmp_lt_RR(Word32 Rs, Word32 Rt)
Pd=cmp.ltu(Rs,Rt) Byte Q6_p_cmp_ltu_RR(Word32 Rs, Word32 Rt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS Rs MajOp MinOp s5 Parse d2
0 1 1 1 0 1 0 1 0 0 i s s s s s P P i i i i i i i i i 0 0 0 d d Pd=cmp.eq(Rs,#s10)
0 1 1 1 0 1 0 1 0 0 i s s s s s P P i i i i i i i i i 1 0 0 d d Pd=!cmp.eq(Rs,#s10)
0 1 1 1 0 1 0 1 0 1 i s s s s s P P i i i i i i i i i 0 0 0 d d Pd=cmp.gt(Rs,#s10)
0 1 1 1 0 1 0 1 0 1 i s s s s s P P i i i i i i i i i 1 0 0 d d Pd=!cmp.gt(Rs,#s10)
0 1 1 1 0 1 0 1 1 0 0 s s s s s P P i i i i i i i i i 0 0 0 d d Pd=cmp.gtu(Rs,#u9)
0 1 1 1 0 1 0 1 1 0 0 s s s s s P P i i i i i i i i i 1 0 0 d d Pd=!cmp.gtu(Rs,#u9)
ICLASS P MajOp MinOp s5 Parse t5 d2
1 1 1 1 0 0 1 0 - 0 0 s s s s s P P - t t t t t - - - 0 0 0 d d Pd=cmp.eq(Rs,Rt)
1 1 1 1 0 0 1 0 - 0 0 s s s s s P P - t t t t t - - - 1 0 0 d d Pd=!cmp.eq(Rs,Rt)
1 1 1 1 0 0 1 0 - 1 0 s s s s s P P - t t t t t - - - 0 0 0 d d Pd=cmp.gt(Rs,Rt)
1 1 1 1 0 0 1 0 - 1 0 s s s s s P P - t t t t t - - - 1 0 0 d d Pd=!cmp.gt(Rs,Rt)
1 1 1 1 0 0 1 0 - 1 1 s s s s s P P - t t t t t - - - 0 0 0 d d Pd=cmp.gtu(Rs,Rt)
1 1 1 1 0 0 1 0 - 1 1 s s s s s P P - t t t t t - - - 1 0 0 d d Pd=!cmp.gtu(Rs,Rt)

Field name Description


MajOp Major opcode
MinOp Minor opcode
Rs No Rs read
P Predicated
ICLASS Instruction class
Parse Packet/loop parse bits
d2 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 192


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Compare to general register


The register form compares two 32-bit registers for unsigned greater than, greater than, or equal.
The immediate form compares a register against a signed or unsigned immediate value. The
resulting zero or one is placed in a general register.

Syntax Behavior
Rd=[!]cmp.eq(Rs,#s8) apply_extension(#s);
Rd=(Rs[!]=#s);
Rd=[!]cmp.eq(Rs,Rt) Rd=(Rs[!]=Rt);

Class: ALU32 (slots 0,1,2,3)

Intrinsics

Rd=!cmp.eq(Rs,#s8) Word32 Q6_R_not_cmp_eq_RI(Word32 Rs, Word32 Is8)


Rd=!cmp.eq(Rs,Rt) Word32 Q6_R_not_cmp_eq_RR(Word32 Rs, Word32 Rt)
Rd=cmp.eq(Rs,#s8) Word32 Q6_R_cmp_eq_RI(Word32 Rs, Word32 Is8)
Rd=cmp.eq(Rs,Rt) Word32 Q6_R_cmp_eq_RR(Word32 Rs, Word32 Rt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS Rs MajOp MinOp s5 Parse d5
0 1 1 1 0 0 1 1 - 1 0 s s s s s P P 1 i i i i i i i i d d d d d Rd=cmp.eq(Rs,#s8)
0 1 1 1 0 0 1 1 - 1 1 s s s s s P P 1 i i i i i i i i d d d d d Rd=!cmp.eq(Rs,#s8)
ICLASS P MajOp MinOp s5 Parse t5 d5
1 1 1 1 0 0 1 1 0 1 0 s s s s s P P - t t t t t - - - d d d d d Rd=cmp.eq(Rs,Rt)
1 1 1 1 0 0 1 1 0 1 1 s s s s s P P - t t t t t - - - d d d d d Rd=!cmp.eq(Rs,Rt)

Field name Description


MajOp Major opcode
MinOp Minor opcode
Rs No Rs read
P Predicated
ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 193


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

11.2 CR
The CR instruction class includes instructions which manage control registers, including hardware
looping, modulo addressing, and status flags.
CR instructions are executable on slot 3.

End loop instructions


The endloop instructions mark the end of a hardware loop. If the loop count (LC) register
indicates that a loop should continue to iterate, the LC register is decremented and the program
flow changes to the address in the start address (SA) register.
The endloopN instruction is actually a pseudo-instruction encoded in bits 15:14 of each
instruction. Therefore, no distinct 32-bit encoding exists for this instruction.
Syntax Behavior
endloop0 if (USR.LPCFG) {
if (USR.LPCFG==1) {
P3=0xff;
}
USR.LPCFG=USR.LPCFG-1;
}
if (LC0>1) {
PC=SA0;
LC0=LC0-1;
}
endloop01 if (USR.LPCFG) {
if (USR.LPCFG==1) {
P3=0xff;
}
USR.LPCFG=USR.LPCFG-1;
}
if (LC0>1) {
PC=SA0;
LC0=LC0-1;
} else {
if (LC1>1) {
PC=SA1;
LC1=LC1-1;
}
}
endloop1 if (LC1>1) {
PC=SA1;
LC1=LC1-1;
}

Class: N/A

Notes
■ This instruction cannot be grouped in a packet with any program flow instructions.

80-N2040-53 Rev. AA 194


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

■ The Next PC value is the address immediately following the last instruction in the packet
containing this instruction.
■ The PC value is the address of the start of the packet

80-N2040-53 Rev. AA 195


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Corner detection acceleration


The FASTCORNER9 instruction takes the Ps and Pt values and treats them as a circular bit string. If
any contiguous nine bits are set around the circle, the result is true, false otherwise. The sense
may be optionally inverted. This instruction is used to accelerate FAST corner detection.

Syntax Behavior
Pd=[!]fastcorner9(Ps,Pt) PREDUSE_TIMING;
tmp.h[0]=(Ps<<8)|Pt;
tmp.h[1]=(Ps<<8)|Pt;
for (i = 1; i < 9; i++) {
tmp &= tmp >> 1;
}
Pd = tmp == 0 ? 0xff : 0x00;

Class: CR (slot 2,3)

Notes

■ This instruction may execute on either slot2 or slot3, even though it is a CR-type

Intrinsics

Pd=!fastcorner9(Ps,Pt) Byte Q6_p_not_fastcorner9_pp(Byte Ps, Byte


Pt)
Pd=fastcorner9(Ps,Pt) Byte Q6_p_fastcorner9_pp(Byte Ps, Byte Pt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS sm s2 Parse t2 d2
0 1 1 0 1 0 1 1 0 0 0 0 - - s s P P 1 - - - t t 1 - - 1 - - d d Pd=fastcorner9(Ps,Pt)
0 1 1 0 1 0 1 1 0 0 0 1 - - s s P P 1 - - - t t 1 - - 1 - - d d Pd=!fastcorner9(Ps,Pt)

Field name Description


sm Supervisor mode only
ICLASS Instruction class
Parse Packet/loop parse bits
d2 Field to encode register d
s2 Field to encode register s
t2 Field to encode register t

80-N2040-53 Rev. AA 196


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Logical reductions on predicates


The any8 instruction sets a destination predicate register to 0xff if any of the low eight bits in
source predicate register Ps are set. Otherwise, the predicate is set to 0x00.
The all8 instruction sets a destination predicate register to 0xff if all of the low eight bits in the
source predicate register Ps are set. Otherwise, the predicate is set to 0x00.

Syntax Behavior
Pd=all8(Ps) PREDUSE_TIMING;
Pd = (Ps == 0xff ? 0xff : 0x00);
Pd=any8(Ps) PREDUSE_TIMING;
Pd = (Ps ? 0xff : 0x00);

Class: CR (slot 2,3)

Notes

■ This instruction may execute on either slot2 or slot3, even though it is a CR-type

Intrinsics

Pd=all8(Ps) Byte Q6_p_all8_p(Byte Ps)


Pd=any8(Ps) Byte Q6_p_any8_p(Byte Ps)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS sm s2 Parse d2
0 1 1 0 1 0 1 1 1 0 0 0 - - s s P P 0 - - - - - - - - - - - d d Pd=any8(Ps)
0 1 1 0 1 0 1 1 1 0 1 0 - - s s P P 0 - - - - - - - - - - - d d Pd=all8(Ps)

Field name Description


sm Supervisor mode only
ICLASS Instruction class
Parse Packet/loop parse bits
d2 Field to encode register d
s2 Field to encode register s

80-N2040-53 Rev. AA 197


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Looping instructions
loopN is a single instruction which sets up a hardware loop. The N in the instruction name
indicates the set of loop registers to use. Loop0 is the innermost loop, while loop1 is the outer
loop. The loopN instruction first sets the start address (SA) register based on a PC-relative
immediate add. The relative immediate is added to the PC and stored in SA. The loop count (LC)
register is set to either an unsigned immediate or to a register value.

Syntax Behavior
loop0(#r7:2,#U10) apply_extension(#r);
#r=#r & ~PCALIGN_MASK;
SA0=PC+#r;
LC0=#U;
USR.LPCFG=0;
loop0(#r7:2,Rs) apply_extension(#r);
#r=#r & ~PCALIGN_MASK;
SA0=PC+#r;
LC0=Rs;
USR.LPCFG=0;
loop1(#r7:2,#U10) apply_extension(#r);
#r=#r & ~PCALIGN_MASK;
SA1=PC+#r;
LC1=#U;
loop1(#r7:2,Rs) apply_extension(#r);
#r=#r & ~PCALIGN_MASK;
SA1=PC+#r;
LC1=Rs;

Class: CR (slot 3)

Notes

■ This instruction cannot execute in the last address of a hardware loop.


■ The Next PC value is the address immediately following the last instruction in the packet
containing this instruction.
■ The PC value is the address of the start of the packet
■ A PC-relative address is formed by taking the decoded immediate value and adding it to the
current PC value.

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS sm s5 Parse
0 1 1 0 0 0 0 0 0 0 0 s s s s s P P - i i i i i - - - i i - - - loop0(#r7:2,Rs)
0 1 1 0 0 0 0 0 0 0 1 s s s s s P P - i i i i i - - - i i - - - loop1(#r7:2,Rs)
ICLASS sm Parse
0 1 1 0 1 0 0 1 0 0 0 I I I I I P P - i i i i i I I I i i - I I loop0(#r7:2,#U10)
0 1 1 0 1 0 0 1 0 0 1 I I I I I P P - i i i i i I I I i i - I I loop1(#r7:2,#U10)

80-N2040-53 Rev. AA 198


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Field name Description


sm Supervisor mode only
ICLASS Instruction class
Parse Packet/loop parse bits
s5 Field to encode register s

80-N2040-53 Rev. AA 199


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Add to PC
Add an immediate value to the program counter (PC) and place the result in a destination register.
This instruction is typically used with a constant extender to add a 32-bit immediate value to PC.

Syntax Behavior
Rd=add(pc,#u6) Rd=PC+apply_extension(#u);

Class: CR (slot 3)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS sm Parse d5
0 1 1 0 1 0 1 0 0 1 0 0 1 0 0 1 P P - i i i i i i - - d d d d d Rd=add(pc,#u6)

Field name Description


sm Supervisor mode only
ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d

80-N2040-53 Rev. AA 200


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Pipelined loop instructions


The spNloop0 instruction is a single instruction that sets up a hardware loop with automatic
predicate control. This saves code size by enabling many software pipelined loops to generate
without prologue code. Upon executing this instruction, the P3 register automatically clears. After
the loop executes N times (where N is selectable from 1-3), the P3 register is set. This ensures
that store instructions in the loop are predicated with P3 and thus not enabled during the
pipeline warm-up.
In the spNloop0 instruction, the loop 0 (inner-loop) registers are used. This instruction sets the
start address (SA0) register based on a PC-relative immediate add. The relative immediate is
added to the PC and stored in SA0. The loop count (LC0) is set to either an unsigned immediate or
to a register value. The predicate P3 is cleared. The USR.LPCFG bits are set based on the N value.

Syntax Behavior
p3=sp1loop0(#r7:2,#U10) apply_extension(#r);
#r=#r & ~PCALIGN_MASK;
SA0=PC+#r;
LC0=#U;
USR.LPCFG=1;
P3=0;
p3=sp1loop0(#r7:2,Rs) apply_extension(#r);
#r=#r & ~PCALIGN_MASK;
SA0=PC+#r;
LC0=Rs;
USR.LPCFG=1;
P3=0;
p3=sp2loop0(#r7:2,#U10) apply_extension(#r);
#r=#r & ~PCALIGN_MASK;
SA0=PC+#r;
LC0=#U;
USR.LPCFG=2;
P3=0;
p3=sp2loop0(#r7:2,Rs) apply_extension(#r);
#r=#r & ~PCALIGN_MASK;
SA0=PC+#r;
LC0=Rs;
USR.LPCFG=2;
P3=0;
p3=sp3loop0(#r7:2,#U10) apply_extension(#r);
#r=#r & ~PCALIGN_MASK;
SA0=PC+#r;
LC0=#U;
USR.LPCFG=3;
P3=0;
p3=sp3loop0(#r7:2,Rs) apply_extension(#r);
#r=#r & ~PCALIGN_MASK;
SA0=PC+#r;
LC0=Rs;
USR.LPCFG=3;
P3=0;

80-N2040-53 Rev. AA 201


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Class: CR (slot 3)

Notes

■ The predicate generated by this instruction can not be used as a .new predicate, nor can it be
automatically ANDed with another predicate.
■ This instruction cannot execute in the last address of a hardware loop.
■ The Next PC value is the address immediately following the last instruction in the packet
containing this instruction.
■ The PC value is the address of the start of the packet
■ A PC-relative address is formed by taking the decoded immediate value and adding it to the
current PC value.

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS sm s5 Parse
0 1 1 0 0 0 0 0 1 0 1 s s s s s P P - i i i i i - - - i i - - - p3=sp1loop0(#r7:2,Rs)
0 1 1 0 0 0 0 0 1 1 0 s s s s s P P - i i i i i - - - i i - - - p3=sp2loop0(#r7:2,Rs)
0 1 1 0 0 0 0 0 1 1 1 s s s s s P P - i i i i i - - - i i - - - p3=sp3loop0(#r7:2,Rs)
ICLASS sm Parse
0 1 1 0 1 0 0 1 1 0 1 I I I I I P P - i i i i i I I I i i - I I p3=sp1loop0(#r7:2,#U10)
0 1 1 0 1 0 0 1 1 1 0 I I I I I P P - i i i i i I I I i i - I I p3=sp2loop0(#r7:2,#U10)
0 1 1 0 1 0 0 1 1 1 1 I I I I I P P - i i i i i I I I i i - I I p3=sp3loop0(#r7:2,#U10)

Field name Description


sm Supervisor mode only
ICLASS Instruction class
Parse Packet/loop parse bits
s5 Field to encode register s

80-N2040-53 Rev. AA 202


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Logical operations on predicates


Perform bitwise logical operations on predicate registers.

Syntax Behavior
Pd=Ps Assembler mapped to: "Pd=or(Ps,Ps)"
Pd=and(Ps,and(Pt,[!]Pu)) PREDUSE_TIMING;
Pd = Ps & Pt & (~Pu);
Pd=and(Ps,or(Pt,[!]Pu)) PREDUSE_TIMING;
Pd = Ps & (Pt | (~Pu));
Pd=and(Pt,[!]Ps) PREDUSE_TIMING;
Pd=Pt & (~Ps);
Pd=not(Ps) PREDUSE_TIMING;
Pd=~Ps;
Pd=or(Ps,and(Pt,[!]Pu)) PREDUSE_TIMING;
Pd = Ps | (Pt & (~Pu));
Pd=or(Ps,or(Pt,[!]Pu)) PREDUSE_TIMING;
Pd = Ps | Pt | (~Pu);
Pd=or(Pt,[!]Ps) PREDUSE_TIMING;
Pd=Pt | (~Ps);
Pd=xor(Ps,Pt) PREDUSE_TIMING;
Pd=Ps ^ Pt;

Class: CR (slot 2,3)

Notes

■ This instruction may execute on either slot2 or slot3, even though it is a CR-type

Intrinsics

Pd=Ps Byte Q6_p_equals_p(Byte Ps)


Pd=and(Ps,and(Pt,!Pu)) Byte Q6_p_and_and_ppnp(Byte Ps, Byte Pt, Byte Pu)
Pd=and(Ps,and(Pt,Pu)) Byte Q6_p_and_and_ppp(Byte Ps, Byte Pt, Byte Pu)
Pd=and(Ps,or(Pt,!Pu)) Byte Q6_p_and_or_ppnp(Byte Ps, Byte Pt, Byte Pu)
Pd=and(Ps,or(Pt,Pu)) Byte Q6_p_and_or_ppp(Byte Ps, Byte Pt, Byte Pu)
Pd=and(Pt,!Ps) Byte Q6_p_and_pnp(Byte Pt, Byte Ps)
Pd=and(Pt,Ps) Byte Q6_p_and_pp(Byte Pt, Byte Ps)
Pd=not(Ps) Byte Q6_p_not_p(Byte Ps)
Pd=or(Ps,and(Pt,!Pu)) Byte Q6_p_or_and_ppnp(Byte Ps, Byte Pt, Byte Pu)
Pd=or(Ps,and(Pt,Pu)) Byte Q6_p_or_and_ppp(Byte Ps, Byte Pt, Byte Pu)

80-N2040-53 Rev. AA 203


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Pd=or(Ps,or(Pt,!Pu)) Byte Q6_p_or_or_ppnp(Byte Ps, Byte Pt, Byte Pu)


Pd=or(Ps,or(Pt,Pu)) Byte Q6_p_or_or_ppp(Byte Ps, Byte Pt, Byte Pu)
Pd=or(Pt,!Ps) Byte Q6_p_or_pnp(Byte Pt, Byte Ps)
Pd=or(Pt,Ps) Byte Q6_p_or_pp(Byte Pt, Byte Ps)
Pd=xor(Ps,Pt) Byte Q6_p_xor_pp(Byte Ps, Byte Pt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS sm s2 Parse t2 d2
0 1 1 0 1 0 1 1 0 0 0 0 - - s s P P 0 - - - t t - - - - - - d d Pd=and(Pt,Ps)
ICLASS sm s2 Parse t2 u2 d2
0 1 1 0 1 0 1 1 0 0 0 1 - - s s P P 0 - - - t t u u - - - - d d Pd=and(Ps,and(Pt,Pu))
ICLASS sm s2 Parse t2 d2
0 1 1 0 1 0 1 1 0 0 1 0 - - s s P P 0 - - - t t - - - - - - d d Pd=or(Pt,Ps)
ICLASS sm s2 Parse t2 u2 d2
0 1 1 0 1 0 1 1 0 0 1 1 - - s s P P 0 - - - t t u u - - - - d d Pd=and(Ps,or(Pt,Pu))
ICLASS sm s2 Parse t2 d2
0 1 1 0 1 0 1 1 0 1 0 0 - - s s P P 0 - - - t t - - - - - - d d Pd=xor(Ps,Pt)
ICLASS sm s2 Parse t2 u2 d2
0 1 1 0 1 0 1 1 0 1 0 1 - - s s P P 0 - - - t t u u - - - - d d Pd=or(Ps,and(Pt,Pu))
ICLASS sm s2 Parse t2 d2
0 1 1 0 1 0 1 1 0 1 1 0 - - s s P P 0 - - - t t - - - - - - d d Pd=and(Pt,!Ps)
ICLASS sm s2 Parse t2 u2 d2
0 1 1 0 1 0 1 1 0 1 1 1 - - s s P P 0 - - - t t u u - - - - d d Pd=or(Ps,or(Pt,Pu))
0 1 1 0 1 0 1 1 1 0 0 1 - - s s P P 0 - - - t t u u - - - - d d Pd=and(Ps,and(Pt,!Pu))
0 1 1 0 1 0 1 1 1 0 1 1 - - s s P P 0 - - - t t u u - - - - d d Pd=and(Ps,or(Pt,!Pu))
ICLASS sm s2 Parse d2
0 1 1 0 1 0 1 1 1 1 0 0 - - s s P P 0 - - - - - - - - - - - d d Pd=not(Ps)
ICLASS sm s2 Parse t2 u2 d2
0 1 1 0 1 0 1 1 1 1 0 1 - - s s P P 0 - - - t t u u - - - - d d Pd=or(Ps,and(Pt,!Pu))
ICLASS sm s2 Parse t2 d2
0 1 1 0 1 0 1 1 1 1 1 0 - - s s P P 0 - - - t t - - - - - - d d Pd=or(Pt,!Ps)
ICLASS sm s2 Parse t2 u2 d2
0 1 1 0 1 0 1 1 1 1 1 1 - - s s P P 0 - - - t t u u - - - - d d Pd=or(Ps,or(Pt,!Pu))

Field name Description


sm Supervisor mode only
ICLASS Instruction class
Parse Packet/loop parse bits
d2 Field to encode register d
s2 Field to encode register s
t2 Field to encode register t
u2 Field to encode register u

80-N2040-53 Rev. AA 204


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

User control register transfer


Move 32- or 64-bit values between a user control register and a general register. The user control
registers include SA, LC, Predicates, M, USR, PC, UGP, GP, and CS, and UPCYCLE. Registers are
moved as singles or as aligned 64-bit pairs.
The PC register is not writable. A program flow instruction must be used to change the PC value.
0 SA0 8 USR 16 24

1 LC0 9 PC

2 SA1 10 UGP

3 LC1 11 GP
Reserved Reserved
4 P3:0 12 CS0

5 Reserved 13 CS1

6 M0 14 UPCYCLELO

7 M1 UPCYCLEHI
15 23 31

Figure 11-1 User control registers and their register field encodings
Syntax Behavior
Cd=Rs Cd=Rs;
Cdd=Rss Cdd=Rss;
Rd=Cs Rd=Cs;
Rdd=Css Rdd=Css;

Class: CR (slot 3)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS sm s5 Parse d5
0 1 1 0 0 0 1 0 0 0 1 s s s s s P P - - - - - - - - - d d d d d Cd=Rs
0 1 1 0 0 0 1 1 0 0 1 s s s s s P P - - - - - - - - - d d d d d Cdd=Rss
0 1 1 0 1 0 0 0 0 0 0 s s s s s P P - - - - - - - - - d d d d d Rdd=Css
0 1 1 0 1 0 1 0 0 0 0 s s s s s P P - - - - - - - - - d d d d d Rd=Cs

Field name Description


sm Supervisor mode only
ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s

80-N2040-53 Rev. AA 205


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

11.3 JR
The JR instruction class includes instructions to change the program flow to a new location
contained in a register.
JR instructions are executable on slot 2.

Call subroutine from register


Change the program flow to a subroutine. This instruction first transfers the next program
counter (NPC) value into the link register, and then jumps to a target address contained in a
register.
This instruction can only appear in slot 2.
Syntax Behavior
callr Rs LR=NPC;
PC=Rs;
callrh Rs LR=NPC;
PC=Rs;
if ([!]Pu) callr Rs if ([!]Pu[0]) {
LR=NPC;
PC=Rs;
}

Class: JR (slot 2)

Notes

■ This instruction can conditionally execute based on the value of a predicate register. If the
instruction is preceded by 'if Pn', the instruction only executes if the least-significant bit of the
predicate register is 1. Similarly, if the instruction is preceded by 'if !Pn', the instruction
executes only if the least-significant bit of Pn is 0.

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS s5 Parse
0 1 0 1 0 0 0 0 1 0 1 s s s s s P P - - - - - - - - - - - - - - callr Rs
0 1 0 1 0 0 0 0 1 1 0 s s s s s P P - - - - - - - - - - - - - - callrh Rs
ICLASS s5 Parse u2
0 1 0 1 0 0 0 1 0 0 0 s s s s s P P - - - - u u - - - - - - - - if (Pu) callr Rs
0 1 0 1 0 0 0 1 0 0 1 s s s s s P P - - - - u u - - - - - - - - if (!Pu) callr Rs

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
s5 Field to encode register s
u2 Field to encode register u

80-N2040-53 Rev. AA 206


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Hinted call subroutine from register


Change the program flow to a subroutine. This instruction first transfers the next program
counter (NPC) value into the link register, and then jumps to a target address contained in a
register. This instruction is effective only when a preceding hintjr exists.
This instruction can only appear in slot 2.

Syntax Behavior
callrh Rs LR=NPC;
PC=Rs;

Class: JR (slot 2)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS s5 Parse
0 1 0 1 0 0 0 0 1 1 0 s s s s s P P - - - - - - - - - - - - - - callrh Rs

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
s5 Field to encode register s

80-N2040-53 Rev. AA 207


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Hint an indirect jump address


Provide a hint indicating that there will soon be an indirect call to the address specified in Rs. The
indirect call can be either jumprh or callrh.
This instruction can appear in either slot 2 or slot 3.

Syntax Behavior
hintjr(Rs) ;

Class: JR (slot 2,3)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS s5 Parse
0 1 0 1 0 0 1 0 1 0 1 s s s s s P P - - - - - - - - - - - - - - hintjr(Rs)

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
s5 Field to encode register s

80-N2040-53 Rev. AA 208


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Jump to address from register


Change the program flow to a target address. This instruction changes the program counter to a
target address contained in a register.
This instruction can appear only in slot 2.

Syntax Behavior
if ([!]Pu) jumpr Rs Assembler mapped to: "if ([!]Pu)
""jumpr"":nt ""Rs"
if ([!]Pu[.new]) {
jumpr:<hint> Rs if([!]Pu[.new][0]){
PC=Rs;
}
jumpr Rs PC=Rs;
jumprh Rs PC=Rs;

Class: JR (slot 2)

Notes

■ This instruction can conditionally execute based on the value of a predicate register. If the
instruction is preceded by 'if Pn', the instruction only executes if the least-significant bit of the
predicate register is 1. Similarly, if the instruction is preceded by 'if !Pn', the instruction
executes only if the least-significant bit of Pn is 0.

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS s5 Parse
0 1 0 1 0 0 1 0 1 0 0 s s s s s P P - - - - - - - - - - - - - - jumpr Rs
0 1 0 1 0 0 1 0 1 1 0 s s s s s P P - - - - - - - - - - - - - - jumprh Rs
ICLASS s5 Parse u2
0 1 0 1 0 0 1 1 0 1 0 s s s s s P P - 0 0 - u u - - - - - - - - if (Pu) jumpr:nt Rs
0 1 0 1 0 0 1 1 0 1 0 s s s s s P P - 0 1 - u u - - - - - - - - if (Pu.new) jumpr:nt Rs
0 1 0 1 0 0 1 1 0 1 0 s s s s s P P - 1 0 - u u - - - - - - - - if (Pu) jumpr:t Rs
0 1 0 1 0 0 1 1 0 1 0 s s s s s P P - 1 1 - u u - - - - - - - - if (Pu.new) jumpr:t Rs
0 1 0 1 0 0 1 1 0 1 1 s s s s s P P - 0 0 - u u - - - - - - - - if (!Pu) jumpr:nt Rs
0 1 0 1 0 0 1 1 0 1 1 s s s s s P P - 0 1 - u u - - - - - - - - if (!Pu.new) jumpr:nt Rs
0 1 0 1 0 0 1 1 0 1 1 s s s s s P P - 1 0 - u u - - - - - - - - if (!Pu) jumpr:t Rs
0 1 0 1 0 0 1 1 0 1 1 s s s s s P P - 1 1 - u u - - - - - - - - if (!Pu.new) jumpr:t Rs

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
s5 Field to encode register s
u2 Field to encode register u

80-N2040-53 Rev. AA 209


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Hinted jump to address from register


Change the program flow to a target address. This instruction changes the program counter to a
target address contained in a register. This instruction is effective only when a preceding hintjr
exists.
This instruction can appear only in slot 2.

Syntax Behavior
jumprh Rs PC=Rs;

Class: JR (slot 2)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS s5 Parse
0 1 0 1 0 0 1 0 1 1 0 s s s s s P P - - - - - - - - - - - - - - jumprh Rs

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
s5 Field to encode register s

80-N2040-53 Rev. AA 210


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

11.4 J
The J instruction class includes branch instructions (jumps and calls) that obtain the target
address from a (PC-relative) immediate address value.
J instructions are executable on slot 2 and slot 3.

Call subroutine
Change the program flow to a subroutine. This instruction first transfers the next program
counter (NPC) value into the link register, and then jumps to the target address.
This instruction can appear in slots 2 or 3.

Syntax Behavior
call #r22:2 apply_extension(#r);
#r=#r & ~PCALIGN_MASK;
LR=NPC;
PC=PC+#r;
if ([!]Pu) call #r15:2 apply_extension(#r);
#r=#r & ~PCALIGN_MASK;
if ([!]Pu[0]) {
LR=NPC;
PC=PC+#r;
}

Class: J (slots 2,3)

Notes

■ This instruction can conditionally execute based on the value of a predicate register. If the
instruction is preceded by 'if Pn', the instruction only executes if the least-significant bit of the
predicate register is 1. Similarly, if the instruction is preceded by 'if !Pn', the instruction is
executed only if the least-significant bit of Pn is 0.
■ The Next PC value is the address immediately following the last instruction in the packet
containing this instruction.
■ The PC value is the address of the start of the packet
■ A PC-relative address is formed by taking the decoded immediate value and adding it to the
current PC value.

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS Parse
0 1 0 1 1 0 1 i i i i i i i i i P P i i i i i i i i i i i i i 0 call #r22:2
D
ICLASS Parse N u2

0 1 0 1 1 1 0 1 i i 0 i i i i i P P i - 0 - u u i i i i i i i - if (Pu) call #r15:2


0 1 0 1 1 1 0 1 i i 1 i i i i i P P i - 0 - u u i i i i i i i - if (!Pu) call #r15:2

80-N2040-53 Rev. AA 211


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Field name Description


ICLASS Instruction class
DN Dot-new
Parse Packet/loop parse bits
u2 Field to encode register u

80-N2040-53 Rev. AA 212


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Compare and jump


Compare two registers, or a register and immediate value, and write a predicate with the result.
Then use the predicate result to conditionally jump to a PC-relative target address.
The registers available as operands are restricted to R0-R7 and R16-R23. The predicate
destination is restricted to P0 and P1.
In assembly syntax, this instruction appears as two instructions in the packet: a compare and a
separate conditional jump. The assembler may convert adjacent compare and jump instructions
into compound compare-jump form.

Syntax Behavior
p[01]=cmp.eq(Rs,#-1); if P[01]=(Rs==-1) ? 0xff : 0x00 if
([!]p[01].new) jump:<hint> #r9:2 ([!]P[01].new[0]) {
apply_extension(#r);
#r=#r & ~PCALIGN_MASK;
PC=PC+#r;
}
p[01]=cmp.eq(Rs,#U5); if P[01]=(Rs==#U) ? 0xff : 0x00 if
([!]p[01].new) jump:<hint> #r9:2 ([!]P[01].new[0]) {
apply_extension(#r);
#r=#r & ~PCALIGN_MASK;
PC=PC+#r;
}
p[01]=cmp.eq(Rs,Rt); if P[01]=(Rs==Rt) ? 0xff : 0x00 if
([!]p[01].new) jump:<hint> #r9:2 ([!]P[01].new[0]) {
apply_extension(#r);
#r=#r & ~PCALIGN_MASK;
PC=PC+#r;
}
p[01]=cmp.gt(Rs,#-1); if P[01]=(Rs>-1) ? 0xff : 0x00 if
([!]p[01].new) jump:<hint> #r9:2 ([!]P[01].new[0]) {
apply_extension(#r);
#r=#r & ~PCALIGN_MASK;
PC=PC+#r;
}
p[01]=cmp.gt(Rs,#U5); if P[01]=(Rs>#U) ? 0xff : 0x00 if
([!]p[01].new) jump:<hint> #r9:2 ([!]P[01].new[0]) {
apply_extension(#r);
#r=#r & ~PCALIGN_MASK;
PC=PC+#r;
}
p[01]=cmp.gt(Rs,Rt); if P[01]=(Rs>Rt) ? 0xff : 0x00 if
([!]p[01].new) jump:<hint> #r9:2 ([!]P[01].new[0]) {
apply_extension(#r);
#r=#r & ~PCALIGN_MASK;
PC=PC+#r;
}

80-N2040-53 Rev. AA 213


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Syntax Behavior
p[01]=cmp.gtu(Rs,#U5); if P[01]=(Rs.uw[0]>#U) ? 0xff : 0x00 if
([!]p[01].new) jump:<hint> #r9:2 ([!]P[01].new[0]) {
apply_extension(#r);
#r=#r & ~PCALIGN_MASK;
PC=PC+#r;
}
p[01]=cmp.gtu(Rs,Rt); if P[01]=(Rs.uw[0]>Rt) ? 0xff : 0x00 if
([!]p[01].new) jump:<hint> #r9:2 ([!]P[01].new[0]) {
apply_extension(#r);
#r=#r & ~PCALIGN_MASK;
PC=PC+#r;
}
p[01]=tstbit(Rs,#0); if P[01]=(Rs & 1) ? 0xff : 0x00 if
([!]p[01].new) jump:<hint> #r9:2 ([!]P[01].new[0]) {
apply_extension(#r);
#r=#r & ~PCALIGN_MASK;
PC=PC+#r;
}

Class: J (slots 0,1,2,3)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS s4 Parse
p0=cmp.eq(Rs,#-1); if
0 0 0 1 0 0 0 1 1 0 i i s s s s P P 0 - - - 0 0 i i i i i i i -
(p0.new) jump:nt #r9:2
p0=cmp.gt(Rs,#-1); if
0 0 0 1 0 0 0 1 1 0 i i s s s s P P 0 - - - 0 1 i i i i i i i -
(p0.new) jump:nt #r9:2
p0=tstbit(Rs,#0); if (p0.new)
0 0 0 1 0 0 0 1 1 0 i i s s s s P P 0 - - - 1 1 i i i i i i i -
jump:nt #r9:2
p0=cmp.eq(Rs,#-1); if
0 0 0 1 0 0 0 1 1 0 i i s s s s P P 1 - - - 0 0 i i i i i i i -
(p0.new) jump:t #r9:2
p0=cmp.gt(Rs,#-1); if
0 0 0 1 0 0 0 1 1 0 i i s s s s P P 1 - - - 0 1 i i i i i i i -
(p0.new) jump:t #r9:2
p0=tstbit(Rs,#0); if (p0.new)
0 0 0 1 0 0 0 1 1 0 i i s s s s P P 1 - - - 1 1 i i i i i i i - jump:t #r9:2
p0=cmp.eq(Rs,#-1); if
0 0 0 1 0 0 0 1 1 1 i i s s s s P P 0 - - - 0 0 i i i i i i i -
(!p0.new) jump:nt #r9:2
p0=cmp.gt(Rs,#-1); if
0 0 0 1 0 0 0 1 1 1 i i s s s s P P 0 - - - 0 1 i i i i i i i -
(!p0.new) jump:nt #r9:2
p0=tstbit(Rs,#0); if
0 0 0 1 0 0 0 1 1 1 i i s s s s P P 0 - - - 1 1 i i i i i i i -
(!p0.new) jump:nt #r9:2
p0=cmp.eq(Rs,#-1); if
0 0 0 1 0 0 0 1 1 1 i i s s s s P P 1 - - - 0 0 i i i i i i i -
(!p0.new) jump:t #r9:2
p0=cmp.gt(Rs,#-1); if
0 0 0 1 0 0 0 1 1 1 i i s s s s P P 1 - - - 0 1 i i i i i i i -
(!p0.new) jump:t #r9:2
p0=tstbit(Rs,#0); if
0 0 0 1 0 0 0 1 1 1 i i s s s s P P 1 - - - 1 1 i i i i i i i -
(!p0.new) jump:t #r9:2
p0=cmp.eq(Rs,#U5); if
0 0 0 1 0 0 0 0 0 0 i i s s s s P P 0 I I I I I i i i i i i i - (p0.new) jump:nt #r9:2
p0=cmp.eq(Rs,#U5); if
0 0 0 1 0 0 0 0 0 0 i i s s s s P P 1 I I I I I i i i i i i i -
(p0.new) jump:t #r9:2
p0=cmp.eq(Rs,#U5); if
0 0 0 1 0 0 0 0 0 1 i i s s s s P P 0 I I I I I i i i i i i i - (!p0.new) jump:nt #r9:2
p0=cmp.eq(Rs,#U5); if
0 0 0 1 0 0 0 0 0 1 i i s s s s P P 1 I I I I I i i i i i i i -
(!p0.new) jump:t #r9:2

80-N2040-53 Rev. AA 214


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
p0=cmp.gt(Rs,#U5); if
0 0 0 1 0 0 0 0 1 0 i i s s s s P P 0 I I I I I i i i i i i i -
(p0.new) jump:nt #r9:2

0 0 0 1 0 0 0 0 1 0 i i s s s s P P 1 I I I I I i i i i i i i - p0=cmp.gt(Rs,#U5); if
(p0.new) jump:t #r9:2
p0=cmp.gt(Rs,#U5); if
0 0 0 1 0 0 0 0 1 1 i i s s s s P P 0 I I I I I i i i i i i i -
(!p0.new) jump:nt #r9:2
p0=cmp.gt(Rs,#U5); if
0 0 0 1 0 0 0 0 1 1 i i s s s s P P 1 I I I I I i i i i i i i - (!p0.new) jump:t #r9:2
p0=cmp.gtu(Rs,#U5); if
0 0 0 1 0 0 0 1 0 0 i i s s s s P P 0 I I I I I i i i i i i i -
(p0.new) jump:nt #r9:2
p0=cmp.gtu(Rs,#U5); if
0 0 0 1 0 0 0 1 0 0 i i s s s s P P 1 I I I I I i i i i i i i -
(p0.new) jump:t #r9:2
p0=cmp.gtu(Rs,#U5); if
0 0 0 1 0 0 0 1 0 1 i i s s s s P P 0 I I I I I i i i i i i i -
(!p0.new) jump:nt #r9:2

0 0 0 1 0 0 0 1 0 1 i i s s s s P P 1 I I I I I i i i i i i i - p0=cmp.gtu(Rs,#U5); if
(!p0.new) jump:t #r9:2
p1=cmp.eq(Rs,#-1); if
0 0 0 1 0 0 1 1 1 0 i i s s s s P P 0 - - - 0 0 i i i i i i i -
(p1.new) jump:nt #r9:2
p1=cmp.gt(Rs,#-1); if
0 0 0 1 0 0 1 1 1 0 i i s s s s P P 0 - - - 0 1 i i i i i i i - (p1.new) jump:nt #r9:2
p1=tstbit(Rs,#0); if (p1.new)
0 0 0 1 0 0 1 1 1 0 i i s s s s P P 0 - - - 1 1 i i i i i i i -
jump:nt #r9:2

0 0 0 1 0 0 1 1 1 0 i i s s s s P P 1 - - - 0 0 i i i i i i i - p1=cmp.eq(Rs,#-1); if
(p1.new) jump:t #r9:2
p1=cmp.gt(Rs,#-1); if
0 0 0 1 0 0 1 1 1 0 i i s s s s P P 1 - - - 0 1 i i i i i i i -
(p1.new) jump:t #r9:2

0 0 0 1 0 0 1 1 1 0 i i s s s s P P 1 - - - 1 1 i i i i i i i - p1=tstbit(Rs,#0); if (p1.new)
jump:t #r9:2
p1=cmp.eq(Rs,#-1); if
0 0 0 1 0 0 1 1 1 1 i i s s s s P P 0 - - - 0 0 i i i i i i i -
(!p1.new) jump:nt #r9:2
p1=cmp.gt(Rs,#-1); if
0 0 0 1 0 0 1 1 1 1 i i s s s s P P 0 - - - 0 1 i i i i i i i -
(!p1.new) jump:nt #r9:2

0 0 0 1 0 0 1 1 1 1 i i s s s s P P 0 - - - 1 1 i i i i i i i - p1=tstbit(Rs,#0); if
(!p1.new) jump:nt #r9:2
p1=cmp.eq(Rs,#-1); if
0 0 0 1 0 0 1 1 1 1 i i s s s s P P 1 - - - 0 0 i i i i i i i -
(!p1.new) jump:t #r9:2
p1=cmp.gt(Rs,#-1); if
0 0 0 1 0 0 1 1 1 1 i i s s s s P P 1 - - - 0 1 i i i i i i i -
(!p1.new) jump:t #r9:2

0 0 0 1 0 0 1 1 1 1 i i s s s s P P 1 - - - 1 1 i i i i i i i - p1=tstbit(Rs,#0); if
(!p1.new) jump:t #r9:2
p1=cmp.eq(Rs,#U5); if
0 0 0 1 0 0 1 0 0 0 i i s s s s P P 0 I I I I I i i i i i i i -
(p1.new) jump:nt #r9:2
p1=cmp.eq(Rs,#U5); if
0 0 0 1 0 0 1 0 0 0 i i s s s s P P 1 I I I I I i i i i i i i - (p1.new) jump:t #r9:2
p1=cmp.eq(Rs,#U5); if
0 0 0 1 0 0 1 0 0 1 i i s s s s P P 0 I I I I I i i i i i i i -
(!p1.new) jump:nt #r9:2
p1=cmp.eq(Rs,#U5); if
0 0 0 1 0 0 1 0 0 1 i i s s s s P P 1 I I I I I i i i i i i i -
(!p1.new) jump:t #r9:2
p1=cmp.gt(Rs,#U5); if
0 0 0 1 0 0 1 0 1 0 i i s s s s P P 0 I I I I I i i i i i i i -
(p1.new) jump:nt #r9:2
p1=cmp.gt(Rs,#U5); if
0 0 0 1 0 0 1 0 1 0 i i s s s s P P 1 I I I I I i i i i i i i -
(p1.new) jump:t #r9:2
p1=cmp.gt(Rs,#U5); if
0 0 0 1 0 0 1 0 1 1 i i s s s s P P 0 I I I I I i i i i i i i -
(!p1.new) jump:nt #r9:2
p1=cmp.gt(Rs,#U5); if
0 0 0 1 0 0 1 0 1 1 i i s s s s P P 1 I I I I I i i i i i i i -
(!p1.new) jump:t #r9:2
p1=cmp.gtu(Rs,#U5); if
0 0 0 1 0 0 1 1 0 0 i i s s s s P P 0 I I I I I i i i i i i i -
(p1.new) jump:nt #r9:2
p1=cmp.gtu(Rs,#U5); if
0 0 0 1 0 0 1 1 0 0 i i s s s s P P 1 I I I I I i i i i i i i - (p1.new) jump:t #r9:2
p1=cmp.gtu(Rs,#U5); if
0 0 0 1 0 0 1 1 0 1 i i s s s s P P 0 I I I I I i i i i i i i -
(!p1.new) jump:nt #r9:2

80-N2040-53 Rev. AA 215


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
p1=cmp.gtu(Rs,#U5); if
0 0 0 1 0 0 1 1 0 1 i i s s s s P P 1 I I I I I i i i i i i i -
(!p1.new) jump:t #r9:2
ICLASS s4 Parse t4
p0=cmp.eq(Rs,Rt); if
0 0 0 1 0 1 0 0 0 0 i i s s s s P P 0 0 t t t t i i i i i i i -
(p0.new) jump:nt #r9:2

0 0 0 1 0 1 0 0 0 0 i i s s s s P P 0 1 t t t t i i i i i i i - p1=cmp.eq(Rs,Rt); if
(p1.new) jump:nt #r9:2
p0=cmp.eq(Rs,Rt); if
0 0 0 1 0 1 0 0 0 0 i i s s s s P P 1 0 t t t t i i i i i i i -
(p0.new) jump:t #r9:2
p1=cmp.eq(Rs,Rt); if
0 0 0 1 0 1 0 0 0 0 i i s s s s P P 1 1 t t t t i i i i i i i - (p1.new) jump:t #r9:2
p0=cmp.eq(Rs,Rt); if
0 0 0 1 0 1 0 0 0 1 i i s s s s P P 0 0 t t t t i i i i i i i -
(!p0.new) jump:nt #r9:2
p1=cmp.eq(Rs,Rt); if
0 0 0 1 0 1 0 0 0 1 i i s s s s P P 0 1 t t t t i i i i i i i -
(!p1.new) jump:nt #r9:2
p0=cmp.eq(Rs,Rt); if
0 0 0 1 0 1 0 0 0 1 i i s s s s P P 1 0 t t t t i i i i i i i -
(!p0.new) jump:t #r9:2
p1=cmp.eq(Rs,Rt); if
0 0 0 1 0 1 0 0 0 1 i i s s s s P P 1 1 t t t t i i i i i i i -
(!p1.new) jump:t #r9:2
p0=cmp.gt(Rs,Rt); if
0 0 0 1 0 1 0 0 1 0 i i s s s s P P 0 0 t t t t i i i i i i i -
(p0.new) jump:nt #r9:2
p1=cmp.gt(Rs,Rt); if
0 0 0 1 0 1 0 0 1 0 i i s s s s P P 0 1 t t t t i i i i i i i -
(p1.new) jump:nt #r9:2
p0=cmp.gt(Rs,Rt); if
0 0 0 1 0 1 0 0 1 0 i i s s s s P P 1 0 t t t t i i i i i i i -
(p0.new) jump:t #r9:2

0 0 0 1 0 1 0 0 1 0 i i s s s s P P 1 1 t t t t i i i i i i i - p1=cmp.gt(Rs,Rt); if
(p1.new) jump:t #r9:2
p0=cmp.gt(Rs,Rt); if
0 0 0 1 0 1 0 0 1 1 i i s s s s P P 0 0 t t t t i i i i i i i - (!p0.new) jump:nt #r9:2

0 0 0 1 0 1 0 0 1 1 i i s s s s P P 0 1 t t t t i i i i i i i - p1=cmp.gt(Rs,Rt); if
(!p1.new) jump:nt #r9:2

0 0 0 1 0 1 0 0 1 1 i i s s s s P P 1 0 t t t t i i i i i i i - p0=cmp.gt(Rs,Rt); if
(!p0.new) jump:t #r9:2
p1=cmp.gt(Rs,Rt); if
0 0 0 1 0 1 0 0 1 1 i i s s s s P P 1 1 t t t t i i i i i i i - (!p1.new) jump:t #r9:2
p0=cmp.gtu(Rs,Rt); if
0 0 0 1 0 1 0 1 0 0 i i s s s s P P 0 0 t t t t i i i i i i i -
(p0.new) jump:nt #r9:2
p1=cmp.gtu(Rs,Rt); if
0 0 0 1 0 1 0 1 0 0 i i s s s s P P 0 1 t t t t i i i i i i i -
(p1.new) jump:nt #r9:2
p0=cmp.gtu(Rs,Rt); if
0 0 0 1 0 1 0 1 0 0 i i s s s s P P 1 0 t t t t i i i i i i i -
(p0.new) jump:t #r9:2
p1=cmp.gtu(Rs,Rt); if
0 0 0 1 0 1 0 1 0 0 i i s s s s P P 1 1 t t t t i i i i i i i -
(p1.new) jump:t #r9:2
p0=cmp.gtu(Rs,Rt); if
0 0 0 1 0 1 0 1 0 1 i i s s s s P P 0 0 t t t t i i i i i i i -
(!p0.new) jump:nt #r9:2
p1=cmp.gtu(Rs,Rt); if
0 0 0 1 0 1 0 1 0 1 i i s s s s P P 0 1 t t t t i i i i i i i -
(!p1.new) jump:nt #r9:2

0 0 0 1 0 1 0 1 0 1 i i s s s s P P 1 0 t t t t i i i i i i i - p0=cmp.gtu(Rs,Rt); if
(!p0.new) jump:t #r9:2
p1=cmp.gtu(Rs,Rt); if
0 0 0 1 0 1 0 1 0 1 i i s s s s P P 1 1 t t t t i i i i i i i -
(!p1.new) jump:t #r9:2

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
s4 Field to encode register s
t4 Field to encode register t

80-N2040-53 Rev. AA 216


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Jump to address
Change the program flow to a target address. This instruction changes the program counter to a
target address that is relative to the PC address. The offset from the current PC address is
contained in the instruction encoding.
A speculated jump instruction includes a hint ("taken" or "not taken") that specifies the expected
value of the conditional expression. If the actual generated value of the predicate differs from this
expected value, the jump instruction incurs a performance penalty.
This instruction can appear in slots 2 or 3.
Syntax Behavior
if ([!]Pu) jump #r15:2 Assembler mapped to: "if ([!]Pu) ""jump"":nt
""#r15:2"
if ([!]Pu) jump:<hint> if ([!]Pu[0]) {
#r15:2 apply_extension(#r);
#r=#r & ~PCALIGN_MASK;
PC=PC+#r;
}
jump #r22:2 apply_extension(#r);
#r=#r & ~PCALIGN_MASK;
PC=PC+#r;

Class: J (slots 0,1,2,3)

Notes
■ This instruction can conditionally execute based on the value of a predicate register. If the
instruction is preceded by 'if Pn', the instruction only executes if the least-significant bit of the
predicate register is 1. Similarly, if the instruction is preceded by 'if !Pn', the instruction
executes only if the least-significant bit of Pn is 0.

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS Parse
0 1 0 1 1 0 0 i i i i i i i i i P P i i i i i i i i i i i i i - jump #r22:2
D
ICLASS Parse PT N u2

0 1 0 1 1 1 0 0 i i 0 i i i i i P P i 0 0 - u u i i i i i i i - if (Pu) jump:nt #r15:2


0 1 0 1 1 1 0 0 i i 0 i i i i i P P i 1 0 - u u i i i i i i i - if (Pu) jump:t #r15:2
0 1 0 1 1 1 0 0 i i 1 i i i i i P P i 0 0 - u u i i i i i i i - if (!Pu) jump:nt #r15:2
0 1 0 1 1 1 0 0 i i 1 i i i i i P P i 1 0 - u u i i i i i i i - if (!Pu) jump:t #r15:2

Field name Description


ICLASS Instruction class
DN Dot-new
PT Predict-taken
Parse Packet/loop parse bits
u2 Field to encode register u

80-N2040-53 Rev. AA 217


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Jump to address conditioned on new predicate


Perform speculated jump.
Jump if the LSB of the newly-generated predicate is true. The predicate must be generated in the
same packet as the speculated jump instruction.
A speculated jump instruction includes a hint ("taken" or "not taken") that specifies the expected
value of the conditional expression. If the actual generated value of the predicate differs from this
expected value, the jump instruction incurs a performance penalty.
This instruction can appear in slots 2 or 3.

Syntax Behavior
if ([!]Pu.new) jump:<hint> {
#r15:2 if([!]Pu.new[0]){
apply_extension(#r);
#r=#r & ~PCALIGN_MASK;
PC=PC+#r;
}

Class: J (slots 0,1,2,3)

Notes

■ This instruction can conditionally execute based on the value of a predicate register. If the
instruction is preceded by 'if Pn', the instruction only executes if the least-significant bit of the
predicate register is 1. Similarly, if the instruction is preceded by 'if !Pn', the instruction
executes only if the least-significant bit of Pn is 0.

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
D
ICLASS Parse PT N u2

0 1 0 1 1 1 0 0 i i 0 i i i i i P P i 0 1 - u u i i i i i i i - if (Pu.new) jump:nt #r15:2


0 1 0 1 1 1 0 0 i i 0 i i i i i P P i 1 1 - u u i i i i i i i - if (Pu.new) jump:t #r15:2
0 1 0 1 1 1 0 0 i i 1 i i i i i P P i 0 1 - u u i i i i i i i - if (!Pu.new) jump:nt #r15:2
0 1 0 1 1 1 0 0 i i 1 i i i i i P P i 1 1 - u u i i i i i i i - if (!Pu.new) jump:t #r15:2

Field name Description


ICLASS Instruction class
DN Dot-new
PT Predict-taken
Parse Packet/loop parse bits
u2 Field to encode register u

80-N2040-53 Rev. AA 218


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Jump to address condition on register value


Perform register-conditional jump.
Jump if the specified register expression is true.
A register-conditional jump includes a hint ("taken" or "not taken") that specifies the expected
value of the register expression. If the actual generated value of the expression differs from this
expected value, the jump instruction incurs a performance penalty.
This instruction can appear only in slot 3.

Syntax Behavior
if (Rs!=#0) jump:nt #r13:2 if (Rs != 0) {
PC=PC+#r;
}
if (Rs!=#0) jump:t #r13:2 if (Rs != 0) {
PC=PC+#r;
}
if (Rs<=#0) jump:nt #r13:2 if (Rs<=0) {
PC=PC+#r;
}
if (Rs<=#0) jump:t #r13:2 if (Rs<=0) {
PC=PC+#r;
}
if (Rs==#0) jump:nt #r13:2 if (Rs == 0) {
PC=PC+#r;
}
if (Rs==#0) jump:t #r13:2 if (Rs == 0) {
PC=PC+#r;
}
if (Rs>=#0) jump:nt #r13:2 if (Rs>=0) {
PC=PC+#r;
}
if (Rs>=#0) jump:t #r13:2 if (Rs>=0) {
PC=PC+#r;
}

Class: J (slot 3)

Notes

■ This instruction will be deprecated in a future version.

80-N2040-53 Rev. AA 219


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS sm s5 Parse
0 1 1 0 0 0 0 1 0 0 i s s s s s P P i 0 i i i i i i i i i i i - if (Rs!=#0) jump:nt #r13:2
0 1 1 0 0 0 0 1 0 0 i s s s s s P P i 1 i i i i i i i i i i i - if (Rs!=#0) jump:t #r13:2
0 1 1 0 0 0 0 1 0 1 i s s s s s P P i 0 i i i i i i i i i i i - if (Rs>=#0) jump:nt #r13:2
0 1 1 0 0 0 0 1 0 1 i s s s s s P P i 1 i i i i i i i i i i i - if (Rs>=#0) jump:t #r13:2
0 1 1 0 0 0 0 1 1 0 i s s s s s P P i 0 i i i i i i i i i i i - if (Rs==#0) jump:nt #r13:2
0 1 1 0 0 0 0 1 1 0 i s s s s s P P i 1 i i i i i i i i i i i - if (Rs==#0) jump:t #r13:2
0 1 1 0 0 0 0 1 1 1 i s s s s s P P i 0 i i i i i i i i i i i - if (Rs<=#0) jump:nt #r13:2
0 1 1 0 0 0 0 1 1 1 i s s s s s P P i 1 i i i i i i i i i i i - if (Rs<=#0) jump:t #r13:2

Field name Description


sm Supervisor mode only
ICLASS Instruction class
Parse Packet/loop parse bits
s5 Field to encode register s

80-N2040-53 Rev. AA 220


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Transfer and jump


Move an unsigned immediate or register value into a destination register and unconditionally
jump. In assembly syntax, this instruction appears as two instructions in the packet, a transfer and
a separate jump. The assembler may convert adjacent transfer and jump instructions into
compound transfer-jump form.

Syntax Behavior
Rd=#U6 ; jump #r9:2 apply_extension(#r);
#r=#r & ~PCALIGN_MASK;
Rd=#U;
PC=PC+#r;
Rd=Rs ; jump #r9:2 apply_extension(#r);
#r=#r & ~PCALIGN_MASK;
Rd=Rs;
PC=PC+#r;

Class: J (slots 2,3)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS d4 Parse
0 0 0 1 0 1 1 0 - - i i d d d d P P I I I I I I i i i i i i i - Rd=#U6 ; jump #r9:2
ICLASS s4 Parse d4
0 0 0 1 0 1 1 1 - - i i s s s s P P - - d d d d i i i i i i i - Rd=Rs ; jump #r9:2

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d4 Field to encode register d
s4 Field to encode register s

80-N2040-53 Rev. AA 221


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

11.5 LD
The LD instruction class includes load instructions, which are used to load values into registers.
LD instructions are executable on slot 0 and slot 1.

Load doubleword
Load a 64-bit doubleword from memory and place in a destination register pair.

Syntax Behavior
Rdd=memd(Re=#U6) apply_extension(#U);
EA=#U;
Rdd = *EA;
Re=#U;
Rdd=memd(Rs+#s11:3) apply_extension(#s);
EA=Rs+#s;
Rdd = *EA;
Rdd=memd(Rs+Rt<<#u2) EA=Rs+(Rt<<#u);
Rdd = *EA;
Rdd=memd(Rt<<#u2+#U6) apply_extension(#U);
EA=#U+(Rt<<#u);
Rdd = *EA;
Rdd=memd(Rx++#s4:3) EA=Rx;
Rx=Rx+#s;
Rdd = *EA;
Rdd=memd(Rx++#s4:3:circ(Mu)) EA=Rx;
Rx=Rx=circ_add(Rx,#s,MuV);
Rdd = *EA;
Rdd=memd(Rx++I:circ(Mu)) EA=Rx;
Rx=Rx=circ_add(Rx,I<<3,MuV);
Rdd = *EA;
Rdd=memd(Rx++Mu) EA=Rx;
Rx=Rx+MuV;
Rdd = *EA;
Rdd=memd(Rx++Mu:brev) EA=Rx.h[1] | brev(Rx.h[0]);
Rx=Rx+MuV;
Rdd = *EA;
Rdd=memd(gp+#u16:3) apply_extension(#u);
EA=(Constant_extended ? (0) : GP)+#u;
Rdd = *EA;

80-N2040-53 Rev. AA 222


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Class: LD (slots 0,1)

Intrinsics

Rdd=memd(Rx++#s4:3:circ(Mu)) Word32 Q6_R_memd_IM_circ(void** StartAddress,


Word32 Is4_3, Word32 Mu, void* BaseAddress)
Rdd=memd(Rx++I:circ(Mu)) Word32 Q6_R_memd_M_circ(void** StartAddress,
Word32 Mu, void* BaseAddress)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS s5 Parse t5 d5
0 0 1 1 1 0 1 0 1 1 0 s s s s s P P i t t t t t i - - d d d d d Rdd=memd(Rs+Rt<<#u2)
U
ICLASS Type Parse d5
N
0 1 0 0 1 i i 1 1 1 0 i i i i i P P i i i i i i i i i d d d d d Rdd=memd(gp+#u16:3)
U
ICLASS Amode Type s5 Parse d5
N
1 0 0 1 0 i i 1 1 1 0 s s s s s P P i i i i i i i i i d d d d d Rdd=memd(Rs+#s11:3)
U
ICLASS Amode Type x5 Parse u1 d5
N

1 0 0 1 1 0 0 1 1 1 0 x x x x x P P u 0 - - 0 i i i i d d d d d Rdd=memd(Rx++#s4:3:circ
(Mu))

1 0 0 1 1 0 0 1 1 1 0 x x x x x P P u 0 - - 1 - 0 - - d d d d d Rdd=memd(Rx++I:circ(Mu)
)
U
ICLASS Amode Type e5 Parse d5
N
1 0 0 1 1 0 1 1 1 1 0 e e e e e P P 0 1 I I I I - I I d d d d d Rdd=memd(Re=#U6)
U
ICLASS Amode Type N x5 Parse d5

1 0 0 1 1 0 1 1 1 1 0 x x x x x P P 0 0 - - - i i i i d d d d d Rdd=memd(Rx++#s4:3)
U
ICLASS Amode Type N
t5 Parse d5

1 0 0 1 1 1 0 1 1 1 0 t t t t t P P i 1 I I I I i I I d d d d d Rdd=memd(Rt<<#u2+#U6)
U
ICLASS Amode Type x5 Parse u1 d5
N
1 0 0 1 1 1 0 1 1 1 0 x x x x x P P u 0 - - - - 0 - - d d d d d Rdd=memd(Rx++Mu)
1 0 0 1 1 1 1 1 1 1 0 x x x x x P P u 0 - - - - 0 - - d d d d d Rdd=memd(Rx++Mu:brev)

Field name Description


ICLASS Instruction class
Amode Amode
Type Type
UN Unsigned
Parse Packet/loop parse bits
d5 Field to encode register d
e5 Field to encode register e
s5 Field to encode register s
t5 Field to encode register t
u1 Field to encode register u
x5 Field to encode register x

80-N2040-53 Rev. AA 223


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Load-acquire doubleword
Load a 64-bit doubleword from memory and place in a destination register pair. The load-acquire
memory operation is observed before any following memory operations (in program order) have
been observed at the local point of serialization. A different order may be observed at the global
point of serialization. (see Ordering and Synchronization).

Syntax Behavior
Rdd=memd_aq(Rs) EA=Rs;
Rdd = *EA

Class: LD (slots 0)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
U
ICLASS Amode Type s5 Parse d5
N
1 0 0 1 0 0 1 0 0 0 0 s s s s s P P 0 1 1 - - - 0 0 0 d d d d d Rdd=memd_aq(Rs)

Field name Description


ICLASS Instruction class
Amode Amode
Type Type
UN Unsigned
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s

80-N2040-53 Rev. AA 224


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Load doubleword conditionally


Load a 64-bit doubleword from memory and place in a destination register pair.
This instruction is conditional based on a predicate value. If the predicate is true, the instruction is
performed, otherwise it is treated as a NOP.

Syntax Behavior
if ([!]Pt[.new]) Rdd=memd(#u6) apply_extension(#u);
EA=#u;
if ([!]Pt[.new][0]) {
Rdd = *EA;
} else {
NOP;
}
if ([!]Pt[.new]) apply_extension(#u);
Rdd=memd(Rs+#u6:3) EA=Rs+#u;
if ([!]Pt[.new][0]) {
Rdd = *EA;
} else {
NOP;
}
if ([!]Pt[.new]) EA=Rx;
Rdd=memd(Rx++#s4:3) if([!]Pt[.new][0]){
Rx=Rx+#s;
Rdd = *EA;
} else {
NOP;
}
if ([!]Pv[.new]) EA=Rs+(Rt<<#u);
Rdd=memd(Rs+Rt<<#u2) if ([!]Pv[.new][0]) {
Rdd = *EA;
} else {
NOP;
}

Class: LD (slots 0,1)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS s5 Parse t5 d5
if (Pv)
0 0 1 1 0 0 0 0 1 1 0 s s s s s P P i t t t t t i v v d d d d d Rdd=memd(Rs+Rt<<#u2)

0 0 1 1 0 0 0 1 1 1 0 s s s s s P P i t t t t t i v v d d d d d if (!Pv)
Rdd=memd(Rs+Rt<<#u2)

0 0 1 1 0 0 1 0 1 1 0 s s s s s P P i t t t t t i v v d d d d d if (Pv.new)
Rdd=memd(Rs+Rt<<#u2)
if (!Pv.new)
0 0 1 1 0 0 1 1 1 1 0 s s s s s P P i t t t t t i v v d d d d d Rdd=memd(Rs+Rt<<#u2)

Pr
Se ed
U
ICLASS ns Ne Type s5 Parse t2 d5
N
e
w

80-N2040-53 Rev. AA 225


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
if (Pt)
0 1 0 0 0 0 0 1 1 1 0 s s s s s P P 0 t t i i i i i i d d d d d
Rdd=memd(Rs+#u6:3)

0 1 0 0 0 0 1 1 1 1 0 s s s s s P P 0 t t i i i i i i d d d d d if (Pt.new)
Rdd=memd(Rs+#u6:3)
if (!Pt)
0 1 0 0 0 1 0 1 1 1 0 s s s s s P P 0 t t i i i i i i d d d d d
Rdd=memd(Rs+#u6:3)
if (!Pt.new)
0 1 0 0 0 1 1 1 1 1 0 s s s s s P P 0 t t i i i i i i d d d d d Rdd=memd(Rs+#u6:3)

U
ICLASS Amode Type x5 Parse t2 d5
N
if (Pt)
1 0 0 1 1 0 1 1 1 1 0 x x x x x P P 1 0 0 t t i i i i d d d d d
Rdd=memd(Rx++#s4:3)
if (!Pt)
1 0 0 1 1 0 1 1 1 1 0 x x x x x P P 1 0 1 t t i i i i d d d d d
Rdd=memd(Rx++#s4:3)

1 0 0 1 1 0 1 1 1 1 0 x x x x x P P 1 1 0 t t i i i i d d d d d if (Pt.new)
Rdd=memd(Rx++#s4:3)
if (!Pt.new)
1 0 0 1 1 0 1 1 1 1 0 x x x x x P P 1 1 1 t t i i i i d d d d d
Rdd=memd(Rx++#s4:3)
U
ICLASS Amode Type N Parse t2 d5

1 0 0 1 1 1 1 1 1 1 0 i i i i i P P 1 0 0 t t i 1 - - d d d d d if (Pt) Rdd=memd(#u6)
1 0 0 1 1 1 1 1 1 1 0 i i i i i P P 1 0 1 t t i 1 - - d d d d d if (!Pt) Rdd=memd(#u6)
1 0 0 1 1 1 1 1 1 1 0 i i i i i P P 1 1 0 t t i 1 - - d d d d d if (Pt.new) Rdd=memd(#u6)

1 0 0 1 1 1 1 1 1 1 0 i i i i i P P 1 1 1 t t i 1 - - d d d d d if (!Pt.new)
Rdd=memd(#u6)

Field name Description


ICLASS Instruction class
Amode Amode
Type Type
UN Unsigned
PredNew PredNew
Sense Sense
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t2 Field to encode register t
t5 Field to encode register t
v2 Field to encode register v
x5 Field to encode register x

80-N2040-53 Rev. AA 226


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Load byte
Load a signed byte from memory. The byte at the effective address in memory is placed in the
least-significant 8 bits of the destination register. The destination register is then sign-extended
from 8 bits to 32.

Syntax Behavior
Rd=memb(Re=#U6) apply_extension(#U);
EA=#U;
Rd = *EA;
Re=#U;
Rd=memb(Rs+#s11:0) apply_extension(#s);
EA=Rs+#s;
Rd = *EA;
Rd=memb(Rs+Rt<<#u2) EA=Rs+(Rt<<#u);
Rd = *EA;
Rd=memb(Rt<<#u2+#U6) apply_extension(#U);
EA=#U+(Rt<<#u);
Rd = *EA;
Rd=memb(Rx++#s4:0) EA=Rx;
Rx=Rx+#s;
Rd = *EA;
Rd=memb(Rx++#s4:0:circ(Mu)) EA=Rx;
Rx=Rx=circ_add(Rx,#s,MuV);
Rd = *EA;
Rd=memb(Rx++I:circ(Mu)) EA=Rx;
Rx=Rx=circ_add(Rx,I<<0,MuV);
Rd = *EA;
Rd=memb(Rx++Mu) EA=Rx;
Rx=Rx+MuV;
Rd = *EA;
Rd=memb(Rx++Mu:brev) EA=Rx.h[1] | brev(Rx.h[0]);
Rx=Rx+MuV;
Rd = *EA;
Rd=memb(gp+#u16:0) apply_extension(#u);
EA=(Constant_extended ? (0) : GP)+#u;
Rd = *EA;

Class: LD (slots 0,1)

Intrinsics

Rd=memb(Rx++#s4:0:circ(Mu Word32 Q6_R_memb_IM_circ(void** StartAddress, Word32


)) Is4_0, Word32 Mu, void* BaseAddress)
Rd=memb(Rx++I:circ(Mu)) Word32 Q6_R_memb_M_circ(void** StartAddress, Word32
Mu, void* BaseAddress)

80-N2040-53 Rev. AA 227


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS s5 Parse t5 d5
0 0 1 1 1 0 1 0 0 0 0 s s s s s P P i t t t t t i - - d d d d d Rd=memb(Rs+Rt<<#u2)
U
ICLASS Type Parse d5
N
0 1 0 0 1 i i 1 0 0 0 i i i i i P P i i i i i i i i i d d d d d Rd=memb(gp+#u16:0)
U
ICLASS Amode Type s5 Parse d5
N
1 0 0 1 0 i i 1 0 0 0 s s s s s P P i i i i i i i i i d d d d d Rd=memb(Rs+#s11:0)
U
ICLASS Amode Type x5 Parse u1 d5
N
Rd=memb(Rx++#s4:0:circ(
1 0 0 1 1 0 0 1 0 0 0 x x x x x P P u 0 - - 0 i i i i d d d d d
Mu))
1 0 0 1 1 0 0 1 0 0 0 x x x x x P P u 0 - - 1 - 0 - - d d d d d Rd=memb(Rx++I:circ(Mu))
U
ICLASS Amode Type e5 Parse d5
N
1 0 0 1 1 0 1 1 0 0 0 e e e e e P P 0 1 I I I I - I I d d d d d Rd=memb(Re=#U6)

ICLASS Amode Type U x5 Parse d5


N
1 0 0 1 1 0 1 1 0 0 0 x x x x x P P 0 0 - - - i i i i d d d d d Rd=memb(Rx++#s4:0)

ICLASS Amode Type U t5 Parse d5


N
1 0 0 1 1 1 0 1 0 0 0 t t t t t P P i 1 I I I I i I I d d d d d Rd=memb(Rt<<#u2+#U6)
U
ICLASS Amode Type x5 Parse u1 d5
N
1 0 0 1 1 1 0 1 0 0 0 x x x x x P P u 0 - - - - 0 - - d d d d d Rd=memb(Rx++Mu)
1 0 0 1 1 1 1 1 0 0 0 x x x x x P P u 0 - - - - 0 - - d d d d d Rd=memb(Rx++Mu:brev)

Field name Description


ICLASS Instruction class
Amode Amode
Type Type
UN Unsigned
Parse Packet/loop parse bits
d5 Field to encode register d
e5 Field to encode register e
s5 Field to encode register s
t5 Field to encode register t
u1 Field to encode register u
x5 Field to encode register x

80-N2040-53 Rev. AA 228


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Load byte conditionally


Load a signed byte from memory. The byte at the effective address in memory is placed in the
least-significant 8 bits of the destination register. The destination register is then sign-extended
from 8 bits to 32.
This instruction is conditional based on a predicate value. If the predicate is true, the instruction is
performed, otherwise it is treated as a NOP.

Syntax Behavior
if ([!]Pt[.new]) Rd=memb(#u6) apply_extension(#u);
EA=#u;
if ([!]Pt[.new][0]) {
Rd = *EA;
} else {
NOP;
}
if ([!]Pt[.new]) Rd=memb(Rs+#u6:0) apply_extension(#u);
EA=Rs+#u;
if ([!]Pt[.new][0]) {
Rd = *EA;
} else {
NOP;
}
if ([!]Pt[.new]) Rd=memb(Rx++#s4:0) EA=Rx;
if([!]Pt[.new][0]){
Rx=Rx+#s;
Rd = *EA;
} else {
NOP;
}
if ([!]Pv[.new]) Rd=memb(Rs+Rt<<#u2) EA=Rs+(Rt<<#u);
if ([!]Pv[.new][0]) {
Rd = *EA;
} else {
NOP;
}

Class: LD (slots 0,1)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS s5 Parse t5 d5

d if
(Pv)
0 0 1 1 0 0 0 0 0 0 0 s s s s s P P i t t t t t i v v d d d d
Rd=memb(Rs+Rt<<#u2)

0 0 1 1 0 0 0 1 0 0 0 s s s s s P P i t t t t t i v v d d d d d if (!Pv)
Rd=memb(Rs+Rt<<#u2)
if (Pv.new)
0 0 1 1 0 0 1 0 0 0 0 s s s s s P P i t t t t t i v v d d d d d Rd=memb(Rs+Rt<<#u2)

d ifRd=memb(Rs+Rt<<#u2)
(!Pv.new)
0 0 1 1 0 0 1 1 0 0 0 s s s s s P P i t t t t t i v v d d d d

80-N2040-53 Rev. AA 229


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Pr
Se
ed U
ICLASS ns Type s5 Parse t2 d5
Ne N
e
w
0 1 0 0 0 0 0 1 0 0 0 s s s s s P P 0 t t i i i i i i d d d d d if (Pt) Rd=memb(Rs+#u6:0)
if (Pt.new)
0 1 0 0 0 0 1 1 0 0 0 s s s s s P P 0 t t i i i i i i d d d d d
Rd=memb(Rs+#u6:0)
if (!Pt)
0 1 0 0 0 1 0 1 0 0 0 s s s s s P P 0 t t i i i i i i d d d d d
Rd=memb(Rs+#u6:0)
if (!Pt.new)
0 1 0 0 0 1 1 1 0 0 0 s s s s s P P 0 t t i i i i i i d d d d d
Rd=memb(Rs+#u6:0)
U
ICLASS Amode Type x5 Parse t2 d5
N
if (Pt)
1 0 0 1 1 0 1 1 0 0 0 x x x x x P P 1 0 0 t t i i i i d d d d d
Rd=memb(Rx++#s4:0)
if (!Pt)
1 0 0 1 1 0 1 1 0 0 0 x x x x x P P 1 0 1 t t i i i i d d d d d
Rd=memb(Rx++#s4:0)

1 0 0 1 1 0 1 1 0 0 0 x x x x x P P 1 1 0 t t i i i i d d d d d if (Pt.new)
Rd=memb(Rx++#s4:0)

1 0 0 1 1 0 1 1 0 0 0 x x x x x P P 1 1 1 t t i i i i d d d d d if (!Pt.new)
Rd=memb(Rx++#s4:0)
U
ICLASS Amode Type N Parse t2 d5

1 0 0 1 1 1 1 1 0 0 0 i i i i i P P 1 0 0 t t i 1 - - d d d d d if (Pt) Rd=memb(#u6)
1 0 0 1 1 1 1 1 0 0 0 i i i i i P P 1 0 1 t t i 1 - - d d d d d if (!Pt) Rd=memb(#u6)
1 0 0 1 1 1 1 1 0 0 0 i i i i i P P 1 1 0 t t i 1 - - d d d d d if (Pt.new) Rd=memb(#u6)
1 0 0 1 1 1 1 1 0 0 0 i i i i i P P 1 1 1 t t i 1 - - d d d d d if (!Pt.new) Rd=memb(#u6)

Field name Description


ICLASS Instruction class
Amode Amode
Type Type
UN Unsigned
PredNew PredNew
Sense Sense
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t2 Field to encode register t
t5 Field to encode register t
v2 Field to encode register v
x5 Field to encode register x

80-N2040-53 Rev. AA 230


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Load byte into shifted vector


Shift a 64-bit vector right by one byte. Insert a byte from memory into the vacated upper byte of
the vector.
Mem
Ryy

Ryy

Syntax Behavior
Ryy=memb_fifo(Re=#U6) apply_extension(#U);
EA=#U;
{
tmpV = *EA;
Ryy = (((size8u_t)Ryy)>>8)|(tmpV<<56);
}
Re=#U;
Ryy=memb_fifo(Rs) Assembler mapped to:
"Ryy=memb_fifo""(Rs+#0)"
Ryy=memb_fifo(Rs+#s11:0) apply_extension(#s);
EA=Rs+#s;
{
tmpV = *EA;
Ryy = (((size8u_t)Ryy)>>8)|(tmpV<<56);
}
Ryy=memb_fifo(Rt<<#u2+#U6) apply_extension(#U);
EA=#U+(Rt<<#u);
{
tmpV = *EA;
Ryy = (((size8u_t)Ryy)>>8)|(tmpV<<56);
}
Ryy=memb_fifo(Rx++#s4:0) EA=Rx;
Rx=Rx+#s;
{
tmpV = *EA;
Ryy = (((size8u_t)Ryy)>>8)|(tmpV<<56);
}
Ryy=memb_fifo(Rx++#s4:0:circ( EA=Rx;
Mu)) Rx=Rx=circ_add(Rx,#s,MuV);
{
tmpV = *EA;
Ryy = (((size8u_t)Ryy)>>8)|(tmpV<<56);
}

80-N2040-53 Rev. AA 231


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Syntax Behavior
Ryy=memb_fifo(Rx++I:circ(Mu)) EA=Rx;
Rx=Rx=circ_add(Rx,I<<0,MuV);
{
tmpV = *EA;
Ryy = (((size8u_t)Ryy)>>8)|(tmpV<<56);
}
Ryy=memb_fifo(Rx++Mu) EA=Rx;
Rx=Rx+MuV;
{
tmpV = *EA;
Ryy = (((size8u_t)Ryy)>>8)|(tmpV<<56);
}
Ryy=memb_fifo(Rx++Mu:brev) EA=Rx.h[1] | brev(Rx.h[0]);
Rx=Rx+MuV;
{
tmpV = *EA;
Ryy = (((size8u_t)Ryy)>>8)|(tmpV<<56);
}

Class: LD (slots 0,1)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
U
ICLASS Amode Type s5 Parse y5
N

1 0 0 1 0 i i 0 1 0 0 s s s s s P P i i i i i i i i i y y y y y Ryy=memb_fifo(Rs+#s11:0
)
U
ICLASS Amode Type x5 Parse u1 y5
N

1 0 0 1 1 0 0 0 1 0 0 x x x x x P P u 0 - - 0 i i i i y y y y y Ryy=memb_fifo(Rx++#s4:0
:circ(Mu))

1 0 0 1 1 0 0 0 1 0 0 x x x x x P P u 0 - - 1 - 0 - - y y y y y Ryy=memb_fifo(Rx++I:circ(
Mu))

ICLASS Amode Type U e5 Parse y5


N
1 0 0 1 1 0 1 0 1 0 0 e e e e e P P 0 1 I I I I - I I y y y y y Ryy=memb_fifo(Re=#U6)
U
ICLASS Amode Type x5 Parse y5
N

1 0 0 1 1 0 1 0 1 0 0 x x x x x P P 0 0 - - - i i i i y y y y y Ryy=memb_fifo(Rx++#s4:0
)
U
ICLASS Amode Type t5 Parse y5
N

1 0 0 1 1 1 0 0 1 0 0 t t t t t P P i 1 I I I I i I I y y y y y Ryy=memb_fifo(Rt<<#u2+#
U6)
U
ICLASS Amode Type x5 Parse u1 y5
N
1 0 0 1 1 1 0 0 1 0 0 x x x x x P P u 0 - - - - 0 - - y y y y y Ryy=memb_fifo(Rx++Mu)

1 0 0 1 1 1 1 0 1 0 0 x x x x x P P u 0 - - - - 0 - - y y y y y Ryy=memb_fifo(Rx++Mu:br
ev)

Field name Description


ICLASS Instruction class
Amode Amode
Type Type

80-N2040-53 Rev. AA 232


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Field name Description


UN Unsigned
Parse Packet/loop parse bits
e5 Field to encode register e
s5 Field to encode register s
t5 Field to encode register t
u1 Field to encode register u
x5 Field to encode register x
y5 Field to encode register y

80-N2040-53 Rev. AA 233


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Load half into shifted vector


Shift a 64-bit vector right by one halfword. Insert a halfword from memory into the vacated upper
halfword of the vector.
Mem
Ryy

Ryy

Syntax Behavior
Ryy=memh_fifo(Re=#U6) apply_extension(#U);
EA=#U;
{
tmpV = *EA;
Ryy = (((size8u_t)Ryy)>>16)|(tmpV<<48);
}
Re=#U;
Ryy=memh_fifo(Rs) Assembler mapped to:
"Ryy=memh_fifo""(Rs+#0)"
Ryy=memh_fifo(Rs+#s11:1) apply_extension(#s);
EA=Rs+#s;
{
tmpV = *EA;
Ryy = (((size8u_t)Ryy)>>16)|(tmpV<<48);
}
Ryy=memh_fifo(Rt<<#u2+#U6) apply_extension(#U);
EA=#U+(Rt<<#u);
{
tmpV = *EA;
Ryy = (((size8u_t)Ryy)>>16)|(tmpV<<48);
}
Ryy=memh_fifo(Rx++#s4:1) EA=Rx;
Rx=Rx+#s;
{
tmpV = *EA;
Ryy = (((size8u_t)Ryy)>>16)|(tmpV<<48);
}
Ryy=memh_fifo(Rx++#s4:1:circ( EA=Rx;
Mu)) Rx=Rx=circ_add(Rx,#s,MuV);
{
tmpV = *EA;
Ryy = (((size8u_t)Ryy)>>16)|(tmpV<<48);
}

80-N2040-53 Rev. AA 234


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Syntax Behavior
Ryy=memh_fifo(Rx++I:circ(Mu)) EA=Rx;
Rx=Rx=circ_add(Rx,I<<1,MuV);
{
tmpV = *EA;
Ryy = (((size8u_t)Ryy)>>16)|(tmpV<<48);
}
Ryy=memh_fifo(Rx++Mu) EA=Rx;
Rx=Rx+MuV;
{
tmpV = *EA;
Ryy = (((size8u_t)Ryy)>>16)|(tmpV<<48);
}
Ryy=memh_fifo(Rx++Mu:brev) EA=Rx.h[1] | brev(Rx.h[0]);
Rx=Rx+MuV;
{
tmpV = *EA;
Ryy = (((size8u_t)Ryy)>>16)|(tmpV<<48);
}

Class: LD (slots 0,1)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
U
ICLASS Amode Type s5 Parse y5
N

1 0 0 1 0 i i 0 0 1 0 s s s s s P P i i i i i i i i i y y y y y Ryy=memh_fifo(Rs+#s11:1
)
U
ICLASS Amode Type x5 Parse u1 y5
N

1 0 0 1 1 0 0 0 0 1 0 x x x x x P P u 0 - - 0 i i i i y y y y y Ryy=memh_fifo(Rx++#s4:1
:circ(Mu))

1 0 0 1 1 0 0 0 0 1 0 x x x x x P P u 0 - - 1 - 0 - - y y y y y Ryy=memh_fifo(Rx++I:circ(
Mu))

ICLASS Amode Type U e5 Parse y5


N
1 0 0 1 1 0 1 0 0 1 0 e e e e e P P 0 1 I I I I - I I y y y y y Ryy=memh_fifo(Re=#U6)
U
ICLASS Amode Type x5 Parse y5
N

1 0 0 1 1 0 1 0 0 1 0 x x x x x P P 0 0 - - - i i i i y y y y y Ryy=memh_fifo(Rx++#s4:1
)
U
ICLASS Amode Type t5 Parse y5
N

1 0 0 1 1 1 0 0 0 1 0 t t t t t P P i 1 I I I I i I I y y y y y Ryy=memh_fifo(Rt<<#u2+#
U6)
U
ICLASS Amode Type x5 Parse u1 y5
N
1 0 0 1 1 1 0 0 0 1 0 x x x x x P P u 0 - - - - 0 - - y y y y y Ryy=memh_fifo(Rx++Mu)

1 0 0 1 1 1 1 0 0 1 0 x x x x x P P u 0 - - - - 0 - - y y y y y Ryy=memh_fifo(Rx++Mu:br
ev)

Field name Description


ICLASS Instruction class
Amode Amode
Type Type

80-N2040-53 Rev. AA 235


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Field name Description


UN Unsigned
Parse Packet/loop parse bits
e5 Field to encode register e
s5 Field to encode register s
t5 Field to encode register t
u1 Field to encode register u
x5 Field to encode register x
y5 Field to encode register y

80-N2040-53 Rev. AA 236


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Load halfword
Load a signed halfword from memory. The 16-bit halfword at the effective address in memory is
placed in the least-significant 16 bits of the destination register. The destination register is then
sign-extended from 16 bits to 32.
Syntax Behavior
Rd=memh(Re=#U6) apply_extension(#U);
EA=#U;
Rd = *EA;
Re=#U;
Rd=memh(Rs+#s11:1) apply_extension(#s);
EA=Rs+#s;
Rd = *EA;
Rd=memh(Rs+Rt<<#u2) EA=Rs+(Rt<<#u);
Rd = *EA;
Rd=memh(Rt<<#u2+#U6) apply_extension(#U);
EA=#U+(Rt<<#u);
Rd = *EA;
Rd=memh(Rx++#s4:1) EA=Rx;
Rx=Rx+#s;
Rd = *EA;
Rd=memh(Rx++#s4:1:circ(Mu)) EA=Rx;
Rx=Rx=circ_add(Rx,#s,MuV);
Rd = *EA;
Rd=memh(Rx++I:circ(Mu)) EA=Rx;
Rx=Rx=circ_add(Rx,I<<1,MuV);
Rd = *EA;
Rd=memh(Rx++Mu) EA=Rx;
Rx=Rx+MuV;
Rd = *EA;
Rd=memh(Rx++Mu:brev) EA=Rx.h[1] | brev(Rx.h[0]);
Rx=Rx+MuV;
Rd = *EA;
Rd=memh(gp+#u16:1) apply_extension(#u);
EA=(Constant_extended ? (0) : GP)+#u;
Rd = *EA;

Class: LD (slots 0,1)

Intrinsics

Rd=memh(Rx++#s4:1:circ(Mu Word32 Q6_R_memh_IM_circ(void** StartAddress, Word32


)) Is4_1, Word32 Mu, void* BaseAddress)
Rd=memh(Rx++I:circ(Mu)) Word32 Q6_R_memh_M_circ(void** StartAddress, Word32
Mu, void* BaseAddress)

80-N2040-53 Rev. AA 237


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS s5 Parse t5 d5
0 0 1 1 1 0 1 0 0 1 0 s s s s s P P i t t t t t i - - d d d d d Rd=memh(Rs+Rt<<#u2)
U
ICLASS Type Parse d5
N
0 1 0 0 1 i i 1 0 1 0 i i i i i P P i i i i i i i i i d d d d d Rd=memh(gp+#u16:1)
U
ICLASS Amode Type s5 Parse d5
N
1 0 0 1 0 i i 1 0 1 0 s s s s s P P i i i i i i i i i d d d d d Rd=memh(Rs+#s11:1)
U
ICLASS Amode Type x5 Parse u1 d5
N
Rd=memh(Rx++#s4:1:circ(
1 0 0 1 1 0 0 1 0 1 0 x x x x x P P u 0 - - 0 i i i i d d d d d
Mu))
1 0 0 1 1 0 0 1 0 1 0 x x x x x P P u 0 - - 1 - 0 - - d d d d d Rd=memh(Rx++I:circ(Mu))
U
ICLASS Amode Type e5 Parse d5
N
1 0 0 1 1 0 1 1 0 1 0 e e e e e P P 0 1 I I I I - I I d d d d d Rd=memh(Re=#U6)

ICLASS Amode Type U x5 Parse d5


N
1 0 0 1 1 0 1 1 0 1 0 x x x x x P P 0 0 - - - i i i i d d d d d Rd=memh(Rx++#s4:1)

ICLASS Amode Type U t5 Parse d5


N
1 0 0 1 1 1 0 1 0 1 0 t t t t t P P i 1 I I I I i I I d d d d d Rd=memh(Rt<<#u2+#U6)
U
ICLASS Amode Type x5 Parse u1 d5
N
1 0 0 1 1 1 0 1 0 1 0 x x x x x P P u 0 - - - - 0 - - d d d d d Rd=memh(Rx++Mu)
1 0 0 1 1 1 1 1 0 1 0 x x x x x P P u 0 - - - - 0 - - d d d d d Rd=memh(Rx++Mu:brev)

Field name Description


ICLASS Instruction class
Amode Amode
Type Type
UN Unsigned
Parse Packet/loop parse bits
d5 Field to encode register d
e5 Field to encode register e
s5 Field to encode register s
t5 Field to encode register t
u1 Field to encode register u
x5 Field to encode register x

80-N2040-53 Rev. AA 238


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Load halfword conditionally


Load a signed halfword from memory. The 16-bit halfword at the effective address in memory is
placed in the least-significant 16 bits of the destination register. The destination register is then
sign-extended from 16 bits to 32.
This instruction is conditional based on a predicate value. If the predicate is true, the instruction is
performed, otherwise it is treated as a NOP.

Syntax Behavior
if ([!]Pt[.new]) Rd=memh(#u6) apply_extension(#u);
EA=#u;
if ([!]Pt[.new][0]) {
Rd = *EA;
} else {
NOP;
}
if ([!]Pt[.new]) apply_extension(#u);
Rd=memh(Rs+#u6:1) EA=Rs+#u;
if ([!]Pt[.new][0]) {
Rd = *EA;
} else {
NOP;
}
if ([!]Pt[.new]) EA=Rx;
Rd=memh(Rx++#s4:1) if([!]Pt[.new][0]){
Rx=Rx+#s;
Rd = *EA;
} else {
NOP;
}
if ([!]Pv[.new]) EA=Rs+(Rt<<#u);
Rd=memh(Rs+Rt<<#u2) if ([!]Pv[.new][0]) {
Rd = *EA;
} else {
NOP;
}

Class: LD (slots 0,1)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS s5 Parse t5 d5

d if
(Pv)
0 0 1 1 0 0 0 0 0 1 0 s s s s s P P i t t t t t i v v d d d d
Rd=memh(Rs+Rt<<#u2)

0 0 1 1 0 0 0 1 0 1 0 s s s s s P P i t t t t t i v v d d d d d if (!Pv)
Rd=memh(Rs+Rt<<#u2)
if (Pv.new)
0 0 1 1 0 0 1 0 0 1 0 s s s s s P P i t t t t t i v v d d d d d Rd=memh(Rs+Rt<<#u2)

d ifRd=memh(Rs+Rt<<#u2)
(!Pv.new)
0 0 1 1 0 0 1 1 0 1 0 s s s s s P P i t t t t t i v v d d d d

80-N2040-53 Rev. AA 239


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Pr
Se
ed U
ICLASS ns Type s5 Parse t2 d5
Ne N
e
w
0 1 0 0 0 0 0 1 0 1 0 s s s s s P P 0 t t i i i i i i d d d d d if (Pt) Rd=memh(Rs+#u6:1)
if (Pt.new)
0 1 0 0 0 0 1 1 0 1 0 s s s s s P P 0 t t i i i i i i d d d d d
Rd=memh(Rs+#u6:1)
if (!Pt)
0 1 0 0 0 1 0 1 0 1 0 s s s s s P P 0 t t i i i i i i d d d d d
Rd=memh(Rs+#u6:1)
if (!Pt.new)
0 1 0 0 0 1 1 1 0 1 0 s s s s s P P 0 t t i i i i i i d d d d d
Rd=memh(Rs+#u6:1)
U
ICLASS Amode Type x5 Parse t2 d5
N
if (Pt)
1 0 0 1 1 0 1 1 0 1 0 x x x x x P P 1 0 0 t t i i i i d d d d d
Rd=memh(Rx++#s4:1)
if (!Pt)
1 0 0 1 1 0 1 1 0 1 0 x x x x x P P 1 0 1 t t i i i i d d d d d
Rd=memh(Rx++#s4:1)

1 0 0 1 1 0 1 1 0 1 0 x x x x x P P 1 1 0 t t i i i i d d d d d if (Pt.new)
Rd=memh(Rx++#s4:1)

1 0 0 1 1 0 1 1 0 1 0 x x x x x P P 1 1 1 t t i i i i d d d d d if (!Pt.new)
Rd=memh(Rx++#s4:1)
U
ICLASS Amode Type N Parse t2 d5

1 0 0 1 1 1 1 1 0 1 0 i i i i i P P 1 0 0 t t i 1 - - d d d d d if (Pt) Rd=memh(#u6)
1 0 0 1 1 1 1 1 0 1 0 i i i i i P P 1 0 1 t t i 1 - - d d d d d if (!Pt) Rd=memh(#u6)
1 0 0 1 1 1 1 1 0 1 0 i i i i i P P 1 1 0 t t i 1 - - d d d d d if (Pt.new) Rd=memh(#u6)
1 0 0 1 1 1 1 1 0 1 0 i i i i i P P 1 1 1 t t i 1 - - d d d d d if (!Pt.new) Rd=memh(#u6)

Field name Description


ICLASS Instruction class
Amode Amode
Type Type
UN Unsigned
PredNew PredNew
Sense Sense
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t2 Field to encode register t
t5 Field to encode register t
v2 Field to encode register v
x5 Field to encode register x

80-N2040-53 Rev. AA 240


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Memory copy
Copy Mu + 1 (length) bytes from the address in Rt (source base) to the address in Rs (destination
base). The source base, destination base, and length values must be aligned to the L2 cache-line
size. Behavior is undefined for non-aligned values and for source or destination buffers partially in
illegal space. The accesses by the memcpy instruction are noncoherent with the cache-hierarchy
of the Q6.
In addition to normal translation exceptions, a coprocessor memory exception occurs if any of the
following are true:
■ Source or destination base address in illegal space
■ Source or destination buffer crosses a page boundary
■ Source base address is NOT in AXI space
■ Destination base address is NOT in VTCM
This instruction is only available on cores with VTCM.

Syntax Behavior
Rdd=pmemcpy(Rx,Rtt)

Class: LD (slots 0,1)

Notes

■ This is a solo instruction. It must not be grouped with other instructions in a packet.

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
U
ICLASS Amode Type t5 Parse x5 d5
N
1 0 0 1 1 0 0 1 1 1 1 t t t t t P P 0 x x x x x 0 0 0 d d d d d Rdd=pmemcpy(Rx,Rtt)

Field name Description


ICLASS Instruction class
Amode Amode
Type Type
UN Unsigned
Parse Packet/loop parse bits
d5 Field to encode register d
t5 Field to encode register t
x5 Field to encode register x

80-N2040-53 Rev. AA 241


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Piecemeal memory copy


Piecemeal memory copy.
Syntax Behavior
Rd=movlen(Rs,Rtt)
Rdd=linecpy(Rs,Rtt)
Rdd=pmemcpy(Rx,Rtt)

Class: CR (slot 3)

Notes

■ This is a solo instruction. It must not be grouped with other instructions in a packet.

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS sm t5 Parse s5 d5
0 1 1 0 1 1 1 1 1 1 1 t t t t t P P 0 s s s s s 0 1 0 d d d d d Rd=movlen(Rs,Rtt)

ICLASS Amode Type U t5 Parse s5 d5


N
1 0 0 1 1 0 0 1 1 1 1 t t t t t P P 0 s s s s s 0 0 1 d d d d d Rdd=linecpy(Rs,Rtt)
U
ICLASS Amode Type t5 Parse x5 d5
N
1 0 0 1 1 0 0 1 1 1 1 t t t t t P P 0 x x x x x 0 0 0 d d d d d Rdd=pmemcpy(Rx,Rtt)

Field name Description


sm Supervisor mode only
ICLASS Instruction class
Amode Amode
Type Type
UN Unsigned
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
x5 Field to encode register x

80-N2040-53 Rev. AA 242


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Load unsigned byte


Load an unsigned byte from memory. The byte at the effective address in memory is placed in the
least-significant 8 bits of the destination register. The destination register is then zero-extended
from 8 bits to 32.

Syntax Behavior
Rd=memub(Re=#U6) apply_extension(#U);
EA=#U;
Rd = *EA;
Re=#U;
Rd=memub(Rs+#s11:0) apply_extension(#s);
EA=Rs+#s;
Rd = *EA;
Rd=memub(Rs+Rt<<#u2) EA=Rs+(Rt<<#u);
Rd = *EA;
Rd=memub(Rt<<#u2+#U6) apply_extension(#U);
EA=#U+(Rt<<#u);
Rd = *EA;
Rd=memub(Rx++#s4:0) EA=Rx;
Rx=Rx+#s;
Rd = *EA;
Rd=memub(Rx++#s4:0:circ(Mu)) EA=Rx;
Rx=Rx=circ_add(Rx,#s,MuV);
Rd = *EA;
Rd=memub(Rx++I:circ(Mu)) EA=Rx;
Rx=Rx=circ_add(Rx,I<<0,MuV);
Rd = *EA;
Rd=memub(Rx++Mu) EA=Rx;
Rx=Rx+MuV;
Rd = *EA;
Rd=memub(Rx++Mu:brev) EA=Rx.h[1] | brev(Rx.h[0]);
Rx=Rx+MuV;
Rd = *EA;
Rd=memub(gp+#u16:0) apply_extension(#u);
EA=(Constant_extended ? (0) : GP)+#u;
Rd = *EA;

Class: LD (slots 0,1)

Intrinsics

Rd=memub(Rx++#s4:0:circ(Mu Word32 Q6_R_memub_IM_circ(void** StartAddress,


)) Word32 Is4_0, Word32 Mu, void* BaseAddress)
Rd=memub(Rx++I:circ(Mu)) Word32 Q6_R_memub_M_circ(void** StartAddress,
Word32 Mu, void* BaseAddress)

80-N2040-53 Rev. AA 243


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS s5 Parse t5 d5
0 0 1 1 1 0 1 0 0 0 1 s s s s s P P i t t t t t i - - d d d d d Rd=memub(Rs+Rt<<#u2)
U
ICLASS Type Parse d5
N
0 1 0 0 1 i i 1 0 0 1 i i i i i P P i i i i i i i i i d d d d d Rd=memub(gp+#u16:0)
U
ICLASS Amode Type s5 Parse d5
N
1 0 0 1 0 i i 1 0 0 1 s s s s s P P i i i i i i i i i d d d d d Rd=memub(Rs+#s11:0)
U
ICLASS Amode Type x5 Parse u1 d5
N
Rd=memub(Rx++#s4:0:circ
1 0 0 1 1 0 0 1 0 0 1 x x x x x P P u 0 - - 0 i i i i d d d d d
(Mu))

1 0 0 1 1 0 0 1 0 0 1 x x x x x P P u 0 - - 1 - 0 - - d d d d d Rd=memub(Rx++I:circ(Mu)
)
U
ICLASS Amode Type e5 Parse d5
N
1 0 0 1 1 0 1 1 0 0 1 e e e e e P P 0 1 I I I I - I I d d d d d Rd=memub(Re=#U6)
U
ICLASS Amode Type N x5 Parse d5

1 0 0 1 1 0 1 1 0 0 1 x x x x x P P 0 0 - - - i i i i d d d d d Rd=memub(Rx++#s4:0)
U
ICLASS Amode Type N
t5 Parse d5

1 0 0 1 1 1 0 1 0 0 1 t t t t t P P i 1 I I I I i I I d d d d d Rd=memub(Rt<<#u2+#U6)
U
ICLASS Amode Type x5 Parse u1 d5
N
1 0 0 1 1 1 0 1 0 0 1 x x x x x P P u 0 - - - - 0 - - d d d d d Rd=memub(Rx++Mu)
1 0 0 1 1 1 1 1 0 0 1 x x x x x P P u 0 - - - - 0 - - d d d d d Rd=memub(Rx++Mu:brev)

Field name Description


ICLASS Instruction class
Amode Amode
Type Type
UN Unsigned
Parse Packet/loop parse bits
d5 Field to encode register d
e5 Field to encode register e
s5 Field to encode register s
t5 Field to encode register t
u1 Field to encode register u
x5 Field to encode register x

80-N2040-53 Rev. AA 244


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Load unsigned byte conditionally


Load an unsigned byte from memory. The byte at the effective address in memory is placed in the
least-significant 8 bits of the destination register. The destination register is then zero-extended
from 8 bits to 32.
This instruction is conditional based on a predicate value. If the predicate is true, the instruction is
performed, otherwise it is treated as a NOP.

Syntax Behavior
if ([!]Pt[.new]) apply_extension(#u);
Rd=memub(#u6) EA=#u;
if ([!]Pt[.new][0]) {
Rd = *EA;
} else {
NOP;
}
if ([!]Pt[.new]) apply_extension(#u);
Rd=memub(Rs+#u6:0) EA=Rs+#u;
if ([!]Pt[.new][0]) {
Rd = *EA;
} else {
NOP;
}
if ([!]Pt[.new]) EA=Rx;
Rd=memub(Rx++#s4:0) if([!]Pt[.new][0]){
Rx=Rx+#s;
Rd = *EA;
} else {
NOP;
}
if ([!]Pv[.new]) EA=Rs+(Rt<<#u);
Rd=memub(Rs+Rt<<#u2) if ([!]Pv[.new][0]) {
Rd = *EA;
} else {
NOP;
}

Class: LD (slots 0,1)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS s5 Parse t5 d5

d if
(Pv)
0 0 1 1 0 0 0 0 0 0 1 s s s s s P P i t t t t t i v v d d d d
Rd=memub(Rs+Rt<<#u2)

0 0 1 1 0 0 0 1 0 0 1 s s s s s P P i t t t t t i v v d d d d d if (!Pv)
Rd=memub(Rs+Rt<<#u2)
if (Pv.new)
0 0 1 1 0 0 1 0 0 0 1 s s s s s P P i t t t t t i v v d d d d d Rd=memub(Rs+Rt<<#u2)

d ifRd=memub(Rs+Rt<<#u2)
(!Pv.new)
0 0 1 1 0 0 1 1 0 0 1 s s s s s P P i t t t t t i v v d d d d

80-N2040-53 Rev. AA 245


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Pr
Se
ed U
ICLASS ns Type s5 Parse t2 d5
Ne N
e
w

0 1 0 0 0 0 0 1 0 0 1 s s s s s P P 0 t t i i i i i i d d d d d if (Pt)
Rd=memub(Rs+#u6:0)
if (Pt.new)
0 1 0 0 0 0 1 1 0 0 1 s s s s s P P 0 t t i i i i i i d d d d d
Rd=memub(Rs+#u6:0)

0 1 0 0 0 1 0 1 0 0 1 s s s s s P P 0 t t i i i i i i d d d d d if (!Pt)
Rd=memub(Rs+#u6:0)
if (!Pt.new)
0 1 0 0 0 1 1 1 0 0 1 s s s s s P P 0 t t i i i i i i d d d d d
Rd=memub(Rs+#u6:0)
U
ICLASS Amode Type x5 Parse t2 d5
N
if (Pt)
1 0 0 1 1 0 1 1 0 0 1 x x x x x P P 1 0 0 t t i i i i d d d d d
Rd=memub(Rx++#s4:0)

1 0 0 1 1 0 1 1 0 0 1 x x x x x P P 1 0 1 t t i i i i d d d d d if (!Pt)
Rd=memub(Rx++#s4:0)
if (Pt.new)
1 0 0 1 1 0 1 1 0 0 1 x x x x x P P 1 1 0 t t i i i i d d d d d Rd=memub(Rx++#s4:0)

1 0 0 1 1 0 1 1 0 0 1 x x x x x P P 1 1 1 t t i i i i d d d d d if (!Pt.new)
Rd=memub(Rx++#s4:0)
U
ICLASS Amode Type Parse t2 d5
N
1 0 0 1 1 1 1 1 0 0 1 i i i i i P P 1 0 0 t t i 1 - - d d d d d if (Pt) Rd=memub(#u6)
1 0 0 1 1 1 1 1 0 0 1 i i i i i P P 1 0 1 t t i 1 - - d d d d d if (!Pt) Rd=memub(#u6)
1 0 0 1 1 1 1 1 0 0 1 i i i i i P P 1 1 0 t t i 1 - - d d d d d if (Pt.new) Rd=memub(#u6)
if (!Pt.new)
1 0 0 1 1 1 1 1 0 0 1 i i i i i P P 1 1 1 t t i 1 - - d d d d d Rd=memub(#u6)

Field name Description


ICLASS Instruction class
Amode Amode
Type Type
UN Unsigned
PredNew PredNew
Sense Sense
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t2 Field to encode register t
t5 Field to encode register t
v2 Field to encode register v
x5 Field to encode register x

80-N2040-53 Rev. AA 246


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Load unsigned halfword


Load an unsigned halfword from memory. The 16-bit halfword at the effective address in memory
is placed in the least-significant 16 bits of the destination register. The destination register is zero-
extended from 16 bits to 32.

Syntax Behavior
Rd=memuh(Re=#U6) apply_extension(#U);
EA=#U;
Rd = *EA;
Re=#U;
Rd=memuh(Rs+#s11:1) apply_extension(#s);
EA=Rs+#s;
Rd = *EA;
Rd=memuh(Rs+Rt<<#u2) EA=Rs+(Rt<<#u);
Rd = *EA;
Rd=memuh(Rt<<#u2+#U6) apply_extension(#U);
EA=#U+(Rt<<#u);
Rd = *EA;
Rd=memuh(Rx++#s4:1) EA=Rx;
Rx=Rx+#s;
Rd = *EA;
Rd=memuh(Rx++#s4:1:circ(Mu)) EA=Rx;
Rx=Rx=circ_add(Rx,#s,MuV);
Rd = *EA;
Rd=memuh(Rx++I:circ(Mu)) EA=Rx;
Rx=Rx=circ_add(Rx,I<<1,MuV);
Rd = *EA;
Rd=memuh(Rx++Mu) EA=Rx;
Rx=Rx+MuV;
Rd = *EA;
Rd=memuh(Rx++Mu:brev) EA=Rx.h[1] | brev(Rx.h[0]);
Rx=Rx+MuV;
Rd = *EA;
Rd=memuh(gp+#u16:1) apply_extension(#u);
EA=(Constant_extended ? (0) : GP)+#u;
Rd = *EA;

Class: LD (slots 0,1)

Intrinsics

Rd=memuh(Rx++#s4:1:circ(Mu Word32 Q6_R_memuh_IM_circ(void** StartAddress,


)) Word32 Is4_1, Word32 Mu, void* BaseAddress)
Rd=memuh(Rx++I:circ(Mu)) Word32 Q6_R_memuh_M_circ(void** StartAddress,
Word32 Mu, void* BaseAddress)

80-N2040-53 Rev. AA 247


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS s5 Parse t5 d5
0 0 1 1 1 0 1 0 0 1 1 s s s s s P P i t t t t t i - - d d d d d Rd=memuh(Rs+Rt<<#u2)
U
ICLASS Type Parse d5
N
0 1 0 0 1 i i 1 0 1 1 i i i i i P P i i i i i i i i i d d d d d Rd=memuh(gp+#u16:1)
U
ICLASS Amode Type s5 Parse d5
N
1 0 0 1 0 i i 1 0 1 1 s s s s s P P i i i i i i i i i d d d d d Rd=memuh(Rs+#s11:1)
U
ICLASS Amode Type x5 Parse u1 d5
N
Rd=memuh(Rx++#s4:1:circ
1 0 0 1 1 0 0 1 0 1 1 x x x x x P P u 0 - - 0 i i i i d d d d d
(Mu))

1 0 0 1 1 0 0 1 0 1 1 x x x x x P P u 0 - - 1 - 0 - - d d d d d Rd=memuh(Rx++I:circ(Mu)
)
U
ICLASS Amode Type e5 Parse d5
N
1 0 0 1 1 0 1 1 0 1 1 e e e e e P P 0 1 I I I I - I I d d d d d Rd=memuh(Re=#U6)
U
ICLASS Amode Type N x5 Parse d5

1 0 0 1 1 0 1 1 0 1 1 x x x x x P P 0 0 - - - i i i i d d d d d Rd=memuh(Rx++#s4:1)
U
ICLASS Amode Type N
t5 Parse d5

1 0 0 1 1 1 0 1 0 1 1 t t t t t P P i 1 I I I I i I I d d d d d Rd=memuh(Rt<<#u2+#U6)
U
ICLASS Amode Type x5 Parse u1 d5
N
1 0 0 1 1 1 0 1 0 1 1 x x x x x P P u 0 - - - - 0 - - d d d d d Rd=memuh(Rx++Mu)
1 0 0 1 1 1 1 1 0 1 1 x x x x x P P u 0 - - - - 0 - - d d d d d Rd=memuh(Rx++Mu:brev)

Field name Description


ICLASS Instruction class
Amode Amode
Type Type
UN Unsigned
Parse Packet/loop parse bits
d5 Field to encode register d
e5 Field to encode register e
s5 Field to encode register s
t5 Field to encode register t
u1 Field to encode register u
x5 Field to encode register x

80-N2040-53 Rev. AA 248


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Load unsigned halfword conditionally


Load an unsigned halfword from memory. The 16-bit halfword at the effective address in memory
is placed in the least-significant 16 bits of the destination register. The destination register is zero-
extended from 16 bits to 32.
This instruction is conditional based on a predicate value. If the predicate is true, the instruction is
performed, otherwise it is treated as a NOP.

Syntax Behavior
if ([!]Pt[.new]) apply_extension(#u);
Rd=memuh(#u6) EA=#u;
if ([!]Pt[.new][0]) {
Rd = *EA;
} else {
NOP;
}
if ([!]Pt[.new]) apply_extension(#u);
Rd=memuh(Rs+#u6:1) EA=Rs+#u;
if ([!]Pt[.new][0]) {
Rd = *EA;
} else {
NOP;
}
if ([!]Pt[.new]) EA=Rx;
Rd=memuh(Rx++#s4:1) if([!]Pt[.new][0]){
Rx=Rx+#s;
Rd = *EA;
} else {
NOP;
}
if ([!]Pv[.new]) EA=Rs+(Rt<<#u);
Rd=memuh(Rs+Rt<<#u2) if ([!]Pv[.new][0]) {
Rd = *EA;
} else {
NOP;
}

Class: LD (slots 0,1)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS s5 Parse t5 d5

d if
(Pv)
0 0 1 1 0 0 0 0 0 1 1 s s s s s P P i t t t t t i v v d d d d
Rd=memuh(Rs+Rt<<#u2)

0 0 1 1 0 0 0 1 0 1 1 s s s s s P P i t t t t t i v v d d d d d if (!Pv)
Rd=memuh(Rs+Rt<<#u2)
if (Pv.new)
0 0 1 1 0 0 1 0 0 1 1 s s s s s P P i t t t t t i v v d d d d d Rd=memuh(Rs+Rt<<#u2)

d ifRd=memuh(Rs+Rt<<#u2)
(!Pv.new)
0 0 1 1 0 0 1 1 0 1 1 s s s s s P P i t t t t t i v v d d d d

80-N2040-53 Rev. AA 249


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Pr
Se
ed U
ICLASS ns Type s5 Parse t2 d5
Ne N
e
w

0 1 0 0 0 0 0 1 0 1 1 s s s s s P P 0 t t i i i i i i d d d d d if (Pt)
Rd=memuh(Rs+#u6:1)
if (Pt.new)
0 1 0 0 0 0 1 1 0 1 1 s s s s s P P 0 t t i i i i i i d d d d d
Rd=memuh(Rs+#u6:1)

0 1 0 0 0 1 0 1 0 1 1 s s s s s P P 0 t t i i i i i i d d d d d if (!Pt)
Rd=memuh(Rs+#u6:1)
if (!Pt.new)
0 1 0 0 0 1 1 1 0 1 1 s s s s s P P 0 t t i i i i i i d d d d d
Rd=memuh(Rs+#u6:1)
U
ICLASS Amode Type x5 Parse t2 d5
N
if (Pt)
1 0 0 1 1 0 1 1 0 1 1 x x x x x P P 1 0 0 t t i i i i d d d d d
Rd=memuh(Rx++#s4:1)

1 0 0 1 1 0 1 1 0 1 1 x x x x x P P 1 0 1 t t i i i i d d d d d if (!Pt)
Rd=memuh(Rx++#s4:1)
if (Pt.new)
1 0 0 1 1 0 1 1 0 1 1 x x x x x P P 1 1 0 t t i i i i d d d d d Rd=memuh(Rx++#s4:1)

1 0 0 1 1 0 1 1 0 1 1 x x x x x P P 1 1 1 t t i i i i d d d d d if (!Pt.new)
Rd=memuh(Rx++#s4:1)
U
ICLASS Amode Type Parse t2 d5
N
1 0 0 1 1 1 1 1 0 1 1 i i i i i P P 1 0 0 t t i 1 - - d d d d d if (Pt) Rd=memuh(#u6)
1 0 0 1 1 1 1 1 0 1 1 i i i i i P P 1 0 1 t t i 1 - - d d d d d if (!Pt) Rd=memuh(#u6)
1 0 0 1 1 1 1 1 0 1 1 i i i i i P P 1 1 0 t t i 1 - - d d d d d if (Pt.new) Rd=memuh(#u6)
if (!Pt.new)
1 0 0 1 1 1 1 1 0 1 1 i i i i i P P 1 1 1 t t i 1 - - d d d d d Rd=memuh(#u6)

Field name Description


ICLASS Instruction class
Amode Amode
Type Type
UN Unsigned
PredNew PredNew
Sense Sense
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t2 Field to encode register t
t5 Field to encode register t
v2 Field to encode register v
x5 Field to encode register x

80-N2040-53 Rev. AA 250


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Load word
Load a 32-bit word from memory and place in a destination register.

Syntax Behavior
Rd=memw(Re=#U6) apply_extension(#U);
EA=#U;
Rd = *EA;
Re=#U;
Rd=memw(Rs+#s11:2) apply_extension(#s);
EA=Rs+#s;
Rd = *EA;
Rd=memw(Rs+Rt<<#u2) EA=Rs+(Rt<<#u);
Rd = *EA;
Rd=memw(Rt<<#u2+#U6) apply_extension(#U);
EA=#U+(Rt<<#u);
Rd = *EA;
Rd=memw(Rx++#s4:2) EA=Rx;
Rx=Rx+#s;
Rd = *EA;
Rd=memw(Rx++#s4:2:circ(Mu)) EA=Rx;
Rx=Rx=circ_add(Rx,#s,MuV);
Rd = *EA;
Rd=memw(Rx++I:circ(Mu)) EA=Rx;
Rx=Rx=circ_add(Rx,I<<2,MuV);
Rd = *EA;
Rd=memw(Rx++Mu) EA=Rx;
Rx=Rx+MuV;
Rd = *EA;
Rd=memw(Rx++Mu:brev) EA=Rx.h[1] | brev(Rx.h[0]);
Rx=Rx+MuV;
Rd = *EA;
Rd=memw(gp+#u16:2) apply_extension(#u);
EA=(Constant_extended ? (0) : GP)+#u;
Rd = *EA;

Class: LD (slots 0,1)

Intrinsics

Rd=memw(Rx++#s4:2:circ(Mu Word32 Q6_R_memw_IM_circ(void** StartAddress, Word32


)) Is4_2, Word32 Mu, void* BaseAddress)
Rd=memw(Rx++I:circ(Mu)) Word32 Q6_R_memw_M_circ(void** StartAddress, Word32
Mu, void* BaseAddress)

80-N2040-53 Rev. AA 251


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS s5 Parse t5 d5
0 0 1 1 1 0 1 0 1 0 0 s s s s s P P i t t t t t i - - d d d d d Rd=memw(Rs+Rt<<#u2)
U
ICLASS Type Parse d5
N
0 1 0 0 1 i i 1 1 0 0 i i i i i P P i i i i i i i i i d d d d d Rd=memw(gp+#u16:2)
U
ICLASS Amode Type s5 Parse d5
N
1 0 0 1 0 i i 1 1 0 0 s s s s s P P i i i i i i i i i d d d d d Rd=memw(Rs+#s11:2)
U
ICLASS Amode Type x5 Parse u1 d5
N
Rd=memw(Rx++#s4:2:circ(
1 0 0 1 1 0 0 1 1 0 0 x x x x x P P u 0 - - 0 i i i i d d d d d
Mu))
1 0 0 1 1 0 0 1 1 0 0 x x x x x P P u 0 - - 1 - 0 - - d d d d d Rd=memw(Rx++I:circ(Mu))
U
ICLASS Amode Type e5 Parse d5
N
1 0 0 1 1 0 1 1 1 0 0 e e e e e P P 0 1 I I I I - I I d d d d d Rd=memw(Re=#U6)

ICLASS Amode Type U x5 Parse d5


N
1 0 0 1 1 0 1 1 1 0 0 x x x x x P P 0 0 - - - i i i i d d d d d Rd=memw(Rx++#s4:2)

ICLASS Amode Type U t5 Parse d5


N
1 0 0 1 1 1 0 1 1 0 0 t t t t t P P i 1 I I I I i I I d d d d d Rd=memw(Rt<<#u2+#U6)
U
ICLASS Amode Type x5 Parse u1 d5
N
1 0 0 1 1 1 0 1 1 0 0 x x x x x P P u 0 - - - - 0 - - d d d d d Rd=memw(Rx++Mu)
1 0 0 1 1 1 1 1 1 0 0 x x x x x P P u 0 - - - - 0 - - d d d d d Rd=memw(Rx++Mu:brev)

Field name Description


ICLASS Instruction class
Amode Amode
Type Type
UN Unsigned
Parse Packet/loop parse bits
d5 Field to encode register d
e5 Field to encode register e
s5 Field to encode register s
t5 Field to encode register t
u1 Field to encode register u
x5 Field to encode register x

80-N2040-53 Rev. AA 252


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Load-acquire word
Load a 32-bit word from memory and place in a destination register. The load-acquire memory
operation is observed before any following memory operations (in program order) are observed
at the local point of serialization. A different order can be observed at the global point of
serialization (see Ordering and Synchronization).

Syntax Behavior
Rd=memw_aq(Rs) EA=Rs;
Rd = *EA

Class: LD (slots 0)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
U
ICLASS Amode Type s5 Parse d5
N
1 0 0 1 0 0 1 0 0 0 0 s s s s s P P 0 0 1 - - - 0 0 0 d d d d d Rd=memw_aq(Rs)

Field name Description


ICLASS Instruction class
Amode Amode
Type Type
UN Unsigned
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s

80-N2040-53 Rev. AA 253


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Load word conditionally


Load a 32-bit word from memory and place in a destination register.
This instruction is conditional based on a predicate value. If the predicate is true, the instruction is
performed, otherwise it is treated as a NOP.

Syntax Behavior
if ([!]Pt[.new]) Rd=memw(#u6) apply_extension(#u);
EA=#u;
if ([!]Pt[.new][0]) {
Rd = *EA;
} else {
NOP;
}
if ([!]Pt[.new]) Rd=memw(Rs+#u6:2) apply_extension(#u);
EA=Rs+#u;
if ([!]Pt[.new][0]) {
Rd = *EA;
} else {
NOP;
}
if ([!]Pt[.new]) Rd=memw(Rx++#s4:2) EA=Rx;
if([!]Pt[.new][0]){
Rx=Rx+#s;
Rd = *EA;
} else {
NOP;
}
if ([!]Pv[.new]) Rd=memw(Rs+Rt<<#u2) EA=Rs+(Rt<<#u);
if ([!]Pv[.new][0]) {
Rd = *EA;
} else {
NOP;
}

Class: LD (slots 0,1)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS s5 Parse t5 d5
if (Pv)
0 0 1 1 0 0 0 0 1 0 0 s s s s s P P i t t t t t i v v d d d d d Rd=memw(Rs+Rt<<#u2)

0 0 1 1 0 0 0 1 1 0 0 s s s s s P P i t t t t t i v v d d d d d if (!Pv)
Rd=memw(Rs+Rt<<#u2)

0 0 1 1 0 0 1 0 1 0 0 s s s s s P P i t t t t t i v v d d d d d if (Pv.new)
Rd=memw(Rs+Rt<<#u2)
if (!Pv.new)
0 0 1 1 0 0 1 1 1 0 0 s s s s s P P i t t t t t i v v d d d d d Rd=memw(Rs+Rt<<#u2)

Pr
Se ed
U
ICLASS ns Ne Type s5 Parse t2 d5
N
e
w

80-N2040-53 Rev. AA 254


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
if (Pt)
0 1 0 0 0 0 0 1 1 0 0 s s s s s P P 0 t t i i i i i i d d d d d
Rd=memw(Rs+#u6:2)

0 1 0 0 0 0 1 1 1 0 0 s s s s s P P 0 t t i i i i i i d d d d d if (Pt.new)
Rd=memw(Rs+#u6:2)
if (!Pt)
0 1 0 0 0 1 0 1 1 0 0 s s s s s P P 0 t t i i i i i i d d d d d
Rd=memw(Rs+#u6:2)
if (!Pt.new)
0 1 0 0 0 1 1 1 1 0 0 s s s s s P P 0 t t i i i i i i d d d d d Rd=memw(Rs+#u6:2)

U
ICLASS Amode Type x5 Parse t2 d5
N
if (Pt)
1 0 0 1 1 0 1 1 1 0 0 x x x x x P P 1 0 0 t t i i i i d d d d d
Rd=memw(Rx++#s4:2)
if (!Pt)
1 0 0 1 1 0 1 1 1 0 0 x x x x x P P 1 0 1 t t i i i i d d d d d
Rd=memw(Rx++#s4:2)

1 0 0 1 1 0 1 1 1 0 0 x x x x x P P 1 1 0 t t i i i i d d d d d if (Pt.new)
Rd=memw(Rx++#s4:2)
if (!Pt.new)
1 0 0 1 1 0 1 1 1 0 0 x x x x x P P 1 1 1 t t i i i i d d d d d
Rd=memw(Rx++#s4:2)
U
ICLASS Amode Type N Parse t2 d5

1 0 0 1 1 1 1 1 1 0 0 i i i i i P P 1 0 0 t t i 1 - - d d d d d if (Pt) Rd=memw(#u6)
1 0 0 1 1 1 1 1 1 0 0 i i i i i P P 1 0 1 t t i 1 - - d d d d d if (!Pt) Rd=memw(#u6)
1 0 0 1 1 1 1 1 1 0 0 i i i i i P P 1 1 0 t t i 1 - - d d d d d if (Pt.new) Rd=memw(#u6)
1 0 0 1 1 1 1 1 1 0 0 i i i i i P P 1 1 1 t t i 1 - - d d d d d if (!Pt.new) Rd=memw(#u6)

Field name Description


ICLASS Instruction class
Amode Amode
Type Type
UN Unsigned
PredNew PredNew
Sense Sense
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t2 Field to encode register t
t5 Field to encode register t
v2 Field to encode register v
x5 Field to encode register x

80-N2040-53 Rev. AA 255


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Deallocate stack frame


Deallocate a stack frame from the call stack. The instruction first loads the saved FP and saved LR
values from the address at FP. It then points SP back to the previous frame.
The following figure shows the stack layout.
Stack in memory

Saved LR
Saved FP

Higher address
Procedure local
data on stack

Stack frame

Saved LR
Saved FP FP register

Procedure local
data on stack

SP register
Lower address

Unallocated stack

Syntax Behavior
Rdd=deallocframe(Rs):raw EA=Rs;
tmp = *EA;
Rdd = frame_unscramble(tmp);
SP=EA+8;
deallocframe Assembler mapped to: "r31:30=deallocframe(r30):raw"

80-N2040-53 Rev. AA 256


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Class: LD (slots 0,1)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
U
ICLASS Amode Type s5 Parse d5
N
1 0 0 1 0 0 0 0 0 0 0 s s s s s P P 0 - - - - - - - - d d d d d Rdd=deallocframe(Rs):raw

Field name Description


ICLASS Instruction class
Amode Amode
Type Type
UN Unsigned
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s

80-N2040-53 Rev. AA 257


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Deallocate frame and return


Return from a function with a stack frame. This instruction is equivalent to deallocframe followed
by jumpr R31.

Syntax Behavior
Rdd=dealloc_return(Rs):raw EA=Rs;
tmp = *EA;
Rdd = frame_unscramble(tmp);
SP=EA+8;
PC=Rdd.w[1];
dealloc_return Assembler mapped to:
"r31:30=dealloc_return(r30):raw"
if ([!]Pv) Rdd=dealloc_return(Rs):raw EA=Rs;
if ([!]Pv[0]) {
tmp = *EA;
Rdd = frame_unscramble(tmp);
SP=EA+8;
PC=Rdd.w[1];
} else {
NOP;
}
if ([!]Pv) dealloc_return Assembler mapped to: "if ([!]Pv"")
r31:30=dealloc_return(r30)"":raw"
if ([!]Pv.new) EA=Rs;
Rdd=dealloc_return(Rs):nt:raw if ([!]Pv.new[0]) {
tmp = *EA;
Rdd = frame_unscramble(tmp);
SP=EA+8;
PC=Rdd.w[1];
} else {
NOP;
}
if ([!]Pv.new) EA=Rs;
Rdd=dealloc_return(Rs):t:raw if ([!]Pv.new[0]) {
tmp = *EA;
Rdd = frame_unscramble(tmp);
SP=EA+8;
PC=Rdd.w[1];
} else {
NOP;
}
if ([!]Pv.new) dealloc_return:nt Assembler mapped to: "if
([!]Pv"".new"")
r31:30=dealloc_return(r30)"":nt"":raw"
if ([!]Pv.new) dealloc_return:t Assembler mapped to: "if
([!]Pv"".new"")
r31:30=dealloc_return(r30)"":t"":raw"

80-N2040-53 Rev. AA 258


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Class: LD (slots 0)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
U
ICLASS Amode Type s5 Parse d5
N

1 0 0 1 0 1 1 0 0 0 0 s s s s s P P 0 0 0 0 - - - - - d d d d d Rdd=dealloc_return(Rs):ra
w
if (Pv.new)
1 0 0 1 0 1 1 0 0 0 0 s s s s s P P 0 0 1 0 v v - - - d d d d d Rdd=dealloc_return(Rs):nt:
raw
if (Pv)
1 0 0 1 0 1 1 0 0 0 0 s s s s s P P 0 1 0 0 v v - - - d d d d d Rdd=dealloc_return(Rs):ra
w
if (Pv.new)
1 0 0 1 0 1 1 0 0 0 0 s s s s s P P 0 1 1 0 v v - - - d d d d d Rdd=dealloc_return(Rs):t:r
aw
if (!Pv.new)
1 0 0 1 0 1 1 0 0 0 0 s s s s s P P 1 0 1 0 v v - - - d d d d d Rdd=dealloc_return(Rs):nt:
raw
if (!Pv)
1 0 0 1 0 1 1 0 0 0 0 s s s s s P P 1 1 0 0 v v - - - d d d d d Rdd=dealloc_return(Rs):ra
w
if (!Pv.new)
1 0 0 1 0 1 1 0 0 0 0 s s s s s P P 1 1 1 0 v v - - - d d d d d Rdd=dealloc_return(Rs):t:r
aw

Field name Description


ICLASS Instruction class
Amode Amode
Type Type
UN Unsigned
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
v2 Field to encode register v

80-N2040-53 Rev. AA 259


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Load and unpack bytes to halfwords


Load contiguous bytes from memory and vector unpack them into halfwords.
Rdd=memubh(amode)

mem

zero zero zero zero Rdd

Rd=memubh(amode)

mem

zero zero Rdd

Syntax Behavior
Rd=membh(Re=#U6) apply_extension(#U);
EA=#U;
{
tmpV = *EA;
for (i=0;i<2;i++) {
Rd.h[i]=tmpV.b[i];
}
}
Re=#U;
Rd=membh(Rs) Assembler mapped to: "Rd=membh""(Rs+#0)"
Rd=membh(Rs+#s11:1) apply_extension(#s);
EA=Rs+#s;
{
tmpV = *EA;
for (i=0;i<2;i++) {
Rd.h[i]=tmpV.b[i];
}
}
Rd=membh(Rt<<#u2+#U6) apply_extension(#U);
EA=#U+(Rt<<#u);
{
tmpV = *EA;
for (i=0;i<2;i++) {
Rd.h[i]=tmpV.b[i];
}
}

80-N2040-53 Rev. AA 260


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Syntax Behavior
Rd=membh(Rx++#s4:1) EA=Rx;
Rx=Rx+#s;
{
tmpV = *EA;
for (i=0;i<2;i++) {
Rd.h[i]=tmpV.b[i];
}
}
Rd=membh(Rx++#s4:1:circ(Mu)) EA=Rx;
Rx=Rx=circ_add(Rx,#s,MuV);
{
tmpV = *EA;
for (i=0;i<2;i++) {
Rd.h[i]=tmpV.b[i];
}
}
Rd=membh(Rx++I:circ(Mu)) EA=Rx;
Rx=Rx=circ_add(Rx,I<<1,MuV);
{
tmpV = *EA;
for (i=0;i<2;i++) {
Rd.h[i]=tmpV.b[i];
}
}
Rd=membh(Rx++Mu) EA=Rx;
Rx=Rx+MuV;
{
tmpV = *EA;
for (i=0;i<2;i++) {
Rd.h[i]=tmpV.b[i];
}
}
Rd=membh(Rx++Mu:brev) EA=Rx.h[1] | brev(Rx.h[0]);
Rx=Rx+MuV;
{
tmpV = *EA;
for (i=0;i<2;i++) {
Rd.h[i]=tmpV.b[i];
}
}
Rd=memubh(Re=#U6) apply_extension(#U);
EA=#U;
{
tmpV = *EA;
for (i=0;i<2;i++) {
Rd.h[i]=tmpV.ub[i];
}
}
Re=#U;

80-N2040-53 Rev. AA 261


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Syntax Behavior
Rd=memubh(Rs+#s11:1) apply_extension(#s);
EA=Rs+#s;
{
tmpV = *EA;
for (i=0;i<2;i++) {
Rd.h[i]=tmpV.ub[i];
}
}
Rd=memubh(Rt<<#u2+#U6) apply_extension(#U);
EA=#U+(Rt<<#u);
{
tmpV = *EA;
for (i=0;i<2;i++) {
Rd.h[i]=tmpV.ub[i];
}
}
Rd=memubh(Rx++#s4:1) EA=Rx;
Rx=Rx+#s;
{
tmpV = *EA;
for (i=0;i<2;i++) {
Rd.h[i]=tmpV.ub[i];
}
}
Rd=memubh(Rx++#s4:1:circ(Mu) EA=Rx;
) Rx=Rx=circ_add(Rx,#s,MuV);
{
tmpV = *EA;
for (i=0;i<2;i++) {
Rd.h[i]=tmpV.ub[i];
}
}
Rd=memubh(Rx++I:circ(Mu)) EA=Rx;
Rx=Rx=circ_add(Rx,I<<1,MuV);
{
tmpV = *EA;
for (i=0;i<2;i++) {
Rd.h[i]=tmpV.ub[i];
}
}
Rd=memubh(Rx++Mu) EA=Rx;
Rx=Rx+MuV;
{
tmpV = *EA;
for (i=0;i<2;i++) {
Rd.h[i]=tmpV.ub[i];
}
}

80-N2040-53 Rev. AA 262


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Syntax Behavior
Rd=memubh(Rx++Mu:brev) EA=Rx.h[1] | brev(Rx.h[0]);
Rx=Rx+MuV;
{
tmpV = *EA;
for (i=0;i<2;i++) {
Rd.h[i]=tmpV.ub[i];
}
}
Rdd=membh(Re=#U6) apply_extension(#U);
EA=#U;
{
tmpV = *EA;
for (i=0;i<4;i++) {
Rdd.h[i]=tmpV.b[i];
}
}
Re=#U;
Rdd=membh(Rs) Assembler mapped to: "Rdd=membh""(Rs+#0)"
Rdd=membh(Rs+#s11:2) apply_extension(#s);
EA=Rs+#s;
{
tmpV = *EA;
for (i=0;i<4;i++) {
Rdd.h[i]=tmpV.b[i];
}
}
Rdd=membh(Rt<<#u2+#U6) apply_extension(#U);
EA=#U+(Rt<<#u);
{
tmpV = *EA;
for (i=0;i<4;i++) {
Rdd.h[i]=tmpV.b[i];
}
}
Rdd=membh(Rx++#s4:2) EA=Rx;
Rx=Rx+#s;
{
tmpV = *EA;
for (i=0;i<4;i++) {
Rdd.h[i]=tmpV.b[i];
}
}
Rdd=membh(Rx++#s4:2:circ(Mu) EA=Rx;
) Rx=Rx=circ_add(Rx,#s,MuV);
{
tmpV = *EA;
for (i=0;i<4;i++) {
Rdd.h[i]=tmpV.b[i];
}
}

80-N2040-53 Rev. AA 263


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Syntax Behavior
Rdd=membh(Rx++I:circ(Mu)) EA=Rx;
Rx=Rx=circ_add(Rx,I<<2,MuV);
{
tmpV = *EA;
for (i=0;i<4;i++) {
Rdd.h[i]=tmpV.b[i];
}
}
Rdd=membh(Rx++Mu) EA=Rx;
Rx=Rx+MuV;
{
tmpV = *EA;
for (i=0;i<4;i++) {
Rdd.h[i]=tmpV.b[i];
}
}
Rdd=membh(Rx++Mu:brev) EA=Rx.h[1] | brev(Rx.h[0]);
Rx=Rx+MuV;
{
tmpV = *EA;
for (i=0;i<4;i++) {
Rdd.h[i]=tmpV.b[i];
}
}
Rdd=memubh(Re=#U6) apply_extension(#U);
EA=#U;
{
tmpV = *EA;
for (i=0;i<4;i++) {
Rdd.h[i]=tmpV.ub[i];
}
}
Re=#U;
Rdd=memubh(Rs+#s11:2) apply_extension(#s);
EA=Rs+#s;
{
tmpV = *EA;
for (i=0;i<4;i++) {
Rdd.h[i]=tmpV.ub[i];
}
}
Rdd=memubh(Rt<<#u2+#U6) apply_extension(#U);
EA=#U+(Rt<<#u);
{
tmpV = *EA;
for (i=0;i<4;i++) {
Rdd.h[i]=tmpV.ub[i];
}
}

80-N2040-53 Rev. AA 264


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Syntax Behavior
Rdd=memubh(Rx++#s4:2) EA=Rx;
Rx=Rx+#s;
{
tmpV = *EA;
for (i=0;i<4;i++) {
Rdd.h[i]=tmpV.ub[i];
}
}
Rdd=memubh(Rx++#s4:2:circ(Mu EA=Rx;
)) Rx=Rx=circ_add(Rx,#s,MuV);
{
tmpV = *EA;
for (i=0;i<4;i++) {
Rdd.h[i]=tmpV.ub[i];
}
}
Rdd=memubh(Rx++I:circ(Mu)) EA=Rx;
Rx=Rx=circ_add(Rx,I<<2,MuV);
{
tmpV = *EA;
for (i=0;i<4;i++) {
Rdd.h[i]=tmpV.ub[i];
}
}
Rdd=memubh(Rx++Mu) EA=Rx;
Rx=Rx+MuV;
{
tmpV = *EA;
for (i=0;i<4;i++) {
Rdd.h[i]=tmpV.ub[i];
}
}
Rdd=memubh(Rx++Mu:brev) EA=Rx.h[1] | brev(Rx.h[0]);
Rx=Rx+MuV;
{
tmpV = *EA;
for (i=0;i<4;i++) {
Rdd.h[i]=tmpV.ub[i];
}
}

Class: LD (slots 0,1)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
U
ICLASS Amode Type N s5 Parse d5

1 0 0 1 0 i i 0 0 0 1 s s s s s P P i i i i i i i i i d d d d d Rd=membh(Rs+#s11:1)
1 0 0 1 0 i i 0 0 1 1 s s s s s P P i i i i i i i i i d d d d d Rd=memubh(Rs+#s11:1)
1 0 0 1 0 i i 0 1 0 1 s s s s s P P i i i i i i i i i d d d d d Rdd=memubh(Rs+#s11:2)
1 0 0 1 0 i i 0 1 1 1 s s s s s P P i i i i i i i i i d d d d d Rdd=membh(Rs+#s11:2)

80-N2040-53 Rev. AA 265


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
U
ICLASS Amode Type x5 Parse u1 d5
N

1 0 0 1 1 0 0 0 0 0 1 x x x x x P P u 0 - - 0 i i i i d d d d d Rd=membh(Rx++#s4:1:circ
(Mu))
Rd=membh(Rx++I:circ(Mu)
1 0 0 1 1 0 0 0 0 0 1 x x x x x P P u 0 - - 1 - 0 - - d d d d d
)
Rd=memubh(Rx++#s4:1:cir
1 0 0 1 1 0 0 0 0 1 1 x x x x x P P u 0 - - 0 i i i i d d d d d c(Mu))

Rd=memubh(Rx++I:circ(Mu
1 0 0 1 1 0 0 0 0 1 1 x x x x x P P u 0 - - 1 - 0 - - d d d d d
))
Rdd=memubh(Rx++#s4:2:c
1 0 0 1 1 0 0 0 1 0 1 x x x x x P P u 0 - - 0 i i i i d d d d d
irc(Mu))
Rdd=memubh(Rx++I:circ(M
1 0 0 1 1 0 0 0 1 0 1 x x x x x P P u 0 - - 1 - 0 - - d d d d d
u))

1 0 0 1 1 0 0 0 1 1 1 x x x x x P P u 0 - - 0 i i i i d d d d d Rdd=membh(Rx++#s4:2:cir
c(Mu))
Rdd=membh(Rx++I:circ(Mu
1 0 0 1 1 0 0 0 1 1 1 x x x x x P P u 0 - - 1 - 0 - - d d d d d
))
U
ICLASS Amode Type N e5 Parse d5

1 0 0 1 1 0 1 0 0 0 1 e e e e e P P 0 1 I I I I - I I d d d d d Rd=membh(Re=#U6)
U
ICLASS Amode Type x5 Parse d5
N
1 0 0 1 1 0 1 0 0 0 1 x x x x x P P 0 0 - - - i i i i d d d d d Rd=membh(Rx++#s4:1)
U
ICLASS Amode Type e5 Parse d5
N
1 0 0 1 1 0 1 0 0 1 1 e e e e e P P 0 1 I I I I - I I d d d d d Rd=memubh(Re=#U6)
U
ICLASS Amode Type x5 Parse d5
N
1 0 0 1 1 0 1 0 0 1 1 x x x x x P P 0 0 - - - i i i i d d d d d Rd=memubh(Rx++#s4:1)
U
ICLASS Amode Type N e5 Parse d5

1 0 0 1 1 0 1 0 1 0 1 e e e e e P P 0 1 I I I I - I I d d d d d Rdd=memubh(Re=#U6)
U
ICLASS Amode Type N x5 Parse d5

1 0 0 1 1 0 1 0 1 0 1 x x x x x P P 0 0 - - - i i i i d d d d d Rdd=memubh(Rx++#s4:2)
U
ICLASS Amode Type e5 Parse d5
N
1 0 0 1 1 0 1 0 1 1 1 e e e e e P P 0 1 I I I I - I I d d d d d Rdd=membh(Re=#U6)
U
ICLASS Amode Type x5 Parse d5
N
1 0 0 1 1 0 1 0 1 1 1 x x x x x P P 0 0 - - - i i i i d d d d d Rdd=membh(Rx++#s4:2)
U
ICLASS Amode Type t5 Parse d5
N
1 0 0 1 1 1 0 0 0 0 1 t t t t t P P i 1 I I I I i I I d d d d d Rd=membh(Rt<<#u2+#U6)
U
ICLASS Amode Type x5 Parse u1 d5
N
1 0 0 1 1 1 0 0 0 0 1 x x x x x P P u 0 - - - - 0 - - d d d d d Rd=membh(Rx++Mu)

ICLASS Amode Type U t5 Parse d5


N

1 0 0 1 1 1 0 0 0 1 1 t t t t t P P i 1 I I I I i I I d d d d d Rd=memubh(Rt<<#u2+#U
6)
U
ICLASS Amode Type x5 Parse u1 d5
N
1 0 0 1 1 1 0 0 0 1 1 x x x x x P P u 0 - - - - 0 - - d d d d d Rd=memubh(Rx++Mu)
U
ICLASS Amode Type t5 Parse d5
N

1 0 0 1 1 1 0 0 1 0 1 t t t t t P P i 1 I I I I i I I d d d d d Rdd=memubh(Rt<<#u2+#
U6)

80-N2040-53 Rev. AA 266


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
U
ICLASS Amode Type x5 Parse u1 d5
N
1 0 0 1 1 1 0 0 1 0 1 x x x x x P P u 0 - - - - 0 - - d d d d d Rdd=memubh(Rx++Mu)
U
ICLASS Amode Type t5 Parse d5
N

1 0 0 1 1 1 0 0 1 1 1 t t t t t P P i 1 I I I I i I I d d d d d Rdd=membh(Rt<<#u2+#U
6)
U
ICLASS Amode Type x5 Parse u1 d5
N
1 0 0 1 1 1 0 0 1 1 1 x x x x x P P u 0 - - - - 0 - - d d d d d Rdd=membh(Rx++Mu)
1 0 0 1 1 1 1 0 0 0 1 x x x x x P P u 0 - - - - 0 - - d d d d d Rd=membh(Rx++Mu:brev)
Rd=memubh(Rx++Mu:brev
1 0 0 1 1 1 1 0 0 1 1 x x x x x P P u 0 - - - - 0 - - d d d d d
)
Rdd=memubh(Rx++Mu:bre
1 0 0 1 1 1 1 0 1 0 1 x x x x x P P u 0 - - - - 0 - - d d d d d
v)
Rdd=membh(Rx++Mu:brev
1 0 0 1 1 1 1 0 1 1 1 x x x x x P P u 0 - - - - 0 - - d d d d d )

Field name Description


ICLASS Instruction class
Amode Amode
Type Type
UN Unsigned
Parse Packet/loop parse bits
d5 Field to encode register d
e5 Field to encode register e
s5 Field to encode register s
t5 Field to encode register t
u1 Field to encode register u
x5 Field to encode register x

80-N2040-53 Rev. AA 267


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

11.6 MEMOP
The MEMOP instruction class includes simple operations on values in memory.
MEMOP instructions are executable on slot 0.

Operation on memory byte


Perform ALU or bit operation on the memory byte at the effective address.

Syntax Behavior
memb(Rs+#u6:0)=clrbit(#U5) apply_extension(#u);
EA=Rs+#u;
tmp = *EA;
tmp &= (~(1<<#U));
*EA = tmp;
memb(Rs+#u6:0)=setbit(#U5) apply_extension(#u);
EA=Rs+#u;
tmp = *EA;
tmp |= (1<<#U);
*EA = tmp;
memb(Rs+#u6:0)[+-]=#U5 apply_extension(#u);
EA=Rs[+-]#u;
tmp = *EA;
tmp [+-]= #U;
*EA = tmp;
memb(Rs+#u6:0)[+-|&]=Rt apply_extension(#u);
EA=Rs+#u;
tmp = *EA;
tmp [+-|&]= Rt;
*EA = tmp;

Class: MEMOP (slots 0)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS s5 Parse t5
0 0 1 1 1 1 1 0 - 0 0 s s s s s P P 0 i i i i i i 0 0 t t t t t memb(Rs+#u6:0)+=Rt
0 0 1 1 1 1 1 0 - 0 0 s s s s s P P 0 i i i i i i 0 1 t t t t t memb(Rs+#u6:0)-=Rt
0 0 1 1 1 1 1 0 - 0 0 s s s s s P P 0 i i i i i i 1 0 t t t t t memb(Rs+#u6:0)&=Rt
0 0 1 1 1 1 1 0 - 0 0 s s s s s P P 0 i i i i i i 1 1 t t t t t memb(Rs+#u6:0)|=Rt
ICLASS s5 Parse
0 0 1 1 1 1 1 1 - 0 0 s s s s s P P 0 i i i i i i 0 0 I I I I I memb(Rs+#u6:0)+=#U5
0 0 1 1 1 1 1 1 - 0 0 s s s s s P P 0 i i i i i i 0 1 I I I I I memb(Rs+#u6:0)-=#U5

0 0 1 1 1 1 1 1 - 0 0 s s s s s P P 0 i i i i i i 1 0 I I I I I memb(Rs+#u6:0)=clrbit(#U
5)
memb(Rs+#u6:0)=setbit(#
0 0 1 1 1 1 1 1 - 0 0 s s s s s P P 0 i i i i i i 1 1 I I I I I
U5)

80-N2040-53 Rev. AA 268


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 269


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Operation on memory halfword


Perform ALU or bit operation on the memory halfword at the effective address.

Syntax Behavior
memh(Rs+#u6:1)=clrbit(#U5) apply_extension(#u);
EA=Rs+#u;
tmp = *EA;
tmp &= (~(1<<#U));
*EA = tmp;
memh(Rs+#u6:1)=setbit(#U5) apply_extension(#u);
EA=Rs+#u;
tmp = *EA;
tmp |= (1<<#U);
*EA = tmp;
memh(Rs+#u6:1)[+-]=#U5 apply_extension(#u);
EA=Rs[+-]#u;
tmp = *EA;
tmp [+-]= #U;
*EA = tmp;
memh(Rs+#u6:1)[+-|&]=Rt apply_extension(#u);
EA=Rs+#u;
tmp = *EA;
tmp [+-|&]= Rt;
*EA = tmp;

Class: MEMOP (slots 0)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS s5 Parse t5
0 0 1 1 1 1 1 0 - 0 1 s s s s s P P 0 i i i i i i 0 0 t t t t t memh(Rs+#u6:1)+=Rt
0 0 1 1 1 1 1 0 - 0 1 s s s s s P P 0 i i i i i i 0 1 t t t t t memh(Rs+#u6:1)-=Rt
0 0 1 1 1 1 1 0 - 0 1 s s s s s P P 0 i i i i i i 1 0 t t t t t memh(Rs+#u6:1)&=Rt
0 0 1 1 1 1 1 0 - 0 1 s s s s s P P 0 i i i i i i 1 1 t t t t t memh(Rs+#u6:1)|=Rt
ICLASS s5 Parse
0 0 1 1 1 1 1 1 - 0 1 s s s s s P P 0 i i i i i i 0 0 I I I I I memh(Rs+#u6:1)+=#U5
0 0 1 1 1 1 1 1 - 0 1 s s s s s P P 0 i i i i i i 0 1 I I I I I memh(Rs+#u6:1)-=#U5
memh(Rs+#u6:1)=clrbit(#U
0 0 1 1 1 1 1 1 - 0 1 s s s s s P P 0 i i i i i i 1 0 I I I I I
5)
memh(Rs+#u6:1)=setbit(#
0 0 1 1 1 1 1 1 - 0 1 s s s s s P P 0 i i i i i i 1 1 I I I I I
U5)

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 270


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Operation on memory word


Perform ALU or bit operation on the memory word at the effective address.

Syntax Behavior
memw(Rs+#u6:2)=clrbit(#U5) apply_extension(#u);
EA=Rs+#u;
tmp = *EA;
tmp &= (~(1<<#U));
*EA = tmp;
memw(Rs+#u6:2)=setbit(#U5) apply_extension(#u);
EA=Rs+#u;
tmp = *EA;
tmp |= (1<<#U);
*EA = tmp;
memw(Rs+#u6:2)[+-]=#U5 apply_extension(#u);
EA=Rs[+-]#u;
tmp = *EA;
tmp [+-]= #U;
*EA = tmp;
memw(Rs+#u6:2)[+-|&]=Rt apply_extension(#u);
EA=Rs+#u;
tmp = *EA;
tmp [+-|&]= Rt;
*EA = tmp;

Class: MEMOP (slots 0)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS s5 Parse t5
0 0 1 1 1 1 1 0 - 1 0 s s s s s P P 0 i i i i i i 0 0 t t t t t memw(Rs+#u6:2)+=Rt
0 0 1 1 1 1 1 0 - 1 0 s s s s s P P 0 i i i i i i 0 1 t t t t t memw(Rs+#u6:2)-=Rt
0 0 1 1 1 1 1 0 - 1 0 s s s s s P P 0 i i i i i i 1 0 t t t t t memw(Rs+#u6:2)&=Rt
0 0 1 1 1 1 1 0 - 1 0 s s s s s P P 0 i i i i i i 1 1 t t t t t memw(Rs+#u6:2)|=Rt
ICLASS s5 Parse
0 0 1 1 1 1 1 1 - 1 0 s s s s s P P 0 i i i i i i 0 0 I I I I I memw(Rs+#u6:2)+=#U5
0 0 1 1 1 1 1 1 - 1 0 s s s s s P P 0 i i i i i i 0 1 I I I I I memw(Rs+#u6:2)-=#U5
memw(Rs+#u6:2)=clrbit(#U
0 0 1 1 1 1 1 1 - 1 0 s s s s s P P 0 i i i i i i 1 0 I I I I I
5)
memw(Rs+#u6:2)=setbit(#
0 0 1 1 1 1 1 1 - 1 0 s s s s s P P 0 i i i i i i 1 1 I I I I I
U5)

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 271


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

11.7 NV
The NV instruction class includes instructions that take the register source operand from another
instruction in the same packet.
NV instructions are executable on slot 0.

11.7.1 NV J
The NV J instruction subclass includes jump instructions that take the register source operand
from another instruction in the same packet.

Jump to address condition on new register value


Compare a register or constant against the value produced by a slot 1 instruction. If the
comparison is true, the program counter is changed to a target address, relative to the current PC.
This instruction is executable only on slot 0.

Syntax Behavior
if ([!]cmp.eq(Ns.new,#-1)) jump:<hint> if ((Ns.new[!]=(-1))) {
#r9:2 apply_extension(#r);
#r=#r & ~PCALIGN_MASK;
PC=PC+#r;
}
if ([!]cmp.eq(Ns.new,#U5)) jump:<hint> if ((Ns.new[!]=(#U))) {
#r9:2 apply_extension(#r);
#r=#r & ~PCALIGN_MASK;
PC=PC+#r;
}
if ([!]cmp.eq(Ns.new,Rt)) jump:<hint> if ((Ns.new[!]=Rt)) {
#r9:2 apply_extension(#r);
#r=#r & ~PCALIGN_MASK;
PC=PC+#r;
}
if ([!]cmp.gt(Ns.new,#-1)) jump:<hint> if ([!](Ns.new>(-1))) {
#r9:2 apply_extension(#r);
#r=#r & ~PCALIGN_MASK;
PC=PC+#r;
}
if ([!]cmp.gt(Ns.new,#U5)) jump:<hint> if ([!](Ns.new>(#U))) {
#r9:2 apply_extension(#r);
#r=#r & ~PCALIGN_MASK;
PC=PC+#r;
}
if ([!]cmp.gt(Ns.new,Rt)) jump:<hint> if ([!](Ns.new>Rt)) {
#r9:2 apply_extension(#r);
#r=#r & ~PCALIGN_MASK;
PC=PC+#r;
}

80-N2040-53 Rev. AA 272


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Syntax Behavior
if ([!]cmp.gt(Rt,Ns.new)) jump:<hint> if ([!](Rt>Ns.new)) {
#r9:2 apply_extension(#r);
#r=#r & ~PCALIGN_MASK;
PC=PC+#r;
}
if ([!]cmp.gtu(Ns.new,#U5)) jump:<hint> if ([!](Ns.new.uw[0]>(#U))) {
#r9:2 apply_extension(#r);
#r=#r & ~PCALIGN_MASK;
PC=PC+#r;
}
if ([!]cmp.gtu(Ns.new,Rt)) jump:<hint> if ([!](Ns.new.uw[0]>Rt.uw[0])) {
#r9:2 apply_extension(#r);
#r=#r & ~PCALIGN_MASK;
PC=PC+#r;
}
if ([!]cmp.gtu(Rt,Ns.new)) jump:<hint> if ([!](Rt.uw[0]>Ns.new.uw[0])) {
#r9:2 apply_extension(#r);
#r=#r & ~PCALIGN_MASK;
PC=PC+#r;
}
if ([!]tstbit(Ns.new,#0)) jump:<hint> if ([!]((Ns.new) & 1)) {
#r9:2 apply_extension(#r);
#r=#r & ~PCALIGN_MASK;
PC=PC+#r;
}

Class: NV (slots 0)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS s3 Parse t5

0 0 1 0 0 0 0 0 0 0 i i - s s s P P 0 t t t t t i i i i i i i - if (cmp.eq(Ns.new,Rt))
jump:nt #r9:2
if (cmp.eq(Ns.new,Rt))
0 0 1 0 0 0 0 0 0 0 i i - s s s P P 1 t t t t t i i i i i i i - jump:t #r9:2

0 0 1 0 0 0 0 0 0 1 i i - s s s P P 0 t t t t t i i i i i i i - if (!cmp.eq(Ns.new,Rt))
jump:nt #r9:2
if (!cmp.eq(Ns.new,Rt))
0 0 1 0 0 0 0 0 0 1 i i - s s s P P 1 t t t t t i i i i i i i -
jump:t #r9:2
if (cmp.gt(Ns.new,Rt))
0 0 1 0 0 0 0 0 1 0 i i - s s s P P 0 t t t t t i i i i i i i - jump:nt #r9:2
if (cmp.gt(Ns.new,Rt))
0 0 1 0 0 0 0 0 1 0 i i - s s s P P 1 t t t t t i i i i i i i -
jump:t #r9:2
if (!cmp.gt(Ns.new,Rt))
0 0 1 0 0 0 0 0 1 1 i i - s s s P P 0 t t t t t i i i i i i i -
jump:nt #r9:2
if (!cmp.gt(Ns.new,Rt))
0 0 1 0 0 0 0 0 1 1 i i - s s s P P 1 t t t t t i i i i i i i -
jump:t #r9:2
if (cmp.gtu(Ns.new,Rt))
0 0 1 0 0 0 0 1 0 0 i i - s s s P P 0 t t t t t i i i i i i i -
jump:nt #r9:2

0 0 1 0 0 0 0 1 0 0 i i - s s s P P 1 t t t t t i i i i i i i - if (cmp.gtu(Ns.new,Rt))
jump:t #r9:2
if (!cmp.gtu(Ns.new,Rt))
0 0 1 0 0 0 0 1 0 1 i i - s s s P P 0 t t t t t i i i i i i i -
jump:nt #r9:2

80-N2040-53 Rev. AA 273


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
if (!cmp.gtu(Ns.new,Rt))
0 0 1 0 0 0 0 1 0 1 i i - s s s P P 1 t t t t t i i i i i i i -
jump:t #r9:2

0 0 1 0 0 0 0 1 1 0 i i - s s s P P 0 t t t t t i i i i i i i - if (cmp.gt(Rt,Ns.new))
jump:nt #r9:2
if (cmp.gt(Rt,Ns.new))
0 0 1 0 0 0 0 1 1 0 i i - s s s P P 1 t t t t t i i i i i i i -
jump:t #r9:2
if (!cmp.gt(Rt,Ns.new))
0 0 1 0 0 0 0 1 1 1 i i - s s s P P 0 t t t t t i i i i i i i - jump:nt #r9:2
if (!cmp.gt(Rt,Ns.new))
0 0 1 0 0 0 0 1 1 1 i i - s s s P P 1 t t t t t i i i i i i i -
jump:t #r9:2
if (cmp.gtu(Rt,Ns.new))
0 0 1 0 0 0 1 0 0 0 i i - s s s P P 0 t t t t t i i i i i i i -
jump:nt #r9:2
if (cmp.gtu(Rt,Ns.new))
0 0 1 0 0 0 1 0 0 0 i i - s s s P P 1 t t t t t i i i i i i i -
jump:t #r9:2

0 0 1 0 0 0 1 0 0 1 i i - s s s P P 0 t t t t t i i i i i i i - if (!cmp.gtu(Rt,Ns.new))
jump:nt #r9:2
if (!cmp.gtu(Rt,Ns.new))
0 0 1 0 0 0 1 0 0 1 i i - s s s P P 1 t t t t t i i i i i i i -
jump:t #r9:2
ICLASS s3 Parse

0 0 1 0 0 1 0 0 0 0 i i - s s s P P 0 I I I I I i i i i i i i - if (cmp.eq(Ns.new,#U5))
jump:nt #r9:2
if (cmp.eq(Ns.new,#U5))
0 0 1 0 0 1 0 0 0 0 i i - s s s P P 1 I I I I I i i i i i i i -
jump:t #r9:2

0 0 1 0 0 1 0 0 0 1 i i - s s s P P 0 I I I I I i i i i i i i - if (!cmp.eq(Ns.new,#U5))
jump:nt #r9:2
if (!cmp.eq(Ns.new,#U5))
0 0 1 0 0 1 0 0 0 1 i i - s s s P P 1 I I I I I i i i i i i i -
jump:t #r9:2
if (cmp.gt(Ns.new,#U5))
0 0 1 0 0 1 0 0 1 0 i i - s s s P P 0 I I I I I i i i i i i i - jump:nt #r9:2

0 0 1 0 0 1 0 0 1 0 i i - s s s P P 1 I I I I I i i i i i i i - if (cmp.gt(Ns.new,#U5))
jump:t #r9:2
if (!cmp.gt(Ns.new,#U5))
0 0 1 0 0 1 0 0 1 1 i i - s s s P P 0 I I I I I i i i i i i i -
jump:nt #r9:2
if (!cmp.gt(Ns.new,#U5))
0 0 1 0 0 1 0 0 1 1 i i - s s s P P 1 I I I I I i i i i i i i - jump:t #r9:2
if (cmp.gtu(Ns.new,#U5))
0 0 1 0 0 1 0 1 0 0 i i - s s s P P 0 I I I I I i i i i i i i -
jump:nt #r9:2
if (cmp.gtu(Ns.new,#U5))
0 0 1 0 0 1 0 1 0 0 i i - s s s P P 1 I I I I I i i i i i i i -
jump:t #r9:2
if (!cmp.gtu(Ns.new,#U5))
0 0 1 0 0 1 0 1 0 1 i i - s s s P P 0 I I I I I i i i i i i i -
jump:nt #r9:2
if (!cmp.gtu(Ns.new,#U5))
0 0 1 0 0 1 0 1 0 1 i i - s s s P P 1 I I I I I i i i i i i i -
jump:t #r9:2
if (tstbit(Ns.new,#0))
0 0 1 0 0 1 0 1 1 0 i i - s s s P P 0 - - - - - i i i i i i i -
jump:nt #r9:2
if (tstbit(Ns.new,#0)) jump:t
0 0 1 0 0 1 0 1 1 0 i i - s s s P P 1 - - - - - i i i i i i i -
#r9:2

0 0 1 0 0 1 0 1 1 1 i i - s s s P P 0 - - - - - i i i i i i i - if (!tstbit(Ns.new,#0))
jump:nt #r9:2
if (!tstbit(Ns.new,#0)) jump:t
0 0 1 0 0 1 0 1 1 1 i i - s s s P P 1 - - - - - i i i i i i i -
#r9:2
if (cmp.eq(Ns.new,#-1))
0 0 1 0 0 1 1 0 0 0 i i - s s s P P 0 - - - - - i i i i i i i - jump:nt #r9:2
if (cmp.eq(Ns.new,#-1))
0 0 1 0 0 1 1 0 0 0 i i - s s s P P 1 - - - - - i i i i i i i -
jump:t #r9:2
if (!cmp.eq(Ns.new,#-1))
0 0 1 0 0 1 1 0 0 1 i i - s s s P P 0 - - - - - i i i i i i i -
jump:nt #r9:2

80-N2040-53 Rev. AA 274


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
if (!cmp.eq(Ns.new,#-1))
0 0 1 0 0 1 1 0 0 1 i i - s s s P P 1 - - - - - i i i i i i i -
jump:t #r9:2

0 0 1 0 0 1 1 0 1 0 i i - s s s P P 0 - - - - - i i i i i i i - if (cmp.gt(Ns.new,#-1))
jump:nt #r9:2
if (cmp.gt(Ns.new,#-1))
0 0 1 0 0 1 1 0 1 0 i i - s s s P P 1 - - - - - i i i i i i i -
jump:t #r9:2
if (!cmp.gt(Ns.new,#-1))
0 0 1 0 0 1 1 0 1 1 i i - s s s P P 0 - - - - - i i i i i i i - jump:nt #r9:2
if (!cmp.gt(Ns.new,#-1))
0 0 1 0 0 1 1 0 1 1 i i - s s s P P 1 - - - - - i i i i i i i -
jump:t #r9:2

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
s3 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 275


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

11.7.2 NV ST
The NV ST instruction subclass includes store instructions which take the register source operand
from another instruction in the same packet.

Store new-value byte


Store the least-significant byte in a source register in memory at the effective address.

Syntax Behavior
memb(Re=#U6)=Nt.new apply_extension(#U);
EA=#U;
*EA = Nt.new.b[0];
Re=#U;
memb(Rs+#s11:0)=Nt.new apply_extension(#s);
EA=Rs+#s;
*EA = Nt.new.b[0];
memb(Rs+Ru<<#u2)=Nt.new EA=Rs+(Ru<<#u);
*EA = Nt.new.b[0];
memb(Ru<<#u2+#U6)=Nt.new apply_extension(#U);
EA=#U+(Ru<<#u);
*EA = Nt.new.b[0];
memb(Rx++#s4:0)=Nt.new EA=Rx;
Rx=Rx+#s;
*EA = Nt.new.b[0];
memb(Rx++#s4:0:circ(Mu))=Nt. EA=Rx;
new Rx=Rx=circ_add(Rx,#s,MuV);
*EA = Nt.new.b[0];
memb(Rx++I:circ(Mu))=Nt.new EA=Rx;
Rx=Rx=circ_add(Rx,I<<0,MuV);
*EA = Nt.new.b[0];
memb(Rx++Mu)=Nt.new EA=Rx;
Rx=Rx+MuV;
*EA = Nt.new.b[0];
memb(Rx++Mu:brev)=Nt.new EA=Rx.h[1] | brev(Rx.h[0]);
Rx=Rx+MuV;
*EA = Nt.new.b[0];
memb(gp+#u16:0)=Nt.new apply_extension(#u);
EA=(Constant_extended ? (0) : GP)+#u;
*EA = Nt.new.b[0];

Class: NV (slots 0)

Notes

■ Forms of this instruction that use a new-value operand produced in the packet must execute
on slot 0.
■ This instruction can execute only in slot 0, even though it is an ST instruction.

80-N2040-53 Rev. AA 276


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS s5 Parse u5 t3
memb(Rs+Ru<<#u2)=Nt.n
0 0 1 1 1 0 1 1 1 0 1 s s s s s P P i u u u u u i - - 0 0 t t t ew
ICLASS Type Parse t3
0 1 0 0 1 i i 0 1 0 1 i i i i i P P i 0 0 t t t i i i i i i i i memb(gp+#u16:0)=Nt.new
U
ICLASS Amode Type s5 Parse t3
N
1 0 1 0 0 i i 1 1 0 1 s s s s s P P i 0 0 t t t i i i i i i i i memb(Rs+#s11:0)=Nt.new
U
ICLASS Amode Type x5 Parse u1 t3
N
memb(Rx++I:circ(Mu))=Nt.
1 0 1 0 1 0 0 1 1 0 1 x x x x x P P u 0 0 t t t 0 - - - - - 1 -
new

1 0 1 0 1 0 0 1 1 0 1 x x x x x P P u 0 0 t t t 0 i i i i - 0 - memb(Rx++#s4:0:circ(Mu))
=Nt.new
U
ICLASS Amode Type e5 Parse t3
N
1 0 1 0 1 0 1 1 1 0 1 e e e e e P P 0 0 0 t t t 1 - I I I I I I memb(Re=#U6)=Nt.new
U
ICLASS Amode Type N x5 Parse t3

1 0 1 0 1 0 1 1 1 0 1 x x x x x P P 0 0 0 t t t 0 i i i i - 0 - memb(Rx++#s4:0)=Nt.new
U
ICLASS Amode Type N
u5 Parse t3

memb(Ru<<#u2+#U6)=Nt.
1 0 1 0 1 1 0 1 1 0 1 u u u u u P P i 0 0 t t t 1 i I I I I I I new

ICLASS Amode Type U x5 Parse u1 t3


N
1 0 1 0 1 1 0 1 1 0 1 x x x x x P P u 0 0 t t t 0 - - - - - - - memb(Rx++Mu)=Nt.new
memb(Rx++Mu:brev)=Nt.n
1 0 1 0 1 1 1 1 1 0 1 x x x x x P P u 0 0 t t t 0 - - - - - - -
ew

Field name Description


ICLASS Instruction class
Type Type
Parse Packet/loop parse bits
e5 Field to encode register e
s5 Field to encode register s
t3 Field to encode register t
u1 Field to encode register u
u5 Field to encode register u
x5 Field to encode register x
Amode Amode
UN Unsigned

80-N2040-53 Rev. AA 277


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Store new-value byte conditionally


Store the least-significant byte in a source register in memory at the effective address.
This instruction is conditional based on a predicate value. If the predicate is true, the instruction is
performed, otherwise it is treated as a NOP.

Syntax Behavior
if ([!]Pv[.new]) apply_extension(#u);
memb(#u6)=Nt.new EA=#u;
if ([!]Pv[.new][0]) {
*EA = Nt[.new].b[0];
} else {
NOP;
}
if ([!]Pv[.new]) apply_extension(#u);
memb(Rs+#u6:0)=Nt.new EA=Rs+#u;
if ([!]Pv[.new][0]) {
*EA = Nt[.new].b[0];
} else {
NOP;
}
if ([!]Pv[.new]) EA=Rs+(Ru<<#u);
memb(Rs+Ru<<#u2)=Nt.new if ([!]Pv[.new][0]) {
*EA = Nt[.new].b[0];
} else {
NOP;
}
if ([!]Pv[.new]) EA=Rx;
memb(Rx++#s4:0)=Nt.new if ([!]Pv[.new][0]){
Rx=Rx+#s;
*EA = Nt[.new].b[0];
} else {
NOP;
}

Class: NV (slots 0)

Notes

■ Forms of this instruction which use a new-value operand produced in the packet must
execute on slot 0.
■ This instruction can execute only in slot 0, even though it is an ST instruction.

80-N2040-53 Rev. AA 278


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS s5 Parse u5 t3
if (Pv)
0 0 1 1 0 1 0 0 1 0 1 s s s s s P P i u u u u u i v v 0 0 t t t memb(Rs+Ru<<#u2)=Nt.n
ew
if (!Pv)
0 0 1 1 0 1 0 1 1 0 1 s s s s s P P i u u u u u i v v 0 0 t t t memb(Rs+Ru<<#u2)=Nt.n
ew
if (Pv.new)
0 0 1 1 0 1 1 0 1 0 1 s s s s s P P i u u u u u i v v 0 0 t t t memb(Rs+Ru<<#u2)=Nt.n
ew
if (!Pv.new)
0 0 1 1 0 1 1 1 1 0 1 s s s s s P P i u u u u u i v v 0 0 t t t memb(Rs+Ru<<#u2)=Nt.n
ew
Pr
Se
ed
ICLASS ns Type s5 Parse t3
Ne
e
w
if (Pv)
0 1 0 0 0 0 0 0 1 0 1 s s s s s P P i 0 0 t t t i i i i i 0 v v memb(Rs+#u6:0)=Nt.new

v if
(Pv.new)
0 1 0 0 0 0 1 0 1 0 1 s s s s s P P i 0 0 t t t i i i i i 0 v memb(Rs+#u6:0)=Nt.new

0 1 0 0 0 1 0 0 1 0 1 s s s s s P P i 0 0 t t t i i i i i 0 v v if (!Pv)
memb(Rs+#u6:0)=Nt.new
if (!Pv.new)
0 1 0 0 0 1 1 0 1 0 1 s s s s s P P i 0 0 t t t i i i i i 0 v v memb(Rs+#u6:0)=Nt.new

ICLASS Amode Type U x5 Parse t3


N

1 0 1 0 1 0 1 1 1 0 1 x x x x x P P 1 0 0 t t t 0 i i i i 0 v v if (Pv)
memb(Rx++#s4:0)=Nt.new
if (!Pv)
1 0 1 0 1 0 1 1 1 0 1 x x x x x P P 1 0 0 t t t 0 i i i i 1 v v memb(Rx++#s4:0)=Nt.new

v if
1 0 1 0 1 0 1 1 1 0 1 x x x x x P P 1 0 0 t t t 1 i i i i 0 (Pv.new)
v
memb(Rx++#s4:0)=Nt.new

1 0 1 0 1 0 1 1 1 0 1 x x x x x P P 1 0 0 t t t 1 i i i i 1 v v if (!Pv.new)
memb(Rx++#s4:0)=Nt.new
U
ICLASS Amode Type Parse t3
N
1 0 1 0 1 1 1 1 1 0 1 - - - i i P P 0 0 0 t t t 1 i i i i 0 v v if (Pv) memb(#u6)=Nt.new
1 0 1 0 1 1 1 1 1 0 1 - - - i i P P 0 0 0 t t t 1 i i i i 1 v v if (!Pv) memb(#u6)=Nt.new
if (Pv.new)
1 0 1 0 1 1 1 1 1 0 1 - - - i i P P 1 0 0 t t t 1 i i i i 0 v v memb(#u6)=Nt.new

v ifmemb(#u6)=Nt.new
(!Pv.new)
1 0 1 0 1 1 1 1 1 0 1 - - - i i P P 1 0 0 t t t 1 i i i i 1 v

Field name Description


ICLASS Instruction class
Type Type
PredNew PredNew
Sense Sense
Parse Packet/loop parse bits
s5 Field to encode register s
t3 Field to encode register t
u5 Field to encode register u
v2 Field to encode register v
x5 Field to encode register x
Amode Amode
UN Unsigned

80-N2040-53 Rev. AA 279


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Store new-value halfword


Store the upper or lower 16-bits of a source register in memory at the effective address.

Syntax Behavior
memh(Re=#U6)=Nt.new apply_extension(#U);
EA=#U;
*EA = Nt.new.h[0];
Re=#U;
memh(Rs+#s11:1)=Nt.new apply_extension(#s);
EA=Rs+#s;
*EA = Nt.new.h[0];
memh(Rs+Ru<<#u2)=Nt.new EA=Rs+(Ru<<#u);
*EA = Nt.new.h[0];
memh(Ru<<#u2+#U6)=Nt.new apply_extension(#U);
EA=#U+(Ru<<#u);
*EA = Nt.new.h[0];
memh(Rx++#s4:1)=Nt.new EA=Rx;
Rx=Rx+#s;
*EA = Nt.new.h[0];
memh(Rx++#s4:1:circ(Mu))=Nt. EA=Rx;
new Rx=Rx=circ_add(Rx,#s,MuV);
*EA = Nt.new.h[0];
memh(Rx++I:circ(Mu))=Nt.new EA=Rx;
Rx=Rx=circ_add(Rx,I<<1,MuV);
*EA = Nt.new.h[0];
memh(Rx++Mu)=Nt.new EA=Rx;
Rx=Rx+MuV;
*EA = Nt.new.h[0];
memh(Rx++Mu:brev)=Nt.new EA=Rx.h[1] | brev(Rx.h[0]);
Rx=Rx+MuV;
*EA = Nt.new.h[0];
memh(gp+#u16:1)=Nt.new apply_extension(#u);
EA=(Constant_extended ? (0) : GP)+#u;
*EA = Nt.new.h[0];

Class: NV (slots 0)

Notes

■ Forms of this instruction that use a new-value operand produced in the packet must execute
on slot 0.
■ This instruction can execute only in slot 0, even though it is an ST instruction.

80-N2040-53 Rev. AA 280


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS s5 Parse u5 t3
memh(Rs+Ru<<#u2)=Nt.n
0 0 1 1 1 0 1 1 1 0 1 s s s s s P P i u u u u u i - - 0 1 t t t ew
ICLASS Type Parse t3
0 1 0 0 1 i i 0 1 0 1 i i i i i P P i 0 1 t t t i i i i i i i i memh(gp+#u16:1)=Nt.new
U
ICLASS Amode Type s5 Parse t3
N
1 0 1 0 0 i i 1 1 0 1 s s s s s P P i 0 1 t t t i i i i i i i i memh(Rs+#s11:1)=Nt.new
U
ICLASS Amode Type x5 Parse u1 t3
N
memh(Rx++I:circ(Mu))=Nt.
1 0 1 0 1 0 0 1 1 0 1 x x x x x P P u 0 1 t t t 0 - - - - - 1 -
new

1 0 1 0 1 0 0 1 1 0 1 x x x x x P P u 0 1 t t t 0 i i i i - 0 - memh(Rx++#s4:1:circ(Mu))
=Nt.new
U
ICLASS Amode Type e5 Parse t3
N
1 0 1 0 1 0 1 1 1 0 1 e e e e e P P 0 0 1 t t t 1 - I I I I I I memh(Re=#U6)=Nt.new
U
ICLASS Amode Type N x5 Parse t3

1 0 1 0 1 0 1 1 1 0 1 x x x x x P P 0 0 1 t t t 0 i i i i - 0 - memh(Rx++#s4:1)=Nt.new
U
ICLASS Amode Type N
u5 Parse t3

memh(Ru<<#u2+#U6)=Nt.
1 0 1 0 1 1 0 1 1 0 1 u u u u u P P i 0 1 t t t 1 i I I I I I I new

ICLASS Amode Type U x5 Parse u1 t3


N
1 0 1 0 1 1 0 1 1 0 1 x x x x x P P u 0 1 t t t 0 - - - - - - - memh(Rx++Mu)=Nt.new
memh(Rx++Mu:brev)=Nt.n
1 0 1 0 1 1 1 1 1 0 1 x x x x x P P u 0 1 t t t 0 - - - - - - -
ew

Field name Description


ICLASS Instruction class
Type Type
Parse Packet/loop parse bits
e5 Field to encode register e
s5 Field to encode register s
t3 Field to encode register t
u1 Field to encode register u
u5 Field to encode register u
x5 Field to encode register x
Amode Amode
UN Unsigned

80-N2040-53 Rev. AA 281


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Store new-value halfword conditionally


Store the upper or lower 16 bits of a source register in memory at the effective address.
This instruction is conditional based on a predicate value. If the predicate is true, the instruction is
performed, otherwise it is treated as a NOP.

Syntax Behavior
if ([!]Pv[.new]) memh(#u6)=Nt.new apply_extension(#u);
EA=#u;
if ([!]Pv[.new][0]) {
*EA = Nt[.new].h[0];
} else {
NOP;
}
if ([!]Pv[.new]) apply_extension(#u);
memh(Rs+#u6:1)=Nt.new EA=Rs+#u;
if ([!]Pv[.new][0]) {
*EA = Nt[.new].h[0];
} else {
NOP;
}
if ([!]Pv[.new]) EA=Rs+(Ru<<#u);
memh(Rs+Ru<<#u2)=Nt.new if ([!]Pv[.new][0]) {
*EA = Nt[.new].h[0];
} else {
NOP;
}
if ([!]Pv[.new]) EA=Rx;
memh(Rx++#s4:1)=Nt.new if ([!]Pv[.new][0]){
Rx=Rx+#s;
*EA = Nt[.new].h[0];
} else {
NOP;
}

Class: NV (slots 0)

Notes

■ Forms of this instruction that use a new-value operand produced in the packet must execute
on slot 0.
■ This instruction can execute only in slot 0, even though it is an ST instruction.

80-N2040-53 Rev. AA 282


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS s5 Parse u5 t3
if (Pv)
0 0 1 1 0 1 0 0 1 0 1 s s s s s P P i u u u u u i v v 0 1 t t t memh(Rs+Ru<<#u2)=Nt.n
ew
if (!Pv)
0 0 1 1 0 1 0 1 1 0 1 s s s s s P P i u u u u u i v v 0 1 t t t memh(Rs+Ru<<#u2)=Nt.n
ew
if (Pv.new)
0 0 1 1 0 1 1 0 1 0 1 s s s s s P P i u u u u u i v v 0 1 t t t memh(Rs+Ru<<#u2)=Nt.n
ew
if (!Pv.new)
0 0 1 1 0 1 1 1 1 0 1 s s s s s P P i u u u u u i v v 0 1 t t t memh(Rs+Ru<<#u2)=Nt.n
ew
Pr
Se
ed
ICLASS ns Type s5 Parse t3
Ne
e
w
if (Pv)
0 1 0 0 0 0 0 0 1 0 1 s s s s s P P i 0 1 t t t i i i i i 0 v v memh(Rs+#u6:1)=Nt.new

v if
(Pv.new)
0 1 0 0 0 0 1 0 1 0 1 s s s s s P P i 0 1 t t t i i i i i 0 v memh(Rs+#u6:1)=Nt.new

0 1 0 0 0 1 0 0 1 0 1 s s s s s P P i 0 1 t t t i i i i i 0 v v if (!Pv)
memh(Rs+#u6:1)=Nt.new
if (!Pv.new)
0 1 0 0 0 1 1 0 1 0 1 s s s s s P P i 0 1 t t t i i i i i 0 v v memh(Rs+#u6:1)=Nt.new

ICLASS Amode Type U x5 Parse t3


N

1 0 1 0 1 0 1 1 1 0 1 x x x x x P P 1 0 1 t t t 0 i i i i 0 v v if (Pv)
memh(Rx++#s4:1)=Nt.new
if (!Pv)
1 0 1 0 1 0 1 1 1 0 1 x x x x x P P 1 0 1 t t t 0 i i i i 1 v v memh(Rx++#s4:1)=Nt.new

v if
1 0 1 0 1 0 1 1 1 0 1 x x x x x P P 1 0 1 t t t 1 i i i i 0 (Pv.new)
v
memh(Rx++#s4:1)=Nt.new

1 0 1 0 1 0 1 1 1 0 1 x x x x x P P 1 0 1 t t t 1 i i i i 1 v v if (!Pv.new)
memh(Rx++#s4:1)=Nt.new
U
ICLASS Amode Type Parse t3
N
1 0 1 0 1 1 1 1 1 0 1 - - - i i P P 0 0 1 t t t 1 i i i i 0 v v if (Pv) memh(#u6)=Nt.new
1 0 1 0 1 1 1 1 1 0 1 - - - i i P P 0 0 1 t t t 1 i i i i 1 v v if (!Pv) memh(#u6)=Nt.new
if (Pv.new)
1 0 1 0 1 1 1 1 1 0 1 - - - i i P P 1 0 1 t t t 1 i i i i 0 v v memh(#u6)=Nt.new

v ifmemh(#u6)=Nt.new
(!Pv.new)
1 0 1 0 1 1 1 1 1 0 1 - - - i i P P 1 0 1 t t t 1 i i i i 1 v

Field name Description


ICLASS Instruction class
Type Type
PredNew PredNew
Sense Sense
Parse Packet/loop parse bits
s5 Field to encode register s
t3 Field to encode register t
u5 Field to encode register u
v2 Field to encode register v
x5 Field to encode register x
Amode Amode
UN Unsigned

80-N2040-53 Rev. AA 283


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Store new-value word


Store a 32-bit register in memory at the effective address.
Syntax Behavior
memw(Re=#U6)=Nt.new apply_extension(#U);
EA=#U;
*EA = Nt.new;
Re=#U;
memw(Rs+#s11:2)=Nt.new apply_extension(#s);
EA=Rs+#s;
*EA = Nt.new;
memw(Rs+Ru<<#u2)=Nt.new EA=Rs+(Ru<<#u);
*EA = Nt.new;
memw(Ru<<#u2+#U6)=Nt.new apply_extension(#U);
EA=#U+(Ru<<#u);
*EA = Nt.new;
memw(Rx++#s4:2)=Nt.new EA=Rx;
Rx=Rx+#s;
*EA = Nt.new;
memw(Rx++#s4:2:circ(Mu))=Nt. EA=Rx;
new Rx=Rx=circ_add(Rx,#s,MuV);
*EA = Nt.new;
memw(Rx++I:circ(Mu))=Nt.new EA=Rx;
Rx=Rx=circ_add(Rx,I<<2,MuV);
*EA = Nt.new;
memw(Rx++Mu)=Nt.new EA=Rx;
Rx=Rx+MuV;
*EA = Nt.new;
memw(Rx++Mu:brev)=Nt.new EA=Rx.h[1] | brev(Rx.h[0]);
Rx=Rx+MuV;
*EA = Nt.new;
memw(gp+#u16:2)=Nt.new apply_extension(#u);
EA=(Constant_extended ? (0) : GP)+#u;
*EA = Nt.new;

Class: NV (slots 0)

Notes

■ Forms of this instruction that use a new-value operand produced in the packet must execute
on slot 0.
■ This instruction can execute only in slot 0, even though it is an ST instruction.

80-N2040-53 Rev. AA 284


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS s5 Parse u5 t3
memw(Rs+Ru<<#u2)=Nt.n
0 0 1 1 1 0 1 1 1 0 1 s s s s s P P i u u u u u i - - 1 0 t t t ew
ICLASS Type Parse t3
0 1 0 0 1 i i 0 1 0 1 i i i i i P P i 1 0 t t t i i i i i i i i memw(gp+#u16:2)=Nt.new
U
ICLASS Amode Type s5 Parse t3
N
1 0 1 0 0 i i 1 1 0 1 s s s s s P P i 1 0 t t t i i i i i i i i memw(Rs+#s11:2)=Nt.new
U
ICLASS Amode Type x5 Parse u1 t3
N
memw(Rx++I:circ(Mu))=Nt.
1 0 1 0 1 0 0 1 1 0 1 x x x x x P P u 1 0 t t t 0 - - - - - 1 -
new

1 0 1 0 1 0 0 1 1 0 1 x x x x x P P u 1 0 t t t 0 i i i i - 0 - memw(Rx++#s4:2:circ(Mu)
)=Nt.new
U
ICLASS Amode Type e5 Parse t3
N
1 0 1 0 1 0 1 1 1 0 1 e e e e e P P 0 1 0 t t t 1 - I I I I I I memw(Re=#U6)=Nt.new
U
ICLASS Amode Type N x5 Parse t3

1 0 1 0 1 0 1 1 1 0 1 x x x x x P P 0 1 0 t t t 0 i i i i - 0 - memw(Rx++#s4:2)=Nt.new
U
ICLASS Amode Type N
u5 Parse t3

memw(Ru<<#u2+#U6)=Nt.
1 0 1 0 1 1 0 1 1 0 1 u u u u u P P i 1 0 t t t 1 i I I I I I I new

ICLASS Amode Type U x5 Parse u1 t3


N
1 0 1 0 1 1 0 1 1 0 1 x x x x x P P u 1 0 t t t 0 - - - - - - - memw(Rx++Mu)=Nt.new
memw(Rx++Mu:brev)=Nt.n
1 0 1 0 1 1 1 1 1 0 1 x x x x x P P u 1 0 t t t 0 - - - - - - -
ew

Field name Description


ICLASS Instruction class
Type Type
Parse Packet/loop parse bits
e5 Field to encode register e
s5 Field to encode register s
t3 Field to encode register t
u1 Field to encode register u
u5 Field to encode register u
x5 Field to encode register x
Amode Amode
UN Unsigned

80-N2040-53 Rev. AA 285


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Store new-value word conditionally


Store a 32-bit register in memory at the effective address.
This instruction is conditional based on a predicate value. If the predicate is true, the instruction is
performed, otherwise it is treated as a NOP.
Syntax Behavior
if ([!]Pv[.new]) memw(#u6)=Nt.new apply_extension(#u);
EA=#u;
if ([!]Pv[.new][0]) {
*EA = Nt[.new];
} else {
NOP;
}
if ([!]Pv[.new]) apply_extension(#u);
memw(Rs+#u6:2)=Nt.new EA=Rs+#u;
if ([!]Pv[.new][0]) {
*EA = Nt[.new];
} else {
NOP;
}
if ([!]Pv[.new]) EA=Rs+(Ru<<#u);
memw(Rs+Ru<<#u2)=Nt.new if ([!]Pv[.new][0]) {
*EA = Nt[.new];
} else {
NOP;
}
if ([!]Pv[.new]) EA=Rx;
memw(Rx++#s4:2)=Nt.new if ([!]Pv[.new][0]){
Rx=Rx+#s;
*EA = Nt[.new];
} else {
NOP;
}

Class: NV (slots 0)

Notes

■ Forms of this instruction that use a new-value operand produced in the packet must execute
on slot 0.
■ This instruction can execute only in slot 0, even though it is an ST instruction.

80-N2040-53 Rev. AA 286


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS s5 Parse u5 t3
if (Pv)
0 0 1 1 0 1 0 0 1 0 1 s s s s s P P i u u u u u i v v 1 0 t t t memw(Rs+Ru<<#u2)=Nt.n
ew
if (!Pv)
0 0 1 1 0 1 0 1 1 0 1 s s s s s P P i u u u u u i v v 1 0 t t t memw(Rs+Ru<<#u2)=Nt.n
ew
if (Pv.new)
0 0 1 1 0 1 1 0 1 0 1 s s s s s P P i u u u u u i v v 1 0 t t t memw(Rs+Ru<<#u2)=Nt.n
ew
if (!Pv.new)
0 0 1 1 0 1 1 1 1 0 1 s s s s s P P i u u u u u i v v 1 0 t t t memw(Rs+Ru<<#u2)=Nt.n
ew
Pr
Se
ed
ICLASS ns Type s5 Parse t3
Ne
e
w
if (Pv)
0 1 0 0 0 0 0 0 1 0 1 s s s s s P P i 1 0 t t t i i i i i 0 v v memw(Rs+#u6:2)=Nt.new

v if
(Pv.new)
0 1 0 0 0 0 1 0 1 0 1 s s s s s P P i 1 0 t t t i i i i i 0 v memw(Rs+#u6:2)=Nt.new

0 1 0 0 0 1 0 0 1 0 1 s s s s s P P i 1 0 t t t i i i i i 0 v v if (!Pv)
memw(Rs+#u6:2)=Nt.new
if (!Pv.new)
0 1 0 0 0 1 1 0 1 0 1 s s s s s P P i 1 0 t t t i i i i i 0 v v memw(Rs+#u6:2)=Nt.new

ICLASS Amode Type U x5 Parse t3


N

1 0 1 0 1 0 1 1 1 0 1 x x x x x P P 1 1 0 t t t 0 i i i i 0 v v if (Pv)
memw(Rx++#s4:2)=Nt.new
if (!Pv)
1 0 1 0 1 0 1 1 1 0 1 x x x x x P P 1 1 0 t t t 0 i i i i 1 v v memw(Rx++#s4:2)=Nt.new

v if
1 0 1 0 1 0 1 1 1 0 1 x x x x x P P 1 1 0 t t t 1 i i i i 0 (Pv.new)
v
memw(Rx++#s4:2)=Nt.new

1 0 1 0 1 0 1 1 1 0 1 x x x x x P P 1 1 0 t t t 1 i i i i 1 v v if (!Pv.new)
memw(Rx++#s4:2)=Nt.new
U
ICLASS Amode Type Parse t3
N
1 0 1 0 1 1 1 1 1 0 1 - - - i i P P 0 1 0 t t t 1 i i i i 0 v v if (Pv) memw(#u6)=Nt.new
1 0 1 0 1 1 1 1 1 0 1 - - - i i P P 0 1 0 t t t 1 i i i i 1 v v if (!Pv) memw(#u6)=Nt.new
if (Pv.new)
1 0 1 0 1 1 1 1 1 0 1 - - - i i P P 1 1 0 t t t 1 i i i i 0 v v memw(#u6)=Nt.new

v ifmemw(#u6)=Nt.new
(!Pv.new)
1 0 1 0 1 1 1 1 1 0 1 - - - i i P P 1 1 0 t t t 1 i i i i 1 v

Field name Description


ICLASS Instruction class
Type Type
PredNew PredNew
Sense Sense
Parse Packet/loop parse bits
s5 Field to encode register s
t3 Field to encode register t
u5 Field to encode register u
v2 Field to encode register v
x5 Field to encode register x
Amode Amode
UN Unsigned

80-N2040-53 Rev. AA 287


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

11.8 ST
The ST instruction class includes store instructions, used to store values in memory.
ST instructions are executable on slot 0 and slot 1.

Store doubleword
Store a 64-bit register pair in memory at the effective address.
Syntax Behavior
memd(Re=#U6)=Rtt apply_extension(#U);
EA=#U;
*EA = Rtt;
Re=#U;
memd(Rs+#s11:3)=Rtt apply_extension(#s);
EA=Rs+#s;
*EA = Rtt;
memd(Rs+Ru<<#u2)=Rtt EA=Rs+(Ru<<#u);
*EA = Rtt;
memd(Ru<<#u2+#U6)=Rtt apply_extension(#U);
EA=#U+(Ru<<#u);
*EA = Rtt;
memd(Rx++#s4:3)=Rtt EA=Rx;
Rx=Rx+#s;
*EA = Rtt;
memd(Rx++#s4:3:circ(Mu))=Rtt EA=Rx;
Rx=Rx=circ_add(Rx,#s,MuV);
*EA = Rtt;
memd(Rx++I:circ(Mu))=Rtt EA=Rx;
Rx=Rx=circ_add(Rx,I<<3,MuV);
*EA = Rtt;
memd(Rx++Mu)=Rtt EA=Rx;
Rx=Rx+MuV;
*EA = Rtt;
memd(Rx++Mu:brev)=Rtt EA=Rx.h[1] | brev(Rx.h[0]);
Rx=Rx+MuV;
*EA = Rtt;
memd(gp+#u16:3)=Rtt apply_extension(#u);
EA=(Constant_extended ? (0) : GP)+#u;
*EA = Rtt;

80-N2040-53 Rev. AA 288


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Class: ST (slots 0,1)

Intrinsics

memd(Rx++#s4:3:circ(Mu))=R void Q6_memd_IMP_circ(void** StartAddress, Word32


tt Is4_3, Word32 Mu, Word64 Rtt, void* BaseAddress)
memd(Rx++I:circ(Mu))=Rtt void Q6_memd_MP_circ(void** StartAddress, Word32
Mu, Word64 Rtt, void* BaseAddress)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS s5 Parse u5 t5
0 0 1 1 1 0 1 1 1 1 0 s s s s s P P i u u u u u i - - t t t t t memd(Rs+Ru<<#u2)=Rtt
ICLASS Type Parse t5
0 1 0 0 1 i i 0 1 1 0 i i i i i P P i t t t t t i i i i i i i i memd(gp+#u16:3)=Rtt

ICLASS Amode Type U s5 Parse t5


N
1 0 1 0 0 i i 1 1 1 0 s s s s s P P i t t t t t i i i i i i i i memd(Rs+#s11:3)=Rtt
U
ICLASS Amode Type N x5 Parse u1 t5

1 0 1 0 1 0 0 1 1 1 0 x x x x x P P u t t t t t 0 - - - - - 1 - memd(Rx++I:circ(Mu))=Rtt
memd(Rx++#s4:3:circ(Mu))
1 0 1 0 1 0 0 1 1 1 0 x x x x x P P u t t t t t 0 i i i i - 0 - =Rtt
U
ICLASS Amode Type e5 Parse t5
N
1 0 1 0 1 0 1 1 1 1 0 e e e e e P P 0 t t t t t 1 - I I I I I I memd(Re=#U6)=Rtt

ICLASS Amode Type U x5 Parse t5


N
1 0 1 0 1 0 1 1 1 1 0 x x x x x P P 0 t t t t t 0 i i i i - 0 - memd(Rx++#s4:3)=Rtt
U
ICLASS Amode Type u5 Parse t5
N
1 0 1 0 1 1 0 1 1 1 0 u u u u u P P i t t t t t 1 i I I I I I I memd(Ru<<#u2+#U6)=Rtt
U
ICLASS Amode Type x5 Parse u1 t5
N
1 0 1 0 1 1 0 1 1 1 0 x x x x x P P u t t t t t 0 - - - - - - - memd(Rx++Mu)=Rtt
1 0 1 0 1 1 1 1 1 1 0 x x x x x P P u t t t t t 0 - - - - - - - memd(Rx++Mu:brev)=Rtt

Field name Description


ICLASS Instruction class
Type Type
Parse Packet/loop parse bits
e5 Field to encode register e
s5 Field to encode register s
t5 Field to encode register t
u1 Field to encode register u
u5 Field to encode register u
x5 Field to encode register x
Amode Amode
UN Unsigned

80-N2040-53 Rev. AA 289


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Store-release doubleword
Store a 64-bit register pair in memory at the effective address. The store-release memory
operation is observed after all preceding memory operations have been observed at the local
point of serialization. A different order may be observed at the global point of serialization. (see
Ordering and Synchronization).
When the :st (same domain) option is specified, the preceding memory operations are those that
were committed on any thread with the same consistency domain before this instruction was
committed.
When the :at (all threads) option is specified, the preceding memory operations are those that
were committed on any thread before this instruction was committed.
The store release address is limited to certain memory regions. The following are excluded
memory regions: AHB memory space, AXI M2 memory space, Hexagon memory cut-out is
excluded with the exception of addressable TCM and VTCM memory, and memory with the CCCC
types 2, 3, or 4 are excluded. The :st option does not apply to cache operation by index or global
cache operation. The :st option does not apply a consistency domain to vector operations, but
instead uses a per hardware thread ordering scope.

Syntax Behavior
memd_rl(Rs):at=Rtt EA=Rs;
*EA = Rtt
memd_rl(Rs):st=Rtt EA=Rs;
*EA = Rtt

Class: ST (slots 0)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
U
ICLASS Amode Type s5 Parse t5 d2
N
1 0 1 0 0 0 0 0 1 1 1 s s s s s P P 0 t t t t t - - 0 0 1 0 d d memd_rl(Rs):at=Rtt
1 0 1 0 0 0 0 0 1 1 1 s s s s s P P 0 t t t t t - - 1 0 1 0 d d memd_rl(Rs):st=Rtt

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d2 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
Amode Amode
Type Type
UN Unsigned

80-N2040-53 Rev. AA 290


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Store doubleword conditionally


Store a 64-bit register pair in memory at the effective address.
This instruction is conditional based on a predicate value. If the predicate is true, the instruction is
performed, otherwise it is treated as a NOP.
Syntax Behavior
if ([!]Pv[.new]) memd(#u6)=Rtt apply_extension(#u);
EA=#u;
if ([!]Pv[.new][0]) {
*EA = Rtt;
} else {
NOP;
}
if ([!]Pv[.new]) memd(Rs+#u6:3)=Rtt apply_extension(#u);
EA=Rs+#u;
if ([!]Pv[.new][0]) {
*EA = Rtt;
} else {
NOP;
}
if ([!]Pv[.new]) memd(Rs+Ru<<#u2)=Rtt EA=Rs+(Ru<<#u);
if ([!]Pv[.new][0]) {
*EA = Rtt;
} else {
NOP;
}
if ([!]Pv[.new]) memd(Rx++#s4:3)=Rtt EA=Rx;
if ([!]Pv[.new][0]){
Rx=Rx+#s;
*EA = Rtt;
} else {
NOP;
}

Class: ST (slots 0,1)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS s5 Parse u5 t5
if (Pv)
0 0 1 1 0 1 0 0 1 1 0 s s s s s P P i u u u u u i v v t t t t t
memd(Rs+Ru<<#u2)=Rtt
if (!Pv)
0 0 1 1 0 1 0 1 1 1 0 s s s s s P P i u u u u u i v v t t t t t
memd(Rs+Ru<<#u2)=Rtt
if (Pv.new)
0 0 1 1 0 1 1 0 1 1 0 s s s s s P P i u u u u u i v v t t t t t
memd(Rs+Ru<<#u2)=Rtt
if (!Pv.new)
0 0 1 1 0 1 1 1 1 1 0 s s s s s P P i u u u u u i v v t t t t t
memd(Rs+Ru<<#u2)=Rtt
Pr
Se ed
ICLASS ns Type s5 Parse t5
Ne
e w

0 1 0 0 0 0 0 0 1 1 0 s s s s s P P i t t t t t i i i i i 0 v v if (Pv)
memd(Rs+#u6:3)=Rtt

80-N2040-53 Rev. AA 291


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
if (Pv.new)
0 1 0 0 0 0 1 0 1 1 0 s s s s s P P i t t t t t i i i i i 0 v v
memd(Rs+#u6:3)=Rtt

0 1 0 0 0 1 0 0 1 1 0 s s s s s P P i t t t t t i i i i i 0 v v if (!Pv)
memd(Rs+#u6:3)=Rtt
if (!Pv.new)
0 1 0 0 0 1 1 0 1 1 0 s s s s s P P i t t t t t i i i i i 0 v v
memd(Rs+#u6:3)=Rtt
U
ICLASS Amode Type N x5 Parse t5

if (Pv)
1 0 1 0 1 0 1 1 1 1 0 x x x x x P P 1 t t t t t 0 i i i i 0 v v
memd(Rx++#s4:3)=Rtt
if (!Pv)
1 0 1 0 1 0 1 1 1 1 0 x x x x x P P 1 t t t t t 0 i i i i 1 v v
memd(Rx++#s4:3)=Rtt
if (Pv.new)
1 0 1 0 1 0 1 1 1 1 0 x x x x x P P 1 t t t t t 1 i i i i 0 v v
memd(Rx++#s4:3)=Rtt

1 0 1 0 1 0 1 1 1 1 0 x x x x x P P 1 t t t t t 1 i i i i 1 v v if (!Pv.new)
memd(Rx++#s4:3)=Rtt
U
ICLASS Amode Type Parse t5
N
1 0 1 0 1 1 1 1 1 1 0 - - - i i P P 0 t t t t t 1 i i i i 0 v v if (Pv) memd(#u6)=Rtt
1 0 1 0 1 1 1 1 1 1 0 - - - i i P P 0 t t t t t 1 i i i i 1 v v if (!Pv) memd(#u6)=Rtt
1 0 1 0 1 1 1 1 1 1 0 - - - i i P P 1 t t t t t 1 i i i i 0 v v if (Pv.new) memd(#u6)=Rtt
1 0 1 0 1 1 1 1 1 1 0 - - - i i P P 1 t t t t t 1 i i i i 1 v v if (!Pv.new) memd(#u6)=Rtt

Field name Description


ICLASS Instruction class
Type Type
PredNew PredNew
Sense Sense
Parse Packet/loop parse bits
s5 Field to encode register s
t5 Field to encode register t
u5 Field to encode register u
v2 Field to encode register v
x5 Field to encode register x
Amode Amode
UN Unsigned

80-N2040-53 Rev. AA 292


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Store byte
Store the least-significant byte in a source register at the effective address.
Syntax Behavior
memb(Re=#U6)=Rt apply_extension(#U);
EA=#U;
*EA = Rt.b[0];
Re=#U;
memb(Rs+#s11:0)=Rt apply_extension(#s);
EA=Rs+#s;
*EA = Rt.b[0];
memb(Rs+#u6:0)=#S8 EA=Rs+#u;
apply_extension(#S);
*EA = #S;
memb(Rs+Ru<<#u2)=Rt EA=Rs+(Ru<<#u);
*EA = Rt.b[0];
memb(Ru<<#u2+#U6)=Rt apply_extension(#U);
EA=#U+(Ru<<#u);
*EA = Rt.b[0];
memb(Rx++#s4:0)=Rt EA=Rx;
Rx=Rx+#s;
*EA = Rt.b[0];
memb(Rx++#s4:0:circ(Mu))=Rt EA=Rx;
Rx=Rx=circ_add(Rx,#s,MuV);
*EA = Rt.b[0];
memb(Rx++I:circ(Mu))=Rt EA=Rx;
Rx=Rx=circ_add(Rx,I<<0,MuV);
*EA = Rt.b[0];
memb(Rx++Mu)=Rt EA=Rx;
Rx=Rx+MuV;
*EA = Rt.b[0];
memb(Rx++Mu:brev)=Rt EA=Rx.h[1] | brev(Rx.h[0]);
Rx=Rx+MuV;
*EA = Rt.b[0];
memb(gp+#u16:0)=Rt apply_extension(#u);
EA=(Constant_extended ? (0) : GP)+#u;
*EA = Rt.b[0];

Class: ST (slots 0,1)

Intrinsics

memb(Rx++#s4:0:circ(Mu))= void Q6_memb_IMR_circ(void** StartAddress, Word32


Rt Is4_0, Word32 Mu, Word32 Rt, void* BaseAddress)
memb(Rx++I:circ(Mu))=Rt void Q6_memb_MR_circ(void** StartAddress, Word32 Mu,
Word32 Rt, void* BaseAddress)

80-N2040-53 Rev. AA 293


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS s5 Parse u5 t5
0 0 1 1 1 0 1 1 0 0 0 s s s s s P P i u u u u u i - - t t t t t memb(Rs+Ru<<#u2)=Rt
ICLASS s5 Parse
0 0 1 1 1 1 0 - - 0 0 s s s s s P P I i i i i i i I I I I I I I memb(Rs+#u6:0)=#S8
ICLASS Type Parse t5
0 1 0 0 1 i i 0 0 0 0 i i i i i P P i t t t t t i i i i i i i i memb(gp+#u16:0)=Rt
U
ICLASS Amode Type s5 Parse t5
N
1 0 1 0 0 i i 1 0 0 0 s s s s s P P i t t t t t i i i i i i i i memb(Rs+#s11:0)=Rt
U
ICLASS Amode Type x5 Parse u1 t5
N
1 0 1 0 1 0 0 1 0 0 0 x x x x x P P u t t t t t 0 - - - - - 1 - memb(Rx++I:circ(Mu))=Rt
memb(Rx++#s4:0:circ(Mu))
1 0 1 0 1 0 0 1 0 0 0 x x x x x P P u t t t t t 0 i i i i - 0 -
=Rt
U
ICLASS Amode Type e5 Parse t5
N
1 0 1 0 1 0 1 1 0 0 0 e e e e e P P 0 t t t t t 1 - I I I I I I memb(Re=#U6)=Rt
U
ICLASS Amode Type x5 Parse t5
N
1 0 1 0 1 0 1 1 0 0 0 x x x x x P P 0 t t t t t 0 i i i i - 0 - memb(Rx++#s4:0)=Rt
U
ICLASS Amode Type u5 Parse t5
N
1 0 1 0 1 1 0 1 0 0 0 u u u u u P P i t t t t t 1 i I I I I I I memb(Ru<<#u2+#U6)=Rt
U
ICLASS Amode Type x5 Parse u1 t5
N
1 0 1 0 1 1 0 1 0 0 0 x x x x x P P u t t t t t 0 - - - - - - - memb(Rx++Mu)=Rt
1 0 1 0 1 1 1 1 0 0 0 x x x x x P P u t t t t t 0 - - - - - - - memb(Rx++Mu:brev)=Rt

Field name Description


ICLASS Instruction class
Type Type
Parse Packet/loop parse bits
e5 Field to encode register e
s5 Field to encode register s
t5 Field to encode register t
u1 Field to encode register u
u5 Field to encode register u
x5 Field to encode register x
Amode Amode
UN Unsigned

80-N2040-53 Rev. AA 294


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Store byte conditionally


Store the least-significant byte in a source register at the effective address.
This instruction is conditional based on a predicate value. If the predicate is true, the instruction is
performed, otherwise it is treated as a NOP.
Syntax Behavior
if ([!]Pv[.new]) memb(#u6)=Rt apply_extension(#u);
EA=#u;
if ([!]Pv[.new][0]) {
*EA = Rt.b[0];
} else {
NOP;
}
if ([!]Pv[.new]) memb(Rs+#u6:0)=#S6 EA=Rs+#u;
if ([!]Pv[.new][0]){
apply_extension(#S);
*EA = #S;
} else {
NOP;
}
if ([!]Pv[.new]) memb(Rs+#u6:0)=Rt apply_extension(#u);
EA=Rs+#u;
if ([!]Pv[.new][0]) {
*EA = Rt.b[0];
} else {
NOP;
}
if ([!]Pv[.new]) memb(Rs+Ru<<#u2)=Rt EA=Rs+(Ru<<#u);
if ([!]Pv[.new][0]) {
*EA = Rt.b[0];
} else {
NOP;
}
if ([!]Pv[.new]) memb(Rx++#s4:0)=Rt EA=Rx;
if ([!]Pv[.new][0]){
Rx=Rx+#s;
*EA = Rt.b[0];
} else {
NOP;
}

80-N2040-53 Rev. AA 295


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Class: ST (slots 0,1)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS s5 Parse u5 t5
if (Pv)
0 0 1 1 0 1 0 0 0 0 0 s s s s s P P i u u u u u i v v t t t t t memb(Rs+Ru<<#u2)=Rt
if (!Pv)
0 0 1 1 0 1 0 1 0 0 0 s s s s s P P i u u u u u i v v t t t t t
memb(Rs+Ru<<#u2)=Rt

0 0 1 1 0 1 1 0 0 0 0 s s s s s P P i u u u u u i v v t t t t t if (Pv.new)
memb(Rs+Ru<<#u2)=Rt
if (!Pv.new)
0 0 1 1 0 1 1 1 0 0 0 s s s s s P P i u u u u u i v v t t t t t
memb(Rs+Ru<<#u2)=Rt
ICLASS s5 Parse
if (Pv)
0 0 1 1 1 0 0 0 0 0 0 s s s s s P P I i i i i i i v v I I I I I
memb(Rs+#u6:0)=#S6
if (!Pv)
0 0 1 1 1 0 0 0 1 0 0 s s s s s P P I i i i i i i v v I I I I I
memb(Rs+#u6:0)=#S6
if (Pv.new)
0 0 1 1 1 0 0 1 0 0 0 s s s s s P P I i i i i i i v v I I I I I
memb(Rs+#u6:0)=#S6
if (!Pv.new)
0 0 1 1 1 0 0 1 1 0 0 s s s s s P P I i i i i i i v v I I I I I
memb(Rs+#u6:0)=#S6
Pr
Se ed
ICLASS ns Ne Type s5 Parse t5
e
w
0 1 0 0 0 0 0 0 0 0 0 s s s s s P P i t t t t t i i i i i 0 v v if (Pv) memb(Rs+#u6:0)=Rt

0 1 0 0 0 0 1 0 0 0 0 s s s s s P P i t t t t t i i i i i 0 v v if (Pv.new)
memb(Rs+#u6:0)=Rt
if (!Pv)
0 1 0 0 0 1 0 0 0 0 0 s s s s s P P i t t t t t i i i i i 0 v v memb(Rs+#u6:0)=Rt

0 1 0 0 0 1 1 0 0 0 0 s s s s s P P i t t t t t i i i i i 0 v v ifmemb(Rs+#u6:0)=Rt
(!Pv.new)

U
ICLASS Amode Type x5 Parse t5
N
if (Pv)
1 0 1 0 1 0 1 1 0 0 0 x x x x x P P 1 t t t t t 0 i i i i 0 v v memb(Rx++#s4:0)=Rt

1 0 1 0 1 0 1 1 0 0 0 x x x x x P P 1 t t t t t 0 i i i i 1 v v if (!Pv)
memb(Rx++#s4:0)=Rt
if (Pv.new)
1 0 1 0 1 0 1 1 0 0 0 x x x x x P P 1 t t t t t 1 i i i i 0 v v memb(Rx++#s4:0)=Rt

v ifmemb(Rx++#s4:0)=Rt
(!Pv.new)
1 0 1 0 1 0 1 1 0 0 0 x x x x x P P 1 t t t t t 1 i i i i 1 v

U
ICLASS Amode Type Parse t5
N
1 0 1 0 1 1 1 1 0 0 0 - - - i i P P 0 t t t t t 1 i i i i 0 v v if (Pv) memb(#u6)=Rt
1 0 1 0 1 1 1 1 0 0 0 - - - i i P P 0 t t t t t 1 i i i i 1 v v if (!Pv) memb(#u6)=Rt
1 0 1 0 1 1 1 1 0 0 0 - - - i i P P 1 t t t t t 1 i i i i 0 v v if (Pv.new) memb(#u6)=Rt
1 0 1 0 1 1 1 1 0 0 0 - - - i i P P 1 t t t t t 1 i i i i 1 v v if (!Pv.new) memb(#u6)=Rt

Field name Description


ICLASS Instruction class
Type Type
PredNew PredNew
Sense Sense
Parse Packet/loop parse bits
s5 Field to encode register s

80-N2040-53 Rev. AA 296


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Field name Description


t5 Field to encode register t
u5 Field to encode register u
v2 Field to encode register v
x5 Field to encode register x
Amode Amode
UN Unsigned

80-N2040-53 Rev. AA 297


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Store halfword
Store the upper or lower 16-bits of a source register at the effective address.
Syntax Behavior
memh(Re=#U6)=Rt.H apply_extension(#U);
EA=#U;
*EA = Rt.h[1];
Re=#U;
memh(Re=#U6)=Rt apply_extension(#U);
EA=#U;
*EA = Rt.h[0];
Re=#U;
memh(Rs+#s11:1)=Rt.H apply_extension(#s);
EA=Rs+#s;
*EA = Rt.h[1];
memh(Rs+#s11:1)=Rt apply_extension(#s);
EA=Rs+#s;
*EA = Rt.h[0];
memh(Rs+#u6:1)=#S8 EA=Rs+#u;
apply_extension(#S);
*EA = #S;
memh(Rs+Ru<<#u2)=Rt.H EA=Rs+(Ru<<#u);
*EA = Rt.h[1];
memh(Rs+Ru<<#u2)=Rt EA=Rs+(Ru<<#u);
*EA = Rt.h[0];
memh(Ru<<#u2+#U6)=Rt.H apply_extension(#U);
EA=#U+(Ru<<#u);
*EA = Rt.h[1];
memh(Ru<<#u2+#U6)=Rt apply_extension(#U);
EA=#U+(Ru<<#u);
*EA = Rt.h[0];
memh(Rx++#s4:1)=Rt.H EA=Rx;
Rx=Rx+#s;
*EA = Rt.h[1];
memh(Rx++#s4:1)=Rt EA=Rx;
Rx=Rx+#s;
*EA = Rt.h[0];
memh(Rx++#s4:1:circ(Mu))=Rt. EA=Rx;
H Rx=Rx=circ_add(Rx,#s,MuV);
*EA = Rt.h[1];
memh(Rx++#s4:1:circ(Mu))=Rt EA=Rx;
Rx=Rx=circ_add(Rx,#s,MuV);
*EA = Rt.h[0];
memh(Rx++I:circ(Mu))=Rt.H EA=Rx;
Rx=Rx=circ_add(Rx,I<<1,MuV);
*EA = Rt.h[1];
memh(Rx++I:circ(Mu))=Rt EA=Rx;
Rx=Rx=circ_add(Rx,I<<1,MuV);
*EA = Rt.h[0];

80-N2040-53 Rev. AA 298


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Syntax Behavior
memh(Rx++Mu)=Rt.H EA=Rx;
Rx=Rx+MuV;
*EA = Rt.h[1];
memh(Rx++Mu)=Rt EA=Rx;
Rx=Rx+MuV;
*EA = Rt.h[0];
memh(Rx++Mu:brev)=Rt.H EA=Rx.h[1] | brev(Rx.h[0]);
Rx=Rx+MuV;
*EA = Rt.h[1];
memh(Rx++Mu:brev)=Rt EA=Rx.h[1] | brev(Rx.h[0]);
Rx=Rx+MuV;
*EA = Rt.h[0];
memh(gp+#u16:1)=Rt.H apply_extension(#u);
EA=(Constant_extended ? (0) : GP)+#u;
*EA = Rt.h[1];
memh(gp+#u16:1)=Rt apply_extension(#u);
EA=(Constant_extended ? (0) : GP)+#u;
*EA = Rt.h[0];

Class: ST (slots 0,1)

Intrinsics

memh(Rx++#s4:1:circ(Mu))=Rt. void Q6_memh_IMRh_circ(void** StartAddress, Word32


H Is4_1, Word32 Mu, Word32 Rt, void* BaseAddress)
memh(Rx++#s4:1:circ(Mu))=Rt void Q6_memh_IMR_circ(void** StartAddress, Word32
Is4_1, Word32 Mu, Word32 Rt, void* BaseAddress)
memh(Rx++I:circ(Mu))=Rt.H void Q6_memh_MRh_circ(void** StartAddress, Word32
Mu, Word32 Rt, void* BaseAddress)
memh(Rx++I:circ(Mu))=Rt void Q6_memh_MR_circ(void** StartAddress, Word32
Mu, Word32 Rt, void* BaseAddress)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS s5 Parse u5 t5
0 0 1 1 1 0 1 1 0 1 0 s s s s s P P i u u u u u i - - t t t t t memh(Rs+Ru<<#u2)=Rt
0 0 1 1 1 0 1 1 0 1 1 s s s s s P P i u u u u u i - - t t t t t memh(Rs+Ru<<#u2)=Rt.H
ICLASS s5 Parse
0 0 1 1 1 1 0 - - 0 1 s s s s s P P I i i i i i i I I I I I I I memh(Rs+#u6:1)=#S8
ICLASS Type Parse t5
0 1 0 0 1 i i 0 0 1 0 i i i i i P P i t t t t t i i i i i i i i memh(gp+#u16:1)=Rt
0 1 0 0 1 i i 0 0 1 1 i i i i i P P i t t t t t i i i i i i i i memh(gp+#u16:1)=Rt.H
U
ICLASS Amode Type s5 Parse t5
N
1 0 1 0 0 i i 1 0 1 0 s s s s s P P i t t t t t i i i i i i i i memh(Rs+#s11:1)=Rt
1 0 1 0 0 i i 1 0 1 1 s s s s s P P i t t t t t i i i i i i i i memh(Rs+#s11:1)=Rt.H

80-N2040-53 Rev. AA 299


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
U
ICLASS Amode Type x5 Parse u1 t5
N
1 0 1 0 1 0 0 1 0 1 0 x x x x x P P u t t t t t 0 - - - - - 1 - memh(Rx++I:circ(Mu))=Rt
memh(Rx++#s4:1:circ(Mu))
1 0 1 0 1 0 0 1 0 1 0 x x x x x P P u t t t t t 0 i i i i - 0 -
=Rt

1 0 1 0 1 0 0 1 0 1 1 x x x x x P P u t t t t t 0 - - - - - 1 - memh(Rx++I:circ(Mu))=Rt.
H
memh(Rx++#s4:1:circ(Mu))
1 0 1 0 1 0 0 1 0 1 1 x x x x x P P u t t t t t 0 i i i i - 0 -
=Rt.H
U
ICLASS Amode Type N e5 Parse t5

1 0 1 0 1 0 1 1 0 1 0 e e e e e P P 0 t t t t t 1 - I I I I I I memh(Re=#U6)=Rt
U
ICLASS Amode Type x5 Parse t5
N
1 0 1 0 1 0 1 1 0 1 0 x x x x x P P 0 t t t t t 0 i i i i - 0 - memh(Rx++#s4:1)=Rt
U
ICLASS Amode Type e5 Parse t5
N
1 0 1 0 1 0 1 1 0 1 1 e e e e e P P 0 t t t t t 1 - I I I I I I memh(Re=#U6)=Rt.H
U
ICLASS Amode Type x5 Parse t5
N
1 0 1 0 1 0 1 1 0 1 1 x x x x x P P 0 t t t t t 0 i i i i - 0 - memh(Rx++#s4:1)=Rt.H
U
ICLASS Amode Type u5 Parse t5
N
1 0 1 0 1 1 0 1 0 1 0 u u u u u P P i t t t t t 1 i I I I I I I memh(Ru<<#u2+#U6)=Rt
U
ICLASS Amode Type N x5 Parse u1 t5

1 0 1 0 1 1 0 1 0 1 0 x x x x x P P u t t t t t 0 - - - - - - - memh(Rx++Mu)=Rt
U
ICLASS Amode Type u5 Parse t5
N
memh(Ru<<#u2+#U6)=Rt.
1 0 1 0 1 1 0 1 0 1 1 u u u u u P P i t t t t t 1 i I I I I I I
H
U
ICLASS Amode Type x5 Parse u1 t5
N
1 0 1 0 1 1 0 1 0 1 1 x x x x x P P u t t t t t 0 - - - - - - - memh(Rx++Mu)=Rt.H
1 0 1 0 1 1 1 1 0 1 0 x x x x x P P u t t t t t 0 - - - - - - - memh(Rx++Mu:brev)=Rt
1 0 1 0 1 1 1 1 0 1 1 x x x x x P P u t t t t t 0 - - - - - - - memh(Rx++Mu:brev)=Rt.H

Field name Description


ICLASS Instruction class
Type Type
Parse Packet/loop parse bits
e5 Field to encode register e
s5 Field to encode register s
t5 Field to encode register t
u1 Field to encode register u
u5 Field to encode register u
x5 Field to encode register x
Amode Amode
UN Unsigned

80-N2040-53 Rev. AA 300


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Store halfword conditionally


Store the upper or lower 16-bits of a source register in memory at the effective address.
This instruction is conditional based on a predicate value. If the predicate is true, the instruction is
performed, otherwise it is treated as a NOP.
Syntax Behavior
if ([!]Pv[.new]) memh(#u6)=Rt.H apply_extension(#u);
EA=#u;
if ([!]Pv[.new][0]) {
*EA = Rt.h[1];
} else {
NOP;
}
if ([!]Pv[.new]) memh(#u6)=Rt apply_extension(#u);
EA=#u;
if ([!]Pv[.new][0]) {
*EA = Rt.h[0];
} else {
NOP;
}
if ([!]Pv[.new]) memh(Rs+#u6:1)=#S6 EA=Rs+#u;
if ([!]Pv[.new][0]){
apply_extension(#S);
*EA = #S;
} else {
NOP;
}
if ([!]Pv[.new]) memh(Rs+#u6:1)=Rt.H apply_extension(#u);
EA=Rs+#u;
if ([!]Pv[.new][0]) {
*EA = Rt.h[1];
} else {
NOP;
}
if ([!]Pv[.new]) memh(Rs+#u6:1)=Rt apply_extension(#u);
EA=Rs+#u;
if ([!]Pv[.new][0]) {
*EA = Rt.h[0];
} else {
NOP;
}
if ([!]Pv[.new]) EA=Rs+(Ru<<#u);
memh(Rs+Ru<<#u2)=Rt.H if ([!]Pv[.new][0]) {
*EA = Rt.h[1];
} else {
NOP;
}

80-N2040-53 Rev. AA 301


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Syntax Behavior
if ([!]Pv[.new]) memh(Rs+Ru<<#u2)=Rt EA=Rs+(Ru<<#u);
if ([!]Pv[.new][0]) {
*EA = Rt.h[0];
} else {
NOP;
}
if ([!]Pv[.new]) memh(Rx++#s4:1)=Rt.H EA=Rx;
if ([!]Pv[.new][0]){
Rx=Rx+#s;
*EA = Rt.h[1];
} else {
NOP;
}
if ([!]Pv[.new]) memh(Rx++#s4:1)=Rt EA=Rx;
if ([!]Pv[.new][0]){
Rx=Rx+#s;
*EA = Rt.h[0];
} else {
NOP;
}

Class: ST (slots 0,1)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS s5 Parse u5 t5

0 0 1 1 0 1 0 0 0 1 0 s s s s s P P i u u u u u i v v t t t t t if (Pv)
memh(Rs+Ru<<#u2)=Rt
if (Pv)
0 0 1 1 0 1 0 0 0 1 1 s s s s s P P i u u u u u i v v t t t t t memh(Rs+Ru<<#u2)=Rt.H

0 0 1 1 0 1 0 1 0 1 0 s s s s s P P i u u u u u i v v t t t t t if (!Pv)
memh(Rs+Ru<<#u2)=Rt
if (!Pv)
0 0 1 1 0 1 0 1 0 1 1 s s s s s P P i u u u u u i v v t t t t t
memh(Rs+Ru<<#u2)=Rt.H
if (Pv.new)
0 0 1 1 0 1 1 0 0 1 0 s s s s s P P i u u u u u i v v t t t t t
memh(Rs+Ru<<#u2)=Rt
if (Pv.new)
0 0 1 1 0 1 1 0 0 1 1 s s s s s P P i u u u u u i v v t t t t t
memh(Rs+Ru<<#u2)=Rt.H
if (!Pv.new)
0 0 1 1 0 1 1 1 0 1 0 s s s s s P P i u u u u u i v v t t t t t
memh(Rs+Ru<<#u2)=Rt
if (!Pv.new)
0 0 1 1 0 1 1 1 0 1 1 s s s s s P P i u u u u u i v v t t t t t
memh(Rs+Ru<<#u2)=Rt.H
ICLASS s5 Parse
if (Pv)
0 0 1 1 1 0 0 0 0 0 1 s s s s s P P I i i i i i i v v I I I I I
memh(Rs+#u6:1)=#S6
if (!Pv)
0 0 1 1 1 0 0 0 1 0 1 s s s s s P P I i i i i i i v v I I I I I
memh(Rs+#u6:1)=#S6
if (Pv.new)
0 0 1 1 1 0 0 1 0 0 1 s s s s s P P I i i i i i i v v I I I I I
memh(Rs+#u6:1)=#S6
if (!Pv.new)
0 0 1 1 1 0 0 1 1 0 1 s s s s s P P I i i i i i i v v I I I I I
memh(Rs+#u6:1)=#S6
Pr
Se
ed
ICLASS ns Type s5 Parse t5
Ne
e
w
0 1 0 0 0 0 0 0 0 1 0 s s s s s P P i t t t t t i i i i i 0 v v if (Pv) memh(Rs+#u6:1)=Rt

80-N2040-53 Rev. AA 302


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
if (Pv)
0 1 0 0 0 0 0 0 0 1 1 s s s s s P P i t t t t t i i i i i 0 v v
memh(Rs+#u6:1)=Rt.H

0 1 0 0 0 0 1 0 0 1 0 s s s s s P P i t t t t t i i i i i 0 v v if (Pv.new)
memh(Rs+#u6:1)=Rt
if (Pv.new)
0 1 0 0 0 0 1 0 0 1 1 s s s s s P P i t t t t t i i i i i 0 v v
memh(Rs+#u6:1)=Rt.H
if (!Pv)
0 1 0 0 0 1 0 0 0 1 0 s s s s s P P i t t t t t i i i i i 0 v v memh(Rs+#u6:1)=Rt

if (!Pv)
0 1 0 0 0 1 0 0 0 1 1 s s s s s P P i t t t t t i i i i i 0 v v
memh(Rs+#u6:1)=Rt.H
if (!Pv.new)
0 1 0 0 0 1 1 0 0 1 0 s s s s s P P i t t t t t i i i i i 0 v v
memh(Rs+#u6:1)=Rt
if (!Pv.new)
0 1 0 0 0 1 1 0 0 1 1 s s s s s P P i t t t t t i i i i i 0 v v
memh(Rs+#u6:1)=Rt.H

ICLASS Amode Type U x5 Parse t5


N
if (Pv)
1 0 1 0 1 0 1 1 0 1 0 x x x x x P P 1 t t t t t 0 i i i i 0 v v
memh(Rx++#s4:1)=Rt
if (!Pv)
1 0 1 0 1 0 1 1 0 1 0 x x x x x P P 1 t t t t t 0 i i i i 1 v v memh(Rx++#s4:1)=Rt

v if
(Pv.new)
1 0 1 0 1 0 1 1 0 1 0 x x x x x P P 1 t t t t t 1 i i i i 0 v
memh(Rx++#s4:1)=Rt

1 0 1 0 1 0 1 1 0 1 0 x x x x x P P 1 t t t t t 1 i i i i 1 v v if (!Pv.new)
memh(Rx++#s4:1)=Rt
if (Pv)
1 0 1 0 1 0 1 1 0 1 1 x x x x x P P 1 t t t t t 0 i i i i 0 v v memh(Rx++#s4:1)=Rt.H

v if
(!Pv)
1 0 1 0 1 0 1 1 0 1 1 x x x x x P P 1 t t t t t 0 i i i i 1 v
memh(Rx++#s4:1)=Rt.H

1 0 1 0 1 0 1 1 0 1 1 x x x x x P P 1 t t t t t 1 i i i i 0 v v if (Pv.new)
memh(Rx++#s4:1)=Rt.H
if (!Pv.new)
1 0 1 0 1 0 1 1 0 1 1 x x x x x P P 1 t t t t t 1 i i i i 1 v v memh(Rx++#s4:1)=Rt.H

U
ICLASS Amode Type Parse t5
N
1 0 1 0 1 1 1 1 0 1 0 - - - i i P P 0 t t t t t 1 i i i i 0 v v if (Pv) memh(#u6)=Rt
1 0 1 0 1 1 1 1 0 1 0 - - - i i P P 0 t t t t t 1 i i i i 1 v v if (!Pv) memh(#u6)=Rt
1 0 1 0 1 1 1 1 0 1 0 - - - i i P P 1 t t t t t 1 i i i i 0 v v if (Pv.new) memh(#u6)=Rt
1 0 1 0 1 1 1 1 0 1 0 - - - i i P P 1 t t t t t 1 i i i i 1 v v if (!Pv.new) memh(#u6)=Rt
1 0 1 0 1 1 1 1 0 1 1 - - - i i P P 0 t t t t t 1 i i i i 0 v v if (Pv) memh(#u6)=Rt.H
1 0 1 0 1 1 1 1 0 1 1 - - - i i P P 0 t t t t t 1 i i i i 1 v v if (!Pv) memh(#u6)=Rt.H
if (Pv.new)
1 0 1 0 1 1 1 1 0 1 1 - - - i i P P 1 t t t t t 1 i i i i 0 v v memh(#u6)=Rt.H

if (!Pv.new)
1 0 1 0 1 1 1 1 0 1 1 - - - i i P P 1 t t t t t 1 i i i i 1 v v memh(#u6)=Rt.H

Field name Description


ICLASS Instruction class
Type Type
PredNew PredNew
Sense Sense
Parse Packet/loop parse bits
s5 Field to encode register s
t5 Field to encode register t
u5 Field to encode register u
v2 Field to encode register v

80-N2040-53 Rev. AA 303


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Field name Description


x5 Field to encode register x
Amode Amode
UN Unsigned

80-N2040-53 Rev. AA 304


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Release
The release memory operation is observed after all preceding memory operations have been
observed at the local point of serialization. A different order can be observed at the global point
of serialization (see Ordering and Synchronization). No data is modified by this instruction.
When the :st (same domain) option is specified, the preceding memory operations are those that
were committed on any thread with the same consistency domain before this instruction was
committed.
When the :at (all threads) option is specified, the preceding memory operations are those that
were committed on any thread before this instruction was committed.
The store release address is limited to certain memory regions. The following memory regions are
excluded:
■ AHB memory space
■ AXI M2 memory space
■ Hexagon memory cut-out is excluded with the exception of addressable TCM and VTCM
memory
■ Memory with the CCCC types 2, 3, or 4
The :st option does not apply to cache operation by index or global cache operation. The :st
option does not apply a consistency domain to vector operations, but instead uses a per
hardware thread ordering scope.
Syntax Behavior
release(Rs):at EA=Rs;
*EA = Rs
release(Rs):st EA=Rs;
*EA = Rs

Class: ST (slots 0)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

ICLASS Amode Type U s5 Parse t5 d2


N
1 0 1 0 0 0 0 0 1 1 1 s s s s s P P 0 t t t t t - - 0 0 1 1 d d release(Rs):at
1 0 1 0 0 0 0 0 1 1 1 s s s s s P P 0 t t t t t - - 1 0 1 1 d d release(Rs):st

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d2 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
Amode Amode
Type Type
UN Unsigned

80-N2040-53 Rev. AA 305


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Store word
Store a 32-bit register in memory at the effective address.

Syntax Behavior
memw(Re=#U6)=Rt apply_extension(#U);
EA=#U;
*EA = Rt;
Re=#U;
memw(Rs+#s11:2)=Rt apply_extension(#s);
EA=Rs+#s;
*EA = Rt;
memw(Rs+#u6:2)=#S8 EA=Rs+#u;
apply_extension(#S);
*EA = #S;
memw(Rs+Ru<<#u2)=Rt EA=Rs+(Ru<<#u);
*EA = Rt;
memw(Ru<<#u2+#U6)=Rt apply_extension(#U);
EA=#U+(Ru<<#u);
*EA = Rt;
memw(Rx++#s4:2)=Rt EA=Rx;
Rx=Rx+#s;
*EA = Rt;
memw(Rx++#s4:2:circ(Mu))=Rt EA=Rx;
Rx=Rx=circ_add(Rx,#s,MuV);
*EA = Rt;
memw(Rx++I:circ(Mu))=Rt EA=Rx;
Rx=Rx=circ_add(Rx,I<<2,MuV);
*EA = Rt;
memw(Rx++Mu)=Rt EA=Rx;
Rx=Rx+MuV;
*EA = Rt;
memw(Rx++Mu:brev)=Rt EA=Rx.h[1] | brev(Rx.h[0]);
Rx=Rx+MuV;
*EA = Rt;
memw(gp+#u16:2)=Rt apply_extension(#u);
EA=(Constant_extended ? (0) : GP)+#u;
*EA = Rt;

Class: ST (slots 0,1)

Intrinsics

memw(Rx++#s4:2:circ(Mu))= void Q6_memw_IMR_circ(void** StartAddress, Word32


Rt Is4_2, Word32 Mu, Word32 Rt, void* BaseAddress)
memw(Rx++I:circ(Mu))=Rt void Q6_memw_MR_circ(void** StartAddress, Word32 Mu,
Word32 Rt, void* BaseAddress)

80-N2040-53 Rev. AA 306


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS s5 Parse u5 t5
0 0 1 1 1 0 1 1 1 0 0 s s s s s P P i u u u u u i - - t t t t t memw(Rs+Ru<<#u2)=Rt
ICLASS s5 Parse
0 0 1 1 1 1 0 - - 1 0 s s s s s P P I i i i i i i I I I I I I I memw(Rs+#u6:2)=#S8
ICLASS Type Parse t5
0 1 0 0 1 i i 0 1 0 0 i i i i i P P i t t t t t i i i i i i i i memw(gp+#u16:2)=Rt
U
ICLASS Amode Type s5 Parse t5
N
1 0 1 0 0 i i 1 1 0 0 s s s s s P P i t t t t t i i i i i i i i memw(Rs+#s11:2)=Rt
U
ICLASS Amode Type x5 Parse u1 t5
N
1 0 1 0 1 0 0 1 1 0 0 x x x x x P P u t t t t t 0 - - - - - 1 - memw(Rx++I:circ(Mu))=Rt
memw(Rx++#s4:2:circ(Mu)
1 0 1 0 1 0 0 1 1 0 0 x x x x x P P u t t t t t 0 i i i i - 0 -
)=Rt
U
ICLASS Amode Type e5 Parse t5
N
1 0 1 0 1 0 1 1 1 0 0 e e e e e P P 0 t t t t t 1 - I I I I I I memw(Re=#U6)=Rt
U
ICLASS Amode Type x5 Parse t5
N
1 0 1 0 1 0 1 1 1 0 0 x x x x x P P 0 t t t t t 0 i i i i - 0 - memw(Rx++#s4:2)=Rt
U
ICLASS Amode Type u5 Parse t5
N
1 0 1 0 1 1 0 1 1 0 0 u u u u u P P i t t t t t 1 i I I I I I I memw(Ru<<#u2+#U6)=Rt
U
ICLASS Amode Type x5 Parse u1 t5
N
1 0 1 0 1 1 0 1 1 0 0 x x x x x P P u t t t t t 0 - - - - - - - memw(Rx++Mu)=Rt
1 0 1 0 1 1 1 1 1 0 0 x x x x x P P u t t t t t 0 - - - - - - - memw(Rx++Mu:brev)=Rt

Field name Description


ICLASS Instruction class
Type Type
Parse Packet/loop parse bits
e5 Field to encode register e
s5 Field to encode register s
t5 Field to encode register t
u1 Field to encode register u
u5 Field to encode register u
x5 Field to encode register x
Amode Amode
UN Unsigned

80-N2040-53 Rev. AA 307


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Store-release word
Store a 32-bit register in memory at the effective address. The store-release memory operation is
observed after all preceding memory operations have been observed at the local point of
serialization. A different order can be observed at the global point of serialization (see Ordering
and Synchronization).
When the :st (same domain) option is specified, the preceding memory operations are those that
were committed on any thread with the same consistency domain before this instruction was
committed.
When the :at (all threads) option is specified, the preceding memory operations are those that
were committed on any thread before this instruction was committed.
The store release address is limited to certain memory regions. The following are excluded
memory regions: AHB memory space, AXI M2 memory space, Hexagon memory cut-out is
excluded with the exception of addressable TCM and VTCM memory, and memory with the CCCC
types 2, 3, or 4 are excluded. The :st option does not apply to cache operation by index or global
cache operation. The :st option does not apply a consistency domain to vector operations, but
instead uses a per hardware thread ordering scope.

Syntax Behavior
memw_rl(Rs):at=Rt EA=Rs;
*EA = Rt
memw_rl(Rs):st=Rt EA=Rs;
*EA = Rt

Class: ST (slots 0)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
U
ICLASS Amode Type s5 Parse t5 d2
N
1 0 1 0 0 0 0 0 1 0 1 s s s s s P P - t t t t t - - 0 0 1 0 d d memw_rl(Rs):at=Rt
1 0 1 0 0 0 0 0 1 0 1 s s s s s P P - t t t t t - - 1 0 1 0 d d memw_rl(Rs):st=Rt

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d2 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
Amode Amode
Type Type
UN Unsigned

80-N2040-53 Rev. AA 308


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Store word conditionally


Store a 32-bit register in memory at the effective address.
This instruction is conditional based on a predicate value. If the predicate is true, the instruction is
performed, otherwise it is treated as a NOP.

Syntax Behavior
if ([!]Pv[.new]) memw(#u6)=Rt apply_extension(#u);
EA=#u;
if ([!]Pv[.new][0]) {
*EA = Rt;
} else {
NOP;
}
if ([!]Pv[.new]) memw(Rs+#u6:2)=#S6 EA=Rs+#u;
if ([!]Pv[.new][0]){
apply_extension(#S);
*EA = #S;
} else {
NOP;
}
if ([!]Pv[.new]) memw(Rs+#u6:2)=Rt apply_extension(#u);
EA=Rs+#u;
if ([!]Pv[.new][0]) {
*EA = Rt;
} else {
NOP;
}
if ([!]Pv[.new]) memw(Rs+Ru<<#u2)=Rt EA=Rs+(Ru<<#u);
if ([!]Pv[.new][0]) {
*EA = Rt;
} else {
NOP;
}
if ([!]Pv[.new]) memw(Rx++#s4:2)=Rt EA=Rx;
if ([!]Pv[.new][0]){
Rx=Rx+#s;
*EA = Rt;
} else {
NOP;
}

80-N2040-53 Rev. AA 309


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Class: ST (slots 0,1)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS s5 Parse u5 t5
if (Pv)
0 0 1 1 0 1 0 0 1 0 0 s s s s s P P i u u u u u i v v t t t t t memw(Rs+Ru<<#u2)=Rt
if (!Pv)
0 0 1 1 0 1 0 1 1 0 0 s s s s s P P i u u u u u i v v t t t t t
memw(Rs+Ru<<#u2)=Rt

0 0 1 1 0 1 1 0 1 0 0 s s s s s P P i u u u u u i v v t t t t t if (Pv.new)
memw(Rs+Ru<<#u2)=Rt
if (!Pv.new)
0 0 1 1 0 1 1 1 1 0 0 s s s s s P P i u u u u u i v v t t t t t
memw(Rs+Ru<<#u2)=Rt
ICLASS s5 Parse
if (Pv)
0 0 1 1 1 0 0 0 0 1 0 s s s s s P P I i i i i i i v v I I I I I
memw(Rs+#u6:2)=#S6
if (!Pv)
0 0 1 1 1 0 0 0 1 1 0 s s s s s P P I i i i i i i v v I I I I I
memw(Rs+#u6:2)=#S6
if (Pv.new)
0 0 1 1 1 0 0 1 0 1 0 s s s s s P P I i i i i i i v v I I I I I
memw(Rs+#u6:2)=#S6
if (!Pv.new)
0 0 1 1 1 0 0 1 1 1 0 s s s s s P P I i i i i i i v v I I I I I
memw(Rs+#u6:2)=#S6
Pr
Se ed
ICLASS ns Ne Type s5 Parse t5
e
w
if (Pv)
0 1 0 0 0 0 0 0 1 0 0 s s s s s P P i t t t t t i i i i i 0 v v memw(Rs+#u6:2)=Rt

v if
(Pv.new)
0 1 0 0 0 0 1 0 1 0 0 s s s s s P P i t t t t t i i i i i 0 v
memw(Rs+#u6:2)=Rt

0 1 0 0 0 1 0 0 1 0 0 s s s s s P P i t t t t t i i i i i 0 v v if (!Pv)
memw(Rs+#u6:2)=Rt
if (!Pv.new)
0 1 0 0 0 1 1 0 1 0 0 s s s s s P P i t t t t t i i i i i 0 v v memw(Rs+#u6:2)=Rt

ICLASS Amode Type U x5 Parse t5


N

1 0 1 0 1 0 1 1 1 0 0 x x x x x P P 1 t t t t t 0 i i i i 0 v v if (Pv)
memw(Rx++#s4:2)=Rt
if (!Pv)
1 0 1 0 1 0 1 1 1 0 0 x x x x x P P 1 t t t t t 0 i i i i 1 v v memw(Rx++#s4:2)=Rt

1 0 1 0 1 0 1 1 1 0 0 x x x x x P P 1 t t t t t 1 i i i i 0 v v if (Pv.new)
memw(Rx++#s4:2)=Rt
if (!Pv.new)
1 0 1 0 1 0 1 1 1 0 0 x x x x x P P 1 t t t t t 1 i i i i 1 v v memw(Rx++#s4:2)=Rt

U
ICLASS Amode Type Parse t5
N
1 0 1 0 1 1 1 1 1 0 0 - - - i i P P 0 t t t t t 1 i i i i 0 v v if (Pv) memw(#u6)=Rt
1 0 1 0 1 1 1 1 1 0 0 - - - i i P P 0 t t t t t 1 i i i i 1 v v if (!Pv) memw(#u6)=Rt
1 0 1 0 1 1 1 1 1 0 0 - - - i i P P 1 t t t t t 1 i i i i 0 v v if (Pv.new) memw(#u6)=Rt
1 0 1 0 1 1 1 1 1 0 0 - - - i i P P 1 t t t t t 1 i i i i 1 v v if (!Pv.new) memw(#u6)=Rt

Field name Description


ICLASS Instruction class
Type Type
PredNew PredNew
Sense Sense
Parse Packet/loop parse bits
s5 Field to encode register s

80-N2040-53 Rev. AA 310


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Field name Description


t5 Field to encode register t
u5 Field to encode register u
v2 Field to encode register v
x5 Field to encode register x
Amode Amode
UN Unsigned

80-N2040-53 Rev. AA 311


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Allocate stack frame


Allocate a stack frame on the call stack. This instruction first pushes LR and FP to the top of stack.
It then subtracts an unsigned immediate from SP to allocate room for local variables. FP is set to
the address of the old frame pointer on the stack.
The following figure shows the stack layout.
Stack in memory

Saved LR
Saved FP

Higher address
Procedure local
data on stack

Stack frame

Saved LR
Saved FP FP register

Procedure local
data on stack

SP register
Lower address

Unallocated stack

Syntax Behavior
allocframe(#u11:3) Assembler mapped to:
"allocframe(r29,#u11:3):raw"
allocframe(Rx,#u11:3):raw EA=Rx+-8;
*EA = frame_scramble((LR << 32) | FP);
FP=EA;
frame_check_limit(EA-#u);
Rx = EA-#u;

80-N2040-53 Rev. AA 312


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Class: ST (slots 0)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
U
ICLASS Amode Type x5 Parse
N
1 0 1 0 0 0 0 0 1 0 0 x x x x x P P 0 0 0 i i i i i i i i i i i allocframe(Rx,#u11:3):raw

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
x5 Field to encode register x
Amode Amode
Type Type
UN Unsigned

80-N2040-53 Rev. AA 313


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

11.9 SYSTEM
The SYSTEM instruction class includes instructions for managing system resources.

11.9.1 SYSTEM USER


The SYSTEM USER instruction subclass includes instructions which allow user access to system
resources.

Load locked
This memory lock instruction performs a word or double-word locked load.
This instruction returns the contents of the memory at address Rs and also reserves a lock
reservation at that address. For more information, see Atomic operations.

Syntax Behavior
Rd=memw_locked(Rs) EA=Rs;
Rd = *EA;
Rdd=memd_locked(Rs) EA=Rs;
Rdd = *EA;

Class: SYSTEM (slots 0)

Notes
■ This instruction is only grouped with ALU32 or nonfloating-point XTYPE instructions.

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
U
ICLASS Amode Type s5 Parse d5
N
1 0 0 1 0 0 1 0 0 0 0 s s s s s P P 0 0 0 - - - 0 0 0 d d d d d Rd=memw_locked(Rs)
1 0 0 1 0 0 1 0 0 0 0 s s s s s P P 0 1 0 - - - 0 0 0 d d d d d Rdd=memd_locked(Rs)

Field name Description


ICLASS Instruction class
Amode Amode
Type Type
UN Unsigned
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s

80-N2040-53 Rev. AA 314


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Store conditional
This memory lock instruction performs a word or double-word conditional store operation.
If the address reservation is held by this thread and there have been no intervening accesses to
the memory location, the store is performed and the predicate is set to true. Otherwise, the store
is not performed and the predicate returns false. For more information, see Atomic operations.
Syntax Behavior
memd_locked(Rs,Pd)=Rtt EA=Rs;
if (lock_valid) {
*EA = Rtt;
Pd = 0xff;
lock_valid = 0;
} else {
Pd = 0;
}
memw_locked(Rs,Pd)=Rt EA=Rs;
if (lock_valid) {
*EA = Rt;
Pd = 0xff;
lock_valid = 0;
} else {
Pd = 0;
}

Class: SYSTEM (slots 0)

Notes

■ This instruction may only be grouped with ALU32 or non-floating-point XTYPE instructions.
■ The predicate generated by this instruction can not be used as a .new predicate, nor can it be
automatically ANDed with another predicate.

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

ICLASS Amode Type U s5 Parse t5 d2


N
1 0 1 0 0 0 0 0 1 0 1 s s s s s P P - t t t t t - - - - 0 0 d d memw_locked(Rs,Pd)=Rt
1 0 1 0 0 0 0 0 1 1 1 s s s s s P P 0 t t t t t - - - - 0 0 d d memd_locked(Rs,Pd)=Rtt

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d2 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
Amode Amode
Type Type
UN Unsigned

80-N2040-53 Rev. AA 315


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Zero a cache line


The dczeroa instruction clears 32 bytes of memory.
If the memory is marked write-back cacheable, a cache line is allocated in the data cache and 32
bytes are cleared.
If the memory is write-through or write-back, 32 bytes of zeros are sent to memory.
This instruction is useful for efficiently handling write-only data by pre-allocating lines in the
cache.
The address must be 32-byte aligned. If not, an unaligned error exception is raised.
If this instruction appears in a packet, slot 1 must be A-type or empty.

Syntax Behavior
dczeroa(Rs) EA=Rs;
dcache_zero_addr(EA);

Class: SYSTEM (slots 0)

Notes

■ A packet containing this instruction must have slot 1 either empty or executing an ALU32
instruction.

Intrinsics

dczeroa(Rs) void Q6_dczeroa_A(Address a)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
U
ICLASS Amode Type s5 Parse
N
1 0 1 0 0 0 0 0 1 1 0 s s s s s P P 0 - - - - - - - - - - - - - dczeroa(Rs)

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
s5 Field to encode register s
Amode Amode
Type Type
UN Unsigned

80-N2040-53 Rev. AA 316


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Memory barrier
The barrier instruction establishes a memory barrier to ensure proper ordering between
load/store accesses within a consistency domain before the barrier instruction and accesses after
the barrier instruction.
All scalar loads, stores, and cache operation by address within a consistency domain before the
barrier are globally observable before any access after the barrier can be observed.
The use of this instruction is system-dependent.

Syntax Behavior
barrier memory_barrier;

Class: SYSTEM (slots 0)

Notes

■ This is a solo instruction. It must not be grouped with other instructions in a packet.

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
U
ICLASS Amode Type Parse
N
1 0 1 0 1 0 0 0 0 0 0 - - - - - P P - - - - - - 0 0 0 - - - - - barrier

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
Amode Amode
Type Type
UN Unsigned

80-N2040-53 Rev. AA 317


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Breakpoint

The brkpt instruction causes the program to enter Debug mode if enabled by ISDB.
Execution control is handed to ISDB and the program does not proceed until directed by the
debugger.
If ISDB is disabled, this instruction is treated as a NOP.

Syntax Behavior
brkpt Enter Debug mode;

Class: SYSTEM (slot 3)

Notes

■ This is a solo instruction. It must not be grouped with other instructions in a packet.

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS sm Parse
0 1 1 0 1 1 0 0 0 0 1 - - - - - P P - - - - - - 0 0 0 - - - - - brkpt

Field name Description


sm Supervisor mode only
ICLASS Instruction class
Parse Packet/loop parse bits

80-N2040-53 Rev. AA 318


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Data cache prefetch


The dcfetch instruction prefetches the data at address Rs + unsigned immediate.
This instruction is a hint to the memory system, and is handled in an implementation-dependent
manner.

Syntax Behavior
dcfetch(Rs) Assembler mapped to: "dcfetch(Rs+#0)"
dcfetch(Rs+#u11:3) EA=Rs+#u;
dcache_fetch(EA);

Class: SYSTEM (slots 0)

Intrinsics

dcfetch(Rs) void Q6_dcfetch_A(Address a)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
U
ICLASS Amode Type s5 Parse
N
1 0 0 1 0 1 0 0 0 0 0 s s s s s P P 0 - - i i i i i i i i i i i dcfetch(Rs+#u11:3)

Field name Description


ICLASS Instruction class
Amode Amode
Type Type
UN Unsigned
Parse Packet/loop parse bits
s5 Field to encode register s

80-N2040-53 Rev. AA 319


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Data cache maintenance user operations


Perform maintenance operations on the data cache.
The dccleaninva instruction looks up the data cache at address Rs. If this address is in the cache
and has dirty data, the data is flushed out to memory and the line is then invalidated.
The dccleana instruction looks up the data cache at address Rs. If this address is in the cache and
has dirty data, the data is flushed out to memory.
The dcinva instruction looks up the data cache at address Rs. If this address is in the cache, the
line containing the data is invalidated.
If an instruction appears in a packet, slot 1 must be A-type or empty.
In implementations that support L2 cache, these instructions operate on both L1 data and L2
caches.

Syntax Behavior
dccleana(Rs) EA=Rs;
dcache_clean_addr(EA);
dccleaninva(Rs) EA=Rs;
dcache_cleaninv_addr(EA);
dcinva(Rs) EA=Rs;
dcache_cleaninv_addr(EA);

Class: SYSTEM (slots 0)

Notes
■ A packet containing this instruction must have slot 1 either empty or executing an ALU32
instruction.

Intrinsics

dccleana(Rs) void Q6_dccleana_A(Address a)


dccleaninva(Rs) void Q6_dccleaninva_A(Address a)
dcinva(Rs) void Q6_dcinva_A(Address a)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

ICLASS Amode Type U s5 Parse


N
1 0 1 0 0 0 0 0 0 0 0 s s s s s P P - - - - - - - - - - - - - - dccleana(Rs)
1 0 1 0 0 0 0 0 0 0 1 s s s s s P P - - - - - - - - - - - - - - dcinva(Rs)
1 0 1 0 0 0 0 0 0 1 0 s s s s s P P - - - - - - - - - - - - - - dccleaninva(Rs)

80-N2040-53 Rev. AA 320


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
s5 Field to encode register s
Amode Amode
Type Type
UN Unsigned

80-N2040-53 Rev. AA 321


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Send value to DIAG trace


These instructions send the sources to the external DIAG trace.

Syntax Behavior
diag(Rs)
diag0(Rss,Rtt)
diag1(Rss,Rtt)

Class: SYSTEM (slot 3)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS sm s5 Parse
0 1 1 0 0 0 1 0 0 1 0 s s s s s P P - - - - - - 0 0 1 - - - - - diag(Rs)
ICLASS sm s5 Parse t5
0 1 1 0 0 0 1 0 0 1 0 s s s s s P P - t t t t t 0 1 0 - - - - - diag0(Rss,Rtt)
0 1 1 0 0 0 1 0 0 1 0 s s s s s P P - t t t t t 0 1 1 - - - - - diag1(Rss,Rtt)

Field name Description


sm Supervisor mode only
ICLASS Instruction class
Parse Packet/loop parse bits
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 322


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Instruction cache maintenance user operations


The icinva instruction looks up the address given in Rs in the instruction cache. If a translation for
Rs cannot be found, a TLB-miss-X exception with cause code 0x62 is indicated.
If a translation is found but the user does not have proper permissions to the page to invalidate,
the instruction converts to a NOP.
If a translation is found and the user has proper permissions, all ways of all 32 byte segments
matching Rs[11:5] in any instruction cache are invalidated.

Syntax Behavior
icinva(Rs) EA=Rs;
icache_inv_addr(EA);

Class: SYSTEM (slot 2)

Notes

■ This is a solo instruction. It must not be grouped with other instructions in a packet.

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS s5 Parse
0 1 0 1 0 1 1 0 1 1 0 s s s s s P P 0 0 0 - - - - - - - - - - - icinva(Rs)

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
s5 Field to encode register s

80-N2040-53 Rev. AA 323


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Instruction synchronization
The isync instruction ensures that all previous instructions have committed before continuing to
the next instruction.
This instruction should execute after the following events (when subsequent instructions must
observe the results of the event):
■ After modifying the TLB with a TLBW instruction
■ After modifying the SSR register
■ After modifying the SYSCFG register
■ After any instruction cache maintenance operation
■ After modifying the TID register

Syntax Behavior
isync instruction_sync;

Class: SYSTEM (slot 2)

Notes
■ This is a solo instruction. It must not be grouped with other instructions in a packet.

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS Parse
0 1 0 1 0 1 1 1 1 1 0 0 0 0 0 0 P P 0 - - - 0 0 0 0 0 0 0 0 1 0 isync

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits

80-N2040-53 Rev. AA 324


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

L2 cache prefetch
The L2fetch instruction initiates background prefetching into the L2 cache.
Rs specifies the 32-bit virtual start address. There are two forms of this instruction.
In the first form, the dimensions of the area to prefetch are encoded in source register Rt as
follows:
Rt[15:8] = Width of a fetch block in bytes.
Rt[7:0] = Height: the number of Width-sized blocks to fetch.
Rt[31:16] = Stride: an unsigned byte offset which is used to increment the pointer after each
Width-sized block is fetched.
In the second form, the operands are encoded in register pair Rtt as follows:
Rtt[31:16] = Width of a fetch block in bytes.
Rtt[15:0] = Height: the number of Width-sized blocks to fetch.
Rtt[47:32] = Stride: an unsigned byte offset that is used to increment the pointer after each
Width-sized block is fetched.
Rtt[48] = Direction. If clear, perform the prefetches in row major form, meaning fetch cache
lines in a row before proceeding to the next row. If the bit is set, prefetch in column major
form, meaning fetch all cache lines in a column before proceeding to the next column.
The following figure shows two examples of using the L2FETCH instruction.
L2FETCH for box prefetch L2FETCH for large linear prefetch

31 31 16 15 8 7 0
16 15 8 7 0

Rt Stride Width Height Rt 128 128 Lines

Rs Start Address Rs Start Address

Stride

Width

Prefetch
Height 128* Lines
Area

In the box prefetch, a 2D range of memory is defined within a larger frame. The second example
shows prefetch for a large linear area of memory, which has size Lines * 128.

80-N2040-53 Rev. AA 325


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

L2FETCH is nonblocking. After the instruction is initiated, the program continues on to the next
instruction while the prefetching is performed in the background. L2fetch can bring in either code
or data to the L2 cache. If the lines of interest are already in the L2, no action is performed. If the
lines are missing from the L2$, the hardware attempts to fetch them from the system memory.
The hardware prefetch engine continues to request all lines in the programmed memory range.
The prefetching hardware makes a best-effort to prefetch the requested data, and attempts to
perform prefetching at a lower priority than demand fetches. This prevents prefetch from adding
traffic while the system is under heavy load.
If a program initiates a new L2FETCH while an older L2FETCH operation is still pending, the new
request is queued, up to three deep. If three L2FETCHes are already pending, the oldest request is
dropped. During the time a L2 prefetch is active for a thread, the USR:PFA status bit is set to
indicate that prefetches are in progress. The programmer can use this bit to decide whether to
start a new L2FETCH before the previous one completes.
Executing an L2fetch with any subfield programmed as zero cancels all pending prefetches by the
calling thread.
The implementation is free to drop prefetches when needed.
Syntax Behavior
l2fetch(Rs,Rt) l2fetch(Rs,INFO);
l2fetch(Rs,Rtt) l2fetch(Rs,INFO);

Class: SYSTEM (slots 0)

Notes

■ This instruction can only be grouped with ALU32 or non-floating-point XTYPE instructions.

Intrinsics

l2fetch(Rs,Rt) void Q6_l2fetch_AR(Address a, Word32 Rt)


l2fetch(Rs,Rtt) void Q6_l2fetch_AP(Address a, Word64 Rtt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

ICLASS Amode Type U s5 Parse t5


N
1 0 1 0 0 1 1 0 0 0 0 s s s s s P P - t t t t t 0 0 0 - - - - - l2fetch(Rs,Rt)
1 0 1 0 0 1 1 0 1 0 0 s s s s s P P - t t t t t - - - - - - - - l2fetch(Rs,Rtt)

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
s5 Field to encode register s
t5 Field to encode register t
Amode Amode
Type Type

80-N2040-53 Rev. AA 326


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Field name Description


UN Unsigned

80-N2040-53 Rev. AA 327


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Pause
The PAUSE instruction pauses execution for a specified period of time.
During the pause duration, the program enters a low-power state and does not fetch and execute
instructions. The instruction provides a short immediate that indicates the pause duration. The
program will pause for at most the number of cycles specified in the immediate plus 8. The
minimum pause is 0 cycles, and the maximum pause is implementation-defined.
An interrupt to the program exits the paused state.
System events, such as hardware or DMA completion, can trigger exits from Pause mode.
An implementation is free to pause for durations shorter than (immediate+8), but not longer.
This instruction is useful for implementing user-level low-power synchronization operations, such
as spin locks or wait-for-event signaling.

Syntax Behavior
pause(#u10) Pause for #u cycles;

Class: SYSTEM (slot 2)

Notes

■ This is a solo instruction. It must not be grouped with other instructions in a packet.

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS Parse
0 1 0 1 0 1 0 0 0 1 - - - - i i P P - i i i i i - - - i i i - - pause(#u10)

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits

80-N2040-53 Rev. AA 328


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Memory thread synchronization


The syncht instruction synchronizes memory.
Outstanding memory operations, including cached and uncached loads and stores, are completed
before the processor continues to the next instruction. This ensures that certain memory
operations are performed in the desired order (for example, when accessing I/O devices).
After performing a syncht operation, the processor ceases fetching and executing instructions
from the program until all outstanding memory operations of that program complete.
In multithreaded or multi-core environments, SYNCHT is not concerned with other execution
contexts.
The use of this instruction is system-dependent.
Syntax Behavior
Rd=dmsyncht Rd = DM0;
syncht memory_synch;

Class: SYSTEM (slots 0)

Notes

■ This is a solo instruction. It must not be grouped with other instructions in a packet.
■ This is a monitor-level feature. If performed in User or Guest mode, a privilege error
exception occurs.

Intrinsics

Rd=dmsyncht Word32 Q6_R_dmsyncht()

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
U
ICLASS Amode Type Parse d5
N
1 0 1 0 1 0 0 0 0 0 0 - - - - - P P - - - - - 0 1 1 1 d d d d d Rd=dmsyncht
U
ICLASS Amode Type N Parse

1 0 1 0 1 0 0 0 0 1 0 - - - - - P P - - - - - - - - - - - - - - syncht

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
Amode Amode
Type Type
UN Unsigned

80-N2040-53 Rev. AA 329


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Send value to ETM trace


The trace instruction takes the value of register Rs and emits it to the ETM trace.
The ETM block must be enabled, and the thread must have permissions to perform tracing. The
contents of Rs are user-defined.

Syntax Behavior
trace(Rs) Send value to ETM trace;

Class: SYSTEM (slot 3)

Notes

■ This instruction may only be grouped with ALU32 or non-floating-point XTYPE instructions.

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS sm s5 Parse
0 1 1 0 0 0 1 0 0 1 0 s s s s s P P - - - - - - 0 0 0 - - - - - trace(Rs)

Field name Description


sm Supervisor mode only
ICLASS Instruction class
Parse Packet/loop parse bits
s5 Field to encode register s

80-N2040-53 Rev. AA 330


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Trap
The trap instruction causes a precise exception.
Executing a trap instruction sets the EX bit in SSR to 1, which disables interrupts and enables
Supervisor mode. The program then jumps to the vector location (either TRAP0 or TRAP1). The
instruction specifies a n 8-bit immediate field. This field is copied into the system status register
cause field.
Upon returning from the service routine with a RTE, execution resumes at the packet after the
TRAP instruction.
These instructions are generally intended for user code to request services from the operating
system. Two TRAP instructions are provided so the OS can optimize for fast service routines and
slower service routines.
Syntax Behavior
trap0(#u8) SSR.CAUSE = #u;
TRAP "0";
trap1(#u8) Assembler mapped to: "trap1(R0,#u8)"
trap1(Rx,#u8) if (!can_handle_trap1_virtinsn(#u)) {
SSR.CAUSE = #u;
TRAP "1";
} else if (#u == 1) {
VMRTE;
} else if (#u == 3) {
VMSETIE;
} else if (#u == 4) {
VMGETIE;
} else if (#u == 6) {
VMSPSWAP;

Class: SYSTEM (slot 2)

Notes

■ This is a solo instruction. It must not be grouped with other instructions in a packet.

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS Parse
0 1 0 1 0 1 0 0 0 0 - - - - - - P P - i i i i i - - - i i i - - trap0(#u8)
ICLASS x5 Parse
0 1 0 1 0 1 0 0 1 0 - x x x x x P P - i i i i i - - - i i i - - trap1(Rx,#u8)

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
x5 Field to encode register x

80-N2040-53 Rev. AA 331


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Unpause
The unpause instruction resumes threads whose execution has stalled with a pause instruction.

Syntax Behavior
unpause Unpause threads currently in pause state;

Class: SYSTEM (slot 2)

Notes

■ This is a solo instruction. It must not be grouped with other instructions in a packet.

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS Parse
0 1 0 1 0 1 1 1 1 1 1 - - - - - P P 0 1 - - - - 0 0 0 - - - - - unpause

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits

80-N2040-53 Rev. AA 332


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

11.10 XTYPE
The XTYPE instruction class includes instructions that perform most of the data processing done
by the Hexagon processor.
XTYPE instructions are executable on slot 2 or slot 3.

11.10.1 XTYPE ALU


The XTYPE ALU instruction subclass includes instructions that perform arithmetic and logical
operations.

Absolute value doubleword


Take the absolute value of the 64-bit source register and place it in the destination register.

Syntax Behavior
Rdd=abs(Rss) Rdd = ABS(Rss);

Class: XTYPE (slots 2,3)

Intrinsics

Rdd=abs(Rss) Word64 Q6_P_abs_P(Word64 Rss)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse MinOp d5
1 0 0 0 0 0 0 0 1 0 0 s s s s s P P - - - - - - 1 1 0 d d d d d Rdd=abs(Rss)

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
MajOp Major opcode
MinOp Minor opcode
RegType Register type

80-N2040-53 Rev. AA 333


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Absolute value word


Take the absolute value of the source register and place it in the destination register.
The 32-bit absolute value is available with optional saturation. The single case of saturation is
when the source register is equal to 0x8000_0000, the destination saturates to 0x7fff_ffff.

Syntax Behavior
Rd=abs(Rs)[:sat] Rd = [sat32](ABS(sxt32->64(Rs)));

Class: XTYPE (slots 2,3)

Notes

■ If saturation occurs during execution of this instruction (a result is clamped to either


maximum or minimum values), the OVF bit in the status register is set. OVF remains set until
explicitly cleared by a transfer to the status register.

Intrinsics

Rd=abs(Rs) Word32 Q6_R_abs_R(Word32 Rs)


Rd=abs(Rs):sat Word32 Q6_R_abs_R_sat(Word32 Rs)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse MinOp d5
1 0 0 0 1 1 0 0 1 0 0 s s s s s P P - - - - - - 1 0 0 d d d d d Rd=abs(Rs)
1 0 0 0 1 1 0 0 1 0 0 s s s s s P P - - - - - - 1 0 1 d d d d d Rd=abs(Rs):sat

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
MajOp Major opcode
MinOp Minor opcode
RegType Register type

80-N2040-53 Rev. AA 334


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Add and accumulate


Add Rs and Rt or a signed immediate, then add or subtract the resulting value. The result is saved
in Rx.
Syntax Behavior
Rd=add(Rs,add(Ru,#s6)) Rd = Rs + Ru + apply_extension(#s);
Rd=add(Rs,sub(#s6,Ru)) Rd = Rs - Ru + apply_extension(#s);
Rx+=add(Rs,#s8) apply_extension(#s);
Rx=Rx + Rs + #s;
Rx+=add(Rs,Rt) Rx=Rx + Rs + Rt;
Rx-=add(Rs,#s8) apply_extension(#s);
Rx=Rx - (Rs + #s);
Rx-=add(Rs,Rt) Rx=Rx - (Rs + Rt);

Class: XTYPE (slots 2,3)

Intrinsics

Rd=add(Rs,add(Ru,#s6)) Word32 Q6_R_add_add_RRI(Word32 Rs, Word32 Ru,


Word32 Is6)
Rd=add(Rs,sub(#s6,Ru)) Word32 Q6_R_add_sub_RIR(Word32 Rs, Word32 Is6,
Word32 Ru)
Rx+=add(Rs,#s8) Word32 Q6_R_addacc_RI(Word32 Rx, Word32 Rs, Word32
Is8)
Rx+=add(Rs,Rt) Word32 Q6_R_addacc_RR(Word32 Rx, Word32 Rs, Word32
Rt)
Rx-=add(Rs,#s8) Word32 Q6_R_addnac_RI(Word32 Rx, Word32 Rs, Word32
Is8)
Rx-=add(Rs,Rt) Word32 Q6_R_addnac_RR(Word32 Rx, Word32 Rs, Word32
Rt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType s5 Parse d5 u5
1 1 0 1 1 0 1 1 0 i i s s s s s P P i d d d d d i i i u u u u u Rd=add(Rs,add(Ru,#s6))
1 1 0 1 1 0 1 1 1 i i s s s s s P P i d d d d d i i i u u u u u Rd=add(Rs,sub(#s6,Ru))
ICLASS RegType MajOp s5 Parse MinOp x5
1 1 1 0 0 0 1 0 0 - - s s s s s P P 0 i i i i i i i i x x x x x Rx+=add(Rs,#s8)
1 1 1 0 0 0 1 0 1 - - s s s s s P P 0 i i i i i i i i x x x x x Rx-=add(Rs,#s8)
ICLASS RegType MajOp s5 Parse t5 MinOp x5
1 1 1 0 1 1 1 1 0 0 0 s s s s s P P 0 t t t t t 0 0 1 x x x x x Rx+=add(Rs,Rt)
1 1 1 0 1 1 1 1 1 0 0 s s s s s P P 0 t t t t t 0 0 1 x x x x x Rx-=add(Rs,Rt)

80-N2040-53 Rev. AA 335


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Field name Description


RegType Register type
ICLASS Instruction class
MajOp Major opcode
MinOp Minor opcode
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
u5 Field to encode register u
x5 Field to encode register x

80-N2040-53 Rev. AA 336


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Add doublewords
The first form of this instruction adds two 32-bit registers. If the result overflows 32 bits, the
result is saturated to 0x7FFF_FFFF for a positive result, or 0x8000_0000 for a negative result. A 32-
bit nonsaturating register add is a ALU32-class instruction and can execute on any slot.
The second instruction form sign-extends a 32-bit register Rt to 64-bits and performs a 64-bit add
with Rss. The result is stored in Rdd.
The third instruction form adds 64-bit registers Rss and Rtt and places the result in Rdd.
The final instruction form adds two 64-bit registers Rss and Rtt. If the result overflows 64 bits, it is
saturated to 0x7fff_ffff_ffff_ffff for a positive result, or 0x8000_0000_0000_0000 for a negative
result.
Syntax Behavior
Rd=add(Rs,Rt):sat:depreca Rd=sat32(Rs+Rt);
ted
Rdd=add(Rs,Rtt) if ("Rs & 1") {
Assembler mapped to: "Rdd=add(Rss,Rtt):raw:hi";
} else {
Assembler mapped to: "Rdd=add(Rss,Rtt):raw:lo";
}
Rdd=add(Rss,Rtt) Rdd=Rss+Rtt;
Rdd=add(Rss,Rtt):raw:hi Rdd=Rtt+sxt32->64(Rss.w[1]);

Rdd=add(Rss,Rtt):raw:lo Rdd=Rtt+sxt32->64(Rss.w[0]);

Rdd=add(Rss,Rtt):sat Rdd=sat64(Rss+Rtt);

Class: XTYPE (slots 2,3)

Notes
■ If saturation occurs during execution of this instruction (a result is clamped to either
maximum or minimum values), the OVF bit in the Status Register is set. OVF remains set until
explicitly cleared by a transfer to the status register.

80-N2040-53 Rev. AA 337


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Intrinsics

Rdd=add(Rs,Rtt) Word64 Q6_P_add_RP(Word32 Rs, Word64 Rtt)


Rdd=add(Rss,Rtt) Word64 Q6_P_add_PP(Word64 Rss, Word64 Rtt)
Rdd=add(Rss,Rtt):sat Word64 Q6_P_add_PP_sat(Word64 Rss, Word64
Rtt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType s5 Parse t5 MinOp d5
1 1 0 1 0 0 1 1 0 0 0 s s s s s P P - t t t t t 1 1 1 d d d d d Rdd=add(Rss,Rtt)
1 1 0 1 0 0 1 1 0 1 1 s s s s s P P - t t t t t 1 0 1 d d d d d Rdd=add(Rss,Rtt):sat
1 1 0 1 0 0 1 1 0 1 1 s s s s s P P - t t t t t 1 1 0 d d d d d Rdd=add(Rss,Rtt):raw:lo
1 1 0 1 0 0 1 1 0 1 1 s s s s s P P - t t t t t 1 1 1 d d d d d Rdd=add(Rss,Rtt):raw:hi
Rd=add(Rs,Rt):sat:depreca
1 1 0 1 0 1 0 1 1 0 0 s s s s s P P - t t t t t 0 - - d d d d d ted

Field name Description


RegType Register type
MinOp Minor opcode
ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 338


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Add halfword
Perform a 16-bit add with optional saturation, and place the result in either the upper or lower
half of a register. If the result goes in the upper half, the sources are any high or low halfword of
Rs and Rt. The lower 16 bits of the result are zeroed.
If the result is placed in the lower 16 bits of Rd, the Rs source can be either high or low, but the
other source must be the low halfword of Rt. In this case, the upper halfword of Rd is the sign-
extension of the low halfword.

Rd=add(Rs.[hl],Rt.[hl])[:sat]

Rs.H Rs.L Rs Rt.H Rt.L Rt

Mux Mux

16-bit Add
0x7FFF 0x8000

Saturate

Sign-extend Result Rd

Syntax Behavior
Rd=add(Rt.L,Rs.[HL])[:sat] Rd=[sat16](Rt.h[0]+Rs.h[01]);

Rd=add(Rt.[HL],Rs.[HL])[:sat]: Rd=([sat16](Rt.h[01]+Rs.h[01]))<<16;
<<16

Class: XTYPE (slots 2,3)

Notes

■ If saturation occurs during execution of this instruction (a result is clamped to either


maximum or minimum values), the OVF bit in the status register is set. OVF remains set until
explicitly cleared by a transfer to the status register.

80-N2040-53 Rev. AA 339


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Intrinsics

Rd=add(Rt.H,Rs.H):<<16 Word32 Q6_R_add_RhRh_s16(Word32 Rt, Word32 Rs)


Rd=add(Rt.H,Rs.H):sat:<<16 Word32 Q6_R_add_RhRh_sat_s16(Word32 Rt, Word32 Rs)
Rd=add(Rt.H,Rs.L):<<16 Word32 Q6_R_add_RhRl_s16(Word32 Rt, Word32 Rs)
Rd=add(Rt.H,Rs.L):sat:<<16 Word32 Q6_R_add_RhRl_sat_s16(Word32 Rt, Word32 Rs)
Rd=add(Rt.L,Rs.H) Word32 Q6_R_add_RlRh(Word32 Rt, Word32 Rs)
Rd=add(Rt.L,Rs.H):<<16 Word32 Q6_R_add_RlRh_s16(Word32 Rt, Word32 Rs)
Rd=add(Rt.L,Rs.H):sat Word32 Q6_R_add_RlRh_sat(Word32 Rt, Word32 Rs)
Rd=add(Rt.L,Rs.H):sat:<<16 Word32 Q6_R_add_RlRh_sat_s16(Word32 Rt, Word32 Rs)
Rd=add(Rt.L,Rs.L) Word32 Q6_R_add_RlRl(Word32 Rt, Word32 Rs)
Rd=add(Rt.L,Rs.L):<<16 Word32 Q6_R_add_RlRl_s16(Word32 Rt, Word32 Rs)
Rd=add(Rt.L,Rs.L):sat Word32 Q6_R_add_RlRl_sat(Word32 Rt, Word32 Rs)
Rd=add(Rt.L,Rs.L):sat:<<16 Word32 Q6_R_add_RlRl_sat_s16(Word32 Rt, Word32 Rs)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType s5 Parse t5 MinOp d5
1 1 0 1 0 1 0 1 0 0 0 s s s s s P P - t t t t t 0 0 - d d d d d Rd=add(Rt.L,Rs.L)
1 1 0 1 0 1 0 1 0 0 0 s s s s s P P - t t t t t 0 1 - d d d d d Rd=add(Rt.L,Rs.H)
1 1 0 1 0 1 0 1 0 0 0 s s s s s P P - t t t t t 1 0 - d d d d d Rd=add(Rt.L,Rs.L):sat
1 1 0 1 0 1 0 1 0 0 0 s s s s s P P - t t t t t 1 1 - d d d d d Rd=add(Rt.L,Rs.H):sat
1 1 0 1 0 1 0 1 0 1 0 s s s s s P P - t t t t t 0 0 0 d d d d d Rd=add(Rt.L,Rs.L):<<16
1 1 0 1 0 1 0 1 0 1 0 s s s s s P P - t t t t t 0 0 1 d d d d d Rd=add(Rt.L,Rs.H):<<16
1 1 0 1 0 1 0 1 0 1 0 s s s s s P P - t t t t t 0 1 0 d d d d d Rd=add(Rt.H,Rs.L):<<16
1 1 0 1 0 1 0 1 0 1 0 s s s s s P P - t t t t t 0 1 1 d d d d d Rd=add(Rt.H,Rs.H):<<16
Rd=add(Rt.L,Rs.L):sat:<<1
1 1 0 1 0 1 0 1 0 1 0 s s s s s P P - t t t t t 1 0 0 d d d d d 6

1 1 0 1 0 1 0 1 0 1 0 s s s s s P P - t t t t t 1 0 1 d d d d d Rd=add(Rt.L,Rs.H):sat:<<1
6

1 1 0 1 0 1 0 1 0 1 0 s s s s s P P - t t t t t 1 1 0 d d d d d Rd=add(Rt.H,Rs.L):sat:<<1
6
Rd=add(Rt.H,Rs.H):sat:<<1
1 1 0 1 0 1 0 1 0 1 0 s s s s s P P - t t t t t 1 1 1 d d d d d 6

Field name Description


RegType Register type
MinOp Minor opcode
ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 340


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Add or subtract doublewords with carry


Add or subtract with carry. Predicate register Px is used as an extra input and output.
For adds, add the LSB of the predicate to the sum of the two input pairs.
For subtracts, the predicate is considered a not-borrow. The LSB of the predicate is added to the
first source register and the logical complement of the second argument.
The carry-out from the sum is saved in predicate Px.
These instructions allow efficient addition or subtraction of numbers larger than 64 bits.

Syntax Behavior
Rdd=add(Rss,Rtt,Px):carry PREDUSE_TIMING;
Rdd = Rss + Rtt + Px[0];
Px = carry_from_add(Rss,Rtt,Px[0]) ? 0xff : 0x00;
Rdd=sub(Rss,Rtt,Px):carry PREDUSE_TIMING;
Rdd = Rss + ~Rtt + Px[0];
Px = carry_from_add(Rss,~Rtt,Px[0]) ? 0xff : 0x00;

Class: XTYPE (slots 2,3)

Notes
■ The predicate generated by this instruction can not be used as a .new predicate, nor can it be
automatically ANDed with another predicate.

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType Maj s5 Parse t5 x2 d5
1 1 0 0 0 0 1 0 1 1 0 s s s s s P P - t t t t t - x x d d d d d Rdd=add(Rss,Rtt,Px):carry
1 1 0 0 0 0 1 0 1 1 1 s s s s s P P - t t t t t - x x d d d d d Rdd=sub(Rss,Rtt,Px):carry

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
x2 Field to encode register x
Maj Major opcode
Min Minor opcode
RegType Register type

80-N2040-53 Rev. AA 341


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Clip to unsigned
Clip input to unsigned integer.

Syntax Behavior
Rd=clip(Rs,#u5) Rd=MIN((1<<#u)- 1,MAX(Rs,-(1<<#u)));

Class: XTYPE (slots 2,3)

Notes

■ This instruction can only execute on a core with the Hexagon audio extensions

Intrinsics

Rd=clip(Rs,#u5) Word32 Q6_R_clip_RI(Word32 Rs, Word32 Iu5)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse MinOp d5
1 0 0 0 1 0 0 0 1 1 0 s s s s s P P 0 i i i i i 1 0 1 d d d d d Rd=clip(Rs,#u5)

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
MajOp Major opcode
MinOp Minor opcode
RegType Register type

80-N2040-53 Rev. AA 342


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Logical doublewords
Perform bitwise logical AND, OR, XOR, and NOT operations.
The source and destination registers are 64-bit.
For 32-bit logical operations, see the ALU32 logical instructions.
Syntax Behavior
Rdd=and(Rss,Rtt) Rdd=Rss&Rtt;
Rdd=and(Rtt,~Rss) Rdd = (Rtt & ~Rss);
Rdd=not(Rss) Rdd=~Rss;
Rdd=or(Rss,Rtt) Rdd=Rss|Rtt;
Rdd=or(Rtt,~Rss) Rdd = (Rtt | ~Rss);
Rdd=xor(Rss,Rtt) Rdd=Rss^Rtt;

Class: XTYPE (slots 2,3)

Intrinsics

Rdd=and(Rss,Rtt) Word64 Q6_P_and_PP(Word64 Rss, Word64 Rtt)


Rdd=and(Rtt,~Rss) Word64 Q6_P_and_PnP(Word64 Rtt, Word64 Rss)
Rdd=not(Rss) Word64 Q6_P_not_P(Word64 Rss)
Rdd=or(Rss,Rtt) Word64 Q6_P_or_PP(Word64 Rss, Word64 Rtt)
Rdd=or(Rtt,~Rss) Word64 Q6_P_or_PnP(Word64 Rtt, Word64 Rss)
Rdd=xor(Rss,Rtt) Word64 Q6_P_xor_PP(Word64 Rss, Word64 Rtt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse MinOp d5
1 0 0 0 0 0 0 0 1 0 0 s s s s s P P - - - - - - 1 0 0 d d d d d Rdd=not(Rss)
ICLASS RegType s5 Parse t5 MinOp d5
1 1 0 1 0 0 1 1 1 1 1 s s s s s P P - t t t t t 0 0 0 d d d d d Rdd=and(Rss,Rtt)
1 1 0 1 0 0 1 1 1 1 1 s s s s s P P - t t t t t 0 0 1 d d d d d Rdd=and(Rtt,~Rss)
1 1 0 1 0 0 1 1 1 1 1 s s s s s P P - t t t t t 0 1 0 d d d d d Rdd=or(Rss,Rtt)
1 1 0 1 0 0 1 1 1 1 1 s s s s s P P - t t t t t 0 1 1 d d d d d Rdd=or(Rtt,~Rss)
1 1 0 1 0 0 1 1 1 1 1 s s s s s P P - t t t t t 1 0 0 d d d d d Rdd=xor(Rss,Rtt)

Field name Description


RegType Register type
MinOp Minor opcode
ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 343


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Field name Description


MajOp Major opcode

80-N2040-53 Rev. AA 344


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Logical-logical doublewords
Perform a logical operation of the two source operands, then perform a second logical operation
of the result with the destination register Rxx.
The source and destination registers are 64-bit.
Syntax Behavior
Rxx^=xor(Rss,Rtt) Rxx^=Rss^Rtt;

Class: XTYPE (slots 2,3)

Intrinsics

Rxx^=xor(Rss,Rtt) Word64 Q6_P_xorxacc_PP(Word64 Rxx, Word64


Rss, Word64 Rtt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType Maj s5 Parse t5 Min x5
1 1 0 0 1 0 1 0 1 0 - s s s s s P P 0 t t t t t 0 0 0 x x x x x Rxx^=xor(Rss,Rtt)

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
s5 Field to encode register s
t5 Field to encode register t
x5 Field to encode register x
Maj Major opcode
Min Minor opcode
RegType Register type

80-N2040-53 Rev. AA 345


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Logical-logical words
Perform a logical operation of the two source operands, then perform a second logical operation
of the result with the destination register Rx.
The source and destination registers are 32-bit.
Syntax Behavior
Rx=or(Ru,and(Rx,#s10)) Rx = Ru | (Rx & apply_extension(#s));
Rx[&|^]=and(Rs,Rt) Rx [|&^]= (Rs [|&^] Rt);
Rx[&|^]=and(Rs,~Rt) Rx [|&^]= (Rs [|&^] ~Rt);
Rx[&|^]=or(Rs,Rt) Rx [|&^]= (Rs [|&^] Rt);
Rx[&|^]=xor(Rs,Rt) Rx[|&^]=Rs[|&^]Rt;
Rx|=and(Rs,#s10) Rx = Rx | (Rs & apply_extension(#s));
Rx|=or(Rs,#s10) Rx = Rx | (Rs | apply_extension(#s));

80-N2040-53 Rev. AA 346


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Class: XTYPE (slots 2,3)

Intrinsics

Rx&=and(Rs,Rt) Word32 Q6_R_andand_RR(Word32 Rx, Word32 Rs, Word32


Rt)
Rx&=and(Rs,~Rt) Word32 Q6_R_andand_RnR(Word32 Rx, Word32 Rs, Word32
Rt)
Rx&=or(Rs,Rt) Word32 Q6_R_orand_RR(Word32 Rx, Word32 Rs, Word32 Rt)
Rx&=xor(Rs,Rt) Word32 Q6_R_xorand_RR(Word32 Rx, Word32 Rs, Word32
Rt)
Rx=or(Ru,and(Rx,#s10)) Word32 Q6_R_or_and_RRI(Word32 Ru, Word32 Rx, Word32
Is10)
Rx^=and(Rs,Rt) Word32 Q6_R_andxacc_RR(Word32 Rx, Word32 Rs, Word32
Rt)
Rx^=and(Rs,~Rt) Word32 Q6_R_andxacc_RnR(Word32 Rx, Word32 Rs, Word32
Rt)
Rx^=or(Rs,Rt) Word32 Q6_R_orxacc_RR(Word32 Rx, Word32 Rs, Word32
Rt)
Rx^=xor(Rs,Rt) Word32 Q6_R_xorxacc_RR(Word32 Rx, Word32 Rs, Word32
Rt)
Rx|=and(Rs,#s10) Word32 Q6_R_andor_RI(Word32 Rx, Word32 Rs, Word32
Is10)
Rx|=and(Rs,Rt) Word32 Q6_R_andor_RR(Word32 Rx, Word32 Rs, Word32 Rt)
Rx|=and(Rs,~Rt) Word32 Q6_R_andor_RnR(Word32 Rx, Word32 Rs, Word32
Rt)
Rx|=or(Rs,#s10) Word32 Q6_R_oror_RI(Word32 Rx, Word32 Rs, Word32
Is10)
Rx|=or(Rs,Rt) Word32 Q6_R_oror_RR(Word32 Rx, Word32 Rs, Word32 Rt)
Rx|=xor(Rs,Rt) Word32 Q6_R_xoror_RR(Word32 Rx, Word32 Rs, Word32 Rt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType s5 Parse x5
1 1 0 1 1 0 1 0 0 0 i s s s s s P P i i i i i i i i i x x x x x Rx|=and(Rs,#s10)
ICLASS RegType x5 Parse u5
1 1 0 1 1 0 1 0 0 1 i x x x x x P P i i i i i i i i i u u u u u Rx=or(Ru,and(Rx,#s10))
ICLASS RegType s5 Parse x5
1 1 0 1 1 0 1 0 1 0 i s s s s s P P i i i i i i i i i x x x x x Rx|=or(Rs,#s10)
ICLASS RegType MajOp s5 Parse t5 MinOp x5
1 1 1 0 1 1 1 1 0 0 1 s s s s s P P 0 t t t t t 0 0 0 x x x x x Rx|=and(Rs,~Rt)
1 1 1 0 1 1 1 1 0 0 1 s s s s s P P 0 t t t t t 0 0 1 x x x x x Rx&=and(Rs,~Rt)
1 1 1 0 1 1 1 1 0 0 1 s s s s s P P 0 t t t t t 0 1 0 x x x x x Rx^=and(Rs,~Rt)
1 1 1 0 1 1 1 1 0 1 0 s s s s s P P 0 t t t t t 0 0 0 x x x x x Rx&=and(Rs,Rt)
1 1 1 0 1 1 1 1 0 1 0 s s s s s P P 0 t t t t t 0 0 1 x x x x x Rx&=or(Rs,Rt)
1 1 1 0 1 1 1 1 0 1 0 s s s s s P P 0 t t t t t 0 1 0 x x x x x Rx&=xor(Rs,Rt)

80-N2040-53 Rev. AA 347


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 1 1 1 1 0 1 0 s s s s s P P 0 t t t t t 0 1 1 x x x x x Rx|=and(Rs,Rt)
1 1 1 0 1 1 1 1 1 0 0 s s s s s P P 0 t t t t t 0 1 1 x x x x x Rx^=xor(Rs,Rt)
1 1 1 0 1 1 1 1 1 1 0 s s s s s P P 0 t t t t t 0 0 0 x x x x x Rx|=or(Rs,Rt)
1 1 1 0 1 1 1 1 1 1 0 s s s s s P P 0 t t t t t 0 0 1 x x x x x Rx|=xor(Rs,Rt)
1 1 1 0 1 1 1 1 1 1 0 s s s s s P P 0 t t t t t 0 1 0 x x x x x Rx^=and(Rs,Rt)
1 1 1 0 1 1 1 1 1 1 0 s s s s s P P 0 t t t t t 0 1 1 x x x x x Rx^=or(Rs,Rt)

Field name Description


RegType Register type
ICLASS Instruction class
MajOp Major opcode
MinOp Minor opcode
Parse Packet/loop parse bits
s5 Field to encode register s
t5 Field to encode register t
u5 Field to encode register u
x5 Field to encode register x

80-N2040-53 Rev. AA 348


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Maximum words
Select either the signed or unsigned maximum of two source registers and place in a destination
register Rdd.
Syntax Behavior
Rd=max(Rs,Rt) Rd = max(Rs,Rt);
Rd=maxu(Rs,Rt) Rd = max(Rs.uw[0],Rt.uw[0]);

Class: XTYPE (slots 2,3)

Intrinsics

Rd=max(Rs,Rt) Word32 Q6_R_max_RR(Word32 Rs, Word32 Rt)


Rd=maxu(Rs,Rt) UWord32 Q6_R_maxu_RR(Word32 Rs, Word32 Rt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType s5 Parse t5 MinOp d5
1 1 0 1 0 1 0 1 1 1 0 s s s s s P P - t t t t t 0 - - d d d d d Rd=max(Rs,Rt)
1 1 0 1 0 1 0 1 1 1 0 s s s s s P P - t t t t t 1 - - d d d d d Rd=maxu(Rs,Rt)

Field name Description


RegType Register type
MinOp Minor opcode
ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 349


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Maximum doublewords
Select either the signed or unsigned maximum of two 64-bit source registers and place in a
destination register.
Syntax Behavior
Rdd=max(Rss,Rtt) Rdd = max(Rss,Rtt);
Rdd=maxu(Rss,Rtt) Rdd = max(Rss.u64,Rtt.u64);

Class: XTYPE (slots 2,3)

Intrinsics

Rdd=max(Rss,Rtt) Word64 Q6_P_max_PP(Word64 Rss, Word64 Rtt)


Rdd=maxu(Rss,Rtt) UWord64 Q6_P_maxu_PP(Word64 Rss, Word64
Rtt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType s5 Parse t5 MinOp d5
1 1 0 1 0 0 1 1 1 1 0 s s s s s P P - t t t t t 1 0 0 d d d d d Rdd=max(Rss,Rtt)
1 1 0 1 0 0 1 1 1 1 0 s s s s s P P - t t t t t 1 0 1 d d d d d Rdd=maxu(Rss,Rtt)

Field name Description


RegType Register type
MinOp Minor opcode
ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 350


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Minimum words
Select either the signed or unsigned minimum of two source registers and place in destination
register Rd.
Syntax Behavior
Rd=min(Rt,Rs) Rd = min(Rt,Rs);
Rd=minu(Rt,Rs) Rd = min(Rt.uw[0],Rs.uw[0]);

Class: XTYPE (slots 2,3)

Intrinsics

Rd=min(Rt,Rs) Word32 Q6_R_min_RR(Word32 Rt, Word32 Rs)


Rd=minu(Rt,Rs) UWord32 Q6_R_minu_RR(Word32 Rt, Word32 Rs)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType s5 Parse t5 MinOp d5
1 1 0 1 0 1 0 1 1 0 1 s s s s s P P - t t t t t 0 - - d d d d d Rd=min(Rt,Rs)
1 1 0 1 0 1 0 1 1 0 1 s s s s s P P - t t t t t 1 - - d d d d d Rd=minu(Rt,Rs)

Field name Description


RegType Register type
MinOp Minor opcode
ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 351


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Minimum doublewords
Select either the signed or unsigned minimum of two 64-bit source registers and place in the
destination register Rdd.
Syntax Behavior
Rdd=min(Rtt,Rss) Rdd = min(Rtt,Rss);
Rdd=minu(Rtt,Rss) Rdd = min(Rtt.u64,Rss.u64);

Class: XTYPE (slots 2,3)

Intrinsics

Rdd=min(Rtt,Rss) Word64 Q6_P_min_PP(Word64 Rtt, Word64 Rss)


Rdd=minu(Rtt,Rss) UWord64 Q6_P_minu_PP(Word64 Rtt, Word64
Rss)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType s5 Parse t5 MinOp d5
1 1 0 1 0 0 1 1 1 0 1 s s s s s P P - t t t t t 1 1 0 d d d d d Rdd=min(Rtt,Rss)
1 1 0 1 0 0 1 1 1 0 1 s s s s s P P - t t t t t 1 1 1 d d d d d Rdd=minu(Rtt,Rss)

Field name Description


RegType Register type
MinOp Minor opcode
ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 352


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Modulo wrap
Wrap the Rs value into the modulo range from 0 to Rt.
If Rs is greater than or equal to Rt, wrap it to the bottom of the range by subtracting Rt.
If Rs is less than zero, wrap it to the top of the range by adding Rt.
Otherwise, when Rs fits within the range, no adjustment is necessary. The result is returned in
register Rd.
Syntax Behavior
Rd=modwrap(Rs,Rt) if (Rs < 0) {
Rd = Rs + Rt.uw[0];
} else if (Rs.uw[0] >= Rt.uw[0]) {
Rd = Rs - Rt.uw[0];
} else {
Rd = Rs;
}

Class: XTYPE (slots 2,3)

Intrinsics

Rd=modwrap(Rs,Rt) Word32 Q6_R_modwrap_RR(Word32 Rs, Word32


Rt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType s5 Parse t5 MinOp d5
1 1 0 1 0 0 1 1 1 1 1 s s s s s P P - t t t t t 1 1 1 d d d d d Rd=modwrap(Rs,Rt)

Field name Description


RegType Register type
MinOp Minor opcode
ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 353


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Negate
The first form of this instruction performs a negate on a 32-bit register with saturation. If the
input is 0x80000000, the result is saturated to 0x7fffffff. The nonsaturating 32-bit register negate
is a ALU32-class instruction and can execute on any slot.
The second form of this instruction negates a 64-bit source register and places the result in
destination Rdd.

Syntax Behavior
Rd=neg(Rs):sat Rd = sat32(-Rs.s64);

Rdd=neg(Rss) Rdd = -Rss;

Class: XTYPE (slots 2,3)

Notes

■ If saturation occurs during execution of this instruction (a result is clamped to either


maximum or minimum values), the OVF bit in the status register is set. OVF remains set until
explicitly cleared by a transfer to the status register.

Intrinsics

Rd=neg(Rs):sat Word32 Q6_R_neg_R_sat(Word32 Rs)


Rdd=neg(Rss) Word64 Q6_P_neg_P(Word64 Rss)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse MinOp d5
1 0 0 0 0 0 0 0 1 0 0 s s s s s P P - - - - - - 1 0 1 d d d d d Rdd=neg(Rss)
1 0 0 0 1 1 0 0 1 0 0 s s s s s P P - - - - - - 1 1 0 d d d d d Rd=neg(Rs):sat

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
MajOp Major opcode
MinOp Minor opcode
RegType Register type

80-N2040-53 Rev. AA 354


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Round
Perform either arithmetic (.5 is rounded up) or convergent (.5 is rounded towards even) rounding
to any bit location.
Arithmetic rounding has optional saturation. In this version, the result is saturated to a 32-bit
number after adding the rounding constant. After the rounding and saturation have been
performed, the final result is right shifted using a sign-extending shift.
Syntax Behavior
Rd=cround(Rs,#u5) Rd = (#u==0)?Rs:convround(Rs,2**(#u-1))>>#u;
Rd=cround(Rs,Rt) Rd = (zxt5->32(Rt)==0)?Rs:convround(Rs,2**(zxt5->32(Rt)-
1))>>zxt5->32(Rt);

Rd=round(Rs,#u5)[:sa Rd = ([sat32]((#u==0)?(Rs):round(Rs,2**(#u-1))))>>#u;
t]
Rd=round(Rs,Rt)[:sat Rd = ([sat32]((zxt5->32(Rt)==0)?(Rs):round(Rs,2**(zxt5-
] >32(Rt)-1))))>>zxt5->32(Rt);

Rd=round(Rss):sat tmp=sat64(Rss+0x080000000ULL);
Rd = tmp.w[1];
Rdd=cround(Rss,#u6) if (#u == 0) {
Rdd = Rss;
} else if ((Rss & (size8s_t)((1LL << (#u - 1)) - 1LL)) ==
0) {
src_128 = sxt64->128(Rss);
rndbit_128 = sxt64->128(1LL);
rndbit_128 = (rndbit_128 << #u);
rndbit_128 = (rndbit_128 & src_128);
rndbit_128 = (size8s_t) (rndbit_128 >> 1);
tmp128 = src_128+rndbit_128;
tmp128 = (size8s_t) (tmp128 >> #u);
Rdd = sxt128->64(tmp128);
} else {
size16s_t rndbit_128 = sxt64->128((1LL << (#u - 1)));
size16s_t src_128 = sxt64->128(Rss);
size16s_t tmp128 = src_128+rndbit_128;
tmp128 = (size8s_t) (tmp128 >> #u);
Rdd = sxt128->64(tmp128);
}

80-N2040-53 Rev. AA 355


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Syntax Behavior
Rdd=cround(Rss,Rt) if (zxt6->32(Rt) == 0) {
Rdd = Rss;
} else if ((Rss & (size8s_t)((1LL << (zxt6->32(Rt) - 1)) -
1LL)) == 0) {
src_128 = sxt64->128(Rss);
rndbit_128 = sxt64->128(1LL);
rndbit_128 = (rndbit_128 << zxt6->32(Rt));
rndbit_128 = (rndbit_128 & src_128);
rndbit_128 = (size8s_t) (rndbit_128 >> 1);
tmp128 = src_128+rndbit_128;
tmp128 = (size8s_t) (tmp128 >> zxt6->32(Rt));
Rdd = sxt128->64(tmp128);
} else {
size16s_t rndbit_128 = sxt64->128((1LL << (zxt6->32(Rt)
- 1)));
size16s_t src_128 = sxt64->128(Rss);
size16s_t tmp128 = src_128+rndbit_128;
tmp128 = (size8s_t) (tmp128 >> zxt6->32(Rt));
Rdd = sxt128->64(tmp128);
}

Class: XTYPE (slots 2,3)

Notes

■ This instruction can only execute on a core with the Hexagon audio extensions
■ If saturation occurs during execution of this instruction (a result is clamped to either
maximum or minimum values), the OVF bit in the status register is set. OVF remains set until
explicitly cleared by a transfer to the status register.

80-N2040-53 Rev. AA 356


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Intrinsics

Rd=cround(Rs,#u5) Word32 Q6_R_cround_RI(Word32 Rs, Word32


Iu5)
Rd=cround(Rs,Rt) Word32 Q6_R_cround_RR(Word32 Rs, Word32 Rt)
Rd=round(Rs,#u5) Word32 Q6_R_round_RI(Word32 Rs, Word32 Iu5)
Rd=round(Rs,#u5):sat Word32 Q6_R_round_RI_sat(Word32 Rs, Word32
Iu5)
Rd=round(Rs,Rt) Word32 Q6_R_round_RR(Word32 Rs, Word32 Rt)
Rd=round(Rs,Rt):sat Word32 Q6_R_round_RR_sat(Word32 Rs, Word32
Rt)
Rd=round(Rss):sat Word32 Q6_R_round_P_sat(Word64 Rss)
Rdd=cround(Rss,#u6) Word64 Q6_P_cround_PI(Word64 Rss, Word32
Iu6)
Rdd=cround(Rss,Rt) Word64 Q6_P_cround_PR(Word64 Rss, Word32
Rt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse MinOp d5
1 0 0 0 1 0 0 0 1 1 0 s s s s s P P - - - - - - 0 0 1 d d d d d Rd=round(Rss):sat
1 0 0 0 1 1 0 0 1 1 1 s s s s s P P 0 i i i i i 0 0 - d d d d d Rd=cround(Rs,#u5)
1 0 0 0 1 1 0 0 1 1 1 s s s s s P P 0 i i i i i 1 0 - d d d d d Rd=round(Rs,#u5)
1 0 0 0 1 1 0 0 1 1 1 s s s s s P P 0 i i i i i 1 1 - d d d d d Rd=round(Rs,#u5):sat
1 0 0 0 1 1 0 0 1 1 1 s s s s s P P i i i i i i 0 1 - d d d d d Rdd=cround(Rss,#u6)
ICLASS RegType Maj s5 Parse t5 Min d5
1 1 0 0 0 1 1 0 1 1 - s s s s s P P - t t t t t 0 0 - d d d d d Rd=cround(Rs,Rt)
1 1 0 0 0 1 1 0 1 1 - s s s s s P P - t t t t t 0 1 - d d d d d Rdd=cround(Rss,Rt)
1 1 0 0 0 1 1 0 1 1 - s s s s s P P - t t t t t 1 0 - d d d d d Rd=round(Rs,Rt)
1 1 0 0 0 1 1 0 1 1 - s s s s s P P - t t t t t 1 1 - d d d d d Rd=round(Rs,Rt):sat

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
MajOp Major opcode
MinOp Minor opcode
Maj Major opcode
Min Minor opcode
RegType Register type

80-N2040-53 Rev. AA 357


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Subtract doublewords
Subtract the 64-bit register Rss from register Rtt.
Syntax Behavior
Rd=sub(Rt,Rs):sat:deprecated Rd=sat32(Rt - Rs);

Rdd=sub(Rtt,Rss) Rdd=Rtt-Rss;

Class: XTYPE (slots 2,3)

Notes
■ If saturation occurs during execution of this instruction (a result is clamped to either
maximum or minimum values), the OVF bit in the status register is set. OVF remains set until
explicitly cleared by a transfer to the status register.

Intrinsics

Rdd=sub(Rtt,Rss) Word64 Q6_P_sub_PP(Word64 Rtt, Word64 Rss)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType s5 Parse t5 MinOp d5
1 1 0 1 0 0 1 1 0 0 1 s s s s s P P - t t t t t 1 1 1 d d d d d Rdd=sub(Rtt,Rss)

1 1 0 1 0 1 0 1 1 0 0 s s s s s P P - t t t t t 1 - - d d d d d Rd=sub(Rt,Rs):sat:depreca
ted

Field name Description


RegType Register type
MinOp Minor opcode
ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 358


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Subtract and accumulate words


Subtract Rs from Rt, then add the resulting value with Rx. The result is saved in Rx.
Syntax Behavior
Rx+=sub(Rt,Rs) Rx=Rx + Rt - Rs;

Class: XTYPE (slots 2,3)

Intrinsics

Rx+=sub(Rt,Rs) Word32 Q6_R_subacc_RR(Word32 Rx, Word32 Rt,


Word32 Rs)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse t5 MinOp x5
1 1 1 0 1 1 1 1 0 0 0 s s s s s P P 0 t t t t t 0 1 1 x x x x x Rx+=sub(Rt,Rs)

Field name Description


ICLASS Instruction class
MajOp Major opcode
MinOp Minor opcode
RegType Register type
Parse Packet/loop parse bits
s5 Field to encode register s
t5 Field to encode register t
x5 Field to encode register x

80-N2040-53 Rev. AA 359


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Subtract halfword
Perform a 16-bit subtract with optional saturation and place the result in either the upper or
lower half of a register. If the result goes in the upper half, the sources can be any high or low
halfword of Rs and Rt. The lower 16 bits of the result are zeroed.
If the result is placed in the lower 16 bits of Rd, the Rs source can be either high or low, but the
other source must be the low halfword of Rt. In this case, the upper halfword of Rd is the sign-
extension of the low halfword.

Rd=sub(Rt.[hl],Rs.l)[:sat] Rd=sub(Rt.[hl],Rs.[hl])[:sat]:<<16

Rt.H Rt.L Rt Rt.H Rt.L Rt

Mux Rs Mux Rs

Rs.H Rs.L Rs.H Rs.L

Mux

16-bit Sub 16-bit Sub


0x7FFF 0x8000 0x7FFF 0x8000

Saturate Saturate

Sign-extend Result Rd Result 0x0000 Rd

Syntax Behavior
Rd=sub(Rt.L,Rs.[HL])[:sat] Rd=[sat16](Rt.h[0]-Rs.h[01]);

Rd=sub(Rt.[HL],Rs.[HL])[:sat]:<< Rd=([sat16](Rt.h[01]-Rs.h[01]))<<16;
16

Class: XTYPE (slots 2,3)

Notes

■ If saturation occurs during execution of this instruction (a result is clamped to either


maximum or minimum values), the OVF bit in the status register is set. OVF remains set until
explicitly cleared by a transfer to the status register.

80-N2040-53 Rev. AA 360


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Intrinsics

Rd=sub(Rt.H,Rs.H):<<16 Word32 Q6_R_sub_RhRh_s16(Word32 Rt, Word32 Rs)


Rd=sub(Rt.H,Rs.H):sat:<<1 Word32 Q6_R_sub_RhRh_sat_s16(Word32 Rt, Word32
6 Rs)
Rd=sub(Rt.H,Rs.L):<<16 Word32 Q6_R_sub_RhRl_s16(Word32 Rt, Word32 Rs)
Rd=sub(Rt.H,Rs.L):sat:<<1 Word32 Q6_R_sub_RhRl_sat_s16(Word32 Rt, Word32
6 Rs)
Rd=sub(Rt.L,Rs.H) Word32 Q6_R_sub_RlRh(Word32 Rt, Word32 Rs)
Rd=sub(Rt.L,Rs.H):<<16 Word32 Q6_R_sub_RlRh_s16(Word32 Rt, Word32 Rs)
Rd=sub(Rt.L,Rs.H):sat Word32 Q6_R_sub_RlRh_sat(Word32 Rt, Word32 Rs)
Rd=sub(Rt.L,Rs.H):sat:<<1 Word32 Q6_R_sub_RlRh_sat_s16(Word32 Rt, Word32
6 Rs)
Rd=sub(Rt.L,Rs.L) Word32 Q6_R_sub_RlRl(Word32 Rt, Word32 Rs)
Rd=sub(Rt.L,Rs.L):<<16 Word32 Q6_R_sub_RlRl_s16(Word32 Rt, Word32 Rs)
Rd=sub(Rt.L,Rs.L):sat Word32 Q6_R_sub_RlRl_sat(Word32 Rt, Word32 Rs)
Rd=sub(Rt.L,Rs.L):sat:<<1 Word32 Q6_R_sub_RlRl_sat_s16(Word32 Rt, Word32
6 Rs)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType s5 Parse t5 MinOp d5
1 1 0 1 0 1 0 1 0 0 1 s s s s s P P - t t t t t 0 0 - d d d d d Rd=sub(Rt.L,Rs.L)
1 1 0 1 0 1 0 1 0 0 1 s s s s s P P - t t t t t 0 1 - d d d d d Rd=sub(Rt.L,Rs.H)
1 1 0 1 0 1 0 1 0 0 1 s s s s s P P - t t t t t 1 0 - d d d d d Rd=sub(Rt.L,Rs.L):sat
1 1 0 1 0 1 0 1 0 0 1 s s s s s P P - t t t t t 1 1 - d d d d d Rd=sub(Rt.L,Rs.H):sat
1 1 0 1 0 1 0 1 0 1 1 s s s s s P P - t t t t t 0 0 0 d d d d d Rd=sub(Rt.L,Rs.L):<<16
1 1 0 1 0 1 0 1 0 1 1 s s s s s P P - t t t t t 0 0 1 d d d d d Rd=sub(Rt.L,Rs.H):<<16
1 1 0 1 0 1 0 1 0 1 1 s s s s s P P - t t t t t 0 1 0 d d d d d Rd=sub(Rt.H,Rs.L):<<16
1 1 0 1 0 1 0 1 0 1 1 s s s s s P P - t t t t t 0 1 1 d d d d d Rd=sub(Rt.H,Rs.H):<<16
Rd=sub(Rt.L,Rs.L):sat:<<1
1 1 0 1 0 1 0 1 0 1 1 s s s s s P P - t t t t t 1 0 0 d d d d d 6

1 1 0 1 0 1 0 1 0 1 1 s s s s s P P - t t t t t 1 0 1 d d d d d Rd=sub(Rt.L,Rs.H):sat:<<1
6

1 1 0 1 0 1 0 1 0 1 1 s s s s s P P - t t t t t 1 1 0 d d d d d Rd=sub(Rt.H,Rs.L):sat:<<1
6
Rd=sub(Rt.H,Rs.H):sat:<<1
1 1 0 1 0 1 0 1 0 1 1 s s s s s P P - t t t t t 1 1 1 d d d d d 6

Field name Description


RegType Register type
MinOp Minor opcode
ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 361


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Sign extend word to doubleword


Sign-extend a 32-bit word to a 64-bit doubleword.
Syntax Behavior
Rdd=sxtw(Rs) Rdd = sxt32->64(Rs);

Class: XTYPE (slots 2,3)

Intrinsics

Rdd=sxtw(Rs) Word64 Q6_P_sxtw_R(Word32 Rs)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse MinOp d5
1 0 0 0 0 1 0 0 0 1 - s s s s s P P - - - - - - 0 0 - d d d d d Rdd=sxtw(Rs)

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
MajOp Major opcode
MinOp Minor opcode
RegType Register type

80-N2040-53 Rev. AA 362


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector absolute value halfwords


Take the absolute value of each of the four halfwords in the 64-bit source vector Rss. Place the
result in Rdd.
Saturation is optionally available.
Syntax Behavior
Rdd=vabsh(Rss) for (i=0;i<4;i++) {
Rdd.h[i]=ABS(Rss.h[i]);
}
Rdd=vabsh(Rss):sat for (i=0;i<4;i++) {
Rdd.h[i]=sat16(ABS(Rss.h[i]));
}

Class: XTYPE (slots 2,3)

Notes

■ If saturation occurs during execution of this instruction (a result is clamped to either


maximum or minimum values), the OVF bit in the status register is set. OVF remains set until
explicitly cleared by a transfer to the status register.

Intrinsics

Rdd=vabsh(Rss) Word64 Q6_P_vabsh_P(Word64 Rss)


Rdd=vabsh(Rss):sat Word64 Q6_P_vabsh_P_sat(Word64 Rss)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse MinOp d5
1 0 0 0 0 0 0 0 0 1 0 s s s s s P P - - - - - - 1 0 0 d d d d d Rdd=vabsh(Rss)
1 0 0 0 0 0 0 0 0 1 0 s s s s s P P - - - - - - 1 0 1 d d d d d Rdd=vabsh(Rss):sat

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
MajOp Major opcode
MinOp Minor opcode
RegType Register type

80-N2040-53 Rev. AA 363


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector absolute value words


Take the absolute value of each of the two words in the 64-bit source vector Rss. Place the result
in Rdd.
Saturation is optionally available.
Syntax Behavior
Rdd=vabsw(Rss) for (i=0;i<2;i++) {
Rdd.w[i]=ABS(Rss.w[i]);
}
Rdd=vabsw(Rss):sat for (i=0;i<2;i++) {
Rdd.w[i]=sat32(ABS(Rss.w[i]));
}

Class: XTYPE (slots 2,3)

Notes

■ If saturation occurs during execution of this instruction (a result is clamped to either


maximum or minimum values), the OVF bit in the status register is set. OVF remains set until
explicitly cleared by a transfer to the status register.

Intrinsics

Rdd=vabsw(Rss) Word64 Q6_P_vabsw_P(Word64 Rss)


Rdd=vabsw(Rss):sat Word64 Q6_P_vabsw_P_sat(Word64 Rss)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse MinOp d5
1 0 0 0 0 0 0 0 0 1 0 s s s s s P P - - - - - - 1 1 0 d d d d d Rdd=vabsw(Rss)
1 0 0 0 0 0 0 0 0 1 0 s s s s s P P - - - - - - 1 1 1 d d d d d Rdd=vabsw(Rss):sat

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
MajOp Major opcode
MinOp Minor opcode
RegType Register type

80-N2040-53 Rev. AA 364


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector absolute difference bytes


For each element in the source vector Rss, subtract the corresponding element in source vector
Rtt. Take the absolute value of the results, and store into Rdd.

Syntax Behavior
Rdd=vabsdiffb(Rtt,Rss) for (i=0;i<8;i++) {
Rdd.b[i]=ABS(Rtt.b[i] - Rss.b[i]);
}
Rdd=vabsdiffub(Rtt,Rss) for (i=0;i<8;i++) {
Rdd.b[i]=ABS(Rtt.ub[i] - Rss.ub[i]);
}

Class: XTYPE (slots 2,3)

Intrinsics

Rdd=vabsdiffb(Rtt,Rss) Word64 Q6_P_vabsdiffb_PP(Word64 Rtt, Word64


Rss)
Rdd=vabsdiffub(Rtt,Rss) Word64 Q6_P_vabsdiffub_PP(Word64 Rtt, Word64
Rss)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse t5 MinOp d5
1 1 1 0 1 0 0 0 1 0 1 s s s s s P P 0 t t t t t 0 0 0 d d d d d Rdd=vabsdiffub(Rtt,Rss)
1 1 1 0 1 0 0 0 1 1 1 s s s s s P P 0 t t t t t 0 0 0 d d d d d Rdd=vabsdiffb(Rtt,Rss)

Field name Description


ICLASS Instruction class
MajOp Major opcode
MinOp Minor opcode
RegType Register type
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 365


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector absolute difference halfwords


For each element in the source vector Rss, subtract the corresponding element in source vector
Rtt. Take the absolute value of the results, and store into Rdd.

Syntax Behavior
Rdd=vabsdiffh(Rtt,Rss) for (i=0;i<4;i++) {
Rdd.h[i]=ABS(Rtt.h[i] - Rss.h[i]);
}

Class: XTYPE (slots 2,3)

Intrinsics

Rdd=vabsdiffh(Rtt,Rss) Word64 Q6_P_vabsdiffh_PP(Word64 Rtt, Word64


Rss)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse t5 MinOp d5
1 1 1 0 1 0 0 0 0 1 1 s s s s s P P 0 t t t t t 0 0 0 d d d d d Rdd=vabsdiffh(Rtt,Rss)

Field name Description


ICLASS Instruction class
MajOp Major opcode
MinOp Minor opcode
RegType Register type
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 366


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector absolute difference words


For each element in the source vector Rss, subtract the corresponding element in source vector
Rtt. Take the absolute value of the results, and store into Rdd.

Syntax Behavior
Rdd=vabsdiffw(Rtt,Rss) for (i=0;i<2;i++) {
Rdd.w[i]=ABS(Rtt.w[i] - Rss.w[i]);
}

Class: XTYPE (slots 2,3)

Intrinsics

Rdd=vabsdiffw(Rtt,Rss) Word64 Q6_P_vabsdiffw_PP(Word64 Rtt, Word64


Rss)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse t5 MinOp d5
1 1 1 0 1 0 0 0 0 0 1 s s s s s P P 0 t t t t t 0 0 0 d d d d d Rdd=vabsdiffw(Rtt,Rss)

Field name Description


ICLASS Instruction class
MajOp Major opcode
MinOp Minor opcode
RegType Register type
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 367


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector add compare and select maximum bytes


Add each byte element in Rxx and Rtt, and compare the resulting sums with the corresponding
differences between Rss and Rtt. Store the maximum value of each compare in Rxx, and set the
corresponding bits in a predicate destination to '1' if the compare result is greater, '0' if not. Each
sum and difference is saturated to 8 bits before the compare, and the compare operation is a
signed byte compare.

Rxx.H3 Rxx.H2 Rxx.H1 Rxx.H0 Rxx

Rss.H3 Rss.H2 Rss.H1 Rss.H0 Rss

Rtt.H3 Rtt.H2 Rtt.H1 Rtt.H0 Rtt

+ - + - + - + -
Rxx,Pd=vacsh(Rss,Rtt)
sat16 sat16 sat16 sat16 sat16 sat16 sat16 sat16

>
>

>
1 0
> 1bit

16bits 1 0

1 0

1 0

Rxx.H3 Rxx.H2 Rxx.H1 Rxx.H0 Pd

Syntax Behavior

Class: N/A

80-N2040-53 Rev. AA 368


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector add compare and select maximum halfwords


Add each halfword element in Rxx and Rtt, and compare the resulting sums with the
corresponding differences between Rss and Rtt. Store the maximum value of each compare in
Rxx, and set the corresponding bits in a predicate destination to '11' if the compare result is
greater, '00' if not. Each sum and difference is saturated to 16 bits before the compare, and the
compare operation is a signed halfword compare.

Rxx.H3 Rxx.H2 Rxx.H1 Rxx.H0 Rxx

Rss.H3 Rss.H2 Rss.H1 Rss.H0 Rss

Rtt.H3 Rtt.H2 Rtt.H1 Rtt.H0 Rtt

+ - + - + - + -
Rxx,Pd=vacsh(Rss,Rtt)
sat16 sat16 sat16 sat16 sat16 sat16 sat16 sat16

>
>

>
1 0
> 1bit

16bits 1 0

1 0

1 0

Rxx.H3 Rxx.H2 Rxx.H1 Rxx.H0 Pd

Syntax Behavior
Rxx,Pe=vacsh(Rss,Rtt) for (i = 0; i < 4; i++) {
xv = (int) Rxx.h[i];
sv = (int) Rss.h[i];
tv = (int) Rtt.h[i];
xv = xv + tv;
sv = sv - tv;
Pe.i*2 = (xv > sv);
Pe.i*2+1 = (xv > sv);
Rxx.h[i]=sat16(max(xv,sv));
}

80-N2040-53 Rev. AA 369


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Class: XTYPE (slots 2,3)

Notes

■ The predicate generated by this instruction cannot be used as a .new predicate, nor can it be
automatically AND’d with another predicate.
■ If saturation occurs during execution of this instruction (a result is clamped to either
maximum or minimum values), the OVF bit in the status register is set. OVF remains set until
explicitly cleared by a transfer to the status register.

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse t5 e2 x5
1 1 1 0 1 0 1 0 1 0 1 s s s s s P P 0 t t t t t 0 e e x x x x x Rxx,Pe=vacsh(Rss,Rtt)

Field name Description


ICLASS Instruction class
MajOp Major opcode
MinOp Minor opcode
RegType Register type
Parse Packet/loop parse bits
e2 Field to encode register e
s5 Field to encode register s
t5 Field to encode register t
x5 Field to encode register x

80-N2040-53 Rev. AA 370


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector add halfwords


Add each of the four halfwords in 64-bit vector Rss to the corresponding halfword in vector Rtt.
Optionally saturate each 16-bit addition to either a signed or unsigned 16-bit value. Applying
saturation to the vaddh instruction clamps the result to the signed range 0x8000 to 0x7fff,
whereas applying saturation to the vadduh instruction ensures that the unsigned result falls
within the range 0 to 0xffff. When saturation is not needed, use the vaddh form.
For the 32-bit version of this vector operation, see the ALU32 instructions.
Syntax Behavior
Rdd=vaddh(Rss,Rtt)[:sat] for (i=0;i<4;i++) {
Rdd.h[i]=[sat16](Rss.h[i]+Rtt.h[i]);
}
Rdd=vadduh(Rss,Rtt):sat for (i=0;i<4;i++) {
Rdd.h[i]=usat16(Rss.uh[i]+Rtt.uh[i]);
}

Class: XTYPE (slots 2,3)

Notes
■ If saturation occurs during execution of this instruction (a result is clamped to either
maximum or minimum values), the OVF bit in the status register is set. OVF remains set until
explicitly cleared by a transfer to the status register.

Intrinsics

Rdd=vaddh(Rss,Rtt) Word64 Q6_P_vaddh_PP(Word64 Rss, Word64 Rtt)


Rdd=vaddh(Rss,Rtt):sat Word64 Q6_P_vaddh_PP_sat(Word64 Rss, Word64
Rtt)
Rdd=vadduh(Rss,Rtt):sat Word64 Q6_P_vadduh_PP_sat(Word64 Rss, Word64
Rtt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType s5 Parse t5 MinOp d5
1 1 0 1 0 0 1 1 0 0 0 s s s s s P P - t t t t t 0 1 0 d d d d d Rdd=vaddh(Rss,Rtt)
1 1 0 1 0 0 1 1 0 0 0 s s s s s P P - t t t t t 0 1 1 d d d d d Rdd=vaddh(Rss,Rtt):sat
1 1 0 1 0 0 1 1 0 0 0 s s s s s P P - t t t t t 1 0 0 d d d d d Rdd=vadduh(Rss,Rtt):sat

Field name Description


RegType Register type
MinOp Minor opcode
ICLASS Instruction class
Parse Packet/loop parse bits

80-N2040-53 Rev. AA 371


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Field name Description


d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 372


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector add halfwords with saturate and pack to unsigned bytes


Add the four 16-bit halfwords of Rss to the four 16-bit halfwords of Rtt. The results are saturated
to unsigned 8-bits and packed in destination register Rd.
Syntax Behavior
Rd=vaddhub(Rss,Rtt):sat for (i=0;i<4;i++) {
Rd.b[i]=usat8(Rss.h[i]+Rtt.h[i]);
}

Class: XTYPE (slots 2,3)

Notes

■ If saturation occurs during execution of this instruction (a result is clamped to either


maximum or minimum values), the OVF bit in the status register is set. OVF remains set until
explicitly cleared by a transfer to the status register.

Intrinsics

Rd=vaddhub(Rss,Rtt):sat Word32 Q6_R_vaddhub_PP_sat(Word64 Rss, Word64


Rtt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType Maj s5 Parse t5 Min d5
1 1 0 0 0 0 0 1 0 1 - s s s s s P P - t t t t t 0 0 1 d d d d d Rd=vaddhub(Rss,Rtt):sat

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
Maj Major opcode
Min Minor opcode
RegType Register type

80-N2040-53 Rev. AA 373


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector reduce add unsigned bytes


For each byte in the source vector Rss, add the corresponding byte in the source vector Rtt. Add
the four upper intermediate results and optionally the upper word of the destination. Add the
four lower results and optionally the lower word of the destination.

Rss

Rtt

+ + + + + + + +

32-bit Add 32-bit Add

Rdd

Syntax Behavior
Rdd=vraddub(Rss,Rtt) Rdd = 0;
for (i=0;i<4;i++) {
Rdd.w[0]=(Rdd.w[0] + (Rss.ub[i]+Rtt.ub[i]));
}
for (i=4;i<8;i++) {
Rdd.w[1]=(Rdd.w[1] + (Rss.ub[i]+Rtt.ub[i]));
}
Rxx+=vraddub(Rss,Rtt) for (i = 0; i < 4; i++) {
Rxx.w[0]=(Rxx.w[0] + (Rss.ub[i]+Rtt.ub[i]));
}
for (i = 4; i < 8; i++) {
Rxx.w[1]=(Rxx.w[1] + (Rss.ub[i]+Rtt.ub[i]));
}

Class: XTYPE (slots 2,3)

Intrinsics

Rdd=vraddub(Rss,Rtt) Word64 Q6_P_vraddub_PP(Word64 Rss, Word64 Rtt)


Rxx+=vraddub(Rss,Rtt) Word64 Q6_P_vraddubacc_PP(Word64 Rxx, Word64
Rss, Word64 Rtt)

80-N2040-53 Rev. AA 374


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse t5 MinOp d5
1 1 1 0 1 0 0 0 0 1 0 s s s s s P P 0 t t t t t 0 0 1 d d d d d Rdd=vraddub(Rss,Rtt)
ICLASS RegType MajOp s5 Parse t5 MinOp x5
1 1 1 0 1 0 1 0 0 1 0 s s s s s P P 0 t t t t t 0 0 1 x x x x x Rxx+=vraddub(Rss,Rtt)

Field name Description


ICLASS Instruction class
MajOp Major opcode
MinOp Minor opcode
RegType Register type
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
x5 Field to encode register x

80-N2040-53 Rev. AA 375


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector reduce add halfwords


For each halfword in the source vector Rss, add the corresponding halfword in the source vector
Rtt. Add these intermediate results together, and place the result in Rd.
Rss

Rtt

+ + + +

Rd

Syntax Behavior
Rd=vraddh(Rss,Rtt) Rd = 0;
for (i=0;i<4;i++) {
Rd += (Rss.h[i]+Rtt.h[i]);
}
Rd=vradduh(Rss,Rtt) Rd = 0;
for (i=0;i<4;i++) {
Rd += (Rss.uh[i]+Rtt.uh[i]);
}

Class: XTYPE (slots 2,3)

Intrinsics

Rd=vraddh(Rss,Rtt) Word32 Q6_R_vraddh_PP(Word64 Rss, Word64


Rtt)
Rd=vradduh(Rss,Rtt) Word32 Q6_R_vradduh_PP(Word64 Rss, Word64
Rtt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse t5 MinOp d5
1 1 1 0 1 0 0 1 0 - - s s s s s P P 0 t t t t t - 0 1 d d d d d Rd=vradduh(Rss,Rtt)
1 1 1 0 1 0 0 1 0 - 1 s s s s s P P 0 t t t t t 1 1 1 d d d d d Rd=vraddh(Rss,Rtt)

80-N2040-53 Rev. AA 376


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Field name Description


ICLASS Instruction class
MajOp Major opcode
MinOp Minor opcode
RegType Register type
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 377


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector add bytes


Add each of the eight bytes in 64-bit vector Rss to the corresponding byte in vector Rtt.
Optionally, saturate each 8-bit addition to an unsigned value between 0 and 255. The eight results
are stored in destination register Rdd.
Syntax Behavior
Rdd=vaddb(Rss,Rtt) Assembler mapped to: "Rdd=vaddub(Rss,Rtt)"
Rdd=vaddub(Rss,Rtt)[:sat] for (i = 0; i < 8; i++) {
Rdd.b[i]=[usat8](Rss.ub[i]+Rtt.ub[i]);
}

Class: XTYPE (slots 2,3)

Notes

■ If saturation occurs during execution of this instruction (a result is clamped to either


maximum or minimum values), the OVF bit in the status register is set. OVF remains set until
explicitly cleared by a transfer to the status register.

Intrinsics

Rdd=vaddb(Rss,Rtt) Word64 Q6_P_vaddb_PP(Word64 Rss, Word64 Rtt)


Rdd=vaddub(Rss,Rtt) Word64 Q6_P_vaddub_PP(Word64 Rss, Word64 Rtt)
Rdd=vaddub(Rss,Rtt):s Word64 Q6_P_vaddub_PP_sat(Word64 Rss, Word64 Rtt)
at

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType s5 Parse t5 MinOp d5
1 1 0 1 0 0 1 1 0 0 0 s s s s s P P - t t t t t 0 0 0 d d d d d Rdd=vaddub(Rss,Rtt)
1 1 0 1 0 0 1 1 0 0 0 s s s s s P P - t t t t t 0 0 1 d d d d d Rdd=vaddub(Rss,Rtt):sat

Field name Description


RegType Register type
MinOp Minor opcode
ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 378


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector add words


Add each of the two words in 64-bit vector Rss to the corresponding word in vector Rtt.
Optionally, saturate each 32-bit addition to a signed value between 0x80000000 and 0x7fffffff.
The two word results are stored in destination register Rdd.
Syntax Behavior
Rdd=vaddw(Rss,Rtt)[:sat] for (i=0;i<2;i++) {
Rdd.w[i]=[sat32](Rss.w[i]+Rtt.w[i]);
}

Class: XTYPE (slots 2,3)

Notes

■ If saturation occurs during execution of this instruction (a result is clamped to either


maximum or minimum values), the OVF bit in the status register is set. OVF remains set until
explicitly cleared by a transfer to the status register.

Intrinsics

Rdd=vaddw(Rss,Rtt) Word64 Q6_P_vaddw_PP(Word64 Rss, Word64


Rtt)
Rdd=vaddw(Rss,Rtt):sat Word64 Q6_P_vaddw_PP_sat(Word64 Rss, Word64
Rtt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType s5 Parse t5 MinOp d5
1 1 0 1 0 0 1 1 0 0 0 s s s s s P P - t t t t t 1 0 1 d d d d d Rdd=vaddw(Rss,Rtt)
1 1 0 1 0 0 1 1 0 0 0 s s s s s P P - t t t t t 1 1 0 d d d d d Rdd=vaddw(Rss,Rtt):sat

Field name Description


RegType Register type
MinOp Minor opcode
ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 379


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector average halfwords


Average each of the four halfwords in the 64-bit source vector Rss with the corresponding
halfword in Rtt. The average operation performed on each halfword adds the two halfwords and
shifts the result right by one bit. Unsigned average uses a logical right shift (shift in 0), whereas
signed average uses an arithmetic right shift (shift in the sign bit). If the round option is used,
0x0001 is also added to each result before shifting. This operation does not overflow. When a
summation (before right shift by 1) causes an overflow of 32 bits, the value shifted in is the most-
significant carry out.
The signed average and negative average halfwords is available with optional convergent
rounding. In convergent rounding, if the two LSBs after the addition/subtraction are 11, a
rounding constant of 1 is added, otherwise a 0 is added. This result is then shifted right by one bit.
Convergent rounding accumulates less error than arithmetic rounding.
Syntax Behavior
Rdd=vavgh(Rss,Rtt) for (i=0;i<4;i++) {
Rdd.h[i]=(Rss.h[i]+Rtt.h[i])>>1;
}
Rdd=vavgh(Rss,Rtt):crnd for (i=0;i<4;i++) {
Rdd.h[i]=convround(Rss.h[i]+Rtt.h[i])>>1;
}
Rdd=vavgh(Rss,Rtt):rnd for (i=0;i<4;i++) {
Rdd.h[i]=(Rss.h[i]+Rtt.h[i]+1)>>1;
}
Rdd=vavguh(Rss,Rtt) for (i=0;i<4;i++) {
Rdd.h[i]=(Rss.uh[i]+Rtt.uh[i])>>1;
}
Rdd=vavguh(Rss,Rtt):rnd for (i=0;i<4;i++) {
Rdd.h[i]=(Rss.uh[i]+Rtt.uh[i]+1)>>1;
}
Rdd=vnavgh(Rtt,Rss) for (i=0;i<4;i++) {
Rdd.h[i]=(Rtt.h[i]-Rss.h[i])>>1;
}
Rdd=vnavgh(Rtt,Rss):crnd: for (i=0;i<4;i++) {
sat Rdd.h[i]=sat16(convround(Rtt.h[i]-
Rss.h[i])>>1);
}
Rdd=vnavgh(Rtt,Rss):rnd:s for (i=0;i<4;i++) {
at Rdd.h[i]=sat16((Rtt.h[i]-Rss.h[i]+1)>>1);
}

Class: XTYPE (slots 2,3)

Notes

■ If saturation occurs during execution of this instruction (a result is clamped to either


maximum or minimum values), the OVF bit in the status register is set. OVF remains set until
explicitly cleared by a transfer to the status register.

80-N2040-53 Rev. AA 380


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Intrinsics

Rdd=vavgh(Rss,Rtt) Word64 Q6_P_vavgh_PP(Word64 Rss, Word64 Rtt)


Rdd=vavgh(Rss,Rtt):crnd Word64 Q6_P_vavgh_PP_crnd(Word64 Rss, Word64
Rtt)
Rdd=vavgh(Rss,Rtt):rnd Word64 Q6_P_vavgh_PP_rnd(Word64 Rss, Word64
Rtt)
Rdd=vavguh(Rss,Rtt) Word64 Q6_P_vavguh_PP(Word64 Rss, Word64
Rtt)
Rdd=vavguh(Rss,Rtt):rnd Word64 Q6_P_vavguh_PP_rnd(Word64 Rss, Word64
Rtt)
Rdd=vnavgh(Rtt,Rss) Word64 Q6_P_vnavgh_PP(Word64 Rtt, Word64
Rss)
Rdd=vnavgh(Rtt,Rss):crnd:s Word64 Q6_P_vnavgh_PP_crnd_sat(Word64 Rtt,
at Word64 Rss)
Rdd=vnavgh(Rtt,Rss):rnd:sa Word64 Q6_P_vnavgh_PP_rnd_sat(Word64 Rtt,
t Word64 Rss)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType s5 Parse t5 MinOp d5
1 1 0 1 0 0 1 1 0 1 0 s s s s s P P - t t t t t 0 1 0 d d d d d Rdd=vavgh(Rss,Rtt)
1 1 0 1 0 0 1 1 0 1 0 s s s s s P P - t t t t t 0 1 1 d d d d d Rdd=vavgh(Rss,Rtt):rnd
1 1 0 1 0 0 1 1 0 1 0 s s s s s P P - t t t t t 1 0 0 d d d d d Rdd=vavgh(Rss,Rtt):crnd
1 1 0 1 0 0 1 1 0 1 0 s s s s s P P - t t t t t 1 0 1 d d d d d Rdd=vavguh(Rss,Rtt)
1 1 0 1 0 0 1 1 0 1 0 s s s s s P P - t t t t t 1 1 - d d d d d Rdd=vavguh(Rss,Rtt):rnd
1 1 0 1 0 0 1 1 1 0 0 s s s s s P P - t t t t t 0 0 0 d d d d d Rdd=vnavgh(Rtt,Rss)
Rdd=vnavgh(Rtt,Rss):rnd:s
1 1 0 1 0 0 1 1 1 0 0 s s s s s P P - t t t t t 0 0 1 d d d d d at

1 1 0 1 0 0 1 1 1 0 0 s s s s s P P - t t t t t 0 1 0 d d d d d Rdd=vnavgh(Rtt,Rss):crnd:
sat

Field name Description


RegType Register type
MinOp Minor opcode
ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 381


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector average unsigned bytes


Average each of the eight unsigned bytes in the 64-bit source vector Rss with the corresponding
byte in Rtt. The average operation performed on each byte is the sum of the two bytes shifted
right by 1 bit. When the round option is used, 0x01 is also added to each result before shifting.
This operation does not overflow. When a summation (before right shift by 1) causes an overflow
of 8 bits, the value shifted in is the most-significant carry out.
Syntax Behavior
Rdd=vavgub(Rss,Rtt) for (i = 0; i < 8; i++) {
Rdd.b[i]=((Rss.ub[i] + Rtt.ub[i])>>1);
}
Rdd=vavgub(Rss,Rtt):rnd for (i = 0; i < 8; i++) {
Rdd.b[i]=((Rss.ub[i]+Rtt.ub[i]+1)>>1);
}

Class: XTYPE (slots 2,3)

Intrinsics

Rdd=vavgub(Rss,Rtt) Word64 Q6_P_vavgub_PP(Word64 Rss, Word64 Rtt)


Rdd=vavgub(Rss,Rtt):rnd Word64 Q6_P_vavgub_PP_rnd(Word64 Rss, Word64
Rtt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType s5 Parse t5 MinOp d5
1 1 0 1 0 0 1 1 0 1 0 s s s s s P P - t t t t t 0 0 0 d d d d d Rdd=vavgub(Rss,Rtt)
1 1 0 1 0 0 1 1 0 1 0 s s s s s P P - t t t t t 0 0 1 d d d d d Rdd=vavgub(Rss,Rtt):rnd

Field name Description


RegType Register type
MinOp Minor opcode
ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 382


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector average words


Average each of the two words in the 64-bit source vector Rss with the corresponding word in Rtt.
The average operation performed on each halfword adds the two words and shifts the result right
by 1 bit. Unsigned average uses a logical right shift (shift in 0), whereas signed average uses an
arithmetic right shift (shift in the sign bit). When the round option is used, 0x1 is also added to
each result before shifting. This operation does not overflow. When a summation (before right
shift by 1) causes an overflow of 32 bits, the value shifted in is the most-significant carry out.
The signed average and negative average words are available with optional convergent rounding.
In convergent rounding, if the two LSBs after the addition/subtraction are 11, a rounding constant
of 1 is added, otherwise a 0 is added. This result is then shifted right by one bit. Convergent
rounding accumulates less error than arithmetic rounding.
Syntax Behavior
Rdd=vavguw(Rss,Rtt)[:rnd] for (i=0;i<2;i++) {
Rdd.w[i]=(zxt32->33(Rss.uw[i])+zxt32-
>33(Rtt.uw[i])+1)>>1;
}
Rdd=vavgw(Rss,Rtt):crnd for (i=0;i<2;i++) {
Rdd.w[i]=(convround(sxt32->33(Rss.w[i])+sxt32-
>33 (Rtt.w[i]))>>1);
}
Rdd=vavgw(Rss,Rtt)[:rnd] for (i=0;i<2;i++) {
Rdd.w[i]=(sxt32->33(Rss.w[i])+sxt32-
>33(Rtt.w[i])+1)>>1;
}
Rdd=vnavgw(Rtt,Rss) for (i=0;i<2;i++) {
Rdd.w[i]=(sxt32->33(Rtt.w[i])-sxt32-
>33(Rss.w[i]))>>1;
}
Rdd=vnavgw(Rtt,Rss):crnd: for (i=0;i<2;i++) {
sat Rdd.w[i]=sat32(convround(sxt32->33(Rtt.w[i])-
sxt32->33(Rss.w[i]))>>1);
}
Rdd=vnavgw(Rtt,Rss):rnd:s for (i=0;i<2;i++) {
at Rdd.w[i]=sat32((sxt32->33(Rtt.w[i])-sxt32-
>33(Rss.w[i])+1)>>1);
}

Class: XTYPE (slots 2,3)

Notes

■ If saturation occurs during execution of this instruction (a result is clamped to either


maximum or minimum values), the OVF bit in the status register is set. OVF remains set until
explicitly cleared by a transfer to the status register.

80-N2040-53 Rev. AA 383


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Intrinsics

Rdd=vavguw(Rss,Rtt) Word64 Q6_P_vavguw_PP(Word64 Rss, Word64 Rtt)


Rdd=vavguw(Rss,Rtt):rnd Word64 Q6_P_vavguw_PP_rnd(Word64 Rss, Word64 Rtt)
Rdd=vavgw(Rss,Rtt) Word64 Q6_P_vavgw_PP(Word64 Rss, Word64 Rtt)
Rdd=vavgw(Rss,Rtt):crnd Word64 Q6_P_vavgw_PP_crnd(Word64 Rss, Word64 Rtt)
Rdd=vavgw(Rss,Rtt):rnd Word64 Q6_P_vavgw_PP_rnd(Word64 Rss, Word64 Rtt)
Rdd=vnavgw(Rtt,Rss) Word64 Q6_P_vnavgw_PP(Word64 Rtt, Word64 Rss)
Rdd=vnavgw(Rtt,Rss):crnd: Word64 Q6_P_vnavgw_PP_crnd_sat(Word64 Rtt, Word64
sat Rss)
Rdd=vnavgw(Rtt,Rss):rnd:s Word64 Q6_P_vnavgw_PP_rnd_sat(Word64 Rtt, Word64
at Rss)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType s5 Parse t5 MinOp d5
1 1 0 1 0 0 1 1 0 1 1 s s s s s P P - t t t t t 0 0 0 d d d d d Rdd=vavgw(Rss,Rtt)
1 1 0 1 0 0 1 1 0 1 1 s s s s s P P - t t t t t 0 0 1 d d d d d Rdd=vavgw(Rss,Rtt):rnd
1 1 0 1 0 0 1 1 0 1 1 s s s s s P P - t t t t t 0 1 0 d d d d d Rdd=vavgw(Rss,Rtt):crnd
1 1 0 1 0 0 1 1 0 1 1 s s s s s P P - t t t t t 0 1 1 d d d d d Rdd=vavguw(Rss,Rtt)
1 1 0 1 0 0 1 1 0 1 1 s s s s s P P - t t t t t 1 0 0 d d d d d Rdd=vavguw(Rss,Rtt):rnd
1 1 0 1 0 0 1 1 1 0 0 s s s s s P P - t t t t t 0 1 1 d d d d d Rdd=vnavgw(Rtt,Rss)

1 1 0 1 0 0 1 1 1 0 0 s s s s s P P - t t t t t 1 0 - d d d d d Rdd=vnavgw(Rtt,Rss):rnd:s
at

1 1 0 1 0 0 1 1 1 0 0 s s s s s P P - t t t t t 1 1 - d d d d d Rdd=vnavgw(Rtt,Rss):crnd:
sat

Field name Description


RegType Register type
MinOp Minor opcode
ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 384


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector clip to unsigned


Clip input to an unsigned integer.
Syntax Behavior
Rdd=vclip(Rss,#u5) tmp=MIN((1<<#u)-1,MAX(Rss.w[0],-(1<<#u)));
Rdd.w[0]=tmp;
tmp=MIN((1<<#u)-1,MAX(Rss.w[1],-(1<<#u)));
Rdd.w[1]=tmp;

Class: XTYPE (slots 2,3)

Notes

■ This instruction can only execute on a core with the Hexagon audio extensions

Intrinsics

Rdd=vclip(Rss,#u5) Word64 Q6_P_vclip_PI(Word64 Rss, Word32


Iu5)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse MinOp d5
1 0 0 0 1 0 0 0 1 1 0 s s s s s P P 0 i i i i i 1 1 0 d d d d d Rdd=vclip(Rss,#u5)

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
MajOp Major opcode
MinOp Minor opcode
RegType Register type

80-N2040-53 Rev. AA 385


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector conditional negate


Based on bits in Rt, conditionally negate halves in Rss.
Syntax Behavior
Rdd=vcnegh(Rss,Rt) for (i = 0; i < 4; i++) {
if (Rt.i) {
Rdd.h[i]=sat16(-Rss.h[i]);
} else {
Rdd.h[i]=Rss.h[i];
}
}
Rxx+=vrcnegh(Rss,Rt) for (i = 0; i < 4; i++) {
if (Rt.i) {
Rxx += -Rss.h[i];
} else {
Rxx += Rss.h[i];
}
}

Class: XTYPE (slots 2,3)

Notes

■ If saturation occurs during execution of this instruction (a result is clamped to either


maximum or minimum values), the OVF bit in the status register is set. OVF remains set until
explicitly cleared by a transfer to the status register.

Intrinsics

Rdd=vcnegh(Rss,Rt) Word64 Q6_P_vcnegh_PR(Word64 Rss, Word32 Rt)


Rxx+=vrcnegh(Rss,Rt Word64 Q6_P_vrcneghacc_PR(Word64 Rxx, Word64 Rss,
) Word32 Rt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType Maj s5 Parse t5 Min d5
1 1 0 0 0 0 1 1 1 1 - s s s s s P P - t t t t t 0 1 - d d d d d Rdd=vcnegh(Rss,Rt)
ICLASS RegType Maj s5 Parse t5 Min x5
1 1 0 0 1 0 1 1 0 0 1 s s s s s P P 1 t t t t t 1 1 1 x x x x x Rxx+=vrcnegh(Rss,Rt)

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
x5 Field to encode register x

80-N2040-53 Rev. AA 386


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Field name Description


Maj Major opcode
Min Minor opcode
RegType Register type

80-N2040-53 Rev. AA 387


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector maximum bytes


Compare each of the eight unsigned bytes in the 64-bit source vector Rss to the corresponding
byte in Rtt. For each comparison, select the maximum of the two bytes and place that byte in the
corresponding location in Rdd.
Syntax Behavior
Rdd=vmaxb(Rtt,Rss) for (i = 0; i < 8; i++) {
Rdd.b[i]=max(Rtt.b[i],Rss.b[i]);
}
Rdd=vmaxub(Rtt,Rss) for (i = 0; i < 8; i++) {
Rdd.b[i]=max(Rtt.ub[i],Rss.ub[i]);
}

Class: XTYPE (slots 2,3)

Intrinsics

Rdd=vmaxb(Rtt,Rss) Word64 Q6_P_vmaxb_PP(Word64 Rtt, Word64


Rss)
Rdd=vmaxub(Rtt,Rss) Word64 Q6_P_vmaxub_PP(Word64 Rtt, Word64
Rss)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType s5 Parse t5 MinOp d5
1 1 0 1 0 0 1 1 1 1 0 s s s s s P P - t t t t t 0 0 0 d d d d d Rdd=vmaxub(Rtt,Rss)
1 1 0 1 0 0 1 1 1 1 0 s s s s s P P - t t t t t 1 1 0 d d d d d Rdd=vmaxb(Rtt,Rss)

Field name Description


RegType Register type
MinOp Minor opcode
ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 388


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector maximum halfwords


Compare each of the four halfwords in the 64-bit source vector Rss to the corresponding
halfword in Rtt. For each comparison, select the maximum of the two halfwords and place that
halfword in the corresponding location in Rdd. Comparisons are available in both signed and
unsigned form.
Syntax Behavior
Rdd=vmaxh(Rtt,Rss) for (i = 0; i < 4; i++) {
Rdd.h[i]=max(Rtt.h[i],Rss.h[i]);
}
Rdd=vmaxuh(Rtt,Rss) for (i = 0; i < 4; i++) {
Rdd.h[i]=max(Rtt.uh[i],Rss.uh[i]);
}

Class: XTYPE (slots 2,3)

Intrinsics

Rdd=vmaxh(Rtt,Rss) Word64 Q6_P_vmaxh_PP(Word64 Rtt, Word64


Rss)
Rdd=vmaxuh(Rtt,Rss) Word64 Q6_P_vmaxuh_PP(Word64 Rtt, Word64
Rss)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType s5 Parse t5 MinOp d5
1 1 0 1 0 0 1 1 1 1 0 s s s s s P P - t t t t t 0 0 1 d d d d d Rdd=vmaxh(Rtt,Rss)
1 1 0 1 0 0 1 1 1 1 0 s s s s s P P - t t t t t 0 1 0 d d d d d Rdd=vmaxuh(Rtt,Rss)

Field name Description


RegType Register type
MinOp Minor opcode
ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 389


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector reduce maximum halfwords


Register Rxx contains a maximum value in the low word and the address of that maximum value
in the high word. Register Rss contains a vector of four halfword values, and register Ru contains
the address of this data. The instruction finds the maximum halfword between the previous
maximum in Rxx[0] and the four values in Rss. The address of the new maximum is stored in
Rxx[1].
Syntax Behavior
Rxx=vrmaxh(Rss,Ru) max = Rxx.h[0];
addr = Rxx.w[1];
for (i = 0; i < 4; i++) {
if (max < Rss.h[i]) {
max = Rss.h[i];
addr = Ru | i<<1;
}
}
Rxx.w[0]=max;
Rxx.w[1]=addr;
Rxx=vrmaxuh(Rss,Ru) max = Rxx.uh[0];
addr = Rxx.w[1];
for (i = 0; i < 4; i++) {
if (max < Rss.uh[i]) {
max = Rss.uh[i];
addr = Ru | i<<1;
}
}
Rxx.w[0]=max;
Rxx.w[1]=addr;

Class: XTYPE (slots 2,3)

Intrinsics

Rxx=vrmaxh(Rss,Ru) Word64 Q6_P_vrmaxh_PR(Word64 Rxx, Word64 Rss,


Word32 Ru)
Rxx=vrmaxuh(Rss,Ru) Word64 Q6_P_vrmaxuh_PR(Word64 Rxx, Word64 Rss,
Word32 Ru)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType Maj s5 Parse x5 Min u5
1 1 0 0 1 0 1 1 0 0 1 s s s s s P P 0 x x x x x 0 0 1 u u u u u Rxx=vrmaxh(Rss,Ru)
1 1 0 0 1 0 1 1 0 0 1 s s s s s P P 1 x x x x x 0 0 1 u u u u u Rxx=vrmaxuh(Rss,Ru)

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
s5 Field to encode register s
u5 Field to encode register u
x5 Field to encode register x

80-N2040-53 Rev. AA 390


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Field name Description


Maj Major opcode
Min Minor opcode
RegType Register type

80-N2040-53 Rev. AA 391


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector reduce maximum words


Find the maximum word between the previous maximum in Rxx[0] and the two values in Rss. The
address of the new maximum is stored in Rxx[1].
Register Rxx contains a maximum value in the low word and the address of that maximum value
in the high word. Register Rss contains a vector of two word values, and register Ru contains the
address of this data.
Syntax Behavior
Rxx=vrmaxuw(Rss,Ru) max = Rxx.uw[0];
addr = Rxx.w[1];
for (i = 0; i < 2; i++) {
if (max < Rss.uw[i]) {
max = Rss.uw[i];
addr = Ru | i<<2;
}
}
Rxx.w[0]=max;
Rxx.w[1]=addr;
Rxx=vrmaxw(Rss,Ru) max = Rxx.w[0];
addr = Rxx.w[1];
for (i = 0; i < 2; i++) {
if (max < Rss.w[i]) {
max = Rss.w[i];
addr = Ru | i<<2;
}
}
Rxx.w[0]=max;
Rxx.w[1]=addr;

Class: XTYPE (slots 2,3)

Intrinsics

Rxx=vrmaxuw(Rss,Ru) Word64 Q6_P_vrmaxuw_PR(Word64 Rxx, Word64 Rss,


Word32 Ru)
Rxx=vrmaxw(Rss,Ru) Word64 Q6_P_vrmaxw_PR(Word64 Rxx, Word64 Rss,
Word32 Ru)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType Maj s5 Parse x5 Min u5
1 1 0 0 1 0 1 1 0 0 1 s s s s s P P 0 x x x x x 0 1 0 u u u u u Rxx=vrmaxw(Rss,Ru)
1 1 0 0 1 0 1 1 0 0 1 s s s s s P P 1 x x x x x 0 1 0 u u u u u Rxx=vrmaxuw(Rss,Ru)

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
s5 Field to encode register s
u5 Field to encode register u

80-N2040-53 Rev. AA 392


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Field name Description


x5 Field to encode register x
Maj Major opcode
Min Minor opcode
RegType Register type

80-N2040-53 Rev. AA 393


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector maximum words


Compare each of the two words in the 64-bit source vector Rss to the corresponding word in Rtt.
For each comparison, select the maximum of the two words and place that word in the
corresponding location in Rdd.
Comparisons are available in both signed and unsigned form.
Syntax Behavior
Rdd=vmaxuw(Rtt,Rss) for (i = 0; i < 2; i++) {
Rdd.w[i]=max(Rtt.uw[i],Rss.uw[i]);
}
Rdd=vmaxw(Rtt,Rss) for (i = 0; i < 2; i++) {
Rdd.w[i]=max(Rtt.w[i],Rss.w[i]);
}

Class: XTYPE (slots 2,3)

Intrinsics

Rdd=vmaxuw(Rtt,Rss) Word64 Q6_P_vmaxuw_PP(Word64 Rtt, Word64


Rss)
Rdd=vmaxw(Rtt,Rss) Word64 Q6_P_vmaxw_PP(Word64 Rtt, Word64
Rss)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType s5 Parse t5 MinOp d5
1 1 0 1 0 0 1 1 1 0 1 s s s s s P P - t t t t t 1 0 1 d d d d d Rdd=vmaxuw(Rtt,Rss)
1 1 0 1 0 0 1 1 1 1 0 s s s s s P P - t t t t t 0 1 1 d d d d d Rdd=vmaxw(Rtt,Rss)

Field name Description


RegType Register type
MinOp Minor opcode
ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 394


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector minimum bytes


Compare each of the eight unsigned bytes in the 64-bit source vector Rss to the corresponding
byte in Rtt. For each comparison, select the minimum of the two bytes and place that byte in the
corresponding location in Rdd.
Syntax Behavior
Rdd,Pe=vminub(Rtt,Rss) for (i = 0; i < 8; i++) {
Pe.i = (Rtt.ub[i] > Rss.ub[i]);
Rdd.b[i]=min(Rtt.ub[i],Rss.ub[i]);
}
Rdd=vminb(Rtt,Rss) for (i = 0; i < 8; i++) {
Rdd.b[i]=min(Rtt.b[i],Rss.b[i]);
}
Rdd=vminub(Rtt,Rss) for (i = 0; i < 8; i++) {
Rdd.b[i]=min(Rtt.ub[i],Rss.ub[i]);
}

Class: XTYPE (slots 2,3)

Notes

■ The predicate generated by this instruction cannot be used as a .new predicate, nor can it be
automatically ANDed with another predicate.

Intrinsics

Rdd=vminb(Rtt,Rss) Word64 Q6_P_vminb_PP(Word64 Rtt, Word64


Rss)
Rdd=vminub(Rtt,Rss) Word64 Q6_P_vminub_PP(Word64 Rtt, Word64
Rss)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType s5 Parse t5 MinOp d5
1 1 0 1 0 0 1 1 1 0 1 s s s s s P P - t t t t t 0 0 0 d d d d d Rdd=vminub(Rtt,Rss)
1 1 0 1 0 0 1 1 1 1 0 s s s s s P P - t t t t t 1 1 1 d d d d d Rdd=vminb(Rtt,Rss)
ICLASS RegType MajOp s5 Parse t5 e2 d5
1 1 1 0 1 0 1 0 1 1 1 s s s s s P P 0 t t t t t 0 e e d d d d d Rdd,Pe=vminub(Rtt,Rss)

Field name Description


RegType Register type
ICLASS Instruction class
MajOp Major opcode
MinOp Minor opcode
Parse Packet/loop parse bits
d5 Field to encode register d

80-N2040-53 Rev. AA 395


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Field name Description


e2 Field to encode register e
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 396


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector minimum halfwords


Compare each of the four halfwords in the 64-bit source vector Rss to the corresponding
halfword in Rtt. For each comparison, select the minimum of the two halfwords and place that
halfword in the corresponding location in Rdd.
Comparisons are available in both signed and unsigned form.
Syntax Behavior
Rdd=vminh(Rtt,Rss) for (i = 0; i < 4; i++) {
Rdd.h[i]=min(Rtt.h[i],Rss.h[i]);
}
Rdd=vminuh(Rtt,Rss) for (i = 0; i < 4; i++) {
Rdd.h[i]=min(Rtt.uh[i],Rss.uh[i]);
}

Class: XTYPE (slots 2,3)

Intrinsics

Rdd=vminh(Rtt,Rss) Word64 Q6_P_vminh_PP(Word64 Rtt, Word64


Rss)
Rdd=vminuh(Rtt,Rss) Word64 Q6_P_vminuh_PP(Word64 Rtt, Word64
Rss)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType s5 Parse t5 MinOp d5
1 1 0 1 0 0 1 1 1 0 1 s s s s s P P - t t t t t 0 0 1 d d d d d Rdd=vminh(Rtt,Rss)
1 1 0 1 0 0 1 1 1 0 1 s s s s s P P - t t t t t 0 1 0 d d d d d Rdd=vminuh(Rtt,Rss)

Field name Description


RegType Register type
MinOp Minor opcode
ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 397


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector reduce minimum halfwords


Find the minimum halfword between the previous minimum in Rxx[0] and the four values in Rss.
The address of the new minimum is stored in Rxx[1].
Register Rxx contains a minimum value in the low word and the address of that minimum value in
the high word. Register Rss contains a vector of four halfword values, and register Ru contains the
address of this data.
Syntax Behavior
Rxx=vrminh(Rss,Ru) min = Rxx.h[0];
addr = Rxx.w[1];
for (i = 0; i < 4; i++) {
if (min > Rss.h[i]) {
min = Rss.h[i];
addr = Ru | i<<1;
}
}
Rxx.w[0]=min;
Rxx.w[1]=addr;
Rxx=vrminuh(Rss,Ru) min = Rxx.uh[0];
addr = Rxx.w[1];
for (i = 0; i < 4; i++) {
if (min > Rss.uh[i]) {
min = Rss.uh[i];
addr = Ru | i<<1;
}
}
Rxx.w[0]=min;
Rxx.w[1]=addr;

Class: XTYPE (slots 2,3)

Intrinsics

Rxx=vrminh(Rss,Ru) Word64 Q6_P_vrminh_PR(Word64 Rxx, Word64 Rss,


Word32 Ru)
Rxx=vrminuh(Rss,Ru) Word64 Q6_P_vrminuh_PR(Word64 Rxx, Word64 Rss,
Word32 Ru)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType Maj s5 Parse x5 Min u5
1 1 0 0 1 0 1 1 0 0 1 s s s s s P P 0 x x x x x 1 0 1 u u u u u Rxx=vrminh(Rss,Ru)
1 1 0 0 1 0 1 1 0 0 1 s s s s s P P 1 x x x x x 1 0 1 u u u u u Rxx=vrminuh(Rss,Ru)

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
s5 Field to encode register s
u5 Field to encode register u

80-N2040-53 Rev. AA 398


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Field name Description


x5 Field to encode register x
Maj Major opcode
Min Minor opcode
RegType Register type

80-N2040-53 Rev. AA 399


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector reduce minimum words


Find the minimum word between the previous minimum in Rxx[0] and the two values in Rss. The
address of the new minimum is stored in Rxx[1].
Register Rxx contains a minimum value in the low word and the address of that minimum value in
the high word. Register Rss contains a vector of two word values, and register Ru contains the
address of this data.
Syntax Behavior
Rxx=vrminuw(Rss,Ru) min = Rxx.uw[0];
addr = Rxx.w[1];
for (i = 0; i < 2; i++) {
if (min > Rss.uw[i]) {
min = Rss.uw[i];
addr = Ru | i<<2;
}
}
Rxx.w[0]=min;
Rxx.w[1]=addr;
Rxx=vrminw(Rss,Ru) min = Rxx.w[0];
addr = Rxx.w[1];
for (i = 0; i < 2; i++) {
if (min > Rss.w[i]) {
min = Rss.w[i];
addr = Ru | i<<2;
}
}
Rxx.w[0]=min;
Rxx.w[1]=addr;

Class: XTYPE (slots 2,3)

Intrinsics

Rxx=vrminuw(Rss,Ru) Word64 Q6_P_vrminuw_PR(Word64 Rxx, Word64 Rss,


Word32 Ru)
Rxx=vrminw(Rss,Ru) Word64 Q6_P_vrminw_PR(Word64 Rxx, Word64 Rss,
Word32 Ru)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType Maj s5 Parse x5 Min u5
1 1 0 0 1 0 1 1 0 0 1 s s s s s P P 0 x x x x x 1 1 0 u u u u u Rxx=vrminw(Rss,Ru)
1 1 0 0 1 0 1 1 0 0 1 s s s s s P P 1 x x x x x 1 1 0 u u u u u Rxx=vrminuw(Rss,Ru)

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
s5 Field to encode register s
u5 Field to encode register u

80-N2040-53 Rev. AA 400


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Field name Description


x5 Field to encode register x
Maj Major opcode
Min Minor opcode
RegType Register type

80-N2040-53 Rev. AA 401


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector minimum words


Compare each of the two words in the 64-bit source vector Rss to the corresponding word in Rtt.
For each comparison, select the minimum of the two words and place that word in the
corresponding location in Rdd.
Comparisons are available in both signed and unsigned form.
Syntax Behavior
Rdd=vminuw(Rtt,Rss) for (i = 0; i < 2; i++) {
Rdd.w[i]=min(Rtt.uw[i],Rss.uw[i]);
}
Rdd=vminw(Rtt,Rss) for (i = 0; i < 2; i++) {
Rdd.w[i]=min(Rtt.w[i],Rss.w[i]);
}

Class: XTYPE (slots 2,3)

Intrinsics

Rdd=vminuw(Rtt,Rss) Word64 Q6_P_vminuw_PP(Word64 Rtt, Word64


Rss)
Rdd=vminw(Rtt,Rss) Word64 Q6_P_vminw_PP(Word64 Rtt, Word64
Rss)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType s5 Parse t5 MinOp d5
1 1 0 1 0 0 1 1 1 0 1 s s s s s P P - t t t t t 0 1 1 d d d d d Rdd=vminw(Rtt,Rss)
1 1 0 1 0 0 1 1 1 0 1 s s s s s P P - t t t t t 1 0 0 d d d d d Rdd=vminuw(Rtt,Rss)

Field name Description


RegType Register type
MinOp Minor opcode
ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 402


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector sum of absolute differences unsigned bytes


For each byte in the source vector Rss, subtract the corresponding byte in source vector Rtt. Take
the absolute value of the intermediate results, and the upper four together and add the lower
four together. Optionally, add the destination upper and lower words to these results.
This instruction is useful in determining distance between two vectors, in applications such as
motion estimation.
Rss

Rtt

sad sad sad sad sad sad sad sad

32-bit Add 32-bit Add

Rdd

Syntax Behavior
Rdd=vrsadub(Rss,Rtt) Rdd = 0;
for (i = 0; i < 4; i++) {
Rdd.w[0]=(Rdd.w[0] + ABS((Rss.ub[i] -
Rtt.ub[i])));
}
for (i = 4; i < 8; i++) {
Rdd.w[1]=(Rdd.w[1] + ABS((Rss.ub[i] -
Rtt.ub[i])));
}
Rxx+=vrsadub(Rss,Rtt for (i = 0; i < 4; i++) {
) Rxx.w[0]=(Rxx.w[0] + ABS((Rss.ub[i] -
Rtt.ub[i])));
}
for (i = 4; i < 8; i++) {
Rxx.w[1]=(Rxx.w[1] + ABS((Rss.ub[i] -
Rtt.ub[i])));
}

80-N2040-53 Rev. AA 403


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Class: XTYPE (slots 2,3)

Intrinsics

Rdd=vrsadub(Rss,Rtt) Word64 Q6_P_vrsadub_PP(Word64 Rss, Word64 Rtt)


Rxx+=vrsadub(Rss,Rtt) Word64 Q6_P_vrsadubacc_PP(Word64 Rxx, Word64 Rss,
Word64 Rtt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse t5 MinOp d5
1 1 1 0 1 0 0 0 0 1 0 s s s s s P P 0 t t t t t 0 1 0 d d d d d Rdd=vrsadub(Rss,Rtt)
ICLASS RegType MajOp s5 Parse t5 MinOp x5
1 1 1 0 1 0 1 0 0 1 0 s s s s s P P 0 t t t t t 0 1 0 x x x x x Rxx+=vrsadub(Rss,Rtt)

Field name Description


ICLASS Instruction class
MajOp Major opcode
MinOp Minor opcode
RegType Register type
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
x5 Field to encode register x

80-N2040-53 Rev. AA 404


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector subtract halfwords


Subtract each of the four halfwords in 64-bit vector Rss from the corresponding halfword in
vector Rtt.
Optionally, saturate each 16-bit addition to either a signed or unsigned 16-bit value. Applying
saturation to the vsubh instruction clamps the result to the signed range 0x8000 to 0x7fff,
whereas applying saturation to the vsubuh instruction ensures that the unsigned result falls
within the range 0 to 0xffff.
When saturation is not needed, use the vsubh instruction.

Syntax Behavior
Rdd=vsubh(Rtt,Rss)[:sat] for (i=0;i<4;i++) {
Rdd.h[i]=[sat16](Rtt.h[i]-Rss.h[i]);
}
Rdd=vsubuh(Rtt,Rss):sat for (i=0;i<4;i++) {
Rdd.h[i]=usat16(Rtt.uh[i]-Rss.uh[i]);
}

Class: XTYPE (slots 2,3)

Notes

■ If saturation occurs during execution of this instruction (a result is clamped to either


maximum or minimum values), the OVF bit in the status register is set. OVF remains set until
explicitly cleared by a transfer to the status register.

Intrinsics

Rdd=vsubh(Rtt,Rss) Word64 Q6_P_vsubh_PP(Word64 Rtt, Word64 Rss)


Rdd=vsubh(Rtt,Rss):sat Word64 Q6_P_vsubh_PP_sat(Word64 Rtt, Word64 Rss)
Rdd=vsubuh(Rtt,Rss):sat Word64 Q6_P_vsubuh_PP_sat(Word64 Rtt, Word64 Rss)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType s5 Parse t5 MinOp d5
1 1 0 1 0 0 1 1 0 0 1 s s s s s P P - t t t t t 0 1 0 d d d d d Rdd=vsubh(Rtt,Rss)
1 1 0 1 0 0 1 1 0 0 1 s s s s s P P - t t t t t 0 1 1 d d d d d Rdd=vsubh(Rtt,Rss):sat
1 1 0 1 0 0 1 1 0 0 1 s s s s s P P - t t t t t 1 0 0 d d d d d Rdd=vsubuh(Rtt,Rss):sat

Field name Description


RegType Register type
MinOp Minor opcode
ICLASS Instruction class
Parse Packet/loop parse bits

80-N2040-53 Rev. AA 405


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Field name Description


d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 406


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector subtract bytes


Subtract each of the eight bytes in 64-bit vector Rss from the corresponding byte in vector Rtt.
Optionally, saturate each 8-bit subtraction to an unsigned value between 0 and 255. The eight
results are stored in destination register Rdd.
Syntax Behavior
Rdd=vsubb(Rss,Rtt) Assembler mapped to: "Rdd=vsubub(Rss,Rtt)"
Rdd=vsubub(Rtt,Rss)[:sat] for (i = 0; i < 8; i++) {
Rdd.b[i]=[usat8](Rtt.ub[i]-Rss.ub[i]);
}

Class: XTYPE (slots 2,3)

Notes

■ If saturation occurs during execution of this instruction (a result is clamped to either


maximum or minimum values), the OVF bit in the status register is set. OVF remains set until
explicitly cleared by a transfer to the status register.

Intrinsics

Rdd=vsubb(Rss,Rtt) Word64 Q6_P_vsubb_PP(Word64 Rss, Word64 Rtt)


Rdd=vsubub(Rtt,Rss) Word64 Q6_P_vsubub_PP(Word64 Rtt, Word64 Rss)
Rdd=vsubub(Rtt,Rss):sat Word64 Q6_P_vsubub_PP_sat(Word64 Rtt, Word64
Rss)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType s5 Parse t5 MinOp d5
1 1 0 1 0 0 1 1 0 0 1 s s s s s P P - t t t t t 0 0 0 d d d d d Rdd=vsubub(Rtt,Rss)
1 1 0 1 0 0 1 1 0 0 1 s s s s s P P - t t t t t 0 0 1 d d d d d Rdd=vsubub(Rtt,Rss):sat

Field name Description


RegType Register type
MinOp Minor opcode
ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 407


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector subtract words


Subtract each of the two words in 64-bit vector Rss from the corresponding word in vector Rtt.
Optionally, saturate each 32-bit subtraction to a signed value between 0x8000_0000 and
0x7fff_ffff. The two word results are stored in destination register Rdd.
Syntax Behavior
Rdd=vsubw(Rtt,Rss)[:sat] for (i=0;i<2;i++) {
Rdd.w[i]=[sat32](Rtt.w[i]-Rss.w[i]);
}

Class: XTYPE (slots 2,3)

Notes

■ If saturation occurs during execution of this instruction (a result is clamped to either


maximum or minimum values), the OVF bit in the status register is set. OVF remains set until
explicitly cleared by a transfer to the status register.

Intrinsics

Rdd=vsubw(Rtt,Rss) Word64 Q6_P_vsubw_PP(Word64 Rtt, Word64


Rss)
Rdd=vsubw(Rtt,Rss):sat Word64 Q6_P_vsubw_PP_sat(Word64 Rtt, Word64
Rss)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType s5 Parse t5 MinOp d5
1 1 0 1 0 0 1 1 0 0 1 s s s s s P P - t t t t t 1 0 1 d d d d d Rdd=vsubw(Rtt,Rss)
1 1 0 1 0 0 1 1 0 0 1 s s s s s P P - t t t t t 1 1 0 d d d d d Rdd=vsubw(Rtt,Rss):sat

Field name Description


RegType Register type
MinOp Minor opcode
ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 408


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

11.10.2 XTYPE BIT


The XTYPE BIT instruction subclass includes instructions for bit manipulation.

Count leading
Count leading zeros (cl0) counts the number of consecutive zeros starting with the most
significant bit.
Count leading ones (cl1) counts the number of consecutive ones starting with the most significant
bit.
Count leading bits (clb) counts both leading ones and leading zeros and then selects the
maximum.
The normamt instruction returns the number of leading bits minus one.
For a two's-complement number, the number of leading zeros is zero for negative numbers. The
number of leading ones is zero for positive numbers.
The number of leading bits can be used to judge the magnitude of the value.
Syntax Behavior
Rd=add(clb(Rs),#s6 Rd =
) (max(count_leading_ones(Rs),count_leading_ones(~Rs)))+#s;
Rd=add(clb(Rss),#s Rd =
6) (max(count_leading_ones(Rss),count_leading_ones(~Rss)))+#s;
Rd=cl0(Rs) Rd = count_leading_ones(~Rs);
Rd=cl0(Rss) Rd = count_leading_ones(~Rss);
Rd=cl1(Rs) Rd = count_leading_ones(Rs);
Rd=cl1(Rss) Rd = count_leading_ones(Rss);
Rd=clb(Rs) Rd = max(count_leading_ones(Rs),count_leading_ones(~Rs));
Rd=clb(Rss) Rd = max(count_leading_ones(Rss),count_leading_ones(~Rss));
Rd=normamt(Rs) if (Rs == 0) {
Rd = 0;
} else {
Rd =
(max(count_leading_ones(Rs),count_leading_ones(~Rs)))-1;
}
Rd=normamt(Rss) if (Rss == 0) {
Rd = 0;
} else {
Rd =
(max(count_leading_ones(Rss),count_leading_ones(~Rss)))-1;
}

80-N2040-53 Rev. AA 409


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Class: XTYPE (slots 2,3)

Intrinsics

Rd=add(clb(Rs),#s6) Word32 Q6_R_add_clb_RI(Word32 Rs, Word32


Is6)
Rd=add(clb(Rss),#s6) Word32 Q6_R_add_clb_PI(Word64 Rss, Word32
Is6)
Rd=cl0(Rs) Word32 Q6_R_cl0_R(Word32 Rs)
Rd=cl0(Rss) Word32 Q6_R_cl0_P(Word64 Rss)
Rd=cl1(Rs) Word32 Q6_R_cl1_R(Word32 Rs)
Rd=cl1(Rss) Word32 Q6_R_cl1_P(Word64 Rss)
Rd=clb(Rs) Word32 Q6_R_clb_R(Word32 Rs)
Rd=clb(Rss) Word32 Q6_R_clb_P(Word64 Rss)
Rd=normamt(Rs) Word32 Q6_R_normamt_R(Word32 Rs)
Rd=normamt(Rss) Word32 Q6_R_normamt_P(Word64 Rss)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse MinOp d5
1 0 0 0 1 0 0 0 0 1 0 s s s s s P P - - - - - - 0 0 0 d d d d d Rd=clb(Rss)
1 0 0 0 1 0 0 0 0 1 0 s s s s s P P - - - - - - 0 1 0 d d d d d Rd=cl0(Rss)
1 0 0 0 1 0 0 0 0 1 0 s s s s s P P - - - - - - 1 0 0 d d d d d Rd=cl1(Rss)
1 0 0 0 1 0 0 0 0 1 1 s s s s s P P - - - - - - 0 0 0 d d d d d Rd=normamt(Rss)
1 0 0 0 1 0 0 0 0 1 1 s s s s s P P i i i i i i 0 1 0 d d d d d Rd=add(clb(Rss),#s6)
1 0 0 0 1 1 0 0 0 0 1 s s s s s P P i i i i i i 0 0 0 d d d d d Rd=add(clb(Rs),#s6)
1 0 0 0 1 1 0 0 0 0 0 s s s s s P P - - - - - - 1 0 0 d d d d d Rd=clb(Rs)
1 0 0 0 1 1 0 0 0 0 0 s s s s s P P - - - - - - 1 0 1 d d d d d Rd=cl0(Rs)
1 0 0 0 1 1 0 0 0 0 0 s s s s s P P - - - - - - 1 1 0 d d d d d Rd=cl1(Rs)
1 0 0 0 1 1 0 0 0 0 0 s s s s s P P - - - - - - 1 1 1 d d d d d Rd=normamt(Rs)

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
MajOp Major opcode
MinOp Minor opcode
RegType Register type

80-N2040-53 Rev. AA 410


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Count population
The population count (popcount) instruction counts the number set bits in Rss.
Syntax Behavior
Rd=popcount(Rss) Rd = count_ones(Rss);

Class: XTYPE (slots 2,3)

Intrinsics

Rd=popcount(Rss) Word32 Q6_R_popcount_P(Word64 Rss)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse MinOp d5
1 0 0 0 1 0 0 0 0 1 1 s s s s s P P - - - - - - 0 1 1 d d d d d Rd=popcount(Rss)

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
MajOp Major opcode
MinOp Minor opcode
RegType Register type

80-N2040-53 Rev. AA 411


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Count trailing
Count trailing zeros (ct0) counts the number of consecutive zeros starting with the least
significant bit.
Count trailing ones (ct1) counts the number of consecutive ones starting with the least significant
bit.

Syntax Behavior
Rd=ct0(Rs) Rd = count_leading_ones(~reverse_bits(Rs));
Rd=ct0(Rss) Rd =
count_leading_ones(~reverse_bits(Rss));
Rd=ct1(Rs) Rd = count_leading_ones(reverse_bits(Rs));
Rd=ct1(Rss) Rd = count_leading_ones(reverse_bits(Rss));

Class: XTYPE (slots 2,3)

Intrinsics

Rd=ct0(Rs) Word32 Q6_R_ct0_R(Word32 Rs)


Rd=ct0(Rss) Word32 Q6_R_ct0_P(Word64 Rss)
Rd=ct1(Rs) Word32 Q6_R_ct1_R(Word32 Rs)
Rd=ct1(Rss) Word32 Q6_R_ct1_P(Word64 Rss)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse MinOp d5
1 0 0 0 1 0 0 0 1 1 1 s s s s s P P - - - - - - 0 1 0 d d d d d Rd=ct0(Rss)
1 0 0 0 1 0 0 0 1 1 1 s s s s s P P - - - - - - 1 0 0 d d d d d Rd=ct1(Rss)
1 0 0 0 1 1 0 0 0 1 0 s s s s s P P - - - - - - 1 0 0 d d d d d Rd=ct0(Rs)
1 0 0 0 1 1 0 0 0 1 0 s s s s s P P - - - - - - 1 0 1 d d d d d Rd=ct1(Rs)

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
MajOp Major opcode
MinOp Minor opcode
RegType Register type

80-N2040-53 Rev. AA 412


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Extract bit field


Extract a bit field from the source register (or register pair) and deposit into the least significant
bits of the destination register (or register pair). The other, more significant bits in the destination
are either cleared or sign-extended, depending on the instruction.
The width of the extracted field is obtained from the first immediate or from the most-significant
word of Rtt. The field offset is obtained from either the second immediate or from the least-
significant word of Rtt.
For register-based extract, where Rtt supplies the offset and width, the offset value is treated as a
signed 7-bit number. If this value is negative, the source register Rss is shifted left (the reverse
direction). Width number of bits are then taken from the least-significant portion of this result.
When the shift amount and/or offset captures data beyond the most significant end of the input,
these bits are taken as zero.

Width Offset

Rs

Rd

Zero Extension

Syntax Behavior
Rd=extract(Rs,#u5,#U5) width=#u;
offset=#U;
Rd = sxtwidth->32((Rs >> offset));

Rd=extract(Rs,Rtt) width=zxt6->32((Rtt.w[1]));
offset=sxt7->32((Rtt.w[0]));
Rd = sxtwidth->64((offset>0)?(zxt32->64(zxt32->64(Rs))>>>
offset): (zxt32->64(zxt32->64(Rs))<<offset));

Rd=extractu(Rs,#u5,#U5 width=#u;
) offset=#U;
Rd = zxtwidth->32((Rs >> offset));

Rd=extractu(Rs,Rtt) width=zxt6->32((Rtt.w[1]));
offset=sxt7->32((Rtt.w[0]));
Rd = zxtwidth->64((offset>0)?(zxt32->64(zxt32->64(Rs))>>>
offset): (zxt32->64(zxt32->64(Rs))<<offset));

Rdd=extract(Rss,#u6,#U width=#u;
6) offset=#U;
Rdd = sxtwidth->64((Rss >> offset));

80-N2040-53 Rev. AA 413


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Syntax Behavior
Rdd=extract(Rss,Rtt) width=zxt6->32((Rtt.w[1]));
offset=sxt7->32((Rtt.w[0]));
Rdd = sxtwidth-
>64((offset>0)?(Rss>>>offset):(Rss<<offset));

Rdd=extractu(Rss,#u6,# width=#u;
U6) offset=#U;
Rdd = zxtwidth->64((Rss >> offset));

Rdd=extractu(Rss,Rtt) width=zxt6->32((Rtt.w[1]));
offset=sxt7->32((Rtt.w[0]));
Rdd = zxtwidth-
>64((offset>0)?(Rss>>>offset):(Rss<<offset));

Class: XTYPE (slots 2,3)

Intrinsics

Rd=extract(Rs,#u5,#U5) Word32 Q6_R_extract_RII(Word32 Rs, Word32 Iu5, Word32


IU5)
Rd=extract(Rs,Rtt) Word32 Q6_R_extract_RP(Word32 Rs, Word64 Rtt)
Rd=extractu(Rs,#u5,#U5) Word32 Q6_R_extractu_RII(Word32 Rs, Word32 Iu5, Word32
IU5)
Rd=extractu(Rs,Rtt) Word32 Q6_R_extractu_RP(Word32 Rs, Word64 Rtt)
Rdd=extract(Rss,#u6,#U6) Word64 Q6_P_extract_PII(Word64 Rss, Word32 Iu6, Word32
IU6)
Rdd=extract(Rss,Rtt) Word64 Q6_P_extract_PP(Word64 Rss, Word64 Rtt)
Rdd=extractu(Rss,#u6,#U6 Word64 Q6_P_extractu_PII(Word64 Rss, Word32 Iu6,
) Word32 IU6)
Rdd=extractu(Rss,Rtt) Word64 Q6_P_extractu_PP(Word64 Rss, Word64 Rtt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse MinOp d5
Rdd=extractu(Rss,#u6,#U6
1 0 0 0 0 0 0 1 I I I s s s s s P P i i i i i i I I I d d d d d )

1 0 0 0 1 0 1 0 I I I s s s s s P P i i i i i i I I I d d d d d Rdd=extract(Rss,#u6,#U6)
1 0 0 0 1 1 0 1 0 I I s s s s s P P 0 i i i i i I I I d d d d d Rd=extractu(Rs,#u5,#U5)
1 0 0 0 1 1 0 1 1 I I s s s s s P P 0 i i i i i I I I d d d d d Rd=extract(Rs,#u5,#U5)
ICLASS RegType Maj s5 Parse t5 Min d5
1 1 0 0 0 0 0 1 0 0 - s s s s s P P - t t t t t 0 0 - d d d d d Rdd=extractu(Rss,Rtt)
1 1 0 0 0 0 0 1 1 1 - s s s s s P P - t t t t t 1 0 - d d d d d Rdd=extract(Rss,Rtt)
1 1 0 0 1 0 0 1 0 0 - s s s s s P P - t t t t t 0 0 - d d d d d Rd=extractu(Rs,Rtt)
1 1 0 0 1 0 0 1 0 0 - s s s s s P P - t t t t t 0 1 - d d d d d Rd=extract(Rs,Rtt)

80-N2040-53 Rev. AA 414


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
MajOp Major opcode
MinOp Minor opcode
Maj Major opcode
Min Minor opcode
RegType Register type

80-N2040-53 Rev. AA 415


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Insert bit field


Replace a bit field in the destination register (or register pair) with bits from the least significant
portion of Rs/Rss. The number of bits is obtained from the first immediate or the most-significant
word of Rtt. The bits are shifted by the second immediate or the least significant word of Rtt.
When register Rtt specifies the offset, the low 7-bits of Rtt are treated as a signed 7-bit value. If
this value is negative, the result is zero.
Shift amounts and offsets that are too large may push bits beyond the end of the destination
register., and the bits do not appear in the destination register.

Width

Rs

Offset

Rd

Unchanged Unchanged

Syntax Behavior
Rx=insert(Rs,#u5,#U5) width=#u;
offset=#U;
Rx &= ~(((1<<width)-1)<<offset);
Rx |= ((Rs & ((1<<width)-1)) << offset);
Rx=insert(Rs,Rtt) width=zxt6->32((Rtt.w[1]));
offset=sxt7->32((Rtt.w[0]));
mask = ((1<<width)-1);
if (offset < 0) {
Rx = 0;
} else {
Rx &= ~(mask<<offset);
Rx |= ((Rs & mask) << offset);
}
Rxx=insert(Rss,#u6,#U6 width=#u;
) offset=#U;
Rxx &= ~(((1<<width)-1)<<offset);
Rxx |= ((Rss & ((1<<width)-1)) << offset);

80-N2040-53 Rev. AA 416


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Syntax Behavior
Rxx=insert(Rss,Rtt) width=zxt6->32((Rtt.w[1]));
offset=sxt7->32((Rtt.w[0]));
mask = ((1<<width)-1);
if (offset < 0) {
Rxx = 0;
} else {
Rxx &= ~(mask<<offset);
Rxx |= ((Rss & mask) << offset);
}

Class: XTYPE (slots 2,3)

Intrinsics

Rx=insert(Rs,#u5,#U5) Word32 Q6_R_insert_RII(Word32 Rx, Word32 Rs, Word32 Iu5,


Word32 IU5)
Rx=insert(Rs,Rtt) Word32 Q6_R_insert_RP(Word32 Rx, Word32 Rs, Word64 Rtt)
Rxx=insert(Rss,#u6,#U Word64 Q6_P_insert_PII(Word64 Rxx, Word64 Rss, Word32
6) Iu6, Word32 IU6)
Rxx=insert(Rss,Rtt) Word64 Q6_P_insert_PP(Word64 Rxx, Word64 Rss, Word64 Rtt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse MinOp x5
1 0 0 0 0 0 1 1 I I I s s s s s P P i i i i i i I I I x x x x x Rxx=insert(Rss,#u6,#U6)
1 0 0 0 1 1 1 1 0 I I s s s s s P P 0 i i i i i I I I x x x x x Rx=insert(Rs,#u5,#U5)
ICLASS RegType s5 Parse t5 x5
1 1 0 0 1 0 0 0 - - - s s s s s P P - t t t t t - - - x x x x x Rx=insert(Rs,Rtt)
ICLASS RegType Maj s5 Parse t5 x5
1 1 0 0 1 0 1 0 0 - - s s s s s P P 0 t t t t t - - - x x x x x Rxx=insert(Rss,Rtt)

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
s5 Field to encode register s
t5 Field to encode register t
x5 Field to encode register x
MajOp Major opcode
MinOp Minor opcode
Maj Major opcode
RegType Register type

80-N2040-53 Rev. AA 417


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Interleave/deinterleave
For interleave, bits I+32 of Rss (which are the bits from the upper source word) are placed in the
odd bits (I*2)+1 of Rdd, while bits I of Rss (which are the bits from the lower source word) are
placed in the even bits (I*2) of Rdd.
For deinterleave, the even bits of the source register are placed in the even register of the result
pair, and the odd bits of the source register are placed in the odd register of the result pair.
"r1:0 = deinterleave(r1:0)" is the inverse of "r1:0 = interleave(r1:0)".
Syntax Behavior
Rdd=deinterleave(Rss) Rdd = deinterleave(ODD,EVEN);
Rdd=interleave(Rss) Rdd = interleave(Rss.w[1],Rss.w[0]);

Class: XTYPE (slots 2,3)

Intrinsics

Rdd=deinterleave(Rss) Word64 Q6_P_deinterleave_P(Word64 Rss)


Rdd=interleave(Rss) Word64 Q6_P_interleave_P(Word64 Rss)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse MinOp d5
1 0 0 0 0 0 0 0 1 1 0 s s s s s P P - - - - - - 1 0 0 d d d d d Rdd=deinterleave(Rss)
1 0 0 0 0 0 0 0 1 1 0 s s s s s P P - - - - - - 1 0 1 d d d d d Rdd=interleave(Rss)

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
MajOp Major opcode
MinOp Minor opcode
RegType Register type

80-N2040-53 Rev. AA 418


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Linear feedback-shift iteration


Count the number of ones of the logical AND of the two source input values, and take the least
significant value of that sum. The first source value is shifted right by one bit, and the parity is
placed in the MSB.

Syntax Behavior
Rdd=lfs(Rss,Rtt) Rdd = (Rss.u64 >> 1) | ((1&count_ones(Rss &
Rtt)).u64<<63);

Class: XTYPE (slots 2,3)

Intrinsics

Rdd=lfs(Rss,Rtt) Word64 Q6_P_lfs_PP(Word64 Rss, Word64 Rtt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType Maj s5 Parse t5 Min d5
1 1 0 0 0 0 0 1 1 0 - s s s s s P P - t t t t t 1 1 0 d d d d d Rdd=lfs(Rss,Rtt)

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
Maj Major opcode
Min Minor opcode
RegType Register type

80-N2040-53 Rev. AA 419


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Masked parity
Count the number of ones of the logical AND of the two source input values, and take the least
significant bit of that sum.

Syntax Behavior
Rd=parity(Rs,Rt) Rd = 1&count_ones(Rs & Rt);
Rd=parity(Rss,Rtt) Rd = 1&count_ones(Rss & Rtt);

Class: XTYPE (slots 2,3)

Intrinsics

Rd=parity(Rs,Rt) Word32 Q6_R_parity_RR(Word32 Rs, Word32 Rt)


Rd=parity(Rss,Rtt) Word32 Q6_R_parity_PP(Word64 Rss, Word64
Rtt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType s5 Parse t5 d5
1 1 0 1 0 0 0 0 - - - s s s s s P P - t t t t t - - - d d d d d Rd=parity(Rss,Rtt)
1 1 0 1 0 1 0 1 1 1 1 s s s s s P P - t t t t t - - - d d d d d Rd=parity(Rs,Rt)

Field name Description


RegType Register type
ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 420


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Bit reverse
Reverse the order of bits. The most significant swap with the least significant, bit 30 swaps with
bit 1, and so on.

Syntax Behavior
Rd=brev(Rs) Rd = reverse_bits(Rs);
Rdd=brev(Rss) Rdd = reverse_bits(Rss);

Class: XTYPE (slots 2,3)

Intrinsics

Rd=brev(Rs) Word32 Q6_R_brev_R(Word32 Rs)


Rdd=brev(Rss) Word64 Q6_P_brev_P(Word64 Rss)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse MinOp d5
1 0 0 0 0 0 0 0 1 1 0 s s s s s P P - - - - - - 1 1 0 d d d d d Rdd=brev(Rss)
1 0 0 0 1 1 0 0 0 1 0 s s s s s P P - - - - - - 1 1 0 d d d d d Rd=brev(Rs)

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
MajOp Major opcode
MinOp Minor opcode
RegType Register type

80-N2040-53 Rev. AA 421


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Set/clear/toggle bit
Set (to 1), clear (to 0), or toggle a single bit in the source, and place the resulting value in the
destination. Indicate the bit to manipulate using an immediate or register value.
If a register is used to indicate the bit position, and the value of the least-significant 7 bits of Rt is
out of range, the destination register is unchanged.

Syntax Behavior
Rd=clrbit(Rs,#u5) Rd = (Rs & (~(1<<#u)));
Rd=clrbit(Rs,Rt) Rd = (Rs & (~((sxt7->32(Rt)>0)?(zxt32->64(1)<<sxt7-
>32(Rt)):(zxt32->64(1)>>>sxt7->32(Rt)))));

Rd=setbit(Rs,#u5) Rd = (Rs | (1<<#u));


Rd=setbit(Rs,Rt) Rd = (Rs | (sxt7->32(Rt)>0)?(zxt32->64(1)<<sxt7-
>32(Rt)):(zxt32->64(1)>>>sxt7->32(Rt)));

Rd=togglebit(Rs,#u5) Rd = (Rs ^ (1<<#u));


Rd=togglebit(Rs,Rt) Rd = (Rs ^ (sxt7->32(Rt)>0)?(zxt32->64(1)<<sxt7-
>32(Rt)):(zxt32->64(1)>>>sxt7->32(Rt)));

Class: XTYPE (slots 2,3)

Intrinsics

Rd=clrbit(Rs,#u5) Word32 Q6_R_clrbit_RI(Word32 Rs, Word32


Iu5)
Rd=clrbit(Rs,Rt) Word32 Q6_R_clrbit_RR(Word32 Rs, Word32 Rt)
Rd=setbit(Rs,#u5) Word32 Q6_R_setbit_RI(Word32 Rs, Word32
Iu5)
Rd=setbit(Rs,Rt) Word32 Q6_R_setbit_RR(Word32 Rs, Word32 Rt)
Rd=togglebit(Rs,#u5) Word32 Q6_R_togglebit_RI(Word32 Rs, Word32
Iu5)
Rd=togglebit(Rs,Rt) Word32 Q6_R_togglebit_RR(Word32 Rs, Word32
Rt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse MinOp d5
1 0 0 0 1 1 0 0 1 1 0 s s s s s P P 0 i i i i i 0 0 0 d d d d d Rd=setbit(Rs,#u5)
1 0 0 0 1 1 0 0 1 1 0 s s s s s P P 0 i i i i i 0 0 1 d d d d d Rd=clrbit(Rs,#u5)
1 0 0 0 1 1 0 0 1 1 0 s s s s s P P 0 i i i i i 0 1 0 d d d d d Rd=togglebit(Rs,#u5)
ICLASS RegType Maj s5 Parse t5 Min d5
1 1 0 0 0 1 1 0 1 0 - s s s s s P P - t t t t t 0 0 - d d d d d Rd=setbit(Rs,Rt)
1 1 0 0 0 1 1 0 1 0 - s s s s s P P - t t t t t 0 1 - d d d d d Rd=clrbit(Rs,Rt)
1 1 0 0 0 1 1 0 1 0 - s s s s s P P - t t t t t 1 0 - d d d d d Rd=togglebit(Rs,Rt)

80-N2040-53 Rev. AA 422


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
MajOp Major opcode
MinOp Minor opcode
Maj Major opcode
Min Minor opcode
RegType Register type

80-N2040-53 Rev. AA 423


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Split bit field


Split the bit field in a register into upper and lower parts of variable size. The lower part is placed
in the lower word of a destination register pair, and the upper part is placed in the upper word of
the destination. An immediate value or register Rt is used to determine the bit position of the
split.

Bits

Rs

Rdd[0]

Zero

Rdd[1]

Zero

Syntax Behavior
Rdd=bitsplit(Rs,#u5) Rdd.w[1]=(Rs>>#u);
Rdd.w[0]=zxt#u->32(Rs);

Rdd=bitsplit(Rs,Rt) shamt = zxt5->32(Rt);


Rdd.w[1]=(Rs>>shamt);
Rdd.w[0]=zxtshamt->32(Rs);

Class: XTYPE (slots 2,3)

Intrinsics

Rdd=bitsplit(Rs,#u5) Word64 Q6_P_bitsplit_RI(Word32 Rs, Word32


Iu5)
Rdd=bitsplit(Rs,Rt) Word64 Q6_P_bitsplit_RR(Word32 Rs, Word32
Rt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse MinOp d5
1 0 0 0 1 0 0 0 1 1 0 s s s s s P P 0 i i i i i 1 0 0 d d d d d Rdd=bitsplit(Rs,#u5)
ICLASS RegType s5 Parse t5 d5
1 1 0 1 0 1 0 0 - - 1 s s s s s P P - t t t t t - - - d d d d d Rdd=bitsplit(Rs,Rt)

80-N2040-53 Rev. AA 424


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Field name Description


RegType Register type
ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
MajOp Major opcode
MinOp Minor opcode

80-N2040-53 Rev. AA 425


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Table index
The table index instruction supports fast lookup tables where the index into the table is stored in
a bit-field. The instruction forms the address of a table element by extracting the bit field and
inserting it into the appropriate bits of a pointer to the table element.
Tables are defined to contain entries of bytes, halfwords, words, or doublewords. The table must
align to a power-of-two size greater than or equal to the table size. For example, a 4 K byte table
should align to a 4 K byte boundary. This instruction supports tables with a maximum of 32 K
table entries.
Register Rx contains a pointer to within the table. Register Rs contains a field to extract and use as
a table index. This instruction first extracts the field from register Rs and then inserts it into
register Rx. The insertion point is bit 0 for tables of bytes, bit 1 for tables of halfwords, bit 2 for
tables of words, and bit 3 for tables of doublewords.
In the assembly syntax, the width and offset values represent the field in Rs to extract. Use
unsigned constants to specify the width and offsets in assembly. In the encoded instruction,
however, the assembler adjusts these values as follows.
■ For tableidxb, no adjustment is necessary.
■ For tableidxh, the assembler encodes offset-1 in the signed immediate field.
■ For tableidxw, the assembler encodes offset-2 in the signed immediate field.
■ For tableidxd, the assembler encodes offset-3 in the signed immediate field.
Rx=TABLEIDXD(Rs,#width,#offset)

Width Offset

Rs

Unchanged Rx

Unchanged

Syntax Behavior
Rx=tableidxb(Rs,#u4,#S6):raw width=#u;
offset=#S;
field = Rs[(width+offset-1):offset];
Rx[(width-1+0):0]=field;
Rx=tableidxb(Rs,#u4,#U5) Assembler mapped to:
"Rx=tableidxb(Rs,#u4,#U5):raw"
Rx=tableidxd(Rs,#u4,#S6):raw width=#u;
offset=#S+3;
field = Rs[(width+offset-1):offset];
Rx[(width-1+3):3]=field;

80-N2040-53 Rev. AA 426


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Syntax Behavior
Rx=tableidxd(Rs,#u4,#U5) Assembler mapped to: "Rx=tableidxd(Rs,#u4,#U5-
3):raw"
Rx=tableidxh(Rs,#u4,#S6):raw width=#u;
offset=#S+1;
field = Rs[(width+offset-1):offset];
Rx[(width-1+1):1]=field;
Rx=tableidxh(Rs,#u4,#U5) Assembler mapped to: "Rx=tableidxh(Rs,#u4,#U5-
1):raw"
Rx=tableidxw(Rs,#u4,#S6):raw width=#u;
offset=#S+2;
field = Rs[(width+offset-1):offset];
Rx[(width-1+2):2]=field;
Rx=tableidxw(Rs,#u4,#U5) Assembler mapped to: "Rx=tableidxw(Rs,#u4,#U5-
2):raw"

Class: XTYPE (slots 2,3)

Intrinsics

Rx=tableidxb(Rs,#u4,#U5) Word32 Q6_R_tableidxb_RII(Word32 Rx, Word32 Rs, Word32


Iu4, Word32 IU5)
Rx=tableidxd(Rs,#u4,#U5) Word32 Q6_R_tableidxd_RII(Word32 Rx, Word32 Rs, Word32
Iu4, Word32 IU5)
Rx=tableidxh(Rs,#u4,#U5) Word32 Q6_R_tableidxh_RII(Word32 Rx, Word32 Rs, Word32
Iu4, Word32 IU5)
Rx=tableidxw(Rs,#u4,#U5) Word32 Q6_R_tableidxw_RII(Word32 Rx, Word32 Rs, Word32
Iu4, Word32 IU5)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse MinOp x5

1 0 0 0 0 1 1 1 0 0 i s s s s s P P I I I I I I i i i x x x x x Rx=tableidxb(Rs,#u4,#S6):r
aw

1 0 0 0 0 1 1 1 0 1 i s s s s s P P I I I I I I i i i x x x x x Rx=tableidxh(Rs,#u4,#S6):r
aw
Rx=tableidxw(Rs,#u4,#S6):
1 0 0 0 0 1 1 1 1 0 i s s s s s P P I I I I I I i i i x x x x x raw

1 0 0 0 0 1 1 1 1 1 i s s s s s P P I I I I I I i i i x x x x x Rx=tableidxd(Rs,#u4,#S6):r
aw

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
s5 Field to encode register s
x5 Field to encode register x
MajOp Major opcode
MinOp Minor opcode
RegType Register type

80-N2040-53 Rev. AA 427


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

11.10.3 XTYPE COMPLEX


The XTYPE COMPLEX instruction subclass includes instructions that are for complex math, using
imaginary values.

Complex add/sub halfwords


Cross vector add-sub or sub-add instructions perform X + jY and X - jY complex operations. Each
16-bit result is saturated to 16 bits.

Rdd=vxaddsubh(Rss,Rtt):sat
I R I R Rss

I R I R Rtt

- -
+ + + +

Sat_16 Sat_16 Sat_16 Sat_16

I R I R Rdd

Rdd=vxsubaddh(Rss,Rt):rnd:>>1:sat
I R I R Rss
I R I R Rtt

- -
1 + 1 + 1 + 1 +

>>1 >>1 >>1 >>1

Rdd
Sat_16 Sat_16 Sat_16 Sat_16

I R I R

80-N2040-53 Rev. AA 428


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Syntax Behavior
Rdd=vxaddsubh(Rss,Rtt):rnd:>>1 Rdd.h[0]=sat16((Rss.h[0]+Rtt.h[1]+1)>>1)
:sat ;
Rdd.h[1]=sat16((Rss.h[1]-
Rtt.h[0]+1)>>1);
Rdd.h[2]=sat16((Rss.h[2]+Rtt.h[3]+1)>>1)
;
Rdd.h[3]=sat16((Rss.h[3]-
Rtt.h[2]+1)>>1);
Rdd=vxaddsubh(Rss,Rtt):sat Rdd.h[0]=sat16(Rss.h[0]+Rtt.h[1]);
Rdd.h[1]=sat16(Rss.h[1]-Rtt.h[0]);
Rdd.h[2]=sat16(Rss.h[2]+Rtt.h[3]);
Rdd.h[3]=sat16(Rss.h[3]-Rtt.h[2]);

Rdd=vxsubaddh(Rss,Rtt):rnd:>>1 Rdd.h[0]=sat16((Rss.h[0]-
:sat Rtt.h[1]+1)>>1);
Rdd.h[1]=sat16((Rss.h[1]+Rtt.h[0]+1)>>1)
;
Rdd.h[2]=sat16((Rss.h[2]-
Rtt.h[3]+1)>>1);
Rdd.h[3]=sat16((Rss.h[3]+Rtt.h[2]+1)>>1)
;
Rdd=vxsubaddh(Rss,Rtt):sat Rdd.h[0]=sat16(Rss.h[0]-Rtt.h[1]);
Rdd.h[1]=sat16(Rss.h[1]+Rtt.h[0]);
Rdd.h[2]=sat16(Rss.h[2]-Rtt.h[3]);
Rdd.h[3]=sat16(Rss.h[3]+Rtt.h[2]);

Class: XTYPE (slots 2,3)

Notes

■ If saturation occurs during execution of this instruction (a result is clamped to either


maximum or minimum values), the OVF bit in the status register is set. OVF remains set until
explicitly cleared by a transfer to the status register.

80-N2040-53 Rev. AA 429


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Intrinsics

Rdd=vxaddsubh(Rss,Rtt):rnd:>>1 Word64 Q6_P_vxaddsubh_PP_rnd_rs1_sat(Word64


:sat Rss, Word64 Rtt)
Rdd=vxaddsubh(Rss,Rtt):sat Word64 Q6_P_vxaddsubh_PP_sat(Word64 Rss, Word64
Rtt)
Rdd=vxsubaddh(Rss,Rtt):rnd:>>1 Word64 Q6_P_vxsubaddh_PP_rnd_rs1_sat(Word64
:sat Rss, Word64 Rtt)
Rdd=vxsubaddh(Rss,Rtt):sat Word64 Q6_P_vxsubaddh_PP_sat(Word64 Rss, Word64
Rtt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType Maj s5 Parse t5 Min d5

1 1 0 0 0 0 0 1 0 1 - s s s s s P P - t t t t t 1 0 0 d d d d d Rdd=vxaddsubh(Rss,Rtt):s
at
Rdd=vxsubaddh(Rss,Rtt):s
1 1 0 0 0 0 0 1 0 1 - s s s s s P P - t t t t t 1 1 0 d d d d d at

1 1 0 0 0 0 0 1 1 1 - s s s s s P P - t t t t t 0 0 - d d d d d Rdd=vxaddsubh(Rss,Rtt):r
nd:>>1:sat

1 1 0 0 0 0 0 1 1 1 - s s s s s P P - t t t t t 0 1 - d d d d d Rdd=vxsubaddh(Rss,Rtt):r
nd:>>1:sat

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
Maj Major opcode
Min Minor opcode
RegType Register type

80-N2040-53 Rev. AA 430


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Complex add/sub words


Cross vector add-sub or sub-add instructions perform X+jY and X-jY complex operations. Each 32-
bit result is saturated to 32 bits.

Rdd=vxaddsubw(Rss,Rt):sat Rdd=vxsubaddw(Rss,Rt):sat
I R Rss I R Rss

I R Rtt I R Rtt

- + + -

Sat_32 Sat_32 Sat_32 Sat_32

I R Rdd I R Rdd

Syntax Behavior
Rdd=vxaddsubw(Rss,Rtt):sat Rdd.w[0]=sat32(Rss.w[0]+Rtt.w[1]);
Rdd.w[1]=sat32(Rss.w[1]-Rtt.w[0]);

Rdd=vxsubaddw(Rss,Rtt):sat Rdd.w[0]=sat32(Rss.w[0]-Rtt.w[1]);
Rdd.w[1]=sat32(Rss.w[1]+Rtt.w[0]);

Class: XTYPE (slots 2,3)

Notes

■ If saturation occurs during execution of this instruction (a result is clamped to either


maximum or minimum values), the OVF bit in the status register is set. OVF remains set until
explicitly cleared by a transfer to the status register.

Intrinsics

Rdd=vxaddsubw(Rss,Rtt):sat Word64 Q6_P_vxaddsubw_PP_sat(Word64 Rss, Word64


Rtt)
Rdd=vxsubaddw(Rss,Rtt):sat Word64 Q6_P_vxsubaddw_PP_sat(Word64 Rss, Word64
Rtt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType Maj s5 Parse t5 Min d5
Rdd=vxaddsubw(Rss,Rtt):s
1 1 0 0 0 0 0 1 0 1 - s s s s s P P - t t t t t 0 0 0 d d d d d at

1 1 0 0 0 0 0 1 0 1 - s s s s s P P - t t t t t 0 1 0 d d d d d Rdd=vxsubaddw(Rss,Rtt):s
at

80-N2040-53 Rev. AA 431


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
Maj Major opcode
Min Minor opcode
RegType Register type

80-N2040-53 Rev. AA 432


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Complex multiply
Multiply complex values Rs and Rt. The inputs have a real 16-bit value in the low halfword and an
imaginary 16-bit value in the high halfword. Optionally, scale the result by 0-1 bits. Optionally, add
a complex accumulator. Saturate the real and imaginary portions to 32-bits. The output has a real
32-bit value in the low word and an imaginary 32-bit value in the high word. The Rt input can be
optionally conjugated. Another option is to subtracted the result from the destination rather than
accumulate it.

Rxx+=cmpy(Rs,Rt):sat

Rs I R I R Rs

Rt I R I R Rt

* * * *
32 32 32 32

<<0-1 <<0-1 <<0-1 <<0-1

-
Add Add

Sat_32 Sat_32 32

32

Imaginary accumulation Real accumulation

Rxx

Syntax Behavior
Rdd=cmpy(Rs,Rt)[:<<1]:sat Rdd.w[1]=sat32((Rs.h[1] * Rt.h[0])[<<1] + (Rs.h[0] *
Rt.h[1])[<<1]);
Rdd.w[0]=sat32((Rs.h[0] * Rt.h[0])[<<1] - (Rs.h[1] *
Rt.h[1])[<<1]);
Rdd=cmpy(Rs,Rt*)[:<<1]:sa Rdd.w[1]=sat32((Rs.h[1] * Rt.h[0])[<<1] - (Rs.h[0] *
t Rt.h[1])[<<1]);
Rdd.w[0]=sat32((Rs.h[0] * Rt.h[0])[<<1] + (Rs.h[1] *
Rt.h[1])[<<1]);

80-N2040-53 Rev. AA 433


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Syntax Behavior
Rxx+=cmpy(Rs,Rt)[:<<1]:sa Rxx.w[1]=sat32(Rxx.w[1] + (Rs.h[1] * Rt.h[0])[<<1] +
t (Rs.h[0] * Rt.h[1])[<<1]);
Rxx.w[0]=sat32(Rxx.w[0] + (Rs.h[0] * Rt.h[0])[<<1] -
(Rs.h[1] * Rt.h[1])[<<1]);
Rxx+=cmpy(Rs,Rt*)[:<<1]:s Rxx.w[1]=sat32(Rxx.w[1] + (Rs.h[1] * Rt.h[0])[<<1] -
at (Rs.h[0] * Rt.h[1])[<<1]);
Rxx.w[0]=sat32(Rxx.w[0] + (Rs.h[0] * Rt.h[0])[<<1] +
(Rs.h[1] * Rt.h[1])[<<1]);
Rxx- Rxx.w[1]=sat32(Rxx.w[1] - ((Rs.h[1] * Rt.h[0])[<<1] +
=cmpy(Rs,Rt)[:<<1]:sat (Rs.h[0] * Rt.h[1])[<<1]));
Rxx.w[0]=sat32(Rxx.w[0] - ((Rs.h[0] * Rt.h[0])[<<1] -
(Rs.h[1] * Rt.h[1])[<<1]));
Rxx- Rxx.w[1]=sat32(Rxx.w[1] - ((Rs.h[1] * Rt.h[0])[<<1] -
=cmpy(Rs,Rt*)[:<<1]:sat (Rs.h[0] * Rt.h[1])[<<1]));
Rxx.w[0]=sat32(Rxx.w[0] - ((Rs.h[0] * Rt.h[0])[<<1] +
(Rs.h[1] * Rt.h[1])[<<1]));

Class: XTYPE (slots 2,3)

Notes

■ If saturation occurs during execution of this instruction (a result is clamped to either


maximum or minimum values), the OVF bit in the status register is set. OVF remains set until
explicitly cleared by a transfer to the status register.

80-N2040-53 Rev. AA 434


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Intrinsics

Rdd=cmpy(Rs,Rt):<<1:sat Word64 Q6_P_cmpy_RR_s1_sat(Word32 Rs, Word32 Rt)


Rdd=cmpy(Rs,Rt):sat Word64 Q6_P_cmpy_RR_sat(Word32 Rs, Word32 Rt)
Rdd=cmpy(Rs,Rt*):<<1:sat Word64 Q6_P_cmpy_RR_conj_s1_sat(Word32 Rs, Word32 Rt)
Rdd=cmpy(Rs,Rt*):sat Word64 Q6_P_cmpy_RR_conj_sat(Word32 Rs, Word32 Rt)
Rxx+=cmpy(Rs,Rt):<<1:sat Word64 Q6_P_cmpyacc_RR_s1_sat(Word64 Rxx, Word32 Rs,
Word32 Rt)
Rxx+=cmpy(Rs,Rt):sat Word64 Q6_P_cmpyacc_RR_sat(Word64 Rxx, Word32 Rs,
Word32 Rt)
Rxx+=cmpy(Rs,Rt*):<<1:sa Word64 Q6_P_cmpyacc_RR_conj_s1_sat(Word64 Rxx, Word32
t Rs, Word32 Rt)
Rxx+=cmpy(Rs,Rt*):sat Word64 Q6_P_cmpyacc_RR_conj_sat(Word64 Rxx, Word32 Rs,
Word32 Rt)
Rxx-=cmpy(Rs,Rt):<<1:sat Word64 Q6_P_cmpynac_RR_s1_sat(Word64 Rxx, Word32 Rs,
Word32 Rt)
Rxx-=cmpy(Rs,Rt):sat Word64 Q6_P_cmpynac_RR_sat(Word64 Rxx, Word32 Rs,
Word32 Rt)
Rxx- Word64 Q6_P_cmpynac_RR_conj_s1_sat(Word64 Rxx, Word32
=cmpy(Rs,Rt*):<<1:sat Rs, Word32 Rt)
Rxx-=cmpy(Rs,Rt*):sat Word64 Q6_P_cmpynac_RR_conj_sat(Word64 Rxx, Word32 Rs,
Word32 Rt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse t5 MinOp d5
1 1 1 0 0 1 0 1 N 0 0 s s s s s P P 0 t t t t t 1 1 0 d d d d d Rdd=cmpy(Rs,Rt)[:<<N]:sat

1 1 1 0 0 1 0 1 N 1 0 s s s s s P P 0 t t t t t 1 1 0 d d d d d Rdd=cmpy(Rs,Rt*)[:<<N]:s
at
ICLASS RegType MajOp s5 Parse t5 MinOp x5

1 1 1 0 0 1 1 1 N 0 0 s s s s s P P 0 t t t t t 1 1 0 x x x x x Rxx+=cmpy(Rs,Rt)[:<<N]:s
at
Rxx-
1 1 1 0 0 1 1 1 N 0 0 s s s s s P P 0 t t t t t 1 1 1 x x x x x =cmpy(Rs,Rt)[:<<N]:sat

1 1 1 0 0 1 1 1 N 1 0 s s s s s P P 0 t t t t t 1 1 0 x x x x x Rxx+=cmpy(Rs,Rt*)[:<<N]:
sat

1 1 1 0 0 1 1 1 N 1 0 s s s s s P P 0 t t t t t 1 1 1 x x x x x Rxx-
=cmpy(Rs,Rt*)[:<<N]:sat

Field name Description


ICLASS Instruction class
MajOp Major opcode
MinOp Minor opcode
RegType Register type
Parse Packet/loop parse bits
d5 Field to encode register d

80-N2040-53 Rev. AA 435


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Field name Description


s5 Field to encode register s
t5 Field to encode register t
x5 Field to encode register x

80-N2040-53 Rev. AA 436


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Complex multiply real or imaginary


Multiply complex values Rs and Rt. The inputs have a real 16-bit value in the low halfword and an
imaginary 16-bit value in the high halfword. Take either the real or imaginary result and optionally
accumulate with a 64-bit destination.

Rxx+=cmpyi(Rs,Rt)

I R Rs

I R Rt

* *
32
32

Add

64

Rxx
Imaginary Accumulation

Syntax Behavior
Rdd=cmpyi(Rs,Rt) Rdd = (Rs.h[1] * Rt.h[0]) + (Rs.h[0] * Rt.h[1]);
Rdd=cmpyr(Rs,Rt) Rdd = (Rs.h[0] * Rt.h[0]) - (Rs.h[1] * Rt.h[1]);
Rxx+=cmpyi(Rs,Rt) Rxx = Rxx + (Rs.h[1] * Rt.h[0]) + (Rs.h[0] *
Rt.h[1]);
Rxx+=cmpyr(Rs,Rt) Rxx = Rxx + (Rs.h[0] * Rt.h[0]) - (Rs.h[1] *
Rt.h[1]);

80-N2040-53 Rev. AA 437


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Class: XTYPE (slots 2,3)

Intrinsics

Rdd=cmpyi(Rs,Rt) Word64 Q6_P_cmpyi_RR(Word32 Rs, Word32 Rt)


Rdd=cmpyr(Rs,Rt) Word64 Q6_P_cmpyr_RR(Word32 Rs, Word32 Rt)
Rxx+=cmpyi(Rs,Rt) Word64 Q6_P_cmpyiacc_RR(Word64 Rxx, Word32 Rs,
Word32 Rt)
Rxx+=cmpyr(Rs,Rt) Word64 Q6_P_cmpyracc_RR(Word64 Rxx, Word32 Rs,
Word32 Rt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse t5 MinOp d5
1 1 1 0 0 1 0 1 0 0 0 s s s s s P P 0 t t t t t 0 0 1 d d d d d Rdd=cmpyi(Rs,Rt)
1 1 1 0 0 1 0 1 0 0 0 s s s s s P P 0 t t t t t 0 1 0 d d d d d Rdd=cmpyr(Rs,Rt)
ICLASS RegType MajOp s5 Parse t5 MinOp x5
1 1 1 0 0 1 1 1 0 0 0 s s s s s P P 0 t t t t t 0 0 1 x x x x x Rxx+=cmpyi(Rs,Rt)
1 1 1 0 0 1 1 1 0 0 0 s s s s s P P 0 t t t t t 0 1 0 x x x x x Rxx+=cmpyr(Rs,Rt)

Field name Description


ICLASS Instruction class
MajOp Major opcode
MinOp Minor opcode
RegType Register type
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
x5 Field to encode register x

80-N2040-53 Rev. AA 438


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Complex multiply with round and pack


Multiply complex values Rs and Rt. The inputs have a real 16-bit value in the low halfword and an
imaginary 16-bit value in the high halfword. The Rt input is optionally conjugated. The multiplier
results are optionally scaled by 0 to 1 bits. A rounding constant is added to each real and
imaginary sum. The real and imaginary parts are individually saturated to 32 bits. The upper 16
bits of each 32-bit results are packed in a 32-bit destination register.

Rd=cmpy(Rs,Rt):rnd:sat

Rs I R I R Rs

Rt I R I R Rt

* * * *
32 32 32 32

0x8000 <<0-1 <<0-1 <<0-1 <<0-1 0x8000


-
Add Add

Sat_32 Sat_32

High 16-bits High 16-bits

I R Rd

Syntax Behavior
Rd=cmpy(Rs,Rt)[:<<1]:rnd:s Rd.h[1]=(sat32((Rs.h[1] * Rt.h[0])[<<1] + (Rs.h[0]
at * Rt.h[1])[<<1] + 0x8000)).h[1];
Rd.h[0]=(sat32((Rs.h[0] * Rt.h[0])[<<1] - (Rs.h[1]
* Rt.h[1])[<<1] + 0x8000)).h[1];
Rd=cmpy(Rs,Rt*)[:<<1]:rnd: Rd.h[1]=(sat32((Rs.h[1] * Rt.h[0])[<<1] - (Rs.h[0]
sat * Rt.h[1])[<<1] + 0x8000)).h[1];
Rd.h[0]=(sat32((Rs.h[0] * Rt.h[0])[<<1] + (Rs.h[1]
* Rt.h[1])[<<1] + 0x8000)).h[1];

80-N2040-53 Rev. AA 439


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Class: XTYPE (slots 2,3)

Notes

■ If saturation occurs during execution of this instruction (a result is clamped to either


maximum or minimum values), the OVF bit in the status register is set. OVF remains set until
explicitly cleared by a transfer to the status register.

Intrinsics

Rd=cmpy(Rs,Rt):<<1:rnd:sa Word32 Q6_R_cmpy_RR_s1_rnd_sat(Word32 Rs, Word32 Rt)


t
Rd=cmpy(Rs,Rt):rnd:sat Word32 Q6_R_cmpy_RR_rnd_sat(Word32 Rs, Word32 Rt)
Rd=cmpy(Rs,Rt*):<<1:rnd:s Word32 Q6_R_cmpy_RR_conj_s1_rnd_sat(Word32 Rs,
at Word32 Rt)
Rd=cmpy(Rs,Rt*):rnd:sat Word32 Q6_R_cmpy_RR_conj_rnd_sat(Word32 Rs, Word32
Rt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse t5 MinOp d5

1 1 1 0 1 1 0 1 N 0 1 s s s s s P P 0 t t t t t 1 1 0 d d d d d Rd=cmpy(Rs,Rt)[:<<N]:rnd:
sat
Rd=cmpy(Rs,Rt*)[:<<N]:rnd
1 1 1 0 1 1 0 1 N 1 1 s s s s s P P 0 t t t t t 1 1 0 d d d d d :sat

Field name Description


ICLASS Instruction class
MajOp Major opcode
MinOp Minor opcode
RegType Register type
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 440


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Complex multiply 32 × 16

Multiply 32 by 16 bit complex values Rss and Rt. The inputs have a real value in the low part of a
register and the imaginary value in the upper part. The multiplier results are scaled by 1 bit and
accumulated with a rounding constant. The result is saturated to 32 bits.
Rd=cmpyrwh(Rss,Rt):<<1:rnd:sat Rd=cmpyiwh(Rss,Rt):<<1:rnd:sat
I R Rss I R Rss

I R Rt I R Rt

* * * 48
*
48
48 48
0x8000 0x8000
<<1 <<1 <<1 <<1
-
Add Add

Sat_32 Sat_32

Real result Rd Imag result Rd

Syntax Behavior
Rd=cmpyiwh(Rss,Rt):<<1:rnd:s Rd = sat32(( (Rss.w[0] * Rt.h[1]) + (Rss.w[1] *
at Rt.h[0]) + 0x4000)>>15);
Rd=cmpyiwh(Rss,Rt*):<<1:rnd: Rd = sat32(( (Rss.w[1] * Rt.h[0]) - (Rss.w[0] *
sat Rt.h[1]) + 0x4000)>>15);
Rd=cmpyrwh(Rss,Rt):<<1:rnd:s Rd = sat32(( (Rss.w[0] * Rt.h[0]) - (Rss.w[1] *
at Rt.h[1]) + 0x4000)>>15);
Rd=cmpyrwh(Rss,Rt*):<<1:rnd: Rd = sat32(( (Rss.w[0] * Rt.h[0]) + (Rss.w[1] *
sat Rt.h[1]) + 0x4000)>>15);

Class: XTYPE (slots 2,3)

Notes

■ If saturation occurs during execution of this instruction (a result is clamped to either


maximum or minimum values), the OVF bit in the status register is set. OVF remains set until
explicitly cleared by a transfer to the status register.

80-N2040-53 Rev. AA 441


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Intrinsics

Rd=cmpyiwh(Rss,Rt):<<1:rnd:s Word32 Q6_R_cmpyiwh_PR_s1_rnd_sat(Word64 Rss,


at Word32 Rt)
Rd=cmpyiwh(Rss,Rt*):<<1:rnd: Word32 Q6_R_cmpyiwh_PR_conj_s1_rnd_sat(Word64
sat Rss, Word32 Rt)
Rd=cmpyrwh(Rss,Rt):<<1:rnd:s Word32 Q6_R_cmpyrwh_PR_s1_rnd_sat(Word64 Rss,
at Word32 Rt)
Rd=cmpyrwh(Rss,Rt*):<<1:rnd: Word32 Q6_R_cmpyrwh_PR_conj_s1_rnd_sat(Word64
sat Rss, Word32 Rt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType s5 Parse t5 Min d5

1 1 0 0 0 1 0 1 - - - s s s s s P P - t t t t t 1 0 0 d d d d d Rd=cmpyiwh(Rss,Rt):<<1:r
nd:sat
Rd=cmpyiwh(Rss,Rt*):<<1:
1 1 0 0 0 1 0 1 - - - s s s s s P P - t t t t t 1 0 1 d d d d d rnd:sat

1 1 0 0 0 1 0 1 - - - s s s s s P P - t t t t t 1 1 0 d d d d d Rd=cmpyrwh(Rss,Rt):<<1:r
nd:sat

1 1 0 0 0 1 0 1 - - - s s s s s P P - t t t t t 1 1 1 d d d d d Rd=cmpyrwh(Rss,Rt*):<<1:
rnd:sat

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
Min Minor opcode
RegType Register type

80-N2040-53 Rev. AA 442


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Complex multiply real or imaginary 32-bit


Multiply complex values Rss and Rtt. The inputs have a real 32-bit value in the low word and an
imaginary 32-bit value in the high word. Take either the real or imaginary result and optionally
accumulate with a 64-bit destination.

Syntax Behavior
Rd=cmpyiw(Rss,Rtt):<<1:rnd:s tmp128 = sxt64->128((Rss.w[0] * Rtt.w[1]));
at acc128 = sxt64->128((Rss.w[1] * Rtt.w[0]));
const128 = sxt64->128(0x40000000);
acc128 = tmp128+acc128;
acc128 = acc128+const128;
acc128 = (size8s_t) (acc128 >> 31);
acc64 = sxt128->64(acc128);
Rd = sat32(acc64);

Rd=cmpyiw(Rss,Rtt):<<1:sat tmp128 = sxt64->128((Rss.w[0] * Rtt.w[1]));


acc128 = sxt64->128((Rss.w[1] * Rtt.w[0]));
acc128 = tmp128+acc128;
acc128 = (size8s_t) (acc128 >> 31);
acc64 = sxt128->64(acc128);
Rd = sat32(acc64);

Rd=cmpyiw(Rss,Rtt*):<<1:rnd: tmp128 = sxt64->128((Rss.w[1] * Rtt.w[0]));


sat acc128 = sxt64->128((Rss.w[0] * Rtt.w[1]));
const128 = sxt64->128(0x40000000);
acc128 = tmp128-acc128;
acc128 = acc128+const128;
acc128 = (size8s_t) (acc128 >> 31);
acc64 = sxt128->64(acc128);
Rd = sat32(acc64);

Rd=cmpyiw(Rss,Rtt*):<<1:sat tmp128 = sxt64->128((Rss.w[1] * Rtt.w[0]));


acc128 = sxt64->128((Rss.w[0] * Rtt.w[1]));
acc128 = tmp128-acc128;
acc128 = (size8s_t) (acc128 >> 31);
acc64 = sxt128->64(acc128);
Rd = sat32(acc64);

Rd=cmpyrw(Rss,Rtt):<<1:rnd:s tmp128 = sxt64->128((Rss.w[0] * Rtt.w[0]));


at acc128 = sxt64->128((Rss.w[1] * Rtt.w[1]));
const128 = sxt64->128(0x40000000);
acc128 = tmp128-acc128;
acc128 = acc128+const128;
acc128 = (size8s_t) (acc128 >> 31);
acc64 = sxt128->64(acc128);
Rd = sat32(acc64);

Rd=cmpyrw(Rss,Rtt):<<1:sat tmp128 = sxt64->128((Rss.w[0] * Rtt.w[0]));


acc128 = sxt64->128((Rss.w[1] * Rtt.w[1]));
acc128 = tmp128-acc128;
acc128 = (size8s_t) (acc128 >> 31);
acc64 = sxt128->64(acc128);
Rd = sat32(acc64);

80-N2040-53 Rev. AA 443


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Syntax Behavior
Rd=cmpyrw(Rss,Rtt*):<<1:rnd: tmp128 = sxt64->128((Rss.w[0] * Rtt.w[0]));
sat acc128 = sxt64->128((Rss.w[1] * Rtt.w[1]));
const128 = sxt64->128(0x40000000);
acc128 = tmp128+acc128;
acc128 = acc128+const128;
acc128 = (size8s_t) (acc128 >> 31);
acc64 = sxt128->64(acc128);
Rd = sat32(acc64);

Rd=cmpyrw(Rss,Rtt*):<<1:sat tmp128 = sxt64->128((Rss.w[0] * Rtt.w[0]));


acc128 = sxt64->128((Rss.w[1] * Rtt.w[1]));
acc128 = tmp128+acc128;
acc128 = (size8s_t) (acc128 >> 31);
acc64 = sxt128->64(acc128);
Rd = sat32(acc64);

Rdd=cmpyiw(Rss,Rtt) Rdd = ((Rss.w[0] * Rtt.w[1]) + (Rss.w[1] *


Rtt.w[0]));
Rdd=cmpyiw(Rss,Rtt*) Rdd = ((Rss.w[1] * Rtt.w[0]) - (Rss.w[0] *
Rtt.w[1]));
Rdd=cmpyrw(Rss,Rtt) Rdd = ((Rss.w[0] * Rtt.w[0]) - (Rss.w[1] *
Rtt.w[1]));
Rdd=cmpyrw(Rss,Rtt*) Rdd = ((Rss.w[0] * Rtt.w[0]) + (Rss.w[1] *
Rtt.w[1]));
Rxx+=cmpyiw(Rss,Rtt) Rxx += ((Rss.w[0] * Rtt.w[1]) + (Rss.w[1] *
Rtt.w[0]));
Rxx+=cmpyiw(Rss,Rtt*) Rxx += ((Rss.w[1] * Rtt.w[0]) - (Rss.w[0] *
Rtt.w[1]));
Rxx+=cmpyrw(Rss,Rtt) Rxx += ((Rss.w[0] * Rtt.w[0]) - (Rss.w[1] *
Rtt.w[1]));
Rxx+=cmpyrw(Rss,Rtt*) Rxx += ((Rss.w[0] * Rtt.w[0]) + (Rss.w[1] *
Rtt.w[1]));

Class: XTYPE (slots 3)

Notes

■ This instruction can only execute on a core with the Hexagon audio extensions
■ A packet with this instruction cannot have a slot 2 multiply instruction.
■ If saturation occurs during execution of this instruction (a result is clamped to either
maximum or minimum values), the OVF bit in the status register is set. OVF remains set until
explicitly cleared by a transfer to the status register.

80-N2040-53 Rev. AA 444


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Intrinsics

Rd=cmpyiw(Rss,Rtt):<<1:rnd:s Word32 Q6_R_cmpyiw_PP_s1_rnd_sat(Word64 Rss,


at Word64 Rtt)
Rd=cmpyiw(Rss,Rtt):<<1:sat Word32 Q6_R_cmpyiw_PP_s1_sat(Word64 Rss, Word64
Rtt)
Rd=cmpyiw(Rss,Rtt*):<<1:rnd: Word32 Q6_R_cmpyiw_PP_conj_s1_rnd_sat(Word64 Rss,
sat Word64 Rtt)
Rd=cmpyiw(Rss,Rtt*):<<1:sat Word32 Q6_R_cmpyiw_PP_conj_s1_sat(Word64 Rss,
Word64 Rtt)
Rd=cmpyrw(Rss,Rtt):<<1:rnd:s Word32 Q6_R_cmpyrw_PP_s1_rnd_sat(Word64 Rss,
at Word64 Rtt)
Rd=cmpyrw(Rss,Rtt):<<1:sat Word32 Q6_R_cmpyrw_PP_s1_sat(Word64 Rss, Word64
Rtt)
Rd=cmpyrw(Rss,Rtt*):<<1:rnd: Word32 Q6_R_cmpyrw_PP_conj_s1_rnd_sat(Word64 Rss,
sat Word64 Rtt)
Rd=cmpyrw(Rss,Rtt*):<<1:sat Word32 Q6_R_cmpyrw_PP_conj_s1_sat(Word64 Rss,
Word64 Rtt)
Rdd=cmpyiw(Rss,Rtt) Word64 Q6_P_cmpyiw_PP(Word64 Rss, Word64 Rtt)
Rdd=cmpyiw(Rss,Rtt*) Word64 Q6_P_cmpyiw_PP_conj(Word64 Rss, Word64 Rtt)
Rdd=cmpyrw(Rss,Rtt) Word64 Q6_P_cmpyrw_PP(Word64 Rss, Word64 Rtt)
Rdd=cmpyrw(Rss,Rtt*) Word64 Q6_P_cmpyrw_PP_conj(Word64 Rss, Word64 Rtt)
Rxx+=cmpyiw(Rss,Rtt) Word64 Q6_P_cmpyiwacc_PP(Word64 Rxx, Word64 Rss,
Word64 Rtt)
Rxx+=cmpyiw(Rss,Rtt*) Word64 Q6_P_cmpyiwacc_PP_conj(Word64 Rxx, Word64
Rss, Word64 Rtt)
Rxx+=cmpyrw(Rss,Rtt) Word64 Q6_P_cmpyrwacc_PP(Word64 Rxx, Word64 Rss,
Word64 Rtt)
Rxx+=cmpyrw(Rss,Rtt*) Word64 Q6_P_cmpyrwacc_PP_conj(Word64 Rxx, Word64
Rss, Word64 Rtt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse t5 MinOp d5
1 1 1 0 1 0 0 0 0 1 1 s s s s s P P 0 t t t t t 0 1 0 d d d d d Rdd=cmpyiw(Rss,Rtt)
1 1 1 0 1 0 0 0 1 0 0 s s s s s P P 0 t t t t t 0 1 0 d d d d d Rdd=cmpyrw(Rss,Rtt)
1 1 1 0 1 0 0 0 1 1 0 s s s s s P P 0 t t t t t 0 1 0 d d d d d Rdd=cmpyrw(Rss,Rtt*)
1 1 1 0 1 0 0 0 1 1 1 s s s s s P P 0 t t t t t 0 1 0 d d d d d Rdd=cmpyiw(Rss,Rtt*)

1 1 1 0 1 0 0 1 0 0 0 s s s s s P P 0 t t t t t 1 0 0 d d d d d Rd=cmpyiw(Rss,Rtt*):<<1:s
at

1 1 1 0 1 0 0 1 0 0 1 s s s s s P P 0 t t t t t 0 0 0 d d d d d Rd=cmpyiw(Rss,Rtt):<<1:s
at
Rd=cmpyrw(Rss,Rtt):<<1:s
1 1 1 0 1 0 0 1 0 1 0 s s s s s P P 0 t t t t t 0 0 0 d d d d d at

1 1 1 0 1 0 0 1 0 1 1 s s s s s P P 0 t t t t t 0 0 0 d d d d d Rd=cmpyrw(Rss,Rtt*):<<1:
sat

1 1 1 0 1 0 0 1 1 0 0 s s s s s P P 0 t t t t t 1 0 0 d d d d d Rd=cmpyiw(Rss,Rtt*):<<1:r
nd:sat

80-N2040-53 Rev. AA 445


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Rd=cmpyiw(Rss,Rtt):<<1:rn
1 1 1 0 1 0 0 1 1 0 1 s s s s s P P 0 t t t t t 0 0 0 d d d d d
d:sat

1 1 1 0 1 0 0 1 1 1 0 s s s s s P P 0 t t t t t 0 0 0 d d d d d Rd=cmpyrw(Rss,Rtt):<<1:r
nd:sat
Rd=cmpyrw(Rss,Rtt*):<<1:r
1 1 1 0 1 0 0 1 1 1 1 s s s s s P P 0 t t t t t 0 0 0 d d d d d
nd:sat
ICLASS RegType MajOp s5 Parse t5 MinOp x5
1 1 1 0 1 0 1 0 0 1 0 s s s s s P P 0 t t t t t 1 1 0 x x x x x Rxx+=cmpyiw(Rss,Rtt*)
1 1 1 0 1 0 1 0 0 1 1 s s s s s P P 0 t t t t t 0 1 0 x x x x x Rxx+=cmpyiw(Rss,Rtt)
1 1 1 0 1 0 1 0 1 0 0 s s s s s P P 0 t t t t t 0 1 0 x x x x x Rxx+=cmpyrw(Rss,Rtt)
1 1 1 0 1 0 1 0 1 1 0 s s s s s P P 0 t t t t t 0 1 0 x x x x x Rxx+=cmpyrw(Rss,Rtt*)

Field name Description


ICLASS Instruction class
MajOp Major opcode
MinOp Minor opcode
RegType Register type
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
x5 Field to encode register x

80-N2040-53 Rev. AA 446


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector complex multiply real or imaginary


The inputs Rss and Rtt are a vector of two complex values. Each complex value is composed of a
16-bit imaginary portion in the upper halfword and a 16-bit real portion in the lower halfword.
Generate two complex results, either the real result or the imaginary result. These results are
optionally shifted left by 0 to 1 bits, and optionally accumulated with the destination register.

Rxx+=vcmpyi(Rss,Rtt):sat

I R I R Rss

I R I R Rtt

* * * *
32 32 32 32

<<0-1 <<0-1 <<0-1 <<0-1

Add Add

Sat_32 Sat_32 32

32

Imag accumulation Imag accumulation

Rxx

Syntax Behavior
Rdd=vcmpyi(Rss,Rtt)[:<<1]:sa Rdd.w[0]=sat32((Rss.h[1] * Rtt.h[0]) + (Rss.h[0] *
t Rtt.h[1])[<<1]);
Rdd.w[1]=sat32((Rss.h[3] * Rtt.h[2]) + (Rss.h[2] *
Rtt.h[3])[<<1]);
Rdd=vcmpyr(Rss,Rtt)[:<<1]:sa Rdd.w[0]=sat32((Rss.h[0] * Rtt.h[0]) - (Rss.h[1] *
t Rtt.h[1])[<<1]);
Rdd.w[1]=sat32((Rss.h[2] * Rtt.h[2]) - (Rss.h[3] *
Rtt.h[3])[<<1]);

80-N2040-53 Rev. AA 447


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Syntax Behavior
Rxx+=vcmpyi(Rss,Rtt):sat Rxx.w[0]=sat32(Rxx.w[0] + (Rss.h[1] * Rtt.h[0]) +
(Rss.h[0] * Rtt.h[1])<<0);
Rxx.w[1]=sat32(Rxx.w[1] + (Rss.h[3] * Rtt.h[2]) +
(Rss.h[2] * Rtt.h[3])<<0);
Rxx+=vcmpyr(Rss,Rtt):sat Rxx.w[0]=sat32(Rxx.w[0] + (Rss.h[0] * Rtt.h[0]) -
(Rss.h[1] * Rtt.h[1])<<0);
Rxx.w[1]=sat32(Rxx.w[1] + (Rss.h[2] * Rtt.h[2]) -
(Rss.h[3] * Rtt.h[3])<<0);

Class: XTYPE (slots 2,3)

Notes

■ If saturation occurs during execution of this instruction (a result is clamped to either


maximum or minimum values), the OVF bit in the status register is set. OVF remains set until
explicitly cleared by a transfer to the status register.

Intrinsics

Rdd=vcmpyi(Rss,Rtt):<<1:sat Word64 Q6_P_vcmpyi_PP_s1_sat(Word64 Rss, Word64


Rtt)
Rdd=vcmpyi(Rss,Rtt):sat Word64 Q6_P_vcmpyi_PP_sat(Word64 Rss, Word64 Rtt)
Rdd=vcmpyr(Rss,Rtt):<<1:sat Word64 Q6_P_vcmpyr_PP_s1_sat(Word64 Rss, Word64
Rtt)
Rdd=vcmpyr(Rss,Rtt):sat Word64 Q6_P_vcmpyr_PP_sat(Word64 Rss, Word64 Rtt)
Rxx+=vcmpyi(Rss,Rtt):sat Word64 Q6_P_vcmpyiacc_PP_sat(Word64 Rxx, Word64
Rss, Word64 Rtt)
Rxx+=vcmpyr(Rss,Rtt):sat Word64 Q6_P_vcmpyracc_PP_sat(Word64 Rxx, Word64
Rss, Word64 Rtt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse t5 MinOp d5
Rdd=vcmpyr(Rss,Rtt)[:<<N]
1 1 1 0 1 0 0 0 N 0 1 s s s s s P P 0 t t t t t 1 1 0 d d d d d :sat

1 1 1 0 1 0 0 0 N 1 0 s s s s s P P 0 t t t t t 1 1 0 d d d d d Rdd=vcmpyi(Rss,Rtt)[:<<N]
:sat
ICLASS RegType MajOp s5 Parse t5 MinOp x5
1 1 1 0 1 0 1 0 0 0 1 s s s s s P P 0 t t t t t 1 0 0 x x x x x Rxx+=vcmpyr(Rss,Rtt):sat
1 1 1 0 1 0 1 0 0 1 0 s s s s s P P 0 t t t t t 1 0 0 x x x x x Rxx+=vcmpyi(Rss,Rtt):sat

Field name Description


ICLASS Instruction class
MajOp Major opcode
MinOp Minor opcode

80-N2040-53 Rev. AA 448


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Field name Description


RegType Register type
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
x5 Field to encode register x

80-N2040-53 Rev. AA 449


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector complex conjugate


Perform a vector complex conjugate of both complex values in vector Rss by negating the
imaginary halfwords, and placing the result in destination Rdd.

Syntax Behavior
Rdd=vconj(Rss):sat Rdd.h[1]=sat16(-Rss.h[1]);
Rdd.h[0]=Rss.h[0];
Rdd.h[3]=sat16(-Rss.h[3]);
Rdd.h[2]=Rss.h[2];

Class: XTYPE (slots 2,3)

Notes

■ If saturation occurs during execution of this instruction (a result is clamped to either


maximum or minimum values), the OVF bit in the status register is set. OVF remains set until
explicitly cleared by a transfer to the status register.

Intrinsics

Rdd=vconj(Rss):sat Word64 Q6_P_vconj_P_sat(Word64 Rss)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse MinOp d5
1 0 0 0 0 0 0 0 1 0 0 s s s s s P P - - - - - - 1 1 1 d d d d d Rdd=vconj(Rss):sat

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
MajOp Major opcode
MinOp Minor opcode
RegType Register type

80-N2040-53 Rev. AA 450


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector complex rotate


Take the least significant bits of Rt, and use these bits to rotate each of the two complex values in
the source vector a multiple of 90 degrees. Bits 0 and 1 control the rotation factor for word 0, and
bits 2 and 3 control the rotation factor for word 1.
If the rotation control bits are 0, the rotation is 0: the real and imaginary halves of the source
appear unchanged and unmoved in the destination.
If the rotation control bits are 1, the rotation is -pi/2: the real half of the destination gets the
imaginary half of the source, and the imaginary half of the destination gets the negative real half
of the source.
If the rotation control bits are 2, the rotation is pi/2: the real half of the destination gets the
negative imaginary half of the source, and the imaginary half of the destination gets the real half
of the source.
If the rotation control bits are 3, the rotation is pi: the real half of the destination gets the
negative real half of the source, and the imaginary half of the destination gets the negative
imaginary half of the source.

Syntax Behavior
Rdd=vcrotate(Rss,Rt) tmp = Rt[1:0];
if (tmp == 0) {
Rdd.h[0]=Rss.h[0];
Rdd.h[1]=Rss.h[1];
} else if (tmp == 1) {
Rdd.h[0]=Rss.h[1];
Rdd.h[1]=sat16(-Rss.h[0]);
} else if (tmp == 2) {
Rdd.h[0]=sat16(-Rss.h[1]);
Rdd.h[1]=Rss.h[0];
} else {
Rdd.h[0]=sat16(-Rss.h[0]);
Rdd.h[1]=sat16(-Rss.h[1]);
}
tmp = Rt[3:2];
if (tmp == 0) {
Rdd.h[2]=Rss.h[2];
Rdd.h[3]=Rss.h[3];
} else if (tmp == 1) {
Rdd.h[2]=Rss.h[3];
Rdd.h[3]=sat16(-Rss.h[2]);
} else if (tmp == 2) {
Rdd.h[2]=sat16(-Rss.h[3]);
Rdd.h[3]=Rss.h[2];
} else {
Rdd.h[2]=sat16(-Rss.h[2]);
Rdd.h[3]=sat16(-Rss.h[3]);
}

80-N2040-53 Rev. AA 451


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Class: XTYPE (slots 2,3)

Notes

■ If saturation occurs during execution of this instruction (a result is clamped to either


maximum or minimum values), the OVF bit in the status register is set. OVF remains set until
explicitly cleared by a transfer to the status register.

Intrinsics

Rdd=vcrotate(Rss,Rt) Word64 Q6_P_vcrotate_PR(Word64 Rss, Word32


Rt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType Maj s5 Parse t5 Min d5
1 1 0 0 0 0 1 1 1 1 - s s s s s P P - t t t t t 0 0 - d d d d d Rdd=vcrotate(Rss,Rt)

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
Maj Major opcode
Min Minor opcode
RegType Register type

80-N2040-53 Rev. AA 452


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector reduce complex multiply by scalar


Multiply a complex number by a scalar. Rss contains two complex numbers. The real portions are
each multiplied by two scalars contained in register Rt, scaled, summed, optionally accumulated,
saturated, and stored in the lower word of Rdd. A similar operation is done on the two imaginary
portions of Rss.

Rdd=vrcmpys(Rss,Rt):<<1:sat

I R I R Rss

Rt b a b a Rt

* * * *
32 32 32 32

<<1 <<1 <<1 <<1

Add Add

Sat_32 Sat_32

I Rdd

Syntax Behavior
Rdd=vrcmpys(Rss,Rt):<<1:sat if ("Rt & 1") {
Assembler mapped to:
"Rdd=vrcmpys(Rss,Rtt):<<1:sat:raw:hi";
} else {
Assembler mapped to:
"Rdd=vrcmpys(Rss,Rtt):<<1:sat:raw:lo";
}
Rdd=vrcmpys(Rss,Rtt):<<1:sat:raw Rdd.w[1]=sat32((Rss.h[1] * Rtt.w[1].h[0])<<1 +
:hi (Rss.h[3] * Rtt.w[1].h[1])<<1);
Rdd.w[0]=sat32((Rss.h[0] * Rtt.w[1].h[0])<<1 +
(Rss.h[2] * Rtt.w[1].h[1])<<1);

80-N2040-53 Rev. AA 453


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Syntax Behavior
Rdd=vrcmpys(Rss,Rtt):<<1:sat:raw Rdd.w[1]=sat32((Rss.h[1] * Rtt.w[0].h[0])<<1 +
:lo (Rss.h[3] * Rtt.w[0].h[1])<<1);
Rdd.w[0]=sat32((Rss.h[0] * Rtt.w[0].h[0])<<1 +
(Rss.h[2] * Rtt.w[0].h[1])<<1);
Rxx+=vrcmpys(Rss,Rt):<<1:sat if ("Rt & 1") {
Assembler mapped to:
"Rxx+=vrcmpys(Rss,Rtt):<<1:sat:raw:hi";
} else {
Assembler mapped to:
"Rxx+=vrcmpys(Rss,Rtt):<<1:sat:raw:lo";
}
Rxx+=vrcmpys(Rss,Rtt):<<1:sat:ra Rxx.w[1]=sat32(Rxx.w[1] + (Rss.h[1] *
w:hi Rtt.w[1].h[0])<<1 + (Rss.h[3] *
Rtt.w[1].h[1])<<1);
Rxx.w[0]=sat32(Rxx.w[0] + (Rss.h[0] *
Rtt.w[1].h[0])<<1 + (Rss.h[2] *
Rtt.w[1].h[1])<<1);
Rxx+=vrcmpys(Rss,Rtt):<<1:sat:ra Rxx.w[1]=sat32(Rxx.w[1] + (Rss.h[1] *
w:lo Rtt.w[0].h[0])<<1 + (Rss.h[3] *
Rtt.w[0].h[1])<<1);
Rxx.w[0]=sat32(Rxx.w[0] + (Rss.h[0] *
Rtt.w[0].h[0])<<1 + (Rss.h[2] *
Rtt.w[0].h[1])<<1);

Class: XTYPE (slots 2,3)

Notes

■ If saturation occurs during execution of this instruction (a result is clamped to either


maximum or minimum values), the OVF bit in the status register is set. OVF remains set until
explicitly cleared by a transfer to the status register.

Intrinsics

Rdd=vrcmpys(Rss,Rt):<<1:s Word64 Q6_P_vrcmpys_PR_s1_sat(Word64 Rss, Word32 Rt)


at
Rxx+=vrcmpys(Rss,Rt):<<1: Word64 Q6_P_vrcmpysacc_PR_s1_sat(Word64 Rxx, Word64
sat Rss, Word32 Rt)

80-N2040-53 Rev. AA 454


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse t5 MinOp d5
Rdd=vrcmpys(Rss,Rtt):<<1
1 1 1 0 1 0 0 0 1 0 1 s s s s s P P 0 t t t t t 1 0 0 d d d d d :sat:raw:hi

Rdd=vrcmpys(Rss,Rtt):<<1
1 1 1 0 1 0 0 0 1 1 1 s s s s s P P 0 t t t t t 1 0 0 d d d d d
:sat:raw:lo
ICLASS RegType MajOp s5 Parse t5 MinOp x5
Rxx+=vrcmpys(Rss,Rtt):<<
1 1 1 0 1 0 1 0 1 0 1 s s s s s P P 0 t t t t t 1 0 0 x x x x x
1:sat:raw:hi
Rxx+=vrcmpys(Rss,Rtt):<<
1 1 1 0 1 0 1 0 1 1 1 s s s s s P P 0 t t t t t 1 0 0 x x x x x
1:sat:raw:lo

Field name Description


ICLASS Instruction class
MajOp Major opcode
MinOp Minor opcode
RegType Register type
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
x5 Field to encode register x

80-N2040-53 Rev. AA 455


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector reduce complex multiply by scalar with round and pack


Multiply a complex number by scalar. Rss contains two complex numbers. The real portions are
each multiplied by two scalars contained in register Rt, scaled, summed, rounded, and saturated.
The upper 16bits of this result are packed in the lower halfword of Rd. A similar operation is done
on the two imaginary portions of Rss.

Rd=vrcmpys(Rss,Rt):<<1:rnd:sat

I R I R Rss

Rt b a b a Rt

* * * *
32 32 32 32

0x8000 <<0-1 <<0-1 <<0-1 <<0-1 0x8000

Add Add

Sat_32 Sat_32

High 16 bits High 16 bits

I R Rd

Syntax Behavior
Rd=vrcmpys(Rss,Rt):<<1:rnd:sat if ("Rt & 1") {
Assembler mapped to:
"Rd=vrcmpys(Rss,Rtt):<<1:rnd:sat:raw:hi";
} else {
Assembler mapped to:
"Rd=vrcmpys(Rss,Rtt):<<1:rnd:sat:raw:lo";
}
Rd=vrcmpys(Rss,Rtt):<<1:rnd:sat:ra Rd.h[1]=sat32((Rss.h[1] * Rtt.w[1].h[0])<<1 +
w:hi (Rss.h[3] * Rtt.w[1].h[1])<<1 + 0x8000).h[1];
Rd.h[0]=sat32((Rss.h[0] * Rtt.w[1].h[0])<<1 +
(Rss.h[2] * Rtt.w[1].h[1])<<1 + 0x8000).h[1];

80-N2040-53 Rev. AA 456


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Syntax Behavior
Rd=vrcmpys(Rss,Rtt):<<1:rnd:sat:ra Rd.h[1]=sat32((Rss.h[1] * Rtt.w[0].h[0])<<1 +
w:lo (Rss.h[3] * Rtt.w[0].h[1])<<1 + 0x8000).h[1];
Rd.h[0]=sat32((Rss.h[0] * Rtt.w[0].h[0])<<1 +
(Rss.h[2] * Rtt.w[0].h[1])<<1 + 0x8000).h[1];

Class: XTYPE (slots 2,3)

Notes

■ If saturation occurs during execution of this instruction (a result is clamped to either


maximum or minimum values), the OVF bit in the status register is set. OVF remains set until
explicitly cleared by a transfer to the status register.

Intrinsics

Rd=vrcmpys(Rss,Rt):<<1:rnd:s Word32 Q6_R_vrcmpys_PR_s1_rnd_sat(Word64


at Rss, Word32 Rt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse t5 MinOp d5
Rd=vrcmpys(Rss,Rtt):<<1:r
1 1 1 0 1 0 0 1 1 - 1 s s s s s P P 0 t t t t t 1 1 0 d d d d d nd:sat:raw:hi

1 1 1 0 1 0 0 1 1 - 1 s s s s s P P 0 t t t t t 1 1 1 d d d d d Rd=vrcmpys(Rss,Rtt):<<1:r
nd:sat:raw:lo

Field name Description


ICLASS Instruction class
MajOp Major opcode
MinOp Minor opcode
RegType Register type
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 457


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector reduce complex rotate


This instruction is useful for CDMA despreading. An unsigned 2-bit immediate specifies a byte to
use in Rt. Four 2-bit fields in the specified byte each select a rotation amount for one of the four
complex numbers in Rss. The real and imaginary products accumulate and store as a 32-bit
complex number in Rd. Optionally, the destination register can also accumulate.
Rxx += vrcrotate(Rss,Rt,#0)

Rt

1 j -1 -j 1 j -1 -j 1 j -1 -j 1 j -1 -j

mux mux mux mux

Im3 Re3 Im2 Re2 Im1 Re1 Im0 Re0 Rss

* * * *

+ +

I R Rxx

80-N2040-53 Rev. AA 458


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Syntax Behavior
Rdd=vrcrotate(Rss,Rt,#u2) sumr = 0;
sumi = 0;
control = Rt.ub[#u];
for (i = 0; i < 8; i += 2) {
tmpr = Rss.b[i];
tmpi = Rss.b[i+1];
switch (control & 3) {
case 0: sumr += tmpr;
sumi += tmpi;
break;
case 1: sumr += tmpi;
sumi -= tmpr;
break;
case 2: sumr -= tmpi;
sumi += tmpr;
break;
case 3: sumr -= tmpr;
sumi -= tmpi;
break;
}
control = control >> 2;
}
Rdd.w[0]=sumr;
Rdd.w[1]=sumi;
Rxx+=vrcrotate(Rss,Rt,#u2) sumr = 0;
sumi = 0;
control = Rt.ub[#u];
for (i = 0; i < 8; i += 2) {
tmpr = Rss.b[i];
tmpi = Rss.b[i+1];
switch (control & 3) {
case 0: sumr += tmpr;
sumi += tmpi;
break;
case 1: sumr += tmpi;
sumi -= tmpr;
break;
case 2: sumr -= tmpi;
sumi += tmpr;
break;
case 3: sumr -= tmpr;
sumi -= tmpi;
break;
}
control = control >> 2;
}
Rxx.w[0]=Rxx.w[0] + sumr;
Rxx.w[1]=Rxx.w[1] + sumi;

80-N2040-53 Rev. AA 459


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Class: XTYPE (slots 2,3)

Intrinsics

Rdd=vrcrotate(Rss,Rt,#u2 Word64 Q6_P_vrcrotate_PRI(Word64 Rss, Word32


) Rt, Word32 Iu2)
Rxx+=vrcrotate(Rss,Rt,#u Word64 Q6_P_vrcrotateacc_PRI(Word64 Rxx, Word64
2) Rss, Word32 Rt, Word32 Iu2)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType Maj s5 Parse t5 Min d5
1 1 0 0 0 0 1 1 1 1 - s s s s s P P i t t t t t 1 1 i d d d d d Rdd=vrcrotate(Rss,Rt,#u2)
ICLASS RegType Maj s5 Parse t5 x5

1 1 0 0 1 0 1 1 1 0 1 s s s s s P P i t t t t t - - i x x x x x Rxx+=vrcrotate(Rss,Rt,#u2
)

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
x5 Field to encode register x
Maj Major opcode
Min Minor opcode
RegType Register type

80-N2040-53 Rev. AA 460


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

11.10.4 XTYPE FP
The XTYPE FP instruction subclass includes instructions for floating point math.

Floating point addition


Add two floating point values.

Syntax Behavior
Rd=sfadd(Rs,Rt) Rd=Rs+Rt;
Rdd=dfadd(Rss,Rtt) Rdd=Rss+Rtt;

Class: XTYPE (slots 2,3)

Intrinsics

Rd=sfadd(Rs,Rt) Word32 Q6_R_sfadd_RR(Word32 Rs, Word32 Rt)


Rdd=dfadd(Rss,Rtt) Word64 Q6_P_dfadd_PP(Word64 Rss, Word64
Rtt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse t5 MinOp d5
1 1 1 0 1 0 0 0 0 0 0 s s s s s P P 0 t t t t t 0 1 1 d d d d d Rdd=dfadd(Rss,Rtt)
1 1 1 0 1 0 1 1 0 0 0 s s s s s P P 0 t t t t t 0 0 0 d d d d d Rd=sfadd(Rs,Rt)

Field name Description


ICLASS Instruction class
MajOp Major opcode
MinOp Minor opcode
RegType Register type
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 461


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Classify floating point value


Classify floating point values. Classes are normal, subnormal, zero, NaN, or infinity. When the
number is one of the specified classes, return true.

Syntax Behavior
Pd=dfclass(Rss,#u5) Pd = 0;
class = fpclassify(Rss);
if (#u.0 && (class == FP_ZERO)) Pd = 0xff;
if (#u.1 && (class == FP_NORMAL)) Pd =
0xff;
if (#u.2 && (class == FP_SUBNORMAL)) Pd =
0xff;
if (#u.3 && (class == FP_INFINITE)) Pd =
0xff;
if (#u.4 && (class == FP_NAN)) Pd = 0xff;
cancel_flags();
Pd=sfclass(Rs,#u5) Pd = 0;
class = fpclassify(Rs);
if (#u.0 && (class == FP_ZERO)) Pd = 0xff;
if (#u.1 && (class == FP_NORMAL)) Pd =
0xff;
if (#u.2 && (class == FP_SUBNORMAL)) Pd =
0xff;
if (#u.3 && (class == FP_INFINITE)) Pd =
0xff;
if (#u.4 && (class == FP_NAN)) Pd = 0xff;
cancel_flags();

Class: XTYPE (slots 2,3)

Intrinsics

Pd=dfclass(Rss,#u5) Byte Q6_p_dfclass_PI(Word64 Rss, Word32


Iu5)
Pd=sfclass(Rs,#u5) Byte Q6_p_sfclass_RI(Word32 Rs, Word32 Iu5)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse d2
1 0 0 0 0 1 0 1 1 1 1 s s s s s P P 0 i i i i i - - - - - - d d Pd=sfclass(Rs,#u5)
ICLASS RegType s5 Parse d2
1 1 0 1 1 1 0 0 1 0 0 s s s s s P P - 0 0 0 i i i i i 1 0 - d d Pd=dfclass(Rss,#u5)

Field name Description


RegType Register type
MajOp Major opcode
ICLASS Instruction class

80-N2040-53 Rev. AA 462


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Field name Description


Parse Packet/loop parse bits
d2 Field to encode register d
s5 Field to encode register s

80-N2040-53 Rev. AA 463


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Compare floating point value


Compare floating point values. p0 returns true when at least one value is a NaN, otherwise p0
returns zero.

Syntax Behavior
Pd=dfcmp.eq(Rss,Rtt) Pd=Rss==Rtt ? 0xff : 0x00;
Pd=dfcmp.ge(Rss,Rtt) Pd=Rss>=Rtt ? 0xff : 0x00;
Pd=dfcmp.gt(Rss,Rtt) Pd=Rss>Rtt ? 0xff : 0x00;
Pd=dfcmp.uo(Rss,Rtt) Pd=isunordered(Rss,Rtt) ? 0xff : 0x00;
Pd=sfcmp.eq(Rs,Rt) Pd=Rs==Rt ? 0xff : 0x00;
Pd=sfcmp.ge(Rs,Rt) Pd=Rs>=Rt ? 0xff : 0x00;
Pd=sfcmp.gt(Rs,Rt) Pd=Rs>Rt ? 0xff : 0x00;
Pd=sfcmp.uo(Rs,Rt) Pd=isunordered(Rs,Rt) ? 0xff : 0x00;

Class: XTYPE (slots 2,3)

Intrinsics

Pd=dfcmp.eq(Rss,Rtt) Byte Q6_p_dfcmp_eq_PP(Word64 Rss, Word64


Rtt)
Pd=dfcmp.ge(Rss,Rtt) Byte Q6_p_dfcmp_ge_PP(Word64 Rss, Word64
Rtt)
Pd=dfcmp.gt(Rss,Rtt) Byte Q6_p_dfcmp_gt_PP(Word64 Rss, Word64
Rtt)
Pd=dfcmp.uo(Rss,Rtt) Byte Q6_p_dfcmp_uo_PP(Word64 Rss, Word64
Rtt)
Pd=sfcmp.eq(Rs,Rt) Byte Q6_p_sfcmp_eq_RR(Word32 Rs, Word32 Rt)
Pd=sfcmp.ge(Rs,Rt) Byte Q6_p_sfcmp_ge_RR(Word32 Rs, Word32 Rt)
Pd=sfcmp.gt(Rs,Rt) Byte Q6_p_sfcmp_gt_RR(Word32 Rs, Word32 Rt)
Pd=sfcmp.uo(Rs,Rt) Byte Q6_p_sfcmp_uo_RR(Word32 Rs, Word32 Rt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType Maj s5 Parse t5 Min d2
1 1 0 0 0 1 1 1 1 1 1 s s s s s P P - t t t t t 0 0 0 - - - d d Pd=sfcmp.ge(Rs,Rt)
1 1 0 0 0 1 1 1 1 1 1 s s s s s P P - t t t t t 0 0 1 - - - d d Pd=sfcmp.uo(Rs,Rt)
1 1 0 0 0 1 1 1 1 1 1 s s s s s P P - t t t t t 0 1 1 - - - d d Pd=sfcmp.eq(Rs,Rt)
1 1 0 0 0 1 1 1 1 1 1 s s s s s P P - t t t t t 1 0 0 - - - d d Pd=sfcmp.gt(Rs,Rt)

80-N2040-53 Rev. AA 464


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType s5 Parse t5 MinOp d2
1 1 0 1 0 0 1 0 1 1 1 s s s s s P P - t t t t t 0 0 0 - - - d d Pd=dfcmp.eq(Rss,Rtt)
1 1 0 1 0 0 1 0 1 1 1 s s s s s P P - t t t t t 0 0 1 - - - d d Pd=dfcmp.gt(Rss,Rtt)
1 1 0 1 0 0 1 0 1 1 1 s s s s s P P - t t t t t 0 1 0 - - - d d Pd=dfcmp.ge(Rss,Rtt)
1 1 0 1 0 0 1 0 1 1 1 s s s s s P P - t t t t t 0 1 1 - - - d d Pd=dfcmp.uo(Rss,Rtt)

Field name Description


RegType Register type
MinOp Minor opcode
ICLASS Instruction class
Parse Packet/loop parse bits
d2 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
Maj Major opcode
Min Minor opcode

80-N2040-53 Rev. AA 465


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Convert floating point value to other format


Convert floating point values. When rounding is required, it occurs according to the rounding
mode.

Syntax Behavior
Rd=convert_df2sf(Rss) Rd = conv_df_to_sf(Rss);
Rdd=convert_sf2df(Rs) Rdd = conv_sf_to_df(Rs);

Class: XTYPE (slots 2,3)

Intrinsics

Rd=convert_df2sf(Rss) Word32 Q6_R_convert_df2sf_P(Word64 Rss)


Rdd=convert_sf2df(Rs) Word64 Q6_P_convert_sf2df_R(Word32 Rs)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse MinOp d5
1 0 0 0 0 1 0 0 1 - - s s s s s P P - - - - - - 0 0 0 d d d d d Rdd=convert_sf2df(Rs)
1 0 0 0 1 0 0 0 0 0 0 s s s s s P P - - - - - - 0 0 1 d d d d d Rd=convert_df2sf(Rss)

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
MajOp Major opcode
MinOp Minor opcode
RegType Register type

80-N2040-53 Rev. AA 466


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Convert integer to floating point value


Convert floating point values. When rounding is required, it occurs according to the rounding
mode unless the :chop option is specified.
Syntax Behavior
Rd=convert_d2sf(Rss) Rd = conv_8s_to_sf(Rss.s64);
Rd=convert_ud2sf(Rss) Rd = conv_8u_to_sf(Rss.u64);
Rd=convert_uw2sf(Rs) Rd = conv_4u_to_sf(Rs.uw[0]);
Rd=convert_w2sf(Rs) Rd = conv_4s_to_sf(Rs.s32);
Rdd=convert_d2df(Rss) Rdd = conv_8s_to_df(Rss.s64);
Rdd=convert_ud2df(Rss) Rdd = conv_8u_to_df(Rss.u64);
Rdd=convert_uw2df(Rs) Rdd = conv_4u_to_df(Rs.uw[0]);
Rdd=convert_w2df(Rs) Rdd = conv_4s_to_df(Rs.s32);

Class: XTYPE (slots 2,3)

Intrinsics

Rd=convert_d2sf(Rss) Word32 Q6_R_convert_d2sf_P(Word64 Rss)


Rd=convert_ud2sf(Rss) Word32 Q6_R_convert_ud2sf_P(Word64 Rss)
Rd=convert_uw2sf(Rs) Word32 Q6_R_convert_uw2sf_R(Word32 Rs)
Rd=convert_w2sf(Rs) Word32 Q6_R_convert_w2sf_R(Word32 Rs)
Rdd=convert_d2df(Rss) Word64 Q6_P_convert_d2df_P(Word64 Rss)
Rdd=convert_ud2df(Rss) Word64 Q6_P_convert_ud2df_P(Word64 Rss)
Rdd=convert_uw2df(Rs) Word64 Q6_P_convert_uw2df_R(Word32 Rs)
Rdd=convert_w2df(Rs) Word64 Q6_P_convert_w2df_R(Word32 Rs)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse MinOp d5
1 0 0 0 0 0 0 0 1 1 1 s s s s s P P 0 - - - - - 0 1 0 d d d d d Rdd=convert_ud2df(Rss)
1 0 0 0 0 0 0 0 1 1 1 s s s s s P P 0 - - - - - 0 1 1 d d d d d Rdd=convert_d2df(Rss)
1 0 0 0 0 1 0 0 1 - - s s s s s P P - - - - - - 0 0 1 d d d d d Rdd=convert_uw2df(Rs)
1 0 0 0 0 1 0 0 1 - - s s s s s P P - - - - - - 0 1 0 d d d d d Rdd=convert_w2df(Rs)
1 0 0 0 1 0 0 0 0 0 1 s s s s s P P - - - - - - 0 0 1 d d d d d Rd=convert_ud2sf(Rss)
1 0 0 0 1 0 0 0 0 1 0 s s s s s P P - - - - - - 0 0 1 d d d d d Rd=convert_d2sf(Rss)
1 0 0 0 1 0 1 1 0 0 1 s s s s s P P - - - - - - 0 0 0 d d d d d Rd=convert_uw2sf(Rs)
1 0 0 0 1 0 1 1 0 1 0 s s s s s P P - - - - - - 0 0 0 d d d d d Rd=convert_w2sf(Rs)

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d

80-N2040-53 Rev. AA 467


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Field name Description


s5 Field to encode register s
MajOp Major opcode
MinOp Minor opcode
RegType Register type

80-N2040-53 Rev. AA 468


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Convert floating point value to integer


Convert floating point values. When rounding is required, it occurs according to the rounding
mode unless the :chop option is specified.
If the value is out of range of the destination integer type, the invalid flag is raised and closest
integer is chosen, including for infinite inputs. For NaN inputs, the invalid flag is also raised, and
the output value is implementation defined.

Syntax Behavior
Rd=convert_df2uw(Rss) Rd = conv_df_to_4u(Rss).uw[0];
Rd=convert_df2uw(Rss):chop round_to_zero();
Rd = conv_df_to_4u(Rss).uw[0];
Rd=convert_df2w(Rss) Rd = conv_df_to_4s(Rss).s32;
Rd=convert_df2w(Rss):chop round_to_zero();
Rd = conv_df_to_4s(Rss).s32;
Rd=convert_sf2uw(Rs) Rd = conv_sf_to_4u(Rs).uw[0];
Rd=convert_sf2uw(Rs):chop round_to_zero();
Rd = conv_sf_to_4u(Rs).uw[0];
Rd=convert_sf2w(Rs) Rd = conv_sf_to_4s(Rs).s32;
Rd=convert_sf2w(Rs):chop round_to_zero();
Rd = conv_sf_to_4s(Rs).s32;
Rdd=convert_df2d(Rss) Rdd = conv_df_to_8s(Rss).s64;
Rdd=convert_df2d(Rss):chop round_to_zero();
Rdd = conv_df_to_8s(Rss).s64;
Rdd=convert_df2ud(Rss) Rdd = conv_df_to_8u(Rss).u64;
Rdd=convert_df2ud(Rss):chop round_to_zero();
Rdd = conv_df_to_8u(Rss).u64;
Rdd=convert_sf2d(Rs) Rdd = conv_sf_to_8s(Rs).s64;
Rdd=convert_sf2d(Rs):chop round_to_zero();
Rdd = conv_sf_to_8s(Rs).s64;
Rdd=convert_sf2ud(Rs) Rdd = conv_sf_to_8u(Rs).u64;
Rdd=convert_sf2ud(Rs):chop round_to_zero();
Rdd = conv_sf_to_8u(Rs).u64;

80-N2040-53 Rev. AA 469


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Class: XTYPE (slots 2,3)

Intrinsics

Rd=convert_df2uw(Rss) Word32 Q6_R_convert_df2uw_P(Word64 Rss)


Rd=convert_df2uw(Rss):chop Word32 Q6_R_convert_df2uw_P_chop(Word64
Rss)
Rd=convert_df2w(Rss) Word32 Q6_R_convert_df2w_P(Word64 Rss)
Rd=convert_df2w(Rss):chop Word32 Q6_R_convert_df2w_P_chop(Word64 Rss)
Rd=convert_sf2uw(Rs) Word32 Q6_R_convert_sf2uw_R(Word32 Rs)
Rd=convert_sf2uw(Rs):chop Word32 Q6_R_convert_sf2uw_R_chop(Word32 Rs)
Rd=convert_sf2w(Rs) Word32 Q6_R_convert_sf2w_R(Word32 Rs)
Rd=convert_sf2w(Rs):chop Word32 Q6_R_convert_sf2w_R_chop(Word32 Rs)
Rdd=convert_df2d(Rss) Word64 Q6_P_convert_df2d_P(Word64 Rss)
Rdd=convert_df2d(Rss):chop Word64 Q6_P_convert_df2d_P_chop(Word64 Rss)
Rdd=convert_df2ud(Rss) Word64 Q6_P_convert_df2ud_P(Word64 Rss)
Rdd=convert_df2ud(Rss):chop Word64 Q6_P_convert_df2ud_P_chop(Word64
Rss)
Rdd=convert_sf2d(Rs) Word64 Q6_P_convert_sf2d_R(Word32 Rs)
Rdd=convert_sf2d(Rs):chop Word64 Q6_P_convert_sf2d_R_chop(Word32 Rs)
Rdd=convert_sf2ud(Rs) Word64 Q6_P_convert_sf2ud_R(Word32 Rs)
Rdd=convert_sf2ud(Rs):chop Word64 Q6_P_convert_sf2ud_R_chop(Word32 Rs)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse MinOp d5
1 0 0 0 0 0 0 0 1 1 1 s s s s s P P 0 - - - - - 0 0 0 d d d d d Rdd=convert_df2d(Rss)
1 0 0 0 0 0 0 0 1 1 1 s s s s s P P 0 - - - - - 0 0 1 d d d d d Rdd=convert_df2ud(Rss)

1 0 0 0 0 0 0 0 1 1 1 s s s s s P P 0 - - - - - 1 1 0 d d d d d Rdd=convert_df2d(Rss):ch
op
Rdd=convert_df2ud(Rss):c
1 0 0 0 0 0 0 0 1 1 1 s s s s s P P 0 - - - - - 1 1 1 d d d d d hop

1 0 0 0 0 1 0 0 1 - - s s s s s P P - - - - - - 0 1 1 d d d d d Rdd=convert_sf2ud(Rs)
1 0 0 0 0 1 0 0 1 - - s s s s s P P - - - - - - 1 0 0 d d d d d Rdd=convert_sf2d(Rs)

1 0 0 0 0 1 0 0 1 - - s s s s s P P - - - - - - 1 0 1 d d d d d Rdd=convert_sf2ud(Rs):ch
op
Rdd=convert_sf2d(Rs):cho
1 0 0 0 0 1 0 0 1 - - s s s s s P P - - - - - - 1 1 0 d d d d d p

1 0 0 0 1 0 0 0 0 1 1 s s s s s P P - - - - - - 0 0 1 d d d d d Rd=convert_df2uw(Rss)
1 0 0 0 1 0 0 0 1 0 0 s s s s s P P - - - - - - 0 0 1 d d d d d Rd=convert_df2w(Rss)

1 0 0 0 1 0 0 0 1 0 1 s s s s s P P - - - - - - 0 0 1 d d d d d Rd=convert_df2uw(Rss):ch
op

80-N2040-53 Rev. AA 470


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Rd=convert_df2w(Rss):cho
1 0 0 0 1 0 0 0 1 1 1 s s s s s P P - - - - - - 0 0 1 d d d d d
p
1 0 0 0 1 0 1 1 0 1 1 s s s s s P P - - - - - - 0 0 0 d d d d d Rd=convert_sf2uw(Rs)
Rd=convert_sf2uw(Rs):cho
1 0 0 0 1 0 1 1 0 1 1 s s s s s P P - - - - - - 0 0 1 d d d d d
p
1 0 0 0 1 0 1 1 1 0 0 s s s s s P P - - - - - - 0 0 0 d d d d d Rd=convert_sf2w(Rs)
1 0 0 0 1 0 1 1 1 0 0 s s s s s P P - - - - - - 0 0 1 d d d d d Rd=convert_sf2w(Rs):chop

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
MajOp Major opcode
MinOp Minor opcode
RegType Register type

80-N2040-53 Rev. AA 471


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Floating point extreme value assistance


For divide and square root routines, certain values are problematic for the default routine. These
instructions appropriately fix up the numerator (fixupn), denominator (fixupd), or radicand
(fixupr) for proper calculations when combined with the divide or square root approximation
instructions.

Syntax Behavior
Rd=sffixupd(Rs,Rt) (Rs,Rt,Rd,adjust)=recip_common(Rs,Rt);
Rd = Rt;
Rd=sffixupn(Rs,Rt) (Rs,Rt,Rd,adjust)=recip_common(Rs,Rt);
Rd = Rs;
Rd=sffixupr(Rs) (Rs,Rd,adjust)=invsqrt_common(Rs);
Rd = Rs;

Class: XTYPE (slots 2,3)

Intrinsics

Rd=sffixupd(Rs,Rt) Word32 Q6_R_sffixupd_RR(Word32 Rs, Word32


Rt)
Rd=sffixupn(Rs,Rt) Word32 Q6_R_sffixupn_RR(Word32 Rs, Word32
Rt)
Rd=sffixupr(Rs) Word32 Q6_R_sffixupr_R(Word32 Rs)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse MinOp d5
1 0 0 0 1 0 1 1 1 0 1 s s s s s P P - - - - - - 0 0 0 d d d d d Rd=sffixupr(Rs)
ICLASS RegType MajOp s5 Parse t5 MinOp d5
1 1 1 0 1 0 1 1 1 1 0 s s s s s P P 0 t t t t t 0 0 0 d d d d d Rd=sffixupn(Rs,Rt)
1 1 1 0 1 0 1 1 1 1 0 s s s s s P P 0 t t t t t 0 0 1 d d d d d Rd=sffixupd(Rs,Rt)

Field name Description


ICLASS Instruction class
MajOp Major opcode
MinOp Minor opcode
RegType Register type
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 472


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Floating point fused multiply-add


Multiply two values, and add to (or subtract from) the accumulator. Full intermediate precision is
kept.

Syntax Behavior
Rx+=sfmpy(Rs,Rt) Rx=fmaf(Rs,Rt,Rx);
Rx-=sfmpy(Rs,Rt) Rx=fmaf(-Rs,Rt,Rx);
Rxx+=dfmpyhh(Rss,Rtt) Rxx = Rss*Rtt with partial product Rxx;
Rxx+=dfmpylh(Rss,Rtt) Rxx += (Rss.uw[0] * (0x00100000 | zxt20->64(Rtt.uw[1])))
<< 1;

Class: XTYPE (slots 2,3)

Intrinsics

Rx+=sfmpy(Rs,Rt) Word32 Q6_R_sfmpyacc_RR(Word32 Rx, Word32 Rs, Word32


Rt)
Rx-=sfmpy(Rs,Rt) Word32 Q6_R_sfmpynac_RR(Word32 Rx, Word32 Rs, Word32
Rt)
Rxx+=dfmpyhh(Rss,Rtt) Word64 Q6_P_dfmpyhhacc_PP(Word64 Rxx, Word64 Rss,
Word64 Rtt)
Rxx+=dfmpylh(Rss,Rtt) Word64 Q6_P_dfmpylhacc_PP(Word64 Rxx, Word64 Rss,
Word64 Rtt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse t5 MinOp x5
1 1 1 0 1 0 1 0 0 0 0 s s s s s P P 0 t t t t t 0 1 1 x x x x x Rxx+=dfmpylh(Rss,Rtt)
1 1 1 0 1 0 1 0 1 0 0 s s s s s P P 0 t t t t t 0 1 1 x x x x x Rxx+=dfmpyhh(Rss,Rtt)
1 1 1 0 1 1 1 1 0 0 0 s s s s s P P 0 t t t t t 1 0 0 x x x x x Rx+=sfmpy(Rs,Rt)
1 1 1 0 1 1 1 1 0 0 0 s s s s s P P 0 t t t t t 1 0 1 x x x x x Rx-=sfmpy(Rs,Rt)

Field name Description


ICLASS Instruction class
MajOp Major opcode
MinOp Minor opcode
RegType Register type
Parse Packet/loop parse bits
s5 Field to encode register s
t5 Field to encode register t
x5 Field to encode register x

80-N2040-53 Rev. AA 473


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Floating point fused multiply-add with scaling


Multiply two values, and add to (or subtract from) the accumulator. Full intermediate precision is
kept. Additionally, scale the output.
This instruction has special handling of corner cases. If a multiplicand source is zero and a NaN is
not produced, the accumulator is unchanged; thus the sign of a zero accumulator does not
change when the product is a true zero.
For single precision, the scaling factor is the predicate taken as a two's compliment number.
For double precision, the scaling factor is twice the predicate taken as a two's compliment
number. The implementation can change denormal accumulator values to zero for positive scale
factors.

Syntax Behavior
Rx+=sfmpy(Rs,Rt,Pu):scal PREDUSE_TIMING;
e if (isnan(Rx) || isnan(Rs) || isnan(Rt)) Rx =
NaN;
tmp=fmaf(Rs,Rt,Rx) * 2**(Pu);
if (!((Rx == 0.0) && is_true_zero(Rs*Rt))) Rx =
tmp;

Class: XTYPE (slots 2,3)

Intrinsics

Rx+=sfmpy(Rs,Rt,Pu):scale Word32 Q6_R_sfmpyacc_RRp_scale(Word32 Rx,


Word32 Rs, Word32 Rt, Byte Pu)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse t5 u2 x5

1 1 1 0 1 1 1 1 0 1 1 s s s s s P P 0 t t t t t 1 u u x x x x x Rx+=sfmpy(Rs,Rt,Pu):scal
e

Field name Description


ICLASS Instruction class
MajOp Major opcode
MinOp Minor opcode
RegType Register type
Parse Packet/loop parse bits
s5 Field to encode register s
t5 Field to encode register t
u2 Field to encode register u
x5 Field to encode register x

80-N2040-53 Rev. AA 474


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Floating point reciprocal square root approximation


Provides an approximation of the reciprocal square root of the radicand (Rs), if combined with the
appropriate fixup instruction. Certain values (such as infinities or zeros) in the numerator or
denominator can yield values that are not reciprocal approximations, but yield the correct answer
when combined with fixup instructions and the appropriate routines.
For compatibility, exact results of these instructions cannot be relied on. The precision of the
approximation for this architecture and later is at least 6.6 bits.

Syntax Behavior
Rd,Pe=sfinvsqrta(Rs) if ((Rs,Rd,adjust)=invsqrt_common(Rs)) {
Pe = adjust;
idx = (Rs >> 17) & 0x7f;
mant = (invsqrt_lut[idx] << 15);
exp = 127 - ((exponent(Rs) - 127) >> 1)
- 1;
Rd = -1**Rs.31 * 1.MANT * 2**(exp-
BIAS);
}

Class: XTYPE (slots 2,3)

Notes

■ This instruction provides a certain amount of accuracy. In future versions the accuracy may
increase. For future compatibility, avoid dependence on exact values.
■ The predicate generated by this instruction cannot be used as a .new predicate, nor can it be
automatically ANDed with another predicate.

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse e2 d5
1 0 0 0 1 0 1 1 1 1 1 s s s s s P P - - - - - - 0 e e d d d d d Rd,Pe=sfinvsqrta(Rs)

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
e2 Field to encode register e
s5 Field to encode register s
MajOp Major opcode
MinOp Minor opcode
RegType Register type

80-N2040-53 Rev. AA 475


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Floating point fused multiply-add for library routines


Multiply two values, and add to (or subtract from) the accumulator. Full intermediate precision is
kept. This instruction has special handling of corner cases. Addition of infinities with opposite
signs, or subtraction of infinities with like signs, is defined as (positive) zero. Rounding is always
nearest-even, except for overflows to infinity, which round to maximal finite values. If a
multiplicand source is zero and a NaN is not produced, the accumulator is unchanged; thus the
sign of a zero accumulator does not change if the product is a true zero. Flags and exceptions do
not generate.
Syntax Behavior
Rx+=sfmpy(Rs,Rt):l round_to_nearest();
ib infminusinf = ((isinf(Rx)) && (isinf(Rs*Rt)) && (Rs ^ Rx ^
Rt.31 != 0));
infinp = (isinf(Rx)) || (isinf(Rt)) || (isinf(Rs));
if (isnan(Rx) || isnan(Rs) || isnan(Rt)) Rx = NaN;
tmp=fmaf(Rs,Rt,Rx);
if (!((Rx == 0.0) && is_true_zero(Rs*Rt))) Rx = tmp;
cancel_flags();
if (isinf(Rx) && !infinp) Rx = Rx - 1;
if (infminusinf) Rx = 0;
Rx- round_to_nearest();
=sfmpy(Rs,Rt):lib infminusinf = ((isinf(Rx)) && (isinf(Rs*Rt)) && (Rs ^ Rx ^
Rt.31 == 0));
infinp = (isinf(Rx)) || (isinf(Rt)) || (isinf(Rs));
if (isnan(Rx) || isnan(Rs) || isnan(Rt)) Rx = NaN;
tmp=fmaf(-Rs,Rt,Rx);
if (!((Rx == 0.0) && is_true_zero(Rs*Rt))) Rx = tmp;
cancel_flags();
if (isinf(Rx) && !infinp) Rx = Rx - 1;
if (infminusinf) Rx = 0;

Class: XTYPE (slots 2,3)

Intrinsics

Rx+=sfmpy(Rs,Rt):lib Word32 Q6_R_sfmpyacc_RR_lib(Word32 Rx, Word32 Rs,


Word32 Rt)
Rx-=sfmpy(Rs,Rt):lib Word32 Q6_R_sfmpynac_RR_lib(Word32 Rx, Word32 Rs,
Word32 Rt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse t5 MinOp x5
1 1 1 0 1 1 1 1 0 0 0 s s s s s P P 0 t t t t t 1 1 0 x x x x x Rx+=sfmpy(Rs,Rt):lib
1 1 1 0 1 1 1 1 0 0 0 s s s s s P P 0 t t t t t 1 1 1 x x x x x Rx-=sfmpy(Rs,Rt):lib

Field name Description


ICLASS Instruction class
MajOp Major opcode
MinOp Minor opcode

80-N2040-53 Rev. AA 476


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Field name Description


RegType Register type
Parse Packet/loop parse bits
s5 Field to encode register s
t5 Field to encode register t
x5 Field to encode register x

80-N2040-53 Rev. AA 477


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Create floating-point constant


Using ten bits of immediate, form a floating-point constant.
Syntax Behavior
Rd=sfmake(#u10):neg Rd = (127 - 6) << 23;
Rd += (#u << 17);
Rd |= (1 << 31);
Rd=sfmake(#u10):pos Rd = (127 - 6) << 23;
Rd += #u << 17;
Rdd=dfmake(#u10):neg Rdd = (1023ULL - 6) << 52;
Rdd += (#u) << 46;
Rdd |= ((1ULL) << 63);
Rdd=dfmake(#u10):pos Rdd = (1023ULL - 6) << 52;
Rdd += (#u) << 46;

Class: XTYPE (slots 2,3)

Intrinsics

Rd=sfmake(#u10):neg Word32 Q6_R_sfmake_I_neg(Word32 Iu10)


Rd=sfmake(#u10):pos Word32 Q6_R_sfmake_I_pos(Word32 Iu10)
Rdd=dfmake(#u10):neg Word64 Q6_P_dfmake_I_neg(Word32 Iu10)
Rdd=dfmake(#u10):pos Word64 Q6_P_dfmake_I_pos(Word32 Iu10)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType Parse MinOp d5
1 1 0 1 0 1 1 0 0 0 i - - - - - P P i i i i i i i i i d d d d d Rd=sfmake(#u10):pos
1 1 0 1 0 1 1 0 0 1 i - - - - - P P i i i i i i i i i d d d d d Rd=sfmake(#u10):neg
ICLASS RegType Parse d5
1 1 0 1 1 0 0 1 0 0 i - - - - - P P i i i i i i i i i d d d d d Rdd=dfmake(#u10):pos
1 1 0 1 1 0 0 1 0 1 i - - - - - P P i i i i i i i i i d d d d d Rdd=dfmake(#u10):neg

Field name Description


RegType Register type
MinOp Minor opcode
ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d

80-N2040-53 Rev. AA 478


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Floating point maximum


Maximum of two floating point values. If one value is a NaN, the other is chosen.

Syntax Behavior
Rd=sfmax(Rs,Rt) Rd = fmaxf(Rs,Rt);
Rdd=dfmax(Rss,Rtt) Rdd = fmax(Rss,Rtt);

Class: XTYPE (slots 2,3)

Intrinsics

Rd=sfmax(Rs,Rt) Word32 Q6_R_sfmax_RR(Word32 Rs, Word32 Rt)


Rdd=dfmax(Rss,Rtt) Word64 Q6_P_dfmax_PP(Word64 Rss, Word64
Rtt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse t5 MinOp d5
1 1 1 0 1 0 0 0 0 0 1 s s s s s P P 0 t t t t t 0 1 1 d d d d d Rdd=dfmax(Rss,Rtt)
1 1 1 0 1 0 1 1 1 0 0 s s s s s P P 0 t t t t t 0 0 0 d d d d d Rd=sfmax(Rs,Rt)

Field name Description


ICLASS Instruction class
MajOp Major opcode
MinOp Minor opcode
RegType Register type
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 479


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Floating point minimum


Minimum of two floating point values. If one value is a NaN, the other is chosen.

Syntax Behavior
Rd=sfmin(Rs,Rt) Rd = fmin(Rs,Rt);
Rdd=dfmin(Rss,Rtt) Rdd = fmin(Rss,Rtt);

Class: XTYPE (slots 2,3)

Intrinsics

Rd=sfmin(Rs,Rt) Word32 Q6_R_sfmin_RR(Word32 Rs, Word32 Rt)


Rdd=dfmin(Rss,Rtt) Word64 Q6_P_dfmin_PP(Word64 Rss, Word64
Rtt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse t5 MinOp d5
1 1 1 0 1 0 0 0 1 1 0 s s s s s P P 0 t t t t t 0 1 1 d d d d d Rdd=dfmin(Rss,Rtt)
1 1 1 0 1 0 1 1 1 0 0 s s s s s P P 0 t t t t t 0 0 1 d d d d d Rd=sfmin(Rs,Rt)

Field name Description


ICLASS Instruction class
MajOp Major opcode
MinOp Minor opcode
RegType Register type
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 480


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Floating point multiply


Multiply two floating point values.

Syntax Behavior
Rd=sfmpy(Rs,Rt) Rd=Rs*Rt;
Rdd=dfmpyfix(Rss,R if (is_denormal(Rss) && (df_exponent(Rtt) >= 512) &&
tt) is_normal(Rtt)) Rdd = Rss * 0x1.0p52;
else if (is_denormal(Rtt) && (df_exponent(Rss) >= 512) &&
is_normal(Rss)) Rdd = Rss * 0x1.0p-52;
else Rdd = Rss;
Rdd=dfmpyll(Rss,Rt prod = (Rss.uw[0] * Rtt.uw[0]);
t) Rdd = (prod >> 32) << 1;
if (prod.uw[0] != 0) Rdd.0 = 1;

Class: XTYPE (slots 2,3)

Intrinsics

Rd=sfmpy(Rs,Rt) Word32 Q6_R_sfmpy_RR(Word32 Rs, Word32 Rt)


Rdd=dfmpyfix(Rss,Rtt) Word64 Q6_P_dfmpyfix_PP(Word64 Rss, Word64
Rtt)
Rdd=dfmpyll(Rss,Rtt) Word64 Q6_P_dfmpyll_PP(Word64 Rss, Word64
Rtt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse t5 MinOp d5
1 1 1 0 1 0 0 0 0 1 0 s s s s s P P 0 t t t t t 0 1 1 d d d d d Rdd=dfmpyfix(Rss,Rtt)
1 1 1 0 1 0 0 0 1 0 1 s s s s s P P 0 t t t t t 0 1 1 d d d d d Rdd=dfmpyll(Rss,Rtt)
1 1 1 0 1 0 1 1 0 1 0 s s s s s P P 0 t t t t t 0 0 0 d d d d d Rd=sfmpy(Rs,Rt)

Field name Description


ICLASS Instruction class
MajOp Major opcode
MinOp Minor opcode
RegType Register type
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 481


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Floating point reciprocal approximation


Provides an approximation of the reciprocal of the denominator (Rt), if combined with the
appropriate fixup instructions. Certain values (such as infinities or zeros) in the numerator or
denominator may yield values that are not reciprocal approximations, but yield the correct
answer when combined with fixup instructions and the appropriate routines.
For compatibility, exact results of these instructions cannot be relied on. The precision of the
approximation for this architecture and later is at least 6.6 bits.

Syntax Behavior
Rd,Pe=sfrecipa(Rs,Rt) if ((Rs,Rt,Rd,adjust)=recip_common(Rs,Rt))
{
Pe = adjust;
idx = (Rt >> 16) & 0x7f;
mant = (recip_lut[idx] << 15) | 1;
exp = 127 - (exponent(Rt) - 127) - 1;
Rd = -1**Rt.31 * 1.MANT * 2**(exp-
BIAS);
}

Class: XTYPE (slots 2,3)

Notes

■ This instruction provides a certain amount of accuracy. In future versions the accuracy may
increase. For future compatibility, avoid dependence on exact values.
■ The predicate generated by this instruction cannot be used as a .new predicate, nor can it be
automatically ANDed with another predicate.

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse t5 e2 d5
1 1 1 0 1 0 1 1 1 1 1 s s s s s P P 0 t t t t t 1 e e d d d d d Rd,Pe=sfrecipa(Rs,Rt)

Field name Description


ICLASS Instruction class
MajOp Major opcode
MinOp Minor opcode
RegType Register type
Parse Packet/loop parse bits
d5 Field to encode register d
e2 Field to encode register e
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 482


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Floating point subtraction


Subtract two floating point values.

Syntax Behavior
Rd=sfsub(Rs,Rt) Rd=Rs-Rt;
Rdd=dfsub(Rss,Rtt) Rdd=Rss-Rtt;

Class: XTYPE (slots 2,3)

Intrinsics

Rd=sfsub(Rs,Rt) Word32 Q6_R_sfsub_RR(Word32 Rs, Word32 Rt)


Rdd=dfsub(Rss,Rtt) Word64 Q6_P_dfsub_PP(Word64 Rss, Word64
Rtt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse t5 MinOp d5
1 1 1 0 1 0 0 0 1 0 0 s s s s s P P 0 t t t t t 0 1 1 d d d d d Rdd=dfsub(Rss,Rtt)
1 1 1 0 1 0 1 1 0 0 0 s s s s s P P 0 t t t t t 0 0 1 d d d d d Rd=sfsub(Rs,Rt)

Field name Description


ICLASS Instruction class
MajOp Major opcode
MinOp Minor opcode
RegType Register type
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 483


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

11.10.5 XTYPE MPY


The XTYPE MPY instruction subclass includes instructions which perform multiplication.

Multiply and use lower result


Multiply the signed 32-bit integer in Rs by either the signed 32-bit integer in Rt or an unsigned
immediate value. The 64-bit result optionally accumulates with the 32-bit destination, or is added
to an immediate. The least-significant 32-bits of the result are written to the single destination
register.
This multiply produces the correct results for the ANSI C multiplication of two signed or unsigned
integers with an integer result.

Rs

Rt /#u8

*
64

Add

Low 32 bits 32

Rx

Syntax Behavior
Rd=+mpyi(Rs,#u8) apply_extension(#u);
Rd=Rs*#u;
Rd=-mpyi(Rs,#u8) Rd=Rs*-#u;
Rd=add(#u6,mpyi(Rs,#U6 apply_extension(#u);
)) Rd = #u + Rs*#U;
Rd=add(#u6,mpyi(Rs,Rt) apply_extension(#u);
) Rd = #u + Rs*Rt;
Rd=add(Ru,mpyi(#u6:2,R Rd = Ru + Rs*#u;
s))
Rd=add(Ru,mpyi(Rs,#u6) apply_extension(#u);
) Rd = Ru + Rs*#u;

80-N2040-53 Rev. AA 484


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Syntax Behavior
Rd=mpyi(Rs,#m9) if ("((#m9<0) && (#m9>-256))") {
Assembler mapped to: "Rd=-mpyi(Rs,#m9*(-
1))";
} else {
Assembler mapped to: "Rd=+mpyi(Rs,#m9)";
}
Rd=mpyi(Rs,Rt) Rd=Rs*Rt;
Rd=mpyui(Rs,Rt) Assembler mapped to: "Rd=mpyi(Rs,Rt)"
Rx+=mpyi(Rs,#u8) apply_extension(#u);
Rx=Rx + (Rs*#u);
Rx+=mpyi(Rs,Rt) Rx=Rx + Rs*Rt;
Rx-=mpyi(Rs,#u8) apply_extension(#u);
Rx=Rx - (Rs*#u);
Rx-=mpyi(Rs,Rt) Rx=Rx - Rs*Rt;
Ry=add(Ru,mpyi(Ry,Rs)) Ry = Ru + Rs*Ry;

80-N2040-53 Rev. AA 485


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Class: XTYPE (slots 2,3)

Intrinsics

Rd=add(#u6,mpyi(Rs,#U6)) Word32 Q6_R_add_mpyi_IRI(Word32 Iu6, Word32 Rs, Word32


IU6)
Rd=add(#u6,mpyi(Rs,Rt)) Word32 Q6_R_add_mpyi_IRR(Word32 Iu6, Word32 Rs, Word32
Rt)
Rd=add(Ru,mpyi(#u6:2,Rs) Word32 Q6_R_add_mpyi_RIR(Word32 Ru, Word32 Iu6_2,
) Word32 Rs)
Rd=add(Ru,mpyi(Rs,#u6)) Word32 Q6_R_add_mpyi_RRI(Word32 Ru, Word32 Rs, Word32
Iu6)
Rd=mpyi(Rs,#m9) Word32 Q6_R_mpyi_RI(Word32 Rs, Word32 Im9)
Rd=mpyi(Rs,Rt) Word32 Q6_R_mpyi_RR(Word32 Rs, Word32 Rt)
Rd=mpyui(Rs,Rt) Word32 Q6_R_mpyui_RR(Word32 Rs, Word32 Rt)
Rx+=mpyi(Rs,#u8) Word32 Q6_R_mpyiacc_RI(Word32 Rx, Word32 Rs, Word32
Iu8)
Rx+=mpyi(Rs,Rt) Word32 Q6_R_mpyiacc_RR(Word32 Rx, Word32 Rs, Word32
Rt)
Rx-=mpyi(Rs,#u8) Word32 Q6_R_mpyinac_RI(Word32 Rx, Word32 Rs, Word32
Iu8)
Rx-=mpyi(Rs,Rt) Word32 Q6_R_mpyinac_RR(Word32 Rx, Word32 Rs, Word32
Rt)
Ry=add(Ru,mpyi(Ry,Rs)) Word32 Q6_R_add_mpyi_RRR(Word32 Ru, Word32 Ry, Word32
Rs)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType s5 Parse t5 MinOp d5
1 1 0 1 0 1 1 1 0 i i s s s s s P P i t t t t t i i i d d d d d Rd=add(#u6,mpyi(Rs,Rt))
ICLASS RegType s5 Parse d5
1 1 0 1 1 0 0 0 I i i s s s s s P P i d d d d d i i i I I I I I Rd=add(#u6,mpyi(Rs,#U6))
ICLASS RegType s5 Parse d5 u5

1 1 0 1 1 1 1 1 0 i i s s s s s P P i d d d d d i i i u u u u u Rd=add(Ru,mpyi(#u6:2,Rs)
)
1 1 0 1 1 1 1 1 1 i i s s s s s P P i d d d d d i i i u u u u u Rd=add(Ru,mpyi(Rs,#u6))
ICLASS RegType MajOp s5 Parse y5 u5
1 1 1 0 0 0 1 1 0 0 0 s s s s s P P - y y y y y - - - u u u u u Ry=add(Ru,mpyi(Ry,Rs))
ICLASS RegType MajOp s5 Parse MinOp d5
1 1 1 0 0 0 0 0 0 - - s s s s s P P 0 i i i i i i i i d d d d d Rd=+mpyi(Rs,#u8)
1 1 1 0 0 0 0 0 1 - - s s s s s P P 0 i i i i i i i i d d d d d Rd=-mpyi(Rs,#u8)
ICLASS RegType MajOp s5 Parse MinOp x5
1 1 1 0 0 0 0 1 0 - - s s s s s P P 0 i i i i i i i i x x x x x Rx+=mpyi(Rs,#u8)
1 1 1 0 0 0 0 1 1 - - s s s s s P P 0 i i i i i i i i x x x x x Rx-=mpyi(Rs,#u8)

80-N2040-53 Rev. AA 486


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse t5 MinOp d5
1 1 1 0 1 1 0 1 0 0 0 s s s s s P P 0 t t t t t 0 0 0 d d d d d Rd=mpyi(Rs,Rt)
ICLASS RegType MajOp s5 Parse t5 MinOp x5
1 1 1 0 1 1 1 1 0 0 0 s s s s s P P 0 t t t t t 0 0 0 x x x x x Rx+=mpyi(Rs,Rt)
1 1 1 0 1 1 1 1 1 0 0 s s s s s P P 0 t t t t t 0 0 0 x x x x x Rx-=mpyi(Rs,Rt)

Field name Description


RegType Register type
MajOp Major opcode
MinOp Minor opcode
ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
u5 Field to encode register u
x5 Field to encode register x
y5 Field to encode register y

80-N2040-53 Rev. AA 487


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector multiply word by signed half (32 × 16)

Perform mixed precision vector multiply operations. A 32-bit word from vector Rss is multiplied
by a 16-bit halfword (either even or odd) from vector Rtt. The multiplication is performed as a
signed 32 × 16, which produces a 48-bit result. This result is optionally scaled left by one bit. This
result is then shifted right by 16 bits, optionally accumulated and then saturated to 32 bits.
This operation is available in vector form (vmpyweh/vmpywoh) and non-vector form (multiply
and use upper result).

s16 s16 s16 s16 Rtt

mux mux

s32 s32 Rss

* *
48 0x0 0x8000 48 0x0 0x8000

<<0-1 <<0-1
mux mux

Add Add

>>16 >>16

Add Add

Sat_32 Sat_32 32

32

Rxx

Syntax Behavior
Rdd=vmpyweh(Rss,Rtt)[:<<1]:rnd:s Rdd.w[1]=sat32(((Rss.w[1] *
at Rtt.h[2])[<<1]+0x8000)>>16);
Rdd.w[0]=sat32(((Rss.w[0] *
Rtt.h[0])[<<1]+0x8000)>>16);

80-N2040-53 Rev. AA 488


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Syntax Behavior
Rdd=vmpyweh(Rss,Rtt)[:<<1]:sat Rdd.w[1]=sat32(((Rss.w[1] *
Rtt.h[2])[<<1])>>16);
Rdd.w[0]=sat32(((Rss.w[0] *
Rtt.h[0])[<<1])>>16);
Rdd=vmpywoh(Rss,Rtt)[:<<1]:rnd:s Rdd.w[1]=sat32(((Rss.w[1] *
at Rtt.h[3])[<<1]+0x8000)>>16);
Rdd.w[0]=sat32(((Rss.w[0] *
Rtt.h[1])[<<1]+0x8000)>>16);
Rdd=vmpywoh(Rss,Rtt)[:<<1]:sat Rdd.w[1]=sat32(((Rss.w[1] *
Rtt.h[3])[<<1])>>16);
Rdd.w[0]=sat32(((Rss.w[0] *
Rtt.h[1])[<<1])>>16);
Rxx+=vmpyweh(Rss,Rtt)[:<<1]:rnd: Rxx.w[1]=sat32(Rxx.w[1] + (((Rss.w[1] *
sat Rtt.h[2])[<<1]+0x8000)>>16));
Rxx.w[0]=sat32(Rxx.w[0] + (((Rss.w[0] *
Rtt.h[0])[<<1]+0x8000)>>16));
Rxx+=vmpyweh(Rss,Rtt)[:<<1]:sat Rxx.w[1]=sat32(Rxx.w[1] + (((Rss.w[1] *
Rtt.h[2])[<<1])>>16));
Rxx.w[0]=sat32(Rxx.w[0] + (((Rss.w[0] *
Rtt.h[0])[<<1])>>16));
Rxx+=vmpywoh(Rss,Rtt)[:<<1]:rnd: Rxx.w[1]=sat32(Rxx.w[1] + (((Rss.w[1] *
sat Rtt.h[3])[<<1]+0x8000)>>16));
Rxx.w[0]=sat32(Rxx.w[0] + (((Rss.w[0] *
Rtt.h[1])[<<1]+0x8000)>>16 ));
Rxx+=vmpywoh(Rss,Rtt)[:<<1]:sat Rxx.w[1]=sat32(Rxx.w[1] + (((Rss.w[1] *
Rtt.h[3])[<<1])>>16));
Rxx.w[0]=sat32(Rxx.w[0] + (((Rss.w[0] *
Rtt.h[1])[<<1])>>16 ));

Class: XTYPE (slots 2,3)

Notes

■ If saturation occurs during execution of this instruction (a result is clamped to either


maximum or minimum values), the OVF bit in the status register is set. OVF remains set until
explicitly cleared by a transfer to the status register.

Intrinsics

Rdd=vmpyweh(Rss,Rtt):<<1:rnd:s Word64 Q6_P_vmpyweh_PP_s1_rnd_sat(Word64 Rss,


at Word64 Rtt)
Rdd=vmpyweh(Rss,Rtt):<<1:sat Word64 Q6_P_vmpyweh_PP_s1_sat(Word64 Rss, Word64
Rtt)
Rdd=vmpyweh(Rss,Rtt):rnd:sat Word64 Q6_P_vmpyweh_PP_rnd_sat(Word64 Rss, Word64
Rtt)
Rdd=vmpyweh(Rss,Rtt):sat Word64 Q6_P_vmpyweh_PP_sat(Word64 Rss, Word64 Rtt)

80-N2040-53 Rev. AA 489


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Rdd=vmpywoh(Rss,Rtt):<<1:rnd:s Word64 Q6_P_vmpywoh_PP_s1_rnd_sat(Word64 Rss,


at Word64 Rtt)
Rdd=vmpywoh(Rss,Rtt):<<1:sat Word64 Q6_P_vmpywoh_PP_s1_sat(Word64 Rss, Word64
Rtt)
Rdd=vmpywoh(Rss,Rtt):rnd:sat Word64 Q6_P_vmpywoh_PP_rnd_sat(Word64 Rss, Word64
Rtt)
Rdd=vmpywoh(Rss,Rtt):sat Word64 Q6_P_vmpywoh_PP_sat(Word64 Rss, Word64 Rtt)
Rxx+=vmpyweh(Rss,Rtt):<<1:rnd: Word64 Q6_P_vmpywehacc_PP_s1_rnd_sat(Word64 Rxx,
sat Word64 Rss, Word64 Rtt)
Rxx+=vmpyweh(Rss,Rtt):<<1:sat Word64 Q6_P_vmpywehacc_PP_s1_sat(Word64 Rxx, Word64
Rss, Word64 Rtt)
Rxx+=vmpyweh(Rss,Rtt):rnd:sat Word64 Q6_P_vmpywehacc_PP_rnd_sat(Word64 Rxx,
Word64 Rss, Word64 Rtt)
Rxx+=vmpyweh(Rss,Rtt):sat Word64 Q6_P_vmpywehacc_PP_sat(Word64 Rxx, Word64
Rss, Word64 Rtt)
Rxx+=vmpywoh(Rss,Rtt):<<1:rnd: Word64 Q6_P_vmpywohacc_PP_s1_rnd_sat(Word64 Rxx,
sat Word64 Rss, Word64 Rtt)
Rxx+=vmpywoh(Rss,Rtt):<<1:sat Word64 Q6_P_vmpywohacc_PP_s1_sat(Word64 Rxx, Word64
Rss, Word64 Rtt)
Rxx+=vmpywoh(Rss,Rtt):rnd:sat Word64 Q6_P_vmpywohacc_PP_rnd_sat(Word64 Rxx,
Word64 Rss, Word64 Rtt)
Rxx+=vmpywoh(Rss,Rtt):sat Word64 Q6_P_vmpywohacc_PP_sat(Word64 Rxx, Word64
Rss, Word64 Rtt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse t5 MinOp d5

1 1 1 0 1 0 0 0 N 0 0 s s s s s P P 0 t t t t t 1 0 1 d d d d d Rdd=vmpyweh(Rss,Rtt)[:<<
N]:sat
Rdd=vmpywoh(Rss,Rtt)[:<<
1 1 1 0 1 0 0 0 N 0 0 s s s s s P P 0 t t t t t 1 1 1 d d d d d N]:sat

1 1 1 0 1 0 0 0 N 0 1 s s s s s P P 0 t t t t t 1 0 1 d d d d d Rdd=vmpyweh(Rss,Rtt)[:<<
N]:rnd:sat

1 1 1 0 1 0 0 0 N 0 1 s s s s s P P 0 t t t t t 1 1 1 d d d d d Rdd=vmpywoh(Rss,Rtt)[:<<
N]:rnd:sat
ICLASS RegType MajOp s5 Parse t5 MinOp x5

1 1 1 0 1 0 1 0 N 0 0 s s s s s P P 0 t t t t t 1 0 1 x x x x x Rxx+=vmpyweh(Rss,Rtt)[:<
<N]:sat
Rxx+=vmpywoh(Rss,Rtt)[:<
1 1 1 0 1 0 1 0 N 0 0 s s s s s P P 0 t t t t t 1 1 1 x x x x x <N]:sat

1 1 1 0 1 0 1 0 N 0 1 s s s s s P P 0 t t t t t 1 0 1 x x x x x Rxx+=vmpyweh(Rss,Rtt)[:<
<N]:rnd:sat

1 1 1 0 1 0 1 0 N 0 1 s s s s s P P 0 t t t t t 1 1 1 x x x x x Rxx+=vmpywoh(Rss,Rtt)[:<
<N]:rnd:sat

Field name Description


ICLASS Instruction class
MajOp Major opcode
MinOp Minor opcode

80-N2040-53 Rev. AA 490


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Field name Description


RegType Register type
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
x5 Field to encode register x

80-N2040-53 Rev. AA 491


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector multiply word by unsigned half (32 × 16)


Perform mixed precision vector multiply operations. A 32-bit signed word from vector Rss is
multiplied by a 16-bit unsigned halfword (either odd or even) from vector Rtt. This multiplication
produces a 48-bit result. This result is optionally scaled left by one bit, and then a rounding
constant is optionally added to the lower 16 bits. This result is then shifted right by 16 bits,
optionally accumulated and then saturated to 32 bits. This is a dual vector operation and is
performed for both high and low word of Rss.

u16 u16 u16 u16 Rtt

mux mux

s32 s32 Rss

* *
48 0x0 0x8000 48 0x0 0x8000

<<0-1 <<0-1
mux mux

Add Add

>>16 >>16

Add Add

Sat_32 Sat_32 32

32

Rxx

Syntax Behavior
Rdd=vmpyweuh(Rss,Rtt)[:<<1]:rnd:s Rdd.w[1]=sat32(((Rss.w[1] *
at Rtt.uh[2])[<<1]+0x8000)>>16);
Rdd.w[0]=sat32(((Rss.w[0] *
Rtt.uh[0])[<<1]+0x8000)>>16);

80-N2040-53 Rev. AA 492


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Syntax Behavior
Rdd=vmpyweuh(Rss,Rtt)[:<<1]:sat Rdd.w[1]=sat32(((Rss.w[1] *
Rtt.uh[2])[<<1])>>16);
Rdd.w[0]=sat32(((Rss.w[0] *
Rtt.uh[0])[<<1])>>16);
Rdd=vmpywouh(Rss,Rtt)[:<<1]:rnd:s Rdd.w[1]=sat32(((Rss.w[1] *
at Rtt.uh[3])[<<1]+0x8000)>>16);
Rdd.w[0]=sat32(((Rss.w[0] *
Rtt.uh[1])[<<1]+0x8000)>>16);
Rdd=vmpywouh(Rss,Rtt)[:<<1]:sat Rdd.w[1]=sat32(((Rss.w[1] *
Rtt.uh[3])[<<1])>>16);
Rdd.w[0]=sat32(((Rss.w[0] *
Rtt.uh[1])[<<1])>>16);
Rxx+=vmpyweuh(Rss,Rtt)[:<<1]:rnd: Rxx.w[1]=sat32(Rxx.w[1] + (((Rss.w[1] *
sat Rtt.uh[2])[<<1]+0x8000)>>16));
Rxx.w[0]=sat32(Rxx.w[0] + (((Rss.w[0] *
Rtt.uh[0])[<<1]+0x8000)>>16));
Rxx+=vmpyweuh(Rss,Rtt)[:<<1]:sat Rxx.w[1]=sat32(Rxx.w[1] + (((Rss.w[1] *
Rtt.uh[2])[<<1])>>16));
Rxx.w[0]=sat32(Rxx.w[0] + (((Rss.w[0] *
Rtt.uh[0])[<<1])>>16));
Rxx+=vmpywouh(Rss,Rtt)[:<<1]:rnd: Rxx.w[1]=sat32(Rxx.w[1] + (((Rss.w[1] *
sat Rtt.uh[3])[<<1]+0x8000)>>16));
Rxx.w[0]=sat32(Rxx.w[0] + (((Rss.w[0] *
Rtt.uh[1])[<<1]+0x8000)>>16 ));
Rxx+=vmpywouh(Rss,Rtt)[:<<1]:sat Rxx.w[1]=sat32(Rxx.w[1] + (((Rss.w[1] *
Rtt.uh[3])[<<1])>>16));
Rxx.w[0]=sat32(Rxx.w[0] + (((Rss.w[0] *
Rtt.uh[1])[<<1])>>16 ));

Class: XTYPE (slots 2,3)

Notes

■ If saturation occurs during execution of this instruction (a result is clamped to either


maximum or minimum values), the OVF bit in the status register is set. OVF remains set until
explicitly cleared by a transfer to the status register.

Intrinsics

Rdd=vmpyweuh(Rss,Rtt):<<1:rnd: Word64 Q6_P_vmpyweuh_PP_s1_rnd_sat(Word64 Rss,


sat Word64 Rtt)
Rdd=vmpyweuh(Rss,Rtt):<<1:sat Word64 Q6_P_vmpyweuh_PP_s1_sat(Word64 Rss,
Word64 Rtt)
Rdd=vmpyweuh(Rss,Rtt):rnd:sat Word64 Q6_P_vmpyweuh_PP_rnd_sat(Word64 Rss,
Word64 Rtt)

80-N2040-53 Rev. AA 493


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Rdd=vmpyweuh(Rss,Rtt):sat Word64 Q6_P_vmpyweuh_PP_sat(Word64 Rss, Word64


Rtt)
Rdd=vmpywouh(Rss,Rtt):<<1:rnd: Word64 Q6_P_vmpywouh_PP_s1_rnd_sat(Word64 Rss,
sat Word64 Rtt)
Rdd=vmpywouh(Rss,Rtt):<<1:sat Word64 Q6_P_vmpywouh_PP_s1_sat(Word64 Rss,
Word64 Rtt)
Rdd=vmpywouh(Rss,Rtt):rnd:sat Word64 Q6_P_vmpywouh_PP_rnd_sat(Word64 Rss,
Word64 Rtt)
Rdd=vmpywouh(Rss,Rtt):sat Word64 Q6_P_vmpywouh_PP_sat(Word64 Rss, Word64
Rtt)
Rxx+=vmpyweuh(Rss,Rtt):<<1:rnd Word64 Q6_P_vmpyweuhacc_PP_s1_rnd_sat(Word64
:sat Rxx, Word64 Rss, Word64 Rtt)
Rxx+=vmpyweuh(Rss,Rtt):<<1:sat Word64 Q6_P_vmpyweuhacc_PP_s1_sat(Word64 Rxx,
Word64 Rss, Word64 Rtt)
Rxx+=vmpyweuh(Rss,Rtt):rnd:sat Word64 Q6_P_vmpyweuhacc_PP_rnd_sat(Word64 Rxx,
Word64 Rss, Word64 Rtt)
Rxx+=vmpyweuh(Rss,Rtt):sat Word64 Q6_P_vmpyweuhacc_PP_sat(Word64 Rxx,
Word64 Rss, Word64 Rtt)
Rxx+=vmpywouh(Rss,Rtt):<<1:rnd Word64 Q6_P_vmpywouhacc_PP_s1_rnd_sat(Word64
:sat Rxx, Word64 Rss, Word64 Rtt)
Rxx+=vmpywouh(Rss,Rtt):<<1:sat Word64 Q6_P_vmpywouhacc_PP_s1_sat(Word64 Rxx,
Word64 Rss, Word64 Rtt)
Rxx+=vmpywouh(Rss,Rtt):rnd:sat Word64 Q6_P_vmpywouhacc_PP_rnd_sat(Word64 Rxx,
Word64 Rss, Word64 Rtt)
Rxx+=vmpywouh(Rss,Rtt):sat Word64 Q6_P_vmpywouhacc_PP_sat(Word64 Rxx,
Word64 Rss, Word64 Rtt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse t5 MinOp d5

1 1 1 0 1 0 0 0 N 1 0 s s s s s P P 0 t t t t t 1 0 1 d d d d d Rdd=vmpyweuh(Rss,Rtt)[:<
<N]:sat

1 1 1 0 1 0 0 0 N 1 0 s s s s s P P 0 t t t t t 1 1 1 d d d d d Rdd=vmpywouh(Rss,Rtt)[:<
<N]:sat
Rdd=vmpyweuh(Rss,Rtt)[:<
1 1 1 0 1 0 0 0 N 1 1 s s s s s P P 0 t t t t t 1 0 1 d d d d d <N]:rnd:sat

1 1 1 0 1 0 0 0 N 1 1 s s s s s P P 0 t t t t t 1 1 1 d d d d d Rdd=vmpywouh(Rss,Rtt)[:<
<N]:rnd:sat
ICLASS RegType MajOp s5 Parse t5 MinOp x5

1 1 1 0 1 0 1 0 N 1 0 s s s s s P P 0 t t t t t 1 0 1 x x x x x Rxx+=vmpyweuh(Rss,Rtt)[:
<<N]:sat

1 1 1 0 1 0 1 0 N 1 0 s s s s s P P 0 t t t t t 1 1 1 x x x x x Rxx+=vmpywouh(Rss,Rtt)[:
<<N]:sat
Rxx+=vmpyweuh(Rss,Rtt)[:
1 1 1 0 1 0 1 0 N 1 1 s s s s s P P 0 t t t t t 1 0 1 x x x x x <<N]:rnd:sat

1 1 1 0 1 0 1 0 N 1 1 s s s s s P P 0 t t t t t 1 1 1 x x x x x Rxx+=vmpywouh(Rss,Rtt)[:
<<N]:rnd:sat

80-N2040-53 Rev. AA 494


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Field name Description


ICLASS Instruction class
MajOp Major opcode
MinOp Minor opcode
RegType Register type
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
x5 Field to encode register x

80-N2040-53 Rev. AA 495


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Multiply signed halfwords


Multiply two signed halfwords. Optionally shift the multiplier result by 1 bit. This result can be
accumulated or rounded. The destination/accumulator is either 32 or 64 bits. For 32-bit results,
saturation is optional.

Rx+=mpy(Rs.[HL],Rt.[HL])[:<<1][:sat] Rxx+=mpy(Rs.[HL],Rt.[HL])[:<<1]
Rd = mpy(Rs.[HL],Rt.[HL])[:<<1][:rnd][:sat] Rdd = mpy(Rs.[HL],Rt.[HL])[:<<1][:rnd]

Rs Rt Rs Rt

mux mux mux mux

16 x 16 16 x 16
0x0 0x8000
0x0 0x8000
32 32

<<0-1 <<0-1
mux
mux

32-bit 64-bit add/


add/sub sub

Optional sat
to 32 bits

Rxx
Rx

Syntax Behavior
Rd=mpy(Rs.[HL],Rt.[HL])[:<<1][:rnd][: Rd=[sat32]([round]((Rs.h[01] *
sat] Rt.h[01])[<<1]));
Rdd=mpy(Rs.[HL],Rt.[HL])[:<<1][:rnd] Rdd=[round]((Rs.h[01] * Rt.h[01])[<<1]);
Rx+=mpy(Rs.[HL],Rt.[HL])[:<<1][:sat] Rx=[sat32](Rx+ (Rs.h[01] * Rt.h[01])[<<1]);

Rx-=mpy(Rs.[HL],Rt.[HL])[:<<1][:sat] Rx=[sat32](Rx- (Rs.h[01] * Rt.h[01])[<<1]);

Rxx+=mpy(Rs.[HL],Rt.[HL])[:<<1] Rxx=Rxx+ (Rs.h[01] * Rt.h[01])[<<1];


Rxx-=mpy(Rs.[HL],Rt.[HL])[:<<1] Rxx=Rxx- (Rs.h[01] * Rt.h[01])[<<1];

80-N2040-53 Rev. AA 496


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Class: XTYPE (slots 2,3)

Notes

■ If saturation occurs during execution of this instruction (a result is clamped to either


maximum or minimum values), the OVF bit in the status register is set. OVF remains set until
explicitly cleared by a transfer to the status register.

Intrinsics

Rd=mpy(Rs.H,Rt.H) Word32 Q6_R_mpy_RhRh(Word32 Rs, Word32 Rt)


Rd=mpy(Rs.H,Rt.H):<<1 Word32 Q6_R_mpy_RhRh_s1(Word32 Rs, Word32 Rt)
Rd=mpy(Rs.H,Rt.H):<<1:rnd Word32 Q6_R_mpy_RhRh_s1_rnd(Word32 Rs, Word32 Rt)
Rd=mpy(Rs.H,Rt.H):<<1:rnd: Word32 Q6_R_mpy_RhRh_s1_rnd_sat(Word32 Rs, Word32
sat Rt)
Rd=mpy(Rs.H,Rt.H):<<1:sat Word32 Q6_R_mpy_RhRh_s1_sat(Word32 Rs, Word32 Rt)
Rd=mpy(Rs.H,Rt.H):rnd Word32 Q6_R_mpy_RhRh_rnd(Word32 Rs, Word32 Rt)
Rd=mpy(Rs.H,Rt.H):rnd:sat Word32 Q6_R_mpy_RhRh_rnd_sat(Word32 Rs, Word32 Rt)
Rd=mpy(Rs.H,Rt.H):sat Word32 Q6_R_mpy_RhRh_sat(Word32 Rs, Word32 Rt)
Rd=mpy(Rs.H,Rt.L) Word32 Q6_R_mpy_RhRl(Word32 Rs, Word32 Rt)
Rd=mpy(Rs.H,Rt.L):<<1 Word32 Q6_R_mpy_RhRl_s1(Word32 Rs, Word32 Rt)
Rd=mpy(Rs.H,Rt.L):<<1:rnd Word32 Q6_R_mpy_RhRl_s1_rnd(Word32 Rs, Word32 Rt)
Rd=mpy(Rs.H,Rt.L):<<1:rnd: Word32 Q6_R_mpy_RhRl_s1_rnd_sat(Word32 Rs, Word32
sat Rt)
Rd=mpy(Rs.H,Rt.L):<<1:sat Word32 Q6_R_mpy_RhRl_s1_sat(Word32 Rs, Word32 Rt)
Rd=mpy(Rs.H,Rt.L):rnd Word32 Q6_R_mpy_RhRl_rnd(Word32 Rs, Word32 Rt)
Rd=mpy(Rs.H,Rt.L):rnd:sat Word32 Q6_R_mpy_RhRl_rnd_sat(Word32 Rs, Word32 Rt)
Rd=mpy(Rs.H,Rt.L):sat Word32 Q6_R_mpy_RhRl_sat(Word32 Rs, Word32 Rt)
Rd=mpy(Rs.L,Rt.H) Word32 Q6_R_mpy_RlRh(Word32 Rs, Word32 Rt)
Rd=mpy(Rs.L,Rt.H):<<1 Word32 Q6_R_mpy_RlRh_s1(Word32 Rs, Word32 Rt)
Rd=mpy(Rs.L,Rt.H):<<1:rnd Word32 Q6_R_mpy_RlRh_s1_rnd(Word32 Rs, Word32 Rt)
Rd=mpy(Rs.L,Rt.H):<<1:rnd: Word32 Q6_R_mpy_RlRh_s1_rnd_sat(Word32 Rs, Word32
sat Rt)
Rd=mpy(Rs.L,Rt.H):<<1:sat Word32 Q6_R_mpy_RlRh_s1_sat(Word32 Rs, Word32 Rt)
Rd=mpy(Rs.L,Rt.H):rnd Word32 Q6_R_mpy_RlRh_rnd(Word32 Rs, Word32 Rt)
Rd=mpy(Rs.L,Rt.H):rnd:sat Word32 Q6_R_mpy_RlRh_rnd_sat(Word32 Rs, Word32 Rt)
Rd=mpy(Rs.L,Rt.H):sat Word32 Q6_R_mpy_RlRh_sat(Word32 Rs, Word32 Rt)
Rd=mpy(Rs.L,Rt.L) Word32 Q6_R_mpy_RlRl(Word32 Rs, Word32 Rt)
Rd=mpy(Rs.L,Rt.L):<<1 Word32 Q6_R_mpy_RlRl_s1(Word32 Rs, Word32 Rt)
Rd=mpy(Rs.L,Rt.L):<<1:rnd Word32 Q6_R_mpy_RlRl_s1_rnd(Word32 Rs, Word32 Rt)

80-N2040-53 Rev. AA 497


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Rd=mpy(Rs.L,Rt.L):<<1:rnd: Word32 Q6_R_mpy_RlRl_s1_rnd_sat(Word32 Rs, Word32


sat Rt)
Rd=mpy(Rs.L,Rt.L):<<1:sat Word32 Q6_R_mpy_RlRl_s1_sat(Word32 Rs, Word32 Rt)
Rd=mpy(Rs.L,Rt.L):rnd Word32 Q6_R_mpy_RlRl_rnd(Word32 Rs, Word32 Rt)
Rd=mpy(Rs.L,Rt.L):rnd:sat Word32 Q6_R_mpy_RlRl_rnd_sat(Word32 Rs, Word32 Rt)
Rd=mpy(Rs.L,Rt.L):sat Word32 Q6_R_mpy_RlRl_sat(Word32 Rs, Word32 Rt)
Rdd=mpy(Rs.H,Rt.H) Word64 Q6_P_mpy_RhRh(Word32 Rs, Word32 Rt)
Rdd=mpy(Rs.H,Rt.H):<<1 Word64 Q6_P_mpy_RhRh_s1(Word32 Rs, Word32 Rt)
Rdd=mpy(Rs.H,Rt.H):<<1:rnd Word64 Q6_P_mpy_RhRh_s1_rnd(Word32 Rs, Word32 Rt)
Rdd=mpy(Rs.H,Rt.H):rnd Word64 Q6_P_mpy_RhRh_rnd(Word32 Rs, Word32 Rt)
Rdd=mpy(Rs.H,Rt.L) Word64 Q6_P_mpy_RhRl(Word32 Rs, Word32 Rt)
Rdd=mpy(Rs.H,Rt.L):<<1 Word64 Q6_P_mpy_RhRl_s1(Word32 Rs, Word32 Rt)
Rdd=mpy(Rs.H,Rt.L):<<1:rnd Word64 Q6_P_mpy_RhRl_s1_rnd(Word32 Rs, Word32 Rt)
Rdd=mpy(Rs.H,Rt.L):rnd Word64 Q6_P_mpy_RhRl_rnd(Word32 Rs, Word32 Rt)
Rdd=mpy(Rs.L,Rt.H) Word64 Q6_P_mpy_RlRh(Word32 Rs, Word32 Rt)
Rdd=mpy(Rs.L,Rt.H):<<1 Word64 Q6_P_mpy_RlRh_s1(Word32 Rs, Word32 Rt)
Rdd=mpy(Rs.L,Rt.H):<<1:rnd Word64 Q6_P_mpy_RlRh_s1_rnd(Word32 Rs, Word32 Rt)
Rdd=mpy(Rs.L,Rt.H):rnd Word64 Q6_P_mpy_RlRh_rnd(Word32 Rs, Word32 Rt)
Rdd=mpy(Rs.L,Rt.L) Word64 Q6_P_mpy_RlRl(Word32 Rs, Word32 Rt)
Rdd=mpy(Rs.L,Rt.L):<<1 Word64 Q6_P_mpy_RlRl_s1(Word32 Rs, Word32 Rt)
Rdd=mpy(Rs.L,Rt.L):<<1:rnd Word64 Q6_P_mpy_RlRl_s1_rnd(Word32 Rs, Word32 Rt)
Rdd=mpy(Rs.L,Rt.L):rnd Word64 Q6_P_mpy_RlRl_rnd(Word32 Rs, Word32 Rt)
Rx+=mpy(Rs.H,Rt.H) Word32 Q6_R_mpyacc_RhRh(Word32 Rx, Word32 Rs,
Word32 Rt)
Rx+=mpy(Rs.H,Rt.H):<<1 Word32 Q6_R_mpyacc_RhRh_s1(Word32 Rx, Word32 Rs,
Word32 Rt)
Rx+=mpy(Rs.H,Rt.H):<<1:sat Word32 Q6_R_mpyacc_RhRh_s1_sat(Word32 Rx, Word32
Rs, Word32 Rt)
Rx+=mpy(Rs.H,Rt.H):sat Word32 Q6_R_mpyacc_RhRh_sat(Word32 Rx, Word32 Rs,
Word32 Rt)
Rx+=mpy(Rs.H,Rt.L) Word32 Q6_R_mpyacc_RhRl(Word32 Rx, Word32 Rs,
Word32 Rt)
Rx+=mpy(Rs.H,Rt.L):<<1 Word32 Q6_R_mpyacc_RhRl_s1(Word32 Rx, Word32 Rs,
Word32 Rt)
Rx+=mpy(Rs.H,Rt.L):<<1:sat Word32 Q6_R_mpyacc_RhRl_s1_sat(Word32 Rx, Word32
Rs, Word32 Rt)
Rx+=mpy(Rs.H,Rt.L):sat Word32 Q6_R_mpyacc_RhRl_sat(Word32 Rx, Word32 Rs,
Word32 Rt)
Rx+=mpy(Rs.L,Rt.H) Word32 Q6_R_mpyacc_RlRh(Word32 Rx, Word32 Rs,
Word32 Rt)
Rx+=mpy(Rs.L,Rt.H):<<1 Word32 Q6_R_mpyacc_RlRh_s1(Word32 Rx, Word32 Rs,
Word32 Rt)

80-N2040-53 Rev. AA 498


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Rx+=mpy(Rs.L,Rt.H):<<1:sat Word32 Q6_R_mpyacc_RlRh_s1_sat(Word32 Rx, Word32


Rs, Word32 Rt)
Rx+=mpy(Rs.L,Rt.H):sat Word32 Q6_R_mpyacc_RlRh_sat(Word32 Rx, Word32 Rs,
Word32 Rt)
Rx+=mpy(Rs.L,Rt.L) Word32 Q6_R_mpyacc_RlRl(Word32 Rx, Word32 Rs,
Word32 Rt)
Rx+=mpy(Rs.L,Rt.L):<<1 Word32 Q6_R_mpyacc_RlRl_s1(Word32 Rx, Word32 Rs,
Word32 Rt)
Rx+=mpy(Rs.L,Rt.L):<<1:sat Word32 Q6_R_mpyacc_RlRl_s1_sat(Word32 Rx, Word32
Rs, Word32 Rt)
Rx+=mpy(Rs.L,Rt.L):sat Word32 Q6_R_mpyacc_RlRl_sat(Word32 Rx, Word32 Rs,
Word32 Rt)
Rx-=mpy(Rs.H,Rt.H) Word32 Q6_R_mpynac_RhRh(Word32 Rx, Word32 Rs,
Word32 Rt)
Rx-=mpy(Rs.H,Rt.H):<<1 Word32 Q6_R_mpynac_RhRh_s1(Word32 Rx, Word32 Rs,
Word32 Rt)
Rx-=mpy(Rs.H,Rt.H):<<1:sat Word32 Q6_R_mpynac_RhRh_s1_sat(Word32 Rx, Word32
Rs, Word32 Rt)
Rx-=mpy(Rs.H,Rt.H):sat Word32 Q6_R_mpynac_RhRh_sat(Word32 Rx, Word32 Rs,
Word32 Rt)
Rx-=mpy(Rs.H,Rt.L) Word32 Q6_R_mpynac_RhRl(Word32 Rx, Word32 Rs,
Word32 Rt)
Rx-=mpy(Rs.H,Rt.L):<<1 Word32 Q6_R_mpynac_RhRl_s1(Word32 Rx, Word32 Rs,
Word32 Rt)
Rx-=mpy(Rs.H,Rt.L):<<1:sat Word32 Q6_R_mpynac_RhRl_s1_sat(Word32 Rx, Word32
Rs, Word32 Rt)
Rx-=mpy(Rs.H,Rt.L):sat Word32 Q6_R_mpynac_RhRl_sat(Word32 Rx, Word32 Rs,
Word32 Rt)
Rx-=mpy(Rs.L,Rt.H) Word32 Q6_R_mpynac_RlRh(Word32 Rx, Word32 Rs,
Word32 Rt)
Rx-=mpy(Rs.L,Rt.H):<<1 Word32 Q6_R_mpynac_RlRh_s1(Word32 Rx, Word32 Rs,
Word32 Rt)
Rx-=mpy(Rs.L,Rt.H):<<1:sat Word32 Q6_R_mpynac_RlRh_s1_sat(Word32 Rx, Word32
Rs, Word32 Rt)
Rx-=mpy(Rs.L,Rt.H):sat Word32 Q6_R_mpynac_RlRh_sat(Word32 Rx, Word32 Rs,
Word32 Rt)
Rx-=mpy(Rs.L,Rt.L) Word32 Q6_R_mpynac_RlRl(Word32 Rx, Word32 Rs,
Word32 Rt)
Rx-=mpy(Rs.L,Rt.L):<<1 Word32 Q6_R_mpynac_RlRl_s1(Word32 Rx, Word32 Rs,
Word32 Rt)
Rx-=mpy(Rs.L,Rt.L):<<1:sat Word32 Q6_R_mpynac_RlRl_s1_sat(Word32 Rx, Word32
Rs, Word32 Rt)
Rx-=mpy(Rs.L,Rt.L):sat Word32 Q6_R_mpynac_RlRl_sat(Word32 Rx, Word32 Rs,
Word32 Rt)
Rxx+=mpy(Rs.H,Rt.H) Word64 Q6_P_mpyacc_RhRh(Word64 Rxx, Word32 Rs,
Word32 Rt)

80-N2040-53 Rev. AA 499


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Rxx+=mpy(Rs.H,Rt.H):<<1 Word64 Q6_P_mpyacc_RhRh_s1(Word64 Rxx, Word32 Rs,


Word32 Rt)
Rxx+=mpy(Rs.H,Rt.L) Word64 Q6_P_mpyacc_RhRl(Word64 Rxx, Word32 Rs,
Word32 Rt)
Rxx+=mpy(Rs.H,Rt.L):<<1 Word64 Q6_P_mpyacc_RhRl_s1(Word64 Rxx, Word32 Rs,
Word32 Rt)
Rxx+=mpy(Rs.L,Rt.H) Word64 Q6_P_mpyacc_RlRh(Word64 Rxx, Word32 Rs,
Word32 Rt)
Rxx+=mpy(Rs.L,Rt.H):<<1 Word64 Q6_P_mpyacc_RlRh_s1(Word64 Rxx, Word32 Rs,
Word32 Rt)
Rxx+=mpy(Rs.L,Rt.L) Word64 Q6_P_mpyacc_RlRl(Word64 Rxx, Word32 Rs,
Word32 Rt)
Rxx+=mpy(Rs.L,Rt.L):<<1 Word64 Q6_P_mpyacc_RlRl_s1(Word64 Rxx, Word32 Rs,
Word32 Rt)
Rxx-=mpy(Rs.H,Rt.H) Word64 Q6_P_mpynac_RhRh(Word64 Rxx, Word32 Rs,
Word32 Rt)
Rxx-=mpy(Rs.H,Rt.H):<<1 Word64 Q6_P_mpynac_RhRh_s1(Word64 Rxx, Word32 Rs,
Word32 Rt)
Rxx-=mpy(Rs.H,Rt.L) Word64 Q6_P_mpynac_RhRl(Word64 Rxx, Word32 Rs,
Word32 Rt)
Rxx-=mpy(Rs.H,Rt.L):<<1 Word64 Q6_P_mpynac_RhRl_s1(Word64 Rxx, Word32 Rs,
Word32 Rt)
Rxx-=mpy(Rs.L,Rt.H) Word64 Q6_P_mpynac_RlRh(Word64 Rxx, Word32 Rs,
Word32 Rt)
Rxx-=mpy(Rs.L,Rt.H):<<1 Word64 Q6_P_mpynac_RlRh_s1(Word64 Rxx, Word32 Rs,
Word32 Rt)
Rxx-=mpy(Rs.L,Rt.L) Word64 Q6_P_mpynac_RlRl(Word64 Rxx, Word32 Rs,
Word32 Rt)
Rxx-=mpy(Rs.L,Rt.L):<<1 Word64 Q6_P_mpynac_RlRl_s1(Word64 Rxx, Word32 Rs,
Word32 Rt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse t5 MinOp d5
1 1 1 0 0 1 0 0 N 0 0 s s s s s P P - t t t t t - 0 0 d d d d d Rdd=mpy(Rs.L,Rt.L)[:<<N]
1 1 1 0 0 1 0 0 N 0 0 s s s s s P P - t t t t t - 0 1 d d d d d Rdd=mpy(Rs.L,Rt.H)[:<<N]
1 1 1 0 0 1 0 0 N 0 0 s s s s s P P - t t t t t - 1 0 d d d d d Rdd=mpy(Rs.H,Rt.L)[:<<N]
1 1 1 0 0 1 0 0 N 0 0 s s s s s P P - t t t t t - 1 1 d d d d d Rdd=mpy(Rs.H,Rt.H)[:<<N]

1 1 1 0 0 1 0 0 N 0 1 s s s s s P P - t t t t t - 0 0 d d d d d Rdd=mpy(Rs.L,Rt.L)[:<<N]:
rnd

1 1 1 0 0 1 0 0 N 0 1 s s s s s P P - t t t t t - 0 1 d d d d d Rdd=mpy(Rs.L,Rt.H)[:<<N]:
rnd
Rdd=mpy(Rs.H,Rt.L)[:<<N]:
1 1 1 0 0 1 0 0 N 0 1 s s s s s P P - t t t t t - 1 0 d d d d d rnd

1 1 1 0 0 1 0 0 N 0 1 s s s s s P P - t t t t t - 1 1 d d d d d Rdd=mpy(Rs.H,Rt.H)[:<<N]
:rnd
ICLASS RegType MajOp s5 Parse t5 MinOp x5

80-N2040-53 Rev. AA 500


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Rxx+=mpy(Rs.L,Rt.L)[:<<N
1 1 1 0 0 1 1 0 N 0 0 s s s s s P P - t t t t t 0 0 0 x x x x x
]

1 1 1 0 0 1 1 0 N 0 0 s s s s s P P - t t t t t 0 0 1 x x x x x Rxx+=mpy(Rs.L,Rt.H)[:<<N
]
Rxx+=mpy(Rs.H,Rt.L)[:<<N
1 1 1 0 0 1 1 0 N 0 0 s s s s s P P - t t t t t 0 1 0 x x x x x
]
Rxx+=mpy(Rs.H,Rt.H)[:<<
1 1 1 0 0 1 1 0 N 0 0 s s s s s P P - t t t t t 0 1 1 x x x x x N]

1 1 1 0 0 1 1 0 N 0 1 s s s s s P P - t t t t t 0 0 0 x x x x x Rxx-=mpy(Rs.L,Rt.L)[:<<N]
1 1 1 0 0 1 1 0 N 0 1 s s s s s P P - t t t t t 0 0 1 x x x x x Rxx-=mpy(Rs.L,Rt.H)[:<<N]
1 1 1 0 0 1 1 0 N 0 1 s s s s s P P - t t t t t 0 1 0 x x x x x Rxx-=mpy(Rs.H,Rt.L)[:<<N]
Rxx-
1 1 1 0 0 1 1 0 N 0 1 s s s s s P P - t t t t t 0 1 1 x x x x x
=mpy(Rs.H,Rt.H)[:<<N]
ICLASS RegType MajOp s5 Parse t5 MinOp d5
1 1 1 0 1 1 0 0 N 0 0 s s s s s P P - t t t t t 0 0 0 d d d d d Rd=mpy(Rs.L,Rt.L)[:<<N]
1 1 1 0 1 1 0 0 N 0 0 s s s s s P P - t t t t t 0 0 1 d d d d d Rd=mpy(Rs.L,Rt.H)[:<<N]
1 1 1 0 1 1 0 0 N 0 0 s s s s s P P - t t t t t 0 1 0 d d d d d Rd=mpy(Rs.H,Rt.L)[:<<N]
1 1 1 0 1 1 0 0 N 0 0 s s s s s P P - t t t t t 0 1 1 d d d d d Rd=mpy(Rs.H,Rt.H)[:<<N]
Rd=mpy(Rs.L,Rt.L)[:<<N]:s
1 1 1 0 1 1 0 0 N 0 0 s s s s s P P - t t t t t 1 0 0 d d d d d at

1 1 1 0 1 1 0 0 N 0 0 s s s s s P P - t t t t t 1 0 1 d d d d d Rd=mpy(Rs.L,Rt.H)[:<<N]:s
at

1 1 1 0 1 1 0 0 N 0 0 s s s s s P P - t t t t t 1 1 0 d d d d d Rd=mpy(Rs.H,Rt.L)[:<<N]:s
at
Rd=mpy(Rs.H,Rt.H)[:<<N]:
1 1 1 0 1 1 0 0 N 0 0 s s s s s P P - t t t t t 1 1 1 d d d d d sat

1 1 1 0 1 1 0 0 N 0 1 s s s s s P P - t t t t t 0 0 0 d d d d d Rd=mpy(Rs.L,Rt.L)[:<<N]:r
nd

1 1 1 0 1 1 0 0 N 0 1 s s s s s P P - t t t t t 0 0 1 d d d d d Rd=mpy(Rs.L,Rt.H)[:<<N]:r
nd
Rd=mpy(Rs.H,Rt.L)[:<<N]:r
1 1 1 0 1 1 0 0 N 0 1 s s s s s P P - t t t t t 0 1 0 d d d d d nd

1 1 1 0 1 1 0 0 N 0 1 s s s s s P P - t t t t t 0 1 1 d d d d d Rd=mpy(Rs.H,Rt.H)[:<<N]:r
nd

1 1 1 0 1 1 0 0 N 0 1 s s s s s P P - t t t t t 1 0 0 d d d d d Rd=mpy(Rs.L,Rt.L)[:<<N]:r
nd:sat
Rd=mpy(Rs.L,Rt.H)[:<<N]:r
1 1 1 0 1 1 0 0 N 0 1 s s s s s P P - t t t t t 1 0 1 d d d d d nd:sat

1 1 1 0 1 1 0 0 N 0 1 s s s s s P P - t t t t t 1 1 0 d d d d d Rd=mpy(Rs.H,Rt.L)[:<<N]:r
nd:sat
Rd=mpy(Rs.H,Rt.H)[:<<N]:r
1 1 1 0 1 1 0 0 N 0 1 s s s s s P P - t t t t t 1 1 1 d d d d d nd:sat

ICLASS RegType MajOp s5 Parse t5 MinOp x5


1 1 1 0 1 1 1 0 N 0 0 s s s s s P P - t t t t t 0 0 0 x x x x x Rx+=mpy(Rs.L,Rt.L)[:<<N]
1 1 1 0 1 1 1 0 N 0 0 s s s s s P P - t t t t t 0 0 1 x x x x x Rx+=mpy(Rs.L,Rt.H)[:<<N]
1 1 1 0 1 1 1 0 N 0 0 s s s s s P P - t t t t t 0 1 0 x x x x x Rx+=mpy(Rs.H,Rt.L)[:<<N]
1 1 1 0 1 1 1 0 N 0 0 s s s s s P P - t t t t t 0 1 1 x x x x x Rx+=mpy(Rs.H,Rt.H)[:<<N]

1 1 1 0 1 1 1 0 N 0 0 s s s s s P P - t t t t t 1 0 0 x x x x x Rx+=mpy(Rs.L,Rt.L)[:<<N]:
sat
Rx+=mpy(Rs.L,Rt.H)[:<<N]:
1 1 1 0 1 1 1 0 N 0 0 s s s s s P P - t t t t t 1 0 1 x x x x x sat

1 1 1 0 1 1 1 0 N 0 0 s s s s s P P - t t t t t 1 1 0 x x x x x Rx+=mpy(Rs.H,Rt.L)[:<<N]:
sat

1 1 1 0 1 1 1 0 N 0 0 s s s s s P P - t t t t t 1 1 1 x x x x x Rx+=mpy(Rs.H,Rt.H)[:<<N]
:sat
1 1 1 0 1 1 1 0 N 0 1 s s s s s P P - t t t t t 0 0 0 x x x x x Rx-=mpy(Rs.L,Rt.L)[:<<N]
1 1 1 0 1 1 1 0 N 0 1 s s s s s P P - t t t t t 0 0 1 x x x x x Rx-=mpy(Rs.L,Rt.H)[:<<N]
1 1 1 0 1 1 1 0 N 0 1 s s s s s P P - t t t t t 0 1 0 x x x x x Rx-=mpy(Rs.H,Rt.L)[:<<N]

80-N2040-53 Rev. AA 501


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 1 1 1 0 N 0 1 s s s s s P P - t t t t t 0 1 1 x x x x x Rx-=mpy(Rs.H,Rt.H)[:<<N]
Rx-
1 1 1 0 1 1 1 0 N 0 1 s s s s s P P - t t t t t 1 0 0 x x x x x
=mpy(Rs.L,Rt.L)[:<<N]:sat
Rx-
1 1 1 0 1 1 1 0 N 0 1 s s s s s P P - t t t t t 1 0 1 x x x x x
=mpy(Rs.L,Rt.H)[:<<N]:sat
Rx-
1 1 1 0 1 1 1 0 N 0 1 s s s s s P P - t t t t t 1 1 0 x x x x x
=mpy(Rs.H,Rt.L)[:<<N]:sat
Rx-
1 1 1 0 1 1 1 0 N 0 1 s s s s s P P - t t t t t 1 1 1 x x x x x
=mpy(Rs.H,Rt.H)[:<<N]:sat

Field name Description


ICLASS Instruction class
MajOp Major opcode
MinOp Minor opcode
RegType Register type
sH Rs is high
tH Rt is high
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
x5 Field to encode register x

80-N2040-53 Rev. AA 502


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Multiply unsigned halfwords


Multiply two unsigned halfwords. Scale the result by 0-3 bits. Optionally, add or subtract the
result from the accumulator.

Rx+=mpyu(Rs.[HL],Rt.[HL])[:<<1] Rxx+=mpyu(Rs.[HL],Rt.[HL])[:<<1]
Rd = mpyu(Rs.[HL],Rt.[HL])[:<<1] Rdd = mpyu(Rs.[HL],Rt.[HL])[:<<1]

Rs Rt Rs Rt

mux mux mux mux

16 x 16 16 x 16
0x0
0x0
32 32

<<0-1 <<0-1
mux
mux

32-bit 64-bit add/


add/sub sub

Rx

Rxx

Syntax Behavior
Rd=mpyu(Rs.[HL],Rt.[HL])[:<<1] Rd=(Rs.uh[01] * Rt.uh[01])[<<1];
Rdd=mpyu(Rs.[HL],Rt.[HL])[:<<1 Rdd=(Rs.uh[01] * Rt.uh[01])[<<1];
]
Rx+=mpyu(Rs.[HL],Rt.[HL])[:<<1 Rx=Rx+ (Rs.uh[01] * Rt.uh[01])[<<1];
]
Rx- Rx=Rx- (Rs.uh[01] * Rt.uh[01])[<<1];
=mpyu(Rs.[HL],Rt.[HL])[:<<1]
Rxx+=mpyu(Rs.[HL],Rt.[HL])[:<< Rxx=Rxx+ (Rs.uh[01] * Rt.uh[01])[<<1];
1]
Rxx- Rxx=Rxx- (Rs.uh[01] * Rt.uh[01])[<<1];
=mpyu(Rs.[HL],Rt.[HL])[:<<1]

80-N2040-53 Rev. AA 503


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Class: XTYPE (slots 2,3)

Intrinsics

Rd=mpyu(Rs.H,Rt.H) UWord32 Q6_R_mpyu_RhRh(Word32 Rs, Word32 Rt)


Rd=mpyu(Rs.H,Rt.H):<<1 UWord32 Q6_R_mpyu_RhRh_s1(Word32 Rs, Word32 Rt)
Rd=mpyu(Rs.H,Rt.L) UWord32 Q6_R_mpyu_RhRl(Word32 Rs, Word32 Rt)
Rd=mpyu(Rs.H,Rt.L):<<1 UWord32 Q6_R_mpyu_RhRl_s1(Word32 Rs, Word32 Rt)
Rd=mpyu(Rs.L,Rt.H) UWord32 Q6_R_mpyu_RlRh(Word32 Rs, Word32 Rt)
Rd=mpyu(Rs.L,Rt.H):<<1 UWord32 Q6_R_mpyu_RlRh_s1(Word32 Rs, Word32 Rt)
Rd=mpyu(Rs.L,Rt.L) UWord32 Q6_R_mpyu_RlRl(Word32 Rs, Word32 Rt)
Rd=mpyu(Rs.L,Rt.L):<<1 UWord32 Q6_R_mpyu_RlRl_s1(Word32 Rs, Word32 Rt)
Rdd=mpyu(Rs.H,Rt.H) UWord64 Q6_P_mpyu_RhRh(Word32 Rs, Word32 Rt)
Rdd=mpyu(Rs.H,Rt.H):<<1 UWord64 Q6_P_mpyu_RhRh_s1(Word32 Rs, Word32 Rt)
Rdd=mpyu(Rs.H,Rt.L) UWord64 Q6_P_mpyu_RhRl(Word32 Rs, Word32 Rt)
Rdd=mpyu(Rs.H,Rt.L):<<1 UWord64 Q6_P_mpyu_RhRl_s1(Word32 Rs, Word32 Rt)
Rdd=mpyu(Rs.L,Rt.H) UWord64 Q6_P_mpyu_RlRh(Word32 Rs, Word32 Rt)
Rdd=mpyu(Rs.L,Rt.H):<<1 UWord64 Q6_P_mpyu_RlRh_s1(Word32 Rs, Word32 Rt)
Rdd=mpyu(Rs.L,Rt.L) UWord64 Q6_P_mpyu_RlRl(Word32 Rs, Word32 Rt)
Rdd=mpyu(Rs.L,Rt.L):<<1 UWord64 Q6_P_mpyu_RlRl_s1(Word32 Rs, Word32 Rt)
Rx+=mpyu(Rs.H,Rt.H) Word32 Q6_R_mpyuacc_RhRh(Word32 Rx, Word32 Rs, Word32
Rt)
Rx+=mpyu(Rs.H,Rt.H):<<1 Word32 Q6_R_mpyuacc_RhRh_s1(Word32 Rx, Word32 Rs,
Word32 Rt)
Rx+=mpyu(Rs.H,Rt.L) Word32 Q6_R_mpyuacc_RhRl(Word32 Rx, Word32 Rs, Word32
Rt)
Rx+=mpyu(Rs.H,Rt.L):<<1 Word32 Q6_R_mpyuacc_RhRl_s1(Word32 Rx, Word32 Rs,
Word32 Rt)
Rx+=mpyu(Rs.L,Rt.H) Word32 Q6_R_mpyuacc_RlRh(Word32 Rx, Word32 Rs, Word32
Rt)
Rx+=mpyu(Rs.L,Rt.H):<<1 Word32 Q6_R_mpyuacc_RlRh_s1(Word32 Rx, Word32 Rs,
Word32 Rt)
Rx+=mpyu(Rs.L,Rt.L) Word32 Q6_R_mpyuacc_RlRl(Word32 Rx, Word32 Rs, Word32
Rt)
Rx+=mpyu(Rs.L,Rt.L):<<1 Word32 Q6_R_mpyuacc_RlRl_s1(Word32 Rx, Word32 Rs,
Word32 Rt)
Rx-=mpyu(Rs.H,Rt.H) Word32 Q6_R_mpyunac_RhRh(Word32 Rx, Word32 Rs, Word32
Rt)
Rx-=mpyu(Rs.H,Rt.H):<<1 Word32 Q6_R_mpyunac_RhRh_s1(Word32 Rx, Word32 Rs,
Word32 Rt)
Rx-=mpyu(Rs.H,Rt.L) Word32 Q6_R_mpyunac_RhRl(Word32 Rx, Word32 Rs, Word32
Rt)

80-N2040-53 Rev. AA 504


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Rx-=mpyu(Rs.H,Rt.L):<<1 Word32 Q6_R_mpyunac_RhRl_s1(Word32 Rx, Word32 Rs,


Word32 Rt)
Rx-=mpyu(Rs.L,Rt.H) Word32 Q6_R_mpyunac_RlRh(Word32 Rx, Word32 Rs, Word32
Rt)
Rx-=mpyu(Rs.L,Rt.H):<<1 Word32 Q6_R_mpyunac_RlRh_s1(Word32 Rx, Word32 Rs,
Word32 Rt)
Rx-=mpyu(Rs.L,Rt.L) Word32 Q6_R_mpyunac_RlRl(Word32 Rx, Word32 Rs, Word32
Rt)
Rx-=mpyu(Rs.L,Rt.L):<<1 Word32 Q6_R_mpyunac_RlRl_s1(Word32 Rx, Word32 Rs,
Word32 Rt)
Rxx+=mpyu(Rs.H,Rt.H) Word64 Q6_P_mpyuacc_RhRh(Word64 Rxx, Word32 Rs, Word32
Rt)
Rxx+=mpyu(Rs.H,Rt.H):<<1 Word64 Q6_P_mpyuacc_RhRh_s1(Word64 Rxx, Word32 Rs,
Word32 Rt)
Rxx+=mpyu(Rs.H,Rt.L) Word64 Q6_P_mpyuacc_RhRl(Word64 Rxx, Word32 Rs, Word32
Rt)
Rxx+=mpyu(Rs.H,Rt.L):<<1 Word64 Q6_P_mpyuacc_RhRl_s1(Word64 Rxx, Word32 Rs,
Word32 Rt)
Rxx+=mpyu(Rs.L,Rt.H) Word64 Q6_P_mpyuacc_RlRh(Word64 Rxx, Word32 Rs, Word32
Rt)
Rxx+=mpyu(Rs.L,Rt.H):<<1 Word64 Q6_P_mpyuacc_RlRh_s1(Word64 Rxx, Word32 Rs,
Word32 Rt)
Rxx+=mpyu(Rs.L,Rt.L) Word64 Q6_P_mpyuacc_RlRl(Word64 Rxx, Word32 Rs, Word32
Rt)
Rxx+=mpyu(Rs.L,Rt.L):<<1 Word64 Q6_P_mpyuacc_RlRl_s1(Word64 Rxx, Word32 Rs,
Word32 Rt)
Rxx-=mpyu(Rs.H,Rt.H) Word64 Q6_P_mpyunac_RhRh(Word64 Rxx, Word32 Rs, Word32
Rt)
Rxx-=mpyu(Rs.H,Rt.H):<<1 Word64 Q6_P_mpyunac_RhRh_s1(Word64 Rxx, Word32 Rs,
Word32 Rt)
Rxx-=mpyu(Rs.H,Rt.L) Word64 Q6_P_mpyunac_RhRl(Word64 Rxx, Word32 Rs, Word32
Rt)
Rxx-=mpyu(Rs.H,Rt.L):<<1 Word64 Q6_P_mpyunac_RhRl_s1(Word64 Rxx, Word32 Rs,
Word32 Rt)
Rxx-=mpyu(Rs.L,Rt.H) Word64 Q6_P_mpyunac_RlRh(Word64 Rxx, Word32 Rs, Word32
Rt)
Rxx-=mpyu(Rs.L,Rt.H):<<1 Word64 Q6_P_mpyunac_RlRh_s1(Word64 Rxx, Word32 Rs,
Word32 Rt)
Rxx-=mpyu(Rs.L,Rt.L) Word64 Q6_P_mpyunac_RlRl(Word64 Rxx, Word32 Rs, Word32
Rt)
Rxx-=mpyu(Rs.L,Rt.L):<<1 Word64 Q6_P_mpyunac_RlRl_s1(Word64 Rxx, Word32 Rs,
Word32 Rt)

80-N2040-53 Rev. AA 505


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse t5 MinOp d5
Rdd=mpyu(Rs.L,Rt.L)[:<<N
1 1 1 0 0 1 0 0 N 1 0 s s s s s P P - t t t t t - 0 0 d d d d d ]

Rdd=mpyu(Rs.L,Rt.H)[:<<N
1 1 1 0 0 1 0 0 N 1 0 s s s s s P P - t t t t t - 0 1 d d d d d
]
Rdd=mpyu(Rs.H,Rt.L)[:<<N
1 1 1 0 0 1 0 0 N 1 0 s s s s s P P - t t t t t - 1 0 d d d d d
]
Rdd=mpyu(Rs.H,Rt.H)[:<<
1 1 1 0 0 1 0 0 N 1 0 s s s s s P P - t t t t t - 1 1 d d d d d
N]
ICLASS RegType MajOp s5 Parse t5 MinOp x5
Rxx+=mpyu(Rs.L,Rt.L)[:<<
1 1 1 0 0 1 1 0 N 1 0 s s s s s P P - t t t t t 0 0 0 x x x x x
N]
Rxx+=mpyu(Rs.L,Rt.H)[:<<
1 1 1 0 0 1 1 0 N 1 0 s s s s s P P - t t t t t 0 0 1 x x x x x
N]
Rxx+=mpyu(Rs.H,Rt.L)[:<<
1 1 1 0 0 1 1 0 N 1 0 s s s s s P P - t t t t t 0 1 0 x x x x x N]

1 1 1 0 0 1 1 0 N 1 0 s s s s s P P - t t t t t 0 1 1 x x x x x Rxx+=mpyu(Rs.H,Rt.H)[:<<
N]

1 1 1 0 0 1 1 0 N 1 1 s s s s s P P - t t t t t 0 0 0 x x x x x Rxx-
=mpyu(Rs.L,Rt.L)[:<<N]
Rxx-
1 1 1 0 0 1 1 0 N 1 1 s s s s s P P - t t t t t 0 0 1 x x x x x =mpyu(Rs.L,Rt.H)[:<<N]

1 1 1 0 0 1 1 0 N 1 1 s s s s s P P - t t t t t 0 1 0 x x x x x Rxx-
=mpyu(Rs.H,Rt.L)[:<<N]

1 1 1 0 0 1 1 0 N 1 1 s s s s s P P - t t t t t 0 1 1 x x x x x Rxx-
=mpyu(Rs.H,Rt.H)[:<<N]
ICLASS RegType MajOp s5 Parse t5 MinOp d5
1 1 1 0 1 1 0 0 N 1 0 s s s s s P P - t t t t t 0 0 0 d d d d d Rd=mpyu(Rs.L,Rt.L)[:<<N]
1 1 1 0 1 1 0 0 N 1 0 s s s s s P P - t t t t t 0 0 1 d d d d d Rd=mpyu(Rs.L,Rt.H)[:<<N]
1 1 1 0 1 1 0 0 N 1 0 s s s s s P P - t t t t t 0 1 0 d d d d d Rd=mpyu(Rs.H,Rt.L)[:<<N]
1 1 1 0 1 1 0 0 N 1 0 s s s s s P P - t t t t t 0 1 1 d d d d d Rd=mpyu(Rs.H,Rt.H)[:<<N]
ICLASS RegType MajOp s5 Parse t5 MinOp x5
Rx+=mpyu(Rs.L,Rt.L)[:<<N
1 1 1 0 1 1 1 0 N 1 0 s s s s s P P - t t t t t 0 0 0 x x x x x ]

1 1 1 0 1 1 1 0 N 1 0 s s s s s P P - t t t t t 0 0 1 x x x x x Rx+=mpyu(Rs.L,Rt.H)[:<<N
]
Rx+=mpyu(Rs.H,Rt.L)[:<<N
1 1 1 0 1 1 1 0 N 1 0 s s s s s P P - t t t t t 0 1 0 x x x x x ]

1 1 1 0 1 1 1 0 N 1 0 s s s s s P P - t t t t t 0 1 1 x x x x x Rx+=mpyu(Rs.H,Rt.H)[:<<
N]
1 1 1 0 1 1 1 0 N 1 1 s s s s s P P - t t t t t 0 0 0 x x x x x Rx-=mpyu(Rs.L,Rt.L)[:<<N]
1 1 1 0 1 1 1 0 N 1 1 s s s s s P P - t t t t t 0 0 1 x x x x x Rx-=mpyu(Rs.L,Rt.H)[:<<N]
1 1 1 0 1 1 1 0 N 1 1 s s s s s P P - t t t t t 0 1 0 x x x x x Rx-=mpyu(Rs.H,Rt.L)[:<<N]

1 1 1 0 1 1 1 0 N 1 1 s s s s s P P - t t t t t 0 1 1 x x x x x Rx-
=mpyu(Rs.H,Rt.H)[:<<N]

Field name Description


ICLASS Instruction class
MajOp Major opcode
MinOp Minor opcode
RegType Register type
sH Rs is high
tH Rt is high

80-N2040-53 Rev. AA 506


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Field name Description


Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
x5 Field to encode register x

80-N2040-53 Rev. AA 507


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Polynomial multiply words

Perform a 32 × 32 carryless polynomial multiply using 32-bit source registers Rs and Rt. The 64-bit
result is optionally accumulated (XORed) with the destination register. Finite field multiply
instructions are useful for many algorithms including scramble code generation, cryptographic
algorithms, convolutional, and Reed Solomon codes.
Rxx += pmpyw(Rs,Rt)

Rs

Rt

32 x 32
carryless
polynomial
mpy *
XOR

Rxx

Syntax Behavior
Rdd=pmpyw(Rs,Rt) x = Rs.uw[0];
y = Rt.uw[0];
prod = 0;
for(i=0; i < 32; i++) {
if((y >> i) & 1) prod ^= (x << i);
}
Rdd = prod;
Rxx^=pmpyw(Rs,Rt) x = Rs.uw[0];
y = Rt.uw[0];
prod = 0;
for(i=0; i < 32; i++) {
if((y >> i) & 1) prod ^= (x << i);
}
Rxx ^= prod;

80-N2040-53 Rev. AA 508


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Class: XTYPE (slots 2,3)

Intrinsics

Rdd=pmpyw(Rs,Rt) Word64 Q6_P_pmpyw_RR(Word32 Rs, Word32 Rt)


Rxx^=pmpyw(Rs,Rt) Word64 Q6_P_pmpywxacc_RR(Word64 Rxx, Word32
Rs, Word32 Rt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse t5 MinOp d5
1 1 1 0 0 1 0 1 0 1 0 s s s s s P P 0 t t t t t 1 1 1 d d d d d Rdd=pmpyw(Rs,Rt)
ICLASS RegType MajOp s5 Parse t5 MinOp x5
1 1 1 0 0 1 1 1 0 0 1 s s s s s P P 0 t t t t t 1 1 1 x x x x x Rxx^=pmpyw(Rs,Rt)

Field name Description


ICLASS Instruction class
MajOp Major opcode
MinOp Minor opcode
RegType Register type
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
x5 Field to encode register x

80-N2040-53 Rev. AA 509


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector reduce multiply word by signed half (32 × 16)


Perform mixed precision vector multiply operations and accumulate the results. A 32-bit word
from vector Rss is multiplied by a 16-bit halfword (either even or odd) from vector Rtt. The
multiplication is performed as a signed 32 × 16, which produces a 48-bit result. This result is
optionally scaled left by one bit. A similar operation is performed for both words in Rss, and
accumulates the two results. The final result optionally accumulates with Rxx.

s16 s16 s16 s16 Rtt

mux mux

s32 s32 Rss

* *
48 48

<<0-1 <<0-1

Add

Rxx

Syntax Behavior
Rdd=vrmpyweh(Rss,Rtt)[:<<1] Rdd = (Rss.w[1] * Rtt.h[2])[<<1] + (Rss.w[0] *
Rtt.h[0])[<<1];
Rdd=vrmpywoh(Rss,Rtt)[:<<1] Rdd = (Rss.w[1] * Rtt.h[3])[<<1] + (Rss.w[0] *
Rtt.h[1])[<<1];
Rxx+=vrmpyweh(Rss,Rtt)[:<<1] Rxx += (Rss.w[1] * Rtt.h[2])[<<1] + (Rss.w[0] *
Rtt.h[0])[<<1];
Rxx+=vrmpywoh(Rss,Rtt)[:<<1] Rxx += (Rss.w[1] * Rtt.h[3])[<<1] + (Rss.w[0] *
Rtt.h[1])[<<1];

80-N2040-53 Rev. AA 510


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Class: XTYPE (slots 2,3)

Intrinsics

Rdd=vrmpyweh(Rss,Rtt) Word64 Q6_P_vrmpyweh_PP(Word64 Rss, Word64 Rtt)


Rdd=vrmpyweh(Rss,Rtt):<< Word64 Q6_P_vrmpyweh_PP_s1(Word64 Rss, Word64 Rtt)
1
Rdd=vrmpywoh(Rss,Rtt) Word64 Q6_P_vrmpywoh_PP(Word64 Rss, Word64 Rtt)
Rdd=vrmpywoh(Rss,Rtt):<< Word64 Q6_P_vrmpywoh_PP_s1(Word64 Rss, Word64 Rtt)
1
Rxx+=vrmpyweh(Rss,Rtt) Word64 Q6_P_vrmpywehacc_PP(Word64 Rxx, Word64 Rss,
Word64 Rtt)
Rxx+=vrmpyweh(Rss,Rtt):< Word64 Q6_P_vrmpywehacc_PP_s1(Word64 Rxx, Word64 Rss,
<1 Word64 Rtt)
Rxx+=vrmpywoh(Rss,Rtt) Word64 Q6_P_vrmpywohacc_PP(Word64 Rxx, Word64 Rss,
Word64 Rtt)
Rxx+=vrmpywoh(Rss,Rtt):< Word64 Q6_P_vrmpywohacc_PP_s1(Word64 Rxx, Word64 Rss,
<1 Word64 Rtt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse t5 MinOp d5

1 1 1 0 1 0 0 0 N 0 1 s s s s s P P 0 t t t t t 0 1 0 d d d d d Rdd=vrmpywoh(Rss,Rtt)[:<
<N]
Rdd=vrmpyweh(Rss,Rtt)[:<
1 1 1 0 1 0 0 0 N 1 0 s s s s s P P 0 t t t t t 1 0 0 d d d d d <N]

ICLASS RegType MajOp s5 Parse t5 MinOp x5


Rxx+=vrmpyweh(Rss,Rtt)[:
1 1 1 0 1 0 1 0 N 0 1 s s s s s P P 0 t t t t t 1 1 0 x x x x x <<N]

1 1 1 0 1 0 1 0 N 1 1 s s s s s P P 0 t t t t t 1 1 0 x x x x x Rxx+=vrmpywoh(Rss,Rtt)[:
<<N]

Field name Description


ICLASS Instruction class
MajOp Major opcode
MinOp Minor opcode
RegType Register type
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
x5 Field to encode register x

80-N2040-53 Rev. AA 511


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Multiply and use upper result


Multiply two signed or unsigned 32-bit words. Store the upper 32-bits of this result to a single
destination register. Optional rounding is available.

Rs

Rt

32x32

64

Rd

Syntax Behavior
Rd=mpy(Rs,Rt.H):<<1:rnd:sat Rd = sat32(((Rs * Rt.h[1])<<1+0x8000)>>16);

Rd=mpy(Rs,Rt.H):<<1:sat Rd = sat32(((Rs * Rt.h[1])<<1)>>16);

Rd=mpy(Rs,Rt.L):<<1:rnd:sat Rd = sat32(((Rs * Rt.h[0])<<1+0x8000)>>16);

Rd=mpy(Rs,Rt.L):<<1:sat Rd = sat32(((Rs * Rt.h[0])<<1)>>16);

Rd=mpy(Rs,Rt) Rd=(Rs * Rt)>>32;


Rd=mpy(Rs,Rt):<<1 Rd=(Rs * Rt)>>31;
Rd=mpy(Rs,Rt):<<1:sat Rd=sat32((Rs * Rt)>>31);

Rd=mpy(Rs,Rt):rnd Rd=((Rs * Rt)+0x80000000)>>32;


Rd=mpysu(Rs,Rt) Rd=(Rs * Rt.uw[0])>>32;
Rd=mpyu(Rs,Rt) Rd=(Rs.uw[0] * Rt.uw[0])>>32;
Rx+=mpy(Rs,Rt):<<1:sat Rx=sat32((Rx) + ((Rs * Rt)>>31));

Rx-=mpy(Rs,Rt):<<1:sat Rx=sat32((Rx) - ((Rs * Rt)>>31));

Class: XTYPE (slots 2,3)

Notes

■ If saturation occurs during execution of this instruction (a result is clamped to either


maximum or minimum values), the OVF bit in the status register is set. OVF remains set until
explicitly cleared by a transfer to the status register.

80-N2040-53 Rev. AA 512


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Intrinsics

Rd=mpy(Rs,Rt.H):<<1:rnd:s Word32 Q6_R_mpy_RRh_s1_rnd_sat(Word32 Rs, Word32


at Rt)
Rd=mpy(Rs,Rt.H):<<1:sat Word32 Q6_R_mpy_RRh_s1_sat(Word32 Rs, Word32 Rt)
Rd=mpy(Rs,Rt.L):<<1:rnd:s Word32 Q6_R_mpy_RRl_s1_rnd_sat(Word32 Rs, Word32
at Rt)
Rd=mpy(Rs,Rt.L):<<1:sat Word32 Q6_R_mpy_RRl_s1_sat(Word32 Rs, Word32 Rt)
Rd=mpy(Rs,Rt) Word32 Q6_R_mpy_RR(Word32 Rs, Word32 Rt)
Rd=mpy(Rs,Rt):<<1 Word32 Q6_R_mpy_RR_s1(Word32 Rs, Word32 Rt)
Rd=mpy(Rs,Rt):<<1:sat Word32 Q6_R_mpy_RR_s1_sat(Word32 Rs, Word32 Rt)
Rd=mpy(Rs,Rt):rnd Word32 Q6_R_mpy_RR_rnd(Word32 Rs, Word32 Rt)
Rd=mpysu(Rs,Rt) Word32 Q6_R_mpysu_RR(Word32 Rs, Word32 Rt)
Rd=mpyu(Rs,Rt) UWord32 Q6_R_mpyu_RR(Word32 Rs, Word32 Rt)
Rx+=mpy(Rs,Rt):<<1:sat Word32 Q6_R_mpyacc_RR_s1_sat(Word32 Rx, Word32 Rs,
Word32 Rt)
Rx-=mpy(Rs,Rt):<<1:sat Word32 Q6_R_mpynac_RR_s1_sat(Word32 Rx, Word32 Rs,
Word32 Rt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse t5 MinOp d5
1 1 1 0 1 1 0 1 0 0 1 s s s s s P P 0 t t t t t 0 0 1 d d d d d Rd=mpy(Rs,Rt):rnd
1 1 1 0 1 1 0 1 0 1 0 s s s s s P P 0 t t t t t 0 0 1 d d d d d Rd=mpyu(Rs,Rt)
1 1 1 0 1 1 0 1 0 1 1 s s s s s P P 0 t t t t t 0 0 1 d d d d d Rd=mpysu(Rs,Rt)
1 1 1 0 1 1 0 1 1 0 1 s s s s s P P 0 t t t t t 0 0 0 d d d d d Rd=mpy(Rs,Rt.H):<<1:sat
1 1 1 0 1 1 0 1 1 0 1 s s s s s P P 0 t t t t t 0 0 1 d d d d d Rd=mpy(Rs,Rt.L):<<1:sat

1 1 1 0 1 1 0 1 1 0 1 s s s s s P P 0 t t t t t 1 0 0 d d d d d Rd=mpy(Rs,Rt.H):<<1:rnd:
sat
1 1 1 0 1 1 0 1 1 1 1 s s s s s P P 0 t t t t t 0 0 0 d d d d d Rd=mpy(Rs,Rt):<<1:sat

1 1 1 0 1 1 0 1 1 1 1 s s s s s P P 0 t t t t t 1 0 0 d d d d d Rd=mpy(Rs,Rt.L):<<1:rnd:s
at
1 1 1 0 1 1 0 1 N 0 N s s s s s P P 0 t t t t t 0 N N d d d d d Rd=mpy(Rs,Rt)[:<<N]
ICLASS RegType MajOp s5 Parse t5 MinOp x5
1 1 1 0 1 1 1 1 0 1 1 s s s s s P P 0 t t t t t 0 0 0 x x x x x Rx+=mpy(Rs,Rt):<<1:sat
1 1 1 0 1 1 1 1 0 1 1 s s s s s P P 0 t t t t t 0 0 1 x x x x x Rx-=mpy(Rs,Rt):<<1:sat

Field name Description


ICLASS Instruction class
MajOp Major opcode
MinOp Minor opcode
RegType Register type
Parse Packet/loop parse bits
d5 Field to encode register d

80-N2040-53 Rev. AA 513


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Field name Description


s5 Field to encode register s
t5 Field to encode register t
x5 Field to encode register x

80-N2040-53 Rev. AA 514


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Multiply and use full result


Multiply two signed or unsigned 32-bit words. Optionally, add or subtract this value from the 64-
bit accumulator. The result is a full-precision 64-bit value.

Rs

Rt

32 x 32

64

64-bit add/sub

Rxx

Syntax Behavior
Rdd=mpy(Rs,Rt) Rdd=(Rs * Rt);
Rdd=mpyu(Rs,Rt) Rdd=(Rs.uw[0] * Rt.uw[0]);
Rxx[+-]=mpy(Rs,Rt) Rxx= Rxx [+-] (Rs * Rt);
Rxx[+-]=mpyu(Rs,Rt) Rxx= Rxx [+-] (Rs.uw[0] * Rt.uw[0]);

Class: XTYPE (slots 2,3)

Intrinsics

Rdd=mpy(Rs,Rt) Word64 Q6_P_mpy_RR(Word32 Rs, Word32 Rt)


Rdd=mpyu(Rs,Rt) UWord64 Q6_P_mpyu_RR(Word32 Rs, Word32 Rt)
Rxx+=mpy(Rs,Rt) Word64 Q6_P_mpyacc_RR(Word64 Rxx, Word32 Rs,
Word32 Rt)
Rxx+=mpyu(Rs,Rt) Word64 Q6_P_mpyuacc_RR(Word64 Rxx, Word32 Rs,
Word32 Rt)
Rxx-=mpy(Rs,Rt) Word64 Q6_P_mpynac_RR(Word64 Rxx, Word32 Rs,
Word32 Rt)
Rxx-=mpyu(Rs,Rt) Word64 Q6_P_mpyunac_RR(Word64 Rxx, Word32 Rs,
Word32 Rt)

80-N2040-53 Rev. AA 515


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse t5 MinOp d5
1 1 1 0 0 1 0 1 0 0 0 s s s s s P P 0 t t t t t 0 0 0 d d d d d Rdd=mpy(Rs,Rt)
1 1 1 0 0 1 0 1 0 1 0 s s s s s P P 0 t t t t t 0 0 0 d d d d d Rdd=mpyu(Rs,Rt)
ICLASS RegType MajOp s5 Parse t5 MinOp x5
1 1 1 0 0 1 1 1 0 0 0 s s s s s P P 0 t t t t t 0 0 0 x x x x x Rxx+=mpy(Rs,Rt)
1 1 1 0 0 1 1 1 0 0 1 s s s s s P P 0 t t t t t 0 0 0 x x x x x Rxx-=mpy(Rs,Rt)
1 1 1 0 0 1 1 1 0 1 0 s s s s s P P 0 t t t t t 0 0 0 x x x x x Rxx+=mpyu(Rs,Rt)
1 1 1 0 0 1 1 1 0 1 1 s s s s s P P 0 t t t t t 0 0 0 x x x x x Rxx-=mpyu(Rs,Rt)

Field name Description


ICLASS Instruction class
MajOp Major opcode
MinOp Minor opcode
RegType Register type
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
x5 Field to encode register x

80-N2040-53 Rev. AA 516


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector dual multiply


Multiply four 16-bit halfwords in Rss by the corresponding 16-bit halfwords in Rtt. Add and scale
the two lower results. Optionally add the lower word of the accumulator. Saturate this result to
32-bits and store in the lower word of the accumulator. Perform the same operation on the upper
two products using the upper word of the accumulator.

Rxx+=vdmpy(Rss,Rtt):sat

Rss

Rtt

* * * *
32 32 32 32

<<0-1 <<0-1 <<0-1 <<0-1

Add Add

Sat_32 Sat_32 32

32

High accumulation Low accumulation

Rxx

Syntax Behavior
Rdd=vdmpy(Rss,Rtt):<<1:s Rdd.w[0]=sat32((Rss.h[0] * Rtt.h[0])<<1 + (Rss.h[1] *
at Rtt.h[1])<<1);
Rdd.w[1]=sat32((Rss.h[2] * Rtt.h[2])<<1 + (Rss.h[3] *
Rtt.h[3])<<1);
Rdd=vdmpy(Rss,Rtt):sat Rdd.w[0]=sat32((Rss.h[0] * Rtt.h[0])<<0 + (Rss.h[1] *
Rtt.h[1])<<0);
Rdd.w[1]=sat32((Rss.h[2] * Rtt.h[2])<<0 + (Rss.h[3] *
Rtt.h[3])<<0);

80-N2040-53 Rev. AA 517


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Syntax Behavior
Rxx+=vdmpy(Rss,Rtt):<<1: Rxx.w[0]=sat32(Rxx.w[0] + (Rss.h[0] * Rtt.h[0])<<1 +
sat (Rss.h[1] * Rtt.h[1])<<1);
Rxx.w[1]=sat32(Rxx.w[1] + (Rss.h[2] * Rtt.h[2])<<1 +
(Rss.h[3] * Rtt.h[3])<<1);
Rxx+=vdmpy(Rss,Rtt):sat Rxx.w[0]=sat32(Rxx.w[0] + (Rss.h[0] * Rtt.h[0])<<0 +
(Rss.h[1] * Rtt.h[1])<<0);
Rxx.w[1]=sat32(Rxx.w[1] + (Rss.h[2] * Rtt.h[2])<<0 +
(Rss.h[3] * Rtt.h[3])<<0);

Class: XTYPE (slots 2,3)

Notes

■ If saturation occurs during execution of this instruction (a result is clamped to either


maximum or minimum values), the OVF bit in the status register is set. OVF remains set until
explicitly cleared by a transfer to the status register.

Intrinsics

Rdd=vdmpy(Rss,Rtt):<<1:sa Word64 Q6_P_vdmpy_PP_s1_sat(Word64 Rss, Word64 Rtt)


t
Rdd=vdmpy(Rss,Rtt):sat Word64 Q6_P_vdmpy_PP_sat(Word64 Rss, Word64 Rtt)
Rxx+=vdmpy(Rss,Rtt):<<1:s Word64 Q6_P_vdmpyacc_PP_s1_sat(Word64 Rxx, Word64
at Rss, Word64 Rtt)
Rxx+=vdmpy(Rss,Rtt):sat Word64 Q6_P_vdmpyacc_PP_sat(Word64 Rxx, Word64 Rss,
Word64 Rtt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse t5 MinOp d5
Rdd=vdmpy(Rss,Rtt)[:<<N]:
1 1 1 0 1 0 0 0 N 0 0 s s s s s P P 0 t t t t t 1 0 0 d d d d d sat

ICLASS RegType MajOp s5 Parse t5 MinOp x5


Rxx+=vdmpy(Rss,Rtt)[:<<N
1 1 1 0 1 0 1 0 N 0 0 s s s s s P P 0 t t t t t 1 0 0 x x x x x ]:sat

Field name Description


ICLASS Instruction class
MajOp Major opcode
MinOp Minor opcode
RegType Register type
Parse Packet/loop parse bits
d5 Field to encode register d

80-N2040-53 Rev. AA 518


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Field name Description


s5 Field to encode register s
t5 Field to encode register t
x5 Field to encode register x

80-N2040-53 Rev. AA 519


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector dual multiply with round and pack


Multiply four 16-bit halfwords in Rss by the corresponding 16-bit halfwords in Rtt. Scale and add
together the two lower results with a rounding constant. Saturate this result to 32-bits, and store
the upper 16-bits of this result in the lower 16-bits of the destination register. The same operation
is performed on the upper two products and the result is stored in the upper 16-bit halfword of
the destination.

Rd=vdmpy(Rss,Rtt):rnd:sat

Rss

Rtt

* * * *
32 32 32 32

0x8000 <<0-1 <<0-1 <<0-1 <<0-1 0x8000

Add Add

Sat_32 Sat_32

High 16 bits High 16 bits

Rd

Syntax Behavior
Rd=vdmpy(Rss,Rtt)[:<<1]:rnd: Rd.h[0]=(sat32((Rss.h[0] * Rtt.h[0])[<<1] +
sat (Rss.h[1] * Rtt.h[1])[<<1] + 0x8000)).h[1];
Rd.h[1]=(sat32((Rss.h[2] * Rtt.h[2])[<<1] +
(Rss.h[3] * Rtt.h[3])[<<1] + 0x8000)).h[1];

Class: XTYPE (slots 2,3)

Notes

■ If saturation occurs during execution of this instruction (a result is clamped to either


maximum or minimum values), the OVF bit in the status register is set. OVF remains set until
explicitly cleared by a transfer to the status register.

80-N2040-53 Rev. AA 520


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Intrinsics

Rd=vdmpy(Rss,Rtt):<<1:rnd:sa Word32 Q6_R_vdmpy_PP_s1_rnd_sat(Word64 Rss, Word64


t Rtt)
Rd=vdmpy(Rss,Rtt):rnd:sat Word32 Q6_R_vdmpy_PP_rnd_sat(Word64 Rss, Word64
Rtt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse t5 MinOp d5
Rd=vdmpy(Rss,Rtt)[:<<N]:r
1 1 1 0 1 0 0 1 N 0 0 s s s s s P P 0 t t t t t 0 0 0 d d d d d
nd:sat

Field name Description


ICLASS Instruction class
MajOp Major opcode
MinOp Minor opcode
RegType Register type
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 521


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector reduce multiply bytes


Multiply eight 8-bit bytes in Rss by the corresponding 8-bit bytes in Rtt. Accumulate the four
lower results. Optionally add the lower word of the accumulator. Store this result in the lower 32-
bits of the accumulator. The same operation is performed on the upper four products using the
upper word of the accumulator. The eight bytes of Rss are treated as either signed or unsigned.

Rss

Rtt

* * * * * * * *
16 16 16 16 16 16 16 16

Add Add
32 32

High accumulation Low accumulation

Rxx

Syntax Behavior
Rdd=vrmpybsu(Rss,Rtt Rdd.w[0]=((Rss.b[0] * Rtt.ub[0]) + (Rss.b[1] * Rtt.ub[1])
) + (Rss.b[2] * Rtt.ub[2]) + (Rss.b[3] * Rtt.ub[3]));
Rdd.w[1]=((Rss.b[4] * Rtt.ub[4]) + (Rss.b[5] * Rtt.ub[5])
+ (Rss.b[6] * Rtt.ub[6]) + (Rss.b[7] * Rtt.ub[7]));
Rdd=vrmpybu(Rss,Rtt) Rdd.w[0]=((Rss.ub[0] * Rtt.ub[0]) + (Rss.ub[1] *
Rtt.ub[1]) + (Rss.ub[2] * Rtt.ub[2]) + (Rss.ub[3] *
Rtt.ub[3]));
Rdd.w[1]=((Rss.ub[4] * Rtt.ub[4]) + (Rss.ub[5] *
Rtt.ub[5]) + (Rss.ub[6] * Rtt.ub[6]) + (Rss.ub[7] *
Rtt.ub[7]));
Rxx+=vrmpybsu(Rss,Rt Rxx.w[0]=(Rxx.w[0] + (Rss.b[0] * Rtt.ub[0]) + (Rss.b[1] *
t) Rtt.ub[1]) + (Rss.b[2] * Rtt.ub[2]) + (Rss.b[3] *
Rtt.ub[3]));
Rxx.w[1]=(Rxx.w[1] + (Rss.b[4] * Rtt.ub[4]) + (Rss.b[5] *
Rtt.ub[5]) + (Rss.b[6] * Rtt.ub[6]) + (Rss.b[7] *
Rtt.ub[7]));
Rxx+=vrmpybu(Rss,Rtt Rxx.w[0]=(Rxx.w[0] + (Rss.ub[0] * Rtt.ub[0]) + (Rss.ub[1]
) * Rtt.ub[1]) + (Rss.ub[2] * Rtt.ub[2]) + (Rss.ub[3] *
Rtt.ub[3]));
Rxx.w[1]=(Rxx.w[1] + (Rss.ub[4] * Rtt.ub[4]) + (Rss.ub[5]
* Rtt.ub[5]) + (Rss.ub[6] * Rtt.ub[6]) + (Rss.ub[7] *
Rtt.ub[7]));

80-N2040-53 Rev. AA 522


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Class: XTYPE (slots 2,3)

Intrinsics

Rdd=vrmpybsu(Rss,Rtt) Word64 Q6_P_vrmpybsu_PP(Word64 Rss, Word64 Rtt)


Rdd=vrmpybu(Rss,Rtt) Word64 Q6_P_vrmpybu_PP(Word64 Rss, Word64 Rtt)
Rxx+=vrmpybsu(Rss,Rtt) Word64 Q6_P_vrmpybsuacc_PP(Word64 Rxx, Word64 Rss,
Word64 Rtt)
Rxx+=vrmpybu(Rss,Rtt) Word64 Q6_P_vrmpybuacc_PP(Word64 Rxx, Word64 Rss,
Word64 Rtt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse t5 MinOp d5
1 1 1 0 1 0 0 0 1 0 0 s s s s s P P 0 t t t t t 0 0 1 d d d d d Rdd=vrmpybu(Rss,Rtt)
1 1 1 0 1 0 0 0 1 1 0 s s s s s P P 0 t t t t t 0 0 1 d d d d d Rdd=vrmpybsu(Rss,Rtt)
ICLASS RegType MajOp s5 Parse t5 MinOp x5
1 1 1 0 1 0 1 0 1 0 0 s s s s s P P 0 t t t t t 0 0 1 x x x x x Rxx+=vrmpybu(Rss,Rtt)
1 1 1 0 1 0 1 0 1 1 0 s s s s s P P 0 t t t t t 0 0 1 x x x x x Rxx+=vrmpybsu(Rss,Rtt)

Field name Description


ICLASS Instruction class
MajOp Major opcode
MinOp Minor opcode
RegType Register type
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
x5 Field to encode register x

80-N2040-53 Rev. AA 523


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector dual multiply signed by unsigned bytes


Multiply eight 8-bit signed bytes in Rss by the corresponding 8-bit unsigned bytes in Rtt. Add the
results in pairs, and optionally add the accumulator. Saturate the results to signed 16 bits and
store in the four halfwords of the destination register.

Rss

Rtt

* * * * * * * *
16 16 16 16 16 16 16 16

Add Add Add Add

Sat_16 Sat_16 Sat_16 Sat_16

Rxx

Syntax Behavior
Rdd=vdmpybsu(Rss,Rtt):sat Rdd.h[0]=sat16(((Rss.b[0] * Rtt.ub[0]) + (Rss.b[1]
* Rtt.ub[1])));
Rdd.h[1]=sat16((((Rss.b[2] * Rtt.ub[2]) +
(Rss.b[3] * Rtt.ub[3])));
Rdd.h[2]=sat16(((Rss.b[4] * Rtt.ub[4]) + (Rss.b[5]
* Rtt.ub[5])));
Rdd.h[3]=sat16(((Rss.b[6] * Rtt.ub[6]) + (Rss.b[7]
* Rtt.ub[7])));
Rxx+=vdmpybsu(Rss,Rtt):sat Rxx.h[0]=sat16((Rxx.h[0] + (Rss.b[0] * Rtt.ub[0])
+ (Rss.b[1] * Rtt.ub[1])));
Rxx.h[1]=sat16((Rxx.h[1] + (Rss.b[2] * Rtt.ub[2])
+ (Rss.b[3] * Rtt.ub[3])));
Rxx.h[2]=sat16((Rxx.h[2] + (Rss.b[4] * Rtt.ub[4])
+ (Rss.b[5] * Rtt.ub[5])));
Rxx.h[3]=sat16((Rxx.h[3] + (Rss.b[6] * Rtt.ub[6])
+ (Rss.b[7] * Rtt.ub[7])));

80-N2040-53 Rev. AA 524


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Class: XTYPE (slots 2,3)

Notes

■ If saturation occurs during execution of this instruction (a result is clamped to either


maximum or minimum values), the OVF bit in the status register is set. OVF remains set until
explicitly cleared by a transfer to the status register.

Intrinsics

Rdd=vdmpybsu(Rss,Rtt):sat Word64 Q6_P_vdmpybsu_PP_sat(Word64 Rss,


Word64 Rtt)
Rxx+=vdmpybsu(Rss,Rtt):sat Word64 Q6_P_vdmpybsuacc_PP_sat(Word64 Rxx,
Word64 Rss, Word64 Rtt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse t5 MinOp d5
Rdd=vdmpybsu(Rss,Rtt):sa
1 1 1 0 1 0 0 0 1 0 1 s s s s s P P 0 t t t t t 0 0 1 d d d d d t

ICLASS RegType MajOp s5 Parse t5 MinOp x5


Rxx+=vdmpybsu(Rss,Rtt):s
1 1 1 0 1 0 1 0 0 0 1 s s s s s P P 0 t t t t t 0 0 1 x x x x x at

Field name Description


ICLASS Instruction class
MajOp Major opcode
MinOp Minor opcode
RegType Register type
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
x5 Field to encode register x

80-N2040-53 Rev. AA 525


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector multiply even halfwords


Multiply the even 16-bit halfwords from Rss and Rtt separately. Optionally accumulate with the
low and high words of the destination register pair and optionally saturate.

Rxx+=vmpyeh(Rss,Rtt):sat

Rss

Rtt

* *
32 32

<<0-1 <<0-1

Add Add

Sat32 Sat32 32
32

High accumulation Low accumulation

Rxx

Syntax Behavior
Rdd=vmpyeh(Rss,Rtt):<<1:sat Rdd.w[0]=sat32((Rss.h[0] * Rtt.h[0])<<1);
Rdd.w[1]=sat32((Rss.h[2] * Rtt.h[2])<<1);

Rdd=vmpyeh(Rss,Rtt):sat Rdd.w[0]=sat32((Rss.h[0] * Rtt.h[0])<<0);


Rdd.w[1]=sat32((Rss.h[2] * Rtt.h[2])<<0);

Rxx+=vmpyeh(Rss,Rtt) Rxx.w[0]=Rxx.w[0] + (Rss.h[0] * Rtt.h[0]);


Rxx.w[1]=Rxx.w[1] + (Rss.h[2] * Rtt.h[2]);
Rxx+=vmpyeh(Rss,Rtt):<<1:sat Rxx.w[0]=sat32(Rxx.w[0] + (Rss.h[0] *
Rtt.h[0])<<1);
Rxx.w[1]=sat32(Rxx.w[1] + (Rss.h[2] *
Rtt.h[2])<<1);
Rxx+=vmpyeh(Rss,Rtt):sat Rxx.w[0]=sat32(Rxx.w[0] + (Rss.h[0] *
Rtt.h[0])<<0);
Rxx.w[1]=sat32(Rxx.w[1] + (Rss.h[2] *
Rtt.h[2])<<0);

80-N2040-53 Rev. AA 526


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Class: XTYPE (slots 2,3)

Notes

■ If saturation occurs during execution of this instruction (a result is clamped to either


maximum or minimum values), the OVF bit in the status register is set. OVF remains set until
explicitly cleared by a transfer to the status register.

Intrinsics

Rdd=vmpyeh(Rss,Rtt):<<1:sat Word64 Q6_P_vmpyeh_PP_s1_sat(Word64 Rss, Word64


Rtt)
Rdd=vmpyeh(Rss,Rtt):sat Word64 Q6_P_vmpyeh_PP_sat(Word64 Rss, Word64 Rtt)
Rxx+=vmpyeh(Rss,Rtt) Word64 Q6_P_vmpyehacc_PP(Word64 Rxx, Word64 Rss,
Word64 Rtt)
Rxx+=vmpyeh(Rss,Rtt):<<1:sat Word64 Q6_P_vmpyehacc_PP_s1_sat(Word64 Rxx, Word64
Rss, Word64 Rtt)
Rxx+=vmpyeh(Rss,Rtt):sat Word64 Q6_P_vmpyehacc_PP_sat(Word64 Rxx, Word64
Rss, Word64 Rtt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse t5 MinOp d5

1 1 1 0 1 0 0 0 N 0 0 s s s s s P P 0 t t t t t 1 1 0 d d d d d Rdd=vmpyeh(Rss,Rtt)[:<<N
]:sat
ICLASS RegType MajOp s5 Parse t5 MinOp x5
1 1 1 0 1 0 1 0 0 0 1 s s s s s P P 0 t t t t t 0 1 0 x x x x x Rxx+=vmpyeh(Rss,Rtt)

1 1 1 0 1 0 1 0 N 0 0 s s s s s P P 0 t t t t t 1 1 0 x x x x x Rxx+=vmpyeh(Rss,Rtt)[:<<
N]:sat

Field name Description


ICLASS Instruction class
MajOp Major opcode
MinOp Minor opcode
RegType Register type
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
x5 Field to encode register x

80-N2040-53 Rev. AA 527


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector multiply halfwords


Multiply two 16-bit halfwords separately, and optionally accumulate with the low and high words
of the destination. Optionally saturate, and store the results back to the destination register pair.

Rxx+=vmpyh(Rs,Rt):sat

Rs

Rt

* *
32 32

<<0-1 <<0-1

Add Add

Sat32 Sat32 32
32

High accumulation Low accumulation

Rxx

Syntax Behavior
Rdd=vmpyh(Rs,Rt)[:<<1]:sa Rdd.w[0]=sat32((Rs.h[0] * Rt.h[0])[<<1]);
t Rdd.w[1]=sat32((Rs.h[1] * Rt.h[1])[<<1]);

Rxx+=vmpyh(Rs,Rt) Rxx.w[0]=Rxx.w[0] + (Rs.h[0] * Rt.h[0]);


Rxx.w[1]=Rxx.w[1] + (Rs.h[1] * Rt.h[1]);
Rxx+=vmpyh(Rs,Rt)[:<<1]:s Rxx.w[0]=sat32(Rxx.w[0] + (Rs.h[0] * Rt.h[0])[<<1]);
at Rxx.w[1]=sat32(Rxx.w[1] + (Rs.h[1] * Rt.h[1])[<<1]);

Class: XTYPE (slots 2,3)

Notes

■ If saturation occurs during execution of this instruction (a result is clamped to either


maximum or minimum values), the OVF bit in the status register is set. OVF remains set until
explicitly cleared by a transfer to the status register.

80-N2040-53 Rev. AA 528


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Intrinsics

Rdd=vmpyh(Rs,Rt):<<1:sat Word64 Q6_P_vmpyh_RR_s1_sat(Word32 Rs, Word32 Rt)


Rdd=vmpyh(Rs,Rt):sat Word64 Q6_P_vmpyh_RR_sat(Word32 Rs, Word32 Rt)
Rxx+=vmpyh(Rs,Rt) Word64 Q6_P_vmpyhacc_RR(Word64 Rxx, Word32 Rs, Word32
Rt)
Rxx+=vmpyh(Rs,Rt):<<1:sa Word64 Q6_P_vmpyhacc_RR_s1_sat(Word64 Rxx, Word32 Rs,
t Word32 Rt)
Rxx+=vmpyh(Rs,Rt):sat Word64 Q6_P_vmpyhacc_RR_sat(Word64 Rxx, Word32 Rs,
Word32 Rt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse t5 MinOp d5
Rdd=vmpyh(Rs,Rt)[:<<N]:s
1 1 1 0 0 1 0 1 N 0 0 s s s s s P P 0 t t t t t 1 0 1 d d d d d at

ICLASS RegType MajOp s5 Parse t5 MinOp x5


1 1 1 0 0 1 1 1 0 0 1 s s s s s P P 0 t t t t t 0 0 1 x x x x x Rxx+=vmpyh(Rs,Rt)

1 1 1 0 0 1 1 1 N 0 0 s s s s s P P 0 t t t t t 1 0 1 x x x x x Rxx+=vmpyh(Rs,Rt)[:<<N]:
sat

Field name Description


ICLASS Instruction class
MajOp Major opcode
MinOp Minor opcode
RegType Register type
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
x5 Field to encode register x

80-N2040-53 Rev. AA 529


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector multiply halfwords with round and pack


Multiply two 16-bit halfwords separately. Round the results, and store the high halfwords packed
in a single register destination.

Rd=vmpyh(Rs,Rt):rnd:sat

Rs

Rt

* *
32 32
0x8000 <<0-1 <<0-1 0x8000

Add Add

Sat32 Sat32

High 16 bits
High 16 bits

Rd

Syntax Behavior
Rd=vmpyh(Rs,Rt)[:<<1]:rnd:sa Rd.h[1]=(sat32((Rs.h[1] * Rt.h[1])[<<1] +
t 0x8000)).h[1];
Rd.h[0]=(sat32((Rs.h[0] * Rt.h[0])[<<1] +
0x8000)).h[1];

Class: XTYPE (slots 2,3)

Notes

■ If saturation occurs during execution of this instruction (a result is clamped to either


maximum or minimum values), the OVF bit in the status register is set. OVF remains set until
explicitly cleared by a transfer to the status register.

80-N2040-53 Rev. AA 530


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Intrinsics

Rd=vmpyh(Rs,Rt):<<1:rnd:sat Word32 Q6_R_vmpyh_RR_s1_rnd_sat(Word32 Rs,


Word32 Rt)
Rd=vmpyh(Rs,Rt):rnd:sat Word32 Q6_R_vmpyh_RR_rnd_sat(Word32 Rs, Word32
Rt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse t5 MinOp d5
Rd=vmpyh(Rs,Rt)[:<<N]:rn
1 1 1 0 1 1 0 1 N 0 1 s s s s s P P 0 t t t t t 1 1 1 d d d d d
d:sat

Field name Description


ICLASS Instruction class
MajOp Major opcode
MinOp Minor opcode
RegType Register type
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 531


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector multiply halfwords signed by unsigned


Multiply two 16-bit halfwords. Rs is considered signed, Ru unsigned.

Syntax Behavior
Rdd=vmpyhsu(Rs,Rt)[:<<1]:sat Rdd.w[0]=sat32((Rs.h[0] * Rt.uh[0])[<<1]);
Rdd.w[1]=sat32((Rs.h[1] * Rt.uh[1])[<<1]);

Rxx+=vmpyhsu(Rs,Rt)[:<<1]:sa Rxx.w[0]=sat32(Rxx.w[0] + (Rs.h[0] *


t Rt.uh[0])[<<1]);
Rxx.w[1]=sat32(Rxx.w[1] + (Rs.h[1] *
Rt.uh[1])[<<1]);

Class: XTYPE (slots 2,3)

Notes

■ If saturation occurs during execution of this instruction (a result is clamped to either


maximum or minimum values), the OVF bit in the status register is set. OVF remains set until
explicitly cleared by a transfer to the status register.

Intrinsics

Rdd=vmpyhsu(Rs,Rt):<<1:sa Word64 Q6_P_vmpyhsu_RR_s1_sat(Word32 Rs, Word32 Rt)


t
Rdd=vmpyhsu(Rs,Rt):sat Word64 Q6_P_vmpyhsu_RR_sat(Word32 Rs, Word32 Rt)
Rxx+=vmpyhsu(Rs,Rt):<<1:s Word64 Q6_P_vmpyhsuacc_RR_s1_sat(Word64 Rxx, Word32
at Rs, Word32 Rt)
Rxx+=vmpyhsu(Rs,Rt):sat Word64 Q6_P_vmpyhsuacc_RR_sat(Word64 Rxx, Word32 Rs,
Word32 Rt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse t5 MinOp d5

1 1 1 0 0 1 0 1 N 0 0 s s s s s P P 0 t t t t t 1 1 1 d d d d d Rdd=vmpyhsu(Rs,Rt)[:<<N]
:sat
ICLASS RegType MajOp s5 Parse t5 MinOp x5

1 1 1 0 0 1 1 1 N 1 1 s s s s s P P 0 t t t t t 1 0 1 x x x x x Rxx+=vmpyhsu(Rs,Rt)[:<<
N]:sat

Field name Description


ICLASS Instruction class
MajOp Major opcode
MinOp Minor opcode
RegType Register type
Parse Packet/loop parse bits
d5 Field to encode register d

80-N2040-53 Rev. AA 532


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Field name Description


s5 Field to encode register s
t5 Field to encode register t
x5 Field to encode register x

80-N2040-53 Rev. AA 533


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector reduce multiply halfwords


Multiply each halfword of Rss by the corresponding halfword in Rtt. Add the intermediate
products together and then optionally add the accumulator. Store the full 64-bit result in the
destination register pair.
Rss

Rtt

* * * *
32 32
32 32

Add

64

Rdd
64-bit register pair

Syntax Behavior
Rdd=vrmpyh(Rss,Rtt) Rdd = (Rss.h[0] * Rtt.h[0]) + (Rss.h[1] * Rtt.h[1]) +
(Rss.h[2] * Rtt.h[2]) + (Rss.h[3] * Rtt.h[3]);
Rxx+=vrmpyh(Rss,Rtt) Rxx = Rxx + (Rss.h[0] * Rtt.h[0]) + (Rss.h[1] *
Rtt.h[1]) + (Rss.h[2] * Rtt.h[2]) + (Rss.h[3] *
Rtt.h[3]);

Class: XTYPE (slots 2,3)

Intrinsics

Rdd=vrmpyh(Rss,Rtt) Word64 Q6_P_vrmpyh_PP(Word64 Rss, Word64 Rtt)


Rxx+=vrmpyh(Rss,Rtt) Word64 Q6_P_vrmpyhacc_PP(Word64 Rxx, Word64 Rss,
Word64 Rtt)

80-N2040-53 Rev. AA 534


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse t5 MinOp d5
1 1 1 0 1 0 0 0 0 0 0 s s s s s P P 0 t t t t t 0 1 0 d d d d d Rdd=vrmpyh(Rss,Rtt)
ICLASS RegType MajOp s5 Parse t5 MinOp x5
1 1 1 0 1 0 1 0 0 0 0 s s s s s P P 0 t t t t t 0 1 0 x x x x x Rxx+=vrmpyh(Rss,Rtt)

Field name Description


ICLASS Instruction class
MajOp Major opcode
MinOp Minor opcode
RegType Register type
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
x5 Field to encode register x

80-N2040-53 Rev. AA 535


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector multiply bytes


Multiply four 8-bit bytes from register Rs by four 8-bit bytes from Rt. Optionally accumulate the
product with the 16-bit value from the destination register. Pack the 16-bit results in the
destination register pair. The bytes of Rs are treated as either signed or unsigned.

Rs

Rt

* * * *

Add Add Add Add

Rxx

Syntax Behavior
Rdd=vmpybsu(Rs,Rt) Rdd.h[0]=((Rs.b[0] * Rt.ub[0]));
Rdd.h[1]=((Rs.b[1] * Rt.ub[1]));
Rdd.h[2]=((Rs.b[2] * Rt.ub[2]));
Rdd.h[3]=((Rs.b[3] * Rt.ub[3]));
Rdd=vmpybu(Rs,Rt) Rdd.h[0]=((Rs.ub[0] * Rt.ub[0]));
Rdd.h[1]=((Rs.ub[1] * Rt.ub[1]));
Rdd.h[2]=((Rs.ub[2] * Rt.ub[2]));
Rdd.h[3]=((Rs.ub[3] * Rt.ub[3]));
Rxx+=vmpybsu(Rs,Rt) Rxx.h[0]=(Rxx.h[0]+(Rs.b[0] * Rt.ub[0]));
Rxx.h[1]=(Rxx.h[1]+(Rs.b[1] * Rt.ub[1]));
Rxx.h[2]=(Rxx.h[2]+(Rs.b[2] * Rt.ub[2]));
Rxx.h[3]=(Rxx.h[3]+(Rs.b[3] * Rt.ub[3]));
Rxx+=vmpybu(Rs,Rt) Rxx.h[0]=(Rxx.h[0]+(Rs.ub[0] * Rt.ub[0]));
Rxx.h[1]=(Rxx.h[1]+(Rs.ub[1] * Rt.ub[1]));
Rxx.h[2]=(Rxx.h[2]+(Rs.ub[2] * Rt.ub[2]));
Rxx.h[3]=(Rxx.h[3]+(Rs.ub[3] * Rt.ub[3]));

80-N2040-53 Rev. AA 536


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Class: XTYPE (slots 2,3)

Intrinsics

Rdd=vmpybsu(Rs,Rt) Word64 Q6_P_vmpybsu_RR(Word32 Rs, Word32 Rt)


Rdd=vmpybu(Rs,Rt) Word64 Q6_P_vmpybu_RR(Word32 Rs, Word32 Rt)
Rxx+=vmpybsu(Rs,Rt) Word64 Q6_P_vmpybsuacc_RR(Word64 Rxx, Word32 Rs, Word32
Rt)
Rxx+=vmpybu(Rs,Rt) Word64 Q6_P_vmpybuacc_RR(Word64 Rxx, Word32 Rs, Word32
Rt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse t5 MinOp d5
1 1 1 0 0 1 0 1 0 1 0 s s s s s P P 0 t t t t t 0 0 1 d d d d d Rdd=vmpybsu(Rs,Rt)
1 1 1 0 0 1 0 1 1 0 0 s s s s s P P 0 t t t t t 0 0 1 d d d d d Rdd=vmpybu(Rs,Rt)
ICLASS RegType MajOp s5 Parse t5 MinOp x5
1 1 1 0 0 1 1 1 1 0 0 s s s s s P P 0 t t t t t 0 0 1 x x x x x Rxx+=vmpybu(Rs,Rt)
1 1 1 0 0 1 1 1 1 1 0 s s s s s P P 0 t t t t t 0 0 1 x x x x x Rxx+=vmpybsu(Rs,Rt)

Field name Description


ICLASS Instruction class
MajOp Major opcode
MinOp Minor opcode
RegType Register type
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
x5 Field to encode register x

80-N2040-53 Rev. AA 537


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector polynomial multiply halfwords

Perform a vector 16 × 16 carryless polynomial multiply using 32-bit source registers Rs and Rt.
Store the 64-bit result in packed H,H,L,L format in the destination register. The destination
register can also be optionally accumulated (XORed). Finite field multiply instructions are useful
for many algorithms including scramble code generation, cryptographic algorithms,
convolutional, and Reed Solomon codes.

Rxx += vpmpyh(Rs,Rt)

Rs

Rt

16 x 16
carryless 16 x 16
polynomial
mpy * * carryless
polynomial
mpy

XOR XOR

Rxx

Syntax Behavior
Rdd=vpmpyh(Rs,Rt) x0 = Rs.uh[0];
x1 = Rs.uh[1];
y0 = Rt.uh[0];
y1 = Rt.uh[1];
prod0 = prod1 = 0;
for(i=0; i < 16; i++) {
if((y0 >> i) & 1) prod0 ^= (x0 << i);
if((y1 >> i) & 1) prod1 ^= (x1 << i);
}
Rdd.h[0]=prod0.uh[0];
Rdd.h[1]=prod1.uh[0];
Rdd.h[2]=prod0.uh[1];
Rdd.h[3]=prod1.uh[1];

80-N2040-53 Rev. AA 538


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Syntax Behavior
Rxx^=vpmpyh(Rs,Rt) x0 = Rs.uh[0];
x1 = Rs.uh[1];
y0 = Rt.uh[0];
y1 = Rt.uh[1];
prod0 = prod1 = 0;
for(i=0; i < 16; i++) {
if((y0 >> i) & 1) prod0 ^= (x0 << i);
if((y1 >> i) & 1) prod1 ^= (x1 << i);
}
Rxx.h[0]=Rxx.uh[0] ^ prod0.uh[0];
Rxx.h[1]=Rxx.uh[1] ^ prod1.uh[0];
Rxx.h[2]=Rxx.uh[2] ^ prod0.uh[1];
Rxx.h[3]=Rxx.uh[3] ^ prod1.uh[1];

Class: XTYPE (slots 2,3)

Intrinsics

Rdd=vpmpyh(Rs,Rt) Word64 Q6_P_vpmpyh_RR(Word32 Rs, Word32 Rt)


Rxx^=vpmpyh(Rs,Rt) Word64 Q6_P_vpmpyhxacc_RR(Word64 Rxx, Word32 Rs,
Word32 Rt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse t5 MinOp d5
1 1 1 0 0 1 0 1 1 1 0 s s s s s P P 0 t t t t t 1 1 1 d d d d d Rdd=vpmpyh(Rs,Rt)
ICLASS RegType MajOp s5 Parse t5 MinOp x5
1 1 1 0 0 1 1 1 1 0 1 s s s s s P P 0 t t t t t 1 1 1 x x x x x Rxx^=vpmpyh(Rs,Rt)

Field name Description


ICLASS Instruction class
MajOp Major opcode
MinOp Minor opcode
RegType Register type
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
x5 Field to encode register x

80-N2040-53 Rev. AA 539


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

11.10.6 XTYPE PERM


The XTYPE PERM instruction subclass includes instructions that perform permutations.

CABAC decode bin


This is a special-purpose instruction to support H.264 Context Adaptive Binary Arithmetic Coding
(CABAC).
Syntax Behavior
Rdd=decbin(Rss,Rtt) state = Rtt.w[1][5:0];
valMPS = Rtt.w[1][8:8];
bitpos = Rtt.w[0][4:0];
range = Rss.w[0];
offset = Rss.w[1];
range <<= bitpos;
offset <<= bitpos;
rLPS = rLPS_table_64x4[state][ (range
>>29)&3];
rLPS = rLPS << 23;
rMPS= (range&0xff800000) - rLPS;
if (offset < rMPS) {
Rdd = AC_next_state_MPS_64[state];
Rdd[8:8]=valMPS;
Rdd[31:23]=(rMPS>>23);
Rdd.w[1]=offset;
P0=valMPS;
} else {
Rdd = AC_next_state_LPS_64[state];
Rdd[8:8]=((!state)?(1-
valMPS):(valMPS));
Rdd[31:23]=(rLPS>>23);
Rdd.w[1]=(offset-rMPS);
P0=(valMPS^1);
}

Class: XTYPE (slots 2,3)

Notes

■ The predicate generated by this instruction cannot be used as a .new predicate, nor can it be
automatically ANDed with another predicate.

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType Maj s5 Parse t5 Min d5
1 1 0 0 0 0 0 1 1 1 - s s s s s P P - t t t t t 1 1 - d d d d d Rdd=decbin(Rss,Rtt)

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d

80-N2040-53 Rev. AA 540


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Field name Description


s5 Field to encode register s
t5 Field to encode register t
Maj Major opcode
Min Minor opcode
RegType Register type

80-N2040-53 Rev. AA 541


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Saturate
Saturate a single scalar value.
The sath instruction saturates a signed 32-bit number to a signed 16-bit number, which is sign-
extended back to 32 bits and placed in the destination register. The minimum negative value of
the result is 0xffff8000 and the maximum positive value is 0x00007fff.
The satuh instruction saturates a signed 32-bit number to an unsigned 16-bit number, which is
zero-extended back to 32 bits and placed in the destination register. The minimum value of the
result is 0 and the maximum value is 0x0000ffff.
The satb instruction saturates a signed 32-bit number to an signed 8-bit number, which is sign-
extended back to 32 bits and placed in the destination register. The minimum value of the result is
0xffffff80 and the maximum value is 0x0000007f.
The satub instruction saturates a signed 32-bit number to an unsigned 8-bit number, which is
zero-extended back to 32 bits and placed in the destination register. The minimum value of the
result is 0 and the maximum value is 0x000000ff.

Syntax Behavior
Rd=sat(Rss) Rd = sat32(Rss);

Rd=satb(Rs) Rd = sat8(Rs);

Rd=sath(Rs) Rd = sat16(Rs);

Rd=satub(Rs) Rd = usat8(Rs);

Rd=satuh(Rs) Rd = usat16(Rs);

Class: XTYPE (slots 2,3)

Notes

■ If saturation occurs during execution of this instruction (a result is clamped to either


maximum or minimum values), the OVF bit in the status register is set. OVF remains set until
explicitly cleared by a transfer to the status register.

Intrinsics

Rd=sat(Rss) Word32 Q6_R_sat_P(Word64 Rss)


Rd=satb(Rs) Word32 Q6_R_satb_R(Word32 Rs)
Rd=sath(Rs) Word32 Q6_R_sath_R(Word32 Rs)
Rd=satub(Rs) Word32 Q6_R_satub_R(Word32 Rs)
Rd=satuh(Rs) Word32 Q6_R_satuh_R(Word32 Rs)

80-N2040-53 Rev. AA 542


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse MinOp d5
1 0 0 0 1 0 0 0 1 1 0 s s s s s P P - - - - - - 0 0 0 d d d d d Rd=sat(Rss)
1 0 0 0 1 1 0 0 1 1 0 s s s s s P P - - - - - - 1 0 0 d d d d d Rd=sath(Rs)
1 0 0 0 1 1 0 0 1 1 0 s s s s s P P - - - - - - 1 0 1 d d d d d Rd=satuh(Rs)
1 0 0 0 1 1 0 0 1 1 0 s s s s s P P - - - - - - 1 1 0 d d d d d Rd=satub(Rs)
1 0 0 0 1 1 0 0 1 1 0 s s s s s P P - - - - - - 1 1 1 d d d d d Rd=satb(Rs)

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
MajOp Major opcode
MinOp Minor opcode
RegType Register type

80-N2040-53 Rev. AA 543


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Swizzle bytes
Swizzle the bytes of a word. This instruction is useful in converting between little and big endian
formats.

Rd=swiz(Rs)
Rs

Rd

Syntax Behavior
Rd=swiz(Rs) Rd.b[0]=Rs.b[3];
Rd.b[1]=Rs.b[2];
Rd.b[2]=Rs.b[1];
Rd.b[3]=Rs.b[0];

Class: XTYPE (slots 2,3)

Intrinsics

Rd=swiz(Rs) Word32 Q6_R_swiz_R(Word32 Rs)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse MinOp d5
1 0 0 0 1 1 0 0 1 0 0 s s s s s P P - - - - - - 1 1 1 d d d d d Rd=swiz(Rs)

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
MajOp Major opcode
MinOp Minor opcode
RegType Register type

80-N2040-53 Rev. AA 544


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector align
Align a vector. Use the immediate amount, or the least significant three bits of a predicate register
as the number of bytes to align. Shift the Rss register pair right by this number of bytes. Fill the
vacated positions with the least significant elements from Rtt.

#u3/P

Rtt Rss

Rdd

Syntax Behavior
Rdd=valignb(Rtt,Rss,#u3) Rdd = (Rss >>> #u*8)|(Rtt << ((8-#u)*8));
Rdd=valignb(Rtt,Rss,Pu) PREDUSE_TIMING;
Rdd = Rss >>> (Pu&0x7)*8|(Rtt << (8-(Pu&0x7))*8);

Class: XTYPE (slots 2,3)

Intrinsics

Rdd=valignb(Rtt,Rss,#u3) Word64 Q6_P_valignb_PPI(Word64 Rtt, Word64 Rss, Word32


Iu3)
Rdd=valignb(Rtt,Rss,Pu) Word64 Q6_P_valignb_PPp(Word64 Rtt, Word64 Rss, Byte
Pu)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType Maj s5 Parse t5 Min d5
1 1 0 0 0 0 0 0 0 - - s s s s s P P - t t t t t i i i d d d d d Rdd=valignb(Rtt,Rss,#u3)
ICLASS RegType Maj s5 Parse t5 u2 d5
1 1 0 0 0 0 1 0 0 - - s s s s s P P - t t t t t - u u d d d d d Rdd=valignb(Rtt,Rss,Pu)

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
u2 Field to encode register u

80-N2040-53 Rev. AA 545


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Field name Description


Maj Major opcode
Min Minor opcode
RegType Register type

80-N2040-53 Rev. AA 546


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector round and pack


Add the constant 0x00008000 to each word in the 64-bit source vector Rss. Optionally saturate
this addition to 32 bits. Pack the high halfwords of the result into the corresponding halfword of
the 32-bit destination register.

Rss.w[1] Rss.w[0] Rss

0x8000 0x8000

32-bit Add 32-bit Add

Rd.h[1] Rd.h[0] Rd

Syntax Behavior
Rd=vrndwh(Rss) for (i=0;i<2;i++) {
Rd.h[i]=(Rss.w[i]+0x08000).h[1];
}
Rd=vrndwh(Rss):sat for (i=0;i<2;i++) {
Rd.h[i]=sat32(Rss.w[i]+0x08000).h[1];
}

Class: XTYPE (slots 2,3)

Notes

■ If saturation occurs during execution of this instruction (a result is clamped to either


maximum or minimum values), the OVF bit in the status register is set. OVF remains set until
explicitly cleared by a transfer to the status register.

Intrinsics

Rd=vrndwh(Rss) Word32 Q6_R_vrndwh_P(Word64 Rss)


Rd=vrndwh(Rss):sat Word32 Q6_R_vrndwh_P_sat(Word64 Rss)

80-N2040-53 Rev. AA 547


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse MinOp d5
1 0 0 0 1 0 0 0 1 0 0 s s s s s P P - - - - - - 1 0 0 d d d d d Rd=vrndwh(Rss)
1 0 0 0 1 0 0 0 1 0 0 s s s s s P P - - - - - - 1 1 0 d d d d d Rd=vrndwh(Rss):sat

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
MajOp Major opcode
MinOp Minor opcode
RegType Register type

80-N2040-53 Rev. AA 548


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector saturate and pack


For each element in the vector, saturate the value to the next smaller size.
The vsathub instruction saturates signed halfwords to unsigned bytes,
The vsathb instruction saturates signed halfwords to signed bytes.

Rd=vsathub(Rss) Rd=vsathub(Rs)

s16 s16 s16 s16 s16 s16 Rs


Rss

Sat_u8 Sat_u8
Sat_u8 Sat_u8 Sat_u8 Sat_u8

u8 u8 u8 u8 Rd 0 0 u8 u8 Rd

Rd=vsathb(Rss) Rd=vsathb(Rs)
s16 s16 s16 s16 Rss s16 s16 Rs

Sat_s8 Sat_s8 Sat_s8 Sat_s8 Sat_s8 Sat_s8

s8 s8 s8 s8 Rd 0 0 s8 s8 Rd

The vsatwh instruction saturates signed words to signed halfwords.


The vsatwuh instruction saturates signed words to unsigned halfwords. The resulting values are
packed together into the destination register Rd.

Rd=vsathwh(Rss) Rd=vsathwuh(Rss)
s32 s32 Rss s32 s32 Rss

Sat_s16 Sat_s16 Sat_u16


Sat_u16

s16 s16 Rd u16 u16 Rd

80-N2040-53 Rev. AA 549


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Syntax Behavior
Rd=vsathb(Rs) Rd.b[0]=sat8(Rs.h[0]);
Rd.b[1]=sat8(Rs.h[1]);
Rd.b[2]=0;
Rd.b[3]=0;
Rd=vsathb(Rss) for (i=0;i<4;i++) {
Rd.b[i]=sat8(Rss.h[i]);
}
Rd=vsathub(Rs) Rd.b[0]=usat8(Rs.h[0]);
Rd.b[1]=usat8(Rs.h[1]);
Rd.b[2]=0;
Rd.b[3]=0;
Rd=vsathub(Rss) for (i=0;i<4;i++) {
Rd.b[i]=usat8(Rss.h[i]);
}
Rd=vsatwh(Rss) for (i=0;i<2;i++) {
Rd.h[i]=sat16(Rss.w[i]);
}
Rd=vsatwuh(Rss) for (i=0;i<2;i++) {
Rd.h[i]=usat16(Rss.w[i]);
}

Class: XTYPE (slots 2,3)

Notes

■ If saturation occurs during execution of this instruction (a result is clamped to either


maximum or minimum values), the OVF bit in the status register is set. OVF remains set until
explicitly cleared by a transfer to the status register.

Intrinsics

Rd=vsathb(Rs) Word32 Q6_R_vsathb_R(Word32 Rs)


Rd=vsathb(Rss) Word32 Q6_R_vsathb_P(Word64 Rss)
Rd=vsathub(Rs) Word32 Q6_R_vsathub_R(Word32 Rs)
Rd=vsathub(Rss) Word32 Q6_R_vsathub_P(Word64 Rss)
Rd=vsatwh(Rss) Word32 Q6_R_vsatwh_P(Word64 Rss)
Rd=vsatwuh(Rss) Word32 Q6_R_vsatwuh_P(Word64 Rss)

80-N2040-53 Rev. AA 550


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse MinOp d5
1 0 0 0 1 0 0 0 0 0 0 s s s s s P P - - - - - - 0 0 0 d d d d d Rd=vsathub(Rss)
1 0 0 0 1 0 0 0 0 0 0 s s s s s P P - - - - - - 0 1 0 d d d d d Rd=vsatwh(Rss)
1 0 0 0 1 0 0 0 0 0 0 s s s s s P P - - - - - - 1 0 0 d d d d d Rd=vsatwuh(Rss)
1 0 0 0 1 0 0 0 0 0 0 s s s s s P P - - - - - - 1 1 0 d d d d d Rd=vsathb(Rss)
1 0 0 0 1 1 0 0 1 0 - s s s s s P P - - - - - - 0 0 - d d d d d Rd=vsathb(Rs)
1 0 0 0 1 1 0 0 1 0 - s s s s s P P - - - - - - 0 1 - d d d d d Rd=vsathub(Rs)

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
MajOp Major opcode
MinOp Minor opcode
RegType Register type

80-N2040-53 Rev. AA 551


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector saturate without pack


Saturate each element of source vector Rss to the next smaller size.
The vsathub instruction saturates signed halfwords to unsigned bytes.
The vsatwh instruction saturates signed words to signed halfwords.
The vsatwuh instruction saturates signed words to unsigned halfwords.
The resulting values are placed in destination register Rdd in unpacked form.
Rdd=vsathub(Rss)

s16 s16 s16 s16 Rss

Sat_u8 Sat_u8 Sat_u8 Sat_u8

0 u8 0 u8 0 u8 0 u8 Rdd

Rdd=vsathb(Rss)

s16 s16 s16 s16 Rss

Sat_s8 Sat_s8 Sat_s8 Sat_s8

se se se se
s8 s8 s8 s8 Rdd

80-N2040-53 Rev. AA 552


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Syntax Behavior
Rdd=vsathb(Rss) for (i=0;i<4;i++) {
Rdd.h[i]=sat8(Rss.h[i]);
}
Rdd=vsathub(Rss) for (i=0;i<4;i++) {
Rdd.h[i]=usat8(Rss.h[i]);
}
Rdd=vsatwh(Rss) for (i=0;i<2;i++) {
Rdd.w[i]=sat16(Rss.w[i]);
}
Rdd=vsatwuh(Rss) for (i=0;i<2;i++) {
Rdd.w[i]=usat16(Rss.w[i]);
}

Class: XTYPE (slots 2,3)

Notes

■ If saturation occurs during execution of this instruction (a result is clamped to either


maximum or minimum values), the OVF bit in the status register is set. OVF remains set until
explicitly cleared by a transfer to the status register.

Intrinsics

Rdd=vsathb(Rss) Word64 Q6_P_vsathb_P(Word64 Rss)


Rdd=vsathub(Rss) Word64 Q6_P_vsathub_P(Word64 Rss)
Rdd=vsatwh(Rss) Word64 Q6_P_vsatwh_P(Word64 Rss)
Rdd=vsatwuh(Rss) Word64 Q6_P_vsatwuh_P(Word64 Rss)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse MinOp d5
1 0 0 0 0 0 0 0 0 0 0 s s s s s P P - - - - - - 1 0 0 d d d d d Rdd=vsathub(Rss)
1 0 0 0 0 0 0 0 0 0 0 s s s s s P P - - - - - - 1 0 1 d d d d d Rdd=vsatwuh(Rss)
1 0 0 0 0 0 0 0 0 0 0 s s s s s P P - - - - - - 1 1 0 d d d d d Rdd=vsatwh(Rss)
1 0 0 0 0 0 0 0 0 0 0 s s s s s P P - - - - - - 1 1 1 d d d d d Rdd=vsathb(Rss)

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s

80-N2040-53 Rev. AA 553


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Field name Description


MajOp Major opcode
MinOp Minor opcode
RegType Register type

80-N2040-53 Rev. AA 554


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector shuffle
Shuffle odd halfwords (shuffoh) takes the odd halfwords from Rtt and the odd halfwords from Rss
and merges them together into vector Rdd. Shuffle even halfwords (shuffeh) performs the same
operation on every even halfword in Rss and Rtt. The same operation is available for odd and
even bytes.

shuffoh shuffeh
Rtt Rss

Rss Rtt

Rdd Rdd

shuffob shuffeb
Rtt Rss

Rss Rtt

Rdd Rdd

Syntax Behavior
Rdd=shuffeb(Rss,Rtt) for (i=0;i<4;i++) {
Rdd.b[i*2]=Rtt.b[i*2];
Rdd.b[i*2+1]=Rss.b[i*2];
}
Rdd=shuffeh(Rss,Rtt) for (i=0;i<2;i++) {
Rdd.h[i*2]=Rtt.h[i*2];
Rdd.h[i*2+1]=Rss.h[i*2];
}
Rdd=shuffob(Rtt,Rss) for (i=0;i<4;i++) {
Rdd.b[i*2]=Rss.b[i*2+1];
Rdd.b[i*2+1]=Rtt.b[i*2+1];
}
Rdd=shuffoh(Rtt,Rss) for (i=0;i<2;i++) {
Rdd.h[i*2]=Rss.h[i*2+1];
Rdd.h[i*2+1]=Rtt.h[i*2+1];
}

80-N2040-53 Rev. AA 555


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Class: XTYPE (slots 2,3)

Intrinsics

Rdd=shuffeb(Rss,Rtt) Word64 Q6_P_shuffeb_PP(Word64 Rss, Word64


Rtt)
Rdd=shuffeh(Rss,Rtt) Word64 Q6_P_shuffeh_PP(Word64 Rss, Word64
Rtt)
Rdd=shuffob(Rtt,Rss) Word64 Q6_P_shuffob_PP(Word64 Rtt, Word64
Rss)
Rdd=shuffoh(Rtt,Rss) Word64 Q6_P_shuffoh_PP(Word64 Rtt, Word64
Rss)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType Maj s5 Parse t5 Min d5
1 1 0 0 0 0 0 1 0 0 - s s s s s P P - t t t t t 0 1 - d d d d d Rdd=shuffeb(Rss,Rtt)
1 1 0 0 0 0 0 1 0 0 - s s s s s P P - t t t t t 1 0 - d d d d d Rdd=shuffob(Rtt,Rss)
1 1 0 0 0 0 0 1 0 0 - s s s s s P P - t t t t t 1 1 - d d d d d Rdd=shuffeh(Rss,Rtt)
1 1 0 0 0 0 0 1 1 0 - s s s s s P P - t t t t t 0 0 0 d d d d d Rdd=shuffoh(Rtt,Rss)

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
Maj Major opcode
Min Minor opcode
RegType Register type

80-N2040-53 Rev. AA 556


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector splat bytes


Replicate the low 8-bits from register Rs into each of the four bytes of destination register Rd.

Rd=vsplatb(Rs)

Rs

Rd

Syntax Behavior
Rd=vsplatb(Rs) for (i=0;i<4;i++) {
Rd.b[i]=Rs.b[0];
}
Rdd=vsplatb(Rs) for (i=0;i<8;i++) {
Rdd.b[i]=Rs.b[0];
}

Class: XTYPE (slots 2,3)

Intrinsics

Rd=vsplatb(Rs) Word32 Q6_R_vsplatb_R(Word32 Rs)


Rdd=vsplatb(Rs) Word64 Q6_P_vsplatb_R(Word32 Rs)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse MinOp d5
1 0 0 0 0 1 0 0 0 1 - s s s s s P P - - - - - - 1 0 - d d d d d Rdd=vsplatb(Rs)
1 0 0 0 1 1 0 0 0 1 0 s s s s s P P - - - - - - 1 1 1 d d d d d Rd=vsplatb(Rs)

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
MajOp Major opcode
MinOp Minor opcode
RegType Register type

80-N2040-53 Rev. AA 557


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector splat halfwords


Replicate the low 16-bits from register Rs into each of the four halfwords of destination Rdd.

Rdd=vsplath(Rs)

Rs

Rdd

Syntax Behavior
Rdd=vsplath(Rs) for (i=0;i<4;i++) {
Rdd.h[i]=Rs.h[0];
}

Class: XTYPE (slots 2,3)

Intrinsics

Rdd=vsplath(Rs) Word64 Q6_P_vsplath_R(Word32 Rs)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse MinOp d5
1 0 0 0 0 1 0 0 0 1 - s s s s s P P - - - - - - 0 1 - d d d d d Rdd=vsplath(Rs)

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
MajOp Major opcode
MinOp Minor opcode
RegType Register type

80-N2040-53 Rev. AA 558


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector splice
Concatenate the low (8-N) bytes of vector Rtt with the low N bytes of vector Rss. This instruction
is helpful to vectorize unaligned stores.

#u3/P

Rtt Rss

Rdd

Syntax Behavior
Rdd=vspliceb(Rss,Rtt,#u3) Rdd = Rtt << #u*8 | zxt#u*8->64(Rss);

Rdd=vspliceb(Rss,Rtt,Pu) PREDUSE_TIMING;
Rdd = Rtt << (Pu&7)*8 | zxt(Pu&7)*8->64(Rss);

Class: XTYPE (slots 2,3)

Intrinsics

Rdd=vspliceb(Rss,Rtt,#u3) Word64 Q6_P_vspliceb_PPI(Word64 Rss, Word64


Rtt, Word32 Iu3)
Rdd=vspliceb(Rss,Rtt,Pu) Word64 Q6_P_vspliceb_PPp(Word64 Rss, Word64
Rtt, Byte Pu)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType Maj s5 Parse t5 Min d5
1 1 0 0 0 0 0 0 1 - - s s s s s P P - t t t t t i i i d d d d d Rdd=vspliceb(Rss,Rtt,#u3)
ICLASS RegType Maj s5 Parse t5 u2 d5
1 1 0 0 0 0 1 0 1 0 0 s s s s s P P - t t t t t - u u d d d d d Rdd=vspliceb(Rss,Rtt,Pu)

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
u2 Field to encode register u

80-N2040-53 Rev. AA 559


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Field name Description


Maj Major opcode
Min Minor opcode
RegType Register type

80-N2040-53 Rev. AA 560


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector sign extend


The vsxtbh instruction sign-extends each byte of a single register source to halfwords, and places
the result in the destination register pair.
The vsxthw instruction sign-extends each halfword of a single register source to words, and places
the result in the destination register pair.

Rdd=vsxtbh(Rs) Rs

sign sign sign sign Rdd

Rs
Rdd=vsxthw(Rs)

sign sign Rdd

Syntax Behavior
Rdd=vsxtbh(Rs) for (i=0;i<4;i++) {
Rdd.h[i]=Rs.b[i];
}
Rdd=vsxthw(Rs) for (i=0;i<2;i++) {
Rdd.w[i]=Rs.h[i];
}

Class: XTYPE (slots 2,3)

Intrinsics

Rdd=vsxtbh(Rs) Word64 Q6_P_vsxtbh_R(Word32 Rs)


Rdd=vsxthw(Rs) Word64 Q6_P_vsxthw_R(Word32 Rs)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse MinOp d5
1 0 0 0 0 1 0 0 0 0 - s s s s s P P - - - - - - 0 0 - d d d d d Rdd=vsxtbh(Rs)
1 0 0 0 0 1 0 0 0 0 - s s s s s P P - - - - - - 1 0 - d d d d d Rdd=vsxthw(Rs)

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d

80-N2040-53 Rev. AA 561


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Field name Description


s5 Field to encode register s
MajOp Major opcode
MinOp Minor opcode
RegType Register type

80-N2040-53 Rev. AA 562


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector truncate
In the vtrunehb instruction, for each halfword in a vector, take the even (lower) byte and ignore
the other byte. Pack the resulting values into destination register Rd.
The vtrunohb instruction takes each odd byte of the source vector.
The vtrunewh instruction uses two source register pairs, Rss and Rtt. Pack the even (lower)
halfwords of Rss in the upper word of Rdd, and pack the lower halfwords of Rtt in the lower word
of Rdd.
The vtrunowh instruction performs the same operation as the vtrunewh instruction, but uses the
odd (upper) halfwords of the source vectors instead.

Rd=vtrunehb(Rss) Rdd=vtrunewh(Rss,Rtt)
Rss
Rss
Rtt
Rd
Rdd

Rd=vtrunohb(Rss)
Rdd=vtrunowh(Rss,Rtt)
Rss Rss
Rtt
Rd

Rdd

Syntax Behavior
Rd=vtrunehb(Rss) for (i=0;i<4;i++) {
Rd.b[i]=Rss.b[i*2];
}
Rd=vtrunohb(Rss) for (i=0;i<4;i++) {
Rd.b[i]=Rss.b[i*2+1];
}
Rdd=vtrunehb(Rss,Rtt) for (i=0;i<4;i++) {
Rdd.b[i]=Rtt.b[i*2];
Rdd.b[i+4]=Rss.b[i*2];
}
Rdd=vtrunewh(Rss,Rtt) Rdd.h[0]=Rtt.h[0];
Rdd.h[1]=Rtt.h[2];
Rdd.h[2]=Rss.h[0];
Rdd.h[3]=Rss.h[2];
Rdd=vtrunohb(Rss,Rtt) for (i=0;i<4;i++) {
Rdd.b[i]=Rtt.b[i*2+1];
Rdd.b[i+4]=Rss.b[i*2+1];
}

80-N2040-53 Rev. AA 563


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Syntax Behavior
Rdd=vtrunowh(Rss,Rtt) Rdd.h[0]=Rtt.h[1];
Rdd.h[1]=Rtt.h[3];
Rdd.h[2]=Rss.h[1];
Rdd.h[3]=Rss.h[3];

Class: XTYPE (slots 2,3)

Intrinsics

Rd=vtrunehb(Rss) Word32 Q6_R_vtrunehb_P(Word64 Rss)


Rd=vtrunohb(Rss) Word32 Q6_R_vtrunohb_P(Word64 Rss)
Rdd=vtrunehb(Rss,Rtt) Word64 Q6_P_vtrunehb_PP(Word64 Rss, Word64
Rtt)
Rdd=vtrunewh(Rss,Rtt) Word64 Q6_P_vtrunewh_PP(Word64 Rss, Word64
Rtt)
Rdd=vtrunohb(Rss,Rtt) Word64 Q6_P_vtrunohb_PP(Word64 Rss, Word64
Rtt)
Rdd=vtrunowh(Rss,Rtt) Word64 Q6_P_vtrunowh_PP(Word64 Rss, Word64
Rtt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse MinOp d5
1 0 0 0 1 0 0 0 1 0 0 s s s s s P P - - - - - - 0 0 0 d d d d d Rd=vtrunohb(Rss)
1 0 0 0 1 0 0 0 1 0 0 s s s s s P P - - - - - - 0 1 0 d d d d d Rd=vtrunehb(Rss)
ICLASS RegType Maj s5 Parse t5 Min d5
1 1 0 0 0 0 0 1 1 0 - s s s s s P P - t t t t t 0 1 0 d d d d d Rdd=vtrunewh(Rss,Rtt)
1 1 0 0 0 0 0 1 1 0 - s s s s s P P - t t t t t 0 1 1 d d d d d Rdd=vtrunehb(Rss,Rtt)
1 1 0 0 0 0 0 1 1 0 - s s s s s P P - t t t t t 1 0 0 d d d d d Rdd=vtrunowh(Rss,Rtt)
1 1 0 0 0 0 0 1 1 0 - s s s s s P P - t t t t t 1 0 1 d d d d d Rdd=vtrunohb(Rss,Rtt)

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
MajOp Major opcode
MinOp Minor opcode
Maj Major opcode
Min Minor opcode
RegType Register type

80-N2040-53 Rev. AA 564


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector zero extend


The vzxtbh instruction zero-extends each byte of a single register source to halfwords, and places
the result in the destination register pair.
The vzxthw instruction zero-extends each halfword of a single register source to words, and
places the result in the destination register pair.

Rdd=vzxtbh(Rs) Rs

zero zero zero zero Rdd

Rdd=vzxthw(Rs) Rs

zero zero Rdd

Syntax Behavior
Rdd=vzxtbh(Rs) for (i=0;i<4;i++) {
Rdd.h[i]=Rs.ub[i];
}
Rdd=vzxthw(Rs) for (i=0;i<2;i++) {
Rdd.w[i]=Rs.uh[i];
}

Class: XTYPE (slots 2,3)

Intrinsics

Rdd=vzxtbh(Rs) Word64 Q6_P_vzxtbh_R(Word32 Rs)


Rdd=vzxthw(Rs) Word64 Q6_P_vzxthw_R(Word32 Rs)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse MinOp d5
1 0 0 0 0 1 0 0 0 0 - s s s s s P P - - - - - - 0 1 - d d d d d Rdd=vzxtbh(Rs)
1 0 0 0 0 1 0 0 0 0 - s s s s s P P - - - - - - 1 1 - d d d d d Rdd=vzxthw(Rs)

80-N2040-53 Rev. AA 565


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
MajOp Major opcode
MinOp Minor opcode
RegType Register type

80-N2040-53 Rev. AA 566


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

11.10.7 XTYPE PRED


The XTYPE PRED instruction subclass includes instructions that perform miscellaneous operations
on predicates, including mask generation, predicate transfers, and the Viterbi pack operation.

Bounds check
Determine if Rs falls in the range defined by Rtt.
The user sets Rtt.w0 to the lower bound, and Rtt.w1 to the upper bound.
All bits of the destination predicate are set if the value falls within the range, or all cleared
otherwise.
Syntax Behavior
Pd=boundscheck(Rs,Rtt) if ("Rs & 1") {
Assembler mapped to:
"Pd=boundscheck(Rss,Rtt):raw:hi";
} else {
Assembler mapped to:
"Pd=boundscheck(Rss,Rtt):raw:lo";
}
Pd=boundscheck(Rss,Rtt):raw: src = Rss.uw[1];
hi Pd = (src.uw[0] >= Rtt.uw[0]) && (src.uw[0] <
Rtt.uw[1]) ? 0xff : 0x00;
Pd=boundscheck(Rss,Rtt):raw: src = Rss.uw[0];
lo Pd = (src.uw[0] >= Rtt.uw[0]) && (src.uw[0] <
Rtt.uw[1]) ? 0xff : 0x00;

Class: XTYPE (slots 2,3)

Intrinsics

Pd=boundscheck(Rs,Rtt) Byte Q6_p_boundscheck_RP(Word32 Rs, Word64


Rtt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType s5 Parse t5 MinOp d2

1 1 0 1 0 0 1 0 0 - - s s s s s P P 1 t t t t t 1 0 0 - - - d d Pd=boundscheck(Rss,Rtt):r
aw:lo

1 1 0 1 0 0 1 0 0 - - s s s s s P P 1 t t t t t 1 0 1 - - - d d Pd=boundscheck(Rss,Rtt):r
aw:hi

Field name Description


RegType Register type
MinOp Minor opcode
ICLASS Instruction class
Parse Packet/loop parse bits

80-N2040-53 Rev. AA 567


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Field name Description


d2 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 568


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Compare byte
These instructions sign- or zero-extend the low 8 bits of the source registers and perform 32-bit
comparisons on the result. When there is an extended 32-bit immediate operand, the full 32
immediate bits are used for the comparison.
Syntax Behavior
Pd=cmpb.eq(Rs,#u8) Pd=Rs.ub[0] == #u ? 0xff : 0x00;
Pd=cmpb.eq(Rs,Rt) Pd=Rs.b[0] == Rt.b[0] ? 0xff : 0x00;
Pd=cmpb.gt(Rs,#s8) Pd=Rs.b[0] > #s ? 0xff : 0x00;
Pd=cmpb.gt(Rs,Rt) Pd=Rs.b[0] > Rt.b[0] ? 0xff : 0x00;
Pd=cmpb.gtu(Rs,#u7) apply_extension(#u);
Pd=Rs.ub[0] > #u.uw[0] ? 0xff : 0x00;
Pd=cmpb.gtu(Rs,Rt) Pd=Rs.ub[0] > Rt.ub[0] ? 0xff : 0x00;

Class: XTYPE (slots 2,3)

Intrinsics

Pd=cmpb.eq(Rs,#u8) Byte Q6_p_cmpb_eq_RI(Word32 Rs, Word32 Iu8)


Pd=cmpb.eq(Rs,Rt) Byte Q6_p_cmpb_eq_RR(Word32 Rs, Word32 Rt)
Pd=cmpb.gt(Rs,#s8) Byte Q6_p_cmpb_gt_RI(Word32 Rs, Word32 Is8)
Pd=cmpb.gt(Rs,Rt) Byte Q6_p_cmpb_gt_RR(Word32 Rs, Word32 Rt)
Pd=cmpb.gtu(Rs,#u7) Byte Q6_p_cmpb_gtu_RI(Word32 Rs, Word32
Iu7)
Pd=cmpb.gtu(Rs,Rt) Byte Q6_p_cmpb_gtu_RR(Word32 Rs, Word32 Rt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType Maj s5 Parse t5 Min d2
1 1 0 0 0 1 1 1 1 1 0 s s s s s P P - t t t t t 0 1 0 - - - d d Pd=cmpb.gt(Rs,Rt)
1 1 0 0 0 1 1 1 1 1 0 s s s s s P P - t t t t t 1 1 0 - - - d d Pd=cmpb.eq(Rs,Rt)
1 1 0 0 0 1 1 1 1 1 0 s s s s s P P - t t t t t 1 1 1 - - - d d Pd=cmpb.gtu(Rs,Rt)
ICLASS RegType s5 Parse d2
1 1 0 1 1 1 0 1 - 0 0 s s s s s P P - i i i i i i i i 0 0 - d d Pd=cmpb.eq(Rs,#u8)
1 1 0 1 1 1 0 1 - 0 1 s s s s s P P - i i i i i i i i 0 0 - d d Pd=cmpb.gt(Rs,#s8)
1 1 0 1 1 1 0 1 - 1 0 s s s s s P P - 0 i i i i i i i 0 0 - d d Pd=cmpb.gtu(Rs,#u7)

Field name Description


RegType Register type
MajOp Major opcode
ICLASS Instruction class
Parse Packet/loop parse bits
d2 Field to encode register d
s5 Field to encode register s

80-N2040-53 Rev. AA 569


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Field name Description


t5 Field to encode register t
Maj Major opcode
Min Minor opcode

80-N2040-53 Rev. AA 570


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Compare half
These instructions sign- or zero-extend the low 16 bits of the source registers and perform 32-bit
comparisons on the result. When there is an extended 32-bit immediate operand, the full 32
immediate bits are used for the comparison.
Syntax Behavior
Pd=cmph.eq(Rs,#s8) apply_extension(#s);
Pd=Rs.h[0] == #s ? 0xff : 0x00;
Pd=cmph.eq(Rs,Rt) Pd=Rs.h[0] == Rt.h[0] ? 0xff : 0x00;
Pd=cmph.gt(Rs,#s8) apply_extension(#s);
Pd=Rs.h[0] > #s ? 0xff : 0x00;
Pd=cmph.gt(Rs,Rt) Pd=Rs.h[0] > Rt.h[0] ? 0xff : 0x00;
Pd=cmph.gtu(Rs,#u7) apply_extension(#u);
Pd=Rs.uh[0] > #u.uw[0] ? 0xff : 0x00;
Pd=cmph.gtu(Rs,Rt) Pd=Rs.uh[0] > Rt.uh[0] ? 0xff : 0x00;

Class: XTYPE (slots 2,3)

Intrinsics

Pd=cmph.eq(Rs,#s8) Byte Q6_p_cmph_eq_RI(Word32 Rs, Word32 Is8)


Pd=cmph.eq(Rs,Rt) Byte Q6_p_cmph_eq_RR(Word32 Rs, Word32 Rt)
Pd=cmph.gt(Rs,#s8) Byte Q6_p_cmph_gt_RI(Word32 Rs, Word32 Is8)
Pd=cmph.gt(Rs,Rt) Byte Q6_p_cmph_gt_RR(Word32 Rs, Word32 Rt)
Pd=cmph.gtu(Rs,#u7) Byte Q6_p_cmph_gtu_RI(Word32 Rs, Word32
Iu7)
Pd=cmph.gtu(Rs,Rt) Byte Q6_p_cmph_gtu_RR(Word32 Rs, Word32 Rt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType Maj s5 Parse t5 Min d2
1 1 0 0 0 1 1 1 1 1 0 s s s s s P P - t t t t t 0 1 1 - - - d d Pd=cmph.eq(Rs,Rt)
1 1 0 0 0 1 1 1 1 1 0 s s s s s P P - t t t t t 1 0 0 - - - d d Pd=cmph.gt(Rs,Rt)
1 1 0 0 0 1 1 1 1 1 0 s s s s s P P - t t t t t 1 0 1 - - - d d Pd=cmph.gtu(Rs,Rt)
ICLASS RegType s5 Parse d2
1 1 0 1 1 1 0 1 - 0 0 s s s s s P P - i i i i i i i i 0 1 - d d Pd=cmph.eq(Rs,#s8)
1 1 0 1 1 1 0 1 - 0 1 s s s s s P P - i i i i i i i i 0 1 - d d Pd=cmph.gt(Rs,#s8)
1 1 0 1 1 1 0 1 - 1 0 s s s s s P P - 0 i i i i i i i 0 1 - d d Pd=cmph.gtu(Rs,#u7)

Field name Description


RegType Register type
MajOp Major opcode
ICLASS Instruction class
Parse Packet/loop parse bits
d2 Field to encode register d

80-N2040-53 Rev. AA 571


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Field name Description


s5 Field to encode register s
t5 Field to encode register t
Maj Major opcode
Min Minor opcode

80-N2040-53 Rev. AA 572


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Compare doublewords
Compare two 64-bit register pairs for unsigned greater than, greater than, or equal. The 8-bit
predicate register Pd is set to all 1s or all 0s, depending on the result.

Syntax Behavior
Pd=cmp.eq(Rss,Rtt) Pd=Rss==Rtt ? 0xff : 0x00;
Pd=cmp.gt(Rss,Rtt) Pd=Rss>Rtt ? 0xff : 0x00;
Pd=cmp.gtu(Rss,Rtt) Pd=Rss.u64>Rtt.u64 ? 0xff : 0x00;

Class: XTYPE (slots 2,3)

Intrinsics

Pd=cmp.eq(Rss,Rtt) Byte Q6_p_cmp_eq_PP(Word64 Rss, Word64 Rtt)


Pd=cmp.gt(Rss,Rtt) Byte Q6_p_cmp_gt_PP(Word64 Rss, Word64 Rtt)
Pd=cmp.gtu(Rss,Rtt) Byte Q6_p_cmp_gtu_PP(Word64 Rss, Word64
Rtt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType s5 Parse t5 MinOp d2
1 1 0 1 0 0 1 0 1 0 0 s s s s s P P - t t t t t 0 0 0 - - - d d Pd=cmp.eq(Rss,Rtt)
1 1 0 1 0 0 1 0 1 0 0 s s s s s P P - t t t t t 0 1 0 - - - d d Pd=cmp.gt(Rss,Rtt)
1 1 0 1 0 0 1 0 1 0 0 s s s s s P P - t t t t t 1 0 0 - - - d d Pd=cmp.gtu(Rss,Rtt)

Field name Description


RegType Register type
MinOp Minor opcode
ICLASS Instruction class
Parse Packet/loop parse bits
d2 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 573


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Compare bit mask


If all the bits in the mask in Rt or a short immediate are set (bitsset) or clear (bitsclear) in Rs, set
the Pd to true. Otherwise, set the bits in Pd to false.

Syntax Behavior
Pd=[!]bitsclr(Rs,#u6) Pd=(Rs&#u)[!]=0 ? 0xff : 0x00;
Pd=[!]bitsclr(Rs,Rt) Pd=(Rs&Rt)[!]=0 ? 0xff : 0x00;
Pd=[!]bitsset(Rs,Rt) Pd=(Rs&Rt)[!]=Rt ? 0xff : 0x00;

Class: XTYPE (slots 2,3)

Intrinsics

Pd=!bitsclr(Rs,#u6) Byte Q6_p_not_bitsclr_RI(Word32 Rs, Word32


Iu6)
Pd=!bitsclr(Rs,Rt) Byte Q6_p_not_bitsclr_RR(Word32 Rs, Word32
Rt)
Pd=!bitsset(Rs,Rt) Byte Q6_p_not_bitsset_RR(Word32 Rs, Word32
Rt)
Pd=bitsclr(Rs,#u6) Byte Q6_p_bitsclr_RI(Word32 Rs, Word32 Iu6)
Pd=bitsclr(Rs,Rt) Byte Q6_p_bitsclr_RR(Word32 Rs, Word32 Rt)
Pd=bitsset(Rs,Rt) Byte Q6_p_bitsset_RR(Word32 Rs, Word32 Rt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse d2
1 0 0 0 0 1 0 1 1 0 0 s s s s s P P i i i i i i - - - - - - d d Pd=bitsclr(Rs,#u6)
1 0 0 0 0 1 0 1 1 0 1 s s s s s P P i i i i i i - - - - - - d d Pd=!bitsclr(Rs,#u6)
ICLASS RegType Maj s5 Parse t5 d2
1 1 0 0 0 1 1 1 0 1 0 s s s s s P P - t t t t t - - - - - - d d Pd=bitsset(Rs,Rt)
1 1 0 0 0 1 1 1 0 1 1 s s s s s P P - t t t t t - - - - - - d d Pd=!bitsset(Rs,Rt)
1 1 0 0 0 1 1 1 1 0 0 s s s s s P P - t t t t t - - - - - - d d Pd=bitsclr(Rs,Rt)
1 1 0 0 0 1 1 1 1 0 1 s s s s s P P - t t t t t - - - - - - d d Pd=!bitsclr(Rs,Rt)

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d2 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
MajOp Major opcode
Maj Major opcode
RegType Register type

80-N2040-53 Rev. AA 574


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Mask generate from predicate


For each of the low 8 bits in predicate register Pt, when the bit is set, set the corresponding byte
in 64-bit register pair Rdd to 0xff, otherwise, set the corresponding byte to 0x00.

7 0

1 0 1 0 1 0 1 0 Pt

0xFF 0x00 0xFF 0x00 0xFF 0x00 0xFF 0x00 Rdd

Syntax Behavior
Rdd=mask(Pt) PREDUSE_TIMING;
for (i = 0; i < 8; i++) {
Rdd.b[i]=(Pt.i?(0xff):(0x00));
}

Class: XTYPE (slots 2,3)

Intrinsics

Rdd=mask(Pt) Word64 Q6_P_mask_p(Byte Pt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType Parse t2 d5
1 0 0 0 0 1 1 0 - - - - - - - - P P - - - - t t - - - d d d d d Rdd=mask(Pt)

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
t2 Field to encode register t
RegType Register type

80-N2040-53 Rev. AA 575


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Check for TLB match


Determine if the TLB entry in Rss matches the ASID:PPN in Rt.

Syntax Behavior
Pd=tlbmatch(Rss,Rt) MASK = 0x07ffffff;
TLBLO = Rss.uw[0];
TLBHI = Rss.uw[1];
SIZE = min(6,count_leading_ones(~reverse_bits(TLBLO)));
MASK &= (0xffffffff << 2*SIZE);
Pd = TLBHI.31 && ((TLBHI & MASK) == (Rt & MASK)) ? 0xff
: 0x00;

Class: XTYPE (slots 2,3)

Notes

■ The predicate generated by this instruction cannot be used as a .new predicate, nor can it be
automatically ANDed with another predicate.

Intrinsics

Pd=tlbmatch(Rss,Rt) Byte Q6_p_tlbmatch_PR(Word64 Rss, Word32


Rt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType s5 Parse t5 MinOp d2
1 1 0 1 0 0 1 0 0 - - s s s s s P P 1 t t t t t 0 1 1 - - - d d Pd=tlbmatch(Rss,Rt)

Field name Description


RegType Register type
MinOp Minor opcode
ICLASS Instruction class
Parse Packet/loop parse bits
d2 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 576


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Predicate transfer
Pd=Rs transfers a predicate to the eight least-significant bits of a general register and zeros the
other bits.
Rd=Ps transfers the eight least-significant bits of a general register to a predicate.

Syntax Behavior
Pd=Rs Pd = Rs.ub[0];
Rd=Ps PREDUSE_TIMING;
Rd = zxt8->32(Ps);

Class: XTYPE (slots 2,3)

Intrinsics

Pd=Rs Byte Q6_p_equals_R(Word32 Rs)


Rd=Ps Word32 Q6_R_equals_p(Byte Ps)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse d2
1 0 0 0 0 1 0 1 0 1 0 s s s s s P P - - - - - - - - - - - - d d Pd=Rs
ICLASS RegType MajOp s2 Parse d5
1 0 0 0 1 0 0 1 - 1 - - - - s s P P - - - - - - - - - d d d d d Rd=Ps

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d2 Field to encode register d
d5 Field to encode register d
s2 Field to encode register s
s5 Field to encode register s
MajOp Major opcode
RegType Register type

80-N2040-53 Rev. AA 577


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Test bit
Extract a bit from a register. If the bit is true (1), set all the bits of the predicate register
destination to 1. If the bit is false (0), set all the bits of the predicate register destination to 0. The
bit to test can be indicated using an immediate or register value.
If a register is used to indicate the bit to test, and the value specified is out of range, the predicate
result is zero.

Syntax Behavior
Pd=[!]tstbit(Rs,#u5) Pd = (Rs & (1<<#u)) == 0 ? 0xff : 0x00;
Pd=[!]tstbit(Rs,Rt) Pd = (zxt32->64(Rs) & (sxt7->32(Rt)>0)?(zxt32->64(1)<<sxt7-
>32(Rt)):(zxt32->64(1)>>>sxt7->32(Rt)))== 0 ? 0xff : 0x00;

Class: XTYPE (slots 2,3)

Intrinsics

Pd=!tstbit(Rs,#u5) Byte Q6_p_not_tstbit_RI(Word32 Rs, Word32


Iu5)
Pd=!tstbit(Rs,Rt) Byte Q6_p_not_tstbit_RR(Word32 Rs, Word32
Rt)
Pd=tstbit(Rs,#u5) Byte Q6_p_tstbit_RI(Word32 Rs, Word32 Iu5)
Pd=tstbit(Rs,Rt) Byte Q6_p_tstbit_RR(Word32 Rs, Word32 Rt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse d2
1 0 0 0 0 1 0 1 0 0 0 s s s s s P P 0 i i i i i - - - - - - d d Pd=tstbit(Rs,#u5)
1 0 0 0 0 1 0 1 0 0 1 s s s s s P P 0 i i i i i - - - - - - d d Pd=!tstbit(Rs,#u5)
ICLASS RegType Maj s5 Parse t5 d2
1 1 0 0 0 1 1 1 0 0 0 s s s s s P P - t t t t t - - - - - - d d Pd=tstbit(Rs,Rt)
1 1 0 0 0 1 1 1 0 0 1 s s s s s P P - t t t t t - - - - - - d d Pd=!tstbit(Rs,Rt)

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d2 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
MajOp Major opcode
Maj Major opcode
RegType Register type

80-N2040-53 Rev. AA 578


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector compare halfwords


Compare each of four 16-bit halfwords in two 64-bit vectors and set the corresponding bits in a
predicate destination to '11' if true, '00' if false.
Halfword comparisons are for equal, signed greater than, or unsigned greater than.

Rss

Rtt

cmp cmp cmp cmp

1 1 0 0 1 1 0 0 Pd
7 0

Syntax Behavior
Pd=vcmph.eq(Rss,#s8) for (i = 0; i < 4; i++) {
Pd.i*2 = (Rss.h[i] == #s);
Pd.i*2+1 = (Rss.h[i] == #s);
}
Pd=vcmph.eq(Rss,Rtt) for (i = 0; i < 4; i++) {
Pd.i*2 = (Rss.h[i] == Rtt.h[i]);
Pd.i*2+1 = (Rss.h[i] == Rtt.h[i]);
}
Pd=vcmph.gt(Rss,#s8) for (i = 0; i < 4; i++) {
Pd.i*2 = (Rss.h[i] > #s);
Pd.i*2+1 = (Rss.h[i] > #s);
}
Pd=vcmph.gt(Rss,Rtt) for (i = 0; i < 4; i++) {
Pd.i*2 = (Rss.h[i] > Rtt.h[i]);
Pd.i*2+1 = (Rss.h[i] > Rtt.h[i]);
}
Pd=vcmph.gtu(Rss,#u7) for (i = 0; i < 4; i++) {
Pd.i*2 = (Rss.uh[i] > #u);
Pd.i*2+1 = (Rss.uh[i] > #u);
}
Pd=vcmph.gtu(Rss,Rtt) for (i = 0; i < 4; i++) {
Pd.i*2 = (Rss.uh[i] > Rtt.uh[i]);
Pd.i*2+1 = (Rss.uh[i] > Rtt.uh[i]);
}

80-N2040-53 Rev. AA 579


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Class: XTYPE (slots 2,3)

Intrinsics

Pd=vcmph.eq(Rss,#s8) Byte Q6_p_vcmph_eq_PI(Word64 Rss, Word32


Is8)
Pd=vcmph.eq(Rss,Rtt) Byte Q6_p_vcmph_eq_PP(Word64 Rss, Word64
Rtt)
Pd=vcmph.gt(Rss,#s8) Byte Q6_p_vcmph_gt_PI(Word64 Rss, Word32
Is8)
Pd=vcmph.gt(Rss,Rtt) Byte Q6_p_vcmph_gt_PP(Word64 Rss, Word64
Rtt)
Pd=vcmph.gtu(Rss,#u7) Byte Q6_p_vcmph_gtu_PI(Word64 Rss, Word32
Iu7)
Pd=vcmph.gtu(Rss,Rtt) Byte Q6_p_vcmph_gtu_PP(Word64 Rss, Word64
Rtt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType s5 Parse t5 MinOp d2
1 1 0 1 0 0 1 0 0 - - s s s s s P P 0 t t t t t 0 1 1 - - - d d Pd=vcmph.eq(Rss,Rtt)
1 1 0 1 0 0 1 0 0 - - s s s s s P P 0 t t t t t 1 0 0 - - - d d Pd=vcmph.gt(Rss,Rtt)
1 1 0 1 0 0 1 0 0 - - s s s s s P P 0 t t t t t 1 0 1 - - - d d Pd=vcmph.gtu(Rss,Rtt)
ICLASS RegType s5 Parse d2
1 1 0 1 1 1 0 0 0 0 0 s s s s s P P - i i i i i i i i 0 1 - d d Pd=vcmph.eq(Rss,#s8)
1 1 0 1 1 1 0 0 0 0 1 s s s s s P P - i i i i i i i i 0 1 - d d Pd=vcmph.gt(Rss,#s8)
1 1 0 1 1 1 0 0 0 1 0 s s s s s P P - 0 i i i i i i i 0 1 - d d Pd=vcmph.gtu(Rss,#u7)

Field name Description


RegType Register type
MajOp Major opcode
MinOp Minor opcode
ICLASS Instruction class
Parse Packet/loop parse bits
d2 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 580


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector compare bytes for any match


Compare each byte in two 64-bit source vectors and set a predicate if any of the eight bytes are
equal.
This instruction can quickly find the null terminator in a string.

Syntax Behavior
Pd=!any8(vcmpb.eq(Rss,Rtt)) Pd = 0;
for (i = 0; i < 8; i++) {
if (Rss.b[i] == Rtt.b[i]) Pd = 0xff;
}
Pd = ~Pd;
Pd=any8(vcmpb.eq(Rss,Rtt)) Pd = 0;
for (i = 0; i < 8; i++) {
if (Rss.b[i] == Rtt.b[i]) Pd = 0xff;
}

Class: XTYPE (slots 2,3)

Intrinsics

Pd=!any8(vcmpb.eq(Rss,Rtt)) Byte Q6_p_not_any8_vcmpb_eq_PP(Word64 Rss,


Word64 Rtt)
Pd=any8(vcmpb.eq(Rss,Rtt)) Byte Q6_p_any8_vcmpb_eq_PP(Word64 Rss,
Word64 Rtt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType s5 Parse t5 MinOp d2

1 1 0 1 0 0 1 0 0 - - s s s s s P P 1 t t t t t 0 0 0 - - - d d Pd=any8(vcmpb.eq(Rss,Rtt
))

1 1 0 1 0 0 1 0 0 - - s s s s s P P 1 t t t t t 0 0 1 - - - d d Pd=!any8(vcmpb.eq(Rss,Rt
t))

Field name Description


RegType Register type
MinOp Minor opcode
ICLASS Instruction class
Parse Packet/loop parse bits
d2 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 581


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector compare bytes


Compare each of eight bytes in two 64-bit vectors and set the corresponding bit in a predicate
destination to 1 if true, 0 if false.
Byte comparisons are for equal or for unsigned greater than.
In the following example, every other comparison is true.

Rss

Rtt

cmp cmp cmp cmp cmp cmp cmp cmp

1 0 1 0 1 0 1 0 Pd
7 0

Syntax Behavior
Pd=vcmpb.eq(Rss,#u8) for (i = 0; i < 8; i++) {
Pd.i = (Rss.ub[i] == #u);
}
Pd=vcmpb.eq(Rss,Rtt) for (i = 0; i < 8; i++) {
Pd.i = (Rss.b[i] == Rtt.b[i]);
}
Pd=vcmpb.gt(Rss,#s8) for (i = 0; i < 8; i++) {
Pd.i = (Rss.b[i] > #s);
}
Pd=vcmpb.gt(Rss,Rtt) for (i = 0; i < 8; i++) {
Pd.i = (Rss.b[i] > Rtt.b[i]);
}
Pd=vcmpb.gtu(Rss,#u7) for (i = 0; i < 8; i++) {
Pd.i = (Rss.ub[i] > #u);
}
Pd=vcmpb.gtu(Rss,Rtt) for (i = 0; i < 8; i++) {
Pd.i = (Rss.ub[i] > Rtt.ub[i]);
}

80-N2040-53 Rev. AA 582


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Class: XTYPE (slots 2,3)

Intrinsics

Pd=vcmpb.eq(Rss,#u8) Byte Q6_p_vcmpb_eq_PI(Word64 Rss, Word32


Iu8)
Pd=vcmpb.eq(Rss,Rtt) Byte Q6_p_vcmpb_eq_PP(Word64 Rss, Word64
Rtt)
Pd=vcmpb.gt(Rss,#s8) Byte Q6_p_vcmpb_gt_PI(Word64 Rss, Word32
Is8)
Pd=vcmpb.gt(Rss,Rtt) Byte Q6_p_vcmpb_gt_PP(Word64 Rss, Word64
Rtt)
Pd=vcmpb.gtu(Rss,#u7) Byte Q6_p_vcmpb_gtu_PI(Word64 Rss, Word32
Iu7)
Pd=vcmpb.gtu(Rss,Rtt) Byte Q6_p_vcmpb_gtu_PP(Word64 Rss, Word64
Rtt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType s5 Parse t5 MinOp d2
1 1 0 1 0 0 1 0 0 - - s s s s s P P 0 t t t t t 1 1 0 - - - d d Pd=vcmpb.eq(Rss,Rtt)
1 1 0 1 0 0 1 0 0 - - s s s s s P P 0 t t t t t 1 1 1 - - - d d Pd=vcmpb.gtu(Rss,Rtt)
1 1 0 1 0 0 1 0 0 - - s s s s s P P 1 t t t t t 0 1 0 - - - d d Pd=vcmpb.gt(Rss,Rtt)
ICLASS RegType s5 Parse d2
1 1 0 1 1 1 0 0 0 0 0 s s s s s P P - i i i i i i i i 0 0 - d d Pd=vcmpb.eq(Rss,#u8)
1 1 0 1 1 1 0 0 0 0 1 s s s s s P P - i i i i i i i i 0 0 - d d Pd=vcmpb.gt(Rss,#s8)
1 1 0 1 1 1 0 0 0 1 0 s s s s s P P - 0 i i i i i i i 0 0 - d d Pd=vcmpb.gtu(Rss,#u7)

Field name Description


RegType Register type
MajOp Major opcode
MinOp Minor opcode
ICLASS Instruction class
Parse Packet/loop parse bits
d2 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 583


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector compare words


Compare each of two 32-bit words in two 64-bit vectors and set the corresponding bits in a
predicate destination to '1111' if true, '0000' if false.
Word comparisons are for equal, signed greater than, or unsigned greater than.

Rss

Rtt

cmp cmp

1 1 1 1 0 0 0 0 Pd
7 0

Syntax Behavior
Pd=vcmpw.eq(Rss,#s8) Pd[3:0] = (Rss.w[0]==#s);
Pd[7:4] = (Rss.w[1]==#s);
Pd=vcmpw.eq(Rss,Rtt) Pd[3:0] = (Rss.w[0]==Rtt.w[0]);
Pd[7:4] = (Rss.w[1]==Rtt.w[1]);
Pd=vcmpw.gt(Rss,#s8) Pd[3:0] = (Rss.w[0]>#s);
Pd[7:4] = (Rss.w[1]>#s);
Pd=vcmpw.gt(Rss,Rtt) Pd[3:0] = (Rss.w[0]>Rtt.w[0]);
Pd[7:4] = (Rss.w[1]>Rtt.w[1]);
Pd=vcmpw.gtu(Rss,#u7) Pd[3:0] = (Rss.uw[0]>#u.uw[0]);
Pd[7:4] = (Rss.uw[1]>#u.uw[0]);
Pd=vcmpw.gtu(Rss,Rtt) Pd[3:0] = (Rss.uw[0]>Rtt.uw[0]);
Pd[7:4] = (Rss.uw[1]>Rtt.uw[1]);

80-N2040-53 Rev. AA 584


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Class: XTYPE (slots 2,3)

Intrinsics

Pd=vcmpw.eq(Rss,#s8) Byte Q6_p_vcmpw_eq_PI(Word64 Rss, Word32


Is8)
Pd=vcmpw.eq(Rss,Rtt) Byte Q6_p_vcmpw_eq_PP(Word64 Rss, Word64
Rtt)
Pd=vcmpw.gt(Rss,#s8) Byte Q6_p_vcmpw_gt_PI(Word64 Rss, Word32
Is8)
Pd=vcmpw.gt(Rss,Rtt) Byte Q6_p_vcmpw_gt_PP(Word64 Rss, Word64
Rtt)
Pd=vcmpw.gtu(Rss,#u7) Byte Q6_p_vcmpw_gtu_PI(Word64 Rss, Word32
Iu7)
Pd=vcmpw.gtu(Rss,Rtt) Byte Q6_p_vcmpw_gtu_PP(Word64 Rss, Word64
Rtt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType s5 Parse t5 MinOp d2
1 1 0 1 0 0 1 0 0 - - s s s s s P P 0 t t t t t 0 0 0 - - - d d Pd=vcmpw.eq(Rss,Rtt)
1 1 0 1 0 0 1 0 0 - - s s s s s P P 0 t t t t t 0 0 1 - - - d d Pd=vcmpw.gt(Rss,Rtt)
1 1 0 1 0 0 1 0 0 - - s s s s s P P 0 t t t t t 0 1 0 - - - d d Pd=vcmpw.gtu(Rss,Rtt)
ICLASS RegType s5 Parse d2
1 1 0 1 1 1 0 0 0 0 0 s s s s s P P - i i i i i i i i 1 0 - d d Pd=vcmpw.eq(Rss,#s8)
1 1 0 1 1 1 0 0 0 0 1 s s s s s P P - i i i i i i i i 1 0 - d d Pd=vcmpw.gt(Rss,#s8)
1 1 0 1 1 1 0 0 0 1 0 s s s s s P P - 0 i i i i i i i 1 0 - d d Pd=vcmpw.gtu(Rss,#u7)

Field name Description


RegType Register type
MajOp Major opcode
MinOp Minor opcode
ICLASS Instruction class
Parse Packet/loop parse bits
d2 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t

80-N2040-53 Rev. AA 585


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Viterbi pack even and odd predicate bits


Pack the even and odd bits of two predicate registers into a single destination register. A variant
of this instruction is R3:2 |= vitpack(P1,P0), which places the packed predicate bits into the lower
eight bits of the register pair, which is preshifted by eight bits.
This instruction is useful in Viterbi decoding. Repeated use of the push version enables a history
storage for traceback, purposes.

7 0
Ps

Pt

0 Rd
31 8 7 0

Syntax Behavior
Rd=vitpack(Ps,Pt) PREDUSE_TIMING;
Rd = (Ps&0x55) | (Pt&0xAA);

Class: XTYPE (slots 2,3)

Intrinsics

Rd=vitpack(Ps,Pt) Word32 Q6_R_vitpack_pp(Byte Ps, Byte Pt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s2 Parse t2 d5
1 0 0 0 1 0 0 1 - 0 0 - - - s s P P - - - - t t - - - d d d d d Rd=vitpack(Ps,Pt)

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s2 Field to encode register s
t2 Field to encode register t
MajOp Major opcode
RegType Register type

80-N2040-53 Rev. AA 586


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector mux
Perform an element-wise byte selection between two vectors.
For each of the low eight bits of predicate register Pu, if the bit is set, the corresponding byte in
Rdd is set to the corresponding byte from Rss. Otherwise, set the byte in Rdd to the byte from Rtt.

Rss

Rtt

mux mux mux mux mux mux mux mux


P[7] P[6] P[5] P[4] P[3] P[2] P[1] P[0]

Rdd

Syntax Behavior
Rdd=vmux(Pu,Rss,Rtt) PREDUSE_TIMING;
for (i = 0; i < 8; i++) {
Rdd.b[i]=(Pu.i?(Rss.b[i]):(Rtt.b[i]));
}

Class: XTYPE (slots 2,3)

Intrinsics

Rdd=vmux(Pu,Rss,Rtt) Word64 Q6_P_vmux_pPP(Byte Pu, Word64 Rss,


Word64 Rtt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType s5 Parse t5 u2 d5
1 1 0 1 0 0 0 1 - - - s s s s s P P - t t t t t - u u d d d d d Rdd=vmux(Pu,Rss,Rtt)

Field name Description


RegType Register type
MinOp Minor opcode
ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
u2 Field to encode register u

80-N2040-53 Rev. AA 587


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

11.10.8 XTYPE SHIFT


The XTYPE SHIFT instruction subclass includes instructions that perform shifts.

Mask generate from immediate


Generate a mask from two immediate values.

Syntax Behavior
Rd=mask(#u5,#U5) Rd = ((1<<#u)-1) << #U;

Class: XTYPE (slots 2,3)

Intrinsics

Rd=mask(#u5,#U5) Word32 Q6_R_mask_II(Word32 Iu5, Word32 IU5)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp Parse MinOp d5
1 0 0 0 1 1 0 1 0 I I - - - - - P P 1 i i i i i I I I d d d d d Rd=mask(#u5,#U5)

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
MajOp Major opcode
MinOp Minor opcode
RegType Register type

80-N2040-53 Rev. AA 588


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Shift by immediate
Shift the source register value right or left based on the type of instruction. In these instructions,
the shift amount is contained in an unsigned immediate (five bits for 32-bit shifts, six bits for 64-
bit shifts) and the shift instruction gives the shift direction.
Arithmetic right shifts place the sign bit of the source value in the vacated positions.
Logical right shifts place zeros in the vacated positions.
Left shifts always zero-fill the vacated bits.

ASR LSR

Lost Rs Lost Rs

Sign-ext Rd Zero-fill Rd

ASL

Lost Rs

Zero-fill Rd

Syntax Behavior
Rd=asl(Rs,#u5) Rd = Rs << #u;
Rd=asr(Rs,#u5) Rd = Rs >> #u;
Rd=lsr(Rs,#u5) Rd = Rs >>> #u;
Rd=rol(Rs,#u5) Rd = Rs <<R #u;

Rdd=asl(Rss,#u6) Rdd = Rss << #u;


Rdd=asr(Rss,#u6) Rdd = Rss >> #u;
Rdd=lsr(Rss,#u6) Rdd = Rss >>> #u;
Rdd=rol(Rss,#u6) Rdd = Rss <<R #u;

Class: XTYPE (slots 2,3)

Intrinsics

Rd=asl(Rs,#u5) Word32 Q6_R_asl_RI(Word32 Rs, Word32 Iu5)


Rd=asr(Rs,#u5) Word32 Q6_R_asr_RI(Word32 Rs, Word32 Iu5)
Rd=lsr(Rs,#u5) Word32 Q6_R_lsr_RI(Word32 Rs, Word32 Iu5)
Rd=rol(Rs,#u5) Word32 Q6_R_rol_RI(Word32 Rs, Word32 Iu5)

80-N2040-53 Rev. AA 589


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Rdd=asl(Rss,#u6) Word64 Q6_P_asl_PI(Word64 Rss, Word32 Iu6)


Rdd=asr(Rss,#u6) Word64 Q6_P_asr_PI(Word64 Rss, Word32 Iu6)
Rdd=lsr(Rss,#u6) Word64 Q6_P_lsr_PI(Word64 Rss, Word32 Iu6)
Rdd=rol(Rss,#u6) Word64 Q6_P_rol_PI(Word64 Rss, Word32 Iu6)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse MinOp d5
1 0 0 0 0 0 0 0 0 0 0 s s s s s P P i i i i i i 0 0 0 d d d d d Rdd=asr(Rss,#u6)
1 0 0 0 0 0 0 0 0 0 0 s s s s s P P i i i i i i 0 0 1 d d d d d Rdd=lsr(Rss,#u6)
1 0 0 0 0 0 0 0 0 0 0 s s s s s P P i i i i i i 0 1 0 d d d d d Rdd=asl(Rss,#u6)
1 0 0 0 0 0 0 0 0 0 0 s s s s s P P i i i i i i 0 1 1 d d d d d Rdd=rol(Rss,#u6)
1 0 0 0 1 1 0 0 0 0 0 s s s s s P P 0 i i i i i 0 0 0 d d d d d Rd=asr(Rs,#u5)
1 0 0 0 1 1 0 0 0 0 0 s s s s s P P 0 i i i i i 0 0 1 d d d d d Rd=lsr(Rs,#u5)
1 0 0 0 1 1 0 0 0 0 0 s s s s s P P 0 i i i i i 0 1 0 d d d d d Rd=asl(Rs,#u5)
1 0 0 0 1 1 0 0 0 0 0 s s s s s P P 0 i i i i i 0 1 1 d d d d d Rd=rol(Rs,#u5)

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
MajOp Major opcode
MinOp Minor opcode
RegType Register type

80-N2040-53 Rev. AA 590


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Shift by immediate and accumulate


Shift the source register value right or left, based on the type of instruction. In these instructions,
an unsigned immediate (5 bits for 32-bit shifts, 6 bits for 64-bit shifts) contains the shift amount,
and the shift instruction gives the shift direction.
Arithmetic right shifts place the sign bit of the source value in the vacated positions. Logical right
shifts place zeros in the vacated positions. Left shifts always zero-fill the vacated bits.
After shifting, add or subtract the shifted value from the destination register or register pair.

Rss # / Rt Rs # / Rt
64-bit shift value Shift amount 32-bit shift value Shift amount

64-bit shift 32-bit shift

64-bit add/sub 32-bit add/sub

64-bit result Rxx 32-bit result Rx

Syntax Behavior
Rx=add(#u8,asl(Rx,#U5)) Rx=apply_extension(#u)+(Rx<<#U);
Rx=add(#u8,lsr(Rx,#U5)) Rx=apply_extension(#u)+(((unsigned
int)Rx)>>#U);
Rx=sub(#u8,asl(Rx,#U5)) Rx=apply_extension(#u)-(Rx<<#U);
Rx=sub(#u8,lsr(Rx,#U5)) Rx=apply_extension(#u)-(((unsigned
int)Rx)>>#U);
Rx[+-]=asl(Rs,#u5) Rx = Rx [+-] Rs << #u;
Rx[+-]=asr(Rs,#u5) Rx = Rx [+-] Rs >> #u;
Rx[+-]=lsr(Rs,#u5) Rx = Rx [+-] Rs >>> #u;
Rx[+-]=rol(Rs,#u5) Rx = Rx [+-] Rs <<R #u;

Rxx[+-]=asl(Rss,#u6) Rxx = Rxx [+-] Rss << #u;


Rxx[+-]=asr(Rss,#u6) Rxx = Rxx [+-] Rss >> #u;
Rxx[+-]=lsr(Rss,#u6) Rxx = Rxx [+-] Rss >>> #u;
Rxx[+-]=rol(Rss,#u6) Rxx = Rxx [+-] Rss <<R #u;

80-N2040-53 Rev. AA 591


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Class: XTYPE (slots 2,3)

Intrinsics

Rx+=asl(Rs,#u5) Word32 Q6_R_aslacc_RI(Word32 Rx, Word32 Rs, Word32


Iu5)
Rx+=asr(Rs,#u5) Word32 Q6_R_asracc_RI(Word32 Rx, Word32 Rs, Word32
Iu5)
Rx+=lsr(Rs,#u5) Word32 Q6_R_lsracc_RI(Word32 Rx, Word32 Rs, Word32
Iu5)
Rx+=rol(Rs,#u5) Word32 Q6_R_rolacc_RI(Word32 Rx, Word32 Rs, Word32
Iu5)
Rx-=asl(Rs,#u5) Word32 Q6_R_aslnac_RI(Word32 Rx, Word32 Rs, Word32
Iu5)
Rx-=asr(Rs,#u5) Word32 Q6_R_asrnac_RI(Word32 Rx, Word32 Rs, Word32
Iu5)
Rx-=lsr(Rs,#u5) Word32 Q6_R_lsrnac_RI(Word32 Rx, Word32 Rs, Word32
Iu5)
Rx-=rol(Rs,#u5) Word32 Q6_R_rolnac_RI(Word32 Rx, Word32 Rs, Word32
Iu5)
Rx=add(#u8,asl(Rx,#U5 Word32 Q6_R_add_asl_IRI(Word32 Iu8, Word32 Rx, Word32
)) IU5)
Rx=add(#u8,lsr(Rx,#U5 Word32 Q6_R_add_lsr_IRI(Word32 Iu8, Word32 Rx, Word32
)) IU5)
Rx=sub(#u8,asl(Rx,#U5 Word32 Q6_R_sub_asl_IRI(Word32 Iu8, Word32 Rx, Word32
)) IU5)
Rx=sub(#u8,lsr(Rx,#U5 Word32 Q6_R_sub_lsr_IRI(Word32 Iu8, Word32 Rx, Word32
)) IU5)
Rxx+=asl(Rss,#u6) Word64 Q6_P_aslacc_PI(Word64 Rxx, Word64 Rss, Word32
Iu6)
Rxx+=asr(Rss,#u6) Word64 Q6_P_asracc_PI(Word64 Rxx, Word64 Rss, Word32
Iu6)
Rxx+=lsr(Rss,#u6) Word64 Q6_P_lsracc_PI(Word64 Rxx, Word64 Rss, Word32
Iu6)
Rxx+=rol(Rss,#u6) Word64 Q6_P_rolacc_PI(Word64 Rxx, Word64 Rss, Word32
Iu6)
Rxx-=asl(Rss,#u6) Word64 Q6_P_aslnac_PI(Word64 Rxx, Word64 Rss, Word32
Iu6)
Rxx-=asr(Rss,#u6) Word64 Q6_P_asrnac_PI(Word64 Rxx, Word64 Rss, Word32
Iu6)
Rxx-=lsr(Rss,#u6) Word64 Q6_P_lsrnac_PI(Word64 Rxx, Word64 Rss, Word32
Iu6)
Rxx-=rol(Rss,#u6) Word64 Q6_P_rolnac_PI(Word64 Rxx, Word64 Rss, Word32
Iu6)

80-N2040-53 Rev. AA 592


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse MinOp x5
1 0 0 0 0 0 1 0 0 0 - s s s s s P P i i i i i i 0 0 0 x x x x x Rxx-=asr(Rss,#u6)
1 0 0 0 0 0 1 0 0 0 - s s s s s P P i i i i i i 0 0 1 x x x x x Rxx-=lsr(Rss,#u6)
1 0 0 0 0 0 1 0 0 0 - s s s s s P P i i i i i i 0 1 0 x x x x x Rxx-=asl(Rss,#u6)
1 0 0 0 0 0 1 0 0 0 - s s s s s P P i i i i i i 0 1 1 x x x x x Rxx-=rol(Rss,#u6)
1 0 0 0 0 0 1 0 0 0 - s s s s s P P i i i i i i 1 0 0 x x x x x Rxx+=asr(Rss,#u6)
1 0 0 0 0 0 1 0 0 0 - s s s s s P P i i i i i i 1 0 1 x x x x x Rxx+=lsr(Rss,#u6)
1 0 0 0 0 0 1 0 0 0 - s s s s s P P i i i i i i 1 1 0 x x x x x Rxx+=asl(Rss,#u6)
1 0 0 0 0 0 1 0 0 0 - s s s s s P P i i i i i i 1 1 1 x x x x x Rxx+=rol(Rss,#u6)
1 0 0 0 1 1 1 0 0 0 - s s s s s P P 0 i i i i i 0 0 0 x x x x x Rx-=asr(Rs,#u5)
1 0 0 0 1 1 1 0 0 0 - s s s s s P P 0 i i i i i 0 0 1 x x x x x Rx-=lsr(Rs,#u5)
1 0 0 0 1 1 1 0 0 0 - s s s s s P P 0 i i i i i 0 1 0 x x x x x Rx-=asl(Rs,#u5)
1 0 0 0 1 1 1 0 0 0 - s s s s s P P 0 i i i i i 0 1 1 x x x x x Rx-=rol(Rs,#u5)
1 0 0 0 1 1 1 0 0 0 - s s s s s P P 0 i i i i i 1 0 0 x x x x x Rx+=asr(Rs,#u5)
1 0 0 0 1 1 1 0 0 0 - s s s s s P P 0 i i i i i 1 0 1 x x x x x Rx+=lsr(Rs,#u5)
1 0 0 0 1 1 1 0 0 0 - s s s s s P P 0 i i i i i 1 1 0 x x x x x Rx+=asl(Rs,#u5)
1 0 0 0 1 1 1 0 0 0 - s s s s s P P 0 i i i i i 1 1 1 x x x x x Rx+=rol(Rs,#u5)
ICLASS RegType x5 Parse MajOp
1 1 0 1 1 1 1 0 i i i x x x x x P P i I I I I I i i i 0 i 1 0 - Rx=add(#u8,asl(Rx,#U5))
1 1 0 1 1 1 1 0 i i i x x x x x P P i I I I I I i i i 0 i 1 1 - Rx=sub(#u8,asl(Rx,#U5))
1 1 0 1 1 1 1 0 i i i x x x x x P P i I I I I I i i i 1 i 1 0 - Rx=add(#u8,lsr(Rx,#U5))
1 1 0 1 1 1 1 0 i i i x x x x x P P i I I I I I i i i 1 i 1 1 - Rx=sub(#u8,lsr(Rx,#U5))

Field name Description


RegType Register type
MajOp Major opcode
ICLASS Instruction class
Parse Packet/loop parse bits
s5 Field to encode register s
x5 Field to encode register x
MinOp Minor opcode

80-N2040-53 Rev. AA 593


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Shift by immediate and add


Shift Rs left by 0-7 bits, add to Rt, and place the result in Rd.
This instruction is useful for calculating array pointers, where destruction of the base pointer is
undesirable.

Syntax Behavior
Rd=addasl(Rt,Rs,#u3) Rd = Rt + Rs << #u;

Class: XTYPE (slots 2,3)

Intrinsics

Rd=addasl(Rt,Rs,#u3) Word32 Q6_R_addasl_RRI(Word32 Rt, Word32


Rs, Word32 Iu3)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType Maj s5 Parse t5 Min d5
1 1 0 0 0 1 0 0 0 0 0 s s s s s P P 0 t t t t t i i i d d d d d Rd=addasl(Rt,Rs,#u3)

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
Maj Major opcode
Min Minor opcode
RegType Register type

80-N2040-53 Rev. AA 594


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Shift by immediate and logical


Shift the source register value right or left based on the type of instruction. In these instructions,
an unsigned immediate (five bits for 32-bit shifts, six bits for 64-bit shifts) contains the shift
amount and the shift instruction gives the shift direction.
Arithmetic right shifts place the sign bit of the source value in the vacated positions. Logical right
shifts place zeros in the vacated positions. Left shifts always zero-fill the vacated bits.
After shifting, take the logical AND, OR, or XOR of the shifted amount and the destination register
or register pair, and place the result back in the destination register or register pair.
Saturation is not available for these instructions.

Rss # / Rt Rs # / Rt
64-bit shift value Shift amount 32-bit shift value Shift amount

64-bit shift 32-bit shift

64-bit AND/OR 32-bit AND/OR

64-bit result Rxx 32-bit result Rx

Syntax Behavior
Rx=and(#u8,asl(Rx,#U5)) Rx=apply_extension(#u)&(Rx<<#U);
Rx=and(#u8,lsr(Rx,#U5)) Rx=apply_extension(#u)&(((unsigned
int)Rx)>>#U);
Rx=or(#u8,asl(Rx,#U5)) Rx=apply_extension(#u)|(Rx<<#U);
Rx=or(#u8,lsr(Rx,#U5)) Rx=apply_extension(#u)|(((unsigned
int)Rx)>>#U);
Rx[&|]=asl(Rs,#u5) Rx = Rx [|&] Rs << #u;
Rx[&|]=asr(Rs,#u5) Rx = Rx [|&] Rs >> #u;
Rx[&|]=lsr(Rs,#u5) Rx = Rx [|&] Rs >>> #u;
Rx[&|]=rol(Rs,#u5) Rx = Rx [|&] Rs <<R #u;

Rx^=asl(Rs,#u5) Rx = Rx ^ Rs << #u;


Rx^=lsr(Rs,#u5) Rx = Rx ^ Rs >>> #u;
Rx^=rol(Rs,#u5) Rx = Rx ^ Rs <<R #u;

Rxx[&|]=asl(Rss,#u6) Rxx = Rxx [|&] Rss << #u;

80-N2040-53 Rev. AA 595


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Syntax Behavior
Rxx[&|]=asr(Rss,#u6) Rxx = Rxx [|&] Rss >> #u;
Rxx[&|]=lsr(Rss,#u6) Rxx = Rxx [|&] Rss >>> #u;
Rxx[&|]=rol(Rss,#u6) Rxx = Rxx [|&] Rss <<R #u;

Rxx^=asl(Rss,#u6) Rxx = Rxx ^ Rss << #u;


Rxx^=lsr(Rss,#u6) Rxx = Rxx ^ Rss >>> #u;
Rxx^=rol(Rss,#u6) Rxx = Rxx ^ Rss <<R #u;

Class: XTYPE (slots 2,3)

Intrinsics

Rx&=asl(Rs,#u5) Word32 Q6_R_asland_RI(Word32 Rx, Word32 Rs, Word32 Iu5)


Rx&=asr(Rs,#u5) Word32 Q6_R_asrand_RI(Word32 Rx, Word32 Rs, Word32 Iu5)
Rx&=lsr(Rs,#u5) Word32 Q6_R_lsrand_RI(Word32 Rx, Word32 Rs, Word32 Iu5)
Rx&=rol(Rs,#u5) Word32 Q6_R_roland_RI(Word32 Rx, Word32 Rs, Word32 Iu5)
Rx=and(#u8,asl(Rx,#U5 Word32 Q6_R_and_asl_IRI(Word32 Iu8, Word32 Rx, Word32
)) IU5)
Rx=and(#u8,lsr(Rx,#U5 Word32 Q6_R_and_lsr_IRI(Word32 Iu8, Word32 Rx, Word32
)) IU5)
Rx=or(#u8,asl(Rx,#U5) Word32 Q6_R_or_asl_IRI(Word32 Iu8, Word32 Rx, Word32 IU5)
)
Rx=or(#u8,lsr(Rx,#U5) Word32 Q6_R_or_lsr_IRI(Word32 Iu8, Word32 Rx, Word32 IU5)
)
Rx^=asl(Rs,#u5) Word32 Q6_R_aslxacc_RI(Word32 Rx, Word32 Rs, Word32 Iu5)
Rx^=lsr(Rs,#u5) Word32 Q6_R_lsrxacc_RI(Word32 Rx, Word32 Rs, Word32 Iu5)
Rx^=rol(Rs,#u5) Word32 Q6_R_rolxacc_RI(Word32 Rx, Word32 Rs, Word32 Iu5)
Rx|=asl(Rs,#u5) Word32 Q6_R_aslor_RI(Word32 Rx, Word32 Rs, Word32 Iu5)
Rx|=asr(Rs,#u5) Word32 Q6_R_asror_RI(Word32 Rx, Word32 Rs, Word32 Iu5)
Rx|=lsr(Rs,#u5) Word32 Q6_R_lsror_RI(Word32 Rx, Word32 Rs, Word32 Iu5)
Rx|=rol(Rs,#u5) Word32 Q6_R_rolor_RI(Word32 Rx, Word32 Rs, Word32 Iu5)
Rxx&=asl(Rss,#u6) Word64 Q6_P_asland_PI(Word64 Rxx, Word64 Rss, Word32 Iu6)
Rxx&=asr(Rss,#u6) Word64 Q6_P_asrand_PI(Word64 Rxx, Word64 Rss, Word32 Iu6)
Rxx&=lsr(Rss,#u6) Word64 Q6_P_lsrand_PI(Word64 Rxx, Word64 Rss, Word32 Iu6)
Rxx&=rol(Rss,#u6) Word64 Q6_P_roland_PI(Word64 Rxx, Word64 Rss, Word32 Iu6)
Rxx^=asl(Rss,#u6) Word64 Q6_P_aslxacc_PI(Word64 Rxx, Word64 Rss, Word32
Iu6)
Rxx^=lsr(Rss,#u6) Word64 Q6_P_lsrxacc_PI(Word64 Rxx, Word64 Rss, Word32
Iu6)

80-N2040-53 Rev. AA 596


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Rxx^=rol(Rss,#u6) Word64 Q6_P_rolxacc_PI(Word64 Rxx, Word64 Rss, Word32


Iu6)
Rxx|=asl(Rss,#u6) Word64 Q6_P_aslor_PI(Word64 Rxx, Word64 Rss, Word32 Iu6)
Rxx|=asr(Rss,#u6) Word64 Q6_P_asror_PI(Word64 Rxx, Word64 Rss, Word32 Iu6)
Rxx|=lsr(Rss,#u6) Word64 Q6_P_lsror_PI(Word64 Rxx, Word64 Rss, Word32 Iu6)
Rxx|=rol(Rss,#u6) Word64 Q6_P_rolor_PI(Word64 Rxx, Word64 Rss, Word32 Iu6)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse MinOp x5
1 0 0 0 0 0 1 0 0 1 - s s s s s P P i i i i i i 0 0 0 x x x x x Rxx&=asr(Rss,#u6)
1 0 0 0 0 0 1 0 0 1 - s s s s s P P i i i i i i 0 0 1 x x x x x Rxx&=lsr(Rss,#u6)
1 0 0 0 0 0 1 0 0 1 - s s s s s P P i i i i i i 0 1 0 x x x x x Rxx&=asl(Rss,#u6)
1 0 0 0 0 0 1 0 0 1 - s s s s s P P i i i i i i 0 1 1 x x x x x Rxx&=rol(Rss,#u6)
1 0 0 0 0 0 1 0 0 1 - s s s s s P P i i i i i i 1 0 0 x x x x x Rxx|=asr(Rss,#u6)
1 0 0 0 0 0 1 0 0 1 - s s s s s P P i i i i i i 1 0 1 x x x x x Rxx|=lsr(Rss,#u6)
1 0 0 0 0 0 1 0 0 1 - s s s s s P P i i i i i i 1 1 0 x x x x x Rxx|=asl(Rss,#u6)
1 0 0 0 0 0 1 0 0 1 - s s s s s P P i i i i i i 1 1 1 x x x x x Rxx|=rol(Rss,#u6)
1 0 0 0 0 0 1 0 1 0 - s s s s s P P i i i i i i 0 0 1 x x x x x Rxx^=lsr(Rss,#u6)
1 0 0 0 0 0 1 0 1 0 - s s s s s P P i i i i i i 0 1 0 x x x x x Rxx^=asl(Rss,#u6)
1 0 0 0 0 0 1 0 1 0 - s s s s s P P i i i i i i 0 1 1 x x x x x Rxx^=rol(Rss,#u6)
1 0 0 0 1 1 1 0 0 1 - s s s s s P P 0 i i i i i 0 0 0 x x x x x Rx&=asr(Rs,#u5)
1 0 0 0 1 1 1 0 0 1 - s s s s s P P 0 i i i i i 0 0 1 x x x x x Rx&=lsr(Rs,#u5)
1 0 0 0 1 1 1 0 0 1 - s s s s s P P 0 i i i i i 0 1 0 x x x x x Rx&=asl(Rs,#u5)
1 0 0 0 1 1 1 0 0 1 - s s s s s P P 0 i i i i i 0 1 1 x x x x x Rx&=rol(Rs,#u5)
1 0 0 0 1 1 1 0 0 1 - s s s s s P P 0 i i i i i 1 0 0 x x x x x Rx|=asr(Rs,#u5)
1 0 0 0 1 1 1 0 0 1 - s s s s s P P 0 i i i i i 1 0 1 x x x x x Rx|=lsr(Rs,#u5)
1 0 0 0 1 1 1 0 0 1 - s s s s s P P 0 i i i i i 1 1 0 x x x x x Rx|=asl(Rs,#u5)
1 0 0 0 1 1 1 0 0 1 - s s s s s P P 0 i i i i i 1 1 1 x x x x x Rx|=rol(Rs,#u5)
1 0 0 0 1 1 1 0 1 0 - s s s s s P P 0 i i i i i 0 0 1 x x x x x Rx^=lsr(Rs,#u5)
1 0 0 0 1 1 1 0 1 0 - s s s s s P P 0 i i i i i 0 1 0 x x x x x Rx^=asl(Rs,#u5)
1 0 0 0 1 1 1 0 1 0 - s s s s s P P 0 i i i i i 0 1 1 x x x x x Rx^=rol(Rs,#u5)
ICLASS RegType x5 Parse MajOp
1 1 0 1 1 1 1 0 i i i x x x x x P P i I I I I I i i i 0 i 0 0 - Rx=and(#u8,asl(Rx,#U5))
1 1 0 1 1 1 1 0 i i i x x x x x P P i I I I I I i i i 0 i 0 1 - Rx=or(#u8,asl(Rx,#U5))
1 1 0 1 1 1 1 0 i i i x x x x x P P i I I I I I i i i 1 i 0 0 - Rx=and(#u8,lsr(Rx,#U5))
1 1 0 1 1 1 1 0 i i i x x x x x P P i I I I I I i i i 1 i 0 1 - Rx=or(#u8,lsr(Rx,#U5))

Field name Description


RegType Register type
ICLASS Instruction class
Parse Packet/loop parse bits
s5 Field to encode register s
x5 Field to encode register x
MajOp Major opcode
MinOp Minor opcode

80-N2040-53 Rev. AA 597


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Shift right by immediate with rounding


Perform an arithmetic right shift by an immediate amount, and then round the result. This
instruction works by first shifting right, then adding the value +1 to the result, and finally shifting
right again by one bit. The right shifts always inserts the sign-bit in the vacated position.
When using the asrrnd instruction, the assembler adjusts the immediate appropriately.

Lost Rs

Sign-ext

+1

32-bit add

Rd

Syntax Behavior
Rd=asr(Rs,#u5):rnd Rd = ((Rs >> #u)+1) >> 1;
Rd=asrrnd(Rs,#u5) if ("#u5==0") {
Assembler mapped to: "Rd=Rs";
} else {
Assembler mapped to: "Rd=asr(Rs,#u5-1):rnd";
}
Rdd=asr(Rss,#u6):rnd tmp = Rss >> #u;
rnd = tmp & 1;
Rdd = tmp >> 1 + rnd;
Rdd=asrrnd(Rss,#u6) if ("#u6==0") {
Assembler mapped to: "Rdd=Rss";
} else {
Assembler mapped to: "Rdd=asr(Rss,#u6-1):rnd";
}

80-N2040-53 Rev. AA 598


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Class: XTYPE (slots 2,3)

Intrinsics

Rd=asr(Rs,#u5):rnd Word32 Q6_R_asr_RI_rnd(Word32 Rs, Word32


Iu5)
Rd=asrrnd(Rs,#u5) Word32 Q6_R_asrrnd_RI(Word32 Rs, Word32
Iu5)
Rdd=asr(Rss,#u6):rnd Word64 Q6_P_asr_PI_rnd(Word64 Rss, Word32
Iu6)
Rdd=asrrnd(Rss,#u6) Word64 Q6_P_asrrnd_PI(Word64 Rss, Word32
Iu6)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse MinOp d5
1 0 0 0 0 0 0 0 1 1 0 s s s s s P P i i i i i i 1 1 1 d d d d d Rdd=asr(Rss,#u6):rnd
1 0 0 0 1 1 0 0 0 1 0 s s s s s P P 0 i i i i i 0 0 0 d d d d d Rd=asr(Rs,#u5):rnd

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
MajOp Major opcode
MinOp Minor opcode
RegType Register type

80-N2040-53 Rev. AA 599


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Shift left by immediate with saturation


Perform a left shift of the 32-bit source register value by an immediate amount and saturate.
Saturation works by first sign-extending the 32-bit Rs register to 64 bits. It is then left-shifted by
the immediate amount. If this 64-bit value cannot fit in a signed 32-bit number (the upper word is
not the sign-extension of bit 31), saturation is performed based on the sign of the original value.
Saturation clamps the 32-bit result to the range 0x8000_0000 to 0x7fff_ffff.

Syntax Behavior
Rd=asl(Rs,#u5):sat Rd = sat32(sxt32->64(Rs) << #u);

Class: XTYPE (slots 2,3)

Notes

■ If saturation occurs during execution of this instruction (a result is clamped to either


maximum or minimum values), the OVF bit in the status register is set. OVF remains set until
explicitly cleared by a transfer to the status register.

Intrinsics

Rd=asl(Rs,#u5):sat Word32 Q6_R_asl_RI_sat(Word32 Rs, Word32


Iu5)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse MinOp d5
1 0 0 0 1 1 0 0 0 1 0 s s s s s P P 0 i i i i i 0 1 0 d d d d d Rd=asl(Rs,#u5):sat

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
MajOp Major opcode
MinOp Minor opcode
RegType Register type

80-N2040-53 Rev. AA 600


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Shift by register
The shift amount is the least significant seven bits of Rt, treated as a two's complement value. If
the shift amount is negative (bit 6 of Rt is set), reverse the direction of the shift indicated in the
opcode (see Figure).
The source data to shift is always performed as a 64-bit shift. When the Rs source register is a 32-
bit register, this register is first sign or zero-extended to 64-bits. Arithmetic shifts sign-extend the
32-bit source to 64-bits, whereas logical shifts zero extend.
The 64-bit source value is then right or left shifted based on the shift amount and the type of
instruction. Arithmetic right shifts place the sign bit of the source value in the vacated positions.
Logical right shifts place zeros in the vacated positions.

ASR w/ positive Rt LSR w/ positive Rt


ASL w/ negative Rt LSL w/ negative Rt

Lost Rs Lost Rs

Sign-ext Rd Zero-fill Rd

ASL w/ positive Rt
LSL w/ positive Rt
ASR w/ negative Rt
LSR w/ negative Rt

Lost Rs

Zero-fill Rd

Syntax Behavior
Rd=asl(Rs,Rt) shamt=sxt7->32(Rt);
Rd = (shamt>0)?(sxt32->64(Rs)<<shamt):(sxt32-
>64(Rs)>>shamt);

Rd=asr(Rs,Rt) shamt=sxt7->32(Rt);
Rd = (shamt>0)?(sxt32->64(Rs)>>shamt):(sxt32-
>64(Rs)<<shamt);

Rd=lsl(#s6,Rt) shamt = sxt7->32(Rt);


Rd = (shamt>0)?(zxt32->64(#s)<<shamt):(zxt32-
>64(#s)>>>shamt);

Rd=lsl(Rs,Rt) shamt=sxt7->32(Rt);
Rd = (shamt>0)?(zxt32->64(Rs)<<shamt):(zxt32-
>64(Rs)>>>shamt);

80-N2040-53 Rev. AA 601


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Syntax Behavior
Rd=lsr(Rs,Rt) shamt=sxt7->32(Rt);
Rd = (shamt>0)?(zxt32->64(Rs)>>>shamt):(zxt32-
>64(Rs)<<shamt);

Rdd=asl(Rss,Rt) shamt=sxt7->32(Rt);
Rdd = (shamt>0)?(Rss<<shamt):(Rss>>shamt);
Rdd=asr(Rss,Rt) shamt=sxt7->32(Rt);
Rdd = (shamt>0)?(Rss>>shamt):(Rss<<shamt);
Rdd=lsl(Rss,Rt) shamt=sxt7->32(Rt);
Rdd = (shamt>0)?(Rss<<shamt):(Rss>>>shamt);
Rdd=lsr(Rss,Rt) shamt=sxt7->32(Rt);
Rdd = (shamt>0)?(Rss>>>shamt):(Rss<<shamt);

Class: XTYPE (slots 2,3)

Intrinsics

Rd=asl(Rs,Rt) Word32 Q6_R_asl_RR(Word32 Rs, Word32 Rt)


Rd=asr(Rs,Rt) Word32 Q6_R_asr_RR(Word32 Rs, Word32 Rt)
Rd=lsl(#s6,Rt) Word32 Q6_R_lsl_IR(Word32 Is6, Word32 Rt)
Rd=lsl(Rs,Rt) Word32 Q6_R_lsl_RR(Word32 Rs, Word32 Rt)
Rd=lsr(Rs,Rt) Word32 Q6_R_lsr_RR(Word32 Rs, Word32 Rt)
Rdd=asl(Rss,Rt) Word64 Q6_P_asl_PR(Word64 Rss, Word32 Rt)
Rdd=asr(Rss,Rt) Word64 Q6_P_asr_PR(Word64 Rss, Word32 Rt)
Rdd=lsl(Rss,Rt) Word64 Q6_P_lsl_PR(Word64 Rss, Word32 Rt)
Rdd=lsr(Rss,Rt) Word64 Q6_P_lsr_PR(Word64 Rss, Word32 Rt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType Maj s5 Parse t5 Min d5
1 1 0 0 0 0 1 1 1 0 - s s s s s P P - t t t t t 0 0 - d d d d d Rdd=asr(Rss,Rt)
1 1 0 0 0 0 1 1 1 0 - s s s s s P P - t t t t t 0 1 - d d d d d Rdd=lsr(Rss,Rt)
1 1 0 0 0 0 1 1 1 0 - s s s s s P P - t t t t t 1 0 - d d d d d Rdd=asl(Rss,Rt)
1 1 0 0 0 0 1 1 1 0 - s s s s s P P - t t t t t 1 1 - d d d d d Rdd=lsl(Rss,Rt)
1 1 0 0 0 1 1 0 0 1 - s s s s s P P - t t t t t 0 0 - d d d d d Rd=asr(Rs,Rt)
1 1 0 0 0 1 1 0 0 1 - s s s s s P P - t t t t t 0 1 - d d d d d Rd=lsr(Rs,Rt)
1 1 0 0 0 1 1 0 0 1 - s s s s s P P - t t t t t 1 0 - d d d d d Rd=asl(Rs,Rt)
1 1 0 0 0 1 1 0 0 1 - s s s s s P P - t t t t t 1 1 - d d d d d Rd=lsl(Rs,Rt)
ICLASS RegType Maj Parse t5 Min d5
1 1 0 0 0 1 1 0 1 0 - i i i i i P P - t t t t t 1 1 i d d d d d Rd=lsl(#s6,Rt)

80-N2040-53 Rev. AA 602


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
Maj Major opcode
Min Minor opcode
RegType Register type

80-N2040-53 Rev. AA 603


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Shift by register and accumulate


The shift amount is the least significant seven bits of Rt, treated as a two's complement value.
When the shift amount is negative (bit 6 of Rt is set), reverse the direction of the shift indicated in
the opcode.
Shift the source register value right or left based on the shift amount and the type of instruction.
Arithmetic right shifts place the sign bit of the source value in the vacated positions. Logical right
shifts place zeros in the vacated positions.
The shift operation is always performed as a 64-bit shift. When Rs is a 32-bit register, this register
is first sign- or zero-extended to 64-bits. Arithmetic shifts sign-extend the 32-bit source to 64-bits,
whereas logical shifts zero extend.
After shifting, add or subtract the 64-bit shifted amount from the destination register or register
pair.

Rss # / Rt Rs # / Rt
64-bit shift value Shift amount 32-bit shift value Shift amount

64-bit shift 32-bit shift

64-bit add/sub 32-bit add/sub

64-bit result Rxx 32-bit result Rx

Syntax Behavior
Rx[+-]=asl(Rs,Rt) shamt=sxt7->32(Rt);
Rx = Rx [+-] (shamt>0)?(sxt32->64(Rs)<<shamt):(sxt32-
>64(Rs)>>shamt);

Rx[+-]=asr(Rs,Rt) shamt=sxt7->32(Rt);
Rx = Rx [+-] (shamt>0)?(sxt32->64(Rs)>>shamt):(sxt32-
>64(Rs)<<shamt);

Rx[+-]=lsl(Rs,Rt) shamt=sxt7->32(Rt);
Rx = Rx [+-] (shamt>0)?(zxt32->64(Rs)<<shamt):(zxt32-
>64(Rs)>>>shamt);

Rx[+-]=lsr(Rs,Rt) shamt=sxt7->32(Rt);
Rx = Rx [+-] (shamt>0)?(zxt32->64(Rs)>>>shamt):(zxt32-
>64(Rs)<<shamt);

Rxx[+- shamt=sxt7->32(Rt);
]=asl(Rss,Rt) Rxx = Rxx [+-] (shamt>0)?(Rss<<shamt):(Rss>>shamt);

80-N2040-53 Rev. AA 604


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Syntax Behavior
Rxx[+- shamt=sxt7->32(Rt);
]=asr(Rss,Rt) Rxx = Rxx [+-] (shamt>0)?(Rss>>shamt):(Rss<<shamt);
Rxx[+- shamt=sxt7->32(Rt);
]=lsl(Rss,Rt) Rxx = Rxx [+-] (shamt>0)?(Rss<<shamt):(Rss>>>shamt);
Rxx[+- shamt=sxt7->32(Rt);
]=lsr(Rss,Rt) Rxx = Rxx [+-] (shamt>0)?(Rss>>>shamt):(Rss<<shamt);

Class: XTYPE (slots 2,3)

Intrinsics

Rx+=asl(Rs,Rt) Word32 Q6_R_aslacc_RR(Word32 Rx, Word32 Rs, Word32


Rt)
Rx+=asr(Rs,Rt) Word32 Q6_R_asracc_RR(Word32 Rx, Word32 Rs, Word32
Rt)
Rx+=lsl(Rs,Rt) Word32 Q6_R_lslacc_RR(Word32 Rx, Word32 Rs, Word32
Rt)
Rx+=lsr(Rs,Rt) Word32 Q6_R_lsracc_RR(Word32 Rx, Word32 Rs, Word32
Rt)
Rx-=asl(Rs,Rt) Word32 Q6_R_aslnac_RR(Word32 Rx, Word32 Rs, Word32
Rt)
Rx-=asr(Rs,Rt) Word32 Q6_R_asrnac_RR(Word32 Rx, Word32 Rs, Word32
Rt)
Rx-=lsl(Rs,Rt) Word32 Q6_R_lslnac_RR(Word32 Rx, Word32 Rs, Word32
Rt)
Rx-=lsr(Rs,Rt) Word32 Q6_R_lsrnac_RR(Word32 Rx, Word32 Rs, Word32
Rt)
Rxx+=asl(Rss,Rt) Word64 Q6_P_aslacc_PR(Word64 Rxx, Word64 Rss,
Word32 Rt)
Rxx+=asr(Rss,Rt) Word64 Q6_P_asracc_PR(Word64 Rxx, Word64 Rss,
Word32 Rt)
Rxx+=lsl(Rss,Rt) Word64 Q6_P_lslacc_PR(Word64 Rxx, Word64 Rss,
Word32 Rt)
Rxx+=lsr(Rss,Rt) Word64 Q6_P_lsracc_PR(Word64 Rxx, Word64 Rss,
Word32 Rt)
Rxx-=asl(Rss,Rt) Word64 Q6_P_aslnac_PR(Word64 Rxx, Word64 Rss,
Word32 Rt)
Rxx-=asr(Rss,Rt) Word64 Q6_P_asrnac_PR(Word64 Rxx, Word64 Rss,
Word32 Rt)
Rxx-=lsl(Rss,Rt) Word64 Q6_P_lslnac_PR(Word64 Rxx, Word64 Rss,
Word32 Rt)
Rxx-=lsr(Rss,Rt) Word64 Q6_P_lsrnac_PR(Word64 Rxx, Word64 Rss,
Word32 Rt)

80-N2040-53 Rev. AA 605


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType Maj s5 Parse t5 Min x5
1 1 0 0 1 0 1 1 1 0 0 s s s s s P P - t t t t t 0 0 - x x x x x Rxx-=asr(Rss,Rt)
1 1 0 0 1 0 1 1 1 0 0 s s s s s P P - t t t t t 0 1 - x x x x x Rxx-=lsr(Rss,Rt)
1 1 0 0 1 0 1 1 1 0 0 s s s s s P P - t t t t t 1 0 - x x x x x Rxx-=asl(Rss,Rt)
1 1 0 0 1 0 1 1 1 0 0 s s s s s P P - t t t t t 1 1 - x x x x x Rxx-=lsl(Rss,Rt)
1 1 0 0 1 0 1 1 1 1 0 s s s s s P P - t t t t t 0 0 - x x x x x Rxx+=asr(Rss,Rt)
1 1 0 0 1 0 1 1 1 1 0 s s s s s P P - t t t t t 0 1 - x x x x x Rxx+=lsr(Rss,Rt)
1 1 0 0 1 0 1 1 1 1 0 s s s s s P P - t t t t t 1 0 - x x x x x Rxx+=asl(Rss,Rt)
1 1 0 0 1 0 1 1 1 1 0 s s s s s P P - t t t t t 1 1 - x x x x x Rxx+=lsl(Rss,Rt)
1 1 0 0 1 1 0 0 1 0 - s s s s s P P - t t t t t 0 0 - x x x x x Rx-=asr(Rs,Rt)
1 1 0 0 1 1 0 0 1 0 - s s s s s P P - t t t t t 0 1 - x x x x x Rx-=lsr(Rs,Rt)
1 1 0 0 1 1 0 0 1 0 - s s s s s P P - t t t t t 1 0 - x x x x x Rx-=asl(Rs,Rt)
1 1 0 0 1 1 0 0 1 0 - s s s s s P P - t t t t t 1 1 - x x x x x Rx-=lsl(Rs,Rt)
1 1 0 0 1 1 0 0 1 1 - s s s s s P P - t t t t t 0 0 - x x x x x Rx+=asr(Rs,Rt)
1 1 0 0 1 1 0 0 1 1 - s s s s s P P - t t t t t 0 1 - x x x x x Rx+=lsr(Rs,Rt)
1 1 0 0 1 1 0 0 1 1 - s s s s s P P - t t t t t 1 0 - x x x x x Rx+=asl(Rs,Rt)
1 1 0 0 1 1 0 0 1 1 - s s s s s P P - t t t t t 1 1 - x x x x x Rx+=lsl(Rs,Rt)

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
s5 Field to encode register s
t5 Field to encode register t
x5 Field to encode register x
Maj Major opcode
Min Minor opcode
RegType Register type

80-N2040-53 Rev. AA 606


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Shift by register and logical


The shift amount is the least significant seven bits of Rt, treated as a two's complement value.
When the shift amount is negative (bit 6 of Rt is set), reverse the direction of the shift indicated in
the opcode.
Shift the source register value right or left based on the shift amount and the type of instruction.
Arithmetic right shifts place the sign bit of the source value in the vacated positions. Logical right
shifts place zeros in the vacated positions.
The shift operation is always performed as a 64-bit shift. When the Rs source register is a 32-bit
register, this register is first sign or zero-extended to 64-bits. Arithmetic shifts sign-extend the 32-
bit source to 64-bits, whereas logical shifts zero extend.
After shifting, take the logical AND or OR of the shifted amount and the destination register or
register pair, and place the result back in the destination register or register pair.
Saturation is not available for these instructions.

Rss # / Rt Rs # / Rt
64-bit shift value Shift amount 32-bit shift value Shift amount

64-bit shift 32-bit shift

64-bit AND/OR 32-bit AND/OR

64-bit result Rxx 32-bit result Rx

Syntax Behavior
Rx[&|]=asl(Rs,Rt) shamt=sxt7->32(Rt);
Rx = Rx [|&] (shamt>0)?(sxt32->64(Rs)<<shamt):(sxt32-
>64(Rs)>>shamt);

Rx[&|]=asr(Rs,Rt) shamt=sxt7->32(Rt);
Rx = Rx [|&] (shamt>0)?(sxt32->64(Rs)>>shamt):(sxt32-
>64(Rs)<<shamt);

Rx[&|]=lsl(Rs,Rt) shamt=sxt7->32(Rt);
Rx = Rx [|&] (shamt>0)?(zxt32->64(Rs)<<shamt):(zxt32-
>64(Rs)>>>shamt);

Rx[&|]=lsr(Rs,Rt) shamt=sxt7->32(Rt);
Rx = Rx [|&] (shamt>0)?(zxt32->64(Rs)>>>shamt):(zxt32-
>64(Rs)<<shamt);

80-N2040-53 Rev. AA 607


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Syntax Behavior
Rxx[&|]=asl(Rss,Rt shamt=sxt7->32(Rt);
) Rxx = Rxx [|&] (shamt>0)?(Rss<<shamt):(Rss>>shamt);
Rxx[&|]=asr(Rss,Rt shamt=sxt7->32(Rt);
) Rxx = Rxx [|&] (shamt>0)?(Rss>>shamt):(Rss<<shamt);
Rxx[&|]=lsl(Rss,Rt shamt=sxt7->32(Rt);
) Rxx = Rxx [|&] (shamt>0)?(Rss<<shamt):(Rss>>>shamt);
Rxx[&|]=lsr(Rss,Rt shamt=sxt7->32(Rt);
) Rxx = Rxx [|&] (shamt>0)?(Rss>>>shamt):(Rss<<shamt);
Rxx^=asl(Rss,Rt) shamt=sxt7->32(Rt);
Rxx = Rxx ^ (shamt>0)?(Rss<<shamt):(Rss>>shamt);
Rxx^=asr(Rss,Rt) shamt=sxt7->32(Rt);
Rxx = Rxx ^ (shamt>0)?(Rss>>shamt):(Rss<<shamt);
Rxx^=lsl(Rss,Rt) shamt=sxt7->32(Rt);
Rxx = Rxx ^ (shamt>0)?(Rss<<shamt):(Rss>>>shamt);
Rxx^=lsr(Rss,Rt) shamt=sxt7->32(Rt);
Rxx = Rxx ^ (shamt>0)?(Rss>>>shamt):(Rss<<shamt);

Class: XTYPE (slots 2,3)

Intrinsics

Rx&=asl(Rs,Rt) Word32 Q6_R_asland_RR(Word32 Rx, Word32 Rs, Word32 Rt)


Rx&=asr(Rs,Rt) Word32 Q6_R_asrand_RR(Word32 Rx, Word32 Rs, Word32 Rt)
Rx&=lsl(Rs,Rt) Word32 Q6_R_lsland_RR(Word32 Rx, Word32 Rs, Word32 Rt)
Rx&=lsr(Rs,Rt) Word32 Q6_R_lsrand_RR(Word32 Rx, Word32 Rs, Word32 Rt)
Rx|=asl(Rs,Rt) Word32 Q6_R_aslor_RR(Word32 Rx, Word32 Rs, Word32 Rt)
Rx|=asr(Rs,Rt) Word32 Q6_R_asror_RR(Word32 Rx, Word32 Rs, Word32 Rt)
Rx|=lsl(Rs,Rt) Word32 Q6_R_lslor_RR(Word32 Rx, Word32 Rs, Word32 Rt)
Rx|=lsr(Rs,Rt) Word32 Q6_R_lsror_RR(Word32 Rx, Word32 Rs, Word32 Rt)
Rxx&=asl(Rss,Rt) Word64 Q6_P_asland_PR(Word64 Rxx, Word64 Rss, Word32 Rt)
Rxx&=asr(Rss,Rt) Word64 Q6_P_asrand_PR(Word64 Rxx, Word64 Rss, Word32 Rt)
Rxx&=lsl(Rss,Rt) Word64 Q6_P_lsland_PR(Word64 Rxx, Word64 Rss, Word32 Rt)
Rxx&=lsr(Rss,Rt) Word64 Q6_P_lsrand_PR(Word64 Rxx, Word64 Rss, Word32 Rt)
Rxx^=asl(Rss,Rt) Word64 Q6_P_aslxacc_PR(Word64 Rxx, Word64 Rss, Word32 Rt)
Rxx^=asr(Rss,Rt) Word64 Q6_P_asrxacc_PR(Word64 Rxx, Word64 Rss, Word32 Rt)
Rxx^=lsl(Rss,Rt) Word64 Q6_P_lslxacc_PR(Word64 Rxx, Word64 Rss, Word32 Rt)

80-N2040-53 Rev. AA 608


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Rxx^=lsr(Rss,Rt) Word64 Q6_P_lsrxacc_PR(Word64 Rxx, Word64 Rss, Word32 Rt)


Rxx|=asl(Rss,Rt) Word64 Q6_P_aslor_PR(Word64 Rxx, Word64 Rss, Word32 Rt)
Rxx|=asr(Rss,Rt) Word64 Q6_P_asror_PR(Word64 Rxx, Word64 Rss, Word32 Rt)
Rxx|=lsl(Rss,Rt) Word64 Q6_P_lslor_PR(Word64 Rxx, Word64 Rss, Word32 Rt)
Rxx|=lsr(Rss,Rt) Word64 Q6_P_lsror_PR(Word64 Rxx, Word64 Rss, Word32 Rt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType Maj s5 Parse t5 Min x5
1 1 0 0 1 0 1 1 0 0 0 s s s s s P P - t t t t t 0 0 - x x x x x Rxx|=asr(Rss,Rt)
1 1 0 0 1 0 1 1 0 0 0 s s s s s P P - t t t t t 0 1 - x x x x x Rxx|=lsr(Rss,Rt)
1 1 0 0 1 0 1 1 0 0 0 s s s s s P P - t t t t t 1 0 - x x x x x Rxx|=asl(Rss,Rt)
1 1 0 0 1 0 1 1 0 0 0 s s s s s P P - t t t t t 1 1 - x x x x x Rxx|=lsl(Rss,Rt)
1 1 0 0 1 0 1 1 0 1 0 s s s s s P P - t t t t t 0 0 - x x x x x Rxx&=asr(Rss,Rt)
1 1 0 0 1 0 1 1 0 1 0 s s s s s P P - t t t t t 0 1 - x x x x x Rxx&=lsr(Rss,Rt)
1 1 0 0 1 0 1 1 0 1 0 s s s s s P P - t t t t t 1 0 - x x x x x Rxx&=asl(Rss,Rt)
1 1 0 0 1 0 1 1 0 1 0 s s s s s P P - t t t t t 1 1 - x x x x x Rxx&=lsl(Rss,Rt)
1 1 0 0 1 0 1 1 0 1 1 s s s s s P P - t t t t t 0 0 - x x x x x Rxx^=asr(Rss,Rt)
1 1 0 0 1 0 1 1 0 1 1 s s s s s P P - t t t t t 0 1 - x x x x x Rxx^=lsr(Rss,Rt)
1 1 0 0 1 0 1 1 0 1 1 s s s s s P P - t t t t t 1 0 - x x x x x Rxx^=asl(Rss,Rt)
1 1 0 0 1 0 1 1 0 1 1 s s s s s P P - t t t t t 1 1 - x x x x x Rxx^=lsl(Rss,Rt)
1 1 0 0 1 1 0 0 0 0 - s s s s s P P - t t t t t 0 0 - x x x x x Rx|=asr(Rs,Rt)
1 1 0 0 1 1 0 0 0 0 - s s s s s P P - t t t t t 0 1 - x x x x x Rx|=lsr(Rs,Rt)
1 1 0 0 1 1 0 0 0 0 - s s s s s P P - t t t t t 1 0 - x x x x x Rx|=asl(Rs,Rt)
1 1 0 0 1 1 0 0 0 0 - s s s s s P P - t t t t t 1 1 - x x x x x Rx|=lsl(Rs,Rt)
1 1 0 0 1 1 0 0 0 1 - s s s s s P P - t t t t t 0 0 - x x x x x Rx&=asr(Rs,Rt)
1 1 0 0 1 1 0 0 0 1 - s s s s s P P - t t t t t 0 1 - x x x x x Rx&=lsr(Rs,Rt)
1 1 0 0 1 1 0 0 0 1 - s s s s s P P - t t t t t 1 0 - x x x x x Rx&=asl(Rs,Rt)
1 1 0 0 1 1 0 0 0 1 - s s s s s P P - t t t t t 1 1 - x x x x x Rx&=lsl(Rs,Rt)

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
s5 Field to encode register s
t5 Field to encode register t
x5 Field to encode register x
Maj Major opcode
Min Minor opcode
RegType Register type

80-N2040-53 Rev. AA 609


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Shift by register with saturation


The shift amount is the least significant seven bits of Rt, treated as a two's complement value.
When the shift amount is negative (bit 6 of Rt is set), reverse the direction of the shift indicated in
the opcode.
Saturation is available for 32-bit arithmetic left shifts: either an ASL instruction with positive Rt, or
an ASR instruction with negative Rt. Saturation works by first sign-extending the 32-bit Rs register
to 64 bits. It is then shifted by the shift amount. If this 64-bit value cannot fit in a signed 32-bit
number (the upper word is not the sign-extension of bit 31), aturation is performed based on the
sign of the original value. Saturation clamps the 32-bit result to the range 0x80000000 to
0x7fffffff.
Syntax Behavior
Rd=asl(Rs,Rt):sat shamt=sxt7->32(Rt);
Rd = bidir_shiftl(Rs,shamt);
Rd=asr(Rs,Rt):sat shamt=sxt7->32(Rt);
Rd = bidir_shiftr(Rs,shamt);

Class: XTYPE (slots 2,3)

Notes
■ If saturation occurs during execution of this instruction (a result is clamped to either
maximum or minimum values), the OVF bit in the status register is set. OVF remains set until
explicitly cleared by a transfer to the status register.

Intrinsics

Rd=asl(Rs,Rt):sat Word32 Q6_R_asl_RR_sat(Word32 Rs, Word32 Rt)


Rd=asr(Rs,Rt):sat Word32 Q6_R_asr_RR_sat(Word32 Rs, Word32 Rt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType Maj s5 Parse t5 Min d5
1 1 0 0 0 1 1 0 0 0 - s s s s s P P - t t t t t 0 0 - d d d d d Rd=asr(Rs,Rt):sat
1 1 0 0 0 1 1 0 0 0 - s s s s s P P - t t t t t 1 0 - d d d d d Rd=asl(Rs,Rt):sat

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
Maj Major opcode
Min Minor opcode
RegType Register type

80-N2040-53 Rev. AA 610


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector shift halfwords by immediate


Shift individual halfwords of the source vector. Arithmetic right shifts place the sign bit of the
source values in the vacated positions. Logical right shifts place zeros in the vacated positions.

Rdd = vaslh(Rss,#) Shift amount Rt/#u4

lost lost lost lost Rss

0 0 0 0
Rdd

Syntax Behavior
Rdd=vaslh(Rss,#u4) for (i=0;i<4;i++) {
Rdd.h[i]=(Rss.h[i]<<#u);
}
Rdd=vasrh(Rss,#u4) for (i=0;i<4;i++) {
Rdd.h[i]=(Rss.h[i]>>#u);
}
Rdd=vlsrh(Rss,#u4) for (i=0;i<4;i++) {
Rdd.h[i]=(Rss.uh[i]>>#u);
}

Class: XTYPE (slots 2,3)

Intrinsics

Rdd=vaslh(Rss,#u4) Word64 Q6_P_vaslh_PI(Word64 Rss, Word32 Iu4)


Rdd=vasrh(Rss,#u4) Word64 Q6_P_vasrh_PI(Word64 Rss, Word32 Iu4)
Rdd=vlsrh(Rss,#u4) Word64 Q6_P_vlsrh_PI(Word64 Rss, Word32 Iu4)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse MinOp d5
1 0 0 0 0 0 0 0 1 0 0 s s s s s P P 0 0 i i i i 0 0 0 d d d d d Rdd=vasrh(Rss,#u4)
1 0 0 0 0 0 0 0 1 0 0 s s s s s P P 0 0 i i i i 0 0 1 d d d d d Rdd=vlsrh(Rss,#u4)
1 0 0 0 0 0 0 0 1 0 0 s s s s s P P 0 0 i i i i 0 1 0 d d d d d Rdd=vaslh(Rss,#u4)

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
MajOp Major opcode
MinOp Minor opcode
RegType Register type

80-N2040-53 Rev. AA 611


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector arithmetic shift halfwords with round


For each halfword in the vector, round then arithmetic shift right by an immediate amount. The
results are stored in the destination register.
Rdd = vasrh(Rss,#u):rnd

1<<(#u-1) 1<<(#u-1) 1<<(#u-1) 1<<(#u-1)

+ + + +
lost lost lost lost

Sign- Sign- Sign- Sign-


ext ext ext ext

Syntax Behavior
Rdd=vasrh(Rss,#u4):raw for (i=0;i<4;i++) {
Rdd.h[i]=( ((Rss.h[i] >> #u)+1)>>1 );
}
Rdd=vasrh(Rss,#u4):rnd if ("#u4==0") {
Assembler mapped to: "Rdd=Rss";
} else {
Assembler mapped to: "Rdd=vasrh(Rss,#u4-1):raw";
}

Class: XTYPE (slots 2,3)

Intrinsics

Rdd=vasrh(Rss,#u4):rnd Word64 Q6_P_vasrh_PI_rnd(Word64 Rss, Word32 Iu4)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse MinOp d5
1 0 0 0 0 0 0 0 0 0 1 s s s s s P P 0 0 i i i i 0 0 0 d d d d d Rdd=vasrh(Rss,#u4):raw

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
MajOp Major opcode
MinOp Minor opcode
RegType Register type

80-N2040-53 Rev. AA 612


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector arithmetic shift halfwords with saturate and pack


For each halfword in the vector, optionally round, then arithmetic shift right by an immediate
amount. Saturate the results to unsigned [0-255] and then pack in the destination register.
Rd = vasrhub(Rss,#u):rnd:sat

1<<(#u-1) 1<<(#u-1) 1<<(#u-1) 1<<(#u-1)

+ + + +
lost lost lost lost

Sat_u8 Sat_u8 Sat_u8 Sat_u8

Syntax Behavior
Rd=vasrhub(Rss,#u4):raw for (i=0;i<4;i++) {
Rd.b[i]=usat8(((Rss.h[i] >> #u )+1)>>1);
}
Rd=vasrhub(Rss,#u4):rnd:sa if ("#u4==0") {
t Assembler mapped to: "Rd=vsathub(Rss)";
} else {
Assembler mapped to: "Rd=vasrhub(Rss,#u4-
1):raw";
}
Rd=vasrhub(Rss,#u4):sat for (i=0;i<4;i++) {
Rd.b[i]=usat8(Rss.h[i] >> #u);
}

Class: XTYPE (slots 2,3)

Notes

■ If saturation occurs during execution of this instruction (a result is clamped to either


maximum or minimum values), the OVF bit in the status register is set. OVF remains set until
explicitly cleared by a transfer to the status register.

80-N2040-53 Rev. AA 613


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Intrinsics

Rd=vasrhub(Rss,#u4):rnd:sat Word32 Q6_R_vasrhub_PI_rnd_sat(Word64 Rss,


Word32 Iu4)
Rd=vasrhub(Rss,#u4):sat Word32 Q6_R_vasrhub_PI_sat(Word64 Rss, Word32
Iu4)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse MinOp d5
1 0 0 0 1 0 0 0 0 1 1 s s s s s P P 0 0 i i i i 1 0 0 d d d d d Rd=vasrhub(Rss,#u4):raw
1 0 0 0 1 0 0 0 0 1 1 s s s s s P P 0 0 i i i i 1 0 1 d d d d d Rd=vasrhub(Rss,#u4):sat

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
MajOp Major opcode
MinOp Minor opcode
RegType Register type

80-N2040-53 Rev. AA 614


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector shift halfwords by register


The shift amount is the least significant seven bits of Rt, treated as a two's complement value. If
the shift amount is negative, reverse the direction of the shift. Shift the source values right or left
based on the shift amount and the type of instruction. Arithmetic right shifts place the sign bit of
the source value in the vacated positions. Logical right shifts place zeros in the vacated positions.

Rdd = vaslh(Rss,#) Shift amount Rt/#u4

lost lost lost lost Rss

0 0 0 0
Rdd

Syntax Behavior
Rdd=vaslh(Rss,Rt for (i=0;i<4;i++) {
) Rdd.h[i]=(sxt7->32(Rt)>0)?(sxt16->64(Rss.h[i])<<sxt7-
>32 (Rt)):(sxt 16->64(Rss.h[i])>>sxt7->32(Rt));
}
Rdd=vasrh(Rss,Rt for (i=0;i<4;i++) {
) Rdd.h[i]=(sxt7->32(Rt)>0)?(sxt16->64(Rss.h[i])>>sxt7-
>32 (Rt)):(sxt 16->64(Rss.h[i])<<sxt7->32(Rt));
}
Rdd=vlslh(Rss,Rt for (i=0;i<4;i++) {
) Rdd.h[i]=(sxt7->32(Rt)>0)?(zxt16->64(Rss.uh[i])<<sxt7-
>32(Rt)):(zxt16->64(Rss.uh[i])>>>sxt7->32(Rt));
}
Rdd=vlsrh(Rss,Rt for (i=0;i<4;i++) {
) Rdd.h[i]=(sxt7->32(Rt)>0)?(zxt16->64(Rss.uh[i])>>>sxt7-
>32 (Rt)):(zxt 16->64(Rss.uh[i])<<sxt7->32(Rt));
}

Class: XTYPE (slots 2,3)

Notes

■ If the number of bits to shift is greater than the width of the vector element, the result is
either all sign-bits (for arithmetic right shifts) or all zeros for logical and left shifts.

80-N2040-53 Rev. AA 615


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Intrinsics

Rdd=vaslh(Rss,Rt) Word64 Q6_P_vaslh_PR(Word64 Rss, Word32 Rt)


Rdd=vasrh(Rss,Rt) Word64 Q6_P_vasrh_PR(Word64 Rss, Word32 Rt)
Rdd=vlslh(Rss,Rt) Word64 Q6_P_vlslh_PR(Word64 Rss, Word32 Rt)
Rdd=vlsrh(Rss,Rt) Word64 Q6_P_vlsrh_PR(Word64 Rss, Word32 Rt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType Maj s5 Parse t5 Min d5
1 1 0 0 0 0 1 1 0 1 - s s s s s P P - t t t t t 0 0 - d d d d d Rdd=vasrh(Rss,Rt)
1 1 0 0 0 0 1 1 0 1 - s s s s s P P - t t t t t 0 1 - d d d d d Rdd=vlsrh(Rss,Rt)
1 1 0 0 0 0 1 1 0 1 - s s s s s P P - t t t t t 1 0 - d d d d d Rdd=vaslh(Rss,Rt)
1 1 0 0 0 0 1 1 0 1 - s s s s s P P - t t t t t 1 1 - d d d d d Rdd=vlslh(Rss,Rt)

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
Maj Major opcode
Min Minor opcode
RegType Register type

80-N2040-53 Rev. AA 616


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector shift words by immediate


Shift individual words of the source vector. Arithmetic right shifts place the sign bit of the source
values in the vacated positions. Logical right shifts place zeros in the vacated positions.
Rdd = vaslw(Rss,{Rt/#}) Shift amount Rt/#u5

lost lost Rss

0 0 0 Rdd

Syntax Behavior
Rdd=vaslw(Rss,#u5) for (i=0;i<2;i++) {
Rdd.w[i]=(Rss.w[i]<<#u);
}
Rdd=vasrw(Rss,#u5) for (i=0;i<2;i++) {
Rdd.w[i]=(Rss.w[i]>>#u);
}
Rdd=vlsrw(Rss,#u5) for (i=0;i<2;i++) {
Rdd.w[i]=(Rss.uw[i]>>#u);
}

Class: XTYPE (slots 2,3)

Intrinsics

Rdd=vaslw(Rss,#u5) Word64 Q6_P_vaslw_PI(Word64 Rss, Word32 Iu5)


Rdd=vasrw(Rss,#u5) Word64 Q6_P_vasrw_PI(Word64 Rss, Word32 Iu5)
Rdd=vlsrw(Rss,#u5) Word64 Q6_P_vlsrw_PI(Word64 Rss, Word32 Iu5)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse MinOp d5
1 0 0 0 0 0 0 0 0 1 0 s s s s s P P 0 i i i i i 0 0 0 d d d d d Rdd=vasrw(Rss,#u5)
1 0 0 0 0 0 0 0 0 1 0 s s s s s P P 0 i i i i i 0 0 1 d d d d d Rdd=vlsrw(Rss,#u5)
1 0 0 0 0 0 0 0 0 1 0 s s s s s P P 0 i i i i i 0 1 0 d d d d d Rdd=vaslw(Rss,#u5)

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
MajOp Major opcode
MinOp Minor opcode
RegType Register type

80-N2040-53 Rev. AA 617


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector shift words by register


The shift amount is the least significant seven bits of Rt, treated as a two's complement value. If
the shift amount is negative, reverse the direction of the shift. Shift the source values right or left
based on the shift amount and the type of instruction. Arithmetic right shifts place the sign bit of
the source value in the vacated positions. Logical right shifts place zeros in the vacated positions.

Rdd = vaslw(Rss,{Rt/#}) Shift amount Rt/#u5

lost lost Rss

0 0 0 Rdd

Syntax Behavior
Rdd=vaslw(Rss,Rt) for (i=0;i<2;i++) {
Rdd.w[i]=(sxt7->32(Rt)>0)?(sxt32->64(Rss.w[i])<<sxt7-
>32(Rt)):(sxt32->64(Rss.w[i])>>sxt7->32(Rt));
}
Rdd=vasrw(Rss,Rt) for (i=0;i<2;i++) {
Rdd.w[i]=(sxt7->32(Rt)>0)?(sxt32->64(Rss.w[i])>>sxt7-
>32 (Rt)):(sxt 32->64(Rss.w[i])<<sxt7->32(Rt));
}
Rdd=vlslw(Rss,Rt) for (i=0;i<2;i++) {
Rdd.w[i]=(sxt7->32(Rt)>0)?(zxt32->64(Rss.uw[i])<<sxt7-
>32(Rt)):(zxt32->64(Rss.uw[i])>>>sxt7->32(Rt));
}
Rdd=vlsrw(Rss,Rt) for (i=0;i<2;i++) {
Rdd.w[i]=(sxt7->32(Rt)>0)?(zxt32->64(Rss.uw[i])>>>sxt7-
>32(Rt)):(zxt32->64(Rss.uw[i])<<sxt7->32(Rt));
}

Class: XTYPE (slots 2,3)

Notes

■ If the number of bits to shift is greater than the width of the vector element, the result is
either all sign-bits (for arithmetic right shifts) or all zeros for logical and left shifts.

Intrinsics

Rdd=vaslw(Rss,Rt) Word64 Q6_P_vaslw_PR(Word64 Rss, Word32 Rt)


Rdd=vasrw(Rss,Rt) Word64 Q6_P_vasrw_PR(Word64 Rss, Word32 Rt)
Rdd=vlslw(Rss,Rt) Word64 Q6_P_vlslw_PR(Word64 Rss, Word32 Rt)
Rdd=vlsrw(Rss,Rt) Word64 Q6_P_vlsrw_PR(Word64 Rss, Word32 Rt)

80-N2040-53 Rev. AA 618


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType Maj s5 Parse t5 Min d5
1 1 0 0 0 0 1 1 0 0 - s s s s s P P - t t t t t 0 0 - d d d d d Rdd=vasrw(Rss,Rt)
1 1 0 0 0 0 1 1 0 0 - s s s s s P P - t t t t t 0 1 - d d d d d Rdd=vlsrw(Rss,Rt)
1 1 0 0 0 0 1 1 0 0 - s s s s s P P - t t t t t 1 0 - d d d d d Rdd=vaslw(Rss,Rt)
1 1 0 0 0 0 1 1 0 0 - s s s s s P P - t t t t t 1 1 - d d d d d Rdd=vlslw(Rss,Rt)

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
Maj Major opcode
Min Minor opcode
RegType Register type

80-N2040-53 Rev. AA 619


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Vector shift words with truncate and pack


Shift individual words of the source vector Rss right by a register or immediate amount. The low
16-bits of each word are packed into destination register Rd.

Rd = vasrw(Rss,{Rt/#}) Shift amount Rt/#u5

lost lost Rss

sxt sxt

Low 16 bits Low 16 bits

Rd

Syntax Behavior
Rd=vasrw(Rss,#u5 for (i=0;i<2;i++) {
) Rd.h[i]=(Rss.w[i]>>#u).h[0];
}
Rd=vasrw(Rss,Rt) for (i=0;i<2;i++) {
Rd.h[i]=(sxt7->32(Rt)>0)?(sxt32->64(Rss.w[i])>>sxt7-
>32(Rt)):(sxt32->64(Rss.w[i])<<sxt7->32(Rt)).h[0];
}

Class: XTYPE (slots 2,3)

Intrinsics

Rd=vasrw(Rss,#u5) Word32 Q6_R_vasrw_PI(Word64 Rss, Word32


Iu5)
Rd=vasrw(Rss,Rt) Word32 Q6_R_vasrw_PR(Word64 Rss, Word32 Rt)

Encoding

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ICLASS RegType MajOp s5 Parse MinOp d5
1 0 0 0 1 0 0 0 1 1 0 s s s s s P P 0 i i i i i 0 1 0 d d d d d Rd=vasrw(Rss,#u5)
ICLASS RegType s5 Parse t5 Min d5
1 1 0 0 0 1 0 1 - - - s s s s s P P - t t t t t 0 1 0 d d d d d Rd=vasrw(Rss,Rt)

80-N2040-53 Rev. AA 620


Qualcomm Hexagon V73 Programmer’s Reference Manual Instruction Set

Field name Description


ICLASS Instruction class
Parse Packet/loop parse bits
d5 Field to encode register d
s5 Field to encode register s
t5 Field to encode register t
MajOp Major opcode
MinOp Minor opcode
Min Minor opcode
RegType Register type

80-N2040-53 Rev. AA 621


Instruction Index
A
abs
Rd=abs(Rs)[:sat] 334
Rdd=abs(Rss) 333

add
if ([!]Pu[.new]) Rd=add(Rs,#s8) 178
if ([!]Pu[.new]) Rd=add(Rs,Rt) 178
Rd=add(#u6,mpyi(Rs,#U6)) 484
Rd=add(#u6,mpyi(Rs,Rt)) 484
Rd=add(Rs,#s16) 156
Rd=add(Rs,add(Ru,#s6)) 335
Rd=add(Rs,Rt) 156
Rd=add(Rs,Rt):sat 156
Rd=add(Rs,Rt):sat:deprecated 337
Rd=add(Rt.[HL],Rs.[HL])[:sat]:<<16 339
Rd=add(Rt.L,Rs.[HL])[:sat] 339
Rd=add(Ru,mpyi(#u6:2,Rs)) 484
Rd=add(Ru,mpyi(Rs,#u6)) 484
Rdd=add(Rs,Rtt) 337
Rdd=add(Rss,Rtt,Px):carry 341
Rdd=add(Rss,Rtt) 337
Rdd=add(Rss,Rtt):raw:hi 337
Rdd=add(Rss,Rtt):raw:lo 337
Rdd=add(Rss,Rtt):sat 337
Rx+=add(Rs,#s8) 335
Rx+=add(Rs,Rt) 335
Rx-=add(Rs,#s8) 335
Rx-=add(Rs,Rt) 335
Ry=add(Ru,mpyi(Ry,Rs)) 485

addasl
Rd=addasl(Rt,Rs,#u3) 594

all8
Pd=all8(Ps) 197

allocframe
allocframe(#u11:3) 312
allocframe(Rx,#u11:3):raw 312

and
if ([!]Pu[.new]) Rd=and(Rs,Rt) 183
Pd=and(Ps,and(Pt,[!]Pu)) 203
Pd=and(Pt,[!]Ps) 203
Rd=and(Rs,#s10) 158
Rd=and(Rs,Rt) 158
Rd=and(Rt,~Rs) 158
Rdd=and(Rss,Rtt) 343
Rdd=and(Rtt,~Rss) 343
Rx[&|^]=and(Rs,~Rt) 346
Rx[&|^]=and(Rs,Rt) 346
Rx|=and(Rs,#s10) 346

any8
Pd=any8(Ps) 197

80-N2040-53 Rev. AA 622


Qualcomm Hexagon V73 Programmer’s Reference Manual

asl
Rd=asl(Rs,#u5) 589
Rd=asl(Rs,#u5):sat 600
Rd=asl(Rs,Rt) 601
Rd=asl(Rs,Rt):sat 610
Rdd=asl(Rss,#u6) 589
Rdd=asl(Rss,Rt) 602
Rx[&|]=asl(Rs,#u5) 595
Rx[&|]=asl(Rs,Rt) 607
Rx[+-]=asl(Rs,#u5) 591
Rx[+-]=asl(Rs,Rt) 604
Rx^=asl(Rs,#u5) 595
Rx=add(#u8,asl(Rx,#U5)) 591
Rx=and(#u8,asl(Rx,#U5)) 595
Rx=or(#u8,asl(Rx,#U5)) 595
Rx=sub(#u8,asl(Rx,#U5)) 591
Rxx[&|]=asl(Rss,#u6) 595
Rxx[&|]=asl(Rss,Rt) 608
Rxx[+-]=asl(Rss,#u6) 591
Rxx[+-]=asl(Rss,Rt) 604
Rxx^=asl(Rss,#u6) 596
Rxx^=asl(Rss,Rt) 608

aslh
if ([!]Pu[.new]) Rd=aslh(Rs) 180
Rd=aslh(Rs) 176

asr
Rd=asr(Rs,#u5) 589
Rd=asr(Rs,#u5):rnd 598
Rd=asr(Rs,Rt) 601
Rd=asr(Rs,Rt):sat 610
Rdd=asr(Rss,#u6) 589
Rdd=asr(Rss,#u6):rnd 598
Rdd=asr(Rss,Rt) 602
Rx[&|]=asr(Rs,#u5) 595
Rx[&|]=asr(Rs,Rt) 607
Rx[+-]=asr(Rs,#u5) 591
Rx[+-]=asr(Rs,Rt) 604
Rxx[&|]=asr(Rss,#u6) 596
Rxx[&|]=asr(Rss,Rt) 608
Rxx[+-]=asr(Rss,#u6) 591
Rxx[+-]=asr(Rss,Rt) 605
Rxx^=asr(Rss,Rt) 608

asrh
if ([!]Pu[.new]) Rd=asrh(Rs) 180
Rd=asrh(Rs) 176

asrrnd
Rd=asrrnd(Rs,#u5) 598
Rdd=asrrnd(Rss,#u6) 598

B
barrier
barrier 317

bitsclr
Pd=[!]bitsclr(Rs,#u6) 574
Pd=[!]bitsclr(Rs,Rt) 574

80-N2040-53 Rev. AA 623


Qualcomm Hexagon V73 Programmer’s Reference Manual

bitsplit
Rdd=bitsplit(Rs,#u5) 424
Rdd=bitsplit(Rs,Rt) 424

bitsset
Pd=[!]bitsset(Rs,Rt) 574

boundscheck
Pd=boundscheck(Rs,Rtt) 567
Pd=boundscheck(Rss,Rtt):raw:hi 567
Pd=boundscheck(Rss,Rtt):raw:lo 567

brev
Rd=brev(Rs) 421
Rdd=brev(Rss) 421

brkpt
brkpt 318

C
call
call #r22:2 211
if ([!]Pu) call #r15:2 211

callr
callr Rs 206
if ([!]Pu) callr Rs 206

callrh
callrh Rs 206, 207

cl0
Rd=cl0(Rs) 409
Rd=cl0(Rss) 409

cl1
Rd=cl1(Rs) 409
Rd=cl1(Rss) 409

clb
Rd=add(clb(Rs),#s6) 409
Rd=add(clb(Rss),#s6) 409
Rd=clb(Rs) 409
Rd=clb(Rss) 409

clip
Rd=clip(Rs,#u5) 342

clrbit
memb(Rs+#u6:0)=clrbit(#U5) 268
memh(Rs+#u6:1)=clrbit(#U5) 270
memw(Rs+#u6:2)=clrbit(#U5) 271
Rd=clrbit(Rs,#u5) 422
Rd=clrbit(Rs,Rt) 422

80-N2040-53 Rev. AA 624


Qualcomm Hexagon V73 Programmer’s Reference Manual

cmp.eq
if ([!]cmp.eq(Ns.new,#-1)) jump:<hint> #r9:2 272
if ([!]cmp.eq(Ns.new,#U5)) jump:<hint> #r9:2 272
if ([!]cmp.eq(Ns.new,Rt)) jump:<hint> #r9:2 272
p[01]=cmp.eq(Rs,#-1) 213
p[01]=cmp.eq(Rs,#U5) 213
p[01]=cmp.eq(Rs,Rt) 213
Pd=[!]cmp.eq(Rs,#s10) 191
Pd=[!]cmp.eq(Rs,Rt) 191
Pd=cmp.eq(Rss,Rtt) 573
Rd=[!]cmp.eq(Rs,#s8) 193
Rd=[!]cmp.eq(Rs,Rt) 193

cmp.ge
Pd=cmp.ge(Rs,#s8) 191

cmp.geu
Pd=cmp.geu(Rs,#u8) 191

cmp.gt
if ([!]cmp.gt(Ns.new,#-1)) jump:<hint> #r9:2 272
if ([!]cmp.gt(Ns.new,#U5)) jump:<hint> #r9:2 272
if ([!]cmp.gt(Ns.new,Rt)) jump:<hint> #r9:2 272
if ([!]cmp.gt(Rt,Ns.new)) jump:<hint> #r9:2 273
p[01]=cmp.gt(Rs,#-1) 213
p[01]=cmp.gt(Rs,#U5) 213
p[01]=cmp.gt(Rs,Rt) 213
Pd=[!]cmp.gt(Rs,#s10) 191
Pd=[!]cmp.gt(Rs,Rt) 191
Pd=cmp.gt(Rss,Rtt) 573

cmp.gtu
if ([!]cmp.gtu(Ns.new,#U5)) jump:<hint> #r9:2 273
if ([!]cmp.gtu(Ns.new,Rt)) jump:<hint> #r9:2 273
if ([!]cmp.gtu(Rt,Ns.new)) jump:<hint> #r9:2 273
p[01]=cmp.gtu(Rs,#U5) 214
p[01]=cmp.gtu(Rs,Rt) 214
Pd=[!]cmp.gtu(Rs,#u9) 191
Pd=[!]cmp.gtu(Rs,Rt) 191
Pd=cmp.gtu(Rss,Rtt) 573

cmp.lt
Pd=cmp.lt(Rs,Rt) 191

cmp.ltu
Pd=cmp.ltu(Rs,Rt) 191

cmpb.eq
Pd=cmpb.eq(Rs,#u8) 569
Pd=cmpb.eq(Rs,Rt) 569

cmpb.gt
Pd=cmpb.gt(Rs,#s8) 569
Pd=cmpb.gt(Rs,Rt) 569

cmpb.gtu
Pd=cmpb.gtu(Rs,#u7) 569
Pd=cmpb.gtu(Rs,Rt) 569

cmph.eq
Pd=cmph.eq(Rs,#s8) 571
Pd=cmph.eq(Rs,Rt) 571

80-N2040-53 Rev. AA 625


Qualcomm Hexagon V73 Programmer’s Reference Manual

cmph.gt
Pd=cmph.gt(Rs,#s8) 571
Pd=cmph.gt(Rs,Rt) 571

cmph.gtu
Pd=cmph.gtu(Rs,#u7) 571
Pd=cmph.gtu(Rs,Rt) 571

cmpy
Rd=cmpy(Rs,Rt)[:<<1]:rnd:sat 439
Rd=cmpy(Rs,Rt*)[:<<1]:rnd:sat 439
Rdd=cmpy(Rs,Rt)[:<<1]:sat 433
Rdd=cmpy(Rs,Rt*)[:<<1]:sat 433
Rxx+=cmpy(Rs,Rt)[:<<1]:sat 434
Rxx+=cmpy(Rs,Rt*)[:<<1]:sat 434
Rxx-=cmpy(Rs,Rt)[:<<1]:sat 434
Rxx-=cmpy(Rs,Rt*)[:<<1]:sat 434

cmpyi
Rdd=cmpyi(Rs,Rt) 437
Rxx+=cmpyi(Rs,Rt) 437

cmpyiw
Rd=cmpyiw(Rss,Rtt):<<1:rnd:sat 443
Rd=cmpyiw(Rss,Rtt):<<1:sat 443
Rd=cmpyiw(Rss,Rtt*):<<1:rnd:sat 443
Rd=cmpyiw(Rss,Rtt*):<<1:sat 443
Rdd=cmpyiw(Rss,Rtt) 444
Rdd=cmpyiw(Rss,Rtt*) 444
Rxx+=cmpyiw(Rss,Rtt) 444
Rxx+=cmpyiw(Rss,Rtt*) 444

cmpyiwh
Rd=cmpyiwh(Rss,Rt):<<1:rnd:sat 441
Rd=cmpyiwh(Rss,Rt*):<<1:rnd:sat 441

cmpyr
Rdd=cmpyr(Rs,Rt) 437
Rxx+=cmpyr(Rs,Rt) 437

cmpyrw
Rd=cmpyrw(Rss,Rtt):<<1:rnd:sat 443
Rd=cmpyrw(Rss,Rtt):<<1:sat 443
Rd=cmpyrw(Rss,Rtt*):<<1:rnd:sat 444
Rd=cmpyrw(Rss,Rtt*):<<1:sat 444
Rdd=cmpyrw(Rss,Rtt) 444
Rdd=cmpyrw(Rss,Rtt*) 444
Rxx+=cmpyrw(Rss,Rtt) 444
Rxx+=cmpyrw(Rss,Rtt*) 444

cmpyrwh
Rd=cmpyrwh(Rss,Rt):<<1:rnd:sat 441
Rd=cmpyrwh(Rss,Rt*):<<1:rnd:sat 441

combine
if ([!]Pu[.new]) Rdd=combine(Rs,Rt) 182
Rd=combine(Rt.[HL],Rs.[HL]) 172
Rdd=combine(#s8,#S8) 172
Rdd=combine(#s8,#U6) 172
Rdd=combine(#s8,Rs) 172
Rdd=combine(Rs,#s8) 172
Rdd=combine(Rs,Rt) 173

80-N2040-53 Rev. AA 626


Qualcomm Hexagon V73 Programmer’s Reference Manual

convert_d2df
Rdd=convert_d2df(Rss) 467

convert_d2sf
Rd=convert_d2sf(Rss) 467

convert_df2d
Rdd=convert_df2d(Rss) 469
Rdd=convert_df2d(Rss):chop 469

convert_df2sf
Rd=convert_df2sf(Rss) 466

convert_df2ud
Rdd=convert_df2ud(Rss) 469
Rdd=convert_df2ud(Rss):chop 469

convert_df2uw
Rd=convert_df2uw(Rss) 469
Rd=convert_df2uw(Rss):chop 469

convert_df2w
Rd=convert_df2w(Rss) 469
Rd=convert_df2w(Rss):chop 469

convert_sf2d
Rdd=convert_sf2d(Rs) 469
Rdd=convert_sf2d(Rs):chop 469

convert_sf2df
Rdd=convert_sf2df(Rs) 466

convert_sf2ud
Rdd=convert_sf2ud(Rs) 469
Rdd=convert_sf2ud(Rs):chop 469

convert_sf2uw
Rd=convert_sf2uw(Rs) 469
Rd=convert_sf2uw(Rs):chop 469

convert_sf2w
Rd=convert_sf2w(Rs) 469
Rd=convert_sf2w(Rs):chop 469

convert_ud2df
Rdd=convert_ud2df(Rss) 467

convert_ud2sf
Rd=convert_ud2sf(Rss) 467

convert_uw2df
Rdd=convert_uw2df(Rs) 467

convert_uw2sf
Rd=convert_uw2sf(Rs) 467

convert_w2df
Rdd=convert_w2df(Rs) 467

convert_w2sf
Rd=convert_w2sf(Rs) 467

80-N2040-53 Rev. AA 627


Qualcomm Hexagon V73 Programmer’s Reference Manual

cround
Rd=cround(Rs,#u5) 355
Rd=cround(Rs,Rt) 355
Rdd=cround(Rss,#u6) 355
Rdd=cround(Rss,Rt) 356

ct0
Rd=ct0(Rs) 412
Rd=ct0(Rss) 412

ct1
Rd=ct1(Rs) 412
Rd=ct1(Rss) 412

D
dccleana
dccleana(Rs) 320

dccleaninva
dccleaninva(Rs) 320

dcfetch
dcfetch(Rs) 319
dcfetch(Rs+#u11:3) 319

dcinva
dcinva(Rs) 320

dczeroa
dczeroa(Rs) 316

dealloc_return
dealloc_return 258
if ([!]Pv.new) Rdd=dealloc_return(Rs):nt:raw 258
if ([!]Pv.new) Rdd=dealloc_return(Rs):t:raw 258
if ([!]Pv) dealloc_return 258
if ([!]Pv) Rdd=dealloc_return(Rs):raw 258
nt
if ([!]Pv.new) dealloc_return:nt 258
Rdd=dealloc_return(Rs):raw 258
t
if ([!]Pv.new) dealloc_return:t 258

deallocframe
deallocframe 256
Rdd=deallocframe(Rs):raw 256

decbin
Rdd=decbin(Rss,Rtt) 540

deinterleave
Rdd=deinterleave(Rss) 418

dfadd
Rdd=dfadd(Rss,Rtt) 461

dfclass
Pd=dfclass(Rss,#u5) 462

dfcmp.eq
Pd=dfcmp.eq(Rss,Rtt) 464

80-N2040-53 Rev. AA 628


Qualcomm Hexagon V73 Programmer’s Reference Manual

dfcmp.ge
Pd=dfcmp.ge(Rss,Rtt) 464

dfcmp.gt
Pd=dfcmp.gt(Rss,Rtt) 464

dfcmp.uo
Pd=dfcmp.uo(Rss,Rtt) 464

dfmake
Rdd=dfmake(#u10):neg 478
Rdd=dfmake(#u10):pos 478

dfmax
Rdd=dfmax(Rss,Rtt) 479

dfmin
Rdd=dfmin(Rss,Rtt) 480

dfmpyfix
Rdd=dfmpyfix(Rss,Rtt) 481

dfmpyhh
Rxx+=dfmpyhh(Rss,Rtt) 473

dfmpylh
Rxx+=dfmpylh(Rss,Rtt) 473

dfmpyll
Rdd=dfmpyll(Rss,Rtt) 481

dfsub
Rdd=dfsub(Rss,Rtt) 483

diag
diag(Rs) 322

diag0
diag0(Rss,Rtt) 322

diag1
diag1(Rss,Rtt) 322

dmsyncht
Rd=dmsyncht 329

E
endloop0
endloop0 194

endloop01
endloop01 194

endloop1
endloop1 194

extract
Rd=extract(Rs,#u5,#U5) 413
Rd=extract(Rs,Rtt) 413
Rdd=extract(Rss,#u6,#U6) 413
Rdd=extract(Rss,Rtt) 414

80-N2040-53 Rev. AA 629


Qualcomm Hexagon V73 Programmer’s Reference Manual

extractu
Rd=extractu(Rs,#u5,#U5) 413
Rd=extractu(Rs,Rtt) 413
Rdd=extractu(Rss,#u6,#U6) 414
Rdd=extractu(Rss,Rtt) 414

F
fastcorner9
Pd=[!]fastcorner9(Ps,Pt) 196

H
hintjr
hintjr(Rs) 208

I
icinva
icinva(Rs) 323

if ([!]p[01].new) jump:<hint> #r9:2 213, 213, 213, 213, 213, 213, 214, 214, 214

insert
Rx=insert(Rs,#u5,#U5) 416
Rx=insert(Rs,Rtt) 416
Rxx=insert(Rss,#u6,#U6) 416
Rxx=insert(Rss,Rtt) 417

interleave
Rdd=interleave(Rss) 418

isync
isync 324

J
jump
if ([!]Pu.new) jump:<hint> #r15:2 218
if ([!]Pu) jump #r15:2 217
if ([!]Pu) jump:<hint> #r15:2 217
jump #r22:2 217
nt
if (Rs!=#0) jump:nt #r13:2 219
if (Rs<=#0) jump:nt #r13:2 219
if (Rs==#0) jump:nt #r13:2 219
if (Rs>=#0) jump:nt #r13:2 219
Rd=#U6 221
Rd=Rs 221
t
if (Rs!=#0) jump:t #r13:2 219
if (Rs<=#0) jump:t #r13:2 219
if (Rs==#0) jump:t #r13:2 219
if (Rs>=#0) jump:t #r13:2 219

jump #r9:2 221, 221

jumpr
if ([!]Pu) jumpr Rs 209
if ([!]Pu[.new]) jumpr:<hint> Rs 209
jumpr Rs 209

jumprh
jumprh Rs 209, 210

80-N2040-53 Rev. AA 630


Qualcomm Hexagon V73 Programmer’s Reference Manual

L
l2fetch
l2fetch(Rs,Rt) 326
l2fetch(Rs,Rtt) 326

lfs
Rdd=lfs(Rss,Rtt) 419

linecpy
Rdd=linecpy(Rs,Rtt) 242

loop0
loop0(#r7:2,#U10) 198
loop0(#r7:2,Rs) 198

loop1
loop1(#r7:2,#U10) 198
loop1(#r7:2,Rs) 198

lsl
Rd=lsl(#s6,Rt) 601
Rd=lsl(Rs,Rt) 601
Rdd=lsl(Rss,Rt) 602
Rx[&|]=lsl(Rs,Rt) 607
Rx[+-]=lsl(Rs,Rt) 604
Rxx[&|]=lsl(Rss,Rt) 608
Rxx[+-]=lsl(Rss,Rt) 605
Rxx^=lsl(Rss,Rt) 608

lsr
Rd=lsr(Rs,#u5) 589
Rd=lsr(Rs,Rt) 602
Rdd=lsr(Rss,#u6) 589
Rdd=lsr(Rss,Rt) 602
Rx[&|]=lsr(Rs,#u5) 595
Rx[&|]=lsr(Rs,Rt) 607
Rx[+-]=lsr(Rs,#u5) 591
Rx[+-]=lsr(Rs,Rt) 604
Rx^=lsr(Rs,#u5) 595
Rx=add(#u8,lsr(Rx,#U5)) 591
Rx=and(#u8,lsr(Rx,#U5)) 595
Rx=or(#u8,lsr(Rx,#U5)) 595
Rx=sub(#u8,lsr(Rx,#U5)) 591
Rxx[&|]=lsr(Rss,#u6) 596
Rxx[&|]=lsr(Rss,Rt) 608
Rxx[+-]=lsr(Rss,#u6) 591
Rxx[+-]=lsr(Rss,Rt) 605
Rxx^=lsr(Rss,#u6) 596
Rxx^=lsr(Rss,Rt) 608

M
mask
Rd=mask(#u5,#U5) 588
Rdd=mask(Pt) 575

max
Rd=max(Rs,Rt) 349
Rdd=max(Rss,Rtt) 350

80-N2040-53 Rev. AA 631


Qualcomm Hexagon V73 Programmer’s Reference Manual

maxu
Rd=maxu(Rs,Rt) 349
Rdd=maxu(Rss,Rtt) 350

memb
if ([!]Pt[.new]) Rd=memb(#u6) 229
if ([!]Pt[.new]) Rd=memb(Rs+#u6:0) 229
if ([!]Pt[.new]) Rd=memb(Rx++#s4:0) 229
if ([!]Pv[.new]) memb(#u6)=Nt.new 278
if ([!]Pv[.new]) memb(#u6)=Rt 295
if ([!]Pv[.new]) memb(Rs+#u6:0)=#S6 295
if ([!]Pv[.new]) memb(Rs+#u6:0)=Nt.new 278
if ([!]Pv[.new]) memb(Rs+#u6:0)=Rt 295
if ([!]Pv[.new]) memb(Rs+Ru<<#u2)=Nt.new 278
if ([!]Pv[.new]) memb(Rs+Ru<<#u2)=Rt 295
if ([!]Pv[.new]) memb(Rx++#s4:0)=Nt.new 278
if ([!]Pv[.new]) memb(Rx++#s4:0)=Rt 295
if ([!]Pv[.new]) Rd=memb(Rs+Rt<<#u2) 229
memb(gp+#u16:0)=Nt.new 276
memb(gp+#u16:0)=Rt 293
memb(Re=#U6)=Nt.new 276
memb(Re=#U6)=Rt 293
memb(Rs+#s11:0)=Nt.new 276
memb(Rs+#s11:0)=Rt 293
memb(Rs+#u6:0)[+-]=#U5 268
memb(Rs+#u6:0)[+-|&]=Rt 268
memb(Rs+#u6:0)=#S8 293
memb(Rs+Ru<<#u2)=Nt.new 276
memb(Rs+Ru<<#u2)=Rt 293
memb(Ru<<#u2+#U6)=Nt.new 276
memb(Ru<<#u2+#U6)=Rt 293
memb(Rx++#s4:0:circ(Mu))=Nt.new 276
memb(Rx++#s4:0:circ(Mu))=Rt 293
memb(Rx++#s4:0)=Nt.new 276
memb(Rx++#s4:0)=Rt 293
memb(Rx++I:circ(Mu))=Nt.new 276
memb(Rx++I:circ(Mu))=Rt 293
memb(Rx++Mu:brev)=Nt.new 276
memb(Rx++Mu:brev)=Rt 293
memb(Rx++Mu)=Nt.new 276
memb(Rx++Mu)=Rt 293
Rd=memb(gp+#u16:0) 227
Rd=memb(Re=#U6) 227
Rd=memb(Rs+#s11:0) 227
Rd=memb(Rs+Rt<<#u2) 227
Rd=memb(Rt<<#u2+#U6) 227
Rd=memb(Rx++#s4:0:circ(Mu)) 227
Rd=memb(Rx++#s4:0) 227
Rd=memb(Rx++I:circ(Mu)) 227
Rd=memb(Rx++Mu:brev) 227
Rd=memb(Rx++Mu) 227

memb_fifo
Ryy=memb_fifo(Re=#U6) 231
Ryy=memb_fifo(Rs) 231
Ryy=memb_fifo(Rs+#s11:0) 231
Ryy=memb_fifo(Rt<<#u2+#U6) 231
Ryy=memb_fifo(Rx++#s4:0:circ(Mu)) 231
Ryy=memb_fifo(Rx++#s4:0) 231
Ryy=memb_fifo(Rx++I:circ(Mu)) 232
Ryy=memb_fifo(Rx++Mu:brev) 232
Ryy=memb_fifo(Rx++Mu) 232

80-N2040-53 Rev. AA 632


Qualcomm Hexagon V73 Programmer’s Reference Manual

membh
Rd=membh(Re=#U6) 260
Rd=membh(Rs) 260
Rd=membh(Rs+#s11:1) 260
Rd=membh(Rt<<#u2+#U6) 260
Rd=membh(Rx++#s4:1:circ(Mu)) 261
Rd=membh(Rx++#s4:1) 261
Rd=membh(Rx++I:circ(Mu)) 261
Rd=membh(Rx++Mu:brev) 261
Rd=membh(Rx++Mu) 261
Rdd=membh(Re=#U6) 263
Rdd=membh(Rs) 263
Rdd=membh(Rs+#s11:2) 263
Rdd=membh(Rt<<#u2+#U6) 263
Rdd=membh(Rx++#s4:2:circ(Mu)) 263
Rdd=membh(Rx++#s4:2) 263
Rdd=membh(Rx++I:circ(Mu)) 264
Rdd=membh(Rx++Mu:brev) 264
Rdd=membh(Rx++Mu) 264

memd
if ([!]Pt[.new]) Rdd=memd(#u6) 225
if ([!]Pt[.new]) Rdd=memd(Rs+#u6:3) 225
if ([!]Pt[.new]) Rdd=memd(Rx++#s4:3) 225
if ([!]Pv[.new]) memd(#u6)=Rtt 291
if ([!]Pv[.new]) memd(Rs+#u6:3)=Rtt 291
if ([!]Pv[.new]) memd(Rs+Ru<<#u2)=Rtt 291
if ([!]Pv[.new]) memd(Rx++#s4:3)=Rtt 291
if ([!]Pv[.new]) Rdd=memd(Rs+Rt<<#u2) 225
memd(gp+#u16:3)=Rtt 288
memd(Re=#U6)=Rtt 288
memd(Rs+#s11:3)=Rtt 288
memd(Rs+Ru<<#u2)=Rtt 288
memd(Ru<<#u2+#U6)=Rtt 288
memd(Rx++#s4:3:circ(Mu))=Rtt 288
memd(Rx++#s4:3)=Rtt 288
memd(Rx++I:circ(Mu))=Rtt 288
memd(Rx++Mu:brev)=Rtt 288
memd(Rx++Mu)=Rtt 288
Rdd=memd(gp+#u16:3) 222
Rdd=memd(Re=#U6) 222
Rdd=memd(Rs+#s11:3) 222
Rdd=memd(Rs+Rt<<#u2) 222
Rdd=memd(Rt<<#u2+#U6) 222
Rdd=memd(Rx++#s4:3:circ(Mu)) 222
Rdd=memd(Rx++#s4:3) 222
Rdd=memd(Rx++I:circ(Mu)) 222
Rdd=memd(Rx++Mu:brev) 222
Rdd=memd(Rx++Mu) 222

memd_aq
Rdd=memd_aq(Rs) 224

memd_locked
memd_locked(Rs,Pd)=Rtt 315
Rdd=memd_locked(Rs) 314

memd_rl
memd_rl(Rs):at=Rtt 290
memd_rl(Rs):st=Rtt 290

80-N2040-53 Rev. AA 633


Qualcomm Hexagon V73 Programmer’s Reference Manual

memh
if ([!]Pt[.new]) Rd=memh(#u6) 239
if ([!]Pt[.new]) Rd=memh(Rs+#u6:1) 239
if ([!]Pt[.new]) Rd=memh(Rx++#s4:1) 239
if ([!]Pv[.new]) memh(#u6)=Nt.new 282
if ([!]Pv[.new]) memh(#u6)=Rt 301
if ([!]Pv[.new]) memh(#u6)=Rt.H 301
if ([!]Pv[.new]) memh(Rs+#u6:1)=#S6 301
if ([!]Pv[.new]) memh(Rs+#u6:1)=Nt.new 282
if ([!]Pv[.new]) memh(Rs+#u6:1)=Rt 301
if ([!]Pv[.new]) memh(Rs+#u6:1)=Rt.H 301
if ([!]Pv[.new]) memh(Rs+Ru<<#u2)=Nt.new 282
if ([!]Pv[.new]) memh(Rs+Ru<<#u2)=Rt 302
if ([!]Pv[.new]) memh(Rs+Ru<<#u2)=Rt.H 301
if ([!]Pv[.new]) memh(Rx++#s4:1)=Nt.new 282
if ([!]Pv[.new]) memh(Rx++#s4:1)=Rt 302
if ([!]Pv[.new]) memh(Rx++#s4:1)=Rt.H 302
if ([!]Pv[.new]) Rd=memh(Rs+Rt<<#u2) 239
memh(gp+#u16:1)=Nt.new 280
memh(gp+#u16:1)=Rt 299
memh(gp+#u16:1)=Rt.H 299
memh(Re=#U6)=Nt.new 280
memh(Re=#U6)=Rt 298
memh(Re=#U6)=Rt.H 298
memh(Rs+#s11:1)=Nt.new 280
memh(Rs+#s11:1)=Rt 298
memh(Rs+#s11:1)=Rt.H 298
memh(Rs+#u6:1)[+-]=#U5 270
memh(Rs+#u6:1)[+-|&]=Rt 270
memh(Rs+#u6:1)=#S8 298
memh(Rs+Ru<<#u2)=Nt.new 280
memh(Rs+Ru<<#u2)=Rt 298
memh(Rs+Ru<<#u2)=Rt.H 298
memh(Ru<<#u2+#U6)=Nt.new 280
memh(Ru<<#u2+#U6)=Rt 298
memh(Ru<<#u2+#U6)=Rt.H 298
memh(Rx++#s4:1:circ(Mu))=Nt.new 280
memh(Rx++#s4:1:circ(Mu))=Rt 298
memh(Rx++#s4:1:circ(Mu))=Rt.H 298
memh(Rx++#s4:1)=Nt.new 280
memh(Rx++#s4:1)=Rt 298
memh(Rx++#s4:1)=Rt.H 298
memh(Rx++I:circ(Mu))=Nt.new 280
memh(Rx++I:circ(Mu))=Rt 298
memh(Rx++I:circ(Mu))=Rt.H 298
memh(Rx++Mu:brev)=Nt.new 280
memh(Rx++Mu:brev)=Rt 299
memh(Rx++Mu:brev)=Rt.H 299
memh(Rx++Mu)=Nt.new 280
memh(Rx++Mu)=Rt 299
memh(Rx++Mu)=Rt.H 299
Rd=memh(gp+#u16:1) 237
Rd=memh(Re=#U6) 237
Rd=memh(Rs+#s11:1) 237
Rd=memh(Rs+Rt<<#u2) 237
Rd=memh(Rt<<#u2+#U6) 237
Rd=memh(Rx++#s4:1:circ(Mu)) 237
Rd=memh(Rx++#s4:1) 237
Rd=memh(Rx++I:circ(Mu)) 237
Rd=memh(Rx++Mu:brev) 237
Rd=memh(Rx++Mu) 237

80-N2040-53 Rev. AA 634


Qualcomm Hexagon V73 Programmer’s Reference Manual

memh_fifo
Ryy=memh_fifo(Re=#U6) 234
Ryy=memh_fifo(Rs) 234
Ryy=memh_fifo(Rs+#s11:1) 234
Ryy=memh_fifo(Rt<<#u2+#U6) 234
Ryy=memh_fifo(Rx++#s4:1:circ(Mu)) 234
Ryy=memh_fifo(Rx++#s4:1) 234
Ryy=memh_fifo(Rx++I:circ(Mu)) 235
Ryy=memh_fifo(Rx++Mu:brev) 235
Ryy=memh_fifo(Rx++Mu) 235

memub
if ([!]Pt[.new]) Rd=memub(#u6) 245
if ([!]Pt[.new]) Rd=memub(Rs+#u6:0) 245
if ([!]Pt[.new]) Rd=memub(Rx++#s4:0) 245
if ([!]Pv[.new]) Rd=memub(Rs+Rt<<#u2) 245
Rd=memub(gp+#u16:0) 243
Rd=memub(Re=#U6) 243
Rd=memub(Rs+#s11:0) 243
Rd=memub(Rs+Rt<<#u2) 243
Rd=memub(Rt<<#u2+#U6) 243
Rd=memub(Rx++#s4:0:circ(Mu)) 243
Rd=memub(Rx++#s4:0) 243
Rd=memub(Rx++I:circ(Mu)) 243
Rd=memub(Rx++Mu:brev) 243
Rd=memub(Rx++Mu) 243

memubh
Rd=memubh(Re=#U6) 261
Rd=memubh(Rs+#s11:1) 262
Rd=memubh(Rt<<#u2+#U6) 262
Rd=memubh(Rx++#s4:1:circ(Mu)) 262
Rd=memubh(Rx++#s4:1) 262
Rd=memubh(Rx++I:circ(Mu)) 262
Rd=memubh(Rx++Mu:brev) 263
Rd=memubh(Rx++Mu) 262
Rdd=memubh(Re=#U6) 264
Rdd=memubh(Rs+#s11:2) 264
Rdd=memubh(Rt<<#u2+#U6) 264
Rdd=memubh(Rx++#s4:2:circ(Mu)) 265
Rdd=memubh(Rx++#s4:2) 265
Rdd=memubh(Rx++I:circ(Mu)) 265
Rdd=memubh(Rx++Mu:brev) 265
Rdd=memubh(Rx++Mu) 265

memuh
if ([!]Pt[.new]) Rd=memuh(#u6) 249
if ([!]Pt[.new]) Rd=memuh(Rs+#u6:1) 249
if ([!]Pt[.new]) Rd=memuh(Rx++#s4:1) 249
if ([!]Pv[.new]) Rd=memuh(Rs+Rt<<#u2) 249
Rd=memuh(gp+#u16:1) 247
Rd=memuh(Re=#U6) 247
Rd=memuh(Rs+#s11:1) 247
Rd=memuh(Rs+Rt<<#u2) 247
Rd=memuh(Rt<<#u2+#U6) 247
Rd=memuh(Rx++#s4:1:circ(Mu)) 247
Rd=memuh(Rx++#s4:1) 247
Rd=memuh(Rx++I:circ(Mu)) 247
Rd=memuh(Rx++Mu:brev) 247
Rd=memuh(Rx++Mu) 247

80-N2040-53 Rev. AA 635


Qualcomm Hexagon V73 Programmer’s Reference Manual

memw
if ([!]Pt[.new]) Rd=memw(#u6) 254
if ([!]Pt[.new]) Rd=memw(Rs+#u6:2) 254
if ([!]Pt[.new]) Rd=memw(Rx++#s4:2) 254
if ([!]Pv[.new]) memw(#u6)=Nt.new 286
if ([!]Pv[.new]) memw(#u6)=Rt 309
if ([!]Pv[.new]) memw(Rs+#u6:2)=#S6 309
if ([!]Pv[.new]) memw(Rs+#u6:2)=Nt.new 286
if ([!]Pv[.new]) memw(Rs+#u6:2)=Rt 309
if ([!]Pv[.new]) memw(Rs+Ru<<#u2)=Nt.new 286
if ([!]Pv[.new]) memw(Rs+Ru<<#u2)=Rt 309
if ([!]Pv[.new]) memw(Rx++#s4:2)=Nt.new 286
if ([!]Pv[.new]) memw(Rx++#s4:2)=Rt 309
if ([!]Pv[.new]) Rd=memw(Rs+Rt<<#u2) 254
memw(gp+#u16:2)=Nt.new 284
memw(gp+#u16:2)=Rt 306
memw(Re=#U6)=Nt.new 284
memw(Re=#U6)=Rt 306
memw(Rs+#s11:2)=Nt.new 284
memw(Rs+#s11:2)=Rt 306
memw(Rs+#u6:2)[+-]=#U5 271
memw(Rs+#u6:2)[+-|&]=Rt 271
memw(Rs+#u6:2)=#S8 306
memw(Rs+Ru<<#u2)=Nt.new 284
memw(Rs+Ru<<#u2)=Rt 306
memw(Ru<<#u2+#U6)=Nt.new 284
memw(Ru<<#u2+#U6)=Rt 306
memw(Rx++#s4:2:circ(Mu))=Nt.new 284
memw(Rx++#s4:2:circ(Mu))=Rt 306
memw(Rx++#s4:2)=Nt.new 284
memw(Rx++#s4:2)=Rt 306
memw(Rx++I:circ(Mu))=Nt.new 284
memw(Rx++I:circ(Mu))=Rt 306
memw(Rx++Mu:brev)=Nt.new 284
memw(Rx++Mu:brev)=Rt 306
memw(Rx++Mu)=Nt.new 284
memw(Rx++Mu)=Rt 306
Rd=memw(gp+#u16:2) 251
Rd=memw(Re=#U6) 251
Rd=memw(Rs+#s11:2) 251
Rd=memw(Rs+Rt<<#u2) 251
Rd=memw(Rt<<#u2+#U6) 251
Rd=memw(Rx++#s4:2:circ(Mu)) 251
Rd=memw(Rx++#s4:2) 251
Rd=memw(Rx++I:circ(Mu)) 251
Rd=memw(Rx++Mu:brev) 251
Rd=memw(Rx++Mu) 251

memw_aq
Rd=memw_aq(Rs) 253

memw_locked
memw_locked(Rs,Pd)=Rt 315
Rd=memw_locked(Rs) 314

memw_rl
memw_rl(Rs):at=Rt 308
memw_rl(Rs):st=Rt 308

min
Rd=min(Rt,Rs) 351
Rdd=min(Rtt,Rss) 352

80-N2040-53 Rev. AA 636


Qualcomm Hexagon V73 Programmer’s Reference Manual

minu
Rd=minu(Rt,Rs) 351
Rdd=minu(Rtt,Rss) 352

modwrap
Rd=modwrap(Rs,Rt) 353

movlen
Rd=movlen(Rs,Rtt) 242

mpy
Rd=mpy(Rs,Rt.H):<<1:rnd:sat 512
Rd=mpy(Rs,Rt.H):<<1:sat 512
Rd=mpy(Rs,Rt.L):<<1:rnd:sat 512
Rd=mpy(Rs,Rt.L):<<1:sat 512
Rd=mpy(Rs,Rt) 512
Rd=mpy(Rs,Rt):<<1 512
Rd=mpy(Rs,Rt):<<1:sat 512
Rd=mpy(Rs,Rt):rnd 512
Rd=mpy(Rs.[HL],Rt.[HL])[:<<1][:rnd][:sat] 496
Rdd=mpy(Rs,Rt) 515
Rdd=mpy(Rs.[HL],Rt.[HL])[:<<1][:rnd] 496
Rx+=mpy(Rs,Rt):<<1:sat 512
Rx+=mpy(Rs.[HL],Rt.[HL])[:<<1][:sat] 496
Rx-=mpy(Rs,Rt):<<1:sat 512
Rx-=mpy(Rs.[HL],Rt.[HL])[:<<1][:sat] 496
Rxx[+-]=mpy(Rs,Rt) 515
Rxx+=mpy(Rs.[HL],Rt.[HL])[:<<1] 496
Rxx-=mpy(Rs.[HL],Rt.[HL])[:<<1] 496

mpyi
Rd=+mpyi(Rs,#u8) 484
Rd=mpyi(Rs,#m9) 485
Rd=-mpyi(Rs,#u8) 484
Rd=mpyi(Rs,Rt) 485
Rx+=mpyi(Rs,#u8) 485
Rx+=mpyi(Rs,Rt) 485
Rx-=mpyi(Rs,#u8) 485
Rx-=mpyi(Rs,Rt) 485

mpysu
Rd=mpysu(Rs,Rt) 512

mpyu
Rd=mpyu(Rs,Rt) 512
Rd=mpyu(Rs.[HL],Rt.[HL])[:<<1] 503
Rdd=mpyu(Rs,Rt) 515
Rdd=mpyu(Rs.[HL],Rt.[HL])[:<<1] 503
Rx+=mpyu(Rs.[HL],Rt.[HL])[:<<1] 503
Rx-=mpyu(Rs.[HL],Rt.[HL])[:<<1] 503
Rxx[+-]=mpyu(Rs,Rt) 515
Rxx+=mpyu(Rs.[HL],Rt.[HL])[:<<1] 503
Rxx-=mpyu(Rs.[HL],Rt.[HL])[:<<1] 503

mpyui
Rd=mpyui(Rs,Rt) 485

mux
Rd=mux(Pu,#s8,#S8) 174
Rd=mux(Pu,#s8,Rs) 174
Rd=mux(Pu,Rs,#s8) 174
Rd=mux(Pu,Rs,Rt) 174

80-N2040-53 Rev. AA 637


Qualcomm Hexagon V73 Programmer’s Reference Manual

N
neg
Rd=neg(Rs) 160
Rd=neg(Rs):sat 354
Rdd=neg(Rss) 354

no mnemonic
Cd=Rs 205
Cdd=Rss 205
if ([!]Pu[.new]) Rd=#s12 188
if ([!]Pu[.new]) Rd=Rs 188
if ([!]Pu[.new]) Rdd=Rss 188
Pd=Ps 203
Pd=Rs 577
Rd=#s16 165
Rd=Cs 205
Rd=Ps 577
Rd=Rs 167
Rdd=#s8 165
Rdd=Css 205
Rdd=Rss 167
Rx.[HL]=#u16 165

nop
nop 161

normamt
Rd=normamt(Rs) 409
Rd=normamt(Rss) 409

not
Pd=not(Ps) 203
Rd=not(Rs) 158
Rdd=not(Rss) 343

O
or
if ([!]Pu[.new]) Rd=or(Rs,Rt) 183
Pd=and(Ps,or(Pt,[!]Pu)) 203
Pd=or(Ps,and(Pt,[!]Pu)) 203
Pd=or(Ps,or(Pt,[!]Pu)) 203
Pd=or(Pt,[!]Ps) 203
Rd=or(Rs,#s10) 158
Rd=or(Rs,Rt) 158
Rd=or(Rt,~Rs) 158
Rdd=or(Rss,Rtt) 343
Rdd=or(Rtt,~Rss) 343
Rx[&|^]=or(Rs,Rt) 346
Rx=or(Ru,and(Rx,#s10)) 346
Rx|=or(Rs,#s10) 346

P
packhl
Rdd=packhl(Rs,Rt) 177

parity
Rd=parity(Rs,Rt) 420
Rd=parity(Rss,Rtt) 420

pause
pause(#u10) 328

80-N2040-53 Rev. AA 638


Qualcomm Hexagon V73 Programmer’s Reference Manual

pc
Rd=add(pc,#u6) 200

pmemcpy
Rdd=pmemcpy(Rx,Rtt) 241, 242

pmpyw
Rdd=pmpyw(Rs,Rt) 508
Rxx^=pmpyw(Rs,Rt) 508

popcount
Rd=popcount(Rss) 411

R
release
release(Rs):at 305
release(Rs):st 305

rol
Rd=rol(Rs,#u5) 589
Rdd=rol(Rss,#u6) 589
Rx[&|]=rol(Rs,#u5) 595
Rx[+-]=rol(Rs,#u5) 591
Rx^=rol(Rs,#u5) 595
Rxx[&|]=rol(Rss,#u6) 596
Rxx[+-]=rol(Rss,#u6) 591
Rxx^=rol(Rss,#u6) 596

round
Rd=round(Rs,#u5)[:sat] 355
Rd=round(Rs,Rt)[:sat] 355
Rd=round(Rss):sat 355

S
sat
Rd=sat(Rss) 542

satb
Rd=satb(Rs) 542

sath
Rd=sath(Rs) 542

satub
Rd=satub(Rs) 542

satuh
Rd=satuh(Rs) 542

setbit
memb(Rs+#u6:0)=setbit(#U5) 268
memh(Rs+#u6:1)=setbit(#U5) 270
memw(Rs+#u6:2)=setbit(#U5) 271
Rd=setbit(Rs,#u5) 422
Rd=setbit(Rs,Rt) 422

sfadd
Rd=sfadd(Rs,Rt) 461

sfclass
Pd=sfclass(Rs,#u5) 462

80-N2040-53 Rev. AA 639


Qualcomm Hexagon V73 Programmer’s Reference Manual

sfcmp.eq
Pd=sfcmp.eq(Rs,Rt) 464

sfcmp.ge
Pd=sfcmp.ge(Rs,Rt) 464

sfcmp.gt
Pd=sfcmp.gt(Rs,Rt) 464

sfcmp.uo
Pd=sfcmp.uo(Rs,Rt) 464

sffixupd
Rd=sffixupd(Rs,Rt) 472

sffixupn
Rd=sffixupn(Rs,Rt) 472

sffixupr
Rd=sffixupr(Rs) 472

sfinvsqrta
Rd,Pe=sfinvsqrta(Rs) 475

sfmake
Rd=sfmake(#u10):neg 478
Rd=sfmake(#u10):pos 478

sfmax
Rd=sfmax(Rs,Rt) 479

sfmin
Rd=sfmin(Rs,Rt) 480

sfmpy
Rd=sfmpy(Rs,Rt) 481
Rx+=sfmpy(Rs,Rt,Pu):scale 474
Rx+=sfmpy(Rs,Rt) 473
Rx+=sfmpy(Rs,Rt):lib 476
Rx-=sfmpy(Rs,Rt) 473
Rx-=sfmpy(Rs,Rt):lib 476

sfrecipa
Rd,Pe=sfrecipa(Rs,Rt) 482

sfsub
Rd=sfsub(Rs,Rt) 483

shuffeb
Rdd=shuffeb(Rss,Rtt) 555

shuffeh
Rdd=shuffeh(Rss,Rtt) 555

shuffob
Rdd=shuffob(Rtt,Rss) 555

shuffoh
Rdd=shuffoh(Rtt,Rss) 555

sp1loop0
p3=sp1loop0(#r7:2,#U10) 201
p3=sp1loop0(#r7:2,Rs) 201

80-N2040-53 Rev. AA 640


Qualcomm Hexagon V73 Programmer’s Reference Manual

sp2loop0
p3=sp2loop0(#r7:2,#U10) 201
p3=sp2loop0(#r7:2,Rs) 201

sp3loop0
p3=sp3loop0(#r7:2,#U10) 201
p3=sp3loop0(#r7:2,Rs) 201

sub
if ([!]Pu[.new]) Rd=sub(Rt,Rs) 185
Rd=add(Rs,sub(#s6,Ru)) 335
Rd=sub(#s10,Rs) 162
Rd=sub(Rt,Rs) 162
Rd=sub(Rt,Rs):sat 162
Rd=sub(Rt,Rs):sat:deprecated 358
Rd=sub(Rt.[HL],Rs.[HL])[:sat]:<<16 360
Rd=sub(Rt.L,Rs.[HL])[:sat] 360
Rdd=sub(Rss,Rtt,Px):carry 341
Rdd=sub(Rtt,Rss) 358
Rx+=sub(Rt,Rs) 359

swiz
Rd=swiz(Rs) 544

sxtb
if ([!]Pu[.new]) Rd=sxtb(Rs) 186
Rd=sxtb(Rs) 164

sxth
if ([!]Pu[.new]) Rd=sxth(Rs) 186
Rd=sxth(Rs) 164

sxtw
Rdd=sxtw(Rs) 362

syncht
syncht 329

T
tableidxb
Rx=tableidxb(Rs,#u4,#S6):raw 426
Rx=tableidxb(Rs,#u4,#U5) 426

tableidxd
Rx=tableidxd(Rs,#u4,#S6):raw 426
Rx=tableidxd(Rs,#u4,#U5) 427

tableidxh
Rx=tableidxh(Rs,#u4,#S6):raw 427
Rx=tableidxh(Rs,#u4,#U5) 427

tableidxw
Rx=tableidxw(Rs,#u4,#S6):raw 427
Rx=tableidxw(Rs,#u4,#U5) 427

tlbmatch
Pd=tlbmatch(Rss,Rt) 576

togglebit
Rd=togglebit(Rs,#u5) 422
Rd=togglebit(Rs,Rt) 422

80-N2040-53 Rev. AA 641


Qualcomm Hexagon V73 Programmer’s Reference Manual

trace
trace(Rs) 330

trap0
trap0(#u8) 331

trap1
trap1(#u8) 331
trap1(Rx,#u8) 331

tstbit
if ([!]tstbit(Ns.new,#0)) jump:<hint> #r9:2 273
p[01]=tstbit(Rs,#0) 214
Pd=[!]tstbit(Rs,#u5) 578
Pd=[!]tstbit(Rs,Rt) 578

U
unpause
unpause 332

V
vabsdiffb
Rdd=vabsdiffb(Rtt,Rss) 365

vabsdiffh
Rdd=vabsdiffh(Rtt,Rss) 366

vabsdiffub
Rdd=vabsdiffub(Rtt,Rss) 365

vabsdiffw
Rdd=vabsdiffw(Rtt,Rss) 367

vabsh
Rdd=vabsh(Rss) 363
Rdd=vabsh(Rss):sat 363

vabsw
Rdd=vabsw(Rss) 364
Rdd=vabsw(Rss):sat 364

vacsh
Rxx,Pe=vacsh(Rss,Rtt) 369

vaddb
Rdd=vaddb(Rss,Rtt) 378

vaddh
Rd=vaddh(Rs,Rt)[:sat] 168
Rdd=vaddh(Rss,Rtt)[:sat] 371

vaddhub
Rd=vaddhub(Rss,Rtt):sat 373

vaddub
Rdd=vaddub(Rss,Rtt)[:sat] 378

vadduh
Rd=vadduh(Rs,Rt):sat 168
Rdd=vadduh(Rss,Rtt):sat 371

80-N2040-53 Rev. AA 642


Qualcomm Hexagon V73 Programmer’s Reference Manual

vaddw
Rdd=vaddw(Rss,Rtt)[:sat] 379

valignb
Rdd=valignb(Rtt,Rss,#u3) 545
Rdd=valignb(Rtt,Rss,Pu) 545

vaslh
Rdd=vaslh(Rss,#u4) 611
Rdd=vaslh(Rss,Rt) 615

vaslw
Rdd=vaslw(Rss,#u5) 617
Rdd=vaslw(Rss,Rt) 618

vasrh
Rdd=vasrh(Rss,#u4) 611
Rdd=vasrh(Rss,#u4):raw 612
Rdd=vasrh(Rss,#u4):rnd 612
Rdd=vasrh(Rss,Rt) 615

vasrhub
Rd=vasrhub(Rss,#u4):raw 613
Rd=vasrhub(Rss,#u4):rnd:sat 613
Rd=vasrhub(Rss,#u4):sat 613

vasrw
Rd=vasrw(Rss,#u5) 620
Rd=vasrw(Rss,Rt) 620
Rdd=vasrw(Rss,#u5) 617
Rdd=vasrw(Rss,Rt) 618

vavgh
Rd=vavgh(Rs,Rt) 169
Rd=vavgh(Rs,Rt):rnd 169
Rdd=vavgh(Rss,Rtt) 380
Rdd=vavgh(Rss,Rtt):crnd 380
Rdd=vavgh(Rss,Rtt):rnd 380

vavgub
Rdd=vavgub(Rss,Rtt) 382
Rdd=vavgub(Rss,Rtt):rnd 382

vavguh
Rdd=vavguh(Rss,Rtt) 380
Rdd=vavguh(Rss,Rtt):rnd 380

vavguw
Rdd=vavguw(Rss,Rtt)[:rnd] 383

vavgw
Rdd=vavgw(Rss,Rtt):crnd 383
Rdd=vavgw(Rss,Rtt)[:rnd] 383

vclip
Rdd=vclip(Rss,#u5) 385

vcmpb.eq
Pd=!any8(vcmpb.eq(Rss,Rtt)) 581
Pd=any8(vcmpb.eq(Rss,Rtt)) 581
Pd=vcmpb.eq(Rss,#u8) 582
Pd=vcmpb.eq(Rss,Rtt) 582

80-N2040-53 Rev. AA 643


Qualcomm Hexagon V73 Programmer’s Reference Manual

vcmpb.gt
Pd=vcmpb.gt(Rss,#s8) 582
Pd=vcmpb.gt(Rss,Rtt) 582

vcmpb.gtu
Pd=vcmpb.gtu(Rss,#u7) 582
Pd=vcmpb.gtu(Rss,Rtt) 582

vcmph.eq
Pd=vcmph.eq(Rss,#s8) 579
Pd=vcmph.eq(Rss,Rtt) 579

vcmph.gt
Pd=vcmph.gt(Rss,#s8) 579
Pd=vcmph.gt(Rss,Rtt) 579

vcmph.gtu
Pd=vcmph.gtu(Rss,#u7) 579
Pd=vcmph.gtu(Rss,Rtt) 579

vcmpw.eq
Pd=vcmpw.eq(Rss,#s8) 584
Pd=vcmpw.eq(Rss,Rtt) 584

vcmpw.gt
Pd=vcmpw.gt(Rss,#s8) 584
Pd=vcmpw.gt(Rss,Rtt) 584

vcmpw.gtu
Pd=vcmpw.gtu(Rss,#u7) 584
Pd=vcmpw.gtu(Rss,Rtt) 584

vcmpyi
Rdd=vcmpyi(Rss,Rtt)[:<<1]:sat 447
Rxx+=vcmpyi(Rss,Rtt):sat 448

vcmpyr
Rdd=vcmpyr(Rss,Rtt)[:<<1]:sat 447
Rxx+=vcmpyr(Rss,Rtt):sat 448

vcnegh
Rdd=vcnegh(Rss,Rt) 386

vconj
Rdd=vconj(Rss):sat 450

vcrotate
Rdd=vcrotate(Rss,Rt) 451

vdmpy
Rd=vdmpy(Rss,Rtt)[:<<1]:rnd:sat 520
Rdd=vdmpy(Rss,Rtt):<<1:sat 517
Rdd=vdmpy(Rss,Rtt):sat 517
Rxx+=vdmpy(Rss,Rtt):<<1:sat 518
Rxx+=vdmpy(Rss,Rtt):sat 518

vdmpybsu
Rdd=vdmpybsu(Rss,Rtt):sat 524
Rxx+=vdmpybsu(Rss,Rtt):sat 524

vitpack
Rd=vitpack(Ps,Pt) 586

80-N2040-53 Rev. AA 644


Qualcomm Hexagon V73 Programmer’s Reference Manual

vlslh
Rdd=vlslh(Rss,Rt) 615

vlslw
Rdd=vlslw(Rss,Rt) 618

vlsrh
Rdd=vlsrh(Rss,#u4) 611
Rdd=vlsrh(Rss,Rt) 615

vlsrw
Rdd=vlsrw(Rss,#u5) 617
Rdd=vlsrw(Rss,Rt) 618

vmaxb
Rdd=vmaxb(Rtt,Rss) 388

vmaxh
Rdd=vmaxh(Rtt,Rss) 389

vmaxub
Rdd=vmaxub(Rtt,Rss) 388

vmaxuh
Rdd=vmaxuh(Rtt,Rss) 389

vmaxuw
Rdd=vmaxuw(Rtt,Rss) 394

vmaxw
Rdd=vmaxw(Rtt,Rss) 394

vminb
Rdd=vminb(Rtt,Rss) 395

vminh
Rdd=vminh(Rtt,Rss) 397

vminub
Rdd,Pe=vminub(Rtt,Rss) 395
Rdd=vminub(Rtt,Rss) 395

vminuh
Rdd=vminuh(Rtt,Rss) 397

vminuw
Rdd=vminuw(Rtt,Rss) 402

vminw
Rdd=vminw(Rtt,Rss) 402

vmpybsu
Rdd=vmpybsu(Rs,Rt) 536
Rxx+=vmpybsu(Rs,Rt) 536

vmpybu
Rdd=vmpybu(Rs,Rt) 536
Rxx+=vmpybu(Rs,Rt) 536

80-N2040-53 Rev. AA 645


Qualcomm Hexagon V73 Programmer’s Reference Manual

vmpyeh
Rdd=vmpyeh(Rss,Rtt):<<1:sat 526
Rdd=vmpyeh(Rss,Rtt):sat 526
Rxx+=vmpyeh(Rss,Rtt) 526
Rxx+=vmpyeh(Rss,Rtt):<<1:sat 526
Rxx+=vmpyeh(Rss,Rtt):sat 526

vmpyh
Rd=vmpyh(Rs,Rt)[:<<1]:rnd:sat 530
Rdd=vmpyh(Rs,Rt)[:<<1]:sat 528
Rxx+=vmpyh(Rs,Rt) 528
Rxx+=vmpyh(Rs,Rt)[:<<1]:sat 528

vmpyhsu
Rdd=vmpyhsu(Rs,Rt)[:<<1]:sat 532
Rxx+=vmpyhsu(Rs,Rt)[:<<1]:sat 532

vmpyweh
Rdd=vmpyweh(Rss,Rtt)[:<<1]:rnd:sat 488
Rdd=vmpyweh(Rss,Rtt)[:<<1]:sat 489
Rxx+=vmpyweh(Rss,Rtt)[:<<1]:rnd:sat 489
Rxx+=vmpyweh(Rss,Rtt)[:<<1]:sat 489

vmpyweuh
Rdd=vmpyweuh(Rss,Rtt)[:<<1]:rnd:sat 492
Rdd=vmpyweuh(Rss,Rtt)[:<<1]:sat 493
Rxx+=vmpyweuh(Rss,Rtt)[:<<1]:rnd:sat 493
Rxx+=vmpyweuh(Rss,Rtt)[:<<1]:sat 493

vmpywoh
Rdd=vmpywoh(Rss,Rtt)[:<<1]:rnd:sat 489
Rdd=vmpywoh(Rss,Rtt)[:<<1]:sat 489
Rxx+=vmpywoh(Rss,Rtt)[:<<1]:rnd:sat 489
Rxx+=vmpywoh(Rss,Rtt)[:<<1]:sat 489

vmpywouh
Rdd=vmpywouh(Rss,Rtt)[:<<1]:rnd:sat 493
Rdd=vmpywouh(Rss,Rtt)[:<<1]:sat 493
Rxx+=vmpywouh(Rss,Rtt)[:<<1]:rnd:sat 493
Rxx+=vmpywouh(Rss,Rtt)[:<<1]:sat 493

vmux
Rdd=vmux(Pu,Rss,Rtt) 587

vnavgh
Rd=vnavgh(Rt,Rs) 169
Rdd=vnavgh(Rtt,Rss) 380
Rdd=vnavgh(Rtt,Rss):crnd:sat 380
Rdd=vnavgh(Rtt,Rss):rnd:sat 380

vnavgw
Rdd=vnavgw(Rtt,Rss) 383
Rdd=vnavgw(Rtt,Rss):crnd:sat 383
Rdd=vnavgw(Rtt,Rss):rnd:sat 383

vpmpyh
Rdd=vpmpyh(Rs,Rt) 538
Rxx^=vpmpyh(Rs,Rt) 539

vraddh
Rd=vraddh(Rss,Rtt) 376

80-N2040-53 Rev. AA 646


Qualcomm Hexagon V73 Programmer’s Reference Manual

vraddub
Rdd=vraddub(Rss,Rtt) 374
Rxx+=vraddub(Rss,Rtt) 374

vradduh
Rd=vradduh(Rss,Rtt) 376

vrcmpys
Rd=vrcmpys(Rss,Rt):<<1:rnd:sat 456
Rd=vrcmpys(Rss,Rtt):<<1:rnd:sat:raw:hi 456
Rd=vrcmpys(Rss,Rtt):<<1:rnd:sat:raw:lo 457
Rdd=vrcmpys(Rss,Rt):<<1:sat 453
Rdd=vrcmpys(Rss,Rtt):<<1:sat:raw:hi 453
Rdd=vrcmpys(Rss,Rtt):<<1:sat:raw:lo 454
Rxx+=vrcmpys(Rss,Rt):<<1:sat 454
Rxx+=vrcmpys(Rss,Rtt):<<1:sat:raw:hi 454
Rxx+=vrcmpys(Rss,Rtt):<<1:sat:raw:lo 454

vrcnegh
Rxx+=vrcnegh(Rss,Rt) 386

vrcrotate
Rdd=vrcrotate(Rss,Rt,#u2) 459
Rxx+=vrcrotate(Rss,Rt,#u2) 459

vrmaxh
Rxx=vrmaxh(Rss,Ru) 390

vrmaxuh
Rxx=vrmaxuh(Rss,Ru) 390

vrmaxuw
Rxx=vrmaxuw(Rss,Ru) 392

vrmaxw
Rxx=vrmaxw(Rss,Ru) 392

vrminh
Rxx=vrminh(Rss,Ru) 398

vrminuh
Rxx=vrminuh(Rss,Ru) 398

vrminuw
Rxx=vrminuw(Rss,Ru) 400

vrminw
Rxx=vrminw(Rss,Ru) 400

vrmpybsu
Rdd=vrmpybsu(Rss,Rtt) 522
Rxx+=vrmpybsu(Rss,Rtt) 522

vrmpybu
Rdd=vrmpybu(Rss,Rtt) 522
Rxx+=vrmpybu(Rss,Rtt) 522

vrmpyh
Rdd=vrmpyh(Rss,Rtt) 534
Rxx+=vrmpyh(Rss,Rtt) 534

80-N2040-53 Rev. AA 647


Qualcomm Hexagon V73 Programmer’s Reference Manual

vrmpyweh
Rdd=vrmpyweh(Rss,Rtt)[:<<1] 510
Rxx+=vrmpyweh(Rss,Rtt)[:<<1] 510

vrmpywoh
Rdd=vrmpywoh(Rss,Rtt)[:<<1] 510
Rxx+=vrmpywoh(Rss,Rtt)[:<<1] 510

vrndwh
Rd=vrndwh(Rss) 547
Rd=vrndwh(Rss):sat 547

vrsadub
Rdd=vrsadub(Rss,Rtt) 403
Rxx+=vrsadub(Rss,Rtt) 403

vsathb
Rd=vsathb(Rs) 550
Rd=vsathb(Rss) 550
Rdd=vsathb(Rss) 553

vsathub
Rd=vsathub(Rs) 550
Rd=vsathub(Rss) 550
Rdd=vsathub(Rss) 553

vsatwh
Rd=vsatwh(Rss) 550
Rdd=vsatwh(Rss) 553

vsatwuh
Rd=vsatwuh(Rss) 550
Rdd=vsatwuh(Rss) 553

vsplatb
Rd=vsplatb(Rs) 557
Rdd=vsplatb(Rs) 557

vsplath
Rdd=vsplath(Rs) 558

vspliceb
Rdd=vspliceb(Rss,Rtt,#u3) 559
Rdd=vspliceb(Rss,Rtt,Pu) 559

vsubb
Rdd=vsubb(Rss,Rtt) 407

vsubh
Rd=vsubh(Rt,Rs)[:sat] 170
Rdd=vsubh(Rtt,Rss)[:sat] 405

vsubub
Rdd=vsubub(Rtt,Rss)[:sat] 407

vsubuh
Rd=vsubuh(Rt,Rs):sat 170
Rdd=vsubuh(Rtt,Rss):sat 405

vsubw
Rdd=vsubw(Rtt,Rss)[:sat] 408

80-N2040-53 Rev. AA 648


Qualcomm Hexagon V73 Programmer’s Reference Manual

vsxtbh
Rdd=vsxtbh(Rs) 561

vsxthw
Rdd=vsxthw(Rs) 561

vtrunehb
Rd=vtrunehb(Rss) 563
Rdd=vtrunehb(Rss,Rtt) 563

vtrunewh
Rdd=vtrunewh(Rss,Rtt) 563

vtrunohb
Rd=vtrunohb(Rss) 563
Rdd=vtrunohb(Rss,Rtt) 563

vtrunowh
Rdd=vtrunowh(Rss,Rtt) 564

vxaddsubh
Rdd=vxaddsubh(Rss,Rtt):rnd:>>1:sat 429
Rdd=vxaddsubh(Rss,Rtt):sat 429

vxaddsubw
Rdd=vxaddsubw(Rss,Rtt):sat 431

vxsubaddh
Rdd=vxsubaddh(Rss,Rtt):rnd:>>1:sat 429
Rdd=vxsubaddh(Rss,Rtt):sat 429

vxsubaddw
Rdd=vxsubaddw(Rss,Rtt):sat 431

vzxtbh
Rdd=vzxtbh(Rs) 565

vzxthw
Rdd=vzxthw(Rs) 565

X
xor
if ([!]Pu[.new]) Rd=xor(Rs,Rt) 183
Pd=xor(Ps,Pt) 203
Rd=xor(Rs,Rt) 158
Rdd=xor(Rss,Rtt) 343
Rx[&|^]=xor(Rs,Rt) 346
Rxx^=xor(Rss,Rtt) 345

Z
zxtb
if ([!]Pu[.new]) Rd=zxtb(Rs) 189
Rd=zxtb(Rs) 171

zxth
if ([!]Pu[.new]) Rd=zxth(Rs) 189
Rd=zxth(Rs) 171

80-N2040-53 Rev. AA 649


Intrinsics Index
A
abs
Rd=abs(Rs)
Word32 Q6_R_abs_R(Word32 Rs) 334
Rd=abs(Rs):sat
Word32 Q6_R_abs_R_sat(Word32 Rs) 334
Rdd=abs(Rss)
Word64 Q6_P_abs_P(Word64 Rss) 333

add
Rd=add(#u6,mpyi(Rs,#U6))
Word32 Q6_R_add_mpyi_IRI(Word32 Iu6, Word32 Rs, Word32 IU6) 486
Rd=add(#u6,mpyi(Rs,Rt))
Word32 Q6_R_add_mpyi_IRR(Word32 Iu6, Word32 Rs, Word32 Rt) 486
Rd=add(Rs,#s16)
Word32 Q6_R_add_RI(Word32 Rs, Word32 Is16) 156
Rd=add(Rs,add(Ru,#s6))
Word32 Q6_R_add_add_RRI(Word32 Rs, Word32 Ru, Word32 Is6) 335
Rd=add(Rs,Rt)
Word32 Q6_R_add_RR(Word32 Rs, Word32 Rt) 156
Rd=add(Rs,Rt):sat
Word32 Q6_R_add_RR_sat(Word32 Rs, Word32 Rt) 156
Rd=add(Rt.H,Rs.H):<<16
Word32 Q6_R_add_RhRh_s16(Word32 Rt, Word32 Rs) 340
Rd=add(Rt.H,Rs.H):sat:<<16
Word32 Q6_R_add_RhRh_sat_s16(Word32 Rt, Word32 Rs) 340
Rd=add(Rt.H,Rs.L):<<16
Word32 Q6_R_add_RhRl_s16(Word32 Rt, Word32 Rs) 340
Rd=add(Rt.H,Rs.L):sat:<<16
Word32 Q6_R_add_RhRl_sat_s16(Word32 Rt, Word32 Rs) 340
Rd=add(Rt.L,Rs.H)
Word32 Q6_R_add_RlRh(Word32 Rt, Word32 Rs) 340
Rd=add(Rt.L,Rs.H):<<16
Word32 Q6_R_add_RlRh_s16(Word32 Rt, Word32 Rs) 340
Rd=add(Rt.L,Rs.H):sat
Word32 Q6_R_add_RlRh_sat(Word32 Rt, Word32 Rs) 340
Rd=add(Rt.L,Rs.H):sat:<<16
Word32 Q6_R_add_RlRh_sat_s16(Word32 Rt, Word32 Rs) 340
Rd=add(Rt.L,Rs.L)
Word32 Q6_R_add_RlRl(Word32 Rt, Word32 Rs) 340
Rd=add(Rt.L,Rs.L):<<16
Word32 Q6_R_add_RlRl_s16(Word32 Rt, Word32 Rs) 340
Rd=add(Rt.L,Rs.L):sat
Word32 Q6_R_add_RlRl_sat(Word32 Rt, Word32 Rs) 340
Rd=add(Rt.L,Rs.L):sat:<<16
Word32 Q6_R_add_RlRl_sat_s16(Word32 Rt, Word32 Rs) 340
Rd=add(Ru,mpyi(#u6:2,Rs))
Word32 Q6_R_add_mpyi_RIR(Word32 Ru, Word32 Iu6_2, Word32 Rs) 486
Rd=add(Ru,mpyi(Rs,#u6))
Word32 Q6_R_add_mpyi_RRI(Word32 Ru, Word32 Rs, Word32 Iu6) 486
Rdd=add(Rs,Rtt)

80-N2040-53 Rev. AA 650


Qualcomm Hexagon V73 Programmer’s Reference Manual

Word64 Q6_P_add_RP(Word32 Rs, Word64 Rtt) 338


Rdd=add(Rss,Rtt)
Word64 Q6_P_add_PP(Word64 Rss, Word64 Rtt) 338
Rdd=add(Rss,Rtt):sat
Word64 Q6_P_add_PP_sat(Word64 Rss, Word64 Rtt) 338
Rx+=add(Rs,#s8)
Word32 Q6_R_addacc_RI(Word32 Rx, Word32 Rs, Word32 Is8) 335
Rx+=add(Rs,Rt)
Word32 Q6_R_addacc_RR(Word32 Rx, Word32 Rs, Word32 Rt) 335
Rx-=add(Rs,#s8)
Word32 Q6_R_addnac_RI(Word32 Rx, Word32 Rs, Word32 Is8) 335
Rx-=add(Rs,Rt)
Word32 Q6_R_addnac_RR(Word32 Rx, Word32 Rs, Word32 Rt) 335
Ry=add(Ru,mpyi(Ry,Rs))
Word32 Q6_R_add_mpyi_RRR(Word32 Ru, Word32 Ry, Word32 Rs) 486

addasl
Rd=addasl(Rt,Rs,#u3)
Word32 Q6_R_addasl_RRI(Word32 Rt, Word32 Rs, Word32 Iu3) 594

all8
Pd=all8(Ps)
Byte Q6_p_all8_p(Byte Ps) 197

and
Pd=and(Ps,and(Pt,!Pu))
Byte Q6_p_and_and_ppnp(Byte Ps, Byte Pt, Byte Pu) 203
Pd=and(Ps,and(Pt,Pu))
Byte Q6_p_and_and_ppp(Byte Ps, Byte Pt, Byte Pu) 203
Pd=and(Pt,!Ps)
Byte Q6_p_and_pnp(Byte Pt, Byte Ps) 203
Pd=and(Pt,Ps)
Byte Q6_p_and_pp(Byte Pt, Byte Ps) 203
Rd=and(Rs,#s10)
Word32 Q6_R_and_RI(Word32 Rs, Word32 Is10) 158
Rd=and(Rs,Rt)
Word32 Q6_R_and_RR(Word32 Rs, Word32 Rt) 158
Rd=and(Rt,~Rs)
Word32 Q6_R_and_RnR(Word32 Rt, Word32 Rs) 158
Rdd=and(Rss,Rtt)
Word64 Q6_P_and_PP(Word64 Rss, Word64 Rtt) 343
Rdd=and(Rtt,~Rss)
Word64 Q6_P_and_PnP(Word64 Rtt, Word64 Rss) 343
Rx&=and(Rs,~Rt)
Word32 Q6_R_andand_RnR(Word32 Rx, Word32 Rs, Word32 Rt) 347
Rx&=and(Rs,Rt)
Word32 Q6_R_andand_RR(Word32 Rx, Word32 Rs, Word32 Rt) 347
Rx^=and(Rs,~Rt)
Word32 Q6_R_andxacc_RnR(Word32 Rx, Word32 Rs, Word32 Rt) 347
Rx^=and(Rs,Rt)
Word32 Q6_R_andxacc_RR(Word32 Rx, Word32 Rs, Word32 Rt) 347
Rx|=and(Rs,#s10)
Word32 Q6_R_andor_RI(Word32 Rx, Word32 Rs, Word32 Is10) 347
Rx|=and(Rs,~Rt)
Word32 Q6_R_andor_RnR(Word32 Rx, Word32 Rs, Word32 Rt) 347
Rx|=and(Rs,Rt)
Word32 Q6_R_andor_RR(Word32 Rx, Word32 Rs, Word32 Rt) 347

80-N2040-53 Rev. AA 651


Qualcomm Hexagon V73 Programmer’s Reference Manual

any8
Pd=any8(Ps)
Byte Q6_p_any8_p(Byte Ps) 197

asl
Rd=asl(Rs,#u5)
Word32 Q6_R_asl_RI(Word32 Rs, Word32 Iu5) 589
Rd=asl(Rs,#u5):sat
Word32 Q6_R_asl_RI_sat(Word32 Rs, Word32 Iu5) 600
Rd=asl(Rs,Rt)
Word32 Q6_R_asl_RR(Word32 Rs, Word32 Rt) 602
Rd=asl(Rs,Rt):sat
Word32 Q6_R_asl_RR_sat(Word32 Rs, Word32 Rt) 610
Rdd=asl(Rss,#u6)
Word64 Q6_P_asl_PI(Word64 Rss, Word32 Iu6) 590
Rdd=asl(Rss,Rt)
Word64 Q6_P_asl_PR(Word64 Rss, Word32 Rt) 602
Rx&=asl(Rs,#u5)
Word32 Q6_R_asland_RI(Word32 Rx, Word32 Rs, Word32 Iu5) 596
Rx&=asl(Rs,Rt)
Word32 Q6_R_asland_RR(Word32 Rx, Word32 Rs, Word32 Rt) 608
Rx^=asl(Rs,#u5)
Word32 Q6_R_aslxacc_RI(Word32 Rx, Word32 Rs, Word32 Iu5) 596
Rx+=asl(Rs,#u5)
Word32 Q6_R_aslacc_RI(Word32 Rx, Word32 Rs, Word32 Iu5) 592
Rx+=asl(Rs,Rt)
Word32 Q6_R_aslacc_RR(Word32 Rx, Word32 Rs, Word32 Rt) 605
Rx=add(#u8,asl(Rx,#U5))
Word32 Q6_R_add_asl_IRI(Word32 Iu8, Word32 Rx, Word32 IU5) 592
Rx=and(#u8,asl(Rx,#U5))
Word32 Q6_R_and_asl_IRI(Word32 Iu8, Word32 Rx, Word32 IU5) 596
Rx-=asl(Rs,#u5)
Word32 Q6_R_aslnac_RI(Word32 Rx, Word32 Rs, Word32 Iu5) 592
Rx-=asl(Rs,Rt)
Word32 Q6_R_aslnac_RR(Word32 Rx, Word32 Rs, Word32 Rt) 605
Rx=or(#u8,asl(Rx,#U5))
Word32 Q6_R_or_asl_IRI(Word32 Iu8, Word32 Rx, Word32 IU5) 596
Rx=sub(#u8,asl(Rx,#U5))
Word32 Q6_R_sub_asl_IRI(Word32 Iu8, Word32 Rx, Word32 IU5) 592
Rx|=asl(Rs,#u5)
Word32 Q6_R_aslor_RI(Word32 Rx, Word32 Rs, Word32 Iu5) 596
Rx|=asl(Rs,Rt)
Word32 Q6_R_aslor_RR(Word32 Rx, Word32 Rs, Word32 Rt) 608
Rxx&=asl(Rss,#u6)
Word64 Q6_P_asland_PI(Word64 Rxx, Word64 Rss, Word32 Iu6) 596
Rxx&=asl(Rss,Rt)
Word64 Q6_P_asland_PR(Word64 Rxx, Word64 Rss, Word32 Rt) 608
Rxx^=asl(Rss,#u6)
Word64 Q6_P_aslxacc_PI(Word64 Rxx, Word64 Rss, Word32 Iu6) 596
Rxx^=asl(Rss,Rt)
Word64 Q6_P_aslxacc_PR(Word64 Rxx, Word64 Rss, Word32 Rt) 608
Rxx+=asl(Rss,#u6)
Word64 Q6_P_aslacc_PI(Word64 Rxx, Word64 Rss, Word32 Iu6) 592
Rxx+=asl(Rss,Rt)
Word64 Q6_P_aslacc_PR(Word64 Rxx, Word64 Rss, Word32 Rt) 605
Rxx-=asl(Rss,#u6)
Word64 Q6_P_aslnac_PI(Word64 Rxx, Word64 Rss, Word32 Iu6) 592
Rxx-=asl(Rss,Rt)

80-N2040-53 Rev. AA 652


Qualcomm Hexagon V73 Programmer’s Reference Manual

Word64 Q6_P_aslnac_PR(Word64 Rxx, Word64 Rss, Word32 Rt) 605


Rxx|=asl(Rss,#u6)
Word64 Q6_P_aslor_PI(Word64 Rxx, Word64 Rss, Word32 Iu6) 597
Rxx|=asl(Rss,Rt)
Word64 Q6_P_aslor_PR(Word64 Rxx, Word64 Rss, Word32 Rt) 609

aslh
Rd=aslh(Rs)
Word32 Q6_R_aslh_R(Word32 Rs) 176

asr
Rd=asr(Rs,#u5)
Word32 Q6_R_asr_RI(Word32 Rs, Word32 Iu5) 589
Rd=asr(Rs,#u5):rnd
Word32 Q6_R_asr_RI_rnd(Word32 Rs, Word32 Iu5) 599
Rd=asr(Rs,Rt)
Word32 Q6_R_asr_RR(Word32 Rs, Word32 Rt) 602
Rd=asr(Rs,Rt):sat
Word32 Q6_R_asr_RR_sat(Word32 Rs, Word32 Rt) 610
Rdd=asr(Rss,#u6)
Word64 Q6_P_asr_PI(Word64 Rss, Word32 Iu6) 590
Rdd=asr(Rss,#u6):rnd
Word64 Q6_P_asr_PI_rnd(Word64 Rss, Word32 Iu6) 599
Rdd=asr(Rss,Rt)
Word64 Q6_P_asr_PR(Word64 Rss, Word32 Rt) 602
Rx&=asr(Rs,#u5)
Word32 Q6_R_asrand_RI(Word32 Rx, Word32 Rs, Word32 Iu5) 596
Rx&=asr(Rs,Rt)
Word32 Q6_R_asrand_RR(Word32 Rx, Word32 Rs, Word32 Rt) 608
Rx+=asr(Rs,#u5)
Word32 Q6_R_asracc_RI(Word32 Rx, Word32 Rs, Word32 Iu5) 592
Rx+=asr(Rs,Rt)
Word32 Q6_R_asracc_RR(Word32 Rx, Word32 Rs, Word32 Rt) 605
Rx-=asr(Rs,#u5)
Word32 Q6_R_asrnac_RI(Word32 Rx, Word32 Rs, Word32 Iu5) 592
Rx-=asr(Rs,Rt)
Word32 Q6_R_asrnac_RR(Word32 Rx, Word32 Rs, Word32 Rt) 605
Rx|=asr(Rs,#u5)
Word32 Q6_R_asror_RI(Word32 Rx, Word32 Rs, Word32 Iu5) 596
Rx|=asr(Rs,Rt)
Word32 Q6_R_asror_RR(Word32 Rx, Word32 Rs, Word32 Rt) 608
Rxx&=asr(Rss,#u6)
Word64 Q6_P_asrand_PI(Word64 Rxx, Word64 Rss, Word32 Iu6) 596
Rxx&=asr(Rss,Rt)
Word64 Q6_P_asrand_PR(Word64 Rxx, Word64 Rss, Word32 Rt) 608
Rxx^=asr(Rss,Rt)
Word64 Q6_P_asrxacc_PR(Word64 Rxx, Word64 Rss, Word32 Rt) 608
Rxx+=asr(Rss,#u6)
Word64 Q6_P_asracc_PI(Word64 Rxx, Word64 Rss, Word32 Iu6) 592
Rxx+=asr(Rss,Rt)
Word64 Q6_P_asracc_PR(Word64 Rxx, Word64 Rss, Word32 Rt) 605
Rxx-=asr(Rss,#u6)
Word64 Q6_P_asrnac_PI(Word64 Rxx, Word64 Rss, Word32 Iu6) 592
Rxx-=asr(Rss,Rt)
Word64 Q6_P_asrnac_PR(Word64 Rxx, Word64 Rss, Word32 Rt) 605
Rxx|=asr(Rss,#u6)
Word64 Q6_P_asror_PI(Word64 Rxx, Word64 Rss, Word32 Iu6) 597
Rxx|=asr(Rss,Rt)
Word64 Q6_P_asror_PR(Word64 Rxx, Word64 Rss, Word32 Rt) 609

80-N2040-53 Rev. AA 653


Qualcomm Hexagon V73 Programmer’s Reference Manual

asrh
Rd=asrh(Rs)
Word32 Q6_R_asrh_R(Word32 Rs) 176

asrrnd
Rd=asrrnd(Rs,#u5)
Word32 Q6_R_asrrnd_RI(Word32 Rs, Word32 Iu5) 599
Rdd=asrrnd(Rss,#u6)
Word64 Q6_P_asrrnd_PI(Word64 Rss, Word32 Iu6) 599

B
bitsclr
Pd=!bitsclr(Rs,#u6)
Byte Q6_p_not_bitsclr_RI(Word32 Rs, Word32 Iu6) 574
Pd=!bitsclr(Rs,Rt)
Byte Q6_p_not_bitsclr_RR(Word32 Rs, Word32 Rt) 574
Pd=bitsclr(Rs,#u6)
Byte Q6_p_bitsclr_RI(Word32 Rs, Word32 Iu6) 574
Pd=bitsclr(Rs,Rt)
Byte Q6_p_bitsclr_RR(Word32 Rs, Word32 Rt) 574

bitsplit
Rdd=bitsplit(Rs,#u5)
Word64 Q6_P_bitsplit_RI(Word32 Rs, Word32 Iu5) 424
Rdd=bitsplit(Rs,Rt)
Word64 Q6_P_bitsplit_RR(Word32 Rs, Word32 Rt) 424

bitsset
Pd=!bitsset(Rs,Rt)
Byte Q6_p_not_bitsset_RR(Word32 Rs, Word32 Rt) 574
Pd=bitsset(Rs,Rt)
Byte Q6_p_bitsset_RR(Word32 Rs, Word32 Rt) 574

boundscheck
Pd=boundscheck(Rs,Rtt)
Byte Q6_p_boundscheck_RP(Word32 Rs, Word64 Rtt) 567

brev
Rd=brev(Rs)
Word32 Q6_R_brev_R(Word32 Rs) 421
Rdd=brev(Rss)
Word64 Q6_P_brev_P(Word64 Rss) 421

C
cl0
Rd=cl0(Rs)
Word32 Q6_R_cl0_R(Word32 Rs) 410
Rd=cl0(Rss)
Word32 Q6_R_cl0_P(Word64 Rss) 410

cl1
Rd=cl1(Rs)
Word32 Q6_R_cl1_R(Word32 Rs) 410
Rd=cl1(Rss)
Word32 Q6_R_cl1_P(Word64 Rss) 410

clb
Rd=add(clb(Rs),#s6)

80-N2040-53 Rev. AA 654


Qualcomm Hexagon V73 Programmer’s Reference Manual

Word32 Q6_R_add_clb_RI(Word32 Rs, Word32 Is6) 410


Rd=add(clb(Rss),#s6)
Word32 Q6_R_add_clb_PI(Word64 Rss, Word32 Is6) 410
Rd=clb(Rs)
Word32 Q6_R_clb_R(Word32 Rs) 410
Rd=clb(Rss)
Word32 Q6_R_clb_P(Word64 Rss) 410

clip
Rd=clip(Rs,#u5)
Word32 Q6_R_clip_RI(Word32 Rs, Word32 Iu5) 342

clrbit
Rd=clrbit(Rs,#u5)
Word32 Q6_R_clrbit_RI(Word32 Rs, Word32 Iu5) 422
Rd=clrbit(Rs,Rt)
Word32 Q6_R_clrbit_RR(Word32 Rs, Word32 Rt) 422

cmp.eq
Pd=!cmp.eq(Rs,#s10)
Byte Q6_p_not_cmp_eq_RI(Word32 Rs, Word32 Is10) 191
Pd=!cmp.eq(Rs,Rt)
Byte Q6_p_not_cmp_eq_RR(Word32 Rs, Word32 Rt) 191
Pd=cmp.eq(Rs,#s10)
Byte Q6_p_cmp_eq_RI(Word32 Rs, Word32 Is10) 191
Pd=cmp.eq(Rs,Rt)
Byte Q6_p_cmp_eq_RR(Word32 Rs, Word32 Rt) 191
Pd=cmp.eq(Rss,Rtt)
Byte Q6_p_cmp_eq_PP(Word64 Rss, Word64 Rtt) 573
Rd=!cmp.eq(Rs,#s8)
Word32 Q6_R_not_cmp_eq_RI(Word32 Rs, Word32 Is8) 193
Rd=!cmp.eq(Rs,Rt)
Word32 Q6_R_not_cmp_eq_RR(Word32 Rs, Word32 Rt) 193
Rd=cmp.eq(Rs,#s8)
Word32 Q6_R_cmp_eq_RI(Word32 Rs, Word32 Is8) 193
Rd=cmp.eq(Rs,Rt)
Word32 Q6_R_cmp_eq_RR(Word32 Rs, Word32 Rt) 193

cmp.ge
Pd=cmp.ge(Rs,#s8)
Byte Q6_p_cmp_ge_RI(Word32 Rs, Word32 Is8) 191

cmp.geu
Pd=cmp.geu(Rs,#u8)
Byte Q6_p_cmp_geu_RI(Word32 Rs, Word32 Iu8) 191

cmp.gt
Pd=!cmp.gt(Rs,#s10)
Byte Q6_p_not_cmp_gt_RI(Word32 Rs, Word32 Is10) 191
Pd=!cmp.gt(Rs,Rt)
Byte Q6_p_not_cmp_gt_RR(Word32 Rs, Word32 Rt) 191
Pd=cmp.gt(Rs,#s10)
Byte Q6_p_cmp_gt_RI(Word32 Rs, Word32 Is10) 191
Pd=cmp.gt(Rs,Rt)
Byte Q6_p_cmp_gt_RR(Word32 Rs, Word32 Rt) 192
Pd=cmp.gt(Rss,Rtt)
Byte Q6_p_cmp_gt_PP(Word64 Rss, Word64 Rtt) 573

80-N2040-53 Rev. AA 655


Qualcomm Hexagon V73 Programmer’s Reference Manual

cmp.gtu
Pd=!cmp.gtu(Rs,#u9)
Byte Q6_p_not_cmp_gtu_RI(Word32 Rs, Word32 Iu9) 191
Pd=!cmp.gtu(Rs,Rt)
Byte Q6_p_not_cmp_gtu_RR(Word32 Rs, Word32 Rt) 191
Pd=cmp.gtu(Rs,#u9)
Byte Q6_p_cmp_gtu_RI(Word32 Rs, Word32 Iu9) 192
Pd=cmp.gtu(Rs,Rt)
Byte Q6_p_cmp_gtu_RR(Word32 Rs, Word32 Rt) 192
Pd=cmp.gtu(Rss,Rtt)
Byte Q6_p_cmp_gtu_PP(Word64 Rss, Word64 Rtt) 573

cmp.lt
Pd=cmp.lt(Rs,Rt)
Byte Q6_p_cmp_lt_RR(Word32 Rs, Word32 Rt) 192

cmp.ltu
Pd=cmp.ltu(Rs,Rt)
Byte Q6_p_cmp_ltu_RR(Word32 Rs, Word32 Rt) 192

cmpb.eq
Pd=cmpb.eq(Rs,#u8)
Byte Q6_p_cmpb_eq_RI(Word32 Rs, Word32 Iu8) 569
Pd=cmpb.eq(Rs,Rt)
Byte Q6_p_cmpb_eq_RR(Word32 Rs, Word32 Rt) 569

cmpb.gt
Pd=cmpb.gt(Rs,#s8)
Byte Q6_p_cmpb_gt_RI(Word32 Rs, Word32 Is8) 569
Pd=cmpb.gt(Rs,Rt)
Byte Q6_p_cmpb_gt_RR(Word32 Rs, Word32 Rt) 569

cmpb.gtu
Pd=cmpb.gtu(Rs,#u7)
Byte Q6_p_cmpb_gtu_RI(Word32 Rs, Word32 Iu7) 569
Pd=cmpb.gtu(Rs,Rt)
Byte Q6_p_cmpb_gtu_RR(Word32 Rs, Word32 Rt) 569

cmph.eq
Pd=cmph.eq(Rs,#s8)
Byte Q6_p_cmph_eq_RI(Word32 Rs, Word32 Is8) 571
Pd=cmph.eq(Rs,Rt)
Byte Q6_p_cmph_eq_RR(Word32 Rs, Word32 Rt) 571

cmph.gt
Pd=cmph.gt(Rs,#s8)
Byte Q6_p_cmph_gt_RI(Word32 Rs, Word32 Is8) 571
Pd=cmph.gt(Rs,Rt)
Byte Q6_p_cmph_gt_RR(Word32 Rs, Word32 Rt) 571

cmph.gtu
Pd=cmph.gtu(Rs,#u7)
Byte Q6_p_cmph_gtu_RI(Word32 Rs, Word32 Iu7) 571
Pd=cmph.gtu(Rs,Rt)
Byte Q6_p_cmph_gtu_RR(Word32 Rs, Word32 Rt) 571

cmpy
Rd=cmpy(Rs,Rt):<<1:rnd:sat

80-N2040-53 Rev. AA 656


Qualcomm Hexagon V73 Programmer’s Reference Manual

Word32 Q6_R_cmpy_RR_s1_rnd_sat(Word32 Rs, Word32 Rt) 440


Rd=cmpy(Rs,Rt):rnd:sat
Word32 Q6_R_cmpy_RR_rnd_sat(Word32 Rs, Word32 Rt) 440
Rd=cmpy(Rs,Rt*):<<1:rnd:sat
Word32 Q6_R_cmpy_RR_conj_s1_rnd_sat(Word32 Rs, Word32 Rt) 440
Rd=cmpy(Rs,Rt*):rnd:sat
Word32 Q6_R_cmpy_RR_conj_rnd_sat(Word32 Rs, Word32 Rt) 440
Rdd=cmpy(Rs,Rt):<<1:sat
Word64 Q6_P_cmpy_RR_s1_sat(Word32 Rs, Word32 Rt) 435
Rdd=cmpy(Rs,Rt):sat
Word64 Q6_P_cmpy_RR_sat(Word32 Rs, Word32 Rt) 435
Rdd=cmpy(Rs,Rt*):<<1:sat
Word64 Q6_P_cmpy_RR_conj_s1_sat(Word32 Rs, Word32 Rt) 435
Rdd=cmpy(Rs,Rt*):sat
Word64 Q6_P_cmpy_RR_conj_sat(Word32 Rs, Word32 Rt) 435
Rxx+=cmpy(Rs,Rt):<<1:sat
Word64 Q6_P_cmpyacc_RR_s1_sat(Word64 Rxx, Word32 Rs, Word32 Rt) 435
Rxx+=cmpy(Rs,Rt):sat
Word64 Q6_P_cmpyacc_RR_sat(Word64 Rxx, Word32 Rs, Word32 Rt) 435
Rxx+=cmpy(Rs,Rt*):<<1:sat
Word64 Q6_P_cmpyacc_RR_conj_s1_sat(Word64 Rxx, Word32 Rs, Word32 Rt) 435
Rxx+=cmpy(Rs,Rt*):sat
Word64 Q6_P_cmpyacc_RR_conj_sat(Word64 Rxx, Word32 Rs, Word32 Rt) 435
Rxx-=cmpy(Rs,Rt):<<1:sat
Word64 Q6_P_cmpynac_RR_s1_sat(Word64 Rxx, Word32 Rs, Word32 Rt) 435
Rxx-=cmpy(Rs,Rt):sat
Word64 Q6_P_cmpynac_RR_sat(Word64 Rxx, Word32 Rs, Word32 Rt) 435
Rxx-=cmpy(Rs,Rt*):<<1:sat
Word64 Q6_P_cmpynac_RR_conj_s1_sat(Word64 Rxx, Word32 Rs, Word32 Rt) 435
Rxx-=cmpy(Rs,Rt*):sat
Word64 Q6_P_cmpynac_RR_conj_sat(Word64 Rxx, Word32 Rs, Word32 Rt) 435

cmpyi
Rdd=cmpyi(Rs,Rt)
Word64 Q6_P_cmpyi_RR(Word32 Rs, Word32 Rt) 438
Rxx+=cmpyi(Rs,Rt)
Word64 Q6_P_cmpyiacc_RR(Word64 Rxx, Word32 Rs, Word32 Rt) 438

cmpyiw
Rd=cmpyiw(Rss,Rtt):<<1:rnd:sat
Word32 Q6_R_cmpyiw_PP_s1_rnd_sat(Word64 Rss, Word64 Rtt) 445
Rd=cmpyiw(Rss,Rtt):<<1:sat
Word32 Q6_R_cmpyiw_PP_s1_sat(Word64 Rss, Word64 Rtt) 445
Rd=cmpyiw(Rss,Rtt*):<<1:rnd:sat
Word32 Q6_R_cmpyiw_PP_conj_s1_rnd_sat(Word64 Rss, Word64 Rtt) 445
Rd=cmpyiw(Rss,Rtt*):<<1:sat
Word32 Q6_R_cmpyiw_PP_conj_s1_sat(Word64 Rss, Word64 Rtt) 445
Rdd=cmpyiw(Rss,Rtt)
Word64 Q6_P_cmpyiw_PP(Word64 Rss, Word64 Rtt) 445
Rdd=cmpyiw(Rss,Rtt*)
Word64 Q6_P_cmpyiw_PP_conj(Word64 Rss, Word64 Rtt) 445
Rxx+=cmpyiw(Rss,Rtt)
Word64 Q6_P_cmpyiwacc_PP(Word64 Rxx, Word64 Rss, Word64 Rtt) 445
Rxx+=cmpyiw(Rss,Rtt*)
Word64 Q6_P_cmpyiwacc_PP_conj(Word64 Rxx, Word64 Rss, Word64 Rtt) 445

cmpyiwh
Rd=cmpyiwh(Rss,Rt):<<1:rnd:sat

80-N2040-53 Rev. AA 657


Qualcomm Hexagon V73 Programmer’s Reference Manual

Word32 Q6_R_cmpyiwh_PR_s1_rnd_sat(Word64 Rss, Word32 Rt) 442


Rd=cmpyiwh(Rss,Rt*):<<1:rnd:sat
Word32 Q6_R_cmpyiwh_PR_conj_s1_rnd_sat(Word64 Rss, Word32 Rt) 442

cmpyr
Rdd=cmpyr(Rs,Rt)
Word64 Q6_P_cmpyr_RR(Word32 Rs, Word32 Rt) 438
Rxx+=cmpyr(Rs,Rt)
Word64 Q6_P_cmpyracc_RR(Word64 Rxx, Word32 Rs, Word32 Rt) 438

cmpyrw
Rd=cmpyrw(Rss,Rtt):<<1:rnd:sat
Word32 Q6_R_cmpyrw_PP_s1_rnd_sat(Word64 Rss, Word64 Rtt) 445
Rd=cmpyrw(Rss,Rtt):<<1:sat
Word32 Q6_R_cmpyrw_PP_s1_sat(Word64 Rss, Word64 Rtt) 445
Rd=cmpyrw(Rss,Rtt*):<<1:rnd:sat
Word32 Q6_R_cmpyrw_PP_conj_s1_rnd_sat(Word64 Rss, Word64 Rtt) 445
Rd=cmpyrw(Rss,Rtt*):<<1:sat
Word32 Q6_R_cmpyrw_PP_conj_s1_sat(Word64 Rss, Word64 Rtt) 445
Rdd=cmpyrw(Rss,Rtt)
Word64 Q6_P_cmpyrw_PP(Word64 Rss, Word64 Rtt) 445
Rdd=cmpyrw(Rss,Rtt*)
Word64 Q6_P_cmpyrw_PP_conj(Word64 Rss, Word64 Rtt) 445
Rxx+=cmpyrw(Rss,Rtt)
Word64 Q6_P_cmpyrwacc_PP(Word64 Rxx, Word64 Rss, Word64 Rtt) 445
Rxx+=cmpyrw(Rss,Rtt*)
Word64 Q6_P_cmpyrwacc_PP_conj(Word64 Rxx, Word64 Rss, Word64 Rtt) 445

cmpyrwh
Rd=cmpyrwh(Rss,Rt):<<1:rnd:sat
Word32 Q6_R_cmpyrwh_PR_s1_rnd_sat(Word64 Rss, Word32 Rt) 442
Rd=cmpyrwh(Rss,Rt*):<<1:rnd:sat
Word32 Q6_R_cmpyrwh_PR_conj_s1_rnd_sat(Word64 Rss, Word32 Rt) 442

combine
Rd=combine(Rt.H,Rs.H)
Word32 Q6_R_combine_RhRh(Word32 Rt, Word32 Rs) 173
Rd=combine(Rt.H,Rs.L)
Word32 Q6_R_combine_RhRl(Word32 Rt, Word32 Rs) 173
Rd=combine(Rt.L,Rs.H)
Word32 Q6_R_combine_RlRh(Word32 Rt, Word32 Rs) 173
Rd=combine(Rt.L,Rs.L)
Word32 Q6_R_combine_RlRl(Word32 Rt, Word32 Rs) 173
Rdd=combine(#s8,#S8)
Word64 Q6_P_combine_II(Word32 Is8, Word32 IS8) 173
Rdd=combine(#s8,Rs)
Word64 Q6_P_combine_IR(Word32 Is8, Word32 Rs) 173
Rdd=combine(Rs,#s8)
Word64 Q6_P_combine_RI(Word32 Rs, Word32 Is8) 173
Rdd=combine(Rs,Rt)
Word64 Q6_P_combine_RR(Word32 Rs, Word32 Rt) 173

convert_d2df
Rdd=convert_d2df(Rss)
Word64 Q6_P_convert_d2df_P(Word64 Rss) 467

convert_d2sf
Rd=convert_d2sf(Rss)
Word32 Q6_R_convert_d2sf_P(Word64 Rss) 467

80-N2040-53 Rev. AA 658


Qualcomm Hexagon V73 Programmer’s Reference Manual

convert_df2d
Rdd=convert_df2d(Rss)
Word64 Q6_P_convert_df2d_P(Word64 Rss) 470
Rdd=convert_df2d(Rss):chop
Word64 Q6_P_convert_df2d_P_chop(Word64 Rss) 470

convert_df2sf
Rd=convert_df2sf(Rss)
Word32 Q6_R_convert_df2sf_P(Word64 Rss) 466

convert_df2ud
Rdd=convert_df2ud(Rss)
Word64 Q6_P_convert_df2ud_P(Word64 Rss) 470
Rdd=convert_df2ud(Rss):chop
Word64 Q6_P_convert_df2ud_P_chop(Word64 Rss) 470

convert_df2uw
Rd=convert_df2uw(Rss)
Word32 Q6_R_convert_df2uw_P(Word64 Rss) 470
Rd=convert_df2uw(Rss):chop
Word32 Q6_R_convert_df2uw_P_chop(Word64 Rss) 470

convert_df2w
Rd=convert_df2w(Rss)
Word32 Q6_R_convert_df2w_P(Word64 Rss) 470
Rd=convert_df2w(Rss):chop
Word32 Q6_R_convert_df2w_P_chop(Word64 Rss) 470

convert_sf2d
Rdd=convert_sf2d(Rs)
Word64 Q6_P_convert_sf2d_R(Word32 Rs) 470
Rdd=convert_sf2d(Rs):chop
Word64 Q6_P_convert_sf2d_R_chop(Word32 Rs) 470

convert_sf2df
Rdd=convert_sf2df(Rs)
Word64 Q6_P_convert_sf2df_R(Word32 Rs) 466

convert_sf2ud
Rdd=convert_sf2ud(Rs)
Word64 Q6_P_convert_sf2ud_R(Word32 Rs) 470
Rdd=convert_sf2ud(Rs):chop
Word64 Q6_P_convert_sf2ud_R_chop(Word32 Rs) 470

convert_sf2uw
Rd=convert_sf2uw(Rs)
Word32 Q6_R_convert_sf2uw_R(Word32 Rs) 470
Rd=convert_sf2uw(Rs):chop
Word32 Q6_R_convert_sf2uw_R_chop(Word32 Rs) 470

convert_sf2w
Rd=convert_sf2w(Rs)
Word32 Q6_R_convert_sf2w_R(Word32 Rs) 470
Rd=convert_sf2w(Rs):chop
Word32 Q6_R_convert_sf2w_R_chop(Word32 Rs) 470

convert_ud2df
Rdd=convert_ud2df(Rss)
Word64 Q6_P_convert_ud2df_P(Word64 Rss) 467

80-N2040-53 Rev. AA 659


Qualcomm Hexagon V73 Programmer’s Reference Manual

convert_ud2sf
Rd=convert_ud2sf(Rss)
Word32 Q6_R_convert_ud2sf_P(Word64 Rss) 467

convert_uw2df
Rdd=convert_uw2df(Rs)
Word64 Q6_P_convert_uw2df_R(Word32 Rs) 467

convert_uw2sf
Rd=convert_uw2sf(Rs)
Word32 Q6_R_convert_uw2sf_R(Word32 Rs) 467

convert_w2df
Rdd=convert_w2df(Rs)
Word64 Q6_P_convert_w2df_R(Word32 Rs) 467

convert_w2sf
Rd=convert_w2sf(Rs)
Word32 Q6_R_convert_w2sf_R(Word32 Rs) 467

cround
Rd=cround(Rs,#u5)
Word32 Q6_R_cround_RI(Word32 Rs, Word32 Iu5) 357
Rd=cround(Rs,Rt)
Word32 Q6_R_cround_RR(Word32 Rs, Word32 Rt) 357
Rdd=cround(Rss,#u6)
Word64 Q6_P_cround_PI(Word64 Rss, Word32 Iu6) 357
Rdd=cround(Rss,Rt)
Word64 Q6_P_cround_PR(Word64 Rss, Word32 Rt) 357

ct0
Rd=ct0(Rs)
Word32 Q6_R_ct0_R(Word32 Rs) 412
Rd=ct0(Rss)
Word32 Q6_R_ct0_P(Word64 Rss) 412

ct1
Rd=ct1(Rs)
Word32 Q6_R_ct1_R(Word32 Rs) 412
Rd=ct1(Rss)
Word32 Q6_R_ct1_P(Word64 Rss) 412

D
dccleana
dccleana(Rs)
void Q6_dccleana_A(Address a) 320

dccleaninva
dccleaninva(Rs)
void Q6_dccleaninva_A(Address a) 320

dcfetch
dcfetch(Rs)
void Q6_dcfetch_A(Address a) 319

dcinva
dcinva(Rs)
void Q6_dcinva_A(Address a) 320

80-N2040-53 Rev. AA 660


Qualcomm Hexagon V73 Programmer’s Reference Manual

dczeroa
dczeroa(Rs)
void Q6_dczeroa_A(Address a) 316

deinterleave
Rdd=deinterleave(Rss)
Word64 Q6_P_deinterleave_P(Word64 Rss) 418

dfadd
Rdd=dfadd(Rss,Rtt)
Word64 Q6_P_dfadd_PP(Word64 Rss, Word64 Rtt) 461

dfclass
Pd=dfclass(Rss,#u5)
Byte Q6_p_dfclass_PI(Word64 Rss, Word32 Iu5) 462

dfcmp.eq
Pd=dfcmp.eq(Rss,Rtt)
Byte Q6_p_dfcmp_eq_PP(Word64 Rss, Word64 Rtt) 464

dfcmp.ge
Pd=dfcmp.ge(Rss,Rtt)
Byte Q6_p_dfcmp_ge_PP(Word64 Rss, Word64 Rtt) 464

dfcmp.gt
Pd=dfcmp.gt(Rss,Rtt)
Byte Q6_p_dfcmp_gt_PP(Word64 Rss, Word64 Rtt) 464

dfcmp.uo
Pd=dfcmp.uo(Rss,Rtt)
Byte Q6_p_dfcmp_uo_PP(Word64 Rss, Word64 Rtt) 464

dfmake
Rdd=dfmake(#u10):neg
Word64 Q6_P_dfmake_I_neg(Word32 Iu10) 478
Rdd=dfmake(#u10):pos
Word64 Q6_P_dfmake_I_pos(Word32 Iu10) 478

dfmax
Rdd=dfmax(Rss,Rtt)
Word64 Q6_P_dfmax_PP(Word64 Rss, Word64 Rtt) 479

dfmin
Rdd=dfmin(Rss,Rtt)
Word64 Q6_P_dfmin_PP(Word64 Rss, Word64 Rtt) 480

dfmpyfix
Rdd=dfmpyfix(Rss,Rtt)
Word64 Q6_P_dfmpyfix_PP(Word64 Rss, Word64 Rtt) 481

dfmpyhh
Rxx+=dfmpyhh(Rss,Rtt)
Word64 Q6_P_dfmpyhhacc_PP(Word64 Rxx, Word64 Rss, Word64 Rtt) 473

dfmpylh
Rxx+=dfmpylh(Rss,Rtt)
Word64 Q6_P_dfmpylhacc_PP(Word64 Rxx, Word64 Rss, Word64 Rtt) 473

dfmpyll
Rdd=dfmpyll(Rss,Rtt)

80-N2040-53 Rev. AA 661


Qualcomm Hexagon V73 Programmer’s Reference Manual

Word64 Q6_P_dfmpyll_PP(Word64 Rss, Word64 Rtt) 481

dfsub
Rdd=dfsub(Rss,Rtt)
Word64 Q6_P_dfsub_PP(Word64 Rss, Word64 Rtt) 483

dmsyncht
Rd=dmsyncht
Word32 Q6_R_dmsyncht() 329

E
extract
Rd=extract(Rs,#u5,#U5)
Word32 Q6_R_extract_RII(Word32 Rs, Word32 Iu5, Word32 IU5) 414
Rd=extract(Rs,Rtt)
Word32 Q6_R_extract_RP(Word32 Rs, Word64 Rtt) 414
Rdd=extract(Rss,#u6,#U6)
Word64 Q6_P_extract_PII(Word64 Rss, Word32 Iu6, Word32 IU6) 414
Rdd=extract(Rss,Rtt)
Word64 Q6_P_extract_PP(Word64 Rss, Word64 Rtt) 414

extractu
Rd=extractu(Rs,#u5,#U5)
Word32 Q6_R_extractu_RII(Word32 Rs, Word32 Iu5, Word32 IU5) 414
Rd=extractu(Rs,Rtt)
Word32 Q6_R_extractu_RP(Word32 Rs, Word64 Rtt) 414
Rdd=extractu(Rss,#u6,#U6)
Word64 Q6_P_extractu_PII(Word64 Rss, Word32 Iu6, Word32 IU6) 414
Rdd=extractu(Rss,Rtt)
Word64 Q6_P_extractu_PP(Word64 Rss, Word64 Rtt) 414

F
fastcorner9
Pd=!fastcorner9(Ps,Pt)
Byte Q6_p_not_fastcorner9_pp(Byte Ps, Byte Pt) 196
Pd=fastcorner9(Ps,Pt)
Byte Q6_p_fastcorner9_pp(Byte Ps, Byte Pt) 196

I
insert
Rx=insert(Rs,#u5,#U5)
Word32 Q6_R_insert_RII(Word32 Rx, Word32 Rs, Word32 Iu5, Word32 IU5) 417
Rx=insert(Rs,Rtt)
Word32 Q6_R_insert_RP(Word32 Rx, Word32 Rs, Word64 Rtt) 417
Rxx=insert(Rss,#u6,#U6)
Word64 Q6_P_insert_PII(Word64 Rxx, Word64 Rss, Word32 Iu6, Word32 IU6) 417
Rxx=insert(Rss,Rtt)
Word64 Q6_P_insert_PP(Word64 Rxx, Word64 Rss, Word64 Rtt) 417

interleave
Rdd=interleave(Rss)
Word64 Q6_P_interleave_P(Word64 Rss) 418

L
l2fetch
l2fetch(Rs,Rt)

80-N2040-53 Rev. AA 662


Qualcomm Hexagon V73 Programmer’s Reference Manual

void Q6_l2fetch_AR(Address a, Word32 Rt) 326


l2fetch(Rs,Rtt)
void Q6_l2fetch_AP(Address a, Word64 Rtt) 326

lfs
Rdd=lfs(Rss,Rtt)
Word64 Q6_P_lfs_PP(Word64 Rss, Word64 Rtt) 419

lsl
Rd=lsl(#s6,Rt)
Word32 Q6_R_lsl_IR(Word32 Is6, Word32 Rt) 602
Rd=lsl(Rs,Rt)
Word32 Q6_R_lsl_RR(Word32 Rs, Word32 Rt) 602
Rdd=lsl(Rss,Rt)
Word64 Q6_P_lsl_PR(Word64 Rss, Word32 Rt) 602
Rx&=lsl(Rs,Rt)
Word32 Q6_R_lsland_RR(Word32 Rx, Word32 Rs, Word32 Rt) 608
Rx+=lsl(Rs,Rt)
Word32 Q6_R_lslacc_RR(Word32 Rx, Word32 Rs, Word32 Rt) 605
Rx-=lsl(Rs,Rt)
Word32 Q6_R_lslnac_RR(Word32 Rx, Word32 Rs, Word32 Rt) 605
Rx|=lsl(Rs,Rt)
Word32 Q6_R_lslor_RR(Word32 Rx, Word32 Rs, Word32 Rt) 608
Rxx&=lsl(Rss,Rt)
Word64 Q6_P_lsland_PR(Word64 Rxx, Word64 Rss, Word32 Rt) 608
Rxx^=lsl(Rss,Rt)
Word64 Q6_P_lslxacc_PR(Word64 Rxx, Word64 Rss, Word32 Rt) 608
Rxx+=lsl(Rss,Rt)
Word64 Q6_P_lslacc_PR(Word64 Rxx, Word64 Rss, Word32 Rt) 605
Rxx-=lsl(Rss,Rt)
Word64 Q6_P_lslnac_PR(Word64 Rxx, Word64 Rss, Word32 Rt) 605
Rxx|=lsl(Rss,Rt)
Word64 Q6_P_lslor_PR(Word64 Rxx, Word64 Rss, Word32 Rt) 609

lsr
Rd=lsr(Rs,#u5)
Word32 Q6_R_lsr_RI(Word32 Rs, Word32 Iu5) 589
Rd=lsr(Rs,Rt)
Word32 Q6_R_lsr_RR(Word32 Rs, Word32 Rt) 602
Rdd=lsr(Rss,#u6)
Word64 Q6_P_lsr_PI(Word64 Rss, Word32 Iu6) 590
Rdd=lsr(Rss,Rt)
Word64 Q6_P_lsr_PR(Word64 Rss, Word32 Rt) 602
Rx&=lsr(Rs,#u5)
Word32 Q6_R_lsrand_RI(Word32 Rx, Word32 Rs, Word32 Iu5) 596
Rx&=lsr(Rs,Rt)
Word32 Q6_R_lsrand_RR(Word32 Rx, Word32 Rs, Word32 Rt) 608
Rx^=lsr(Rs,#u5)
Word32 Q6_R_lsrxacc_RI(Word32 Rx, Word32 Rs, Word32 Iu5) 596
Rx+=lsr(Rs,#u5)
Word32 Q6_R_lsracc_RI(Word32 Rx, Word32 Rs, Word32 Iu5) 592
Rx+=lsr(Rs,Rt)
Word32 Q6_R_lsracc_RR(Word32 Rx, Word32 Rs, Word32 Rt) 605
Rx=add(#u8,lsr(Rx,#U5))
Word32 Q6_R_add_lsr_IRI(Word32 Iu8, Word32 Rx, Word32 IU5) 592
Rx=and(#u8,lsr(Rx,#U5))
Word32 Q6_R_and_lsr_IRI(Word32 Iu8, Word32 Rx, Word32 IU5) 596
Rx-=lsr(Rs,#u5)

80-N2040-53 Rev. AA 663


Qualcomm Hexagon V73 Programmer’s Reference Manual

Word32 Q6_R_lsrnac_RI(Word32 Rx, Word32 Rs, Word32 Iu5) 592


Rx-=lsr(Rs,Rt)
Word32 Q6_R_lsrnac_RR(Word32 Rx, Word32 Rs, Word32 Rt) 605
Rx=or(#u8,lsr(Rx,#U5))
Word32 Q6_R_or_lsr_IRI(Word32 Iu8, Word32 Rx, Word32 IU5) 596
Rx=sub(#u8,lsr(Rx,#U5))
Word32 Q6_R_sub_lsr_IRI(Word32 Iu8, Word32 Rx, Word32 IU5) 592
Rx|=lsr(Rs,#u5)
Word32 Q6_R_lsror_RI(Word32 Rx, Word32 Rs, Word32 Iu5) 596
Rx|=lsr(Rs,Rt)
Word32 Q6_R_lsror_RR(Word32 Rx, Word32 Rs, Word32 Rt) 608
Rxx&=lsr(Rss,#u6)
Word64 Q6_P_lsrand_PI(Word64 Rxx, Word64 Rss, Word32 Iu6) 596
Rxx&=lsr(Rss,Rt)
Word64 Q6_P_lsrand_PR(Word64 Rxx, Word64 Rss, Word32 Rt) 608
Rxx^=lsr(Rss,#u6)
Word64 Q6_P_lsrxacc_PI(Word64 Rxx, Word64 Rss, Word32 Iu6) 596
Rxx^=lsr(Rss,Rt)
Word64 Q6_P_lsrxacc_PR(Word64 Rxx, Word64 Rss, Word32 Rt) 609
Rxx+=lsr(Rss,#u6)
Word64 Q6_P_lsracc_PI(Word64 Rxx, Word64 Rss, Word32 Iu6) 592
Rxx+=lsr(Rss,Rt)
Word64 Q6_P_lsracc_PR(Word64 Rxx, Word64 Rss, Word32 Rt) 605
Rxx-=lsr(Rss,#u6)
Word64 Q6_P_lsrnac_PI(Word64 Rxx, Word64 Rss, Word32 Iu6) 592
Rxx-=lsr(Rss,Rt)
Word64 Q6_P_lsrnac_PR(Word64 Rxx, Word64 Rss, Word32 Rt) 605
Rxx|=lsr(Rss,#u6)
Word64 Q6_P_lsror_PI(Word64 Rxx, Word64 Rss, Word32 Iu6) 597
Rxx|=lsr(Rss,Rt)
Word64 Q6_P_lsror_PR(Word64 Rxx, Word64 Rss, Word32 Rt) 609

M
mask
Rd=mask(#u5,#U5)
Word32 Q6_R_mask_II(Word32 Iu5, Word32 IU5) 588
Rdd=mask(Pt)
Word64 Q6_P_mask_p(Byte Pt) 575

max
Rd=max(Rs,Rt)
Word32 Q6_R_max_RR(Word32 Rs, Word32 Rt) 349
Rdd=max(Rss,Rtt)
Word64 Q6_P_max_PP(Word64 Rss, Word64 Rtt) 350

maxu
Rd=maxu(Rs,Rt)
UWord32 Q6_R_maxu_RR(Word32 Rs, Word32 Rt) 349
Rdd=maxu(Rss,Rtt)
UWord64 Q6_P_maxu_PP(Word64 Rss, Word64 Rtt) 350

memb
memb(Rx++#s4:0:circ(Mu))=Rt
void Q6_memb_IMR_circ(void** StartAddress, Word32 Is4_0, Word32 Mu, Word32
Rt, void* BaseAddress) 293
memb(Rx++I:circ(Mu))=Rt
void Q6_memb_MR_circ(void** StartAddress, Word32 Mu, Word32 Rt, void*

80-N2040-53 Rev. AA 664


Qualcomm Hexagon V73 Programmer’s Reference Manual

BaseAddress) 293
Rd=memb(Rx++#s4:0:circ(Mu))
Word32 Q6_R_memb_IM_circ(void** StartAddress, Word32 Is4_0, Word32 Mu,
void* BaseAddress) 227
Rd=memb(Rx++I:circ(Mu))
Word32 Q6_R_memb_M_circ(void** StartAddress, Word32 Mu, void* BaseAddress)
227

memd
memd(Rx++#s4:3:circ(Mu))=Rtt
void Q6_memd_IMP_circ(void** StartAddress, Word32 Is4_3, Word32 Mu, Word64
Rtt, void* BaseAddress) 289
memd(Rx++I:circ(Mu))=Rtt
void Q6_memd_MP_circ(void** StartAddress, Word32 Mu, Word64 Rtt, void*
BaseAddress) 289
Rdd=memd(Rx++#s4:3:circ(Mu))
Word32 Q6_R_memd_IM_circ(void** StartAddress, Word32 Is4_3, Word32 Mu,
void* BaseAddress) 223
Rdd=memd(Rx++I:circ(Mu))
Word32 Q6_R_memd_M_circ(void** StartAddress, Word32 Mu, void* BaseAddress)
223

memh
memh(Rx++#s4:1:circ(Mu))=Rt
void Q6_memh_IMR_circ(void** StartAddress, Word32 Is4_1, Word32 Mu, Word32
Rt, void* BaseAddress) 299
memh(Rx++#s4:1:circ(Mu))=Rt.H
void Q6_memh_IMRh_circ(void** StartAddress, Word32 Is4_1, Word32 Mu, Word32
Rt, void* BaseAddress) 299
memh(Rx++I:circ(Mu))=Rt
void Q6_memh_MR_circ(void** StartAddress, Word32 Mu, Word32 Rt, void*
BaseAddress) 299
memh(Rx++I:circ(Mu))=Rt.H
void Q6_memh_MRh_circ(void** StartAddress, Word32 Mu, Word32 Rt, void*
BaseAddress) 299
Rd=memh(Rx++#s4:1:circ(Mu))
Word32 Q6_R_memh_IM_circ(void** StartAddress, Word32 Is4_1, Word32 Mu,
void* BaseAddress) 237
Rd=memh(Rx++I:circ(Mu))
Word32 Q6_R_memh_M_circ(void** StartAddress, Word32 Mu, void* BaseAddress)
237

memub
Rd=memub(Rx++#s4:0:circ(Mu))
Word32 Q6_R_memub_IM_circ(void** StartAddress, Word32 Is4_0, Word32 Mu,
void* BaseAddress) 243
Rd=memub(Rx++I:circ(Mu))
Word32 Q6_R_memub_M_circ(void** StartAddress, Word32 Mu, void*
BaseAddress) 243

memuh
Rd=memuh(Rx++#s4:1:circ(Mu))
Word32 Q6_R_memuh_IM_circ(void** StartAddress, Word32 Is4_1, Word32 Mu,
void* BaseAddress) 247
Rd=memuh(Rx++I:circ(Mu))
Word32 Q6_R_memuh_M_circ(void** StartAddress, Word32 Mu, void*
BaseAddress) 247

80-N2040-53 Rev. AA 665


Qualcomm Hexagon V73 Programmer’s Reference Manual

memw
memw(Rx++#s4:2:circ(Mu))=Rt
void Q6_memw_IMR_circ(void** StartAddress, Word32 Is4_2, Word32 Mu, Word32
Rt, void* BaseAddress) 306
memw(Rx++I:circ(Mu))=Rt
void Q6_memw_MR_circ(void** StartAddress, Word32 Mu, Word32 Rt, void*
BaseAddress) 306
Rd=memw(Rx++#s4:2:circ(Mu))
Word32 Q6_R_memw_IM_circ(void** StartAddress, Word32 Is4_2, Word32 Mu,
void* BaseAddress) 251
Rd=memw(Rx++I:circ(Mu))
Word32 Q6_R_memw_M_circ(void** StartAddress, Word32 Mu, void* BaseAddress)
251

min
Rd=min(Rt,Rs)
Word32 Q6_R_min_RR(Word32 Rt, Word32 Rs) 351
Rdd=min(Rtt,Rss)
Word64 Q6_P_min_PP(Word64 Rtt, Word64 Rss) 352

minu
Rd=minu(Rt,Rs)
UWord32 Q6_R_minu_RR(Word32 Rt, Word32 Rs) 351
Rdd=minu(Rtt,Rss)
UWord64 Q6_P_minu_PP(Word64 Rtt, Word64 Rss) 352

modwrap
Rd=modwrap(Rs,Rt)
Word32 Q6_R_modwrap_RR(Word32 Rs, Word32 Rt) 353

mpy
Rd=mpy(Rs,Rt.H):<<1:rnd:sat
Word32 Q6_R_mpy_RRh_s1_rnd_sat(Word32 Rs, Word32 Rt) 513
Rd=mpy(Rs,Rt.H):<<1:sat
Word32 Q6_R_mpy_RRh_s1_sat(Word32 Rs, Word32 Rt) 513
Rd=mpy(Rs,Rt.L):<<1:rnd:sat
Word32 Q6_R_mpy_RRl_s1_rnd_sat(Word32 Rs, Word32 Rt) 513
Rd=mpy(Rs,Rt.L):<<1:sat
Word32 Q6_R_mpy_RRl_s1_sat(Word32 Rs, Word32 Rt) 513
Rd=mpy(Rs,Rt)
Word32 Q6_R_mpy_RR(Word32 Rs, Word32 Rt) 513
Rd=mpy(Rs,Rt):<<1
Word32 Q6_R_mpy_RR_s1(Word32 Rs, Word32 Rt) 513
Rd=mpy(Rs,Rt):<<1:sat
Word32 Q6_R_mpy_RR_s1_sat(Word32 Rs, Word32 Rt) 513
Rd=mpy(Rs,Rt):rnd
Word32 Q6_R_mpy_RR_rnd(Word32 Rs, Word32 Rt) 513
Rd=mpy(Rs.H,Rt.H)
Word32 Q6_R_mpy_RhRh(Word32 Rs, Word32 Rt) 497
Rd=mpy(Rs.H,Rt.H):<<1
Word32 Q6_R_mpy_RhRh_s1(Word32 Rs, Word32 Rt) 497
Rd=mpy(Rs.H,Rt.H):<<1:rnd
Word32 Q6_R_mpy_RhRh_s1_rnd(Word32 Rs, Word32 Rt) 497
Rd=mpy(Rs.H,Rt.H):<<1:rnd:sat
Word32 Q6_R_mpy_RhRh_s1_rnd_sat(Word32 Rs, Word32 Rt) 497
Rd=mpy(Rs.H,Rt.H):<<1:sat
Word32 Q6_R_mpy_RhRh_s1_sat(Word32 Rs, Word32 Rt) 497
Rd=mpy(Rs.H,Rt.H):rnd

80-N2040-53 Rev. AA 666


Qualcomm Hexagon V73 Programmer’s Reference Manual

Word32 Q6_R_mpy_RhRh_rnd(Word32 Rs, Word32 Rt) 497


Rd=mpy(Rs.H,Rt.H):rnd:sat
Word32 Q6_R_mpy_RhRh_rnd_sat(Word32 Rs, Word32 Rt) 497
Rd=mpy(Rs.H,Rt.H):sat
Word32 Q6_R_mpy_RhRh_sat(Word32 Rs, Word32 Rt) 497
Rd=mpy(Rs.H,Rt.L)
Word32 Q6_R_mpy_RhRl(Word32 Rs, Word32 Rt) 497
Rd=mpy(Rs.H,Rt.L):<<1
Word32 Q6_R_mpy_RhRl_s1(Word32 Rs, Word32 Rt) 497
Rd=mpy(Rs.H,Rt.L):<<1:rnd
Word32 Q6_R_mpy_RhRl_s1_rnd(Word32 Rs, Word32 Rt) 497
Rd=mpy(Rs.H,Rt.L):<<1:rnd:sat
Word32 Q6_R_mpy_RhRl_s1_rnd_sat(Word32 Rs, Word32 Rt) 497
Rd=mpy(Rs.H,Rt.L):<<1:sat
Word32 Q6_R_mpy_RhRl_s1_sat(Word32 Rs, Word32 Rt) 497
Rd=mpy(Rs.H,Rt.L):rnd
Word32 Q6_R_mpy_RhRl_rnd(Word32 Rs, Word32 Rt) 497
Rd=mpy(Rs.H,Rt.L):rnd:sat
Word32 Q6_R_mpy_RhRl_rnd_sat(Word32 Rs, Word32 Rt) 497
Rd=mpy(Rs.H,Rt.L):sat
Word32 Q6_R_mpy_RhRl_sat(Word32 Rs, Word32 Rt) 497
Rd=mpy(Rs.L,Rt.H)
Word32 Q6_R_mpy_RlRh(Word32 Rs, Word32 Rt) 497
Rd=mpy(Rs.L,Rt.H):<<1
Word32 Q6_R_mpy_RlRh_s1(Word32 Rs, Word32 Rt) 497
Rd=mpy(Rs.L,Rt.H):<<1:rnd
Word32 Q6_R_mpy_RlRh_s1_rnd(Word32 Rs, Word32 Rt) 497
Rd=mpy(Rs.L,Rt.H):<<1:rnd:sat
Word32 Q6_R_mpy_RlRh_s1_rnd_sat(Word32 Rs, Word32 Rt) 497
Rd=mpy(Rs.L,Rt.H):<<1:sat
Word32 Q6_R_mpy_RlRh_s1_sat(Word32 Rs, Word32 Rt) 497
Rd=mpy(Rs.L,Rt.H):rnd
Word32 Q6_R_mpy_RlRh_rnd(Word32 Rs, Word32 Rt) 497
Rd=mpy(Rs.L,Rt.H):rnd:sat
Word32 Q6_R_mpy_RlRh_rnd_sat(Word32 Rs, Word32 Rt) 497
Rd=mpy(Rs.L,Rt.H):sat
Word32 Q6_R_mpy_RlRh_sat(Word32 Rs, Word32 Rt) 497
Rd=mpy(Rs.L,Rt.L)
Word32 Q6_R_mpy_RlRl(Word32 Rs, Word32 Rt) 497
Rd=mpy(Rs.L,Rt.L):<<1
Word32 Q6_R_mpy_RlRl_s1(Word32 Rs, Word32 Rt) 497
Rd=mpy(Rs.L,Rt.L):<<1:rnd
Word32 Q6_R_mpy_RlRl_s1_rnd(Word32 Rs, Word32 Rt) 497
Rd=mpy(Rs.L,Rt.L):<<1:rnd:sat
Word32 Q6_R_mpy_RlRl_s1_rnd_sat(Word32 Rs, Word32 Rt) 498
Rd=mpy(Rs.L,Rt.L):<<1:sat
Word32 Q6_R_mpy_RlRl_s1_sat(Word32 Rs, Word32 Rt) 498
Rd=mpy(Rs.L,Rt.L):rnd
Word32 Q6_R_mpy_RlRl_rnd(Word32 Rs, Word32 Rt) 498
Rd=mpy(Rs.L,Rt.L):rnd:sat
Word32 Q6_R_mpy_RlRl_rnd_sat(Word32 Rs, Word32 Rt) 498
Rd=mpy(Rs.L,Rt.L):sat
Word32 Q6_R_mpy_RlRl_sat(Word32 Rs, Word32 Rt) 498
Rdd=mpy(Rs,Rt)
Word64 Q6_P_mpy_RR(Word32 Rs, Word32 Rt) 515
Rdd=mpy(Rs.H,Rt.H)
Word64 Q6_P_mpy_RhRh(Word32 Rs, Word32 Rt) 498
Rdd=mpy(Rs.H,Rt.H):<<1

80-N2040-53 Rev. AA 667


Qualcomm Hexagon V73 Programmer’s Reference Manual

Word64 Q6_P_mpy_RhRh_s1(Word32 Rs, Word32 Rt) 498


Rdd=mpy(Rs.H,Rt.H):<<1:rnd
Word64 Q6_P_mpy_RhRh_s1_rnd(Word32 Rs, Word32 Rt) 498
Rdd=mpy(Rs.H,Rt.H):rnd
Word64 Q6_P_mpy_RhRh_rnd(Word32 Rs, Word32 Rt) 498
Rdd=mpy(Rs.H,Rt.L)
Word64 Q6_P_mpy_RhRl(Word32 Rs, Word32 Rt) 498
Rdd=mpy(Rs.H,Rt.L):<<1
Word64 Q6_P_mpy_RhRl_s1(Word32 Rs, Word32 Rt) 498
Rdd=mpy(Rs.H,Rt.L):<<1:rnd
Word64 Q6_P_mpy_RhRl_s1_rnd(Word32 Rs, Word32 Rt) 498
Rdd=mpy(Rs.H,Rt.L):rnd
Word64 Q6_P_mpy_RhRl_rnd(Word32 Rs, Word32 Rt) 498
Rdd=mpy(Rs.L,Rt.H)
Word64 Q6_P_mpy_RlRh(Word32 Rs, Word32 Rt) 498
Rdd=mpy(Rs.L,Rt.H):<<1
Word64 Q6_P_mpy_RlRh_s1(Word32 Rs, Word32 Rt) 498
Rdd=mpy(Rs.L,Rt.H):<<1:rnd
Word64 Q6_P_mpy_RlRh_s1_rnd(Word32 Rs, Word32 Rt) 498
Rdd=mpy(Rs.L,Rt.H):rnd
Word64 Q6_P_mpy_RlRh_rnd(Word32 Rs, Word32 Rt) 498
Rdd=mpy(Rs.L,Rt.L)
Word64 Q6_P_mpy_RlRl(Word32 Rs, Word32 Rt) 498
Rdd=mpy(Rs.L,Rt.L):<<1
Word64 Q6_P_mpy_RlRl_s1(Word32 Rs, Word32 Rt) 498
Rdd=mpy(Rs.L,Rt.L):<<1:rnd
Word64 Q6_P_mpy_RlRl_s1_rnd(Word32 Rs, Word32 Rt) 498
Rdd=mpy(Rs.L,Rt.L):rnd
Word64 Q6_P_mpy_RlRl_rnd(Word32 Rs, Word32 Rt) 498
Rx+=mpy(Rs,Rt):<<1:sat
Word32 Q6_R_mpyacc_RR_s1_sat(Word32 Rx, Word32 Rs, Word32 Rt) 513
Rx+=mpy(Rs.H,Rt.H)
Word32 Q6_R_mpyacc_RhRh(Word32 Rx, Word32 Rs, Word32 Rt) 498
Rx+=mpy(Rs.H,Rt.H):<<1
Word32 Q6_R_mpyacc_RhRh_s1(Word32 Rx, Word32 Rs, Word32 Rt) 498
Rx+=mpy(Rs.H,Rt.H):<<1:sat
Word32 Q6_R_mpyacc_RhRh_s1_sat(Word32 Rx, Word32 Rs, Word32 Rt) 498
Rx+=mpy(Rs.H,Rt.H):sat
Word32 Q6_R_mpyacc_RhRh_sat(Word32 Rx, Word32 Rs, Word32 Rt) 498
Rx+=mpy(Rs.H,Rt.L)
Word32 Q6_R_mpyacc_RhRl(Word32 Rx, Word32 Rs, Word32 Rt) 498
Rx+=mpy(Rs.H,Rt.L):<<1
Word32 Q6_R_mpyacc_RhRl_s1(Word32 Rx, Word32 Rs, Word32 Rt) 498
Rx+=mpy(Rs.H,Rt.L):<<1:sat
Word32 Q6_R_mpyacc_RhRl_s1_sat(Word32 Rx, Word32 Rs, Word32 Rt) 498
Rx+=mpy(Rs.H,Rt.L):sat
Word32 Q6_R_mpyacc_RhRl_sat(Word32 Rx, Word32 Rs, Word32 Rt) 498
Rx+=mpy(Rs.L,Rt.H)
Word32 Q6_R_mpyacc_RlRh(Word32 Rx, Word32 Rs, Word32 Rt) 498
Rx+=mpy(Rs.L,Rt.H):<<1
Word32 Q6_R_mpyacc_RlRh_s1(Word32 Rx, Word32 Rs, Word32 Rt) 498
Rx+=mpy(Rs.L,Rt.H):<<1:sat
Word32 Q6_R_mpyacc_RlRh_s1_sat(Word32 Rx, Word32 Rs, Word32 Rt) 499
Rx+=mpy(Rs.L,Rt.H):sat
Word32 Q6_R_mpyacc_RlRh_sat(Word32 Rx, Word32 Rs, Word32 Rt) 499
Rx+=mpy(Rs.L,Rt.L)
Word32 Q6_R_mpyacc_RlRl(Word32 Rx, Word32 Rs, Word32 Rt) 499
Rx+=mpy(Rs.L,Rt.L):<<1

80-N2040-53 Rev. AA 668


Qualcomm Hexagon V73 Programmer’s Reference Manual

Word32 Q6_R_mpyacc_RlRl_s1(Word32 Rx, Word32 Rs, Word32 Rt) 499


Rx+=mpy(Rs.L,Rt.L):<<1:sat
Word32 Q6_R_mpyacc_RlRl_s1_sat(Word32 Rx, Word32 Rs, Word32 Rt) 499
Rx+=mpy(Rs.L,Rt.L):sat
Word32 Q6_R_mpyacc_RlRl_sat(Word32 Rx, Word32 Rs, Word32 Rt) 499
Rx-=mpy(Rs,Rt):<<1:sat
Word32 Q6_R_mpynac_RR_s1_sat(Word32 Rx, Word32 Rs, Word32 Rt) 513
Rx-=mpy(Rs.H,Rt.H)
Word32 Q6_R_mpynac_RhRh(Word32 Rx, Word32 Rs, Word32 Rt) 499
Rx-=mpy(Rs.H,Rt.H):<<1
Word32 Q6_R_mpynac_RhRh_s1(Word32 Rx, Word32 Rs, Word32 Rt) 499
Rx-=mpy(Rs.H,Rt.H):<<1:sat
Word32 Q6_R_mpynac_RhRh_s1_sat(Word32 Rx, Word32 Rs, Word32 Rt) 499
Rx-=mpy(Rs.H,Rt.H):sat
Word32 Q6_R_mpynac_RhRh_sat(Word32 Rx, Word32 Rs, Word32 Rt) 499
Rx-=mpy(Rs.H,Rt.L)
Word32 Q6_R_mpynac_RhRl(Word32 Rx, Word32 Rs, Word32 Rt) 499
Rx-=mpy(Rs.H,Rt.L):<<1
Word32 Q6_R_mpynac_RhRl_s1(Word32 Rx, Word32 Rs, Word32 Rt) 499
Rx-=mpy(Rs.H,Rt.L):<<1:sat
Word32 Q6_R_mpynac_RhRl_s1_sat(Word32 Rx, Word32 Rs, Word32 Rt) 499
Rx-=mpy(Rs.H,Rt.L):sat
Word32 Q6_R_mpynac_RhRl_sat(Word32 Rx, Word32 Rs, Word32 Rt) 499
Rx-=mpy(Rs.L,Rt.H)
Word32 Q6_R_mpynac_RlRh(Word32 Rx, Word32 Rs, Word32 Rt) 499
Rx-=mpy(Rs.L,Rt.H):<<1
Word32 Q6_R_mpynac_RlRh_s1(Word32 Rx, Word32 Rs, Word32 Rt) 499
Rx-=mpy(Rs.L,Rt.H):<<1:sat
Word32 Q6_R_mpynac_RlRh_s1_sat(Word32 Rx, Word32 Rs, Word32 Rt) 499
Rx-=mpy(Rs.L,Rt.H):sat
Word32 Q6_R_mpynac_RlRh_sat(Word32 Rx, Word32 Rs, Word32 Rt) 499
Rx-=mpy(Rs.L,Rt.L)
Word32 Q6_R_mpynac_RlRl(Word32 Rx, Word32 Rs, Word32 Rt) 499
Rx-=mpy(Rs.L,Rt.L):<<1
Word32 Q6_R_mpynac_RlRl_s1(Word32 Rx, Word32 Rs, Word32 Rt) 499
Rx-=mpy(Rs.L,Rt.L):<<1:sat
Word32 Q6_R_mpynac_RlRl_s1_sat(Word32 Rx, Word32 Rs, Word32 Rt) 499
Rx-=mpy(Rs.L,Rt.L):sat
Word32 Q6_R_mpynac_RlRl_sat(Word32 Rx, Word32 Rs, Word32 Rt) 499
Rxx+=mpy(Rs,Rt)
Word64 Q6_P_mpyacc_RR(Word64 Rxx, Word32 Rs, Word32 Rt) 515
Rxx+=mpy(Rs.H,Rt.H)
Word64 Q6_P_mpyacc_RhRh(Word64 Rxx, Word32 Rs, Word32 Rt) 499
Rxx+=mpy(Rs.H,Rt.H):<<1
Word64 Q6_P_mpyacc_RhRh_s1(Word64 Rxx, Word32 Rs, Word32 Rt) 500
Rxx+=mpy(Rs.H,Rt.L)
Word64 Q6_P_mpyacc_RhRl(Word64 Rxx, Word32 Rs, Word32 Rt) 500
Rxx+=mpy(Rs.H,Rt.L):<<1
Word64 Q6_P_mpyacc_RhRl_s1(Word64 Rxx, Word32 Rs, Word32 Rt) 500
Rxx+=mpy(Rs.L,Rt.H)
Word64 Q6_P_mpyacc_RlRh(Word64 Rxx, Word32 Rs, Word32 Rt) 500
Rxx+=mpy(Rs.L,Rt.H):<<1
Word64 Q6_P_mpyacc_RlRh_s1(Word64 Rxx, Word32 Rs, Word32 Rt) 500
Rxx+=mpy(Rs.L,Rt.L)
Word64 Q6_P_mpyacc_RlRl(Word64 Rxx, Word32 Rs, Word32 Rt) 500
Rxx+=mpy(Rs.L,Rt.L):<<1
Word64 Q6_P_mpyacc_RlRl_s1(Word64 Rxx, Word32 Rs, Word32 Rt) 500
Rxx-=mpy(Rs,Rt)

80-N2040-53 Rev. AA 669


Qualcomm Hexagon V73 Programmer’s Reference Manual

Word64 Q6_P_mpynac_RR(Word64 Rxx, Word32 Rs, Word32 Rt) 515


Rxx-=mpy(Rs.H,Rt.H)
Word64 Q6_P_mpynac_RhRh(Word64 Rxx, Word32 Rs, Word32 Rt) 500
Rxx-=mpy(Rs.H,Rt.H):<<1
Word64 Q6_P_mpynac_RhRh_s1(Word64 Rxx, Word32 Rs, Word32 Rt) 500
Rxx-=mpy(Rs.H,Rt.L)
Word64 Q6_P_mpynac_RhRl(Word64 Rxx, Word32 Rs, Word32 Rt) 500
Rxx-=mpy(Rs.H,Rt.L):<<1
Word64 Q6_P_mpynac_RhRl_s1(Word64 Rxx, Word32 Rs, Word32 Rt) 500
Rxx-=mpy(Rs.L,Rt.H)
Word64 Q6_P_mpynac_RlRh(Word64 Rxx, Word32 Rs, Word32 Rt) 500
Rxx-=mpy(Rs.L,Rt.H):<<1
Word64 Q6_P_mpynac_RlRh_s1(Word64 Rxx, Word32 Rs, Word32 Rt) 500
Rxx-=mpy(Rs.L,Rt.L)
Word64 Q6_P_mpynac_RlRl(Word64 Rxx, Word32 Rs, Word32 Rt) 500
Rxx-=mpy(Rs.L,Rt.L):<<1
Word64 Q6_P_mpynac_RlRl_s1(Word64 Rxx, Word32 Rs, Word32 Rt) 500

mpyi
Rd=mpyi(Rs,#m9)
Word32 Q6_R_mpyi_RI(Word32 Rs, Word32 Im9) 486
Rd=mpyi(Rs,Rt)
Word32 Q6_R_mpyi_RR(Word32 Rs, Word32 Rt) 486
Rx+=mpyi(Rs,#u8)
Word32 Q6_R_mpyiacc_RI(Word32 Rx, Word32 Rs, Word32 Iu8) 486
Rx+=mpyi(Rs,Rt)
Word32 Q6_R_mpyiacc_RR(Word32 Rx, Word32 Rs, Word32 Rt) 486
Rx-=mpyi(Rs,#u8)
Word32 Q6_R_mpyinac_RI(Word32 Rx, Word32 Rs, Word32 Iu8) 486
Rx-=mpyi(Rs,Rt)
Word32 Q6_R_mpyinac_RR(Word32 Rx, Word32 Rs, Word32 Rt) 486

mpysu
Rd=mpysu(Rs,Rt)
Word32 Q6_R_mpysu_RR(Word32 Rs, Word32 Rt) 513

mpyu
Rd=mpyu(Rs,Rt)
UWord32 Q6_R_mpyu_RR(Word32 Rs, Word32 Rt) 513
Rd=mpyu(Rs.H,Rt.H)
UWord32 Q6_R_mpyu_RhRh(Word32 Rs, Word32 Rt) 504
Rd=mpyu(Rs.H,Rt.H):<<1
UWord32 Q6_R_mpyu_RhRh_s1(Word32 Rs, Word32 Rt) 504
Rd=mpyu(Rs.H,Rt.L)
UWord32 Q6_R_mpyu_RhRl(Word32 Rs, Word32 Rt) 504
Rd=mpyu(Rs.H,Rt.L):<<1
UWord32 Q6_R_mpyu_RhRl_s1(Word32 Rs, Word32 Rt) 504
Rd=mpyu(Rs.L,Rt.H)
UWord32 Q6_R_mpyu_RlRh(Word32 Rs, Word32 Rt) 504
Rd=mpyu(Rs.L,Rt.H):<<1
UWord32 Q6_R_mpyu_RlRh_s1(Word32 Rs, Word32 Rt) 504
Rd=mpyu(Rs.L,Rt.L)
UWord32 Q6_R_mpyu_RlRl(Word32 Rs, Word32 Rt) 504
Rd=mpyu(Rs.L,Rt.L):<<1
UWord32 Q6_R_mpyu_RlRl_s1(Word32 Rs, Word32 Rt) 504
Rdd=mpyu(Rs,Rt)
UWord64 Q6_P_mpyu_RR(Word32 Rs, Word32 Rt) 515
Rdd=mpyu(Rs.H,Rt.H)

80-N2040-53 Rev. AA 670


Qualcomm Hexagon V73 Programmer’s Reference Manual

UWord64 Q6_P_mpyu_RhRh(Word32 Rs, Word32 Rt) 504


Rdd=mpyu(Rs.H,Rt.H):<<1
UWord64 Q6_P_mpyu_RhRh_s1(Word32 Rs, Word32 Rt) 504
Rdd=mpyu(Rs.H,Rt.L)
UWord64 Q6_P_mpyu_RhRl(Word32 Rs, Word32 Rt) 504
Rdd=mpyu(Rs.H,Rt.L):<<1
UWord64 Q6_P_mpyu_RhRl_s1(Word32 Rs, Word32 Rt) 504
Rdd=mpyu(Rs.L,Rt.H)
UWord64 Q6_P_mpyu_RlRh(Word32 Rs, Word32 Rt) 504
Rdd=mpyu(Rs.L,Rt.H):<<1
UWord64 Q6_P_mpyu_RlRh_s1(Word32 Rs, Word32 Rt) 504
Rdd=mpyu(Rs.L,Rt.L)
UWord64 Q6_P_mpyu_RlRl(Word32 Rs, Word32 Rt) 504
Rdd=mpyu(Rs.L,Rt.L):<<1
UWord64 Q6_P_mpyu_RlRl_s1(Word32 Rs, Word32 Rt) 504
Rx+=mpyu(Rs.H,Rt.H)
Word32 Q6_R_mpyuacc_RhRh(Word32 Rx, Word32 Rs, Word32 Rt) 504
Rx+=mpyu(Rs.H,Rt.H):<<1
Word32 Q6_R_mpyuacc_RhRh_s1(Word32 Rx, Word32 Rs, Word32 Rt) 504
Rx+=mpyu(Rs.H,Rt.L)
Word32 Q6_R_mpyuacc_RhRl(Word32 Rx, Word32 Rs, Word32 Rt) 504
Rx+=mpyu(Rs.H,Rt.L):<<1
Word32 Q6_R_mpyuacc_RhRl_s1(Word32 Rx, Word32 Rs, Word32 Rt) 504
Rx+=mpyu(Rs.L,Rt.H)
Word32 Q6_R_mpyuacc_RlRh(Word32 Rx, Word32 Rs, Word32 Rt) 504
Rx+=mpyu(Rs.L,Rt.H):<<1
Word32 Q6_R_mpyuacc_RlRh_s1(Word32 Rx, Word32 Rs, Word32 Rt) 504
Rx+=mpyu(Rs.L,Rt.L)
Word32 Q6_R_mpyuacc_RlRl(Word32 Rx, Word32 Rs, Word32 Rt) 504
Rx+=mpyu(Rs.L,Rt.L):<<1
Word32 Q6_R_mpyuacc_RlRl_s1(Word32 Rx, Word32 Rs, Word32 Rt) 504
Rx-=mpyu(Rs.H,Rt.H)
Word32 Q6_R_mpyunac_RhRh(Word32 Rx, Word32 Rs, Word32 Rt) 504
Rx-=mpyu(Rs.H,Rt.H):<<1
Word32 Q6_R_mpyunac_RhRh_s1(Word32 Rx, Word32 Rs, Word32 Rt) 504
Rx-=mpyu(Rs.H,Rt.L)
Word32 Q6_R_mpyunac_RhRl(Word32 Rx, Word32 Rs, Word32 Rt) 504
Rx-=mpyu(Rs.H,Rt.L):<<1
Word32 Q6_R_mpyunac_RhRl_s1(Word32 Rx, Word32 Rs, Word32 Rt) 505
Rx-=mpyu(Rs.L,Rt.H)
Word32 Q6_R_mpyunac_RlRh(Word32 Rx, Word32 Rs, Word32 Rt) 505
Rx-=mpyu(Rs.L,Rt.H):<<1
Word32 Q6_R_mpyunac_RlRh_s1(Word32 Rx, Word32 Rs, Word32 Rt) 505
Rx-=mpyu(Rs.L,Rt.L)
Word32 Q6_R_mpyunac_RlRl(Word32 Rx, Word32 Rs, Word32 Rt) 505
Rx-=mpyu(Rs.L,Rt.L):<<1
Word32 Q6_R_mpyunac_RlRl_s1(Word32 Rx, Word32 Rs, Word32 Rt) 505
Rxx+=mpyu(Rs,Rt)
Word64 Q6_P_mpyuacc_RR(Word64 Rxx, Word32 Rs, Word32 Rt) 515
Rxx+=mpyu(Rs.H,Rt.H)
Word64 Q6_P_mpyuacc_RhRh(Word64 Rxx, Word32 Rs, Word32 Rt) 505
Rxx+=mpyu(Rs.H,Rt.H):<<1
Word64 Q6_P_mpyuacc_RhRh_s1(Word64 Rxx, Word32 Rs, Word32 Rt) 505
Rxx+=mpyu(Rs.H,Rt.L)
Word64 Q6_P_mpyuacc_RhRl(Word64 Rxx, Word32 Rs, Word32 Rt) 505
Rxx+=mpyu(Rs.H,Rt.L):<<1
Word64 Q6_P_mpyuacc_RhRl_s1(Word64 Rxx, Word32 Rs, Word32 Rt) 505
Rxx+=mpyu(Rs.L,Rt.H)

80-N2040-53 Rev. AA 671


Qualcomm Hexagon V73 Programmer’s Reference Manual

Word64 Q6_P_mpyuacc_RlRh(Word64 Rxx, Word32 Rs, Word32 Rt) 505


Rxx+=mpyu(Rs.L,Rt.H):<<1
Word64 Q6_P_mpyuacc_RlRh_s1(Word64 Rxx, Word32 Rs, Word32 Rt) 505
Rxx+=mpyu(Rs.L,Rt.L)
Word64 Q6_P_mpyuacc_RlRl(Word64 Rxx, Word32 Rs, Word32 Rt) 505
Rxx+=mpyu(Rs.L,Rt.L):<<1
Word64 Q6_P_mpyuacc_RlRl_s1(Word64 Rxx, Word32 Rs, Word32 Rt) 505
Rxx-=mpyu(Rs,Rt)
Word64 Q6_P_mpyunac_RR(Word64 Rxx, Word32 Rs, Word32 Rt) 515
Rxx-=mpyu(Rs.H,Rt.H)
Word64 Q6_P_mpyunac_RhRh(Word64 Rxx, Word32 Rs, Word32 Rt) 505
Rxx-=mpyu(Rs.H,Rt.H):<<1
Word64 Q6_P_mpyunac_RhRh_s1(Word64 Rxx, Word32 Rs, Word32 Rt) 505
Rxx-=mpyu(Rs.H,Rt.L)
Word64 Q6_P_mpyunac_RhRl(Word64 Rxx, Word32 Rs, Word32 Rt) 505
Rxx-=mpyu(Rs.H,Rt.L):<<1
Word64 Q6_P_mpyunac_RhRl_s1(Word64 Rxx, Word32 Rs, Word32 Rt) 505
Rxx-=mpyu(Rs.L,Rt.H)
Word64 Q6_P_mpyunac_RlRh(Word64 Rxx, Word32 Rs, Word32 Rt) 505
Rxx-=mpyu(Rs.L,Rt.H):<<1
Word64 Q6_P_mpyunac_RlRh_s1(Word64 Rxx, Word32 Rs, Word32 Rt) 505
Rxx-=mpyu(Rs.L,Rt.L)
Word64 Q6_P_mpyunac_RlRl(Word64 Rxx, Word32 Rs, Word32 Rt) 505
Rxx-=mpyu(Rs.L,Rt.L):<<1
Word64 Q6_P_mpyunac_RlRl_s1(Word64 Rxx, Word32 Rs, Word32 Rt) 505

mpyui
Rd=mpyui(Rs,Rt)
Word32 Q6_R_mpyui_RR(Word32 Rs, Word32 Rt) 486

mux
Rd=mux(Pu,#s8,#S8)
Word32 Q6_R_mux_pII(Byte Pu, Word32 Is8, Word32 IS8) 174
Rd=mux(Pu,#s8,Rs)
Word32 Q6_R_mux_pIR(Byte Pu, Word32 Is8, Word32 Rs) 174
Rd=mux(Pu,Rs,#s8)
Word32 Q6_R_mux_pRI(Byte Pu, Word32 Rs, Word32 Is8) 174
Rd=mux(Pu,Rs,Rt)
Word32 Q6_R_mux_pRR(Byte Pu, Word32 Rs, Word32 Rt) 174

N
neg
Rd=neg(Rs)
Word32 Q6_R_neg_R(Word32 Rs) 160
Rd=neg(Rs):sat
Word32 Q6_R_neg_R_sat(Word32 Rs) 354
Rdd=neg(Rss)
Word64 Q6_P_neg_P(Word64 Rss) 354

no mnemonic
Pd=Ps
Byte Q6_p_equals_p(Byte Ps) 203
Pd=Rs
Byte Q6_p_equals_R(Word32 Rs) 577
Rd=#s16
Word32 Q6_R_equals_I(Word32 Is16) 165
Rd=Ps

80-N2040-53 Rev. AA 672


Qualcomm Hexagon V73 Programmer’s Reference Manual

Word32 Q6_R_equals_p(Byte Ps) 577


Rd=Rs
Word32 Q6_R_equals_R(Word32 Rs) 167
Rdd=#s8
Word64 Q6_P_equals_I(Word32 Is8) 165
Rdd=Rss
Word64 Q6_P_equals_P(Word64 Rss) 167
Rx.H=#u16
Word32 Q6_Rh_equals_I(Word32 Rx, Word32 Iu16) 165
Rx.L=#u16
Word32 Q6_Rl_equals_I(Word32 Rx, Word32 Iu16) 165

normamt
Rd=normamt(Rs)
Word32 Q6_R_normamt_R(Word32 Rs) 410
Rd=normamt(Rss)
Word32 Q6_R_normamt_P(Word64 Rss) 410

not
Pd=not(Ps)
Byte Q6_p_not_p(Byte Ps) 203
Rd=not(Rs)
Word32 Q6_R_not_R(Word32 Rs) 158
Rdd=not(Rss)
Word64 Q6_P_not_P(Word64 Rss) 343

O
or
Pd=and(Ps,or(Pt,!Pu))
Byte Q6_p_and_or_ppnp(Byte Ps, Byte Pt, Byte Pu) 203
Pd=and(Ps,or(Pt,Pu))
Byte Q6_p_and_or_ppp(Byte Ps, Byte Pt, Byte Pu) 203
Pd=or(Ps,and(Pt,!Pu))
Byte Q6_p_or_and_ppnp(Byte Ps, Byte Pt, Byte Pu) 203
Pd=or(Ps,and(Pt,Pu))
Byte Q6_p_or_and_ppp(Byte Ps, Byte Pt, Byte Pu) 203
Pd=or(Ps,or(Pt,!Pu))
Byte Q6_p_or_or_ppnp(Byte Ps, Byte Pt, Byte Pu) 204
Pd=or(Ps,or(Pt,Pu))
Byte Q6_p_or_or_ppp(Byte Ps, Byte Pt, Byte Pu) 204
Pd=or(Pt,!Ps)
Byte Q6_p_or_pnp(Byte Pt, Byte Ps) 204
Pd=or(Pt,Ps)
Byte Q6_p_or_pp(Byte Pt, Byte Ps) 204
Rd=or(Rs,#s10)
Word32 Q6_R_or_RI(Word32 Rs, Word32 Is10) 158
Rd=or(Rs,Rt)
Word32 Q6_R_or_RR(Word32 Rs, Word32 Rt) 158
Rd=or(Rt,~Rs)
Word32 Q6_R_or_RnR(Word32 Rt, Word32 Rs) 158
Rdd=or(Rss,Rtt)
Word64 Q6_P_or_PP(Word64 Rss, Word64 Rtt) 343
Rdd=or(Rtt,~Rss)
Word64 Q6_P_or_PnP(Word64 Rtt, Word64 Rss) 343
Rx&=or(Rs,Rt)
Word32 Q6_R_orand_RR(Word32 Rx, Word32 Rs, Word32 Rt) 347
Rx^=or(Rs,Rt)

80-N2040-53 Rev. AA 673


Qualcomm Hexagon V73 Programmer’s Reference Manual

Word32 Q6_R_orxacc_RR(Word32 Rx, Word32 Rs, Word32 Rt) 347


Rx=or(Ru,and(Rx,#s10))
Word32 Q6_R_or_and_RRI(Word32 Ru, Word32 Rx, Word32 Is10) 347
Rx|=or(Rs,#s10)
Word32 Q6_R_oror_RI(Word32 Rx, Word32 Rs, Word32 Is10) 347
Rx|=or(Rs,Rt)
Word32 Q6_R_oror_RR(Word32 Rx, Word32 Rs, Word32 Rt) 347

P
packhl
Rdd=packhl(Rs,Rt)
Word64 Q6_P_packhl_RR(Word32 Rs, Word32 Rt) 177

parity
Rd=parity(Rs,Rt)
Word32 Q6_R_parity_RR(Word32 Rs, Word32 Rt) 420
Rd=parity(Rss,Rtt)
Word32 Q6_R_parity_PP(Word64 Rss, Word64 Rtt) 420

pmpyw
Rdd=pmpyw(Rs,Rt)
Word64 Q6_P_pmpyw_RR(Word32 Rs, Word32 Rt) 509
Rxx^=pmpyw(Rs,Rt)
Word64 Q6_P_pmpywxacc_RR(Word64 Rxx, Word32 Rs, Word32 Rt) 509

popcount
Rd=popcount(Rss)
Word32 Q6_R_popcount_P(Word64 Rss) 411

R
rol
Rd=rol(Rs,#u5)
Word32 Q6_R_rol_RI(Word32 Rs, Word32 Iu5) 589
Rdd=rol(Rss,#u6)
Word64 Q6_P_rol_PI(Word64 Rss, Word32 Iu6) 590
Rx&=rol(Rs,#u5)
Word32 Q6_R_roland_RI(Word32 Rx, Word32 Rs, Word32 Iu5) 596
Rx^=rol(Rs,#u5)
Word32 Q6_R_rolxacc_RI(Word32 Rx, Word32 Rs, Word32 Iu5) 596
Rx+=rol(Rs,#u5)
Word32 Q6_R_rolacc_RI(Word32 Rx, Word32 Rs, Word32 Iu5) 592
Rx-=rol(Rs,#u5)
Word32 Q6_R_rolnac_RI(Word32 Rx, Word32 Rs, Word32 Iu5) 592
Rx|=rol(Rs,#u5)
Word32 Q6_R_rolor_RI(Word32 Rx, Word32 Rs, Word32 Iu5) 596
Rxx&=rol(Rss,#u6)
Word64 Q6_P_roland_PI(Word64 Rxx, Word64 Rss, Word32 Iu6) 596
Rxx^=rol(Rss,#u6)
Word64 Q6_P_rolxacc_PI(Word64 Rxx, Word64 Rss, Word32 Iu6) 597
Rxx+=rol(Rss,#u6)
Word64 Q6_P_rolacc_PI(Word64 Rxx, Word64 Rss, Word32 Iu6) 592
Rxx-=rol(Rss,#u6)
Word64 Q6_P_rolnac_PI(Word64 Rxx, Word64 Rss, Word32 Iu6) 592
Rxx|=rol(Rss,#u6)
Word64 Q6_P_rolor_PI(Word64 Rxx, Word64 Rss, Word32 Iu6) 597

round
Rd=round(Rs,#u5)

80-N2040-53 Rev. AA 674


Qualcomm Hexagon V73 Programmer’s Reference Manual

Word32 Q6_R_round_RI(Word32 Rs, Word32 Iu5) 357


Rd=round(Rs,#u5):sat
Word32 Q6_R_round_RI_sat(Word32 Rs, Word32 Iu5) 357
Rd=round(Rs,Rt)
Word32 Q6_R_round_RR(Word32 Rs, Word32 Rt) 357
Rd=round(Rs,Rt):sat
Word32 Q6_R_round_RR_sat(Word32 Rs, Word32 Rt) 357
Rd=round(Rss):sat
Word32 Q6_R_round_P_sat(Word64 Rss) 357

S
sat
Rd=sat(Rss)
Word32 Q6_R_sat_P(Word64 Rss) 542

satb
Rd=satb(Rs)
Word32 Q6_R_satb_R(Word32 Rs) 542

sath
Rd=sath(Rs)
Word32 Q6_R_sath_R(Word32 Rs) 542

satub
Rd=satub(Rs)
Word32 Q6_R_satub_R(Word32 Rs) 542

satuh
Rd=satuh(Rs)
Word32 Q6_R_satuh_R(Word32 Rs) 542

setbit
Rd=setbit(Rs,#u5)
Word32 Q6_R_setbit_RI(Word32 Rs, Word32 Iu5) 422
Rd=setbit(Rs,Rt)
Word32 Q6_R_setbit_RR(Word32 Rs, Word32 Rt) 422

sfadd
Rd=sfadd(Rs,Rt)
Word32 Q6_R_sfadd_RR(Word32 Rs, Word32 Rt) 461

sfclass
Pd=sfclass(Rs,#u5)
Byte Q6_p_sfclass_RI(Word32 Rs, Word32 Iu5) 462

sfcmp.eq
Pd=sfcmp.eq(Rs,Rt)
Byte Q6_p_sfcmp_eq_RR(Word32 Rs, Word32 Rt) 464

sfcmp.ge
Pd=sfcmp.ge(Rs,Rt)
Byte Q6_p_sfcmp_ge_RR(Word32 Rs, Word32 Rt) 464

sfcmp.gt
Pd=sfcmp.gt(Rs,Rt)
Byte Q6_p_sfcmp_gt_RR(Word32 Rs, Word32 Rt) 464

sfcmp.uo
Pd=sfcmp.uo(Rs,Rt)

80-N2040-53 Rev. AA 675


Qualcomm Hexagon V73 Programmer’s Reference Manual

Byte Q6_p_sfcmp_uo_RR(Word32 Rs, Word32 Rt) 464

sffixupd
Rd=sffixupd(Rs,Rt)
Word32 Q6_R_sffixupd_RR(Word32 Rs, Word32 Rt) 472

sffixupn
Rd=sffixupn(Rs,Rt)
Word32 Q6_R_sffixupn_RR(Word32 Rs, Word32 Rt) 472

sffixupr
Rd=sffixupr(Rs)
Word32 Q6_R_sffixupr_R(Word32 Rs) 472

sfmake
Rd=sfmake(#u10):neg
Word32 Q6_R_sfmake_I_neg(Word32 Iu10) 478
Rd=sfmake(#u10):pos
Word32 Q6_R_sfmake_I_pos(Word32 Iu10) 478

sfmax
Rd=sfmax(Rs,Rt)
Word32 Q6_R_sfmax_RR(Word32 Rs, Word32 Rt) 479

sfmin
Rd=sfmin(Rs,Rt)
Word32 Q6_R_sfmin_RR(Word32 Rs, Word32 Rt) 480

sfmpy
Rd=sfmpy(Rs,Rt)
Word32 Q6_R_sfmpy_RR(Word32 Rs, Word32 Rt) 481
Rx+=sfmpy(Rs,Rt,Pu):scale
Word32 Q6_R_sfmpyacc_RRp_scale(Word32 Rx, Word32 Rs, Word32 Rt, Byte Pu) 474
Rx+=sfmpy(Rs,Rt)
Word32 Q6_R_sfmpyacc_RR(Word32 Rx, Word32 Rs, Word32 Rt) 473
Rx+=sfmpy(Rs,Rt):lib
Word32 Q6_R_sfmpyacc_RR_lib(Word32 Rx, Word32 Rs, Word32 Rt) 476
Rx-=sfmpy(Rs,Rt)
Word32 Q6_R_sfmpynac_RR(Word32 Rx, Word32 Rs, Word32 Rt) 473
Rx-=sfmpy(Rs,Rt):lib
Word32 Q6_R_sfmpynac_RR_lib(Word32 Rx, Word32 Rs, Word32 Rt) 476

sfsub
Rd=sfsub(Rs,Rt)
Word32 Q6_R_sfsub_RR(Word32 Rs, Word32 Rt) 483

shuffeb
Rdd=shuffeb(Rss,Rtt)
Word64 Q6_P_shuffeb_PP(Word64 Rss, Word64 Rtt) 556

shuffeh
Rdd=shuffeh(Rss,Rtt)
Word64 Q6_P_shuffeh_PP(Word64 Rss, Word64 Rtt) 556

shuffob
Rdd=shuffob(Rtt,Rss)
Word64 Q6_P_shuffob_PP(Word64 Rtt, Word64 Rss) 556

shuffoh
Rdd=shuffoh(Rtt,Rss)

80-N2040-53 Rev. AA 676


Qualcomm Hexagon V73 Programmer’s Reference Manual

Word64 Q6_P_shuffoh_PP(Word64 Rtt, Word64 Rss) 556

sub
Rd=add(Rs,sub(#s6,Ru))
Word32 Q6_R_add_sub_RIR(Word32 Rs, Word32 Is6, Word32 Ru) 335
Rd=sub(#s10,Rs)
Word32 Q6_R_sub_IR(Word32 Is10, Word32 Rs) 162
Rd=sub(Rt,Rs)
Word32 Q6_R_sub_RR(Word32 Rt, Word32 Rs) 162
Rd=sub(Rt,Rs):sat
Word32 Q6_R_sub_RR_sat(Word32 Rt, Word32 Rs) 162
Rd=sub(Rt.H,Rs.H):<<16
Word32 Q6_R_sub_RhRh_s16(Word32 Rt, Word32 Rs) 361
Rd=sub(Rt.H,Rs.H):sat:<<16
Word32 Q6_R_sub_RhRh_sat_s16(Word32 Rt, Word32 Rs) 361
Rd=sub(Rt.H,Rs.L):<<16
Word32 Q6_R_sub_RhRl_s16(Word32 Rt, Word32 Rs) 361
Rd=sub(Rt.H,Rs.L):sat:<<16
Word32 Q6_R_sub_RhRl_sat_s16(Word32 Rt, Word32 Rs) 361
Rd=sub(Rt.L,Rs.H)
Word32 Q6_R_sub_RlRh(Word32 Rt, Word32 Rs) 361
Rd=sub(Rt.L,Rs.H):<<16
Word32 Q6_R_sub_RlRh_s16(Word32 Rt, Word32 Rs) 361
Rd=sub(Rt.L,Rs.H):sat
Word32 Q6_R_sub_RlRh_sat(Word32 Rt, Word32 Rs) 361
Rd=sub(Rt.L,Rs.H):sat:<<16
Word32 Q6_R_sub_RlRh_sat_s16(Word32 Rt, Word32 Rs) 361
Rd=sub(Rt.L,Rs.L)
Word32 Q6_R_sub_RlRl(Word32 Rt, Word32 Rs) 361
Rd=sub(Rt.L,Rs.L):<<16
Word32 Q6_R_sub_RlRl_s16(Word32 Rt, Word32 Rs) 361
Rd=sub(Rt.L,Rs.L):sat
Word32 Q6_R_sub_RlRl_sat(Word32 Rt, Word32 Rs) 361
Rd=sub(Rt.L,Rs.L):sat:<<16
Word32 Q6_R_sub_RlRl_sat_s16(Word32 Rt, Word32 Rs) 361
Rdd=sub(Rtt,Rss)
Word64 Q6_P_sub_PP(Word64 Rtt, Word64 Rss) 358
Rx+=sub(Rt,Rs)
Word32 Q6_R_subacc_RR(Word32 Rx, Word32 Rt, Word32 Rs) 359

swiz
Rd=swiz(Rs)
Word32 Q6_R_swiz_R(Word32 Rs) 544

sxtb
Rd=sxtb(Rs)
Word32 Q6_R_sxtb_R(Word32 Rs) 164

sxth
Rd=sxth(Rs)
Word32 Q6_R_sxth_R(Word32 Rs) 164

sxtw
Rdd=sxtw(Rs)
Word64 Q6_P_sxtw_R(Word32 Rs) 362

80-N2040-53 Rev. AA 677


Qualcomm Hexagon V73 Programmer’s Reference Manual

T
tableidxb
Rx=tableidxb(Rs,#u4,#U5)
Word32 Q6_R_tableidxb_RII(Word32 Rx, Word32 Rs, Word32 Iu4, Word32 IU5) 427

tableidxd
Rx=tableidxd(Rs,#u4,#U5)
Word32 Q6_R_tableidxd_RII(Word32 Rx, Word32 Rs, Word32 Iu4, Word32 IU5) 427

tableidxh
Rx=tableidxh(Rs,#u4,#U5)
Word32 Q6_R_tableidxh_RII(Word32 Rx, Word32 Rs, Word32 Iu4, Word32 IU5) 427

tableidxw
Rx=tableidxw(Rs,#u4,#U5)
Word32 Q6_R_tableidxw_RII(Word32 Rx, Word32 Rs, Word32 Iu4, Word32 IU5) 427

tlbmatch
Pd=tlbmatch(Rss,Rt)
Byte Q6_p_tlbmatch_PR(Word64 Rss, Word32 Rt) 576

togglebit
Rd=togglebit(Rs,#u5)
Word32 Q6_R_togglebit_RI(Word32 Rs, Word32 Iu5) 422
Rd=togglebit(Rs,Rt)
Word32 Q6_R_togglebit_RR(Word32 Rs, Word32 Rt) 422

tstbit
Pd=!tstbit(Rs,#u5)
Byte Q6_p_not_tstbit_RI(Word32 Rs, Word32 Iu5) 578
Pd=!tstbit(Rs,Rt)
Byte Q6_p_not_tstbit_RR(Word32 Rs, Word32 Rt) 578
Pd=tstbit(Rs,#u5)
Byte Q6_p_tstbit_RI(Word32 Rs, Word32 Iu5) 578
Pd=tstbit(Rs,Rt)
Byte Q6_p_tstbit_RR(Word32 Rs, Word32 Rt) 578

V
vabsdiffb
Rdd=vabsdiffb(Rtt,Rss)
Word64 Q6_P_vabsdiffb_PP(Word64 Rtt, Word64 Rss) 365

vabsdiffh
Rdd=vabsdiffh(Rtt,Rss)
Word64 Q6_P_vabsdiffh_PP(Word64 Rtt, Word64 Rss) 366

vabsdiffub
Rdd=vabsdiffub(Rtt,Rss)
Word64 Q6_P_vabsdiffub_PP(Word64 Rtt, Word64 Rss) 365

vabsdiffw
Rdd=vabsdiffw(Rtt,Rss)
Word64 Q6_P_vabsdiffw_PP(Word64 Rtt, Word64 Rss) 367

vabsh
Rdd=vabsh(Rss)
Word64 Q6_P_vabsh_P(Word64 Rss) 363
Rdd=vabsh(Rss):sat

80-N2040-53 Rev. AA 678


Qualcomm Hexagon V73 Programmer’s Reference Manual

Word64 Q6_P_vabsh_P_sat(Word64 Rss) 363

vabsw
Rdd=vabsw(Rss)
Word64 Q6_P_vabsw_P(Word64 Rss) 364
Rdd=vabsw(Rss):sat
Word64 Q6_P_vabsw_P_sat(Word64 Rss) 364

vaddb
Rdd=vaddb(Rss,Rtt)
Word64 Q6_P_vaddb_PP(Word64 Rss, Word64 Rtt) 378

vaddh
Rd=vaddh(Rs,Rt)
Word32 Q6_R_vaddh_RR(Word32 Rs, Word32 Rt) 168
Rd=vaddh(Rs,Rt):sat
Word32 Q6_R_vaddh_RR_sat(Word32 Rs, Word32 Rt) 168
Rdd=vaddh(Rss,Rtt)
Word64 Q6_P_vaddh_PP(Word64 Rss, Word64 Rtt) 371
Rdd=vaddh(Rss,Rtt):sat
Word64 Q6_P_vaddh_PP_sat(Word64 Rss, Word64 Rtt) 371

vaddhub
Rd=vaddhub(Rss,Rtt):sat
Word32 Q6_R_vaddhub_PP_sat(Word64 Rss, Word64 Rtt) 373

vaddub
Rdd=vaddub(Rss,Rtt)
Word64 Q6_P_vaddub_PP(Word64 Rss, Word64 Rtt) 378
Rdd=vaddub(Rss,Rtt):sat
Word64 Q6_P_vaddub_PP_sat(Word64 Rss, Word64 Rtt) 378

vadduh
Rd=vadduh(Rs,Rt):sat
Word32 Q6_R_vadduh_RR_sat(Word32 Rs, Word32 Rt) 168
Rdd=vadduh(Rss,Rtt):sat
Word64 Q6_P_vadduh_PP_sat(Word64 Rss, Word64 Rtt) 371

vaddw
Rdd=vaddw(Rss,Rtt)
Word64 Q6_P_vaddw_PP(Word64 Rss, Word64 Rtt) 379
Rdd=vaddw(Rss,Rtt):sat
Word64 Q6_P_vaddw_PP_sat(Word64 Rss, Word64 Rtt) 379

valignb
Rdd=valignb(Rtt,Rss,#u3)
Word64 Q6_P_valignb_PPI(Word64 Rtt, Word64 Rss, Word32 Iu3) 545
Rdd=valignb(Rtt,Rss,Pu)
Word64 Q6_P_valignb_PPp(Word64 Rtt, Word64 Rss, Byte Pu) 545

vaslh
Rdd=vaslh(Rss,#u4)
Word64 Q6_P_vaslh_PI(Word64 Rss, Word32 Iu4) 611
Rdd=vaslh(Rss,Rt)
Word64 Q6_P_vaslh_PR(Word64 Rss, Word32 Rt) 616

vaslw
Rdd=vaslw(Rss,#u5)
Word64 Q6_P_vaslw_PI(Word64 Rss, Word32 Iu5) 617
Rdd=vaslw(Rss,Rt)

80-N2040-53 Rev. AA 679


Qualcomm Hexagon V73 Programmer’s Reference Manual

Word64 Q6_P_vaslw_PR(Word64 Rss, Word32 Rt) 618

vasrh
Rdd=vasrh(Rss,#u4)
Word64 Q6_P_vasrh_PI(Word64 Rss, Word32 Iu4) 611
Rdd=vasrh(Rss,#u4):rnd
Word64 Q6_P_vasrh_PI_rnd(Word64 Rss, Word32 Iu4) 612
Rdd=vasrh(Rss,Rt)
Word64 Q6_P_vasrh_PR(Word64 Rss, Word32 Rt) 616

vasrhub
Rd=vasrhub(Rss,#u4):rnd:sat
Word32 Q6_R_vasrhub_PI_rnd_sat(Word64 Rss, Word32 Iu4) 614
Rd=vasrhub(Rss,#u4):sat
Word32 Q6_R_vasrhub_PI_sat(Word64 Rss, Word32 Iu4) 614

vasrw
Rd=vasrw(Rss,#u5)
Word32 Q6_R_vasrw_PI(Word64 Rss, Word32 Iu5) 620
Rd=vasrw(Rss,Rt)
Word32 Q6_R_vasrw_PR(Word64 Rss, Word32 Rt) 620
Rdd=vasrw(Rss,#u5)
Word64 Q6_P_vasrw_PI(Word64 Rss, Word32 Iu5) 617
Rdd=vasrw(Rss,Rt)
Word64 Q6_P_vasrw_PR(Word64 Rss, Word32 Rt) 618

vavgh
Rd=vavgh(Rs,Rt)
Word32 Q6_R_vavgh_RR(Word32 Rs, Word32 Rt) 169
Rd=vavgh(Rs,Rt):rnd
Word32 Q6_R_vavgh_RR_rnd(Word32 Rs, Word32 Rt) 169
Rdd=vavgh(Rss,Rtt)
Word64 Q6_P_vavgh_PP(Word64 Rss, Word64 Rtt) 381
Rdd=vavgh(Rss,Rtt):crnd
Word64 Q6_P_vavgh_PP_crnd(Word64 Rss, Word64 Rtt) 381
Rdd=vavgh(Rss,Rtt):rnd
Word64 Q6_P_vavgh_PP_rnd(Word64 Rss, Word64 Rtt) 381

vavgub
Rdd=vavgub(Rss,Rtt)
Word64 Q6_P_vavgub_PP(Word64 Rss, Word64 Rtt) 382
Rdd=vavgub(Rss,Rtt):rnd
Word64 Q6_P_vavgub_PP_rnd(Word64 Rss, Word64 Rtt) 382

vavguh
Rdd=vavguh(Rss,Rtt)
Word64 Q6_P_vavguh_PP(Word64 Rss, Word64 Rtt) 381
Rdd=vavguh(Rss,Rtt):rnd
Word64 Q6_P_vavguh_PP_rnd(Word64 Rss, Word64 Rtt) 381

vavguw
Rdd=vavguw(Rss,Rtt)
Word64 Q6_P_vavguw_PP(Word64 Rss, Word64 Rtt) 384
Rdd=vavguw(Rss,Rtt):rnd
Word64 Q6_P_vavguw_PP_rnd(Word64 Rss, Word64 Rtt) 384

vavgw
Rdd=vavgw(Rss,Rtt)

80-N2040-53 Rev. AA 680


Qualcomm Hexagon V73 Programmer’s Reference Manual

Word64 Q6_P_vavgw_PP(Word64 Rss, Word64 Rtt) 384


Rdd=vavgw(Rss,Rtt):crnd
Word64 Q6_P_vavgw_PP_crnd(Word64 Rss, Word64 Rtt) 384
Rdd=vavgw(Rss,Rtt):rnd
Word64 Q6_P_vavgw_PP_rnd(Word64 Rss, Word64 Rtt) 384

vclip
Rdd=vclip(Rss,#u5)
Word64 Q6_P_vclip_PI(Word64 Rss, Word32 Iu5) 385

vcmpb.eq
Pd=!any8(vcmpb.eq(Rss,Rtt))
Byte Q6_p_not_any8_vcmpb_eq_PP(Word64 Rss, Word64 Rtt) 581
Pd=any8(vcmpb.eq(Rss,Rtt))
Byte Q6_p_any8_vcmpb_eq_PP(Word64 Rss, Word64 Rtt) 581
Pd=vcmpb.eq(Rss,#u8)
Byte Q6_p_vcmpb_eq_PI(Word64 Rss, Word32 Iu8) 583
Pd=vcmpb.eq(Rss,Rtt)
Byte Q6_p_vcmpb_eq_PP(Word64 Rss, Word64 Rtt) 583

vcmpb.gt
Pd=vcmpb.gt(Rss,#s8)
Byte Q6_p_vcmpb_gt_PI(Word64 Rss, Word32 Is8) 583
Pd=vcmpb.gt(Rss,Rtt)
Byte Q6_p_vcmpb_gt_PP(Word64 Rss, Word64 Rtt) 583

vcmpb.gtu
Pd=vcmpb.gtu(Rss,#u7)
Byte Q6_p_vcmpb_gtu_PI(Word64 Rss, Word32 Iu7) 583
Pd=vcmpb.gtu(Rss,Rtt)
Byte Q6_p_vcmpb_gtu_PP(Word64 Rss, Word64 Rtt) 583

vcmph.eq
Pd=vcmph.eq(Rss,#s8)
Byte Q6_p_vcmph_eq_PI(Word64 Rss, Word32 Is8) 580
Pd=vcmph.eq(Rss,Rtt)
Byte Q6_p_vcmph_eq_PP(Word64 Rss, Word64 Rtt) 580

vcmph.gt
Pd=vcmph.gt(Rss,#s8)
Byte Q6_p_vcmph_gt_PI(Word64 Rss, Word32 Is8) 580
Pd=vcmph.gt(Rss,Rtt)
Byte Q6_p_vcmph_gt_PP(Word64 Rss, Word64 Rtt) 580

vcmph.gtu
Pd=vcmph.gtu(Rss,#u7)
Byte Q6_p_vcmph_gtu_PI(Word64 Rss, Word32 Iu7) 580
Pd=vcmph.gtu(Rss,Rtt)
Byte Q6_p_vcmph_gtu_PP(Word64 Rss, Word64 Rtt) 580

vcmpw.eq
Pd=vcmpw.eq(Rss,#s8)
Byte Q6_p_vcmpw_eq_PI(Word64 Rss, Word32 Is8) 585
Pd=vcmpw.eq(Rss,Rtt)
Byte Q6_p_vcmpw_eq_PP(Word64 Rss, Word64 Rtt) 585

vcmpw.gt
Pd=vcmpw.gt(Rss,#s8)

80-N2040-53 Rev. AA 681


Qualcomm Hexagon V73 Programmer’s Reference Manual

Byte Q6_p_vcmpw_gt_PI(Word64 Rss, Word32 Is8) 585


Pd=vcmpw.gt(Rss,Rtt)
Byte Q6_p_vcmpw_gt_PP(Word64 Rss, Word64 Rtt) 585

vcmpw.gtu
Pd=vcmpw.gtu(Rss,#u7)
Byte Q6_p_vcmpw_gtu_PI(Word64 Rss, Word32 Iu7) 585
Pd=vcmpw.gtu(Rss,Rtt)
Byte Q6_p_vcmpw_gtu_PP(Word64 Rss, Word64 Rtt) 585

vcmpyi
Rdd=vcmpyi(Rss,Rtt):<<1:sat
Word64 Q6_P_vcmpyi_PP_s1_sat(Word64 Rss, Word64 Rtt) 448
Rdd=vcmpyi(Rss,Rtt):sat
Word64 Q6_P_vcmpyi_PP_sat(Word64 Rss, Word64 Rtt) 448
Rxx+=vcmpyi(Rss,Rtt):sat
Word64 Q6_P_vcmpyiacc_PP_sat(Word64 Rxx, Word64 Rss, Word64 Rtt) 448

vcmpyr
Rdd=vcmpyr(Rss,Rtt):<<1:sat
Word64 Q6_P_vcmpyr_PP_s1_sat(Word64 Rss, Word64 Rtt) 448
Rdd=vcmpyr(Rss,Rtt):sat
Word64 Q6_P_vcmpyr_PP_sat(Word64 Rss, Word64 Rtt) 448
Rxx+=vcmpyr(Rss,Rtt):sat
Word64 Q6_P_vcmpyracc_PP_sat(Word64 Rxx, Word64 Rss, Word64 Rtt) 448

vcnegh
Rdd=vcnegh(Rss,Rt)
Word64 Q6_P_vcnegh_PR(Word64 Rss, Word32 Rt) 386

vconj
Rdd=vconj(Rss):sat
Word64 Q6_P_vconj_P_sat(Word64 Rss) 450

vcrotate
Rdd=vcrotate(Rss,Rt)
Word64 Q6_P_vcrotate_PR(Word64 Rss, Word32 Rt) 452

vdmpy
Rd=vdmpy(Rss,Rtt):<<1:rnd:sat
Word32 Q6_R_vdmpy_PP_s1_rnd_sat(Word64 Rss, Word64 Rtt) 521
Rd=vdmpy(Rss,Rtt):rnd:sat
Word32 Q6_R_vdmpy_PP_rnd_sat(Word64 Rss, Word64 Rtt) 521
Rdd=vdmpy(Rss,Rtt):<<1:sat
Word64 Q6_P_vdmpy_PP_s1_sat(Word64 Rss, Word64 Rtt) 518
Rdd=vdmpy(Rss,Rtt):sat
Word64 Q6_P_vdmpy_PP_sat(Word64 Rss, Word64 Rtt) 518
Rxx+=vdmpy(Rss,Rtt):<<1:sat
Word64 Q6_P_vdmpyacc_PP_s1_sat(Word64 Rxx, Word64 Rss, Word64 Rtt) 518
Rxx+=vdmpy(Rss,Rtt):sat
Word64 Q6_P_vdmpyacc_PP_sat(Word64 Rxx, Word64 Rss, Word64 Rtt) 518

vdmpybsu
Rdd=vdmpybsu(Rss,Rtt):sat
Word64 Q6_P_vdmpybsu_PP_sat(Word64 Rss, Word64 Rtt) 525
Rxx+=vdmpybsu(Rss,Rtt):sat
Word64 Q6_P_vdmpybsuacc_PP_sat(Word64 Rxx, Word64 Rss, Word64 Rtt) 525

80-N2040-53 Rev. AA 682


Qualcomm Hexagon V73 Programmer’s Reference Manual

vitpack
Rd=vitpack(Ps,Pt)
Word32 Q6_R_vitpack_pp(Byte Ps, Byte Pt) 586

vlslh
Rdd=vlslh(Rss,Rt)
Word64 Q6_P_vlslh_PR(Word64 Rss, Word32 Rt) 616

vlslw
Rdd=vlslw(Rss,Rt)
Word64 Q6_P_vlslw_PR(Word64 Rss, Word32 Rt) 618

vlsrh
Rdd=vlsrh(Rss,#u4)
Word64 Q6_P_vlsrh_PI(Word64 Rss, Word32 Iu4) 611
Rdd=vlsrh(Rss,Rt)
Word64 Q6_P_vlsrh_PR(Word64 Rss, Word32 Rt) 616

vlsrw
Rdd=vlsrw(Rss,#u5)
Word64 Q6_P_vlsrw_PI(Word64 Rss, Word32 Iu5) 617
Rdd=vlsrw(Rss,Rt)
Word64 Q6_P_vlsrw_PR(Word64 Rss, Word32 Rt) 618

vmaxb
Rdd=vmaxb(Rtt,Rss)
Word64 Q6_P_vmaxb_PP(Word64 Rtt, Word64 Rss) 388

vmaxh
Rdd=vmaxh(Rtt,Rss)
Word64 Q6_P_vmaxh_PP(Word64 Rtt, Word64 Rss) 389

vmaxub
Rdd=vmaxub(Rtt,Rss)
Word64 Q6_P_vmaxub_PP(Word64 Rtt, Word64 Rss) 388

vmaxuh
Rdd=vmaxuh(Rtt,Rss)
Word64 Q6_P_vmaxuh_PP(Word64 Rtt, Word64 Rss) 389

vmaxuw
Rdd=vmaxuw(Rtt,Rss)
Word64 Q6_P_vmaxuw_PP(Word64 Rtt, Word64 Rss) 394

vmaxw
Rdd=vmaxw(Rtt,Rss)
Word64 Q6_P_vmaxw_PP(Word64 Rtt, Word64 Rss) 394

vminb
Rdd=vminb(Rtt,Rss)
Word64 Q6_P_vminb_PP(Word64 Rtt, Word64 Rss) 395

vminh
Rdd=vminh(Rtt,Rss)
Word64 Q6_P_vminh_PP(Word64 Rtt, Word64 Rss) 397

vminub
Rdd=vminub(Rtt,Rss)
Word64 Q6_P_vminub_PP(Word64 Rtt, Word64 Rss) 395

80-N2040-53 Rev. AA 683


Qualcomm Hexagon V73 Programmer’s Reference Manual

vminuh
Rdd=vminuh(Rtt,Rss)
Word64 Q6_P_vminuh_PP(Word64 Rtt, Word64 Rss) 397

vminuw
Rdd=vminuw(Rtt,Rss)
Word64 Q6_P_vminuw_PP(Word64 Rtt, Word64 Rss) 402

vminw
Rdd=vminw(Rtt,Rss)
Word64 Q6_P_vminw_PP(Word64 Rtt, Word64 Rss) 402

vmpybsu
Rdd=vmpybsu(Rs,Rt)
Word64 Q6_P_vmpybsu_RR(Word32 Rs, Word32 Rt) 537
Rxx+=vmpybsu(Rs,Rt)
Word64 Q6_P_vmpybsuacc_RR(Word64 Rxx, Word32 Rs, Word32 Rt) 537

vmpybu
Rdd=vmpybu(Rs,Rt)
Word64 Q6_P_vmpybu_RR(Word32 Rs, Word32 Rt) 537
Rxx+=vmpybu(Rs,Rt)
Word64 Q6_P_vmpybuacc_RR(Word64 Rxx, Word32 Rs, Word32 Rt) 537

vmpyeh
Rdd=vmpyeh(Rss,Rtt):<<1:sat
Word64 Q6_P_vmpyeh_PP_s1_sat(Word64 Rss, Word64 Rtt) 527
Rdd=vmpyeh(Rss,Rtt):sat
Word64 Q6_P_vmpyeh_PP_sat(Word64 Rss, Word64 Rtt) 527
Rxx+=vmpyeh(Rss,Rtt)
Word64 Q6_P_vmpyehacc_PP(Word64 Rxx, Word64 Rss, Word64 Rtt) 527
Rxx+=vmpyeh(Rss,Rtt):<<1:sat
Word64 Q6_P_vmpyehacc_PP_s1_sat(Word64 Rxx, Word64 Rss, Word64 Rtt) 527
Rxx+=vmpyeh(Rss,Rtt):sat
Word64 Q6_P_vmpyehacc_PP_sat(Word64 Rxx, Word64 Rss, Word64 Rtt) 527

vmpyh
Rd=vmpyh(Rs,Rt):<<1:rnd:sat
Word32 Q6_R_vmpyh_RR_s1_rnd_sat(Word32 Rs, Word32 Rt) 531
Rd=vmpyh(Rs,Rt):rnd:sat
Word32 Q6_R_vmpyh_RR_rnd_sat(Word32 Rs, Word32 Rt) 531
Rdd=vmpyh(Rs,Rt):<<1:sat
Word64 Q6_P_vmpyh_RR_s1_sat(Word32 Rs, Word32 Rt) 529
Rdd=vmpyh(Rs,Rt):sat
Word64 Q6_P_vmpyh_RR_sat(Word32 Rs, Word32 Rt) 529
Rxx+=vmpyh(Rs,Rt)
Word64 Q6_P_vmpyhacc_RR(Word64 Rxx, Word32 Rs, Word32 Rt) 529
Rxx+=vmpyh(Rs,Rt):<<1:sat
Word64 Q6_P_vmpyhacc_RR_s1_sat(Word64 Rxx, Word32 Rs, Word32 Rt) 529
Rxx+=vmpyh(Rs,Rt):sat
Word64 Q6_P_vmpyhacc_RR_sat(Word64 Rxx, Word32 Rs, Word32 Rt) 529

vmpyhsu
Rdd=vmpyhsu(Rs,Rt):<<1:sat
Word64 Q6_P_vmpyhsu_RR_s1_sat(Word32 Rs, Word32 Rt) 532
Rdd=vmpyhsu(Rs,Rt):sat
Word64 Q6_P_vmpyhsu_RR_sat(Word32 Rs, Word32 Rt) 532
Rxx+=vmpyhsu(Rs,Rt):<<1:sat

80-N2040-53 Rev. AA 684


Qualcomm Hexagon V73 Programmer’s Reference Manual

Word64 Q6_P_vmpyhsuacc_RR_s1_sat(Word64 Rxx, Word32 Rs, Word32 Rt) 532


Rxx+=vmpyhsu(Rs,Rt):sat
Word64 Q6_P_vmpyhsuacc_RR_sat(Word64 Rxx, Word32 Rs, Word32 Rt) 532

vmpyweh
Rdd=vmpyweh(Rss,Rtt):<<1:rnd:sat
Word64 Q6_P_vmpyweh_PP_s1_rnd_sat(Word64 Rss, Word64 Rtt) 489
Rdd=vmpyweh(Rss,Rtt):<<1:sat
Word64 Q6_P_vmpyweh_PP_s1_sat(Word64 Rss, Word64 Rtt) 489
Rdd=vmpyweh(Rss,Rtt):rnd:sat
Word64 Q6_P_vmpyweh_PP_rnd_sat(Word64 Rss, Word64 Rtt) 489
Rdd=vmpyweh(Rss,Rtt):sat
Word64 Q6_P_vmpyweh_PP_sat(Word64 Rss, Word64 Rtt) 489
Rxx+=vmpyweh(Rss,Rtt):<<1:rnd:sat
Word64 Q6_P_vmpywehacc_PP_s1_rnd_sat(Word64 Rxx, Word64 Rss, Word64 Rtt)
490
Rxx+=vmpyweh(Rss,Rtt):<<1:sat
Word64 Q6_P_vmpywehacc_PP_s1_sat(Word64 Rxx, Word64 Rss, Word64 Rtt) 490
Rxx+=vmpyweh(Rss,Rtt):rnd:sat
Word64 Q6_P_vmpywehacc_PP_rnd_sat(Word64 Rxx, Word64 Rss, Word64 Rtt) 490
Rxx+=vmpyweh(Rss,Rtt):sat
Word64 Q6_P_vmpywehacc_PP_sat(Word64 Rxx, Word64 Rss, Word64 Rtt) 490

vmpyweuh
Rdd=vmpyweuh(Rss,Rtt):<<1:rnd:sat
Word64 Q6_P_vmpyweuh_PP_s1_rnd_sat(Word64 Rss, Word64 Rtt) 493
Rdd=vmpyweuh(Rss,Rtt):<<1:sat
Word64 Q6_P_vmpyweuh_PP_s1_sat(Word64 Rss, Word64 Rtt) 493
Rdd=vmpyweuh(Rss,Rtt):rnd:sat
Word64 Q6_P_vmpyweuh_PP_rnd_sat(Word64 Rss, Word64 Rtt) 493
Rdd=vmpyweuh(Rss,Rtt):sat
Word64 Q6_P_vmpyweuh_PP_sat(Word64 Rss, Word64 Rtt) 494
Rxx+=vmpyweuh(Rss,Rtt):<<1:rnd:sat
Word64 Q6_P_vmpyweuhacc_PP_s1_rnd_sat(Word64 Rxx, Word64 Rss, Word64 Rtt)
494
Rxx+=vmpyweuh(Rss,Rtt):<<1:sat
Word64 Q6_P_vmpyweuhacc_PP_s1_sat(Word64 Rxx, Word64 Rss, Word64 Rtt) 494
Rxx+=vmpyweuh(Rss,Rtt):rnd:sat
Word64 Q6_P_vmpyweuhacc_PP_rnd_sat(Word64 Rxx, Word64 Rss, Word64 Rtt) 494
Rxx+=vmpyweuh(Rss,Rtt):sat
Word64 Q6_P_vmpyweuhacc_PP_sat(Word64 Rxx, Word64 Rss, Word64 Rtt) 494

vmpywoh
Rdd=vmpywoh(Rss,Rtt):<<1:rnd:sat
Word64 Q6_P_vmpywoh_PP_s1_rnd_sat(Word64 Rss, Word64 Rtt) 490
Rdd=vmpywoh(Rss,Rtt):<<1:sat
Word64 Q6_P_vmpywoh_PP_s1_sat(Word64 Rss, Word64 Rtt) 490
Rdd=vmpywoh(Rss,Rtt):rnd:sat
Word64 Q6_P_vmpywoh_PP_rnd_sat(Word64 Rss, Word64 Rtt) 490
Rdd=vmpywoh(Rss,Rtt):sat
Word64 Q6_P_vmpywoh_PP_sat(Word64 Rss, Word64 Rtt) 490
Rxx+=vmpywoh(Rss,Rtt):<<1:rnd:sat
Word64 Q6_P_vmpywohacc_PP_s1_rnd_sat(Word64 Rxx, Word64 Rss, Word64 Rtt)
490
Rxx+=vmpywoh(Rss,Rtt):<<1:sat
Word64 Q6_P_vmpywohacc_PP_s1_sat(Word64 Rxx, Word64 Rss, Word64 Rtt) 490
Rxx+=vmpywoh(Rss,Rtt):rnd:sat
Word64 Q6_P_vmpywohacc_PP_rnd_sat(Word64 Rxx, Word64 Rss, Word64 Rtt) 490
Rxx+=vmpywoh(Rss,Rtt):sat

80-N2040-53 Rev. AA 685


Qualcomm Hexagon V73 Programmer’s Reference Manual

Word64 Q6_P_vmpywohacc_PP_sat(Word64 Rxx, Word64 Rss, Word64 Rtt) 490

vmpywouh
Rdd=vmpywouh(Rss,Rtt):<<1:rnd:sat
Word64 Q6_P_vmpywouh_PP_s1_rnd_sat(Word64 Rss, Word64 Rtt) 494
Rdd=vmpywouh(Rss,Rtt):<<1:sat
Word64 Q6_P_vmpywouh_PP_s1_sat(Word64 Rss, Word64 Rtt) 494
Rdd=vmpywouh(Rss,Rtt):rnd:sat
Word64 Q6_P_vmpywouh_PP_rnd_sat(Word64 Rss, Word64 Rtt) 494
Rdd=vmpywouh(Rss,Rtt):sat
Word64 Q6_P_vmpywouh_PP_sat(Word64 Rss, Word64 Rtt) 494
Rxx+=vmpywouh(Rss,Rtt):<<1:rnd:sat
Word64 Q6_P_vmpywouhacc_PP_s1_rnd_sat(Word64 Rxx, Word64 Rss, Word64 Rtt)
494
Rxx+=vmpywouh(Rss,Rtt):<<1:sat
Word64 Q6_P_vmpywouhacc_PP_s1_sat(Word64 Rxx, Word64 Rss, Word64 Rtt) 494
Rxx+=vmpywouh(Rss,Rtt):rnd:sat
Word64 Q6_P_vmpywouhacc_PP_rnd_sat(Word64 Rxx, Word64 Rss, Word64 Rtt) 494
Rxx+=vmpywouh(Rss,Rtt):sat
Word64 Q6_P_vmpywouhacc_PP_sat(Word64 Rxx, Word64 Rss, Word64 Rtt) 494

vmux
Rdd=vmux(Pu,Rss,Rtt)
Word64 Q6_P_vmux_pPP(Byte Pu, Word64 Rss, Word64 Rtt) 587

vnavgh
Rd=vnavgh(Rt,Rs)
Word32 Q6_R_vnavgh_RR(Word32 Rt, Word32 Rs) 169
Rdd=vnavgh(Rtt,Rss)
Word64 Q6_P_vnavgh_PP(Word64 Rtt, Word64 Rss) 381
Rdd=vnavgh(Rtt,Rss):crnd:sat
Word64 Q6_P_vnavgh_PP_crnd_sat(Word64 Rtt, Word64 Rss) 381
Rdd=vnavgh(Rtt,Rss):rnd:sat
Word64 Q6_P_vnavgh_PP_rnd_sat(Word64 Rtt, Word64 Rss) 381

vnavgw
Rdd=vnavgw(Rtt,Rss)
Word64 Q6_P_vnavgw_PP(Word64 Rtt, Word64 Rss) 384
Rdd=vnavgw(Rtt,Rss):crnd:sat
Word64 Q6_P_vnavgw_PP_crnd_sat(Word64 Rtt, Word64 Rss) 384
Rdd=vnavgw(Rtt,Rss):rnd:sat
Word64 Q6_P_vnavgw_PP_rnd_sat(Word64 Rtt, Word64 Rss) 384

vpmpyh
Rdd=vpmpyh(Rs,Rt)
Word64 Q6_P_vpmpyh_RR(Word32 Rs, Word32 Rt) 539
Rxx^=vpmpyh(Rs,Rt)
Word64 Q6_P_vpmpyhxacc_RR(Word64 Rxx, Word32 Rs, Word32 Rt) 539

vraddh
Rd=vraddh(Rss,Rtt)
Word32 Q6_R_vraddh_PP(Word64 Rss, Word64 Rtt) 376

vraddub
Rdd=vraddub(Rss,Rtt)
Word64 Q6_P_vraddub_PP(Word64 Rss, Word64 Rtt) 374
Rxx+=vraddub(Rss,Rtt)
Word64 Q6_P_vraddubacc_PP(Word64 Rxx, Word64 Rss, Word64 Rtt) 374

80-N2040-53 Rev. AA 686


Qualcomm Hexagon V73 Programmer’s Reference Manual

vradduh
Rd=vradduh(Rss,Rtt)
Word32 Q6_R_vradduh_PP(Word64 Rss, Word64 Rtt) 376

vrcmpys
Rd=vrcmpys(Rss,Rt):<<1:rnd:sat
Word32 Q6_R_vrcmpys_PR_s1_rnd_sat(Word64 Rss, Word32 Rt) 457
Rdd=vrcmpys(Rss,Rt):<<1:sat
Word64 Q6_P_vrcmpys_PR_s1_sat(Word64 Rss, Word32 Rt) 454
Rxx+=vrcmpys(Rss,Rt):<<1:sat
Word64 Q6_P_vrcmpysacc_PR_s1_sat(Word64 Rxx, Word64 Rss, Word32 Rt) 454

vrcnegh
Rxx+=vrcnegh(Rss,Rt)
Word64 Q6_P_vrcneghacc_PR(Word64 Rxx, Word64 Rss, Word32 Rt) 386

vrcrotate
Rdd=vrcrotate(Rss,Rt,#u2)
Word64 Q6_P_vrcrotate_PRI(Word64 Rss, Word32 Rt, Word32 Iu2) 460
Rxx+=vrcrotate(Rss,Rt,#u2)
Word64 Q6_P_vrcrotateacc_PRI(Word64 Rxx, Word64 Rss, Word32 Rt, Word32 Iu2)
460

vrmaxh
Rxx=vrmaxh(Rss,Ru)
Word64 Q6_P_vrmaxh_PR(Word64 Rxx, Word64 Rss, Word32 Ru) 390

vrmaxuh
Rxx=vrmaxuh(Rss,Ru)
Word64 Q6_P_vrmaxuh_PR(Word64 Rxx, Word64 Rss, Word32 Ru) 390

vrmaxuw
Rxx=vrmaxuw(Rss,Ru)
Word64 Q6_P_vrmaxuw_PR(Word64 Rxx, Word64 Rss, Word32 Ru) 392

vrmaxw
Rxx=vrmaxw(Rss,Ru)
Word64 Q6_P_vrmaxw_PR(Word64 Rxx, Word64 Rss, Word32 Ru) 392

vrminh
Rxx=vrminh(Rss,Ru)
Word64 Q6_P_vrminh_PR(Word64 Rxx, Word64 Rss, Word32 Ru) 398

vrminuh
Rxx=vrminuh(Rss,Ru)
Word64 Q6_P_vrminuh_PR(Word64 Rxx, Word64 Rss, Word32 Ru) 398

vrminuw
Rxx=vrminuw(Rss,Ru)
Word64 Q6_P_vrminuw_PR(Word64 Rxx, Word64 Rss, Word32 Ru) 400

vrminw
Rxx=vrminw(Rss,Ru)
Word64 Q6_P_vrminw_PR(Word64 Rxx, Word64 Rss, Word32 Ru) 400

vrmpybsu
Rdd=vrmpybsu(Rss,Rtt)
Word64 Q6_P_vrmpybsu_PP(Word64 Rss, Word64 Rtt) 523
Rxx+=vrmpybsu(Rss,Rtt)
Word64 Q6_P_vrmpybsuacc_PP(Word64 Rxx, Word64 Rss, Word64 Rtt) 523

80-N2040-53 Rev. AA 687


Qualcomm Hexagon V73 Programmer’s Reference Manual

vrmpybu
Rdd=vrmpybu(Rss,Rtt)
Word64 Q6_P_vrmpybu_PP(Word64 Rss, Word64 Rtt) 523
Rxx+=vrmpybu(Rss,Rtt)
Word64 Q6_P_vrmpybuacc_PP(Word64 Rxx, Word64 Rss, Word64 Rtt) 523

vrmpyh
Rdd=vrmpyh(Rss,Rtt)
Word64 Q6_P_vrmpyh_PP(Word64 Rss, Word64 Rtt) 534
Rxx+=vrmpyh(Rss,Rtt)
Word64 Q6_P_vrmpyhacc_PP(Word64 Rxx, Word64 Rss, Word64 Rtt) 534

vrmpyweh
Rdd=vrmpyweh(Rss,Rtt)
Word64 Q6_P_vrmpyweh_PP(Word64 Rss, Word64 Rtt) 511
Rdd=vrmpyweh(Rss,Rtt):<<1
Word64 Q6_P_vrmpyweh_PP_s1(Word64 Rss, Word64 Rtt) 511
Rxx+=vrmpyweh(Rss,Rtt)
Word64 Q6_P_vrmpywehacc_PP(Word64 Rxx, Word64 Rss, Word64 Rtt) 511
Rxx+=vrmpyweh(Rss,Rtt):<<1
Word64 Q6_P_vrmpywehacc_PP_s1(Word64 Rxx, Word64 Rss, Word64 Rtt) 511

vrmpywoh
Rdd=vrmpywoh(Rss,Rtt)
Word64 Q6_P_vrmpywoh_PP(Word64 Rss, Word64 Rtt) 511
Rdd=vrmpywoh(Rss,Rtt):<<1
Word64 Q6_P_vrmpywoh_PP_s1(Word64 Rss, Word64 Rtt) 511
Rxx+=vrmpywoh(Rss,Rtt)
Word64 Q6_P_vrmpywohacc_PP(Word64 Rxx, Word64 Rss, Word64 Rtt) 511
Rxx+=vrmpywoh(Rss,Rtt):<<1
Word64 Q6_P_vrmpywohacc_PP_s1(Word64 Rxx, Word64 Rss, Word64 Rtt) 511

vrndwh
Rd=vrndwh(Rss)
Word32 Q6_R_vrndwh_P(Word64 Rss) 547
Rd=vrndwh(Rss):sat
Word32 Q6_R_vrndwh_P_sat(Word64 Rss) 547

vrsadub
Rdd=vrsadub(Rss,Rtt)
Word64 Q6_P_vrsadub_PP(Word64 Rss, Word64 Rtt) 404
Rxx+=vrsadub(Rss,Rtt)
Word64 Q6_P_vrsadubacc_PP(Word64 Rxx, Word64 Rss, Word64 Rtt) 404

vsathb
Rd=vsathb(Rs)
Word32 Q6_R_vsathb_R(Word32 Rs) 550
Rd=vsathb(Rss)
Word32 Q6_R_vsathb_P(Word64 Rss) 550
Rdd=vsathb(Rss)
Word64 Q6_P_vsathb_P(Word64 Rss) 553

vsathub
Rd=vsathub(Rs)
Word32 Q6_R_vsathub_R(Word32 Rs) 550
Rd=vsathub(Rss)
Word32 Q6_R_vsathub_P(Word64 Rss) 550
Rdd=vsathub(Rss)
Word64 Q6_P_vsathub_P(Word64 Rss) 553

80-N2040-53 Rev. AA 688


Qualcomm Hexagon V73 Programmer’s Reference Manual

vsatwh
Rd=vsatwh(Rss)
Word32 Q6_R_vsatwh_P(Word64 Rss) 550
Rdd=vsatwh(Rss)
Word64 Q6_P_vsatwh_P(Word64 Rss) 553

vsatwuh
Rd=vsatwuh(Rss)
Word32 Q6_R_vsatwuh_P(Word64 Rss) 550
Rdd=vsatwuh(Rss)
Word64 Q6_P_vsatwuh_P(Word64 Rss) 553

vsplatb
Rd=vsplatb(Rs)
Word32 Q6_R_vsplatb_R(Word32 Rs) 557
Rdd=vsplatb(Rs)
Word64 Q6_P_vsplatb_R(Word32 Rs) 557

vsplath
Rdd=vsplath(Rs)
Word64 Q6_P_vsplath_R(Word32 Rs) 558

vspliceb
Rdd=vspliceb(Rss,Rtt,#u3)
Word64 Q6_P_vspliceb_PPI(Word64 Rss, Word64 Rtt, Word32 Iu3) 559
Rdd=vspliceb(Rss,Rtt,Pu)
Word64 Q6_P_vspliceb_PPp(Word64 Rss, Word64 Rtt, Byte Pu) 559

vsubb
Rdd=vsubb(Rss,Rtt)
Word64 Q6_P_vsubb_PP(Word64 Rss, Word64 Rtt) 407

vsubh
Rd=vsubh(Rt,Rs)
Word32 Q6_R_vsubh_RR(Word32 Rt, Word32 Rs) 170
Rd=vsubh(Rt,Rs):sat
Word32 Q6_R_vsubh_RR_sat(Word32 Rt, Word32 Rs) 170
Rdd=vsubh(Rtt,Rss)
Word64 Q6_P_vsubh_PP(Word64 Rtt, Word64 Rss) 405
Rdd=vsubh(Rtt,Rss):sat
Word64 Q6_P_vsubh_PP_sat(Word64 Rtt, Word64 Rss) 405

vsubub
Rdd=vsubub(Rtt,Rss)
Word64 Q6_P_vsubub_PP(Word64 Rtt, Word64 Rss) 407
Rdd=vsubub(Rtt,Rss):sat
Word64 Q6_P_vsubub_PP_sat(Word64 Rtt, Word64 Rss) 407

vsubuh
Rd=vsubuh(Rt,Rs):sat
Word32 Q6_R_vsubuh_RR_sat(Word32 Rt, Word32 Rs) 170
Rdd=vsubuh(Rtt,Rss):sat
Word64 Q6_P_vsubuh_PP_sat(Word64 Rtt, Word64 Rss) 405

vsubw
Rdd=vsubw(Rtt,Rss)
Word64 Q6_P_vsubw_PP(Word64 Rtt, Word64 Rss) 408
Rdd=vsubw(Rtt,Rss):sat
Word64 Q6_P_vsubw_PP_sat(Word64 Rtt, Word64 Rss) 408

80-N2040-53 Rev. AA 689


Qualcomm Hexagon V73 Programmer’s Reference Manual

vsxtbh
Rdd=vsxtbh(Rs)
Word64 Q6_P_vsxtbh_R(Word32 Rs) 561

vsxthw
Rdd=vsxthw(Rs)
Word64 Q6_P_vsxthw_R(Word32 Rs) 561

vtrunehb
Rd=vtrunehb(Rss)
Word32 Q6_R_vtrunehb_P(Word64 Rss) 564
Rdd=vtrunehb(Rss,Rtt)
Word64 Q6_P_vtrunehb_PP(Word64 Rss, Word64 Rtt) 564

vtrunewh
Rdd=vtrunewh(Rss,Rtt)
Word64 Q6_P_vtrunewh_PP(Word64 Rss, Word64 Rtt) 564

vtrunohb
Rd=vtrunohb(Rss)
Word32 Q6_R_vtrunohb_P(Word64 Rss) 564
Rdd=vtrunohb(Rss,Rtt)
Word64 Q6_P_vtrunohb_PP(Word64 Rss, Word64 Rtt) 564

vtrunowh
Rdd=vtrunowh(Rss,Rtt)
Word64 Q6_P_vtrunowh_PP(Word64 Rss, Word64 Rtt) 564

vxaddsubh
Rdd=vxaddsubh(Rss,Rtt):rnd:>>1:sat
Word64 Q6_P_vxaddsubh_PP_rnd_rs1_sat(Word64 Rss, Word64 Rtt) 430
Rdd=vxaddsubh(Rss,Rtt):sat
Word64 Q6_P_vxaddsubh_PP_sat(Word64 Rss, Word64 Rtt) 430

vxaddsubw
Rdd=vxaddsubw(Rss,Rtt):sat
Word64 Q6_P_vxaddsubw_PP_sat(Word64 Rss, Word64 Rtt) 431

vxsubaddh
Rdd=vxsubaddh(Rss,Rtt):rnd:>>1:sat
Word64 Q6_P_vxsubaddh_PP_rnd_rs1_sat(Word64 Rss, Word64 Rtt) 430
Rdd=vxsubaddh(Rss,Rtt):sat
Word64 Q6_P_vxsubaddh_PP_sat(Word64 Rss, Word64 Rtt) 430

vxsubaddw
Rdd=vxsubaddw(Rss,Rtt):sat
Word64 Q6_P_vxsubaddw_PP_sat(Word64 Rss, Word64 Rtt) 431

vzxtbh
Rdd=vzxtbh(Rs)
Word64 Q6_P_vzxtbh_R(Word32 Rs) 565

vzxthw
Rdd=vzxthw(Rs)
Word64 Q6_P_vzxthw_R(Word32 Rs) 565

X
xor
Pd=xor(Ps,Pt)

80-N2040-53 Rev. AA 690


Qualcomm Hexagon V73 Programmer’s Reference Manual

Byte Q6_p_xor_pp(Byte Ps, Byte Pt) 204


Rd=xor(Rs,Rt)
Word32 Q6_R_xor_RR(Word32 Rs, Word32 Rt) 158
Rdd=xor(Rss,Rtt)
Word64 Q6_P_xor_PP(Word64 Rss, Word64 Rtt) 343
Rx&=xor(Rs,Rt)
Word32 Q6_R_xorand_RR(Word32 Rx, Word32 Rs, Word32 Rt) 347
Rx^=xor(Rs,Rt)
Word32 Q6_R_xorxacc_RR(Word32 Rx, Word32 Rs, Word32 Rt) 347
Rx|=xor(Rs,Rt)
Word32 Q6_R_xoror_RR(Word32 Rx, Word32 Rs, Word32 Rt) 347
Rxx^=xor(Rss,Rtt)
Word64 Q6_P_xorxacc_PP(Word64 Rxx, Word64 Rss, Word64 Rtt) 345

Z
zxtb
Rd=zxtb(Rs)
Word32 Q6_R_zxtb_R(Word32 Rs) 171

zxth
Rd=zxth(Rs)
Word32 Q6_R_zxth_R(Word32 Rs) 171

80-N2040-53 Rev. AA 691

You might also like

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy