Xab

The document provides an overview of various assembly language instructions, including loop instructions, I/O operations, string operations, flag control instructions, conditional operations, and miscellaneous instructions. It details how each instruction functions, its operands, and specific mnemonics used for different operations. Additionally, it explains the purpose of instructions like 'nop', 'ud2', 'cpuid', and 'enter' in managing control flow and system operations.

Uploaded by

nohew14488
Copyright
© © All Rights Reserved

\-----------------------------------------------------------/

The "loop" instructions are conditional jumps that use a value placed in
CX (or ECX) to specify the number of repetitions of a software loop. All
"loop" instructions automatically decrement CX (or ECX) and terminate the
loop (do not transfer control) when the decremented counter is zero. They use
CX or ECX depending on whether the current code setting is 16-bit or 32-bit,
but they can be forced to use CX with the "loopw" mnemonic or ECX with the
"loopd" mnemonic. "loope" and "loopz" are synonyms for the same instruction,
which acts as the standard "loop", but also terminates the loop when the ZF
flag is cleared (it keeps looping only while ZF is set). "loopew" and "loopzw"
force it to use the CX register, while "looped" and "loopzd" force it to use
the ECX register. "loopne" and "loopnz" are synonyms for the same instruction,
which acts as the standard "loop", but also terminates the loop when the ZF
flag is set. "loopnew" and "loopnzw" force it to use the CX register, while
"loopned" and "loopnzd" force it to use the ECX register. Every "loop"
instruction needs an immediate operand specifying the target address, and only
a short jump is possible: the target must lie within 128 bytes back or 127
bytes forward from the address of the instruction following the "loop"
instruction.
"jcxz" branches to the label specified in the instruction if it finds a
value of zero in CX, "jecxz" does the same, but checks the value of ECX
instead of CX. Rules for the operands are the same as for the "loop"
instruction.
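
The counter and ZF rules above can be modelled with a small Python sketch. It is not an emulator: `run_loop` and its `body` callback are hypothetical names, the counter is assumed to be the 16-bit CX, and the callback stands in for the loop body by returning the ZF value it leaves behind. Following the processor's documented behaviour, "loope" keeps looping only while ZF is set and "loopne" only while ZF is clear.

```python
from typing import Callable

MASK16 = 0xFFFF  # CX is a 16-bit register


def run_loop(cx: int, body: Callable[[int], bool], variant: str = "loop",
             guard_jcxz: bool = False) -> int:
    """Return how many times `body` runs before control falls through.

    `body(i)` returns the ZF value left by iteration i of the loop body.
    `variant` is "loop", "loope" or "loopne".
    """
    if guard_jcxz and cx == 0:
        return 0                      # jcxz jumps past the loop when CX = 0
    iterations = 0
    while True:
        zf = body(iterations)
        iterations += 1
        cx = (cx - 1) & MASK16        # every loop form decrements CX first
        if cx == 0:
            return iterations         # counter exhausted
        if variant == "loope" and not zf:
            return iterations         # loope/loopz stop when ZF is clear
        if variant == "loopne" and zf:
            return iterations         # loopne/loopnz stop when ZF is set
```

The `cx == 0` case shows why a "jcxz" test is often placed in front of a "loop": without it, the first decrement wraps CX to 0FFFFh and the body runs 65536 times.
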
"int" activates the interrupt service routine that corresponds to the
number specified as an operand to the instruction, the number should be in
range from 0 to 255. The interrupt service routine terminates with an "iret"
instruction that returns control to the instruction that follows "int".
"int3" mnemonic codes the short (one byte) trap that invokes the interrupt 3.
"into" instruction invokes the interrupt 4 if the OF flag is set.
"bound" verifies that the signed value contained in the specified register
lies within specified limits. An interrupt 5 occurs if the value contained in
the register is less than the lower bound or greater than the upper bound. It
needs two operands: the first specifies the register being tested, and the
second should be the memory address of the two signed limit values, stored
with the lower bound first and the upper bound immediately after it. The
operands can be "word" or "dword" in size.

bound ax,[bx] ; check word for bounds
bound eax,[esi] ; check double word for bounds
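
A minimal Python sketch of the comparison "bound" performs, assuming the two limits have already been read from memory (lower bound first). `to_signed` and `bound_check` are hypothetical helper names; a False result corresponds to the case in which interrupt 5 is raised.

```python
def to_signed(value: int, bits: int) -> int:
    """Reinterpret the low `bits` bits of `value` as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def bound_check(reg: int, lower: int, upper: int, bits: int = 16) -> bool:
    """True if `bound` passes; False is the case that raises interrupt 5.

    `lower` and `upper` are the two signed limits read from memory.
    Both limits are inclusive.
    """
    value = to_signed(reg, bits)
    return to_signed(lower, bits) <= value <= to_signed(upper, bits)
```

Because the comparison is signed, 0FFFFh in a word register is treated as -1 and fails a 0..10 check.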

2.1.7 I/O instructions

"in" transfers a byte, word, or double word from an input port to AL, AX,
or EAX. I/O ports can be addressed either directly, with the immediate byte
value coded in instruction, or indirectly via the DX register. The destination
operand should be AL, AX, or EAX register. The source operand should be an
immediate value in range from 0 to 255, or DX register.

in al,20h ; input byte from port 20h
in ax,dx ; input word from port addressed by dx

"out" transfers a byte, word, or double word to an output port from AL, AX,
or EAX. The program can specify the number of the port using the same methods
as the "in" instruction. The destination operand should be an immediate value
in range from 0 to 255, or DX register. The source operand should be AL, AX,
or EAX register.

out 20h,ax ; output word to port 20h
out dx,al ; output byte to port addressed by dx

2.1.8 String operations

The string operations operate on one element of a string. A string element
may be a byte, a word, or a double word. The string elements are addressed by
SI and DI (or ESI and EDI) registers. After every string operation SI and/or
DI (or ESI and/or EDI) are automatically updated to point to the next element
of the string. If DF (direction flag) is zero, the index registers are
incremented; if DF is one, they are decremented. The amount of the increment
or decrement is 1, 2, or 4 depending on the size of the string element. Every
string operation instruction has a short form which has no operands and uses
SI and/or DI when the code type is 16-bit, and ESI and/or EDI when the code
type is 32-bit. SI and ESI by default address data in the segment selected
by DS, while DI and EDI always address data in the segment selected by ES. The
short form is obtained by appending to the string operation mnemonic a letter
specifying the size of the string element: "b" for a byte element, "w" for a
word element, and "d" for a double word element. The full form of a string
operation needs operands providing the size operator and the memory addresses,
which can be SI or ESI with any segment prefix, and DI or EDI always with the
ES segment prefix.
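
The index update rule can be sketched in Python as follows. `next_index` and `SUFFIX_SIZE` are hypothetical names; `bits` selects 16-bit (SI/DI) or 32-bit (ESI/EDI) wrap-around, and the suffix table mirrors the "b", "w" and "d" short forms.

```python
SUFFIX_SIZE = {"b": 1, "w": 2, "d": 4}  # short-form suffix -> element size


def next_index(index: int, size: int, df: int, bits: int = 16) -> int:
    """Value of SI/DI (bits=16) or ESI/EDI (bits=32) after one string operation."""
    if size not in SUFFIX_SIZE.values():
        raise ValueError("element size must be 1, 2 or 4 bytes")
    step = -size if df else size       # DF = 1 walks downwards
    return (index + step) & ((1 << bits) - 1)
```
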
"movs" transfers the string element pointed to by SI (or ESI) to the
location pointed to by DI (or EDI). Size of operands can be byte, word, or
double word. The destination operand should be memory addressed by DI or EDI,
the source operand should be memory addressed by SI or ESI with any segment
prefix.

movs byte [di],[si] ; transfer byte
movs word [es:di],[ss:si] ; transfer word
movsd ; transfer double word

"cmps" subtracts the destination string element from the source string
element and updates the flags AF, SF, ZF, PF, CF and OF, but it does not
change either of the compared elements. If the string elements are equal, ZF
is set, otherwise it is cleared. The first operand for this instruction should
be the source string element addressed by SI or ESI with any segment prefix,
and the second operand should be the destination string element addressed by
DI or EDI.

cmpsb ; compare bytes
cmps word [ds:si],[es:di] ; compare words
cmps dword [fs:esi],[edi] ; compare double words

"scas" subtracts the destination string element from AL, AX, or EAX
(depending on the size of string element) and updates the flags AF, SF, ZF,
PF, CF and OF. If the values are equal, ZF is set, otherwise it is cleared.
The operand should be the destination string element addressed by DI or EDI.

scas byte [es:di] ; scan byte
scasw ; scan word
scas dword [es:edi] ; scan double word

"stos" places the value of AL, AX, or EAX into the destination string
element. Rules for the operand are the same as for the "scas" instruction.
"lods" places the source string element into AL, AX, or EAX. The operand
should be the source string element addressed by SI or ESI with any segment
prefix.

lods byte [ds:si] ; load byte
lods word [cs:si] ; load word
lodsd ; load double word

"ins" transfers a byte, word, or double word from an input port addressed
by DX register to the destination string element. The destination operand
should be memory addressed by DI or EDI, the source operand should be the DX
register.

insb ; input byte
ins word [es:di],dx ; input word
ins dword [edi],dx ; input double word

"outs" transfers the source string element to an output port addressed by
the DX register. The destination operand should be the DX register and the
source operand should be memory addressed by SI or ESI with any segment prefix.

outs dx,byte [si] ; output byte
outsw ; output word
outs dx,dword [gs:esi] ; output double word

The repeat prefixes "rep", "repe"/"repz", and "repne"/"repnz" specify
repeated string operation. When a string operation instruction has a repeat
prefix, the operation is executed repeatedly, each time using a different
element of the string. The repetition terminates when one of the conditions
specified by the prefix is satisfied. All three prefixes automatically
decrease the CX or ECX register (depending on whether the string operation
instruction uses 16-bit or 32-bit addressing) after each operation and repeat
the associated operation until CX or ECX is zero. "repe"/"repz" and
"repne"/"repnz" are used exclusively with the "scas" and "cmps" instructions
(described above). When these prefixes are used, repetition of the next
instruction also depends on the zero flag (ZF): "repe" and "repz" terminate
the execution when ZF is zero, and "repne" and "repnz" terminate the execution
when ZF is set.

rep movsd ; transfer multiple double words
repe cmpsb ; compare bytes until not equal
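
To make the interaction between the counter and ZF concrete, here is a Python sketch of "repe cmpsb". The function name `repe_cmpsb` is hypothetical, DF is assumed to be zero, both strings are assumed to start at offset 0, and when CX starts at zero (no comparison is made, so the flags are unchanged) the model simply reports ZF as set.

```python
from typing import Tuple


def repe_cmpsb(src: bytes, dst: bytes, cx: int) -> Tuple[int, int, bool]:
    """Model `repe cmpsb` with DF = 0, both strings starting at offset 0.

    Returns (elements_compared, remaining_cx, zf).
    """
    i = 0
    zf = True  # assumption: CX = 0 leaves flags untouched; report ZF = 1
    while cx != 0:                 # the prefix checks the counter first
        zf = src[i] == dst[i]      # cmps sets ZF when the elements are equal
        i += 1                     # SI and DI advance by one byte
        cx -= 1                    # CX is decremented after each element
        if not zf:                 # repe/repz stop once ZF is zero
            break
    return i, cx, zf
```

After a mismatch both index registers already point past the differing byte, so with DF = 0 the mismatch sits at offset `elements_compared - 1`.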

2.1.9 Flag control instructions

The flag control instructions provide a method for directly changing the
state of bits in the flag register. All instructions described in this
section have no operands.
"stc" sets the CF (carry flag) to 1, "clc" zeroes the CF, "cmc" changes the
CF to its complement. "std" sets the DF (direction flag) to 1, "cld" zeroes
the DF, "sti" sets the IF (interrupt flag) to 1 and therefore enables the
interrupts, "cli" zeroes the IF and therefore disables the interrupts.
"lahf" copies SF, ZF, AF, PF, and CF to bits 7, 6, 4, 2, and 0 of the
AH register. Bit 1 of AH is set to 1 and bits 3 and 5 are cleared, matching
the layout of the low byte of the flags register. The flags remain unaffected.
"sahf" transfers bits 7, 6, 4, 2, and 0 from the AH register into SF, ZF,
AF, PF, and CF.
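
The bit layout used by "lahf" and "sahf" can be sketched in Python. `lahf`, `sahf` and `FLAG_BITS` are hypothetical names, and the sketch follows the processor documentation's layout, in which bit 1 of AH reads as 1 and bits 3 and 5 read as 0.

```python
from typing import Dict

FLAG_BITS = {"CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7}


def lahf(flags: Dict[str, bool]) -> int:
    """Return AH after `lahf` (bit 1 set, bits 3 and 5 clear)."""
    ah = 0x02
    for name, bit in FLAG_BITS.items():
        if flags.get(name):
            ah |= 1 << bit
    return ah


def sahf(ah: int) -> Dict[str, bool]:
    """Return the five flags loaded from bits 7, 6, 4, 2 and 0 of AH."""
    return {name: bool((ah >> bit) & 1) for name, bit in FLAG_BITS.items()}
```
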
"pushf" decrements "esp" by two or four and stores the low word or
double word of flags register at the top of stack, size of stored data
depends on the current code setting. "pushfw" variant forces storing the
word and "pushfd" forces storing the double word.
"popf" transfers specific bits from the word or double word at the top
of stack, then increments "esp" by two or four, this value depends on
the current code setting. "popfw" variant forces restoring from the word
and "popfd" forces restoring from the double word.

2.1.10 Conditional operations

The instructions obtained by attaching the condition mnemonic (see table
2.1) to the "set" mnemonic set a byte to one if the condition is true and set
the byte to zero otherwise. The operand should be an 8-bit general register
or a byte in memory.

setne al ; set al if zero flag cleared
seto byte [bx] ; set byte if overflow
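
A Python sketch of what the "set" family stores. Only a handful of x86 condition mnemonics are included, with their standard flag tests; `CONDITIONS` and `setcc` are hypothetical names, and the full list of conditions is the one in table 2.1.

```python
from typing import Callable, Dict

Flags = Dict[str, bool]

# A few x86 condition mnemonics and the flag test each one performs.
CONDITIONS: Dict[str, Callable[[Flags], bool]] = {
    "e":  lambda f: f["ZF"],
    "ne": lambda f: not f["ZF"],
    "c":  lambda f: f["CF"],
    "nc": lambda f: not f["CF"],
    "o":  lambda f: f["OF"],
    "no": lambda f: not f["OF"],
    "l":  lambda f: f["SF"] != f["OF"],
    "g":  lambda f: not f["ZF"] and f["SF"] == f["OF"],
}


def setcc(cond: str, flags: Flags) -> int:
    """Byte stored by `set<cond>`: 1 if the condition holds, else 0."""
    return 1 if CONDITIONS[cond](flags) else 0
```
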

"salc" instruction sets all bits of the AL register when the carry flag is
set and zeroes the AL register otherwise. This instruction has no operands.
The instructions obtained by attaching the condition mnemonic to "cmov"
mnemonic transfer the word or double word from the general register or memory
to the general register only when the condition is true. The destination
operand should be general register, the source operand can be general register
or memory.

cmove ax,bx ; move when zero flag set
cmovnc eax,[ebx] ; move when carry flag cleared
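
The sketch below models "salc" and a common branch-free use of "cmov": "cmp eax,ebx" followed by "cmova eax,ebx" leaves the unsigned minimum in EAX. The function names are hypothetical, and the flags are computed the way an unsigned 32-bit "cmp" would set them.

```python
MASK32 = 0xFFFFFFFF


def salc(cf: bool) -> int:
    """AL after `salc`: all bits set when CF = 1, zero otherwise."""
    return 0xFF if cf else 0x00


def unsigned_min(eax: int, ebx: int) -> int:
    """Model `cmp eax,ebx` followed by `cmova eax,ebx`."""
    eax &= MASK32
    ebx &= MASK32
    cf = eax < ebx                 # borrow out of the unsigned subtraction
    zf = ((eax - ebx) & MASK32) == 0
    above = not cf and not zf      # condition "a" (same as "nbe")
    return ebx if above else eax   # cmova moves only when the condition holds
```
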

"cmpxchg" compares the value in the AL, AX, or EAX register with the
destination operand. If the two values are equal, the source operand is
loaded into the destination operand. Otherwise, the destination operand is
loaded into the AL, AX, or EAX register. The destination operand may be a
general register or memory, the source operand must be a general register.

cmpxchg dl,bl ; compare and exchange with register
cmpxchg [bx],dx ; compare and exchange with memory
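
A Python sketch of the data movement performed by "cmpxchg", where `accumulator` stands for AL, AX or EAX. The function name is hypothetical; ZF is included because the processor sets it when the values match and clears it otherwise.

```python
from typing import Tuple


def cmpxchg(accumulator: int, dest: int, source: int) -> Tuple[int, int, bool]:
    """Return (new_accumulator, new_dest, zf) after `cmpxchg dest,source`."""
    if accumulator == dest:
        return accumulator, source, True   # equal: dest <- source, ZF = 1
    return dest, dest, False               # not equal: accumulator <- dest, ZF = 0
```

Combined with the "lock" prefix, this compare-and-swap step is the usual building block for updating shared memory without a separate lock.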

"cmpxchg8b" compares the 64-bit value in EDX and EAX registers with the
destination operand. If the values are equal, the 64-bit value in ECX and EBX
registers is stored in the destination operand. Otherwise, the value in the
destination operand is loaded into EDX and EAX registers. The destination
operand should be a quad word in memory.

cmpxchg8b [bx] ; compare and exchange 8 bytes
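
The same idea for "cmpxchg8b", sketched in Python with the 64-bit register pairs assembled explicitly (EDX and ECX hold the high halves). `cmpxchg8b` here is a hypothetical model function, and the memory operand is represented as a single 64-bit integer.

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF


def cmpxchg8b(edx: int, eax: int, ecx: int, ebx: int,
              mem: int) -> Tuple[int, int, int, bool]:
    """Return (edx, eax, mem, zf) after `cmpxchg8b` on a 64-bit memory value."""
    expected = ((edx & MASK32) << 32) | (eax & MASK32)        # EDX:EAX
    if expected == mem:
        replacement = ((ecx & MASK32) << 32) | (ebx & MASK32)  # ECX:EBX
        return edx, eax, replacement, True
    return (mem >> 32) & MASK32, mem & MASK32, mem, False
```
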

2.1.11 Miscellaneous instructions

"nop" instruction occupies one byte but affects nothing but the instruction
pointer. This instruction has no operands and doesn't perform any operation.
"ud2" instruction generates an invalid opcode exception. This instruction
is provided for software testing to explicitly generate an invalid opcode.
This instruction has no operands.
"xlat" replaces a byte in the AL register with a byte indexed by its value
in a translation table addressed by BX or EBX. The operand should be a byte
in memory addressed by BX or EBX with any segment prefix. This instruction
also has a short form "xlatb", which has no operands and uses BX or EBX
(depending on the current code setting) to address the table in the segment
selected by DS.
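
A Python sketch of the 16-bit "xlatb" lookup, using a hex-digit table as an illustrative example. The flat `memory` buffer standing in for the DS segment and the function name are assumptions.

```python
def xlatb(memory: bytes, bx: int, al: int) -> int:
    """New AL after `xlatb`: the byte at DS:[BX + AL] (16-bit addressing)."""
    return memory[(bx + (al & 0xFF)) & 0xFFFF]
```

For example, with the table "0123456789ABCDEF" placed at offset 100h and BX pointing to it, AL = 0Bh is translated into the character code of "B".
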
"lds" transfers a pointer variable from the source operand to DS and the
destination register. The source operand must be a memory operand, and the
destination operand must be a general register. The DS register receives the
segment selector of the pointer while the destination register receives the
offset part of the pointer. "les", "lfs", "lgs" and "lss" operate identically
to "lds" except that they load the ES, FS, GS and SS register respectively
instead of DS.

lds bx,[si] ; load pointer to ds:bx
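
The pointer layout read by "lds bx,[si]" can be sketched with Python's `struct` module: in a 16:16 far pointer the offset word comes first and the segment word follows it. The function name and the flat `memory` buffer are assumptions of this sketch.

```python
import struct
from typing import Tuple


def lds(memory: bytes, si: int) -> Tuple[int, int]:
    """Return (ds, bx) as loaded by `lds bx,[si]` in 16-bit code."""
    # A 16:16 far pointer stores the offset word first, then the segment word.
    offset, segment = struct.unpack_from("<HH", memory, si)
    return segment, offset
```
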

"lea" transfers the offset of the source operand (rather than its value)
to the destination operand. The source operand must be a memory operand, and
the destination operand must be a general register.

lea dx,[bx+si+1] ; load effective address to dx
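
What "lea dx,[bx+si+1]" computes can be shown with a one-line Python model: only the address arithmetic is performed, with 16-bit wrap-around, and no memory is read. The name `lea16` is hypothetical.

```python
def lea16(bx: int, si: int, disp: int) -> int:
    """DX after `lea dx,[bx+si+disp]` in 16-bit code (no memory access)."""
    return (bx + si + disp) & 0xFFFF
```

Since "lea" does not modify the flags, it is also commonly used for plain addition.
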

"cpuid" returns processor identification and feature information in the
EAX, EBX, ECX, and EDX registers. The information returned is selected by
entering a value in the EAX register before the instruction is executed.
This instruction has no operands.
"pause" instruction delays the execution of the next instruction for an
implementation-specific amount of time. It can be used to improve the
performance of spin-wait loops. This instruction has no operands.
"enter" creates a stack frame that may be used to implement the scope rules
of block-structured high-level languages. A "leave" instruction at the end of
a procedure complements an "enter" at the beginning of the procedure to
simplify stack management and to control access to variables for nested
procedures. The "enter" instruction includes two parameters. The first
parameter specifies the number of bytes of dynamic storage to be allocated on
the stack for the routine being entered. The second parameter corresponds to
the lexical nesting level of the routine, it can be in range from 0 to 31.
The specified lexical level determines how many sets of stack frame pointers
the CPU copies into the new stack frame from the preceding frame. This list
of stack frame pointers is sometimes called the display. The first word (or
double word when code is 32-bit) of the display is a pointer to the last stack
frame. This pointer enables a "leave" instruction to reverse the action of the
previous "enter" instruction by effectively discarding the last stack frame.
After "enter" creates the new display for a procedure, it allocates the
dynamic storage space for that procedure by decrementing ESP by the number of
bytes specified in the first parameter. To enable a procedure to address its
display, "enter" leaves BP (or EBP) pointing to the beginning of the new stack
frame. If the lexical level is zero, "enter" pushes BP (or EBP), copies SP to
BP (or ESP to EBP) and then subtracts the first operand from ESP. For nesting
levels greater than zero, the processor pushes additional frame pointers on
the stack before adjusting the stack pointer.

enter 2048,0 ; enter and allocate 2048 bytes on stack
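
For lexical level zero, the steps of "enter" and "leave" can be traced with a toy Python stack. This sketch assumes 32-bit code and models only level 0; `Stack`, `enter_level0` and `leave` are hypothetical names, and memory is a dictionary of dword slots.

```python
from typing import Dict


class Stack:
    """Toy 32-bit stack: a dict of dword slots plus ESP and EBP."""

    def __init__(self, esp: int, ebp: int) -> None:
        self.mem: Dict[int, int] = {}
        self.esp = esp
        self.ebp = ebp

    def push(self, value: int) -> None:
        self.esp -= 4
        self.mem[self.esp] = value

    def pop(self) -> int:
        value = self.mem[self.esp]
        self.esp += 4
        return value


def enter_level0(s: Stack, size: int) -> None:
    s.push(s.ebp)    # push ebp
    s.ebp = s.esp    # mov ebp,esp
    s.esp -= size    # sub esp,size


def leave(s: Stack) -> None:
    s.esp = s.ebp    # mov esp,ebp
    s.ebp = s.pop()  # pop ebp
```
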

2.1.12 System instructions

"lmsw" loads the operand into the machine status word (bits 0 through 15 of
CR0 register), while "smsw" stores the machine status word into the
destination operand. The operand for both those instructions can be 16-bit
general register or memory, for "smsw" it can also be 32-bit general
register.

lmsw ax ; load machine status from register
smsw [bx] ; store machine status to memory

"lgdt" and "lidt" instructions load the value of the operand into the global
descriptor table register or the interrupt descriptor table register
respectively. "sgdt" and "sidt" store the contents of the global descriptor
table register or the interrupt descriptor table register in the destination
operand. The operand should be a 6-byte memory location.

lgdt [ebx] ; load global descriptor table

"lldt" loads the operand into the segment selector field of the local
descriptor table register and "sldt" stores the segment selector from the
local descriptor table register in the operand. "ltr" loads the operand into
the segment selector field of the task register and "str" stores the segment
selector from the task register in the operand. Rules for operand are the same
as for the "lmsw" and "smsw" instructions.
"lar" loads the access rights from the segment descriptor specified by
the selector in source operand into the destination operand and sets the ZF
flag. The destination operand can be a 16-bit or 32-bit general register.
The source operand should be a 16-bit general register or memory.

lar ax,[bx] ; load access rights into word
lar eax,dx ; load access rights into double word

"lsl" loads the segment limit from the segment descriptor specified by the
selector in source operand into the destination operand and sets the ZF flag.
Rules for operand are the same as for the "lar" instruction.
"verr" and "verw" verify whether the code or data segment specified with
the operand is readable or writable from the current privilege level. The
operand should be a word, it can be general register or memory. If the segment
is accessible and readable (for "verr") or writable (for "verw") the ZF flag
is set, otherwise it's cleared. Rules for operand are the same as for the
"lldt" instruction.
"arpl" compares the RPL (requestor's privilege level) fields of two segment
selectors. The first operand contains one segment selector and the second
operand contains the other. If the RPL field of the destination operand is
less than the RPL field of the source operand, the ZF flag is set and the RPL
field of the destination operand is increased to match that of the source
operand. Otherwise, the ZF flag is cleared and no change is made to the
destination operand. The destination operand can be a word general register
or memory, the source operand must be a general register.

arpl bx,ax ; adjust RPL of selector in register
arpl [bx],ax ; adjust RPL of selector in memory
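
A Python sketch of the "arpl" adjustment, relying on the fact that the RPL occupies bits 0 and 1 of a segment selector. The function name is hypothetical.

```python
from typing import Tuple


def arpl(dest: int, source: int) -> Tuple[int, bool]:
    """Return (new_dest, zf) for `arpl dest,source` on 16-bit selectors."""
    dest_rpl = dest & 3
    src_rpl = source & 3
    if dest_rpl < src_rpl:
        return (dest & 0xFFFC) | src_rpl, True  # raise RPL to match, ZF = 1
    return dest, False                          # unchanged, ZF = 0
```
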

"clts" clears the TS (task switched) flag in the CR0 register. This
instruction has no operands.
"lock" prefix causes the processor's bus-lock signal to be asserted during
execution of the accompanying instruction. In a multiprocessor environment,
the bus-lock signal ensures that the processor has exclusive use of any shared
memory while the signal is asserted. The "lock" prefix can be prepended only
to the following instructions and only to those forms of the instructions
where the destination operand is a memory operand: "add", "adc", "and", "btc",
"btr", "bts", "cmpxchg", "cmpxchg8b", "dec", "inc", "neg", "not", "or", "sbb",
"sub", "xor", "xadd" and "xchg". If the "lock" prefix is used with one of
these instructions and the source operand is a memory operand, an undefined
opcode exception may be generated. An undefined opcode exception will also be
generated if the "lock" prefix is used with any instruction not in the above
list. The "xchg" instruction always asserts the bus-lock signal regardless of
the presence or absence of the "lock" prefix.
"hlt" stops instruction execution and places the processor in a halted
state. An enabled interrupt, a debug exception, the BINIT, INIT or the RESET
signal will resume execution. This instruction has no operands.
"invlpg" invalidates (flushes) the TLB (translation lookaside buffer) entry
specified with the operand, which should be a memory operand. The processor
determines the page that contains that address and flushes the TLB entry for
that page.
"rdmsr" loads the contents of a 64-bit MSR (model specific register) of the
address specified in the ECX register into registers EDX and EAX. "wrmsr"
writes the contents of registers EDX and EAX into the 64-bit MSR of the
address specified in the ECX register. "rdtsc" loads the current value of the
processor's time stamp counter from the 64-bit MSR into the EDX and EAX
registers. The processor increments the time stamp counter MSR every clock
cycle and resets it to 0 whenever the processor is reset. "rdpmc" loads the
contents of the 40-bit performance monitoring counter specified in the ECX
register into registers EDX and EAX. These instructions have no operands.
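
"rdmsr", "rdtsc" and "rdpmc" all return a 64-bit value split across EDX (high half) and EAX (low half), and "wrmsr" expects the same split. A Python sketch of the two conversions, with hypothetical function names:

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF


def combine_edx_eax(edx: int, eax: int) -> int:
    """Join EDX:EAX into one 64-bit value (EDX holds the high half)."""
    return ((edx & MASK32) << 32) | (eax & MASK32)


def split_edx_eax(value: int) -> Tuple[int, int]:
    """Split a 64-bit value into (edx, eax), e.g. before `wrmsr`."""
    return (value >> 32) & MASK32, value & MASK32
```

Subtracting two combined "rdtsc" readings gives the number of time stamp counter ticks between them.
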
"wbinvd" writes back all modified cache lines in the processor's internal
cache to main memory and invalidates (flushes) the internal caches. The
instruction then issues a special function bus cycle that directs external
caches to also write back modified data and another bus cycle to indicate that
the external caches should be invalidated. This instruction has no operands.
"rsm" returns program control from the system management mode to the program
that was interrupted when the processor received an SMM interrupt. This
instruction has no operands.
"sysenter" executes a fast call to a level 0 system procedure, "sysexit"
executes a fast return to level 3 user code. The addresses used by these
instructions are stored in MSRs. These instructions have no operands.

2.1.13 FPU instructions

The FPU (Floating-Point Unit) instructions operate on the floating-point
values in three formats: single precision (32-bit), double precision (64-bit)
and double extended precision (80-bit). The FPU registers form the stack and
each of them holds the double extended precision floating-point value. When
some values are pushed onto the stack or are removed from the top, the FPU
registers are shifted, so ST0 is always the value on the top of FPU stack, ST1
is the first value below the top, etc. The ST0 name has also the synonym ST.
"fld" pushes the floating-point value onto the FPU register stack. The
operand can be 32-bit, 64-bit or 80-bit memory location or the FPU register,
its value is then loaded onto the top of FPU register stack (the ST0
register) and is automatically converted into the double extended precision
format.

fld dword [bx] ; load single precision value from memory
fld st2 ; push value of st2 onto register stack

"fld1", "fldz", "fldl2t", "fldl2e", "fldpi", "fldlg2" and "fldln2" load
commonly used constants onto the FPU register stack. The loaded constants are
+1.0, +0.0, log2(10), log2(e), pi, log10(2) and ln(2) respectively. These
instructions have no operands.
"fild" converts the signed integer source operand into double extended
precision floating-point format and pushes the result onto the FPU register
stack. The source operand can be a 16-bit, 32-bit or 64-bit memory location.

fild qword [bx] ; load 64-bit integer from memory

"fst" copies the value of ST0 register to the destination operand, which
can be 32-bit or 64-bit memory location or another FPU register. "fstp"
performs the same operation as "fst" and then pops the register stack,
getting rid of ST0. "fstp" accepts the same operands as the "fst" instruction
and can also store value in the 80-bit memory.

fst st3 ; copy value of st0 into st3 register
fstp tword [bx] ; store value in memory and pop stack

"fist" converts the value in ST0 to a signed integer and stores the result
in the destination operand. The operand can be 16-bit or 32-bit memory
location. "fistp" performs the same operation and then pops the register
stack, it accepts the same operands as the "fist" instruction and can also
store integer value in the 64-bit memory, so it has the same rules for
operands as "fild" instruction.
"fbld" converts the packed BCD integer into double extended precision
floating-point format and pushes this value onto the FPU stack. "fbstp"
converts the value in ST0 to an 18-digit packed BCD integer, stores the result
in the destination operand, and pops the register stack. The operand should be
an 80-bit memory location.
"fadd" adds the destination and source operand and stores the sum in the
destination location. The destination operand is always an FPU register, if
the source is a memory location, the destination is ST0 register and only
source operand should be specified. If both operands are FPU registers, at
least one of them should be ST0 register. An operand in memory can be a
32-bit or 64-bit value.

fadd qword [bx] ; add double precision value to st0
fadd st2,st0 ; add st0 to st2

"faddp" adds the destination and source operand, stores the sum in the
destination location and then pops the register stack. The destination operand
must be an FPU register and the source operand must be the ST0. When no
operands are specified, ST1 is used as a destination operand.

faddp ; add st0 to st1 and pop the stack
faddp st2,st0 ; add st0 to st2 and pop the stack
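
The way pushes and pops renumber the registers can be followed with a toy Python model of the register stack. It is only a sketch: `FpuStack` is a hypothetical class, Python floats are 64-bit doubles rather than 80-bit extended values, and the eight-register limit and tag word are ignored.

```python
from typing import List


class FpuStack:
    """Toy register stack: regs[0] is ST0, regs[1] is ST1, and so on."""

    def __init__(self) -> None:
        self.regs: List[float] = []

    def fld(self, value: float) -> None:
        self.regs.insert(0, float(value))     # push: old ST0 becomes ST1

    def fstp(self) -> float:
        return self.regs.pop(0)               # store ST0 and pop

    def fadd(self, i: int, j: int) -> None:
        self.regs[i] += self.regs[j]          # fadd st(i),st(j); one must be 0

    def faddp(self, i: int = 1) -> None:
        self.regs[i] += self.regs[0]          # faddp st(i),st0
        self.regs.pop(0)                      # then pop the stack
```

Calling `faddp()` with no argument mirrors the operand-less form, which adds ST0 to ST1 and pops.
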

"fiadd" instruction converts an integer source operand into a double extended
precision floating-point value and adds it to ST0. The operand should be a
16-bit or 32-bit memory location.

fiadd word [bx] ; add word integer to st0

"fsub", "fsubr", "fmul", "fdiv" and "fdivr" instructions are similar to "fadd",
have the same rules for operands and differ only in the performed computation.
"fsub" subtracts the source operand from the destination operand, "fsubr"
subtracts the destination operand from the source operand, "fmul" multiplies
the destination and source operands, "fdiv" divides the destination operand by
the source operand and "fdivr" divides the source operand by the destination
operand. "fsubp", "fsubrp", "fmulp", "fdivp" and "fdivrp" perform the same
operations and pop the register stack; the rules for operands are the same as
for the "faddp" instruction. "fisub", "fisubr", "fimul", "fidiv" and "fidivr"
perform these operations after converting the integer source operand into a
floating-point value; they have the same rules for operands as the "fiadd"
instruction.
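
The operand order of the plain and reversed forms, and of the popping variants, can be captured in a short Python sketch. `OPS` and `arith_pop` are hypothetical names, and the list `stack` stands for the register stack with index 0 as ST0.

```python
from typing import Callable, Dict, List

# result = op(destination, source)
OPS: Dict[str, Callable[[float, float], float]] = {
    "fadd":  lambda d, s: d + s,
    "fsub":  lambda d, s: d - s,   # destination - source
    "fsubr": lambda d, s: s - d,   # source - destination
    "fmul":  lambda d, s: d * s,
    "fdiv":  lambda d, s: d / s,   # destination / source
    "fdivr": lambda d, s: s / d,   # source / destination
}


def arith_pop(stack: List[float], op: str, i: int = 1) -> None:
    """Model `<op>p st(i),st0`: st(i) <- op(st(i), st0), then pop."""
    stack[i] = OPS[op](stack[i], stack[0])
    stack.pop(0)
```
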
"fsqrt" computes the square root of the value in ST0 register, "fsin"
computes the sine of that value, "fcos" computes the cosine of that value,
"fchs" complements its sign bit, "fabs" clears its sign to create the absolute
value, "frndint" rounds it to the nearest integral value, depending on the
current rounding mode. "f2xm1" computes 2 raised to the power of ST0 and
subtracts 1.0 from the result; the value of ST0 must lie in the range -1.0 to
+1.0. All these instructions store the result in ST0 and have no
operands.
"fsincos" computes both the sine and the cosine of the value in ST0
register, stores the sine in ST0 and pushes the cosine on the top of FPU
register stack. "fptan" computes the tangent of the value in ST0, stores the
result in ST0 and pushes a 1.0 onto the FPU register stack. "fpatan" computes
the arctangent of the value in ST1 divided by the value in ST0, stores the
result in ST1 and pops the FPU register stack. "fyl2x" computes the binary
logarithm of ST0, multiplies it by ST1, stores the result in ST1 and pops the
FPU register stack; "fyl2xp1" performs the same operation but it adds 1.0 to
ST0 before computing the logarithm. "fprem" computes the remainder obtained
from dividing the value in ST0 by the value in ST1, and stores the result
in ST0. "fprem1" performs the same operation as "fprem", but it computes the
remainder in the way specified by IEEE Standard 754. "fscale" truncates the
value in ST1 and increases the exponent of ST0 by this value. "fxtract"
separates the value in ST0 into its exponent and significand, stores the
exponent in ST0 and pushes the significand onto the register stack. "fnop"
performs no operation. These instructions have no operands.
"fxch" exchanges the contents of ST0 and another FPU register. The operand
should be an FPU register; if no operand is specified, the contents of ST0 and
ST1 are exchanged.
"fcom" and "fcomp" compare the contents of ST0 and the source operand and
set flags in the FPU status word according to the results. "fcomp"
additionally pops the register stack after performing the comparison. The
operand can be a single or double precision value in memory or the FPU
register. When no operand is specified, ST1 is used as a source operand.

fcom ; compare st0 with st1
fcomp st2 ; compare st0 with st2 and pop stack

"fcompp" compares the contents of ST0 and ST1, sets flags in the FPU status
word according to the results and pops the register stack twice. This
instruction has no operands.
"fucom", "fucomp" and "fucompp" perform an unordered comparison of two FPU
registers. Rules for operands are the same as for "fcom", "fcomp" and
"fcompp", but the source operand must be an FPU register.
"ficom" and "ficomp" compare the value in ST0 with an integer source operand
and set the flags in the FPU status word according to the results. "ficomp"
additionally pops the register stack after performing the comparison. The
integer value is converted to double extended precision floating-point format
before the comparison is made. The operand should be a 16-bit or 32-bit
memory location.

ficom word [bx] ; compare st0 with 16-bit integer

"fcomi", "fcomip", "fucomi" and "fucomip" compare ST0 with another FPU
register and set the ZF, PF and CF flags according to the results. "fcomip"
and "fucomip" additionally pop the register stack after performing the
comparison. The instructions obtained by attaching the FPU condition mnemonic
(see table 2.2) to the "fcmov" mnemonic transfer the specified FPU register
into the ST0 register if the given test condition is true. These instructions
allow two different syntaxes: one with a single operand specifying the source
FPU register, and one with two operands, in which case the destination operand
should be the ST0 register and the second operand specifies the source FPU
register.

fcomi st2 ; compare st0 with st2 and set flags
fcmovb st0,st2 ; transfer st2 to st0 if below

Table 2.2 FPU conditions


/------------------------------------------------------\
| Mnemonic | Condition tested | Description |
|==========|==================|========================|
| b | CF = 1 | below |
| e | ZF = 1 | equal |
| be | CF or ZF = 1 | below or equal |
| u | PF = 1 | unordered |
| nb | CF = 0 | not below |
| ne | ZF = 0 | not equal |
| nbe | CF and ZF = 0 | not below nor equal |
| nu | PF = 0 | not unordered |
\------------------------------------------------------/
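
A Python sketch connecting "fcomi" to table 2.2. The flag patterns follow the processor documentation (greater: all clear; less: CF set; equal: ZF set; unordered, i.e. a NaN operand: ZF, PF and CF all set). `fcomi_flags` and `FCMOV` are hypothetical names.

```python
import math
from typing import Callable, Dict

Flags = Dict[str, int]


def fcomi_flags(st0: float, sti: float) -> Flags:
    """ZF, PF and CF as set by `fcomi st(i)`."""
    if math.isnan(st0) or math.isnan(sti):
        return {"ZF": 1, "PF": 1, "CF": 1}   # unordered
    if st0 < sti:
        return {"ZF": 0, "PF": 0, "CF": 1}
    if st0 == sti:
        return {"ZF": 1, "PF": 0, "CF": 0}
    return {"ZF": 0, "PF": 0, "CF": 0}       # st0 > sti


# The conditions of table 2.2.
FCMOV: Dict[str, Callable[[Flags], bool]] = {
    "b":   lambda f: f["CF"] == 1,
    "e":   lambda f: f["ZF"] == 1,
    "be":  lambda f: f["CF"] == 1 or f["ZF"] == 1,
    "u":   lambda f: f["PF"] == 1,
    "nb":  lambda f: f["CF"] == 0,
    "ne":  lambda f: f["ZF"] == 0,
    "nbe": lambda f: f["CF"] == 0 and f["ZF"] == 0,
    "nu":  lambda f: f["PF"] == 0,
}
```

Because an unordered result sets CF and ZF, the "b", "e" and "be" conditions are also true when an operand is NaN, so code that must treat NaN separately usually tests "u" first.
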

"ftst" compares the value in ST0 with 0.0 and sets the flags in the FPU
status word according to the results. "fxam" examines the contents of the ST0
and sets the flags in FPU status word to indicate the class of value in the
register. These instructions have no operands.
"fstsw" and "fnstsw" store the current value of the FPU status word in the
destination location. The destination operand can be either a 16-bit memory or
the AX register. "fstsw" checks for pending unmasked FPU exceptions before
storing the status word, "fnstsw" does not.
"fstcw" and "fnstcw" store the current value of the FPU control word at the
specified destination in memory. "fstcw" checks for pending unmasked FPU
exceptions before storing the control word, "fnstcw" does not. "fldcw" loads
the operand into the FPU control word. The operand should be a 16-bit memory
location.
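
A typical use of "fnstcw" and "fldcw" is to change the rounding mode temporarily. The program stores the control word, modifies the rounding-control field and loads the word back. The rounding-control field is bits 10 and 11: 00 = nearest, 01 = down, 10 = up, 11 = toward zero. The C sketch below models only the bit manipulation. `set_rounding` and `get_rounding` are hypothetical helpers, and 037Fh is the control word value that "finit" sets.

```c
#include <assert.h>
#include <stdint.h>

enum rounding { RC_NEAREST = 0, RC_DOWN = 1, RC_UP = 2, RC_ZERO = 3 };

/* Control word after finit: all exceptions masked, 64-bit precision, round to nearest. */
#define CW_DEFAULT 0x037Fu

/* Return a copy of control word cw with the rounding-control field (bits 10-11) replaced. */
static uint16_t set_rounding(uint16_t cw, enum rounding rc)
{
    return (uint16_t)((cw & ~0x0C00u) | ((unsigned)rc << 10));
}

static enum rounding get_rounding(uint16_t cw)
{
    return (enum rounding)((cw >> 10) & 3u);
}
```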
"fstenv" and "fnstenv" store the current FPU operating environment at the
memory location specified with the destination operand, and then mask all FPU
exceptions. "fstenv" checks for pending unmasked FPU exceptions before
proceeding, "fnstenv" does not. "fldenv" loads the complete operating
environment from memory into the FPU. "fsave" and "fnsave" store the current
FPU state (operating environment and register stack) at the specified
destination in memory and then reinitialize the FPU. "fsave" checks for pending
unmasked FPU exceptions before proceeding, "fnsave" does not. "frstor"
loads the FPU state from the specified memory location. All these instructions
require a memory operand. Each of them also has two additional mnemonics that
select the operand-size form of the operation explicitly. The "fstenvw",
"fnstenvw", "fldenvw", "fsavew", "fnsavew" and "frstorw" mnemonics force the
instruction to operate as in 16-bit mode, while "fstenvd", "fnstenvd",
"fldenvd", "fsaved", "fnsaved" and "frstord" force it to operate as in 32-bit
mode.
"finit" and "fninit" set the FPU operating environment to its default state.
"finit" checks for pending unmasked FPU exceptions before proceeding, "fninit"
does not. "fclex" and "fnclex" clear the FPU exception flags in the FPU status
word. "fclex" checks for pending unmasked FPU exceptions before proceeding,
"fnclex" does not. "wait" and "fwait" are synonyms for the same instruction,
which causes the processor to check for pending unmasked FPU exceptions and
handle them before proceeding. These instructions have no operands.
"ffree" sets the tag associated with the specified FPU register to empty. The
operand must be an FPU register.
"fincstp" and "fdecstp" rotate the FPU stack by one position by adding one to,
or subtracting one from, the top-of-stack pointer. These instructions have no
operands.
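
A minimal C model of "fincstp" and "fdecstp" follows. It assumes the usual layout with TOP in bits 11 to 13 of the status word. The pointer changes modulo 8, and ST(i) refers to physical register (TOP + i) mod 8. Unlike a pop, these instructions leave the register contents and tags untouched. All function names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* TOP occupies bits 11-13 of the status word. */
static unsigned top_of(uint16_t sw) { return (sw >> 11) & 7u; }

static uint16_t with_top(uint16_t sw, unsigned top)
{
    return (uint16_t)((sw & ~0x3800u) | ((top & 7u) << 11));
}

/* fincstp / fdecstp: TOP changes modulo 8; other status bits are preserved. */
static uint16_t fincstp(uint16_t sw) { return with_top(sw, top_of(sw) + 1); }
static uint16_t fdecstp(uint16_t sw) { return with_top(sw, top_of(sw) + 7); }  /* -1 mod 8 */

/* Physical register (R0-R7) that ST(i) refers to for a given status word. */
static unsigned st_to_physical(uint16_t sw, unsigned i) { return (top_of(sw) + i) & 7u; }
```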

2.1.14 MMX instructions

The MMX instructions operate on packed integer types and use the MMX
registers, which are the low 64-bit parts of the 80-bit FPU registers. Because
of this, MMX instructions cannot be used at the same time as FPU instructions.
They can operate on packed bytes (eight 8-bit integers), packed words (four
16-bit integers) or packed double words (two 32-bit integers); the packed
formats allow a single instruction to operate on multiple data elements at
once.
"movq" copies a quad word from the source operand to the destination
operand. At least one of the operands must be an MMX register, the other one
can be either an MMX register or a 64-bit memory location.

movq mm0,mm1 ; move quad word from register to register
movq mm2,[ebx] ; move quad word from memory to register

"movd" copies a double word from the source operand to the destination
operand. One of the operands must be an MMX register, the other one can be a
general register or a 32-bit memory location. Only the low double word of the
MMX register is used; when the MMX register is the destination, its high
double word is cleared.
All general MMX operations have two operands: the destination operand must
be an MMX register, the source operand can be an MMX register or a 64-bit
memory location. The operation is performed on the corresponding data elements
of the source and destination operands, and the results are stored in the data
elements of the destination operand. "paddb", "paddw" and "paddd" add packed
bytes, packed words or packed double words. "psubb", "psubw" and "psubd"
subtract the corresponding types. "paddsb", "paddsw", "psubsb" and "psubsw"
add or subtract packed bytes or packed words with signed saturation.
"paddusb", "paddusw", "psubusb" and "psubusw" are analogous, but use unsigned
saturation. "pmulhw" and "pmullw" perform a signed multiplication of the
packed words and store the high or low words of the results in the
destination operand. "pmaddwd" multiplies the packed words and adds the four
intermediate double word products in pairs to produce the result as packed
double words. "pand", "por" and "pxor" perform the logical operations on the
quad words; "pandn" additionally performs a logical negation of the
destination operand before performing the "and" operation.
"pcmpeqb", "pcmpeqw" and "pcmpeqd" compare packed bytes, packed words or
packed double words for equality. If a pair of data elements is equal, the
corresponding data element in the destination operand is filled with bits of
value 1, otherwise it is set to 0. "pcmpgtb", "pcmpgtw" and "pcmpgtd" perform
a similar operation, but they check whether the data elements in the
destination operand are greater than the corresponding data elements in the
source operand (the comparison is signed).
"packsswb" converts packed signed words into packed signed bytes and
"packssdw" converts packed signed double words into packed signed words, using
saturation to handle overflow conditions. "packuswb" converts packed signed
words into packed unsigned bytes, also with saturation. The converted data
elements from the source operand are stored in the high part of the
destination operand, while the converted data elements from the destination
operand are stored in the low part. "punpckhbw", "punpckhwd" and "punpckhdq"
interleave the data elements from the high parts of the source and destination
operands and store the result in the destination operand. "punpcklbw",
"punpcklwd" and "punpckldq" perform the same operation, but on the low parts
of the source and destination operands.

paddsb mm0,[esi] ; add packed bytes with signed saturation
pcmpeqw mm3,mm7 ; compare packed words for equality
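
You can check the saturating and widening operations above against a small portable C model. The sketch treats an MMX register as a plain array in which element 0 is the lowest byte or word. It covers "paddsb", "psubusb", "pmaddwd", "packsswb" and "packuswb". The helper names are hypothetical. The code models the semantics only and does not emit these instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp an int to the signed or unsigned byte range (saturation). */
static int8_t sat_s8(int v)  { return (int8_t)(v > 127 ? 127 : v < -128 ? -128 : v); }
static uint8_t sat_u8(int v) { return (uint8_t)(v > 255 ? 255 : v < 0 ? 0 : v); }

/* paddsb: add eight signed bytes with signed saturation; the result replaces dst. */
static void paddsb(int8_t dst[8], const int8_t src[8])
{
    for (int i = 0; i < 8; i++)
        dst[i] = sat_s8(dst[i] + src[i]);
}

/* psubusb: subtract eight unsigned bytes; results below 0 are clamped to 0. */
static void psubusb(uint8_t dst[8], const uint8_t src[8])
{
    for (int i = 0; i < 8; i++)
        dst[i] = sat_u8(dst[i] - src[i]);
}

/* pmaddwd: multiply four signed word pairs and add adjacent 32-bit products.
   out[] stands for the destination register after the instruction. The only
   overflow case (all four words -32768) wraps to 80000000h on the processor;
   the int64_t to int32_t conversion below does the same on GCC and Clang. */
static void pmaddwd(int32_t out[2], const int16_t dst[4], const int16_t src[4])
{
    for (int i = 0; i < 2; i++)
        out[i] = (int32_t)((int64_t)dst[2 * i] * src[2 * i] +
                           (int64_t)dst[2 * i + 1] * src[2 * i + 1]);
}

/* packsswb: words of dst become the low four bytes, words of src the high four. */
static void packsswb(int8_t out[8], const int16_t dst[4], const int16_t src[4])
{
    for (int i = 0; i < 4; i++) {
        out[i] = sat_s8(dst[i]);
        out[i + 4] = sat_s8(src[i]);
    }
}

/* packuswb: same layout, signed words clamped to the unsigned range 0..255. */
static void packuswb(uint8_t out[8], const int16_t dst[4], const int16_t src[4])
{
    for (int i = 0; i < 4; i++) {
        out[i] = sat_u8(dst[i]);
        out[i + 4] = sat_u8(src[i]);
    }
}
```

For instance, with signed saturation 100 + 100 yields 127 instead of wrapping to -56.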

"psllw", "pslld" and "psllq" perform a logical left shift of the packed words,
packed double words or the single quad word in the destination operand by the
amount specified in the source operand. "psrlw", "psrld" and "psrlq" perform a
logical right shift of the packed words, packed double words or the single
quad word. "psraw" and "psrad" perform an arithmetic right shift of the packed
words or double words. The destination operand must be an MMX register, while
the source operand can be an MMX register, a 64-bit memory location, or an
8-bit immediate value.

psllw mm2,mm4 ; shift words left logically
psrad mm4,[ebx] ; shift double words right arithmetically
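
The shift instructions handle out-of-range counts in a special way. A logical shift by a count greater than the element width minus one clears the element. An arithmetic shift by such a count fills the element with copies of the sign bit. The following C sketch models the word forms under that rule. It takes the count as an unsigned 64-bit value, as in a register or memory source. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* psrlw: logical right shift of four words; counts above 15 clear every word. */
static void psrlw(uint16_t w[4], uint64_t count)
{
    for (int i = 0; i < 4; i++)
        w[i] = count > 15 ? 0 : (uint16_t)(w[i] >> count);
}

/* psllw: logical left shift of four words; counts above 15 clear every word. */
static void psllw(uint16_t w[4], uint64_t count)
{
    for (int i = 0; i < 4; i++)
        w[i] = count > 15 ? 0 : (uint16_t)(w[i] << count);
}

/* psraw: arithmetic right shift; counts above 15 act like 15, leaving only sign bits.
   Right-shifting a negative int is implementation-defined in C; GCC and Clang
   perform an arithmetic shift, which is what this model relies on. */
static void psraw(int16_t w[4], uint64_t count)
{
    unsigned n = count > 15 ? 15 : (unsigned)count;
    for (int i = 0; i < 4; i++)
        w[i] = (int16_t)(w[i] >> n);
}
```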

"emms" marks all FPU registers as empty, making them usable for the FPU
instructions again. It must be executed before any FPU instructions if MMX
instructions were used.

2.1.15 SSE instructions

The SSE extension adds more MMX instructions and also introduces operations
on packed single precision floating point values. The 128-bit packed single
precision format consists of four single precision floating point values, and
the 128-bit SSE registers are designed to hold this data type.
"movaps" and "movups" transfer a double quad word operand containing packed
single precision values from the source operand to the destination operand. At
least one of the operands must be an SSE register, the other one can be either
an SSE register or a 128-bit memory location. Memory operands for "movaps"
must be aligned on a 16-byte boundary, operands for "movups" don't have to be
aligned.

movups xmm0,[ebx] ; move unaligned double quad word

"movlps" moves two packed single precision values between memory and the low
quad word of an SSE register. "movhps" moves two packed single precision
values between memory and the high quad word of an SSE register. One of the
operands must be an SSE register, and the other operand must be a 64-bit
memory location.

movlps xmm0,[ebx] ; move memory to low quad word of xmm0
movhps [esi],xmm7 ; move high quad word of xmm7 to memory

"movlhps" moves two packed single precision values from the low quad word of
the source register to the high quad word of the destination register.
"movhlps" moves two packed single precision values from the high quad word of
the source register to the low quad word of the destination register. Both
operands must be SSE registers.
"movmskps" transfers the most significant bit (the sign bit) of each of the
four single precision values in an SSE register into the low four bits of a
general register. The source operand must be an SSE register, the destination
operand must be a general register.
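
The following is a short C model of "movmskps", "movlhps" and "movhlps". It treats an SSE register as an array of four floats, with element 0 in the low double word. This is an illustrative sketch with hypothetical function names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* movmskps: bit i of the result is the sign bit of single precision element i. */
static uint32_t movmskps(const float x[4])
{
    uint32_t mask = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t bits;
        memcpy(&bits, &x[i], sizeof bits);   /* reinterpret the float's bit pattern */
        mask |= (bits >> 31) << i;
    }
    return mask;
}

/* movlhps: low two elements of src -> high two elements of dst. */
static void movlhps(float dst[4], const float src[4]) { dst[2] = src[0]; dst[3] = src[1]; }

/* movhlps: high two elements of src -> low two elements of dst. */
static void movhlps(float dst[4], const float src[4]) { dst[0] = src[2]; dst[1] = src[3]; }
```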
"movss" transfers a single precision value between the source and destination
operands (only the low double word is transferred). At least one of the
operands must be an SSE register, the other one can be either an SSE register
or a 32-bit memory location.

movss [edi],xmm3 ; move low double word of xmm3 to memory

Each of the SSE arithmetic operations has two variants. When the mnemonic
ends with "ps", the source operand can be a 128-bit memory location or an SSE
register, the destination operand must be an SSE register, and the operation
is performed on four packed single precision values, separately for each pair
of corresponding data elements; the result is stored in the destination
register. When the mnemonic ends with "ss", the source operand can be a 32-bit
memory location or an SSE register, the destination operand must be an SSE
register, and the operation is performed on single precision values; only the
low double words of the SSE registers are used, and the result is stored in
the low double word of the destination register. "addps" and "addss" add the
values, "subps" and "subss" subtract the source value from the destination
value, "mulps" and "mulss" multiply the values, "divps" and "divss" divide the
destination value by the source value, "rcpps" and "rcpss" compute the
approximate reciprocal of the source value, "sqrtps" and "sqrtss" compute the
square root of the source value, "rsqrtps" and "rsqrtss" compute the
approximate reciprocal of the square root of the source value, "maxps" and
"maxss" compare the source and destination values and return the greater one,
and "minps" and "minss" compare the source and destination values and return
the lesser one.

mulss xmm0,[ebx] ; multiply single precision values
addps xmm3,xmm7 ; add packed single precision values
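
The difference between the "ps" and "ss" forms can be sketched in C by modelling an SSE register as four floats, with element 0 as the low double word. The sketch assumes a register destination, where the scalar form leaves the upper three elements unchanged. `addps` and `addss` here are hypothetical C functions, not the compiler intrinsics.

```c
#include <assert.h>

/* addps: all four single precision elements are added pairwise. */
static void addps(float dst[4], const float src[4])
{
    for (int i = 0; i < 4; i++)
        dst[i] += src[i];
}

/* addss: only the low element is added; the upper three elements of dst are kept. */
static void addss(float dst[4], const float src[4])
{
    dst[0] += src[0];
}
```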

"andps", "andnps", "orps" and "xorps" perform the logical operations on
packed single precision values. The source operand can be a 128-bit memory
location or an SSE register, the destination operand must be an SSE register.
"cmpps" compares packed single precision values and returns a mask result
in the destination operand, which must be an SSE register. The source operand
can be a 128-bit memory location or an SSE register, and the third operand
must be an immediate value selecting the code of one of the eight compare
conditions (table 2.3). "cmpss" performs the same operation on single
precision values; only the low double word of the destination register is
affected, and in this case the source operand can be a 32-bit memory location
or an SSE register. These two instructions also have variants with only two
operands and the condition encoded within the mnemonic. Their mnemonics are
obtained by attaching the mnemonic from table 2.3 to the "cmp" mnemonic and
then attaching "ps" or "ss" at the end.

cmpps xmm2,xmm4,0 ; compare packed single precision values
cmpltss xmm0,[ebx] ; compare single precision values

Table 2.3 SSE conditions

/-------------------------------------------\
| Code | Mnemonic | Description             |
|======|==========|=========================|
| 0    | eq       | equal                   |
| 1    | lt       | less than               |
| 2    | le       | less than or equal      |
| 3    | unord    | unordered               |
| 4    | neq      | not equal               |
| 5    | nlt      | not less than           |
| 6    | nle      | not less than nor equal |
| 7    | ord      | ordered                 |
\-------------------------------------------/
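
Table 2.3 can be read as a set of predicates on one pair of values. The C sketch below is a model that assumes IEEE 754 semantics, under which any comparison involving NaN is unordered. In that case "eq", "lt", "le" and "ord" are false, while "unord", "neq", "nlt" and "nle" are true. `cmp_pred` and `cmpps` are hypothetical names. Each mask element is all ones when the predicate holds for the destination value compared with the source value.

```c
#include <assert.h>
#include <stdint.h>

/* Evaluate SSE compare predicate code 0-7 (table 2.3) on one pair of values;
   returns 1 or 0, or -1 for an invalid code. */
static int cmp_pred(int code, float a, float b)
{
    int unordered = (a != a) || (b != b);     /* NaN is the only value unequal to itself */
    switch (code) {
    case 0: return !unordered && a == b;      /* eq    */
    case 1: return !unordered && a < b;       /* lt    */
    case 2: return !unordered && a <= b;      /* le    */
    case 3: return unordered;                 /* unord */
    case 4: return unordered || a != b;       /* neq   */
    case 5: return unordered || !(a < b);     /* nlt   */
    case 6: return unordered || !(a <= b);    /* nle   */
    case 7: return !unordered;                /* ord   */
    }
    return -1;
}

/* cmpps: each 32-bit mask element is all ones where the predicate holds, else zero. */
static void cmpps(uint32_t mask[4], const float dst[4], const float src[4], int code)
{
    for (int i = 0; i < 4; i++)
        mask[i] = cmp_pred(code, dst[i], src[i]) == 1 ? 0xFFFFFFFFu : 0;
}
```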

"comiss" and "ucomiss" compare the single precision values and set the ZF,
PF and CF flags to show the result. The destination operand must be a SSE
register, the source operand can be a 32-bit memory location or SSE register.
"shufps" moves any two of the four single precision values from the
destination operand into the low quad word of the destination operand, and any
two of the four values from the source operand into the high quad word of the
destination operand. The destination operand must be an SSE register, the
source operand can be a 128-bit memory location or an SSE register, and the
third operand must be an 8-bit immediate value selecting which values will be
moved into the destination operand. Bits 0 and 1 select the value to be moved
from the destination operand to the low double word of the result, bits 2 and
3 select the value to be moved from the destination operand to the second
double word, bits 4 and 5 select the value to be moved from the source operand
to the third double word, and bits 6 and 7 select the value to be moved from
the source operand to the high double word of the result.

shufps xmm0,xmm0,10010011b ; shuffle double words
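
The selector of "shufps" can be modelled in a few lines of C. This hypothetical `shufps` function treats each register as four floats, with element 0 in the low double word. Applied to the example above, 10010011b picks elements 3, 0, 1 and 2, which rotates the double words of xmm0 up by one position.

```c
#include <assert.h>

/* shufps dst, src, imm: result elements 0-1 come from dst, 2-3 from src,
   each chosen by a 2-bit field of imm (bits 0-1, 2-3, 4-5, 6-7). */
static void shufps(float dst[4], const float src[4], unsigned imm)
{
    float r[4];
    r[0] = dst[imm & 3];
    r[1] = dst[(imm >> 2) & 3];
    r[2] = src[(imm >> 4) & 3];
    r[3] = src[(imm >> 6) & 3];
    for (int i = 0; i < 4; i++)   /* copy last, because dst and src may be the same register */
        dst[i] = r[i];
}
```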

"unpckhps" performs an interleaved unpack of the values from the high parts
of the source and destination operands and stores the result in the
destination operand, which must be an SSE register. The source operand can be
a 128-bit memory location or an SSE register. "unpcklps" performs an
interleaved unpack of the values from the low parts of the source and
destination operands and stores the result in the destination operand; the
rules for operands are the same.
"cvtpi2ps" converts two packed double word integers into two packed single
precision floating point values and stores the result in the low quad word of
the destination operand, which must be an SSE register. The source operand can
be a 64-bit memory location or an MMX register.

cvtpi2ps xmm0,mm0 ; convert integers to single precision values

"cvtsi2ss" converts a double word integer into a single precision floating
point value and stores the result in the low double word of the destination
operand, which must be an SSE register. The source operand can be a 32-bit
memory location or a 32-bit general register.

cvtsi2ss xmm0,eax ; convert integer to single precision value

"cvtps2pi" converts two packed single precision floating point values into two
packed double word integers and stores the result in the destination operand,
which must be an MMX register. The source operand can be a 64-bit memory
location or an SSE register, of which only the low quad word is used.
"cvttps2pi" performs a similar operation, except that truncation is used to
round the source values to integers; the rules for operands are the same.

cvtps2pi mm0,xmm0 ; convert single precision values to integers

"cvtss2si" converts a single precision floating point value into a double
word integer and stores the result in the destination operand, which must be
a 32-bit general register. The source operand can be a 32-bit memory location
or an SSE register, of which only the low double word is used. "cvttss2si"
performs a similar operation, except that truncation is used to round the
source value to an integer; the rules for operands are the same.

cvtss2si eax,xmm0 ; convert single precision value to integer
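
The only difference between "cvtss2si" and "cvttss2si" is the rounding step. The C sketch below assumes the default MXCSR rounding mode (round to nearest, ties to even) for "cvtss2si". For NaN or out-of-range input it returns the integer indefinite value 80000000h, which is what the instructions produce when the invalid-operation exception is masked. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define INT_INDEFINITE INT32_MIN   /* 80000000h, result for NaN or out-of-range input */

/* True when x lies in [-2^31, 2^31); always false for NaN. */
static int in_range(float x) { return x >= -2147483648.0f && x < 2147483648.0f; }

/* cvttss2si: truncate toward zero. */
static int32_t cvttss2si(float x)
{
    return in_range(x) ? (int32_t)x : INT_INDEFINITE;
}

/* cvtss2si with the default rounding mode: round to nearest, ties to even. */
static int32_t cvtss2si(float x)
{
    if (!in_range(x))
        return INT_INDEFINITE;
    double d = x;
    int64_t t = (int64_t)d;              /* truncated part */
    double frac = d - (double)t;         /* exact, |frac| < 1 */
    if (frac > 0.5 || (frac == 0.5 && (t & 1)))
        t++;
    else if (frac < -0.5 || (frac == -0.5 && (t & 1)))
        t--;
    return (int32_t)t;
}
```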

"pextrw" copies the word of the source operand selected by the third operand
to the destination operand. The source operand must be an MMX register, the
destination operand must be a 32-bit general register (the high word of the
destination is cleared), and the third operand must be an 8-bit immediate
value.

pextrw eax,mm0,1 ; extract word into eax

"pinsrw" inserts a word from the source operand into the destination operand
at the location specified with the third operand, which must be an 8-bit
immediate value. The destination operand must be an MMX register, the source
operand can be a 16-bit memory location or a 32-bit general register (only the
low word of the register is used).

pinsrw mm1,ebx,2 ; insert word from ebx
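
The following C model shows the word selection in "pextrw" and "pinsrw" for an MMX register held in a 64-bit integer. It assumes that only the low two bits of the immediate are significant for MMX operands. The SSE2 forms use three bits to address eight words. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* pextrw r32, mm, imm: select word (imm & 3) and zero-extend it to 32 bits. */
static uint32_t pextrw_mmx(uint64_t mm, unsigned imm)
{
    return (uint32_t)((mm >> (16 * (imm & 3))) & 0xFFFF);
}

/* pinsrw mm, r32, imm: replace word (imm & 3) with the low word of the source. */
static uint64_t pinsrw_mmx(uint64_t mm, uint32_t src, unsigned imm)
{
    unsigned shift = 16 * (imm & 3);
    return (mm & ~((uint64_t)0xFFFF << shift)) | ((uint64_t)(src & 0xFFFF) << shift);
}
```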

"pavgb" and "pavgw" compute the rounded averages of packed unsigned bytes or
words. "pmaxub" returns the maximum values of packed unsigned bytes, "pminub"
returns the minimum values of packed unsigned bytes, "pmaxsw" returns the
maximum values of packed signed words, and "pminsw" returns the minimum values
of packed signed words. "pmulhuw" performs an unsigned multiplication of the
packed words and stores the high words of the results in the destination
operand. "psadbw" computes the absolute differences of packed unsigned bytes,
sums the differences, and stores the sum in the low word of the destination
operand. All these instructions follow the same rules for operands as the
general MMX operations described in the previous section.
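
Here is a portable C sketch of "pavgb" and "psadbw" on eight unsigned bytes. The average is rounded up: (a + b + 1) >> 1. The sum of absolute differences is at most 8 * 255 = 2040, so it always fits in the low word, and the rest of the result is cleared. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* pavgb: unsigned average rounded up; the sum is computed in int, so it cannot overflow. */
static void pavgb(uint8_t dst[8], const uint8_t src[8])
{
    for (int i = 0; i < 8; i++)
        dst[i] = (uint8_t)((dst[i] + src[i] + 1) >> 1);
}

/* psadbw: sum of |dst[i] - src[i]| over eight unsigned bytes, returned as the
   64-bit result value: the sum is in the low word, all other bits are zero. */
static uint64_t psadbw(const uint8_t dst[8], const uint8_t src[8])
{
    unsigned sum = 0;
    for (int i = 0; i < 8; i++)
        sum += dst[i] > src[i] ? dst[i] - src[i] : src[i] - dst[i];
    return sum;
}
```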
"pmovmskb" creates a mask made of the most significant bit of each byte in
the source operand and stores the result in the low byte of the destination
operand. The source operand must be an MMX register, the destination operand
must be a 32-bit general register.
"pshufw" inserts words from the source operand into the destination operand
from the locations specified with the third operand. The destination operand
must be an MMX register, the source operand can be a 64-bit memory location or
an MMX register, and the third operand must be an 8-bit immediate value
selecting which values will be moved into the destination operand, in a way
similar to the third operand of the "shufps" instruction.
"movntq" moves the quad word from the source operand to memory using a
non-temporal hint to minimize cache pollution. The source operand must be an
MMX register, the destination operand must be a 64-bit memory location.
"movntps" stores packed single precision values from an SSE register to
memory using a non-temporal hint. The source operand must be an SSE register,
the destination operand must be a 128-bit memory location. "maskmovq" stores
selected bytes from the first operand into a 64-bit memory location using a
non-temporal hint. Both operands must be MMX registers; the second operand
selects which bytes from the source operand are written to memory (a byte is
written when the most significant bit of the corresponding mask byte is set).
The memory location is pointed to by the DI (or EDI) register in the segment
selected by DS.
"prefetcht0", "prefetcht1", "prefetcht2" and "prefetchnta" fetch the line of
data from memory that contains the byte specified with the operand into the
level of the cache hierarchy indicated by the mnemonic. The operand must be an
8-bit memory location.
"sfence" performs a serializing operation on all instructions storing to
memory that were issued before it: their results become globally visible
before the results of any store instruction that follows it. This instruction
has no operands.
"ldmxcsr" loads the 32-bit memory operand into the MXCSR register. "stmxcsr"
stores the contents of MXCSR into a 32-bit memory operand.
"fxsave" saves the current state of the FPU, the MXCSR register, and all the
FPU and SSE registers to a 512-byte memory location specified in the
destination operand. "fxrstor" reloads data previously stored with the
"fxsave" instruction from the specified 512-byte memory location. The memory
operand for both these instructions must be aligned on a 16-byte boundary,
and it should be specified without a size operator.

2.1.16 SSE2 instructions

The SSE2 extension introduces operations on packed double precision floating
point values, extends the syntax of MMX instructions, and also adds some new
instructions.
"movapd" and "movupd" transfer a double quad word operand containing packed
double precision values from source operand to destination operand. These
instructions are analogous to "movaps" and "movups" and have the same rules
for operands.
"movlpd" moves a double precision value between memory and the low quad
word of an SSE register. "movhpd" moves a double precision value between
memory and the high quad word of an SSE register. These instructions are
analogous to "movlps" and "movhps" and have the same rules for operands.
"movmskpd" transfers the most significant bit of each of the two double
precision values in the SSE register into low two bits of a general register.
This instruction is analogous to "movmskps" and has the same rules for
operands.
"movsd" transfers a double precision value between the source and destination
operands (only the low quad word is transferred). At least one of the operands
must be an SSE register, the other one can be either an SSE register or a
64-bit memory location.
Arithmetic operations on double precision values are "addpd", "addsd",
"subpd", "subsd", "mulpd", "mulsd", "divpd", "divsd", "sqrtpd", "sqrtsd",
"maxpd", "maxsd", "minpd" and "minsd"; they are analogous to the arithmetic
operations on single precision values described in the previous section. When
the mnemonic ends with "pd" instead of "ps", the operation is performed on two
packed double precision values, but the rules for operands are the same. When
the mnemonic ends with "sd" instead of "ss", the source operand can be a
64-bit memory location or an SSE register, the destination operand must be an
SSE register, and the operation is performed on double precision values; only
the low quad words of the SSE registers are used in this case.
"andpd", "andnpd", "orpd" and "xorpd" perform the logical operations on
packed double precision values. They are analogous to the SSE logical
operations on single precision values and have the same rules for operands.
"cmppd" compares packed double precision values and returns a mask result in
the destination operand. This instruction is analogous to "cmpps" and has the
same rules for operands. "cmpsd" performs the same operation on double
precision values; only the low quad word of the destination register is
affected, and in this case the source operand can be a 64-bit memory location
or an SSE register. The variants with only two operands are obtained by
attaching the condition mnemonic from table 2.3 to the "cmp" mnemonic and
then attaching "pd" or "sd" at the end.
"comisd" and "ucomisd" compare the double precision values and set the ZF,
PF and CF flags to show the result. The destination operand must be an SSE
register, the source operand can be a 64-bit memory location or an SSE
register.
"shufpd" moves either of the two double precision values from the destination
operand into the low quad word of the destination operand, and either of the
two values from the source operand into the high quad word of the destination
operand. This instruction is analogous to "shufps" and has the same rules for
operands. Bit 0 of the third operand selects the value to be moved from the
destination operand, bit 1 selects the value to be moved from the source
operand, and the remaining bits are reserved and must be zero.
"unpckhpd" performs an unpack of the high quad words from the source and
destination operands, "unpcklpd" performs an unpack of the low quad words from
the source and destination operands. They are analogous to "unpckhps" and
"unpcklps", and have the same rules for operands.
"cvtps2pd" converts the packed two single precision floating point values to
two packed double precision floating point values, the destination operand
must be a SSE register, the source operand can be a 64-bit memory location or
SSE register. "cvtpd2ps" converts the packed two double precision floating
point values to packed two single precision floating point values, the
destination operand must be a SSE register, the source operand can be a
128-bit memory location or SSE register. "cvtss2sd" converts the single
precision floating point value to double precision floating point value, the
destination operand must be a SSE register, the source operand can be a 32-bit
memory location or SSE register. "cvtsd2ss" converts the double precision
floating point value to single precision floating point value, the destination
operand must be a SSE register, the source operand can be 64-bit memory
location or SSE register.
"cvtpi2pd" converts two packed double word integers into two packed double
precision floating point values; the destination operand must be an SSE
register, the source operand can be a 64-bit memory location or an MMX
register. "cvtsi2sd" converts a double word integer into a double precision
floating point value; the destination operand must be an SSE register, the
source operand can be a 32-bit memory location or a 32-bit general register.
"cvtpd2pi" converts two packed double precision floating point values into two
packed double word integers; the destination operand must be an MMX register,
the source operand can be a 128-bit memory location or an SSE register.
"cvttpd2pi" performs a similar operation, except that truncation is used to
round the source values to integers; the rules for operands are the same.
"cvtsd2si" converts a double precision floating point value into a double
word integer; the destination operand must be a 32-bit general register, the
source operand can be a 64-bit memory location or an SSE register. "cvttsd2si"
performs a similar operation, except that truncation is used to round the
source value to an integer; the rules for operands are the same.
"cvtps2dq" and "cvttps2dq" convert packed single precision floating point
values to packed four double word integers, storing them in the destination
operand. "cvtpd2dq" and "cvttpd2dq" convert packed double precision floating
point values to packed two double word integers, storing the result in the low
quad word of the destination operand. "cvtdq2ps" converts packed four
double word integers to packed single precision floating point values.
For all these instructions destination operand must be a SSE register, the
source operand can be a 128-bit memory location or SSE register.
"cvtdq2pd" converts packed two double word integers from the source operand to
packed double precision floating point values, the source can be a 64-bit
memory location or SSE register, destination has to be SSE register.
"movdqa" and "movdqu" transfer a double quad word operand containing packed
integers from the source operand to the destination operand. At least one of
the operands must be an SSE register, the other one can be either an SSE
register or a 128-bit memory location. Memory operands for "movdqa" must be
aligned on a 16-byte boundary, operands for "movdqu" don't have to be aligned.
"movq2dq" moves the contents of the MMX source register to the low quad word
of destination SSE register. "movdq2q" moves the low quad word from the source
SSE register to the destination MMX register.

movq2dq xmm0,mm1 ; move from MMX register to SSE register
movdq2q mm0,xmm1 ; move from SSE register to MMX register

All MMX instructions operating on 64-bit packed integers (those with
mnemonics starting with "p") are extended to operate on 128-bit packed
integers located in SSE registers. The additional syntax for these
instructions takes an SSE register where an MMX register was needed, and a
128-bit memory location or an SSE register where a 64-bit memory location or
an MMX register was needed. The exception is the "pshufw" instruction, which
doesn't allow the extended syntax, but has two new variants, "pshufhw" and
"pshuflw", which allow only the extended syntax and perform the same
operation as "pshufw" on the high or low quad words of the operands
respectively. The new instruction "pshufd" is also introduced; it performs
the same operation as "pshufw", but on double words instead of words, and it
allows only the extended syntax.

psubb xmm0,[esi] ; subtract 16 packed bytes
pextrw eax,xmm0,7 ; extract highest word into eax
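
"pshufd" and "pshufhw" use the same 2-bit selector fields as "pshufw", but all the selected elements come from the source operand. The C sketch below models them with hypothetical function names, with element 0 as the lowest double word or word. "pshuflw" would be the mirror image of `pshufhw`: it shuffles words 0 to 3 and copies words 4 to 7.

```c
#include <assert.h>
#include <stdint.h>

/* pshufd: result double word i = source double word ((imm >> 2*i) & 3). */
static void pshufd(uint32_t dst[4], const uint32_t src[4], unsigned imm)
{
    uint32_t r[4];
    for (int i = 0; i < 4; i++)
        r[i] = src[(imm >> (2 * i)) & 3];
    for (int i = 0; i < 4; i++)
        dst[i] = r[i];
}

/* pshufhw: words 4-7 are shuffled the same way; words 0-3 are copied unchanged. */
static void pshufhw(uint16_t dst[8], const uint16_t src[8], unsigned imm)
{
    uint16_t r[8];
    for (int i = 0; i < 4; i++)
        r[i] = src[i];
    for (int i = 0; i < 4; i++)
        r[4 + i] = src[4 + ((imm >> (2 * i)) & 3)];
    for (int i = 0; i < 8; i++)
        dst[i] = r[i];
}
```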

"paddq" adds packed quad words, "psubq" subtracts packed quad words, and
"pmuludq" performs an unsigned multiplication of the low double words of each
pair of corresponding quad words and returns the results as packed quad words.
These instructions follow the same rules for operands as the general MMX
operations described in 2.1.14.
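
Here is a C sketch of "pmuludq" and "paddq" on an SSE register modelled as two 64-bit elements. The products are full 64-bit results of 32-bit unsigned factors, so they never overflow. "paddq", by contrast, wraps around modulo 2^64. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* pmuludq: multiply the low unsigned double word of each quad word; the high
   double words are ignored and each product fills a whole quad word. */
static void pmuludq(uint64_t dst[2], const uint64_t src[2])
{
    for (int i = 0; i < 2; i++)
        dst[i] = (uint64_t)(uint32_t)dst[i] * (uint32_t)src[i];
}

/* paddq: quad word addition that wraps modulo 2^64 (no saturation). */
static void paddq(uint64_t dst[2], const uint64_t src[2])
{
    for (int i = 0; i < 2; i++)
        dst[i] += src[i];
}
```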
"pslldq" and "psrldq" perform logical shift left or right of the double
