The "loop" instructions are conditional jumps that use a value placed in
CX (or ECX) to specify the number of repetitions of a software loop. All
"loop" instructions automatically decrement CX (or ECX) and terminate the
loop (do not transfer control) when CX (or ECX) is zero. The "loop" instruction
uses CX or ECX depending on whether the current code setting is 16-bit or
32-bit, but it can be forced to use CX with the "loopw" mnemonic or to use ECX
with the "loopd" mnemonic.
"loope" and "loopz" are the synonyms for the same instruction, which acts as
the standard "loop", but also terminates the loop when the ZF flag is clear.
"loopew" and "loopzw" mnemonics force it to use the CX register, while "looped"
and "loopzd" force it to use the ECX register. "loopne" and "loopnz" are
synonyms for the same instruction, which acts as the standard "loop", but
also terminates the loop when the ZF flag is set. "loopnew" and "loopnzw"
mnemonics force it to use the CX register, while "loopned" and "loopnzd" force
it to use the ECX register. Every "loop" instruction needs an operand being an
immediate value specifying the target address, and it can only be a short jump
(in the range of 128 bytes back and 127 bytes forward from the address of the
instruction following the "loop" instruction).
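As a minimal sketch of a counted loop (assuming the 16-bit code setting, with
"next" being a hypothetical label and the count of 5 an arbitrary value):

    mov cx,5            ; number of repetitions
    xor ax,ax           ; clear the accumulator
  next:
    add ax,cx           ; loop body, executed with CX = 5, 4, 3, 2, 1
    loop next           ; decrement CX, jump back while CX is not zero

When the loop ends, CX is zero and AX holds 15.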
"jcxz" branches to the label specified in the instruction if it finds a
value of zero in CX, "jecxz" does the same, but checks the value of ECX
instead of CX. Rules for the operands are the same as for the "loop"
instruction.
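A common use of "jcxz" is to guard a "loop" against a zero count, because
"loop" decrements CX before testing it, so starting with CX equal to zero
would repeat the loop 65536 times. A sketch for 16-bit code, with "next" and
"done" being hypothetical labels:

    jcxz done           ; skip the whole loop when CX is zero
  next:
    add ax,[bx]         ; loop body
    add bx,2
    loop next           ; decrement CX, jump back while CX is not zero
  done: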
"int" activates the interrupt service routine that corresponds to the
number specified as an operand to the instruction, the number should be in
range from 0 to 255. The interrupt service routine terminates with an "iret"
instruction that returns control to the instruction that follows "int".
"int3" mnemonic codes the short (one byte) trap that invokes the interrupt 3.
"into" instruction invokes the interrupt 4 if the OF flag is set.
"bound" verifies that the signed value contained in the specified register
lies within specified limits. An interrupt 5 occurs if the value contained in
the register is less than the lower bound or greater than the upper bound. It
needs two operands, the first operand specifies the register being tested,
the second operand should be memory address for the two signed limit values.
The operands can be "word" or "dword" in size.
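For illustration, the two operand sizes might be used as follows. The lower
limit is stored first in memory and the upper limit directly after it; the
choice of registers here is arbitrary:

    bound ax,[bx]       ; check AX against the two words at [bx]
    bound eax,[esi]     ; check EAX against the two double words at [esi]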
"in" transfers a byte, word, or double word from an input port to AL, AX,
or EAX. I/O ports can be addressed either directly, with the immediate byte
value coded in instruction, or indirectly via the DX register. The destination
operand should be AL, AX, or EAX register. The source operand should be an
immediate value in range from 0 to 255, or DX register.
"out" transfers a byte, word, or double word to an output port from AL, AX,
or EAX. The program can specify the number of the port using the same methods
as the "in" instruction. The destination operand should be an immediate value
in range from 0 to 255, or DX register. The source operand should be AL, AX,
or EAX register.
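The operand combinations described above can be sketched as follows (the
port number 20h is only an example value):

    in al,20h           ; input byte from port 20h
    in ax,dx            ; input word from port addressed by DX
    out 20h,ax          ; output word to port 20h
    out dx,al           ; output byte to port addressed by DX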
"cmps" subtracts the destination string element from the source string
element and updates the flags AF, SF, PF, CF and OF, but it does not change
any of the compared elements. If the string elements are equal, ZF is set,
otherwise it is cleared. The first operand for this instruction should be the
source string element addressed by SI or ESI with any segment prefix, the
second operand should be the destination string element addressed by DI or
EDI.
"scas" subtracts the destination string element from AL, AX, or EAX
(depending on the size of string element) and updates the flags AF, SF, ZF,
PF, CF and OF. If the values are equal, ZF is set, otherwise it is cleared.
The operand should be the destination string element addressed by DI or EDI.
"stos" places the value of AL, AX, or EAX into the destination string
element. Rules for the operand are the same as for the "scas" instruction.
"lods" places the source string element into AL, AX, or EAX. The operand
should be the source string element addressed by SI or ESI with any segment
prefix.
"ins" transfers a byte, word, or double word from an input port addressed
by DX register to the destination string element. The destination operand
should be memory addressed by DI or EDI, the source operand should be the DX
register.
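The operand forms of the string instructions described above might look like
this. The size operator selects the size of the string element, and the
segment prefixes shown are only examples:

    cmps word [ds:si],[es:di]   ; compare words
    scas byte [es:di]           ; compare AL with byte
    stos dword [es:edi]         ; store EAX as double word
    lods word [cs:si]           ; load word into AX
    ins byte [es:di],dx         ; input byte from port addressed by DX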
The flag control instructions provide a method for directly changing the
state of bits in the flag register. All instructions described in this
section have no operands.
"stc" sets the CF (carry flag) to 1, "clc" zeroes the CF, "cmc" changes the
CF to its complement. "std" sets the DF (direction flag) to 1, "cld" zeroes
the DF, "sti" sets the IF (interrupt flag) to 1 and therefore enables the
interrupts, "cli" zeroes the IF and therefore disables the interrupts.
"lahf" copies SF, ZF, AF, PF, and CF to bits 7, 6, 4, 2, and 0 of the
AH register. The contents of the remaining bits are undefined. The flags
remain unaffected.
"sahf" transfers bits 7, 6, 4, 2, and 0 from the AH register into SF, ZF,
AF, PF, and CF.
"pushf" decrements "esp" by two or four and stores the low word or
double word of flags register at the top of stack, size of stored data
depends on the current code setting. "pushfw" variant forces storing the
word and "pushfd" forces storing the double word.
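As a short sketch, the flags can be examined in a general register by
pushing them and popping the stored value (the tested bit is chosen here
only as an example):

    pushfd              ; store the double word of flags on the stack
    pop eax             ; EAX = copy of the flags register
    test eax,200h       ; bit 9 is the IF, ZF is set when it was zero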
"popf" transfers specific bits from the word or double word at the top
of stack, then increments "esp" by two or four, this value depends on
the current code setting. "popfw" variant forces restoring from the word
and "popfd" forces restoring from the double word.
"salc" instruction sets all the bits of the AL register when the carry flag is
set and zeroes the AL register otherwise. This instruction has no arguments.
The instructions obtained by attaching the condition mnemonic to "cmov"
mnemonic transfer the word or double word from the general register or memory
to the general register only when the condition is true. The destination
operand should be general register, the source operand can be general register
or memory.
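For example, a conditional move can select the greater of two signed values
without a jump. This sketch uses the "l" (less) condition mnemonic attached to
"cmov":

    cmp eax,ebx
    cmovl eax,ebx       ; copy EBX to EAX only when EAX is less than EBX

After these two instructions EAX holds the greater of the two values.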
"cmpxchg" compares the value in the AL, AX, or EAX register with the
destination operand. If the two values are equal, the source operand is
loaded into the destination operand. Otherwise, the destination operand is
loaded into the AL, AX, or EAX register. The destination operand may be a
general register or memory, the source operand must be a general register.
"cmpxchg8b" compares the 64-bit value in EDX and EAX registers with the
destination operand. If the values are equal, the 64-bit value in ECX and EBX
registers is stored in the destination operand. Otherwise, the value in the
destination operand is loaded into EDX and EAX registers. The destination
operand should be a quad word in memory.
"nop" instruction occupies one byte but affects nothing but the instruction
pointer. This instruction has no operands and doesn't perform any operation.
"ud2" instruction generates an invalid opcode exception. This instruction
is provided for software testing to explicitly generate an invalid opcode.
This instruction has no operands.
"xlat" replaces a byte in the AL register with a byte indexed by its value
in a translation table addressed by BX or EBX. The operand should be a byte
memory addressed by BX or EBX with any segment prefix. This instruction has
also a short form "xlatb" which has no operands and uses the BX or EBX address
in the segment selected by DS depending on the current code setting.
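A typical use is a table lookup, sketched here for the 32-bit code setting.
The "hex_digits" table is a hypothetical name introduced only for this
example:

    mov ebx,hex_digits  ; address of the translation table
    and al,0Fh          ; keep the index in range 0 to 15
    xlatb               ; AL = byte at [ds:ebx+al]

  hex_digits db '0123456789ABCDEF'

This converts a value from 0 to 15 in AL into the corresponding hexadecimal
digit character.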
"lds" transfers a pointer variable from the source operand to DS and the
destination register. The source operand must be a memory operand, and the
destination operand must be a general register. The DS register receives the
segment selector of the pointer while the destination register receives the
offset part of the pointer. "les", "lfs", "lgs" and "lss" operate identically
to "lds" except that rather than DS register the ES, FS, GS and SS is used
respectively.
"lea" transfers the offset of the source operand (rather than its value)
to the destination operand. The source operand must be a memory operand, and
the destination operand must be a general register.
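Because only the address is calculated and memory is not accessed, "lea" is
also often used for simple arithmetic, as in this sketch:

    lea dx,[bx+si+1]    ; DX = BX+SI+1
    lea eax,[eax+eax*4] ; EAX = EAX*5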
"lmsw" loads the operand into the machine status word (bits 0 through 15 of
CR0 register), while "smsw" stores the machine status word into the
destination operand. The operand for both those instructions can be 16-bit
general register or memory, for "smsw" it can also be 32-bit general
register.
"lgdt" and "lidt" instructions load the values in operand into the global
descriptor table register or the interrupt descriptor table register
respectively. "sgdt" and "sidt" store the contents of the global descriptor
table register or the interrupt descriptor table register in the destination
operand. The operand should be 6 bytes in memory.
"lldt" loads the operand into the segment selector field of the local
descriptor table register and "sldt" stores the segment selector from the
local descriptor table register in the operand. "ltr" loads the operand into
the segment selector field of the task register and "str" stores the segment
selector from the task register in the operand. Rules for operand are the same
as for the "lmsw" and "smsw" instructions.
"lar" loads the access rights from the segment descriptor specified by
the selector in source operand into the destination operand and sets the ZF
flag. The destination operand can be a 16-bit or 32-bit general register.
The source operand should be a 16-bit general register or memory.
"lsl" loads the segment limit from the segment descriptor specified by the
selector in source operand into the destination operand and sets the ZF flag.
Rules for operand are the same as for the "lar" instruction.
"verr" and "verw" verify whether the code or data segment specified with
the operand is readable or writable from the current privilege level. The
operand should be a word, it can be general register or memory. If the segment
is accessible and readable (for "verr") or writable (for "verw") the ZF flag
is set, otherwise it's cleared. Rules for operand are the same as for the
"lldt" instruction.
"arpl" compares the RPL (requestor's privilege level) fields of two segment
selectors. The first operand contains one segment selector and the second
operand contains the other. If the RPL field of the destination operand is
less than the RPL field of the source operand, the ZF flag is set and the RPL
field of the destination operand is increased to match that of the source
operand. Otherwise, the ZF flag is cleared and no change is made to the
destination operand. The destination operand can be a word general register
or memory, the source operand must be a general register.
"clts" clears the TS (task switched) flag in the CR0 register. This
instruction has no operands.
"lock" prefix causes the processor's bus-lock signal to be asserted during
execution of the accompanying instruction. In a multiprocessor environment,
the bus-lock signal ensures that the processor has exclusive use of any shared
memory while the signal is asserted. The "lock" prefix can be prepended only
to the following instructions and only to those forms of the instructions
where the destination operand is a memory operand: "add", "adc", "and", "btc",
"btr", "bts", "cmpxchg", "cmpxchg8b", "dec", "inc", "neg", "not", "or", "sbb",
"sub", "xor", "xadd" and "xchg". If the "lock" prefix is used with one of
these instructions and the source operand is a memory operand, an undefined
opcode exception may be generated. An undefined opcode exception will also be
generated if the "lock" prefix is used with any instruction not in the above
list. The "xchg" instruction always asserts the bus-lock signal regardless of
the presence or absence of the "lock" prefix.
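The following sketch shows two possible uses of the "lock" prefix. The
"counter" and "value" names are hypothetical double word variables, and
doubling the value is an arbitrary operation chosen for the example:

    lock inc dword [counter]    ; atomic increment
  retry:
    mov eax,[value]             ; read the current value
    mov edx,eax
    shl edx,1                   ; compute the new value
    lock cmpxchg [value],edx    ; store EDX only if [value] still equals EAX
    jne retry                   ; ZF is clear when the value was changed

The "cmpxchg" instruction sets the ZF flag when the comparison succeeds, so
the loop repeats until the update is done without interference.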
"hlt" stops instruction execution and places the processor in a halted
state. An enabled interrupt, a debug exception, the BINIT, INIT or the RESET
signal will resume execution. This instruction has no operands.
"invlpg" invalidates (flushes) the TLB (translation lookaside buffer) entry
specified with the operand, which should be a memory operand. The processor
determines the page that contains that address and flushes the TLB entry for
that page.
"rdmsr" loads the contents of a 64-bit MSR (model specific register) of the
address specified in the ECX register into registers EDX and EAX. "wrmsr"
writes the contents of registers EDX and EAX into the 64-bit MSR of the
address specified in the ECX register. "rdtsc" loads the current value of the
processor's time stamp counter from the 64-bit MSR into the EDX and EAX
registers. The processor increments the time stamp counter MSR every clock
cycle and resets it to 0 whenever the processor is reset. "rdpmc" loads the
contents of the 40-bit performance monitoring counter specified in the ECX
register into registers EDX and EAX. These instructions have no operands.
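As a rough sketch, "rdtsc" can be used to estimate how many clock cycles a
piece of code takes. The result is only approximate, since the processor may
execute instructions out of order, and the choice of ESI and EDI as temporary
storage is arbitrary:

    rdtsc               ; EDX:EAX = counter value at start
    mov esi,eax
    mov edi,edx
    ; ... code being measured (must preserve ESI and EDI) ...
    rdtsc               ; EDX:EAX = counter value at end
    sub eax,esi
    sbb edx,edi         ; EDX:EAX = elapsed count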
"wbinvd" writes back all modified cache lines in the processor's internal
cache to main memory and invalidates (flushes) the internal caches. The
instruction then issues a special function bus cycle that directs external
caches to also write back modified data and another bus cycle to indicate that
the external caches should be invalidated. This instruction has no operands.
"rsm" returns program control from the system management mode to the program
that was interrupted when the processor received an SMM interrupt. This
instruction has no operands.
"sysenter" executes a fast call to a level 0 system procedure, "sysexit"
executes a fast return to level 3 user code. The addresses used by these
instructions are stored in MSRs. These instructions have no operands.
"fld1", "fldz", "fldl2t", "fldl2e", "fldpi", "fldlg2" and "fldln2" load the
commonly used constants onto the FPU register stack. The loaded constants are
+1.0, +0.0, lb 10, lb e, pi, lg 2 and ln 2 respectively (lb, lg and ln being
the base 2, base 10 and natural logarithms). These instructions have no
operands.
"fild" converts the signed integer source operand into double extended
precision floating-point format and pushes the result onto the FPU register
stack. The source operand can be a 16-bit, 32-bit or 64-bit memory location.
"fst" copies the value of ST0 register to the destination operand, which
can be 32-bit or 64-bit memory location or another FPU register. "fstp"
performs the same operation as "fst" and then pops the register stack,
getting rid of ST0. "fstp" accepts the same operands as the "fst" instruction
and can also store value in the 80-bit memory.
"faddp" adds the destination and source operand, stores the sum in the
destination location and then pops the register stack. The destination operand
must be an FPU register and the source operand must be the ST0. When no
operands are specified, ST1 is used as a destination operand.
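The load, add and store instructions can be combined as in the following
sketch, which adds two integers and stores the sum as a double precision
value. The names "a", "b" and "sum" are hypothetical variables, the first two
being double words and the last one a quad word:

    fild dword [a]      ; push a, converted to floating point
    fild dword [b]      ; push b, a is now in ST1
    faddp               ; ST1 = ST1+ST0, then pop, the sum is in ST0
    fstp qword [sum]    ; store the sum and pop it off the stack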
"fcompp" compares the contents of ST0 and ST1, sets flags in the FPU status
word according to the results and pops the register stack twice. This
instruction has no operands.
"fucom", "fucomp" and "fucompp" perform an unordered comparison of two FPU
registers. Rules for operands are the same as for the "fcom", "fcomp" and
"fcompp", but the source operand must be an FPU register.
"ficom" and "ficomp" compare the value in ST0 with an integer source operand
and set the flags in the FPU status word according to the results. "ficomp"
additionally pops the register stack after performing the comparison. The
integer value is converted to double extended precision floating-point format
before the comparison is made. The operand should be a 16-bit or 32-bit
memory location.
"ftst" compares the value in ST0 with 0.0 and sets the flags in the FPU
status word according to the results. "fxam" examines the contents of the ST0
and sets the flags in FPU status word to indicate the class of value in the
register. These instructions have no operands.
"fstsw" and "fnstsw" store the current value of the FPU status word in the
destination location. The destination operand can be either a 16-bit memory or
the AX register. "fstsw" checks for pending unmasked FPU exceptions before
storing the status word, "fnstsw" does not.
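Storing the status word in AX allows branching on the result of an FPU
comparison, because "sahf" then copies the condition bits C0, C2 and C3 into
CF, PF and ZF. A sketch, with "unordered" and "below" being hypothetical
labels:

    fcompp              ; compare ST0 with ST1 and pop both
    fnstsw ax           ; AX = FPU status word
    sahf                ; C0 goes to CF, C2 to PF, C3 to ZF
    jp unordered        ; PF is set when the values were not comparable
    jb below            ; CF is set when ST0 was less than ST1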
"fstcw" and "fnstcw" store the current value of the FPU control word at the
specified destination in memory. "fstcw" checks for pending unmasked FPU
exceptions before storing the control word, "fnstcw" does not. "fldcw" loads
the operand into the FPU control word. The operand should be a 16-bit memory
location.
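As an illustration, the rounding mode can be changed by storing the control
word, modifying it and loading it back. The names "cw" and "new_cw" are
hypothetical word variables:

    fnstcw [cw]         ; save the current control word
    mov ax,[cw]
    or ax,0C00h         ; rounding control (bits 10 and 11) = 11b, truncate
    mov [new_cw],ax
    fldcw [new_cw]      ; use rounding toward zero
    ; ... FPU code that needs truncation ...
    fldcw [cw]          ; restore the previous control word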
"fstenv" and "fnstenv" store the current FPU operating environment at the
memory location specified with the destination operand, and then mask all FPU
exceptions. "fstenv" checks for pending unmasked FPU exceptions before
proceeding, "fnstenv" does not. "fldenv" loads the complete operating
environment from memory into the FPU. "fsave" and "fnsave" store the current
FPU state (operating environment and register stack) at the specified
destination in memory and reinitialize the FPU. "fsave" checks for pending
unmasked FPU exceptions before proceeding, "fnsave" does not. "frstor"
loads the FPU state from the specified memory location. All these instructions
need an operand being a memory location. For each of these instructions there
are two additional mnemonics that allow precise selection of the type of
operation. The "fstenvw", "fnstenvw", "fldenvw", "fsavew", "fnsavew" and
"frstorw" mnemonics force the instruction to perform operation as in the 16-bit
mode, while "fstenvd", "fnstenvd", "fldenvd", "fsaved", "fnsaved" and "frstord"
force the operation as in 32-bit mode.
"finit" and "fninit" set the FPU operating environment into its default
state. "finit" checks for pending unmasked FPU exceptions before proceeding,
"fninit" does not. "fclex" and "fnclex" clear the FPU exception flags in the
FPU status word. "fclex" checks for pending unmasked FPU exceptions before
proceeding, "fnclex" does not. "wait" and "fwait" are synonyms for the same
instruction, which causes the processor to check for pending unmasked FPU
exceptions and handle them before proceeding. These instructions have no
operands.
"ffree" sets the tag associated with specified FPU register to empty. The
operand should be an FPU register.
"fincstp" and "fdecstp" rotate the FPU stack by one by adding or
subtracting one to the pointer of the top of stack. These instructions have no
operands.
The MMX instructions operate on the packed integer types and use the MMX
registers, which are the low 64-bit parts of the 80-bit FPU registers. Because
of this, MMX instructions cannot be used at the same time as FPU instructions.
They can operate on packed bytes (eight 8-bit integers), packed words (four
16-bit integers) or packed double words (two 32-bit integers). The use of
packed formats allows operations to be performed on multiple data elements at
one time.
"movq" copies a quad word from the source operand to the destination
operand. At least one of the operands must be a MMX register, the second one
can be also a MMX register or 64-bit memory location.
"movd" copies a double word from the source operand to the destination
operand. One of the operands must be a MMX register, the second one can be a
general register or 32-bit memory location. Only low double word of MMX
register is used.
All general MMX operations have two operands, the destination operand should
be a MMX register, the source operand can be a MMX register or 64-bit memory
location. Operation is performed on the corresponding data elements of the
source and destination operand and stored in the data elements of the
destination operand. "paddb", "paddw" and "paddd" perform the addition of
packed bytes, packed words, or packed double words. "psubb", "psubw" and
"psubd" perform the subtraction of appropriate types. "paddsb", "paddsw",
"psubsb" and "psubsw" perform the addition or subtraction of packed bytes
or packed words with the signed saturation. "paddusb", "paddusw", "psubusb",
"psubusw" are analogous, but with unsigned saturation. "pmulhw" and "pmullw"
perform a signed multiplication of the packed words and store the high or low
words of the results in the destination operand. "pmaddwd" performs a multiply
of the packed words and adds the four intermediate double word products in
pairs to produce the result as packed double words. "pand", "por" and "pxor"
perform the logical operations on the quad words, "pandn" also performs a
logical negation of the destination operand before performing the "and"
operation. "pcmpeqb", "pcmpeqw" and "pcmpeqd" compare for equality of packed
bytes, packed words or packed double words. If a pair of data elements is
equal, the corresponding data element in the destination operand is filled with
bits of value 1, otherwise it's set to 0. "pcmpgtb", "pcmpgtw" and "pcmpgtd"
perform the similar operation, but they check whether the data elements in the
destination operand are greater than the corresponding data elements in the
source operand. "packsswb" converts packed signed words into packed signed
bytes, "packssdw" converts packed signed double words into packed signed
words, using saturation to handle overflow conditions. "packuswb" converts
packed signed words into packed unsigned bytes. Converted data elements from
the source operand are stored in the high part of the destination operand,
while converted data elements from the destination operand are stored in the
low part. "punpckhbw", "punpckhwd" and "punpckhdq" interleave the data
elements from the high parts of the source and destination operands and
store the result in the destination operand. "punpcklbw", "punpcklwd" and
"punpckldq" perform the same operation, but the low parts of the source and
destination operand are used.
"psllw", "pslld" and "psllq" perform logical shift left of the packed words,
packed double words or a single quad word in the destination operand by the
amount specified in the source operand. "psrlw", "psrld" and "psrlq" perform
logical shift right of the packed words, packed double words or a single quad
word. "psraw" and "psrad" perform arithmetic shift of the packed words or
double words. The destination operand should be a MMX register, while source
operand can be a MMX register, 64-bit memory location, or 8-bit immediate
value.
"emms" makes the FPU registers usable for the FPU instructions, it must be
used before using the FPU instructions if any MMX instructions were used.
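A complete MMX sequence might look like the following sketch, which adds two
arrays of eight unsigned bytes with saturation. It assumes that ESI and EDI
point to the two 8-byte buffers:

    movq mm0,[esi]      ; load eight bytes from the first buffer
    paddusb mm0,[edi]   ; add eight bytes with unsigned saturation
    movq [esi],mm0      ; store the eight sums
    emms                ; make the FPU registers usable again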
The SSE extension adds more MMX instructions and also introduces the
operations on packed single precision floating point values. The 128-bit
packed single precision format consists of four single precision floating
point values. The 128-bit SSE registers are designed for the purpose of
operations on this data type.
"movaps" and "movups" transfer a double quad word operand containing packed
single precision values from source operand to destination operand. At least
one of the operands has to be a SSE register, the second one can be also a
SSE register or 128-bit memory location. Memory operands for "movaps"
instruction must be aligned on boundary of 16 bytes, operands for "movups"
instruction don't have to be aligned.
"movlps" moves packed two single precision values between the memory and the
low quad word of SSE register. "movhps" moves packed two single precision
values between the memory and the high quad word of SSE register. One of the
operands must be a SSE register, and the other operand must be a 64-bit memory
location.
"movlhps" moves packed two single precision values from the low quad word
of source register to the high quad word of destination register. "movhlps"
moves two packed single precision values from the high quad word of source
register to the low quad word of destination register. Both operands have to
be SSE registers.
"movmskps" transfers the most significant bit of each of the four single
precision values in the SSE register into low four bits of a general register.
The source operand must be a SSE register, the destination operand must be a
general register.
"movss" transfers a single precision value between source and destination
operand (only the low double word is transferred). At least one of the operands
has to be a SSE register, the second one can be also a SSE register or 32-bit
memory location.
Each of the SSE arithmetic operations has two variants. When the mnemonic
ends with "ps", the source operand can be a 128-bit memory location or a SSE
register, the destination operand must be a SSE register and the operation is
performed on packed four single precision values, for each pair of the
corresponding data elements separately, the result is stored in the
destination register. When the mnemonic ends with "ss", the source operand
can be a 32-bit memory location or a SSE register, the destination operand
must be a SSE register and the operation is performed on single precision
values, only low double words of SSE registers are used in this case, the
result is stored in the low double word of destination register. "addps" and
"addss" add the values, "subps" and "subss" subtract the source value from
destination value, "mulps" and "mulss" multiply the values, "divps" and
"divss" divide the destination value by the source value, "rcpps" and "rcpss"
compute the approximate reciprocal of the source value, "sqrtps" and "sqrtss"
compute the square root of the source value, "rsqrtps" and "rsqrtss" compute
the approximate reciprocal of square root of the source value, "maxps" and
"maxss" compare the source and destination values and return the greater one,
"minps" and "minss" compare the source and destination values and return the
lesser one.
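The difference between the two variants can be seen in this sketch. It
assumes that ESI and EDI point to memory aligned on a boundary of 16 bytes,
each holding four single precision values:

    movaps xmm0,[esi]   ; load four values
    mulps xmm0,[edi]    ; multiply all four pairs of values
    movaps [esi],xmm0   ; store four products
    sqrtss xmm1,xmm0    ; square root of the low value only, into XMM1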
"comiss" and "ucomiss" compare the single precision values and set the ZF,
PF and CF flags to show the result. The destination operand must be a SSE
register, the source operand can be a 32-bit memory location or SSE register.
"shufps" moves any two of the four single precision values from the
destination operand into the low quad word of the destination operand, and any
two of the four values from the source operand into the high quad word of the
destination operand. The destination operand must be a SSE register, the
source operand can be a 128-bit memory location or SSE register, the third
operand must be an 8-bit immediate value selecting which values will be moved
into the destination operand. Bits 0 and 1 select the value to be moved from
destination operand to the low double word of the result, bits 2 and 3 select
the value to be moved from the destination operand to the second double word,
bits 4 and 5 select the value to be moved from the source operand to the third
double word, and bits 6 and 7 select the value to be moved from the source
operand to the high double word of the result.

    shufps xmm0,xmm0,10010011b ; shuffle double words

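In the above example the selector 10010011b, read from the lowest pair of
bits, picks the values number 3, 0, 1 and 2, so with the same register used
for both operands the four values are rotated by one position. Two other
selectors that may be useful, shown here as a sketch:

    shufps xmm0,xmm0,0          ; copy the low value into all four positions
    shufps xmm0,xmm0,00011011b  ; reverse the order of the four values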
"unpckhps" performs an interleaved unpack of the values from the high parts
of the source and destination operands and stores the result in the
destination operand, which must be a SSE register. The source operand can be
a 128-bit memory location or a SSE register. "unpcklps" performs an
interleaved unpack of the values from the low parts of the source and
destination operand and stores the result in the destination operand,
the rules for operands are the same.
"cvtpi2ps" converts packed two double word integers into the packed two
single precision floating point values and stores the result in the low quad
word of the destination operand, which should be a SSE register. The source
operand can be a 64-bit memory location or MMX register.
"cvtps2pi" converts packed two single precision floating point values into
packed two double word integers and stores the result in the destination
operand, which should be a MMX register. The source operand can be a 64-bit
memory location or SSE register, only low quad word of SSE register is used.
"cvttps2pi" performs the similar operation, except that truncation is used to
round the source values to integers, rules for the operands are the same.
"pextrw" copies the word in the source operand specified by the third
operand to the destination operand. The source operand must be a MMX register,
the destination operand must be a 32-bit general register (the high word of
the destination is cleared), the third operand must be an 8-bit immediate value.
"pinsrw" inserts a word from the source operand in the destination operand
at the location specified with the third operand, which must be an 8-bit
immediate value. The destination operand must be a MMX register, the source
operand can be a 16-bit memory location or 32-bit general register (only low
word of the register is used).
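For illustration, the words of an MMX register are numbered from 0 (lowest)
to 3 (highest). The registers used in this sketch are chosen arbitrarily:

    pextrw eax,mm0,3    ; EAX = highest word of MM0, high word of EAX cleared
    pinsrw mm1,eax,0    ; replace the lowest word of MM1 with AX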
All MMX instructions operating on the 64-bit packed integers (those with
mnemonics starting with "p") are extended to operate on 128-bit packed
integers located in SSE registers. Additional syntax for these instructions
needs an SSE register where MMX register was needed, and the 128-bit memory
location or SSE register where 64-bit memory location or MMX register were
needed. The exception is "pshufw" instruction, which doesn't allow extended
syntax, but has two new variants: "pshufhw" and "pshuflw", which allow only
the extended syntax, and perform the same operation as "pshufw" on the high
or low quad words of operands respectively. Also the new instruction "pshufd"
is introduced, which performs the same operation as "pshufw", but on the
double words instead of words, it allows only the extended syntax.
"paddq" performs the addition of packed quad words, "psubq" performs the
subtraction of packed quad words, "pmuludq" performs an unsigned
multiplication of low double words from each corresponding quad words and
returns the results in packed quad words. These instructions follow the same
rules for operands as the general MMX operations described in 2.1.14.
"pslldq" and "psrldq" perform logical shift left or right of the double