MP Assignment 2
String representation in assembly: In assembly language, a string is simply a
contiguous block of bytes in memory where each byte holds the code of one
character (for ASCII text). There is no special "string" type as in high-level
languages; a string is just an array of bytes.
String termination: Strings are typically terminated with special characters to mark
their end:
● Null termination: A byte with value 0x00 (often called a null byte) marks the
end
● Newline termination: A byte with value 0x0A (line feed/newline character)
marks the end
● Some systems (like DOS) use 0x0D 0x0A (carriage return + line feed) as line
endings
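As a worked example of this layout, here is a short C sketch (C is used so the example can be compiled and checked; `greeting` and `byte_at` are hypothetical names). A C string literal is stored exactly as described above: the character bytes in order, followed by a null terminator, so a literal ending in `\n` carries both a newline and a null byte.

```c
#include <stddef.h>

/* The literal "Hi\n" occupies four contiguous bytes:
 * 0x48 ('H'), 0x69 ('i'), 0x0A (newline), 0x00 (null terminator). */
static const unsigned char greeting[] = "Hi\n";

/* Returns the byte at offset i, showing that the string is just a byte array. */
unsigned char byte_at(size_t i)
{
    return greeting[i];
}
```

In NASM the same bytes would be written as a `db` directive listing the characters followed by `0x0A, 0x00`.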
String handling:
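● rax: Accumulator register; holds the syscall number on entry and the return value after the syscall
● rdi: Destination index; the first syscall argument (for read and write, the file descriptor)
● rsi: Source index; the second syscall argument (for read and write, the address of the string/buffer)
● rdx: Data register; the third syscall argument (for read and write, the length in bytes)
● rcx: Counter register; frequently used as a loop counter. The syscall instruction itself overwrites rcx (and r11), so a counter kept in rcx must be saved before a syscall inside the loop.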
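section .data
message     db "Hello, World!", 0x0A  ; example string (hypothetical), ending in a newline
message_len equ $ - message            ; length in bytes, computed by the assembler
section .text
; Write to stdout (Linux x86-64 write syscall)
mov rax, 1 ; syscall number for write
mov rdi, 1 ; file descriptor 1 (stdout)
mov rsi, message ; pointer to message
mov rdx, message_len ; length of message in bytes
syscall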
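The register roles can be cross-checked from C. The sketch below assumes Linux on x86-64 with glibc, whose `syscall()` function loads its arguments into the same registers the assembly uses; `write_message` is a hypothetical helper name.

```c
#define _GNU_SOURCE            /* make glibc declare syscall() */
#include <stddef.h>
#include <sys/syscall.h>       /* SYS_write */
#include <unistd.h>            /* syscall(), pipe(), read() */

/* Issues the raw write syscall with the same arguments the assembly loads:
 *   rax = SYS_write (1 on x86-64), rdi = fd, rsi = message, rdx = message_len.
 * Returns the number of bytes written, or -1 on error (errno is set). */
long write_message(int fd, const char *message, size_t message_len)
{
    return syscall(SYS_write, (long)fd, message, message_len);
}
```

Calling `write_message(1, "hi\n", 3)` performs the same operation as the assembly with `message` set to `"hi", 0x0A`.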
Basics
1. How is a string represented in memory? A string in assembly is represented as a
contiguous sequence of bytes in memory, with each byte storing the ASCII code of a
character (UTF-8 text uses several bytes for a single non-ASCII character). The string
is typically terminated with a special character, such as a null byte 0x00 or a
newline 0x0A, to mark its end.
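2. What register stores the address of a string in x86-64? Any general-purpose register
can store the address of a string, but by convention rsi (source index) points to
source strings and rdi (destination index) to destination strings. The convention
comes from the string instructions, which use these registers implicitly: lodsb reads
from [rsi], stosb writes to [rdi], and movsb copies from [rsi] to [rdi]. For the read
and write system calls, rsi holds the address of the buffer.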
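The source/destination convention can be sketched in C, with one pointer standing in for rsi and the other for rdi (`copy_string` is a hypothetical name; the loop mirrors a lodsb/stosb copy that stops after storing the null byte).

```c
#include <stddef.h>

/* `src` plays the role of rsi (source index) and `dst` the role of rdi
 * (destination index). Copies bytes up to and including the null
 * terminator and returns the number of bytes before the terminator.
 * The caller must make `dst` large enough. */
size_t copy_string(char *dst, const char *src)
{
    size_t n = 0;
    /* load [rsi], store to [rdi], advance both, stop once 0x00 is copied */
    while ((*dst++ = *src++) != '\0')
        n++;
    return n;
}
```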
Intermediate
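1. How do you find the length of a string in assembly? Start a counter (for example, rcx) at zero, then step through the string one byte at a time, incrementing the counter until the terminator byte is reached; the counter then holds the length. Which terminator to test for depends on the string format: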
● Null-terminated string: Ends with a byte value of 0x00. Used for most C-style
strings and general text processing.
● Newline-terminated string: Ends with a byte value of 0x0A (or 0x0D 0x0A in
some systems). Typically represents a line of text from user input or a file.
The key difference is in how they are processed: the code must compare each byte
against the terminator that matches the expected string format.
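The counting loop can be sketched in C, with the assembly each line corresponds to in the comments (`length_until` is a hypothetical helper; passing the terminator as a parameter lets the same loop handle both formats above).

```c
#include <stddef.h>

/* Length of a string ending in `terminator`
 * (0x00 for a null-terminated string, 0x0A for a newline-terminated one).
 * A null byte also stops the scan, so a missing newline cannot run off
 * the end of a C string. */
size_t length_until(const char *p, char terminator)
{
    size_t count = 0;                           /* xor rcx, rcx */
    while (*p != terminator && *p != '\0') {    /* cmp byte [rsi], terminator / je done */
        count++;                                /* inc rcx */
        p++;                                    /* inc rsi */
    }
    return count;                               /* rcx now holds the length */
}
```

For example, `length_until("hi\nthere", '\n')` stops at the newline and returns 2.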
3. What is the role of the rcx register in loops? The rcx register traditionally serves as a
counter in loops. It can be:
● Decremented with each iteration when using the loop instruction (which
automatically decrements rcx and jumps if the result is not zero)
● Manually incremented or decremented in custom loop constructs
● Used to track the number of items processed
● Used as an index into arrays or strings
● Used as the repeat count for rep-prefixed string instructions such as rep movsb or repne scasb
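The behavior of the loop instruction can be modelled in C as a do/while that decrements the counter after each pass. This is a sketch: `count_char` is a hypothetical helper, and the up-front zero check stands in for the jrcxz guard that real code needs, because entering loop with rcx = 0 wraps the counter around and runs the body 2^64 times.

```c
#include <stddef.h>

/* Counts occurrences of `target` in the first `len` bytes of `buf`,
 * using a countdown counter in the style of the `loop` instruction. */
size_t count_char(const char *buf, size_t len, char target)
{
    size_t counter = len;          /* mov rcx, len */
    size_t hits = 0;
    if (counter == 0)              /* jrcxz .done */
        return 0;
    do {
        if (*buf == target)        /* cmp byte [rsi], target */
            hits++;
        buf++;                     /* inc rsi */
    } while (--counter != 0);      /* loop .top: dec rcx, jump back if rcx != 0 */
    return hits;
}
```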
Advanced
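1. What if the user inputs more characters than the buffer size? How do you prevent
overflow? The read system call never stores more than rdx bytes, so the extra
characters are not written to memory; they remain unread and are returned by the next
read on the same file descriptor. To prevent buffer overflow: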
1. Always specify a maximum number of bytes to read that is smaller than your
buffer size, leaving one byte for the terminator:
mov rdx, buffer_size - 1 ; maximum bytes to read (one byte reserved for the terminator)
2. Ensure the buffer is large enough for the expected input plus the terminator
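3. After reading, manually add a null terminator at the end of the actual input (rax holds the byte count returned by read, assuming it is not a negative error code):
mov byte [buffer + rax], 0 ; store 0x00 just past the last byte read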
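The three steps can be sketched in C on Linux/POSIX. `read_bounded` is a hypothetical helper: it reads at most one byte less than the buffer size and then terminates the input at the position given by the returned byte count.

```c
#include <stddef.h>
#include <unistd.h>   /* read(), ssize_t */

/* Reads at most cap - 1 bytes from fd into buf, then writes a null
 * terminator just past the bytes actually read, so the terminator always
 * fits inside the buffer. Returns the byte count, or -1 on error (or when
 * cap is 0 and there is no room for the terminator). */
ssize_t read_bounded(int fd, char *buf, size_t cap)
{
    if (cap == 0)
        return -1;
    ssize_t n = read(fd, buf, cap - 1);   /* one byte kept free for the terminator */
    if (n < 0)
        return -1;                        /* read failed: nothing to terminate */
    buf[n] = '\0';                        /* terminator at buf + bytes read */
    return n;
}
```

For example, with a 6-byte buffer and the pending input `hello world\n`, the first call stores `hello` plus the terminator and returns 5; a second call picks up the remaining 7 bytes, ` world\n`.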
3. Can you modify this code to ignore whitespace while counting? Here is a
modified string length routine that ignores whitespace characters:
skip_whitespace:
inc rsi ; Move to next character
jmp count_loop ; Repeat
count_done:
; rcx now contains count of non-whitespace characters
This code checks each character against common whitespace values (space, tab,
line feed, carriage return) and only increments the counter for non-whitespace
characters.
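The complete counting logic, including the comparisons that lead to skip_whitespace, can be sketched in C. `count_non_whitespace` is a hypothetical name, and the sketch assumes a null-terminated string; the comments map each step to the labels used in the assembly routine.

```c
#include <stddef.h>

/* Counts the characters of a null-terminated string, skipping space,
 * tab (0x09), line feed (0x0A) and carriage return (0x0D). */
size_t count_non_whitespace(const char *p)
{
    size_t count = 0;                        /* rcx: non-whitespace count */
    for (; *p != '\0'; p++) {                /* count_loop: stop at the null byte */
        char c = *p;
        if (c == ' ' || c == '\t' || c == '\n' || c == '\r')
            continue;                        /* skip_whitespace: advance without counting */
        count++;                             /* inc rcx */
    }
    return count;                            /* count_done */
}
```

For example, `count_non_whitespace("a b\tc\r\n")` returns 3.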