MP Assignment2

The document provides an overview of string representation and handling in assembly language, emphasizing that strings are contiguous byte arrays without a special type. It details string termination methods, key registers for string operations, and syscall usage for input and output in Linux. Additionally, it covers algorithms for finding string length and answers to common questions regarding string processing in assembly.

Uploaded by

9iraj.jadhav

‭ASSIGNMENT 2‬

‭1. String in Assembly‬

‭String‬ ‭representation‬ ‭in‬ ‭assembly:‬ ‭In‬ ‭assembly‬ ‭language,‬ ‭a‬ ‭string‬ ‭is‬ ‭simply‬ ‭a‬
‭contiguous‬ ‭block‬ ‭of‬ ‭bytes‬ ‭in‬ ‭memory‬ ‭where‬ ‭each‬ ‭byte‬ ‭represents‬ ‭a‬ ‭character.‬
‭There's‬ ‭no‬ ‭special‬ ‭"string"‬ ‭type‬ ‭like‬ ‭in‬ ‭high-level‬ ‭languages‬ ‭-‬ ‭it's‬ ‭just‬ ‭an‬ ‭array‬ ‭of‬
‭characters.‬

‭String‬ ‭termination:‬ ‭Strings‬ ‭are‬ ‭typically‬ ‭terminated‬ ‭with‬ ‭special‬ ‭characters‬ ‭to‬ ‭mark‬
‭their end:‬

‭●‬ ‭Null‬ ‭termination:‬ ‭A‬ ‭byte‬ ‭with‬ ‭value‬ ‭0x00‬ ‭(often‬ ‭called‬ ‭a‬‭null‬‭byte)‬‭marks‬‭the‬
‭end‬
‭●‬ ‭Newline‬ ‭termination:‬ ‭A‬ ‭byte‬ ‭with‬ ‭value‬ ‭0x0A‬ ‭(line‬ ‭feed/newline‬ ‭character)‬
‭marks the end‬
‭●‬ ‭Some‬ ‭systems‬ ‭(like‬ ‭DOS)‬ ‭use‬ ‭0x0D‬ ‭0x0A‬ ‭(carriage‬ ‭return‬ ‭+‬ ‭line‬ ‭feed)‬ ‭as‬ ‭line‬
‭endings‬

‭String handling:‬

; Example of defining strings in the data section
section .data
message db "Hello, World!", 0 ; Null-terminated string
prompt db "Enter your name: ", 0

section .bss
buffer resb 64 ; Reserve 64 uninitialized bytes for input (resb belongs in .bss)
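
Since the write syscall needs an explicit byte count, a common habit is to let the assembler compute it from the definition. A minimal sketch, assuming NASM syntax; the names greeting, greeting_len, and input_buf are illustrative and not part of the assignment:

```nasm
section .data
greeting      db "Hello, World!", 0x0A   ; newline-terminated, no null byte
greeting_len  equ $ - greeting           ; $ is the current address, so this is 14

section .bss
input_buf     resb 64                    ; uninitialized storage for input
```

Because greeting_len is an assemble-time constant, it can be loaded directly with mov rdx, greeting_len and costs no run-time counting loop.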

‭Input and output handling:‬

● In Linux, string I/O is done via syscalls
● In DOS, string I/O is done via interrupts (like int 21h)
● Modern Windows programs typically use Win32 API calls

‭2. Registers to Know‬

‭Key registers for string operations:‬

‭●‬ ‭rax‬‭: Accumulator register, holds syscall numbers and return values‬
‭●‬ ‭rdi‬‭: Destination index, often the first argument to syscalls (file descriptor)‬
‭●‬ ‭rsi‬‭: Source index, often points to the string/buffer to read or write‬
‭●‬ ‭rdx‬‭: Data register, often specifies the length/size for operations‬
● rcx: Counter register, frequently used as loop counter (the syscall instruction overwrites rcx and r11, so a counter kept in rcx does not survive a syscall)

‭Register usage examples:‬

; Example: Print a string in Linux x86-64
; message_len must hold the number of bytes to write (write does not stop at a null byte)
mov rax, 1 ; syscall number for write
mov rdi, 1 ; file descriptor (stdout)
mov rsi, message ; pointer to string
mov rdx, message_len ; number of bytes to write
syscall

; Example: Loop through string to count characters
mov rcx, 0 ; Initialize counter
‭mov rsi, buffer ; Point to string buffer‬
‭count_loop:‬
‭cmp byte [rsi], 0 ; Check for null terminator‬
‭je done ; If found, exit loop‬
‭inc rcx ; Increment counter‬
‭inc rsi ; Move to next character‬
‭jmp count_loop ; Repeat‬
‭done:‬
‭; rcx now contains string length‬

‭3. Syscalls (Linux x86-64)‬

‭Reading from input (syscall 0):‬

; Read from stdin
mov rax, 0 ; syscall number for read
mov rdi, 0 ; file descriptor 0 (stdin)
mov rsi, buffer ; buffer to store input
mov rdx, 64 ; maximum bytes to read
syscall
; After syscall, rax contains number of bytes read

Writing output (syscall 1):

‭; Write to stdout‬
‭mov rax, 1 ; syscall number for write‬
‭mov rdi, 1 ; file descriptor 1 (stdout)‬
‭mov rsi, message ; pointer to message‬
‭mov rdx, message_len ; length of message‬
‭syscall‬

‭Additional important syscalls:‬

Syscall 60 (exit): Terminate the program

mov rax, 60 ; syscall number for exit
mov rdi, 0 ; exit code 0 (success)
syscall
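
The write and exit fragments above can be combined into one small program. This is a sketch that assumes NASM and GNU ld on x86-64 Linux; the file name hello.asm is hypothetical, and the message here ends with a newline rather than a null byte so that message_len counts only printable output:

```nasm
; hello.asm (hypothetical file name)
; Build, assuming NASM and GNU ld are installed:
;   nasm -f elf64 hello.asm -o hello.o
;   ld hello.o -o hello

section .data
message      db "Hello, World!", 0x0A
message_len  equ $ - message      ; byte count computed by the assembler

section .text
global _start

_start:
    mov rax, 1                    ; syscall number for write
    mov rdi, 1                    ; file descriptor 1 (stdout)
    mov rsi, message              ; pointer to the bytes to print
    mov rdx, message_len          ; number of bytes to print
    syscall

    mov rax, 60                   ; syscall number for exit
    xor edi, edi                  ; exit code 0 (success)
    syscall
```

Without the final exit syscall, execution would run past the end of the code and the program would most likely crash.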

‭4. Finding String Length‬

‭Algorithm for finding string length:‬

1. Initialize a counter register (usually rcx) to 0
2. Point to the start of the string with a pointer register (usually rsi)
3. Compare the current byte with the terminator and stop when they match
4. Otherwise increment the counter and the pointer, then repeat step 3

‭Implementation for null-terminated string:‬

mov rcx, 0 ; Initialize counter
mov rsi, string ; Point to string
‭strlen_loop:‬
‭cmp byte [rsi], 0 ; Check for null terminator‬
‭je strlen_done ; Exit if found‬
‭inc rcx ; Increment counter‬
‭inc rsi ; Move to next character‬
‭jmp strlen_loop ; Repeat‬
‭strlen_done:‬
‭; Length is in rcx‬
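
As an alternative to the hand-written loop, the x86 string instruction repne scasb can perform the same search. This is a different technique from the one above, shown only for comparison; the sketch assumes NASM and GNU ld on Linux, and the sample text "Hello" is illustrative:

```nasm
section .data
string  db "Hello", 0

section .text
global _start

_start:
    mov rdi, string      ; scasb reads from [rdi], not [rsi]
    xor eax, eax         ; al = 0, the byte to search for
    mov rcx, -1          ; effectively unlimited scan count
    cld                  ; scan toward higher addresses
    repne scasb          ; stop after the first byte equal to al
    not rcx              ; rcx = bytes scanned, terminator included
    dec rcx              ; rcx = length without the terminator (5 here)

    mov rdi, rcx         ; report the length as the exit status
    mov rax, 60          ; syscall number for exit
    syscall
```

Running the program and then echo $? in the shell shows 5. Like the manual loop, this scan has no upper bound and relies on a terminator actually being present.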

Implementation for newline-terminated string:

mov rcx, 0 ; Initialize counter
mov rsi, string ; Point to string
strlen_nl_loop: ; Distinct labels, so both versions can live in one file
cmp byte [rsi], 0x0A ; Check for newline
je strlen_nl_done ; Exit if found
inc rcx ; Increment counter
inc rsi ; Move to next character
jmp strlen_nl_loop ; Repeat
strlen_nl_done:
; Length is in rcx (the newline itself is not counted)
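
Both loops above keep scanning until a terminator turns up, so a buffer without one makes them read past its end. One possible guard is a length routine that accepts either terminator and gives up after a fixed number of bytes. The routine name str_len_bounded and its register interface are assumptions made for this sketch:

```nasm
; str_len_bounded (hypothetical helper)
; In:  rsi = start of string, rdx = maximum number of bytes to examine
; Out: rcx = bytes before the first 0x00 or 0x0A, or rdx if neither was found
; Clobbers: al
str_len_bounded:
    xor ecx, ecx                 ; counter = 0
.next:
    cmp rcx, rdx                 ; limit reached?
    jae .done
    mov al, [rsi + rcx]          ; load the current byte
    test al, al                  ; null terminator?
    jz .done
    cmp al, 0x0A                 ; newline terminator?
    je .done
    inc rcx                      ; count the byte and move on
    jmp .next
.done:
    ret
```

A caller would load rsi with the buffer address and rdx with the buffer size (64 for the buffer defined earlier), then use call str_len_bounded.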

‭Answers to Likely Viva Questions‬

‭Basics‬

1. How is a string represented in memory? A string in assembly is represented as a
contiguous sequence of bytes in memory. With ASCII, each byte stores one character
code; with UTF-8, a character outside the ASCII range occupies two to four bytes. The
string is typically terminated with a special character (like a null byte 0x00 or
newline 0x0A) to mark its end.

‭2.‬‭What‬‭register‬‭stores‬‭the‬‭address‬‭of‬‭a‬‭string‬‭in‬‭x86-64?‬‭Any‬‭general-purpose‬‭register‬
‭can‬ ‭store‬‭the‬‭address‬‭of‬‭a‬‭string,‬‭but‬‭by‬‭convention,‬‭rsi‬‭(source‬‭index)‬‭is‬‭often‬‭used‬
‭to‬‭point‬‭to‬‭source‬‭strings,‬‭and‬‭rdi‬‭(destination‬‭index)‬‭for‬‭destination‬‭strings.‬‭For‬‭system‬
‭calls,‬‭rsi‬‭typically points to the string buffer.‬

3. What does the syscall instruction do? The syscall instruction triggers a system call,
transferring control from user space to kernel space to perform a privileged
operation (like I/O). Before executing syscall, the registers must be set up with the
appropriate values:

● rax: syscall number
● rdi, rsi, rdx, r10, r8, r9: arguments for the syscall, in that order

After the syscall completes, rax contains the return value; on Linux, a value from
-4095 to -1 indicates an error. The instruction also overwrites rcx and r11.
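
The return-value convention can be seen in a small echo program: it reads one chunk from stdin, checks rax, and writes back exactly the bytes that arrived. A minimal sketch, assuming NASM and GNU ld on x86-64 Linux; the exit codes 0 and 1 are arbitrary choices for illustration:

```nasm
section .bss
buffer  resb 64

section .text
global _start

_start:
    mov rax, 0              ; read(0, buffer, 64)
    mov rdi, 0
    mov rsi, buffer
    mov rdx, 64
    syscall
    test rax, rax
    js  .fail               ; negative: the kernel returned an error code
    jz  .ok                 ; zero: end of input, nothing to echo

    mov rdx, rax            ; bytes read become bytes to write
    mov rax, 1              ; write(1, buffer, rdx)
    mov rdi, 1
    mov rsi, buffer
    syscall
    test rax, rax
    js  .fail

.ok:
    xor edi, edi            ; exit status 0
    jmp .exit
.fail:
    mov edi, 1              ; exit status 1
.exit:
    mov rax, 60             ; syscall number for exit
    syscall
```

The byte count is copied out of rax before rax is reloaded with the next syscall number, because the return value is lost as soon as rax is reused.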

‭Intermediate‬

‭1. How do you find the length of a string in assembly?‬‭To find the length of a string:‬

1. Set a counter (typically in rcx) to zero
2. Start with a pointer (typically in rsi) to the beginning of the string
‭3.‬ ‭Loop through each character:‬
‭○‬ ‭Check if the current character is the terminator (null or newline)‬
‭○‬ ‭If it is, exit the loop‬
‭○‬ ‭If not, increment the counter and pointer‬
‭○‬ ‭Repeat‬
‭4.‬ ‭When done, the counter holds the string length‬

‭2. What is the difference between a null-terminated and newline-terminated string?‬

‭●‬ ‭Null-terminated‬ ‭string‬‭:‬ ‭Ends‬ ‭with‬ ‭a‬ ‭byte‬ ‭value‬ ‭of‬ ‭0x00‬‭.‬ ‭Used‬ ‭for‬ ‭most‬ ‭C-style‬
‭strings and general text processing.‬
‭●‬ ‭Newline-terminated‬ ‭string‬‭:‬ ‭Ends‬ ‭with‬ ‭a‬ ‭byte‬ ‭value‬ ‭of‬ ‭0x0A‬ ‭(or‬ ‭0x0D‬ ‭0x0A‬ ‭in‬
‭some systems). Typically represents a line of text from user input or a file.‬

‭The‬ ‭key‬ ‭difference‬ ‭is‬ ‭in‬ ‭how‬ ‭they're‬ ‭processed‬ ‭-‬ ‭code‬ ‭must‬ ‭check‬ ‭for‬ ‭the‬
‭appropriate terminator based on the expected string format.‬

‭3.‬‭What‬‭is‬‭the‬‭role‬‭of‬‭the‬‭rcx‬‭register‬‭in‬‭loops?‬‭The‬‭rcx‬‭register‬‭traditionally‬‭serves‬‭as‬‭a‬
‭counter in loops. It can be:‬

‭●‬ ‭Decremented‬ ‭with‬ ‭each‬ ‭iteration‬ ‭when‬ ‭using‬ ‭the‬ ‭loop‬ ‭instruction‬ ‭(which‬
‭automatically decrements‬‭rcx‬‭and jumps if not zero)‬
‭●‬ ‭Manually incremented or decremented in custom loop constructs‬
‭●‬ ‭Used to track the number of items processed‬
‭●‬ ‭Used as an index into arrays or strings‬
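
The loop instruction mentioned in the first bullet can be sketched as follows. The example fills a ten-byte buffer with asterisks and prints it; the buffer name stars and the count of ten are illustrative, and NASM with GNU ld on Linux is assumed:

```nasm
section .bss
stars  resb 10

section .text
global _start

_start:
    mov rcx, 10             ; iteration count
    mov rdi, stars          ; write position
.fill:
    mov byte [rdi], '*'     ; store one character
    inc rdi
    loop .fill              ; rcx = rcx - 1, jump back while rcx != 0

    mov rax, 1              ; write(1, stars, 10)
    mov rdi, 1
    mov rsi, stars
    mov rdx, 10
    syscall

    mov rax, 60             ; exit(0)
    xor edi, edi
    syscall
```

If rcx could be zero on entry, loop would decrement it to a huge value and keep going, so code with a variable count usually places jrcxz in front of the loop to skip it.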

‭Advanced‬

‭1.‬ ‭What‬ ‭if‬ ‭the‬ ‭user‬ ‭inputs‬ ‭more‬ ‭characters‬ ‭than‬ ‭buffer‬ ‭size?‬ ‭How‬ ‭do‬ ‭you‬ ‭prevent‬
‭overflow?‬‭To prevent buffer overflow:‬

1. Always specify a maximum number of bytes to read that is smaller than your buffer
size, so that one byte stays free for the terminator:

mov rdx, buffer_size - 1 ; Maximum bytes to read (buffer_size is an equ constant)

2. Ensure the buffer is large enough for the expected input plus the terminator
3. After reading, manually add a null terminator at the end of the actual input:

; After read syscall (rax contains bytes read; a negative value means an error)
mov byte [buffer + rax], 0 ; Add null terminator (in bounds because rax <= buffer_size - 1)

4. For extra safety, check the return value of the read syscall and handle the
case where it equals the requested maximum (meaning the input may have been
truncated, with the remainder still waiting to be read)
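
Put together, the steps might look like the sketch below, which also converts newline-terminated input into a null-terminated string. It assumes NASM and GNU ld on x86-64 Linux; BUF_SIZE is a hypothetical constant, and returning the length as the exit status is only a convenient way to observe the result:

```nasm
BUF_SIZE equ 64                      ; hypothetical buffer size

section .bss
buffer  resb BUF_SIZE

section .text
global _start

_start:
    mov rax, 0                       ; syscall number for read
    mov rdi, 0                       ; stdin
    mov rsi, buffer
    mov rdx, BUF_SIZE - 1            ; leave one byte for the terminator
    syscall
    test rax, rax
    jle .empty                       ; error (negative) or end of input (zero)

    cmp byte [buffer + rax - 1], 0x0A
    jne .terminate
    dec rax                          ; drop the trailing newline
.terminate:
    mov byte [buffer + rax], 0       ; rax <= BUF_SIZE - 1, so this stays in bounds

    mov rdi, rax                     ; exit status = length of the cleaned string
    jmp .exit
.empty:
    xor edi, edi                     ; exit status 0 when nothing was read
.exit:
    mov rax, 60                      ; syscall number for exit
    syscall
```

Typing abc and pressing Enter leaves "abc" followed by a null byte in buffer, and echo $? then reports 3.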

2. How would this differ if using DOS interrupts instead of Linux syscalls? When using
DOS interrupts:

● Use the int 21h instruction instead of syscall
● Function number goes in ah (not rax)
● Different registers for parameters:
○ For string output: ah = 09h, dx points to '$'-terminated string
○ For character input: ah = 01h (returns character in al)
○ For string input: ah = 0Ah, dx points to a special buffer format (first byte: maximum length, second byte: count filled in by DOS, then the characters)

‭Example of DOS string output:‬

mov ah, 09h ; Function: print string
mov dx, message ; DX points to the string (must end with '$')
‭int 21h ; Call DOS interrupt‬
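
The buffered input function (ah = 0Ah) is easier to follow with its buffer layout written out. The following is a sketch of a 16-bit DOS .COM program, assuming NASM's flat binary output; the buffer name in_buf and the 32-character limit are illustrative:

```nasm
; Build (assumption): nasm -f bin input.asm -o input.com
org 100h

    mov ah, 0Ah                      ; Function: buffered keyboard input
    mov dx, in_buf                   ; DX points to the input buffer
    int 21h

    xor bx, bx
    mov bl, [in_buf + 1]             ; count of characters typed (CR not counted)
    mov byte [in_buf + 2 + bx], '$'  ; overwrite the CR so function 09h can print it

    mov ah, 09h                      ; Function: print '$'-terminated string
    mov dx, in_buf + 2               ; the text starts at the third byte
    int 21h

    mov ax, 4C00h                    ; Function 4Ch: exit with return code 0
    int 21h

in_buf:
    db 32                            ; byte 0: maximum characters accepted, CR included
    db 0                             ; byte 1: filled in by DOS with the actual count
    times 32 db 0                    ; bytes 2 onward: the typed characters
```

Unlike the Linux read syscall, the length does not come back in a register; it is stored in the second byte of the buffer itself.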

‭3.‬ ‭Can‬ ‭you‬ ‭modify‬ ‭this‬ ‭code‬ ‭to‬ ‭ignore‬ ‭whitespaces‬ ‭while‬ ‭counting?‬ ‭Here's‬ ‭a‬
‭modified string length routine that ignores whitespace characters:‬

mov rcx, 0 ; Initialize counter (non-whitespace chars)
mov rsi, string ; Point to string
‭count_loop:‬
‭mov al, byte [rsi] ; Get current character‬
‭cmp al, 0 ; Check for null terminator‬
‭je count_done ; Exit if found‬

; Check if character is whitespace
cmp al, ' ' ; Space
‭je skip_whitespace‬
‭cmp al, 9 ; Tab‬
‭je skip_whitespace‬
‭cmp al, 10 ; Line feed‬
‭je skip_whitespace‬
‭cmp al, 13 ; Carriage return‬
‭je skip_whitespace‬

; Not whitespace, so count it
inc rcx ; Increment counter

‭skip_whitespace:‬
‭inc rsi ; Move to next character‬
‭jmp count_loop ; Repeat‬

‭count_done:‬
‭; rcx now contains count of non-whitespace characters‬

‭This‬‭code‬‭checks‬‭each‬‭character‬‭against‬‭common‬‭whitespace‬‭values‬‭(space,‬‭tab,‬
‭line‬ ‭feed,‬ ‭carriage‬ ‭return)‬ ‭and‬ ‭only‬ ‭increments‬ ‭the‬ ‭counter‬ ‭for‬ ‭non-whitespace‬
‭characters.‬
