Skip to content

Complete x86 Assembly Intel Syntax 32-bit Guide¤

1. KIẾN TRÚC PROCESSOR x86 32-BIT¤

1.1 CPU Architecture Overview¤

Text Only
┌─────────────────────────────────────────────────────────┐
│                    x86 32-bit CPU                       │
├─────────────────┬───────────────────┬───────────────────┤
│   Registers     │   Execution Unit  │   Memory Unit     │
├─────────────────┼───────────────────┼───────────────────┤
│ • General       │ • ALU             │ • Cache L1/L2/L3  │
│ • Segment       │ • FPU             │ • TLB             │
│ • Control       │ • MMU             │ • Prefetch Buffer │
│ • Debug         │ • Branch Predict  │ • Bus Interface   │
└─────────────────┴───────────────────┴───────────────────┘

1.2 Register Architecture Chi Tiết¤

General Purpose Registers (32-bit)¤

Text Only
EAX (Accumulator):
┌─────────┬─────────┬─────────┬─────────┐
│   AH    │   AL    │    -    │    -    │  8-bit access
├─────────┴─────────┼─────────┴─────────┤
│        AX         │         -         │  16-bit access  
├───────────────────┴───────────────────┤
│               EAX                     │  32-bit access
└───────────────────────────────────────┘
Bit:  31    24 23   16 15    8 7      0

EBX (Base):
┌─────────┬─────────┬─────────┬─────────┐
│   BH    │   BL    │    -    │    -    │
├─────────┴─────────┼─────────┴─────────┤
│        BX         │         -         │
├───────────────────┴───────────────────┤
│               EBX                     │
└───────────────────────────────────────┘

ECX (Counter):
┌─────────┬─────────┬─────────┬─────────┐
│   CH    │   CL    │    -    │    -    │
├─────────┴─────────┼─────────┴─────────┤
│        CX         │         -         │
├───────────────────┴───────────────────┤
│               ECX                     │
└───────────────────────────────────────┘

EDX (Data):
┌─────────┬─────────┬─────────┬─────────┐
│   DH    │   DL    │    -    │    -    │
├─────────┴─────────┼─────────┴─────────┤
│        DX         │         -         │
├───────────────────┴───────────────────┤
│               EDX                     │
└───────────────────────────────────────┘

Index và Pointer Registers¤

Text Only
ESI (Source Index):     ┌───────────────────────────────────────┐
                        │               ESI                     │
                        └───────────────┬───────────────────────┘
                                       │        SI             │
                                       └───────────────────────┘

EDI (Destination Index): ┌───────────────────────────────────────┐
                         │               EDI                     │
                         └───────────────┬───────────────────────┘
                                        │        DI             │
                                        └───────────────────────┘

ESP (Stack Pointer):     ┌───────────────────────────────────────┐
                         │               ESP                     │
                         └───────────────┬───────────────────────┘
                                        │        SP             │
                                        └───────────────────────┘

EBP (Base Pointer):      ┌───────────────────────────────────────┐
                         │               EBP                     │
                         └───────────────┬───────────────────────┘
                                        │        BP             │
                                        └───────────────────────┘

Segment Registers (16-bit)¤

Text Only
CS (Code Segment):      ┌───────────────────────────────────────┐
                        │                CS                     │
                        └───────────────────────────────────────┘

DS (Data Segment):      ┌───────────────────────────────────────┐
                        │                DS                     │
                        └───────────────────────────────────────┘

ES (Extra Segment):     ┌───────────────────────────────────────┐
                        │                ES                     │
                        └───────────────────────────────────────┘

FS (Additional Segment): ┌───────────────────────────────────────┐
                         │                FS                     │
                         └───────────────────────────────────────┘

GS (Additional Segment): ┌───────────────────────────────────────┐
                         │                GS                     │
                         └───────────────────────────────────────┘

SS (Stack Segment):     ┌───────────────────────────────────────┐
                        │                SS                     │
                        └───────────────────────────────────────┘

EFLAGS Register (32-bit)¤

Text Only
Bit Position và Flags:

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9  8  7  6  5  4  3  2  1  0
┌──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┐
│0 │0 │0 │0 │0 │0 │0 │0 │0 │0 │ID│VIP│VIF│AC│VM│RF│0 │NT│IOPL │OF│DF│IF│TF│SF│ZF│0 │AF│0 │PF│1 │CF│
└──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴───┴───┴──┴──┴──┴──┴──┴──┴────┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┘

Status Flags:
• CF (Carry Flag, bit 0): Set when arithmetic overflow/underflow
• PF (Parity Flag, bit 2): Set when result has even number of 1s
• AF (Auxiliary Flag, bit 4): Set for BCD arithmetic carry
• ZF (Zero Flag, bit 6): Set when result is zero
• SF (Sign Flag, bit 7): Set when result is negative (MSB = 1)
• OF (Overflow Flag, bit 11): Set when signed arithmetic overflow

Control Flags:
• DF (Direction Flag, bit 10): Controls string operations direction
• IF (Interrupt Flag, bit 9): Controls interrupt responses
• TF (Trap Flag, bit 8): Enables single-step debugging

System Flags:
• IOPL (I/O Privilege Level, bits 12-13): Controls I/O access
• NT (Nested Task, bit 14): Controls task switching
• RF (Resume Flag, bit 16): Controls debug exceptions
• VM (Virtual 8086 Mode, bit 17): Enables 8086 emulation
• AC (Alignment Check, bit 18): Controls alignment checking
• VIF (Virtual Interrupt Flag, bit 19): Virtual IF for V86 mode
• VIP (Virtual Interrupt Pending, bit 20): Virtual interrupt pending
• ID (Identification Flag, bit 21): Indicates CPUID availability

1.3 Memory Model và Segmentation¤

Memory Layout (32-bit Process)¤

Text Only
Virtual Address Space (4GB):

0xFFFFFFFF ┌─────────────────────────┐
           │     Kernel Space        │ ← Không truy cập được từ user mode
0xC0000000 ├─────────────────────────┤
           │        Stack            │ ← Grows downward
           │          ↓              │
           │                         │
           │                         │
           │       Free Space        │
           │                         │
           │                         │
           │          ↑              │
           │         Heap            │ ← Grows upward
           ├─────────────────────────┤
           │    BSS Segment          │ ← Uninitialized data
           ├─────────────────────────┤
           │    Data Segment         │ ← Initialized data
           ├─────────────────────────┤
           │    Text Segment         │ ← Program code
0x08048000 ├─────────────────────────┤
           │     Reserved            │
0x00000000 └─────────────────────────┘

Segmented Memory Model¤

Text Only
Linear Address Formation:
Segment Selector + Offset = Linear Address

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│ Segment Register│ +  │     Offset      │ =  │ Linear Address  │
│    (16-bit)     │    │    (32-bit)     │    │   (32-bit)      │
└─────────────────┘    └─────────────────┘    └─────────────────┘

Example:
DS:EBX means DS segment + EBX offset

2. INSTRUCTION SET ARCHITECTURE¤

2.1 Instruction Format¤

Text Only
x86 Instruction Format (Variable Length: 1-15 bytes):

┌─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┐
│ Prefix  │ Prefix  │ Opcode  │ ModR/M  │  SIB    │Displace │Immediate│Immediate│
│(0-4 by) │(0-1 by) │(1-3 by) │(0-1 by) │(0-1 by) │(0,1,2,4)│(0,1,2,4)│(0,1,2,4)│
└─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┘

Prefixes:
• Instruction prefixes: LOCK, REP, REPNE
• Segment override: CS:, DS:, ES:, FS:, GS:, SS:
• Operand-size: 0x66 (switches 32↔16 bit)
• Address-size: 0x67 (switches 32↔16 bit addressing)

ModR/M Byte:
┌───────┬───────┬───────┐
│  Mod  │  Reg  │  R/M  │  
│(2 bit)│(3 bit)│(3 bit)│
└───────┴───────┴───────┘

SIB Byte (Scale-Index-Base):
┌───────┬───────┬───────┐
│ Scale │ Index │ Base  │
│(2 bit)│(3 bit)│(3 bit)│
└───────┴───────┴───────┘

2.2 Addressing Modes Chi Tiết¤

Immediate Addressing¤

Text Only
mov eax, 12345h         ; EAX = 0x12345
mov bl, 'A'             ; BL = ASCII 'A' (0x41)
add eax, 100            ; EAX = EAX + 100

Register Addressing¤

Text Only
mov eax, ebx            ; EAX = EBX
add ecx, edx            ; ECX = ECX + EDX
xchg eax, ebx           ; Swap EAX and EBX

Memory Addressing Modes¤

Text Only
; Direct Memory Addressing
mov eax, [1234h]        ; EAX = memory[0x1234]
mov [var1], ebx         ; memory[var1] = EBX

; Register Indirect
mov eax, [ebx]          ; EAX = memory[EBX]
mov [ecx], eax          ; memory[ECX] = EAX

; Base + Displacement
mov eax, [ebx+4]        ; EAX = memory[EBX + 4]
mov eax, [ebp-8]        ; EAX = memory[EBP - 8]

; Base + Index
mov eax, [ebx+ecx]      ; EAX = memory[EBX + ECX]
mov eax, [esi+edi]      ; EAX = memory[ESI + EDI]

; Base + Index + Displacement
mov eax, [ebx+ecx+8]    ; EAX = memory[EBX + ECX + 8]

; Scaled Index
mov eax, [ebx+ecx*2]    ; EAX = memory[EBX + ECX*2]
mov eax, [ebx+ecx*4+8]  ; EAX = memory[EBX + ECX*4 + 8]

; Scale factors: 1, 2, 4, 8
mov eax, [esi+edi*1]    ; Scale = 1
mov eax, [esi+edi*2]    ; Scale = 2
mov eax, [esi+edi*4]    ; Scale = 4
mov eax, [esi+edi*8]    ; Scale = 8

2.3 Complete Instruction Set¤

Data Movement Instructions¤

Text Only
; MOV - Move data
mov dst, src            ; dst = src
movsx eax, bl           ; Sign-extend BL to EAX
movzx eax, bl           ; Zero-extend BL to EAX

; PUSH/POP - Stack operations
push eax                ; ESP = ESP - 4; [ESP] = EAX
push word ptr [ebx]     ; Push 16-bit value
pushad                  ; Push all general registers
pop ebx                 ; EBX = [ESP]; ESP = ESP + 4
popad                   ; Pop all general registers

; XCHG - Exchange
xchg eax, ebx           ; Swap EAX and EBX
xchg [mem1], eax        ; Swap memory and EAX

; LEA - Load Effective Address
lea eax, [ebx+ecx*2+8]  ; EAX = EBX + ECX*2 + 8 (address calculation)
lea esi, [string1]      ; ESI = address of string1

; XLAT - Translate
; AL = [EBX + AL] (used for table lookups)
mov ebx, table_addr     ; Load table address
mov al, index           ; Load index
xlat                    ; AL = table[AL]

Arithmetic Instructions¤

Text Only
; Basic Arithmetic
add dst, src            ; dst = dst + src
adc dst, src            ; dst = dst + src + CF (add with carry)
sub dst, src            ; dst = dst - src
sbb dst, src            ; dst = dst - src - CF (subtract with borrow)
neg dst                 ; dst = -dst (two's complement)

; Increment/Decrement
inc dst                 ; dst = dst + 1
dec dst                 ; dst = dst - 1

; Multiplication
mul src                 ; AX/EAX = AL/EAX * src (unsigned)
imul src                ; AX/EAX = AL/EAX * src (signed)
imul dst, src           ; dst = dst * src (signed)
imul dst, src, imm      ; dst = src * imm (signed)

; Division
div src                 ; AL/EAX = AX/EDX:EAX / src; AH/EDX = remainder
idiv src                ; AL/EAX = AX/EDX:EAX / src (signed)

; Extended Operations
aaa                     ; ASCII Adjust After Addition
aas                     ; ASCII Adjust After Subtraction
aam                     ; ASCII Adjust After Multiplication
aad                     ; ASCII Adjust Before Division
daa                     ; Decimal Adjust After Addition
das                     ; Decimal Adjust After Subtraction

Logical Instructions¤

Text Only
; Bitwise Operations
and dst, src            ; dst = dst AND src
or dst, src             ; dst = dst OR src
xor dst, src            ; dst = dst XOR src
not dst                 ; dst = NOT dst (one's complement)

; Shift Operations
shl dst, count          ; Shift Left (logical)
shr dst, count          ; Shift Right (logical)
sal dst, count          ; Shift Arithmetic Left (same as SHL)
sar dst, count          ; Shift Arithmetic Right (sign extend)

; Rotate Operations
rol dst, count          ; Rotate Left
ror dst, count          ; Rotate Right
rcl dst, count          ; Rotate Left through Carry
rcr dst, count          ; Rotate Right through Carry

; Bit Operations
bt src, bit             ; Test bit (bit -> CF)
bts dst, bit            ; Test and Set bit
btr dst, bit            ; Test and Reset bit
btc dst, bit            ; Test and Complement bit
bsf dst, src            ; Bit Scan Forward (find first set bit)
bsr dst, src            ; Bit Scan Reverse (find last set bit)

String Instructions¤

Text Only
; Move String
movsb                   ; Move byte: [EDI] = [ESI]; ESI++; EDI++
movsw                   ; Move word
movsd                   ; Move dword

; Compare String  
cmpsb                   ; Compare bytes at ESI and EDI
cmpsw                   ; Compare words
cmpsd                   ; Compare dwords

; Scan String
scasb                   ; Compare AL with [EDI]
scasw                   ; Compare AX with [EDI]
scasd                   ; Compare EAX with [EDI]

; Load String
lodsb                   ; AL = [ESI]; ESI++
lodsw                   ; AX = [ESI]; ESI += 2
lodsd                   ; EAX = [ESI]; ESI += 4

; Store String
stosb                   ; [EDI] = AL; EDI++
stosw                   ; [EDI] = AX; EDI += 2
stosd                   ; [EDI] = EAX; EDI += 4

; Repeat Prefixes
rep                     ; Repeat while ECX != 0
repe/repz               ; Repeat while equal/zero
repne/repnz             ; Repeat while not equal/not zero

; Examples:
mov ecx, 100            ; Count
rep movsb               ; Copy 100 bytes from [ESI] to [EDI]

Control Transfer Instructions¤

Text Only
; Unconditional Jumps
jmp label               ; Jump to label
jmp eax                 ; Jump to address in EAX
jmp [eax]               ; Jump to address stored at [EAX]

; Conditional Jumps (based on flags)
je/jz label             ; Jump if Equal/Zero (ZF = 1)
jne/jnz label           ; Jump if Not Equal/Not Zero (ZF = 0)
jc/jb label             ; Jump if Carry/Below (CF = 1)
jnc/jnb/jae label       ; Jump if No Carry/Not Below/Above or Equal (CF = 0)
jo label                ; Jump if Overflow (OF = 1)
jno label               ; Jump if No Overflow (OF = 0)
js label                ; Jump if Sign (SF = 1)
jns label               ; Jump if No Sign (SF = 0)
jp/jpe label            ; Jump if Parity/Parity Even (PF = 1)
jnp/jpo label           ; Jump if No Parity/Parity Odd (PF = 0)

; Signed Comparisons
jl/jnge label           ; Jump if Less/Not Greater or Equal (SF != OF)
jle/jng label           ; Jump if Less or Equal/Not Greater ((ZF = 1) OR (SF != OF))
jg/jnle label           ; Jump if Greater/Not Less or Equal ((ZF = 0) AND (SF = OF))
jge/jnl label           ; Jump if Greater or Equal/Not Less (SF = OF)

; Unsigned Comparisons
jb/jnae label           ; Jump if Below/Not Above or Equal (CF = 1)
jbe/jna label           ; Jump if Below or Equal/Not Above ((CF = 1) OR (ZF = 1))
ja/jnbe label           ; Jump if Above/Not Below or Equal ((CF = 0) AND (ZF = 0))
jae/jnb label           ; Jump if Above or Equal/Not Below (CF = 0)

; Loop Instructions
loop label              ; ECX--; if ECX != 0 then jump to label
loope/loopz label       ; ECX--; if ECX != 0 AND ZF = 1 then jump
loopne/loopnz label     ; ECX--; if ECX != 0 AND ZF = 0 then jump

Procedure Instructions¤

Text Only
; Call/Return
call label              ; Push EIP; Jump to label
call eax                ; Push EIP; Jump to address in EAX
call [eax]              ; Push EIP; Jump to address at [EAX]
ret                     ; Pop EIP
ret n                   ; Pop EIP; ESP = ESP + n

; Stack Frame Operations
enter n, 0              ; Push EBP; EBP = ESP; ESP = ESP - n
leave                   ; ESP = EBP; Pop EBP (equivalent to mov esp,ebp; pop ebp)

Flag Manipulation¤

Text Only
; Flag Operations
stc                     ; Set Carry Flag (CF = 1)
clc                     ; Clear Carry Flag (CF = 0)
cmc                     ; Complement Carry Flag (CF = !CF)
std                     ; Set Direction Flag (DF = 1)
cld                     ; Clear Direction Flag (DF = 0)
sti                     ; Set Interrupt Flag (IF = 1)
cli                     ; Clear Interrupt Flag (IF = 0)

; Flag Transfer
lahf                    ; AH = flags (SF ZF 0 AF 0 PF 1 CF)
sahf                    ; flags = AH
pushf                   ; Push FLAGS register
popf                    ; Pop FLAGS register
pushfd                  ; Push EFLAGS register
popfd                   ; Pop EFLAGS register

Comparison and Test¤

Text Only
; Compare (performs subtraction but doesn't store result)
cmp dst, src            ; Compare dst with src (dst - src)
                        ; Sets flags: CF, ZF, SF, OF, PF, AF

; Test (performs AND but doesn't store result)  
test dst, src           ; Test dst with src (dst AND src)
                        ; Sets flags: ZF, SF, PF; clears CF, OF

; Examples:
cmp eax, 10             ; Compare EAX with 10
je equal                ; Jump if EAX = 10
jg greater              ; Jump if EAX > 10

test eax, eax           ; Test if EAX is zero
jz is_zero              ; Jump if EAX = 0

test eax, 1             ; Test if EAX is odd
jnz is_odd              ; Jump if last bit is set

3. DIRECTIVES VÀ ASSEMBLER SYNTAX¤

3.1 NASM Directives¤

Text Only
; Data Definition
DB    'Hello'           ; Define Byte(s)
DW    1234h             ; Define Word (16-bit)
DD    12345678h         ; Define Double Word (32-bit)
DQ    123456789ABCDEF0h ; Define Quad Word (64-bit)
DT    1.23e+4932        ; Define Ten Bytes (80-bit float)

; Reserve Storage
RESB  100               ; Reserve 100 bytes
RESW  50                ; Reserve 50 words (100 bytes)
RESD  25                ; Reserve 25 dwords (100 bytes)
RESQ  12                ; Reserve 12 qwords (96 bytes)

; Constants and Expressions
EQU                     ; Define constant
%define                 ; Define macro constant
%assign                 ; Define numeric constant

; Examples:
BUFFER_SIZE equ 1024
%define TRUE  1
%define FALSE 0
%assign COUNTER 0

; Current Location Counter
$                       ; Current position
$$                      ; Start of current section

; String length calculation
msg db 'Hello World', 0
msg_len equ $ - msg     ; Length of string including null terminator

; Alignment
ALIGN 4                 ; Align to 4-byte boundary
ALIGNB 16               ; Align to 16-byte boundary

; Conditional Assembly
%if BUFFER_SIZE > 1024
    %error "Buffer size too large"
%endif

%ifdef DEBUG
    debug_msg db 'Debug mode enabled', 0
%endif

3.2 Section Definitions¤

Text Only
; Standard sections
section .text           ; Code section (executable)
section .data           ; Initialized data section
section .bss            ; Uninitialized data section
section .rodata         ; Read-only data section

; Section attributes
section .text exec      ; Executable section
section .data write     ; Writable section
section .rodata nowrite ; Read-only section

; Custom sections
section .init           ; Initialization code
section .fini           ; Finalization code
section .comment        ; Comment section

; Section alignment
section .data align=4   ; Align section to 4 bytes
section .bss align=16   ; Align section to 16 bytes

3.3 Symbol Types và Visibility¤

Text Only
; Global symbols
global _start           ; Make _start visible to linker
global main             ; Make main visible to linker
global my_function      ; Export function

; External symbols  
extern printf           ; Import C function
extern malloc           ; Import C function
extern my_variable      ; Import variable

; Local labels
.local_label:           ; Local to current global label
..local_label:          ; Local to current section

; Symbol types
my_function:            ; Function label
    ; code here
    ret

my_data dd 12345       ; Data label
my_string db 'text', 0 ; String label

; Weak symbols
weak my_weak_function   ; Weak symbol (can be overridden)

; Common symbols
common my_common 4      ; 4-byte common variable

4. SYSTEM PROGRAMMING¤

4.1 Linux System Calls (32-bit)¤

Text Only
; System call interface
; EAX = system call number
; EBX, ECX, EDX, ESI, EDI, EBP = arguments
; INT 0x80 = invoke system call
; Return value in EAX

; Common system calls:
SYS_EXIT    equ 1       ; exit(int status)
SYS_FORK    equ 2       ; fork()  
SYS_READ    equ 3       ; read(int fd, void* buf, size_t count)
SYS_WRITE   equ 4       ; write(int fd, const void* buf, size_t count)
SYS_OPEN    equ 5       ; open(const char* pathname, int flags, mode_t mode)
SYS_CLOSE   equ 6       ; close(int fd)
SYS_WAITPID equ 7       ; waitpid(pid_t pid, int* status, int options)
SYS_CREAT   equ 8       ; creat(const char* pathname, mode_t mode)
SYS_LINK    equ 9       ; link(const char* oldpath, const char* newpath)
SYS_UNLINK  equ 10      ; unlink(const char* pathname)
SYS_EXECVE  equ 11      ; execve(const char* filename, char* const argv[], char* const envp[])
SYS_CHDIR   equ 12      ; chdir(const char* path)
SYS_TIME    equ 13      ; time(time_t* t)
SYS_MKNOD   equ 14      ; mknod(const char* pathname, mode_t mode, dev_t dev)
SYS_CHMOD   equ 15      ; chmod(const char* pathname, mode_t mode)

; File operations
SYS_LSEEK   equ 19      ; lseek(int fd, off_t offset, int whence)
SYS_GETPID  equ 20      ; getpid()
SYS_MOUNT   equ 21      ; mount(const char* source, const char* target, ...)
SYS_UMOUNT  equ 22      ; umount(const char* target)

; Memory management  
SYS_BRK     equ 45      ; brk(void* addr)
SYS_MMAP    equ 90      ; mmap(void* addr, size_t length, int prot, int flags, int fd, off_t offset)
SYS_MUNMAP  equ 91      ; munmap(void* addr, size_t length)

; Example system calls:
; Write system call
write_syscall:
    mov eax, SYS_WRITE  ; System call number for write
    mov ebx, 1          ; File descriptor (stdout)
    mov ecx, message    ; Buffer to write
    mov edx, msg_len    ; Number of bytes to write
    int 0x80            ; Invoke system call
    ret

; Read system call  
read_syscall:
    mov eax, SYS_READ   ; System call number for read
    mov ebx, 0          ; File descriptor (stdin)
    mov ecx, buffer     ; Buffer to read into
    mov edx, buffer_size; Maximum bytes to read
    int 0x80            ; Invoke system call
    ret                 ; Return value in EAX = bytes read

; Exit system call
exit_syscall:
    mov eax, SYS_EXIT   ; System call number for exit
    mov ebx, 0          ; Exit status
    int 0x80            ; Invoke system call (doesn't return)

4.2 Stack Frame Management¤

Text Only
; Standard function prologue/epilogue
my_function:
    ; Prologue
    push ebp            ; Save caller's base pointer
    mov ebp, esp        ; Set up new base pointer
    sub esp, 16         ; Allocate 16 bytes for local variables

    ; Save registers if needed
    push ebx            ; Save EBX if we'll modify it
    push esi            ; Save ESI if we'll modify it  
    push edi            ; Save EDI if we'll modify it

    ; Function body
    ; Local variables accessible as [ebp-4], [ebp-8], etc.
    ; Parameters accessible as [ebp+8], [ebp+12], etc.

    ; Example: accessing parameters and local variables
    mov eax, [ebp+8]    ; First parameter
    mov ebx, [ebp+12]   ; Second parameter
    mov [ebp-4], eax    ; Store in first local variable
    add eax, ebx        ; Perform operation
    mov [ebp-8], eax    ; Store result in second local variable

    ; Restore registers
    pop edi             ; Restore EDI
    pop esi             ; Restore ESI
    pop ebx             ; Restore EBX

    ; Epilogue
    mov esp, ebp        ; Restore stack pointer
    pop ebp             ; Restore caller's base pointer
    ret                 ; Return to caller

; Calling convention examples
; cdecl (C calling convention)
caller_cdecl:
    push 20             ; Second parameter (pushed last)
    push 10             ; First parameter (pushed first)
    call my_function    ; Call function
    add esp, 8          ; Caller cleans up stack (2 parameters * 4 bytes)
    ; Return value in EAX
    ret

; stdcall (Windows API calling convention)
my_stdcall_function:
    push ebp
    mov ebp, esp
    ; Function body
    mov esp, ebp
    pop ebp
    ret 8               ; Callee cleans up stack (8 bytes of parameters)

; fastcall (first two parameters in registers)
my_fastcall_function:   ; ECX = first param, EDX = second param
    push ebp
    mov ebp, esp
    mov eax, ecx        ; Use first parameter
    add eax, edx        ; Add second parameter
    ; Additional parameters on stack: [ebp+8], [ebp+12], etc.
    pop ebp
    ret                 ; Or ret n if cleaning up stack parameters

4.3 Memory Management¤

Text Only
; Dynamic memory allocation using system calls
allocate_memory:
    ; Get current program break
    mov eax, SYS_BRK
    mov ebx, 0          ; 0 = get current break
    int 0x80
    mov [heap_start], eax ; Save current heap start

    ; Extend heap by requested size
    mov ebx, eax        ; Current break
    add ebx, [requested_size] ; Add requested size
    mov eax, SYS_BRK
    int 0x80            ; Set new break

    ; Check if allocation succeeded
    cmp eax, ebx
    jl allocation_failed

    mov eax, [heap_start] ; Return pointer to allocated memory
    ret

allocation_failed:
    mov eax, 0          ; Return NULL
    ret

; Memory copy function
memcpy:
    ; Parameters: dest=[ebp+8], src=[ebp+12], count=[ebp+16]
    push ebp
    mov ebp, esp
    push esi
    push edi
    push ecx

    mov edi, [ebp+8]    ; Destination
    mov esi, [ebp+12]   ; Source
    mov ecx, [ebp+16]   ; Count

    cld                 ; Clear direction flag (forward copy)
    rep movsb           ; Copy ECX bytes from ESI to EDI

    pop ecx
    pop edi
    pop esi
    pop ebp
    ret

; Memory set function
memset:
    ; Parameters: dest=[ebp+8], value=[ebp+12], count=[ebp+16]
    push ebp
    mov ebp, esp
    push edi
    push ecx

    mov edi, [ebp+8]    ; Destination
    mov eax, [ebp+12]   ; Value to set
    mov ecx, [ebp+16]   ; Count

    cld                 ; Clear direction flag
    rep stosb           ; Store AL to [EDI], ECX times

    pop ecx
    pop edi
    pop ebp
    ret

; Memory compare function
memcmp:
    ; Parameters: ptr1=[ebp+8], ptr2=[ebp+12], count=[ebp+16]
    push ebp
    mov ebp, esp
    push esi
    push edi
    push ecx

    mov esi, [ebp+8]    ; First memory block
    mov edi, [ebp+12]   ; Second memory block
    mov ecx, [ebp+16]   ; Count

    cld                 ; Clear direction flag
    repe cmpsb          ; Compare bytes while equal

    ; Set return value based on comparison
    mov eax, 0          ; Assume equal
    je .equal
    mov eax, -1         ; First < second
    jb .done
    mov eax, 1          ; First > second
    jmp .done

.equal:
    mov eax, 0
.done:
    pop ecx
    pop edi
    pop esi
    pop ebp
    ret

5. FLOATING POINT OPERATIONS¤

5.1 FPU (x87) Architecture¤

Text Only
; x87 FPU Register Stack
; ST(0) - Top of stack (most recently loaded)
; ST(1) - Second from top
; ST(2) - Third from top
; ...
; ST(7) - Bottom of stack

; FPU Status Word
; Bit 15: B (FPU Busy)
; Bit 14: C3 (Condition Code 3)
; Bit 13-11: TOP (Top of Stack Pointer)
; Bit 10: C2 (Condition Code 2)
; Bit 9: C1 (Condition Code 1)
; Bit 8: C0 (Condition Code 0)
; Bit 7: ES (Error Summary)
; Bit 6: SF (Stack Fault)
; Bit 5: PE (Precision Exception)
; Bit 4: UE (Underflow Exception)
; Bit 3: OE (Overflow Exception)
; Bit 2: ZE (Zero Divide Exception)
; Bit 1: DE (Denormalized Operand Exception)
; Bit 0: IE (Invalid Operation Exception)

; Loading values onto FPU stack
fld dword [float_var]   ; Load 32-bit float onto ST(0)
fld qword [double_var]  ; Load 64-bit double onto ST(0)
fld tword [extended_var]; Load 80-bit extended precision
fld1                    ; Load 1.0 onto ST(0)
fldz                    ; Load 0.0 onto ST(0)
fldpi                   ; Load π onto ST(0)
fldl2e                  ; Load log₂(e) onto ST(0)
fldl2t                  ; Load log₂(10) onto ST(0)
fldlg2                  ; Load log₁₀(2) onto ST(0)
fldln2                  ; Load ln(2) onto ST(0)

; Storing values from FPU stack
fst dword [result]      ; Store ST(0) to memory (ST(0) remains)
fstp dword [result]     ; Store ST(0) to memory and pop stack
fist dword [int_result] ; Store ST(0) as integer
fistp dword [int_result]; Store ST(0) as integer and pop

; Arithmetic operations
fadd                    ; ST(0) = ST(0) + ST(1), pop stack
fadd st0, st1           ; ST(0) = ST(0) + ST(1)
fadd dword [memory]     ; ST(0) = ST(0) + memory
faddp st1, st0          ; ST(1) = ST(1) + ST(0), pop ST(0)

fsub                    ; ST(0) = ST(1) - ST(0), pop stack
fsub st0, st1           ; ST(0) = ST(0) - ST(1)
fsubr                   ; ST(0) = ST(0) - ST(1), pop stack
fsubr st0, st1          ; ST(0) = ST(1) - ST(0)

fmul                    ; ST(0) = ST(0) * ST(1), pop stack
fmul st0, st1           ; ST(0) = ST(0) * ST(1)
fmul dword [memory]     ; ST(0) = ST(0) * memory

fdiv                    ; ST(0) = ST(1) / ST(0), pop stack
fdiv st0, st1           ; ST(0) = ST(0) / ST(1)
fdivr                   ; ST(0) = ST(0) / ST(1), pop stack

; Comparison operations
fcom                    ; Compare ST(0) with ST(1)
fcom dword [memory]     ; Compare ST(0) with memory
fcomp                   ; Compare ST(0) with ST(1) and pop
fcompp                  ; Compare ST(0) with ST(1) and pop both
fucom                   ; Unordered compare (handles NaN)
fucomp                  ; Unordered compare and pop
fucompp                 ; Unordered compare and pop both

; Get comparison result
fstsw ax               ; Store FPU status word to AX
sahf                   ; Store AH to flags
; Now you can use conditional jumps based on comparison

; Mathematical functions
fsin                   ; ST(0) = sin(ST(0))
fcos                   ; ST(0) = cos(ST(0))
fsincos                ; ST(1) = sin(ST(0)), ST(0) = cos(ST(0))
fptan                  ; ST(1) = ST(0), ST(0) = tan(ST(0))
fpatan                 ; ST(0) = arctan(ST(1)/ST(0)), pop ST(1)
f2xm1                  ; ST(0) = 2^ST(0) - 1
fyl2x                  ; ST(0) = ST(1) * log₂(ST(0)), pop ST(1)
fyl2xp1                ; ST(0) = ST(1) * log₂(ST(0) + 1), pop ST(1)
fsqrt                  ; ST(0) = √ST(0)
fabs                   ; ST(0) = |ST(0)|
fchs                   ; ST(0) = -ST(0)

; Stack management
fxch                   ; Exchange ST(0) and ST(1)
fxch st3               ; Exchange ST(0) and ST(3)
ffree st2              ; Mark ST(2) as empty
fincstp                ; Increment stack top pointer
fdecstp                ; Decrement stack top pointer

; FPU control
finit                  ; Initialize FPU
fninit                 ; Initialize FPU (no wait)
fclex                  ; Clear exceptions
fnclex                 ; Clear exceptions (no wait)
fstcw [control_word]   ; Store control word
fldcw [control_word]   ; Load control word
fstenv [fpu_env]       ; Store FPU environment
fldenv [fpu_env]       ; Load FPU environment
fsave [fpu_state]      ; Save FPU state
frstor [fpu_state]     ; Restore FPU state

5.2 SSE (Streaming SIMD Extensions)¤

Text Only
; SSE Registers: XMM0-XMM7 (128-bit each)
; Each XMM register can hold:
; - 4 single-precision floats (32-bit each)
; - 2 double-precision floats (64-bit each)
; - 16 bytes, 8 words, 4 dwords, 2 qwords

; Load/Store operations
movss xmm0, [float_var]     ; Load single scalar float
movsd xmm0, [double_var]    ; Load single scalar double
movaps xmm0, [aligned_mem]  ; Load 4 aligned packed singles
movups xmm0, [unaligned_mem]; Load 4 unaligned packed singles
movapd xmm0, [aligned_mem]  ; Load 2 aligned packed doubles
movupd xmm0, [unaligned_mem]; Load 2 unaligned packed doubles

; Arithmetic operations (scalar)
addss xmm0, xmm1           ; Add scalar singles
subss xmm0, xmm1           ; Subtract scalar singles
mulss xmm0, xmm1           ; Multiply scalar singles
divss xmm0, xmm1           ; Divide scalar singles
sqrtss xmm0, xmm1          ; Square root scalar single

addsd xmm0, xmm1           ; Add scalar doubles
subsd xmm0, xmm1           ; Subtract scalar doubles
mulsd xmm0, xmm1           ; Multiply scalar doubles
divsd xmm0, xmm1           ; Divide scalar doubles
sqrtsd xmm0, xmm1          ; Square root scalar double

; Arithmetic operations (packed)
addps xmm0, xmm1           ; Add 4 packed singles
subps xmm0, xmm1           ; Subtract 4 packed singles
mulps xmm0, xmm1           ; Multiply 4 packed singles
divps xmm0, xmm1           ; Divide 4 packed singles
sqrtps xmm0, xmm1          ; Square root 4 packed singles

addpd xmm0, xmm1           ; Add 2 packed doubles
subpd xmm0, xmm1           ; Subtract 2 packed doubles
mulpd xmm0, xmm1           ; Multiply 2 packed doubles
divpd xmm0, xmm1           ; Divide 2 packed doubles
sqrtpd xmm0, xmm1          ; Square root 2 packed doubles

; Comparison operations
cmpss xmm0, xmm1, 0        ; Compare scalar singles (EQ)
cmpss xmm0, xmm1, 1        ; Compare scalar singles (LT)
cmpss xmm0, xmm1, 2        ; Compare scalar singles (LE)
cmpss xmm0, xmm1, 4        ; Compare scalar singles (NE)

; Comparison predicates:
; 0 = EQ (equal)
; 1 = LT (less than)
; 2 = LE (less than or equal)
; 3 = UNORD (unordered)
; 4 = NEQ (not equal)
; 5 = NLT (not less than)
; 6 = NLE (not less than or equal)
; 7 = ORD (ordered)

; Conversion operations
cvtss2sd xmm0, xmm1        ; Convert scalar single to double
cvtsd2ss xmm0, xmm1        ; Convert scalar double to single
cvtsi2ss xmm0, eax         ; Convert integer to scalar single
cvtss2si eax, xmm0         ; Convert scalar single to integer
cvttss2si eax, xmm0        ; Convert scalar single to integer (truncate)

; Shuffle and unpack operations
shufps xmm0, xmm1, 0xE4    ; Shuffle packed singles
unpcklps xmm0, xmm1        ; Unpack low packed singles
unpckhps xmm0, xmm1        ; Unpack high packed singles

; Logical operations
andps xmm0, xmm1           ; Bitwise AND
orps xmm0, xmm1            ; Bitwise OR
xorps xmm0, xmm1           ; Bitwise XOR
andnps xmm0, xmm1          ; Bitwise AND NOT

; Min/Max operations
minss xmm0, xmm1           ; Minimum scalar single
maxss xmm0, xmm1           ; Maximum scalar single
minps xmm0, xmm1           ; Minimum packed singles
maxps xmm0, xmm1           ; Maximum packed singles

6. ADVANCED PROGRAMMING TECHNIQUES¤

6.1 Bit Manipulation Techniques¤

Text Only
; Check if number is power of 2
is_power_of_2:
    ; Input: EAX = number
    ; Output: ZF = 1 if power of 2, 0 otherwise
    test eax, eax           ; Check if zero
    jz .not_power_of_2      ; Zero is not power of 2

    mov ebx, eax            ; Copy number
    dec ebx                 ; n - 1
    and eax, ebx            ; n & (n-1)
    ; If result is 0, then n is power of 2
    ret

.not_power_of_2:
    mov eax, 1              ; Set non-zero to clear ZF
    ret

; Count set bits (population count)
popcount:
    ; Input: EAX = number
    ; Output: EAX = number of set bits
    push ecx
    xor ecx, ecx            ; Clear counter

.count_loop:
    test eax, eax           ; Check if zero
    jz .done

    mov ebx, eax            ; Copy number
    dec ebx                 ; n - 1
    and eax, ebx            ; n & (n-1) clears lowest set bit
    inc ecx                 ; Increment counter
    jmp .count_loop

.done:
    mov eax, ecx            ; Return count
    pop ecx
    ret

; Find first set bit (similar to BSF)
find_first_set:
    ; Input: EAX = number
    ; Output: EAX = position of first set bit (0-based), -1 if not found
    test eax, eax
    jz .not_found

    push ecx
    xor ecx, ecx            ; Bit position counter

.find_loop:
    test eax, 1             ; Check lowest bit
    jnz .found
    shr eax, 1              ; Shift right
    inc ecx                 ; Increment position
    jmp .find_loop

.found:
    mov eax, ecx            ; Return position
    pop ecx
    ret

.not_found:
    mov eax, -1             ; Return -1 if no bits set
    ret

; Reverse bits in a 32-bit number
reverse_bits:
    ; Input: EAX = number
    ; Output: EAX = number with bits reversed
    push ebx
    push ecx

    xor ebx, ebx            ; Result
    mov ecx, 32             ; Bit counter

.reverse_loop:
    shl ebx, 1              ; Shift result left
    shr eax, 1              ; Shift input right
    adc ebx, 0              ; Add carry to result
    loop .reverse_loop

    mov eax, ebx            ; Return result
    pop ecx
    pop ebx
    ret

; Extract bits from position with mask
extract_bits:
    ; Parameters: value=[ebp+8], position=[ebp+12], width=[ebp+16]
    ; Output: EAX = extracted bits
    push ebp
    mov ebp, esp
    push ebx
    push ecx

    mov eax, [ebp+8]        ; Get value
    mov ecx, [ebp+12]       ; Get position
    mov ebx, [ebp+16]       ; Get width

    shr eax, cl             ; Shift right by position

    ; Create mask for width bits
    mov ecx, ebx            ; Width
    mov ebx, 1
    shl ebx, cl             ; 1 << width
    dec ebx                 ; (1 << width) - 1 = mask

    and eax, ebx            ; Apply mask

    pop ecx
    pop ebx
    pop ebp
    ret

; Set specific bits
set_bits:
    ; Parameters: value=[ebp+8], position=[ebp+12], width=[ebp+16], new_bits=[ebp+20]
    push ebp
    mov ebp, esp
    push ebx
    push ecx
    push edx

    mov eax, [ebp+8]        ; Original value
    mov ecx, [ebp+12]       ; Position
    mov ebx, [ebp+16]       ; Width
    mov edx, [ebp+20]       ; New bits

    ; Create mask
    push eax                ; Save original value
    mov eax, 1
    shl eax, bl             ; 1 << width
    dec eax                 ; (1 << width) - 1 = field mask
    shl eax, cl             ; Shift mask to position
    not eax                 ; Invert mask for clearing

    pop ebx                 ; Restore original value
    and ebx, eax            ; Clear target bits

    ; Prepare new bits
    and edx, 1              ; Mask new bits to width
    shl edx, 1              ; Shift to position
    dec edx
    shl edx, cl

    or eax, ebx             ; Combine with cleared value
    or eax, edx             ; Set new bits

    pop edx
    pop ecx  
    pop ebx
    pop ebp
    ret

6.2 String Processing¤

Text Only
; String length calculation
strlen:
    ; Input: ESI = string pointer
    ; Output: EAX = string length
    push edi
    push ecx

    mov edi, esi            ; Copy string pointer
    xor eax, eax            ; Search for null terminator (0)
    mov ecx, -1             ; Maximum count (4GB)
    repne scasb             ; Scan for AL (0) in string

    not ecx                 ; Convert to positive count
    dec ecx                 ; Subtract 1 (for the null terminator)
    mov eax, ecx            ; Return length

    pop ecx
    pop edi
    ret

; String copy
strcpy:
    ; Parameters: dest=[ebp+8], src=[ebp+12]
    ; Output: EAX = dest
    push ebp
    mov ebp, esp
    push esi
    push edi

    mov edi, [ebp+8]        ; Destination
    mov esi, [ebp+12]       ; Source
    mov eax, edi            ; Return value (dest)

.copy_loop:
    lodsb                   ; Load byte from [ESI] to AL
    stosb                   ; Store AL to [EDI]
    test al, al             ; Check for null terminator
    jnz .copy_loop          ; Continue if not zero

    pop edi
    pop esi
    pop ebp
    ret

; String concatenation
strcat:
    ; Parameters: dest=[ebp+8], src=[ebp+12]
    ; Output: EAX = dest
    push ebp
    mov ebp, esp
    push esi
    push edi

    mov edi, [ebp+8]        ; Destination
    mov esi, [ebp+12]       ; Source
    mov eax, edi            ; Return value

    ; Find end of destination string
    push edi
    xor eax, eax            ; Search for null
    mov ecx, -1             ; Maximum length
    repne scasb             ; Find null terminator
    dec edi                 ; Back up to null position
    pop eax                 ; Restore return value

    ; Copy source to end of destination
.concat_loop:
    lodsb                   ; Load from source
    stosb                   ; Store to destination
    test al, al             ; Check for null
    jnz .concat_loop        ; Continue if not null

    pop edi
    pop esi
    pop ebp
    ret

; String comparison
strcmp:
    ; Parameters: str1=[ebp+8], str2=[ebp+12]
    ; Output: EAX = 0 (equal), <0 (str1<str2), >0 (str1>str2)
    push ebp
    mov ebp, esp
    push esi
    push edi

    mov esi, [ebp+8]        ; First string
    mov edi, [ebp+12]       ; Second string

.compare_loop:
    lodsb                   ; Load byte from first string
    mov bl, [edi]           ; Load byte from second string
    inc edi                 ; Advance second string pointer

    cmp al, bl              ; Compare bytes
    jne .not_equal          ; Different bytes found

    test al, al             ; Check for null terminator
    jnz .compare_loop       ; Continue if not end

    ; Strings are equal
    xor eax, eax            ; Return 0
    jmp .done

.not_equal:
    movzx eax, al           ; Zero-extend first byte
    movzx ebx, bl           ; Zero-extend second byte
    sub eax, ebx            ; Return difference

.done:
    pop edi
    pop esi
    pop ebp
    ret

; Case-insensitive string comparison
stricmp:
    ; Parameters: str1=[ebp+8], str2=[ebp+12]
    push ebp
    mov ebp, esp
    push esi
    push edi

    mov esi, [ebp+8]        ; First string
    mov edi, [ebp+12]       ; Second string

.compare_loop:
    lodsb                   ; Load byte from first string
    mov bl, [edi]           ; Load byte from second string
    inc edi                 ; Advance pointer

    ; Convert both to lowercase
    call .to_lower_al       ; Convert AL to lowercase
    push eax
    mov al, bl
    call .to_lower_al       ; Convert BL to lowercase
    mov bl, al
    pop eax

    cmp al, bl              ; Compare lowercase bytes
    jne .not_equal

    test al, al             ; Check for null
    jnz .compare_loop

    xor eax, eax            ; Equal
    jmp .done

.not_equal:
    movzx eax, al
    movzx ebx, bl
    sub eax, ebx

.done:
    pop edi
    pop esi
    pop ebp
    ret

.to_lower_al:
    ; Convert AL to lowercase if uppercase
    cmp al, 'A'
    jb .not_upper
    cmp al, 'Z'
    ja .not_upper
    add al, 32              ; Convert to lowercase
.not_upper:
    ret

; String search (find substring)
strstr:
    ; Parameters: haystack=[ebp+8], needle=[ebp+12]
    ; Output: EAX = pointer to first occurrence, or 0 if not found
    push ebp
    mov ebp, esp
    push esi
    push edi
    push ebx

    mov esi, [ebp+8]        ; Haystack
    mov edi, [ebp+12]       ; Needle

    ; Check if needle is empty
    cmp byte [edi], 0
    je .found_at_start      ; Empty needle found at start

.search_loop:
    mov al, [esi]           ; Current haystack character
    test al, al             ; Check for end of haystack
    jz .not_found

    cmp al, [edi]           ; Compare with first needle character
    je .potential_match

    inc esi                 ; Move to next character in haystack
    jmp .search_loop

.potential_match:
    ; Found potential match, compare full substring
    push esi                ; Save haystack position
    push edi                ; Save needle position

.match_loop:
    lodsb                   ; Load haystack character
    mov bl, [edi]           ; Load needle character
    inc edi                 ; Advance needle

    cmp al, bl              ; Compare characters
    jne .no_match           ; Characters don't match

    test bl, bl             ; Check if end of needle
    jz .found_match         ; Complete match found

    test al, al             ; Check if end of haystack
    jz .no_match            ; Haystack ended before needle

    jmp .match_loop

.found_match:
    ; Complete match found
    add esp, 8              ; Clean up saved pointers
    mov eax, [ebp+8]        ; Return start of match
    jmp .done

.no_match:
    ; No match, restore positions and continue
    pop edi                 ; Restore needle pointer
    pop esi                 ; Restore haystack pointer
    inc esi                 ; Move to next position in haystack
    jmp .search_loop

.found_at_start:
    mov eax, esi            ; Return haystack pointer
    jmp .done

.not_found:
    xor eax, eax            ; Return NULL

.done:
    pop ebx
    pop edi
    pop esi
    pop ebp
    ret

6.3 Mathematical Functions¤

Text Only
; Integer square root (Newton's method)
isqrt:
    ; Input: EAX = number
    ; Output: EAX = integer square root
    push ebx
    push ecx
    push edx

    test eax, eax           ; Check for zero
    jz .done                ; sqrt(0) = 0

    ; Initial guess: x = n / 2
    mov ebx, eax            ; Save original number
    shr eax, 1              ; x = n / 2
    inc eax                 ; Ensure x > 0

.newton_loop:
    mov ecx, eax            ; Save current guess

    ; Calculate new guess: x = (x + n/x) / 2
    xor edx, edx            ; Clear for division
    mov eax, ebx            ; n
    div ecx                 ; n / x
    add eax, ecx            ; x + n/x
    shr eax, 1              ; (x + n/x) / 2

    ; Check for convergence
    cmp eax, ecx            ; Compare with previous guess
    jne .newton_loop        ; Continue if different

.done:
    pop edx
    pop ecx
    pop ebx
    ret

; Greatest Common Divisor (Euclidean algorithm)
gcd:
    ; Parameters: a=[ebp+8], b=[ebp+12]
    ; Output: EAX = GCD(a,b)
    push ebp
    mov ebp, esp
    push ebx
    push edx

    mov eax, [ebp+8]        ; a
    mov ebx, [ebp+12]       ; b

.gcd_loop:
    test ebx, ebx           ; Check if b == 0
    jz .done                ; If b == 0, GCD is a

    xor edx, edx            ; Clear remainder
    div ebx                 ; a / b, remainder in EDX
    mov eax, ebx            ; a = b
    mov ebx, edx            ; b = remainder
    jmp .gcd_loop

.done:
    pop edx
    pop ebx
    pop ebp
    ret

; Least Common Multiple
lcm:
    ; Parameters: a=[ebp+8], b=[ebp+12]  
    ; Output: EAX = LCM(a,b) = (a*b)/GCD(a,b)
    push ebp
    mov ebp, esp
    push ebx
    push ecx
    push edx

    mov eax, [ebp+8]        ; a
    mov ebx, [ebp+12]       ; b

    ; Calculate a * b
    mul ebx                 ; EDX:EAX = a * b
    push eax                ; Save low part of product
    push edx                ; Save high part of product

    ; Calculate GCD(a,b)
    push dword [ebp+12]     ; Push b
    push dword [ebp+8]      ; Push a
    call gcd                ; Call GCD function
    add esp, 8              ; Clean up parameters

    mov ebx, eax            ; EBX = GCD(a,b)
    pop edx                 ; Restore high part of product
    pop eax                 ; Restore low part of product

    ; Divide product by GCD
    div ebx                 ; EAX = (a*b) / GCD(a,b)

    pop edx
    pop ecx
    pop ebx
    pop ebp
    ret

; Fast modular exponentiation (a^b mod m)
modpow:
    ; Parameters: base=[ebp+8], exp=[ebp+12], mod=[ebp+16]
    ; Output: EAX = (base^exp) mod mod
    push ebp
    mov ebp, esp
    push ebx
    push ecx
    push edx
    push esi

    mov eax, [ebp+8]        ; base
    mov ecx, [ebp+12]       ; exponent
    mov esi, [ebp+16]       ; modulus

    ; Handle special cases
    test esi, esi           ; Check for mod = 0
    jz .error
    cmp esi, 1              ; If mod = 1, result is 0
    je .zero_result

    mov ebx, 1              ; result = 1

    ; Reduce base modulo m
    xor edx, edx
    div esi                 ; base = base mod m
    mov eax, edx            ; EAX = base mod m

.power_loop:
    test ecx, ecx           ; Check if exponent is 0
    jz .done

    test ecx, 1             ; Check if exponent is odd
    jz .even_exp

    ; Odd exponent: result = (result * base) mod m
    push eax                ; Save base
    mov eax, ebx            ; result
    mul dword [esp]         ; result * base
    div esi                 ; (result * base) mod m
    mov ebx, edx            ; Update result
    pop eax                 ; Restore base

.even_exp:
    ; Square the base: base = (base * base) mod m
    push ebx                ; Save result
    mul eax                 ; base * base
    div esi                 ; (base * base) mod m
    mov eax, edx            ; Update base
    pop ebx                 ; Restore result

    shr ecx, 1              ; exponent = exponent / 2
    jmp .power_loop

.done:
    mov eax, ebx            ; Return result
    jmp .exit

.zero_result:
    xor eax, eax            ; Return 0
    jmp .exit

.error:
    mov eax, -1             ; Return error

.exit:
    pop esi
    pop edx
    pop ecx
    pop ebx
    pop ebp
    ret

; Factorial calculation (iterative)
factorial:
    ; Input: EAX = n
    ; Output: EAX = n!
    push ebx
    push ecx

    mov ecx, eax            ; Counter
    mov eax, 1              ; Result

    test ecx, ecx           ; Check for n = 0
    jz .done                ; 0! = 1

.fact_loop:
    mul ecx                 ; result *= counter
    dec ecx                 ; counter--
    jnz .fact_loop          ; Continue while counter > 0

.done:
    pop ecx
    pop ebx
    ret

; Fibonacci calculation (iterative)
fibonacci:
    ; Input: EAX = n
    ; Output: EAX = nth Fibonacci number
    push ebx
    push ecx

    test eax, eax           ; Check for n = 0
    jz .fib_zero
    cmp eax, 1              ; Check for n = 1
    je .fib_one

    mov ecx, eax            ; Counter
    mov eax, 0              ; F(0) = 0
    mov ebx, 1              ; F(1) = 1

.fib_loop:
    add eax, ebx            ; F(n) = F(n-1) + F(n-2)
    xchg eax, ebx           ; Swap values
    dec ecx                 ; Decrement counter
    cmp ecx, 1              ; Continue until n = 1
    jg .fib_loop

    mov eax, ebx            ; Return result
    jmp .done

.fib_zero:
    xor eax, eax            ; F(0) = 0
    jmp .done

.fib_one:
    mov eax, 1              ; F(1) = 1

.done:
    pop ecx
    pop ebx
    ret

7. DEBUGGING AND OPTIMIZATION¤

7.1 GDB Debugging Commands¤

Bash
# Compilation for debugging
nasm -f elf32 -g -F dwarf program.asm -o program.o
ld -m elf_i386 program.o -o program

# Start GDB
gdb ./program

# Basic GDB commands:
(gdb) break _start          # Set breakpoint at _start
(gdb) break *0x08048080     # Set breakpoint at address
(gdb) run                   # Start program execution
(gdb) continue              # Continue execution
(gdb) step                  # Single step (into functions)
(gdb) stepi                 # Single instruction step
(gdb) next                  # Next line (over functions)
(gdb) nexti                 # Next instruction (over calls)

# Register examination
(gdb) info registers        # Show all registers
(gdb) info registers eax    # Show specific register
(gdb) print $eax           # Print register value
(gdb) print/x $eax         # Print in hexadecimal
(gdb) print/t $eax         # Print in binary
(gdb) print/d $eax         # Print in decimal

# Memory examination
(gdb) x/10i $eip           # Examine 10 instructions at EIP
(gdb) x/10x $esp           # Examine 10 words at ESP in hex
(gdb) x/10b $esp           # Examine 10 bytes at ESP
(gdb) x/s 0x8048000        # Examine string at address
(gdb) x/10i _start         # Examine instructions at _start

# Memory modification
(gdb) set $eax = 0x12345   # Set register value
(gdb) set {int}0x8048000 = 0x90909090  # Set memory value

# Watchpoints
(gdb) watch variable_name   # Break when variable changes
(gdb) watch *0x8048000     # Break when memory location changes
(gdb) rwatch *0x8048000    # Break when memory location is read

# Stack examination
(gdb) bt                   # Backtrace (call stack)
(gdb) info frame           # Current frame info
(gdb) info args            # Function arguments
(gdb) info locals          # Local variables

# Assembly-specific commands
(gdb) disassemble          # Disassemble current function
(gdb) disassemble _start   # Disassemble specific function
(gdb) set disassembly-flavor intel  # Use Intel syntax

7.2 Performance Optimization Techniques¤

Text Only
; 1. Loop Optimization
; Bad: Inefficient loop
bad_loop:
    mov ecx, 1000           ; Counter
    mov esi, array_start    ; Array pointer

.loop:
    mov eax, [esi]          ; Load array element
    add eax, 1              ; Increment
    mov [esi], eax          ; Store back
    add esi, 4              ; Next element
    dec ecx                 ; Decrement counter
    cmp ecx, 0              ; Compare with zero
    jne .loop               ; Jump if not zero
    ret

; Good: Optimized loop
good_loop:
    mov ecx, 1000           ; Counter
    mov esi, array_start    ; Array pointer

.loop:
    inc dword [esi]         ; Increment in memory (fewer instructions)
    add esi, 4              ; Next element
    loop .loop              ; Decrement ECX and jump if not zero
    ret

; 2. Strength Reduction
; Bad: Using multiplication
bad_multiply:
    mov eax, [index]        ; Load index
    mov ebx, 4              ; Size of element
    mul ebx                 ; Multiply by element size
    add eax, array_base     ; Add base address
    mov ebx, [eax]          ; Load element
    ret

; Good: Using shift (for powers of 2)
good_multiply:
    mov eax, [index]        ; Load index
    shl eax, 2              ; Multiply by 4 (shift left by 2)
    add eax, array_base     ; Add base address
    mov ebx, [eax]          ; Load element
    ret

; Even better: Using LEA
best_multiply:
    mov eax, [index]        ; Load index
    lea ebx, [array_base + eax*4]  ; Calculate address in one instruction
    mov eax, [ebx]          ; Load element
    ret

; 3. Branch Prediction Optimization
; Bad: Unpredictable branches
bad_branching:
    mov ecx, 1000
    mov esi, data_array

.loop:
    mov eax, [esi]
    test eax, 1             ; Check if odd
    jz .even                ; Jump if even (unpredictable)

    ; Process odd numbers
    add eax, 1
    jmp .continue

.even:
    ; Process even numbers
    shr eax, 1

.continue:
    mov [esi], eax
    add esi, 4
    loop .loop
    ret

; Good: Minimize branches using conditional moves
good_branching:
    mov ecx, 1000
    mov esi, data_array

.loop:
    mov eax, [esi]
    mov ebx, eax            ; Copy for even processing
    mov edx, eax            ; Copy for odd processing

    shr ebx, 1              ; Process as even
    inc edx                 ; Process as odd

    test eax, 1             ; Check if odd
    cmovz eax, ebx          ; Use even result if zero (even)
    cmovnz eax, edx         ; Use odd result if non-zero (odd)

    mov [esi], eax
    add esi, 4
    loop .loop
    ret

; 4. Cache-Friendly Memory Access
; Bad: Poor cache locality
bad_cache_access:
    mov ecx, 1000           ; Rows
    mov edx, 1000           ; Columns

.outer_loop:
    push ecx
    mov ecx, edx            ; Columns counter
    mov esi, 0              ; Column index

.inner_loop:
    ; Access matrix[row][col] - column-major access (bad for cache)
    mov eax, esi            ; Column
    imul eax, 1000          ; * number of rows
    add eax, [current_row]  ; + row index
    shl eax, 2              ; * sizeof(int)
    add eax, matrix_base    ; + base address

    inc dword [eax]         ; Process element
    inc esi                 ; Next column
    loop .inner_loop

    inc dword [current_row] ; Next row
    pop ecx
    loop .outer_loop
    ret

; Good: Cache-friendly access pattern
good_cache_access:
    mov ecx, 1000           ; Rows
    mov esi, matrix_base    ; Start of matrix

.outer_loop:
    push ecx
    mov ecx, 1000           ; Columns counter

.inner_loop:
    inc dword [esi]         ; Process element (sequential access)
    add esi, 4              ; Next element
    loop .inner_loop

    pop ecx
    loop .outer_loop
    ret

; 5. SIMD Optimization
; Scalar version (processes one element at a time)
scalar_add:
    mov ecx, 1000           ; Number of elements
    mov esi, array_a        ; First array
    mov edi, array_b        ; Second array
    mov ebx, result_array   ; Result array

.loop:
    mov eax, [esi]          ; Load from array_a
    add eax, [edi]          ; Add from array_b
    mov [ebx], eax          ; Store to result
    add esi, 4              ; Next element in array_a
    add edi, 4              ; Next element in array_b
    add ebx, 4              ; Next element in result
    loop .loop
    ret

; SIMD version (processes 4 elements at a time)
simd_add:
    mov ecx, 250            ; Number of SIMD iterations (1000/4)
    mov esi, array_a        ; First array
    mov edi, array_b        ; Second array
    mov ebx, result_array   ; Result array

.loop:
    movups xmm0, [esi]      ; Load 4 floats from array_a
    movups xmm1, [edi]      ; Load 4 floats from array_b
    addps xmm0, xmm1        ; Add 4 pairs simultaneously
    movups [ebx], xmm0      ; Store 4 results
    add esi, 16             ; Next 4 elements (4 * 4 bytes)
    add edi, 16             ; Next 4 elements
    add ebx, 16             ; Next 4 elements
    loop .loop
    ret

; 6. Function Call Optimization
; Bad: Excessive function calls
bad_function_calls:
    mov ecx, 1000

.loop:
    push ecx                ; Save counter
    push eax                ; Push parameter
    call small_function     ; Call function
    add esp, 4              ; Clean up parameter
    pop ecx                 ; Restore counter
    loop .loop
    ret

small_function:
    push ebp
    mov ebp, esp
    mov eax, [ebp+8]        ; Get parameter
    add eax, 1              ; Simple operation
    pop ebp
    ret

; Good: Inline the function
good_inline:
    mov ecx, 1000

.loop:
    inc eax                 ; Inlined operation
    loop .loop
    ret

; 7. Register Usage Optimization
; Bad: Excessive memory access
bad_register_usage:
    mov ecx, 1000

.loop:
    mov eax, [temp_var]     ; Load from memory
    add eax, 1              ; Increment
    mov [temp_var], eax     ; Store to memory

    mov ebx, [counter]      ; Load counter
    inc ebx                 ; Increment counter
    mov [counter], ebx      ; Store counter

    dec ecx
    jnz .loop
    ret

; Good: Keep variables in registers
good_register_usage:
    mov ecx, 1000           ; Loop counter
    mov eax, [temp_var]     ; Load once, keep in register
    mov ebx, [counter]      ; Load once, keep in register

.loop:
    inc eax                 ; Increment (in register)
    inc ebx                 ; Increment counter (in register)
    dec ecx
    jnz .loop

    mov [temp_var], eax     ; Store once at end
    mov [counter], ebx      ; Store once at end
    ret

7.3 Code Size Optimization¤

Text Only
; 1. Instruction Selection
; Longer instructions
mov eax, 0                  ; 5 bytes
mov ebx, ebx                ; 2 bytes (no-op)
add eax, 1                  ; 3 bytes

; Shorter instructions
xor eax, eax                ; 2 bytes (same effect as mov eax, 0)
nop                         ; 1 byte (no-op)
inc eax                     ; 1 byte (same effect as add eax, 1)

; 2. Use smaller data types when possible
; 32-bit operations
mov eax, [big_array + ebx*4]    ; Longer encoding
add eax, 1000                   ; Longer immediate

; 16-bit operations (when range allows)
mov ax, [small_array + bx*2]    ; Shorter encoding
add ax, 100                     ; Shorter immediate

; 8-bit operations (when range allows)
mov al, [byte_array + bx]       ; Shortest encoding
add al, 10                      ; Shortest immediate

; 3. Jump optimization
; Long jumps
jmp far_label                   ; May use 32-bit displacement

; Short jumps (when possible)
jmp short near_label            ; Uses 8-bit displacement

; 4. Use LOOP instruction when appropriate
; Manual loop
dec ecx
cmp ecx, 0
jne loop_start                  ; Multiple instructions

; LOOP instruction
loop loop_start                 ; Single instruction

; 5. Optimize common patterns
; Clear register (5 bytes)
mov eax, 0

; Clear register (2 bytes)
xor eax, eax

; Set register to -1 (5 bytes)
mov eax, -1

; Set register to -1 (2-3 bytes)
or eax, -1
; or
sbb eax, eax                    ; Sets to -1 if CF=1, 0 if CF=0

8. MACRO PROGRAMMING¤

8.1 NASM Macro System¤

Text Only
; Simple macros
%macro PRINT_CHAR 1
    mov eax, 4              ; sys_write
    mov ebx, 1              ; stdout
    mov ecx, %1             ; character to print
    mov edx, 1              ; length
    int 0x80
%endmacro

; Multi-line macro with parameters
%macro SAVE_REGS 0
    push eax
    push ebx
    push ecx
    push edx
%endmacro

%macro RESTORE_REGS 0
    pop edx
    pop ecx
    pop ebx
    pop eax
%endmacro

; Conditional macro parameters
%macro PRINT_STRING 1-2 0
    mov eax, 4              ; sys_write
    mov ebx, 1              ; stdout
    mov ecx, %1             ; string address
    %if %0 > 1
        mov edx, %2         ; length provided
    %else
        mov edx, strlen(%1) ; calculate length
    %endif
    int 0x80
%endmacro

; Variable argument macro
%macro PUSH_MULTIPLE 1-*
    %rep %0                 ; Repeat for each argument
        push %1             ; Push current argument
        %rotate 1           ; Move to next argument
    %endrep
%endmacro

; Macro with local labels
%macro SAFE_DIVIDE 2
    %%check_zero:
    cmp %2, 0               ; Check for division by zero
    je %%division_by_zero

    xor edx, edx            ; Clear remainder
    mov eax, %1             ; Dividend
    div %2                  ; Divide
    jmp %%done

    %%division_by_zero:
    mov eax, -1             ; Error code

    %%done:
%endmacro

; Advanced macro with context
%macro FUNCTION 1
    %push function_context
    %define %$name %1

    %1:
    push ebp
    mov ebp, esp
    %assign %$localsize 0
%endmacro

%macro LOCAL 1-2 1
    %assign %$localsize %$localsize + (%2 * 4)
    %define %1 ebp - %$localsize
%endmacro

%macro ENDFUNCTION 0
    mov esp, ebp
    pop ebp
    ret
    %pop
%endmacro

; Usage example
FUNCTION my_function
    LOCAL temp_var          ; [ebp-4]
    LOCAL array, 10         ; [ebp-44] (10 dwords)

    sub esp, %$localsize    ; Allocate local variables

    ; Function body
    mov dword [temp_var], 42

    add esp, %$localsize    ; Deallocate locals
ENDFUNCTION

; String macros
%macro STRING 2
    %1 db %2, 0
    %1_len equ $ - %1 - 1
%endmacro

; Usage
STRING hello_msg, "Hello, World!"
; Expands to:
; hello_msg db "Hello, World!", 0
; hello_msg_len equ $ - hello_msg - 1

; System call macro
%macro SYSCALL 1-6
    %if %0 >= 1
        mov eax, %1         ; System call number
    %endif
    %if %0 >= 2
        mov ebx, %2         ; First argument
    %endif
    %if %0 >= 3
        mov ecx, %3         ; Second argument
    %endif
    %if %0 >= 4
        mov edx, %4         ; Third argument
    %endif
    %if %0 >= 5
        mov esi, %5         ; Fourth argument
    %endif
    %if %0 >= 6
        mov edi, %6         ; Fifth argument
    %endif
    int 0x80
%endmacro

; Usage examples
SYSCALL 4, 1, msg, msg_len  ; write(stdout, msg, msg_len)
SYSCALL 1, 0                ; exit(0)

; Debugging macro
%macro DEBUG_PRINT 1
%ifdef DEBUG
    push eax
    push ebx
    push ecx
    push edx

    PRINT_STRING %1

    pop edx
    pop ecx
    pop ebx
    pop eax
%endif
%endmacro

; Compile-time calculations
%assign BUFFER_SIZE 1024
%assign HALF_BUFFER BUFFER_SIZE / 2
%assign BUFFER_MASK BUFFER_SIZE - 1

; Conditional compilation
%ifdef WIN32
    %define NEWLINE 13, 10
%else
    %define NEWLINE 10
%endif

; Preprocessor functions
%define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((a) - 1))
%define MIN(a, b) ((a) < (b) ? (a) : (b))
%define MAX(a, b) ((a) > (b) ? (a) : (b))

; Structure macro
%macro STRUCT 1
    struc %1
%endmacro

%macro ENDS 0
    endstruc
%endmacro

; Usage
STRUCT Point
    .x resd 1
    .y resd 1
ENDS

; Create instance
my_point:
    istruc Point
        at Point.x, dd 10
        at Point.y, dd 20
    iend

8.2 Advanced Macro Techniques¤

Text Only
; Recursive macros
%macro FACTORIAL 1
    %if %1 <= 1
        %assign %%result 1
    %else
        FACTORIAL (%1 - 1)
        %assign %%result %1 * %%result
    %endif
%endmacro

; Generate lookup table
%macro GENERATE_SQUARES 1
    square_table:
    %assign %%i 0
    %rep %1
        dd %%i * %%i
        %assign %%i %%i + 1
    %endrep
%endmacro

GENERATE_SQUARES 256        ; Generate table of squares 0-255

; Template-like macro
%macro DEFINE_SORT_FUNCTION 2  ; %1=name, %2=element_size
%1:
    push ebp
    mov ebp, esp
    push esi
    push edi
    push ebx

    mov esi, [ebp+8]        ; Array pointer
    mov ecx, [ebp+12]       ; Number of elements

    %%outer_loop:
    dec ecx
    jz %%done
    mov edi, esi            ; Start of unsorted portion
    mov ebx, ecx            ; Inner loop counter

    %%inner_loop:
    mov eax, [edi]          ; Current element
    mov edx, [edi + %2]     ; Next element
    cmp eax, edx
    jle %%no_swap

    ; Swap elements
    mov [edi], edx
    mov [edi + %2], eax

    %%no_swap:
    add edi, %2             ; Move to next element
    dec ebx
    jnz %%inner_loop

    jmp %%outer_loop

    %%done:
    pop ebx
    pop edi
    pop esi
    pop ebp
    ret
%endmacro

; Generate sort functions for different data types
DEFINE_SORT_FUNCTION sort_bytes, 1
DEFINE_SORT_FUNCTION sort_words, 2
DEFINE_SORT_FUNCTION sort_dwords, 4

; Generic container macros
%macro DEFINE_ARRAY 3       ; %1=name, %2=type, %3=size
%1:
    .data times %3 %2 0
    .size equ %3
    .element_size equ %2_size
    .capacity equ %3
    .count dd 0
%endmacro

%macro ARRAY_PUSH 2         ; %1=array_name, %2=value
    mov eax, [%1.count]
    cmp eax, %1.capacity
    jge %%array_full

    mov ebx, %1.data
    mov ecx, %1.element_size
    mul ecx                 ; EAX = index * element_size
    add ebx, eax            ; EBX = address of element

    mov eax, %2             ; Value to store
    mov [ebx], eax          ; Store value

    inc dword [%1.count]    ; Increment count
    %%array_full:
%endmacro

; Usage
DEFINE_ARRAY my_array, dd, 100
ARRAY_PUSH my_array, 42

; Code generation macro
%macro GENERATE_CASE_TABLE 1-*
    %1_table:
    %assign %%case_num 0
    %rep %0 - 1
        %rotate 1
        dd %1_case_%[%%case_num]
        %assign %%case_num %%case_num + 1
    %endrep

    %1_dispatch:
    push ebp
    mov ebp, esp
    mov eax, [ebp+8]        ; Case number
    cmp eax, %0 - 1
    jae %%default_case

    jmp [%1_table + eax*4]  ; Jump to case

    %%default_case:
    ; Default case handling

    %assign %%case_num 0
    %rep %0 - 1
        %1_case_%[%%case_num]:
        ; Case handling code would go here
        jmp %%end_switch
        %assign %%case_num %%case_num + 1
    %endrep

    %%end_switch:
    pop ebp
    ret
%endmacro

; Error handling macros
%macro TRY 0
    %push try_block
    %define %$catch_label %%catch_%[__LINE__]
%endmacro

%macro CATCH 0
    jmp %%end_try_%[__LINE__]
    %$catch_label:
%endmacro

%macro ENDTRY 0
    %%end_try_%[__LINE__]:
    %pop
%endmacro

%macro THROW 0
    jmp %$catch_label
%endmacro

; Usage
TRY
    ; Some code that might fail
    cmp eax, 0
    je THROW               ; Throw exception if zero

CATCH
    ; Exception handling code
    mov eax, -1            ; Error code

ENDTRY

9. INTERFACING WITH C¤

9.1 Calling C Functions from Assembly¤

Text Only
; Linking with C library
; Compile: nasm -f elf32 program.asm -o program.o
; Link: gcc -m32 program.o -o program

section .data
    format_str db "Number: %d, String: %s", 10, 0
    test_string db "Hello from ASM!", 0
    scanf_format db "%d", 0

section .bss
    user_input resd 1

section .text
    extern printf
    extern scanf
    extern malloc
    extern free
    extern exit
    global main

main:
    push ebp
    mov ebp, esp

    ; Call printf
    push test_string        ; Push arguments in reverse order
    push 42
    push format_str
    call printf             ; Call C function
    add esp, 12             ; Clean up stack (3 args * 4 bytes)

    ; Call scanf
    push user_input         ; Address to store input
    push scanf_format
    call scanf
    add esp, 8              ; Clean up stack

    ; Allocate memory with malloc
    push 100                ; Size to allocate
    call malloc
    add esp, 4              ; Clean up stack

    test eax, eax           ; Check if allocation succeeded
    jz .malloc_failed

    mov ebx, eax            ; Save pointer

    ; Use allocated memory
    mov dword [ebx], 12345  ; Store value in allocated memory

    ; Free memory
    push ebx                ; Pointer to free
    call free
    add esp, 4              ; Clean up stack

.malloc_failed:
    ; Exit program
    push 0                  ; Exit status
    call exit
    add esp, 4              ; Clean up (though we never return)

    ; Alternative exit method
    mov ebp, esp
    pop ebp
    ret                     ; Return to C runtime

; Calling convention examples
; cdecl calling convention (default for C)
call_cdecl_function:
    push 30                 ; Third argument
    push 20                 ; Second argument
    push 10                 ; First argument
    call c_function         ; Call C function
    add esp, 12             ; Caller cleans up stack
    ; Return value in EAX
    ret

; stdcall calling convention (Windows API)
call_stdcall_function:
    push 30                 ; Third argument
    push 20                 ; Second argument
    push 10                 ; First argument
    call stdcall_function   ; Function cleans up its own stack
    ; No stack cleanup needed
    ; Return value in EAX
    ret

; fastcall calling convention
call_fastcall_function:
    mov ecx, 10             ; First argument in ECX
    mov edx, 20             ; Second argument in EDX
    push 30                 ; Additional arguments on stack
    call fastcall_function  ; Call function
    add esp, 4              ; Clean up stack arguments only
    ; Return value in EAX
    ret

9.2 Assembly Functions Called from C¤

C
// C header file (asm_functions.h)
#ifndef ASM_FUNCTIONS_H
#define ASM_FUNCTIONS_H

extern int asm_add(int a, int b);
extern int asm_factorial(int n);
extern void asm_string_copy(char* dest, const char* src);
extern int asm_array_sum(int* array, int count);
extern void asm_matrix_multiply(int* a, int* b, int* result, int size);

#endif
Text Only
; Assembly implementation (asm_functions.asm)
section .text
    global asm_add
    global asm_factorial
    global asm_string_copy
    global asm_array_sum
    global asm_matrix_multiply

; int asm_add(int a, int b);
asm_add:
    push ebp
    mov ebp, esp

    mov eax, [ebp+8]        ; First parameter (a)
    add eax, [ebp+12]       ; Add second parameter (b)

    pop ebp
    ret                     ; Return value in EAX

; int asm_factorial(int n);
asm_factorial:
    push ebp
    mov ebp, esp
    push ebx

    mov ebx, [ebp+8]        ; Get parameter n
    mov eax, 1              ; Initialize result

    test ebx, ebx           ; Check if n <= 0
    jle .done

.factorial_loop:
    imul eax, ebx           ; result *= n
    dec ebx                 ; n--
    jnz .factorial_loop     ; Continue if n != 0

.done:
    pop ebx
    pop ebp
    ret

; void asm_string_copy(char* dest, const char* src);
asm_string_copy:
    push ebp
    mov ebp, esp
    push esi
    push edi

    mov edi, [ebp+8]        ; Destination
    mov esi, [ebp+12]       ; Source

.copy_loop:
    lodsb                   ; Load byte from [ESI] to AL
    stosb                   ; Store AL to [EDI]
    test al, al             ; Check for null terminator
    jnz .copy_loop          ; Continue if not null

    pop edi
    pop esi
    pop ebp
    ret

; int asm_array_sum(int* array, int count);
asm_array_sum:
    push ebp
    mov ebp, esp
    push esi
    push ecx

    mov esi, [ebp+8]        ; Array pointer
    mov ecx, [ebp+12]       ; Count
    xor eax, eax            ; Initialize sum to 0

    test ecx, ecx           ; Check if count is 0
    jz .done

.sum_loop:
    add eax, [esi]          ; Add current element to sum
    add esi, 4              ; Move to next element (4 bytes per int)
    dec ecx                 ; Decrement counter
    jnz .sum_loop           ; Continue if not zero

.done:
    pop ecx
    pop esi
    pop ebp
    ret

; void asm_matrix_multiply(int* a, int* b, int* result, int size);
asm_matrix_multiply:
    push ebp
    mov ebp, esp
    push esi
    push edi
    push ebx
    push ecx
    push edx

    mov esi, [ebp+8]        ; Matrix A
    mov edi, [ebp+12]       ; Matrix B
    mov ebx, [ebp+16]       ; Result matrix
    mov ecx, [ebp+20]       ; Size

    ; Initialize loop counters
    mov dword [.row], 0     ; i = 0

.row_loop:
    mov eax, [.row]         ; Get current row
    cmp eax, ecx            ; Compare with size
    jge .done               ; Exit if i >= size

    mov dword [.col], 0     ; j = 0

.col_loop:
    mov eax, [.col]         ; Get current column
    cmp eax, ecx            ; Compare with size
    jge .next_row           ; Exit inner loop if j >= size

    ; Calculate result[i][j] = sum of A[i][k] * B[k][j]
    mov dword [.sum], 0     ; Initialize sum
    mov dword [.k], 0       ; k = 0

.k_loop:
    mov eax, [.k]           ; Get k
    cmp eax, ecx            ; Compare with size
    jge .store_result       ; Exit if k >= size

    ; Calculate A[i][k]
    mov eax, [.row]         ; i
    imul eax, ecx           ; i * size
    add eax, [.k]           ; i * size + k
    mov edx, [esi + eax*4]  ; A[i][k]

    ; Calculate B[k][j]
    mov eax, [.k]           ; k
    imul eax, ecx           ; k * size
    add eax, [.col]         ; k * size + j
    imul edx, [edi + eax*4] ; A[i][k] * B[k][j]

    ; Add to sum
    add [.sum], edx

    inc dword [.k]          ; k++
    jmp .k_loop

.store_result:
    ; Store result[i][j]
    mov eax, [.row]         ; i
    imul eax, ecx           ; i * size
    add eax, [.col]         ; i * size + j
    mov edx, [.sum]
    mov [ebx + eax*4], edx  ; result[i][j] = sum

    inc dword [.col]        ; j++
    jmp .col_loop

.next_row:
    inc dword [.row]        ; i++
    jmp .row_loop

.done:
    pop edx
    pop ecx
    pop ebx
    pop edi
    pop esi
    pop ebp
    ret

section .bss
    .row resd 1             ; Local variables
    .col resd 1
    .k resd 1
    .sum resd 1
C
// C program using assembly functions (main.c)
#include <stdio.h>
#include <stdlib.h>
#include "asm_functions.h"

int main() {
    // Test asm_add
    int result = asm_add(15, 25);
    printf("15 + 25 = %d\n", result);

    // Test asm_factorial
    int fact = asm_factorial(5);
    printf("5! = %d\n", fact);

    // Test asm_string_copy
    char source[] = "Hello, Assembly!";
    char dest[50];
    asm_string_copy(dest, source);
    printf("Copied string: %s\n", dest);

    // Test asm_array_sum
    int array[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    int sum = asm_array_sum(array, 10);
    printf("Sum of array: %d\n", sum);

    // Test matrix multiplication
    int size = 3;
    int a[9] = {1, 2, 3, 4, 5, 6, 7, 8, 9};
    int b[9] = {9, 8, 7, 6, 5, 4, 3, 2, 1};
    int result_matrix[9];

    asm_matrix_multiply(a, b, result_matrix, size);

    printf("Matrix multiplication result:\n");
    for (int i = 0; i < size; i++) {
        for (int j = 0; j < size; j++) {
            printf("%d ", result_matrix[i * size + j]);
        }
        printf("\n");
    }

    return 0;
}

9.3 Inline Assembly in C (GCC)¤

C
// Basic inline assembly
#include <stdio.h>

int main() {
    int a = 10, b = 20, result;

    // Extended inline assembly
    asm volatile (
        "movl %1, %%eax\n\t"        // Move a to EAX
        "addl %2, %%eax\n\t"        // Add b to EAX
        "movl %%eax, %0"            // Move result to output
        : "=m" (result)             // Output operands
        : "m" (a), "m" (b)          // Input operands
        : "eax"                     // Clobbered registers
    );

    printf("Result: %d\n", result);

    // Using register constraints
    int x = 5, y = 3, multiply_result;
    asm volatile (
        "imull %2, %0"              // Multiply
        : "=r" (multiply_result)    // Output in any register
        : "0" (x), "r" (y)          // Input: %0 = x (same as output), %2 = y
    );

    printf("5 * 3 = %d\n", multiply_result);

    // Inline assembly with memory operands
    int array[5] = {1, 2, 3, 4, 5};
    int array_sum = 0;

    asm volatile (
        "movl $0, %%eax\n\t"        // Clear EAX (sum)
        "movl $0, %%ecx\n\t"        // Clear ECX (index)
        "1:\n\t"                    // Loop label
        "addl (%1,%%ecx,4), %%eax\n\t" // Add array[index] to sum
        "incl %%ecx\n\t"            // Increment index
        "cmpl $5, %%ecx\n\t"        // Compare with array length
        "jl 1b\n\t"                 // Jump back if less
        "movl %%eax, %0"            // Store result
        : "=m" (array_sum)          // Output
        : "r" (array)               // Input: array base address
        : "eax", "ecx"              // Clobbered registers
    );

    printf("Array sum: %d\n", array_sum);

    // Reading CPU information with CPUID
    unsigned int eax, ebx, ecx, edx;
    asm volatile (
        "cpuid"
        : "=a" (eax), "=b" (ebx), "=c" (ecx), "=d" (edx)
        : "a" (0)                   // Input: function 0
    );

    printf("CPUID: EAX=%08X EBX=%08X ECX=%08X EDX=%08X\n", 
           eax, ebx, ecx, edx);

    // Reading timestamp counter
    unsigned long long timestamp;
    asm volatile (
        "rdtsc\n\t"                 // Read timestamp counter
        "movl %%eax, %0\n\t"        // Store low part
        "movl %%edx, %1"            // Store high part
        : "=m" (((unsigned int*)&timestamp)[0]),
          "=m" (((unsigned int*)&timestamp)[1])
        :
        : "eax", "edx"
    );

    printf("Timestamp: %llu\n", timestamp);

    return 0;
}

10. PRACTICAL EXAMPLES¤

10.1 Complete Program Examples¤

Text Only
; Example 1: Simple Calculator
section .data
    menu_msg db "Calculator Menu:", 10
             db "1. Add", 10
             db "2. Subtract", 10  
             db "3. Multiply", 10
             db "4. Divide", 10
             db "5. Exit", 10
             db "Choice: ", 0
    menu_len equ $ - menu_msg

    num1_msg db "Enter first number: ", 0
    num1_len equ $ - num1_msg

    num2_msg db "Enter second number: ", 0
    num2_len equ $ - num2_msg

    result_msg db "Result: ", 0
    result_len equ $ - result_msg

    error_msg db "Error: Division by zero!", 10, 0
    error_len equ $ - error_msg

    newline db 10, 0

section .bss
    choice resb 2
    num1 resb 10
    num2 resb 10
    result resb 15

section .text
    global _start

_start:
    call main_loop

    ; Exit program
    mov eax, 1
    mov ebx, 0
    int 0x80

main_loop:
    ; Display menu
    mov eax, 4
    mov ebx, 1
    mov ecx, menu_msg
    mov edx, menu_len
    int 0x80

    ; Read choice
    mov eax, 3
    mov ebx, 0
    mov ecx, choice
    mov edx, 2
    int 0x80

    ; Check choice
    mov al, [choice]
    cmp al, '5'
    je .exit
    cmp al, '1'
    je .addition
    cmp al, '2'
    je .subtraction
    cmp al, '3'
    je .multiplication
    cmp al, '4'
    je .division
    jmp main_loop          ; Invalid choice, show menu again

.addition:
    call get_numbers
    call add_numbers
    call display_result
    jmp main_loop

.subtraction:
    call get_numbers
    call subtract_numbers
    call display_result
    jmp main_loop

.multiplication:
    call get_numbers
    call multiply_numbers
    call display_result
    jmp main_loop

.division:
    call get_numbers
    call divide_numbers
    call display_result
    jmp main_loop

.exit:
    ret

get_numbers:
    ; Get first number
    mov eax, 4
    mov ebx, 1
    mov ecx, num1_msg
    mov edx, num1_len
    int 0x80

    mov eax, 3
    mov ebx, 0
    mov ecx, num1
    mov edx, 10
    int 0x80

    ; Get second number
    mov eax, 4
    mov ebx, 1
    mov ecx, num2_msg
    mov edx, num2_len
    int 0x80

    mov eax, 3
    mov ebx, 0
    mov ecx, num2
    mov edx, 10
    int 0x80

    ret

add_numbers:
    call string_to_int_num1
    mov ebx, eax
    call string_to_int_num2
    add eax, ebx
    call int_to_string
    ret

subtract_numbers:
    call string_to_int_num1
    mov ebx, eax
    call string_to_int_num2
    sub ebx, eax
    mov eax, ebx
    call int_to_string
    ret

multiply_numbers:
    call string_to_int_num1
    mov ebx, eax
    call string_to_int_num2
    imul eax, ebx
    call int_to_string
    ret

divide_numbers:
    call string_to_int_num2
    test eax, eax
    jz .division_by_zero

    mov ebx, eax
    call string_to_int_num1
    cdq                     ; Sign extend EAX to EDX:EAX
    idiv ebx
    call int_to_string
    ret

.division_by_zero:
    mov eax, 4
    mov ebx, 1
    mov ecx, error_msg
    mov edx, error_len
    int 0x80
    ret

string_to_int_num1:
    mov esi, num1
    call string_to_int
    ret

string_to_int_num2:
    mov esi, num2
    call string_to_int
    ret

string_to_int:
    ; Input: ESI = string pointer
    ; Output: EAX = integer value
    push ebx
    push ecx
    push edx

    xor eax, eax            ; Result
    xor ebx, ebx            ; Sign (0 = positive, 1 = negative)
    mov ecx, 10             ; Base

    ; Check for negative sign
    cmp byte [esi], '-'
    jne .convert_loop
    mov ebx, 1              ; Set negative flag
    inc esi                 ; Skip minus sign

.convert_loop:
    mov dl, [esi]           ; Get character
    cmp dl, 10              ; Check for newline
    je .done
    cmp dl, 0               ; Check for null
    je .done
    cmp dl, '0'             ; Check if digit
    jb .done
    cmp dl, '9'
    ja .done

    sub dl, '0'             ; Convert to digit
    imul eax, ecx           ; Multiply result by 10
    add eax, edx            ; Add digit
    inc esi                 ; Next character
    jmp .convert_loop

.done:
    test ebx, ebx           ; Check sign flag
    jz .positive
    neg eax                 ; Make negative

.positive:
    pop edx
    pop ecx
    pop ebx
    ret

int_to_string:
    ; Input: EAX = integer
    ; Output: result buffer contains string
    push ebx
    push ecx
    push edx
    push edi

    mov edi, result         ; Destination buffer
    mov ebx, 10             ; Divisor
    mov ecx, 0              ; Digit counter

    ; Handle negative numbers
    test eax, eax
    jns .positive_number
    neg eax                 ; Make positive
    mov byte [edi], '-'     ; Store minus sign
    inc edi

.positive_number:
    ; Convert digits (in reverse order)
    push edi                ; Save buffer start

.convert_digits:
    xor edx, edx            ; Clear remainder
    div ebx                 ; Divide by 10
    add dl, '0'             ; Convert remainder to ASCII
    push edx                ; Push digit onto stack
    inc ecx                 ; Increment digit count
    test eax, eax           ; Check if quotient is zero
    jnz .convert_digits

    ; Pop digits in correct order
    pop edi                 ; Restore buffer position
.store_digits:
    pop edx                 ; Get digit
    mov [edi], dl           ; Store digit
    inc edi                 ; Next position
    dec ecx                 ; Decrement counter
    jnz .store_digits

    mov byte [edi], 0       ; Null terminate

    pop edi
    pop edx
    pop ecx
    pop ebx
    ret

display_result:
    ; Display result message
    mov eax, 4
    mov ebx, 1
    mov ecx, result_msg
    mov edx, result_len
    int 0x80

    ; Display result number
    mov edi, result
    call string_length
    mov edx, eax
    mov eax, 4
    mov ebx, 1
    mov ecx, result
    int 0x80

    ; Display newline
    mov eax, 4
    mov ebx, 1
    mov ecx, newline
    mov edx, 1
    int 0x80

    ret

string_length:
    ; Input: EDI = string pointer
    ; Output: EAX = string length
    push edi
    push ecx

    mov eax, 0              ; Search for null terminator
    mov ecx, -1             ; Maximum length
    repne scasb             ; Scan for null
    not ecx                 ; Convert to positive
    dec ecx                 ; Subtract 1 for null terminator
    mov eax, ecx            ; Return length

    pop ecx
    pop edi
    ret

10.2 Data Structures Implementation¤

Text Only
; Example 2: Dynamic Array (Vector) Implementation
section .data
    malloc_error db "Memory allocation failed!", 10, 0
    index_error db "Index out of bounds!", 10, 0

section .bss
    ; Vector structure
    struc Vector
        .data      resd 1    ; Pointer to data
        .size      resd 1    ; Current number of elements
        .capacity  resd 1    ; Maximum capacity
    endstruc

section .text
    extern malloc
    extern realloc
    extern free
    extern printf
    global vector_create
    global vector_destroy
    global vector_push_back
    global vector_pop_back
    global vector_get
    global vector_set
    global vector_size
    global vector_capacity

; Vector* vector_create(int initial_capacity)
vector_create:
    push ebp
    mov ebp, esp
    push ebx

    ; Allocate vector structure
    push Vector_size
    call malloc
    add esp, 4

    test eax, eax
    jz .allocation_failed

    mov ebx, eax            ; Save vector pointer

    ; Initialize vector fields
    mov dword [ebx + Vector.size], 0
    mov eax, [ebp+8]        ; Get initial capacity
    mov [ebx + Vector.capacity], eax

    ; Allocate data array
    shl eax, 2              ; capacity * sizeof(int)
    push eax
    call malloc
    add esp, 4

    test eax, eax
    jz .data_allocation_failed

    mov [ebx + Vector.data], eax
    mov eax, ebx            ; Return vector pointer
    jmp .done

.data_allocation_failed:
    push ebx
    call free
    add esp, 4

.allocation_failed:
    xor eax, eax            ; Return NULL

.done:
    pop ebx
    pop ebp
    ret

; void vector_destroy(Vector* vec)
vector_destroy:
    push ebp
    mov ebp, esp

    mov eax, [ebp+8]        ; Get vector pointer
    test eax, eax
    jz .done

    ; Free data array
    push dword [eax + Vector.data]
    call free
    add esp, 4

    ; Free vector structure
    push dword [ebp+8]
    call free
    add esp, 4

.done:
    pop ebp
    ret

; int vector_push_back(Vector* vec, int value)
vector_push_back:
    push ebp
    mov ebp, esp
    push ebx
    push esi

    mov ebx, [ebp+8]        ; Vector pointer
    mov esi, [ebp+12]       ; Value to add

    ; Check if resize is needed
    mov eax, [ebx + Vector.size]
    mov ecx, [ebx + Vector.capacity]
    cmp eax, ecx
    jl .no_resize

    ; Resize array (double capacity)
    shl ecx, 1              ; New capacity = old capacity * 2
    mov [ebx + Vector.capacity], ecx
    shl ecx, 2              ; New size in bytes

    push ecx                ; New size
    push dword [ebx + Vector.data]  ; Old data
    call realloc
    add esp, 8

    test eax, eax
    jz .realloc_failed

    mov [ebx + Vector.data], eax

.no_resize:
    ; Add element
    mov eax, [ebx + Vector.size]
    mov ecx, [ebx + Vector.data]
    mov [ecx + eax*4], esi  ; data[size] = value

    inc dword [ebx + Vector.size]   ; Increment size
    mov eax, 1              ; Return success
    jmp .done

.realloc_failed:
    xor eax, eax            ; Return failure

.done:
    pop esi
    pop ebx
    pop ebp
    ret

; int vector_pop_back(Vector* vec)
vector_pop_back:
    push ebp
    mov ebp, esp

    mov eax, [ebp+8]        ; Vector pointer
    mov ecx, [eax + Vector.size]
    test ecx, ecx
    jz .empty_vector

    dec ecx                 ; size--
    mov [eax + Vector.size], ecx

    ; Return the popped element
    mov edx, [eax + Vector.data]
    mov eax, [edx + ecx*4]  ; Return data[size-1]
    jmp .done

.empty_vector:
    mov eax, -1             ; Return error value

.done:
    pop ebp
    ret

; int vector_get(Vector* vec, int index)
vector_get:
    push ebp
    mov ebp, esp

    mov eax, [ebp+8]        ; Vector pointer
    mov ecx, [ebp+12]       ; Index

    ; Bounds check
    cmp ecx, [eax + Vector.size]
    jae .out_of_bounds

    mov edx, [eax + Vector.data]
    mov eax, [edx + ecx*4]  ; Return data[index]
    jmp .done

.out_of_bounds:
    mov eax, -1             ; Return error value

.done:
    pop ebp
    ret

; int vector_set(Vector* vec, int index, int value)
vector_set:
    push ebp
    mov ebp, esp

    mov eax, [ebp+8]        ; Vector pointer
    mov ecx, [ebp+12]       ; Index
    mov edx, [ebp+16]       ; Value

    ; Bounds check
    cmp ecx, [eax + Vector.size]
    jae .out_of_bounds

    mov eax, [eax + Vector.data]
    mov [eax + ecx*4], edx  ; data[index] = value
    mov eax, 1              ; Return success
    jmp .done

.out_of_bounds:
    xor eax, eax            ; Return failure

.done:
    pop ebp
    ret

; int vector_size(Vector* vec)
vector_size:
    push ebp
    mov ebp, esp

    mov eax, [ebp+8]        ; Vector pointer
    mov eax, [eax + Vector.size]

    pop ebp
    ret

; int vector_capacity(Vector* vec)
vector_capacity:
    push ebp
    mov ebp, esp

    mov eax, [ebp+8]        ; Vector pointer
    mov eax, [eax + Vector.capacity]

    pop ebp
    ret

10.3 Algorithm Implementation¤

```assembly ; Example 3: Quick Sort Implementation section .text global quicksort global partition

; void quicksort(int* array, int low, int high) quicksort: push ebp mov ebp, esp push ebx push esi push edi

Text Only
mov esi, [ebp+8]        ; Array pointer
mov ebx, [ebp+12]       ; Low index
mov edi, [ebp+16]       ; High index

; Check if low < high
cmp ebx, edi
jge .done

; Partition the array
push edi                ; High
push ebx                ; Low
push esi                ; Array
call partition
add esp, 12

mov ecx, eax            ; Save pivot index

; Recursively sort left partition
dec eax                 ; pivot - 1
push eax                ; High for left partition
push ebx                ; Low (unchanged)
push esi                ; Array
call quicksort
add esp, 12

; Recursively sort right partition
inc ecx                 ; pivot + 1
push edi                ; High (unchanged)
push ecx                ; Low for right partition
push esi                ; Array
call quicksort
add esp, 12

.done: pop edi pop esi pop ebx pop ebp ret

; int partition(int* array, int low, int high) partition: push ebp mov ebp, esp push ebx push esi push edi

Text Only
mov esi, [ebp+8]        ; Array pointer
mov ebx, [ebp+12]       ; Low index
mov edi, [ebp+16]       ; High index

; Choose pivot (last element)
mov eax, [esi + edi*4]  ; pivot = array[high]
mov ecx, ebx            ; i = low
dec ecx                 ; i = low - 1

mov edx, ebx            ; j = low

.partition_loop: cmp edx, edi ; if j >= high jge .final_swap

Text Only
; Compare array[j] with pivot
cmp dword [esi + edx*4], eax
jg .next_iteration

; array[j] <= pivot, so swap
inc ecx                 ; i++

; Swap array[i] and array[]