assembly - Why are disassembled data becoming instructions?
I need to understand what happens at the moment when this fragment of code executes: "jmp begin". I understand that a .com file can be at most 64 KB and that everything has to go in 1 segment. I need the jmp if I want to declare variables. When I search for it, many guides only say in a comment that jmp begin skips the data, and nothing else. So here is my question: what exactly happens at this moment?
It appears that it runs this:
    mov al, a
    mov bl, b
    sub al, bl
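But I can't understand why it looks the way it does in Turbo Debugger. When I change the starting value of result from ? to something greater than 0, the disassembly changes to something else, and when I change it to, for example, 90, it looks normal. I am new to assembly and can't seem to grasp this at all. Here is the whole code: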
    .model tiny
    code segment
    org 100h
    assume cs:code, ds:code

    start:
        jmp begin

    a equ 20
    b equ 10
    c equ 100
    d equ 5
    result db ?

    begin:
        mov al, a
        mov bl, b
        sub al, bl
        mov bl, c
        mul bl
        mov bl, d
        div bl
        mov result, al
        mov ah, 4ch
        int 21h

    code ends
    end start
I'll try to give an explanation.
The problem is that in the old days (and this is partly still true today) processors didn't differentiate between code and data bytes in memory. This means that any byte in your .com file can be used both as code and as data. The debugger has no clue which bytes will be executed as code and which bytes will be used as data. A byte can even be used as both code and data in tricky cases... Your program can create data in memory that is valid code and can then jump onto it to execute it.
In many (but not all) cases a debugger could find out what is code and what is data by code analysis, but this can be very complex, so most debuggers/disassemblers don't have such a code flow analyzer. For this reason they just pick an offset in your file/memory (usually the current instruction pointer) and, starting from that offset, decode a series of consecutive bytes into assembly instructions serially, without following jmp instructions, until the screen of the debugger is filled with enough disassembled lines. Such dumb disassemblers/debuggers don't care whether the disassembled bytes are used as instructions or as data in your program; they treat them all as instructions.
If you are debugging the program and the debugger stops at a breakpoint, it takes the current instruction pointer and performs the dumb disassembly again, starting from that offset, with the same primitive "fill the debugger screen" method.
This serial disassembly of consecutive bytes is a simple method that works most of the time. If you serially decode non-jmp instructions that follow each other, you can be sure that the processor will execute them in that order. However, once you reach and decode a jmp instruction, you can't be sure that the following bytes are valid code. You can still try to decode them as instructions, hoping that there is no data mixed into the middle of the code (and yes, in most cases there is no data after a jmp or similar control flow instruction, which is why debuggers give you the dumb disassembly as a "possibly useful prediction"). In fact, most code is full of conditional jumps, and disassembling the bytes after them as code is very useful in a debugger. Having data in the middle of the code after a jump instruction is quite rare, so we can treat it as an edge case.
Let's assume we have a simple .com program that jumps over some data and exits with int 20h:
    jmp start
    db 90h
    start: int 20h
The disassembler will tell you the following when disassembling from offset 0000:
    --> 0000 eb 01    jmp short 0003
        0002 90       nop
        0003 cd 20    int 20h
Cool, this looks like our asm source code... Let's change the program a bit: let's change the data...
    jmp start
    db 0cdh
    start: int 20h
Now the disassembler shows this:
    --> 0000 eb 01    jmp short 0003
        0002 cd cd    int cdh
        0004 20 ......    whatever...
The problem is that some instructions consist of more than 1 byte, and the debugger doesn't care whether the bytes represent code or data for you. In the above example, if the disassembler serially disassembles the bytes from offset 0000 till the end of the program (including your data), then your 1 byte of data is disassembled into a 2-byte instruction ("stealing" the first byte of your actual code), so the next instruction the debugger tries to disassemble comes at offset 0004 instead of 0003, where your jmp would normally jump. In the first example we didn't have this problem because the data was disassembled into a 1-byte instruction, and, by accident, after disassembling the data part of the program the next instruction to disassemble was at offset 0003, which is exactly the target of your jmp.
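The serial decoding described above can be sketched in a few lines of Python. This is only a toy model: the `LENGTHS` table and the `linear_sweep` function are hypothetical names made up for this illustration, and the table covers just the three opcodes used in the two example programs, not the real 8086 instruction set.

```python
# Toy instruction-length table: only the opcodes that appear in the two
# example programs. This is an illustrative assumption, not a real decoder.
LENGTHS = {
    0xEB: 2,  # jmp short rel8
    0x90: 1,  # nop
    0xCD: 2,  # int imm8
}

def linear_sweep(code, start=0):
    """Decode consecutive bytes as instructions without following jumps.

    Returns the list of offsets at which an instruction was assumed to start.
    """
    offsets = []
    pos = start
    while pos < len(code):
        offsets.append(pos)
        # Unknown opcodes are treated as 1-byte instructions in this sketch.
        pos += LENGTHS.get(code[pos], 1)
    return offsets

first = bytes([0xEB, 0x01, 0x90, 0xCD, 0x20])   # data byte is 90h
second = bytes([0xEB, 0x01, 0xCD, 0xCD, 0x20])  # data byte is cdh

print(linear_sweep(first))   # [0, 2, 3]
print(linear_sweep(second))  # [0, 2, 4]
```

In the second program the sweep never produces offset 3, which is the real target of the jmp, because the data byte was decoded as the start of a 2-byte instruction.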
However, what the debugger shows in this case fortunately does not happen when the program gets executed. After executing 1 instruction the program jumps to offset 0003, and the debugger does the dumb disassembly again, this time starting from offset 0003, which was in the middle of an instruction in the previous, incorrect disassembly...
Let's say you debug the second example program and execute its instructions one by one. When you start the program with instruction pointer == 0000, the debugger shows this:
    --> 0000 eb 01    jmp short 0003
        0002 cd cd    int cdh
        0004 20 ......    whatever...
However, when you trigger a "step" command to execute 1 instruction, the instruction pointer (IP) changes to 0003, and the debugger performs the "dumb disassembling" again from offset 0003 until the debugger screen is filled, so you will see this:
    --> 0003 cd 20    int 20h
        0005 ......    whatever...
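To see why the listing "repairs itself" after a single step, here is a rough Python sketch that computes the target of the short jump and then restarts the dumb disassembly from the new IP. The helper names (`linear_sweep`, `jmp_short_target`) and the tiny `LENGTHS` table are assumptions made for this illustration only.

```python
# Toy length table covering only the opcodes in the example program
# (an illustrative assumption, not a real x86 decoder).
LENGTHS = {0xEB: 2, 0x90: 1, 0xCD: 2}

def linear_sweep(code, start):
    """Dumb disassembly: decode consecutive bytes starting at 'start'."""
    offsets = []
    pos = start
    while pos < len(code):
        offsets.append(pos)
        pos += LENGTHS.get(code[pos], 1)  # unknown opcode: assume 1 byte
    return offsets

def jmp_short_target(code, ip):
    """Return the new IP after executing the 'jmp short' (opcode EB) at ip."""
    assert code[ip] == 0xEB
    rel8 = code[ip + 1]
    if rel8 >= 0x80:      # the displacement is a signed byte
        rel8 -= 0x100
    return ip + 2 + rel8  # relative to the end of the 2-byte instruction

program = bytes([0xEB, 0x01, 0xCD, 0xCD, 0x20])

before_step = linear_sweep(program, 0)
ip = jmp_short_target(program, 0)
after_step = linear_sweep(program, ip)

print(before_step)     # [0, 2, 4]
print(ip, after_step)  # 3 [3]
```

The sweep is the same dumb algorithm both times; only the starting offset changes, and that is enough to make the second listing line up with what the processor really executes.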
Conclusion: if you have a dumb disassembler and you mix data into the middle of your code (with jmps around the data), then the dumb disassembler will treat your data as code, and this may cause a "minor" issue like the one you've encountered.
An advanced disassembler with flow analysis (like IDA Pro) would do the disassembling by following jump instructions. After disassembling the jmp at offset 0000 it would find out that the next instruction to disassemble is the target of the jmp at 0003, and it would disassemble the int 20h as the next step. It would mark the cdh byte (the one defined with db) at offset 0002 as data.
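A flow-following disassembler can be approximated with a small worklist loop. The sketch below is a heavily simplified assumption of how such a tool might work, not IDA Pro's actual algorithm: `follow_flow` and `LENGTHS` are hypothetical names, only three opcodes are known, and `int 20h` is treated as the end of a path because it terminates a DOS program.

```python
# Toy length table for the opcodes in the example (illustrative assumption).
LENGTHS = {0xEB: 2, 0x90: 1, 0xCD: 2}

def follow_flow(code, entry=0):
    """Disassemble by following control flow from 'entry'.

    Returns (instruction start offsets, offsets never reached = data).
    """
    code_bytes = set()   # every byte that belongs to a reached instruction
    starts = []
    worklist = [entry]
    while worklist:
        pos = worklist.pop()
        while 0 <= pos < len(code) and pos not in code_bytes:
            opcode = code[pos]
            length = LENGTHS.get(opcode)
            if length is None or pos + length > len(code):
                break    # unknown or truncated instruction: give up on path
            starts.append(pos)
            code_bytes.update(range(pos, pos + length))
            if opcode == 0xEB:                    # jmp short: follow target
                rel8 = code[pos + 1]
                if rel8 >= 0x80:
                    rel8 -= 0x100
                worklist.append(pos + 2 + rel8)
                break                             # nothing falls through
            if opcode == 0xCD and code[pos + 1] == 0x20:
                break                             # int 20h ends the program
            pos += length
    data = [i for i in range(len(code)) if i not in code_bytes]
    return sorted(starts), data

program = bytes([0xEB, 0x01, 0xCD, 0xCD, 0x20])
print(follow_flow(program))  # ([0, 3], [2])
```

Because the byte at offset 2 is never reached by any path, it is reported as data instead of being decoded as the start of `int cdh`.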
Additional explanation:
As you may have noticed, an instruction in the (quite outdated) 8086 instruction set can be anywhere between 1 and 6 bytes long, but a jmp or call can jump anywhere in memory with byte granularity. The length of an instruction can be determined from the first 1 or 2 bytes of the instruction. Bytes "stick together" into an instruction only when the processor targets the first byte of the instruction with its special IP (instruction pointer) register and tries to execute the bytes at the given offset. Let's see a tricky example: you have the bytes eb ff 26 05 00 03 00 in memory at offset 0000 and you execute them step by step.
    --> 0000 eb ff          jmp short 0001
        0002 26 05 00 03    es: add ax, 300h
        0006 00 ......      whatever...
The processor's instruction pointer (IP) points to offset 0000, so it decodes the instruction there, and the bytes there "stick together into an instruction" for the time of execution. (The processor performs instruction decoding at 0000.) Since the first byte is eb, it knows that the instruction length is 2 bytes. The debugger knows this too, so it decodes the instruction and then generates some additional, buggy disassembly based on the incorrect assumption that at some point the processor will execute an instruction at offset 0002, then at offset 0006, etc. As you will see, this isn't true: the processor will stick the bytes together into instructions at quite different offsets.
As you see, our tricky byte code contains a jmp that jumps to offset 0001, into the middle of the currently executed jmp instruction itself!!! This isn't a problem at all. The processor doesn't care and happily jumps to offset 0001, and in the next step it tries to decode an instruction (or "stick bytes together") there. Let's see what kind of instruction the processor finds at 0001:
    --> 0001 ff 26 05 00    jmp word ptr [5]
        0005 03 00          add ax, word ptr [bx+si]
As you see, we have our next instruction at 0001, and the debugger shows some garbage disassembly at offset 0005 based on the false assumption that the processor will reach that offset at some point...
The instruction at 0001 tells the processor to pick up a word from offset 0005, interpret it as an offset, and jump there. As you see, the value of word ptr [5] is 3 (as a little-endian 16-bit value), so the processor puts 3 into the IP register (jumps to 0003). Let's see what it finds at offset 0003:
    --> 0003 05 00 03    add ax, 300h
It is difficult to show the disassembly of the tricky byte code eb ff 26 05 00 03 00 in the style of the debugger because the actual instructions executed by the processor are in overlapping memory areas. First the processor executed bytes 0000-0001, then 0001-0004, and then 0003-0005.
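The overlapping execution can be replayed with a tiny tracer. This is a minimal sketch that assumes only the three encodings that occur in this byte sequence (eb = jmp short, ff 26 = jmp word ptr [disp16], 05 = add ax, imm16); the `trace` function is a hypothetical helper, and the add is not actually computed because only the instruction boundaries matter here.

```python
def trace(mem, ip=0, max_steps=3):
    """Follow IP through 'mem' and record (first, last) byte of each
    executed instruction. Only three encodings are recognized."""
    ranges = []
    for _ in range(max_steps):
        op = mem[ip]
        if op == 0xEB:                               # jmp short rel8
            rel8 = mem[ip + 1]
            if rel8 >= 0x80:                         # signed displacement
                rel8 -= 0x100
            length = 2
            next_ip = ip + length + rel8
        elif op == 0xFF and mem[ip + 1] == 0x26:     # jmp word ptr [disp16]
            addr = mem[ip + 2] | (mem[ip + 3] << 8)  # little-endian address
            length = 4
            next_ip = mem[addr] | (mem[addr + 1] << 8)  # little-endian word
        elif op == 0x05:                             # add ax, imm16
            length = 3
            next_ip = ip + length
        else:
            break                                    # unknown to this sketch
        ranges.append((ip, ip + length - 1))
        ip = next_ip
    return ranges

mem = bytes([0xEB, 0xFF, 0x26, 0x05, 0x00, 0x03, 0x00])
print(trace(mem))  # [(0, 1), (1, 4), (3, 5)]
```

The three ranges overlap (byte 1 belongs to the first two instructions, bytes 3 and 4 to the last two), which is why no single linear listing can show all of them at once.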
In newer RISC architectures the length of instructions is fixed and they have to sit at aligned memory addresses, so it isn't possible to jump just anywhere, and the job of the debugger is easier than in the case of x86.