assembly - Why are disassembled data becoming instructions?


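I need to understand what happens at the moment when this fragment of code executes: "jmp begin". I understand that a .com file can be at most 64 KB and that everything has to go into one segment, so you need the jmp if you want to put variables there. When I search for it, many guides just say in a comment that jmp begin skips the data, and nothing else. So here is my question: what exactly happens at this moment: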

[Screenshot of the program in Turbo Debugger; image not available]

It appears that it runs this:

            mov     al, a
            mov     bl, b
            sub     al, bl

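But I can't understand why it looks like that in Turbo Debugger. When I change the starting value of result from ? to something greater than 0, the disassembly changes to something else, and when I change it to, for example, 90, it looks normal. I am new to assembly and I can't seem to grasp this at all. Here is the whole code: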

            .model tiny

code        segment

            org     100h
            assume  cs:code, ds:code

start:
            jmp     begin

a           equ     20
b           equ     10
c           equ     100
d           equ     5
result      db      ?

begin:
            mov     al, a
            mov     bl, b
            sub     al, bl
            mov     bl, c
            mul     bl
            mov     bl, d
            div     bl

            mov     result, al
            mov     ah, 4ch
            int     21h

code        ends
            end     start

I'll try to give you an explanation.

The problem is that in the old days (and this is partly still true today) processors didn't differentiate between code and data bytes in memory. This means that any byte in your .com file can be used both as code and as data. The debugger has no clue which bytes will be executed as code and which bytes will be used as data. In tricky cases a byte can even be used as both code and data... Your program can create data in memory that is valid code and then jump onto it to execute it.

In many (but not all) cases a debugger could find out what is code and what is data, but this code analysis can be so complex that most debuggers/disassemblers don't have such a code flow analyzer. For this reason they just pick an offset in your file/memory (usually the current instruction pointer) and, starting from that offset, decode a series of consecutive bytes into assembly instructions serially, without following jmp instructions, until the debugger screen is filled with enough disassembled lines. Dumb disassemblers/debuggers don't care whether the disassembled bytes are used as instructions or as data in your program; they treat all of them as instructions.

If you are debugging the program and the debugger stops at a breakpoint, it takes the current instruction pointer and performs the dumb disassembly again, starting from that offset, with the same primitive "fill the debugger screen" method.

This serial disassembly of consecutive bytes is a simple method that works most of the time. If you serially decode non-jmp instructions that follow each other, you can be sure that the processor will execute them in that order. However, once you reach and decode a jmp instruction, you can't be sure that the following bytes are valid code. You can still try to decode them as instructions, hoping that there is no data mixed into the middle of the code (and yes, in most cases there is no data after a jmp or a similar control flow instruction, which is why debuggers give you the dumb disassembly as a "possibly useful prediction"). In fact, most code is full of conditional jumps, and disassembling the bytes after them as code is very useful in a debugger. Having data in the middle of the code after a jump instruction is quite rare, so we can treat it as an edge case.

Let's assume you have a simple .com program that just jumps over some data and then exits with int 20h:

    jmp start
    db  90h
start:
    int 20h

The disassembler will tell you the following when disassembling from starting offset 0000:

--> 0000   eb 01        jmp short 0003
    0002   90           nop
    0003   cd 20        int 20h

Cool, this looks just like our asm source code... Now let's change the program a bit: let's change the data...

    jmp start
    db  0cdh
start:
    int 20h

Now the disassembler will show this:

--> 0000   eb 01        jmp short 0003
    0002   cd cd        int cdh
    0004   20 ...... whatever...

The problem is that some instructions consist of more than 1 byte, and the debugger doesn't care whether the bytes represent code or data for you. In the above example, if the disassembler serially disassembles the bytes from offset 0000 to the end of the program (including the data), then your 1 byte of data is disassembled into a 2-byte instruction ("stealing" the first byte of your actual code), so the next instruction the debugger tries to disassemble comes at offset 0004 instead of 0003, where your jmp actually jumps. In the first example we didn't have this problem because the data was disassembled into a 1-byte instruction, so, by accident, after disassembling the data part of the program the next instruction to disassemble was at offset 0003, which is exactly the target of your jmp.

However, what the debugger shows in this case is fortunately not what happens when the program gets executed. After executing one instruction the program actually jumps to offset 0003, and the debugger does the dumb disassembly again, this time starting from offset 0003, which was in the middle of an instruction in the previous, incorrect disassembly...
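To make the "dumb" serial method concrete, here is a minimal Python sketch of a linear-sweep disassembler. It is only an illustration under stated assumptions: the names `decode` and `linear_sweep` are made up for this example, and only the three opcodes used in the two sample programs (eb, cd, 90) are decoded; every other byte is printed as a raw `db`.

```python
# Toy linear-sweep disassembler (illustrative names, not a real tool).
# Only the opcodes used in the two sample programs are decoded.

def decode(code, off):
    """Return (length, text) for the instruction that starts at off."""
    op = code[off]
    if op == 0xEB and off + 1 < len(code):        # jmp short rel8
        rel = code[off + 1]
        rel = rel - 256 if rel >= 0x80 else rel   # rel8 is a signed byte
        return 2, f"jmp short {(off + 2 + rel) & 0xFFFF:04x}"
    if op == 0xCD and off + 1 < len(code):        # int imm8
        return 2, f"int {code[off + 1]:02x}h"
    if op == 0x90:                                # nop
        return 1, "nop"
    return 1, f"db {op:02x}h"                     # anything else: raw byte


def linear_sweep(code, start=0):
    """Decode consecutive bytes from start and never follow a jump."""
    lines = []
    off = start
    while off < len(code):
        length, text = decode(code, off)
        lines.append((off, text))
        off += length                             # blindly continue after it
    return lines


first = bytes([0xEB, 0x01, 0x90, 0xCD, 0x20])     # data byte is 90h
second = bytes([0xEB, 0x01, 0xCD, 0xCD, 0x20])    # data byte is cdh

for name, prog in (("first", first), ("second", second)):
    print(name)
    for off, text in linear_sweep(prog):
        print(f"  {off:04x}  {text}")

print("second, restarted at the jmp target")
for off, text in linear_sweep(second, start=3):
    print(f"  {off:04x}  {text}")
```

In the second program the sweep from 0000 swallows the first byte of `int 20h` as the operand of a bogus `int cdh`, while the sweep restarted at 0003 decodes the real instruction.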

Let's say you debug the second example program and execute its instructions one by one. When you start the program with instruction pointer == 0000, the debugger shows this:

--> 0000   eb 01        jmp short 0003
    0002   cd cd        int cdh
    0004   20 ...... whatever...

However, when you trigger a "step" command to execute one instruction, the instruction pointer (IP) changes to 0003 and the debugger performs the "dumb disassembling" again from offset 0003 until the debugger screen is filled, so you will see this:

--> 0003   cd 20      int 20h
    0005   ...... whatever...

Conclusion: if you have a dumb disassembler and you mix data into the middle of your code (with jmps around the data), then the dumb disassembler will treat your data as code, and this may cause the "minor" issue you've encountered.

A more advanced disassembler with flow analysis (like IDA Pro) would do the disassembling by following the jump instructions. After disassembling your jmp at offset 0000, it would find out that the next instruction to disassemble is the target of the jmp at 0003, and it would disassemble the int 20h as the next step. It would mark the db 0cdh byte at offset 0002 as data.
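The flow-following idea can be sketched in a few lines of Python. This is not how IDA Pro is implemented; it is a toy under stated assumptions: the names `decode` and `follow_flow` are hypothetical, only the opcodes eb, cd and 90 are understood, and `int 20h` is treated as the end of the program because it terminates a .com program.

```python
# Toy flow-following (recursive traversal) disassembler. Illustrative only.

def decode(code, off):
    """Return (length, text, successors) for the instruction at off."""
    op = code[off]
    if op == 0xEB and off + 1 < len(code):        # jmp short rel8
        rel = code[off + 1]
        rel = rel - 256 if rel >= 0x80 else rel   # rel8 is a signed byte
        target = (off + 2 + rel) & 0xFFFF
        return 2, f"jmp short {target:04x}", [target]   # only the target
    if op == 0xCD and off + 1 < len(code):        # int imm8
        imm = code[off + 1]
        # Assumption: int 20h ends the program, so nothing follows it.
        successors = [] if imm == 0x20 else [off + 2]
        return 2, f"int {imm:02x}h", successors
    if op == 0x90:                                # nop falls through
        return 1, "nop", [off + 1]
    raise ValueError(f"opcode {op:02x} is not handled by this sketch")


def follow_flow(code, entry=0):
    """Disassemble only what is reachable from entry; the rest is data."""
    instructions = {}
    covered = set()
    worklist = [entry]
    while worklist:
        off = worklist.pop()
        if off in instructions or not 0 <= off < len(code):
            continue
        length, text, successors = decode(code, off)
        instructions[off] = text
        covered.update(range(off, off + length))
        worklist.extend(successors)
    data = [o for o in range(len(code)) if o not in covered]
    return instructions, data


program = bytes([0xEB, 0x01, 0xCD, 0xCD, 0x20])   # jmp, db 0cdh, int 20h
instructions, data = follow_flow(program)
for off in sorted(instructions):
    print(f"{off:04x}  {instructions[off]}")
for off in data:
    print(f"{off:04x}  db {program[off]:02x}h   (never reached, so data)")
```

Because the byte at 0002 is never reached by any control flow path, it is reported as data instead of being glued onto the following byte.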

Additional explanation:

As you may have noticed, an instruction in the (by now quite outdated) 8086 instruction set can be anywhere between 1 and 6 bytes long, while a jmp or call can jump anywhere in memory with byte granularity. The length of an instruction can usually be determined from the first 1 or 2 bytes of the instruction. Bytes only "stick together" into an instruction when the processor targets the first byte of the instruction with its special IP (instruction pointer) register and tries to execute the bytes at that offset. Let's see a tricky example: you have the bytes eb ff 26 05 00 03 00 in memory at offset 0000 and you execute them step by step.

--> 0000   eb ff        jmp short 0001
    0002   26 05 00 03  es: add ax, 300h
    0006   00 ...... whatever...

The processor's instruction pointer (IP) points to offset 0000, so it decodes the instruction there, and the bytes there "stick together into an instruction" for the time of execution. (The processor performs instruction decoding at 0000.) Since the first byte is eb, it knows that the instruction length is 2 bytes. The debugger knows this too, so it decodes the instruction and then generates some additional buggy disassembly based on the incorrect assumption that at some point the processor will execute an instruction at offset 0002, then at offset 0006, etc... As you will see, this isn't true: the processor will stick bytes together into instructions at quite different offsets.

As you see, our tricky byte code contains a jmp that jumps to offset 0001, into the middle of the currently executed jmp instruction itself! But this isn't a problem at all. The processor doesn't care and happily jumps to offset 0001, and in the next step it tries to decode an instruction (or "stick bytes together") there. Let's see what kind of instruction the processor finds at 0001:

--> 0001   ff 26 05 00  jmp word ptr [5]
    0005   03 00        add ax, word ptr [bx+si]

As you see, we have our next instruction at 0001, and the debugger again shows some garbage disassembly at offset 0005, based on the false assumption that the processor will reach that offset at some point...

The instruction at 0001 tells the processor to pick up a word from offset 0005 and interpret it as the offset to jump to. As you see, the value of word ptr [5] is 3 (as a little-endian 16-bit value), so the processor puts 3 into the IP register (it jumps to 0003). Let's see what it finds at offset 0003:

--> 0003   05 00 03     add ax, 300h 

It is difficult to show the disassembly of the tricky byte code eb ff 26 05 00 03 00 in the style of the debugger because the actual instructions executed by the processor are in overlapping memory areas. First the processor executed bytes 0000-0001, then 0001-0004, and then 0003-0005.
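The three steps can be replayed with a tiny Python sketch that follows the instruction pointer instead of sweeping the bytes serially. It is a toy, not an 8086 emulator: `step` and `trace` are made-up names, only the three instructions that actually get executed are implemented, and memory operands are assumed to address the same 7-byte buffer (as in a .com program where ds equals cs).

```python
# Toy trace of the overlapping byte code. Illustrative names only.

MEM = bytes([0xEB, 0xFF, 0x26, 0x05, 0x00, 0x03, 0x00])


def step(mem, ip, ax):
    """Execute the instruction at ip; return (text, length, next_ip, ax)."""
    op = mem[ip]
    if op == 0xEB:                                 # jmp short rel8
        rel = mem[ip + 1]
        rel = rel - 256 if rel >= 0x80 else rel    # rel8 is a signed byte
        nxt = (ip + 2 + rel) & 0xFFFF
        return f"jmp short {nxt:04x}", 2, nxt, ax
    if op == 0xFF and mem[ip + 1] == 0x26:         # jmp word ptr [disp16]
        addr = mem[ip + 2] | (mem[ip + 3] << 8)    # little-endian address
        nxt = mem[addr] | (mem[addr + 1] << 8)     # little-endian target
        return f"jmp word ptr [{addr:x}]", 4, nxt, ax
    if op == 0x05:                                 # add ax, imm16
        imm = mem[ip + 1] | (mem[ip + 2] << 8)
        return f"add ax, {imm:x}h", 3, ip + 3, (ax + imm) & 0xFFFF
    raise ValueError(f"opcode {op:02x} is not handled by this sketch")


def trace(mem, ip=0, steps=3):
    """Run a few steps and record which byte range each instruction used."""
    ax = 0
    executed = []
    for _ in range(steps):
        text, length, nxt, ax = step(mem, ip, ax)
        executed.append((ip, ip + length - 1, text))
        ip = nxt
    return executed


for first, last, text in trace(MEM):
    print(f"{first:04x}-{last:04x}  {text}")
```

The recorded ranges overlap: byte 0001 belongs to both jumps, and bytes 0003-0004 are first the operand of the indirect jmp and then the start of the add.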

In some newer RISC architectures the length of instructions is fixed and they have to sit at aligned memory addresses, so it isn't possible to jump just anywhere, and the job of the debugger is much easier than in the case of x86.

