Scot Matson - 8 days ago
Linux Question

Disassembling and Reassembling, how to properly pipeline this in the terminal?

I'm using the eicar.com file and playing around with reverse engineering tools. I'd like to be able to disassemble and reassemble this file. I get close but there are still a few problems that I cannot figure out.

This is the original eicar.com ASCII file:

X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*


Using udcli,
udcli -noff -nohex eicar.com > stage1.asm
I end up with this x86 assembly:

pop eax
xor eax, 0x2550214f
inc eax
inc ecx
push eax
pop ebx
xor al, 0x5c
push eax
pop edx
pop eax
xor eax, 0x5e502834
sub [edi], esi
inc ebx
inc ebx
sub [edi], esi
jge 0x40
inc ebp
dec ecx
inc ebx
inc ecx
push edx
sub eax, 0x4e415453
inc esp
inc ecx
push edx
inc esp
sub eax, 0x49544e41
push esi
dec ecx
push edx
push ebp
push ebx
sub eax, 0x54534554
sub eax, 0x454c4946
and [eax+ecx*2], esp
sub ecx, [eax+0x2a]


Finally, putting it back together with nasm using this command,
nasm stage1.asm -o stage2
I end up with...

fXf5O!P%f@fAfPf[4\fPfZfXf54(P^fg)7fCfCfg)7^O<8d>^R^@fEfIfCfAfRf- STANfDfAfRfDf-ANTIfVfIfRfUfSf-TESTf-FILEfg!$Hfg+H*


In this case I'm starting with an ASCII file and end up with a bin file that holds a lot of extra garbage.

What am I missing here? How do I end up with the original ASCII string and have the proper file type?

EDIT:
Per @Ross Ridge's suggestion: he noted that I was disassembling a 16-bit file as a 32-bit one. Fixing that has cleaned up the string, but the file type is still incorrectly reported as binary.

First fix:
udcli -16 -noff -nohex eicar.com > stage1.asm
to obtain proper output string.

Results in
X5O!P%@AP[4\PZX54(P^)7CC)7^O<8d>"^@EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*


Still a little garbage data not present in the original but very close.

Answer

In general you can't reassemble the output of a disassembler back into exactly the same binary file as the original. There is often more than one way to assemble a given assembly instruction into machine code. It's also not very helpful as far as your ultimate goal of understanding the code is concerned. Even if you do get something that you can assemble back into the original code, it's extremely unlikely you'll get something you can modify and assemble into code that works.
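
The leftover bytes in your 16-bit output look like an instance of this. The original file encodes the conditional jump as the two bytes }$ (7D 24, the short form with an 8-bit displacement), while your reassembled file contains 0F 8D 22 00, which is the near form of the same jump with a 16-bit displacement. The 0x8D byte is presumably also why the result is no longer recognised as plain ASCII. The Python sketch below is only an illustration and not part of any tool: the helper name jge_target and the offset 0x1A (where the jump sits in eicar.com, counted by hand from the string) are my own assumptions. It decodes both encodings and shows they reach the same target, the 0x40 that udcli printed:

```python
import struct

def jge_target(code: bytes, offset: int) -> int:
    """Return the file offset that a 16-bit JGE located at `offset` jumps to."""
    if code[offset] == 0x7D:
        # Short form: 7D rel8 (2 bytes long)
        (rel,) = struct.unpack_from("<b", code, offset + 1)
        length = 2
    elif code[offset:offset + 2] == b"\x0f\x8d":
        # Near form: 0F 8D rel16 (4 bytes long in 16-bit code)
        (rel,) = struct.unpack_from("<h", code, offset + 2)
        length = 4
    else:
        raise ValueError("not a JGE instruction")
    # The displacement is relative to the end of the instruction.
    return (offset + length + rel) & 0xFFFF

JGE_OFFSET = 0x1A  # assumed position of the jump inside eicar.com

# Zero padding stands in for the bytes before the jump.
short_form = bytes(JGE_OFFSET) + b"\x7d\x24"          # as in the original file
near_form = bytes(JGE_OFFSET) + b"\x0f\x8d\x22\x00"   # as in the reassembled file

print(hex(jge_target(short_form, JGE_OFFSET)))  # 0x40
print(hex(jge_target(near_form, JGE_OFFSET)))   # 0x40
```

Both byte sequences are valid encodings of the same instruction, so a disassembly listing alone doesn't tell the assembler which one the original file used.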

To illustrate this I've provided my own "disassembly" of the eicar.com file, one that allows it to be modified to a limited extent. You can modify the string it prints, so long as the message isn't too long and doesn't contain any dollar sign $ characters. You should be able to modify the string while still keeping the output consisting only of printable ASCII characters, assuming you only put printable ASCII characters in the string.

    BITS    16
    ORG     0x100

ascii_shift EQU 0x097b

start:
    pop     ax
    xor     ax, 0x2000 | (skip - start + 0x100) | 0x000f
    push    ax
    and     ax, 0x4000 | (skip - start + 0x100)
    push    ax
    pop     bx
    xor     al, (msg - start) ^ (skip - start)
    push    ax
    pop     dx
    pop     ax
    xor     ax, (0x2000 | (skip - start + 0x100) | 0x000f) ^ ascii_shift
    push    ax
    pop     si
    sub     [bx], si
    inc     bx
    inc     bx
    sub     [bx], si
    jnl     skip

msg:
    DB      'EICAR-STANDARD-ANTIVIRUS-TEST-FILE!'
    DB      '$'

%if ($ - msg) < 0x21
    TIMES   0x21 - ($ - msg) DB '$'
%endif

skip:
    DW      0x21cd + ascii_shift
    DW      0x20cd + ascii_shift

%if skip - msg > 0x7e
%error  'msg too long'
%endif
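
The limits on the message can be checked before assembling. This is a minimal sketch in Python that mirrors the conditions visible in the listing (the '$' terminator, the padding of the message area to 0x21 bytes, and the 'msg too long' check at 0x7e). The function names check_message and msg_area_size are hypothetical, and passing these checks only reflects my reading of the listing, not a guarantee that the assembled file works.

```python
MIN_MSG_AREA = 0x21  # the listing pads the message area with '$' up to this size
MAX_MSG_AREA = 0x7E  # the listing reports 'msg too long' above this size

def msg_area_size(message: str) -> int:
    """Size of the region between msg and skip for a given message text."""
    # One extra byte for the '$' terminator, then padding to the minimum size.
    return max(len(message) + 1, MIN_MSG_AREA)

def check_message(message: str) -> list:
    """Return a list of reasons the message would not fit the listing's limits."""
    problems = []
    if "$" in message:
        problems.append("contains '$'")
    if any(not (0x20 <= ord(ch) <= 0x7E) for ch in message):
        problems.append("contains non-printable or non-ASCII characters")
    if msg_area_size(message) > MAX_MSG_AREA:
        problems.append("too long")
    return problems

print(check_message("EICAR-STANDARD-ANTIVIRUS-TEST-FILE!"))  # []
print(check_message("costs $5"))                             # ["contains '$'"]
```

With these limits the message itself can be at most 0x7d (125) characters, since one byte is taken by the '$' terminator.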

I won't explain how the code works, but I'll give you one hint: MS-DOS pushes a 16-bit 0 value on the stack at the start of execution of a .COM format executable.