Wkhan Wkhan - 1 month ago 21
C Question

C to Y86 Assembly

I am trying to understand this code in order to translate it into Y86 assembly.
Can someone please answer the questions written in parentheses?

/* copy_block - Copy src to dest and return xor checksum of src */
long copy_block(long *src, long *dest, long len) //(first two input arguments will be stored in %RDI and %RSI; where will the third argument be stored?)
{
    long result = 0;
    while (len > 0) {
        long val = *src++; //(is this dereferencing first and then adding, or the opposite?)
        *dest++ = val; //(what is this line doing?)
        result ^= val; //(what is a checksum and why are we XORing val with the result of the previous XORs?)
        len--;
    }
    return result;
}


Sample input

.align 8
# Source block
src:
.quad 0x00a
.quad 0x0b0
.quad 0xc00
# Destination block
dest:
.quad 0x111 #(why are there three sources and just one dest?)

Answer

Y86 is not a "real" architecture, but one designed for educational purposes, I believe. I found references to it here. It does seem to use AT&T syntax (i.e. source, destination operand order).

first two input arguments will be stored in %RDI and %RSI; where will the third argument be stored?

Check the Y86 ABI, probably in your lecture notes. If it matches the System V AMD64 ABI, the third argument would be in the %rdx register (see the last row in the table).

If you have a C compiler that produces Y86 assembly, then write a trivial program that takes three parameters, and returns a trivial combination -- say, three ints that it sums together and returns the result. Then see which registers are used to produce the result.
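A minimal sketch of such a probe function follows. The name `combine3` and the weights are made up for illustration. It weights each argument differently rather than using a plain sum, on the assumption that this makes it easier to tell the three registers apart in the generated assembly.

```c
/* Hypothetical probe function, not part of the original assignment.
   Each argument gets a different weight so that the generated
   assembly shows which register carries which parameter. */
long combine3(long a, long b, long c)
{
    return a + 2 * b + 4 * c;
}
```

If no Y86 compiler is available, compiling this to assembly with an x86-64 compiler (for example with the `-S` option of gcc) and reading the output gives a hint, assuming your Y86 variant follows the same calling convention.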

long val = *src++;
*dest++ = val;

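Postfix ++ binds more tightly than unary *, so *src++ parses as *(src++). The postfix expression yields the pointer's old value, which is what gets dereferenced; the pointer itself is incremented as a side effect. So, the above are equivalent to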

long val = *src;
src++;

and

*dest = val;
dest++;
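To check the equivalence, here is a small sketch with both forms side by side. The function names `copy_compact` and `copy_expanded` are made up for this example; only the loop bodies come from the question.

```c
#include <stddef.h>

/* Copy using the compact post-increment form from the question. */
void copy_compact(const long *src, long *dest, size_t len)
{
    while (len > 0) {
        long val = *src++;   /* read *src, then advance src */
        *dest++ = val;       /* write *dest, then advance dest */
        len--;
    }
}

/* Same copy with each post-increment split into its own statement. */
void copy_expanded(const long *src, long *dest, size_t len)
{
    while (len > 0) {
        long val = *src;
        src++;
        *dest = val;
        dest++;
        len--;
    }
}
```

The expanded form maps more directly onto assembly: one load, one pointer add, one store, one pointer add.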

what is a checksum?

It is a small number used to detect changes in chunks of data, due to transmission errors or similar. See the Wikipedia article on checksum for details.

why are we XORing val with sum of previous XORed values?

There are many different ways to calculate a checksum; this particular one uses XOR.

XOR was probably chosen for simplicity. Real-world checksum algorithms use either lookup tables, or bit shifts, XORs, and (unsigned modulo) addition and multiplication. Those would involve a lot more work to translate to assembly by hand.

With checksums, the idea is that you send or store a chunk of data, and the checksum. The recipient or reader can then recalculate the checksum, and compare it to the stored checksum. If the two checksums differ, the data may have an error.
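The store-then-verify idea can be sketched as follows. The helper names `xor_checksum` and `block_is_intact` are invented for this example; the checksum loop mirrors the XOR accumulation in `copy_block`.

```c
#include <stddef.h>

/* XOR all words together, as copy_block does while copying. */
long xor_checksum(const long *data, size_t len)
{
    long result = 0;
    for (size_t i = 0; i < len; i++)
        result ^= data[i];
    return result;
}

/* Returns 1 if the recomputed checksum matches the stored one,
   0 if they differ (the data may have an error). */
int block_is_intact(const long *data, size_t len, long stored)
{
    return xor_checksum(data, len) == stored;
}
```

For the sample input, the three quads 0x00a, 0x0b0, and 0xc00 have no bits in common, so their XOR is 0xcba.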

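Checksums are not foolproof. It is quite possible for two chunks of data to be clearly different and yet have matching checksums. With a plain XOR, for example, two errors that flip the same bit in two different words cancel each other out.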
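As an illustration of such a collision, the following sketch reports whether two blocks differ in content but still share an XOR checksum. The function name `xor_collision` is hypothetical.

```c
#include <stddef.h>

/* Returns 1 when blocks a and b differ in at least one word
   but their XOR checksums are equal; 0 otherwise. */
int xor_collision(const long *a, const long *b, size_t len)
{
    long sum_a = 0, sum_b = 0;
    int differ = 0;
    for (size_t i = 0; i < len; i++) {
        sum_a ^= a[i];
        sum_b ^= b[i];
        if (a[i] != b[i])
            differ = 1;
    }
    return differ && sum_a == sum_b;
}
```

Comparing the sample block {0x00a, 0x0b0, 0xc00} against {0x00b, 0x0b1, 0xc00}, where the lowest bit is flipped in the first two words, is one such case: both XOR to 0xcba. Reordering the words also goes undetected, since XOR does not depend on order.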

Checksums also do not identify where or what the error is. Error-identifying "checksums" are more properly called error correction codes. They are commonly used in e.g. CD, DVD, and Blu-ray media; most hard drives also internally maintain checksums of the data they've written, so that they can detect errors when reading it back. See the Wikipedia article on Error detection and correction for further information on checksums and error correction codes.

why are there three sources and just one dest?

The function copies len quads from src to dest, so source and destination should be of equal length.

The sample input could be a bug, but I think it is more likely that your lecture notes mention somewhere that the sample input is assumed to be followed by a sufficient number of undefined bytes, i.e. that the number of quads following the final label is not indicative of anything. In other words, the lecturer/TA/whoever could just as well have supplied the sample input as

    .align 8
    # Source block
    src:
        .quad 0x00a
        .quad 0x0b0
        .quad 0xc00
    # Destination block
    dest:

since the contents of the destination are overwritten anyway and so do not matter. Y86 assembly does not seem to support directives like .quad ?, which would make the memory reservation explicit.
