PatB - 7 months ago
Perl Question

Pack/unpack binary string in perl

I am attempting to understand a fragment of Perl code. I think its purpose is to make a binary string from an input integer, but in reversed bit order (low bit on the left, high bit on the right). However, I do not understand what pack/unpack is doing to the input values; the output appears to be incorrect.

Consider this test code:

for (my $i = 0; $i < 16; $i++) {
    for (my $j = 0; $j < 16; $j++) {
        my $x = $i * 16 + $j;
        $x = unpack("b8", pack("U", $x));
        print $x;
        print " ";
    }
    print "\n";
}


This produces:

00000000 10000000 01000000 11000000 00100000 10100000 01100000 11100000 00010000 10010000 01010000 11010000 00110000 10110000 01110000 11110000
00001000 10001000 01001000 11001000 00101000 10101000 01101000 11101000 00011000 10011000 01011000 11011000 00111000 10111000 01111000 11111000
00000100 10000100 01000100 11000100 00100100 10100100 01100100 11100100 00010100 10010100 01010100 11010100 00110100 10110100 01110100 11110100
00001100 10001100 01001100 11001100 00101100 10101100 01101100 11101100 00011100 10011100 01011100 11011100 00111100 10111100 01111100 11111100
00000010 10000010 01000010 11000010 00100010 10100010 01100010 11100010 00010010 10010010 01010010 11010010 00110010 10110010 01110010 11110010
00001010 10001010 01001010 11001010 00101010 10101010 01101010 11101010 00011010 10011010 01011010 11011010 00111010 10111010 01111010 11111010
00000110 10000110 01000110 11000110 00100110 10100110 01100110 11100110 00010110 10010110 01010110 11010110 00110110 10110110 01110110 11110110
00001110 10001110 01001110 11001110 00101110 10101110 01101110 11101110 00011110 10011110 01011110 11011110 00111110 10111110 01111110 11111110
01000011 01000011 01000011 01000011 01000011 01000011 01000011 01000011 01000011 01000011 01000011 01000011 01000011 01000011 01000011 01000011
01000011 01000011 01000011 01000011 01000011 01000011 01000011 01000011 01000011 01000011 01000011 01000011 01000011 01000011 01000011 01000011
01000011 01000011 01000011 01000011 01000011 01000011 01000011 01000011 01000011 01000011 01000011 01000011 01000011 01000011 01000011 01000011
01000011 01000011 01000011 01000011 01000011 01000011 01000011 01000011 01000011 01000011 01000011 01000011 01000011 01000011 01000011 01000011
11000011 11000011 11000011 11000011 11000011 11000011 11000011 11000011 11000011 11000011 11000011 11000011 11000011 11000011 11000011 11000011
11000011 11000011 11000011 11000011 11000011 11000011 11000011 11000011 11000011 11000011 11000011 11000011 11000011 11000011 11000011 11000011
11000011 11000011 11000011 11000011 11000011 11000011 11000011 11000011 11000011 11000011 11000011 11000011 11000011 11000011 11000011 11000011
11000011 11000011 11000011 11000011 11000011 11000011 11000011 11000011 11000011 11000011 11000011 11000011 11000011 11000011 11000011 11000011


So, what is going on here? It seems that all the 'high ASCII' values (128 and above) are incorrectly converted, but despite reading the documentation for pack and unpack I cannot see why.

Answer

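pack's U template encodes the value as a UTF-8 character, which may or may not be one byte. For values of 128 and above the encoding takes more than one byte, and b8 only shows the bits of the first one. (Read from right to left, your output for those values begins 110, which marks the lead byte of a two-byte UTF-8 sequence: 0xC2 for 128 to 191 and 0xC3 for 192 to 255.)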
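To see where the repeated 01000011 and 11000011 patterns come from, it can help to look at the UTF-8 bytes directly. The following is a minimal sketch that assumes the core Encode module is available; the helper names utf8_hex and first_byte_bits are made up for illustration and are not part of the original code.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: the UTF-8 bytes of a code point as a hex string.
sub utf8_hex {
    my ($code_point) = @_;
    my $bytes = encode('UTF-8', chr($code_point));   # plain byte string
    return join ' ', map { sprintf '%02X', ord($_) } split //, $bytes;
}

# Hypothetical helper: bits of the first UTF-8 byte, low bit first,
# which is the order that the "b8" template reports.
sub first_byte_bits {
    my ($code_point) = @_;
    my $bytes = encode('UTF-8', chr($code_point));
    return unpack('b8', $bytes);
}

for my $n (127, 128, 191, 192, 255) {
    printf "%3d -> %-5s first byte as b8: %s\n",
        $n, utf8_hex($n), first_byte_bits($n);
}
```

Only 127 fits in a single byte here. Every value from 128 to 191 shares the lead byte C2, and every value from 192 to 255 shares C3, which is why each of the last eight rows of the output repeats one pattern.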

From the documentation:

U - A Unicode character number. Encodes to a character in character mode and UTF-8 (or UTF-EBCDIC in EBCDIC platforms) in byte mode.

You should use the C template instead, which always produces exactly one byte for values from 0 to 255:

C - An unsigned char (octet) value.

That gives us:

for ( my $i = 0; $i < 16; $i++ ) {
    for ( my $j = 0; $j < 16; $j++ ) {
        my $x = $i * 16 + $j;
        $x = unpack("b8", pack("C", $x));   # C packs exactly one byte
        print $x;
        print " ";
    }
    print "\n";
}
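As a sanity check on the C version, the result can be compared against a formulation that does not use pack at all. This is only a sketch: the helper names bits_lsb_first and bits_lsb_first_sprintf are invented for this example, and it assumes the inputs are integers from 0 to 255.

```perl
use strict;
use warnings;

# Bit string of one byte, low bit first, via pack/unpack.
sub bits_lsb_first {
    my ($value) = @_;
    return unpack('b8', pack('C', $value));
}

# Same result without pack: format high bit first, then reverse the string.
sub bits_lsb_first_sprintf {
    my ($value) = @_;
    return scalar reverse sprintf('%08b', $value);
}

# The two approaches should agree for every single-byte value.
for my $value (0 .. 255) {
    die "mismatch at $value"
        unless bits_lsb_first($value) eq bits_lsb_first_sprintf($value);
}

# 200 is 11001000 in binary, so the reversed form is 00010011.
print bits_lsb_first(200), "\n";
```

If the conventional order (high bit on the left) is wanted instead, the uppercase B8 template gives it: unpack('B8', pack('C', 200)) yields 11001000.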