I suppose I am using LWP::Simple::get incorrectly, but I am at my wit's end as to how to correct it. My first try was a simple
perl -e 'use LWP::Simple; print get("http://localhost/wtf.txt");'
Content-Type: text/plain; charset=utf-8
This is a bug in your code.
LWP::Simple::get doesn't return the original bytes (in some encoding); it returns decoded text (i.e. Unicode). That makes sense: if it returned bytes, you wouldn't know how to decode them, because get doesn't tell you the encoding.
get("http://localhost/wtf.txt") returns a string containing the codepoint U+00f6.
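To see why that matters, here is a minimal sketch that needs no web server. It assumes the file holds the UTF-8 bytes C3 B6 and uses Encode::decode as a stand-in for the decoding that get does; the in-memory filehandle and the hex_bytes helper are only there for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Stand-in for get(): the server sends the UTF-8 bytes C3 B6 and the
# caller receives them decoded into a single character, U+00F6.
my $bytes_on_wire = "\xC3\xB6";
my $text          = decode('UTF-8', $bytes_on_wire);

# Print to an in-memory handle that has no encoding layer,
# which is how STDOUT behaves by default.
my $buffer = '';
open my $fh, '>', \$buffer or die "cannot open in-memory handle: $!";
print {$fh} $text;
close $fh or die "close failed: $!";

printf "wire: %s, text: U+%04X, printed without a layer: %s\n",
    hex_bytes($bytes_on_wire), ord($text), hex_bytes($buffer);

# Hypothetical helper: render a byte string as space-separated hex.
sub hex_bytes { return join ' ', map { sprintf '%02X', ord } split //, shift }
```

Without an encoding layer the character goes out as the single byte F6, which is not valid UTF-8 on its own.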
If you want UTF-8 output, call binmode STDOUT, ":encoding(UTF-8)"; first. That ensures all text written to STDOUT is encoded as UTF-8.
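As a rough sketch of what the encoding layer does, the snippet below writes the same character through a UTF-8 layer into an in-memory buffer so the resulting bytes can be inspected, then applies the same layer to STDOUT. The literal string stands in for the return value of get; in the real script you would print get("http://localhost/wtf.txt") after the binmode call.

```perl
use strict;
use warnings;

# Stand-in for the decoded text returned by get(): one character, U+00F6.
my $text = "\x{f6}";

# Print through a UTF-8 layer into an in-memory buffer to inspect the bytes.
my $buffer = '';
open my $fh, '>:encoding(UTF-8)', \$buffer
    or die "cannot open in-memory handle: $!";
print {$fh} $text;
close $fh or die "close failed: $!";

# The same layer on STDOUT: everything printed from here on is UTF-8 encoded.
binmode STDOUT, ':encoding(UTF-8)';
printf "%d character became %d bytes: %s\n",
    length($text), length($buffer),
    join ' ', map { sprintf '%02X', ord } split //, $buffer;
print $text, "\n";
```

The buffer ends up holding the two bytes C3 B6, which is the UTF-8 encoding of U+00F6.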
On the other hand, if you want to ignore encodings and just write the bytes that you received from the web server, then LWP::Simple is the wrong choice. Use LWP::UserAgent instead and call
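A minimal sketch of the raw-bytes route, assuming the body is read with the response object's content method (as opposed to decoded_content). To keep it runnable without a server, an HTTP::Response is built by hand here; with a live server the same kind of object would come from LWP::UserAgent->new->get($url).

```perl
use strict;
use warnings;
use HTTP::Response;

# Stand-in for the object returned by LWP::UserAgent's get():
# a UTF-8 response whose body is the two bytes C3 B6 (U+00F6).
my $res = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/plain; charset=utf-8' ],
    "\xC3\xB6",
);

my $bytes = $res->content;            # body as sent: undecoded bytes
my $text  = $res->decoded_content;    # body decoded according to the charset

printf "content: %d byte(s), decoded_content: %d character(s)\n",
    length($bytes), length($text);

# No encoding layer on STDOUT: the bytes pass through unchanged.
binmode STDOUT;
print $bytes, "\n";
```

The design choice is simply which of the two you print: bytes go to a handle without an encoding layer, text goes to a handle with one.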
The truncation in your second example is probably due to
unpack, which don't make sense on Unicode strings (they're meant for byte strings, i.e. all codepoints <= 255).