On my site I allow direct text file uploads. These files are then stored on the server and displayed on the website. I use UTF-8 on the site.
Now I run into trouble when people upload non-
You cannot. Welcome to encodings.
Seriously though, files are just binary blobs. The bits and bytes in a file could mean anything at all: images, CAD data or, perhaps, text. It depends on how you interpret the bytes. For text files, that specifically means which encoding you interpret them with. There's nothing in the files themselves that tells you the correct encoding; you have to know it. Typically you want to know it from metadata accompanying the file. In the case of random user uploads, though, there is no metadata, and/or it wouldn't be reliable. So you cannot "know".
The next step would be to guess, but that is obviously not foolproof. You can rule out certain encodings: for example, if a file does not validate as UTF-8 (mb_check_encoding($data, 'UTF-8') == false), then it cannot be UTF-8. However, text in one single-byte encoding will generally also validate as any other single-byte encoding. It's impossible to distinguish ISO-8859-1 from ISO-8859-2 this way, because the bytes are equally valid in both. It's just that the characters that show up may not be the ones you want. To detect that automatically you need a statistical language analyser that can tell you that a given character probably shouldn't show up in a given word. Obviously, for that to work you need to know the language used in the file, or you need to detect that first… And even then this is hardly foolproof.
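To make the "rule out, but cannot confirm" point concrete, here is a minimal sketch, assuming the mbstring extension is loaded. The helper name plausibleEncodings() and the candidate list are made up for illustration; only mb_check_encoding() and mb_convert_encoding() come from PHP itself.

```php
<?php
// Hypothetical helper (not part of PHP): return the candidate encodings
// in which the given bytes are at least *valid*. Valid does not mean correct.
function plausibleEncodings(string $data, array $candidates): array
{
    $plausible = [];
    foreach ($candidates as $encoding) {
        if (mb_check_encoding($data, $encoding)) {
            $plausible[] = $encoding;
        }
    }
    return $plausible;
}

// A lone 0xE9 byte is not valid UTF-8, so UTF-8 can be ruled out.
// Both ISO-8859 variants accept the byte, so validation cannot pick one.
$data = "caf\xE9";
$stillPossible = plausibleEncodings($data, ['UTF-8', 'ISO-8859-1', 'ISO-8859-2']);

// The same byte decodes to different characters depending on the assumption:
// 0xF5 is U+00F5 (õ) in ISO-8859-1 but U+0151 (ő) in ISO-8859-2.
$asLatin1 = mb_convert_encoding("\xF5", 'UTF-8', 'ISO-8859-1');
$asLatin2 = mb_convert_encoding("\xF5", 'UTF-8', 'ISO-8859-2');
```

Here $stillPossible ends up containing both ISO-8859 names, which is exactly the ambiguity described above: validation narrows the field but cannot settle it.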
The sanest way is to ask the user. Accept the upload, perhaps do some upfront testing on which encodings can be ruled out, then ask the user which of a bunch of possible encodings the file is in. Show them the result, that is, what the file looks like when interpreted as the chosen encoding, and let the user confirm that it looks alright. Many decent text editors do this when you open a file with an ambiguous encoding.
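One possible way to prepare such a confirmation step is sketched below, again assuming mbstring is available. The function name encodingPreviews(), the candidate list and the 200-character preview length are illustrative assumptions, not an established API.

```php
<?php
// Hypothetical helper: for each candidate encoding the bytes are valid in,
// build a short UTF-8 preview the user can look at and confirm.
function encodingPreviews(string $data, array $candidates, int $maxChars = 200): array
{
    $previews = [];
    foreach ($candidates as $encoding) {
        // Skip encodings that can be ruled out outright.
        if (!mb_check_encoding($data, $encoding)) {
            continue;
        }
        // Convert first, then truncate by characters, so that no
        // multi-byte sequence is cut in half.
        $utf8 = mb_convert_encoding($data, 'UTF-8', $encoding);
        $previews[$encoding] = mb_substr($utf8, 0, $maxChars, 'UTF-8');
    }
    return $previews;
}

// Example: one ambiguous byte yields two different previews to choose from.
$previews = encodingPreviews("\xF5", ['UTF-8', 'ISO-8859-1', 'ISO-8859-2']);
```

When rendering the previews in a page, escape them with htmlspecialchars() like any other user-supplied text, and store the file as UTF-8 only after the user has confirmed a choice.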