Use ICU for dealing with your data (or a similar library)
In your own data store, make sure everything is stored in the same encoding
Make sure you are always using your unicode library for mundane tasks like string length, capitalization status, etc. Never use standard library builtins like is_alpha unless that is the definition you want.
I can't say it enough: never iterate over the indices of a string if you care about correctness, always use your unicode library for this.