linodh linodh - 16 days ago 5
C# Question

How to create an MS-DOS txt file using C# (CP_OEM vs CP_ACP)

I am trying to create a flat file for a legacy system that mandates the data be presented in the text encoding of an MS-DOS .txt file (Text Document - MS-DOS Format, CP_OEM). I am a bit confused about files generated with the UTF8Encoding class in C#; I think it produces a file in the default .txt format (Encoding: CP_ACP).

I think the encoding names CP_ACP, Windows and ANSI refer to the same thing, that the Windows default is ANSI, and that it will omit any Unicode character information.

If I use the UTF8Encoding class in the C# library to create a text file, is it going to be in the MS-DOS .txt file format?

Does the following C# code, which gets the bytes of a string using UTF-8 encoding, produce MS-DOS format content?

byte[] title = new UTF8Encoding(true).GetBytes("New Text File");


I read the following posts to check my understanding, but found nothing conclusive yet.
https://blogs.msdn.microsoft.com/oldnewthing/20120220-00?p=8273

https://blog.mh-nexus.de/2015/01/character-encoding-confusion

https://blogs.msdn.microsoft.com/oldnewthing/20090115-00?p=19483

Answer

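You can use the File.ReadXY(String, Encoding) and File.WriteXY(String, ..., Encoding) methods. For reading, XY is AllLines, Lines or AllText, returning string[], IEnumerable<string> and string respectively. For writing, XY is AllLines (taking a string[] or an IEnumerable<string>) or AllText (taking a string); there is no File.WriteLines.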
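
As a rough sketch of how these methods combine, the following converts an existing UTF-8 file into an OEM code page file and reads it back. It assumes code page 850 as the target (only one plausible choice), C# 9 top-level statements, and two hypothetical file names in the temp directory. The RegisterProvider call is needed on .NET Core / .NET 5+; on .NET Framework the code pages are built in and that line can be dropped.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

// Makes legacy code pages (437, 850, 1252, ...) available on .NET Core / .NET 5+.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Assumption: the legacy system expects code page 850.
Encoding oem = Encoding.GetEncoding(850);

// Hypothetical file names, used only for this illustration.
string utf8Path = Path.Combine(Path.GetTempPath(), "input-utf8.txt");
string oemPath = Path.Combine(Path.GetTempPath(), "output-oem.txt");

// Create a small UTF-8 input file (no BOM) containing accented letters.
File.WriteAllLines(utf8Path, new[] { "caf\u00E9", "na\u00EFve" }, new UTF8Encoding(false));

// Read as UTF-8, then write the same lines in the OEM code page.
string[] lines = File.ReadAllLines(utf8Path, Encoding.UTF8);
File.WriteAllLines(oemPath, lines, oem);

// Reading back with the same code page should give the original text.
string[] roundTrip = File.ReadAllLines(oemPath, oem);
Console.WriteLine(lines.SequenceEqual(roundTrip));
```

Characters that do not exist in the target code page cannot survive this conversion, so it is worth checking the round trip with realistic data.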

MS-DOS uses different code pages. Code page 850 ("Western European / Latin-1") or code page 437 ("OEM-US / OEM / PC-8 / DOS Latin US", as @HansPassant suggests) will probably be okay. If you are not sure which code page you need, create example files containing letters like ä, ö, ü, é, è, ê, ç, à or Greek letters and see whether they work. If you don't use such letters or other special characters, the choice of code page is not very critical.

File.WriteAllText(path, "Hello World", Encoding.GetEncoding(850));
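
A slightly fuller sketch of the same call, for checking what actually lands on disk. Encoding is an abstract class, so the instance comes from Encoding.GetEncoding rather than a constructor. The file name, the sample text and the CRLF line ending are assumptions made for illustration, and the RegisterProvider line is only needed on .NET Core / .NET 5+.

```csharp
using System;
using System.IO;
using System.Text;

// Needed on .NET Core / .NET 5+, where code page 850 is not available
// until the code-pages provider is registered.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

Encoding oem850 = Encoding.GetEncoding(850);

// Hypothetical output location, used only for this illustration.
string path = Path.Combine(Path.GetTempPath(), "legacy-sample.txt");

// "Grüße" followed by CRLF; CRLF is assumed because DOS-era tools usually expect it.
File.WriteAllText(path, "Gr\u00FC\u00DFe\r\n", oem850);

// Inspect the raw bytes: every character occupies exactly one byte, and no BOM is written.
byte[] bytes = File.ReadAllBytes(path);
Console.WriteLine(BitConverter.ToString(bytes)); // prints 47-72-81-E1-65-0D-0A
```

Swapping 850 for 437 is the only change needed to try the other candidate code page.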

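The character codes from 0 to 127 (7-bit ASCII) are the same in all MS-DOS code pages, in ANSI and in UTF-8. UTF files are sometimes introduced with a BOM (byte order mark), which MS-DOS programs do not understand, and UTF-8 encodes every character above 127 as two or more bytes. So UTF-8 is not an MS-DOS format, although pure 7-bit text without a BOM is byte-identical. Note that GetBytes itself never emits the BOM, even with new UTF8Encoding(true); the BOM is only written by file and stream writers that use that encoding.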
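
To illustrate the 7-bit point with the string from the question, this sketch encodes it four ways and compares the results. Code page 1252 stands in for "ANSI" here, which is an assumption (the actual ANSI code page depends on the Windows locale).

```csharp
using System;
using System.Linq;
using System.Text;

// Needed on .NET Core / .NET 5+ for code pages 437 and 1252.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

string text = "New Text File"; // pure 7-bit ASCII

byte[] asAscii = Encoding.ASCII.GetBytes(text);
byte[] asUtf8 = new UTF8Encoding(true).GetBytes(text);
byte[] asOem437 = Encoding.GetEncoding(437).GetBytes(text);
byte[] asAnsi1252 = Encoding.GetEncoding(1252).GetBytes(text); // assumed ANSI code page

// For characters 0..127 all four encodings produce the same bytes.
bool allEqual = asUtf8.SequenceEqual(asAscii)
    && asOem437.SequenceEqual(asAscii)
    && asAnsi1252.SequenceEqual(asAscii);
Console.WriteLine(allEqual); // prints True

// The BOM is exposed separately as the encoding's preamble.
byte[] bom = new UTF8Encoding(true).GetPreamble();
Console.WriteLine(BitConverter.ToString(bom)); // prints EF-BB-BF
```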

MS-DOS knows only 8-bit characters. The codes 128 to 255 differ for the different national code pages.
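
A small sketch of how the upper half differs: the same byte decodes to different characters depending on the code page, and the same character encodes to different bytes. Code page 1252 is included as an assumed example of a Windows ANSI code page.

```csharp
using System;
using System.Text;

// Needed on .NET Core / .NET 5+ for code pages 437, 850 and 1252.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

Encoding cp437 = Encoding.GetEncoding(437);
Encoding cp850 = Encoding.GetEncoding(850);
Encoding cp1252 = Encoding.GetEncoding(1252); // assumed ANSI code page

// One byte above 127, three different characters.
byte[] raw = { 0x9B };
string as437 = cp437.GetString(raw);   // cent sign
string as850 = cp850.GetString(raw);   // o with stroke
string as1252 = cp1252.GetString(raw); // single right-pointing angle quotation mark
Console.WriteLine($"437: U+{(int)as437[0]:X4}");   // prints 437: U+00A2
Console.WriteLine($"850: U+{(int)as850[0]:X4}");   // prints 850: U+00F8
Console.WriteLine($"1252: U+{(int)as1252[0]:X4}"); // prints 1252: U+203A

// One character (a with diaeresis), different byte sequences.
string aUmlaut = "\u00E4";
string oemBytes = BitConverter.ToString(cp437.GetBytes(aUmlaut));
string ansiBytes = BitConverter.ToString(cp1252.GetBytes(aUmlaut));
string utf8Bytes = BitConverter.ToString(Encoding.UTF8.GetBytes(aUmlaut));
Console.WriteLine($"{oemBytes} / {ansiBytes} / {utf8Bytes}"); // prints 84 / E4 / C3-A4
```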

See: File Class, Encoding Class and Wikipedia: Code Page.