(I looked at previous posts and tried what they suggested but to no avail.)
I'm attempting to read in a file containing only Japanese characters. Here is what that file looks like:
わたし わ エドワド オ’ハゲン です。 これ は なん です か？
When I read it, nothing is displayed in the console, and in the debugger the read buffer contains only garbage. Here is the function I am using to read the file:
wchar_t* ReadTextFileW(wchar_t* filePath, size_t numBytesToRead, size_t maxBufferSize, const wchar_t* mode, int seekOffset, int seekOrigin)
{
    size_t numItems = 0;
    size_t bufferSize = 0;
    wchar_t* buffer = NULL;
    FILE* file = NULL;

    //Ensure the filePath does NOT lead to a device.
    if (IsPathADevice(filePath) == false)
    {
        //0 indicates to read as much as possible (the max specified).
        if (numBytesToRead == 0)
            numBytesToRead = maxBufferSize;

        if (filePath != NULL && mode != NULL)
        {
            //Ensure there are no errors in opening the file.
            if (_wfopen_s(&file, filePath, mode) == 0)
            {
                //Set the cursor location (back to the beginning of the file by default).
                if (fseek(file, seekOffset, seekOrigin) != 0)
                {
                    //Error: Could not change file cursor position.
                }
                //Calculate the size of the buffer in bytes.
                bufferSize = numBytesToRead * sizeof(wchar_t);
                //Create the buffer to store file data in.
                buffer = (wchar_t*)_aligned_malloc(bufferSize, BYTE_ALIGNMENT);
                //Ensure the buffer was allocated.
                if (buffer == NULL)
                {
                    //Error: Buffer could not be allocated.
                }
                else
                {
                    //Clear any garbage data in the buffer.
                    memset(buffer, 0, bufferSize);
                    //Read the data from the file.
                    numItems = fread_s(buffer, bufferSize, sizeof(wchar_t), numBytesToRead, file);
                    //Check for read errors (numItems is a size_t, so it can never be negative).
                    if (numItems == 0)
                    {
                        //Error: File could not be read.
                    }
                }
                //Ensure the file is closed without errors.
                if (fclose(file) != 0)
                {
                    //Error: File did not close properly.
                }
            }
        }
    }

    return buffer;
}
wchar_t* retVal = ReadTextFileW(L"C:\\jap.txt", 0, 1024, L"r, ccs=UTF-8", 0, SEEK_SET);
MessageBoxW(NULL, retVal, NULL, 0);
The following excerpt discusses the encodings offered for Japanese text by Notepad++ (which, according to the comments, the OP used to create the file):
Double Byte encodings, also called, by usage, Double Byte Character Set (DBCS)
Some of them preexisted Unicode, and were designed to encode character sets with a large number of characters, mainly found in Far East languages with ideographic or syllabic scripts:
The 2 Bytes Universal Character Set: UCS-2 Big Endian and UCS-2 Little Endian
The Japanese Code Page: Shift-JIS (Windows-932)
The Chinese Code Pages: Simplified Chinese GB2312 (Windows-936), Traditional Chinese Big5 (Windows-950)
The Korean Code Pages: Windows-949, EUC-KR
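Some of these encodings can be recognized from the first few bytes of the file, because editors often write a byte order mark (BOM) for the Unicode variants. The following is only a minimal sketch of that check; `TextEncoding` and `DetectBom` are hypothetical names made up for this example, and a missing BOM tells you nothing (the file could still be Shift-JIS or BOM-less UTF-8).

```c
#include <stddef.h>

/* Hypothetical result type for this sketch; not part of any library. */
typedef enum {
    ENC_UNKNOWN = 0,  /* no BOM: could be Shift-JIS, BOM-less UTF-8, ... */
    ENC_UTF8_BOM,     /* EF BB BF */
    ENC_UTF16_LE,     /* FF FE (also used for UCS-2 Little Endian) */
    ENC_UTF16_BE      /* FE FF (also used for UCS-2 Big Endian) */
} TextEncoding;

/* Inspects the first bytes of a file that was read in BINARY mode.
   Note: a UTF-32 LE BOM (FF FE 00 00) would be reported as UTF-16 LE here. */
TextEncoding DetectBom(const unsigned char* data, size_t len)
{
    if (len >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return ENC_UTF8_BOM;
    if (len >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        return ENC_UTF16_LE;
    if (len >= 2 && data[0] == 0xFE && data[1] == 0xFF)
        return ENC_UTF16_BE;
    return ENC_UNKNOWN;
}
```

To use it, open the file with a plain binary mode such as "rb", read the first three bytes into an `unsigned char` array, and pass them in before deciding how to decode the rest.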
It would appear that Shift-JIS might be the encoding of the file you are trying to read. From a description of Shift JIS:
Shift JIS (Shift Japanese Industrial Standards, also SJIS, MIME name Shift_JIS) is a character encoding for the Japanese language, originally developed by a Japanese company called ASCII Corporation in conjunction with Microsoft...
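To make the "double byte" idea concrete: in Shift-JIS a character is either one byte (ASCII range or half-width katakana) or a lead byte followed by a trail byte. The sketch below walks a raw byte buffer using the Windows-932 byte ranges as I understand them; the function names are hypothetical, and passing this check only means the bytes are plausible Shift-JIS, not that the file really is Shift-JIS.

```c
#include <stddef.h>

/* Returns 1 if b starts a two-byte character (Windows-932 lead byte ranges). */
static int IsSjisLeadByte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Returns 1 if b may follow a lead byte. */
static int IsSjisTrailByte(unsigned char b)
{
    return (b >= 0x40 && b <= 0x7E) || (b >= 0x80 && b <= 0xFC);
}

/* Counts the characters in data[0..len).
   Returns -1 if the bytes cannot be Shift-JIS under the ranges above. */
long CountSjisChars(const unsigned char* data, size_t len)
{
    long count = 0;
    size_t i = 0;

    while (i < len) {
        unsigned char b = data[i];
        if (IsSjisLeadByte(b)) {
            /* A lead byte must be followed by a valid trail byte. */
            if (i + 1 >= len || !IsSjisTrailByte(data[i + 1]))
                return -1;
            i += 2;
        } else if (b <= 0x7F || (b >= 0xA1 && b <= 0xDF)) {
            /* Single byte: ASCII range or half-width katakana. */
            i += 1;
        } else {
            /* 0x80, 0xA0 and 0xFD-0xFF are not used by this sketch. */
            return -1;
        }
        count++;
    }
    return count;
}
```

For example, the word わたし is stored as the six bytes 82 ED 82 BD 82 B5 in Shift-JIS, which this function counts as three characters. Reading those same bytes as UTF-8 or as raw wchar_t values is one way to end up with a buffer full of garbage.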
In general, you need to determine which encoding was used to write the multi-byte characters in a file before a function in C, or in any other language, can read them back correctly. This link may help.
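Since the question opens the file with ccs=UTF-8, one quick way to test that assumption is to read the raw bytes in binary mode and check whether they form well-formed UTF-8 at all. This is a sketch under my own assumptions, with a hypothetical function name; it rejects overlong forms, surrogates and truncated sequences, but it cannot prove that a file which passes was meant as UTF-8.

```c
#include <stddef.h>

/* Returns 1 if data[0..len) is well-formed UTF-8, 0 otherwise. */
int IsValidUtf8(const unsigned char* data, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char b = data[i];
        size_t extra;        /* number of continuation bytes expected */
        unsigned long cp;    /* code point being decoded */
        unsigned long min;   /* smallest code point allowed for this length */
        size_t k;

        if (b <= 0x7F) { i++; continue; }  /* plain ASCII */
        else if ((b & 0xE0) == 0xC0) { extra = 1; cp = b & 0x1F; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { extra = 2; cp = b & 0x0F; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { extra = 3; cp = b & 0x07; min = 0x10000; }
        else return 0;  /* stray continuation byte or invalid lead byte */

        if (len - i <= extra)
            return 0;   /* sequence is cut off at the end of the buffer */

        for (k = 1; k <= extra; k++) {
            if ((data[i + k] & 0xC0) != 0x80)
                return 0;  /* continuation bytes must look like 10xxxxxx */
            cp = (cp << 6) | (data[i + k] & 0x3F);
        }

        /* Reject overlong encodings, surrogates and out-of-range values. */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;

        i += extra + 1;
    }
    return 1;
}
```

If this returns 0 for the contents of the file, the ccs=UTF-8 mode is the wrong choice and the file needs to be decoded with whatever encoding the editor actually saved it in.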