UserRR - 2 months ago
C++ Question

C++ int vs long long on a 64-bit machine

My computer has a 64-bit processor, but when I check sizeof(int), sizeof(long), and sizeof(long long), it turns out that int and long are 32 bits while long long is 64 bits. I researched the reason, and it appears that the popular assumption that int in C++ matches the machine's word size is wrong. As I understand it, the size is up to the compiler (and its data model), and mine is Mingw-w64.

The reason for my research was to understand whether using types smaller than the word size (for instance, short instead of int) helps or hurts performance. On a 32-bit system, one popular opinion is that because the word size matches int, a short will be converted to int anyway, causing additional shifts and so on, and thus worse performance. The opposing opinion is that smaller types win at the cache level (I didn't go deep into it) and that short is useful for saving virtual memory.

In addition to the confusion over this dilemma, I face another problem. My system is 64-bit, so whether I use int or short, both are smaller than the word size, and I start to wonder whether it would be more efficient to use a 64-bit long long, since that is what the system is designed around. I also read that there is another constraint, the data model of the OS and its libraries (ILP64, LP64), which defines the type sizes. In ILP64, int is 64 bits by default, in contrast to LP64; would my program be faster on an OS with ILP64 support? Once I started asking which type I should use to speed up my C++ program, I ran into deeper topics in which I have no expertise, and some explanations seem to contradict each other. Can you please explain:

1) Is it best practice to use long long on x64 to achieve maximum performance, even for 1-4 byte data?

2) What is the trade-off of using a type smaller than the word size (memory savings vs. additional operations)?

3) Can an x64 computer, where the word size (and supposedly int) is 64 bits, process a short using a 16-bit word size via so-called backward compatibility? Or must it place the 16-bit value into a 64-bit word, and is the fact that this can be done what makes the system backward compatible?

4) Can we force the compiler to make int 64 bits?

5) How can ILP64 be used on a PC that uses LP64?

6) What are the possible problems of using code adapted to the above issues with other compilers, OSes, and architectures (e.g. a 32-bit processor)?

Answer

1) Is it best practice to use long long on x64 to achieve maximum performance, even for 1-4 byte data?

No, and it will probably in fact make your performance worse. For example, if you use 64-bit integers where 32-bit integers would have sufficed, you have just doubled the amount of data that must be sent between the processor and memory, and memory is orders of magnitude slower. All of your caches and memory buses will fill up twice as fast.

2) What is the trade-off of using a type smaller than the word size (memory savings vs. additional operations)?

Generally, the dominant driver of performance in a modern machine is going to be how much data needs to be stored in order to run a program. You are going to see significant performance cliffs once the working set size of your program exceeds the capacity of your registers, L1 cache, L2 cache, L3 cache, and RAM, in that order.

In addition, using a smaller data type can be a win if your compiler is smart enough to figure out how to use your processor's vector instructions (aka SSE instructions). Modern vector processing units are smart enough to cram eight 16-bit short integers into the same space as two 64-bit long long integers, so you can do four times as many operations at once.

3) Can an x64 computer, where the word size (and supposedly int) is 64 bits, process a short using a 16-bit word size via so-called backward compatibility? Or must it place the 16-bit value into a 64-bit word, and is the fact that this can be done what makes the system backward compatible?

I'm not sure what you're asking here. In general, 64-bit machines are capable of executing 32-bit and 16-bit executable files because those older executables use a subset of the 64-bit machine's capabilities.

Hardware instruction sets are generally backwards compatible, meaning that processor designers tend to add capabilities, but rarely if ever remove capabilities.

4) Can we force the compiler to make int 64 bits?

You cannot portably change the size of int itself, but there are fairly standard ways on all compilers to work with fixed-bit-size data. For example, the header stdint.h (cstdint in C++) declares types such as int64_t, uint64_t, etc.

5) How can ILP64 be used on a PC that uses LP64?

https://software.intel.com/en-us/node/528682

6) What are the possible problems of using code adapted to the above issues with other compilers, OSes, and architectures (e.g. a 32-bit processor)?

Generally the compilers and systems are smart enough to figure out how to execute your code on any given system. However, 32-bit processors are going to have to do extra work to operate on 64-bit data. In other words, correctness should not be an issue, but performance will be.

But it's generally the case that if performance is really critical to you, then you need to program for a specific architecture and platform anyway.

Clarification Request: Thanks a lot! I wanted to clarify question no. 1. You say that it is bad for memory. Let's take the example of a 32-bit int. When you send it to memory, because it is a 64-bit system, won't a desired integer 0xEEEEEEEE become 0xEEEEEEEE plus 32 other bits? How can a processor send 32 bits when the word size is 64 bits? The 32 bits are the desired values, but won't they be combined with 32 unused bits and sent that way? If my assumption is true, then there is no difference for memory.

There are two things to discuss here.

First, the situation you discuss does not occur. A processor does not need to "promote" a 32-bit value into a 64-bit value in order to use it appropriately. This is because modern processors have different accessing modes that are capable of dealing with different size data appropriately.

For example, a 64-bit Intel processor has a 64-bit register named RAX. However, this same register can be used in 32-bit mode by referring to it as EAX, and even in 16-bit and 8-bit modes. I stole a diagram from here:

x86_64 registers rax/eax/ax/al overwriting full register contents

0x1122334455667788
  ================ rax (64 bits)
          ======== eax (32 bits)
              ====  ax (16 bits)
              ==    ah  (8 bits)
                ==  al  (8 bits)

Between the compiler and assembler, the correct code is generated so that a 32-bit value is handled appropriately.

Second, when we're talking about memory overhead and performance we should be more specific. Modern memory systems are composed of a disk, then main memory (RAM), and typically two or three caches (e.g. L3, L2, and L1). The smallest quantity of data that can be transferred between disk and memory is called a page, and page sizes are typically 4096 bytes (though they don't have to be). Then, the smallest quantity of data that can be transferred between memory and cache is called a cache line, which is usually much larger than 32 or 64 bits. On my computer the cache line size is 64 bytes. The processor is the only place where data is actually transferred and addressed at the word level and below.

So if you want to change one 64-bit word in a file that resides on disk, then, on my computer, this actually requires that you load 4096 bytes from the disk into memory, and then 64 bytes from memory into the L3, L2, and L1 caches, and then the processor takes a single 64-bit word from the L1 cache.

The result is that the word size itself means nothing for memory bandwidth. However, you can fit 16 of those 32-bit integers in the same space in which you can pack 8 of those 64-bit integers, or even 32 16-bit values or 64 8-bit values. If your program uses a lot of different data values, you can significantly improve performance by using the smallest data type necessary.