jeffer son jeffer son - 1 year ago 64
C++ Question

Why is this C++ code faster than assembly

I wrote these 2 solutions for Project Euler Q14, in assembly and in C++. They are the same identical brute force approach. The assembly solution assembled with

nasm -felf64 p14.asm && gcc p14.o -o p14

The C++ compiled with

g++ p14.cpp -o p14

Assembly p14.asm

section .data
fmt db "%d", 10, 0

global main
extern printf

section .text

mov rcx, 1000000
xor rdi, rdi ; max i
xor rsi, rsi ; i

dec rcx
xor r10, r10 ; count
mov rax, rcx

test rax, 1
jpe even

mov rbx, 3
mul rbx
inc rax
jmp c1

mov rbx, 2
xor rdx, rdx
div rbx

inc r10
cmp rax, 1
jne l2

cmp rdi, r10
cmovl rdi, r10
cmovl rsi, rcx

cmp rcx, 2
jne l1

mov rdi, fmt
xor rax, rax
call printf

C++ p14.cpp

#include <iostream>

using namespace std;

int sequence(long n) {
int count = 1;
while (n != 1) {
if (n % 2 == 0)
n /= 2;
n = n*3 + 1;


return count;

int main() {
int max = 0, maxi;
for (int i = 999999; i > 0; --i) {
int s = sequence(i);
if (s > max) {
max = s;
maxi = i;

cout << maxi << endl;

I know about the compiler optimizations to improve speed and everything, but I don't see many ways to optimize my assembly solution further (speaking programmatically not mathematically).

The C++ code has modulus every term and division every even term, where assembly is only one division per even term.

But the assembly is taking on average 1 second longer than the C++ solution. Why is this? Asking out of mainly curiosity.

Answer Source

Just to add, C++ program is translated to assembly program, during the generation of machine code from source code. So, It would be virtually wrong to say that assembly is slower than C++. Moreover, to add, binary code generated is different from compiler to compiler. So, It may happen that a smart C++ compiler may produce optimum and efficient binary code than a dump assembler. However, What I believe is your profiling methodology may have certain flaws. So following are the general guidelines for profiling :-

  1. Make sure that Your system is in normal state. For that stop all the running processes(applications), that you have started. Also stop the processes, which you think is doing CPU intensive job(or polling over network).
  2. Your datasize must be greater in size.
  3. Your test must be running for something more than 5-10 seconds.
  4. You must not rely on just one sample. You have to perform your test for like n no of times. Collect result and calculate mean or median of the result.