Channel: Why is my assembly code much slower than the C implementation - Stack Overflow

Why is my assembly code much slower than the C implementation

I am learning assembly. So I wrote a routine that returns the square root of its input if the input is non-negative, and it returns 0 otherwise.

I have implemented the routine in both assembly and C, and I would like to understand why my C routines, compiled with -O2, are much faster than my assembly routine. The disassembled code for the C routines looks slightly more complex than my assembly routine, so I don't understand where I am going wrong.

The assembly routine (srt.asm):

global srt
section .text
srt:
pxor xmm1,xmm1
comisd xmm0,xmm1
jbe  P
sqrtsd xmm0,xmm0
retq
P:  pxor xmm0,xmm0
retq

I assemble the above with

nasm -g -felf64 srt.asm

The C routines (srtc.c):

#include <stdio.h>
#include <math.h>
#include <time.h>

extern double srt(double);

double srt1(double x)
{
    return sqrt( (x > 0) * x );
}

double srt2(double x)
{
    if( x > 0) return sqrt(x);
    return 0;
}

int main(void)
{
    double v = 0;
    clock_t start;
    clock_t end;
    double niter = 2e8;

    start = clock();
    v = 0;
    for( double i = 0; i < niter; i++ ) {
        v += srt(i);
    }
    end = clock();
    printf("time taken srt = %f v=%g\n", (double) (end - start)/CLOCKS_PER_SEC,v);

    start = clock();
    v = 0;
    for( double i = 0; i < niter; i++ ) {
        v += srt1(i);
    }
    end = clock();
    printf("time taken srt1 = %f v=%g\n", (double) (end - start)/CLOCKS_PER_SEC,v);

    start = clock();
    v = 0;
    for( double i = 0; i < niter; i++ ) {
        v += srt2(i);
    }
    end = clock();
    printf("time taken srt2 = %f v=%g\n", (double) (end - start)/CLOCKS_PER_SEC,v);

    return 0;
}

The above is compiled as

gcc -g -O2 srt.o -o srtc srtc.c -lm

The output of the program is

time taken srt = 0.484375 v=1.88562e+12
time taken srt1 = 0.312500 v=1.88562e+12
time taken srt2 = 0.312500 v=1.88562e+12

So my assembly routine takes about 0.48 s against 0.31 s for either C routine, roughly 1.5 times as long.

The disassembled C code is

Disassembly of section .text:

0000000000000000 <srt1>:
   0:   f3 0f 1e fa             endbr64
   4:   66 0f ef c9             pxor   xmm1,xmm1
   8:   66 0f 2f c1             comisd xmm0,xmm1
   c:   77 04                   ja     12 <srt1+0x12>
   e:   f2 0f 59 c1             mulsd  xmm0,xmm1
  12:   66 0f 2e c8             ucomisd xmm1,xmm0
  16:   66 0f 28 d0             movapd xmm2,xmm0
  1a:   f2 0f 51 d2             sqrtsd xmm2,xmm2
  1e:   77 05                   ja     25 <srt1+0x25>
  20:   66 0f 28 c2             movapd xmm0,xmm2
  24:   c3                      ret
  25:   48 83 ec 18             sub    rsp,0x18
  29:   f2 0f 11 54 24 08       movsd  QWORD PTR [rsp+0x8],xmm2
  2f:   e8 00 00 00 00          call   34 <srt1+0x34>
  34:   f2 0f 10 54 24 08       movsd  xmm2,QWORD PTR [rsp+0x8]
  3a:   48 83 c4 18             add    rsp,0x18
  3e:   66 0f 28 c2             movapd xmm0,xmm2
  42:   c3                      ret
  43:   66 66 2e 0f 1f 84 00    data16 nop WORD PTR cs:[rax+rax*1+0x0]
  4a:   00 00 00 00
  4e:   66 90                   xchg   ax,ax

0000000000000050 <srt2>:
  50:   f3 0f 1e fa             endbr64
  54:   66 0f ef c9             pxor   xmm1,xmm1
  58:   66 0f 2f c1             comisd xmm0,xmm1
  5c:   66 0f 28 d1             movapd xmm2,xmm1
  60:   77 0e                   ja     70 <srt2+0x20>
  62:   66 0f 28 c2             movapd xmm0,xmm2
  66:   c3                      ret
  67:   66 0f 1f 84 00 00 00    nop    WORD PTR [rax+rax*1+0x0]
  6e:   00 00
  70:   66 0f 2e c8             ucomisd xmm1,xmm0
  74:   66 0f 28 d0             movapd xmm2,xmm0
  78:   f2 0f 51 d2             sqrtsd xmm2,xmm2
  7c:   76 e4                   jbe    62 <srt2+0x12>
  7e:   48 83 ec 18             sub    rsp,0x18
  82:   f2 0f 11 54 24 08       movsd  QWORD PTR [rsp+0x8],xmm2
  88:   e8 00 00 00 00          call   8d <srt2+0x3d>
  8d:   f2 0f 10 54 24 08       movsd  xmm2,QWORD PTR [rsp+0x8]
  93:   48 83 c4 18             add    rsp,0x18
  97:   66 0f 28 c2             movapd xmm0,xmm2
  9b:   c3                      ret
