Why is my assembly code much slower than the C implementation?

I am learning assembly, so I wrote a routine that returns the square root of its input if the input is non-negative and 0 otherwise.

I have implemented the routine in both assembly and C, and I would like to understand why my C routines, compiled with -O2, are much faster than my assembly routine. The disassembled code for the C routines looks slightly more complex than my assembly routine, so I don't understand where I am going wrong.

The assembly routine (srt.asm):

global srt
section .text
srt:
    pxor xmm1,xmm1
    comisd xmm0,xmm1
    jbe  P
    sqrtsd xmm0,xmm0
    retq
P:  pxor xmm0,xmm0
    retq

I am compiling the above as

nasm -g -felf64 srt.asm

The C routines (srtc.c):

#include <stdio.h>
#include <math.h>
#include <time.h>

extern double srt(double);

double srt1(double x)
{
    return sqrt( (x > 0) * x );
}

double srt2(double x)
{
    if( x > 0) return sqrt(x);
    return 0;
}

int main(void)
{
    double v = 0;
    clock_t start;
    clock_t end;
    double niter = 2e8;

    start = clock();
    v = 0;
    for( double i = 0; i < niter; i++ ) {
        v += srt(i);
    }
    end = clock();
    printf("time taken srt = %f v=%g\n", (double) (end - start)/CLOCKS_PER_SEC,v);

    start = clock();
    v = 0;
    for( double i = 0; i < niter; i++ ) {
        v += srt1(i);
    }
    end = clock();
    printf("time taken srt1 = %f v=%g\n", (double) (end - start)/CLOCKS_PER_SEC,v);

    start = clock();
    v = 0;
    for( double i = 0; i < niter; i++ ) {
        v += srt2(i);
    }
    end = clock();
    printf("time taken srt2 = %f v=%g\n", (double) (end - start)/CLOCKS_PER_SEC,v);

    return 0;
}

The above is compiled as

gcc -g -O2 srt.o -o srtc srtc.c -lm

The output of the program is

time taken srt = 0.484375 v=1.88562e+12
time taken srt1 = 0.312500 v=1.88562e+12
time taken srt2 = 0.312500 v=1.88562e+12

So my assembly routine takes about 0.48 s, versus 0.31 s for either C routine, which makes it roughly 55% slower.

The disassembled C code is

Disassembly of section .text:

0000000000000000 <srt1>:
   0:   f3 0f 1e fa             endbr64
   4:   66 0f ef c9             pxor   xmm1,xmm1
   8:   66 0f 2f c1             comisd xmm0,xmm1
   c:   77 04                   ja     12 <srt1+0x12>
   e:   f2 0f 59 c1             mulsd  xmm0,xmm1
  12:   66 0f 2e c8             ucomisd xmm1,xmm0
  16:   66 0f 28 d0             movapd xmm2,xmm0
  1a:   f2 0f 51 d2             sqrtsd xmm2,xmm2
  1e:   77 05                   ja     25 <srt1+0x25>
  20:   66 0f 28 c2             movapd xmm0,xmm2
  24:   c3                      ret
  25:   48 83 ec 18             sub    rsp,0x18
  29:   f2 0f 11 54 24 08       movsd  QWORD PTR [rsp+0x8],xmm2
  2f:   e8 00 00 00 00          call   34 <srt1+0x34>
  34:   f2 0f 10 54 24 08       movsd  xmm2,QWORD PTR [rsp+0x8]
  3a:   48 83 c4 18             add    rsp,0x18
  3e:   66 0f 28 c2             movapd xmm0,xmm2
  42:   c3                      ret
  43:   66 66 2e 0f 1f 84 00    data16 nop WORD PTR cs:[rax+rax*1+0x0]
  4a:   00 00 00 00
  4e:   66 90                   xchg   ax,ax

0000000000000050 <srt2>:
  50:   f3 0f 1e fa             endbr64
  54:   66 0f ef c9             pxor   xmm1,xmm1
  58:   66 0f 2f c1             comisd xmm0,xmm1
  5c:   66 0f 28 d1             movapd xmm2,xmm1
  60:   77 0e                   ja     70 <srt2+0x20>
  62:   66 0f 28 c2             movapd xmm0,xmm2
  66:   c3                      ret
  67:   66 0f 1f 84 00 00 00    nop    WORD PTR [rax+rax*1+0x0]
  6e:   00 00
  70:   66 0f 2e c8             ucomisd xmm1,xmm0
  74:   66 0f 28 d0             movapd xmm2,xmm0
  78:   f2 0f 51 d2             sqrtsd xmm2,xmm2
  7c:   76 e4                   jbe    62 <srt2+0x12>
  7e:   48 83 ec 18             sub    rsp,0x18
  82:   f2 0f 11 54 24 08       movsd  QWORD PTR [rsp+0x8],xmm2
  88:   e8 00 00 00 00          call   8d <srt2+0x3d>
  8d:   f2 0f 10 54 24 08       movsd  xmm2,QWORD PTR [rsp+0x8]
  93:   48 83 c4 18             add    rsp,0x18
  97:   66 0f 28 c2             movapd xmm0,xmm2
  9b:   c3                      ret
