Poor C performance with pthread and printf

Antonio R.

I am testing a C code on Linux with large arrays to measure thread performance. The application scales very well when the number of threads is increased up to the maximum number of cores (8 hardware threads on the Intel 4770), but this only holds for the pure math part of the code.

If I add a printf section for the resulting arrays, the times become far too large, going from a few seconds to several minutes even when redirecting the output to a file, whereas printing those arrays should only add a few seconds.

The code:

(gcc 7.5.0-Ubuntu 18.04)

Without the printf loop:

gcc -O3 -m64 exp_multi.c -pthread -lm

With the printf loop:

gcc -DPRINT_ARRAY -O3 -m64 exp_multi.c -pthread -lm

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <pthread.h>
#define MAXSIZE 1000000
#define REIT 100000
#define XXX -5
#define num_threads 8
static double xv[MAXSIZE];
static double yv[MAXSIZE];

/*    gcc -O3 -m64 exp_multi.c -pthread -lm    */


void* run(void *received_Val){
    int single_val = *((int *) received_Val);
    int r;
    int i;
    double p;

  for (r = 0; r < REIT; r++) {
    p = XXX + 0.00001*single_val*MAXSIZE/num_threads;
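    /* p starts at the beginning of this thread's portion; the loop below fills
       this thread's slice of the shared arrays, i.e. indices
       [single_val*MAXSIZE/num_threads, (single_val+1)*MAXSIZE/num_threads). */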

    for (i = single_val*MAXSIZE/num_threads; i < (single_val+1)*MAXSIZE/num_threads; i++) {

      xv[i]=p;
      yv[i]=exp(p);

    p += 0.00001;
    }

  }

    return 0;
}


int main(){
    int i;
      pthread_t tid[num_threads];


    for (i=0;i<num_threads;i++){
      int *arg = malloc(sizeof(*arg));
      if ( arg == NULL ) {
        fprintf(stderr, "Couldn't allocate memory for thread arg.\n");
        exit(1);
      }

      *arg = i;
        pthread_create(&(tid[i]), NULL, run, arg);
    }

    for(i=0; i<num_threads; i++)
    {
        pthread_join(tid[i], NULL);
    }

#ifdef PRINT_ARRAY
    for (i=0;i<MAXSIZE;i++){
        printf("x=%.20lf, e^x=%.20lf\n",xv[i],yv[i]);
    }
#endif

    return 0;
}


Passing an integer allocated with malloc as the last argument of pthread_create was suggested here.

Things I tried without success: clang, adding free() calls, avoiding the malloc, reversing the loops, using only one single-dimension array, and a single-threaded version without pthread.
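For reference, a minimal sketch of one malloc-free way to pass the thread index (my own illustration of the "avoid malloc" variant mentioned above, not the code from the linked suggestion): keep the indices in an array that outlives the threads, so nothing needs to be allocated or freed per thread.

#include <stdio.h>
#include <pthread.h>

#define num_threads 8

static int thread_ids[num_threads];   /* lives until all joins complete */

void* run(void *received_Val){
    int single_val = *((int *) received_Val);   /* same convention as above */
    (void)single_val;
    /* ... per-thread work on slice 'single_val' ... */
    return 0;
}

int main(void){
    pthread_t tid[num_threads];
    int i;

    for (i = 0; i < num_threads; i++){
        thread_ids[i] = i;                       /* no malloc, no free needed */
        pthread_create(&(tid[i]), NULL, run, &thread_ids[i]);
    }
    for (i = 0; i < num_threads; i++)
        pthread_join(tid[i], NULL);

    return 0;
}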

EDIT2: I thought the exp function would be processor resource intensive, perhaps limited by per-core cache or SIMD resources depending on the processor generation. The following sample code is based on permissively licensed code posted on Stack Overflow.

This code runs fast whether or not the printf loop is present, and the exp from math.h was improved a few years ago: it can now be around x40 faster, at least on the Intel 4770 (Haswell). The link is a well-known test code comparing the math library with SSE2; today the speed of the math library's exp should be close to the AVX2 algorithm optimized for float and x8 parallel computation.

Test results: expf versus other SSE2 algorithms (exp_ps):

sinf .. ->            55.5 millions of vector evaluations/second ->  12 cycles/value
cosf .. ->            57.3 millions of vector evaluations/second ->  11 cycles/value
sincos (x87) .. ->    9.1 millions of vector evaluations/second ->   71 cycles/value
expf .. ->            61.4 millions of vector evaluations/second ->  11 cycles/value
logf .. ->            55.6 millions of vector evaluations/second ->  12 cycles/value
cephes_sinf .. ->     52.5 millions of vector evaluations/second ->  12 cycles/value
cephes_cosf .. ->     41.9 millions of vector evaluations/second ->  15 cycles/value
cephes_expf .. ->     18.3 millions of vector evaluations/second ->  35 cycles/value
cephes_logf .. ->     20.2 millions of vector evaluations/second ->  32 cycles/value
sin_ps .. ->          54.1 millions of vector evaluations/second ->  12 cycles/value
cos_ps .. ->          54.8 millions of vector evaluations/second ->  12 cycles/value
sincos_ps .. ->       54.6 millions of vector evaluations/second ->  12 cycles/value
exp_ps .. ->          42.6 millions of vector evaluations/second ->  15 cycles/value
log_ps .. ->          41.0 millions of vector evaluations/second ->  16 cycles/value
/* Performance test exp(x) algorithm

based on AVX implementation of Giovanni Garberoglio


 Copyright (C) 2020 Antonio R.


     AVX implementation of exp:
     Modified code from this source: https://github.com/reyoung/avx_mathfun
     Based on "sse_mathfun.h", by Julien Pommier
     http://gruntthepeon.free.fr/ssemath/
     Copyright (C) 2012 Giovanni Garberoglio
     Interdisciplinary Laboratory for Computational Science (LISC)
     Fondazione Bruno Kessler and University of Trento
     via Sommarive, 18
     I-38123 Trento (Italy)


    This software is provided 'as-is', without any express or implied
    warranty.  In no event will the authors be held liable for any damages
    arising from the use of this software.
    Permission is granted to anyone to use this software for any purpose,
    including commercial applications, and to alter it and redistribute it
    freely, subject to the following restrictions:
    1. The origin of this software must not be misrepresented; you must not
       claim that you wrote the original software. If you use this software
       in a product, an acknowledgment in the product documentation would be
       appreciated but is not required.
    2. Altered source versions must be plainly marked as such, and must not be
       misrepresented as being the original software.
    3. This notice may not be removed or altered from any source distribution.
    (this is the zlib license)

  */

/*    gcc -O3 -m64 -Wall -mavx2 -march=haswell  expc.c -lm     */

#include <stdio.h>
#include <immintrin.h>
#include <math.h>
#define MAXSIZE 1000000
#define REIT 100000
#define XXX -5


__m256 exp256_ps(__m256 x) {

/*
  To increase the compatibility across different compilers the original code is
  converted to plain AVX2 intrinsics code without ingenious macro's,
  gcc style alignment attributes etc.
  Moreover, the part "express exp(x) as exp(g + n*log(2))" has been significantly simplified.
  This modified code is not thoroughly tested!
*/


__m256   exp_hi        = _mm256_set1_ps(88.3762626647949f);
__m256   exp_lo        = _mm256_set1_ps(-88.3762626647949f);

__m256   cephes_LOG2EF = _mm256_set1_ps(1.44269504088896341f);
__m256   inv_LOG2EF    = _mm256_set1_ps(0.693147180559945f);

__m256   cephes_exp_p0 = _mm256_set1_ps(1.9875691500E-4);
__m256   cephes_exp_p1 = _mm256_set1_ps(1.3981999507E-3);
__m256   cephes_exp_p2 = _mm256_set1_ps(8.3334519073E-3);
__m256   cephes_exp_p3 = _mm256_set1_ps(4.1665795894E-2);
__m256   cephes_exp_p4 = _mm256_set1_ps(1.6666665459E-1);
__m256   cephes_exp_p5 = _mm256_set1_ps(5.0000001201E-1);
__m256   fx;
__m256i  imm0;
__m256   one           = _mm256_set1_ps(1.0f);

        x     = _mm256_min_ps(x, exp_hi);
        x     = _mm256_max_ps(x, exp_lo);

  /* express exp(x) as exp(g + n*log(2)) */
        fx     = _mm256_mul_ps(x, cephes_LOG2EF);
        fx     = _mm256_round_ps(fx, _MM_FROUND_TO_NEAREST_INT |_MM_FROUND_NO_EXC);
__m256  z      = _mm256_mul_ps(fx, inv_LOG2EF);
        x      = _mm256_sub_ps(x, z);
        z      = _mm256_mul_ps(x,x);

__m256  y      = cephes_exp_p0;
        y      = _mm256_mul_ps(y, x);
        y      = _mm256_add_ps(y, cephes_exp_p1);
        y      = _mm256_mul_ps(y, x);
        y      = _mm256_add_ps(y, cephes_exp_p2);
        y      = _mm256_mul_ps(y, x);
        y      = _mm256_add_ps(y, cephes_exp_p3);
        y      = _mm256_mul_ps(y, x);
        y      = _mm256_add_ps(y, cephes_exp_p4);
        y      = _mm256_mul_ps(y, x);
        y      = _mm256_add_ps(y, cephes_exp_p5);
        y      = _mm256_mul_ps(y, z);
        y      = _mm256_add_ps(y, x);
        y      = _mm256_add_ps(y, one);

  /* build 2^n */
        imm0   = _mm256_cvttps_epi32(fx);
        imm0   = _mm256_add_epi32(imm0, _mm256_set1_epi32(0x7f));
        imm0   = _mm256_slli_epi32(imm0, 23);
__m256  pow2n  = _mm256_castsi256_ps(imm0);
        y      = _mm256_mul_ps(y, pow2n);
        return y;
}

int main(){
    int r;
    int i;
    float p;
    static float xv[MAXSIZE] __attribute__((aligned(32))); /* 32-byte alignment required by _mm256_store_ps */
    static float yv[MAXSIZE] __attribute__((aligned(32)));
    float *xp;
    float *yp;

  for (r = 0; r < REIT; r++) {
    p = XXX;
    xp = xv;
    yp = yv;

    for (i = 0; i < MAXSIZE; i += 8) {
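    /* Evaluate exp for 8 consecutive x values at once and store both the
       inputs and the results back to the (32-byte aligned) arrays. */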

    __m256 x = _mm256_setr_ps(p, p + 0.00001, p + 0.00002, p + 0.00003, p + 0.00004, p + 0.00005, p + 0.00006, p + 0.00007);
    __m256 y = exp256_ps(x);

    _mm256_store_ps(xp,x);
    _mm256_store_ps(yp,y);

    xp += 8;
    yp += 8;
    p += 0.00008;
    }

  }

  for (i=0;i<MAXSIZE;i++){
      printf("x=%.20f, e^x=%.20f\n",xv[i],yv[i]);
  }

    return 0;
}


For comparison, this is a code example with exp(x) from the math library, single-threaded and using float.

#include <stdio.h>
#include <math.h>
#define MAXSIZE 1000000
#define REIT 100000
#define XXX -5
/*    gcc -O3 -m64 exp_st.c -lm    */


int main(){
    int r;
    int i;
    float p;
    static float xv[MAXSIZE];
    static float yv[MAXSIZE];


  for (r = 0; r < REIT; r++) {
    p = XXX;

    for (i = 0; i < MAXSIZE; i++) {

      xv[i]=p;
      yv[i]=expf(p);

    p += 0.00001;
    }
  }

  for (i=0;i<MAXSIZE;i++){
      printf("x=%.20f, e^x=%.20f\n",xv[i],yv[i]);
  }

    return 0;
}

SOLUTION: As Andreas Wenzel pointed out, the gcc compiler is smart enough to decide that it is not necessary to actually write the results to the arrays; those writes are optimized away by the compiler. After new performance tests based on this information (and after correcting some mistakes and wrong assumptions I made before), the results seem clearer: exp(double arg), and expf(float arg), which is more than x2 faster than exp(double arg), have been improved, but neither is as fast as the AVX2 algorithm (x8 parallel float args), which is roughly x6 faster than the SSE2 algorithm (x4 parallel float args). Here are some results, as expected for an Intel Hyper-Threading CPU, except for the SSE2 algorithm:

exp (double arg), single thread: 18 minutes 46 seconds

exp (double arg), 4 threads: 5 minutes 4 seconds

exp (double arg), 8 threads: 4 minutes 28 seconds

expf (float arg), single thread: 7 minutes 32 seconds

expf (float arg), 4 threads: 1 minute 58 seconds

expf (float arg), 8 threads: 1 minute 41 seconds

Relative error:

i           x                     y = expf(x)           double precision exp        relative error

i = 0       x =-5.000000000e+00   y = 6.737946998e-03   exp_dbl = 6.737946999e-03   rel_err =-1.124224480e-10
i = 124000  x =-3.758316040e+00   y = 2.332298271e-02   exp_dbl = 2.332298229e-02   rel_err = 1.803005727e-08
i = 248000  x =-2.518329620e+00   y = 8.059411496e-02   exp_dbl = 8.059411715e-02   rel_err =-2.716802480e-08
i = 372000  x =-1.278343201e+00   y = 2.784983218e-01   exp_dbl = 2.784983343e-01   rel_err =-4.490403948e-08
i = 496000  x =-3.867173195e-02   y = 9.620664716e-01   exp_dbl = 9.620664730e-01   rel_err =-1.481617428e-09
i = 620000  x = 1.201261759e+00   y = 3.324308872e+00   exp_dbl = 3.324308753e+00   rel_err = 3.571995830e-08
i = 744000  x = 2.441616058e+00   y = 1.149159718e+01   exp_dbl = 1.149159684e+01   rel_err = 2.955980805e-08
i = 868000  x = 3.681602478e+00   y = 3.970997620e+01   exp_dbl = 3.970997748e+01   rel_err =-3.232306688e-08
i = 992000  x = 4.921588898e+00   y = 1.372204742e+02   exp_dbl = 1.372204694e+02   rel_err = 3.563072184e-08

Julien Pommier's SSE2 algorithm: about a x6.8 speed increase from one thread to 8 threads. My performance test code uses aligned(16) on a typedef union of the vector / 4-float array passed to the library, instead of aligning the whole float arrays. That may be causing the lower performance, at least compared with the other AVX2 code; its multithreading improvement also looks fine for Intel Hyper-Threading, but starting from a lower speed, with gains between about x2.5 and x1.5. Maybe the SSE2 code could be sped up with better array alignment, but I could not improve it (a minimal sketch of aligned storage follows the error table below):

exp_ps (x4 parallel float args), single thread: 12 minutes 7 seconds

exp_ps (x4 parallel float args), 4 threads: 3 minutes 10 seconds

exp_ps (x4 parallel float args), 8 threads: 1 minute 46 seconds

Relative error:

i           x                     y = exp_ps(x)         double precision exp        relative error

i = 0       x =-5.000000000e+00   y = 6.737946998e-03   exp_dbl = 6.737946999e-03   rel_err =-1.124224480e-10
i = 124000  x =-3.758316040e+00   y = 2.332298271e-02   exp_dbl = 2.332298229e-02   rel_err = 1.803005727e-08
i = 248000  x =-2.518329620e+00   y = 8.059412241e-02   exp_dbl = 8.059411715e-02   rel_err = 6.527768787e-08
i = 372000  x =-1.278343201e+00   y = 2.784983218e-01   exp_dbl = 2.784983343e-01   rel_err =-4.490403948e-08
i = 496000  x =-3.977407143e-02   y = 9.610065222e-01   exp_dbl = 9.610065335e-01   rel_err =-1.174323454e-08
i = 620000  x = 1.200158238e+00   y = 3.320642233e+00   exp_dbl = 3.320642334e+00   rel_err =-3.054731957e-08
i = 744000  x = 2.441616058e+00   y = 1.149159622e+01   exp_dbl = 1.149159684e+01   rel_err =-5.342903415e-08
i = 868000  x = 3.681602478e+00   y = 3.970997620e+01   exp_dbl = 3.970997748e+01   rel_err =-3.232306688e-08
i = 992000  x = 4.921588898e+00   y = 1.372204742e+02   exp_dbl = 1.372204694e+02   rel_err = 3.563072184e-08
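Regarding the alignment point above, here is a minimal sketch (my own illustration, not the test harness used for the measurements) of two common ways to get 32-byte aligned float storage that is safe for aligned AVX stores such as _mm256_store_ps; aligned_alloc is C11 and requires the size to be a multiple of the alignment:

#include <stdio.h>
#include <stdlib.h>

#define MAXSIZE 1000000

/* Option 1: static array with an explicit alignment attribute (gcc/clang). */
static float xv_static[MAXSIZE] __attribute__((aligned(32)));

int main(void){
    /* Option 2: C11 aligned_alloc; MAXSIZE * 4 bytes is a multiple of 32. */
    float *xv_heap = aligned_alloc(32, MAXSIZE * sizeof(float));
    if (xv_heap == NULL) {
        fprintf(stderr, "aligned_alloc failed\n");
        return 1;
    }

    /* Both xv_static and xv_heap are now valid targets for aligned AVX
       loads/stores at offsets that are multiples of 8 floats. */
    printf("static: %p  heap: %p\n", (void *)xv_static, (void *)xv_heap);

    free(xv_heap);
    return 0;
}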

AVX2 algorithm (x8 parallel float args), single thread: 1 minute 45 seconds

AVX2 algorithm (x8 parallel float args), 4 threads: 28 seconds

AVX2 algorithm (x8 parallel float args), 8 threads: 27 seconds

Relative error:

i           x                     y = exp256_ps(x)      double precision exp        relative error

i = 0       x =-5.000000000e+00   y = 6.737946998e-03   exp_dbl = 6.737946999e-03   rel_err =-1.124224480e-10
i = 124000  x =-3.758316040e+00   y = 2.332298271e-02   exp_dbl = 2.332298229e-02   rel_err = 1.803005727e-08
i = 248000  x =-2.516632080e+00   y = 8.073104918e-02   exp_dbl = 8.073104510e-02   rel_err = 5.057888540e-08
i = 372000  x =-1.279417157e+00   y = 2.781994045e-01   exp_dbl = 2.781993997e-01   rel_err = 1.705288467e-08
i = 496000  x =-3.954863176e-02   y = 9.612231851e-01   exp_dbl = 9.612232069e-01   rel_err =-2.269774967e-08
i = 620000  x = 1.199879169e+00   y = 3.319715738e+00   exp_dbl = 3.319715775e+00   rel_err =-1.119642824e-08
i = 744000  x = 2.440370798e+00   y = 1.147729492e+01   exp_dbl = 1.147729571e+01   rel_err =-6.896860199e-08
i = 868000  x = 3.681602478e+00   y = 3.970997620e+01   exp_dbl = 3.970997748e+01   rel_err =-3.232306688e-08
i = 992000  x = 4.923286438e+00   y = 1.374535980e+02   exp_dbl = 1.374536045e+02   rel_err =-4.676466368e-08

Source code of the multithreaded AVX2 algorithm:

/* Performance test of a multithreaded exp(x) algorithm

based on AVX implementation of Giovanni Garberoglio


 Copyright (C) 2020 Antonio R.


     AVX implementation of exp:
     Modified code from this source: https://github.com/reyoung/avx_mathfun
     Based on "sse_mathfun.h", by Julien Pommier
     http://gruntthepeon.free.fr/ssemath/
     Copyright (C) 2012 Giovanni Garberoglio
     Interdisciplinary Laboratory for Computational Science (LISC)
     Fondazione Bruno Kessler and University of Trento
     via Sommarive, 18
     I-38123 Trento (Italy)


    This software is provided 'as-is', without any express or implied
    warranty.  In no event will the authors be held liable for any damages
    arising from the use of this software.
    Permission is granted to anyone to use this software for any purpose,
    including commercial applications, and to alter it and redistribute it
    freely, subject to the following restrictions:
    1. The origin of this software must not be misrepresented; you must not
       claim that you wrote the original software. If you use this software
       in a product, an acknowledgment in the product documentation would be
       appreciated but is not required.
    2. Altered source versions must be plainly marked as such, and must not be
       misrepresented as being the original software.
    3. This notice may not be removed or altered from any source distribution.
    (this is the zlib license)

  */

  /*    gcc -O3 -m64 -mavx2 -march=haswell expc_multi.c -pthread -lm    */

#include <stdio.h>
#include <stdlib.h>
#include <immintrin.h>
#include <math.h>
#include <pthread.h>
#define MAXSIZE 1000000
#define REIT 100000
#define XXX -5
#define num_threads 4

typedef float  FLOAT32[MAXSIZE] __attribute__((aligned(32))); /* 32-byte alignment required by _mm256_store_ps */
static FLOAT32 xv;
static FLOAT32 yv;



__m256 exp256_ps(__m256 x) {

/*
  To increase the compatibility across different compilers the original code is
  converted to plain AVX2 intrinsics code without ingenious macro's,
  gcc style alignment attributes etc.
  Moreover, the part "express exp(x) as exp(g + n*log(2))" has been significantly simplified.
  This modified code is not thoroughly tested!
*/


__m256   exp_hi        = _mm256_set1_ps(88.3762626647949f);
__m256   exp_lo        = _mm256_set1_ps(-88.3762626647949f);

__m256   cephes_LOG2EF = _mm256_set1_ps(1.44269504088896341f);
__m256   inv_LOG2EF    = _mm256_set1_ps(0.693147180559945f);

__m256   cephes_exp_p0 = _mm256_set1_ps(1.9875691500E-4);
__m256   cephes_exp_p1 = _mm256_set1_ps(1.3981999507E-3);
__m256   cephes_exp_p2 = _mm256_set1_ps(8.3334519073E-3);
__m256   cephes_exp_p3 = _mm256_set1_ps(4.1665795894E-2);
__m256   cephes_exp_p4 = _mm256_set1_ps(1.6666665459E-1);
__m256   cephes_exp_p5 = _mm256_set1_ps(5.0000001201E-1);
__m256   fx;
__m256i  imm0;
__m256   one           = _mm256_set1_ps(1.0f);

        x     = _mm256_min_ps(x, exp_hi);
        x     = _mm256_max_ps(x, exp_lo);

  /* express exp(x) as exp(g + n*log(2)) */
        fx     = _mm256_mul_ps(x, cephes_LOG2EF);
        fx     = _mm256_round_ps(fx, _MM_FROUND_TO_NEAREST_INT |_MM_FROUND_NO_EXC);
__m256  z      = _mm256_mul_ps(fx, inv_LOG2EF);
        x      = _mm256_sub_ps(x, z);
        z      = _mm256_mul_ps(x,x);

__m256  y      = cephes_exp_p0;
        y      = _mm256_mul_ps(y, x);
        y      = _mm256_add_ps(y, cephes_exp_p1);
        y      = _mm256_mul_ps(y, x);
        y      = _mm256_add_ps(y, cephes_exp_p2);
        y      = _mm256_mul_ps(y, x);
        y      = _mm256_add_ps(y, cephes_exp_p3);
        y      = _mm256_mul_ps(y, x);
        y      = _mm256_add_ps(y, cephes_exp_p4);
        y      = _mm256_mul_ps(y, x);
        y      = _mm256_add_ps(y, cephes_exp_p5);
        y      = _mm256_mul_ps(y, z);
        y      = _mm256_add_ps(y, x);
        y      = _mm256_add_ps(y, one);

  /* build 2^n */
        imm0   = _mm256_cvttps_epi32(fx);
        imm0   = _mm256_add_epi32(imm0, _mm256_set1_epi32(0x7f));
        imm0   = _mm256_slli_epi32(imm0, 23);
__m256  pow2n  = _mm256_castsi256_ps(imm0);
        y      = _mm256_mul_ps(y, pow2n);
        return y;
}


void* run(void *received_Val){
    int single_val = *((int *) received_Val);
    int r;
    int i;
    float p;
    float *xp;
    float *yp;

  for (r = 0; r < REIT; r++) {
    p = XXX + 0.00001*single_val*MAXSIZE/num_threads;
    xp = xv + single_val*MAXSIZE/num_threads;
    yp = yv + single_val*MAXSIZE/num_threads;

    for (i = single_val*MAXSIZE/num_threads; i < (single_val+1)*MAXSIZE/num_threads; i += 8) {

      __m256 x = _mm256_setr_ps(p, p + 0.00001, p + 0.00002, p + 0.00003, p + 0.00004, p + 0.00005, p + 0.00006, p + 0.00007);
      __m256 y = exp256_ps(x);

      _mm256_store_ps(xp,x);
      _mm256_store_ps(yp,y);

      xp += 8;
      yp += 8;
      p += 0.00008;

    }
  }

    return 0;
}


int main(){
    int i;
      pthread_t tid[num_threads];


    for (i=0;i<num_threads;i++){
      int *arg = malloc(sizeof(*arg));
      if ( arg == NULL ) {
        fprintf(stderr, "Couldn't allocate memory for thread arg.\n");
        exit(1);
      }

      *arg = i;
        pthread_create(&(tid[i]), NULL, run, arg);
    }

    for(i=0; i<num_threads; i++)
    {
        pthread_join(tid[i], NULL);
    }

    for (i=0;i<MAXSIZE;i++){
        printf("x=%.20f, e^x=%.20f\n",xv[i],yv[i]);
    }

    return 0;
}

Graphs overview: [graph: exp(double arg) without printf loop] — not real performance: as Andreas Wenzel found, gcc does not calculate exp(x) when the results are not going to be printed, and even the float version differs a lot because far fewer assembly instructions are executed; still, that graph may be useful for assembly algorithms that only use low-level CPU caches/registers. [graph: expf(float arg) real performance, with printf loop] [graph: AVX2 algorithm real performance, with printf loop] — the AVX2 algorithm gives the best performance. [graph: performance test]

Andreas Wenzel

When you don't print the arrays at the end of the program, the gcc compiler is smart enough to realize that the results of the calculations have no observable effects. Therefore, the compiler decides that it is not necessary to actually write the results to the array, because these results are never used. Instead, these writes are optimized away by the compiler.

Also, when you don't print the results, the library function exp has no observable side-effects, provided it is not called with input that is so high that it would cause a floating-point overflow (which would cause the function to raise a floating point exception). This allows the compiler to optimize these function calls away, too.

As you can see in the assembly instructions emitted by the gcc compiler for your code which does not print the results, the compiled program doesn't unconditionally call the function exp, but instead tests if the input to the function exp would be higher than 7.09e2 (to ensure that no overflow will occur). Only if an overflow would occur, will the program jump to the code which calls the function exp. Here is the relevant assembly code:

ucomisd xmm1, xmm3
jnb .L9

In the above assembly code, the CPU register xmm3 contains the double-precision floating-point value 7.09e2. If the input is higher than this constant, the function exp would cause a floating-point overflow, because the result cannot be represented in a double-precision floating-point value.

Since your input is always valid and low enough to not cause a floating-point overflow, your program will never perform this jump, so it will never actually call the function exp.

This explains why your code is so much faster when you do not print the results. If you do not print the results, your compiler will determine that the calculations have no observable effects, so it will optimize them away.

Therefore, if you want the compiler to actually perform the calculations, you must ensure that the calculations have some observable effect. This does not mean that you have to actually print all results (which are several megabytes large). It is sufficient if you just print one line which is dependent on all the results (for example the sum of all results).
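For the code in the question, a minimal sketch of that suggestion (the checksum variable is my own naming): the #ifdef PRINT_ARRAY block in main() could be replaced by a single line that depends on every element.

    /* Print one value that depends on all results, so the compiler cannot
       discard the computation as dead code, without the cost of printing
       a million lines. */
    double checksum = 0.0;
    for (i = 0; i < MAXSIZE; i++) {
        checksum += xv[i] + yv[i];
    }
    printf("checksum = %.17g\n", checksum);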

However, if you replace the call to the library function exp with a call to some other, custom function, the compiler is not smart enough to realize that the function call has no observable effects, so it is unable to optimize the call away, even if you do not print the results of the calculations.

For the reasons stated above, if you want to compare the performance of two functions, you must make sure that the calculations actually take place, by ensuring that the results have an observable effect. Otherwise, you run the risk of the compiler optimizing away at least some of the calculations, and the comparison would not be fair.
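A minimal, self-contained sketch of that idea (my_exp is a hypothetical stand-in for whichever implementation is being measured; the volatile sink is a common benchmarking idiom, not something taken from the answer above):

/* gcc -O3 sink_test.c -lm */
#include <stdio.h>
#include <math.h>

#define MAXSIZE 1000000

/* Hypothetical candidate; replace with the implementation under test. */
static double my_exp(double x) { return exp(x); }

int main(void){
    volatile double sink = 0.0;   /* volatile: the stores cannot be removed */
    double p = -5.0;

    for (int i = 0; i < MAXSIZE; i++) {
        sink += my_exp(p);        /* result is observed, so it must be computed */
        p += 0.00001;
    }

    /* One line of output that depends on all iterations. */
    printf("sink = %.17g\n", (double)sink);
    return 0;
}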
