排序32位整数与排序64位整数一样快

Klein 发表于 Dev

克莱因

我认为这是事实：

快速排序应该相当友好地缓存。
64字节的缓存行可以包含16个32位int或8个64位int。

假设：

对32位整数的向量进行排序应比对64位整数的向量进行排序更快。

但是当我运行下面的代码时，我得到了结果：

i16 = 7.5168 
i32 = 7.3762 
i64 = 7.5758

为什么我没有得到想要的结果？

C ++：

#include <iostream> 
#include <vector>
#include <cstdint>
#include <algorithm>
#include <chrono>


int main() {
  const int vlength = 100'000'000;
  const int maxI = 50'000;

  std::vector<int16_t> v16;
  for (int i = 0; i < vlength; ++i) {
    v16.push_back(int16_t(i%maxI));
  }
  std::random_shuffle(std::begin(v16), std::end(v16));
  std::vector<int32_t> v32;
  std::vector<int64_t> v64;
  for (int i = 0; i < vlength; ++i) {
    v32.push_back(int32_t(v16[i]));
    v64.push_back(int64_t(v16[i]));
  }

  auto t1 = std::chrono::high_resolution_clock::now();
  std::sort(std::begin(v16), std::end(v16));
  auto t2 = std::chrono::high_resolution_clock::now();
  std :: cout << "i16 = " << (std::chrono::duration_cast<std::chrono::duration<double>>(t2 - t1)).count() << std :: endl;

  t1 = std::chrono::high_resolution_clock::now();
  std::sort(std::begin(v32), std::end(v32));
  t2 = std::chrono::high_resolution_clock::now();
  std :: cout << "i32 = " << (std::chrono::duration_cast<std::chrono::duration<double>>(t2 - t1)).count() << std :: endl;

  t1 = std::chrono::high_resolution_clock::now();
  std::sort(std::begin(v64), std::end(v64));
  t2 = std::chrono::high_resolution_clock::now();
  std :: cout << "i64 = " << (std::chrono::duration_cast<std::chrono::duration<double>>(t2 - t1)).count() << std :: endl;

}

编辑：为了避免如何进行缓存友好排序的问题，我还尝试了以下代码：

template <typename T>
inline void function_speed(T& vec) {
  for (auto& i : vec) {
    ++i;
  }
}

int main() {
  const int nIter = 1000;

  std::vector<int16_t> v16(1000000);
  std::vector<int32_t> v32(1000000);
  std::vector<int64_t> v64(1000000);



  auto t1 = std::chrono::high_resolution_clock::now();
  for (int i = 0; i < nIter; ++i) {
    function_speed(v16);
  }
  auto t2 = std::chrono::high_resolution_clock::now();
  std :: cout << "i16 = " << (std::chrono::duration_cast<std::chrono::duration<double>>(t2 - t1)).count()/double(nIter) << std :: endl;

  t1 = std::chrono::high_resolution_clock::now();
  for (int i = 0; i < nIter; ++i) {
    function_speed(v32);
  }
  t2 = std::chrono::high_resolution_clock::now();
  std :: cout << "i32 = " << (std::chrono::duration_cast<std::chrono::duration<double>>(t2 - t1)).count()/double(nIter) << std :: endl;

  t1 = std::chrono::high_resolution_clock::now();
  for (int i = 0; i < nIter; ++i) {
    function_speed(v64);
  }
  t2 = std::chrono::high_resolution_clock::now();
  std :: cout << "i64 = " << (std::chrono::duration_cast<std::chrono::duration<double>>(t2 - t1)).count()/double(nIter) << std :: endl;

}

典型结果：

i16 = 0.00618648
i32 = 0.00617911
i64 = 0.00606275

我知道适当的基准测试本身就是一门科学，也许我做错了。

EDIT2：通过避免溢出，我现在开始获得更多有趣的结果：

template <typename T>
inline void function_speed(T& vec) {
  for (auto& i : vec) {
    ++i;
    i %= 1000;
  }
}

给出如下结果：

i16 = 0.0143789
i32 = 0.00958941
i64 = 0.019691

如果我改为：

template <typename T>
inline void function_speed(T& vec) {
  for (auto& i : vec) {
    i = (i+1)%1000;
  }
}

我得到：

i16 = 0.00939448
i32 = 0.00913768
i64 = 0.019615

杂项

错误的假设；对于大多数N，所有O（N log N）排序算法都必须对缓存不友好！可能的输入。

Furtehrmore，我认为优化的编译器可以彻底删除排序，而未经优化的构建对于基准测试毫无意义。

本文收集自互联网，转载请注明来源。

如有侵权，请联系 [email protected] 删除。

编辑于 2020-11-3

我来说两句

0 条评论

登录后参与评论

上一篇：使用ICompositeViewEngine的解析程序或ServiceProvider的必需依赖项

64位向量的点乘积比32位无符号整数的向量快一倍？

汇编：使用两个32位寄存器中的值进行除法，就好像它们是一个64位整数一样

关键任务工作负载的64位JVM和32位一样好？

如何迫使Wine在64位Ubuntu上像32位Windows一样工作？

用两个32位整数“模拟”一个64位整数

在32位iPhone上处理64位整数

windows 使用 long int 就像在 32 位机器中一样，而在 64 位机器中

右移32位整数

反向32位整数

将 32 位有符号整数转换为 64 位整数，同时保留精确位

为什么将两个32位整数合并为一个64位整数？

按第一位数而不是整数排序

如何将32位带符号整数转换为64位无符号整数的高32位？

alpha png的24位与32位png一样吗？

对整数列表进行排序，就好像它是Java中的字符串列表一样

检测32位整数溢出

从64位整数到64位整数的可逆“哈希”函数

为什么复数乘法几乎与python中的整数乘法一样快？

两个整数的“ min”与“ bit hacking”一样快吗？

Fortran能够以与整数加法一样快的速度计算real * 4的平方根？

以与GCC builtinpopcount（int）一样快的整数计数位1

64位操作系统能否从闪存驱动器中像32位版本一样快速流畅地运行？

如何用八（8）个4位整数创建一个32位整数？

Intel（x86_64）64位与32位整数算术性能差异

为什么64位JVM比32位JVM快？

如何使用64位无符号整数作为App Engine数据存储区中的实体键并保留排序顺序？

将两个32位整数转换为一个带符号的64位整数字符串

使用移位和加法对 64 位整数进行模 7（32 位有效，但 64 位无效）

优化器可以省略从 64 位到 32 位整数类型的分配吗？

TOP 榜单

文章

排序32位整数与排序64位整数一样快

排序32位整数与排序64位整数一样快

Android Studio Kotlin：提取为常量

IE 11中的FormData未定义

计算数据帧R中的字符串频率

如何在R中转置数据

如何使用Redux-Toolkit重置Redux Store

Excel 2016图表将增长与4个参数进行比较

在 Python 2.7 中。如何从文件中读取特定文本并分配给变量

未捕获的SyntaxError：带有Ajax帖子的意外令牌u

OpenCv：改变 putText() 的位置

ActiveModelSerializer仅显示关联的ID

算术中的c ++常量类型转换

如何开始为Ubuntu开发

将加号/减号添加到jQuery菜单

去噪自动编码器和常规自动编码器有什么区别？

获取并汇总所有关联的数据

OpenGL纹理格式的颜色错误

在 React Native Expo 中使用 react-redux 更改另一个键的值

http：// localhost：3000 /＃！/为什么我在localhost链接中得到“＃！/”。

TreeMap中的自定义排序

Redux动作正常，但减速器无效

如何对treeView的子节点进行排序