
exp4: Auto-Vectorization and Manual Vectorization with Intrinsics

Course Work | Introduction to High Performance Computing | intrinsic

Performance

Method      Time
baseline    4711 us
auto simd   530 us
intrinsic   514 us

Implementation

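#include <immintrin.h>  // AVX intrinsics

// Computes c[i] = a[i] + b[i], 8 floats per iteration.
// Assumes n is a multiple of 8 and a, b, c are 32-byte aligned,
// as required by _mm256_load_ps / _mm256_store_ps.
void a_plus_b_intrinsic(float* a, float* b, float* c, int n) {
  for (int i = 0; i < n; i += 8) {
    _mm256_store_ps(
        c + i, _mm256_add_ps(_mm256_load_ps(a + i), _mm256_load_ps(b + i)));
  }
}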
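
The version above only works when n is a multiple of 8 and all three pointers are 32-byte aligned. As a rough sketch of how those preconditions could be relaxed, the variant below uses unaligned loads and stores plus a scalar loop for the leftover elements. The name a_plus_b_unaligned and the TARGET_AVX macro are hypothetical and not part of the original experiment, and the timings in the table do not apply to this variant.

```cpp
#include <immintrin.h>  // AVX intrinsics (_mm256_*)

// GCC/Clang extension (assumption: one of these compilers is used): enable
// AVX for this one function so the file builds without -mavx. With other
// compilers, enable AVX through the compiler flags instead.
#if defined(__GNUC__) || defined(__clang__)
#define TARGET_AVX __attribute__((target("avx")))
#else
#define TARGET_AVX
#endif

// Hypothetical variant: c[i] = a[i] + b[i] for any n and any alignment.
TARGET_AVX
void a_plus_b_unaligned(const float* a, const float* b, float* c, int n) {
  int i = 0;
  // Main loop: 8 floats per iteration using unaligned loads and stores.
  for (; i + 8 <= n; i += 8) {
    __m256 va = _mm256_loadu_ps(a + i);
    __m256 vb = _mm256_loadu_ps(b + i);
    _mm256_storeu_ps(c + i, _mm256_add_ps(va, vb));
  }
  // Scalar tail: the remaining n % 8 elements.
  for (; i < n; ++i) {
    c[i] = a[i] + b[i];
  }
}
```

Calling it with n = 19, for example, processes elements 0 to 15 in two vector iterations and the last three in the scalar tail. Whether the unaligned instructions cost anything measurable compared with the aligned version depends on the CPU and on the actual alignment of the data, so it would need to be timed separately.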