exp8: Single-Node Performance Optimization

Course Work · Introduction to High Performance Computing

Task 0

Performance

Option   Elapsed Time / seconds   Performance / GFlops
-O0      1.0072                   0.2665
-O1      0.3461                   0.7755
-O2      0.3332                   0.8057
-O3      0.0496                   5.4081
-fast    0.0386                   6.9524
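
To make the two columns easier to relate, here is a minimal sketch of how such figures are typically derived. It assumes GFlops is computed as the floating-point operation count divided by elapsed time times 1e9; the benchmark's real operation count is not given in this write-up, so `flop_count` is a hypothetical parameter, and both function names are illustrative only.

```c
#include <assert.h>

/* GFlops = floating-point operations / (elapsed seconds * 1e9).
 * flop_count is a hypothetical input; the write-up does not state it. */
double gflops(double flop_count, double seconds) {
    assert(seconds > 0.0);
    return flop_count / (seconds * 1e9);
}

/* Speedup of an optimized build relative to a baseline build,
 * computed from the two elapsed times. */
double speedup(double baseline_seconds, double optimized_seconds) {
    assert(optimized_seconds > 0.0);
    return baseline_seconds / optimized_seconds;
}
```

With the times from the table, `speedup(1.0072, 0.0386)` comes out at roughly 26, which agrees with the ratio of the reported GFlops values (6.9524 / 0.2665).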

Task 1

Performance

UNROLL_N   Elapsed Time / seconds   Performance / GFlops
1          2.0814                   15.7431
2          1.9311                   16.9688
4          1.8048                   18.1562
8          1.7787                   18.4227
16         1.8276                   17.9297

Questions

Question 1

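Referring to the ICC manual, briefly describe which compiler optimizations each of the options (-O0, -O1, -O2, -O3, -fast) performs. Listing a few optimization techniques per option is sufficient.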

Answer 1
-O0

Disables all optimizations.

-O1
  • data-flow analysis
  • code motion
  • strength reduction and test replacement
  • split-lifetime analysis
  • instruction scheduling
-O2
  • Vectorization
  • Inlining of intrinsics
  • inlining
  • constant propagation
  • forward substitution
  • routine attribute propagation
  • variable address-taken analysis
  • dead static function elimination
  • removal of unreferenced variables
-O3
  • Fusion
  • Block-Unroll-and-Jam
  • collapsing if statements
-fast
  • interprocedural optimization between files
  • optimization of floating-point divides that give slightly less precise results than full IEEE division
  • link all libraries statically
  • generate instructions for the highest instruction set available
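
As an illustration of two of the -O1 items above (code motion and strength reduction), the following sketch applies both transformations by hand. The functions and their parameters are hypothetical and are not taken from the course code; whether a compiler performs exactly these rewrites on a given loop depends on its own analysis.

```c
#include <stddef.h>

/* Before: scale * offset is recomputed on every iteration although it
 * never changes, and i * stride needs a multiplication each time. */
void fill_naive(double *out, size_t n, double scale, double offset,
                size_t stride) {
    for (size_t i = 0; i < n; ++i) {
        out[i] = scale * offset + (double)(i * stride);
    }
}

/* After: the same computation with both transformations applied manually. */
void fill_optimized(double *out, size_t n, double scale, double offset,
                    size_t stride) {
    const double base = scale * offset; /* code motion: hoisted out of the loop */
    size_t k = 0;                       /* strength reduction: k tracks i * stride */
    for (size_t i = 0; i < n; ++i) {
        out[i] = base + (double)k;
        k += stride;                    /* an addition replaces the multiplication */
    }
}
```

Both functions write the same values to `out`; the second one simply does less work per iteration.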
Question 2

Briefly describe the benefits that loop unrolling brings in Task 1.

Answer 2
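  • Reduces the number of loop-variable comparisons and branch jumps
  • Reduces data dependencies between consecutive operations, increases instruction-level parallelism, and makes fuller use of the CPU pipeline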
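
Both points can be seen in a small hand-unrolled loop. The Task 1 kernel and the way it uses UNROLL_N are not shown in this write-up, so the sketch below uses a hypothetical array sum with an unroll factor of 4 instead.

```c
#include <stddef.h>

/* Baseline: one compare-and-branch per element, and every addition
 * depends on the result of the previous one. */
double sum_simple(const double *x, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; ++i) {
        s += x[i];
    }
    return s;
}

/* Unrolled by 4: one compare-and-branch per four elements, and four
 * independent accumulators that the pipeline can work on in parallel. */
double sum_unrolled4(const double *x, size_t n) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i];
        s1 += x[i + 1];
        s2 += x[i + 2];
        s3 += x[i + 3];
    }
    double s = (s0 + s1) + (s2 + s3);
    for (; i < n; ++i) { /* remainder loop for n not divisible by 4 */
        s += x[i];
    }
    return s;
}
```

Because the unrolled version adds the elements in a different order, its floating-point result can differ from the baseline in the last bits for general inputs. This is one reason a compiler may only apply this kind of reduction unrolling on its own when relaxed floating-point semantics are allowed.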