Speculative Execution

With speculative execution, the operations are evaluated, but the final results are not stored in the program registers or data memory until the processor can be certain that these instructions should actually have been executed.

Eliminate write/read Dependency

the outcome of a memory read depends on a recent memory write.

Loop Unrolling

for (int i=0; i<n-3; i+=4)  // note the n-3 bound for starting i + 0..3
{
  sum1 += data[i+0];
  sum2 += data[i+1];
  sum3 += data[i+2];
  sum4 += data[i+3];
}
sum = sum1 + sum2 + sum3 + sum4;
// if n%4 != 0, handle final 0..3 elements with a rolled up loop or whatever

In general, we have found that unrolling a loop and accumulating multiple values in parallel is a more reliable way to achieve improved program performance.

Pipelining

In computer scienceCPU Instruction pipelining is a technique for implementing instruction-level parallelism within a single processor. Pipelining attempts to keep every part of the processor busy with some CPU Instruction by dividing incoming instructions into a series of sequential steps (the eponymous “pipeline”) performed by different processor units with different parts of instructions processed in parallel.

A key feature of pipelining is that it increases the throughput of the system (i.e., the number of customers served per unit time), but it may also slightly increase the latency

Resources

优化算法复杂度(从 O(N)优化到 O(logN) ),优化锁的粒度或者无锁化,应用各种池化技术:内存池、连接池、线程池,协程池等,压缩技术,预拉取,缓存,批量处理,SIMD,内存对齐等等手段后,其实还有一种手段就是 Profile-Guided Optimization (PGO)。本文会介绍 PGO 的原理,以及 Go/C++语言进行 PGO 的实践。