When developing concurrent programs for a CPU using OS threading APIs or
OpenMP, for example, the programmer typically sizes the number of threads to the
physical resources available (e.g., CPU cores), because creating threads and
switching between them adds overhead once their number substantially exceeds
the available cores. With OpenCL, by contrast, the goal is often to express
parallelism programmatically at the finest granularity possible (for example,
one work-item per output element). The generality of the OpenCL interface and
of its low-level kernel language allows such fine-grained programs to be mapped
efficiently onto a wide range of hardware.
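The contrast in granularity can be sketched with a simple vector addition. This is a minimal illustration under stated assumptions: the function names `vec_add_coarse`, `vec_add_fine`, `worker_chunk`, and `vec_add_item` are hypothetical, and both versions run serially so the example stays self-contained. In a real program each chunk of the coarse version would be handed to its own thread (e.g., via pthreads or OpenMP), and the per-element body would be the OpenCL kernel, launched over an n-element index space with `clEnqueueNDRangeKernel`.

```c
#include <assert.h>
#include <stddef.h>

/* OpenCL C kernel source (illustrative only; not compiled by this program).
   Each work-item computes exactly one output element; the OpenCL runtime
   decides how that index space is mapped onto the device hardware. */
const char *const vec_add_kernel_src =
    "__kernel void vec_add(__global const float *a,\n"
    "                      __global const float *b,\n"
    "                      __global float *c) {\n"
    "    size_t i = get_global_id(0);\n"
    "    c[i] = a[i] + b[i];\n"
    "}\n";

/* Coarse-grained CPU style: one contiguous chunk of work per worker. */
static void worker_chunk(const float *a, const float *b, float *c,
                         size_t begin, size_t end) {
    for (size_t i = begin; i < end; i++)
        c[i] = a[i] + b[i];
}

/* The decomposition is sized to nworkers (typically the core count).
   Each loop iteration stands in for one thread; here they run serially. */
void vec_add_coarse(const float *a, const float *b, float *c,
                    size_t n, size_t nworkers) {
    assert(nworkers > 0);
    size_t chunk = (n + nworkers - 1) / nworkers; /* ceiling division */
    for (size_t w = 0; w < nworkers; w++) {
        size_t begin = w * chunk;
        size_t end = (begin + chunk < n) ? begin + chunk : n;
        if (begin < end)
            worker_chunk(a, b, c, begin, end);
    }
}

/* Fine-grained OpenCL style: the body handles a single index, mirroring
   the kernel above; global_id plays the role of get_global_id(0). */
static void vec_add_item(const float *a, const float *b, float *c,
                         size_t global_id) {
    c[global_id] = a[global_id] + b[global_id];
}

/* Serial stand-in for an n-element NDRange launch: parallelism is expressed
   as one invocation per element, independent of the number of cores. */
void vec_add_fine(const float *a, const float *b, float *c, size_t n) {
    for (size_t gid = 0; gid < n; gid++)
        vec_add_item(a, b, c, gid);
}
```

Both functions produce the same result; the difference lies in who decides how work is grouped. In the coarse version the programmer bakes the worker count into the decomposition, while in the fine-grained version the code only names the individual units of work and leaves their grouping and scheduling to the runtime and hardware.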