OpenCL is closer to OpenMP than to the threading APIs of Win32 and POSIX,
supporting data-parallel execution while retaining a low level of control. The unit of
concurrent execution in OpenCL C is a work-item. As with the two previous examples,
each work-item executes the kernel function body. Instead of manually
strip-mining the loop, we will often map a single iteration of the loop to a work-item.
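
As a rough illustration of that mapping, the sketch below assumes the loop in question is an element-wise vector addition (the excerpt does not show the loop itself). The names `vecadd_serial`, `vecadd_work_item`, and `launch_1d` are hypothetical and exist only for this example. `launch_1d` is a plain C stand-in for the OpenCL runtime, which would normally create the work-items and could run them concurrently in any order. The comment shows what the equivalent OpenCL C kernel might look like.

```c
#include <stddef.h>

/* Serial version: a single thread walks every iteration of the loop. */
void vecadd_serial(const float *a, const float *b, float *c, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        c[i] = a[i] + b[i];
    }
}

/*
 * Work-item version: the loop is gone and the body handles one index.
 * In OpenCL C the same body would look roughly like this:
 *
 *   __kernel void vecadd(__global const float *a,
 *                        __global const float *b,
 *                        __global float *c) {
 *       size_t gid = get_global_id(0);
 *       c[gid] = a[gid] + b[gid];
 *   }
 *
 * Here `gid` is passed in explicitly (an assumption of this sketch)
 * instead of being queried with get_global_id(0).
 */
void vecadd_work_item(const float *a, const float *b, float *c, size_t gid) {
    c[gid] = a[gid] + b[gid];
}

/*
 * Hypothetical stand-in for the runtime launching n work-items over a
 * one-dimensional index space. It runs them sequentially; a real OpenCL
 * implementation makes no such ordering guarantee.
 */
void launch_1d(const float *a, const float *b, float *c, size_t n) {
    for (size_t gid = 0; gid < n; ++gid) {
        vecadd_work_item(a, b, c, gid);
    }
}
```

Because each work-item writes only its own element of `c`, the result does not depend on the order in which the work-items run, which is what makes the one-iteration-per-work-item mapping safe for a loop like this.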