Y2R probably isn't one. AFAIR, the decoding is still done on the CPU and not in the OS (otherwise video playback would probably already be a lot faster).
I think it would work for:
- Vertex shaders
- Software rendering
- Texture decoding
- Texture / Surface conversions
- Possibly audio, but the OpenCL overhead is probably too large
* = I think that, rather than working on OpenCL, we should first use multithreading (MT) / OpenMP
** = Would require OpenCL / OpenGL interaction
So overall: I'm not a fan of OpenCL for most things. I believe it adds too much complexity to the design early on, and in a lot of cases the overhead is probably larger than the benefits.
As Citra currently does all of the above tasks single-threaded, we should first try to make better use of the CPU by multithreading them.
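A minimal sketch of the multithreading idea, in C++ since that is what Citra is written in. `ParallelFor`, `Rgb565ToRgba8888` and `DecodeRgb565Parallel` are hypothetical names, not Citra functions, and RGB565 is only a stand-in texture format. Independent work items (texels, vertices) are split into contiguous chunks, one per thread. OpenMP would express the same loop as `#pragma omp parallel for`; plain `std::thread` is used here so the sketch builds without OpenMP compiler flags.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: splits [0, count) into contiguous chunks, runs
// body(begin, end) for each chunk on its own std::thread, then joins them.
void ParallelFor(std::size_t count, unsigned num_threads,
                 const std::function<void(std::size_t, std::size_t)>& body) {
    if (count == 0) return;
    // hardware_concurrency() may return 0, so always use at least one thread.
    const std::size_t threads =
        std::min<std::size_t>(std::max(1u, num_threads), count);
    const std::size_t chunk = (count + threads - 1) / threads;
    std::vector<std::thread> workers;
    for (std::size_t begin = 0; begin < count; begin += chunk) {
        const std::size_t end = std::min(count, begin + chunk);
        workers.emplace_back([&body, begin, end] { body(begin, end); });
    }
    for (std::thread& w : workers) w.join();
}

// Hypothetical example task: expand one RGB565 texel to RGBA8888,
// packed as 0xRRGGBBAA with alpha forced to 0xFF.
std::uint32_t Rgb565ToRgba8888(std::uint16_t p) {
    const std::uint32_t r5 = (p >> 11) & 0x1F;
    const std::uint32_t g6 = (p >> 5) & 0x3F;
    const std::uint32_t b5 = p & 0x1F;
    // Replicate the high bits into the low bits so 0x1F maps to 0xFF.
    const std::uint32_t r = (r5 << 3) | (r5 >> 2);
    const std::uint32_t g = (g6 << 2) | (g6 >> 4);
    const std::uint32_t b = (b5 << 3) | (b5 >> 2);
    return (r << 24) | (g << 16) | (b << 8) | 0xFFu;
}

std::vector<std::uint32_t> DecodeRgb565Parallel(
        const std::vector<std::uint16_t>& src, unsigned num_threads) {
    std::vector<std::uint32_t> dst(src.size());
    // Each thread writes a disjoint slice of dst, so no locking is needed.
    ParallelFor(src.size(), num_threads, [&](std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i) dst[i] = Rgb565ToRgba8888(src[i]);
    });
    return dst;
}
```

For example, `DecodeRgb565Parallel(texels, std::thread::hardware_concurrency())` would spread one texture over all cores. A real implementation would more likely keep a persistent thread pool instead of creating threads on every call.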
Additionally, the vertex shaders currently do a ton of memory accesses. We should fix those (by adding a register allocator) and use AVX-512 if available.
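A toy model of why a register allocator cuts memory traffic. This is not Citra's actual shader JIT, and every name here (`Instr`, `RegCache`, `CountNaiveMemOps`, `CountAllocatedMemOps`) is hypothetical. Assumptions: each instruction reads two source registers and writes one destination; writes overwrite the whole register (real shaders typically have per-component write masks, which would force a load); the "naive" JIT loads every operand from, and stores every result to, an in-memory register file. The allocator keeps guest registers in a fixed number of host registers with least-recently-used eviction.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <unordered_map>
#include <vector>

// Hypothetical shader instruction: dst = op(src1, src2). Only the register
// indices matter, because only memory accesses are counted.
struct Instr {
    int dst, src1, src2;
};

// Naive JIT model: 2 loads + 1 store per instruction.
std::size_t CountNaiveMemOps(const std::vector<Instr>& prog) {
    return prog.size() * 3;
}

// LRU cache of guest registers held in `num_host_regs` host registers.
class RegCache {
public:
    explicit RegCache(std::size_t num_host_regs) : capacity_(num_host_regs) {
        // Two sources and one destination must fit at the same time.
        assert(capacity_ >= 3);
    }

    void Read(int reg) { Touch(reg, /*write=*/false); }
    void Write(int reg) { Touch(reg, /*write=*/true); }

    // Stores every still-dirty register back to memory and returns the total
    // number of memory accesses (loads + stores) emitted.
    std::size_t Finish() {
        for (const auto& entry : entries_)
            if (entry.second.dirty) ++mem_ops_;
        entries_.clear();
        lru_.clear();
        return mem_ops_;
    }

private:
    struct Entry {
        std::list<int>::iterator pos;
        bool dirty;
    };

    void Touch(int reg, bool write) {
        auto it = entries_.find(reg);
        if (it != entries_.end()) {
            lru_.splice(lru_.begin(), lru_, it->second.pos);  // now most recent
            it->second.dirty = it->second.dirty || write;
            return;
        }
        if (entries_.size() == capacity_) {
            const int victim = lru_.back();  // least recently used
            auto v = entries_.find(victim);
            if (v->second.dirty) ++mem_ops_;  // spill: store before reuse
            entries_.erase(v);
            lru_.pop_back();
        }
        if (!write) ++mem_ops_;  // fill: load the current value
        lru_.push_front(reg);
        entries_.emplace(reg, Entry{lru_.begin(), write});
    }

    std::size_t capacity_;
    std::size_t mem_ops_ = 0;
    std::list<int> lru_;  // front = most recently used
    std::unordered_map<int, Entry> entries_;
};

std::size_t CountAllocatedMemOps(const std::vector<Instr>& prog,
                                 std::size_t num_host_regs) {
    RegCache cache(num_host_regs);
    for (const Instr& in : prog) {
        cache.Read(in.src1);
        cache.Read(in.src2);
        cache.Write(in.dst);  // full-register write: no load needed
    }
    return cache.Finish();
}
```

For a program that computes `r0 = r1 op r2` and then `r0 = r0 op r3` three times, this model gives 12 memory accesses for the naive scheme and 4 with the cache (three loads plus one final store). The AVX-512 part of the suggestion is orthogonal and not modeled here.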
The same goes for the software renderer: it's absolutely horrible and slow right now. It should be a simple JIT, multithreaded, with a lot of SIMD (and a fallback interpreter using the same emitter code).
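One possible way to get a JIT and a fallback interpreter from the same emitter code, sketched under assumptions: each pipeline stage is written once as a template over a backend. `EmitBlend`, `InterpreterBackend` and `ClosureBackend` are hypothetical, and the alpha-blend formula is only an example stage. The interpreter backend computes values immediately; the closure backend composes a `std::function` once, standing in for a real JIT that would emit machine code at the same points.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>

// Backend 1: evaluates every operation immediately (the fallback interpreter).
struct InterpreterBackend {
    using Value = int;
    Value Const(int c) { return c; }
    Value Add(Value x, Value y) { return x + y; }
    Value Sub(Value x, Value y) { return x - y; }
    Value Mul(Value x, Value y) { return x * y; }
    Value Div255(Value x) { return x / 255; }
};

// Backend 2: records operations as composed closures that are built once and
// then run per pixel; a stand-in for a JIT emitting machine code.
struct ClosureBackend {
    using Args = std::array<int, 3>;  // src, dst, alpha
    using Value = std::function<int(const Args&)>;
    Value Arg(std::size_t i) { return [i](const Args& a) { return a[i]; }; }
    Value Const(int c) { return [c](const Args&) { return c; }; }
    Value Add(Value x, Value y) { return [x, y](const Args& a) { return x(a) + y(a); }; }
    Value Sub(Value x, Value y) { return [x, y](const Args& a) { return x(a) - y(a); }; }
    Value Mul(Value x, Value y) { return [x, y](const Args& a) { return x(a) * y(a); }; }
    Value Div255(Value x) { return [x](const Args& a) { return x(a) / 255; }; }
};

// The shared "emitter": a hypothetical blend stage written once for any backend,
// out = (src * alpha + dst * (255 - alpha)) / 255.
template <typename Backend>
typename Backend::Value EmitBlend(Backend& b, typename Backend::Value src,
                                  typename Backend::Value dst,
                                  typename Backend::Value alpha) {
    auto inv_alpha = b.Sub(b.Const(255), alpha);
    auto sum = b.Add(b.Mul(src, alpha), b.Mul(dst, inv_alpha));
    return b.Div255(sum);
}
```

Usage: `EmitBlend(interp, 200, 100, 128)` returns 150 directly, while `EmitBlend(jit, jit.Arg(0), jit.Arg(1), jit.Arg(2))` returns a callable that gives the same result for `{200, 100, 128}`. Because the stage only calls backend operations, a SIMD or machine-code backend could be added without touching the stage itself.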