Commit d5aa0ef


Implement internal tiling for demosaicing

All demosaicing is now done with internal tiling in both the CPU and OpenCL code paths, whenever the available memory and the algorithms in use require it.

We use simple horizontal tiles for performance. On the CPU this requires no copy of the input data; we only have to stitch the output data. For OpenCL we have to copy image data before and after the tiling code, but this is fast because the data is contiguous and everything happens in GPU memory. We fall back to the CPU only if the input/tile height ratio is too large.

The pipe's detail mask is calculated and written from the sharpened output data after internal tiling.

If we don't have to tile, there is no performance penalty at all. In general, the new internal tiling is faster in the vast majority of cases:

- Stitching is much faster, especially with OpenCL. We avoid transfers from/to graphics memory because everything is done in graphics memory. This strategy leads to more tiles, as we have to keep the output buffer for stitching. On my 8GB NVIDIA card with default settings, a 40 Mpix X-Trans image demosaiced with Markesteijn 3-pass previously took ~930 ms with two tiles; the new internal tiling code uses 10 tiles but takes just 860 ms.
- The generic tiling required the costly tiling_roi variants.
- If we want a details blending mask and memory resources would require tiling, we now avoid the CPU fallback, with drastically improved performance.

Also includes some tiling-related logs and deduplications.
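The CPU strategy described above (horizontal tiles, no input copy, stitching only the output) can be illustrated with a minimal sketch. This is not the code from this commit: `filter_rows`, `process_tiled` and `OVERLAP` are hypothetical names, and a 3-tap vertical average stands in for a real demosaicer that needs a border of neighbouring rows. Because image rows are contiguous, each tile's input is just a pointer offset into the full buffer; only the tile's inner rows are copied into the output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

enum { OVERLAP = 1 }; /* context rows the stand-in filter needs above and below */

static inline int imin(int a, int b) { return a < b ? a : b; }
static inline int imax(int a, int b) { return a > b ? a : b; }

/* Hypothetical stand-in for a demosaicer: 3-tap vertical average over
   `rows` rows, clamping at the first and last row of the given buffer. */
static void filter_rows(const float *in, float *out, int width, int rows)
{
  for(int y = 0; y < rows; y++)
  {
    const int up = imax(y - 1, 0);
    const int dn = imin(y + 1, rows - 1);
    for(int x = 0; x < width; x++)
      out[(size_t)y * width + x] = (in[(size_t)up * width + x]
                                    + in[(size_t)y * width + x]
                                    + in[(size_t)dn * width + x]) / 3.0f;
  }
}

/* Process the image in horizontal tiles of at most `tile_rows` output rows.
   Returns the number of tiles processed, or -1 if allocation fails. */
static int process_tiled(const float *in, float *out,
                         int width, int height, int tile_rows)
{
  assert(width > 0 && height > 0 && tile_rows > 0);

  /* Scratch output for one tile, including its overlap rows. */
  float *tmp = malloc(sizeof(float) * (size_t)width * (tile_rows + 2 * OVERLAP));
  if(!tmp) return -1;

  int tiles = 0;
  for(int y0 = 0; y0 < height; y0 += tile_rows)
  {
    const int y1 = imin(y0 + tile_rows, height);
    /* Extend the tile by the overlap, clamped to the image. */
    const int in_y0 = imax(y0 - OVERLAP, 0);
    const int in_y1 = imin(y1 + OVERLAP, height);

    /* No input copy: the tile input is a pointer into the full image. */
    filter_rows(in + (size_t)in_y0 * width, tmp, width, in_y1 - in_y0);

    /* Stitch: copy only the inner rows [y0, y1) into the final output. */
    memcpy(out + (size_t)y0 * width,
           tmp + (size_t)(y0 - in_y0) * width,
           sizeof(float) * (size_t)width * (y1 - y0));
    tiles++;
  }

  free(tmp);
  return tiles;
}
```

Calling `process_tiled(in, out, width, height, tile_rows)` should give the same result as running `filter_rows` once over the whole image, since every inner row sees the same neighbours in both cases. The scratch buffer also shows why keeping an output buffer for stitching costs memory and leads to more, smaller tiles.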

1 parent 30ed62b, commit d5aa0ef

5 files changed, +390 -299 lines changed

src/iop
- demosaic.c
- demosaicing


Comments

(0)
