Commit d5aa0ef

Implement internal tiling for demosaicing
All demosaicing is now done with internal tiling in the CPU and OpenCL code paths (when required by available memory and the algorithm in use).

We use simple horizontal tiles for performance. On the CPU this requires no copy of the input data; only the output data has to be stitched. For OpenCL we have to copy image data before and after the tiling code, but this is fast because the data is contiguous and everything happens in GPU memory. We fall back to the CPU only if the input/tile height ratio is too large.

The pipe's detail mask is now written from the sharpened output data after internal tiling.

If we don't have to tile, there is no performance penalty at all. In general, the new internal tiling is faster in the vast majority of cases:

- Stitching is much faster, especially with OpenCL. We avoid transfers from/to graphics memory; everything is done in graphics memory. This strategy leads to more tiles because we have to keep the output buffer for stitching. On my 8GB NVIDIA card with default settings, a 40 Mpix X-Trans image demosaiced with Markesteijn 3-pass took ~930 msec with two tiles; the new internal tiling code uses 10 tiles but takes just 860 msec.
- The generic tiling required the costly tiling_roi variants.
- If we want a details blending mask and memory resources would require tiling, we now avoid the CPU fallback, with drastically improved performance.

Also some tiling-related logs and deduplications.
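The horizontal tiling and stitching described above can be illustrated with a minimal sketch. It is not the code from this commit: `plan_horizontal_tiles`, `process_tiled`, and the `vertical_blur` stand-in kernel are hypothetical names, and the one-row overlap is an assumption chosen to match the toy kernel. The idea shown is that each tile reads a few extra rows above and below its output range, and only the valid rows are stitched into the output buffer.

```python
def plan_horizontal_tiles(height, tile_rows, overlap):
    """Split `height` rows into horizontal tiles (hypothetical helper).

    Returns (in_start, in_end, out_start, out_end) tuples. The input range
    extends the output range by up to `overlap` rows on each side so the
    kernel sees real neighbours at tile borders.
    """
    if tile_rows <= 0:
        raise ValueError("tile_rows must be positive")
    tiles = []
    out_start = 0
    while out_start < height:
        out_end = min(out_start + tile_rows, height)
        in_start = max(out_start - overlap, 0)
        in_end = min(out_end + overlap, height)
        tiles.append((in_start, in_end, out_start, out_end))
        out_start = out_end
    return tiles


def vertical_blur(rows):
    """Stand-in for a demosaicer: 3-tap vertical mean with clamped edges."""
    n = len(rows)
    out = []
    for y in range(n):
        above = rows[max(y - 1, 0)]
        below = rows[min(y + 1, n - 1)]
        out.append([(a + c + b) / 3.0 for a, c, b in zip(above, rows[y], below)])
    return out


def process_tiled(image, tile_rows, overlap, kernel):
    """Run `kernel` tile by tile and stitch the valid rows into one output."""
    output = [None] * len(image)
    for in_start, in_end, out_start, out_end in plan_horizontal_tiles(
        len(image), tile_rows, overlap
    ):
        # The slice copies rows here; with contiguous row-major storage in C
        # a row offset into the input buffer would do, with no copy.
        tile_out = kernel(image[in_start:in_end])
        skip = out_start - in_start  # overlap rows to drop at the tile top
        output[out_start:out_end] = tile_out[skip:skip + (out_end - out_start)]
    return output
```

Because the tiles span the full image width, every tile is a contiguous block of rows, which is what makes the copy and stitch steps cheap. With an overlap at least as large as the kernel's vertical reach, `process_tiled(image, 3, 1, vertical_blur)` gives the same result as `vertical_blur(image)` on the whole image.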
1 parent 30ed62b commit d5aa0ef

5 files changed

+390
-299
lines changed
