With the GPU accelerated softISP 2 separate benchmark results are printed,
1 for the generation of the output images on the GPU and a separate one
for generating the statistics on the CPU.
Add a new name argument to the Benchmark class descriptor and print this
out when printing the benchmark result.
Signed-off-by: Hans de Goede <johannes.goede@oss.qualcomm.com>
Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Reviewed-by: Milan Zamazal <mzamazal@redhat.com>
Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
When calling `Debayer::process()` from `SoftwareIsp::process()`, the
`DebayerParams` object is copied multiple times:
(1) call of `BoundMethodMember<...>::activate()`
inside `Object::invokeMethod()`
(2) constructor of `BoundMethodArgs<...>`
inside `BoundMethodMember<...>::activate()`
(3) call of `BoundMethodMember<...>::invoke()`
inside `BoundMethodArgs::invokePack()`
(4) call of the actual pointer to member function
inside `BoundMethodMember::invoke()`
While compilers might avoid one or two of the above copies, this is still
not ideal. By making `Debayer::process()` take the parameter object by
const lvalue reference, only the copy in the `BoundMethodArgs` constructor
remains. So do that.
Before:
[0:12:51.133836595] [12424] DEBUG SoftwareIsp software_isp.cpp:399 params=0x7d0a691f57d0
copy from 0x7d0a691f57d0 into 0x7baa65f2bf30
copy from 0x7baa65f2bf30 into 0x7c6a69209758
copy from 0x7c6a69209758 into 0x7baa63223930
copy from 0x7baa63223930 into 0x7baa63223a70
[0:12:51.134559602] [12426] DEBUG eGL debayer_egl.cpp:538 params=0x7baa63223a70
771.099877 (30.06 fps) cam0-stream0 seq: 000031 bytesused: 8666112
After:
[0:13:42.861691943] [12543] DEBUG SoftwareIsp software_isp.cpp:399 params=0x7cfaad5f57d0
copy from 0x7cfaad5f57d0 into 0x7c5aad609758
[0:13:42.862453917] [12545] DEBUG eGL debayer_egl.cpp:538 params=0x7c5aad609758
822.827388 (30.02 fps) cam0-stream0 seq: 000031 bytesused: 8666112
Signed-off-by: Barnabás Pőcze <barnabas.pocze@ideasonboard.com>
Reviewed-by: Milan Zamazal <mzamazal@redhat.com>
Reviewed-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
The default CCM in uncalibrated.yaml is just an identity transformation
and has been enabled by default only to always provide a correction
matrix to GPU ISP. It slows down CPU ISP when CCM is not used.
Now, when a default correction matrix is always provided to GPU ISP, we
can disable the Ccm algorithm in uncalibrated.yaml again. The check for
ccmEnabled in GPU ISP is no longer needed and it must be removed in
order not to fail when Ccm algorithm is not enabled. ccmEnabled flag is
still needed in CPU ISP where the processing differs based on whether
CCM is present or not.
Reviewed-by: Barnabás Pőcze <barnabas.pocze@ideasonboard.com>
Reviewed-by: Robert Mader <robert.mader@collabora.com>
Signed-off-by: Milan Zamazal <mzamazal@redhat.com>
Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
The Lut algorithm is not really an algorithm. Moreover, algorithms may
be enabled or disabled but with Lut disabled, nothing will work.
Let's move the construction of lookup tables to CPU debayering, where it
is used. The implied and related changes are:
- DebayerParams is changed to contain the real params rather than lookup
tables.
- contrastExp parameter introduced by GPU ISP is used for CPU ISP too.
- The params must be initialised so that debayering gets meaningful
parameter values even when some algorithms are disabled.
- combinedMatrix must be put to params everywhere where it is modified.
- Matrix changes needn't be tracked in the algorithms any more.
- CPU debayering must watch for changes of the corresponding parameters
to update the lookup tables when and only when needed.
- Swapping red and blue is integrated into lookup table constructions.
- gpuIspEnabled flags are removed as they are not needed any more.
Reviewed-by: Robert Mader <robert.mader@collabora.com>
Signed-off-by: Milan Zamazal <mzamazal@redhat.com>
Acked-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
The black level offset subtracted in AWB is wrong. It assumes that the
stats contain sums of the individual colour pixels. But they actually
contain sums of the colour channels of larger "superpixels" consisting
of the individual colour pixels. Each of the RGB colour values and the
computed luminosity (a histogram entry) are added once to the stats per
such a superpixel. This means the offset computed from the black level
and the number of pixels should be used as it is, not divided.
The patch fixes the subtracted offset. Since the evaluation is the same
for all the three colours now, the individual class variables are
replaced with a single RGB variable.
Fixes: 4e13c6f55b ("Honor black level in AWB")
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Reviewed-by: Robert Mader <robert.mader@collabora.com>
Tested-by: Robert Mader <robert.mader@collabora.com>
Signed-off-by: Milan Zamazal <mzamazal@redhat.com>
Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Mesa surfaceless platform appears to be a better fit for the use-case at hand:
1. Like GBM it is Mesa specific, so no change in supported setups is
expected. If ever required, a fallback to the generic device platform
could be added on top.
2. It leaves the complexity of selecting a renderer device to the
driver, reducing code and dependencies.
3. It allows to use llvmpipe / software drivers without dri device,
which can be useful on CI or debugging (with LIBGL_ALWAYS_SOFTWARE=1).
Signed-off-by: Robert Mader <robert.mader@collabora.com>
Reviewed-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Tested-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org> # sm8250/rb5, x1e/Dell Insprion14p
Reviewed-by: Milan Zamazal <mzamazal@redhat.com>
Tested-by: Milan Zamazal <mzamazal@redhat.com> # TI AM69
Tested-by: Barnabás Pőcze <barnabas.pocze@ideasonboard.com> # ThinkPad X1 Yoga Gen 7 + ov2740
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
In some cases the GPU can deliver 15x performance in Debayer with the
CCM on, reference hardware Qualcomm RB5 with IMX512 sensor.
Given this large performance difference it makes sense to make GPUISP
the default for the Software ISP.
If LIBCAMERA_SOFTISP_MODE is omitted gpu will be the default. If
libcamera is compiled without gpuisp support, CPU Debayer will be used.
It is still possible to select CPU mode with LIBCAMERA_SOFISP_MODE=cpu.
Reviewed-by: Milan Zamazal <mzamazal@redhat.com>
Tested-by: Robert Mader <robert.mader@collabora.com>
Tested-by: Hans de Goede <johannes.goede@oss.qualcomm.com> # ThinkPad T14s gen 6 (arm64) ov02c10 + X1c gen 12 ov08x40
Tested-by: Kieran Bingham <kieran.bingham@ideasonboard.com> # Lenovo X13s
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
If GPUISP support is available make it so an environment variable can
switch it on.
Given we don't have full feature parity with CPUISP just yet on pixel
format output, we should default to CPUISP mode giving the user the option
to switch on GPUISP by setting LIBCAMERA_SOFTISP_MODE=gpu
Reviewed-by: Milan Zamazal <mzamazal@redhat.com>
Tested-by: Robert Mader <robert.mader@collabora.com>
Tested-by: Hans de Goede <johannes.goede@oss.qualcomm.com> # ThinkPad T14s gen 6 (arm64) ov02c10 + X1c gen 12 ov08x40
Tested-by: Kieran Bingham <kieran.bingham@ideasonboard.com> # Lenovo X13s
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Make getInputConfig and getOutputConfig static so as to allow for
interrogation of the supported pixel formats prior to object instantiation.
Do this so as to allow the higher level logic make an informed choice
between CPU and GPU ISP based on which pixel formats are supported.
Currently CPU ISP supports more diverse input and output schemes.
Acked-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Reviewed-by: Milan Zamazal <mzamazal@redhat.com>
Tested-by: Robert Mader <robert.mader@collabora.com>
Tested-by: Hans de Goede <johannes.goede@oss.qualcomm.com> # ThinkPad T14s gen 6 (arm64) ov02c10 + X1c gen 12 ov08x40
Tested-by: Kieran Bingham <kieran.bingham@ideasonboard.com> # Lenovo X13s
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
In order to have Debayer::start() tell the eGL shader compilation routine what
the input and output pixel format is, we need to have a copy of the
selected format available. Add variables to the inputConfig and
outputConfig structures to allow tracking of this data for later use.
Reviewed-by: Milan Zamazal <mzamazal@redhat.com>
Tested-by: Robert Mader <robert.mader@collabora.com>
Tested-by: Hans de Goede <johannes.goede@oss.qualcomm.com> # ThinkPad T14s gen 6 (arm64) ov02c10 + X1c gen 12 ov08x40
Tested-by: Kieran Bingham <kieran.bingham@ideasonboard.com> # Lenovo X13s
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Pass contrastExp as calculated in lut to debayer params not the raw
contrast. This way we calculate contrastExp once per frame in lut and pass
the calculated value into the shaders, instead of passing contrast and
calculating contrastExp once per pixel in the shaders.
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Reviewed-by: Milan Zamazal <mzamazal@redhat.com>
Tested-by: Robert Mader <robert.mader@collabora.com>
Tested-by: Hans de Goede <johannes.goede@oss.qualcomm.com> # ThinkPad T14s gen 6 (arm64) ov02c10 + X1c gen 12 ov08x40
Tested-by: Kieran Bingham <kieran.bingham@ideasonboard.com> # Lenovo X13s
Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
In order to initialise and deinitialise gpuisp we need to be able to setup
EGL in the same thread as Debayer::process() happens in.
This requires extending the Debayer object to provide start and stop
methods which are triggered through invokeMethod in the same way as
process() is.
Introduce start() and stop() methods to the Debayer class. Trigger those
methods as described above via invokeMethod. The debayer_egl class will
take care of initialising and de-initialising as necessary. Debayer CPU
sees no functional change.
Per feedback from Barnabas the stop method is using blocking
synchronisation and thus we drop ispWorkerThread_.removeMessages().
[bod: Made method blocking not queued per Robert's bugfixes]
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Reviewed-by: Milan Zamazal <mzamazal@redhat.com>
Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Move the initialisation of Bayer params and CCM to a new constructor in the
Debayer class.
Ensure we call the base class constructor from DebayerCpu's constructor in
the expected constructor order Debayer then DebayerCpu.
Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Reviewed-by: Milan Zamazal <mzamazal@redhat.com>
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
The DebayerCpu class has a number of variables, embedded structures and
methods which are useful to DebayerGpu implementation.
Move relevant variables and methods to base class.
Since we want to call setParams() from the GPUISP and reuse the code in
the existing CPUISP as a first step, we need to move all of the
dependent variables in DebayerCPU to the Debayer base class including
LookupTable and redCcm_.
The DebayerEGL class will ultimately be able to consume both the CCM and
non-CCM data.
Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Reviewed-by: Milan Zamazal <mzamazal@redhat.com>
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Add a method to the SwstatsCpu class to process a whole Framebuffer in
one go, rather then line by line. This is useful for gathering stats
when debayering is not necessary or is not done on the CPU.
Reviewed-by: Milan Zamazal <mzamazal@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
[bod: various rebase splats fixed]
[bod: Added constructor Doxygen header]
[bod: Squashed a fix from Hans to calculate stats on every 4th frame]
Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
patternSize_ is a private variable and its meaning is already documented
in the patternSize() getter documentation.
Move the list of valid sizes to the patternSize() getter documentation
and drop the patternSize_ documentation.
While at it also add 1x1 as valid size for use with future support
of single plane non Bayer input data.
Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Reviewed-by: Milan Zamazal <mzamazal@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Update the documentation of the statsProcessFn() / processLine0() src[]
pointer argument to take into account that swstats_cpu may also be used
with planar input data or with non Bayer single plane input data.
The statsProcessFn typedef is private, so no documentation is generated
for it. Move the new updated src[] pointer argument documentation to
processLine0() so that it gets included in the generated docs.
Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Reviewed-by: Milan Zamazal <mzamazal@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
For debugging purposes, threads can be assigned a name, which eases
distinguishing between them in e.g. htop or gdb. This uses a
Linux-specific API for now which is limited to 15 characters (+ null
terminator), so truncation is done and names for existing thread
instantiations were chosen to be consise.
[Kieran: Apply checkstyle suggestions, rebase on proxy rework]
Signed-off-by: Schulz, Andreas <andreas.schulz2@karlstorz.com>
Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Reviewed-by: Daniel Scally <dan.scally@ideasonboard.com>
Reviewed-by: Barnabás Pőcze <barnabas.pocze@ideasonboard.com>
Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
When CPU ISP is asked to apply the CCM matrix
[0 1 0]
[0 0 0]
[0 0 0]
for a format that requires swapping red and blue channels, the resulting
image has a wrong colour. The CCM matrix above should take green from
pixels and make it red. Instead, the image is blue.
The problem is that the lookup tables setup in CPU debayering swaps red
and blue in the lookup tables for red and blue, but not for green. The
colours must be swapped also in the lookup table for green, which this
patch adds.
Signed-off-by: Milan Zamazal <mzamazal@redhat.com>
Reviewed-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Debayering is carried out on `ispWorkerThread_`. When stopping, the queued
work needs to be flushed or cancelled to ensure that the next time it starts,
it won't process stale data. So remove all messages targeting the `Debayer`
object on the worker thread.
Signed-off-by: Barnabás Pőcze <barnabas.pocze@ideasonboard.com>
Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Tested-by: Robert Mader <robert.mader@collabora.com>
The window coordinates passed to SwStatsCpu::setWindow are confusing.
Let's clarify what the coordinates should be.
A source of confusion is that the specified window is relative to the
processed area. Debayering adjusts line pointers for its processed area
and this is what's also passed to stats processing. The window passed
to SwStatsCpu::setWindow should either specify the size of the whole
processed (not image) area, or its cropping in case the stats shouldn't
be gathered over the whole processed area. This patch should clarify
this in the code.
Reviewed-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Reviewed-by: Hans de Goede <hansg@kernel.org>
Signed-off-by: Milan Zamazal <mzamazal@redhat.com>
Reviewed-by: Barnabás Pőcze <barnabas.pocze@ideasonboard.com>
Tested-by: Hans de Goede <hansg@kernel.org>
Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
SwStatsCpu::setWindow reduces the window width by the added x-offset, to
prevent exceeding image bounds. But if the window width is smaller than
the x-offset, we get unsigned integer underflow. Fix it by setting the
window width to 0 in such a case.
Signed-off-by: Milan Zamazal <mzamazal@redhat.com>
Reviewed-by: Barnabás Pőcze <barnabas.pocze@ideasonboard.com>
Reviewed-by: Hans de Goede <hansg@kernel.org>
Tested-by: Hans de Goede <hansg@kernel.org>
Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Run sw-statistics once every 4th frame, instead of every frame. There are
2 reasons for this:
1. There really is no need to have statistics for every frame and only
doing this every 4th frame helps save some CPU time.
2. The generic nature of the simple pipeline-handler, so no information
about possible CSI receiver frame-delays. In combination with the software
ISP often being used with sensors without sensor info in the sensor-helper
code, so no reliable control-delay information means that the software ISP
is prone to AGC oscillation. Skipping statistics gathering also means
skipping running the AGC algorithm slowing it down, avoiding this
oscillation.
Note ideally the AGC oscillation problem would be fixed by adding sensor
metadata support all through the stack so that the exact gain and exposure
used for a specific frame are reliably provided by the sensor metadata.
Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Tested-by: Milan Zamazal <mzamazal@redhat.com>
Tested-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Signed-off-by: Hans de Goede <hansg@kernel.org>
Reviewed-by: Milan Zamazal <mzamazal@redhat.com>
Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Generating statistics for every single frame is not really necessary.
However a roundtrip through ipa_->processStats() still need to be done
every frame, even if there are no stats to make the IPA generate metadata
for every frame.
Add a valid flag to the statistics struct to let the IPA know when there
are no statistics for the frame being processed and modify the IPA to
only generate metadata for frames without valid statistics.
This is a preparation patch for skipping statistics generation for some
frames.
Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Reviewed-by: Milan Zamazal <mzamazal@redhat.com>
Tested-by: Milan Zamazal <mzamazal@redhat.com>
Tested-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Signed-off-by: Hans de Goede <hansg@kernel.org>
Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Software ISP performs performance measurement on certain part of initial
frames. Let's make this range configurable.
For this purpose, this patch introduces new configuration options
software_isp.measure.skip and software_isp.measure.number. Setting the
latter one to 0 disables the measurement.
Instead of the last frame, the class member and its configuration
specify the number of frames to measure. This is easier to use for
users and doesn't require to adjust two configuration parameters when
the number of the initially skipped frames is changed.
The patch also changes the names of the class members to make them more
accurate.
Completes software ISP TODO #7.
Signed-off-by: Milan Zamazal <mzamazal@redhat.com>
Reviewed-by: Paul Elder <paul.elder@ideasonboard.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
On some platforms, working directly on the input buffer is very slow due
to disabled caching. This is why we copy the input buffer into standard
(cached) memory. This is an unnecessary overhead on platforms with
cached buffers.
Let's make input buffer copying configurable. The default is still
copying, as its overhead is much lower than contingent operations on
non-cached memory. Ideally, we should improve this in future to set the
default to non-copying if we can be sure under observable circumstances
that we are working with cached buffers.
Completes software ISP TODO #6.
Signed-off-by: Milan Zamazal <mzamazal@redhat.com>
Reviewed-by: Paul Elder <paul.elder@ideasonboard.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Type casting from unsigned int to int performed in stats computation is
unnecessary, window_.width is unsigned and the array index is always
non-negative. Let's simply use unsigned int in all the
SwStatsCpu:stats* methods.
Signed-off-by: Milan Zamazal <mzamazal@redhat.com>
Reviewed-by: Umang Jain <uajain@igalia.com>
Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Report exposure and gain in metadata.
This is more complicated than it could be expected because the exposure
value should be in microseconds but it's handled using V4L2_CID_EXPOSURE
control, which doesn't specify the unit, see
https://www.kernel.org/doc/html/latest/userspace-api/media/v4l/control.html.
So the unit conversion is done in the way rkisp1 IPA uses.
This requires getting and passing IPACameraSensorInfo around. To avoid
naming confusion and to improve consistency with rkisp1 IPA,
sensorCtrlInfoMap parameter is renamed to sensorControls.
Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Milan Zamazal <mzamazal@redhat.com>
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Extend the Simple IPA IPC to support returning a metadata ControlList
when the process call has completed.
A new signal from the IPA is introduced to report the metadata,
similarly to what the hardware pipelines do.
Merge the metadata reported by the ISP into any completing request to
provide to the application. Completion of a request is delayed until
this is done; this doesn't apply to canceled requests.
Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Signed-off-by: Milan Zamazal <mzamazal@redhat.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
This patch applies color correction matrix (CCM) in debayering if the
CCM is specified. Not using CCM must still be supported for performance
reasons.
The CCM is applied as follows:
[r1 g1 b1] [r]
[r2 g2 b2] * [g]
[r3 g3 b3] [b]
The CCM matrix (the left side of the multiplication) is constant during
single frame processing, while the input pixel (the right side) changes.
Because each of the color channels is only 8-bit in software ISP, we can
make 9 lookup tables with 256 input values for multiplications of each
of the r_i, g_i, b_i values. This way we don't have to multiply each
pixel, we can use table lookups and additions instead. Gamma (which is
non-linear and thus cannot be a part of the 9 lookup tables values) is
applied on the final values rounded to integers using another lookup
table.
Because the changing part is the pixel value with three color elements,
only three dynamic table lookups are needed. We use three lookup tables
to represent the multiplied matrix values, each of the tables
corresponding to the given matrix column and pixel color.
We use int16_t to store the precomputed multiplications. This seems to
be noticeably (>10%) faster than `float' for the price of slightly less
accuracy and it covers the range of values that sane CCMs produce. The
selection and structure of data is performance critical, for example
using bytes would add significant (>10%) speedup but would be too short
to cover the value range.
The color lookup tables can be represented either as unions,
accommodating tables for both the CCM and non-CCM cases, or as separate
tables for each of the cases, leaving the tables for the other case
unused. The latter is selected as a matter of preference.
The tables are copied (as before), which is not elegant but also not a
big problem. There are patches posted that use shared buffers for
parameters passing in software ISP (see software ISP TODO #5) and they
can be adjusted for the new parameter format.
Color gains from white balance are supposed not to be a part of the
specified CCM. They are applied on it using matrix multiplication,
which is simple and in correspondence with future additions in the form
of matrix multiplication, like saturation adjustment.
With this patch, the reported per-frame slowdown when applying CCM is
about 45% on Debix Model A and about 75% on TI AM69 SK.
Using std::clamp in debayering adds some performance penalty (a few
percent). The clamping is necessary to eliminate out of range values
possibly produced by the CCM. If it could be avoided by adjusting the
precomputed tables some way then performance could be improved a bit.
Signed-off-by: Milan Zamazal <mzamazal@redhat.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>