Make the storage used to accumulate the RGB sums and the Y histogram
value a vector of SwIspStats objects instead of a single object so
that when using multi-threading every thread can use its own storage to
collect intermediate stats to avoid cache-line bouncing.
Benchmarking with the GPU-ISP which does separate swstats benchmarking,
on the Arduino Uno-Q which has a weak CPU which is good for performance
testing, shows 20ms to generate stats for a 3272x2464 frame both before
and after this change.
Reviewed-by: Milan Zamazal <mzamazal@redhat.com>
Signed-off-by: Hans de Goede <johannes.goede@oss.qualcomm.com>
Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
With the GPU accelerated softISP 2 separate benchmark results are printed,
1 for the generation of the output images on the GPU and a separate one
for generating the statistics on the CPU.
Add a new name argument to the Benchmark class descriptor and print this
out when printing the benchmark result.
Signed-off-by: Hans de Goede <johannes.goede@oss.qualcomm.com>
Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Reviewed-by: Milan Zamazal <mzamazal@redhat.com>
Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
When calling `Debayer::process()` from `SoftwareIsp::process()`, the
`DebayerParams` object is copied multiple times:
(1) call of `BoundMethodMember<...>::activate()`
inside `Object::invokeMethod()`
(2) constructor of `BoundMethodArgs<...>`
inside `BoundMethodMember<...>::activate()`
(3) call of `BoundMethodMember<...>::invoke()`
inside `BoundMethodArgs::invokePack()`
(4) call of the actual pointer to member function
inside `BoundMethodMember::invoke()`
While compilers might avoid one or two of the above copies, this is still
not ideal. By making `Debayer::process()` take the parameter object by
const lvalue reference, only the copy in the `BoundMethodArgs` constructor
remains. So do that.
Before:
[0:12:51.133836595] [12424] DEBUG SoftwareIsp software_isp.cpp:399 params=0x7d0a691f57d0
copy from 0x7d0a691f57d0 into 0x7baa65f2bf30
copy from 0x7baa65f2bf30 into 0x7c6a69209758
copy from 0x7c6a69209758 into 0x7baa63223930
copy from 0x7baa63223930 into 0x7baa63223a70
[0:12:51.134559602] [12426] DEBUG eGL debayer_egl.cpp:538 params=0x7baa63223a70
771.099877 (30.06 fps) cam0-stream0 seq: 000031 bytesused: 8666112
After:
[0:13:42.861691943] [12543] DEBUG SoftwareIsp software_isp.cpp:399 params=0x7cfaad5f57d0
copy from 0x7cfaad5f57d0 into 0x7c5aad609758
[0:13:42.862453917] [12545] DEBUG eGL debayer_egl.cpp:538 params=0x7c5aad609758
822.827388 (30.02 fps) cam0-stream0 seq: 000031 bytesused: 8666112
Signed-off-by: Barnabás Pőcze <barnabas.pocze@ideasonboard.com>
Reviewed-by: Milan Zamazal <mzamazal@redhat.com>
Reviewed-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
The default CCM in uncalibrated.yaml is just an identity transformation
and has been enabled by default only to always provide a correction
matrix to GPU ISP. It slows down CPU ISP when CCM is not used.
Now, when a default correction matrix is always provided to GPU ISP, we
can disable the Ccm algorithm in uncalibrated.yaml again. The check for
ccmEnabled in GPU ISP is no longer needed and it must be removed in
order not to fail when Ccm algorithm is not enabled. ccmEnabled flag is
still needed in CPU ISP where the processing differs based on whether
CCM is present or not.
Reviewed-by: Barnabás Pőcze <barnabas.pocze@ideasonboard.com>
Reviewed-by: Robert Mader <robert.mader@collabora.com>
Signed-off-by: Milan Zamazal <mzamazal@redhat.com>
Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
The Lut algorithm is not really an algorithm. Moreover, algorithms may
be enabled or disabled but with Lut disabled, nothing will work.
Let's move the construction of lookup tables to CPU debayering, where it
is used. The implied and related changes are:
- DebayerParams is changed to contain the real params rather than lookup
tables.
- contrastExp parameter introduced by GPU ISP is used for CPU ISP too.
- The params must be initialised so that debayering gets meaningful
parameter values even when some algorithms are disabled.
- combinedMatrix must be put to params everywhere where it is modified.
- Matrix changes needn't be tracked in the algorithms any more.
- CPU debayering must watch for changes of the corresponding parameters
to update the lookup tables when and only when needed.
- Swapping red and blue is integrated into lookup table constructions.
- gpuIspEnabled flags are removed as they are not needed any more.
Reviewed-by: Robert Mader <robert.mader@collabora.com>
Signed-off-by: Milan Zamazal <mzamazal@redhat.com>
Acked-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
The black level offset subtracted in AWB is wrong. It assumes that the
stats contain sums of the individual colour pixels. But they actually
contain sums of the colour channels of larger "superpixels" consisting
of the individual colour pixels. Each of the RGB colour values and the
computed luminosity (a histogram entry) are added once to the stats per
such a superpixel. This means the offset computed from the black level
and the number of pixels should be used as it is, not divided.
The patch fixes the subtracted offset. Since the evaluation is the same
for all the three colours now, the individual class variables are
replaced with a single RGB variable.
Fixes: 4e13c6f55b ("Honor black level in AWB")
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Reviewed-by: Robert Mader <robert.mader@collabora.com>
Tested-by: Robert Mader <robert.mader@collabora.com>
Signed-off-by: Milan Zamazal <mzamazal@redhat.com>
Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Mesa surfaceless platform appears to be a better fit for the use-case at hand:
1. Like GBM it is Mesa specific, so no change in supported setups is
expected. If ever required, a fallback to the generic device platform
could be added on top.
2. It leaves the complexity of selecting a renderer device to the
driver, reducing code and dependencies.
3. It allows to use llvmpipe / software drivers without dri device,
which can be useful on CI or debugging (with LIBGL_ALWAYS_SOFTWARE=1).
Signed-off-by: Robert Mader <robert.mader@collabora.com>
Reviewed-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Tested-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org> # sm8250/rb5, x1e/Dell Insprion14p
Reviewed-by: Milan Zamazal <mzamazal@redhat.com>
Tested-by: Milan Zamazal <mzamazal@redhat.com> # TI AM69
Tested-by: Barnabás Pőcze <barnabas.pocze@ideasonboard.com> # ThinkPad X1 Yoga Gen 7 + ov2740
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
In some cases the GPU can deliver 15x performance in Debayer with the
CCM on, reference hardware Qualcomm RB5 with IMX512 sensor.
Given this large performance difference it makes sense to make GPUISP
the default for the Software ISP.
If LIBCAMERA_SOFTISP_MODE is omitted gpu will be the default. If
libcamera is compiled without gpuisp support, CPU Debayer will be used.
It is still possible to select CPU mode with LIBCAMERA_SOFISP_MODE=cpu.
Reviewed-by: Milan Zamazal <mzamazal@redhat.com>
Tested-by: Robert Mader <robert.mader@collabora.com>
Tested-by: Hans de Goede <johannes.goede@oss.qualcomm.com> # ThinkPad T14s gen 6 (arm64) ov02c10 + X1c gen 12 ov08x40
Tested-by: Kieran Bingham <kieran.bingham@ideasonboard.com> # Lenovo X13s
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
If GPUISP support is available make it so an environment variable can
switch it on.
Given we don't have full feature parity with CPUISP just yet on pixel
format output, we should default to CPUISP mode giving the user the option
to switch on GPUISP by setting LIBCAMERA_SOFTISP_MODE=gpu
Reviewed-by: Milan Zamazal <mzamazal@redhat.com>
Tested-by: Robert Mader <robert.mader@collabora.com>
Tested-by: Hans de Goede <johannes.goede@oss.qualcomm.com> # ThinkPad T14s gen 6 (arm64) ov02c10 + X1c gen 12 ov08x40
Tested-by: Kieran Bingham <kieran.bingham@ideasonboard.com> # Lenovo X13s
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Make getInputConfig and getOutputConfig static so as to allow for
interrogation of the supported pixel formats prior to object instantiation.
Do this so as to allow the higher level logic make an informed choice
between CPU and GPU ISP based on which pixel formats are supported.
Currently CPU ISP supports more diverse input and output schemes.
Acked-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Reviewed-by: Milan Zamazal <mzamazal@redhat.com>
Tested-by: Robert Mader <robert.mader@collabora.com>
Tested-by: Hans de Goede <johannes.goede@oss.qualcomm.com> # ThinkPad T14s gen 6 (arm64) ov02c10 + X1c gen 12 ov08x40
Tested-by: Kieran Bingham <kieran.bingham@ideasonboard.com> # Lenovo X13s
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
In order to have Debayer::start() tell the eGL shader compilation routine what
the input and output pixel format is, we need to have a copy of the
selected format available. Add variables to the inputConfig and
outputConfig structures to allow tracking of this data for later use.
Reviewed-by: Milan Zamazal <mzamazal@redhat.com>
Tested-by: Robert Mader <robert.mader@collabora.com>
Tested-by: Hans de Goede <johannes.goede@oss.qualcomm.com> # ThinkPad T14s gen 6 (arm64) ov02c10 + X1c gen 12 ov08x40
Tested-by: Kieran Bingham <kieran.bingham@ideasonboard.com> # Lenovo X13s
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Pass contrastExp as calculated in lut to debayer params not the raw
contrast. This way we calculate contrastExp once per frame in lut and pass
the calculated value into the shaders, instead of passing contrast and
calculating contrastExp once per pixel in the shaders.
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Reviewed-by: Milan Zamazal <mzamazal@redhat.com>
Tested-by: Robert Mader <robert.mader@collabora.com>
Tested-by: Hans de Goede <johannes.goede@oss.qualcomm.com> # ThinkPad T14s gen 6 (arm64) ov02c10 + X1c gen 12 ov08x40
Tested-by: Kieran Bingham <kieran.bingham@ideasonboard.com> # Lenovo X13s
Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
In order to initialise and deinitialise gpuisp we need to be able to setup
EGL in the same thread as Debayer::process() happens in.
This requires extending the Debayer object to provide start and stop
methods which are triggered through invokeMethod in the same way as
process() is.
Introduce start() and stop() methods to the Debayer class. Trigger those
methods as described above via invokeMethod. The debayer_egl class will
take care of initialising and de-initialising as necessary. Debayer CPU
sees no functional change.
Per feedback from Barnabas the stop method is using blocking
synchronisation and thus we drop ispWorkerThread_.removeMessages().
[bod: Made method blocking not queued per Robert's bugfixes]
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Reviewed-by: Milan Zamazal <mzamazal@redhat.com>
Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Move the initialisation of Bayer params and CCM to a new constructor in the
Debayer class.
Ensure we call the base class constructor from DebayerCpu's constructor in
the expected constructor order Debayer then DebayerCpu.
Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Reviewed-by: Milan Zamazal <mzamazal@redhat.com>
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
The DebayerCpu class has a number of variables, embedded structures and
methods which are useful to DebayerGpu implementation.
Move relevant variables and methods to base class.
Since we want to call setParams() from the GPUISP and reuse the code in
the existing CPUISP as a first step, we need to move all of the
dependent variables in DebayerCPU to the Debayer base class including
LookupTable and redCcm_.
The DebayerEGL class will ultimately be able to consume both the CCM and
non-CCM data.
Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Reviewed-by: Milan Zamazal <mzamazal@redhat.com>
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Add a method to the SwstatsCpu class to process a whole Framebuffer in
one go, rather then line by line. This is useful for gathering stats
when debayering is not necessary or is not done on the CPU.
Reviewed-by: Milan Zamazal <mzamazal@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
[bod: various rebase splats fixed]
[bod: Added constructor Doxygen header]
[bod: Squashed a fix from Hans to calculate stats on every 4th frame]
Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
patternSize_ is a private variable and its meaning is already documented
in the patternSize() getter documentation.
Move the list of valid sizes to the patternSize() getter documentation
and drop the patternSize_ documentation.
While at it also add 1x1 as valid size for use with future support
of single plane non Bayer input data.
Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Reviewed-by: Milan Zamazal <mzamazal@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Update the documentation of the statsProcessFn() / processLine0() src[]
pointer argument to take into account that swstats_cpu may also be used
with planar input data or with non Bayer single plane input data.
The statsProcessFn typedef is private, so no documentation is generated
for it. Move the new updated src[] pointer argument documentation to
processLine0() so that it gets included in the generated docs.
Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Reviewed-by: Milan Zamazal <mzamazal@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
For debugging purposes, threads can be assigned a name, which eases
distinguishing between them in e.g. htop or gdb. This uses a
Linux-specific API for now which is limited to 15 characters (+ null
terminator), so truncation is done and names for existing thread
instantiations were chosen to be consise.
[Kieran: Apply checkstyle suggestions, rebase on proxy rework]
Signed-off-by: Schulz, Andreas <andreas.schulz2@karlstorz.com>
Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Reviewed-by: Daniel Scally <dan.scally@ideasonboard.com>
Reviewed-by: Barnabás Pőcze <barnabas.pocze@ideasonboard.com>
Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
When CPU ISP is asked to apply the CCM matrix
[0 1 0]
[0 0 0]
[0 0 0]
for a format that requires swapping red and blue channels, the resulting
image has a wrong colour. The CCM matrix above should take green from
pixels and make it red. Instead, the image is blue.
The problem is that the lookup tables setup in CPU debayering swaps red
and blue in the lookup tables for red and blue, but not for green. The
colours must be swapped also in the lookup table for green, which this
patch adds.
Signed-off-by: Milan Zamazal <mzamazal@redhat.com>
Reviewed-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Debayering is carried out on `ispWorkerThread_`. When stopping, the queued
work needs to be flushed or cancelled to ensure that the next time it starts,
it won't process stale data. So remove all messages targeting the `Debayer`
object on the worker thread.
Signed-off-by: Barnabás Pőcze <barnabas.pocze@ideasonboard.com>
Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Tested-by: Robert Mader <robert.mader@collabora.com>
The window coordinates passed to SwStatsCpu::setWindow are confusing.
Let's clarify what the coordinates should be.
A source of confusion is that the specified window is relative to the
processed area. Debayering adjusts line pointers for its processed area
and this is what's also passed to stats processing. The window passed
to SwStatsCpu::setWindow should either specify the size of the whole
processed (not image) area, or its cropping in case the stats shouldn't
be gathered over the whole processed area. This patch should clarify
this in the code.
Reviewed-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Reviewed-by: Hans de Goede <hansg@kernel.org>
Signed-off-by: Milan Zamazal <mzamazal@redhat.com>
Reviewed-by: Barnabás Pőcze <barnabas.pocze@ideasonboard.com>
Tested-by: Hans de Goede <hansg@kernel.org>
Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
SwStatsCpu::setWindow reduces the window width by the added x-offset, to
prevent exceeding image bounds. But if the window width is smaller than
the x-offset, we get unsigned integer underflow. Fix it by setting the
window width to 0 in such a case.
Signed-off-by: Milan Zamazal <mzamazal@redhat.com>
Reviewed-by: Barnabás Pőcze <barnabas.pocze@ideasonboard.com>
Reviewed-by: Hans de Goede <hansg@kernel.org>
Tested-by: Hans de Goede <hansg@kernel.org>
Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Run sw-statistics once every 4th frame, instead of every frame. There are
2 reasons for this:
1. There really is no need to have statistics for every frame and only
doing this every 4th frame helps save some CPU time.
2. The generic nature of the simple pipeline-handler, so no information
about possible CSI receiver frame-delays. In combination with the software
ISP often being used with sensors without sensor info in the sensor-helper
code, so no reliable control-delay information means that the software ISP
is prone to AGC oscillation. Skipping statistics gathering also means
skipping running the AGC algorithm slowing it down, avoiding this
oscillation.
Note ideally the AGC oscillation problem would be fixed by adding sensor
metadata support all through the stack so that the exact gain and exposure
used for a specific frame are reliably provided by the sensor metadata.
Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Tested-by: Milan Zamazal <mzamazal@redhat.com>
Tested-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Signed-off-by: Hans de Goede <hansg@kernel.org>
Reviewed-by: Milan Zamazal <mzamazal@redhat.com>
Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Generating statistics for every single frame is not really necessary.
However a roundtrip through ipa_->processStats() still need to be done
every frame, even if there are no stats to make the IPA generate metadata
for every frame.
Add a valid flag to the statistics struct to let the IPA know when there
are no statistics for the frame being processed and modify the IPA to
only generate metadata for frames without valid statistics.
This is a preparation patch for skipping statistics generation for some
frames.
Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Reviewed-by: Milan Zamazal <mzamazal@redhat.com>
Tested-by: Milan Zamazal <mzamazal@redhat.com>
Tested-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Signed-off-by: Hans de Goede <hansg@kernel.org>
Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Software ISP performs performance measurement on certain part of initial
frames. Let's make this range configurable.
For this purpose, this patch introduces new configuration options
software_isp.measure.skip and software_isp.measure.number. Setting the
latter one to 0 disables the measurement.
Instead of the last frame, the class member and its configuration
specify the number of frames to measure. This is easier to use for
users and doesn't require to adjust two configuration parameters when
the number of the initially skipped frames is changed.
The patch also changes the names of the class members to make them more
accurate.
Completes software ISP TODO #7.
Signed-off-by: Milan Zamazal <mzamazal@redhat.com>
Reviewed-by: Paul Elder <paul.elder@ideasonboard.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
On some platforms, working directly on the input buffer is very slow due
to disabled caching. This is why we copy the input buffer into standard
(cached) memory. This is an unnecessary overhead on platforms with
cached buffers.
Let's make input buffer copying configurable. The default is still
copying, as its overhead is much lower than contingent operations on
non-cached memory. Ideally, we should improve this in future to set the
default to non-copying if we can be sure under observable circumstances
that we are working with cached buffers.
Completes software ISP TODO #6.
Signed-off-by: Milan Zamazal <mzamazal@redhat.com>
Reviewed-by: Paul Elder <paul.elder@ideasonboard.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>