Added filtering of shadowed color regions to registration #253

Merged
merged 6 commits into from
Jun 14, 2015

Conversation


@kohrt kohrt commented Jun 2, 2015

Hi,

I tested the registration and it works fine, nice work! But it ran a bit slow on my system, so I decided to improve it. The original version took ~5.1 ms on my system; with these changes it went down to ~1 ms.

I achieved this by replacing the multi-dimensional array with a one-dimensional one, structured the same way as the image data: map[i] is the value for depth[i], and so on. So you only need to walk through the array with a pointer and you are done.
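A minimal sketch of that flat-map idea (FlatMap and its members are illustrative names, not the PR's actual code): one precomputed array laid out exactly like the image buffer, so applying it is a single pointer walk.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch: instead of map[y][x], precompute one flat array laid out like the
// depth image, so map_[i] corresponds directly to depth[i].
struct FlatMap {
  std::vector<float> map_;
  FlatMap(size_t width, size_t height) : map_(width * height) {}

  // Precompute once; this dummy fill stands in for the real projection math.
  void fill(float scale) {
    float v = 0.0f;
    for (float &m : map_) m = (v++) * scale;
  }

  // Apply by walking both buffers with raw pointers,
  // no per-pixel index arithmetic.
  void apply(const float *depth, float *out, size_t n) const {
    const float *m = map_.data();
    for (const float *end = depth + n; depth != end; ++depth, ++m, ++out)
      *out = *depth * *m;
  }
};
```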

The new maps also include some static multiplications and additions which were previously computed at runtime. There is also a new map for y image offsets, to reduce runtime computation even more.

The rounding of the x coordinate is replaced by const int cx = rx + 0.5f;, which should be the same for positive numbers. But all valid x coordinates should be greater than or equal to 0 anyway. To make sure that x is positive, a check for rx > -0.5f is added to the if statement below.
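The rounding trick can be demonstrated in isolation (round_fast is a hypothetical name for the cast expression): for rx >= 0, truncating rx + 0.5f equals round-to-nearest, and the rx > -0.5f guard keeps the cast result non-negative.

```cpp
#include <cassert>

// For rx >= 0, truncating rx + 0.5f is the same as rounding to nearest
// (half away from zero). The separate guard rx > -0.5f ensures the input
// cannot produce a negative coordinate.
int round_fast(float rx) {
  return static_cast<int>(rx + 0.5f);
}
```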

The last thing I changed is the API of the apply method. I changed the type for the registered image to libfreenect2::Frame, so that it is possible to check for the correct size.


floe commented Jun 2, 2015

Awesome, I was planning for something similar, but I wouldn't have assumed that it's actually that fast. I'll merge this soon-ish.

rx = wx / (color.fx * color_q);
ry = wy / (color.fx * color_q);
rx = (wx / (color.fx * color_q)) - (color.shift_m / color.shift_d);
ry = (wy / (color.fx * color_q)) * color.fy + color.cy;
Contributor:

Should this be color.fy in the second line? Then it could also be simplified to ry = (wy / color_q) + color.cy;
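The proposed simplification follows algebraically, since color.fy cancels. A quick sanity check with arbitrary sample values (not real Kinect intrinsics):

```cpp
#include <cassert>
#include <cmath>

// Check that (wy / (fy * q)) * fy + cy == wy / q + cy up to float rounding.
// All values passed in are arbitrary test numbers, not sensor intrinsics.
bool simplification_holds(double wy, double fy, double q, double cy) {
  const double full = (wy / (fy * q)) * fy + cy;  // original expression
  const double simplified = wy / q + cy;          // fy cancelled out
  return std::fabs(full - simplified) < 1e-6;
}
```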

Contributor Author:

I think it should be fy; I didn't check the previous code for errors. But the sensor-provided intrinsics always have fy equal to fx. If someone uses other intrinsics, though, this could lead to wrong values, so I will change it.

@floe floe added this to the 0.1 milestone Jun 2, 2015

kohrt commented Jun 2, 2015

I was impressed too. After restructuring the maps and using pointers it went down to 2.3 ms. With the static computations added to the maps it went down to 1.9 ms. With the extra map for cy and floorf(rx + 0.5f) instead of round(rx), it was 1.3 ms. And finally after replacing floorf(rx + 0.5f) by (int)(rx + 0.5f) it was only 1 ms.
I think it is now so fast that there is no need for an OpenCL/GL/CUDA implementation (at least on fast x86 machines). I am afraid that copying the color and depth images to the GPU and reading back the result might consume a large fraction of this 1 ms, so there is little headroom for improvement.


xlz commented Jun 3, 2015

Does this include depth filtering? I think that's the hard part, involving a depth map of 1920x1080 (or half the size).


floe commented Jun 4, 2015

@xlz you mean depth buffer? Not yet. I have the undistortion working now, unfortunately in a separate branch. Currently checking what's the best way to integrate this PR with my code.


kohrt commented Jun 4, 2015

Well, this is now only a color -> depth registration, so there won't be a high resolution depth map. I thought about a depth -> color registration using the inverse of this registration, but didn't find a way to do it: while cy is static for each x,y, the depth value is involved in the middle of the calculation that finds the corresponding cx coordinate.
For the filtering I thought one could create an array of the size of the color image and store, for each pixel, whether it has been used for the registered color image (and for which registered color pixel). If two registered color pixels use the same color image pixel, just select the one that belongs to the depth pixel with the lower distance. But the problem is that the color image is much bigger, so background and foreground pixels could map to neighboring color pixels, and then you won't find out that there are background pixels. So for the filtering we need something more fancy.
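The per-color-pixel bookkeeping described above amounts to a z-buffer over the color image. A minimal sketch with illustrative names (not code from this PR):

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Z-buffer over the color image: for every depth pixel that maps to color
// index ci, keep the mapping only if its depth is smaller than whatever is
// already stored there. Names here are illustrative only.
struct ColorZBuffer {
  std::vector<float> zbuf;
  explicit ColorZBuffer(size_t color_pixels)
      : zbuf(color_pixels, std::numeric_limits<float>::infinity()) {}

  // Returns true if this depth value wins the color pixel ci.
  bool claim(size_t ci, float depth) {
    if (depth < zbuf[ci]) {
      zbuf[ci] = depth;
      return true;
    }
    return false;  // a closer depth pixel already uses this color pixel
  }
};
```

As the comment above notes, this alone is not enough when foreground and background map to merely neighboring color pixels, which is why something fancier is needed.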


floe commented Jun 4, 2015

@xlz has a branch where a depth buffer has already been implemented, see his comment in #223.
EDIT: wrong issue, can't find the comment anymore - @xlz could you re-post a link to your depth buffer code?


xlz commented Jun 4, 2015

This one does depth filter: https://github.com/xlz/libfreenect2/commit/a0c0b6e5402ad363dcd5bbe58ebab8d093ea2f80

What I meant by depth map is an intermediate map to filter duplicate pixels, just as you described. The "something more fancy" is erosion, which takes the minimum of neighboring pixels. A small erosion kernel size will produce aliasing at the edges. There was unavoidable aliasing with erosion on a half-size depth filter, so I chose full-size.
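Erosion as described, each pixel replaced by the minimum over its neighborhood, can be sketched like this (a plain 3x3 kernel for illustration; the referenced commit chooses its own kernel size and layout):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// 3x3 erosion on a depth map: each output pixel becomes the minimum of its
// neighborhood. Near occlusion edges this suppresses background depth values
// that would otherwise grab foreground color pixels.
std::vector<float> erode3x3(const std::vector<float> &in, int w, int h) {
  std::vector<float> out(in.size());
  for (int y = 0; y < h; ++y)
    for (int x = 0; x < w; ++x) {
      float m = in[y * w + x];
      for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx) {
          const int ny = y + dy, nx = x + dx;
          if (ny >= 0 && ny < h && nx >= 0 && nx < w)
            m = std::min(m, in[ny * w + nx]);
        }
      out[y * w + x] = m;
    }
  return out;
}
```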

In depth-to-color registration, it is upsampling the depth image, so higher depth resolution wouldn't mean higher depth quality. In color-to-depth registration, even naive downsampling of the RGB image will still retain very good RGB quality in the point cloud.


kohrt commented Jun 4, 2015

Nice. I thought of something like this, setting a whole neighborhood instead of one pixel. But this filtering is pretty expensive, especially the erosion.

Yeah, you can't add information when upsampling, but you will get more detailed texture in the point cloud. It always depends on what you want.


kohrt commented Jun 5, 2015

I updated the pull request. It now includes undistortion of the depth and registered color image. It is still around 0.9 to 1.0 ms on my system. There was also a double conversion in one of the if statements, which I also replaced.
Maybe we can also build in the filtering. It shouldn't be too hard to do without OpenCV.


floe commented Jun 5, 2015

Great! Just one small additional request: could you add some comments to your code? There's a fair amount of pointer arithmetic going on, and it's rather easy to lose track...


kohrt commented Jun 8, 2015

Added some comments, hope it is easier to keep track now.


kohrt commented Jun 10, 2015

I just implemented the filtering from @xlz into the registration, but without any OpenCV. I tried to optimize the filtering as well. The whole registration with filtering now takes ~5.5 ms on my system.

@xlz: could you check if the filtering is fine like this and if it works similar to your implementation?


xlz commented Jun 10, 2015

Good work. Unfortunately I won't be able to test it for a while due to traveling etc.

I roughly looked at the new code. Most looks good. One issue: don't assume 3 bytes per pixel. Hardware decoders (Tegra, Vaapi) will return RGBA format.

IIUC this is still a 5x3 filter on a 1920x1080 image. If you haven't done so, also verify if there is any aliasing with maximum depth range.

@HenningJ
Contributor

It does not always do what it's supposed to:
(screenshot: registration)
Seems to work for my "shadow", but the shadow of the cup in front of me is not completely black. Is that because it's quite close to the camera?
And what's going on with the bottle? Is that because it's mostly transparent?

But otherwise it looks pretty good

More general question: the (small) shadow on the left side (that's in the original depth image) is because of the distance between IR-emitter and IR-sensor? and the (larger) shadow on the right side (in the registered image) is because of the distance between IR-sensor and RGB-sensor? Is that correct?


kohrt commented Jun 10, 2015

@xlz OK, I didn't know that there are RGBA color images. Is the byte order RGBA?
Yes, it's a 5 x 3 filter. Should be the same as or at least similar to your implementation. Instead of eroding later, it's done in one step.
How do I check for aliasing? Could an adaptive filter that is bigger for near pixels and smaller for far ones solve those issues?
I also want to make the filtering optional, so that it can be disabled if not needed.

@HenningJ the filter only works if there are depth values. At the borders of objects, like the cup, you don't get depth values, so there are no foreground depth pixels which use the color pixels from the border of the cup. Only the depth pixels from the background map to those border pixels, therefore they get used for those background depth pixels.
The same is true for the bottle. You can see that there are no values for the middle area (possibly too close to the camera), so the color pixels are used for the background pixels.
The shadow on the left is probably from the distance between the emitter and the sensor.


xlz commented Jun 10, 2015

RGBA yes.

If you see nothing strange then no aliasing. Do not need fancy filtering here.

Optional sure.


floe commented Jun 10, 2015

What I did in my (unmerged) registration code is to simply treat all color image data as RGBA (i.e. registered color images are output as RGBA, and input images are assumed to be RGBA, too). Only difference for RGB images is that the source pointer is advanced by 3 instead of 4 bytes. IMHO, this has two advantages:

  • we can always use uint32_t operations, which are generally not slower, and often faster, than moving 3 separate bytes around
  • there is an additional data channel available which could be used for downsampled depth data.
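The scheme described above, always writing RGBA words while only varying the source step, could be sketched as follows (copy_pixels is a hypothetical name, not code from any branch):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Always emit 4-byte RGBA output as uint32_t; only the source step differs:
// 3 bytes per pixel for RGB input, 4 for RGBA. memcpy avoids any alignment
// or strict-aliasing issues with the byte buffer.
void copy_pixels(const unsigned char *src, size_t src_step,
                 uint32_t *dst, size_t n) {
  for (size_t i = 0; i < n; ++i, src += src_step) {
    uint32_t px = 0;  // 4th byte stays 0 for 3-byte input
    std::memcpy(&px, src, src_step == 3 ? 3 : 4);
    dst[i] = px;
  }
}
```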

However, I would not implement this within this current PR. @wiedemeyer if possible, I'd like to split this into several PRs:

  1. the original one (this one here, Added filtering of shadowed color regions to registration #253) doing just the speedup
  2. a separate one for the depth buffer
  3. and possibly a third for parameters, tweaks etc.


floe commented Jun 10, 2015

I tried to cherry-pick the commits except the last one (97976ea), but apparently GitHub isn't smart enough to fast-forward the PR on its own. Can you try and see what happens if you merge master into this PR?


kohrt commented Jun 11, 2015

I rebased it to the current master. But I saw you already pulled the first commits.

@HenningJ
Contributor

@wiedemeyer thanks for the explanation.


kohrt commented Jun 11, 2015

Just to make sure: the color image from the CPU JPEG decoding has the BGR format, but VAAPI and Tegra have RGBA, right?
I just added the changes so that the code can handle 3 and 4 byte color images. It also improved performance a bit (5.0 ms instead of 5.5 ms).

@kohrt kohrt changed the title Improved speed of registration by factor 5 Added filtering of shadowed color regions to registration Jun 11, 2015

kohrt commented Jun 11, 2015

Yes, of course. I saw it right after I pushed.

@HenningJ
Contributor

enough pestering, gotta get back to work ;-)


kohrt commented Jun 11, 2015

If the input image is 3 bytes per pixel, the alpha channel of the output image will be set to the blue channel value of the next pixel.
At first I thought a 3 byte output image would be OK too, since the next run would overwrite the previous 4th byte anyway, but this would always write one byte past the end of the buffer, which might not be so good.
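The over-read-and-mask behaviour under discussion can be illustrated like this (read_rgb_pixel_masked is a hypothetical name; the mask corresponds to the "&-operation" mentioned below):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// When a 3-byte pixel is read as one 32-bit word, the word's top byte is
// whatever sits after the pixel in memory (the next pixel's first channel).
// Masking clears it so the stored alpha byte is deterministic. The caller
// must still guarantee 4 readable bytes at src.
uint32_t read_rgb_pixel_masked(const unsigned char *src) {
  uint32_t px;
  std::memcpy(&px, src, 4);   // deliberately reads one byte past the pixel...
  return px & 0x00FFFFFFu;    // ...which the mask then clears
}
```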

@HenningJ
Contributor

First I though a 3 byte output image would be ok too, then the next run would overwrite the previous 4th byte anyway

Yeah, that's what I meant.

but this will always write one byte after the end of the buffer, which might not be so good.

This will happen now anyways, won't it? You're still writing a four byte (unsigned) int when there are only 3 bytes left in the 3 byte output image. You're just writing 0 to the additional byte, instead of some other number. If there is anything at that byte, it's gonna be bad either way. So I don't see any scenario where the additional &-operation actually helps.


kohrt commented Jun 11, 2015

Ok, the & could be removed, if everyone ignores that byte anyway.
The current implementation only allows 4 byte output images, so writing over the buffer is not an issue.
Maybe it would be nicer if all color images in libfreenect2 had the same format, either 3 or 4 byte; then no special cases would need to be handled. In particular, users could be sure that the image format is always the same, independent of the JPEG processor being used.

@HenningJ
Contributor

Ok, the & could be removed, if everyone ignores that byte anyway.

Well... if anybody needs that byte to be 0, then of course your implementation is correct, it's just the comment that's wrong. ;-)


floe commented Jun 11, 2015

I think that we should simply change turbo_jpeg_rgb_packet_processor.cpp, lines 76 and 118, and replace TJPF_BGR by TJPF_RGBX. That would reduce the number of special cases quite a bit.


kohrt commented Jun 11, 2015

I tried it out, and it makes things much more beautiful.
But if TJPF_RGBX is used, the images need to be converted to BGRX or BGR before they can be displayed with OpenCV in Protonect.
So it might be better to use TJPF_BGRX or even TJPF_BGRA (if we care about the values in the alpha channel). @xlz could VAAPI and Tegra also output BGRA instead of RGBA?


kohrt commented Jun 11, 2015

Should I push the changes to this pull request so you can try them out?


floe commented Jun 11, 2015

Yes, please. Regarding BGRA vs RGBA: we're trying to get rid of OpenCV anyway, the viewer from PR #261 is based on OpenGL and can use RGBA. But for the moment, TJPF_BGRX would probably be the best choice.

Updated registration and removed handling of 3 byte color images.
Updated protonect to display color image correct.

kohrt commented Jun 11, 2015

I updated the PR.
I would vote for sticking with the BGRA format, but mainly because I am used to it. I did not find any information on which format is better. Only http://en.wikipedia.org/wiki/RGBA_color_space mentioned that colors are commonly stored as ARGB in a 32 bit word, which results in BGRA on little endian.
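That byte-order point can be verified directly: a pixel packed as a single 32-bit ARGB word lands in memory as B, G, R, A on a little-endian machine (the x86/ARM default assumed here).

```cpp
#include <cstdint>
#include <cstring>

// Copies the raw in-memory byte order of a 32-bit ARGB word (0xAARRGGBB).
// On a little-endian machine the least significant byte (B) comes first,
// giving BGRA byte order.
void argb_word_bytes(uint32_t argb, unsigned char out[4]) {
  std::memcpy(out, &argb, 4);
}
```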


floe commented Jun 11, 2015

OK, from what I've seen, any OpenGL implementation which can deal with libfreenect2 should also be able to handle BGRA just fine.


xlz commented Jun 11, 2015

VA-API can output both. Tegra can only output RGBA.


floe commented Jun 11, 2015

For the registration, the color order probably doesn't even matter now that we simply treat all individual pixels as uint32_t. @wiedemeyer Did you notice any speed issues with the color image decoder? After all, it needs to push 33% more data around now for each frame...


kohrt commented Jun 11, 2015

Yeah, the order does not matter for the registration. The registration with filtering takes ~5.7 ms, and 1.0 ms without. So no real change.
I didn't look at the JPEG decoder, but I guess that moving 33% more bytes is just a fraction of the time needed to decode the JPEG data.


kohrt commented Jun 12, 2015

I just let both versions of the JPEG decoder run, and both are around 14 ms.


floe commented Jun 14, 2015

Excellent, good to know. Merged.

floe added a commit that referenced this pull request Jun 14, 2015
Added filtering of shadowed color regions to registration
@floe floe merged commit 0fb3667 into OpenKinect:master Jun 14, 2015
@kohrt kohrt deleted the improved_registration branch June 16, 2015 14:12
@floe floe mentioned this pull request Jul 6, 2015
4 participants