Added filtering of shadowed color regions to registration #253

Merged
merged 6 commits into from
Jun 14, 2015

Conversation


@kohrt kohrt commented Jun 2, 2015

Hi,

I tested the registration and it works fine, nice work! But it ran a bit slow on my system, so I decided to improve it. The original version took ~5.1 ms on my system; with these changes it went down to ~1 ms.

I achieved this by replacing the multi-dimensional array with a one-dimensional one, structured the same way as the image data: map[i] is the value for depth[i], and so on. So you only need to walk through the array with a pointer and you are done.
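A minimal sketch of that flat-map idea (FlatMap and its members are illustrative names, not the PR's actual code): one precomputed array laid out exactly like the image buffer, so applying it is a single pointer walk.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch: instead of map[y][x], precompute one flat array laid out like the
// depth image, so map_[i] corresponds directly to depth[i].
struct FlatMap {
  std::vector<float> map_;
  FlatMap(size_t width, size_t height) : map_(width * height) {}

  // Precompute once; this dummy fill stands in for the real projection math.
  void fill(float scale) {
    float v = 0.0f;
    for (float &m : map_) m = (v++) * scale;
  }

  // Apply by walking both buffers with raw pointers,
  // no per-pixel index arithmetic.
  void apply(const float *depth, float *out, size_t n) const {
    const float *m = map_.data();
    for (const float *end = depth + n; depth != end; ++depth, ++m, ++out)
      *out = *depth * *m;
  }
};
```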

The new maps also include some static multiplications and additions which were previously computed at runtime. There is also a new map for y image offsets, to reduce runtime computation even more.

The rounding of the x coordinate is replaced by const int cx = rx + 0.5f;, which should be the same for positive numbers. But all valid x coordinates should be greater than or equal to 0 anyway. To make sure that x is positive, a check for rx > -0.5f is added to the if statement below.
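The rounding trick can be demonstrated in isolation (round_fast is a hypothetical name for the cast expression): for rx >= 0, truncating rx + 0.5f equals round-to-nearest, and the rx > -0.5f guard keeps the cast result non-negative.

```cpp
#include <cassert>

// For rx >= 0, truncating rx + 0.5f is the same as rounding to nearest
// (half away from zero). The separate guard rx > -0.5f ensures the input
// cannot produce a negative coordinate.
int round_fast(float rx) {
  return static_cast<int>(rx + 0.5f);
}
```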

The last thing I changed is the API of the apply method. I changed the type for the registered image to libfreenect2::Frame, so that it is possible to check for the correct size.


floe commented Jun 2, 2015

Awesome, I was planning for something similar, but I wouldn't have assumed that it's actually that fast. I'll merge this soon-ish.

rx = wx / (color.fx * color_q);
ry = wy / (color.fx * color_q);
rx = (wx / (color.fx * color_q)) - (color.shift_m / color.shift_d);
ry = (wy / (color.fx * color_q)) * color.fy + color.cy;
Contributor:

Should this be color.fy in the second line? Then it could also be simplified to ry = (wy / color_q) + color.cy;
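The proposed simplification follows algebraically, since color.fy cancels. A quick sanity check with arbitrary sample values (not real Kinect intrinsics):

```cpp
#include <cassert>
#include <cmath>

// Check that (wy / (fy * q)) * fy + cy == wy / q + cy up to float rounding.
// All values passed in are arbitrary test numbers, not sensor intrinsics.
bool simplification_holds(double wy, double fy, double q, double cy) {
  const double full = (wy / (fy * q)) * fy + cy;  // original expression
  const double simplified = wy / q + cy;          // fy cancelled out
  return std::fabs(full - simplified) < 1e-6;
}
```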

Contributor Author:

I think it should be fy; I didn't check the previous code for errors. But the sensor-provided intrinsics always have fy equal to fx. If someone uses other intrinsics, though, this could lead to wrong values, so I will change it.

@floe floe added this to the 0.1 milestone Jun 2, 2015

kohrt commented Jun 2, 2015

I was impressed too. After restructuring the maps and using pointers it went down to 2.3 ms. With the static computations added to the maps it went down to 1.9 ms. With the extra map for cy and floorf(rx + 0.5f) instead of round(rx), it was 1.3 ms. And finally after replacing floorf(rx + 0.5f) by (int)(rx + 0.5f) it was only 1 ms.
I think it is now so fast that there is no need for an OpenCL/GL/CUDA implementation (at least on fast x86 machines). I am afraid that copying the color and depth images to the GPU and reading back the result might consume a large fraction of this 1 ms, so there is little headroom for improvement.


xlz commented Jun 3, 2015

Does this include depth filtering? I think that's the hard part, involving a depth map of 1920x1080 (or half the size).


floe commented Jun 4, 2015

@xlz you mean depth buffer? Not yet. I have the undistortion working now, unfortunately in a separate branch. Currently checking what's the best way to integrate this PR with my code.


kohrt commented Jun 4, 2015

Well, this is now only a color -> depth registration, so there won't be a high resolution depth map. I thought about a depth -> color registration using the inverse of this registration, but didn't find a way to do it: while cy is static for each x,y, the depth value is involved in the middle of the calculation that finds the corresponding cx coordinate.
For the filtering I thought one could create an array of the size of the color image and store, for each pixel, whether it has been used for the registered color image (and for which registered color pixel). If two registered color pixels use the same color image pixel, just select the one that belongs to the depth pixel with the lower distance. But the problem is that the color image is much bigger, so background and foreground pixels could map to neighboring color pixels, and then you won't find out that there are background pixels. So for the filtering we need something more fancy.
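The per-color-pixel bookkeeping described above amounts to a z-buffer over the color image. A minimal sketch with illustrative names (not code from this PR):

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Z-buffer over the color image: for every depth pixel that maps to color
// index ci, keep the mapping only if its depth is smaller than whatever is
// already stored there. Names here are illustrative only.
struct ColorZBuffer {
  std::vector<float> zbuf;
  explicit ColorZBuffer(size_t color_pixels)
      : zbuf(color_pixels, std::numeric_limits<float>::infinity()) {}

  // Returns true if this depth value wins the color pixel ci.
  bool claim(size_t ci, float depth) {
    if (depth < zbuf[ci]) {
      zbuf[ci] = depth;
      return true;
    }
    return false;  // a closer depth pixel already uses this color pixel
  }
};
```

As the comment above notes, this alone is not enough when foreground and background map to merely neighboring color pixels, which is why something fancier is needed.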


floe commented Jun 4, 2015

@xlz has a branch where a depth buffer has already been implemented, see his comment in #223.
EDIT: wrong issue, can't find the comment anymore - @xlz could you re-post a link to your depth buffer code?


xlz commented Jun 4, 2015

This one does depth filter: https://github.com/xlz/libfreenect2/commit/a0c0b6e5402ad363dcd5bbe58ebab8d093ea2f80

What I meant by depth map is an intermediate map to filter duplicate pixels, just as you described. The "something more fancy" is erosion, which takes the minimum of neighboring pixels. A small erosion kernel size will produce aliasing at the edges. There was unavoidable aliasing with erosion on a half-size depth filter, so I chose full-size.
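Erosion as described, each pixel replaced by the minimum over its neighborhood, can be sketched like this (a plain 3x3 kernel for illustration; the referenced commit chooses its own kernel size and layout):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// 3x3 erosion on a depth map: each output pixel becomes the minimum of its
// neighborhood. Near occlusion edges this suppresses background depth values
// that would otherwise grab foreground color pixels.
std::vector<float> erode3x3(const std::vector<float> &in, int w, int h) {
  std::vector<float> out(in.size());
  for (int y = 0; y < h; ++y)
    for (int x = 0; x < w; ++x) {
      float m = in[y * w + x];
      for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx) {
          const int ny = y + dy, nx = x + dx;
          if (ny >= 0 && ny < h && nx >= 0 && nx < w)
            m = std::min(m, in[ny * w + nx]);
        }
      out[y * w + x] = m;
    }
  return out;
}
```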

In depth-to-color registration, it is upsampling the depth image, so higher depth resolution wouldn't mean higher depth quality. In color-to-depth registration, even naive downsampling of the RGB image will still retain very good RGB quality in the point cloud.


kohrt commented Jun 4, 2015

Nice. I thought of something like this, setting a whole neighborhood instead of one pixel. But this filtering is pretty expensive, especially the erosion.

Yeah, you can't add information when upsampling, but you will get more detailed texture in the point cloud. It always depends on what you want.


kohrt commented Jun 5, 2015

I updated the pull request. It now includes undistortion of the depth and registered color image. It is still around 0.9 to 1.0 ms on my system. There was also a double conversion in one of the if statements, which I also replaced.
Maybe we can also build in the filtering. It shouldn't be too hard to do without OpenCV.


floe commented Jun 5, 2015

Great! Just one small additional request: could you add some comments to your code? There's a fair amount of pointer arithmetic going on, and it's rather easy to lose track...


kohrt commented Jun 8, 2015

Added some comments, hope it is easier to keep track now.


kohrt commented Jun 10, 2015

I just implemented the filtering from @xlz into the registration, but without any OpenCV. I tried to optimize the filtering as well. The whole registration with filtering now takes ~5.5 ms on my system.

@xlz: could you check if the filtering is fine like this and if it works similar to your implementation?


xlz commented Jun 10, 2015

Good work. Unfortunately I won't be able to test it for a while due to traveling etc.

I roughly looked at the new code. Most looks good. One issue: don't assume 3 bytes per pixel. Hardware decoders (Tegra, Vaapi) will return RGBA format.

IIUC this is still a 5x3 filter on a 1920x1080 image. If you haven't done so, also verify if there is any aliasing with maximum depth range.

@HenningJ
Contributor

It does not always do what it's supposed to:
(screenshot: registration)
Seems to work for my "shadow", but the shadow of the cup in front of me is not completely black. Is that because it's quite close to the camera?
And what's going on with the bottle? Is that because it's mostly transparent?

But otherwise it looks pretty good

More general question: the (small) shadow on the left side (that's in the original depth image) is because of the distance between IR-emitter and IR-sensor? and the (larger) shadow on the right side (in the registered image) is because of the distance between IR-sensor and RGB-sensor? Is that correct?


kohrt commented Jun 10, 2015

@xlz OK, I didn't know that there are RGBA color images. Is the byte order RGBA?
Yes, it's a 5 x 3 filter. Should be the same as or at least similar to your implementation. Instead of eroding later, it's done in one step.
How do I check for aliasing? Could an adaptive filter that is bigger for near pixels and smaller for far ones solve those issues?
I also want to make the filtering optional, so that it can be disabled if not needed.

@HenningJ the filter only works if there are depth values. At the borders of objects, like the cup, you don't get depth values, so there are no foreground depth pixels which use the color pixels from the border of the cup. Only the depth pixels from the background map to those border pixels, therefore they get used for those background depth pixels.
The same is true for the bottle. You can see that there are no values for the middle area (possibly too close to the camera), so the color pixels are used for the background pixels.
The shadow on the left is probably from the distance between the emitter and the sensor.


xlz commented Jun 10, 2015

RGBA yes.

If you see nothing strange then no aliasing. Do not need fancy filtering here.

Optional sure.


floe commented Jun 10, 2015

What I did in my (unmerged) registration code is to simply treat all color image data as RGBA (i.e. registered color images are output as RGBA, and input images are assumed to be RGBA, too). Only difference for RGB images is that the source pointer is advanced by 3 instead of 4 bytes. IMHO, this has two advantages:

  • we can always use uint32_t operations, which are generally not slower, and often faster, than moving 3 separate bytes around
  • there is an additional data channel available which could be used for downsampled depth data.
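The scheme described above, always writing RGBA words while only varying the source step, could be sketched as follows (copy_pixels is a hypothetical name, not code from any branch):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Always emit 4-byte RGBA output as uint32_t; only the source step differs:
// 3 bytes per pixel for RGB input, 4 for RGBA. memcpy avoids any alignment
// or strict-aliasing issues with the byte buffer.
void copy_pixels(const unsigned char *src, size_t src_step,
                 uint32_t *dst, size_t n) {
  for (size_t i = 0; i < n; ++i, src += src_step) {
    uint32_t px = 0;  // 4th byte stays 0 for 3-byte input
    std::memcpy(&px, src, src_step == 3 ? 3 : 4);
    dst[i] = px;
  }
}
```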

However, I would not implement this within this current PR. @wiedemeyer if possible, I'd like to split this into several PRs:

  1. the original one (this one here, Added filtering of shadowed color regions to registration #253) doing just the speedup
  2. a separate one for the depth buffer
  3. and possibly a third for parameters, tweaks etc.


floe commented Jun 10, 2015

I tried to cherry-pick the commits except the last one (97976ea), but apparently GitHub isn't smart enough to fast-forward the PR on its own. Can you try and see what happens if you merge master into this PR?


kohrt commented Jun 11, 2015

I rebased it to the current master. But I saw you already pulled the first commits.

@HenningJ
Contributor

@wiedemeyer thanks for the explanation.


kohrt commented Jun 11, 2015

Just to make sure: the color image from the CPU JPEG decoding has the BGR format, but VAAPI and Tegra have RGBA, right?
I just added the changes so that the code can handle 3 and 4 byte color images. It also improved performance a bit (5.0 ms instead of 5.5 ms).

@kohrt kohrt changed the title Improved speed of registration by factor 5 Added filtering of shadowed color regions to registration Jun 11, 2015

kohrt commented Jun 11, 2015

Yes, of course. I saw it right after I pushed.

@HenningJ
Contributor

enough pestering, gotta get back to work ;-)


kohrt commented Jun 11, 2015

If the input image is 3 bytes per pixel, the alpha channel of the output image will be set to the blue channel value of the next pixel.
At first I thought a 3 byte output image would be OK too, since the next run would overwrite the previous 4th byte anyway, but this would always write one byte past the end of the buffer, which might not be so good.
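The over-read-and-mask behaviour under discussion can be illustrated like this (read_rgb_pixel_masked is a hypothetical name; the mask corresponds to the "&-operation" mentioned below):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// When a 3-byte pixel is read as one 32-bit word, the word's top byte is
// whatever sits after the pixel in memory (the next pixel's first channel).
// Masking clears it so the stored alpha byte is deterministic. The caller
// must still guarantee 4 readable bytes at src.
uint32_t read_rgb_pixel_masked(const unsigned char *src) {
  uint32_t px;
  std::memcpy(&px, src, 4);   // deliberately reads one byte past the pixel...
  return px & 0x00FFFFFFu;    // ...which the mask then clears
}
```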

@HenningJ
Contributor

First I though a 3 byte output image would be ok too, then the next run would overwrite the previous 4th byte anyway

Yeah, that's what I meant.

but this will always write one byte after the end of the buffer, which might not be so good.

This will happen now anyways, won't it? You're still writing a four byte (unsigned) int when there are only 3 bytes left in the 3 byte output image. You're just writing 0 to the additional byte, instead of some other number. If there is anything at that byte, it's gonna be bad either way. So I don't see any scenario where the additional &-operation actually helps.


kohrt commented Jun 11, 2015

Ok, the & could be removed, if everyone ignores that byte anyway.
The current implementation only allows 4 byte output images, so writing over the buffer is not an issue.
Maybe it would be nicer if all color images in libfreenect2 had the same format, either 3 or 4 byte; then no special cases would need to be handled. In particular, users could be sure that the image format is always the same, independent of the JPEG processor being used.

@HenningJ
Contributor

Ok, the & could be removed, if everyone ignores that byte anyway.

Well... if anybody needs that byte to be 0, then of course your implementation is correct, it's just the comment that's wrong. ;-)


floe commented Jun 11, 2015

I think that we should simply change turbo_jpeg_rgb_packet_processor.cpp, lines 76 and 118, and replace TJPF_BGR by TJPF_RGBX. That would reduce the number of special cases quite a bit.


kohrt commented Jun 11, 2015

I tried it out, and it makes things much more beautiful.
But if TJPF_RGBX is used, the images need to be converted to BGRX or BGR before they can be displayed with OpenCV in Protonect.
So it might be better to use TJPF_BGRX or even TJPF_BGRA (if we care about the values in the alpha channel). @xlz could VAAPI and Tegra also output BGRA instead of RGBA?


kohrt commented Jun 11, 2015

Should I push the changes to this pull request so you can try them out?


floe commented Jun 11, 2015

Yes, please. Regarding BGRA vs RGBA: we're trying to get rid of OpenCV anyway, the viewer from PR #261 is based on OpenGL and can use RGBA. But for the moment, TJPF_BGRX would probably be the best choice.

Updated registration and removed handling of 3 byte color images.
Updated protonect to display color image correct.

kohrt commented Jun 11, 2015

I updated the PR.
I would vote for sticking with the BGRA format, but mainly because I am used to it. I did not find any information on which format is better. Only http://en.wikipedia.org/wiki/RGBA_color_space mentioned that colors are commonly stored as ARGB in a 32 bit word, which results in BGRA on little endian.
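That byte-order point can be verified directly: a pixel packed as a single 32-bit ARGB word lands in memory as B, G, R, A on a little-endian machine (the x86/ARM default assumed here).

```cpp
#include <cstdint>
#include <cstring>

// Copies the raw in-memory byte order of a 32-bit ARGB word (0xAARRGGBB).
// On a little-endian machine the least significant byte (B) comes first,
// giving BGRA byte order.
void argb_word_bytes(uint32_t argb, unsigned char out[4]) {
  std::memcpy(out, &argb, 4);
}
```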


floe commented Jun 11, 2015

OK, from what I've seen, any OpenGL implementation which can deal with libfreenect2 should also be able to handle BGRA just fine.


xlz commented Jun 11, 2015

VA-API can output both. Tegra can only output RGBA.


floe commented Jun 11, 2015

For the registration, the color order probably doesn't even matter now that we simply treat all individual pixels as uint32_t. @wiedemeyer Did you notice any speed issues with the color image decoder? After all, it needs to push 33% more data around now for each frame...


kohrt commented Jun 11, 2015

Yeah, the order does not matter for the registration. The registration with filtering takes ~5.7 ms, and 1.0 ms without. So no real change.
I didn't look at the JPEG decoder, but I guess that moving 33% more bytes is just a fraction of the time needed to decode the JPEG data.


kohrt commented Jun 12, 2015

I just let both versions of the JPEG decoder run, and both are around 14 ms.


floe commented Jun 14, 2015

Excellent, good to know. Merged.

floe added a commit that referenced this pull request Jun 14, 2015
Added filtering of shadowed color regions to registration
@floe floe merged commit 0fb3667 into OpenKinect:master Jun 14, 2015
@kohrt kohrt deleted the improved_registration branch June 16, 2015 14:12
@floe floe mentioned this pull request Jul 6, 2015
4 participants