Skip to content

Handling race condition for pending_transfers_ and its iterators. #212

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

larshg
Copy link
Contributor

@larshg larshg commented Apr 24, 2015

I tried updating libusb to the newest master, which due to this commit:
libusb/libusb@a886bb0

Made a lot more errors on the transfer submitting than I experienced before. After investigating it a bit, I came to the conclusion that there is thread unsafe adding and removing going on, which occasionally made the iterators invalid, since the underlying list/queue had changed.

This pull request adds a mutex to allow only one thread at a time to access the pending_transfer_ list.
Can you verify this @christiankerl?

There might be more places to handle this problem, but for now I'm not experiencing the errors anymore.

I tested with the "older" version of libusb and this addition doesn't seem to break it, nor make any performance loss.

This might be related to the error in #185.

@xlz
Copy link
Member

xlz commented Apr 29, 2015

It seems the current libfreenect2 usage of libusb callback function just happens to not crash given that libusb has no guarantee the callback function will not be called from different threads. I think adding locks would be an ok temporary fix.

For a better solution, it's better without introducing synchronization points here and there because that defeats the point of asynchronous transfers. This two queues pending_transfers_ and idle_transfers_ basically juggle transfer structures around and the only point of the two queues is to determine whether a transfer is being worked on or being idle. The method of determining the status is to use thread unsafe iterators over the queues.

Putting the transfer status inside the transfer structure and using a std::vector to contain the transfers would also do the job in a thread-safe way.

I'll try to come up with something after your patch.

@xlz
Copy link
Member

xlz commented Apr 29, 2015

Bisection confirms a886bb0 is the commit that breaks the current libfreenect2.

Your patch did not fix the problem in my testing. I did not get any "failed to submit transfer" error, but the errors were a lot of ISO transfers being lost and depth packets being incomplete.

I also found TransferPool::onTransferCompleteStatic is always called from a same thread so there is no issue about thread safety.

One minor issue, your patch uses std::mutex and will fail to build if the user does not enable C++11.

@larshg
Copy link
Contributor Author

larshg commented Apr 30, 2015

Hi @xlz

The runtime error I get is that the iterator is invalid - nothing to do about failed to submit transfers :)

You might be right that its the same thread that executes onTransferCompleteStatic, but its the main thread that adds and removes transfers at startup and on close.

But as soon as one transfer is submitted, libusb spawns a thread and that one will eventually be using a iterator on the collections, while the main thread is adding or removing transfers too. This will make the iterator invalid.

Instead of using std::mutex I guess we can use a mutex from tinythread or another synchronization way.

@xlz
Copy link
Member

xlz commented Apr 30, 2015

Then my error from a886bb0 is different from yours. I did not get any invalid iterator errors.

You might be right that its the same thread that executes onTransferCompleteStatic, but its the main thread that adds and removes transfers at startup and on close.

What I did was printing the thread id at the position where you put the locks. Apparently TransferPool::onTransferComplete is the only entry point that accesses the iterators after initialization.

@larshg
Copy link
Contributor Author

larshg commented Apr 30, 2015

I just figured that I can only get the error to happened when I run it in debug mode - probably because all the transfers get submitted before the first transfer is complete in release mode. Did you try in debug mode?

@xlz
Copy link
Member

xlz commented May 1, 2015

Agreed. What you found is true even before a886bb0.

With the current libusbx-winiso, and with this line removed in transfer_pool.cpp:

    std::cerr << "[TransferPool::submit] transfer submission disabled!" << std::endl;

I got "deque iterator not incrementable" error using Debug build, and hang-up using Release build when exiting Protonect. The problem is exactly during initialization and deinitialization.

Either you have to put locks everywhere, or have to consider reimplementing the transfer pool.

@xlz
Copy link
Member

xlz commented May 2, 2015

@larshg

It seems on Windows a886bb0 improves performance, but on Linux a886bb0 really breaks the original transfer_pool.cpp.

Can you try https://github.com/xlz/libfreenect2/commit/c6f6e57df3671285812e5ae3ac9ceedd49258d42 from my transfer-pool branch? (This commit did not fix the problem on Linux.)

@xlz
Copy link
Member

xlz commented May 4, 2015

Further testing shows a886bb0 introduces regression on Linux. Only one out of 60 ISO transfers actually completes and the data received is corrupted.

@larshg
Copy link
Contributor Author

larshg commented May 5, 2015

Hi @xlz

I tried the commit and it seems to work fine at my end.

Atleast I cannot provoke the iterator errors.

@floe
Copy link
Contributor

floe commented May 21, 2015

Just to clarify, this works on Windows & Linux now and can be merged?

@xlz
Copy link
Member

xlz commented May 21, 2015

This can't be merged in the current form. There are two issues: use of C++11 feature std::mutex, and race condition of iterators of idle_transfers_ in TransferPool::deallocate() and TransferPool::onTransferComplete.

@floe floe added this to the 0.1 milestone May 21, 2015
@larshg
Copy link
Contributor Author

larshg commented May 26, 2015

@floe I haven't tried on linux, as I don't have dual boot (yet).
I have tried to address the two things @xlz pointed out. Using libfreenect2/threading.h and the mutex defined there and wrapped idle_transfers_ iterator in a lock.

@xlz
Copy link
Member

xlz commented May 26, 2015

Well, write access to idle_transfers_ in TransferPool::submit() and TransferPool::onTransferComplete() is not locked yet. The problem here is TransferPool::deallocate() must only be called after all asynchronous transfer cancellation requests complete. If deallocate() is called before onTransferComplete(), callback function is applied on freed memory, and then results in crash, even if the iterator is protected by locks. I thought using only locks would solve this, but apparently not.

@RyanGordon
Copy link
Contributor

@xlz Would a semaphore then be more appropriate for handling that scenario?

@hanyazou
Copy link
Contributor

Pending_transfers_lock_.lock() and unlock() are needed in TransferPool::cancel(). It works well on my Mac. Without this, BAD ACCESS will occur when you push ESC key to quit the Protonect.

@larshg
Copy link
Contributor Author

larshg commented May 26, 2015

@hanyazou I have added the lock - can you see/test if it works?

@hanyazou
Copy link
Contributor

@larshg it works. thank you for your quick response.

@floe
Copy link
Contributor

floe commented May 26, 2015

In the current patch, idle_transfers_ is only locked and unlocked exactly once (in TransferPool::deallocate()). Is this lock still required, then?

@xlz
Copy link
Member

xlz commented May 27, 2015

@RyanGordon https://github.com/xlz/libfreenect2/commit/c6f6e57df3671285812e5ae3ac9ceedd49258d42 is my proposal to fix this bug. It refactors TransferPool using std::vector and waits for all transfers during cancellation (with a stopped flag). No locks are needed.

Locking pending_transfers_ when canceling transfers.
@floe
Copy link
Contributor

floe commented Jun 8, 2015

Superseded by lock-less variant in #275, closing.

@floe floe closed this Jun 8, 2015
@larshg larshg deleted the UsbTansfers branch June 8, 2015 11:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants