Skip to content

Logging refactoring continued #364

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Sep 4, 2015
Merged

Logging refactoring continued #364

merged 10 commits into from
Sep 4, 2015

Conversation

xlz
Copy link
Member

@xlz xlz commented Aug 20, 2015

Continuing from #309.

I changed the logging API to libfreenect2::{get,set}GlobalLogger() (instead of setLogger()) in case in the future we want the non-global logger interface again. The rationale of reverting to a global logger is described in this commit "Convert to a global static logger" and in the discussion in #309.

libfreenect2 still manages the memory of the logger passed in because it needs access to the logger until the very end in all kinds of destructors.

@xlz xlz mentioned this pull request Aug 21, 2015
@xlz
Copy link
Member Author

xlz commented Aug 21, 2015

@floe

This PR is complete. I have tested (Linux/Ubuntu) all combinations of cmake .. -DENABLE_OPENGL={ON,OFF} -DENABLE_CXX11={ON,OFF} && ./Protonect {-noviewer,} {gl,cl,cpu}. All work as expected.

@xlz
Copy link
Member Author

xlz commented Aug 24, 2015

The last fix needs more investigation.

@xlz
Copy link
Member Author

xlz commented Aug 24, 2015

So the issue is quite complicated.

At first it was whenever the OpenGL processor prints errors like this std::cerr << "1" << "2"; exit(-1);, only the first "1" is printed, with the rest lost. I checked the stack at that moment, it looked like this:

#0  0x00007ffff65f64d4 in __GI___pthread_mutex_lock (mutex=0x7ffff7ffd968 <_rtld_global+2312>) at ../nptl/pthread_mutex_lock.c:114
#1  0x00007ffff7ddb150 in tls_get_addr_tail (ti=0x7ffff7543718, dtv=0x7ffff7fd8090, the_map=0x7ffff7fe89f8) at dl-tls.c:722
#2  0x00007ffff72b498c in __cxxabiv1::__cxa_get_globals () at ../../../../src/libstdc++-v3/libsupc++/eh_globals.cc:62
#3  0x00007ffff72b48b9 in std::uncaught_exception () at ../../../../src/libstdc++-v3/libsupc++/eh_catch.cc:139
#4  0x00007ffff72ebae2 in ~sentry (this=0x7fffffffce00, __in_chrg=<optimized out>) at /build/buildd/gcc-4.8-4.8.4/build/x86_64-linux-gnu/libstdc++-v3/include/ostream:431
#5  std::__ostream_insert<char, std::char_traits<char> > (__out=..., __s=__s@entry=0x40849a "device serial: ", __n=15) at /build/buildd/gcc-4.8-4.8.4/build/x86_64-linux-gnu/libstdc++-v3/include/bits/ostream_insert.h:112
#6  0x00007ffff72ebec7 in std::operator<< <std::char_traits<char> > (__out=..., __s=0x40849a "device serial: ") at /build/buildd/gcc-4.8-4.8.4/build/x86_64-linux-gnu/libstdc++-v3/include/ostream:535

The std::ostream::sentry's destructor flushes the stream and outputs the content. During that it also checks std::uncaught_exception(). Nothing is wrong with that. In std::uncaught_exception() __cxa_get_globals() accesses a static thread local variable. Thread local access __tls_get_addr() calls into tls_get_addr_tail().

The commit "Fix exception handling in LogMessage destructor" removed usage of operator<<() for outputting, and outputting will no longer access thread local variables. But this only works around the problem.

I tried adding some std::cerr << in Protonect.cpp, and it behaved like the above again. And actually the main thread was hanging, in a deadlock or something. This means in the past, all std::cerr << in the codebase will probably cause the calling thread to hang. That may explain some of the OpenGL black screen bugs.

The only other access to thread-local variables happens in

#0  __tls_get_addr (ti=0x7ffff65eafb0) at dl-tls.c:770
#1  0x00007ffff63e6b8a in _initClIcd () at ocl_icd_loader.c:560
#2  0x00007ffff7dea13a in call_init (l=<optimized out>, argc=argc@entry=2, argv=argv@entry=0x7fffffffe388, env=env@entry=0x7fffffffe3a0) at dl-init.c:78
#3  0x00007ffff7dea223 in call_init (env=<optimized out>, argv=<optimized out>, argc=<optimized out>, l=<optimized out>) at dl-init.c:36
#4  _dl_init (main_map=0x7ffff7ffe1c8, argc=2, argv=0x7fffffffe388, env=0x7fffffffe3a0) at dl-init.c:126
#5  0x00007ffff7ddb30a in _dl_start_user () from /lib64/ld-linux-x86-64.so.2

According to https://bugzilla.redhat.com/show_bug.cgi?id=1219646, ocl-icd implementation was fundamentally flawed (before 2.2.4) in its use of thread local storage in the library constructor. If I disable OpenCL, or update ocl-icd-libopencl1 to 2.2.7-2 (Ubuntu 15.10 Wily Werewolf), the problem goes away.

Other bug reports of the same issue:
https://bugzilla.redhat.com/show_bug.cgi?id=1223055
https://bugzilla.redhat.com/show_bug.cgi?id=661676
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54018

According to GCC docs, "No static initialization may refer to the address of a thread-local variable." It looks like the case in these reports.

Now that many users are stuck with Ubuntu 14.04 with the old version of ocl-icd, a work around has to be provided. Suppose no further access to any thread local variable is allowed under the buggy ocl-icd. That means no access to std::uncaught_exception (std::ostream::sentry::~sentry() and std::condition_variable_any), and no access to anything related to exception:

__cxxabiv1::__cxa_tm_cleanup
__cxxabiv1::__cxa_rethrow
std::current_exception
__cxxabiv1::__cxa_allocate_exception
__cxxabiv1::__cxa_allocate_dependent_exception
__cxa_current_exception_type
__cxxabiv1::__cxa_begin_catch
__cxxabiv1::__cxa_end_catch

To avoid std::uncaught_exception(), only needs to disable automatic flushing:

std::nounitbuf(std::cerr);

It is harder to avoid exceptions. -fno-exceptions will disable c++ exceptions. Several places in the OpenCL processor and tinythread/tinythread.cpp need fixing. Probably use a macro LIBFREENECT2_NOEXCEPTION.

There are some other issues with OpenCL (ocl-icd version detection and NOEXCEPTION setup, CMake cleanup, macro detection, embedding a cl.hpp because newer opencl-headers no longer has one). In terms of PRs I'll probably address all OpenCL issues in a new one.

@xlz
Copy link
Member Author

xlz commented Aug 25, 2015

The above analysis means the last fix is unneeded. For platforms without the buggy ocl-icd, std::cerr will work as expected instead of replacing it with stderr. The next step is to remove the last fix and add a commit of platform specific check for buggy ocl-icd.

@floe floe added this to the 0.1 milestone Sep 2, 2015
@floe
Copy link
Contributor

floe commented Sep 2, 2015

AFAICT, to keep the changesets down to reasonable size, the PRs should now be merged in order #364 -> #368 -> #376 - is that correct?

@xlz
Copy link
Member Author

xlz commented Sep 2, 2015

I think #368 will be mergeable after merging #364.

#376 probably needs some review (and fixing conflict).

christiankerl and others added 10 commits September 3, 2015 13:01
fixed WithLogImpl::setLog; removed global ConsoleLog instance;
updated Freenect2 to manage lifetime of Log instance

renamed Log to Logger

added LIBFREENECT2_API macro to logging classes

added environment variable LIBFREENECT2_LOGGER_LEVEL to change
default logger level, possible values
'debug','info','warning','error'

made logger level immutable
Before this commit, logger pointers get passed around through
inheritance and manually constructed dependency assignment lists.
The manual management is hard to scale with logging calls which
can appear anywhere in the code.

This commit implements a single global static logger for all
Freenect2 contexts. It still can be replaced by different
loggers, but only one at a time.

Now it is the responsibility of each logging point to include
libfreenect2/logging.h, which is not automatically included.
Also add a "None" logging level

Thus remove NoopLogger, and sort logging levels by verbosity.
Also implement a WithPerfLogging class based on timing code to
remove duplicate timing code in several processors.
Also export level2str() in Logger for external use.
ocl-icd under 2.2.3 calls dlopen() in its library constructor
and accesses a thread local variable in the process. This causes
all subsequent access to any other thread local variables to
deadlock.

The bug is fixed in ocl-icd 2.2.4, which is not in stable releases
in Ubuntu or Debian. Thus this provides a workaround given buggy
ocl-icd.

To avoid access to thread local variable, errno, std::ostream
with unitbuf, and exception handling in libstdc++ cannot be used.

This commit checks ocl-icd version, and refactor the OpenCL
processor to not use exceptions. Then disable unitbuf on std::cerr
and disable all exceptions with -fno-exceptions (when available).

This commit and the ocl-icd bug do not affect Mac OS X or Windows.
@xlz
Copy link
Member Author

xlz commented Sep 3, 2015

Rebased to master.

@floe
Copy link
Contributor

floe commented Sep 4, 2015

Thanks for your work. I've tested this on my machine, and everything works fine, so let's merge it.

floe added a commit that referenced this pull request Sep 4, 2015
Logging refactoring continued
@floe floe merged commit 7691a03 into OpenKinect:master Sep 4, 2015
@floe floe mentioned this pull request Sep 4, 2015
2 tasks
@xlz xlz deleted the logging branch September 13, 2015 01:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants