-
Notifications
You must be signed in to change notification settings - Fork 1.8k
in_tail: fix memory leak when using generic unicode conversion. #10781
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Phillip Adair Stewart Whelan <[email protected]>
WalkthroughRemoved the compile-time conditional around decoded buffer cleanup in plugins/in_tail/tail_file.c so the decoded buffer is always freed and pointer reset after processing content. No interfaces or control flow were otherwise changed. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Possibly related PRs
Poem
Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. ✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
plugins/in_tail/tail_file.c (1)
486-499
: Prevent memory leak when chaining preferred and generic Unicode conversionsI’ve verified that both
ctx->preferred_input_encoding
andctx->generic_input_encoding_type
can be set independently via configuration (they’re parsed from separate plugin properties in plugins/in_tail/tail_config.c) and that both conversion blocks run back-to-back when FLB_HAVE_UNICODE_ENCODER is enabled and both fields are non-default. This means the buffer allocated by the first (preferred) conversion can be overwritten—and lost—by the second (generic) conversion, resulting in a leak.Please apply the following change in plugins/in_tail/tail_file.c (around lines 486–499):
@@ if (ctx->generic_input_encoding_type != FLB_GENERIC_UNSPECIFIED) { - original_len = end - data; - decoded = NULL; - ret = flb_unicode_generic_convert_to_utf8(ctx->generic_input_encoding_name, - (unsigned char*)data, (unsigned char**)&decoded, - end - data); - if (ret > 0) { - data = decoded; - end = data + strlen(decoded); - } - else { - flb_plg_error(ctx->ins, "encoding failed '%.*s' with status %d", end - data, data, ret); - } + original_len = end - data; + /* Convert into a temporary buffer; if successful, free any prior decoded. */ + char *decoded2 = NULL; + ret = flb_unicode_generic_convert_to_utf8(ctx->generic_input_encoding_name, + (unsigned char *) data, + (unsigned char **) &decoded2, + end - data); + if (ret > 0) { + if (decoded != NULL) { + flb_free(decoded); + } + decoded = decoded2; + data = decoded; + end = data + (size_t) ret; + } + else { + flb_plg_error(ctx->ins, + "encoding failed '%.*s' with status %d", + (int) (end - data), data, ret); + flb_free(decoded2); + }Key points:
- Use a temporary
decoded2
so the originaldecoded
pointer isn’t clobbered on failure nor silently overwritten on success.- Free the prior
decoded
only after the second conversion succeeds.- Use the returned length (
ret
) rather thanstrlen()
to handle embedded NULs and avoid rescanning the string.Files to update:
- plugins/in_tail/tail_file.c (around lines 486–499)
🧹 Nitpick comments (1)
plugins/in_tail/tail_file.c (1)
471-471
: Printf precision arg must be int; cast size_t to intThe dynamic precision for %.*s expects an int, but (end - data) is size_t. Cast to avoid UB and format warnings.
- flb_plg_debug(ctx->ins, "nothing to convert encoding '%.*s'", end - data, data); + flb_plg_debug(ctx->ins, "nothing to convert encoding '%.*s'", + (int) (end - data), data);
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
plugins/in_tail/tail_file.c
(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
- GitHub Check: pr-compile-centos-7
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
- GitHub Check: PR - fuzzing test
🔇 Additional comments (1)
plugins/in_tail/tail_file.c (1)
654-657
: LGTM: unconditional free fixes the leak in generic-encoding buildsFreeing decoded unconditionally (and nulling the pointer) ensures memory from flb_unicode_generic_convert_to_utf8 is released when FLB_HAVE_UNICODE_ENCODER is disabled at build time. This addresses the reported leak without altering control flow. Nice, focused fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, really good observation and good catch!
If you'd OK, could you back port to 4.0 branch as well, @pwhelan ? |
yeah we need to backport this for this week release |
done! |
Summary
when setting
FLB_UNICODE_ENCODER=No
or settingFLB_USE_SIMDUTF=No
(which turns offFLB_UNICODE_ENCODER
) the functionprocess_content
intail_file.c
will leak memory fromflb_unicode_generic_convert_to_utf8
due to the call toflb_free
being gated by an ifdef toFLB_UNICODE_ENCODER
.Enter
[N/A]
in the box, if an item is not applicable to your change.Testing
Before we can approve your change; please submit the following in a comment:
If this is a change to packaging of containers or native binaries then please confirm it works for all targets.
ok-package-test
label to test for all targets (requires maintainer to do).Documentation
Backporting
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.
Summary by CodeRabbit