pwhelan
Contributor

@pwhelan pwhelan commented Aug 25, 2025

Summary

When setting FLB_UNICODE_ENCODER=No, or setting FLB_USE_SIMDUTF=No (which also turns off FLB_UNICODE_ENCODER), the function process_content in tail_file.c leaks the buffer allocated by flb_unicode_generic_convert_to_utf8. The leak happens because the matching call to flb_free is guarded by the #ifdef FLB_HAVE_UNICODE_ENCODER that this option controls, while the generic conversion still runs.
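The shape of the bug can be reproduced outside Fluent Bit. The following is a minimal sketch, not the actual tail_file.c code: demo_alloc_copy, demo_free, DEMO_HAVE_UNICODE_ENCODER and the live_buffers counter are hypothetical stand-ins for the converter, flb_free and the build flag. It only assumes that the conversion is always compiled in while the cleanup is not.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counter: buffers handed out and not yet released. */
static int live_buffers = 0;

/* Stand-in for a converter that returns a newly allocated buffer. */
static char *demo_alloc_copy(const char *src)
{
    size_t len = strlen(src);
    char *out = (char *) malloc(len + 1);

    if (out != NULL) {
        memcpy(out, src, len + 1);
        live_buffers++;
    }
    return out;
}

/* Stand-in for flb_free that also updates the counter. */
static void demo_free(char *buf)
{
    if (buf != NULL) {
        assert(live_buffers > 0);
        free(buf);
        live_buffers--;
    }
}

/* Buggy shape: the allocation always happens, but the cleanup is only
 * compiled when DEMO_HAVE_UNICODE_ENCODER is defined.
 * Returns how many buffers this call left behind. */
int process_gated(const char *line)
{
    int before = live_buffers;
    char *decoded = demo_alloc_copy(line);

#ifdef DEMO_HAVE_UNICODE_ENCODER
    demo_free(decoded);
    decoded = NULL;
#endif
    (void) decoded;
    return live_buffers - before;
}

/* Fixed shape: the cleanup is unconditional. */
int process_unconditional(const char *line)
{
    int before = live_buffers;
    char *decoded = demo_alloc_copy(line);

    demo_free(decoded);
    decoded = NULL;
    (void) decoded;
    return live_buffers - before;
}
```

Built without DEMO_HAVE_UNICODE_ENCODER, process_gated leaves one buffer behind per call, while process_unconditional leaves none. The gated variant really does leak in this sketch, so it is meant for illustration only.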



Summary by CodeRabbit

  • Bug Fixes
    • Fixed a potential memory leak in file tailing, improving stability across all configurations.
  • Performance
    • Reduced memory usage during content processing by ensuring temporary buffers are always released after use, leading to more consistent resource management.


coderabbitai bot commented Aug 25, 2025

Walkthrough

Removed the compile-time conditional around decoded buffer cleanup in plugins/in_tail/tail_file.c so the decoded buffer is always freed and pointer reset after processing content. No interfaces or control flow were otherwise changed.

Changes

Cohort / File(s) | Summary
Tail decoded buffer cleanup (plugins/in_tail/tail_file.c) | Removed #ifdef FLB_HAVE_UNICODE_ENCODER guards; unconditionally frees and nulls the decoded buffer in process_content; no other logic altered.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Poem

A nibble of bytes, a tidy heap,
I twitch my nose—no leaks to keep.
Snip the guard, free what we found,
Burrow cleanly, byte by pound.
Thump! Memory snug, pointers neat—
This rabbit’s patch is lean and sweet. 🐇✨


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
plugins/in_tail/tail_file.c (1)

486-499: Prevent memory leak when chaining preferred and generic Unicode conversions

I’ve verified that both ctx->preferred_input_encoding and ctx->generic_input_encoding_type can be set independently via configuration (they’re parsed from separate plugin properties in plugins/in_tail/tail_config.c) and that both conversion blocks run back-to-back when FLB_HAVE_UNICODE_ENCODER is enabled and both fields are non-default. This means the buffer allocated by the first (preferred) conversion can be overwritten—and lost—by the second (generic) conversion, resulting in a leak.

Please apply the following change in plugins/in_tail/tail_file.c (around lines 486–499):

@@ if (ctx->generic_input_encoding_type != FLB_GENERIC_UNSPECIFIED) {
-    original_len = end - data;
-    decoded = NULL;
-    ret = flb_unicode_generic_convert_to_utf8(ctx->generic_input_encoding_name,
-                                              (unsigned char*)data, (unsigned char**)&decoded,
-                                              end - data);
-    if (ret > 0) {
-        data = decoded;
-        end  = data + strlen(decoded);
-    }
-    else {
-        flb_plg_error(ctx->ins, "encoding failed '%.*s' with status %d", end - data, data, ret);
-    }
+    original_len = end - data;
+    /* Convert into a temporary buffer; if successful, free any prior decoded. */
+    char *decoded2 = NULL;
+    ret = flb_unicode_generic_convert_to_utf8(ctx->generic_input_encoding_name,
+                                              (unsigned char *) data,
+                                              (unsigned char **) &decoded2,
+                                              end - data);
+    if (ret > 0) {
+        if (decoded != NULL) {
+            flb_free(decoded);
+        }
+        decoded = decoded2;
+        data    = decoded;
+        end     = data + (size_t) ret;
+    }
+    else {
+        flb_plg_error(ctx->ins,
+                      "encoding failed '%.*s' with status %d",
+                      (int) (end - data), data, ret);
+        flb_free(decoded2);
+    }

Key points:

  • Use a temporary decoded2 so the original decoded pointer isn’t clobbered on failure nor silently overwritten on success.
  • Free the prior decoded only after the second conversion succeeds.
  • Use the returned length (ret) rather than strlen() to handle embedded NULs and avoid rescanning the string.

Files to update:

  • plugins/in_tail/tail_file.c (around lines 486–499)
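The ownership hand-off described in the key points can be sketched in isolation. This is a hedged illustration, not Fluent Bit code: demo_convert, demo_release, chain_convert, the force_fail switch and the owned_buffers counter are all hypothetical, and the converter simply upper-cases its input. It assumes, as the suggestion above does, that a positive return value is the output length.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical counter of buffers currently allocated by demo_convert. */
static int owned_buffers = 0;

/* Hypothetical converter: upper-cases the input into a new buffer.
 * Returns the output length, or -1 on failure (forced, empty input,
 * or allocation failure). *out is only written on success. */
static int demo_convert(const char *in, size_t in_len, int force_fail,
                        char **out)
{
    size_t i;
    char *buf;

    if (force_fail || in_len == 0) {
        return -1;
    }
    buf = (char *) malloc(in_len + 1);
    if (buf == NULL) {
        return -1;
    }
    for (i = 0; i < in_len; i++) {
        buf[i] = (char) toupper((unsigned char) in[i]);
    }
    buf[in_len] = '\0';
    owned_buffers++;
    *out = buf;
    return (int) in_len;
}

static void demo_release(char *buf)
{
    if (buf != NULL) {
        assert(owned_buffers > 0);
        free(buf);
        owned_buffers--;
    }
}

/* Runs two conversions back to back. The second stage writes into a
 * temporary pointer; the first-stage buffer is released only once the
 * second stage has succeeded. The caller owns *decoded_out. */
int chain_convert(const char *data, size_t len, int fail_second,
                  char **decoded_out)
{
    char *decoded = NULL;
    char *tmp = NULL;
    int ret;

    ret = demo_convert(data, len, 0, &decoded);
    if (ret > 0) {
        data = decoded;
        len = (size_t) ret;
    }

    ret = demo_convert(data, len, fail_second, &tmp);
    if (ret > 0) {
        demo_release(decoded);   /* the previous buffer is no longer needed */
        decoded = tmp;
        len = (size_t) ret;      /* use the returned length, not strlen() */
    }

    *decoded_out = decoded;
    return (int) len;
}
```

Whether the second stage succeeds or fails, exactly one buffer remains for the caller to release, which is the invariant the suggested change is aiming for.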
🧹 Nitpick comments (1)
plugins/in_tail/tail_file.c (1)

471-471: Printf precision arg must be int; cast the pointer difference to int

The dynamic precision for %.*s expects an int, but (end - data) is a pointer difference of type ptrdiff_t. Cast it to int to avoid undefined behavior and format warnings.

-            flb_plg_debug(ctx->ins, "nothing to convert encoding '%.*s'", end - data, data);
+            flb_plg_debug(ctx->ins, "nothing to convert encoding '%.*s'",
+                          (int) (end - data), data);
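For reference, a minimal standalone sketch of the same cast with standard snprintf. The helper name format_slice is hypothetical and is not part of Fluent Bit; it assumes the slice is short enough for its length to fit in an int.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Formats the bytes in [data, end) between single quotes.
 * The precision argument of %.*s must be an int, so the ptrdiff_t
 * produced by the pointer subtraction is cast explicitly. */
int format_slice(char *out, size_t out_size,
                 const char *data, const char *end)
{
    ptrdiff_t len = end - data;

    assert(len >= 0);
    return snprintf(out, out_size, "'%.*s'", (int) len, data);
}
```

Calling format_slice(buf, sizeof(buf), s, s + 5) with s pointing at "hello world" writes 'hello' into buf, without requiring a NUL terminator at the end of the slice.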
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between eb77790 and 4694ca6.

📒 Files selected for processing (1)
  • plugins/in_tail/tail_file.c (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-centos-7
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
  • GitHub Check: PR - fuzzing test
🔇 Additional comments (1)
plugins/in_tail/tail_file.c (1)

654-657: LGTM: unconditional free fixes the leak in generic-encoding builds

Freeing decoded unconditionally (and nulling the pointer) ensures memory from flb_unicode_generic_convert_to_utf8 is released when FLB_HAVE_UNICODE_ENCODER is disabled at build time. This addresses the reported leak without altering control flow. Nice, focused fix.

Contributor

@cosmo0920 cosmo0920 left a comment


Ah, really good observation and good catch!

@cosmo0920
Contributor

If you're OK with it, could you backport this to the 4.0 branch as well, @pwhelan?

@edsiper edsiper merged commit 9107895 into master Aug 26, 2025
64 checks passed
@edsiper edsiper deleted the pwhelan-fix-memory-leak-tail-generic-unicode branch August 26, 2025 13:11
@edsiper
Member

edsiper commented Aug 26, 2025

Yeah, we need to backport this for this week's release.

@pwhelan
Contributor Author

pwhelan commented Aug 26, 2025

> Yeah, we need to backport this for this week's release.

done!
