in_tail: fix memory leak when using generic unicode conversion. #10781

pwhelan · 2025-08-25T21:27:26Z

Summary

When building with FLB_UNICODE_ENCODER=No, or with FLB_USE_SIMDUTF=No (which turns off FLB_UNICODE_ENCODER), the function process_content in tail_file.c leaks the buffer allocated by flb_unicode_generic_convert_to_utf8, because the call to flb_free is gated by an #ifdef tied to FLB_UNICODE_ENCODER.
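
The shape of the leak can be sketched in isolation. This is a minimal illustration under stated assumptions, not the actual Fluent Bit code: `HAVE_ENCODER_SKETCH`, `convert_sketch`, `release_sketch`, `process_gated`, and `process_unconditional` are hypothetical names, and a simple counter stands in for a leak checker such as Valgrind.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Buffers handed out by convert_sketch() and not yet released. */
int live_allocs = 0;

/* Last buffer handed out; kept only so the sketch can clean up after itself. */
char *last_alloc = NULL;

/* Hypothetical converter: returns a heap copy of the input via *out. */
int convert_sketch(const char *in, size_t len, char **out)
{
    char *buf;

    assert(in != NULL && out != NULL);
    buf = malloc(len + 1);
    if (buf == NULL) {
        return -1;
    }
    memcpy(buf, in, len);
    buf[len] = '\0';
    live_allocs++;
    last_alloc = buf;
    *out = buf;
    return (int) len;
}

/* Release a buffer obtained from convert_sketch(); NULL is ignored. */
void release_sketch(char *buf)
{
    if (buf != NULL) {
        live_allocs--;
        if (buf == last_alloc) {
            last_alloc = NULL;
        }
        free(buf);
    }
}

/* Buggy shape: the cleanup only exists when the feature macro is defined. */
int process_gated(const char *data, size_t len)
{
    char *decoded = NULL;
    int ret = convert_sketch(data, len, &decoded);

#ifdef HAVE_ENCODER_SKETCH
    release_sketch(decoded);
    decoded = NULL;
#endif
    (void) decoded;
    return ret;
}

/* Fixed shape: the buffer is always released and the pointer reset. */
int process_unconditional(const char *data, size_t len)
{
    char *decoded = NULL;
    int ret = convert_sketch(data, len, &decoded);

    release_sketch(decoded);
    decoded = NULL;
    return ret;
}
```

When `HAVE_ENCODER_SKETCH` is not defined, each call to `process_gated` leaves `live_allocs` one higher than before, while `process_unconditional` leaves it unchanged regardless of the macro.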

Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:

Example configuration file for the change
Debug log output from testing the change

Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

Run local packaging test showing all targets (including any new ones) build.
Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

Documentation required for this feature

Backporting

Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

Bug Fixes
- Fixed a potential memory leak in file tailing, improving stability across all configurations.
Performance
- Reduced memory usage during content processing by ensuring temporary buffers are always released after use, leading to more consistent resource management.

Signed-off-by: Phillip Adair Stewart Whelan <[email protected]>

coderabbitai · 2025-08-25T21:27:36Z

Walkthrough

Removed the compile-time conditional around decoded buffer cleanup in plugins/in_tail/tail_file.c so the decoded buffer is always freed and pointer reset after processing content. No interfaces or control flow were otherwise changed.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Tail decoded buffer cleanup `plugins/in_tail/tail_file.c` | Removed `#ifdef FLB_HAVE_UNICODE_ENCODER` guards; unconditionally frees and nulls the decoded buffer in process_content; no other logic altered. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

cmake: make simdutf optional. #10778 — Adjusts when FLB_HAVE_UNICODE_ENCODER is defined in build config, directly related to the removed conditional around decoded-buffer cleanup.

Poem

A nibble of bytes, a tidy heap,
I twitch my nose—no leaks to keep.
Snip the guard, free what we found,
Burrow cleanly, byte by pound.
Thump! Memory snug, pointers neat—
This rabbit’s patch is lean and sweet. 🐇✨

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

plugins/in_tail/tail_file.c (1)
486-499: Prevent memory leak when chaining preferred and generic Unicode conversions

I’ve verified that both ctx->preferred_input_encoding and ctx->generic_input_encoding_type can be set independently via configuration (they’re parsed from separate plugin properties in plugins/in_tail/tail_config.c) and that both conversion blocks run back-to-back when FLB_HAVE_UNICODE_ENCODER is enabled and both fields are non-default. This means the buffer allocated by the first (preferred) conversion can be overwritten—and lost—by the second (generic) conversion, resulting in a leak.

Please apply the following change in plugins/in_tail/tail_file.c (around lines 486–499):
@@ if (ctx->generic_input_encoding_type != FLB_GENERIC_UNSPECIFIED) {
-    original_len = end - data;
-    decoded = NULL;
-    ret = flb_unicode_generic_convert_to_utf8(ctx->generic_input_encoding_name,
-                                              (unsigned char*)data, (unsigned char**)&decoded,
-                                              end - data);
-    if (ret > 0) {
-        data = decoded;
-        end  = data + strlen(decoded);
-    }
-    else {
-        flb_plg_error(ctx->ins, "encoding failed '%.*s' with status %d", end - data, data, ret);
-    }
+    original_len = end - data;
+    /* Convert into a temporary buffer; if successful, free any prior decoded. */
+    char *decoded2 = NULL;
+    ret = flb_unicode_generic_convert_to_utf8(ctx->generic_input_encoding_name,
+                                              (unsigned char *) data,
+                                              (unsigned char **) &decoded2,
+                                              end - data);
+    if (ret > 0) {
+        if (decoded != NULL) {
+            flb_free(decoded);
+        }
+        decoded = decoded2;
+        data    = decoded;
+        end     = data + (size_t) ret;
+    }
+    else {
+        flb_plg_error(ctx->ins,
+                      "encoding failed '%.*s' with status %d",
+                      (int) (end - data), data, ret);
+        flb_free(decoded2);
+    }
Key points:

- Use a temporary decoded2 so the original decoded pointer isn’t clobbered on failure nor silently overwritten on success.
- Free the prior decoded only after the second conversion succeeds.
- Use the returned length (ret) rather than strlen() to handle embedded NULs and avoid rescanning the string.

Files to update:

- plugins/in_tail/tail_file.c (around lines 486–499)
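
The temporary-buffer pattern from the key points above can be sketched on its own. This is an illustration under assumptions, not the plugin's code: `convert_fn`, `apply_step`, `copy_step`, and `fail_step` are hypothetical names, and the converters are assumed to return the output length on success and a value less than or equal to zero on failure.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed converter contract: > 0 is the output length, <= 0 is failure. */
typedef int (*convert_fn)(const char *in, size_t len, char **out);

/*
 * Run one conversion step. The result goes into a temporary pointer first,
 * so the previous buffer in *decoded is freed only after the step succeeds
 * and is left untouched when the step fails.
 */
int apply_step(convert_fn step, char **decoded, const char **data, size_t *len)
{
    char *tmp = NULL;
    int ret;

    assert(step != NULL && decoded != NULL && data != NULL && len != NULL);
    ret = step(*data, *len, &tmp);
    if (ret > 0) {
        free(*decoded);          /* free(NULL) is a no-op */
        *decoded = tmp;
        *data = tmp;
        *len = (size_t) ret;     /* returned length, not strlen() */
    }
    else {
        free(tmp);
    }
    return ret;
}

/* Hypothetical converter that copies the input, embedded NULs included. */
int copy_step(const char *in, size_t len, char **out)
{
    char *buf;

    if (len == 0) {
        return 0;
    }
    buf = malloc(len + 1);
    if (buf == NULL) {
        return -1;
    }
    memcpy(buf, in, len);
    buf[len] = '\0';
    *out = buf;
    return (int) len;
}

/* Hypothetical converter that always fails without allocating. */
int fail_step(const char *in, size_t len, char **out)
{
    (void) in;
    (void) len;
    (void) out;
    return -1;
}
```

Chaining two successful `copy_step` calls through `apply_step` frees the first buffer when the second succeeds, and a later `fail_step` leaves `data`, `len`, and `decoded` as they were. The caller still frees `decoded` once at the end.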

🧹 Nitpick comments (1)

plugins/in_tail/tail_file.c (1)
471-471: Printf precision arg must be int; cast the pointer difference to int

The dynamic precision for %.*s expects an int, but (end - data) is a pointer difference of type ptrdiff_t, which is wider than int on typical 64-bit targets. Cast to avoid UB and format warnings.
-            flb_plg_debug(ctx->ins, "nothing to convert encoding '%.*s'", end - data, data);
+            flb_plg_debug(ctx->ins, "nothing to convert encoding '%.*s'",
+                          (int) (end - data), data);
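
A standalone sketch of the same idea, using only the standard library. `precision_from_range` and `format_range` are hypothetical helpers, and the clamping to `INT_MAX` is an extra precaution assumed here, not part of the suggested change above.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Turn the pointer difference end - data into a valid int precision. */
int precision_from_range(const char *data, const char *end)
{
    ptrdiff_t n = end - data;

    if (n < 0) {
        return 0;
    }
    if (n > INT_MAX) {
        return INT_MAX;
    }
    return (int) n;
}

/* Format the byte range [data, end) using an explicit int precision. */
int format_range(char *out, size_t out_size, const char *data, const char *end)
{
    assert(out != NULL && data != NULL && end != NULL);
    return snprintf(out, out_size, "nothing to convert encoding '%.*s'",
                    precision_from_range(data, end), data);
}
```

Passing the first five bytes of `"hello world"` as the range produces a message that quotes only `hello`, because the precision limits how many bytes `%.*s` reads.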

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between eb77790 and 4694ca6.

📒 Files selected for processing (1)

plugins/in_tail/tail_file.c (1 hunks)

🔇 Additional comments (1)

plugins/in_tail/tail_file.c (1)

654-657: LGTM: unconditional free fixes the leak in generic-encoding builds

Freeing decoded unconditionally (and nulling the pointer) ensures memory from flb_unicode_generic_convert_to_utf8 is released when FLB_HAVE_UNICODE_ENCODER is disabled at build time. This addresses the reported leak without altering control flow. Nice, focused fix.

cosmo0920

Ah, really good observation and good catch!

cosmo0920 · 2025-08-26T06:38:25Z

If that's OK with you, could you backport this to the 4.0 branch as well, @pwhelan?

edsiper · 2025-08-26T13:12:18Z

Yeah, we need to backport this for this week's release.

pwhelan · 2025-08-26T13:57:47Z

> Yeah, we need to backport this for this week's release.

done!

in_tail: fix memory leak when using generic unicode conversion.

4694ca6

Signed-off-by: Phillip Adair Stewart Whelan <[email protected]>

pwhelan requested review from edsiper, fujimotos and koleini as code owners August 25, 2025 21:27

github-actions bot added the docs-required label Aug 25, 2025

pwhelan temporarily deployed to pr August 25, 2025 21:27 — with GitHub Actions Inactive

pwhelan requested review from cosmo0920 and removed request for edsiper, koleini and fujimotos August 25, 2025 21:31

coderabbitai bot reviewed Aug 25, 2025

pwhelan temporarily deployed to pr August 25, 2025 21:44 — with GitHub Actions Inactive

cosmo0920 approved these changes Aug 26, 2025

cosmo0920 added the backport to v4.0.x label Aug 26, 2025

edsiper merged commit 9107895 into master Aug 26, 2025
64 checks passed

edsiper deleted the pwhelan-fix-memory-leak-tail-generic-unicode branch August 26, 2025 13:11

pwhelan mentioned this pull request Aug 26, 2025

in_tail: fix memory leak when using generic unicode conversion (backport #10781) #10785

Merged

1 task

coderabbitai bot mentioned this pull request Aug 26, 2025

cmake: explicitly disable FLB_UNICODE_ENCODER when FLB_USE_SIMDUTF is disabled. #10786

Merged

7 tasks
