MDBF-143: Add infer #835

grooverdan · 2025-09-08T00:22:42Z

First draft. Seeing if github actions can handle building the entire infer from scratch.

Their last release 1.2.0 was in June 2024 and its actively developed.

master-migration/master.cfg

RazvanLiviuVarzaru · 2025-10-27T09:31:36Z

These 4 commits, does not seem necessary as you are doing all the work in the shell script.
They are welcome but not in the scope of this patch?

configuration/builders/sequences/sast.py

configuration/steps/commands/scripts/infer.sh

configuration/builders/sequences/helpers.py

configuration/steps/commands/scripts/infer.sh

RazvanLiviuVarzaru · 2025-10-27T11:53:08Z

Maybe the specifications have already been set, but after analyzing the script, I was wondering if the following flow might be simpler? Still, I believe you’ve already achieved the maximum efficiency in terms of time and resource consumption with the current script, even though it’s a bit difficult for me to understand it completely at the moment. (which is not a blocker of course I just need your guidance)

If the push is on a main branch e.g. 10.6, 10.11,...,main :: perform a full analysis i.e.
    -> build the code
    -> capture
    -> analysis
    -> publish report on CI
If the push is on a feature branch (e.g. bb-*, refs/pull/head, so on) then analyze the diff between the feature branch and #master_branch# as follow:
    [1]:
    -> build the code on the feature branch
    -> infer capture ...
    -> git diff --name-only origin/feature..origin/#master_branch#  > index.txt
    -> infer analyze --changed-files-index index.txt
    -> move the report 
    [2]:
    -> build the code on the #master_branch# (e.g. for bb-10.6-razvan, master_branch=10.6)
    -> infer capture --reactive --mark-if-unchanged
    -> infer analyze --incremental-analysis --changed-files-index index.txt
    -> infer reportdiff --report-current report_for_the feature --report-previous report_of_the_master_branch
    -> publish the diff report on CI

With the above workflow you don't need the bind mounts anymore at the expense of:

a full analysis on main branches
building twice for a feature branch (although building those targets is very fast)
So I am not sure it is the best idea, it's just how I imagine it in the simplest form.

configuration/steps/commands/scripts/infer.sh

grooverdan · 2025-10-28T07:09:32Z

Maybe the specifications have already been set

not totally, its hopefully a bit simpler now

achieved the maximum efficiency in terms of time and resource consumption with the current script

I did try. The incremental builds aren't saving much in capture, and not sure how much in analyse.

At some point if no decent incremental results are there that could be removed leaving just the need for X main branch saves.

if need_to_full build;
     -> build the code
     -> capture
     -> analysis
else
     -> build the code
     -> capture --incremental
     -> analysis --incremental
fi

save result to $infer

diff to $master_branch branch to generate incremental report using json "db" manipulation

# quasy save diff also just for ref bug probably could be removed.

publish report on CI

RazvanLiviuVarzaru · 2025-10-28T09:03:16Z

configuration/builders/sequences/helpers.py

    )
+
+
+def save_infer_logs(


this function is tightly coupled with infer() in sast.py.
Can simply move the step in sast.py and use InContainer -> ShellStep -> SavePackages

Changed in 474b2e4

fauust

Agree with @RazvanLiviuVarzaru, it's a bit complex to understand what the big picture is, probably because I don't know the tool. I'll probably need to read the infer doc next and come with (maybe) some design arguments.

Small changes requested for consistency mostly.

fauust · 2025-10-28T09:19:40Z

configuration/steps/commands/scripts/infer.sh

+infer --version
+
+if [ $# -lt 1 ]; then
+	echo insufficient args >&2


Can't you use functions from the bash library to make this a bit more homogeneous with all BB output messages (bb_log_info|warn|error)?

@fauust the script is replaced in a built-in buildbot command.
It is not a script that is downloaded at runtime but rather embedded in the config-time command, hence the lack of source support.

I will, in the future, add support for downloading multiple scripts.

fauust · 2025-10-28T09:21:12Z

configuration/steps/commands/scripts/infer.sh

+
+pushd /mnt/src
+if [ ! -d .git ]; then
+  git clone https://github.com/MariaDB/server.git


missing final dot:
git clone https://github.com/MariaDB/server.git .

fauust · 2025-10-28T09:26:02Z

configuration/steps/commands/scripts/infer.sh

+
+# Directory to clean
+# Target maximum usage (in percent)
+max_usage=90


to be synced with zabbix alerting (warning = 80, critical = 90). We can change it on zabbix ofc.

bg-bbw5 has 1.6 TB, 80 % is reasonably enough.

fauust · 2025-10-28T09:31:33Z

configuration/steps/commands/scripts/infer.sh

+
+if [ ! -d bld ]; then
+  mkdir bld
+  build


fauust · 2025-10-28T09:32:16Z

configuration/steps/commands/scripts/infer.sh

+}
+# Capture and analyze the feature of the files changes in index
+#
+cd bld


popd/pushd for consistency

fauust · 2025-10-28T09:38:01Z

configuration/steps/commands/scripts/infer.sh

+    # 2 is the size of an empty json array '[]'
+    if [ "$filesize" -gt 2 ]; then
+      echo "$msg"
+      echo


echo -e "$msg\n"

but see my comment on bash lib.

This preforms static analysis on the MariaDB codebase by maintaining a git source repository as a shared volume. Because static analysis takes time, a lot of time, there is a shared cache volume to store build results from main branches of the codebase so that as much incremental usage can occur. Infer runs in to phases, a capture and an analyze. Infer output are in a result-dir this contains: * report.json - what infer tools use * report.txt - the human readable version of this * capture.db - the sqlite3 version presentation of captured files and the relation to functions definitions. * results.db - the analyze phase outputs Of these, the report.json is desirable as the long term record of vulnerabilities. and the main_diff containing the difference from the last main X.Y branch commit.

RazvanLiviuVarzaru · 2025-10-28T12:40:56Z

Added a worker: 4f32380

Run infer on bg-bbw5-x64

RazvanLiviuVarzaru · 2025-10-28T13:54:06Z

configuration/steps/commands/scripts/infer.sh

+  analyze
+  cp -a "$result_dir" "$infer/$commit"
+  cd ..
+else


On a fresh run

+ infer reportdiff --report-current report.json --report-previous /mnt/infer/68432a0bc365d783cde21487c257b7e865221f41/report.json --project-root /mnt/src --results-dir /home/buildbot/infer_results_diff Uncaught Internal Error: (Sys_error "/home/buildbot/report.json: No such file or directory")

Guessing your initial intention was to put everything under this line under the else block and keep the ${result_dir} cleanup for CI outside the IF...ELSE.

Because I remember you had an exit 0 initially under the full run block.
For clarity you can define a variable to decide when to perform a full analysis vs. an incremental then put the full / incremental blocks under functions.

Maybe this? Haven't tested it yet, I just arranged everything for clarity.

#!/bin/bash # Infer script for performing # static analysis on the MariaDB codebase set -x -e infer --version trap 'cleanup_for_CI' EXIT if [ $# -lt 1 ]; then echo insufficient args >&2 exit 1 fi # Testing this version branch=$1 if [ -z "$branch" ]; then echo "usage $0 {branch/commit}" >&2 exit 1 fi ################################################################################ ## CONFIGURATION ## ################################################################################ base=$PWD result_dir=$PWD/infer_results infer="/mnt/infer" sources="/mnt/src" max_usage=80 # maximum disk usage (in percent) : "${JOBS:=4}" ################################################################################ ## FUNCTIONS ## ################################################################################ get_source() { pushd $sources trap 'popd' RETURN if [ ! -d .git ]; then git clone https://github.com/MariaDB/server.git . else git clean -df fi git fetch origin "$branch" git checkout -f FETCH_HEAD git submodule update --init --recursive --jobs "${JOBS}" git clean -df commit=$(git rev-parse FETCH_HEAD) } cleanup_for_CI() { rm -rf "${result_dir}"/*.db "${result_dir}"/tmp } # Function to get current disk usage (integer percent) get_usage() { df -P "$infer" | awk 'NR==2 {gsub(/%/,""); print $5}' } host_cleanup() { rm -rf "${result_dir}" index.txt report.json echo "Checking disk usage on $(df -h "$infer" | tail -n -1)" usage=$(get_usage) echo "Current usage: ${usage}%" # Find directories sorted by oldest modification time (oldest first) mapfile -t dirs < <( find "$infer" -mindepth 1 -maxdepth 1 -type d -printf '%T@ %p\n' \ | sort -n | awk '{print $2}' ) # Loop through and delete until below threshold for dir in "${dirs[@]}"; do if (( usage < max_usage )); then echo "Disk usage is ${usage}%, below ${max_usage}%. Done." break fi echo "Deleting oldest directory: $dir" rm -rf -- "$dir" usage=$(get_usage) echo "New usage: ${usage}%" done if (( usage >= max_usage )); then echo "Warning: disk still above ${max_usage}% after deleting all directories!" else echo "Done. Disk usage now ${usage}%." fi } populate_differences() { pushd $sources trap 'popd' RETURN # Just assume we diverged from main at some point # Using $commit because merge-base didn't process # pull request references. merge_base=$(git merge-base "$commit" origin/main) # Find something closer - e.g. we've appended to a branch # we've already tested mapfile -t commits < <(git rev-list "${merge_base}..FETCH_HEAD") for common_commit in "${commits[@]}"; do if [ -d "${infer}/$common_commit" ]; then break; fi done if [ ! -d "${infer}/$common_commit" ]; then return 1 fi merge_base=$common_commit # The file changes we from last results git diff --name-only FETCH_HEAD.."${merge_base}" | tee "$base"/index.txt if [ ! -s "$base"/index.txt ]; then echo "Empty changes - nothing necessary" rm "$base"/index.txt exit 0 fi limit=50 if [ "$(wc -l < "${base}"/index.txt)" -gt $limit ]; then echo "More than $limit changes, just do a full generation" rm "$base/index.txt" return 1 fi # use previous results as a base cp -a "$infer/$merge_base" "$result_dir" # Using as a recently used maker # Eventually we can remove/clear based on not being looked at touch "$infer/$merge_base" return 0 } build() { pushd $base trap 'popd' RETURN if [ ! -d bld ]; then mkdir bld fi cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=ON \ -DCMAKE_C_COMPILER=clang \ -DCMAKE_CXX_COMPILER=clang++ \ -S /mnt/src -B bld cmake --build bld \ --target GenError GenServerSource GenUnicodeDataSource GenFixPrivs \ --parallel "$JOBS" } capture() { infer capture --compilation-database compile_commands.json --project-root /mnt/src --results-dir "${result_dir}" "$@" } analyze() { infer analyze --project-root /mnt/src --results-dir "${result_dir}" --max-jobs "${JOBS}" "$@" } check() { file=$1 msg=$2 if [ -f "${file}" ]; then filesize=$(stat -c%s "$file") # 2 is the size of an empty json array '[]' if [ "$filesize" -gt 2 ]; then echo "$msg" echo echo "Here are the changes:" jq . "${file}" return 1 fi fi return 0 } full_analysis() { pushd $base/bld trap 'popd' RETURN echo "full run, this could take a while" capture analyze cp -a "$result_dir" "$infer/$commit" } incremental_analysis() { pushd $base/bld trap 'popd' RETURN echo "incremental run" # We've copied over a result dir, so we're continuing # https://fbinfer.com/docs/infer-workflow/#differential-workflow # using 'infer capture" instead infer run capture --reactive # some form of incremental analyze --changed-files-index $base/index.txt # Preserve result cp "${result_dir}"/report.json $base/report.json # just in case these have changed, including generated files build # Can we use the previous captured $infer/$merge_base capture --merge-capture "$infer/$merge_base" --reactive --mark-unchanged-procs analyze --incremental-analysis --changed-files-index ../index.txt # It may be merged next, or a commit pushed on top of it. infer reportdiff --report-current $base/report.json --report-previous "${result_dir}"/report.json --project-root /mnt/src --results-dir "${result_dir}" ## At this point we have infer_results/differential/{fixed,introduced}.json #!? Change the name as we're going to use differential as a main branch difference #!!mv "${result_dir}"/differential "${result_dir}"/diff_prev_commit rm -rf $base/bld $base/index.txt # Useful enough to save as $infer/ # Its unknown if this is on main branch or now, but just save. # If its merged next, then a commit exists, if a user appends # a commit, we've got a smaller delta. cp -a "${result_dir}" "$infer/${commit}" # Look at the changes from the main branch # # Take the main branch report.json # remove fixed, add introduced, and then walk # though other commits, if they exist, and apply the # same again up until, and including the last commit source $sources/VERSION branch=${MYSQL_VERSION_MAJOR}.${MYSQL_VERSION_MINOR} pushd $sources merge_base=$(git merge-base "origin/$branch" "$commit") mapfile -t commits < <(git rev-list "${merge_base}..${commit}") popd base=$infer/$merge_base last_ref=$base for common_commit in "${commits[@]}"; do diff_dir="${infer}/$common_commit"/differential/ if [ -d "$diff_dir" ]; then # removed fixed issues and append introduced. jq --slurpfile to_remove "${diff_dir}"/fixed.json ' ($to_remove[0] | map(.hash)) as $hashes_to_remove | map(select(.hash as $h | $hashes_to_remove | index($h) | not))' \ "${last_ref}"/report.json > filtered.json jq -s 'add | unique_by(.hash)' filtered.json "${diff_dir}"/introduced.json > report.json last_ref=$PWD fi done infer reportdiff --report-current report.json --report-previous "${base}"/report.json --project-root /mnt/src --results-dir "${result_dir}_diff" result_dir_main_diff=${result_dir}/main_diff mv "${result_dir}_diff"/differential/ "${result_dir_main_diff}" cp -a "${result_dir_main_diff}" "$infer/${commit}" check "${result_dir}"/differential/fixed.json "Good human! Thanks for fixing the bad things in the last commit" check "${result_dir}"/differential/introduced.json "Bad human! Don't introduce bad things in the last commit" >&2 check "${result_dir_main_diff}"/fixed.json "Good human! Thanks for fixing the bad things" if check "${result_dir_main_diff}"//introduced.json "Bad human! Don't introduce bad things" >&2; then exit 1 fi } ################################################################################ ## MAIN SCRIPT ## ################################################################################ host_cleanup if [ -d "${infer}/$commit" ]; then echo "Already scanned $commit" exit 0 fi get_source if populate_differences; then echo "No common commit ancestor with analysis or over depth limit($limit)" >&2 echo "This is going to take a while for a full scan" fi if [ ! -f index.txt ]; then RUN_MODE="full" else RUN_MODE="incremental" fi build if [ "$RUN_MODE" = "full" ]; then full_analysis fi if [ "$RUN_MODE" = "incremental" ]; then incremental_analysis fi

grooverdan force-pushed the MDBF-143-infer branch 6 times, most recently from 822e6bb to b795254 Compare September 10, 2025 06:20

grooverdan marked this pull request as draft September 10, 2025 06:21

grooverdan force-pushed the MDBF-143-infer branch from b795254 to b6cbeca Compare September 10, 2025 06:39

RazvanLiviuVarzaru reviewed Sep 10, 2025

View reviewed changes

master-migration/master.cfg Show resolved Hide resolved

grooverdan force-pushed the MDBF-143-infer branch 2 times, most recently from ba8e232 to e484be1 Compare September 11, 2025 08:17

grooverdan changed the title ~~MDBF-143: Add infer (container build)~~ MDBF-143: Add infer Sep 11, 2025

grooverdan force-pushed the MDBF-143-infer branch from e484be1 to d26d94f Compare September 12, 2025 08:27

grooverdan mentioned this pull request Sep 18, 2025

MDBF-1076: Create MSAN Debug builder #769

Merged

grooverdan force-pushed the MDBF-143-infer branch from d26d94f to 9f4c334 Compare October 9, 2025 07:02