MDBF-143: Add infer #835
base: dev
Conversation
Force-pushed from 822e6bb to b795254
Force-pushed from b795254 to b6cbeca
Force-pushed from ba8e232 to e484be1
Force-pushed from e484be1 to d26d94f
Force-pushed from d26d94f to 9f4c334
Maybe the specifications have already been set, but after analyzing the script I was wondering if the following flow might be simpler? Still, I believe you've already achieved the maximum efficiency in terms of time and resource consumption with the current script, even though it's a bit difficult for me to understand it completely at the moment (which is not a blocker, of course; I just need your guidance). With the above workflow you don't need the bind mounts anymore, at the expense of:
Force-pushed from 1512189 to 9cd42b6
Not totally, it's hopefully a bit simpler now.
I did try. The incremental builds aren't saving much in capture, and I'm not sure how much in analyse. At some point, if no decent incremental results are there, that could be removed, leaving just the need for X main branch saves.
Force-pushed from 9cd42b6 to 56697ad
)
def save_infer_logs(
This function is tightly coupled with infer() in sast.py.
You can simply move the step into sast.py and use InContainer -> ShellStep -> SavePackages.
Changed in 474b2e4
Agree with @RazvanLiviuVarzaru, it's a bit complex to understand what the big picture is, probably because I don't know the tool. I'll probably need to read the Infer docs next and come back with (maybe) some design arguments.
Small changes requested, mostly for consistency.
infer --version
if [ $# -lt 1 ]; then
    echo insufficient args >&2
Can't you use functions from the bash library to make this a bit more homogeneous with all BB output messages (bb_log_info|warn|error)?
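For reference, a minimal sketch of what I mean (the bb_log_* names are taken from the bash library mentioned here; the actual implementations there may differ):
# Hypothetical sketch only; the real bash library helpers may differ.
bb_log_info() { printf 'INFO: %s\n' "$*"; }
bb_log_warn() { printf 'WARN: %s\n' "$*" >&2; }
bb_log_err()  { printf 'ERROR: %s\n' "$*" >&2; }

if [ $# -lt 1 ]; then
    bb_log_err "insufficient args"
    exit 1
fi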
@fauust the script is placed in a built-in buildbot command.
It is not a script that is downloaded at runtime but rather embedded in the config-time command, hence the lack of source support.
I will, in the future, add support for downloading multiple scripts.
pushd /mnt/src
if [ ! -d .git ]; then
    git clone https://github.com/MariaDB/server.git
missing final dot:
git clone https://github.com/MariaDB/server.git .
# Directory to clean
# Target maximum usage (in percent)
max_usage=90
To be synced with Zabbix alerting (warning = 80, critical = 90). We can change it in Zabbix, of course.
bg-bbw5 has 1.6 TB; 80% is reasonable enough.
if [ ! -d bld ]; then
    mkdir bld
    build
?
}
# Capture and analyze the feature of the files changes in index
#
cd bld
popd/pushd for consistency
# 2 is the size of an empty json array '[]'
if [ "$filesize" -gt 2 ]; then
    echo "$msg"
    echo
echo -e "$msg\n", but see my comment on the bash lib.
Force-pushed from 645b4fe to d4c0ecc
This performs static analysis on the MariaDB codebase
by maintaining a git source repository as a shared volume.
Because static analysis takes time, a lot of time, there
is a shared cache volume to store build results from main
branches of the codebase so that as much incremental reuse
as possible can occur.
Infer runs in two phases, a capture and an analyze.
Infer output goes into a results dir, which contains:
* report.json - what infer tools use
* report.txt - the human-readable version of this
* capture.db - the sqlite3 representation of captured files and their
relation to function definitions.
* results.db - the analyze phase outputs
Of these, report.json is desirable as the long-term record of vulnerabilities,
along with the main_diff containing the difference from the last
main X.Y branch commit.
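As a rough illustration of those two phases, a minimal standalone sketch (assuming a CMake compilation database; the actual job runs this via the buildbot step with the shared volumes described above):
# Minimal sketch of the two Infer phases, assuming compile_commands.json
# is generated by CMake with -DCMAKE_EXPORT_COMPILE_COMMANDS=ON.
cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -S /mnt/src -B bld
infer capture --compilation-database bld/compile_commands.json \
    --project-root /mnt/src --results-dir infer_results
infer analyze --project-root /mnt/src --results-dir infer_results
# infer_results/ now holds report.json, report.txt, capture.db and results.db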
Force-pushed from d4c0ecc to 474b2e4
Added a worker: 4f32380
Run infer on bg-bbw5-x64
Force-pushed from bae2dfe to 4f32380
analyze
cp -a "$result_dir" "$infer/$commit"
cd ..
else
On a fresh run:
+ infer reportdiff --report-current report.json --report-previous /mnt/infer/68432a0bc365d783cde21487c257b7e865221f41/report.json --project-root /mnt/src --results-dir /home/buildbot/infer_results_diff
Uncaught Internal Error: (Sys_error "/home/buildbot/report.json: No such file or directory")
Guessing your initial intention was to put everything below this line into the else block and keep the ${result_dir} cleanup for CI outside the if...else, because I remember you initially had an exit 0 under the full-run block.
For clarity, you could define a variable that decides when to perform a full analysis vs. an incremental one, then put the full / incremental blocks into functions.
Maybe this? Haven't tested it yet, I just arranged everything for clarity.
#!/bin/bash
# Infer script for performing
# static analysis on the MariaDB codebase
set -x -e
infer --version
trap 'cleanup_for_CI' EXIT
if [ $# -lt 1 ]; then
echo insufficient args >&2
exit 1
fi
# Testing this version
branch=$1
if [ -z "$branch" ]; then
echo "usage $0 {branch/commit}" >&2
exit 1
fi
################################################################################
## CONFIGURATION ##
################################################################################
base=$PWD
result_dir=$PWD/infer_results
infer="/mnt/infer"
sources="/mnt/src"
max_usage=80 # maximum disk usage (in percent)
: "${JOBS:=4}"
################################################################################
## FUNCTIONS ##
################################################################################
get_source()
{
pushd $sources
trap 'popd' RETURN
if [ ! -d .git ]; then
git clone https://github.com/MariaDB/server.git .
else
git clean -df
fi
git fetch origin "$branch"
git checkout -f FETCH_HEAD
git submodule update --init --recursive --jobs "${JOBS}"
git clean -df
commit=$(git rev-parse FETCH_HEAD)
}
cleanup_for_CI()
{
rm -rf "${result_dir}"/*.db "${result_dir}"/tmp
}
# Function to get current disk usage (integer percent)
get_usage() {
df -P "$infer" | awk 'NR==2 {gsub(/%/,""); print $5}'
}
host_cleanup()
{
rm -rf "${result_dir}" index.txt report.json
echo "Checking disk usage on $(df -h "$infer" | tail -n -1)"
usage=$(get_usage)
echo "Current usage: ${usage}%"
# Find directories sorted by oldest modification time (oldest first)
mapfile -t dirs < <(
find "$infer" -mindepth 1 -maxdepth 1 -type d -printf '%T@ %p\n' \
| sort -n | awk '{print $2}'
)
# Loop through and delete until below threshold
for dir in "${dirs[@]}"; do
if (( usage < max_usage )); then
echo "Disk usage is ${usage}%, below ${max_usage}%. Done."
break
fi
echo "Deleting oldest directory: $dir"
rm -rf -- "$dir"
usage=$(get_usage)
echo "New usage: ${usage}%"
done
if (( usage >= max_usage )); then
echo "Warning: disk still above ${max_usage}% after deleting all directories!"
else
echo "Done. Disk usage now ${usage}%."
fi
}
populate_differences()
{
pushd $sources
trap 'popd' RETURN
# Just assume we diverged from main at some point
# Using $commit because merge-base didn't process
# pull request references.
merge_base=$(git merge-base "$commit" origin/main)
# Find something closer - e.g. we've appended to a branch
# we've already tested
mapfile -t commits < <(git rev-list "${merge_base}..FETCH_HEAD")
for common_commit in "${commits[@]}"; do
if [ -d "${infer}/$common_commit" ]; then
break;
fi
done
if [ ! -d "${infer}/$common_commit" ]; then
return 1
fi
merge_base=$common_commit
# The file changes from the last results
git diff --name-only FETCH_HEAD.."${merge_base}" | tee "$base"/index.txt
if [ ! -s "$base"/index.txt ]; then
echo "Empty changes - nothing necessary"
rm "$base"/index.txt
exit 0
fi
limit=50
if [ "$(wc -l < "${base}"/index.txt)" -gt $limit ]; then
echo "More than $limit changes, just do a full generation"
rm "$base/index.txt"
return 1
fi
# use previous results as a base
cp -a "$infer/$merge_base" "$result_dir"
# Using this as a recently-used marker
# Eventually we can remove/clear based on not being looked at
touch "$infer/$merge_base"
return 0
}
build()
{
pushd $base
trap 'popd' RETURN
if [ ! -d bld ]; then
mkdir bld
fi
cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=ON \
-DCMAKE_C_COMPILER=clang \
-DCMAKE_CXX_COMPILER=clang++ \
-S /mnt/src -B bld
cmake --build bld \
--target GenError GenServerSource GenUnicodeDataSource GenFixPrivs \
--parallel "$JOBS"
}
capture()
{
infer capture --compilation-database compile_commands.json --project-root /mnt/src --results-dir "${result_dir}" "$@"
}
analyze()
{
infer analyze --project-root /mnt/src --results-dir "${result_dir}" --max-jobs "${JOBS}" "$@"
}
check()
{
file=$1
msg=$2
if [ -f "${file}" ]; then
filesize=$(stat -c%s "$file")
# 2 is the size of an empty json array '[]'
if [ "$filesize" -gt 2 ]; then
echo "$msg"
echo
echo "Here are the changes:"
jq . "${file}"
return 1
fi
fi
return 0
}
full_analysis()
{
pushd $base/bld
trap 'popd' RETURN
echo "full run, this could take a while"
capture
analyze
cp -a "$result_dir" "$infer/$commit"
}
incremental_analysis()
{
pushd $base/bld
trap 'popd' RETURN
echo "incremental run"
# We've copied over a result dir, so we're continuing
# https://fbinfer.com/docs/infer-workflow/#differential-workflow
# using 'infer capture' instead of 'infer run'
capture --reactive
# some form of incremental
analyze --changed-files-index $base/index.txt
# Preserve result
cp "${result_dir}"/report.json $base/report.json
# just in case these have changed, including generated files
build
# Can we use the previous captured $infer/$merge_base
capture --merge-capture "$infer/$merge_base" --reactive --mark-unchanged-procs
analyze --incremental-analysis --changed-files-index ../index.txt
# It may be merged next, or a commit pushed on top of it.
infer reportdiff --report-current $base/report.json --report-previous "${result_dir}"/report.json --project-root /mnt/src --results-dir "${result_dir}"
## At this point we have infer_results/differential/{fixed,introduced}.json
#!? Change the name as we're going to use differential as a main branch difference
#!!mv "${result_dir}"/differential "${result_dir}"/diff_prev_commit
rm -rf $base/bld $base/index.txt
# Useful enough to save as $infer/
# It's unknown if this is on a main branch or not, but just save.
# If its merged next, then a commit exists, if a user appends
# a commit, we've got a smaller delta.
cp -a "${result_dir}" "$infer/${commit}"
# Look at the changes from the main branch
#
# Take the main branch report.json
# remove fixed, add introduced, and then walk
# though other commits, if they exist, and apply the
# same again up until, and including the last commit
source $sources/VERSION
branch=${MYSQL_VERSION_MAJOR}.${MYSQL_VERSION_MINOR}
pushd $sources
merge_base=$(git merge-base "origin/$branch" "$commit")
mapfile -t commits < <(git rev-list "${merge_base}..${commit}")
popd
base=$infer/$merge_base
last_ref=$base
for common_commit in "${commits[@]}"; do
diff_dir="${infer}/$common_commit"/differential/
if [ -d "$diff_dir" ]; then
# removed fixed issues and append introduced.
jq --slurpfile to_remove "${diff_dir}"/fixed.json '
($to_remove[0] | map(.hash)) as $hashes_to_remove
| map(select(.hash as $h | $hashes_to_remove | index($h) | not))' \
"${last_ref}"/report.json > filtered.json
jq -s 'add | unique_by(.hash)' filtered.json "${diff_dir}"/introduced.json > report.json
last_ref=$PWD
fi
done
infer reportdiff --report-current report.json --report-previous "${base}"/report.json --project-root /mnt/src --results-dir "${result_dir}_diff"
result_dir_main_diff=${result_dir}/main_diff
mv "${result_dir}_diff"/differential/ "${result_dir_main_diff}"
cp -a "${result_dir_main_diff}" "$infer/${commit}"
check "${result_dir}"/differential/fixed.json "Good human! Thanks for fixing the bad things in the last commit"
check "${result_dir}"/differential/introduced.json "Bad human! Don't introduce bad things in the last commit" >&2
check "${result_dir_main_diff}"/fixed.json "Good human! Thanks for fixing the bad things"
if ! check "${result_dir_main_diff}"/introduced.json "Bad human! Don't introduce bad things" >&2; then
exit 1
fi
}
################################################################################
## MAIN SCRIPT ##
################################################################################
host_cleanup
if [ -d "${infer}/$commit" ]; then
echo "Already scanned $commit"
exit 0
fi
get_source
if ! populate_differences; then
echo "No common commit ancestor with analysis or over depth limit($limit)" >&2
echo "This is going to take a while for a full scan"
fi
if [ ! -f index.txt ]; then
RUN_MODE="full"
else
RUN_MODE="incremental"
fi
build
if [ "$RUN_MODE" = "full" ]; then
full_analysis
fi
if [ "$RUN_MODE" = "incremental" ]; then
incremental_analysis
fi
much nicer

First draft. Seeing if GitHub Actions can handle building the entire Infer from scratch.
Their last release, 1.2.0, was in June 2024, and it's actively developed.
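For context, a rough sketch of what building Infer from source involves (commands assumed from the upstream facebook/infer README, not from this PR; exact prerequisites and targets may differ):
# Assumed from the upstream facebook/infer README; not verified in this workflow.
git clone https://github.com/facebook/infer.git
cd infer
./build-infer.sh clang   # build the clang (C/C++) analyzers; needs opam and OCaml
sudo make install        # or run the freshly built binary directly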