Multi-language CPU benchmarking suite for reproducible, parameterized performance measurement across Python, C, C++, Go, and Java, with configurable data scales and thread counts to exercise both single-core and multi-core performance on emerging workloads.
Focus: multi-language coverage, emerging workloads (DL, video, KV/search, compilers), configurable data scales, configurable threading (single‑core → multi‑core sweeps), reproducible runs with parameter overrides, optional hardware counters, and clean artifact hygiene.
- Features
- Repository Layout
- Prerequisites & Installation
- Quick Start
- Benchmark Metadata Schema
- Transformer Inference Example
- Parameter Override Cheat Sheet
- Running Setup Only
- Running Workloads
- Perf Collection
- Preset Runner
- Add a New Benchmark
- Cleaning Artifacts
- System Info Capture
- Notes
- Multi-language coverage: Python / C / C++ / Go / Java
- Emerging workloads: deep learning transformers/CNNs, video processing, key‑value stores & search, compilers/builds, numerical kernels
- Configurable data scales: tune dataset sizes and generation for small smoke tests → stress runs
- Configurable threading: sweep thread counts to measure single‑core and multi‑core scaling
- Toolchain flexibility: compilers/interpreters are a first‑class evaluation dimension
- Unified runner with structured logs & CSV outputs for analysis
- Per-benchmark setup and optional data generation hooks
- Declarative parameter overrides (data / workload / setup) for reproducibility
- Optional `perf stat` collection with cache dropping and curated event sets
- Utilities: artifact cleanup and system information capture
- `configs/benchmarks_index.yaml` – Registry of benchmarks (name → path)
- `src/` – Benchmark source trees grouped by language
- `scripts/run_cpu.py` – Core workload runner
- `scripts/setup_env.py` – Setup/build phase executor only
- `scripts/clean_artifacts.py` – Remove build/data/vendor artifacts
- `run_v_0_0_1.py` – Orchestrated preset workload + parameter set runner
- `res/` – Results & system info (e.g. `system_info.json`)
- `log/` – Logs from preset runs
Supported OS/Arch: Linux (x86_64 & ARM64).
Minimum versions:
| Tool | Version |
|---|---|
| Python | >= 3.10 |
| GCC/G++ | >= 11 |
| Clang/Clang++ | >= 14 |
| Java (JDK) | >= 17 |
| Go | >= 1.24.5 |
| perf (optional) | n/a |
Compiler / interpreter choice is an evaluation dimension. Point `PATH` (or system alternatives) so `gcc`, `clang`, `java`, `go`, and `python3` resolve to your selected versions.
```bash
# Environment setup
sudo apt-get update
sudo apt-get install -y python3 python3-venv python3-pip gcc g++ clang openjdk-17-jdk
pip3 install pyyaml

# Go (x86_64) – adapt for ARM64
wget https://go.dev/dl/go1.24.5.linux-amd64.tar.gz -O /tmp/go.tar.gz \
  && sudo tar -C /usr/local -xzf /tmp/go.tar.gz \
  && echo 'export PATH=/usr/local/go/bin:$PATH' | sudo tee /etc/profile.d/go.sh

# perf
sudo apt-get install -y linux-perf || sudo apt-get install -y perf
```
```bash
# Clone
git clone https://github.com/YukiCheZ/cpu_bench.git && cd cpu_bench

# Setup and run a pair of workloads with a parameter override
python3 scripts/run_cpu.py --workloads numpy_benchmark.matmul ffmpeg_benchmark.ffmpeg \
  --setup-env --set-param numpy_benchmark.matmul.workload.size=2048 --verbose

# CPU Bench full preset suite
python3 run_v_0_0_1.py

# Clean artifacts (preview then execute)
python3 scripts/clean_artifacts.py --all --dry-run --verbose
python3 scripts/clean_artifacts.py --all --verbose
```

Each benchmark has a `metadata.yaml` defining setup and workloads.
Key fields:
- `name`, `type`, `language`, `domain[]`
- `setup.command` + optional `setup.parameters`
- `workloads[]`: each workload may have:
  - `name`, `description`
  - `data.command` + `data.parameters` (optional pre-run generation)
  - `command` (main workload)
  - `parameters` (runtime options)
- `characteristics`: e.g. `cpu_bound`, `memory_intensive`
- `tags`: free-form grouping labels
- `dependencies`: e.g. `python>=3.8`, `torch>=2.0`
Parameter semantics:
- Scalars → `--param value`
- Boolean defaults → flag style (`--param`) when true
- Explicit boolean override: `...param=true|false` (only `true` emits the flag)
- Unknown override keys are ignored with a warning
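For illustration, here is a minimal sketch (not the repository's actual implementation, which lives in `scripts/run_cpu.py`) of how these rules could map an override dictionary onto CLI arguments:

```python
# Illustrative sketch only: the real override handling may differ in detail.
def overrides_to_args(params: dict) -> list[str]:
    """Map {name: value} overrides to CLI arguments per the rules above."""
    args: list[str] = []
    for name, value in params.items():
        if isinstance(value, bool):
            if value:              # only true emits the flag
                args.append(f"--{name}")
        else:                      # scalars become --name value
            args += [f"--{name}", str(value)]
    return args

# {"size": 2048, "compile": True, "verbose": False} -> ["--size", "2048", "--compile"]
print(overrides_to_args({"size": 2048, "compile": True, "verbose": False}))
```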
Location: `src/languages/python/transformer_inference/metadata.yaml`
Setup:
```yaml
setup:
  command: python3 setup.py
```

Data generation & workload:
```yaml
workloads:
  - name: transformer_inference
    data:
      command: python3 generate_data.py
      parameters:
        num_batches: { default: 50 }
        batch_size: { default: 8 }
        seq_len: { default: 256 }
        vocab_size: { default: 10000 }
    command: python3 run_benchmark.py
    parameters:
      threads: { default: 1 }
      num_encoder_layers: { default: 24 }
      compile: { default: false }
```

Run sequence:
```bash
# Setup
python3 scripts/setup_env.py --benches transformer_inference --verbose

# Data + workload with overrides
python3 scripts/run_cpu.py --workloads transformer_inference.transformer_inference \
  --set-param transformer_inference.transformer_inference.data.num_batches=10 \
  --set-param transformer_inference.transformer_inference.workload.threads=8 \
  --set-param transformer_inference.transformer_inference.workload.compile=true \
  --verbose
```

Format patterns:
```text
<benchmark>.<workload>.data.<param>=<value>      # data generation
<benchmark>.<workload>.workload.<param>=<value>  # workload runtime
<benchmark>._.setup.<param>=<value>              # setup phase
<benchmark>.<workload>.workload.<flag>           # boolean flag (default true or explicit true)
```
Examples:
```bash
--set-param numpy_benchmark.matmul.workload.size=4096
--set-param ffmpeg_benchmark._.setup.compiler=clang
--set-param ffmpeg_benchmark._.setup.opt=-O2
--set-param transformer_inference.transformer_inference.workload.compile
```

```bash
python3 scripts/setup_env.py --benches <bench1> <bench2> --verbose
python3 scripts/setup_env.py --all --verbose
python3 scripts/setup_env.py --benches ffmpeg_benchmark \
  --set-param ffmpeg_benchmark._.setup.compiler=clang \
  --set-param ffmpeg_benchmark._.setup.opt=-O2 --verbose
```

Output log: `setup_logs_<timestamp>.log` (override with `--out-log`).
```bash
# Specific workloads
python3 scripts/run_cpu.py --workloads numpy_benchmark.matmul ffmpeg_benchmark.ffmpeg --verbose

# Entire benchmarks
python3 scripts/run_cpu.py --benches numpy_benchmark ffmpeg_benchmark --setup-env --verbose
```

CSV columns: `benchmark_name`, `workload_name`, `elapsed_time(s)`, `param_overrides`
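If you post-process results, a minimal sketch using Python's standard `csv` module follows; the file path below is a placeholder for wherever your run wrote its results:

```python
# Minimal post-processing sketch; replace the path with your actual results CSV.
import csv

with open("res/results_example.csv", newline="") as f:
    for row in csv.DictReader(f):
        print(f'{row["benchmark_name"]}.{row["workload_name"]}: '
              f'{float(row["elapsed_time(s)"]):.3f} s  ({row["param_overrides"]})')
```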
Enable counters:
Edit the `PERF_CMD_PREFIX` variable in `scripts/run_cpu.py` and add the microarchitectural events you want to observe; the script contains an example event list for a Skylake architecture.

```bash
python3 scripts/run_cpu.py --workloads numpy_benchmark.matmul --use-perf --verbose
```

Behavior:
- Drops caches: `sync; echo 3 > /proc/sys/vm/drop_caches`
- Wraps the command with predefined `perf stat` events
- Stores outputs under `perf/` near the log file

Requires write permission for `/proc/sys/vm/drop_caches` (may need sudo).
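As a rough illustration of what a curated event set could look like (the actual `PERF_CMD_PREFIX` contents and the Skylake example in `scripts/run_cpu.py` will differ), generic hardware events such as these are widely supported by `perf stat`:

```python
# Hypothetical example of a perf stat prefix with generic hardware events;
# swap in the microarchitecture-specific events you care about.
PERF_CMD_PREFIX = (
    "perf stat -e "
    "cycles,instructions,"
    "cache-references,cache-misses,"
    "branch-instructions,branch-misses "
    "-- "
)
```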
run_v_0_0_1.py is the suite orchestrator for version v0.0.1 — it runs the full benchmark selection by combining curated workload sets with the parameter set v0.0.1 (data augmentation, compiler/env choices, optimization flags, and threading).
```bash
python3 run_v_0_0_1.py
```

Artifacts:
- Logs → `log/logs_<tag>_<timestamp>.log`
- Results → `res/results_<tag>_<timestamp>.csv`
Networking policy:
- The full run flow is `setup → data → workload`. Only the setup phase may access the network (e.g., to download dependencies). The data and workload phases are designed to run offline.
Failure reporting:
- At the end of a run, the log aggregates all failed workloads with their error reasons under a "[WARNING SUMMARY] Failed workloads" section.
- Create `src/languages/<lang>/<benchmark_name>/` + `metadata.yaml`.
- Register it in `configs/benchmarks_index.yaml`.
- Ensure the workload prints the result line (see the note below).
- (Optional) Add it to a preset set in `run_v_0_0_1.py`.
Minimal template:
```yaml
name: my_benchmark
language: C
setup:
  command: ./build.sh
workloads:
  - name: main
    command: ./run.sh
    parameters:
      size: { default: 1024, description: problem size }
characteristics:
  cpu_bound: true
tags: [numeric]
dependencies:
  gcc: ">=11"
```

Note: Your workload must emit exactly one parseable line:
```text
[RESULT] Total elapsed time: <seconds> s
```

The runner extracts `<seconds>` as a float; absence triggers an error.
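For a Python workload, a minimal sketch of emitting that line (the timing logic itself is up to the benchmark):

```python
# Minimal sketch of a workload printing the required result line.
import time

start = time.perf_counter()
# ... measured work goes here ...
elapsed = time.perf_counter() - start

# Exactly one line in this format; the runner parses <seconds> as a float.
print(f"[RESULT] Total elapsed time: {elapsed:.6f} s")
```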
```bash
python3 scripts/clean_artifacts.py --benches ffmpeg_benchmark opencv_benchmark --verbose
python3 scripts/clean_artifacts.py --all --dry-run --verbose
python3 scripts/clean_artifacts.py --all --verbose
```

Targets removed: `data`, `bin`, `.m2`, `build`, `target`, `deps_vendored`, `go.mod`, `go.sum`, `vendor`.
`run_v_0_0_1.py` writes `res/system_info.json` with:
- CPU (`lscpu`), memory (`/proc/meminfo`), OS (`/etc/os-release`), `uname -a`
- Toolchain versions (gcc, clang, python, java, go)
- Optional DMI info (if `dmidecode` is present)
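To inspect the captured data, a quick sketch that simply pretty-prints the JSON (no particular field names are assumed here):

```python
# Pretty-print whatever was captured in res/system_info.json.
import json

with open("res/system_info.json") as f:
    info = json.load(f)

print(sorted(info))                 # top-level sections
print(json.dumps(info, indent=2))   # full dump
```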
- Large workload sizes may stress memory; scale down via overrides for smoke tests.
- Verify `benchmarks_index.yaml` paths before the first run.
- Adjust thread counts to explore scaling; record toolchain versions for fairness.
- Perf requires appropriate kernel permissions; consider running as a privileged user if needed.
- Only the setup phase requires network connectivity; subsequent data/workload phases are offline.
- Check the run log for a summary of failed workloads and reasons.
Pull requests adding new benchmarks, refining metadata, or expanding parameter sets are welcome. Please include:
- `metadata.yaml` with descriptions & defaults
- Example result line in the README or PR description
- Justification of benchmark relevance (CPU characteristics)
MIT License. See LICENSE for full terms.