An x86 JIT playground for writing microbenchmarks and other experiments for understanding microarchitectural implementation details.
The crates in this repository rely heavily on CensoredUsername/dynasm-rs for generating code during runtime, and you will probably want to read the dynasm-rs documentation if you intend to write your own experiments.
Unlike eigenform/lamina, this project relies on the "raw events" exposed via the Linux perf API in userspace.
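For readers unfamiliar with "raw events": with the Linux perf API, a raw event is described by a single config value that packs the hardware event select and unit mask. The sketch below shows the bit layout used by the hardware counter control registers on AMD and Intel parts. The function names are hypothetical and are not taken from this repository, which may represent events differently.

```rust
/// Pack an AMD event select (12 bits) and unit mask into a raw config value.
/// Layout: EventSelect[7:0] in bits 7:0, UnitMask in bits 15:8,
/// EventSelect[11:8] in bits 35:32.
/// NOTE: hypothetical helper for illustration, not part of the `perfect` crate.
pub fn amd_raw_config(event: u16, umask: u8) -> u64 {
    let event = u64::from(event) & 0xfff;
    (event & 0xff) | (u64::from(umask) << 8) | ((event & 0xf00) << 24)
}

/// Pack an Intel event select (8 bits) and unit mask into a raw config value.
/// Layout: EventSelect in bits 7:0, UnitMask in bits 15:8.
/// NOTE: hypothetical helper for illustration, not part of the `perfect` crate.
pub fn intel_raw_config(event: u8, umask: u8) -> u64 {
    u64::from(event) | (u64::from(umask) << 8)
}
```

The resulting value is what would be passed as the config of a raw-type perf event. The event numbers themselves come from the vendor's documentation for the particular microarchitecture.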
config.sh - Wrapper for invoking setup scripts
perfect/ - Main library crate
perfect-zen2/ - Zen2 experiments
perfect-zen3/ - Zen3 experiments
perfect-tremont/ - Tremont experiments
scripts/ - Miscellaneous scripts
All of the experiments here are small programs used to demonstrate, observe, and document different microarchitectural implementation details. This includes things like:
- Measuring the sizes of hardware buffers
- Demonstrating the behavior of certain hardware optimizations
- Demonstrating problems in microarchitectural security
In general, all of the experiments are implemented by:
- Emitting chunks of code with a run-time assembler
- Running emitted code after configuring performance-monitoring counters
- Printing some results to stdout
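The three steps above can be sketched without a JIT or real counters. In this simplified stand-in, a closure plays the role of the emitted code and another closure plays the role of the counter read. Both `measure` and `summarize` are hypothetical names and do not reflect the actual API of the `perfect` crate.

```rust
/// Run `body` `iters` times, reading a counter before and after each run,
/// and return the per-run deltas.
/// NOTE: `read_counter` stands in for a PMC read and `body` for emitted code.
pub fn measure<C, B>(mut read_counter: C, mut body: B, iters: usize) -> Vec<u64>
where
    C: FnMut() -> u64,
    B: FnMut(),
{
    let mut samples = Vec::with_capacity(iters);
    for _ in 0..iters {
        let before = read_counter();
        body();
        let after = read_counter();
        // Counters can wrap around, so use a wrapping subtraction.
        samples.push(after.wrapping_sub(before));
    }
    samples
}

/// Return the (min, max) of the observed samples, or `None` if there are none.
pub fn summarize(samples: &[u64]) -> Option<(u64, u64)> {
    let min = *samples.iter().min()?;
    let max = *samples.iter().max()?;
    Some((min, max))
}
```

Taking many samples and looking at the distribution (rather than a single reading) is the usual way to separate the effect being measured from noise.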
These experiments are meant to serve as a kind of executable documentation for certain things. This will not be very useful to you unless you're planning on reading and understanding the code!
Note that most of the interesting experiments here are probably only relevant for the Zen 2 microarchitecture (and potentially previous/later Zen iterations, depending on the particular experiment). These are not intended to be portable to different platforms since they necessarily take advantage of implementation details specific to the microarchitecture.
- Integer PRF Capacity
- FP/Vector PRF Capacity
- Store Queue Capacity
- Load Queue Capacity
- Reorder Buffer Capacity
- Taken Branch Buffer Capacity
- Dispatch Behavior
- Branch Direction Prediction
- Branch Target Prediction
- Direction Predictor Stimulus/Response
- L1D Way Prediction
- Observing CVE-2023-20593 (Zenbleed)
- Observing CVE-2021-26318/AMD-SB-1017 (PREFETCH Behavior Across Privilege Domains)
- Observing CVE-2022-4543 (EntryBleed)
- Observing Speculative Loads with Timing (Flush+Reload)
- Observing L1D Way Mispredictions (Collide+Probe)
NOTE: Users can also use ./config.sh and the provided scripts to toggle certain features. In the near future, these scripts will be removed, and users will be expected to use the perfect-env binary.
NOTE: My machine sets the kernel.perf_event_paranoid sysctl knob to -1 at boot-time. It's not clear yet whether this is actually necessary to support our use of the perf API, and there is currently no command in perfect-env for changing this during runtime.
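Since perfect-env has no command for this knob, it can be inspected by hand. The sketch below reads the current value from procfs. The helper names are hypothetical, and comparing against -1 only mirrors the setup described in the note, not a confirmed requirement.

```rust
use std::fs;

/// Parse the contents of `/proc/sys/kernel/perf_event_paranoid`.
pub fn parse_paranoid(raw: &str) -> Option<i32> {
    raw.trim().parse().ok()
}

/// Read the current `kernel.perf_event_paranoid` value, if available.
pub fn read_paranoid() -> Option<i32> {
    let raw = fs::read_to_string("/proc/sys/kernel/perf_event_paranoid").ok()?;
    parse_paranoid(&raw)
}

/// True when the value matches the setup described in the note (-1).
/// NOTE: whether -1 is actually required is an open question in the README.
pub fn matches_noted_setup(level: i32) -> bool {
    level == -1
}
```

Changing the value requires root, for example with `sysctl -w kernel.perf_event_paranoid=-1`.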
Users are expected to use the perfect-env binary to configure certain features on the target machine during runtime, before running experiments. Toggling these features requires root permissions on the target machine. See the --help flag for more details.
# Build the `perfect-env` binary
$ cargo build --release --bin perfect-env
...
$ sudo ./target/release/perfect-env --help
...
# Apply the default configuration
$ sudo ./target/release/perfect-env defaults
The "default" configuration applies the following changes:
- Use of the RDPMC instruction is allowed in userspace
- The vm.mmap_min_addr sysctl knob is set to zero
- Simultaneous Multi-threading (SMT) is disabled
- cpufreq frequency boosting is disabled
See documentation in the source for more details about which settings might be required/optional for a particular experiment.
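To check whether a machine is already in the state described by the "default" configuration, the corresponding procfs/sysfs files can be read directly. This is a rough sketch under a few assumptions: the paths are the standard Linux locations (the boost file is not present under every cpufreq driver), the exact values perfect-env writes are not confirmed here, and the type and function names are hypothetical.

```rust
use std::fs;

/// Trimmed contents of the knobs touched by the "default" configuration.
/// `None` means the file was missing or unreadable on this machine.
#[derive(Debug, Default)]
pub struct Knobs {
    pub rdpmc: Option<String>,
    pub mmap_min_addr: Option<String>,
    pub smt_active: Option<String>,
    pub boost: Option<String>,
}

fn read_trimmed(path: &str) -> Option<String> {
    fs::read_to_string(path).ok().map(|s| s.trim().to_owned())
}

/// Read the current state from the usual Linux locations.
pub fn read_knobs() -> Knobs {
    Knobs {
        rdpmc: read_trimmed("/sys/bus/event_source/devices/cpu/rdpmc"),
        mmap_min_addr: read_trimmed("/proc/sys/vm/mmap_min_addr"),
        smt_active: read_trimmed("/sys/devices/system/cpu/smt/active"),
        boost: read_trimmed("/sys/devices/system/cpu/cpufreq/boost"),
    }
}

/// List the knobs whose known value differs from the "default" configuration.
/// Unknown (missing) knobs are not reported.
pub fn deviations(k: &Knobs) -> Vec<&'static str> {
    let mut out = Vec::new();
    // Assumption: any non-zero rdpmc mode counts as "allowed in userspace".
    if k.rdpmc.as_deref() == Some("0") {
        out.push("rdpmc");
    }
    if matches!(k.mmap_min_addr.as_deref(), Some(v) if v != "0") {
        out.push("vm.mmap_min_addr");
    }
    if k.smt_active.as_deref() == Some("1") {
        out.push("smt");
    }
    if k.boost.as_deref() == Some("1") {
        out.push("boost");
    }
    out
}
```

Calling `deviations(&read_knobs())` before an experiment gives a quick list of settings that still need attention. An empty list does not guarantee that everything is configured, since missing files are skipped.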
Most [if not all] experiments also assume that a particular CPU core is isolated from interrupts and other tasks scheduled by the kernel. This requires the following kernel command-line options (where N is the core you expect to be running experiments on):
isolcpus=nohz,domain,managed_irq,N nohz_full=N
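One way to confirm that the options took effect is to look for them in the running kernel's command line (`/proc/cmdline`). The sketch below extracts the `isolcpus=` argument and the core numbers in it. The helper names are hypothetical, and only plain numbers and simple `a-b` ranges are handled.

```rust
/// Find the value of the `isolcpus=` option in a kernel command line.
pub fn isolcpus_arg(cmdline: &str) -> Option<&str> {
    cmdline
        .split_whitespace()
        .find_map(|tok| tok.strip_prefix("isolcpus="))
}

/// Extract core numbers from an `isolcpus=` value, skipping flags such as
/// `nohz`, `domain`, and `managed_irq`.
/// NOTE: only single cores ("15") and simple ranges ("2-4") are handled.
pub fn isolated_cores(arg: &str) -> Vec<usize> {
    let mut cores = Vec::new();
    for item in arg.split(',') {
        if let Some((lo, hi)) = item.split_once('-') {
            if let (Ok(lo), Ok(hi)) = (lo.parse::<usize>(), hi.parse::<usize>()) {
                cores.extend(lo..=hi);
            }
        } else if let Ok(core) = item.parse::<usize>() {
            cores.push(core);
        }
    }
    cores
}
```

In practice the input would come from `std::fs::read_to_string("/proc/cmdline")`. The kernel also reports the isolated set in `/sys/devices/system/cpu/isolated`.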
WARNING: Under normal circumstances (without isolcpus), the Linux watchdog timer relies on counter #0 being configured automatically by the perf subsystem to count CPU cycles.

Our use of the perf-event crate only ever configures the first available counter. This means that, when isolcpus is not used, correct use of RDPMC in measured code must read from counter #1 instead of counter #0. Otherwise, attempted uses of RDPMC will read the CPU cycle counter instead of the desired PMC event.

You're expected to keep this in mind while writing/running experiments. Currently, all experiments assume the use of isolcpus, and RDPMC is always used with counter #0.
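The counter choice described in the warning can be made explicit in code. The following sketch is illustrative only: `pmc_index` and `rdpmc` are hypothetical helpers, not functions from this repository, and the experiments here emit RDPMC through the run-time assembler rather than calling a Rust function. Executing RDPMC faults unless userspace access has been enabled.

```rust
/// Pick the counter index to place in ECX before RDPMC, following the rule in
/// the warning: counter #0 on an isolated core, counter #1 otherwise (where
/// counter #0 is assumed to be taken by the watchdog's cycle counter).
pub fn pmc_index(core_is_isolated: bool) -> u32 {
    if core_is_isolated { 0 } else { 1 }
}

/// Read the performance counter selected by `index`.
///
/// # Safety
/// RDPMC raises a fault if userspace access is disabled or `index` is invalid.
#[cfg(target_arch = "x86_64")]
pub unsafe fn rdpmc(index: u32) -> u64 {
    let (lo, hi): (u32, u32);
    unsafe {
        core::arch::asm!(
            "rdpmc",
            in("ecx") index,
            out("eax") lo,
            out("edx") hi,
            options(nostack, preserves_flags),
        );
    }
    // RDPMC returns the counter value split across EDX:EAX.
    (u64::from(hi) << 32) | u64::from(lo)
}
```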
The "harness" is a trampoline [emitted during runtime] that jumps into other code emitted during runtime. In most experiments, this is used to collect measurements with the RDPMC instruction and manage all of the state associated with running experiments.
A few important details:
- The default configuration tries to allocate the low 256MiB of virtual memory (from 0x0000_0000_0000_0000 to 0x0000_0000_1000_0000). This is used to simplify some things by allowing us to emit loads and stores with simple immediate addressing. If the vm.mmap_min_addr sysctl knob isn't set to zero, emitting the harness will panic.
- The default configuration tries to allocate 64MiB at virtual address 0x0000_1337_0000_0000 for emitting the harness itself.
- The default configuration (Zen 2) pins the current process to core #15. This reflects my own setup (on the 16-core Ryzen 3950X), and you may want to change this to something suitable for your own setup, for example:

    use perfect::*;

    fn main() {
        let harness = HarnessConfig::default_zen2()
            .pinned_core(3)
            .emit();
        ...
    }

See ./perfect/src/harness.rs for more details.
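The reason the low mapping permits "simple immediate addressing" is that x86-64 loads and stores without a base register encode the address as a sign-extended 32-bit displacement, so only addresses that fit in that field can be used directly. The arithmetic can be checked with a short sketch. The constant and function names below are hypothetical and only restate the addresses given above.

```rust
pub const MIB: u64 = 1 << 20;

/// Low region described above: [LOW_BASE, LOW_END).
pub const LOW_BASE: u64 = 0x0000_0000_0000_0000;
pub const LOW_END: u64 = 0x0000_0000_1000_0000;

/// Region used for the harness code itself.
pub const HARNESS_BASE: u64 = 0x0000_1337_0000_0000;
pub const HARNESS_SIZE: u64 = 64 * MIB;

/// True if `addr` can be encoded as a non-negative 32-bit displacement,
/// i.e. used as an absolute address in a load/store without a base register.
/// NOTE: this ignores the sign-extended top 2GiB of the address space.
pub fn fits_disp32(addr: u64) -> bool {
    addr <= i32::MAX as u64
}
```

Every address in the low region passes `fits_disp32`, while the harness region at 0x0000_1337_0000_0000 does not, which is consistent with using the low region for data and the high region only for code.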
Typical usage looks something like this:
# Disable SMT, enable RDPMC, disable frequency boosting, enable low mmap()
$ sudo ./target/release/perfect-env smt off
$ sudo ./target/release/perfect-env rdpmc on
$ sudo ./target/release/perfect-env boost off
$ sudo ./target/release/perfect-env mmap-min-addr 0
# Run an experiment
$ cargo run -r -p perfect-zen2 --bin <experiment>
...