
Commit 2d7479e

iximeow and gjcolombo authored
add CPU platforms to instances (#8728)
this materializes RFD 314 and in some respects, 505.

builds on #8725 for CPU family information, which is a stand-in for the notion of sled families and generations described in RFD 314.

There are a few important details here where CPU platforms differ from the sled CPU family and I've refreshed RFD 314 (and 505) as appropriate.

## hardware CPU families are less linear than Oxide CPU platforms

We can (and do, in RFD 314) define Milan restrictively enough that we can present Turin (and probably later!) CPUs to guests "as if" they were Milan. Similarly I'd expect that Turin would be defined as roughly "Milan-plus-some-AVX-512-features" and pretty forward-compatible.

Importantly these are related to but not directly representative of real CPUs; as an example I'd expect "Turin"-the-instance-CPU-platform to be able to run on a Turin Dense CPU. Conversely, there's probably not a reason _to_ define a "Turin Dense" CPU platform since from a guest perspective they'd look about the same.

But at the same time the lineage through the AMD server part family splits at Zen 4 kind of, with Zen 4 vs Zen 4c-based parts and similar with Zen 5/c. It's somewhat hard (I think) to predict what workloads would be sensitive to this. And as #8730 gets into a bit, the details of a processor's packaging (core topology, frequency, cache size) can vary substantially even inside one CPU family. The important part here is that we do not expect CPU platforms to cover these details and it would probably be cumbersome to try; if the instance's constraint is "I want AVX256, and I want to be on high-frequency-capable processors only", then it doesn't actually matter if it's run on a Turin or a Milan and to tie it to that CPU platform may be overly restrictive.

On instance CPU platforms, the hope is that by focusing on CPU features we're able to present a more linear path as the microarchitectures grow.

## instance platforms aren't "minimum"

I've walked back the initial description of an instance's CPU platform as the "minimum CPU platform". As present in other systems, "minimum CPU platform" would more analogously mean "can we put you on a Rome Gimlet or must we put you on a Milan Gimlet?", or "Genoa Cosmo vs Turin Cosmo?" - it doesn't seem _possible_ to say "this instance must have AVX 512, but otherwise I don't care what kind of hardware it runs on.", but that's more what _we mean_ by CPU platform.

In a "minimum CPU platform" interpretation, we _could_ provide a bunch of Turin CPUID bits to a VM that said it wanted Milan. But since there's no upper bound here, if an OS has an issue with a future "Zen 14" or whatever, a user would discover that by their "minimum-Milan" instance getting scheduled on the new space-age processor and exploding on boot or something. OSes _shouldn't_ do that, but...

Implementation-wise, this is really just about the names right now. You always get Milan CPUID leaves for the time being. When there are Turin CPUID leaves defined for the instance CPU platform, and Cosmos on which they make sense, this becomes more concrete.

## "are these CPU platforms compatible?"

RFD 314 has a section now, and I've added a stub function, covering some more obvious ways that CPU platforms would be *incompatible*. This is particularly fraught if we consider being incorrect about topology an incompatibility, but even setting that aside several bits in CPUID are descriptive of architectural behaviors and are not easily (or at all) able to be emulated.

`functionally_same()` and the CPUID profiles here may be fated to move out of Omicron and into another crate which can be shared with Propolis, where it can ensure that a requested profile is consistent with the hardware on which Propolis would create a VM (not to mention test uses)

---------

Co-authored-by: Greg Colombo <[email protected]>
1 parent c6ee759 commit 2d7479e


63 files changed: +2833 -62 lines changed

Cargo.lock

Lines changed: 10 additions & 0 deletions

Cargo.toml

Lines changed: 1 addition & 0 deletions
@@ -653,6 +653,7 @@ rand_distr = "0.5.1"
 rand_seeder = "0.4.0"
 range-requests = { path = "range-requests" }
 ratatui = "0.29.0"
+raw-cpuid = { git = "https://github.com/oxidecomputer/rust-cpuid.git", rev = "0a8dbd2311263f6a59ea58089e33c8331436ff3a" }
 rayon = "1.10"
 rcgen = "0.12.1"
 reconfigurator-cli = { path = "dev-tools/reconfigurator-cli" }

common/src/api/external/mod.rs

Lines changed: 49 additions & 0 deletions
@@ -1197,6 +1197,10 @@ pub struct Instance {
 
     #[serde(flatten)]
     pub auto_restart_status: InstanceAutoRestartStatus,
+
+    /// The CPU platform for this instance. If this is `null`, the instance
+    /// requires no particular CPU platform.
+    pub cpu_platform: Option<InstanceCpuPlatform>,
 }
 
 /// Status of control-plane driven automatic failure recovery for this instance.
@@ -1261,6 +1265,51 @@ pub enum InstanceAutoRestartPolicy {
     BestEffort,
 }
 
+/// A required CPU platform for an instance.
+///
+/// When an instance specifies a required CPU platform:
+///
+/// - The system may expose (to the VM) new CPU features that are only present
+///   on that platform (or on newer platforms of the same lineage that also
+///   support those features).
+/// - The instance must run on hosts that have CPUs that support all the
+///   features of the supplied platform.
+///
+/// That is, the instance is restricted to hosts that have the CPUs which
+/// support all features of the required platform, but in exchange the CPU
+/// features exposed by the platform are available for the guest to use. Note
+/// that this may prevent an instance from starting (if the hosts that could run
+/// it are full but there is capacity on other incompatible hosts).
+///
+/// If an instance does not specify a required CPU platform, then when
+/// it starts, the control plane selects a host for the instance and then
+/// supplies the guest with the "minimum" CPU platform supported by that host.
+/// This maximizes the number of hosts that can run the VM if it later needs to
+/// migrate to another host.
+///
+/// In all cases, the CPU features presented by a given CPU platform are a
+/// subset of what the corresponding hardware may actually support; features
+/// which cannot be used from a virtual environment or do not have full
+/// hypervisor support may be masked off. See RFD 314 for specific CPU features
+/// in a CPU platform.
+#[derive(
+    Copy, Clone, Debug, Deserialize, Serialize, JsonSchema, Eq, PartialEq,
+)]
+#[serde(rename_all = "snake_case")]
+pub enum InstanceCpuPlatform {
+    /// An AMD Milan-like CPU platform.
+    AmdMilan,
+
+    /// An AMD Turin-like CPU platform.
+    // Note that there is only Turin, not Turin Dense - feature-wise they are
+    // collapsed together as the guest-visible platform is the same.
+    // If the two must be distinguished for instance placement, we'll want to
+    // track whatever the motivating constraint is more explicitly. CPU
+    // families, and especially the vendor code names, don't necessarily promise
+    // details about specific processor packaging choices.
+    AmdTurin,
+}
+
 // AFFINITY GROUPS
 
 /// Affinity policy used to describe "what to do when a request cannot be satisfied"
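
The doc comment on `InstanceCpuPlatform` describes two cases: an instance that requires a platform, and one that leaves it unset and receives the "minimum" platform of whichever host is chosen. The following is a minimal sketch of that rule, not the code in this commit. `HostCpuFamily`, `supported_platforms`, and `effective_platform` are hypothetical names introduced for illustration, and using the enum's variant order as the "most general first" order is an assumption of the sketch.

```rust
/// Stand-in for the API enum in the diff, with a trimmed derive list.
/// Assumption: variant order is "most general first", so `min()` picks
/// the platform that the widest set of hosts can present.
#[derive(Copy, Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum InstanceCpuPlatform {
    AmdMilan,
    AmdTurin,
}

/// Hypothetical description of the CPU family of a host.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub enum HostCpuFamily {
    AmdMilan,
    AmdTurin,
}

/// Hypothetical helper: the platforms a host of this family can present.
/// A Turin host can also present the Milan platform.
fn supported_platforms(host: HostCpuFamily) -> &'static [InstanceCpuPlatform] {
    match host {
        HostCpuFamily::AmdMilan => &[InstanceCpuPlatform::AmdMilan],
        HostCpuFamily::AmdTurin => {
            &[InstanceCpuPlatform::AmdMilan, InstanceCpuPlatform::AmdTurin]
        }
    }
}

/// Returns the platform the guest would see on `host`, or `None` if the
/// instance requires a platform that `host` cannot provide.
pub fn effective_platform(
    requested: Option<InstanceCpuPlatform>,
    host: HostCpuFamily,
) -> Option<InstanceCpuPlatform> {
    let supported = supported_platforms(host);
    match requested {
        // A required platform is used as-is, but only on capable hosts.
        Some(platform) => supported.contains(&platform).then_some(platform),
        // No requirement: fall back to the host's most general platform.
        None => supported.iter().copied().min(),
    }
}
```

Under these assumptions, an instance with no required platform that lands on a Turin host still sees the Milan platform, which matches the commit message's note that guests always get Milan CPUID leaves for the time being.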

dev-tools/omdb/src/bin/omdb/db.rs

Lines changed: 6 additions & 0 deletions
@@ -4774,6 +4774,7 @@ async fn cmd_db_instance_info(
                 propolis_ip: _,
                 propolis_port: _,
                 instance_id: _,
+                cpu_platform: _,
                 time_created,
                 time_deleted,
                 runtime:
@@ -7376,6 +7377,7 @@ fn prettyprint_vmm(
     const INSTANCE_ID: &'static str = "instance ID";
     const SLED_ID: &'static str = "sled ID";
     const SLED_SERIAL: &'static str = "sled serial";
+    const CPU_PLATFORM: &'static str = "CPU platform";
     const ADDRESS: &'static str = "propolis address";
     const STATE: &'static str = "state";
     const WIDTH: usize = const_max_len(&[
@@ -7386,6 +7388,7 @@ fn prettyprint_vmm(
         INSTANCE_ID,
         SLED_ID,
         SLED_SERIAL,
+        CPU_PLATFORM,
         STATE,
         ADDRESS,
     ]);
@@ -7399,6 +7402,7 @@ fn prettyprint_vmm(
         sled_id,
         propolis_ip,
         propolis_port,
+        cpu_platform,
         runtime: db::model::VmmRuntimeState { state, r#gen, time_state_updated },
     } = vmm;
 
@@ -7425,6 +7429,7 @@ fn prettyprint_vmm(
     if let Some(serial) = sled_serial {
         println!("{indent}{SLED_SERIAL:>width$}: {serial}");
     }
+    println!("{indent}{CPU_PLATFORM:>width$}: {cpu_platform}");
 }
 
 async fn cmd_db_vmm_list(
@@ -7500,6 +7505,7 @@ async fn cmd_db_vmm_list(
                 sled_id,
                 propolis_ip: _,
                 propolis_port: _,
+                cpu_platform: _,
                 runtime:
                     db::model::VmmRuntimeState {
                         state,

end-to-end-tests/src/instance_launch.rs

Lines changed: 1 addition & 0 deletions
@@ -79,6 +79,7 @@ async fn instance_launch() -> Result<()> {
             start: true,
             auto_restart_policy: Default::default(),
             anti_affinity_groups: Vec::new(),
+            cpu_platform: None,
         })
         .send()
         .await?;

nexus/Cargo.toml

Lines changed: 1 addition & 0 deletions
@@ -134,6 +134,7 @@ oxide-tokio-rt.workspace = true
 oximeter.workspace = true
 oximeter-instruments = { workspace = true, features = ["http-instruments"] }
 oximeter-producer.workspace = true
+raw-cpuid = { workspace = true, features = ["std"] }
 rustls = { workspace = true }
 rustls-pemfile = { workspace = true }
 update-common.workspace = true

nexus/db-model/src/instance.rs

Lines changed: 10 additions & 1 deletion
@@ -5,7 +5,7 @@
 use super::InstanceIntendedState as IntendedState;
 use super::{
     ByteCount, Disk, ExternalIp, Generation, InstanceAutoRestartPolicy,
-    InstanceCpuCount, InstanceState, Vmm, VmmState,
+    InstanceCpuCount, InstanceCpuPlatform, InstanceState, Vmm, VmmState,
 };
 use crate::collection::DatastoreAttachTargetConfig;
 use crate::serde_time_delta::optional_time_delta;
@@ -68,6 +68,12 @@ pub struct Instance {
     #[diesel(column_name = boot_disk_id)]
     pub boot_disk_id: Option<Uuid>,
 
+    /// The instance's required CPU platform. If this is `None`, Nexus will not
+    /// constrain placement decisions by CPU platform. Instead, after selecting
+    /// a sled by any other constraints the instance will be incarnated with the
+    /// most general CPU platform supported by the selected sled.
+    pub cpu_platform: Option<InstanceCpuPlatform>,
+
     #[diesel(embed)]
     pub runtime_state: InstanceRuntimeState,
 
@@ -139,6 +145,7 @@ impl Instance {
             // Intentionally ignore `params.boot_disk_id` here: we can't set
             // `boot_disk_id` until the referenced disk is attached.
             boot_disk_id: None,
+            cpu_platform: params.cpu_platform.map(Into::into),
 
             runtime_state,
             intended_state,
@@ -493,4 +500,6 @@ pub struct InstanceUpdate {
     pub ncpus: InstanceCpuCount,
 
     pub memory: ByteCount,
+
+    pub cpu_platform: Option<InstanceCpuPlatform>,
 }
Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
+// This Source Code Form is subject to the terms of the Mozilla Public
+// License, v. 2.0. If a copy of the MPL was not distributed with this
+// file, You can obtain one at https://mozilla.org/MPL/2.0/.
+
+use crate::SledCpuFamily;
+
+use super::impl_enum_type;
+use serde::{Deserialize, Serialize};
+
+impl_enum_type!(
+    InstanceCpuPlatformEnum:
+
+    #[derive(
+        Copy,
+        Clone,
+        Debug,
+        PartialEq,
+        AsExpression,
+        FromSqlRow,
+        Serialize,
+        Deserialize
+    )]
+    pub enum InstanceCpuPlatform;
+
+    AmdMilan => b"amd_milan"
+    AmdTurin => b"amd_turin"
+);
+
+impl InstanceCpuPlatform {
+    /// Returns a slice containing the set of sled CPU families that can
+    /// accommodate an instance with this CPU platform.
+    pub fn compatible_sled_cpu_families(&self) -> &'static [SledCpuFamily] {
+        match self {
+            // Turin-based sleds have a superset of the features made available
+            // in a guest's Milan CPU platform
+            Self::AmdMilan => {
+                &[SledCpuFamily::AmdMilan, SledCpuFamily::AmdTurin]
+            }
+            Self::AmdTurin => &[SledCpuFamily::AmdTurin],
+        }
+    }
+}
+
+impl From<omicron_common::api::external::InstanceCpuPlatform>
+    for InstanceCpuPlatform
+{
+    fn from(value: omicron_common::api::external::InstanceCpuPlatform) -> Self {
+        use omicron_common::api::external::InstanceCpuPlatform as ApiPlatform;
+        match value {
+            ApiPlatform::AmdMilan => Self::AmdMilan,
+            ApiPlatform::AmdTurin => Self::AmdTurin,
+        }
+    }
+}
+
+impl From<InstanceCpuPlatform>
+    for omicron_common::api::external::InstanceCpuPlatform
+{
+    fn from(value: InstanceCpuPlatform) -> Self {
+        match value {
+            InstanceCpuPlatform::AmdMilan => Self::AmdMilan,
+            InstanceCpuPlatform::AmdTurin => Self::AmdTurin,
+        }
+    }
+}
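
To illustrate how `compatible_sled_cpu_families` might feed into placement, here is a self-contained sketch that filters a list of sleds by an instance's optional CPU platform. The `Sled` struct and `candidate_sleds` function are hypothetical and not part of this commit. The two enums are local stand-ins without the Diesel machinery, and the real `SledCpuFamily` may carry variants not shown here.

```rust
/// Local stand-in for the sled CPU family type (assumed to have at least
/// these two variants).
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub enum SledCpuFamily {
    AmdMilan,
    AmdTurin,
}

/// Local stand-in for the database model enum from the diff.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub enum InstanceCpuPlatform {
    AmdMilan,
    AmdTurin,
}

impl InstanceCpuPlatform {
    /// Same mapping as in the diff: Turin sleds can host Milan guests,
    /// but Milan sleds cannot host Turin guests.
    pub fn compatible_sled_cpu_families(&self) -> &'static [SledCpuFamily] {
        match self {
            Self::AmdMilan => {
                &[SledCpuFamily::AmdMilan, SledCpuFamily::AmdTurin]
            }
            Self::AmdTurin => &[SledCpuFamily::AmdTurin],
        }
    }
}

/// Hypothetical, minimal sled record used only for this illustration.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Sled {
    pub serial: String,
    pub cpu_family: SledCpuFamily,
}

/// Hypothetical filter: keep the sleds that can host an instance with the
/// given platform requirement. `None` places no CPU constraint at all.
pub fn candidate_sleds<'a>(
    sleds: &'a [Sled],
    required: Option<InstanceCpuPlatform>,
) -> Vec<&'a Sled> {
    sleds
        .iter()
        .filter(|sled| match required {
            None => true,
            Some(platform) => platform
                .compatible_sled_cpu_families()
                .contains(&sled.cpu_family),
        })
        .collect()
}
```

With one Milan sled and one Turin sled, a Milan or unconstrained instance keeps both candidates, while a Turin instance is left with only the Turin sled. This is the situation the API doc comment warns about, where an instance may fail to start even though incompatible hosts have capacity.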

nexus/db-model/src/lib.rs

Lines changed: 4 additions & 0 deletions
@@ -44,6 +44,7 @@ mod image;
 mod instance;
 mod instance_auto_restart_policy;
 mod instance_cpu_count;
+mod instance_cpu_platform;
 mod instance_intended_state;
 mod instance_state;
 mod internet_gateway;
@@ -125,6 +126,7 @@ mod utilization;
 mod virtual_provisioning_collection;
 mod virtual_provisioning_resource;
 mod vmm;
+mod vmm_cpu_platform;
 mod vni;
 mod volume;
 mod volume_repair;
@@ -183,6 +185,7 @@ pub use image::*;
 pub use instance::*;
 pub use instance_auto_restart_policy::*;
 pub use instance_cpu_count::*;
+pub use instance_cpu_platform::*;
 pub use instance_intended_state::*;
 pub use instance_state::*;
 pub use internet_gateway::*;
@@ -252,6 +255,7 @@ pub use v2p_mapping::*;
 pub use virtual_provisioning_collection::*;
 pub use virtual_provisioning_resource::*;
 pub use vmm::*;
+pub use vmm_cpu_platform::*;
 pub use vmm_state::*;
 pub use vni::*;
 pub use volume::*;

nexus/db-model/src/schema_versions.rs

Lines changed: 2 additions & 1 deletion
@@ -16,7 +16,7 @@ use std::{collections::BTreeMap, sync::LazyLock};
 ///
 /// This must be updated when you change the database schema. Refer to
 /// schema/crdb/README.adoc in the root of this repository for details.
-pub const SCHEMA_VERSION: Version = Version::new(189, 0, 0);
+pub const SCHEMA_VERSION: Version = Version::new(190, 0, 0);
 
 /// List of all past database schema versions, in *reverse* order
 ///
@@ -28,6 +28,7 @@ static KNOWN_VERSIONS: LazyLock<Vec<KnownVersion>> = LazyLock::new(|| {
         // | leaving the first copy as an example for the next person.
         // v
         // KnownVersion::new(next_int, "unique-dirname-with-the-sql-files"),
+        KnownVersion::new(190, "add-instance-cpu-platform"),
         KnownVersion::new(189, "reconfigurator-chicken-switches-to-config"),
         KnownVersion::new(188, "positive-quotas"),
         KnownVersion::new(187, "no-default-pool-for-internal-silo"),
