@ricardon
Contributor

This pull request updates the OHCL kernel's patchset for using the ACPI wakeup mailbox with DeviceTree to the latest version on LKML (available here: https://lore.kernel.org/all/[email protected]).

Details on the design and implementation can be found in the cover letter on LKML.

Before applying the patches, I reverted the previous version for clarity.

Systems that describe hardware using DeviceTree graphs may enumerate and
implement the wakeup mailbox as defined in the ACPI specification but do
not otherwise depend on ACPI. Expose functions to set up and access the
location of the wakeup mailbox from outside ACPI code.

The function acpi_setup_mp_wakeup_mailbox() stores the physical address of
the mailbox and updates the wakeup_secondary_cpu_64() APIC callback.

The function acpi_madt_multiproc_wakeup_mailbox() returns a pointer to the
mailbox.

Signed-off-by: Ricardo Neri <[email protected]>
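A user-space sketch of the split described in this commit: one call records the mailbox's physical address and installs the 64-bit wakeup callback, and a second returns a pointer to the mailbox. `setup_mp_wakeup_mailbox()`, `multiproc_wakeup_mailbox()`, `struct fake_apic`, and `fake_map()` are hypothetical stand-ins, not the kernel functions, and the sketch assumes the mailbox is mapped lazily on the first wakeup, which may differ from the actual implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated physical memory so that the sketch runs in user space. */
static uint8_t fake_phys_mem[4 * 4096];

/* Hypothetical model of the APIC driver's 64-bit wakeup callback slot. */
struct fake_apic {
	int (*wakeup_secondary_cpu_64)(uint32_t apicid, unsigned long start_ip);
};
static struct fake_apic apic;

static uint64_t mailbox_paddr;	/* recorded by the setup function */
static void *mailbox;		/* mapped lazily on first wakeup */

/* Stand-in for a real mapping primitive: returns a pointer into the simulated
 * physical memory if the whole 4 KiB mailbox fits, else NULL. */
static void *fake_map(uint64_t paddr)
{
	if (paddr > sizeof(fake_phys_mem) - 4096)
		return NULL;
	return &fake_phys_mem[paddr];
}

/* Wakeup callback: maps the mailbox on first use; the handshake is omitted. */
static int mailbox_wakeup_cpu(uint32_t apicid, unsigned long start_ip)
{
	(void)apicid;
	(void)start_ip;
	if (!mailbox)
		mailbox = fake_map(mailbox_paddr);
	return mailbox ? 0 : -1;
}

/* Hypothetical analogue of acpi_setup_mp_wakeup_mailbox(). */
void setup_mp_wakeup_mailbox(uint64_t paddr)
{
	mailbox_paddr = paddr;
	apic.wakeup_secondary_cpu_64 = mailbox_wakeup_cpu;
}

/* Hypothetical analogue of acpi_madt_multiproc_wakeup_mailbox(). */
void *multiproc_wakeup_mailbox(void)
{
	return mailbox;
}
```

Calling `setup_mp_wakeup_mailbox(4096)` and then invoking the installed callback once makes `multiproc_wakeup_mailbox()` return a non-NULL pointer into the simulated memory.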
@chris-oo
Member

This requires a corresponding openhcl_boot change for acpi=off on the command line correct?

@ricardon
Contributor Author

> This requires a corresponding openhcl_boot change for acpi=off on the command line correct?

Yes, that is correct.

@chris-oo
Member

chris-oo commented Nov 19, 2025

See microsoft/openvmm#2441 for the openhcl_boot change we'll take with the kernel update.

hargar19 requested a review from dcui on November 19, 2025 06:31
ricardon and others added 10 commits November 19, 2025 09:36
sysctl_sched_itmt_enabled is declared in asm/topology.h with the
__read_mostly attribute, but the header does not include linux/cache.h.
This causes a build failure when a file includes asm/topology.h without
including linux/cache.h:

     ./arch/x86/include/asm/topology.h:264:27: error: expected ‘=’, ‘,’,
      ‘;’, ‘asm’ or ‘__attribute__’ before ‘sysctl_sched_itmt_enabled’
     264 | extern bool __read_mostly sysctl_sched_itmt_enabled;
         |                           ^~~~~~~~~~~~~~~~~~~~~~~~~

Include the needed header.

Signed-off-by: Ricardo Neri <[email protected]>
Add DeviceTree bindings to enumerate the wakeup mailbox used in platform
firmware for Intel processors.

x86 platforms commonly boot secondary CPUs using an INIT assert and
de-assert followed by Start-Up IPI messages. The wakeup mailbox can be used
when this mechanism is unavailable.

The wakeup mailbox offers more control to the operating system to boot
secondary CPUs than a spin-table. It allows the reuse of the same wakeup
vector for all CPUs while maintaining control over which CPUs to boot and
when. While it is possible to achieve the same level of control using a
spin-table, it would require specifying a separate `cpu-release-addr` for
each secondary CPU.

The operation and structure of the mailbox are described in the
Multiprocessor Wakeup Structure defined in the ACPI specification. Note
that this structure does not specify how to publish the mailbox to the
operating system (ACPI-based platform firmware uses a separate table). No
ACPI table is needed in DeviceTree-based firmware to enumerate the mailbox.
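The sketch below lays out the mailbox page following the Multiprocessor Wakeup Mailbox Structure (command, reserved, APIC ID, wakeup vector, then OS- and firmware-reserved areas filling one 4 KiB page); the field sizes should be checked against the ACPI specification before relying on them. `os_post_wakeup()` and `fw_poll()` are hypothetical names for a single-threaded simulation of the handshake. It illustrates the point made earlier in this commit message: one wakeup vector serves every CPU, while the APIC ID written by the operating system selects which CPU boots and when.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mailbox page layout as described in the ACPI Multiprocessor Wakeup Structure. */
struct mp_wakeup_mailbox {
	uint16_t command;			/* 0: noop, 1: wakeup */
	uint16_t reserved;
	uint32_t apic_id;			/* CPU that should act on the command */
	uint64_t wakeup_vector;			/* 64-bit entry point for that CPU */
	uint8_t reserved_os[2032];		/* area owned by the OS */
	uint8_t reserved_firmware[2048];	/* area owned by the firmware */
};

static_assert(sizeof(struct mp_wakeup_mailbox) == 4096, "mailbox is one 4 KiB page");
static_assert(offsetof(struct mp_wakeup_mailbox, wakeup_vector) == 8, "vector at offset 8");

enum { MP_WAKE_CMD_NOOP = 0, MP_WAKE_CMD_WAKEUP = 1 };

/* OS side: post a wakeup request for one CPU. A real kernel orders these
 * writes (e.g. with a release store) before publishing the command. */
void os_post_wakeup(volatile struct mp_wakeup_mailbox *mb, uint32_t apic_id,
		    uint64_t entry)
{
	mb->apic_id = apic_id;
	mb->wakeup_vector = entry;
	mb->command = MP_WAKE_CMD_WAKEUP;
}

/* Firmware side (simulated): only the CPU whose APIC ID matches takes the
 * request and acknowledges it by clearing the command. Returns the entry
 * point that CPU would jump to, or 0 if the request is not for it. */
uint64_t fw_poll(volatile struct mp_wakeup_mailbox *mb, uint32_t my_apic_id)
{
	if (mb->command != MP_WAKE_CMD_WAKEUP || mb->apic_id != my_apic_id)
		return 0;
	uint64_t entry = mb->wakeup_vector;
	mb->command = MP_WAKE_CMD_NOOP;	/* acknowledge */
	return entry;
}
```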

Nodes that want to refer to the reserved memory usually define
a `memory-region` property. /cpus/cpu* nodes would want to refer to the
mailbox, but they do not have such a property defined in the DeviceTree
specification. Moreover, it would imply that there is a memory region per
CPU. Instead, add a `compatible` property that the operating system can use
to discover the mailbox.

Reviewed-by: Dexuan Cui <[email protected]>
Reviewed-by: Rob Herring (Arm) <[email protected]>
Acked-by: Rafael J. Wysocki (Intel) <[email protected]>
Co-developed-by: Yunhong Jiang <[email protected]>
Signed-off-by: Yunhong Jiang <[email protected]>
Signed-off-by: Ricardo Neri <[email protected]>
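A minimal sketch of discovering the mailbox by its `compatible` string instead of per-CPU `memory-region` references, over a hypothetical flattened list of `/reserved-memory` children. The string `intel,wakeup-mailbox` is assumed from the upstream binding and should be checked against the schema in the kernel tree; `struct resv_node` and `find_wakeup_mailbox()` are illustrative only and are not the kernel's OF API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, flattened view of /reserved-memory child nodes. */
struct resv_node {
	const char *name;
	const char *compatible;	/* NULL if the node has no compatible property */
	uint64_t base;
	uint64_t size;
};

/* Assumed compatible string; verify against the binding in the kernel tree. */
#define WAKEUP_MAILBOX_COMPAT "intel,wakeup-mailbox"

/* Return the physical base of the first node whose compatible matches and
 * that is large enough for a 4 KiB mailbox, or 0 if there is none. The
 * /cpus/cpu* nodes never need to reference the region. */
uint64_t find_wakeup_mailbox(const struct resv_node *nodes, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (nodes[i].compatible &&
		    strcmp(nodes[i].compatible, WAKEUP_MAILBOX_COMPAT) == 0 &&
		    nodes[i].size >= 4096)
			return nodes[i].base;
	}
	return 0;
}
```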
The Wakeup Mailbox is a mechanism to boot secondary CPUs on systems that
cannot or do not want to use the INIT + Start-Up IPI messages.

The platform firmware is expected to implement the mailbox as described in
the Multiprocessor Wakeup Structure of the ACPI specification. It is also
expected to publish the mailbox to the operating system as described in the
corresponding DeviceTree schema that accompanies the documentation of the
Linux kernel.

Reuse the existing functionality to set the memory location of the mailbox
and update the wakeup_secondary_cpu_64() APIC callback. Make this
functionality available to DeviceTree-based systems by making
CONFIG_X86_MAILBOX_WAKEUP depend on either CONFIG_OF or
CONFIG_ACPI_MADT_WAKEUP.

do_boot_cpu() uses wakeup_secondary_cpu_64() when set. It will be set if a
wakeup mailbox is enumerated via an ACPI table or a DeviceTree node. For
cases in which this behavior is not desired, this APIC callback can be
updated later during boot using platform-specific hooks.

Reviewed-by: Dexuan Cui <[email protected]>
Co-developed-by: Yunhong Jiang <[email protected]>
Signed-off-by: Yunhong Jiang <[email protected]>
Signed-off-by: Ricardo Neri <[email protected]>
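The callback preference described in this commit can be modeled as follows. `boot_cpu()`, `wakeup_via_init_sipi()`, `mailbox_wakeup()`, and the global hook are simplified, hypothetical stand-ins for do_boot_cpu() and the APIC driver's wakeup_secondary_cpu_64() slot; they do not reproduce the real signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the APIC driver's 64-bit wakeup hook. */
typedef int (*wakeup64_fn)(uint32_t apicid, unsigned long start_ip);
static wakeup64_fn wakeup_secondary_cpu_64;	/* NULL: no mailbox found */

enum boot_path { BOOT_VIA_INIT_SIPI, BOOT_VIA_CALLBACK };
static enum boot_path last_path;		/* records which path ran */

/* Placeholder for the INIT assert/de-assert + Start-Up IPI sequence. */
static int wakeup_via_init_sipi(uint32_t apicid, unsigned long start_ip)
{
	(void)apicid;
	(void)start_ip;
	last_path = BOOT_VIA_INIT_SIPI;
	return 0;
}

/* Placeholder mailbox callback, installed when a mailbox is enumerated. */
static int mailbox_wakeup(uint32_t apicid, unsigned long start_ip)
{
	(void)apicid;
	(void)start_ip;
	last_path = BOOT_VIA_CALLBACK;
	return 0;
}

/* Simplified stand-in for do_boot_cpu(): prefer the 64-bit hook when set. */
int boot_cpu(uint32_t apicid, unsigned long start_ip)
{
	if (wakeup_secondary_cpu_64)
		return wakeup_secondary_cpu_64(apicid, start_ip);
	return wakeup_via_init_sipi(apicid, start_ip);
}
```

Replacing or clearing the hook later, for example from a platform-specific hook, changes the boot path without touching `boot_cpu()`.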
Hyper-V VTL clears x86_platform.realmode_{init(), reserve()} in
hv_vtl_init_platform() whereas it sets real_mode_header later in
hv_vtl_early_init(). There is no need to deal with the settings of real
mode memory in two places. Also, both functions are called much earlier
than x86_platform.realmode_init() (via an early_initcall), where the
real_mode_header is needed.

Set real_mode_header in hv_vtl_init_platform() to keep all code dealing
with memory for the real mode trampoline in one place. Besides making the
code more readable, this prepares for a subsequent change that adapts the
behavior to support Hyper-V VTL guests in a TDX environment.

Reviewed-by: Dexuan Cui <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Suggested-by: Thomas Gleixner <[email protected]>
Signed-off-by: Yunhong Jiang <[email protected]>
Signed-off-by: Ricardo Neri <[email protected]>
x86 CPUs boot in real mode. This mode uses a 1MB address space. The
trampoline must reside below this 1MB memory boundary.
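One concrete constraint behind the 1MB limit: the Start-Up IPI carries an 8-bit vector, and the target CPU begins executing in real mode at physical address `vector << 12`. A trampoline reached this way must therefore be 4 KiB aligned and start below 0x100000. The helper names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Real-mode start address selected by an 8-bit Start-Up IPI vector. */
uint32_t sipi_start_address(uint8_t vector)
{
	return (uint32_t)vector << 12;
}

/* True if a trampoline at this physical address could be the SIPI target:
 * it must be 4 KiB aligned and its page number must fit in 8 bits. */
bool sipi_reachable(uint32_t trampoline_pa)
{
	return (trampoline_pa & 0xfffu) == 0 && (trampoline_pa >> 12) <= 0xffu;
}
```

The highest reachable start address is `sipi_start_address(0xff)`, i.e. 0xff000, just under 1MB.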

There are platforms in which the firmware boots the secondary CPUs,
switches them to long mode and transfers control to the kernel. An example
of such a mechanism is the ACPI Multiprocessor Wakeup Structure.

In this scenario, the trampoline does not need to be located below 1MB.
Moreover, certain platforms (for example, Hyper-V VTL guests) may not have
memory available for allocation below 1MB.

Add a new member to struct x86_init_resources to specify the upper bound
for the location of the trampoline memory. Keep the default upper bound
of 1MB to preserve the current behavior.

Reviewed-by: Dexuan Cui <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Originally-by: Thomas Gleixner <[email protected]>
Signed-off-by: Yunhong Jiang <[email protected]>
Signed-off-by: Ricardo Neri <[email protected]>
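A sketch of an allocator that honors a configurable upper bound with a 1MB default. The member name `realmode_limit`, `struct init_resources`, and `alloc_trampoline()` are hypothetical; the real member of struct x86_init_resources may be named differently, and the kernel's actual allocation path is more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SZ_1M 0x100000ULL
#define SZ_4G 0x100000000ULL

/* Hypothetical mirror of the new struct x86_init_resources member. */
struct init_resources {
	uint64_t realmode_limit;	/* trampoline must end at or below this */
};
static struct init_resources resources = { .realmode_limit = SZ_1M };

struct mem_region {
	uint64_t start;
	uint64_t size;
};

/* Return the highest 4 KiB-aligned address where `size` bytes fit inside a
 * free region and below realmode_limit, or 0 if nothing fits (this sketch
 * treats address 0 as "no allocation"). */
uint64_t alloc_trampoline(const struct mem_region *regions, size_t n,
			  uint64_t size)
{
	uint64_t best = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t end = regions[i].start + regions[i].size;

		if (end > resources.realmode_limit)
			end = resources.realmode_limit;
		if (end < regions[i].start + size)
			continue;	/* region lies above the limit or is too small */

		uint64_t cand = (end - size) & ~0xfffULL;
		if (cand >= regions[i].start && cand > best)
			best = cand;
	}
	return best;
}
```

With only a free region at 2MB-3MB, the default limit yields no allocation; raising the limit to 4 GiB, as the TDX change in this series does for firmware that hands off CPUs in long mode, lets the same region satisfy the request.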
The hypervisor is an untrusted entity for TDX guests. It cannot be used
to boot secondary CPUs, either via hypercalls or via the INIT assert,
de-assert, and Start-Up IPI messages.

Instead, the platform virtual firmware boots the secondary CPUs and
puts them in a state to transfer control to the kernel. This mechanism uses
the wakeup mailbox described in the Multiprocessor Wakeup Structure of the
ACPI specification. The entry point to the kernel is trampoline_start64.

Allocate and set up the trampoline using the default x86_platform callbacks.

The platform firmware configures the secondary CPUs in long mode. It is no
longer necessary to locate the trampoline under 1MB memory. After handoff
from firmware, the trampoline code switches briefly to 32-bit addressing
mode, which has an addressing limit of 4GB. Set the upper bound of the
trampoline memory accordingly.

Reviewed-by: Dexuan Cui <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Signed-off-by: Yunhong Jiang <[email protected]>
Signed-off-by: Ricardo Neri <[email protected]>
A Hyper-V VTL level 2 guest in a TDX environment needs to map the physical
page of the ACPI Multiprocessor Wakeup Structure as private (encrypted). It
needs to know the physical address of this structure. Add a helper function
to retrieve the address.

Suggested-by: Michael Kelley <[email protected]>
Signed-off-by: Ricardo Neri <[email protected]>
The current code maps MMIO devices as shared (decrypted) by default in a
confidential computing VM.

In a TDX environment, secondary CPUs are booted using the Multiprocessor
Wakeup Structure defined in the ACPI specification. The virtual firmware
and the operating system function in the guest context, without
intervention from the VMM. Map the physical memory of the mailbox as
private. Use the is_private_mmio() callback.

Reviewed-by: Dexuan Cui <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Signed-off-by: Yunhong Jiang <[email protected]>
Signed-off-by: Ricardo Neri <[email protected]>
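A minimal sketch of an is_private_mmio()-style predicate for this commit: MMIO stays shared by default, and only the page holding the wakeup mailbox is reported as private. `wakeup_mailbox_paddr` and `mailbox_is_private_mmio()` are hypothetical, and the sketch assumes the mailbox occupies a single page-aligned 4 KiB page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096ULL

/* Hypothetical: physical address of the mailbox, e.g. as returned by the
 * helper that retrieves it. 0 means "no mailbox enumerated". */
static uint64_t wakeup_mailbox_paddr;

/* MMIO is mapped shared (decrypted) by default; report only the mailbox
 * page as private so that it is mapped encrypted. */
bool mailbox_is_private_mmio(uint64_t addr)
{
	if (!wakeup_mailbox_paddr)
		return false;
	return addr >= wakeup_mailbox_paddr &&
	       addr < wakeup_mailbox_paddr + PAGE_SIZE;
}
```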
The hypervisor is an untrusted entity for TDX guests. It cannot be used
to boot secondary CPUs. The function hv_vtl_wakeup_secondary_cpu() cannot
be used.

Instead, the virtual firmware boots the secondary CPUs and places them in
a state to transfer control to the kernel using the wakeup mailbox. The
firmware enumerates the mailbox via either an ACPI table or a DeviceTree
node.

If the wakeup mailbox is present, the kernel updates the APIC callback
wakeup_secondary_cpu_64() to use it.

Reviewed-by: Dexuan Cui <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
Signed-off-by: Ricardo Neri <[email protected]>
The wakeup mailbox that the virtual firmware implements to boot secondary
CPUs is defined in the ACPI specification (see version 6.6 section
5.2.12.19). The code in the kernel that makes use of the mailbox resides
in the x86 ACPI subsystem. CONFIG_ACPI must be set to 'y' to select it.

The option CONFIG_ACPI selects or enables many other configuration options,
which in turn select more options that are not used with DeviceTree-based
firmware. Deselect all the options that have a menuconfig prompt.

The newly selected code remains dormant if acpi=off is specified on the
kernel command line. The code that interacts with the mailbox remains
usable for DeviceTree platform firmware.

These are the options that are selected after running `make olddefconfig`
with this changeset:

 * CONFIG_ACPI_MADT_WAKEUP=y
   Enables the wakeup mailbox.

 * CONFIG_ACPI_LPIT=y
   Support for ACPI low-power idle table (not used with DeviceTree FW).

 * CONFIG_ACPI_LEGACY_TABLES_LOOKUP=y
   Tweaks ACPI root table lookup (not used with DeviceTree FW).

 * CONFIG_ARCH_MIGHT_HAVE_ACPI_PDC=y
 * CONFIG_ARCH_HAS_ACPI_TABLE_UPGRADE=y
   Architecture capabilities (unused).

 * CONFIG_ACPI_SYSTEM_POWER_STATES_SUPPORT=y
   Support for ACPI-based suspend/resume (not used with DeviceTree FW).

 * CONFIG_ACPI_HOTPLUG_IOAPIC=y
   Support for IO-APIC hotplug (not used with DeviceTree FW).

 * CONFIG_HAVE_ACPI_APEI=y
 * CONFIG_HAVE_ACPI_APEI_NMI=y
   Architecture capabilities (not used as CONFIG_ACPI_APEI=n).

 * CONFIG_PCI_LABEL=y
   ACPI-provided PCI naming facilities (not used with DeviceTree FW).

 * CONFIG_PNP=y
 * CONFIG_PNPACPI=y
   Support only. Does not add drivers.

 * CONFIG_FIRMWARE_TABLE=y
   Library for parsing ACPI tables.

Signed-off-by: Ricardo Neri <[email protected]>
ricardon force-pushed the rneri/updated-dt-wakeup-mailbox branch from 3be0357 to 23e4de9 on November 19, 2025 17:38
@ricardon
Contributor Author

I updated my branch to include fixes for the build breakages reported by the kernel test robot [email protected].
