Commit Graph

1517 Commits

Author SHA1 Message Date
Michael Brown
25fa01822b [riscv] Serialise MMIO accesses with respect to each other
iPXE drivers have been written with the implicit assumption that MMIO
writes are allowed to be posted but that an MMIO register read or
write after another MMIO register write will always observe the
effects of the first write.

For example: after having written a byte to the transmit holding
register (THR) of a 16550 UART, it is expected that any subsequent
read of the line status register (LSR) will observe a value consistent
with the occurrence of the write.

RISC-V does not seem to provide any ordering guarantees between
accesses to different registers within the same MMIO device.  Add
fences as part of the MMIO accessors to provide the assumed
guarantees.

Use "fence io, io" before each MMIO read or write to enforce full
serialisation of MMIO accesses with respect to each other.  This is
almost certainly more conservative than is strictly necessary.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-06-22 09:45:09 +01:00
Michael Brown
cca1cfd49e [uart] Allow for dynamically registered 16550 UARTs
Use the generic UART driver-private data pointer, rather than
embedding the generic UART within the 16550 UART structure.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-06-21 23:34:32 +01:00
Michael Brown
6c8fb4b89d [uart] Allow for the existence of non-16550 UARTs
Remove the assumption that all platforms use a fixed number of 16550
UARTs identifiable by a simple numeric index.  Create an abstraction
allowing for dynamic instantiation and registration of any number of
arbitrary UART models.

The common case of the serial console on x86 uses a single fixed UART
specified at compile time.  Avoid unnecessarily dragging in the
dynamic instantiation code in this use case by allowing COMCONSOLE to
refer to a single static UART object representing the relevant port.

When selecting a UART by command-line argument (as used in the
"gdbstub serial <port>" command), allow the UART to be specified as
either a numeric index (to retain backwards compatiblity) or a
case-insensitive port name such as "COM2".

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-06-20 12:52:04 +01:00
Michael Brown
5783a10f72 [riscv] Write SBI console output to early UART, if enabled
The early UART is an optional feature used to obtain debug output from
the prefix before iPXE is able to parse the device tree.

Extend this feature to also cover any console output that iPXE
attempts to send to the SBI console, on the basis that the purpose of
the early UART is to provide an output-only device for situations in
which there is no functional SBI console.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-06-12 12:57:26 +01:00
Michael Brown
41e65df19d [riscv] Maximise barrier effects of memory fences
The RISC-V "fence" instruction encoding includes bits for predecessor
and successor input and output operations, separate from read and
write operations.  It is up to the CPU implementation to decide what
counts as I/O space rather than memory space for the purposes of this
instruction.

Since we do not expect fencing to be performance-critical, keep
everything as simple and reliable as possible by using the unadorned
"fence" instruction (equivalent to "fence iorw, iorw").

Add a memory clobber to ensure that the compiler does not reorder the
barrier.  (The volatile qualifier seems to already prevent reordering
in practice, but this is not guaranteed according to the compiler
documentation.)

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-06-12 12:33:46 +01:00
Michael Brown
5b3ebf8b24 [riscv] Support T-Head CPUs using non-standard Memory Attribute Extension
Xuantie/T-Head processors such as the C910 (as used in the Sipeed
Lichee Pi 4A) use the high bits of the PTE in a very non-standard way
that is incompatible with the RISC-V specification.

As per the "Memory Attribute Extension (XTheadMae)", bits 62 and 61
represent cacheability and "bufferability" (write-back cacheability)
respectively.  If we do not enable these bits, then the processor gets
incredibly confused at the point that paging is enabled.  The symptom
is that cache lines will occasionally fail to fill, and so reads from
any address may return unrelated data from a previously read cache
line for a different address.

Work around these hardware flaws by detecting T-Head CPUs (via the
"get machine vendor ID" SBI call), then reading the vendor-specific
SXSTATUS register to determine whether or not the vendor-specific
Memory Attribute Extension has been enabled by the M-mode firmware.
If it has, then set bits 61 and 62 in each page table entry that is
used to access normal memory.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-06-02 14:19:15 +01:00
Michael Brown
817145fe01 [riscv] Do not set executable bit in early UART page mapping
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-06-02 08:59:54 +01:00
Michael Brown
7df005c4c6 [riscv] Add fences around early UART writes
Add a fence between the write to the UART transmit register and the
subsequent read from the transmit status register, to ensure that the
status correctly reflects the occurrence of the write.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-06-02 08:36:22 +01:00
Michael Brown
88cffd75a9 [riscv] Zero SATP after any failed attempt to enable paging
The RISC-V specification states that "if SATP is written with an
unsupported mode, the entire write has no effect; no fields in SATP
are modified".  We currently rely on this specified behaviour when
calculating the early UART base address: if SATP has a non-zero value
then we assume that paging must be enabled.

The XuanTie C910 CPU (as used in the Lichee Pi 4A) does not conform to
this specified behaviour.  Writing SATP with an unsupported mode will
leave SATP.MODE as zero (i.e. bare physical addressing) but the write
to SATP.PPN will still take effect, leaving SATP with an illegal
non-zero value.

Work around this misbehaviour by explicitly writing zero to SATP if we
detect that the mode change has not taken effect (e.g. because the CPU
does not support the requested paging mode).

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-06-02 08:09:15 +01:00
Michael Brown
3fe321c42a [riscv] Add support for a SiFive-compatible early UART
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-05-27 17:24:16 +01:00
Michael Brown
2e27d772ca [riscv] Support mapping early UARTs outside of the identity map
Some platforms (such as the Sipeed Lichee Pi 4A) choose to make early
debugging entertainingly cumbersome for the programmer.  These
platforms not only fail to provide a functional SBI debug console, but
also choose to place the UART at a physical address that cannot be
identity-mapped under the only paging model supported by the CPU.

Support such platforms by creating a virtual address mapping for the
early UART (in the 2MB megapage immediately below iPXE itself), and
using this as the UART base address whenever paging is enabled.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-05-27 16:31:51 +01:00
Michael Brown
98fdfdd255 [riscv] Add support for writing prefix debug messages direct to a UART
Some platforms (such as the Sipeed Lichee Pi 4A) do not provide a
functional SBI debug console.  We can obtain early debug messages on
these systems by writing directly to the UART used by the vendor
firmware.

There is no viable way to parse the UART address from the device tree,
since the prefix debug messages occur extremely early, before the C
runtime environment is available and therefore before any information
has been parsed from the device tree.  The early UART model and
register addresses must be configured by editing config/serial.h if
needed.  (This is an acceptable limitation, since prefix debugging is
an extremely specialised use case.)

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-05-27 14:49:18 +01:00
Michael Brown
2e8d45aeef [riscv] Create macros for writing characters to the debug console
Abstract out the SBI debug console calls into macros that can be
shared between print_message and print_hex_value.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-05-26 23:36:02 +01:00
Michael Brown
6eb51f1a6a [riscv] Ignore riscv,isa property in favour of direct CSR testing
The riscv,isa devicetree property appears not to be fully populated on
some real-world systems.  For example, the Sipeed Lichee Pi 4A
(running the vendor U-Boot) reports itself as "rv64imafdcvsu", which
does not include the "zicntr" extension even though the time CSR is
present and functional.

Ignore the riscv,isa property and rely solely on CSR testing to
determine whether or not extensions are present.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-05-26 21:34:11 +01:00
Michael Brown
192cfc3cc5 [image] Use image name rather than pointer value in all debug messages
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-05-26 18:22:07 +01:00
Michael Brown
eae9a27542 [riscv] Support mapping I/O devices outside of the identity map
With the 64-bit paging schemes (Sv39, Sv48, and Sv57), we identity-map
as much of the physical address space as is possible.  Experimentation
shows that this is not sufficient to provide access to all I/O
devices.  For example: the Sipeed Lichee Pi 4A includes a CPU that
supports only Sv39, but places I/O devices at the top of a 40-bit
address space.

Add support for creating I/O page table entries on demand to map I/O
devices, based on the existing design used for x86_64 BIOS.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-05-26 17:56:27 +01:00
Michael Brown
56f5845b36 [riscv] Include carriage returns in libprefix.S debug messages
Support debug consoles that do not automatically convert LF to CRLF by
including the CR character within the debug message strings.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-05-26 00:10:30 +01:00
Michael Brown
09140ab2c1 [memmap] Allow explicit colour selection for memory map debug messages
Provide DBGC_MEMMAP() as a replacement for memmap_dump(), allowing the
colour used to match other messages within the same message group.

Retain a dedicated colour for output from memmap_dump_all(), on the
basis that it is generally most useful to visually compare full memory
dumps against previous full memory dumps.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-05-25 12:06:53 +01:00
Michael Brown
8d88870da5 [riscv] Support older SBI implementations
Fall back to attempting the legacy SBI console and shutdown calls if
the standard calls fail.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-05-25 10:43:39 +01:00
Michael Brown
036e43334a [memmap] Rename addr/last fields to min/max for clarity
Use the terminology "min" and "max" for addresses covered by a memory
region descriptor, since this is sufficiently intuitive to generally
not require further explanation.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-05-23 16:55:42 +01:00
Michael Brown
4a39b877dd [initrd] Split out initrd construction from bzimage.c
Provide a reusable function initrd_load_all() to load all initrds
(including any constructed CPIO headers) into a contiguous memory
region, and support functions to find the constructed total length and
permissible post-reshuffling load address range.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-05-23 12:31:46 +01:00
Michael Brown
029c7c4178 [initrd] Rename bzimage_align() to initrd_align()
Alignment of initrd lengths is applicable to all Linux kernels, not
just those in the x86 bzImage format.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-05-22 16:28:15 +01:00
Michael Brown
11e01f0652 [uheap] Expose external heap region directly
We currently rely on implicit detection of the external heap region.
The INT 15 memory map mangler relies on examining the corresponding
in-use memory region, and the initrd reshuffler relies on performing a
separate detection of the largest free memory block after startup has
completed.

Replace these with explicit public symbols to describe the external
heap region.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-05-22 16:28:15 +01:00
Michael Brown
a534563345 [riscv] Speed up memmove() when copying in forwards direction
Use the word-at-a-time variable-length memcpy() implementation when
performing an overlapping copy in the forwards direction, since this
is guaranteed to be safe and likely to be substantially faster than
the existing bytewise copy.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-05-21 16:12:56 +01:00
Michael Brown
c1cd54ad74 [initrd] Move initrd reshuffling to be architecture-independent code
There is nothing x86-specific in initrd.c, and a variant of the
reshuffling logic will be required for executing bare-metal kernels on
RISC-V and AArch64.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-05-21 12:12:16 +01:00
Michael Brown
ecac4a34c7 [lkrn] Add basic support for the RISC-V Linux kernel image format
The RISC-V and AArch64 bare-metal kernel images share a common header
format, and require essentially the same execution environment: loaded
close to the start of RAM, entered with paging disabled, and passed a
pointer to a flattened device tree that describes the hardware and any
boot arguments.

Implement basic support for executing bare-metal RISC-V and AArch64
kernel images.  The (trivial) AArch64-specific code path is untested
since we do not yet have the ability to build for any bare-metal
AArch64 platforms.  Constructing and passing an initramfs image is not
yet supported.

Rename the IMAGE_BZIMAGE build configuration option to IMAGE_LKRN,
since "bzImage" is specific to x86.  To retain backwards compatibility
with existing local build configurations, we leave IMAGE_BZIMAGE as
the enabled option in config/default/pcbios.h and treat IMAGE_LKRN as
a synonym for IMAGE_BZIMAGE when building for x86 BIOS.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-05-20 13:08:38 +01:00
Michael Brown
d0c35b6823 [bios] Use generic external heap based on the system memory map
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-05-19 20:47:21 +01:00
Michael Brown
140ceeeb08 [riscv] Use generic external heap based on the system memory map
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-05-19 19:36:25 +01:00
Michael Brown
624d76e26d [bios] Use memmap_describe() to find an external heap location
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-05-16 18:04:27 +01:00
Michael Brown
c8d64ecd87 [bios] Use memmap_describe() to find a relocation address
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-05-16 17:39:29 +01:00
Michael Brown
dbc86458e5 [comboot] Use memmap_describe() to obtain available memory
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-05-16 17:02:55 +01:00
Michael Brown
d0adf3b4cc [multiboot] Use memmap_describe() to construct Multiboot memory map
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-05-16 17:02:55 +01:00
Michael Brown
a353e70800 [memmap] Use memmap_dump_all() to dump debug memory maps
There are several places where get_memmap() is called solely to
produce debug output.  Replace these with calls to memmap_dump_all()
(which will be a no-op unless debugging is enabled).

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-05-16 16:18:36 +01:00
Michael Brown
3812860e39 [bios] Describe umalloc() heap as an in-use memory area
Use the concept of an in-use memory region defined as part of the
system memory map API to describe the umalloc() heap.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-05-16 16:18:36 +01:00
Michael Brown
4c4c94ca09 [bios] Update to use the generic system memory map API
Provide an implementation of the system memory map API based on the
assorted BIOS INT 15 calls, and a temporary implementation of the
legacy get_memmap() function using the new API.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-05-16 16:18:36 +01:00
Michael Brown
3f6ee95737 [fdtmem] Update to use the generic system memory map API
Provide an implementation of the system memory map API based on the
system device tree, excluding any memory outside the size of the
accessible physical address space and defining an in-use region to
cover the relocated copy of iPXE and the system device tree.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-05-16 16:18:36 +01:00
Michael Brown
e0c4cfa81e [fdtmem] Record size of accessible physical address space
The size of accessible physical address space will be required for the
runtime memory map, not just at relocation time.  Make this size an
additional parameter to fdt_register() (matching the prototype for
fdt_relocate()), and record the value for future reference.

Note that we cannot simply store the limit in fdt_relocate() since it
is called before .data is writable and before .bss is zeroed.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-05-14 22:09:51 +01:00
Michael Brown
64ad1d03c3 [bios] Rename memmap.c to int15.c
Create namespace for an architecture-independent memmap.c by renaming
the BIOS-specific memmap.c to int15.c.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-05-14 22:02:46 +01:00
Michael Brown
d1c1e578af [riscv] Add a .pf32 build target for padded parallel flash images
QEMU's -pflash option requires an image that has been padded to the
exact expected size (32MB for all of the supported RISC-V virtual
machines).

Add a .pf32 build target which is simply the equivalent .sbi target
padded to 32MB in size, to simplify testing.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-05-13 18:25:24 +01:00
Michael Brown
6fd927f929 [riscv] Perform a writability test before applying relocations
If paging is not supported, then we will attempt to apply dynamic
relocations to fix up the runtime addresses.  If the image is
currently executing directly from flash memory, this can result in
effectively sending an undefined sequence of commands to the flash
device, which can cause unwanted side effects.

Perform an explicit writability test before applying relocations,
using a write value chosen to be safe for at least any devices
conforming to the JEDEC Common Flash Interface (CFI01).

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-05-13 17:42:53 +01:00
Michael Brown
4566f59757 [riscv] Avoid potentially overwriting the scratch area during relocation
We do not currently describe the temporary page table or the temporary
stack as areas to be avoided during relocation of the iPXE image to a
new physical address.

Perform the copy of the iPXE image and zeroing of the .bss within
libprefix.S, after we have no futher use for the temporary page table
or the temporary initial stack.  Perform the copy and registration of
the system device tree in C code after relocation is complete and the
new stack (within .bss) has been set up.

This provides a clean separation of responsibilities between the
RISC-V libprefix.S and the architecture-independent fdtmem.c.  The
prefix is responsible only for relocating iPXE to the new physical
address returned from fdtmem_relocate(), and doesn't need to know or
care where fdtmem.c is planning to place the copy of the device tree.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-05-13 14:00:34 +01:00
Michael Brown
8e38af800b [riscv] Add a .lkrn build target resembling a Linux kernel binary
On x86 BIOS, it has been useful to be able to build iPXE to resemble a
Linux kernel, so that it can be loaded by programs such as syslinux
which already know how to handle Linux kernel binaries.

Add an equivalent .lkrn build target for RISC-V SBI, allowing for
build targets such as:

  make bin-riscv64/ipxe.lkrn

  make bin-riscv64/cgem.lkrn

The Linux kernel header format allows us to specify a required length
(including uninitialised-data portions) and defines that the image
will be loaded at a fixed offset from the start of RAM.  We can
therefore use known-safe areas of memory (within our own .bss) for the
initial temporary page table and stack.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-05-13 13:03:08 +01:00
Michael Brown
17fd67ce03 [riscv] Relocate to a safe physical address on startup
On startup, we may be running from read-only memory.  We need to parse
the devicetree to obtain the system memory map, and identify a safe
location to which we can copy our own binary image along with a
stashed copy of the devicetree, and then transfer execution to this
new location.

Parsing the system memory map realistically requires running C code.
This in turn requires a small temporary stack, and some way to ensure
that symbol references are valid.

We first attempt to enable paging, to make the runtime virtual
addresses equal to the link-time virtual addresses.  If this fails,
then we attempt to apply the compressed relocation records.

Assuming that one of these has worked (i.e. that either the CPU
supports paging or that our image started execution in writable
memory), then we call fdtmem_relocate() to parse the system memory map
to find a suitable relocation target address.

After the copy we disable paging, jump to the relocated copy,
re-enable paging, and reapply relocation records (if needed).  At this
point, we have a full runtime environment, and can transfer control to
normal C code.

Provide this functionality as part of libprefix.S, since it is likely
to be shared by multiple prefixes.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-05-12 13:59:42 +01:00
Michael Brown
3dfc88158c [riscv] Construct page tables based on link-time virtual addresses
Always construct the page tables based on the link-time address values
even if relocations have already been applied, on the assumption that
relocations will be reapplied after paging has been enabled.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-05-12 13:59:42 +01:00
Michael Brown
c45dc4a55d [riscv] Allow apply_relocs() to use non-inline relocation records
The address of the compressed relocation records is currently
calculated implicitly relative to the program counter.  This requires
the relocation records to be copied as part of relocation to a new
physical address, so that they can be reapplied (if needed) after
copying iPXE to the new physical address.

Since the relocation destination will never overlap the original iPXE
image, and since the relocation records will not be needed further
after completing relocation, we can avoid the need to copy the records
by passing in a pointer to the relocation records present in the
original iPXE image.

Pass the compressed relocation record address as an explicit parameter
to apply_relocs(), rather than being implicit in the program counter.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-05-12 12:23:23 +01:00
Michael Brown
420e475b11 [riscv] Return accessible physical address space size from enable_paging()
Relocation requires knowledge of the size of the accessible physical
address space, which for 64-bit CPUs will vary according to the paging
level supported by the processor.

Update enable_paging_64() and enable_paging_32() to calculate and
return the size of the accessible physical address space.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-05-12 11:47:25 +01:00
Michael Brown
6fe9ce66ae [fdtmem] Add ability to parse FDT memory map for a relocation address
Add code to parse the devicetree memory nodes, memory reservations
block, and reserved memory nodes to construct an ordered and
non-overlapping description of the system memory map, and use this to
identify a suitable address to which iPXE may be relocated at runtime.

We choose to place iPXE on a superpage boundary (as required by the
paging code), and to use the highest available address within
accessible memory.  This mirrors the approach taken for x86 BIOS
builds, where we have long assumed that any image format that we might
need to support may require specific fixed addresses towards the
bottom of the memory map, but is very unlikely to require specific
fixed addresses towards the top of the memory map (since those
addresses may not exist, depending on the amount of installed RAM).

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-05-11 18:23:08 +01:00
Michael Brown
2e45106c0a [riscv] Ensure that prefix_virt is aligned on an xlen boundary
Ensure that the prefix_virt dynamic relocation ends up on a suitably
aligned boundary for a compressed relocation.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-05-11 14:17:39 +01:00
Michael Brown
95ede670bc [riscv] Hold virtual address offset in the thread pointer register
iPXE does not make use of any thread-local storage.  Use the otherwise
unused thread pointer register ("tp") to hold the current value of
the virtual address offset, rather than using a global variable.

This ensures that virt_offset can be made valid even during very early
initialisation (when iPXE may be executing directly from read-only
memory and so cannot update a global variable).

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-05-11 13:46:21 +01:00
Michael Brown
3027864f13 [riscv] Use load and store pseudo-instructions where possible
The pattern of "load address to register" followed by "load value from
address in register" generally results in three instructions: two to
load the address and one to load the value.

This can be reduced to two instructions by allowing the assembler to
incorporate the low bits of the address within the load (or store)
instruction itself.  In the case of a store, this requires specifying
a second register that can be temporarily used to hold the high bits
of the address.  (In the case of a load, the destination register is
reused for this purpose.)

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-05-09 15:23:41 +01:00