On platforms where DMA devices are not in the same coherency domain as
the CPU cache, it is necessary to create page table entries where the
translations are marked as uncacheable.
We choose to place iPXE within the low 4GB of memory (since 32-bit DMA
devices are still reasonably common even on systems with 64-bit CPUs).
We therefore need to cover only the low 4GB of memory with these page
table entries.
Update virt_to_phys() to allow for the existence of such a mapping,
assuming that iPXE itself will always reside within the top 4GB of the
64-bit virtual address space (and therefore that the DMA mapping must
lie somewhere below this in the negative virtual address space).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some legacy drivers use large static allocations for transmit and
receive buffers. To avoid bloating the .bss segment, we currently
implement these as a single common symbol named "_shared_bss" (which
is permissible since only one legacy driver may be active at any one
time).
Switch to dynamic allocation of these .bss-like segments, to avoid the
requirement for using common symbols.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently have contexts in which the local variable "nic" is a
pointer to the global variable also called "nic". This complicates
the creation of macros.
Rename the global variable to "legacy_nic" to reduce pollution of the
global namespace and to allow for the creation of macros referring to
fields within this global variable.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The 16550 design includes a programmable 16-bit clock divider for an
arbitrary input clock, requiring knowledge of the input clock
frequency in order to calculate the divider value for a given baud
rate. The 16550 UARTs in an x86 PC will always have a 1.8432 MHz
input clock. Non-x86 systems may have other input clock frequencies.
Define the input clock frequency as a property of a 16550 UART, and
read the value from the device tree "clock-frequency" property.
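As an illustration of why the input clock frequency matters, a minimal
sketch of the divider calculation follows. It assumes the conventional
16x oversampling of the 16550; the helper name uart_divisor() and the
macro names are hypothetical and are not taken from the iPXE source.

```c
#include <stdint.h>

/* Input clock of a 16550 UART in an x86 PC (1.8432 MHz, as stated above) */
#define UART_CLOCK_X86 1843200U

/* Assumption: the 16550 samples each bit 16 times */
#define UART_OVERSAMPLE 16U

/*
 * Hypothetical helper: calculate the 16-bit divider for a given input
 * clock and baud rate.  Returns 0 if no usable divider exists.
 */
uint16_t uart_divisor ( uint32_t clock, uint32_t baud ) {
	uint32_t divisor;

	if ( baud == 0 )
		return 0;
	divisor = ( ( clock / UART_OVERSAMPLE ) / baud );
	if ( divisor > 0xffff )
		return 0;
	return ( ( uint16_t ) divisor );
}
```

With the x86 input clock, 115200 baud corresponds to a divider of 1 and
9600 baud to a divider of 12. A different input clock yields different
dividers for the same baud rates, which is why the frequency has to be
known per UART.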
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow the platform configuration to provide a mechanism for
identifying the serial console UART. Provide two globally available
mechanisms: "null" (i.e. no serial console), and "fixed" (i.e. use
whatever is specified by COMCONSOLE in config/serial.h).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Use the generic UART driver-private data pointer, rather than
embedding the generic UART within the 16550 UART structure.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
16550 UARTs exist on non-x86 platforms but will be accessible via MMIO
rather than port I/O. It is possible to encounter MMIO-mapped 16550
UARTs on x86 platforms, but there is no real requirement to support
them in iPXE since the standard COM1, COM2, etc ports have been
present on every PC-compatible machine since 1981.
Assume for now that accessing 16550 UART registers requires
inb()/outb() on x86 and readb()/writeb() on other architectures.
Allow for the existence of a register shift on MMIO-mapped 16550
UARTs, since modern SoCs tend to treat register addresses as being
aligned to either 32-bit or 64-bit boundaries.
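The effect of a register shift can be sketched as below. This is an
illustration only: the structure and function names are hypothetical,
and plain volatile byte accesses stand in for readb()/writeb().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of an MMIO-mapped 16550 UART */
struct mmio_uart {
	/* Base address of the register window */
	volatile uint8_t *base;
	/* Register shift (e.g. 2 for registers on 32-bit boundaries) */
	unsigned int shift;
};

/* Calculate the address of a register, allowing for the register shift */
static volatile uint8_t * mmio_uart_reg ( const struct mmio_uart *uart,
					  unsigned int reg ) {
	return ( uart->base + ( ( ( size_t ) reg ) << uart->shift ) );
}

/* Read from a UART register */
uint8_t mmio_uart_read ( const struct mmio_uart *uart, unsigned int reg ) {
	return *mmio_uart_reg ( uart, reg );
}

/* Write to a UART register */
void mmio_uart_write ( const struct mmio_uart *uart, unsigned int reg,
		       uint8_t value ) {
	*mmio_uart_reg ( uart, reg ) = value;
}
```

With a shift of 2, register 3 is found at byte offset 12 from the base
address. A shift of 0 gives the byte-packed layout used by port I/O.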
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Remove the assumption that all platforms use a fixed number of 16550
UARTs identifiable by a simple numeric index. Create an abstraction
allowing for dynamic instantiation and registration of any number of
arbitrary UART models.
The common case of the serial console on x86 uses a single fixed UART
specified at compile time. Avoid unnecessarily dragging in the
dynamic instantiation code in this use case by allowing COMCONSOLE to
refer to a single static UART object representing the relevant port.
When selecting a UART by command-line argument (as used in the
"gdbstub serial <port>" command), allow the UART to be specified as
either a numeric index (to retain backwards compatibility) or a
case-insensitive port name such as "COM2".
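A minimal sketch of this kind of lookup follows. The name table, the
zero-based interpretation of the numeric index, and the function names
are all assumptions made for illustration, not the iPXE implementation.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical table of registered UART names */
static const char *uart_names[] = { "COM1", "COM2", "COM3", "COM4" };

#define UART_COUNT ( sizeof ( uart_names ) / sizeof ( uart_names[0] ) )

/* Compare two names, ignoring case; returns nonzero on a match */
static int names_match ( const char *a, const char *b ) {
	while ( *a && *b ) {
		if ( tolower ( ( unsigned char ) *a ) !=
		     tolower ( ( unsigned char ) *b ) )
			return 0;
		a++;
		b++;
	}
	return ( *a == *b );
}

/*
 * Find a UART by numeric index or by name.
 * Returns the position in the table, or -1 if not found.
 */
int uart_find ( const char *text ) {
	char *end;
	unsigned long index;
	unsigned int i;

	if ( *text == '\0' )
		return -1;

	/* Try a numeric index first (assumed here to be zero-based) */
	index = strtoul ( text, &end, 0 );
	if ( *end == '\0' )
		return ( ( index < UART_COUNT ) ? ( ( int ) index ) : -1 );

	/* Otherwise try a case-insensitive name */
	for ( i = 0 ; i < UART_COUNT ; i++ ) {
		if ( names_match ( text, uart_names[i] ) )
			return ( ( int ) i );
	}
	return -1;
}
```

In this sketch "1", "COM2", and "com2" all select the same table entry,
while "COM5" and the empty string are rejected.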
Signed-off-by: Michael Brown <mcb30@ipxe.org>
In the context of serial consoles, the use of any frame formats other
than the standard 8 data bits, no parity, and one stop bit is so rare
as to be nonexistent.
Remove the almost certainly unused support for custom frame formats.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We will want to be able to create the console device as early as
possible. Refactor devicetree probing to remove the assumption that a
devicetree device must have a devicetree parent, and expose functions
to allow a standalone device to be created given only the offset of a
node within the tree.
The full device path is no longer trivial to construct with this
assumption removed. The full path is currently used only for debug
messages. Remove the stored full path, use just the node name for
debug messages, and ensure that the topology information previously
visible in the full path is reconstructible from the combined debug
output if needed.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add support for RFC 3442 classless static routes provided via DHCP
option 121.
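For reference, each route in option 121 is encoded as a prefix length,
followed by only the significant octets of the destination, followed by
a four-octet router address. A minimal parsing sketch is shown below;
the structure and function names are hypothetical and do not describe
the code added by this commit.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical decoded form of one classless static route */
struct classless_route {
	/* Prefix length in bits (0 to 32) */
	uint8_t prefix_len;
	/* Destination in network byte order, zero-padded */
	uint8_t dest[4];
	/* Router address in network byte order */
	uint8_t router[4];
};

/*
 * Parse one route from the option data.
 * Returns the number of bytes consumed, or -1 if the data is malformed.
 */
int parse_classless_route ( const uint8_t *data, size_t len,
			    struct classless_route *route ) {
	size_t dest_len;
	size_t total;

	if ( len < 1 )
		return -1;
	if ( data[0] > 32 )
		return -1;
	route->prefix_len = data[0];

	/* Only the significant destination octets are present */
	dest_len = ( ( ( size_t ) route->prefix_len + 7 ) / 8 );
	total = ( 1 + dest_len + sizeof ( route->router ) );
	if ( len < total )
		return -1;

	memset ( route->dest, 0, sizeof ( route->dest ) );
	memcpy ( route->dest, ( data + 1 ), dest_len );
	memcpy ( route->router, ( data + 1 + dest_len ),
		 sizeof ( route->router ) );
	return ( ( int ) total );
}
```

A caller would invoke this repeatedly, advancing by the returned length,
until the option data is exhausted. A prefix length of zero carries no
destination octets and describes a default route.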
Originally-implemented-by: Hazel Smith <hazel.smith@leicester.ac.uk>
Originally-implemented-by: Raphael Pour <raphael.pour@hetzner.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Extend the definition of an IPv4 routing table entry to allow for the
expression of non-default gateways for specified off-link subnets, and
of on-link secondary subnets (where we can send directly to the
destination address even though our source address is not within the
subnet).
This more precise definition also allows us to correctly handle
routing in the (uncommon for iPXE) case when multiple network
interfaces are open concurrently and more than one interface has a
default gateway.
The common case of a single IPv4 address/netmask and a default gateway
now results in two routing table entries. To retain backwards
compatibility with existing documentation (and to avoid on-screen
clutter), the "route" command prints default gateways on the same line
as the locally assigned address. There is therefore no change in
output from the "route" command unless explicit additional (off-link
or on-link) routes are present.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently rely on the recursive nature of devicetree bus probing to
obtain the region cell size specification from the parent device.
This blocks the possibility of creating a standalone console device
based on /chosen/stdout-path before probing the whole bus.
Fix by using fdt_parent() to locate the parent device at the point of
use within dt_ioremap().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide DBGC_MEMMAP() as a replacement for memmap_dump(), allowing the
colour used to match other messages within the same message group.
Retain a dedicated colour for output from memmap_dump_all(), on the
basis that it is generally most useful to visually compare full memory
dumps against previous full memory dumps.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Use the terminology "min" and "max" for addresses covered by a memory
region descriptor, since this is sufficiently intuitive to generally
not require further explanation.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Use the shared initrd reshuffling and CPIO header construction code
for RISC-V bare-metal kernels. This allows for files to be injected
into the constructed ("magic") initrd image in exactly the same way as
is done for bzImage and UEFI kernels.
We append a dummy image encompassing the FDT to the end of the
reshuffle list, so that it ends up directly following the constructed
initrd in memory (but excluded from the initrd length, which was
recorded before constructing the FDT).
We also temporarily prepend the kernel binary itself to the reshuffle
list. This is guaranteed to be safe (since reshuffling is designed to
be unable to fail), and avoids the requirement for the kernel segment
to be available before reshuffling. This is useful since current
RISC-V bare-metal kernels tend to be distributed as EFI zboot images,
which require large temporary allocations from the external heap for
the intermediate images created during archive extraction.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide a reusable function initrd_load_all() to load all initrds
(including any constructed CPIO headers) into a contiguous memory
region, and support functions to find the constructed total length and
permissible post-reshuffling load address range.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Alignment of initrd lengths is applicable to all Linux kernels, not
just those in the x86 bzImage format.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Eliminate the requirement for free space when reshuffling initrds by
swapping adjacent initrds using an in-place triple reversal.
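The triple reversal technique can be sketched as follows: reversing
each block individually and then reversing the combined range leaves
the two blocks exchanged, with no scratch buffer required. The function
names are illustrative and are not the names used in initrd.c.

```c
#include <stddef.h>

/* Reverse a range of bytes in place */
static void reverse_bytes ( unsigned char *data, size_t len ) {
	unsigned char tmp;
	size_t i;

	for ( i = 0 ; i < ( len / 2 ) ; i++ ) {
		tmp = data[i];
		data[i] = data[ len - 1 - i ];
		data[ len - 1 - i ] = tmp;
	}
}

/*
 * Swap two adjacent blocks in place.
 *
 * The buffer initially holds block A (len_a bytes) followed directly
 * by block B (len_b bytes).  On return it holds B followed by A.
 */
void swap_adjacent ( unsigned char *data, size_t len_a, size_t len_b ) {
	reverse_bytes ( data, len_a );
	reverse_bytes ( ( data + len_a ), len_b );
	reverse_bytes ( data, ( len_a + len_b ) );
}
```

Each byte is moved exactly twice, so the cost is linear in the combined
length of the two blocks.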
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently rely on implicit detection of the external heap region.
The INT 15 memory map mangler relies on examining the corresponding
in-use memory region, and the initrd reshuffler relies on performing a
separate detection of the largest free memory block after startup has
completed.
Replace these with explicit public symbols to describe the external
heap region.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow a single initrd image to be passed verbatim to the booted RISC-V
kernel, as a proof of concept.
We do not yet support reshuffling to make optimal use of available
memory, or dynamic construction of CPIO headers, but this is
sufficient to allow iPXE to start up the Fedora 42 kernel with its
matching initrd image.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow an initrd location to be specified in our constructed device
tree via the "linux,initrd-start" and "linux,initrd-end" properties.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
There is nothing x86-specific in initrd.c, and a variant of the
reshuffling logic will be required for executing bare-metal kernels on
RISC-V and AArch64.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Current RISC-V and AArch64 kernels found in the wild tend not to be in
the documented kernel format, but are instead "EFI zboot" kernels
comprising a small EFI executable that decompresses and executes the
inner payload (which is a kernel in the expected format).
The EFI zboot header includes a recognisable magic value "zimg" along
with two fields describing the offset and length of the compressed
payload. We can therefore treat this as an archive image format,
extracting the payload as-is and then relying on our existing ability
to execute compressed images.
This is sufficient to allow iPXE to execute the Fedora 42 RISC-V
kernel binary as currently published.
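Locating the payload can be sketched as below. The field positions
(magic at offset 4, then little-endian 32-bit payload offset and length
at offsets 8 and 12) are an assumption based on the Linux EFI zboot
header as commonly described; they are not stated in this message. The
function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed header layout (not taken from this commit message) */
#define ZBOOT_MAGIC_POS 4
#define ZBOOT_OFFSET_POS 8
#define ZBOOT_LEN_POS 12
#define ZBOOT_HDR_LEN 16

/* Read a little-endian 32-bit value */
static uint32_t le32 ( const uint8_t *p ) {
	return ( ( ( uint32_t ) p[0] ) | ( ( ( uint32_t ) p[1] ) << 8 ) |
		 ( ( ( uint32_t ) p[2] ) << 16 ) |
		 ( ( ( uint32_t ) p[3] ) << 24 ) );
}

/*
 * Locate the compressed payload within an EFI zboot image.
 * Returns 0 on success, or -1 if the image is not a valid zboot image.
 */
int zboot_payload ( const uint8_t *image, size_t len,
		    size_t *offset, size_t *payload_len ) {
	uint32_t off;
	uint32_t plen;

	if ( len < ZBOOT_HDR_LEN )
		return -1;
	if ( memcmp ( ( image + ZBOOT_MAGIC_POS ), "zimg", 4 ) != 0 )
		return -1;
	off = le32 ( image + ZBOOT_OFFSET_POS );
	plen = le32 ( image + ZBOOT_LEN_POS );

	/* Payload must lie entirely within the image */
	if ( ( off > len ) || ( plen > ( len - off ) ) )
		return -1;
	*offset = off;
	*payload_len = plen;
	return 0;
}
```

The payload located this way is still compressed; as described above,
decompression is left to the existing support for compressed images.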
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The RISC-V and AArch64 bare-metal kernel images share a common header
format, and require essentially the same execution environment: loaded
close to the start of RAM, entered with paging disabled, and passed a
pointer to a flattened device tree that describes the hardware and any
boot arguments.
Implement basic support for executing bare-metal RISC-V and AArch64
kernel images. The (trivial) AArch64-specific code path is untested
since we do not yet have the ability to build for any bare-metal
AArch64 platforms. Constructing and passing an initramfs image is not
yet supported.
Rename the IMAGE_BZIMAGE build configuration option to IMAGE_LKRN,
since "bzImage" is specific to x86. To retain backwards compatibility
with existing local build configurations, we leave IMAGE_BZIMAGE as
the enabled option in config/default/pcbios.h and treat IMAGE_LKRN as
a synonym for IMAGE_BZIMAGE when building for x86 BIOS.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add an implementation of umalloc() using the generalised model of a
heap, placing the external heap in the largest usable region obtained
from the system memory map.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Size-tracked pointers allocated via umalloc() have historically been
aligned to a page boundary, as have the edges of the hidden memory
region covering the external heap.
Allow the block and size-tracked pointer alignments to be specified as
heap configuration parameters.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Create a generic model of a heap as a list of free blocks with
optional methods for growing and shrinking the heap.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
All memory map users have been updated to use the new system memory
map API. Remove get_memmap() and its associated definitions.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Use the concept of an in-use memory region defined as part of the
system memory map API to describe the umalloc() heap.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide an implementation of the system memory map API based on the
system device tree, excluding any memory outside the size of the
accessible physical address space and defining an in-use region to
cover the relocated copy of iPXE and the system device tree.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Define a generic system memory map API, based on the abstraction
created for parsing the FDT memory map and adding a concept of hidden
in-use memory regions as required to support patching the BIOS INT 15
memory map.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The size of accessible physical address space will be required for the
runtime memory map, not just at relocation time. Make this size an
additional parameter to fdt_register() (matching the prototype for
fdt_relocate()), and record the value for future reference.
Note that we cannot simply store the limit in fdt_relocate() since it
is called before .data is writable and before .bss is zeroed.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We do not currently describe the temporary page table or the temporary
stack as areas to be avoided during relocation of the iPXE image to a
new physical address.
Perform the copy of the iPXE image and zeroing of the .bss within
libprefix.S, after we have no further use for the temporary page table
or the temporary initial stack. Perform the copy and registration of
the system device tree in C code after relocation is complete and the
new stack (within .bss) has been set up.
This provides a clean separation of responsibilities between the
RISC-V libprefix.S and the architecture-independent fdtmem.c. The
prefix is responsible only for relocating iPXE to the new physical
address returned from fdtmem_relocate(), and doesn't need to know or
care where fdtmem.c is planning to place the copy of the device tree.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add code to parse the devicetree memory nodes, memory reservations
block, and reserved memory nodes to construct an ordered and
non-overlapping description of the system memory map, and use this to
identify a suitable address to which iPXE may be relocated at runtime.
We choose to place iPXE on a superpage boundary (as required by the
paging code), and to use the highest available address within
accessible memory. This mirrors the approach taken for x86 BIOS
builds, where we have long assumed that any image format that we might
need to support may require specific fixed addresses towards the
bottom of the memory map, but is very unlikely to require specific
fixed addresses towards the top of the memory map (since those
addresses may not exist, depending on the amount of installed RAM).
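The placement rule for a single usable region can be sketched as
follows. This is a simplified illustration: the function name is
hypothetical, the region is described by inclusive minimum and maximum
addresses, and the alignment (the superpage size) is assumed to be a
power of two supplied by the caller.

```c
#include <stdint.h>

/*
 * Find the highest aligned start address at which a block of the given
 * length fits within the region [min,max] (both inclusive).
 *
 * Returns 0 on success, or -1 if the block does not fit.
 */
int highest_aligned ( uint64_t min, uint64_t max, uint64_t len,
		      uint64_t align, uint64_t *start ) {
	uint64_t candidate;

	/* Reject empty blocks, inverted regions, and regions too small */
	if ( ( len == 0 ) || ( max < min ) ||
	     ( ( len - 1 ) > ( max - min ) ) )
		return -1;

	/* Highest possible start address, rounded down to the alignment */
	candidate = ( ( max - ( len - 1 ) ) & ~( align - 1 ) );
	if ( candidate < min )
		return -1;
	*start = candidate;
	return 0;
}
```

For example, a 3MB block with 2MB alignment in a region covering
0x80000000 to 0x8fffffff would be placed at 0x8fc00000. A full
implementation would apply this to each usable region of the memory map
and keep the highest result.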
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE does not make use of any thread-local storage. Use the otherwise
unused thread pointer register ("tp") to hold the current value of
the virtual address offset, rather than using a global variable.
This ensures that virt_offset can be made valid even during very early
initialisation (when iPXE may be executing directly from read-only
memory and so cannot update a global variable).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The "reg" property is also used by non-device nodes, such as the nodes
describing the system memory map.
Provide generalised functionality for parsing the "#address-cells",
"#size-cells", and "reg" properties.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
In a position-dependent executable, where all addresses are fixed
at link time, we can use the standard technique as documented by
GNU ld to get the value of an absolute symbol, e.g.:
  extern char _my_symbol[];
  printf ( "Absolute symbol value is %x\n", ( ( int ) _my_symbol ) );
This technique may not work in a position-independent executable.
When dynamic relocations are applied, the runtime addresses will no
longer be equal to the link-time addresses. If the code to obtain the
address of _my_symbol uses PC-relative addressing, then it will
calculate the runtime "address" of the absolute symbol, which will no
longer be equal to the link-time "address" (i.e. the correct value)
of the absolute symbol.
Define macros ABS_SYMBOL(), ABS_VALUE_INIT(), and ABS_VALUE() that
provide access to the correct values of absolute symbols even in
position-independent code, and use these macros wherever absolute
symbols are accessed.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
During early initialisation on some platforms, the .data and .bss
sections may not yet be writable.
Display the assertion message before attempting to increment the
assertion failure counter, since writing to the assertion counter may
trigger a CPU exception that ends up resetting the system.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The virtual offset memory model used for i386-pcbios and x86_64-pcbios
can be generalised to also cover riscv32-sbi and riscv64-sbi. In both
architectures, the 32-bit builds will use a circular map of the 32-bit
address space, and the 64-bit builds will use an identity map for the
relevant portion of the physical address space, with iPXE itself
placed in the negative (kernel) address space.
Generalise and document the virt_offset mechanism, and set it as the
default for both PCBIOS and SBI platforms.
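The circular map used by the 32-bit builds can be illustrated with
unsigned 32-bit arithmetic. The sketch below assumes the convention that
a physical address is the virtual address plus virt_offset, wrapping
modulo 2^32; the function names are hypothetical and the real
virt_offset is not an ordinary file-scope variable on all platforms.

```c
#include <stdint.h>

/* Hypothetical model of the virtual address offset (32-bit build) */
static uint32_t virt_offset;

/* Set the virtual address offset */
void set_virt_offset ( uint32_t offset ) {
	virt_offset = offset;
}

/* Convert a virtual address to a physical address (wraps modulo 2^32) */
uint32_t virt_to_phys32 ( uint32_t virt ) {
	return ( virt + virt_offset );
}

/* Convert a physical address to a virtual address (wraps modulo 2^32) */
uint32_t phys_to_virt32 ( uint32_t phys ) {
	return ( phys - virt_offset );
}
```

Because the arithmetic wraps, every physical address has exactly one
virtual address and vice versa, whatever the value of the offset.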
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Remove the last remaining traces of the concept of a user pointer,
leaving iPXE with a simpler and cleaner memory model that implicitly
assumes that all memory locations can be reached through pointer
dereferences.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The uaccess.h header is no longer required for any code that touches
external ("user") memory, since such memory accesses are now performed
through pointer dereferences. Reduce the number of files including
this header.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Almost all image consumers do not need to modify the content of the
image. Now that the image data is a pointer type (rather than the
opaque userptr_t type), we can rely on the compiler to enforce this at
build time.
Change the .data field to be a const pointer, so that the compiler can
verify that image consumers do not modify the image content. Provide
a transparent .rwdata field for consumers who have a legitimate (and
now explicit) reason to modify the image content.
We do not attempt to impose any runtime restriction on checking
whether or not an image is writable. The only existing instances of
genuinely read-only images are the various unit test images, and it is
acceptable for defective test cases to result in a segfault rather
than a runtime error.
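One way to picture the two fields is as two views of the same pointer.
The sketch below uses an anonymous union (C11) for this; that choice,
the reduced structure, and the helper functions are assumptions made
for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduced form of an image */
struct image {
	union {
		/* Read-only view, for the majority of consumers */
		const void *data;
		/* Writable view, for consumers that must modify content */
		void *rwdata;
	};
	/* Length of image content */
	size_t len;
};

/* A typical consumer: reads the content via the const pointer */
unsigned int image_sum ( const struct image *image ) {
	const uint8_t *bytes = image->data;
	unsigned int sum = 0;
	size_t i;

	for ( i = 0 ; i < image->len ; i++ )
		sum += bytes[i];
	return sum;
}

/* An explicit modifier: must use the writable view */
int image_patch ( struct image *image, size_t offset, uint8_t value ) {
	uint8_t *bytes = image->rwdata;

	if ( offset >= image->len )
		return -1;
	bytes[offset] = value;
	return 0;
}
```

An attempt to write through image->data in image_sum() would be rejected
by the compiler, which is the build-time enforcement described above.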
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Not all images are allocated via alloc_image(). For example: embedded
images, the static images created to hold a runtime command line, and
the images used by unit tests are all static structures.
Using image_set_cmdline() (via e.g. the "imgargs" command) to set the
command-line arguments of a static image will succeed but will leak
memory, since nothing will ever free the allocated command line.
There are no code paths that can lead to calling image_set_len() on a
static image, but there is no safety check against future code paths
attempting this.
Define a flag IMAGE_STATIC to mark an image as statically allocated,
generalise free_image() to also handle freeing dynamically allocated
portions of static images (such as the command line), and expose
free_image() for use by static images.
Define a related flag IMAGE_STATIC_NAME to mark the name as statically
allocated. Allow a statically allocated name to be replaced with a
dynamically allocated name since this is a potentially valid use case
(e.g. if "imgdecrypt --name <name>" is used on an embedded image).
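The intended behaviour of the two flags can be sketched as follows.
The flag values, the reduced structure, and the function bodies are
hypothetical and are meant only to show which portions are freed in
each case.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical flag values */
#define IMAGE_STATIC 0x0001
#define IMAGE_STATIC_NAME 0x0002

/* Hypothetical reduced form of an image */
struct image {
	unsigned int flags;
	char *name;
	char *cmdline;
};

/* Duplicate a string using only standard C */
static char * copy_string ( const char *text ) {
	size_t len = ( strlen ( text ) + 1 );
	char *copy = malloc ( len );

	if ( copy )
		memcpy ( copy, text, len );
	return copy;
}

/* Set the name, replacing a static name with a dynamic one if needed */
int image_set_name ( struct image *image, const char *name ) {
	char *copy = copy_string ( name );

	if ( ! copy )
		return -1;
	if ( ! ( image->flags & IMAGE_STATIC_NAME ) )
		free ( image->name );
	image->name = copy;
	image->flags &= ~IMAGE_STATIC_NAME;
	return 0;
}

/* Set the command line (always dynamically allocated) */
int image_set_cmdline ( struct image *image, const char *cmdline ) {
	char *copy = copy_string ( cmdline );

	if ( ! copy )
		return -1;
	free ( image->cmdline );
	image->cmdline = copy;
	return 0;
}

/* Free dynamic portions, and the image itself unless it is static */
void free_image ( struct image *image ) {
	free ( image->cmdline );
	image->cmdline = NULL;
	if ( ! ( image->flags & IMAGE_STATIC_NAME ) ) {
		free ( image->name );
		image->name = NULL;
	}
	if ( ! ( image->flags & IMAGE_STATIC ) )
		free ( image );
}
```

In this sketch a static image can safely be given a command line or a
new name, and a later call to free_image() releases those allocations
without attempting to free the static structure itself.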
Signed-off-by: Michael Brown <mcb30@ipxe.org>