Some past security reviews carried out for UEFI Secure Boot signing
submissions have covered specific drivers or functional areas of iPXE.
Mark all of the files comprising these areas as permitted for UEFI
Secure Boot.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Mark all files used in a standard build of bin-x86_64-efi/snponly.efi
as permitted for UEFI Secure Boot. These files represent the core
functionality of iPXE that is guaranteed to have been included in
every binary that was previously subject to a security review and
signed by Microsoft. It is therefore legitimate to assume that at
least these files have already been reviewed to the required standard
multiple times.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Ensure that zero-length big integer literals are treated as containing
a zero value. Avoid tests on every big integer arithmetic operation
by ensuring that bigint_required_size() always returns a non-zero
value: the zero-length tests can therefore be restricted to only
bigint_init() and bigint_done().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Make pci_can_probe() part of the runtime selectable PCI I/O API, and
defer this check to the per-range API.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Use the linker table mechanism to enumerate the underlying PCI I/O
APIs, to allow PCIAPI_CLOUD to become architecture-independent code.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
On some systems (observed on an AWS EC2 c7a.medium instance in
eu-west-2), there is no single PCI configuration space access
mechanism that covers all PCI devices. For example: the ECAM may
provide access only to bus 01 and above, with type 1 accesses required
to access bus 00.
Allow the choice of PCI configuration space access mechanism to be
made on a per-range basis, rather than being made just once in an
initialisation function.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE runs only in environments that support unaligned accesses to RAM.
However, memcpy() and memset() are also used to write to graphical
framebuffer memory, which may support only aligned accesses on some
CPU architectures such as ARM.
Restructure the 64-bit ARM memcpy() and memset() routines along the
lines of the RISC-V implementations, which split the region into
pre-aligned, aligned, and post-aligned sections.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow code using the combined MMIO and port I/O accessors to safely
call iounmap() to unmap the MMIO or port I/O region.
In the virtual offset I/O mapping API as used for UEFI, 32-bit BIOS,
and 32-bit RISC-V SBI, iounmap() is a no-op anyway. In 64-bit RISC-V
SBI, we have no concept of port I/O and so the issue is moot.
This leaves only 64-bit BIOS, where it suffices to simply do nothing
for any pages outside of the chosen MMIO virtual address range.
For symmetry, we implement the equivalent change in the very closely
related RISC-V page management code.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Use the combined accessors ioread8() and iowrite8() to read and write
16550 UART registers, to allow the decision between using MMIO and
port I/O to be made at runtime.
Minimise the increase in code size for x86 by ignoring the register
shift, since this is essentially used only for non-x86 SoCs.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some devices (such as a 16550 UART) may be accessed via either MMIO or
port I/O. This is currently forced to be a compile-time decision.
For example: we currently access a 16550 UART via port I/O on x86 and
via MMIO on any other platform.
PCI UARTs with MMIO BARs do exist but are not currently supported in
an x86 build of iPXE. Some AWS EC2 systems (observed on a c6i.metal
instance in eu-west-2) provide only a PCI MMIO UART, and it is
therefore currently impossible to get serial output from iPXE on these
instance types.
Add ioread8(), ioread16(), etc accessors that will select between MMIO
and port I/O at the point of use. For non-x86 platforms where we
currently have no port I/O support, these simply become wrappers
around the corresponding readb(), readw(), etc MMIO accessors. On
x86, we use the fairly well-known trick of treating any 16-bit address
(below 64kB) as a port I/O address.
This trick works even in the i386 BIOS build of iPXE (where virtual
addresses are offset from physical addresses by a runtime constant),
since the first 64kB of the virtual address space will correspond to
the iPXE binary itself (along with its uninitialised-data space), and
so must be RAM rather than a valid MMIO address range.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commands were originally ordered by functional group (e.g. keeping the
image management commands together), with arrays used to impose a
functionally meaningful order within the group.
As the number of commands and functional groups has expanded over the
years, this has become essentially useless as an organising principle.
Switch to sorting commands alphabetically (using the linker table
mechanism).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Variables in the .bss section cannot be relied upon to have zero
values during early initialisation, before we have relocated ourselves
to somewhere suitable in RAM and zeroed the .bss section.
Place any explicitly zero-initialised variables in the .data section
rather than in .bss, so that we can rely on their values even during
this early initialisation stage.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
On startup, we may be running from read-only memory, and therefore
cannot zero the .bss section (or write to the .data section) until we
have parsed the system memory map and relocated ourselves to somewhere
suitable in RAM. The code that runs during this early initialisation
stage must be carefully written to avoid writing to the .data section
and to avoid reading from or writing to the .bss section.
Detecting code that erroneously writes to the .data or .bss sections
is relatively easy since running from read-only memory (e.g. via
QEMU's -pflash option) will immediately reveal the bug. Detecting
code that erroneously reads from the .bss section is harder, since in
a freshly powered-on machine (or in a virtual machine) there is a high
probability that the contents of the memory will be zero even before
we explicitly zero out the section.
Add the ability to fill the .bss section with an invalid non-zero
value to expose bugs in early initialisation code that erroneously
relies upon variables in .bss before the section has been zeroed. We
use the value 0xeb55eb55eb55eb55 ("EBSS") since this is immediately
recognisable as a value in a crash dump, and will trigger a page fault
if dereferenced since the address is in a non-canonical form.
Poisoning the .bss can be done only when the image is known to already
reside in writable memory. It will overwrite the relocation records,
and so can be done only on a system where relocation is known to be
unnecessary (e.g. because paging is supported). We therefore do not
enable this behaviour by default, but leave it as a configurable
option via the config/fault.h header.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
PCI Express devices do not have physical INTx output signals, and on
modern motherboards there is unlikely to be any interrupt controller
with physical interrupt input signals. There are multiple levels of
abstraction involved in emulating the legacy INTx interrupt mechanism:
the PCIe device sends Assert_INTx and Deassert_INTx messages, PCIe
bridges and switches must collate these virtual wires, and the root
complex must map the virtual wires into messages that can be
understood by the host's emulated 8259 PIC.
This complex chain of emulations is rarely tested on modern hardware,
since operating systems will invariably use MSI-X for PCI devices and
the I/O APIC for non-PCI devices such as the real-time clock. Since
the legacy interrupt emulation mechanism is rarely tested, it is
frequently unreliable. We have encountered many issues over the years
in which legacy interrupts are simply not raised as expected, even
when inspection shows that the device believes it is asserting an
interrupt and the controller believes that the interrupt is enabled.
We already maintain a list of devices that are known to fail to
generate legacy interrupts correctly. This list is based on the PCI
vendor and device IDs, which is not necessarily a fair test since the
root cause may be a board-level misconfiguration rather than a
device-level fault.
Assume that any PCI Express device has a high chance of not being able
to raise legacy interrupts reliably. This is a relatively intrusive
change since it will affect essentially all modern network devices,
but should hopefully fix all future issues with non-functional legacy
interrupts, without needing to constantly grow the list of known
broken devices.
If some PCI Express devices are found to fail when operated in polling
mode, then this change will need to be revisited.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
In the case of a misbehaving PXE stack, it is often useful to know the
PCI vendor and device IDs (e.g. for adding the device to the list of
devices with known broken support for generating interrupts).
The PCI vendor and device ID is already available to the prefix code,
and so can trivially be printed out. Add this information to the PXE
prefix startup banner.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Set the appropriate Svpbmt type bits within page table entries if the
extension is supported. Tested only in QEMU so far, due to the lack
of availability of real hardware supporting Svpbmt.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Reuse the code that creates I/O device page mappings to create the
coherent DMA mapping of the 32-bit address space on demand, instead of
constructing this mapping as part of the initial page table.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
All 64-bit paging schemes support at least 1GB "gigapages". Use these
to map I/O devices instead of 2MB "megapages". This reduces the
number of consumed page table entries, increases the visual similarity
of I/O remapped addresses to the underlying physical addresses, and
opens up the possibility of reusing the code to create the coherent
DMA map of the 32-bit address space.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The data cache must be invalidated twice for RX DMA buffers: once
before passing ownership to the DMA device (in case the cache happens
to contain dirty data that will be written back at an undefined future
point), and once after receiving ownership from the DMA device (in
case the CPU happens to have speculatively accessed data in the buffer
while it was owned by the hardware).
Only the used portion of the buffer needs to be invalidated after
completion, since we do not care about data within the unused portion.
Update the DMA API to include the used length as an additional
parameter to dma_unmap(), and add the necessary second cache
invalidation pass to the RISC-V DMA API implementation.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add a RISC-V assembly language implementation of TCP/IP checksumming,
which is around 50x faster than the generic algorithm. The main loop
checksums aligned xlen-bit words, using almost entirely compressible
instructions and accumulating carries in a separate register to allow
folding to be deferred until after all loops have completed.
Experimentation on a C910 CPU suggests that this achieves around four
bytes per clock cycle, which is comparable to the x86 implementation.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide an implementation of dma_map() that performs cache clean or
invalidation as required, and an implementation of dma_alloc() that
returns virtual addresses within the coherent mapping of the 32-bit
physical address space.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
On platforms where DMA devices are not in the same coherency domain as
the CPU cache, it is necessary to be able to explicitly clean the
cache (i.e. force data to be written back to main memory) and
invalidate the cache (i.e. discard any cached data and force a
subsequent read from main memory).
Add support for cache management via the standard Zicbom extension or
the T-Head cache management operations extension, with the supported
extension detected on first use.
Support cache management operations only on I/O buffers, since these
are guaranteed to not share cachelines with other data.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Use PTEs 256-259 to create a mapping of the 32-bit physical address
space with attributes suitable for coherent DMA mappings.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The page table entries for the identity map vary according to the
paging level in use, and so must be constructed within the loop used
to detect the maximum supported paging level. Other page table
entries are invariant between paging levels, and so may be constructed
just once before entering the loop.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The 16550 design includes a programmable 16-bit clock divider for an
arbitrary input clock, requiring knowledge of the input clock
frequency in order to calculate the divider value for a given baud
rate. The 16550 UARTs in an x86 PC will always have a 1.8432 MHz
input clock. Non-x86 systems may have other input clock frequencies.
Define the input clock frequency as a property of a 16550 UART, and
read the value from the device tree "clock-frequency" property.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When a native serial driver is enabled for the system console device
specified via "/chosen/stdout-path", it is very likely that this will
correspond to the same physical serial port used for the SBI debug
console.
Inhibit input and output via the SBI console whenever a serial console
is active, to avoid duplicated output characters and unpredictable
input behaviour.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE drivers have been written with the implicit assumption that MMIO
writes are allowed to be posted but that an MMIO register read or
write after another MMIO register write will always observe the
effects of the first write.
For example: after having written a byte to the transmit holding
register (THR) of a 16550 UART, it is expected that any subsequent
read of the line status register (LSR) will observe a value consistent
with the occurrence of the write.
RISC-V does not seem to provide any ordering guarantees between
accesses to different registers within the same MMIO device. Add
fences as part of the MMIO accessors to provide the assumed
guarantees.
Use "fence io, io" before each MMIO read or write to enforce full
serialisation of MMIO accesses with respect to each other. This is
almost certainly more conservative than is strictly necessary.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Use the generic UART driver-private data pointer, rather than
embedding the generic UART within the 16550 UART structure.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Remove the assumption that all platforms use a fixed number of 16550
UARTs identifiable by a simple numeric index. Create an abstraction
allowing for dynamic instantiation and registration of any number of
arbitrary UART models.
The common case of the serial console on x86 uses a single fixed UART
specified at compile time. Avoid unnecessarily dragging in the
dynamic instantiation code in this use case by allowing COMCONSOLE to
refer to a single static UART object representing the relevant port.
When selecting a UART by command-line argument (as used in the
"gdbstub serial <port>" command), allow the UART to be specified as
either a numeric index (to retain backwards compatiblity) or a
case-insensitive port name such as "COM2".
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The early UART is an optional feature used to obtain debug output from
the prefix before iPXE is able to parse the device tree.
Extend this feature to also cover any console output that iPXE
attempts to send to the SBI console, on the basis that the purpose of
the early UART is to provide an output-only device for situations in
which there is no functional SBI console.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The RISC-V "fence" instruction encoding includes bits for predecessor
and successor input and output operations, separate from read and
write operations. It is up to the CPU implementation to decide what
counts as I/O space rather than memory space for the purposes of this
instruction.
Since we do not expect fencing to be performance-critical, keep
everything as simple and reliable as possible by using the unadorned
"fence" instruction (equivalent to "fence iorw, iorw").
Add a memory clobber to ensure that the compiler does not reorder the
barrier. (The volatile qualifier seems to already prevent reordering
in practice, but this is not guaranteed according to the compiler
documentation.)
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Xuantie/T-Head processors such as the C910 (as used in the Sipeed
Lichee Pi 4A) use the high bits of the PTE in a very non-standard way
that is incompatible with the RISC-V specification.
As per the "Memory Attribute Extension (XTheadMae)", bits 62 and 61
represent cacheability and "bufferability" (write-back cacheability)
respectively. If we do not enable these bits, then the processor gets
incredibly confused at the point that paging is enabled. The symptom
is that cache lines will occasionally fail to fill, and so reads from
any address may return unrelated data from a previously read cache
line for a different address.
Work around these hardware flaws by detecting T-Head CPUs (via the
"get machine vendor ID" SBI call), then reading the vendor-specific
SXSTATUS register to determine whether or not the vendor-specific
Memory Attribute Extension has been enabled by the M-mode firmware.
If it has, then set bits 61 and 62 in each page table entry that is
used to access normal memory.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add a fence between the write to the UART transmit register and the
subsequent read from the transmit status register, to ensure that the
status correctly reflects the occurrence of the write.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The RISC-V specification states that "if SATP is written with an
unsupported mode, the entire write has no effect; no fields in SATP
are modified". We currently rely on this specified behaviour when
calculating the early UART base address: if SATP has a non-zero value
then we assume that paging must be enabled.
The XuanTie C910 CPU (as used in the Lichee Pi 4A) does not conform to
this specified behaviour. Writing SATP with an unsupported mode will
leave SATP.MODE as zero (i.e. bare physical addressing) but the write
to SATP.PPN will still take effect, leaving SATP with an illegal
non-zero value.
Work around this misbehaviour by explicitly writing zero to SATP if we
detect that the mode change has not taken effect (e.g. because the CPU
does not support the requested paging mode).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some platforms (such as the Sipeed Lichee Pi 4A) choose to make early
debugging entertainingly cumbersome for the programmer. These
platforms not only fail to provide a functional SBI debug console, but
also choose to place the UART at a physical address that cannot be
identity-mapped under the only paging model supported by the CPU.
Support such platforms by creating a virtual address mapping for the
early UART (in the 2MB megapage immediately below iPXE itself), and
using this as the UART base address whenever paging is enabled.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some platforms (such as the Sipeed Lichee Pi 4A) do not provide a
functional SBI debug console. We can obtain early debug messages on
these systems by writing directly to the UART used by the vendor
firmware.
There is no viable way to parse the UART address from the device tree,
since the prefix debug messages occur extremely early, before the C
runtime environment is available and therefore before any information
has been parsed from the device tree. The early UART model and
register addresses must be configured by editing config/serial.h if
needed. (This is an acceptable limitation, since prefix debugging is
an extremely specialised use case.)
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Abstract out the SBI debug console calls into macros that can be
shared between print_message and print_hex_value.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The riscv,isa devicetree property appears not to be fully populated on
some real-world systems. For example, the Sipeed Lichee Pi 4A
(running the vendor U-Boot) reports itself as "rv64imafdcvsu", which
does not include the "zicntr" extension even though the time CSR is
present and functional.
Ignore the riscv,isa property and rely solely on CSR testing to
determine whether or not extensions are present.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
With the 64-bit paging schemes (Sv39, Sv48, and Sv57), we identity-map
as much of the physical address space as is possible. Experimentation
shows that this is not sufficient to provide access to all I/O
devices. For example: the Sipeed Lichee Pi 4A includes a CPU that
supports only Sv39, but places I/O devices at the top of a 40-bit
address space.
Add support for creating I/O page table entries on demand to map I/O
devices, based on the existing design used for x86_64 BIOS.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Support debug consoles that do not automatically convert LF to CRLF by
including the CR character within the debug message strings.
Signed-off-by: Michael Brown <mcb30@ipxe.org>