Simplify the SMBIOS structure parsing code by assuming that all
structure content is fully accessible via pointer dereferences.
In particular, this allows the convoluted find_smbios_structure() and
read_smbios_structure() to be combined into a single function
smbios_structure() that just returns a direct pointer to the SMBIOS
structure, with smbios_string() similarly now returning a direct
pointer to the relevant string.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Simplify the ACPI table parsing code by assuming that all table
content is fully accessible via pointer dereferences.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Remove the intermediate concept of a user pointer from real address
conversion, leaving real_to_virt() as the directly implemented
function.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Remove the intermediate concept of a user pointer from physical
address conversions, leaving virt_to_phys() and phys_to_virt() as the
directly implemented functions.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The user_to_virt() function is now a straightforward wrapper around
addition, with the addend almost invariably being zero.
Remove this redundant wrapper.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The memcpy_user(), memmove_user(), memcmp_user(), memset_user(), and
strlen_user() functions are now just straightforward wrappers around
the corresponding standard library functions.
Remove these redundant wrappers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The userptr_add() and userptr_diff() functions are now just
straightforward wrappers around addition and subtraction.
Remove these redundant wrappers.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Clarify the intended usage of userptr_sub() by renaming it to
userptr_diff() (to avoid confusion with userptr_add()), and fix the
existing call sites that erroneously use userptr_sub() to subtract an
offset from a userptr_t value.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow for parsing device trees where an external factor (such as a
downloaded image length) determines the maximum length, which must be
validated against the length within the device tree header.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When running on a platform that uses FDT as its hardware description
mechanism, we are likely to have multiple device tree structures. At
a minimum, there will be the device tree passed to us from the
previous boot stage (e.g. OpenSBI), and the device tree that we
construct to be passed to the booted operating system.
Update the internal FDT API to include an FDT pointer in all function
parameter lists.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
If the UNDI interrupt remains constantly asserted (e.g. because the
BIOS has enabled interrupts for an unrelated device sharing the same
IRQ, or because of bugs in the OEM UNDI driver), then we may get stuck
in an interrupt storm.
We cannot safely chain to the previous interrupt handler (which could
plausibly handle an unrelated device interrupt) since there is no
well-defined behaviour for previous interrupt handlers. We have
observed BIOSes to provide default interrupt handlers that variously
do nothing, send EOI, disable the IRQ, or crash the system.
Fix by disabling the UNDI interrupt whenever our handler is triggered,
and rearm it as needed when polling the network device. This ensures
that forward progress continues to be made even if something causes
the interrupt to be constantly asserted.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
When using the undionly.kkpxe binary (which is never recommended), the
UNDI interrupt may still be enabled when iPXE starts up. If the PXE
base code interrupt handler is not well-behaved, this can result in
undefined behaviour when interrupts are first enabled (e.g. for
entropy gathering, or for allowing the timer tick to occur).
Fix by detecting and disabling the UNDI interrupt during the prefix
code.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The implementation of PXENV_UNDI_GET_NIC_TYPE in some PXE ROMs
(observed with an Intel X710 ROM in a Dell PowerEdge R6515) will fail
to write the NicType byte, leaving it uninitialised.
Prepopulate the NicType byte with a highly unlikely value as a
sentinel to allow us to detect this, and assume that any such devices
are overwhelmingly likely to be PCI devices.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Legacy IRQ 8 appears to be enabled by default on some platforms. If
iPXE selects the RTC entropy source, this will currently result in the
RTC IRQ 8 being unconditionally disabled. This can break assumptions
made by BIOSes or subsequent bootloaders: in particular, the FreeBSD
loader may lock up at the point of starting its default 10-second
countdown when it calls INT 15,86.
Fix by restoring the previous state of IRQ 8 instead of disabling it
unconditionally. Note that we do not need to disable IRQ 8 around the
point of hooking (or unhooking) the ISR, since this code will be
executing in iPXE's normal state of having interrupts disabled anyway.
Also restore the previous state of the RTC periodic interrupt enable,
rather than disabling it unconditionally.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Return the previous interrupt enabled state from enable_irq() and
disable_irq(), to allow callers to more easily restore this state.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The bzImage specification allows two bytes for the setup code jump
instruction at offset 0x200, which limits its relative offset to +0x7f
bytes. This currently imposes an upper limit on the length of the
version string, which currently precedes the setup code.
Fix by moving the version string to the .prefix.data section, so that
it no longer affects the placement of the setup code.
Originally-fixed-by: Miao Wang <shankerwangmiao@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
iPXE allows individual raw files to be automatically wrapped with
suitable CPIO headers and injected into the magic initrd image as
exposed to a booted Linux kernel. This feature is currently limited
to placing files within directories that already exist in the initrd
filesystem.
Remove this limitation by adding the ability for iPXE to construct
CPIO headers for parent directories as needed, under control of the
"mkdir=<n>" command-line argument. For example:
initrd config.ign /usr/share/oem/config.ign mkdir=1
will create CPIO headers for the "/usr/share/oem" directory as well as
for the "/usr/share/oem/config.ign" file itself.
This simplifies the process of booting operating systems such as
Flatcar Linux, which otherwise require the single "config.ign" file to
be manually wrapped up as a CPIO archive solely in order to create the
relevant parent directory entries.
The value <n> may be used to control the number of parent directory
entries that are created. For example, "mkdir=2" would cause up to
two parent directories to be created (i.e. "/usr/share" and
"/usr/share/oem" in the above example). A negative value such as
"mkdir=-1" may be used to create all parent directories up to the root
of the tree.
Do not create any parent directory entries by default, since doing so
would potentially cause the modes and ownership information for
existing directories to be overwritten.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Expose the effective carry (or borrow) out flag from big integer
addition and subtraction, and use this to elide an explicit bit test
when performing x25519 reduction.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The seed CSR defined by the Zkr extension is accessible only in M-mode
by default. Older versions of OpenSBI (prior to version 1.4) do not
set mseccfg.sseed, with the result that attempts to access the seed
CSR from S-mode will raise an illegal instruction exception.
Add a facility for testing the accessibility of arbitrary CSRs, and
use it to check that the seed CSR is accessible before reporting the
seed CSR entropy source as being functional.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add basic support for running directly on top of SBI, with no UEFI
firmware present. Build as e.g.:
make CROSS=riscv64-linux-gnu- bin-riscv64/ipxe.sbi
The resulting binary can be tested in QEMU using e.g.:
qemu-system-riscv64 -M virt -cpu max -serial stdio \
-kernel bin-riscv64/ipxe.sbi
No drivers or executable binary formats are supported yet, but the
unit test suite may be run successfully.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The timer and entropy seed CSRs will, by design, return different
values each time they are read.
Add the missing volatile qualifiers on the inline assembly to prevent
gcc from assuming that repeated invocations may be elided.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The Zkr entropy source extension defines a potentially unprivileged
seed CSR that can be read to obtain 16 bits of entropy input, with a
mandated requirement that 256 entropy input bits read from the seed
CSR will contain at least 128 bits of min-entropy.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The Zicntr extension defines an unprivileged wall-clock time CSR that
roughly matches the behaviour of an invariant TSC on x86. The nominal
frequency of this timer may be read from the "timebase-frequency"
property of the CPU node in the device tree.
Add a timer source using RDTIME to provide implementations of udelay()
and currticks(), modelled on the existing RDTSC-based timer for x86.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
RISC-V seems to allow for direct discovery of CPU features only from
M-mode (e.g. by setting up a trap handler and then attempting to
access a CSR), with S-mode code expected to read the resulting
constructed ISA description from the device tree.
Add the ability to check for the presence of named extensions listed
in the "riscv,isa" property of the device tree node corresponding to
the boot hart.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add the ability to issue Supervisor Binary Interface (SBI) calls via
the ECALL instruction, and use the SBI DBCN extension to implement a
debug console.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Every architecture uses the same implementation for bigint_is_set(),
and there is no reason to suspect that a future CPU architecture will
provide a more efficient way to implement this operation.
Simplify the code by providing a single architecture-independent
implementation of bigint_is_set().
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The big integer shift operations are misleadingly described as
rotations since the original x86 implementations are essentially
trivial loops around the relevant rotate-through-carry instruction.
The overall operation performed is a shift rather than a rotation.
Update the function names and descriptions to reflect this.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
An n-bit multiplication product may be added to up to two n-bit
integers without exceeding the range of a (2n)-bit integer:
(2^n - 1)*(2^n - 1) + (2^n - 1) + (2^n - 1) = 2^(2n) - 1
Exploit this to perform big integer multiplication in constant time
without requiring the caller to provide temporary carry space.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add support for building as a Linux userspace binary for AArch32.
This allows the self-test suite to be more easily run for the 32-bit
ARM code. For example:
make CROSS=arm-linux-gnu- bin-arm32-linux/tests.linux
qemu-arm -L /usr/arm-linux-gnu/sys-root/ \
./bin-arm32-linux/tests.linux
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Reading from PMCCNTR causes an undefined instruction exception when
running in PL0 (e.g. as a Linux userspace binary), unless the
PMUSERENR.EN bit is set.
Restructure profile_timestamp() for 32-bit ARM to perform an
availability check on the first invocation, with subsequent
invocations returning zero if PMCCNTR could not be enabled.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
All consumers of profile_timestamp() currently treat the value as an
unsigned long. Only the elapsed number of ticks is ever relevant: the
absolute value of the timestamp is not used. Profiling is used to
measure short durations that are generally fewer than a million CPU
cycles, for which an unsigned long is easily large enough.
Standardise the return type of profile_timestamp() as unsigned long
across all CPU architectures. This allows 32-bit architectures such
as i386 and riscv32 to omit all logic associated with retrieving the
upper 32 bits of the 64-bit hardware counter, which simplifies the
code and allows riscv32 and riscv64 to share the same implementation.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Big integer multiplication currently performs immediate carry
propagation from each step of the long multiplication, relying on the
fact that the overall result has a known maximum value to minimise the
number of carries performed without ever needing to explicitly check
against the result buffer size.
This is not a constant-time algorithm, since the number of carries
performed will be a function of the input values. We could make it
constant-time by always continuing to propagate the carry until
reaching the end of the result buffer, but this would introduce a
large number of redundant zero carries.
Require callers of bigint_multiply() to provide a temporary carry
storage buffer, of the same size as the result buffer. This allows
the carry-out from the accumulation of each double-element product to
be accumulated in the temporary carry space, and then added in via a
single call to bigint_add() after the multiplication is complete.
Since the structure of big integer multiplication is identical across
all current CPU architectures, provide a single shared implementation
of bigint_multiply(). The architecture-specific operation then
becomes the multiplication of two big integer elements and the
accumulation of the double-element product.
Note that any intermediate carry arising from accumulating the lower
half of the double-element product may be added to the upper half of
the double-element product without risk of overflow, since the result
of multiplying two n-bit integers can never have all n bits set in its
upper half. This simplifies the carry calculations for architectures
such as RISC-V and LoongArch64 that do not have a carry flag.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
As of commit 79c0173 ("[build] Create util/genfsimg for building
filesystem-based images"), the EFI boot file name for each CPU
architecture is defined within the genfsimg script itself, rather than
being passed in as a Makefile parameter.
Remove the now-redundant Makefile definitions for EFI_BOOT_FILE.
Reported-by: Christian I. Nilsson <nikize@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add support for building iPXE as a 64-bit or 32-bit RISC-V binary, for
either UEFI or Linux userspace platforms. For example:
# RISC-V 64-bit UEFI
make CROSS=riscv64-linux-gnu- bin-riscv64-efi/ipxe.efi
# RISC-V 32-bit UEFI
make CROSS=riscv64-linux-gnu- bin-riscv32-efi/ipxe.efi
# RISC-V 64-bit Linux
make CROSS=riscv64-linux-gnu- bin-riscv64-linux/tests.linux
qemu-riscv64 -L /usr/riscv64-linux-gnu/sys-root \
./bin-riscv64-linux/tests.linux
# RISC-V 32-bit Linux
make CROSS=riscv64-linux-gnu- SYSROOT=/usr/riscv32-linux-gnu/sys-root \
bin-riscv32-linux/tests.linux
qemu-riscv32 -L /usr/riscv32-linux-gnu/sys-root \
./bin-riscv32-linux/tests.linux
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Define a cpu_halt() function which is architecture-specific but
platform-independent, and merge the multiple architecture-specific
implementations of the EFI cpu_nap() function into a single central
efi_cpu_nap() that uses cpu_halt() if applicable.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The definitions of the setjmp() and longjmp() functions are common to
all architectures, with only the definition of the jump buffer
structure being architecture-specific.
Move the architecture-specific portions to bits/setjmp.h and provide a
common setjmp.h for the function definitions.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Move the <gdbmach.h> file to <bits/gdbmach.h>, and provide a common
dummy implementation for all architectures that have not yet
implemented support for GDB.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Simplify the process of adding a new CPU architecture by providing
common implementations of typically empty architecture-specific header
files.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The UEFI device model requires us to not probe the PCI bus directly,
but instead to wait to be offered the opportunity to drive devices via
our driver service binding handle.
We currently inhibit PCI bus probing by having pci_discover() return
an empty range when using the EFI PCI I/O API. This has the unwanted
side effect that scanning the bus manually using the "pciscan" command
will also fail to discover any devices.
Separate out the concept of being allowed to probe PCI buses from the
mechanism for discovering PCI bus:dev.fn address ranges, so that this
limitation may be removed.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Redefine bit 30 of an SMBIOS numerical setting to be part of the
function number, in order to allow access to hypervisor CPUID leaves.
This technically breaks backwards compatibility with scripts
attempting to read more than 64 consecutive functions. Since there is
no meaningful block of 64 consecutive related functions, it is
vanishingly unlikely that this capability has ever been used.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Hypervisors typically intercept CPUID leaves in the range 0x40000000
to 0x400000ff, with leaf 0x40000000 returning the maximum supported
function within this range in register %eax.
iPXE currently masks off bit 30 from the requested CPUID leaf when
checking to see if a function is supported, which causes this check to
read from leaf 0x00000000 instead of 0x40000000.
Fix by including bit 30 within the mask.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Older versions of gcc (observed with gcc 4.8.5 on CentOS 7) complain
about having the label "err_ioremap" at the end of a compound
statement in bios_mp_start_all(). The label is correctly placed,
since it immediately follows the iounmap() that would be required to
undo a successful ioremap() in the non-error case.
Fix by adding an explicit "return" immediately after the label.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Intel and AMD distribute microcode updates, which are typically
applied by the BIOS and/or the booted operating system.
BIOS updates can be difficult to obtain and cumbersome to apply, and
are often neglected. Operating system updates may be subject to
strict change control processes, particularly for production
workloads. There is therefore value in being able to update the
microcode at boot time using a freshly downloaded microcode update
file, particularly in scenarios where the physical hardware and the
installed operating system are controlled by different parties (such
as in a public cloud infrastructure).
Add support for parsing Intel and AMD microcode update images, and for
applying the updates to all CPUs in the system.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide an implementation of the iPXE multiprocessor API for BIOS,
based on sending broadcast INIT and SIPI interprocessor interrupts to
start up all application processors.
Signed-off-by: Michael Brown <mcb30@ipxe.org>