Commit Graph

1594 Commits

Author SHA1 Message Date
Michael Brown
4224f574da [pci] Map all MSI-X interrupts to a dummy target address by default
Interrupts as such are not used in iPXE, which operates in polling
mode.  However, some network cards (such as the Intel 40GbE and 100GbE
NICs) will defer writing out completions until the point of asserting
an MSI-X interrupt.

From the point of view of the PCI device, asserting an MSI-X interrupt
is just a 32-bit DMA write of an opaque value to an opaque target
address.  The PCI device has no way to know whether or not the target
address corresponds to a real APIC.

We can therefore trick the PCI device into believing that it is
asserting an MSI-X interrupt, by configuring it to write an opaque
32-bit value to a dummy target address in host memory.  This is
sufficient to trigger the associated write of the completions to host
memory.

Allocate a dummy target address when enabling MSI-X on a PCI device,
and map all interrupts to this target address by default.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-10-09 16:29:29 +01:00
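
The dummy-target trick described in the commit above can be illustrated with a small software model. This is only a sketch: struct msix_entry, struct fake_nic, and both functions are hypothetical names invented for illustration and are not iPXE APIs. The model assumes a device that holds its completions internally until it "asserts" an interrupt, which from its side is just a 32-bit write of an opaque value to an opaque address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one MSI-X table entry: opaque address and data */
struct msix_entry {
    uint32_t *target;   /* Where the device will write */
    uint32_t data;      /* Opaque value to be written */
};

/* Hypothetical model of a NIC that defers writing out completions */
struct fake_nic {
    struct msix_entry msix;
    uint8_t held[4];            /* Completions held inside the device */
    unsigned int held_count;    /* Number of held completions */
    uint8_t *host_ring;         /* Completion ring in host memory */
    unsigned int *host_count;   /* Completions visible to the host */
};

/* Device side: flush completions, then "assert" the interrupt */
void fake_nic_interrupt ( struct fake_nic *nic ) {
    assert ( nic->msix.target != NULL );
    memcpy ( nic->host_ring, nic->held, nic->held_count );
    *(nic->host_count) += nic->held_count;
    nic->held_count = 0;
    /* The interrupt itself is nothing more than this 32-bit write */
    *(nic->msix.target) = nic->msix.data;
}

/* Host side: point the entry at dummy memory and poll for completions */
unsigned int msix_dummy_demo ( void ) {
    static uint32_t dummy_target;   /* Never read by anything */
    uint8_t ring[4] = { 0 };
    unsigned int visible = 0;
    struct fake_nic nic = {
        .msix = { .target = &dummy_target, .data = 0x12345678 },
        .held = { 0xa1, 0xa2 },
        .held_count = 2,
        .host_ring = ring,
        .host_count = &visible,
    };

    fake_nic_interrupt ( &nic );
    assert ( dummy_target == 0x12345678 );
    return visible;
}
```

Nothing ever reads the dummy target. Its only job is to give the device a valid place to write, so that the completions reach host memory as a side effect.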
Michael Brown
ce30ba14fc [gve] Select preferred operating mode
Select a preferred operating mode from those advertised as supported
by the device, falling back to the oldest known mode (GQI-QPL) if
no modes are advertised.

Since there are devices in existence that support only QPL addressing,
and since we want to minimise code size, we choose to always use a
single fixed ring buffer even when using raw DMA addressing.  Having
paid this penalty, we therefore choose to prefer QPL over RDA since
this allows the (virtual) hardware to minimise the number of page
table manipulations required.  We similarly prefer GQI over DQO since
this minimises the amount of work we have to do: in particular, the RX
descriptor ring contents can remain untouched for the lifetime of the
device and refills require only a doorbell write.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-10-06 14:04:18 +01:00
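
The selection policy in the commit above can be sketched as a walk over a preference list. The bitmask layout, the enum values, and the function name here are hypothetical. The message states only that QPL is preferred over RDA and GQI over DQO, so the relative order of GQI-RDA and DQO-QPL in this sketch is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mode numbering, used here only as bit positions */
enum gve_mode_model {
    GQI_RDA = 0,
    GQI_QPL = 1,
    DQO_RDA = 2,
    DQO_QPL = 3,
};

/* Select a mode from a bitmask of advertised modes (bit N = mode N) */
unsigned int select_mode ( unsigned int supported ) {
    /* Assumed preference order: GQI before DQO, then QPL before RDA */
    static const unsigned int preference[] = {
        GQI_QPL, GQI_RDA, DQO_QPL, DQO_RDA,
    };
    size_t i;

    assert ( supported < ( 1U << 4 ) );
    for ( i = 0 ; i < ( sizeof ( preference ) /
                        sizeof ( preference[0] ) ) ; i++ ) {
        if ( supported & ( 1U << preference[i] ) )
            return preference[i];
    }

    /* Nothing advertised: fall back to the oldest known mode */
    return GQI_QPL;
}
```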
Michael Brown
74c9fd72cf [gve] Add support for out-of-order queues
Add support for the "DQO" out-of-order transmit and receive queue
formats.  These are almost entirely different in format and usage (and
even endianness) from the original "GQI" in-order transmit and receive
queues, and arguably should belong to a completely different device
with a different PCI ID.  However, Google chose to essentially crowbar
two unrelated device models into the same virtual hardware, and so we
must handle both of these device models within the same driver.

Most of the new code exists solely to handle the differences in
descriptor sizes and formats.  Out-of-order completions are handled
via a buffer ID ring (as with other devices supporting out-of-order
completions, such as the Xen, Hyper-V, and Amazon virtual NICs).  A
slight twist is that on the transmit datapath (but not the receive
datapath) the Google NIC provides only one completion per packet
instead of one completion per descriptor, and so we must record the
list of chained buffer IDs in a separate array at the time of
transmission.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-10-06 14:04:12 +01:00
Michael Brown
0d1ddfe42c [gve] Cancel pending transmissions when closing device
We cancel any pending transmissions when (re)starting the device since
any transmissions that were initiated before the admin queue reset
will not complete.

The network device core will also cancel any pending transmissions
after the device is closed.  If the device is closed with some
transmissions still pending and is then reopened, this will therefore
result in a stale I/O buffer being passed to netdev_tx_complete_err()
when the device is restarted.

This error has not been observed in practice since transmissions
generally complete almost immediately and it is therefore unlikely
that the device will ever be closed with transmissions still pending.
With out-of-order queues, the device seems to delay transmit
completions (with no upper time limit) until a complete batch is
available to be written out as a block of 128 bytes.  It is therefore
very likely that the device will be closed with transmissions still
pending.

Fix by ensuring that we have dropped all references to transmit I/O
buffers before returning from gve_close().

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-10-06 13:16:22 +01:00
Joseph Wong
cf53497541 [bnxt] Handle link related async events
Handle async events related to link speed change, link speed config
change, and port phy config changes.

Signed-off-by: Joseph Wong <joseph.wong@broadcom.com>
2025-10-01 16:20:23 +01:00
Michael Brown
4508e10233 [gve] Allow for descriptor and completion lengths to vary by mode
The descriptors and completions in the DQO operating mode are not the
same sizes as the equivalent structures in the GQI operating mode.
Allow the queue stride size to vary by operating mode (and therefore
to be known only after reading the device descriptor and selecting the
operating mode).

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-09-30 12:17:22 +01:00
Michael Brown
20a489253c [gve] Rename GQI-specific data structures and constants
Rename data structures and constants that are specific to the GQI
operating mode, to allow for a cleaner separation from other operating
modes.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-09-30 11:10:20 +01:00
Michael Brown
86b322d999 [gve] Allow for out-of-order buffer consumption
We currently assume that the buffer index is equal to the descriptor
ring index, which is correct only for in-order queues.

Out-of-order queues will include a buffer tag value that is copied
from the descriptor to the completion.  Redefine the data buffers as
being indexed by this tag value (rather than by the descriptor ring
index), and add a circular ring buffer to allow for tags to be reused
in whatever order they are released by the hardware.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-09-30 11:09:45 +01:00
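
The tag ring described in the commit above can be sketched as a small circular buffer of free buffer tags. All names and the ring size are hypothetical, and the sketch makes no claim about how the driver itself lays out its structures.

```c
#include <assert.h>
#include <stdint.h>

#define BUF_COUNT 4     /* Hypothetical number of data buffers */

/* Circular ring of buffer tags currently available for use */
struct tag_ring {
    uint8_t ids[BUF_COUNT];
    unsigned int prod;  /* Count of tags made available so far */
    unsigned int cons;  /* Count of tags handed out so far */
};

/* Initially every tag is available, in ascending order */
void tag_ring_init ( struct tag_ring *ring ) {
    unsigned int i;

    for ( i = 0 ; i < BUF_COUNT ; i++ )
        ring->ids[i] = i;
    ring->prod = BUF_COUNT;
    ring->cons = 0;
}

/* Take the next available tag, or return -1 if none are free */
int tag_ring_get ( struct tag_ring *ring ) {
    if ( ring->cons == ring->prod )
        return -1;
    return ring->ids[ ring->cons++ % BUF_COUNT ];
}

/* Return a tag in whatever order the hardware released it */
void tag_ring_put ( struct tag_ring *ring, unsigned int tag ) {
    assert ( tag < BUF_COUNT );
    assert ( ( ring->prod - ring->cons ) < BUF_COUNT );
    ring->ids[ ring->prod++ % BUF_COUNT ] = tag;
}
```

With an in-order queue the tags simply cycle 0, 1, 2, 3. With an out-of-order queue, a completion carrying tag 2 followed by one carrying tag 0 causes the next two buffers to be reused in exactly that order.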
Michael Brown
b8dd3c384b [gve] Add support for raw DMA addressing
Raw DMA addressing allows the transmit and receive descriptors to
provide the DMA address of the data buffer directly, without requiring
the use of a pre-registered queue page list.  It is modelled in the
device as a magic "raw DMA" queue page list (with QPL ID 0xffffffff)
covering the whole of the DMA address space.

When using raw DMA addressing, the transmit and receive datapaths
could use the normal pattern of mapping I/O buffers directly, and
avoid copying packet data into and out of the fixed queue page list
ring buffer.  However, since we must retain support for queue page
list addressing (which requires this additional copying), we choose to
minimise code size by continuing to use the fixed ring buffer even
when using raw DMA addressing.

Add support for using raw DMA addressing by setting the queue page
list base device address appropriately, omitting the commands to
register and unregister the queue page lists, and specifying the raw
DMA QPL ID when creating the TX and RX queues.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-09-29 15:13:55 +01:00
Michael Brown
9f554ec9d0 [gve] Add concept of a queue page list base device address
Allow for the existence of a queue page list where the base device
address is non-zero, as will be the case for the raw DMA addressing
(RDA) operating mode.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-09-29 15:13:55 +01:00
Michael Brown
91db5b68ff [gve] Set descriptor and completion ring sizes when creating queues
The "create TX queue" and "create RX queue" commands have fields for
the descriptor and completion ring sizes, which are currently left
unpopulated since they are not required for the original GQI-QPL
operating mode.

Populate these fields, and allow for the possibility that a transmit
completion ring exists (which will be the case when using the DQO
operating mode).

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-09-29 15:13:55 +01:00
Michael Brown
048a346705 [gve] Add concept of operating mode
The GVE family supports two incompatible descriptor queue formats:

  * GQI: in-order descriptor queues
  * DQO: out-of-order descriptor queues

and two addressing modes:

  * QPL: pre-registered queue page list addressing
  * RDA: raw DMA addressing

All four combinations (GQI-QPL, GQI-RDA, DQO-QPL, and DQO-RDA) are
theoretically supported by the Linux driver, which is essentially the
only public reference provided by Google.  The original versions of
the GVE NIC supported only GQI-QPL mode, and so the iPXE driver is
written to target this mode, on the assumption that it would continue
to be supported by all models of the GVE NIC.

This assumption turns out to be incorrect: Google does not deem it
necessary to retain backwards compatibility.  Some newer machine types
(such as a4-highgpu-8g) support only the DQO-RDA operating mode.

Add a definition of operating mode, and pass this as an explicit
parameter to the "configure device resources" admin queue command.  We
choose a representation that subtracts one from the value passed in
this command, since this happens to allow us to decompose the mode
into two independent bits (one representing the use of DQO descriptor
format, one representing the use of QPL addressing).

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-09-29 15:13:55 +01:00
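
The "subtract one" representation in the commit above can be sketched as follows. The constant and function names are hypothetical. The sketch also assumes that the command values follow the Linux driver's numbering (GQI-RDA = 1, GQI-QPL = 2, DQO-RDA = 3, DQO-QPL = 4), which the commit message does not spell out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mode bits: the two independent choices */
#define MODE_QPL 0x01   /* Queue page list addressing (else raw DMA) */
#define MODE_DQO 0x02   /* Out-of-order queues (else in-order GQI) */

/* Value passed in the admin queue command (assumed Linux numbering) */
uint8_t mode_to_format ( unsigned int mode ) {
    assert ( mode <= ( MODE_QPL | MODE_DQO ) );
    return ( ( uint8_t ) ( mode + 1 ) );
}

/* Human-readable name for a mode */
const char * mode_name ( unsigned int mode ) {
    static const char *names[] = {
        "GQI-RDA", "GQI-QPL", "DQO-RDA", "DQO-QPL",
    };

    return names[ mode & ( MODE_QPL | MODE_DQO ) ];
}
```

Under that assumed numbering, each half of the mode can be tested with a single bit mask (for example, mode & MODE_DQO) rather than by comparing against a list of enumerated values.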
Michael Brown
610089b98e [gve] Remove separate concept of "packet descriptor"
The Linux driver occasionally uses the terminology "packet descriptor"
to refer to the portion of the descriptor excluding the buffer
address.  This is not a helpful separation, and merely adds
complexity.

Simplify the code by removing this artificial separation.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-09-29 15:12:54 +01:00
Michael Brown
ee9aea7893 [gve] Parse option list returned in device descriptor
Provide space for the device to return its list of supported options.
Parse the option list and record the existence of each option in a
support bitmask.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-09-26 12:02:03 +01:00
Joseph Wong
6464f2edb8 [bnxt] Add error recovery support
Add support to advertise adapter error recovery support to the
firmware.  Implement error recovery operations if adapter fault is
detected.  Refactor memory allocation to better align with probe and
open functions.

Signed-off-by: Joseph Wong <joseph.wong@broadcom.com>
2025-09-18 13:25:07 +01:00
Michael Brown
61b4585e2a [efi] Drag in MNP driver whenever SNP driver is present
The chainloaded-device-only "snponly" driver already drags in support
for driving SNP, NII, and MNP devices, on the basis that the user
generally doesn't care which UEFI API is used and just wants to boot
from the same network device that was used to load iPXE.

The multi-device "snp" driver already drags in support for driving SNP
and NII devices, but does not drag in support for MNP devices.

There is essentially zero code size overhead to dragging in support
for MNP devices, since this support is always present in any iPXE
application build anyway (as part of the code to download
"autoexec.ipxe" prior to installing our own drivers).

Minimise surprise by dragging in support for MNP devices whenever
using the "snp" driver, following the same reasoning used for the
"snponly" driver.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-08-27 13:12:11 +01:00
Joseph Wong
a53ec44932 [bnxt] Update CQ doorbell type
Update completion queue doorbell to a non-arming type, since polling
is used.

Signed-off-by: Joseph Wong <joseph.wong@broadcom.com>
2025-08-13 12:36:20 +01:00
Michael Brown
8460dc4e8f [dwgpio] Use fdt_reg() to get GPIO port numbers
DesignWare GPIO port numbers are represented as unsized single-entry
regions.  Use fdt_reg() to obtain the GPIO port number, rather than
requiring access to a region cell size specification stored in the
port group structure.

This allows the field name "regs" in the port group structure to be
repurposed to hold the I/O register base address, which then matches
the common usage in other drivers.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-08-07 15:49:12 +01:00
Michael Brown
88ba011764 [fdt] Provide fdt_reg() for unsized single-entry regions
Many region types (e.g. I2C bus addresses) can only ever contain a
single region with no size cells specified.  Provide fdt_reg() to
reduce boilerplate in this common use case.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-08-07 15:49:09 +01:00
Michael Brown
2e4e1f7e9e [dwgpio] Add driver for the DesignWare GPIO controller
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-08-05 14:39:56 +01:00
Michael Brown
5f10b74555 [fdt] Use phandle as device location
Consumption of phandles will be in the form of locating a functional
device (e.g. a GPIO device, or an I2C device, or a reset controller)
by phandle, rather than locating the device tree node to which the
phandle refers.

Repurpose fdt_phandle() to obtain the phandle value (instead of
searching by phandle), and record this value as the bus location
within the generic device structure.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-08-04 14:52:00 +01:00
Michael Brown
f7a1e9ef8e [dwmac] Show core version in debug messages
Read and display the core version immediately after mapping the MMIO
registers, to provide a basic sanity check that the registers have
been correctly mapped and the core is not held in reset.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-07-30 15:59:38 +01:00
Michael Brown
01b1028d4e [bnxt] Remove unnecessary test_if macro
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-07-30 14:08:25 +01:00
Joseph Wong
6ca7a560a4 [bnxt] Remove unnecessary I/O macros
Remove unnecessary driver specific macros.  Use standard
pci_read_config_xxx, pci_write_config_xxx, writel/q calls.

Signed-off-by: Joseph Wong <joseph.wong@broadcom.com>
2025-07-30 14:03:51 +01:00
Michael Brown
e01e5ff7c6 [dwusb] Add driver for DesignWare USB3 host controller
Add a basic driver for the DesignWare USB3 host controller as found in
the Lichee Pi 4A.

This driver covers only the DesignWare host controller hardware.  On
the Lichee Pi 4A, this is sufficient to get the single USB root hub
port (exposed internally via the SODIMM connector) up and running.

The driver does not yet handle the various GPIOs that control power
and signal routing for the Lichee Pi 4A's onboard VL817 USB hub and
the four physical USB-A ports.  This therefore leaves the USB hub and
the USB-A ports unpowered, and the USB2 root hub port routed to the
physical USB-C port.  Devices plugged in to the USB-A ports will not
be powered up, and a device plugged in to the USB-C port will
enumerate as a USB2 device.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-07-21 15:55:13 +01:00
Michael Brown
6c42ea1275 [xhci] Allow for non-PCI xHCI host controllers
Allow for the existence of xHCI host controllers where the underlying
hardware is not a PCI device.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-07-21 15:33:58 +01:00
Michael Brown
eca97c2ee2 [xhci] Use root hub port number to determine slot type
We currently use the downstream hub's port number to determine the
xHCI slot type for a newly connected USB device.  The downstream hub
port number is irrelevant to the xHCI controller's supported protocols
table: the relevant value is the number of the root hub port through
which the device is attached.

Fix by using the root hub port number instead of the immediate parent
hub's port number.

This bug has not previously been detected since the slot type for the
first N root hub ports will invariably be zero to indicate that these
are USB ports.  For any xHCI controller with a sufficiently large
number of root hub ports, the code would therefore end up happening to
calculate the correct slot type value despite using an incorrect port
number.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-07-18 14:58:56 +01:00
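
The fix in the commit above amounts to walking up the hub chain before consulting the supported protocols table. A minimal sketch, using a hypothetical port structure rather than the real USB core types:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a USB port within a hub topology */
struct usb_port_model {
    /* Port through which the parent hub is attached (NULL at root) */
    const struct usb_port_model *upstream;
    /* Port number on its own hub */
    unsigned int address;
};

/* Number of the root hub port through which this port is reached */
unsigned int root_hub_port ( const struct usb_port_model *port ) {
    assert ( port != NULL );
    while ( port->upstream )
        port = port->upstream;
    return port->address;
}
```

For a device on port 3 of a hub that is itself attached via root hub port 5, the value to look up is 5. Using port 3 would inspect the wrong entry.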
Michael Brown
1e3fb1b37e [init] Show initialisation function names in debug messages
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-07-15 14:10:33 +01:00
Michael Brown
7ac4b3c6f1 [efi] Assume that vendor wireless drivers are unusable via SNP
The UEFI model for wireless network boot cannot sensibly be described
without cursing.  Commit 758a504 ("[efi] Inhibit calls to Shutdown()
for wireless SNP devices") attempts to work around some of the known
issues.

Experimentation shows that on at least some platforms (observed with a
Lenovo ThinkPad T14s Gen 5) the vendor SNP driver is broken to the
point of being unusable in anything other than the single use case
envisioned by the firmware authors.  Doing almost anything directly
via the SNP protocol interface has a greater than 50% chance of
locking up the system.

Assume, in the absence of any evidence to the contrary so far, that
vendor SNP drivers for wireless network devices are so badly written
as to be unusable.  Refuse to even attempt to interact with these
drivers via the SNP or NII protocol interfaces.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-07-15 09:12:54 +01:00
Michael Brown
c2cdc1d31e [dwmac] Add driver for DesignWare Ethernet MAC
Add a basic driver for the DesignWare Ethernet MAC network interface
as found in the Lichee Pi 4A.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-07-10 14:39:07 +01:00
Michael Brown
bbabde8ff8 [riscv] Invalidate data cache on completed RX DMA buffers
The data cache must be invalidated twice for RX DMA buffers: once
before passing ownership to the DMA device (in case the cache happens
to contain dirty data that will be written back at an undefined future
point), and once after receiving ownership from the DMA device (in
case the CPU happens to have speculatively accessed data in the buffer
while it was owned by the hardware).

Only the used portion of the buffer needs to be invalidated after
completion, since we do not care about data within the unused portion.

Update the DMA API to include the used length as an additional
parameter to dma_unmap(), and add the necessary second cache
invalidation pass to the RISC-V DMA API implementation.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-07-10 14:39:07 +01:00
Michael Brown
22de0c4edf [dma] Use virtual addresses for dma_map()
Cache management operations must generally be performed on virtual
addresses rather than physical addresses.

Change the address parameter in dma_map() to be a virtual address, and
make dma() the API-level primitive instead of dma_phys().

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-07-08 15:13:19 +01:00
Joseph Wong
6bc55d65b1 [bnxt] Update supported devices array
Add support for new device IDs. Remove device IDs which were never
in use.

Signed-off-by: Joseph Wong <joseph.wong@broadcom.com>
2025-07-02 16:18:33 -07:00
Joseph Wong
0020627777 [bnxt] Update device descriptions
Use human readable strings for dev_description in PCI_ROM array.

Signed-off-by: Joseph Wong <joseph.wong@broadcom.com>
2025-07-01 16:05:34 -07:00
Joseph Wong
126366ac47 [bnxt] Remove VLAN stripping logic
Remove logic that programs the hardware to strip out VLAN from RX
packets.  Do not drop packets due to VLAN mismatch and allow the upper
layer to decide whether to discard the packets.

Signed-off-by: Joseph Wong <joseph.wong@broadcom.com>
2025-06-29 14:21:51 +01:00
Joseph Wong
54392f0d70 [bnxt] Increase Tx descriptors
Increase TX and CMP descriptor counts.

Signed-off-by: Joseph Wong <joseph.wong@broadcom.com>
2025-06-25 14:05:33 +01:00
Michael Brown
d3e10ebd35 [legacy] Allocate legacy driver .bss-like segments at probe time
Some legacy drivers use large static allocations for transmit and
receive buffers.  To avoid bloating the .bss segment, we currently
implement these as a single common symbol named "_shared_bss" (which
is permissible since only one legacy driver may be active at any one
time).

Switch to dynamic allocation of these .bss-like segments, to avoid the
requirement for using common symbols.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-06-24 13:41:51 +01:00
Michael Brown
6ea800ab54 [legacy] Rename the global legacy NIC to "legacy_nic"
We currently have contexts in which the local variable "nic" is a
pointer to the global variable also called "nic".  This complicates
the creation of macros.

Rename the global variable to "legacy_nic" to reduce pollution of the
global namespace and to allow for the creation of macros referring to
fields within this global variable.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-06-24 13:41:51 +01:00
Michael Brown
d0c02e0df8 [legacy] Allocate extra padding in receive buffers
Allow for legacy drivers that include VLAN tags or CRCs within their
received packets.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-06-24 13:41:51 +01:00
Michael Brown
9ada09c919 [dwuart] Read input clock frequency from the device tree
The 16550 design includes a programmable 16-bit clock divider for an
arbitrary input clock, requiring knowledge of the input clock
frequency in order to calculate the divider value for a given baud
rate.  The 16550 UARTs in an x86 PC will always have a 1.8432 MHz
input clock.  Non-x86 systems may have other input clock frequencies.

Define the input clock frequency as a property of a 16550 UART, and
read the value from the device tree "clock-frequency" property.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-06-23 22:56:38 +01:00
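
The dependency on the input clock in the commit above follows from the standard 16550 relationship baud = clock / (16 x divisor). A minimal sketch of the calculation, with a hypothetical function name and rounding to the nearest divisor as a design choice of this example:

```c
#include <assert.h>
#include <stdint.h>

/* Input clock of the 16550 UARTs in an x86 PC */
#define UART_CLK_X86 1843200U

/* Calculate the 16-bit divisor for a given input clock and baud rate */
uint16_t uart_divisor ( uint32_t clock_hz, uint32_t baud ) {
    uint32_t divisor;

    assert ( baud != 0 );
    /* The 16550 oversamples by 16; round to the nearest integer */
    divisor = ( ( clock_hz + ( 8 * baud ) ) / ( 16 * baud ) );
    assert ( ( divisor >= 1 ) && ( divisor <= 0xffff ) );
    return ( ( uint16_t ) divisor );
}
```

With the x86 clock, 115200 baud gives a divisor of 1 and 9600 baud gives 12. With a 100 MHz input clock, 115200 baud gives 54, which is why the same divisor table cannot be reused on other platforms.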
Michael Brown
0ed1dea7f4 [uart] Wait for 16550 UART to become idle before modifying LCR
Some implementations of 16550-compatible UARTs (e.g. the DesignWare
UART) are known to ignore writes to the line control register while
the transmitter is active.

Wait for the transmitter to become empty before attempting to write to
the line control register.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-06-23 22:56:09 +01:00
Michael Brown
5d9f20bbd6 [dwuart] Add "ns16550a" compatible device ID
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-06-23 15:10:55 +01:00
Michael Brown
53a3befb69 [dwuart] Add a basic driver for the Synopsys DesignWare UART
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-06-21 23:34:32 +01:00
Michael Brown
cca1cfd49e [uart] Allow for dynamically registered 16550 UARTs
Use the generic UART driver-private data pointer, rather than
embedding the generic UART within the 16550 UART structure.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-06-21 23:34:32 +01:00
Michael Brown
6c8fb4b89d [uart] Allow for the existence of non-16550 UARTs
Remove the assumption that all platforms use a fixed number of 16550
UARTs identifiable by a simple numeric index.  Create an abstraction
allowing for dynamic instantiation and registration of any number of
arbitrary UART models.

The common case of the serial console on x86 uses a single fixed UART
specified at compile time.  Avoid unnecessarily dragging in the
dynamic instantiation code in this use case by allowing COMCONSOLE to
refer to a single static UART object representing the relevant port.

When selecting a UART by command-line argument (as used in the
"gdbstub serial <port>" command), allow the UART to be specified as
either a numeric index (to retain backwards compatibility) or a
case-insensitive port name such as "COM2".

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-06-20 12:52:04 +01:00
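
The name-or-index lookup described in the commit above could look something like the following. The name table and function are hypothetical, and treating the numeric index as a zero-based position in the list is an assumption of this sketch. Note that strcasecmp() comes from the POSIX <strings.h> header rather than from ISO C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <strings.h>

/* Hypothetical list of registered UART names, in registration order */
static const char *uart_names[] = { "COM1", "COM2", "COM3", "COM4" };
#define UART_COUNT ( sizeof ( uart_names ) / sizeof ( uart_names[0] ) )

/* Return the list position of the named UART, or -1 if none matches */
int uart_find_index ( const char *name ) {
    char *endp;
    unsigned long index;
    size_t i;

    assert ( name != NULL );

    /* Try a case-insensitive match against each port name */
    for ( i = 0 ; i < UART_COUNT ; i++ ) {
        if ( strcasecmp ( name, uart_names[i] ) == 0 )
            return ( ( int ) i );
    }

    /* Otherwise accept a purely numeric string as a list position */
    index = strtoul ( name, &endp, 10 );
    if ( ( name[0] != '\0' ) && ( *endp == '\0' ) &&
         ( index < UART_COUNT ) )
        return ( ( int ) index );

    return -1;
}
```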
Joseph Wong
1de3aef78c [bnxt] Remove TX padding
Remove unnecessary TX padding.

Signed-off-by: Joseph Wong <joseph.wong@broadcom.com>
2025-06-11 15:07:40 +01:00
Michael Brown
c4a3d438e6 [dt] Allow for creation of standalone devices
We will want to be able to create the console device as early as
possible.  Refactor devicetree probing to remove the assumption that a
devicetree device must have a devicetree parent, and expose functions
to allow a standalone device to be created given only the offset of a
node within the tree.

The full device path is no longer trivial to construct with this
assumption removed.  The full path is currently used only for debug
messages.  Remove the stored full path, use just the node name for
debug messages, and ensure that the topology information previously
visible in the full path is reconstructible from the combined debug
output if needed.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-06-11 13:02:20 +01:00
Michael Brown
bb2011241f [dt] Locate parent node at point of use in dt_ioremap()
We currently rely on the recursive nature of devicetree bus probing to
obtain the region cell size specification from the parent device.
This blocks the possibility of creating a standalone console device
based on /chosen/stdout-path before probing the whole bus.

Fix by using fdt_parent() to locate the parent device at the point of
use within dt_ioremap().

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-05-30 16:39:10 +01:00
Joseph Wong
1dd9ac13fd [bnxt] Use updated DMA APIs
Replace malloc_phys with dma_alloc, free_phys with dma_free, alloc_iob
with alloc_rx_iob, free_iob with free_rx_iob, virt_to_bus with dma or
iob_dma.  Replace dma_addr_t with physaddr_t.

Signed-off-by: Joseph Wong <joseph.wong@broadcom.com>
2025-05-14 14:21:02 +01:00
Joseph Wong
08edad7ca3 [bnxt] Return proper error codes in probe
Return the proper error codes in bnxt_init_one, to indicate the
correct return status upon completion.  Failure paths could
incorrectly indicate a success.  Correct assertion condition to check
for non-NULL pointer.

Signed-off-by: Joseph Wong <joseph.wong@broadcom.com>
2025-05-14 14:08:27 +01:00