1392 Commits

Author SHA1 Message Date
Michael Brown
0049243367 [ena] Switch to two-phase reset mechanism
The Linux and FreeBSD drivers for the (totally undocumented) ENA
adapters use a two-phase reset mechanism: first set ENA_CTRL.RESET and
wait for this to be reflected in ENA_STAT.RESET, then clear
ENA_CTRL.RESET and again wait for it to be reflected in
ENA_STAT.RESET.

The iPXE driver currently assumes a self-clearing reset mechanism,
which appeared to work at the time that the driver was created but
seems no longer to function, at least on the t3.nano and t3a.nano
instance types found in eu-west-1.

Switch to a simplified version of the two-phase reset mechanism as
used by Linux and FreeBSD.
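
The two-phase sequence can be sketched against a simulated device as follows. This is a minimal sketch, not the iPXE, Linux or FreeBSD code: `struct fake_ena`, `ENA_RESET_BIT`, the poll budget and all helper functions are hypothetical. A real driver would also delay between polls.

```c
#include <stdbool.h>
#include <stdint.h>

#define ENA_RESET_BIT 0x00000001u	/* hypothetical bit position */
#define ENA_RESET_MAX_WAIT 1000		/* hypothetical poll budget */

/* Simulated device: STAT follows CTRL after "lag" status reads */
struct fake_ena {
	uint32_t ctrl;
	uint32_t stat;
	unsigned int lag;	/* reads before STAT reflects a CTRL write */
	unsigned int pending;	/* reads remaining before STAT updates */
};

static void fake_write_ctrl(struct fake_ena *ena, uint32_t val) {
	ena->ctrl = val;
	ena->pending = ena->lag;
}

static uint32_t fake_read_stat(struct fake_ena *ena) {
	if (ena->pending)
		ena->pending--;
	else
		ena->stat = ena->ctrl;
	return ena->stat;
}

/* Poll until STAT.RESET matches the wanted state, or give up */
static bool ena_wait_reset(struct fake_ena *ena, bool want_set) {
	unsigned int i;

	for (i = 0; i < ENA_RESET_MAX_WAIT; i++) {
		bool is_set = (fake_read_stat(ena) & ENA_RESET_BIT) != 0;
		if (is_set == want_set)
			return true;
		/* a real driver would delay here */
	}
	return false;
}

/* Two-phase reset: set, wait for STAT; clear, wait for STAT again */
static int ena_reset_two_phase(struct fake_ena *ena) {
	fake_write_ctrl(ena, ena->ctrl | ENA_RESET_BIT);
	if (!ena_wait_reset(ena, true))
		return -1;	/* reset was never acknowledged */
	fake_write_ctrl(ena, ena->ctrl & ~ENA_RESET_BIT);
	if (!ena_wait_reset(ena, false))
		return -2;	/* reset was never released */
	return 0;
}
```

Waiting for ENA_STAT.RESET to follow ENA_CTRL.RESET in both directions is what distinguishes this from a self-clearing scheme, which would write the bit once and only wait for it to drop.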

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-02-13 19:08:45 +00:00
Christian Iversen
1af0fe04f8 [hermon] Add support for ConnectX-3 based cards
After a ton of tedious work, I am pleased to finally introduce full
support for ConnectX-3 cards in iPXE!

The work has been done by finding all publicly available versions of
the Mellanox Flexboot sources, cleaning them up, synthesizing a git
history from them, cleaning out non-significant changes, and
correlating with the iPXE upstream git history.

After this, a proof-of-concept diff was produced which allowed iPXE to
be compiled with rudimentary ConnectX-3 support. This diff was over
10k lines, and contained many changes that were not part of the core
driver.

Special thanks to Michael Brown <mcb30@ipxe.org> for answering my
barrage of questions, and helping brainstorm the development along the
way.

Signed-off-by: Christian Iversen <ci@iversenit.dk>
2021-02-02 00:37:43 +01:00
Michael Brown
6f1cb791ee [hermon] Avoid parsing length field on completion errors
The CQE length field will not be valid for a completion in error.
Avoid parsing the length field and just call the completion handler
directly.

In debug builds, also dump the queue pair context to allow for
inspection of the error.
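
The resulting control flow can be sketched as below, assuming a simplified completion entry with an error flag and a length field. The `struct fake_cqe` layout, the `complete_fn` signature and the `ECOMPLETION_ERROR` code are hypothetical stand-ins, not the Hermon CQE format or driver API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified completion queue entry */
struct fake_cqe {
	int is_error;		/* nonzero if the completion is in error */
	uint32_t byte_cnt;	/* only meaningful when is_error == 0 */
};

/* Hypothetical completion handler: rc < 0 signals an error */
typedef void (*complete_fn)(void *ctx, int rc, size_t len);

#define ECOMPLETION_ERROR (-5)	/* hypothetical error code */

static void handle_cqe(const struct fake_cqe *cqe, complete_fn complete,
		       void *ctx) {
	if (cqe->is_error) {
		/* Length field is not valid: skip parsing it entirely */
		complete(ctx, ECOMPLETION_ERROR, 0);
		return;
	}
	complete(ctx, 0, cqe->byte_cnt);
}
```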

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-02-01 23:08:49 +00:00
Michael Brown
8747241b3e [hermon] Make hermon_dump_xxx() functions no-ops on non-debug builds
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-02-01 23:00:05 +00:00
Michael Brown
410566cef7 [hermon] Minimise reset time
Check for reset completion by waiting for the device to respond to PCI
configuration cycles, as documented in the Programmer's Reference
Manual.  On the original ConnectX HCA, this reduces the time spent on
reset from 1000ms down to 1ms.
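
Waiting for configuration cycles to succeed can be sketched as a bounded poll. The sketch assumes the common PCI convention that reads from a device that is not responding return all-ones (0xffffffff); the `cfg_read_fn` callback, the 1ms step and the caller-supplied budget are illustrative assumptions, and the Programmer's Reference Manual may specify a different register or condition.

```c
#include <stdint.h>

/* Hypothetical accessor returning the Vendor/Device ID dword */
typedef uint32_t (*cfg_read_fn)(void *ctx);

/* Return elapsed polls (ms) until the device responds, or -1 on timeout */
static int wait_for_pci_response(cfg_read_fn read_id, void *ctx, int max_ms) {
	int ms;

	for (ms = 0; ms < max_ms; ms++) {
		if (read_id(ctx) != 0xffffffffu)
			return ms;
		/* a real driver would delay for 1ms here */
	}
	return -1;
}
```

Because the loop exits on the first successful read, a device that recovers quickly costs roughly one poll rather than a fixed worst-case delay.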

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-02-01 22:29:30 +00:00
Christian Iversen
7b2b35981f [hermon] Throttle debug output when sensing port type
When auto-detecting the initial port type, the Hermon driver will spam
the debug output without hesitation.  Add a short delay in each
iteration to fix this.

Signed-off-by: Christian Iversen <ci@iversenit.dk>
2021-02-01 12:35:22 +00:00
Christian Iversen
299c671f57 [hermon] Add a debug notice when initialization is complete
Signed-off-by: Christian Iversen <ci@iversenit.dk>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-02-01 12:30:25 +00:00
Christian Iversen
8b07c88df8 [hermon] Add support for port management event
Inspired by Flexboot, the function hermon_event_port_mgmnt_change() is
added to handle the HERMON_EV_PORT_MGMNT_CHANGE event type, which
updates the Infiniband subsystem.

Signed-off-by: Christian Iversen <ci@iversenit.dk>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-02-01 11:44:54 +00:00
Christian Iversen
d948ac6c61 [hermon] Adjust Ethernet work queue size
Hermon Ethernet work queues have more RX than TX entries, unlike most
other drivers.  This is possibly the source of some stochastic
deadlocks previously experienced with this driver.

Update the sizes to be in line with other drivers, and make them
slightly larger for better performance.  These new queue sizes have
been found to work well with ConnectX-3 hardware.

Signed-off-by: Christian Iversen <ci@iversenit.dk>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-02-01 11:12:26 +00:00
Michael Brown
e62c3e3513 [hermon] Use reset value suitable for ConnectX-3
The programming documentation states that the reset magic value is
"0x00000001 (Big Endian)", and the current code matches this by using
the value 0x01000000 for the implicitly little-endian writel().

Inspection of the FlexBoot source code reveals an exciting variety of
reset values, some suggestive of confusion around endianness.

Experimentation suggests that the value 0x01000001 works reliably
across a wide range of hardware.
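
The endianness arithmetic can be checked directly. This sketch assumes only that `writel()` stores its argument in little-endian byte order; `swap_bytes32()` and `le_bytes()` are local helpers written for the example. It also shows, as an observation rather than the commit's stated rationale, that 0x01000001 has the byte pattern 01 00 00 01 and therefore reads the same in either byte order.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value */
static uint32_t swap_bytes32(uint32_t x) {
	return ((x & 0x000000ffu) << 24) |
	       ((x & 0x0000ff00u) << 8) |
	       ((x & 0x00ff0000u) >> 8) |
	       ((x & 0xff000000u) >> 24);
}

/* Bytes in address order when val is stored little-endian (as writel()) */
static void le_bytes(uint32_t val, uint8_t out[4]) {
	int i;

	for (i = 0; i < 4; i++)
		out[i] = (uint8_t)(val >> (8 * i));
}
```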

Debugged-by: Christian Iversen <ci@iversenit.dk>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-02-01 01:53:15 +00:00
Christian Iversen
2e3d5909ee [hermon] Clean up whitespace in hermon.c
Signed-off-by: Christian Iversen <ci@iversenit.dk>
2021-02-01 01:48:29 +00:00
Christian Iversen
79031fee21 [iscsi] Update link to iBFT reference manual
Signed-off-by: Christian Iversen <ci@iversenit.dk>
2021-02-01 01:27:08 +01:00
Michael Brown
def46cf344 [hermon] Limit link poll frequency in DOWN state
Some older versions of the hardware (and/or firmware) do not report an
event when an Infiniband link reaches the INIT state.  The driver
works around this missing event by calling ib_smc_update() on each
event queue poll while the link is in the DOWN state.

Commit 6cb12ee ("[hermon] Increase polling rate for command
completions") addressed this by speeding up the time taken to issue
each command invoked by ib_smc_update().  Experimentation shows that
the impact is still significant: for example, in a situation where an
unplugged port is opened, the throughput on the other port can be
reduced by over 99%.

Fix by throttling the rate at which link polling is attempted.
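
Time-based throttling can be sketched as follows, assuming a millisecond tick counter supplied by the caller and a hypothetical 100ms minimum interval. The actual interval and timer source used by the driver are not stated here, and `struct link_poller` and `maybe_poll_link()` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

#define LINK_POLL_INTERVAL_MS 100	/* hypothetical throttle interval */

struct link_poller {
	bool polled_once;
	uint32_t last_poll_ms;
	unsigned int polls;	/* number of expensive updates issued */
};

/* Called on every event queue poll while the link is DOWN */
static void maybe_poll_link(struct link_poller *lp, uint32_t now_ms) {
	/* Unsigned subtraction copes with tick counter wraparound */
	if (lp->polled_once &&
	    (uint32_t)(now_ms - lp->last_poll_ms) < LINK_POLL_INTERVAL_MS)
		return;
	lp->polled_once = true;
	lp->last_poll_ms = now_ms;
	lp->polls++;	/* stands in for the expensive ib_smc_update() call */
}
```

Called once per millisecond for one second, this issues 10 updates instead of 1000, so the cost of a DOWN link no longer scales with the event queue poll rate.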

Debugged-by: Christian Iversen <ci@iversenit.dk>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-01-31 23:29:45 +00:00
Christian Iversen
43d72d0087 [hermon] Perform clean MPT unmap on device shutdown
This change is ported from Flexboot sources.  When stopping a Hermon
device, perform hermon_unmap_mpt() which runs HERMON_HCR_HW2SW_MPT to
bring the Memory Protection Table (MPT) back to software control.

Signed-off-by: Christian Iversen <ci@iversenit.dk>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-01-29 00:46:53 +00:00
Christian Iversen
699b9f1d1b [hermon] Use Ethernet MAC as eIPoIB local EMAC
The eIPoIB local Ethernet MAC is currently constructed from the port
GUID.  Given a base GUID/MAC value of N, Mellanox seems to populate:

  Node GUID:   N + 0
  Port 1 GUID: N + 1
  Port 2 GUID: N + 2

and

  Port 1 MAC:  N + 0
  Port 2 MAC:  N + 1

This causes a duplicate local MAC address when port 1 is configured as
Infiniband and port 2 as Ethernet, since both will derive their MAC
address as (N + 1).

Fix by using the port's Ethernet MAC as the eIPoIB local EMAC.  This
is a behavioural change that could potentially break configurations
that rely on the local EMAC value, such as a DHCP server relying on
the chaddr field for DHCP reservations.
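
The collision can be reproduced with plain arithmetic. This sketch treats GUIDs and MACs as integers offset from a shared base N, following the layout quoted above. The helper names are hypothetical, and the old GUID-to-EMAC derivation is simplified to "same offset as the port GUID".

```c
#include <stdint.h>

/* Layout from the commit message: offsets from a shared base N */
static uint64_t port_guid(uint64_t n, unsigned int port) { return n + port; }
static uint64_t port_mac(uint64_t n, unsigned int port) { return n + port - 1; }

/* Old scheme (simplified): eIPoIB local EMAC follows the port GUID */
static uint64_t emac_old(uint64_t n, unsigned int port) {
	return port_guid(n, port);
}

/* New scheme: eIPoIB local EMAC is the port's own Ethernet MAC */
static uint64_t emac_new(uint64_t n, unsigned int port) {
	return port_mac(n, port);
}
```

With port 1 as Infiniband and port 2 as Ethernet, `emac_old(n, 1)` and `port_mac(n, 2)` are both N + 1, whereas `emac_new(n, 1)` is N.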

Signed-off-by: Christian Iversen <ci@iversenit.dk>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-01-29 00:13:46 +00:00
Christian Iversen
6cb12ee2b0 [hermon] Increase polling rate for command completions
Some older versions of the hardware (and/or firmware) do not report an
event when an Infiniband link reaches the INIT state.  The driver
works around this missing event by calling ib_smc_update() on each
event queue poll while the link is in the DOWN state.  This results in
a very large number of commands being issued while any open Infiniband
link is in the DOWN state (e.g. unplugged), to the point that the 1ms
delay from waiting for each command to complete will noticeably affect
responsiveness.

Fix by decreasing the command completion polling delay from 1ms to
10us.
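
The effect of the delay step follows from the fact that a poll loop notices completion only at multiples of its delay. A sketch of that arithmetic, assuming the loop delays once before its first check; the 15us completion time used below is illustrative, not a measured Hermon latency, and `observed_latency_us()` is a hypothetical helper.

```c
#include <stdint.h>

/* Time at which a loop polling every step_us first observes completion */
static uint32_t observed_latency_us(uint32_t actual_us, uint32_t step_us) {
	uint32_t steps = (actual_us + step_us - 1) / step_us;	/* round up */

	if (steps == 0)
		steps = 1;	/* assumed: one delay before the first check */
	return steps * step_us;
}
```

For a command that completes after 15us, a 1ms step reports completion at 1000us and a 10us step at 20us. Over a burst of 1000 such commands, that is at least one second versus about 20ms.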

Signed-off-by: Christian Iversen <ci@iversenit.dk>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-01-28 23:47:00 +00:00
Michael Brown
7d32225b55 [hermon] Add event queue debug functions
Add hermon_dump_eqctx() for dumping the event queue context and
hermon_dump_eqes() for dumping any unconsumed event queue entries.

Originally-implemented-by: Christian Iversen <ci@iversenit.dk>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-01-28 22:30:56 +00:00
Christian Iversen
7c40227e18 [hermon] Increase command timeout from 2 to 10 seconds
Some commands (particularly in relation to device initialization) can
occasionally take longer than 2 seconds, and the Mellanox documentation
recommends a 10 second timeout.

Signed-off-by: Christian Iversen <ci@iversenit.dk>
2021-01-28 20:55:14 +00:00
Michael Brown
cd126c41bb [hermon] Add assorted debug error messages
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-01-28 20:52:36 +00:00
Michael Brown
ce45c8dc21 [hermon] Show "issuing command" messages only at DBGLVL_EXTRA
Originally-implemented-by: Christian Iversen <ci@iversenit.dk>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-01-28 17:29:36 +00:00
Christian Iversen
a2893dc18a [hermon] Reorganize PCI ROM list and document well-known product names
Signed-off-by: Christian Iversen <ci@iversenit.dk>
2021-01-28 17:23:05 +00:00
Christian Iversen
0e788c8eda [golan] Backport typo fix in nodnic_prm.h: s/HERMON/NODNIC/
Signed-off-by: Christian Iversen <ci@iversenit.dk>
2021-01-28 17:19:22 +00:00
Christian Iversen
36a892a7c7 [arbel] Clean up whitespace in MT25218_PRM.h header
Signed-off-by: Christian Iversen <ci@iversenit.dk>
2021-01-28 17:14:08 +00:00
Christian Iversen
414c842f06 [hermon] Clean up whitespace in MT25408_PRM.h header
Signed-off-by: Christian Iversen <ci@iversenit.dk>
2021-01-28 17:10:47 +00:00
Christian Iversen
b9de7e6eda [infiniband] Require drivers to specify the number of ports
Require drivers to report the total number of Infiniband ports.  This
is necessary to report the correct number of ports on devices with
dynamic port types.

For example, dual-port Mellanox cards configured for (eth, ib) would
be rejected by the subnet manager, because they report using "port 2,
out of 1".

Signed-off-by: Christian Iversen <ci@iversenit.dk>
2021-01-27 01:15:35 +00:00
Michael Brown
8e3826aa10 [build] Inhibit spurious array bounds warning on some versions of gcc
Some versions of gcc (observed with gcc 9.3.0 on NixOS Linux) produce
a spurious warning about an out-of-bounds array access for the
isa_extra_probe_addrs[] array.

Work around this compiler bug by redefining the array index as a
signed long, which seems to somehow avoid this spurious warning.

Debugged-by: Manuel Mendez <mmendez534@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-01-15 20:54:27 +00:00
Manuel Mendez
a5fb41873d [isa] Add missing #include <config/isa.h>
Signed-off-by: Manuel Mendez <mmendez534@gmail.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-01-13 23:01:27 +00:00
Michael Brown
c42f31bc8a [xhci] Avoid false positive Coverity warning
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-01-04 09:37:59 +00:00
Michael Brown
7ce3b84050 [xhci] Show meaningful error messages after command failures
Ensure that any command failure messages are followed up with an error
message indicating what the failed command was attempting to perform.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-01-03 19:12:00 +00:00
Michael Brown
017b345d5a [xhci] Fail attempts to issue concurrent commands
The xHCI driver can handle only a single command TRB in progress at
any one time.  Immediately fail any attempts to issue concurrent
commands (which should not occur in normal operation).
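
A minimal sketch of the guard, assuming a driver structure with a single `pending` slot for the in-flight command TRB. The structures, field names and the choice of `-EBUSY` are assumptions, not the actual xHCI driver code.

```c
#include <errno.h>
#include <stddef.h>

struct fake_trb { int type; };

struct fake_xhci {
	struct fake_trb *pending;	/* command TRB in progress, or NULL */
};

static int issue_command(struct fake_xhci *xhci, struct fake_trb *trb) {
	/* Only one command TRB may be in progress at any one time */
	if (xhci->pending)
		return -EBUSY;
	xhci->pending = trb;
	/* a real driver would enqueue the TRB and ring the doorbell here */
	return 0;
}

static void command_completed(struct fake_xhci *xhci) {
	xhci->pending = NULL;
}
```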

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2021-01-03 19:08:49 +00:00
Martin Habets
da491eaae7 [sfc] Update email addresses
Email addresses at solarflare.com will stop working, so update them.
Remove the email address for Shradha Shah, as she is no longer involved.
Update copyright notices for files touched.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2020-12-28 18:41:55 +00:00
Mohammed Taha
ce841946df [golan] Add new PCI IDs
Signed-off-by: Mohammed <mohammedt@mellanox.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2020-12-28 13:55:30 +00:00
Michael Brown
f47a45ea2d [iphone] Add iPhone tethering driver
USB tethering via an iPhone is unreasonably complicated due to the
requirement to perform a pairing operation that involves establishing
a TLS session over a completely unrelated USB function that speaks a
protocol that is almost, but not quite, entirely unlike TCP.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2020-12-16 13:29:06 +00:00
Michael Brown
13a6d17296 [xhci] Update driver to use DMA API
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2020-11-29 11:25:40 +00:00
Michael Brown
8d337ecdae [dma] Move I/O buffer DMA operations to iobuf.h
Include a potential DMA mapping within the definition of an I/O
buffer, and move all I/O buffer DMA mapping functions from dma.h to
iobuf.h.  This avoids the need for drivers to maintain a separate list
of DMA mappings for each I/O buffer that they may handle.

Network device drivers typically do not keep track of transmit I/O
buffers, since the network device core already maintains a transmit
queue.  Drivers will typically call netdev_tx_complete_next() to
complete a transmission without first obtaining the relevant I/O
buffer pointer (and will rely on the network device core automatically
cancelling any pending transmissions when the device is closed).

To allow this driver design approach to be retained, update the
netdev_tx_complete() family of functions to automatically perform the
DMA unmapping operation if required.  For symmetry, also update the
netdev_rx() family of functions to behave the same way.

As a further convenience for drivers, allow the network device core to
automatically perform DMA mapping on the transmit datapath before
calling the driver's transmit() method.  This avoids the need to
introduce a mapping error handling code path into the typically
error-free transmit methods.
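
The division of labour between the network device core and the driver can be sketched with simplified stand-ins. Everything here is hypothetical: `struct sketch_iob`, `struct sketch_mapping`, `sketch_map()`, `sketch_unmap()`, `sketch_tx()` and `sketch_tx_complete()` only model the idea that the core maps before calling the driver's transmit method and unmaps on completion. They are not iPXE's `struct io_buffer`, `iob_map_tx()` or `netdev_tx_complete()`.

```c
#include <stddef.h>

struct sketch_dma_device { int mapped_count; };

/* Mapping embedded in the buffer, recording which device mapped it */
struct sketch_mapping { struct sketch_dma_device *dma; };

struct sketch_iob {
	struct sketch_mapping map;	/* map.dma == NULL when unmapped */
	size_t len;
};

static int sketch_map(struct sketch_dma_device *dma, struct sketch_iob *iob) {
	iob->map.dma = dma;
	dma->mapped_count++;
	return 0;
}

static void sketch_unmap(struct sketch_iob *iob) {
	/* The recorded device lets code other than the driver unmap */
	iob->map.dma->mapped_count--;
	iob->map.dma = NULL;
}

/* Core completion path: unmap automatically if still mapped */
static void sketch_tx_complete(struct sketch_iob *iob) {
	if (iob->map.dma)
		sketch_unmap(iob);
}

/* Core transmit path: map on the driver's behalf if a DMA device is set */
static int sketch_tx(struct sketch_dma_device *netdev_dma,
		     struct sketch_iob *iob,
		     int (*driver_transmit)(struct sketch_iob *iob)) {
	int rc;

	if (netdev_dma) {
		rc = sketch_map(netdev_dma, iob);
		if (rc != 0)
			return rc;
	}
	rc = driver_transmit(iob);
	if (rc != 0)
		sketch_tx_complete(iob);	/* undo mapping on rejection */
	return rc;
}
```

The driver's transmit method never sees a mapping failure path, and it never needs to remember the buffer in order to unmap it.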

With these changes, the modifications required to update a typical
network device driver to use the new DMA API are fairly minimal:

- Allocate and free descriptor rings and similar coherent structures
  using dma_alloc()/dma_free() rather than malloc_phys()/free_phys()

- Allocate and free receive buffers using alloc_rx_iob()/free_rx_iob()
  rather than alloc_iob()/free_iob()

- Calculate DMA addresses using dma() or iob_dma() rather than
  virt_to_bus()

- Set a 64-bit DMA mask if needed using dma_set_mask_64bit() and
  thereafter eliminate checks on DMA address ranges

- Either record the DMA device in netdev->dma, or call iob_map_tx() as
  part of the transmit() method

- Ensure that debug messages use virt_to_phys() when displaying
  "hardware" addresses

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2020-11-28 20:26:28 +00:00
Michael Brown
70e6e83243 [dma] Record DMA device as part of DMA mapping if needed
Allow for dma_unmap() to be called by code other than the DMA device
driver itself.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2020-11-28 18:56:50 +00:00
Michael Brown
cf12a41703 [dma] Modify DMA API to simplify calculation of medial addresses
Redefine the value stored within a DMA mapping to be the offset
between physical addresses and DMA addresses within the mapped region.

Provide a dma() wrapper function to calculate the DMA address for any
pointer within a mapped region, thereby simplifying the use cases when
a device needs to be given addresses other than the region start
address.

On a platform using the "flat" DMA implementation, the DMA offset for
any mapped region is always zero, with the result that dma_map() can
be optimised away completely and dma() reduces to a straightforward
call to virt_to_phys().
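
The offset representation can be illustrated with a simplified mapping structure. `struct sketch_dma_mapping`, `sketch_map_region()` and `sketch_dma()` are hypothetical, and physical addresses are modelled as plain integers rather than the pointers that dma() is described as taking. The offset is stored as an unsigned value so that mappings to lower DMA addresses wrap correctly.

```c
#include <stdint.h>

/* Simplified mapping: offset from physical to DMA address in the region */
struct sketch_dma_mapping {
	uint64_t offset;	/* dma_start - phys_start, modulo 2^64 */
};

static void sketch_map_region(struct sketch_dma_mapping *map,
			      uint64_t phys_start, uint64_t dma_start) {
	map->offset = dma_start - phys_start;
}

/* DMA address of any physical address within the mapped region */
static uint64_t sketch_dma(const struct sketch_dma_mapping *map,
			   uint64_t phys) {
	return phys + map->offset;
}
```

A flat mapping yields an offset of zero, so `sketch_dma()` degenerates to returning the physical address, mirroring the virt_to_phys() case described above.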

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2020-11-25 16:15:55 +00:00
Michael Brown
24ef743778 [intelxl] Configure DMA mask as 64-bit
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2020-11-24 17:47:42 +00:00
Michael Brown
9e280aecb7 [intel] Configure DMA mask as 64-bit
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2020-11-24 17:46:39 +00:00
Michael Brown
03314e8da9 [intelxl] Update driver to use DMA API
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2020-11-21 13:35:11 +00:00
Michael Brown
76a7bfe939 [intelxl] Read PCI bus:dev.fn number from PFFUNC_RID register
For the physical function driver, the transmit queue needs to be
configured to be associated with the relevant physical function
number.  This is currently obtained from the bus:dev.fn address of the
underlying PCI device.

In the case of a virtual machine using the physical function via PCI
passthrough, the PCI bus:dev.fn address within the virtual machine is
unrelated to the real physical function number.  Such a function will
typically be presented to the virtual machine as a single-function
device.  The function number extracted from the PCI bus:dev.fn address
will therefore always be zero.

Fix by reading from the Function Requester ID Information Register,
which always returns the real PCI bus:dev.fn address as used by the
physical host.
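
A PCI bus:dev.fn value packs the bus into bits 15:8, the device into bits 7:3 and the function into bits 2:0. The sketch below decodes that encoding. The helper names are hypothetical, and the example values (a guest device at 00:03.0 backed by host function 3b:00.1) are invented for illustration, not read from real hardware.

```c
#include <stdint.h>

/* Decode a 16-bit PCI bus:dev.fn value (bus[15:8], dev[7:3], fn[2:0]) */
static unsigned int rid_bus(uint16_t rid) { return (rid >> 8) & 0xff; }
static unsigned int rid_dev(uint16_t rid) { return (rid >> 3) & 0x1f; }
static unsigned int rid_fn(uint16_t rid) { return rid & 0x07; }

static uint16_t make_rid(unsigned int bus, unsigned int dev, unsigned int fn) {
	return (uint16_t)(((bus & 0xff) << 8) | ((dev & 0x1f) << 3) |
			  (fn & 0x07));
}
```

A passed-through function presented as 00:03.0 decodes to function 0 inside the guest, even when the host address (here 3b:00.1) carries function 1. This is why the commit reads the host's view from the device instead.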

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2020-11-21 13:35:11 +00:00
Michael Brown
b6eb17cbd7 [intelxl] Read MAC address from PRTPM_SA[HL] instead of PRTGL_SA[HL]
The datasheet is fairly incomprehensible in terms of identifying the
appropriate MAC address for use by the physical function driver.
Choose to read the MAC address from PRTPM_SAH and PRTPM_SAL, which at
least matches the MAC address as selected by the Linux i40e driver.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2020-11-20 19:15:30 +00:00
Michael Brown
062711f1cf [intel] Use physical addresses in debug messages
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2020-11-16 15:07:03 +00:00
Michael Brown
810dc5d6c3 [realtek] Use physical addresses in debug messages
Physical addresses in debug messages are more meaningful from an
end-user perspective than potentially IOMMU-mapped I/O virtual
addresses, and have the advantage of being calculable without access
to the original DMA mapping entry (e.g. when displaying an address for
a single failed completion within a descriptor ring).

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2020-11-16 14:58:57 +00:00
Michael Brown
fc5cf18dab [efi] Use casts rather than virt_to_bus() for UNDI buffer addresses
For a software UNDI, the addresses in PXE_CPB_TRANSMIT.FrameAddr and
PXE_CPB_RECEIVE.BufferAddr are host addresses, not bus addresses.

Remove the spurious (and no-op) use of virt_to_bus() and replace with
a cast via intptr_t.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2020-11-15 23:36:17 +00:00
Michael Brown
83b8c0e211 [efi] Do not populate media header length in PXE transmit CPB
The UEFI specification defines PXE_CPB_TRANSMIT.DataLen as excluding
the length of the media header.  iPXE currently fills in DataLen as
the whole frame length (including the media header), along with
placing the media header length separately in MediaheaderLen.  On some
UNDI implementations (observed using a VMware ESXi 7.0b virtual
machine), this causes transmitted packets to include 14 bytes of
trailing garbage.

Match the behaviour of the EDK2 SnpDxe driver, which fills in DataLen
as the whole frame length (including the media header) and leaves
MediaheaderLen as zero.  This behaviour also violates the UEFI
specification, but is likely to work in practice since EDK2 is the
reference implementation.
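
The chosen convention can be sketched with a trimmed-down stand-in structure that models only the three fields discussed here. It is not the full UEFI `PXE_CPB_TRANSMIT` definition, and `fill_transmit_cpb()` is a hypothetical helper. The frame address is stored as a host address via an integer cast: fc5cf18dab uses intptr_t, while this sketch uses uintptr_t to avoid sign extension on 32-bit hosts.

```c
#include <stddef.h>
#include <stdint.h>

/* Trimmed-down stand-in for the transmit CPB (only the fields discussed) */
struct sketch_cpb_transmit {
	uint64_t FrameAddr;
	uint32_t DataLen;
	uint16_t MediaheaderLen;
};

static void fill_transmit_cpb(struct sketch_cpb_transmit *cpb,
			      const void *frame, size_t frame_len) {
	/* Host address, not a bus address, for a software UNDI */
	cpb->FrameAddr = (uint64_t)(uintptr_t)frame;
	/* Whole frame length including the media header, as EDK2 SnpDxe does */
	cpb->DataLen = (uint32_t)frame_len;
	/* Left as zero, again matching EDK2 SnpDxe */
	cpb->MediaheaderLen = 0;
}
```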

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2020-11-15 23:17:17 +00:00
Michael Brown
5439329c99 [intel] Update driver to use DMA API
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2020-11-13 19:55:22 +00:00
Michael Brown
580d9b00da [realtek] Update driver to use DMA API
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2020-11-05 20:18:29 +00:00
Michael Brown
be1c87b722 [malloc] Rename malloc_dma() to malloc_phys()
The malloc_dma() function allocates memory with specified physical
alignment, and is typically (though not exclusively) used to allocate
memory for DMA.

Rename to malloc_phys() to more closely match the functionality, and
to create name space for functions that specifically allocate and map
DMA-capable buffers.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2020-11-05 19:13:52 +00:00
Michael Brown
f560e7b70b [realtek] Reset NIC when closing interface if using legacy mode
The legacy transmit descriptor index is not reset by anything short of
a full device reset.  This can cause the legacy transmit ring to stall
after closing and reopening the device, since the hardware and
software indices will be out of sync.

Fix by performing a reset after closing the interface.  Do this only
if operating in legacy mode, since in C+ mode the reset is not
required and would undesirably clear additional state (such as the C+
command register itself).
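
The stall can be modelled with two ring indices. In this sketch, the `struct sketch_nic` fields, the hypothetical ring size, and the assumption that reopening the device resets only the driver's index are simplifications of the behaviour described above, not the Realtek register model.

```c
#include <stdbool.h>

#define SKETCH_TX_RING 4	/* hypothetical legacy ring size */

struct sketch_nic {
	bool legacy;		/* operating in legacy (non-C+) mode */
	unsigned int hw_idx;	/* hardware transmit descriptor index */
	unsigned int sw_idx;	/* driver transmit descriptor index */
};

static void sketch_transmit(struct sketch_nic *nic) {
	nic->sw_idx = (nic->sw_idx + 1) % SKETCH_TX_RING;
	nic->hw_idx = (nic->hw_idx + 1) % SKETCH_TX_RING;
}

/* Only a full device reset clears the hardware index */
static void sketch_reset(struct sketch_nic *nic) { nic->hw_idx = 0; }

/* Opening the interface restarts the driver's index at zero */
static void sketch_open(struct sketch_nic *nic) { nic->sw_idx = 0; }

static void sketch_close(struct sketch_nic *nic) {
	/* Reset only in legacy mode; in C+ mode it would clear extra state */
	if (nic->legacy)
		sketch_reset(nic);
}

static bool sketch_in_sync(const struct sketch_nic *nic) {
	return nic->hw_idx == nic->sw_idx;
}
```

Closing and reopening a legacy-mode device leaves both indices at zero. Reopening without the reset leaves the hardware one descriptor ahead of the driver, which is the out-of-sync condition that stalls the ring.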

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2020-11-04 14:35:19 +00:00