The existing virtio network driver has been somewhat hacked together
over the past two decades by multiple contributors, and includes a
substantial amount of logic that is almost but not quite duplicated
between the "legacy" and "modern" code paths.
Rip out the existing driver and replace it with a completely new driver
written based on the Virtual I/O Device specification document, not
derived from the Linux kernel driver.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit 3d43789 ("[lacp] Detect and ignore erroneously looped back LACP
packets") added protection against LACP packet storms that arise when
our own transmitted packets are somehow looped back to the same port,
but does not protect against a situation in which we have two
different ports that are externally bridged to each other.
This situation is unlikely to arise in practice since a properly
configured link partner should not be both sending and forwarding LACP
packets. Triggering this situation essentially requires our two ports
to be connected to a non-LACP-capable switch, while another port on
the same switch is connected to a separate device that is sending out
LACP packets.
Guard against this situation by using the MAC address of the first
network device as the LACP system identifier, thereby allowing the
loopback detection to reject any packets that were sent from any of
our ports.
Since the system identifier is no longer unique between ports, use the
guaranteed-unique network device scope ID as the group key to indicate
that we do not support aggregation.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The RSA-PSS signature scheme is crowbarred somewhat awkwardly into TLS
version 1.2. Certificates with the standard rsaEncryption OID in the
public key may be used with either PKCS#1 or RSA-PSS, which breaks the
straightforward mapping between the OID and the signature algorithm.
Extend the definition of a TLS signature hash algorithm to include a
required OID-identified algorithm in the certificate's public key.
This allows us to define signature schemes such as rsa_pss_rsae_sha256
where the signature scheme uses an algorithm that differs from the
algorithm identified in the certificate's public key.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add support for the RSA-PSS signature scheme as defined in RFC 8017
and required for TLS version 1.3.
Signature verification is deliberately implemented by first deriving
the salt value and then reconstructing the entire expected signature.
This is arguably inefficient since it involves two invocations of the
mask generation function when only one is required. However, this
implementation approach keeps the code size minimal (since there is no
need to implement separate verification logic), and makes it provably
impossible to accidentally omit a verification step (such as checking
the leading zero bits or the fixed 0x01 or 0xbc bytes). Since
signature verification is not a fast-path operation, the guaranteed
correctness is more valuable than a marginally faster execution.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The RSA-PSS signature scheme has the same basic structure as the
existing PKCS#1 signature scheme, with a difference only in how the
digest value is encoded before being enciphered.
Abstract out the digest encoding from the signature and verification
methods, and add an explicit "pkcs1" to the relevant method names.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Make the public key self-tests fully deterministic by temporarily
overriding the function used to obtain random data for RSA encryption.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Modify bnxt_hwrm_run() to accept a flag indicating whether to abort
immediately upon a command failure. On the initialization path, the
driver will continue to abort on the first error. During teardown, the
sequence will continue executing subsequent cleanup commands even if
one fails. This ensures a best-effort cleanup.
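The abort-or-continue behaviour described above can be sketched as
follows. This is a minimal illustration and not the bnxt driver code:
the names cmd_func_t, run_cmds() and run_demo() are hypothetical, and
returning the first error seen during a best-effort run is an
assumption.

```c
#include <stddef.h>

/* Hypothetical command type: each command counts itself as executed */
typedef int ( * cmd_func_t ) ( unsigned int *executed );

/* Hypothetical command that succeeds */
static int cmd_ok ( unsigned int *executed ) {
	( *executed )++;
	return 0;
}

/* Hypothetical command that fails */
static int cmd_fail ( unsigned int *executed ) {
	( *executed )++;
	return -1;
}

/* Run a command sequence, optionally aborting on the first failure */
static int run_cmds ( cmd_func_t *cmds, size_t count,
		      unsigned int *executed, int abort_on_fail ) {
	int first_rc = 0;
	int rc;
	size_t i;

	for ( i = 0 ; i < count ; i++ ) {
		rc = cmds[i] ( executed );
		if ( rc == 0 )
			continue;
		/* Initialization style: stop at the first error */
		if ( abort_on_fail )
			return rc;
		/* Teardown style: remember the error and keep going */
		if ( first_rc == 0 )
			first_rc = rc;
	}
	return first_rc;
}

/* Demonstration: returns the number of commands actually executed */
static unsigned int run_demo ( int abort_on_fail ) {
	cmd_func_t cmds[] = { cmd_ok, cmd_fail, cmd_ok };
	unsigned int executed = 0;
	int rc;

	rc = run_cmds ( cmds, ( sizeof ( cmds ) / sizeof ( cmds[0] ) ),
			&executed, abort_on_fail );
	return ( ( rc == -1 ) ? executed : 0 );
}
```

With the flag set, the third command is never reached. With the flag
clear, all three commands run and the failure is still reported.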
Signed-off-by: Joseph Wong <joseph.wong@broadcom.com>
Improve the readability of the completion queue servicing logic by
using an explicit function call in each case statement, rather than
falling through to the next statement. Add a debug print in the ring
allocation path. Fix a typo in the PCI ROM entry.
Signed-off-by: Joseph Wong <joseph.wong@broadcom.com>
The regparm function attribute is meaningful only for i386, not for
x86_64, and is reported as a build error by GCC 16.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The EFI device path settings are currently registered as the
"netX.dhcp" settings block, in order that they will be automatically
overridden if a real DHCP configuration takes place. This does not
work as expected in an IPv6-only network, since the IPv6 configurator
will register "netX.ndp" rather than "netX.dhcp".
Fix by registering the EFI device path settings as either "netX.dhcp"
or "netX.ndp" based on the first address family encountered within the
device path.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
RFC 5246 defines the signature_algorithms extension values for TLS
version 1.2 as {HashAlgorithm, SignatureAlgorithm} pairs. RFC 8446
redefines the signature_algorithms extension values
for TLS version 1.3 in a backwards-compatible way as opaque 16-bit
SignatureScheme values, and RFC 8447 updates RFC 5246 to allow these
values to be used with TLS version 1.2.
Redefine our concept of a signature algorithm identifier to remove the
internal structure that no longer exists.
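As an illustration of why the internal structure can be dropped, the
sketch below treats identifiers as opaque 16-bit values. The numeric
values are taken from RFC 5246 and RFC 8446. The names tls12_pair(),
tls_scheme_offered() and scheme_demo() are hypothetical and are not
taken from the iPXE source.

```c
#include <stddef.h>
#include <stdint.h>

/* SignatureScheme values as assigned in RFC 8446 */
#define TLS_RSA_PKCS1_SHA256		0x0401
#define TLS_ECDSA_SECP256R1_SHA256	0x0403
#define TLS_RSA_PSS_RSAE_SHA256		0x0804

/* TLS 1.2 view: a { hash, signature } byte pair on the wire */
static uint16_t tls12_pair ( uint8_t hash, uint8_t sig ) {
	return ( ( uint16_t ) ( ( hash << 8 ) | sig ) );
}

/* Opaque view: compare whole 16-bit values, never the two halves */
static int tls_scheme_offered ( const uint16_t *offered, size_t count,
				uint16_t scheme ) {
	size_t i;

	for ( i = 0 ; i < count ; i++ ) {
		if ( offered[i] == scheme )
			return 1;
	}
	return 0;
}

/* Demonstration against a fixed, illustrative list of schemes */
static int scheme_demo ( uint16_t scheme ) {
	static const uint16_t offered[] = {
		TLS_RSA_PSS_RSAE_SHA256,
		TLS_RSA_PKCS1_SHA256,
	};

	return tls_scheme_offered ( offered,
				    ( sizeof ( offered ) /
				      sizeof ( offered[0] ) ),
				    scheme );
}
```

The older pairs such as { sha256(4), rsa(1) } happen to coincide with
the newer values, but 0x0804 does not decompose into any hash and
signature algorithm defined by RFC 5246, which is why only whole-value
comparison is meaningful.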
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The null crypto algorithms are intended to do nothing: the null digest
algorithm accepts all input and generates a zero-length digest, and
the null cipher algorithm simply copies the input unmodified to the
output.
The null public-key algorithm currently does nothing successfully.
Unlike the null digest and cipher algorithms, the null public-key
algorithm's methods are never called.
Change the null public-key algorithm to fail all operations, thereby
allowing its methods to be used as stubs by algorithms such as ECDSA
that do not implement all of the possible public-key operations.
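The stub idea can be sketched as follows. The structure layout, the
function names and the choice of -ENOTSUP as the failure code are all
assumptions made for illustration, and the "toy" verification is a
placeholder rather than any real signature scheme.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified public-key operations table */
struct pubkey_ops {
	int ( * encrypt ) ( const void *in, size_t len, void *out );
	int ( * verify ) ( const void *digest, size_t len,
			   const void *sig, size_t sig_len );
};

/* Null operations: always fail (error code is an assumption) */
static int null_encrypt ( const void *in, size_t len, void *out ) {
	( void ) in;
	( void ) len;
	( void ) out;
	return -ENOTSUP;
}

static int null_verify ( const void *digest, size_t len,
			 const void *sig, size_t sig_len ) {
	( void ) digest;
	( void ) len;
	( void ) sig;
	( void ) sig_len;
	return -ENOTSUP;
}

/* Toy verification: accepts only a "signature" equal to the digest */
static int toy_verify ( const void *digest, size_t len,
			const void *sig, size_t sig_len ) {
	if ( ( len != sig_len ) || ( memcmp ( digest, sig, len ) != 0 ) )
		return -EINVAL;
	return 0;
}

/* The null algorithm fails every operation */
static const struct pubkey_ops null_ops = {
	.encrypt = null_encrypt,
	.verify = null_verify,
};

/* A signature-only algorithm reuses the null stub for encryption */
static const struct pubkey_ops toy_ops = {
	.encrypt = null_encrypt,
	.verify = toy_verify,
};
```

A caller that attempts encryption through toy_ops then gets a clean
failure rather than a silent success.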
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The != operator has higher precedence than = in C, so the expressions:
rc = imgacquire ( ..., image ) != 0
are parsed as:
rc = ( imgacquire ( ..., image ) != 0 )
This assigns the boolean result (0 or 1) to rc instead of the actual
return code from imgacquire(). As a result, strerror(rc) reports an
incorrect error message when debugging is enabled.
Add parentheses around each assignment to ensure rc captures the
actual return value, matching the pattern already used in
efi_autoexec_filesystem() within the same file.
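The difference can be reproduced in isolation. In this sketch,
fake_acquire() is a hypothetical stand-in for imgacquire() that always
fails with -2, and the two wrapper functions exist only for
demonstration.

```c
/* Hypothetical stand-in for imgacquire(): always fails with -2 */
static int fake_acquire ( void ) {
	return -2;
}

/* Buggy form: "!=" binds tighter than "=", so rc becomes 0 or 1 */
static int acquire_buggy ( void ) {
	int rc;

	if ( ( rc = fake_acquire() != 0 ) )
		return rc;
	return 0;
}

/* Fixed form: parenthesised assignment keeps the real return value */
static int acquire_fixed ( void ) {
	int rc;

	if ( ( rc = fake_acquire() ) != 0 )
		return rc;
	return 0;
}
```

Both forms take the error path, which is why the bug affects only the
reported error and not the control flow.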
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add support for the HMAC-based Extract-and-Expand Key Derivation
Function (HKDF) as used in TLS version 1.3 and defined in RFC 5869.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit 988243c ("[virtio] Add virtio-net 1.0 support") erroneously
placed the code to unmap the device regions before the code to
unregister the network device. In the common case that the network
device is still open at the time that we shut down to boot the OS,
this results in the regions being accessed after having been unmapped.
For 32-bit BIOS or for UEFI with no IOMMU enabled, the iounmap()
operation is a no-op and so the driver still happens to work despite
the ordering bug. For 64-bit BIOS or for UEFI with an IOMMU enabled,
the iounmap() operation is not a no-op, and the driver will trigger a
page fault.
Fix by moving the call to unregister_netdev() to before the code that
unmaps the device regions.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The unused RX I/O buffers are currently freed without being deleted
from the list, with the list head being reinitialised only after all
buffers have been freed. This triggers assertion failures due to
the list integrity checks when debugging is enabled.
Fix by deleting each buffer individually, so that the list structure
remains valid at all times.
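The unlink-then-free pattern can be sketched with a minimal
doubly-linked list. This is a self-contained illustration and not
iPXE's list.h or the driver code: struct rx_buffer, free_rx_buffers()
and rx_free_demo() are hypothetical names.

```c
#include <stdlib.h>

/* Minimal circular doubly-linked list (illustrative only) */
struct list_head {
	struct list_head *next;
	struct list_head *prev;
};

/* Hypothetical RX buffer with the list entry as its first member */
struct rx_buffer {
	struct list_head list;
	unsigned char data[64];
};

static void list_init ( struct list_head *head ) {
	head->next = head;
	head->prev = head;
}

static void list_add_tail ( struct list_head *entry,
			    struct list_head *head ) {
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

static void list_del ( struct list_head *entry ) {
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = NULL;
	entry->prev = NULL;
}

/* Unlink each buffer before freeing it; returns the number freed */
static unsigned int free_rx_buffers ( struct list_head *head ) {
	struct rx_buffer *buf;
	unsigned int count = 0;

	while ( head->next != head ) {
		/* The list entry is the first member, so this cast is valid */
		buf = ( ( struct rx_buffer * ) head->next );
		list_del ( &buf->list );
		free ( buf );
		count++;
	}
	return count;
}

/* Demonstration: queue n buffers, free them, check the list is empty */
static int rx_free_demo ( unsigned int n ) {
	struct list_head head;
	struct rx_buffer *buf;
	unsigned int queued = 0;
	unsigned int i;

	list_init ( &head );
	for ( i = 0 ; i < n ; i++ ) {
		buf = malloc ( sizeof ( *buf ) );
		if ( ! buf )
			break;
		list_add_tail ( &buf->list, &head );
		queued++;
	}
	if ( free_rx_buffers ( &head ) != queued )
		return 0;
	return ( ( head.next == &head ) && ( head.prev == &head ) );
}
```

Because every entry is unlinked before it is freed, the list head never
points at freed memory and needs no separate reinitialisation.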
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit b9d68b9 ("[ethernet] Use standard 1500 byte MTU unless
explicitly overridden") added code to explicitly set the MTU for
virtio-net devices, but only on the legacy probe path.
Make the behaviour consistent by setting the MTU on both legacy and
modern probe paths.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Alibaba Cloud will refuse to use images for some instance types unless
the image is explicitly marked as supporting NVMe disks.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The underlying snapshots are not automatically deleted along with the
image, and there is no flag that can be set to cause them to be
automatically deleted.
Tag the underlying snapshots for deletion before deleting the image,
delete the image, and then delete any such tagged snapshots (including
any that may remain from a previous failed deletion attempt).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add a workflow to build and import the official iPXE images for
Alibaba Cloud. As with the AWS and Google Cloud imports, treat this
as a workflow that must be triggered manually.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Experimentation suggests Alibaba Cloud API calls are extremely
unreliable, with a failure rate around 1%. It is therefore necessary
to allow for retrying basically every API call.
Some API calls (e.g. DescribeImages or ModifyImageAttribute) are
naturally idempotent and so safe to retry. Some non-idempotent API
calls (e.g. CopyImage) support explicit idempotence tokens. The
remaining API calls may simply fail on a retry, if the original
request happened to succeed but failed to return a response.
We could write convoluted retry logic around the non-idempotent calls,
but this would substantially increase the complexity of the already
unnecessarily complex code. For now, we assume that retrying
non-idempotent requests is probably more likely to fix transient
failures than to cause additional problems.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The CopyImage API call does work, but is unacceptably slow due to rate
limiting. Importing a full set of images to all regions can take
several hours (and is likely to fail at some point due to transient
errors in making API calls).
Resort to a mixture of strategies to get images imported to all
regions:
- For regions with working OSS that are not blocked by Chinese state
censorship laws, upload the image files to an OSS bucket and then
import the images.
- For regions with working OSS that are blocked by Chinese state
censorship laws but that have working FC, use a temporary FC
function to copy the image files from the uncensored OSS buckets
and then import the images. Attempt downloads from a variety of
uncensored buckets, since cross-region OSS traffic tends to
experience a failure rate of around 10% of requests.
- For regions that have working OSS but are blocked by Chinese state
censorship laws and do not have working FC, or for regions that
don't even have working OSS, resort to using CopyImage to copy the
previously imported images from another region. Spread the
imports across as many source regions as possible to minimise the
effect of the CopyImage rate limiting.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Spinning up ECS instances is supported in all ECS regions (unlike
Function Compute), but turns out to be unacceptably unreliable since
Alibaba Cloud has a very irritating tendency to fail to launch ECS
instances for a variety of spurious and unpredictable reasons.
Rewrite the censorship bypass mechanism to use the (extremely slow)
CopyImage API call to copy an imported image from an uncensored region
to a censored region.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Function Compute is unsupported in several Alibaba Cloud regions.
Rewrite the censorship bypass mechanism to access OSS buckets using a
temporary ECS instance instead of a temporary Function Compute
function.
Importing images now requires that the account has been prepared using
the "ali-setup" script, which creates the necessary role, VPCs, and
vSwitches to allow ECS instances to be launched in each region.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Importing images into Alibaba Cloud currently relies upon using a
temporary Function Compute function to work around Chinese state
censorship laws that prevent direct access to OSS bucket contents in
mainland China regions.
Unfortunately, Alibaba Cloud regions are extremely asymmetric in terms
of feature support. (For example, some regions do not even support
IPv6 networking.) Several mainland China regions do not support
Function Compute, and so this workaround is not available for those
regions.
A possible alternative censorship workaround is to create temporary
ECS virtual machine instances instead of temporary Function Compute
functions. This requires the existence of a role that can be used by
ECS instances to access OSS. We cannot use the AliyunFcDefaultRole
that is currently used by Function Compute, since this role cannot be
assumed by ECS instances.
Creating roles is a privileged operation, and it would be sensible to
assume that the image importer (which may be running as part of a
GitHub Actions workflow) may not have permission to itself create a
suitable temporary role. The censorship bypass role must therefore be
set up once in advance by a suitably privileged user.
Add the ability to create a suitable censorship bypass role to the
Alibaba Cloud setup utility.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Creating ad hoc instances in Alibaba Cloud is extremely cumbersome and
tedious due to the need to specify an explicit vSwitch and security
group, with no defaults being available.
Add a utility that will create a VPC within each region, a vSwitch
within each zone within each region, and a security group within each
region.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Update the descriptive text for the disk log console tools to remove
references to INT13, since these now work for both BIOS and UEFI disk
log consoles.
Leave the script names as {aws,gce,ali}-int13con, to avoid breaking
any existing tooling that might use these names.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow the UEFI CPU architecture to be detected for the partitioned
disk images generated by genfsimg as of commit 2c84b68 ("[build] Use a
partition table in generated USB disk images").
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add support for a disk log partition console, using the same on-disk
structures as for the BIOS INT13 console.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Split out the generic portions of the INT13 disk log console support
to a separate file that can be shared between BIOS and UEFI platforms.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The name "int13" is intrinsically specific to a BIOS environment.
Generalise the build configuration option CONSOLE_INT13 to
CONSOLE_DISKLOG, in preparation for adding EFI disk log console
support.
Existing configurations using CONSOLE_INT13 will continue to work.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The workaround used for UEFI in commit 926816c ("[efi] Pad transmit
buffer length to work around vendor driver bugs") is also applicable
to the BIOS UNDI driver.
Apply the same workaround of padding the transmit I/O buffers to the
minimum Ethernet frame length before passing them to the underlying
UNDI driver's transmit function.
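The padding step can be sketched as follows, taking the minimum
Ethernet frame length as 60 bytes (excluding the frame check
sequence). The function names and the plain byte buffer are
hypothetical simplifications and do not represent the iPXE I/O buffer
API.

```c
#include <stddef.h>
#include <string.h>

/* Minimum Ethernet frame length, excluding the frame check sequence */
#define ETH_ZLEN 60

/* Zero-pad a frame in place; returns the length to transmit, or 0 */
static size_t pad_tx_frame ( unsigned char *buf, size_t len,
			     size_t buf_size ) {
	/* Reject frames that cannot be held or padded within the buffer */
	if ( ( len > buf_size ) || ( buf_size < ETH_ZLEN ) )
		return 0;
	if ( len < ETH_ZLEN ) {
		memset ( ( buf + len ), 0, ( ETH_ZLEN - len ) );
		len = ETH_ZLEN;
	}
	return len;
}

/* Demonstration: pad a frame and check that the padding is zeroed */
static size_t pad_demo ( size_t len ) {
	unsigned char frame[128];
	size_t padded;
	size_t i;

	memset ( frame, 0xff, sizeof ( frame ) );
	padded = pad_tx_frame ( frame, len, sizeof ( frame ) );
	for ( i = len ; i < padded ; i++ ) {
		if ( frame[i] != 0 )
			return 0;
	}
	return padded;
}
```

Frames that are already at least the minimum length are passed through
unchanged.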
Reported-by: Alexander Patrakov <patrakov@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Commit cb95b5b ("[efi] Veto the Dhcp6Dxe driver on all platforms")
vetoed the Dhcp6Dxe driver to work around the bug described at
https://github.com/tianocore/edk2/issues/10506 that results in
EfiDhcp6Stop() getting stuck in a tight loop waiting for an event that
will never occur.
Since we now call UnloadImage() at TPL_APPLICATION, we no longer
trigger the bug in Dhcp6Dxe, and so the veto may be removed.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
As of commit c3376f8 ("[efi] Drop to external TPL for calls to
ConnectController()"), the veto mechanism will drop to TPL_APPLICATION
for calls to DisconnectController().
Match this behaviour for calls to UnloadImage(), since that is likely
to result in calls to DisconnectController(). For example, any EDK2
driver using NetLibDefaultUnload() as its unload handler will call
DisconnectController() to disconnect itself from all handles.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
On Ubuntu/Debian, syslinux-common installs mbr.bin to
/usr/lib/syslinux/mbr/mbr.bin. This path is not currently searched by
find_syslinux_file(), causing USB disk image generation to fail with
"could not find mbr.bin".
Add /usr/lib/syslinux/mbr, /usr/share/syslinux/mbr, and
/usr/local/share/syslinux/mbr to the search paths.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
For UEFI, the USB disk image is constructed from the built EFI binary
(e.g. bin-x86_64-efi/ipxe.efi) by genfsimg, which does not itself have
any way to access the build configuration. We therefore need a way to
annotate the binary such that genfsimg can determine whether or not to
include a log partition within the USB disk image.
The "OEM ID" and "OEM information" fields within the PE header can be
used for this, since they are easily accessed and serve no other
purpose. We define bit 0 of "OEM information" as a flag indicating
that a log partition should be included. If this bit is set, genfsimg
will create a log partition with a layout matching that of the BIOS
build (i.e. using partition 3 and at an offset of 16kB from the start
of the disk).
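A consumer of this flag could test it roughly as shown below. The
sketch assumes that "OEM information" is the 16-bit little-endian
e_oeminfo field at offset 0x26 of the MZ header that begins a PE
image. That offset and all of the names here are assumptions, not
taken from genfsimg or elf2efi.c.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed offset of the 16-bit "OEM information" field (e_oeminfo) */
#define OEM_INFO_OFFSET 0x26

/* Bit 0 of "OEM information": a log partition is requested */
#define OEM_INFO_LOG_PARTITION 0x0001

/* Check whether an image header requests a log partition */
static int wants_log_partition ( const uint8_t *image, size_t len ) {
	uint16_t oem_info;

	if ( len < ( OEM_INFO_OFFSET + 2 ) )
		return 0;
	/* Field is little-endian */
	oem_info = ( ( uint16_t ) ( image[OEM_INFO_OFFSET] |
				   ( image[OEM_INFO_OFFSET + 1] << 8 ) ) );
	return ( ( oem_info & OEM_INFO_LOG_PARTITION ) ? 1 : 0 );
}

/* Demonstration: build a 64-byte header with the given field value */
static int log_partition_demo ( uint16_t oem_info ) {
	uint8_t header[64];

	memset ( header, 0, sizeof ( header ) );
	header[OEM_INFO_OFFSET] = ( oem_info & 0xff );
	header[OEM_INFO_OFFSET + 1] = ( oem_info >> 8 );
	return wants_log_partition ( header, sizeof ( header ) );
}
```

Only bit 0 is examined, so other bits of the field remain free for
other uses.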
The PE header is constructed by elf2efi.c, which takes as an input the
linked ELF form of the binary. We use an ELF .note section to allow
any linked-in object to communicate the log partition request through
to elf2efi.c, which then populates the OEM information field
accordingly.
We choose to use the same field locations within the BIOS bzImage
header, since this allows genfsimg to use the same logic for both BIOS
and UEFI binaries. In a BIOS build, there is no external processing
equivalent to elf2efi.c, and so we construct the field value directly
using absolute symbols and explicit relocation records.
(Note that the bzImage header is relevant only when using genfsimg to
construct a combined BIOS/UEFI image. In the common case of building
a BIOS-only image such as bin/ipxe.usb, the partition table is
manually constructed by usbdisk.S and genfsimg is not involved.)
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The syslinux function check_fat_bootsect() performs some sanity checks
to ensure that the filesystem type string (e.g. "FAT12") is correct
for the total number of clusters in the FAT. There is unfortunately a
bug in its calculation of the number of sectors occupied by the root
directory, which causes it to underestimate the number of sectors by a
factor of 32.
When the total number of clusters is close to the FAT12 limit of 4096,
this bug can cause syslinux to erroneously report that the filesystem
has "more than 4084 clusters but claims FAT12".
Work around this bug by selecting an explicit cluster size in order to
avoid potentially problematic cluster counts. We default to using 4kB
clusters, doubling to 8kB if using 4kB would result in a total cluster
count near 4096 (the FAT12 limit) or near 65536 (the FAT16 limit).
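The arithmetic can be sketched as follows. The buggy variant shows one
plausible way to obtain a 32-fold underestimate (omitting the 32-byte
directory entry size) and is an assumption rather than a quotation of
the syslinux source. The "margin" parameter defining what counts as
"near" a limit is likewise hypothetical, and 512-byte sectors are
assumed.

```c
/* Size of one FAT directory entry, in bytes */
#define FAT_DIRENT_LEN 32

/* Cluster count limits as described above */
#define FAT12_LIMIT 4096UL
#define FAT16_LIMIT 65536UL

/* Sectors occupied by the root directory (rounded up) */
static unsigned int root_dir_sectors ( unsigned int entries,
				       unsigned int sector_len ) {
	return ( ( ( entries * FAT_DIRENT_LEN ) + sector_len - 1 ) /
		 sector_len );
}

/* Assumed form of the bug: the entry size is omitted */
static unsigned int root_dir_sectors_buggy ( unsigned int entries,
					     unsigned int sector_len ) {
	return ( ( entries + sector_len - 1 ) / sector_len );
}

/* Check whether a cluster count lies within "margin" of a limit */
static int near_limit ( unsigned long count, unsigned long limit,
			unsigned long margin ) {
	return ( ( ( count + margin ) > limit ) &&
		 ( count < ( limit + margin ) ) );
}

/* Choose the cluster size in 512-byte sectors: 4kB, or 8kB if risky */
static unsigned int choose_cluster_sectors ( unsigned long data_sectors,
					     unsigned long margin ) {
	unsigned int cluster = 8;
	unsigned long count = ( data_sectors / cluster );

	if ( near_limit ( count, FAT12_LIMIT, margin ) ||
	     near_limit ( count, FAT16_LIMIT, margin ) )
		cluster *= 2;
	return cluster;
}
```

For a root directory of 512 entries with 512-byte sectors, the correct
calculation gives 32 sectors while the assumed buggy form gives 1.
Doubling the cluster size halves the cluster count, moving it well
away from the limit that it was approaching.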
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The calculations around the FAT filesystem layout currently use a
mixture of kilobytes and sector counts. Switch to using sector counts
throughout the calculation, to make the code easier to read.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The USB disk image constructed by util/genfsimg is currently a raw FAT
filesystem, with no containing partition. This makes it incompatible
with the use of CONSOLE_INT13, since there is no way to add a
dedicated log partition without a partition table.
Add a partition table when building a non-ISO image, using the mbr.bin
provided by syslinux (since we are already using syslinux to invoke
the ipxe.lkrn within the FAT filesystem).
The BIOS .usb targets are built using a manually constructed partition
table with C/H/S geometry x/64/32. Match this geometry to minimise
the differences between genfsimg and non-genfsimg USB disk images.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We use mformat to ensure that the FAT filesystem starts as empty.
However, formatting the filesystem can still leave old data blocks
present (though unreferenced) within the disk image.
Truncate the image to a zero length before extending, to ensure that
no stale content is retained.
Signed-off-by: Michael Brown <mcb30@ipxe.org>