Commit Graph

1648 Commits

Author SHA1 Message Date
Michael Brown ca85200809 [virtio] Replace the virtio core and network device driver
The existing virtio network driver has been somewhat hacked together
over the past two decades by multiple contributors, and includes a
substantial amount of logic that is almost but not quite duplicated
between the "legacy" and "modern" code paths.

Rip out the existing driver and replace with a completely new driver
written based on the Virtual I/O Device specification document, not
derived from the Linux kernel driver.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2026-05-13 15:32:17 +01:00
Michael Brown ab9d7b0067 [pci] Provide pci_bar_is_io() to determine BAR type
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2026-05-12 20:50:18 +01:00
Joseph Wong ae8defc279 [bnxt] Do not abort teardown on command failure
Modify bnxt_hwrm_run() to accept a flag indicating whether to abort
immediately upon a command failure.  During the initialization path,
the driver will continue to abort on the first error.  During
teardown, the sequence will continue executing subsequent cleanup
commands even if one fails.  This ensures a best-effort cleanup.

Signed-off-by: Joseph Wong <joseph.wong@broadcom.com>
2026-05-01 17:09:32 +01:00
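
The abort-versus-continue behaviour described in the commit above can be illustrated with a minimal sketch. This is not the bnxt driver code: run_cmds(), cmd_fn, cmd_ok() and cmd_fail() are hypothetical names invented for illustration, and the real bnxt_hwrm_run() signature is not shown in this log.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical command signature: returns 0 on success, negative on error */
typedef int ( * cmd_fn ) ( int *calls );

/* Example commands, used only for illustration */
static int cmd_ok ( int *calls ) { ( *calls )++; return 0; }
static int cmd_fail ( int *calls ) { ( *calls )++; return -1; }

/*
 * Run a sequence of commands.
 *
 * If abort_on_fail is nonzero (initialization), stop at the first
 * failure.  Otherwise (teardown), keep running the remaining commands
 * and report the first error seen.
 */
static int run_cmds ( const cmd_fn *cmds, size_t count, int *calls,
                      int abort_on_fail ) {
    int first_rc = 0;
    size_t i;

    assert ( ( cmds != NULL ) || ( count == 0 ) );
    for ( i = 0 ; i < count ; i++ ) {
        int rc = cmds[i] ( calls );
        if ( rc == 0 )
            continue;
        if ( first_rc == 0 )
            first_rc = rc;
        if ( abort_on_fail )
            break;
    }
    return first_rc;
}
```

With a sequence of { cmd_ok, cmd_fail, cmd_ok }, the aborting mode runs two commands and the best-effort mode runs all three; both report the failure to the caller.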
Joseph Wong 822d4b1437 [bnxt] Improve code readability and debug output
Enhance code readability in the completion queue servicing logic to
use explicit function calls per case statement, rather than falling
through to the next statement.  Add debug print in ring allocation
path.  Fix typo in PCI ROM entry.

Signed-off-by: Joseph Wong <joseph.wong@broadcom.com>
2026-05-01 16:54:46 +01:00
Michael Brown 2d28657ef6 [w89c840] Fix build warnings with GCC 16
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2026-05-01 13:56:32 +01:00
Christian I. Nilsson 3d1a20eacd [intel] Add PCI ID for I219-V and -LM existing of 18-29
Signed-off-by: Christian I. Nilsson <nikize@gmail.com>
2026-04-30 14:43:35 +01:00
Michael Brown 295f3bed20 [virtio] Ensure that device is closed before unmapping regions
Commit 988243c ("[virtio] Add virtio-net 1.0 support") erroneously
placed the code to unmap the device regions before the code to
unregister the network device.  In the common case that the network
device is still open at the time that we shut down to boot the OS,
this results in the regions being accessed after having been unmapped.

For 32-bit BIOS or for UEFI with no IOMMU enabled, the iounmap()
operation is a no-op and so the driver still happens to work despite
the ordering bug.  For 64-bit BIOS or for UEFI with an IOMMU enabled,
the iounmap() operation is not a no-op, and the driver will trigger a
page fault.

Fix by moving the call to unregister_netdev() to before the code that
unmaps the device regions.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2026-04-23 23:10:24 +01:00
Michael Brown 1c54e7e8a4 [virtio] Fix assertion failures when interface is closed
The unused RX I/O buffers are currently freed without being deleted
from the list, with the list head being reinitialised only after all
buffers have been deleted.  This triggers assertion failures due to
the list integrity checks when debugging is enabled.

Fix by deleting each buffer individually, so that the list structure
remains valid at all times.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2026-04-23 14:57:20 +01:00
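
The fix described in the commit above (unlink each buffer before freeing it) can be sketched with a small circular doubly linked list. This is an illustrative stand-in, not iPXE's list implementation: struct node, struct rx_buf, list_check(), rx_refill() and rx_flush() are all hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal circular doubly linked list (hypothetical, not iPXE's list.h) */
struct node { struct node *next; struct node *prev; };

/* Hypothetical receive buffer; the list node must stay the first member */
struct rx_buf { struct node list; char data[64]; };

static void list_init ( struct node *head ) {
    head->next = head;
    head->prev = head;
}

static void list_append ( struct node *head, struct node *entry ) {
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

static void list_unlink ( struct node *entry ) {
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
}

/* Integrity check: returns entry count, or -1 if any link is inconsistent */
static int list_check ( const struct node *head ) {
    const struct node *entry;
    int count = 0;

    for ( entry = head->next ; entry != head ; entry = entry->next ) {
        if ( entry->next->prev != entry )
            return -1;
        count++;
    }
    return ( ( head->next->prev == head ) ? count : -1 );
}

/* Allocate and queue the requested number of buffers */
static int rx_refill ( struct node *head, int count ) {
    int i;

    for ( i = 0 ; i < count ; i++ ) {
        struct rx_buf *buf = malloc ( sizeof ( *buf ) );
        if ( ! buf )
            return -1;
        list_append ( head, &buf->list );
    }
    return 0;
}

/* Free all buffers, unlinking each one first so the list stays valid */
static void rx_flush ( struct node *head ) {
    while ( head->next != head ) {
        struct rx_buf *buf = ( struct rx_buf * ) head->next;
        list_unlink ( &buf->list );
        assert ( list_check ( head ) >= 0 );
        free ( buf );
    }
}
```

Because every entry is unlinked before it is freed, the integrity check inside rx_flush() never walks through freed memory, and the list head ends up empty without needing to be reinitialised.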
Michael Brown 2b2580f78f [virtio] Set MTU for both modern and legacy devices
Commit b9d68b9 ("[ethernet] Use standard 1500 byte MTU unless
explicitly overridden") added code to explicitly set the MTU for
virtio-net devices, but only on the legacy probe path.

Make the behaviour consistent by setting the MTU on both legacy and
modern probe paths.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2026-04-23 14:12:53 +01:00
Joseph Wong 619b1db1b9 [bnxt] Update conditions for invoking short commands
Include additional condition to invoke short command logic when
firmware indicates it is required.  Replace 100ms delay with wmb() to
ensure DMA buffer is ready when short command is invoked.

Signed-off-by: Joseph Wong <joseph.wong@broadcom.com>
2026-03-19 13:09:25 +00:00
Joseph Wong ad748f0d92 [bnxt] Update link speed definitions
Add new link speed definitions and remove unused D3 Flow Control
definitions.

Signed-off-by: Joseph Wong <joseph.wong@broadcom.com>
2026-02-25 17:46:10 +00:00
Joseph Wong f0ceb70cb9 [bnxt] Fix memory leak in probe()
Fix potential memory leak in probe() if initialization fails after
HWRM memory has been allocated.

Signed-off-by: Joseph Wong <joseph.wong@broadcom.com>
2026-02-24 10:03:49 +00:00
Joseph Wong a6d393ecc8 [bnxt] Skip unnecessary calls for VFs
Add a check for VFs in HWRM backing store related functions to return
immediately as these functions are not needed.

Signed-off-by: Joseph Wong <joseph.wong@broadcom.com>
2026-02-24 09:58:38 +00:00
Joseph Wong 1eb571cef4 [bnxt] Remove access of deprecated link speed variables
Remove access of deprecated link speed variables for 5750x devices.
Update test flag to include CHIP_P5_PLUS when excluding access of
certain NVM variables.

Signed-off-by: Joseph Wong <joseph.wong@broadcom.com>
2026-02-23 12:58:44 +00:00
Joseph Wong a5e4bb98bf [bnxt] Fix typo in function declaration
Fix typo in function declaration: bnxt_adv_cq_index() was declared
twice.  Replace the duplicate with the missing declaration for
bnxt_adv_nq_index().

Signed-off-by: Joseph Wong <joseph.wong@broadcom.com>
2026-02-23 12:52:55 +00:00
Joseph Wong df5957ccc9 [bnxt] Fix coding style
Ensure whitespace and indentation adhere to iPXE coding standards.
Fix vertical alignment of multi-line function calls.

No functional changes.

Signed-off-by: Joseph Wong <joseph.wong@broadcom.com>
2026-02-23 12:33:45 +00:00
Joseph Wong 9d6831bb07 [bnxt] Correct port index usage
Use port index value retrieved from the firmware when calling
bnxt_hwrm_queue_qportcfg() to retrieve the queue_id.  This function
is available for all devices.

Signed-off-by: Joseph Wong <joseph.wong@broadcom.com>
2026-02-23 12:27:53 +00:00
Michael Brown ae8e23a452 [build] Handle all driver list construction via parserom.pl
Handle construction of the EFI, Linux, Xen, and VMBus driver build
rules via parserom.pl to ensure consistency.  In particular, this
allows those drivers to appear in the DRIVERS_SECBOOT list used to
filter out non-permitted drivers in a Secure Boot build.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2026-02-13 14:16:44 +00:00
Michael Brown 81da1a1b6c [dt] Add DT_ROM() and DT_ID() macros
Add DT_ROM() and DT_ID() macros following the pattern for PCI_ROM()
and PCI_ID(), to allow for the possibility of including devicetree
network devices within the "all-drivers" build of iPXE.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2026-02-12 13:29:06 +00:00
Michael Brown 6e56f7ff25 [linux] Remove unused can_probe field from driver definition
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2026-02-12 12:50:18 +00:00
Michael Brown 1523512198 [build] Allow PCI_ROM() and ISA_ROM() to span multiple lines
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2026-02-11 16:28:41 +00:00
Michael Brown 4d6c8ab443 [usb] Add USB_ROM() and USB_ID() macros
Add USB_ROM() and USB_ID() macros following the pattern for PCI_ROM()
and PCI_ID(), to allow for the possibility of including USB network
devices within the "all-drivers" build of iPXE.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2026-02-11 16:07:12 +00:00
Michael Brown 95e756569a [pci] Ignore invalid subordinate bus numbers
Some systems (observed on a Dell C6615) fail to correctly populate the
subordinate PCI bus number on some PCI bridges.  We do not currently
guard against this behaviour, causing us to subsequently scan through
a huge expanse of the PCI bus:dev.fn address range.

Fix by ignoring the subordinate bus number if it is lower than the
bridge's own bus number.

Reported-by: Anisse Astier <an.astier@criteo.com>
Reported-by: Ahmad Mahagna <ahmhad@nvidia.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2026-02-05 12:09:59 +00:00
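
The guard described in the commit above can be sketched as a single comparison. The function name scan_limit() is hypothetical, and falling back to the bridge's own bus number when the subordinate value is ignored is an assumption made for this sketch; the log does not say what the real code uses instead.

```c
#include <assert.h>

/*
 * Choose the highest bus number to treat as being behind a bridge.
 *
 * bridge_bus is the bus on which the bridge itself resides, and
 * subordinate is the value read from the bridge's subordinate bus
 * number register.  A subordinate value lower than the bridge's own
 * bus number cannot be valid, so it is ignored.
 */
static unsigned int scan_limit ( unsigned int bridge_bus,
                                 unsigned int subordinate ) {
    assert ( bridge_bus <= 0xff );
    assert ( subordinate <= 0xff );
    if ( subordinate < bridge_bus ) {
        /* Assumption: fall back to the bridge's own bus number */
        return bridge_bus;
    }
    return subordinate;
}
```

For example, a bridge on bus 3 reporting subordinate bus 0 would be treated as covering nothing beyond bus 3, while a bridge on bus 3 reporting subordinate bus 5 is left unchanged.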
Michael Brown 03a906a9f3 [build] Mark Realtek driver as permitted for UEFI Secure Boot
The Realtek driver and its dependencies are cleanly structured, easy
to review, directly maintained, and very well tested.  Review these
files and mark them as permitted for UEFI Secure Boot.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2026-01-28 13:04:07 +00:00
Michael Brown e31dc79d40 [build] Mark EFI SNP/MNP driver wrappers as permitted for UEFI Secure Boot
The EFI SNP/MNP driver wrapper is a trivial layer that exists only to
allow for the separation of "snponly.efi" as a build target.  Review
this trivial wrapper and mark it as permitted for UEFI Secure Boot.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2026-01-27 16:39:40 +00:00
Michael Brown adcaaf9b93 [build] Mark known reviewed files as permitted for UEFI Secure Boot
Some past security reviews carried out for UEFI Secure Boot signing
submissions have covered specific drivers or functional areas of iPXE.
Mark all of the files comprising these areas as permitted for UEFI
Secure Boot.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2026-01-14 16:10:29 +00:00
Michael Brown 6cccb3bdc0 [build] Mark core files as permitted for UEFI Secure Boot
Mark all files used in a standard build of bin-x86_64-efi/snponly.efi
as permitted for UEFI Secure Boot.  These files represent the core
functionality of iPXE that is guaranteed to have been included in
every binary that was previously subject to a security review and
signed by Microsoft.  It is therefore legitimate to assume that at
least these files have already been reviewed to the required standard
multiple times.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2026-01-14 13:25:34 +00:00
Michael Brown 30948987fd [build] Mark existing files as explicitly forbidden for Secure Boot
The third-party 802.11 stack and NFS protocol code are known to
include multiple potential vulnerabilities and are explicitly
forbidden from being included in Secure Boot signed builds.  This is
currently handled at the per-directory level by defining a list of
source directories (SRCDIRS_INSEC) that are to be excluded from Secure
Boot builds.

Annotate all files in these directories with FILE_SECBOOT() to convey
this information to the new per-file Secure Boot permissibility check,
and remove the old separation between SRCDIRS and SRCDIRS_INSEC.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2026-01-13 15:18:16 +00:00
Christian I. Nilsson 3d5cd3d79e [intel] Add PCI ID for I219-V and -LM 24
Signed-off-by: Christian I. Nilsson <nikize@gmail.com>
2025-12-15 21:47:19 +01:00
Michael Brown d4258272c6 [crypto] Construct signatures using ASN.1 builders
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-12-01 16:02:54 +00:00
Bert Ezendam d73981aece [intel] Add PCI IDs for I225 and I226 chipsets
Identifiers are taken from the pci.ids database.

Signed-off-by: Bert Ezendam <bert.ezendam@alliander.com>
2025-11-26 14:14:02 +00:00
Michael Brown 9c1ac48bcf [pci] Allow probing permission to vary by range
Make pci_can_probe() part of the runtime selectable PCI I/O API, and
defer this check to the per-range API.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-11-24 23:16:32 +00:00
Michael Brown ff1a17dc7e [pci] Use linker tables for runtime selectable PCI APIs
Use the linker table mechanism to enumerate the underlying PCI I/O
APIs, to allow PCIAPI_CLOUD to become architecture-independent code.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-11-24 20:54:01 +00:00
Michael Brown 08d4d7fe9d [uart] Make baud rate a property of the UART
Make the current baud rate (if specified) a property of the UART, to
allow the default_serial_console() function to specify the default
baud rate as well as the default UART device.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-11-05 12:18:17 +00:00
Michael Brown fde35ff003 [pci] Disable decoding while setting a BAR value
Setting the base address for a 64-bit BAR requires two separate 32-bit
writes to configuration space, and so will necessarily result in the
BAR temporarily holding an invalid partially written address.

Some hypervisors (observed on an AWS EC2 c7a.medium instance in
eu-west-2) will assume that guests will write BAR values only while
decoding is disabled, and may not rebuild MMIO mappings for the guest
if the BAR registers are written while decoding is enabled.  The
effect of this is that MMIO accesses are not routed through to the
device even though inspection from within the guest shows that every
single PCI configuration register has the correct value.  Writes to
the device will be ignored, and reads will return the all-ones pattern
that typically indicates a nonexistent device.

With the ENA network driver now using low latency transmit queues,
this results in the transmit descriptors being lost (since the MMIO
writes to BAR2 never reach the device), which in turn causes the
device to lock up as soon as the transmit doorbell is rung for the
first time.

Fix by disabling decoding of memory and I/O cycles while setting a BAR
address (as we already do while sizing a BAR), so that the invalid
partial address can never be decoded and so that hypervisors will
rebuild MMIO mappings as expected.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-10-29 23:30:52 +00:00
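
The sequence described in the commit above (disable decoding, write both halves of the BAR, restore the command register) can be sketched against a simulated device. The struct fake_dev model and the bar_set64() name are hypothetical; only the command register bit values (I/O space enable 0x0001, memory space enable 0x0002) come from the PCI specification. The unsafe_writes counter exists purely so the sketch can show that no BAR write happens while decoding is enabled.

```c
#include <assert.h>
#include <stdint.h>

#define CMD_IO  0x0001  /* PCI command register: I/O space enable */
#define CMD_MEM 0x0002  /* PCI command register: memory space enable */

/* Simulated device (hypothetical model, for illustration only) */
struct fake_dev {
    uint16_t command;            /* command register */
    uint32_t bar[6];             /* base address registers */
    unsigned int unsafe_writes;  /* BAR writes seen while decoding */
};

/* Write one 32-bit BAR register, noting whether decoding was enabled */
static void bar_write32 ( struct fake_dev *dev, unsigned int index,
                          uint32_t value ) {
    assert ( index < 6 );
    if ( dev->command & ( CMD_IO | CMD_MEM ) )
        dev->unsafe_writes++;
    dev->bar[index] = value;
}

/* Set a 64-bit BAR with memory and I/O decoding temporarily disabled */
static void bar_set64 ( struct fake_dev *dev, unsigned int index,
                        uint64_t addr ) {
    uint16_t saved = dev->command;

    /* Disable decoding so the partially written address is never used */
    dev->command = ( uint16_t ) ( saved & ~( CMD_IO | CMD_MEM ) );

    /* Two separate 32-bit writes: low half, then high half */
    bar_write32 ( dev, index, ( uint32_t ) addr );
    bar_write32 ( dev, ( index + 1 ), ( uint32_t ) ( addr >> 32 ) );

    /* Restore the original command register value */
    dev->command = saved;
}
```

In this model, calling bar_write32() directly while the memory enable bit is set increments unsafe_writes, which corresponds to the situation in which the hypervisor described above may fail to rebuild its MMIO mappings.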
Michael Brown 0336e2987c [ena] Leave queue base address empty when creating a low latency queue
The queue base address is meaningless for a low latency queue, since
the queue entries are written directly to the on-device memory.  Any
non-zero queue base address will be safely ignored by the hardware,
but leaves open the possibility that future revisions could treat it
as an error.

Leave this field as zero, to match the behaviour of the Linux driver.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-10-28 12:27:06 +00:00
Michael Brown b2e8468219 [ena] Limit receive queue size to work around hardware bugs
Commit a801244 ("[ena] Increase receive ring size to 128 entries")
increased the receive ring size to 128 entries (while leaving the fill
level at 16), since using a smaller receive ring caused unexplained
failures on some instance types.

The original hardware bug that resulted in that commit seems to have
been fixed: experiments suggest that the original failure (observed on
a c6i.large instance in eu-west-2) will no longer reproduce when using
a receive ring containing only 16 entries (as was the case prior to
that commit).

Newer generations of the ENA hardware (observed on an m8i.large
instance in eu-south-2) seem to have a new and exciting hardware bug:
these instance types appear to use a hash of the received packet
header to determine which portion of the (out-of-order) receive ring
to use.  If that portion of the ring happens to be empty (e.g. because
only 32 entries of the 128-entry ring are filled at any one time),
then the packet will be silently dropped.

Work around this new hardware bug by reducing the receive ring size
down to the current fill level of 32 entries.  This appears to work on
all current instance types (but has not been exhaustively tested).

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-10-17 13:25:05 +01:00
Michael Brown 846c505ae9 [ena] Increase transmit queue size to match receive fill level
Avoid running out of transmit descriptors when sending TCP ACKs by
increasing the transmit queue size to match the increased receive
fill level.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-10-17 13:25:05 +01:00
Michael Brown 0ae5e25de2 [ena] Add memory barrier after writing to on-device memory
Ensure that writes to on-device memory have taken place before writing
to the doorbell register.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-10-17 12:35:23 +01:00
Michael Brown c296747d0e [ena] Increase receive fill level
Experiments suggest that at least some instance types (observed with
c6i.large in eu-west-2) experience high packet drop rates with only 16
receive buffers allocated.  Increase the fill level to 32 buffers.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-10-16 16:36:29 +01:00
Michael Brown c1badf71ca [ena] Add support for low latency transmit queues
Newer generations of the ENA hardware require the use of low latency
transmit queues, where the submission queues and the initial portion
of the transmitted packet are written to on-device memory via BAR2
instead of being read from host memory.

Detect support for low latency queues and set the placement policy
appropriately.  We attempt the use of low latency queues only if the
device reports that it supports inline headers, 128-byte entries, and
two descriptors prior to the inlined header, on the basis that we
don't care about using low latency queues on older versions of the
hardware since those versions will support normal host memory
submission queues anyway.

We reuse the redundant memory allocated for the submission queue as
the bounce buffer for constructing the descriptors and inlined packet
data, since this avoids needing a separate allocation just for the
bounce buffer.

We construct a metadata submission queue entry prior to the actual
submission queue entry, since experimentation suggests that newer
generations of the hardware require this to be present even though it
conveys no information beyond its own existence.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-10-16 16:36:29 +01:00
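
The eligibility test described in the commit above (use low latency queues only when the device reports all three required capabilities) can be sketched as a mask comparison. The capability bit values, the enum placement type and the choose_placement() name are hypothetical and do not reflect the real ENA register layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical capability bits; the real ENA encoding differs */
#define LLQ_HDR_INLINE 0x01  /* inline headers supported */
#define LLQ_ENTRY_128  0x02  /* 128-byte entries supported */
#define LLQ_DESC_2     0x04  /* two descriptors before inlined header */

/* Where the submission queue entries live */
enum placement {
    PLACE_HOST = 0,    /* normal host memory submission queue */
    PLACE_DEVICE = 1,  /* low latency queue in on-device memory */
};

/* Use on-device placement only if every required capability is present */
static enum placement choose_placement ( uint32_t caps ) {
    const uint32_t required =
        ( LLQ_HDR_INLINE | LLQ_ENTRY_128 | LLQ_DESC_2 );

    return ( ( ( caps & required ) == required ) ?
             PLACE_DEVICE : PLACE_HOST );
}
```

A device missing any one of the three capabilities falls back to host memory placement, which matches the reasoning in the commit message that older hardware supports normal host memory submission queues anyway.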
Michael Brown 0d15d7f0a5 [ena] Record supported device features
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-10-16 16:36:29 +01:00
Michael Brown e5e371f485 [ena] Cancel uncompleted transmit buffers on close
Avoid spurious assertion failures by ensuring that references to
uncompleted transmit buffers are not retained after the device has
been closed.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-10-16 16:36:29 +01:00
Michael Brown dcc5d36ce5 [ena] Map the on-device memory, if present
Newer generations of the ENA hardware require the use of low latency
transmit queues, where the submission queues and the initial portion
of the transmitted packet are written to on-device memory via BAR2
instead of being read from host memory.

Prepare for this by mapping the on-device memory BAR.  As with the
register BAR, we may need to steal a base address from the upstream
PCI bridge since the BIOS on some instance types (observed with an
m8i.metal-48xl instance in eu-south-2) will fail to assign an address
to the device.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-10-15 15:55:57 +01:00
Michael Brown 510f3e5e17 [ena] Add descriptive messages for any admin queue command failures
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-10-15 12:00:42 +01:00
Michael Brown 3538e9c39a [pci] Record prefetchable memory window for PCI bridges
Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-10-14 18:38:08 +01:00
Michael Brown 04a61c413d [ena] Use pci_bar_set() to place device within bridge memory window
Use pci_bar_set() when we need to set a device base address (on
instance types such as c6i.metal where the BIOS fails to do so), so
that 64-bit BARs will be handled automatically.

This particular issue has so far been observed only on 6th generation
instances.  These use 32-bit BARs, and so the lack of support for
handling 64-bit BARs has not caused any observable issue.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-10-14 15:57:02 +01:00
Michael Brown 94902ae187 [pci] Handle sizing of 64-bit BARs
Provide pci_bar_set() to handle setting the base address for a
potentially 64-bit BAR, and rewrite pci_bar_size() to correctly handle
sizing of 64-bit BARs.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-10-14 14:43:50 +01:00
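
The general technique for sizing a 64-bit BAR (write all-ones to both halves, read back, mask off the flag bits, and take the two's complement) can be sketched against a simulated register pair. This is not the iPXE pci_bar_size() implementation: struct fake_bar64, fake_write(), fake_read() and bar_size64() are hypothetical, and the sketch omits the step of disabling decoding while the BAR holds the probe value.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated 64-bit memory BAR (hypothetical model, for illustration) */
struct fake_bar64 {
    uint64_t size;   /* region size: a power of two, at least 16 */
    uint64_t value;  /* address bits currently latched */
};

/* Write one half; address bits below the region size are hardwired to 0 */
static void fake_write ( struct fake_bar64 *bar, int high, uint32_t v ) {
    uint64_t mask = ~( bar->size - 1 );

    if ( high ) {
        bar->value = ( ( bar->value & 0xffffffffULL ) |
                       ( ( ( uint64_t ) v << 32 ) & mask ) );
    } else {
        bar->value = ( ( bar->value & ~0xffffffffULL ) |
                       ( v & mask & 0xffffffffULL ) );
    }
}

/* Read one half; the low half carries the 64-bit memory type flag (0x4) */
static uint32_t fake_read ( const struct fake_bar64 *bar, int high ) {
    if ( high )
        return ( ( uint32_t ) ( bar->value >> 32 ) );
    return ( ( ( uint32_t ) bar->value ) | 0x4 );
}

/* Size the BAR, restoring the original address afterwards */
static uint64_t bar_size64 ( struct fake_bar64 *bar ) {
    uint32_t old_lo = fake_read ( bar, 0 );
    uint32_t old_hi = fake_read ( bar, 1 );
    uint64_t probe;

    /* Write all-ones to both halves and read back the writable bits */
    fake_write ( bar, 0, 0xffffffffU );
    fake_write ( bar, 1, 0xffffffffU );
    probe = ( ( ( uint64_t ) fake_read ( bar, 1 ) << 32 ) |
              ( fake_read ( bar, 0 ) & ~0xfU ) );

    /* Restore the original address */
    fake_write ( bar, 0, old_lo );
    fake_write ( bar, 1, old_hi );

    /* The size is the two's complement of the writable address mask */
    return ( ~probe + 1 );
}
```

Sizing only the low 32 bits would fail for any region of 4 GiB or larger, since all of the low address bits then read back as zero; combining both halves before taking the complement avoids this.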
Michael Brown 4f44f62402 [gve] Rearm interrupts unconditionally on every poll
Experimentation suggests that rearming the interrupt once per observed
completion is not sufficient: we still see occasional delays during
which the hardware fails to write out completions.

As described in commit d2e1e59 ("[gve] Use dummy interrupt to trigger
completion writeback in DQO mode"), there is no documentation around
the precise semantics of the interrupt rearming mechanism, and so
experimentation is the only available guide.  Switch to rearming both
TX and RX interrupts unconditionally on every poll, since this
produces better experimental results.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-10-10 13:12:19 +01:00
Michael Brown f5ca1de738 [gve] Use raw DMA addresses in descriptors in DQO-QPL mode
The DQO-QPL operating mode uses registered queue page lists but still
requires the raw DMA address (rather than the linear offset within the
QPL) to be provided in transmit and receive descriptors.

Set the queue page list base device address appropriately.

Signed-off-by: Michael Brown <mcb30@ipxe.org>
2025-10-10 12:49:26 +01:00