[gve] Allocate all possible event counters

The admin queue API requires us to tell the device how many event
counters we have provided via the "configure device resources" admin
queue command.  There is, of course, absolutely no documentation
indicating how many event counters actually need to be provided.

We require only two event counters: one for the transmit queue, one
for the receive queue.  (The receive queue doesn't seem to actually
make any use of its event counter, but the "create receive queue"
admin queue command will fail if it doesn't have an available event
counter to choose.)

In the absence of any documentation, we currently make the assumption
that allocating and configuring 16 counters (i.e. one whole cacheline)
will be sufficient to allow for the use of two counters.
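
As a minimal sketch of where the figure of 16 comes from, the
arithmetic can be reproduced from the definitions that appear in the
header.  The helper name sketch_old_event_count() is hypothetical and
exists only for this illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Length alignment for DMA data structures, as in the header */
#define GVE_LEN_ALIGN 64

/* Event counter, as in the header */
struct gve_event {
	volatile uint32_t count;
} __attribute__ (( packed ));

/* Each counter is a single 32-bit word */
static_assert ( sizeof ( struct gve_event ) == 4,
		"unexpected event counter size" );

/* Old definition: as many counters as fit in one aligned block */
#define GVE_EVENT_MAX ( GVE_LEN_ALIGN / sizeof ( struct gve_event ) )

/* Hypothetical helper: number of counters under the old assumption */
static size_t sketch_old_event_count ( void ) {
	return GVE_EVENT_MAX;
}
```

With 4-byte counters and a 64-byte block, the helper returns 16.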

This assumption turns out to be incorrect.  On larger instance types
(observed with a c3d-standard-16 instance in europe-west4-a), we find
that creating the transmit or receive queues will each fail with a
probability of around 50% with the "failed precondition" error code.

Experimentation suggests that even though the device has accepted our
"configure device resources" command indicating that we are providing
only 16 event counters, it will attempt to choose any of its potential
32 event counters (and will then fail since the event counter that it
unilaterally chose is outside of the agreed range).
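
The observed failure rate is consistent with a simple model, sketched
below.  This is an assumption rather than documented behaviour: it
supposes that the device picks uniformly among its 32 potential
counters and that the agreed range is indices 0 to 15.  All names are
hypothetical:

```c
#include <assert.h>

/* Would queue creation succeed if the device picked this index?
 * (Assumption: the agreed range is indices 0..configured-1.)
 */
static int sketch_index_ok ( unsigned int index, unsigned int configured ) {
	return ( index < configured );
}

/* Percentage of possible device choices that fall outside the
 * agreed range, assuming every index is equally likely.
 */
static unsigned int sketch_failure_percent ( unsigned int configured,
					     unsigned int device_max ) {
	unsigned int index;
	unsigned int failures = 0;

	assert ( device_max != 0 );
	for ( index = 0 ; index < device_max ; index++ ) {
		if ( ! sketch_index_ok ( index, configured ) )
			failures++;
	}
	return ( ( failures * 100 ) / device_max );
}
```

Under this model, configuring 16 of 32 counters gives a 50% failure
rate per queue, and configuring all 32 gives 0%.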

Work around this firmware bug by always allocating the maximum number
of event counters supported by the device.  (This requires deferring
the allocation of the event counters until after issuing the "describe
device" command.)
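
The resulting ordering can be sketched roughly as follows.  This is
not the driver code: the sketch_* names are hypothetical, calloc()
stands in for the driver's DMA allocation, and the reported maximum is
passed in directly rather than parsed from a "describe device"
response:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Event counter, as in the header */
struct gve_event {
	volatile uint32_t count;
} __attribute__ (( packed ));

/* Hypothetical stand-in for the driver's event counter state */
struct sketch_events {
	struct gve_event *event;
	unsigned int count;
};

/* Step 1: record the maximum reported by "describe device" */
static void sketch_describe ( struct sketch_events *events,
			      unsigned int reported_max ) {
	events->count = reported_max;
}

/* Step 2: allocate every counter the device might choose */
static int sketch_alloc ( struct sketch_events *events ) {
	/* Allocation must be deferred until the count is known */
	assert ( events->count != 0 );
	events->event = calloc ( events->count,
				 sizeof ( events->event[0] ) );
	return ( events->event ? 0 : -1 );
}

/* Length of the counter array to report to the device */
static size_t sketch_len ( const struct sketch_events *events ) {
	return ( events->count * sizeof ( events->event[0] ) );
}
```

For a device reporting 32 counters, the array is 128 bytes long.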

Signed-off-by: Michael Brown <mcb30@ipxe.org>
Michael Brown
2024-09-17 13:11:43 +01:00
parent 9bb2068636
commit 59d123658b
2 changed files with 76 additions and 64 deletions

@@ -50,15 +50,6 @@ struct google_mac {
*/
#define GVE_ALIGN GVE_PAGE_SIZE
/**
* Length alignment
*
* All DMA data structure lengths seem to need to be aligned to a
* multiple of 64 bytes. (This is not documented anywhere, but is
* inferred from existing source code and experimentation.)
*/
#define GVE_LEN_ALIGN 64
/** Configuration BAR */
#define GVE_CFG_BAR PCI_BASE_ADDRESS_0
@@ -350,22 +341,6 @@ struct gve_event {
volatile uint32_t count;
} __attribute__ (( packed ));
/**
* Maximum number of event counters
*
* We tell the device how many event counters we have provided via the
* "configure device resources" admin queue command. The device will
* accept being given only a single counter, but will subsequently
* fail to create a receive queue.
*
* There is, of course, no documentation indicating how many event
* counters actually need to be provided. In the absence of evidence
* to the contrary, assume that 16 counters (i.e. the smallest number
* we can allocate, given the length alignment constraint on
* allocations) will be sufficient.
*/
#define GVE_EVENT_MAX ( GVE_LEN_ALIGN / sizeof ( struct gve_event ) )
/** Event counter array */
struct gve_events {
/** Event counters */