Verify VIF limit can be reached #338


Draft · last-genius wants to merge 3 commits into master

Conversation

last-genius (Author)

Adds a new test verifying xcp-ng can reach the new suggested limit of 16 VIFs (also tracking performance in a basic way).

Draft until the upstream xapi PR raising the limit gets merged.

This allows immediately operating on the VIF without a subsequent
vif-list or similar call

Signed-off-by: Andrii Sultanov <[email protected]>
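Reading between the lines (this is an inference from the commit message, not from the diff), the helper presumably now returns the uuid that `xe vif-create` prints. A minimal sketch of that kind of change, in the xe-wrapper style used elsewhere in xcp-ng-tests:

```python
# Hypothetical sketch: `xe vif-create` prints the new VIF's uuid;
# returning it lets callers operate on the VIF right away, with no
# subsequent vif-list lookup.
def create_vif(self, device, network_uuid):
    return self.host.xe('vif-create', {
        'vm-uuid': self.uuid,
        'device': str(device),
        'network-uuid': network_uuid,
    })
```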
stormi (Member) commented Jul 17, 2025

Ping @ydirson and @bleader. This is a draft PR not because it's not ready but because the feature is not released yet. Let's not leave Andrii waiting for (one of?) his first test addition(s) to xcp-ng-tests.

Comment on lines +29 to +34
vm.param_set('VCPUs-max', vcpus)
vm.param_set('VCPUs-at-startup', vcpus)
Contributor

This is a permanent change to the VM; we likely don't want that when it is not a throwaway VM. I'd think this is a good candidate for a fixture (sketch below).
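A minimal sketch of such a fixture, reusing the param_set calls from the diff; the fixture name, the `imported_vm` dependency, and the `param_get` helper are assumptions:

```python
import pytest

VCPUS_FOR_TEST = 16  # illustrative value

@pytest.fixture
def vm_with_vcpus(imported_vm):
    vm = imported_vm
    # Save the current values so the VM is left as we found it.
    saved = {p: vm.param_get(p) for p in ('VCPUs-max', 'VCPUs-at-startup')}
    vm.param_set('VCPUs-max', VCPUS_FOR_TEST)
    vm.param_set('VCPUs-at-startup', VCPUS_FOR_TEST)
    yield vm
    # Cleanup phase: restore the original vCPU configuration.
    for param, value in saved.items():
        vm.param_set(param, value)
```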

Author

But the VM is destroyed at the end.

Comment on lines +32 to +38
logging.info('Create VIFs before starting the VM')
for i in range(existing_vifs, vif_limit):
vm.create_vif(i, network_uuid=network_uuid)
Contributor

Similarly, those should be disposed of in the cleanup phase.

Author

When the VM is destroyed, these are cleaned up as well.

stormi (Member) commented Jul 18, 2025

Though we currently give module scope to VM fixtures (so they are indeed destroyed between tests of different modules), we strive to always leave resources in the state where we found them.

There are exceptions:

  • tests and fixtures depending on higher-level fixtures which create throwaway VM clones for the tests.
  • tests and fixtures depending on higher-level fixtures which create snapshots and revert to them after the tests (sketch below).
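For illustration, a minimal sketch of the snapshot pattern from the second bullet, assuming `vm.snapshot()` returns an object with `revert()` and `destroy()` (treat the exact names as assumptions):

```python
import pytest

@pytest.fixture
def vm_reverted_after_test(imported_vm):
    vm = imported_vm
    snapshot = vm.snapshot()  # record the VM's state before the test
    yield vm
    # Leave the resource in the state where we found it.
    snapshot.revert()
    snapshot.destroy()
```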

Comment on lines 46 to 63
logging.info('Configure interfaces')
config = '\n'.join([f'iface {interface_name}{i} inet dhcp\n'
f'auto {interface_name}{i}'
for i in range(1, vif_limit)])
vm.ssh([f'echo "{config}" >> /etc/network/interfaces'])

logging.info('Install iperf3 on VM and host')
if vm.ssh_with_result(['apt install iperf3 --assume-yes']).returncode != 0:
assert False, "Failed to install iperf3 on the VM"
host.yum_install(['iperf3'])

logging.info('Reconfigure VM networking')
if vm.ssh_with_result(['systemctl restart networking']).returncode != 0:
assert False, "Failed to configure networking"
Contributor

I'd avoid putting anything between "Configure interfaces" and "Reconfigure VM networking". But modifying the VM config does not seem right anyway; why not simply call ip commands to bring the ifaces up?

Author

I need each interface to get a distinct IPv4 address, otherwise multiple iperf clients won't run in parallel on different interfaces. I thought this was the easiest way to do it (calling ssh in a loop would also take a very long time, so a single script is much faster).
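For comparison, a minimal sketch of the ip-command alternative batched into a single ssh call (so it stays fast); the 10.0.N.2/24 addressing scheme is purely illustrative:

```python
# Bring each interface up and give it a distinct IPv4 address, all in
# one ssh invocation instead of one round-trip per interface.
cmds = []
for i in range(1, vif_limit):
    cmds.append(f'ip link set {interface_name}{i} up')
    cmds.append(f'ip addr add 10.0.{i}.2/24 dev {interface_name}{i}')
vm.ssh([' && '.join(cmds)])
```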

# There is a ResourceWarning due to background=True on an ssh call
# We do ensure the processes are killed
@pytest.mark.filterwarnings("ignore::ResourceWarning")
@pytest.mark.debian_uefi_vm
Contributor

This would need the mark to be declared in this commit already.

@stormi I'm not yet 100% up to speed with how VM selection works, but that pattern reminds me of something - is it even going to work as expected?

Author

Sure, I'll reorder the commits.

Member

So, regarding VM selection: this would be the first time we'd have a marker that asks for a specific distro. But it's not the first time we have a test that wants a Unix VM with requirements not met by alpine (see the vtpm tests).

IMO:

  • Any requirements on the VM should be expressed in a fixture that checks that the VM provided to the test is suitable. A marker doesn't do this. We could have a debian_vm fixture if we really need a debian VM, but we avoid writing tests for a given distro as much as possible. I'd rather have a more generic fixture, like "unix_vm_with_whatever_criteria_let_us_know_that_the_test_should_pass_with_it". That's something the vtpm test does poorly, by the way (the checks are in the test there, not in a fixture where they would belong).
  • If necessary for filtering tests in jobs, we can automatically generate markers based on fixture requirements. Check the unix_vm or windows_vm markers, for example. Here we could have a unix_vm_with_whatever_criteria_let_us_know_that_the_test_should_pass_with_it marker.
  • There does not need to be an exact match between the test's requirements and the VM identifiers defined in jobs.py (such as the existing debian_uefi_vm). We first express the requirements in the test, and then see how we can use the test in jobs.py. Here, making the job use debian_uefi_vm seems appropriate, but that's not the only solution we could choose. We could also want to run the test with a collection of OSes (various unixes, windows...) so that the test not only checks XAPI's and Xen's handling of the VIF limit, but also the OSes and PV drivers. Maybe @dinhngtu will have an opinion on this.

Another related topic: should the fixture checking the requirements make the test fail, or skip?

In most cases, we'll want the test to fail: it indicates that we called the test with a VM that is not appropriate. However, if we want to include the test in a "multi" job, one that runs on many different VMs, and if the test can run with most unix VMs (for one version of the test) and windows VMs (for another version of the test), then skipping just for the few VMs that are not suitable (alpine...) could be a way. There's also the possibility to reuse existing VM collections that already exclude alpine in jobs.py/vm_data.py, such as the ones we use for testing guest tools (currently named tools_unix and tools_windows, but if we can define a common set of selection criteria, the name could change). A sketch of such a requirements-checking fixture follows at the end of this comment.

It's all a question of compromise between being specific and limiting the number of jobs and different VM collections in CI.

(Another side note: in the future I'd like skipped tests to be considered as failures in Jenkins, unless they belong to a list of expected skips).
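A minimal sketch of the requirements-checking fixture discussed above; the `unix_vm` dependency and the `is_alpine()` helper are hypothetical placeholders for whatever the suite actually exposes:

```python
import pytest

@pytest.fixture
def unix_vm_suitable_for_vif_limit(unix_vm):
    # `is_alpine()` is a hypothetical helper standing in for a real
    # suitability check on the provided VM.
    if unix_vm.is_alpine():
        # Default: fail, since the job handed us an inappropriate VM.
        pytest.fail("this test requires a non-alpine unix VM")
        # For a "multi" job running over many VMs, skip instead:
        # pytest.skip("VM not suitable: alpine")
    yield unix_vm
```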

Member

I'm a bit lost on the PR's context. Is the per-image limitation coming from the guest kernel or something else? If it's something not statically defined by the image itself, I think runtime detection + pytest.skip would be more appropriate.

Member

The question for you was mostly whether you'd find it useful to test Windows VMs with up to 16 VIFs.

Member

Yes, it'd be useful.

# There is a ResourceWarning due to background=True on an ssh call
# We do ensure the processes are killed
@pytest.mark.filterwarnings("ignore::ResourceWarning")
@pytest.mark.debian_uefi_vm
Contributor

Why specifically UEFI?

Author

The only thing this really needs to express is "non-alpine", preferably debian, because some other distros might have different naming schemes for interfaces. UEFI is not necessary.

Currently the test verifies that interfaces have been created correctly
(that there were enough grant frames to allocate all the queues), and
simply prints performance metrics. This should be integrated with
whatever database we are going to use for performance monitoring in the
future.

Signed-off-by: Andrii Sultanov <[email protected]>
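To make the first check concrete, a hypothetical sketch of what "interfaces have been created correctly" could look like; counting sysfs entries is an assumption, not necessarily what the test does:

```python
# Count ethN interfaces inside the guest and compare with the expected
# VIF count; a shortfall would point at grant-frame exhaustion.
nb_ifaces = int(vm.ssh(['ls /sys/class/net | grep -c "^eth"']))
assert nb_ifaces == vif_limit, \
    f"expected {vif_limit} interfaces, found {nb_ifaces}"
```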
github-merge-queue bot pushed a commit to xapi-project/xen-api that referenced this pull request Aug 11, 2025
…ain creation (#6577)

xenopsd currently hardcodes 64 as the value for `max_grant_frames` for
all domains. This limits how many grants a domain can allocate, thus
limiting the number of VIFs and VBDs a domain can have.
`max_grant_frames` can be changed for individual VMs by specifying a
particular value in `platform`, but it's much better to be able to
estimate how many grant frames a VM would need based on the number of
VBDs and VIFs.

Implement this and add some notes on the difficulties and imprecisions
of such an estimation.
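To make the idea concrete, a back-of-the-envelope sketch (not the formula from this PR): a grant-table v1 frame is one 4096-byte page of 8-byte entries, i.e. 512 grants, and every VIF queue and VBD consumes some number of grants for rings and buffers. The per-device constants below are placeholders:

```python
GRANTS_PER_FRAME = 4096 // 8  # 512 entries per grant-table v1 frame

def estimate_max_grant_frames(num_vifs, num_vbds,
                              queues_per_vif=8, grants_per_queue=512,
                              grants_per_vbd=1056):
    # Per-device costs here are illustrative placeholders; the real
    # estimation (and its caveats) lives in xenopsd.
    grants = (num_vifs * queues_per_vif * grants_per_queue
              + num_vbds * grants_per_vbd)
    return -(-grants // GRANTS_PER_FRAME)  # ceiling division
```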

This allows raising the number of supported VIFs from the current limit
of 7 - we've picked 16 as a sweet spot, though this could be discussed
further. (Note that it's the `allowed_vifs` limit that's changed; VIFs
can be created in arbitrary numbers on the CLI and by clients not
honouring the `allowed_vifs` advice.)

Given the current behaviour of the XenServer/XCP-ng system
(hypervisor+drivers), where more VIFs allow for higher overall
networking throughput, this is highly beneficial - in testing, overall
throughput with 16 VIFs was 18-27% higher than with 8 VIFs (tested with
multiple iperf3 instances running on all interfaces simultaneously). The
following table shows the per-VIF and per-VM performance on a host with
16 pCPUs, with 8 vCPUs each for domU and dom0:

![image](https://github.com/user-attachments/assets/f61488bf-90f0-4cbd-8896-176e053f708f)
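A rough sketch of the measurement setup described above, assuming the ssh helpers used by the xcp-ng-tests test; the ports, the guest addressing scheme, and `hostname_or_ip` reachability are all illustrative assumptions:

```python
# Start one daemonized iperf3 server per port on the host...
for i in range(1, vif_limit):
    host.ssh([f'iperf3 -s -D -p {5200 + i}'])

# ...then run one client per guest interface simultaneously, each bound
# to its own (illustrative) address.
clients = [
    vm.ssh([f'iperf3 -c {host.hostname_or_ip} -p {5200 + i} -B 10.0.{i}.2'],
           background=True)
    for i in range(1, vif_limit)
]
```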

Moreover, some users coming from VMWare are used to networking setups
with dozens of VIFs, and this is a step towards allowing that without
encountering any other bottlenecks in the system.

Most of this work (except the last commit) was tested very thoroughly at
XenServer, since it was initially intended to raise the supported limit
of VBDs (it passed Ring3 BST+BVT multiple times, and ran through the
config limits test several times). Raising the number of supported VBDs
has other issues, so it was abandoned, but this PR solves all the issues
for VIFs.
The new limit was tested with XCP-ng, with a new test being added to our
CI (xcp-ng/xcp-ng-tests#338), which will verify
the ability to hit the new limit and perform well at it.
last-genius (Author)

The upstream PR was merged: xapi-project/xen-api#6577
