Verify VIF limit can be reached #338


Draft · last-genius wants to merge 3 commits into master

Conversation

last-genius (Author)

Adds a new test verifying xcp-ng can reach the new suggested limit of 16 VIFs (also tracking performance in a basic way).

Draft until the upstream xapi PR raising the limit gets merged.

This allows immediately operating on the VIF without a subsequent
vif-list or similar call

Signed-off-by: Andrii Sultanov <[email protected]>
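Reading between the lines (this is an inference from the commit message, not from the diff), the helper presumably now returns the uuid that `xe vif-create` prints. A minimal sketch of that kind of change, in the xe-wrapper style used elsewhere in xcp-ng-tests:

```python
# Hypothetical sketch: `xe vif-create` prints the new VIF's uuid;
# returning it lets callers operate on the VIF right away, with no
# subsequent vif-list lookup.
def create_vif(self, device, network_uuid):
    return self.host.xe('vif-create', {
        'vm-uuid': self.uuid,
        'device': str(device),
        'network-uuid': network_uuid,
    })
```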
stormi (Member) commented Jul 17, 2025

Ping @ydirson and @bleader. This is a draft PR not because it's not ready but because the feature is not released yet. Let's not leave Andrii waiting for (one of?) his first test addition(s) to xcp-ng-tests.

Comment on lines +29 to +34
vm.param_set('VCPUs-max', vcpus)
vm.param_set('VCPUs-at-startup', vcpus)
Contributor

This is a permanent change to the VM; we likely don't want that when it is not a throwaway VM. I'd think this is a good candidate for a fixture (sketch below).
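A minimal sketch of such a fixture, reusing the param_set calls from the diff; the fixture name, the `imported_vm` dependency, and the `param_get` helper are assumptions:

```python
import pytest

VCPUS_FOR_TEST = 16  # illustrative value

@pytest.fixture
def vm_with_vcpus(imported_vm):
    vm = imported_vm
    # Save the current values so the VM is left as we found it.
    saved = {p: vm.param_get(p) for p in ('VCPUs-max', 'VCPUs-at-startup')}
    vm.param_set('VCPUs-max', VCPUS_FOR_TEST)
    vm.param_set('VCPUs-at-startup', VCPUS_FOR_TEST)
    yield vm
    # Cleanup phase: restore the original vCPU configuration.
    for param, value in saved.items():
        vm.param_set(param, value)
```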

Author

But the VM is destroyed at the end.

Comment on lines +32 to +38
logging.info('Create VIFs before starting the VM')
for i in range(existing_vifs, vif_limit):
vm.create_vif(i, network_uuid=network_uuid)
Contributor

Similarly, those should be disposed of in the cleanup phase.

Author

When the VM is destroyed, these are cleaned up as well.

stormi (Member) commented Jul 18, 2025

Though we currently give module scope to VM fixtures (so they are indeed destroyed between tests of different modules), we strive to always leave resources in the state where we found them.

There are exceptions:

  • tests and fixtures depending on higher-level fixtures which create throwaway VM clones for the tests.
  • tests and fixtures depending on higher-level fixtures which create snapshots and revert to them after the tests (sketch below).
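For illustration, a minimal sketch of the snapshot pattern from the second bullet, assuming `vm.snapshot()` returns an object with `revert()` and `destroy()` (treat the exact names as assumptions):

```python
import pytest

@pytest.fixture
def vm_reverted_after_test(imported_vm):
    vm = imported_vm
    snapshot = vm.snapshot()  # record the VM's state before the test
    yield vm
    # Leave the resource in the state where we found it.
    snapshot.revert()
    snapshot.destroy()
```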

Comment on lines 46 to 63
logging.info('Configure interfaces')
config = '\n'.join([f'iface {interface_name}{i} inet dhcp\n'
f'auto {interface_name}{i}'
for i in range(1, vif_limit)])
vm.ssh([f'echo "{config}" >> /etc/network/interfaces'])

logging.info('Install iperf3 on VM and host')
if vm.ssh_with_result(['apt install iperf3 --assume-yes']).returncode != 0:
assert False, "Failed to install iperf3 on the VM"
host.yum_install(['iperf3'])

logging.info('Reconfigure VM networking')
if vm.ssh_with_result(['systemctl restart networking']).returncode != 0:
assert False, "Failed to configure networking"
Contributor

I'd avoid putting anything between "Configure interfaces" and "Reconfigure VM networking". But modifying the VM config does not seem right anyway; why not simply call ip commands to bring the ifaces up?

Author

I need each interface to get a distinct IPv4 address, otherwise multiple iperf clients won't run in parallel on different interfaces. I thought this was the easiest way to do it (calling ssh in a loop would also take a very long time, so a single script is much faster).
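For comparison, a minimal sketch of the ip-command alternative batched into a single ssh call (so it stays fast); the 10.0.N.2/24 addressing scheme is purely illustrative:

```python
# Bring each interface up and give it a distinct IPv4 address, all in
# one ssh invocation instead of one round-trip per interface.
cmds = []
for i in range(1, vif_limit):
    cmds.append(f'ip link set {interface_name}{i} up')
    cmds.append(f'ip addr add 10.0.{i}.2/24 dev {interface_name}{i}')
vm.ssh([' && '.join(cmds)])
```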

# There is a ResourceWarning due to background=True on an ssh call
# We do ensure the processes are killed
@pytest.mark.filterwarnings("ignore::ResourceWarning")
@pytest.mark.debian_uefi_vm
Contributor

This would need the mark to be declared in this commit already.

@stormi I'm not yet 100% up to speed with how VM selection works, but that pattern reminds me of something - is it even going to work as expected?

Author

Sure, I'll reorder the commits.

Member

So, regarding VM selection: this would be the first time we'd have a marker that asks for a specific distro. But it's not the first time we have a test that wants a Unix VM with requirements not met by alpine (see the vtpm tests).

IMO:

  • Any requirements on the VM should be expressed in a fixture that checks that the VM provided to the test is suitable. A marker doesn't do this. We could have a debian_vm fixture if we really need a debian VM, but we avoid writing tests for a given distro as much as possible. I'd rather have a more generic fixture, like "unix_vm_with_whatever_criteria_let_us_know_that_the_test_should_pass_with_it". That's something the vtpm test does poorly, by the way (the checks are in the test there, not in a fixture where they would belong).
  • If necessary for filtering tests in jobs, we can automatically generate markers based on fixture requirements. Check the unix_vm or windows_vm markers, for example. Here we could have a unix_vm_with_whatever_criteria_let_us_know_that_the_test_should_pass_with_it marker.
  • There does not need to be an exact match between the test's requirements and the VM identifiers defined in jobs.py (such as the existing debian_uefi_vm). We first express the requirements in the test, and then see how we can use the test in jobs.py. Here, making the job use debian_uefi_vm seems appropriate, but that's not the only solution we could choose. We could also want to run the test with a collection of OSes (various unixes, windows...) so that the test not only checks XAPI's and Xen's handling of the VIF limit, but also the OSes and PV drivers. Maybe @dinhngtu will have an opinion on this.

Another related topic: should the fixture checking the requirements make the test fail, or skip?

In most cases, we'll want the test to fail: it indicates that we called the test with a VM that is not appropriate. However, if we want to include the test in a "multi" job, one that runs on many different VMs, and if the test can run with most unix VMs (for one version of the test) and windows VMs (for another version of the test), then skipping just for the few VMs that are not suitable (alpine...) could be a way. There's also the possibility to reuse existing VM collections that already exclude alpine in jobs.py/vm_data.py, such as the ones we use for testing guest tools (currently named tools_unix and tools_windows, but if we can define a common set of selection criteria, the name could change). A sketch of such a requirements-checking fixture follows at the end of this comment.

It's all a question of compromise between being specific and limiting the number of jobs and different VM collections in CI.

(Another side note: in the future I'd like skipped tests to be considered as failures in Jenkins, unless they belong to a list of expected skips).
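A minimal sketch of the requirements-checking fixture discussed above; the `unix_vm` dependency and the `is_alpine()` helper are hypothetical placeholders for whatever the suite actually exposes:

```python
import pytest

@pytest.fixture
def unix_vm_suitable_for_vif_limit(unix_vm):
    # `is_alpine()` is a hypothetical helper standing in for a real
    # suitability check on the provided VM.
    if unix_vm.is_alpine():
        # Default: fail, since the job handed us an inappropriate VM.
        pytest.fail("this test requires a non-alpine unix VM")
        # For a "multi" job running over many VMs, skip instead:
        # pytest.skip("VM not suitable: alpine")
    yield unix_vm
```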

Member

I'm a bit lost on the PR's context. Is the per-image limitation coming from the guest kernel or something else? If it's something not statically defined by the image itself, I think runtime detection + pytest.skip would be more appropriate.

Member

The question for you was mostly whether you'd find it useful to test Windows VMs with up to 16 VIFs.

Member

Yes, it'd be useful.

# There is a ResourceWarning due to background=True on an ssh call
# We do ensure the processes are killed
@pytest.mark.filterwarnings("ignore::ResourceWarning")
@pytest.mark.debian_uefi_vm
Contributor

Why specifically UEFI?

Author

The only thing this really needs to express is "non-alpine", preferably debian, because some other distros might have different naming schemes for interfaces. UEFI is not necessary.

Currently the test verifies that interfaces have been created correctly
(that there were enough grant frames to allocate all the queues), and
simply prints performance metrics. This should be integrated with
whatever database we are going to use for performance monitoring in the
future.

Signed-off-by: Andrii Sultanov <[email protected]>
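To make the first check concrete, a hypothetical sketch of what "interfaces have been created correctly" could look like; counting sysfs entries is an assumption, not necessarily what the test does:

```python
# Count ethN interfaces inside the guest and compare with the expected
# VIF count; a shortfall would point at grant-frame exhaustion.
nb_ifaces = int(vm.ssh(['ls /sys/class/net | grep -c "^eth"']))
assert nb_ifaces == vif_limit, \
    f"expected {vif_limit} interfaces, found {nb_ifaces}"
```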
github-merge-queue bot pushed a commit to xapi-project/xen-api that referenced this pull request Aug 11, 2025
…ain creation (#6577)

xenopsd currently hardcodes 64 as the value for `max_grant_frames` for
all domains. This limits how many grants a domain can allocate, thus
limiting the number of VIFs and VBDs a domain can have.
`max_grant_frames` can be changed for individual VMs by specifying a
particular value in `platform`, but it's much better to be able to
estimate how many grant frames a VM would need based on the number of
VBDs and VIFs.

Implement this and add some notes on the difficulties and imprecisions
of such an estimation.
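To make the idea concrete, a back-of-the-envelope sketch (not the formula from this PR): a grant-table v1 frame is one 4096-byte page of 8-byte entries, i.e. 512 grants, and every VIF queue and VBD consumes some number of grants for rings and buffers. The per-device constants below are placeholders:

```python
GRANTS_PER_FRAME = 4096 // 8  # 512 entries per grant-table v1 frame

def estimate_max_grant_frames(num_vifs, num_vbds,
                              queues_per_vif=8, grants_per_queue=512,
                              grants_per_vbd=1056):
    # Per-device costs here are illustrative placeholders; the real
    # estimation (and its caveats) lives in xenopsd.
    grants = (num_vifs * queues_per_vif * grants_per_queue
              + num_vbds * grants_per_vbd)
    return -(-grants // GRANTS_PER_FRAME)  # ceiling division
```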

This allows raising the number of supported VIFs from the current limit
of 7 - we've picked 16 as a sweet spot, though this could be discussed
further. (Note that it's the `allowed_vifs` limit that's changed; VIFs
can be created in arbitrary numbers on the CLI and by clients not
honouring the `allowed_vifs` advice.)

Given the current behaviour of the XenServer/XCP-ng system
(hypervisor+drivers), where more VIFs allow for higher overall
networking throughput, this is highly beneficial - in testing, overall
throughput with 16 VIFs was 18-27% higher than with 8 VIFs (tested with
multiple iperf3 instances running on all interfaces simultaneously). The
following table shows the per-VIF and per-VM performance on a host with
16 pCPUs, with 8 vCPUs each for domU and dom0:

![image](https://github.com/user-attachments/assets/f61488bf-90f0-4cbd-8896-176e053f708f)
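A rough sketch of the measurement setup described above, assuming the ssh helpers used by the xcp-ng-tests test; the ports, the guest addressing scheme, and `hostname_or_ip` reachability are all illustrative assumptions:

```python
# Start one daemonized iperf3 server per port on the host...
for i in range(1, vif_limit):
    host.ssh([f'iperf3 -s -D -p {5200 + i}'])

# ...then run one client per guest interface simultaneously, each bound
# to its own (illustrative) address.
clients = [
    vm.ssh([f'iperf3 -c {host.hostname_or_ip} -p {5200 + i} -B 10.0.{i}.2'],
           background=True)
    for i in range(1, vif_limit)
]
```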

Moreover, some users coming from VMWare are used to networking setups
with dozens of VIFs, and this is a step towards allowing that without
encountering any other bottlenecks in the system.

Most of this work (except the last commit) was tested very thoroughly at
XenServer, since it was initially intended to raise the supported limit
of VBDs (it passed Ring3 BST+BVT multiple times, and ran through the
config limits test several times). Raising the number of supported VBDs
has other issues, so it was abandoned, but this PR solves all the issues
for VIFs.
The new limit was tested with XCP-ng, with a new test being added to our
CI (xcp-ng/xcp-ng-tests#338), which will verify
the ability to hit the new limit and perform well at it.
last-genius (Author)

The upstream PR was merged: xapi-project/xen-api#6577
