Verify VIF limit can be reached #338
base: master
Conversation
This allows avoiding a subsequent vif-list or similar call in order to immediately operate on the VIF.
Signed-off-by: Andrii Sultanov <[email protected]>
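For context, a minimal sketch of what this change might look like (hypothetical helper, not the PR's actual code): `xe vif-create` prints the UUID of the new VIF, so returning it to the caller removes the need for a follow-up `vif-list`.

```python
import subprocess

def create_vif(vm_uuid: str, device: int, network_uuid: str) -> str:
    # Hypothetical sketch (not the PR's code): `xe vif-create` prints the
    # UUID of the newly created VIF, so we return it to the caller instead
    # of requiring a follow-up `xe vif-list`.
    out = subprocess.check_output([
        'xe', 'vif-create',
        f'vm-uuid={vm_uuid}',
        f'device={device}',
        f'network-uuid={network_uuid}',
    ])
    return out.decode().strip()
```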
vm.param_set('VCPUs-max', vcpus)
vm.param_set('VCPUs-at-startup', vcpus)
This is a permanent change to the VM; we likely don't want that when it is not a throwaway VM. I'd think this is a good candidate for a fixture.
But the VM is destroyed at the end.
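For illustration, a fixture along the lines suggested above might set the vCPU parameters for the test and restore them afterwards. This is a rough sketch; the fixture name, the `imported_vm` dependency and the value of `vcpus` are assumptions, not the PR's code.

```python
import pytest

@pytest.fixture
def vm_with_vcpus(imported_vm):
    # Hypothetical sketch: raise the vCPU count for the duration of the
    # test, then restore the original values so a shared VM is left as
    # we found it. param_get/param_set mirror the calls in the diff.
    vm = imported_vm
    old_max = vm.param_get('VCPUs-max')
    old_startup = vm.param_get('VCPUs-at-startup')
    vcpus = 8  # illustrative value
    vm.param_set('VCPUs-max', vcpus)
    vm.param_set('VCPUs-at-startup', vcpus)
    yield vm
    vm.param_set('VCPUs-at-startup', old_startup)
    vm.param_set('VCPUs-max', old_max)
```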
logging.info('Create VIFs before starting the VM')
for i in range(existing_vifs, vif_limit):
    vm.create_vif(i, network_uuid=network_uuid)
Similarly, those should be disposed of in the cleanup phase.
When the VM is destroyed, these are cleaned up as well.
Though we currently give module scope to VM fixtures (so they are indeed destroyed between tests of different modules), we strive to always leave resources in the state in which we found them.
There are exceptions:
- tests and fixtures depending on higher-level fixtures which create throwaway VM clones for the tests.
- tests and fixtures depending on higher-level fixtures which create snapshots and revert to them after the tests.
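To illustrate the kind of cleanup being asked for, here is a rough sketch of a fixture that creates the VIFs and destroys them on teardown. The fixture and parameter names are assumptions, not the PR's code, and it assumes `create_vif` returns the new VIF's UUID (as per the first commit).

```python
import logging
import pytest

@pytest.fixture
def vm_with_extra_vifs(imported_vm, host, network_uuid):
    # Hypothetical sketch: `network_uuid` is assumed to come from another
    # fixture. Create the VIFs needed by the test, then remove them on
    # teardown so a module-scoped VM is left in its original state.
    vm = imported_vm
    existing_vifs, vif_limit = 1, 16  # illustrative values
    created = []
    logging.info('Create VIFs before starting the VM')
    for i in range(existing_vifs, vif_limit):
        # assumes create_vif returns the new VIF's UUID (see the first commit)
        created.append(vm.create_vif(i, network_uuid=network_uuid))
    yield vm
    for vif_uuid in created:
        host.xe('vif-destroy', {'uuid': vif_uuid})
```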
tests/limits/test_vif_limit.py (outdated diff)
logging.info('Configure interfaces')
config = '\n'.join([f'iface {interface_name}{i} inet dhcp\n'
                    f'auto {interface_name}{i}'
                    for i in range(1, vif_limit)])
vm.ssh([f'echo "{config}" >> /etc/network/interfaces'])
logging.info('Install iperf3 on VM and host')
if vm.ssh_with_result(['apt install iperf3 --assume-yes']).returncode != 0:
    assert False, "Failed to install iperf3 on the VM"
host.yum_install(['iperf3'])
logging.info('Reconfigure VM networking')
if vm.ssh_with_result(['systemctl restart networking']).returncode != 0:
    assert False, "Failed to configure networking"
I'd avoid putting anything between "Configure interfaces" and "Reconfigure VM networking". Also, modifying the VM config does not seem right - why not simply call `ip` commands to bring the ifaces up?
I need each interface to get a distinct IPv4 address, otherwise multiple iperf clients won't run in parallel on different interfaces. I thought this was the easiest way to do it (calling `ssh` in a loop would also take a very long time, so the script is much faster).
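For comparison, an `ip`-based variant that still uses a single ssh call could look roughly like the sketch below. The per-interface `10.0.{i}.2/24` addressing and the function signature are assumptions; this is not what the PR implements.

```python
def configure_ifaces_with_ip(vm, interface_name, vif_limit):
    # Sketch only: give each extra interface a distinct static IPv4
    # address and bring it up, all in one ssh call, instead of appending
    # stanzas to /etc/network/interfaces.
    script = ' && '.join(
        f'ip addr add 10.0.{i}.2/24 dev {interface_name}{i} && '
        f'ip link set {interface_name}{i} up'
        for i in range(1, vif_limit)
    )
    vm.ssh([script])
```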
# There is a ResourceWarning due to background=True on an ssh call
# We do ensure the processes are killed
@pytest.mark.filterwarnings("ignore::ResourceWarning")
@pytest.mark.debian_uefi_vm
This would need the mark to be declared in this commit already.
@stormi I'm not yet 100% up to speed with how VM selection works, but that pattern reminds me of something - is it even going to work as expected?
Sure, I'll reorder the commits.
So, regarding VM selection, that's the first time we'd have a marker that asks for a specific distro. But that's not the first time that we have a test that wants a Unix VM with requirements not met by alpine (see the vtpm tests).
IMO:
- Any requirements on the VM should be expressed in a fixture that checks that the VM provided to the test is suitable. A marker doesn't do this. We could have a `debian_vm` fixture if we really need a Debian VM, but we avoid writing tests for a given distro as much as possible. I'd rather have a more generic fixture, like `unix_vm_with_whatever_criteria_let_us_know_that_the_test_should_pass_with_it`. That's something the `vtpm` test does poorly, by the way (the checks are in the test there, not in a fixture where they would belong).
- If necessary for filtering tests in jobs, we can automatically generate markers based on fixture requirements. Check the `unix_vm` or `windows_vm` markers, for example. Here we could have a `unix_vm_with_whatever_criteria_let_us_know_that_the_test_should_pass_with_it` marker.
- There does not need to be an exact match between VM identifiers defined in `jobs.py` (such as the existing `debian_uefi_vm`) and the test's requirements. We first express the requirements in the test, and then see how we can use the test in `jobs.py`. Here, making the job use `debian_uefi_vm` seems appropriate, but that's not the only solution we could choose. We could also want to run the test with a collection of OSes (various unixes, windows...) so that the test not only checks XAPI's and Xen's handling of the VIF limit, but also the OSes and PV drivers. Maybe @dinhngtu will have an opinion on this.
Another related topic: should the fixture checking the requirements make the test fail, or skip?
In most cases, we'll want the test to fail: it indicates that we called the test with a VM that is not appropriate. However, if we want to include the test in a "multi" job, one that runs on many different VMs, and if the test can run with most unix VMs (for one version of the test) and windows VMs (for another version of the test), then skipping just for the few VMs that are not suitable (alpine...) could be a way. There's also the possibility of reusing existing VM collections that already exclude alpine in `jobs.py`/`vm_data.py`, such as the ones we use for testing guest tools (currently named `tools_unix` and `tools_windows`, though if we can define a common set of selection criteria, the names could change).
It's all a question of compromise between being specific and limiting the number of jobs and different VM collections in CI.
(Another side note: in the future I'd like skipped tests to be considered as failures in Jenkins, unless they belong to a list of expected skips.)
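A minimal sketch of such a requirement-checking fixture (names and checks are hypothetical, not existing xcp-ng-tests code), failing by default, with `pytest.skip` as the alternative for "multi" jobs:

```python
import pytest

@pytest.fixture
def unix_vm_suitable_for_vif_limit(imported_vm):
    # Hypothetical sketch: verify that the VM handed to the test can
    # exercise the VIF limit, instead of baking the checks into the test.
    vm = imported_vm
    if vm.is_windows:
        # In a "multi" job covering many OSes, pytest.skip() could be used instead.
        pytest.fail("this test variant requires a Unix VM")
    if 'alpine' in vm.name().lower():  # assumption: distro detected from the VM name
        pytest.fail("alpine VMs are not suitable for this test")
    yield vm
```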
I'm a bit lost on the PR's context. Is the per-image limitation coming from the guest kernel or something else? If it's something not statically defined by the image itself, I think runtime detection + `pytest.skip` would be more appropriate.
The question for you was mostly whether you'd find it useful to test Windows VMs with up to 16 VIFs.
Yes, it'd be useful.
# There is a ResourceWarning due to background=True on an ssh call
# We do ensure the processes are killed
@pytest.mark.filterwarnings("ignore::ResourceWarning")
@pytest.mark.debian_uefi_vm
Why specifically UEFI?
The only thing this really needs to express is "non-alpine", preferably Debian because some other distros might have different naming schemes for interfaces. UEFI is not necessary.
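As background for the ResourceWarning comment in the quoted hunk above, here is a rough sketch of the pattern being described. The test name, the server addresses and the exact `background=True` semantics are assumptions, not the PR's code.

```python
import pytest

@pytest.mark.filterwarnings("ignore::ResourceWarning")
def test_vif_throughput(vm, vif_limit):
    # Hypothetical sketch: start one iperf3 client per interface in the
    # background (which is what triggers the ResourceWarning), and make
    # sure the processes are killed even if the test fails.
    try:
        for i in range(1, vif_limit):
            vm.ssh([f'iperf3 -c 10.0.{i}.1 -t 30'], background=True)
        # ... collect and log the iperf3 results here ...
    finally:
        vm.ssh(['pkill iperf3 || true'])
```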
Signed-off-by: Andrii Sultanov <[email protected]>
Currently the test verifies that interfaces have been created correctly (that there were enough grant frames to allocate all the queues), and simply prints performance metrics. This should be integrated with whatever database we are going to use for performance monitoring in the future.
Signed-off-by: Andrii Sultanov <[email protected]>
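A rough illustration of the kind of interface check described in this commit message (the command and helper names are assumptions, not the PR's code):

```python
def assert_all_vifs_visible(vm, vif_limit):
    # Sketch only: count the network interfaces inside the guest (excluding
    # loopback) to check that every VIF got its netfront device, i.e. that
    # enough grant frames were available to allocate all the queues.
    result = vm.ssh_with_result(['ls /sys/class/net | grep -vc "^lo$"'])
    assert result.returncode == 0, "Failed to list interfaces in the VM"
    stdout = result.stdout.decode() if isinstance(result.stdout, bytes) else result.stdout
    count = int(stdout.strip())
    assert count >= vif_limit, f"Expected at least {vif_limit} interfaces, found {count}"
```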
Force-pushed from c3182c9 to 6a7acdb
…ain creation (#6577)

xenopsd currently hardcodes 64 as the value for `max_grant_frames` for all domains. This limits how many grants a domain can allocate, thus limiting the number of VIFs and VBDs a domain can have. `max_grant_frames` can be changed for individual VMs by specifying a particular value in `platform`, but it's much better to be able to estimate how many grant frames a VM would need based on the number of VBDs and VIFs. Implement this and add some notes on the difficulties and imprecisions of such an estimation.

This allows raising the number of supported VIFs from the current limit of 7 - we've picked 16 as a sweet spot, though this could be discussed further. (Note that it's the `allowed_vifs` limit that's changed; VIFs can be created in arbitrary numbers on the CLI and by clients not honouring the `allowed_vifs` advice.)

Given the current behaviour of the XenServer/XCP-ng system (hypervisor + drivers), where more VIFs allow for higher overall networking throughput, this is highly beneficial - in testing, overall throughput with 16 VIFs was 18-27% higher than with 8 VIFs (tested with multiple iperf3 instances running on all interfaces simultaneously). The following table shows the per-VIF and per-VM performance on a host with 16 pCPUs, with 8 vCPUs each for domU and dom0:



Moreover, some users coming from VMware are used to networking setups with dozens of VIFs, and this is a step towards allowing that without encountering any other bottlenecks in the system.

Most of this work (except the last commit) was tested very thoroughly at XenServer, since it was initially intended to raise the supported limit of VBDs (it passed Ring3 BST+BVT multiple times, and ran through the config limits test several times). Raising the number of supported VBDs has other issues, so it was abandoned, but this PR solves all the issues for VIFs.

The new limit was tested with XCP-ng, with a new test being added to our CI (xcp-ng/xcp-ng-tests#338), which will verify the ability to hit the new limit and perform well at it.
The upstream PR was merged: xapi-project/xen-api#6577
Adds a new test verifying xcp-ng can reach the new suggested limit of 16 VIFs (also tracking performance in a basic way).
Draft until the upstream xapi PR raising the limit gets merged.