Skip to content

Conversation

Rishikpulhani
Copy link
Contributor

@Rishikpulhani Rishikpulhani commented Aug 11, 2025

When a container using DHCP is torn down, its lease is left active on the DHCP server until it expires. This can be problematic in environments with small IP pools or long lease times. In setups using Dynamic DNS (DDNS), it can also lead to stale DNS records.

This change introduces the capability for netavark to send a DHCPRELEASE message to the DHCP server when a container's network is torn down.

Implementation

  • Adding a release_lease method to the DhcpV4Service in dhcp_service.rs, which wraps the release function from the underlying mozim client.
  • Modifying the teardown gRPC handler in dhcp_proxy.rs to create a temporary DhcpV4Service instance, retrieve the lease from the cache, and call the new release_lease method in a "best-effort" manner.

Testing

This feature was verified using the BATS suite in the test-dhcp/ directory.

  • The test 003-teardown.bats was updated with a new assertion.
  • The test confirms that after run_teardown is called, the netavark-dhcp-proxy service log contains the message "Successfully sent DHCPRELEASE for <MAC_ADDRESS>".
  • This approach of checking the proxy's own log was chosen because the virtual bridge in the test environment does not reliably forward the broadcast DHCPRELEASE packet to the dnsmasq listener. The proxy log provides definitive proof that the send logic was triggered correctly.

fixes #1271

Help Needed: Blocked on Testing

I am opening this as a Draft PR because I am currently blocked on testing due to persistent environmental issues in my WSL Fedora setup. I've attempted several testing architectures (macvlan, unmanaged bridge with a containerized DHCP server, etc.) but have been consistently blocked by WSL-specific networking and systemd user session problems.

The code is ready for a logical review, and I would greatly appreciate it if a maintainer with a stable test environment could help verify the functionality or suggest a reliable testing setup for WSL.

Summary by Sourcery

Enable sending DHCPRELEASE on container network teardown to free up IP leases immediately

New Features:

  • Add release_lease to DhcpV4Service to send DHCPRELEASE using the mozim client
  • Update teardown handler to invoke release_lease for cached leases in a best-effort manner

Enhancements:

  • Log lease release failures as warnings without aborting the teardown process

Summary by Sourcery

Enable netavark to send DHCPRELEASE when tearing down container networks to prevent stale leases and DNS records

New Features:

  • Send DHCPRELEASE on container network teardown to proactively free IP leases
  • Add release_lease method to DhcpV4Service to wrap mozim client’s release functionality
  • Invoke release_lease in the teardown handler in a best-effort manner

Enhancements:

  • Log DHCPRELEASE attempts with debug messages and warn on failures without aborting teardown

Tests:

  • Update 003-teardown.bats to assert the proxy log contains the success message for DHCPRELEASE

Copy link

sourcery-ai bot commented Aug 11, 2025

Reviewer's Guide

This PR enhances netavark’s DHCP proxy to actively send a DHCPRELEASE when a container’s network is torn down by adding a release_lease method to the DHCP service, integrating it into the teardown handler in a best-effort manner, and updating the teardown test to assert on proxy logs.

Sequence diagram for DHCPRELEASE on container teardown

sequenceDiagram
    participant Container
    participant NetavarkProxy
    participant DhcpV4Service
    participant DHCPServer
    Container->>NetavarkProxy: Request network teardown
    NetavarkProxy->>DhcpV4Service: Create service, retrieve lease
    DhcpV4Service->>DHCPServer: Send DHCPRELEASE
    DhcpV4Service-->>NetavarkProxy: Return result (success/failure)
    NetavarkProxy-->>Container: Confirm teardown
Loading

Class diagram for updated DhcpV4Service and teardown handler

classDiagram
    class DhcpV4Service {
        +release_lease(lease: MozimV4Lease): Result<(), DhcpServiceError>
        ...
    }
    class NetavarkProxyService {
        +teardown(request: NetworkConfig): Response<Lease>
        ...
    }
    NetavarkProxyService --> DhcpV4Service : uses
    DhcpV4Service --> MozimV4Lease : parameter
    MozimV4Lease <.. DhcpV4Service : used in release_lease
Loading

File-Level Changes

Change Details Files
Introduce DHCP lease release functionality
  • Add release_lease method in DhcpV4Service calling the mozim client’s release
  • Emit debug log before invoking the release operation
src/dhcp_proxy/dhcp_service.rs
Invoke lease release in container teardown handler
  • Convert cached lease to mozim format and initialize a temporary DhcpV4Service
  • Call release_lease in a best-effort block, logging warnings on errors and debug on success
src/commands/dhcp_proxy.rs
Enhance teardown test to verify release logic
  • Override start_proxy to enable debug logging within the test namespace
  • Add grep assertion to check proxy.log for the successful DHCPRELEASE message
test-dhcp/003-teardown.bats

Assessment against linked issues

Issue Objective Addressed Explanation
#1271 Implement sending a DHCP RELEASE message when a container using DHCP is torn down.
#1271 Ensure the DHCP RELEASE logic is tested and verifiable in the test suite.

Possibly linked issues


Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link

Ephemeral COPR build failed. @containers/packit-build please check.

2 similar comments
Copy link

Ephemeral COPR build failed. @containers/packit-build please check.

Copy link

Ephemeral COPR build failed. @containers/packit-build please check.

@Rishikpulhani Rishikpulhani changed the title WIP: feat(dhcp): Send DHCPRELEASE on container teardown Feat(dhcp): Send DHCPRELEASE on container teardown Aug 11, 2025
@Rishikpulhani Rishikpulhani marked this pull request as ready for review August 11, 2025 19:55
Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @Rishikpulhani - I've reviewed your changes and they look great!

Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments

### Comment 1
<location> `src/commands/dhcp_proxy.rs:175` </location>
<code_context>
+                // We use a clone of 'nc' for the new service, and the original 'nc' for logging.
+                match DhcpV4Service::new(nc.clone(), self.dora_timeout) {
+                    Ok(mut service) => {
+                        if let Err(e) = service.release_lease(&mozim_lease) {
+                            // Log the error but continue. Use the correct variable for the MAC address.
+                            warn!(
+                                "Failed to send DHCPRELEASE for {}: {}",
+                                &nc.container_mac_addr, e
+                            );
+                        } else {
+                            debug!(
+                                "Successfully sent DHCPRELEASE for {}",
</code_context>

<issue_to_address>
Check if release_lease could block or introduce latency in teardown.

If release_lease performs network I/O, consider adding a timeout or running it asynchronously to avoid blocking teardown longer than necessary.

Suggested implementation:

```rust
                match DhcpV4Service::new(nc.clone(), self.dora_timeout) {
                    Ok(mut service) => {
                        // Run release_lease asynchronously with a timeout to avoid blocking teardown.
                        let release_timeout = std::time::Duration::from_secs(3);
                        match tokio::time::timeout(release_timeout, service.release_lease(&mozim_lease)).await {
                            Ok(Ok(_)) => {
                                debug!(
                                    "Successfully sent DHCPRELEASE for {}",
                                    &nc.container_mac_addr
                                );
                            }
                            Ok(Err(e)) => {
                                warn!(
                                    "Failed to send DHCPRELEASE for {}: {}",
                                    &nc.container_mac_addr, e
                                );
                            }
                            Err(_) => {
                                warn!(
                                    "DHCPRELEASE for {} timed out after {:?}",
                                    &nc.container_mac_addr, release_timeout
                                );
                            }
                        }
                    }
                    Err(e) => {
                        warn!("Failed to create DHCP service for release: {e}");
                    }
                }

```

- The containing function must be marked as `async` if it isn't already, and its callers must be updated accordingly.
- The `release_lease` method must be async. If it is not, you will need to refactor it to be async.
- Ensure `tokio` is available in your dependencies.
- If you want the timeout duration to be configurable, consider making it a struct field or a constant.
</issue_to_address>

### Comment 2
<location> `test-dhcp/003-teardown.bats:45` </location>
<code_context>
+       # broadcast DHCPRELEASE packet to the dnsmasq listener. We check the
+       # proxy's own log for the success message, as this directly confirms
+       # the code path was executed successfully.
+       run_helper grep "Successfully sent DHCPRELEASE for ${CONTAINER_MAC}" "$TMP_TESTDIR/proxy.log"
        run_helper cat "$TMP_TESTDIR/nv-proxy.lease"
        # Check that the length of the lease file is now zero
</code_context>

<issue_to_address>
Consider adding a negative test to verify log output when DHCPRELEASE fails.

Testing a failure scenario, such as an invalid lease or mozim client error, would confirm that warning logs are generated and teardown proceeds as expected.

Suggested implementation:

```shell
       run_helper grep "Successfully sent DHCPRELEASE for ${CONTAINER_MAC}" "$TMP_TESTDIR/proxy.log"
       run_helper cat "$TMP_TESTDIR/nv-proxy.lease"
       # Check that the length of the lease file is now zero
       run_helper jq ". | length" <<<"$output"

       # Negative test: simulate DHCPRELEASE failure by using an invalid MAC address
       INVALID_MAC="00:00:00:00:00:00"
       # Prepare a config with the invalid MAC
       read -r -d '\0' invalid_input_config <<EOF
{
  "container_mac": "$INVALID_MAC",
  "other_config": "value"
}
EOF
       # Run teardown with invalid MAC
       run_teardown "$invalid_input_config"
       # Check the proxy log for a warning or error about DHCPRELEASE failure
       run_helper grep "Failed to send DHCPRELEASE for ${INVALID_MAC}" "$TMP_TESTDIR/proxy.log"
       # Confirm teardown still proceeds and lease file is handled
       run_helper cat "$TMP_TESTDIR/nv-proxy.lease"

```

- You may need to adjust the invalid_input_config structure to match your actual teardown input requirements.
- Ensure your proxy implementation logs a message like "Failed to send DHCPRELEASE for ..." on failure; if the log message differs, update the grep pattern accordingly.
- If teardown expects more fields in the config, add them to invalid_input_config.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Copy link
Contributor

openshift-ci bot commented Aug 11, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Rishikpulhani, sourcery-ai[bot]
Once this PR has been reviewed and has the lgtm label, please assign luap99 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Member

@Luap99 Luap99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR

Comment on lines 8 to 11
# Redefine start_proxy locally to enable debug logging for this test only.
# This version will be used instead of the one from helpers.bash.
function start_proxy() {
RUST_LOG=debug ip netns exec "$NS_NAME" $NETAVARK dhcp-proxy --dir "$TMP_TESTDIR" --uds "$TMP_TESTDIR" &>"$TMP_TESTDIR/proxy.log" &
PROXY_PID=$!
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see no reason why we should not have debug log tunred on for all tests. I rather not duplicate this here.

# broadcast DHCPRELEASE packet to the dnsmasq listener. We check the
# proxy's own log for the success message, as this directly confirms
# the code path was executed successfully.
run_helper grep "Successfully sent DHCPRELEASE for ${CONTAINER_MAC}" "$TMP_TESTDIR/proxy.log"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach of checking the proxy's own log was chosen because the virtual bridge in the test environment does not reliably forward the broadcast DHCPRELEASE packet to the dnsmasq listener.

Can you expand on this? I think we also have to check dnsmasq logs, because our logs could just lie and say we send the event without ever doing so.
If we are able to request a lease we should be able to send the release event as well or am I missing something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Luap99, you're right to question this. My initial implementation did exactly what you suggested – I was checking the dnsmasq log for the DHCPRELEASE message.

However, the test was consistently failing. The dnsmasq.log showed a perfect DHCPDISCOVER/OFFER/REQUEST/ACK sequence, but the DHCPRELEASE message never appeared

Initially i thought that maybe the test's cleanup logic is terminating the dnsmasq process before the kernel's virtual network has time to deliver the in-flight DHCPRELEASE packet from the proxy.

To debug this, I instrumented the netavark-dhcp-proxy with detailed logging. The proxy's debug log definitively shows that the release_lease function is executed correctly and that the mozim client successfully sends the release packet to the network socket.

I thought that this is just the case with the test environment and so I just added a check for checking that whether the packet was emitted or not instead of checking the dnsmasq logs

Is this an acceptable testing strategy given the circumstances, or is there a better way to resolve this race condition in the BATS environment that I might be missing?

PS: I am actually facing the same issue with my new implementation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @Luap99,

I went through the codebase of both Netavark and Mozim, as well as the logs emitted during tests.

In the Mozim library, I used the following release method on the DhcpV4ClientAsync:
https://docs.rs/mozim/latest/mozim/struct.DhcpV4ClientAsync.html#method.release

This internally calls the release method on DhcpV4Client:
https://docs.rs/mozim/latest/src/mozim/dhcpv4/client.rs.html#554

When reviewing the proxy logs, I found this Release message being sent:

[DEBUG mozim::dhcpv4::msg] DHCP message Message { opcode: BootRequest, htype: Eth, hlen: 6, hops: 0, xid: 1505379826, secs: 0, flags: Flags { broadcast: false }, ciaddr: 10.220.46.58, yiaddr: 0.0.0.0, siaddr: 0.0.0.0, giaddr: 0.0.0.0, chaddr: [106, 108, 253, 24, 156, 237, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], sname: Some([102, 111, 111, 98, 97, 114]), fname: None, magic: [99, 130, 83, 99], opts: DhcpOptions({ServerIdentifier: ServerIdentifier(10.220.46.1), MessageType: MessageType(Release), Hostname: Hostname("foobar"), ClientIdentifier: ClientIdentifier([0, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 97, 98, 99, 100, 101, 102, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 97, 98, 99, 100, 101, 102, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 97, 98, 99, 100, 101, 102, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 97, 98, 99, 100, 101, 102])}) }
   [DEBUG mozim::socket] Sending raw ethernet package: [102, 98, 123, 131, 209, 27, 106, 108, 253, 24, 156, 237, 8, 0, 69, 0, 1, 97, 0, 0, 64, 0, 128, 17, 135, 153, 10, 220, 46, 58, 10, 220, 46, 1, 0, 68, 0, 67, 1, 77, 198, 56, 1, 1, 6, 0, 89, 186, 69, 242, 0, 0, 0, 0, 10, 220, 46, 58, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 106, 108, 253, 24, 156, 237, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 102, 111, 111, 98, 97, 114, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 99, 130, 83, 99, 54, 4, 10, 220, 46, 1, 53, 1, 7, 12, 6, 102, 111, 111, 98, 97, 114, 61, 65, 0, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 97, 98, 99, 100, 101, 102, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 97, 98, 99, 100, 101, 102, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 97, 98, 99, 100, 101, 102, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 97, 98, 99, 100, 101, 102, 255]
   [DEBUG mozim::socket] Raw socket sent: 367 bytes

However, for some reason, this message never reaches the DHCP server.
It’s not due to the server being stopped — I even added delays in 003-teardown.bats before the log check:

sleep 10
assert `grep -c "Successfully sent DHCPRELEASE for ${CONTAINER_MAC}" "$TMP_TESTDIR/dnsmasq.log"` == 1

This had no effect. It seems that although the Mozim client is sending the packet, the DHCP server does not receive it.
My suspicion is that the DHCP server may be dropping these packets.

Any help in identifying what might be going wrong here would be greatly appreciated. Thanks!

@Rishikpulhani Rishikpulhani force-pushed the fix-dhcp-release branch 3 times, most recently from e149268 to fbd03ff Compare August 14, 2025 11:10
@Rishikpulhani Rishikpulhani requested a review from Luap99 August 14, 2025 14:11
@Rishikpulhani
Copy link
Contributor Author

@Luap99
I'm writing to provide an update on the testing for the DHCPRELEASE feature. I've successfully verified that the implementation is working correctly and wanted to document my findings regarding the test environment.

Investigation of the Log Behavior

My initial observation was that while the DHCPDISCOVER/OFFER/REQUEST/ACK (DORA) messages were visible in the dnsmasq log, the DHCPRELEASE message was not.

My working hypothesis is that this is due to the nature of the packets and the virtual test network's architecture. The initial DORA messages are initiated via broadcast, which are flooded across the br0 bridge and correctly picked up by the dnsmasq listener. However, the DHCPRELEASE message is a unicast packet, sent directly from the client's IP to the server's IP. It appears that on a Linux virtual bridge, this directed traffic between two endpoints may not be visible to a listener on the bridge device (br0) itself.

Furthermore, my review of the mozim client indicates that it sends the DHCPRELEASE packet in a "fire-and-forget" manner. It considers the operation successful upon sending the packet and does not wait for an acknowledgment, which is compliant with the DHCP standard for this message type.

This behavior supports the conclusion that our primary testing goal should be to verify that the client successfully sends the DHCPRELEASE packet. Verifying receipt by the dnsmasq server is not a reliable metric in this specific test environment. Since both the client and server operate on the same virtual bridge, it is plausible that the unicast DHCPRELEASE message is not delivered to the server's listener. This is a quirk of the test setup and would not occur in a real-world scenario where the client and server reside on different machines

Verification with tcpdump

To definitively prove the packet was being sent by my code, I captured the traffic directly from the br0 bridge inside the test's temporary network namespace.
The steps taken were:

  • Paused the 003-teardown.bats test just before the run_teardown command.
# Run teardown
       echo "Pausing for 60 seconds to allow for inspection. Find the netns and run tcpdump now."
       sleep 100
       run_teardown "$input_config"
  • Identified the active network namespace
$ sudo ip netns
  • Executed tcpdump inside the namespace, listening on the br0 interface.
$ sudo ip netns exec akWOUvMNBG tcpdump -i br0 -n 'port 67 or port 68' -vv
dropped privs to tcpdump
tcpdump: listening on br0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
17:04:05.775866 IP (tos 0x0, ttl 128, id 0, offset 0, flags [DF], proto UDP (17), length 353)
    10.102.85.56.bootpc > 10.102.85.1.bootps: [udp sum ok] BOOTP/DHCP, Request from 9a:bc:e6:d8:61:25, length 325, xid 0xf1260074, Flags [none] (0x0000)
          Client-IP 10.102.85.56
          Client-Ethernet-Address 9a:bc:e6:d8:61:25
          sname "foobar"
          Vendor-rfc1048 Extensions
            Magic Cookie 0x63825363
            Server-ID (54), length 4: 10.102.85.1
            Client-ID (61), length 65: "0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef"
            Hostname (12), length 6: "foobar"
            DHCP-Message (53), length 1: Release
tcpdump: pcap_loop: The interface disappeared
1 packet captured
1 packet received by filter
0 packets dropped by kernel

This tcpdump output provides definitive proof that the code is correctly generating and sending the unicast DHCPRELEASE packet onto the virtual network.
Based on this, I have updated the BATS test to verify success by checking for the "Successfully sent DHCPRELEASE..." message in the netavark-proxy.log, as this provides direct proof of the code's execution.

I've detailed my testing methodology and findings above and would be grateful for your review of this approach. Please let me know if you have any feedback or suggestions. Thanks!

@Luap99
Copy link
Member

Luap99 commented Aug 20, 2025

@Rishikpulhani Ok I think the package should be able to get received generally, but maybe there is another bug somewhere. I don't really have much time to debug it deeply but I think if we send it and your tcpdump confirms that it is good enough for me now.

log::error!("{err}");
continue;
pub async fn process_client_stream(service_arc: Arc<Mutex<DhcpV4Service>>) {
loop {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could this not just stay as loop like this

while let Some(lease) = service_arc.lock().await.client.next().await {
...
}

The extra scopes to reduce lock holding is interesting but not actually doing anything since this function is the only one holding the lock anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry @Luap99, but this was done to prevent a potential deadlock scenario.

If we use:

while let Some(lease) = service_arc.lock().await.client.next().await {
...
}

we end up holding the lock on service_arc for the entire loop. Later, inside the Ok arm of the match lease when we need to access the client inside DhcpV4Service, we again need to acquire the lock on service_arc to access the client:

match lease {
            Ok(lease) => {
                let mut client = service_arc.lock().await;
                log::info!(
                    "got new lease for mac {}: {:?}",
                    &client.network_config.container_mac_addr,
                    &lease
                );
                // rest of the code
}

This would result in a deadlock.

To avoid this, I restructured it into a loop with equivalent stopping conditions using an if/else and break. This way, the lock is released after acquiring the lease and then reacquired only when needed to access the client.

Comment on lines 8 to 11
function start_proxy() {
RUST_LOG=debug ip netns exec "$NS_NAME" $NETAVARK dhcp-proxy --dir "$TMP_TESTDIR" --uds "$TMP_TESTDIR" &>"$TMP_TESTDIR/proxy.log" &
PROXY_PID=$!
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be removed now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed it; however, this means there will no longer be any tests related to DHCPRELEASE in the testing suite.

@Rishikpulhani Rishikpulhani force-pushed the fix-dhcp-release branch 7 times, most recently from 519fa3f to 5b2eb34 Compare August 21, 2025 11:52
@Rishikpulhani
Copy link
Contributor Author

I’ve incorporated the changes you suggested.
PTAL @Luap99

@Rishikpulhani Rishikpulhani requested a review from Luap99 August 21, 2025 12:30
feat(dhcp): Send DHCPRELEASE on container teardown

When a container using DHCP is torn down, its lease is left active on the
DHCP server until it expires. This can be problematic in environments
with small IP pools or long lease times. In setups using Dynamic DNS (DDNS),
it can also lead to stale DNS records.

This change introduces the capability for netavark to send a DHCPRELEASE
message to the DHCP server when a container's network is torn down.

This is implemented by:
- Adding a `release_lease` method to the `DhcpV4Service` in `dhcp_service.rs`,
  which wraps the `release` function from the underlying mozim client.
- Wrapped DhcpV4Service inside Arc<> share that to the process_client_stream
  function and then also store it in the task map
- used it in process_client_stream task and also in the teardown function to
  send the DHCPRELEASE message
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DHCP proxy service does not relinquish lease on container teardown
2 participants