Conversation

@kohlschuetter (Contributor) commented on Jul 15, 2025

Currently, read and write operations to the VirtualFileSystem are
"stateless" in the sense that there is no corresponding "open" and
"close" exposed. At the NFS4 level, this exists, and nfs4j tracks these
internally already (stateid4.opaque byte sequences), but they're not
exposed to VirtualFileSystem.

This is a performance problem: PseudoFs calls checkAccess(inode,
ACE4_READ_DATA/ACE4_WRITE_DATA) for every read and write, which in turn
triggers a Stat lookup on each operation.

This incurs an unnecessary performance penalty of more than 20%.
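For illustration, a minimal sketch of the hot path being described; the types and helper names here are hypothetical stand-ins, not the actual PseudoFs code:

// Hypothetical sketch of the pre-change hot path (stand-in types, simplified signatures).
class AccessCheckingFs {

    // NFSv4 ACE4_READ_DATA bit (RFC 7530)
    private static final int ACE4_READ_DATA = 0x00000001;

    private final Backend inner;

    AccessCheckingFs(Backend inner) {
        this.inner = inner;
    }

    int read(long inode, byte[] data, long offset, int count) throws Exception {
        checkAccess(inode, ACE4_READ_DATA);           // repeated on EVERY read
        return inner.read(inode, data, offset, count);
    }

    private void checkAccess(long inode, int accessMask) throws Exception {
        Object stat = inner.getattr(inode);           // extra getattr per I/O operation
        // ... evaluate owner / mode bits / ACL from stat against accessMask ...
    }

    interface Backend {
        int read(long inode, byte[] data, long offset, int count) throws Exception;
        Object getattr(long inode) throws Exception;
    }
}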

The access check is unnecessary because a successful check upon "open"
will be valid for the entire lifetime of the client-side file
descriptor, just as it is when opening a file descriptor on other POSIX
file systems.

In order to properly track granted access, we can leverage data stored
in stateid4 and the existing work in FileTracker.

Since stateid4 exists in NFSv4 only, we make sure there is no
performance degradation in NFSv3 and no exposure of NFSv4-only internals
in the VirtualFileSystem API:

- Expose stateids as OpenHandle to VirtualFileSystem read/write, defaulting
  to the existing read/write methods for backwards compatibility (sketched
  below).
- In PseudoFs, check the SharedAccess state from FileTracker and use it to
  determine access during read/write when available.
- Rework stateid4 to implement OpenHandle, and expose the ClientId (the
  first 8 bytes of the opaque) so the information can be looked up in
  NFSv4StateHandler.
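For illustration, a rough sketch of the shape of this change; the interface and method signatures below are simplified assumptions, not the exact API introduced by the PR:

// Sketch only: stand-in types and simplified signatures.
public interface OpenHandle {
    byte[] getOpaque();   // raw state bytes, e.g. for logging
    long getClientId();   // derived from the first 8 bytes of the opaque
}

interface FileSystemSketch {

    // existing "stateless" read
    int read(long inode, byte[] data, long offset, int count) throws Exception;

    // new stateful overload; the default delegates to the stateless method, so
    // existing implementations keep working without changes
    default int read(long inode, byte[] data, long offset, int count, OpenHandle handle)
            throws Exception {
        return read(inode, data, offset, count);
    }
}

On the NFSv4 path, stateid4 would implement OpenHandle, and PseudoFs can consult the share access recorded by FileTracker for that handle instead of re-running checkAccess on every I/O; when no handle is available (e.g. NFSv3), it falls back to the existing per-operation check.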

With the change, for sustained reads I'm seeing 854 MB/s instead of 712
MB/s on LocalFileSystem, and 2000 MB/s instead of 1666 MB/s on a custom
implementation.

Also add a small change to limit the scope of addDisposeListener lambdas

When specifying lambdas for cleanup operations, it is good practice to
minimize the set of objects that the lambda references ("pins").

Right now, we reference unnecessary objects, adding to heap
fragmentation.

Reduce the scope of referenced objects by accessing the required
references before entering the dispose listener lambda.
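A self-contained toy example of this pattern, using hypothetical types (the real nfs4j listener and session types differ):

import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration of narrowing lambda capture; not the actual nfs4j code.
class DisposeExample {

    interface Resource { void close(); }

    static class Session {
        final byte[] bigBuffer = new byte[1 << 20];   // stands in for heavyweight state
        final Resource resource = () -> System.out.println("closed");
        Resource getResource() { return resource; }
    }

    private final List<Consumer<Object>> disposeListeners = new ArrayList<>();

    void addDisposeListener(Consumer<Object> listener) {
        disposeListeners.add(listener);
    }

    void register(Session session) {
        // Before: the lambda captures 'session', pinning bigBuffer (and anything
        // else the session references) until the dispose listener runs:
        //   addDisposeListener(state -> session.getResource().close());

        // After: look up the required reference first, so only 'resource' is pinned.
        Resource resource = session.getResource();
        addDisposeListener(state -> resource.close());
    }
}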

Signed-off-by: Christian Kohlschütter <[email protected]>

@kofemann (Member) commented:

Consider the following scenario:

sequenceDiagram
    Client->>Server: OPEN
    Server ->> Client: OK, open_stateid
    Client->>Server: READ(open_stateid)
    Server ->> Client: OK, bytes
    Client->>Server: CLOSE(open_stateid)
    Server ->> Server: invalidate open_stateid
    Server ->> Client: OK 

    Client->>Server: OPEN
    Server ->> Client: OK, open_stateid, delegation_stateid
    Client->>Server: CLOSE (open_stateid)
    Server ->> Server: invalidate open_stateid
    Server ->> Client: OK 
    Client->>Server: READ(delegation_stateid)
    Server ->> Client: OK, bytes

With your approach, the filesystem implementation would have to track all stateids, not just open stateids: in the second exchange above, the READ arrives with a delegation stateid after the open stateid has already been invalidated.
So maybe FileTracker can be updated to accept listeners that are triggered when a new state is created or destroyed; a file system could then listen for such events:

var myListeningFs = ...;
var stateHandler = new NFSv4StateHandler();

stateHandler.getFileTracker().addListener(myListeningFs);
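A rough sketch of what such a listener contract could look like; this API does not exist in FileTracker today, and every name and parameter below is a placeholder:

// Hypothetical listener for open-state lifecycle events; not an existing FileTracker API.
interface OpenStateListener {

    // called when an OPEN (or a delegation grant) creates new state for a client/file pair
    void stateCreated(long clientId, byte[] fileId, int shareAccess);

    // called when the state is closed or otherwise invalidated
    void stateDisposed(long clientId, byte[] fileId);
}

// A filesystem implementation could register itself and keep its own view of the
// granted access, which would also cover the delegation stateid case shown above.
class ListeningFs implements OpenStateListener {
    @Override
    public void stateCreated(long clientId, byte[] fileId, int shareAccess) {
        // remember the access granted for this client/file pair
    }

    @Override
    public void stateDisposed(long clientId, byte[] fileId) {
        // drop the cached entry
    }
}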

@kohlschuetter (Contributor, Author) commented:

I think that right now, the OpenHandle is really only actionable for PseudoFs.
Exposing the "opaque" part through OpenHandle merely provides the raw-bytes representation, for logging purposes etc.

We could expose getClientId, and maybe StateOwner somehow, but I'm not sure how far we need to push this.

With the patch, I'm getting 20% better performance without sacrificing PseudoFs, so that's good enough for me.
