Skip to content

Add semaphore to AsyncFileSystemWrapper #1901

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Conversation

dgegen
Copy link

@dgegen dgegen commented Aug 2, 2025

This pull request tries to enhance the AsyncFileSystemWrapper by introducing semaphore support to limit concurrent calls to the underlying synchronous file system. These changes would address the issues encountered while using zarr v3 with fsspec.implementations.sftp, as discussed in Zarr issue #3196.

Perhaps I am missing the bigger picture here and this is actually is not a good idea – so any thoughts or feedback is appreciated :)

Suggested changes

  • Initialize a semaphore in AsyncFileSystemWrapper if asynchronous mode is disabled. This ensures that concurrent calls are managed safely, preventing potential deadlocks in systems that cannot manage concurrent requests.
  • Added LockedFileSystem class to test this case.

- Initialize a semaphore in AsyncFileSystemWrapper if asynchronous mode is
  disabled. This ensures that concurrent calls are managed safely, preventing
  potential deadlocks in systems that cannot manage concurrent requests.
- Added LockedFileSystem class to test this case.
@martindurant
Copy link
Member

Your implementation is interesting - it allows for the amount of concurrency to be controlled; it should be surfaced to the whole wrapper and disabled by default; I suspect that in test_semaphore_synchronous with a semaphore with large value or none at all, you would NOT see a deadlock.

So, I think that allowing to pass in a semaphore in a fine idea, but I'm not sure that defaulting to 1 is a good idea. I don't know if it really has a negative consequence, but we already knew that LocalFileSystem didn't have any deadlock issues. You mock has to explicitly make a lock to create a problem :)

See this ancient discussion of the strange things that paramiko in particular does https://stackoverflow.com/a/32875571/3821154

@dgegen
Copy link
Author

dgegen commented Aug 16, 2025

I removed the default behaviour of adding a semaphore if async=False and instead added the semaphore as an argument to AsyncFileSystemWrapper as you suggested.

I'm not sure if you missed it, but the second test I added actually checks whether the file system is locked and only passes if it triggers the Concurrent requests error:

@pytest.mark.asyncio
async def test_deadlock_when_asynchronous():
    fs = AsyncFileSystemWrapper(LockedFileSystem(), asynchronous=False, semaphore=asyncio.Semaphore(3))
    paths = [f"path_{i}" for i in range(1, 3)]

    with pytest.raises(RuntimeError, match="Concurrent requests!"):
        await asyncio.gather(*(fs._cat_file(path) for path in paths))

Including Seamphore(1) also solved the Zarr issue #3196 which got me looking into this in the first place. So despite all the particularities of the paramiko implementation, this proved to be sufficient to resolve the problem in this case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants