Open
Labels
bug (Did we break something?), fs: hdfs (Related to the HDFS filesystem), help wanted, upstream (Issues which need to be resolved in an upstream dependency)
Bug Report
Description
`dvc pull` fails on HDFS after `.dvc/cache` has been removed. In practice, this means `dvc pull` always fails for anyone who has just cloned the repository.
However, `dvc pull -q` succeeds, so it seems that some of the log printing causes the problem.
Some observations that may help with debugging:
- The variable `total` is not a number, which causes the error.
- The variable `**d` contains `total`, which comes from `size`. In this case, however, `size` is not a number; it is a bound method (see here).
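The mechanism described above can be reproduced in isolation. The following is a minimal sketch, not DVC or fsspec code: `FakeRemoteFile`, `naive_total`, and `normalized_total` are hypothetical names introduced only for illustration. It assumes a file object whose `size` is a method rather than an attribute, which is what the report describes, and shows that `getattr(f, "size", None)` then returns the bound method, so the later `total + 0.5` comparison raises the same `TypeError` as in the traceback below.

```python
class FakeRemoteFile:
    """Hypothetical stand-in for a file object that exposes size() as a method."""

    def __init__(self, data: bytes) -> None:
        self._data = data

    def size(self) -> int:
        return len(self._data)


def naive_total(f):
    # Same pattern as the getattr(f1, "size", None) call in the traceback:
    # this returns whatever "size" is, including a bound method.
    return getattr(f, "size", None)


def normalized_total(f):
    # Hypothetical defensive variant (an assumption, not an upstream fix):
    # call size if it is callable, otherwise use it as-is.
    size = getattr(f, "size", None)
    return size() if callable(size) else size


f = FakeRemoteFile(b"hello")
total = naive_total(f)

try:
    total + 0.5  # the arithmetic a progress bar performs on its total
    error = None
except TypeError as exc:
    error = str(exc)

print(error)                # the TypeError message from adding a method and a float
print(normalized_total(f))  # the size as an integer
```

Whether such normalization belongs in the filesystem layer, the callback, or the progress bar is a design question for the upstream projects; the sketch only illustrates why a non-numeric `total` breaks the progress display while quiet mode is unaffected.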
Reproduce
- dvc init
- Copy dataset.zip to the directory
- dvc remote add -d storage hdfs://user/dvc/mystorage
- dvc add dataset.zip
- dvc push
- rm -rf dataset.zip .dvc/cache
- dvc pull
Expected
`dvc pull` and `dvc fetch` should run successfully on HDFS.
Environment information
Output of `dvc doctor`:
$ dvc doctor
DVC version: 3.55.2 (pip)
-------------------------
Platform: Python 3.10.12 on Linux-6.10.4-linuxkit-x86_64-with-glibc2.28
Subprojects:
dvc_data = 3.16.5
dvc_objects = 5.1.0
dvc_render = 1.0.2
dvc_task = 0.4.0
scmrepo = 3.3.7
Supports:
azure (adlfs = 2024.7.0, knack = 0.12.0, azure-identity = 1.17.1),
gdrive (pydrive2 = 1.20.0),
gs (gcsfs = 2024.9.0.post1),
hdfs (fsspec = 2024.9.0, pyarrow = 17.0.0),
http (aiohttp = 3.10.5, aiohttp-retry = 2.8.3),
https (aiohttp = 3.10.5, aiohttp-retry = 2.8.3),
oss (ossfs = 2023.12.0),
s3 (s3fs = 2024.9.0, boto3 = 1.35.16),
ssh (sshfs = 2024.6.0),
webdav (webdav4 = 0.10.0),
webdavs (webdav4 = 0.10.0),
webhdfs (fsspec = 2024.9.0)
Config:
Global: /home/user/.config/dvc
System: /etc/xdg/dvc
Cache types: symlink
Cache directory: fuse.osxfs on osxfs
Caches: local
Remotes: hdfs
Workspace directory: fuse.osxfs on osxfs
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/19c955812b0a09cd409a3779f4e4d774
Additional Information (if any):
The error log is attached below.
$ dvc pull -v
2024-10-08 15:51:47,388 DEBUG: v3.55.2 (pip), CPython 3.10.12 on Linux-6.10.4-linuxkit-x86_64-with-glibc2.28
2024-10-08 15:51:47,390 DEBUG: command: /home/user/.local/bin/dvc pull -v
Collecting |0.00 [00:00, ?entry/s]
Fetching2024-10-08 15:51:49,343 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2024-10-08 15:51:50,297 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
2024-10-08 15:51:50,625 DEBUG: Preparing to transfer data from 'hdfs://user/dvc/mystorage/files/md5' to '/home/user/repo/.dvc/cache/files/md5'
2024-10-08 15:51:50,625 DEBUG: Preparing to collect status from '/home/user/repo/.dvc/cache/files/md5'
2024-10-08 15:51:50,625 DEBUG: Collecting status from '/home/user/repo/.dvc/cache/files/md5'
2024-10-08 15:51:50,629 DEBUG: Preparing to collect status from '/user/dvc/mystorage/files/md5'
2024-10-08 15:51:50,630 DEBUG: Collecting status from '/user/dvc/mystorage/files/md5'
2024-10-08 15:51:50,691 DEBUG: Estimated remote size: 256 files
2024-10-08 15:51:50,692 DEBUG: Querying 2 oids via traverse
Fetching
0%| |Fetching from hdfs 0/1 [00:00<?, ?file/s]
2024-10-08 15:51:51,217 DEBUG: Removing '/home/user/repo/.dvc/cache/files/md5/12/.bnFqV3d0PmZTKtbQoPM-8A.tmp'
2024-10-08 15:51:51,219 ERROR: failed to transfer '126a8a51b9d1bbd07fddc65819a542c3' - unsupported operand type(s) for +: 'method' and 'float'
Traceback (most recent call last):
File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/dvc_objects/fs/generic.py", line 349, in transfer
_try_links(
File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/dvc_objects/fs/generic.py", line 281, in _try_links
return copy(
File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/dvc_objects/fs/generic.py", line 97, in copy
return _get(
File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/dvc_objects/fs/generic.py", line 227, in _get
_get_one(from_paths[0], to_paths[0])
File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/dvc_objects/fs/generic.py", line 217, in _get_one
return from_fs.get_file(
File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/dvc_objects/fs/base.py", line 645, in get_file
self.fs.get_file(from_info, to_info, callback=callback, **kwargs)
File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/fsspec/implementations/arrow.py", line 210, in get_file
super().get_file(rpath, lpath, **kwargs)
File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/fsspec/spec.py", line 904, in get_file
callback.set_size(getattr(f1, "size", None))
File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/fsspec/callbacks.py", line 97, in set_size
self.call()
File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/fsspec/callbacks.py", line 311, in call
self.tqdm = self._tqdm_cls(total=self.size, **self._tqdm_kwargs)
File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/dvc_data/callbacks.py", line 92, in __init__
super().__init__(
File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/tqdm/std.py", line 1098, in __init__
self.refresh(lock_args=self.lock_args)
File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/tqdm/std.py", line 1347, in refresh
self.display()
File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/tqdm/std.py", line 1495, in display
self.sp(self.__str__() if msg is None else msg)
File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/tqdm/std.py", line 1151, in __str__
return self.format_meter(**self.format_dict)
File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/dvc_data/callbacks.py", line 129, in format_dict
meter = self.format_meter( # type: ignore[call-arg]
File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/tqdm/std.py", line 534, in format_meter
if total and n >= (total + 0.5): # allow float imprecision (#849)
TypeError: unsupported operand type(s) for +: 'method' and 'float'
Fetching Exception ignored in: <function tqdm.__del__ at 0x7ffffdaf53f0> 0/1 [00:00<?, ?file/s]
Traceback (most recent call last):
File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/tqdm/std.py", line 1148, in __del__
self.close()
File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/dvc_data/callbacks.py", line 115, in close
self.postfix["info"] = ""
TypeError: 'NoneType' object does not support item assignment
2024-10-08 15:51:51,224 DEBUG: failed to protect '/home/user/repo/.dvc/cache/files/md5/12/6a8a51b9d1bbd07fddc65819a542c3' - [Errno 2] No such file or directory: '/home/user/repo/.dvc/cache/files/md5/12/6a8a51b9d1bbd07fddc65819a542c3'
Traceback (most recent call last):
File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/dvc_data/hashfile/db/local.py", line 117, in protect
os.chmod(path, self.CACHE_MODE)
FileNotFoundError: [Errno 2] No such file or directory: '/home/user/repo/.dvc/cache/files/md5/12/6a8a51b9d1bbd07fddc65819a542c3'
Fetching
2024-10-08 15:51:51,227 ERROR: failed to pull data from the cloud - 1 files failed to download
Traceback (most recent call last):
File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/dvc/commands/data_sync.py", line 35, in run
stats = self.repo.pull(
File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/dvc/repo/__init__.py", line 58, in wrapper
return f(repo, *args, **kwargs)
File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/dvc/repo/pull.py", line 30, in pull
processed_files_count = self.fetch(
File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/dvc/repo/__init__.py", line 58, in wrapper
return f(repo, *args, **kwargs)
File "/home/user/.local/share/uv/tools/dvc/lib/python3.10/site-packages/dvc/repo/fetch.py", line 200, in fetch
raise DownloadError(failed_count)
dvc.exceptions.DownloadError: 1 files failed to download
2024-10-08 15:51:51,234 DEBUG: Analytics is disabled.