Misc. bug: convert_hf_to_gguf.py runs out of memory #15623

@EugeoSynthesisThirtyTwo

Description

Name and Version

llama-cli version:

ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 5090, compute capability 12.0, VMM: yes
version: 6240 (54a241f5)
built with MSVC 19.42.34435.0 for x64

Operating systems

Windows 10

Which llama.cpp modules do you know to be affected?

No response

Command line

python D:/IA/llama.cpp/convert_hf_to_gguf.py download_dir --outfile outputfile.gguf

Problem description & steps to reproduce

I tried to convert this model to a GGUF file. Near the end of the write (217 GB of 221 GB), the script raised the exception shown in the log output. The machine has 192 GB of RAM, and the script was using all of it.

First Bad Commit

No response

Relevant log output

Traceback (most recent call last):
  File "D:\IA\llama.cpp\convert_hf_to_gguf.py", line 8817, in <module>
    main()
  File "D:\IA\llama.cpp\convert_hf_to_gguf.py", line 8811, in main
    model_instance.write()
  File "D:\IA\llama.cpp\convert_hf_to_gguf.py", line 435, in write
    self.gguf_writer.write_tensors_to_file(progress=True)
  File "D:\IA\llama.cpp\gguf-py\gguf\gguf_writer.py", line 456, in write_tensors_to_file
    ti.tensor.tofile(fout)
  File "D:\IA\llama.cpp\gguf-py\gguf\lazy.py", line 220, in tofile
    eager = LazyNumpyTensor.to_eager(self)
  File "D:\IA\llama.cpp\gguf-py\gguf\lazy.py", line 179, in to_eager
    return cls._recurse_apply(t, simple_to_eager)
  File "D:\IA\llama.cpp\gguf-py\gguf\lazy.py", line 105, in _recurse_apply
    return fn(o)
  File "D:\IA\llama.cpp\gguf-py\gguf\lazy.py", line 169, in simple_to_eager
    _t._args = cls._recurse_apply(_t._args, simple_to_eager)
  File "D:\IA\llama.cpp\gguf-py\gguf\lazy.py", line 100, in _recurse_apply
    L.append(LazyBase._recurse_apply(item, fn))
  File "D:\IA\llama.cpp\gguf-py\gguf\lazy.py", line 105, in _recurse_apply
    return fn(o)
  File "D:\IA\llama.cpp\gguf-py\gguf\lazy.py", line 169, in simple_to_eager
    _t._args = cls._recurse_apply(_t._args, simple_to_eager)
  File "D:\IA\llama.cpp\gguf-py\gguf\lazy.py", line 100, in _recurse_apply
    L.append(LazyBase._recurse_apply(item, fn))
  File "D:\IA\llama.cpp\gguf-py\gguf\lazy.py", line 105, in _recurse_apply
    return fn(o)
  File "D:\IA\llama.cpp\gguf-py\gguf\lazy.py", line 170, in simple_to_eager
    _t._data = _t._func(*_t._args, **_t._kwargs)
RuntimeError: [enforce fail at alloc_cpu.cpp:121] data. DefaultCPUAllocator: not enough memory: you tried to allocate 2952790016 bytes.
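
Reading the traceback, the failing allocation (2952790016 bytes, i.e. 2.75 GiB) happens inside `to_eager`, which is reached from `tofile`. That suggests tensors are only materialized at write time. The following is a minimal sketch of that pattern, not the gguf-py implementation: `LazyValue`, `load`, and `widen` are hypothetical names chosen for illustration, and the real code may behave differently.

```python
from typing import Any, Callable


class LazyValue:
    """Hypothetical stand-in for a lazily evaluated tensor (not the gguf-py class)."""

    def __init__(self, func: Callable[..., Any], *args: Any) -> None:
        self._func = func
        self._args = args
        self._data: Any = None

    def to_eager(self) -> Any:
        # Materialize the arguments first (recursively), then this node.
        if self._data is None:
            eager_args = [a.to_eager() if isinstance(a, LazyValue) else a
                          for a in self._args]
            self._data = self._func(*eager_args)
        return self._data


# Records the size of every buffer requested, in order.
allocations: list[int] = []


def load(n: int) -> bytearray:
    # Stands in for reading a tensor of n bytes from disk.
    allocations.append(n)
    return bytearray(n)


def widen(buf: bytearray) -> bytearray:
    # Stands in for a conversion step that needs a second, larger buffer.
    allocations.append(2 * len(buf))
    return bytearray(2 * len(buf))


tensor = LazyValue(widen, LazyValue(load, 1024))
assert allocations == []   # building the graph allocates nothing
out = tensor.to_eager()    # buffers are requested only here, at write time
```

In this sketch, every buffer is requested during `to_eager`, so an out-of-memory error would surface late in the write rather than while the graph is built. That matches the point at which the exception appeared here.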
