Misc. bug: convert_hf_to_gguf.py runs out of memory #15623

### Name and Version

llama-cli version:
```
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 5090, compute capability 12.0, VMM: yes
version: 6240 (54a241f5)
built with MSVC 19.42.34435.0 for x64
```

### Operating systems

Windows 10

### Which llama.cpp modules do you know to be affected?

_No response_

### Command line

```shell
python D:/IA/llama.cpp/convert_hf_to_gguf.py download_dir --outfile outputfile.gguf
```

### Problem description & steps to reproduce


I tried to convert [this model](https://huggingface.co/TheDrummer/GLM-Steam-106B-A12B-v1) into a gguf file. But at the end (217GB/221GB), it threw an exception at me. I have 192 GB of RAM and the script was using the whole thing.
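For scale, here is a quick check of the figures above. It assumes the progress bar reports decimal GB and uses only the numbers from this report. The failure came at roughly 98% of the write, and the finished file is larger than the installed RAM, so the tensor data cannot all be resident in memory at the same time.

```python
# Figures taken from this report (assumed to be decimal GB, as in the progress bar).
ram_gb = 192       # installed RAM
output_gb = 221    # total size of the GGUF being written
written_gb = 217   # amount written when the exception was raised

progress = written_gb / output_gb
print(f"progress at failure: {progress:.1%}")
print(f"output larger than RAM by: {output_gb - ram_gb} GB")
```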

### First Bad Commit

_No response_

### Relevant log output

```shell
Traceback (most recent call last):
  File "D:\IA\llama.cpp\convert_hf_to_gguf.py", line 8817, in <module>
    main()
  File "D:\IA\llama.cpp\convert_hf_to_gguf.py", line 8811, in main
    model_instance.write()
  File "D:\IA\llama.cpp\convert_hf_to_gguf.py", line 435, in write
    self.gguf_writer.write_tensors_to_file(progress=True)
  File "D:\IA\llama.cpp\gguf-py\gguf\gguf_writer.py", line 456, in write_tensors_to_file
    ti.tensor.tofile(fout)
  File "D:\IA\llama.cpp\gguf-py\gguf\lazy.py", line 220, in tofile
    eager = LazyNumpyTensor.to_eager(self)
  File "D:\IA\llama.cpp\gguf-py\gguf\lazy.py", line 179, in to_eager
    return cls._recurse_apply(t, simple_to_eager)
  File "D:\IA\llama.cpp\gguf-py\gguf\lazy.py", line 105, in _recurse_apply
    return fn(o)
  File "D:\IA\llama.cpp\gguf-py\gguf\lazy.py", line 169, in simple_to_eager
    _t._args = cls._recurse_apply(_t._args, simple_to_eager)
  File "D:\IA\llama.cpp\gguf-py\gguf\lazy.py", line 100, in _recurse_apply
    L.append(LazyBase._recurse_apply(item, fn))
  File "D:\IA\llama.cpp\gguf-py\gguf\lazy.py", line 105, in _recurse_apply
    return fn(o)
  File "D:\IA\llama.cpp\gguf-py\gguf\lazy.py", line 169, in simple_to_eager
    _t._args = cls._recurse_apply(_t._args, simple_to_eager)
  File "D:\IA\llama.cpp\gguf-py\gguf\lazy.py", line 100, in _recurse_apply
    L.append(LazyBase._recurse_apply(item, fn))
  File "D:\IA\llama.cpp\gguf-py\gguf\lazy.py", line 105, in _recurse_apply
    return fn(o)
  File "D:\IA\llama.cpp\gguf-py\gguf\lazy.py", line 170, in simple_to_eager
    _t._data = _t._func(*_t._args, **_t._kwargs)
RuntimeError: [enforce fail at alloc_cpu.cpp:121] data. DefaultCPUAllocator: not enough memory: you tried to allocate 2952790016 bytes.
```
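The failed request is 2,952,790,016 bytes, exactly 2.75 GiB, so the crash happened on a single moderately sized allocation after the rest of RAM was already in use. The traceback shows `tofile` calling `to_eager`, which recurses through each lazy tensor's `_args` (`simple_to_eager` → `_recurse_apply` → `simple_to_eager`) before running `_t._func`. The sketch below is a hypothetical, heavily simplified model of that pattern, not gguf-py's actual `LazyNumpyTensor`. It only illustrates that forcing one output tensor first materializes every lazy input it depends on; `LazyNode` and its fields are invented for illustration.

```python
from typing import Any, Callable


class LazyNode:
    """Hypothetical lazy value: records a function and its arguments, computes on demand.

    Simplified illustration only; this is NOT gguf-py's LazyNumpyTensor.
    """

    def __init__(self, func: Callable[..., Any], *args: Any) -> None:
        self.func = func
        self.args = args
        self.data: Any = None
        self.evaluated = False

    def to_eager(self) -> Any:
        # Mirrors the recursion in the traceback: materialize every lazy
        # argument first (recursively), then apply this node's own function.
        if not self.evaluated:
            eager_args = [a.to_eager() if isinstance(a, LazyNode) else a for a in self.args]
            self.data = self.func(*eager_args)
            self.evaluated = True  # result is cached on the node while it stays referenced
        return self.data


# Usage: a three-step chain, standing in for "load -> transform -> cast".
load = LazyNode(lambda: [1.0] * 4)                     # e.g. reading a tensor from disk
scale = LazyNode(lambda xs: [2 * x for x in xs], load)
cast = LazyNode(lambda xs: [int(x) for x in xs], scale)
print(cast.to_eager())
```

Forcing `cast` evaluates `scale` and `load` as well, and each node keeps its result afterwards. In a real converter, whether intermediate buffers like these are released promptly determines peak memory; the sketch makes no claim about how gguf-py handles that.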
