// file: docs/code_documentation/documentation.adoc
= Code Documentation

WARNING: This documentation is neither complete (it does not cover everything) nor exhaustive (it does not fully cover everything it touches on). It is based on the February 18th, 2025 version of the code, specifically commit 63ac128; subsequent modifications have not yet been reviewed.

[[docs:overview]]
== Overview

[[docs:overview:main.cpp]]
=== main.cpp

"`main.cpp`" is the primary source file from which the documentation process started. It compiles into the llama-cli executable, which provides chatbot functionality in the terminal, and has the following high-level structure (note that this analysis is not exhaustive):

* (lines) 1-86: include headers, global variables, helper functions
* 88-133: parameter parsing (call to [.codebit]#`common_params_parse(...)`# on line 91, edge case handling afterwards), [.codebit]#`common_init()`#, console initialization
* 135: [.codebit]#`llama_backend_init()`#
* 136: [.codebit]#`llama_numa_init(...)`#
* 150: call to [.codebit]#`common_init_from_params(...)`# generates [.codebit]#`struct llama_model`# and [.codebit]#`struct llama_context`#
* 165-194: set up [.codebit]#`struct ggml_threadpool`#
* 203-226: conversation mode setup
* 235-432: session setup
* 434: [.codebit]#`common_sampler_init(...)`#
* 460-483: session setup
* 485-532: inference preparation
* 534-906: run loop
** 535-630: input and context management
** 632-652: token evaluation by [.codebit]#`llama_decode(...)`# call (line 640)
** 704-728: display logic
** 731-906: antiprompt/reverse prompt detection, console logic
* 908-923: cleanup (print final logs, deallocate memory)
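The phases above can be condensed into a skeleton. Everything below is an illustrative stub named after its counterpart in the source, not the real implementation; the real functions take different arguments and the run loop is far more involved:

```cpp
// Hypothetical stand-ins for the real llama.cpp types and calls.
struct llama_model   {};
struct llama_context {};

static bool common_params_parse(int argc, char ** argv) { (void) argc; (void) argv; return true; }
static void llama_backend_init() {}
static bool common_init_from_params(llama_model & model, llama_context & ctx) { (void) model; (void) ctx; return true; }
static void llama_backend_free() {}

// Condensed view of the llama-cli control flow described above.
static int run_cli(int argc, char ** argv) {
    if (!common_params_parse(argc, argv)) return 1;     // parameter parsing
    llama_backend_init();                               // backend init
    llama_model   model;
    llama_context ctx;
    if (!common_init_from_params(model, ctx)) return 1; // model + context
    // ... sampler init, session setup, inference preparation ...
    bool done = false;
    while (!done) {  // run loop: input handling, llama_decode, display
        done = true; // stub: the real loop exits on EOS or user request
    }
    llama_backend_free();                               // cleanup
    return 0;
}
```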


[[docs:overview:call_paths]]
=== Call Paths

Following is a description of the call paths followed in the documentation process. These are centered on the inference process and the setup necessary for it, and will provide a good picture of the program's general control flow.

==== Model and context init

* [.codebit]#`common_init_from_params(...)`# -> [.codebit]#`llama_model_load_from_file(...)`#, [.codebit]#`llama_init_from_model(...)`#
** [.codebit]#`llama_model_load_from_file(...)`# -> [.codebit]#`llama_model_load_from_file_impl(...)`# -> [.codebit]#`ggml_backend_dev_get(...)`#, [.codebit]#`llama_model_load(...)`#
*** [.codebit]#`ggml_backend_dev_get(...)`# -> [.codebit]#`get_reg()`# -> [.codebit]#`struct ggml_backend_registry()`# -> [.codebit]#`struct ggml_backend_registry.register_backend(...)`#, [.codebit]#`ggml_backend_cuda_reg()`# and [.codebit]#`ggml_backend_cpu_reg()`# (among others, depending on the build)
** [.codebit]#`llama_init_from_model(...)`# -> [.codebit]#`struct llama_context(...)`#, [.codebit]#`ggml_backend_dev_init(...)`#, [.codebit]#`ggml_backend_sched_new(...)`#
*** [.codebit]#`ggml_backend_dev_init(...)`# -> [.codebit]#`struct ggml_backend_device.iface.init_backend(...)`#

Note that the calls to [.codebit]#`ggml_backend_cuda_reg()`# and [.codebit]#`ggml_backend_cpu_reg()`# go much deeper and are responsible for the proper setup of usable devices (among other functions for different backends that are not documented here). They are overall very similar and will be detailed in their own sections.

==== Inference

* [.codebit]#`llama_decode(...)`# -> [.codebit]#`llama_decode_impl(...)`# -> [.codebit]#`ggml_backend_sched_set_eval_callback(...)`#, [.codebit]#`llama_build_graph(...)`#, [.codebit]#`llama_set_inputs(...)`#, [.codebit]#`llama_graph_compute(...)`#
** [.codebit]#`llama_build_graph(...)`# -> [.codebit]#`struct llm_build_context(...)`#, [.codebit]#`struct llm_build_context.init()`#, [.codebit]#`struct llm_build_context.build_llama()`# (one of many branches)
*** [.codebit]#`struct llm_build_context.init()`# -> [.codebit]#`ggml_init(...)`#
*** [.codebit]#`struct llm_build_context.build_llama()`# -> [.codebit]#`ggml_new_graph_custom(...)`#, [.codebit]#`llm_build_input_embd(...)`#
**** [.codebit]#`ggml_new_graph_custom(...)`# -> [.codebit]#`ggml_graph_nbytes(...)`#, [.codebit]#`ggml_new_object(...)`#, [.codebit]#`ggml_hash_set_reset(...)`#
*** [.codebit]#`llm_build_input_embd(...)`# -> [.codebit]#`ggml_new_tensor_1d(...)`#, [.codebit]#`ggml_new_tensor_2d(...)`# -> [.codebit]#`ggml_new_tensor_impl(...)`#
** [.codebit]#`llama_graph_compute(...)`# -> [.codebit]#`ggml_backend_sched_graph_compute_async(...)`# -> [.codebit]#`ggml_backend_sched_alloc_graph(...)`#, [.codebit]#`ggml_backend_sched_compute_splits(...)`#
*** [.codebit]#`ggml_backend_sched_alloc_graph(...)`# -> [.codebit]#`ggml_backend_sched_split_graph(...)`#, [.codebit]#`ggml_backend_sched_alloc_splits(...)`#
*** [.codebit]#`ggml_backend_sched_compute_splits(...)`# -> [.codebit]#`struct ggml_backend_sched.callback_eval`#, [.codebit]#`ggml_backend_graph_compute_async(...)`#
**** [.codebit]#`ggml_backend_graph_compute_async(...)`# -> [.codebit]#`struct ggml_backend.iface.graph_compute`#

Note that the call path ends in [.codebit]#`struct ggml_backend.iface.graph_compute`#, a pointer to a function specific to each backend. It is set in the initialization phase by a call to [.codebit]#`struct ggml_backend_device.iface.init_backend(...)`#, itself another function pointer, assigned during backend initialization, specifically in the calls to [.codebit]#`ggml_backend_cuda_reg()`# and [.codebit]#`ggml_backend_cpu_reg()`# (and their counterparts for the other supported backends). Again, these will be detailed in their own sections.
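The dispatch mechanism can be sketched with a toy interface. The `toy_*` names and the two compute functions are invented for illustration; only the function-pointer indirection mirrors the source:

```cpp
// Toy mock of the ggml dispatch scheme: the scheduler calls through
// iface.graph_compute without knowing which backend it is talking to.
struct toy_backend;
struct toy_backend_i {
    int (*graph_compute)(toy_backend * backend); // set per backend at init
};
struct toy_backend {
    toy_backend_i iface;
};

static int cpu_graph_compute (toy_backend *) { return 1; } // pretend CPU path
static int cuda_graph_compute(toy_backend *) { return 2; } // pretend CUDA path

// Analogue of iface.init_backend: selects the function pointers for a device.
static toy_backend make_backend(bool use_gpu) {
    toy_backend b;
    b.iface.graph_compute = use_gpu ? cuda_graph_compute : cpu_graph_compute;
    return b;
}

// Analogue of ggml_backend_graph_compute_async: dispatches blindly.
static int compute(toy_backend & b) {
    return b.iface.graph_compute(&b);
}
```

The design keeps the scheduler backend-agnostic: adding a backend means filling in a new interface struct, not touching the call sites.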

[[docs:funcstructs]]
== Functions and structures

This section will elaborate on the functions and structures mentioned above, as well as other relevant ones, grouped by the files which contain them and ordered by their position in said files.

NOTE: There are many types with the formats [.codebit]#`typename_t`# and [.codebit]#`typename_ptr`#. In most, if not all cases, [.codebit]#`typename_t`# is a [.codebit]#`typedef`# that stands for [.codebit]#`typename*`#, while [.codebit]#`typename_ptr`# stands for [.codebit]#`std::unique_ptr<typename, optional_typename_deleter>`#.
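As a sketch of the two conventions, using an invented `typename_example` type (the deleter is optional in the real aliases as well):

```cpp
#include <memory>

struct typename_example {};

// `*_t`: plain pointer alias, as with ggml_backend_t = ggml_backend *.
typedef typename_example * typename_example_t;

// `*_ptr`: owning smart pointer, optionally with a custom deleter,
// as with llama_model_ptr.
struct typename_example_deleter {
    void operator()(typename_example * p) const { delete p; }
};
using typename_example_ptr = std::unique_ptr<typename_example, typename_example_deleter>;
```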

include::documentation/common.h.adoc[]

include::documentation/common.cpp.adoc[]

include::documentation/llama-context.h.adoc[]

include::documentation/llama.cpp.adoc[]

include::documentation/ggml-impl.h.adoc[]

include::documentation/ggml-backend-reg.cpp.adoc[]

include::documentation/ggml-cuda.cu.adoc[]

include::documentation/ggml-cpu.cpp.adoc[]

include::documentation/ggml-cpu.c.adoc[]

include::documentation/ggml-backend.cpp.adoc[]

include::documentation/ggml-backend-impl.h.adoc[]

include::documentation/ggml.h.adoc[]

include::documentation/ggml.c.adoc[]
// file: docs/code_documentation/documentation/common.cpp.adoc
[[docs:funcstructs:common.cpp]]
== common.cpp


[[docs:funcstructs:common.cpp:common_init_from_params]]
=== common_init_from_params

Signature:
[.codebit]#`struct common_init_result common_init_from_params(common_params & params)`#

Firstly, the function loads the model ([.codebit]#`struct llama_model`#). Depending on the parameters and the build, this can go through one of three branches, calling [.codebit]#`common_load_model_from_hf(...)`# to load from a HuggingFace repository, [.codebit]#`common_load_model_from_url(...)`# to load from a URL or [.codebit]#`llama_model_load_from_file(...)`# to load from a local file. The first two branches also end up indirectly calling [.codebit]#`llama_model_load_from_file(...)`#.

Secondly, it passes the loaded model to [.codebit]#`llama_init_from_model(...)`# to generate the corresponding [.codebit]#`llama_context`#.

Thirdly, it loads the control vectors, then the lora adapters ([.codebit]#`struct llama_adapter_lora`#) indicated by the parameters through calls to [.codebit]#`llama_adapter_lora_init(...)`#. It also performs a warmup run of the model if so indicated by [.codebit]#`params.warmup`#.

Lastly, it bundles and returns the [.codebit]#`llama_model`#, [.codebit]#`llama_context`# and lora adapters in a [.codebit]#`struct common_init_result`#.
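The three load branches can be sketched as follows. `model_source`, `load_model` and `load_from_file` are hypothetical condensations, not real llama.cpp symbols; only the control flow (and the fact that all branches funnel into the file loader) mirrors the source:

```cpp
#include <string>

enum class model_source { hf_repo, url, local_file };

// Stands in for llama_model_load_from_file(...): every branch ends here.
static std::string load_from_file(const std::string & path) {
    return "loaded:" + path;
}

static std::string load_model(model_source src, const std::string & loc) {
    switch (src) {
        case model_source::hf_repo:    // common_load_model_from_hf(...)
            return load_from_file("downloaded-from-hf/" + loc);
        case model_source::url:        // common_load_model_from_url(...)
            return load_from_file("downloaded-from-url/" + loc);
        case model_source::local_file: // direct call
            return load_from_file(loc);
    }
    return "";
}
```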
// file: docs/code_documentation/documentation/common.h.adoc
[[docs:funcstructs:common.h]]
== common.h


[[docs:funcstructs:common.h:struct-common_init_result]]
=== struct common_init_result

This structure is just a wrapper containing [.codebit]##`std::unique_ptr`##s to a [.codebit]#`llama_model`#, a [.codebit]#`llama_context`# and lora adapters:

[source,C++]
----
// note: defines object's lifetime
struct common_init_result {
llama_model_ptr model;
llama_context_ptr context;

std::vector<llama_adapter_lora_ptr> lora;
};
----
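A toy illustration of the ownership transfer, using plain `std::unique_ptr` and mock types instead of the real aliases (which carry custom deleters):

```cpp
#include <memory>
#include <utility>
#include <vector>

struct llama_model_mock   {};
struct llama_context_mock {};

struct common_init_result_mock {
    std::unique_ptr<llama_model_mock>   model;
    std::unique_ptr<llama_context_mock> context;
    std::vector<std::unique_ptr<int>>   lora;
};

// Returning the struct by value moves the unique_ptrs to the caller,
// which from then on owns the model and context lifetimes; hence the
// "defines object's lifetime" comment in the source.
static common_init_result_mock make_result() {
    common_init_result_mock res;
    res.model   = std::make_unique<llama_model_mock>();
    res.context = std::make_unique<llama_context_mock>();
    return res;
}
```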
// file: docs/code_documentation/documentation/ggml-backend-impl.h.adoc
[[docs:funcstructs:ggml-backend-impl.h]]
== ggml-backend-impl.h


[[docs:funcstructs:ggml-backend-impl.h:struct-ggml_backend_i]]
=== struct ggml_backend_i

The interface for a [.codebit]#`ggml_backend`#. Has the following mandatory members:


* [.codebit]#`const char * (*get_name)(ggml_backend_t backend)`#
* [.codebit]#`void (*free)(ggml_backend_t backend)`#
* [.codebit]#`enum ggml_status (*graph_compute) (ggml_backend_t backend, struct ggml_cgraph * cgraph)`#: from comments: "compute graph (always async if supported by the backend)"



[[docs:funcstructs:ggml-backend-impl.h:struct-ggml_backend]]
=== struct ggml_backend

Describes a high-level backend that contains an interface for tensor operations (optional), graph computation and event synchronization (optional). Has the following members:


* [.codebit]#`ggml_guid_t guid`#
* [.codebit]#`struct ggml_backend_i iface`#
* [.codebit]#`ggml_backend_dev_t device`#
* [.codebit]#`void * context`#


[[docs:funcstructs:ggml-backend-impl.h:struct-ggml_backend_device_i]]
=== struct ggml_backend_device_i

The interface of a [.codebit]#`ggml_backend_device`#. Here are some of its members:

* [.codebit]#`const char * (*get_name)(ggml_backend_dev_t dev)`#
* [.codebit]#`ggml_backend_t (*init_backend)(ggml_backend_dev_t dev, const char * params)`#: initializes the [.codebit]#`ggml_backend`# corresponding to this device
* [.codebit]#`bool (*supports_op)(ggml_backend_dev_t dev, const struct ggml_tensor * op)`#


[[docs:funcstructs:ggml-backend-impl.h:struct-ggml_backend_device]]
=== struct ggml_backend_device

Describes a usable device. Has the following members:

* [.codebit]#`struct ggml_backend_device_i iface`#
* [.codebit]#`ggml_backend_reg_t reg`#
* [.codebit]#`void * context`#


[[docs:funcstructs:ggml-backend-impl.h:struct-ggml_backend_reg_i]]
=== struct ggml_backend_reg_i

The interface for a [.codebit]#`ggml_backend_reg`#. Has the following members:

* [.codebit]#`const char * (*get_name)(ggml_backend_reg_t reg)`#
* [.codebit]#`size_t (*get_device_count)(ggml_backend_reg_t reg)`#
* [.codebit]#`ggml_backend_dev_t (*get_device)(ggml_backend_reg_t reg, size_t index)`#
* [.codebit]#`void * (*get_proc_address)(ggml_backend_reg_t reg, const char * name)`#: from comments: "(optional) get a pointer to a function in the backend; backends can add custom functions that are not part of the standard ggml-backend interface"


[[docs:funcstructs:ggml-backend-impl.h:struct-ggml_backend_reg]]
=== struct ggml_backend_reg

A registry managing the devices for a specific backend. Has the following members:

* [.codebit]#`int api_version`#: must be initialized to [.codebit]#`GGML_BACKEND_API_VERSION`#
* [.codebit]#`struct ggml_backend_reg_i iface`#
* [.codebit]#`void * context`#
// file: docs/code_documentation/documentation/ggml-backend-reg.cpp.adoc
[[docs:funcstructs:ggml-backend-reg.cpp]]
== ggml-backend-reg.cpp


[[docs:funcstructs:ggml-backend-reg.cpp:struct-ggml_backend_reg_entry]]
=== struct ggml_backend_reg_entry

[source,C++]
----
struct ggml_backend_reg_entry {
ggml_backend_reg_t reg;
dl_handle_ptr handle;
};
----

Note that [.codebit]#`ggml_backend_reg_t`# is an alias for [.codebit]#`ggml_backend_reg*`#.


[[docs:funcstructs:ggml-backend-reg.cpp:struct-ggml_backend_registry]]
=== struct ggml_backend_registry

It has two members:

* [.codebit]#`std::vector<ggml_backend_reg_entry> backends`#
* [.codebit]#`std::vector<ggml_backend_dev_t> devices`#

Its default constructor calls its [.codebit]#`register_backend(...)`# method with the [.codebit]##`ggml_backend_reg`##s specific to each backend with which llama.cpp is compiled (see [.codebit]#`ggml_backend_cuda_reg()`# and [.codebit]#`ggml_backend_cpu_reg()`#). This constructor *_should not_* be called manually, as this structure is meant to be a singleton. See [.codebit]#`get_reg()`#.


[[docs:funcstructs:ggml-backend-reg.cpp:struct-ggml_backend_registry.register_backend]]
=== struct ggml_backend_registry.register_backend

Signature:
[.codebit]#`void register_backend(ggml_backend_reg_t reg, dl_handle_ptr handle = nullptr)`#

Pushes the given pair into the structure's [.codebit]#`backends`# member and calls its [.codebit]#`register_device(...)`# method for every device associated with the [.codebit]#`ggml_backend_reg`# (uses [.codebit]#`ggml_backend_reg_dev_count(...)`# and [.codebit]#`ggml_backend_reg_dev_get(...)`# to iterate through and retrieve them).
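A simplified mock of the registration walk (all `toy_*` names are invented; the real code iterates via `ggml_backend_reg_dev_count(...)` and `ggml_backend_reg_dev_get(...)` rather than a plain vector):

```cpp
#include <string>
#include <vector>

struct toy_reg {
    std::vector<std::string> devs; // stands in for the reg's device table
};

struct toy_registry {
    std::vector<toy_reg *>   backends;
    std::vector<std::string> devices;

    // Analogue of register_device: just pushes to the devices member.
    void register_device(const std::string & dev) {
        devices.push_back(dev);
    }

    // Analogue of register_backend: records the reg, then registers
    // every device it exposes.
    void register_backend(toy_reg * reg) {
        backends.push_back(reg);
        for (size_t i = 0; i < reg->devs.size(); ++i) {
            register_device(reg->devs[i]);
        }
    }
};
```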


[[docs:funcstructs:ggml-backend-reg.cpp:struct-ggml_backend_registry.register_device]]
=== struct ggml_backend_registry.register_device

Signature:
[.codebit]#`void register_device(ggml_backend_dev_t device)`#

Simply pushes to the structure's [.codebit]#`devices`# member.


[[docs:funcstructs:ggml-backend-reg.cpp:get_reg]]
=== get_reg

Signature: [.codebit]#`static ggml_backend_registry & get_reg()`#

Helps implement a singleton-like design pattern for [.codebit]#`struct ggml_backend_registry`#:

[source,C++]
----
static ggml_backend_registry & get_reg() {
static ggml_backend_registry reg;
return reg;
}
----
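This is the Meyers-singleton idiom: the function-local static is constructed exactly once, on first call, and every call returns the same instance. A self-contained illustration with an instrumented constructor (the counter is purely for demonstration):

```cpp
struct counted_registry {
    static int constructions;
    counted_registry() { ++constructions; }
};
int counted_registry::constructions = 0;

static counted_registry & get_counted_reg() {
    static counted_registry reg; // constructed on first use only
    return reg;
}
```

Since C++11 this initialization is also guaranteed to be thread-safe, which is why no explicit locking appears around the static.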


[[docs:funcstructs:ggml-backend-reg.cpp:ggml_backend_dev_get]]
=== ggml_backend_dev_get

Signature:
[.codebit]#`ggml_backend_dev_t ggml_backend_dev_get(size_t index)`#

Returns [.codebit]#`get_reg().devices[index]`#.