// file: docs/code_documentation/documentation.adoc
= Code Documentation

WARNING: This documentation is neither complete (it does not cover everything) nor exhaustive (it does not fully cover everything it touches on). It is based on the February 18th, 2025 version of the code, specifically commit 63ac128; subsequent modifications have not yet been reviewed.

[[docs:overview]]
== Overview

[[docs:overview:main.cpp]]
=== main.cpp

"`main.cpp`" is the primary source file from which the documentation process started. It compiles into the llama-cli executable, which provides chatbot functionality in the terminal, and has the following high-level structure (note that this analysis is not exhaustive):

* (lines) 1-86: include headers, global variables, helper functions
* 88-133: parameter parsing (call to [.codebit]#`common_params_parse(...)`# on line 91, edge case handling afterwards), [.codebit]#`common_init()`#, console initialization
* 135: [.codebit]#`llama_backend_init()`#
* 136: [.codebit]#`llama_numa_init(...)`#
* 150: call to [.codebit]#`common_init_from_params(...)`# generates [.codebit]#`struct llama_model`# and [.codebit]#`struct llama_context`#
* 165-194: set up [.codebit]#`struct ggml_threadpool`#
* 203-226: conversation mode setup
* 235-432: session setup
* 434: [.codebit]#`common_sampler_init(...)`#
* 460-483: session setup
* 485-532: inference preparation
* 534-906: run loop
** 535-630: input and context management
** 632-652: token evaluation by [.codebit]#`llama_decode(...)`# call (line 640)
** 704-728: display logic
** 731-906: antiprompt/reverse prompt detection, console logic
* 908-923: cleanup (print final logs, deallocate memory)
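The phases above can be condensed into a skeleton. Everything below is an illustrative stub named after its counterpart in the source, not the real implementation; the real functions take different arguments and the run loop is far more involved:

```cpp
// Hypothetical stand-ins for the real llama.cpp types and calls.
struct llama_model   {};
struct llama_context {};

static bool common_params_parse(int argc, char ** argv) { (void) argc; (void) argv; return true; }
static void llama_backend_init() {}
static bool common_init_from_params(llama_model & model, llama_context & ctx) { (void) model; (void) ctx; return true; }
static void llama_backend_free() {}

// Condensed view of the llama-cli control flow described above.
static int run_cli(int argc, char ** argv) {
    if (!common_params_parse(argc, argv)) return 1;     // parameter parsing
    llama_backend_init();                               // backend init
    llama_model   model;
    llama_context ctx;
    if (!common_init_from_params(model, ctx)) return 1; // model + context
    // ... sampler init, session setup, inference preparation ...
    bool done = false;
    while (!done) {  // run loop: input handling, llama_decode, display
        done = true; // stub: the real loop exits on EOS or user request
    }
    llama_backend_free();                               // cleanup
    return 0;
}
```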


[[docs:overview:call_paths]]
=== Call Paths

Following is a description of the call paths followed in the documentation process. These are centered on the inference process and the setup necessary for it, and will provide a good picture of the program's general control flow.

==== Model and context init

* [.codebit]#`common_init_from_params(...)`# -> [.codebit]#`llama_model_load_from_file(...)`#, [.codebit]#`llama_init_from_model(...)`#
** [.codebit]#`llama_model_load_from_file(...)`# -> [.codebit]#`llama_model_load_from_file_impl(...)`# -> [.codebit]#`ggml_backend_dev_get(...)`#, [.codebit]#`llama_model_load(...)`#
*** [.codebit]#`ggml_backend_dev_get(...)`# -> [.codebit]#`get_reg()`# -> [.codebit]#`struct ggml_backend_registry()`# -> [.codebit]#`struct ggml_backend_registry.register_backend(...)`#, [.codebit]#`ggml_backend_cuda_reg()`# and [.codebit]#`ggml_backend_cpu_reg()`# (among others, depending on the build)
** [.codebit]#`llama_init_from_model(...)`# -> [.codebit]#`struct llama_context(...)`#, [.codebit]#`ggml_backend_dev_init(...)`#, [.codebit]#`ggml_backend_sched_new(...)`#
*** [.codebit]#`ggml_backend_dev_init(...)`# -> [.codebit]#`struct ggml_backend_device.iface.init_backend(...)`#

Note that the calls to [.codebit]#`ggml_backend_cuda_reg()`# and [.codebit]#`ggml_backend_cpu_reg()`# go much deeper and are responsible for the proper setup of usable devices (among other functions for different backends that are not documented here). They are overall very similar and will be detailed in their own sections.

==== Inference

* [.codebit]#`llama_decode(...)`# -> [.codebit]#`llama_decode_impl(...)`# -> [.codebit]#`ggml_backend_sched_set_eval_callback(...)`#, [.codebit]#`llama_build_graph(...)`#, [.codebit]#`llama_set_inputs(...)`#, [.codebit]#`llama_graph_compute(...)`#
** [.codebit]#`llama_build_graph(...)`# -> [.codebit]#`struct llm_build_context(...)`#, [.codebit]#`struct llm_build_context.init()`#, [.codebit]#`struct llm_build_context.build_llama()`# (one of many branches)
*** [.codebit]#`struct llm_build_context.init()`# -> [.codebit]#`ggml_init(...)`#
*** [.codebit]#`struct llm_build_context.build_llama()`# -> [.codebit]#`ggml_new_graph_custom(...)`#, [.codebit]#`llm_build_input_embd(...)`#
**** [.codebit]#`ggml_new_graph_custom(...)`# -> [.codebit]#`ggml_graph_nbytes(...)`#, [.codebit]#`ggml_new_object(...)`#, [.codebit]#`ggml_hash_set_reset(...)`#
*** [.codebit]#`llm_build_input_embd(...)`# -> [.codebit]#`ggml_new_tensor_1d(...)`#, [.codebit]#`ggml_new_tensor_2d(...)`# -> [.codebit]#`ggml_new_tensor_impl(...)`#
** [.codebit]#`llama_graph_compute(...)`# -> [.codebit]#`ggml_backend_sched_graph_compute_async(...)`# -> [.codebit]#`ggml_backend_sched_alloc_graph(...)`#, [.codebit]#`ggml_backend_sched_compute_splits(...)`#
*** [.codebit]#`ggml_backend_sched_alloc_graph(...)`# -> [.codebit]#`ggml_backend_sched_split_graph(...)`#, [.codebit]#`ggml_backend_sched_alloc_splits(...)`#
*** [.codebit]#`ggml_backend_sched_compute_splits(...)`# -> [.codebit]#`struct ggml_backend_sched.callback_eval`#, [.codebit]#`ggml_backend_graph_compute_async(...)`#
**** [.codebit]#`ggml_backend_graph_compute_async(...)`# -> [.codebit]#`struct ggml_backend.iface.graph_compute`#

Note that the call path ends in [.codebit]#`struct ggml_backend.iface.graph_compute`#, a pointer to a function specific to each backend. It is set in the initialization phase by a call to [.codebit]#`struct ggml_backend_device.iface.init_backend(...)`#, itself another function pointer, assigned during backend initialization, specifically in the calls to [.codebit]#`ggml_backend_cuda_reg()`# and [.codebit]#`ggml_backend_cpu_reg()`# (and their counterparts for the other supported backends). Again, these will be detailed in their own sections.
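The dispatch mechanism can be sketched with a toy interface. The `toy_*` names and the two compute functions are invented for illustration; only the function-pointer indirection mirrors the source:

```cpp
// Toy mock of the ggml dispatch scheme: the scheduler calls through
// iface.graph_compute without knowing which backend it is talking to.
struct toy_backend;
struct toy_backend_i {
    int (*graph_compute)(toy_backend * backend); // set per backend at init
};
struct toy_backend {
    toy_backend_i iface;
};

static int cpu_graph_compute (toy_backend *) { return 1; } // pretend CPU path
static int cuda_graph_compute(toy_backend *) { return 2; } // pretend CUDA path

// Analogue of iface.init_backend: selects the function pointers for a device.
static toy_backend make_backend(bool use_gpu) {
    toy_backend b;
    b.iface.graph_compute = use_gpu ? cuda_graph_compute : cpu_graph_compute;
    return b;
}

// Analogue of ggml_backend_graph_compute_async: dispatches blindly.
static int compute(toy_backend & b) {
    return b.iface.graph_compute(&b);
}
```

The design keeps the scheduler backend-agnostic: adding a backend means filling in a new interface struct, not touching the call sites.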

[[docs:funcstructs]]
== Functions and structures

This section will elaborate on the functions and structures mentioned above, as well as other relevant ones, grouped by the files which contain them and ordered by their position in said files.

NOTE: There are many types with the formats [.codebit]#`typename_t`# and [.codebit]#`typename_ptr`#. In most, if not all cases, [.codebit]#`typename_t`# is a [.codebit]#`typedef`# that stands for [.codebit]#`typename*`#, while [.codebit]#`typename_ptr`# stands for [.codebit]#`std::unique_ptr<typename, optional_typename_deleter>`#.
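As a sketch of the two conventions, using an invented `typename_example` type (the deleter is optional in the real aliases as well):

```cpp
#include <memory>

struct typename_example {};

// `*_t`: plain pointer alias, as with ggml_backend_t = ggml_backend *.
typedef typename_example * typename_example_t;

// `*_ptr`: owning smart pointer, optionally with a custom deleter,
// as with llama_model_ptr.
struct typename_example_deleter {
    void operator()(typename_example * p) const { delete p; }
};
using typename_example_ptr = std::unique_ptr<typename_example, typename_example_deleter>;
```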

include::documentation/common.h.adoc[]

include::documentation/common.cpp.adoc[]

include::documentation/llama-context.h.adoc[]

include::documentation/llama.cpp.adoc[]

include::documentation/ggml-impl.h.adoc[]

include::documentation/ggml-backend-reg.cpp.adoc[]

include::documentation/ggml-cuda.cu.adoc[]

include::documentation/ggml-cpu.cpp.adoc[]

include::documentation/ggml-cpu.c.adoc[]

include::documentation/ggml-backend.cpp.adoc[]

include::documentation/ggml-backend-impl.h.adoc[]

include::documentation/ggml.h.adoc[]

include::documentation/ggml.c.adoc[]
// file: docs/code_documentation/documentation/common.cpp.adoc
[[docs:funcstructs:common.cpp]]
== common.cpp


[[docs:funcstructs:common.cpp:common_init_from_params]]
=== common_init_from_params

Signature:
[.codebit]#`struct common_init_result common_init_from_params(common_params & params)`#

Firstly, the function loads the model ([.codebit]#`struct llama_model`#). Depending on the parameters and the build, this can go through one of three branches, calling [.codebit]#`common_load_model_from_hf(...)`# to load from a HuggingFace repository, [.codebit]#`common_load_model_from_url(...)`# to load from a URL or [.codebit]#`llama_model_load_from_file(...)`# to load from a local file. The first two branches also end up indirectly calling [.codebit]#`llama_model_load_from_file(...)`#.

Secondly, it passes the loaded model to [.codebit]#`llama_init_from_model(...)`# to generate the corresponding [.codebit]#`llama_context`#.

Thirdly, it loads the control vectors, then the lora adapters ([.codebit]#`struct llama_adapter_lora`#) indicated by the parameters through calls to [.codebit]#`llama_adapter_lora_init(...)`#. It also performs a warmup run of the model if so indicated by [.codebit]#`params.warmup`#.

Lastly, it bundles and returns the [.codebit]#`llama_model`#, [.codebit]#`llama_context`# and lora adapters in a [.codebit]#`struct common_init_result`#.
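The three load branches can be sketched as follows. `model_source`, `load_model` and `load_from_file` are hypothetical condensations, not real llama.cpp symbols; only the control flow (and the fact that all branches funnel into the file loader) mirrors the source:

```cpp
#include <string>

enum class model_source { hf_repo, url, local_file };

// Stands in for llama_model_load_from_file(...): every branch ends here.
static std::string load_from_file(const std::string & path) {
    return "loaded:" + path;
}

static std::string load_model(model_source src, const std::string & loc) {
    switch (src) {
        case model_source::hf_repo:    // common_load_model_from_hf(...)
            return load_from_file("downloaded-from-hf/" + loc);
        case model_source::url:        // common_load_model_from_url(...)
            return load_from_file("downloaded-from-url/" + loc);
        case model_source::local_file: // direct call
            return load_from_file(loc);
    }
    return "";
}
```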
// file: docs/code_documentation/documentation/common.h.adoc
[[docs:funcstructs:common.h]]
== common.h


[[docs:funcstructs:common.h:struct-common_init_result]]
=== struct common_init_result

This structure is just a wrapper containing [.codebit]##`std::unique_ptr`##s to a [.codebit]#`llama_model`#, a [.codebit]#`llama_context`# and lora adapters:

[source,C++]
----
// note: defines object's lifetime
struct common_init_result {
llama_model_ptr model;
llama_context_ptr context;

std::vector<llama_adapter_lora_ptr> lora;
};
----
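A toy illustration of the ownership transfer, using plain `std::unique_ptr` and mock types instead of the real aliases (which carry custom deleters):

```cpp
#include <memory>
#include <utility>
#include <vector>

struct llama_model_mock   {};
struct llama_context_mock {};

struct common_init_result_mock {
    std::unique_ptr<llama_model_mock>   model;
    std::unique_ptr<llama_context_mock> context;
    std::vector<std::unique_ptr<int>>   lora;
};

// Returning the struct by value moves the unique_ptrs to the caller,
// which from then on owns the model and context lifetimes; hence the
// "defines object's lifetime" comment in the source.
static common_init_result_mock make_result() {
    common_init_result_mock res;
    res.model   = std::make_unique<llama_model_mock>();
    res.context = std::make_unique<llama_context_mock>();
    return res;
}
```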
// file: docs/code_documentation/documentation/ggml-backend-impl.h.adoc
[[docs:funcstructs:ggml-backend-impl.h]]
== ggml-backend-impl.h


[[docs:funcstructs:ggml-backend-impl.h:struct-ggml_backend_i]]
=== struct ggml_backend_i

The interface for a [.codebit]#`ggml_backend`#. Has the following mandatory members:


* [.codebit]#`const char * (*get_name)(ggml_backend_t backend)`#
* [.codebit]#`void (*free)(ggml_backend_t backend)`#
* [.codebit]#`enum ggml_status (*graph_compute) (ggml_backend_t backend, struct ggml_cgraph * cgraph)`#: from comments: "compute graph (always async if supported by the backend)"



[[docs:funcstructs:ggml-backend-impl.h:struct-ggml_backend]]
=== struct ggml_backend

Describes a high-level backend that contains an interface for tensor operations (optional), graph computation and event synchronization (optional). Has the following members:


* [.codebit]#`ggml_guid_t guid`#
* [.codebit]#`struct ggml_backend_i iface`#
* [.codebit]#`ggml_backend_dev_t device`#
* [.codebit]#`void * context`#


[[docs:funcstructs:ggml-backend-impl.h:struct-ggml_backend_device_i]]
=== struct ggml_backend_device_i

The interface of a [.codebit]#`ggml_backend_device`#. Here are some of its members:

* [.codebit]#`const char * (*get_name)(ggml_backend_dev_t dev)`#
* [.codebit]#`ggml_backend_t (*init_backend)(ggml_backend_dev_t dev, const char * params)`#: initializes the [.codebit]#`ggml_backend`# corresponding to this device
* [.codebit]#`bool (*supports_op)(ggml_backend_dev_t dev, const struct ggml_tensor * op)`#


[[docs:funcstructs:ggml-backend-impl.h:struct-ggml_backend_device]]
=== struct ggml_backend_device

Describes a usable device. Has the following members:

* [.codebit]#`struct ggml_backend_device_i iface`#
* [.codebit]#`ggml_backend_reg_t reg`#
* [.codebit]#`void * context`#


[[docs:funcstructs:ggml-backend-impl.h:struct-ggml_backend_reg_i]]
=== struct ggml_backend_reg_i

The interface for a [.codebit]#`ggml_backend_reg`#. Has the following members:

* [.codebit]#`const char * (*get_name)(ggml_backend_reg_t reg)`#
* [.codebit]#`size_t (*get_device_count)(ggml_backend_reg_t reg)`#
* [.codebit]#`ggml_backend_dev_t (*get_device)(ggml_backend_reg_t reg, size_t index)`#
* [.codebit]#`void * (*get_proc_address)(ggml_backend_reg_t reg, const char * name)`#: from comments: "(optional) get a pointer to a function in the backend; backends can add custom functions that are not part of the standard ggml-backend interface"


[[docs:funcstructs:ggml-backend-impl.h:struct-ggml_backend_reg]]
=== struct ggml_backend_reg

A registry managing the devices for a specific backend. Has the following members:

* [.codebit]#`int api_version`#: must be initialized to [.codebit]#`GGML_BACKEND_API_VERSION`#
* [.codebit]#`struct ggml_backend_reg_i iface`#
* [.codebit]#`void * context`#
// file: docs/code_documentation/documentation/ggml-backend-reg.cpp.adoc
[[docs:funcstructs:ggml-backend-reg.cpp]]
== ggml-backend-reg.cpp


[[docs:funcstructs:ggml-backend-reg.cpp:struct-ggml_backend_reg_entry]]
=== struct ggml_backend_reg_entry

[source,C++]
----
struct ggml_backend_reg_entry {
ggml_backend_reg_t reg;
dl_handle_ptr handle;
};
----

Note that [.codebit]#`ggml_backend_reg_t`# is an alias for [.codebit]#`ggml_backend_reg*`#.


[[docs:funcstructs:ggml-backend-reg.cpp:struct-ggml_backend_registry]]
=== struct ggml_backend_registry

It has two members:

* [.codebit]#`std::vector<ggml_backend_reg_entry> backends`#
* [.codebit]#`std::vector<ggml_backend_dev_t> devices`#

Its default constructor calls its [.codebit]#`register_backend(...)`# method with the [.codebit]##`ggml_backend_reg`##s specific to each backend with which llama.cpp is compiled (see [.codebit]#`ggml_backend_cuda_reg()`# and [.codebit]#`ggml_backend_cpu_reg()`#). This constructor *_should not_* be called manually, as this structure is meant to be a singleton. See [.codebit]#`get_reg()`#.


[[docs:funcstructs:ggml-backend-reg.cpp:struct-ggml_backend_registry.register_backend]]
=== struct ggml_backend_registry.register_backend

Signature:
[.codebit]#`void register_backend(ggml_backend_reg_t reg, dl_handle_ptr handle = nullptr)`#

Pushes the given pair into the structure's [.codebit]#`backends`# member and calls its [.codebit]#`register_device(...)`# method for every device associated with the [.codebit]#`ggml_backend_reg`# (uses [.codebit]#`ggml_backend_reg_dev_count(...)`# and [.codebit]#`ggml_backend_reg_dev_get(...)`# to iterate through and retrieve them).
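A simplified mock of the registration walk (all `toy_*` names are invented; the real code iterates via `ggml_backend_reg_dev_count(...)` and `ggml_backend_reg_dev_get(...)` rather than a plain vector):

```cpp
#include <string>
#include <vector>

struct toy_reg {
    std::vector<std::string> devs; // stands in for the reg's device table
};

struct toy_registry {
    std::vector<toy_reg *>   backends;
    std::vector<std::string> devices;

    // Analogue of register_device: just pushes to the devices member.
    void register_device(const std::string & dev) {
        devices.push_back(dev);
    }

    // Analogue of register_backend: records the reg, then registers
    // every device it exposes.
    void register_backend(toy_reg * reg) {
        backends.push_back(reg);
        for (size_t i = 0; i < reg->devs.size(); ++i) {
            register_device(reg->devs[i]);
        }
    }
};
```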


[[docs:funcstructs:ggml-backend-reg.cpp:struct-ggml_backend_registry.register_device]]
=== struct ggml_backend_registry.register_device

Signature:
[.codebit]#`void register_device(ggml_backend_dev_t device)`#

Simply pushes to the structure's [.codebit]#`devices`# member.


[[docs:funcstructs:ggml-backend-reg.cpp:get_reg]]
=== get_reg

Signature: [.codebit]#`static ggml_backend_registry & get_reg()`#

Helps implement a singleton-like design pattern for [.codebit]#`struct ggml_backend_registry`#:

[source,C++]
----
static ggml_backend_registry & get_reg() {
static ggml_backend_registry reg;
return reg;
}
----
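This is the Meyers-singleton idiom: the function-local static is constructed exactly once, on first call, and every call returns the same instance. A self-contained illustration with an instrumented constructor (the counter is purely for demonstration):

```cpp
struct counted_registry {
    static int constructions;
    counted_registry() { ++constructions; }
};
int counted_registry::constructions = 0;

static counted_registry & get_counted_reg() {
    static counted_registry reg; // constructed on first use only
    return reg;
}
```

Since C++11 this initialization is also guaranteed to be thread-safe, which is why no explicit locking appears around the static.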


[[docs:funcstructs:ggml-backend-reg.cpp:ggml_backend_dev_get]]
=== ggml_backend_dev_get

Signature:
[.codebit]#`ggml_backend_dev_t ggml_backend_dev_get(size_t index)`#

Returns [.codebit]#`get_reg().devices[index]`#.