docs/source/en/concepts/git_vs_http.md
11 additions & 42 deletions
@@ -4,59 +4,28 @@ rendered properly in your Markdown viewer.
4
4
5
5
# Git vs HTTP paradigm
6
6
7
-
The `huggingface_hub` library is a library for interacting with the Hugging Face Hub, which is a
8
-
collection of git-based repositories (models, datasets or Spaces). There are two main
9
-
ways to access the Hub using `huggingface_hub`.
7
+
The `huggingface_hub` library lets you interact with the Hugging Face Hub, which is a collection of git-based repositories (models, datasets or Spaces). There are two main ways to access the Hub using `huggingface_hub`.
10
8
11
-
The first approach, the so-called "git-based" approach, is led by the [`Repository`] class.
12
-
This method uses a wrapper around the `git` command with additional functions specifically
13
-
designed to interact with the Hub. The second option, called the "HTTP-based" approach,
14
-
involves making HTTP requests using the [`HfApi`] client. Let's examine the pros and cons
15
-
of each approach.
9
+
The first approach, the so-called "git-based" approach, relies on using standard `git` commands directly in a terminal. This method allows you to clone repositories, create commits, and push changes manually. The second option, called the "HTTP-based" approach, involves making HTTP requests using the [`HfApi`] client. Let's examine the pros and cons of each approach.
16
10
17
-
## Repository: the historical git-based approach
11
+
## Git: the historical CLI-based approach
18
12
19
-
At first, `huggingface_hub` was mostly built around the [`Repository`] class. It provides
20
-
Python wrappers for common `git` commands such as `"git add"`, `"git commit"`, `"git push"`,
21
-
`"git tag"`, `"git checkout"`, etc.
13
+
At first, most users interacted with the Hugging Face Hub using plain `git` commands such as `git clone`, `git add`, `git commit`, `git push`, `git tag`, or `git checkout`.
22
14
23
-
The library also helps with setting credentials and tracking large files, which are often
24
-
used in machine learning repositories. Additionally, the library allows you to execute its
25
-
methods in the background, making it useful for uploading data during training.
15
+
This approach lets you work with a full local copy of the repository on your machine, just like in traditional software development. This can be an advantage when you need offline access or want to work with the full history of a repository. However, it also comes with downsides: you are responsible for keeping the repository up-to-date locally, handling credentials, and managing large files (via `git-lfs`), which can become cumbersome when working with large machine learning models or datasets.
26
16
27
-
The main advantage of using a [`Repository`] is that it allows you to maintain a local
28
-
copy of the entire repository on your machine. This can also be a disadvantage as
29
-
it requires you to constantly update and maintain this local copy. This is similar to
30
-
traditional software development where each developer maintains their own local copy and
31
-
pushes changes when working on a feature. However, in the context of machine learning,
32
-
this may not always be necessary as users may only need to download weights for inference
33
-
or convert weights from one format to another without the need to clone the entire
34
-
repository.
35
-
36
-
<Tip warning={true}>
37
-
38
-
[`Repository`] is now deprecated in favor of the http-based alternatives. Given its large adoption in legacy code, the complete removal of [`Repository`] will only happen in release `v1.0`.
39
-
40
-
</Tip>
17
+
In many machine learning workflows, you may only need to download a few files for inference or convert weights without needing to clone the entire repository. In such cases, using `git` can be overkill and introduce unnecessary complexity.
41
18
42
19
## HfApi: a flexible and convenient HTTP client
43
20
44
-
The [`HfApi`] class was developed to provide an alternative to local git repositories, which
45
-
can be cumbersome to maintain, especially when dealing with large models or datasets. The
46
-
[`HfApi`] class offers the same functionality as git-based approaches, such as downloading
47
-
and pushing files and creating branches and tags, but without the need for a local folder
48
-
that needs to be kept in sync.
21
+
The [`HfApi`] class was developed to provide an alternative to using local git repositories, which can be cumbersome to maintain, especially when dealing with large models or datasets. The [`HfApi`] class offers the same functionality as git-based workflows, such as downloading and pushing files and creating branches and tags, but without a local folder that has to be kept in sync.
49
22
50
-
In addition to the functionalities already provided by `git`, the [`HfApi`] class offers
51
-
additional features, such as the ability to manage repos, download files using caching for
52
-
efficient reuse, search the Hub for repos and metadata, access community features such as
53
-
discussions, PRs, and comments, and configure Spaces hardware and secrets.
23
+
Beyond what `git` already provides, the [`HfApi`] class lets you manage repos, download files using caching for efficient reuse, search the Hub for repos and metadata, access community features such as discussions, PRs, and comments, and configure Spaces hardware and secrets.
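As a rough illustration of the HTTP-based workflow, the sketch below downloads a single file and uploads another without cloning anything. It relies on `hf_hub_download` and `HfApi.upload_file` from `huggingface_hub`; the helper names `fetch_file` and `push_file`, and any repo ids or file names passed to them, are placeholders introduced here for illustration only.

```python
from huggingface_hub import HfApi, hf_hub_download


def fetch_file(repo_id: str, filename: str) -> str:
    """Download one file into the local cache and return its local path."""
    # Only the requested file is fetched; the rest of the repo is not cloned.
    return hf_hub_download(repo_id=repo_id, filename=filename)


def push_file(local_path: str, repo_id: str, path_in_repo: str) -> None:
    """Upload one local file to a repo (requires write access to that repo)."""
    api = HfApi()  # uses the locally saved token, if one is available
    api.upload_file(
        path_or_fileobj=local_path,
        path_in_repo=path_in_repo,
        repo_id=repo_id,
    )


# Example calls (they need network access; the names below are hypothetical):
# config_path = fetch_file("my-user/my-model", "config.json")
# push_file("README.md", "my-user/my-model", "README.md")
```

Both helpers are thin wrappers: the point is that neither one needs a local working copy of the repository.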
54
24
55
25
## What should I use? And when?
56
26
57
-
Overall, the **HTTP-based approach is the recommended way to use** `huggingface_hub`
58
-
in all cases. [`HfApi`] allows to pull and push changes, work with PRs, tags and branches, interact with discussions and much more. Since the `0.16` release, the http-based methods can also run in the background, which was the last major advantage of the [`Repository`] class.
27
+
Overall, the **HTTP-based approach is the recommended way to use** `huggingface_hub` in all cases. [`HfApi`] allows you to pull and push changes, work with PRs, tags and branches, interact with discussions and much more.
59
28
60
-
However, not all git commands are available through [`HfApi`]. Some may never be implemented, but we are always trying to improve and close the gap. If you don't see your use case covered, please open [an issue on Github](https://github.com/huggingface/huggingface_hub)! We welcome feedback to help build the 🤗 ecosystem with and for our users.
29
+
However, not all git commands are available through [`HfApi`]. Some may never be implemented, but we are always trying to improve and close the gap. If you don't see your use case covered, please open [an issue on GitHub](https://github.com/huggingface/huggingface_hub)! We welcome feedback to help build the HF ecosystem with and for our users.
61
30
62
-
This preference of the http-based [`HfApi`] over the git-based [`Repository`] does not mean that git versioning will disappear from the Hugging Face Hub anytime soon. It will always be possible to use `git` commands locally in workflows where it makes sense.
31
+
This preference for the HTTP-based [`HfApi`] over direct `git` commands does not mean that git versioning will disappear from the Hugging Face Hub anytime soon. It will always be possible to use `git` locally in workflows where it makes sense.
In some environments, you might want to configure how HTTP calls are made, for example if you are using a proxy.
126
-
`huggingface_hub` let you configure this globally using [`configure_http_backend`]. All requests made to the Hub will
127
-
then use your settings. Under the hood, `huggingface_hub` uses `requests.Session` so you might want to refer to the
128
-
[`requests` documentation](https://requests.readthedocs.io/en/latest/user/advanced) to learn more about the available
129
-
parameters.
125
+
<Tip>
130
126
131
-
Since `requests.Session` is not guaranteed to be thread-safe, `huggingface_hub` creates one session instance per thread.
132
-
Using sessions allows us to keep the connection open between HTTP calls and ultimately save time. If you are
133
-
integrating `huggingface_hub` in a third-party library and wants to make a custom call to the Hub, use [`get_session`]
134
-
to get a Session configured by your users (i.e. replace any `requests.get(...)` call by `get_session().get(...)`).
127
+
In `huggingface_hub` v0.x, HTTP requests were handled with `requests`, and configuration was done via `configure_http_backend`. Since we now use `httpx`, configuration works differently: you must provide a factory function that takes no arguments and returns an `httpx.Client`. You can review the [default implementation here](https://github.com/huggingface/huggingface_hub/blob/v1.0-release/src/huggingface_hub/utils/_http.py) to see which parameters are used by default.
135
128
136
-
[[autodoc]] configure_http_backend
129
+
</Tip>
130
+
131
+
132
+
In some setups, you may need to control how HTTP requests are made, for example when working behind a proxy. The `huggingface_hub` library allows you to configure this globally with [`set_client_factory`]. After configuration, all requests to the Hub will use your custom settings. Since `huggingface_hub` relies on `httpx.Client` under the hood, you can check the [`httpx` documentation](https://www.python-httpx.org/advanced/clients/) for details on available parameters.
133
+
134
+
If you are building a third-party library and need to make direct requests to the Hub, use [`get_session`] to obtain a correctly configured `httpx` client. Replace any direct `httpx.get(...)` calls with `get_session().get(...)` to ensure proper behavior.
135
+
136
+
[[autodoc]] set_client_factory
137
137
138
138
[[autodoc]] get_session
139
139
140
+
In rare cases, you may want to manually close the current session (for example, after a transient `SSLError`). You can do this with [`close_session`]. A new session will automatically be created on the next call to [`get_session`].
141
+
142
+
Sessions are always closed automatically when the process exits.
143
+
144
+
[[autodoc]] close_session
145
+
146
+
For async code, use [`set_async_client_factory`] to configure an `httpx.AsyncClient` and [`get_async_session`] to retrieve one.
147
+
148
+
[[autodoc]] set_async_client_factory
149
+
150
+
[[autodoc]] get_async_session
151
+
152
+
<Tip>
153
+
154
+
Unlike the synchronous client, the lifecycle of the async client is not managed automatically. Use an async context manager to handle it properly.
In some environments, you may want to configure how HTTP calls are made, for example when you are using a proxy. `huggingface_hub` lets you configure this globally using [`configure_http_backend`]. All requests made to the Hub will then use your settings. Under the hood, `huggingface_hub` uses `requests.Session`, so you may want to refer to the [requests documentation](https://requests.readthedocs.io/en/latest/user/advanced) to learn more about the available parameters.
90
-
91
-
Since `requests.Session` is not guaranteed to be thread-safe, `huggingface_hub` creates one session instance per thread. Using sessions keeps the connection open between HTTP calls and ultimately saves time. If you are integrating `huggingface_hub` into a third-party library and want to make a custom call to the Hub, use [`get_session`] to get a session configured by your users (i.e. replace any `requests.get(...)` call with `get_session().get(...)`).