Commit 4a460be

[v1.0] Update "HTTP backend" docs + git_vs_http guide (#3357)
* HTTP configuration docs
* http configuration docs
* refactored git_vs_http
* fix import
* fix docs?
* Update docs/source/en/package_reference/utilities.md

Co-authored-by: célina <[email protected]>

---------

Co-authored-by: célina <[email protected]>
1 parent 3b562ca commit 4a460be

7 files changed: +49 -75 lines changed

docs/source/en/concepts/git_vs_http.md

Lines changed: 11 additions & 42 deletions
@@ -4,59 +4,28 @@ rendered properly in your Markdown viewer.

 # Git vs HTTP paradigm

-The `huggingface_hub` library is a library for interacting with the Hugging Face Hub, which is a
-collection of git-based repositories (models, datasets or Spaces). There are two main
-ways to access the Hub using `huggingface_hub`.
+The `huggingface_hub` library is a library for interacting with the Hugging Face Hub, which is a collection of git-based repositories (models, datasets or Spaces). There are two main ways to access the Hub using `huggingface_hub`.

-The first approach, the so-called "git-based" approach, is led by the [`Repository`] class.
-This method uses a wrapper around the `git` command with additional functions specifically
-designed to interact with the Hub. The second option, called the "HTTP-based" approach,
-involves making HTTP requests using the [`HfApi`] client. Let's examine the pros and cons
-of each approach.
+The first approach, the so-called "git-based" approach, relies on using standard `git` commands directly in a terminal. This method allows you to clone repositories, create commits, and push changes manually. The second option, called the "HTTP-based" approach, involves making HTTP requests using the [`HfApi`] client. Let's examine the pros and cons of each approach.

-## Repository: the historical git-based approach
+## Git: the historical CLI-based approach

-At first, `huggingface_hub` was mostly built around the [`Repository`] class. It provides
-Python wrappers for common `git` commands such as `"git add"`, `"git commit"`, `"git push"`,
-`"git tag"`, `"git checkout"`, etc.
+At first, most users interacted with the Hugging Face Hub using plain `git` commands such as `git clone`, `git add`, `git commit`, `git push`, `git tag`, or `git checkout`.

-The library also helps with setting credentials and tracking large files, which are often
-used in machine learning repositories. Additionally, the library allows you to execute its
-methods in the background, making it useful for uploading data during training.
+This approach lets you work with a full local copy of the repository on your machine, just like in traditional software development. This can be an advantage when you need offline access or want to work with the full history of a repository. However, it also comes with downsides: you are responsible for keeping the repository up-to-date locally, handling credentials, and managing large files (via `git-lfs`), which can become cumbersome when working with large machine learning models or datasets.

-The main advantage of using a [`Repository`] is that it allows you to maintain a local
-copy of the entire repository on your machine. This can also be a disadvantage as
-it requires you to constantly update and maintain this local copy. This is similar to
-traditional software development where each developer maintains their own local copy and
-pushes changes when working on a feature. However, in the context of machine learning,
-this may not always be necessary as users may only need to download weights for inference
-or convert weights from one format to another without the need to clone the entire
-repository.
-
-<Tip warning={true}>
-
-[`Repository`] is now deprecated in favor of the http-based alternatives. Given its large adoption in legacy code, the complete removal of [`Repository`] will only happen in release `v1.0`.
-
-</Tip>
+In many machine learning workflows, you may only need to download a few files for inference or convert weights without needing to clone the entire repository. In such cases, using `git` can be overkill and introduce unnecessary complexity.

 ## HfApi: a flexible and convenient HTTP client

-The [`HfApi`] class was developed to provide an alternative to local git repositories, which
-can be cumbersome to maintain, especially when dealing with large models or datasets. The
-[`HfApi`] class offers the same functionality as git-based approaches, such as downloading
-and pushing files and creating branches and tags, but without the need for a local folder
-that needs to be kept in sync.
+The [`HfApi`] class was developed to provide an alternative to using local git repositories, which can be cumbersome to maintain, especially when dealing with large models or datasets. The [`HfApi`] class offers the same functionality as git-based workflows -such as downloading and pushing files and creating branches and tags- but without the need for a local folder that needs to be kept in sync.

-In addition to the functionalities already provided by `git`, the [`HfApi`] class offers
-additional features, such as the ability to manage repos, download files using caching for
-efficient reuse, search the Hub for repos and metadata, access community features such as
-discussions, PRs, and comments, and configure Spaces hardware and secrets.
+In addition to the functionalities already provided by `git`, the [`HfApi`] class offers additional features, such as the ability to manage repos, download files using caching for efficient reuse, search the Hub for repos and metadata, access community features such as discussions, PRs, and comments, and configure Spaces hardware and secrets.

 ## What should I use ? And when ?

-Overall, the **HTTP-based approach is the recommended way to use** `huggingface_hub`
-in all cases. [`HfApi`] allows to pull and push changes, work with PRs, tags and branches, interact with discussions and much more. Since the `0.16` release, the http-based methods can also run in the background, which was the last major advantage of the [`Repository`] class.
+Overall, the **HTTP-based approach is the recommended way to use** `huggingface_hub` in all cases. [`HfApi`] allows you to pull and push changes, work with PRs, tags and branches, interact with discussions and much more.

-However, not all git commands are available through [`HfApi`]. Some may never be implemented, but we are always trying to improve and close the gap. If you don't see your use case covered, please open [an issue on Github](https://github.com/huggingface/huggingface_hub)! We welcome feedback to help build the 🤗 ecosystem with and for our users.
+However, not all git commands are available through [`HfApi`]. Some may never be implemented, but we are always trying to improve and close the gap. If you don't see your use case covered, please open [an issue on GitHub](https://github.com/huggingface/huggingface_hub)! We welcome feedback to help build the HF ecosystem with and for our users.

-This preference of the http-based [`HfApi`] over the git-based [`Repository`] does not mean that git versioning will disappear from the Hugging Face Hub anytime soon. It will always be possible to use `git` commands locally in workflows where it makes sense.
+This preference for the HTTP-based [`HfApi`] over direct `git` commands does not mean that git versioning will disappear from the Hugging Face Hub anytime soon. It will always be possible to use `git` locally in workflows where it makes sense.

docs/source/en/package_reference/utilities.md

Lines changed: 29 additions & 12 deletions
@@ -120,23 +120,40 @@ You can also enable or disable progress bars for specific groups. This allows yo

 [[autodoc]] huggingface_hub.utils.enable_progress_bars

-## Configure HTTP backend
+## Configuring the HTTP Backend

-In some environments, you might want to configure how HTTP calls are made, for example if you are using a proxy.
-`huggingface_hub` let you configure this globally using [`configure_http_backend`]. All requests made to the Hub will
-then use your settings. Under the hood, `huggingface_hub` uses `requests.Session` so you might want to refer to the
-[`requests` documentation](https://requests.readthedocs.io/en/latest/user/advanced) to learn more about the available
-parameters.
+<Tip>

-Since `requests.Session` is not guaranteed to be thread-safe, `huggingface_hub` creates one session instance per thread.
-Using sessions allows us to keep the connection open between HTTP calls and ultimately save time. If you are
-integrating `huggingface_hub` in a third-party library and wants to make a custom call to the Hub, use [`get_session`]
-to get a Session configured by your users (i.e. replace any `requests.get(...)` call by `get_session().get(...)`).
+In `huggingface_hub` v0.x, HTTP requests were handled with `requests`, and configuration was done via `configure_http_backend`. Since we now use `httpx`, configuration works differently: you must provide a factory function that takes no arguments and returns an `httpx.Client`. You can review the [default implementation here](https://github.com/huggingface/huggingface_hub/blob/v1.0-release/src/huggingface_hub/utils/_http.py) to see which parameters are used by default.

-[[autodoc]] configure_http_backend
+</Tip>
+
+
+In some setups, you may need to control how HTTP requests are made, for example when working behind a proxy. The `huggingface_hub` library allows you to configure this globally with [`set_client_factory`]. After configuration, all requests to the Hub will use your custom settings. Since `huggingface_hub` relies on `httpx.Client` under the hood, you can check the [`httpx` documentation](https://www.python-httpx.org/advanced/clients/) for details on available parameters.
+
+If you are building a third-party library and need to make direct requests to the Hub, use [`get_session`] to obtain a correctly configured `httpx` client. Replace any direct `httpx.get(...)` calls with `get_session().get(...)` to ensure proper behavior.
+
+[[autodoc]] set_client_factory

 [[autodoc]] get_session

+In rare cases, you may want to manually close the current session (for example, after a transient `SSLError`). You can do this with [`close_session`]. A new session will automatically be created on the next call to [`get_session`].
+
+Sessions are always closed automatically when the process exits.
+
+[[autodoc]] close_session
+
+For async code, use [`set_async_client_factory`] to configure an `httpx.AsyncClient` and [`get_async_session`] to retrieve one.
+
+[[autodoc]] set_async_client_factory
+
+[[autodoc]] get_async_session
+
+<Tip>
+
+Unlike the synchronous client, the lifecycle of the async client is not managed automatically. Use an async context manager to handle it properly.
+
+</Tip>

 ## Handle HTTP errors

@@ -278,4 +295,4 @@ validated.

 Not exactly a validator, but ran as well.

-[[autodoc]] utils.smoothly_deprecate_legacy_arguments
+[[autodoc]] utils._validators.smoothly_deprecate_legacy_arguments

docs/source/ko/_toctree.yml

Lines changed: 0 additions & 2 deletions
@@ -18,8 +18,6 @@
     title: 명령줄 인터페이스(CLI) 사용하기
   - local: guides/hf_file_system
     title: Hf파일시스템
-  - local: guides/repository
-    title: 리포지토리
   - local: guides/search
     title: Hub에서 검색하기
   - local: guides/inference

docs/source/ko/package_reference/utilities.md

Lines changed: 0 additions & 10 deletions
@@ -84,16 +84,6 @@ True

 [[autodoc]] huggingface_hub.utils.enable_progress_bars

-## HTTP 백엔드 구성[[huggingface_hub.configure_http_backend]]
-
-일부 환경에서는 HTTP 호출이 이루어지는 방식을 구성할 수 있습니다. 예를 들어, 프록시를 사용하는 경우가 그렇습니다. `huggingface_hub`[`configure_http_backend`]를 사용하여 전역적으로 이를 구성할 수 있게 합니다. 그러면 Hub로의 모든 요청이 사용자가 설정한 설정을 사용합니다. 내부적으로 `huggingface_hub``requests.Session`을 사용하므로 사용 가능한 매개변수에 대해 자세히 알아보려면 [requests 문서](https://requests.readthedocs.io/en/latest/user/advanced)를 참조하는 것이 좋습니다.
-
-`requests.Session`이 스레드 안전을 보장하지 않기 때문에 `huggingface_hub`는 스레드당 하나의 세션 인스턴스를 생성합니다. 세션을 사용하면 HTTP 호출 사이에 연결을 유지하고 최종적으로 시간을 절약할 수 있습니다. `huggingface_hub`를 서드 파티 라이브러리에 통합하고 사용자 지정 호출을 Hub로 만들려는 경우, [`get_session`]을 사용하여 사용자가 구성한 세션을 가져옵니다 (즉, 모든 `requests.get(...)` 호출을 `get_session().get(...)`으로 대체합니다).
-
-[[autodoc]] configure_http_backend
-
-[[autodoc]] get_session
-

 ## HTTP 오류 다루기[[handle-http-errors]]

src/huggingface_hub/__init__.py

Lines changed: 3 additions & 3 deletions
@@ -516,7 +516,7 @@
         "HfHubAsyncTransport",
         "HfHubTransport",
         "cached_assets_path",
-        "close_client",
+        "close_session",
         "dump_environment_info",
         "get_async_session",
         "get_session",
@@ -815,7 +815,7 @@
     "cancel_access_request",
     "cancel_job",
     "change_discussion_status",
-    "close_client",
+    "close_session",
     "comment_discussion",
     "create_branch",
     "create_collection",
@@ -1518,7 +1518,7 @@ def __dir__():
         HfHubAsyncTransport,  # noqa: F401
         HfHubTransport,  # noqa: F401
         cached_assets_path,  # noqa: F401
-        close_client,  # noqa: F401
+        close_session,  # noqa: F401
         dump_environment_info,  # noqa: F401
         get_async_session,  # noqa: F401
         get_session,  # noqa: F401

src/huggingface_hub/utils/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -55,7 +55,7 @@
     CLIENT_FACTORY_T,
     HfHubAsyncTransport,
     HfHubTransport,
-    close_client,
+    close_session,
     fix_hf_endpoint_in_url,
     get_async_session,
     get_session,

src/huggingface_hub/utils/_http.py

Lines changed: 5 additions & 5 deletions
@@ -174,7 +174,7 @@ def set_client_factory(client_factory: CLIENT_FACTORY_T) -> None:
     """
     global _GLOBAL_CLIENT_FACTORY
     with _CLIENT_LOCK:
-        close_client()
+        close_session()
         _GLOBAL_CLIENT_FACTORY = client_factory


@@ -228,9 +228,9 @@ def get_async_session() -> httpx.AsyncClient:
     return _GLOBAL_ASYNC_CLIENT_FACTORY()


-def close_client() -> None:
+def close_session() -> None:
     """
-    Close the global httpx.Client used by `huggingface_hub`.
+    Close the global `httpx.Client` used by `huggingface_hub`.

     If a Client is closed, it will be recreated on the next call to [`get_client`].

@@ -250,7 +250,7 @@ def close_client() -> None:
             logger.warning(f"Error closing client: {e}")


-atexit.register(close_client)
+atexit.register(close_session)


 def _http_backoff_base(
@@ -325,7 +325,7 @@ def _should_retry(response: httpx.Response) -> bool:
             logger.warning(f"'{err}' thrown while requesting {method} {url}")

             if isinstance(err, httpx.ConnectError):
-                close_client()  # In case of SSLError it's best to close the shared httpx.Client objects
+                close_session()  # In case of SSLError it's best to close the shared httpx.Client objects

             if nb_tries > max_retries:
                 raise err
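The `_http.py` hunks only show fragments of the session lifecycle, so the following stdlib-only sketch models how the pieces plausibly fit together: a global factory, a lazily created shared client, `close_session()` discarding it, and `set_client_factory()` closing the old client before swapping factories. It is not the library's implementation. `FakeClient` is a hypothetical stand-in for `httpx.Client`, the body of `get_session()` is an assumption, and the re-entrant lock is assumed so that `set_client_factory()` can call `close_session()` while holding it.

```python
import atexit
import threading
from typing import Callable, Optional


class FakeClient:
    """Hypothetical stand-in for httpx.Client, keeping the sketch stdlib-only."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


# Re-entrant lock (assumption): set_client_factory() calls close_session() while holding it.
_CLIENT_LOCK = threading.RLock()
_GLOBAL_CLIENT_FACTORY: Callable[[], FakeClient] = FakeClient
_GLOBAL_CLIENT: Optional[FakeClient] = None


def get_session() -> FakeClient:
    """Return the shared client, creating it lazily from the current factory."""
    global _GLOBAL_CLIENT
    with _CLIENT_LOCK:
        if _GLOBAL_CLIENT is None:
            _GLOBAL_CLIENT = _GLOBAL_CLIENT_FACTORY()
        return _GLOBAL_CLIENT


def close_session() -> None:
    """Close and forget the shared client; the next get_session() builds a new one."""
    global _GLOBAL_CLIENT
    with _CLIENT_LOCK:
        client, _GLOBAL_CLIENT = _GLOBAL_CLIENT, None
    if client is not None:
        client.close()


def set_client_factory(client_factory: Callable[[], FakeClient]) -> None:
    """Swap the factory, first closing any client built by the previous one."""
    global _GLOBAL_CLIENT_FACTORY
    with _CLIENT_LOCK:
        close_session()
        _GLOBAL_CLIENT_FACTORY = client_factory


# As in the diff, the shared client is closed automatically when the process exits.
atexit.register(close_session)

first = get_session()
close_session()
second = get_session()
print(first.closed, second is first)
```

Closing inside `set_client_factory()` means no client built by the old factory outlives the swap, which is why the rename in this commit has to touch that call site as well as the `atexit` hook and the retry path.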
