**Describe the bug**
When downloading a sufficiently large file over a slow enough network, the stream returned by ObjectStore::get still reports "operation timed out" even while it is making active progress.
**To Reproduce**
```rust
use std::time::{Duration, Instant};

use futures::StreamExt;
use object_store::path::Path;
use object_store::ObjectStore;

#[tokio::main]
async fn main() {
    let start = Instant::now();
    let object_store_url = "https://datasets.clickhouse.com";
    let client_options = object_store::ClientOptions::default()
        // set the request timeout to 1 second
        .with_timeout(Duration::from_secs(1));
    let object_store = object_store::http::HttpBuilder::new()
        .with_client_options(client_options)
        .with_url(object_store_url)
        .build()
        .unwrap();
    // this is a 14GB file
    let file_path = Path::from("hits_compatible/hits.parquet");
    let response = object_store.get(&file_path).await.unwrap();
    // read the response body relatively slowly
    let mut stream = response.into_stream();
    while let Some(chunk) = stream.next().await {
        let _chunk = chunk.unwrap();
        // throttle the read speed
        tokio::time::sleep(Duration::from_millis(100)).await;
    }
    println!("{:?} Done", start.elapsed());
}
```
Results in the following after about 15 seconds:
```text
thread 'main' panicked at src/main.rs:29:27:
called `Result::unwrap()` on an `Err` value: Generic { store: "HTTP", source: HttpError { kind: Timeout, source: reqwest::Error { kind: Body, source: reqwest::Error { kind: Decode, source: reqwest::Error { kind: Body, source: TimedOut } } } } }
stack backtrace:
...
```
The error doesn't surface after only 1 second because each timed-out request is retried, up to 10 times, before the failure is reported.
**Expected behavior**
As long as the client is consuming data and the server doesn't close the connection, I expect the program to successfully read the entire file and complete.
**Additional context**
The fix for #15 from @tustvold makes this much better (the first 10 timeouts are now retried), but eventually the timeout still surfaces.
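In the meantime the existing knobs can be tuned to push the failure out. A minimal sketch, assuming object_store's ClientOptions::with_timeout_disabled and RetryConfig behave as documented (this is a workaround, not a fix, since a truly dead connection would then hang indefinitely):

```rust
use std::time::Duration;

use object_store::http::HttpBuilder;
use object_store::{ClientOptions, RetryConfig};

fn build_store() -> object_store::Result<impl object_store::ObjectStore> {
    // Workaround A: disable the per-request timeout entirely so a slow
    // but progressing download can never trip it.
    let client_options = ClientOptions::default().with_timeout_disabled();

    // Workaround B: raise the retry budget and total retry window so
    // more consecutive timeouts are tolerated before the error surfaces.
    let retry = RetryConfig {
        max_retries: 100,
        retry_timeout: Duration::from_secs(600),
        ..Default::default()
    };

    HttpBuilder::new()
        .with_url("https://datasets.clickhouse.com")
        .with_client_options(client_options)
        .with_retry(retry)
        .build()
}
```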
**Potential ideas**
- Reset the retry counter whenever any data has been successfully read from the result stream (see the sketch below)
- Apply separate timeout / retry policies specifically to timeout errors
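
As a rough illustration of the first idea (this is not object_store's actual retry plumbing, and the reset_on_progress helper is hypothetical), a stream adapter could clear a shared counter whenever a chunk arrives, so that only consecutive failures consume the retry budget:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

use bytes::Bytes;
use futures::{Stream, StreamExt};

/// Hypothetical adapter: wrap a chunk stream so that every successfully
/// yielded chunk resets a shared retry counter, meaning only
/// *consecutive* failures count against the retry budget.
fn reset_on_progress<S, E>(
    stream: S,
    retries: Arc<AtomicUsize>,
) -> impl Stream<Item = Result<Bytes, E>>
where
    S: Stream<Item = Result<Bytes, E>>,
{
    stream.inspect(move |chunk| {
        if chunk.is_ok() {
            // Forward progress was made: start the retry budget over.
            retries.store(0, Ordering::Relaxed);
        }
    })
}
```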