"error decoding response body: request or response body error: operation timed out" even when response stream is making progress #386

@alamb

Describe the bug
When downloading a large enough file over a slow enough network, the result stream returned by ObjectStore::get will still report "operation timed out" even though it is making active progress.

To Reproduce

use std::time::{Duration, Instant};
use futures::StreamExt;
use object_store::ObjectStore;
use object_store::path::Path;


#[tokio::main]
async fn main() {
    let start = Instant::now();
    let object_store_url = "https://datasets.clickhouse.com";
    let client_options = object_store::ClientOptions::default()
        // set the overall request timeout to 1 second
        .with_timeout(Duration::from_secs(1));

    let object_store = object_store::http::HttpBuilder::new()
        .with_client_options(client_options)
        .with_url(object_store_url)
        .build()
        .unwrap();

    // this is a 14GB file
    let file_path = Path::from("hits_compatible/hits.parquet");
    let response = object_store.get(&file_path).await.unwrap();

    // read the response body relatively slowly
    let mut stream = response.into_stream();
    while let Some(chunk) = stream.next().await {
        // propagate any error from the stream; the bytes themselves are discarded
        let _chunk = chunk.unwrap();
        // throttle the read speed to simulate a slow consumer
        tokio::time::sleep(Duration::from_millis(100)).await;
    }

    println!("{:?} Done", start.elapsed());
}

Running this results in the following panic after about 15 seconds:

thread 'main' panicked at src/main.rs:29:27:
called `Result::unwrap()` on an `Err` value: Generic { store: "HTTP", source: HttpError { kind: Timeout, source: reqwest::Error { kind: Body, source: reqwest::Error { kind: Decode, source: reqwest::Error { kind: Body, source: TimedOut } } } } }
stack backtrace:
...

It doesn't fail after just 1 second because each timed-out request is retried; the error only surfaces once the retry limit of 10 is exhausted.

Expected behavior
As long as the client is consuming data and the server doesn't close the connection, I expect the program to complete successfully and read the entire file.

Additional context
The fix for #15 from @tustvold makes this much better (the first 10 timeouts are now retried), but the timeout error still happens eventually.
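
For reference, the 10 retries come from the store's retry configuration. A possible stopgap (not a fix) is to raise that budget when building the store. This is a minimal sketch, assuming object_store's RetryConfig with its max_retries (default 10) and retry_timeout fields and HttpBuilder::with_retry; the values and the helper name build_store_with_larger_retry_budget are illustrative only:

use std::time::Duration;

use object_store::http::HttpBuilder;
use object_store::{ClientOptions, ObjectStore, RetryConfig};

fn build_store_with_larger_retry_budget() -> object_store::Result<impl ObjectStore> {
    // Illustrative values only: max_retries defaults to 10, which matches the
    // "retried 10 times" behaviour described above.
    let retry = RetryConfig {
        max_retries: 100,
        retry_timeout: Duration::from_secs(600),
        ..Default::default()
    };

    HttpBuilder::new()
        .with_url("https://datasets.clickhouse.com")
        .with_client_options(ClientOptions::default().with_timeout(Duration::from_secs(1)))
        .with_retry(retry)
        .build()
}

This only delays the failure for a sufficiently large file; the underlying issue is that the timeout/retry budget never resets while the stream is making progress.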

**Potential ideas**

  1. Reset the retry counter once any data has been successfully read from the result stream (see the sketch after this list)
  2. Have separate timeout / retry policies that are applied to timeout errors specifically
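
A rough client-side illustration of idea 1 (this is not the library's internal retry loop): fetch the object in fixed-size ranges and reset a local retry counter whenever a chunk arrives, so that only consecutive failures count against the limit. A minimal sketch, assuming the object_store 0.12-style API where ObjectMeta::size and get_range ranges are u64; read_with_progress_reset, chunk_size and max_consecutive_retries are names invented for this example:

use std::time::Duration;

use object_store::path::Path;
use object_store::ObjectStore;

async fn read_with_progress_reset(
    store: &dyn ObjectStore,
    location: &Path,
    chunk_size: u64,
    max_consecutive_retries: usize,
) -> object_store::Result<()> {
    let size = store.head(location).await?.size;
    let mut offset = 0u64;
    let mut consecutive_failures = 0usize;

    while offset < size {
        let end = (offset + chunk_size).min(size);
        match store.get_range(location, offset..end).await {
            Ok(bytes) => {
                // Progress was made: advance the offset and reset the counter
                offset += bytes.len() as u64;
                consecutive_failures = 0;
            }
            Err(e) if consecutive_failures < max_consecutive_retries => {
                // Only consecutive failures count against the retry budget
                consecutive_failures += 1;
                eprintln!("retrying range {offset}..{end} after error: {e}");
                tokio::time::sleep(Duration::from_millis(100)).await;
            }
            Err(e) => return Err(e),
        }
    }
    Ok(())
}

Doing the equivalent inside the crate's retry machinery would give streaming GET requests the same "reset on progress" behaviour without issuing a separate request per range.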
