-
Notifications
You must be signed in to change notification settings - Fork 356
rfc: Making Storage a Trait #1885
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
|
||
| // Deletion operations | ||
| async fn delete(&self, path: &str) -> Result<()>; | ||
| async fn remove_dir_all(&self, path: &str) -> Result<()>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Migrating comments from @Fokko
This name feels very much file-system like, while Iceberg is designed to work against object stores. How about delete_prefix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1.
| #[async_trait] | ||
| pub trait Storage: Debug + Send + Sync { | ||
| // File existence and metadata | ||
| async fn exists(&self, path: &str) -> Result<bool>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Migrating comments from @c-thiel
I know that we use these results everywhere, but I think introducing more specific error types that we can match on for storage operations makes sense. They can implement Into of course.
For example, a RateLimited error that we got from the storage service should be treated differently from NotFound or CredentialsExpired.
With Lakekeeper we are currently using our own trait based IO due to many limitations in iceberg-rust, mainly due to unsupported signing mechanisms, missing refresh mechanisms, intransparent errors and missing extendability.
I would gladly switch to iceberg-rust if we get these solved.
Maybe this can serve as some inspiration: https://github.com/lakekeeper/lakekeeper/blob/b8fcf54c627d48a547ef0baf6863949b68579388/crates/io/src/error.rs#L291
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To address @c-thiel 's comments, we have several approaches:
- Introduce another set of errors for storage.
- Extend current
ErrorKindfor storage errors. - Extend current
ErrorKind, but with another enum, for example
pub enum IoErrorKind {
FileNotFound,
CredentialExpired,
}
pub enum ErrorKind {
// Existing variants
...
Io(IoErrorKind)
}| // File object creation | ||
| fn new_input(&self, path: &str) -> Result<InputFile>; | ||
| fn new_output(&self, path: &str) -> Result<OutputFile>; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Migrating comments from @c-thiel
Many object stores have a good way of running batch deletions, for example the DeleteObjects API in AWS S3. How would you feel about including a delete_batch method too?
liurenjie1024
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @CTTY for this pr, generally LGTM! One missing point is, I want the StorageBuilderRegistry to have some built in StorageBuilder registered when user creating a new catalog instance. I currenlty don't have a good solution, one approach would be to have a standalone crate, which loads built in StorageBuilders when StorageBuilderRegistry is initiated. And then we could have catalog crates to depend on it.
| #[async_trait] | ||
| pub trait Storage: Debug + Send + Sync { | ||
| // File existence and metadata | ||
| async fn exists(&self, path: &str) -> Result<bool>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To address @c-thiel 's comments, we have several approaches:
- Introduce another set of errors for storage.
- Extend current
ErrorKindfor storage errors. - Extend current
ErrorKind, but with another enum, for example
pub enum IoErrorKind {
FileNotFound,
CredentialExpired,
}
pub enum ErrorKind {
// Existing variants
...
Io(IoErrorKind)
}| #[async_trait] | ||
| pub trait Storage: Debug + Send + Sync { | ||
| // File existence and metadata | ||
| async fn exists(&self, path: &str) -> Result<bool>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| async fn exists(&self, path: &str) -> Result<bool>; | |
| async fn exists(&self, path: AsRef<str>) -> Result<bool>; |
A little more rusty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this breaks object dyn compatibility
|
|
||
| // Deletion operations | ||
| async fn delete(&self, path: &str) -> Result<()>; | ||
| async fn remove_dir_all(&self, path: &str) -> Result<()>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1.
| The `StorageBuilder` trait defines how storage backends are constructed: | ||
|
|
||
| ```rust | ||
| pub trait StorageBuilder: Debug + Send + Sync { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| pub trait StorageBuilder: Debug + Send + Sync { | |
| pub trait StorageFactory: Debug + Send + Sync { |
nit: Factory sounds a litte better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should also be Serializable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, I haven't figured out how to have both Serializable and dyn Trait yet, and I'm planning to explore more with typetag. Serializability is a huge pain and I'll add more details once I have more clarity
| pub fn new() -> Self { /* ... */ } | ||
| pub fn register(&mut self, scheme: impl Into<String>, builder: Arc<dyn StorageBuilder>); | ||
| pub fn get_builder(&self, scheme: &str) -> Result<Arc<dyn StorageBuilder>>; | ||
| pub fn supported_types(&self) -> Vec<String>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| pub fn supported_types(&self) -> Vec<String>; | |
| pub fn supported_types(&self) -> impl Iterator<Item=&str>>; |
| use iceberg::io::FileIOBuilder; | ||
|
|
||
| // Basic usage (same as the existing code) | ||
| let file_io = FileIOBuilder::new("s3") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We no longer need FileIOBuilder?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that FileIOBuilder seems excessive. I'm keeping FileIOBuilder here for now mainly because I'm uncertain where to keep Extensions right now mainly due to serde-related concerns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After StorageBuilder becomes serializable, we should be able to remove it.
| ↓ | ||
| FileIOBuilder::build() | ||
| ↓ | ||
| StorageBuilderRegistry::new() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I think we should remove
FileIOBuildersince then? - The
StorageBuilderRegisteryshould be an instance in catalog instance?
Co-authored-by: Renjie Liu <[email protected]>
This reverts commit e7d67d8.
Which issue does this PR close?
What changes are included in this PR?
Are these changes tested?