Allocate direct buffers for multipart upload #559
I think this feature is good.
        
          
8ffac2f to 05ef2ed
@jeqo fixed tests, PTAL
05ef2ed to 9fcdda2
Thanks @dopuskh3! I agree this is a valuable flag for deployments where the JVM heap may not be large enough for the part uploads.
I added a few minor naming suggestions and a note on test coverage, but overall this looks good.
+ "All parts but the last one will have this size. "
+ "Valid values: between 5MiB and 2GiB";

public static final String S3_MULTIPART_UPLOAD_DIRECT_BUFFERS_CONFIG = "s3.multipart.upload.direct.buffers";
nit:
- public static final String S3_MULTIPART_UPLOAD_DIRECT_BUFFERS_CONFIG = "s3.multipart.upload.direct.buffers";
+ public static final String S3_MULTIPART_UPLOAD_BUFFER_ALLOCATION_DIRECT_CONFIG = "s3.multipart.upload.buffer.allocation.direct";
return getBoolean(S3_PATH_STYLE_ENABLED_CONFIG);
}

public Boolean multipartDirectBuffers() {
nit:
- public Boolean multipartDirectBuffers() {
+ public Boolean uploadBufferAllocationDirect() {
Could these tests be parametrized to have coverage for the direct allocation?
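One way to cover both allocation modes without duplicating test bodies is to run the same checks once per flag value. The sketch below uses plain Java rather than the project's test framework (with JUnit 5 the loop would typically become a parameterized test). `BufferModeCheck` and `roundTrip` are hypothetical names that stand in for the real upload path, which is not shown in this conversation.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical stand-in for the code under test: copies data through a part buffer.
final class BufferModeCheck {
    private BufferModeCheck() { }

    // Writes data into a heap or direct buffer and reads it back out.
    static byte[] roundTrip(final byte[] data, final boolean direct) {
        final ByteBuffer buffer = direct
            ? ByteBuffer.allocateDirect(data.length)
            : ByteBuffer.allocate(data.length);
        buffer.put(data);
        buffer.flip(); // switch the buffer from writing to reading
        final byte[] out = new byte[buffer.remaining()];
        buffer.get(out);
        return out;
    }

    public static void main(final String[] args) {
        final byte[] data = {1, 2, 3, 4, 5};
        // Same assertion for heap (false) and direct (true) allocation.
        for (final boolean direct : new boolean[] {false, true}) {
            if (!Arrays.equals(data, roundTrip(data, direct))) {
                throw new AssertionError("mismatch with direct=" + direct);
            }
        }
        System.out.println("both allocation modes passed");
    }
}
```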
private String bucketName;
private int partSize;

private boolean multipartDirectBuffers;
nit:
- private boolean multipartDirectBuffers;
+ private boolean bufferAllocationDirect;
Allow multipart upload buffers to be allocated as direct buffers rather than on the heap.
Reasoning
We set a fairly large multipart upload part size on our clusters to optimize throughput and reduce the number of S3 requests.
At the same time, we keep the Kafka JVM heap small on most Kafka installations in order to leave as much memory as possible for the page cache. For example, we use a 4GB heap on machines with 64GB of available memory.
With a large part size and a small JVM heap, we can easily run out of heap space if we suddenly have to upload many segments to tiered storage.
The strategy we propose is to allocate multipart upload buffers in direct memory, so that we can size the direct buffer budget more easily and separately from the heap.
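The allocation choice described above could look roughly like the following. This is a minimal sketch and not the PR's actual code: the class `UploadBuffers` and its `allocate` method are hypothetical names, and only `ByteBuffer.allocate` and `ByteBuffer.allocateDirect` are standard JDK calls.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not taken from the PR: decides where a part buffer lives.
final class UploadBuffers {
    private UploadBuffers() { }

    // Returns a buffer of partSize bytes, allocated off-heap when direct is true.
    static ByteBuffer allocate(final int partSize, final boolean direct) {
        return direct
            ? ByteBuffer.allocateDirect(partSize) // native memory, outside the Java heap
            : ByteBuffer.allocate(partSize);      // backed by a byte[] on the Java heap
    }

    public static void main(final String[] args) {
        final int partSize = 5 * 1024 * 1024; // 5MiB, the smallest part size the config allows
        System.out.println("heap buffer direct?   " + allocate(partSize, false).isDirect());
        System.out.println("direct buffer direct? " + allocate(partSize, true).isDirect());
    }
}
```

On HotSpot JVMs, memory for direct buffers is capped by `-XX:MaxDirectMemorySize` rather than by `-Xmx`, which is what makes a separate budget possible.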
Usage
We introduced the s3.multipart.upload.direct.buffers boolean configuration flag (disabled by default).
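For illustration, enabling the flag and reading it back could look like the sketch below. It uses `java.util.Properties` only to stay self-contained, and `DirectBuffersFlag` is a hypothetical class. How the key is actually passed to the broker, including any prefix the plugin's configuration requires, is not covered in this description and is left out. The key is the one named in the description. The review suggests renaming it, so the final name may differ.

```java
import java.util.Properties;

// Hypothetical reader for the flag; the real project defines it in S3StorageConfig.
final class DirectBuffersFlag {
    static final String KEY = "s3.multipart.upload.direct.buffers";

    private DirectBuffersFlag() { }

    // Reads the flag, falling back to false because it is disabled by default.
    static boolean enabled(final Properties props) {
        return Boolean.parseBoolean(props.getProperty(KEY, "false"));
    }

    public static void main(final String[] args) {
        final Properties props = new Properties();
        System.out.println("default: " + enabled(props));
        props.setProperty(KEY, "true");
        System.out.println("after enabling: " + enabled(props));
    }
}
```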