-
Notifications
You must be signed in to change notification settings - Fork 104
Use hash func to boost file creation and lookup #79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
How can you determine which hash function is the most suitable? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 issue found across 8 files
Prompt for AI agents (all 1 issues)
Understand the root cause of the following 1 issues and fix them.
<file name="simplefs.h">
<violation number="1" location="simplefs.h:122">
Declaring `simplefs_file_vm_ops` as `static` in the header defines a zeroed vm_ops instance in every translation unit instead of referencing a single populated table. Please make this an extern declaration so consumers can use the real ops struct.</violation>
</file>
Since this is your first cubic review, here's how it works:
- cubic automatically reviews your code and comments on bugs and improvements
- Teach cubic by replying to its comments. cubic learns from your replies and gets better over time
- Ask questions if you need clarification on any suggestion
React with 👍 or 👎 to teach cubic. Mention @cubic-dev-ai to give feedback, ask questions, or re-run the review.
visitorckw
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw that your PR description includes some performance benchmarks, but the commit message lacks any performance numbers to support your improvements. Please improve the commit message.
b928f62 to
308bb4c
Compare
308bb4c to
1645d00
Compare
I’m not sure if "fnv" is the most suitable, but index in SimpleFS is relatively small, using a more complex algorithm might not provide significant benefits. I think fnv is a reasonable balance between simplicity and performance. |
ca74c03 to
51e0478
Compare
visitorckw
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You ignored many of my comments without making any changes or providing any replies. You still retained many irrelevant changes, making the review difficult. Additionally, a single-line commit message saying only "optimize the file search process" is way too vague. Please improve the git commit message.
Each `simplefs_extent` structure contains a counter that records the total number of files within that extent. When the counter matches the expected file number, it indicates there are no more files after this index, allowing the iterator to skip directly to the next extension block. This reduces unnecessary scanning and improves traversal efficiency.
Align the print function with the Simplefs print format for consistency. Also adjust variable declarations to fix compiler warnings when building under the C90 standard.
51e0478 to
0519d61
Compare
Move the symlink function to a new source file to better align with the Ext4 filesystem structure. This separation improves code organization and maintainability.
0519d61 to
864a9a1
Compare
Added all hash results into the commit. |
56c0522 to
0176a4b
Compare
Introduce a hash-based mechanism to speed up file creation and lookup operations. The hash function enables faster access to extent and logical block extent index, improving overall filesystem performance. hash_code = file_hash(file_name); extent index = hash_code / SIMPLEFS_MAX_BLOCKS_PER_EXTENT block index = hash_code % SIMPLEFS_MAX_BLOCKS_PER_EXTENT; Use perf to measure: 1. File Creation (random) Legacy: 259.842753513 seconds time elapsed 23.000247000 seconds user 150.380145000 seconds sys full_name_hash: 222.274028604 seconds time elapsed 20.794966000 seconds user 151.941876000 seconds sys 2. File Listing (random) Legacy: min time: 0.00171 s max time: 0.03799 s avg time: 0.00423332 s tot time: 129.539510 s full_name_hash: min time: 0.00171 s max time: 0.03588 s avg time: 0.00305601 s tot time: 93.514040 s 3. files Removal (Random) Legacy: 106.921706288 seconds time elapsed 16.987883000 seconds user 91.268661000 seconds sys full_name_hash: 86.132655220 seconds time elapsed 19.180209000 seconds user 68.476075000 seconds sys
0176a4b to
c2316df
Compare
Previously, SimpleFS used a sequential insertion method to create files, which worked efficiently when the filesystem contained only a small number of files.
However, in real-world use cases, filesystems often manage a large number of files, making sequential search and insertion inefficient.
Inspired by Ext4’s hash-based directory indexing, this change adopts a hash function to accelerate file indexing and improve scalability.
Change:
Implemented hash-based file index lookup
Improved scalability for large directory structures
hash_code = file_hash(file_name);
extent index = hash_code / SIMPLEFS_MAX_BLOCKS_PER_EXTENT
block index = hash_code % SIMPLEFS_MAX_BLOCKS_PER_EXTENT;
Performance test
legacy:
full_name_hash
legacy:
full_name_hash
Use perf stat ls -la to measure the query time for each file and sum up all elapsed times to calculate the total lookup cost.
Legacy :
min time: 0.00171 s
max time: 0.03799 s
avg time: 0.00423332 s
tot time: 129.539510 s
full_name_hash:
min time: 0.00171 s
max time: 0.03588 s
avg time: 0.00305601 s
tot time: 93.514040 s
Summary by cubic
Switched SimpleFS to configurable hash-based directory indexing (FNV‑1a in tests) to speed up file creation and lookup by mapping filenames to extent/block slots. On 30.6k files: create ~33% faster, delete ~12% faster, lookup ~41% faster.
New Features
Refactors
Written for commit c2316df. Summary will update automatically on new commits.