multi: update ChanUpdatesInHorizon and NodeUpdatesInHorizon to return iterators (iter.Seq[T]) #10128
Conversation
In this commit, we introduce a new utility function `Collect` to the fn package. This function drains all elements from an iterator and returns them as a slice. This is particularly useful when transitioning from iterator-based APIs to code that expects slices, allowing for gradual migration to the new iterator patterns. The fn module's go.mod is also updated to require Go 1.23, which is necessary for the built-in iter.Seq type support.
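A minimal sketch of what such a helper can look like, assuming a generic signature over `iter.Seq[T]` (the exact name and shape in the fn package may differ):

```go
package fn

import "iter"

// Collect drains all elements from the given iterator and returns them
// as a slice. Handy when bridging iterator-based APIs to callers that
// still expect slices.
func Collect[T any](seq iter.Seq[T]) []T {
	var vals []T
	for v := range seq {
		vals = append(vals, v)
	}
	return vals
}
```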
In this commit, we add a replace directive to use the local fn package that now includes the new Collect function for iterators. This ensures that the main module can access the iterator utilities we've added. The replace directive will be removed once the fn package changes are merged and a new version is tagged.
In this commit, we introduce a new options pattern for configuring iterator behavior in the graph database. This includes configuration for batch sizes when iterating over channel and node updates, as well as an option to filter for public nodes only. The new functional options pattern allows callers to customize iterator behavior without breaking existing APIs. Default batch sizes are set to 1000 entries for both channel and node updates, which provides a good balance between memory usage and performance.
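A rough sketch of this kind of functional-options setup (all names here are illustrative, not necessarily the identifiers used in the PR):

```go
package graphdb

// iterCfg collects the iterator tuning knobs described above; the 1000
// entry defaults mirror the batch sizes mentioned in this commit.
type iterCfg struct {
	chanBatchSize int
	nodeBatchSize int
	publicOnly    bool
}

// IterOption is a functional option that tweaks iterator behavior.
type IterOption func(*iterCfg)

func defaultIterCfg() iterCfg {
	return iterCfg{
		chanBatchSize: 1000,
		nodeBatchSize: 1000,
	}
}

// WithNodeBatchSize overrides the per-batch read size for node updates.
func WithNodeBatchSize(n int) IterOption {
	return func(c *iterCfg) { c.nodeBatchSize = n }
}

// WithPublicNodesOnly restricts iteration to public nodes.
func WithPublicNodesOnly() IterOption {
	return func(c *iterCfg) { c.publicOnly = true }
}
```

Callers pass zero or more options, and the store applies them on top of the defaults.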
In this commit, we refactor the NodeUpdatesInHorizon method to return an iterator instead of a slice. This change significantly reduces memory usage when dealing with large result sets by allowing callers to process items incrementally rather than loading everything into memory at once. The new implementation uses Go 1.23's iter.Seq type to provide a standard iterator interface. The method now supports configurable batch sizes through functional options, allowing fine-tuned control over memory usage and performance characteristics. Rather than reading all the entries from disk into memory (before this commit, the cache already served most entries, skipping disk hits for those), we now expose a chunked iterator. We also make filtering for public nodes a first-class option, which saves many newly created db transactions later.
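Conceptually, the chunked iterator can be shaped like the following generic sketch (a hypothetical helper under assumed names such as `fetch`, not the PR's actual code):

```go
package graphdb

import "iter"

// batchedSeq turns a paginated fetch function into a single iter.Seq.
// fetch returns the next batch after the given cursor along with the
// cursor to resume from; an empty batch signals that we're done. Errors
// simply end the sequence in this simplified sketch.
func batchedSeq[T any](batchSize int,
	fetch func(cursor []byte, limit int) ([]T, []byte, error)) iter.Seq[T] {

	return func(yield func(T) bool) {
		var cursor []byte
		for {
			batch, next, err := fetch(cursor, batchSize)
			if err != nil || len(batch) == 0 {
				return
			}
			for _, item := range batch {
				// Stop early if the caller breaks out
				// of its range-over-func loop.
				if !yield(item) {
					return
				}
			}
			cursor = next
		}
	}
}
```

Each batch corresponds to one bounded read rather than materializing the full result set up front.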
In this commit, we refactor the ChanUpdatesInHorizon method to return an iterator instead of a slice. This change significantly reduces memory usage when dealing with large result sets by allowing callers to process items incrementally rather than loading everything into memory at once.
In this commit, we update the SQL store implementation to support the new iterator-based API for NodeUpdatesInHorizon. This includes adding a new SQL query that supports efficient pagination through result sets. The SQL implementation uses cursor-based pagination with configurable batch sizes, allowing efficient iteration over large result sets without loading everything into memory. The query is optimized to use indexes effectively and minimize database round trips. The GetNodesByLastUpdateRange SQL query is extended to support:
* Cursor-based pagination using a (last_update, pub_key) compound cursor (see the sketch below)
* Optional filtering for public nodes only
* Configurable batch sizes via the MaxResults parameter
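The compound-cursor (keyset) pagination described here has roughly the following shape (illustrative only: the table/column layout and parameter style are assumptions, not the actual sqlc query):

```go
// nodesByLastUpdateRange sketches keyset pagination over the
// (last_update, pub_key) compound cursor: each batch resumes strictly
// after the last row of the previous batch, keeping the scan
// index-friendly and avoiding OFFSET-style re-scans.
const nodesByLastUpdateRange = `
    SELECT *
    FROM nodes
    WHERE last_update >= $1
      AND last_update <= $2
      AND (last_update > $3 OR (last_update = $3 AND pub_key > $4))
    ORDER BY last_update, pub_key
    LIMIT $5`
```

The public-only filter and the exact inclusivity of the time bounds are handled by the real query; see the review discussion below about `endTime` semantics.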
In this commit, we update the SQL store implementation to support the new iterator-based API for ChanUpdatesInHorizon. This includes adding SQL query pagination support and helper functions for efficient batch processing. The SQL implementation uses cursor-based pagination with configurable batch sizes, allowing efficient iteration over large result sets without loading everything into memory. The query is optimized to use indexes effectively and minimize database round trips. The GetChannelsByPolicyLastUpdateRange SQL query is extended to support:
- Cursor-based pagination using a (max_update_time, id) compound cursor
- Configurable batch sizes via the MaxResults parameter
- Efficient batch caching with the updateChanCacheBatch helper (sketched below)
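For the caching side, the per-batch update pattern might look roughly like this (types and names below are placeholders for illustration, not the PR's actual `updateChanCacheBatch`):

```go
package graphdb

import "sync"

// cachedEdge is a stand-in for the cached channel data; the real code
// caches the full ChannelEdge.
type cachedEdge struct {
	ChannelID uint64
}

// edgeCache illustrates per-batch cache updates: the mutex is held only
// while a single batch is inserted, not for the entire horizon scan,
// which keeps lock hold times short and reduces contention.
type edgeCache struct {
	mu    sync.Mutex
	edges map[uint64]cachedEdge
}

func newEdgeCache() *edgeCache {
	return &edgeCache{edges: make(map[uint64]cachedEdge)}
}

func (c *edgeCache) updateBatch(batch []cachedEdge) {
	c.mu.Lock()
	defer c.mu.Unlock()

	for _, e := range batch {
		c.edges[e.ChannelID] = e
	}
}
```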
In this commit, we update all callers of NodeUpdatesInHorizon and ChanUpdatesInHorizon to use the new iterator-based APIs. The changes use fn.Collect to maintain existing behavior while benefiting from the memory efficiency of iterators when possible.
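For example, a call site that previously received a slice can keep its existing shape along these lines (a hypothetical fragment, not a specific hunk from the diff):

```go
// Stream the updates and materialize them into a slice to preserve the
// pre-iterator behavior at this call site.
updatesSeq, err := graph.NodeUpdatesInHorizon(startTime, endTime)
if err != nil {
	return err
}
nodeAnns := fn.Collect(updatesSeq)
```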
Summary of Changes
Hello @Roasbeef, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces a significant architectural shift in how large sets of channel and node updates are retrieved from the graph database. By migrating to Go's new `iter.Seq[T]` interface, the system now supports streaming and batched data retrieval, leading to reduced memory consumption and improved performance. Furthermore, optimizations have been applied to reduce mutex contention during cache updates and to integrate public node filtering directly into the data fetching process, enhancing overall efficiency and concurrency.
Highlights
- API Refactor: Iterator-based Data Retrieval: The `ChanUpdatesInHorizon` and `NodeUpdatesInHorizon` functions have been updated to return `iter.Seq[T]` instead of slices. This enables streaming and batched retrieval of graph data, allowing for more efficient memory usage and early termination of data processing (see the sketch after this list).
- Performance Optimization: Batched Database Reads: Data is now fetched from both KV and SQL databases in configurable batches. This reduces the memory footprint for large queries and improves overall performance by avoiding loading the entire dataset into memory at once.
- Concurrency Improvement: Reduced Mutex Contention: The cache update mechanism has been refined. Instead of holding a global cache mutex for the entire duration of a data retrieval operation, updates are now applied incrementally per batch. This significantly reduces the time the mutex is held, thereby decreasing contention and improving concurrency.
- Query Optimization: Integrated Public Node Filtering: Public node filtering, previously a post-retrieval step, has been integrated directly into the underlying database queries (for SQL) or the batch processing logic (for KV store). This pushes filtering closer to the data source, improving efficiency.
- Go Version Update: The project's Go module version has been updated from 1.19 to 1.23 to support the new `iter` package, which is central to the API changes in this PR.
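Because the result is a standard `iter.Seq`, a consumer that only needs part of the horizon can also stop early and skip the remaining batches entirely. A hypothetical consumer of an `updatesSeq` obtained from one of these calls (names like `send` and `maxUpdates` are placeholders):

```go
// Stop once enough updates have been gathered; batches beyond this
// point are never fetched from the database.
count := 0
for update := range updatesSeq {
	send(update)
	count++
	if count >= maxUpdates {
		break
	}
}
```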
Code Review
This is a great refactoring to use iterators for fetching channel and node updates, which should significantly improve performance and reduce database transaction times for large graphs. The optimizations to the SQL queries are also a welcome improvement.
I've found a few minor issues, mostly related to style guide adherence (line length and typos in comments), and one more significant issue regarding an inconsistency in how time ranges are handled between the updated functions. Please take a look at the comments.
ChanUpdatesInHorizon(startTime, endTime time.Time,
	opts ...Option) (iter.Seq[ChannelEdge], error)
There seems to be an inconsistency in how the `endTime` parameter is handled between `ChanUpdatesInHorizon` and `NodeUpdatesInHorizon`:

- `NodeUpdatesInHorizon` is now inclusive of `endTime` in both the `kv_store` and `sql_store` implementations. This is a change from the previous behavior in the SQL store.
- `ChanUpdatesInHorizon` is now exclusive of `endTime` in both implementations. This is a change from the previous behavior in the `kv_store`.

This inconsistency in the API can be surprising and lead to bugs. For API consistency, it would be better if both functions handled the time range inclusively or exclusively in the same way. I'd recommend making both inclusive to match the new behavior of `NodeUpdatesInHorizon`.
// We'll start by creating 100 nodes, each with an update time // spaced
// one hour apart.
)
require.NoError(t, err)

// Collect only up to stopAt nodes, breakign afterwards.

Typo in the comment above: "breakign" should be "breaking".
// Create a fresh graph for each test, then add two new
// nodes to teh graph.

Typo in the comment above: "teh" should be "the".
// Next, we'll create 25 channels between the two nodes,
// each with increasign timestamps.

Typo in the comment above: "increasign" should be "increasing".
// Now we'll run teh main query, and verify that we get
// back the expected number of channels.

Typo in the comment above: "teh" should be "the".
// With all the relevent buckets read, we'll now create a fresh
// read cusor.

Typos in the comment above: "relevent" should be "relevant" and "cusor" should be "cursor".
// We'll now obtain a cursor to perform a range query within the
// index to find all node announcements within the horizon.
// The nodeUpdateIndex key format is: [8 bytes timestamp][33 bytes node pubkey]
// This allows efficient range queries by time while maintaining a stable
// sort order for nodes with the same timestamp.
These comments exceed the 80-character line limit specified in the project's style guide. Please consider re-wrapping them for better readability and to adhere to the project's coding standards.
Suggested change:

// We'll now obtain a cursor to perform a range query within the
// index to find all node announcements within the horizon. The
// nodeUpdateIndex key format is: [8b timestamp][33b pubkey].
// This allows efficient range queries by time while maintaining a
// stable sort order for nodes with the same timestamp.
Cool idea. Will we be able to thread this through to […]? There would still be an outgoing bandwidth issue, but this would help with excessive memory use.
@Roasbeef, remember to re-request review from reviewers when ready
Yep, I realize now that I stopped just short of threading it up to that level. The missing change here would be updating […].
Isn't this effectively addressed via the […]?
Haven't had an excuse to make an iterator yet, so I nerd-sniped myself into the creation of this PR.
This PR does a few things:
The cache changes now mean that an invocation doesn't have a consistent view of the cache, but for cases like this (serving gossip data to peers, which can be lossy), we don't really need a consistent snapshot. This change should reduce overall mutex contention as well.