Category: [B2]; Team name: NPL; Dataset: Chordonomicon #238
1. Find the segments of chords based on the <verse> label. 2. Infer the key of each segment. 3. Transpose the chords into Roman numeral notation.
The file now includes a systematic analysis, but it has not yet been applied to every song in the dataframe. It also includes a dictionary for looking up the mapping between chords and scales (see the last part).
small possible workaround to allow edge level prediction, at least with NoReadOut
Checklist
Description
This pull request introduces a benchmark task based on musical chord synergy and redundancy.
The goal is to assess how well topological models capture the structure of chords, which are naturally occurring higher-order objects.
The task consists of predicting a theoretically grounded information-theoretic quantity, the local O-information [1], computed from real musical data, thereby enabling evaluation at the hyperedge level.
Concretely, we propose a hyperedge regression task with the following characteristics:
Dataset
We integrated a chord dataset derived from the CHORDONOMICON dataset [3], which contains over 600,000 songs annotated with their chord progressions. We preprocess this dataset by standardizing chord notation, mapping every chord to a common alphabet of $12$ notes in the base version.
We then aggregate occurrences to construct a hypergraph in which each of the 226 hyperedges corresponds to a unique chord. The resulting processed datasets are publicly available on Hugging Face: link.
We additionally provide the pre-aggregation data, which can support alternative tasks such as predicting musical genre from hyperedges.
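The aggregation step can be pictured with a minimal sketch. It assumes each chord has already been standardized to a set of pitch classes (integers 0 to 11); the toy `songs` list and the `build_hypergraph` helper are hypothetical and are not taken from the pull request's preprocessing code.

```python
from collections import Counter

# Hypothetical toy input: each song is a list of chords,
# each chord is a set of pitch classes (0 = C, 1 = C#, ..., 11 = B).
songs = [
    [{0, 4, 7}, {5, 9, 0}, {7, 11, 2}, {0, 4, 7}],
    [{9, 0, 4}, {5, 9, 0}, {0, 4, 7}],
]


def build_hypergraph(songs):
    """Count occurrences of each unique chord across all songs.

    Each distinct chord (a frozenset of notes) becomes one hyperedge;
    the count is its empirical frequency before normalization.
    """
    return Counter(frozenset(chord) for song in songs for chord in song)


hyperedge_counts = build_hypergraph(songs)
num_hyperedges = len(hyperedge_counts)
```

Using `frozenset` makes chords hashable and order-independent, so the same notes listed in a different order map to the same hyperedge.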
The dataset is available in 2 versions, depending on how musical scales are treated:
- single_scale: notes are merged across octaves (e.g., C♯2 and C♯3 are treated as the same pitch class), yielding 12 distinct notes.
- all_scales: octave information is preserved, and notes at different octaves are treated as distinct, yielding 38 total notes. In this case the number of hyperedges is 4313.

The choice of which dataset to load is made in the configuration file (chordonomicon.yaml) or directly with an argument in the dataset class (ChordonomiconDataset).
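The difference between the two versions can be illustrated with a small sketch. The `NOTE_TO_PC` table, the `parse_note` and `encode` helpers, and the note-string format (ASCII `#` for sharps followed by an octave number) are assumptions made for illustration; they do not reproduce the actual preprocessing or the `ChordonomiconDataset` interface.

```python
# Hypothetical mapping from note names to pitch classes (sharps only).
NOTE_TO_PC = {
    "C": 0, "C#": 1, "D": 2, "D#": 3, "E": 4, "F": 5,
    "F#": 6, "G": 7, "G#": 8, "A": 9, "A#": 10, "B": 11,
}


def parse_note(note):
    """Split a note string such as 'C#2' into its name and octave."""
    name = note.rstrip("0123456789")
    octave = int(note[len(name):])
    return name, octave


def encode(note, version="single_scale"):
    """Encode a note under one of the two dataset versions.

    single_scale: octave is discarded, only the pitch class is kept.
    all_scales:   octave is preserved, so the same pitch class in
                  different octaves yields distinct nodes.
    """
    name, octave = parse_note(note)
    pitch_class = NOTE_TO_PC[name]
    if version == "single_scale":
        return pitch_class
    if version == "all_scales":
        return (octave, pitch_class)
    raise ValueError(f"unknown version: {version}")
```

Under this sketch, `encode("C#2")` and `encode("C#3")` coincide in the `single_scale` version and differ in the `all_scales` version.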
Issue
This benchmark task introduces local O-information into TDL evaluation. Local O-information is a mathematically rigorous measure of synergy and redundancy in multivariate systems.
Using it as a regression target, the goal is to set up a task for which:
Expressivity of TDL models
Local O-information [1], derived from the O-information [2], is an interesting quantity because it assesses, for each hyperedge, whether the information it contains is redundant (recoverable from lower-order interactions) or synergistic (emerging only from higher-order interactions).
Concretely, the local O-information for an $n$-tuple $x^n$ is given as follows (see eq. 4 in [1] for more details):
$$\omega(x^n) = (n-2)\,h(x^n) + \sum_{j=1}^{n} \left[ h(x_j) - h(x^n_{-j}) \right]$$
where $h$ is the information-content function, $x^n$ corresponds to a hyperedge, $x_j$ is the marginal for variable $j$ (marginalising over the other $(n-1)$ variables) and $x^n_{-j}$ is the marginal over all variables except $j$ (marginalising over $j$).
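As a rough illustration, the quantity can be estimated from empirical frequencies. The sketch below assumes the standard form $(n-2)\,h(x^n) + \sum_j [h(x_j) - h(x^n_{-j})]$ with $h(x) = -\log_2 p(x)$ and plug-in probabilities from counts; the function name `local_o_information` and the toy samples are hypothetical, and this is not necessarily how the labels in this pull request were computed.

```python
from collections import Counter
from math import log2


def local_o_information(samples):
    """Estimate the local O-information of every observed pattern.

    samples: iterable of equal-length tuples of discrete values.
    Returns a dict mapping each observed pattern to its estimate,
    using plug-in (empirical frequency) probabilities.
    """
    samples = [tuple(s) for s in samples]
    n = len(samples[0])
    total = len(samples)

    joint = Counter(samples)
    # Marginal counts of each single variable j.
    single = [Counter(s[j] for s in samples) for j in range(n)]
    # Marginal counts of all variables except j.
    rest = [Counter(s[:j] + s[j + 1:] for s in samples) for j in range(n)]

    def h(count):
        # Information content (in bits) of an outcome with this count.
        return -log2(count / total)

    result = {}
    for x in joint:
        omega = (n - 2) * h(joint[x])
        for j in range(n):
            omega += h(single[j][x[j]]) - h(rest[j][x[:j] + x[j + 1:]])
        result[x] = omega
    return result


# Toy systems: XOR (purely synergistic) and COPY (purely redundant).
xor_samples = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
copy_samples = [(0, 0, 0), (1, 1, 1)]
xor_omega = local_o_information(xor_samples)
copy_omega = local_o_information(copy_samples)
```

In these toy systems every XOR pattern gets a negative value (synergy-dominated) and every COPY pattern a positive one (redundancy-dominated), which matches the usual sign convention for the O-information.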
Computing local O-information requires contrasting information across different orders, so we think it may be a good test of model expressivity, in the same spirit as the WL tests for GNNs.
Additional context
Limitations
Due to the structured nature of musical harmony, chords follow patterns and only cover a fraction of all possible combinations of the 12 pitch classes.
As a result, the number of hyperedges in the single-scale setting remains relatively modest (226).
When all scales are included, the number increases substantially (4,313), but the corresponding empirical frequencies become more variable and contain more outliers, which in turn introduces additional noise into the labels.
References
[1] "Quantifying high-order interdependencies on individual patterns via the local O-information: Theory and applications to music analysis", Scagliarini et al.
[2] "Quantifying high-order interdependencies via multivariate extensions of the mutual information", Rosas et al.
[3] "CHORDONOMICON: A Dataset of 666,000 Songs and their Chord Progressions", Kantarelis et al.