-
Notifications
You must be signed in to change notification settings - Fork 15
feat: Add support for HIF format #53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Seems that the first phase is ready. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, see the comments.
@@ -73,6 +73,10 @@ h1[5,2] = 6.5 | |||
@test get_vertex_meta(h1, 1) == get_vertex_meta(loaded_hg, 1) | |||
@test get_hyperedge_meta(h1, 2) == get_hyperedge_meta(loaded_hg, 2) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make unit tests also on data from the https://github.com/pszufe/HIF-standard/tree/main/tests/test_files/HIF-compliant from the HIF-standard libraries.
Make a copy of those files to test/data/HIF-standard folder in the package repository.
Provide in the above folder README.md explaining the files come from the above library.
Each of the files should be able to load and later save.
While saving provide code comparing both JSON outputs (parse both JSONs as Dicts and compare the Dicts)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand we still do not have yet code that iterates over HIF test files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How to handle the fact, that in order to load such a file I need to explicitly pass the types of values therein?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I. e. is manually loading each file separately the only way forward? I don't see how I can dynamically do it in a loop iterating over the directory contents
src/io.jl
Outdated
_ = format | ||
|
||
json_hg = Dict{Symbol, Any}() | ||
incidences = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls no untyped vectors /containers
https://docs.julialang.org/en/v1/manual/performance-tips/#man-performance-abstract-container
most likely this are something like Union{String, Int, Nothing}
(I am not sure - please check what types are possible)
src/io.jl
Outdated
he_meta = h.he_meta | ||
|
||
if any(isnothing, h.v_meta) | ||
v_meta = [i for i in 1:length(h.v_meta)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the intention of this code?
SimpleHypegraphs supports both Nothing
as type of metadata (in that case there is no metadata) as well as concrete types
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but as far as I understand, we cannot have Nothing
in edge or node metadata in the HIF format. This is probably not the best take, but the idea was that if there are missing metadata points, then we need to fallback to the solution, where edge or node ids are indexes in the matrix
src/io.jl
Outdated
edge_dict = Dict(i => val for (i, val) in pairs(he_meta)) | ||
|
||
|
||
types = collect(typeof(h).parameters) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
most likely yo wanted to have those in the method definition:
hg_save(io::IO, h::Hypergraph{T, V, E, D}, format::HIF_Format) where {T, V, E, D}
Check how parametric types work in Julia:
https://docs.julialang.org/en/v1/manual/types/
src/io.jl
Outdated
incidences = [] | ||
v_meta = h.v_meta | ||
he_meta = h.he_meta | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do not change type of a variable
https://docs.julialang.org/en/v1/manual/performance-tips/#Avoid-changing-the-type-of-a-variable
src/io.jl
Outdated
E = types[3] | ||
|
||
for node_idx in 1:length(v_meta) | ||
for edge_idx in 1:length(he_meta) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use eachindex
as this is the recommended syntax
https://discourse.julialang.org/t/best-practices/85138
src/io.jl
Outdated
for node_idx in 1:length(v_meta) | ||
for edge_idx in 1:length(he_meta) | ||
node = node_dict[node_idx] | ||
if V == String |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dispatch on type (use multiple-dispatch property of the language)
BTW there is string(xs...)
method - so string()
works with any type.
src/io.jl
Outdated
|
||
weight = h[node_idx, edge_idx] | ||
|
||
if isnothing(weight) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iterating over an entire sparse matrix is extremely inefficient (see the nested loop above)
Use gethyperedges
or getvertices
methods.
src/io.jl
Outdated
format::HIF_Format; | ||
T::Type{U} = Bool, | ||
D::Type{<:AbstractDict{Int, U}} = Dict{Int, T}, | ||
V::Union{Type{String}, Type{Int}} = String, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since V
and E
are parametric the code should perform type casting when collecting their values from JSON.
So this should be V(val)
or E(val)
. For string you will need to handle this separately as for the String
target the string
method should be called.
src/io.jl
Outdated
|
||
data = JSON3.read(read(io, String)) | ||
|
||
nodes = get(data, "nodes", []) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no untyped containers pls (see the comment above)
src/SimpleHypergraphs.jl
Outdated
@@ -74,6 +74,10 @@ end | |||
include("hypergraph.jl") | |||
include("io.jl") | |||
|
|||
include("hif/hif.jl") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's just make those three files into one named io_hif.jl
without creating the hif
subfolder.
In Julia packages we do not try to keep one mehod per file
src/hif/hif_load.jl
Outdated
T::Type{U}, | ||
D::Type{<:AbstractDict{Int,U}}, | ||
) where {U<:Real} | ||
node_metadata = Vector{Union{JSON3.Object, Nothing}}() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You do not want the user suddenly see some JSON3.Object
just cast it to a String
.
Additionally, the code could check if this is some more complex data structure and than make it into a Dict.
Perhaps what could help would be using second parameter for reading JSON3.read("test.json", Dict{String, Any})
src/hif/hif_load.jl
Outdated
append!(edge_metadata, [nothing for _ in 1:k]) | ||
end | ||
|
||
return Hypergraph{T,JSON3.Object,JSON3.Object,D}(n, k, node_metadata, edge_metadata) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
User wants to get as metadata parametric types something like String
or Dict{String, String}
for more complex metadata. I think we do not need to support deeper nesting than one level.
src/hif/hif_load.jl
Outdated
|
||
function add_weights_from_incidences!( | ||
h::Hypergraph, | ||
incidences::JSON3.Array{JSON3.Object}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
those types will need to be updated once you sort out the way you read the data.
Also using just AbstractVector
as function type argument in such places is good enough.
@@ -73,6 +73,10 @@ h1[5,2] = 6.5 | |||
@test get_vertex_meta(h1, 1) == get_vertex_meta(loaded_hg, 1) | |||
@test get_hyperedge_meta(h1, 2) == get_hyperedge_meta(loaded_hg, 2) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand we still do not have yet code that iterates over HIF test files.
Work in progress.
TODO:
hq_load
function withHIF_Format
switchhq_save
likewise