Skip to content

Conversation

AleksanderWWW
Copy link

@AleksanderWWW AleksanderWWW commented Mar 19, 2025

Work in progress.

TODO:

  • implement hq_load function with HIF_Format switch
  • implement hq_save likewise
  • add tests

@AleksanderWWW AleksanderWWW marked this pull request as ready for review March 20, 2025 12:03
@AleksanderWWW
Copy link
Author

Seems that the first phase is ready.
It's far from perfect, so looking forward to a brutal review :)

Copy link
Owner

@pszufe pszufe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, see the comments.

@@ -73,6 +73,10 @@ h1[5,2] = 6.5
@test get_vertex_meta(h1, 1) == get_vertex_meta(loaded_hg, 1)
@test get_hyperedge_meta(h1, 2) == get_hyperedge_meta(loaded_hg, 2)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make unit tests also on data from the https://github.com/pszufe/HIF-standard/tree/main/tests/test_files/HIF-compliant from the HIF-standard libraries.
Make a copy of those files to test/data/HIF-standard folder in the package repository.
Provide in the above folder README.md explaining the files come from the above library.

Each of the files should be able to load and later save.
While saving provide code comparing both JSON outputs (parse both JSONs as Dicts and compare the Dicts)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand we still do not have yet code that iterates over HIF test files.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How to handle the fact, that in order to load such a file I need to explicitly pass the types of values therein?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I. e. is manually loading each file separately the only way forward? I don't see how I can dynamically do it in a loop iterating over the directory contents

src/io.jl Outdated
_ = format

json_hg = Dict{Symbol, Any}()
incidences = []
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls no untyped vectors /containers
https://docs.julialang.org/en/v1/manual/performance-tips/#man-performance-abstract-container
most likely this are something like Union{String, Int, Nothing} (I am not sure - please check what types are possible)

src/io.jl Outdated
he_meta = h.he_meta

if any(isnothing, h.v_meta)
v_meta = [i for i in 1:length(h.v_meta)]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the intention of this code?
SimpleHypegraphs supports both Nothing as type of metadata (in that case there is no metadata) as well as concrete types

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but as far as I understand, we cannot have Nothing in edge or node metadata in the HIF format. This is probably not the best take, but the idea was that if there are missing metadata points, then we need to fallback to the solution, where edge or node ids are indexes in the matrix

src/io.jl Outdated
edge_dict = Dict(i => val for (i, val) in pairs(he_meta))


types = collect(typeof(h).parameters)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

most likely yo wanted to have those in the method definition:

hg_save(io::IO, h::Hypergraph{T, V, E, D}, format::HIF_Format) where {T, V, E, D}

Check how parametric types work in Julia:
https://docs.julialang.org/en/v1/manual/types/

src/io.jl Outdated
incidences = []
v_meta = h.v_meta
he_meta = h.he_meta

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

src/io.jl Outdated
E = types[3]

for node_idx in 1:length(v_meta)
for edge_idx in 1:length(he_meta)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use eachindex as this is the recommended syntax
https://discourse.julialang.org/t/best-practices/85138

src/io.jl Outdated
for node_idx in 1:length(v_meta)
for edge_idx in 1:length(he_meta)
node = node_dict[node_idx]
if V == String
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dispatch on type (use multiple-dispatch property of the language)
BTW there is string(xs...) method - so string() works with any type.

src/io.jl Outdated

weight = h[node_idx, edge_idx]

if isnothing(weight)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iterating over an entire sparse matrix is extremely inefficient (see the nested loop above)
Use gethyperedges or getvertices methods.

src/io.jl Outdated
format::HIF_Format;
T::Type{U} = Bool,
D::Type{<:AbstractDict{Int, U}} = Dict{Int, T},
V::Union{Type{String}, Type{Int}} = String,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since V and E are parametric the code should perform type casting when collecting their values from JSON.
So this should be V(val) or E(val). For string you will need to handle this separately as for the String target the string method should be called.

src/io.jl Outdated

data = JSON3.read(read(io, String))

nodes = get(data, "nodes", [])
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no untyped containers pls (see the comment above)

@@ -74,6 +74,10 @@ end
include("hypergraph.jl")
include("io.jl")

include("hif/hif.jl")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just make those three files into one named io_hif.jl without creating the hif subfolder.
In Julia packages we do not try to keep one mehod per file

T::Type{U},
D::Type{<:AbstractDict{Int,U}},
) where {U<:Real}
node_metadata = Vector{Union{JSON3.Object, Nothing}}()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You do not want the user suddenly see some JSON3.Object just cast it to a String.
Additionally, the code could check if this is some more complex data structure and than make it into a Dict.
Perhaps what could help would be using second parameter for reading JSON3.read("test.json", Dict{String, Any})

append!(edge_metadata, [nothing for _ in 1:k])
end

return Hypergraph{T,JSON3.Object,JSON3.Object,D}(n, k, node_metadata, edge_metadata)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

User wants to get as metadata parametric types something like String or Dict{String, String} for more complex metadata. I think we do not need to support deeper nesting than one level.


function add_weights_from_incidences!(
h::Hypergraph,
incidences::JSON3.Array{JSON3.Object},
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

those types will need to be updated once you sort out the way you read the data.
Also using just AbstractVector as function type argument in such places is good enough.

@@ -73,6 +73,10 @@ h1[5,2] = 6.5
@test get_vertex_meta(h1, 1) == get_vertex_meta(loaded_hg, 1)
@test get_hyperedge_meta(h1, 2) == get_hyperedge_meta(loaded_hg, 2)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand we still do not have yet code that iterates over HIF test files.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants