Affiliation Builder is currently in alpha and welcomes feedback from early adopters.

Affiliation Builder

Build bipartite affiliation networks from JSON data using NetworkX.

Overview

Affiliation Builder is a Python package for creating bipartite networks from JSON data on co-affiliation relationships. It transforms structured data about entities (such as people and organizations) and the items they are affiliated with (such as events) into NetworkX graph objects for analysis and visualization.

While designed with event-participant data in mind, the package works with any co-affiliation scenario where a set of entities connects to a set of items through shared relationships.

Features

  • Flexible JSON input: Supports various JSON structures (arrays, wrapped objects)
  • Multiple entity types: Handle different entity types simultaneously (such as persons and organizations)
  • Simple and complex entities: Work with string identifiers or objects
  • Rich metadata: Preserve all JSON attributes as node properties
  • URL support: Load data from local files or URLs
  • Comprehensive validation: Detailed error messages and logging
  • NetworkX integration: Returns standard NetworkX graph objects

Requirements

The package has been developed and tested with:

  • Python 3.9+
  • NetworkX 3.0+
  • Requests 2.31.0+

Installation

pip install affiliation-builder

Quick Start

from affiliation_builder import build

# Build a bipartite network from JSON data
G = build(
    json_path='events.json',
    node_set_0_key='events',
    node_set_1_keys='participants',
    identifier_key='event_id',
    node_set_1_identifier_key='person_name'
)

# Returns a standard NetworkX Graph object
print(f"Nodes: {G.number_of_nodes()}")
print(f"Edges: {G.number_of_edges()}")

# Access node sets
node_set_0 = {n for n, d in G.nodes(data=True) if d['bipartite'] == 0}
node_set_1 = {n for n, d in G.nodes(data=True) if d['bipartite'] == 1}
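
To try the call above without existing data, a small input file can be generated first. The following is a minimal sketch: the event IDs, names, and roles are invented for illustration, and only the key names (events, participants, event_id, person_name) are taken from the Quick Start call.

```python
import json
import os
import tempfile

# Invented sample data; only the key names mirror the Quick Start call.
sample = {
    "events": [
        {
            "event_id": "evt1",
            "participants": [
                {"person_name": "Alice", "role": "speaker"},
                {"person_name": "Bob", "role": "attendee"},
            ],
        },
        {
            "event_id": "evt2",
            "participants": [{"person_name": "Bob", "role": "attendee"}],
        },
    ]
}

# Write the file as UTF-8 JSON into a temporary directory.
sample_path = os.path.join(tempfile.mkdtemp(), "events.json")
with open(sample_path, "w", encoding="utf-8") as fh:
    json.dump(sample, fh, ensure_ascii=False, indent=2)

print(sample_path)
```

Passing the printed path as json_path should then give build() an input that matches the other four arguments.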

Understanding the Parameters

The build() function has 5 parameters that control how your JSON data maps to the bipartite network:

Parameter 1: json_path (str or Path)

What it is: Path to your local JSON file or URL

Examples:

json_path='data/events.json'
json_path='https://example.com/data.json'

Parameter 2: node_set_0_key (str or None)

What it is: JSON key containing your items (such as events)

Use None if: The JSON is a direct array of items (not wrapped in an object)

Examples:

Wrapped object format (specify the key):

{
  "events": [
    {"id": "evt1", "participants": ["Alice", "Bob"]},
    {"id": "evt2", "participants": ["Bob", "Carol"]}
  ]
}

node_set_0_key='events'

Direct array format (use None):

[
  {"id": "evt1", "participants": ["Alice", "Bob"]},
  {"id": "evt2", "participants": ["Bob", "Carol"]}
]

node_set_0_key=None
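
The difference between the two formats can be sketched in plain Python. select_items is a hypothetical helper written for this illustration, not a function of the package, and the package's own handling and error messages may differ.

```python
import json


def select_items(data, node_set_0_key=None):
    """Return the list of items from parsed JSON (hypothetical helper)."""
    if node_set_0_key is None:
        # Direct array format: the parsed document itself must be the list.
        if not isinstance(data, list):
            raise TypeError("expected a JSON array when node_set_0_key is None")
        return data
    # Wrapped object format: the list sits under the given key.
    if not isinstance(data, dict) or node_set_0_key not in data:
        raise KeyError(f"key {node_set_0_key!r} not found in JSON object")
    return data[node_set_0_key]


wrapped = json.loads('{"events": [{"id": "evt1", "participants": ["Alice", "Bob"]}]}')
direct = json.loads('[{"id": "evt1", "participants": ["Alice", "Bob"]}]')

# Both formats yield the same list of items.
print(select_items(wrapped, "events") == select_items(direct))
```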

Parameter 3: node_set_1_keys (str or list of str)

What it is: The JSON key(s) that contain the entities affiliated with each item

Pass a list when: You have multiple entity types (e.g., both persons and organizations)

Examples:

Single entity type:

{"id": "evt1", "participants": ["Alice", "Bob"]}

node_set_1_keys='participants'

Multiple entity types:

{
  "id": "evt1",
  "persons": ["Alice", "Bob"],
  "organizations": ["University A", "Company B"]
}

node_set_1_keys=['persons', 'organizations']
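
Accepting either a string or a list is a common pattern that can be sketched as follows. as_key_list is a hypothetical helper for illustration only; it shows how both forms can be reduced to one loop over keys, not how the package does it internally.

```python
def as_key_list(node_set_1_keys):
    """Accept a single key or a list of keys and always return a list."""
    if isinstance(node_set_1_keys, str):
        return [node_set_1_keys]
    return list(node_set_1_keys)


item = {
    "id": "evt1",
    "persons": ["Alice", "Bob"],
    "organizations": ["University A", "Company B"],
}

# Collect (item, entity) pairs across all requested keys.
pairs = [
    (item["id"], entity)
    for key in as_key_list(["persons", "organizations"])
    for entity in item.get(key, [])
]
print(pairs)
```

Each pair corresponds to one edge between an item node and an entity node in the bipartite network.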

Parameter 4: identifier_key (str)

What it is: JSON key that uniquely identifies each item (e.g., event)

Examples:

{"id": "evt1", "name": "Conference 2024", ...}

identifier_key='id'

Parameter 5: node_set_1_identifier_key (str or None, optional)

What it is: Key to extract identifiers from entity objects (when entities are objects, not strings)

Use None (default) when: Entities are simple strings/numbers

Pass a key when: Entities are objects with multiple attributes

Examples:

Simple entities (strings):

{"id": "evt1", "participants": ["Alice", "Bob"]}

node_set_1_identifier_key=None

Complex entities (objects):

{
  "id": "evt1",
  "participants": [
    {"person_name": "Alice", "role": "speaker", "affiliation": "MIT"},
    {"person_name": "Bob", "role": "attendee", "affiliation": "Stanford"}
  ]
}

# Extract 'Alice' and 'Bob' as node IDs
# All other attributes (role, affiliation) are preserved as node properties
node_set_1_identifier_key='person_name'
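
The split between node ID and node properties can be sketched in a few lines. split_entity is a hypothetical helper, and leaving the identifier key out of the attributes is an assumption made for this illustration.

```python
def split_entity(entity, identifier_key=None):
    """Return (node_id, attributes) for one entity (hypothetical helper)."""
    if identifier_key is None:
        # Simple entity: the value itself is the node ID, with no attributes.
        return entity, {}
    # Complex entity: pull out the ID and keep the rest as attributes.
    attributes = {k: v for k, v in entity.items() if k != identifier_key}
    return entity[identifier_key], attributes


simple = split_entity("Alice")
complex_ = split_entity(
    {"person_name": "Alice", "role": "speaker", "affiliation": "MIT"},
    "person_name",
)
print(simple)
print(complex_)
```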

JSON Structure Examples

Example 1: Wrapped Object with Simple Entities

{
  "events": [
    {"name": "Conference 2024", "participants": ["Alice", "Bob", "Carol"]},
    {"name": "Workshop 2024", "participants": ["Bob", "David"]}
  ]
}

G = build(
    json_path='events.json',
    node_set_0_key='events',
    node_set_1_keys='participants',
    identifier_key='name'
)

Example 2: Direct Array with Complex Entities

[
  {
    "project_id": "proj1",
    "members": [
      {"name": "Alice", "role": "lead", "department": "Engineering"},
      {"name": "Bob", "role": "contributor", "department": "Design"}
    ]
  }
]

G = build(
    json_path='projects.json',
    node_set_0_key=None,  # Direct array
    node_set_1_keys='members',
    identifier_key='project_id',
    node_set_1_identifier_key='name'  # Extract name from member objects
)

# All attributes preserved as node properties
print(G.nodes['Alice'])  # {'bipartite': 1, 'role': 'lead', 'department': 'Engineering'}

Example 3: Multiple Entity Types

{
  "events": [
    {
      "name": "Summit 2024",
      "persons": ["Alice", "Bob"],
      "organizations": ["Company A", "University B"]
    }
  ]
}

G = build(
    json_path='https://example.com/data/events.json',
    node_set_0_key='events',
    node_set_1_keys=['persons', 'organizations'],  # Multiple types
    identifier_key='name'
)

Working with the Output

The build() function returns a standard NetworkX Graph object with bipartite structure for further processing:

import networkx as nx
from affiliation_builder import build

# Build network
G = build('events.json', 'events', 'participants', 'event_id')

# Access node sets
events = {n for n, d in G.nodes(data=True) if d['bipartite'] == 0}
participants = {n for n, d in G.nodes(data=True) if d['bipartite'] == 1}

# Check bipartite validity
print(nx.is_bipartite(G))

# Analyze the network
print(f"Number of events: {len(events)}")
print(f"Number of participants: {len(participants)}")
print(f"Network density: {nx.density(G)}")

# Project to unipartite network
P = nx.bipartite.weighted_projected_graph(G, participants)
print(f"Co-affiliation edges: {P.number_of_edges()}")
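
With its default settings, weighted_projected_graph sets each edge weight to the number of neighbors two nodes share, which here is the number of events two participants both attended. The plain-Python sketch below reproduces that count without NetworkX to show what the weights mean; the attendance data is invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Invented sample data: event -> participants.
attendance = {
    "evt1": ["Alice", "Bob", "Carol"],
    "evt2": ["Bob", "Carol"],
    "evt3": ["Alice", "David"],
}

# Count, for every pair of participants, how many events they share.
shared = Counter()
for participants in attendance.values():
    for pair in combinations(sorted(set(participants)), 2):
        shared[pair] += 1

for (a, b), weight in sorted(shared.items()):
    print(f"{a} - {b}: {weight}")
```

Pairs that never share an event do not appear in the counter, just as they get no edge in the projected graph.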

Duplicate Entity Node Handling

When the same entity appears multiple times (such as a participant in several events), the node is created once and edges are added for each affiliation. This is the expected behavior for affiliation networks.

However, if the same entity appears with different attributes in different items, the last set of attributes overwrites earlier sets. For example:

{
  "events": [
    {
      "name": "Event 1",
      "participants": [{"name": "Alice", "role": "speaker"}]
    },
    {
      "name": "Event 2",
      "participants": [{"name": "Alice", "role": "attendee"}]
    }
  ]
}

After processing, G.nodes['Alice'] will have role: 'attendee' (from Event 2), but not role: 'speaker' (from Event 1).
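
Because the overwrite happens silently, it can help to scan the input for conflicting attributes before building the network. The sketch below is one way to do that; find_conflicts is a hypothetical helper, not part of the package, and it assumes that entities are objects and that attribute values are hashable.

```python
from collections import defaultdict


def find_conflicts(items, entities_key, identifier_key):
    """Map entity ID -> {attribute: set of differing values} (hypothetical helper)."""
    seen = defaultdict(lambda: defaultdict(set))
    for item in items:
        for entity in item.get(entities_key, []):
            for attr, value in entity.items():
                if attr != identifier_key:
                    seen[entity[identifier_key]][attr].add(value)
    # Keep only attributes that were seen with more than one value.
    return {
        node: {attr: values for attr, values in attrs.items() if len(values) > 1}
        for node, attrs in seen.items()
        if any(len(values) > 1 for values in attrs.values())
    }


events = [
    {"name": "Event 1", "participants": [{"name": "Alice", "role": "speaker"}]},
    {"name": "Event 2", "participants": [{"name": "Alice", "role": "attendee"}]},
]

conflicts = find_conflicts(events, "participants", "name")
print(conflicts)
```

If roles differ per event, as in this example, they may be better modeled outside the entity node, since a node can hold only one value per attribute.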

Limitations

  • UTF-8 encoding: Local JSON files must be UTF-8 encoded. Other encodings will raise an error. (URL sources handle encoding automatically based on server response headers.)
  • Hashable identifiers: Node IDs must be hashable Python types (strings, numbers, tuples). Lists or dictionaries as identifiers will be skipped with a warning.
  • Flat entity lists: Entity values (under node_set_1_keys) must be flat arrays or, as of v0.2.0, a single entity object. Nested structures are not recursively processed.
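
To find out in advance which identifiers would be skipped, the hashability requirement can be checked with Python's built-in hash(). This is a minimal sketch; is_usable_identifier is a hypothetical helper and the candidate values are invented.

```python
def is_usable_identifier(value):
    """Return True if value is hashable and could serve as a node ID."""
    try:
        hash(value)
    except TypeError:
        return False
    return True


candidates = ["Alice", 42, ("Alice", "MIT"), ["Alice"], {"name": "Alice"}]
skipped = [c for c in candidates if not is_usable_identifier(c)]
print(skipped)
```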

Security Considerations

Be aware of potential security risks when processing JSON data from untrusted sources:

Resource Exhaustion

  • Large files: No size limits are enforced on JSON files or URL downloads
  • Deep nesting: Extremely nested JSON structures could cause memory or stack issues
  • Malicious data: An attacker could provide data designed to consume excessive resources

Recommendations

  1. Trust your sources: Only load JSON from sources you control or trust
  2. Validate externally: Pre-validate JSON files for size and structure if loading from untrusted sources
  3. Monitor resources: For production use, implement resource monitoring
  4. Sandbox if needed: Run in isolated environments if processing untrusted data
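
The external validation in recommendation 2 could look like the sketch below for local files. The limits, nesting_depth, and load_checked are all assumptions made for this illustration; none of them is provided by the package, and URL sources would need a separate size check at download time.

```python
import json
import os
import tempfile

MAX_BYTES = 10 * 1024 * 1024  # Assumed limit (10 MiB); tune to your data.
MAX_DEPTH = 20                # Assumed limit on nesting depth.


def nesting_depth(value):
    """Return the nesting depth of parsed JSON, computed iteratively."""
    deepest = 0
    stack = [(value, 1)]
    while stack:
        current, depth = stack.pop()
        if isinstance(current, dict):
            children = current.values()
        elif isinstance(current, list):
            children = current
        else:
            continue  # Scalars do not add a nesting level.
        deepest = max(deepest, depth)
        stack.extend((child, depth + 1) for child in children)
    return deepest


def load_checked(path, max_bytes=MAX_BYTES, max_depth=MAX_DEPTH):
    """Load a local JSON file only if it passes size and depth checks."""
    size = os.path.getsize(path)
    if size > max_bytes:
        raise ValueError(f"{path} is {size} bytes, limit is {max_bytes}")
    try:
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
    except RecursionError as exc:
        # The parser itself can give up on extremely deep input.
        raise ValueError(f"{path} is nested too deeply to parse") from exc
    depth = nesting_depth(data)
    if depth > max_depth:
        raise ValueError(f"{path} has nesting depth {depth}, limit is {max_depth}")
    return data


# Demo with a small, invented file.
demo_path = os.path.join(tempfile.mkdtemp(), "events.json")
with open(demo_path, "w", encoding="utf-8") as fh:
    json.dump({"events": [{"id": "evt1", "participants": ["Alice"]}]}, fh)

checked = load_checked(demo_path)
print(nesting_depth(checked))
```

Once the checks pass, the same path can be handed to build() as json_path.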

Future Considerations

Future versions may include:

  • Optional max_size parameter for downloads
  • Configurable nesting depth limits
  • Enhanced validation options

For now: Use this package with data from trusted sources, or implement your own validation layer for untrusted input.

Logging

The package uses Python's logging module. By default, log messages are not displayed. To receive processing information, configure logging in your application:

Display full logging from DEBUG level upward:

import logging
from affiliation_builder import build

logging.getLogger('affiliation_builder').setLevel(logging.DEBUG)
logging.getLogger('affiliation_builder').addHandler(logging.StreamHandler())

To display only messages at INFO level and above, pass logging.INFO to setLevel() instead.
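
For longer runs it can be useful to add timestamps and level names to each message. The following sketch uses only the standard logging module; the format string is an arbitrary choice, and the logger name is the one used above.

```python
import logging

logger = logging.getLogger("affiliation_builder")
logger.setLevel(logging.INFO)  # Show INFO and above, hide DEBUG.

# Print each message with a timestamp, the logger name, and the level.
handler = logging.StreamHandler()
handler.setFormatter(
    logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s")
)
logger.addHandler(handler)
```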

Examples

See the examples/ directory for:

  • example.json - Sample JSON data structure
  • example.ipynb - Complete Jupyter Notebook with test analysis and visualization

Changelog

v0.2.0

  • Added support for single entities as JSON objects: input data no longer requires entities to be wrapped in a list when there is only one item

v0.1.0

  • Initial release
  • Core functionality for building bipartite affiliation networks from JSON data
  • Support for flexible input formats
  • Comprehensive error handling and logging

Contributing

Contributions are welcome! Please feel free to submit issues or pull requests on GitHub.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Citation

If you use this software in your research, please cite:

@software{fruehwirth2025affiliation,
  author = {Frühwirth, Timo},
  title = {Affiliation Builder: Build bipartite affiliation networks from JSON data},
  year = {2025},
  url = {https://github.com/timofruehwirth/affiliation-builder},
  version = {0.1.0}
}

Acknowledgments

Built with NetworkX for network analysis and Requests for HTTP functionality.
