Affiliation Builder is currently in alpha and welcomes feedback from early adopters.
Build bipartite affiliation networks from JSON data using NetworkX.
Affiliation Builder is a Python package for creating bipartite networks from JSON data on co-affiliation relationships. It transforms structured data about entities (such as people and organizations) and their shared affiliations (such as joint participation in events) into NetworkX graph objects for analysis and visualization.
While designed with event-participant data in mind, the package works with any co-affiliation scenario where a set of entities connects to a set of items through shared relationships.
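To make the mapping concrete, here is a minimal sketch of the idea in plain Python with no dependencies. It is not the package's implementation: the function name `sketch_bipartite` and the dictionary layout are assumptions made purely for illustration. Items go into node set 0, entities into node set 1, and each affiliation becomes an edge.

```python
def sketch_bipartite(items, entities_key, id_key):
    """Illustrative only: map items and their entities to nodes and edges."""
    nodes = {}     # node id -> attribute dict with a 'bipartite' set marker
    edges = set()  # (item id, entity id) pairs
    for item in items:
        item_id = item[id_key]
        nodes[item_id] = {"bipartite": 0}
        for entity in item.get(entities_key, []):
            # An entity seen in several items is stored once and linked each time
            nodes.setdefault(entity, {"bipartite": 1})
            edges.add((item_id, entity))
    return nodes, edges


events = [
    {"id": "evt1", "participants": ["Alice", "Bob"]},
    {"id": "evt2", "participants": ["Bob", "Carol"]},
]
nodes, edges = sketch_bipartite(events, "participants", "id")
print(len(nodes), len(edges))  # 5 4
```

Bob takes part in both events, so he appears as one node with two edges.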
- Flexible JSON input: Supports various JSON structures (arrays, wrapped objects)
- Multiple entity types: Handle different entity types simultaneously (such as persons and organizations)
- Simple and complex entities: Work with string identifiers or objects
- Rich metadata: Preserve all JSON attributes as node properties
- URL support: Load data from local files or URLs
- Comprehensive validation: Detailed error messages and logging
- NetworkX integration: Returns standard NetworkX graph objects
The package has been developed and tested with:
- Python 3.9+
- NetworkX 3.0+
- Requests 2.31.0+
pip install affiliation-builder

from affiliation_builder import build
# Build a bipartite network from JSON data
G = build(
    json_path='events.json',
    node_set_0_key='events',
    node_set_1_keys='participants',
    identifier_key='event_id',
    node_set_1_identifier_key='person_name'
)
# Returns a standard NetworkX Graph object
print(f"Nodes: {G.number_of_nodes()}")
print(f"Edges: {G.number_of_edges()}")
# Access node sets
node_set_0 = {n for n, d in G.nodes(data=True) if d['bipartite'] == 0}
node_set_1 = {n for n, d in G.nodes(data=True) if d['bipartite'] == 1}

The build() function has five parameters that control how your JSON data maps to the bipartite network:
json_path
What it is: Path to your local JSON file or URL
Examples:
json_path='data/events.json'
json_path='https://example.com/data.json'

node_set_0_key
What it is: JSON key containing your items (such as events)
Use None if: The JSON is a direct array of items (not wrapped in an object)
Examples:
Wrapped object format (specify the key):
{
  "events": [
    {"id": "evt1", "participants": ["Alice", "Bob"]},
    {"id": "evt2", "participants": ["Bob", "Carol"]}
  ]
}
node_set_0_key='events'
Direct array format (use None):
[
  {"id": "evt1", "participants": ["Alice", "Bob"]},
  {"id": "evt2", "participants": ["Bob", "Carol"]}
]
node_set_0_key=None

node_set_1_keys
What it is: The JSON key(s) that contain the entities affiliated with each item
Pass a list when: You have multiple entity types (e.g., both persons and organizations)
Examples:
Single entity type:
{"id": "evt1", "participants": ["Alice", "Bob"]}
node_set_1_keys='participants'
Multiple entity types:
{
  "id": "evt1",
  "persons": ["Alice", "Bob"],
  "organizations": ["University A", "Company B"]
}
node_set_1_keys=['persons', 'organizations']

identifier_key
What it is: JSON key that uniquely identifies each item (e.g., event)
Examples:
{"id": "evt1", "name": "Conference 2024", ...}
identifier_key='id'

node_set_1_identifier_key
What it is: Key to extract identifiers from entity objects (when entities are objects, not strings)
Use None (default) when: Entities are simple strings/numbers
Pass a key when: Entities are objects with multiple attributes
Examples:
Simple entities (strings):
{"id": "evt1", "participants": ["Alice", "Bob"]}
node_set_1_identifier_key=None
Complex entities (objects):
{
  "id": "evt1",
  "participants": [
    {"person_name": "Alice", "role": "speaker", "affiliation": "MIT"},
    {"person_name": "Bob", "role": "attendee", "affiliation": "Stanford"}
  ]
}
# Extract 'Alice' and 'Bob' as node IDs
# All other attributes (role, affiliation) are preserved as node properties
node_set_1_identifier_key='person_name'

{
  "events": [
    {"name": "Conference 2024", "participants": ["Alice", "Bob", "Carol"]},
    {"name": "Workshop 2024", "participants": ["Bob", "David"]}
  ]
}
G = build(
    json_path='events.json',
    node_set_0_key='events',
    node_set_1_keys='participants',
    identifier_key='name'
)

[
  {
    "project_id": "proj1",
    "members": [
      {"name": "Alice", "role": "lead", "department": "Engineering"},
      {"name": "Bob", "role": "contributor", "department": "Design"}
    ]
  }
]
G = build(
    json_path='projects.json',
    node_set_0_key=None,  # Direct array
    node_set_1_keys='members',
    identifier_key='project_id',
    node_set_1_identifier_key='name'  # Extract name from member objects
)
# All attributes preserved as node properties
print(G.nodes['Alice'])  # {'bipartite': 1, 'role': 'lead', 'department': 'Engineering'}

{
  "events": [
    {
      "name": "Summit 2024",
      "persons": ["Alice", "Bob"],
      "organizations": ["Company A", "University B"]
    }
  ]
}
G = build(
    json_path='https://example.com/data/events.json',
    node_set_0_key='events',
    node_set_1_keys=['persons', 'organizations'],  # Multiple types
    identifier_key='name'
)

The build() function returns a standard NetworkX Graph object with bipartite structure for further processing:
import networkx as nx
from affiliation_builder import build
# Build network
G = build('events.json', 'events', 'participants', 'event_id')
# Access node sets
events = {n for n, d in G.nodes(data=True) if d['bipartite'] == 0}
participants = {n for n, d in G.nodes(data=True) if d['bipartite'] == 1}
# Check bipartite validity
print(nx.is_bipartite(G))
# Analyze the network
print(f"Number of events: {len(events)}")
print(f"Number of participants: {len(participants)}")
print(f"Network density: {nx.density(G)}")
# Project to unipartite network
P = nx.bipartite.weighted_projected_graph(G, participants)
print(f"Co-affiliation edges: {P.number_of_edges()}")

When the same entity appears multiple times (such as a participant in several events), the node is created once and edges are added for each affiliation. This is the expected behavior for affiliation networks.
However, if the same entity appears with different attributes in different items, the last set of attributes overwrites earlier sets. For example:
{
  "events": [
    {
      "name": "Event 1",
      "participants": [{"name": "Alice", "role": "speaker"}]
    },
    {
      "name": "Event 2",
      "participants": [{"name": "Alice", "role": "attendee"}]
    }
  ]
}
After processing, G.nodes['Alice'] will have role: 'attendee' (from Event 2), but not role: 'speaker' (from Event 1).
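Because later attributes silently replace earlier ones, it can help to scan the input for conflicts before building the network. The following is a minimal sketch and not part of the package: the helper name `find_attribute_conflicts` and its parameters are hypothetical, and it assumes entities are JSON objects with hashable identifiers under a single key.

```python
from collections import defaultdict


def find_attribute_conflicts(items, entities_key, entity_id_key):
    """Return {entity_id: {attribute: [distinct values]}} for conflicting attributes."""
    seen = defaultdict(lambda: defaultdict(list))
    for item in items:
        for entity in item.get(entities_key, []):
            if not isinstance(entity, dict):
                continue  # plain string entities carry no attributes
            entity_id = entity.get(entity_id_key)
            for attr, value in entity.items():
                if attr == entity_id_key:
                    continue
                if value not in seen[entity_id][attr]:
                    seen[entity_id][attr].append(value)
    # Keep only attributes that were seen with more than one distinct value
    return {
        entity_id: {a: v for a, v in attrs.items() if len(v) > 1}
        for entity_id, attrs in seen.items()
        if any(len(v) > 1 for v in attrs.values())
    }


events = [
    {"name": "Event 1", "participants": [{"name": "Alice", "role": "speaker"}]},
    {"name": "Event 2", "participants": [{"name": "Alice", "role": "attendee"}]},
]
conflicts = find_attribute_conflicts(events, "participants", "name")
print(conflicts)  # {'Alice': {'role': ['speaker', 'attendee']}}
```

If conflicts are reported, the values can be reconciled in the source data before calling build().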
- UTF-8 encoding: Local JSON files must be UTF-8 encoded. Other encodings will raise an error. (URL sources handle encoding automatically based on server response headers.)
- Hashable identifiers: Node IDs must be hashable Python types (strings, numbers, tuples). Lists or dictionaries as identifiers will be skipped with a warning.
- Flat entity lists: Entity values (under node_set_1_keys) must be arrays. Nested structures are not recursively processed.
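The identifier and array requirements above can be checked on already-parsed data before calling build(). This is a minimal sketch under stated assumptions: the helper name `check_items` is hypothetical, it flags every entity value that is not an array, and it does not reproduce the package's own validation or warnings.

```python
from collections.abc import Hashable


def check_items(items, entities_key, entity_id_key=None):
    """Collect readable problems; an empty list means no issues were found."""
    problems = []
    for index, item in enumerate(items):
        entities = item.get(entities_key)
        if not isinstance(entities, list):
            problems.append(f"item {index}: '{entities_key}' is not an array")
            continue
        for entity in entities:
            # Object entities are identified by a key; simple entities by their value
            if isinstance(entity, dict) and entity_id_key is not None:
                ident = entity.get(entity_id_key)
            else:
                ident = entity
            if not isinstance(ident, Hashable):
                problems.append(f"item {index}: unhashable identifier {ident!r}")
    return problems


items = [
    {"id": "evt1", "participants": ["Alice", ["Bob", "Carol"]]},
    {"id": "evt2", "participants": "Dave"},
]
problems = check_items(items, "participants")
for problem in problems:
    print(problem)
```

Note that a tuple passes the hashability check even if it contains a list, so this catches only the common cases.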
Be aware of potential security risks when processing JSON data from untrusted sources:
- Large files: No size limits are enforced on JSON files or URL downloads
- Deep nesting: Extremely nested JSON structures could cause memory or stack issues
- Malicious data: An attacker could provide data designed to consume excessive resources
- Trust your sources: Only load JSON from sources you control or trust
- Validate externally: Pre-validate JSON files for size and structure if loading from untrusted sources
- Monitor resources: For production use, implement resource monitoring
- Sandbox if needed: Run in isolated environments if processing untrusted data
Future versions may include:
- Optional max_size parameter for downloads
- Configurable nesting depth limits
- Enhanced validation options
For now: Use this package with data from trusted sources, or implement your own validation layer for untrusted input.
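One possible shape for such a validation layer is sketched below for local files, using only the standard library. The function names `max_depth` and `check_json_limits` and both default limits are assumptions chosen for illustration; they are not part of the package.

```python
import json
import os


def max_depth(value):
    """Return the container nesting depth of parsed JSON, computed iteratively."""
    deepest = 0
    stack = [(value, 0)]
    while stack:
        current, depth = stack.pop()
        if isinstance(current, dict):
            children = current.values()
        elif isinstance(current, list):
            children = current
        else:
            continue  # scalars do not add a nesting level
        depth += 1
        deepest = max(deepest, depth)
        stack.extend((child, depth) for child in children)
    return deepest


def check_json_limits(path, max_bytes=10_000_000, depth_limit=20):
    """Raise ValueError if a local JSON file exceeds the size or depth limit."""
    size = os.path.getsize(path)
    if size > max_bytes:
        raise ValueError(f"{path} is {size} bytes; limit is {max_bytes}")
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    depth = max_depth(data)
    if depth > depth_limit:
        raise ValueError(f"{path} has nesting depth {depth}; limit is {depth_limit}")
```

After the check passes, the same path can be passed to build(). Checking the size first avoids parsing oversized files at all. Extremely deep input may still raise RecursionError inside json.load, which a caller may want to catch as well.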
The package uses Python's logging module. By default, log messages are not displayed. To receive processing information, configure logging in your application:
Display full logging from DEBUG level upward:
import logging
from affiliation_builder import build
logging.getLogger('affiliation_builder').setLevel(logging.DEBUG)
logging.getLogger('affiliation_builder').addHandler(logging.StreamHandler())
Alternatively, set the level to logging.INFO to log only messages at INFO level and above.
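For longer runs, timestamps and level names can make the output easier to follow. The following sketch uses only the standard logging module. The format string is an arbitrary choice, and the logger name is the one used in the configuration shown above.

```python
import logging

logger = logging.getLogger('affiliation_builder')
logger.setLevel(logging.INFO)

# StreamHandler writes to stderr by default
handler = logging.StreamHandler()
handler.setFormatter(
    logging.Formatter('%(asctime)s %(levelname)s %(name)s: %(message)s')
)
logger.addHandler(handler)
```

Replacing the StreamHandler with logging.FileHandler would send the same messages to a file instead.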
See the examples/ directory for:
- example.json - Sample JSON data structure
- example.ipynb - Complete Jupyter Notebook with test analysis and visualization
- Added support for single entities as JSON objects: input data no longer requires entities to be wrapped in a list when there is only one item
- Initial release
- Core functionality for building bipartite affiliation networks from JSON data
- Support for flexible input formats
- Comprehensive error handling and logging
Contributions are welcome! Please feel free to submit issues or pull requests on GitHub.
This project is licensed under the MIT License. See the LICENSE file for details.
If you use this software in your research, please cite:
@software{fruehwirth2025affiliation,
  author = {Frühwirth, Timo},
  title = {Affiliation Builder: Build bipartite affiliation networks from JSON data},
  year = {2025},
  url = {https://github.com/timofruehwirth/affiliation-builder},
  version = {0.1.0}
}

Built with NetworkX for network analysis and Requests for HTTP functionality.