Affiliation Builder

Affiliation Builder is currently in alpha and welcomes feedback from early adopters.

Build bipartite affiliation networks from JSON data using NetworkX.

Overview

Affiliation Builder is a Python package for creating bipartite networks from JSON data on co-affiliation relationships. It transforms structured data about entities (such as people and organizations) and their shared affiliations (such as participation in the same events) into NetworkX graph objects for analysis and visualization.

While designed with event-participant data in mind, the package works with any co-affiliation scenario where a set of entities connects to a set of items through shared relationships.
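To make the target structure concrete, here is a minimal sketch that builds the same kind of bipartite graph by hand with NetworkX. It is not the package's implementation; the `events` list and its keys are hypothetical sample data, and only the `bipartite` node attribute (0 for items, 1 for entities) follows the convention used throughout this README.

```python
import networkx as nx

# Hypothetical in-memory data mirroring the event-participant case.
events = [
    {"id": "evt1", "participants": ["Alice", "Bob"]},
    {"id": "evt2", "participants": ["Bob", "Carol"]},
]

G = nx.Graph()
for event in events:
    # Items (here: events) form node set 0.
    G.add_node(event["id"], bipartite=0)
    for person in event["participants"]:
        # Entities (here: participants) form node set 1.
        G.add_node(person, bipartite=1)
        # One edge per affiliation between an entity and an item.
        G.add_edge(event["id"], person)

print(G.number_of_nodes(), G.number_of_edges())
```

Bob appears in both events but becomes a single node with two edges, which is what makes the co-affiliation visible in the graph.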

Features

Flexible JSON input: Supports various JSON structures (arrays, wrapped objects)
Multiple entity types: Handle different entity types simultaneously (such as persons and organizations)
Simple and complex entities: Work with string identifiers or objects
Rich metadata: Preserve all JSON attributes as node properties
URL support: Load data from local files or URLs
Comprehensive validation: Detailed error messages and logging
NetworkX integration: Returns standard NetworkX graph objects

Requirements

The package has been developed and tested with:

Python 3.9+
NetworkX 3.0+
Requests 2.31.0+

Installation

pip install affiliation-builder

Quick Start

from affiliation_builder import build

# Build a bipartite network from JSON data
G = build(
    json_path='events.json',
    node_set_0_key='events',
    node_set_1_keys='participants',
    identifier_key='event_id',
    node_set_1_identifier_key='person_name'
)

# Returns a standard NetworkX Graph object
print(f"Nodes: {G.number_of_nodes()}")
print(f"Edges: {G.number_of_edges()}")

# Access node sets
node_set_0 = {n for n, d in G.nodes(data=True) if d['bipartite'] == 0}
node_set_1 = {n for n, d in G.nodes(data=True) if d['bipartite'] == 1}

Understanding the Parameters

The build() function takes five parameters that control how your JSON data maps to the bipartite network:

Parameter 1: `json_path` (str or Path)

What it is: Path to your local JSON file or URL

Examples:

json_path='data/events.json'
json_path='https://example.com/data.json'

Parameter 2: `node_set_0_key` (str or None)

What it is: JSON key containing your items (such as events)

Use None if: The JSON is a direct array of items (not wrapped in an object)

Examples:

Wrapped object format (specify the key):

{
  "events": [
    {"id": "evt1", "participants": ["Alice", "Bob"]},
    {"id": "evt2", "participants": ["Bob", "Carol"]}
  ]
}

node_set_0_key='events'

Direct array format (use None):

[
  {"id": "evt1", "participants": ["Alice", "Bob"]},
  {"id": "evt2", "participants": ["Bob", "Carol"]}
]

node_set_0_key=None
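If you are unsure which of the two formats a file uses, inspecting the top-level JSON type is usually enough. The helper below is a hypothetical convenience sketch (it is not part of Affiliation Builder) that reports whether parsed JSON is a direct array or a wrapped object, and which keys of a wrapped object hold arrays.

```python
import json


def describe_top_level(data):
    """Hypothetical helper: classify parsed JSON as 'direct' or 'wrapped'.

    Returns a (format, candidate_keys) tuple. For a wrapped object,
    candidate_keys lists the keys whose values are arrays.
    """
    if isinstance(data, list):
        return ("direct", [])
    if isinstance(data, dict):
        keys = [key for key, value in data.items() if isinstance(value, list)]
        return ("wrapped", keys)
    raise TypeError("Top-level JSON value must be an array or an object")


wrapped = json.loads('{"events": [{"id": "evt1", "participants": ["Alice"]}]}')
direct = json.loads('[{"id": "evt1", "participants": ["Alice"]}]')

print(describe_top_level(wrapped))  # suggests node_set_0_key='events'
print(describe_top_level(direct))   # suggests node_set_0_key=None
```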

Parameter 3: `node_set_1_keys` (str or list of str)

What it is: The JSON key(s) that contain the entities affiliated with each item

Pass a list when: You have multiple entity types (e.g., both persons and organizations)

Examples:

Single entity type:

{"id": "evt1", "participants": ["Alice", "Bob"]}

node_set_1_keys='participants'

Multiple entity types:

{
  "id": "evt1",
  "persons": ["Alice", "Bob"],
  "organizations": ["University A", "Company B"]
}

node_set_1_keys=['persons', 'organizations']

Parameter 4: `identifier_key` (str)

What it is: JSON key that uniquely identifies each item (e.g., event)

Examples:

{"id": "evt1", "name": "Conference 2024", ...}

identifier_key='id'

Parameter 5: `node_set_1_identifier_key` (str or None, optional)

What it is: Key to extract identifiers from entity objects (when entities are objects, not strings)

Use None (default) when: Entities are simple strings/numbers

Pass a key when: Entities are objects with multiple attributes

Examples:

Simple entities (strings):

{"id": "evt1", "participants": ["Alice", "Bob"]}

node_set_1_identifier_key=None

Complex entities (objects):

{
  "id": "evt1",
  "participants": [
    {"person_name": "Alice", "role": "speaker", "affiliation": "MIT"},
    {"person_name": "Bob", "role": "attendee", "affiliation": "Stanford"}
  ]
}

# Extract 'Alice' and 'Bob' as node IDs
# All other attributes (role, affiliation) are preserved as node properties
node_set_1_identifier_key='person_name'
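Once attributes are stored as node properties, they can be queried with ordinary NetworkX calls. The sketch below uses a hand-built graph as a stand-in for the result of build(), assuming the attribute names match the JSON keys (role, affiliation) as described above.

```python
import networkx as nx

# Hand-built stand-in for the graph that the complex-entity example describes.
G = nx.Graph()
G.add_node("evt1", bipartite=0)
G.add_node("Alice", bipartite=1, role="speaker", affiliation="MIT")
G.add_node("Bob", bipartite=1, role="attendee", affiliation="Stanford")
G.add_edges_from([("evt1", "Alice"), ("evt1", "Bob")])

# Filter entity nodes by a preserved attribute.
speakers = [n for n, d in G.nodes(data=True) if d.get("role") == "speaker"]

# Collect one attribute for every node that carries it.
affiliations = nx.get_node_attributes(G, "affiliation")

print(speakers)
print(affiliations)
```

Using d.get() rather than d[...] avoids a KeyError for nodes (such as the event node) that do not carry the attribute.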

JSON Structure Examples

Example 1: Wrapped Object with Simple Entities

{
  "events": [
    {"name": "Conference 2024", "participants": ["Alice", "Bob", "Carol"]},
    {"name": "Workshop 2024", "participants": ["Bob", "David"]}
  ]
}

G = build(
    json_path='events.json',
    node_set_0_key='events',
    node_set_1_keys='participants',
    identifier_key='name'
)

Example 2: Direct Array with Complex Entities

[
  {
    "project_id": "proj1",
    "members": [
      {"name": "Alice", "role": "lead", "department": "Engineering"},
      {"name": "Bob", "role": "contributor", "department": "Design"}
    ]
  }
]

G = build(
    json_path='projects.json',
    node_set_0_key=None,  # Direct array
    node_set_1_keys='members',
    identifier_key='project_id',
    node_set_1_identifier_key='name'  # Extract name from member objects
)

# All attributes preserved as node properties
print(G.nodes['Alice'])  # {'bipartite': 1, 'role': 'lead', 'department': 'Engineering'}

Example 3: Multiple Entity Types

{
  "events": [
    {
      "name": "Summit 2024",
      "persons": ["Alice", "Bob"],
      "organizations": ["Company A", "University B"]
    }
  ]
}

G = build(
    json_path='https://example.com/data/events.json',
    node_set_0_key='events',
    node_set_1_keys=['persons', 'organizations'],  # Multiple types
    identifier_key='name'
)

Working with the Output

The build() function returns a standard NetworkX Graph object with bipartite structure for further processing:

import networkx as nx
from affiliation_builder import build

# Build network
G = build('events.json', 'events', 'participants', 'event_id')

# Access node sets
events = {n for n, d in G.nodes(data=True) if d['bipartite'] == 0}
participants = {n for n, d in G.nodes(data=True) if d['bipartite'] == 1}

# Check bipartite validity
print(nx.is_bipartite(G))

# Analyze the network
print(f"Number of events: {len(events)}")
print(f"Number of participants: {len(participants)}")
print(f"Network density: {nx.density(G)}")  # general graph density, not bipartite-specific

# Project to unipartite network
P = nx.bipartite.weighted_projected_graph(G, participants)
print(f"Co-affiliation edges: {P.number_of_edges()}")
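The weights in the projected graph count shared items. The following sketch illustrates this on a small hand-built bipartite graph (hypothetical data, standing in for the output of build()), so it runs without any input file.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Hand-built bipartite graph: three events, three participants.
G = nx.Graph()
G.add_nodes_from(["evt1", "evt2", "evt3"], bipartite=0)
G.add_nodes_from(["Alice", "Bob", "Carol"], bipartite=1)
G.add_edges_from([
    ("evt1", "Alice"), ("evt1", "Bob"),
    ("evt2", "Bob"), ("evt2", "Carol"),
    ("evt3", "Alice"), ("evt3", "Bob"),
])

participants = {n for n, d in G.nodes(data=True) if d["bipartite"] == 1}

# Each edge weight is the number of events the two participants share.
P = bipartite.weighted_projected_graph(G, participants)
for u, v, weight in P.edges(data="weight"):
    print(u, v, weight)
```

Here Alice and Bob share two events, Bob and Carol share one, and Alice and Carol share none, so the projection has no edge between them.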

Duplicate Entity Node Handling

When the same entity appears multiple times (such as a participant in several events), the node is created once and edges are added for each affiliation. This is the expected behavior for affiliation networks.

However, if the same entity appears with different attributes in different items, the last set of attributes overwrites earlier sets. For example:

{
  "events": [
    {
      "name": "Event 1",
      "participants": [{"name": "Alice", "role": "speaker"}]
    },
    {
      "name": "Event 2",
      "participants": [{"name": "Alice", "role": "attendee"}]
    }
  ]
}

After processing, G.nodes['Alice'] will have role: 'attendee' (from Event 2); the earlier value role: 'speaker' (from Event 1) is lost.
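If overwritten attributes matter for your analysis, you can scan the data for such conflicts before building the network. The function below is a hypothetical pre-check, not part of the package; it assumes entities are JSON objects and that items have already been loaded into a Python list.

```python
def find_conflicting_entities(items, entities_key, id_key):
    """Hypothetical pre-check: return IDs of entities whose attributes
    differ between items (and would therefore be overwritten)."""
    first_seen = {}
    conflicts = set()
    for item in items:
        for entity in item.get(entities_key, []):
            ident = entity[id_key]
            attrs = {k: v for k, v in entity.items() if k != id_key}
            # setdefault stores the first attribute set and returns it later.
            if first_seen.setdefault(ident, attrs) != attrs:
                conflicts.add(ident)
    return conflicts


events = [
    {"name": "Event 1", "participants": [{"name": "Alice", "role": "speaker"}]},
    {"name": "Event 2", "participants": [{"name": "Alice", "role": "attendee"}]},
]

conflicts = find_conflicting_entities(events, "participants", "name")
print(conflicts)
```

What to do with a reported conflict (for example, renaming the attribute per item or merging values into a list) depends on the research question and is left to the caller.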

Limitations

UTF-8 encoding: Local JSON files must be UTF-8 encoded. Other encodings will raise an error. (URL sources handle encoding automatically based on server response headers.)
Hashable identifiers: Node IDs must be hashable Python types (strings, numbers, tuples). Lists or dictionaries as identifiers will be skipped with a warning.
Flat entity lists: Entity values (under node_set_1_keys) must be arrays or, since v0.2.0, a single entity object. Nested structures are not recursively processed.
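For local files in another encoding, one workaround is to re-encode them to UTF-8 first. This is a minimal sketch using only the standard library; the function name and file names are hypothetical, and you need to know the source encoding (Latin-1 in this example).

```python
import json
import tempfile
from pathlib import Path


def reencode_to_utf8(src, dst, source_encoding):
    """Hypothetical helper: rewrite a text file as UTF-8."""
    text = Path(src).read_text(encoding=source_encoding)
    Path(dst).write_text(text, encoding="utf-8")
    return Path(dst)


with tempfile.TemporaryDirectory() as tmp:
    # Create a Latin-1 encoded sample file to convert.
    legacy = Path(tmp) / "events_latin1.json"
    legacy.write_text(
        '[{"id": "evt1", "participants": ["Frühwirth"]}]', encoding="latin-1"
    )

    converted = reencode_to_utf8(legacy, Path(tmp) / "events_utf8.json", "latin-1")

    # The converted file now parses as UTF-8 JSON.
    data = json.loads(converted.read_text(encoding="utf-8"))

print(data[0]["participants"])
```

The converted file can then be passed to build() through json_path.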

Security Considerations

Be aware of potential security risks when processing JSON data from untrusted sources:

Resource Exhaustion

Large files: No size limits are enforced on JSON files or URL downloads
Deep nesting: Extremely nested JSON structures could cause memory or stack issues
Malicious data: An attacker could provide data designed to consume excessive resources

Recommendations

Trust your sources: Only load JSON from sources you control or trust
Validate externally: Pre-validate JSON files for size and structure if loading from untrusted sources
Monitor resources: For production use, implement resource monitoring
Sandbox if needed: Run in isolated environments if processing untrusted data
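As one possible form of external validation, the sketch below checks file size and nesting depth before the data is handed to the package. The limits, function names, and error messages are assumptions chosen for illustration; adjust them to your data.

```python
import json
import os

MAX_BYTES = 10 * 1024 * 1024  # hypothetical size limit: 10 MiB
MAX_DEPTH = 20                # hypothetical nesting limit


def nesting_depth(value):
    """Return the container nesting depth (0 for scalars).

    Iterative on purpose, so deeply nested input cannot exhaust
    the Python call stack during the check itself.
    """
    max_depth = 0
    stack = [(value, 1)]
    while stack:
        current, depth = stack.pop()
        if isinstance(current, dict):
            children = list(current.values())
        elif isinstance(current, list):
            children = current
        else:
            continue
        max_depth = max(max_depth, depth)
        stack.extend((child, depth + 1) for child in children)
    return max_depth


def check_json_file(path, max_bytes=MAX_BYTES, max_depth=MAX_DEPTH):
    """Raise ValueError if the file is too large or too deeply nested."""
    if os.path.getsize(path) > max_bytes:
        raise ValueError(f"{path} exceeds {max_bytes} bytes")
    with open(path, encoding="utf-8") as handle:
        try:
            data = json.load(handle)
        except RecursionError as exc:
            # The standard parser itself can fail on extreme nesting.
            raise ValueError(f"{path} is nested too deeply to parse") from exc
    if nesting_depth(data) > max_depth:
        raise ValueError(f"{path} exceeds nesting depth {max_depth}")
    return data
```

Calling check_json_file('events.json') before build() rejects oversized or deeply nested files early. For URL sources, the same idea applies after downloading the content to a local file under your own size limit.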

Future Considerations

Future versions may include:

Optional max_size parameter for downloads
Configurable nesting depth limits
Enhanced validation options

For now: Use this package with data from trusted sources, or implement your own validation layer for untrusted input.

Logging

The package uses Python's logging module. By default, log messages are not displayed. To see processing information, configure logging in your application.

To display all messages from DEBUG level upward:

import logging
from affiliation_builder import build

logging.getLogger('affiliation_builder').setLevel(logging.DEBUG)
logging.getLogger('affiliation_builder').addHandler(logging.StreamHandler())

Alternatively, pass logging.INFO to setLevel() to display only messages at INFO level and above.
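A variant of this setup at INFO level, with timestamps added through a formatter, might look like the following. The format string is just one possible choice; the logger name 'affiliation_builder' is the one used above.

```python
import logging

logger = logging.getLogger("affiliation_builder")
logger.setLevel(logging.INFO)  # INFO and above; DEBUG messages are dropped

# Send messages to stderr with a timestamp, logger name, and level.
handler = logging.StreamHandler()
handler.setFormatter(
    logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s")
)
logger.addHandler(handler)
```

Run this once at application start-up. Calling addHandler() repeatedly (for example, by re-running a notebook cell) attaches duplicate handlers and prints each message several times.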

Examples

See the examples/ directory for:

example.json - Sample JSON data structure
example.ipynb - Complete Jupyter Notebook with test analysis and visualization

Changelog

v0.2.0

Added support for single entities as JSON objects: input data no longer requires entities to be wrapped in a list when there is only one item

v0.1.0

Initial release
Core functionality for building bipartite affiliation networks from JSON data
Support for flexible input formats
Comprehensive error handling and logging

Contributing

Contributions are welcome! Please feel free to submit issues or pull requests on GitHub.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Citation

If you use this software in your research, please cite:

@software{fruehwirth2025affiliation,
  author = {Frühwirth, Timo},
  title = {Affiliation Builder: Build bipartite affiliation networks from JSON data},
  year = {2025},
  url = {https://github.com/timofruehwirth/affiliation-builder},
  version = {0.1.0}
}

Acknowledgments

Built with NetworkX for network analysis and Requests for HTTP functionality.
