Streaming parsers #1560
Replies: 4 comments 2 replies
-
The same would be useful for serializers. This would allow one to easily implement converters between different serializations. Of course, this would not easily work for serializers that want to have the triples in a certain order.
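For instance, a streaming serializer could be sketched as a sink that writes each triple out the moment it arrives; combined with a streaming parser this would give a converter that never holds the full graph in memory. A minimal sketch, assuming the parser feeds triples through `Graph.add` one at a time (the `NTriplesSink` name and `example.ttl` file are illustrative, not an existing rdflib API):

```python
import sys

from rdflib.graph import Graph


class NTriplesSink(Graph):
    """Hypothetical streaming serializer sink: writes each incoming
    triple as an N-Triples line instead of storing it."""

    def __init__(self, out=sys.stdout):
        super().__init__()
        self.out = out

    def add(self, triple):
        s, p, o = triple
        # Node.n3() renders a term in N-Triples-compatible syntax.
        self.out.write(f"{s.n3()} {p.n3()} {o.n3()} .\n")
        return self


# Turtle in, N-Triples out, one triple at a time on the sink side.
NTriplesSink().parse("./example.ttl", format="turtle")
```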
-
Hi,
-
It is quite hard to build a reliable data processing pipeline with rdflib due to the lack of streaming support, i.e., everything has to be loaded into memory first, so you cannot predict in advance how much memory you will need. See my discussion in https://github.com/orgs/Jelly-RDF/discussions/97 where I was exploring the Jelly format to tackle this issue. One example use case would be to process an N-Triples file and determine which ontology elements are used in it, so that this information could be used later for query optimization or search purposes.
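A minimal sketch of that use case, assuming the parser calls `Graph.add` once per parsed triple (the `TermCollector` class and `data.nt` file name are placeholders for illustration):

```python
from rdflib import RDF
from rdflib.graph import Graph


class TermCollector(Graph):
    """Records which predicates and classes occur, without keeping
    the triples themselves."""

    def __init__(self):
        super().__init__()
        self.predicates = set()
        self.classes = set()

    def add(self, triple):
        s, p, o = triple
        self.predicates.add(p)
        if p == RDF.type:
            self.classes.add(o)
        return self


collector = TermCollector()
collector.parse("./data.nt", format="nt")
print(len(collector.predicates), "predicates used,",
      len(collector.classes), "classes used")
```

Even with a callback sink like this, the input itself may still be read into memory in one go, which is the second point raised in the original post.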
-
How about something like this?

```python
from rdflib.graph import Graph


class TripleStream(Graph):
    def __init__(self, triple_callback):
        super().__init__()
        self.triple_callback = triple_callback

    def add(self, triple):
        # Hand each triple to the callback instead of storing it.
        self.triple_callback(triple)


g = TripleStream(lambda t: print(t))
g.parse(source="./example.ttl")
```
-
Currently all parsers write into a graph. For cases where you want to stream-process RDF, it would be nice if triples could be handled as they come in.
Twofold:
1. Most of our parsers have a Triple Sink object: define this interface centrally and unify (a rough sketch of one possible shape follows below).
2. Make sure we read the input stream as a stream, and do not read the whole thing into a string :)
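To make point 1 concrete, the shared interface could be as small as a single method. A rough sketch, assuming a callback-style sink; the `TripleSink` and `GraphSink` names are hypothetical, not existing rdflib API:

```python
from typing import Protocol

from rdflib.graph import Graph
from rdflib.term import Node


class TripleSink(Protocol):
    """Hypothetical central sink interface that all parsers would target."""

    def triple(self, s: Node, p: Node, o: Node) -> None:
        """Called once for every triple, as soon as it is parsed."""
        ...


class GraphSink:
    """Default adapter: forwards each triple into a Graph, preserving today's behaviour."""

    def __init__(self, graph: Graph):
        self.graph = graph

    def triple(self, s: Node, p: Node, o: Node) -> None:
        self.graph.add((s, p, o))
```

A serializer accepting the same interface on its input side would then enable the streaming converters asked for in the first comment above.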