Streaming parsers #1560
Replies: 4 comments 2 replies
-
The same would be useful for serializers. This would allow one to easily implement converters between different serializations. Of course, this would not easily work for serializers that want to have the triples in a certain order.
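For instance, a streaming serializer could be sketched as a sink that writes each triple out the moment it arrives; combined with a streaming parser this would give a converter that never holds the full graph in memory. A minimal sketch, assuming the parser feeds triples through `Graph.add` one at a time (the `NTriplesSink` name and `example.ttl` file are illustrative, not an existing rdflib API):

```python
import sys

from rdflib.graph import Graph


class NTriplesSink(Graph):
    """Hypothetical streaming serializer sink: writes each incoming
    triple as an N-Triples line instead of storing it."""

    def __init__(self, out=sys.stdout):
        super().__init__()
        self.out = out

    def add(self, triple):
        s, p, o = triple
        # Node.n3() renders a term in N-Triples-compatible syntax.
        self.out.write(f"{s.n3()} {p.n3()} {o.n3()} .\n")
        return self


# Turtle in, N-Triples out, one triple at a time on the sink side.
NTriplesSink().parse("./example.ttl", format="turtle")
```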
-
Hi,
-
It is quite hard to build a reliable data processing pipeline with rdflib due to the lack of streaming support, i.e., everything has to be loaded into memory first, so you cannot predict in advance how much memory you will need. See my discussion in https://github.com/orgs/Jelly-RDF/discussions/97 where I was exploring the Jelly format to tackle this issue. One example use case would be to process an N-Triples file and determine which ontology elements are used in it, so that this information could be used later for query optimization or search purposes.
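A minimal sketch of that use case, assuming the parser calls `Graph.add` once per parsed triple (the `TermCollector` class and `data.nt` file name are placeholders for illustration):

```python
from rdflib import RDF
from rdflib.graph import Graph


class TermCollector(Graph):
    """Records which predicates and classes occur, without keeping
    the triples themselves."""

    def __init__(self):
        super().__init__()
        self.predicates = set()
        self.classes = set()

    def add(self, triple):
        s, p, o = triple
        self.predicates.add(p)
        if p == RDF.type:
            self.classes.add(o)
        return self


collector = TermCollector()
collector.parse("./data.nt", format="nt")
print(len(collector.predicates), "predicates used,",
      len(collector.classes), "classes used")
```

Even with a callback sink like this, the input itself may still be read into memory in one go, which is the second point raised in the original post.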
-
How about something like this?

```python
from rdflib.graph import Graph


class TripleStream(Graph):
    def __init__(self, triple_callback):
        super().__init__()
        self.triple_callback = triple_callback

    def add(self, triple):
        # Hand each triple to the callback instead of storing it.
        self.triple_callback(triple)


g = TripleStream(lambda t: print(t))
g.parse(source="./example.ttl")
```
-
Currently all parsers write into a graph. For cases where you want to stream-process RDF, it would be nice if triples could be handled as they come in.
Twofold:
1. Most of our parsers have a Triple Sink object: define this interface centrally and unify (a rough sketch of one possible shape follows below).
2. Make sure we read the input stream as a stream, and do not read the whole thing into a string :)
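To make point 1 concrete, the shared interface could be as small as a single method. A rough sketch, assuming a callback-style sink; the `TripleSink` and `GraphSink` names are hypothetical, not existing rdflib API:

```python
from typing import Protocol

from rdflib.graph import Graph
from rdflib.term import Node


class TripleSink(Protocol):
    """Hypothetical central sink interface that all parsers would target."""

    def triple(self, s: Node, p: Node, o: Node) -> None:
        """Called once for every triple, as soon as it is parsed."""
        ...


class GraphSink:
    """Default adapter: forwards each triple into a Graph, preserving today's behaviour."""

    def __init__(self, graph: Graph):
        self.graph = graph

    def triple(self, s: Node, p: Node, o: Node) -> None:
        self.graph.add((s, p, o))
```

A serializer accepting the same interface on its input side would then enable the streaming converters asked for in the first comment above.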