Conversation

Contributor

@sjfhsjfh sjfhsjfh commented Jul 9, 2025

Added a prose to introduce the basics of DORA

Might include some flaws, any advice is welcomed

@sjfhsjfh
Contributor Author

sjfhsjfh commented Jul 9, 2025

Related: dora-rs/dora#896

Contributor

@phil-opp phil-opp left a comment


Thanks a lot for starting this work!

(I'm also planning to work on documentation in the next days/weeks. I'll probably document the dataflow YAML spec next. Just so that we don't do duplicated work.)

Comment on lines +11 to +26
```yml
nodes:
  - id: hello_dora
    build: pip install -e .
    path: dora-helloworld
    inputs:
      tick: dora/timer/millis/20
    outputs:
      - hello
  - id: hello_dora_2
    build: pip install dora-hello
    path: dora-hello
    args: --name="World"
    inputs:
      hello: hello_dora/hello
```
Contributor

It would be nice to make one of the nodes a Rust node to show that you can combine the two programming languages.
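A mixed-language version of the example could look roughly like the sketch below. This is only an illustration of the suggestion above, not text from the PR: the node id, the `cargo build` command, and the `target/release/...` path are all assumptions for the sake of the example.

```yml
nodes:
  - id: hello_dora
    build: pip install -e .
    path: dora-helloworld
    inputs:
      tick: dora/timer/millis/20
    outputs:
      - hello

  # Hypothetical Rust node: `build` invokes cargo and `path` points
  # at the compiled executable instead of a Python package.
  - id: hello_dora_rs
    build: cargo build --release
    path: target/release/hello-dora-rs
    inputs:
      hello: hello_dora/hello
```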

```yml
nodes:
  - id: hello_dora
    build: pip install -e .
```
Contributor

This build command only works if you are inside a specific repo, right? Perhaps something that installs from a published package works better?


> It describes a node by specifying its inputs and outputs, along with some other properties.
>
> Each node is actually a Python package, with a main entry script that will be executed when the node is started.
Contributor

Nodes can also be Rust executables, instead of Python scripts.
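For context, a minimal Rust node built on the `dora-node-api` crate looks roughly like the following sketch, based on the upstream dora examples. Exact types and signatures may differ between dora versions, so treat this as an assumption-laden illustration rather than a definitive implementation.

```rust
use dora_node_api::{DoraNode, Event};

fn main() -> eyre::Result<()> {
    // Connect to the dora daemon using the environment variables
    // that dora sets when it spawns the node.
    let (mut node, mut events) = DoraNode::init_from_env()?;

    // Process incoming events until the event stream closes.
    while let Some(event) = events.recv() {
        match event {
            Event::Input { id, metadata, data } => {
                println!("received input `{id}`");
                // A real node would publish results here,
                // e.g. via `node.send_output(...)`.
                let _ = (&mut node, metadata, data);
            }
            _ => {}
        }
    }
    Ok(())
}
```

In the dataflow YAML, such a node would use the compiled executable as its `path` instead of a Python package name.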


> Each node is actually a Python package, with a main entry script that will be executed when the node is started.
>
> A dataflow is actually an instance of a dataflow definition (the YAML file). You can start multiple dataflows from the same definition if you want. Each dataflow will be assigned a unique ID, which can be used to manage the dataflow.
Contributor

I don't think that we have to talk about dataflow IDs at this point. I think it's better to stick to `dora run` in this first example, which does not require any IDs.

Comment on lines +35 to +36
```console
$ dora run hello.yml
dataflow start triggered: 0197a739-cb05-70b7-9714-f46476ebd16c
```
Contributor

This is not the expected output for a `dora run` command, is it?


> Sometimes the nodes in the same dataflow may run on different machines. Each machine will run a daemon, and the coordinator will be responsible for managing the dataflow (and of course, the nodes) across these daemons.

Running a dataflow requires a coordinator to be running already. If you don’t have one, no worries – simply use the `dora run` command (similar to `docker run`). This will start a coordinator (if one isn’t already running) and then run the dataflow for you:
Contributor

This is not exactly true. `dora run` runs a dataflow without spawning any coordinator or visible daemon.

I think it's a good idea to start with a basic `dora run` command that runs a dataflow locally. Then we can introduce the coordinator, daemon, and `dora start` in a separate chapter (e.g. a chapter named "Running Dataflows on Multiple Machines").

Contributor Author

This is actually the plan Mivik and I have regarding the CLI behavior of the coordinator/daemon. We think it's important to inform users concisely about what the program is currently doing (or has done), such as starting the daemon.

After all, the concepts of coordinator and daemon aren't that complicated; they're still fairly easy to understand, in my opinion. Also, to keep the structure reasonable and focused on the basics, I kept the introduction to multi-machine setups in a quote block as a more advanced topic.

Contributor

> This is actually our plan regarding the CLI behavior of the coordinator/daemon. We think it's important to inform users concisely about what the program is currently doing (or has done), such as starting the daemon.

That's fine with me in general! However, `dora run` doesn't spawn a coordinator, not even an internal one. Instead, it launches a special variant of the daemon that doesn't connect to anything else nor listen on any port. Instead of communicating via the coordinator, there is separate code for reporting log output, dataflow results, etc.

The question is whether these details are relevant to the user. It could also be another source of confusion. For example, with the above description, I can imagine that the following questions arise:

- Why doesn't `dora run` connect to the `dora coordinator` instance I spawned before?
- Why is the daemon that is spawned by `dora run` not visible anywhere (e.g. not connecting to my coordinator instance)?
- Why is my `dora start` command throwing a "failed to connect to dora coordinator" error even though I ran a `dora run` command immediately before? The docs say that a daemon/coordinator is started by `dora run`, so why do I have to start it again?

The answer to all these is that `dora run` is designed to run a dataflow locally without interacting with any other dora coordinator or daemon instances.
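To make the distinction concrete, the two workflows could be contrasted along the lines of the sketch below. The command names come from the dora CLI (`dora run`, `dora up`, `dora start`); the dataflow filename is an assumption, and output is omitted since it varies by version.

```sh
# Local, self-contained mode: runs the dataflow without touching
# any existing coordinator or daemon.
dora run dataflow.yml

# Distributed mode: start the coordinator and daemon first,
# then submit the dataflow to them.
dora up
dora start dataflow.yml
```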


@sjfhsjfh
Contributor Author

Other points that should also be included in the tutorial, in my opinion. Some of them may need enhancements to the current CLI.
