feat: new tutorial #39
base: main
Conversation
Related: dora-rs/dora#896 |
Thanks a lot for starting this work!
(I'm also planning to work on documentation in the next days/weeks. I'll probably document the dataflow YAML spec next. Just so that we don't do duplicated work.)
```yml
nodes:
  - id: hello_dora
    build: pip install -e .
    path: dora-helloworld
    inputs:
      tick: dora/timer/millis/20
    outputs:
      - hello
  - id: hello_dora_2
    build: pip install dora-hello
    path: dora-hello
    args: --name="World"
    inputs:
      hello: hello_dora/hello
```
It would be nice to make one of the nodes a Rust node to show that you can combine the two programming languages.
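A sketch of what such a mixed dataflow could look like, with a hypothetical Rust node (`hello_logger`, built via `cargo`) consuming the output of the Python node; the Rust node's ID, build command, and path here are illustrative assumptions, not part of this PR:

```yml
nodes:
  - id: hello_dora          # Python node, as in the example above
    build: pip install -e .
    path: dora-helloworld
    inputs:
      tick: dora/timer/millis/20
    outputs:
      - hello
  - id: hello_logger        # hypothetical Rust node consuming the Python output
    build: cargo build --release
    path: target/release/hello-logger
    inputs:
      hello: hello_dora/hello
```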
```yml
nodes:
  - id: hello_dora
    build: pip install -e .
```
This build command only works if you are inside a specific repo, right? Perhaps something that installs from a published package would work better?
> It describes a node by specifying its inputs and outputs, along with some other properties.
> Each node is actually a Python package, with a main entry script that will be executed when the node is started.
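A minimal sketch of what such an entry script could look like for the `hello_dora` node above. The event-loop shape follows the dora Python API, but the exact event fields and the pure helper function are assumptions for illustration:

```python
# Hypothetical entry script for the `hello_dora` node from the dataflow YAML.
def make_greeting(name: str) -> str:
    """Pure message-formatting logic, kept separate so it is easy to test."""
    return f"Hello, {name}!"


def main() -> None:
    # These imports assume the `dora` and `pyarrow` packages are installed;
    # the event-loop shape follows the dora Python API.
    import pyarrow as pa
    from dora import Node

    node = Node()
    for event in node:
        # React to the timer input declared in the dataflow YAML ("tick").
        if event["type"] == "INPUT" and event["id"] == "tick":
            node.send_output("hello", pa.array([make_greeting("dora")]))


if __name__ == "__main__":
    main()
```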
Nodes can also be Rust executables, instead of Python scripts.
> A dataflow is actually an instance of a dataflow definition (the YAML file). You can start multiple dataflows from the same definition if you want. Each dataflow will be assigned a unique ID, which can be used to manage the dataflow.
I don't think that we have to talk about dataflow IDs at this point. I think it's better to stick to `dora run` in this first example, which does not require any IDs.
```
$ dora run hello.yml
dataflow start triggered: 0197a739-cb05-70b7-9714-f46476ebd16c
```
This is not the expected output for a `dora run` command, is it?
> Sometimes the nodes in the same dataflow may exist on different machines; each machine will run a daemon, and the coordinator will be responsible for managing the dataflow (and of course, the nodes) across these daemons.
> Running a dataflow requires a coordinator to be running already. If you don’t have one, no worries – simply use the `dora run` command (similar to `docker run`). This will start a coordinator (if one isn’t already running) and then run the dataflow for you:
This is not exactly true. `dora run` runs a dataflow without spawning any coordinator or visible daemon.

I think it's a good idea to start with a basic `dora run` command that runs a dataflow locally. Then we can introduce the coordinator, daemon, and `dora start` in a separate chapter (e.g. a chapter named "Running Dataflows on Multiple Machines").
This is actually our (Mivik's and my) plan regarding the CLI behavior of the coordinator/daemon. We think it's important to inform users concisely about what the program is currently doing (or has done), such as starting the daemon.

After all, the concepts of coordinator and daemon aren't that complicated; they're still fairly easy to understand, in my opinion. Also, to keep the structure reasonable and focused on the basics, I kept the multi-machine introduction in a quote block as a more advanced topic.
> This is actually our (Mivik's and my) plan regarding the CLI behavior of the coordinator/daemon. We think it's important to inform users concisely about what the program is currently doing (or has done), such as starting the daemon.
That's fine with me in general! However, `dora run` doesn't spawn a coordinator, not even an internal one. Instead, it launches a special variant of the daemon that doesn't connect to anything else or listen on any port. Instead of communicating via the coordinator, there is different code for reporting log output, dataflow results, etc.
The question is whether these details are relevant to the user. It could also be another source of confusion. For example, with the above description, I can imagine that the following questions arise:
- Why doesn't `dora run` connect to the `dora coordinator` instance I spawned before?
- Why is the daemon that is spawned by `dora run` not visible anywhere (e.g. not connecting to my coordinator instance)?
- Why is my `dora start` command throwing a "failed to connect to dora coordinator" error even though I ran a `dora run` command immediately before? The docs say that a daemon/coordinator is started by `dora run`, so why do I have to start it again?
The answer to all of these is that `dora run` is designed to run a dataflow locally without interacting with any other dora coordinator or daemon instances.
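To make the distinction concrete, the two workflows under discussion could be contrasted roughly like this. The subcommand names are taken from the current dora CLI as I understand it; the placeholder `<dataflow-id>` and the comments are illustrative assumptions, not real output:

```
# Local, self-contained run: no coordinator or visible daemon involved
$ dora run hello.yml

# Distributed workflow: explicit coordinator/daemon, managed via dataflow IDs
$ dora up                       # start a coordinator and daemon
$ dora start hello.yml          # returns a dataflow ID
$ dora stop <dataflow-id>       # stop the dataflow by its ID
$ dora destroy                  # tear down the coordinator and daemon
```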
Other points should also be included in the tutorial, in my opinion. Some of them may need enhancements to the current CLI.
Added prose to introduce the basics of dora. It might include some flaws; any advice is welcome.