Master heartbeat timeout is not notified and is not recoverable #47

@sergii-volokh

Description

Locust4j uses a zmq connection to the locust master.
If locust4j receives no heartbeat message from the master within a timeout of 60 seconds, it "quits" and closes the zmq connection.

How often it happens: I recently ran an endurance test with 5 worker instances, twice.
The first run ended after 12 hours, and by then all 5 workers had hit the timeout and quit (some earlier, some later).
The second run ended after 25 hours, with only one worker connected the whole time. The other 4 hit heartbeat timeouts within the first several hours of execution.

What's wrong with it:

  • no attempt is made to reconnect or recover;
  • it's not visible to the main load test logic (there is no notification and no public API method to check the state);
  • the timeout is not configurable.

As a minimal and simple solution, I'd suggest the ability to pass the timeout value as a parameter and to get some notification about the disconnection.
Perhaps an additional state, such as RunnerState.Disconnected, should be introduced and sent with the notification.
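
To make the suggestion concrete, here is a rough sketch of what a configurable timeout with a state notification could look like. None of this is locust4j's actual API: `HeartbeatMonitor`, `SketchRunnerState`, and the listener callback are hypothetical names invented for illustration, and the current time is passed in explicitly so the behaviour is easy to follow.

```java
import java.util.function.Consumer;

// Hypothetical stand-in for the runner state; NOT locust4j's actual enum.
enum SketchRunnerState { RUNNING, DISCONNECTED }

// Hypothetical helper: tracks the last master heartbeat and reports state changes.
final class HeartbeatMonitor {
    private final long timeoutMillis;
    private final Consumer<SketchRunnerState> listener;
    private long lastHeartbeatMillis;
    private SketchRunnerState state = SketchRunnerState.RUNNING;

    // timeoutMillis is supplied by the caller instead of being hard-coded.
    HeartbeatMonitor(long timeoutMillis, long nowMillis,
                     Consumer<SketchRunnerState> listener) {
        if (timeoutMillis <= 0) {
            throw new IllegalArgumentException("timeoutMillis must be positive");
        }
        this.timeoutMillis = timeoutMillis;
        this.lastHeartbeatMillis = nowMillis;
        this.listener = listener;
    }

    // Called whenever a heartbeat from the master arrives.
    synchronized void onHeartbeat(long nowMillis) {
        lastHeartbeatMillis = nowMillis;
        transitionTo(SketchRunnerState.RUNNING);
    }

    // Called periodically; flags a disconnect once the timeout is exceeded.
    synchronized SketchRunnerState check(long nowMillis) {
        if (nowMillis - lastHeartbeatMillis > timeoutMillis) {
            transitionTo(SketchRunnerState.DISCONNECTED);
        }
        return state;
    }

    // Notifies the listener only when the state actually changes.
    private void transitionTo(SketchRunnerState next) {
        if (state != next) {
            state = next;
            listener.accept(next);
        }
    }
}
```

In a real worker, something like a scheduled task would presumably call `check(System.currentTimeMillis())`, and the load test code would register the listener to decide whether to reconnect, alert, or stop.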
