Frequent errors showing up in benchmarking, introduced in "feat: improve logging in the query_plan" or a bit earlier #1035

@pflanze

The SILO benchmarking/ dir uses api-query to carry out the requests. api-query counts the hard errors (failing connections), and when the count reaches 6 it stops with a "too many errors" message, which makes that particular benchmarking run fail. This never happened in the past, but has happened fairly regularly since SILO commit bd2b494. Since benchmarking jobs do 10 runs and are only aborted after the second failure of individual runs, most jobs still finish thanks to the second chance. However, I've seen at least one job fail twice and hence get aborted (one of the jobs in "erroneous-jobs" in the evobench job listing); I can look up which one it was in case it matters.

It's not a big problem for the benchmarking; in fact, we can just increase the number of allowed failures to make ~all jobs succeed. But I wonder whether this indicates an underlying problem.
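To make the two thresholds concrete, here is a minimal Python sketch of the behaviour described above. It is not the api-query or evobench code: the names `ErrorBudget`, `TooManyErrors` and `run_job` are hypothetical, and the assumption that a job simply continues with its remaining runs after the first failed run is mine.

```python
from collections import Counter
from dataclasses import dataclass, field


class TooManyErrors(Exception):
    """Raised when the hard-error limit is reached (hypothetical name)."""


@dataclass
class ErrorBudget:
    # The limit of 6 is taken from the description; raising it is the
    # workaround mentioned above.
    max_hard_errors: int = 6
    statuses: Counter = field(default_factory=Counter)
    hard_errors: list = field(default_factory=list)

    def record_response(self, status: int) -> None:
        # Any HTTP response (200, 400, ...) means the connection itself worked.
        self.statuses[status] += 1

    def record_hard_error(self, description: str) -> None:
        # A hard error is a failed connection (reset, closed early, ...).
        self.hard_errors.append(description)
        if len(self.hard_errors) >= self.max_hard_errors:
            raise TooManyErrors(
                f"too many errors (besides {dict(self.statuses)} ~successes): "
                f"{self.hard_errors}"
            )


def run_job(run_once, runs: int = 10, max_failed_runs: int = 2) -> bool:
    """Return True if the job finishes, False if it is aborted.

    Assumption: a failed run is skipped and the job continues until the
    second failed run, at which point it is aborted.
    """
    failed = 0
    for index in range(runs):
        try:
            run_once(index)
        except TooManyErrors:
            failed += 1
            if failed >= max_failed_runs:
                return False
    return True
```

With these defaults a job tolerates one run that hits the limit and is aborted on the second, which matches the "second chance" behaviour described above.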

The errors look like this (from ~evobench/.evobench-run/working_directory_pool/18.output_of_benchmarking_command_at_2025-09-26T17:36:14.030083174+02:00):

2025-09-26T17:37:37.157186883+02:00     E       Error: too many errors (besides {200: 20304, 400: 177} ~successes): [posting the query "{\"action\":{\"minProportion\":0.001,\"randomize\":false,\"type\":\"Mutations\"},\"filterExpression\":{\"children\":[{\"column\":\"date\",\"from\":\"2023-07-24\",\"to\":\"2023-07-30\",\"type\":\"DateBetween\"},{\"children\":[{\"column\":\"host\",\"value\":\"Homo sapiens\",\"type\":\"StringEquals\"}],\"type\":\"Or\"},{\"sequenceName\":\"S\",\"position\":70,\"symbol\":\"V\",\"type\":\"AminoAcidEquals\"},{\"sequenceName\":\"S\",\"position\":69,\"symbol\":\"H\",\"type\":\"AminoAcidEquals\"}],\"type\":\"And\"}}"
2025-09-26T17:37:37.157248433+02:00     E       
2025-09-26T17:37:37.157259422+02:00     E       Caused by:
2025-09-26T17:37:37.157269562+02:00     E           0: error sending request for url (http://localhost:8081/query): connection error: Connection reset by peer (os error 104)
2025-09-26T17:37:37.157279602+02:00     E           1: connection error: Connection reset by peer (os error 104)
2025-09-26T17:37:37.157289672+02:00     E           2: Connection reset by peer (os error 104), posting the query "{\"action\":{\"groupByFields\":[\"originatingLab\"],\"randomize\":false,\"type\":\"Aggregated\"},\"filterExpression\":{\"type\":\"True\"}}"
2025-09-26T17:37:37.157319492+02:00     E       
2025-09-26T17:37:37.157330152+02:00     E       Caused by:
2025-09-26T17:37:37.157340012+02:00     E           0: error sending request for url (http://localhost:8081/query): connection closed before message completed
2025-09-26T17:37:37.157349872+02:00     E           1: connection closed before message completed, posting the query "{\"action\":{\"groupByFields\":[\"date\"],\"orderByFields\":[{\"field\":\"date\",\"order\":\"ascending\"}],\"randomize\":false,\"type\":\"Aggregated\"},\"filterExpression\":{\"children\":[{\"sequenceName\":\"S\",\"position\":100,\"symbol\":\"X\",\"type\":\"AminoAcidEquals\"}],\"type\":\"And\"}}"
2025-09-26T17:37:37.157359702+02:00     E       
2025-09-26T17:37:37.157369382+02:00     E       Caused by:
2025-09-26T17:37:37.157379332+02:00     E           0: error sending request for url (http://localhost:8081/query): connection closed before message completed
2025-09-26T17:37:37.157400072+02:00     E           1: connection closed before message completed, posting the query "{\"action\":{\"groupByFields\":[\"host\"],\"randomize\":false,\"type\":\"Aggregated\"},\"filterExpression\":{\"type\":\"True\"}}"
2025-09-26T17:37:37.157409972+02:00     E       
2025-09-26T17:37:37.157419742+02:00     E       Caused by:
2025-09-26T17:37:37.157429532+02:00     E           0: error sending request for url (http://localhost:8081/query): connection closed before message completed
2025-09-26T17:37:37.157439672+02:00     E           1: connection closed before message completed, posting the query "{\"action\":{\"groupByFields\":[\"date\"],\"randomize\":false,\"type\":\"Aggregated\"},\"filterExpression\":{\"children\":[{\"column\":\"date\",\"from\":\"2024-10-07\",\"to\":\"2025-04-05\",\"type\":\"DateBetween\"},{\"children\":[{\"column\":\"pangoLineage\",\"value\":\"KP.1.1\",\"includeSublineages\":true,\"type\":\"Lineage\"}],\"type\":\"Or\"},{\"children\":[{\"column\":\"host\",\"value\":\"Homo sapiens\",\"type\":\"StringEquals\"}],\"type\":\"Or\"}],\"type\":\"And\"}}"
2025-09-26T17:37:37.157449432+02:00     E       
2025-09-26T17:37:37.157459072+02:00     E       Caused by:
2025-09-26T17:37:37.157468832+02:00     E           0: error sending request for url (http://localhost:8081/query): connection error: Connection reset by peer (os error 104)
2025-09-26T17:37:37.157478642+02:00     E           1: connection error: Connection reset by peer (os error 104)
2025-09-26T17:37:37.157488542+02:00     E           2: Connection reset by peer (os error 104), posting the query "{\"action\":{\"groupByFields\":[\"division\",\"country\",\"region\"],\"randomize\":false,\"type\":\"Aggregated\"},\"filterExpression\":{\"children\":[{\"column\":\"date\",\"from\":\"2020-01-06\",\"to\":\"2025-04-06\",\"type\":\"DateBetween\"},{\"children\":[{\"column\":\"host\",\"value\":\"Homo sapiens\",\"type\":\"StringEquals\"}],\"type\":\"Or\"}],\"type\":\"And\"}}"
2025-09-26T17:37:37.157498292+02:00     E       
2025-09-26T17:37:37.157507861+02:00     E       Caused by:
2025-09-26T17:37:37.157517661+02:00     E           0: error sending request for url (http://localhost:8081/query): connection closed before message completed
2025-09-26T17:37:37.157527401+02:00     E           1: connection closed before message completed]
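
The excerpt above contains six hard errors: two "Connection reset by peer" and four "connection closed before message completed", next to 20304 + 177 requests that did get an HTTP response. A small Python sketch for tallying this from a log file follows. It assumes the line format shown above (the first entry of each "Caused by:" chain starts with `0: error sending request for url`); the function names are hypothetical.

```python
import re
from collections import Counter

# First entry of each "Caused by:" chain, e.g.
#   0: error sending request for url (http://...): connection closed before ...
CAUSE_RE = re.compile(r"\s0: error sending request for url \(([^)]*)\): (.*)$")
# Status summary printed with the final error, e.g. "{200: 20304, 400: 177}"
SUMMARY_RE = re.compile(r"besides \{([^}]*)\} ~successes")


def classify_hard_errors(log_text: str) -> Counter:
    """Count hard errors by kind, one per "Caused by:" chain."""
    counts = Counter()
    for line in log_text.splitlines():
        match = CAUSE_RE.search(line)
        if not match:
            continue
        message = match.group(2)
        if "Connection reset by peer" in message:
            counts["reset by peer"] += 1
        elif "connection closed before message completed" in message:
            counts["closed before message completed"] += 1
        else:
            counts["other"] += 1
    return counts


def status_counts(log_text: str) -> dict:
    """Parse the {status: count} summary of requests that got a response."""
    match = SUMMARY_RE.search(log_text)
    if not match:
        return {}
    result = {}
    for item in match.group(1).split(","):
        if item.strip():
            code, count = item.split(":")
            result[int(code)] = int(count)
    return result
```

Comparing the ratio of the two error kinds across runs might help tell whether the server drops connections mid-response or refuses them outright, but that is speculation on my side.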

Searching for these errors across all benchmarking runs ever done yields the commit ids that had such errors:

evobench@gs-staging-1:~/.evobench-run/working_directory_pool$ grep -F 'Error: too many errors' *.output* -l|» ls -lrt| perl -wne 's{.*? (\d+\.)}{$1}; chomp; print "$_\t"; system "grep","commit.id", $_ '|perl -wne 'm{([^\t]+).*commit_id:.*?([a-f0-9]+)} or die; print "$1\t$2\n"'
18.output_of_benchmarking_command_at_2025-09-26T17:36:14.030083174+02:00	bd2b494687cdd17bdc043c15ebe83be094e871d6
32.output_of_benchmarking_command_at_2025-10-02T00:09:17.910603117+02:00	fddc48c5f4df46135e9451e1358542228680b9a4
30.output_of_benchmarking_command_at_2025-10-02T11:37:58.660032154+02:00	f547e9fdbf221c5b433493346dd50b105b579bf8
17.output_of_benchmarking_command_at_2025-10-02T13:29:05.961113749+02:00	e18702904a6fd3784d7b549a26d98207ea33441b
16.output_of_benchmarking_command_at_2025-10-03T00:12:38.257369762+02:00	f547e9fdbf221c5b433493346dd50b105b579bf8
7.output_of_benchmarking_command_at_2025-10-08T15:34:57.078044183+02:00	d650f2ecc8b35dd9d055126ae27944b4bf52d2f5
49.output_of_benchmarking_command_at_2025-10-09T04:32:26.947147676+02:00	d5bbaba6fbe93a6af7751e562ec62dcf8a35f319
39.output_of_benchmarking_command_at_2025-10-09T05:04:17.733228127+02:00	5beabade64f6568293f60d197497067e3e974c63
50.output_of_benchmarking_command_at_2025-10-11T03:56:29.837424237+02:00	0c805f70c73af1ba8b00b9dd2668e17139b34c73
0.output_of_benchmarking_command_at_2025-10-14T00:04:11.943828124+02:00	e380245471a2df2178186d7f099e89f314d6ea62
6.output_of_benchmarking_command_at_2025-10-20T15:47:04.806275382+02:00	4a7399b4147fbd45f51b95c14ab721dcd6ee337e

bd2b494 is the first commit that has them.
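
For anyone who wants to repeat the search without the shell/perl pipeline, here is a rough Python equivalent. It is a sketch under assumptions: it expects each output file to contain a line with `commit_id:` followed by a full 40-character hash (stricter than the perl regex above), it orders files by modification time like `ls -lrt`, and the function names are hypothetical. Unlike the perl version it does not die on a file without a commit id but reports `None`.

```python
import re
from pathlib import Path

MARKER = "Error: too many errors"
# Assumption: the log records the commit as "commit_id: ... <40 hex chars>".
COMMIT_RE = re.compile(r"commit_id:.*?([a-f0-9]{40})")


def commits_with_errors(pool_dir):
    """Return (file name, commit id) pairs for logs containing MARKER,
    oldest file first."""
    results = []
    paths = [p for p in Path(pool_dir).glob("*.output*") if p.is_file()]
    for path in sorted(paths, key=lambda p: p.stat().st_mtime):
        text = path.read_text(errors="replace")
        if MARKER not in text:
            continue
        match = COMMIT_RE.search(text)
        results.append((path.name, match.group(1) if match else None))
    return results


def first_seen(results):
    """Map each commit id to the first (oldest) log file it appeared in."""
    seen = {}
    for name, commit in results:
        if commit is not None:
            seen.setdefault(commit, name)
    return seen
```

Usage would be something like `commits_with_errors(Path.home() / ".evobench-run/working_directory_pool")`. Note that the result is ordered by when the benchmark ran, not by commit order in the SILO history.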
