21 changes: 16 additions & 5 deletions compiler/src/parsing/driver.re
@@ -106,12 +106,24 @@ let parse_program_for_syntax_error = (~name=?, lexbuf, source) => {
 let cached_parsetrees = Hashtbl.create(64);
 let reset = () => Hashtbl.clear(cached_parsetrees);

-let get_cached_parsetree = name => {
-  Option.fold(~none=None, ~some=Hashtbl.find_opt(cached_parsetrees), name);
+let get_cached_parsetree = (name, source) => {
+  switch (
+    Option.fold(~none=None, ~some=Hashtbl.find_opt(cached_parsetrees), name)
+  ) {
+  | Some((cached_source, cached_program))
+      when cached_source == Hashtbl.hash(source) =>
Member:

Hashing the entire parsetree is super expensive. I would use file_older or some other mechanism to verify that the file hasn't changed and that the cache is valid.

Member (author) @spotandjake, Apr 23, 2025:

I'm only hashing the source, which is just hashing a string, right? I'm not against using file_older, but I don't think it would work here because of compile_string caching.
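
For comparison only, a minimal sketch of the kind of modification-time check alluded to above, not what this PR does: it uses the standard Unix.stat binding rather than the compiler's file_older helper (whose signature isn't shown here), and as the reply notes it cannot cover compile_string, where there is no file on disk to check.

/* Hypothetical mtime-based cache validation, assuming a cache keyed by filename. */
let cached_mtimes: Hashtbl.t(string, float) = Hashtbl.create(64);

let cache_is_fresh = filename =>
  switch (Hashtbl.find_opt(cached_mtimes, filename)) {
  | Some(cached_mtime) => Unix.(stat(filename).st_mtime <= cached_mtime)
  | None => false
  };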

Member:

That's not as bad; I thought it was hashing the AST.

Member (author) @spotandjake:

Nope, I'm essentially just using the source hash to check whether the contents behind the name have changed, in which case we should build a new parsetree rather than use the cache.

Member:

I think it would be useful to test this with a large file (maybe 25k-50k lines) and make sure the hashing isn't noticeable compared to the parsing. Also, we should open an issue to keep only a limited number of cached parsetrees, because in its current form this is basically a memory leak.

Member (author) @spotandjake:

Do you know if we have any precedent anywhere for timing-based tests?

Just as some mental logic: hashing itself in OCaml looks to be O(n) (Hashtbl.hash / caml_hash), and I imagine the Menhir parse is O(n) as well (though I haven't checked).

Member:

We do not. They're both O(n), but remember that runtime complexity only describes how an algorithm's cost grows with more input, not its actual running time. You can have two O(n) algorithms where one takes 20x longer than the other; both being O(n) just means that with more data, it will still only take about 20x longer than the other one.

Strings are easy to hash and have quick, optimized algorithms, whereas hashing a big data structure requires chasing a bunch of pointers, offset calculations, etc. ASTs are also a lot more data than the source strings.

Member (author) @spotandjake, Oct 29, 2025:

Coming back to this: do you think it would be acceptable to test with Unix.time() and just ensure that the time taken in A is less than in B?

My concern with this approach is that if the times are too close, a CPU hiccup could cause our tests to flake.

Member:

Apologies if you thought I meant I wanted perf tests in the test suite. I just want benchmarks that you ran on your machine, with the numbers reported back.
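
A minimal sketch of such a local benchmark, for illustration only: run_parse is a hypothetical stand-in for the driver's actual parse entry point, and big_source is a synthetic large program, so the names and sizes here are assumptions rather than anything from this PR.

/* Times Hashtbl.hash on a large source string against a (stubbed) parse. */
let time = (label, f) => {
  let start = Unix.gettimeofday();
  let result = f();
  Printf.printf("%s: %.6fs\n", label, Unix.gettimeofday() -. start);
  result;
};

/* Hypothetical stand-in for the driver's parse entry point; substitute the
   real call when benchmarking against the compiler. */
let run_parse = source => ignore(source);

let () = {
  /* Synthetic source of roughly 50k lines. */
  let big_source =
    String.concat(
      "\n",
      List.init(50_000, i => Printf.sprintf("let x%d = %d", i, i)),
    );
  let _ = time("Hashtbl.hash on source", () => Hashtbl.hash(big_source));
  let _ = time("parse", () => run_parse(big_source));
  ();
};

If hashing the source string comes back orders of magnitude cheaper than the parse, the cache-validation cost should be negligible in practice.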

+    Some(cached_program)
+  | _ => None
+  };
 };

+let add_cached_parsetree = (name, source, program) => {
+  Hashtbl.add(cached_parsetrees, name, (Hashtbl.hash(source), program));
+};
+
 let parse = (~name=?, lexbuf, source): Parsetree.parsed_program => {
-  switch (get_cached_parsetree(name)) {
+  let source = source();
+  switch (get_cached_parsetree(name, source)) {
   | Some(cached) => cached
   | None =>
     Sedlexing.set_position(lexbuf, Location.start_pos);
@@ -129,7 +141,6 @@ let parse = (~name=?, lexbuf, source): Parsetree.parsed_program => {
     | Parser.Error =>
       // Fast parse failed, so now we do a slow, thoughtful parse to produce a
       // good error message.
-      let source = source();
       ignore @@
       parse_program_for_syntax_error(
         ~name?,
@@ -140,7 +151,7 @@ let parse = (~name=?, lexbuf, source): Parsetree.parsed_program => {
failwith("Impossible: Program with syntax error raised no error");
};
switch (name) {
| Some(name) => Hashtbl.add(cached_parsetrees, name, program)
| Some(name) => add_cached_parsetree(name, source, program)
| None => ()
};
program;