Clever Challenge - Paul-Andre Henegar #4
API change
I changed the result struct slightly. In the functionCalls member, I count the number of function calls before and after the diff separately, like functionCalls map[string]struct{ before, after int }.
Approach
My approach for the first 4 parts is relatively straightforward. The only slightly interesting thing is that I used state functions (inspired by https://talks.golang.org/2011/lex.slide#1) and closures.
My approach for counting function calls was to read in the regions one at a time, tokenize them, and decide whether sequences of tokens should be considered function calls. I read each region into two buffers, one for the before version and one for the after version, and counted function calls in each separately.
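The two-buffer step could look roughly like the sketch below. It is an illustration under assumptions, not the actual implementation: the region is assumed to arrive as unified diff body lines, and splitRegion is a hypothetical name. Context lines go to both buffers, '-' lines only to the before buffer, and '+' lines only to the after buffer.

```go
package main

import (
	"fmt"
	"strings"
)

// splitRegion rebuilds the before and after text of one diff region.
// Lines with any other prefix (for example "\ No newline at end of file")
// are ignored.
func splitRegion(lines []string) (before, after string) {
	var b, a strings.Builder
	for _, line := range lines {
		if line == "" {
			continue
		}
		text := line[1:] // drop the one-character diff marker
		switch line[0] {
		case '-':
			b.WriteString(text)
			b.WriteByte('\n')
		case '+':
			a.WriteString(text)
			a.WriteByte('\n')
		case ' ':
			b.WriteString(text)
			b.WriteByte('\n')
			a.WriteString(text)
			a.WriteByte('\n')
		}
	}
	return b.String(), a.String()
}

func main() {
	before, after := splitRegion([]string{
		" int main() {",
		"-    foo(1);",
		"+    bar(1);",
		" }",
	})
	fmt.Print("before:\n", before, "after:\n", after)
}
```

Each buffer can then be tokenized and counted on its own, which is what makes separate before and after counts possible.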
My "tokenizer" is really basic: it only distinguishes what look like identifiers from whitespace and other characters. It does not treat comments or strings correctly, so my code would currently count a function call inside a comment. The tokenizer could be extended or replaced by an actual tokenizer without any problem. I tried using a regexp-based tokenizer at first and it was extremely slow.
Function calls are counted differently depending on the language, and I don't count them if the language is unknown. Right now, I have only implemented basic function call counting for C (.c, .h) and Python (.py). For both languages, I keep a window of 3 tokens and check that the second token is an identifier, the third token is a '(', and the first token is not something that could indicate a function definition.
Speed
On my computer, the solution runs in about 310ms when I print the info and 120ms when I don't.