Skip to content

Conversation

@nyurik
Copy link
Member

@nyurik nyurik commented Oct 20, 2025

This PR replaces the NOM-based parser with a Pest-based lexer for parsing DBC files. The migration involves:

  • Migrating from nom parser combinators to pest grammar-based parsing
  • Updating error types from Error to DbcError with support for pest errors
  • Adding parse methods to AST types for converting pest pairs into structured data
  • Removing NOM-specific error handling code throughout the codebase

Existing Tests

All .dbc files from dbc-cantools and opendbc repos that passed the tests continue to generate identical results.

Fixed Parsing

  • Many test files from dbc-cantools repo are now properly parsing. See the tests/snapshots/dbc-cantools/ dir for what they generate. CamelCaseEmpty, cp1252, emc32, empty_ns, issue_199, issue_199_extended, issue_228, message-dlc-zero, no_signals, open_actuator, socialledge, variable_dlc
  • Many opendbc actual production DBC files are also properly parsing. Since they are fairly large, I did not generate the snap files of the actual output - instead, they are verified and silently skipped unless there is an error. If there was an error, it would create a small snapshot file indicating which error caused them. The fact that so many error files are being removed highlights that the new parser is able to parse them correctly. This does NOT mean all the elements are correctly exported - only that lexing passed, and some/all tokens from the lexer were consumed into the can-dbc data structures.
    • root: FORD_CADS, FORD_CADS_64, comma_body, fca_giorgio, ford_fusion_2018_pt, gm_global_a_high_voltage_management, gm_global_a_powertrain_expansion, gwm_haval_h6_phev_2024, hyundai_kia_generic, mazda_2017, mercedes_benz_e350_2010, psa_aee2010_r3, rivian_park_assist_can, tesla_can, tesla_model3_vehicle, tesla_powertrain, toyota_radar_dsu_tssp, vw_mqbevo, vw_pq
    • chrysler: _stellantis_common, chrysler_pacifica_2017_hybrid, chrysler_ram_dt, chrysler_ram_hd
    • gm: _community, gm_global_a_powertrain
    • honda: _bosch_2018, _bosch_radar_acc, _bosch_standstill, _community, _gearbox_common, _honda_common, _lkas_hud_5byte, _lkas_hud_8byte, _nidec_common, _nidec_scm_group_a, _nidec_scm_group_b, _steering_control_a, _steering_control_b, _steering_sensors_a, _steering_sensors_b, acura_ilx_2016_can, acura_rdx_2018_can, acura_rdx_2020_can, honda_bosch_radarless, honda_civic_hatchback_ex_2017_can, honda_civic_touring_2016_can, honda_clarity_hybrid_2018_can, honda_common_canfd, honda_crv_ex_2017_body, honda_crv_touring_2016_can, honda_insight_ex_2019_can, honda_odyssey_exl_2018
    • hyundai: hyundai_canfd, hyundai_palisade_2023
    • nissan: _nissan_common, nissan_leaf_2018, nissan_x_trail_2017
    • subaru: subaru_forester_2017, subaru_global_2017, subaru_global_2020_hybrid, subaru_outback_2015, subaru_outback_2019
    • toyota: _community, _toyota_2017, _toyota_adas_standard, toyota_new_mc_pt, toyota_nodsu_pt, toyota_secoc_pt, toyota_tnga_k_pt

@nyurik nyurik marked this pull request as ready for review October 24, 2025 22:41
@nyurik nyurik changed the title chore!: migrate to can-dbc-pest lexer chore: migrate to can-dbc-pest lexer, parse many new types, parse many new .dbc tests Oct 24, 2025
Copy link
Member

@tegimeki tegimeki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to adapt a dependent crate (dbc-data) to use this code, and it works. In fact, it found an invalid VAL_ entry that I had in a test DBC so I am fixing that and will prepare an update over there once this is released.

Also, the various DEF_REL types not implemented here will be good to add as noted, and that looks like it can get adapted to pest once this is merged.

Big change, big improvements!

Comment on lines +265 to +267
Rule::ba_def_rel => return Err(DbcError::NotImplemented("ba_def_rel")),
Rule::ba_def_def_rel => return Err(DbcError::NotImplemented("ba_def_def_rel")),
Rule::ba_rel => return Err(DbcError::NotImplemented("ba_rel")),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm hitting some of these cases in a couple private DBC files, and see that PR #56 is in progress for those types but will need to be updated for pest.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

technically Pest already handles them - it understands REL syntax and it has rules for that (perhaps not-perfect ones). I just didn't want to include new feature support in this PR - i.e. make it more feature-compatible with the NOM-based version to be able to compare more easily. REL is coming in the next ones.

Copy link

@DanielT DanielT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I'm happy with this change. I think the pest-based parser is a nice improvement, and the way you've split up the code is also more usable.

I have some comments / nits:

I found it surprising that the pest grammar itself remains in the separate can-dbc-pest crate. I'm not specifically against it, but unless you think it would be useful independently, then I would rather have it as part of can-dbc.

It looks like you've removed the derive for the getters. Not a big problem, but I liked it the way it was.

@nyurik
Copy link
Member Author

nyurik commented Oct 27, 2025

Thanks @DanielT, there are several reasons for the separate crate:

  • testing is fundamentally different - I validate that all known .dbc files pass the lexer, and that the output of that lexer is stable - which is different from what can-dbc produces
  • I do not want to preclude other users of the lexer, for example if someone wants to create alternative structs, or create a dbc formater

That said, I have been contemplating putting them into a single project/single workspace. The only reason I haven't yet is because release-plz is a bit trickier when it comes to releasing multiple crates with different versioning. Plus the release page of github is very much single crate/project focused. So multiple seems like a simpler solution for now.

As for the getter crate - it only makes sense if the entire struct is modifiable only via methods - but half of the fields were already public, which created confusion. Also, many getters were inefficient - they returned a ref to a value even when the value was Copy. Dereferencing, especially for Copy-values smaller than 64bit is wasteful, but more importantly - the usage ergonomic of assert_eq!(&42, foo.bar()) is not great. So in the name of KISS and YAGNI principles, I think we don't need it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants