-
-
Notifications
You must be signed in to change notification settings - Fork 3.8k
Bootloop detection & recovery #4793
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughAdds file utilities (copy/compare/backup/restore/validate/dump), configuration helpers (backup/restore/verify/reset), integrates backup before config write, and implements bootloop detection/recovery with startup and OTA hooks for ESP8266/ESP32. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. ✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Nitpick comments (3)
wled00/cfg.cpp (1)
796-796
: Address unused variable warning.The pipeline reports that variable 'success' is set but not used. This appears to be related to the
deserializeConfigSec()
call on line 796. Consider either using the return value or explicitly marking it as unused.Apply this diff to address the warning:
- [[maybe_unused]] bool success = deserializeConfigSec(); + bool success = deserializeConfigSec(); + (void)success; // explicitly mark as unused for nowOr if the success value should be used:
- [[maybe_unused]] bool success = deserializeConfigSec(); + bool success = deserializeConfigSec(); + if (!success) { + DEBUG_PRINTLN(F("Failed to deserialize security config")); + }wled00/fcn_declare.h (1)
229-232
: Consider using consistent parameter types for file operations.The
copyFile
function usesconst String&
parameters while other file functions useconst char*
. For consistency and to avoid unnecessary String object creation, consider usingconst char*
for all file operation functions.-bool copyFile(const String &src_path, const String &dst_path); +bool copyFile(const char* src_path, const char* dst_path);wled00/util.cpp (1)
750-755
: Consider adding a threshold for brownout-triggered config restoration.Currently, any brownout immediately triggers config restoration, which might not be ideal if brownouts are due to hardware issues rather than configuration. Consider tracking consecutive brownouts similar to crash counting.
You could extend the bootloop detection to track brownout events separately and only restore config after multiple consecutive brownouts within a time window. This would prevent unnecessary config changes due to power supply issues.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
wled00/cfg.cpp
(2 hunks)wled00/fcn_declare.h
(3 hunks)wled00/file.cpp
(1 hunks)wled00/util.cpp
(2 hunks)wled00/wled.cpp
(1 hunks)wled00/wled_server.cpp
(2 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: blazoncek
PR: wled/WLED#4482
File: wled00/udp.cpp:147-149
Timestamp: 2025-02-19T12:43:34.199Z
Learning: ESP8266 and ESP32 platforms have different maximum segment name lengths in WLED, which can cause truncation when syncing segment names between devices. This platform difference affects the user experience when using the segment name sync feature.
wled00/wled.cpp (3)
Learnt from: blazoncek
PR: #4667
File: usermods/user_fx/user_fx.cpp:27-30
Timestamp: 2025-04-30T05:41:03.633Z
Learning: In WLED, the Segment::allocateData() method already includes optimization to check if data is allocated and sufficiently sized, handling buffer reuse to reduce memory fragmentation. Adding an external check like if (SEGENV.data == nullptr && !SEGENV.allocateData(dataSize))
is unnecessary and could be problematic, as it bypasses proper size verification.
Learnt from: KrX3D
PR: #4664
File: wled00/syslog.cpp:50-51
Timestamp: 2025-04-26T12:51:04.993Z
Learning: In the WLED codebase, syslog configuration variables (syslogHost, syslogPort, syslogProtocol, syslogFacility, syslogSeverity) are defined as global variables in wled.h and are accessed directly in the SyslogPrinter implementation, rather than being class members.
Learnt from: blazoncek
PR: #4700
File: wled00/wled_server.cpp:409-414
Timestamp: 2025-05-26T16:09:34.325Z
Learning: In wled00/wled_server.cpp, the OTA update access control logic intentionally allows OTA updates from different subnets when otaSameSubnet is true AND a PIN is set. This was a conscious design decision by blazoncek to provide flexibility for remote OTA updates with PIN protection, though they acknowledged it may not be the optimal security model.
wled00/wled_server.cpp (2)
Learnt from: blazoncek
PR: #4700
File: wled00/wled_server.cpp:409-414
Timestamp: 2025-05-26T16:09:34.325Z
Learning: In wled00/wled_server.cpp, the OTA update access control logic intentionally allows OTA updates from different subnets when otaSameSubnet is true AND a PIN is set. This was a conscious design decision by blazoncek to provide flexibility for remote OTA updates with PIN protection, though they acknowledged it may not be the optimal security model.
Learnt from: blazoncek
PR: #4482
File: wled00/udp.cpp:147-149
Timestamp: 2025-02-19T12:43:34.199Z
Learning: ESP8266 and ESP32 platforms have different maximum segment name lengths in WLED, which can cause truncation when syncing segment names between devices. This platform difference affects the user experience when using the segment name sync feature.
wled00/util.cpp (10)
Learnt from: blazoncek
PR: #4482
File: wled00/udp.cpp:147-149
Timestamp: 2025-02-19T12:43:34.199Z
Learning: ESP8266 and ESP32 platforms have different maximum segment name lengths in WLED, which can cause truncation when syncing segment names between devices. This platform difference affects the user experience when using the segment name sync feature.
Learnt from: netmindz
PR: #4728
File: wled00/FX.h:378-378
Timestamp: 2025-06-15T09:59:52.720Z
Learning: In WLED's FX.h, MODE_COUNT represents the highest FX_MODE_ ID + 1, not the total count of FX_MODE_ definitions. This is because some effects may have been removed but their IDs are not reused, creating gaps in the numbering sequence. MODE_COUNT is used for array bounds and range checking.
Learnt from: netmindz
PR: #4728
File: wled00/FX.h:378-378
Timestamp: 2025-06-15T09:59:52.720Z
Learning: In WLED's FX.h, MODE_COUNT represents the highest FX_MODE_ ID + 1, not the total count of FX_MODE_ definitions. This is because some effects may have been removed but their IDs are not reused, creating gaps in the numbering sequence. MODE_COUNT is used for array bounds and range checking.
Learnt from: blazoncek
PR: #4658
File: wled00/led.cpp:90-91
Timestamp: 2025-04-24T09:31:06.879Z
Learning: In the WLED codebase, the getSegment()
function guards against out-of-bounds segment IDs, and getFirstSelectedSegId()
falls back to getMainSegmentId()
if no segments are selected, ensuring no crashes when used through the setValuesFromFirstSelectedSeg()
macro.
Learnt from: blazoncek
PR: #4658
File: wled00/led.cpp:90-91
Timestamp: 2025-04-24T09:31:06.879Z
Learning: In the WLED codebase, the getSegment()
function guards against out-of-bounds segment IDs, and getFirstSelectedSegId()
falls back to getMainSegmentId()
if no segments are selected, ensuring no crashes when used through the setValuesFromFirstSelectedSeg()
macro.
Learnt from: KrX3D
PR: #4664
File: wled00/syslog.cpp:50-51
Timestamp: 2025-04-26T12:51:04.993Z
Learning: In the WLED codebase, syslog configuration variables (syslogHost, syslogPort, syslogProtocol, syslogFacility, syslogSeverity) are defined as global variables in wled.h and are accessed directly in the SyslogPrinter implementation, rather than being class members.
Learnt from: blazoncek
PR: #4658
File: wled00/led.cpp:90-91
Timestamp: 2025-04-24T09:31:06.879Z
Learning: In the WLED codebase, the getSegment()
function guards against out-of-bounds segment IDs by falling back to the main segment ID, and getFirstSelectedSegId()
falls back to getMainSegmentId()
if no segments are selected, ensuring no crashes when used through the setValuesFromFirstSelectedSeg()
macro.
Learnt from: blazoncek
PR: #4482
File: wled00/udp.cpp:147-149
Timestamp: 2025-02-19T12:43:34.200Z
Learning: In WLED, maximum segment name length varies by platform:
- ESP8266: 32 characters (WLED_MAX_SEGNAME_LEN = 32)
- ESP32: 64 characters (WLED_MAX_SEGNAME_LEN = 64)
This platform difference can cause truncation when syncing longer names from ESP32 to ESP8266. Additionally, the WLED UI has limitations regarding modified maximum segment name lengths.
Learnt from: DedeHai
PR: #4682
File: wled00/FX.cpp:8997-9005
Timestamp: 2025-05-09T18:43:15.355Z
Learning: In the WLED codebase, SEGMENT.custom3 is always constrained to the range 0-31 and will not exceed this range.
Learnt from: KrX3D
PR: #4585
File: usermods/seven_segment_display_reloaded_v2/seven_segment_display_reloaded_v2.h:121-136
Timestamp: 2025-04-27T09:37:28.415Z
Learning: Using PROGMEM for the seven-segment font array (umSSDRNumbers) in the WLED SSDR usermod causes compilation problems, so it should be left as a regular array.
🪛 GitHub Actions: WLED CI
wled00/util.cpp
[error] 7-7: fatal error: esp32/rtc.h: No such file or directory. Compilation terminated.
wled00/cfg.cpp
[warning] 796-796: Variable 'success' set but not used [-Wunused-but-set-variable].
🔇 Additional comments (7)
wled00/wled.cpp (1)
411-411
: Good integration of bootloop detection into startup sequence.The placement of
handleBootLoop()
after filesystem initialization is logical and appropriate, ensuring the filesystem is available for any config backup/restore operations that may be needed during bootloop recovery.wled00/wled_server.cpp (2)
432-432
: Excellent addition of config backup before OTA update.Creating a backup of the current configuration before starting the OTA update is a smart safety measure that ensures recovery is possible if the update fails or introduces issues.
411-413
: Good integration of bootloop detection with OTA updates.The conditional call to
bootloopCheckOTA()
on non-ESP8266 platforms makes sense since ESP32 supports dual boot partitions for OTA rollback, while ESP8266 does not. This enables the bootloop detection system to track OTA updates and potentially perform automatic rollbacks if needed.wled00/cfg.cpp (3)
777-783
: Well-designed config backup and restore functions.The
backupConfig()
andrestoreConfig()
functions provide clean, focused interfaces for configuration management by leveraging the generic file utility functions. This abstraction makes the code more maintainable and the intent clearer.
785-793
: Effective config reset implementation.The
resetConfig()
function provides a clean way to reset configuration by renaming the file with a.rst.json
suffix rather than deleting it, which preserves the old config for potential recovery while ensuring fresh defaults on reboot.
820-820
: Good integration of backup into config save process.Adding
backupConfig()
before writing new configuration ensures that a backup is always available, providing an additional safety net for configuration management.wled00/util.cpp (1)
7-7
: Use the correct ESP32 RTC attribute headerThe include
#include "esp32/rtc.h"
is invalid and causes your build to fail. For ESP32 RTC NOINIT variables, pull inesp_attr.h
, which definesRTC_NOINIT_ATTR
.
- Location: wled00/util.cpp, around line 7
Suggested diff:
#ifdef ESP8266 #include "user_interface.h" // for bootloop detection #else -#include "esp32/rtc.h" // for bootloop detection +#include "esp_attr.h" // provides RTC_NOINIT_ATTR #endifPlease verify that
esp_attr.h
is present in your ESP32 core; if it isn’t, include the proper header (e.g.,esp_system.h
) that exposes the RTC NOINIT attribute.
Wow, this idea is seriously awesome and will be a huge benefit for people who don't have a serial/USB port for flashing. Thank you for doing this. This will definitely save a lot of people who only have OTA / WiFi on their boards. I love everything about this! I also really love that you're solving it via the ESP32's RTC which persists between reboots. That's smart! The 5-second reboot intervals detected 5 times in a row looks good, and you're clearing the crash-counter on a successful boot. The attempted auto-recovery methods also look great with the config restore, then a config reset, and finally boot partition swap. Great job, man. Very clever stuff! ❤️ |
Fixed some minor issues in the last commit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great to me! I'd still suggest maybe ".backup.json", but I don't think it matters that much. I don't think there's any down side to merging this.
thanks for the review. I think I have some uncommited work on this PR, one change is to use bak.filename.json as suggested. |
I think this feature might be worth including in the next maintenance release |
not 100% sure it is 1:1 compatible with V3 builds, some API calls may have been renamed, a port is certainly possible if that is the case. |
also added function to compare contents of two files for future use (currently not used)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (6)
wled00/file.cpp (6)
442-479
: Add self-copy guard and align buffer size with FS_BUFSIZETwo small hardening tweaks:
- Prevent accidental self-copy (src == dst) which would truncate the file.
- Use FS_BUFSIZE to reduce syscall overhead and keep consistency with the rest of the file utilities.
bool copyFile(const char* src_path, const char* dst_path) { DEBUG_PRINTF("copyFile from %s to %s\n", src_path, dst_path); if(!WLED_FS.exists(src_path)) { DEBUG_PRINTLN(F("file not found")); return false; } + + // guard against accidental self-copy which would truncate the source + if (strcmp(src_path, dst_path) == 0) { + DEBUG_PRINTLN(F("copyFile: src and dst are identical")); + return false; + } bool success = true; // is set to false on error File src = WLED_FS.open(src_path, "r"); File dst = WLED_FS.open(dst_path, "w"); if (src && dst) { - uint8_t buf[128]; // copy file in 128-byte blocks + uint8_t buf[FS_BUFSIZE]; // align with other FS utilities (256 bytes) while (src.available() > 0) { size_t bytesRead = src.read(buf, sizeof(buf)); if (bytesRead == 0) { success = false; break; // error, no data read } size_t bytesWritten = dst.write(buf, bytesRead); if (bytesWritten != bytesRead) { success = false; break; // error, not all data written } } } else { success = false; // error, could not open files }
481-516
: Early outs and consistency: handle identical paths, size mismatch, and reuse FS_BUFSIZE
- If both paths are the same, skip I/O and return true.
- Compare sizes first to quickly decide on inequality.
- Use FS_BUFSIZE instead of 128 for symmetry and fewer reads.
bool compareFiles(const char* path1, const char* path2) { DEBUG_PRINTF("compareFile %s and %s\n", path1, path2); if (!WLED_FS.exists(path1) || !WLED_FS.exists(path2)) { DEBUG_PRINTLN(F("file not found")); return false; } + if (strcmp(path1, path2) == 0) { + return true; + } + bool identical = true; // set to false on mismatch File f1 = WLED_FS.open(path1, "r"); File f2 = WLED_FS.open(path2, "r"); if (f1 && f2) { - uint8_t buf1[128], buf2[128]; - while (f1.available() > 0 || f2.available() > 0) { + if (f1.size() != f2.size()) { + identical = false; + } else { + uint8_t buf1[FS_BUFSIZE], buf2[FS_BUFSIZE]; + while (f1.available() > 0) { size_t len1 = f1.read(buf1, sizeof(buf1)); - size_t len2 = f2.read(buf2, sizeof(buf2)); + size_t len2 = f2.read(buf2, sizeof(buf2)); if (len1 != len2) { identical = false; break; // files differ in size or read failed } if (memcmp(buf1, buf2, len1) != 0) { identical = false; break; // files differ in content } - } + } + } } else { identical = false; // error opening files } if (f1) f1.close(); if (f2) f2.close(); return identical; }
520-535
: Harden backup path construction: enforce leading '/' and detect truncation
- backupFile assumes filename starts with '/', since it uses filename + 1.
- Check snprintf return to catch truncation.
bool backupFile(const char* filename) { DEBUG_PRINTF("backup %s \n", filename); if (!validateJsonFile(filename)) { DEBUG_PRINTLN(F("broken file")); return false; } - char backupname[32]; - snprintf(backupname, sizeof(backupname), "%s%s", s_backup_json, filename + 1); // skip leading '/' in filename + if (!filename || filename[0] != '/') { + DEBUG_PRINTLN(F("backupFile: expected absolute path starting with '/'")); + return false; + } + char backupname[32]; + int n = snprintf(backupname, sizeof(backupname), "%s%s", s_backup_json, filename + 1); // skip leading '/' + if (n <= 0 || n >= (int)sizeof(backupname)) { + DEBUG_PRINTLN(F("backupFile: backup path truncated")); + return false; + } if (copyFile(filename, backupname)) { DEBUG_PRINTLN(F("backup ok")); return true; } DEBUG_PRINTLN(F("backup failed")); return false; }
537-558
: Mirror backup hardening in restore: leading '/' check and truncation detectionApply the same input validation and snprintf result check to avoid malformed backup path usage.
bool restoreFile(const char* filename) { DEBUG_PRINTF("restore %s \n", filename); - char backupname[32]; - snprintf(backupname, sizeof(backupname), "%s%s", s_backup_json, filename + 1); // skip leading '/' in filename + if (!filename || filename[0] != '/') { + DEBUG_PRINTLN(F("restoreFile: expected absolute path starting with '/'")); + return false; + } + char backupname[32]; + int n = snprintf(backupname, sizeof(backupname), "%s%s", s_backup_json, filename + 1); // skip leading '/' + if (n <= 0 || n >= (int)sizeof(backupname)) { + DEBUG_PRINTLN(F("restoreFile: backup path truncated")); + return false; + } if (!WLED_FS.exists(backupname)) { DEBUG_PRINTLN(F("no backup found")); return false; } if (!validateJsonFile(backupname)) { DEBUG_PRINTLN(F("broken backup")); return false; } if (copyFile(backupname, filename)) { DEBUG_PRINTLN(F("restore ok")); return true; } DEBUG_PRINTLN(F("restore failed")); return false; }
560-573
: Emit parse error details for faster diagnosticsCurrently only tells valid/invalid. Logging the specific deserialization error helps when diagnosing corrupted files in the field.
bool validateJsonFile(const char* filename) { if (!WLED_FS.exists(filename)) return false; File file = WLED_FS.open(filename, "r"); if (!file) return false; StaticJsonDocument<0> doc, filter; // https://arduinojson.org/v6/how-to/validate-json/ - bool result = deserializeJson(doc, file, DeserializationOption::Filter(filter)) == DeserializationError::Ok; + DeserializationError err = deserializeJson(doc, file, DeserializationOption::Filter(filter)); + bool result = (err == DeserializationError::Ok); file.close(); if (!result) { - DEBUG_PRINTF("Invalid JSON file %s\n", filename); + DEBUG_PRINTF("Invalid JSON file %s (%s)\n", filename, err.c_str()); } else { DEBUG_PRINTF("Valid JSON file %s\n", filename); } return result; }
575-594
: Close rootdir and tidy comment
- Close rootdir explicitly to release FS handle.
- Minor comment grammar nit.
-// print contents of all files in root dir to Serial except wsec files +// print contents of all JSON files in root dir to Serial, except wsec files void dumpFilesToSerial() { File rootdir = WLED_FS.open("/", "r"); File rootfile = rootdir.openNextFile(); while (rootfile) { size_t len = strlen(rootfile.name()); - // skip files starting with "wsec" and dont end in .json + // skip files starting with "wsec" and files that don't end in .json if (strncmp(rootfile.name(), "wsec", 4) != 0 && len >= 6 && strcmp(rootfile.name() + len - 5, ".json") == 0) { Serial.println(rootfile.name()); while (rootfile.available()) { Serial.write(rootfile.read()); } Serial.println(); Serial.println(); } rootfile.close(); rootfile = rootdir.openNextFile(); } + rootdir.close(); }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
wled00/file.cpp
(1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-07-31T18:21:49.868Z
Learnt from: DedeHai
PR: wled/WLED#4793
File: wled00/file.cpp:481-513
Timestamp: 2025-07-31T18:21:49.868Z
Learning: In WLED, essential configuration files that require backup have short, controlled names (like "/cfg.json", "/presets.json") that are well within a 32-character buffer limit. The file naming is controlled by developers, making buffer overflow in backup filename construction highly unlikely.
Applied to files:
wled00/file.cpp
📚 Learning: 2025-02-19T12:43:34.200Z
Learnt from: blazoncek
PR: wled/WLED#4482
File: wled00/udp.cpp:147-149
Timestamp: 2025-02-19T12:43:34.200Z
Learning: In WLED, maximum segment name length varies by platform:
- ESP8266: 32 characters (WLED_MAX_SEGNAME_LEN = 32)
- ESP32: 64 characters (WLED_MAX_SEGNAME_LEN = 64)
This platform difference can cause truncation when syncing longer names from ESP32 to ESP8266. Additionally, the WLED UI has limitations regarding modified maximum segment name lengths.
Applied to files:
wled00/file.cpp
📚 Learning: 2025-02-19T12:43:34.199Z
Learnt from: blazoncek
PR: wled/WLED#4482
File: wled00/udp.cpp:147-149
Timestamp: 2025-02-19T12:43:34.199Z
Learning: ESP8266 and ESP32 platforms have different maximum segment name lengths in WLED, which can cause truncation when syncing segment names between devices. This platform difference affects the user experience when using the segment name sync feature.
Applied to files:
wled00/file.cpp
🧬 Code Graph Analysis (1)
wled00/file.cpp (1)
wled00/json.cpp (2)
buf
(1080-1088)buf
(1080-1080)
🔇 Additional comments (3)
wled00/file.cpp (3)
442-479
: Resolved the “success flag” bug; implementation is now robustNice fix: initializing success to true and flipping on error addresses the earlier issue where success was never set. Deleting the incomplete destination on failure is correct.
518-535
: Good: safe name construction and pre-validation before backupUsing snprintf (bounded) and validateJsonFile() before copying avoids both overflow and backing up broken JSON. This also addresses the earlier buffer overflow concern raised when strcat was used.
520-558
: Declarations and path usage verified
All function declarations in fcn_declare.h match the definitions in file.cpp, and every call site of backupFile, restoreFile, and validateJsonFile uses s_cfg_json (defined as “/cfg.json”) or other strings beginning with “/”. No literal, non-absolute paths were detected.
* added boot loop detection and config backup * automatic OTA rollback if loading backup does not fix it * added new file handling functions * adding verification of json files, added config restore at bootup if broken * added function to compare contents of two files for future use (currently not used)
How boot loop detection works:
All ESPs have an internal timer that is persistent: this is used to determine intervals between boots. If the reported reset reason hints at a crash, a persistent counter is incremented. If this happens several times in a row, it’s boot looping.
To recover from a boot loop three steps are attempted
Brownout detection
Bootloops due to brownout cannot be detected with the above method unless resorting to flash-stored variables. While possible, it is more complicated. The simple solution: if a brownout is detected, the config is restored from last backup: this will fix any immediate crashes after saving a bad choice. It will however also revert the last change if a brownout happens due to some other reason like setting the LEDs too bright but can’t account for hardware deficiency of user setups.
Backup are intentionally ending in .json so they can be displayed in file editor. Not sure this is the best idea: users could also download the file to view it, inexperienced users should not mess with the backups: should rather add a UI element to restore backups.
Ideas / additions based on this PR:
Summary by CodeRabbit
New Features
Improvements