-
-
Notifications
You must be signed in to change notification settings - Fork 3.8k
Safety Checks for UI, fix for cfg exceeding LED limit #4939
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
- added some safety checks for UI to not throw errors if JSCON parameters are missing - add all buses in config even if they exceed RAM limit, truncates long buses
WalkthroughIntroduces BusPlaceholder and updates BusManager::add signature. Adjusts WS2812FX initialization to add placeholder buses on memory overrun and continue setup. Relaxes isOk() gating in several iterations. Adds UI guards for missing LED info. Expands heap monitoring with staged recovery actions. Updates serialization and JS generation loops. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
wled00/data/index.js (2)
926-944
: populateSegments guards — solid hardening, tiny nitGood null checks around seg deletion/power controls. Consider explicit radix in parseInt for clarity across engines.
- if (seg0briElement && segp0Element && parseInt(seg0briElement.value)==255) segp0Element.classList.add("hide"); + if (seg0briElement && segp0Element && parseInt(seg0briElement.value, 10) === 255) segp0Element.classList.add("hide");
2409-2415
: Typo in warning; keep message clearFix “elemen” → “element”.
- console.warn('No power elemen'); + console.warn('No power element');
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
wled00/FX_fcn.cpp
(1 hunks)wled00/data/index.js
(4 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
wled00/data/**/*.{htm,html,css,js}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Use tabs for indentation in web UI files (.htm/.html/.css/.js) under wled00/data
Files:
wled00/data/index.js
wled00/**/*.{cpp,h}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Use spaces (2 per level) for C++ firmware files
Files:
wled00/FX_fcn.cpp
🧠 Learnings (4)
📚 Learning: 2025-04-24T09:31:06.879Z
Learnt from: blazoncek
PR: wled/WLED#4658
File: wled00/led.cpp:90-91
Timestamp: 2025-04-24T09:31:06.879Z
Learning: In the WLED codebase, the `getSegment()` function guards against out-of-bounds segment IDs, and `getFirstSelectedSegId()` falls back to `getMainSegmentId()` if no segments are selected, ensuring no crashes when used through the `setValuesFromFirstSelectedSeg()` macro.
Applied to files:
wled00/data/index.js
📚 Learning: 2025-04-24T09:31:06.879Z
Learnt from: blazoncek
PR: wled/WLED#4658
File: wled00/led.cpp:90-91
Timestamp: 2025-04-24T09:31:06.879Z
Learning: In the WLED codebase, the `getSegment()` function guards against out-of-bounds segment IDs by falling back to the main segment ID, and `getFirstSelectedSegId()` falls back to `getMainSegmentId()` if no segments are selected, ensuring no crashes when used through the `setValuesFromFirstSelectedSeg()` macro.
Applied to files:
wled00/data/index.js
📚 Learning: 2025-09-12T17:29:43.793Z
Learnt from: DedeHai
PR: wled/WLED#4923
File: wled00/FX.cpp:4883-4901
Timestamp: 2025-09-12T17:29:43.793Z
Learning: In WLED’s web UI, only one slider value (e.g., SEGMENT.intensity or SEGMENT.custom1) changes at a time; code relying on this may use simplified change guards, though presets/JSON can still update multiple fields atomically.
Applied to files:
wled00/data/index.js
📚 Learning: 2025-08-26T11:51:21.817Z
Learnt from: DedeHai
PR: wled/WLED#4798
File: wled00/FX.cpp:7531-7533
Timestamp: 2025-08-26T11:51:21.817Z
Learning: In WLED PR #4798, DedeHai confirmed that certain gamma-related calls in FX.cpp/FX_fcn.cpp/particle systems are intentional for effect-level shaping (e.g., brightness curves, TV sim, Pride 2015 pre-mix), distinct from final output gamma. Do not flag or remove these in future reviews; add comments when feasible to clarify intent.
Applied to files:
wled00/FX_fcn.cpp
🧬 Code graph analysis (1)
wled00/FX_fcn.cpp (2)
wled00/bus_manager.h (3)
isDigital
(141-142)isDigital
(180-180)is2Pin
(181-181)wled00/bus_manager.cpp (4)
add
(82-87)add
(82-82)add
(806-827)add
(806-806)
🔇 Additional comments (2)
wled00/data/index.js (2)
671-684
: Graceful defaults for missing LED metadata — LGTMDefensive checks for i.leds.count and i.leds.maxseg prevent UI crashes and use sane fallbacks. Indentation complies with tabs.
2280-2282
: rptSeg: safe read of power state — LGTMFallback to false when the power element is absent avoids exceptions.
Truncating too long buses is not OK -- it creates the weird behavior that the configuration "mutates" when saved by the user. (Why won't my settings stick??) I think what we need to do is create a placeholder bus object ( (We should probably also consider removing the |
very good point, I saw that and thought the same thing but did not think about a solution yet. Any suggestion in how to preserve the config? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would have fixed the count
issues in finalizeInit()
instead.
wled00/FX_fcn.cpp
Outdated
DEBUG_PRINTF_P(PSTR("Bus %d memory usage exceeds limit, setting count to %d\n"), (int)bus.type, bus.count); | ||
} | ||
if (BusManager::add(bus) != -1) { | ||
mem += bus.memUsage(Bus::isDigital(bus.type) && !Bus::is2Pin(bus.type) ? digitalCount : 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to estimate when you have bus already created.
Use BusManager::getBus(BusManager::getNumBusses()-1)->getBusSize();
instead.
Make up a new Bus subclass |
thanks, I need to process that. Working on heap/ram stuff RN so this is on backburner. If you have the time and interest, feel free to contribute directly to this PR. |
In the event that a Bus fails to initialize, or the memory validation fails, keep the configuration around so the settings contents don't change out from under the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
wled00/FX_fcn.cpp (1)
1198-1204
: Don’t stop initializing after the first placeholderBreaking on the first non-OK bus prevents later valid buses from being begun, causing unexpected loss of output. Continue past placeholders instead.
- if (!bus || !bus->isOk() || bus->getStart() + bus->getLength() > MAX_LEDS) break; + if (!bus) break; + if (!bus->isOk() || bus->getStart() + bus->getLength() > MAX_LEDS) continue;wled00/bus_manager.cpp (1)
820-844
: Channel-limit check ignores the new bus; enforce “would exceed” correctlyThe current test uses existing counts only, so it never triggers. Include the pending bus in the calculation.
- if (digital > WLED_MAX_DIGITAL_CHANNELS || analog > WLED_MAX_ANALOG_CHANNELS) placeholder = true; + const bool newIsDigital = Bus::isDigital(bc.type) && !Bus::is2Pin(bc.type); + const bool newIsAnalogPWM = Bus::isPWM(bc.type); + const unsigned newAnalogPins = newIsAnalogPWM ? Bus::numPWMPins(bc.type) : 0; + if ((digital + (newIsDigital ? 1 : 0)) > WLED_MAX_DIGITAL_CHANNELS || + (analog + newAnalogPins) > WLED_MAX_ANALOG_CHANNELS) { + placeholder = true; + }
🧹 Nitpick comments (1)
wled00/FX_fcn.cpp (1)
1749-1765
: Replace earlybreak
on non-OK buses withcontinue
(skip placeholders, don't mask later buses)Replace:
- if (!bus || !bus->isOk()) break; + if (!bus) break; + if (!bus->isOk()) continue;Apply in wled00/FX_fcn.cpp at lines 986, 1751, 1761, 1898.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
wled00/FX_fcn.cpp
(2 hunks)wled00/bus_manager.cpp
(5 hunks)wled00/bus_manager.h
(3 hunks)wled00/cfg.cpp
(1 hunks)wled00/xml.cpp
(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
wled00/**/*.{cpp,h}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Use spaces (2 per level) for C++ firmware files
Files:
wled00/xml.cpp
wled00/cfg.cpp
wled00/bus_manager.cpp
wled00/FX_fcn.cpp
wled00/bus_manager.h
🧠 Learnings (11)
📓 Common learnings
Learnt from: DedeHai
PR: wled/WLED#4939
File: wled00/FX_fcn.cpp:1176-1187
Timestamp: 2025-09-16T18:08:42.848Z
Learning: In WLED finalizeInit() bus creation (wled00/FX_fcn.cpp), intentionally allowing memory overruns when bus configurations exceed MAX_LED_MEMORY is a deliberate design choice. The trade-off prioritizes creating buses with reduced LED counts over completely failing to create buses, which would cause no LED output and UI failures. This approach forces users to update configurations after migrating to version 0.16 while maintaining basic functionality.
📚 Learning: 2025-09-16T18:08:42.848Z
Learnt from: DedeHai
PR: wled/WLED#4939
File: wled00/FX_fcn.cpp:1176-1187
Timestamp: 2025-09-16T18:08:42.848Z
Learning: In WLED finalizeInit() bus creation (wled00/FX_fcn.cpp), intentionally allowing memory overruns when bus configurations exceed MAX_LED_MEMORY is a deliberate design choice. The trade-off prioritizes creating buses with reduced LED counts over completely failing to create buses, which would cause no LED output and UI failures. This approach forces users to update configurations after migrating to version 0.16 while maintaining basic functionality.
Applied to files:
wled00/xml.cpp
wled00/cfg.cpp
wled00/bus_manager.cpp
wled00/FX_fcn.cpp
wled00/bus_manager.h
📚 Learning: 2025-08-26T11:51:21.817Z
Learnt from: DedeHai
PR: wled/WLED#4798
File: wled00/FX.cpp:7531-7533
Timestamp: 2025-08-26T11:51:21.817Z
Learning: In WLED PR #4798, DedeHai confirmed that certain gamma-related calls in FX.cpp/FX_fcn.cpp/particle systems are intentional for effect-level shaping (e.g., brightness curves, TV sim, Pride 2015 pre-mix), distinct from final output gamma. Do not flag or remove these in future reviews; add comments when feasible to clarify intent.
Applied to files:
wled00/FX_fcn.cpp
📚 Learning: 2025-08-28T08:09:20.630Z
Learnt from: mval-sg
PR: wled/WLED#4876
File: wled00/xml.cpp:0-0
Timestamp: 2025-08-28T08:09:20.630Z
Learning: The WLED codebase has opportunities for refactoring hardcoded array bounds (like the "15" used for DMX channels) to use sizeof(array)/sizeof(array[0]) for more maintainable code, but such changes should be done consistently across the entire codebase in a dedicated refactoring effort.
Applied to files:
wled00/FX_fcn.cpp
📚 Learning: 2025-04-30T05:41:03.633Z
Learnt from: blazoncek
PR: wled/WLED#4667
File: usermods/user_fx/user_fx.cpp:27-30
Timestamp: 2025-04-30T05:41:03.633Z
Learning: In WLED, the Segment::allocateData() method already includes optimization to check if data is allocated and sufficiently sized, handling buffer reuse to reduce memory fragmentation. Adding an external check like `if (SEGENV.data == nullptr && !SEGENV.allocateData(dataSize))` is unnecessary and could be problematic, as it bypasses proper size verification.
Applied to files:
wled00/FX_fcn.cpp
📚 Learning: 2025-08-31T03:38:14.114Z
Learnt from: BobLoeffler68
PR: wled/WLED#4891
File: wled00/FX.cpp:3333-3349
Timestamp: 2025-08-31T03:38:14.114Z
Learning: WLED PacMan effect (wled00/FX.cpp): Keep pacmancharacters_t position fields as signed int (not int16_t). Maintainer preference (blazoncek) prioritizes avoiding potential overhead/regressions over minor RAM savings. Avoid type shrinking here unless memory pressure is demonstrated.
Applied to files:
wled00/FX_fcn.cpp
📚 Learning: 2025-09-02T01:56:43.841Z
Learnt from: willmmiles
PR: wled/WLED#4890
File: lib/NeoESP32RmtHI/include/NeoEsp32RmtHIMethod.h:173-180
Timestamp: 2025-09-02T01:56:43.841Z
Learning: willmmiles prefers to maintain consistency with upstream NeoPixelBus patterns (like unchecked malloc in construct() methods) rather than diverging until improvements are made upstream first, to minimize maintenance burden and keep the codebase aligned.
Applied to files:
wled00/FX_fcn.cpp
📚 Learning: 2025-09-15T19:13:56.469Z
Learnt from: DedeHai
PR: wled/WLED#4791
File: wled00/util.cpp:737-743
Timestamp: 2025-09-15T19:13:56.469Z
Learning: In WLED's util.cpp, the *_realloc_malloc functions (p_realloc_malloc and d_realloc_malloc) are intentionally designed to free the original buffer on realloc failure and allocate a new buffer, implementing a "replace buffer" semantic rather than traditional realloc behavior. This is documented in the function comments and is the intended design by the author DedeHai.
Applied to files:
wled00/FX_fcn.cpp
📚 Learning: 2025-08-29T15:38:46.208Z
Learnt from: DedeHai
PR: wled/WLED#4791
File: wled00/FX_fcn.cpp:1187-1191
Timestamp: 2025-08-29T15:38:46.208Z
Learning: In WLED's allocate_buffer() function, BFRALLOC_ENFORCE_PSRAM already includes fallback logic to DRAM if PSRAM is not available, as documented in the comment "use PSRAM if available, otherwise fall back to DRAM". The function also uses validateFreeHeap() for additional safety checks. During setup() when finalizeInit() runs, PSRAM has vast available memory making failures unlikely.
Applied to files:
wled00/FX_fcn.cpp
📚 Learning: 2025-09-01T10:26:17.959Z
Learnt from: mval-sg
PR: wled/WLED#4876
File: wled00/wled_eeprom.cpp:0-0
Timestamp: 2025-09-01T10:26:17.959Z
Learning: In WLED PR #4876, the DMXStartLED EEPROM backward compatibility issue was partially addressed by keeping it at address 2550 and reading it as a 16-bit value, with DMXChannelsValue array moved to addresses 2552-2566. This maintains compatibility with pre-0.11 EEPROM layouts for DMXStartLED, though legacy "Set to 255" (code 6) configurations may still need migration logic.
Applied to files:
wled00/FX_fcn.cpp
📚 Learning: 2025-08-21T00:00:17.196Z
Learnt from: willmmiles
PR: wled/WLED#4859
File: usermods/udp_name_sync/udp_name_sync.cpp:62-70
Timestamp: 2025-08-21T00:00:17.196Z
Learning: In WLED codebase, UDP packet processing code must never rely on null termination being present on the wire and must always validate packet size first. The codebase already has some vulnerabilities where malformed data can crash the software, so avoiding introduction of new ones is a priority.
Applied to files:
wled00/FX_fcn.cpp
🧬 Code graph analysis (3)
wled00/bus_manager.cpp (1)
wled00/bus_manager.h (4)
BusPlaceholder
(368-395)getNumberOfPins
(161-161)isVirtual
(145-146)isVirtual
(184-184)
wled00/FX_fcn.cpp (2)
wled00/bus_manager.h (3)
isDigital
(141-142)isDigital
(180-180)is2Pin
(181-181)wled00/bus_manager.cpp (4)
add
(76-81)add
(76-76)add
(820-843)add
(820-820)
wled00/bus_manager.h (1)
wled00/bus_manager.cpp (15)
BusPlaceholder
(758-768)getPins
(320-324)getPins
(320-320)getPins
(575-580)getPins
(575-575)getPins
(647-651)getPins
(647-647)getPins
(716-719)getPins
(716-716)getPins
(770-776)getPins
(770-770)add
(76-81)add
(76-76)add
(820-843)add
(820-820)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: wled_build / Build Environments (usermods)
- GitHub Check: wled_build / Build Environments (esp32s3_4M_qspi)
- GitHub Check: wled_build / Build Environments (esp32s3dev_8MB_opi)
- GitHub Check: wled_build / Build Environments (lolin_s2_mini)
🔇 Additional comments (11)
wled00/xml.cpp (1)
317-317
: Good: allow placeholders to flow to UISwitching to a null-only break lets non-OK buses (placeholders) be serialized into settings JS. This aligns with the placeholder design.
Please confirm the UI correctly displays pins for placeholders even if PinManager::isPinOk(pin) returns false; otherwise consider printing stored pins unconditionally for placeholders.
wled00/cfg.cpp (1)
973-980
: Serialize non-OK buses too — matches placeholder approachDropping the isOk() guard here ensures placeholder buses get saved back to cfg.json, avoiding “mutating” config.
wled00/FX_fcn.cpp (1)
1810-1811
: LGTM: auto-segmentation tolerates placeholdersKeeping only a null check avoids skipping segments when a placeholder appears.
wled00/bus_manager.cpp (5)
758-769
: LGTM: BusPlaceholder preserves configConstructor copies all user config (pins, order, current, etc.). Defaults keep _valid=false, as intended.
770-776
: LGTM: getPins exposes stored pins for UI/serializationMatches Bus::getNumberOfPins(type).
933-936
: LGTM: avoid touching placeholders during ESP8266 LED_BUILTIN re-initAdded isOk() gate prevents placeholder interaction.
1027-1036
: LGTM: ABL setup skips placeholdersOnly OK digital buses get current limits.
1072-1077
: LGTM: ABL apply loops only real digital busesPrevents placeholder side effects.
wled00/bus_manager.h (3)
128-128
: Changed getLength() to always return _len — verify callersThis is necessary for placeholders, but check that length aggregations (e.g., BusManager::getTotalLength(false)) aren’t surfaced in places where “physical” length is expected. Use getTotalLength(true) or isOk() where appropriate.
366-395
: LGTM: Placeholder class surface is sufficient for UI and cfgNo-ops for IO, full accessors for config. Minor: matching the base default arg for getPins is optional, current override is fine.
499-499
: Signature change acknowledgedadd(const BusConfig&, bool) is fine; ensure all call sites use the new flag (you updated finalizeInit()).
@willmmiles IMO there is no need to complicate things with bus placeholder. The only thing needed to be done is ignore isOk() on some places and proceed without memory allocation and underlying physical bus initialisation. |
I did some experimenting in my dev branch and I think I have a good overall solution for ESP8266. It goes something like this:
still need to test how that behaves on ESP32 and maybe tweak it a little. For ESP8266 the new limit is about 720 WS281x LEDs, everything still works at that limit although a bit wonky sometimes. The user is actively informed about stability issues when setting more than about 300 LEDs. Once fully tested, I will push an update to this PR |
I considered that approach, but it would mean that all bus types would need have to have a "don't actually construct" argument, which struck me as rather silly. I found that using a clearly defined placeholder type was a cleaner implementation in the current code. I agree that there is a general problem of handling bus initialization failure that overlaps quite a bit here, but I think that needs to be approached in context of a bigger Bus API redesign. (I haven't done a full pass over it yet, but I'd like to give serious thought to elimininating |
I'm still hard against that. Either construct the bus or don't, but don't change the user's settings. |
doesn't your added code take care of that? i.e. truncating the created bus but keeping the config? sorry, did not yet have time to study in detail what you added. edit: the underlaying issue I am trying to address: ESP8266 on 0.16 will have a bad time (to the point where UI access becomes impossible) if you are coming in with a config using >1000 LEDs. Not creating the bus is one option, creating it but smaller would not leave users with a "its all bricked" moment. Not sure how edge-case this is though. |
I don't think so. If they fail to construct properly, they are just not ok. However, the approach here is that you want to check if memory exceeds MAX_LED_MEMORY before bus is created and it is done in I, personally, would just drop such bus and every bus after it. |
The posted code creates a fake bus that holds the settings but is inactive -- it doesn't produce any output. It seemed the best approach to me; making working busses that have different internal sizes than their settings would be complex to say the least. Re tightened requirements on 0.16, I think we should come at this from two fronts:
|
I was trying to make the point that allowing a "bus is not ok" state puts a performance cost on every bus access. If we do not allow "not ok" busses to exist at all -- replacing them with placeholders or the like -- then we don't ever have to check if the bus is ok, speeding up the core render loops.
I agree that the division of work between
We cannot drop busses as it destroys the user configuration. IMO "my configuration didn't save" is always a bug. |
I have a new approach/suggestion whicht might not require to divert or drop any buses but its a bit of a stretch: accept all buses if memory permits (or even if it does not really). Let the heap checker in the main loop handle the rest. |
Unfortunately, if it drops all busses, none of them will appear in the settings page: the bus list there is generated from the live BusManager state. |
since the major change in this matter are the added buffers, simply dropping those should work. The heap checker currently drops all segments but if the culprit is the global Edit: this works if (heap < MIN_HEAP_SIZE && lastHeap < MIN_HEAP_SIZE) {
DEBUG_PRINTF_P(PSTR("Heap too low! %u\n"), heap);
if(heapPanic) {
DEBUG_PRINTLN(F("!Heap panic reset!"));
strip.~WS2812FX(); // deallocate strip and all its memory
new(&strip) WS2812FX(); // re-create strip object, respecting memory limits
}
forceReconnect = true;
strip.resetSegments(); // remove all but one segments from memory
heapPanic = true; // enable last resort measures on next call if heap is still low
} else if (heap < MIN_HEAP_SIZE) {
DEBUG_PRINTLN(F("Heap low, purging segments."));
strip.purgeSegments();
} else {
heapPanic = false;
} Edit2:
Edit3: |
Right, I don't think you can have both. Either you seperately store "configured" vs "live" lengths, OR you have to null out buses that can't be allocated. I really don't think we should be trying to support "partially functional" buses. Also, if we override the user config, we open ourselves up to "why doesn't my configuration work??? I saved 900 LEDs but only 30 light up??" if there's ever a bug in the JS validation. I applaud that you're pushing for graceful degredation. What would change my mind on this is a better solution for posting error messages back to the main UI. Something that would stay on screen (without covering up parts of the interface) and clearly indicate what the system is unhappy about, what recovery actions were taken, and where to go to fix it. Otherwise I think sophisticated recovery behaviour will be more confusing to users than helpful.
I'm on board with the heap guard fix, but I do think it's worth pointing out that the impact of it is the same as dropping the bus: the LEDs will stay dark if there isn't enough memory to allocate |
The way I see it is once again a question of "how do we anger the least users" :)
what I did not try yet is if the heap is enough to do an OTA downgrade. IMHO this would be a good approach to solve this particular case. |
We're in agreement on that!
I'm worried that seeing it light the LEDs in orange might give the false impression that we're applying some artificial limit in the config page. After all, clearly the software is capable of lighting them up -- so why can't I use them? Still I think we're largely in agreement in the overall thrust of the solution. Lighting the LEDs or not when under memory pressure is negotiable.
+1 - we should guarantee that at least.
I think we should set the "hard" limits where things will work with no layering or transitions. Those are fun features but at least some real-world setups can get by without them. In the code, we should do what we can to degrade gracefully (eg. cancel transitions first if we're out of memory for segments; have a strong and well-documented ordering for discarding segments if there's insufficient memory for all of them; etc.) and provide some guidance on the "definitely safe" limits on the web site. |
Short answer seems to be that this will need work... I wasn't intending to test this, but I ran in to it with #4930 since I was just using the same 8266 I was last testing for this PR, configured for 500 LEDs. It triggered the low heap handler during the upload and reset the wifi. :( |
how long did that OTA take? I just tested with my enhanced code and OTA works as long as it takes less than 45 seconds. could up that limit easily. Let me summarize what we discussed and the options as I see them:
To me, any of the options are possible, personally I think the last one might be the least disruptive but would collide with the dummy approach which has other benefits in the future (no isOK() checking). I do agree here that users need more info on what is going on. We could add a slightly more sophisticated "error" log. Instead of doing the current short "toast" message, an error / symbol could be added to the UI, if clicked, it opens an error log of the last x error messages that could be a bit more descriptive. Extending the |
I didn't time it; I had a bunch of extra serial logging enabled to track the validation process as the ESP8266 validation has to be done partway through the upload. It wasn't super fast but I doubt it was a full 45 seconds. In any event, I'm not ok with "fails if your wifi isn't fast enough". Using lots of memory during OTA is normal and expected -- that's why it clears all the segments first. I'll figure out a way to cue the heap recovery to please leave the wifi alone if an OTA is in progress, and ensure the OTA failure timeouts are short enough that it'll recover quickly if there's a real problem with the OTA itself. (One of the fixes in the WIP patch is to guarantee that regular processing is restarted if the OTA aborts halfway - the original code wouldn't always recover.)
(edited to add numbers) My preference order would be 2, then 4; I still think "doesn't work at all" is a clearer indication that the configuration is untenable than "seems to work. then stops". I don't think 1 or 3 are good plans; anything that overwrites the user config is bad. (I for one don't usually remember the exact count of LEDs in a particular build - if I don't remember to back up the config, I end up doing "does 320 light them all? No? How about 350? OK, maybe 335?") I'd also note that 4 needs a fallback to 2 if even the bus creation fails (ie. someone uploads a bad backup file that has so many LEDs that we can't even construct the buses and buffers once). Ultimately at the code level the difference between the two is only "what is the hard limit at bus creation time" -- the code itself between 2 and 4 should otherwise be identical. (Or to put it another way, I think your improvements to the heap recovery are a good idea regardless of what we do for bad config handling.)
We should probably open a separate issue for an improved error UI and sort out the details elsewhere. |
fully agreed, the heap recovery needs a "gate". I don't know enough about the OTA handler, it might have "is in progress" check.
you guess is as good as mine :) I have no strong preference, users might have.
absolutely. I did not poke at "bugged" cases, just the real set limits. And also some UM's might hog a ton of static memory or heap which we cannot anticipate.
correct.
right. So I think we are in agreement. Shall I go ahead an merge my local changes to this PR? that would be:
in order for these changes to effective, #4928 should be merged first. We can still adjust that PR later if there are any bugs in it, they just need to be added to the 0.15 version PR as well. |
I've put a review on #4928. After those fixes, yes please go ahead and merge your updates here! |
@willmmiles I added in my improvements to the low-heap handler in main loop. |
I fixed the update cleanup handling in #4960, it'll run no matter why the update failed. (Although deleting the segments to free RAM is, alas, currently unrecoverable.) I'd also added an "update in progress" test to the heap check -- I went the other way and had the heap checker inspect the Merge conflicts, here we come... ;)
Ultimately I want to go the other way around -- I don't think we should ever have any
Agreed. Personally I'd also like to clean up the duplication of the memory estimation code in C++ and JS. Later I think we should add an HTTP API endpoint for letting the JS call the C++ code to do the memory estimation. Hopefully should be straightforward after the bus management work is done. |
much better way of doing it. I will remove my crude "hold up with that reset" for one less conflict ;)
sounds like a good goal. I will leave the check in for now.
Fully agree. I also thought about that but could not come up with a good plan on how to do it, my JS / HTTP / WS knowledge is very limited. So is this good to merge after I remove the connection reset blocker mentioned above? |
I haven't been able to do a thorough review or test it, but the overall approach seems OK. |
no problem, I'll wait with merging until you approve. |
fixes #4936
this is mostly what codepilot came up with.
Summary by CodeRabbit
New Features
Bug Fixes