Comments (30)
Could you please double-check the heap usage on the system overview if this error occurs again?
from opendtu.
Also, the DTU is currently still in this state. Is there anything else I should look into?
from opendtu.
The heap usage looks a little high, but not too much. I am currently seeing something between 108 and 110 kB.
You mentioned in the other issue that this only occurs on one of your two ESPs. Have you tried reflashing this one (after a complete flash erase)?
Do you see anything unusual in the serial console before this issue happens?
from opendtu.
I opened the ticket because I thought it was the ESP hardware, but now the second ESP also shows this behavior.
Both ESPs were flash erased ~3 days ago.
This ESP's USB port is currently connected to power only, since the inverter's signal does not reach my PC.
I monitored the heap usage by polling every ~30 s, see here (it captured some spikes, but the heap always seems to recover fine).
from opendtu.
Today, before I flashed a new version, the uptime was > 2 days without any issues. So maybe you have a different configuration. Are you using MQTT with TLS?
from opendtu.
No, just plain MQTT with HA discovery.
from opendtu.
It seems to appear only with MQTT enabled, so I dug a bit into that code and found that #86 might be related.
from opendtu.
Finally, the issue also appeared on a DTU without MQTT enabled, after 3d17h of uptime. The DTUs with MQTT enabled still seem to fail faster.
from opendtu.
Found this in "WebApi_ws_live.cpp" WebApiWsLiveClass::loop()
    DynamicJsonDocument root(40960);
    JsonVariant var = root;
    generateJsonResponse(var);
    size_t len = measureJson(root);
    AsyncWebSocketMessageBuffer* buffer = _ws.makeBuffer(len); // creates a buffer (len + 1) for you.
    if (buffer) {
        serializeJson(root, (char*)buffer->get(), len + 1);
        _ws.textAll(buffer);
    }
It seems to me that the buffer is created with the size of the JSON document, but serializeJson() is given buffer size + 1.
That does not look correct to me. The comment indicates that earlier code allocated len + 1.
from opendtu.
OK, ignore my comment. I had to learn that makeBuffer() indeed creates a buffer of length len + 1.
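The length arithmetic from this exchange can be checked without the web server. Below is a minimal sketch in plain C++. `makeBufferSketch` and `serializeIntoBuffer` are hypothetical stand-ins, not the ESPAsyncWebServer or ArduinoJson API; the only behavior assumed is the one stated in the code comment above, namely that the buffer is allocated with len + 1 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical stand-in for _ws.makeBuffer(len): allocates len + 1 bytes so
// that a terminating '\0' fits after len payload characters.
std::vector<char> makeBufferSketch(std::size_t len) {
    return std::vector<char>(len + 1, '\0');
}

// Writes `json` into a buffer sized from its measured length and returns the
// number of payload characters written (terminator not counted).
std::size_t serializeIntoBuffer(const std::string& json, std::vector<char>& out) {
    const std::size_t len = json.size(); // plays the role of measureJson()
    out = makeBufferSketch(len);
    // Passing len + 1 as the capacity is fine here, because the buffer
    // really owns len + 1 bytes.
    const int written = std::snprintf(out.data(), len + 1, "%s", json.c_str());
    assert(written >= 0 && static_cast<std::size_t>(written) == len);
    return static_cast<std::size_t>(written);
}
```

If the buffer held only len bytes, the same call would write one byte past the end, which is the kind of silent heap corruption the original comment was worried about.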
from opendtu.
@HacksBugsAndRockAndRoll are you running two ESPs in parallel, and do you at least use distinct DTU_RADIO_IDs for them?
from opendtu.
Yes to both of these questions.
from opendtu.
Can you confirm this issue also occurs if you are running only one ESP?
We have reports that two ESPs both querying a single inverter can cause misattribution and misinterpretation of replies.
Can you supply serial logs where this is clearly wrong, or where it happens in the payload parser / decoding?
Do we need to add some logging to the payload parser / decoding to detect such misinterpretation / misattribution?
from opendtu.
There should be no misinterpretation of the data when using two different DTU IDs, because one DTU wouldn't see the packets of the other one. Unless... how different are the DTU IDs? The RF packet only contains the lowest 4 bytes of the ID. If these bytes are identical, there might be an issue. (But the chance is very small, because there are 3 different CRC checksums which have to match.)
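To check two DTU IDs against this condition, the comparison can be sketched as follows. The function names are hypothetical and not part of OpenDTU; the only fact taken from the comment above is that the lowest 4 bytes of the ID end up in the RF packet.

```cpp
#include <cstdint>

// Returns the lowest 4 bytes (32 bits) of a 64-bit DTU ID.
constexpr std::uint32_t lowFourBytes(std::uint64_t id) {
    return static_cast<std::uint32_t>(id & 0xFFFFFFFFULL);
}

// True if two DTU IDs would be indistinguishable on the radio link because
// their lowest 4 bytes are identical.
constexpr bool radioIdsCollide(std::uint64_t a, std::uint64_t b) {
    return lowFourBytes(a) == lowFourBytes(b);
}
```

Two IDs that differ only in their upper bytes collide, while IDs that differ anywhere in the lowest 4 bytes do not.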
from opendtu.
I observe the same issue. Restarting the ESP makes it go away. It appears randomly after 1-3 days of uptime. When it happens, the DTU is accessible via the web interface, but stops polling the inverters.
from opendtu.
@petrm which inverter(s) are you using? Are you using mqtt? If yes, what configuration? Are you using a DTU-Lite or Pro in parallel? What is your current installed Git Hash (Info --> System)?
from opendtu.
@petrm and @HacksBugsAndRockAndRoll do you have a chance to log the output of the serial console over a longer period? It would be interesting to see what happens just before this issue (if there is anything unusual in the serial console).
I also added some additional debug output to the startup sequence in the meantime. This output would be interesting as well.
I am still trying to reproduce this issue. Are you doing anything special (e.g. polling the web API using curl, or something else I may not have in mind)? I reboot my ESP regularly because of development work, but I also reach uptimes of 5-10 days without problems.
from opendtu.
Since there is no way to get any log remotely, I can only attach the serial console when I am back in about three weeks.
I have a new state of the device: this time it can read the info from the inverter, it shows it in the UI, but sends it out corrupted to MQTT. When I go to the configuration page, the serial number there is displayed correctly, but inverter can't be identified. This would suggest that the serial number displayed there is read from a different place in the memory than the one actually used to query the inverter.
What would help:
- add option to reboot remotely as a workaround
- maybe add some debugging output also to the UI
from opendtu.
> I have a new state of the device: this time it can read the info from the inverter, it shows it in the UI, but sends it out corrupted to MQTT. When I go to the configuration page, the serial number there is displayed correctly, but inverter can't be identified. This would suggest that the serial number displayed there is read from a different place in the memory than the one actually used to query the inverter.
This is absolutely correct. There is a config structure which stores the serial number etc., but when showing the type or exporting MQTT data, the internal data structures of the Hoymiles library are used. There is a vector which stores the inverters inside the Hoymiles lib:
OpenDTU/lib/Hoymiles/src/Hoymiles.h
Line 30 in 9a44324
Somewhere in the code, part of this structure gets overwritten (and therefore any subsequent behavior is essentially random). But this does not happen for all users. I would suspect an issue with the packet parser, but without knowing the exact received packets it is a bit hard to analyze. (And due to the memory corruption, the web output might be wrong as well.)
from opendtu.
Since I do not have a computer in a location where the DTU is in range of the inverter, I'll need to set something up, probably with a Raspberry Pi. This might take some time.
On a side note: the corruptions also happen during the night, when the inverter is offline. See the attached graph, where I track the uptime of the devices by polling the REST API at 30 s intervals.
To explain this further: I have my fork running on these two DTUs. The only change I made is a regular check for corrupted DTU serials in the configuration; if a corruption is found, a restart is triggered ( https://github.com/HacksBugsAndRockAndRoll/OpenDTU/blob/local/configfix-workaroud/src/ConfigFix.cpp - I know this is not the solution to the problem, but it is what allows me to use OpenDTU for my "productive" setup as long as the bug exists ).
The blue line shows the device with MQTT enabled, which seems to increase the chance of corruption. This device also shows the corruptions during the night.
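A "detect corruption, then restart" workaround of this kind can be sketched as below. This is not the code from the linked ConfigFix.cpp. `InverterConfigSketch`, `fnv1a` and `ConfigGuardSketch` are hypothetical names, and the checksum approach is just one assumed way to notice that config bytes changed without a save.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified stand-in for one inverter entry in the config.
// The layout has no padding bytes, so hashing the raw struct is deterministic.
struct InverterConfigSketch {
    std::uint64_t serial;
    char name[16];
};

// FNV-1a hash over raw bytes; used only as a cheap change detector.
std::uint32_t fnv1a(const void* data, std::size_t len) {
    const auto* bytes = static_cast<const unsigned char*>(data);
    std::uint32_t hash = 2166136261u;
    for (std::size_t i = 0; i < len; ++i) {
        hash ^= bytes[i];
        hash *= 16777619u;
    }
    return hash;
}

class ConfigGuardSketch {
public:
    // Call right after the config was loaded or deliberately saved.
    void arm(const InverterConfigSketch& cfg) {
        _expected = fnv1a(&cfg, sizeof(cfg));
    }

    // Call periodically; true means the bytes changed since arm().
    bool isCorrupted(const InverterConfigSketch& cfg) const {
        return fnv1a(&cfg, sizeof(cfg)) != _expected;
    }

private:
    std::uint32_t _expected = 0;
};
```

On the device, the caller would log the finding, call Serial.flush() so the message is not lost, and then restart. Like the fork described above, this only hides the symptom and does not locate the code that overwrites the memory.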
from opendtu.
If the corruption also occurs during the night, it may not be related to the response of the inverter. In that case it should be sufficient to place one of your ESPs out of range of the inverter, but connected to a computer, to get the serial output.
from opendtu.
Can you maybe download your config file (Settings --> Config Management), open the .bin file using a hex editor, overwrite your WiFi password with X (do not change the length of the file, just overwrite the characters) and provide this in some way? Then I can import your config with all applied settings and see if this issue occurs.
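Instead of editing by hand in a hex editor, the same masking can be sketched in code. `maskSecret` is a hypothetical helper, and it assumes the password is stored as plain bytes in the file; the format of the .bin file itself is not described here.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Overwrites every occurrence of `secret` in `data` with 'X' and returns the
// number of occurrences masked. Bytes are replaced in place, so the length of
// the data never changes.
std::size_t maskSecret(std::vector<std::uint8_t>& data, const std::string& secret) {
    if (secret.empty()) {
        return 0;
    }
    const auto sameByte = [](std::uint8_t a, char b) {
        return a == static_cast<std::uint8_t>(b);
    };
    const auto step = static_cast<std::ptrdiff_t>(secret.size());
    std::size_t count = 0;
    auto it = data.begin();
    while (true) {
        it = std::search(it, data.end(), secret.begin(), secret.end(), sameByte);
        if (it == data.end()) {
            break;
        }
        std::fill(it, it + step, static_cast<std::uint8_t>('X'));
        it += step;
        ++count;
    }
    return count;
}
```

A return value of 0 means the password was not found as plain bytes, in which case the file should not be shared without checking it manually.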
from opendtu.
Sure, I'll have a look. So far I can report that my setup in my room (no inverter connectivity, but active MQTT) corrupted only once since last weekend. Unfortunately I did not get any meaningful logs, since the reboot did not wait for the serial output to flush. Since I fixed this 4 days ago (Serial.flush(), then reboot), no corruption has happened on this device. I can, however, move the whole setup into inverter range now, since I found a Raspberry Pi to attach to it.
My "productive" device had several restarts triggered in the meantime, so I hope that actually processing radio signals will also increase the corruption frequency on my test setup.
from opendtu.
Here is my config
config.zip
I upgraded to https://github.com/tbnobody/OpenDTU/commits/41758ba and no crash for 6 days.
from opendtu.
Short update. I now have 20 days uptime with d7fe495 and so far stable, no suspicious log messages or errors.
from opendtu.
@tbnobody I haven't looked into OpenDTU regarding Serial.flush() and the serial buffer. But in AhoyDTU I also suspect that we may need to flush the serial buffers from time to time to avoid a buffer overflow.
@HacksBugsAndRockAndRoll did you try the version that @petrm has been running rock-solid for 20 days?
from opendtu.
Currently I do not have a whole lot of time for this project. I can say that I am running https://github.com/tbnobody/OpenDTU/commits/59b87c5, which is a slightly adjusted version (self-reset on corrupted config) of 9a44324, and I still see the self-resets being triggered.
I'll need to rebase my changes and update at some point.
from opendtu.
@HacksBugsAndRockAndRoll as we are unable to reproduce this issue on other devices,
could you check whether it still occurs with a newer build, OpenDTU v23.12.19 or later?
from opendtu.
I will close this issue, as it is really old and there were a lot of code iterations. Please open a new one if the problem occurs again.
from opendtu.
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new discussion or issue for related concerns.
from opendtu.