Comments (30)

tbnobody avatar tbnobody commented on July 26, 2024

Could you please double-check the heap usage on the system overview if this error occurs again?

from opendtu.

HacksBugsAndRockAndRoll avatar HacksBugsAndRockAndRoll commented on July 26, 2024

image

image

Also, the DTU is currently still in this state. Is there anything else I should look into?

tbnobody avatar tbnobody commented on July 26, 2024

The heap usage looks a little high, but not by much. I am currently seeing something between 108 and 110 kB.

You mentioned in the other issue that this only occurs on one of your two ESPs. Have you tried to reflash this one again (after a complete flash erase)?

Do you see anything special in the serial console before this issue happens?

HacksBugsAndRockAndRoll avatar HacksBugsAndRockAndRoll commented on July 26, 2024

I opened the ticket because I thought it was the ESP hardware, but now the second ESP also shows this behavior.
Both ESPs were flash-erased ~3 days ago.
The USB port of this ESP is currently connected to power only, since the inverter's signal does not reach my PC.
I monitored the heap usage by polling every ~30 s; see here (it captured some spikes, but the heap always seems to recover fine):
image

tbnobody avatar tbnobody commented on July 26, 2024

Today, before I flashed a new version, the uptime was > 2 days without any issues. So maybe you have some kind of different config. Are you using MQTT with TLS?

HacksBugsAndRockAndRoll avatar HacksBugsAndRockAndRoll commented on July 26, 2024

No, just plain MQTT with HA discovery.

HacksBugsAndRockAndRoll avatar HacksBugsAndRockAndRoll commented on July 26, 2024

It seems to appear only with MQTT enabled, so I dug a bit into that code and found that #86 might be related.

HacksBugsAndRockAndRoll avatar HacksBugsAndRockAndRoll commented on July 26, 2024

Finally, the issue also appeared on a DTU without MQTT enabled; this was after 3d17h of uptime. The DTUs with MQTT enabled still seem to fail faster.

helgeerbe avatar helgeerbe commented on July 26, 2024

Found this in "WebApi_ws_live.cpp" WebApiWsLiveClass::loop()

    DynamicJsonDocument root(40960);
    JsonVariant var = root;
    generateJsonResponse(var);

    size_t len = measureJson(root);
    AsyncWebSocketMessageBuffer* buffer = _ws.makeBuffer(len); //  creates a buffer (len + 1) for you.
    if (buffer) {
        serializeJson(root, (char*)buffer->get(), len + 1);
        _ws.textAll(buffer);
    }

It seems to me that the buffer is created with the size of the JSON document, but serializeJson() is then called with a size of len + 1.
That does not look correct to me. The comment suggests that earlier code allocated len + 1.

helgeerbe avatar helgeerbe commented on July 26, 2024

OK, ignore my comment. I have since learned that makeBuffer() does indeed create a buffer of length len + 1.

stefan123t avatar stefan123t commented on July 26, 2024

@HacksBugsAndRockAndRoll are you running two ESPs in parallel, and do you at least use distinct DTU_RADIO_IDs for them?

HacksBugsAndRockAndRoll avatar HacksBugsAndRockAndRoll commented on July 26, 2024

Yes to both of these questions.

stefan123t avatar stefan123t commented on July 26, 2024

Can you confirm that this issue also occurs if you are running only one ESP?
We have reports that two ESPs BOTH querying a single inverter can cause misattribution and misinterpretation of replies.
Can you supply serial logs where this is clearly wrong, or where it happens in the payload parser / decoding?
Do we need to add some logging to the payload parser / decoding to detect such misinterpretation / misattribution?

tbnobody avatar tbnobody commented on July 26, 2024

There should be no misinterpretation of the data when using two different DTU IDs, because one DTU wouldn't see the packets of the other one. Unless... How different are the DTU IDs? The RF packet only contains the lowest 4 bytes of the ID. If these bytes are identical, there might be an issue. (But the chance is very small, because there are 3 different CRC checksums which have to match.)
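
To make the "lowest 4 bytes" point concrete, here is a small sketch. It assumes that "lowest 4 bytes" means the least significant 32 bits of a 64-bit ID; radioAddress and collidesOnAir are hypothetical names, and the actual packet layout and byte order are not shown in this thread.

```cpp
#include <cstdint>

// Hypothetical helper: keeps only the lowest 4 bytes (32 bits) of a DTU ID,
// which, per the comment above, is the part carried in the RF packet.
constexpr std::uint32_t radioAddress(std::uint64_t dtuId)
{
    return static_cast<std::uint32_t>(dtuId & 0xFFFFFFFFULL);
}

// True if two distinct DTU IDs would look identical on the air.
constexpr bool collidesOnAir(std::uint64_t a, std::uint64_t b)
{
    return a != b && radioAddress(a) == radioAddress(b);
}
```

For example, 0x199978563412 and 0x299978563412 are different IDs but share the low 32 bits 0x78563412, so they would collide under this assumption.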

petrm avatar petrm commented on July 26, 2024

I observe the same issue. Restarting the ESP makes it go away. It appears randomly after 1-3 days of uptime. When it happens, the DTU is accessible via the web interface, but stops polling the inverters.

tbnobody avatar tbnobody commented on July 26, 2024

@petrm which inverter(s) are you using? Are you using MQTT? If yes, with what configuration? Are you using a DTU-Lite or Pro in parallel? What is your currently installed Git hash (Info --> System)?

tbnobody avatar tbnobody commented on July 26, 2024

@petrm and @HacksBugsAndRockAndRoll do you have a chance to log the output of the serial console for a longer time? It would be interesting to see what happens just before this issue (if there is something special in the serial console).
I also added some additional debug output to the startup sequence in the meantime. This output would also be interesting.

I am still trying to reproduce this issue. Are you doing anything special (e.g. polling the web API using curl, or something else which I may not have in mind currently)? I am rebooting my ESP regularly because of development work, but I also reach uptimes of 5-10 days without problems.

petrm avatar petrm commented on July 26, 2024

Since there is no way to get any log remotely, I can only attach the serial console when I am back in about three weeks.
I have a new state of the device: this time it can read the info from the inverter, it shows it in the UI, but sends it out corrupted to MQTT. When I go to the configuration page, the serial number there is displayed correctly, but inverter can't be identified. This would suggest that the serial number displayed there is read from a different place in the memory than the one actually used to query the inverter.
What would help:

  1. add option to reboot remotely as a workaround
  2. maybe add some debugging output also to the UI

tbnobody avatar tbnobody commented on July 26, 2024

> I have a new state of the device: this time it can read the info from the inverter, it shows it in the UI, but sends it out corrupted to MQTT. When I go to the configuration page, the serial number there is displayed correctly, but inverter can't be identified. This would suggest that the serial number displayed there is read from a different place in the memory than the one actually used to query the inverter.

This is absolutely correct. There is a config structure which stores the serial number etc., but when showing the type or exporting MQTT data, the internal data structures of the Hoymiles library are used. There is a vector which stores the inverters inside the Hoymiles lib:

std::vector<std::shared_ptr<InverterAbstract>> _inverters;

Somewhere in the code, part of this structure gets overwritten (and therefore any subsequent behavior is totally random). But this does not happen for all users. I would suspect some issue with the packet parser, but without knowing the exact received packets it's a little hard to analyze. (And due to the memory corruption, a web output might be wrong as well.)
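
As an illustration of the two storage locations described above, the sketch below keeps one serial in a config entry and one in a runtime object, and checks whether they still agree. InverterConfig, Inverter, and runtimeMatchesConfig are simplified, hypothetical stand-ins; the real OpenDTU config structure and InverterAbstract class look different. Note that such a check can only catch data-level mismatches: if the corruption damages the pointers themselves, dereferencing them is already undefined behavior.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Simplified stand-in for an entry of the persistent configuration.
struct InverterConfig {
    std::uint64_t serial;
};

// Simplified stand-in for a runtime inverter object held by the library.
struct Inverter {
    explicit Inverter(std::uint64_t s) : serial(s) {}
    std::uint64_t serial;
};

// Returns true if every runtime inverter still carries the serial that the
// configuration holds at the same index.
bool runtimeMatchesConfig(const std::vector<InverterConfig>& config,
                          const std::vector<std::shared_ptr<Inverter>>& inverters)
{
    if (config.size() != inverters.size()) {
        return false;
    }
    for (std::size_t i = 0; i < config.size(); ++i) {
        if (!inverters[i] || inverters[i]->serial != config[i].serial) {
            return false;
        }
    }
    return true;
}
```

A mismatch reported by such a check would match the symptom from the thread: the configuration page shows the correct serial while polling and MQTT use a damaged one.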

HacksBugsAndRockAndRoll avatar HacksBugsAndRockAndRoll commented on July 26, 2024

Since I do not have a computer in a location where the DTU is in range of the inverter, I'll need to set something up, probably with a Raspberry Pi; this might take some time.
On a side note: the corruptions also happen during the night, when the inverter is offline. See the attached graph, where I track the uptime of the devices by polling the REST API at 30 s intervals.

To further explain this: I have my fork running on these two DTUs. The only change I made is a regular check for corrupted DTU serials in the configuration; if a corruption is found, a restart is triggered ( https://github.com/HacksBugsAndRockAndRoll/OpenDTU/blob/local/configfix-workaroud/src/ConfigFix.cpp - I know this is not the solution to the problem, but it is what allows me to use OpenDTU for my "productive" setup as long as the bug exists ).

The blue line shows the device with MQTT enabled, which seems to increase the chance of corruption; this device also shows the corruptions during the night.

image

tbnobody avatar tbnobody commented on July 26, 2024

If the corruption also occurs during the night, it's maybe not related to the response of the inverter. But then it should be sufficient to place one of your ESPs out of range of the inverter but connected to a computer to get the serial output.

tbnobody avatar tbnobody commented on July 26, 2024

Can you maybe download your config file (Settings --> Config Management), open the .bin file using a hex editor, overwrite your WiFi password with X (do not change the length of the file, just overwrite the characters) and provide this in some way? Then I can import your config with all applied settings and see if this issue occurs.
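
If a hex editor is not at hand, the same length-preserving overwrite can be sketched in code. This is an illustration only: maskInPlace is a hypothetical helper, and it assumes the password is stored as plain bytes somewhere in the file, which is implied by the instructions above but not otherwise confirmed here.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Overwrites every occurrence of `secret` in `data` with 'X' characters of the
// same length, so the file size and all other offsets stay unchanged.
// Returns the number of occurrences that were masked.
std::size_t maskInPlace(std::vector<unsigned char>& data, const std::string& secret)
{
    if (secret.empty()) {
        return 0;
    }
    // Compare bytes as unsigned so that non-ASCII characters match as well.
    const auto sameByte = [](unsigned char a, char b) {
        return a == static_cast<unsigned char>(b);
    };
    std::size_t count = 0;
    auto it = data.begin();
    while (true) {
        it = std::search(it, data.end(), secret.begin(), secret.end(), sameByte);
        if (it == data.end()) {
            break;
        }
        std::fill_n(it, secret.size(), static_cast<unsigned char>('X'));
        it += static_cast<std::ptrdiff_t>(secret.size());
        ++count;
    }
    return count;
}
```

Because the bytes are replaced one for one, the rest of the binary layout is left untouched, which is what the "do not change the length of the file" requirement is about.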

HacksBugsAndRockAndRoll avatar HacksBugsAndRockAndRoll commented on July 26, 2024

Sure, I'll have a look. So far I can report that my setup in my room (no inverter connectivity but active MQTT) corrupted only once since last weekend. Unfortunately I did not get any meaningful logs, since the reboot did not wait for the serial output to flush. Since I fixed this 4 days ago (Serial.flush(), then reboot), no corruption has happened on this device. I can, however, move the whole setup into inverter range now, since I found a Raspberry Pi to attach to it.
My "productive" device had several restarts triggered in the meantime, so I hope that actually processing radio signals will also increase the corruption frequency on my test setup.
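
The flush-before-reboot ordering mentioned above can be sketched with a small stand-in. BufferedPort and restartWithReason are hypothetical and only simulate a buffered serial port on a desktop compiler; on the ESP32 Arduino core the corresponding calls would be Serial.flush() followed by ESP.restart().

```cpp
#include <string>

// Hypothetical stand-in for a buffered serial port: text is queued in memory
// and only reaches the "wire" when flush() is called.
class BufferedPort {
public:
    void print(const std::string& text) { pending_ += text; }
    void flush()
    {
        wire_ += pending_;
        pending_.clear();
    }
    // Simulates a reboot: anything still queued is lost.
    void reset() { pending_.clear(); }
    const std::string& wire() const { return wire_; }

private:
    std::string pending_;
    std::string wire_;
};

// Logs the reason and flushes before resetting, so the message survives.
void restartWithReason(BufferedPort& port, const std::string& reason)
{
    port.print("restart: " + reason + "\n");
    port.flush(); // stands in for Serial.flush()
    port.reset(); // stands in for ESP.restart()
}
```

Without the flush() call, the last log line, which is usually the most interesting one, never leaves the buffer.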

petrm avatar petrm commented on July 26, 2024

Here is my config
config.zip
I upgraded to https://github.com/tbnobody/OpenDTU/commits/41758ba and no crash for 6 days.

petrm avatar petrm commented on July 26, 2024

Short update. I now have 20 days uptime with d7fe495 and so far stable, no suspicious log messages or errors.

stefan123t avatar stefan123t commented on July 26, 2024

@tbnobody I haven't looked into how OpenDTU handles the Serial.flush() buffer. But in AhoyDTU I also have some suspicion that we may need to flush the serial buffers from time to time in order not to run into a buffer overflow.
@HacksBugsAndRockAndRoll did you try the version that @petrm has tested for 20 days as rock-solid?

HacksBugsAndRockAndRoll avatar HacksBugsAndRockAndRoll commented on July 26, 2024

Currently I do not have a whole lot of time for this project. I can say that I am running https://github.com/tbnobody/OpenDTU/commits/59b87c5 which is a slightly adjusted (self-reset on corrupted config) version of 9a44324, and I still see the self-resets being triggered.

image

I'll need to rebase my stuff and update some time.

stefan123t avatar stefan123t commented on July 26, 2024

@HacksBugsAndRockAndRoll as we are unable to reproduce this issue on other devices,
could you check whether it still occurs with a newer build, OpenDTU v23.12.19 or later?

tbnobody avatar tbnobody commented on July 26, 2024

I would close this issue, as it's really old and there have been a lot of code iterations since. Please open a new one if the problem occurs again.

github-actions avatar github-actions commented on July 26, 2024

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new discussion or issue for related concerns.
