Comments (38)

bsrkhan commented on September 25, 2024

Been up for 8 days, running with no code changes. When I hit 14 days of uptime, I'll try the python script with my changes.

from acap-computer-vision-sdk-examples.

Corallo commented on September 25, 2024

Hi @bsrkhan
We think we identified and corrected the problem that seems to be in the inference server.
Unfortunately you can't correct it by yourself as the source of it is still not public. We need to do tests to verify that the issue is correctly solved, and then we'll publish a patch.

Corallo commented on September 25, 2024

Hi @bsrkhan
Thanks for reporting the bug.

So if I understood correctly, after running the example for a long time, you get a gRPC error in your log instead of the detections coming from the inference server. Is the status of the gRPC message UNAVAILABLE?
If so, could you check the following:

  • Run docker -H camera_ip ps to see if the inference server container is still running.

  • SSH into the camera and run journalctl -u larod to see if there are crash reports from the inference server in the log.

bsrkhan commented on September 25, 2024

I have seen a few different RPC errors. DEADLINE_EXCEEDED is the most common, and I've also seen the status code UNAVAILABLE.
The inference server container will have exited. Strangely, dockerd is sometimes stopped on the camera as well, and I have to start that too. I have two P3255 cameras and it only happens on one of them: dockerd crashes and I have to start it either from the web GUI or from the command line.

I SSHed into the cameras and ran the command:
root@axis-b8a44f44f653:~# journalctl -u larod
-- Journal begins at Thu 2022-09-22 00:25:24 PDT, ends at Thu 2022-09-22 09:20:35 PDT. --
-- No entries --

I think there are no entries because the dockerd crashed and I started it again.

Corallo commented on September 25, 2024

Thanks for reporting this issue. We will have to investigate it.

bsrkhan commented on September 25, 2024

Is there a timeline for the investigation? Does that mean it is reproducible on your end?

Corallo commented on September 25, 2024

Hi, our goal is to at least identify the problem within the next 3 weeks.
Once we have established what is causing it, we will be able to give more details about when we plan to fix this issue.

marbali8 commented on September 25, 2024

Hey @bsrkhan, another question. I'm guessing you use an SD card. How many GB does it have?

bsrkhan commented on September 25, 2024

Yes I am using an SD card, it's 32 GB.

marbali8 commented on September 25, 2024

And what are its other specs? We have found that SD cards that are outdated in terms of speed might cause performance issues, including the error that you are encountering.

bsrkhan commented on September 25, 2024

Kingston 32GB Canvas Select Plus MicroSDHC Class 10/ UHS-1

marbali8 commented on September 25, 2024

I suggest you try with a newer and faster SD card. The SD card that we use is SanDisk Extreme 128GB MicroSDXC UHS Speed Class 3, with app performance class A2. As you can see in SD card standards, UHS-1 is pretty old. The SD card that we use is also not very new, but it has three times the writing speed (see here).

If you decide to try this, please report back with the results so we can investigate further.

bsrkhan commented on September 25, 2024

I will use the SD card you mentioned above and see if it fixes the problem and update here in a bit.

marbali8 commented on September 25, 2024

The example has now been running for more than two days with a simpler SD card than the one I mentioned one week ago. The one we are using now is: SanDisk 64GB MicroSDXC Class 10 / UHS-1. This SD card should be the same speed as yours, but double the capacity.
I am running the object-detector-python example with SDK version 1.2, since I see you are using fw 10.10* and ubuntu:20.04. I am also making sure not only that the containers are running, but also that they are outputting logs.
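
The distinction between "container is running" and "container is still outputting logs" could be checked along these lines. This is only a minimal sketch: the function name `container_health`, the three state labels, and the 5-minute silence threshold are assumptions for illustration, not something taken from the example or the SDK.

```python
from datetime import datetime, timedelta


def container_health(running, last_log_time, now, max_silence=timedelta(minutes=5)):
    """Classify a container as 'exited', 'silent' or 'healthy'.

    max_silence is an assumed threshold, not a value from this thread.
    """
    if not running:
        return "exited"
    if now - last_log_time > max_silence:
        # Still listed as running, but it has stopped producing output.
        return "silent"
    return "healthy"


now = datetime(2022, 10, 1, 12, 0, 0)
print(container_health(True, now - timedelta(seconds=30), now))   # healthy
print(container_health(True, now - timedelta(minutes=20), now))   # silent
print(container_health(False, now - timedelta(seconds=30), now))  # exited
```

The `running` flag and the timestamp of the last log line would have to come from whatever monitoring is already in place (for instance the output of docker ps and the container logs); how to collect them is left open here.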

marbali8 commented on September 25, 2024

Hello, we have now been able to reproduce your problem. After 5 days, the behaviour you described occurred and the camera crashed. To recover, I only had to restart the Docker ACAP and then restart the application.
We are trying to investigate, but for now we will consider it a limitation of the application.

bsrkhan commented on September 25, 2024

Keep me updated on the investigation results.

garaujo23 commented on September 25, 2024

I have also experienced this issue on the P3265-LV: after extended periods the camera crashes and you have to manually restart the Docker ACAP and the app. I tried with multiple different SD cards.

bsrkhan commented on September 25, 2024

Is there a timeline for the investigation now that the issue has been reproduced on your end? I would like some more info on the progress of this issue.

bsrkhan commented on September 25, 2024

Any updates?

Corallo commented on September 25, 2024

Hello @bsrkhan

We are still investigating the issue. It has priority for us, but reproducing it takes about one week.
We reproduced it once, but we need to iterate more to obtain debug information.
Unfortunately, these stability tests take time. We'll keep you posted on the progress.

Corallo commented on September 25, 2024

If the example fails faster in your case, maybe you can help us get debug details more quickly.
If you want, you can SSH into the camera and run:

mkdir /var/spool/storage/SD_DISK/dumps

touch /lib/persistent/.ENABLE_KERNEL_CRASHDUMPS

sync

systemctl reboot

After the reboot, run
sysctl -w debug.crashdump=force-crash
and check whether something appears in the /var/spool/storage/SD_DISK/dumps folder (this is to verify that the debug flag is working).
Then run your application again. When it crashes, it will produce a new file in the dumps directory that you can send to us.
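
Since the forced test crash also leaves a file in the dumps folder, it may help to tell the later, real dump apart from it. The following is a rough sketch of one way to do that, assuming the folder is reachable from wherever Python runs (for example after copying it off the camera). The helper name `new_dumps` and the file names in the demonstration are made up for illustration.

```python
import tempfile
from pathlib import Path

DUMP_DIR = "/var/spool/storage/SD_DISK/dumps"  # path given in the steps above


def new_dumps(dump_dir, known_names):
    """Return files in dump_dir whose names are not in known_names, oldest first."""
    files = [
        p for p in Path(dump_dir).iterdir()
        if p.is_file() and p.name not in known_names
    ]
    return sorted(files, key=lambda p: p.stat().st_mtime)


# Demonstration in a temporary directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "forced-test.elf.lzo").write_bytes(b"")
    known = {p.name for p in Path(tmp).iterdir()}  # snapshot after the forced crash
    (Path(tmp) / "real-crash.elf.lzo").write_bytes(b"")
    found = [p.name for p in new_dumps(tmp, known)]

print(found)
```

In practice one would take the snapshot of known names right after the forced crash and call `new_dumps(DUMP_DIR, known)` once the application has failed.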

bsrkhan commented on September 25, 2024

I've got an .elf.lzo file. It is 1.4 GB. Where should I upload it?

bsrkhan commented on September 25, 2024

Here is a link to it on OneDrive: https://brocksolutionsinc-my.sharepoint.com/:u:/g/personal/rkhan_brocksolutions_com/ESoVcaCuGMROpP0gMVByoYsBJ0_Z05XhF473iAoQNv696w?e=CWLBTO

Corallo commented on September 25, 2024

Thanks for sharing. We are still investigating the issue and will get back to you when we have news.

garaujo23 commented on September 25, 2024

Here is a link to some dumps if it helps https://drive.google.com/drive/folders/1XWdaAiuhptIS_7Vn5Y4OYsAMvcQStCF8?usp=sharing

bsrkhan commented on September 25, 2024

Are there any updates on the cause of the issue?

bsrkhan commented on September 25, 2024

Awesome, thanks for the update!

garaujo23 commented on September 25, 2024

@Corallo any updates on the patch?

Corallo commented on September 25, 2024

Hi. Unfortunately, we are still in the testing phase; we want to make sure that our patch doesn't introduce any problems.
Hopefully we will publish the update this month. If you don't want to wait for the tests, you can hack your way around it by updating the acap-runtime image (the inference server) to 1.2.0-rc.1, which you can find on Docker Hub (make sure to pick the one with the architecture for your camera).
Then you also need to remove the -o flag from your docker-compose file.
This should solve the crash that occurs with long executions.

Corallo commented on September 25, 2024

@garaujo23 @bsrkhan
You can now use the patched branch

bsrkhan commented on September 25, 2024

Hi,

I've been using the updated branch, but the containers still crash.

I've updated to firmware version 11.1.72 from 10.10.69 to fix some errors that popped up.
This is the error:
<_InactiveRpcError of RPC that terminated with:
status = StatusCode.DEADLINE_EXCEEDED
debug_error_string = "{"created":"@1673313831.171741141","description":"Error received from peer unix:/tmp/acap-runtime.sock","file":"src/core/lib/surface/call.cc","file_line":952,"grpc_message":"Deadline Exceeded","grpc_status":4}"

I'm using axisecp/acap-runtime:1.2.0-rc.1-armv7hf-containerized image of the inference server.

Corallo commented on September 25, 2024

Please use the latest master. There is no need to use that rc branch anymore.
Also, try to run it for a bit longer (1-2 min). Deadline Exceeded might happen on the first execution, while the camera is busy loading the model.
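
One way a client could tolerate deadline errors during that initial model-loading window, while still surfacing them afterwards, is sketched below. This is not how the example application handles it. `DeadlineExceeded` is a stand-in exception defined here, and `call_with_warmup` and its parameters are hypothetical. With the real gRPC Python package one would instead catch `grpc.RpcError` and compare `e.code()` with `grpc.StatusCode.DEADLINE_EXCEEDED`.

```python
import time


class DeadlineExceeded(Exception):
    """Stand-in for a gRPC DEADLINE_EXCEEDED error (hypothetical, for illustration)."""


def call_with_warmup(infer, warmup_s=120.0, pause_s=1.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Call infer() until it succeeds, ignoring deadline errors during warm-up.

    Once warmup_s seconds have elapsed, a deadline error is re-raised so that
    a genuine hang is not hidden.
    """
    start = clock()
    while True:
        try:
            return infer()
        except DeadlineExceeded:
            if clock() - start >= warmup_s:
                raise
            sleep(pause_s)


attempts = []


def flaky_infer():
    # Fails twice, as if the model were still loading, then succeeds.
    attempts.append(1)
    if len(attempts) < 3:
        raise DeadlineExceeded()
    return "detections"


result = call_with_warmup(flaky_infer, warmup_s=5.0, pause_s=0.0)
print(result, len(attempts))
```

The default of 120 seconds mirrors the 1-2 minutes mentioned above. Errors that appear hours later fall outside the window and are raised as usual.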

bsrkhan commented on September 25, 2024

I will try the latest version of the armv7hf image, but the error doesn't happen right away; it occurs after several hours.

bsrkhan commented on September 25, 2024

I tried using axisecp/acap-runtime:latest-armv7hf-containerized.
After 2 hours the container crashes with this message:
<_InactiveRpcError of RPC that terminated with:
status = StatusCode.DEADLINE_EXCEEDED
details = "Deadline Exceeded"
debug_error_string = "{"created":"@1674547783.905102373","description":"Error received from peer unix:/tmp/acap-runtime.sock","file":"src/core/lib/surface/call.cc","file_line":952,"grpc_message":"Deadline Exceeded","grpc_status":4}"
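
When comparing many such log entries, the status name and the embedded debug details can be pulled out of the error text. The sketch below works on the textual form pasted above only. The helper `parse_rpc_error` and both regular expressions are assumptions for illustration and are not part of gRPC.

```python
import json
import re

# Hypothetical helper: extracts the status name and the JSON debug details
# from the text form of a gRPC error, as pasted in this thread.
STATUS_RE = re.compile(r"status\s*=\s*StatusCode\.(\w+)")
DEBUG_RE = re.compile(r'debug_error_string\s*=\s*"(\{.*\})"', re.DOTALL)


def parse_rpc_error(text):
    """Return (status_name, debug_dict); either part is None if not found."""
    status = STATUS_RE.search(text)
    debug = DEBUG_RE.search(text)
    details = None
    if debug:
        try:
            details = json.loads(debug.group(1))
        except json.JSONDecodeError:
            details = None
    return (status.group(1) if status else None, details)


sample = '''<_InactiveRpcError of RPC that terminated with:
status = StatusCode.DEADLINE_EXCEEDED
details = "Deadline Exceeded"
debug_error_string = "{"created":"@1674547783.905102373","description":"Error received from peer unix:/tmp/acap-runtime.sock","file":"src/core/lib/surface/call.cc","file_line":952,"grpc_message":"Deadline Exceeded","grpc_status":4}"
'''

status, details = parse_rpc_error(sample)
print(status, details["grpc_status"], details["description"])
```

Here `grpc_status` 4 corresponds to DEADLINE_EXCEEDED, and the description shows that the failing peer is the local acap-runtime socket.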

Corallo commented on September 25, 2024

Did you modify the application? Are you using a different model?

We can try to dig more into this, but so far our stability tests haven't shown any issue.
Please factory-default your camera and then reinstall FW 11.1.72.

The best that we can do is to try to run a test with your same exact settings.

Please confirm that your settings are:

  • example: object-detector-python from acap-computer-vision-sdk-examples v1.6 (make sure to use the original example).

  • SDK_VERSION is 1.6

  • acap runtime is axisecp/acap-runtime:1.2.0-armv7hf-containerized (don't use latest, use this tagged version).

  • camera: P3255-LVE

  • fw: 11.1.72

Reset your camera to factory default, try again with these settings. We are going to do the same to see if we can replicate your issue again.
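
The checklist above can be written down as data and compared against the setup actually in use. This is only an illustrative sketch: the key names and the helper `mismatches` are made up, while the expected values are the ones listed above. The deviating values in the demonstration (SDK 1.5 and the latest image tag) are the ones reported elsewhere in this thread.

```python
# Expected values are the settings listed above; the key names are made up.
EXPECTED = {
    "example": "object-detector-python",
    "examples_tag": "v1.6",
    "sdk_version": "1.6",
    "acap_runtime": "axisecp/acap-runtime:1.2.0-armv7hf-containerized",
    "camera": "P3255-LVE",
    "firmware": "11.1.72",
}


def mismatches(actual, expected=EXPECTED):
    """Return {key: (actual_value, expected_value)} for every differing setting."""
    return {
        key: (actual.get(key), want)
        for key, want in expected.items()
        if actual.get(key) != want
    }


# A setup that deviates in two places.
actual = dict(
    EXPECTED,
    sdk_version="1.5",
    acap_runtime="axisecp/acap-runtime:latest-armv7hf-containerized",
)
for key, (got, want) in mismatches(actual).items():
    print(f"{key}: got {got!r}, expected {want!r}")
```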

Corallo commented on September 25, 2024

Hello.
We have been running stability tests on that model on P3255-LVE with firmware 11.1.72 on the object-detector-python example, and we can't replicate your issue. While monitoring the device, nothing seemed out of the ordinary.
We are going to close the issue again now, as it seems it might occur only on your device.
Feel free to send us more information if you have more insights on the problem.

bsrkhan commented on September 25, 2024

Hi,

I am currently using SDK_VERSION 1.5; I will update it to 1.6. Could that be why the containers are still crashing on my end?

I will use the specified ACAP runtime.

camera is P3255-LVE

fw is 11.1.72

Sorry about the late reply. I will update if the crashes continue.

marbali8 commented on September 25, 2024

Hello,

The fix was not directly related to the version of the SDK; we changed the inference script in the ACAP Runtime. That change was then tagged as "1.2.0" for the ACAP Runtime and "v1.6" for the examples. If you use those tagged versions (or later), it shouldn't crash. As @Corallo was saying, we have tested with those tags and your camera model and FW version, and we got no errors.
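
Whether a given version counts as "those tagged versions (or later)" can be checked with a small comparison. The sketch below assumes bare version strings such as "1.2.0", "v1.6" or "1.2.0-rc.1". A full image tag like 1.2.0-armv7hf-containerized would need its architecture suffix stripped first, otherwise it is treated as a pre-release. The helpers `parse_tag` and `at_least` are hypothetical.

```python
import re


def parse_tag(tag):
    """Split 'v1.6' or '1.2.0-rc.1' into ((major, minor, patch), is_release)."""
    m = re.fullmatch(r"v?(\d+)\.(\d+)(?:\.(\d+))?(-.+)?", tag)
    if not m:
        raise ValueError(f"unrecognised tag: {tag!r}")
    version = (int(m.group(1)), int(m.group(2)), int(m.group(3) or 0))
    return version, m.group(4) is None


def at_least(tag, minimum):
    """True if tag is the minimum release or newer."""
    version, is_release = parse_tag(tag)
    min_version, _ = parse_tag(minimum)
    # A pre-release such as 1.2.0-rc.1 sorts before the 1.2.0 release.
    return version > min_version or (version == min_version and is_release)


print(at_least("1.2.0", "1.2.0"))       # True
print(at_least("1.2.0-rc.1", "1.2.0"))  # False
print(at_least("v1.6", "v1.6"))         # True
print(at_least("1.5", "1.6"))           # False
```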

Feel free to send us the results :)
