
GitLab runner/Patrol hangs and does not release the pipeline. TestRunner could be the culprit (not 100% sure); need more ideas and advice for debugging (patrol, open, 6 comments)

adrian-moisa commented on July 29, 2024

from patrol.

Comments (6)

adrian-moisa commented on July 29, 2024

So far I've managed to find the source code that generates the logcat messages tagged TestRunner. It's in AndroidJUnitRunner; I'm studying it further to see what the issue could be.
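Once the tag is known, a saved log of a hung run can be searched for the test that started but never finished. A minimal sketch, assuming the log was dumped to text and that the listener writes lines shaped like `started: <test name>` and `finished: <test name>` under the TestRunner tag (the exact message format is an assumption to check against a real log; `TestRunnerLogScan` is a hypothetical helper):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Lists tests that logged "started:" under the TestRunner tag but never logged "finished:". */
class TestRunnerLogScan {
    // Assumed message shape: "<prefix>TestRunner<anything but ':'>: started: <test name>"
    // This covers e.g. "I TestRunner: ..." and "I/TestRunner( 1234): ..." logcat layouts.
    private static final Pattern EVENT =
            Pattern.compile("TestRunner\\b[^:]*:\\s*(started|finished): (.+)$");

    static List<String> unfinishedTests(List<String> logLines) {
        Set<String> running = new LinkedHashSet<>(); // preserves start order for readable output
        for (String line : logLines) {
            Matcher m = EVENT.matcher(line);
            if (!m.find()) {
                continue; // "run started", "run finished" and unrelated lines are skipped
            }
            String test = m.group(2).trim();
            if (m.group(1).equals("started")) {
                running.add(test);
            } else {
                running.remove(test);
            }
        }
        return new ArrayList<>(running);
    }

    public static void main(String[] args) {
        // Hypothetical excerpt; real test names depend on the project
        List<String> log = Arrays.asList(
                "I TestRunner: run started: 2 tests",
                "I TestRunner: started: runDartTest[about_test](MainActivityTest)",
                "I TestRunner: finished: runDartTest[about_test](MainActivityTest)",
                "I TestRunner: started: runDartTest[login_test](MainActivityTest)");
        System.out.println(unfinishedTests(log)); // the test that started but never finished
    }
}
```

If the list comes back empty on a hung run, every test finished from the runner's point of view, which points the search at whatever happens after the last test rather than at a specific test.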


adrian-moisa commented on July 29, 2024

I did more digging and I found out:

  • There's a TestRunner in patrol, but it's no longer used - I had to keep searching until I found the TestRunner that emits these logs
    • Searched in: patrol, flutter, test_api, android-test - Not there
    • Found it in AndroidJUnitRunner.java / LogRunListener.java from the android-test library (the androidx one). The RunListener is instantiated in two places, addListenersLegacyOrder() and addListenersNewOrder(). I assume we use the latter; either way, the code looks similar.
  • Too deep in the stack - After more digging I concluded the problem sits way too deep in the stack: patrol-cli / patrol / PatrolJUnitRunner / AndroidJUnitRunner / JUnitRunner. My current belief is that JUnitRunner was not made to run tests that have 5K lines of code. Probably there's an OOM error or something. Even if found, patching it wouldn't be trivial at all.
  • Duplicated About 10x - Still hangs - Apparently it does not matter which block of code we disable: disabling any of them makes the hang go away. Following this observation, I tried duplicating the About tests 10x, and that alone is enough to trigger the hang. That strongly hints at an OOM issue or something similar, meaning what matters is not what we are testing but the volume of test code per test.
  • Split into one file / one main per API - Still hangs - Tried splitting the monolithic test into 12 smaller monolithic tests; same thing, still hangs.
  • Changed the order of APIs - Still hangs - Same issue, no change.
  • Split the GitLab pipeline into 2 jobs - Fails again - Ideally the second job should reuse the cache so that we don't build twice; I managed to do that. The pipeline still hangs despite green tests.
  • Reset Waydroid on a stuck pipeline - Exits with code 1 - We tried to trigger a Waydroid reset once the pipeline hangs. The problem is that the process then exits with error code 1, so every build turns red. This is a no-go.
  • Split the pipeline into 2 jobs + reset Waydroid before the 2nd job - Works - Resetting Waydroid for both pipeline jobs did the trick. The problem is that this adds around 5 minutes of runtime to the pipeline, which is not sustainable. Maybe it would be if we split the jobs across two runners and ran them in parallel.
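Every variant above ends with the same symptom: the job never exits, even when the tests are green. A hard deadline around the test command at least stops a stuck run from holding the runner forever. A minimal sketch, assuming the tests are started from a single command; `HardTimeout` and exit code 124 (borrowed from coreutils `timeout`) are illustrative choices. Note that it turns a hang into a red job, so it bounds the cost rather than fixing the cause; GitLab's job-level `timeout` keyword is a coarser version of the same safeguard.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Runs a command and force-kills it (and its children) if it outlives a deadline. */
class HardTimeout {
    /** Exit code reported when the watchdog kills the run (same convention as coreutils `timeout`). */
    static final int TIMED_OUT = 124;

    static int run(List<String> command, long timeout, TimeUnit unit) {
        try {
            Process process = new ProcessBuilder(command).inheritIO().start();
            if (process.waitFor(timeout, unit)) {
                return process.exitValue(); // exited on its own: pass its status through
            }
            // Still alive after the deadline: kill descendants first, then the process itself
            process.descendants().forEach(ProcessHandle::destroyForcibly);
            process.destroyForcibly();
            process.waitFor();
            return TIMED_OUT;
        } catch (IOException e) {
            throw new UncheckedIOException(e); // e.g. the command does not exist
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for " + command, e);
        }
    }

    public static void main(String[] args) {
        // Usage: java HardTimeout <minutes> <command> [args...]
        long minutes = Long.parseLong(args[0]);
        List<String> command = Arrays.asList(args).subList(1, args.length);
        System.exit(run(command, minutes, TimeUnit.MINUTES));
    }
}
```

The deadline should sit comfortably above the normal run time so that only genuine hangs are cut off.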

The fact that a Waydroid reset helps indicates that Waydroid is also involved. Something gets stored in Waydroid's memory/storage, and only resetting Waydroid gets rid of it. At the moment, tracking this down in a reasonable amount of time is beyond my understanding of the Patrol stack. I'll leave the issue open; maybe somebody has insights to share.

It would be a great issue to fix, since Waydroid provides a cheap, potentially free solution for testing Flutter apps. Cloud testing services get quite expensive as soon as a project needs testing at large volume. We run our tests on every single commit/push to spot bugs early, and our tests take quite a few minutes to run, so in the long run the invoice would become prohibitive. That's why I insisted so hard on making Waydroid work. I'll share the recipe for installing Waydroid on a Debian host one of these days.


bartekpacia commented on July 29, 2024

Does this occur only on Waydroid, and never on plain old Android emulator?

"My current belief is that JUnitRunner was not made to run tests that have 5K lines of code. Probably there's an OOM error or something. Even if found, patching it wouldn't be trivial at all."

I doubt that's the case. I assume that by "5k lines of code" you mean your Dart test. Even if lines of code mattered, from JUnitRunner's point of view the test is literally a few lines of code:

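@Test
public void runDartTest() {
    // dartTestName is a field of the parameterized test class; the whole Dart test runs inside this one call
    PatrolJUnitRunner instrumentation = (PatrolJUnitRunner) InstrumentationRegistry.getInstrumentation();
    instrumentation.runDartTest(dartTestName);
}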

And if it were an OOM, you'd see it in logcat.
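That check is easy to do mechanically on a log captured from a hung run, for example with `adb logcat -d > logcat.txt`. A minimal sketch; `OomScan` is a hypothetical helper, and the two markers (a Java `OutOfMemoryError` and kills reported by Android's low-memory killer) are assumptions about what is worth looking for, not an exhaustive list:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Picks out logcat lines that typically accompany memory exhaustion. */
class OomScan {
    // Assumed markers: a Java-level OutOfMemoryError, or a process kill reported by the low-memory killer
    private static final Pattern MEMORY_TROUBLE =
            Pattern.compile("java\\.lang\\.OutOfMemoryError|lowmemorykiller", Pattern.CASE_INSENSITIVE);

    static List<String> suspiciousLines(List<String> logLines) {
        List<String> hits = new ArrayList<>();
        for (String line : logLines) {
            if (MEMORY_TROUBLE.matcher(line).find()) {
                hits.add(line);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical lines; an empty result on a hung run argues against the OOM theory
        List<String> log = Arrays.asList(
                "I TestRunner: started: runDartTest[about_test](MainActivityTest)",
                "E AndroidRuntime: java.lang.OutOfMemoryError: Failed to allocate");
        System.out.println(suspiciousLines(log));
    }
}
```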

I'd bet that it's some nasty race condition that only appears when test(s) take a long time - either in Patrol (more likely), or in android-test's TestOrchestrator.


adrian-moisa commented on July 29, 2024

It would be really hard to also check the Android emulator in this setup. I originally tried to use the Android emulator on the VPS, but without a GPU it's slow as a snail and extremely unstable: it has trouble finishing a full set of tests and fails randomly. I switched to Waydroid because it's a lot less demanding on resources and seems a lot more stable than the Android emulator. Performance is still not stellar, but it's miles ahead of the emulator. So I can't compare the two.

As for a race condition, it's hard to say between what and what. At the moment I have no idea what the cause could be. I'd need to run PatrolJUnitRunner and AndroidJUnitRunner in debug mode and become a lot more familiar with the Android runner overall. I've only scratched the surface there.
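Short of attaching a debugger, a thread dump taken while the run is stuck shows which Java thread is blocked and where, which can help tell a waiting-forever race from a merely slow test. A minimal sketch, assuming a hypothetical `HangWatchdog` wrapped around the `runDartTest` call shown earlier; on Android, `System.err` output ends up in logcat. It only sees Java threads in the instrumentation process, not the Dart side, so at best it narrows down which side is waiting.

```java
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Prints every thread's stack trace if the guarded block is still running after a timeout. */
class HangWatchdog implements AutoCloseable {
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor(runnable -> {
        Thread thread = new Thread(runnable, "hang-watchdog");
        thread.setDaemon(true); // the watchdog alone never keeps the process alive
        return thread;
    });
    private final ScheduledFuture<?> dump;

    HangWatchdog(long timeout, TimeUnit unit) {
        // Fires once, only if close() has not cancelled it first
        dump = timer.schedule(() -> System.err.println(dumpAllThreads()), timeout, unit);
    }

    static String dumpAllThreads() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            out.append('"').append(thread.getName()).append("\" ").append(thread.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                out.append("    at ").append(frame).append('\n');
            }
        }
        return out.toString();
    }

    @Override
    public void close() {
        dump.cancel(false); // block finished in time: skip the dump
        timer.shutdownNow();
    }
}
```

Inside the test it could wrap the existing call, e.g. `try (HangWatchdog watchdog = new HangWatchdog(15, TimeUnit.MINUTES)) { instrumentation.runDartTest(dartTestName); }`, with the 15-minute threshold picked arbitrarily above the normal run time.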


bartekpacia commented on July 29, 2024

If you want rock-solid and fast Android emulators, then emulator.wtf may be useful for you (assuming you don't need to run shell scripts, start a backend locally, etc.). I've had only good experiences with that service.

EDIT: I saw "Adb is still involved in this setup" in the OP. Well, too bad.

Maybe try a more beefed-up VPS?


adrian-moisa commented on July 29, 2024

That was my beef with using any of these services: the amount of testing we run is already quite large and is expected to grow further. Yes, I could look for another VPS provider that also provisions a GPU; that would improve the speed. And yes, I'm running a few scripts and a backend together to get maximum coverage. So I'm strongly inclined to keep the testing in-house.
