Comments (12)

vito commented on June 24, 2024

We'll be fixing this properly in the next release; a solution is in the works. We're going to rip out those two columns entirely and change how we run checks for manually triggered builds: do it on the side that pulls from the queue, not the side that pushes to it and does all this locking.
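A minimal sketch of the idea of moving the check to the consumer side of the queue, written in Python for brevity (Concourse itself is written in Go). Every name here (check_resources, drain_manual_triggers, the queue of job names) is hypothetical and not taken from the ATC code; the point is only that the producer enqueues and never sets a flag it could die before clearing.

```python
import queue

# Hypothetical stand-ins for illustration only: the real ATC is written in Go
# and keeps this state in Postgres.

def check_resources(job_name):
    """Placeholder for running `check` on the job's inputs; returns versions."""
    return {"repo": "v2"}

def drain_manual_triggers(trigger_queue):
    """Consumer side: pull each manually triggered build off the queue and run
    the check here, right before scheduling it. The producer only enqueues,
    so there is no cross-process 'checking' flag to get stuck."""
    started = []
    while True:
        try:
            job_name = trigger_queue.get_nowait()
        except queue.Empty:
            return started
        versions = check_resources(job_name)
        started.append((job_name, versions))

# Producer side (e.g. the handler for a manual trigger) just enqueues.
trigger_queue = queue.Queue()
trigger_queue.put("unit")
trigger_queue.put("integration")
started = drain_manual_triggers(trigger_queue)
```

If the process that enqueued the request dies, the request simply stays in the queue until a consumer picks it up; nothing is left half-locked.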

from atc.

concourse-bot commented on June 24, 2024

Hi there!

We use Pivotal Tracker to provide visibility into what our team is working on. A story for this issue has been automatically created.

The current status is as follows:

  • #133300157 Jobs can get stuck at 'waiting for suitable set of input versions'

This comment, as well as the labels on the issue, will be automatically updated as the status in Tracker changes.

challiwill commented on June 24, 2024

We do see at least one example of a job hanging in this manner where all previous builds had succeeded.

cjcjameson commented on June 24, 2024

For a time, we were inadvertently running Concourse 2.3.1 against Garden runC 1.0.1. We switched it back, and we're still not sure whether that helped.

The UI shows the message "waiting for a suitable set of input versions". The DB column we manually set to false to unblock the jobs is jobs.resource_checking.

@schubert @tom-meyer @dsharp-pivotal @jingyimei

challiwill commented on June 24, 2024

@cjcjameson any idea if the ATC is still running in to too many open files?

cjcjameson commented on June 24, 2024

No, I don't think so. We are getting a lot of "atc.container-keepaliver.looking-up-container" followed by "atc.baggage-collector.could-not-locate-worker". Probably a separate issue.

challiwill commented on June 24, 2024

Cool. I'm pretty sure those are fine.

cjcjameson commented on June 24, 2024

We've been doing more spelunking, and our discoveries confirm the general diagnosis in the story description: the code that should allow resource_checking to be cleared either doesn't get triggered or gets lost in a crash.

There may be more to it, however. We don't have to crash the ATC to demonstrate the behavior; we only need a long-running or semi-crashing container, and then trigger another build. We will keep watching for reproductions of the error.

We often see this error occur as a consequence of the container hang that is documented here: cloudfoundry/guardian#54

Regardless of the guardian issue, however, we think that the Concourse side logic for the resource_checking field could be made more resilient to this general class of issues.
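One general way to make a "checking in progress" marker resilient to crashes is to replace a bare boolean with a lease that expires. The sketch below is a hypothetical illustration in Python, not Concourse's implementation: CheckLease, ttl, and the injectable clock are assumptions made for the example.

```python
import time

class CheckLease:
    """Hypothetical replacement for a bare boolean like jobs.resource_checking.
    The holder records when it started; after `ttl` seconds anyone may treat
    the lease as expired, so a crashed checker cannot block the job forever."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self.acquired_at = None  # None means "no check in progress"

    def try_acquire(self):
        now = self.clock()
        if self.acquired_at is not None and now - self.acquired_at < self.ttl:
            return False  # a check started recently and is presumed alive
        self.acquired_at = now  # free, or the previous holder's lease expired
        return True

    def release(self):
        self.acquired_at = None
```

With a fake clock, a checker that acquires the lease and then dies without calling release() blocks others only until ttl has passed, after which the next try_acquire() succeeds. In a shared database the same idea would need the timestamp stored in the row and compared against the database's own clock in a single atomic UPDATE, since a per-process monotonic clock is not comparable across ATCs.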

cjcjameson commented on June 24, 2024

We've changed our best guess regarding what causes container hangs (cloudfoundry/guardian#54): we now understand that it's likely related to stdout being left open, not directly attributable to zombie processes or daemons.

We are still trying to connect this hang (containers staying open because children of their initial test process still have stdout open) to why Concourse mount volume locking or resource checking might jam up under some failure condition.

However, we're focusing on the underlying hang at the moment; we don't yet have a minimal reproduction of this on the Concourse end.

hfinucane commented on June 24, 2024

I am seeing this often with Concourse 2.4.0, especially on a pipeline where I frequently trigger jobs and then expect downstream jobs, gated on passed constraints, to go off. I'm trying to work up the courage to put psql -c 'UPDATE jobs SET resource_checking = false WHERE resource_checking = true;' in a cron job, but I really don't want to. I saw this in 2.3.1 as well, but less often.

I have had ATC crashes as well as semi-frequent worker crashes, so it's a fairly unfriendly environment. I also see more
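The blanket UPDATE above also clears flags for checks that are genuinely in progress. A more targeted sweep only resets flags that have been set for too long. The sketch below uses an in-memory SQLite database so it runs anywhere; the checking_started_at column is a hypothetical addition for illustration (only resource_checking appears in this thread), and the thresholds are arbitrary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: jobs.resource_checking is from the thread; the
# checking_started_at column (epoch seconds) is assumed for this sketch.
conn.execute(
    "CREATE TABLE jobs (name TEXT, resource_checking BOOLEAN, checking_started_at INTEGER)"
)
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, ?)",
    [("stuck-job", 1, 1000), ("live-job", 1, 1590), ("idle-job", 0, None)],
)

def clear_stale_checks(conn, now, max_age):
    """Reset only flags that have been set for longer than max_age seconds."""
    cur = conn.execute(
        "UPDATE jobs SET resource_checking = 0, checking_started_at = NULL "
        "WHERE resource_checking = 1 AND checking_started_at < ?",
        (now - max_age,),
    )
    conn.commit()
    return cur.rowcount  # number of jobs that were unstuck

cleared = clear_stale_checks(conn, now=1600, max_age=300)
still_checking = [
    row[0] for row in conn.execute("SELECT name FROM jobs WHERE resource_checking = 1")
]
```

Here only stuck-job is reset; live-job, whose check started 10 seconds earlier, keeps its flag, whereas the blanket UPDATE would have cleared both.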

WARNING:  you don't own a lock of type ExclusiveLock
WARNING:  you don't own a lock of type ExclusiveLock

in my postgres logs than would be ideal, if that helps at all.
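For context on that warning: PostgreSQL reports "you don't own a lock of type ExclusiveLock" when pg_advisory_unlock is asked to release a session-level advisory lock that the calling session does not hold, and the function returns false. Session-level advisory locks belong to a single connection, stack when re-acquired, and disappear when that connection closes, so one way to hit the warning is to unlock through a different pooled connection than the one that locked, or after the original connection dropped. Whether that is what happens in the ATC here is not established in this thread. The in-memory model below (AdvisoryLockSession is a hypothetical name, not a client library) mimics that ownership rule.

```python
from collections import Counter

class AdvisoryLockSession:
    """In-memory model of one PostgreSQL session's session-level advisory
    locks, for illustration only. Re-acquiring a held lock stacks, and
    unlocking a key this session does not hold returns False, which is
    where the server would log the ExclusiveLock warning."""

    def __init__(self):
        self.held = Counter()  # lock key -> number of times acquired

    def lock(self, key):
        # Real pg_advisory_lock would block if another session held the key;
        # this single-session model does not simulate contention.
        self.held[key] += 1

    def unlock(self, key):
        if self.held[key] == 0:
            return False  # "you don't own a lock of type ExclusiveLock"
        self.held[key] -= 1
        return True
```

For example, if session a locks key 42, calling unlock(42) on a different session b returns False, while a.unlock(42) returns True.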

tom-meyer commented on June 24, 2024

We have observed a job getting stuck at "waiting for suitable inputs" with a new cause: we rolled a worker while it was in the middle of a build of the job. This resulted in a "connection reset by peer" error, and the next manually triggered build was not able to get off the ground.
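When the failure is an error inside a still-running process (such as a connection reset from a rolled worker), the usual guard is to clear the flag in a finally block so that it is reset on every exit path. A minimal sketch in Python, with hypothetical names (Job, run_check, flaky_check) standing in for the ATC's real code:

```python
class Job:
    def __init__(self, name):
        self.name = name
        self.resource_checking = False  # mirrors the jobs.resource_checking column

def run_check(job, check):
    """Set the flag for the duration of the check and always clear it,
    even when the check raises partway through."""
    job.resource_checking = True
    try:
        return check()
    finally:
        job.resource_checking = False

def flaky_check():
    # Simulates the worker going away mid-check.
    raise ConnectionResetError("connection reset by peer")
```

Calling run_check(job, flaky_check) still raises ConnectionResetError to the caller, but job.resource_checking is False afterwards. A finally block cannot help if the whole ATC process dies, which is the crash case mentioned earlier in the thread.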

concourse-bot commented on June 24, 2024

Hello again!

All stories related to this issue have been accepted, so I'm going to automatically close this issue.

At the time of writing, the following stories have been accepted:

  • #133300157 Jobs can get stuck at 'waiting for suitable set of input versions'

If you feel there is still more to be done, or if you have any questions, leave a comment and we'll reopen if necessary!
