Comments (2)
This looks like a recurrence of the issue we diagnosed in #126272.
We see from the logs that we try to stall n4:
17:59:46 failover.go:856: failing n4 (pause)
teamcity-15831467-1719410159-51-n7cpu2: sending signal 19
18:00:46 failover.go:861: recovering n4 (pause)
teamcity-15831467-1719410159-51-n7cpu2: sending signal 18
18:00:48 monitor.go:177: Monitor event: n4: cockroach process for system interface died (exit code 7)
But the process is terminated before it can be recovered.
W240626 18:00:47.421735 694336 1@util/log/file.go:274 ⋮ [-] 1473 disk slowness detected: unable to sync log files within 10s
W240626 18:00:47.450892 110 1@server/server_http.go:296 ⋮ [T1,Vsystem,n4] 1479 close tcp ‹[::]:26258›: ‹use of closed network connection›
F240626 18:00:47.423273 694337 1@util/log/file.go:265 ⋮ [-] 1489 disk stall detected: unable to sync log files within 20s
F240626 18:00:47.423273 694337 1@util/log/file.go:265 ⋮ [-] 1489 !
F240626 18:00:47.423273 694337 1@util/log/file.go:265 ⋮ [-] 1489 !This node experienced a fatal error (printed above), and as a result the
F240626 18:00:47.423273 694337 1@util/log/file.go:265 ⋮ [-] 1489 !process is terminating.
@itsbilal this was fixed in #125703, but we never backported it to any of the release branches. Maybe we should?
This specific failure is on an rc branch, so we can probably close this out. However, I'll assign it to you and keep it open for now so that it gets your eyes for the backport ping.
I wonder if we can just tweak the failover roachtest to not stall the log directory at all, or to force a much higher value for COCKROACH_LOG_MAX_SYNC_DURATION. I'm wary of backporting a subtle change in disk stall behaviour to stable release branches, as it has some potential to backfire in the wild. WAL Failover is a new feature in 24.1 with not that many users (yet), but the ways in which cockroach responds to disk stalls are more set in stone and relied on by more customers, and I don't think it's a good idea to ship a subtle change in that space in a patch release.
@arulajmani what do you think?