Comments (11)

mattlord avatar mattlord commented on June 17, 2024

I’m afraid that I will have to mark this as can’t repeat as there’s no test case. Canceling a MoveTables workflow on a large table is something that happens all the time. So to me it would seem that there’s more to it than that.

You didn’t provide enough detail to try and determine what happened in your case either.

Someone or something dropped the logDataHourlyDeltas table in the middle of the import. As you saw, there was an insert error showing that before you even tried to cancel the workflow, no?

We could turn this into a FR / bug about allowing cancel and cleaning up when the table is already gone. That’s a known issue as you saw.

Please let me know what you would like to do. Thanks!

from vitess.

wiebeytec avatar wiebeytec commented on June 17, 2024

Someone or something dropped the logDataHourlyDeltas table in the middle of the import. As you saw, there was an insert error showing that before you even tried to cancel the workflow, no?

I'm positive nobody deleted that table. And I tried cancelling it several times, which may explain the cancel line below that.

I can reproduce it, but only about 70% of the time (the --auto-start=false turned out not to be important):

$ ./vtctldclient MoveTables create --auto-start=false --workflow hourlies --source-keyspace legacy2023 --target-keyspace sites2023 --tables logDataHourlyDeltas
The following vreplication streams exist for workflow sites2023.hourlies:

id=16 on sites2023/vic-eu-central-1-fakelive-301: Status: Stopped. VStream has not started.
id=16 on sites2023/vic-eu-central-1-fakelive-300: Status: Stopped. VStream has not started.
id=16 on sites2023/vic-eu-central-1-fakelive-303: Status: Stopped. VStream has not started.
id=16 on sites2023/vic-eu-central-1-fakelive-302: Status: Stopped. VStream has not started.

Traffic State: Reads Not Switched. Writes Not Switched

Then:

$ ./vtctldclient MoveTables start --workflow hourlies  --target-keyspace sites2023
{
  "summary": "Successfully updated the hourlies workflow on (4) target primary tablets in the sites2023 keyspace",
  "details": [
    {
      "tablet": {
        "cell": "vic-eu-central-1-fakelive",
        "uid": 300
      },
      "changed": true
    },
    {
      "tablet": {
        "cell": "vic-eu-central-1-fakelive",
        "uid": 301
      },
      "changed": true
    },
    {
      "tablet": {
        "cell": "vic-eu-central-1-fakelive",
        "uid": 302
      },
      "changed": true
    },
    {
      "tablet": {
        "cell": "vic-eu-central-1-fakelive",
        "uid": 303
      },
      "changed": true
    }
  ]
}

Then:

$ ./vtctldclient MoveTables status --workflow hourlies  --target-keyspace sites2023
The following vreplication streams exist for workflow sites2023.hourlies:

id=16 on sites2023/vic-eu-central-1-fakelive-303: Status: Copying. VStream Lag: 0s.
id=16 on sites2023/vic-eu-central-1-fakelive-302: Status: Copying. VStream Lag: 0s.
id=16 on sites2023/vic-eu-central-1-fakelive-301: Status: Copying. VStream Lag: 0s.
id=16 on sites2023/vic-eu-central-1-fakelive-300: Status: Copying. VStream Lag: 0s.

Traffic State: Reads Not Switched. Writes Not Switched
$ ./vtctldclient MoveTables cancel --workflow hourlies  --target-keyspace sites2023
E0517 06:30:45.734471 1574458 main.go:56] rpc error: code = Unknown desc = cannot remove tables since one or more do not exist in the denylist

This is the vttablet log of only the one cancel operation (lines cut to 500 chars length):

mei 17 06:45:09 vitess-mysql-a1 start_vttablet[583]: E0517 06:45:09.954921     583 utils.go:218] Got unrecoverable error: task error: failed inserting rows: Table 'vt_sites2023.logDataHourlyDeltas' doesn't exist (errno 1146) (sqlstate 42S02) during query: insert into logDataHourlyDeltas(idSite,`timestamp`,idDataAttribute,instance,valueFloat,lastUpdate) values (975,1595223000,104,0,0.273064,1595223926), (975,1595223000,105,0,0.0910206,1595223926), (975,1595223000,341,11,0.364084,1595223926), (975,1595223000,533,0,903,1595223926), (975,1595223900,101,0,0.0910225,1595224811), (975,1595223900,

mei 17 06:45:09 vitess-mysql-a1 start_vttablet[583]: E0517 06:45:09.963547     583 controller.go:272] INTERNAL: unable to setState() in controller: could not insert into log table: insert into _vt.vreplication_log(vrepl_id, type, state, message) values(25, 'State Changed', 'Error', 'task error: failed inserting rows: Table \'vt_sites2023.logDataHourlyDeltas\' doesn\'t exist (errno 1146) (sqlstate 42S02) during query: insert into logDataHourlyDeltas(idSite,`timestamp`,idDataAttribute,instance,valueFloat,lastUpdate) values (975,1595223000,104,0,0.273064,1595223926), (975,1595223000,105,0,0.0

mei 17 06:45:09 vitess-mysql-a1 start_vttablet[583]: E0517 06:45:09.966850     583 dbclient.go:127] error in stream 25, will retry after 5s: task error: failed inserting rows: Table 'vt_sites2023.logDataHourlyDeltas' doesn't exist (errno 1146) (sqlstate 42S02) during query: insert into logDataHourlyDeltas(idSite,`timestamp`,idDataAttribute,instance,valueFloat,lastUpdate) values (975,1595223000,104,0,0.273064,1595223926), (975,1595223000,105,0,0.0910206,1595223926), (975,1595223000,341,11,0.364084,1595223926), (975,1595223000,533,0,903,1595223926), (975,1595223900,101,0,0.0910225,1595224811

mattlord avatar mattlord commented on June 17, 2024

Thank you for the additional info, @wiebeytec ! I'll have to do some investigating and testing (on main) then to see if I can come up with any explanation and a test case. Something quite odd and unexpected is certainly happening in your case — I just don't know what and will have to try and find out.

This is the vttablet log of only the one cancel operation (lines cut to 500 chars length):

Did you see this on all of the tablets though?

wiebeytec avatar wiebeytec commented on June 17, 2024

Did you see this on all of the tablets though?

I just ran the operation in a loop 20 times. The cancel failed all 20 times, and it was 100% consistent: one shard/tablet did not produce the "table doesn't exist" error upon cancel, and the other three did. In the above list, vic-eu-central-1-fakelive-0000000303 was always fine. This is the last tablet, in shard sites2023/cc-.

I also checked with show tables on vic-eu-central-1-fakelive-0000000303 and saw that the table indeed kept coming and going.

from vitess.

mattlord avatar mattlord commented on June 17, 2024

I think I see the problem, shown with this patch here:

diff --git a/go/vt/vtctl/workflow/server.go b/go/vt/vtctl/workflow/server.go
index 17b01736a7..1bac98f55c 100644
--- a/go/vt/vtctl/workflow/server.go
+++ b/go/vt/vtctl/workflow/server.go
@@ -1970,14 +1970,6 @@ func (s *Server) WorkflowDelete(ctx context.Context, req *vtctldatapb.WorkflowDe
        span.Annotate("keep_routing_rules", req.KeepRoutingRules)
        span.Annotate("shards", req.Shards)

-       // Cleanup related data and artifacts.
-       if _, err := s.DropTargets(ctx, req.Keyspace, req.Workflow, req.KeepData, req.KeepRoutingRules, false); err != nil {
-               if topo.IsErrType(err, topo.NoNode) {
-                       return nil, vterrors.Wrapf(err, "%s keyspace does not exist", req.Keyspace)
-               }
-               return nil, err
-       }
-
        deleteReq := &tabletmanagerdatapb.DeleteVReplicationWorkflowRequest{
                Workflow: req.Workflow,
        }
@@ -2002,6 +1994,14 @@ func (s *Server) WorkflowDelete(ctx context.Context, req *vtctldatapb.WorkflowDe
                return nil, vterrors.Errorf(vtrpcpb.Code_FAILED_PRECONDITION, "the %s workflow does not exist in the %s keyspace", req.Workflow, req.Keyspace)
        }

+       // Cleanup related data and artifacts.
+       if _, err := s.DropTargets(ctx, req.Keyspace, req.Workflow, req.KeepData, req.KeepRoutingRules, false); err != nil {
+               if topo.IsErrType(err, topo.NoNode) {
+                       return nil, vterrors.Wrapf(err, "%s keyspace does not exist", req.Keyspace)
+               }
+               return nil, err
+       }
+
        response := &vtctldatapb.WorkflowDeleteResponse{}
        response.Summary = fmt.Sprintf("Successfully cancelled the %s workflow in the %s keyspace", req.Workflow, req.Keyspace)
        details := make([]*vtctldatapb.WorkflowDeleteResponse_TabletInfo, 0, len(res))

We're removing the tables and other related artifacts before removing the workflow, so there's a race there that you would see when there's a high write rate in the stream.

If you don't mind, could you try the cancel with the legacy vtctlclient MoveTables (no d) command? The vtctldclient implementation is new in v18+ and that's where the issue lies, so the legacy command should work fine, which would also point to the cause above. If not, no worries. This is almost certainly the cause and I will work up a test case this week.

wiebeytec avatar wiebeytec commented on June 17, 2024

Great to read you found something.

However, when I try it with vtctlclient, it also errors:

vtctlclient --server 127.0.0.1:15999 MoveTables cancel sites2023.hourlies

E0521 02:12:46.851596 1603627 main.go:96] E0521 00:12:46.851158 vtctl.go:2196] 
cannot remove tables since one or more do not exist in the denylist

The following vreplication streams exist for workflow sites2023.hourlies:

id=87 on -40/vic-eu-central-1-fakelive-0000000300: Status: Error: task error: failed inserting rows: Table 'vt_sites2023.logDataHourlyDeltas' doesn't exist (errno 1146) (sqlstate 42S02) during query: insert into logDataHourlyDeltas(idSite,`timestamp`,idDataAttribute,instance,valueFloat,lastUpdate) values (2441,1555932600,103,0,1.2,1555933525), etc
id=87 on 40-80/vic-eu-central-1-fakelive-0000000301: Status: Error: task error: failed inserting rows: Table 'vt_sites2023.logDataHourlyDeltas' doesn't exist (errno 1146) (sqlstate 42S02) during query: insert into logDataHourlyDeltas(idSite,`timestamp`,idDataAttribute,instance,valueFloat,lastUpdate) values (2349,1637340300,101,0,0.109238,1637341230), etc
id=87 on 80-cc/vic-eu-central-1-fakelive-0000000302: Status: Error: task error: failed inserting rows: Table 'vt_sites2023.logDataHourlyDeltas' doesn't exist (errno 1146) (sqlstate 42S02) during query: insert into logDataHourlyDeltas(idSite,`timestamp`,idDataAttribute,instance,valueFloat,lastUpdate) values (2212,1616848200,533,0,901,1616849106),etc
id=87 on cc-/vic-eu-central-1-fakelive-0000000303: Status: Error: task error: failed inserting rows: Table 'vt_sites2023.logDataHourlyDeltas' doesn't exist (errno 1146) (sqlstate 42S02) during query: insert into logDataHourlyDeltas(idSite,`timestamp`,idDataAttribute,instance,valueFloat,lastUpdate) values (2330,1576764900,104,0,0.01,1576765822), etc

MoveTables Error: rpc error: code = Unknown desc = cannot remove tables since one or more do not exist in the denylist
E0521 02:12:46.888985 1603627 main.go:105] remote error: rpc error: code = Unknown desc = cannot remove tables since one or more do not exist in the denylist
