
Comments (4)

shuhaowu commented on August 18, 2024

Why not just use mysqldump if you want "one big batch"? That seems to be the better tool for this job.

kolbitsch-lastline commented on August 18, 2024

Really three reasons:

  • simplicity: having to configure 2 tools (set up DB configs, etc) introduces a chance of error
  • streamline: ghostferry knows which tables must be copied and what has already been done. By having to invoke mysqldump, you move this logic into the caller / duplicate these checks
  • foreign key constraints: if the schema contains table dependencies, you may need to create tables in a certain order - which may be mixed between mysqldump and ghostferry. See #161

That said, I agree: my first idea was also to use mysqldump, but when I worked on automating it, it became too painful, and I thought this would be a useful addition (and easier to test if natively supported in ghostferry).

kolbitsch-lastline commented on August 18, 2024

BTW, note that the review looks scary, but it's not really that big of a change. I just refactored a bit of the copy method to make it less of a beast, and I renamed the Cursor class to be clearer.

shuhaowu commented on August 18, 2024

I'm not sure we have the cycles right now to review such a large PR, especially since we may have to perform validation on our production setup if there are large refactors within Ghostferry.

That said, we also have only a few string tables. The way we handle them in ghostferry is by calling a script to run mysqldump within the cutover process synchronously (as these tables are small and will not block the cutover for long).
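
A synchronous dump of a handful of small tables during cutover could be sketched like this (the table names, connection details, and the surrounding cutover hook are hypothetical placeholders; only the mysqldump flags themselves are real):

```python
import subprocess

# Tables assumed small enough that dumping them synchronously will not
# prolong the cutover noticeably; these names are placeholders.
SMALL_TABLES = ["string_table_a", "string_table_b"]

def build_dump_command(host, user, database, tables):
    """Assemble a mysqldump invocation for a fixed list of tables."""
    return [
        "mysqldump",
        "--host", host,
        "--user", user,
        "--single-transaction",  # consistent snapshot without locking InnoDB tables
        database,
    ] + tables

def dump_during_cutover(host, user, database, out_path):
    # Runs synchronously: the cutover waits until the dump completes.
    cmd = build_dump_command(host, user, database, SMALL_TABLES)
    with open(out_path, "wb") as out:
        subprocess.run(cmd, stdout=out, check=True)

# Show the command that a cutover hook would run (placeholder values).
print(" ".join(build_dump_command("source-db", "app", "mydb", SMALL_TABLES)))
```

Because the tables are tiny, the blocking `subprocess.run` keeps the cutover window predictable without any extra coordination.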

I'm uncertain what you mean by "too painful" to automate. Ghostferry is not really designed to be called directly, as the setup, teardown, and cutover processes usually involve something unique per database, which is very difficult to generalize. Ghostferry therefore defers those to the caller, and this is viewed more as a feature than a defect.

To give you an idea of what we do:

> simplicity: having to configure 2 tools (set up DB configs, etc) introduces a chance of error

We derive the configuration of Ghostferry and mysqldump from the same code, which means the configuration is specified once and there's little room for error.
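
Deriving both configurations from one source of truth might look like the following sketch (the ghostferry config field names here are illustrative, not the exact copydb schema; all values are placeholders):

```python
import json

# Single source of truth for the source database; values are placeholders.
SOURCE = {"host": "source-db", "port": 3306, "user": "app", "database": "mydb"}

def ghostferry_config(src):
    """Emit a ghostferry-style JSON config fragment (field names illustrative)."""
    return json.dumps({
        "Source": {"Host": src["host"], "Port": src["port"], "User": src["user"]},
    }, indent=2)

def mysqldump_args(src):
    """Emit mysqldump arguments from the same dict, so the two can't diverge."""
    return ["mysqldump", "--host", src["host"], "--port", str(src["port"]),
            "--user", src["user"], src["database"]]

print(ghostferry_config(SOURCE))
print(" ".join(mysqldump_args(SOURCE)))
```

Since both outputs are generated from `SOURCE`, the connection details are specified once and the two tools cannot silently disagree.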

> streamline: ghostferry knows which tables must be copied and what has already been done. By having to invoke mysqldump, you move this logic into the caller / duplicate these checks

I'm uncertain what you mean exactly. We blacklist string tables in Ghostferry using the filter feature and feed those tables to mysqldump, where applicable.
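
Driving both the ghostferry filter and the mysqldump table list from one blacklist could be sketched as follows (the table names are placeholders, and the filter wiring is assumed rather than taken from ghostferry's actual API):

```python
# All tables in the database; the "string" tables stand in for whatever
# is excluded from ghostferry and handed to mysqldump instead.
ALL_TABLES = ["users", "orders", "string_table_a", "string_table_b"]
BLACKLIST = {"string_table_a", "string_table_b"}

def partition(tables, blacklist):
    """Split tables into a ghostferry-copied set and a mysqldump-fed set."""
    ghostferry = [t for t in tables if t not in blacklist]
    dumped = [t for t in tables if t in blacklist]
    return ghostferry, dumped

ghostferry_tables, dump_tables = partition(ALL_TABLES, BLACKLIST)
# The same blacklist feeds ghostferry's table filter and mysqldump's
# argument list, so the two tools never disagree about who owns a table.
print(ghostferry_tables, dump_tables)
```

Keeping the partition in one place means adding a table to the blacklist automatically removes it from ghostferry's work and adds it to the dump, with no duplicated bookkeeping in the caller.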

> foreign key constraints: if the schema contains table dependencies, you may need to create tables in a certain order - which may be mixed between mysqldump and ghostferry. See #161

We don't have these issues, but is it not possible for the caller to create the tables out of band, before Ghostferry is invoked? We could possibly add a flag to disable copydb's table creation logic.
