Comments (5)

shuhaowu commented on July 1, 2024

Another fact: if the query that we use to SELECT from the source database is incorrect, the current implementation of the IterativeVerifier cannot detect the error, because the IterativeVerifier relies on effectively the same query.

It's unclear if anything could detect an error in that SELECT statement. It may be one of those things that we have to just get right.

from ghostferry.

shuhaowu commented on July 1, 2024

I looked through some of our records and I've not seen any errors due to the IterativeVerifier. However, I cannot say definitively whether we have seen any errors due to the IterativeVerifier besides encoding issues, as we didn't keep very detailed error logs.

Looking through our existing code, the only test case that asserts that the IterativeVerifier finds some sort of algorithm corruption works as follows: after the binlog streamer quits, we deliberately modify a row on the target in the test and check whether the reverification step of the IterativeVerifier catches it. If a row is not copied to the target, any sort of inline verification would catch it immediately.
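
To make the shape of that test concrete, here is a minimal sketch in Go. It is not the actual Ghostferry test: `Row`, `Fingerprint`, and `Mismatches` are hypothetical names, and in-memory maps stand in for the source and target tables. It only illustrates the idea of tampering with one target row and checking that a fingerprint comparison flags it.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"sort"
)

// Row is a hypothetical stand-in for a table row: its column values in order.
type Row []string

// Fingerprint returns an MD5 hex digest over the row's column values.
// Each column is length-prefixed so that {"ab", "c"} and {"a", "bc"}
// do not produce the same digest.
func Fingerprint(r Row) string {
	h := md5.New()
	for _, col := range r {
		fmt.Fprintf(h, "%d:%s;", len(col), col)
	}
	return hex.EncodeToString(h.Sum(nil))
}

// Mismatches returns, in ascending order, the primary keys of source rows
// that are missing from the target or whose fingerprint differs there.
// Rows that exist only on the target are not checked in this sketch.
func Mismatches(source, target map[int]Row) []int {
	var bad []int
	for pk, srcRow := range source {
		tgtRow, ok := target[pk]
		if !ok || Fingerprint(srcRow) != Fingerprint(tgtRow) {
			bad = append(bad, pk)
		}
	}
	sort.Ints(bad)
	return bad
}

func main() {
	source := map[int]Row{1: {"alice", "a@example.com"}, 2: {"bob", "b@example.com"}}
	target := map[int]Row{1: {"alice", "a@example.com"}, 2: {"bob", "b@example.com"}}

	// Simulate the test: modify a row on the target after copying is done.
	target[2] = Row{"bob", "tampered"}

	fmt.Println("mismatched primary keys:", Mismatches(source, target))
}
```

Note that both sides are fingerprinted by the same function here, which mirrors the limitation raised earlier in the thread: if that shared logic is wrong, the comparison cannot reveal it.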

At this point I'm reasonably confident that the only thing we check is encoding issues.

shuhaowu commented on July 1, 2024

After some thought, there's a big blocker for this: there is no easy way to retrieve the source row MD5 directly from MySQL when streaming the binlog. To get around this problem, you would basically need another goroutine that constantly reverifies binlog entries in the background, similar to how the existing IterativeVerifier works.

This idea would still get rid of the data iteration during the IterativeVerifier's execution. This might reduce the total runtime of Ghostferry by a large amount. It may also make the interrupt/resume code easier to write, as there's only a copy phase to deal with, not a separate verification phase. However, it's not certain if the code will be easier to understand as a result.
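
A rough sketch of what such a background reverification goroutine could look like is below. Everything in it is an assumption made for illustration: `HashLookup`, `Reverifier`, `Enqueue`, and `Close` are hypothetical names, not Ghostferry APIs, and the lookups are plain functions where a real version would query MySQL for a row MD5.

```go
package main

import (
	"fmt"
	"sync"
)

// HashLookup is a hypothetical function that returns a row fingerprint for a
// primary key, for example by querying a database for the row's MD5.
// ok is false if the row does not exist.
type HashLookup func(pk uint64) (hash string, ok bool)

// Reverifier queues primary keys seen in binlog events and rechecks them
// against both databases in a background goroutine.
type Reverifier struct {
	source, target HashLookup
	queue          chan uint64
	wg             sync.WaitGroup
	mu             sync.Mutex
	mismatched     []uint64
}

// NewReverifier starts the background worker with a buffered queue.
func NewReverifier(source, target HashLookup, buffer int) *Reverifier {
	r := &Reverifier{source: source, target: target, queue: make(chan uint64, buffer)}
	r.wg.Add(1)
	go r.run()
	return r
}

// Enqueue would be called by a (hypothetical) binlog event handler for each
// changed row. It blocks if the queue is full.
func (r *Reverifier) Enqueue(pk uint64) { r.queue <- pk }

// run compares source and target fingerprints for each queued key.
func (r *Reverifier) run() {
	defer r.wg.Done()
	for pk := range r.queue {
		srcHash, srcOK := r.source(pk)
		tgtHash, tgtOK := r.target(pk)
		// A row deleted on both sides counts as matching.
		if srcOK != tgtOK || (srcOK && srcHash != tgtHash) {
			r.mu.Lock()
			r.mismatched = append(r.mismatched, pk)
			r.mu.Unlock()
		}
	}
}

// Close stops accepting keys, waits for the queue to drain, and returns a
// copy of the mismatched keys in the order they were checked.
func (r *Reverifier) Close() []uint64 {
	close(r.queue)
	r.wg.Wait()
	r.mu.Lock()
	defer r.mu.Unlock()
	return append([]uint64(nil), r.mismatched...)
}

func main() {
	src := map[uint64]string{1: "aaa", 2: "bbb"}
	tgt := map[uint64]string{1: "aaa", 2: "zzz"}
	lookup := func(m map[uint64]string) HashLookup {
		return func(pk uint64) (string, bool) { h, ok := m[pk]; return h, ok }
	}

	r := NewReverifier(lookup(src), lookup(tgt), 16)
	for _, pk := range []uint64{1, 2} {
		r.Enqueue(pk)
	}
	fmt.Println("mismatched primary keys:", r.Close())
}
```

One thing this sketch leaves out: while the binlog is still being applied, a mismatch may be transient because the target has not yet caught up. A real version would presumably need to retry mismatched keys before treating them as corruption.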

Going forward, some things to find out are:

  • How long does it take to run the verifier?
  • How much time would we save if the verification were inline with the DataIterator, with a background reverification queue for the binlogs?
  • Are there some refactors we can do to make the existing IterativeVerifier easier to understand and work with?
    • I think yes, as I've done some of this as a part of interrupt/resume.

shuhaowu commented on July 1, 2024

We are going ahead with this mainly because making the IterativeVerifier interruptible/resumable would make the codebase much more difficult to understand. Since the IterativeVerifier is almost the same as the InlineVerifier in terms of the corruption events it protects us against, implementing the InlineVerifier is preferred, as it will give us resumability with very few changes to that system.

shuhaowu commented on July 1, 2024

The InlineVerifier has been fully merged and at some point I'll need to remove the IterativeVerifier.

