Why not just use mysqldump if you want "one big batch"? That seems to be the better tool for this job.
from ghostferry.
Really, three reasons:
- simplicity: having to configure 2 tools (set up DB configs, etc.) introduces a chance of error
- streamline: ghostferry knows which tables must be copied and what has already been done. By having to invoke mysqldump, you move this logic into the caller / duplicate these checks
- foreign key constraints: if the schema contains table dependencies, you may need to create tables in a certain order, which may be mixed between mysqldump and ghostferry. See #161
But I agree: my first idea was also to use mysqldump. When I worked on automating it, though, it became too painful, and I thought native support would be a useful addition (and easier to test if built into ghostferry).
BTW, the review looks scary, but it's not really that big of a change. I just refactored part of the copy method to make it less of a beast, and I renamed the Cursor class to make its purpose clearer.
I'm not sure we have the cycles right now to review such a large PR, especially since we may have to validate it on our production setup if there are large refactors within Ghostferry.
That said, we also have just a few string tables. The way we handle them in ghostferry is by calling a script that runs mysqldump synchronously within the cutover process, as these tables are small and will not block the cutover for too long.
I'm uncertain what you mean by "too painful" to automate. Ghostferry is not really designed to be called directly, as the setup, teardown, and cutover process usually involves something unique per database, which is very difficult to generalize. As such, ghostferry defers those steps to the caller, and this is viewed more as a feature than a defect.
To give you an idea of what we do:
simplicity: having to configure 2 tools (set up DB configs, etc) introduces a chance of error
We derive the configuration of Ghostferry and mysqldump from the same code, which means the configuration is specified once and there's little room for error.
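As a rough illustration of deriving both tools' settings from one place, here is a minimal sketch of the mysqldump half. The `DBConfig` struct and `MysqldumpArgs` helper are hypothetical, not part of Ghostferry; the Ghostferry side would be filled from the same struct and is omitted. The flags used (`--host`, `--port`, `--user`, `--single-transaction`, `--no-create-info`) are standard mysqldump options, and the assumption that target tables already exist is stated in the code.

```go
package main

import (
	"fmt"
	"strconv"
)

// DBConfig is a hypothetical single source of truth for connection settings.
// Both the Ghostferry config and the mysqldump invocation would be derived
// from it, so the values are specified only once.
type DBConfig struct {
	Host     string
	Port     int
	User     string
	Database string
}

// MysqldumpArgs builds a mysqldump argument list from the shared config.
// Passwords are left out on purpose; mysqldump can read them from an option
// file instead of the command line.
func MysqldumpArgs(cfg DBConfig, tables []string) []string {
	args := []string{
		"--host=" + cfg.Host,
		"--port=" + strconv.Itoa(cfg.Port),
		"--user=" + cfg.User,
		"--single-transaction", // consistent InnoDB snapshot without table locks
		"--no-create-info",     // assumption: tables already exist on the target
		cfg.Database,
	}
	// mysqldump takes the database name followed by the tables to dump.
	return append(args, tables...)
}

func main() {
	src := DBConfig{Host: "source.db", Port: 3306, User: "ghostferry", Database: "shop"}
	fmt.Println(MysqldumpArgs(src, []string{"currencies", "locales"}))
}
```

Because the host, port, user, and database come from the same struct that would also populate the Ghostferry source config, the two tools cannot silently drift apart.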
streamline: ghostferry knows which tables must be copied and what has already been done. By having to invoke mysqldump, you move this logic into the caller / duplicate these checks
I'm uncertain what you mean exactly. We blacklist string tables from Ghostferry using the filter feature and feed those tables to mysqldump when applicable.
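The split described here can be sketched as a single function that decides, per table, which tool copies it. This is a hypothetical helper, not Ghostferry's filter interface; how the set of string tables is determined is an assumption left to the caller. In a real setup, one list would back the Ghostferry table filter and the other would be passed to mysqldump.

```go
package main

import (
	"fmt"
	"sort"
)

// PartitionTables (hypothetical) splits a database's tables into those
// Ghostferry should copy and those blacklisted from Ghostferry and handed to
// mysqldump instead. stringTables is supplied by the caller.
func PartitionTables(all []string, stringTables map[string]bool) (ghostferry, mysqldump []string) {
	for _, t := range all {
		if stringTables[t] {
			mysqldump = append(mysqldump, t)
		} else {
			ghostferry = append(ghostferry, t)
		}
	}
	// Sort both lists so the generated configs are deterministic.
	sort.Strings(ghostferry)
	sort.Strings(mysqldump)
	return ghostferry, mysqldump
}

func main() {
	all := []string{"orders", "currencies", "products", "locales"}
	skip := map[string]bool{"currencies": true, "locales": true}
	gf, md := PartitionTables(all, skip)
	fmt.Println("ghostferry:", gf)
	fmt.Println("mysqldump:", md)
}
```

Computing both lists in one place means a table cannot end up copied by both tools or by neither.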
foreign key constraints: if the schema contains table dependencies, you may need to create tables in a certain order - which may be mixed between mysqldump and ghostferry. See #161
We don't have these issues, but would it not be possible for the caller to create the tables out of band before Ghostferry is invoked? We could possibly add a flag to disable copydb's table creation logic.
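If the caller creates tables out of band, it needs an order that respects foreign keys. A minimal sketch, assuming the caller already has each table's referenced tables (for example from the `REFERENCED_TABLE_NAME` column of `information_schema.KEY_COLUMN_USAGE`), is a topological sort using Kahn's algorithm. `CreationOrder` is a hypothetical helper, and the proposed copydb flag is not modeled here.

```go
package main

import (
	"fmt"
	"sort"
)

// CreationOrder returns an order in which tables can be created so that every
// table referenced by a foreign key is created before the tables that
// reference it. deps maps a table to the tables it references. It uses
// Kahn's algorithm and returns an error if the references form a cycle.
func CreationOrder(deps map[string][]string) ([]string, error) {
	indegree := map[string]int{}
	dependents := map[string][]string{}
	for table, refs := range deps {
		if _, ok := indegree[table]; !ok {
			indegree[table] = 0
		}
		for _, ref := range refs {
			if ref == table {
				continue // a self-referencing FK does not constrain creation order
			}
			if _, ok := indegree[ref]; !ok {
				indegree[ref] = 0
			}
			indegree[table]++
			dependents[ref] = append(dependents[ref], table)
		}
	}

	// Start with tables that reference nothing; sort for a deterministic result.
	var ready []string
	for t, d := range indegree {
		if d == 0 {
			ready = append(ready, t)
		}
	}
	sort.Strings(ready)

	var order []string
	for len(ready) > 0 {
		t := ready[0]
		ready = ready[1:]
		order = append(order, t)
		var next []string
		for _, child := range dependents[t] {
			indegree[child]--
			if indegree[child] == 0 {
				next = append(next, child)
			}
		}
		sort.Strings(next)
		ready = append(ready, next...)
	}
	if len(order) != len(indegree) {
		return nil, fmt.Errorf("foreign key cycle among %d tables", len(indegree)-len(order))
	}
	return order, nil
}

func main() {
	deps := map[string][]string{
		"line_items": {"orders", "products"},
		"orders":     {"customers"},
		"products":   nil,
		"customers":  nil,
	}
	order, err := CreationOrder(deps)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(order)
}
```

The caller would run `CREATE TABLE` statements in this order on the target, then start Ghostferry with its own table creation disabled.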