shopify / ghostferry
The Swiss Army knife of live data migrations
Home Page: https://shopify.github.io/ghostferry
License: MIT License
Inline verification in the BatchWriter could log an error message such as the following:
level=error msg="failed to write batch to target, 1 of 5 max retries" error="row fingerprints for pks [XXXXXX] on schema.table do not match" tag=batch_writer
This error may show up multiple times and resolves itself. The move continues as normal and passes the final verification.
This could be caused by a harmless race condition between the BinlogWriter and the InlineVerifier. Since the InlineVerifier is not covered by TLA+, this race condition was not discovered until after implementation.
To understand the race condition, it is important to understand the operations that occur within the BatchWriter in pseudocode:
for i = 1..retries:
  (On source) BEGIN
  (On source) SELECT ... FOR UPDATE
  for j = 1..retries:
    (On target) BEGIN
    (On target) INSERT IGNORE ...
    (On target) SELECT MD5(...  # Checksum occurs here
    (On target) COMMIT/ROLLBACK
  (On source) COMMIT/ROLLBACK
The error above is emitted at the SELECT MD5(... line, which causes the inner loop to retry. The second time it reaches the SELECT MD5(... line, the verification succeeds, causing the code to break out of both the inner and outer loops and move on to the next batch of rows.
To understand the race, we can take a look at the state matrix of the source and target database row as the steps in the above pseudocode are executed. Specifically, we need to look at how the actions are serialized. We denote the database row values with v1 and v2, where v1 represents one set of values and v2 represents a different set of values. We define a "payload" as either the argument an action is given (e.g. INSERT v1 INTO ...) or the result an action receives (e.g. SELECT FROM ... -> v1). Let's also assume that the error occurred once and resolved itself after:
Step | Actor | Action | Payload | Source | Target |
---|---|---|---|---|---|
1 | DataIterator | BEGIN | N/A | ?? | ?? |
2 | DataIterator | SELECT FOR UPDATE | v2 | ?? | ?? |
3 | BatchWriter | BEGIN | N/A | ?? | ?? |
4 | BatchWriter | INSERT IGNORE | v2 | ?? | ?? |
5 | BatchWriter | SELECT MD5() | v2 | ?? | ?? |
6 | BatchWriter | ROLLBACK | N/A | ?? | ?? |
7 | BatchWriter | BEGIN | N/A | ?? | ?? |
8 | BatchWriter | INSERT IGNORE | v2 | ?? | ?? |
9 | BatchWriter | SELECT MD5() | v2 | v2 | v2 |
10 | BatchWriter | COMMIT | N/A | ?? | ?? |
11 | DataIterator | ROLLBACK | N/A | ?? | ?? |
At step 9, we know the checksum succeeds. Thus payload == source == target at this step. Since the source row (and therefore its checksum) is obtained in step 2, we know the payload is always v2.
Filling out the rest of the states according to the transactional guarantees provided by MySQL gives us the following (filling it out is left as an exercise to the reader):
Step | Actor | Action | Payload | Source | Target |
---|---|---|---|---|---|
1 | DataIterator | BEGIN | N/A | v2 | v1 |
2 | DataIterator | SELECT FOR UPDATE | v2 | v2 | v1 |
3 | BatchWriter | BEGIN | N/A | v2 | v1 |
4 | BatchWriter | INSERT IGNORE | v2 | v2 | v1 |
5 | BatchWriter | SELECT MD5() | v2 | v2 | v1 |
6 | BatchWriter | ROLLBACK | N/A | v2 | v1 |
7 | BatchWriter | BEGIN | N/A | v2 | v2 |
8 | BatchWriter | INSERT IGNORE | v2 | v2 | v2 |
9 | BatchWriter | SELECT MD5() | v2 | v2 | v2 |
10 | BatchWriter | COMMIT | N/A | v2 | v2 |
11 | DataIterator | ROLLBACK | N/A | v2 | v2 |
The only way such states could be obtained is by inserting some additional actions before step 1 and between steps 6 and 7:
Step | Actor | Action | Payload | Source | Target |
---|---|---|---|---|---|
0a | Application | INSERT (SOURCE) | v1 | v1 | nil |
0b | Application | UPDATE (SOURCE) | v2 | v2 | nil |
0c | BinlogWriter | INSERT | v1 | v2 | v1 |
1 | DataIterator | BEGIN | N/A | v2 | v1 |
2 | DataIterator | SELECT FOR UPDATE | v2 | v2 | v1 |
3 | BatchWriter | BEGIN | N/A | v2 | v1 |
4 | BatchWriter | INSERT IGNORE | v2 | v2 | v1 |
5 | BatchWriter | SELECT MD5() | v2 | v2 | v1 |
6 | BatchWriter | ROLLBACK | N/A | v2 | v1 |
6a | BinlogWriter | UPDATE | v2 | v2 | v2 |
7 | BatchWriter | BEGIN | N/A | v2 | v2 |
8 | BatchWriter | INSERT IGNORE | v2 | v2 | v2 |
9 | BatchWriter | SELECT MD5() | v2 | v2 | v2 |
10 | BatchWriter | COMMIT | N/A | v2 | v2 |
11 | DataIterator | ROLLBACK | N/A | v2 | v2 |
This is relatively harmless, and fixing it would require more effort than it is worth for now.
Several steps are required to get to a resumable Ghostferry that can handle schema changes. This issue outlines the required steps in order.
To achieve this goal, we removed the IterativeVerifier and replaced it with the InlineVerifier. The InlineVerifier is conceptually simpler than the IterativeVerifier. The interrupt/resume code is less complicated than it would have been for the IterativeVerifier.
Hook the Ghostferry StateTracker to a signal handler and allow the normal copy phase of Ghostferry to be interrupted and resumed, along with the IterativeVerifier. These are completed in #120, #121, #122, #123, #124, #127, and #129.
After these are done, we should try to integrate Ghostferry changes into any downstream consumers to make sure it works correctly.
These are pretty tentative.
Implement error handling such that errors caused by a schema change are retried/stalled.
Implement schema change detection via binlog on both the source and the target.
We can tie the metrics of how many rows we are verifying to the number of binlog events we are streaming to figure out how many times we should reverify before we go to cutover. With some sort of basic controls algorithm we could potentially reduce the downtime due to verification to essentially 0.
If we implement this, we need to be able to configure a maximum number of reverifications.
AutomaticCutover is put there in order to prevent Ghostferry from automatically finishing, allowing the cutover steps to be run manually by a human operator. However, the way it is done (blocking the onFinishedIterations callback for as long as AutomaticCutover == false, which blocks WaitUntilRowCopyIsComplete) feels like a hack. A cleaner strategy might be having a separate wait-for-manual-cutover step or something like that.
In interrupt_resume_test.rb, we send SIGTERM to Ghostferry when ROW_COPY_COMPLETED is sent. However, the status handler doesn't wait until Ghostferry exits before returning. This means that Ghostferry is free to continue executing code, possibly moving an additional batch before quitting. Moving an additional batch results in a failed test occasionally.
To fix this, the send_signal call needs to be changed to something similar to .kill.
Some potential things that we need to look at for the IterativeVerifier:
After some discussion with @pushrax and @hkdsun, we think it may be possible that all the IterativeVerifier does is check if a type, encoding, or truncation issue (henceforth referred to as encoding issues) caused data corruption to occur during the BatchWriter and BinlogWriter. Examples of this are: #23 and go-mysql-org/go-mysql#205. It is unclear what other issues the IterativeVerifier can catch. This is especially true because the IterativeVerifier uses the same BinlogStreamer and data iteration as the Ferry: if there are any bugs within those structs, they would equally affect the IterativeVerifier.
If the assumptions above are indeed true, that is, the IterativeVerifier exists purely to catch encoding issues due to the need to translate data from MySQL to Golang and back to MySQL, then the verification of data correctness could instead be done directly after the writes occur. We could then eliminate the IterativeVerifier and the overall architecture of Ghostferry would be much simpler. A simpler architecture also means the likelihood of bugs would be lower, so it's an appealing idea. It would also make implementing interrupt/resume much easier.
We should dig a little bit further into whether the IterativeVerifier only ever catches encoding issues. If so, we should decide if we want to get rid of the IterativeVerifier in favour of inline verification.
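As a rough illustration of what "verify directly after the writes" could mean, here is a hedged Go sketch that compares an MD5 fingerprint of the same row on the source and the target right after a write. The table, column names, and per-row granularity are placeholders; a real implementation would verify whole batches inside the write transactions rather than issuing separate per-row queries.

// Sketch only: not Ghostferry's actual verifier code.
package inlineverify

import (
	"database/sql"
	"fmt"
)

func fingerprintsMatch(source, target *sql.DB, schema, table string, pk uint64) (bool, error) {
	// CONCAT_WS with a separator keeps column boundaries unambiguous in the hash input.
	query := fmt.Sprintf(
		"SELECT MD5(CONCAT_WS(',', `id`, `data`)) FROM `%s`.`%s` WHERE `id` = ?",
		schema, table,
	)

	var sourceSum, targetSum string
	if err := source.QueryRow(query, pk).Scan(&sourceSum); err != nil {
		return false, err
	}
	if err := target.QueryRow(query, pk).Scan(&targetSum); err != nil {
		return false, err
	}
	return sourceSum == targetSum, nil
}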
The original design of Ghostferry only included a CHECKSUM TABLE verifier. This verifier is implemented effectively outside of Ghostferry. It ensures the entire Ghostferry algorithm is implemented correctly (assuming CHECKSUM TABLE is itself implemented correctly). The IterativeVerifier was conceived as a drop-in replacement for CHECKSUM TABLE so we can verify partial table moves. This means the original mental model for this "drop-in" IterativeVerifier is that it would "ensure the entire Ghostferry algorithm is implemented correctly". It's unclear whether this mental model was ever directly challenged.
So instead of getting rid of the IterativeVerifier, we could make the current IterativeVerifier match the mental model above more closely. However, it's equally unclear how to do this without reimplementing the binlog streaming and data iteration.
The most common type of failure in Shopify's internal usage of Ghostferry is errors caused by changing schemas. These can manifest themselves in many different ways, including the DataIterator/BinlogWriter inserting data into deleted columns, tables disappearing on the target database, and other incompatibilities between the source and target databases.
This is especially problematic for very time-consuming ferry runs, as the longer runtime increases the likelihood of this class of failures.
We need to think about how this can be remedied. Starting with a limited scope seems like a good idea; for example, start by supporting addition/removal of columns or tables.
cc @Shopify/pods
There are some really weird errors when running Ghostferry for an extended amount of time. Errors like the following show up in the logs when I run Ghostferry to move a large amount of data:
11786:[mysql] 2017/08/29 17:02:37 connection.go:67: invalid connection
12772:[mysql] 2017/08/29 17:15:40 packets.go:130: write tcp [REDACTED]->[REDACTED]: write: broken pipe
12773:[mysql] 2017/08/29 17:15:40 packets.go:130: write tcp [REDACTED]->[REDACTED]: write: broken pipe
12774:[mysql] 2017/08/29 17:15:40 connection.go:97: write tcp [REDACTED]->[REDACTED]: write: broken pipe
13756:[mysql] 2017/08/29 17:28:17 packets.go:66: unexpected EOF
13757:[mysql] 2017/08/29 17:28:17 packets.go:412: busy buffer
13760:[mysql] 2017/08/29 17:28:17 connection.go:67: invalid connection
15713:[mysql] 2017/08/29 17:55:37 packets.go:33: unexpected EOF
15718:[mysql] 2017/08/29 17:55:40 connection.go:67: invalid connection
16444:[mysql] 2017/08/29 18:12:27 packets.go:66: unexpected EOF
16445:[mysql] 2017/08/29 18:12:27 packets.go:412: busy buffer
16448:[mysql] 2017/08/29 18:12:27 connection.go:67: invalid connection
17208:[mysql] 2017/08/29 18:27:57 packets.go:33: unexpected EOF
17211:[mysql] 2017/08/29 18:27:57 connection.go:67: invalid connection
17295:[mysql] 2017/08/29 18:29:37 packets.go:66: unexpected EOF
17296:[mysql] 2017/08/29 18:29:37 packets.go:412: busy buffer
17299:[mysql] 2017/08/29 18:29:37 connection.go:67: invalid connection
No issues seem to arise from this, as I think the underlying driver just reconnects, but I cannot be certain. Upstream has a lot of bug reports describing similar behaviour. The comments seem to be all over the place, so I don't really know what to make of them (go-sql-driver/mysql#582, go-sql-driver/mysql#674, go-sql-driver/mysql#673, to name a few).
Also, I just learned that go-sql-driver/mysql#302 was merged about two weeks ago, which supposedly fixed an issue with potentially sending duplicate queries to MySQL. However, another bug seems to be present after it: go-sql-driver/mysql#657.
Also, I noticed while tcpdumping that the errors are caused by connection resets from the MySQL server rather than the client (Ghostferry). This may be a red herring, however, as I only looked at a few cases and I no longer have the logs.
I haven't run a detailed investigation into this or seen any case where it causes issues during a real run.
Please see #24 for fixes.
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON `abc`.* TO 'ghostferry'@'%'; is an invalid query (you cannot grant REPLICATION SLAVE and REPLICATION CLIENT on a schema).
The SELECT * FROM abc.table1 WHERE id = 351; validation query needs to run on the target DB, not the source DB.
We need some documentation for this project. Specifically:
And possibly:
For the API doc, we can use godoc. We can document the rest via GitHub Pages using Sphinx.
ghostferry-sharding will be undocumented for now, as it is not a stable general-purpose tool at this moment, but that may change in the future as it is well built and tested.
Exporting state on stdout works really well, but it has an annoying gotcha: if the tool invocation fails (e.g., a configuration error, DB connectivity error, or something similar) the output on stdout will not contain a valid resume state, as we never entered the main loop of ghostferry-copydb. As a result, the caller must be aware of any startup failure and has to take special precaution to not overwrite the statefile generated by a previous tool invocation.
For example: if we run ghostferry-copydb in a loop, where the previous iteration's stdout is written to disk and passed as CLI argument to the next iteration, any iteration where tool invocation fails will inevitably overwrite (and thus lose) the state.
One way to handle this situation would be to configure ghostferry to dump state to a file, and to only write the updated state once we are in a condition where this makes sense - we have successfully started, loaded the original state (if any), are able to export the state, etc. At the same time, we could import the state on copydb start (if that file already exists).
Note that this could also be handled outside the ghostferry library or ghostferry-copydb by putting more intelligence into a caller. But, by embedding more careful handling in the library/tool itself, we can provide users with safe default handling and avoid different solutions to a shared problem.
Clearly, whatever change we make in the library/tool must allow for backwards compatibility, maintaining the original behaviour if that is what a caller expects.
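As a sketch of the file-based approach, the state could be written atomically (write to a temp file, then rename) so that a failed or partial run never clobbers the previous statefile. Everything below (function name, JSON layout, when it is called) is an assumption for illustration, not existing ghostferry-copydb behaviour.

// Sketch only: atomic statefile write.
package statefile

import (
	"encoding/json"
	"os"
	"path/filepath"
)

func WriteState(path string, state interface{}) error {
	data, err := json.MarshalIndent(state, "", "  ")
	if err != nil {
		return err
	}

	// Write to a temporary file in the same directory so the final rename is atomic.
	tmp, err := os.CreateTemp(filepath.Dir(path), ".ghostferry-state-*")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // best-effort cleanup; harmless if the rename succeeded

	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}

	return os.Rename(tmp.Name(), path)
}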
In ghostferry.tests.TestIgnoresTables:
--- FAIL: TestIgnoresTables (1.70s)
Error Trace: iterative_verifier_integration_test.go:139
integration_test_case.go:158
integration_test_case.go:105
integration_test_case.go:42
integration_test_case.go:33
iterative_verifier_integration_test.go:153
Error: Should be true
Some deadlocking issues:
ERRO[0006] failed to write events to target, 1 of 5 max retries error="exec query (445 bytes): Error 1213: Deadlock found when trying to get lock; try restarting transaction" tag=binlog_writer
ERRO[0006] failed to write events to target, 2 of 5 max retries error="exec query (445 bytes): Error 1213: Deadlock found when trying to get lock; try restarting transaction" tag=binlog_writer
ERRO[0006] failed to write events to target, 3 of 5 max retries error="exec query (445 bytes): Error 1213: Deadlock found when trying to get lock; try restarting transaction" tag=binlog_writer
ERRO[0006] failed to write events to target, 4 of 5 max retries error="exec query (445 bytes): Error 1213: Deadlock found when trying to get lock; try restarting transaction" tag=binlog_writer
ERRO[0006] failed to write events to target after 5 attempts, retry limit exceeded error="exec query (445 bytes): Error 1213: Deadlock found when trying to get lock; try restarting transaction" tag=binlog_writer
ERRO[0006] fatal error detected, state dump coming in stdout errfrom=binlog_writer error="exec query (445 bytes): Error 1213: Deadlock found when trying to get lock; try restarting transaction" tag=error_handler
{
"CompletedTables": {},
"LastSuccessfulBinlogPos": {
"Name": "mysql-bin.000003",
"Pos": 852531
},
"LastSuccessfulPrimaryKeys": {
"gftest.table1": 459
}
}
panic: fatal error detected, see logs for details
goroutine 1624 [running]:
github.com/Shopify/ghostferry.(*PanicErrorHandler).Fatal(0xc422171090, 0x8d27a3, 0xd, 0xb502a0, 0xc422fa6140)
/home/ubuntu/.go_project/src/github.com/Shopify/ghostferry/error_handler.go:44 +0x70a
github.com/Shopify/ghostferry.(*BinlogWriter).Run(0xc420220af0)
/home/ubuntu/.go_project/src/github.com/Shopify/ghostferry/binlog_writer.go:60 +0x37d
github.com/Shopify/ghostferry.(*Ferry).Run.func4(0xc422e91ba0, 0xc420492b40)
/home/ubuntu/.go_project/src/github.com/Shopify/ghostferry/ferry.go:288 +0x55
created by github.com/Shopify/ghostferry.(*Ferry).Run
/home/ubuntu/.go_project/src/github.com/Shopify/ghostferry/ferry.go:286 +0x249
exit status 2
FAIL github.com/Shopify/ghostferry/test 6.349s
A proper abstraction could replace the ad-hoc statement cache in line 21 at commit ee2ff82.
Originally by @pushrax:
Right now only integer types are supported. gh-ost handles this by using interface{} and letting the type propagate through untouched.
It's also possible to support composite keys in this way.
Comment by @shuhaowu:
Any thoughts on having a PK type? This would be more explicit than an interface{}. Additionally, it may allow us to quickly refactor the code with the new type without having to change the underlying type right now from uint64 to something else.
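A minimal sketch of what such a PK type might look like, assuming we keep uint64 underneath for now. All names here are illustrative, not Ghostferry's actual types; the point is that the underlying representation could later become interface{} or a composite key without touching call sites again.

// Sketch only: a dedicated primary key type wrapping uint64 for now.
package ghostferrypk

import "fmt"

type PK struct {
	value uint64
}

func NewPK(v uint64) PK { return PK{value: v} }

// Uint64 returns the underlying value for code paths that still need an integer.
func (p PK) Uint64() uint64 { return p.value }

// Compare returns -1, 0, or 1; pagination only needs an ordering, not arithmetic.
func (p PK) Compare(other PK) int {
	switch {
	case p.value < other.value:
		return -1
	case p.value > other.value:
		return 1
	default:
		return 0
	}
}

func (p PK) String() string { return fmt.Sprintf("%d", p.value) }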
Clean up f.inlineVerifier on Ferry and make it more integrated into Ferry directly.
Revisit the Verifier interface to see if it needs to be extended or removed.
Investigate a possible race in InlineVerifier.verifyAllEventsInStore(). Specifically, if a row is added to the BinlogVerifyStore between the .Batches() call and the .RemoveVerifiedBatches() call, that row must remain present after .RemoveVerifiedBatches().
Add test coverage for verifyAllEventsInStore, a private method. One idea: add entries to the store while calling verifyAllEventsInStore once and see if the counter only decreases by 1, or something along this line of thought.
Ghostferry currently documents that it cannot work with foreign-key constraints (FKCs).
Intuitively this makes sense for two reasons:
However: MySQL allows disabling foreign-key constraint checks on a per-session basis, and it does not re-validate constraints when this is disabled. As a result, we may temporarily disable constraint enforcement until the database is back in a consistent state. The only issue that does arise is that tables must be created in an order that satisfies their inter-dependencies.
The Go MySQL driver even allows disabling constraints on a DB connection using a simple configuration change, making support even easier.
I have a working version of the above table creation change and was curious if you guys think it's a useful addition to the ghostferry-copydb tool. I understand it's somewhat hack-ish, but it could be useful in many scenarios where ghostferry cannot be used today.
Also: am I overlooking something with my assumption that disabling FKCs during the copy process is a problem?
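For reference, here is a minimal sketch of the per-session approach with go-sql-driver/mysql: DSN parameters the driver does not recognise are sent to the server as session system variables, so foreign_key_checks can be disabled for every connection in the pool. Host, credentials, and where this would hook into Ghostferry are placeholders.

// Sketch only: disabling FK checks per session via the DSN.
package main

import (
	"database/sql"
	"log"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	// foreign_key_checks=0 is applied when each connection is established,
	// so it covers every connection database/sql creates for the pool.
	dsn := "ghostferry:password@tcp(target-db:3306)/?foreign_key_checks=0"

	db, err := sql.Open("mysql", dsn)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Alternatively, it can be set explicitly on a single session:
	// _, err = db.Exec("SET SESSION foreign_key_checks = 0")
}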
Moving a shop failed with an Error 3144: Cannot create a JSON value from a string with CHARACTER SET binary error.
While testing a copy, I found this problem. The table column type is:
| sf_modify_time | timestamp | NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
I'll try to reproduce this and record it here.
The binlog_row_image parameter was introduced in MariaDB 10.1.6.
https://mariadb.com/kb/en/library/replication-and-binary-log-system-variables/#binlog_row_image
Right now, the HTML/CSS associated with the control server is configured to a predefined location by the Debian package. With a go build version of Ghostferry, one has to specify this as a config option. This is not ideal. Perhaps there are better alternatives here.
...
INFO[0271] starting iterative verification in the background tag=iterative_verifier
INFO[0271] served http request method=POST path=/api/actions/verify tag=control_server time="44.141µs"
INFO[0271] starting verification during cutover tag=iterative_verifier
DEBU[0271] reverifying batches=2 tag=iterative_verifier
DEBU[0271] received pk batch to reverify len(pks)=1 table=t1 tag=iterative_verifier
DEBU[0271] received pk batch to reverify len(pks)=1 table=t2 tag=iterative_verifier
INFO[0271] cutover verification complete tag=iterative_verifier
panic: sync: negative WaitGroup counter
goroutine 6547 [running]:
sync.(*WaitGroup).Add(0xc00029c650, 0xffffffffffffffff)
/usr/lib/go/src/sync/waitgroup.go:74 +0x137
sync.(*WaitGroup).Done(0xc00029c650)
/usr/lib/go/src/sync/waitgroup.go:99 +0x34
github.com/Shopify/ghostferry.(*IterativeVerifier).StartInBackground.func1(0xc000314000)
/home/user1/go/src/github.com/Shopify/ghostferry/iterative_verifier.go:288 +0x117
created by github.com/Shopify/ghostferry.(*IterativeVerifier).StartInBackground
/home/user1/go/src/github.com/Shopify/ghostferry/iterative_verifier.go:279 +0x1f6
DEBU[8766] found 200 rows args="[1089172653]" sql="SELECT `id` FROM `db1`.`table1` WHERE `id` > ? ORDER BY `id` LIMIT 200" table=db1.table1 tag=cursor
DEBU[8766] found 200 rows args="[1089172906]" sql="SELECT `id` FROM `db1`.`table1` WHERE `id` > ? ORDER BY `id` LIMIT 200" table=db1.table1 tag=cursor
PANI[8766] logpos: 28594 0 *replication.GenericEvent tag=binlog_streamer
INFO[8766] exiting binlog streamer tag=binlog_streamer
[2018/11/16 15:26:51] [info] binlogsyncer.go:163 syncer is closing...
DEBU[8766] found 200 rows args="[1089173124]" sql="SELECT `id` FROM `db1`.`table1` WHERE `id` > ? ORDER BY `id` LIMIT 200" table=db1.table1 tag=cursor
DEBU[8766] found 200 rows args="[1089173356]" sql="SELECT `id` FROM `db1`.`table1` WHERE `id` > ? ORDER BY `id` LIMIT 200" table=db1.table1 tag=cursor
DEBU[8767] found 200 rows args="[1089173568]" sql="SELECT `id` FROM `db1`.`table1` WHERE `id` > ? ORDER BY `id` LIMIT 200" table=db1.table1 tag=cursor
[2018/11/16 15:26:51] [error] binlogstreamer.go:57 close sync with err: sync is been closing...
[2018/11/16 15:26:51] [info] binlogsyncer.go:178 syncer is closed
panic: (*logrus.Entry) (0x8ccd40,0xc000be3b30)
goroutine 42 [running]:
github.com/Shopify/ghostferry/vendor/github.com/sirupsen/logrus.Entry.log(0xc00013a1e0, 0xc0001de4e0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/home/user1/go/src/github.com/Shopify/ghostferry/vendor/github.com/sirupsen/logrus/entry.go:128 +0x5a8
github.com/Shopify/ghostferry/vendor/github.com/sirupsen/logrus.(*Entry).Panic(0xc00013b090, 0xc000959ca8, 0x1, 0x1)
/home/user1/go/src/github.com/Shopify/ghostferry/vendor/github.com/sirupsen/logrus/entry.go:173 +0xb2
github.com/Shopify/ghostferry/vendor/github.com/sirupsen/logrus.(*Entry).Panicf(0xc00013b090, 0x8e4a89, 0x10, 0xc0000bdd30, 0x3, 0x3)
/home/user1/go/src/github.com/Shopify/ghostferry/vendor/github.com/sirupsen/logrus/entry.go:221 +0xed
github.com/Shopify/ghostferry.(*BinlogStreamer).updateLastStreamedPosAndTime(0xc000105ba0, 0xc000c442a0)
/home/user1/go/src/github.com/Shopify/ghostferry/binlog_streamer.go:211 +0x140
github.com/Shopify/ghostferry.(*BinlogStreamer).Run(0xc000105ba0)
/home/user1/go/src/github.com/Shopify/ghostferry/binlog_streamer.go:169 +0x356
github.com/Shopify/ghostferry.(*Ferry).Run.func4(0xc000298610, 0xc00015c0c0)
/home/user1/go/src/github.com/Shopify/ghostferry/ferry.go:339 +0x55
created by github.com/Shopify/ghostferry.(*Ferry).Run
/home/user1/go/src/github.com/Shopify/ghostferry/ferry.go:336 +0x243
The interrupt-resume feature as described in https://shopify.github.io/ghostferry/master/copydbinterruptresume.html works well, as long as the interrupt does not happen while a batch of RowsEvent events is being processed.
A MySQL replication event for changing data always starts with a TableMapEvent (describing the table to be changed), followed by one or more RowsEvent events (containing the data to be changed). If multiple consecutive RowsEvent events are sent for the same table, the TableMapEvent is typically skipped (after being sent once).
Thus, if the interrupt happens after receiving the TableMapEvent but before receiving the last RowsEvent, Ghostferry will try to resume from the last processed RowsEvent, causing the replication/BinlogSyncer syncer to crash with an "invalid table id <table-ID>, no corresponding table map event" error.
This is due to the following code:
var ok bool
e.Table, ok = e.tables[e.TableID]
if !ok {
    if len(e.tables) > 0 {
        return errors.Errorf("invalid table id %d, no corresponding table map event", e.TableID)
    } else {
        return errMissingTableMapEvent(errors.Errorf("invalid table id %d, no corresponding table map event", e.TableID))
    }
}
Note that this is not a bug in the replication module, because ghostferry points the syncer at a resume position after the TableMapEvent , and the code cannot satisfy this (without ignoring the fact that it doesn't know the table ID).
Changing the replication to ignore unknown table IDs is tempting but very tricky: we could cache the table IDs previously seen in ghostferry and "fill the gap" this way. Unfortunately the parsing of the RowsEvent data relies on the table schema. And, conceptually, it also seems quite unclean.
Thus, IMO the correct approach is to make sure the syncer is not pointed at a location from which it cannot resume.
Ghostferry today only stores the resume position via its lastStreamedBinlogPosition property. In my variant of Ghostferry (see #26), I have changed the BinlogStreamer class to keep track of an additional position: lastResumeBinlogPosition. The idea is to analyze each event received from the syncer and check whether it is a RowsEvent. If not, the event position is considered safe to resume from and the property is updated; otherwise it is kept at its current value.
Thus, the resume position is a tuple of "last received/processed event" and "last position we can resume from".
Then, when resuming, we do not resume from lastStreamedBinlogPosition, but instead from lastResumeBinlogPosition. Any event received in BinlogStreamer.Run is first checked to see whether its position is before lastStreamedBinlogPosition, and if so, it is skipped. Otherwise it is queued to the eventListeners.
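To illustrate the mechanism (this is a sketch, not the actual changes from #26), a tracker could keep the two positions side by side and only advance the resumable one on non-RowsEvent events. Types are from the go-mysql replication library; exact import paths depend on the vendored version.

// Sketch only: tracking a "safe to resume from" position alongside the streamed one.
package binlogpos

import (
	"github.com/go-mysql-org/go-mysql/mysql"
	"github.com/go-mysql-org/go-mysql/replication"
)

type ResumeTracker struct {
	lastStreamedPosition  mysql.Position
	lastResumablePosition mysql.Position
}

// Observe is called for every event received from the syncer.
func (t *ResumeTracker) Observe(ev *replication.BinlogEvent) {
	t.lastStreamedPosition.Pos = ev.Header.LogPos

	switch ev.Event.(type) {
	case *replication.RowsEvent:
		// Not safe to resume here: the corresponding TableMapEvent may not be
		// re-sent if we point the syncer at this position.
	default:
		t.lastResumablePosition.Pos = ev.Header.LogPos
	}
}

// ShouldSkipOnResume reports whether an event replayed after resuming from
// lastResumablePosition was already processed before the interrupt.
// A real implementation would also compare the binlog file names.
func (t *ResumeTracker) ShouldSkipOnResume(ev *replication.BinlogEvent) bool {
	return ev.Header.LogPos <= t.lastStreamedPosition.Pos
}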
Happy to share my changes as a pull request. Unfortunately, it's currently somewhat entangled with my changes for ghostferry-replicatedb (see the other ticket). So before I untangle these two unrelated changes, I wanted to hear your opinions on the above, and whether I'm overlooking a simpler way of doing this.
I ran across your spec and thought I'd give some tips on using TLA+ here!
SourceTable = InitialTable means you have to create a separate model for each different initial state. What we could instead do is say:
CONSTANT TableCapacity
InitialTables == [1..TableCapacity -> PossibleRecords]
(*--algorithm
variables SourceTable \in InitialTables;
This sets the initial SourceTable to some possible InitialTable, meaning that a single model can now explore every single table of size TableCapacity. This also has a performance bonus, since by wrapping everything in a single model TLC can skip checking symmetric states.
If you don't need to account for gaps in your table, i.e. NoRecordHere isn't part of your spec, then we can replace our InitialTables with:
InitialTables == UNION {[1..tc -> Records]: tc \in 0..TableCapacity}
This now generates all tables of size TableCapacity or less. This means instead of writing <<r0, r1, NoRecordHere>> we write <<r0, r1>>.
If you do need NoRecordHere, instead of defining it as something TLC can't check, try doing this:
CONSTANT NoRecord
ASSUME NoRecord \notin Records
That signals the intent more clearly, and also doesn't require you to override the module. Instead you just make NoRecord a model value.
You currently have this:
binlog_loop: while (...) {
binlog_read: if (...) {
...
binlog_write: ...
binlog_upkey: ...
};
}
It's hard to tell if the labels are sublabels or side-by-side or what. Instead, try this:
binlog_loop:
  while (...) {
    binlog_read:
      if (...) {
        ...
        binlog_write:
          ...
        binlog_upkey:
          ...
      };
  }
Now it's clear that binlog_write and binlog_upkey are siblings under binlog_read.
Hope these tips help!
Originally by @pushrax:
The Go library will return an error (ErrPktTooLarge) if a packet is written that exceeds what it thinks is the max packet size.
By default that's 4 MiB. The default values are set in NewConfig(). However, we're not calling NewConfig(); we're building the config object ourselves. This means that in this code MaxAllowedPacket is 0, and the driver loads the limit from the server. I tested this locally and it indeed always loads the limit from the server. Thus, at the moment, exceeding the real max packet size will always return a nice Go error.
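A small sketch of the difference (hosts and credentials are placeholders): a hand-built mysql.Config leaves MaxAllowedPacket at its zero value, telling the driver to query max_allowed_packet from the server, while NewConfig() starts from the driver's 4 MiB default.

// Sketch only: comparing a hand-built config with NewConfig().
package main

import (
	"database/sql"
	"log"

	"github.com/go-sql-driver/mysql"
)

func main() {
	// Zero value for MaxAllowedPacket: the driver loads the limit from the server.
	handBuilt := &mysql.Config{
		User: "ghostferry",
		Net:  "tcp",
		Addr: "source-db:3306",
	}

	// NewConfig(): MaxAllowedPacket starts at the driver default (4 MiB) unless overridden.
	withDefaults := mysql.NewConfig()
	withDefaults.User = "ghostferry"
	withDefaults.Net = "tcp"
	withDefaults.Addr = "source-db:3306"

	for _, cfg := range []*mysql.Config{handBuilt, withDefaults} {
		db, err := sql.Open("mysql", cfg.FormatDSN())
		if err != nil {
			log.Fatal(err)
		}
		db.Close()
	}
}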
Some of the databases we want to copy using ghostferry-copydb are simple key-value pairs (2 string columns). This is currently not supported by ghostferry, as we need indexed columns (typically a primary key) to paginate by.
Given the nature of tables without such indices, it might be useful to be able to copy such tables "in one big batch". Clearly this only applies in very specific scenarios and has strong limitations if these tables have many rows (e.g., we must lock the entire table for the copy, and cannot resume a copy), but in certain scenarios it might be useful.
I have a working version of the above copy algorithm that I'm happy to send for review - but I'm not sure if you guys think this is generally useful. Please let me know and I can send a PR.
Also note that I agree it's probably better to extend the pagination mechanism to support arbitrary tables. However, this seems to be more work than is reasonably possible in the near future, and it would not solve the issue with tables that don't have proper indices (although that is indeed a weird corner case).
Currently, Ghostferry will only attempt to acquire the cutover lock a single time before failing. We want to modify Ghostferry to retry acquiring the cutover lock, first using a static value, and then eventually using a dynamic value sent back that tells Ghostferry for how long to wait before attempting to acquire the lock again.
I found that a replication event was not applied on the target tables. But when I debugged it, it printed the apply SQL!
The datawriter was causing some flakiness in tests (and almost certainly on Travis) where:
This was worked around via 56a3100 as I don't have too much time to look into it atm.
The first problem resembles brianmario/mysql2#896, where the author didn't report a reason, but reported a similar workaround to the above.
The second problem resembles brianmario/mysql2#956, although that issue is closed with the release 0.5.1. We currently use 0.5.2.
I suspect the root cause of this issue is somewhere inside mysql2, but I don't know where. The connections are not being shared across multiple threads, so that shouldn't be an issue. We are writing to the database as fast as we can in a loop; perhaps this contributes to the problem, as slowing it down with a new connection every loop seems to have helped.
Hi again,
I got this error after fixing other things and wondered why it's not supported. What is the deal with it, or is there any planned feature for it?
Anyway, we will try to fix it, but we wanted to hear from your side.
Currently, we rely on the @@read_only variable to check whether our connection to the database is indeed to an active writer or if it's actually a read replica. This is done specifically before we use the WaitUntilReplicaIsCaughtUpToMaster struct to enter the cutover phase.
Perhaps not everybody's failover strategy is compatible with this behaviour. We could provide a configuration option on the WaitUntilReplicaIsCaughtUpToMaster struct so that the user can provide us with a query that checks this condition.
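A sketch of what that could look like, with a made-up WriterCheckQuery config field; by default it falls back to checking @@read_only.

// Sketch only: configurable writer check, names are illustrative.
package writercheck

import "database/sql"

type Config struct {
	// Defaults to checking @@read_only; users with a different failover strategy
	// can substitute their own single-column query returning 0 or 1.
	WriterCheckQuery string
}

func isWriter(db *sql.DB, cfg Config) (bool, error) {
	query := cfg.WriterCheckQuery
	if query == "" {
		query = "SELECT @@read_only"
	}

	var readOnly int
	if err := db.QueryRow(query).Scan(&readOnly); err != nil {
		return false, err
	}
	// For the default query, 0 means the server is writable.
	return readOnly == 0, nil
}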
As we merged #35, I moved the initializeWaitUntilReplicaIsCaughtUpToMasterConnection call in ShardingFerry from after "Waiting for binlog to catch up" to Initialize, which is quite a big change. The reason for doing this is so that Ferry could use the WaitUntilReplicaIsCaughtUpToMaster struct to sanity check the configuration during the Initialize phase.
A comment on the PR says that the reason for the previous placement is that pt-kill will kill idle connections. If we connect to the master too early (during Initialize), the connection might no longer be available. It was my impression that the Go driver handles this transparently.
Since we never resolved that comment, I'm moving the discussion out here.
Originally by @pushrax:
It seems that even with RBR the binlog gets BEGIN/COMMIT events (via XID_EVENT) that are required for perfect replication while servicing queries in a consistent way from the replica. These events allow the replica to commit changes to multiple tables atomically. This might matter for our use-case, if the data we are writing to the target is also being selected. My understanding is that this only affects consistency of queries in the replica/target during the copying. We already tolerate inconsistent data during copying, so maybe it's completely fine to ignore these events.
Comment by @shuhaowu:
Note that it is not a problem if the target that we are copying data to is not being queried by any source other than Ghostferry itself. This means that the vast majority of use cases are safe (this also includes GitHub's gh-ost).
It's actually AFTER_BINLOG_STREAMED. The apply happens in another goroutine, so the name is inaccurate.
The same applies to BEFORE_BINLOG_APPLY.
All ShardingErrorHandler does is send a webhook. We can definitely merge this into the default but optional behaviour of core Ghostferry.
We should also make it so the error callback can be called independently of panic, so it can be invoked during the Initialize/Start phase of Ghostferry.
dml_events.go contains a helper method for converting MySQL values to uint64:
func (r RowData) GetUint64(colIdx int) (res uint64, err error) {
which documents that MySQL never returns uint64 values. In experiments I have seen that for certain values, however, this is not the case, and invoking the method with RowData that already contains the expected value type crashes.
I will try to reproduce the issue using a unit test. Unfortunately, I lost the stack trace where this was happening, but I know that the crash happened on a row of the table
CREATE TABLE `mytable` (
`mytable_id` bigint(19) unsigned NOT NULL AUTO_INCREMENT,
...
PRIMARY KEY (`entry_info_id`),
...
) ENGINE=InnoDB AUTO_INCREMENT=13871426091 DEFAULT CHARSET=utf8
and a mytable_id value of 13871229209.
We transfer 200 records at a time by default (DataIterationBatchSize). If the records are large, we will run into maximum size limits and fail. We will retry, but always with the same number of records, and fail the same way.
This is a suggestion to treat DataIterationBatchSize as a default value and back off exponentially if we fail because of the maximum size limits. For example, use DataIterationBatchSize/2 on the second try, DataIterationBatchSize/4 on the third, and 1 on the last try.
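A sketch of the suggested backoff, with fetchBatch standing in for the actual row-copy call (function and parameter names are illustrative).

// Sketch only: halve the batch size on each retry, down to 1 row.
package batchsize

func copyWithBackoff(batchSize, maxRetries uint64, fetchBatch func(size uint64) error) error {
	var err error
	for attempt := uint64(0); attempt < maxRetries; attempt++ {
		size := batchSize >> attempt // batchSize, batchSize/2, batchSize/4, ...
		if size == 0 || attempt == maxRetries-1 {
			size = 1 // last resort: one record at a time
		}

		if err = fetchBatch(size); err == nil {
			return nil
		}
		// A real implementation would only back off for size-related errors
		// (e.g. ErrPktTooLarge / max_allowed_packet), not for all errors.
	}
	return err
}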
Running Ghostferry with the source database as a replica is subject to a race condition: when we stop the binlog streamer (and start the cutover stage), pending writes on the source database's master might not have propagated to the binlogs of the replicas. Since Ghostferry has no idea about these upstream servers, it could miss writes and thus cause data corruption.
I'm not quite sure yet if we should integrate something that checks that the upstream binlog position matches the replica's binlog position directly into ghostferry.Ferry. However, we can make it an API that we provide as part of the library and integrate it into copydb.
Note this is an issue with the master as well if sync_binlog != 1. In these cases we recommend that you call FLUSH BINARY LOGS, which I assume will flush all pending writes to disk as it closes the current binary log file and opens a new one with a separate file name. We've never tested this scenario to my knowledge.
I'm assuming it should be the contents of https://shopify.github.io/ghostferry/master/index.html ^^? https://shopify.github.io/ghostferry/ is a lil bit empty right now
Originally by @hkdsun:
See https://github.com/Shopify/ghostferry/blob/be4f15d/ferry.go#L254-L258
In the situation where the source database is under very heavy load, such that Ghostferry is always caughtupThreshold behind on writing data to the target, we could stay in that loop for a potentially very long time.
An idea is to have the function accept a configurable deadline. Once the deadline is reached, either abort the run or force the application to stop writes on the source (i.e. forcefully initiate the cutover phase).
Comment by @shuhaowu:
I would like to add that we could also add an API to dynamically force a cutover to occur (i.e. make IsAlmostCaughtUp always return true), perhaps triggerable via the ControlServer.
Comment by @fw42:
I think I'd prefer to abort the move rather than to force the cutover, at least as the default behaviour. If the source database has so many writes that we can't catch up, that probably means that it's very active right now. Locking the source database would be very disruptive at that moment.
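A sketch of the deadline idea discussed above; the function and error names are made up, and whether hitting the deadline aborts or forces cutover would be up to configuration.

// Sketch only: bound how long we wait to be "almost caught up".
package caughtup

import (
	"errors"
	"time"
)

var ErrNeverCaughtUp = errors.New("binlog writer did not catch up before the deadline")

func waitUntilCaughtUp(isAlmostCaughtUp func() bool, deadline time.Duration) error {
	timeout := time.After(deadline)
	tick := time.NewTicker(500 * time.Millisecond)
	defer tick.Stop()

	for {
		select {
		case <-timeout:
			return ErrNeverCaughtUp
		case <-tick.C:
			if isAlmostCaughtUp() {
				return nil
			}
		}
	}
}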
When creating a table with a foreign key on the target server, if the referenced table hasn't been created yet, it gives an error.
cannot create table, this may leave the target database in an insane state error="Error 1005: Can't create table publisher.View (errno: 150 \"Foreign key constraint is incorrectly formed\")" table=publisher.View error: failed to create databases and tables: Error 1005: Can't create table publisher.View (errno: 150 "Foreign key constraint is incorrectly formed")
Example of a create Query for View Table:
CREATE TABLE `View` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `platform_id` int(10) unsigned NOT NULL,
  `user_id` int(11) NOT NULL,
  `object_id` int(10) unsigned NOT NULL,
  `object_type` tinyint(4) NOT NULL,
  `timestamp` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  PRIMARY KEY (`id`),
  KEY `view_platform_id_foreign` (`platform_id`),
  CONSTRAINT `view_platform_id_foreign` FOREIGN KEY (`platform_id`) REFERENCES `Platform` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
In this case, ghostferry-copydb hasn't created the Platform table yet.
We have a fix for this too. Just putting this issue here for future reference.
It would be nice to have, in the repository, a commented example of a minimal Ghostferry-based application, as the existing copydb/sharding applications are quite complex. This would allow beginners to quickly gain an understanding of what Ghostferry-based applications should look like.
Having such an application would also benefit existing developers in testing POCs, as changing copydb/sharding can be a complicated matter.
I made a version of this when I was testing some new features locally without depending on copydb and whatnot: https://gist.github.com/shuhaowu/5c48465040bda9d4143363f06c600c59. Anyone can try to convert this to a better example.
One thing we might also want to consider is to refactor copydb a little bit to make it more like a library so it can be customized more easily, as it exposes several "nice" features (like config file/filter building) that would likely have to be reimplemented if a standalone app is to be created.
In order to handle schema changes on the source and the target databases of the same application, we opted for a method where we pause Ghostferry upon the beginning of a schema change and resume it after the schema change has completed on both the source and the target database.
In addition to being useful for schema changes, having a resumable Ghostferry is useful in general. An example would be if the target/source database becomes temporarily unavailable. As of right now, the data on the target must be cleaned up and then we have to restart Ghostferry from the beginning.
The main issue with Ghostferry being interrupted is that binlogs are no longer being streamed from the source to the target. If the binlogs are not streamed, the target database is then no longer up to date and the data may not be valid. This issue is not exclusive to Ghostferry and also affects regular MySQL replication. The solution there involves starting replication at some user specified binlog position. Implementing the same within Ghostferry will be difficult and inefficient:
Instead of replaying the binlogs as is, a different method can be employed to keep the target up to date:
To analyze the safety of the reconciliation step, we must first make the following assumptions:
We can then analyze the situation where the known good binlog position is the same as the actual good binlog position (no overcopy of binlogs occurred):
We can now simply extend this to cases when the known good binlog position is an underestimate of the actual position: if we reconcile a binlog entry that has already been streamed to the target, it will simply be deleted from the target and recopied. Thus it does not pose a problem.
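To make the delete-and-recopy idea concrete, here is a rough sketch for a single primary key. Table and column names are placeholders, and this is not Ghostferry's actual reconciliation code; deleting before re-copying is what makes replaying an already-applied binlog entry harmless.

// Sketch only: reconcile one row by deleting it on the target and recopying it from the source.
package reconcile

import (
	"database/sql"
	"fmt"
)

func reconcileRow(source, target *sql.DB, schema, table string, pk uint64) error {
	qualified := fmt.Sprintf("`%s`.`%s`", schema, table)

	// Deleting first makes the operation idempotent: re-running it for a row that
	// was already reconciled (or already streamed) simply deletes and recopies it again.
	if _, err := target.Exec(fmt.Sprintf("DELETE FROM %s WHERE id = ?", qualified), pk); err != nil {
		return err
	}

	row := source.QueryRow(fmt.Sprintf("SELECT id, data FROM %s WHERE id = ?", qualified), pk)

	var id uint64
	var data []byte
	switch err := row.Scan(&id, &data); err {
	case sql.ErrNoRows:
		return nil // the row no longer exists on the source; the delete was all we needed
	case nil:
		_, err := target.Exec(fmt.Sprintf("INSERT INTO %s (id, data) VALUES (?, ?)", qualified), id, data)
		return err
	default:
		return err
	}
}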
The safety of this reconciliation is also verified with an (unreviewed) TLA+ model.
As demonstrated above, as long as Ghostferry saves at worst an underestimate of the binlog/cursor position, the reconciliation process is safe.
The current code only increments the last successful binlog/cursor position after the binlog/row is successfully streamed/copied. This means that if we were to panic the process at any time and get the saved values out, those values are at worst an underestimate, unless there’s something about Go that we don’t quite understand (?).
If a schema change occurs on either the source or the target, we must interrupt Ghostferry and only resume it at some future point. We can asynchronously detect the schema changes on either the source or the target and abort the process. If an error occurs elsewhere within Ghostferry because of the schema change but in a different thread, we stall that code until we can positively identify a schema change, or we abort if some timeout has been reached and we still cannot positively identify the schema change.
We assume that:
Once resumed:
Foreign keys are not supported (same as gh-ost). We should ensure Ghostferry bails on startup when it detects an FK on a table, as opposed to crashing in the middle of a run.
Hello,
I have an old MariaDB server and I wanted to use this tool for migration. The source MariaDB server was 10.0.x, and binlog_row_image was not supported in that version. That's why I upgraded it to 10.2.22, and this variable is now working fine.
After granting necessary permissions to the migration user for both source and target servers, I got this error:
ERRO[0000] failed to read current binlog position error="sql: expected 4 destination arguments in Scan, not 5" tag=binlog_streamer
error: failed to start ferry: sql: expected 4 destination arguments in Scan, not 5
I checked your code and followed the tutorial, but I couldn't find where exactly the binlog position is read.
Can you please advise how to fix it?
Thanks in advance.
To ensure MySQL's plain text interface isn't used, we need to make sure this uses prepared statements. A test should be added that fails if non-prepared statements are used to query MySQL.
More context can be gathered from #44 (comment)
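To illustrate the distinction such a test would need to catch (the table name is a placeholder): with go-sql-driver/mysql and interpolateParams left at its default of false, placeholder arguments go through server-side prepared statements, whereas hand-built SQL strings do not.

// Sketch only: prepared-statement vs. string-interpolated queries.
package preparedcheck

import (
	"database/sql"
	"fmt"
)

func fetchSafe(db *sql.DB, id uint64) (*sql.Rows, error) {
	// Uses a server-side prepared statement; values do not pass through the plain text interface.
	return db.Query("SELECT * FROM `gftest`.`table1` WHERE `id` = ?", id)
}

func fetchUnsafe(db *sql.DB, id uint64) (*sql.Rows, error) {
	// String interpolation on the client; this is what the proposed test should flag.
	return db.Query(fmt.Sprintf("SELECT * FROM `gftest`.`table1` WHERE `id` = %d", id))
}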
When retrieving updates via the binlog, the replication module strips trailing 0-bytes from the data in row updates. This affects the data inserted via insert/update statements as well as the "where" clause for update/delete statements.
As a result, a target DB starts to diverge in the data that it stores.
IMO this is a bug in the upstream library we use, and I have opened a ticket with them:
However, existing users of the library may break if the library is changed, which is why I think we may need to work around the bug in ghostferry directly.
To do this, we need to update go-mysql to parse the BINARY/VARBINARY table column definitions (it currently assumes they are simple string columns) and extend the input array before using the data in calls like BinlogUpdateEvent.AsSQLString.
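As a sketch of the client-side workaround, values destined for fixed-length BINARY(n) columns could be right-padded back to the declared column length before the write query is built; where exactly this would hook into Ghostferry is left out.

// Sketch only: re-pad a BINARY(n) value whose trailing 0x00 bytes were stripped.
package binpad

func padBinary(value []byte, columnLength int) []byte {
	if len(value) >= columnLength {
		return value
	}
	padded := make([]byte, columnLength)
	copy(padded, value) // the remaining bytes are already 0x00
	return padded
}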