automattic / go-search-replace

🚀 Search & replace URLs in WordPress SQL files.

Home Page: https://wpvip.com/2018/03/28/data-sync-on-vip-go/

License: GNU General Public License v3.0

Go 92.89% Makefile 4.55% Dockerfile 2.56%
golang wordpress text-processing

go-search-replace's Introduction

Go Search Replace

Search & replace URLs in WordPress SQL files.

cat example-from.com.sql | search-replace example-from.com example-to.com > example-to.com.sql

Overview

Migrating WordPress databases often requires replacing domain names. This is a complex operation because WordPress stores PHP serialized data, which encodes string lengths. The common method uses PHP to unserialize the data, do the search/replace, and then re-serialize the data before writing it back to the database. Here we replace strings in the SQL file and then fix the string lengths.
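
The replace-then-fix idea can be sketched in a few lines of Go. This is a minimal illustration, not the tool's actual implementation: the function name `replaceAndFix` is hypothetical, and the pattern assumes unescaped quotes and values that do not themselves contain `";` (real SQL dumps escape quotes as `\"` and may nest serialized data).

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// serializedString matches a PHP serialized string such as s:5:"hello";
// Assumption: quotes are unescaped and the value contains no `";` sequence.
var serializedString = regexp.MustCompile(`s:(\d+):"(.*?)";`)

// replaceAndFix replaces from with to, then rewrites every declared
// length to the byte length of the value that follows it.
func replaceAndFix(input, from, to string) string {
	replaced := strings.ReplaceAll(input, from, to)
	return serializedString.ReplaceAllStringFunc(replaced, func(m string) string {
		parts := serializedString.FindStringSubmatch(m)
		value := parts[2]
		return "s:" + strconv.Itoa(len(value)) + `:"` + value + `";`
	})
}

func main() {
	in := `a:1:{s:3:"url";s:24:"https://example-from.com";}`
	fmt.Println(replaceAndFix(in, "example-from.com", "example-to.com"))
	// Prints: a:1:{s:3:"url";s:22:"https://example-to.com";}
}
```

PHP counts bytes rather than characters in the length prefix, which is why `len` on a Go string is the right measure here.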

Considerations

Replacing strings in a SQL file can be dangerous: a careless replacement can change the structure of the file and corrupt it. For this reason, we limit search strings to roughly the characters that can be used in domain names. Since the most common usage for search-replace is changing domain names or switching http: to https:, this is an easy way to avoid otherwise complex issues.
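
As an illustration of this kind of restriction, here is a hypothetical validator in Go. The exact character set the tool accepts is not stated here, so the set below is an assumption; the minimum length of 4 is taken from an error message quoted in one of the issues further down.

```go
package main

import (
	"fmt"
	"regexp"
)

// domainLike is an ASSUMED character set: letters, digits, and a few
// characters that appear in host names and URL prefixes. The tool's real
// rule may differ.
var domainLike = regexp.MustCompile(`^[A-Za-z0-9_.:/-]+$`)

// minLength mirrors the "minimum length is 4" error message quoted in an
// issue report; treat it as an assumption as well.
const minLength = 4

// validSearchTerm reports whether s is long enough and uses only the
// assumed domain-like characters.
func validSearchTerm(s string) bool {
	return len(s) >= minLength && domainLike.MatchString(s)
}

func main() {
	for _, s := range []string{"example-from.com", "https:", `";}`, "a.b"} {
		fmt.Printf("%q -> %v\n", s, validSearchTerm(s))
	}
}
```

Rejecting quotes, semicolons, braces, and backslashes means a search term can never match the delimiters of the SQL or serialized structure itself.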

Installation

From Official Releases

To install on macOS:

wget https://github.com/Automattic/go-search-replace/releases/latest/download/go-search-replace_darwin_arm64.gz
gunzip go-search-replace_darwin_arm64.gz
chmod +x go-search-replace_darwin_arm64
mv go-search-replace_darwin_arm64 /usr/local/bin/go-search-replace
go-search-replace --version

From Source

Installing from source requires Go.

Note the changes you need to make to your PATH and that you have to either restart your terminal or source your shell rc file.

You also need Gox, which you can install with go install github.com/mitchellh/gox@latest

Once that's installed you can install this tool with the following command: go install github.com/Automattic/go-search-replace@latest

Go is set up by convention, not configuration, so your files likely live in a directory like: /Users/user/go/src/github.com/Automattic/go-search-replace

Navigate to that directory and run make

go-search-replace will be ready for you to use. Once built you won't have to complete any of the above steps again.

go-search-replace's Issues

Build failed: unsupported GOOS/GOARCH pair darwin/386

1 errors occurred:
--> darwin/386 error: exit status 2
Stderr: cmd/go: unsupported GOOS/GOARCH pair darwin/386

make: *** [build] Error 1

It looks like this might be the culprit: https://go-review.googlesource.com/c/tools/+/227552/

The darwin/386 port has been removed per golang.org/issue/37610.
TestSizes needs an operating system that has both amd64 and 386 ports,
so darwin no longer qualifies. NetBSD has had its 386 port restored
recently in golang/go#31726, so it can be used instead.

I got myself up and running by deleting everything in the Makefile except my own OS, but I don't know Go well enough to submit a PR.

Install process is not clear

Hi,

The install process described here is not very clear.

On my side, following instructions at this link (https://jimkang.medium.com/install-go-on-mac-with-homebrew-5fa421fc55f5) did not help to get go-search-replace working.

For anyone else who needs it, here is how to install go-search-replace:

wget https://github.com/Automattic/go-search-replace/releases/download/0.0.6/go-search-replace_darwin_arm64.gz
gunzip go-search-replace_darwin_arm64.gz
chmod +x go-search-replace_darwin_arm64
mv go-search-replace_darwin_arm64 /usr/local/bin/go-search-replace
go-search-replace --version

Corrupted data if serialized var contains a serialized string that gets changed

It is an edge case and the first time I've seen/noticed it so I don't expect a fix, and it's probably out of scope for this tool, but since I ran into it:
Having

serialize([ 'something' => 'a:1:{s:3:"url";s:20:"https://example.org/";}'])

and replacing the domain:

echo 'a:1:{s:4:\"test\";s:44:\"a:1:{s:3:\"url\";s:20:\"https://example.org/\";}\";}' | ./go-search-replace_linux_amd64 example.org test.com

yields

a:1:{s:4:\"test\";s:44:\"a:1:{s:3:\"url\";s:17:\"https://test.com/\";}\";}

when it should be

a:1:{s:4:\"test\";s:41:\"a:1:{s:3:\"url\";s:17:\"https://test.com/\";}\";}

because the length only gets fixed in the "inner" string. I don't see an easy fix with the way it's currently built because you'd have to backtrack all the way to the beginning to be sure that you're not inside a nested serialization, but maybe I'm missing something.
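
The arithmetic can be checked with a short Go sketch. The helper `lengthMatches` is hypothetical and works on the unescaped form of a single `s:N:"value";` token; it uses the declared length itself rather than searching for a closing quote, which is what makes it safe for nested payloads.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// lengthMatches reports whether token, of the form s:N:"value"; with
// unescaped quotes, declares the byte length its value actually has.
func lengthMatches(token string) bool {
	if !strings.HasPrefix(token, "s:") {
		return false
	}
	// Split at the first `:"`, which separates the length from the value.
	digits, rest, ok := strings.Cut(token[2:], `:"`)
	if !ok || !strings.HasSuffix(rest, `";`) {
		return false
	}
	n, err := strconv.Atoi(digits)
	if err != nil {
		return false
	}
	// rest is the value followed by the two closing bytes `";`.
	return len(rest)-2 == n
}

func main() {
	inner := `a:1:{s:3:"url";s:17:"https://test.com/";}`
	fmt.Println(len(inner))                             // 41
	fmt.Println(lengthMatches(`s:44:"` + inner + `";`)) // false: stale outer length
	fmt.Println(lengthMatches(`s:41:"` + inner + `";`)) // true
}
```

The inner payload shrinks by 3 bytes (the difference between example.org and test.com), so every enclosing length prefix has to shrink by 3 as well.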

Replacing "subdomain." prefix with empty string ""

I recently came across your wonderful script for fixing WordPress database copies between domains.

I have one more challenge: I need to remove the "staging." subdomain from all URLs in the SQL file (including serialized PHP objects).

So the replacement is just an empty string (""). My WordPress installation is multisite, so there are multiple subsiteX.maindomain.com URLs as well as staging.subsiteX.maindomain.com URLs on the staging server.

When migrating from staging to production, we need to strip the "staging." subdomain.

Replace corrupts serialized data if the search and replace are of different length

To reproduce:

php -r 'echo serialize("aaaaabbbbbbbbbbaaaaa"), PHP_EOL;' > test.txt
cat test.txt | ./go-search-replace bbbbbbbbbb ccccc
cat test.txt | ./go-search-replace bbbbbbbbbb ccccccccccccccc

Expected result:

s:15:"aaaaacccccaaaaa";
s:25:"aaaaacccccccccccccccaaaaa";

Actual result:

s:20:"aaaaacccccaaaaa";
s:20:"aaaaacccccccccccccccaaaaa";

Unable to handle escaped full URLs

One of the primary use cases of this would be to rewrite domain URLs in the DB from http to https.

I see you have several tickets on this:

#3
#2
#1

This tool covers:

http://example.com -> https://example.com

but is unable to handle the escaped variants:

http:\/\/example.com -> https:\/\/example.com

Re using delimiters for things like swapping domains:

/example.com -> /newdomain.com

And that would cover most use cases: swapping domains, file paths, etc.

But none of this works for swapping the http version of a domain URL to the https version while also handling escaped URLs, which are quite commonly found in WP databases.

If you try to pass in the string including escaped slashes, the backslashes are lost (an unquoted backslash is consumed by the shell before the tool sees it) and the wrong string is targeted instead:

root@may31-devbeta-jeff-buildtest:~# echo "input:"; cat test.sql && cat test.sql | go-search-replace http:\/\/example.com https:\/\/example.com >totest.sql; echo "output:" && cat totest.sql
input:
http://example.com
http:\/\/example.com
output:
https://example.com
http:\/\/example.com

See this has been rewritten instead:

http://example.com -> https://example.com 

And the string with the escaped url is untouched.

This is similar behaviour to interconnectit search-replace, but with that tool you can escape the backslashes like so on the inputs....

So to target escaped urls with interconnectit tool you use:

http:\\\/\/example.com https:\\\/\/example.com

But that doesn't work here:

root@may31-devbeta-jeff-buildtest:~# echo "input:"; cat test.sql && cat test.sql | go-search-replace http:\\\/\/example.com https:\\\/\/example.com >totest.sql; echo "output:" && cat totest.sql
input:
http://example.com
http:\/\/example.com
Invalid <from> URL, minimum length is 4
output:
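
One way the requested behaviour could look is sketched below in Go. This is not how the tool currently works: `escapeSlashes` and `replaceBothForms` are hypothetical names, and the sketch only covers the JSON-style `\/` escaping shown above.

```go
package main

import (
	"fmt"
	"strings"
)

// escapeSlashes returns the JSON-style escaped form of a URL, in which
// every "/" is written as "\/".
func escapeSlashes(s string) string {
	return strings.ReplaceAll(s, "/", `\/`)
}

// replaceBothForms replaces from with to in its plain form and in its
// slash-escaped form, in a single pass over the input.
func replaceBothForms(input, from, to string) string {
	r := strings.NewReplacer(
		from, to,
		escapeSlashes(from), escapeSlashes(to),
	)
	return r.Replace(input)
}

func main() {
	in := "http://example.com\n" + `http:\/\/example.com` + "\n"
	fmt.Print(replaceBothForms(in, "http://example.com", "https://example.com"))
}
```

With this approach the user passes only the plain URL, so no backslashes need to survive shell quoting. When a backslash does have to reach a program as an argument, wrapping the argument in single quotes keeps the shell from consuming it.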

How to use/install this?

I downloaded the release, but then what?
Could you add some info to the README? It would be greatly appreciated!

Imports: Mixed protocol search

Should we handle a dataset that contains multiple protocols for a single domain to be converted? For example, a dataset might contain:

https://example.local
http://example.local

Which both need converting to:

https://example.com

There's also Protocol Relative URLs:

//example.com => https://example.com
…or…
//example.com => http://example.com

Multiple search-replaces

If multiple search replacements are passed, run all the string replacements before fixing serialized data. This should avoid having to loop through the file many times.
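
A minimal sketch of the single-pass idea in Go, using the standard library's `strings.NewReplacer`. The function name `newMultiReplacer` and the argument layout (alternating from/to values) are assumptions for illustration, not the tool's interface.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// newMultiReplacer builds one replacer from alternating from/to values,
// so that all pairs are applied in a single pass over each line.
func newMultiReplacer(args []string) (*strings.Replacer, error) {
	if len(args) == 0 || len(args)%2 != 0 {
		return nil, errors.New("expected one or more from/to pairs")
	}
	return strings.NewReplacer(args...), nil
}

func main() {
	r, err := newMultiReplacer([]string{
		"http://example.local", "https://example.com",
		"https://example.local", "https://example.com",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(r.Replace("http://example.local and https://example.local"))
	// Prints: https://example.com and https://example.com
}
```

Serialized string lengths would then be fixed once, after all pairs have been applied. The same construction covers the mixed-protocol case, where http and https variants of one domain map to a single target.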

Help clarify understanding of async processing?

Hey there!

I was hoping you could help clarify my understanding of what's going on under the hood! I'm not terribly familiar with Go and am using your script as a method of learning! As I understand it, you're actually doing some clever asynchronous programming here and I wanted to make sure my understanding is correct 😄

There are essentially two top-level goroutines: one that iterates over the file from STDIN and another that waits for the first goroutine to complete its work. The first goroutine creates child goroutines to process each line of the file separately. Now, this introduces a challenge: how can you asynchronously process a file but spit it out in the same order it was received? I believe that's why you have a channel of channels (lines). The child goroutine adds a channel for each line it processes (which is processed in order), which then allows the child goroutine to take as long as it likes because the channel will be read later once the file is finished being read and the WaitGroup is done? Or each line will be printed whenever it's received by fmt.Print() on line 113? Generally, the idea of the channel of channels allows you to process each line individually and asynchronously while still maintaining the order of the file. Is that understanding correct? Thanks for sharing this work, it's been incredibly helpful for me!

Protocol relative URLs

We should replace protocol-relative URLs with whatever the destination protocol is, e.g.:

//example.com => https://example.com
…or…
//example.com => http://example.com
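
A possible approach, sketched in Go under stated assumptions: the function name `replaceProtocolRelative` is hypothetical, and only the unescaped `//host` form is handled. The main difficulty is that `//example.com` also occurs inside `http://example.com`, so the match has to exclude a preceding colon.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// replaceProtocolRelative rewrites //host to scheme://host, leaving
// occurrences that already follow a scheme (a preceding ":") untouched.
func replaceProtocolRelative(input, host, scheme string) string {
	// Go's regexp package has no lookbehind, so the preceding character
	// (or the start of the input) is part of the match and is written back.
	re := regexp.MustCompile(`(^|[^:])//` + regexp.QuoteMeta(host))
	return re.ReplaceAllStringFunc(input, func(m string) string {
		prefix := strings.TrimSuffix(m, "//"+host)
		return prefix + scheme + "://" + host
	})
}

func main() {
	in := `src="//example.com/a.png" href="http://example.com/"`
	fmt.Println(replaceProtocolRelative(in, "example.com", "https"))
	// Prints: src="https://example.com/a.png" href="http://example.com/"
}
```

This sketch does not check what follows the host, so a longer host that starts with the same text would also be rewritten; a real implementation would need a boundary check there.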

Add a license?

Can you add a license to this project? Preferably MIT?

Incorrect length calculation in serialization of string data

In the attached SQL dumps of a WordPress options table, some string data is not being serialized correctly.

Command: cat wp_1000642_options.sql | ./go-search-replace hcommons.org heinlein.mlacommons.org > wp_1000642_options_replaced.sql

The error occurs for one long field containing CSS. Here are the relevant snippets:

Original:

...
s:16:\"additional-style\";s:24458:\"div.gdlr-blog-co
...

There is one substitution performed of hcommons.org -> heinlein.mlacommons.org

Result:

...
s:16:\"additional-style\";s:6533:\"div.gdlr-blog-co
...

The new serialized length should be 24469, reflecting the difference in string length between the two URLs, but it is instead reported as 6533. If I serialize the value using PHP's serialize function, it is serialized correctly. The serialization appears correct other than the length.

Original file (renamed .txt for uploading):
wp_1000642_options.txt

File after replacement:
wp_1000642_options_replaced.txt
