GithubHelp home page GithubHelp logo

mysql2mongo's Introduction

mysql2mongo

MySQL to MongoDB importer

Work based on http://code.google.com/p/php-sql-parser/

The scripts are using simple relational to non-relational (r2n) mapping mechanism that should let You automagically convert Your MySQL database into MongoDB set of collections.

The approach is: instead of direct convertion let's convert MySQL eksport (set of "INSERTS") to MongoDB import (set of "db.updates").

It would also be interesting to go a little bit further - it should be possible to translate also SELECT'S, UPDATE'S and DELETE'S (using same or similiar r2n mapings) which would make quite nice SQL to NoSQL translator ... And this means simplyfing r2n by using the translator on median migration stages... IE: incremental changes could be routed to both MYSQL an MongDB ...

Example (Wordpress users):

The tables(source of data):

wp_users (
  	`ID` bigint(20) 
  	`user_login`
 	`user_pass`
	....
	);


wp_usermeta (
  	`umeta_id` bigint(20)
  	`user_id` bigint(20)        <---- 1/n relation to users
  	`meta_key` varchar(255) 
  	`meta_value` longtext,
	....
	);

The mapping(conversion definition):

[
{name:"users", type: "t2coll", 			// mapping named "users" type "t2coll" (table as collection)
	table: "wp_users", 					// source table: "wp_users" 
	collname:"wp_users",				// target collection name: "wp_users"
	path:"[ID]"							// target path (inside collection): mapped by [ID] field from wp_users table
	},
										
{name:"users_meta", type: "t2docs", 	// mapping named "users_meta" type: "t2docs" (table as documents)	
	table: "wp_usermeta",				// source table: "wp_usermeta" 
	parent: "users",					// parent collection name: "users" (the mapping/set of data is nested)
	path:"[user_id].meta.[meta_key]",	// target path: [user_id] - (parent connector), "meta" - parent collection elment, [meta_key] - key 
	value:"[meta_value]"				// [meta_value] - value to be placed under the target path (currently only simply field value supported)
	}
]

The collection(the effect):

{
	_id:1.0,
	ID:1.0,	
	display_name:"admin",
	meta:{ 
		"admin_color" : "fresh" , 
		"closedpostboxes_ant_elems" : "a:1:{i:0;s:16:\"commentstatusdiv\";}" , 
		"comment_shortcuts" : "false" , 
		"description" : "Just testing" , 
		"dismissed_wp_pointers" : "wp330_toolbar,wp330_media_uploader,wp330_saving_widgets" , 
		"first_name" : "John" , 
		"last_name" : "TheAdmin" , 
		"wp_user-settings-time" : "1360265776" , 
		"wp_user_level" : "10"
		....
		}
	user_email:[email protected]
	user_login:admin
	user_nicename:admin
	user_pass:$P$BtC.w7H6gOdMRhht5TkBMIWD/xIiAB.
	user_registered:2012-04-25 06:36:01
	....
} 
.... 

Quick start:

Let's go through whole process from the begining to the end.

The start is similiar to this sample MySQL setup: http://dev.mysql.com/doc/employee/en/employees-installation.html

  • get sample MySQL database from https://launchpad.net/test-db/

  • unpack ("tar -xjf" , 7zip etc)

  • run import through mysql commandline tool:

      mysql -u[username] -p[password] -t < employees.sql
    

Now it's time to take a look at the db schema: http://dev.mysql.com/doc/employee/en/sakila-structure.html

Moving from relational to non-relational in case of salaries/employees/titles seems to be simple and rather obvious: Let's get employees as main collection here and titles and salaries as nested documents:

[
{name:"employees", type: "t2coll", 			// mapping named "employees" type "t2coll" (table to collection)
	table: "employees", 					// source table: "employees" 
	collname:"employees",					// target collection name: "employees"
	path:"[emp_no]"							// target path: mapped by [emp_no] field from employees table
	},	
	 
{name:"employees_salaries", type: "t2docs", // mapping named "employees_salaries" type: "t2docs" (table to documents)	
	table: "salaries",						// source table: "salaries" 
	parent: "employees",					// parent collection name: "employees" 
	path:"[emp_no].salaries.[from_date]",	//  "from_date" is unique and  meaningfull - can play as salary identifier inside employee
	}, 										// "value" omitted - the default is to place all record data as key:value pairs

{name:"employees_titles", type: "t2docs", 	// mapping named "employees_titles" type: "t2docs" (table to documents)	
	table: "titles",						// source table: "titles" 
	parent: "employees",					// parent collection name: "employees" 
	path:"[emp_no].titles.[from_date]",		//  "from_date" is ok - as above 
	} 										// "value" omitted - the default is to place all record data as key:value pairs
]

Now when we have some "r2n" mapping scrap - let's do some action:

  • put above mapping (without comments and empty lines) into file "example/employees.r2n"

  • cd example & export mysql data as full insert statements:

      mysqldump -u[username] -p[password] --skip-extended-insert --complete-insert employees employees salaries titles > employees.sql
    
  • now run the example converter:

      php r2nconvert.php employees 100
    

(convert employees.sql to employees(n).json using mapping employees.r2n each 100k records creating new file)

Notice: You have to have enough RAM memory for parent collections cache (collections which have children) - about 100MB/300k records. (currently implemented as in-memory PHPGENCreator.fieldDataCache) Or You have to partition Your DDL input (in such a way that parent records are inside the same partition as children).

At the end - You should get a set of json files containing updates redy to run on MongoDB:

db.employees.update({_id : 10001},{ $set: {emp_no:10001,birth_date:'1953-09-02',first_name:'Georgi',last_name:'Facello',gender:'M',hire_date:'1986-06-26'} } ,{multi: true, upsert: true})
db.employees.update({_id : 10001},{ $set: {'salaries.1986-06-26' : {emp_no:10001,salary:60117,from_date:'1986-06-26',to_date:'1987-06-26'}} } ,{multi: true, upsert: true})
db.employees.update({_id : 10001},{ $set: {'salaries.1987-06-26' : {emp_no:10001,salary:62102,from_date:'1987-06-26',to_date:'1988-06-25'}} } ,{multi: true, upsert: true})
db.employees.update({_id : 10001},{ $set: {'salaries.1988-06-25' : {emp_no:10001,salary:66074,from_date:'1988-06-25',to_date:'1989-06-25'}} } ,{multi: true, upsert: true})
... etc

... and mimport_[name].bat script which gets mongo database name as arg and imports all the collection data to Mongo:

mimport_eployees.bat mymongodb 

mysql2mongo's People

Contributors

virtimus avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.