
abhr1994 / hadoop2databricks


This repository contains the scripts needed to migrate from Infoworks on-prem to Infoworks on Databricks.

Python 100.00%

hadoop2databricks's Introduction

How to use the scripts in this repository?

  • param_file.csv is generated by running hdi_dump_metadata.py in the on-prem environment
  • source_details.csv is generated by running getsourcedump.py in the on-prem environment

# Create RDBMS Source

source /opt/infoworks/bin/env.sh;python create_rdbms_source.py --source_connection_file_path source_details.csv --source_creation_template templates/create_source_template.json --host_name localhost --host_port 2999 --auth_token <> --cluster_template default_template

# Add source to cluster template

python add_source_to_clustertemplate.py --source_name AR_Test_TD --cluster_template default_template

# Script to configure the sources

source /opt/infoworks/bin/env.sh;python source_migration_v2.py --configuration_json_path source_AR_Test_TD.json --source_name AR_Test_TD --source_type rdbms --host_name localhost --host_port 2999 --auth_token <> --cluster_template default_template
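
When several sources have to be migrated, the two per-source steps above (adding the source to the cluster template and configuring it) can be generated rather than typed by hand. The following is a minimal sketch, not part of the repository: the helper names `with_env` and `per_source_commands` are hypothetical, and it assumes the configuration file is named source_<source_name>.json, as in the AR_Test_TD example. Only flags that appear in the commands above are used.

```python
import shlex

ENV_SCRIPT = "/opt/infoworks/bin/env.sh"  # path taken from the commands above


def _join(args):
    """Quote each argument so the result is safe to pass to a shell."""
    return " ".join(shlex.quote(a) for a in args)


def with_env(args):
    """Return a shell command that sources env.sh and then runs python with args."""
    return f"source {shlex.quote(ENV_SCRIPT)}; python {_join(args)}"


def per_source_commands(source_name, auth_token,
                        cluster_template="default_template",
                        host_name="localhost", host_port="2999"):
    """Build the 'add to cluster template' and 'configure source' commands."""
    add_cmd = "python " + _join([
        "add_source_to_clustertemplate.py",
        "--source_name", source_name,
        "--cluster_template", cluster_template,
    ])
    configure_cmd = with_env([
        "source_migration_v2.py",
        # Assumption: the configuration file follows source_<source_name>.json
        "--configuration_json_path", f"source_{source_name}.json",
        "--source_name", source_name,
        "--source_type", "rdbms",
        "--host_name", host_name,
        "--host_port", host_port,
        "--auth_token", auth_token,
        "--cluster_template", cluster_template,
    ])
    return [add_cmd, configure_cmd]
```

Each returned string could then be executed with, for example, `subprocess.run(["bash", "-c", cmd], check=True)`, with the auth token read from an environment variable instead of being hard-coded.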

# Script to run the historical data migration

source /opt/infoworks/bin/env.sh;python run_migration.py --host <> --token <> --cluster_id 0718-041317-gilt53 --param_file param_file.csv

Requirements

  1. The folder structure in ADLS Gen2 must match the target HDFS directory structure of the on-prem sources
  2. Source schema and table names must be the same in HDI and Databricks
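
The first requirement can be checked before starting a migration by comparing directory listings from both sides. The sketch below is an illustration only, assuming the listings have already been collected as plain lists of paths (for example from `hdfs dfs -ls -R` output and an ADLS listing); the function names and all example paths are hypothetical.

```python
def relative_layout(paths, root):
    """Return the paths expressed relative to root, skipping anything outside it."""
    prefix = root.rstrip("/") + "/"
    return {p[len(prefix):].strip("/") for p in paths if p.startswith(prefix)}


def missing_in_target(hdfs_paths, hdfs_root, adls_paths, adls_root):
    """Return on-prem relative directories that have no counterpart in ADLS Gen2."""
    on_prem = relative_layout(hdfs_paths, hdfs_root)
    target = relative_layout(adls_paths, adls_root)
    return sorted(on_prem - target)


# Hypothetical listings, used only to show the comparison.
hdfs_listing = [
    "/iw/sources/AR_Test_TD/schema1/table_a",
    "/iw/sources/AR_Test_TD/schema1/table_b",
]
adls_listing = [
    "abfss://container@account.dfs.core.windows.net/iw/sources/AR_Test_TD/schema1/table_a",
]

missing = missing_in_target(
    hdfs_listing, "/iw/sources",
    adls_listing, "abfss://container@account.dfs.core.windows.net/iw/sources",
)
print(missing)
```

An empty result means every on-prem directory under the chosen root has a matching relative path in ADLS Gen2.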

Supported Features

  • Migration of timestamp-based and batch_id-based incremental tables
  • Reading of Parquet and ORC files
  • Partitioning is supported
  • Column renames are supported
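
To make the two incremental modes concrete, the sketch below shows the general idea of selecting rows by an incremental key: a timestamp column in one case, a batch_id column in the other. It illustrates the concept only and is not the repository's implementation; the column names `updated_at` and `batch_id`, the helper name, and the strict greater-than comparison are all assumptions.

```python
from datetime import datetime


def rows_after(rows, key, last_value):
    """Return rows whose incremental key is strictly greater than last_value."""
    return [row for row in rows if row[key] > last_value]


# Hypothetical rows carrying both kinds of incremental key.
rows = [
    {"id": 1, "updated_at": datetime(2021, 1, 1), "batch_id": 10},
    {"id": 2, "updated_at": datetime(2021, 3, 1), "batch_id": 11},
    {"id": 3, "updated_at": datetime(2021, 6, 1), "batch_id": 12},
]

# Timestamp-based: rows changed after the last migrated timestamp.
by_timestamp = rows_after(rows, "updated_at", datetime(2021, 2, 1))

# batch_id-based: rows loaded in batches after the last migrated batch.
by_batch = rows_after(rows, "batch_id", 11)
```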

Unsupported Features

  • Data type overrides are not supported

