GithubHelp home page GithubHelp logo

io's Introduction

IO

ETL (Extract-Transform-Load)

Extract-Transform-Load (ETL) is a data integration process involving the extraction of data from various sources, transformation into a suitable format, and loading into a target database or data warehouse.

  • Extracting data from various sources.
  • Transforming the data into a suitable format/structure.
  • Loading the transformed data into a target database or data warehouse.

Batch processing

  • core-go/io is designed for batch processing, enabling the development of complex batch applications. It supports operations such as reading, processing, and writing large volumes of data.
  • core-go/io is not an ETL tool. It provides the necessary libraries for implementing ETL processes. It allows developers to create jobs that extract data from sources, transform it, and load it into destinations, effectively supporting ETL operations.

2 common use cases of data processing

Use Cases of core-go/io in ETL:

  • Data Migration: Moving and transforming data from legacy systems to new systems.
  • Data Processing: Handling large-scale data processing tasks like data cleansing and transformation
  • Data Warehousing: Loading and transforming data into data warehouses.
  • Business Intelligence: Transforming raw data into meaningful insights for decision-making, to provide valuable business insights and trends.

Specific Use Cases of core-go/io

Export from database to file

Export from database to file

Samples:

Import from file to database

Import from file to database

  • Detailed flow to import from file to database

    Import flow with data validation

Samples:
Layer Architecture
  • Popular for web development

Layer Architecture

Hexagonal Architecture
  • Suitable for Import Flow

Hexagonal Architecture

Based on the flow, there are 4 main components (4 main ports):
  • Reader, Validator, Transformer, Writer
Reader

Reader Adapter Sample: File Reader. We provide 2 file reader adapters:

  • Delimiter (CSV format) File Reader
  • Fix Length File Reader
Validator
  • Validator Adapter Sample: Schema Validator
  • We provide the Schema validator based on GOLANG Tags
Transformer

We provide 2 transformer adapters

  • Delimiter Transformer (CSV)
  • Fix Length Transformer
Writer

We provide many writer adapters:

ETL Tool Programming vs Traditional Programming

ETL for Simple Transformations

  • Ease of Use: ETL tools are designed to handle straightforward data extraction, transformation, and loading processes efficiently. Simple operations like data type conversions, string manipulations, and basic arithmetic are typically easy to implement and perform well.
  • Graphical Interfaces: Many ETL tools provide intuitive graphical interfaces that allow users to design and implement simple transformations without deep programming knowledge, speeding up the development process.

ETL for Complex Transformations

  • Performance Concerns: When dealing with more complex logic such as loops and conditional statements, ETL processes can become less efficient. This is because ETL tools are often optimized for set-based operations rather than iterative ones, which can lead to slower performance compared to traditional programming languages.
  • Increased Complexity: Writing complex logic in ETL tools can be cumbersome and less readable compared to general-purpose programming languages. The logic might be scattered across various transformation steps, making it harder to maintain and debug.
  • Limited Flexibility: ETL tools may have limitations in terms of the programming constructs they support. This can make it challenging to implement certain algorithms or logic that would be straightforward in a traditional programming language.

Comparative Analysis

Programming Languages (e.g., Java, Go, Python, nodejs):

Advantages:
  • Flexibility: Full programming languages offer greater flexibility and control over the code, allowing for complex logic, custom functions, and advanced algorithms.
  • Performance: For complex transformations, especially those involving iterative processes or conditionals, programming languages can be optimized for better performance.
  • Libraries and Frameworks: A rich ecosystem of libraries and frameworks can be leveraged to handle specific tasks efficiently.
Disadvantages:
  • Development Time: Writing ETL processes from scratch in a programming language can be time-consuming, especially for simple tasks that ETL tools can handle out-of-the-box.
  • Learning Curve: Requires more in-depth programming knowledge, which can be a barrier for non-developers or those new to programming.

ETL Tools (e.g., Talend, Informatica, Apache NiFi):

Advantages:
  • Ease of Use: Designed to simplify the ETL process with user-friendly interfaces and pre-built connectors for various data sources.
  • Speed for Simple Tasks: Quick to implement and deploy simple transformations and data movements.
  • Maintenance: Easier to maintain for straightforward ETL tasks due to visual workflows and less code complexity.
Disadvantages:
  • Performance: Can be less performant for complex logic involving loops and conditionals.
  • Complexity for Advanced Tasks: As the complexity of the transformations increases, ETL tools can become cumbersome and harder to manage.
  • Limited Control: Less flexibility in implementing highly customized logic or optimizations compared to traditional programming languages.

Can we have a solution, which has the advantages of both ETL and traditional programming?

I am finding a solution, which has the advantages of both ETL and traditional programming. Let's analyze 2 common use cases of ETL:

ETL

  • Reader and Writer: ETL is mostly useful.
  • Transformation: ETL is useful for simple transformation only.

Programming Languages

For File Reader, Database Reader and File Writer, Database Writer:
  • Libraries and Frameworks: Please refer to core-go/io, we provide a rich ecosystem of libraries, which can be leveraged to handle specific tasks efficiently like ETL tool (The effort is still higher than ELT tool, but very small).
  • We also have this advantage of ETL tool: Speed for Simple Tasks.
  • Maintenance: with to core-go/io, we provide the descriptive language for GO and nodejs, it is easier to maintain like ETL Tool.
For transformation:
  • Flexibility: for complicated tasks, full programming languages offer greater flexibility and control over the code, allowing for complex logic, custom functions, and advanced algorithms.
  • Performance: for complex transformations, especially those involving iterative processes or conditionals, programming languages can be optimized for better performance.

Conclusion

  • The above advantages and disadvantages are key considerations in choosing between ETL tools and traditional programming for data transformation tasks. For simple and straightforward ETL processes, ETL tools can offer significant advantages in terms of ease of use and development speed. However, for more complex transformations involving intricate logic, loops, and conditionals, traditional programming languages might offer better performance, flexibility, and maintainability.
  • Ultimately, the choice depends on the specific requirements of the ETL process, the complexity of the transformations, and the skill set of the team involved. A hybrid approach, where simple tasks are handled by ETL tools and complex logic is implemented in a programming language, can also be a viable solution.
  • With core-go/io for GOLANG and a rich ecosystem of libraries for nodejs, we offer a hybrid approach, which can handle complex logic in a programming language, but also leverage to handle specific tasks efficiently by programming at reader and writer.

Summary

File Reader

  • File Stream Reader
  • Delimiter (CSV format) File Reader
  • Fix Length File Reader

File Writer

  • File Stream Writer

Delimiter (CSV format) Transformer

  • Transform an object to Delimiter (CSV) format
  • Transform an object to Fix Length format

Appendix

Import and export data for nodejs

Export data for nodejs:

Key features
  • onecore: Standard interfaces for typescript to export data.
  • io-one: File Stream Writer, to export data to CSV or fix-length files by stream.
Libraries to receive rows as stream, to export each record one by one:
Samples
Import data for nodejs
Key features
  • onecore: Standard interfaces for typescript to export data.
  • io-one: File Stream Reader, to read CSV or fix-length files from files by stream.
  • xvalidators: Validate data
  • import-service: Implement import flow
Libraries to write data to database
  • query-core: Simple writer to insert, update, delete, insert batch for Postgres, MySQL, MS SQL
  • Oracle: oracle-core to wrap oracledb, to build insert or update SQL statement, insert batch for Oracle.
  • My SQL: mysql2-core to wrap mysql2, to build insert or update SQL statement.
  • MS SQL: mssql-core to wrap mssql, to build insert or update SQL statement.
  • SQLite: sqlite3-core to wrap sqlite3, to build insert or update SQL statement.
  • Mongo: mongodb-extension to wrap mongodb, to insert, update, upsert, insert batch, update batch, upsert batch.
Sample
  • import-sample: nodejs sample to import data from fix-length or csv file to sql (Oracle, Postgres, My SQL, MS SQL, SQLite)

Installation

Please make sure to initialize a Go module before installing core-go/io:

go get -u github.com/core-go/io

Import:

import "github.com/core-go/io"

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.