Regex parser plugin for Embulk

A simple parser that splits each line into columns using a regular expression.

Overview

  • Plugin type: parser
  • Guess supported: yes

Configuration

  • regex: regular expression applied to each line; it must use named capturing groups (string, required)
  • columns: column definitions (list of objects)
    • regex_name: name of the capturing group in regex that supplies this column. Group names can only contain [a-zA-Z0-9], so set this alias when the column name contains other characters, such as an underscore (string, default: the column's name value)
  • skip_if_unmatch: if true, a line that does not match the regex is skipped; if false, it raises a RuntimeException (boolean, default: false)
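
How regex, regex_name, and skip_if_unmatch interact can be sketched in plain Java. This is a minimal illustration, assuming the plugin's matching behaves like java.util.regex; the class RegexLineSketch, the Column type, and the parseLine method are hypothetical names chosen for this example, not the plugin's actual code.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the plugin's implementation.
class RegexLineSketch {
    // A column plus the capturing group that supplies its value.
    static final class Column {
        final String name;
        final String regexName;

        Column(String name, String regexName) {
            this.name = name;
            // Without an alias, the group is assumed to share the column name.
            this.regexName = (regexName != null) ? regexName : name;
        }
    }

    // Returns the extracted values, or empty when the line is skipped.
    static Optional<Map<String, String>> parseLine(
            Pattern pattern, List<Column> columns, String line, boolean skipIfUnmatch) {
        Matcher m = pattern.matcher(line);
        if (!m.find()) {
            if (skipIfUnmatch) {
                return Optional.empty();
            }
            throw new RuntimeException("line does not match regex: " + line);
        }
        Map<String, String> record = new LinkedHashMap<>();
        for (Column c : columns) {
            // Java group names allow only letters and digits, hence the alias.
            record.put(c.name, m.group(c.regexName));
        }
        return Optional.of(record);
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("^(?<remoteHost>[.:0-9]+) (?<status>[0-9]+)$");
        List<Column> cols = Arrays.asList(
                new Column("remote_host", "remoteHost"),
                new Column("status", null));
        // prints Optional[{remote_host=127.0.0.1, status=200}]
        System.out.println(parseLine(p, cols, "127.0.0.1 200", true));
        // prints Optional.empty
        System.out.println(parseLine(p, cols, "not a log line", true));
    }
}
```

With skipIfUnmatch set to false, the second call would throw a RuntimeException instead of returning an empty result, mirroring the default described above.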

Example

in:
  type: any file input plugin type
  parser:
    type: regex
    regex: ^(?<remoteHost>[.:0-9]+) (?<identity>\S+) (?<user>\S+) \[(?<datetime>[^\]]*)\] "((?<method>\S+) (?<path>\S+) (?<protocol>HTTP/\d+\.\d+)|-)" (?<status>[0-9]+) (?<size>[0-9]+|-) "(?<referer>[^"]*)" "(?<userAgent>[^"]*)" (?<inByte>[0-9]+) (?<outByte>[0-9]+)$
    columns:
    - {name: remote_host, type: string, regex_name: remoteHost}
    - {name: identity, type: string}
    - {name: user, type: string}
    - {name: datetime, type: timestamp, format: '%d/%b/%Y:%H:%M:%S %z'}
    - {name: method, type: string}
    - {name: path, type: string}
    - {name: protocol, type: string}
    - {name: status, type: long}
    - {name: size, type: long}
    - {name: referer, type: string}
    - {name: user_agent, type: string, regex_name: userAgent}
    - {name: in_byte, type: long, regex_name: inByte}
    - {name: out_byte, type: long, regex_name: outByte}
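
To check a regex like the one above before running Embulk, it can be exercised directly against a sample line. The sketch below is an illustration under stated assumptions: the log line is made up for this example, the class AccessLogRegexCheck and its members are hypothetical, and the java.time pattern is only an assumed equivalent of the '%d/%b/%Y:%H:%M:%S %z' format, not how the plugin itself parses timestamps.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for trying the example regex outside Embulk.
class AccessLogRegexCheck {
    // The regex from the example config, escaped as a Java string literal.
    static final Pattern PATTERN = Pattern.compile(
            "^(?<remoteHost>[.:0-9]+) (?<identity>\\S+) (?<user>\\S+) \\[(?<datetime>[^\\]]*)\\] "
            + "\"((?<method>\\S+) (?<path>\\S+) (?<protocol>HTTP/\\d+\\.\\d+)|-)\" "
            + "(?<status>[0-9]+) (?<size>[0-9]+|-) \"(?<referer>[^\"]*)\" \"(?<userAgent>[^\"]*)\" "
            + "(?<inByte>[0-9]+) (?<outByte>[0-9]+)$");

    // Made-up sample line; real logs will differ.
    static final String SAMPLE =
            "127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] \"GET /index.html HTTP/1.0\" "
            + "200 2326 \"http://www.example.com/\" \"Mozilla/4.08\" 512 1024";

    // Assumed java.time counterpart of '%d/%b/%Y:%H:%M:%S %z'.
    static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);

    static Instant parseDatetime(String value) {
        return OffsetDateTime.parse(value, FORMAT).toInstant();
    }

    public static void main(String[] args) {
        Matcher m = PATTERN.matcher(SAMPLE);
        if (!m.find()) {
            throw new IllegalStateException("sample line did not match");
        }
        System.out.println(m.group("remoteHost"));              // prints 127.0.0.1
        System.out.println(m.group("path"));                    // prints /index.html
        System.out.println(m.group("userAgent"));               // prints Mozilla/4.08
        System.out.println(parseDatetime(m.group("datetime"))); // prints 2000-10-10T20:55:36Z
    }
}
```

Note that in java.util.regex the method, path, and protocol groups come back as null when the request field is just "-", because that alternative of the regex does not capture them.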

Guess

Some Apache LogFormat patterns can be guessed. After writing the in: section, you can let Embulk guess the parser: section with these commands:

$ embulk gem install embulk-parser-regex
$ embulk guess -g regex config.yml -o guessed.yml

Build

$ ./gradlew gem  # add -t to watch for file changes and rebuild continuously

Run checkstyle:

$ ./gradlew check
