sanori / spark-access-log

Simple HTTPd log (a.k.a. access.log) parser for Spark SQL

License: Apache License 2.0

Scala 100.00%
nginx-log http-logs spark-sql udf hive-udf spark-udf spark nginx-logs

access.log parser for Spark SQL

Simple HTTPd log (a.k.a. access.log) parser for Spark SQL.

Currently, Combined and Common log formats are supported.
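The two formats share their first seven fields (host, ident, authuser, time, request, status, bytes). A Combined line adds the quoted Referer and User-Agent at the end. As a rough illustration, the plain-Scala sketch below tells the two apart with regular expressions. The object LogFormat, its patterns, and the classify function are hypothetical names for this example and are not part of this library. The patterns also ignore edge cases such as escaped quotes inside a field.

```scala
// Hypothetical sketch: illustrative patterns only, not the library's own parser.
// Escaped quotes inside quoted fields are not handled.
object LogFormat {
  // Common Log Format: host ident authuser [time] "request" status bytes
  val CommonLine =
    """^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}|-) (\d+|-)$""".r

  // Combined Log Format: the Common fields followed by "referer" "user-agent"
  val CombinedLine =
    """^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}|-) (\d+|-) "([^"]*)" "([^"]*)"$""".r

  /** Returns "combined", "common", or "unknown" for a single log line. */
  def classify(line: String): String =
    if (CombinedLine.pattern.matcher(line).matches()) "combined"
    else if (CommonLine.pattern.matcher(line).matches()) "common"
    else "unknown"
}
```

For instance, the sample line `127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326` from the Apache httpd documentation is classified as "common". Appending ` "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"` makes it "combined".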

How to use

SQL (spark-sql)

When starting spark-sql:

spark-sql --packages net.sanori.spark:access-log_2.11:0.1.0

In SQL, you can create a user-defined function and use it:

-- attach ToCombined as to_combined(text_line)
CREATE OR REPLACE FUNCTION to_combined
AS "net.sanori.spark.ToCombined";

-- read raw log file as one column table
CREATE OR REPLACE TEMP VIEW accessLogText
USING text
OPTIONS (path "access.log");

-- create parsed log as a table
CREATE OR REPLACE TEMP VIEW accessLog
AS SELECT log.*
    FROM (
        SELECT to_combined(value) AS log
        FROM accessLogText
    );

Spark SQL (spark-shell)

When starting spark-shell:

spark-shell --packages net.sanori.spark:access-log_2.11:0.1.0

Or in build.sbt:

libraryDependencies += "net.sanori.spark" %% "access-log" % "0.1.0"

DataFrame

import net.sanori.spark.accessLog.to_combined
import org.apache.spark.sql.functions._

val lineDf = spark.read.text("access.log")
val logDf = lineDf
  .select(to_combined(col("value")).as("log"))
  .select(col("log.*"))

Dataset

import net.sanori.spark.accessLog.toCombinedLog
import spark.implicits._ // encoder needed by map(); spark-shell already imports this

val lineDs = spark.read.textFile("access.log")
val logDs = lineDs.map(toCombinedLog)

RDD

import net.sanori.spark.accessLog.toCombinedLog

val lines = sc.textFile("access.log")
val rdd = lines.map(toCombinedLog)

What is provided

Combined or Common log lines are parsed into a table with the following columns:

name           type        default value
remoteAddr     String      ""
remoteUser     String      ""
time           Timestamp   1970-01-01T00:00:00Z
request        String      ""
status         String      ""
bytesSent      Long        null
httpReferer    String      ""
httpUserAgent  String      ""
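To make the column list concrete, the sketch below maps one log line onto a record with these names, types, and defaults, without Spark. AccessLog, AccessLogSketch, and parseLine are hypothetical stand-ins, not the library's toCombinedLog, whose exact behavior may differ. It also rests on two assumptions: the time field uses the usual `dd/MMM/yyyy:HH:mm:ss Z` layout (e.g. `10/Oct/2000:13:55:36 -0700`), and a `-` placeholder is treated as an absent value. bytesSent is a boxed java.lang.Long so that it can hold null, as the table's default suggests.

```scala
import java.sql.Timestamp
import java.time.OffsetDateTime
import java.time.format.DateTimeFormatter
import java.util.Locale
import scala.util.Try

// Hypothetical record mirroring the column table; not the library's own class.
case class AccessLog(
  remoteAddr: String = "",
  remoteUser: String = "",
  time: Timestamp = new Timestamp(0L),  // 1970-01-01T00:00:00Z
  request: String = "",
  status: String = "",
  bytesSent: java.lang.Long = null,     // boxed so it can be null
  httpReferer: String = "",
  httpUserAgent: String = ""
)

object AccessLogSketch {
  // The seven Common fields are required; the Combined referer/user-agent pair is optional.
  private val Line =
    """(\S+) \S+ (\S+) \[([^\]]+)\] "([^"]*)" (\S+) (\S+)(?: "([^"]*)" "([^"]*)")?""".r

  // Assumed time layout, e.g. 10/Oct/2000:13:55:36 -0700
  private val TimeFormat =
    DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH)

  // Assumption: a missing group or a "-" placeholder means "no value".
  private def orEmpty(s: String): String =
    if (s == null || s == "-") "" else s

  def parseLine(line: String): AccessLog = line match {
    case Line(addr, user, ts, req, status, bytes, referer, agent) =>
      AccessLog(
        remoteAddr = orEmpty(addr),
        remoteUser = orEmpty(user),
        // Unparseable timestamps fall back to the epoch default.
        time = Try(Timestamp.from(OffsetDateTime.parse(ts, TimeFormat).toInstant))
          .getOrElse(new Timestamp(0L)),
        request = req,
        status = status,
        // "-" (no body sent) fails to parse and becomes null.
        bytesSent = Try(java.lang.Long.valueOf(bytes)).getOrElse(null),
        httpReferer = orEmpty(referer),
        httpUserAgent = orEmpty(agent)
      )
    case _ => AccessLog() // no match: every column keeps its default
  }
}
```

A line that does not match keeps every default, so `parseLine("not a log line")` returns `AccessLog()`, whose time is the epoch.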

Other information

How to build

sbt clean package

This generates access-log_2.11-0.1.0.jar in target/scala-2.11.

Motivation

  • To simplify the analysis of web server logs
  • Most web server (HTTP server) logs are in the Combined or Common log format.
  • To provide a user-defined function that can be used from the spark-sql command line

Alternative

If you want to view access.log as a table in Hive rather than in Spark, or need to process other log formats, nielsbasjes/logparser might be a better solution.

Contribution

Suggestions, ideas, comments, and pull requests are welcome.
