ildar-shaimordanov / perl-utils

Set of text- and file-oriented utilities for processing texts, files and piping

License: MIT License

Batchfile 0.50% Perl 95.04% Shell 2.13% Awk 2.33%
perl grep sort logfile-parser processing-paragraphs multilined-entries sponge pipeline piping transpose

perl-utils

Preamble

perl-utils is a set of text- and file-oriented utilities. The text-oriented scripts are intended mostly for processing paragraphs. By default, a paragraph is a group of text lines delimited by empty or blank lines.

If a text file is treated as a set of paragraphs, it becomes easier to sort, merge and filter files without breaking the connection between the lines of each paragraph.

For example, multiline entries in log files can contain useful additional information. grep -C (or grep -A, or grep -B) prints a fixed number of context lines, so it does not guarantee that a particular log entry is extracted completely, and it may also pull in neighbouring entries that are not needed at the moment.

Paragraph processing utilities

paragrep

paragrep - grep-like filter for searching matches in paragraphs.

paragrep assumes that the input consists of paragraphs and prints the paragraphs matching a pattern. By default, a paragraph is a block of text delimited by empty or blank lines.

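The initial version was very simple: a shell function invoking an inline Perl script to grep log files. Its first argument is the regular expression that starts a new paragraph, the second is the pattern to search for, and the remaining arguments are the input files: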

paragrep() {
	perl -ne '
	if ( m/$break_of_para/ ) {
		print $para if defined $para && $para =~ /$regexp/;
		$para = "";
	}
	$para .= $_;
	END {
		print $para if defined $para && $para =~ /$regexp/;
	}
	' -s -- -break_of_para="$1" -regexp="$2" "${@:3}"
}

or

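paragrep() {
	perl -ne '
	# a break line closes the current paragraph
	m/$break_of_para/ and do {
		print $para if defined $para && $para =~ /$regexp/;
		$para = "";
	};
	$para .= $_;
	# flush the last paragraph at the end of each file, after
	# its final line has been appended
	eof and do {
		print $para if $para =~ /$regexp/;
		$para = "";
	};
	' -s -- -break_of_para="$1" -regexp="$2" "${@:3}"
}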

Later I reimplemented it as a standalone script with more functionality and flexibility.

Example

Each entry in a log file usually begins with a timestamp in the generic numeric form "date time". A single pattern can match it regardless of the date format used (for example, 2023-01-05 or 05/01/2023):

paragrep -Pp '^\d+[/-]\d+[/-]\d+ \d+:\d+:\d+' PATTERN FILENAME

The same can be wrapped into aliases, one for log files and one for INI-like configuration files:

alias lgrep="paragrep -Pp '^\d+[/-]\d+[/-]\d+ \d+:\d+:\d+'"
alias cgrep="paragrep -Pp '^(#@ |#-> )?\['"

Similar tools

While working on the script I found many interesting implementations of this task in different languages. Here is a short selection of those that interested me:

logmerge

A small but powerful script that merges two or more log files so that multiline entries appear in the correct chronological order without being broken apart.

Other text-oriented utilities

sponge

sponge is a Perl version of the sponge utility from the Debian package moreutils.

It reads standard input into memory and writes it out to the specified file. Unlike a shell redirect, the script soaks up all its input before opening the output file. This allows constructing pipelines that read from and write to the same file. If no file is specified, it writes to STDOUT.
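
Why soaking matters is easy to see with a plain redirect: the shell truncates the output file before the command even starts, so the command reads an empty file. The snippet below demonstrates this and then performs a crude in-memory soak with command substitution. It is only a sketch: command substitution strips trailing newlines and is not suitable for binary data, which is one reason a dedicated sponge is more convenient.

```sh
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT

printf 'banana\napple\n' > "$tmp"

# The shell opens (and truncates) "$tmp" before sort starts,
# so sort reads an empty file and the data is lost.
sort "$tmp" > "$tmp"
[ -s "$tmp" ] || echo "redirect: $tmp is now empty"

# Soak first: the whole output of sort is held in a variable,
# and "$tmp" is opened for writing only afterwards.
printf 'banana\napple\n' > "$tmp"
sorted=$(sort "$tmp") && printf '%s\n' "$sorted" > "$tmp"
cat "$tmp"
```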

My first version was an inline Perl script inside a shell function:

sponge() {
	perl -ne '
	push @lines, $_;
	END {
		open(OUT, ">$file")
		or die "sponge: cannot open $file: $!\n";
		print OUT @lines;
		close(OUT);
	}
	' -s -- -file="$1"
}

There is more than one way to do it in Perl, so here is a slightly different version that also supports the -a option for appending to the file:

sponge() {
	perl -e '
	$file = shift || "-";
	@lines = <>;
	open OUT, ( defined $a ? ">>" : ">" ) . $file
	or die "sponge: cannot open $file: $!\n";
	print OUT @lines;
	close OUT;
	' -s -- "$@"
}

Awk can do sponge as well:

#!/usr/bin/awk -f

# slurp a stuff and burp...
# ... | awk -f sponge.awk [-v ORS="\r\n"] [-v append=1] [-v file=file]

NR == 1	{ lines = $0 }
NR != 1	{ lines = lines ORS $0 }

END	{
	if ( ! file ) { file = "-" }
	if ( append ) {
		print lines >> file;
	} else {
		print lines >  file;
	}
}

Or the same, wrapped in a shell script that is more convenient to call:

#!/bin/sh

# slurp a stuff and burp...
# ... | sponge [-a] file

sponge() (
	case "$1" in
	-a | --append )
		append=1
		file="$2"
		;;
	* )
		append=""
		file="$1"
		;;
	esac

	awk -v append="$append" -v file="$file" '
NR == 1	{ lines = $0 }
NR != 1	{ lines = lines ORS $0 }

END	{
	if ( ! file ) { file = "-" }
	if ( append ) {
		print lines >> file;
	} else {
		print lines >  file;
	}
}'
)

sponge "$@"

Example

An abstract usage example, taken from the tool's help:

sed '...' file | grep '...' | sponge [-a] file

See also

transpose

transpose is a Perl implementation of the following AWK script, which transposes the input so that rows become columns and columns become rows:

#!/usr/bin/awk -f

{
	for (i = 1; i <= NF; i++) {
		a[NR,i] = $i
	}
}

NF > p {
	p = NF
}

END {
	for (j = 1; j <= p; j++) {
		str = a[1,j]
		for (i = 2; i <= NR; i++) {
			str = str OFS a[i,j];
		}
		print str
	}
}

Example

( echo {1..5} ; echo {100..104} ) | ./transpose

See also

File-oriented utilities

file-rename

file-rename renames the supplied files according to the rule given as the first argument. It supports several ways of renaming: applying Perl code to the names to copy or move files; rotating names cyclically left or right; swapping two names; flipping the whole list of files.

Example

file-rename 's/\.bak$//' *.bak

See Also

To be continued...
