
northern-lights / yara-parser

Tools for parsing rulesets using the exact grammar as YARA. Written in Go.

License: MIT License

Lex 6.91% Yacc 13.48% Makefile 0.50% Go 75.56% Dockerfile 0.26% YARA 3.29%
yara yara-ruleset grammar lexer yara-parser json ruleset security security-tools signatures

yara-parser's Introduction

yara-parser

yara-parser is a Go library for manipulating YARA rulesets. Its key feature is that it uses the same grammar and lexer files as the original libyara to ensure that lexing and parsing work exactly like YARA. The grammar and lexer files have been modified to fill Go data structures for ruleset manipulation instead of compiling rulesets for data matching.

Using yara-parser, you can read YARA rulesets and programmatically change metadata, rule names, rule modifiers, tags, strings, and more.

The y2j tool serializes rulesets to JSON so that rules can be manipulated in other languages. Similarly, j2y converts JSON back to YARA, but see Limitations below.

Installation

If any of the following go get commands fail, the most likely cause is an outdated version of Go. The project uses features introduced in Go 1.10, so installation should proceed normally after updating.

To install (or update) everything at once, the following command can be used:

go get -u github.com/Northern-Lights/yara-parser/...

y2j: YARA to JSON

Use the following command to install the y2j command for converting YARA rulesets to JSON.

go get -u github.com/Northern-Lights/yara-parser/cmd/y2j

Of course, this will install y2j to $GOPATH/bin, so ensure that directory is in your $PATH.

The grammar and lexer files are frozen, so building them with goyacc and flexgo is not necessary.

j2y: JSON to YARA

Use the following command to install the j2y command for converting JSON to YARA rulesets.

go get -u github.com/Northern-Lights/yara-parser/cmd/j2y

Grammar Library

Use the following command to install the grammar library for deserializing YARA rulesets without installing y2j.

go get -u github.com/Northern-Lights/yara-parser/grammar

y2j Usage

Command line usage for y2j looks like the following:

$ y2j --help            
Usage of y2j: y2j [options] file.yar

options:
  -indent int
        Set number of indent spaces (default 2)
  -o string               
        JSON output file

In action, y2j would convert the following ruleset:

import "pe"
import "cuckoo"

include "other.yar"

global rule demo : tag1 {
meta:
    description = "This is a demo rule"
    version = 1
    production = false
    description = "because we can"
strings:
    $string = "this is a string" nocase wide
    $regex = /this is a regex/i ascii fullword
    $hex = { 01 23 45 67 89 ab cd ef [0-5] ?1 ?2 ?3 }
condition:
    $string or $regex or $hex
}

to this JSON output:

{
   "file": "sample.yar",
   "imports": [
      "pe",
      "cuckoo"
   ],
   "includes": [
      "other.yar"
   ],
   "rules": [
      {
         "modifiers": {
            "global": true,
            "private": false
         },
         "identifier": "demo",
         "tags": [
            "tag1"
         ],
         "meta": [
            {
               "Key": "description",
               "Val": "This is a demo rule"
            },
            {
               "Key": "version",
               "Val": 1
            },
            {
               "Key": "production",
               "Val": false
            },
            {
               "Key": "description",
               "Val": "because we can"
            }
         ],
         "strings": [
            {
               "id": "$string",
               "type": 0,
               "text": "this is a string",
               "modifiers": {
                  "nocase": true,
                  "ascii": false,
                  "wide": true,
                  "fullword": false,
		  "xor": false,
                  "i": false,
                  "s": false
               }
            },
            {
               "id": "$regex",
               "type": 2,
               "text": "this is a regex",
               "modifiers": {
                  "nocase": false,
                  "ascii": true,
                  "wide": false,
                  "fullword": true,
		  "xor": false,
                  "i": true,
                  "s": false
               }
            },
            {
               "id": "$hex",
               "type": 1,
               "text": " 01 23 45 67 89 ab cd ef [0-5] ?1 ?2 ?3 ",
               "modifiers": {
                  "nocase": false,
                  "ascii": false,
                  "wide": false,
                  "fullword": false,
		  "xor": false,
                  "i": false,
                  "s": false
               }
            }
         ],
         "condition": "$string or $regex or $hex"
      }
   ]
}

Note that the string types are as follows:

String type int code    Designation
0                       string
1                       hex pair bytes
2                       regex
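
When consuming this JSON from Go, the integer codes can be mapped back to the designations above with a small helper. This is only an illustrative sketch based on the table; typeName is not part of the library.

package main

import "fmt"

// typeName maps the integer "type" codes from the JSON output to the
// designations listed in the table above. Purely illustrative; not part
// of yara-parser itself.
func typeName(code int) string {
	switch code {
	case 0:
		return "string"
	case 1:
		return "hex pair bytes"
	case 2:
		return "regex"
	default:
		return fmt.Sprintf("unknown (%d)", code)
	}
}

func main() {
	fmt.Println(typeName(2)) // regex
}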

Go Usage

Sample usage for working with rulesets in Go looks like the following:

package main

import (
	"fmt"
	"log"
	"os"

	"github.com/Northern-Lights/yara-parser/grammar"
)

func main() {
	input, err := os.Open(os.Args[1])   // Single argument: path to your file
	if err != nil {
		log.Fatalf("Error: %s\n", err)
	}

	ruleset, err := grammar.Parse(input, os.Stdout)
	if err != nil {
		log.Fatalf(`Parsing failed: "%s"`, err)
	}

	fmt.Printf("Ruleset:\n%v\n", ruleset)

	// Manipulate the first rule
	rule := ruleset.Rules[0]
	rule.Identifier = "new_rule_name"
	rule.Modifiers.Global = true
	rule.Modifiers.Private = false
}
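
Because the parsed data structures carry JSON tags (as seen in the y2j output above), a ruleset can also be re-serialized to JSON directly with the standard library. The following is a minimal sketch built on the grammar.Parse call shown above; field names in the output may differ slightly from the y2j sample depending on the struct tags.

package main

import (
	"encoding/json"
	"log"
	"os"

	"github.com/Northern-Lights/yara-parser/grammar"
)

func main() {
	input, err := os.Open(os.Args[1]) // Single argument: path to your file
	if err != nil {
		log.Fatalf("Error: %s\n", err)
	}
	defer input.Close()

	ruleset, err := grammar.Parse(input, os.Stdout)
	if err != nil {
		log.Fatalf(`Parsing failed: "%s"`, err)
	}

	// Re-encode the ruleset as indented JSON, similar to what y2j produces.
	out, err := json.MarshalIndent(ruleset, "", "  ")
	if err != nil {
		log.Fatalf("JSON encoding failed: %s", err)
	}
	os.Stdout.Write(append(out, '\n'))
}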

Development

The included Dockerfile will build an image suitable for producing the parser and lexer using goyacc and flexgo. There is a builder target in the Makefile to help you quickly get started with this. Run the following to build the builder image:

make builder

This will provide you with a Docker image called yara-parser-builder.

As you make changes to the grammar, you can then run make grammar. The .go files will be output in the grammar/ directory.

Limitations

Currently, the library makes no guarantee that modified rules will serialize back into a valid YARA ruleset. For example, you can set rule.Identifier = "123", but this is not a valid YARA identifier. Additionally, adding or removing strings may invalidate a condition, since conditions are currently treated only as text. Comments also cannot be retained.
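
As an example of the kind of check callers currently have to perform themselves, the sketch below validates a candidate rule name against YARA's identifier rules (letters, digits and underscores, not starting with a digit, at most 128 characters). It is a hypothetical helper, not part of yara-parser.

package main

import (
	"fmt"
	"regexp"
)

// identRe matches YARA identifiers: letters, digits and underscores,
// with a non-digit first character.
var identRe = regexp.MustCompile(`^[A-Za-z_][A-Za-z0-9_]*$`)

// validIdentifier reports whether s could be used as a YARA rule name.
// Hypothetical helper; the library itself performs no such validation.
func validIdentifier(s string) bool {
	return len(s) <= 128 && identRe.MatchString(s)
}

func main() {
	fmt.Println(validIdentifier("new_rule_name")) // true
	fmt.Println(validIdentifier("123"))           // false
}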

yara-parser's People

Contributors

nbareil, northern-lights, onenulluser, saretx


yara-parser's Issues

Parentheses alone on a line not working

Despite using YARA's grammar.y file, the parser is failing to recognize rules such as the following:

rule FOR {
strings:
    $s1 = "abc"
condition:
    (for any i in (0..#s1) : (
        @s1[i] > 5
    ))
}
$ y2j for.yar 
Couldn't parse YARA ruleset: Parser result: "1" grammar: lexical error @5: "syntax error: unexpected _DOT_DOT_, expecting _RPAREN_ or _OR_ or _AND_"

Keep yara comments

Would it be possible to keep Yara comments please?

I guess it would require "attaching" each comment to a string/meta/condition (so that it can be reattached at the serialization stage).

Thank you very much for your very valuable project!

No Serialize function

Would you consider a PR adding a serialize function to Rule?

Something like:

func serialize(output io.Writer, rule data.Rule) {
	fmt.Fprintf(output, "rule %s ", rule.Identifier)
	if len(rule.Tags) > 0 {
		fmt.Fprintf(output, ": %s ", strings.Join(rule.Tags, " "))
	}

	fmt.Fprintf(output, "{ \n")
	if len(rule.Meta) > 0 {
		fmt.Fprintf(output, "  meta:\n")
		for _, meta := range rule.Meta {
			if _, ok := meta.Val.(string); ok {
				fmt.Fprintf(output, "    %s = \"%s\"\n", meta.Key, meta.Val)
			}
			if _, ok := meta.Val.(int64); ok {
				fmt.Fprintf(output, "    %s = %d\n", meta.Key, meta.Val)
			}
			if val, ok := meta.Val.(bool); ok {
				if val {
					fmt.Fprintf(output, "    %s = true\n", meta.Key)
				} else {
					fmt.Fprintf(output, "    %s = false\n", meta.Key)
				}
			}
		}
		fmt.Fprintf(output, "\n")
	}

	if len(rule.Strings) > 0 {
		fmt.Fprintf(output, "  strings:\n")
		for _, s := range rule.Strings {
			if s.Type == data.TypeString {
				fmt.Fprintf(output, "    %s = \"%s\"", s.ID, s.Text)
			} else if s.Type == data.TypeRegex {
				fmt.Fprintf(output, "    %s = /%s/", s.ID, s.Text)
			} else if s.Type == data.TypeHexString {
				fmt.Fprintf(output, "    %s = { %s }", s.ID, s.Text)
			}
			if s.Modifiers.ASCII {
				fmt.Fprintf(output, " ascii")
			}
			if s.Modifiers.Wide {
				fmt.Fprintf(output, " wide")
			}
			if s.Modifiers.Nocase {
				fmt.Fprintf(output, " nocase")
			}
			if s.Modifiers.Fullword {
				fmt.Fprintf(output, " fullword")
			}

			if s.Modifiers.I {
				fmt.Fprintf(output, "i")
			}
			if s.Modifiers.S {
				fmt.Fprintf(output, "s")
			}

			fmt.Fprintf(output, "\n")
		}
		fmt.Fprintf(output, "\n")
	}

	fmt.Fprintf(output, "  condition:\n    %s\n}\n\n", rule.Condition)
}

Lexer needs to handle error conditions

Conditions such as these need to be handled:

<str>\n  {
  /* syntax_error("unterminated string"); */
}

Should be as simple as:

<str>\n  {
    panic("unterminated string")
}

What is "error" in the grammar?

I'm almost done with the expression parsing, but I can't figure out what this rule does (grammar.y:444):

_FOR_ for_expression error

There seems to be no "error" defined in either the yacc file or the flex one.

Add xor modifier

YARA has a new xor modifier for strings. Add this to the grammar and lexer.

Use an index for meta and strings

It was nice to be able to do rule.Meta[key] and rule.Strings[id] to get metas and strings in O(1) time, but this caused issues with allowing duplicate $ anonymous strings. As a result, we went with slices for both metas and strings to align more closely with the libyara structure.

We may be able to still carry a map for metas and strings for use in Go development in the grammar library. This may be doable if we:

  1. Use json:"-" as a tag for these indices (to ignore them in JSON serialization)
  2. Use a custom JSON unmarshaler to rebuild the index upon deserialization (a rough sketch follows below)
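
A rough sketch of that approach, using simplified stand-in types; the field and type names here are illustrative, not the library's actual definitions.

package data

import "encoding/json"

// Meta is a simplified stand-in mirroring the key/value pairs shown in the
// JSON sample earlier on this page.
type Meta struct {
	Key string      `json:"key"`
	Val interface{} `json:"val"`
}

// Rule keeps a lookup index alongside the slice. The index is tagged with
// json:"-" so it never appears in serialized output.
type Rule struct {
	Identifier string `json:"identifier"`
	Meta       []Meta `json:"meta"`

	MetaIndex map[string][]*Meta `json:"-"`
}

// UnmarshalJSON decodes the rule as usual, then rebuilds MetaIndex.
func (r *Rule) UnmarshalJSON(b []byte) error {
	// An alias type drops the methods, avoiding infinite recursion.
	type alias Rule
	var a alias
	if err := json.Unmarshal(b, &a); err != nil {
		return err
	}
	*r = Rule(a)
	r.MetaIndex = make(map[string][]*Meta, len(r.Meta))
	for i := range r.Meta {
		m := &r.Meta[i]
		r.MetaIndex[m.Key] = append(r.MetaIndex[m.Key], m)
	}
	return nil
}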

XOR modifier doesn't work when bytes range is provided

When one specifies a byte range for the xor modifier, e.g. $xor_string = "This program cannot" xor(0x01-0xff), the parser throws an error:

Parser result: "1" grammar: lexical error @4: "syntax error: unexpected '(', expecting CONDITION"

Add --update flag

Add a --update flag to execute go get -u github.com/Northern-Lights/yara-parser/cmd/y2j to automatically update y2j.

Key and Val in meta should have tags for lowercase keys in json data

// A Meta is a simple key/value pair. Val should be restricted to
// int, string, and bool.
type Meta struct {
	Key string
	Val interface{}
}

Should be

// A Meta is a simple key/value pair. Val should be restricted to
// int, string, and bool.
type Meta struct {
	Key string        `json:"key"`
	Val interface{} `json:"val"`
}

Strings duplicate check disallows multiple anonymous `$` strings

Because we are using a map for strings, the anonymous $ identifier cannot be duplicated in the yara-parser implementation. This would be allowed in libyara.

We should probably just follow libyara and make the strings and metas lists (slices), and then check at runtime that string identifiers are not being duplicated, making an exception for the anonymous $ identifier.
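
A minimal sketch of such a runtime check, assuming a String type with an ID field like the one in the JSON sample above (names are illustrative, not the library's API):

package data

import "fmt"

// String is a simplified stand-in with just the identifier field.
type String struct {
	ID string `json:"id"`
}

// checkDuplicateStrings rejects duplicate string identifiers within a rule,
// while allowing the anonymous "$" identifier to repeat, as libyara does.
func checkDuplicateStrings(strs []String) error {
	seen := make(map[string]bool, len(strs))
	for _, s := range strs {
		if s.ID == "$" { // anonymous strings may be duplicated
			continue
		}
		if seen[s.ID] {
			return fmt.Errorf("duplicate string identifier %q", s.ID)
		}
		seen[s.ID] = true
	}
	return nil
}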

Trailing slashes are stripped

When a regexp ends with a "/", it is removed:

$ cat leading_slash.yara
rule foobar {
  strings:
    $a = /foo\//
  condition:
    all of them
}

$ y2j leading_slash.yara
{           
   "file": "leading_slash.yara",
   "imports": null,
   "includes": null,
   "rules": [
      {
         "modifiers": {
            "global": false,
            "private": false
         },
         "identifier": "foobar",
         "tags": [],
         "meta": null,
         "strings": [
            {
               "id": "$a",
               "type": 2,
               "text": "foo\\",
               "modifiers": {
                  "nocase": false,
                  "ascii": false,
                  "wide": false,
                  "fullword": false,
                  "i": false,
                  "s": false
               }
            }
         ],
         "condition": "all of them"
      }
   ]
}

I was expecting "text": "foo\\/" instead.

Single JSON Output Schema

Greetings friends,

I'm one of the maintainers of plyara:
https://github.com/plyara/plyara

It looks like we've arrived at many of the same conclusions and are doing very similar things at least in the parsing and JSON output departments. Our project just released a 2.0.0 version yesterday, and the announcement sparked a discussion on Twitter here:
https://twitter.com/MalwareUtkonos/status/1091533281244471297

Eventually, the author of YARA joined the conversation and mentioned that they are working on a Go implementation of the YARA parser. This implementation will output rules in JSON format. This may or may not obsolete parts of your project, but that's a side topic.

My main proposal is: let's coordinate on one single schema for data structure and JSON output format. We can definitely have local variation, but I think having a single schema that is interoperable among all three projects is a good thing. As a first step, I can post an annotated copy of our full JSON schema along with the reasoning behind various decisions. The short term goal would be to have both your and our annotated schema sent over to the core YARA developers. An ideal situation would be that they adopt as much of our "unified" schema as makes sense. They would then release the official schema when ready. We would then produce JSON that conforms to that official schema. If there are fields that we can't all agree on, we would then have a flag to enable additional local/optional fields in our output.

Please let me know your thoughts on this proposal.

Here is our open issue on the same subject:
plyara/plyara#50

Newest commit broke the build?

As of the most recent commit, building produces the following error:

../../go/src/github.com/Northern-Lights/yara-parser/data/serialize.go:184:10: undefined: NewYARAError
../../go/src/github.com/Northern-Lights/yara-parser/data/serialize.go:191:10: undefined: NewYARAError
../../go/src/github.com/Northern-Lights/yara-parser/data/serialize.go:259:10: undefined: NewYARAError
../../go/src/github.com/Northern-Lights/yara-parser/data/serialize.go:282:10: undefined: NewYARAError
../../go/src/github.com/Northern-Lights/yara-parser/data/serialize.go:286:9: undefined: NewYARAError
../../go/src/github.com/Northern-Lights/yara-parser/data/serialize.go:293:11: undefined: NewYARAError

The previous commit builds and works normally. NewYARAError doesn't appear to be defined anywhere?
