logstash-plugins / logstash-output-csv
License: Apache License 2.0
It seems that 5.2 broke the CSV output plugin.
This generates a "csv" containing all of the message fields every time, separated by spaces rather than commas, even though "blah" does not exist as a field. This works correctly in 2.3 and is broken in 5.2.
To recreate, pass in a file containing:
00:00:00.0 COMM_TURNED_ON YODA
Use this grok pattern:
EVENT_COMM_TURNED_ON %{TIME:event_time}%{SPACE}%{NOTSPACE:event_type}%{SPACE}%{NOTSPACE:name}
input { stdin { } }
filter {
  grok {
    patterns_dir => ["C:/src/elk/broken"]
    match => ["message", "%{EVENT_COMM_TURNED_ON}"]
  }
}
output {
  if "_grokparsefailure" not in [tags] {
    elasticsearch {
      index => "raw-data-%{+YYYY.MM.dd}"
    }
    if "COMM_TURNED_ON" in [message] {
      csv {
        fields => ["blah"]
        csv_options => { "col_sep" => "," "row_sep" => "\r\n" }
        path => "C:/src/elk/comm_turned_on.csv"
      }
    }
  }
}
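For contrast, the row one would expect for the sample event can be sketched with Ruby's standard CSV library (field names taken from the grok pattern above; this is an illustration, not the plugin's code):

```ruby
require "csv"

# Event as the grok filter above would parse it (assumed field names).
event = {
  "event_time" => "00:00:00.0",
  "event_type" => "COMM_TURNED_ON",
  "name"       => "YODA",
}

# What the csv output is documented to do: look up each configured
# field and join the values using col_sep / row_sep.
fields = ["event_time", "event_type", "name"]
row = CSV.generate_line(fields.map { |f| event[f] },
                        col_sep: ",", row_sep: "\r\n")
puts row.inspect   # "00:00:00.0,COMM_TURNED_ON,YODA\r\n"
```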
I'm using the JDBC input in combination with the CSV output. When I fetch data from the table and write it to an output CSV, the file appears to remain open. This is a shame, because I need to move and delete that file when done. Any suggestions?
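The csv output holds its file handle open for appending, which blocks moving the file on Windows in particular. One workaround is to let a separate post-processing script own the file's lifecycle; a minimal Ruby sketch (paths are examples):

```ruby
require "csv"
require "fileutils"
require "tmpdir"

# Write, close, then move: CSV.open with a block closes the handle
# when the block exits, so the file can be moved or deleted right after.
moved = false
Dir.mktmpdir do |dir|
  src  = File.join(dir, "export.csv")
  dest = File.join(dir, "archive.csv")

  CSV.open(src, "w") do |csv|
    csv << ["id", "name"]
    csv << [1, "example"]
  end                        # handle is closed here

  FileUtils.mv(src, dest)    # no open handle remains, so this succeeds
  moved = File.exist?(dest) && !File.exist?(src)
end
puts moved   # true
```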
There doesn't appear to be a way to set a header row in a CSV output file. Please add an option to set a header row, ideally using the field names.
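What the requested option could look like, sketched with Ruby's CSV class: `write_headers` emits the header row once before the first data row. Field names and the path are examples, not plugin behavior.

```ruby
require "csv"
require "tmpdir"

header = ["host", "message"]
out = nil
Dir.mktmpdir do |dir|
  path = File.join(dir, "with_header.csv")
  # write_headers: true writes the headers row before any data rows.
  CSV.open(path, "w", write_headers: true, headers: header) do |csv|
    csv << ["localhost", "hello world"]
  end
  out = File.read(path)
end
puts out
# host,message
# localhost,hello world
```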
Logstash 2.1 introduced a bug that prevents me from printing the CSV document to stdout.
Main parts of the config:
input {
  stdin {}
}
filter {
  ...
}
output {
  csv {
    path => "/dev/stdout"
    fields => [ "path", "resource" ]
    csv_options => {
      col_sep => ";"
      force_quotes => true
    }
  }
}
This worked in 2.0 and earlier versions.
The error message is:
Exception while flushing and closing files. {:exception=>#<IOError: Illegal seek>, :level=>:error}
UPDATE
If I use the file input with a static path instead, I get this error:
IOError: Illegal seek
flush at org/jruby/RubyIO.java:2207
flush at /opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-file-2.2.3/lib/logstash/outputs/file.rb:284
flush_pending_files at /opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-file-2.2.3/lib/logstash/outputs/file.rb:200
each at org/jruby/RubyHash.java:1342
flush_pending_files at /opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-file-2.2.3/lib/logstash/outputs/file.rb:198
flush at /opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-file-2.2.3/lib/logstash/outputs/file.rb:187
receive at /opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-csv-2.0.2/lib/logstash/outputs/csv.rb:40
handle at /opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.1.1-java/lib/logstash/outputs/base.rb:81
output_func at (eval):163
outputworker at /opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.1.1-java/lib/logstash/pipeline.rb:277
start_outputs at /opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.1.1-java/lib/logstash/pipeline.rb:194
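The "Illegal seek" comes from flushing a stream that cannot seek: /dev/stdout attached to a pipe is not seekable. MRI raises Errno::ESPIPE for this; JRuby (as in the trace above) surfaces it as IOError. The error can be reproduced on an ordinary pipe:

```ruby
# Seeking on a pipe is illegal; the OS returns ESPIPE ("Illegal seek").
r, w = IO.pipe
begin
  w.seek(0)
rescue Errno::ESPIPE => e
  puts e.class     # Errno::ESPIPE
  puts e.message   # includes "Illegal seek"
ensure
  r.close
  w.close
end
```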
Transcribing a user's investigation:
Under the documentation for the CSV output there is documentation for file_mode:
https://www.elastic.co/guide/en/logstash/current/plugins-outputs-csv.html#plugins-outputs-csv-file_mode
It contains the sentence:
File access mode to use. Note that due to the bug in jruby system umask is ignored on linux: jruby/jruby#3426 Setting it to -1 uses default OS value. Example: "file_mode" => 0640
In my opinion this documentation is not true, for file_mode and dir_mode alike.
I ran some tests with file_mode:
/usr/share/logstash/bin/logstash --config.string 'input { stdin { } } output { csv { path => "/usr/share/logstash/00359402.csv" fields => "message" file_mode => 0000 } }' --log.level=debug
And the result was this table (file_mode value on the left, resulting permissions on the right):
0000 ----------.
0001 ---------x.
0002 ----------.
0003 ---------x.
0004 -------r--.
0005 -------r-x.
0006 -------r--.
0007 -------r-x.
Under RHEL7 umask for root is 0022.
I found this documentation about the ruby File method:
http://www.ruby-doc.org/core-2.1.2/File.html#method-c-new
which says that the mode is passed to the open(2) system call, where the kernel applies the process umask — so the umask is not ignored at all, and the bits it clears can never be set.
https://www.linuxnix.com/umask-define-linuxunix/
Umask values are subtracted (bitwise cleared) from the requested permissions, so a umask of 0222 would make a file read-only for everyone.
So if root's default umask is 0022, files are always read-only for group and other, which is exactly what is happening here.
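The standard POSIX rule the kernel applies at file creation is `effective_mode = requested_mode & ~umask`; this reproduces every row of the table above (taken under umask 0022), as well as the umask-0000 test:

```ruby
# effective_mode = requested_mode & ~umask, as applied by the kernel
# in open(2) when a file is created.
def effective_mode(requested, umask)
  requested & ~umask
end

printf("%04o\n", effective_mode(0o666, 0o022))  # 0644 (default root umask)
printf("%04o\n", effective_mode(0o006, 0o022))  # 0004 (matches the table row for 0006)
printf("%04o\n", effective_mode(0o222, 0o000))  # 0222 (umask 0000 passes the mode through)
```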
So I tested to set umask to 0000:
umask 0000 ; /usr/share/logstash/bin/logstash --config.string 'input { stdin { } } output { csv { path => "/usr/share/logstash/00359402.csv" fields => "message" file_mode => 0222 } }' --log.level=debug ; umask 0022
The result was a file with the expected permissions:
--w--w--w-. 1 root root 52 Jul 4 16:27 /usr/share/logstash/00359402.csv
So one solution would be for the documentation page to document this behavior.
Another option would be to have systemd set the umask for Logstash, but this is not a good solution:
vim /etc/systemd/system/multi-user.target.wants/logstash.service
[Service]
UMask=0000
A third option would be to establish another solution in Ruby for this.
I could not reproduce the jruby bug (jruby/jruby#3426) under RHEL7:
/usr/share/logstash/bin/logstash --interactive irb
As user root: puts File.umask prints 18 (decimal for octal 0022) and returns nil.
As user logstash: puts File.umask prints 2 (octal 0002) and returns nil.
CSV output needs to escape characters in values which cannot be rendered in spreadsheet applications.
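One common reading of this request is spreadsheet formula injection: values beginning with =, +, -, or @ are treated as formulas by spreadsheet applications, and are conventionally neutralized with a leading single quote. A sketch of that hardening in plain Ruby (this is not the plugin's current behavior; the helper name is illustrative):

```ruby
require "csv"

# Prefix formula-triggering values with a single quote so spreadsheet
# applications render them as text instead of evaluating them.
def spreadsheet_safe(value)
  s = value.to_s
  s.match?(/\A[=+\-@]/) ? "'#{s}" : s
end

row = ["=SUM(A1:A9)", "plain", "+123"].map { |v| spreadsheet_safe(v) }
puts CSV.generate_line(row)
# '=SUM(A1:A9),plain,'+123
```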
(This issue was originally filed by @beardyneedle at elastic/logstash#2115)
To replicate, configure a CSV output pointing at a directory you do not have write permissions to.
Just filing for those who may run into this.
LS 5.4.0 incorrectly packages version 3.0.2 of the logstash-output-csv plugin, which means that users on 5.4.0 can run into this known bug. Note that LS 5.3.0 already shipped version 3.0.3 of the plugin. I can confirm that LS 5.4.1 packages version 3.0.3 again, so this has been addressed.
Users on LS 5.4.0 should either upgrade to LS 5.4.1 or manually install version 3.0.3 of the plugin (./logstash-plugin install --version "3.0.3" logstash-output-csv), which contains the fix for #14.
Dear friends,
the word "separator" is misspelled twice in the csv_options documentation, here:
https://www.elastic.co/guide/en/logstash/current/plugins-outputs-csv.html#plugins-outputs-csv-csv_options
This kind of minor mistake can cost a lot of time when you are searching the page with Ctrl+F and have no time to read the whole thing.
input {
  file {
    path => [ "/tmp/logstash-test.txt" ]
    type => "test"
  }
}
filter {}
output {
  if [type] == "test" {
    file {
      path => "/tmp/logstash-test.json"
    }
    csv {
      path => "/tmp/logstash-test.csv"
      fields => [ "host", "path" ]
      csv_options => { "col_sep" => ":" "row_sep" => "\n" }
    }
  }
}
/tmp/logstash-test.txt
091502 001 002 003
091517 001 002 003
/tmp/logstash-test.json
{"path":"/tmp/logstash-test.txt","@timestamp":"2017-02-03T09:15:03.119Z","@version":"1","host":"localhost","message":"091502 001 002 003","type":"test","tags":[]}
{"path":"/tmp/logstash-test.txt","@timestamp":"2017-02-03T09:15:18.130Z","@version":"1","host":"localhost","message":"091517 001 002 003","type":"test","tags":[]}
/tmp/logstash-test.csv (actual output):
2017-02-03T09:15:03.119Z localhost 091502 001 002 0032017-02-03T09:15:18.130Z localhost 091517 001 002 003
Each test line was appended with:
echo "$(date -u +%H%M%S) 001 002 003" >> /tmp/logstash-test.txt
Expected output:
localhost:/tmp/logstash-test.txt
localhost:/tmp/logstash-test.txt
I followed the guide at http://logstash.net/docs/1.4.2/outputs/csv#csv_options, but the separator was written out literally (escaped) instead of being interpreted. For example:
a\t2015-04-14 06:24:07 UTC\t1\nb\t2015-04-14 06:24:08 UTC\t1\nc\t2015-04-14 06:24:09 UTC\t1\n""\t2015-04-14 06:24:09 UTC\t1\nd\t2015-04-14 06:24:10 UTC\t1\n
Configuration:
input {
  stdin {}
}
output {
  csv {
    fields => [ "message", "@timestamp", "@Version" ]
    csv_options => { "col_sep" => "\t" "row_sep" => "\n" }
    path => "/tmp/test.csv"
  }
}
logstash version: 1.4.2
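The likely root cause: at the time of this report, Logstash did not process escape sequences inside config strings, so "\t" reaches the plugin as the two characters backslash and t rather than a tab (later releases added a config.support_escapes setting for this). Ruby itself distinguishes the two, and a real tab as col_sep produces the intended row:

```ruby
require "csv"

literal = '\t'   # two characters: backslash + t (what the config delivered)
tab     = "\t"   # one character: an actual tab

puts literal.length  # 2
puts tab.length      # 1

# With a real tab as col_sep the row renders as intended:
puts CSV.generate_line(["a", "2015-04-14 06:24:07 UTC", "1"], col_sep: tab)
```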
Hi,
with ES 5.4 this config file produces the following errors:
input {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "sirene"
    query => '{"query": {"query_string" : {"query": "((DEPET:54 AND provider:sp_mairie) OR (DEPET:55 AND provider:sp_mairie) OR (DEPET:60 AND provider:sp_mairie))"}}}'
  }
}
output {
  csv {
    fields => ["nom", "street", "cp", "nomcommune", "nom_maire", "siteweb", "email"]
    path => "/home/data-prospection/public_html/jsondata/57cae745456ab85d4ff76a83ef3f1f0e/dt_0d83a4d79454b856ab249ca6496b17b1.csv"
  }
}
19:49:32.109 [[main]<elasticsearch] ERROR logstash.pipeline - A plugin had an unrecoverable error. Will restart this plugin.
Plugin: <LogStash::Inputs::Elasticsearch hosts=>["http://localhost:9200"], index=>"sirene", query=>"{\"query\": {\"query_string\" : {\"query\": \"((DEPET:54 AND provider:sp_mairie) OR (DEPET:55 AND provider:sp_mairie) OR (DEPET:60 AND provider:sp_mairie))\"}}}", id=>"5c936185cc6a7e0f4f295c6b5c5a95250ce5f9b6-1", enable_metric=>true, codec=><LogStash::Codecs::JSON id=>"json_633de6cc-69e1-4a02-bd93-63fb1c7b92a6", enable_metric=>true, charset=>"UTF-8">, size=>1000, scroll=>"1m", docinfo=>false, docinfo_target=>"@metadata", docinfo_fields=>["_index", "_type", "_id"], ssl=>false>
Error: [400] {"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Failed to parse request body"}],"type":"illegal_argument_exception","reason":"Failed to parse request body","caused_by":{"type":"json_parse_exception","reason":"Unrecognized token 'DnF1ZXJ5VGhlbkZldGNoBQAAAAAAAACsFm84Mm84SFpzU3VHSzdWMHdWQ3N3NGcAAAAAAAAArRZvODJvOEhac1N1R0s3VjB3VkNzdzRnAAAAAAAAAKsWbzgybzhIWnNTdUdLN1Ywd1ZDc3c0ZwAAAAAAAACuFm84Mm84SFpzU3VHSzdWMHdWQ3N3NGcAAAAAAAAArxZvODJvOEhac1N1R0s3VjB3VkNzdzRn': was expecting ('true', 'false' or 'null')\n at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput@6037430d; line: 1, column: 457]"}},"status":400}
Thanks for your reply (it works fine with ES 5.3).
I output a CSV file with the csv plugin, and a field with a float value is written in scientific notation (with an E).
Would it not be more logical to use decimal notation instead? (Better for latitude and longitude.)
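This is likely just Ruby's default float-to-string conversion, which switches to scientific notation for small magnitudes; explicit formatting keeps plain decimal notation:

```ruby
# Ruby's Float#to_s uses scientific notation once the exponent drops
# below -4, which is plausibly what the plugin writes out.
lat = 0.0000123
puts lat.to_s               # "1.23e-05"

# Formatting with a fixed number of decimal places keeps plain decimal
# notation, e.g. for latitude/longitude:
puts format("%.7f", lat)    # "0.0000123"
```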
Since this plugin inherits from logstash-output-file, the file output's multi_receive_encoded gets called instead of this plugin's own receive method.
This means the plugin behaves exactly like logstash-output-file under Logstash 5.x, making it useless. For more reports see:
https://discuss.elastic.co/t/elasticsearch-logstash-input-to-csv-output-mapping-question/65837
https://discuss.elastic.co/t/csv-output-plugin-prints-wrong-stuff/65574
https://discuss.elastic.co/t/debugging-csv-output/64901
https://discuss.elastic.co/t/export-from-elasticsearch-to-csv-problem-where-i-am-wrong/65558
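The inheritance pitfall described above can be illustrated with a minimal Ruby sketch (class and method bodies are simplified stand-ins, not the real plugin code): the base class's batch entry point never calls the subclass's per-event method, so the override is dead code.

```ruby
# Simplified stand-in for logstash-output-file: the pipeline calls
# multi_receive_encoded, which renders events the file-output way.
class FileOutput
  def multi_receive_encoded(events)
    events.map { |e| "file-style: #{e}" }   # base behavior always wins
  end
end

# Simplified stand-in for logstash-output-csv: it only overrides
# receive, which multi_receive_encoded never invokes.
class CsvOutput < FileOutput
  def receive(event)
    "csv-style: #{event}"                   # never reached by the pipeline
  end
end

out = CsvOutput.new.multi_receive_encoded(["e1"])
puts out.inspect   # ["file-style: e1"]
```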