Comments (14)

errm avatar errm commented on May 28, 2024

Hmm, it looks like something unexpected got into one of the pos files.

Could you take a look at them? If you remove the pos files does everything start up from scratch as expected?

I am not quite sure what we should do to recover in this case. We could just throw a better error, or skip seeking to the stored position and read from the head or tail without any intervention.

from fluent-plugin-systemd.

dhawal55 avatar dhawal55 commented on May 28, 2024

The pos file is empty and deleting it does not help. It works fine if I kill the container and start again, but it stops working again after a few days. It would be nice to throw a better error. "Invalid argument" doesn't say much about what went wrong.
Any idea what could cause the seek function to fail?

errm avatar errm commented on May 28, 2024

If the pos file does not contain a cursor, it may well be that there is some issue with writing it out. Could it be permissions? Who owns the pos file, and which user are you running fluentd as? Are you mounting a volume to write the pos files to?

The pos file should have a value that looks something like:
s=add4782f78ca4b6e84aa88d34e5b4a9d;i=1cd;b=4737ffc504774b3ba67020bc947f1bc0;m=42f2dd;t=4d905e4cd5a92;x=25b3f86ff2774ac4 (example from the test suite)

We should be able to check whether the pos file is empty before passing the value to systemd, and then perhaps log a warning. It does seem, though, that you have some underlying issue that is preventing you from writing the pos file.
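
A minimal sketch of that kind of check, for illustration only. The helper name `read_cursor` and the `CURSOR_PATTERN` regex are hypothetical and not part of the plugin; the pattern is inferred from the sample cursor above and is an assumption, not a format guaranteed by systemd.

```ruby
require "logger"

# Assumption: a cursor is a ';'-separated list of single-letter keys with
# hexadecimal values, starting with "s=", as in the sample from the test suite.
CURSOR_PATTERN = /\As=\h+(;[a-z]=\h+)+\z/

# Hypothetical helper: returns the stored cursor, or nil when the pos file
# is missing, empty, or does not look like a cursor.
def read_cursor(pos_path, logger: Logger.new($stderr))
  return nil unless File.exist?(pos_path)

  cursor = File.read(pos_path).strip
  if cursor.empty?
    logger.warn("pos file #{pos_path} is empty, ignoring it")
    return nil
  end
  unless cursor.match?(CURSOR_PATTERN)
    logger.warn("pos file #{pos_path} does not look like a journal cursor, ignoring it")
    return nil
  end
  cursor
end
```

A caller would only attempt to seek when `read_cursor` returns a non-nil value, so an empty file would produce a warning rather than an "Invalid argument" error.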

dhawal55 avatar dhawal55 commented on May 28, 2024

Fluentd is running as root and has rw permission on the file. The file is owned by root and lives on the root device. The pos file starts out correctly (just like the sample shown above), but after running for a few days fluentd starts failing and the file becomes empty.

If it were a permission issue, shouldn't it fail the very first time? Something is causing the pos file to become empty and preventing fluentd from writing to it at a later point in time, and I'm not able to figure out what is triggering it.

errm avatar errm commented on May 28, 2024

Hmm, sounds interesting.

I have a patch that should solve the symptoms by ignoring the pos file if it contains an invalid cursor: #7

It would be great if we could reproduce the error writing to the pos file.
It seems like the writer thread is still running correctly, or the file would not have been blanked out. I guess that we might be getting an invalid value back from systemd here: https://github.com/reevoo/fluent-plugin-systemd/blob/master/lib/fluent/plugin/in_systemd.rb#L59

errm avatar errm commented on May 28, 2024

I have been trying to work out how this could happen, but I am having a hard time. @dhawal55, do you have any logs from immediately before fluentd starts crashing?

SleepyBrett avatar SleepyBrett commented on May 28, 2024

Hey @errm, I work with @dhawal55 and have been finding some other stuff. It seems like fluentd has trouble flushing its buffer out to ES, and then it fails to allocate memory when writing the pos file.

2016-04-26 02:22:09 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2016-04-26 02:22:06 +0000 error_class="Elasticsearch::Transport::Transport::Errors::Forbidden" error="[403] " plugin_id="object:e9ef94"
  2016-04-26 02:22:09 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/elasticsearch-transport-1.0.15/lib/elasticsearch/transport/transport/base.rb:146:in `__raise_transport_error'
  2016-04-26 02:22:09 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/elasticsearch-transport-1.0.15/lib/elasticsearch/transport/transport/base.rb:256:in `perform_request'
  2016-04-26 02:22:09 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/elasticsearch-transport-1.0.15/lib/elasticsearch/transport/transport/http/faraday.rb:20:in `perform_request'
  2016-04-26 02:22:09 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/elasticsearch-transport-1.0.15/lib/elasticsearch/transport/client.rb:125:in `perform_request'
  2016-04-26 02:22:09 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/elasticsearch-api-1.0.15/lib/elasticsearch/api/actions/ping.rb:20:in `block in ping'
  2016-04-26 02:22:09 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/elasticsearch-api-1.0.15/lib/elasticsearch/api/utils.rb:191:in `__rescue_from_not_found'
  2016-04-26 02:22:09 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/elasticsearch-api-1.0.15/lib/elasticsearch/api/actions/ping.rb:19:in `ping'
  2016-04-26 02:22:09 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-elasticsearch-1.4.0/lib/fluent/plugin/out_elasticsearch.rb:112:in `client'
  2016-04-26 02:22:09 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-elasticsearch-1.4.0/lib/fluent/plugin/out_elasticsearch.rb:237:in `rescue in send'
  2016-04-26 02:22:09 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-elasticsearch-1.4.0/lib/fluent/plugin/out_elasticsearch.rb:235:in `send'
  2016-04-26 02:22:09 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-elasticsearch-1.4.0/lib/fluent/plugin/out_elasticsearch.rb:229:in `write'
  2016-04-26 02:22:09 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.20/lib/fluent/buffer.rb:345:in `write_chunk'
  2016-04-26 02:22:09 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.20/lib/fluent/buffer.rb:324:in `pop'
  2016-04-26 02:22:09 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.20/lib/fluent/output.rb:329:in `try_flush'
  2016-04-26 02:22:09 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.20/lib/fluent/output.rb:140:in `run'
2016-04-26 02:22:13 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2016-04-26 02:22:08 +0000 error_class="Elasticsearch::Transport::Transport::Errors::Forbidden" error="[403] " plugin_id="object:e9ef94"
  2016-04-26 02:22:13 +0000 [warn]: suppressed same stacktrace
2016-04-26 02:22:18 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2016-04-26 02:22:11 +0000 error_class="Elasticsearch::Transport::Transport::Errors::Forbidden" error="[403] " plugin_id="object:e9ef94"
  2016-04-26 02:22:18 +0000 [warn]: suppressed same stacktrace
2016-04-26 02:22:25 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2016-04-26 02:22:20 +0000 error_class="Elasticsearch::Transport::Transport::Errors::Forbidden" error="[403] " plugin_id="object:e9ef94"
  2016-04-26 02:22:25 +0000 [warn]: suppressed same stacktrace
2016-04-26 02:22:33 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2016-04-26 02:22:34 +0000 error_class="Elasticsearch::Transport::Transport::Errors::Forbidden" error="[403] " plugin_id="object:e9ef94"
  2016-04-26 02:22:33 +0000 [warn]: suppressed same stacktrace
2016-04-26 02:22:41 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2016-04-26 02:23:09 +0000 error_class="Elasticsearch::Transport::Transport::Errors::Forbidden" error="[403] " plugin_id="object:e9ef94"
  2016-04-26 02:22:41 +0000 [warn]: suppressed same stacktrace
2016-04-26 02:23:17 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2016-04-26 02:24:09 +0000 error_class="Elasticsearch::Transport::Transport::Errors::Forbidden" error="[403] " plugin_id="object:e9ef94"
  2016-04-26 02:23:17 +0000 [warn]: suppressed same stacktrace
2016-04-26 02:24:16 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2016-04-26 02:25:09 +0000 error_class="Elasticsearch::Transport::Transport::Errors::Forbidden" error="[403] " plugin_id="object:e9ef94"
  2016-04-26 02:24:16 +0000 [warn]: suppressed same stacktrace
2016-04-26 02:25:15 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2016-04-26 02:26:09 +0000 error_class="Elasticsearch::Transport::Transport::Errors::Forbidden" error="[403] " plugin_id="object:e9ef94"
  2016-04-26 02:25:16 +0000 [warn]: suppressed same stacktrace
2016-04-26 02:26:16 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2016-04-26 02:27:09 +0000 error_class="Elasticsearch::Transport::Transport::Errors::Forbidden" error="[403] " plugin_id="object:e9ef94"
  2016-04-26 02:26:16 +0000 [warn]: suppressed same stacktrace
2016-04-26 02:26:56 +0000 [info]: following tail of /var/log/containers/aws-node-labels-ip-xxx.us-west-2.compute.internal_kube-system_apply-labels-fbe15a0159fc1225a0bdb874885ba66a52f7bf2a9d915021bdb8a2e454090660.log
2016-04-26 02:27:07 +0000 [error]: Cannot allocate memory @ io_write - /var/log/es-containers.log.pos
  2016-04-26 02:27:07 +0000 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.20/lib/fluent/plugin/in_tail.rb:713:in `write'
  2016-04-26 02:27:07 +0000 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.20/lib/fluent/plugin/in_tail.rb:713:in `update_pos'
  2016-04-26 02:27:07 +0000 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.20/lib/fluent/plugin/in_tail.rb:537:in `on_notify'
  2016-04-26 02:27:07 +0000 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.20/lib/fluent/plugin/in_tail.rb:369:in `on_notify'
  2016-04-26 02:27:07 +0000 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.20/lib/fluent/plugin/in_tail.rb:454:in `call'
  2016-04-26 02:27:07 +0000 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.20/lib/fluent/plugin/in_tail.rb:454:in `on_timer'
  2016-04-26 02:27:07 +0000 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/cool.io-1.4.2/lib/cool.io/loop.rb:88:in `run_once'
  2016-04-26 02:27:07 +0000 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/cool.io-1.4.2/lib/cool.io/loop.rb:88:in `run'
  2016-04-26 02:27:07 +0000 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.20/lib/fluent/plugin/in_tail.rb:233:in `run'
2016-04-26 02:27:08 +0000 [error]: Cannot allocate memory @ io_getpartial - /var/log/containers/aws-node-labels-ip-172-20-242-176.us-west-2.compute.internal_kube-system_POD-9821b9e016178174bcefc14232b4087072edacb3635acd3b2dfc57e6f4fc6d74.log
  2016-04-26 02:27:08 +0000 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.20/lib/fluent/plugin/in_tail.rb:519:in `read_nonblock'
  2016-04-26 02:27:08 +0000 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.20/lib/fluent/plugin/in_tail.rb:519:in `on_notify'
  2016-04-26 02:27:08 +0000 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.20/lib/fluent/plugin/in_tail.rb:369:in `on_notify'
  2016-04-26 02:27:08 +0000 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.20/lib/fluent/plugin/in_tail.rb:454:in `call'
  2016-04-26 02:27:08 +0000 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.20/lib/fluent/plugin/in_tail.rb:454:in `on_timer'
  2016-04-26 02:27:08 +0000 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/cool.io-1.4.2/lib/cool.io/loop.rb:88:in `run_once'
  2016-04-26 02:27:08 +0000 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/cool.io-1.4.2/lib/cool.io/loop.rb:88:in `run'
  2016-04-26 02:27:08 +0000 [error]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.20/lib/fluent/plugin/in_tail.rb:233:in `run'
2016-04-26 02:27:09 +0000 [error]: closed stream
  2016-04-26 02:27:09 +0000 [error]: suppressed same stacktrace
2016-04-26 02:27:10 +0000 [info]: process finished code=9
2016-04-26 02:27:10 +0000 [error]: fluentd main process died unexpectedly. restarting.

errm avatar errm commented on May 28, 2024

Hi @SleepyBrett, yeah, that seems like it could cause the issue.

I am not sure that there is anything we can do to mitigate this in the plugin if the process is OOM and everything is broken.

Looking at your Dockerfile, you might want to set reload_connections false in your <match **> block. As you are using aws-elasticsearch-service, reload_connections does not work properly because a) the node list returned by AWS is wrong (or empty) and b) AWS provides a load-balanced endpoint, so the client does not need to know about the nodes anyway.

Also, you might want to look into using a disk-based buffer.
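
As a rough sketch, those two suggestions could look something like the fragment below in a fluentd 0.12-style config. The endpoint host and the buffer path are placeholders, not values from this setup, and any authentication or request-signing settings are left out.

```
<match **>
  type elasticsearch
  # Placeholder endpoint; substitute your own domain.
  host search-example.us-west-2.es.amazonaws.com
  port 443
  scheme https
  # Do not ask the cluster for its node list.
  reload_connections false
  # File-backed buffer instead of the in-memory default.
  # Placeholder path; it must be writable by the fluentd user.
  buffer_type file
  buffer_path /var/log/fluentd-buffers/es.buffer
</match>
```

With a file buffer, chunks that cannot be flushed are held on disk rather than in memory, which may ease memory pressure while the output is failing.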

In any case, I think my patch will make it so you can just restart fluentd and it will start ingesting again, at the cost of duplicated entries if you set read_from_head, or lost entries otherwise.
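
To illustrate that trade-off, here is a small sketch of the fallback decision. The `start_position` helper is hypothetical and is not the plugin's actual code; it only shows the choice being described.

```ruby
# Hypothetical helper, for illustration only.
# cursor:         the value recovered from the pos file, or nil if it was unusable
# read_from_head: mirrors the read_from_head setting mentioned above
def start_position(cursor, read_from_head: false)
  # A usable cursor always wins: resume exactly where we left off.
  return [:cursor, cursor] if cursor && !cursor.empty?

  # No usable cursor: starting at the head re-reads old entries (duplicates),
  # starting at the tail skips whatever was written while we were down (loss).
  read_from_head ? [:head, nil] : [:tail, nil]
end
```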

dhawal55 avatar dhawal55 commented on May 28, 2024

Hello @errm. Sorry, I got pulled into something else, so I couldn't respond for the last few days. Thank you for your time and help in this matter. I will try setting reload_connections to false and see if that helps with the buffering issue. I will keep you posted on the outcome.

errm avatar errm commented on May 28, 2024

Hopefully this will solve things for you, @dhawal55. Did you pull in the latest version of this plugin?

errm avatar errm commented on May 28, 2024

Ping @dhawal55, did you get things sorted out?

dhawal55 avatar dhawal55 commented on May 28, 2024

@errm Sorry, I totally forgot to update the status. Things had been working fine since I last reported, until now. I'm seeing the same error again. I'm updating to the latest version (0.0.3) of the plugin and will let you know if that fixes the issue.

errm avatar errm commented on May 28, 2024

Thanks, that would be great :)

dhawal55 avatar dhawal55 commented on May 28, 2024

@errm The new version works. I'm going to close this issue. Thank you for all your help.
