agentx's Introduction

AgentX

AgentX is an agent command-line program developed by the alinode team. It helps alinode report performance data and diagnose problems.

Installation

$ npm install agentx -g

The command above installs agentx as a global command-line tool.

Usage

agentx requires a configuration file, and it only executes commands or reads logs inside the directories specified in that configuration.
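The README does not show how agentx enforces this directory restriction. As a minimal sketch of one way to do it, the hypothetical helper below accepts a path only if it still lies inside the configured directory after `.` and `..` segments are resolved. It does not follow symlinks, which is an assumption worth revisiting.

```javascript
const path = require('path');

// Hypothetical guard: true if `target` (absolute, or relative to baseDir)
// stays inside `baseDir` once "." and ".." segments are resolved.
// Symlinks are NOT followed; run paths through fs.realpathSync first
// if a link must not be able to escape the directory.
function isInsideDir(baseDir, target) {
  const base = path.resolve(baseDir);
  const rel = path.relative(base, path.resolve(base, target));
  return rel === '' ||
    (rel !== '..' && !rel.startsWith('..' + path.sep) && !path.isAbsolute(rel));
}
```

Comparing with `path.relative` instead of a plain `startsWith(baseDir)` avoids treating `/var/log/application` as if it were inside `/var/log/app`.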

The configuration format is as follows:

{
  "server": "<SERVER IP>:8080",
  "appid": "<YOUR APPID>",
  "secret": "<YOUR SECRET>",
  "cmddir": "</path/to/your/command/dir>",
  "logdir": "</path/to/your/log/dir>",
  "reconnectDelay": 10,
  "heartbeatInterval": 60,
  "reportInterval": 60,
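  "error_log": [
    "</path/to/your/error.log>"
  ],
  "packages": [
    "</path/to/your/package.json>"
  ]
}

error_log (optional): paths of the exception logs your application produces at the business level, for example /root/.logs/error.#YYYY#-#MM#-#DD#-#HH#.log.
packages (optional): one or more package.json paths.

#YYYY#, #MM#, #DD# and #HH# in the configuration are wildcards. Use them if your exception logs are generated per time period.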
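How agentx expands these wildcards internally is not documented here. A minimal sketch, assuming local time and two-digit zero padding, shows how a pattern such as the example above could resolve to a concrete file name. `expandLogPattern` is a hypothetical name.

```javascript
// Hypothetical helper: replace #YYYY#, #MM#, #DD# and #HH# with the
// corresponding parts of `date` (local time, zero-padded; both assumptions).
function expandLogPattern(pattern, date = new Date()) {
  const pad = (n) => String(n).padStart(2, '0');
  return pattern
    .replace(/#YYYY#/g, String(date.getFullYear()))
    .replace(/#MM#/g, pad(date.getMonth() + 1)) // getMonth() is 0-based
    .replace(/#DD#/g, pad(date.getDate()))
    .replace(/#HH#/g, pad(date.getHours()));
}
```

Under these assumptions, the example pattern evaluated at 17:39 on 18 October 2018 gives `/root/.logs/error.2018-10-18-17.log`. Because the result changes every hour, a reader of such logs would presumably need to re-expand the pattern periodically.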

Save it as config.json. If anything above is unclear, ask in the Wangwang group 1406236180.
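Before starting the daemon, it can help to catch a malformed or incomplete config.json early. The sketch below parses the file and reports missing keys. Which keys agentx actually treats as mandatory is an assumption (here: server, appid, secret, cmddir, logdir), and `loadConfig` is a hypothetical name.

```javascript
const fs = require('fs');

// Assumption: the README does not say which fields are mandatory;
// these are the ones every example above fills in.
const REQUIRED_KEYS = ['server', 'appid', 'secret', 'cmddir', 'logdir'];

// Parse the config file (JSON.parse throws on syntax errors) and
// report every missing required key at once.
function loadConfig(file) {
  const config = JSON.parse(fs.readFileSync(file, 'utf8'));
  const missing = REQUIRED_KEYS.filter((key) => !config[key]);
  if (missing.length > 0) {
    throw new Error(`${file}: missing required keys: ${missing.join(', ')}`);
  }
  return config;
}
```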

Once the configuration is in place, run:

$ nohup agentx config.json &

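agentx runs as a resident process. After deployment, visit http://alinode.aliyun.com/dashboard to see your application's details. If everything is working, your application's performance data should appear after a short wait (about one minute).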

License

The agentx is released under the MIT license.

agentx's People

Contributors

alinode-oss, dependabot[bot], fengmk2, hyj1991, jacksontian, joyeecheung, peze, popomore, zhangzifa

agentx's Issues

On macOS, agentx throws the exception `no such file or directory, open '/proc/loadavg'`

I am developing a project with egg + egg-alinode. After I recently updated node_modules, the following exception is thrown at runtime:

2018-10-18 17:39:36,572 ERROR 10620 nodejs.ENOENTError: ENOENT: no such file or directory, open '/proc/loadavg'
Error: ENOENT: no such file or directory, open '/proc/loadavg'
errno: -2
code: "ENOENT"
syscall: "open"
path: "/proc/loadavg"
name: "ENOENTError"
pid: 10620
hostname: My-Compute-Name.local

System: macOS Mojave 10.14
Node: alinode v3.12.0 (node v8.12.0)

Note: alinode is installed locally in the project with nodeinstall --install-alinode ^3, not globally.


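Below is what I found. I am not very familiar with this area, so this is as far as I could get, and I cannot pin down the real cause:

As far as I can tell, the exception comes from commit b4c861e, merged in agentx 1.9.13. That commit modified lib/orders/system.js and changed the way getLoadAvg is called from: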

var status = function () {
  const is_linux = os.type() === 'Linux';
  const loadavg = is_linux ? getLoadAvg() : os.loadavg();
  return {
    ...
  };
};

to:

exports.run = function (callback) {
  var result = {
    type: 'system',
    metrics: { cpu_count: cpuNumber, uptime: os.uptime() }
  };

  result.metrics.totalmem = totalMemory;
  getFreeMemory(function (err, freemem) {
    if (err) {
      return callback(err);
    }
    result.metrics.freemem = freemem;
    getLoadAvg(function (err, load) {
      if (err) {
        return callback(err);
      }
      ...
    });
  });
};

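This change removed the operating-system check, so getLoadAvg is now called on every platform, including macOS, where /proc/loadavg does not exist.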
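A sketch of a getLoadAvg that restores the platform check while keeping the callback signature that `exports.run` uses above. The fallback to `os.loadavg()` when `/proc/loadavg` cannot be read (for example in restricted containers) is our addition, not part of agentx.

```javascript
const fs = require('fs');
const os = require('os');

// /proc/loadavg looks like "0.52 0.58 0.59 1/467 12345";
// the first three fields are the 1-, 5- and 15-minute load averages.
function parseLoadAvg(content) {
  return content.trim().split(/\s+/).slice(0, 3).map(Number);
}

// Callback-style, matching getLoadAvg(function (err, load) { ... }) above.
function getLoadAvg(callback) {
  if (os.type() !== 'Linux') {
    // No /proc on macOS; os.loadavg() is the portable source there
    // (on Windows it always returns [0, 0, 0]).
    return process.nextTick(callback, null, os.loadavg());
  }
  fs.readFile('/proc/loadavg', 'utf8', (err, content) => {
    if (err) {
      // /proc missing or masked: fall back instead of failing the whole report.
      return callback(null, os.loadavg());
    }
    callback(null, parseLoadAvg(content));
  });
}
```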

agentx installation error

Local node version: 0.12.7

https://npm.taobao.org/mirrors/node/v0.13.0/node-v0.13.0.tar.gz does indeed return 404.


[root@bigertechA ~]# cnpm install agentx -g
|
> [email protected] install /root/.tnvm/versions/alinode/v0.3.1/lib/node_modules/agentx/node_modules/ws/node_modules/utf-8-validate
> node-gyp rebuild

gyp WARN install got an error, rolling back install
gyp ERR! configure error 
gyp ERR! stack Error: 404 response downloading https://npm.taobao.org/mirrors/node/v0.13.0/node-v0.13.0.tar.gz
gyp ERR! stack     at Request.<anonymous> (/usr/local/lib/node_modules/cnpm/node_modules/node-gyp/lib/install.js:244:14)
gyp ERR! stack     at Request.emit (events.js:131:20)
gyp ERR! stack     at Request.onRequestResponse (/usr/local/lib/node_modules/cnpm/node_modules/node-gyp/node_modules/request/request.js:998:10)
gyp ERR! stack     at ClientRequest.emit (events.js:109:17)
gyp ERR! stack     at HTTPParser.parserOnIncomingClient (_http_client.js:428:21)
gyp ERR! stack     at HTTPParser.parserOnHeadersComplete (_http_common.js:113:23)
gyp ERR! stack     at TLSSocket.socketOnData (_http_client.js:319:20)
gyp ERR! stack     at TLSSocket.emit (events.js:109:17)
gyp ERR! stack     at readableAddChunk (_stream_readable.js:164:16)
gyp ERR! stack     at TLSSocket.Readable.push (_stream_readable.js:128:10)
gyp ERR! System Linux 2.6.32-358.el6.x86_64
gyp ERR! command "node" "/usr/local/lib/node_modules/cnpm/node_modules/node-gyp/bin/node-gyp.js" "rebuild"
gyp ERR! cwd /root/.tnvm/versions/alinode/v0.3.1/lib/node_modules/agentx/node_modules/ws/node_modules/utf-8-validate
gyp ERR! node -v v0.13.0
gyp ERR! node-gyp -v v3.0.3
gyp ERR! not ok 

Final result:

npm WARN optional dep failed, continuing [email protected]
npm WARN optional dep failed, continuing [email protected]
/root/.tnvm/versions/alinode/v0.3.1/bin/agentx -> /root/.tnvm/versions/alinode/v0.3.1/lib/node_modules/agentx/start_client.js
[email protected] /root/.tnvm/versions/alinode/v0.3.1/lib/node_modules/agentx
├── [email protected]
├── [email protected]
├── [email protected] ([email protected])
├── [email protected] ([email protected], [email protected])
└── [email protected] ([email protected], [email protected])

Does this count as success or failure?

agentx does not run properly under WSL (Windows Subsystem for Linux)

ENOENT: no such file or directory, open '/proc/sys/kernel/core_pattern'
Error: ENOENT: no such file or directory, open '/proc/sys/kernel/core_pattern'
    at Object.fs.openSync (fs.js:646:18)
    at Object.fs.readFileSync (fs.js:551:33)
    at Object.exports.init (/home/db/.tnvm/versions/alinode/v3.14.1/lib/node_modules/@alicloud/agenthub/node_modules/agentx/lib/orders/list_core.js:182:19)

A known WSL issue causes this file read to fail.
Reference

agentx should check whether the file exists first.
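A minimal sketch of that check. Instead of calling `fs.existsSync` and then reading, it catches `ENOENT` from the read itself, which avoids a race between the check and the read. `readProcFileSafe` is a hypothetical helper, not agentx's API.

```javascript
const fs = require('fs');

// Read a /proc file that may not exist in every environment
// (e.g. /proc/sys/kernel/core_pattern under WSL).
// Returns the trimmed content, or null when the file is absent.
function readProcFileSafe(file) {
  try {
    return fs.readFileSync(file, 'utf8').trim();
  } catch (err) {
    if (err.code === 'ENOENT') {
      return null;
    }
    throw err; // other errors (e.g. EACCES) are still reported
  }
}
```

The caller in list_core.js could then call `readProcFileSafe('/proc/sys/kernel/core_pattern')` and skip core_pattern handling when it returns null.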

2018-04-18 16:44:01,879 ERROR 108558 nodejs.Error: Command failed: cat /proc/63968/environ
cat: /proc/63968/environ: Permission denied

cat: /proc/63968/environ: Permission denied

    at ChildProcess.exithandler (child_process.js:275:12)
    at emitTwo (events.js:126:13)
    at ChildProcess.emit (events.js:214:7)
    at maybeClose (internal/child_process.js:925:16)
    at Socket.stream.socket.on (internal/child_process.js:346:11)
    at emitOne (events.js:116:13)
    at Socket.emit (events.js:211:7)
    at Pipe._handle.close [as _onclose] (net.js:554:12)
killed: false
code: 1
signal: null
cmd: 'cat /proc/63968/environ'
pid: 108558

Process 63968 no longer exists.
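The failure comes from running `cat` on `/proc/<pid>/environ` for a process that is already gone. The sketch below reads the file directly with `fs` and treats a vanished (`ENOENT`) or inaccessible (`EACCES`) process as "environment unavailable" (null) rather than as an error. `readProcessEnv` and `parseEnviron` are hypothetical names.

```javascript
const fs = require('fs');

// /proc/<pid>/environ holds NUL-separated KEY=VALUE entries.
function parseEnviron(buf) {
  const env = {};
  for (const entry of buf.toString('utf8').split('\0')) {
    const eq = entry.indexOf('=');
    if (eq > 0) env[entry.slice(0, eq)] = entry.slice(eq + 1); // values may contain "="
  }
  return env;
}

// Hypothetical replacement for exec('cat /proc/<pid>/environ').
function readProcessEnv(pid, callback) {
  fs.readFile(`/proc/${pid}/environ`, (err, buf) => {
    if (err) {
      // Process exited (ENOENT) or belongs to another user (EACCES).
      if (err.code === 'ENOENT' || err.code === 'EACCES') {
        return callback(null, null);
      }
      return callback(err);
    }
    callback(null, parseEnviron(buf));
  });
}
```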

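Node.js Performance Platform deployment troubleshooting tool

  1. Read the settings on the application process
    • ENABLE_NODE_LOG
    • NODE_LOG_DIR
    • is alinode?
  2. Read the agentx/agenthub configuration
    • Read logdir
    • Check whether logdir contains node-xxx.log

If a check fails, point out the problem.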
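A sketch of the checks listed above, written as a pure function so each failure becomes a readable message. It assumes alinode's convention of `ENABLE_NODE_LOG=YES`, and it assumes that NODE_LOG_DIR should match the agentx logdir. The "is alinode?" check is omitted because the detection method is not specified here. `diagnose` and `listDir` are hypothetical names.

```javascript
const fs = require('fs');
const path = require('path');

// Contents of logdir, or null if it cannot be read.
function listDir(dir) {
  try {
    return fs.readdirSync(dir);
  } catch (err) {
    return null;
  }
}

// `env`: the application process's environment (e.g. from /proc/<pid>/environ).
// `logdir`: from the agentx/agenthub config. Returns a list of problems.
function diagnose(env, logdir, files = listDir(logdir)) {
  const problems = [];
  if (env.ENABLE_NODE_LOG !== 'YES') {
    problems.push('ENABLE_NODE_LOG is not set to YES');
  }
  if (!env.NODE_LOG_DIR) {
    problems.push('NODE_LOG_DIR is not set');
  } else if (path.resolve(env.NODE_LOG_DIR) !== path.resolve(logdir)) {
    problems.push(`NODE_LOG_DIR (${env.NODE_LOG_DIR}) differs from logdir (${logdir})`);
  }
  if (files === null) {
    problems.push(`logdir ${logdir} cannot be read`);
  } else if (!files.some((f) => /^node-.+\.log$/.test(f))) {
    problems.push(`no node-*.log file found in ${logdir}`);
  }
  return problems;
}
```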

Hostnames across multiple instances: with the option "agentidMode": "IP", it is not deterministic whether the internal or the external IP gets picked.

if (net.family === 'IPv4' && net.address && net.address !== '127.0.0.1') {

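Hostnames across multiple instances: the machines' hostnames do not follow a naming standard, so several machines share the same hostname. We use the agentidMode option to append the last two octets of the IP, but it picks the internal IP (18246), which does not help tell the machines apart.
The full output of os.networkInterfaces() is below (masked with xxx).
Interface names such as eth3 and bond0 vary from machine to machine, and Object.keys gives no guarantee about whether the internal or the external IP comes first.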

{ lo: 
   [ { address: '127.0.0.1',
       netmask: '255.0.0.0',
       family: 'IPv4',
       mac: '00:00:00:00:00:00',
       internal: true,
       cidr: '127.0.0.1/8' },
     { address: '::1',
       netmask: 'ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff',
       family: 'IPv6',
       mac: '00:00:00:00:00:00',
       scopeid: 0,
       internal: true,
       cidr: '::1/128' } ],
  eth3: 
   [ { address: 'xx.xx.18.246',
       netmask: '255.255.0.0',
       family: 'IPv4',
       mac: 'xxxxxxxxxxxx',
       internal: false,
       cidr: 'xxxxxxxxxx' },
     { address: 'xxxxxxxxxxxx',
       netmask: 'ffff:ffff:ffff:ffff::',
       family: 'IPv6',
       mac: 'xxxxxxxxxxxxx',
       scopeid: 5,
       internal: false,
       cidr: 'xxxxxxxxxxxx' } ],
  bond0: 
   [ { address: 'xx.xx.252.246',
       netmask: '255.255.255.0',
       family: 'IPv4',
       mac: 'xxxxxxxxx',
       internal: false,
       cidr: 'xxxxxxxxxx' },
     { address: 'xxxxxxxxx',
       netmask: 'ffff:ffff:ffff:ffff::',
       family: 'IPv6',
       mac: 'xxxxxxxxxxxxx',
       scopeid: 6,
       internal: false,
       cidr: 'xxxxxxxxxxxxxx' } ] }
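The loop shown earlier returns whichever non-loopback IPv4 address happens to come first. A sketch of a deterministic alternative: it skips internal interfaces using the `internal` flag visible in the output above, sorts interface names, and prefers public addresses over private ones (RFC 1918 plus the 100.64.0.0/10 shared range). The preference order and both function names are assumptions. Which address should identify a machine is a product decision.

```javascript
const os = require('os');

// RFC 1918 private ranges plus 100.64.0.0/10 (carrier-grade NAT).
function isPrivateIPv4(address) {
  const [a, b] = address.split('.').map(Number);
  return a === 10 ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168) ||
    (a === 100 && b >= 64 && b <= 127);
}

// Collect every non-internal IPv4 address in sorted interface order,
// then pick the first one of the preferred kind (public by default).
function pickIPv4(interfaces = os.networkInterfaces(), preferPublic = true) {
  const candidates = [];
  for (const name of Object.keys(interfaces).sort()) {
    for (const net of interfaces[name]) {
      const isV4 = net.family === 'IPv4' || net.family === 4; // Node 18.0-18.3 used 4
      if (isV4 && !net.internal) candidates.push(net.address);
    }
  }
  const preferred = candidates.filter((ip) => isPrivateIPv4(ip) !== preferPublic);
  return preferred[0] || candidates[0] || null;
}
```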

tailf issues

  • The tailf processes are not shut down when agentx exits or restarts.
  • They tie up process slots.
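The issue does not show how agentx starts tailf. Assuming one child `tail -F` process per followed log, the sketch below records every child and kills them all when agentx exits normally or receives SIGINT/SIGTERM. `followFile` and `stopAll` are hypothetical names. These handlers cannot run on SIGKILL. With GNU coreutils, adding `--pid=<agentx pid>` makes tail exit on its own once that process dies, but BSD/macOS tail does not have that option.

```javascript
const { spawn } = require('child_process');

// Every tail child started by this process, so none is left behind.
const tails = new Set();

// -n 0 starts at the current end of the file; -F keeps following
// across log rotation (supported by GNU and BSD tail).
function followFile(file, onLine) {
  const child = spawn('tail', ['-n', '0', '-F', file], {
    stdio: ['ignore', 'pipe', 'ignore'],
  });
  tails.add(child);
  const forget = () => tails.delete(child);
  child.on('exit', forget);
  child.on('error', forget); // e.g. tail is not installed

  let pending = '';
  child.stdout.setEncoding('utf8');
  child.stdout.on('data', (chunk) => {
    const lines = (pending + chunk).split('\n');
    pending = lines.pop(); // keep a partial last line for the next chunk
    lines.forEach((line) => onLine(line));
  });
  return child;
}

// Kill all followers; synchronous, so it is safe inside an 'exit' handler.
function stopAll() {
  for (const child of tails) child.kill();
}

process.on('exit', stopAll);
for (const signal of ['SIGINT', 'SIGTERM']) {
  // A listener replaces Node's default exit-on-signal, so exit explicitly.
  process.once(signal, () => {
    stopAll();
    process.exit(0);
  });
}
```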

Failed to install agentx

env

alinode-v2.0.8 with Node.js v6.9.5

npm install agentx -g

Result:

SyntaxError: Unexpected token C in JSON at position 19
    at Object.parse (native)
    at /root/.tnvm/versions/alinode/v2.0.8/lib/node_modules/npm/node_modules/uid-number/uid-number.js:43:18
    at ChildProcess.exithandler (child_process.js:197:7)
    at emitTwo (events.js:106:13)
    at ChildProcess.emit (events.js:191:7)
    at maybeClose (internal/child_process.js:877:16)
    at Process.ChildProcess._handle.onexit (internal/child_process.js:226:5)

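Who writes access_log?

I am curious who writes access_log. I use agenthub, yet when I look at which process is operating on the log file, it is the node process I started myself. Could someone explain how this works?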

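How do I fix wss connection errors that occur on both the node:alpine and centos Docker images?

Our eggjs project is integrated with AliNode. When run locally or deployed to Heroku, it reports data to the Alibaba Cloud Node.js Performance Platform without problems. When deployed to a k8s pod (we tried both node:alpine and centos images), however, no data gets reported: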

[Tue Jun 18 2019 11:46:16 GMT+0800 (China Standard Time)] Connecting to wss://agentserver.node.aliyun.com:8080...
[Tue Jun 18 2019 11:46:16 GMT+0800 (China Standard Time)] get an error: Error: write EPROTO 140088968039272:error:1408F10B:SSL routines:ssl3_get_record:wrong version number:../deps/openssl/openssl/ssl/record/ssl3_record.c:252:

Do these images need some changes to be compatible?
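OpenSSL's `wrong version number` during a handshake usually means the other end answered with something other than TLS, for example a plain-HTTP proxy or egress gateway between the pod and the server. That is our reading of the error, not something the issue confirms. The minimal probe below can be run inside the pod to check whether a raw TLS handshake to the same endpoint succeeds. `probeTls` is a hypothetical helper.

```javascript
const net = require('net');
const tls = require('tls');

// Attempt a bare TLS handshake and report the outcome instead of throwing.
function probeTls(host, port, timeoutMs = 5000) {
  return new Promise((resolve) => {
    const socket = tls.connect(
      // SNI must be a host name, not an IP address.
      { host, port, servername: net.isIP(host) ? undefined : host },
      () => {
        // Handshake completed: the peer really speaks TLS.
        resolve({ ok: true, protocol: socket.getProtocol() });
        socket.end();
      }
    );
    socket.setTimeout(timeoutMs, () => {
      resolve({ ok: false, error: `no response within ${timeoutMs} ms` });
      socket.destroy();
    });
    // e.g. "wrong version number" when the peer answers in plain text.
    socket.on('error', (err) => resolve({ ok: false, error: err.message }));
    // Closed before the handshake finished (no-op if already resolved).
    socket.on('close', () => resolve({ ok: false, error: 'connection closed' }));
  });
}
```

For example, running `probeTls('agentserver.node.aliyun.com', 8080).then(console.log)` both in the pod and on a machine where reporting works should show whether the handshake itself behaves differently.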

[RFC] Agentx major version upgrade plan

TODO

  • Drop support for Node v4
  • Add Node v10 tests and CI
  • Improve test coverage
  • Syntax upgrades
    • prototype -> class
    • callback -> promise
    • co / yield -> async / await
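The sketch below illustrates the callback -> promise and prototype -> class items with `util.promisify`, which exists from Node 8 onward and so fits dropping v4. `getFreeMemory` and `SystemOrder` are illustrative names modeled on lib/orders/system.js, not the actual refactor.

```javascript
const os = require('os');
const util = require('util');

// Current style: Node error-first callback.
function getFreeMemory(callback) {
  process.nextTick(callback, null, os.freemem());
}

// callback -> promise: util.promisify wraps any error-first callback API.
const getFreeMemoryAsync = util.promisify(getFreeMemory);

// prototype -> class, co/yield -> async/await.
class SystemOrder {
  async run() {
    const freemem = await getFreeMemoryAsync();
    return {
      type: 'system',
      metrics: {
        cpu_count: os.cpus().length,
        uptime: os.uptime(),
        totalmem: os.totalmem(),
        freemem,
      },
    };
  }
}
```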

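Improving how agentid is obtained

Problem: in some environments, users' hostnames are heavily duplicated, so appid+agentid pairs collide.

Solution:
Add a configuration option, agentid_mode. It defaults to hostname mode and can be switched to local IP mode.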
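The sketch below shows how the proposed option might derive an agentid. The exact format in IP mode is an assumption: hostname, `_`, then the last two IPv4 octets. The octets are joined with `.`, because plain concatenation (the "18246" in the hostname issue above) makes 18.246 and 182.46 collide. `getAgentId` is a hypothetical name.

```javascript
const os = require('os');

// 'hostname' (default): the machine's hostname, as today.
// 'ip' (case-insensitive): hostname plus the last two octets of `ip`.
function getAgentId(mode = 'hostname', { hostname = os.hostname(), ip = null } = {}) {
  if (String(mode).toLowerCase() === 'ip' && ip) {
    const suffix = ip.split('.').slice(2).join('.');
    return `${hostname}_${suffix}`;
  }
  return hostname;
}
```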

Coredump file detection

Implementation plan:

  1. Take the current working directory.
  2. cat /proc/sys/kernel/core_pattern

Deduplicate the two paths, then periodically check those directories for core* files.
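The plan above can be sketched as follows. The sketch makes three assumptions. A leading `|` in core_pattern means cores are piped to a helper program, so there is no directory to watch. A relative pattern is resolved against the working directory. `%` specifiers appear only in the file-name part, so a pattern like `/var/crash/%e/core` would need expansion that this sketch does not do. All function names are hypothetical.

```javascript
const fs = require('fs');
const path = require('path');

// Directories to scan: the working directory plus the directory that
// core_pattern writes into. Linux semantics, hence path.posix.
function coreDirs(cwd, corePattern) {
  const dirs = new Set([path.posix.resolve(cwd)]);
  const pattern = corePattern.trim();
  // A leading "|" pipes cores to a helper program: no directory to watch.
  if (pattern && !pattern.startsWith('|')) {
    // Relative patterns are relative to the crashing process's cwd.
    dirs.add(path.posix.dirname(path.posix.resolve(cwd, pattern)));
  }
  return [...dirs]; // the Set drops the duplicate when both are the same
}

// Regular files whose names start with "core" (the plan's core*).
// readdir with withFileTypes needs Node >= 10.10.
function findCoreFiles(dirs) {
  const found = [];
  for (const dir of dirs) {
    let entries;
    try {
      entries = fs.readdirSync(dir, { withFileTypes: true });
    } catch (err) {
      continue; // missing or unreadable directory
    }
    for (const entry of entries) {
      if (entry.isFile() && entry.name.startsWith('core')) {
        found.push(path.join(dir, entry.name));
      }
    }
  }
  return found;
}

// Periodic scan; unref() so the timer alone does not keep the process alive.
function watchCores(dirs, intervalMs, onFound) {
  const timer = setInterval(() => {
    const files = findCoreFiles(dirs);
    if (files.length > 0) onFound(files);
  }, intervalMs);
  timer.unref();
  return timer;
}
```

If core_pattern renames core files (for example to `crash.%p`), the `core*` name filter would miss them. Deriving the prefix from the pattern's file-name part would be a natural extension.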
