
Comments (15)

popduke commented on August 11, 2024

1. At startup the server dumps its full settings into info.log; could you share them?
2. Have you tried load-testing a single node?
3. In logback.xml, set DemoEventLogger to debug and check event.log for the reason connections are dropped during the load test (a sketch of the logback change follows).
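
For reference, a minimal sketch of that logback.xml change, assuming the shipped file already defines an entry for DemoEventLogger; the logger name below is a placeholder and should be replaced with whatever name the existing entry uses:

<!-- placeholder sketch: set the level of the existing DemoEventLogger entry to DEBUG;
     the logger name and any appender-ref should stay as defined in the shipped logback.xml -->
<logger name="DemoEventLogger" level="DEBUG"/>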

fengfu222 commented on August 11, 2024

1. At startup the server dumps its full settings into info.log; could you share them? 2. Have you tried load-testing a single node? 3. In logback.xml, set DemoEventLogger to debug and check event.log for the reason connections are dropped during the load test.

1.

---
bootstrap: true
clusterConfig:
  env: "Test"
  host: "10.89.144.26"
  port: 8899
  seedEndpoints: "10.89.144.129:8899,10.89.144.26:8899,10.89.144.62:8899,10.89.144.69:8899,10.89.144.121:8899"
mqttServerConfig:
  connTimeoutSec: 5
  maxConnPerSec: 3000
  maxDisconnPerSec: 1000
  maxMsgByteSize: 262144
  maxResendTimes: 5
  maxConnBandwidth: 524288
  defaultKeepAliveSec: 60
  qos2ConfirmWindowSec: 5
  bossELGThreads: 1
  workerELGThreads: 16
  tcpListener:
    enable: true
    host: "0.0.0.0"
    port: 1883
  tlsListener:
    enable: true
    host: "0.0.0.0"
    port: 8883
    sslConfig:
      certFile: "server.crt"
      keyFile: "server_pkcs8.key"
      trustCertsFile: "root.crt"
      clientAuth: "REQUIRE"
  wsListener:
    enable: true
    host: "0.0.0.0"
    port: 8080
    wsPath: "/mqtt"
  wssListener:
    enable: false
    host: "0.0.0.0"
    port: 8443
    wsPath: "/mqtt"
rpcClientConfig:
  workerThreads: 100
rpcServerConfig:
  host: "10.89.144.26"
  port: 0
  workerThreads: 100
baseKVRpcServerConfig:
  port: 0
stateStoreConfig:
  queryThreads: 100
  tickerThreads: 10
  bgWorkerThreads: 100
  distWorkerConfig:
    queryPipelinePerStore: 10000
    compactWALThreshold: 5000
    dataEngineConfig:
      type: "rocksdb"
      dataPathRoot: ""
      manualCompaction: false
      compactMinTombstoneKeys: 200000
      compactMinTombstoneRanges: 100000
      compactTombstoneRatio: 0.3
      asyncWALFlush: false
      fsyncWAL: false
    walEngineConfig:
      type: "rocksdb"
      dataPathRoot: ""
      manualCompaction: true
      compactMinTombstoneKeys: 2500
      compactMinTombstoneRanges: 2
      compactTombstoneRatio: 0.3
      asyncWALFlush: false
      fsyncWAL: false
    balanceConfig:
      scheduleIntervalInMs: 5000
      balancers:
        - "com.baidu.bifromq.dist.worker.balance.ReplicaCntBalancerFactory"
  inboxStoreConfig:
    queryPipelinePerStore: 10000
    compactWALThreshold: 2500
    gcIntervalSeconds: 600
    purgeDelaySeconds: 180
    dataEngineConfig:
      type: "rocksdb"
      dataPathRoot: ""
      manualCompaction: false
      compactMinTombstoneKeys: 200000
      compactMinTombstoneRanges: 100000
      compactTombstoneRatio: 0.3
      asyncWALFlush: false
      fsyncWAL: false
    walEngineConfig:
      type: "rocksdb"
      dataPathRoot: ""
      manualCompaction: true
      compactMinTombstoneKeys: 2500
      compactMinTombstoneRanges: 2
      compactTombstoneRatio: 0.3
      asyncWALFlush: false
      fsyncWAL: false
    balanceConfig:
      scheduleIntervalInMs: 5000
      balancers:
        - "com.baidu.bifromq.inbox.store.balance.ReplicaCntBalancerFactory"
        - "com.baidu.bifromq.inbox.store.balance.RangeSplitBalancerFactory"
        - "com.baidu.bifromq.inbox.store.balance.RangeLeaderBalancerFactory"
  retainStoreConfig:
    queryPipelinePerStore: 100
    compactWALThreshold: 2500
    gcIntervalSeconds: 600
    dataEngineConfig:
      type: "rocksdb"
      dataPathRoot: ""
      manualCompaction: false
      compactMinTombstoneKeys: 200000
      compactMinTombstoneRanges: 100000
      compactTombstoneRatio: 0.3
      asyncWALFlush: false
      fsyncWAL: false
    walEngineConfig:
      type: "rocksdb"
      dataPathRoot: ""
      manualCompaction: true
      compactMinTombstoneKeys: 5000
      compactMinTombstoneRanges: 2
      compactTombstoneRatio: 0.3
      asyncWALFlush: false
      fsyncWAL: false
    balanceConfig:
      scheduleIntervalInMs: 5000
      balancers:
        - "com.baidu.bifromq.retain.store.balance.ReplicaCntBalancerFactory"
apiServerConfig:
  enable: true
  httpPort: 8091
  apiBossThreads: 1
  apiWorkerThreads: 2
  httpsListenerConfig:
    enable: false
    port: 8090

2. Single-node load test: we use small payloads, and it meets our requirements.

3.
bifromq1
bifromq2
bifromq3

fengfu222 commented on August 11, 2024

1. At startup the server dumps its full settings into info.log; could you share them? 2. Have you tried load-testing a single node? 3. In logback.xml, set DemoEventLogger to debug and check event.log for the reason connections are dropped during the load test.

We did try it once: with the dist and inbox dataEngine configured as memory, the CPU could be driven up, it was able to sustain our load test to completion, and the cluster stayed healthy the whole time.

stateStoreConfig:
  distWorkerConfig:
    dataEngineConfig:
      type: memory
  inboxStoreConfig:
    dataEngineConfig:
      type: memory

mafei6827 commented on August 11, 2024

I tried load-testing this scenario and could not reproduce the issue after two hours. From the description, the large number of CLOSE_WAIT connections looks like the clients eventually disconnected abnormally without completing the full TCP teardown handshake, leaving a large number of CLOSE_WAIT connections on the server.

fengfu222 commented on August 11, 2024

I tried load-testing this scenario and could not reproduce the issue after two hours. From the description, the large number of CLOSE_WAIT connections looks like the clients eventually disconnected abnormally without completing the full TCP teardown handshake, leaving a large number of CLOSE_WAIT connections on the server.

The large number of CLOSE_WAIT connections is only a symptom. Our analysis is that the BiFromQ service hangs, so client requests get no response, which is why so many CLOSE_WAIT connections pile up; at that point an MQTT client cannot connect to the service either. What puzzles us is that none of our resources hit a bottleneck, so why can't RocksDB sustain our load test? We have run the test many times: whenever RocksDB is used as the storage engine, the service usually becomes unavailable within a few minutes, while the memory engine has no such problem. We also tried putting the RocksDB data on tmpfs, which did not help. Configuring only

walEngineConfig:
  type: "memory"

does not work for the test either; only

dataEngineConfig:
  type: "memory"

can sustain it.

popduke commented on August 11, 2024

With cleanSession=true and QoS 0, the message path does not involve any RocksDB I/O. BiFromQ exports JVM metrics; take a look at heap and direct buffer usage.

fengfu222 commented on August 11, 2024

With cleanSession=true and QoS 0, the message path does not involve any RocksDB I/O. BiFromQ exports JVM metrics; take a look at heap and direct buffer usage.

We checked GC and saw nothing wrong. After enabling monitoring, there is one RocksDB metric whose time cost looks relatively high:

basekv_le_rocksdb_flush_time_seconds_count{env="Test",kvspace="111877626712162304_0",storeId="a79ce47a-e01f-4db5-9651-fb4b3f7b3659",type="wal",} 168.0

basekv_le_rocksdb_flush_time_seconds_sum{env="Test",kvspace="111877626712162304_0",storeId="a79ce47a-e01f-4db5-9651-fb4b3f7b3659",type="wal",} 5.7988E-5

basekv_le_rocksdb_flush_time_seconds_count{env="Test",kvspace="111877626717339649_0",storeId="79675d6a-a7e9-4a52-9f8b-b659b58b9533",type="wal",} 488610.0

basekv_le_rocksdb_flush_time_seconds_sum{env="Test",kvspace="111877626717339649_0",storeId="79675d6a-a7e9-4a52-9f8b-b659b58b9533",type="wal",} 0.029379993

basekv_le_rocksdb_flush_time_seconds_count{env="Test",kvspace="111877626699644928_0",storeId="18207f92-f591-49a1-9887-ab30549816a0",type="wal",} 7.0

basekv_le_rocksdb_flush_time_seconds_sum{env="Test",kvspace="111877626699644928_0",storeId="18207f92-f591-49a1-9887-ab30549816a0",type="wal",} 1.0189E-5

popduke commented on August 11, 2024

The flush time of all three kvspaces averages around one microsecond; why do you consider that high?
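
For reference, assuming these are standard Prometheus _sum/_count pairs for the flush timer, the mean per-flush latency is the sum divided by the count:

  kvspace 111877626712162304_0: 5.7988E-5 s / 168      ≈ 0.35 µs
  kvspace 111877626717339649_0: 0.029379993 s / 488610 ≈ 0.06 µs
  kvspace 111877626699644928_0: 1.0189E-5 s / 7        ≈ 1.5 µs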

fengfu222 commented on August 11, 2024

The flush time of all three kvspaces averages around one microsecond; why do you consider that high?

Our initial load test had no subscribers at all, purely publishing data to the server, and CPU load was only about one third. Once we enable consumption, CPU load rises to about two thirds. And if all 350,000 clients publish 40 KB payloads, the cluster goes down very quickly.
So we are wondering: is there some configuration in the publish call chain that limits CPU usage?

fengfu222 commented on August 11, 2024

I tried load-testing this scenario and could not reproduce the issue after two hours. From the description, the large number of CLOSE_WAIT connections looks like the clients eventually disconnected abnormally without completing the full TCP teardown handshake, leaving a large number of CLOSE_WAIT connections on the server.

What is your load-test scenario? Could you share your BiFromQ configuration?

popduke commented on August 11, 2024

https://bifromq.io/docs/test_report/test_report/

fengfu222 commented on August 11, 2024

https://bifromq.io/docs/test_report/test_report/

The scenarios in that link all use small payloads of only a few hundred bytes. With a 40 KB payload we cannot drive the CPU up and the cluster goes down quickly; with small payloads we have no problem either. My question is exactly this: when load-testing with large 40 KB payloads, BiFromQ only uses about one third of the 32 CPU cores and the cluster soon collapses, and no matter how we tune the parameters we cannot raise BiFromQ's CPU utilization, whereas with the memory storage engine the cluster's CPU can be driven up.

popduke commented on August 11, 2024

You can enable debug logging for com.baidu.bifromq.mqtt.handler and see what the specific reason for the connection drops during the load test is.

fengfu222 commented on August 11, 2024

You can enable debug logging for com.baidu.bifromq.mqtt.handler and see what the specific reason for the connection drops during the load test is.

The debug log shows an error: a deliver error is reported.

popduke commented on August 11, 2024

I don't mean the content of event.log. In logback, configure a debug logger for com.baidu.bifromq.mqtt.handler and check whether its output shows any exceptions being thrown.
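
A minimal sketch of that logback.xml addition, assuming the shipped appender setup is left untouched and only the logger entry below is new:

<!-- debug logger for the MQTT connection handlers; with additivity left at its default,
     the output goes to whatever root appender the shipped logback.xml already defines -->
<logger name="com.baidu.bifromq.mqtt.handler" level="DEBUG"/>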
