Comments (22)
You can set minloglevel to 0 and v to 3 for more detailed info.
https://docs.nebula-graph.io/3.6.0/5.configurations-and-logs/2.log-management/logs/#parameter_descriptions
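As a side note, the log verbosity can usually also be changed at runtime through storaged's HTTP flags endpoint, without a restart. This is a sketch under the assumption of the default ws_http_port (19779); the host is a placeholder:

```shell
# Hypothetical storaged host; 19779 is the default ws_http_port for storaged.
STORAGED_HOST="xx.xx.xx.xx"
URL="http://${STORAGED_HOST}:19779/flags"
# Uncomment to apply the log settings at runtime:
# curl -X PUT -H "Content-Type: application/json" \
#      -d '{"minloglevel":"0","v":"3"}' "$URL"
echo "$URL"
```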
from nebula.
OK, I'll run a test to see if there is any connection leak. In the meantime, you may want to update your nebula-spark-connector to the latest version.
A question occurred to me just by chance: it's about the number of available ports.
Maybe you can try the steps in this post: https://blog.csdn.net/gltncx11/article/details/122068479
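Under the post's hypothesis (ephemeral-port exhaustion), a quick read-only check of the relevant kernel settings looks like this; the tuning commands are commented out because they require root and are only a suggestion:

```shell
# Read-only check of the ephemeral port range (Linux).
range=$(cat /proc/sys/net/ipv4/ip_local_port_range)
echo "ephemeral port range: $range"
# If the range is narrow, a root shell could widen it and allow TIME_WAIT reuse:
#   sysctl -w net.ipv4.ip_local_port_range="1024 65535"
#   sysctl -w net.ipv4.tcp_tw_reuse=1
```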
@sparkle-apt
Please make sure all the Spark workers can reach the storaged address.
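A minimal sketch of such a reachability check, to be run from each Spark worker; the storaged host and port below are placeholders taken from the thread:

```shell
# check_port HOST PORT -> prints OK or FAIL, using bash's /dev/tcp with a 3s timeout.
check_port() {
  if timeout 3 bash -c "echo > /dev/tcp/$1/$2" 2>/dev/null; then
    echo OK
  else
    echo FAIL
  fi
}
# On every worker, substitute the real storaged address:
#   check_port storaged_host_ip 9779
result=$(check_port 127.0.0.1 1)  # port 1 is normally closed, so this prints FAIL
echo "$result"
```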
In addition, we have observed weird behavior in another test, which connects to the database and counts edges via spark-shell.
spark-shell --master yarn --deploy-mode client --driver-memory=2G --executor-memory=2G --num-executors=2 --executor-cores=2 --conf spark.dynamicAllocation.enabled=false --jars nebula-spark-connector_3.0-3.0-SNAPSHOT-jar-with-dependencies.jar
We ran the following snippet again:
import org.apache.spark.sql.DataFrame
import com.vesoft.nebula.connector.connector.NebulaDataFrameReader
import com.vesoft.nebula.connector.{NebulaConnectionConfig, ReadNebulaConfig}
sc.setLogLevel("INFO")
val ec2_public_ip = "xx.xx.xx.xx"
val config = NebulaConnectionConfig.builder().withMetaAddress(s"${ec2_public_ip}:9559").withConnectionRetry(2).build()
val nebulaReadEdgeConfig: ReadNebulaConfig = ReadNebulaConfig.builder().withSpace("acct2asset_20231130").withLabel("USES").withNoColumn(false).withReturnCols(List()).withPartitionNum(20).build()
val dataset = spark.read.nebula(config, nebulaReadEdgeConfig).loadEdgesToDF()
dataset.show()
dataset.count()
The first four tasks raised the "Unable to activate object" error while the following ones did not.
We are concerned about this unstable and unexpected behavior and look forward to your suggestions. Thanks!
@Nicole00 @QingZ11 @wey-gu
So weird! Do the first four tasks run on different machines from the other tasks?
Can you make sure that telnet storaged_host_ip 9779 succeeds from all the Spark workers?
@Nicole00 No worries! There are 303938330 USES edges in the space.
We have decided not to be blocked by this issue for the moment and to move forward with other projects and tests on larger clusters. We will get back to it when bandwidth allows, so we can close the issue. Thanks for the reminder.
In the previous post mentioned here: https://discuss.nebula-graph.com.cn/t/topic/9726, zhang_hytc encountered the same issue as you did. You can try the following steps:
First, execute the show hosts command in the nebula-console. This command displays the addresses of the storaged services exposed by the NebulaGraph metad service.
Next, confirm whether you can establish a connection from your local environment to the storaged addresses exposed by the metad service.
Thanks @QingZ11 for your prompt response.
I confirm that the address of the storaged service exposed by the metad service is the public IP address of the storaged service, as shown below (the IP of the storaged service is masked due to sensitivity).
And I confirm that I can establish a connection from my EMR cluster to the storaged addresses, as shown below (the IP is again masked).
The issue we encounter is not that we cannot connect to the storaged service under any circumstances. Instead, the problem is that we encounter the error whenever total-executor-cores is greater than 4. This greatly limits our efficient usage of the graph database for our use cases.
Could you please help look into the issue and share insights that help us address it? Thank you!
Thanks @Nicole00 for the reminder. I confirm that all Spark workers and the storaged service are within the same VPC network and their ports are reachable from each other.
I've taken the initiative to do some preliminary checks, but so far, those have not led to a resolution. To proceed further and more effectively troubleshoot the issue, could you advise me on the following:
Log Files: Which specific log files or which specific information in logs should I review that may contain error messages or indicators related to this issue?
Configuration Files: Are there any configuration settings that I should inspect or tweak that might be relevant to this problem?
Diagnostic Tools/Commands: Are there tools or commands available to gather more diagnostic information?
If you require additional information or context from my end, please let me know, and I'll be sure to provide it. Thanks!
@Nicole00 Yes, these tasks all run on the same single machine where the storaged service is, and I confirm telnet storaged_host_ip 9779 returns Connected to storaged_host_ip.
> @Nicole00 Yes, these tasks all run on the same single machine where the storaged service is, and I confirm telnet storaged_host_ip 9779 returns Connected to storaged_host_ip.

Really weird. If the tasks all run on the SAME single machine, it looks like the storaged server was not ready at 10:59:00 but was ready at 11:01:09.
Could you please provide some log information for nebula storaged?
> could you please provide some log information for nebula storaged?

Sure, could you please let me know what minloglevel and v values are needed in the log settings, so that I can provide logs that help?
The logging is now configured with minloglevel set to 0 and v set to 3. When rerunning the counting snippet, however, we did not observe any error. Instead, the running tasks seemed to be stuck, with no task finishing, while the log suggests data is being fetched slowly, which is abnormal. It seems to me that the intensive logging may impact the performance of fetching data. As the full log is large, we truncated it to the top and most representative information, which is attached.
nebula-storaged-v3.txt
When v is reset to 0, we observe the same error again. However, no warning or error is found in the storaged log. We are getting more confused and are not sure if these pieces of info help. Please let me know if you require additional information or context from my end.
@Nicole00 By the way, following a random thought, we found tons of TCP connections in the TIME_WAIT state when running the code below:
import org.apache.spark.sql.DataFrame
import com.vesoft.nebula.connector.connector.NebulaDataFrameReader
import com.vesoft.nebula.connector.{NebulaConnectionConfig, ReadNebulaConfig}
sc.setLogLevel("INFO")
val ec2_public_ip = "xx.xx.xx.xx"
val config = NebulaConnectionConfig.builder().withMetaAddress(s"${ec2_public_ip}:9559").withConnectionRetry(2).build()
val nebulaReadEdgeConfig: ReadNebulaConfig = ReadNebulaConfig.builder().withSpace("acct2asset_20231130").withLabel("USES").withNoColumn(false).withReturnCols(List()).withPartitionNum(20).build()
val dataset = spark.read.nebula(config, nebulaReadEdgeConfig).loadEdgesToDF()
dataset.count()
There are around 21k TCP connections:
(base) [[email protected] packages]$ netstat -a | grep -cE ':9779.*TIME_WAIT'
21494
Is this expected?
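For anyone reproducing this check, the count can also be taken with ss, which is often available where netstat is not; the data port 9779 is the one used in the thread:

```shell
# Count TCP connections in TIME_WAIT involving the storaged data port (9779).
count=$(ss -tan state time-wait 2>/dev/null | grep -c ':9779')
echo "TIME_WAIT connections to :9779 -> ${count:-0}"
```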
Sorry for the late reply.
Theoretically, the number of connections to storaged will be 20 (partitionNum) * (number of storaged instances).
I checked the connection leak problem for the connector: the storageClient is closed after each partition finishes its task, and the connectionPool inside is also closed when the storageClient is closed.
How much data is in your USES label?
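The expected connection count described above is simple arithmetic; the instance count below is a placeholder, since the thread does not state how many storaged nodes the cluster has:

```shell
# Expected concurrent connections to storaged = partitionNum * storaged instances.
partitions=20          # withPartitionNum(20) from the snippet in this thread
storaged_instances=1   # placeholder; set to your actual number of storaged nodes
expected=$((partitions * storaged_instances))
echo "expected connections: $expected"
```

The ~21k TIME_WAIT sockets observed earlier are far above this expectation, which is why a connection leak was suspected in the first place.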
A brief summary of what has been observed so far:
- Generally encountered the error "Unable to activate object" when loading the graph with total executor cores greater than 4, observed on both the local and EMR clusters;
- Can successfully load with total executor cores no greater than 4, observed on both the local and EMR clusters;
- Observed once that, when loading the graph, the first four tasks raised the "Unable to activate object" error while the following ones did not and finished successfully;
- When logging is configured with minloglevel = 0 and v = 3, i.e., a large amount of logs written to disk, loading the graph with total executor cores greater than 4 can succeed, though much, much slower;
- There are tons of TCP connections in the TIME_WAIT state when loading the graph locally.
@Nicole00 Do you have any other ideas, taking these into consideration? Is there anything we could try to increase parallelism when reading the graph?
I really cannot reproduce your problem.
I ran the connector on a local Spark cluster with both 5 nodes and 1 node, and could read Nebula's data successfully.
I still think it's a network problem.
@sparkle-apt Hi, I have noticed that the issue you created hasn't been updated for nearly a month. Has it been resolved? If not, could you provide some more information? If it has been solved, could you close this issue?
Thanks a lot for your contribution anyway 😊