Comments (5)
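The DataFrame produced by spark-connector is assembled by mapping the schema and the data onto each other in order. The schema is read separately through metaClient, so first confirm that the order of the tag schema returned by metaClient matches the order of the data returned by a scan.
- First check the DataFrame schema printed in the Spark log line that starts with "dataset's schema:".
- You can also scan the data in this tag with java-client and check whether the order of the returned data matches the schema order.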
from nebula-java.
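As a rough illustration of why the order matters, the sketch below compares the property order from the schema log with the order seen in a scan, and mimics the positional name-to-value mapping described above. `SchemaOrderCheck`, `mismatches` and `assemble` are hypothetical helpers written for this example; they are not part of nebula-java or the connector.

```scala
object SchemaOrderCheck {
  /** Returns (index, schemaName, scanName) for every position where the two orders differ. */
  def mismatches(schemaCols: Seq[String], scanCols: Seq[String]): Seq[(Int, String, String)] =
    schemaCols
      .zipAll(scanCols, "<missing>", "<missing>") // pad the shorter list so extra columns show up
      .zipWithIndex
      .collect { case ((schemaName, scanName), i) if schemaName != scanName => (i, schemaName, scanName) }

  /** Pairs names with values purely by position, so a wrong schema order silently mislabels values. */
  def assemble(schemaCols: Seq[String], rowValues: Seq[Any]): Map[String, Any] =
    schemaCols.zip(rowValues).toMap
}
```

For example, the property columns in the two schema logs posted in this thread are `vid, vlength, inDegree, groupID, isKey` and `vid, vlength, groupID, isKey, inDegree`; `mismatches` on those two lists reports positions 2, 3 and 4.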
Scanning directly in the web client should also work, right? This is the scan result:
Dataset schema log:
20/12/07 21:30:06 INFO NebulaRelation: dataset's schema: StructType(StructField(_vertexId,StringType,false), StructField(vid,StringType,true), StructField(vlength,LongType,true), StructField(inDegree,LongType,true), StructField(groupID,LongType,true), StructField(isKey,LongType,true))
It looks like the orders are not consistent. How can I fix this?
I looked through the code:
https://github.com/vesoft-inc/nebula-java/blob/v1.0/tools/nebula-spark/src/main/scala/com/vesoft/nebula/tools/connector/reader/NebulaRelation.scala#L46
When building the DataFrame schema here, it uses metaClient.getTagSchema, which returns the Nebula schema as a Map[String, Class].
It looks like the column order may be lost there; could that be the cause?
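To see how a `Map[String, Class]` can lose the column order, here is a small sketch with plain Java maps. It assumes the properties are declared in the order of the corrected schema log; `PropertyOrder` is a hypothetical name, and whether getTagSchema actually uses a hash-based map internally is not confirmed here.

```scala
import java.util.{HashMap => JHashMap, LinkedHashMap => JLinkedHashMap, Map => JMap}
import scala.jdk.CollectionConverters._

object PropertyOrder {
  // Assumed declaration order of the tag's properties (taken from the corrected schema log).
  val declared: Seq[(String, Class[_])] = Seq(
    "vid"      -> classOf[String],
    "vlength"  -> classOf[java.lang.Long],
    "groupID"  -> classOf[java.lang.Long],
    "isKey"    -> classOf[java.lang.Long],
    "inDegree" -> classOf[java.lang.Long]
  )

  // Insert the properties in declaration order, then read the keys back in iteration order.
  private def keysAfterInsert(m: JMap[String, Class[_]]): Seq[String] = {
    declared.foreach { case (name, cls) => m.put(name, cls) }
    m.keySet.asScala.toSeq
  }

  /** HashMap: iteration order follows key hashes, not insertion order. */
  def unordered: Seq[String] = keysAfterInsert(new JHashMap[String, Class[_]]())

  /** LinkedHashMap: iteration order is insertion order. */
  def ordered: Seq[String] = keysAfterInsert(new JLinkedHashMap[String, Class[_]]())
}
```

`ordered` always returns the declaration order, while `unordered` holds the same names in an order determined by the key hashes, which is why building the schema from an ordered column list avoids the problem.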
I made a rough change, and now the correct order is read:
```scala
/**
 * Return the dataset's schema. The schema includes the columns configured in returnCols,
 * or all properties of the tag/edge in Nebula.
 */
def getSchema(nebulaOptions: NebulaOptions): StructType = {
  import scala.collection.JavaConverters._

  val returnColMap = nebulaOptions.getReturnColMap
  val metaClient   = NebulaUtils.createMetaClient(nebulaOptions.getHostAndPorts, nebulaOptions)
  returnColMap.keySet.foreach(k => {
    // Build a fresh field list per label so id columns are not appended repeatedly.
    val fields: ListBuffer[StructField] = new ListBuffer[StructField]
    // Read the tag/edge schema itself instead of the Map[String, Class] from
    // getTagSchema, which may not preserve the column order.
    val nebulaSchema: Schema =
      if (DataTypeEnum.VERTEX.toString.equalsIgnoreCase(nebulaOptions.dataType)) {
        fields.append(DataTypes.createStructField("_vertexId", DataTypes.StringType, false))
        metaClient.getTag(nebulaOptions.spaceName, nebulaOptions.label)
      } else {
        fields.append(DataTypes.createStructField("_srcId", DataTypes.StringType, false))
        fields.append(DataTypes.createStructField("_dstId", DataTypes.StringType, false))
        metaClient.getEdge(nebulaOptions.spaceName, nebulaOptions.label)
      }
    if (nebulaOptions.allCols) {
      // If allCols is true, fields contain all properties, in schema order.
      nebulaSchema.columns.asScala.foreach(columnDef => {
        LOG.info(s"prop name ${columnDef.getName}, type ${columnDef.getType}")
        fields.append(
          DataTypes.createStructField(
            columnDef.getName,
            NebulaUtils.convertDataType(NebulaTypeUtil.supportedTypeToClass(columnDef.getType.getType)),
            true))
      })
    } else {
      // TODO: selecting specific columns is not implemented yet
      throw new Error("to be continued")
    }
    // NOTE: `++` returns a new collection; this result is discarded unless it is assigned back.
    labelFields ++ Map(k -> fields)
    datasetSchema = new StructType(fields.toArray)
  })
  LOG.info(s"dataset's schema: $datasetSchema")
  datasetSchema
}
```
DataFrame schema order:
20/12/07 22:19:55 INFO NebulaRelation: dataset's schema: StructType(StructField(_vertexId,StringType,false), StructField(vid,StringType,true), StructField(vlength,LongType,true), StructField(groupID,LongType,true), StructField(isKey,LongType,true), StructField(inDegree,LongType,true))
I haven't worked out the version with specified columns, though...
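"When building the DataFrame schema here, it uses metaClient.getTagSchema, which returns the Nebula schema as a Map[String, Class].
It looks like the column order may be lost there; could that be the cause?"
Yes, that is the cause; building from the tag's own schema is more accurate. For specified columns, you can use the result of metaClient.getTagSchema and let the order of the input column list determine the column order. A PR is welcome~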
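A minimal sketch of the approach suggested for specified columns: take each property's type from a name-to-class map (such as what getTagSchema returns) and let the requested column list decide the order. `SelectedColumns`, `propertyFields`, `vertexFields` and `Field` are hypothetical; `Field` stands in for Spark's `StructField` so the sketch runs without Spark, and converting a `Class` to a Spark `DataType` is left out.

```scala
object SelectedColumns {
  // Hypothetical stand-in for Spark's StructField so the sketch runs without Spark.
  final case class Field(name: String, javaType: Class[_], nullable: Boolean)

  /** Property fields in the order of `returnCols`; types are looked up in `propTypes`. */
  def propertyFields(returnCols: Seq[String], propTypes: Map[String, Class[_]]): Seq[Field] =
    returnCols.map { col =>
      // Fail fast on a column the tag does not define.
      val cls = propTypes.getOrElse(col, throw new IllegalArgumentException(s"unknown property: $col"))
      Field(col, cls, nullable = true)
    }

  /** Vertex schema: the id column first, then the requested properties in the requested order. */
  def vertexFields(returnCols: Seq[String], propTypes: Map[String, Class[_]]): Seq[Field] =
    Field("_vertexId", classOf[String], nullable = false) +: propertyFields(returnCols, propTypes)
}
```

Requesting `Seq("isKey", "vid")` yields the fields `_vertexId, isKey, vid`, independent of the map's iteration order.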