apache / doris-spark-connector

Spark Connector for Apache Doris

Home Page: https://doris.apache.org/

License: Apache License 2.0

Topics: data-warehousing, mpp, olap, dbms, apache, doris, spark, connector
doris-spark-connector's Introduction

Spark Connector for Apache Doris


Spark Doris Connector

For more information about compilation and usage, please visit the Spark Doris Connector documentation.

License

Apache License, Version 2.0

How to Build

You need to copy customer_env.sh.tpl to customer_env.sh and configure it before building.

git clone git@github.com:apache/doris-spark-connector.git
cd doris-spark-connector/spark-doris-connector
cp customer_env.sh.tpl customer_env.sh   # configure customer_env.sh before building
./build.sh

QuickStart

  1. Download and compile the Spark Doris Connector from https://github.com/apache/doris-spark-connector. We suggest compiling it with the official Doris build image:
$ docker pull apache/doris:build-env-ldb-toolchain-latest
     The compiled jar will be named like spark-doris-connector-3.1_2.12-1.0.0-SNAPSHOT.jar.

  2. Download Spark from https://spark.apache.org/downloads.html. In China, the Tencent mirror is a good choice: https://mirrors.cloud.tencent.com/apache/spark/spark-3.1.2/

# download
wget https://mirrors.cloud.tencent.com/apache/spark/spark-3.1.2/spark-3.1.2-bin-hadoop3.2.tgz
# extract
tar -xzvf spark-3.1.2-bin-hadoop3.2.tgz
  3. Configure the Spark environment:
vim /etc/profile
export SPARK_HOME=/your_path/spark-3.1.2-bin-hadoop3.2
export PATH=$PATH:$SPARK_HOME/bin
source /etc/profile
  4. Copy spark-doris-connector-3.1_2.12-1.0.0-SNAPSHOT.jar to the Spark jars directory.
cp /your_path/spark-doris-connector/target/spark-doris-connector-3.1_2.12-1.0.0-SNAPSHOT.jar $SPARK_HOME/jars
  5. Create a Doris database and table.

    create database mongo_doris;
    use mongo_doris;
    CREATE TABLE data_sync_test_simple
     (
             _id VARCHAR(32) DEFAULT '',
             id VARCHAR(32) DEFAULT '',
             user_name VARCHAR(32) DEFAULT '',
             member_list VARCHAR(32) DEFAULT ''
     )
     DUPLICATE KEY(_id)
     DISTRIBUTED BY HASH(_id) BUCKETS 10
     PROPERTIES("replication_num" = "1");
    INSERT INTO data_sync_test_simple VALUES ('1','1','alex','123');
  6. Run this code in spark-shell:
import org.apache.doris.spark._
val dorisSparkRDD = sc.dorisRDD(
  tableIdentifier = Some("mongo_doris.data_sync_test"),
  cfg = Some(Map(
    "doris.fenodes" -> "127.0.0.1:8030",
    "doris.request.auth.user" -> "root",
    "doris.request.auth.password" -> ""
  ))
)
dorisSparkRDD.collect()
  • mongo_doris: Doris database name
  • data_sync_test: Doris table name
  • doris.fenodes: Doris FE IP:http_port
  • doris.request.auth.user: Doris user name
  • doris.request.auth.password: Doris password
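
For completeness, the same read can also be done with the DataFrame API in Scala. A minimal sketch, reusing the options shown in the examples above:

// Scala DataFrame read, mirroring the RDD example above (sketch).
val dorisSparkDF = spark.read.format("doris")
  .option("doris.table.identifier", "mongo_doris.data_sync_test")
  .option("doris.fenodes", "127.0.0.1:8030")
  .option("user", "root")
  .option("password", "")
  .load()
dorisSparkDF.show(5)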
  7. If Spark runs in cluster mode, upload the jar to HDFS and add the doris-spark-connector jar's HDFS URL to spark.yarn.jars.
spark.yarn.jars=hdfs:///spark-jars/doris-spark-connector-3.1.2-2.12-1.0.0.jar

Link: apache/doris#9486

  8. In PySpark, run this code in the pyspark shell:
dorisSparkDF = spark.read.format("doris") \
    .option("doris.table.identifier", "mongo_doris.data_sync_test") \
    .option("doris.fenodes", "127.0.0.1:8030") \
    .option("user", "root") \
    .option("password", "") \
    .load()
# show 5 rows of data
dorisSparkDF.show(5)

Type conversion for writing to Doris using Arrow

Doris type    Spark type
BOOLEAN BooleanType
TINYINT ByteType
SMALLINT ShortType
INT IntegerType
BIGINT LongType
LARGEINT StringType
FLOAT FloatType
DOUBLE DoubleType
DECIMAL(M,D) DecimalType(M,D)
DATE DateType
DATETIME TimestampType
CHAR(L) StringType
VARCHAR(L) StringType
STRING StringType
ARRAY ARRAY
MAP MAP
STRUCT STRUCT
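
To illustrate the mapping above, here is a minimal hedged write sketch against the data_sync_test_simple table created earlier; all four columns are VARCHAR, so the Spark side uses StringType, and the connection options mirror the read examples in this QuickStart:

// Write sketch (run in spark-shell with the connector jar on the classpath;
// spark.implicits._ is already imported there, enabling toDF).
val writeDF = Seq(("2", "2", "bob", "456"))
  .toDF("_id", "id", "user_name", "member_list") // StringType -> VARCHAR
writeDF.write.format("doris")
  .option("doris.table.identifier", "mongo_doris.data_sync_test_simple")
  .option("doris.fenodes", "127.0.0.1:8030")
  .option("user", "root")
  .option("password", "")
  .save()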

Report issues or submit pull request

If you find any bugs, feel free to file a GitHub issue or fix it by submitting a pull request.

Contact Us

Contact us through the following mailing list.

Name                    Scope
dev@doris.apache.org    Development-related discussions (Subscribe / Unsubscribe / Archives)


doris-spark-connector's People

Contributors

aiden-dong, bowenliang123, caoliang-web, chncaesar, chovy-3012, codecooker17, dependabot[bot], dongliang-0, gnehil, hf200012, jnsimba, kyofin, lexluo09, lide-reed, morningman, morningman-cmy, qidaye, shoothzj, smallhibiscus, timelxy, tinkerrrr, vinson0526, wolfboys, wunan1210, wuwenchi, yagagagaga, yangzhg, youngwb, zhaorongsheng, zhenhb


doris-spark-connector's Issues

[Feature] Support Spark 3.2 compilation

Search before asking

  • I had searched in the issues and found no similar issues.

Description

No response

Use case

No response

Related issues

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

Spark Doris Connector Release Note 1.3.0

Feature & Improvement

  1. Support Spark 3.3 and 3.4
  2. DorisWriter write memory optimization #140
  3. Support reading and writing Map and Struct types, and writing Array types
  4. Support adding hidden delimiters when writing in CSV format
  5. Writes support two-phase commit #122 (see the sketch after this list)
  6. Add an auto-redirect configuration to write data through a direct connection to FE
  7. Writes support Overwrite mode
  8. Structured Streaming supports writing Row-format DataFrames to Doris
  9. Optimize some log output
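
For item 5, a hedged sketch of enabling two-phase commit on write. The option name doris.sink.enable-2pc is an assumption based on the connector documentation rather than this page, so verify it against your connector version:

// Sketch only: doris.sink.enable-2pc is assumed, not confirmed here.
df.write.format("doris")
  .option("doris.table.identifier", "db.tbl")   // placeholder table
  .option("doris.fenodes", "fe_host:8030")      // placeholder host
  .option("user", "root")
  .option("password", "")
  .option("doris.sink.enable-2pc", "true")      // two-phase commit (#122)
  .save()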

Bug

  1. Fix the issue where doris.filter.query pushdown does not take effect in some scenarios
  2. Fix a null pointer when writing String-type data
  3. Fix a Structured Streaming write exception

Thanks

@CodeCooker17
@daikon12
@gnehil
@huanccwang
@JNSimba
@shoothzj
@wolfboys

[Enhancement] Optimize log when flush data to Doris

Search before asking

  • I had searched in the issues and found no similar issues.

Description

When the Spark connector flushes data to Doris, it retries on flush errors.
A warn-level log is emitted whenever a backend returns an error, which can cause unnecessary concern among users.

The log is like below

22/04/29 13:10:03 WARN DorisSourceProvider: Failed to load data on BE: http://xxx:8040/api/xxx/xxx/_stream_load? node
22/04/29 13:10:03 WARN DorisSourceProvider: Failed to load data on BE: http://xxx:8040/api/xxx/xxx/_stream_load? node
22/04/29 13:10:03 WARN DorisSourceProvider: Failed to load data on BE: http://xxx:8040/api/xxx/xxx/_stream_load? node

Solution

Change log level from warn to debug.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Bug] doris-2.0.1 partial_columns update error

Search before asking

  • I had searched in the issues and found no similar issues.

Version

doris:2.0.1
spark connector:spark-doris-connector-3.2_2.12

What's Wrong?

There is an error when I use partial_columns update: "Partial update should include all key columns, missing: end_sys_imp_date" (screenshots omitted).

What You Expected?

partial_columns should be updated successfully.
It works when using CSV with Stream Load directly.

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Bug] Cannot convert DATETIME-type data to a date format

Search before asking

  • I had searched in the issues and found no similar issues.

Version

1.3.1

What's Wrong?

doris-spark-connector cannot convert DATETIME-type data from a Doris table into a Java Date or Timestamp type in Spark. At present it is uniformly converted to String, which is very inconvenient to use!

What You Expected?

DATETIME types in Doris should be converted to Java Date or Timestamp types.

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Enhancement] Add a parameter that controls the number of StreamLoad tasks committed per partition

Search before asking

  • I had searched in the issues and found no similar issues.

Description

If the amount of data in a partition is greater than INSERT_BATCH_SIZE, each task commits multiple Stream Load jobs. If the task fails and is retried, all data in the partition is recommitted via Stream Load, including the data that was previously written successfully, so data duplication occurs.

Solution

My suggestion is to add a parameter that, when enabled, forces each partition to commit only one Stream Load, ensuring data is not committed twice.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Bug] Cannot sink a Spark array to Doris via the connector

Search before asking

  • I had searched in the issues and found no similar issues.

Version

org.apache.doris:spark-doris-connector-3.2_2.12:1.2.0

What's Wrong?

Cannot write an array to Doris.

What You Expected?

The array should be written to the Doris table, but null is written instead.

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Bug] https://repo1.maven.org/maven2/org/apache/doris/spark-doris-connector-3.2_2.12/1.3.0/spark-doris-connector-3.2_2.12-1.3.0.pom: expected='1.3.0' found='1.3.0-SNAPSHOT'

Search before asking

  • I had searched in the issues and found no similar issues.

Version

Both v1.3.0 and v1.2.0 have this problem.

What's Wrong?

The pom file in the repository has not been updated:
<thrift-service.version>1.0.0</thrift-service.version>
<netty.version>4.1.77.Final</netty.version>
<arrow.version>5.0.0</arrow.version>
<spark.major.version>3.1</spark.major.version>
<libthrift.version>0.16.0</libthrift.version>
<project.scm.id>github</project.scm.id>
<fasterxml.jackson.version>2.10.5</fasterxml.jackson.version>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
1.2.0-SNAPSHOT
<scala.version>2.12</scala.version>
<spark.version>3.1.2</spark.version>

What You Expected?

Please fix this.

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Bug] spark doris connector read table error: Doris FE's response cannot map to schema.

Search before asking

  • I had searched in the issues and found no similar issues.

Version

  • connector : org.apache.doris:spark-doris-connector-3.1_2.12:1.0.1
  • doris: 1.1 preview2
  • spark: 3.1.2

What's Wrong?

Read a table

from pyspark.sql import SparkSession
spark = SparkSession.builder \
 .appName('Spark Doris Demo Nick') \
 .config('org.apache.doris:spark-doris-connector-3.1_2.12:1.0.1') \
 .getOrCreate()
spark

dorisSparkDF = spark.read.format("doris")\
    .option("doris.table.identifier", "db.token_info")\
    .option("doris.fenodes", "xxx:8031")\
    .option("user", "xxx")\
    .option("password", "xxx").load()
dorisSparkDF.show(5)

Then I get an error:

22/06/23 07:47:03 ERROR SchemaUtils: Doris FE's response cannot map to schema. res: {"keysType":"UNIQUE_KEYS","properties":[{"name":"chain","aggregation_type":"","comment":"","type":"STRING"},{"name":"token_slug","aggregation_type":"","comment":"","type":"STRING"},{"name":"token_address","aggregation_type":"REPLACE","comment":"","type":"STRING"},{"name":"token_symbol","aggregation_type":"REPLACE","comment":"","type":"STRING"},{"name":"decimals","aggregation_type":"REPLACE","comment":"","type":"INT"},{"name":"type","aggregation_type":"REPLACE","comment":"","type":"STRING"},{"name":"token_type","aggregation_type":"REPLACE","comment":"","type":"STRING"},{"name":"protocol_slug","aggregation_type":"REPLACE","comment":"","type":"STRING"},{"name":"manual_slug","aggregation_type":"REPLACE","comment":"","type":"STRING"},{"name":"erc20_slug","aggregation_type":"REPLACE","comment":"","type":"STRING"},{"name":"coin_gecko_slug","aggregation_type":"REPLACE","comment":"","type":"STRING"},{"name":"logo","aggregation_type":"REPLACE","comment":"","type":"STRING"}],"status":200}
org.codehaus.jackson.map.exc.UnrecognizedPropertyException: Unrecognized field "keysType" (Class org.apache.doris.spark.rest.models.Schema), not marked as ignorable
 at [Source: java.io.StringReader@74af102e; line: 1, column: 14] (through reference chain: org.apache.doris.spark.rest.models.Schema["keysType"])
	at org.codehaus.jackson.map.exc.UnrecognizedPropertyException.from(UnrecognizedPropertyException.java:53)
	at org.codehaus.jackson.map.deser.StdDeserializationContext.unknownFieldException(StdDeserializationContext.java:267)
	at org.codehaus.jackson.map.deser.std.StdDeserializer.reportUnknownProperty(StdDeserializer.java:673)
	at org.codehaus.jackson.map.deser.std.StdDeserializer.handleUnknownProperty(StdDeserializer.java:659)
	at org.codehaus.jackson.map.deser.BeanDeserializer.handleUnknownProperty(BeanDeserializer.java:1365)
	at org.codehaus.jackson.map.deser.BeanDeserializer._handleUnknown(BeanDeserializer.java:725)
	at org.codehaus.jackson.map.deser.BeanDeserializer.deserializeFromObject(BeanDeserializer.java:703)
	at org.codehaus.jackson.map.deser.BeanDeserializer.deserialize(BeanDeserializer.java:580)
	at org.codehaus.jackson.map.ObjectMapper._readMapAndClose(ObjectMapper.java:2732)
	at org.codehaus.jackson.map.ObjectMapper.readValue(ObjectMapper.java:1863)
	at org.apache.doris.spark.rest.RestService.parseSchema(RestService.java:295)
	at org.apache.doris.spark.rest.RestService.getSchema(RestService.java:279)
	at org.apache.doris.spark.sql.SchemaUtils$.discoverSchemaFromFe(SchemaUtils.scala:51)
	at org.apache.doris.spark.sql.SchemaUtils$.discoverSchema(SchemaUtils.scala:41)
	at org.apache.doris.spark.sql.DorisRelation.lazySchema$lzycompute(DorisRelation.scala:48)
	at org.apache.doris.spark.sql.DorisRelation.lazySchema(DorisRelation.scala:48)
	at org.apache.doris.spark.sql.DorisRelation.schema(DorisRelation.scala:52)
	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:449)
	at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:325)
	at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:307)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:307)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:225)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:750)
---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
Input In [3], in <cell line: 1>()
----> 1 dorisSparkDF = spark.read.format("doris")\
      2     .option("doris.table.identifier", "xxx.token_info")\
      3     .option("doris.fenodes", "xxxx:8031")\
      4     .option("user", "xxxx")\
      5     .option("password", "xxxxx").load()
      6 dorisSparkDF.show(5)

File /usr/lib/spark/python/pyspark/sql/readwriter.py:210, in DataFrameReader.load(self, path, format, schema, **options)
    208     return self._df(self._jreader.load(self._spark._sc._jvm.PythonUtils.toSeq(path)))
    209 else:
--> 210     return self._df(self._jreader.load())

File /opt/conda/miniconda3/lib/python3.8/site-packages/py4j/java_gateway.py:1304, in JavaMember.__call__(self, *args)
   1298 command = proto.CALL_COMMAND_NAME +\
   1299     self.command_header +\
   1300     args_command +\
   1301     proto.END_COMMAND_PART
   1303 answer = self.gateway_client.send_command(command)
-> 1304 return_value = get_return_value(
   1305     answer, self.gateway_client, self.target_id, self.name)
   1307 for temp_arg in temp_args:
   1308     temp_arg._detach()

File /usr/lib/spark/python/pyspark/sql/utils.py:111, in capture_sql_exception.<locals>.deco(*a, **kw)
    109 def deco(*a, **kw):
    110     try:
--> 111         return f(*a, **kw)
    112     except py4j.protocol.Py4JJavaError as e:
    113         converted = convert_exception(e.java_exception)

File /opt/conda/miniconda3/lib/python3.8/site-packages/py4j/protocol.py:326, in get_return_value(answer, gateway_client, target_id, name)
    324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
    325 if answer[1] == REFERENCE_TYPE:
--> 326     raise Py4JJavaError(
    327         "An error occurred while calling {0}{1}{2}.\n".
    328         format(target_id, ".", name), value)
    329 else:
    330     raise Py4JError(
    331         "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
    332         format(target_id, ".", name, value))

Py4JJavaError: An error occurred while calling o72.load.
: org.apache.doris.spark.exception.DorisException: Doris FE's response cannot map to schema. res: {"keysType":"UNIQUE_KEYS","properties":[{"name":"chain","aggregation_type":"","comment":"","type":"STRING"},{"name":"token_slug","aggregation_type":"","comment":"","type":"STRING"},{"name":"token_address","aggregation_type":"REPLACE","comment":"","type":"STRING"},{"name":"token_symbol","aggregation_type":"REPLACE","comment":"","type":"STRING"},{"name":"decimals","aggregation_type":"REPLACE","comment":"","type":"INT"},{"name":"type","aggregation_type":"REPLACE","comment":"","type":"STRING"},{"name":"token_type","aggregation_type":"REPLACE","comment":"","type":"STRING"},{"name":"protocol_slug","aggregation_type":"REPLACE","comment":"","type":"STRING"},{"name":"manual_slug","aggregation_type":"REPLACE","comment":"","type":"STRING"},{"name":"erc20_slug","aggregation_type":"REPLACE","comment":"","type":"STRING"},{"name":"coin_gecko_slug","aggregation_type":"REPLACE","comment":"","type":"STRING"},{"name":"logo","aggregation_type":"REPLACE","comment":"","type":"STRING"}],"status":200}
	at org.apache.doris.spark.rest.RestService.parseSchema(RestService.java:303)
	at org.apache.doris.spark.rest.RestService.getSchema(RestService.java:279)
	at org.apache.doris.spark.sql.SchemaUtils$.discoverSchemaFromFe(SchemaUtils.scala:51)
	at org.apache.doris.spark.sql.SchemaUtils$.discoverSchema(SchemaUtils.scala:41)
	at org.apache.doris.spark.sql.DorisRelation.lazySchema$lzycompute(DorisRelation.scala:48)
	at org.apache.doris.spark.sql.DorisRelation.lazySchema(DorisRelation.scala:48)
	at org.apache.doris.spark.sql.DorisRelation.schema(DorisRelation.scala:52)
	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:449)
	at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:325)
	at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:307)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:307)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:225)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:750)
Caused by: org.codehaus.jackson.map.exc.UnrecognizedPropertyException: Unrecognized field "keysType" (Class org.apache.doris.spark.rest.models.Schema), not marked as ignorable
 at [Source: java.io.StringReader@74af102e; line: 1, column: 14] (through reference chain: org.apache.doris.spark.rest.models.Schema["keysType"])
	at org.codehaus.jackson.map.exc.UnrecognizedPropertyException.from(UnrecognizedPropertyException.java:53)
	at org.codehaus.jackson.map.deser.StdDeserializationContext.unknownFieldException(StdDeserializationContext.java:267)
	at org.codehaus.jackson.map.deser.std.StdDeserializer.reportUnknownProperty(StdDeserializer.java:673)
	at org.codehaus.jackson.map.deser.std.StdDeserializer.handleUnknownProperty(StdDeserializer.java:659)
	at org.codehaus.jackson.map.deser.BeanDeserializer.handleUnknownProperty(BeanDeserializer.java:1365)
	at org.codehaus.jackson.map.deser.BeanDeserializer._handleUnknown(BeanDeserializer.java:725)
	at org.codehaus.jackson.map.deser.BeanDeserializer.deserializeFromObject(BeanDeserializer.java:703)
	at org.codehaus.jackson.map.deser.BeanDeserializer.deserialize(BeanDeserializer.java:580)
	at org.codehaus.jackson.map.ObjectMapper._readMapAndClose(ObjectMapper.java:2732)
	at org.codehaus.jackson.map.ObjectMapper.readValue(ObjectMapper.java:1863)
	at org.apache.doris.spark.rest.RestService.parseSchema(RestService.java:295)
	... 23 more

What You Expected?

There should be no errors

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Bug] spark stream dataframe write to doris json parsing exception

Search before asking

  • I had searched in the issues and found no similar issues.

Version

master

What's Wrong?

Writing a Spark stream to Doris raises the following exception:

com.fasterxml.jackson.core.JsonParseException: Unexpected character ('-' (code 45)): Expected space separating root-level values
 at [Source: (String)"2022-08-23 12:09:22.706"; line: 1, column: 6]
	at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1840)
	at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:712)
	at com.fasterxml.jackson.core.base.ParserMinimalBase._reportUnexpectedChar(ParserMinimalBase.java:637)
	at com.fasterxml.jackson.core.base.ParserMinimalBase._reportMissingRootWS(ParserMinimalBase.java:684)
	at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._verifyRootSpace(ReaderBasedJsonParser.java:1678)
	at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._parsePosNumber(ReaderBasedJsonParser.java:1321)
	at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.nextToken(ReaderBasedJsonParser.java:769)
	at com.fasterxml.jackson.databind.ObjectMapper._readTreeAndClose(ObjectMapper.java:4231)
	at com.fasterxml.jackson.databind.ObjectMapper.readTree(ObjectMapper.java:2711)
	at org.apache.doris.spark.sql.DorisStreamLoadSink.$anonfun$write$3(DorisStreamLoadSink.scala:58)
	at org.apache.doris.spark.sql.DorisStreamLoadSink.$anonfun$write$3$adapted(DorisStreamLoadSink.scala:56)
	at scala.collection.immutable.Range.foreach(Range.scala:156)
	at org.apache.doris.spark.sql.DorisStreamLoadSink.$anonfun$write$2(DorisStreamLoadSink.scala:56)
	at org.apache.doris.spark.sql.DorisStreamLoadSink.$anonfun$write$2$adapted(DorisStreamLoadSink.scala:54)
	at scala.collection.Iterator.foreach(Iterator.scala:944)
	at scala.collection.Iterator.foreach$(Iterator.scala:944)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.foreach(WholeStageCodegenExec.scala:753)
	at org.apache.doris.spark.sql.DorisStreamLoadSink.$anonfun$write$1(DorisStreamLoadSink.scala:54)
	at org.apache.doris.spark.sql.DorisStreamLoadSink.$anonfun$write$1$adapted(DorisStreamLoadSink.scala:51)
	at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$2(RDD.scala:1020)
	at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$2$adapted(RDD.scala:1020)
	at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2236)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:131)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

What You Expected?

doris-spark-connector should be able to write a streaming DataFrame to Doris.

How to Reproduce?

For example:

    df.selectExpr("CAST(timestamp AS STRING)", "CAST(partition as INT)")
      .writeStream
      .format("doris")
      .option("checkpointLocation", "/tmp/test")
      .option("doris.table.identifier", dorisTable)
      .option("doris.fenodes", dorisFeNodes)
      .option("user", dorisUser)
      .option("password", dorisPwd)
      .start().awaitTermination()
    spark.stop()
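
Since the snippet above assumes an already-defined df, here is a hedged, self-contained variant using Spark's built-in rate source (the table identifier and FE host are placeholders, and the columns differ from the original report):

// Self-contained sketch: the rate source provides timestamp/value columns.
val df = spark.readStream.format("rate").option("rowsPerSecond", "1").load()
df.selectExpr("CAST(timestamp AS STRING)", "CAST(value AS INT)")
  .writeStream
  .format("doris")
  .option("checkpointLocation", "/tmp/test")
  .option("doris.table.identifier", "db.tbl") // placeholder
  .option("doris.fenodes", "fe_host:8030")    // placeholder
  .option("user", "root")
  .option("password", "")
  .start().awaitTermination()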

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Bug] Deserialization failed in the RestService::getSchema method

Search before asking

  • I had searched in the issues and found no similar issues.

Version

Doris: 1.0rc
spark-connector: spark-doris-connector-3.1_2.12 v1.0.1

What's Wrong?

When Spark builds the scan for DorisRelation, it invokes RestService::getSchema. Because the Schema definition lacks the 'keysType' field, which was added to the HTTP interface in 2022.1.3, deserialization fails and an exception is thrown.

What You Expected?

The HTTP response string should be deserialized normally.

How to Reproduce?

Just use the Spark connector with the newest version of Doris and submit a query job with it.

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Bug] The where condition push-down overrides the doris.filter.query filter condition

Search before asking

  • I had searched in the issues and found no similar issues.

Version

1.1.1

What's Wrong?

Read the table:

df_origin = spark.read.format("doris")\
    .option("doris.table.identifier", "db.nft_transactions")\
    .option("doris.fenodes", "")\
    .option("user", "root")\
    .option("password", "password") \
    .option("doris.filter.query", "block_timestamp >= '2022-06-01' and block_timestamp < '2022-06-03'") \
    .option("doris.read.field", "block_timestamp,marketplace_slug") \
    .option("doris.batch.size", 40000) \
    .load() 
df_origin.show()

output:

+----------------+-------------------+
|marketplace_slug| block_timestamp|
+----------------+-------------------+
| aavegotchi|2022-06-01 00:15:02|
| aavegotchi|2022-06-01 00:15:14|
| aavegotchi|2022-06-01 00:15:26|
| aavegotchi|2022-06-01 00:15:38|
| aavegotchi|2022-06-01 00:18:50|
| aavegotchi|2022-06-01 00:20:26|
| aavegotchi|2022-06-01 00:21:10|

doris connector log

22/08/25 03:08:47 DEBUG org.apache.doris.spark.sql.ScalaDorisRowRDD: Query SQL Sending to Doris FE is: 'select marketplace_slug,block_timestamp from db.nft_transactions where block_timestamp >= '2022-06-01' and block_timestamp < '2022-06-03''.

On top of this, we apply a further where filter:

df_origin.where("marketplace_slug = 'opensea'").show()

The Doris connector pushes this where condition down to Doris.

doris connector log

Query SQL Sending to Doris FE is: 'select marketplace_slug,block_timestamp from gaia_data__origin_data.nft_transactions where (marketplace_slug is not null) and (marketplace_slug = 'opensea')'.

As you can see, the condition pushed down to Doris ignores the earlier block_timestamp filter, so the final query result is:

+----------------+-------------------+
|marketplace_slug| block_timestamp|
+----------------+-------------------+
| opensea|2019-08-01 00:06:57|
| opensea|2019-08-01 00:17:42|
| opensea|2019-08-01 00:19:20|
| opensea|2019-08-01 00:38:02|
| opensea|2019-08-01 00:47:38|
| opensea|2019-08-01 00:59:39|

Data from dates we did not expect appears in the result.

What You Expected?

  1. The where-condition pushdown should be combined with the doris.filter.query filter condition; this requires parsing SQL, which is somewhat troublesome.
  2. Provide an option to disable where-condition pushdown and let Spark perform the where filtering itself (a hedged workaround sketch follows).
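
Until that is fixed, a hedged workaround sketch (not a connector feature): fold every predicate into doris.filter.query yourself, so nothing is lost when the where-clause pushdown replaces it. The FE host below is a placeholder:

// Workaround sketch: put all predicates into doris.filter.query.
val df_combined = spark.read.format("doris")
  .option("doris.table.identifier", "db.nft_transactions")
  .option("doris.fenodes", "fe_host:8030") // placeholder
  .option("user", "root")
  .option("password", "password")
  .option("doris.filter.query",
    "block_timestamp >= '2022-06-01' and block_timestamp < '2022-06-03'" +
    " and marketplace_slug = 'opensea'")
  .load()
df_combined.show()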

How to Reproduce?

See above.

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

Spark Doris Connector Release Note 1.2.0

Feature & Improvement

  1. Compile script refactoring and optimization
  2. Add CSV format import
  3. Support pushdown of doris.filter.query in Spark SQL
  4. Support the Doris datev2/datetimev2/decimalv3/jsonb/array types
  5. Remove the thrift dependency and introduce the thrift SDK
  6. Support setting the import interval
  7. Write-path code refactoring and optimization
  8. Optimize error log output

Thanks

Thanks to everyone who has contributed to this release:
@bowenliang123
@caoliang-web
@chncaesar
@chovy-3012
@DongLiang-0
@gnehil
@JNSimba
@LemonLiTree
@lexluo09
@MrZHui888
@myfjdthink
@smallhibiscus
@timelxy
@wolfboys
@yagagagaga

[Bug] column_separator and line_delimiter do not support invisible characters

Search before asking

  • I had searched in the issues and found no similar issues.

Version

Doris 1.2.3
doris-spark-connector: master

What's Wrong?

PySpark 3.1.3

df.write.format("doris") \
        .option("doris.table.identifier", "") \
        .option("doris.fenodes", "") \
        .option("user", "") \
        .option("password", "") \
        .option("doris.write.fields", ",".join(df.columns)) \
        .option("doris.sink.batch.size", "20000") \
        .option("doris.sink.max-retries", "3") \
        .option("doris.sink.properties.format", "csv") \
        .option("sink.properties.column_separator", "\\x01") \
        .option("sink.properties.line_delimiter", "\\x07") \
        .option("doris.sink.batch.interval.ms", "200") \
        .save()  # trigger the write

Stream Load in CSV format fails with the invisible characters.

What You Expected?

Handle it like Doris FE's org.apache.doris.analysis.Separator: convert \x01 and other invisible characters.
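
A hedged sketch of the requested escape handling in Scala: decode sequences like "\x01" into their raw characters before building the Stream Load request, similar to what org.apache.doris.analysis.Separator does on the FE side (the helper below is hypothetical, not part of the connector):

// Hypothetical helper: turn "\\x01"-style escapes into real characters.
def unescapeHex(s: String): String = {
  val hexEscape = """\\x([0-9a-fA-F]{2})""".r
  hexEscape.replaceAllIn(s, m => Integer.parseInt(m.group(1), 16).toChar.toString)
}

unescapeHex("\\x01") // returns the single invisible character '\u0001'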

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Bug] Unrecognized Doris type DATETIMEV2

Search before asking

  • I had searched in the issues and found no similar issues.

Version

spark-doris-connector-2.3_2.11 1.1.0

What's Wrong?

Exception in thread "main" org.apache.doris.spark.exception.DorisException: Unrecognized Doris type DATETIMEV2

What You Expected?

DATETIMEV2 should be supported.

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Bug] can not write data to doris0.12

Search before asking

  • I had searched in the issues and found no similar issues.

Version

doris version:0.12
spark-doris-connector-3.1_2.12 version:1.0.1

What's Wrong?

When I use the spark-doris-connector to write a DataFrame to Doris 0.12, an error occurs ("Connect to doris http://xx:8030/api/backends?is_alive=true failed."), as shown in the screenshot (omitted).

I guess this is because doris-0.12 has no such interface ("api/backends"), but the official website documentation says that 0.12+ is supported.

Hope to get a reply, thank you.

What You Expected?

Support Doris 0.12+ or update the official website documentation.

How to Reproduce?

Use the official example to reproduce.

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Enhancement] build script improvement

Search before asking

  • I had searched in the issues and found no similar issues.

Description

Currently, the environment checks in the build.sh script (thrift, java, maven, etc.) need to be optimized.

  1. thrift needs to verify that the version is 0.13.0
  2. mvnw support (if maven is not installed)
  3. the java environment check needs improvement

Solution

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Bug] Doris V2 data types are not supported

Search before asking

  • I had searched in the issues and found no similar issues.

Version

spark-doris-connector-3.2_2.12
doris 1.2.3

What's Wrong?

org.apache.doris.spark.exception.DorisException: Unrecognized Doris type DATEV2

What You Expected?

The new Doris data types should be supported.

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Feature] support spark catalog

Search before asking

  • I had searched in the issues and found no similar issues.

Description

support spark catalog @hf200012

Use case

No response

Related issues

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Bug] Unable to correctly recognize time partition fields

Search before asking

  • I had searched in the issues and found no similar issues.

Version

doris-spark-connector:1.3.0-1.3.2
doris:2.0
hive:3.1.3
hadoop:3.3.4
spark:3.3.1

What's Wrong?

spark-sql (default)> CREATE TEMPORARY VIEW dwd_test
> USING doris
> OPTIONS(
> 'table.identifier'='dw_dwd.dwd_test',
> 'fenodes'='xxx:8030',
> 'user'='xxxx',
> 'password'='xxx',
> 'sink.properties.format' = 'json'
> );
Response code
Time taken: 3.393 seconds
spark-sql (default)> select * from dwd_test where dt ='2024-01-02' limit 3;
14:07:18.625 [main] ERROR org.apache.doris.spark.sql.ScalaDorisRowRDD - Doris FE's response cannot map to schema. res: {"exception":"errCode = 2, detailMessage = Incorrect datetime value: CAST(2021 AS DATETIME) in expression: (CAST(dt AS DATETIME) = CAST(2021 AS DATETIME))","status":400}
org.apache.doris.shaded.com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException: Unrecognized field "exception" (class org.apache.doris.spark.rest.models.QueryPlan), not marked as ignorable (3 known properties: "partitions", "status", "opaqued_query_plan"])
at [Source: (String)"{"exception":"errCode = 2, detailMessage = Incorrect datetime value: CAST(2021 AS DATETIME) in expression: (CAST(dt AS DATETIME) = CAST(2021 AS DATETIME))","status":400}"; line: 1, column: 15] (through reference chain: org.apache.doris.spark.rest.models.QueryPlan["exception"])
at org.apache.doris.shaded.com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException.from(UnrecognizedPropertyException.java:61) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.shaded.com.fasterxml.jackson.databind.DeserializationContext.handleUnknownProperty(DeserializationContext.java:1127) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.shaded.com.fasterxml.jackson.databind.deser.std.StdDeserializer.handleUnknownProperty(StdDeserializer.java:2036) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.shaded.com.fasterxml.jackson.databind.deser.BeanDeserializerBase.handleUnknownProperty(BeanDeserializerBase.java:1700) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.shaded.com.fasterxml.jackson.databind.deser.BeanDeserializerBase.handleUnknownVanilla(BeanDeserializerBase.java:1678) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.shaded.com.fasterxml.jackson.databind.deser.BeanDeserializer.vanillaDeserialize(BeanDeserializer.java:320) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.shaded.com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:177) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.shaded.com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:323) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.shaded.com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4674) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.shaded.com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3629) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.shaded.com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3597) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.spark.rest.RestService.getQueryPlan(RestService.java:284) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.spark.rest.RestService.findPartitions(RestService.java:261) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.spark.rdd.AbstractDorisRDD.dorisPartitions$lzycompute(AbstractDorisRDD.scala:58) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.spark.rdd.AbstractDorisRDD.dorisPartitions(AbstractDorisRDD.scala:57) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.spark.rdd.AbstractDorisRDD.getPartitions(AbstractDorisRDD.scala:35) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:292) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at scala.Option.getOrElse(Option.scala:189) ~[scala-library-2.12.15.jar:?]
at org.apache.spark.rdd.RDD.partitions(RDD.scala:288) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:292) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at scala.Option.getOrElse(Option.scala:189) ~[scala-library-2.12.15.jar:?]
at org.apache.spark.rdd.RDD.partitions(RDD.scala:288) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:292) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at scala.Option.getOrElse(Option.scala:189) ~[scala-library-2.12.15.jar:?]
at org.apache.spark.rdd.RDD.partitions(RDD.scala:288) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:292) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at scala.Option.getOrElse(Option.scala:189) ~[scala-library-2.12.15.jar:?]
at org.apache.spark.rdd.RDD.partitions(RDD.scala:288) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:476) ~[spark-sql_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:459) ~[spark-sql_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:48) ~[spark-sql_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:451) ~[spark-sql_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.execution.HiveResult$.hiveResultString(HiveResult.scala:76) ~[spark-sql_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.$anonfun$run$2(SparkSQLDriver.scala:69) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:109) ~[spark-sql_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:169) ~[spark-sql_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:95) ~[spark-sql_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779) ~[spark-sql_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) ~[spark-sql_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:69) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:384) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1(SparkSQLCLIDriver.scala:504) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1$adapted(SparkSQLCLIDriver.scala:498) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
at scala.collection.Iterator.foreach(Iterator.scala:943) ~[scala-library-2.12.15.jar:?]
at scala.collection.Iterator.foreach$(Iterator.scala:943) ~[scala-library-2.12.15.jar:?]
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) ~[scala-library-2.12.15.jar:?]
at scala.collection.IterableLike.foreach(IterableLike.scala:74) ~[scala-library-2.12.15.jar:?]
at scala.collection.IterableLike.foreach$(IterableLike.scala:73) ~[scala-library-2.12.15.jar:?]
at scala.collection.AbstractIterable.foreach(Iterable.scala:56) ~[scala-library-2.12.15.jar:?]
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processLine(SparkSQLCLIDriver.scala:498) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:286) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_212]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_212]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_212]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_212]
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) ~[spark-core_2.12-3.3.1.jar:3.3.1]
14:07:18.643 [main] ERROR org.apache.spark.sql.hive.thriftserver.SparkSQLDriver - Failed in [select * from dwd_cc_trade_pay_success_di where dt ='2024-01-02' limit 3]
org.apache.doris.spark.exception.DorisException: Doris FE's response cannot map to schema. res: {"exception":"errCode = 2, detailMessage = Incorrect datetime value: CAST(2021 AS DATETIME) in expression: (CAST(dt AS DATETIME) = CAST(2021 AS DATETIME))","status":400}
at org.apache.doris.spark.rest.RestService.getQueryPlan(RestService.java:292) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.spark.rest.RestService.findPartitions(RestService.java:261) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.spark.rdd.AbstractDorisRDD.dorisPartitions$lzycompute(AbstractDorisRDD.scala:58) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.spark.rdd.AbstractDorisRDD.dorisPartitions(AbstractDorisRDD.scala:57) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.spark.rdd.AbstractDorisRDD.getPartitions(AbstractDorisRDD.scala:35) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:292) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at scala.Option.getOrElse(Option.scala:189) ~[scala-library-2.12.15.jar:?]
at org.apache.spark.rdd.RDD.partitions(RDD.scala:288) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:292) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at scala.Option.getOrElse(Option.scala:189) ~[scala-library-2.12.15.jar:?]
at org.apache.spark.rdd.RDD.partitions(RDD.scala:288) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:292) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at scala.Option.getOrElse(Option.scala:189) ~[scala-library-2.12.15.jar:?]
at org.apache.spark.rdd.RDD.partitions(RDD.scala:288) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:292) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at scala.Option.getOrElse(Option.scala:189) ~[scala-library-2.12.15.jar:?]
at org.apache.spark.rdd.RDD.partitions(RDD.scala:288) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:476) ~[spark-sql_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:459) ~[spark-sql_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:48) ~[spark-sql_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:451) ~[spark-sql_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.execution.HiveResult$.hiveResultString(HiveResult.scala:76) ~[spark-sql_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.$anonfun$run$2(SparkSQLDriver.scala:69) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:109) ~[spark-sql_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:169) ~[spark-sql_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:95) ~[spark-sql_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779) ~[spark-sql_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) ~[spark-sql_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:69) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:384) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1(SparkSQLCLIDriver.scala:504) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1$adapted(SparkSQLCLIDriver.scala:498) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
at scala.collection.Iterator.foreach(Iterator.scala:943) ~[scala-library-2.12.15.jar:?]
at scala.collection.Iterator.foreach$(Iterator.scala:943) ~[scala-library-2.12.15.jar:?]
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) ~[scala-library-2.12.15.jar:?]
at scala.collection.IterableLike.foreach(IterableLike.scala:74) ~[scala-library-2.12.15.jar:?]
at scala.collection.IterableLike.foreach$(IterableLike.scala:73) ~[scala-library-2.12.15.jar:?]
at scala.collection.AbstractIterable.foreach(Iterable.scala:56) ~[scala-library-2.12.15.jar:?]
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processLine(SparkSQLCLIDriver.scala:498) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:286) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_212]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_212]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_212]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_212]
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) ~[spark-core_2.12-3.3.1.jar:3.3.1]
Caused by: org.apache.doris.shaded.com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException: Unrecognized field "exception" (class org.apache.doris.spark.rest.models.QueryPlan), not marked as ignorable (3 known properties: "partitions", "status", "opaqued_query_plan"])
at [Source: (String)"{"exception":"errCode = 2, detailMessage = Incorrect datetime value: CAST(2021 AS DATETIME) in expression: (CAST(dt AS DATETIME) = CAST(2021 AS DATETIME))","status":400}"; line: 1, column: 15] (through reference chain: org.apache.doris.spark.rest.models.QueryPlan["exception"])
at org.apache.doris.shaded.com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException.from(UnrecognizedPropertyException.java:61) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.shaded.com.fasterxml.jackson.databind.DeserializationContext.handleUnknownProperty(DeserializationContext.java:1127) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.shaded.com.fasterxml.jackson.databind.deser.std.StdDeserializer.handleUnknownProperty(StdDeserializer.java:2036) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.shaded.com.fasterxml.jackson.databind.deser.BeanDeserializerBase.handleUnknownProperty(BeanDeserializerBase.java:1700) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.shaded.com.fasterxml.jackson.databind.deser.BeanDeserializerBase.handleUnknownVanilla(BeanDeserializerBase.java:1678) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.shaded.com.fasterxml.jackson.databind.deser.BeanDeserializer.vanillaDeserialize(BeanDeserializer.java:320) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.shaded.com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:177) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.shaded.com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:323) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.shaded.com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4674) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.shaded.com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3629) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.shaded.com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3597) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.spark.rest.RestService.getQueryPlan(RestService.java:284) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
... 55 more
org.apache.doris.spark.exception.DorisException: Doris FE's response cannot map to schema. res: {"exception":"errCode = 2, detailMessage = Incorrect datetime value: CAST(2021 AS DATETIME) in expression: (CAST(dt AS DATETIME) = CAST(2021 AS DATETIME))","status":400}
at org.apache.doris.spark.rest.RestService.getQueryPlan(RestService.java:292)
at org.apache.doris.spark.rest.RestService.findPartitions(RestService.java:261)
at org.apache.doris.spark.rdd.AbstractDorisRDD.dorisPartitions$lzycompute(AbstractDorisRDD.scala:58)
at org.apache.doris.spark.rdd.AbstractDorisRDD.dorisPartitions(AbstractDorisRDD.scala:57)
at org.apache.doris.spark.rdd.AbstractDorisRDD.getPartitions(AbstractDorisRDD.scala:35)
at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:292)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:292)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:292)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:292)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:288)
at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:476)
at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:459)
at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:48)
at org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:451)
at org.apache.spark.sql.execution.HiveResult$.hiveResultString(HiveResult.scala:76)
at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.$anonfun$run$2(SparkSQLDriver.scala:69)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:109)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:169)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:95)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:69)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:384)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1(SparkSQLCLIDriver.scala:504)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1$adapted(SparkSQLCLIDriver.scala:498)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
at scala.collection.IterableLike.foreach(IterableLike.scala:74)
at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processLine(SparkSQLCLIDriver.scala:498)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:286)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.doris.shaded.com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException: Unrecognized field "exception" (class org.apache.doris.spark.rest.models.QueryPlan), not marked as ignorable (3 known properties: "partitions", "status", "opaqued_query_plan"])
at [Source: (String)"{"exception":"errCode = 2, detailMessage = Incorrect datetime value: CAST(2021 AS DATETIME) in expression: (CAST(dt AS DATETIME) = CAST(2021 AS DATETIME))","status":400}"; line: 1, column: 15] (through reference chain: org.apache.doris.spark.rest.models.QueryPlan["exception"])
at org.apache.doris.shaded.com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException.from(UnrecognizedPropertyException.java:61)
at org.apache.doris.shaded.com.fasterxml.jackson.databind.DeserializationContext.handleUnknownProperty(DeserializationContext.java:1127)
at org.apache.doris.shaded.com.fasterxml.jackson.databind.deser.std.StdDeserializer.handleUnknownProperty(StdDeserializer.java:2036)
at org.apache.doris.shaded.com.fasterxml.jackson.databind.deser.BeanDeserializerBase.handleUnknownProperty(BeanDeserializerBase.java:1700)
at org.apache.doris.shaded.com.fasterxml.jackson.databind.deser.BeanDeserializerBase.handleUnknownVanilla(BeanDeserializerBase.java:1678)
at org.apache.doris.shaded.com.fasterxml.jackson.databind.deser.BeanDeserializer.vanillaDeserialize(BeanDeserializer.java:320)
at org.apache.doris.shaded.com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:177)
at org.apache.doris.shaded.com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:323)
at org.apache.doris.shaded.com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4674)
at org.apache.doris.shaded.com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3629)
at org.apache.doris.shaded.com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3597)
at org.apache.doris.spark.rest.RestService.getQueryPlan(RestService.java:284)
... 55 more

What You Expected?

I expect to query data from Doris correctly using a date-format filter when the filter field is a partition field.

How to Reproduce?

No response

Anything Else?

I can query data from Doris correctly with a date-format filter when the filter field is not a partition field.
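
A minimal sketch of the two cases being compared, assuming a Doris table whose partition column is a date column named dt (table, host, and column names are placeholders; the rejected predicate appears in the exception above):

val df = spark.read.format("doris")
  .option("doris.table.identifier", "db.partitioned_table")
  .option("doris.fenodes", "127.0.0.1:8030")
  .option("user", "root")
  .option("password", "")
  .load()

// works: date-format filter on a non-partition column
df.filter("create_time = '2021-01-01'").show()

// fails during query-plan generation with "Incorrect datetime value"
// when dt is a partition column, per this report
df.filter("dt = '2021-01-01'").show()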

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Bug] No Doris FE is available when FE uses HTTPS

Search before asking

  • I had searched in the issues and found no similar issues.

Version

doris 1.2.3, master

What's Wrong?

After switching FE to HTTPS, running a job through the Spark connector throws:
java.io.IOException: Failed to get response from Doris

What You Expected?

The Spark connector job should execute successfully.

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Feature] Do not include Scala in the release jar

Search before asking

  • I had searched in the issues and found no similar issues.

Description

In short, Scala should be a provided dependency.

Scala libraries should not be bundled into the release jar, as this can lead to version conflicts in the user's environment and force users into additional shading work.
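
For illustration, a minimal sketch of the requested packaging behavior, assuming an sbt-assembly (1.x) build; the connector itself builds with Maven, where the equivalent is giving scala-library the provided scope:

// build.sbt: keep the Scala runtime out of the assembled jar so the
// Scala version already on the user's classpath is used at runtime
assembly / assemblyOption := (assembly / assemblyOption).value
  .withIncludeScala(false)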

Use case

No response

Related issues

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Bug] Cannot write bitmap columns to Doris via the Spark connector

Search before asking

  • I had searched in the issues and found no similar issues.

Version

org.apache.doris:spark-doris-connector-3.2_2.12:1.3.0

What's Wrong?

Following the official documentation, I write bitmap data in Spark DataFrame mode using the bitmap_from_array function, like this:

df.write.format("doris")
  .option("doris.table.identifier", s"$tableName")
  .option("doris.fenodes", s"$url:$port")
  .option("user", s"$username")
  .option("password", s"$password")
  .option("sink.batch.size", 100)
  .option("sink.max-retries", 3)
  .option("doris.ignore-type", "bitmap")
  .option("doris.deserialize.arrow.async", true)
  .option("doris.deserialize.queue.size", 64)
  .option("sink.properties.column_separator", "|")
  .option("sink.properties.format", "json")
  // other options
  // specify the fields to write
  .option("doris.write.fields", fieldList)
  .save()

where fieldList = "tag_id, UPDATED_TIME, tag_value, user_ids, CREATED_TIME, UPDATED_BY, user_id=bitmap_from_array(user_ids)"
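
For context, bitmap columns are usually built on the Doris side through stream load's column mapping; a minimal sketch of that approach, assuming sink.properties.* options are passed through to stream load as headers (table, host, and field names are placeholders):

df.write.format("doris")
  .option("doris.table.identifier", "db.user_tags")
  .option("doris.fenodes", "127.0.0.1:8030")
  .option("user", "root")
  .option("password", "")
  // hypothetical mapping: ship the raw array column from Spark and let
  // Doris derive the bitmap column during load
  .option("sink.properties.columns", "tag_id,user_ids,user_id=bitmap_from_array(user_ids)")
  .save()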

What You Expected?

The connector should support write-side functions such as bitmap_from_array.

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Bug] Loading data with doris-spark-connector-3.3_2.12.jar, the parallelism of the repartition stage stays at 1, so the partitions never finish

Search before asking

  • I had searched in the issues and found no similar issues.

Version

version: spark 3.3.1

What's Wrong?

My repartition is set to 10, but the repartition stage keeps running with a task parallelism of 1, so the partitions never finish. Tracing through the source code, I would like to know whether this needs some extra configuration on Spark 3.3, or whether some operation on the DataFrame before the repartition drops the parallelism to 1. Any guidance would be appreciated.

What You Expected?

I would like to know whether Spark 3.3 needs a specific configuration. From the stage view, is it the deserializeToObject step that drops the parallelism to 1, and which Spark parameter prevents this?

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Bug] CREATE TEMPORARY VIEW USING doris fails with an error

Search before asking

  • I had searched in the issues and found no similar issues.

Version

Doris 1.0 / Spark-Doris-Connector master / Spark 3.2.1

What's Wrong?

CREATE TEMPORARY VIEW spark_doris
USING doris
OPTIONS(
  "table.identifier"="db_01.datax",
  "fenodes"="10.0.105.243:8030",
  "user"="doris",
  "password"="�Doris"
);
[Code: 0, SQL State: ]  Error operating EXECUTE_STATEMENT: org.apache.doris.spark.exception.DorisException: Doris FE's response cannot map to schema. res: "Access denied for default_cluster:[email protected]"
	at org.apache.doris.spark.rest.RestService.parseSchema(RestService.java:303)
	at org.apache.doris.spark.rest.RestService.getSchema(RestService.java:279)
	at org.apache.doris.spark.sql.SchemaUtils$.discoverSchemaFromFe(SchemaUtils.scala:53)
	at org.apache.doris.spark.sql.SchemaUtils$.discoverSchema(SchemaUtils.scala:43)
	at org.apache.doris.spark.sql.DorisRelation.lazySchema$lzycompute(DorisRelation.scala:48)
	at org.apache.doris.spark.sql.DorisRelation.lazySchema(DorisRelation.scala:48)
	at org.apache.doris.spark.sql.DorisRelation.schema(DorisRelation.scala:52)
	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:440)
	at org.apache.spark.sql.execution.datasources.CreateTempViewUsing.run(ddl.scala:98)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:110)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:110)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:106)
	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:457)
	at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:106)
	at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:93)
	at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:91)
	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:219)
	at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
	at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:618)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613)
	at org.apache.kyuubi.engine.spark.operation.ExecuteStatement.$anonfun$executeStatement$1(ExecuteStatement.scala:100)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at org.apache.kyuubi.engine.spark.operation.ExecuteStatement.withLocalProperties(ExecuteStatement.scala:159)
	at org.apache.kyuubi.engine.spark.operation.ExecuteStatement.org$apache$kyuubi$engine$spark$operation$ExecuteStatement$$executeStatement(ExecuteStatement.scala:94)
	at org.apache.kyuubi.engine.spark.operation.ExecuteStatement$$anon$1.run(ExecuteStatement.scala:127)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.doris.shaded.com.fasterxml.jackson.databind.exc.InvalidFormatException: Cannot deserialize value of type `int` from String "Access denied for default_cluster:[email protected]": not a valid `int` value
 at [Source: (String)""Access denied for default_cluster:[email protected]""; line: 1, column: 1]
	at org.apache.doris.shaded.com.fasterxml.jackson.databind.exc.InvalidFormatException.from(InvalidFormatException.java:67)
	at org.apache.doris.shaded.com.fasterxml.jackson.databind.DeserializationContext.weirdStringException(DeserializationContext.java:1851)
	at org.apache.doris.shaded.com.fasterxml.jackson.databind.DeserializationContext.handleWeirdStringValue(DeserializationContext.java:1079)
	at org.apache.doris.shaded.com.fasterxml.jackson.databind.deser.std.StdDeserializer._parseIntPrimitive(StdDeserializer.java:762)
	at org.apache.doris.shaded.com.fasterxml.jackson.databind.deser.std.StdDeserializer._deserializeFromString(StdDeserializer.java:288)
	at org.apache.doris.shaded.com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromString(BeanDeserializerBase.java:1495)
	at org.apache.doris.shaded.com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeOther(BeanDeserializer.java:207)
	at org.apache.doris.shaded.com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:197)
	at org.apache.doris.shaded.com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:322)
	at org.apache.doris.shaded.com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4593)
	at org.apache.doris.shaded.com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3548)
	at org.apache.doris.shaded.com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3516)
	at org.apache.doris.spark.rest.RestService.parseSchema(RestService.java:295)
	... 48 more

What You Expected?

I don't know what's wrong.

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Enhancement] Fix performance issues caused by many small files

Search before asking

  • I had searched in the issues and found no similar issues.

Description

When there are many small files, for example files holding only 100 rows each but numbering in the thousands or more, the number of RDD partitions will be greater than or equal to the number of files. Each request then carries very little data while the number of requests is large, which causes the too-many-versions problem and can even make the load fail.

Solution

Add an RDD maximum-partition parameter defaulting to Integer.MAX_VALUE. The parameter is controlled by the user, and a repartition can be performed up front to reduce the number of partitions.
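
As a stopgap under the current connector, the same effect can be achieved on the user side; a minimal sketch (paths, counts, and connection values are placeholders):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("small-files-load").getOrCreate()
val df = spark.read.parquet("hdfs:///path/with/many/small/files")

// collapse thousands of file-sized partitions into a handful before the
// write, so each stream load request carries a useful amount of data
df.coalesce(10)
  .write.format("doris")
  .option("doris.table.identifier", "db.table")
  .option("doris.fenodes", "127.0.0.1:8030")
  .option("user", "root")
  .option("password", "")
  .save()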

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

Spark Doris Connector Release Note 1.3.2

Features & Improvements

  1. Refactor loader writing and support writing in copy mode #187 #190
  2. Support HTTPS for writes #189
  3. Support reading varint, ipv4, ipv6 and other types #199 #197
  4. Support the Doris 2.1 date/datetime read format #193

Bug

  1. Fix errors when a column name is a keyword during writes #186
  2. Fix data duplication caused by repartition #191
  3. Fix data duplication during retries

Behavior Change

  1. Some default values of the reader have been modified #196

Thanks

@gnehil
@JNSimba
@lxwcodemonkey
@smallhibiscus
@vinlee19

[Bug] Config doris.read.field does not take effect.

Search before asking

  • I had searched in the issues and found no similar issues.

Version

Spark 3.1.2, Scala 2.12

What's Wrong?

When reading Doris into a Spark DataFrame, the doris.read.field option does not take effect.

import org.apache.spark.sql.SparkSession

val session = SparkSession.builder().master("local[*]").getOrCreate()
val dorisSparkDF = session.read
  .format("doris")
  .option("doris.fenodes", "127.0.0.1:8030")
  .option("user", "root")
  .option("password", "123456")
  .option("doris.table.identifier", "test.test_tbl")
  .option("doris.read.field", "name")
  .load()

dorisSparkDF.show()
session.stop()

The result of executing the above code:

+--------+------+---+
|    name|gender|age|
+--------+------+---+
|    lisi|     3| 13|
|  wangwu|     2| 14|
|zhangsan|     1| 12|
|    张三|  null| 12|
+--------+------+---+

What You Expected?

After the doris.read.field configuration takes effect, the resulting data should look like this:

+--------+
|    name|
+--------+
|    lisi|
|  wangwu|
|zhangsan|
|    张三|
+--------+
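
Until the option is fixed, a Spark-side projection at least yields the same result shape (it does not stop the connector from fetching all columns if the pushdown is broken):

dorisSparkDF.select("name").show()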

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Feature] array type support

Search before asking

  • I had searched in the issues and found no similar issues.

Description

Currently the spark-doris-connector does not support the array type, so we need to add support for it.

Use case

No response

Related issues

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Bug] Data loss when 2PC is enabled

Search before asking

  • I had searched in the issues and found no similar issues.

Version

1.3.1

What's Wrong?

Data is lost without any exception when doris.sink.enable-2pc is enabled and the Spark job runs for a long time, e.g. 3-4 hours.

The Spark driver logs a warning, and the transaction is removed because it reached the timeout limit.
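
For reference, a minimal sketch of the configuration under discussion (the option name is taken from this report; table, host, and credentials are placeholders):

df.write.format("doris")
  .option("doris.table.identifier", "db.table")
  .option("doris.fenodes", "127.0.0.1:8030")
  .option("user", "root")
  .option("password", "")
  // two-phase commit: data becomes visible only after the commit phase,
  // which is where this report says failures pass silently
  .option("doris.sink.enable-2pc", "true")
  .save()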

What You Expected?

It should throw an exception when the transaction commit fails.

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Bug] ConnectedFailedException: Connect to Doris BE{host='xxx', port=9060}failed

Search before asking

  • I had searched in the issues and found no similar issues.

Version

doris: master
spark: 3.3.1
scala: 2.12

What's Wrong?

Selecting data through the Doris Spark connector fails.

What You Expected?

The select should return data successfully.

How to Reproduce?

doris side:
CREATE TABLE spark_connector_test_decimal (c1 int NOT NULL, c2 VARCHAR(25) NOT NULL, c3 VARCHAR(152),
c4 boolean,
c5 tinyint,
c6 smallint,
c7 bigint,
c8 float,
c9 double,
c10 datev2,
c11 datetime,
c12 char,
c13 largeint,
c14 varchar,
c15 decimalv3(15, 5)
)
DUPLICATE KEY(c1)
COMMENT "OLAP"
DISTRIBUTED BY HASH(c1) BUCKETS 1
PROPERTIES (
"replication_num" = "1"
);

insert into spark_connector_test_decimal values(10000,'aaa','abc',true, 100, 3000, 100000, 1234.567, 12345.678, '2022-12-01','2022-12-01 12:00:00', 'a', 200000, 'g', 1000.12345);
insert into spark_connector_test_decimal values(10001,'aaa','abc',false, 100, 3000, 100000, 1234.567, 12345.678, '2022-12-01','2022-12-01 12:00:00', 'a', 200000, 'g', 1000.12345);
insert into spark_connector_test_decimal values(10002,'aaa','abc',True, 100, 3000, 100000, 1234.567, 12345.678, '2022-12-01','2022-12-01 12:00:00', 'a', 200000, 'g', 1000.12345);
insert into spark_connector_test_decimal values(10003,'aaa','abc',False, 100, 3000, 100000, 1234.567, 12345.678, '2022-12-01','2022-12-01 12:00:00', 'a', 200000, 'g', 1000.12345);

select * from spark_connector_test_decimal;

spark side:
CREATE TEMPORARY VIEW spark_doris_decimal
USING doris
OPTIONS(
"table.identifier"="sparkconnector.spark_connector_test_decimal",
"fenodes"="fe_host:8030",
"user"="root",
"password"=""
);

select * from spark_doris_decimal;

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Bug] Cannot use save_mode

Search before asking

  • I had searched in the issues and found no similar issues.

Version

doris-spark-connector: 1.4.0

What's Wrong?

(screenshot)

What You Expected?

It should be possible to use "save_mode".

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Feature] support https

Search before asking

  • I had searched in the issues and found no similar issues.

Description

Since Doris version 2.0, FE and BE support HTTPS. The Spark connector should also support HTTPS requests to meet security requirements.
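
A sketch of what the requested configuration could look like; the HTTPS option name below is an illustrative assumption, not a shipped API:

df.write.format("doris")
  .option("doris.table.identifier", "db.table")
  .option("doris.fenodes", "fe_host:8030")
  .option("user", "root")
  .option("password", "")
  // hypothetical switch for the feature requested here
  .option("doris.enable.https", "true")
  .save()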

Use case

No response

Related issues

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Bug] spark-doris-connector 1.3.0 : Unrecognized Doris type JSON

Search before asking

  • I had searched in the issues and found no similar issues.

Version

spark-doris-connector 1.3.0
doris 2.0.2

What's Wrong?

Unrecognized Doris type JSON

What You Expected?

Support JSON type

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Bug] The latest connector cannot be found in the Maven repository

Search before asking

  • I had searched in the issues and found no similar issues.

Version

1.3

What's Wrong?

The latest version of the connector cannot be found in the Maven repository.

What You Expected?

doris-spark-connector should be findable in the Maven repository.

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Enhancement] Provide a JSON-format data load option like the Flink connector

Search before asking

  • I had searched in the issues and found no similar issues.

Description

Source data containing Chinese words may cause problems such as:
Reason: actual column number is less than schema column number. actual number: 10, column separator: [ ], line delimiter: [
],
Could Spark, like the Flink connector, load data in JSON format to avoid this problem? JSON-format import:
'sink.properties.format' = 'json' 'sink.properties.read_json_by_line' = 'true'
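
A sketch of the requested behavior on the Spark side, reusing the Flink-style properties quoted above; whether the Spark connector honors them end to end is exactly what this issue asks (table and host are placeholders):

df.write.format("doris")
  .option("doris.table.identifier", "db.table")
  .option("doris.fenodes", "127.0.0.1:8030")
  .option("user", "root")
  .option("password", "")
  .option("sink.properties.format", "json")
  .option("sink.properties.read_json_by_line", "true")
  .save()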

Solution

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Feature] Support new Spark 3.3 features

Search before asking

  • I had searched in the issues and found no similar issues.

Description

  • Implement the Spark catalog API so that Spark SQL can manage tables in Apache Doris (e.g. create table, drop table).
  • Implement the Spark DataSource V2 API, e.g. aggregate-query pushdown and limit-query pushdown to Doris.

Some existing implementations of Spark DS V2 connectors:

Use case

No response

Related issues

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Improvement] Stream load import should support JSON data

Search before asking

  • I had searched in the issues and found no similar issues.

Version

3.1_2.12-1.0.1

What's Wrong?

If a data field contains \n or \t separator characters, too many filtered rows appear when importing data.

What You Expected?

When importing, the data should be assembled into JSON format and stream load should import it as JSON.
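
Until JSON import is supported, a common mitigation on the CSV path is choosing separators that cannot occur in the data; a minimal sketch, assuming stream load's hex-escaped separator syntax is passed through via sink.properties.* (table and host are placeholders):

df.write.format("doris")
  .option("doris.table.identifier", "db.table")
  .option("doris.fenodes", "127.0.0.1:8030")
  .option("user", "root")
  .option("password", "")
  // invisible separators are far less likely to collide with field contents
  .option("sink.properties.column_separator", "\\x01")
  .option("sink.properties.line_delimiter", "\\x02")
  .save()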

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct
