apache / doris-spark-connector

Spark Connector for Apache Doris

Home Page: https://doris.apache.org/

License: Apache License 2.0

Topics: data-warehousing, mpp, olap, dbms, apache, doris, spark, connector
doris-spark-connector's Introduction

Spark Connector for Apache Doris


Spark Doris Connector

For more information about compilation and usage, please visit the Spark Doris Connector documentation.

License

Apache License, Version 2.0

How to Build

You need to copy customer_env.sh.tpl to customer_env.sh and configure it before building.

git clone git@github.com:apache/doris-spark-connector.git
cd doris-spark-connector/spark-doris-connector
cp customer_env.sh.tpl customer_env.sh   # configure customer_env.sh before building
./build.sh

QuickStart

  1. Download and compile the Spark Doris Connector from https://github.com/apache/doris-spark-connector. We suggest compiling it with the official Doris build image:
$ docker pull apache/doris:build-env-ldb-toolchain-latest
     The compiled jar will be named like spark-doris-connector-3.1_2.12-1.0.0-SNAPSHOT.jar.

  2. Download Spark from https://spark.apache.org/downloads.html. In China, the Tencent mirror is a good choice: https://mirrors.cloud.tencent.com/apache/spark/spark-3.1.2/

# download
wget https://mirrors.cloud.tencent.com/apache/spark/spark-3.1.2/spark-3.1.2-bin-hadoop3.2.tgz
# extract
tar -xzvf spark-3.1.2-bin-hadoop3.2.tgz
  3. Configure the Spark environment:
vim /etc/profile
export SPARK_HOME=/your_path/spark-3.1.2-bin-hadoop3.2
export PATH=$PATH:$SPARK_HOME/bin
source /etc/profile
  4. Copy spark-doris-connector-3.1_2.12-1.0.0-SNAPSHOT.jar to the Spark jars directory.
cp /your_path/spark-doris-connector/target/spark-doris-connector-3.1_2.12-1.0.0-SNAPSHOT.jar $SPARK_HOME/jars
  5. Create a Doris database and table.

    create database mongo_doris;
    use mongo_doris;
    CREATE TABLE data_sync_test_simple
     (
             _id VARCHAR(32) DEFAULT '',
             id VARCHAR(32) DEFAULT '',
             user_name VARCHAR(32) DEFAULT '',
             member_list VARCHAR(32) DEFAULT ''
     )
     DUPLICATE KEY(_id)
     DISTRIBUTED BY HASH(_id) BUCKETS 10
     PROPERTIES("replication_num" = "1");
    INSERT INTO data_sync_test_simple VALUES ('1','1','alex','123');
  6. Run this code in spark-shell:
import org.apache.doris.spark._
val dorisSparkRDD = sc.dorisRDD(
  tableIdentifier = Some("mongo_doris.data_sync_test"),
  cfg = Some(Map(
    "doris.fenodes" -> "127.0.0.1:8030",
    "doris.request.auth.user" -> "root",
    "doris.request.auth.password" -> ""
  ))
)
dorisSparkRDD.collect()
  • mongo_doris: Doris database name
  • data_sync_test: Doris table name
  • doris.fenodes: Doris FE IP:http_port
  • doris.request.auth.user: Doris user name
  • doris.request.auth.password: Doris password
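
For completeness, the same read can also be done with the DataFrame API in Scala. A minimal sketch, reusing the options shown in the examples above:

// Scala DataFrame read, mirroring the RDD example above (sketch).
val dorisSparkDF = spark.read.format("doris")
  .option("doris.table.identifier", "mongo_doris.data_sync_test")
  .option("doris.fenodes", "127.0.0.1:8030")
  .option("user", "root")
  .option("password", "")
  .load()
dorisSparkDF.show(5)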
  7. If Spark runs in cluster mode, upload the jar to HDFS and add the doris-spark-connector jar's HDFS URL to spark.yarn.jars.
spark.yarn.jars=hdfs:///spark-jars/doris-spark-connector-3.1.2-2.12-1.0.0.jar

Link: apache/doris#9486

  8. In PySpark, run this code in the pyspark shell:
dorisSparkDF = spark.read.format("doris") \
    .option("doris.table.identifier", "mongo_doris.data_sync_test") \
    .option("doris.fenodes", "127.0.0.1:8030") \
    .option("user", "root") \
    .option("password", "") \
    .load()
# show 5 rows of data
dorisSparkDF.show(5)

Type conversion for writing to Doris using Arrow

Doris type    Spark type
BOOLEAN BooleanType
TINYINT ByteType
SMALLINT ShortType
INT IntegerType
BIGINT LongType
LARGEINT StringType
FLOAT FloatType
DOUBLE DoubleType
DECIMAL(M,D) DecimalType(M,D)
DATE DateType
DATETIME TimestampType
CHAR(L) StringType
VARCHAR(L) StringType
STRING StringType
ARRAY ARRAY
MAP MAP
STRUCT STRUCT
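
To illustrate the mapping above, here is a minimal hedged write sketch against the data_sync_test_simple table created earlier; all four columns are VARCHAR, so the Spark side uses StringType, and the connection options mirror the read examples in this QuickStart:

// Write sketch (run in spark-shell with the connector jar on the classpath;
// spark.implicits._ is already imported there, enabling toDF).
val writeDF = Seq(("2", "2", "bob", "456"))
  .toDF("_id", "id", "user_name", "member_list") // StringType -> VARCHAR
writeDF.write.format("doris")
  .option("doris.table.identifier", "mongo_doris.data_sync_test_simple")
  .option("doris.fenodes", "127.0.0.1:8030")
  .option("user", "root")
  .option("password", "")
  .save()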

Report issues or submit pull request

If you find any bugs, feel free to file a GitHub issue or fix it by submitting a pull request.

Contact Us

Contact us through the following mailing list.

Name                    Scope
dev@doris.apache.org    Development-related discussions (Subscribe / Unsubscribe / Archives)


doris-spark-connector's People

Contributors

aiden-dong, bowenliang123, caoliang-web, chncaesar, chovy-3012, codecooker17, dependabot[bot], dongliang-0, gnehil, hf200012, jnsimba, kyofin, lexluo09, lide-reed, morningman, morningman-cmy, qidaye, shoothzj, smallhibiscus, timelxy, tinkerrrr, vinson0526, wolfboys, wunan1210, wuwenchi, yagagagaga, yangzhg, youngwb, zhaorongsheng, zhenhb


doris-spark-connector's Issues

[Feature] Support Spark 3.2 compilation

Search before asking

  • I had searched in the issues and found no similar issues.

Description

No response

Use case

No response

Related issues

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

Spark Doris Connector Release Note 1.3.0

Feature & Improvement

  1. Support Spark 3.3 and 3.4
  2. DorisWriter write memory optimization #140
  3. Support reading and writing Map and Struct types, and writing Array types
  4. Support adding hidden delimiters when writing in CSV format
  5. Writes support two-phase commit #122 (see the sketch after this list)
  6. Add an auto-redirect configuration to write data through a direct connection to FE
  7. Writes support Overwrite mode
  8. Structured Streaming supports writing Row-format DataFrames to Doris
  9. Optimize some log output
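
For item 5, a hedged sketch of enabling two-phase commit on write. The option name doris.sink.enable-2pc is an assumption based on the connector documentation rather than this page, so verify it against your connector version:

// Sketch only: doris.sink.enable-2pc is assumed, not confirmed here.
df.write.format("doris")
  .option("doris.table.identifier", "db.tbl")   // placeholder table
  .option("doris.fenodes", "fe_host:8030")      // placeholder host
  .option("user", "root")
  .option("password", "")
  .option("doris.sink.enable-2pc", "true")      // two-phase commit (#122)
  .save()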

Bug

  1. Fix the issue where doris.filter.query pushdown does not take effect in some scenarios
  2. Fix a null pointer when writing String-type data
  3. Fix a Structured Streaming write exception

Thanks

@CodeCooker17
@daikon12
@gnehil
@huanccwang
@JNSimba
@shoothzj
@wolfboys

[Enhancement] Optimize log when flush data to Doris

Search before asking

  • I had searched in the issues and found no similar issues.

Description

When the Spark connector flushes data to Doris, it retries on flush errors.
A warn-level log is emitted whenever a backend returns an error, which can cause unnecessary concern among users.

The log is like below

22/04/29 13:10:03 WARN DorisSourceProvider: Failed to load data on BE: http://xxx:8040/api/xxx/xxx/_stream_load? node
22/04/29 13:10:03 WARN DorisSourceProvider: Failed to load data on BE: http://xxx:8040/api/xxx/xxx/_stream_load? node
22/04/29 13:10:03 WARN DorisSourceProvider: Failed to load data on BE: http://xxx:8040/api/xxx/xxx/_stream_load? node

Solution

Change log level from warn to debug.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Bug] doris-2.0.1 partial_columns update error

Search before asking

  • I had searched in the issues and found no similar issues.

Version

doris:2.0.1
spark connector:spark-doris-connector-3.2_2.12

What's Wrong?

There is an error when I use partial_columns update: "Partial update should include all key columns, missing: end_sys_imp_date" (screenshots omitted).

What You Expected?

partial_columns should be updated successfully.
It works when using CSV with Stream Load directly.

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Bug] Cannot convert DATETIME-type data to a date format

Search before asking

  • I had searched in the issues and found no similar issues.

Version

1.3.1

What's Wrong?

doris-spark-connector cannot convert DATETIME-type data from a Doris table into a Java Date or Timestamp type in Spark. At present it is uniformly converted to String, which is very inconvenient to use!

What You Expected?

DATETIME types in Doris should be converted to Java Date or Timestamp types.

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Enhancement] Add a parameter that controls the number of StreamLoad tasks committed per partition

Search before asking

  • I had searched in the issues and found no similar issues.

Description

If the amount of data in a partition is greater than INSERT_BATCH_SIZE, each task commits multiple Stream Load jobs. If the task fails and is retried, all data in the partition is recommitted via Stream Load, including the data that was previously written successfully, so data duplication occurs.

Solution

My suggestion is to add a parameter that, when enabled, forces each partition to commit only one Stream Load, ensuring data is not committed twice.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Bug] Cannot sink a Spark array to Doris via the connector

Search before asking

  • I had searched in the issues and found no similar issues.

Version

org.apache.doris:spark-doris-connector-3.2_2.12:1.2.0

What's Wrong?

Cannot write an array to Doris.

What You Expected?

The array should be written to the Doris table, but null is written instead.

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Bug] https://repo1.maven.org/maven2/org/apache/doris/spark-doris-connector-3.2_2.12/1.3.0/spark-doris-connector-3.2_2.12-1.3.0.pom: expected='1.3.0' found='1.3.0-SNAPSHOT'

Search before asking

  • I had searched in the issues and found no similar issues.

Version

Both v1.3.0 and v1.2.0 have this problem.

What's Wrong?

The pom file in the repository has not been updated:
<thrift-service.version>1.0.0</thrift-service.version>
<netty.version>4.1.77.Final</netty.version>
<arrow.version>5.0.0</arrow.version>
<spark.major.version>3.1</spark.major.version>
<libthrift.version>0.16.0</libthrift.version>
<project.scm.id>github</project.scm.id>
<fasterxml.jackson.version>2.10.5</fasterxml.jackson.version>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
1.2.0-SNAPSHOT
<scala.version>2.12</scala.version>
<spark.version>3.1.2</spark.version>

What You Expected?

Please fix this.

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Bug] spark doris connector read table error: Doris FE's response cannot map to schema.

Search before asking

  • I had searched in the issues and found no similar issues.

Version

  • connector : org.apache.doris:spark-doris-connector-3.1_2.12:1.0.1
  • doris: 1.1 preview2
  • spark: 3.1.2

What's Wrong?

Read a table

from pyspark.sql import SparkSession
spark = SparkSession.builder \
 .appName('Spark Doris Demo Nick') \
 .config('org.apache.doris:spark-doris-connector-3.1_2.12:1.0.1') \
 .getOrCreate()
spark

dorisSparkDF = spark.read.format("doris")\
    .option("doris.table.identifier", "db.token_info")\
    .option("doris.fenodes", "xxx:8031")\
    .option("user", "xxx")\
    .option("password", "xxx").load()
dorisSparkDF.show(5)

Then I get an error:

22/06/23 07:47:03 ERROR SchemaUtils: Doris FE's response cannot map to schema. res: {"keysType":"UNIQUE_KEYS","properties":[{"name":"chain","aggregation_type":"","comment":"","type":"STRING"},{"name":"token_slug","aggregation_type":"","comment":"","type":"STRING"},{"name":"token_address","aggregation_type":"REPLACE","comment":"","type":"STRING"},{"name":"token_symbol","aggregation_type":"REPLACE","comment":"","type":"STRING"},{"name":"decimals","aggregation_type":"REPLACE","comment":"","type":"INT"},{"name":"type","aggregation_type":"REPLACE","comment":"","type":"STRING"},{"name":"token_type","aggregation_type":"REPLACE","comment":"","type":"STRING"},{"name":"protocol_slug","aggregation_type":"REPLACE","comment":"","type":"STRING"},{"name":"manual_slug","aggregation_type":"REPLACE","comment":"","type":"STRING"},{"name":"erc20_slug","aggregation_type":"REPLACE","comment":"","type":"STRING"},{"name":"coin_gecko_slug","aggregation_type":"REPLACE","comment":"","type":"STRING"},{"name":"logo","aggregation_type":"REPLACE","comment":"","type":"STRING"}],"status":200}
org.codehaus.jackson.map.exc.UnrecognizedPropertyException: Unrecognized field "keysType" (Class org.apache.doris.spark.rest.models.Schema), not marked as ignorable
 at [Source: java.io.StringReader@74af102e; line: 1, column: 14] (through reference chain: org.apache.doris.spark.rest.models.Schema["keysType"])
	at org.codehaus.jackson.map.exc.UnrecognizedPropertyException.from(UnrecognizedPropertyException.java:53)
	at org.codehaus.jackson.map.deser.StdDeserializationContext.unknownFieldException(StdDeserializationContext.java:267)
	at org.codehaus.jackson.map.deser.std.StdDeserializer.reportUnknownProperty(StdDeserializer.java:673)
	at org.codehaus.jackson.map.deser.std.StdDeserializer.handleUnknownProperty(StdDeserializer.java:659)
	at org.codehaus.jackson.map.deser.BeanDeserializer.handleUnknownProperty(BeanDeserializer.java:1365)
	at org.codehaus.jackson.map.deser.BeanDeserializer._handleUnknown(BeanDeserializer.java:725)
	at org.codehaus.jackson.map.deser.BeanDeserializer.deserializeFromObject(BeanDeserializer.java:703)
	at org.codehaus.jackson.map.deser.BeanDeserializer.deserialize(BeanDeserializer.java:580)
	at org.codehaus.jackson.map.ObjectMapper._readMapAndClose(ObjectMapper.java:2732)
	at org.codehaus.jackson.map.ObjectMapper.readValue(ObjectMapper.java:1863)
	at org.apache.doris.spark.rest.RestService.parseSchema(RestService.java:295)
	at org.apache.doris.spark.rest.RestService.getSchema(RestService.java:279)
	at org.apache.doris.spark.sql.SchemaUtils$.discoverSchemaFromFe(SchemaUtils.scala:51)
	at org.apache.doris.spark.sql.SchemaUtils$.discoverSchema(SchemaUtils.scala:41)
	at org.apache.doris.spark.sql.DorisRelation.lazySchema$lzycompute(DorisRelation.scala:48)
	at org.apache.doris.spark.sql.DorisRelation.lazySchema(DorisRelation.scala:48)
	at org.apache.doris.spark.sql.DorisRelation.schema(DorisRelation.scala:52)
	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:449)
	at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:325)
	at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:307)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:307)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:225)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:750)
---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
Input In [3], in <cell line: 1>()
----> 1 dorisSparkDF = spark.read.format("doris")\
      2     .option("doris.table.identifier", "xxx.token_info")\
      3     .option("doris.fenodes", "xxxx:8031")\
      4     .option("user", "xxxx")\
      5     .option("password", "xxxxx").load()
      6 dorisSparkDF.show(5)

File /usr/lib/spark/python/pyspark/sql/readwriter.py:210, in DataFrameReader.load(self, path, format, schema, **options)
    208     return self._df(self._jreader.load(self._spark._sc._jvm.PythonUtils.toSeq(path)))
    209 else:
--> 210     return self._df(self._jreader.load())

File /opt/conda/miniconda3/lib/python3.8/site-packages/py4j/java_gateway.py:1304, in JavaMember.__call__(self, *args)
   1298 command = proto.CALL_COMMAND_NAME +\
   1299     self.command_header +\
   1300     args_command +\
   1301     proto.END_COMMAND_PART
   1303 answer = self.gateway_client.send_command(command)
-> 1304 return_value = get_return_value(
   1305     answer, self.gateway_client, self.target_id, self.name)
   1307 for temp_arg in temp_args:
   1308     temp_arg._detach()

File /usr/lib/spark/python/pyspark/sql/utils.py:111, in capture_sql_exception.<locals>.deco(*a, **kw)
    109 def deco(*a, **kw):
    110     try:
--> 111         return f(*a, **kw)
    112     except py4j.protocol.Py4JJavaError as e:
    113         converted = convert_exception(e.java_exception)

File /opt/conda/miniconda3/lib/python3.8/site-packages/py4j/protocol.py:326, in get_return_value(answer, gateway_client, target_id, name)
    324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
    325 if answer[1] == REFERENCE_TYPE:
--> 326     raise Py4JJavaError(
    327         "An error occurred while calling {0}{1}{2}.\n".
    328         format(target_id, ".", name), value)
    329 else:
    330     raise Py4JError(
    331         "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
    332         format(target_id, ".", name, value))

Py4JJavaError: An error occurred while calling o72.load.
: org.apache.doris.spark.exception.DorisException: Doris FE's response cannot map to schema. res: {"keysType":"UNIQUE_KEYS","properties":[{"name":"chain","aggregation_type":"","comment":"","type":"STRING"},{"name":"token_slug","aggregation_type":"","comment":"","type":"STRING"},{"name":"token_address","aggregation_type":"REPLACE","comment":"","type":"STRING"},{"name":"token_symbol","aggregation_type":"REPLACE","comment":"","type":"STRING"},{"name":"decimals","aggregation_type":"REPLACE","comment":"","type":"INT"},{"name":"type","aggregation_type":"REPLACE","comment":"","type":"STRING"},{"name":"token_type","aggregation_type":"REPLACE","comment":"","type":"STRING"},{"name":"protocol_slug","aggregation_type":"REPLACE","comment":"","type":"STRING"},{"name":"manual_slug","aggregation_type":"REPLACE","comment":"","type":"STRING"},{"name":"erc20_slug","aggregation_type":"REPLACE","comment":"","type":"STRING"},{"name":"coin_gecko_slug","aggregation_type":"REPLACE","comment":"","type":"STRING"},{"name":"logo","aggregation_type":"REPLACE","comment":"","type":"STRING"}],"status":200}
	at org.apache.doris.spark.rest.RestService.parseSchema(RestService.java:303)
	at org.apache.doris.spark.rest.RestService.getSchema(RestService.java:279)
	at org.apache.doris.spark.sql.SchemaUtils$.discoverSchemaFromFe(SchemaUtils.scala:51)
	at org.apache.doris.spark.sql.SchemaUtils$.discoverSchema(SchemaUtils.scala:41)
	at org.apache.doris.spark.sql.DorisRelation.lazySchema$lzycompute(DorisRelation.scala:48)
	at org.apache.doris.spark.sql.DorisRelation.lazySchema(DorisRelation.scala:48)
	at org.apache.doris.spark.sql.DorisRelation.schema(DorisRelation.scala:52)
	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:449)
	at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:325)
	at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:307)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:307)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:225)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:750)
Caused by: org.codehaus.jackson.map.exc.UnrecognizedPropertyException: Unrecognized field "keysType" (Class org.apache.doris.spark.rest.models.Schema), not marked as ignorable
 at [Source: java.io.StringReader@74af102e; line: 1, column: 14] (through reference chain: org.apache.doris.spark.rest.models.Schema["keysType"])
	at org.codehaus.jackson.map.exc.UnrecognizedPropertyException.from(UnrecognizedPropertyException.java:53)
	at org.codehaus.jackson.map.deser.StdDeserializationContext.unknownFieldException(StdDeserializationContext.java:267)
	at org.codehaus.jackson.map.deser.std.StdDeserializer.reportUnknownProperty(StdDeserializer.java:673)
	at org.codehaus.jackson.map.deser.std.StdDeserializer.handleUnknownProperty(StdDeserializer.java:659)
	at org.codehaus.jackson.map.deser.BeanDeserializer.handleUnknownProperty(BeanDeserializer.java:1365)
	at org.codehaus.jackson.map.deser.BeanDeserializer._handleUnknown(BeanDeserializer.java:725)
	at org.codehaus.jackson.map.deser.BeanDeserializer.deserializeFromObject(BeanDeserializer.java:703)
	at org.codehaus.jackson.map.deser.BeanDeserializer.deserialize(BeanDeserializer.java:580)
	at org.codehaus.jackson.map.ObjectMapper._readMapAndClose(ObjectMapper.java:2732)
	at org.codehaus.jackson.map.ObjectMapper.readValue(ObjectMapper.java:1863)
	at org.apache.doris.spark.rest.RestService.parseSchema(RestService.java:295)
	... 23 more

What You Expected?

There should be no errors

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Bug] spark stream dataframe write to doris json parsing exception

Search before asking

  • I had searched in the issues and found no similar issues.

Version

master

What's Wrong?

Writing a Spark stream to Doris raises the following exception:

com.fasterxml.jackson.core.JsonParseException: Unexpected character ('-' (code 45)): Expected space separating root-level values
 at [Source: (String)"2022-08-23 12:09:22.706"; line: 1, column: 6]
	at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1840)
	at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:712)
	at com.fasterxml.jackson.core.base.ParserMinimalBase._reportUnexpectedChar(ParserMinimalBase.java:637)
	at com.fasterxml.jackson.core.base.ParserMinimalBase._reportMissingRootWS(ParserMinimalBase.java:684)
	at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._verifyRootSpace(ReaderBasedJsonParser.java:1678)
	at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._parsePosNumber(ReaderBasedJsonParser.java:1321)
	at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.nextToken(ReaderBasedJsonParser.java:769)
	at com.fasterxml.jackson.databind.ObjectMapper._readTreeAndClose(ObjectMapper.java:4231)
	at com.fasterxml.jackson.databind.ObjectMapper.readTree(ObjectMapper.java:2711)
	at org.apache.doris.spark.sql.DorisStreamLoadSink.$anonfun$write$3(DorisStreamLoadSink.scala:58)
	at org.apache.doris.spark.sql.DorisStreamLoadSink.$anonfun$write$3$adapted(DorisStreamLoadSink.scala:56)
	at scala.collection.immutable.Range.foreach(Range.scala:156)
	at org.apache.doris.spark.sql.DorisStreamLoadSink.$anonfun$write$2(DorisStreamLoadSink.scala:56)
	at org.apache.doris.spark.sql.DorisStreamLoadSink.$anonfun$write$2$adapted(DorisStreamLoadSink.scala:54)
	at scala.collection.Iterator.foreach(Iterator.scala:944)
	at scala.collection.Iterator.foreach$(Iterator.scala:944)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.foreach(WholeStageCodegenExec.scala:753)
	at org.apache.doris.spark.sql.DorisStreamLoadSink.$anonfun$write$1(DorisStreamLoadSink.scala:54)
	at org.apache.doris.spark.sql.DorisStreamLoadSink.$anonfun$write$1$adapted(DorisStreamLoadSink.scala:51)
	at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$2(RDD.scala:1020)
	at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$2$adapted(RDD.scala:1020)
	at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2236)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:131)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

What You Expected?

doris-spark-connector should be able to write a streaming DataFrame to Doris.

How to Reproduce?

For example:

    df.selectExpr("CAST(timestamp AS STRING)", "CAST(partition as INT)")
      .writeStream
      .format("doris")
      .option("checkpointLocation", "/tmp/test")
      .option("doris.table.identifier", dorisTable)
      .option("doris.fenodes", dorisFeNodes)
      .option("user", dorisUser)
      .option("password", dorisPwd)
      .start().awaitTermination()
    spark.stop()
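
Since the snippet above assumes an already-defined df, here is a hedged, self-contained variant using Spark's built-in rate source (the table identifier and FE host are placeholders, and the columns differ from the original report):

// Self-contained sketch: the rate source provides timestamp/value columns.
val df = spark.readStream.format("rate").option("rowsPerSecond", "1").load()
df.selectExpr("CAST(timestamp AS STRING)", "CAST(value AS INT)")
  .writeStream
  .format("doris")
  .option("checkpointLocation", "/tmp/test")
  .option("doris.table.identifier", "db.tbl") // placeholder
  .option("doris.fenodes", "fe_host:8030")    // placeholder
  .option("user", "root")
  .option("password", "")
  .start().awaitTermination()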

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Bug] Deserialization failed in the RestService::getSchema method

Search before asking

  • I had searched in the issues and found no similar issues.

Version

Doris: 1.0rc
spark-connector: spark-doris-connector-3.1_2.12 v1.0.1

What's Wrong?

When Spark builds the scan for DorisRelation, it invokes RestService::getSchema. Because the Schema definition lacks the 'keysType' field, which was added to the HTTP interface in 2022.1.3, deserialization fails and an exception is thrown.

What You Expected?

The HTTP response string should be deserialized normally.

How to Reproduce?

Just use the Spark connector with the newest version of Doris and submit a query job with it.

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Bug] The where condition push-down overrides the doris.filter.query filter condition

Search before asking

  • I had searched in the issues and found no similar issues.

Version

1.1.1

What's Wrong?

Read the table:

df_origin = spark.read.format("doris")\
    .option("doris.table.identifier", "db.nft_transactions")\
    .option("doris.fenodes", "")\
    .option("user", "root")\
    .option("password", "password") \
    .option("doris.filter.query", "block_timestamp >= '2022-06-01' and block_timestamp < '2022-06-03'") \
    .option("doris.read.field", "block_timestamp,marketplace_slug") \
    .option("doris.batch.size", 40000) \
    .load() 
df_origin.show()

output:

+----------------+-------------------+
|marketplace_slug| block_timestamp|
+----------------+-------------------+
| aavegotchi|2022-06-01 00:15:02|
| aavegotchi|2022-06-01 00:15:14|
| aavegotchi|2022-06-01 00:15:26|
| aavegotchi|2022-06-01 00:15:38|
| aavegotchi|2022-06-01 00:18:50|
| aavegotchi|2022-06-01 00:20:26|
| aavegotchi|2022-06-01 00:21:10|

doris connector log

22/08/25 03:08:47 DEBUG org.apache.doris.spark.sql.ScalaDorisRowRDD: Query SQL Sending to Doris FE is: 'select marketplace_slug,block_timestamp from db.nft_transactions where block_timestamp >= '2022-06-01' and block_timestamp < '2022-06-03''.

On top of this, we apply a further where filter:

df_origin.where("marketplace_slug = 'opensea'").show()

The Doris connector pushes this where condition down to Doris.

doris connector log

Query SQL Sending to Doris FE is: 'select marketplace_slug,block_timestamp from gaia_data__origin_data.nft_transactions where (marketplace_slug is not null) and (marketplace_slug = 'opensea')'.

As you can see, the condition pushed down to Doris ignores the earlier block_timestamp filter, so the final query result is:

+----------------+-------------------+
|marketplace_slug| block_timestamp|
+----------------+-------------------+
| opensea|2019-08-01 00:06:57|
| opensea|2019-08-01 00:17:42|
| opensea|2019-08-01 00:19:20|
| opensea|2019-08-01 00:38:02|
| opensea|2019-08-01 00:47:38|
| opensea|2019-08-01 00:59:39|

Data from dates we did not expect appears in the result.

What You Expected?

  1. The where-condition pushdown should be combined with the doris.filter.query filter condition; this requires parsing SQL, which is somewhat troublesome.
  2. Provide an option to disable where-condition pushdown and let Spark perform the where filtering itself (a hedged workaround sketch follows).
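
Until that is fixed, a hedged workaround sketch (not a connector feature): fold every predicate into doris.filter.query yourself, so nothing is lost when the where-clause pushdown replaces it. The FE host below is a placeholder:

// Workaround sketch: put all predicates into doris.filter.query.
val df_combined = spark.read.format("doris")
  .option("doris.table.identifier", "db.nft_transactions")
  .option("doris.fenodes", "fe_host:8030") // placeholder
  .option("user", "root")
  .option("password", "password")
  .option("doris.filter.query",
    "block_timestamp >= '2022-06-01' and block_timestamp < '2022-06-03'" +
    " and marketplace_slug = 'opensea'")
  .load()
df_combined.show()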

How to Reproduce?

See above.

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

Spark Doris Connector Release Note 1.2.0

Feature & Improvement

  1. Compile script refactoring and optimization
  2. Add CSV format import
  3. Support pushdown of doris.filter.query in Spark SQL
  4. Support the Doris datev2/datetimev2/decimalv3/jsonb/array types
  5. Remove the thrift dependency and introduce the thrift SDK
  6. Support setting the import interval
  7. Write-path code refactoring and optimization
  8. Optimize error log output

Thanks

Thanks to everyone who has contributed to this release:
@bowenliang123
@caoliang-web
@chncaesar
@chovy-3012
@DongLiang-0
@gnehil
@JNSimba
@LemonLiTree
@lexluo09
@MrZHui888
@myfjdthink
@smallhibiscus
@timelxy
@wolfboys
@yagagagaga

[Bug] column_separator and line_delimiter do not support invisible characters

Search before asking

  • I had searched in the issues and found no similar issues.

Version

Doris 1.2.3
doris-spark-connector: master

What's Wrong?

PySpark 3.1.3

df.write.format("doris") \
        .option("doris.table.identifier", "") \
        .option("doris.fenodes", "") \
        .option("user", "") \
        .option("password", "") \
        .option("doris.write.fields", ",".join(df.columns)) \
        .option("doris.sink.batch.size", "20000") \
        .option("doris.sink.max-retries", "3") \
        .option("doris.sink.properties.format", "csv") \
        .option("sink.properties.column_separator", "\\x01") \
        .option("sink.properties.line_delimiter", "\\x07") \
        .option("doris.sink.batch.interval.ms", "200") \
        .save()  # trigger the write

Stream Load in CSV format fails with the invisible characters.

What You Expected?

Handle it like Doris FE's org.apache.doris.analysis.Separator: convert \x01 and other invisible characters.
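
A hedged sketch of the requested escape handling in Scala: decode sequences like "\x01" into their raw characters before building the Stream Load request, similar to what org.apache.doris.analysis.Separator does on the FE side (the helper below is hypothetical, not part of the connector):

// Hypothetical helper: turn "\\x01"-style escapes into real characters.
def unescapeHex(s: String): String = {
  val hexEscape = """\\x([0-9a-fA-F]{2})""".r
  hexEscape.replaceAllIn(s, m => Integer.parseInt(m.group(1), 16).toChar.toString)
}

unescapeHex("\\x01") // returns the single invisible character '\u0001'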

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Bug] Unrecognized Doris type DATETIMEV2

Search before asking

  • I had searched in the issues and found no similar issues.

Version

spark-doris-connector-2.3_2.11 1.1.0

What's Wrong?

Exception in thread "main" org.apache.doris.spark.exception.DorisException: Unrecognized Doris type DATETIMEV2

What You Expected?

DATETIMEV2 should be supported.

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Bug] can not write data to doris0.12

Search before asking

  • I had searched in the issues and found no similar issues.

Version

doris version:0.12
spark-doris-connector-3.1_2.12 version:1.0.1

What's Wrong?

When I use the spark-doris-connector to write a DataFrame to Doris 0.12, an error occurs ("Connect to doris http://xx:8030/api/backends?is_alive=true failed."), as shown in the screenshot (omitted).

I guess this is because doris-0.12 has no such interface ("api/backends"), but the official website documentation says that 0.12+ is supported.

Hope to get a reply, thank you.

What You Expected?

Support Doris 0.12+ or update the official website documentation.

How to Reproduce?

Use the official example to reproduce.

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Enhancement] build script improvement

Search before asking

  • I had searched in the issues and found no similar issues.

Description

Currently, the environment checks in the build.sh script (thrift, java, maven, etc.) need to be optimized.

  1. thrift needs to verify that the version is 0.13.0
  2. mvnw support (if maven is not installed)
  3. the java environment check needs improvement

Solution

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Bug] Doris V2 data types are not supported

Search before asking

  • I had searched in the issues and found no similar issues.

Version

spark-doris-connector-3.2_2.12
doris 1.2.3

What's Wrong?

org.apache.doris.spark.exception.DorisException: Unrecognized Doris type DATEV2

What You Expected?

The new Doris data types should be supported.

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Feature] support spark catalog

Search before asking

  • I had searched in the issues and found no similar issues.

Description

support spark catalog @hf200012

Use case

No response

Related issues

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Bug] Unable to correctly recognize time partition fields

Search before asking

  • I had searched in the issues and found no similar issues.

Version

doris-spark-connector:1.3.0-1.3.2
doris:2.0
hive:3.1.3
hadoop:3.3.4
spark:3.3.1

What's Wrong?

spark-sql (default)> CREATE TEMPORARY VIEW dwd_test
> USING doris
> OPTIONS(
> 'table.identifier'='dw_dwd.dwd_test',
> 'fenodes'='xxx:8030',
> 'user'='xxxx',
> 'password'='xxx',
> 'sink.properties.format' = 'json'
> );
Response code
Time taken: 3.393 seconds
spark-sql (default)> select * from dwd_test where dt ='2024-01-02' limit 3;
14:07:18.625 [main] ERROR org.apache.doris.spark.sql.ScalaDorisRowRDD - Doris FE's response cannot map to schema. res: {"exception":"errCode = 2, detailMessage = Incorrect datetime value: CAST(2021 AS DATETIME) in expression: (CAST(dt AS DATETIME) = CAST(2021 AS DATETIME))","status":400}
org.apache.doris.shaded.com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException: Unrecognized field "exception" (class org.apache.doris.spark.rest.models.QueryPlan), not marked as ignorable (3 known properties: "partitions", "status", "opaqued_query_plan"])
at [Source: (String)"{"exception":"errCode = 2, detailMessage = Incorrect datetime value: CAST(2021 AS DATETIME) in expression: (CAST(dt AS DATETIME) = CAST(2021 AS DATETIME))","status":400}"; line: 1, column: 15] (through reference chain: org.apache.doris.spark.rest.models.QueryPlan["exception"])
at org.apache.doris.shaded.com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException.from(UnrecognizedPropertyException.java:61) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.shaded.com.fasterxml.jackson.databind.DeserializationContext.handleUnknownProperty(DeserializationContext.java:1127) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.shaded.com.fasterxml.jackson.databind.deser.std.StdDeserializer.handleUnknownProperty(StdDeserializer.java:2036) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.shaded.com.fasterxml.jackson.databind.deser.BeanDeserializerBase.handleUnknownProperty(BeanDeserializerBase.java:1700) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.shaded.com.fasterxml.jackson.databind.deser.BeanDeserializerBase.handleUnknownVanilla(BeanDeserializerBase.java:1678) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.shaded.com.fasterxml.jackson.databind.deser.BeanDeserializer.vanillaDeserialize(BeanDeserializer.java:320) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.shaded.com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:177) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.shaded.com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:323) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.shaded.com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4674) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.shaded.com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3629) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.shaded.com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3597) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.spark.rest.RestService.getQueryPlan(RestService.java:284) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.spark.rest.RestService.findPartitions(RestService.java:261) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.spark.rdd.AbstractDorisRDD.dorisPartitions$lzycompute(AbstractDorisRDD.scala:58) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.spark.rdd.AbstractDorisRDD.dorisPartitions(AbstractDorisRDD.scala:57) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.spark.rdd.AbstractDorisRDD.getPartitions(AbstractDorisRDD.scala:35) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:292) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at scala.Option.getOrElse(Option.scala:189) ~[scala-library-2.12.15.jar:?]
at org.apache.spark.rdd.RDD.partitions(RDD.scala:288) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:292) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at scala.Option.getOrElse(Option.scala:189) ~[scala-library-2.12.15.jar:?]
at org.apache.spark.rdd.RDD.partitions(RDD.scala:288) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:292) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at scala.Option.getOrElse(Option.scala:189) ~[scala-library-2.12.15.jar:?]
at org.apache.spark.rdd.RDD.partitions(RDD.scala:288) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:292) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at scala.Option.getOrElse(Option.scala:189) ~[scala-library-2.12.15.jar:?]
at org.apache.spark.rdd.RDD.partitions(RDD.scala:288) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:476) ~[spark-sql_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:459) ~[spark-sql_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:48) ~[spark-sql_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:451) ~[spark-sql_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.execution.HiveResult$.hiveResultString(HiveResult.scala:76) ~[spark-sql_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.$anonfun$run$2(SparkSQLDriver.scala:69) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:109) ~[spark-sql_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:169) ~[spark-sql_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:95) ~[spark-sql_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779) ~[spark-sql_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) ~[spark-sql_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:69) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:384) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1(SparkSQLCLIDriver.scala:504) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1$adapted(SparkSQLCLIDriver.scala:498) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
at scala.collection.Iterator.foreach(Iterator.scala:943) ~[scala-library-2.12.15.jar:?]
at scala.collection.Iterator.foreach$(Iterator.scala:943) ~[scala-library-2.12.15.jar:?]
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) ~[scala-library-2.12.15.jar:?]
at scala.collection.IterableLike.foreach(IterableLike.scala:74) ~[scala-library-2.12.15.jar:?]
at scala.collection.IterableLike.foreach$(IterableLike.scala:73) ~[scala-library-2.12.15.jar:?]
at scala.collection.AbstractIterable.foreach(Iterable.scala:56) ~[scala-library-2.12.15.jar:?]
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processLine(SparkSQLCLIDriver.scala:498) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:286) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_212]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_212]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_212]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_212]
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) ~[spark-core_2.12-3.3.1.jar:3.3.1]
14:07:18.643 [main] ERROR org.apache.spark.sql.hive.thriftserver.SparkSQLDriver - Failed in [select * from dwd_cc_trade_pay_success_di where dt ='2024-01-02' limit 3]
org.apache.doris.spark.exception.DorisException: Doris FE's response cannot map to schema. res: {"exception":"errCode = 2, detailMessage = Incorrect datetime value: CAST(2021 AS DATETIME) in expression: (CAST(dt AS DATETIME) = CAST(2021 AS DATETIME))","status":400}
at org.apache.doris.spark.rest.RestService.getQueryPlan(RestService.java:292) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.spark.rest.RestService.findPartitions(RestService.java:261) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.spark.rdd.AbstractDorisRDD.dorisPartitions$lzycompute(AbstractDorisRDD.scala:58) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.spark.rdd.AbstractDorisRDD.dorisPartitions(AbstractDorisRDD.scala:57) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.spark.rdd.AbstractDorisRDD.getPartitions(AbstractDorisRDD.scala:35) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:292) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at scala.Option.getOrElse(Option.scala:189) ~[scala-library-2.12.15.jar:?]
at org.apache.spark.rdd.RDD.partitions(RDD.scala:288) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:292) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at scala.Option.getOrElse(Option.scala:189) ~[scala-library-2.12.15.jar:?]
at org.apache.spark.rdd.RDD.partitions(RDD.scala:288) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:292) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at scala.Option.getOrElse(Option.scala:189) ~[scala-library-2.12.15.jar:?]
at org.apache.spark.rdd.RDD.partitions(RDD.scala:288) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:292) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at scala.Option.getOrElse(Option.scala:189) ~[scala-library-2.12.15.jar:?]
at org.apache.spark.rdd.RDD.partitions(RDD.scala:288) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:476) ~[spark-sql_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:459) ~[spark-sql_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:48) ~[spark-sql_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:451) ~[spark-sql_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.execution.HiveResult$.hiveResultString(HiveResult.scala:76) ~[spark-sql_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.$anonfun$run$2(SparkSQLDriver.scala:69) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:109) ~[spark-sql_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:169) ~[spark-sql_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:95) ~[spark-sql_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779) ~[spark-sql_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) ~[spark-sql_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:69) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:384) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1(SparkSQLCLIDriver.scala:504) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1$adapted(SparkSQLCLIDriver.scala:498) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
at scala.collection.Iterator.foreach(Iterator.scala:943) ~[scala-library-2.12.15.jar:?]
at scala.collection.Iterator.foreach$(Iterator.scala:943) ~[scala-library-2.12.15.jar:?]
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) ~[scala-library-2.12.15.jar:?]
at scala.collection.IterableLike.foreach(IterableLike.scala:74) ~[scala-library-2.12.15.jar:?]
at scala.collection.IterableLike.foreach$(IterableLike.scala:73) ~[scala-library-2.12.15.jar:?]
at scala.collection.AbstractIterable.foreach(Iterable.scala:56) ~[scala-library-2.12.15.jar:?]
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processLine(SparkSQLCLIDriver.scala:498) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:286) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_212]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_212]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_212]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_212]
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055) ~[spark-core_2.12-3.3.1.jar:3.3.1]
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) ~[spark-core_2.12-3.3.1.jar:3.3.1]
Caused by: org.apache.doris.shaded.com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException: Unrecognized field "exception" (class org.apache.doris.spark.rest.models.QueryPlan), not marked as ignorable (3 known properties: "partitions", "status", "opaqued_query_plan"])
at [Source: (String)"{"exception":"errCode = 2, detailMessage = Incorrect datetime value: CAST(2021 AS DATETIME) in expression: (CAST(dt AS DATETIME) = CAST(2021 AS DATETIME))","status":400}"; line: 1, column: 15] (through reference chain: org.apache.doris.spark.rest.models.QueryPlan["exception"])
at org.apache.doris.shaded.com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException.from(UnrecognizedPropertyException.java:61) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.shaded.com.fasterxml.jackson.databind.DeserializationContext.handleUnknownProperty(DeserializationContext.java:1127) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.shaded.com.fasterxml.jackson.databind.deser.std.StdDeserializer.handleUnknownProperty(StdDeserializer.java:2036) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.shaded.com.fasterxml.jackson.databind.deser.BeanDeserializerBase.handleUnknownProperty(BeanDeserializerBase.java:1700) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.shaded.com.fasterxml.jackson.databind.deser.BeanDeserializerBase.handleUnknownVanilla(BeanDeserializerBase.java:1678) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.shaded.com.fasterxml.jackson.databind.deser.BeanDeserializer.vanillaDeserialize(BeanDeserializer.java:320) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.shaded.com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:177) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.shaded.com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:323) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.shaded.com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4674) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.shaded.com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3629) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.shaded.com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3597) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
at org.apache.doris.spark.rest.RestService.getQueryPlan(RestService.java:284) ~[spark-doris-connector-3.3_2.12-1.3.2.jar:1.4.0-SNAPSHOT]
... 55 more
org.apache.doris.spark.exception.DorisException: Doris FE's response cannot map to schema. res: {"exception":"errCode = 2, detailMessage = Incorrect datetime value: CAST(2021 AS DATETIME) in expression: (CAST(dt AS DATETIME) = CAST(2021 AS DATETIME))","status":400}
at org.apache.doris.spark.rest.RestService.getQueryPlan(RestService.java:292)
at org.apache.doris.spark.rest.RestService.findPartitions(RestService.java:261)
at org.apache.doris.spark.rdd.AbstractDorisRDD.dorisPartitions$lzycompute(AbstractDorisRDD.scala:58)
at org.apache.doris.spark.rdd.AbstractDorisRDD.dorisPartitions(AbstractDorisRDD.scala:57)
at org.apache.doris.spark.rdd.AbstractDorisRDD.getPartitions(AbstractDorisRDD.scala:35)
at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:292)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:292)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:292)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:292)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:288)
at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:476)
at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:459)
at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:48)
at org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:451)
at org.apache.spark.sql.execution.HiveResult$.hiveResultString(HiveResult.scala:76)
at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.$anonfun$run$2(SparkSQLDriver.scala:69)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:109)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:169)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:95)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:69)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:384)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1(SparkSQLCLIDriver.scala:504)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1$adapted(SparkSQLCLIDriver.scala:498)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
at scala.collection.IterableLike.foreach(IterableLike.scala:74)
at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processLine(SparkSQLCLIDriver.scala:498)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:286)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.doris.shaded.com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException: Unrecognized field "exception" (class org.apache.doris.spark.rest.models.QueryPlan), not marked as ignorable (3 known properties: "partitions", "status", "opaqued_query_plan"])
at [Source: (String)"{"exception":"errCode = 2, detailMessage = Incorrect datetime value: CAST(2021 AS DATETIME) in expression: (CAST(dt AS DATETIME) = CAST(2021 AS DATETIME))","status":400}"; line: 1, column: 15] (through reference chain: org.apache.doris.spark.rest.models.QueryPlan["exception"])
at org.apache.doris.shaded.com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException.from(UnrecognizedPropertyException.java:61)
at org.apache.doris.shaded.com.fasterxml.jackson.databind.DeserializationContext.handleUnknownProperty(DeserializationContext.java:1127)
at org.apache.doris.shaded.com.fasterxml.jackson.databind.deser.std.StdDeserializer.handleUnknownProperty(StdDeserializer.java:2036)
at org.apache.doris.shaded.com.fasterxml.jackson.databind.deser.BeanDeserializerBase.handleUnknownProperty(BeanDeserializerBase.java:1700)
at org.apache.doris.shaded.com.fasterxml.jackson.databind.deser.BeanDeserializerBase.handleUnknownVanilla(BeanDeserializerBase.java:1678)
at org.apache.doris.shaded.com.fasterxml.jackson.databind.deser.BeanDeserializer.vanillaDeserialize(BeanDeserializer.java:320)
at org.apache.doris.shaded.com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:177)
at org.apache.doris.shaded.com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:323)
at org.apache.doris.shaded.com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4674)
at org.apache.doris.shaded.com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3629)
at org.apache.doris.shaded.com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3597)
at org.apache.doris.spark.rest.RestService.getQueryPlan(RestService.java:284)
... 55 more

What You Expected?

I expect to query data from Doris correctly using a date-format filter when the filter field is a partition field.

How to Reproduce?

No response

Anything Else?

I can query data from Doris correctly with a date-format filter when the filter field is not a partition field.
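
A minimal sketch of the two cases being compared, assuming a Doris table whose partition column is a date column named dt (table, host, and column names are placeholders; the rejected predicate appears in the exception above):

val df = spark.read.format("doris")
  .option("doris.table.identifier", "db.partitioned_table")
  .option("doris.fenodes", "127.0.0.1:8030")
  .option("user", "root")
  .option("password", "")
  .load()

// works: date-format filter on a non-partition column
df.filter("create_time = '2021-01-01'").show()

// fails during query-plan generation with "Incorrect datetime value"
// when dt is a partition column, per this report
df.filter("dt = '2021-01-01'").show()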

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Bug] No Doris FE is available when FE uses HTTPS

Search before asking

  • I had searched in the issues and found no similar issues.

Version

doris 1.2.3, master

What's Wrong?

After switching FE to HTTPS, running a job through the Spark connector throws:
java.io.IOException: Failed to get response from Doris

What You Expected?

The Spark connector job should execute successfully.

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Feature] Do not include Scala in the release jar

Search before asking

  • I had searched in the issues and found no similar issues.

Description

In short, Scala should be a provided dependency.

Scala libraries should not be bundled into the release jar, as this can lead to version conflicts in the user's environment and force users into additional shading work.
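
For illustration, a minimal sketch of the requested packaging behavior, assuming an sbt-assembly (1.x) build; the connector itself builds with Maven, where the equivalent is giving scala-library the provided scope:

// build.sbt: keep the Scala runtime out of the assembled jar so the
// Scala version already on the user's classpath is used at runtime
assembly / assemblyOption := (assembly / assemblyOption).value
  .withIncludeScala(false)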

Use case

No response

Related issues

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Bug] Cannot write bitmap columns to Doris via the Spark connector

Search before asking

  • I had searched in the issues and found no similar issues.

Version

org.apache.doris:spark-doris-connector-3.2_2.12:1.3.0

What's Wrong?

Following the official documentation, I write bitmap data in Spark DataFrame mode using the bitmap_from_array function, like this:

df.write.format("doris")
  .option("doris.table.identifier", s"$tableName")
  .option("doris.fenodes", s"$url:$port")
  .option("user", s"$username")
  .option("password", s"$password")
  .option("sink.batch.size", 100)
  .option("sink.max-retries", 3)
  .option("doris.ignore-type", "bitmap")
  .option("doris.deserialize.arrow.async", true)
  .option("doris.deserialize.queue.size", 64)
  .option("sink.properties.column_separator", "|")
  .option("sink.properties.format", "json")
  // other options
  // specify the fields to write
  .option("doris.write.fields", fieldList)
  .save()

where fieldList = "tag_id, UPDATED_TIME, tag_value, user_ids, CREATED_TIME, UPDATED_BY, user_id=bitmap_from_array(user_ids)"
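
For context, bitmap columns are usually built on the Doris side through stream load's column mapping; a minimal sketch of that approach, assuming sink.properties.* options are passed through to stream load as headers (table, host, and field names are placeholders):

df.write.format("doris")
  .option("doris.table.identifier", "db.user_tags")
  .option("doris.fenodes", "127.0.0.1:8030")
  .option("user", "root")
  .option("password", "")
  // hypothetical mapping: ship the raw array column from Spark and let
  // Doris derive the bitmap column during load
  .option("sink.properties.columns", "tag_id,user_ids,user_id=bitmap_from_array(user_ids)")
  .save()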

What You Expected?

The connector should support write-side functions such as bitmap_from_array.

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Bug] Loading data with doris-spark-connector-3.3_2.12.jar, the parallelism of the repartition stage stays at 1, so the partitions never finish

Search before asking

  • I had searched in the issues and found no similar issues.

Version

version: spark 3.3.1

What's Wrong?

My repartition is set to 10, but the repartition stage keeps running with a task parallelism of 1, so the partitions never finish. Tracing through the source code, I would like to know whether this needs some extra configuration on Spark 3.3, or whether some operation on the DataFrame before the repartition drops the parallelism to 1. Any guidance would be appreciated.

What You Expected?

I would like to know whether Spark 3.3 needs a specific configuration. From the stage view, is it the deserializeToObject step that drops the parallelism to 1, and which Spark parameter prevents this?

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Bug] CREATE TEMPORARY VIEW USING doris fails with an error

Search before asking

  • I had searched in the issues and found no similar issues.

Version

Doris 1.0 / Spark-Doris-Connector master / Spark 3.2.1

What's Wrong?

CREATE TEMPORARY VIEW spark_doris
USING doris
OPTIONS(
  "table.identifier"="db_01.datax",
  "fenodes"="10.0.105.243:8030",
  "user"="doris",
  "password"="�Doris"
);
[Code: 0, SQL State: ]  Error operating EXECUTE_STATEMENT: org.apache.doris.spark.exception.DorisException: Doris FE's response cannot map to schema. res: "Access denied for default_cluster:[email protected]"
	at org.apache.doris.spark.rest.RestService.parseSchema(RestService.java:303)
	at org.apache.doris.spark.rest.RestService.getSchema(RestService.java:279)
	at org.apache.doris.spark.sql.SchemaUtils$.discoverSchemaFromFe(SchemaUtils.scala:53)
	at org.apache.doris.spark.sql.SchemaUtils$.discoverSchema(SchemaUtils.scala:43)
	at org.apache.doris.spark.sql.DorisRelation.lazySchema$lzycompute(DorisRelation.scala:48)
	at org.apache.doris.spark.sql.DorisRelation.lazySchema(DorisRelation.scala:48)
	at org.apache.doris.spark.sql.DorisRelation.schema(DorisRelation.scala:52)
	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:440)
	at org.apache.spark.sql.execution.datasources.CreateTempViewUsing.run(ddl.scala:98)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:110)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:110)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:106)
	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:457)
	at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:106)
	at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:93)
	at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:91)
	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:219)
	at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
	at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:618)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613)
	at org.apache.kyuubi.engine.spark.operation.ExecuteStatement.$anonfun$executeStatement$1(ExecuteStatement.scala:100)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at org.apache.kyuubi.engine.spark.operation.ExecuteStatement.withLocalProperties(ExecuteStatement.scala:159)
	at org.apache.kyuubi.engine.spark.operation.ExecuteStatement.org$apache$kyuubi$engine$spark$operation$ExecuteStatement$$executeStatement(ExecuteStatement.scala:94)
	at org.apache.kyuubi.engine.spark.operation.ExecuteStatement$$anon$1.run(ExecuteStatement.scala:127)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.doris.shaded.com.fasterxml.jackson.databind.exc.InvalidFormatException: Cannot deserialize value of type `int` from String "Access denied for default_cluster:[email protected]": not a valid `int` value
 at [Source: (String)""Access denied for default_cluster:[email protected]""; line: 1, column: 1]
	at org.apache.doris.shaded.com.fasterxml.jackson.databind.exc.InvalidFormatException.from(InvalidFormatException.java:67)
	at org.apache.doris.shaded.com.fasterxml.jackson.databind.DeserializationContext.weirdStringException(DeserializationContext.java:1851)
	at org.apache.doris.shaded.com.fasterxml.jackson.databind.DeserializationContext.handleWeirdStringValue(DeserializationContext.java:1079)
	at org.apache.doris.shaded.com.fasterxml.jackson.databind.deser.std.StdDeserializer._parseIntPrimitive(StdDeserializer.java:762)
	at org.apache.doris.shaded.com.fasterxml.jackson.databind.deser.std.StdDeserializer._deserializeFromString(StdDeserializer.java:288)
	at org.apache.doris.shaded.com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromString(BeanDeserializerBase.java:1495)
	at org.apache.doris.shaded.com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeOther(BeanDeserializer.java:207)
	at org.apache.doris.shaded.com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:197)
	at org.apache.doris.shaded.com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:322)
	at org.apache.doris.shaded.com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4593)
	at org.apache.doris.shaded.com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3548)
	at org.apache.doris.shaded.com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3516)
	at org.apache.doris.spark.rest.RestService.parseSchema(RestService.java:295)
	... 48 more

What You Expected?

I don't know what's wrong.

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Enhancement] Fix performance issues caused by many small files

Search before asking

  • I had searched in the issues and found no similar issues.

Description

When there are many small files, for example files holding only 100 rows each but numbering in the thousands or more, the number of RDD partitions will be greater than or equal to the number of files. Each request then carries very little data while the number of requests is large, which causes the too-many-versions problem and can even make the load fail.

Solution

Add an RDD maximum-partition parameter defaulting to Integer.MAX_VALUE. The parameter is controlled by the user, and a repartition can be performed up front to reduce the number of partitions.
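
As a stopgap under the current connector, the same effect can be achieved on the user side; a minimal sketch (paths, counts, and connection values are placeholders):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("small-files-load").getOrCreate()
val df = spark.read.parquet("hdfs:///path/with/many/small/files")

// collapse thousands of file-sized partitions into a handful before the
// write, so each stream load request carries a useful amount of data
df.coalesce(10)
  .write.format("doris")
  .option("doris.table.identifier", "db.table")
  .option("doris.fenodes", "127.0.0.1:8030")
  .option("user", "root")
  .option("password", "")
  .save()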

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

Spark Doris Connector Release Note 1.3.2

Features & Improvements

  1. Refactor loader writing and support writing in copy mode #187 #190
  2. Support HTTPS for writes #189
  3. Support reading varint, ipv4, ipv6 and other types #199 #197
  4. Support the Doris 2.1 date/datetime read format #193

Bug

  1. Fix errors when a column name is a keyword during writes #186
  2. Fix data duplication caused by repartition #191
  3. Fix data duplication during retries

Behavior Change

  1. Some default values of the reader have been modified #196

Thanks

@gnehil
@JNSimba
@lxwcodemonkey
@smallhibiscus
@vinlee19

[Bug] Config doris.read.field does not take effect.

Search before asking

  • I had searched in the issues and found no similar issues.

Version

Spark 3.1.2, Scala 2.12

What's Wrong?

When reading Doris into a Spark DataFrame, the doris.read.field option does not take effect.

import org.apache.spark.sql.SparkSession

val session = SparkSession.builder().master("local[*]").getOrCreate()
val dorisSparkDF = session.read
  .format("doris")
  .option("doris.fenodes", "127.0.0.1:8030")
  .option("user", "root")
  .option("password", "123456")
  .option("doris.table.identifier", "test.test_tbl")
  .option("doris.read.field", "name")
  .load()

dorisSparkDF.show()
session.stop()

The result of executing the above code:

+--------+------+---+
|    name|gender|age|
+--------+------+---+
|    lisi|     3| 13|
|  wangwu|     2| 14|
|zhangsan|     1| 12|
|    张三|  null| 12|
+--------+------+---+

What You Expected?

After the doris.read.field configuration takes effect, the resulting data should look like this:

+--------+
|    name|
+--------+
|    lisi|
|  wangwu|
|zhangsan|
|    张三|
+--------+
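
Until the option is fixed, a Spark-side projection at least yields the same result shape (it does not stop the connector from fetching all columns if the pushdown is broken):

dorisSparkDF.select("name").show()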

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Feature] array type support

Search before asking

  • I had searched in the issues and found no similar issues.

Description

Currently the spark-doris-connector does not support the array type, so we need to add support for it.

Use case

No response

Related issues

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Bug] Data loss when 2PC is enabled

Search before asking

  • I had searched in the issues and found no similar issues.

Version

1.3.1

What's Wrong?

Data is lost without any exception when doris.sink.enable-2pc is enabled and the Spark job runs for a long time, e.g. 3-4 hours.

The Spark driver logs a warning, and the transaction is removed because it reached the timeout limit.
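
For reference, a minimal sketch of the configuration under discussion (the option name is taken from this report; table, host, and credentials are placeholders):

df.write.format("doris")
  .option("doris.table.identifier", "db.table")
  .option("doris.fenodes", "127.0.0.1:8030")
  .option("user", "root")
  .option("password", "")
  // two-phase commit: data becomes visible only after the commit phase,
  // which is where this report says failures pass silently
  .option("doris.sink.enable-2pc", "true")
  .save()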

What You Expected?

It should throw an exception when the transaction commit fails.

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Bug] ConnectedFailedException: Connect to Doris BE{host='xxx', port=9060}failed

Search before asking

  • I had searched in the issues and found no similar issues.

Version

doris: master
spark: 3.3.1
scala: 2.12

What's Wrong?

Selecting data through the Doris Spark connector fails.

What You Expected?

The select should return data successfully.

How to Reproduce?

doris side:
CREATE TABLE spark_connector_test_decimal (c1 int NOT NULL, c2 VARCHAR(25) NOT NULL, c3 VARCHAR(152),
c4 boolean,
c5 tinyint,
c6 smallint,
c7 bigint,
c8 float,
c9 double,
c10 datev2,
c11 datetime,
c12 char,
c13 largeint,
c14 varchar,
c15 decimalv3(15, 5)
)
DUPLICATE KEY(c1)
COMMENT "OLAP"
DISTRIBUTED BY HASH(c1) BUCKETS 1
PROPERTIES (
"replication_num" = "1"
);

insert into spark_connector_test_decimal values(10000,'aaa','abc',true, 100, 3000, 100000, 1234.567, 12345.678, '2022-12-01','2022-12-01 12:00:00', 'a', 200000, 'g', 1000.12345);
insert into spark_connector_test_decimal values(10001,'aaa','abc',false, 100, 3000, 100000, 1234.567, 12345.678, '2022-12-01','2022-12-01 12:00:00', 'a', 200000, 'g', 1000.12345);
insert into spark_connector_test_decimal values(10002,'aaa','abc',True, 100, 3000, 100000, 1234.567, 12345.678, '2022-12-01','2022-12-01 12:00:00', 'a', 200000, 'g', 1000.12345);
insert into spark_connector_test_decimal values(10003,'aaa','abc',False, 100, 3000, 100000, 1234.567, 12345.678, '2022-12-01','2022-12-01 12:00:00', 'a', 200000, 'g', 1000.12345);

select * from spark_connector_test_decimal;

spark side:
CREATE TEMPORARY VIEW spark_doris_decimal
USING doris
OPTIONS(
"table.identifier"="sparkconnector.spark_connector_test_decimal",
"fenodes"="fe_host:8030",
"user"="root",
"password"=""
);

select * from spark_doris_decimal;

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Bug] Cannot use save_mode

Search before asking

  • I had searched in the issues and found no similar issues.

Version

doris-spark-connector: 1.4.0

What's Wrong?

(screenshot)

What You Expected?

It should be possible to use "save_mode".

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Feature] support https

Search before asking

  • I had searched in the issues and found no similar issues.

Description

Since Doris version 2.0, FE and BE support HTTPS. The Spark connector should also support HTTPS requests to meet security requirements.
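
A sketch of what the requested configuration could look like; the HTTPS option name below is an illustrative assumption, not a shipped API:

df.write.format("doris")
  .option("doris.table.identifier", "db.table")
  .option("doris.fenodes", "fe_host:8030")
  .option("user", "root")
  .option("password", "")
  // hypothetical switch for the feature requested here
  .option("doris.enable.https", "true")
  .save()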

Use case

No response

Related issues

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Bug] spark-doris-connector 1.3.0 : Unrecognized Doris type JSON

Search before asking

  • I had searched in the issues and found no similar issues.

Version

spark-doris-connector 1.3.0
doris 2.0.2

What's Wrong?

Unrecognized Doris type JSON

What You Expected?

Support JSON type

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Bug] The latest connector cannot be found in the Maven repository

Search before asking

  • I had searched in the issues and found no similar issues.

Version

1.3

What's Wrong?

The latest version of the connector cannot be found in the Maven repository.

What You Expected?

doris-spark-connector should be findable in the Maven repository.

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Enhancement] Provide a JSON-format data load option like the Flink connector

Search before asking

  • I had searched in the issues and found no similar issues.

Description

Source data containing Chinese words may cause problems such as:
Reason: actual column number is less than schema column number. actual number: 10, column separator: [ ], line delimiter: [
],
Could Spark, like the Flink connector, load data in JSON format to avoid this problem? JSON-format import:
'sink.properties.format' = 'json' 'sink.properties.read_json_by_line' = 'true'
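
A sketch of the requested behavior on the Spark side, reusing the Flink-style properties quoted above; whether the Spark connector honors them end to end is exactly what this issue asks (table and host are placeholders):

df.write.format("doris")
  .option("doris.table.identifier", "db.table")
  .option("doris.fenodes", "127.0.0.1:8030")
  .option("user", "root")
  .option("password", "")
  .option("sink.properties.format", "json")
  .option("sink.properties.read_json_by_line", "true")
  .save()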

Solution

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Feature] Support new Spark 3.3 features

Search before asking

  • I had searched in the issues and found no similar issues.

Description

  • Implement the Spark catalog API so that Spark SQL can manage tables in Apache Doris (e.g. create table, drop table).
  • Implement the Spark DataSource V2 API, e.g. aggregate-query pushdown and limit-query pushdown to Doris.

Some existing implementations of Spark DS V2 connectors:

Use case

No response

Related issues

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

[Improvement] Stream load import should support JSON data

Search before asking

  • I had searched in the issues and found no similar issues.

Version

3.1_2.12-1.0.1

What's Wrong?

If a data field contains \n or \t separator characters, too many filtered rows appear when importing data.

What You Expected?

When importing, the data should be assembled into JSON format and stream load should import it as JSON.
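
Until JSON import is supported, a common mitigation on the CSV path is choosing separators that cannot occur in the data; a minimal sketch, assuming stream load's hex-escaped separator syntax is passed through via sink.properties.* (table and host are placeholders):

df.write.format("doris")
  .option("doris.table.identifier", "db.table")
  .option("doris.fenodes", "127.0.0.1:8030")
  .option("user", "root")
  .option("password", "")
  // invisible separators are far less likely to collide with field contents
  .option("sink.properties.column_separator", "\\x01")
  .option("sink.properties.line_delimiter", "\\x02")
  .save()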

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct
