Hello,
Good job on Spark.jl.
I have an issue. I tried to learn Spark.jl and followed the documentation, which says:
> This is a quick introduction into the Spark.jl core functions. It closely follows the official PySpark tutorial and copies many examples verbatim. In most cases, PySpark docs should work for Spark.jl as is or with little adaptation.
I found that it is possible to create a PySpark DataFrame from a pandas DataFrame, so I tried to create a Spark DataFrame from a DataFrames.jl DataFrame:
```julia
julia> df = spark.createDataFrame([
           Row(a=1, b=2.0, c="string1", d=Date(2000, 1, 1), e=DateTime(2000, 1, 1, 12, 0)),
           Row(a=2, b=3.0, c="string2", d=Date(2000, 2, 1), e=DateTime(2000, 1, 2, 12, 0)),
           Row(a=4, b=5.0, c="string3", d=Date(2000, 3, 1), e=DateTime(2000, 1, 3, 12, 0))
       ])
+---+---+-------+----------+-------------------+
|  a|  b|      c|         d|                  e|
+---+---+-------+----------+-------------------+
|  1|2.0|string1|2000-01-01|2000-01-01 13:00:00|
|  2|3.0|string2|2000-02-01|2000-01-02 13:00:00|
|  4|5.0|string3|2000-03-01|2000-01-03 13:00:00|
+---+---+-------+----------+-------------------+

julia> println(df)
+---+---+-------+----------+-------------------+
|  a|  b|      c|         d|                  e|
+---+---+-------+----------+-------------------+
|  1|2.0|string1|2000-01-01|2000-01-01 13:00:00|
|  2|3.0|string2|2000-02-01|2000-01-02 13:00:00|
|  4|5.0|string3|2000-03-01|2000-01-03 13:00:00|
+---+---+-------+----------+-------------------+

julia> df_dataframes = DataFrames.DataFrame(A=1:4, B=["M", "F", "F", "M"])
4×2 DataFrame
 Row │ A      B
     │ Int64  String
─────┼───────────────
   1 │     1  M
   2 │     2  F
   3 │     3  F
   4 │     4  M

julia> df = spark.createDataFrame(df_dataframes)
ERROR: MethodError: no method matching createDataFrame(::SparkSession, ::DataFrames.DataFrame)
Closest candidates are:
  createDataFrame(::SparkSession, ::Vector{Vector{Any}}, ::Union{String, Vector{String}}) at ~/.julia/packages/Spark/89BUd/src/session.jl:92
  createDataFrame(::SparkSession, ::Vector{Row}) at ~/.julia/packages/Spark/89BUd/src/session.jl:98
  createDataFrame(::SparkSession, ::Vector{Row}, ::Union{String, Vector{String}}) at ~/.julia/packages/Spark/89BUd/src/session.jl:87
  ...
Stacktrace:
 [1] (::Spark.DotChainer{SparkSession, typeof(Spark.createDataFrame)})(args::DataFrames.DataFrame)
   @ Spark ~/.julia/packages/Spark/89BUd/src/chainable.jl:13
 [2] top-level scope
   @ REPL[21]:1
```
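As a possible workaround, the error message lists a `createDataFrame(::SparkSession, ::Vector{Vector{Any}}, ::Union{String, Vector{String}})` candidate, so converting the DataFrames.jl table by hand might work. This is only a sketch, assuming (I have not confirmed this in the Spark.jl sources) that this method expects row-wise data plus column names, like PySpark's `createDataFrame`:

```julia
using DataFrames

df_dataframes = DataFrame(A=1:4, B=["M", "F", "F", "M"])

# Flatten each DataFrames.jl row into a Vector{Any} (row-wise layout)
rowdata = [Vector{Any}(collect(r)) for r in eachrow(df_dataframes)]

# names(::DataFrame) already returns a Vector{String} of column names
colnames = names(df_dataframes)

# Assumption: this candidate takes row-wise data and column names
df = spark.createDataFrame(rowdata, colnames)
```

If something like this is the intended path, maybe a `createDataFrame(::SparkSession, ::DataFrames.DataFrame)` method wrapping the conversion could be added to Spark.jl?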
Is there a way to create a Spark DataFrame from a DataFrames.jl DataFrame, or do I have to go through Pandas.jl?

Regards