Comments (20)
The more feedback and participation from .NET for Spark users, the more commitment we could expect from the dotnet team.
Reading through the issues, PRs, and discussions here, it seems to me that the dotnet team needs MORE participation and feedback to justify MORE commitment to this project.
Nothing to be alarmed about!
We need more active participation from users HERE! Dropping an emoji here and there and raising feedback WILL HELP keep the dotnet team supporting .NET for Spark.
from spark.
It seems the support for .NET for Spark is back :-) We will be getting the .NET 6 version soon.
@lloydjatkinson
FYI, the "Synapse Analytics" PG has started putting that warning all over the place. It even appears in some docs that have absolutely nothing to do with Synapse.
Technically it is not even true, since the .Net 6 PR is in place (see #1112). At this point the communication appears to be willfully incorrect.
What happened is that I had run into a bug on the Synapse-Spark platform, which did NOT occur on any other Spark platform. The root cause of the bug was not actually even related to .Net, in the final analysis. Instead of investigating and fixing the bug, this Synapse PG decided they didn't want to support .Net anymore. So they killed .Net in Synapse 3.3, and they started down the scorched-earth path to make sure nobody ever tries to use .Net-for-Spark on any of the competing cloud platforms either (e.g. Databricks or HDI).
Given that they are now making people fearful about this project, they are certainly accomplishing their mission!
I've used .Net for Spark for many years and it is a beautiful thing. The PR for .Net 6 is going to allow us to look forward to many more years. This project basically allows .Net to piggyback on the Python bindings, and prevents the need for a .Net programmer to resort to another (inferior) language in order to do our MPP data transformations and ETLs.
I am able to use UDFs. I'm compiling Microsoft.Spark.Worker with .Net 7, and I would be using the jar from the NuGet package except that I needed to run with Spark 3.2.2, not 3.2.1, so I had to build the jars myself. I use the latest NuGet package and copy the custom jar into the output directory whenever I build. The main issue I ran into was dependency resolution, but compiling Microsoft.Spark.Worker and my Spark program with `dotnet publish --self-contained` solved that.
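The `dotnet publish --self-contained` workaround above can be sketched as shell commands. This is only an illustration: the project paths (`src/csharp/Microsoft.Spark.Worker`, `MyApp`), target framework, and runtime identifier are assumptions you would adapt to your own layout and cluster.

```shell
# Illustrative build steps (paths, TFM, and RID are assumptions, not canonical).
RID=linux-x64   # runtime identifier of the cluster nodes
TFM=net7.0      # target framework the projects compile against

if command -v dotnet >/dev/null 2>&1; then
  # Publish the worker self-contained so executors need no preinstalled .NET runtime.
  dotnet publish src/csharp/Microsoft.Spark.Worker -c Release -f "$TFM" -r "$RID" --self-contained
  # Publish the Spark application the same way so both sides resolve dependencies identically.
  dotnet publish MyApp -c Release -f "$TFM" -r "$RID" --self-contained
else
  echo "dotnet SDK not found; commands shown for illustration only"
fi
```

Publishing both the worker and the application self-contained avoids the runtime/dependency mismatches the comment above describes.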
* Why does the provided `microsoft-spark-3-2_2.12-2.1.1.jar` from NuGet not allow you to run Spark 3.2.2?
The most recent nuget package is from 9 months ago https://www.nuget.org/packages/Microsoft.Spark/2.1.1
And the change allowing 3.2.2 was merged 1 month ago #1122
The .Net for Spark team will need to do a new release.
* Does the newly built `microsoft-spark-3-2_2.12-2.1.1.jar` support 3.2.3?
If you just built it locally, yes, it should. If you try to spark-submit with an unsupported version, it will fail quickly and tell you what versions it does support.
* What needs to be done to create a `microsoft-spark-3-3` folder in the `spark\src\scala` folder?
I don't know. Spark 3.3 is not supported yet so I don't think it'd produce that jar.
* What additional modifications are needed to make the existing spark-3.3 compatible with `microsoft-spark-3-3`?
I have no idea. I don't have time to dig around the internals or look through previous PRs that added support for previous versions, etc.!
* What changes have you made to overcome `#pragma warning disable SYSLIB0011`?
The same changes that got merged in with the .Net 6 PR 🤷
I am no longer maintaining this repo (no write access), but I think @AFFogarty / @suhsteve should be able to answer your question.
Just curious, hoping for a small comment
Do you see a possibility of using IKVM to eventually make Spark work in .NET?
IKVM now supports .NET 6.
Anyway, what has changed your mind @GeorgeS2019? There are recent PRs and commits, but how does that demonstrate that Microsoft still supports .NET for Spark? I'm really worried about this, because we adopted .NET for Spark a few months ago.
Questions for ALL!
How many of you are able to get UDFs to work? If so, with which version of the Microsoft Spark jar (e.g. microsoft-spark-3-2.jar) and which version of Microsoft.Spark.dll?
@dragorosson
I am still learning.

> I needed to run with Spark 3.2.2 not 3.2.1 so I had to build the jars myself.

`microsoft-spark-3-2_2.12-2.1.1.jar` is built from the latest commit when running `mvn clean package`; a jar with the same name is distributed with NuGet.
Sorry for my ignorance:
- Why does the provided `microsoft-spark-3-2_2.12-2.1.1.jar` from NuGet not allow you to run Spark 3.2.2?
```scala
object DotnetRunner extends Logging {
  private val DEBUG_PORT = 5567
  private val supportedSparkMajorMinorVersionPrefix = "3.2"
  private val supportedSparkVersions = Set[String]("3.2.0", "3.2.1", "3.2.2", "3.2.3")
  // ...
}
```
- Does the newly built `microsoft-spark-3-2_2.12-2.1.1.jar` support 3.2.3?
- What needs to be done to create a `microsoft-spark-3-3` folder in the `spark\src\scala` folder?
- What additional modifications are needed to make the existing spark-3.3 compatible with `microsoft-spark-3-3`?
Ref:
Support for 3.2.2
Support for 3.2.3
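The version gate in the `DotnetRunner` snippet above is what makes an older jar refuse a newer Spark patch release. As a rough sketch, assuming nothing about the Scala internals beyond the snippet itself, the check behaves like this Python (the function name is hypothetical, not part of any real API):

```python
# Python sketch of the version gate in DotnetRunner (the real check is Scala;
# the function name here is illustrative only).
SUPPORTED_PREFIX = "3.2"
SUPPORTED_VERSIONS = {"3.2.0", "3.2.1", "3.2.2", "3.2.3"}

def validate_spark_version(spark_version: str) -> None:
    """Fail fast with the list of supported versions, as spark-submit does."""
    if (not spark_version.startswith(SUPPORTED_PREFIX)
            or spark_version not in SUPPORTED_VERSIONS):
        raise RuntimeError(
            f"Unsupported Spark version {spark_version}; "
            f"supported: {sorted(SUPPORTED_VERSIONS)}"
        )

validate_spark_version("3.2.2")  # accepted by a jar built from current source
```

This would also explain why the 2.1.1 jar from NuGet rejects Spark 3.2.2: it was published before the change adding "3.2.2" to the supported set was merged, so the set compiled into that jar is presumably shorter.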
What changes have you made to overcome `#pragma warning disable SYSLIB0011`?
FYI: UDF
UDF now works. Here is the key
As someone that has never used Apache Spark and seeing this big warning in the first page of the docs, and then seeing this issue, I do not feel confident in using it if it's always going to be so far behind it needs a special warning...
.NET for Apache Spark targets an out of support version of .NET (.NET Core 3.1). For more details see the .NET Support Policy.
OK, this goes over my head a bit and sounds a bit political? Is Microsoft or the .NET team aware of this? What is the plan going forward?
Hi Team: Is there any timeline for the above PR and a possible new release of this library? We are heavily invested in this library (because of our existing .Net dependency) and have been waiting a long time to see full compatibility with .Net 6.0 and Spark 3.2 (or above)! I raised a couple of feature requests (#958, #983, #1100) a while back, but there has not been much traction on those. Could you please let us know whether this library will be supported going forward, and a possible timeframe for an updated version?
cc @imback82
Hi @imback82 , @Niharikadutta , @GeorgeS2019 , @dbeavon,
Could you please let me know if we can depend on this library and if there's any chance we'll see a new version of this library at all?
Thanks
Hi @imback82 , @Niharikadutta , @GeorgeS2019 , @dbeavon, Could you please let me know if we can depend on this library and if there's any chance we'll see a new version of this library at all? Thanks
^ + @suhsteve @AFFogarty
@dragorosson
I hope everyone here is clear: I am independent, with nothing to do with dotnet or Microsoft.
@imback82 Since you are not bound by a Microsoft NDA, I think it's easier from your side. As I investigated, the Spark.NET team is inside the Microsoft Fabric team, which is different from the Azure Synapse team (that one belongs to the Azure SQL team, as far as I know; correct me if I'm wrong).
Short Question: Is Microsoft Fabric dying?