Comments (2)
Hello!
Apologies for the delayed response! We've been hard at work on some new features and our 12/16 release. Thanks for the positive feedback - it's always awesome to hear things are working well.
In the long term, we have a few things in mind. Unfortunately, I cannot go too much into our long-term roadmap (timing or commitment), but we've definitely discussed a few things:
- When initialization actions should run (before daemons, after, both?)
- Parameters to init actions
- Alternatives to initialization actions
- Templatizing clusters
I think we'll likely expand the richness of init actions (parameters, etc.) and they will be a core part of Dataproc in the long term. We are also thinking about ways we can expand deployment and provisioning in the future, especially if other frameworks (like Flink) become popular to use alongside Spark/Hadoop (YARN + HDFS).
So, the tl;dr is it's something we have in mind. You're going to see better initialization action support in the future and, in the longer term, some more thinking (and possibly features) around deployment. You can always ping us at dataproc-feedback@google with ideas and suggestions - the dev team and I are on the list and always appreciate feedback, insights, and ideas.
Cheers!
James
(Product Manager, Cloud Dataproc)
from initialization-actions.
Hi James (@evilsoapbox), thank you for this thoughtful reply, and I look forward to what you are developing; it sounds exciting! Hopefully along the way the open source community can also help contribute to expanding the functionality of Dataproc (through init actions or more). As Dataproc already provides a solid service for computation using open computational frameworks like Spark, it just makes sense to help out!
Just to clarify, would you prefer that future ideas and suggestions be emailed instead of posted as issues here? For instance, I was going to open an issue to ask for automated testing and had a few ideas for that. :)
Thanks again and cheers!