amunategui.github.io's Issues
Very helpful post on setting up Flask on EC2. Saved my ass.
Just wanted to say thanks.
Trouble loading huge data sets on EC2 Spark Clusters
Hi Sir,
I have followed your guide, and it was a great help!
I have launched one master and two slaves with 8 GB memory each on AWS EC2.
I want to load a 220 MB CSV file from an S3 bucket and convert it to a Spark DataFrame, but I am encountering a heap-size error. The code and the error follow.
CODE:
library("SparkR", lib.loc = "/root/spark/R/lib")
library(data.table)
Sys.setenv(SPARK_HOME = "/root/spark")
sc <- sparkR.init()
sqlContext <- sparkRSQL.init(sc)
data_load <- read.csv("https://s3-ap-southeast-1.amazonaws.com/foldername/filename.csv")
dfr <- createDataFrame(sqlContext, data_load)
ERROR:
java.lang.OutOfMemoryError: Java heap space
at org.apache.spark.api.r.SerDe$.readBytes(SerDe.scala:95)
at org.apache.spark.api.r.SerDe$$anonfun$readBytesArr$1.apply(SerDe.scala:140)
at org.apache.spark.api.r.SerDe$$anonfun$readBytesArr$1.apply(SerDe.scala:140)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.immutable.Range.foreach(Range.scala:141)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.AbstractTraversable.map(Traversable.scala:105)
at org.apache.spark.api.r.SerDe$.readBytesArr(SerDe.scala:140)
at org.apache.spark.api.r.SerDe$.readArray(SerDe.scala:172)
at org.apache.spark.api.r.SerDe$.readTypedObject(SerDe.scala:74)
at org.apache.spark.api.r.SerDe$.readObject(SerDe.scala:60)
at org.apache.spark.api.r.RBackendHandler$$anonfun$readArgs$1.apply(RBackendHandler.scala:182)
at org.apache.spark.api.r.RBackendHandler$$anonfun$readArgs$1.apply(RBackendHandler.scala:181)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.immutable.Range.foreach(Range.scala:141)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.AbstractTraversable.map(Traversable.scala:105)
at org.apache.spark.api.r.RBackendHandler.readArgs(RBackendHandler.scala:181)
at org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandler.scala:123)
at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:86)
at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:38)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
Error in if (returnStatus != 0) { : argument is of length zero
Please guide me on this. Thanks in advance!
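The heap error comes from reading the whole 220 MB file into the R driver with read.csv and then pushing it through the R-to-JVM serializer via createDataFrame. A sketch of an alternative that lets the executors read from S3 directly, assuming SparkR 1.x with the com.databricks spark-csv package (the package version, the s3n:// scheme, and the bucket/key names are assumptions to adapt to your setup):

```r
library("SparkR", lib.loc = "/root/spark/R/lib")

Sys.setenv(SPARK_HOME = "/root/spark")

# sparkPackages ships spark-csv to the executors (available in Spark 1.5+);
# the artifact version here is an assumption.
sc <- sparkR.init(sparkPackages = "com.databricks:spark-csv_2.10:1.4.0")
sqlContext <- sparkRSQL.init(sc)

# The executors pull the file from S3 themselves, so the 220 MB never
# passes through the R driver's heap. Path is a placeholder.
dfr <- read.df(sqlContext, "s3n://foldername/filename.csv",
               source = "com.databricks.spark.csv", header = "true")
```

If you do need to keep createDataFrame, bumping the driver heap (e.g. sparkEnvir = list(spark.driver.memory = "4g") in sparkR.init) may also help, at the cost of still moving all the data through R.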
caret + tbl_df incompatibility
Thank you for your helpful introduction to modeling binary outcomes using caret. I reproduced your example successfully, but when I followed the same steps with my own data, I hit a frustrating issue: it failed at the GBM train() line and dropped into an unhelpful debug mode. It turns out the issue stems from an incompatibility between caret and the increasingly popular tbl_df (the tidyverse 'tibble'). It is documented in topepo/caret#145, topepo/caret#611, and http://stackoverflow.com/questions/29802216/caret-error-using-gbm-but-not-without-caret. The cause is not intuitive, but solving the problem is easy:
df <- as.data.frame(df)
It is a peculiar coincidence that the first example in your most helpful guide leads to this error. If you revise your guide, it would be helpful to include a mention of this issue. Thanks again!
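For anyone hitting the same wall, a minimal sketch of the workaround (using mtcars as a stand-in for your own data; assumes the tibble package is installed):

```r
library(tibble)

df <- as_tibble(mtcars)    # stands in for a tbl_df built by a dplyr pipeline
class(df)                  # c("tbl_df", "tbl", "data.frame")

df <- as.data.frame(df)    # the one-line fix, applied before calling caret::train()
class(df)                  # "data.frame"
```

After the conversion, train() sees a plain data.frame and the GBM step proceeds as in the guide.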
fork as template
Hi,
I am not sure if this is the correct platform, but is it alright for me to fork this repo as a template for my own blog/profile?
Regards
Germayne