python - PySpark socket write error


I'm trying to read a file (a ~600 MB CSV) with PySpark and get the following error.

Surprisingly, the same code works correctly in Scala.

I found this issue page, https://issues.apache.org/jira/browse/SPARK-12261, but the suggestions there did not work for me.

The reading code:

    import os
    from pyspark import SparkContext
    from pyspark import SparkConf

    datasetDir = 'd:\\datasets\\movielens\\ml-latest\\'
    ratingFile = 'ratings.csv'

    conf = SparkConf().setAppName("movie_recommendation-server").setMaster('local[2]')
    sc = SparkContext(conf=conf)

    ratingRDD = sc.textFile(os.path.join(datasetDir, ratingFile))
    print(ratingRDD.take(1)[0])

I am getting this error:

    16/04/25 09:00:04 ERROR PythonRunner: Python worker exited unexpectedly (crashed)
    java.net.SocketException: Connection reset by peer: socket write error
        at java.net.SocketOutputStream.socketWrite0(Native Method)
        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
        at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
        at java.io.DataOutputStream.write(DataOutputStream.java:107)
        at java.io.FilterOutputStream.write(FilterOutputStream.java:97)
        at org.apache.spark.api.python.PythonRDD$.writeUTF(PythonRDD.scala:622)
        at org.apache.spark.api.python.PythonRDD$.org$apache$spark$api$python$PythonRDD$$write$1(PythonRDD.scala:442)
        at org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$1.apply(PythonRDD.scala:452)
        at org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$1.apply(PythonRDD.scala:452)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:452)
        at org.apache.spark.api.python.PythonRunner$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:280)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1765)
        at org.apache.spark.api.python.PythonRunner$WriterThread.run(PythonRDD.scala:239)
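The trace shows the JVM crashing while streaming partition data to the Python worker, so one thing worth ruling out first is the CSV file itself, independent of Spark. A minimal sketch in plain Python, using a hypothetical in-memory sample with the MovieLens `ratings.csv` column layout (swap the `StringIO` for `open(...)` on the real path to check the actual file):

```python
import csv
import io

# Hypothetical sample mimicking the first lines of MovieLens ratings.csv;
# replace the StringIO with open('d:\\datasets\\movielens\\ml-latest\\ratings.csv',
# newline='', encoding='utf-8') to test the real file.
sample = io.StringIO("userId,movieId,rating,timestamp\n"
                     "1,31,2.5,1260759144\n")

reader = csv.reader(sample)
header = next(reader)      # first line: the column names
first_row = next(reader)   # second line: the first rating record
print(header)              # ['userId', 'movieId', 'rating', 'timestamp']
print(first_row)           # ['1', '31', '2.5', '1260759144']
```

If plain Python reads the file fine, the problem lies in the JVM-to-Python-worker handoff rather than in the data.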

