org.apache.thrift.transport.TTransportException error while reading a large JSON file in Zeppelin (Scala)
I am trying to read a large JSON file (1.5 GB) using Zeppelin and Scala.
Zeppelin is running Spark in local mode, installed on Ubuntu in a VM with 10 GB of RAM. I have allotted 8 GB to spark.executor.memory.
My code is below:
    val inputfileweather = "/home/shashi/incubator-zeppelin-master/data/ai/weather.json"
    val temp = sqlContext.read.json(inputfileweather)
I am getting the following error:
org.apache.thrift.transport.TTransportException
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
    at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
    at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
    at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_interpret(RemoteInterpreterService.java:241)
    at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.interpret(RemoteInterpreterService.java:225)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:229)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
    at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:229)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:171)
    at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:328)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
The error occurs because of a problem running the Spark interpreter: Zeppelin could not connect to the interpreter process.
You have to check the logs located in /path/to/zeppelin/logs/*.out to know what is happening. Perhaps in the interpreter logs you will see an OOM (OutOfMemoryError).
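A minimal sketch of that check, written in Scala since that is what the notebook uses: it lists the *.out files in a log directory and returns every line that mentions OutOfMemoryError. The object and method names are made up for illustration, and /path/to/zeppelin/logs is only the placeholder from above, so pass your real log directory as the first argument. A plain grep over the same files does the same job.

```scala
import java.io.File
import scala.io.{Codec, Source}

// Hypothetical helper: scan Zeppelin's *.out log files for OutOfMemoryError lines.
object ZeppelinLogScan {
  // Returns (fileName, line) for every line in <logDir>/*.out that mentions OutOfMemoryError.
  def findOom(logDir: File): Seq[(String, String)] = {
    // listFiles() returns null if the directory does not exist, so wrap it in Option.
    val outFiles = Option(logDir.listFiles()).getOrElse(Array.empty[File])
      .filter(f => f.isFile && f.getName.endsWith(".out"))
      .sortBy(_.getName)
      .toSeq
    outFiles.flatMap { f =>
      // ISO-8859-1 maps every byte to a char, so unusual bytes in a log never abort the read.
      val src = Source.fromFile(f)(Codec.ISO8859)
      try src.getLines().filter(_.contains("OutOfMemoryError")).map(l => (f.getName, l)).toList
      finally src.close()
    }
  }

  def main(args: Array[String]): Unit = {
    // Placeholder path; replace it (or pass an argument) with your actual Zeppelin log directory.
    val dir = new File(args.headOption.getOrElse("/path/to/zeppelin/logs"))
    val hits = findOom(dir)
    if (hits.isEmpty) println(s"No OutOfMemoryError found in $dir/*.out")
    else hits.foreach { case (name, line) => println(s"$name: $line") }
  }
}
```

If this finds a "Java heap space" line in the Spark interpreter's log, the interpreter JVM most likely died from running out of memory, which would explain why Zeppelin lost the Thrift connection to it.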
I think 8 GB of executor memory on a VM with 10 GB of RAM is unreasonable (and how many executors are you starting?). You have to consider the driver memory as well.
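To make that arithmetic concrete, here is a rough sketch that adds up what the Spark JVMs can claim on the VM. Only the 10 GB of RAM and the 8 GB executor come from the question; one executor, a 1 GB driver (Spark's default spark.driver.memory is 1g) and a 2 GB reserve for the OS and the Zeppelin server are assumptions, and the helper names are hypothetical. Also keep in mind that in local mode the executor runs inside the driver JVM, so there the driver memory setting is the one that actually bounds the heap.

```scala
// Hypothetical helper: a back-of-the-envelope memory budget. All sizes are in GB.
object MemoryBudget {
  // Memory the driver, the executors and everything else on the VM may try to use.
  // osReserveGb (OS, Zeppelin server, page cache) is an assumed figure, not something Spark reports.
  def committedGb(driverGb: Double, executorGb: Double, numExecutors: Int, osReserveGb: Double): Double =
    driverGb + executorGb * numExecutors + osReserveGb

  // True if the committed total stays within the VM's physical RAM.
  def fits(vmRamGb: Double, driverGb: Double, executorGb: Double, numExecutors: Int, osReserveGb: Double): Boolean =
    committedGb(driverGb, executorGb, numExecutors, osReserveGb) <= vmRamGb

  def main(args: Array[String]): Unit = {
    // 10 GB VM and 8 GB executor from the question; the other values are assumptions.
    val total = committedGb(driverGb = 1.0, executorGb = 8.0, numExecutors = 1, osReserveGb = 2.0)
    val ok = fits(10.0, 1.0, 8.0, 1, 2.0)
    println(f"committed: $total%.1f GB of 10 GB, fits = $ok")
  }
}
```

Even with a single executor and the default driver size, the assumed total already exceeds the VM, so lowering the executor setting and giving the driver a realistic share is a safer starting point.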