org.apache.thrift.transport.TTransportException error while reading a large JSON file in Zeppelin (Scala)


I am trying to read a large JSON file (1.5 GB) using Zeppelin and Scala.

Zeppelin is running on Spark in local mode, installed on Ubuntu in a VM with 10 GB of RAM. I have allotted 8 GB to spark.executor.memory.

My code is below:

val inputFileWeather = "/home/shashi/incubator-zeppelin-master/data/ai/weather.json"
val temp = sqlContext.read.json(inputFileWeather)

I am getting the following error:

org.apache.thrift.transport.TTransportException
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
    at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
    at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
    at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_interpret(RemoteInterpreterService.java:241)
    at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.interpret(RemoteInterpreterService.java:225)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:229)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
    at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:229)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:171)
    at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:328)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

The error you got is due to a problem running the Spark interpreter, so Zeppelin could not connect to the interpreter process.

You have to check the logs located in /path/to/zeppelin/logs/*.out to know exactly what is happening. Perhaps in the interpreter logs you will see an OOM (out-of-memory error).

I think 8 GB of executor memory on a VM with 10 GB of RAM is unreasonable (and how many executors are you starting?). You have to consider driver memory as well.
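
To see how much heap the JVM running your Spark code really has, a small check like the following may help. It is only a sketch: HeapCheck and its members are hypothetical names, not part of Zeppelin or Spark. It assumes the code runs in a Zeppelin Spark paragraph with a local master, where the driver and the executor threads share a single JVM.

```scala
// Hypothetical helper (not a Zeppelin or Spark API): reports the heap of the
// JVM this code runs in. Assumption: with a local master, this JVM hosts both
// the Spark driver and its executor threads.
object HeapCheck {
  private val bytesPerMb = 1024L * 1024L

  // Upper bound on the heap, normally set by the JVM's -Xmx option.
  def maxHeapMb: Long = Runtime.getRuntime.maxMemory / bytesPerMb

  // Heap currently in use: allocated heap minus the free part of it.
  def usedHeapMb: Long = {
    val rt = Runtime.getRuntime
    (rt.totalMemory - rt.freeMemory) / bytesPerMb
  }

  // One-line summary suitable for printing in a notebook paragraph.
  def report: String = s"max heap: $maxHeapMb MB, used heap: $usedHeapMb MB"
}
```

Running println(HeapCheck.report) in a paragraph prints both figures. If the maximum is far below the 8 GB given to spark.executor.memory, that setting is probably not what sizes this JVM, and the driver memory is the value to look at.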

