Spark Streaming from Kafka Source Go Back to a Checkpoint or Rewinding -
when streaming spark dstreams consumer kafka source, 1 can checkpoint spark context when app crashes (or affected kill -9), app can recover context checkpoint. if app 'accidentally deployed bad logic', 1 might want rewind last topic+partition+offset replay events kafka topic's partitions' offset positions working fine before 'bad logic'. how streaming apps rewound last 'good spot' (topic+partition+offset) when checkpointing in effect? 
note: in (heart) logs, jay kreps writes using parallel consumer (group) process starts @ diverging kafka offset locations until caught original , killing original. (what 2nd spark streaming process respect starting partition/offset locations?)
sidebar: question may related mid-stream changing configuration check-pointed spark stream similar mechanism may need deployed.
you not going able rewind stream in running sparkstreamingcontext. it's important keep these points in mind (straight docs):
- once context has been started, no new streaming computations can set or added it.
- once context has been stopped, cannot restarted.
- only 1 streamingcontext can active in jvm @ same time.
- stop() on streamingcontext stops sparkcontext. stop streamingcontext, set optional parameter of stop() called stopsparkcontext false.
- a sparkcontext can re-used create multiple streamingcontexts, long previous streamingcontext stopped (without stopping sparkcontext) before next streamingcontext created
instead, going have stop current stream, , create new one. can start stream specific set of offsets using 1 of versions of createdirectstream takes fromoffsets parameter signature map[topicandpartition, long] -- it's starting offset mapped topic , partition.
another theoretical possibility use kafkautils.createrdd takes offset ranges input. "bad logic" started @ offset x , fixed @ offset y. use cases, might want createrdd offsets x y , process results, instead of trying stream.
Comments
Post a Comment