Extracting text between 2 specific strings with multiple occurrences in bash


I have a big XHTML file with lots of junk text that I don't need. I need whatever text lies between two specific strings that occur many times within the file, e.g.:

<html> <xyz> unneeded text </xyz> <mytag> important text1 </mytag> <xyz> unneeded text </xyz> <xyz> unneeded text </xyz> <mytag> important text2 </mytag> <mytag> important text3 </mytag> <xyz> unneeded text </xyz> </html> 

My output should be:

important text1 important text2 important text3 

I need to do this with a bash script.

Thanks for the help.

Using regex on XML is risky, particularly with a line-based text-processing tool such as grep. You cannot be sure the result is correct.
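To make that risk concrete, here is a minimal sketch of the regex route. It assumes a grep that supports -o (GNU and BSD grep do; the option is not in POSIX). The function name extract_mytag and the temporary sample file are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: pull the text between <mytag> and </mytag> with grep and sed.
# Assumes the element has no attributes, no nested markup, and that the
# opening and closing tags sit on the same line.
extract_mytag() {
  grep -o '<mytag>[^<]*</mytag>' "$1" |
    sed -e 's/^<mytag>[[:space:]]*//' -e 's/[[:space:]]*<\/mytag>$//'
}

# Hypothetical sample file mirroring the input from the question.
sample=$(mktemp)
printf '%s\n' '<html> <xyz> unneeded text </xyz> <mytag> important text1 </mytag> <xyz> unneeded text </xyz> <mytag> important text2 </mytag> <mytag> important text3 </mytag> </html>' > "$sample"

# Prints one match per line.
extract_mytag "$sample"
rm -f "$sample"
```

This works on the sample, but it silently returns nothing for an element that spans several lines, carries an attribute, or contains nested tags, which is exactly why the result cannot be trusted in general.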

If the input is valid XML, go the XML way: use an XPath expression.

With a tool such as xmlstarlet, you can do:

xmlstarlet sel -t -v "//mytag/text()" file.xml 

This gives the desired output.
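The xmlstarlet command can be wrapped in a small script that also trims the blanks around each match. This is a sketch under stated assumptions: xmlstarlet is installed, the file is well-formed XML, and the elements are not in a namespace. The function name mytag_text and the temporary sample file are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: run the XPath query, then tidy the whitespace of each match.
# Assumes xmlstarlet is installed and the elements are not namespaced.
mytag_text() {
  xmlstarlet sel -t -v '//mytag/text()' "$1" |
    awk 'NF { $1 = $1; print }'   # skip blank lines, trim and squeeze blanks
}

if command -v xmlstarlet >/dev/null 2>&1; then
  # Hypothetical sample file mirroring the input from the question.
  sample=$(mktemp)
  printf '%s\n' '<html> <xyz> unneeded text </xyz> <mytag> important text1 </mytag> <mytag> important text2 </mytag> <mytag> important text3 </mytag> </html>' > "$sample"
  mytag_text "$sample"
  rm -f "$sample"
else
  echo 'xmlstarlet not found; install it to try this sketch' >&2
fi
```

On the sample this should print the three texts, one per line. If the real file declares the XHTML namespace (xmlns="http://www.w3.org/1999/xhtml"), an unprefixed //mytag would not match; xmlstarlet's -N option can bind a prefix for that case, for example -N x="http://www.w3.org/1999/xhtml" with //x:mytag.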

You can also use xmllint, but its output needs further filtering.

