Extracting text between 2 specific strings with multiple occurrences in bash

- May 15, 2015

I have a big XHTML file with lots of junk text that I don't need. I need whatever text lies between 2 specific strings that occur many times within the file, e.g.:

<html>
  <xyz> unneeded text </xyz>
  <mytag> important text1 </mytag>
  <xyz> unneeded text </xyz>
  <xyz> unneeded text </xyz>
  <mytag> important text2 </mytag>
  <mytag> important text3 </mytag>
  <xyz> unneeded text </xyz>
</html>

my output should be:

important text1
important text2
important text3

I need to do this using a bash script.

Thanks for the help.

Using regex on XML is risky, particularly with a line-based text processing tool like grep. You cannot be sure the result is correct.
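To make the risk concrete, here is a small sketch of how a line-based approach can silently drop data. The sample file and the helper name `extract_with_grep` are made up for illustration. The pattern only matches an element that opens and closes on the same line, so an element whose text wraps onto a second line is never reported.

```shell
#!/usr/bin/env bash
# Hypothetical sample: the second <mytag> element is split across two lines.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<html>
<mytag> important text1 </mytag>
<mytag> important
text2 </mytag>
</html>
EOF

# Line-based attempt (hypothetical helper): keep only <mytag>...</mytag>
# matches that fit on a single line, then strip the tags with sed.
extract_with_grep() {
  grep -o '<mytag>[^<]*</mytag>' "$1" | sed -e 's/<mytag>//' -e 's/<\/mytag>//'
}

# Prints only " important text1 "; the element spanning two lines is missed
# and no error is raised.
extract_with_grep "$sample"
rm -f "$sample"
```

Attributes on the tag, nested markup, or comments containing the tag name would trip up the same pattern in other ways.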

If the input is valid XML, I would go the XML way: an XPath expression.

With a tool like xmlstarlet, you can do:

xmlstarlet sel -t -v "//mytag/text()" file.xml

It gives the desired output.
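Since the question asks for a bash script, the command can be wrapped roughly as follows. This is a sketch, not the answer's exact command: it swaps `-v "//mytag/text()"` for a `-m` loop with `normalize-space(.)` so that the padding spaces around each value are trimmed and every match lands on its own line. The function name `extract_mytag` and the sample file are assumptions for illustration.

```shell
#!/usr/bin/env bash
# extract_mytag is a hypothetical helper name, not part of xmlstarlet.
extract_mytag() {
  # -t: start a template
  # -m: iterate over every <mytag> element
  # -v: print an XPath value (here the whitespace-normalized text)
  # -n: emit a newline after each match
  xmlstarlet sel -t -m '//mytag' -v 'normalize-space(.)' -n "$1"
}

# Demo on a hypothetical sample file; skipped if xmlstarlet is not installed.
if command -v xmlstarlet >/dev/null 2>&1; then
  sample=$(mktemp)
  printf '%s\n' '<html> <xyz> unneeded text </xyz> <mytag> important text1 </mytag> <mytag> important text2 </mytag> </html>' > "$sample"
  extract_mytag "$sample"
  rm -f "$sample"
fi
```

One caveat: real XHTML files usually declare a default namespace, and then `//mytag` matches nothing. In that case the namespace would have to be bound with `-N` (for example `-N x=http://www.w3.org/1999/xhtml`) and the expression written as `//x:mytag`.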

You can also use xmllint; however, you will need further filtering on its output.
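One possible way to handle xmllint, sketched under the assumption that your build supports the `--xpath` option: how xmllint separates multiple matched nodes has differed between libxml2 versions, so instead of filtering a node-set dump this version counts the elements and queries them one at a time. The function name `extract_mytag_xmllint` and the sample file are made up for illustration.

```shell
#!/usr/bin/env bash
# extract_mytag_xmllint is a hypothetical helper name, not part of xmllint.
extract_mytag_xmllint() {
  local file=$1 count i
  # count() evaluates to a number, which xmllint prints on its own line.
  count=$(xmllint --xpath 'count(//mytag)' "$file")
  for ((i = 1; i <= count; i++)); do
    # Query one element at a time; normalize-space() returns a string with
    # the surrounding padding trimmed, printed followed by a newline.
    xmllint --xpath "normalize-space((//mytag)[$i])" "$file"
  done
}

# Demo on a hypothetical sample file; skipped if xmllint is not installed.
if command -v xmllint >/dev/null 2>&1; then
  sample=$(mktemp)
  printf '%s\n' '<html> <xyz> unneeded text </xyz> <mytag> important text1 </mytag> <mytag> important text2 </mytag> </html>' > "$sample"
  extract_mytag_xmllint "$sample"
  rm -f "$sample"
fi
```

This re-parses the file once per element, which is simple but slow on a very large file; for big inputs the xmlstarlet approach is likely the better fit.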
