awk - How to count occurrences no matter its case?


table

chr10   10482   10484   0         11   +   ca
chr10   10486   10488   0         12   +   ca
chr10   10487   10489   0         13   +   ca
chr10   10490   10492   0         13   +   ca
chr10   10491   10493   0         12   +   ct
chr10   10494   10496   6.66667   15   +   ca
chr10   10495   10497   6.66667   15   +   cc

I want to count the number of lines in which "ca" is found in column 7, regardless of whether the 2 letters are in upper or lower case.

The desired output is 5.

The 2 commands below give empty output:

cat table | awk ' $7 ==/^[cC][aA]/{++count} END {print count}'
awk 'BEGIN {IGNORECASE = 1} $7==/"CA"/ {++count} END {print count}' table

The command below returns a value of 1:

awk 'BEGIN {IGNORECASE = 1} END {if ($7=="CA"){++count} {print count}}' table

Note: the actual table is tens of millions of lines long, so I do not want to write the table to an intermediate file in order to count. (I need to repeat this task on other files too.)

There is a little problem in your syntax: you either say var == "string" or var ~ regexp, but you are saying var == /"string"/. Using the correct combination makes your command work:

$ awk '$7 ~ /^[cC][aA]/{++count} END {print count+0}' file
5
$ awk 'BEGIN {IGNORECASE = 1} $7=="CA" {++count} END {print count+0}' file
5
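As a side note on why the `==` attempts count nothing: in awk, a regex literal used as a plain value is shorthand for `$0 ~ /regex/`, so it evaluates to 1 or 0, and `$7 == /.../` ends up comparing the field against that number. The sketch below uses a single line from the table (the variable names `line`, `with_eq` and `with_match` are just for illustration) to show the difference:

```shell
# A bare /regex/ used as a value means ($0 ~ /regex/), i.e. 1 or 0.
# The whole line contains "ca", so /ca/ is 1 and $7 gets compared with 1.
line='chr10 10482 10484 0 11 + ca'

with_eq=$(printf '%s\n' "$line" | awk '$7 == /ca/ { ++count } END { print count + 0 }')
with_match=$(printf '%s\n' "$line" | awk '$7 ~ /ca/ { ++count } END { print count + 0 }')

echo "== /ca/ counted: $with_eq"     # 0, because "ca" is not equal to 1
echo "~ /ca/ counted: $with_match"   # 1, because $7 matches the regex
```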

Also, you may want to use toupper() (or tolower()) to check this, instead of using the IGNORECASE flag, which only GNU awk supports:

awk 'toupper($7) == "CA" {++count} END {print count+0}' file

Note the trick of printing count + 0 instead of count. This way, the variable is cast to 0 if it wasn't set before. With this, it prints 0 whenever there are no matches; if you print count alone, it returns an empty string.
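Since the same count has to be repeated on other files, the tolower() variant can be wrapped in a small shell function. This is only a sketch: the function name `count_ca` and the mixed-case sample rows are made up for illustration, and the column number and target string are hard-coded to match the question.

```shell
# Hypothetical helper: count lines whose column 7 equals "ca" in any case.
# tolower() is part of POSIX awk, so the gawk-only IGNORECASE is not needed.
count_ca() {
    awk 'tolower($7) == "ca" { ++count } END { print count + 0 }' "$1"
}

# Made-up sample in mixed case, for illustration only.
sample=$(mktemp)
printf '%s\n' \
    'chr10 10482 10484 0 11 + CA' \
    'chr10 10486 10488 0 12 + ca' \
    'chr10 10491 10493 0 12 + CT' \
    'chr10 10494 10496 6.66667 15 + Ca' > "$sample"

empty=$(mktemp)      # a file with no lines, hence no matches

count_ca "$sample"   # 3 matching lines
count_ca "$empty"    # prints 0 rather than an empty line, thanks to count + 0
```

The function reads each file once and writes no intermediate table, so it can be called in a loop over as many files as needed. Because it compares the whole field, it counts only exact "ca" values, whereas the regex /^[cC][aA]/ would also accept longer fields that merely start with those two letters.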

