Apache Pig - Count the number of vowels in a file


Can anyone help me with this? Thanks very much. Here is my code:

    g = LOAD 'input.txt' AS (line:chararray);
    b = FOREACH g GENERATE FLATTEN(STRSPLIT(LOWER(line), '(?<=.)(?=.)')) AS s:chararray;
    c = FOREACH b GENERATE FLATTEN(TOBAG(*)) AS letter;
    result = FILTER c BY (letter == 'a' OR letter == 'e' OR letter == 'i' OR letter == 'o' OR letter == 'u');
    e = GROUP result BY letter;
    f = FOREACH e GENERATE group, COUNT(result);
    DUMP f;

First tokenize each line into words, then split the words into characters. Use REPLACE to insert a delimiter between the characters of each word. Instead of using TOBAG(*), use TOKENIZE to split the characters along the inserted delimiter. Filter for a, e, i, o, u, then group by character and get the counts.

Pig script

    a = LOAD 'test4.txt' AS (line:chararray);
    -- split each line into words
    b = FOREACH a GENERATE FLATTEN(TOKENIZE(line)) AS words;
    -- put '|' between the characters of each word, then split on '|'
    c = FOREACH b GENERATE FLATTEN(TOKENIZE(REPLACE(LOWER(words),'','|'),'|')) AS letter;
    -- keep only the vowels
    d = FILTER c BY (letter == 'a' OR letter == 'e' OR letter == 'i' OR letter == 'o' OR letter == 'u');
    e = GROUP d BY letter;
    f = FOREACH e GENERATE group AS letter, COUNT(d.letter) AS total;
    DUMP f;
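As a possible simplification, the five equality checks in the filter could be collapsed into a single regular expression with the MATCHES operator. The following is only a sketch under the same assumptions as the script above (input file 'test4.txt', one line of text per record); the alias `word` is just an illustrative name, and this variant is not part of the original answer.

```pig
-- Sketch: same pipeline as the script above, with the vowel test as one regex.
a = LOAD 'test4.txt' AS (line:chararray);

-- one record per word
b = FOREACH a GENERATE FLATTEN(TOKENIZE(line)) AS word;

-- insert '|' around every character, then split on '|' to get one record per character
c = FOREACH b GENERATE FLATTEN(TOKENIZE(REPLACE(LOWER(word), '', '|'), '|')) AS letter;

-- MATCHES must match the whole string, so '[aeiou]' keeps exactly the single-character vowels
d = FILTER c BY letter MATCHES '[aeiou]';

e = GROUP d BY letter;
f = FOREACH e GENERATE group AS letter, COUNT(d) AS total;
DUMP f;
```

Because MATCHES applies the pattern to the entire field, no anchors are needed; a vowel that does not occur in the input simply produces no row in f.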


