Apache Pig - count the number of vowels in a file
Can anyone help me with this? Thanks very much. Here is my code:
g = load 'input.txt' as (line:chararray);
b = foreach g generate flatten(STRSPLIT(LOWER(line), '(?<=.)(?=.)')) as s:chararray;
c = foreach b generate flatten(TOBAG(*)) as letter;
result = filter c by (letter == 'a' or letter == 'e' or letter == 'i' or letter == 'o' or letter == 'u');
e = group result by letter;
f = foreach e generate group, COUNT(result);
dump f;
First, TOKENIZE the line into words, then split the words into characters. Use REPLACE to insert a delimiter between the characters of each word. Instead of using TOBAG(*), use TOKENIZE to split the characters along the inserted delimiter. Filter for a, e, i, o and u, then group by character and count.
Pig script:
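-- load each line of the file as a single chararray field
a = load 'test4.txt' as (line:chararray);
-- split each line into words
b = foreach a generate flatten(TOKENIZE(line)) as words;
-- REPLACE with an empty regex puts '|' around every character; TOKENIZE on '|' yields one letter per tuple
c = foreach b generate flatten(TOKENIZE(REPLACE(LOWER(words), '', '|'), '|')) as letter;
-- keep only the vowels
d = filter c by (letter == 'a' or letter == 'e' or letter == 'i' or letter == 'o' or letter == 'u');
-- group identical vowels and count each group
e = group d by letter;
f = foreach e generate group as letter, COUNT(d.letter) as total;
dump f;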
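To sanity-check the data flow outside Pig, here is a minimal plain-Python sketch that mirrors each relation of the script above. It is not Pig and not part of the original answer, and the function name `count_vowels` is hypothetical. It assumes that splitting on whitespace is close enough to TOKENIZE's default delimiters. TOKENIZE also splits on quotes, commas, parentheses and `*`, but none of those are vowels, so the counts are unaffected.

```python
import re
from collections import Counter

VOWELS = {"a", "e", "i", "o", "u"}


def count_vowels(lines):
    """Count each vowel across lines, mirroring the Pig relations a..f (sketch)."""
    counts = Counter()
    for line in lines:  # a: one record per line
        for word in line.split():  # b: flatten(TOKENIZE(line)), whitespace only here
            # c: an empty-pattern replace turns "word" into "|w|o|r|d|";
            # splitting on '|' and dropping empty strings mimics TOKENIZE(..., '|')
            letters = [t for t in re.sub("", "|", word.lower()).split("|") if t]
            for letter in letters:
                if letter in VOWELS:  # d: filter by vowel
                    counts[letter] += 1  # e/f: group by letter and COUNT
    return dict(counts)


if __name__ == "__main__":
    # prints [('a', 2), ('e', 2), ('i', 2), ('o', 2), ('u', 1)]
    print(sorted(count_vowels(["Hello World", "Apache Pig is fun"]).items()))
```

Each (letter, count) pair plays the same role as one (letter, total) tuple produced by `dump f`.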