linux - Fast ways to make new multiple files from one file matching multiple patterns
I have one file called uniq.txt (20,000 lines).

    head uniq.txt
    1
    103
    10357
    1124
    1126

I have another file called all.txt (106,371,111 lines).

    head all.txt
    cg0001 ? 1 -0.394991215660192
    cg0001 ab 103 -0.502535661820095
    cg0002 10357 -0.563632386999913
    cg0003 ? 1 -0.394991215660444
    cg0004 ? 1 -0.502535661820095
    cg0004 10357 -0.563632386999913
    cg0003 ab 103 -0.64926706504459
I want to make 20,000 new files from all.txt, one for each pattern (line) in uniq.txt. For example:

    head 1.newfile.txt
    cg0001 ? 1 -0.394991215660192
    cg0003 ? 1 -0.394991215660444
    cg0004 ? 1 -0.502535661820095

    head 103.newfile.txt
    cg0001 ab 103 -0.502535661820095
    cg0003 ab 103 -0.64926706504459

    head 10357.newfile.txt
    cg0002 10357 -0.563632386999913
    cg0004 10357 -0.563632386999913
Is there a way to make the 20,000 new files fast? My current script takes 1 minute to make one new file. I guess it's scanning the whole all.txt file every time it makes a new file.
You can try awk. Ideally you wouldn't need >> in awk, but since you have stated there are 20,000 files, you don't want to exhaust the system's resources by keeping that many files open.
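    awk '
    # First file (uniq.txt): store each line as a key in the lookup table.
    NR==FNR { names[$0]++; next }
    # Second file (all.txt): if column 3 is a known key, append the line
    # to the file for that key, then close it so few files stay open.
    ($3 in names) { file=$3".newfile.txt"; print $0 >>(file); close(file) }
    ' uniq.txt all.txt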
This will first scan the uniq.txt file into memory, creating a lookup table of sorts. It will then read through the all.txt file once and append each matching line to the file for its key.
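
If the open/close for every matching line turns out to be a bottleneck, one possible variation (a sketch, not benchmarked, and not part of the answer above) is to sort all.txt on column 3 first, so that each output file is opened and closed only once. It assumes whitespace-separated fields with the key in column 3, no blank lines in uniq.txt, and a sort that supports the -s (stable) option, as GNU and BSD sort do. The sample data is a small subset of the question's, and sorting 106 million lines has its own cost, so it may or may not be faster in practice.

```shell
#!/bin/sh
# Sketch only: run in an empty scratch directory, since it writes
# uniq.txt, all.txt and *.newfile.txt into the current directory.
set -eu

# Tiny sample taken from the question (two keys, four data lines).
printf '%s\n' 1 103 > uniq.txt
printf '%s\n' \
  'cg0001 ? 1 -0.394991215660192' \
  'cg0001 ab 103 -0.502535661820095' \
  'cg0003 ? 1 -0.394991215660444' \
  'cg0003 ab 103 -0.64926706504459' > all.txt

# Group lines by column 3. -s (stable, a GNU/BSD extension) keeps the
# original order within each key; LC_ALL=C forces plain byte comparison.
LC_ALL=C sort -s -b -k3,3 all.txt |
awk '
NR==FNR { names[$0]; next }        # first input (uniq.txt): load the keys
!($3 in names) { next }            # skip lines whose column 3 is not a key
($3 "") != prev {                  # key changed (compared as strings)
    if (prev != "") close(out)     # done with the previous output file
    prev = $3
    out = $3 ".newfile.txt"
}
{ print > out }                    # file stays open until the key changes
' uniq.txt -
```

On the real data you would drop the two printf commands and keep only the sort and awk pipeline. Because the sorted input keeps all lines for a key together, only one output file is open at any time.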