linux - Fast ways to make new multiple files from one file matching multiple patterns


I have one file called uniq.txt (20,000 lines).

    head uniq.txt
    1
    103
    10357
    1124
    1126

I have another file called all.txt (106,371,111 lines).

    head all.txt
    cg0001  ?   1      -0.394991215660192
    cg0001  ab  103    -0.502535661820095
    cg0002      10357  -0.563632386999913
    cg0003  ?   1      -0.394991215660444
    cg0004  ?   1      -0.502535661820095
    cg0004      10357  -0.563632386999913
    cg0003  ab  103    -0.64926706504459

I want to make 20,000 new files from all.txt, one for each line (pattern) in uniq.txt, each containing the rows of all.txt that match that pattern. For example:

    head 1.newfile.txt
    cg0001  ?   1      -0.394991215660192
    cg0003  ?   1      -0.394991215660444
    cg0004  ?   1      -0.502535661820095

    head 103.newfile.txt
    cg0001  ab  103    -0.502535661820095
    cg0003  ab  103    -0.64926706504459

    head 10357.newfile.txt
    cg0002      10357  -0.563632386999913
    cg0004      10357  -0.563632386999913

Is there a way to make the 20,000 new files fast? My current script takes about 1 minute to make one new file. I guess it is scanning the whole all.txt file every time it makes a new file.

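You can try it with awk. Ideally you don't need >> in awk, but since you have stated there will be 20,000 files, we don't want to exhaust the system's resources by keeping too many files open. So the script closes each file right after writing to it, and uses >> so that reopening the file later appends to it instead of truncating it.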

    awk '
        NR==FNR { names[$0]++; next }
        ($3 in names) { file=$3".newfile.txt"; print $0 >>(file); close (file) }
    ' uniq.txt all.txt

This will first scan the uniq.txt file into memory, creating a lookup table of sorts. It will then read through the all.txt file once and append each matching line to its corresponding file.
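
To see the two passes in action, here is a minimal sketch that runs the same idea on the sample rows from the question inside a scratch directory. It rests on one assumption the question does not confirm: that all.txt is tab-separated. Some sample rows appear to have an empty second column, and with awk's default whitespace splitting the pattern would then no longer be in $3 for those rows, so the sketch passes -F'\t'. The scratch directory and the line-count loop at the end are only there for illustration.

```shell
#!/bin/sh
# Sketch only: sample rows copied from the question.
# ASSUMPTION: all.txt is tab-separated (not stated in the question).
set -eu

workdir=$(mktemp -d)   # scratch directory so no real files are touched
cd "$workdir"

# Patterns, one per line (first five lines of uniq.txt in the question).
printf '%s\n' 1 103 10357 1124 1126 > uniq.txt

# Four tab-separated columns; '' marks the empty second column.
printf '%s\t%s\t%s\t%s\n' \
    cg0001 '?' 1     -0.394991215660192 \
    cg0001 ab  103   -0.502535661820095 \
    cg0002 ''  10357 -0.563632386999913 \
    cg0003 '?' 1     -0.394991215660444 \
    cg0004 '?' 1     -0.502535661820095 \
    cg0004 ''  10357 -0.563632386999913 \
    cg0003 ab  103   -0.64926706504459 \
    > all.txt

awk -F'\t' '
    # Pass 1 (uniq.txt): remember every pattern as an array key.
    NR == FNR { names[$0]; next }
    # Pass 2 (all.txt): append the row to the file named after column 3,
    # then close it so only one output file is open at a time.
    ($3 in names) {
        file = $3 ".newfile.txt"
        print $0 >> (file)
        close(file)
    }
' uniq.txt all.txt

# Show how many rows landed in each output file.
for f in *.newfile.txt; do
    printf '%s: %s lines\n' "$f" "$(grep -c '' "$f")"
done
```

On this sample, 1.newfile.txt receives three rows and 103.newfile.txt and 10357.newfile.txt receive two each. No file is created for 1124 or 1126 because no sample row carries those values. If the real all.txt is separated by plain whitespace and has no empty columns, the -F'\t' option can be dropped. Because >> appends, any *.newfile.txt left over from an earlier run should be removed before running again, otherwise the rows are duplicated.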

