foreach - Parallel query of SQLite database in R -


i have large database (~100gb) need pull every entry, perform comparisons on it, , store results of comparisons. have attempted run parallel queries within single r sessions without success. can run multiple r sessions @ once looking better approach. here attempted:

library(rsqlite) library(data.table) library(foreach) library(domc)    #--------- # setup #---------   #connect db db <- dbconnect(sqlite(), dbname="genes_drug_combos.sqlite")   #--------- # query #--------- # 856086 combos = 1309 * 109 * 6  registerdomc(8)  #i run 6 seperate r sessions (one each i) res_list <- foreach(i=1:6) %dopar% {    <- i*109-108   b <- i*109    pb  <- txtprogressbar(min=a, max=b, style=3)   res <- list()    (j in a:b) {      #get preds drug combos     statement   <- paste("select * combo_tstats rowid between", (j*1309)-1308, "and", j*1309)     combo_preds <- dbgetquery(db, statement)      #here stuff result returned query     combo_names <- combo_preds$drug_combo     combo_preds <- as.data.frame(t(combo_preds[,-1]))      colnames(combo_preds)  <- combo_names      #get top drug combos     top_combos <- get_top_drugs(query_genes, drug_info=combo_preds, es=t)      #update progress , store result     settxtprogressbar(pb, j)     res[[ length(res)+1 ]] <- top_combos   }   #bind results   res <- rbindlist(res) } 

i dont errors 1 core spins up. in contrast, if run multiple r sessions, cores go @ it. doing wrong?

some things have learned while accessing concurrently rsqlite same file sqlite database:

1. make sure each worker has own db connection.

  parallel::clusterevalq(cl = cl, {     db.conn <- rsqlite::dbconnect(rsqlite::sqlite(), "./export/models.sqlite");     rsqlite::dbclearresult(rsqlite::dbsendquery(db.conn, "pragma busy_timeout=5000;"));   }) 

2. use pragma busy_timeout=5000;

by default set 0, , chances end "database locked" error each time worker tries write db while locked. previous code sets pragma in each worker connection. note select operations never locked, insert/delete/update.

3. use pragma journal_mode=wal;

this has set once , stays on default forever. add 2 (more or less permanent) files db. improve concurrent read/write performance. read more here.

with above settings have not experienced this issue.


Comments

Popular posts from this blog

Load Balancing in Bluemix using custom domain and DNS SRV records -

oracle - pls-00402 alias required in select list of cursor to avoid duplicate column names -

python - Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>] error -