foreach - Parallel query of SQLite database in R -
i have large database (~100gb) need pull every entry, perform comparisons on it, , store results of comparisons. have attempted run parallel queries within single r sessions without success. can run multiple r sessions @ once looking better approach. here attempted:
library(rsqlite) library(data.table) library(foreach) library(domc) #--------- # setup #--------- #connect db db <- dbconnect(sqlite(), dbname="genes_drug_combos.sqlite") #--------- # query #--------- # 856086 combos = 1309 * 109 * 6 registerdomc(8) #i run 6 seperate r sessions (one each i) res_list <- foreach(i=1:6) %dopar% { <- i*109-108 b <- i*109 pb <- txtprogressbar(min=a, max=b, style=3) res <- list() (j in a:b) { #get preds drug combos statement <- paste("select * combo_tstats rowid between", (j*1309)-1308, "and", j*1309) combo_preds <- dbgetquery(db, statement) #here stuff result returned query combo_names <- combo_preds$drug_combo combo_preds <- as.data.frame(t(combo_preds[,-1])) colnames(combo_preds) <- combo_names #get top drug combos top_combos <- get_top_drugs(query_genes, drug_info=combo_preds, es=t) #update progress , store result settxtprogressbar(pb, j) res[[ length(res)+1 ]] <- top_combos } #bind results res <- rbindlist(res) }
i dont errors 1 core spins up. in contrast, if run multiple r sessions, cores go @ it. doing wrong?
some things have learned while accessing concurrently rsqlite
same file sqlite database:
1. make sure each worker has own db connection.
parallel::clusterevalq(cl = cl, { db.conn <- rsqlite::dbconnect(rsqlite::sqlite(), "./export/models.sqlite"); rsqlite::dbclearresult(rsqlite::dbsendquery(db.conn, "pragma busy_timeout=5000;")); })
2. use pragma busy_timeout=5000;
by default set 0, , chances end "database locked" error each time worker tries write db while locked. previous code sets pragma
in each worker connection. note select
operations never locked, insert/delete/update
.
3. use pragma journal_mode=wal;
this has set once , stays on default forever. add 2 (more or less permanent) files db. improve concurrent read/write performance. read more here.
with above settings have not experienced this issue.
Comments
Post a Comment