Convert a rqda file to a sql file

Question

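I am using RStudio to code text manually. The resulting .rqda file is a SQLite database created by the RQDA package. I coded statements in the text with different codes and grouped those codes into code categories (for example, the code category "actor_party" with the related codes "socialist", "liberal", "conservative", etc.). I have finished coding and now want to run a social network analysis on the result. For this I want to create a SQL table in which each code category has its own column and each row holds the specific codes. Each code can be identified by the following attributes: catid (the code category number), fid (the file identification number) and selfirst (the starting position of each coding). The specific catid, fid and selfirst are selected for each coded statement so that SQLite can identify each coding as unique (in addition, as you can see in the R script, status = 1 must be selected for each valid coding).
I am using RStudio version 0.99.879, RQDA version 0.2-7 and RSQLite 1.0.0.

I therefore used the following R code: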

library(RSQLite) # load Package RSQLite
setwd("C:/...")

system("ls *.rqda", show=TRUE)
sqlite <- dbDriver("SQLite")
# specifying the file
qdadb <- dbConnect(sqlite,"My_data.rqda")


dbListTables(qdadb)
dbListFields(qdadb, "coding") # that's where the codings are stored


catid <- dbGetQuery(qdadb, "select distinct(catid) from treecode where status = 1 ORDER BY catid")
i <- 1
table <- dbGetQuery(qdadb, "select fid, selfirst from coding where status = 1 GROUP BY fid, selfirst")
while(i <= max(catid)) {
   ids <- dbGetQuery(qdadb, paste("select cid from treecode where (catid = ",i," and status = 1)", sep=""));
   t <- dbGetQuery(qdadb, paste("select cid, fid, selfirst from coding where (cid in (", paste(as.character(ids$cid), sep="' '", collapse=","), ") and status = 1)", sep=""));
   table <- merge(table, t, by = c("fid","selfirst"), all.x = T);
   i <- i + 1;
   }
# warnings are created because of the same columns which are duplicated by the merging

colnames(table) <- c("fid", "selfirst", dbGetQuery(qdadb, "select name from codecat where status = 1")[,1]) # each coding is identified by its fid (file id) and selfirst (the unique starting point of the coding)

# see below for an example of such a created table

library(car) # Companion to Applied Regression package

# years - catid = 1
table$A00_time_frame <- recode(table$A00_time_frame, '1 = 2010; 2 = 2011; 3 = 2012; 4 = 2013; 5 = 2014; 6 = 2015')

# Sources - catid = 2
ids <- dbGetQuery(qdadb, "select cid from treecode where (catid = 2 and status = 1)")[,1]
values <- dbGetQuery(qdadb, paste("select name from freecode where (id in(", paste(ids, collapse = ","), ") and status = 1)"))[,1]
table$B00_source <- recode(table$B00_source, paste0("'", paste(ids,"'='", values, collapse = "';'", sep=""),"'", sep=""))

# Claimant type - catid = 3
ids <- dbGetQuery(qdadb, "select cid from treecode where (catid = 3 and status = 1)")[,1]
values <- dbGetQuery(qdadb, paste("select name from freecode where (id in(", paste(ids, collapse = ","), ") and status = 1)"))[,1]
table$C00_claimant_type <- recode(table$C00_claimant_type, paste0("'", 
paste(ids,"'='", values, collapse = "';'", sep=""),"'", sep=""))

and so on, up to "catid = 20".
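One detail in the recode steps above may be worth checking: `ids` and `values` come from two separate queries, and SQL does not guarantee any particular row order unless a query has an ORDER BY clause. If the two results come back in different orders, ids would be paired with the wrong names. The following is a minimal sketch in base R, not a verified diagnosis of this data set. It assumes a hypothetical `lookup` data frame that holds id and name together (as a single query such as `select id, name from freecode` would return) and maps codes by value with `match()` instead of by position. All values are made up for illustration.

```r
# Hypothetical stand-in for the result of one query that returns id and name
# together; these values are invented, not taken from the real data.
lookup <- data.frame(
  id   = c(12, 7, 30),
  name = c("socialist", "liberal", "conservative"),
  stringsAsFactors = FALSE
)

# Hypothetical column of code ids as it would appear in the merged table
# (NA = no coding in this category for that statement).
codes <- c(7, 30, NA, 12, 7)

# match() finds, for each code, the row of `lookup` with that id,
# so the result does not depend on the row order of `lookup`.
labels <- lookup$name[match(codes, lookup$id)]
print(labels)
```

Because the name is looked up through the id itself, reordering the rows of `lookup` leaves `labels` unchanged.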

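The result looks like this: example_table [the table continues up to row 844; only fid is in ascending order]

Although the created table matches the total number of codings, some errors occur. Some codes are not linked to the correct statement (they are linked to the correct code category, but not to the correct coded statement).

I am still a beginner with R(Studio) and cannot explain what went wrong.

Does anyone know what the problem or error might be here and how to solve it?

On request, I am happy to share my file :)

Any advice or help is very welcome!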

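Edit: Here is a link to a subset of my data so that you can reproduce the problem (the file is in rqda format, because I think the conversion itself may be the problem).
In addition, here are two examples of where to look.

After creating 'table' in R, the following rows can be identified:

1. fid 95, selfirst 4553, then the coding 'Welt', followed later by 'E02_European_Commission' + 'G10_Cameroon'.
   However, if you check the codings in the original rqda file, the code 'Cameroon' is not in that file but at fid 70, selfirst 5082, with 'Welt' and the year '2010'.

2. fid 90, selfirst 959 and the year '2011' show the code 'CDU', and the last entry, 'special claimant', shows the name 'Martin Schulz'.
   However, if you look at the codings in the original rqda file, the code 'Martin Schulz' has no coding attached in the subset.

I hope these two examples illustrate the problem and show you where to look.

Sorry for not providing this at the start!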

Answer 1

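Maybe simplify the code first, to get a better idea of what might be going wrong? Personally, I would rely more on SQL than on R to put all the information together: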

t <- dbGetQuery(qdadb, "SELECT codecat.name, coding.cid, coding.fid, coding.selfirst 
       FROM treecode, coding, codecat 
       WHERE treecode.cid = coding.cid 
       AND treecode.catid = codecat.catid
       AND treecode.status = 1
       AND coding.status = 1")
head(reshape(t, idvar = c("fid", "selfirst"), timevar = "name", direction = "wide"))

I am not sure whether this is what you are looking for, or whether it works any better. But the code seems simpler to evaluate.
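To illustrate what the reshape() step does, here is a minimal sketch in base R on a made-up data frame with the same columns as the query result (all values are hypothetical, and no database is needed to run it). It also counts statements that carry more than one code in the same category, because reshape() to wide format keeps only the first such row and issues a warning.

```r
# Made-up long-format codings: one row per (category, statement) pair.
long <- data.frame(
  name     = c("actor_party", "source", "actor_party", "source"),
  cid      = c(7, 41, 12, 40),
  fid      = c(95, 95, 70, 70),
  selfirst = c(4553, 4553, 5082, 5082),
  stringsAsFactors = FALSE
)

# Statements with several codes in one category would be collapsed by
# reshape() (it keeps the first row), so count them before reshaping.
key <- paste(long$fid, long$selfirst, long$name)
n_dup <- sum(duplicated(key))

# One row per (fid, selfirst), one cid column per category name.
wide <- reshape(long, idvar = c("fid", "selfirst"),
                timevar = "name", direction = "wide")
print(n_dup)
print(wide)
```

If `n_dup` is greater than zero on the real data, the wide table would silently drop codings, so those statements may need to be handled separately before reshaping.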

sql

r

social-networking

rsqlite

rqda