I have a data.table out like this (in reality it is much larger):
    out <-
        code weights group
     1:    2   0.387     1
     2:    1   0.399     1
     3:    2   1.610     1
     4:    3   1.323     2
     5:    2   0.373     2
     6:    1   0.212     2
     7:    3   0.316     3
     8:    2   0.569     3
     9:    1   0.120     3
    10:    1   0.354     3

It has 3 groups with different codes (column 1). In group 1, code 3 does not appear, while in the other groups it does.
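For anyone who wants to run the snippets in this thread, the table can be rebuilt roughly as follows. This is only a sketch that types in the ten rows shown above; the column types (plain numerics) are an assumption, since the printout does not reveal them.

```r
library(data.table)

# Rebuild the example table from the printed rows.
# Assumption: all three columns are plain numeric vectors.
out <- data.table(
  code    = c(2, 1, 2, 3, 2, 1, 3, 2, 1, 1),
  weights = c(0.387, 0.399, 1.610, 1.323, 0.373,
              0.212, 0.316, 0.569, 0.120, 0.354),
  group   = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)
)

# Group 1 has no row with code 3, which is the gap the question is about.
out[group == 1 & code == 3]
```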
Then, I want to sum the weights for every group and code combination. I achieve this with this command:

    sum.dt <- out[, .(sum(weights)), by = list(code, group)][order(-V1)]

This works well, but it does not include the combination of group 1 with code 3, because that combination is not in the out table. I would like to have all possible combinations in sum.dt, and if a combination does not occur in the source table, its sum should be 0, meaning the column V1 should be 0 in that row.
Any idea how I could achieve this?
Solution

Using CJ (cross join) you can add the missing combinations:

    library(data.table)
    # key the table so it can be joined on code and group
    setkey(out, code, group)
    # join on every code/group pair, sum per pair, then turn the NA sums into 0
    out[CJ(code, group, unique = TRUE)
        ][, lapply(.SD, sum), by = .(code, group)
          ][is.na(weights), weights := 0]

gives:

       code group weights
    1:    1     1   0.399
    2:    1     2   0.212
    3:    1     3   0.474
    4:    2     1   1.997
    5:    2     2   0.373
    6:    2     3   0.569
    7:    3     1   0.000
    8:    3     2   1.323
    9:    3     3   0.316
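A variation on the same idea, in case you would rather keep the original aggregation and not set a key on out: aggregate first, then join the sums onto the full grid of combinations. This is a sketch only; the names all.combos and res are made up for illustration, and the example table is typed in again so the block runs on its own.

```r
library(data.table)

# Example data retyped from the question (column types assumed numeric).
out <- data.table(
  code    = c(2, 1, 2, 3, 2, 1, 3, 2, 1, 1),
  weights = c(0.387, 0.399, 1.610, 1.323, 0.373,
              0.212, 0.316, 0.569, 0.120, 0.354),
  group   = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)
)

# Step 1: the aggregation from the question, with the sum column named V1.
sum.dt <- out[, .(V1 = sum(weights)), by = .(code, group)]

# Step 2: every code/group pair, whether or not it occurs in out.
all.combos <- CJ(code = unique(out$code), group = unique(out$group))

# Step 3: look up each pair in sum.dt; pairs without a match get NA in V1.
res <- sum.dt[all.combos, on = .(code, group)]

# Step 4: replace the NA sums with 0 (done by reference).
res[is.na(V1), V1 := 0]

res[order(-V1)]
```

The join is driven by all.combos, so res has one row per combination, and the final order(-V1) reproduces the sorting used in the question.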
Or with xtabs as @alexis_laz showed in the comments:
    xtabs(weights ~ group + code, out)

which gives:

         code
    group     1     2     3
        1 0.399 1.997 0.000
        2 0.212 0.373 1.323
        3 0.474 0.569 0.316
If you want to get this output in a long-form dataframe, you can wrap the xtabs code in the melt function of the reshape2 (or data.table) package:
    library(reshape2)
    res <- melt(xtabs(weights ~ group + code, out))

which gives:

    > class(res)
    [1] "data.frame"
    > res
      group code value
    1     1    1 0.399
    2     2    1 0.212
    3     3    1 0.474
    4     1    2 1.997
    5     2    2 0.373
    6     3    2 0.569
    7     1    3 0.000
    8     2    3 1.323
    9     3    3 0.316
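If you would rather not load an extra package for the long format, base R's as.data.frame method for tables should give an equivalent result. A minimal sketch, with the example data typed in again as a plain data.frame (an assumption, to keep the block self-contained) and the made-up name long for the result:

```r
# Example data retyped from the question as a plain data.frame.
out <- data.frame(
  code    = c(2, 1, 2, 3, 2, 1, 3, 2, 1, 1),
  weights = c(0.387, 0.399, 1.610, 1.323, 0.373,
              0.212, 0.316, 0.569, 0.120, 0.354),
  group   = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)
)

# xtabs builds the full group x code table, with 0 for absent pairs;
# as.data.frame then unrolls it into one row per cell.
# responseName sets the name of the value column (the default is "Freq").
long <- as.data.frame(xtabs(weights ~ group + code, out),
                      responseName = "weights")

long
```

Note that group and code come back as factors here, so convert them if you need numbers again.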
You could also do this with a combination of dplyr and tidyr:
    library(dplyr)
    library(tidyr)
    out %>%
      complete(code, group, fill = list(weights = 0)) %>%
      group_by(code, group) %>%
      summarise(sum(weights))