
data.table: sum of all existing combinations in a table

Problem

I have a data.table `out` like this (in reality, it is much larger):

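```r
library(data.table)

# Sample data; the real table is much larger
out <- data.table(
  code    = c(2, 1, 2, 3, 2, 1, 3, 2, 1, 1),
  weights = c(0.387, 0.399, 1.610, 1.323, 0.373, 0.212, 0.316, 0.569, 0.120, 0.354),
  group   = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)
)
```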

It has 3 groups and 3 distinct codes (column 1). Code 3 does not appear in group 1, but it does appear in the other groups.

Then I want to sum the weights for every combination of group and code. I achieve this with this command:

```r
sum.dt <- out[, .(sum(weights)), by = list(code, group)][order(-V1)]
```

This works well, but the result does not contain the combination of group 1 with code 3, because that combination is not in the `out` table. I would like `sum.dt` to contain all possible combinations; if a combination does not occur in the source table, its sum should be 0, i.e. the `V1` column should be 0 in that row.
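A quick way to confirm which combinations are missing is an anti-join of the full grid of observed codes and groups against `sum.dt`. This is a minimal sketch that rebuilds the sample data so it runs on its own; `all_pairs` and `missing_pairs` are illustrative names, not part of the question:

```r
library(data.table)

# Sample data from the question
out <- data.table(
  code    = c(2, 1, 2, 3, 2, 1, 3, 2, 1, 1),
  weights = c(0.387, 0.399, 1.610, 1.323, 0.373, 0.212, 0.316, 0.569, 0.120, 0.354),
  group   = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)
)

sum.dt <- out[, .(V1 = sum(weights)), by = .(code, group)][order(-V1)]

# Every code/group pair that can be formed from the observed values
all_pairs <- CJ(code = out$code, group = out$group, unique = TRUE)

# Anti-join (!): keep the grid rows that have no match in sum.dt
missing_pairs <- all_pairs[!sum.dt, on = .(code, group)]
print(missing_pairs)
```

With the sample data this leaves a single row, code 3 in group 1.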

Any idea how I could achieve this?

Solution

Using CJ (cross join) you can add the missing combinations:

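```r
library(data.table)

setkey(out, code, group)

out[CJ(code, group, unique = TRUE)             # add missing code/group rows (weights = NA)
    ][, lapply(.SD, sum), by = .(code, group)  # sum the weights per combination
    ][is.na(weights), weights := 0]            # combinations absent from out sum to 0
```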

gives:

```
   code group weights
1:    1     1   0.399
2:    1     2   0.212
3:    1     3   0.474
4:    2     1   1.997
5:    2     2   0.373
6:    2     3   0.569
7:    3     1   0.000
8:    3     2   1.323
9:    3     3   0.316
```
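Note that `setkey()` reorders `out` by reference. If you would rather leave `out` untouched, the same cross-join idea can be written as an ad hoc join with `on =`. A sketch under that assumption; `grid` and `res` are illustrative names:

```r
library(data.table)

# Sample data from the question
out <- data.table(
  code    = c(2, 1, 2, 3, 2, 1, 3, 2, 1, 1),
  weights = c(0.387, 0.399, 1.610, 1.323, 0.373, 0.212, 0.316, 0.569, 0.120, 0.354),
  group   = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)
)

# Full grid of observed values; CJ() sorts, and unique = TRUE de-duplicates
grid <- CJ(code = out$code, group = out$group, unique = TRUE)

res <- out[grid, on = .(code, group)                  # absent pairs get weights = NA
           ][, .(weights = sum(weights)), by = .(code, group)
           ][is.na(weights), weights := 0]            # absent pairs sum to 0
print(res)
```

Because the join follows the order of `grid`, the result comes out sorted by `code` and then `group`, as in the keyed version.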

Or with xtabs as @alexis_laz showed in the comments:

```r
xtabs(weights ~ group + code, out)
```

which gives:

```
     code
group     1     2     3
    1 0.399 1.997 0.000
    2 0.212 0.373 1.323
    3 0.474 0.569 0.316
```
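Both `CJ()` and `xtabs()` only build combinations from values that occur somewhere in `out`. If the set of valid codes is known in advance and may include codes that never appear at all, one option is to turn `code` into a factor with explicit levels, since `xtabs()` keeps unused factor levels by default (`drop.unused.levels = FALSE`). A sketch, where `all_codes = 1:4` is a hypothetical code list:

```r
library(data.table)

# Sample data from the question
out <- data.table(
  code    = c(2, 1, 2, 3, 2, 1, 3, 2, 1, 1),
  weights = c(0.387, 0.399, 1.610, 1.323, 0.373, 0.212, 0.316, 0.569, 0.120, 0.354),
  group   = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)
)

# Hypothetical full list of valid codes; code 4 never occurs in `out`
all_codes <- 1:4

# Work on a copy so `out` keeps its numeric code column
out_f <- copy(out)
out_f[, code := factor(code, levels = all_codes)]

# Unused level 4 is kept, so it appears as a column of zeros
tab <- xtabs(weights ~ group + code, data = out_f)
print(tab)
```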

If you want the result of `xtabs(weights ~ group + code, out)` as a long-format data frame, you can wrap the `xtabs` call in the `melt` function of the reshape2 (or data.table) package:

```r
library(reshape2)
res <- melt(xtabs(weights ~ group + code, out))
```

which gives:

```
> class(res)
[1] "data.frame"
> res
  group code value
1     1    1 0.399
2     2    1 0.212
3     3    1 0.474
4     1    2 1.997
5     2    2 0.373
6     3    2 0.569
7     1    3 0.000
8     2    3 1.323
9     3    3 0.316
```
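If you would rather avoid the extra reshape2 dependency, base R can produce the same long format: `xtabs()` returns an object of class `c("xtabs", "table")`, so `as.data.frame()` dispatches to its table method, and `responseName` names the value column. A sketch; unlike the `melt()` output, `group` and `code` come back as factors:

```r
# Sample data from the question, as a plain data.frame
out <- data.frame(
  code    = c(2, 1, 2, 3, 2, 1, 3, 2, 1, 1),
  weights = c(0.387, 0.399, 1.610, 1.323, 0.373, 0.212, 0.316, 0.569, 0.120, 0.354),
  group   = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)
)

# One row per group/code cell, including cells that sum to 0
res <- as.data.frame(xtabs(weights ~ group + code, data = out),
                     responseName = "value")
print(res)
```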

You could also do this with a combination of dplyr and tidyr:

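```r
library(dplyr)
library(tidyr)

out %>%
  complete(code, group, fill = list(weights = 0)) %>%  # add missing code/group pairs with weight 0
  group_by(code, group) %>%
  summarise(weights = sum(weights), .groups = "drop")  # sum the weights per combination
```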
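On a large table it may be cheaper to aggregate first and fill in the gaps afterwards, because `complete()` then only has to expand the small summary instead of the raw rows. A sketch of that ordering, assuming dplyr 1.0 or later for the `.groups` argument:

```r
library(dplyr)
library(tidyr)

# Sample data from the question, as a plain data.frame
out <- data.frame(
  code    = c(2, 1, 2, 3, 2, 1, 3, 2, 1, 1),
  weights = c(0.387, 0.399, 1.610, 1.323, 0.373, 0.212, 0.316, 0.569, 0.120, 0.354),
  group   = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)
)

res <- out %>%
  group_by(code, group) %>%
  summarise(weights = sum(weights), .groups = "drop") %>%  # one row per observed pair
  complete(code, group, fill = list(weights = 0))          # add absent pairs with 0
print(res)
```

Both orderings give the same sums; the difference is only in how many rows `complete()` has to generate.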
