This is my first question on Stack Overflow, so if it has already been answered, my apologies, and please let me know where to look.
I already reviewed the following two posts and think they might answer my question, although I'm struggling to see how:
1) Conditional replacement of values in a data.frame
2) Creating a function to replace NAs from one data.frame with values from another
With that said, I'm trying to replace NAs in one data frame by referencing another data frame of a different (shorter) length and pulling in replacement values from column "B" where the values for column "A" in each data frame match.
I've modified the data, below, for simplicity and illustration, although the concept is the same in the actual data. FYI, in the real second data frame, there are also no duplicates in column "A".
Here's the first data frame (df1):
    > df1
        B          C  A
    1  NA 2012-10-01  0
    2  NA 2012-10-01  5
    3   4 2012-10-01 10
    4  NA 2012-10-01 15
    5  NA 2012-10-01 20
    6  20 2012-10-01 25
    7  NA 2012-10-01  0
    8  NA 2012-10-01  5
    9   5 2012-10-01 10
    10  5 2012-10-01 15
    > str(df1)
    'data.frame': 10 obs. of 3 variables:
     $ B: num NA NA 4 NA NA 20 NA NA 5 5
     $ C: Factor w/ 1 level "2012-10-01": 1 1 1 1 1 1 1 1 1 1
     $ A: num 0 5 10 15 20 25 0 5 10 15

And the second data frame (df2):
    > df2
       A         B
    1  0 1.7169811
    2  5 0.3396226
    3 10 0.1320755
    4 15 0.1509434
    5 20 0.0754717
    6 25 2.0943396
    > str(df2)
    'data.frame': 6 obs. of 2 variables:
     $ A: int 0 5 10 15 20 25
     $ B: num 1.717 0.3396 0.1321 0.1509 0.0755 ...

I think I'm pretty close with the following code:
    > ifelse(is.na(df1$B) == TRUE, df2$B[df2$A == df1$A], df1$B)
     [1]  1.7169811  0.3396226  4.0000000  0.1509434  0.0754717 20.0000000         NA         NA
     [9]  5.0000000  5.0000000
    Warning message:
    In df2$A == df1$A :
      longer object length is not a multiple of shorter object length

Obviously, I want the 7th and 8th output elements to be 1.7169811 and 0.3396226, rather than NAs.
Thanks, in advance, for any help, and, once again, thanks for your patience!
Solution

Try the following code, which takes your original statement and makes a small tweak to the TRUE argument of the ifelse function:
    > df1$B <- ifelse(is.na(df1$B) == TRUE, df2$B[df2$A %in% df1$A], df1$B)
                            # Switched '==' to '%in%' ---^
    > df1
                B          C  A
    1   1.7169811 2012-10-01  0
    2   0.3396226 2012-10-01  5
    3   4.0000000 2012-10-01 10
    4   0.1509434 2012-10-01 15
    5   0.0754717 2012-10-01 20
    6  20.0000000 2012-10-01 25
    7   1.7169811 2012-10-01  0
    8   0.3396226 2012-10-01  5
    9   5.0000000 2012-10-01 10
    10  5.0000000 2012-10-01 15
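
One caveat worth spelling out: `df2$B[df2$A %in% df1$A]` returns only six values here, and `ifelse` recycles them to length ten, so rows 7 and 8 get the right numbers because `df1$A` happens to repeat in the same order as `df2$A`. If the real data is ordered differently, a positional lookup with base R's `match()` may be safer. The following is a minimal sketch under that assumption, with the example data rebuilt from the question; the names `idx`, `filled`, `shuffled` and `filled_shuffled` are made up for illustration.

```r
# Rebuild the example data from the question
df1 <- data.frame(
  B = c(NA, NA, 4, NA, NA, 20, NA, NA, 5, 5),
  C = factor(rep("2012-10-01", 10)),
  A = c(0, 5, 10, 15, 20, 25, 0, 5, 10, 15)
)
df2 <- data.frame(
  A = c(0L, 5L, 10L, 15L, 20L, 25L),
  B = c(1.7169811, 0.3396226, 0.1320755, 0.1509434, 0.0754717, 2.0943396)
)

# For each row of df1, find the row of df2 with the same A (NA if there is none)
idx <- match(df1$A, df2$A)

# Replace only the missing values of B; existing values are kept
filled <- ifelse(is.na(df1$B), df2$B[idx], df1$B)

# Same lookup on a shuffled copy of df1: the result follows A, not row position
shuffled <- df1[c(10, 3, 7, 1, 5, 2, 9, 4, 8, 6), ]
filled_shuffled <- ifelse(is.na(shuffled$B),
                          df2$B[match(shuffled$A, df2$A)],
                          shuffled$B)
```

Assigning `filled` back with `df1$B <- filled` gives the same table as above. Because `match()` returns one index per row of `df1`, the lookup vector always has the same length as `df1$B`, so nothing is recycled, and any value of A missing from `df2` simply stays NA.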