I have a data.frame with three columns and some number of rows:
vendor <- c("BMW", "BMW", "BMW", "Audi", "Audi", "Mersedes", "BMW", "Audi")
model <- c("X1", "X5", "X1", "A6", "A6", "C-350", "X1", "A6")
year <- c(2009, 2011, 2010, 2015, 2015, 2011, 2010, 2016)
cars <- data.frame(vendor, model, year)
    vendor model year
1      BMW    X1 2009
2      BMW    X5 2011
3      BMW    X1 2010
4     Audi    A6 2015
5     Audi    A6 2015
6 Mersedes C-350 2011
7      BMW    X1 2010
8     Audi    A6 2016
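How can I count the number of identical rows, ideally getting a new data frame with a fourth column that gives the number of repetitions? Something like: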
    vendor model year count
1      BMW    X1 2009     1
2      BMW    X5 2011     1
3      BMW    X1 2010     2
4     Audi    A6 2015     2
5 Mersedes C-350 2011     1
6     Audi    A6 2016     1
I tried the function
summary(cars)
but it only prints summary statistics for each column.
unique(cars)
    vendor model year
1      BMW    X1 2009
2      BMW    X5 2011
3      BMW    X1 2010
4     Audi    A6 2015
6 Mersedes C-350 2011
8     Audi    A6 2016
Or count the rows:
library(data.table)
cars <- data.table(cars)
cars[, .N, by = names(cars)]
     vendor model year N
1:      BMW    X1 2009 1
2:      BMW    X5 2011 1
3:      BMW    X1 2010 2
4:     Audi    A6 2015 2
5: Mersedes C-350 2011 1
6:     Audi    A6 2016 1
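If you would rather avoid an extra package, the same grouping can probably be done in base R. This is only a sketch: the helper column `count` and the result name `cars_counted` are my own choices, not something from the question. Note that `aggregate()` returns the groups sorted, so the row order will differ from the original data.

```r
# Rebuild the example data so the snippet runs on its own
cars <- data.frame(
  vendor = c("BMW", "BMW", "BMW", "Audi", "Audi", "Mersedes", "BMW", "Audi"),
  model  = c("X1", "X5", "X1", "A6", "A6", "C-350", "X1", "A6"),
  year   = c(2009, 2011, 2010, 2015, 2015, 2011, 2010, 2016)
)

# Add a helper column of ones (assumed name: count), then sum it
# within each distinct vendor/model/year combination
cars_counted <- aggregate(
  count ~ vendor + model + year,
  data = transform(cars, count = 1),
  FUN  = sum
)

print(cars_counted)
```

The sum of the `count` column should equal `nrow(cars)`, which is a quick sanity check that no rows were dropped.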
See also http://www.sthda.com/english/wiki/identifying-and-removing-duplicate-data-in-r
An option using the dplyr package:
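The code for this option does not appear above, so here is a minimal sketch of how it might look with `dplyr::count()`. The result name `cars_counted` and the output column name `count` are my assumptions, and the `name` argument needs a reasonably recent dplyr version.

```r
library(dplyr)

# Rebuild the example data so the snippet runs on its own
cars <- data.frame(
  vendor = c("BMW", "BMW", "BMW", "Audi", "Audi", "Mersedes", "BMW", "Audi"),
  model  = c("X1", "X5", "X1", "A6", "A6", "C-350", "X1", "A6"),
  year   = c(2009, 2011, 2010, 2015, 2015, 2011, 2010, 2016)
)

# count() groups by the listed columns and tallies the rows in each group;
# name = "count" sets the name of the tally column (the default is "n")
cars_counted <- count(cars, vendor, model, year, name = "count")

print(cars_counted)
```

As far as I know this is equivalent to `group_by(vendor, model, year)` followed by `summarise(count = n())`; the groups come back sorted rather than in their original order.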
An introduction to the package (in Russian)