Sum by date using "if" statement

r if-statement

55 观看

5回复

1 作者的声誉

I am trying to sum between two dates in a data frame using an "if" statement.

date = seq(as.Date("2000-01-01"), as.Date("2000-01-31"), by="days")
nums = seq(1, 1, length.out = 31)
df = data.frame(date, nums)

if(df$date >= as.Date("2000-01-01") && df$date <= as.Date("2000-01-07")){
  sum(df$nums)
}

However, the output is "31" rather than "7" as I would expect. Is there a better way to sum by date? I would like to use an "if" statement because I would like to apply this to a much larger dataset with many different columns and within different lengths of time.

作者: fourdegreeswester 的来源 发布者: 2017 年 12 月 27 日

回应 5


2

474039 作者的声誉

We can just do the sum on the logical vector. Note that we are using only a single & to return a logical vector.

sum(df$date >= as.Date("2000-01-01") & df$date <= as.Date("2000-01-07"))

If the value of 'nums' are not all 1s, then subset the 'nums' based on the logical vector and get the sum`

sum(df$nums[df$date >= as.Date("2000-01-01") & df$date <= as.Date("2000-01-07")])
作者: akrun 发布者: 2017 年 12 月 27 日

1

126 作者的声誉

library(dplyr)

# your data
date = seq(as.Date("2000-01-01"), as.Date("2000-01-31"), by="days")
nums = seq(1, 1, length.out = 31)
df = data.frame(date, nums)

# answer
df %>%
  filter(date >= '2000-01-01' & date <= '2000-01-07') %>%
  summarize(sum = sum(nums))
作者: AlphaDrivers 发布者: 2017 年 12 月 27 日

1

4109 作者的声誉

Just use this function:

sum_by_dates <- function(frame, date_column, num_column, date1, date2) {
  sub_vec <- frame[[date_column]][frame[[date_column]] >= as.Date(date1) & frame[[date_column]] <= as.Date(date2)]
  df_new <- subset(frame, frame[[date_column]] %in% sub_vec)
  tot <- sum(df_new[[num_column]])
  return(tot)
}

Usage:

sum_by_dates(df, 'date', 'nums', '2000-01-01', '2000-01-07')
作者: Cybernetic 发布者: 2017 年 12 月 27 日

1

222295 作者的声誉

The if-function in R is not vectorized and neither is the "&&"-operator. The usual way to employ logical subsetting is to the vectorized operator "&" and put it in the first argument to "[":

sum(df[ df$date >= as.Date("2000-01-01") & df$date <= as.Date("2000-01-07"), 
   #That is a logical vector in the row selection position.
    "nums"])  #   The second argument to "[" is/are the column(s) to be selected.
#[1] 7
作者: 42- 发布者: 2017 年 12 月 27 日

1

3613 作者的声誉

...and illustrating the diversity of R, here's a solution using sqldf.

date = seq(as.Date("2000-01-01"), as.Date("2000-01-31"), by="days")
nums = seq(1, 1, length.out = 31)
df = data.frame(date, nums)

startDate <- as.Date("2000-01-01")
endDate <- as.Date("2000-01-07")
library(sqldf)
fn$sqldf("select sum(nums) from df where date between $startDate and $endDate")

and the output:

> fn$sqldf("select sum(nums) from df where date between $startDate and $endDate")
   sum(nums)
1         7
> 
作者: Len Greski 发布者: 2017 年 12 月 27 日
32x32