### Sum by date using "if" statement

55 观看

5回复

1 作者的声誉

I am trying to sum between two dates in a data frame using an "if" statement.

``````date = seq(as.Date("2000-01-01"), as.Date("2000-01-31"), by="days")
nums = seq(1, 1, length.out = 31)
df = data.frame(date, nums)

if(df\$date >= as.Date("2000-01-01") && df\$date <= as.Date("2000-01-07")){
sum(df\$nums)
}
``````

However, the output is "31" rather than "7" as I would expect. Is there a better way to sum by date? I would like to use an "if" statement because I would like to apply this to a much larger dataset with many different columns and within different lengths of time.

### 回应 5

2

474039 作者的声誉

We can just do the `sum` on the logical vector. Note that we are using only a single `&` to return a logical vector.

``````sum(df\$date >= as.Date("2000-01-01") & df\$date <= as.Date("2000-01-07"))
``````

If the value of 'nums' are not all 1s, then subset the 'nums' based on the logical vector and get the sum`

``````sum(df\$nums[df\$date >= as.Date("2000-01-01") & df\$date <= as.Date("2000-01-07")])
``````

1

126 作者的声誉

``````library(dplyr)

date = seq(as.Date("2000-01-01"), as.Date("2000-01-31"), by="days")
nums = seq(1, 1, length.out = 31)
df = data.frame(date, nums)

df %>%
filter(date >= '2000-01-01' & date <= '2000-01-07') %>%
summarize(sum = sum(nums))
``````

1

4109 作者的声誉

Just use this function:

``````sum_by_dates <- function(frame, date_column, num_column, date1, date2) {
sub_vec <- frame[[date_column]][frame[[date_column]] >= as.Date(date1) & frame[[date_column]] <= as.Date(date2)]
df_new <- subset(frame, frame[[date_column]] %in% sub_vec)
tot <- sum(df_new[[num_column]])
return(tot)
}
``````

Usage:

``````sum_by_dates(df, 'date', 'nums', '2000-01-01', '2000-01-07')
``````

1

222295 作者的声誉

The `if`-function in R is not vectorized and neither is the "&&"-operator. The usual way to employ logical subsetting is to the vectorized operator "&" and put it in the first argument to "[":

``````sum(df[ df\$date >= as.Date("2000-01-01") & df\$date <= as.Date("2000-01-07"),
#That is a logical vector in the row selection position.
"nums"])  #   The second argument to "[" is/are the column(s) to be selected.
#[1] 7
``````

1

3613 作者的声誉

...and illustrating the diversity of R, here's a solution using `sqldf`.

``````date = seq(as.Date("2000-01-01"), as.Date("2000-01-31"), by="days")
nums = seq(1, 1, length.out = 31)
df = data.frame(date, nums)

startDate <- as.Date("2000-01-01")
endDate <- as.Date("2000-01-07")
library(sqldf)
fn\$sqldf("select sum(nums) from df where date between \$startDate and \$endDate")
``````

and the output:

``````> fn\$sqldf("select sum(nums) from df where date between \$startDate and \$endDate")
sum(nums)
1         7
>
``````