diff --git a/NEWS.md b/NEWS.md index fc6944943..d3be30ec1 100644 --- a/NEWS.md +++ b/NEWS.md @@ -107,6 +107,21 @@ 18. `fwrite` now allows `dec` to be the same as `sep` for edge cases where only one will be written, e.g. 0-row or 1-column tables. [#7227](https://github.com/Rdatatable/data.table/issues/7227). Thanks @MichaelChirico for the report and @venom1204 for the fix. +19. Using `by=` or `keyby=` with a simple numeric or character vector in `j` (e.g. `DT[, 1:2, by=grp]`) used to silently ignore the grouping argument. This now issues a warning to alert the user that grouping is not applied in this syntax and guides them to use the `.SD` idiom instead. [#5397](https://github.com/Rdatatable/data.table/issues/5397). Thanks to @mcol for the report and @venom1204 for the fix. + + ```r + DT = data.table(a=1:4, grp=c(1,1,2,2)) + DT[, 1, by = grp] + # a + # + # 1: 1 + # 2: 2 + # 3: 3 + # 4: 4 + # Warning message: + # `by` or `keyby` is ignored when `j` is a numeric vector... + ``` + ### NOTES 1. The following in-progress deprecations have proceeded: diff --git a/R/data.table.R b/R/data.table.R index c64c391b3..f60ade1ca 100644 --- a/R/data.table.R +++ b/R/data.table.R @@ -735,9 +735,20 @@ replace_dot_alias = function(e) { names(..syms) = ..syms j = eval(jsub, lapply(substr(..syms, 3L, nchar(..syms)), get, pos=parent.frame()), parent.frame()) } - if (is.logical(j)) j = which(j) if (!length(j) && !notj) return( null.data.table() ) if (is.factor(j)) j = as.character(j) # fix for FR: #358 + if (is.character(j) || (is.numeric(j) && !is.logical(j))) { + if (!missingby) { + j_type = if (is.character(j)) "a character" else "a numeric" + warningf( + "`by` or `keyby` is ignored when `j` is %s vector used for column selection. Perhaps you intended to use `.SD`? For example: DT[, .SD[, %s], by = ...]",j_type, + gsub("%", "%%", deparse(jsub)) + ) + } + } + + if (is.logical(j)) j = which(j) + if (is.character(j)) { if (notj) { if (anyNA(idx <- chmatch(j, names_x))) diff --git a/inst/tests/tests.Rraw b/inst/tests/tests.Rraw index d7ffc476b..f3a8f6009 100644 --- a/inst/tests/tests.Rraw +++ b/inst/tests/tests.Rraw @@ -21620,3 +21620,12 @@ local({ test(2338.9, {fwrite(dd, f, forceDecimal=FALSE); fread(f)}, di) }) +# 5397 - keyby/key ignored if numeric indices used in j +DT = data.table(a=1:4, b=5:8, g=c(1,1,2,2)) +test(2339.1, DT[, 1:2, by=g], DT[, 1:2], warning="`by` or `keyby` is ignored") +test(2339.2, DT[, 2:1, keyby=g], DT[, 2:1], warning="`by` or `keyby` is ignored") +test(2339.3, DT[, c("b", "a"), by=g, with=FALSE], DT[, c("b", "a")], warning="`by` or `keyby` is ignored") +expected_sd = data.table(g=c(1,1,2,2), a=1:4, b=5:8) +test(2339.4, DT[, .SD[, 1:2], by=g], expected_sd) +expected_single_int = data.table(g=c(1,2), V1=c(1,1)) +test(2339.5, DT[, 1, by=g], expected_single_int) \ No newline at end of file diff --git a/man/data.table.Rd b/man/data.table.Rd index 658c57234..d2dccead3 100644 --- a/man/data.table.Rd +++ b/man/data.table.Rd @@ -97,7 +97,7 @@ data.table(\dots, keep.rownames=FALSE, check.names=FALSE, key=NULL, stringsAsFac See \href{../doc/datatable-intro.html}{\code{vignette("datatable-intro")}} and \code{example(data.table)}.} - \item{by}{ Column names are seen as if they are variables (as in \code{j} when \code{with=TRUE}). The \code{data.table} is then grouped by the \code{by} and \code{j} is evaluated within each group. The order of the rows within each group is preserved, as is the order of the groups. \code{by} accepts: + \item{by}{ Column names are seen as if they are variables (as in \code{j} when \code{with=TRUE}). \emph{Note that \code{by} and \code{keyby} are ignored when \code{j} is a character or numeric vector used for selecting columns (i.e., when the internal \code{with=FALSE} is triggered).} The \code{data.table} is then grouped by the \code{by} and \code{j} is evaluated within each group. The order of the rows within each group is preserved, as is the order of the groups. \code{by} accepts: \itemize{ \item A single unquoted column name: e.g., \code{DT[, .(sa=sum(a)), by=x]}