How to read multiple CSV files in R and skip the last row using the fread function

r csv data.table fread

I am trying to read multiple CSV files using the fread function. However, the last row of each file contains unnecessary data, and fread throws an error when it reaches that row.

Code:

library(data.table)
fnames <- list.files("Path", pattern = "^.*Star.*.csv$", full.names = TRUE)

read_data <- function(z) {
  # nrows = -1 (the old default) means "read all rows"; it does not drop the last line
  fread(z, verbose = TRUE, nrows = -1)
}

datalist <- lapply(fnames, read_data)

bigdata <- rbindlist(datalist, use.names = TRUE)

Error:

Error during wrapup: Expected sep (',') but new line, EOF (or other non printing character) ends field 4 when detecting types from point 10: 2704,IE,N,ENDOFFILEMARKER,5397786

The last row of each file contains the text ENDOFFILEMARKER.

Note:


  • I need to use fread because each data file is around 700 MB.

Asked by dharma on December 27, 2017

Answers (1)

Without seeing your CSV files, it is difficult to determine the best answer. Perhaps try reading a single file with fread first. Something like this may work:

dat <- fread("grep -v ENDOFFILEMARKER filename.csv")

where filename.csv is one of your files, placed in your working directory. The -v flag makes grep return every line except those containing the string ENDOFFILEMARKER. Once this works for one file, you can apply the same logic to all of the files using lapply.
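A minimal sketch of that lapply step, assuming a data.table version recent enough to have fread's `cmd=` argument (older versions, as in the line above, took the shell command as the first argument) and that `grep` is on the PATH. `read_without_marker` is a hypothetical helper name; `"Path"` and the file pattern are copied from the question.

```r
library(data.table)

# Hypothetical helper: read one CSV while dropping every line that contains the marker.
# grep -v prints the lines that do NOT match; -F treats the marker as a fixed string.
read_without_marker <- function(path, marker = "ENDOFFILEMARKER") {
  fread(cmd = paste("grep -v -F", shQuote(marker), shQuote(path)))
}

# File list as in the question ("Path" is the asker's placeholder directory).
fnames <- list.files("Path", pattern = "^.*Star.*.csv$", full.names = TRUE)

datalist <- lapply(fnames, read_without_marker)
bigdata  <- rbindlist(datalist, use.names = TRUE)
```

Filtering in the shell means fread never sees the marker line, so the file is still read in a single fast pass.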

Another option that has worked for me is the readLines function. The downside is speed: readLines loads every line of the file into memory just so it can be counted, which is slow for large files. But if you can't find another way, readLines will work. Here is roughly how I used it on one file:

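length_a <- length(readLines("filename.csv"))      # total number of lines in the file
dt <- fread("filename.csv", nrows = length_a - 1)  # stop before the last line
# nrows counts data rows only, so if the file has a header line use length_a - 2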

Once it works for one file, you can wrap it in a function and loop over all your files.
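A sketch of the readLines approach applied to every file, assuming each file has exactly one header line (adjustable through the hypothetical `has_header` argument) and that the marker is always the final line. `read_all_but_last` is a hypothetical name; the directory and pattern come from the question.

```r
library(data.table)

# Hypothetical helper: read every data row except the final line of the file.
read_all_but_last <- function(path, has_header = TRUE) {
  n_lines <- length(readLines(path))  # slow on big files: loads all lines into memory
  # nrows counts data rows only, so subtract the header line (if any) and the last line
  n_data <- n_lines - 1L - as.integer(has_header)
  fread(path, nrows = n_data, header = has_header)
}

fnames  <- list.files("Path", pattern = "^.*Star.*.csv$", full.names = TRUE)
bigdata <- rbindlist(lapply(fnames, read_all_but_last), use.names = TRUE)
```

This reads each file twice (once to count lines, once with fread), which is the cost of not relying on external shell tools.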

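I understand that fread("head -n -1 filename.csv") is the commonly suggested way to skip the last line, but I have never been able to get it to work properly. (Negative counts for head -n are a GNU coreutils extension, so this will not work with the BSD head found on macOS, which may be one reason it fails on some systems.)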

Edit: If you are using Windows, this may work for you:

 dat <- fread('findstr /V /C:"ENDOFFILEMARKER" filename.csv')

grep works well on Linux, or on Windows if you have Unix tools installed. On plain Windows, the findstr command is the closest equivalent to grep. The /V flag returns all lines except those containing ENDOFFILEMARKER, and /C:"..." treats the quoted text as a single literal search string, so it can match one word exactly or a phrase that contains spaces.
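To use the same script on both systems, a sketch that picks findstr or grep from `.Platform$OS.type`, assuming findstr is available on Windows, grep elsewhere, and that fread has the `cmd=` argument. `marker_filter_cmd` and `read_clean` are hypothetical names.

```r
library(data.table)

# Hypothetical helper: build a shell command that prints every line of `path`
# except those containing `marker`, using findstr on Windows and grep elsewhere.
marker_filter_cmd <- function(path, marker = "ENDOFFILEMARKER") {
  if (.Platform$OS.type == "windows") {
    path <- normalizePath(path)  # backslash separators for findstr
    # /V inverts the match; /C: treats the quoted text as one literal string
    sprintf('findstr /V /C:"%s" "%s"', marker, path)
  } else {
    # -v inverts the match; -F treats the marker as a fixed string
    paste("grep -v -F", shQuote(marker), shQuote(path))
  }
}

# Hypothetical wrapper: read one file with the marker line filtered out.
read_clean <- function(path) fread(cmd = marker_filter_cmd(path))
```

It can then be used exactly like the single-file reader, e.g. `rbindlist(lapply(fnames, read_clean), use.names = TRUE)`.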

Answered by FG7 on 29.12.2017 03:03