Read Data from Multiple Excel Sheets and Convert them to Individual Data Frames

I work with a team of data enthusiasts where we deal with a large amount of data every single day. Sometimes it happens that we end up storing the data into multiple Excel files or multiple Excel sheets.

While data analysis though, we have to read the data from different sheets and convert them to individual data frames; this means the same function to import the data must be used several times.

I am sure I am not the only one who works with multiple data sets at once, therefore I decided to write this post to explore an efficient way of reading data from several Excel sheets and storing them into individual data frames all at once, which I hope it can be of some help for those who face the same challenge and to have a record of the script for my future work as well.

Step 1: Reading data

Suppose you have an Excel file named as data.xlsx with data in several sheets and you aim to import the data from every single sheet all at once. There are at least two ways of doing this: 1) using the lapply() function, 2) using map() function from purrr package.

using lapply( )

library(readxl)

df_list <- lapply(excel_sheets("data.xlsx"), function(x)
  read_excel("data.xlsx", sheet = x)
  )

using map ( )

library(purrr)

df_list <- map(set_names(excel_sheets("data.xlsx")),
               read_excel, path = "data.xlsx"
               )

Although both lapply() and map() store the final output as a list, the map() function creates a named list where the name of each element of the list corresponds to the name of each sheet which the data come from. Thus, later on you will be able to easily identify which Excel sheet is stored in which element of the list of data frames.

Step 2: Converting the list of data frames into individual data frames

Once you read the data from Excel sheets and store them in a list, the next step is to convert them to individual data frames - unless if you want to apply some list-wise operations such as removing a particular row/column from all data frames at once.

There are several ways of doing this. The straightforward way which I found most appealing and simple to use is the list2env() function. You only need to feed it with the list in which the data frames are stored and it will convert each component of the list into a single object.

list2env(df_list, envir = .GlobalEnv)

Apart from that, the same task can be accomplished using the assign() function with lapply() and map() functions too as below.

using lapply( )

lapply(names(df_list), function(x)
  assign(x, df_list[[x]], envir = .GlobalEnv)
  )

using map( )

purrr::map(names(df_list),
           ~assign(.x, df_list[[.x]], envir = .GlobalEnv)
           )

Related