Hello, readers! In this article, we would be having a look at an important concept of R programming - Reshaping data using R melt() and cast() functions, in detail.
The R melt() and cast() functions help us to reshape the data within a data frame into any customized shape.
Let’s understand both the functions in detail. Here we go!
The melt() function
in R programming is an in-built function. It enables us to reshape and elongate the data frames in a user-defined manner. It organizes the data values in a long data frame format.
Have a look at the below syntax!
Syntax:
melt(data-frame, na.rm = FALSE, value.name = “name”, id = 'columns')
We pass the data frame to the reshaped to the function along with na.rm = FALSE as the default value which means the NA values won’t be ignored.
Further, we pass the new variable/column name to value.name parameter to store the elongated values obtained from the function into it.
The ID parameter is set to the column names of the data frame with respect to which the reshaping would happen.
Example:
In this example, we would be making use of libraries ‘MASS, reshape2, and reshape’. Having created the data frame, we apply the melt() function on the data frame with respect to the column A and B.
rm(list = ls())
install.packages("MASS")
install.packages("reshape2")
install.packages("reshape")
library(MASS)
library(reshape2)
library(reshape)
A <- c(1,2,3,4,2,3,4,1)
B <- c(1,2,3,4,2,3,4,1)
a <- c(10,20,30,40,50,60,70,80)
b <- c(100,200,300,400,500,600,700,800)
data <- data.frame(A,B,a,b)
print("Original data frame:\n")
print(data)
melt_data <- melt(data, id = c("A","B"))
print("Reshaped data frame:\n")
print(melt_data)
Output:
[1] "Original data frame:\n"
A B a b
1 1 1 10 100
2 2 2 20 200
3 3 3 30 300
4 4 4 40 400
5 2 2 50 500
6 3 3 60 600
7 4 4 70 700
8 1 1 80 800
[1] "Reshaped data frame:\n"
> print(melt_data)
A B variable value
1 1 1 a 10
2 2 2 a 20
3 3 3 a 30
4 4 4 a 40
5 2 2 a 50
6 3 3 a 60
7 4 4 a 70
8 1 1 a 80
9 1 1 b 100
10 2 2 b 200
11 3 3 b 300
12 4 4 b 400
13 2 2 b 500
14 3 3 b 600
15 4 4 b 700
16 1 1 b 800
As seen above, after applying melt() function, the data frame gets converted to an elongated data frame. In order to regain the nearly original and natural shape of the data frame, R cast() function
is used.
The cast() function accepts an aggregated function and a formula as a parameter (here, formula is the manner in which the data is to be represented after reshaping) and casts the elongated or molted data frame into a nearly aggregated form of data frame.
Syntax:
cast(data, formula, aggregate function)
We can provide the cast() function with any aggregate function available such as mean, sum, etc.
Example:
rm(list = ls())
library(MASS)
library(reshape2)
library(reshape)
A <- c(1,2,3,4,2,3,4,1)
B <- c(1,2,3,4,2,3,4,1)
a <- c(10,20,30,40,50,60,70,80)
b <- c(100,200,300,400,500,600,700,800)
data <- data.frame(A,B,a,b)
print("Original data frame:\n")
print(data)
melt_data <- melt(data, id = c("A"))
print("Reshaped data frame after melting:\n")
print(melt_data)
cast_data = cast(melt_data, A~variable, mean)
print("Reshaped data frame after casting:\n")
print(cast_data)
As seen above, we have passed mean as the aggregate function to cast() and have set variable equivalent to A variable as the format of representation.
Output:
[1] "Original data frame:\n"
A B a b
1 1 1 10 100
2 2 2 20 200
3 3 3 30 300
4 4 4 40 400
5 2 2 50 500
6 3 3 60 600
7 4 4 70 700
8 1 1 80 800
[1] "Reshaped data frame after melting:\n"
A variable value
1 1 B 1
2 2 B 2
3 3 B 3
4 4 B 4
5 2 B 2
6 3 B 3
7 4 B 4
8 1 B 1
9 1 a 10
10 2 a 20
11 3 a 30
12 4 a 40
13 2 a 50
14 3 a 60
15 4 a 70
16 1 a 80
17 1 b 100
18 2 b 200
19 3 b 300
20 4 b 400
21 2 b 500
22 3 b 600
23 4 b 700
24 1 b 800
[1] "Reshaped data frame after casting:\n"
A B a b
1 1 1 45 450
2 2 2 35 350
3 3 3 45 450
4 4 4 55 550
By this, we have come to the end of this topic. Feel free to comment below, in case you come across any question.
Till then, Happy Learning!! :)
Reference:
Thanks for learning with the DigitalOcean Community. Check out our offerings for compute, storage, networking, and managed databases.