R dplyr Tutorial: Data Manipulation (Join) & Cleaning (Spread)

This brief tutorial covers data manipulation with joins using the dplyr package in R, and data cleaning with the spread function from the tidyr package.

  1. Data Manipulation with Join Operations:

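The dplyr package provides functions for combining datasets through several types of joins. Here, we'll cover the four common ones: inner join (keeps only rows whose key appears in both tables), left join (keeps every row of the first table), right join (keeps every row of the second table), and full join (keeps every row of both tables). Rows without a match on the other side are filled with NA.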

Load the dplyr package

library(dplyr)

Sample data frames

orders <- data.frame(OrderID = c(101, 102, 103),
                     CustomerID = c("C1", "C2", "C3"))
customers <- data.frame(CustomerID = c("C1", "C2", "C4"),
                        CustomerName = c("Alice", "Bob", "Eve"))

Inner Join

inner_join_result <- inner_join(orders, customers, by = "CustomerID")

Left Join

left_join_result <- left_join(orders, customers, by = "CustomerID")

Right Join

right_join_result <- right_join(orders, customers, by = "CustomerID")

Full Join

full_join_result <- full_join(orders, customers, by = "CustomerID")
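
To make the difference between the four joins concrete, the sketch below rebuilds the sample data and checks each result. It assumes dplyr is installed. The stringsAsFactors = FALSE argument is an addition for illustration, so the key columns stay character vectors on R versions before 4.0. The counts follow from the keys: C1 and C2 appear in both tables, C3 only in orders, and C4 only in customers.

```r
library(dplyr)

# Same sample data as above; stringsAsFactors = FALSE is an added assumption
# so the example behaves the same on older R versions
orders <- data.frame(OrderID = c(101, 102, 103),
                     CustomerID = c("C1", "C2", "C3"),
                     stringsAsFactors = FALSE)
customers <- data.frame(CustomerID = c("C1", "C2", "C4"),
                        CustomerName = c("Alice", "Bob", "Eve"),
                        stringsAsFactors = FALSE)

inner_join_result <- inner_join(orders, customers, by = "CustomerID")
left_join_result  <- left_join(orders, customers, by = "CustomerID")
right_join_result <- right_join(orders, customers, by = "CustomerID")
full_join_result  <- full_join(orders, customers, by = "CustomerID")

# Inner join: only keys present in both tables (C1, C2)
nrow(inner_join_result)                     # 2

# Left join: all three orders; C3 has no customer, so its CustomerName is NA
sum(is.na(left_join_result$CustomerName))   # 1

# Right join: all three customers; C4 has no order, so its OrderID is NA
sum(is.na(right_join_result$OrderID))       # 1

# Full join: every key from either table (C1, C2, C3, C4)
nrow(full_join_result)                      # 4
```

Counting NA values, rather than relying on row order, keeps the checks valid regardless of how a given dplyr version orders unmatched rows.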

  2. Data Cleaning with Spread:

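The spread function from the tidyr package reshapes data from long to wide format: each distinct value in the key column becomes a new column, filled with the corresponding entries from the value column.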

Load the tidyr package for the spread function

library(tidyr)

Sample data frame

data_long <- data.frame(ID = c(1, 1, 2, 2),
                        Key = c("A", "B", "A", "B"),
                        Value = c(10, 20, 30, 40))

Spread the data from long to wide format

data_wide <- spread(data_long, Key, Value)

Output:

  ID  A  B
1  1 10 20
2  2 30 40
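
In tidyr 1.0.0 and later, spread() is marked as superseded in favor of pivot_wider(). As a hedged sketch, assuming a tidyr version that provides pivot_wider(), the same reshaping could be written as below. The name data_wide2 is introduced here only for illustration.

```r
library(tidyr)

# Same long-format sample data as above
data_long <- data.frame(ID = c(1, 1, 2, 2),
                        Key = c("A", "B", "A", "B"),
                        Value = c(10, 20, 30, 40))

# names_from: the column whose distinct values become new column names
# values_from: the column whose entries fill those new columns
data_wide2 <- pivot_wider(data_long, names_from = Key, values_from = Value)

# as.data.frame() yields a plain data frame whatever class pivot_wider() returns
data_wide2 <- as.data.frame(data_wide2)
data_wide2
```

The result should hold the same ID, A, and B columns as data_wide from spread(). The named arguments names_from and values_from make the roles of the two columns explicit, where spread() relies on argument position.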