A CSV (Comma-Separated Values) file is a plain text file that uses commas to separate values.
R has built-in functions that make it easy to read and write CSV files.
Sample CSV File
To demonstrate how to read CSV files in R, let's suppose we have a CSV file named airtravel.csv with the following data:
Month, 1958, 1959, 1960
JAN, 340, 360, 417
FEB, 318, 342, 391
MAR, 362, 406, 419
APR, 348, 396, 461
MAY, 363, 420, 472
JUN, 435, 472, 535
JUL, 491, 548, 622
AUG, 505, 559, 606
SEP, 404, 463, 508
OCT, 359, 407, 461
NOV, 310, 362, 390
DEC, 337, 405, 432
The CSV file above contains sample data on monthly air travel, in thousands of passengers, for 1958-1960.
Now, let's read data from this CSV file using R's built-in functions.
Read a CSV File in R
In R, we use the read.csv() function to read a CSV file available in our current directory. For example,
# read airtravel.csv file from our current directory
read_data <- read.csv("airtravel.csv")
# display csv file
print(read_data)
Output
   Month X1958 X1959 X1960
1    JAN   340   360   417
2    FEB   318   342   391
3    MAR   362   406   419
4    APR   348   396   461
5    MAY   363   420   472
6    JUN   435   472   535
7    JUL   491   548   622
8    AUG   505   559   606
9    SEP   404   463   508
10   OCT   359   407   461
11   NOV   310   362   390
12   DEC   337   405   432
In the above example, we have read the airtravel.csv file that is available in our current directory. Notice the code,
read_data <- read.csv("airtravel.csv")
Here, read.csv() reads the CSV file airtravel.csv and creates a data frame, which is stored in the read_data variable.
Finally, the data frame is displayed using print().
Note: Column names that start with a digit are not valid R names, so read.csv() prefixes them with X by default. The 1958 column therefore becomes X1958 in the data frame.
Note: If the file is in some other location, we have to specify the path along with the file name as: read.csv("D:/folder1/airtravel.csv").
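The renaming of the year columns is easy to check directly. The sketch below is only an illustration: it supplies the first three rows of the sample data inline through the text argument of read.csv() (an assumption made so the snippet runs without airtravel.csv on disk), and the variable names csv_text, default_names, and raw_names are made up for this example.

```r
# First rows of the sample data, supplied inline so the snippet is self-contained
csv_text <- "Month, 1958, 1959, 1960
JAN, 340, 360, 417
FEB, 318, 342, 391
MAR, 362, 406, 419"

# Default behaviour (check.names = TRUE): headers are turned into valid R names
default_names <- names(read.csv(text = csv_text))
print(default_names)

# check.names = FALSE keeps the headers as they appear in the file
raw_names <- names(read.csv(text = csv_text, check.names = FALSE))
print(raw_names)
```

With check.names = FALSE, a column such as 1958 has to be quoted with backticks when accessed, for example read_data$`1958`, which is why the default X prefix is often the more convenient choice.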
Number of Rows and Columns of CSV File in R
We use the ncol() and nrow() functions to get the total number of columns and rows present in the CSV file in R. For example,
# read airtravel.csv file from our directory
read_data <- read.csv("airtravel.csv")
# print total number of columns
cat("Total Columns:", ncol(read_data), "\n")
# print total number of rows
cat("Total Rows:", nrow(read_data), "\n")
Output
Total Columns: 4
Total Rows: 12
In the above example, we have used the ncol() and nrow() functions to find the total number of columns and rows in the airtravel.csv file.
Here,
ncol(read_data) - returns the total number of columns, i.e. 4
nrow(read_data) - returns the total number of rows, i.e. 12
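Both counts can also be obtained in a single call with dim(), which returns the number of rows followed by the number of columns. The following is a minimal sketch that recreates the sample data inline (an assumption, so that it runs without the airtravel.csv file); csv_text and size are illustrative names.

```r
# Sample data supplied inline so the snippet is self-contained
csv_text <- "Month, 1958, 1959, 1960
JAN, 340, 360, 417
FEB, 318, 342, 391
MAR, 362, 406, 419
APR, 348, 396, 461
MAY, 363, 420, 472
JUN, 435, 472, 535
JUL, 491, 548, 622
AUG, 505, 559, 606
SEP, 404, 463, 508
OCT, 359, 407, 461
NOV, 310, 362, 390
DEC, 337, 405, 432"

read_data <- read.csv(text = csv_text)

# dim() returns c(number of rows, number of columns)
size <- dim(read_data)
cat("Rows:", size[1], "Columns:", size[2], "\n")
```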
Using min() and max() With CSV Files
In R, we can also find the minimum and maximum values in a certain column of a CSV file using the min() and max() functions. For example,
# read airtravel.csv file from our directory
read_data <- read.csv("airtravel.csv")
# return minimum value of the 1960 column (named X1960 by read.csv())
min_data <- min(read_data$X1960)
print(min_data) # 390
# return maximum value of the 1958 column (named X1958 by read.csv())
max_data <- max(read_data$X1958)
print(max_data) # 505
Output
[1] 390
[1] 505
Here, we have used the min() and max() functions to find the minimum and maximum values of the 1960 and 1958 columns of the airtravel.csv file respectively. Because read.csv() prefixes column names that start with a digit with X, these columns are accessed as X1960 and X1958.
min(read_data$X1960) - returns the minimum value from the 1960 column, i.e. 390
max(read_data$X1958) - returns the maximum value from the 1958 column, i.e. 505
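Knowing the smallest value is often less useful than knowing which row it belongs to. As a hedged illustration, the sketch below uses which.min() to find the month with the fewest passengers in 1960 and range() to get the minimum and maximum of 1958 in one call. The data is recreated inline as an assumption, and low_row, low_month, and year_range are names invented for this example.

```r
# Sample data supplied inline so the snippet is self-contained
csv_text <- "Month, 1958, 1959, 1960
JAN, 340, 360, 417
FEB, 318, 342, 391
MAR, 362, 406, 419
APR, 348, 396, 461
MAY, 363, 420, 472
JUN, 435, 472, 535
JUL, 491, 548, 622
AUG, 505, 559, 606
SEP, 404, 463, 508
OCT, 359, 407, 461
NOV, 310, 362, 390
DEC, 337, 405, 432"

read_data <- read.csv(text = csv_text)

# which.min() gives the row index of the smallest value in the column
low_row <- which.min(read_data$X1960)
low_month <- as.character(read_data$Month[low_row])
print(low_month)

# range() returns the minimum and maximum together
year_range <- range(read_data$X1958)
print(year_range)
```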
Subset of a CSV File in R
In R, we use the subset() function to return all the rows from a CSV file that satisfy the specified condition. For example,
# read airtravel.csv file from our directory
read_data <- read.csv("airtravel.csv")
# return subset of csv where the number of air
# travelers in 1958 is greater than 400
# (read.csv() names the 1958 column X1958)
sub_data <- subset(read_data, X1958 > 400)
print(sub_data)
Output
  Month X1958 X1959 X1960
6   JUN   435   472   535
7   JUL   491   548   622
8   AUG   505   559   606
9   SEP   404   463   508
In the above example, we have specified a condition inside the subset() function to extract data from a CSV file.
subset(read_data, X1958 > 400)
Here, subset() creates a subset of airtravel.csv containing the rows where the X1958 column is greater than 400, and stores it in the sub_data data frame.
Since the X1958 column has values greater than 400 in the 6th, 7th, 8th, and 9th rows, only these rows are displayed.
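Conditions can be combined with & and the returned columns limited with the select argument of subset(). The sketch below is one possible illustration, with the sample data recreated inline (an assumption) and the names busy and same chosen for this example. It also shows the equivalent bracket indexing for comparison.

```r
# Sample data supplied inline so the snippet is self-contained
csv_text <- "Month, 1958, 1959, 1960
JAN, 340, 360, 417
FEB, 318, 342, 391
MAR, 362, 406, 419
APR, 348, 396, 461
MAY, 363, 420, 472
JUN, 435, 472, 535
JUL, 491, 548, 622
AUG, 505, 559, 606
SEP, 404, 463, 508
OCT, 359, 407, 461
NOV, 310, 362, 390
DEC, 337, 405, 432"

read_data <- read.csv(text = csv_text)

# Rows above 400 in 1958 AND above 600 in 1960, keeping two columns only
busy <- subset(read_data, X1958 > 400 & X1960 > 600,
               select = c(Month, X1960))
print(busy)

# The same selection written with bracket indexing
same <- read_data[read_data$X1958 > 400 & read_data$X1960 > 600,
                  c("Month", "X1960")]
```

subset() is convenient for interactive use because column names can be written bare inside the condition, while bracket indexing spells out the data frame each time.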
Write Into CSV File in R
In R, we use the write.csv() function to write into a CSV file. We pass the data in the form of a data frame. For example,
# Create a data frame
dataframe1 <- data.frame(
  Name = c("Juan", "Alcaraz", "Simantha"),
  Age = c(22, 15, 19),
  Vote = c(TRUE, FALSE, TRUE)
)
# write dataframe1 into file1 csv file
write.csv(dataframe1, "file1.csv")
In the above example, we have used the write.csv() function to export a data frame named dataframe1 to a CSV file. Notice the arguments passed inside write.csv(),
write.csv(dataframe1, "file1.csv")
Here,
dataframe1 - name of the data frame we want to export
file1.csv - name of the CSV file
Finally, the file1.csv file would look like this in our directory:
"","Name","Age","Vote"
"1","Juan",22,TRUE
"2","Alcaraz",15,FALSE
"3","Simantha",19,TRUE
If we pass quote = FALSE to write.csv() as:
write.csv(dataframe1, "file1.csv",
  quote = FALSE
)
Our file1.csv would look like this:
,Name,Age,Vote
1,Juan,22,TRUE
2,Alcaraz,15,FALSE
3,Simantha,19,TRUE
All the double quotes " " that wrapped the values are removed.
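By default, write.csv() also writes the row names as an unnamed first column. If that column is not wanted, row.names = FALSE leaves it out. The sketch below is a minimal round trip under the assumption that writing to a temporary file is acceptable, so tempfile() is used instead of file1.csv; out_file, file_lines, and round_trip are illustrative names.

```r
# Same data frame as in the example above
dataframe1 <- data.frame(
  Name = c("Juan", "Alcaraz", "Simantha"),
  Age = c(22, 15, 19),
  Vote = c(TRUE, FALSE, TRUE)
)

# Write to a temporary file so the working directory is left untouched
out_file <- tempfile(fileext = ".csv")

# row.names = FALSE omits the leading column of row numbers
write.csv(dataframe1, out_file, row.names = FALSE)

# Inspect the raw lines that were written
file_lines <- readLines(out_file)
print(file_lines)

# Read the file back to confirm the columns survive the round trip
round_trip <- read.csv(out_file)
print(round_trip)
```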