Unit 2 Reading and Writing Files
as follows:
read.table("GeeksforGeeks.txt")
Output:
id,name,salary,start_date,dept
1,Rick,623.3,2012-01-01,IT
2,Dan,515.2,2013-09-23,Operations
3,Michelle,611,2014-11-15,IT
4,Ryan,729,2014-05-11,HR
5,Gary,843.25,2015-03-27,Finance
6,Nina,578,2013-05-21,IT
7,Simon,632.8,2013-07-30,Operations
8,Guru,722.5,2014-06-17,Finance
# Read the CSV file into a data frame.
data <- read.csv("input.csv")
print(is.data.frame(data))
print(ncol(data))
print(nrow(data))
When we execute the above code, it produces the following result:
[1] TRUE
[1] 5
[1] 8
Once we read data into a data frame, we can apply all the
functions applicable to data frames, as explained in the
subsequent section.
Get the maximum salary
# Create a data frame.
data <- read.csv("input.csv")
# Get the maximum salary.
print(max(data$salary))
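For a version that runs without input.csv on disk, the sketch below embeds the same eight rows as text (read.csv() forwards text = to read.table()). It assumes the column names shown in the CSV data above and additionally uses which.max(), which the original does not, to pull out the whole record of the highest-paid employee.

```r
# The eight input.csv rows shown above, embedded as text so this
# sketch runs without the file on disk.
csv_text <- "id,name,salary,start_date,dept
1,Rick,623.3,2012-01-01,IT
2,Dan,515.2,2013-09-23,Operations
3,Michelle,611,2014-11-15,IT
4,Ryan,729,2014-05-11,HR
5,Gary,843.25,2015-03-27,Finance
6,Nina,578,2013-05-21,IT
7,Simon,632.8,2013-07-30,Operations
8,Guru,722.5,2014-06-17,Finance"

# read.csv() forwards text = to read.table(), which reads the string.
data <- read.csv(text = csv_text)

# Largest value in the salary column.
sal <- max(data$salary)
print(sal)   # [1] 843.25

# which.max() returns the row index of that maximum, so the full
# record of the highest-paid employee can be selected.
top <- data[which.max(data$salary), ]
print(top)
```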
Here the column X holds the row names of the data set newper,
which write.csv() writes by default. It can be dropped by passing
row.names = FALSE when writing the file.
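A minimal sketch of this row-name behaviour, using a small hypothetical data frame emp in place of newper and a temporary file from tempfile():

```r
# Hypothetical stand-in for newper: two rows of the sample data.
emp <- data.frame(id = c(1, 2), name = c("Rick", "Dan"))
path <- tempfile(fileext = ".csv")

# By default write.csv() also writes the row names as an unnamed
# first column; read.csv() then names that column "X".
write.csv(emp, path)
with_rn <- read.csv(path)
print(names(with_rn))      # "X" "id" "name"

# row.names = FALSE leaves the row names out of the file.
write.csv(emp, path, row.names = FALSE)
without_rn <- read.csv(path)
print(names(without_rn))   # "id" "name"
```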
R - Excel File
Microsoft Excel is the most widely used spreadsheet program
and stores data in the .xls or .xlsx format. R can read these
files directly using Excel-specific packages such as XLConnect,
xlsx and gdata. We will use the xlsx package, which can also
write to Excel files.
Install xlsx Package
You can use the following command in the R console to install
the "xlsx" package. It may ask you to install some additional
packages that this package depends on. Run the same command
with the required package name to install them.
install.packages("xlsx")
# Verify that the package is installed (returns TRUE if it is).
any(grepl("xlsx",installed.packages()))
Open Microsoft Excel. Copy and paste the following data into the
worksheet named sheet1.
id name salary start_date dept
1 Rick 623.3 1/1/2012 IT
2 Dan 515.2 9/23/2013 Operations
3 Michelle 611 11/15/2014 IT
4 Ryan 729 5/11/2014 HR
5 Gary 843.25 3/27/2015 Finance
6 Nina 578 5/21/2013 IT
7 Simon 632.8 7/30/2013 Operations
8 Guru 722.5 6/17/2014 Finance
Also copy the following data into a second worksheet.
name city
Rick Seattle
Dan Tampa
Michelle Chicago
Ryan Seattle
Gary Houston
Nina Boston
Simon Mumbai
Guru Dallas
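A sketch of writing and reading these two worksheets with the xlsx package, assuming it (and the Java runtime it relies on) is installed. To stay self-contained it first writes a few of the rows above to a temporary workbook with write.xlsx(); the second sheet name "city" and the temporary file location are assumptions for illustration. read.xlsx() then reads a sheet either by position (sheetIndex) or by name (sheetName).

```r
library("xlsx")

# A few rows from the two tables above (columns trimmed for brevity).
emp  <- data.frame(id = c(1, 2, 3),
                   name = c("Rick", "Dan", "Michelle"),
                   salary = c(623.3, 515.2, 611))
city <- data.frame(name = c("Rick", "Dan", "Michelle"),
                   city = c("Seattle", "Tampa", "Chicago"))

# Write both data frames to one workbook; "city" is an assumed sheet name.
path <- tempfile(fileext = ".xlsx")
write.xlsx(emp, path, sheetName = "sheet1", row.names = FALSE)
write.xlsx(city, path, sheetName = "city", row.names = FALSE, append = TRUE)

# Read the first worksheet by position ...
data <- read.xlsx(path, sheetIndex = 1)
print(data)

# ... and the second one by name.
cities <- read.xlsx(path, sheetName = "city")
print(cities)
```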
R - XML Files
XML is a file format that shares both the file format and the
data on the World Wide Web, intranets, and elsewhere using
standard ASCII text. It stands for Extensible Markup Language.
Like HTML, it contains markup tags. But unlike HTML, where the
markup tags describe the structure of the page, in XML the
markup tags describe the meaning of the data contained in the
file.
You can read an XML file in R using the "XML" package. This
package can be installed using the following command.
install.packages("XML")
Input Data
Create an XML file by copying the data below into a text editor
such as Notepad. Save the file with a .xml extension, choosing
the file type All Files (*.*).
<RECORDS>
<EMPLOYEE>
<ID>1</ID>
<NAME>Rick</NAME>
<SALARY>623.3</SALARY>
<STARTDATE>1/1/2012</STARTDATE>
<DEPT>IT</DEPT>
</EMPLOYEE>
<EMPLOYEE>
<ID>2</ID>
<NAME>Dan</NAME>
<SALARY>515.2</SALARY>
<STARTDATE>9/23/2013</STARTDATE>
<DEPT>Operations</DEPT>
</EMPLOYEE>
<EMPLOYEE>
<ID>3</ID>
<NAME>Michelle</NAME>
<SALARY>611</SALARY>
<STARTDATE>11/15/2014</STARTDATE>
<DEPT>IT</DEPT>
</EMPLOYEE>
<EMPLOYEE>
<ID>4</ID>
<NAME>Ryan</NAME>
<SALARY>729</SALARY>
<STARTDATE>5/11/2014</STARTDATE>
<DEPT>HR</DEPT>
</EMPLOYEE>
<EMPLOYEE>
<ID>5</ID>
<NAME>Gary</NAME>
<SALARY>843.25</SALARY>
<STARTDATE>3/27/2015</STARTDATE>
<DEPT>Finance</DEPT>
</EMPLOYEE>
<EMPLOYEE>
<ID>6</ID>
<NAME>Nina</NAME>
<SALARY>578</SALARY>
<STARTDATE>5/21/2013</STARTDATE>
<DEPT>IT</DEPT>
</EMPLOYEE>
<EMPLOYEE>
<ID>7</ID>
<NAME>Simon</NAME>
<SALARY>632.8</SALARY>
<STARTDATE>7/30/2013</STARTDATE>
<DEPT>Operations</DEPT>
</EMPLOYEE>
<EMPLOYEE>
<ID>8</ID>
<NAME>Guru</NAME>
<SALARY>722.5</SALARY>
<STARTDATE>6/17/2014</STARTDATE>
<DEPT>Finance</DEPT>
</EMPLOYEE>
</RECORDS>
2
Dan
515.2
9/23/2013
Operations
3
Michelle
611
11/15/2014
IT
4
Ryan
729
5/11/2014
HR
5
Gary
843.25
3/27/2015
Finance
6
Nina
578
5/21/2013
IT
7
Simon
632.8
7/30/2013
Operations
8
Guru
722.5
6/17/2014
Finance
Let's look at the first record of the parsed file. It will give us an
idea of the various elements present in the top level node.
attr(,"class")
[1] "XMLInternalNodeList" "XMLNodeList"
Get Different Elements of a Node
# Load the packages required to read XML files.
library("XML")
library("methods")
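A minimal sketch of these steps, assuming the XML package is installed. To be self-contained it writes the first two EMPLOYEE records from the input data to a temporary file; xmlParse() reads it, xmlRoot() returns the RECORDS node, xmlSize() counts its children, [[ ]] indexing walks down to an element, xmlValue() extracts that element's text, and xmlToDataFrame() flattens the records into a data frame.

```r
library("XML")
library("methods")

# First two records of the input data, written to a temporary file.
xml_text <- paste0(
  "<RECORDS>",
  "<EMPLOYEE><ID>1</ID><NAME>Rick</NAME><SALARY>623.3</SALARY>",
  "<STARTDATE>1/1/2012</STARTDATE><DEPT>IT</DEPT></EMPLOYEE>",
  "<EMPLOYEE><ID>2</ID><NAME>Dan</NAME><SALARY>515.2</SALARY>",
  "<STARTDATE>9/23/2013</STARTDATE><DEPT>Operations</DEPT></EMPLOYEE>",
  "</RECORDS>")
path <- tempfile(fileext = ".xml")
writeLines(xml_text, path)

# Parse the file and get the top-level (root) node.
doc <- xmlParse(file = path)
rootnode <- xmlRoot(doc)

# Number of EMPLOYEE nodes under the root.
rootsize <- xmlSize(rootnode)
print(rootsize)

# Second child (NAME) of the first EMPLOYEE, and its text content.
print(rootnode[[1]][[2]])
first_name <- xmlValue(rootnode[[1]][[2]])
print(first_name)

# Convert all records in the file into a data frame.
xml_df <- xmlToDataFrame(path)
print(xml_df)
```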
R - JSON Files
install.packages("rjson")
Input Data
Create a JSON file by copying the data below into a text editor
such as Notepad. Save the file with a .json extension, choosing
the file type All Files (*.*).
{
"ID":["1","2","3","4","5","6","7","8" ],
"Name":
["Rick","Dan","Michelle","Ryan","Gary","Nina","Simon","Guru" ],
"Salary":
["623.3","515.2","611","729","843.25","578","632.8","722.5" ],
"StartDate":
[ "1/1/2012","9/23/2013","11/15/2014","5/11/2014","3/27/2015"
,"5/21/2013",
"7/30/2013","6/17/2014"],
"Dept":
[ "IT","Operations","IT","HR","Finance","IT","Operations","Finance"]
}
$Name
[1] "Rick" "Dan" "Michelle" "Ryan" "Gary" "Nina"
"Simon" "Guru"
$Salary
[1] "623.3" "515.2" "611" "729" "843.25" "578" "632.8"
"722.5"
$StartDate
[1] "1/1/2012" "9/23/2013" "11/15/2014" "5/11/2014"
"3/27/2015" "5/21/2013"
"7/30/2013" "6/17/2014"
$Dept
[1] "IT" "Operations" "IT" "HR" "Finance" "IT"
"Operations" "Finance"
print(json_data_frame)
When we execute the above code, it produces the following result:
id, name, salary, start_date, dept
1 1 Rick 623.30 2012-01-01 IT
2 2 Dan 515.20 2013-09-23 Operations
3 3 Michelle 611.00 2014-11-15 IT
4 4 Ryan 729.00 2014-05-11 HR
5 NA Gary 843.25 2015-03-27 Finance
6 6 Nina 578.00 2013-05-21 IT
7 7 Simon 632.80 2013-07-30 Operations
8 8 Guru 722.50 2014-06-17 Finance
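A minimal sketch of reading JSON with the rjson package installed above. It embeds a shortened, hypothetical three-record version of the input instead of reading a file; fromJSON() turns each JSON array into an R vector, and as.data.frame() combines those vectors into columns. Because every value in the input is a quoted string, the salary is converted with as.numeric() before it can be used in arithmetic.

```r
library("rjson")

# A shortened version of the JSON input above (first three employees).
json_text <- '{
  "ID": ["1", "2", "3"],
  "Name": ["Rick", "Dan", "Michelle"],
  "Salary": ["623.3", "515.2", "611"],
  "Dept": ["IT", "Operations", "IT"]
}'

# fromJSON() returns a named list with one vector per JSON key.
result <- fromJSON(json_text)
print(result)

# The salaries were stored as JSON strings, so convert them first.
result$Salary <- as.numeric(result$Salary)

# Equal-length vectors become the columns of the data frame.
json_data_frame <- as.data.frame(result, stringsAsFactors = FALSE)
print(json_data_frame)
```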
Example:
print(myData)
Output:
1 A computer science portal for geeks.
Note: The above R code assumes that the file
“geeksforgeeks.txt” is in your current working directory. To
find your current working directory, run the
function getwd() in the R console.
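To see how a bare file name is resolved, the sketch below temporarily switches to R's session temp directory with setwd(), writes a one-line geeksforgeeks.txt there (content copied from the output above), and reads it back by name alone. Using read.delim() with header = FALSE is an assumption for illustration; setwd() returns the previous directory, so it can be restored afterwards.

```r
# Work in a temporary directory so the sketch does not touch your files.
old_wd <- setwd(tempdir())   # setwd() returns the previous directory

# Write a one-line sample file into the working directory.
writeLines("A computer science portal for geeks.", "geeksforgeeks.txt")

print(getwd())                           # directory used for relative paths
print(file.exists("geeksforgeeks.txt"))  # TRUE: found by its bare name

# The relative name works because the file is in the working directory.
myData <- read.delim("geeksforgeeks.txt", header = FALSE)
print(myData)

setwd(old_wd)  # restore the original working directory
```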
read.delim2(): This method is used for reading “tab-separated
value” files (“.txt”). By default, a comma (“,”) is used as the
decimal point.
Syntax: read.delim2(file, header = TRUE, sep = "\t", dec = ",", ...)
Parameters:
file: the path to the file containing the data to be read into R.
header: a logical value. If TRUE, read.delim2() assumes that
your file has a header row, so row 1 is the name of each
column. If that’s not the case, you can add the argument
header = FALSE.
sep: the field separator character. “\t” is used for a tab-
delimited file.
dec: the character used in the file for decimal points.
Example:
print(myData)
Output:
1 A computer science portal for geeks.
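A small sketch of why dec matters, using made-up tab-separated text in which salaries are written with a decimal comma (the data are hypothetical). Because read.delim2() defaults to sep = "\t" and dec = ",", the values are parsed as numbers, while read.delim(), which expects ".", leaves them as text. text = is forwarded to read.table(), so no file is needed.

```r
# Hypothetical tab-separated data with a decimal comma.
txt <- "name\tsalary\nRick\t623,3\nDan\t515,2"

# read.delim2(): sep = "\t" and dec = "," by default,
# so "623,3" becomes the number 623.3.
myData <- read.delim2(text = txt)
print(myData)
print(is.numeric(myData$salary))   # TRUE

# read.delim() expects dec = ".", so the same column is not numeric.
other <- read.delim(text = txt)
print(is.numeric(other$salary))    # FALSE
```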
file.choose(): R also lets you choose a file interactively
with the function file.choose(). If you’re a beginner in R
programming, this method is very useful.
Example:
print(myFile)
Output:
1 A computer science portal for geeks.
read_tsv(): This method, from the readr package, is also used
to read tab-separated (“\t”) values.
Syntax: read_tsv(file, col_names = TRUE)
Parameters:
file: the path to the file containing the data to be read into R.
col_names: Either TRUE, FALSE, or a character vector
specifying column names. If TRUE, the first row of the input
will be used as the column names.
Example:
library(readr)
print(myData)
Output:
# A tibble: 1 x 1
X1
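A sketch of the col_names parameter, assuming the readr package is installed. The two-column tab-separated file is hypothetical (rows borrowed from basic.csv) and is written to a temporary path first, so both settings can be compared on the same input.

```r
library(readr)

# Write a small tab-separated file to a temporary location.
path <- tempfile(fileext = ".tsv")
writeLines(c("Name\tAge", "Amiya\t18", "Niru\t23"), path)

# col_names = TRUE (the default): the first row supplies column names.
with_header <- read_tsv(path, col_names = TRUE)
print(with_header)

# col_names = FALSE: columns are named X1, X2, ... and the header
# line is read as an ordinary data row.
no_header <- read_tsv(path, col_names = FALSE)
print(no_header)
```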
library(readr)
print(myData)
Output:
[1] "A computer science portal for geeks."
Example:
library(readr)
myData = read_file("geeksforgeeks.txt")
print(myData)
Output:
[1] "A computer science portal for geeks.\r\nGeeksforgeeks is founded by Sandeep Jain Sir.\r\nI am an intern at this amazing platform."
Reading a file in a table format
Another popular way to store data is in a tabular format. R
provides various methods for reading data from a
tabular data file.
read.table(): read.table() is a general function that can be
used to read a file in table format. The data will be imported as
a data frame.
Syntax: read.table(file, header = FALSE, sep = "", dec = ".")
Parameters:
file: the path to the file containing the data to be imported
into R.
header: logical value. If TRUE, read.table() assumes that
your file has a header row, so row 1 is the name of each
column. The default is FALSE, so add the argument
header = TRUE if your file does have a header row.
sep: the field separator character. The default “” means white
space (one or more spaces, tabs, newlines or carriage returns).
dec: the character used in the file for decimal points.
Example:
# Using read.table()
myData = read.table("basic.csv")
print(myData)
Output:
1 Name,Age,Qualification,Address
2 Amiya,18,MCA,BBS
3 Niru,23,Msc,BLS
4 Debi,23,BCA,SBP
5 Biku,56,ISC,JJP
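The single column in the output above is consistent with read.table()'s defaults: sep = "" splits only on white space and header = FALSE treats the first line as data, so each comma-separated line stays one field. The sketch below embeds the same rows as text (an assumption, so that no basic.csv is needed) and shows that setting header = TRUE and sep = "," splits them into proper columns.

```r
# The basic.csv rows shown above, embedded as text.
csv_text <- "Name,Age,Qualification,Address
Amiya,18,MCA,BBS
Niru,23,Msc,BLS
Debi,23,BCA,SBP
Biku,56,ISC,JJP"

# Defaults: each line has no white space, so it becomes one field.
one_col <- read.table(text = csv_text)
print(ncol(one_col))   # 1

# Declaring the header row and the comma separator splits the fields.
myData <- read.table(text = csv_text, header = TRUE, sep = ",")
print(myData)
```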
read.csv(): read.csv() is used for reading “comma separated
value” files (“.csv”). Here too, the data will be imported as a
data frame.
Syntax: read.csv(file, header = TRUE, sep = ",", dec = ".", ...)
Parameters:
file: the path to the file containing the data to be imported
into R.
header: logical value. If TRUE, read.csv() assumes that your
file has a header row, so row 1 is the name of each column.
If that’s not the case, you can add the argument header =
FALSE.
sep: the field separator character
dec: the character used in the file for decimal points.
Example:
# Using read.csv()
myData = read.csv("basic.csv")
print(myData)
Output:
Name Age Qualification Address
1 Amiya 18 MCA BBS
2 Niru 23 Msc BLS
3 Debi 23 BCA SBP
4 Biku 56 ISC JJP
Example:
# Using read.csv2()
myData = read.csv2("basic.csv")
print(myData)
Output:
Name.Age.Qualification.Address
1 Amiya,18,MCA,BBS
2 Niru,23,Msc,BLS
3 Debi,23,BCA,SBP
4 Biku,56,ISC,JJP
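The merged single column in the output above fits read.csv2()'s defaults, sep = ";" and dec = ",", which suit files from locales that use the comma as a decimal mark. A sketch with hypothetical semicolon-separated text (the Score column is made up for illustration), followed by the sep = "," override that lets read.csv2() split comma-separated rows like those of basic.csv:

```r
# Hypothetical semicolon-separated data with a decimal comma.
txt <- "Name;Age;Score
Amiya;18;7,5
Niru;23;8,25"

# read.csv2() defaults: sep = ";" and dec = ",".
myData <- read.csv2(text = txt)
print(myData)

# Comma-separated text (like basic.csv) needs sep = "," to be split.
csv_text <- "Name,Age,Qualification,Address
Amiya,18,MCA,BBS
Niru,23,Msc,BLS"
fixed <- read.csv2(text = csv_text, sep = ",")
print(fixed)
```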
file.choose(): You can also
use file.choose() with read.csv() just like before.
Example:
myData = read.csv(file.choose())
print(myData)
Output:
Name Age Qualification Address
1 Amiya 18 MCA BBS
2 Niru 23 Msc BLS
3 Debi 23 BCA SBP
4 Biku 56 ISC JJP
library(readr)
# Using read_csv() method
myData = read_csv("basic.csv")
print(myData)
Output:
Parsed with column specification:
cols(
Name = col_character(),
Age = col_double(),
Qualification = col_character(),
Address = col_character()
)
# A tibble: 4 x 4
Name Age Qualification Address
# Using read.delim()
myData =
read.delim("http://www.sthda.com/upload/boxplot_format.txt")
print(head(myData))
Output:
CSV stands for Comma Separated Values. These files are used
to handle a large amount of statistical data. Following is the
syntax to write to a CSV file:
Syntax:
Here,
write.csv() and write.csv2() are the R functions for writing CSV files.
write.csv() uses “.” for the decimal point and a comma (“,”)
for the separator.
write.csv2() uses a comma (“,”) for the decimal point and
a semicolon (“;”) for the separator.
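A small sketch contrasting the two writers: the same hypothetical one-row data frame is written with each function (row.names = FALSE keeps the output short), and the raw lines of each temporary file are printed with readLines().

```r
# Hypothetical one-row data frame with a decimal value.
emp <- data.frame(name = "Rick", salary = 623.3)

out_csv  <- tempfile(fileext = ".csv")
out_csv2 <- tempfile(fileext = ".csv")

# write.csv(): comma separator, "." decimal point.
write.csv(emp, out_csv, row.names = FALSE)

# write.csv2(): semicolon separator, "," decimal point.
write.csv2(emp, out_csv2, row.names = FALSE)

# Show the raw text of each file.
print(readLines(out_csv))
print(readLines(out_csv2))
```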
library("xlsx")