lbobgdt-07-python text file processing

The document provides an overview of string manipulation and file handling in Python, including string properties, indexing, and various functions for reading and writing files. It explains how to process text, convert data types, and handle input termination. Additionally, it includes examples and exercises to reinforce the concepts presented.

1/23/2025

BIGDATA
Big Data Techniques
and Technologies

Bobby Reyes

Text and File Processing


Strings
◼ string: A sequence of text characters in a program.
◼ Strings start and end with quotation mark " or apostrophe ' characters.
◼ Examples:
"hello"
"This is a string"
"This, too, is a string. It can be very long!"

◼ A string written in double quotes may not span multiple lines or contain an unescaped " character.
"This is not
a legal String."
"This is not a "legal" String either."

◼ Special characters can be written inside a string as escape sequences, each starting with a backslash:
◼ \t tab character
◼ \n new line character
◼ \" quotation mark character
◼ \\ backslash character

◼ Example: "Hello\tthere\nHow are you?"
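
As a small illustration of the escape sequences listed above, the sketch below builds the example string and checks its length. The variable names are chosen here for illustration only.

```python
# Each escape sequence counts as ONE character in the string.
greeting = "Hello\tthere\nHow are you?"
print(greeting)        # the tab and the newline are rendered, not shown literally
print(len(greeting))   # \t and \n each count as a single character

# \" and \\ put a quotation mark and a backslash inside the string.
quoted = "She said \"hi\" and typed a \\ character."
print(quoted)
```

Printing the string shows the effect of each escape, while len() shows that the backslash itself is not stored as a separate character.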

Indexes
◼ Characters in a string are numbered with indexes starting at 0:
◼ Example:
name = "P. Diddy"

index      0  1  2  3  4  5  6  7
character  P  .     D  i  d  d  y    (index 2 is a space)

◼ Accessing an individual character of a string:


variableName [ index ]

◼ Example:
print(name, "starts with", name[0])

Output:
P. Diddy starts with P
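
The indexing rule can be checked with a short loop. This is a minimal sketch that reuses the name string from the slide; the variable last is introduced here only for illustration.

```python
name = "P. Diddy"

# Valid indexes run from 0 up to len(name) - 1.
for i in range(len(name)):
    print(i, repr(name[i]))   # repr() makes the space at index 2 visible

last = name[len(name) - 1]    # the final character of the string
print(name, "ends with", last)
```

Asking for an index of len(name) or more raises an IndexError, which is why the loop stops at len(name) - 1.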


String Properties
◼ len(string) - number of characters in a string
(including spaces)
◼ str.lower(string) - lowercase version of a string
◼ str.upper(string) - uppercase version of a string

◼ Example:
name = "Martin Douglas Stepp"
length = len(name)
big_name = str.upper(name)
print(big_name, "has", length, "characters")

Output:
MARTIN DOUGLAS STEPP has 20 characters
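
The same operations are more commonly written as method calls on the string itself. The sketch below assumes the same sample name and shows that both spellings give the same result; small_name is a name added here for illustration.

```python
name = "Martin Douglas Stepp"

# Method-call form: equivalent to str.upper(name) and str.lower(name).
big_name = name.upper()
small_name = name.lower()

print(big_name, "has", len(name), "characters")
print(small_name)
print(name)   # strings are immutable, so the original is unchanged
```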

input
◼ input : Reads a string of text from user input.
◼ Example:
name = input("Howdy, pardner. What's yer name? ")
print(name, "... what a silly name!")

Output:
Howdy, pardner. What's yer name? Sixto Dimaculangan
Sixto Dimaculangan ... what a silly name!


Text Processing
◼ text processing: Examining, editing, formatting text.
◼ often uses loops that examine the characters of a string one by one

◼ A for loop can examine each character in a string in sequence.


◼ Example:
for c in "booyah":
    print(c)

Output:
b
o
o
y
a
h
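
A character-by-character loop becomes useful once it makes a decision about each character. As one possible illustration (the vowel-counting task is an assumption made here, not part of the slides):

```python
text = "booyah"        # sample input reused from the example above
vowel_count = 0

for c in text:
    if c in "aeiou":   # membership test on a single character
        vowel_count += 1

print(text, "contains", vowel_count, "vowels")
```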

Strings and Numbers


◼ ord(text) - converts a one-character string into its numeric character code.
◼ Example: ord("a") is 97, ord("b") is 98, ...
◼ Characters map to numbers using standardized mappings such as ASCII and Unicode.
◼ chr(number) - converts a character code back into a one-character string.
◼ Example: chr(99) is "c"

◼ Exercise: Write a program that performs a rotation cipher.
◼ e.g. "attack" when rotated by 1 becomes "buubdl"
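
One possible approach to the exercise combines ord() and chr() as sketched below. The helper name rotate is hypothetical, and the sketch assumes the input is lowercased first so that the result is all lowercase, as in the example output.

```python
def rotate(text, shift):
    """Rotate each letter of text by shift places (hypothetical helper)."""
    result = ""
    for c in text.lower():
        if "a" <= c <= "z":
            offset = (ord(c) - ord("a") + shift) % 26   # wrap z back around to a
            result += chr(ord("a") + offset)
        else:
            result += c          # leave spaces and punctuation unchanged
    return result

print(rotate("Attack", 1))
```

Rotating by the negative of the shift undoes the cipher, for example rotate(rotate("hello", 5), -5).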


The File Object


◼ Many programs handle data, which often comes from files.
◼ File handling in Python is done with built-in file objects, which
the open() function returns.
◼ A file object provides all of the basic methods needed to read
from and write to a file.

◼ Exercise: Open Notepad or Notepad++. Write some text and save
the file to a location and with a name you'll remember, say
'Practice_File.txt'.

The open() function


◼ Before you can work with a file, you first have to open it using
Python's built-in open() function.
◼ The open() function is usually called with two arguments: the name
of the file that you wish to use and the mode in which to open it.
The result of open() is a file object that is used to work on
this file:
fh = open('Practice_File.txt', 'r')

◼ By default, the open() function opens a file in 'read mode', so
the 'r' above could be omitted.
◼ There are a number of different file opening modes. The most
common are: 'r' = read, 'w' = write (overwrites an existing file),
'r+' = both reading and writing, 'a' = appending.
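
The difference between the modes can be seen in a short experiment. This is a sketch only: it works in a temporary directory and the file name modes_demo.txt is made up for the demonstration.

```python
import os
import tempfile

# Work in a temporary directory so no real file is overwritten.
path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

fh = open(path, "w")      # 'w' creates the file, or empties an existing one
fh.write("first\n")
fh.close()

fh = open(path, "a")      # 'a' keeps the existing content and adds to the end
fh.write("second\n")
fh.close()

fh = open(path, "r")      # 'r' opens the file for reading only
contents = fh.read()
fh.close()

print(contents)
```

Opening the file with 'w' a second time instead of 'a' would have discarded the first line.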

◼ Exercise: Use the open() function to read the file in.


The close() function


◼ Likewise, once you’re done working with a file, you can close it
with the close() function.
◼ Using this function will free up any system resources that are
being used up by having the file open.

fh.close()
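
Python also offers the with statement, which calls close() automatically. The sketch below is an alternative to the explicit call, not something the slides require; it writes to a temporary directory so that it can run anywhere.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "Practice_File.txt")

# The file is closed automatically when the with block ends,
# even if an error occurs inside the block.
with open(path, "w") as fh:
    fh.write("The first line of text\n")

print(fh.closed)   # the file object still exists, but it is closed
```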

Reading in a file and printing to screen example
◼ Using what you have now learned about for loops, it is possible to
open a file for reading and then print each line in the file to the
screen using a for loop.
◼ Use a for loop and the variable name that you assigned the open
file to in order to print each of the lines in your file to the screen.
◼ Example:
fh = open('Practice_File.txt', 'r')
for line in fh:
    print(line, end='')   # each line already ends with '\n'
fh.close()

Output:
The first line of text
The second line of text
The third line of text


The read() function


◼ However, you don't need a loop to access file contents.
Python has built-in file reading methods:
◼ The read() method takes an optional argument, which is the
number of characters to read (bytes, if the file is opened in
binary mode). If you omit it, read() reads the whole file
and returns its contents as a string.

1. <fileobject>.read() - returns the entire contents of the file as a single string

fh = open('Practice_File.txt', 'r')
print(fh.read())

Output:
The first line of text
The second line of text
The third line of text
The fourth line of text
The fifth line of text

<fileobject>.read(6) - reads at most n = 6 characters

fh = open('Practice_File.txt', 'r')
print(fh.read(6))

Output:
The fi
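
Each call to read() continues from where the previous call stopped. The sketch below creates its own two-line sample file in a temporary directory (an assumption, so that it runs without Practice_File.txt) and reads it in pieces.

```python
import os
import tempfile

# Create a small sample file to read from.
path = os.path.join(tempfile.mkdtemp(), "Practice_File.txt")
fh = open(path, "w")
fh.write("The first line of text\nThe second line of text\n")
fh.close()

fh = open(path, "r")
first = fh.read(6)     # the first six characters
second = fh.read(6)    # the NEXT six characters, not the same ones again
rest = fh.read()       # everything that remains
at_end = fh.read()     # nothing is left, so this is the empty string
fh.close()

print(repr(first), repr(second), repr(at_end))
```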

The readline() and readlines() functions
◼ Other built-in file reading methods:
2. <fileobject>.readline() - returns one line at a time

fh = open('Practice_File.txt', 'r')
print(fh.readline())

Output:
The first line of text

3. <fileobject>.readlines() - returns a list of the lines not yet read

fh = open('Practice_File.txt', 'r')
print(fh.readlines())

Output:
['The first line of text\n', 'The second line of text\n',
'The third line of text\n', 'The fourth line of text\n',
'The fifth line of text\n']
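
When readline() and readlines() are used on the same open file, readlines() returns only the lines that have not been read yet. A minimal sketch, using a three-line sample file created in a temporary directory for the purpose:

```python
import os
import tempfile

# Create a small sample file to read from.
path = os.path.join(tempfile.mkdtemp(), "Practice_File.txt")
fh = open(path, "w")
fh.write("The first line of text\n"
         "The second line of text\n"
         "The third line of text\n")
fh.close()

fh = open(path, "r")
first = fh.readline()         # one line, including its trailing '\n'
remaining = fh.readlines()    # only the lines after the first one
fh.close()

# Strip the newline from each remaining line.
clean = [line.rstrip("\n") for line in remaining]
print(repr(first))
print(clean)
```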


The write() function


◼ Likewise, there are two built-in methods for getting Python
to write to a file:

1. <file>.write() - Writes a specified sequence of characters to a file

fh = open('Practice_File_W.txt', 'w')
fh.write('I am adding this string')
fh.close()

2. <file>.writelines() - Writes a list of strings to a file:

testList = ['First line\n', 'Second line\n']
fh = open('Practice_File_W.txt', 'w')
fh.writelines(testList)
fh.close()
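
Neither write() nor writelines() adds a newline for you, which is easy to overlook. The sketch below writes to a temporary file (the location is an assumption made so the example is harmless to run) and reads the result back.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "Practice_File_W.txt")

fh = open(path, "w")
fh.write("I am adding this string")              # no '\n' is added automatically
fh.write(" and this continues the same line\n")  # so this lands on the same line
fh.writelines(["First line\n", "Second line\n"])
fh.close()   # closing makes sure everything is actually written to disk

fh = open(path, "r")
lines = fh.readlines()
fh.close()
print(lines)
```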

Example Line-by-line Processing

◼ Reading a file line by line and writing each line to an output file:

fh1 = open('Practice_File.txt', 'r')
fh2 = open('Write_File.txt', 'w')
count = 0
for line in fh1.readlines():
    fh2.write(line)
    count += 1
fh2.write('The file contains ' + str(count) + ' lines.')
fh1.close()
fh2.close()

◼ Exercise: Write a program to process a file of DNA text, such as:
ATGCAATTGCTCGATTAG
◼ Count the percentage of C+G present in the DNA.
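
One way to approach the exercise is sketched below. The file name dna.txt is hypothetical, and the sketch first writes the sample sequence to a temporary file so that it can run on its own.

```python
import os
import tempfile

# Create a sample input file; dna.txt is a made-up name.
path = os.path.join(tempfile.mkdtemp(), "dna.txt")
fh = open(path, "w")
fh.write("ATGCAATTGCTCGATTAG\n")
fh.close()

gc_count = 0
total = 0
fh = open(path, "r")
for line in fh:
    for base in line.rstrip("\n").upper():   # ignore the newline, accept lowercase
        total += 1
        if base in "CG":
            gc_count += 1
fh.close()

percent = 100 * gc_count / total
print("C+G content:", round(percent, 1), "%")
```

Counting per line rather than calling read() once keeps the same code usable for sequences that span many lines.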


Data Conversion and Parsing


◼ A file, specifically a text file, consists of strings. However,
especially in engineering and science, we work with numbers.
Thus, we need to convert (cast) each input string to int or float.
◼ Another challenge is having multiple numbers on one string (line)
separated by special characters or simply spaces, as in '10.0 5.0 5.0'.
➢ Can use the .split(delimiter) method of a string, which
returns a list of the substrings found between occurrences of the delimiter.

instr = '10.0 5.0 5.0'
outlst = [float(substr) for substr in instr.split(' ')]
print(outlst)

Output:
[10.0, 5.0, 5.0]

▪ Other useful methods for working with strings:
▪ delimiter.join(list) – joins the strings in a list into one string, with the delimiter between them
▪ .rstrip('\n') – removes occurrences of '\n' at the end of a string
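
The three methods are often used together on one line of input. A minimal sketch, assuming a line in the same space-separated format as the example above (the variable names are illustrative):

```python
line = "10.0 5.0 5.0\n"                  # a line as it would come from a file

fields = line.rstrip("\n").split(" ")    # strip the newline, then split on spaces
values = [float(f) for f in fields]      # convert each piece to a number
total = sum(values)

# join() is called on the delimiter, with the list as its argument.
csv_line = ",".join(fields)

print(fields)
print(total)
print(csv_line)
```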

Termination of Input
◼ Two ways to stop reading input:
1. By reading a definite number of items.
2. By the end of the file.

➢ EOF indicator – at end of file, functions like read() and
readline() return an empty string ''.

fp = open("pointlist.txt")                # open file for reading
pointlist = []                            # start with empty list
nextline = fp.readline()                  # first line of pointlist.txt is the number of lines that follow; skip
nextline = fp.readline()                  # read following line, has two real values
                                          # denoting x and y values of a point
while nextline != '':                     # until end of file
    nextline = nextline.rstrip('\n')      # remove occurrences of '\n' at the end
    (x, y) = nextline.split(' ')          # get x and y (note that they are still strings)
    x = float(x)                          # convert them into real values
    y = float(y)
    pointlist.append((x, y))              # add tuple at the end
    nextline = fp.readline()              # read the next line

fp.close()
print(pointlist)

Output:
[(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
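
The other way to stop, reading a definite number of items, can use the count stored on the first line of the file instead of skipping it. The sketch below assumes pointlist.txt has the layout described in the comments above (a count, then one point per line) and creates such a file in a temporary directory so that it runs on its own.

```python
import os
import tempfile

# Create a sample file: the first line is the number of points that follow.
path = os.path.join(tempfile.mkdtemp(), "pointlist.txt")
fp = open(path, "w")
fp.write("4\n0.0 0.0\n10.0 0.0\n10.0 10.0\n0.0 10.0\n")
fp.close()

fp = open(path)
count = int(fp.readline())            # how many points to read
pointlist = []
for _ in range(count):                # read exactly that many lines
    x, y = fp.readline().split()      # split() with no argument also drops '\n'
    pointlist.append((float(x), float(y)))
fp.close()

print(pointlist)
```

This version never tests for the empty string, so it relies on the count in the first line being correct.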
