File operations and data parsing 
Presented by 
Felix Hoffmann 
@Felix11H 
felix11h.github.io/ 
Slides 
Slideshare: tiny.cc/file-ops
Source: tiny.cc/file-ops-github
References
- Python Files I/O at tutorialspoint.com
- Dive into Python 3 by Mark Pilgrim
- Python Documentation on match objects
- Regex tutorial at regexone.com
This work is licensed under a Creative Commons Attribution 4.0 International License.
File operations: Reading 
Opening an existing file
>>> f = open("test.txt","rb")
>>> print f
<open file 'test.txt', mode 'rb' at 0x...>
Reading it:
>>> f.read()
'hello world'
Closing it:
>>> f.close()
>>> print f
<closed file 'test.txt', mode 'rb' at 0x...>
File operations: Writing 
Opening a (new) file
>>> f = open("new_test.txt","wb")
>>> print f
<open file 'new_test.txt', mode 'wb' at 0x...>
Writing to it:
>>> f.write("hello world, again")
>>> f.write("... and again")
>>> f.close()
=> Writes are buffered: the changes are only guaranteed to appear in the file,
for editing elsewhere, after calling close() (or flush())!
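The buffering behaviour can be checked with a small sketch. It is written for Python 3 (an assumption; the slides use Python 2 syntax), and the scratch path and the variable names `seen_after_flush` and `seen_after_close` are made up for illustration.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory (not from the slides).
path = os.path.join(tempfile.mkdtemp(), "new_test.txt")

f = open(path, "wb")
f.write(b"hello world, again")   # may still sit in Python's write buffer
f.flush()                        # hand the buffered bytes to the operating system

# A second, independent handle can now read the data although f is still open.
with open(path, "rb") as reader:
    seen_after_flush = reader.read()

f.write(b"... and again")
f.close()                        # close() flushes whatever is left

with open(path, "rb") as reader:
    seen_after_close = reader.read()

print(seen_after_flush)
print(seen_after_close)
```

Calling flush() is useful when another program has to see intermediate results while the file stays open.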
File operations: Appending 
Opening an existing file
>>> f = open("test.txt","ab")
>>> print f
<open file 'test.txt', mode 'ab' at 0x...>
Appending to it:
>>> f.write("hello world, again")
>>> f.write("... and again")
>>> f.close()
=> In append mode the file pointer is set to the end of the opened file.
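A minimal sketch of the same idea, assuming Python 3 and a throwaway file in a temporary directory (the path and the names `start` and `content` are illustrative, not from the slides):

```python
import os
import tempfile

# Hypothetical scratch file (not from the slides).
path = os.path.join(tempfile.mkdtemp(), "test.txt")

# Create the file with some initial content.
with open(path, "wb") as f:
    f.write(b"hello world")

# Re-open in append mode: existing content is kept, writes go to the end.
with open(path, "ab") as f:
    start = f.tell()          # position right after opening
    f.write(b", again")

with open(path, "rb") as f:
    content = f.read()

print(start)
print(content)
```

Opening the same file with "wb" instead of "ab" would have discarded the original `hello world`.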
File operations: More about file pointers 
f = open("lines_test.txt", "wb")
for i in range(10):
    f.write("this is line %d \n" % (i+1))
f.close()
Reading from the file:
>>> f = open("lines_test.txt", "rb")
>>> f.readline()
'this is line 1 \n'
>>> f.readline()
'this is line 2 \n'
>>> f.read(14)
'this is line 3'
>>> f.read(2)
' \n'
File operations: More about file pointers 
f.tell()           gives the current position within file f
f.seek(x[, from])  changes the file pointer position within file f, where
                   from = 0  from beginning of file
                   from = 1  from current position
                   from = 2  from end of file
>>> f = open("lines_test.txt", "rb")
>>> f.tell()
0
>>> f.read(10)
'this is li'
>>> f.tell()
10
File operations: More about file pointers
>>> f.seek(5)
>>> f.tell()
5
>>> f.seek(10,1)
>>> f.tell()
15
>>> f.seek(-10,2)
>>> f.tell()
151
>>> f.read()
' line 10 \n'
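The pointer arithmetic above can be reproduced end to end with the following sketch. It assumes Python 3 and binary mode (text-mode files in Python 3 do not support nonzero seeks relative to the current position or the end); the temporary path and the names `size`, `tail` and `third_line` are illustrative.

```python
import os
import tempfile

# Hypothetical scratch file (not from the slides).
path = os.path.join(tempfile.mkdtemp(), "lines_test.txt")

# Lines 1-9 are 16 bytes each, line 10 is 17 bytes: 9 * 16 + 17 = 161 bytes.
with open(path, "wb") as f:
    for i in range(10):
        f.write(("this is line %d \n" % (i + 1)).encode("ascii"))

with open(path, "rb") as f:
    f.seek(0, os.SEEK_END)      # same as f.seek(0, 2)
    size = f.tell()

    f.seek(-10, os.SEEK_END)    # 10 bytes before the end
    tail = f.read()

    f.seek(2 * 16)              # absolute offset: start of the third line
    third_line = f.readline()

print(size)
print(tail)
print(third_line)
```

The constants os.SEEK_SET, os.SEEK_CUR and os.SEEK_END are the named equivalents of 0, 1 and 2.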
File operations: Other Modes 
rb+  Opens the file for reading and writing. The file pointer is at the
     beginning of the file.
wb+  Opens the file for reading and writing. Overwrites the file if it
     exists, otherwise a new file is created.
ab+  Opens the file for appending and reading. The file pointer is at the
     end of the file if it exists, otherwise a new file is created for
     reading and writing.
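The difference between rb+ and wb+ shows up clearly in a short experiment. This is a sketch assuming Python 3; the scratch path and the names `patched`, `emptied` and `rewritten` are hypothetical.

```python
import os
import tempfile

# Hypothetical scratch file (not from the slides).
path = os.path.join(tempfile.mkdtemp(), "test.txt")

with open(path, "wb") as f:
    f.write(b"hello world")

# rb+ keeps the content; writing overwrites bytes in place at the pointer.
with open(path, "rb+") as f:
    f.seek(6)
    f.write(b"there")
    f.seek(0)
    patched = f.read()

# wb+ truncates the file as soon as it is opened.
with open(path, "wb+") as f:
    emptied = f.read()
    f.write(b"new")
    f.seek(0)
    rewritten = f.read()

print(patched)
print(emptied)
print(rewritten)
```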
Saving Data: Python Pickle 
Use pickle to save and retrieve more complex data types - lists,
dictionaries and even class objects:
>>> import pickle
>>> f = open('save_file.p', 'wb')
>>> ex_dict = {'hello': 'world'}
>>> pickle.dump(ex_dict, f)
>>> f.close()
>>> import pickle
>>> f = open('save_file.p', 'rb')
>>> loadobj = pickle.load(f)
>>> print loadobj['hello']
world
Best practice: With Statement 
import pickle

ex_dict = {'hello': 'world'}

with open('save_file.p', 'wb') as f:
    pickle.dump(ex_dict, f)
import pickle

with open('save_file.p', 'rb') as f:
    loadobj = pickle.load(f)

print loadobj['hello']
=> Use this!
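One reason to prefer the with statement is that the file is closed when the block ends, even if an exception is raised inside it. The sketch below assumes Python 3; the temporary path, the simulated error and the names `closed_after_block` and `closed_after_error` are illustrative.

```python
import os
import pickle
import tempfile

# Hypothetical scratch file (not from the slides).
path = os.path.join(tempfile.mkdtemp(), "save_file.p")
ex_dict = {"hello": "world"}

with open(path, "wb") as f:
    pickle.dump(ex_dict, f)
closed_after_block = f.closed        # the file was closed on leaving the block

try:
    with open(path, "rb") as g:
        loadobj = pickle.load(g)
        raise ValueError("simulated failure while processing")
except ValueError:
    pass
closed_after_error = g.closed        # closed despite the exception

print(loadobj["hello"])
print(closed_after_block, closed_after_error)
```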
Need for parsing 
Imagine that
- data files are generated by a third party (no control over the format)
- and the data files need pre-processing
=> Regular expressions provide a powerful and concise way to perform
pattern match/search/replace over the data
(Comic © Randall Munroe, xkcd.com, CC BY-NC 2.5)
Regular expressions - A case study 
Formatting street names
>>> s = '100 NORTH MAIN ROAD'
>>> s.replace('ROAD', 'RD.')
'100 NORTH MAIN RD.'
>>> s = '100 NORTH BROAD ROAD'
>>> s.replace('ROAD', 'RD.')
'100 NORTH BRD. RD.'
>>> s[:-4] + s[-4:].replace('ROAD', 'RD.')
'100 NORTH BROAD RD.'
Better use regular expressions!
>>> import re
>>> re.sub(r'ROAD$', 'RD.', s)
'100 NORTH BROAD RD.'
Example from Dive Into Python 3, © Mark Pilgrim, CC BY-SA 3.0
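The pattern `ROAD$` still misfires when the address itself ends in a word such as BROAD. A possible refinement, sketched here for Python 3, adds the word boundary `\b`; the helper name `abbreviate_road` is made up for this illustration.

```python
import re


def abbreviate_road(address):
    """Replace a trailing whole word ROAD by RD. (hypothetical helper)."""
    return re.sub(r"\bROAD$", "RD.", address)


# Without the word boundary, the end of BROAD is rewritten as well.
naive = re.sub(r"ROAD$", "RD.", "100 BROAD")

print(naive)                                    # 100 BRD.
print(abbreviate_road("100 BROAD"))             # 100 BROAD
print(abbreviate_road("100 NORTH BROAD ROAD"))  # 100 NORTH BROAD RD.
```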
Pattern matching with regular expressions
^        Matches beginning of the string (of each line in multiline mode)
$        Matches end of the string (of each line in multiline mode)
.        Matches any character except newline
[..]     Matches any single character in brackets
[^..]    Matches any single character not in brackets
re*      Matches 0 or more occurrences of the preceding expression
re+      Matches 1 or more occurrences of the preceding expression
re?      Matches 0 or 1 occurrence of the preceding expression
re{n}    Matches exactly n occurrences
re{n,}   Matches n or more occurrences
re{n,m}  Matches at least n and at most m occurrences
=> Use cheatsheets, trainers, tutorials, builders, etc.
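To see the quantifiers from the table in action, the following sketch (Python 3, using re.fullmatch so the whole word has to match) tests a few patterns against a made-up word list; `words`, `patterns` and `matches` are illustrative names.

```python
import re

# Made-up test words: an "a" followed by zero to four "b"s.
words = ["a", "ab", "abb", "abbb", "abbbb"]
patterns = ["ab*", "ab+", "ab?", "ab{2}", "ab{2,}", "ab{2,3}"]

# For every pattern, collect the words it matches completely.
matches = {
    pattern: [word for word in words if re.fullmatch(pattern, word)]
    for pattern in patterns
}

for pattern in patterns:
    print("%-8s %s" % (pattern, matches[pattern]))
```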
re.search() & matches
>>> import re
>>> data = "I like python"
>>> m = re.search(r'python',data)
>>> print m
<_sre.SRE_Match object at 0x...>
Important methods of the match object:
group()  Return the string matched by the RE
start()  Return the starting position of the match
end()    Return the ending position of the match
span()   Return a tuple containing the (start, end) positions of the match
re.search() & matches 
For example:
>>> import re
>>> data = "I like python"
>>> m = re.search(r'python',data)
>>> m.group()
'python'
>>> m.start()
7
>>> m.span()
(7, 13)
For a complete list of match object methods see for example the
Python Documentation:
https://docs.python.org/2/library/re.html#match-objects
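When the pattern is not found, re.search() returns None instead of a match object, so calling group() on the result directly would fail. A minimal sketch of the usual check, assuming Python 3 (the names `found` and `missing` are illustrative):

```python
import re

data = "I like python"

m = re.search(r"python", data)
if m is not None:
    # start() and end() can be used to slice the original string.
    found = data[m.start():m.end()]
    print(found, m.span())

# No match: the result is None, not an empty match object.
missing = re.search(r"perl", data)
if missing is None:
    print("no match for 'perl'")
```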
re.findall()
>>> import re
>>> data = "Python is great. I like python"
>>> m = re.search(r'[pP]ython',data)
>>> m.group()
'Python'
=> re.search() returns only the first match, use re.findall() instead:
>>> import re
>>> data = "Python is great. I like python"
>>> l = re.findall(r'[pP]ython',data)
>>> print l
['Python', 'python']
=> Returns list instead of match object!
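If the positions of all matches are needed as well, re.finditer() yields one match object per occurrence. A short sketch for Python 3 (the name `positions` is illustrative):

```python
import re

data = "Python is great. I like python"

# One match object per occurrence, so start() is available for each.
positions = [(m.group(), m.start()) for m in re.finditer(r"[pP]ython", data)]

print(positions)   # [('Python', 0), ('python', 24)]
```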
re.findall() - Example 
import re

with open("history.txt", "rb") as f:
    text = f.read()

year_dates = re.findall(r'19[0-9]{2}', text)
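A self-contained variant of this example, assuming Python 3 and a made-up sentence in place of history.txt. Two points are illustrated: `\b` keeps the pattern from matching inside longer numbers, and in Python 3 data read in "rb" mode is bytes, which needs a bytes pattern (a str pattern on bytes raises TypeError).

```python
import re

# Made-up sample text standing in for the contents of history.txt.
text = "The war ended in 1945. The wall fell in 1989, not in 2019 or 11903."

loose = re.findall(r"19[0-9]{2}", text)          # also finds 1903 inside 11903
year_dates = re.findall(r"\b19[0-9]{2}\b", text)

# Bytes input (as returned by a file opened with "rb") needs a bytes pattern.
raw = text.encode("ascii")
year_bytes = re.findall(rb"\b19[0-9]{2}\b", raw)

print(loose)
print(year_dates)
print(year_bytes)
```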
re.split() 
Suppose the data stream has a well-defined delimiter
>>> data = "x = 20"
>>> re.split(r'=',data)
['x ', ' 20']
>>> data = 'ftp://python.about.com'
>>> re.split(r':/{1,3}', data)
['ftp', 'python.about.com']
>>> data = '25.657'
>>> re.split(r'\.',data)
['25', '657']
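Two details are easy to miss here: the dot has to be escaped, and surrounding whitespace can be absorbed into the delimiter itself. The sketch below assumes Python 3; the variable names are illustrative.

```python
import re

# Let the delimiter swallow optional whitespace around "=".
pair = re.split(r"\s*=\s*", "x = 20")

# maxsplit limits the number of splits.
first_only = re.split(r"\s*=\s*", "a = b = c", maxsplit=1)

# An escaped dot splits at the literal "."; an unescaped dot matches
# every character, leaving only empty strings.
escaped = re.split(r"\.", "25.657")
unescaped = re.split(r".", "25.657")

print(pair)
print(first_only)
print(escaped)
print(unescaped)
```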
re.sub() 
Replace patterns by other patterns.
>>> data = "2004-959-559 # my phone number"
>>> re.sub(r'#.*','',data)
'2004-959-559 '
A more interesting example:
>>> data = "2004-959-559"
>>> re.sub(r'([0-9]*)-([0-9]*)-([0-9]*)',
...        r'\3-\2-\1', data)
'559-959-2004'
=> Groups are captured in parentheses and referenced in the
replacement string by \1, \2, ...
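Numbered references become hard to read as patterns grow. As a sketch (Python 3), the same reordering can be written with named groups or with a replacement function; the group names and the helper `reverse_fields` are made up for this illustration.

```python
import re

data = "2004-959-559"
pattern = r"(?P<first>[0-9]+)-(?P<second>[0-9]+)-(?P<third>[0-9]+)"

# Named groups are referenced as \g<name> in the replacement string.
by_name = re.sub(pattern, r"\g<third>-\g<second>-\g<first>", data)


def reverse_fields(match):
    """Hypothetical helper: join the captured groups in reverse order."""
    return "-".join(reversed(match.groups()))


# A function can be passed instead of a replacement string.
by_function = re.sub(pattern, reverse_fields, data)

print(by_name)
print(by_function)
```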
os module 
Provides a way of using operating-system-dependent functionality:
os.mkdir()    Create a directory (like mkdir)
os.chmod()    Change the permissions (like chmod)
os.rename()   Rename a file or directory
os.listdir()  List the contents of a directory
os.getcwd()   Get the current working directory path
os.path       Submodule with useful functions on pathnames
For example, list everything in the current directory:
>>> from os import listdir
>>>
>>> for f in listdir("."):
...     print f
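Since os.listdir() returns directories as well as files, os.path can be used to filter the result. The following sketch assumes Python 3 and works in a temporary directory; the file and directory names are made up.

```python
import os
import tempfile

# Hypothetical playground directory (not from the slides).
workdir = tempfile.mkdtemp()

os.mkdir(os.path.join(workdir, "data"))
with open(os.path.join(workdir, "old.txt"), "w") as f:
    f.write("x")

os.rename(os.path.join(workdir, "old.txt"), os.path.join(workdir, "new.txt"))

entries = sorted(os.listdir(workdir))

# Keep only regular files; listdir() returns bare names, so join them first.
files_only = [
    name for name in entries
    if os.path.isfile(os.path.join(workdir, name))
]

print(entries)
print(files_only)
```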
Have fun! 
More Related Content

What's hot

Python File Handling | File Operations in Python | Learn python programming |...
Python File Handling | File Operations in Python | Learn python programming |...Python File Handling | File Operations in Python | Learn python programming |...
Python File Handling | File Operations in Python | Learn python programming |...Edureka!
 
python file handling
python file handlingpython file handling
python file handlingjhona2z
 
File Handling and Command Line Arguments in C
File Handling and Command Line Arguments in CFile Handling and Command Line Arguments in C
File Handling and Command Line Arguments in CMahendra Yadav
 
Files and file objects (in Python)
Files and file objects (in Python)Files and file objects (in Python)
Files and file objects (in Python)PranavSB
 
File handling in c
File handling in cFile handling in c
File handling in caakanksha s
 
File handling in C++
File handling in C++File handling in C++
File handling in C++Hitesh Kumar
 
File handling and Dictionaries in python
File handling and Dictionaries in pythonFile handling and Dictionaries in python
File handling and Dictionaries in pythonnitamhaske
 
File handling in c
File handling in c File handling in c
File handling in c Vikash Dhal
 
File handling-c programming language
File handling-c programming languageFile handling-c programming language
File handling-c programming languagethirumalaikumar3
 
File Handling Python
File Handling PythonFile Handling Python
File Handling PythonAkhil Kaushik
 
UNIT 10. Files and file handling in C
UNIT 10. Files and file handling in CUNIT 10. Files and file handling in C
UNIT 10. Files and file handling in CAshim Lamichhane
 

What's hot (20)

Python File Handling | File Operations in Python | Learn python programming |...
Python File Handling | File Operations in Python | Learn python programming |...Python File Handling | File Operations in Python | Learn python programming |...
Python File Handling | File Operations in Python | Learn python programming |...
 
Python - Lecture 8
Python - Lecture 8Python - Lecture 8
Python - Lecture 8
 
python file handling
python file handlingpython file handling
python file handling
 
File handling
File handlingFile handling
File handling
 
File Handling and Command Line Arguments in C
File Handling and Command Line Arguments in CFile Handling and Command Line Arguments in C
File Handling and Command Line Arguments in C
 
C Programming Unit-5
C Programming Unit-5C Programming Unit-5
C Programming Unit-5
 
Files and file objects (in Python)
Files and file objects (in Python)Files and file objects (in Python)
Files and file objects (in Python)
 
File handling in c
File handling in cFile handling in c
File handling in c
 
File handling in C++
File handling in C++File handling in C++
File handling in C++
 
File handling and Dictionaries in python
File handling and Dictionaries in pythonFile handling and Dictionaries in python
File handling and Dictionaries in python
 
File handling in c
File handling in cFile handling in c
File handling in c
 
Python file handling
Python file handlingPython file handling
Python file handling
 
File in C language
File in C languageFile in C language
File in C language
 
File handling in c
File handling in c File handling in c
File handling in c
 
Functions in python
Functions in pythonFunctions in python
Functions in python
 
File handling-c programming language
File handling-c programming languageFile handling-c programming language
File handling-c programming language
 
File Handling Python
File Handling PythonFile Handling Python
File Handling Python
 
File in c
File in cFile in c
File in c
 
Files in php
Files in phpFiles in php
Files in php
 
UNIT 10. Files and file handling in C
UNIT 10. Files and file handling in CUNIT 10. Files and file handling in C
UNIT 10. Files and file handling in C
 

Viewers also liked

Manipulating file in Python
Manipulating file in PythonManipulating file in Python
Manipulating file in Pythonshoukatali500
 
Writing Wireshark Filter Expression For Capturing Packets
Writing Wireshark Filter Expression For Capturing PacketsWriting Wireshark Filter Expression For Capturing Packets
Writing Wireshark Filter Expression For Capturing PacketsXafran Marwat
 
EuroPython 2013 - FAST, DOCUMENTED AND RELIABLE JSON BASED WEBSERVICES WITH P...
EuroPython 2013 - FAST, DOCUMENTED AND RELIABLE JSON BASED WEBSERVICES WITH P...EuroPython 2013 - FAST, DOCUMENTED AND RELIABLE JSON BASED WEBSERVICES WITH P...
EuroPython 2013 - FAST, DOCUMENTED AND RELIABLE JSON BASED WEBSERVICES WITH P...Alessandro Molina
 
Creating Custom Drupal Modules
Creating Custom Drupal ModulesCreating Custom Drupal Modules
Creating Custom Drupal Modulestanoshimi
 
FLTK Summer Course - Part VIII - Eighth Impact
FLTK Summer Course - Part VIII - Eighth ImpactFLTK Summer Course - Part VIII - Eighth Impact
FLTK Summer Course - Part VIII - Eighth ImpactMichel Alves
 
TMS - Schedule of Presentations and Reports
TMS - Schedule of Presentations and ReportsTMS - Schedule of Presentations and Reports
TMS - Schedule of Presentations and ReportsMichel Alves
 
FLTK Summer Course - Part I - First Impact - Exercises
FLTK Summer Course - Part I - First Impact - ExercisesFLTK Summer Course - Part I - First Impact - Exercises
FLTK Summer Course - Part I - First Impact - ExercisesMichel Alves
 
Using Git on the Command Line
Using Git on the Command LineUsing Git on the Command Line
Using Git on the Command LineBrian Richards
 
FLTK Summer Course - Part VI - Sixth Impact - Exercises
FLTK Summer Course - Part VI - Sixth Impact - ExercisesFLTK Summer Course - Part VI - Sixth Impact - Exercises
FLTK Summer Course - Part VI - Sixth Impact - ExercisesMichel Alves
 
"Git Hooked!" Using Git hooks to improve your software development process
"Git Hooked!" Using Git hooks to improve your software development process"Git Hooked!" Using Git hooks to improve your software development process
"Git Hooked!" Using Git hooks to improve your software development processPolished Geek LLC
 
FLTK Summer Course - Part III - Third Impact
FLTK Summer Course - Part III - Third ImpactFLTK Summer Course - Part III - Third Impact
FLTK Summer Course - Part III - Third ImpactMichel Alves
 
FLTK Summer Course - Part VII - Seventh Impact
FLTK Summer Course - Part VII  - Seventh ImpactFLTK Summer Course - Part VII  - Seventh Impact
FLTK Summer Course - Part VII - Seventh ImpactMichel Alves
 
Servicios web con Python
Servicios web con PythonServicios web con Python
Servicios web con PythonManuel Pérez
 
FLTK Summer Course - Part II - Second Impact - Exercises
FLTK Summer Course - Part II - Second Impact - Exercises FLTK Summer Course - Part II - Second Impact - Exercises
FLTK Summer Course - Part II - Second Impact - Exercises Michel Alves
 
Code Refactoring - Live Coding Demo (JavaDay 2014)
Code Refactoring - Live Coding Demo (JavaDay 2014)Code Refactoring - Live Coding Demo (JavaDay 2014)
Code Refactoring - Live Coding Demo (JavaDay 2014)Peter Kofler
 
Introduction to Git Commands and Concepts
Introduction to Git Commands and ConceptsIntroduction to Git Commands and Concepts
Introduction to Git Commands and ConceptsCarl Brown
 
FLTK Summer Course - Part II - Second Impact
FLTK Summer Course - Part II - Second ImpactFLTK Summer Course - Part II - Second Impact
FLTK Summer Course - Part II - Second ImpactMichel Alves
 
Git hooks For PHP Developers
Git hooks For PHP DevelopersGit hooks For PHP Developers
Git hooks For PHP DevelopersUmut IŞIK
 

Viewers also liked (20)

Introduction to Sumatra
Introduction to SumatraIntroduction to Sumatra
Introduction to Sumatra
 
Manipulating file in Python
Manipulating file in PythonManipulating file in Python
Manipulating file in Python
 
Writing Wireshark Filter Expression For Capturing Packets
Writing Wireshark Filter Expression For Capturing PacketsWriting Wireshark Filter Expression For Capturing Packets
Writing Wireshark Filter Expression For Capturing Packets
 
EuroPython 2013 - FAST, DOCUMENTED AND RELIABLE JSON BASED WEBSERVICES WITH P...
EuroPython 2013 - FAST, DOCUMENTED AND RELIABLE JSON BASED WEBSERVICES WITH P...EuroPython 2013 - FAST, DOCUMENTED AND RELIABLE JSON BASED WEBSERVICES WITH P...
EuroPython 2013 - FAST, DOCUMENTED AND RELIABLE JSON BASED WEBSERVICES WITH P...
 
Creating Custom Drupal Modules
Creating Custom Drupal ModulesCreating Custom Drupal Modules
Creating Custom Drupal Modules
 
FLTK Summer Course - Part VIII - Eighth Impact
FLTK Summer Course - Part VIII - Eighth ImpactFLTK Summer Course - Part VIII - Eighth Impact
FLTK Summer Course - Part VIII - Eighth Impact
 
TMS - Schedule of Presentations and Reports
TMS - Schedule of Presentations and ReportsTMS - Schedule of Presentations and Reports
TMS - Schedule of Presentations and Reports
 
FLTK Summer Course - Part I - First Impact - Exercises
FLTK Summer Course - Part I - First Impact - ExercisesFLTK Summer Course - Part I - First Impact - Exercises
FLTK Summer Course - Part I - First Impact - Exercises
 
Using Git on the Command Line
Using Git on the Command LineUsing Git on the Command Line
Using Git on the Command Line
 
FLTK Summer Course - Part VI - Sixth Impact - Exercises
FLTK Summer Course - Part VI - Sixth Impact - ExercisesFLTK Summer Course - Part VI - Sixth Impact - Exercises
FLTK Summer Course - Part VI - Sixth Impact - Exercises
 
"Git Hooked!" Using Git hooks to improve your software development process
"Git Hooked!" Using Git hooks to improve your software development process"Git Hooked!" Using Git hooks to improve your software development process
"Git Hooked!" Using Git hooks to improve your software development process
 
FLTK Summer Course - Part III - Third Impact
FLTK Summer Course - Part III - Third ImpactFLTK Summer Course - Part III - Third Impact
FLTK Summer Course - Part III - Third Impact
 
FLTK Summer Course - Part VII - Seventh Impact
FLTK Summer Course - Part VII  - Seventh ImpactFLTK Summer Course - Part VII  - Seventh Impact
FLTK Summer Course - Part VII - Seventh Impact
 
Servicios web con Python
Servicios web con PythonServicios web con Python
Servicios web con Python
 
Advanced Git
Advanced GitAdvanced Git
Advanced Git
 
FLTK Summer Course - Part II - Second Impact - Exercises
FLTK Summer Course - Part II - Second Impact - Exercises FLTK Summer Course - Part II - Second Impact - Exercises
FLTK Summer Course - Part II - Second Impact - Exercises
 
Code Refactoring - Live Coding Demo (JavaDay 2014)
Code Refactoring - Live Coding Demo (JavaDay 2014)Code Refactoring - Live Coding Demo (JavaDay 2014)
Code Refactoring - Live Coding Demo (JavaDay 2014)
 
Introduction to Git Commands and Concepts
Introduction to Git Commands and ConceptsIntroduction to Git Commands and Concepts
Introduction to Git Commands and Concepts
 
FLTK Summer Course - Part II - Second Impact
FLTK Summer Course - Part II - Second ImpactFLTK Summer Course - Part II - Second Impact
FLTK Summer Course - Part II - Second Impact
 
Git hooks For PHP Developers
Git hooks For PHP DevelopersGit hooks For PHP Developers
Git hooks For PHP Developers
 

Similar to Python - File operations & Data parsing

GE8151 Problem Solving and Python Programming
GE8151 Problem Solving and Python ProgrammingGE8151 Problem Solving and Python Programming
GE8151 Problem Solving and Python ProgrammingMuthu Vinayagam
 
What's new in Python 3.11
What's new in Python 3.11What's new in Python 3.11
What's new in Python 3.11Henry Schreiner
 
How can I make it so my code works so my command line can look like -p (1).docx
How can I make it so my code works so my command line can look like -p (1).docxHow can I make it so my code works so my command line can look like -p (1).docx
How can I make it so my code works so my command line can look like -p (1).docxPaulntmMilleri
 
Web2py Code Lab
Web2py Code LabWeb2py Code Lab
Web2py Code LabColin Su
 
Ruby on Rails: Tasty Burgers
Ruby on Rails: Tasty BurgersRuby on Rails: Tasty Burgers
Ruby on Rails: Tasty BurgersAaron Patterson
 
Code error where list index is out of range when doing command like -p.docx
Code error where list index is out of range when doing command like -p.docxCode error where list index is out of range when doing command like -p.docx
Code error where list index is out of range when doing command like -p.docxJoe7Y7Nolany
 
The Ring programming language version 1.7 book - Part 29 of 196
The Ring programming language version 1.7 book - Part 29 of 196The Ring programming language version 1.7 book - Part 29 of 196
The Ring programming language version 1.7 book - Part 29 of 196Mahmoud Samir Fayed
 
Make Sure Your Applications Crash
Make Sure Your  Applications CrashMake Sure Your  Applications Crash
Make Sure Your Applications CrashMoshe Zadka
 
Rapid and Scalable Development with MongoDB, PyMongo, and Ming
Rapid and Scalable Development with MongoDB, PyMongo, and MingRapid and Scalable Development with MongoDB, PyMongo, and Ming
Rapid and Scalable Development with MongoDB, PyMongo, and MingRick Copeland
 
File Handling in python.docx
File Handling in python.docxFile Handling in python.docx
File Handling in python.docxmanohar25689
 
Python Google Cloud Function with CORS
Python Google Cloud Function with CORSPython Google Cloud Function with CORS
Python Google Cloud Function with CORSRapidValue
 
TensorFlow.Data 및 TensorFlow Hub
TensorFlow.Data 및 TensorFlow HubTensorFlow.Data 및 TensorFlow Hub
TensorFlow.Data 및 TensorFlow HubJeongkyu Shin
 
The Ring programming language version 1.5.1 book - Part 24 of 180
The Ring programming language version 1.5.1 book - Part 24 of 180The Ring programming language version 1.5.1 book - Part 24 of 180
The Ring programming language version 1.5.1 book - Part 24 of 180Mahmoud Samir Fayed
 

Similar to Python - File operations & Data parsing (20)

GE8151 Problem Solving and Python Programming
GE8151 Problem Solving and Python ProgrammingGE8151 Problem Solving and Python Programming
GE8151 Problem Solving and Python Programming
 
What's new in Python 3.11
What's new in Python 3.11What's new in Python 3.11
What's new in Python 3.11
 
How can I make it so my code works so my command line can look like -p (1).docx
How can I make it so my code works so my command line can look like -p (1).docxHow can I make it so my code works so my command line can look like -p (1).docx
How can I make it so my code works so my command line can look like -p (1).docx
 
Having Fun Programming!
Having Fun Programming!Having Fun Programming!
Having Fun Programming!
 
Web2py Code Lab
Web2py Code LabWeb2py Code Lab
Web2py Code Lab
 
Five
FiveFive
Five
 
Ruby on Rails: Tasty Burgers
Ruby on Rails: Tasty BurgersRuby on Rails: Tasty Burgers
Ruby on Rails: Tasty Burgers
 
Files nts
Files ntsFiles nts
Files nts
 
Unit5
Unit5Unit5
Unit5
 
Code error where list index is out of range when doing command like -p.docx
Code error where list index is out of range when doing command like -p.docxCode error where list index is out of range when doing command like -p.docx
Code error where list index is out of range when doing command like -p.docx
 
The Ring programming language version 1.7 book - Part 29 of 196
The Ring programming language version 1.7 book - Part 29 of 196The Ring programming language version 1.7 book - Part 29 of 196
The Ring programming language version 1.7 book - Part 29 of 196
 
files.pptx
files.pptxfiles.pptx
files.pptx
 
Make Sure Your Applications Crash
Make Sure Your  Applications CrashMake Sure Your  Applications Crash
Make Sure Your Applications Crash
 
06-files.ppt
06-files.ppt06-files.ppt
06-files.ppt
 
Rapid and Scalable Development with MongoDB, PyMongo, and Ming
Rapid and Scalable Development with MongoDB, PyMongo, and MingRapid and Scalable Development with MongoDB, PyMongo, and Ming
Rapid and Scalable Development with MongoDB, PyMongo, and Ming
 
File Handling in python.docx
File Handling in python.docxFile Handling in python.docx
File Handling in python.docx
 
PythonOOP
PythonOOPPythonOOP
PythonOOP
 
Python Google Cloud Function with CORS
Python Google Cloud Function with CORSPython Google Cloud Function with CORS
Python Google Cloud Function with CORS
 
TensorFlow.Data 및 TensorFlow Hub
TensorFlow.Data 및 TensorFlow HubTensorFlow.Data 및 TensorFlow Hub
TensorFlow.Data 및 TensorFlow Hub
 
The Ring programming language version 1.5.1 book - Part 24 of 180
The Ring programming language version 1.5.1 book - Part 24 of 180The Ring programming language version 1.5.1 book - Part 24 of 180
The Ring programming language version 1.5.1 book - Part 24 of 180
 

Recently uploaded

SPLICE Working Group: Reusable Code Examples
SPLICE Working Group:Reusable Code ExamplesSPLICE Working Group:Reusable Code Examples
SPLICE Working Group: Reusable Code ExamplesPeter Brusilovsky
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxmarlenawright1
 
AIM of Education-Teachers Training-2024.ppt
AIM of Education-Teachers Training-2024.pptAIM of Education-Teachers Training-2024.ppt
AIM of Education-Teachers Training-2024.pptNishitharanjan Rout
 
How to Add a Tool Tip to a Field in Odoo 17
How to Add a Tool Tip to a Field in Odoo 17How to Add a Tool Tip to a Field in Odoo 17
How to Add a Tool Tip to a Field in Odoo 17Celine George
 
e-Sealing at EADTU by Kamakshi Rajagopal
e-Sealing at EADTU by Kamakshi Rajagopale-Sealing at EADTU by Kamakshi Rajagopal
e-Sealing at EADTU by Kamakshi RajagopalEADTU
 
dusjagr & nano talk on open tools for agriculture research and learning
dusjagr & nano talk on open tools for agriculture research and learningdusjagr & nano talk on open tools for agriculture research and learning
dusjagr & nano talk on open tools for agriculture research and learningMarc Dusseiller Dusjagr
 
How to Manage Call for Tendor in Odoo 17
How to Manage Call for Tendor in Odoo 17How to Manage Call for Tendor in Odoo 17
How to Manage Call for Tendor in Odoo 17Celine George
 
OSCM Unit 2_Operations Processes & Systems
OSCM Unit 2_Operations Processes & SystemsOSCM Unit 2_Operations Processes & Systems
OSCM Unit 2_Operations Processes & SystemsSandeep D Chaudhary
 
Model Attribute _rec_name in the Odoo 17
Model Attribute _rec_name in the Odoo 17Model Attribute _rec_name in the Odoo 17
Model Attribute _rec_name in the Odoo 17Celine George
 
What is 3 Way Matching Process in Odoo 17.pptx
What is 3 Way Matching Process in Odoo 17.pptxWhat is 3 Way Matching Process in Odoo 17.pptx
What is 3 Way Matching Process in Odoo 17.pptxCeline George
 
Spellings Wk 4 and Wk 5 for Grade 4 at CAPS
Spellings Wk 4 and Wk 5 for Grade 4 at CAPSSpellings Wk 4 and Wk 5 for Grade 4 at CAPS
Spellings Wk 4 and Wk 5 for Grade 4 at CAPSAnaAcapella
 
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptxCOMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptxannathomasp01
 
TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT TOÁN 2024 - TỪ CÁC TRƯỜNG, TRƯỜNG...
TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT TOÁN 2024 - TỪ CÁC TRƯỜNG, TRƯỜNG...TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT TOÁN 2024 - TỪ CÁC TRƯỜNG, TRƯỜNG...
TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT TOÁN 2024 - TỪ CÁC TRƯỜNG, TRƯỜNG...Nguyen Thanh Tu Collection
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...Nguyen Thanh Tu Collection
 
Andreas Schleicher presents at the launch of What does child empowerment mean...
Andreas Schleicher presents at the launch of What does child empowerment mean...Andreas Schleicher presents at the launch of What does child empowerment mean...
Andreas Schleicher presents at the launch of What does child empowerment mean...EduSkills OECD
 
Michaelis Menten Equation and Estimation Of Vmax and Tmax.pptx
Michaelis Menten Equation and Estimation Of Vmax and Tmax.pptxMichaelis Menten Equation and Estimation Of Vmax and Tmax.pptx
Michaelis Menten Equation and Estimation Of Vmax and Tmax.pptxRugvedSathawane
 
diagnosting testing bsc 2nd sem.pptx....
diagnosting testing bsc 2nd sem.pptx....diagnosting testing bsc 2nd sem.pptx....
diagnosting testing bsc 2nd sem.pptx....Ritu480198
 

Recently uploaded (20)

SPLICE Working Group: Reusable Code Examples
SPLICE Working Group:Reusable Code ExamplesSPLICE Working Group:Reusable Code Examples
SPLICE Working Group: Reusable Code Examples
 
VAMOS CUIDAR DO NOSSO PLANETA! .
VAMOS CUIDAR DO NOSSO PLANETA!                    .VAMOS CUIDAR DO NOSSO PLANETA!                    .
VAMOS CUIDAR DO NOSSO PLANETA! .
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
 
AIM of Education-Teachers Training-2024.ppt
AIM of Education-Teachers Training-2024.pptAIM of Education-Teachers Training-2024.ppt
AIM of Education-Teachers Training-2024.ppt
 
How to Add a Tool Tip to a Field in Odoo 17
How to Add a Tool Tip to a Field in Odoo 17How to Add a Tool Tip to a Field in Odoo 17
How to Add a Tool Tip to a Field in Odoo 17
 
e-Sealing at EADTU by Kamakshi Rajagopal
e-Sealing at EADTU by Kamakshi Rajagopale-Sealing at EADTU by Kamakshi Rajagopal
e-Sealing at EADTU by Kamakshi Rajagopal
 
dusjagr & nano talk on open tools for agriculture research and learning
dusjagr & nano talk on open tools for agriculture research and learningdusjagr & nano talk on open tools for agriculture research and learning
dusjagr & nano talk on open tools for agriculture research and learning
 
OS-operating systems- ch05 (CPU Scheduling) ...
OS-operating systems- ch05 (CPU Scheduling) ...OS-operating systems- ch05 (CPU Scheduling) ...
OS-operating systems- ch05 (CPU Scheduling) ...
 
How to Manage Call for Tendor in Odoo 17
How to Manage Call for Tendor in Odoo 17How to Manage Call for Tendor in Odoo 17
How to Manage Call for Tendor in Odoo 17
 
OSCM Unit 2_Operations Processes & Systems
Python - File operations & Data parsing

  • 1. File operations and data parsing
    Presented by Felix Hoffmann, @Felix11H, felix11h.github.io/
    Slides on Slideshare: tiny.cc/file-ops
    Source: tiny.cc/file-ops-github
    References:
      - Python Files I/O at tutorialspoint.com
      - Dive into Python 3 by Mark Pilgrim
      - Python Documentation on match objects
      - Regex tutorial at regexone.com
    This work is licensed under a Creative Commons Attribution 4.0 International License.
  • 2. File operations: Reading
    Opening an existing file:
        >>> f = open("test.txt", "rb")
        >>> print f
        <open file 'test.txt', mode 'rb' at 0x...>
    Reading it:
        >>> f.read()
        'hello world'
    Closing it:
        >>> f.close()
        >>> print f
        <closed file 'test.txt', mode 'rb' at 0x...>
  • 3. File operations: Writing
    Opening a (new) file:
        >>> f = open("new_test.txt", "wb")
        >>> print f
        <open file 'new_test.txt', mode 'wb' at 0x...>
    Writing to it:
        >>> f.write("hello world, again")
        >>> f.write("... and again")
        >>> f.close()
    => Writes are buffered: the changes are only guaranteed to appear in the file, for editing elsewhere, after close() has been called!
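The note on close() can be checked with a small experiment. The following is a minimal sketch, written in Python 3 syntax (the slides use Python 2); the helper name `write_then_read` and the temporary file are assumptions made for illustration. It relies on the fact that flush() hands buffered data to the operating system and that close() flushes whatever is still buffered.

```python
import os
import tempfile


def write_then_read(path):
    """Write in two steps and report what a second reader sees each time."""
    f = open(path, "wb")
    f.write(b"hello world, again")
    f.flush()  # hand Python's buffer over to the operating system
    with open(path, "rb") as reader:
        after_flush = reader.read()

    f.write(b"... and again")
    f.close()  # close() flushes whatever is still buffered
    with open(path, "rb") as reader:
        after_close = reader.read()
    return after_flush, after_close


with tempfile.TemporaryDirectory() as tmp_dir:
    seen_after_flush, seen_after_close = write_then_read(
        os.path.join(tmp_dir, "new_test.txt"))
    print(seen_after_flush)
    print(seen_after_close)
```

Without the explicit flush(), the first read may well return an empty string, since small writes usually stay in the buffer until it fills up or the file is closed.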
  • 4. File operations: Appending
    Opening an existing file:
        >>> f = open("test.txt", "ab")
        >>> print f
        <open file 'test.txt', mode 'ab' at 0x...>
    Appending to it:
        >>> f.write("hello world, again")
        >>> f.write("... and again")
        >>> f.close()
    => In append mode the file pointer is set to the end of the opened file.
  • 5. File operations: More about file pointers
        f = open("lines_test.txt", "wb")
        for i in range(10):
            f.write("this is line %d \n" % (i+1))
        f.close()
    Reading from the file:
        >>> f = open("lines_test.txt", "rb")
        >>> f.readline()
        'this is line 1 \n'
        >>> f.readline()
        'this is line 2 \n'
        >>> f.read(14)
        'this is line 3'
        >>> f.read(2)
        ' \n'
  • 6. File operations: More about file pointers
        f.tell()            gives the current position within file f
        f.seek(x[, from])   changes the file pointer position within file f, where
                              from = 0   from beginning of file
                              from = 1   from current position
                              from = 2   from end of file
        >>> f = open("lines_test.txt", "rb")
        >>> f.tell()
        0
        >>> f.read(10)
        'this is li'
        >>> f.tell()
        10
  • 7. File operations: More about file pointers
        >>> f.seek(5)
        >>> f.tell()
        5
        >>> f.seek(10, 1)
        >>> f.tell()
        15
        >>> f.seek(-10, 2)
        >>> f.tell()
        151
        >>> f.read()
        ' line 10 \n'
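The file pointer slides can be reproduced as one script. This is a sketch in Python 3 syntax (the slides use Python 2), so the data is written as bytes; the function name `demo_file_pointer` and the result keys are assumptions for illustration. The constants os.SEEK_CUR and os.SEEK_END are the named equivalents of from = 1 and from = 2.

```python
import os
import tempfile


def demo_file_pointer(path):
    """Recreate the ten-line test file and replay the seek/tell calls."""
    with open(path, "wb") as f:
        for i in range(10):
            f.write(b"this is line %d \n" % (i + 1))

    results = {}
    with open(path, "rb") as f:
        f.seek(5)                    # absolute: 5 bytes from the beginning
        results["after_seek_5"] = f.tell()
        f.seek(10, os.SEEK_CUR)      # relative: 10 bytes past the current position
        results["after_seek_cur"] = f.tell()
        f.seek(-10, os.SEEK_END)     # relative: 10 bytes before the end
        results["after_seek_end"] = f.tell()
        results["tail"] = f.read()   # everything from the pointer to the end
    return results


with tempfile.TemporaryDirectory() as tmp_dir:
    print(demo_file_pointer(os.path.join(tmp_dir, "lines_test.txt")))
```

The file is 161 bytes long (nine lines of 16 bytes and one of 17), which is why seeking 10 bytes back from the end lands on position 151.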
  • 8. File operations: Other Modes
        rb+   Opens the file for reading and writing. The file pointer is at the beginning of the file.
        wb+   Opens the file for reading and writing. Overwrites the existing file if the file exists, otherwise a new file is created.
        ab+   Opens the file for appending and reading. The file pointer is at the end of the file if the file exists, otherwise a new file is created for reading and writing.
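The difference between the three update modes is easiest to see on one small file. The following sketch uses Python 3 syntax (the slides use Python 2); the function name `demo_update_modes` and the file contents are assumptions for illustration.

```python
import os
import tempfile


def demo_update_modes(path):
    """Return the file contents after using rb+, ab+ and wb+ in turn."""
    with open(path, "wb") as f:
        f.write(b"hello world")

    with open(path, "rb+") as f:   # pointer at the start, nothing is truncated
        f.write(b"HELLO")          # overwrites the first five bytes in place
        f.seek(0)
        after_rb_plus = f.read()

    with open(path, "ab+") as f:   # writes always go to the end of the file
        f.write(b"!")
        f.seek(0)                  # move back before reading
        after_ab_plus = f.read()

    with open(path, "wb+") as f:   # existing content is discarded on open
        f.write(b"new")
        f.seek(0)
        after_wb_plus = f.read()

    return after_rb_plus, after_ab_plus, after_wb_plus


with tempfile.TemporaryDirectory() as tmp_dir:
    print(demo_update_modes(os.path.join(tmp_dir, "modes_test.txt")))
```

In every case the pointer has to be moved back with seek(0) before reading, because the preceding write leaves it at the end of the written data.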
  • 9. Saving Data: Python Pickle
    Use pickle to save and retrieve more complex data types (lists, dictionaries and even class objects):
        >>> import pickle
        >>> f = open('save_file.p', 'wb')
        >>> ex_dict = {'hello': 'world'}
        >>> pickle.dump(ex_dict, f)
        >>> f.close()

        >>> import pickle
        >>> f = open('save_file.p', 'rb')
        >>> loadobj = pickle.load(f)
        >>> print loadobj['hello']
        world
  • 10. Best practice: With Statement
        import pickle

        ex_dict = {'hello': 'world'}

        with open('save_file.p', 'wb') as f:
            pickle.dump(ex_dict, f)

        import pickle

        with open('save_file.p', 'rb') as f:
            loadobj = pickle.load(f)

        print loadobj['hello']
    => Use this!
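The two pickle snippets combine naturally into a pair of helpers. This is a minimal sketch in Python 3 syntax (the slides use Python 2); the names `save_object` and `load_object` and the sample data are assumptions for illustration. The with statement closes the file even if pickling raises an exception.

```python
import os
import pickle
import tempfile


def save_object(obj, path):
    """Pickle obj to path; the file is closed when the with block ends."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_object(path):
    """Load and return the object pickled at path."""
    with open(path, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp_dir:
    save_path = os.path.join(tmp_dir, "save_file.p")
    save_object({"hello": "world", "numbers": [1, 2, 3]}, save_path)
    print(load_object(save_path)["hello"])
```

Only unpickle files from sources you trust: loading a pickle can execute arbitrary code.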
  • 11. Need for parsing
    Imagine that data files are generated by a third party (so you have no control over the format) and that these files need pre-processing.
    => Regular expressions provide a powerful and concise way to perform pattern match/search/replace over the data.
    (Comic: © Randall Munroe, xkcd.com, CC BY-NC 2.5)
  • 12. Regular expressions - A case study
    Formatting street names:
        >>> s = '100 NORTH MAIN ROAD'
        >>> s.replace('ROAD', 'RD.')
        '100 NORTH MAIN RD.'
        >>> s = '100 NORTH BROAD ROAD'
        >>> s.replace('ROAD', 'RD.')
        '100 NORTH BRD. RD.'
        >>> s[:-4] + s[-4:].replace('ROAD', 'RD.')
        '100 NORTH BROAD RD.'
    Better use regular expressions!
        >>> import re
        >>> re.sub(r'ROAD$', 'RD.', s)
        '100 NORTH BROAD RD.'
    (Example from Dive Into Python 3, © Mark Pilgrim, CC BY-SA 3.0)
  • 13. Pattern matching with regular expressions
        ^         Matches beginning of line/pattern
        $         Matches end of line/pattern
        .         Matches any character except newline
        [..]      Matches any single character in brackets
        [^..]     Matches any single character not in brackets
        re*       Matches 0 or more occurrences of the preceding expression
        re+       Matches 1 or more occurrences of the preceding expression
        re?       Matches 0 or 1 occurrence
        re{n}     Matches exactly n occurrences
        re{n,}    Matches n or more occurrences
        re{n,m}   Matches at least n and at most m occurrences
    => Use cheatsheets, trainers, tutorials, builders, etc.
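Each entry of the pattern table can be tried out against a short string. The sketch below uses Python 3 syntax (the slides use Python 2); the helper `matches` and the sample strings are assumptions for illustration, chosen so that every pattern has a clear yes or no answer.

```python
import re


def matches(pattern, text):
    """Return True if pattern is found anywhere in text."""
    return re.search(pattern, text) is not None


# (pattern, text, expected result)
examples = [
    (r"^ab", "abc", True),             # ^ anchors at the beginning
    (r"^ab", "cab", False),
    (r"bc$", "abc", True),             # $ anchors at the end
    (r"a.c", "abc", True),             # . is any character ...
    (r"a.c", "a\nc", False),           # ... except a newline
    (r"[pP]ython", "python", True),    # one character from the set
    (r"^[^0-9]$", "7", False),         # one character NOT in the set
    (r"^ab*c$", "ac", True),           # * allows zero occurrences
    (r"^ab+c$", "ac", False),          # + needs at least one
    (r"^colou?r$", "color", True),     # ? makes the u optional
    (r"^[0-9]{4}$", "199", False),     # exactly four digits
    (r"^[0-9]{2,}$", "12345", True),   # two or more digits
    (r"^[0-9]{2,3}$", "1234", False),  # between two and three digits
]

for pattern, text, expected in examples:
    print(pattern, repr(text), matches(pattern, text) == expected)
```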
  • 14. re.search() & matches
        >>> import re
        >>> data = "I like python"
        >>> m = re.search(r'python', data)
        >>> print m
        <_sre.SRE_Match object at 0x...>
    Important properties of the match object:
        group()   Return the string matched by the RE
        start()   Return the starting position of the match
        end()     Return the ending position of the match
        span()    Return a tuple containing the (start, end) positions of the match
  • 15. re.search() & matches
    For example:
        >>> import re
        >>> data = "I like python"
        >>> m = re.search(r'python', data)
        >>> m.group()
        'python'
        >>> m.start()
        7
        >>> m.span()
        (7, 13)
    For a complete list of match object properties see for example the Python Documentation: https://docs.python.org/2/library/re.html#match-objects
  • 16. re.findall()
        >>> import re
        >>> data = "Python is great. I like python"
        >>> m = re.search(r'[pP]ython', data)
        >>> m.group()
        'Python'
    => re.search() returns only the first match, use re.findall() instead:
        >>> import re
        >>> data = "Python is great. I like python"
        >>> l = re.findall(r'[pP]ython', data)
        >>> print l
        ['Python', 'python']
    => Returns a list instead of a match object!
  • 17. re.findall() - Example
        import re

        with open("history.txt", "rb") as f:
            text = f.read()

        year_dates = re.findall(r'19[0-9]{2}', text)
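The year search can be tried without a history.txt file by running it on an inline string. This is a sketch in Python 3 syntax (the slides use Python 2), where a str pattern needs str text rather than bytes read in "rb" mode; the sample sentence and the function names are made up for illustration. The second function adds word boundaries (\b), an addition not on the slide, so that digits inside a longer number are not reported as a year.

```python
import re

# Hypothetical sample text standing in for the contents of history.txt.
SAMPLE_TEXT = ("The treaty was signed in 1919; the war ended in 1945. "
               "Invoice 119345 was filed in 2001.")


def find_years_loose(text):
    """Pattern from the slide: '19' followed by two digits, anywhere."""
    return re.findall(r"19[0-9]{2}", text)


def find_years_strict(text):
    """Same pattern, but only when it stands alone as a whole number."""
    return re.findall(r"\b19[0-9]{2}\b", text)


print(find_years_loose(SAMPLE_TEXT))
print(find_years_strict(SAMPLE_TEXT))
```

The loose pattern also picks up 1934 from inside the invoice number 119345, while the strict one returns only 1919 and 1945.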
  • 18. re.split()
    Suppose the data stream has a well-defined delimiter:
        >>> import re
        >>> data = "x = 20"
        >>> re.split(r'=', data)
        ['x ', ' 20']
        >>> data = 'ftp://python.about.com'
        >>> re.split(r':/{1,3}', data)
        ['ftp', 'python.about.com']
        >>> data = '25.657'
        >>> re.split(r'\.', data)
        ['25', '657']
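Splitting on a delimiter is often the first step of parsing a simple data file. The sketch below, in Python 3 syntax (the slides use Python 2), turns "key = value" lines into a dictionary; the function name `parse_settings`, the sample lines, and the choice to skip lines without a delimiter are assumptions for illustration. Including the surrounding whitespace in the pattern avoids the stray spaces seen in ['x ', ' 20'].

```python
import re


def parse_settings(lines):
    """Parse 'key = value' lines into a dict of strings."""
    settings = {}
    for line in lines:
        line = line.strip()
        # Split once, on the first '=' together with any whitespace around it.
        parts = re.split(r"\s*=\s*", line, maxsplit=1)
        if len(parts) != 2:
            continue  # blank line or no delimiter: skip it
        key, value = parts
        settings[key] = value
    return settings


print(parse_settings(["x = 20", "name=felix", "", "url = a=b"]))
```

All values come back as strings, so a number such as 20 still needs an explicit int() conversion.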
  • 19. re.sub()
    Replace patterns by other patterns.
        >>> import re
        >>> data = "2004-959-559 # my phone number"
        >>> re.sub(r'#.*', '', data)
        '2004-959-559 '
    A more interesting example:
        >>> data = "2004-959-559"
        >>> re.sub(r'([0-9]*)-([0-9]*)-([0-9]*)',
        ...        r'\3-\2-\1', data)
        '559-959-2004'
    => Groups are captured in parentheses and referenced in the replacement string by \1, \2, ...
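Both substitutions can be wrapped in small functions and chained. This is a sketch in Python 3 syntax (the slides use Python 2); the function names are assumptions for illustration. It differs from the slide in two small ways that are worth naming: the comment pattern also removes the whitespace before the '#', and the groups use + instead of * so that each group must contain at least one digit.

```python
import re


def strip_comment(line):
    """Remove a trailing '# ...' comment and the whitespace before it."""
    return re.sub(r"\s*#.*", "", line)


def reverse_groups(number):
    """Reorder three dash-separated digit groups from a-b-c to c-b-a."""
    return re.sub(r"([0-9]+)-([0-9]+)-([0-9]+)", r"\3-\2-\1", number)


raw_line = "2004-959-559 # my phone number"
print(reverse_groups(strip_comment(raw_line)))
```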
  • 20. os module
    Provides a way of using OS-dependent functionality:
        os.mkdir()     Creates a directory (like mkdir)
        os.chmod()     Changes the permissions (like chmod)
        os.rename()    Renames a file from the old name to the new name
        os.listdir()   Lists the contents of a directory
        os.getcwd()    Gets the current working directory path
        os.path        Submodule with useful functions on pathnames
    For example, list all entries (files and directories) in the current directory:
        >>> from os import listdir
        >>>
        >>> for f in listdir("."):
        ...     print f
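Several of the listed calls can be combined in one short script. The following sketch uses Python 3 syntax (the slides use Python 2) and works inside a temporary directory so that nothing in the current directory is touched; the function name `demo_os_calls` and the file names are assumptions for illustration.

```python
import os
import tempfile


def demo_os_calls(base_dir):
    """Create a directory, rename a file in it, and list the regular files."""
    data_dir = os.path.join(base_dir, "data")
    os.mkdir(data_dir)

    old_path = os.path.join(data_dir, "old.txt")
    with open(old_path, "w") as f:
        f.write("hello world")

    new_path = os.path.join(data_dir, "new.txt")
    os.rename(old_path, new_path)

    # listdir() returns names only, so join them with the directory
    # before asking os.path whether each one is a regular file.
    return sorted(name for name in os.listdir(data_dir)
                  if os.path.isfile(os.path.join(data_dir, name)))


with tempfile.TemporaryDirectory() as tmp_dir:
    print(demo_os_calls(tmp_dir))
```

Filtering with os.path.isfile() matters because os.listdir() returns subdirectories as well as files.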
  • 21. Have fun!