Python XML Parsing – Complete Examples
In this tutorial, we learn to parse XML documents in Python. Several libraries are available for this; we go through examples for the following:
- ElementTree
- cElementTree
- minidom
- objectify
For each of these libraries, we look at examples that parse an XML file, extract attributes, extract elements, and so on.
The examples in this tutorial use the following XML file, saved as sample.xml:
    <?xml version="1.0" encoding="UTF-8" ?>
    <holidays year="2017">
        <holiday type="other">
            <date>Jan 1</date>
            <name>New Year</name>
        </holiday>
        <holiday type="public">
            <date>Oct 2</date>
            <name>Gandhi Jayanti</name>
        </holiday>
    </holidays>
ElementTree – Python XML Parser
ElementTree ships with Python as part of the standard library (xml.etree.ElementTree), so no separate installation is needed.
Examples:
Get Root Tag Name
    # Python XML Parsing
    import xml.etree.ElementTree as ET
    root = ET.parse('sample.xml').getroot()
    tag = root.tag
    print(tag)
    $ python python_xml_parse_ElementTree.py
    holidays
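When the XML is already in memory rather than in a file, the same root element can be obtained with ET.fromstring. The following is a minimal sketch, assuming the sample document is held in a string; the name SAMPLE_XML is made up for this illustration, and the XML declaration is left out for brevity.

```python
import xml.etree.ElementTree as ET

# Same content as sample.xml, held in a string (hypothetical variable name).
SAMPLE_XML = """<holidays year="2017">
    <holiday type="other">
        <date>Jan 1</date>
        <name>New Year</name>
    </holiday>
    <holiday type="public">
        <date>Oct 2</date>
        <name>Gandhi Jayanti</name>
    </holiday>
</holidays>"""

# fromstring() returns the root Element directly, so getroot() is not needed.
root = ET.fromstring(SAMPLE_XML)
print(root.tag)  # holidays
```

This is handy for trying the remaining examples in an interactive session without creating sample.xml first.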
Get Attributes of Root
    # Python XML Parsing
    import xml.etree.ElementTree as ET
    root = ET.parse('sample.xml').getroot()

    # get all attributes (a dict of attribute name -> string value)
    attributes = root.attrib
    print(attributes)

    # extract a particular attribute
    year = attributes.get('year')
    print('year :', year)
    $ python python_xml_parse_ElementTree.py
    {'year': '2017'}
    year : 2017
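Attribute values always come back as strings, and a missing attribute yields None unless a default is given. The sketch below shows one way to handle both cases using Element.get, a shortcut for attrib.get. The country attribute is hypothetical and deliberately absent from the document, so that the default is exercised.

```python
import xml.etree.ElementTree as ET

# A reduced version of the sample document, just the root with its attribute.
root = ET.fromstring('<holidays year="2017"></holidays>')

# Element.get() reads a single attribute; the value is a string.
year_text = root.get('year')

# Convert explicitly when a number is needed.
year = int(year_text)

# 'country' is a hypothetical attribute that the sample does not define,
# so the supplied default is returned instead of None.
country = root.get('country', 'unknown')

print(year + 1)   # 2018
print(country)    # unknown
```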
Iterate over child nodes of root
    # Python XML Parsing
    import xml.etree.ElementTree as ET
    root = ET.parse('sample.xml').getroot()

    # iterate over all the nodes with tag name - holiday
    for holiday in root.findall('holiday'):
        print(holiday)
    $ python python_xml_parse_ElementTree.py
    <Element 'holiday' at 0x7fb5a107d3b8>
    <Element 'holiday' at 0x7fb59fc2f868>
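With a plain tag name, findall matches only the direct children of the element it is called on. As a rough sketch of two related options, iterating over an element directly yields all of its direct children whatever their tag, while iter walks the whole subtree. The sample is parsed from a string here (an assumption made so that the snippet runs on its own).

```python
import xml.etree.ElementTree as ET

# Sample document parsed from a string so the snippet is self-contained.
root = ET.fromstring("""<holidays year="2017">
    <holiday type="other"><date>Jan 1</date><name>New Year</name></holiday>
    <holiday type="public"><date>Oct 2</date><name>Gandhi Jayanti</name></holiday>
</holidays>""")

# Iterating an element yields its direct children, regardless of tag name.
child_tags = [child.tag for child in root]

# iter() searches the whole subtree, so it reaches the nested <name> elements.
names = [element.text for element in root.iter('name')]

# findall('name') on the root finds nothing: <name> is not a direct child.
direct_names = root.findall('name')

print(child_tags)    # ['holiday', 'holiday']
print(names)         # ['New Year', 'Gandhi Jayanti']
print(direct_names)  # []
```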
Iterate over child nodes of root and get their attributes
    # Python XML Parsing
    import xml.etree.ElementTree as ET
    root = ET.parse('sample.xml').getroot()

    # iterate over child nodes
    for holiday in root.findall('holiday'):
        # get all attributes of a node
        attributes = holiday.attrib
        print(attributes)

        # get a particular attribute
        # (named holiday_type to avoid shadowing the built-in type)
        holiday_type = attributes.get('type')
        print(holiday_type)
    $ python python_xml_parse_ElementTree.py
    {'type': 'other'}
    other
    {'type': 'public'}
    public
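Instead of reading the type attribute inside the loop and comparing it by hand, the selection can be pushed into the path expression. ElementTree supports a limited subset of XPath, including attribute predicates of the form [@name='value']. The sketch below assumes the sample document is parsed from a string so that it runs on its own.

```python
import xml.etree.ElementTree as ET

# Sample document parsed from a string so the snippet is self-contained.
root = ET.fromstring("""<holidays year="2017">
    <holiday type="other"><date>Jan 1</date><name>New Year</name></holiday>
    <holiday type="public"><date>Oct 2</date><name>Gandhi Jayanti</name></holiday>
</holidays>""")

# Select only the <holiday> children whose type attribute equals 'public'.
public_holidays = root.findall("holiday[@type='public']")

public_names = [holiday.find('name').text for holiday in public_holidays]
print(public_names)  # ['Gandhi Jayanti']
```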
Access Elements of a Node
    # Python XML Parsing
    import xml.etree.ElementTree as ET
    root = ET.parse('sample.xml').getroot()

    # iterate over all nodes
    for holiday in root.findall('holiday'):
        # access element - name
        name = holiday.find('name').text
        print('name :', name)

        # access element - date
        date = holiday.find('date').text
        print('date :', date)
    $ python python_xml_parse_ElementTree.py
    name : New Year
    date : Jan 1
    name : Gandhi Jayanti
    date : Oct 2
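Note that find returns None when no matching child exists, so calling .text on the result raises an AttributeError for incomplete records. One possible way to guard against this is findtext with a default value. The sketch below uses a modified sample in which the date element has been removed on purpose; that modified document is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# Modified sample: this <holiday> deliberately has no <date> child.
root = ET.fromstring("""<holidays year="2017">
    <holiday type="other">
        <name>New Year</name>
    </holiday>
</holidays>""")

holiday = root.find('holiday')

# find() returns None for a missing child ...
date_element = holiday.find('date')

# ... while findtext() returns the child's text, or the default if it is absent.
name = holiday.findtext('name', default='unknown')
date = holiday.findtext('date', default='unknown')

print('name :', name)  # name : New Year
print('date :', date)  # date : unknown
```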
Access Elements of a Node without knowing their tag names
    # Python XML Parsing
    import xml.etree.ElementTree as ET
    root = ET.parse('sample.xml').getroot()

    for holiday in root.findall('holiday'):
        # access all elements in node
        for element in holiday:
            ele_name = element.tag
            # read the text from the element itself; looking it up again by
            # tag would return the first match if a tag occurred twice
            ele_value = element.text
            print(ele_name, ':', ele_value)
    $ python python_xml_parse_ElementTree.py
    date : Jan 1
    name : New Year
    date : Oct 2
    name : Gandhi Jayanti
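The same tag-agnostic loop can be used to collect each holiday into a dictionary rather than printing it. This is only a sketch of one possible layout: the name records and the choice to merge attributes and child elements into a single dict are assumptions of this example, and the sample is parsed from a string so that the snippet runs on its own.

```python
import xml.etree.ElementTree as ET

# Sample document parsed from a string so the snippet is self-contained.
root = ET.fromstring("""<holidays year="2017">
    <holiday type="other"><date>Jan 1</date><name>New Year</name></holiday>
    <holiday type="public"><date>Oct 2</date><name>Gandhi Jayanti</name></holiday>
</holidays>""")

records = []
for holiday in root.findall('holiday'):
    # Start from a copy of the attributes, then add one key per child element.
    record = dict(holiday.attrib)
    for element in holiday:
        record[element.tag] = element.text
    records.append(record)

for record in records:
    print(record)
# {'type': 'other', 'date': 'Jan 1', 'name': 'New Year'}
# {'type': 'public', 'date': 'Oct 2', 'name': 'Gandhi Jayanti'}
```

If an attribute and a child element share a name, the child's text overwrites the attribute in this layout, so a real document may call for separate keys.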