XML Parsing in Python

In this Python tutorial, we will learn how to parse XML documents using the ElementTree library. The examples cover common tasks such as getting tag names, reading attributes, and iterating over child nodes.

Python XML Parsing using ElementTree

ElementTree (the xml.etree.ElementTree module) is part of the Python standard library, so there is nothing extra to install.

We shall look at examples that parse an XML file, extract attributes, extract elements, and so on.

We shall use the following XML file in the examples throughout this tutorial.

sample.xml

<?xml version="1.0" encoding="UTF-8" ?>

<holidays year="2017">
    <holiday type="other">
        <date>Jan 1</date>
        <name>New Year</name>
    </holiday>
    <holiday type="public">
        <date>Oct 2</date>
        <name>Gandhi Jayanti</name>
    </holiday>
</holidays>
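If you would rather experiment without creating sample.xml on disk, one option is to embed the same document in a string and parse it with ET.fromstring(). This is a minimal sketch; the variable name SAMPLE_XML is just an illustrative choice. Note that fromstring() returns the root Element directly, whereas ET.parse() returns an ElementTree object on which you call getroot().

```python
import xml.etree.ElementTree as ET

# Same content as sample.xml, kept in memory as bytes so that the
# encoding declaration in the first line is honoured by the parser.
SAMPLE_XML = b"""<?xml version="1.0" encoding="UTF-8" ?>
<holidays year="2017">
    <holiday type="other">
        <date>Jan 1</date>
        <name>New Year</name>
    </holiday>
    <holiday type="public">
        <date>Oct 2</date>
        <name>Gandhi Jayanti</name>
    </holiday>
</holidays>
"""

# fromstring() parses the document and returns the root Element directly
root = ET.fromstring(SAMPLE_XML)
print(root.tag)   # holidays
print(len(root))  # 2 (number of direct child elements)
```

In the programs that follow, root = ET.fromstring(SAMPLE_XML) could stand in for ET.parse('sample.xml').getroot(); the rest of the code would stay the same.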

1. Get Root Tag Name

In the following program, we get the tag name of the root node.

Python Program

# Python XML Parsing
import xml.etree.ElementTree as ET
root = ET.parse('sample.xml').getroot()
tag = root.tag
print(tag)

Output

tutorialkart@arjun-VPCEH26EN:~/PycharmProjects/PythonTutorial/parsing$ python python_xml_parse_ElementTree.py
holidays
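The program above chains two calls: ET.parse() builds an ElementTree object for the whole document, and getroot() returns its top-level Element. A small sketch of the two steps separately, assuming an in-memory io.BytesIO object as a stand-in for sample.xml (ET.parse() accepts a file name or a file-like object):

```python
import io
import xml.etree.ElementTree as ET

# A trimmed, in-memory stand-in for sample.xml
xml_bytes = b'<holidays year="2017"><holiday type="other"/></holidays>'

# ET.parse() returns an ElementTree wrapping the whole document
tree = ET.parse(io.BytesIO(xml_bytes))
print(type(tree).__name__)  # ElementTree

# getroot() returns the top-level Element, whose .tag is the root tag name
root = tree.getroot()
print(type(root).__name__)  # Element
print(root.tag)             # holidays
```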

2. Get Attributes of Root

In the following program, we access the attributes of the root node.

Python Program

# Python XML Parsing
import xml.etree.ElementTree as ET
root = ET.parse('sample.xml').getroot()

# get all attributes
attributes = root.attrib
print(attributes)

# extract a particular attribute
year = attributes.get('year')
print('year :', year)

Output

tutorialkart@arjun-VPCEH26EN:~/PycharmProjects/PythonTutorial/parsing$ python python_xml_parse_ElementTree.py
{'year': '2017'}
year : 2017
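Since root.attrib is an ordinary dictionary, attributes can also be read one at a time with Element.get(), which accepts an optional default. A minimal sketch on a trimmed copy of sample.xml; the month attribute is hypothetical and deliberately missing:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring('<holidays year="2017"><holiday type="other"/></holidays>')

# Element.get() reads a single attribute without going through .attrib
year = root.get('year')
print(year)  # 2017

# A missing attribute gives None, or the default you supply
month = root.get('month')
month_or_default = root.get('month', 'n/a')
print(month)             # None
print(month_or_default)  # n/a

# Attribute values are always strings, so convert explicitly when needed
print(int(year) + 1)  # 2018
```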

3. Iterate over child nodes of root

In the following program, we use findall() in a for loop to iterate over the child nodes of the root whose tag is holiday.

Python Program

# Python XML Parsing
import xml.etree.ElementTree as ET
root = ET.parse('sample.xml').getroot()

# iterate over the direct children of root with the tag name 'holiday'
for holiday in root.findall('holiday'):
    print(holiday)

Output

tutorialkart@arjun-VPCEH26EN:~/PycharmProjects/PythonTutorial/parsing$ python python_xml_parse_ElementTree.py
<Element 'holiday' at 0x7fb5a107d3b8>
<Element 'holiday' at 0x7fb59fc2f868>
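The hexadecimal addresses in this output will differ from run to run. Also note that findall('holiday') only matches direct children of root. The sketch below, on an inline copy of sample.xml, contrasts it with plain iteration over an Element (all direct children, any tag) and with Element.iter(), which walks the whole subtree:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<holidays year="2017">'
    '<holiday type="other"><date>Jan 1</date><name>New Year</name></holiday>'
    '<holiday type="public"><date>Oct 2</date><name>Gandhi Jayanti</name></holiday>'
    '</holidays>'
)

# Iterating over an Element yields its direct children, whatever their tag
children = [child.tag for child in root]
print(children)  # ['holiday', 'holiday']

# findall('date') only searches direct children of root, so nothing matches
direct_dates = root.findall('date')
print(direct_dates)  # []

# iter('date') walks the whole subtree and finds nested elements as well
all_dates = [d.text for d in root.iter('date')]
print(all_dates)  # ['Jan 1', 'Oct 2']

# A './/' path prefix makes findall() search all descendants too
nested = root.findall('.//date')
print(len(nested))  # 2
```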

4. Iterate over child nodes of root and get their attributes

The following program extends the previous one: while iterating over the children, we also access their attributes.

Python Program

# Python XML Parsing
import xml.etree.ElementTree as ET
root = ET.parse('sample.xml').getroot()

# iterate over child nodes
for holiday in root.findall('holiday'):

    # get all attributes of a node
    attributes = holiday.attrib
    print(attributes)

    # get a particular attribute
    holiday_type = attributes.get('type')
    print(holiday_type)

Output

tutorialkart@arjun-VPCEH26EN:~/PycharmProjects/PythonTutorial/parsing$ python python_xml_parse_ElementTree.py
{'type': 'other'}
other
{'type': 'public'}
public
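When you only need the children with a particular attribute value, ElementTree's limited XPath support can filter them in the findall() path instead of checking each one in the loop. A short sketch on a trimmed inline copy of sample.xml:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<holidays year="2017">'
    '<holiday type="other"><name>New Year</name></holiday>'
    '<holiday type="public"><name>Gandhi Jayanti</name></holiday>'
    '</holidays>'
)

# The [@attr='value'] predicate keeps only elements whose attribute matches
public = root.findall("holiday[@type='public']")
public_names = [h.find('name').text for h in public]
print(public_names)  # ['Gandhi Jayanti']

# [@attr] on its own keeps elements that have the attribute at all
with_type = root.findall('holiday[@type]')
print(len(with_type))  # 2
```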

5. Access Elements of a Node

In the following program, we use find() to access the child elements of each holiday node by tag name, and read their text.

Python Program

# Python XML Parsing
import xml.etree.ElementTree as ET
root = ET.parse('sample.xml').getroot()

# iterate over all holiday nodes
for holiday in root.findall('holiday'):

    # access element - name
    name = holiday.find('name').text
    print('name : ', name)

    # access element - date
    date = holiday.find('date').text
    print('date : ', date)

Output

tutorialkart@arjun-VPCEH26EN:~/PycharmProjects/PythonTutorial/parsing$ python python_xml_parse_ElementTree.py
name :  New Year
date :  Jan 1
name :  Gandhi Jayanti
date :  Oct 2
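One caveat with holiday.find('name').text: find() returns None when no child matches, so .text would then raise an AttributeError. Element.findtext() returns the text directly and accepts a default. A minimal sketch; the location tag is hypothetical and intentionally absent:

```python
import xml.etree.ElementTree as ET

holiday = ET.fromstring(
    '<holiday type="public"><date>Oct 2</date><name>Gandhi Jayanti</name></holiday>'
)

# find() returns None when no child has the requested tag
missing = holiday.find('location')
print(missing)  # None

# findtext() returns the child's text, or the default when it is missing
name = holiday.findtext('name')
location = holiday.findtext('location', default='n/a')
print(name)      # Gandhi Jayanti
print(location)  # n/a
```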

6. Access Elements of a Node without knowing their tag names

In the following program, we access the child elements of each node without knowing their tag names in advance, by iterating over the node itself in a for loop.

Python Program

# Python XML Parsing
import xml.etree.ElementTree as ET
root = ET.parse('sample.xml').getroot()

for holiday in root.findall('holiday'):
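    # iterate over every child element of the node
    for element in holiday:
        ele_name = element.tag
        # read the text from the element itself; looking it up again with
        # holiday.find(element.tag) would return the first match if tags repeat
        ele_value = element.text
        print(ele_name, ' : ', ele_value)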

Output

tutorialkart@arjun-VPCEH26EN:~/PycharmProjects/PythonTutorial/parsing$ python python_xml_parse_ElementTree.py
date  :  Jan 1
name  :  New Year
date  :  Oct 2
name  :  Gandhi Jayanti
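The same loop can collect each holiday into a dictionary instead of printing it, which is often more convenient for further processing. This is a sketch under the assumption that each holiday has unique child tags and that no child tag clashes with an attribute name; otherwise later values overwrite earlier ones in the dictionary.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<holidays year="2017">'
    '<holiday type="other"><date>Jan 1</date><name>New Year</name></holiday>'
    '<holiday type="public"><date>Oct 2</date><name>Gandhi Jayanti</name></holiday>'
    '</holidays>'
)

# Build one dictionary per holiday: its attributes, then each child tag/text
records = []
for holiday in root.findall('holiday'):
    record = dict(holiday.attrib)
    for element in holiday:
        record[element.tag] = element.text
    records.append(record)

print(records)
# [{'type': 'other', 'date': 'Jan 1', 'name': 'New Year'},
#  {'type': 'public', 'date': 'Oct 2', 'name': 'Gandhi Jayanti'}]
```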

Conclusion

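In this Python tutorial, we learned how to parse an XML file using the ElementTree library: reading the root tag and its attributes, iterating over child nodes, and accessing child elements either by tag name or by iteration.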

TutorialKart
