Python — Beautiful Soup All Notes with Projects

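Beautiful Soup is a Python package for getting data out of HTML, XML, and other markup documents. It does not download pages or run JavaScript. It only parses the markup you give it. To scrape a website, you first fetch the HTML with another tool (for example urllib or requests). Content that a page loads dynamically with JavaScript will be missing from that HTML unless you obtain the rendered page some other way, such as with a browser automation tool.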
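Because Beautiful Soup only parses, scraping usually pairs it with an HTTP client. Here is a minimal sketch that uses the standard library's urllib.request for the download step. The helper names fetch_html and page_title are hypothetical, and the User-Agent value is an arbitrary assumption. The demo parses a string so it runs offline.

```python
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download raw HTML with the standard library (Beautiful Soup does not do this)."""
    # Assumption: some sites reject requests that carry no User-Agent header.
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def page_title(html: str):
    """Parse already-downloaded HTML and return the <title> text, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.title.string if soup.title else None


# Offline demo: parse a string directly. For a live page you would call
# page_title(fetch_html("https://example.com")) instead.
print(page_title("<html><head><title>Hello</title></head><body></body></html>"))  # Hello
```

Any text a page adds later with JavaScript will not be in the string that fetch_html returns, so page_title cannot see it either.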

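Installing Beautiful Soup in your Python environment is easy. Just run the command below. bs4 is a small alias package on PyPI that installs the official beautifulsoup4 package, so pip install beautifulsoup4 works too.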

pip install bs4
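To confirm that the installation worked, you can run a quick smoke test like the one below. It only uses the package's version string and the built-in parser.

```python
import bs4
from bs4 import BeautifulSoup

# If the import above succeeds, the package is installed; show which version.
print(bs4.__version__)

# Parse a tiny snippet with the standard-library parser.
print(BeautifulSoup("<p>ok</p>", "html.parser").p.text)  # ok
```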

Beautiful Soup supports the HTML parser included in Python's standard library, and it also supports several third-party Python parsers. One of them is the lxml parser. To use it, you have to install it first with pip install lxml. In total there are four parser options. The list below summarizes the advantages and disadvantages of each one.

BeautifulSoup(markup, "html.parser") Advantages: Batteries included, Decent speed. Disadvantages: Not as fast as lxml, less lenient than html5lib

BeautifulSoup(markup, "lxml") Advantages: Very fast, Lenient. Disadvantages: External C dependency

BeautifulSoup(markup, "xml") Advantages: Very fast, The only currently supported XML parser. Disadvantages: External C dependency

BeautifulSoup(markup, "html5lib") Advantages: Extremely lenient, Parses pages the same way a web browser does, Creates valid HTML5. Disadvantages: Very slow, External Python dependency
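If you want lxml's speed but also need the code to run where lxml is not installed, one option is to try it first and fall back to the built-in parser. BeautifulSoup raises FeatureNotFound when the requested parser is unavailable. The helper name make_soup is hypothetical.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup: str):
    """Prefer lxml, fall back to html.parser. Returns (soup, parser_name)."""
    for parser in ("lxml", "html.parser"):
        try:
            return BeautifulSoup(markup, parser), parser
        except FeatureNotFound:
            # The requested parser library is not installed; try the next one.
            continue
    # html.parser ships with Python, so this line should not be reached.
    raise RuntimeError("no usable parser found")


soup, used = make_soup("<p>Hello <b>world</b></p>")
print(used)         # "lxml" if installed, otherwise "html.parser"
print(soup.b.text)  # world
```

Different parsers can build different trees from broken markup, so code that must behave identically everywhere is usually better off naming one parser explicitly.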

— — — — — — — — — — — — — — — — — — — — — — — — — —

Project-1) Examine a basic HTML document

In this project we get data from a simple HTML document using the Beautiful Soup library. We import the BeautifulSoup class from the bs4 package and parse the document into a soup object, using the built-in "html.parser" parser.
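The setup code for this project is not shown in the article. Here is a minimal sketch that assumes the source document matches the prettified output that follows. The variable name html_doc is an assumption.

```python
from bs4 import BeautifulSoup

# Assumption: reconstructed from the prettify() output, not the author's original string.
html_doc = """<html><head><title>Python's awsome</title></head>
<body>
<p class="title"><b>We will list python authors</b></p>
<div class="list-authors">
<span class="descriptor">Python is Everywhere</span>
<a href="http://1.com" id="link1">Messi</a>,
<a href="http://2.com" id="link2">Ronaldo</a>
</div>
</body></html>"""

# Parse with the standard-library parser.
soup = BeautifulSoup(html_doc, "html.parser")
print(soup.title.string)  # Python's awsome
```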

Let's pretty-print the HTML document with the prettify() method:

print(soup.prettify())
<html>
 <head>
  <title>
   Python's awsome
  </title>
 </head>
 <body>
  <p class="title">
   <b>
    We will list python authors
   </b>
  </p>
  <div class="list-authors">
   <span class="descriptor">
    Python is Everywhere
   </span>
   <a href="http://1.com" id="link1">
    Messi
   </a>
   ,
   <a href="http://2.com" id="link2">
    Ronaldo
   </a>
  </div>
 </body>
</html>

Here are some simple ways to get data from the HTML document:

print(soup.title)
<title>Python's awsome</title>
print(soup.title.text)
Python's awsome
print(soup.title.name)
title
print(soup.title.string)
Python's awsome
print(soup.title.parent.name)
head
print(soup.p)
<p class="title"><b>We will list python authors</b></p>
print(soup.p.text)
We will list python authors
print(soup.a)
<a href="http://1.com" id="link1">Messi</a>
print(soup.a.text)
Messi
print(soup.div)
<div class="list-authors">
<span class="descriptor">Python is Everywhere</span>
<a href="http://1.com" id="link1">Messi</a>,
<a href="http://2.com" id="link2">Ronaldo</a>
</div>
print(soup.div.text)
Python is Everywhere
Messi,
Ronaldo
print(soup.div['class'])
['list-authors']
print(soup.find_all('a'))
[<a href="http://1.com" id="link1">Messi</a>, <a href="http://2.com" id="link2">Ronaldo</a>]
print(soup.get_text())
Python's awsome
We will list python authorsPython is Everywhere
Messi,
Ronaldo
print(soup.find_all('a', attrs={'id': 'link2'}))
[<a href="http://2.com" id="link2">Ronaldo</a>]
print(soup.find_all('div', attrs={'class': 'list-authors'}))
[<div class="list-authors">
<span class="descriptor">Python is Everywhere</span>
<a href="http://1.com" id="link1">Messi</a>,
<a href="http://2.com" id="link2">Ronaldo</a>
</div>]
for link in soup.find_all('a'):
    print(link.get('href'))
http://1.com
http://2.com
print(soup.find_all('a', href=True))
[<a href="http://1.com" id="link1">Messi</a>, <a href="http://2.com" id="link2">Ronaldo</a>]

— — — — — — — — — — — — — — — — — — — — — — — — — —

Project-2) Get a specific href from div tags

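How do we get the www.ios.com link out of a simple HTML document? The select() method takes a CSS selector. '.test5 a' matches every <a> inside an element with the class test5, and a['href'] reads its link.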

# Every <a> that is a descendant of an element with class "test5"
test5 = soup.select('.test5 a')
for a in test5:
    print(a['href'])
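The HTML for this project is not shown in the article. Here is a self-contained sketch with hypothetical markup: the test4 div and its android link are invented so that there is something to filter out. It also shows an equivalent lookup without CSS selectors.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a div with class "test5" wraps the www.ios.com link.
html = """
<div class="test4"><a href="www.android.com">Android</a></div>
<div class="test5"><a href="www.ios.com">iOS</a></div>
"""
soup = BeautifulSoup(html, "html.parser")

# CSS selector: any <a> nested inside an element with class "test5".
hrefs = [a["href"] for a in soup.select(".test5 a")]
print(hrefs)  # ['www.ios.com']

# Same result without selectors: find the div, then search inside it.
div = soup.find("div", class_="test5")
print([a["href"] for a in div.find_all("a", href=True)])  # ['www.ios.com']
```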

www.ios.com

Project-3) Get data from a span tag

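Get the "10 GB" text from a span tag. find() and select_one() both return the first matching tag. The selector span[title*=RAM] also requires the span's title attribute to contain the substring "RAM".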

print(soup.find("span").text)
10 GB
print(soup.select_one("span").text)
10 GB
print(soup.select_one("span[title*=RAM]").text)
10 GB
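The three lookups above agree only when the RAM span is the first span on the page. Here is a sketch with hypothetical markup in which a storage span comes first. It shows why the attribute selector is the more robust choice, and it guards against select_one() returning None.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the article does not show its HTML.
html = """
<ul class="specs">
  <li><span title="Storage">256 GB</span></li>
  <li><span title="Total RAM">10 GB</span></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# A bare lookup simply takes the first <span>, which here is the storage one.
print(soup.find("span").text)  # 256 GB

# [title*=RAM] matches a title attribute containing the substring "RAM".
ram = soup.select_one("span[title*=RAM]")
print(ram.text if ram else "not found")  # 10 GB
```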

— — — — — — — — — — — — — — — — — — — — — — — — — —

Project-4)

We will keep going…

I am a computer engineer. I have been interested in technology since 2010.
