Beautiful Soup is a Python package for getting data out of HTML, XML, and other markup documents. We cannot use this package on its own to get data from JavaScript-rendered or dynamically loading pages. It only parses the markup you give it (typically the raw contents fetched from a URL) and then stops; it does not run any scripts.
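To make this concrete, here is a minimal sketch of parsing markup that is already in memory. The sample HTML, the example.com URLs, and the variable names are made up for illustration; in a real script the string would come from a file or an HTTP client.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup. Beautiful Soup only sees this static text,
# so anything a browser would add later via JavaScript is not available.
html_doc = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <p class="intro">Hello, <b>soup</b>!</p>
    <a href="https://example.com/one">One</a>
    <a href="https://example.com/two">Two</a>
  </body>
</html>
"""

# Build the parse tree with the parser from Python's standard library.
soup = BeautifulSoup(html_doc, "html.parser")

# Text of the <title> tag.
page_title = soup.title.string

# Text of the first <p class="intro">, with nested tags flattened.
intro_text = soup.find("p", class_="intro").get_text()

# The href attribute of every <a> tag, in document order.
links = [a["href"] for a in soup.find_all("a")]

print(page_title)
print(intro_text)
print(links)
```

Everything extracted here was present in the original string. If a page fills in its content with JavaScript after loading, that content will simply be missing from the tree.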
Installing Beautiful Soup in your Python environment is easy. Just run pip install bs4. (On PyPI, bs4 is a placeholder package that pulls in beautifulsoup4, which is the actual package name.)
pip install bs4
Beautiful Soup supports the HTML parser included in Python’s standard library, but it also supports a number of third-party Python parsers. One of them is the lxml parser; if you want to use it, you have to install it separately. There are four parser options in total. Let's summarize the advantages and disadvantages of each one.
BeautifulSoup(markup, "html.parser")
Advantages: Batteries included, Decent speed. Disadvantages: Not as fast as lxml, less lenient than html5lib
BeautifulSoup(markup, "lxml")
Advantages: Very fast, Lenient. Disadvantages: External C dependency
BeautifulSoup(markup, "xml")
Advantages: Very fast, The only currently supported XML parser. Disadvantages: External C dependency
BeautifulSoup(markup, "html5lib")
Advantages: Extremely lenient, parses pages the same way a web browser does, creates valid HTML5. Disadvantages: Very slow, External Python dependency
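The difference in leniency is easiest to see by feeding the same invalid markup to each HTML parser. The sketch below does that for whichever parsers happen to be installed; the helper name parse_with_each is hypothetical, and the exact output depends on your parser versions, so none is shown here.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def parse_with_each(markup, parsers=("html.parser", "lxml", "html5lib")):
    """Parse the same markup with each parser and return the serialized trees.

    Parsers that are not installed map to None instead of raising.
    """
    results = {}
    for name in parsers:
        try:
            results[name] = str(BeautifulSoup(markup, name))
        except FeatureNotFound:
            # Beautiful Soup raises this when the requested parser is missing.
            results[name] = None
    return results


# Invalid markup: a closing </p> with no opening <p>, and an unclosed <a>.
results = parse_with_each("<a></p>")

for name, output in results.items():
    shown = output if output is not None else "not installed"
    print(f"{name:12} -> {shown}")
```

Because the parsers repair broken markup differently, the same script can give different results on different machines. Naming the parser explicitly in the BeautifulSoup call, as in the lines above, keeps the behaviour predictable.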
— — — — — — — — — — — — — — — — — — — — — — — — — —