Data Acquisition 2 | Beautifulsoup

Series: Data Acquisition

To begin using Beautifulsoup, we have to install it first.

$ pip install beautifulsoup4

Then we import the package at the top of our Python file:

from bs4 import BeautifulSoup

2. Create the Soup

We can create the soup from a local HTML file,

with open(<path of the file>) as f:    
    text = f.read()    
    soup = BeautifulSoup(text, 'html.parser')
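
As a runnable sketch of the local-file case: the example below first writes a small HTML file to a temporary directory (the file name `page.html` and its contents are made up for illustration), then opens it with an explicit encoding and hands the open file to BeautifulSoup, which accepts a file handle as well as a string.

```python
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup

# Write a tiny HTML file so the example runs anywhere;
# in practice the file would already exist on disk.
html = "<html><head><title>Local page</title></head><body><p>Hello</p></body></html>"

with tempfile.TemporaryDirectory() as tmp_dir:
    page_path = Path(tmp_dir) / "page.html"  # hypothetical file name
    page_path.write_text(html, encoding="utf-8")

    # An explicit encoding avoids depending on the platform default.
    with open(page_path, encoding="utf-8") as f:
        soup = BeautifulSoup(f, "html.parser")  # parsed while the file is still open

print(soup.title.string)  # Local page
```

The soup is fully built inside the `with` block, so it stays usable after the file is closed.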

or we can create the soup from an online webpage,

import requests

reqs = requests.get(<URL>)
reqs.encoding = 'utf-8'   # this can be changed to match the page's encoding
soup = BeautifulSoup(reqs.text, 'html.parser')
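
The download step can also be wrapped in a small function. This is only a sketch: `fetch_soup` is a hypothetical helper name, and the `timeout` value and the default `utf-8` encoding are assumptions to adjust for the site you are reading.

```python
import requests
from bs4 import BeautifulSoup


def fetch_soup(url, encoding="utf-8", timeout=10):
    """Download `url` and return it as a BeautifulSoup object."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()   # fail on 4xx/5xx instead of parsing an error page
    response.encoding = encoding  # override the encoding that requests guessed
    return BeautifulSoup(response.text, "html.parser")


# Example call (needs network access, so it is left commented out):
# soup = fetch_soup("https://example.com")
# print(soup.title.string)
```

Calling `raise_for_status()` before parsing means a failed request surfaces as an exception rather than as a soup built from an error page.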

3. Play with the Soup

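soup.prettify()      # return the whole document as an indented string

soup.title           # the first <title> tag in the document

soup.title.string    # the text inside the tag (None if the tag has more than one child)

soup.title.text      # all the text inside the tag, joined together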
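
To see how `.string` and `.text` differ, here is a small sketch on an inline HTML string (the markup is made up for illustration). For a tag with a single text child the two agree; for a tag with mixed content, `.string` is `None` while `.text` joins all the pieces.

```python
from bs4 import BeautifulSoup

html = """
<html>
  <head><title>Soup Demo</title></head>
  <body><p id="intro">Hello, <b>world</b>!</p></body>
</html>
"""
soup = BeautifulSoup(html, "html.parser")

print(soup.title)         # <title>Soup Demo</title>
print(soup.title.string)  # Soup Demo
print(soup.title.text)    # Soup Demo

intro = soup.find(id="intro")
print(intro.string)       # None, because <p> holds text plus a <b> tag
print(intro.text)         # Hello, world!
```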

soup.find_all('a')

for link in soup.find_all('a'):
    print(link.get('href'))
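
A short sketch of the link loop on made-up HTML: `link.get('href')` returns `None` for an `<a>` tag without an `href`, so passing `href=True` to `find_all` is one way to skip those tags.

```python
from bs4 import BeautifulSoup

html = """
<ul>
  <li><a href="https://example.com/a">First</a></li>
  <li><a href="/relative/b">Second</a></li>
  <li><a name="anchor-only">No href here</a></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# .get() returns None when the attribute is missing.
hrefs = [link.get("href") for link in soup.find_all("a")]
print(hrefs)  # ['https://example.com/a', '/relative/b', None]

# href=True keeps only the <a> tags that actually have an href attribute.
with_href = [link["href"] for link in soup.find_all("a", href=True)]
print(with_href)  # ['https://example.com/a', '/relative/b']
```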

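tag.name         # the tag's name, e.g. 'a'

tag.text         # all the text inside the tag

tag.attrs        # a dict of the tag's attributes

tag['id']        # the value of the id attribute; raises KeyError if it is missing

tag.get('id')    # the same value, but None if the attribute is missing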
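
The tag attributes above can be tried on a single made-up `<a>` tag. The sketch below also shows the practical difference between `tag['...']` and `tag.get('...')` when the attribute is absent.

```python
from bs4 import BeautifulSoup

html = '<a id="home" class="nav primary" href="/index.html">Home</a>'
soup = BeautifulSoup(html, "html.parser")
tag = soup.a

print(tag.name)          # a
print(tag.text)          # Home
print(tag.attrs)         # a dict; multi-valued attributes such as class come back as a list
print(tag["id"])         # home
print(tag.get("title"))  # None, the tag has no title attribute

# Square-bracket access raises KeyError for a missing attribute.
try:
    tag["title"]
except KeyError:
    missing_raises = True
else:
    missing_raises = False
print(missing_raises)    # True
```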

soup.find_all('img')

for img in soup.find_all('img'):
    print(img.get('src'))

tag.contents    # a list of the tag's direct children

tag.children    # an iterator over the same direct children
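
A minimal sketch of `contents` versus `children` on a made-up list: both cover the same direct children, but `contents` is a list you can index, while `children` is meant to be looped over.

```python
from bs4 import BeautifulSoup

html = "<ul><li>one</li><li>two</li><li>three</li></ul>"
soup = BeautifulSoup(html, "html.parser")
tag = soup.ul

# .contents is a plain list, so len() and indexing work.
print(len(tag.contents))  # 3
print(tag.contents[0])    # <li>one</li>

# .children is an iterator over the same elements.
items = [child.text for child in tag.children]
print(items)              # ['one', 'two', 'three']
```

Note that whitespace between tags also counts as a child (a text node), so nicely indented HTML will give more children than tags.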

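soup.find(<attr>=<value>)    # the first tag whose attribute <attr> equals <value>

soup.find('a')               # the first <a> tag, or None if there is none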

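tag.next_element        # whatever was parsed immediately after this tag

tag.previous_element    # whatever was parsed immediately before this tag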
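
Element navigation follows parse order, not the tree's sibling order. As a sketch on made-up HTML: the element after an `<a>` tag is the string inside it, and the element before it (`previous_element`, the counterpart of `next_element` in the bs4 documentation) is its enclosing `<p>`. `next_sibling` is shown only for contrast.

```python
from bs4 import BeautifulSoup

html = "<p><a>link</a><b>bold</b></p>"
soup = BeautifulSoup(html, "html.parser")
tag = soup.a

# The next thing parsed after the opening <a> is its own text.
print(tag.next_element)           # link

# The thing parsed just before <a> is the enclosing <p> tag.
print(tag.previous_element.name)  # p

# By contrast, next_sibling stays on the same level of the tree.
print(tag.next_sibling)           # <b>bold</b>
```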

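soup.find_all(["a", "b"])          # every <a> and <b> tag

soup.find('a', id=<ID>)            # the first <a> tag with the given id

soup.find('a', class_=<CLASS>)     # the first <a> tag with the given CSS class

soup.find_all("a", limit=<n>)      # at most <n> <a> tags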
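
The search filters can be tried together on one made-up snippet. This is a sketch only; the ids, classes and URLs are invented for illustration.

```python
from bs4 import BeautifulSoup

html = """
<div>
  <a id="first" class="nav" href="/one">One</a>
  <a class="nav external" href="/two">Two</a>
  <b>Bold</b>
  <a href="/three">Three</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# A list of names matches any of them, in document order.
names = [t.name for t in soup.find_all(["a", "b"])]
print(names)         # ['a', 'a', 'b', 'a']

# Filter by id.
print(soup.find("a", id="first")["href"])  # /one

# class_ matches a tag if any one of its classes equals the value.
nav_links = [t["href"] for t in soup.find_all("a", class_="nav")]
print(nav_links)     # ['/one', '/two']

# limit stops the search after n matches.
first_two = soup.find_all("a", limit=2)
print(len(first_two))  # 2

# find() returns None when nothing matches; find_all() returns an empty list.
print(soup.find("a", id="missing"))  # None
```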