Posted 2021-12-20 | Updated 2021-12-20 | 2 minutes read (about 233 words)

Python Web Scraping and Data Analysis

BeautifulSoup

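from bs4 import BeautifulSoup

# Read the local HTML file as bytes and decode it as UTF-8
with open("./xxx.html", "rb") as file:
    html = file.read().decode("utf-8")
# Parse the markup with Python's built-in HTML parser
bs = BeautifulSoup(html, "html.parser")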
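
To try the parsing step without a file on disk, the same call can be made on an inline string. This is only a minimal sketch: the markup below is a made-up sample standing in for the contents of ./xxx.html, not the page the post was written against.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, standing in for the contents of ./xxx.html
html = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <a class="mnav" href="http://example.com/news" name="news">News</a>
  </body>
</html>
"""

# "html.parser" ships with Python, so no extra parser needs to be installed
bs = BeautifulSoup(html, "html.parser")
print(type(bs).__name__)  # BeautifulSoup
```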

Tag and its contents: accessing a tag by name returns the first match it finds

print(bs.title)

Only the content, without the tag

print(bs.title.string)
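
The difference between the two calls is easier to see side by side. A small sketch, assuming a made-up one-line document rather than the original page:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup
bs = BeautifulSoup("<html><head><title>Sample page</title></head></html>", "html.parser")

tag = bs.title           # the first <title> tag, with its contents
text = bs.title.string   # only the text inside the tag

print(tag)   # <title>Sample page</title>
print(text)  # Sample page
print(type(tag).__name__, type(text).__name__)  # Tag NavigableString
```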

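bs.a.attrs gets all attributes of the a tag and returns them as a dictionary
bs.a gets the first a tag together with its contents
print(bs.a.string)  # a Comment is a special kind of NavigableString; the printed content omits the comment markers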
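
A short sketch of both points, assuming a hypothetical a tag whose only child is an HTML comment. Note that multi-valued attributes such as class come back as a list.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical sample markup: the link text is wrapped in an HTML comment
html = '<a class="mnav" href="http://example.com/news" name="news"><!--News--></a>'
bs = BeautifulSoup(html, "html.parser")

attrs = bs.a.attrs
print(attrs)  # {'class': ['mnav'], 'href': 'http://example.com/news', 'name': 'news'}

content = bs.a.string
print(content)                       # News
print(isinstance(content, Comment))  # True
```

Because the comment markers are dropped on output, checking the type is the reliable way to tell a comment from ordinary text.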

3. BeautifulSoup represents the whole document

print(bs.attrs)
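
Since the BeautifulSoup object stands for the document rather than a real tag, it carries no attributes of its own. A quick sketch with made-up markup:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup
bs = BeautifulSoup("<html><head><title>Sample page</title></head></html>", "html.parser")

doc_name = bs.name
doc_attrs = bs.attrs
print(doc_name)   # [document]
print(doc_attrs)  # {}
```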

Traversal
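
The post leaves this section empty. As one possible illustration, the sketch below walks a tag's direct children with .contents and .children and steps back up with .parent. The markup is a made-up sample, and which traversal attributes the author had in mind is an assumption.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup (no whitespace between tags, so there are no stray text nodes)
html = '<div id="u1"><a>News</a><a>Map</a></div>'
bs = BeautifulSoup(html, "html.parser")

# .contents returns the direct children as a list
children = bs.div.contents
print([child.name for child in children])  # ['a', 'a']

# .children iterates over the same direct children
texts = [child.string for child in bs.div.children]
print(texts)  # ['News', 'Map']

# .parent steps back up the tree
parent_name = bs.a.parent.name
print(parent_name)  # div
```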

Regular expressions

bs.find_all(re.compile("a"))

find_all()

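import re

# String filter: matches tags whose name is exactly "a"
t_list=bs.find_all("a")
# Regex filter: matches every tag whose name contains "a"
t_list=bs.find_all(re.compile("a"))

# Function filter: keeps the tags for which the function returns True
def name_is_exists(tag):
    return tag.has_attr("name")
t_list=bs.find_all(name_is_exists)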
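
The three kinds of filter give different results on the same document, which a small run makes visible. This is a sketch on a made-up page; the regular expression is applied to tag names with search semantics, so head matches as well.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup
html = (
    '<html><head><title>Sample page</title></head><body>'
    '<a class="mnav" href="http://example.com/news" name="news">News</a>'
    '<a class="bri" href="http://example.com/more">More</a>'
    '</body></html>'
)
bs = BeautifulSoup(html, "html.parser")

# String filter: exact tag name
by_name = [t.name for t in bs.find_all("a")]
print(by_name)  # ['a', 'a']

# Regex filter: any tag name containing "a"
by_regex = [t.name for t in bs.find_all(re.compile("a"))]
print(by_regex)  # ['head', 'a', 'a']

# Function filter: only tags that carry a name attribute
def name_is_exists(tag):
    return tag.has_attr("name")

by_func = [t.string for t in bs.find_all(name_is_exists)]
print(by_func)  # ['News']
```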

CSS selectors

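print(bs.select('title'))  # find by tag name
print(bs.select(".mnav"))  # find by class name
print(bs.select("#u1"))  # find by id
print(bs.select("a[class='bri']"))  # find by attribute
print(bs.select("head > title"))  # find a direct child tag
print(bs.select(".mnav ~ .bri"))  # find sibling tags that come later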
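
The selectors above can be checked against a small document. The sketch below reuses the class and id names from the post (mnav, bri, u1) on made-up markup; select() always returns a list, even for a single match.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup built around the names used in the selectors
html = (
    '<html><head><title>Sample page</title></head><body>'
    '<div id="u1">'
    '<a class="mnav" href="http://example.com/news">News</a>'
    '<a class="mnav" href="http://example.com/map">Map</a>'
    '<a class="bri" href="http://example.com/more">More</a>'
    '</div></body></html>'
)
bs = BeautifulSoup(html, "html.parser")

def texts(selector):
    # Helper for this example only: the text of every match
    return [tag.get_text() for tag in bs.select(selector)]

print(texts("title"))           # ['Sample page']
print(texts(".mnav"))           # ['News', 'Map']
print(len(bs.select("#u1")))    # 1
print(texts("a[class='bri']"))  # ['More']
print(texts("head > title"))    # ['Sample page']
print(texts(".mnav ~ .bri"))    # ['More']
```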

Author

vague huang
