Python Web Crawler Programming KC23.pptx

Finding HTML Elements with BeautifulSoup
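Finding elements in a document is an important means of scraping information from web pages. BeautifulSoup provides a family of methods for finding elements, among them the powerful fi…
from bs4 import BeautifulSoup
# doc is assumed to hold the same HTML string used in the examples that follow
soup=BeautifulSoup(doc,"lxml")
# find_all returns every element that matches the tag name
tags=soup.find_all("a")
for tag in tags:
    print(tag)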

The program finds three <a> elements:
<a class="sister" href="" id="link1">Elsie</a>
<a class="sister" href="" id="link2">Lacie</a>
<a class="sister" href="" id="link3">Tillie</a>
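
The matched elements can be used for more than printing. The following is a minimal sketch, assuming a shortened, hypothetical version of the document and the built-in "html.parser" in place of "lxml" (swapped in only so the sketch needs no extra package). It reads the text and the id attribute of each element returned by find_all.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical document used only for this sketch.
doc = '''
<html><body>
<a href="" class="sister" id="link1">Elsie</a>,
<a href="" class="sister" id="link2">Lacie</a> and
<a href="" class="sister" id="link3">Tillie</a>
</body></html>
'''

# "html.parser" is an assumption made here to avoid depending on lxml.
soup = BeautifulSoup(doc, "html.parser")

# find_all returns a list-like collection of Tag objects.
tags = soup.find_all("a")

# get_text() gives the text inside an element; tag["id"] reads an attribute.
names = [tag.get_text() for tag in tags]
ids = [tag["id"] for tag in tags]

print(len(tags), names, ids)
```

Because the result behaves like a list, it can be indexed, sliced, or passed to len() like any other Python sequence.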
Example 2-3-3: Find the first <a> element in the document
from bs4 import BeautifulSoup
doc='''
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">
Once upon a time there were three little sisters; and their names were
<a href="" class="sister" id="link1">Elsie</a>,
<a href="" class="sister" id="link2">Lacie</a> and
<a href="" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.
</p>
<p class="story">...</p>
</body>
</html>
'''
soup=BeautifulSoup(doc,"lxml")
# find returns only the first matching element
tag=soup.find("a")
print(tag)

The program finds the first <a> element:
<a class="sister" href="" id="link1">Elsie</a>
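
The relationship between find and find_all can be sketched as follows. This is an illustration under assumptions: the document is a shortened, hypothetical one, and the built-in "html.parser" replaces "lxml" so that no extra package is needed. It shows that find yields the same element as the first result of find_all, and that find returns None when nothing matches.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical document used only for this sketch.
doc = '''
<html><body>
<a href="" class="sister" id="link1">Elsie</a>,
<a href="" class="sister" id="link2">Lacie</a>
</body></html>
'''

# "html.parser" is an assumption made here to avoid depending on lxml.
soup = BeautifulSoup(doc, "html.parser")

first = soup.find("a")                  # a single Tag
limited = soup.find_all("a", limit=1)   # a one-element list-like result
missing = soup.find("table")            # no <table> in the document

print(first["id"], limited[0]["id"], missing)

# Guard against None before using the result of find.
if missing is None:
    print("no <table> element found")
```

Checking for None before touching attributes avoids an error when a page lacks the expected element, which is common when scraping real sites.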
Example 2-3-4: Find the element with class="title" in the document
from bs4 import BeautifulSoup
doc='''
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">
Once upon a time there were three little sisters; and their names were
<a href="" class="sister" id="link1">Elsie</a>,
<a href="" class="sister" id="link2">Lacie</a> and
<a href="" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.
</p>
<p class="story">...</p>
</body>
</html>
'''
soup=BeautifulSoup(doc,"lxml")
# attrs restricts the match to <p> elements whose class is "title"
tag=soup.find("p",attrs={"class":"title"})
print(tag)

The program finds the element with class="title":
<p class="title"><b>The Dormouse's story</b></p>
Clearly, using
tag=soup.find("p")
also finds this element, because it is the first <p> element in the document.
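
The attribute filter above can also be written with the class_ keyword argument that Beautiful Soup accepts. The following is a minimal sketch, assuming a shortened, hypothetical document and the built-in "html.parser" in place of "lxml" (chosen only to avoid an extra dependency). It compares the attrs form, the class_ form, and a plain search by tag name.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical document used only for this sketch.
doc = '''
<html><body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters.</p>
</body></html>
'''

# "html.parser" is an assumption made here to avoid depending on lxml.
soup = BeautifulSoup(doc, "html.parser")

by_attrs = soup.find("p", attrs={"class": "title"})   # dictionary form
by_keyword = soup.find("p", class_="title")           # keyword form
first_p = soup.find("p")                              # first <p>, no filter

# All three searches land on the same element in this document.
print(by_attrs.get_text(), by_keyword.get_text(), first_p.get_text())

# class is a multi-valued attribute, so it is returned as a list.
print(by_attrs["class"])
```

The keyword is spelled class_ because class is a reserved word in Python. The unfiltered search matches here only because the title paragraph comes first; on a page where it does not, the class filter is what makes the lookup reliable.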
Example 2-3-5: Find the elements with class="sister" in the document
from
