Graduation Thesis: Design of a Python-Based Web Crawler


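Python-Based Web Crawler
Abstract
As computer technology continues to develop, new languages keep emerging, and Python and HTML are among the most prominent. Compared with earlier widely adopted high-level languages such as Java and C, Python offers more practical modules and libraries; it gives up some low-level control, but it is more convenient for developing small projects. HTML, meanwhile, is now used almost universally for website front ends. As a markup language combined with CSS, it has enriched the content and presentation of web pages and, in a sense, has also encouraged the development of more user-friendly e-commerce systems. The web crawler presented in this thesis is written in Python. It fetches and processes HTML and visualizes the resulting data in order to monitor the illegal wildlife trade, which has been gradually moving from offline to online and, as e-commerce has developed, has become simpler, more widespread, and harder to trace.
This thesis first introduces the development of computer languages, in particular the advantages, basic concepts, and performance of Python and HTML. It then focuses on the design and implementation of a crawler that can detect illegal wildlife trade on the web. The program consists of three main modules: URL parsing, HTML fetching, and local output. It was developed on OS X with PyCharm CE and relies mainly on Python's urllib2 and BeautifulSoup modules.
The final program can crawl a specified website for specified keywords, extract the specified content, and save it locally for tracking and monitoring.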
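As a rough illustration of the three modules named in this abstract (URL parsing, HTML fetching, local output), the following minimal sketch assumes Python 3. It therefore substitutes `urllib.request` and `urllib.parse` for the Python 2 `urllib2` module, and the standard-library `html.parser` for BeautifulSoup so that it runs without third-party packages. All function names, the tab-separated output format, and the idea of matching keywords against link text are hypothetical and are not taken from the thesis.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import Request, urlopen


def normalize_url(base_url, href):
    """URL parsing: resolve href against the page URL; keep only http(s) links."""
    absolute = urljoin(base_url, href)
    return absolute if urlparse(absolute).scheme in ("http", "https") else None


def fetch_html(url, timeout=10):
    """HTML fetching: download one page and decode it (performs a network request)."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def find_listings(html, base_url, keywords):
    """Return (text, absolute_url) for links whose text contains any keyword."""
    collector = LinkCollector()
    collector.feed(html)
    hits = []
    for href, text in collector.links:
        url = normalize_url(base_url, href)
        if url and any(k.lower() in text.lower() for k in keywords):
            hits.append((text, url))
    return hits


def save_listings(hits, path):
    """Local output: write one tab-separated line per matching link."""
    with open(path, "w", encoding="utf-8") as f:
        for text, url in hits:
            f.write(f"{text}\t{url}\n")
```

Chaining the steps would look like `save_listings(find_listings(fetch_html(url), url, keywords), "hits.tsv")`, where `url`, `keywords`, and the file name are placeholders chosen by the user. Keeping the network call in its own function lets the parsing and output steps be exercised on saved HTML without touching a live site.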

Keywords: Python, HTML, web crawler, illegal wildlife trade
Python-Based "Illegal wildlife trade" Spider
Electronics & Information Technology Program 11-1
Haozhi Zhu
Supervisor Rui Zhao
Abstract
With the continuous development of computer technology, new languages appear one after another, and Python and HTML are among the best of them. Compared with high-level languages that became popular earlier (such as Java and C), Python has more practical modules and libraries; although this comes at the expense of low-level control, it makes Python more convenient for developing small-scale projects. In addition, HTML has been widely adopted for web front ends. Its nature as a markup language, combined with CSS, has enriched the content and form of web pages and, in a sense, has also promoted the development of more user-friendly e-commerce systems. The web crawler in this paper is written in Python. It crawls HTML, processes it, and visualizes the data in order to monitor the illegal wildlife trade, which is gradually shifting from offline to online and, with the development of e-commerce, is becoming simpler, more common, and harder to trace.
This paper first introduces the development of computer languages, especially the advantages, basic concepts, and performance of Python and HTML. It then focuses on the design and implementation of a crawler that can detect illegal wildlife trade on the web. The program includes three modules: URL parsing, HTML fetching, and data visualization output. The development environment and tools are OS X and PyCharm CE, and the program mainly calls Python's urllib2, re, and os modules.
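The abstract does not say how the `re` and `os` modules are used. One plausible division of labour, offered here only as an assumption, is that `re` handles keyword matching in the fetched page text and `os` handles the local file layout. The helper names, the regex-based tag stripping, and the one-folder-per-site layout below are all hypothetical.

```python
import os
import re


def keyword_pattern(keywords):
    """Build one case-insensitive pattern that matches any of the keywords."""
    if not keywords:
        raise ValueError("at least one keyword is required")
    return re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)


def strip_tags(html):
    """Crude regex tag removal; adequate for a sketch, fragile on malformed HTML."""
    text = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", html)
    text = re.sub(r"(?s)<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def count_hits(html, keywords):
    """Count occurrences of each keyword (lower-cased) in the visible page text."""
    pattern = keyword_pattern(keywords)
    counts = {}
    for match in pattern.finditer(strip_tags(html)):
        key = match.group(0).lower()
        counts[key] = counts.get(key, 0) + 1
    return counts


def output_path(root, site, filename):
    """Create a per-site folder under root and return the path for filename."""
    folder = os.path.join(root, re.sub(r"[^A-Za-z0-9._-]", "_", site))
    os.makedirs(folder, exist_ok=True)
    return os.path.join(folder, filename)
```

Counts per keyword of this kind could feed a later visualization step, while `output_path` keeps pages from different sites in separate folders. A real crawler would more likely rely on an HTML parser than on regular expressions for tag removal.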
The final
