Graduation Thesis: Design and Implementation of a Redis-Based Second-Hand Housing Data Crawling System

Published 2021-02-07 in Fujian

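Graduation Design Report, Class of 2019

Title: Design and Implementation of a Redis-Based Second-Hand Housing Data Crawling System
School: School of Information Science and Engineering
Major and Class:
Student Name:
Student ID:
Supervisor:
Supervisor's Title:

June 5, 2019

Abstract

With rapid socioeconomic development and accelerating urbanization, real estate transactions have become increasingly active, and the second-hand housing market in particular has remained busy. A large number of online second-hand housing trading websites have appeared, but the quality of their listings is uneven and they match individual users' needs imprecisely, so listings cannot be targeted accurately. A second-hand housing recommendation system is therefore needed to meet users' needs, and the first requirement of such a system is a sufficiently large body of listing data. This graduation project implements a second-hand housing data crawling system that collects listing data to provide data support for a recommendation system.

The system exploits the advantages of multi-threaded, multi-node crawling to design a Redis-based distributed focused crawler. It is developed with the Scrapy crawler framework, uses XPath to parse downloaded web pages and extract their content, uses Redis for distribution, stores the extracted data in MongoDB, and uses Django to build a visual interface that presents the crawl results in a user-friendly way. On this basis, a distributed crawler system for second-hand housing data on Lianjia (链家网) was designed and implemented.

Development and verification show that the system can crawl Lianjia second-hand housing listings in a distributed manner, can provide data support for a listing recommendation system, and can serve as a data source for data analysts studying second-hand housing.

Keywords: second-hand housing; distributed crawler; Scrapy; visualization

Title: Design and Implementation of a Second-Hand Housing Data Crawling System

Abstract

With the rapid development of the economy and the acceleration of urbanization, real estate transactions have become increasingly active, and the second-hand housing market in particular remains busy. Many online second-hand housing trading websites have emerged, but the quality of the listings they provide is uneven and they do not match the needs of individual users precisely, so listings cannot be targeted accurately. A second-hand housing recommendation system is needed to meet users' needs, and the first step in building one is to obtain enough listing information. This project therefore implements a second-hand housing data crawling system to crawl listing data and provide data support for a recommendation system.

This system uses the advantages of multi-threaded, multi-node crawling to design a distributed focused crawler based on Redis. It is developed with the Scrapy crawler framework: XPath is used to parse the downloaded web pages, Redis provides distribution, MongoDB stores the extracted data, and Django is used to develop a visual interface that displays the crawl results. A distributed crawler system for Lianjia second-hand housing data was designed and implemented.

After development and verification, this system can complete the distributed crawling of home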