This project forked from threezh1/sitecopy

sitecopy is a tool that facilitates personal website backup and network data collection

Python 100.00%

sitecopy's Introduction

SiteCopy

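Introduction

Website copying, also called website backup, means using a tool to save everything on a web page. This covers more than a single HTML page: the CSS, JS, and static files referenced in the page source are saved as well, so that the whole site can be browsed locally. Similar tools exist online, but they are not ideal to use, so I wrote this Python script to make it easier to back up personal websites and to collect material from the web.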
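As a rough illustration of what "saving the CSS, JS, and static files referenced in the page source" involves, the sketch below lists the resource URLs found in one HTML document using only the standard library. It is not taken from sitecopy.py: the names `AssetCollector` and `find_assets`, the choice of tags, and the example.com URLs are all assumptions made for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AssetCollector(HTMLParser):
    """Collect URLs of stylesheets, scripts, and images referenced by a page."""

    # (tag, attribute) pairs that usually point at static resources.
    # This set is an assumption; a real copier would likely cover more cases.
    ASSET_ATTRS = {("link", "href"), ("script", "src"), ("img", "src")}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.assets = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value and (tag, name) in self.ASSET_ATTRS:
                # Resolve relative references against the page URL.
                self.assets.append(urljoin(self.base_url, value))


def find_assets(html, base_url):
    """Return absolute URLs of the resources referenced in html."""
    collector = AssetCollector(base_url)
    collector.feed(html)
    return collector.assets


page = '<link href="/css/site.css"><script src="js/app.js"></script><img src="logo.png">'
for url in find_assets(page, "http://example.com/blog/"):
    print(url)
```

Each URL found this way would then be downloaded and stored under a matching local path, which is the part that makes the copy browsable offline.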

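Development notes for SiteCopy: "论如何优雅的复制一个网站的所有页面" (How to Elegantly Copy All the Pages of a Website)

Copying any website on the internet requires prior authorization. Users who use this tool in ways that harm network security bear the consequences themselves; the author is not responsible. This is hereby declared.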

Usage

Python version: 3.7

Install dependencies: pip3 install -r requirements.txt

  • Copy a single page

python sitecopy.py -u "http://www.threezh1.com"

  • Copy an entire website

python sitecopy.py -u "http://www.threezh1.com" -e

  • Copy multiple pages

python sitecopy.py -s "site.txt"

  • Copy multiple websites

python sitecopy.py -s "site.txt" -e

Set the number of link-crawling loop iterations: -d (default: 200)

Set the number of threads: -t (default: 30)
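One way to picture how a loop-count limit and a thread count can interact is a round-based crawl: each round fetches every newly discovered link in parallel, and the crawl stops when no new links appear or the round limit is reached. The sketch below is only one possible reading of these two options, not the logic of sitecopy.py. The function `crawl`, its parameters, and the caller-supplied `fetch_links` are hypothetical, and an in-memory dictionary stands in for real HTTP requests.

```python
from concurrent.futures import ThreadPoolExecutor


def crawl(start_url, fetch_links, max_rounds=200, threads=30):
    """Round-based crawl: every round fetches the newly found URLs in parallel.

    fetch_links is a caller-supplied function (hypothetical here) that
    downloads one URL and returns the same-site links found on it.
    Returns the set of all URLs discovered.
    """
    seen = {start_url}
    frontier = [start_url]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        for _ in range(max_rounds):
            if not frontier:
                break  # nothing new was discovered, so the crawl is complete
            next_frontier = []
            # pool.map runs fetch_links on up to `threads` URLs at once.
            for links in pool.map(fetch_links, frontier):
                for link in links:
                    if link not in seen:
                        seen.add(link)
                        next_frontier.append(link)
            frontier = next_frontier
    return seen


# A tiny in-memory "site" replaces real network access in this example.
fake_site = {"/": ["/a", "/b"], "/a": ["/c"], "/b": ["/"], "/c": []}
print(sorted(crawl("/", lambda url: fake_site.get(url, []), max_rounds=2)))
```

A smaller round limit bounds how far the crawl spreads from the start page, while the thread count only affects how many pages are fetched at the same time within a round.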

Example: crawl every page of www.threezh1.com with the link-crawling loop count set to 200 and the thread count set to 30

python sitecopy.py -u "http://www.threezh1.com" -e -d 200 -t 30
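For readers who want to see how such a command line maps onto parsed values, here is a minimal `argparse` sketch that accepts the same short flags and defaults. It is a reconstruction for illustration, not the parser used in sitecopy.py: the long option names (`--url`, `--sites`, `--entire`, `--depth`, `--threads`) and the help texts are assumptions.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the short flags shown in this README."""
    # Long option names are hypothetical; only the short flags and the
    # defaults of 200 and 30 come from the README.
    parser = argparse.ArgumentParser(prog="sitecopy.py")
    parser.add_argument("-u", "--url", help="URL of the page or site to copy")
    parser.add_argument("-s", "--sites", help="text file listing the targets")
    parser.add_argument("-e", "--entire", action="store_true",
                        help="copy the entire website instead of one page")
    parser.add_argument("-d", "--depth", type=int, default=200,
                        help="number of link-crawling loop iterations")
    parser.add_argument("-t", "--threads", type=int, default=30,
                        help="number of worker threads")
    return parser


# Parse the example command from above without touching sys.argv.
args = build_parser().parse_args(
    ["-u", "http://www.threezh1.com", "-e", "-d", "200", "-t", "30"]
)
print(args.url, args.entire, args.depth, args.threads)
```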

Website copy test

Run screenshot:

pic_11.jpg

Directory screenshot:

pic_07.jpg

Page screenshot:

pic_06.jpg

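Known issues

  1. In some cases, path replacement is applied more than once, so the page no longer displays correctly
  2. Pages cannot be saved properly when the website or image host has anti-crawling measures
  3. Network problems can prevent the script from running properly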
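Regarding the first issue, one common cause of repeated replacement is chaining several string substitutions, where a later substitution matches text produced by an earlier one. The sketch below shows a single-pass alternative built on `re.sub`. It is a suggestion under assumptions, not a fix verified against sitecopy.py: the function `rewrite_links`, the mapping, and the example.com URLs are hypothetical.

```python
import re


def rewrite_links(html, mapping):
    """Replace every key of mapping found in html with its value, in one pass."""
    if not mapping:
        return html
    # Try longer keys first so a full URL wins over a shorter path inside it.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    # re.sub scans the original text once; replaced output is never rescanned.
    return pattern.sub(lambda match: mapping[match.group(0)], html)


mapping = {
    "/static/": "static/",
    "http://example.com/static/": "static/",
}
html = '<img src="http://example.com/static/a.png"><img src="/static/b.png">'
print(rewrite_links(html, mapping))
```

With chained `str.replace` calls, replacing `/static/` first would turn the absolute URL into `http://example.comstatic/a.png`, which is the kind of broken path a single pass avoids.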

I would very much like to discuss solutions to these problems with others in the community. My email: [email protected]

Contributors

threezh1
