hhy5277 / toutiaocrawler

This project forked from haibincoder/toutiaocrawler

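A crawler for Toutiao (今日头条) that mainly scrapes keyword search results. It includes an edit-distance algorithm, singular value decomposition, and k-means clustering.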

Python 100.00%

toutiaocrawler's Introduction

ToutiaoCrawler

API example:

Updated 2018-06-05
https://toutiao.com/search_content/?offset=0&format=json&keyword=手机&autoload=true&count=20&cur_tab=1&from=search_tab

Parameters:

keyword: the search keyword
count: number of articles on this page
cur_tab: current page number
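
To make the parameters concrete, here is a minimal sketch that rebuilds the example URL from its query parameters. The helper name `build_search_url` is hypothetical and not part of the project, and the endpoint dates from 2018, so it may no longer respond. The sketch only builds the URL and sends no request.

```python
from urllib.parse import urlencode

# Endpoint taken from the example URL above; it may no longer be served.
BASE_URL = "https://toutiao.com/search_content/"


def build_search_url(keyword, offset=0, count=20, cur_tab=1):
    """Build a search URL (hypothetical helper, not part of the project)."""
    params = {
        "offset": offset,
        "format": "json",
        "keyword": keyword,      # the search keyword
        "autoload": "true",
        "count": count,          # number of articles on this page
        "cur_tab": cur_tab,
        "from": "search_tab",
    }
    # urlencode percent-encodes non-ASCII keywords as UTF-8.
    return BASE_URL + "?" + urlencode(params)


url = build_search_url("手机")
print(url)
```

Changing `offset` in steps of `count` is one plausible way to walk through further results, though the README does not confirm how paging works.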

Debugging:

Press F12, open Network/All, select the request, and inspect the data node under Preview.

Demo:

ToutiaoCrawler\ToutiaoCrawler\demo.py: here you can extract article titles, tags, and content links as needed.
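
As a rough illustration of that kind of extraction (not the contents of demo.py), the sketch below pulls titles, tags, and links out of a JSON payload with a top-level `data` list. The sample payload, the field names `title`, `tag`, and `article_url`, and the helper `extract_articles` are all assumptions; check them against what the Preview pane actually shows.

```python
import json

# Hypothetical sample shaped like the JSON under the "data" node.
# The field names are assumptions, not confirmed by the README.
SAMPLE_RESPONSE = json.dumps({
    "data": [
        {
            "title": "Example title",
            "tag": "news_tech",
            "article_url": "https://example.com/a/1",
        },
        {"cell_type": 26},  # an entry without a title, skipped below
    ]
})


def extract_articles(raw_json):
    """Return title, tag, and link for each entry that has a title."""
    payload = json.loads(raw_json)
    articles = []
    for item in payload.get("data") or []:
        if not item.get("title"):
            continue  # skip entries that are not articles
        articles.append({
            "title": item["title"],
            "tag": item.get("tag"),
            "url": item.get("article_url"),
        })
    return articles


articles = extract_articles(SAMPLE_RESPONSE)
for article in articles:
    print(article["title"], article["tag"], article["url"])
```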

Demo output and debugging example:


--------------------The following is the project code; some APIs no longer work--------------------

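  • Requires Python 3.6
  • First install the required packages; PyCharm installs them automatically when you open the project
  1. Create the database and tables with ToutiaoCrawler/toutiao.sql; configure the MySQL connection in ToutiaoCrawler/ToutiaoCrawler/Utils/Util.py
  2. Run Crawler/get_toutiao_news_byapi.py to fetch the news list [this API was developed in 2016 and parts of it no longer work]
  3. Run Crawler/get_toutiao_content_byapi.py to fetch the news content
  • (At this point the database already contains data)
  1. Run Analysis/levenshtein.py to compute edit distances
  2. Run svd/svd.py for singular value decomposition
  3. Run svd/test_kmeans.py for cluster analysis and plotting
  • If you need txt files, run Utils/list_to_txt.py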
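
For readers unfamiliar with the edit-distance step, here is a minimal sketch of the classic Levenshtein algorithm using two-row dynamic programming. It is an illustration of the technique only; the actual contents of Analysis/levenshtein.py may differ, and the function name `levenshtein` is chosen here for convenience.

```python
def levenshtein(a, b):
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    # Keep the shorter string in the inner loop to use less memory.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] is the distance between the empty prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Applied to pairs of article titles, a small distance suggests near-duplicate headlines.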
