meway24 / taobaoscrapy

This project forked from torome/taobaoscrapy

😩 Tool for Taobao/Tmall | A childhood toy, now outdated

Home Page: https://github.com/hunterhug/GoTaoBao

Batchfile 0.14% Python 99.86%

taobaoscrapy's Introduction

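Tmall/Taobao keyword product search: user guide

This project is very old and is no longer maintained.

I have started a new repository written in Golang. For more, go to https://github.com/hunterhug/GoTaoBao. Further reference: the 一只尼玛 blog on 博客园 (cnblogs).

Still runs as of 2017/6.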

A Python crawler for scraping Taobao
---------------------------------------------------------

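A crawler that scrapes the products returned by Taobao/Tmall keyword searches. It targets Python 3.4 and is already packaged for use.
It can scrape product titles, prices, sales volumes, images, and more.
To use it, run the .exe file in the exe folder, or run run.bat.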

------------------------------------------------------------

I. Project structure

-----taobaocomment
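	-------source	source code
	-------data	raw data
	-------image	the images you asked for
	-------excel	the results you asked for
	-------exe.rar	extract to get the exe folder
	-------exehelp.rar	extract to get the exehelp folder
	-------run.bat	the script you run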
	-------runhelp.bat 

II. Local environment setup

Install Python 3, then set the environment variables.

1. Install the dependencies

pip3 install -r requirement.txt

Windows users should install the libraries themselves. The program uses the following imports:

import urllib.request, urllib.parse, http.cookiejar
import os, time, re
import http.cookies
import xlsxwriter as wx
from PIL import Image
import pymysql
import socket
import json
import datetime
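
Before running the crawler, it can help to check which of these modules are actually installed. The following is a minimal sketch, not part of the project: the mapping from import names to pip package names is an assumption based on the import list above, and `missing_packages` is a hypothetical helper. The remaining imports come from the standard library and need no installation.

```python
import importlib.util

# Import name -> pip package name (assumed from the import list above).
THIRD_PARTY = {
    "xlsxwriter": "XlsxWriter",
    "PIL": "Pillow",
    "pymysql": "PyMySQL",
}


def missing_packages(modules=THIRD_PARTY):
    """Return the pip names of the modules that cannot be found."""
    return [pip_name for name, pip_name in modules.items()
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing, try: pip3 install " + " ".join(missing))
    else:
        print("All third-party modules are importable.")
```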

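If installing the modules fails, the cx_Freeze download has probably failed. Download the matching build of the packaging library from the "universal repository" (万能仓库), then run: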

pip3 install cx_Freeze-4.3.4-cp35-none-win_amd64.whl

2. Build the exe

Change to the source code folder, source, and run the build command:

python setup.py build

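Move the resulting exe.win32-3.4 folder to the project root and rename it exe. In the same way, run python setuphelp.py build to package the helper tool, move its output folder to the root, and rename it exehelp.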
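
The move-and-rename step can also be scripted. This is only a sketch under assumptions: it assumes cx_Freeze wrote its output to a build subfolder of source (for example source/build/exe.win32-3.4), and `promote_build` is a hypothetical helper, not something shipped with the project.

```python
import shutil
from pathlib import Path


def promote_build(source_dir, root_dir, target_name="exe"):
    """Move the first build/exe.* folder under source_dir to root_dir/target_name."""
    # Assumed layout: <source_dir>/build/exe.<platform>-<python version>
    candidates = sorted(Path(source_dir, "build").glob("exe.*"))
    if not candidates:
        raise FileNotFoundError("no exe.* folder found; run the build first")
    target = Path(root_dir, target_name)
    if target.exists():
        raise FileExistsError("{} already exists".format(target))
    shutil.move(str(candidates[0]), str(target))
    return target
```

Run from the project root, `promote_build("source", ".")` would produce the exe folder. After the helper build, `promote_build("source", ".", "exehelp")` would produce exehelp. Both builds may write to the same exe.* folder name, so move the first result before starting the second build.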

III. Getting started

Normal run

cd source
python mtaobao.py

or

run.bat

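Sometimes the program stops before it finishes, for example because the network drops mid-run, or because you chose to download images by mistake and, faced with tens of thousands of images, lost patience and killed the program.
Don't worry: as long as the raw data exists, everything can be recovered.
Move the raw data from data into the help folder and continue: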

cd source
python help.py

or

runhelp.bat
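
Moving the raw data before resuming can be scripted as well. This is a sketch only: the README does not say where the help folder lives, so both paths are parameters, and `stage_raw_data` is a hypothetical helper rather than part of the project.

```python
import os
import shutil
from pathlib import Path


def stage_raw_data(data_dir, help_dir):
    """Move every file in data_dir into help_dir and return the moved names."""
    os.makedirs(str(help_dir), exist_ok=True)
    moved = []
    for entry in sorted(Path(data_dir).iterdir()):
        if entry.is_file():
            shutil.move(str(entry), str(Path(help_dir) / entry.name))
            moved.append(entry.name)
    return moved
```

For example, `stage_raw_data("data", "help")` would be called from the folder that contains both directories, followed by python help.py or runhelp.bat as described above.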

IV. Demo

Don't understand something? Contact me.
Author: hunterhug
2015/11

If this project has helped you, feel free to buy me a coffee.

WeChat

Alipay


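Addendum

1. Bug fixes, 2016/7/7

See JSON.json. Taobao changed the fields in its JSON data, which caused the program to fail.

When Taobao asks for verification, fill in subcookie.txt; see the PDF for reference.

The '手机折扣' (mobile discount) field no longer works: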

Traceback (most recent call last):
  File "mtaobao.py", line 322, in <module>
    itemlist.append(item['mobileDiscount'])
KeyError: 'mobileDiscount'

The 'URL地址' (URL address) field no longer works:

Traceback (most recent call last):
  File "mtaobao.py", line 328, in <module>
    itemlist.append(item['auctionURL'])
KeyError: 'auctionURL'

Both have been fixed.

You can add more fields by consulting the JSON; extend or modify the code yourself as needed.
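
Both tracebacks above come from indexing the item dictionary directly, which raises KeyError as soon as Taobao drops a field. One way to make field extraction tolerant of such changes is sketched below. This is not the author's actual fix, which the README does not show: `extract_row` and the `FIELDS` list are hypothetical, and only 'mobileDiscount' and 'auctionURL' are key names taken from the tracebacks.

```python
# Hypothetical field list: only 'mobileDiscount' and 'auctionURL' appear in
# the tracebacks above; 'title' and 'price' are placeholder names.
FIELDS = ["title", "price", "mobileDiscount", "auctionURL"]


def extract_row(item, fields=FIELDS, default=""):
    """Return one value per field, using default where a key is absent."""
    return [item.get(key, default) for key in fields]


sample = {"title": "example product", "price": "9.90"}
print(extract_row(sample))  # ['example product', '9.90', '', '']
```

With this shape, adding a field means appending its key to the list, and a key that Taobao later removes yields an empty cell instead of stopping the run.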
