GithubHelp home page GithubHelp logo

spider-baiduindex's Issues

TypeError: string indices must be integers

Probably, because of some change in Python, I tried to run the demo code and get the following error :(

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-25-d685484c0ecf> in <module>
     12     keywords = ['爬虫', 'lol', '张艺兴', '人工智能', '华为', '武林外传']
     13     baidu_index = BaiduIndex(keywords, '2018-01-01', '2019-05-02')
---> 14     for index in baidu_index.get_index():
     15         print(index)

~/scripts/get_index.py in get_index(self)
     56                     start_date=params_data['start_date'],
     57                     end_date=params_data['end_date'],
---> 58                     keywords=params_data['keywords']
     59                 )
     60                 key = self._get_key(uniqid)

~/scripts/get_index.py in _get_encrypt_datas(self, start_date, end_date, keywords)
    107         html = self._http_get(url)
    108         datas = json.loads(html)
--> 109         uniqid = datas['data']['uniqid']
    110         encrypt_datas = []
    111         for single_data in datas['data']['userIndexes']:

TypeError: string indices must be integers

大佬你好:string indices must be integers

C:\ProgramData\Anaconda3\python.exe C:/Users/Administrator/Desktop/spider-BaiduIndex-master/new_spider_without_selenium/demo.py
Traceback (most recent call last):
File "C:/Users/Administrator/Desktop/spider-BaiduIndex-master/new_spider_without_selenium/demo.py", line 6, in
for index in baidu_index.get_index():
File "C:\Users\Administrator\Desktop\spider-BaiduIndex-master\new_spider_without_selenium\get_index.py", line 58, in get_index
keywords=params_data['keywords']
File "C:\Users\Administrator\Desktop\spider-BaiduIndex-master\new_spider_without_selenium\get_index.py", line 109, in _get_encrypt_datas
uniqid = datas['data']['uniqid']
TypeError: string indices must be integers

我一开始以为是cookie可能是找好的。然后发现您说可能缺了一个cookie。我后来请人解决了这个问题,希望大佬在某个地方给补上一些些注释哈哈。谢谢

baidu_index.py里的_all_kind变量改为参数传递

建议把baidu_index.py里的_all_kind变量改为参数传递,这样方便指定要爬的type类型。

如下:
# _all_kind = ['all', 'pc', 'wise']

def __init__(
    self,
    *,
    keywords: list,
    start_date: str,
    end_date: str,
    cookies: str,
    area=0,
    type=['all', 'pc', 'wise']
):
    self.keywords = keywords
    self.area = area
    self.start_date = start_date
    self.end_date = end_date
    self.cookies = cookies
    self._params_queue = utils.get_params_queue(start_date, end_date, keywords)
    self._all_kind = type

TypeError: string indices must be integers

in demo.py :
from get_index import BaiduIndex

if name == "main":
keywords = ['比特币']
baidu_index = BaiduIndex(keywords, '2013-04-01', '2014-03-31')
for index in baidu_index.get_index():
print(index)

Traceback (most recent call last):
File "e:/MyProjects/spider-BaiduIndex/new_spider_without_selenium/demo.py", line 6, in
for index in baidu_index.get_index():
File "e:\MyProjects\spider-BaiduIndex\new_spider_without_selenium\get_index.py", line 58, in get_index
keywords=params_data['keywords']
File "e:\MyProjects\spider-BaiduIndex\new_spider_without_selenium\get_index.py", line 109, in _get_encrypt_datas
uniqid = datas['data']['uniqid']
TypeError: string indices must be integers

How can I modify the code?

发现一个小BUG

formated_data['index'] = data['all']['data'][i]
这行代码的data后面应该是用kind,而不是写死的all

另外,想请教下你是怎么解析出uniqid和数值data之间的关系的?之前更新后就发现了这个API ,只是不明白之间的解析规则,这个可以跟我大概讲一下吗?

截图区域大了点

测试了下
im = im.crop((0,4,all_width,16))
这样刚刚好,截大了tesseract识别不好
再把tesseract进行样本训练,完美识别

今天在·使用的时候似乎开始有点bug?

Traceback (most recent call last):
File "E:\PyCharm 2018.3\helpers\pydev\pydevd.py", line 1689, in
main()
File "E:\PyCharm 2018.3\helpers\pydev\pydevd.py", line 1683, in main
globals = debugger.run(setup['file'], None, None, is_module)
File "E:\PyCharm 2018.3\helpers\pydev\pydevd.py", line 1083, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "E:\PyCharm 2018.3\helpers\pydev_pydev_imps_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "C:/Users/Administrator/Desktop/spider-BaiduIndex-master/spider-BaiduIndex-master/new_spider_without_selenium/demo.py", line 6, in
for index in baidu_index.get_index():
File "C:\Users\Administrator\Desktop\spider-BaiduIndex-master\spider-BaiduIndex-master\new_spider_without_selenium\get_index.py", line 58, in get_index
keywords=params_data['keywords']
File "C:\Users\Administrator\Desktop\spider-BaiduIndex-master\spider-BaiduIndex-master\new_spider_without_selenium\get_index.py", line 109, in _get_encrypt_datas
uniqid = datas['data']['uniqid']
TypeError: string indices must be integers

你好

请问有办法录个视频教程吗。。。本人非程序员,但能看得懂一些,但具体操作起来可能比较困难。如果能获得帮助,真是太感谢。。

想请教一下大神的媒体指数和咨询指数爬取

新建 Microsoft Word 文档.docx
上面是代码
为什么会出现一下的错误呢

Traceback (most recent call last):
File "C:/Users/Administrator/Desktop/spider-BaiduIndex-master/new_spider_without_selenium/news_feed.py", line 7, in
for index in baidu_index.get_index():
File "C:\Users\Administrator\Desktop\spider-BaiduIndex-master\new_spider_without_selenium\get_extended_index.py", line 68, in get_index
for formated_data in self._format_data(encrypt_data):
File "C:\Users\Administrator\Desktop\spider-BaiduIndex-master\new_spider_without_selenium\get_extended_index.py", line 131, in _format_data
keyword = str(data['word'])
KeyError: 'word'

此外,如果我想爬取咨询指数,我将这句成这样,
baidu_index = BaiduIndex(keywords, '2016-1-01', '2020-4-09',type,'feed')

也是错误的。请问怎么修改呢?

Traceback (most recent call last):
File "C:/Users/Administrator/Desktop/spider-BaiduIndex-master/new_spider_without_selenium/news_feed.py", line 5, in
baidu_index = BaiduIndex(keywords, '2016-1-01', '2020-4-09', type,'feed')
File "C:\Users\Administrator\Desktop\spider-BaiduIndex-master\new_spider_without_selenium\get_extended_index.py", line 42, in init
self._pre_url = self.pre_url_dict[kind]
KeyError: <class 'type'>

请问如何确认api请求中的area编码?

目前area=0的情况下,取到的数据是“全国”数据,请问如何更改area取到各省份/城市的数据呢?
这个api有无相关的接口文档,或者如何确认各种省份/城市选项的编码呢?

感觉找一个词x一段时间,手工对应去做暴力破解会有点麻烦,请问有无其他方法~

大佬我又来了,我想调用同时爬取所有省份的数据时出现了问题

我把demo的代码改了一下。写了一个循环,以便爬完一个省份爬下一个。尴尬的是我发现当它运行到第17个循环的时候出现了bug,即result_data只能出现一个1×1数字,按道理来说数组的应该会出现我设置的天数那么长。我一开始以为是省份的问题,但我单独去爬取918号省份的时候是正常的。我不知道这是因为我连续爬了16个省份被百度发现了吗,但是每次我都是运行到第17个循环出错,想请教一下longxiaofei老师这个是怎么回事。

————————————
from get_index import BaiduIndex
import numpy as np
import pandas as pd
if name == "main":
keywords = ['天使投资']
result = []
times = 36
for i in range(times):
area_index = str(901+i)
baidu_index = BaiduIndex(keywords, '2019-1-02', '2019-1-07',area_index)
result_data = []
c=baidu_index.get_index()
for index in c:
if index['type'] == 'all':
np.array(result_data.append(index.get('index')))
if i == 0:
result=result_data
else:
result= np.vstack((result,result_data))
df = pd.DataFrame(result)
df.to_csv("天使投资.csv",encoding='utf_8_sig')

百度指数有两个指标

一个是 搜索指数, 一个是 资讯指数。 可以爬资讯指数吗,测了下只能爬搜索指数

无法获取百度指数

换了几个COOKIE,在不同电脑上,不同网络环境下,都无法获取到指数数据

部分关键词数据获取不到

代码爬取广州2020-01-01到2020-12-31的数据,关键词是['糖果', '冻干', '月饼', '啤酒', '洋酒'],但奇怪的是每一次跑程序,有时会有几列数据抓取不到,结果如下图:
微信图片_20210221141400
但有时又能全部抓到
微信图片_20210221141658

想问一下这里是需要怎么修改吗

关于可能由于被禁而出现string indices must be integers的问题

您好!首先非常感谢您的代码!参考您的代码我已经成功爬下了几个关键词的数据(写课程论文用的)!

可能我爬取的数据比较多,在爬取前七个关键词(我一个个爬的)的时候都是正确的,都是再次爬取的时候就又出现了string indices must be integers的报错,这是被网站禁了吗?我更换cookie和ip地址都还是会出现这个报错。

用的程序是在您的最新的程序的基础上加入自己的cookie,其他的未作修改。

我发现了一个百度指数防爬取的招数

image
image
就是本来全图是都有数据的,但是随着下拉栏的移动,图的数据就会不由自主的变成0(似乎是故意显示成错误),毫无疑问,我在您的代码爬取的时候,也是爬着爬着就出现了很多0,不知道有什么方法解决吗

ERROR-10002:未知错误

1621349083(1)
你好,我想问一下这个报错是什么问题?
第一次在JupyterNotebook写好后,后面再次使用时发现会一直报错这个。再次使用后使用的是新的Cookies

TypeError: string indices must be integers

Traceback (most recent call last):
File "/Users/xxx/Desktop/baidu/demo.py", line 28, in
for index in baidu_index.get_index():
File "/Users/xxx/Desktop/baidu/baidu_index/baidu_index.py", line 51, in get_index
encrypt_datas, uniqid = self._get_encrypt_datas(
File "/Users/xxx/Desktop/baidu/baidu_index/baidu_index.py", line 88, in _get_encrypt_datas
uniqid = datas['data']['uniqid']
TypeError: string indices must be integers

用大佬的demo.py跑了一下,但出现了如上报错。cookies是配置成功了的。

关键词数据返回

请教前辈,我关键词为上市公司名称,数据返回后,公司名称被分割为单个汉字,这个要如何解决呢?谢谢!
image

TypeError: string indices must be integers

运行DEMO代码报错。cookie正常。之前运行也是正常的。今天突然报错。换了几个cookie仍然报错。我想知道是我的代码的问题还是百度又修改算法了?

TypeError: string indices must be integers

Hi, I reused this scraper after two months. I used the cookies, "BDUSS" or "H_PS_PSSID", and still returns this error:

File "/Users/X/Downloads/spider-BaiduIndex-master 3/baidu_index/baidu_index.py", line 84, in _get_encrypt_datas
uniqid = datas['data']['uniqid']

TypeError: string indices must be integers

This is where I get my cookies from:

Screenshot 2020-07-31 at 16 05 15

Thanks..

似乎不能用了,可能是哪里出了问题?

image

  • 下载chromedriver, 并将它放到环境变量中

  • 下载tesseract, 并将它放到环境变量中

  • 单账号抓取:请你打开百度的首页,登录后,将百度首页的cookie复制后,粘贴到config.py中的COOKIES对象中

  • 找到tesseract文件夹, tesseract/3.05.02/share/tessdata/configs中的digits
    这些都做了。不知道怎么进行调试

关于使用COOKIE的问题

我遇到一个问题是之前粘贴了cookie这后成功运行了,但是今天再次运行的时候就出现了以下错误
Traceback (most recent call last):
File "demo.py", line 26, in
for i, keyword_type_date_index in enumerate(baidu_index.get_index()):
File "C:\Users\jxyzh\Desktop\spider-BaiduIndex-master\baidu_index\baidu_index.py", line 51, in get_index
encrypt_datas, uniqid = self._get_encrypt_datas(
File "C:\Users\jxyzh\Desktop\spider-BaiduIndex-master\baidu_index\baidu_index.py", line 84, in _get_encrypt_datas
uniqid = datas['data']['uniqid']
TypeError: string indices must be integers

我发现之前有人提问过这个问题,不过我这里之前已经成功运行过了,但是今天却报错,换上今天重新粘贴的COOKIE也不行,所以我想问一下有什么办法可以解决吗还是这是百度那里的原因。

非常感谢

分类选项

百度指数有一个分类:PC+移动,PC,移动 这个可以添加吗?

部分地区数据无法抓取?

在对某些地区的数据进行爬取时会报错:list index out of range,想请问要怎么办?
比如针对源代码中给出的几个关键字,地区代码设为32(湖北襄阳)时会报错。
希望能得到答复,谢谢~

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.