longxiaofei / spider-baiduindex Goto Github PK

View Code? Open in Web Editor NEW

741.0 741.0 226.0 228 KB

data sdk for baidu Index

License: MIT License

Python 100.00%

spider-baiduindex's Issues

大神要是有时间可以开发爬取一下人群属性：性别分布，年龄分布，兴趣分布

PyCryptodome代替PyCrypto

安装遇到编译问题，建议依赖项换成PyCryptodome，无感替代

TypeError: string indices must be integers

Probably, because of some change in Python, I tried to run the demo code and get the following error :(

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-25-d685484c0ecf> in <module>
     12     keywords = ['爬虫', 'lol', '张艺兴', '人工智能', '华为', '武林外传']
     13     baidu_index = BaiduIndex(keywords, '2018-01-01', '2019-05-02')
---> 14     for index in baidu_index.get_index():
     15         print(index)

~/scripts/get_index.py in get_index(self)
     56                     start_date=params_data['start_date'],
     57                     end_date=params_data['end_date'],
---> 58                     keywords=params_data['keywords']
     59                 )
     60                 key = self._get_key(uniqid)

~/scripts/get_index.py in _get_encrypt_datas(self, start_date, end_date, keywords)
    107         html = self._http_get(url)
    108         datas = json.loads(html)
--> 109         uniqid = datas['data']['uniqid']
    110         encrypt_datas = []
    111         for single_data in datas['data']['userIndexes']:

TypeError: string indices must be integers

大佬你好：string indices must be integers

C:\ProgramData\Anaconda3\python.exe C:/Users/Administrator/Desktop/spider-BaiduIndex-master/new_spider_without_selenium/demo.py
Traceback (most recent call last):
File "C:/Users/Administrator/Desktop/spider-BaiduIndex-master/new_spider_without_selenium/demo.py", line 6, in
for index in baidu_index.get_index():
File "C:\Users\Administrator\Desktop\spider-BaiduIndex-master\new_spider_without_selenium\get_index.py", line 58, in get_index
keywords=params_data['keywords']
File "C:\Users\Administrator\Desktop\spider-BaiduIndex-master\new_spider_without_selenium\get_index.py", line 109, in _get_encrypt_datas
uniqid = datas['data']['uniqid']
TypeError: string indices must be integers

我一开始以为是cookie可能是找好的。然后发现您说可能缺了一个cookie。我后来请人解决了这个问题，希望大佬在某个地方给补上一些些注释哈哈。谢谢

关于天眼查里面的"base_datas", "category"，和"base_datas", "area"这个可以提供下不

如题说问。

老哥，你怎么知道https://miao.baidu.com/abdr这个的参数是写死的呢？

我这边如果用你那个参数，是不行的不知道其他朋友行不

有人知道如何获取百度的主题功能吗

之前的版本似乎不行了，然后用了新版OK。但是新版本不能爬取PC端2011年之前的

我发现我输入2011年之前的日期，输出的日期，第一天就是2011-1-1。

baidu_index.py里的_all_kind变量改为参数传递

建议把baidu_index.py里的_all_kind变量改为参数传递，这样方便指定要爬的type类型。

如下：
# _all_kind = ['all', 'pc', 'wise']

def __init__(
    self,
    *,
    keywords: list,
    start_date: str,
    end_date: str,
    cookies: str,
    area=0,
    type=['all', 'pc', 'wise']
):
    self.keywords = keywords
    self.area = area
    self.start_date = start_date
    self.end_date = end_date
    self.cookies = cookies
    self._params_queue = utils.get_params_queue(start_date, end_date, keywords)
    self._all_kind = type

TypeError: string indices must be integers

in demo.py :
from get_index import BaiduIndex

if name == "main":
keywords = ['比特币']
baidu_index = BaiduIndex(keywords, '2013-04-01', '2014-03-31')
for index in baidu_index.get_index():
print(index)

Traceback (most recent call last):
File "e:/MyProjects/spider-BaiduIndex/new_spider_without_selenium/demo.py", line 6, in
for index in baidu_index.get_index():
File "e:\MyProjects\spider-BaiduIndex\new_spider_without_selenium\get_index.py", line 58, in get_index
keywords=params_data['keywords']
File "e:\MyProjects\spider-BaiduIndex\new_spider_without_selenium\get_index.py", line 109, in _get_encrypt_datas
uniqid = datas['data']['uniqid']
TypeError: string indices must be integers

How can I modify the code?

搜索指数中area的编号如何确定是哪个省份呢

request_args = {
            'word': json.dumps(word_list),
            'startDate': start_date.strftime('%Y-%m-%d'),
            'endDate': end_date.strftime('%Y-%m-%d'),
            'area': area
        }

发现一个小BUG

formated_data['index'] = data['all']['data'][i]
这行代码的data后面应该是用kind，而不是写死的all

另外，想请教下你是怎么解析出uniqid和数值data之间的关系的？之前更新后就发现了这个API ，只是不明白之间的解析规则，这个可以跟我大概讲一下吗？

截图区域大了点

测试了下
im = im.crop((0,4,all_width,16))
这样刚刚好，截大了tesseract识别不好
再把tesseract进行样本训练，完美识别

希望大佬可以开发组合词爬取功能

如图

今天在·使用的时候似乎开始有点bug？

Traceback (most recent call last):
File "E:\PyCharm 2018.3\helpers\pydev\pydevd.py", line 1689, in
main()
File "E:\PyCharm 2018.3\helpers\pydev\pydevd.py", line 1683, in main
globals = debugger.run(setup['file'], None, None, is_module)
File "E:\PyCharm 2018.3\helpers\pydev\pydevd.py", line 1083, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "E:\PyCharm 2018.3\helpers\pydev_pydev_imps_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "C:/Users/Administrator/Desktop/spider-BaiduIndex-master/spider-BaiduIndex-master/new_spider_without_selenium/demo.py", line 6, in
for index in baidu_index.get_index():
File "C:\Users\Administrator\Desktop\spider-BaiduIndex-master\spider-BaiduIndex-master\new_spider_without_selenium\get_index.py", line 58, in get_index
keywords=params_data['keywords']
File "C:\Users\Administrator\Desktop\spider-BaiduIndex-master\spider-BaiduIndex-master\new_spider_without_selenium\get_index.py", line 109, in _get_encrypt_datas
uniqid = datas['data']['uniqid']
TypeError: string indices must be integers

大佬怎么发现的数据解密方法呀？

在哪儿个js里面有什么函数写的么？

请问这个报错如何解决string indices must be integers

你好

请问有办法录个视频教程吗。。。本人非程序员，但能看得懂一些，但具体操作起来可能比较困难。如果能获得帮助，真是太感谢。。

TypeError: string indices must be integers

I keep getting this error when I run the whole project. Why?

作者太牛逼了，顺便问下能抓取百度指数里的“实时”数据么

我尝试改了下，发现太笨了搞不定。

调用连接是：http://index.baidu.com/api/LiveApi/getLive

厉害呀大神！，就想请问下，1.搜到的结果怎么保存到excel？？

厉害呀，就想请问下，搜到的结果怎么保存到excel？？本人新手小白，pandas ，baidu_index.to_csv('niushi',index = False) 失败呀

想请教一下大神的媒体指数和咨询指数爬取

新建 Microsoft Word 文档.docx
上面是代码
为什么会出现一下的错误呢

Traceback (most recent call last):
File "C:/Users/Administrator/Desktop/spider-BaiduIndex-master/new_spider_without_selenium/news_feed.py", line 7, in
for index in baidu_index.get_index():
File "C:\Users\Administrator\Desktop\spider-BaiduIndex-master\new_spider_without_selenium\get_extended_index.py", line 68, in get_index
for formated_data in self._format_data(encrypt_data):
File "C:\Users\Administrator\Desktop\spider-BaiduIndex-master\new_spider_without_selenium\get_extended_index.py", line 131, in _format_data
keyword = str(data['word'])
KeyError: 'word'

此外，如果我想爬取咨询指数，我将这句成这样，
baidu_index = BaiduIndex(keywords, '2016-1-01', '2020-4-09',type,'feed')

也是错误的。请问怎么修改呢？

Traceback (most recent call last):
File "C:/Users/Administrator/Desktop/spider-BaiduIndex-master/new_spider_without_selenium/news_feed.py", line 5, in
baidu_index = BaiduIndex(keywords, '2016-1-01', '2020-4-09', type,'feed')
File "C:\Users\Administrator\Desktop\spider-BaiduIndex-master\new_spider_without_selenium\get_extended_index.py", line 42, in init
self._pre_url = self.pre_url_dict[kind]
KeyError: <class 'type'>

报错：TypeError string indices must be integers

运行demo时，baidu_index.py 88行 uniqid = datas["data"]["uniqid"] 报错TypeError string indices must be integers

请问如何确认api请求中的area编码？

目前area=0的情况下，取到的数据是“全国”数据，请问如何更改area取到各省份/城市的数据呢？
这个api有无相关的接口文档，或者如何确认各种省份/城市选项的编码呢？

感觉找一个词x一段时间，手工对应去做暴力破解会有点麻烦，请问有无其他方法~

能不能实现多个cookies抓取呀，账号被限制了。。。

爬的指数有些多，单个账号爬好像被限制了，不知道要怎么实现多个账号爬取的功能

请问添加了cookie，为什么返回数据显示未登录

{'status': 10000, 'data': '', 'message': 'not login'}

string indices must be integers报错（和前面的issues好像不太一样）

前面的issues主要是cookies设置错误或者关键词不存在，但是我检查了这些我都满足，test_cookies为true，搜索指数和媒体指数都有。
关键词是龙脉温泉，出错是咨询指数（难度是因为他咨询指数各个值都为0吗？）

大佬我又来了，我想调用同时爬取所有省份的数据时出现了问题

我把demo的代码改了一下。写了一个循环，以便爬完一个省份爬下一个。尴尬的是我发现当它运行到第17个循环的时候出现了bug，即result_data只能出现一个1×1数字，按道理来说数组的应该会出现我设置的天数那么长。我一开始以为是省份的问题，但我单独去爬取918号省份的时候是正常的。我不知道这是因为我连续爬了16个省份被百度发现了吗，但是每次我都是运行到第17个循环出错，想请教一下longxiaofei老师这个是怎么回事。

————————————
from get_index import BaiduIndex
import numpy as np
import pandas as pd
if name == "main":
keywords = ['天使投资']
result = []
times = 36
for i in range(times):
area_index = str(901+i)
baidu_index = BaiduIndex(keywords, '2019-1-02', '2019-1-07',area_index)
result_data = []
c=baidu_index.get_index()
for index in c:
if index['type'] == 'all':
np.array(result_data.append(index.get('index')))
if i == 0:
result=result_data
else:
result= np.vstack((result,result_data))
df = pd.DataFrame(result)
df.to_csv("天使投资.csv",encoding='utf_8_sig')

百度指数有两个指标

一个是搜索指数，一个是资讯指数。可以爬资讯指数吗，测了下只能爬搜索指数

无法获取百度指数

换了几个COOKIE，在不同电脑上，不同网络环境下，都无法获取到指数数据

部分关键词数据获取不到

代码爬取广州2020-01-01到2020-12-31的数据，关键词是['糖果', '冻干', '月饼', '啤酒', '洋酒']，但奇怪的是每一次跑程序，有时会有几列数据抓取不到，结果如下图：

但有时又能全部抓到

想问一下这里是需要怎么修改吗

关于可能由于被禁而出现string indices must be integers的问题

您好！首先非常感谢您的代码！参考您的代码我已经成功爬下了几个关键词的数据（写课程论文用的）！

可能我爬取的数据比较多，在爬取前七个关键词（我一个个爬的）的时候都是正确的，都是再次爬取的时候就又出现了string indices must be integers的报错，这是被网站禁了吗？我更换cookie和ip地址都还是会出现这个报错。

用的程序是在您的最新的程序的基础上加入自己的cookie，其他的未作修改。

我发现了一个百度指数防爬取的招数

就是本来全图是都有数据的，但是随着下拉栏的移动，图的数据就会不由自主的变成0（似乎是故意显示成错误），毫无疑问，我在您的代码爬取的时候，也是爬着爬着就出现了很多0，不知道有什么方法解决吗

ERROR-10002：未知错误

你好，我想问一下这个报错是什么问题？
第一次在JupyterNotebook写好后，后面再次使用时发现会一直报错这个。再次使用后使用的是新的Cookies

decrypt_func是什么原理对返回的加密值做了解析

请教一下是基于什么算法做到的，看起来像是知道原来的加密算法

TypeError: string indices must be integers

Traceback (most recent call last):
File "/Users/xxx/Desktop/baidu/demo.py", line 28, in
for index in baidu_index.get_index():
File "/Users/xxx/Desktop/baidu/baidu_index/baidu_index.py", line 51, in get_index
encrypt_datas, uniqid = self._get_encrypt_datas(
File "/Users/xxx/Desktop/baidu/baidu_index/baidu_index.py", line 88, in _get_encrypt_datas
uniqid = datas['data']['uniqid']
TypeError: string indices must be integers

用大佬的demo.py跑了一下，但出现了如上报错。cookies是配置成功了的。

百度媒体指数的抓取部分倒数第二行代码中 baidu_index是否应该改为news_index

否则还是打印的搜索指数？单独抓取时就会显示baidu_index is not defined。

关键词数据返回

请教前辈，我关键词为上市公司名称，数据返回后，公司名称被分割为单个汉字，这个要如何解决呢？谢谢！

TypeError: string indices must be integers

运行DEMO代码报错。cookie正常。之前运行也是正常的。今天突然报错。换了几个cookie仍然报错。我想知道是我的代码的问题还是百度又修改算法了？

TypeError: string indices must be integers

Hi, I reused this scraper after two months. I used the cookies, "BDUSS" or "H_PS_PSSID", and still returns this error:

File "/Users/X/Downloads/spider-BaiduIndex-master 3/baidu_index/baidu_index.py", line 84, in _get_encrypt_datas
uniqid = datas['data']['uniqid']

TypeError: string indices must be integers

This is where I get my cookies from:

Thanks..

似乎不能用了，可能是哪里出了问题？

下载chromedriver, 并将它放到环境变量中
下载tesseract, 并将它放到环境变量中
单账号抓取：请你打开百度的首页，登录后，将百度首页的cookie复制后，粘贴到config.py中的COOKIES对象中
找到tesseract文件夹, tesseract/3.05.02/share/tessdata/configs中的digits
这些都做了。不知道怎么进行调试

关于使用COOKIE的问题

我遇到一个问题是之前粘贴了cookie这后成功运行了，但是今天再次运行的时候就出现了以下错误
Traceback (most recent call last):
File "demo.py", line 26, in
for i, keyword_type_date_index in enumerate(baidu_index.get_index()):
File "C:\Users\jxyzh\Desktop\spider-BaiduIndex-master\baidu_index\baidu_index.py", line 51, in get_index
encrypt_datas, uniqid = self._get_encrypt_datas(
File "C:\Users\jxyzh\Desktop\spider-BaiduIndex-master\baidu_index\baidu_index.py", line 84, in _get_encrypt_datas
uniqid = datas['data']['uniqid']
TypeError: string indices must be integers

我发现之前有人提问过这个问题，不过我这里之前已经成功运行过了，但是今天却报错，换上今天重新粘贴的COOKIE也不行，所以我想问一下有什么办法可以解决吗还是这是百度那里的原因。

非常感谢

longxiaofei / spider-baiduindex Goto Github PK

spider-baiduindex's Issues

Recommend Projects

Recommend Topics

Recommend Org

Jobs