Comments (24)
GitHub now loads the activity feed of the people you follow via AJAX (it is not in the original response), so you have to send a request to 'https://github.com/dashboard-feed'.
Here is my modified code. The main changes are the dynamics function and the added feed_url. After the change, the output matches the screenshot in the book.
import requests
from pyquery import PyQuery as pq


class Login(object):
    def __init__(self):
        self.headers = {
            'Referer': 'https://github.com/',
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36',
            'Host': 'github.com'
        }
        self.login_url = 'https://github.com/login'
        self.post_url = 'https://github.com/session'
        self.feed_url = 'https://github.com/dashboard-feed'
        self.logined_url = 'https://github.com/settings/profile'
        # Keep the session alive; cookies are handled automatically
        self.session = requests.Session()

    # Parse out the token needed for login
    def token(self):
        response = self.session.get(self.login_url, headers=self.headers)
        selector = pq(response.text)
        token = selector('input[name="authenticity_token"]').attr('value')
        return token

    def login(self, email, password):
        #print(self.token())
        post_data = {
            'commit': 'Sign in',
            'utf8': '✓',
            'authenticity_token': self.token(),
            'login': email,
            'password': password
        }
        response = self.session.post(self.post_url, data=post_data, headers=self.headers)

        response = self.session.get(self.feed_url, headers=self.headers)
        if response.status_code == 200:
            self.dynamics(response.text)
            #print(response.text)

        response = self.session.get(self.logined_url, headers=self.headers)
        if response.status_code == 200:
            self.profile(response.text)

    # Activity of the people you follow
    def dynamics(self, html):
        selector = pq(html)
        #print(selector.text())
        dynamics = selector('div[class="d-flex flex-items-baseline"] div')
        dynamics.find('span').remove()
        #print(dynamics.text())
        for item in dynamics.items():
            dynamic = item.text().strip()
            print(dynamic)

    # Profile page
    def profile(self, html):
        selector = pq(html)
        #print(selector.text())
        name = selector('input[id="user_profile_name"]').attr('value')
        email = selector('select[id="user_profile_email"] option[selected="selected"]').text()
        print(name, email)
Why does your code print None at the end when I run it?
from githublogin.
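@JasonWen1 Remove the # in the last function so it prints the page source, and take a look at it.
The complete code also needs the following at the end:
if __name__ == "__main__":
    login = Login()
    login.login(email='YourEmail', password='Yourpassword')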
from githublogin.
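@JasonWen1 I thought the site might have been redesigned, but it still ran successfully for me just now. Could you post your complete code and output?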
from githublogin.
@univerone I ran it too and the result was None. The response codes for the four URLs were:
https://github.com/login 200
https://github.com/session 422
https://github.com/dashboard-feed 200
https://github.com/settings/profile 200
None
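I looked it up and found this description:
422 | The number of connections from the current client's IP address to the server exceeds the maximum the server permits. Usually the IP address here is the client address as seen by the server (for example, the user's gateway or proxy server address). In that case, the connection count may involve more than one end user.
But logging in to GitHub from a normal browser works fine, and I had logged out of GitHub in the browser while testing.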
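For comparison, the registered meaning of a status code can be looked up with Python's standard library rather than a third-party table. A minimal sketch; `describe_status` is a hypothetical helper name and not part of the code posted above:

```python
from http import HTTPStatus


def describe_status(code):
    """Return 'code phrase' for a status code known to the standard library."""
    try:
        status = HTTPStatus(code)
    except ValueError:
        # The code is not in the standard library's registry.
        return f"{code} (unknown status code)"
    return f"{status.value} {status.phrase}"


if __name__ == "__main__":
    for code in (200, 422):
        print(describe_status(code))
```

The standard library lists 422 as an "Unprocessable ..." response, meaning the server understood the request but refused to process its content. That seems more consistent with the login POST itself being rejected than with a connection limit.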
from githublogin.
@heavenkiller2018 When I test https://github.com/session, the response code is also 422. How did you solve this?
from githublogin.
Why does your code print None at the end when I run it?
I have the same problem.
from githublogin.
How can I solve "HTTPError: 422 Client Error: Unprocessable Entity for url: https://github.com/session"?
from githublogin.
I ran into the same problem. response = self.session.post(self.post_url, data=post_data, headers=self.headers) returns 422.
from githublogin.
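GitHub's https://github.com/session returns status code 422. I tried the same approach on Gitee (码云) and it returned 200 without any problem, so GitHub may have added extra validation to the POST request.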
from githublogin.
@ljtckkk Yes, some extra validation was probably added.
If you only care about GitHub, I recommend using the official API directly. Once you have created a token it is very convenient, and you no longer need to capture cookies and the authenticity_token from the login page.
import requests
headers = {
    'Authorization': 'token [your token]',
}
session = requests.Session()
response = session.get('https://api.github.com/users/[username]/received_events', headers=headers)
print(response.status_code)
print(response.text)
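The API returns JSON, so the events can be summarized without any HTML parsing. A rough sketch, assuming each event carries `type`, `created_at`, `actor.login` and `repo.name` fields; `summarize_events` and `SAMPLE` are hypothetical names, and the sample data is made up for illustration:

```python
import json


def summarize_events(events):
    """Turn a list of event dicts into short one-line summaries."""
    lines = []
    for event in events:
        # Missing fields fall back to "?" instead of raising KeyError.
        actor = (event.get("actor") or {}).get("login", "?")
        repo = (event.get("repo") or {}).get("name", "?")
        created = event.get("created_at", "?")
        kind = event.get("type", "?")
        lines.append(f"{created} {actor} {kind} {repo}")
    return lines


# Made-up sample shaped like one entry of the received_events response.
SAMPLE = json.loads("""
[{"type": "WatchEvent",
  "actor": {"login": "octocat"},
  "repo": {"name": "octocat/Hello-World"},
  "created_at": "2020-01-01T00:00:00Z"}]
""")

if __name__ == "__main__":
    for line in summarize_events(SAMPLE):
        print(line)
```

With a real response, the call would be `summarize_events(response.json())`.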
from githublogin.
I made the changes the original poster described and it runs without any problem. Strange.
from githublogin.
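Why does your code print None at the end when I run it?
You probably haven't set a public name and email in your profile. I had the same issue at first, and it worked after I filled them in.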
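To tell an unset profile name apart from a failed request, the value can be read with an explicit fallback. A small sketch using only the standard library (the posted code uses pyquery; `ProfileNameParser` and `profile_name` are hypothetical names):

```python
from html.parser import HTMLParser


class ProfileNameParser(HTMLParser):
    """Collect the value of <input id="user_profile_name">, if present."""

    def __init__(self):
        super().__init__()
        self.name = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "input" and attributes.get("id") == "user_profile_name":
            self.name = attributes.get("value")


def profile_name(html, default="(name not set)"):
    """Return the profile name, or `default` when it is missing or empty."""
    parser = ProfileNameParser()
    parser.feed(html)
    return parser.name or default


if __name__ == "__main__":
    print(profile_name('<input id="user_profile_name" value="">'))
```

Printing a placeholder instead of None makes it clearer that the page was fetched and only the field was empty.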
from githublogin.
XPath version:
import re

import requests
from lxml import etree


class Login(object):
    def __init__(self):
        self.headers = {
            'Referer': 'https://github.com/',
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36',
            'Host': 'github.com'
        }
        self.login_url = 'https://github.com/login'
        self.post_url = 'https://github.com/session'
        self.feed_url = 'https://github.com/dashboard-feed'
        self.logined_url = 'https://github.com/settings/profile'
        self.session = requests.Session()

    def dynamics(self, html):
        print('*'*10+'dynamicing'+'*'*10)
        selector = etree.HTML(html)
        # print("*"*20, etree.tostring(selector).decode('utf-8'))
        # print(selector.xpath('//div[@class="d-flex flex-items-baseline"]'))
        dynamics = selector.xpath('//div[@class="d-flex flex-items-baseline"]//div')
        # print(dynamics)
        for item in dynamics:
            etree.strip_elements(item, 'span')
            dynamic = ' '.join(item.xpath('.//text()')).replace('\n', ' ').strip()
            dynamic = re.sub(' +', ' ', dynamic)
            print(dynamic)
        print('*' * 10 + 'dynamic end' + '*' * 10)

    def profile(self, html):
        print('*'*10+'profileing'+'*'*10)
        selector = etree.HTML(html)
        name = selector.xpath('//input[@id="user_profile_name"]/@value')[0]
        email = selector.xpath('//select[@id="user_profile_email"]/option[@value!=""]/text()')[0]
        print(name, email)

    def token(self):
        response = self.session.get(self.login_url, headers=self.headers)
        selector = etree.HTML(response.text)
        token = selector.xpath('//div//input[2]/@value')[0]
        return token

    def login(self, email, password):
        post_data = {
            'commit': 'Sign in',
            'utf8': '✓',
            'authenticity_token': self.token(),
            'login': email,
            'password': password
        }
        response = self.session.post(self.post_url, data=post_data, headers=self.headers)

        response = self.session.get(self.feed_url, headers=self.headers)
        if response.status_code == 200:
            self.dynamics(response.text)
            # print('response.text', response.text)

        response = self.session.get(self.logined_url, headers=self.headers)
        if response.status_code == 200:
            # print(response.text)
            self.profile(response.text)


if __name__ == '__main__':
    login = Login()
    login.login(email='email', password='password')
from githublogin.
Thanks! I've added this to the README.
from githublogin.
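It still doesn't work; it prints None, even though I updated my profile. Also, what does "GitHub now loads the activity feed of the people you follow via AJAX (it is not in the original response)" mean? Could someone explain?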
from githublogin.
Why does your code print None at the end when I run it?
It doesn't work. The redirect after login never happens, so it prints None.
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.
from githublogin.
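I tried the script, but it doesn't work as-is. Is there anything that needs to be adjusted?
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='github.com', port=443): Max retries exceeded with url: /login (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x000002C7C3B986A0>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond',))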
from githublogin.
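To everyone getting None: it may be that you don't follow anyone, so dynamics = selector('div[class="d-flex flex-items-baseline"] div') has nothing to match, even though the login actually succeeded. At the end of the login() method, add print(response.text) to print the page returned after login, then press Ctrl+F and search the output for your own username. If your name appears in the output, you are logged in.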
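The manual Ctrl+F check can also be done in code. A rough heuristic sketch, assuming the username appears somewhere in any logged-in page; `looks_logged_in` and `diagnose` are hypothetical helpers, not part of the posted script:

```python
def looks_logged_in(html, username):
    """Heuristic: the username appears somewhere in the returned page."""
    return bool(username) and username.lower() in html.lower()


def diagnose(html, username, dynamics):
    """Explain why nothing was printed, based on the page and parsed items."""
    if dynamics:
        return "ok"
    if looks_logged_in(html, username):
        return "logged in, but no feed items matched the selector"
    return "probably not logged in"


if __name__ == "__main__":
    print(diagnose("<span>Signed in as alice</span>", "alice", []))
```

A username can also show up on a public page, so a match here is only a hint and not proof of a valid session.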
from githublogin.
The comment above makes sense, but didn't the earlier chapters already cover parsing AJAX requests?
from githublogin.
Both the pyquery version and the XPath version run successfully.
from githublogin.
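It still doesn't work; it prints None, even though I updated my profile. Also, what does "GitHub now loads the activity feed of the people you follow via AJAX (it is not in the original response)" mean? Could someone explain?
Maybe your GitHub account doesn't follow anyone, or your profile isn't filled in.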
from githublogin.
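There is one more case that yields None: email verification was triggered. I saved the response and opened it, and it was the email device-verification code page, which requires a code sent to your mailbox.
Fix: get the verification code from your email and complete the verification. After that, the login is no longer intercepted.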
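A script can detect this case by looking at where the login POST ended up. A minimal sketch, assuming the verification page lives under the /sessions/verified-device path that appears elsewhere in this thread; `classify_login_response` is a hypothetical helper:

```python
from urllib.parse import urlparse

# Path of the device-verification page (assumption taken from this thread).
VERIFY_PATH = "/sessions/verified-device"


def classify_login_response(status_code, final_url):
    """Classify the result of the login POST from its status and final URL."""
    path = urlparse(final_url).path
    if status_code == 422:
        return "rejected"
    if path.startswith(VERIFY_PATH):
        return "device_verification"
    if status_code == 200:
        return "ok"
    return "unexpected"


if __name__ == "__main__":
    print(classify_login_response(200, "https://github.com/sessions/verified-device"))
```

With requests, the arguments would be `response.status_code` and `response.url` taken from the response to the login POST.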
from githublogin.
With email verification, the request sometimes returns 422; occasionally the pages can be browsed normally.
#!/usr/bin/python3
# -*- coding: utf-8 -*-
import requests
from lxml import etree


class Login(object):
    # Initializer
    def __init__(self):
        # HTTP request headers
        self.headers = {
            'Referer': 'https://github.com/',
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.5005.63 Safari/537.36',
            'Host': 'github.com'
        }
        # Login page URL
        self.login_url = 'https://github.com/login'
        # Session URL (the login form is posted here)
        self.post_url = 'https://github.com/session'
        # Profile page URL
        self.logined_url = 'https://github.com/settings/profile'
        # Create the session from requests
        self.session = requests.Session()
        # Email verification URL
        self.email_url = 'https://github.com/sessions/verified-device'

    def token(self):
        # Fetch the login page through the session
        response = self.session.get(self.login_url, headers=self.headers)
        # Save the response to a local file
        fhandle = open("./github_xpath.html", "wb")
        fhandle.write(response.text.encode('utf-8'))
        fhandle.close()
        # Build the selector
        selector = etree.HTML(response.text)
        #print(response.text)
        # This works in an online XPath tool, but lxml returns nothing
        #token = selector.xpath('//div//input[2]/@value')
        # Use the selector to extract the token from the response
        token = selector.xpath('//div//input[1]/@value')
        print(token)
        return token

    def login(self, email, password):
        #print(self.token())
        # Data sent with the POST request
        post_data = {
            'commit': 'Sign in',
            'utf8': '✓',
            'authenticity_token': self.token()[0],
            'login': email,
            'password': password,
            'trusted_device': '',
            'webauthn-support': 'supported',
            'webauthn-iuvpaa-support': 'unsupported'
        }
        # POST through the session and get the response
        response = self.session.post(self.post_url, data=post_data, headers=self.headers)
        if response.status_code == 200:
            print(response.url)
            self.dynamics(response.text)
        # GET through the session and get the response
        response = self.session.get(self.logined_url, headers=self.headers)
        if response.status_code == 200:
            self.profile(response.text)

    def dynamics(self, html):
        fhandle = open("./github_xpath_dynamics.html", "wb")
        fhandle.write(html.encode('utf-8'))
        fhandle.close()
        opt = input("Enter the email verification code: ")
        post_data = {
            'authenticity_token': self.token()[0],
            'opt': opt
        }
        response = self.session.post(self.logined_url, data=post_data)
        print(response.status_code)
        if response.status_code == 200:
            html = response.text
            #fhandle = open("./github_xpath_422.html","wb")
            #fhandle.write(html.encode('utf-8'))
            #fhandle.close()
            selector = etree.HTML(html)
            dynamics = selector.xpath('//div[contains(@class, "news")]//div[contains(@class, "alert")]')
            for item in dynamics:
                dynamic = ' '.join(item.xpath('.//div[@class="title"]//text()')).strip()
                print(dynamic)
        elif response.status_code == 422:
            fhandle = open("./github_xpath_422.html", "wb")
            fhandle.write(html.encode('utf-8'))
            fhandle.close()

    def profile(self, html):
        fhandle = open("./github_xpath_profile.html", "wb")
        fhandle.write(html.encode('utf-8'))
        fhandle.close()
        selector = etree.HTML(html)
        name = selector.xpath('//input[@id="user_profile_name"]/@value')
        email = selector.xpath('//select[@id="user_profile_email"]/option[@value!=""]/text()')
        print(name, email)


if __name__ == "__main__":
    login = Login()
    login.login(email='email', password='password')
from githublogin.
Shouldn't the email verification be handled right at the initial login request?
from githublogin.