Bika (哔咔, picacomic) favourites auto-download crawler in Python

Update 2020-4-29:
I wrote a downloader with a GUI. For details see:
Blog: https://www.muyoo.top/index.php/archives/61/
GitHub: https://github.com/muyoou/picacomic_downloader
For pica's API, see https://www.muyoo.top/index.php/archives/4/

Last tested working on 2020-2-16. It may stop working if pica forces an app update.

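I wrote a crawler for Bika mainly because its built-in download feature is hopeless. Downloads being stored out of order under UUID names is bad enough, but reading a comic you have already downloaded for offline use still requires a network connection. If I had a connection, why would I download it in the first place? For a hoarder like me that is simply unbearable.
I had previously decompiled the pica app to get its API, and now I've written a crawler on top of it. It only downloads the comics in your favourites; because the goal is so narrow, the code is pretty rough.
First install the requests library, then import these packages: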

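# encoding=utf8
import sys
if sys.version_info[0] < 3:
    # Python 2 only: make utf8 the default encoding so Chinese titles
    # can be printed and concatenated; Python 3 does not need this
    reload(sys)
    sys.setdefaultencoding('utf8')
import hashlib
import hmac
import requests
import uuid
import json
import time
import os
import urllib3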

Since the requests below accept any certificate (verify=False), every run prints a warning, so add one line to suppress it:

urllib3.disable_warnings()

Below is the function that computes pica's request signature. For how it works, see the API analysis post linked at the top.

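# HMAC-SHA256 signing function
def hmacsha256(key,string):
    # hmac.new needs bytes on Python 3; encoding ASCII text is harmless on Python 2
    signature = hmac.new(
        key.encode("utf-8"),
        msg=string.encode("utf-8"),
        digestmod=hashlib.sha256
    ).hexdigest()
    return signature
# Signature calculation function
def password(url,method,time,nonce):
    key="C69BAF41DA5ABD1FFEDC6D2FEA56B"
    # concatenate path, timestamp, nonce, method and API key, then lower-case it
    raw=url+time+nonce+method+key
    raw=raw.lower()
    # secret used as the HMAC key
    mi="~d}$Q7$eIni=V)9\\RK/P.RM4;9[7|@/CA}b~OW!3?EV`:<>M7pddUBL5n|0/*Cn"
    return hmacsha256(mi,raw)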
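
The signing step can also be tried in isolation. Here is a minimal sketch, assuming Python 3. It only mirrors the concatenation order used in password above (path, time, nonce, method, API key, all lower-cased, then HMAC-SHA256). The names sign_request, DEMO_API_KEY and DEMO_SECRET are made up for illustration and are not the real app credentials.

```python
import hashlib
import hmac

# Hypothetical demo values, NOT the real app credentials.
DEMO_API_KEY = "demo-api-key"
DEMO_SECRET = "demo-secret"


def sign_request(path, method, timestamp, nonce,
                 api_key=DEMO_API_KEY, secret=DEMO_SECRET):
    """Return the hex HMAC-SHA256 signature for one request."""
    # Same field order as the post's password(): path, time, nonce, method, key.
    raw = (path + timestamp + nonce + method + api_key).lower()
    return hmac.new(secret.encode("utf-8"),
                    raw.encode("utf-8"),
                    hashlib.sha256).hexdigest()


if __name__ == "__main__":
    sig = sign_request("auth/sign-in", "POST", "1581811200", "0123456789abcdef")
    print(len(sig))  # a SHA-256 hex digest is always 64 characters long
```

Because the string is lower-cased before signing, "POST" and "post" produce the same signature, while a different path, time or nonce changes it completely.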

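Next is the request function. It's messy and not properly encapsulated: POST, GET and image download are all crammed into one function. I'll clean it up when I have time.
In practice you only need to run this once to pull down your whole favourites list, so I kept it casual.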

def send(nurl,payload,method,auth=""):
    # method is "POST" or "GET" for API calls (nurl is the API path);
    # any other value downloads an image (nurl is the full image URL, payload is the save path)
    timestr=str(int(time.time()))
    url="https://picaapi.picacomic.com/"+nurl
    nonce=str(uuid.uuid1()).replace("-","")
    sign=password(nurl,method,timestr,nonce)
    headers={
        "api-key":"C69BAF41DA5ABD1FFEDC6D2FEA56B",
        "accept":"application/vnd.picacomic.com.v1+json",
        "app-channel":"1",
        "time":timestr,
        "nonce":nonce,
        "signature":sign,
        "app-version":"2.2.1.3.3.4",
        "app-uuid":"cb69a7aa-b9a8-3320-8cf1-74347e9ee970",
        "image-quality":"high",
        "app-platform":"android",
        "app-build-version":"45",
        "Content-Type": "application/json; charset=UTF-8",
        "User-Agent":"okhttp/3.8.1",
    }
    if auth!="":
        headers.update({"authorization":auth})
    if method=="POST":
        r = requests.post(url,headers=headers,data=json.dumps(payload),verify=False)
    elif method=="GET":
        getnum=0
        headers.pop("Content-Type")
        while True:
            r = requests.get(url,headers=headers,verify=False)
            if r.status_code == 200:
                print('GET request succeeded')
                break
            else:
                print('GET request failed')
                print('Retrying...')
                time.sleep(3)
                getnum+=1
                # give up after 8 retries; the caller receives the last failed response
                if getnum>8: break
    else:
        getnum=0
        headers.pop("Content-Type")
        while True:
            r = requests.get(nurl,headers=headers,verify=False,stream=True)
            if r.status_code == 200:
                # write the image to the path passed in as payload
                with open(payload, 'wb') as f:
                    f.write(r.content)
                break
            else:
                time.sleep(3)
                print('Failed to load image')
                getnum+=1
                if getnum>8: break
    return r
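
The retry loop appears twice in send, so it could be pulled out into a helper. The following is only a sketch of that idea, not part of the original script: the name retry and its parameters are hypothetical, and the defaults (8 retries, 3 seconds apart) are copied from the loops above.

```python
import time


def retry(fetch, is_ok, max_retries=8, delay=3.0, sleep=time.sleep):
    """Call fetch() until is_ok(result) is true or the retries run out.

    Returns (result, ok); ok is False when every attempt failed, in which
    case result is the last failed response.
    """
    result = fetch()
    attempts = 0
    while not is_ok(result) and attempts < max_retries:
        sleep(delay)          # wait before trying again
        attempts += 1
        result = fetch()
    return result, is_ok(result)


if __name__ == "__main__":
    # Fake "request" that fails twice and then succeeds.
    statuses = [500, 500, 200]
    result, ok = retry(lambda: statuses.pop(0),
                       lambda status: status == 200,
                       sleep=lambda seconds: None)  # skip real waiting in the demo
    print(result, ok)
```

With requests, fetch could be something like lambda: requests.get(url, headers=headers, verify=False) and is_ok could be lambda r: r.status_code == 200. Returning the ok flag lets the caller notice a final failure instead of calling .json() on an error page.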

This function creates a folder; each comic gets its own folder.

def mkdir(path):
    isExists=os.path.exists(path)
    if not isExists:
        os.makedirs(path) 
        return True
    else:
        return False
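
The return value matters: it is True only when the folder was newly created, which the download loop later uses to tell a fresh chapter from one that already exists on disk. A small self-contained sketch of that behaviour follows; the temporary directory and the folder names are made up for the demo.

```python
import os
import tempfile


def mkdir(path):
    """Create path if missing; return True only when it was newly created."""
    if os.path.exists(path):
        return False
    os.makedirs(path)
    return True


if __name__ == "__main__":
    root = tempfile.mkdtemp()  # throwaway directory for the demo
    chapter_dir = os.path.join(root, "1_some title", "chapter 1")
    print(mkdir(chapter_dir))  # first call: the folder is created
    print(mkdir(chapter_dir))  # second call: the folder already exists
```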

Now for the statements that actually run~
First log in to get a token:

mytoken=send("auth/sign-in",{"email":"your_username","password":"your_password"},"POST").json()['data']['token']
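
The one-liner above assumes the login succeeded. If the credentials are wrong, indexing ['data']['token'] will most likely fail with an unhelpful KeyError. Below is a sketch of a more defensive version, assuming only the data.token response shape that the line above relies on; the helper name extract_token is hypothetical.

```python
def extract_token(body):
    """Return body['data']['token'], or raise ValueError showing the response."""
    data = body.get("data") if isinstance(body, dict) else None
    token = data.get("token") if isinstance(data, dict) else None
    if not token:
        # Include the whole body so the server's error message is visible.
        raise ValueError("login failed, response was: %r" % (body,))
    return token


if __name__ == "__main__":
    print(extract_token({"data": {"token": "example-token"}}))
```

It would be used as mytoken = extract_token(send("auth/sign-in", {...}, "POST").json()).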

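After that you can do whatever you like.
The download code that follows is, frankly, written like *: four nested loops. I was in a hurry to read my comics, so please bear with me.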

print('Starting favourites download')
# Set the favourites pages to download here; pages are numbered from 1.
# range(1,22) downloads pages 1 to 21 (the end of the range is exclusive)
for index in range(1,22):
    print('\n')
    print('Downloading favourites page '+str(index)+' ...')
    out=send("users/favourite?s=dd&page="+str(index),None,"GET",mytoken)
    comicnum=1
    for comic in out.json()['data']['comics']['docs']:
        print('~Downloading comic from page '+str(index)+': '+str(comic['title']))
        comicid=comic['_id']
        # only the first page of the chapter list is requested
        out2=send("comics/"+str(comicid)+"/eps?page=1",None,"GET",mytoken)
        epsnum=1
        for eps in out2.json()['data']['eps']['docs']:
            print('Chapter: '+str(eps['title'])+"\n--------------------------------------\n")
            temppage=1
            while True:
                epsid=eps['order']
                out3=send("comics/"+str(comicid)+"/order/"+str(epsid)+"/pages?page="+str(temppage),None,"GET",mytoken).json()
                total=out3['data']['pages']['total']
                savepath="./comic/"+str(index)+'_'+str(comic['title'])+"/"+str(eps['title'])
                # on the first page, an existing folder means the chapter was
                # downloaded before, so skip it
                if not (mkdir(savepath) or (temppage!=1)): break
                # numbering assumes 40 pictures per page of results
                picnum=(temppage-1)*40+1
                for picture in out3['data']['pages']['docs']:
                    print('-Downloading '+str(picnum)+'/'+str(total)+' -- '+str(picture['media']['originalName']))
                    # any method other than POST/GET makes send() save the image to the given path
                    pic=send(str(picture['media']['fileServer'])+"/static/"+str(picture['media']['path']),savepath+"/"+str(picnum)+"_"+str(picture['media']['originalName']),"img",mytoken)
                    picnum+=1
                if int(out3['data']['pages']['pages'])==temppage: break
                else: temppage+=1
            print('Chapter finished\n---------------------------------------\n')
            # stop after the first 5 chapters of each comic
            if epsnum>=5: break
            epsnum+=1
        print("~Comic finished")
        comicnum+=1
        time.sleep(3)
    print('\n\nThis page is finished\n\n')
    time.sleep(3)
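
The page-walking logic buried in the innermost loops is easier to follow on its own. Here is a sketch, assuming each response carries the docs and pages fields that the script reads, and that a full page holds 40 pictures (the constant behind picnum above). The names iter_pictures and fetch_page are hypothetical, and the fake fetcher stands in for the real API call.

```python
PAGE_SIZE = 40  # pictures per page, as assumed by the picnum formula in the script


def iter_pictures(fetch_page):
    """Yield (number, picture) for every picture of one chapter.

    fetch_page(page) must return a dict with 'docs' (a list) and 'pages'
    (the total number of pages).
    """
    page = 1
    while True:
        result = fetch_page(page)
        number = (page - 1) * PAGE_SIZE + 1  # running number of the first picture on this page
        for picture in result["docs"]:
            yield number, picture
            number += 1
        if int(result["pages"]) <= page:  # last page reached
            break
        page += 1


if __name__ == "__main__":
    # Fake chapter with 85 pictures spread over 3 pages.
    pictures = ["img%d.jpg" % i for i in range(1, 86)]

    def fake_fetch(page):
        start = (page - 1) * PAGE_SIZE
        return {"docs": pictures[start:start + PAGE_SIZE], "pages": 3}

    numbered = list(iter_pictures(fake_fetch))
    print(len(numbered), numbered[0], numbered[-1])
```

The sketch uses <= rather than == for the last-page check, so a response that reports zero pages ends the loop instead of spinning forever.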

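After that you can leave it running on a server abroad and let it download by itself. It still takes quite a while... 21 pages of favourites took a whole night.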

2 comments

  • Joey

    Whoa, awesome

  • Joey

    So slow QWQ
    It normally eats a ton of data, but this time it sends back a pitiful trickle
    Bloodsuckers
