RError.com

RError.com Logo RError.com Logo

RError.com Navigation

  • 主页

Mobile menu

Close
  • 主页
  • 系统&网络
    • 热门问题
    • 最新问题
    • 标签
  • Ubuntu
    • 热门问题
    • 最新问题
    • 标签
  • 帮助
主页 / user-518980

Red Fox's questions

Martin Hope
Red Fox
Asked: 2023-03-09 19:25:43 +0000 UTC

使用 asyncio 加速 python 解析器

  • 5

我不知道如何加速我的解析器

import asyncio
import aiohttp
from bs4 import BeautifulSoup
import time
import requests

async def get_html(url):
    async with aiohttp.ClientSession() as client:
        async with client.get(url) as r:
            return await r.text()


async def main():
    t0 = time.time()
    tasks = []
    count = 0
    for i in range(1,201):
        url = "https://cspromogame.ru/avatars/?page="
        url = url  + str(i)
        task = asyncio.create_task(get_html(url))
        tasks.append(task)
    p = await asyncio.gather(*tasks)
    for text in p:
        soup = BeautifulSoup(text, 'html.parser')
        for a in soup.findAll('a',class_='avatars__link'):
            link = a.get("href")
            await makePhoto(link)
            count += 1
    print(time.time()-t0)
    print(count)

async def makePhoto(url):
    name = str(url.split("/")[-1:]).replace(".jpg","")
    print(url)
    async with aiohttp.ClientSession(raise_for_status=True) as client:
        async with client.get(url) as r:
            with open(f"{name}.jpg","wb") as f:
                f.write(await r.text())
    

if __name__ == "__main__":
    asyncio.run(main())
python
  • 1 个回答
  • 32 Views
Martin Hope
Red Fox
Asked: 2022-12-15 12:29:32 +0000 UTC

如何加速 Avatar 解析器?

  • 8

我练习了多线程并决定制作一个头像解析器。但是数据收集的速度还有很多不足之处。我究竟做错了什么?或者也许在 asyncio 下重新制作它?

import threading
import requests
from bs4 import BeautifulSoup
import lxml

def getImage():
        #https://cspromogame.ru/avatars?page=1999
        links = set()
        for page in range(1,101): #2000
            url = "https://cspromogame.ru/avatars?page={page}"
            req = requests.get(url=url)
            soup = BeautifulSoup(req.text,"lxml")
            a = soup.findAll("a", class_="avatars__link")
            for link in a:
                link = link.get("href")
                alinks = link.split("/")[-1].replace(".jpg","")
                req2 = requests.get(link)
                out = open(f"Avatars/Картинка_{alinks}.jpg",'wb')
                out.write(req2.content)
                out.close()
                print("Обработано ", alinks)
                
threads  = []
for i in range(11):
    t = threading.Thread(target=getImage)
    t.start()

for th in threads:
    th.join()
python-3.x
  • 1 个回答
  • 22 Views
Martin Hope
Red Fox
Asked: 2022-09-11 12:17:48 +0000 UTC

我在哪里可以获得api(动漫)

  • 0

帮助解决错误。该程序给出了一个关键错误,尽管一切都已经过检查并且应该是正确的。

import requests
import json
from pprint import pprint
from deep_translator import GoogleTranslator

#Выдаёт ошибку KeyError строка 22

#ввод
inp = input(">>> ") # например one pieace
morf = inp.replace(" ","+").lower()
#перевод на английский
name = GoogleTranslator(source='auto', 
target='en').translate(morf)
print(name)
#подстановка в url 
url=f"https://kitsu.io/api/edge/anime?filter[text]={name}"                                                       
r = requests.get(url=url)
#ответ в json
d = json.dumps(r.json())
text = json.loads(d)
#множество для уникальности записей, иначе они дублируются
data = set()
for t in text:
    enTitle = text['data'][0]['attributes']['titles']['en_us']
    jpTitle  = text['data'][0]['attributes']['titles']['ja_jp']
    title = GoogleTranslator(source='auto', target='ru').translate(enTitle)
    stDate = text['data'][0]['attributes']['startDate']
    edDate = text['data'][0]['attributes']['endDate']
    typeA = text['data'][0]['attributes']['subtype']
    desc = text['data'][0]['attributes']['synopsis']
    descrip = GoogleTranslator(source='auto', target='ru').translate(desc)
    img = text['data'][0]['attributes']['posterImage']['original']
    ep = text['data'][0]['attributes']['episodeCount']
    lenEp = text['data'][0]['attributes']['episodeLength']
    #добавляем запись во множество
    data.add(str(title) +"\n"+ str(jpTitle) +"\n"+ str(stDate) +"\n"+ str(edDate) +"\n"+ str(typeA) +"\n"+ str(descrip) +"\n"+ str(img) +"\n"+str(ep) +"\n"+ str(lenEp))
#печатаем оезультат
print(data)
python
  • 0 个回答
  • 0 Views

Sidebar

Stats

  • 问题 10021
  • Answers 30001
  • 最佳答案 8000
  • 用户 6900
  • 常问
  • 回答
  • Marko Smith

    我看不懂措辞

    • 1 个回答
  • Marko Smith

    请求的模块“del”不提供名为“default”的导出

    • 3 个回答
  • Marko Smith

    "!+tab" 在 HTML 的 vs 代码中不起作用

    • 5 个回答
  • Marko Smith

    我正在尝试解决“猜词”的问题。Python

    • 2 个回答
  • Marko Smith

    可以使用哪些命令将当前指针移动到指定的提交而不更改工作目录中的文件?

    • 1 个回答
  • Marko Smith

    Python解析野莓

    • 1 个回答
  • Marko Smith

    问题:“警告:检查最新版本的 pip 时出错。”

    • 2 个回答
  • Marko Smith

    帮助编写一个用值填充变量的循环。解决这个问题

    • 2 个回答
  • Marko Smith

    尽管依赖数组为空,但在渲染上调用了 2 次 useEffect

    • 2 个回答
  • Marko Smith

    数据不通过 Telegram.WebApp.sendData 发送

    • 1 个回答
  • Martin Hope
    Alexandr_TT 2020年新年大赛! 2020-12-20 18:20:21 +0000 UTC
  • Martin Hope
    Alexandr_TT 圣诞树动画 2020-12-23 00:38:08 +0000 UTC
  • Martin Hope
    Air 究竟是什么标识了网站访问者? 2020-11-03 15:49:20 +0000 UTC
  • Martin Hope
    Qwertiy 号码显示 9223372036854775807 2020-07-11 18:16:49 +0000 UTC
  • Martin Hope
    user216109 如何为黑客设下陷阱,或充分击退攻击? 2020-05-10 02:22:52 +0000 UTC
  • Martin Hope
    Qwertiy 并变成3个无穷大 2020-11-06 07:15:57 +0000 UTC
  • Martin Hope
    koks_rs 什么是样板代码? 2020-10-27 15:43:19 +0000 UTC
  • Martin Hope
    Sirop4ik 向 git 提交发布的正确方法是什么? 2020-10-05 00:02:00 +0000 UTC
  • Martin Hope
    faoxis 为什么在这么多示例中函数都称为 foo? 2020-08-15 04:42:49 +0000 UTC
  • Martin Hope
    Pavel Mayorov 如何从事件或回调函数中返回值?或者至少等他们完成。 2020-08-11 16:49:28 +0000 UTC

热门标签

javascript python java php c# c++ html android jquery mysql

Explore

  • 主页
  • 问题
    • 热门问题
    • 最新问题
  • 标签
  • 帮助

Footer

RError.com

关于我们

  • 关于我们
  • 联系我们

Legal Stuff

  • Privacy Policy

帮助

© 2023 RError.com All Rights Reserve   沪ICP备12040472号-5