RError.com


Abram's questions

Vyacheslav
Asked: 2025-04-02 14:14:51 +0000 UTC

Word file cannot be read in Python code

  • 5

Why can't the contents of the Word file be read and printed in this Python code:

import bs4
import time
import random
import requests
import docx
from bs4 import BeautifulSoup
from requests_html import HTMLSession
import magic
import chardet
import codecs
from io import BytesIO
from docx import Document

from selenium import webdriver  # pip install selenium


# List of user agents
user_agents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Windows NT 11.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.4 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36'
]


# Function that returns a random user agent
def get_random_user_agent():
    return random.choice(user_agents)


# Browser setup
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument(f"user-agent={get_random_user_agent()}")
chrome_options.add_argument("--disable-blink-features=AutomationControlled")
chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
chrome_options.add_experimental_option("useAutomationExtension", False)

data = []
# Using webdriver

session = HTMLSession()
response = session.get(
    "https://mos-gorsud.ru/mgs/search?caseDateFrom=16.02.2023&caseDateTo=28.02.2023&courtAlias=mgs&documentStatus=2&processType=6&formType=fullForm&page=2")
time.sleep(3)  # Extra delay in case it is needed; avoid overusing sleep

soup = BeautifulSoup(response.text, 'html.parser')
heads = soup.find('table', class_='custom_table').find_all('tr')
print(len(heads))
for head in heads[1:]:
    link = 'https://mos-gorsud.ru' + head.find('nobr').find('a')['href']
    print(link)
    loom = session.get(link)
    abble = BeautifulSoup(loom.text, 'html.parser')
    
    documents = abble.find('table', {'class': 'custom_table mainTable'}).find('tbody').find_all('tr')
    for document in documents:
        if "Приговор" in document.text:
            score = document.find_all('td')

            print(len(score))
            for soc in score:
                stock = soc.find_all('a')
                for sto in stock:
                    print('Prigovor: ' + 'https://mos-gorsud.ru' + sto['href'])
                    link_doc = 'https://mos-gorsud.ru' + sto['href']
                    response = requests.get(link_doc, get_random_user_agent())

                    # Check that the request succeeded
                    if response.status_code == 200:
                        # Save the file to disk
                        with open('prigovor.docx', 'wb') as file:
                            file.write(response.content)

                        # Open the Word document and extract the text
                        document = Document('prigovor.docx')
                        text = '\n'.join([paragraph.text for paragraph in document.paragraphs])
                        resheniye = ' '.join(text.split())

                        # Print the link and the text
                        print('Ссылка на файл: https://mos-gorsud.ru' + sto['href'])
                        print(resheniye)
                    else:
                        print(f"Error downloading file: {response.status_code}")

        elif "Постановление суда апелляционной инстанции" in document.text:
            score = document.find_all('td')

            print(len(score))
            for soc in score:
                stock = soc.find_all('a')
                for sto in stock:
                    print('Postanovleniye : ' + 'https://mos-gorsud.ru' + sto['href'])
                    link_pod = 'https://mos-gorsud.ru' + sto['href']
                    response = requests.get(link_pod, get_random_user_agent())

                    # Check that the request succeeded
                    if response.status_code == 200:
                        # Save the file to disk
                        with open('resheniye.docx', 'wb') as file:
                            file.write(response.content)

                        # Open the Word document and extract the text
                        document = Document('resheniye.docx')
                        text = '\n'.join([paragraph.text for paragraph in document.paragraphs])
                        postanov = ' '.join(text.split())

                        # Print the link and the text
                        print('Ссылка на файл: https://mos-gorsud.ru' + sto['href'])
                        print(postanov)
                    else:
                        print(f"Error downloading file: {response.status_code}")

            print('\n')

It outputs:

Traceback (most recent call last):
  File "C:\Users\user\PycharmProjects\cases_pars\little.py", line 143, in <module>
    document = Document('resheniye.docx')
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\user\PycharmProjects\cases_pars\.venv\Lib\site-packages\docx\api.py", line 27, in Document
    document_part = cast("DocumentPart", Package.open(docx).main_document_part)
                                         ^^^^^^^^^^^^^^^^^^
  File "C:\Users\user\PycharmProjects\cases_pars\.venv\Lib\site-packages\docx\opc\package.py", line 127, in open
    pkg_reader = PackageReader.from_file(pkg_file)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\user\PycharmProjects\cases_pars\.venv\Lib\site-packages\docx\opc\pkgreader.py", line 22, in from_file
    phys_reader = PhysPkgReader(pkg_file)
                  ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\user\PycharmProjects\cases_pars\.venv\Lib\site-packages\docx\opc\phys_pkg.py", line 21, in __new__
    raise PackageNotFoundError("Package not found at '%s'" % pkg_file)
docx.opc.exceptions.PackageNotFoundError: Package not found at 'resheniye.docx'
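
One possible reading of this traceback, offered as an assumption rather than a confirmed diagnosis: python-docx raises PackageNotFoundError for a string path that is neither a directory nor a ZIP archive, not only for a missing file. Since 'resheniye.docx' was written just before the call, the server may have returned something other than a .docx (for example a legacy .doc file or an HTML page). Separately, the second positional argument of requests.get is params, so the user-agent string in the question is not sent as a header. The sketch below checks the downloaded bytes before they are handed to Document; looks_like_docx is a hypothetical helper, not part of python-docx.

```python
import zipfile
from io import BytesIO


def looks_like_docx(data: bytes) -> bool:
    """Return True if the bytes are a ZIP archive containing word/document.xml."""
    buffer = BytesIO(data)
    if not zipfile.is_zipfile(buffer):
        return False
    buffer.seek(0)
    with zipfile.ZipFile(buffer) as archive:
        # word/document.xml is the usual location of the main document part.
        return "word/document.xml" in archive.namelist()


# Build a tiny in-memory archive so the check can be tried without a network call.
sample = BytesIO()
with zipfile.ZipFile(sample, "w") as archive:
    archive.writestr("word/document.xml", "<w:document/>")

print(looks_like_docx(sample.getvalue()))
print(looks_like_docx(b"<html>Not a document</html>"))
```

In the loop from the question, the same check could be applied to response.content before saving, and the user agent could be passed as headers={'User-Agent': get_random_user_agent()}.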
python
  • 2 answers
  • 47 Views
Vyacheslav
Asked: 2024-11-24 22:57:50 +0000 UTC

Data shown on screen does not match the data exported to the Excel file

  • 6

There is a Flask project:

from flask import Flask, request, render_template, send_file
import requests
# import csv
import pandas as pd
from io import BytesIO
import logging
import json
# import pandas as pd
# from io import BytesIO
# -*- coding: utf-8 -*-
import sys

sys.stdout.reconfigure(encoding='utf-8')

logging.basicConfig(level=logging.DEBUG)
app = Flask(__name__)


# Function that searches for vacancies
def search_vacancies(keyword):
    BASE_URL = "https://api.hh.ru/vacancies"
    vacancies_found = []  # List for storing the vacancies found
    per_page = 100
    page = 0

    while True:
        params = {
            'text': keyword,
            'page': page,
            'per_page': per_page,
            'only_with_salary': True  # Only vacancies with a salary
        }

        response = requests.get(BASE_URL, params=params)
        if response.status_code != 200:
            print("Ошибка при обращении к API")
            break

        lan = response.json()

        if not lan['items']:
            break

        for item in lan['items']:
            # vacancy_name = item['name']
            vacancy_name = item['name'].lower()  # Lowercase the vacancy name for comparison
            # Check whether the vacancy name contains the keyword
            if keyword in vacancy_name:
                salary_from = item['salary']['from'] if item['salary'] else None
                salary_to = item['salary']['to'] if item['salary'] else None
                currency = item['salary']['currency'] if item['salary'] else None
                city = item['area']['name'] if 'area' in item else None
                link = item['alternate_url'] if 'alternate_url' in item else None
                discription = item['snippet']['responsibility'] if 'snippet' in item else None

                vacancies_found.append({
                    'name': vacancy_name,
                    'salary_from': salary_from,
                    'salary_to': salary_to,
                    'currency': currency,
                    'city': city,
                    'link': link,
                    'discription': discription
                })

        page += 1

    return vacancies_found


@app.route('/download', methods=['POST'])
def download():
    keyword = request.form['work_name']
    vacancies = search_vacancies(keyword)

    df = pd.DataFrame(vacancies)
    output = BytesIO()
    df.to_excel(output, index=False)
    output.seek(0)
    return send_file(output, as_attachment=True, download_name='vacancies.xlsx')


@app.route('/', methods=['GET', 'POST'])
def index():
    if request.method == 'POST':
        keyword = request.form['keyword']
        vacancies = search_vacancies(keyword)
        print(len(vacancies))
        all_vacancies = (len(vacancies))
        if vacancies:
            # filename = save_to_json(vacancies)
            return render_template('index.html', vacancies=vacancies, download=True, all_vacancies=all_vacancies)
        else:
            return render_template('index.html', all_vacancies=0)
    return render_template('index.html')


if __name__ == '__main__':
    app.run(debug=True)
<h1>Парсер вакансий</h1>
<form method="POST">
    <label for="keyword">Введите вакансию для поиска:</label>
    <input type="text" id="keyword" style="margin-left: -2%;" name="keyword" required>
    <button type="submit">Искать</button>
</form>
{% if vacancies %}
<h2>Найденные вакансии: {{ all_vacancies }}</h2>
<form method="post" action="/download">
    <input type="hidden" name="work_name" value="{{ keyword }}">
    <input type="submit" value="Скачать в Excel">
</form>
<table>
    <tr>
        <th>Название</th>
        <th>Зарплата от</th>
        <th>Зарплата до</th>
        <th>Валюта</th>
        <th>Город</th>
        <th>Описание</th>
        <th>Ссылка</th>
    </tr>
    {% for vacancy in vacancies %}
    <tr>
        <td>{{ vacancy.name }}</td>
        <td>{{ vacancy.salary_from }}</td>
        <td>{{ vacancy.salary_to }}</td>
        <td>{{ vacancy.currency }}</td>
        <td>{{ vacancy.city }}</td>
        <td>{{ vacancy.discription }}</td>
        <td><a href="{{ vacancy.link }}" class="vacancy-link">{{ vacancy.link }}</a></td>
    </tr>
    {% endfor %}
</table>
{% elif message %}
<p>{{ message }}</p>
{% endif %}
<form method="post" action="/download">
    <input type="hidden" name="work_name" value="{{ keyword }}">
    <input type="submit" value="Скачать в Excel">
</form>
<!--{#</table>#}-->
</body>
</html>

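The problem is that the parsing results shown on screen do not match the data exported to the Excel file. When I search for different keywords, the data for each keyword is displayed on screen correctly, but the data exported to the Excel file is always the same, although it should correspond to what is shown on screen. I don't understand why this happens; if anyone knows, please tell me what the problem is.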

Project link: https://abram742.pythonanywhere.com/
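
One reading of the code above, not verified against the deployed project: index() never passes keyword to render_template, so {{ keyword }} in the hidden work_name field renders as an empty string. The /download route would then call search_vacancies(''), and '' in vacancy_name is True for every string, so the export would contain whatever the API returns regardless of the search. The sketch below reproduces only that filter behaviour with plain strings; filter_names and filter_names_strict are hypothetical helpers.

```python
def filter_names(names, keyword):
    """Keep names containing the keyword, using the same `in` test as the question."""
    return [name for name in names if keyword in name.lower()]


def filter_names_strict(names, keyword):
    """Same filter, but a blank keyword matches nothing and case is ignored."""
    keyword = keyword.strip().lower()
    if not keyword:
        return []
    return [name for name in names if keyword in name.lower()]


names = ["Python developer", "Accountant", "Driver"]

print(filter_names(names, "python"))   # a real keyword narrows the list
print(filter_names(names, ""))         # an empty hidden field matches every name
print(filter_names_strict(names, ""))  # a blank keyword is rejected
```

If this reading is right, the corresponding change in the Flask app would be to pass keyword=keyword to render_template in index(), so the hidden field carries the same value the page was built from.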

python
  • 1 answer
  • 39 Views
Abram
Asked: 2024-03-08 23:48:05 +0000 UTC

Using multiple except clauses in Python code still raises an error

  • 5

Using multiple except clauses in Python code still raises an error. If anyone knows, please tell me what the reason is:

import bs4
import time
import requests  # pip install requests
from bs4 import BeautifulSoup  # pip install bs4
from selenium import webdriver  # pip install selenium
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager  # pip install webdriver-manager

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument(
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36')
chrome_options.add_argument("--disable-blink-features=AutomationControlled")

with webdriver.Chrome(service=Service(ChromeDriverManager().install()),
                      options=chrome_options) as driver:  # Open Chrome
    driver.get(f"https://www.stroyportal.ru/catalog/section-bolty-3128/st60/")  # Open the page
    time.sleep(3)  # Time for the page to load
    soup = bs4.BeautifulSoup(driver.page_source, 'html.parser')
    heads = soup.find('div', class_='col-12 col-xs-12 order-2 order-xs-2 order-sm-2 catalog_list_items').find_all(
        'div', class_='catalog_list_item')
    print(len(heads))
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
    for i in heads:
        try:
            w = i.find_next('div', class_='format_img').find('a', href=True)
            print('https://www.stroyportal.ru' + w['href'])
            get_url = ('https://www.stroyportal.ru' + w['href'])
        except:
            get_url = "https://www.stroyportal.ru/catalog/section-bolty-3128/gayka-soedinitelnaya-m8-702404271/?popup=true"
            break

        stock = requests.get(get_url, headers=headers).text
        moon = BeautifulSoup(stock, 'lxml')
        try:
            name = moon.find('div', class_='font-0 vertical-m').find('h1',
                                                                     class_='d-inline inline-right-16 font-bold font-24')
            print(name.text.strip())
        except Exception:
            name = moon.find('div', class_='col-7 col-sm-6 col-xs-12').find('h1', class_='font-22 font-bold')
            print(name.text.strip())
        except:
            name = 'None'
            print(name)

Traceback (most recent call last):
  File "C:\Users\user\PycharmProjects\stroyportal_pars\main_3.py", line 36, in <module>
    name = moon.find('div', class_='font-0 vertical-m').find('h1',
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'find'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\user\PycharmProjects\stroyportal_pars\main_3.py", line 40, in <module>
    name = moon.find('div', class_='col-7 col-sm-6 col-xs-12').find('h1', class_='font-22 font-bold')
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'find'
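
The traceback is consistent with how a try statement works: the except clauses attached to one try only handle exceptions raised in the try body. An exception raised inside the except Exception block is not passed on to the bare except: below it, and since AttributeError is already matched by except Exception, that last clause is never reached here. A minimal sketch of one alternative follows, giving each lookup its own try; first_text, Node, missing and found are hypothetical stand-ins for the BeautifulSoup calls.

```python
def first_text(lookups, default="None"):
    """Return the stripped text of the first lookup that succeeds."""
    for lookup in lookups:
        try:
            return lookup().text.strip()
        except AttributeError:
            # The lookup returned None somewhere along the chain; try the next one.
            continue
    return default


class Node:
    """Hypothetical stand-in for a parsed tag with a .text attribute."""

    def __init__(self, text):
        self.text = text


def missing():
    # Behaves like a find() call that matched nothing.
    return None


def found():
    return Node("  Bolt M8  ")


print(first_text([missing, found]))
print(first_text([missing, missing]))
```

With BeautifulSoup, each entry in lookups could be a small function or lambda wrapping one of the moon.find(...).find(...) chains from the question.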

python
  • 1 answer
  • 38 Views
Abram
Asked: 2023-10-07 14:29:49 +0000 UTC

Python program does not execute the second part

  • 4

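There is a Python program for registering cars and drivers.
The problem is that only the first part runs; the second part throws an error.

I don't understand what is wrong. If anyone knows, please tell me.

My code: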

class Car:
    def __init__(self, license_plate, brand, model, color):
        self.license_plate = license_plate
        self.brand = brand
        self.model = model
        self.color = color


class Driver:
    def __init__(self, name, age, license_number):
        self.name = name
        self.age = age
        self.license_number = license_number


class ParkingLot:
    def __init__(self):
        self.cars = []
        self.drivers = []

    def add_car(self, car):
        self.cars.append(car)
        print(f"Машина {car.license_plate} добавлена в учет.")

    def add_driver(self, driver):
        self.drivers.append(driver)
        print(f"Водитель {driver.name} добавлен в учет.")

    def remove_car(self, car):
        self.cars.remove(car)
        print(f"Машина {car.license_plate} удалена из учета.")

    def remove_driver(self, driver):
        self.drivers.remove(driver)
        print(f"Водитель {driver.name} удален из учета.")

    def get_car_by_license_plate(self, license_plate):
        for car in self.cars:
            if car.license_plate == license_plate:
                return car
        return None

    def get_driver_by_license_number(self, license_number):
        for driver in self.drivers:
            if driver.license_number == license_number:
                return driver
        return None


while True:
    # Example of using the program
    parking_lot = ParkingLot()
    print('enter number car')
    nu_car = input()
    print('enter color car')
    color = input()
    print('enter firm car')
    firm_car = input()
    print('enter model car')
    model = input()
    print('enter first last name')
    name = input()
    print('enter age driver')
    age = int(input())
    print('enter drivedoc')
    docc = int(input())
    car1 = Car(f"{nu_car}", f"{firm_car}", f"{model}", f"{color}")
    driver1 = Driver(f"{name}", f"{age}", f"{docc}")
    print('Выберите соответсвующую цифру: 1-добавить в учёт, 2-удалить из учёта')
    count = int(input())
    if count == 1:
        # Add the car and the driver to the register
        parking_lot.add_car(car1)
        parking_lot.add_driver(driver1)
    elif count == 2:
        # Remove the car and the driver from the register
        parking_lot.remove_car(car1)
        parking_lot.remove_driver(driver1)
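
A possible explanation, assuming the error in question is the ValueError raised by list.remove: parking_lot = ParkingLot() runs at the start of every loop iteration, so both lists are empty whenever option 2 is chosen, and car1 is a freshly created object that was never added. The sketch below creates the lot once and removes a car by its license plate through the existing lookup method. The classes are condensed from the question, and remove_car_by_plate is a hypothetical addition.

```python
class Car:
    def __init__(self, license_plate, brand, model, color):
        self.license_plate = license_plate
        self.brand = brand
        self.model = model
        self.color = color


class ParkingLot:
    def __init__(self):
        self.cars = []

    def add_car(self, car):
        self.cars.append(car)

    def get_car_by_license_plate(self, license_plate):
        for car in self.cars:
            if car.license_plate == license_plate:
                return car
        return None

    def remove_car_by_plate(self, license_plate):
        """Remove the stored car with this plate; return False if it is absent."""
        car = self.get_car_by_license_plate(license_plate)
        if car is None:
            return False
        self.cars.remove(car)
        return True


# Created once, outside any input loop, so registrations persist between actions.
parking_lot = ParkingLot()
parking_lot.add_car(Car("A123BC", "Lada", "Vesta", "white"))

print(parking_lot.remove_car_by_plate("A123BC"))  # the stored car is found and removed
print(parking_lot.remove_car_by_plate("A123BC"))  # nothing left to remove
print(len(parking_lot.cars))
```

The same pattern would apply to drivers through get_driver_by_license_number.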
python
  • 1 answer
  • 47 Views
Abram
Asked: 2023-09-13 02:00:11 +0000 UTC

Data is not written to an Excel table using pandas, Python

  • 5

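The data is not written to the Excel table; only the last row is written. I tried placing the write in different positions, with the same result. If anyone knows, please tell me where the error is.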

import pandas as pd
import requests
from bs4 import BeautifulSoup

while True:
    print('Выберите категорию товара: легковые шины, грузовые шины, внедорожные шины')
    name = input()
    if name == 'легковые шины':
        get_name = 'https://samara.express-shina.ru/search/legkovyie-shinyi'
    elif name == 'грузовые шины':
        get_name = 'https://samara.express-shina.ru/search/gruzovyie-shinyi'
    elif name == 'внедорожные шины':
        get_name = 'https://samara.express-shina.ru/search/vnedorozhnyie-shinyi'
    else:
        print('Такой категории нет')

    print('Введите название файла латинскими буквами')
    file_name = str(input())

    count = 1
    while count <= 3:
        url = f'{get_name}?num={count}'
        data = requests.get(url).text
        block = BeautifulSoup(data, 'lxml')
        heads = block.find_all('div', class_='b-offer__boxes')
        for i in heads:
            get_url = i.find_next('a').get('href')
            # print('https://samara.express-shina.ru'+get_url)
            w = ('https://samara.express-shina.ru' + get_url)
            seac = requests.get(w).text
            look = BeautifulSoup(seac, 'lxml')
            leen = look.find('div', class_='header_product_page').find('h1')
            print(leen.text.strip())
            nazvan = (leen.text.strip())
            price = look.find('span', class_='price_new')
            print(price.text.strip())
            cena = (price.text.strip())
            articul = look.find('span', class_='articul')
            print(articul.text.strip())
            codde = (articul.text.strip())
            img = look.find('div', class_='inner_images').find('img').get('src')
            print('https://samara.express-shina.ru' + img)
            pixx = ('https://samara.express-shina.ru' + img)
            print('\n')

            storage = {'zagol': nazvan,
                       'cena': cena,
                       'articul': codde,
                       'img': pixx}

            df = pd.DataFrame({
                'NAME': [storage['zagol']],
                'PRICE': [storage['cena']],
                'ARTICUL': [storage['articul']],
                'IMG': [storage['img']]
            })
            df.to_excel(f'{file_name}.xlsx')

        count += 1
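
A likely cause, stated as an assumption: df.to_excel(...) is called inside the for loop with a one-row DataFrame, and each call overwrites the same file, so only the last product survives. The sketch below collects one dict per product in a list and writes the file once, after all loops have finished. collect_rows and save_rows are hypothetical helpers, and the scraped values are replaced with made-up tuples.

```python
def collect_rows(items):
    """Turn (name, price, article, image) tuples into one dict per product."""
    rows = []
    for name, price, article, image in items:
        rows.append({"NAME": name, "PRICE": price, "ARTICUL": article, "IMG": image})
    return rows


def save_rows(rows, file_name):
    """Write all collected rows to a single Excel file in one call."""
    # Imported here so the collecting step above runs even without pandas installed.
    import pandas as pd

    pd.DataFrame(rows).to_excel(f"{file_name}.xlsx", index=False)


# Made-up sample values standing in for the scraped product pages.
items = [
    ("Tyre A", "5 000", "111", "/img/a.jpg"),
    ("Tyre B", "6 200", "222", "/img/b.jpg"),
]
rows = collect_rows(items)
print(len(rows))
```

In the question's script, save_rows(rows, file_name) would be called once, after the while count <= 3 loop ends.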
python
  • 2 answers
  • 33 Views
