基于ddddocr库的验证码识别

背景

Ebay公益服的持续使用往往需要通过签到赚取积分来维持,偶尔因忙于他事容易忘记,故在翻阅他人资料后着手开发。

依赖准备

  • ddddocr 用于验证码识别
  • telethon 同telegram交互
  • asyncio 异步任务处理
  • requests 通知请求
# 安装
pip install ddddocr
# 安装时发生了报错
AttributeError: module 'PIL.Image' has no attribute 'ANTIALIAS'

在github上发现有issue解决方案

#按照方案二处理,降级Pillow的版本,比如使用9.5.0版本(先卸载,再重新安装)
pip uninstall -y Pillow
pip install Pillow==9.5.0

其他正常安装即可

ddddocr 使用

import ddddocr

ocr = ddddocr.DdddOcr(beta=True,show_ad=False) #开启测试版 关闭广告

with open("test.jpg", 'rb') as f:
    image = f.read()

res = ocr.classification(image)
print(res)

图片示例:

识别图片示例

其他一些用法

实现

实现路径为

  • 通过与telegram交互发送签到/checkin 命令,
  • 获取验证码,
  • 通过captcha_solver识别验证码,
  • 最后交互发送识别结果,
  • 判断结果信息,失败重试。
# -*- coding: utf-8 -*-
import os
import time
from telethon import TelegramClient, events
import ddddocr
import asyncio
import requests
def notice(text):
    api_url = "https://noticurl.com/send_notification"
    data = {
        "chat_id": "{chat_id}",
        "text": text
    }
    response = requests.post(api_url, json=data)
    return print(response.json())

def captcha_solver(f):
    with open(f, 'rb') as image_file: 
        image_bytes = image_file.read()
        ocr = ddddocr.DdddOcr(beta=True, show_ad=False)
        res = ocr.classification(image_bytes)
    return res

async def tg_qd(client, tg_bot, tg_command):
    await client.send_message(tg_bot, tg_command)
    await asyncio.sleep(5)  # 使用 asyncio.sleep 代替 time.sleep
    messages = await client.get_messages(tg_bot)
    await messages[0].download_media(file="1.jpg")
    the_code = captcha_solver("1.jpg")
    await client.send_message(tg_bot, the_code)
    await asyncio.sleep(5)
    messages = await client.get_messages(tg_bot)
    return messages[0].message

api_id = [{api_id}]  # 输入api_id,一个账号一项
api_hash = ['{api_hash}']  # 输入api_hash,一个账号一项
session_name = api_id[:]
bots_commands = ["@{channelname}", "/checkin", "成功","/cancle"]

async def main():
    for num in range(len(api_id)):
        session_name[num] = "id_" + str(session_name[num])
        client = TelegramClient(
            session_name[num],
            api_id[num],
            api_hash[num],
            proxy=("socks5", "127.0.0.1", 1087), #本地运行开启代理
        )
        try:
            await client.start()
            the_result = await tg_qd(client, bots_commands[0], bots_commands[1])
            print(the_result)
            i = 0
            while bots_commands[2] not in the_result: 
                i += 1
                await client.send_message(bots_commands[0], bots_commands[3])
                await asyncio.sleep(5)
                the_result = await tg_qd(client, bots_commands[0], bots_commands[1])
                if i > 2: 
                    break
            text = the_result + "终点站"
            notice(text)
        except Exception as err:
            error_message = f"Error in main(): {str(err)}"
            notice(error_message)
        finally:
            await client.disconnect()

asyncio.run(main())

其他

Truecaptcha

本来想通过truecaptcha来实现验证码识别

import requests
import base64

def solve(f):
	with open(f, "rb") as image_file:
		encoded_string = base64.b64encode(image_file.read()).decode('ascii')
		url = 'https://api.apitruecaptcha.org/one/gettext'

		data = { 
			'userid':'{id}', #填入id
			'apikey':'{key}',  #填入key
			'data':encoded_string
		}
		response = requests.post(url = url, json = data)
		data = response.json()
		return data

但是之前的单日100的限额变成了一次。

另外试了一下改服务对于中文识别不太行,英文数字还可以

telegram交互

后续还需要多研究研究!