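I threw this together quickly, so its use cases are quite limited; extend it yourselves as you see fit. I needed to migrate an article and even force-copying did not work, so I wrote this thing.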
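import re

import requests
from lxml import etree

post_url = input('Enter the article URL: ')
# Fetch the page at the article URL
res = requests.get(post_url)
xx = res.content.decode('utf-8')
x = etree.HTML(xx)

# The XPath of the parent (container) element is required
# XPath example: //*[@id="article-container"]
# If you don't know how to find it, search online
xpath = input('Enter the XPath (you can look it up in the browser console): ')
content = x.xpath(xpath + '//*')

# Strip class/id attributes; [^"]* stops at the first closing quote
ree = re.compile(r'\s(?:class|id)="[^"]*"')
# Match src values that are relative paths starting with "/"
urll = re.compile(r'(?<=src=")/.*?(?=")')

with open('result.txt', 'w', encoding='utf-8') as file:
    tep1 = ''
    for i in content:
        tep = etree.tostring(i, encoding='utf-8').decode('utf-8').strip()
        tep = re.sub(ree, '', tep)
        strr = re.search(urll, tep)
        # If an image uses a relative path, replace it with an absolute one.
        # You have to find the right base address yourself and edit it here.
        # Only the leading part is needed, e.g. https://dreamtea.top
        # Test it against your own site.
        if strr is not None:
            # The matched path already starts with "/", so no extra slash is added
            tep = urll.sub(lambda m: 'https://cdn.con' + m.group(), tep)
            # print(tep)
        # Skip an element that was already written as part of the previous one
        if tep != tep1 and tep in tep1:
            # print(tep)
            continue
        file.write(tep)
        tep1 = tep
print('Export finished!')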
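The attribute-stripping step can be tried on its own, without fetching a page. The following is only a minimal sketch: the helper name `strip_class_and_id` and the sample tag are made up for illustration. It assumes attributes are written with double quotes, and it uses a pattern that stops at the first closing quote so that neighbouring attributes such as `src` are left alone.

```python
import re

# Matches a class or id attribute (with its leading whitespace),
# stopping at the first closing double quote.
ATTR_RE = re.compile(r'\s+(?:class|id)="[^"]*"')


def strip_class_and_id(fragment: str) -> str:
    """Remove class="..." and id="..." attributes from an HTML fragment.

    Hypothetical helper for illustration; assumes double-quoted attributes.
    """
    return ATTR_RE.sub('', fragment)


sample = '<img class="lazy" src="/img/a.png" id="cover">'
print(strip_class_and_id(sample))  # <img src="/img/a.png">
```

Because the pattern requires whitespace right before `class` or `id`, an attribute like `data-id` is not touched.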
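This could be extended into something more automated, but I'm lazy. I hope some expert with time on their hands will extend it, so that I can borrow (copy) it~~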
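One possible step in that direction, as a rough sketch rather than a finished solution: instead of editing the base address by hand, relative image paths could be resolved against the article URL with the standard library's `urllib.parse.urljoin`. This assumes the images are served from the same site as the article (it will not help if they live on a separate CDN). The function name `absolutize_src` and the sample post path are invented for the example.

```python
import re
from urllib.parse import urljoin

# Matches the (non-empty) value of any src="..." attribute.
SRC_RE = re.compile(r'(?<=src=")[^"]+(?=")')


def absolutize_src(fragment: str, page_url: str) -> str:
    """Resolve every src value in the fragment against page_url.

    Hypothetical helper; already-absolute URLs are returned unchanged by urljoin.
    """
    return SRC_RE.sub(lambda m: urljoin(page_url, m.group()), fragment)


html = '<img src="/img/a.png"><img src="https://example.com/b.png">'
print(absolutize_src(html, 'https://dreamtea.top/posts/demo/'))
# <img src="https://dreamtea.top/img/a.png"><img src="https://example.com/b.png">
```

Since the article URL is already typed in at the start of the script, it could be passed in as `page_url` directly, which would remove one manual step.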