Python Tips - Scrape a Web Page and Save It as PDF

Notes on the key steps

1. Install the requests and parsel libraries

pip install requests parsel

2. Fetch the page content

Send a request:


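import requests

# Fill in the target URL and the header values copied from the browser's developer tools
url = ''

headers = {
    'Host': '',
    'Referer': '',
    'User-Agent': '',
    # requests.get() has no `cookie` argument; a raw cookie string goes in the Cookie header
    # (the separate `cookies=` argument expects a {name: value} dict instead)
    'Cookie': ''
}

response = requests.get(url, headers=headers)
response.raise_for_status()  # stop early on 4xx/5xx responses

print(response.text)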

response.text now holds the HTML of the page.

3. Install pdfkit and wkhtmltopdf


pip install pdfkit

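wkhtmltopdf is a standalone program, not a Python package, so pip cannot install it; download it from the official site: https://wkhtmltopdf.org/downloads.html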
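pdfkit only wraps the wkhtmltopdf executable and fails if it cannot find it. A minimal stdlib-only sketch for checking whether wkhtmltopdf is already on `PATH` before hard-coding its location; `find_wkhtmltopdf` and `wkhtmltopdf_version` are hypothetical helper names.

```python
import shutil
import subprocess


def find_wkhtmltopdf():
    """Return the full path of wkhtmltopdf if it is on PATH, else None."""
    # shutil.which also resolves the .exe suffix on Windows (via PATHEXT)
    return shutil.which('wkhtmltopdf')


def wkhtmltopdf_version(path):
    """Run `wkhtmltopdf --version` and return its output."""
    result = subprocess.run([path, '--version'], capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == '__main__':
    path = find_wkhtmltopdf()
    if path is None:
        print('wkhtmltopdf not found on PATH; pass its full path to pdfkit.configuration()')
    else:
        print(path, wkhtmltopdf_version(path))
```

If the helper returns `None`, the explicit `pdfkit.configuration(wkhtmltopdf=...)` path shown in step 4 is needed.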

4. Convert to PDF

Point pdfkit at the wkhtmltopdf executable, then convert a local HTML file:


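import pdfkit

# Full path to the wkhtmltopdf executable (needed when it is not on PATH)
config = pdfkit.configuration(wkhtmltopdf='xxx/xxx/wkhtmltopdf.exe')

# Convert a local HTML file into a PDF
pdfkit.from_file('xx.html', 'xxx.pdf', configuration=config)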