Converting PDF to Images with Python

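Visual debugging with pdfplumber
This approach uses pdfplumber, a Python library built on pdfminer.six.
Converting a PDF to images with pdfplumber is quick and simple. pdfplumber also supports visual debugging of PDF content extraction, as shown in the figure above.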
import pdfplumber

# Render every page to a PNG at 150 DPI; files are named 0.png, 1.png, ...
with pdfplumber.open("ccf-2019.pdf") as pdf:
    for i, page in enumerate(pdf.pages):
        page.to_image(resolution=150).save('{}.png'.format(i))
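The visual debugging support mentioned above can be sketched roughly as follows. This is a minimal sketch, not a definitive recipe: `debug_image_name` and `save_word_debug_image` are hypothetical helpers (not part of pdfplumber), and the output naming scheme is an assumption. It renders one page and outlines every extracted word using pdfplumber's `to_image`, `extract_words`, and `draw_rects`.

```python
from pathlib import Path


def debug_image_name(pdf_path, page_number):
    """Build an output name such as 'ccf-2019-page1-debug.png' (hypothetical scheme)."""
    return "{}-page{}-debug.png".format(Path(pdf_path).stem, page_number)


def save_word_debug_image(pdf_path, page_number=1, resolution=150):
    """Render one page and outline every extracted word (page_number is 1-based)."""
    # Imported here so the naming helper works even without pdfplumber installed.
    import pdfplumber

    with pdfplumber.open(pdf_path) as pdf:
        page = pdf.pages[page_number - 1]
        image = page.to_image(resolution=resolution)
        # extract_words() returns dicts with bounding-box keys that draw_rects accepts.
        image.draw_rects(page.extract_words())
        out_name = debug_image_name(pdf_path, page_number)
        image.save(out_name)
    return out_name
```

For example, `save_word_debug_image("ccf-2019.pdf", page_number=1)` would write `ccf-2019-page1-debug.png` to the current directory, which makes it easy to see which regions the word extraction actually picked up.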
Converting PDF to Images on Linux
On Linux, the pdftoppm command-line tool makes it easy to convert a PDF to images. pdftoppm is part of the poppler-utils package.
Install:
sudo apt install poppler-utils
Usage:
pdftoppm -png demo.pdf <image-name-prefix>
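If the conversion needs to be driven from Python, the command above can be wrapped with `subprocess`. The following is a minimal sketch under stated assumptions: `build_pdftoppm_command` and `pdf_to_png` are hypothetical helper names, pdftoppm is assumed to be on the PATH, and the output files are collected by matching the prefix rather than by assuming an exact numbering format.

```python
import subprocess
from pathlib import Path


def build_pdftoppm_command(pdf_path, prefix, dpi=150, first=None, last=None):
    """Return the argv list for a PNG conversion with pdftoppm."""
    cmd = ["pdftoppm", "-png", "-r", str(dpi)]
    if first is not None:
        cmd += ["-f", str(first)]  # first page to convert
    if last is not None:
        cmd += ["-l", str(last)]  # last page to convert
    cmd += [str(pdf_path), str(prefix)]
    return cmd


def pdf_to_png(pdf_path, prefix, **options):
    """Run pdftoppm and return the PNG files whose names start with the prefix."""
    # check=True raises CalledProcessError if pdftoppm exits with an error.
    subprocess.run(build_pdftoppm_command(pdf_path, prefix, **options), check=True)
    prefix_path = Path(prefix)
    return sorted(prefix_path.parent.glob(prefix_path.name + "*.png"))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces. A call such as `pdf_to_png("demo.pdf", "page", first=1, last=3)` would convert only the first three pages.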
pdftoppm provides many options, such as cropping, scaling, resolution, and page range selection.
Usage: pdftoppm [options] [PDF-file [PPM-file-prefix]]
-f <int> : first page to print
-l <int> : last page to print
-o : print only odd pages
-e : print only even pages
-singlefile : write only the first page and do not add digits
-r <fp> : resolution, in DPI (default is 150)
-rx <fp> : X resolution, in DPI (default is 150)
-ry <fp> : Y resolution, in DPI (default is 150)
-scale-to <int> : scales each page to fit within scale-to*scale-to pixel box
-scale-to-x <int> : scales each page horizontally to fit in scale-to-x pixels
-scale-to-y <int> : scales each page vertically to fit in scale-to-y pixels
-x <int> : x-coordinate of the crop area top left corner
-y <int> : y-coordinate of the crop area top left corner
-W <int> : width of crop area in pixels (default is 0)
-H <int> : height of crop area in pixels (default is 0)
-sz <int> : size of crop square in pixels (sets W and H)
-cropbox : use the crop box rather than media box
-mono : generate a monochrome PBM file
-gray : generate a grayscale PGM file
-png : generate a PNG file
-jpeg : generate a JPEG file
-jpegopt <string> : jpeg options, with format <opt1>=<val1>[,<optN>=<valN>]*
-tiff : generate a TIFF file
-tiffcompression <string> : set TIFF compression: none, packbits, jpeg, lzw, deflate
-freetype <string> : enable FreeType font rasterizer: yes, no
-thinlinemode <string> : set thin line mode: none, solid, shape. Default: none
-aa <string> : enable font anti-aliasing: yes, no
-aaVector <string> : enable vector anti-aliasing: yes, no
-opw <string> : owner password (for encrypted files)
-upw <string> : user password (for encrypted files)
-q : don't print any messages or errors
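The crop options listed above take pixel values, which depend on the chosen resolution. As a rough illustration, the sketch below converts a crop box measured in PDF points (1 point = 1/72 inch) into pdftoppm arguments. `crop_args` is a hypothetical helper, and it assumes that -x and -y are interpreted in pixels at the selected resolution, like -W and -H, with the origin at the top left corner of the page.

```python
def crop_args(x_pt, y_pt, width_pt, height_pt, dpi=150):
    """Convert a crop box given in PDF points into pdftoppm pixel arguments."""

    def to_px(value_pt):
        # One PDF point is 1/72 inch, so pixels = points * dpi / 72.
        return str(round(value_pt * dpi / 72))

    return [
        "-r", str(dpi),
        "-x", to_px(x_pt),
        "-y", to_px(y_pt),
        "-W", to_px(width_pt),
        "-H", to_px(height_pt),
    ]


# Example: a 2 x 1 inch region starting 1 inch from the top left corner.
command = ["pdftoppm", "-png", *crop_args(72, 72, 144, 72), "demo.pdf", "crop"]
```

Keeping the resolution and the crop box in one helper avoids the common mistake of changing -r without rescaling the crop coordinates.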