Converting PDF to Images with Python

Visual debugging with pdfplumber

This approach uses pdfplumber, a Python library built on pdfminer.six.

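Converting a PDF to images with pdfplumber is quick and simple. pdfplumber also provides visual debugging support for PDF content extraction, as shown in the figure above.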

import pdfplumber

# Render every page at 150 DPI and save it as 0.png, 1.png, ...
with pdfplumber.open("ccf-2019.pdf") as pdf:
    for i, page in enumerate(pdf.pages):
        page.to_image(resolution=150).save('{}.png'.format(i))
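The visual debugging mentioned above can be sketched roughly as follows. This is a minimal sketch, not the only way to do it: the helper names `page_image_name` and `save_debug_images` and the output naming scheme are assumptions made for illustration, while `to_image`, `extract_words`, `draw_rects`, and `save` are pdfplumber calls. Each page is rendered and the bounding box of every extracted word is outlined on top of it, which makes it easy to see what the text extractor actually found.

```python
from pathlib import Path


def page_image_name(stem: str, index: int, total: int) -> str:
    """Build a zero-padded, 1-based PNG name (hypothetical naming scheme)."""
    width = len(str(total))
    return f"{stem}-{index + 1:0{width}d}.png"


def save_debug_images(pdf_path: str, out_dir: str = ".", resolution: int = 150) -> list:
    """Render each page with word bounding boxes drawn over it."""
    # Imported here so the naming helper can be used without pdfplumber installed.
    import pdfplumber

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with pdfplumber.open(pdf_path) as pdf:
        total = len(pdf.pages)
        for i, page in enumerate(pdf.pages):
            image = page.to_image(resolution=resolution)
            # Outline every word the extractor found on this page.
            image.draw_rects(page.extract_words())
            target = out / page_image_name(Path(pdf_path).stem, i, total)
            image.save(str(target))
            written.append(target)
    return written
```

Calling save_debug_images("ccf-2019.pdf", "debug") would write one annotated PNG per page into the debug directory. Zero-padding the page number keeps the files in page order when sorted by name.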

Converting PDF to Images on Linux

On Linux, the pdftoppm command-line tool makes it easy to convert a PDF to images. pdftoppm is part of the poppler-utils package.

Install:

sudo apt install poppler-utils

Usage:

pdftoppm -png demo.pdf <image-name-prefix>

pdftoppm provides many options, such as cropping the image, scaling, setting the resolution, and choosing which pages to output.

Usage: pdftoppm [options] [PDF-file [PPM-file-prefix]]
  -f               : first page to print
  -l               : last page to print
  -o               : print only odd pages
  -e               : print only even pages
  -singlefile      : write only the first page and do not add digits
  -r               : resolution, in DPI (default is 150)
  -rx              : X resolution, in DPI (default is 150)
  -ry              : Y resolution, in DPI (default is 150)
  -scale-to        : scales each page to fit within scale-to*scale-to pixel box
  -scale-to-x      : scales each page horizontally to fit in scale-to-x pixels
  -scale-to-y      : scales each page vertically to fit in scale-to-y pixels
  -x               : x-coordinate of the crop area top left corner
  -y               : y-coordinate of the crop area top left corner
  -W               : width of crop area in pixels (default is 0)
  -H               : height of crop area in pixels (default is 0)
  -sz              : size of crop square in pixels (sets W and H)
  -cropbox         : use the crop box rather than media box
  -mono            : generate a monochrome PBM file
  -gray            : generate a grayscale PGM file
  -png             : generate a PNG file
  -jpeg            : generate a JPEG file
  -jpegopt         : jpeg options, with format <opt1>=<val1>[,<optN>=<valN>]*
  -tiff            : generate a TIFF file
  -tiffcompression : set TIFF compression: none, packbits, jpeg, lzw, deflate
  -freetype        : enable FreeType font rasterizer: yes, no
  -thinlinemode    : set thin line mode: none, solid, shape. Default: none
  -aa              : enable font anti-aliasing: yes, no
  -aaVector        : enable vector anti-aliasing: yes, no
  -opw             : owner password (for encrypted files)
  -upw             : user password (for encrypted files)
  -q               : don't print any messages or errors
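If pdftoppm needs to be driven from a Python script, the options above can be assembled into a command line and run with the standard subprocess module. The following is only a sketch under stated assumptions: the function names `build_pdftoppm_cmd` and `convert` are hypothetical, only the -png/-jpeg/-tiff, -r, -f, and -l flags from the listing above are used, and pdftoppm is assumed to be installed and on the PATH.

```python
import subprocess


def build_pdftoppm_cmd(pdf_path, prefix, fmt="png", dpi=150, first=None, last=None):
    """Assemble a pdftoppm argument list (hypothetical helper, runs nothing)."""
    if fmt not in {"png", "jpeg", "tiff"}:
        raise ValueError("fmt must be 'png', 'jpeg', or 'tiff'")
    cmd = ["pdftoppm", f"-{fmt}", "-r", str(dpi)]
    if first is not None:
        cmd += ["-f", str(first)]  # first page to convert
    if last is not None:
        cmd += ["-l", str(last)]  # last page to convert
    # Positional arguments: the input PDF, then the output file prefix.
    cmd += [pdf_path, prefix]
    return cmd


def convert(pdf_path, prefix, **options):
    """Run pdftoppm; raises CalledProcessError if it exits with an error."""
    subprocess.run(build_pdftoppm_cmd(pdf_path, prefix, **options), check=True)
```

For example, convert("demo.pdf", "page", dpi=300, first=1, last=3) would run pdftoppm on pages 1 to 3 at 300 DPI. Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.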