gummy.utils.pdf_utils module
Utility programs for handling and analyzing PDF files.
-
gummy.utils.pdf_utils.get_pdf_pages(file, dirname='/Users/iwasakishuto/.gummy')
Get PDF pages.
- Parameters
file (data, str) – URL, path, or raw data of the PDF.
dirname (str) – If file is a URL, the PDF is downloaded and saved to dirname. Defaults to GUMMY_DIR.
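The file/dirname contract can be sketched with the standard library alone. The helper resolve_pdf_source below is hypothetical (it is not part of gummy): it treats bytes as raw PDF data, an http(s) string as a URL to download into dirname, and any other string as a local path. Naming the saved file after the last URL path segment is an assumption; the docstring does not say how gummy names downloads.

```python
import os
import urllib.parse
import urllib.request


def resolve_pdf_source(file, dirname):
    """Hypothetical helper: return raw PDF bytes for `file`.

    `file` may be raw bytes, an http(s) URL, or a local path.
    URLs are downloaded and saved under `dirname` before being read.
    """
    # Raw data: nothing to resolve.
    if isinstance(file, (bytes, bytearray)):
        return bytes(file)
    parsed = urllib.parse.urlparse(file)
    if parsed.scheme in ("http", "https"):
        # Assumption: name the saved file after the last URL path segment.
        filename = os.path.basename(parsed.path) or "download.pdf"
        os.makedirs(dirname, exist_ok=True)
        path = os.path.join(dirname, filename)
        urllib.request.urlretrieve(file, path)
    else:
        # Anything else is treated as a local path.
        path = file
    with open(path, mode="rb") as f:
        return f.read()
```

Branching on the input type first keeps the raw-data case free of any filesystem or network access.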
-
gummy.utils.pdf_utils.parser_pdf_pages(layout_objs)
Parse PDF pages and get contents in order.
- Parameters
layout_objs (list) – Each element is a pdfminer.layout object.
- Returns
Each element is a list of the form [text, bbox], where bbox is (x0, y0, x1, y1).
- Return type
list
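The docstring does not define what "in order" means. A common convention, assumed here rather than taken from gummy, is reading order: top-to-bottom (PDF y coordinates grow upward, so a larger top edge y1 comes first), then left-to-right. The sketch uses a small stand-in class TextBox that exposes the bbox attribute and get_text() method pdfminer's LTTextBox objects also provide; parse_in_reading_order is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class TextBox:
    """Stand-in for a pdfminer LTTextBox: text plus bbox (x0, y0, x1, y1)."""
    text: str
    bbox: tuple

    def get_text(self):
        return self.text


def parse_in_reading_order(layout_objs):
    """Return [[text, bbox], ...] sorted top-to-bottom, then left-to-right.

    Assumption: only objects exposing get_text() carry text; others
    (figures, lines, ...) are skipped.
    """
    texts = [obj for obj in layout_objs if hasattr(obj, "get_text")]
    # PDF origin is bottom-left, so a larger top edge (y1) is higher on the page.
    texts.sort(key=lambda obj: (-obj.bbox[3], obj.bbox[0]))
    return [[obj.get_text().strip(), obj.bbox] for obj in texts]


boxes = [
    TextBox("Body text\n", (72, 500, 540, 600)),
    TextBox("Title\n", (72, 700, 540, 740)),
    TextBox("Footer\n", (72, 40, 540, 60)),
]
print(parse_in_reading_order(boxes))
# [['Title', (72, 700, 540, 740)], ['Body text', (72, 500, 540, 600)], ['Footer', (72, 40, 540, 60)]]
```

A plain top-to-bottom sort interleaves boxes from side-by-side columns, so multi-column papers would need column detection on top of this.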
-
gummy.utils.pdf_utils.get_pdf_contents(file, dirname='/Users/iwasakishuto/.gummy')
Get PDF contents.
- Parameters
file (data, str) – URL, path, or raw data of the PDF.
dirname (str) – If file is a URL, the PDF is downloaded and saved to dirname. Defaults to GUMMY_DIR.
- Returns
Each element is a list of the form [text, bbox], where bbox is (x0, y0, x1, y1).
- Return type
list
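Because each returned element pairs text with its bounding box, callers can filter by position. The helper contents_in_region and the sample contents list below are hypothetical; they only mimic the documented [text, bbox] shape, with coordinates in PDF points.

```python
def contents_in_region(contents, region):
    """Keep the text of [text, bbox] items whose bbox lies entirely inside `region`.

    Both bbox and region are (x0, y0, x1, y1) in PDF points.
    """
    rx0, ry0, rx1, ry1 = region
    selected = []
    for text, (x0, y0, x1, y1) in contents:
        if rx0 <= x0 and ry0 <= y0 and x1 <= rx1 and y1 <= ry1:
            selected.append(text)
    return selected


# Hypothetical output in the documented [text, bbox] shape.
contents = [
    ["Abstract", (72, 700, 300, 720)],
    ["Left column paragraph", (72, 400, 300, 690)],
    ["Right column paragraph", (312, 400, 540, 690)],
]
left_column = (0, 0, 306, 792)  # left half of a US Letter page (612 x 792 pt)
print(contents_in_region(contents, left_column))
# ['Abstract', 'Left column paragraph']
```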
-
gummy.utils.pdf_utils.createHighlight(bbox=(0, 0, 1, 1), contents='', color=[1, 1, 0], author='iwasakishuto(@cabernet_rock)')
Create a highlight annotation.
- Parameters
bbox (tuple) – A bounding box giving the location of the highlight.
contents (str) – Text comment attached to the highlight.
color (list) – Highlight color as RGB components in the range [0, 1]. Defaults to [1,1,0] (yellow).
author (str) – Author of the annotation (comment). Defaults to "iwasakishuto(@cabernet_rock)".
- Returns
Highlight information.
- Return type
DictionaryObject
Examples
>>> from gummy.utils import createHighlight, addHighlightToPage
>>> from PyPDF2 import PdfFileWriter, PdfFileReader
>>> page_no = 0
>>> pdfOutput = PdfFileWriter()
>>> with open("input.pdf", mode="rb") as inPdf:
...     pdfInput = PdfFileReader(inPdf)
...     page = pdfInput.getPage(page_no)
...     highlight = createHighlight(bbox=(10,10,90,90), contents="COMMENT", color=(1,1,0))
...     addHighlightToPage(highlight, page, pdfOutput)
...     pdfOutput.addPage(page)
...     with open("output.pdf", mode="wb") as outPdf:
...         pdfOutput.write(outPdf)
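A highlight annotation is a PDF dictionary whose keys are defined by the PDF specification. The plain-dict sketch below shows the entries such a dictionary typically carries; it is an illustration, not gummy's implementation, which returns a PyPDF2 DictionaryObject with NameObject keys rather than plain strings. The helper name highlight_dict is hypothetical.

```python
def highlight_dict(bbox=(0, 0, 1, 1), contents="", color=(1, 1, 0), author=""):
    """Build the entries of a PDF /Highlight annotation as a plain dict."""
    x0, y0, x1, y1 = bbox
    return {
        "/Type": "/Annot",
        "/Subtype": "/Highlight",
        "/F": 4,                          # annotation flag 4 = Print
        "/T": author,                     # by convention, the annotation's author
        "/Contents": contents,            # comment text
        "/C": [float(c) for c in color],  # RGB components in [0, 1]
        "/Rect": [x0, y0, x1, y1],        # bounding rectangle
        # Corner points in the order most viewers expect:
        # upper-left, upper-right, lower-left, lower-right.
        "/QuadPoints": [x0, y1, x1, y1, x0, y0, x1, y0],
    }


annot = highlight_dict(bbox=(10, 10, 90, 90), contents="COMMENT", author="me")
print(annot["/QuadPoints"])
# [10, 90, 90, 90, 10, 10, 90, 10]
```

/Rect bounds the annotation, while /QuadPoints marks the area viewers actually paint; for a single rectangular highlight both describe the same box.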
-
gummy.utils.pdf_utils.addHighlightToPage(highlight, page, output)
Add a highlight to a page.
- Parameters
highlight (DictionaryObject) – Highlight information.
page (PageObject) – A single page within a PDF file.
output (PdfFileWriter) – A PDF writer.
Examples
>>> from gummy.utils import createHighlight, addHighlightToPage
>>> from PyPDF2 import PdfFileWriter, PdfFileReader
>>> page_no = 0
>>> pdfOutput = PdfFileWriter()
>>> with open("input.pdf", mode="rb") as inPdf:
...     pdfInput = PdfFileReader(inPdf)
...     page = pdfInput.getPage(page_no)
...     highlight = createHighlight(bbox=(10,10,90,90), contents="COMMENT", color=(1,1,0))
...     addHighlightToPage(highlight, page, pdfOutput)
...     pdfOutput.addPage(page)
...     with open("output.pdf", mode="wb") as outPdf:
...         pdfOutput.write(outPdf)
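Attaching an annotation generally means registering it as an indirect object in the output file and appending a reference to the page's /Annots array, creating that array if the page has none. The toy Writer class and add_annotation_to_page function below are hypothetical stand-ins that model this with plain Python objects; PyPDF2's real writer and page classes differ.

```python
class Writer:
    """Toy PDF writer: stores objects and hands back indirect references."""

    def __init__(self):
        self.objects = []

    def add_object(self, obj):
        self.objects.append(obj)
        # PDF object numbers start at 1; the generation number here is always 0.
        return ("ref", len(self.objects), 0)


def add_annotation_to_page(annotation, page, writer):
    """Register `annotation` with `writer` and list its reference in page["/Annots"]."""
    ref = writer.add_object(annotation)
    # setdefault creates the /Annots array on pages that have no annotations yet.
    page.setdefault("/Annots", []).append(ref)
    return ref


page = {"/Type": "/Page"}
writer = Writer()
add_annotation_to_page({"/Subtype": "/Highlight"}, page, writer)
add_annotation_to_page({"/Subtype": "/Highlight"}, page, writer)
print(page["/Annots"])
# [('ref', 1, 0), ('ref', 2, 0)]
```

The page stores references rather than the dictionaries themselves because the PDF specification expects /Annots entries to be indirect references to annotation objects.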