gummy.utils.pdf_utils module

Utility functions for handling and analyzing PDF files.

gummy.utils.pdf_utils.get_pdf_pages(file, dirname='/Users/iwasakishuto/.gummy')[source]

Get PDF pages.

Parameters
  • file (data, str) – URL, path, or raw data of the PDF.

  • dirname (str) – If file is a URL, the PDF is downloaded and saved to dirname. (default=GUMMY_DIR)
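
The file parameter accepts three kinds of input. The following is a minimal sketch of how such an argument could be told apart, not the library's own logic. The helper name classify_pdf_source is hypothetical, and treating "data" as bytes is an assumption.

```python
from urllib.parse import urlparse


def classify_pdf_source(file):
    """Return 'data', 'url', or 'path' for a PDF source (hypothetical helper)."""
    # Assumption: raw PDF data arrives as bytes or bytearray.
    if isinstance(file, (bytes, bytearray)):
        return "data"
    if isinstance(file, str):
        # Only http(s) strings are treated as URLs; anything else is a local path.
        scheme = urlparse(file).scheme.lower()
        return "url" if scheme in ("http", "https") else "path"
    raise TypeError(f"Unsupported PDF source type: {type(file).__name__}")


print(classify_pdf_source("https://example.com/paper.pdf"))  # url
print(classify_pdf_source("paper.pdf"))                      # path
print(classify_pdf_source(b"%PDF-1.4"))                      # data
```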

gummy.utils.pdf_utils.parser_pdf_pages(layout_objs)[source]

Parse PDF pages and get contents in order.

Parameters

layout_objs (list) – Each element is a pdfminer.layout object.

Returns

Each element is a list of the form [text, bbox(x0,y0,x1,y1)].

Return type

list
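
The exact ordering rule used by parser_pdf_pages is not stated here. As a rough illustration only, the sketch below sorts [text, bbox] entries top to bottom and then left to right, assuming the usual PDF coordinate system with the origin at the bottom-left corner. The function name sort_reading_order and the line_tol parameter are hypothetical.

```python
def sort_reading_order(items, line_tol=2.0):
    """Sort [text, (x0, y0, x1, y1)] entries top-to-bottom, then left-to-right."""

    def key(item):
        _, (x0, y0, x1, y1) = item
        # Larger y1 means higher on the page, so negate it to sort descending.
        # Dividing by line_tol groups boxes whose top edges are nearly equal.
        return (-round(y1 / line_tol), x0)

    return sorted(items, key=key)


items = [
    ["world", (60, 700, 100, 712)],
    ["Footer", (10, 20, 80, 32)],
    ["Hello", (10, 700, 50, 712)],
]
print([text for text, _ in sort_reading_order(items)])
# ['Hello', 'world', 'Footer']
```

Bucketing by line_tol is a simplification: two boxes on the same visual line can still fall into different buckets near a bucket boundary.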

gummy.utils.pdf_utils.get_pdf_contents(file, dirname='/Users/iwasakishuto/.gummy')[source]

Get PDF contents.

Parameters
  • file (data, str) – URL, path, or raw data of the PDF.

  • dirname (str) – If file is a URL, the PDF is downloaded and saved to dirname. (default=GUMMY_DIR)

Returns

Each element is a list of the form [text, bbox(x0,y0,x1,y1)].

Return type

list
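
A list of [text, bbox] entries in this shape can be searched for a keyword to obtain the bounding boxes to highlight. The sketch below works on hand-written sample data rather than on a real call to get_pdf_contents, and the helper name find_bboxes is hypothetical.

```python
def find_bboxes(contents, keyword):
    """Return the bbox of every entry whose text contains keyword (case-insensitive)."""
    keyword = keyword.lower()
    return [tuple(bbox) for text, bbox in contents if keyword in text.lower()]


# Sample data in the documented [text, bbox(x0,y0,x1,y1)] shape.
contents = [
    ["Abstract", (72, 700, 140, 712)],
    ["We propose a method.", (72, 660, 300, 672)],
    ["Our Method is fast.", (72, 640, 280, 652)],
]
print(find_bboxes(contents, "method"))
# [(72, 660, 300, 672), (72, 640, 280, 652)]
```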

gummy.utils.pdf_utils.createHighlight(bbox=(0, 0, 1, 1), contents='', color=[1, 1, 0], author='iwasakishuto(@cabernet_rock)')[source]

Create a highlight annotation.

Parameters
  • bbox (tuple) – Bounding box (x0, y0, x1, y1) giving the location of the highlight.

  • contents (str) – Comment text attached to the highlight.

  • color (list) – Highlight color as RGB components in the range 0 to 1. Defaults to [1, 1, 0] (yellow).

  • author (str) – Who wrote the annotation (comment). Defaults to "iwasakishuto(@cabernet_rock)".

Returns

Highlight information.

Return type

DictionaryObject
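
To show what kind of information a highlight annotation carries, here is a sketch that builds the equivalent fields as a plain Python dict. The keys are the standard PDF annotation entries; whether createHighlight sets exactly these keys, and in this corner order for /QuadPoints, is an assumption. The function name highlight_fields is hypothetical.

```python
def highlight_fields(bbox=(0, 0, 1, 1), contents="", color=(1, 1, 0), author=""):
    """Build highlight annotation fields as a plain dict (illustration only)."""
    x0, y0, x1, y1 = bbox
    return {
        "/Type": "/Annot",
        "/Subtype": "/Highlight",
        "/F": 4,                    # annotation flag: print
        "/T": author,               # annotation title, commonly shown as the author
        "/Contents": contents,      # comment text
        "/C": list(color),          # RGB components in the range 0 to 1
        "/Rect": [x0, y0, x1, y1],  # annotation rectangle
        # Corners of the highlighted quadrilateral:
        # top-left, top-right, bottom-left, bottom-right (assumed order).
        "/QuadPoints": [x0, y1, x1, y1, x0, y0, x1, y0],
    }


fields = highlight_fields(bbox=(10, 10, 90, 90), contents="COMMENT")
print(fields["/QuadPoints"])
# [10, 90, 90, 90, 10, 10, 90, 10]
```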

Examples

>>> from gummy.utils import createHighlight, addHighlightToPage
>>> from PyPDF2 import PdfFileWriter, PdfFileReader
>>> page_no = 0
>>> pdfOutput = PdfFileWriter()
>>> with open("input.pdf", mode="rb") as inPdf:
...     pdfInput = PdfFileReader(inPdf)
...     page = pdfInput.getPage(page_no)
...     highlight = createHighlight(bbox=(10,10,90,90), contents="COMMENT", color=(1,1,0))
...     addHighlightToPage(highlight, page, pdfOutput)
...     pdfOutput.addPage(page)
...     with open("output.pdf", mode="wb") as outPdf:
...         pdfOutput.write(outPdf)

gummy.utils.pdf_utils.addHighlightToPage(highlight, page, output)[source]

Add a highlight to a page.

Parameters
  • highlight (DictionaryObject) – Highlight information.

  • page (PageObject) – A single page within a PDF file.

  • output (PdfFileWriter) – A PDF writer.

Examples

>>> from gummy.utils import createHighlight, addHighlightToPage
>>> from PyPDF2 import PdfFileWriter, PdfFileReader
>>> page_no = 0
>>> pdfOutput = PdfFileWriter()
>>> with open("input.pdf", mode="rb") as inPdf:
...     pdfInput = PdfFileReader(inPdf)
...     page = pdfInput.getPage(page_no)
...     highlight = createHighlight(bbox=(10,10,90,90), contents="COMMENT", color=(1,1,0))
...     addHighlightToPage(highlight, page, pdfOutput)
...     pdfOutput.addPage(page)
...     with open("output.pdf", mode="wb") as outPdf:
...         pdfOutput.write(outPdf)
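
In a PDF, the annotations of a page are listed in the page's /Annots array. The sketch below illustrates the general idea of attaching an annotation to a page, using plain dicts and lists in place of PyPDF2 objects. It is an assumption about what addHighlightToPage does, not its implementation, and the name attach_annotation is hypothetical.

```python
def attach_annotation(page, annotation):
    """Append annotation to the page's /Annots list, creating the list if needed."""
    # setdefault returns the existing list or stores and returns a new one.
    page.setdefault("/Annots", []).append(annotation)
    return page


page = {"/Type": "/Page"}
attach_annotation(page, {"/Subtype": "/Highlight", "/Contents": "first"})
attach_annotation(page, {"/Subtype": "/Highlight", "/Contents": "second"})
print(len(page["/Annots"]))
# 2
```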