gummy.models module

This file defines a model that integrates journals, translators, and gateways, so that all of the following can be done at once.

  1. Determine the journal_type of a paper from its URL or file extension.

  2. If necessary, use a GummyGateway to access non-open-access content of the journal.

  3. Parse the paper using GummyJournals and obtain the contents.

  4. Translate the obtained English text into Japanese using GummyTranslators.

  5. Arrange the Japanese and English texts according to the templates.

  6. Convert the obtained HTML to PDF.
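The six steps above can be pictured as a chain of functions. The sketch below is a toy model of that flow under stated assumptions: every function name (determine_journal_type, fetch, parse, translate, arrange, run_pipeline) is hypothetical, the "translator" is a lookup table, and step 6 is left out because it needs an external HTML-to-PDF tool. It is not gummy's implementation.

```python
# Toy model of the gummy pipeline. All functions are hypothetical stand-ins.

def determine_journal_type(url):
    # Step 1: a local PDF is recognised by its extension; anything else is
    # treated as a web page (real journal detection is more involved).
    return "pdf" if url.lower().endswith(".pdf") else "web"

def fetch(url, journal_type):
    # Step 2: a gateway would log in here to reach non-open-access content.
    return f"<{journal_type} document at {url}>"

def parse(raw):
    # Step 3: a journal parser would split the document into sections.
    return "Example title", [{"head": "Abstract", "en": "This is a pen."}]

def translate(text):
    # Step 4: a real translator calls an external service; this uses a table.
    return {"This is a pen.": "これはペンです。"}.get(text, text)

def arrange(title, contents):
    # Step 5: place English and Japanese side by side (gummy uses a template).
    rows = "".join(
        f"<h2>{c['head']}</h2><p>{c['en']}</p><p>{c['ja']}</p>" for c in contents
    )
    return f"<h1>{title}</h1>{rows}"

def run_pipeline(url):
    journal_type = determine_journal_type(url)
    raw = fetch(url, journal_type)
    title, contents = parse(raw)
    for section in contents:
        section["ja"] = translate(section["en"])
    # Step 6 (HTML -> PDF) needs an external tool such as wkhtmltopdf; omitted.
    return arrange(title, contents)
```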

You can import TranslationGummy in either of the following two ways.

>>> from gummy.models import TranslationGummy
>>> from gummy import TranslationGummy
class gummy.models.TranslationGummy(chrome_options=None, browser=False, driver=None, gateway='useless', translator='deepl', maxsize=5000, specialize=True, from_lang='en', to_lang='ja', verbose=True, translator_verbose=True)

Bases: object

This class integrates journals, translators, and gateways into a single interface.

Parameters
  • chrome_options (ChromeOptions) – Instance of ChromeOptions. (default= get_chrome_options() )

  • browser (bool) – Whether to run Chrome with a visible GUI browser window. (default= False )

  • driver (WebDriver) – Selenium WebDriver.

  • gateway (str, GummyGateWay) – Identifier of a Gummy Gateway class, or a gateway instance. See gateways. (default= “useless”)

  • translator (str, GummyTranslator) – Identifier of a Gummy Translator class, or a translator instance. See translators. (default= “deepl”)

  • maxsize (int) – Maximum number of English characters that can be sent to the translator in a single request. (default= 5000)

  • specialize (bool) – Whether to specialize in a single language pair (True) or to support multiple languages (False). (default= True) To specialize in translation between specific languages, set the from_lang and to_lang arguments.

  • from_lang (str) – Language before translation.

  • to_lang (str) – Language after translation.

  • verbose (bool) – Whether to print output. (default= True )

  • translator_verbose (bool) – Whether to print the translator’s output. (default= True )
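maxsize caps how much text goes into a single translation request. As an illustration only, the hypothetical helper split_query below packs whole sentences greedily into chunks of at most maxsize characters. The naive regex sentence splitter and the packing strategy are assumptions; gummy's own splitting may differ.

```python
import re

def split_query(query, maxsize=5000):
    """Greedily pack whole sentences into chunks of at most ``maxsize`` characters.

    Hypothetical helper, not gummy's code. A single sentence longer than
    ``maxsize`` is kept as its own (oversized) chunk rather than cut mid-sentence.
    """
    # Naive sentence split: break on whitespace that follows ".", "!" or "?".
    sentences = re.split(r"(?<=[.!?])\s+", query.strip())
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= maxsize:
            current = candidate  # the sentence still fits in this request
        else:
            if current:
                chunks.append(current)  # close the full chunk
            current = sentence  # start a new chunk with this sentence
    if current:
        chunks.append(current)
    return chunks
```

For example, split_query("This is a pen. That is a dog. Hello there.", maxsize=30) yields two chunks, the first two sentences together and "Hello there." on its own.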

translate(query, barname=None, from_lang='en', to_lang='ja', correspond=False)

Translate query from from_lang into to_lang (English into Japanese by default). See translate.

Parameters
  • query (str) – Text to be translated (written in from_lang).

  • barname (str) – Bar name for ProgressMonitor.

  • from_lang (str) – Language before translation.

  • to_lang (str) – Language after translation.

  • correspond (bool) – Whether to align the positions of the from_lang text with those of the to_lang text. (default= False)

Examples

>>> from gummy import TranslationGummy
>>> model = TranslationGummy()
>>> ja = model.translate("This is a pen.")
DeepLTranslator (query1) 03/30 [##------------------] 10.00% - 3.243[s]
>>> print(ja)
これはペンです。
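To illustrate what correspond changes, the toy function below (hypothetical, not gummy's translate) uses a small phrase table instead of a translation service. With correspond=True it keeps each source sentence next to its translation as a list of pairs. The actual return shape of TranslationGummy.translate with correspond=True may differ.

```python
import re

# Hypothetical phrase table standing in for a real translation service.
PHRASES = {
    "This is a pen.": "これはペンです。",
    "I am a student.": "私は学生です。",
}

def toy_translate(query, correspond=False):
    """Translate sentence by sentence using PHRASES (unknown sentences pass through)."""
    sentences = re.split(r"(?<=[.!?])\s+", query.strip())
    translated = [PHRASES.get(s, s) for s in sentences]
    if correspond:
        # Keep each English sentence next to its Japanese counterpart.
        return list(zip(sentences, translated))
    # Japanese text is usually joined without spaces between sentences.
    return "".join(translated)
```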
get_contents(url, journal_type=None, crawl_type=None, gateway=None, **gatewaykwargs)

Get contents of the journal.

Parameters
  • url (str) – URL of a paper or path/to/local.pdf.

  • journal_type (str) – Journal type. If not specified, it is inferred by analyzing url.

  • crawl_type (str) – Crawling type. If not specified, the recommended crawling type is used.

  • gateway (str, GummyGateWay) – Identifier of a Gummy Gateway class, or a gateway instance. See gateways. (default= None)

  • gatewaykwargs (dict) – Keyword arguments for the gateway. See passthrough.

Returns

(title, content)

Return type

tuple (str, list of dict)

Examples

>>> from gummy import TranslationGummy
>>> model = TranslationGummy()
>>> title, texts = model.get_contents("https://www.nature.com/articles/ncb0800_500")
Estimated Journal Type : Nature
Crawling Type: soup
    :
>>> print(title)
Formation of the male-specific muscle in female by ectopic expression
>>> print(texts[:1])
[{'head': 'Abstract', 'en': 'The  () gene product Fru has been ... for the sexually dimorphic actions of the gene.'}]
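When journal_type is omitted, it is judged from url. A minimal sketch of that idea, assuming a hand-written host-to-journal table: DOMAIN_TO_JOURNAL and guess_journal_type are hypothetical names, and gummy's journals module uses its own, much larger, set of rules.

```python
import os
from urllib.parse import urlparse

# Hypothetical mapping from host names to journal types (illustrative only).
DOMAIN_TO_JOURNAL = {
    "www.nature.com": "nature",
    "arxiv.org": "arxiv",
}

def guess_journal_type(url):
    """Guess a journal type from a URL or a local file path.

    Returns "pdf" for a local .pdf file, the mapped journal type for a known
    host, and None when the journal cannot be identified.
    """
    parsed = urlparse(url)
    # A path without a network location that ends in .pdf is a local file.
    if not parsed.netloc and os.path.splitext(url)[1].lower() == ".pdf":
        return "pdf"
    return DOMAIN_TO_JOURNAL.get(parsed.netloc.lower())
```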
toHTML(url, path=None, out_dir=GUMMY_DIR, from_lang='en', to_lang='ja', correspond=True, journal_type=None, crawl_type=None, gateway=None, searchpath=TEMPLATES_DIR, template='paper.html', **gatewaykwargs)

Get contents from url and create an HTML file.

Parameters
  • url (str) – URL of a paper or path/to/local.pdf.

  • path/out_dir (str) – Where to save the created HTML file. If path is None, it is saved at <out_dir>/<title>.html. (default= GUMMY_DIR)

  • from_lang (str) – Language before translation.

  • to_lang (str) – Language after translation.

  • correspond (bool) – Whether to align the positions of the from_lang text with those of the to_lang text. (default= True)

  • journal_type (str) – Journal type. If specified, the crawler for journal_type is used. (default= None)

  • crawl_type (str) – Crawling type. If not specified, the recommended crawling type is used. (default= None)

  • gateway (str, GummyGateWay) – Identifier of a Gummy Gateway class, or a gateway instance. See gateways. (default= None)

  • searchpath/template (str) – The template file <searchpath>/<template> used to create the HTML. (default= TEMPLATES_DIR/paper.html)

  • gatewaykwargs (dict) – Keyword arguments for the gateway. See passthrough.
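The path/out_dir and searchpath/template parameters of toHTML boil down to choosing an output file and filling a template. The sketch below assumes a stdlib string.Template as a stand-in for the paper.html template file; resolve_html_path, PAPER_TEMPLATE and render are hypothetical names, not part of gummy.

```python
import os
from string import Template

def resolve_html_path(title, path=None, out_dir="."):
    """Return ``path`` if given, else ``<out_dir>/<title>.html`` (hypothetical helper)."""
    return path if path is not None else os.path.join(out_dir, f"{title}.html")

# Stand-in for the paper.html template; gummy's real template is a file
# under TEMPLATES_DIR, so this inline string is only illustrative.
PAPER_TEMPLATE = Template("<h1>$title</h1>\n$body")

def render(title, contents):
    """Fill the template with one heading plus English/Japanese pair per section."""
    body = "\n".join(
        f"<h2>{c['head']}</h2><p>{c['en']}</p><p>{c['ja']}</p>" for c in contents
    )
    return PAPER_TEMPLATE.substitute(title=title, body=body)
```

A title containing characters such as "/" would need sanitizing before it is used as a file name; the sketch does not handle that.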

toPDF(url, path=None, out_dir=GUMMY_DIR, from_lang='en', to_lang='ja', correspond=True, journal_type=None, crawl_type=None, gateway=None, searchpath=TEMPLATES_DIR, template='paper.html', delete_html=True, options={}, **gatewaykwargs)

Get contents from URL and create a PDF.

Parameters
  • url (str) – URL of a paper or path/to/local.pdf.

  • path/out_dir (str) – Where to save the created PDF file. If path is None, it is saved at <out_dir>/<title>.pdf. (default= GUMMY_DIR)

  • from_lang (str) – Language before translation.

  • to_lang (str) – Language after translation.

  • correspond (bool) – Whether to align the positions of the from_lang text with those of the to_lang text. (default= True)

  • journal_type (str) – Journal type. If specified, the crawler for journal_type is used. (default= None)

  • crawl_type (str) – Crawling type. If not specified, the recommended crawling type is used. (default= None)

  • gateway (str, GummyGateWay) – Identifier of a Gummy Gateway class, or a gateway instance. See gateways. (default= None)

  • searchpath/template (str) – The template file <searchpath>/<template> used to create the intermediate HTML. (default= TEMPLATES_DIR/paper.html)

  • delete_html (bool) – Whether to delete the intermediate HTML file. (default= True)

  • options (dict) – Options for wkhtmltopdf. See https://wkhtmltopdf.org/usage/wkhtmltopdf.txt (default= {})

  • gatewaykwargs (dict) – Keyword arguments for the gateway. See passthrough.
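toPDF converts the intermediate HTML with wkhtmltopdf, and options is documented as a dict of wkhtmltopdf options. Assuming the common convention of long option names without the leading "--" as keys and None for flags that take no value, the hypothetical helpers below build the command line and derive a .pdf path from the HTML path. They are a sketch, not gummy's code.

```python
import os

def build_wkhtmltopdf_command(html_path, pdf_path, options=None):
    """Build an argument list for the wkhtmltopdf command-line tool.

    Hypothetical helper: keys are long option names without "--"
    (e.g. "page-size"); a value of None marks a flag with no argument.
    """
    args = ["wkhtmltopdf"]
    for key, value in (options or {}).items():
        args.append(f"--{key}")
        if value is not None:
            args.append(str(value))
    # wkhtmltopdf expects the input first and the output file last.
    return args + [html_path, pdf_path]

def default_pdf_path(html_path):
    """Swap the .html extension for .pdf (hypothetical default naming)."""
    return os.path.splitext(html_path)[0] + ".pdf"
```

When wkhtmltopdf is installed, the list can be run with subprocess.run(cmd, check=True); removing the HTML file afterwards with os.remove corresponds to delete_html=True.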