gummy.models module

This module defines a model that integrates journals, translators, and gateways, so that all of the following steps can be run at once.

Determine the journal_type of the paper from its URL or file extension.
If necessary, use a GummyGateway to access non-open content of the journal.
Parse the paper using GummyJournals and obtain the contents.
Translate the extracted English into Japanese using GummyTranslators.
Arrange the Japanese and English text according to the templates.
Convert the obtained HTML to PDF.
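
The first step above can be sketched as follows. This is a minimal illustration using only the standard library, not the detection logic that gummy itself uses: the HOST_TO_JOURNAL table and the guess_journal_type function are hypothetical names introduced for this example.

```python
import os
from urllib.parse import urlparse

# Hypothetical host-to-journal table, for illustration only.
HOST_TO_JOURNAL = {
    "www.nature.com": "nature",
    "arxiv.org": "arxiv",
}


def guess_journal_type(url):
    """Guess a journal type from a URL or a local file path."""
    parsed = urlparse(url)
    if parsed.scheme in ("http", "https"):
        # Web resource: look up the host name.
        return HOST_TO_JOURNAL.get(parsed.netloc.lower(), "unknown")
    # No web scheme: treat the input as a local path and use its extension.
    ext = os.path.splitext(url)[1].lower()
    return "pdf" if ext == ".pdf" else "unknown"


print(guess_journal_type("https://www.nature.com/articles/ncb0800_500"))
print(guess_journal_type("path/to/local.pdf"))
```

Falling back to the file extension only when the input is not an http(s) URL keeps the two cases from interfering with each other.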

TranslationGummy can be imported in either of the following two ways.

>>> from gummy.models import TranslationGummy
>>> from gummy import TranslationGummy

class gummy.models.TranslationGummy(chrome_options=None, browser=False, driver=None, gateway='useless', translator='deepl', maxsize=5000, specialize=True, from_lang='en', to_lang='ja', verbose=True, translator_verbose=True)

Bases: object

This class integrates journals, translators, and gateways.

Parameters

chrome_options (ChromeOptions) – Instance of ChromeOptions. (default= get_chrome_options())
browser (bool) – Whether to run Chrome with a GUI browser window. (default= False)
driver (WebDriver) – Selenium WebDriver.
gateway (str, GummyGateWay) – Identifier of the Gummy Gateway Class. See gateways. (default= “useless”)
translator (str, GummyTranslator) – Identifier of the Gummy Translator Class. See translators. (default= “deepl”)
maxsize (int) – Maximum number of English characters that can be sent in a single request. (default= 5000)
specialize (bool) – Whether to specialize in one language pair rather than support multiple languages. (default= True) To specialize in translating between specific languages, set the from_lang and to_lang arguments.
from_lang (str) – Language before translation.
to_lang (str) – Language after translation.
verbose (bool) – Whether to print output. (default= True)
translator_verbose (bool) – Whether to print the translator’s output. (default= True)
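
The maxsize limit implies that long texts have to be split before they are sent to the translator. The sketch below shows one way this could be done, breaking at sentence boundaries where possible. It is an assumption for illustration, not gummy's actual splitting code, and split_by_maxsize is a hypothetical helper.

```python
import re


def split_by_maxsize(text, maxsize=5000):
    """Split text into chunks of at most maxsize characters."""
    # Break after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than maxsize is cut into fixed-size pieces.
        while len(sentence) > maxsize:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:maxsize])
            sentence = sentence[maxsize:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= maxsize:
            current = candidate
        else:
            # The sentence does not fit: close the current chunk, start a new one.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


for chunk in split_by_maxsize("This is a pen. That is a book. Here is a cat.", maxsize=30):
    print(len(chunk), chunk)
```

Each chunk can then be sent as one request, which keeps every request within the limit.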

translate(query, barname=None, from_lang='en', to_lang='ja', correspond=False)

Translate text from from_lang into to_lang (English into Japanese by default). See translate.

Parameters

query (str) – English to be translated.
barname (str) – Bar name for ProgressMonitor.
from_lang (str) – Language before translation.
to_lang (str) – Language after translation.
correspond (bool) – Whether to make the positions of the from_lang text correspond to those of the to_lang text.

Examples

>>> from gummy import TranslationGummy
>>> model = TranslationGummy()
>>> ja = model.translate("This is a pen.")
DeepLTranslator (query1) 03/30 [##------------------] 10.00% - 3.243[s]
>>> print(ja)
これはペンです。

get_contents(url, journal_type=None, crawl_type=None, gateway=None, **gatewaykwargs)

Get contents of the journal.

Parameters

url (str) – URL of a paper or path/to/local.pdf.
journal_type (str) – Journal type. If not specified, it is inferred from the url.
crawl_type (str) – Crawling type. If not specified, the recommended crawling type is used.
gateway (str, GummyGateWay) – Identifier of the Gummy Gateway Class. See gateways. (default= None)
gatewaykwargs (dict) – Gateway keyword arguments. See passthrough.

Returns

(title, content)

Return type

tuple (str, dict)

Examples

>>> from gummy import TranslationGummy
>>> model = TranslationGummy()
>>> title, texts = model.get_contents("https://www.nature.com/articles/ncb0800_500")
Estimated Journal Type : Nature
Crawling Type: soup
    :
>>> print(title)
Formation of the male-specific muscle in female by ectopic expression
>>> print(texts[:1])
[{'head': 'Abstract', 'en': 'The  () gene product Fru has been ... for the sexually dimorphic actions of the gene.'}]
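
The returned contents can be processed section by section. The sketch below assumes the structure shown in the example output above (a list of dictionaries with 'head' and 'en' keys) and uses a stand-in translate function. attach_translation is a hypothetical helper for illustration and is not part of gummy.

```python
def attach_translation(texts, translate, to_lang="ja"):
    """Return copies of the sections with a translated entry added."""
    translated = []
    for section in texts:
        entry = dict(section)  # copy, so the input is left unchanged
        if "en" in entry:
            entry[to_lang] = translate(entry["en"])
        translated.append(entry)
    return translated


# Sample data in the shape shown above; str.upper stands in for a real translator.
sample = [
    {"head": "Abstract", "en": "This is a pen."},
    {"head": "Results"},
]
for entry in attach_translation(sample, translate=str.upper):
    print(entry)
```

Sections without an 'en' entry, such as a bare heading, pass through untouched.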

toHTML(url, path=None, out_dir=GUMMY_DIR, from_lang='en', to_lang='ja', correspond=True, journal_type=None, crawl_type=None, gateway=None, searchpath=TEMPLATES_DIR, template='paper.html', **gatewaykwargs)

Get contents from a URL and create an HTML file.

Parameters

url (str) – URL of a paper or path/to/local.pdf.
path/out_dir (str) – Where to save the created HTML. If path is None, it is saved at <out_dir>/<title>.html (default= GUMMY_DIR)
from_lang (str) – Language before translation.
to_lang (str) – Language after translation.
correspond (bool) – Whether to make the positions of the from_lang text correspond to those of the to_lang text.
journal_type (str) – Journal type. If specified, the crawler for that journal_type is used. (default= None)
crawl_type (str) – Crawling type. If not specified, the recommended crawling type is used. (default= None)
gateway (str, GummyGateWay) – Identifier of the Gummy Gateway Class. See gateways. (default= None)
searchpath/template (str) – The template at <searchpath>/<template> is used to create the HTML. (default= TEMPLATES_DIR/paper.html)
gatewaykwargs (dict) – Gateway keyword arguments. See passthrough.
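
The path/out_dir rule described above can be sketched as follows. This is a simplified illustration, not gummy's implementation: resolve_html_path is a hypothetical helper, and the "~/.gummy" default and the replacement of path separators in the title are assumptions.

```python
import os


def resolve_html_path(title, path=None, out_dir="~/.gummy"):
    """Return the output path: path if given, else <out_dir>/<title>.html."""
    if path is not None:
        return path
    # Assumption: replace path separators so the title forms one file name.
    safe_title = title.replace(os.sep, "_").replace("/", "_")
    return os.path.join(os.path.expanduser(out_dir), f"{safe_title}.html")


print(resolve_html_path("Sample title", out_dir="out"))
print(resolve_html_path("Sample title", path="custom/name.html"))
```

An explicit path always wins, so out_dir only matters when path is left as None.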

toPDF(url, path=None, out_dir=GUMMY_DIR, from_lang='en', to_lang='ja', correspond=True, journal_type=None, crawl_type=None, gateway=None, searchpath=TEMPLATES_DIR, template='paper.html', delete_html=True, options={}, **gatewaykwargs)

Get contents from a URL and create a PDF.

Parameters

url (str) – URL of a paper or path/to/local.pdf.
path/out_dir (str) – Where to save the created HTML. If path is None, it is saved at <out_dir>/<title>.html (default= GUMMY_DIR)
from_lang (str) – Language before translation.
to_lang (str) – Language after translation.
correspond (bool) – Whether to make the positions of the from_lang text correspond to those of the to_lang text.
journal_type (str) – Journal type. If specified, the crawler for that journal_type is used. (default= None)
crawl_type (str) – Crawling type. If not specified, the recommended crawling type is used. (default= None)
gateway (str, GummyGateWay) – Identifier of the Gummy Gateway Class. See gateways. (default= None)
searchpath/template (str) – The template at <searchpath>/<template> is used to create the HTML. (default= TEMPLATES_DIR/paper.html)
delete_html (bool) – Whether to delete the intermediate HTML file. (default= True)
options (dict) – Options for wkhtmltopdf. See https://wkhtmltopdf.org/usage/wkhtmltopdf.txt (default= {})
gatewaykwargs (dict) – Gateway keyword arguments. See passthrough.
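
To show how an options dictionary relates to the wkhtmltopdf command line, the sketch below turns a dictionary into a list of flags. How gummy forwards options internally is not stated here, so this is an assumption for illustration only, and options_to_args is a hypothetical helper. The flags used (--page-size, --encoding, --quiet) are documented wkhtmltopdf options.

```python
def options_to_args(options):
    """Convert an options dict into a flat list of command-line arguments."""
    args = []
    for key, value in options.items():
        flag = key if key.startswith("--") else f"--{key}"
        args.append(flag)
        # An empty or None value marks a switch that takes no argument.
        if value not in (None, ""):
            args.append(str(value))
    return args


options = {"page-size": "A4", "encoding": "UTF-8", "quiet": ""}
print(options_to_args(options))
```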

highlight(url, path=None, out_dir=GUMMY_DIR, from_lang='en', to_lang='ja', journal_type=None, gateway=None, ignore_length=10, highlight_color=[1, 1, 0], **gatewaykwargs)

Get contents from a URL and create a PDF.

Parameters

url (str) – URL of a paper or path/to/local.pdf.
path/out_dir (str) – Where to save the created HTML. If path is None, it is saved at <out_dir>/<title>.html (default= GUMMY_DIR)
from_lang (str) – Language before translation.
to_lang (str) – Language after translation.
journal_type (str) – Journal type. If specified, the crawler for that journal_type is used. (default= None)
gateway (str, GummyGateWay) – Identifier of the Gummy Gateway Class. See gateways. (default= None)
ignore_length (int) – If the number of English characters is smaller than ignore_length, the text is not highlighted.
highlight_color (list) – The highlight color.
gatewaykwargs (dict) – Gateway keyword arguments. See passthrough.
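
The roles of ignore_length and highlight_color can be sketched as follows. Both helpers are hypothetical, and reading [1, 1, 0] as RGB components in the range 0 to 1 (yellow) is an assumption rather than something stated above.

```python
def select_for_highlight(segments, ignore_length=10):
    """Keep only segments with at least ignore_length characters."""
    return [s for s in segments if len(s) >= ignore_length]


def rgb_to_hex(color):
    """Convert RGB components in the range 0 to 1 into a hex color string."""
    r, g, b = (round(c * 255) for c in color)
    return f"#{r:02x}{g:02x}{b:02x}"


segments = ["Fig. 1", "This sentence is long enough."]
print(select_for_highlight(segments))
print(rgb_to_hex([1, 1, 0]))
```

Skipping very short segments avoids highlighting fragments such as figure labels.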
