gummy.models module
This module defines a model that integrates journals, translators, and gateways, so that all of the following steps can be performed at once:
1. Determine the journal_type of the paper from its url or file extension.
2. If necessary, use a GummyGateway to access non-open content of the journal.
3. Parse the paper using GummyJournals and obtain the contents.
4. Translate the obtained English into Japanese using GummyTranslators.
5. Arrange the Japanese and English text according to the templates.
6. Convert the obtained HTML to PDF.
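The first step above (choosing a journal_type from a URL or a file extension) can be illustrated with a minimal sketch. This is not gummy's implementation: the function guess_journal_type, the table DOMAIN_TO_JOURNAL, and the returned labels are all hypothetical and only show the general idea.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical lookup table. The real mapping lives inside gummy and is not shown here.
DOMAIN_TO_JOURNAL = {
    "www.nature.com": "nature",
    "arxiv.org": "arxiv",
}


def guess_journal_type(url: str) -> str:
    """Guess a journal type from a URL or a local file path (illustrative only)."""
    parsed = urlparse(url)
    if parsed.scheme in ("http", "https"):
        # Remote paper: decide by host name.
        return DOMAIN_TO_JOURNAL.get(parsed.netloc.lower(), "unknown")
    # Local file: decide by file extension.
    suffix = PurePosixPath(url).suffix.lower()
    return "pdf" if suffix == ".pdf" else "unknown"
```

With this sketch, a Nature article URL resolves through the host name, while path/to/local.pdf resolves through its extension.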
You can import TranslationGummy in either of the following two ways:
>>> from gummy.models import TranslationGummy
>>> from gummy import TranslationGummy
class gummy.models.TranslationGummy(chrome_options=None, browser=False, driver=None, gateway='useless', translator='deepl', maxsize=5000, specialize=True, from_lang='en', to_lang='ja', verbose=True, translator_verbose=True)
Bases: object
This class integrates journals, translators, and gateways.
- Parameters
chrome_options (ChromeOptions) – Instance of ChromeOptions. (default= get_chrome_options())
browser (bool) – Whether to run Chrome with a GUI browser. (default= False)
driver (WebDriver) – Selenium WebDriver. (default= None)
gateway (str, GummyGateWay) – Identifier of the Gummy Gateway class. See gateways. (default= "useless")
translator (str, GummyTranslator) – Identifier of the Gummy Translator class. See translators. (default= "deepl")
maxsize (int) – Maximum number of English characters that can be sent in a single request. (default= 5000)
specialize (bool) – Whether to specialize in one language pair rather than support multiple languages. To specialize in translating between specific languages, set the from_lang and to_lang arguments. (default= True)
from_lang (str) – Language before translation. (default= "en")
to_lang (str) – Language after translation. (default= "ja")
verbose (bool) – Whether to print output. (default= True)
translator_verbose (bool) – Whether to print the translator's output. (default= True)
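The maxsize parameter implies that long text has to be split before it is sent to the translator. The following is a minimal sketch of one way to do that, packing whole words greedily into chunks; split_by_maxsize is a hypothetical helper, and gummy's own splitting strategy may differ.

```python
from typing import List


def split_by_maxsize(text: str, maxsize: int = 5000) -> List[str]:
    """Greedily pack whitespace-separated words into chunks of at most `maxsize` characters.

    A single word longer than `maxsize` is kept whole in its own chunk.
    """
    chunks: List[str] = []
    current = ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if len(candidate) <= maxsize or not current:
            # Either the word still fits, or it starts a new (possibly over-long) chunk.
            current = candidate
        else:
            # The chunk is full: store it and start a new one with this word.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be passed to the translator separately and the results joined afterwards.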
translate(query, barname=None, from_lang='en', to_lang='ja', correspond=False)
Translate English into Japanese. See translate.
- Parameters
query (str) – English to be translated.
barname (str) – Bar name for ProgressMonitor. (default= None)
from_lang (str) – Language before translation. (default= "en")
to_lang (str) – Language after translation. (default= "ja")
correspond (bool) – Whether to make the position of the from_lang text correspond to that of the to_lang text. (default= False)
Examples
>>> from gummy import TranslationGummy
>>> model = TranslationGummy()
>>> ja = model.translate("This is a pen.")
DeepLTranslator (query1) 03/30 [##------------------] 10.00% - 3.243[s]
>>> print(ja)
'これはペンです。'
get_contents(url, journal_type=None, crawl_type=None, gateway=None, **gatewaykwargs)
Get the contents of the journal.
- Parameters
url (str) – URL of a paper or path/to/local.pdf.
journal_type (str) – Journal type. If not specified, it is inferred by analyzing url. (default= None)
crawl_type (str) – Crawling type. If not specified, the recommended crawling type is used. (default= None)
gateway (str, GummyGateWay) – Identifier of the Gummy Gateway class. See gateways. (default= None)
gatewaykwargs (dict) – Gateway keyword arguments. See passthrough.
- Returns
(title, content)
- Return type
tuple (str, dict)
Examples
>>> from gummy import TranslationGummy
>>> model = TranslationGummy()
>>> title, texts = model.get_contents("https://www.nature.com/articles/ncb0800_500")
Estimated Journal Type : Nature
Crawling Type: soup
:
>>> print(title)
Formation of the male-specific muscle in female by ectopic expression
>>> print(texts[:1])
[{'head': 'Abstract', 'en': 'The () gene product Fru has been ... for the sexually dimorphic actions of the gene.'}]
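The example output above suggests that the content is a sequence of dictionaries carrying a 'head' and an 'en' key. Assuming that shape, a small helper can summarize what was extracted before any translation is requested. summarize_sections is hypothetical and not part of gummy; entries without an 'en' key are simply skipped.

```python
from typing import Dict, List, Tuple


def summarize_sections(texts: List[Dict[str, str]]) -> List[Tuple[str, int]]:
    """Return (heading, number of English characters) for each entry that has English text."""
    return [(section.get("head", ""), len(section["en"])) for section in texts if "en" in section]


# Sample data shaped like the example output above (the text itself is a placeholder).
sample = [
    {"head": "Abstract", "en": "Some abstract text."},
    {"head": "Figure"},  # no English text, so it is skipped
]
summary = summarize_sections(sample)
```

Summing the character counts gives a rough idea of how many requests of size maxsize a paper will need.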
toHTML(url, path=None, out_dir='/Users/iwasakishuto/.gummy', from_lang='en', to_lang='ja', correspond=True, journal_type=None, crawl_type=None, gateway=None, searchpath='/Users/iwasakishuto/Github/portfolio/Translation-Gummy/gummy/templates', template='paper.html', **gatewaykwargs)
Get contents from a URL and create an HTML file.
- Parameters
url (str) – URL of a paper or path/to/local.pdf.
path/out_dir (str) – Where to save the created HTML. If path is None, it is saved at <out_dir>/<title>.html. (default= GUMMY_DIR)
from_lang (str) – Language before translation. (default= "en")
to_lang (str) – Language after translation. (default= "ja")
correspond (bool) – Whether to make the position of the from_lang text correspond to that of the to_lang text. (default= True)
journal_type (str) – Journal type. If specified, the journal_type journal crawler is used. (default= None)
crawl_type (str) – Crawling type. If not specified, the recommended crawling type is used. (default= None)
gateway (str, GummyGateWay) – Identifier of the Gummy Gateway class. See gateways. (default= None)
searchpath/template (str) – Use the <searchpath>/<template> template for creating the HTML. (default= TEMPLATES_DIR/paper.html)
gatewaykwargs (dict) – Gateway keyword arguments. See passthrough.
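The path/out_dir rule described above (an explicit path wins, otherwise the file goes to <out_dir>/<title>.html) can be sketched as follows. resolve_html_path is a hypothetical helper written for illustration; the real method may additionally sanitize the title or create directories, which this sketch does not attempt.

```python
import os
from typing import Optional


def resolve_html_path(title: str, path: Optional[str] = None, out_dir: str = ".") -> str:
    """Return the explicit `path` if given, otherwise <out_dir>/<title>.html."""
    if path is not None:
        return path
    return os.path.join(out_dir, f"{title}.html")
```

Passing path explicitly is useful when the paper title contains characters that are awkward in file names.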
toPDF(url, path=None, out_dir='/Users/iwasakishuto/.gummy', from_lang='en', to_lang='ja', correspond=True, journal_type=None, crawl_type=None, gateway=None, searchpath='/Users/iwasakishuto/Github/portfolio/Translation-Gummy/gummy/templates', template='paper.html', delete_html=True, options={}, **gatewaykwargs)
Get contents from a URL and create a PDF.
- Parameters
url (str) – URL of a paper or path/to/local.pdf.
path/out_dir (str) – Where to save the created HTML. If path is None, it is saved at <out_dir>/<title>.html. (default= GUMMY_DIR)
from_lang (str) – Language before translation. (default= "en")
to_lang (str) – Language after translation. (default= "ja")
correspond (bool) – Whether to make the position of the from_lang text correspond to that of the to_lang text. (default= True)
journal_type (str) – Journal type. If specified, the journal_type journal crawler is used. (default= None)
crawl_type (str) – Crawling type. If not specified, the recommended crawling type is used. (default= None)
gateway (str, GummyGateWay) – Identifier of the Gummy Gateway class. See gateways. (default= None)
searchpath/template (str) – Use the <searchpath>/<template> template for creating the HTML. (default= TEMPLATES_DIR/paper.html)
delete_html (bool) – Whether to delete the intermediate HTML file. (default= True)
options (dict) – Options for wkhtmltopdf. See https://wkhtmltopdf.org/usage/wkhtmltopdf.txt (default= {})
gatewaykwargs (dict) – Gateway keyword arguments. See passthrough.
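As an illustration of the options argument, the sketch below builds a dictionary of wkhtmltopdf settings and forwards it to toPDF. It assumes that the keys are wkhtmltopdf flag names without the leading dashes, which is the convention used by wrappers such as pdfkit; whether gummy expects exactly this form is not confirmed here. The helper paper_to_pdf is hypothetical.

```python
# Assumed convention: keys are wkhtmltopdf flags without the leading "--".
PDF_OPTIONS = {
    "page-size": "A4",
    "encoding": "UTF-8",
    "margin-top": "10mm",
}


def paper_to_pdf(url, out_dir):
    """Translate the paper at `url` and write a PDF under `out_dir` (hypothetical wrapper)."""
    # Imported lazily so that this sketch can be loaded without gummy installed.
    from gummy import TranslationGummy

    model = TranslationGummy()
    # Whatever toPDF returns is passed through unchanged.
    return model.toPDF(url=url, out_dir=out_dir, delete_html=True, options=PDF_OPTIONS)
```

Calling paper_to_pdf requires a working gummy installation, Chrome, and wkhtmltopdf, so it is defined here but not executed.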
highlight(url, path=None, out_dir='/Users/iwasakishuto/.gummy', from_lang='en', to_lang='ja', journal_type=None, gateway=None, ignore_length=10, highlight_color=[1, 1, 0], **gatewaykwargs)
Get contents from a URL and create a PDF.
- Parameters
url (str) – URL of a paper or path/to/local.pdf.
path/out_dir (str) – Where to save the created HTML. If path is None, it is saved at <out_dir>/<title>.html. (default= GUMMY_DIR)
from_lang (str) – Language before translation. (default= "en")
to_lang (str) – Language after translation. (default= "ja")
journal_type (str) – Journal type. If specified, the journal_type journal crawler is used. (default= None)
gateway (str, GummyGateWay) – Identifier of the Gummy Gateway class. See gateways. (default= None)
ignore_length (int) – If the number of English characters is smaller than ignore_length, the text is not highlighted. (default= 10)
highlight_color (list) – The highlight color. (default= [1, 1, 0])
gatewaykwargs (dict) – Gateway keyword arguments. See passthrough.
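The ignore_length rule and the highlight_color default can be sketched as below. Both helpers are hypothetical and not part of gummy. The sketch assumes that highlight_color holds RGB components in the range 0 to 1, which would make the default [1, 1, 0] yellow.

```python
from typing import Sequence


def should_highlight(text: str, ignore_length: int = 10) -> bool:
    """Text shorter than `ignore_length` characters is skipped."""
    return len(text) >= ignore_length


def rgb_to_hex(color: Sequence[float]) -> str:
    """Convert RGB components in the range 0 to 1 into a #rrggbb string."""
    r, g, b = (round(component * 255) for component in color)
    return f"#{r:02x}{g:02x}{b:02x}"
```

A threshold like this keeps short fragments, such as figure labels, from being highlighted.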