im2latex-100k , arXiv:1609.04938 | Zenodo



June 21, 2016
Dataset
Open Access
im2latex-100k , arXiv:1609.04938
Kanervisto, Anssi
A prebuilt dataset for OpenAI's image-to-LaTeX task. It includes a total of ~100k formulas and images, split into train, validation and test sets. Formulas were parsed from LaTeX sources provided here: http://www.cs.cornell.edu/projects/kddcup/datasets.html (originally from arXiv).
Each image is a PNG of fixed size. The formula is rendered in black and the rest of the image is transparent.
For related tools (e.g. a tokenizer), see this repository: https://github.com/Miffyli/im2latex-dataset
For pre-made evaluation scripts and a built im2latex system, see this repository: https://github.com/harvardnlp/im2markup
Newlines used in im2latex_formulas.lst are UNIX-style newlines (\n). Reading the file with any other newline convention yields a slightly wrong number of lines (104563 instead of 103558) and thus breaks the structure used by this dataset. By default, Python 3.x opens text files in universal-newlines mode, which also treats \r and \r\n as line endings; to avoid this, the file must be opened with newline="\n" (e.g. open("im2latex_formulas.lst", newline="\n")).
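The newline caveat above can be sketched in Python as follows. This is a minimal sketch, not part of the dataset's own tooling: the helper name read_formulas is hypothetical, and the latin-1 encoding is an assumption (chosen only because it can decode any byte sequence without raising an error).

```python
def read_formulas(path, encoding="latin-1"):
    """Return one formula per UNIX line of the formulas file.

    `read_formulas` is a hypothetical helper; `encoding` is an assumption,
    since the dataset description does not state the file's encoding.
    """
    # newline="\n" makes "\n" the only line terminator, so a stray "\r"
    # inside a formula stays part of that formula instead of splitting it.
    with open(path, newline="\n", encoding=encoding) as f:
        return [line.rstrip("\n") for line in f]
```

With the real file, len(read_formulas("im2latex_formulas.lst")) should match the 103558 lines quoted above. Note that str.splitlines() is not a safe substitute here, because it also splits on \r and several other separator characters.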
Files (306.8 MB)

Name                    Size        MD5
formula_images.tar.gz   292.2 MB    cf25f2408f1ea09bbd096890a6361533
im2latex_formulas.lst   12.3 MB     974c0a14f0daa6d91ecd0e625f1ddf52
im2latex_test.lst       237.4 kB    1bc17b865796dca5df15250b4da7804f
im2latex_train.lst      1.9 MB      d5607c37aa00576098a9e4bad84a7040
im2latex_validate.lst   213.7 kB    cf6eeee02bc443b1b9557685fbfe7ea5
readme.txt              924 Bytes   3d4cb64d8c403148ff06370d71072cdc
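The MD5 checksums in the file listing can be used to check downloads for corruption. The following is a minimal sketch using only the Python standard library; the names EXPECTED_MD5, md5_of and verify are hypothetical, and the files are assumed to sit in the current directory under their original names.

```python
import hashlib
import os

# Checksums copied from the file listing on this page.
EXPECTED_MD5 = {
    "formula_images.tar.gz": "cf25f2408f1ea09bbd096890a6361533",
    "im2latex_formulas.lst": "974c0a14f0daa6d91ecd0e625f1ddf52",
    "im2latex_test.lst": "1bc17b865796dca5df15250b4da7804f",
    "im2latex_train.lst": "d5607c37aa00576098a9e4bad84a7040",
    "im2latex_validate.lst": "cf6eeee02bc443b1b9557685fbfe7ea5",
    "readme.txt": "3d4cb64d8c403148ff06370d71072cdc",
}


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path):
    """Return True if the file's MD5 matches the listed checksum.

    The lookup uses the file's base name; unknown names raise KeyError.
    """
    return md5_of(path) == EXPECTED_MD5[os.path.basename(path)]
```

For example, verify("formula_images.tar.gz") returns False if the archive was truncated or altered during download.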
Statistics          All versions   This version
Views               21,732         21,756
Downloads           27,579         27,579
Data volume         3.6 TB         3.6 TB
Unique views        18,965         18,985
Unique downloads    7,912          7,912
Publication date: June 21, 2016
DOI: 10.5281/zenodo.56198
Keyword(s): im2latex, latex, tex, formula, openai
Related identifiers: Part of arXiv:1609.04938
License (for files): Creative Commons Zero v1.0 Universal