Search results

Language model assisted OCR classification for Republican Chinese newspaper text

Author: Henke, Konstantin; Arnold, Matthias

Published: 29 Feb 2024

Publisher: Taiwanese Association for Digital Humanities, Taipei, ROC; Universitätsbibliothek Heidelberg, Heidelberg

Access:

Resolving system (free of charge)

Heidelberg: Universitätsbibliothek Heidelberg

Location:

Universitätsbibliothek Heidelberg

Interlibrary loan:

Not available

Location:

HeiBIB - the Heidelberg University Bibliography

Interlibrary loan:

Not available


Full text (free of charge)

Source:	Union catalogs
Language:	English
Media type:	Book (monograph)
Format:	Online
Other identifiers:	URN: urn:nbn:de:bsz:16-heidok-314169; DOI: 10.11588/heidok.00031416
Extent:	1 online resource (20 pages), illustrations

Die Funktion nackter Körper im Holocaust-Spielfilm 1948-1975 (The function of naked bodies in Holocaust feature films, 1948-1975)

eine phänomenologische Filmanalyse (a phenomenological film analysis)

Author: Henke, Konstantin

Published: 2012


Full text on the university publications server of UB Wien (University Library Vienna)

Source:	Union catalogs
Language:	German
Media type:	Dissertation
Format:	Online; print
Other identifiers:	URN: urn:nbn:at:at-ubw:1-29745.45433.710070-6
Subject headings:	Europe; Film; Holocaust <motif>; Nudity <motif>; History 1948-1975
Extent:	134 pp.
Note(s):	Vienna, Univ., diploma thesis (Dipl.-Arb.), 2013

Language Model Assisted OCR Classification for Republican Chinese Newspaper Text

Author: Henke, Konstantin; Arnold, Matthias

Published: 2024

Publisher: Taiwanese Association for Digital Humanities

Full text:	https://archiv.ub.uni-heidelberg.de/volltextserver/31416/ ; https://archiv.ub.uni-heidelberg.de/volltextserver/31416/7/Language_model_Henke_Arnold_2023.pdf
Citable links:	https://doi.org/10.11588/heidok.00031416 ; https://doi.org/10.6853/DADH.202310_(12).0001 ; https://nbn-resolving.org/urn:nbn:de:bsz:16-heidok-314169

In this work, we present methods to obtain a neural optical character recognition (OCR) tool for article blocks in a Republican Chinese newspaper. Our basis is a small fraction of the image corpus for which text ground truth exists. We introduce a character segmentation method which produces over 90,000 labeled images of single characters and train a GoogLeNet classifier as an OCR model. In addition, we create synthetic training data from character images extracted from Song-Ti fonts. Randomly augmented on the fly and used for pre-training, they increase OCR accuracy from 95.49% to 96.95% on our test set. Finally, we employ post-OCR correction based on a pre-trained masked language model and present heuristics to select the required hyperparameters, by which we are able to correct 16% of remaining classification errors, increasing accuracy on the test set to 97.44%.
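The accuracy figures in the abstract can be cross-checked with a little arithmetic: correcting "16% of remaining classification errors" means the error rate falls from 100 - 96.95 = 3.05 percentage points to 100 - 97.44 = 2.56 points, and 0.49 / 3.05 is roughly 0.16. The minimal sketch below computes this relative error reduction from the reported accuracies. The helper name `relative_error_reduction` is hypothetical and not taken from the paper. The second figure (for synthetic pre-training) is derived here from the abstract's numbers and is not stated by the authors.

```python
def relative_error_reduction(acc_before: float, acc_after: float) -> float:
    """Return the fraction of errors removed when accuracy changes.

    Both accuracies are percentages (e.g. 96.95 for 96.95%). The result is
    (errors before - errors after) / errors before, as a fraction.
    """
    err_before = 100.0 - acc_before
    err_after = 100.0 - acc_after
    return (err_before - err_after) / err_before


# Post-OCR correction with the masked language model: 96.95% -> 97.44%
print(f"{relative_error_reduction(96.95, 97.44):.1%}")  # prints 16.1%

# Synthetic-data pre-training: 95.49% -> 96.95% (derived here, not stated in the abstract)
print(f"{relative_error_reduction(95.49, 96.95):.1%}")  # prints 32.4%
```

Expressing gains as a share of the remaining errors makes improvements near the top of the accuracy scale easier to compare than raw percentage-point differences.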


Source:	BASE subject subset AVL
Language:	English
Media type:	Report
Format:	Online
DDC classification:	Data processing, computer science (004); Library and information sciences (020); Other languages (490); Literatures of other languages (890); History of Asia, Far East (950)
License:	info:eu-repo/semantics/openAccess; please see the front page of the work
