Jezičnopovijesni i računalnojezikoslovni aspekti opisa i normiranja pisanja vodoravnih crta u hrvatskome jeziku

Language Historical and Computational Linguistic Aspects of the Descriptions and Norming of Dashes in the Croatian Language

  • Rad opisuje jedan od dvaju interpunkcijskih znakova (središnje crte i navodnici) koji bitno odstupaju od odnosa jednoga znaka za jedno (unikodno) značenje. Dok navodnici imaju višestruke grafeme (točnije, osam) za jedno značenje, središnje crte najčešće imaju dva grafema (kraća i dulja crta) koja pokrivaju čak 11 (unikodnih i latiničnih) crtnih znakova. Dok je kriterij crtne duljine tradicionalno visoko istaknut u pravopisnim priručnicima, u predstavljenoj se kategorizaciji on nalazi tek na šestoj hijerarhijskoj razini. Osim što su u međuvremenu standardizirana dva nova unikodna crtna znaka (two-em dash i three-em dash, Unicode 6.1, siječanj 2012.), drugačija metodologija i uspoređivanje jezičnopovijesnoga i računalnojezikoslovnoga aspekta proširila je spoznaje o crtnim znakovima u hrvatskome jeziku o kojima je pisano u Portada i Stojanov (2009). Predstavlja se kategorizacija osjetljiva na dihotomiju grafičkoga prikaza i značenja koja sve crtne znakove dijeli u pet hijerarhijskih razina. Između 44 unikodna vodoravna i neprekinuta crtna znaka, podjelom prema tipu, vremenu, funkciji, smjeru i visini, došlo se do 11 latiničnih suvremenih pismovnih vodoravnih središnjih znakova među kojima svaki latinični jezik odabire svoje crtne znakove. Svim se crtnim unikodnim grafemima opisalo značenje i uporaba. S druge strane, crtni se znakovi promatraju u kroatističkoj jezičnopovijesnoj pravopisnoj perspektivi. U odnosu na bogati repozitorij unikodno standardiziranih crtnih znakova utvrdilo se da je pravopisna norma bitno redukcijska. Pravopisno normiranje crtnih znakova podijelilo se u dva razdoblja i tri skupine, ovisno o grafemskome obliku (prva i druga generacija pravopisnih priručnika) i nazivlju (prijestandardna faza i dva standardna smjera normiranja ovisno o prihvaćanju terminoloških parova spojnica – crtica i crtica – crta).
Na temelju jezičnopovijesnoga i računalnojezikoslovnoga poredbenog istraživanja te na temelju supostavljanja unikodne standardizacije crtnih znakova pravopisnoj tradiciji opisa središnjih crta željelo se ukazati na (i) potrebu za širim i interdisciplinarnim pristupom opisa pisane jezične prakse, (ii) nedovoljnost opisa školske razine pravopisnih priručnika za suvremeno pisanje, kao i na (iii) nedostatnost postojeće kroatističke kodifikacije obaju terminoloških smjerova. Da bi se pravopisni priručnici mogli nazvati znanstveno utemeljenim djelima, u znatnijoj bi mjeri trebali opisati računalno pisanje i na razini interpunkcije uvesti razlikovanje znaka i grafema. Jedno od takvih mjesta opisa prema kojima bi pravopisi mogli unaprijediti svoju tehnološku suvremenost jest pitanje pisanja spojnice na početku prelomljena retka o čemu se iznijelo osam argumenata za odbacivanje aktualne tradicije. Raščlamba je pokazala da ima opravdanosti da se crtna kodifikacija temelji na trima ili četirima znakovima koji se iz 11 unikodnih latiničnih znakova svode uspostavljanjem osnovnih skupina središnjih crta, radno nazvanih c1, c2, c3 i c4 kao najkraća, srednje duga, duga i jako duga središnja crta.
  • This paper describes one of the two punctuation marks (dashes and quotation marks) that deviate significantly from the relationship of one character per (Unicode) semantic value. While quotation marks have multiple graphemes (eight, specifically) for one semantic value, dashes typically have two graphemes (a shorter and a longer dash) that cover as many as 11 (Unicode and Latin) dash characters. While the criterion of dash length has traditionally been highly prominent in orthography manuals, in the categorization presented here it appears only on the sixth hierarchical level. Besides the fact that two new Unicode dash characters (the two-em dash and three-em dash, Unicode 6.1, January 2012) have been standardized in the meantime, a different methodology and a comparison of the language-historical and computational linguistic aspects have expanded the knowledge of dash characters in the Croatian language described in Portada and Stojanov (2009). A categorization is presented that is sensitive to the dichotomy of graphic representation and meaning and that divides all dash characters into five hierarchical levels. Starting from the 44 Unicode horizontal and unbroken dash characters, a division by type, time, function, direction, and height yields 11 contemporary Latin alphabetic horizontal central characters, from which each language written in the Latin alphabet chooses its own dash characters. The semantic value and usage of every Unicode dash grapheme are described. On the other hand, dash characters are also examined from the perspective of the history of Croatian orthography. Compared with the rich repository of standardized Unicode dash characters, the orthographic norm is shown to be significantly reductive.
The orthographic norming of dash characters is divided into two periods and three groups, depending on graphemic form (the first and second generation of orthography manuals) and terminology (the pre-standard phase and the two standard norming schools, depending on the acceptance of the terminological pairs “spojnica – crtica” and “crtica – crta”). The language-historical and computational linguistic comparative research, together with the contrastive analysis of the Unicode standardization of dash characters against the orthographic tradition of describing central dashes, was intended to highlight (i) the need for a broader, interdisciplinary approach to describing written linguistic practice, (ii) the insufficiency of descriptions in school-level orthography manuals for modern writing, and (iii) the insufficiency of the existing Croatian codification in both terminological schools. For orthography manuals to be called scholarly works, they should describe computer-based writing to a considerably greater extent and introduce, at the level of punctuation, a distinction between character and grapheme. One area in which orthography manuals could bring themselves technologically up to date is the writing of the hyphen at the beginning of a broken line; the paper gives eight arguments for abandoning the current tradition. The analysis has shown that it would be justified to base dash codification on three or four characters, obtained by reducing the 11 Latin Unicode characters to basic groups of central dashes, provisionally named c1, c2, c3 and c4: the shortest, medium-length, long, and very long central dash.
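As a rough illustration of the Unicode side of the abstract, the following Python sketch looks up the names and general categories of a few well-known dash code points, including the two-em and three-em dashes added in Unicode 6.1. The candidate list and the helper names (`CANDIDATES`, `describe`, `dash_table`) are assumptions made for this example; the abstract does not enumerate the paper's 11 characters or say how they map onto the groups c1 to c4.

```python
import unicodedata

# Hypothetical candidate set: a handful of well-known horizontal dash
# code points. This is NOT the paper's list of 11 characters, which the
# abstract does not enumerate.
CANDIDATES = [
    0x002D,  # hyphen-minus (the keyboard "dash")
    0x2010,  # hyphen
    0x2011,  # non-breaking hyphen
    0x2012,  # figure dash
    0x2013,  # en dash
    0x2014,  # em dash
    0x2015,  # horizontal bar
    0x2E3A,  # two-em dash (Unicode 6.1)
    0x2E3B,  # three-em dash (Unicode 6.1)
]


def describe(code_point):
    """Return (U+XXXX label, Unicode name, general category) for one code point."""
    ch = chr(code_point)
    return (
        f"U+{code_point:04X}",
        unicodedata.name(ch, "<unnamed>"),  # fallback if the local database is older
        unicodedata.category(ch),           # "Pd" means dash punctuation
    )


def dash_table(code_points=CANDIDATES):
    """Describe every candidate code point, in the order given."""
    return [describe(cp) for cp in code_points]


if __name__ == "__main__":
    for label, name, category in dash_table():
        print(label, name, category, sep="\t")
```

Every candidate here falls in the general category `Pd` (dash punctuation), which shows how several distinct characters sit behind what orthography manuals usually treat as just a shorter and a longer dash.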

Metadata
Author:Tomislav Stojanov
URN:urn:nbn:de:hebis:30:3-388579
URL:http://hrcak.srce.hr/141876
Parent Title (Croatian):Rasprave : časopis Instituta za Hrvatski Jezik i Jezikoslovlje
Publisher:Inst.
Place of publication:Zagreb
Document Type:Article
Language:Croatian
Date of Publication (online):2016/10/20
Year of first Publication:2015
Publishing Institution:Universitätsbibliothek Johann Christian Senckenberg
Release Date:2016/10/20
Tag:Croatian language; Unicode; dash; hyphen; linguography; orthography
crta; crtica; hrvatski jezik; jezikopis; pravopis; spojnica; unikod
Volume:41
Issue:1
Page Number:36
First Page:127
Last Page:161
Note:
Rights: Papers published in this journal can be used for personal or educational purposes while respecting the rights of authors and publishers.
HeBIS-PPN:401969150
Collections:Linguistics
Licence (German):Deutsches Urheberrecht