"a string spanning multiple lines " \
"on and on and on and on" \
"and so on"
This document describes the DSL-based files with an extension .iml
or .imp
.
An .imp
file is a file containing a standalone transliteration map. For
instance, a map that can transliterate a Korean file to a Latin file.
An .iml
file is a file that contains a library of aliases and stages to be
used by the .imp
maps. It follows the same format, but does not require the
metadata and tests parts to exist and doesn’t allow the main
stage to exist.
This document describes the map version of the format if it isn’t noted
otherwise.
Basic syntax
A #
character is a comment character. This means, that the part that follows
a #
character till the end of the line is ignored by Interscript, but exist to
communicate to a human reader the intention behind the content. In this document
it is most often a hint to a person reading this document.
A String is a part of the document of a form either: "content"
or 'content'
.
It denotes a group of characters to be used. It can be joined together using a
+
character like so: "a" + "b"
which is equal to as if someone wrote just
"ab"
.
It can also be joined in the following way:
The \\
character written before the end of the line ensures that our parser
treats the next line as a continuation of the previous one. This kind of syntax
makes multiline tests much clearer.
Except for the strings of the form 'content'
, all those forms can contain
escape forms like \u0410
, which means "An Unicode character 0410". The usage
of those forms is discouraged in new maps, but possible. A String can also
contain an escape character \n
which means a line break.
An array (or a list) is a part of the document of a form ["a", "b", "c"]
. It
means a sequential group of Strings, or other types.
Document
The root part of the .iml
file is called a document. A map has a format as
follows:
metadata {
# Metadata part comes here
}
tests {
# Tests part comes here
}
# A dependency directive may happen zero or more times. It will be described in
# a subsection.
dependency "other-map-or-library", as: shortname
# This part is optional
aliases {
# Aliases part comes here
}
stage {
# A stage description comes here
}
# There may be more than 1 stage, the other stages need to have a name. The
# default stage name is `main`. A name can't happen more than once in a document.
stage(stage_name) {
# A stage description comes here
}
Dependency
Dependency is an instruction to be issued only in the document context. It means that we want to import some aliases or stages from another map or a library.
dependency "other-map-or-library", as: shortname
This instruction will allow us to reference aliases and stages from other
libraries in this form: map.shortname.stage.stagename
for stages and
map.shortname.aliasname
for aliases.
There is a second syntax, mostly useful for loading libraries that will import the stages and aliases to a global context resulting in possibly more human readable maps:
dependency "other-map-or-library", as: shortname, import: true
This form allows to reference other stages and aliases in the following form:
stage.stagename
, aliasname
It is not possible to load maps using this form, only libraries, because we
can’t override the main
stage.
The standard library is implicitly imported this way. There’s no way or need to import it explicitly.
Standard library
All maps depend on a standard library implicitly. This standard library defines a few useful aliases that may or may not be expressed otherwise.
Below is a table that describes the aliases defined by the standard library:
|
An empty string |
|
A space character |
|
Any whitespace ascii character (space, tab, line-delimiter, …) |
|
A word boundary (see below for what institutes a word character) |
|
An ascii word character (a-z, A-Z, 0-9, _) |
|
Negation of the above |
|
Any ascii alphabetic character (a-z, A-Z) |
|
Negation of the above |
|
Any ascii digit |
|
Negation of the above |
|
Beginning of a line |
|
Ending of a line |
|
Beginning of a string |
|
Ending of a string |
Any standard library (or otherwise) aliases can be joined with anything else
using a + command, for example: line_start + "rest"
.
Metadata part
The metadata part describes our map. It follows a YAML syntax, and so contrary to
other parts of the document, it doesn’t necessarily conform to all what’s written
in the Basic syntax
part of this guide.
metadata {
# ID of the authority that provided the transliteration rules we are about to implement
authority_id: iso
# ID of the rules, most often the year they were defined
id: 1996-method1
# The language code of the map
language: iso-639-2:kor
# The source script of our map, in our example Hang for Hangul
source_script: Hang
# The destination script of our map
destination_script: Latn
# The longer name of our map
name: ISO/TR 11941:1996 Information and documentation — Transliteration of Korean script into Latin characters
# The URL where it was published
url: https://www.iso.org/standard/20564.html
# The creation date of our map
creation_date: 1996
# The adoption date of our map, or empty if not adopted
adoption_date: ""
# The description of our map
description: |
Establishes a system for the transliteration of the characters of Korean script into Latin characters.
Intended to provide a means for international communication of written documents.
# The notes that describe some parts of our map that we are about to implement
notes:
- A word-initial hard sign 'ъ' is not represented, but instead is left out of the transliteration.
- The romanization follows the dialect spoken in Chechnya rather than other local pronunciations.
}
Tests part
The tests part describes a group of the tests to be executed by the automated system to verify that the map is defined properly. An example tests part looks like this:
tests {
test "애기", "aeki"
test "방", "pang"
}
This means, that we want to test our map to transliterate a string "애기" to "aeki" and "방" to "pang".
Aliases part
An aliases part describes a group of aliases to be used by the stages to simplify the code of our map.
Let’s suppose that our map refers to "Double consonant jamo" and "Aspirated consonant jamo" quite extensively. We can alias those
aliases {
def_alias double_cons_jamo, any("ᄁᄄᄈᄍᄊ")
def_alias aspirated_cons_jamo, any("ᄏᄐᄑᄎ")
}
And later in the stage part refer to them by just double_cons_jamo
, not
needing to repeat ourselves.
Stage part
A stage part describes a stage, a sequential group of steps to transliterate a string from a source script code to a destination script code. An example stage looks like the following:
stage {
run map.hangjamo.stage.main
sub any("ᄀᆨ"), "k"
sub any("ᄏᆿ"), "kh"
parallel {
sub "ᅡ", "a"
sub "ᅥ", "eo"
}
}
A stage can be named, as described in the Document section. The default name
of a stage is main
.
sub
call
A sub
call does a substitution of an item (string, character, alias) with
another item.
stage {
sub "source", "destination"
}
This call allows for some named parameters:
|
Execute this substitution only if the "source" is preceded by what is given as a parameter, but won’t replace it, it will only replace the "source". |
|
Same, but this parameter denotes what is used after. |
|
Negation of |
For example:
stage {
sub boundary + "Е", "Ye", not_before: "’"
sub boundary + "е", "ye", not_before: "’"
sub none, "'", not_before: hangul, after: aspirated_cons
}
Multiple replacements
In various maps there was a need to document multiple replacements. Let’s suppose our character set has a character "a" that can be transliterated to any of the forms "X", "Y" or "Z". As of now, it means that "a" is always translated to "X", as it came first. In the future it will be possible to execute such a map in reverse as well.
stage {
sub "a", any("XYZ")
}
parallel
block
A parallel block can be defined as a subsection of a stage
part. It indicates
that the steps inside need to be executed in parallel. At the current time, only
sub
calls can be executed in parallel. It also means, that those steps will try
to find the longest substrings first.
stage {
parallel {
sub "А", 'A'
sub "Б", 'B'
sub "В", 'V'
sub "Г", 'G'
}
}
Simple mode
If there are only rules with simple sub rules, we are using a fast track
implementation. By simple sub rules we mean those rules that lack a before/after
part and ones that only use string and possibly any
items with concatenation.
run
call
The run call runs a stage defined inside the document, or another map or
library. If this map isn’t local, a map or library dependency needs to be
declared using the dependency
call.
For example:
stage {
# If dependency declared without import: true
run map.hangjamo.stage.main
# If dependency declated with import: true, or we reference a local stage
run stage.remove_spaces
}
Standard library functions
There are certain conversions that may be hard to be achieved using stages, those are implemented in respective standard libraries using programming languages.
For a function named title_case
, it can be called with the following:
stage {
title_case
}
A standard library function can take (named) arguments. Those are described in the table below and they may be omitted if a default value is specified.
List of standard library functions
Function name | Arguments | Sample input | Sample output |
---|---|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Consult: Usage with Secryst. |
|
|
|
Consult: Usage with Rababa. |
Items
Interscript doesn’t work purely on Strings, even though Strings are mostly
referenced to by this document. The items can be used in the alias
and stage
context.
String item
The most basic kind of item. For example "Г"
means "match Г" or "replace
with Г" depending on usage context. Some contexts will only accept strings, or
aliases to strings.
+
method
Items can be concatenated (added together) to denote a complex item. For instance:
any("ab") + "e"
means "either ae or be" and is equivalent to any(["ae", "be"])
.
any
item
Any denotes some alternative variations of a string. It has 3 forms of call:
-
any("abcde")
- any character: a, b, c, d or e -
any(["one", "two"])
- any string: one or two -
any("a".."z")
- any character from a to z
Any can be also used with other kinds of items than String, for instance:
stage {
sub any([line_start + "a", "a" + line_end]), none
}
maybe
, some
, maybe_some
items
If you want a given item to be allowed to be repeated respectively: 0 to 1 times,
1 to Infinity times, 0 to Infinity times, you can surround it with respectively:
maybe()
, some()
, maybe_some()
.
stage {
sub "a"+maybe("-")+"b", "AB" # Equivalent to regexp: a-?b
sub "a"+some("-")+"b", "AB" # Equivalent to regexp: a-+b
sub "a"+maybe_some("-")+"b", "AB" # Equivalent to regexp: a-*b
}
alias
item
An alias item references an alias. For example map.other_map.alias_from_other_map
or simply a_local_alias_or_an_alias_from_imported_library
.
capture
and ref
items
Sometimes there may be a need to reference a group from input inside output (or
input too). People who know regular expressions are familiar with expressions of
some form of replace /(a)/, '[\1]'
. Interscript supports this kind of syntax:
stage {
sub capture(any("abc")), "["+ref(1)+"]"
}
When ran against a string "abcde"
, this stage will produce an output of
"[a][b][c]de"
.
Reversibility
Starting with Interscript 2.2 we added reversibility support. In general all
commands, except for metadata support a reverse_run
keyword. This keyword
is nil
by default, which means, that it’s reversible. (One exception though:
sub
call, if given a none
as to
, defaults to reverse_run: false
).
reverse_run: false
means, that the command is only ran in forward.
reverse_run: true
, on the other hand, means that the command is only ran in
reverse.
Example 1:
stage {
sub "a", "b", reverse_run: true
sub "c", "d", reverse_run: false
}
When ran in forward mode (normal run) on a string "abcde", it gives "abdde". When ran in reverse mode on a string "abcde", it gives "aacde".
Example 2:
stage {
parallel {
sub "a", "あ"
sub "o", "お"
sub "i", "い"
}
}
When ran in forward mode (normal run) on a string "あおい", it gives "aoi". When ran in reverse more on a string "aoi", it gives "あおい".
The tests accept reverse_run:
as well, and as before, it defaults to nil
.
To run a test in only a single direction, you can write it as follows:
tests {
test "привет", "privet", reverse_run: false
}
Do note that even though a given command is given reverse_run: true
, it
still needs to be written in forward. As in: it will be reversed. As with tests,
if you want to test a map in reverse that "privet" gives a "привет", you still
need to write "привет" as the first argument, "privet" as the second, and then
you need to supply reverse_run: true
.
We understand there may be a need that a given set of rules won’t get reversed
(not all kinds of commands are reversible in principle). For this kind of usage,
we created a dont_reverse: true
argument to stage
. Note: no other commands
support this argument. An example map of this kind would look like this:
stage {
run stage.reverse_exclusive, reverse_run: true
}
stage(reverse_exclusive, dont_reverse: true) {
sub 'a', 'b'
}
This map doesn’t have any rules in forward, so when given "abcde" in forward, it
obviously returns "abcde". When given "abcde" in reverse it returns "bbcde" (if
dont_reverse:
wouldn’t be set, it would return "aacde").
Composability
To compose two maps together, whether in reverse or not, you can supply a name in the form of: "map1-name|map2-name" everywhere a map name is accepted (in a CLI utility or as a dependency).
Ending notes
This document described everything Interscript currently supports, but it is strongly advised to read the existing maps to get a grasp of how those functionalities can be used best.