Name

ponscr-syntax — description of Ponscripter syntax

Description

This page documents the syntax of Ponscripter scripts. See Overview for an overview of other documentation.

Note that this cannot be considered NScripter documentation. NScripter itself is largely unspecified. Ponscripter's implementation is ultimately based on observation and documentation rather than on reverse-engineering as such: it inevitably adopts different parsing strategies, and is more liberal in what it accepts. Not all differences are described below. With this disclaimer out of the way: the documentation.

Fundamentals

Scripts are line-based.

There are two parsing modes: command mode and text mode. Parsing of each line begins in command mode, and switches to text mode for the rest of the line if a text command is encountered.

The two parsing modes have little in common. The following sections discuss command mode first; text mode is then treated separately.

A third mode, unmarked text, exists for legacy reasons. This mode is similar to text mode, and is entered if an invalid character is encountered at the start of a command: that is, a number, a character outside the ASCII range, or anything in the set [[@\/%?$(!#,]. Such lines are valid if they contain only !-commands; otherwise a warning is issued, and the behaviour is undefined.

(This mode derives from NScripter, where traditionally there was no intersection between printable characters and command characters; the ` text marker was introduced in ONScripter as a means of supporting English text, and replaced in Ponscripter with ^ to free up ` for other uses. Since unmarked text serves no useful purpose, and complicates parsing, it is deprecated and will be removed without notice at some point in the future.)

Context

NScripter is a context-sensitive language. Each parameter to a command may be parsed differently based on the type of that parameter. The major types are string and integer, with labels and barewords being special cases of string parameters.

String expressions do not merely have a different type from integer expressions, as in other languages: they have a distinct syntax. Some string expressions can be parsed as integer expressions, but then leave code unparsed that will cause a syntax error when it is reached. It is impossible, in the general case, to parse a line of code unless it is known in advance what context each parameter is using.

For example, given the following definitions:

numalias foo, 100
stralias foo, "bar"

the constant foo would have the value 100 in integer context, but "bar" in string context.

See the next section for more on constants, and Expression syntax below for details of the syntax accepted in each context.

Lexical categories

The following broad lexical categories are used in command mode (parenthesised names are used in syntax descriptions below):

Comments
are introduced with a semicolon, and last to the end of the line.
Barewords (bareword)

have the same syntax as identifiers in most programming languages: the first character must be in the set [A-Za-z_], and the remainder must be in the set [A-Za-z0-9_].

A bareword at the start of a line, or immediately following a colon, is assumed to be a command. Otherwise its interpretation is context-sensitive:

  • If an alias exists of the desired type (a numalias in number context, or a stralias in string context) then the bareword acts as a constant, and the value of the alias is substituted.
  • In string context where no stralias exists, the bareword itself is treated as a string; it will be transformed to lower case and substituted directly.
  • In number context where no numalias exists, a warning is issued and 0 is substituted.
  • Some commands, such as rmenu, ld, and systemcall, look for barewords directly for certain parameters; in these cases aliases are not resolved.
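The resolution rules above can be sketched in Python as follows. This is only an illustration of the documented behaviour, not Ponscripter's implementation: the function name resolve_bareword and the dictionary arguments are hypothetical, alias lookup is assumed to be an exact match on the name, and barewords consumed directly by commands such as rmenu are left out.

```python
import warnings

def resolve_bareword(word, context, numaliases, straliases):
    """Resolve a bareword parameter in 'int' or 'str' context (illustrative)."""
    if context == "int":
        if word in numaliases:
            return numaliases[word]      # numalias acts as a constant
        warnings.warn(f"no numalias named {word!r}; substituting 0")
        return 0
    if context == "str":
        if word in straliases:
            return straliases[word]      # stralias acts as a constant
        return word.lower()              # the bareword itself, lower-cased
    raise ValueError(f"unknown context: {context!r}")
```

With numalias foo, 100 and stralias foo, "bar" both defined, the same bareword foo resolves to 100 in integer context and to "bar" in string context.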
String literals (str_lit)

are formed in two ways. They may be enclosed in regular double quotes, or in pairs of the text delimiter (^ in native scripts, ` in legacy scripts).

The two forms have slightly different semantics. Strings enclosed in text delimiters support ~-tags (described under text mode below) to apply text formatting, while tildes are literal characters in double-quoted strings.

Note: this differs from ONScripter (and some pre-release versions of Ponscripter), where double-quoted strings had semantics similar to unmarked text: in particular, whitespace was ignored.

In these interpreters, whitespace could be made significant in double-quoted text by following the opening quote with a text delimiter. This no longer has any effect, but is still supported for backwards-compatibility: the text delimiter is ignored, and the construct is equivalent to a double-quoted string.

Numeric literals (num_lit)

are straightforward.

Unlike NScripter, which accepts only decimal integers, Ponscripter also understands the C-style 0xNN notation for hexadecimal numbers.
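As a rough illustration, the two accepted forms can be recognised with the num_lit pattern given in the grammar below. The function name parse_num_lit is hypothetical; this is a sketch in Python, not the interpreter's own code.

```python
import re

# Hexadecimal is tried first so that "0x1F" is not read as the decimal "0".
NUM_LIT = re.compile(r"0x[0-9A-Fa-f]+|[0-9]+")

def parse_num_lit(text):
    """Convert a decimal or 0x-prefixed hexadecimal literal to an int."""
    if NUM_LIT.fullmatch(text) is None:
        raise ValueError(f"not a numeric literal: {text!r}")
    return int(text, 16) if text.startswith("0x") else int(text)
```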

Label literals (label)

have the general format *bareword. They are used to mark and provide targets for jump commands (goto, csel, etc) and for the construction of subroutines with commands such as defsub, textgosub, etc.

(In NScripter, label literals are a distinct type that can only be used where a command is expecting a label. ONScripter also accepts them wherever a string is expected: *foo means roughly the same thing as "foo".)

Colour literals (colour)

have the general format #RRGGBB, where RR, GG, and BB are each two hex digits. These represent colours in the standard way.
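For illustration, a colour literal can be split into its three components as below. This is a Python sketch with a hypothetical function name, parse_colour, and says nothing about how Ponscripter stores colours internally.

```python
import re

COLOUR = re.compile(r"#([0-9A-Fa-f]{2})([0-9A-Fa-f]{2})([0-9A-Fa-f]{2})")

def parse_colour(text):
    """Split a #RRGGBB literal into (red, green, blue), each 0-255."""
    match = COLOUR.fullmatch(text)
    if match is None:
        raise ValueError(f"not a colour literal: {text!r}")
    return tuple(int(pair, 16) for pair in match.groups())
```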

(In NScripter, colour literals are a distinct type that can only be used where a command is expecting a colour. ONScripter also accepts them wherever a string is expected: #RRGGBB means exactly the same thing as "#RRGGBB".)

Variables (int_var, str_var)

take the form of a sigil followed by either a number, a bareword (which must have been defined with numalias), or an integer variable (with sigil) for indirect access.

The sigils are % for integer variables, ? for integer arrays, and $ for string variables.

Hence %200 is an integer variable, $%foo is the string variable indexed by the current value of %foo, and ?bar[9][4] dereferences the multidimensional array ?bar.

Variable syntax is expressed formally in the expression sections below.

Expression syntax

Integer expressions (int_expr)

are similar to those in other languages. The syntax is infix. There are two operator precedence levels: *, /, and mod are processed before + and -. Parentheses and unary minus operate as normal.

More formally:

int_expr::= int_term | int_term binary_op int_expr
int_term::= int_paren | - int_paren
int_paren::= ( int_expr ) | int_elt
int_elt::= num_lit | int_var | bareword
int_var::= % int_elt | ? int_elt subscript+
subscript::= [ int_expr ]
binary_op::= * | / | mod | + | -
num_lit::= [0-9]+ | 0x[0-9A-Fa-f]+
bareword::= [A-Za-z_][A-Za-z_0-9]*
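The following Python sketch evaluates a subset of this grammar with the two precedence levels described above. It is an illustration only, and several points are assumptions rather than documented behaviour: operators of equal precedence are applied left to right, division truncates toward zero as in C, and mod yields the matching remainder. Barewords and ? arrays are omitted for brevity, and the names evaluate, tokenize and c_div are hypothetical.

```python
import re

# num_lit, the mod keyword, operators, parentheses and the % sigil.
TOKEN = re.compile(r"\s*(0x[0-9A-Fa-f]+|[0-9]+|mod\b|[-+*/()%])")

def tokenize(src):
    src = src.rstrip()
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def c_div(a, b):
    # Assumption: division truncates toward zero, as in C.
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q

def evaluate(src, variables=None):
    """Evaluate an integer expression; variables maps %-numbers to values."""
    variables = variables or {}
    tokens = tokenize(src)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of expression")
        pos += 1
        return tokens[pos - 1]

    def elt():  # int_elt: num_lit | % int_elt
        tok = take()
        if tok == "%":
            return variables[elt()]
        if tok[0].isdigit():
            return int(tok, 16) if tok.startswith("0x") else int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    def paren():  # int_paren: ( int_expr ) | int_elt
        if peek() == "(":
            take()
            value = additive()
            if take() != ")":
                raise SyntaxError("expected )")
            return value
        return elt()

    def term():  # int_term: int_paren | - int_paren
        if peek() == "-":
            take()
            return -paren()
        return paren()

    def multiplicative():  # *, / and mod bind tighter than + and -
        value = term()
        while peek() in ("*", "/", "mod"):
            op, rhs = take(), term()
            if op == "*":
                value *= rhs
            elif op == "/":
                value = c_div(value, rhs)
            else:
                value -= rhs * c_div(value, rhs)
        return value

    def additive():
        value = multiplicative()
        while peek() in ("+", "-"):
            op, rhs = take(), multiplicative()
            value = value + rhs if op == "+" else value - rhs
        return value

    result = additive()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return result
```

For example, evaluate("1 + 2 * 3") gives 7, and evaluate("%%1", {1: 2, 2: 9}) follows the indirection through %1 to read %2, giving 9.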
String expressions (str_expr)

are simpler. Their grammar is as follows:

str_expr::= str_elt | str_elt + str_expr
str_elt::= file_cond | str_lit | str_var | label | colour | bareword
file_cond::= ( str_elt ) str_elt str_elt
str_var::= $ int_elt
str_lit::= "[^"]*?" | ^[^^]*?^
label::= * [A-Za-z_0-9]+
colour::= # [0-9A-Fa-f]{6}

The only part of the above that should not be obvious, given the descriptions under Lexical categories above, is the file_cond term. This is only useful when the filelog command is in effect. The parenthesised string is interpreted as the name of an image file. If the player has viewed this file, the first of the subsequent terms is used; otherwise, the second is used.

Conditional expressions (conditional)

are effectively a special syntax associated with the if / notif commands.

They are somewhat lacking compared to conditionals in most languages: in particular, multiple terms may be combined only with an and operator, with no or available.

Either strings or integers may be compared. The ordering of strings is deliberately left undefined; it may change without warning in the future. However, for any given Ponscripter version, the ordering will be the same across all platforms and will not be affected by users' locale settings.

The operators are C-style: == and != for equality and inequality; <, <=, >, and >= for ordering; and & to combine terms with a logical and.

(Several operators accept variant forms: && for &, = for ==, and <> for !=. These variants have no semantic difference from the canonical forms.)

Functions cannot be called from conditional expressions (you must assign the result of a function to a variable, and compare that manually), with one exception: there is hardcoded support for a function fchk, which takes a string, interprets it as the filename of a picture, and returns true iff that picture has been displayed. (This is analogous to the file_cond term in string expressions.)

The grammar is:

conditional::= cond_term | cond_term & conditional
cond_term::= comparison | fchk str_expr
comparison::= expression comp_op expression
expression::= int_expr | str_expr
comp_op::= == | != | > | >= | < | <=
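A Python sketch of how such a conditional might be evaluated once each side of every comparison has been reduced to a value. The names COMP_OPS and eval_conditional are hypothetical, the variant spellings = and <> are mapped to their canonical operators, and Python's own string ordering stands in for Ponscripter's, which is deliberately undefined.

```python
import operator

# Canonical operators plus the variant spellings accepted for them.
COMP_OPS = {
    "==": operator.eq, "=": operator.eq,
    "!=": operator.ne, "<>": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}

def eval_conditional(terms):
    """terms is a list of (lhs, op, rhs) comparisons joined by logical and."""
    return all(COMP_OPS[op](lhs, rhs) for lhs, op, rhs in terms)
```

Since there is no or operator, a condition holds only if every comparison in the list holds.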

Command syntax

The above lexemes and expressions are combined in a fairly similar way to BASIC. Commands are interpreted sequentially, one to a line; multiple commands may be placed on a single line, where required, by separating them with colons.

There are several forms of command:

  • Procedure calls consist of a bareword, normally followed by a parameter list: this is a comma-separated list of expressions (parentheses are not used).
  • Labels consist of a label literal, which serves as a name for that point in the script.

    There is also a form of anonymous label, represented by a single ~ character, which is used by the jumpf and jumpb commands.

  • Text commands consist of a text delimiter, which switches the interpreter into text mode for the remainder of the line; see next section.

Text mode

As described above, text commands begin with a text marker (^ in native scripts, ` in legacy scripts). The remainder of the line is then parsed in text mode.

Most characters in text mode represent themselves and are printed verbatim; this includes the newline at the end of each line, unless it is explicitly suppressed with /. It also includes characters with special meanings in command mode, such as colons and semicolons.

However, there are also a fair number of control characters with special meanings. Since text syntax was not so much designed as gradually accumulated, there is very little consistency in how these control characters are chosen, when exactly in the parsing process they are interpreted, and how they are printed literally. Read on for details.

Text control

Single characters with special meanings. Any of these characters may be printed literally by prefixing it with a single hash character, e.g. #@ or #_.

@
Waits for click, then continues printing text as though nothing had happened. (Unlike in many ONScripter builds, the behaviour of @ is not altered by the definition of a textgosub routine.)
\
Waits for a click, then clears the text window and begins a new page.
_
If a character has the clickstr nature, prefixing it with an underscore suppresses that behaviour; otherwise it does nothing whatsoever. clickstr is evil, so you should never need to use this. Place your pauses explicitly.
/
At the end of a line, ends a text command without beginning a new line of display text. This control only has any effect immediately before a newline character. Anywhere else in a line, even if only whitespace follows, it prints a literal slash.

Speed control

Multi-character control codes controlling text speed.

Whitespace after these codes is ignored; you can cause it to be treated literally by adding a trailing separator character, e.g. !sd|.

If one of these sequences should appear in literal text, it can be escaped by prefixing it with a single hash character, e.g. #!sd.

Due to existing conventions for script layout, these codes are also valid as standalone commands without a preceding text marker; in this case they must be the only thing on their line apart from whitespace.

!sNUM

Sets text speed; this is equivalent to the command

textspeed NUM

but has a more convenient syntax in cases where the speed must change within a single line.

Lower speeds are faster; 0 means there should be no deliberate delay between characters, though (as they are still printed one at a time) it may not quite lead to instantaneous display.

!sd

Resets text speed to the current player-selected default.

!wNUM

Inserts a pause of NUM milliseconds. It cannot be truncated by clicking, but can be skipped with any of the normal skip commands.

!dNUM

As !w, but the pause can also be truncated by clicking.
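As an illustration of the four codes, the Python sketch below picks them out of a line of text. It is not how Ponscripter parses text: the name find_speed_codes is hypothetical, and the hash-escape check is simplified, so a code preceded by a literal ## would wrongly be treated as escaped.

```python
import re

# !sd, or one of !s, !w, !d followed by a number; skipped when hash-escaped.
SPEED_CODE = re.compile(r"(?<!#)!(sd|[swd][0-9]+)")

def find_speed_codes(text):
    """Return (kind, value) pairs for each speed code found in text."""
    codes = []
    for match in SPEED_CODE.finditer(text):
        body = match.group(1)
        if body == "sd":
            codes.append(("sd", None))           # reset to player default
        else:
            codes.append((body[0], int(body[1:])))
    return codes
```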

Colour tags

#RRGGBB, where RR, GG, and BB are each two hex digits, modifies the current text foreground colour in the obvious way. A literal hash character can be inserted with ##.

Formatting tags

All formatting other than text colour is performed with formatting tag blocks. These are delimited with tildes; a literal tilde can be inserted with ~~ (not #~, which would at least have been consistent).

Any number of tags can be combined within a single block, optionally separated with whitespace.

Font selection tags

The tags in this section, with the exception of c, assume that Ponscripter's eight font slots are assigned according to the following convention:

  0 - text regular
  1 - text italic
  2 - text bold
  3 - text bold italic
  4 - display regular
  5 - display italic
  6 - display bold
  7 - display bold italic

If fonts are assigned in any other way, tags such as b and i will not behave as documented; you should use c in this case. Font slots are assigned using the h_mapfont command, which is documented in Extensions.
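The convention in the table amounts to a three-bit encoding: 4 for display face, 2 for bold, 1 for italic. The Python sketch below merely reproduces the table; the function name font_slot is hypothetical, and nothing here is a claim about how Ponscripter computes the slot internally.

```python
def font_slot(display=False, bold=False, italic=False):
    """Return the conventional slot number (0-7) for a style combination."""
    return 4 * int(display) + 2 * int(bold) + int(italic)
```

For example, font_slot(bold=True, italic=True) is 3 (text bold italic), and font_slot(display=True) is 4 (display regular).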

cN
Selects the font in slot N
d
Selects the default style (equivalent to c0)
r
Disables italics (default)
i
Toggles italics
t
Disables bold weight (default)
b
Toggles bold weight
f
Selects text face (default)
s
Toggles display face
Text size

In this section, the base size refers to the font size defined for the active window; the current size refers to that selected with previous size control tags.

=N
Sets font size to exactly N pixels.
%N
Sets font size to N% of the base size.
+N
Increases current font size by N pixels.
-N
Decreases current font size by N pixels.
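The relationship between base size and current size can be sketched in Python as follows. The function name apply_size_tag is hypothetical. Two details are assumptions: percentages are rounded down to a whole pixel, and =0 is treated as a reset to the base size, which is inferred from the formatting example later on this page rather than stated in the list above.

```python
def apply_size_tag(tag, base, current):
    """Return the new font size after a size tag such as '=40' or '%120'."""
    kind, n = tag[0], int(tag[1:])
    if kind == "=":
        return base if n == 0 else n   # assumption: =0 restores the base size
    if kind == "%":
        return base * n // 100         # assumption: rounded down
    if kind == "+":
        return current + n
    if kind == "-":
        return current - n
    raise ValueError(f"not a size tag: {tag!r}")
```

Note that %N is relative to the base size while +N and -N are relative to the current size, so the result of %N does not depend on earlier size tags.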
Text position
xN
Sets the horizontal text position to a position N pixels right of the left margin.
yN
Sets the vertical text position to a position N pixels below the top margin.
x+N, y+N
Adjusts the current horizontal or vertical text position by N pixels right or down.
x-N, y-N
Adjusts the current horizontal or vertical text position by N pixels left or up.
Indentation
n
Sets the indent to the current horizontal position. New text lines will start from this offset until the end of the current page.
u
Resets the indent to the left margin. This will only affect subsequent line breaks; to end an indented section within a page, position this at the end of the last line of the indented section.

In addition to these tags, the indent is set automatically when the first character of a page is an indent character.

The set of indent characters can be configured with the h_indentstr command (described in Extensions). By default it includes opening quotes and em dashes.

Formatting examples

As an example of the usage of these tags, Narcissu 2's omake mode displays page headings at the top of each screen with code like

^!s0~i %120 x-20 y-40~Heading~i =0~!sd
br2 120

Here the !s0 and !sd are the usual NScripter commands. The first tag block selects italic text, 120% of the regular font size, and shifts the output position up and to the left. The second tag block cancels the italic effect and resets the font size to normal.

An example of indentation:

^**%.Item 1
^Not indented
^**%.~n~Item 2
^Indented~u~
^Not indented

Ligatures and shortcuts

To assist in typing Unicode scripts with ASCII keyboards, Ponscripter has the ability to replace sequences of characters with Unicode symbols. This facility is also used to implement the hash-escaping of single-character control codes, and can be used to add ligatures automatically. It is only enabled in native scripts; none of this is possible in legacy mode.

A shortcut is a mapping of a sequence of characters to a Unicode codepoint.

A shortcut sequence can be inserted literally by separating the characters with either a Unicode ZWNJ or a | character, e.g. `|` to insert two separate open single quotes. A literal | can be inserted with ||.

By default, the following character sequences are defined, in addition to the hash escapes described above:

``
open double quotes
''
close double quotes
`
open single quote
'
apostrophe / close single quote
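A Python sketch of the default shortcuts and the | separator, preferring the longest matching sequence. The name apply_shortcuts is hypothetical, and the handling of a lone | outside a shortcut sequence (it is simply dropped) is an assumption. A ZWNJ needs no special handling here, since any character that is not part of a shortcut already keeps its neighbours apart.

```python
SHORTCUTS = {
    "``": "\u201c",  # open double quotes
    "''": "\u201d",  # close double quotes
    "`": "\u2018",   # open single quote
    "'": "\u2019",   # apostrophe / close single quote
}

def apply_shortcuts(text):
    """Replace the default shortcut sequences, honouring the | separator."""
    out, i = [], 0
    while i < len(text):
        if text.startswith("||", i):
            out.append("|")            # || is a literal |
            i += 2
        elif text[i] == "|":
            i += 1                     # separator: splits sequences, not printed
        else:
            for length in (2, 1):      # try the longer sequence first
                chunk = text[i:i + length]
                if chunk in SHORTCUTS:
                    out.append(SHORTCUTS[chunk])
                    i += len(chunk)
                    break
            else:
                out.append(text[i])
                i += 1
    return "".join(out)
```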

Additional sequences can be defined by use of the h_ligate command: see Extensions.

Variable interpolation

Unlike in vanilla NScripter, merely including the name of a variable in text does not cause it to be interpolated; this is because frankly it seems to be more common to want something like $500 to be literal text representing a sum of money.

Instead, variables will be interpolated if enclosed in braces: {$foo}, {?100[%index]}, and so forth. This is not to be confused with NScripter's rather less useful brace syntax (variable assignments), which is not supported.

The variable's sigil must immediately follow the opening brace, and only variables can be interpolated, not arbitrary expressions. To include a literal sequence of a left brace followed by a sigil character, use a separator character: {|%.
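The brace rule can be illustrated with the Python sketch below, which handles only simple {%N}, {$N} and aliased forms; arrays and indirect references are omitted. The name interpolate and its arguments are hypothetical. The sketch leaves {|% untouched rather than removing the separator, and substituted text is not rescanned.

```python
import re

# An opening brace, a sigil immediately after it, then a number or bareword.
INTERP = re.compile(r"\{([%$])([0-9]+|[A-Za-z_][A-Za-z_0-9]*)\}")

def interpolate(text, int_vars, str_vars, numaliases=None):
    """Substitute {%N} and {$N} references; bare $500 and the like are literal."""
    numaliases = numaliases or {}

    def lookup(match):
        sigil, name = match.groups()
        index = int(name) if name.isdigit() else numaliases[name]
        if sigil == "%":
            return str(int_vars[index])
        return str_vars[index]

    # The replacement text is not rescanned, so expansion is not recursive.
    return INTERP.sub(lookup, text)
```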

Certain control codes are recognised after variable interpolation, since they are parsed at a later stage of processing: these are text controls, speed controls, colour tags, and ligatures/shortcuts. In particular, and in contrast to NScripter, something like ^!w{%var} will be interpreted as a command to wait for however long is specified in the given variable. This should be considered undefined behaviour, and will probably change in future. Rather than rely on it, use the wait command (and so forth) for variable timings. In the unlikely event that you actually intend to print the literal string !w followed by the value of %var, write #!w{%var} to avoid ambiguity.

Other special sequences are not recognised after interpolation. Variable interpolations are not expanded recursively. Likewise, formatting codes are not processed during interpolation; however, if the string literal in which they first appeared was delimited with ^ rather than ", they will have been processed when the string was read, and will therefore work as intended.

That is to say,

mov $var, "~b~"
^foo{$var}bar\

prints

foo~b~bar

, while

mov $var, ^~b~^
^foo{$var}bar\

prints

foobar

.

Bugs

This whole syntax may be considered a bug: it is inconvenient, irregular, and needlessly difficult to parse. Don't blame me: I didn't design it, I'm just documenting it. If you want a similar tool with sane syntax, try something like Ren'Py.

See also

Overview