ponscr-syntax — description of Ponscripter syntax
This page documents the syntax of Ponscripter scripts. See Overview for an overview of other documentation.
Note that this cannot be considered NScripter documentation. NScripter itself is largely unspecified. Ponscripter's implementation is ultimately based on observation and documentation rather than on reverse-engineering as such: it inevitably adopts different parsing strategies, and is more liberal in what it accepts. Not all differences are described below. With this disclaimer out of the way: the documentation.
Scripts are line-based.
There are two parsing modes: command mode and text mode. Parsing of each line begins in command mode, and switches to text mode for the rest of the line if a text command is encountered.
The two parsing modes have little in common. The following sections discuss command mode first; text mode is then treated separately.
A third mode, “unmarked text”, exists for legacy reasons. This mode is similar to text mode, and is entered if an invalid character is encountered at the start of a command: that is, a number, a character outside the ASCII range, or anything in the set [[@\/%?$(!#,]. Such lines are valid if they contain only !-commands; otherwise a warning is issued, and the behaviour is undefined.
(This mode derives from NScripter, where traditionally there was no intersection between printable characters and command characters; the ` text marker was introduced in ONScripter as a means of supporting English text, and replaced in Ponscripter with ^ to free up ` for other uses. Since unmarked text serves no useful purpose, and complicates parsing, it is deprecated and will be removed without notice at some point in the future.)
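To make the start-of-command rule concrete, here is a minimal Python sketch of the check. It is not Ponscripter's parser: the function name is hypothetical, and it reads the set [[@\/%?$(!#,] as every character between the outer brackets, including both \ and /.

```python
# Characters that, at the start of a command, switch the parser into legacy
# "unmarked text" mode. Assumption: every character between the outer
# brackets of [[@\/%?$(!#,] is a member, including both \ and /.
UNMARKED_START = frozenset("[@\\/%?$(!#,")


def starts_unmarked_text(command: str) -> bool:
    """Return True if a command beginning this way would be unmarked text."""
    stripped = command.lstrip()
    if not stripped:
        return False
    first = stripped[0]
    return (
        first in "0123456789"        # a number
        or ord(first) > 0x7F         # outside the ASCII range
        or first in UNMARKED_START   # one of the listed punctuation marks
    )
```

Labels (*name), text commands (^...) and ordinary barewords all fall through to False, so they are parsed normally.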
NScripter is a context-sensitive language. Each parameter to a command may be parsed differently based on the type of that parameter. The major types are string and integer, with labels and barewords being special cases of string parameters.
String expressions do not merely have a different type from integer expressions, as in other languages: they have a distinct syntax. Some string expressions can be parsed as integer expressions, but then leave code unparsed that will cause a syntax error when it is reached. It is impossible, in the general case, to parse a line of code unless it is known in advance what context each parameter is using.
For example, given the following definitions:
numalias foo, 100
stralias foo, "bar"
the constant foo would have the value 100 in integer context, but "bar" in string context.
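One way to picture this is one alias table per context. The Python sketch below is only an illustration: the tables and the numalias, stralias, and resolve functions are hypothetical stand-ins for the interpreter's internals.

```python
from typing import Dict, Union

# Hypothetical alias tables: integer and string constants live in separate
# namespaces, so the same bareword can be defined in both.
num_aliases: Dict[str, int] = {}
str_aliases: Dict[str, str] = {}


def numalias(name: str, value: int) -> None:
    num_aliases[name] = value


def stralias(name: str, value: str) -> None:
    str_aliases[name] = value


def resolve(name: str, context: str) -> Union[int, str]:
    """Resolve a bareword constant in "int" or "str" context."""
    if context == "int":
        return num_aliases[name]
    if context == "str":
        return str_aliases[name]
    raise ValueError(f"unknown context: {context!r}")


# The two definitions from the example above.
numalias("foo", 100)
stralias("foo", "bar")
# resolve("foo", "int") gives 100; resolve("foo", "str") gives "bar".
```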
See the next section for more on constants, and Expression syntax below for details of the syntax accepted in each context.
The following broad lexical categories are used in command mode (parenthesised names are used in syntax descriptions below):
Barewords (bareword) have the same syntax as identifiers in most programming languages: the first character must be in the set [A-Za-z_], and the remainder must be in the set [A-Za-z0-9_].
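A bareword at the start of a line, or immediately following a colon, is assumed to be a command. Otherwise its interpretation is context-sensitive: in integer context it is resolved as a constant defined with numalias, and in string context as one defined with stralias. Some commands, such as rmenu, ld, and systemcall, look for barewords directly for certain parameters; in these cases aliases are not resolved.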
String literals (str_lit) are formed in two ways. They may be enclosed in regular double quotes, or in pairs of the text delimiter (^ in native scripts, ` in legacy scripts).
The two forms have slightly different semantics. Strings enclosed in text delimiters support ~-tags (described under text mode below) to apply text formatting, while tildes are literal characters in double-quoted strings.
Note: this differs from ONScripter (and some pre-release versions of Ponscripter), where double-quoted strings had semantics similar to unmarked text: in particular, whitespace was ignored.
In these interpreters, whitespace could be made significant in double-quoted text by following the opening quote with a text delimiter. This no longer has any effect, but is still supported for backwards-compatibility: the text delimiter is ignored, and the construct is equivalent to a double-quoted string.
Integer literals (num_lit) are straightforward. Unlike NScripter, which accepts only decimal integers, Ponscripter also understands the C-style 0xNN notation for hexadecimal numbers.
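A minimal Python sketch of scanning a num_lit, assuming (as in the integer expression grammar) that only a lowercase 0x prefix is recognised. The hexadecimal alternative is tried first so that 0x1F is not read as the decimal 0 followed by a bareword; parse_num_lit is a hypothetical helper.

```python
import re
from typing import Tuple

# Hexadecimal is tried first so "0x1F" is not read as the decimal "0".
NUM_LIT = re.compile(r"0x[0-9A-Fa-f]+|[0-9]+")


def parse_num_lit(text: str, pos: int = 0) -> Tuple[int, int]:
    """Scan a num_lit at text[pos:]; return (value, position after it)."""
    m = NUM_LIT.match(text, pos)
    if m is None:
        raise SyntaxError(f"expected an integer literal at offset {pos}")
    lexeme = m.group()
    value = int(lexeme[2:], 16) if lexeme.startswith("0x") else int(lexeme)
    return value, m.end()
```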
Labels (label) have the general format *NAME: an asterisk followed by one or more characters from the set [A-Za-z0-9_]. They are used to mark and provide targets for jump commands (goto, csel, etc.) and for the construction of subroutines with commands such as defsub, textgosub, etc.
(In NScripter, label literals are a distinct type that can only be used where a command is expecting a label. ONScripter also accepts them wherever a string is expected: *foo means roughly the same thing as "foo".)
Colours (colour) have the general format #RRGGBB, where RR, GG, and BB are each two hex digits. These represent colours in the standard way.
(In NScripter, colour literals are a distinct type that can only be used where a command is expecting a colour. ONScripter also accepts them wherever a string is expected: #RRGGBB means exactly the same thing as "#RRGGBB".)
Variables (int_var, str_var) take the form of a sigil followed by either a number, a bareword (which must have been defined with numalias), or an integer variable (with sigil) for indirect access.
The sigils are % for integer variables, ? for integer arrays, and $ for string variables. Hence %200 is an integer variable, $%foo is the string variable indexed by the current value of %foo, and ?bar[9][4] dereferences the multidimensional array ?bar.
Variable syntax is expressed formally in the expression sections below.
Integer expressions (int_expr) are similar to those in other languages. The syntax is infix. There are two operator precedence levels: *, /, and mod are processed before + and -. Parentheses and unary minus operate as normal.
More formally:
int_expr   ::= int_term | int_term binary_op int_expr
int_term   ::= int_paren | “-” int_paren
int_paren  ::= “(” int_expr “)” | int_elt
int_elt    ::= num_lit | int_var | bareword
int_var    ::= “%” int_elt | “?” int_elt subscript+
subscript  ::= “[” int_expr “]”
binary_op  ::= “*” | “/” | “mod” | “+” | “-”
num_lit    ::= [0-9]+ | 0x[0-9A-Fa-f]+
bareword   ::= [A-Za-z_][A-Za-z_0-9]*
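The integer grammar can be evaluated with a small recursive-descent parser. This Python sketch is an illustration under stated assumptions, not Ponscripter's implementation: since the grammar itself does not encode precedence, it follows the two levels described in the prose; operators of equal precedence are assumed to apply left to right; / and mod are assumed to truncate toward zero as in C; and unset variables and array elements are assumed to read as 0. Variables, arrays, and numalias constants are passed in as plain dictionaries.

```python
import re

# One token: hex or decimal literal, bareword (including "mod"), or punctuation.
TOKEN = re.compile(r"\s*(0x[0-9A-Fa-f]+|[0-9]+|[A-Za-z_][A-Za-z_0-9]*|[-+*/()%?\[\]])")


def _cdiv(a, b):
    """Integer division truncating toward zero (assumed C-like semantics)."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


class IntExprEvaluator:
    def __init__(self, text, variables, arrays, aliases):
        self.tokens, pos, text = [], 0, text.rstrip()
        while pos < len(text):
            m = TOKEN.match(text, pos)
            if m is None:
                raise SyntaxError(f"unexpected character at offset {pos}")
            self.tokens.append(m.group(1))
            pos = m.end()
        self.i = 0
        self.variables, self.arrays, self.aliases = variables, arrays, aliases

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected or 'a token'}, got {tok!r}")
        self.i += 1
        return tok

    def expr(self):
        """Lower precedence level: + and -, applied left to right."""
        value = self.term()
        while self.peek() in ("+", "-"):
            op, rhs = self.take(), self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        """Higher precedence level: *, / and mod, applied left to right."""
        value = self.unary()
        while self.peek() in ("*", "/", "mod"):
            op, rhs = self.take(), self.unary()
            if op == "*":
                value *= rhs
            elif op == "/":
                value = _cdiv(value, rhs)
            else:
                value -= rhs * _cdiv(value, rhs)
        return value

    def unary(self):
        """int_term ::= int_paren | "-" int_paren"""
        if self.peek() == "-":
            self.take()
            return -self.paren()
        return self.paren()

    def paren(self):
        """int_paren ::= "(" int_expr ")" | int_elt"""
        if self.peek() == "(":
            self.take()
            value = self.expr()
            self.take(")")
            return value
        return self.elt()

    def elt(self):
        """int_elt ::= num_lit | int_var | bareword"""
        tok = self.take()
        if tok == "%":                       # %N, %alias, or %%var (indirect)
            return self.variables.get(self.elt(), 0)
        if tok == "?":                       # ?N[i][j]..., at least one subscript
            index = [self.elt()]
            while self.peek() == "[":
                self.take()
                index.append(self.expr())
                self.take("]")
            if len(index) == 1:
                raise SyntaxError("array access needs a subscript")
            return self.arrays.get(tuple(index), 0)
        if tok.startswith("0x"):
            return int(tok[2:], 16)
        if tok.isdigit():
            return int(tok)
        if tok in self.aliases:              # constant defined with numalias
            return self.aliases[tok]
        raise SyntaxError(f"unexpected token {tok!r}")


def eval_int_expr(text, variables=None, arrays=None, aliases=None):
    ev = IntExprEvaluator(text, variables or {}, arrays or {}, aliases or {})
    value = ev.expr()
    if ev.peek() is not None:
        raise SyntaxError(f"unparsed input starting at {ev.peek()!r}")
    return value
```

For example, eval_int_expr("2 + 3 * 4") evaluates the multiplication first and gives 14.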
String expressions (str_expr) are simpler. Their grammar is as follows:
str_expr   ::= str_elt | str_elt “+” str_expr
str_elt    ::= file_cond | str_lit | str_var | label | colour | bareword
file_cond  ::= “(” str_term “)” str_term str_term
str_var    ::= “$” int_elt
str_lit    ::= "[^"]*?" | ^[^^]*?^
label      ::= “*” [A-Za-z_0-9]+
colour     ::= “#” [0-9A-Fa-f]{6}
The only part of the above that should not be obvious, given the descriptions under Lexical categories above, is the file_cond term. This is only useful when the filelog command is in effect. The parenthesised string is interpreted as the name of an image file. If the player has viewed this file, the first of the subsequent terms is used; otherwise, the second is used.
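A Python sketch of file_cond evaluation. For brevity it accepts only double-quoted literals, compares filenames exactly, and stands in for the filelog history with a hypothetical set of viewed filenames.

```python
import re
from typing import AbstractSet

# ("image") "if viewed" "if not viewed", double-quoted literals only.
FILE_COND = re.compile(r'\(\s*"([^"]*)"\s*\)\s*"([^"]*)"\s*"([^"]*)"')


def eval_file_cond(source: str, viewed_files: AbstractSet[str]) -> str:
    """Pick one of two strings depending on whether an image has been viewed."""
    m = FILE_COND.fullmatch(source.strip())
    if m is None:
        raise SyntaxError("not a file_cond term")
    image, if_viewed, if_not_viewed = m.groups()
    return if_viewed if image in viewed_files else if_not_viewed
```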
Conditional expressions (conditional) are effectively a special syntax associated with the if/notif commands.
They are somewhat lacking compared to conditionals in most languages: in particular, multiple terms may be combined only with an “and” operator, with no “or” available.
Either strings or integers may be compared. The ordering of strings is deliberately left undefined; it may change without warning in the future. However, for any given Ponscripter version, the ordering will be the same across all platforms and will not be affected by users' locale settings.
The operators are C-style: == and != for equality and inequality; <, <=, >, and >= for ordering; and & to combine terms with a logical “and”.
(Several operators accept variant forms: && for &, = for ==, and <> for !=. These variants have no semantic difference from the canonical forms.)
Functions cannot be called from conditional expressions (you must assign the result of a function to a variable, and compare that manually), with one exception: there is hardcoded support for a function fchk, which takes a string, interprets it as the filename of a picture, and returns true iff that picture has been displayed. (This is analogous to the file_cond term in string expressions.)
The grammar is:
conditional ::= cond_term | cond_term “&” conditional
cond_term   ::= comparison | “fchk” str_expr
comparison  ::= expression comp_op expression
expression  ::= int_expr | str_expr
comp_op     ::= “==” | “!=” | “>” | “>=” | “<” | “<=”
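A Python sketch of conditional evaluation under simplifying assumptions: each operand is an integer literal or a double-quoted string (full int_expr and str_expr operands are not handled), the variant spellings are normalised to their canonical forms first, and fchk consults a hypothetical set of displayed pictures. Strings are ordered by Python's code-point comparison only because the sketch needs some ordering; Ponscripter's string ordering is deliberately left undefined.

```python
import re
from typing import AbstractSet

COND_TOKEN = re.compile(r'\s*("[^"]*"|-?[0-9]+|&&|&|==|!=|<>|<=|>=|<|>|=|fchk)')
CANONICAL = {"&&": "&", "=": "==", "<>": "!="}   # variant spelling -> canonical
COMPARE = {
    "==": lambda a, b: a == b, "!=": lambda a, b: a != b,
    "<": lambda a, b: a < b, "<=": lambda a, b: a <= b,
    ">": lambda a, b: a > b, ">=": lambda a, b: a >= b,
}


def _tokens(text):
    out, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = COND_TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input: {text[pos:]!r}")
        out.append(CANONICAL.get(m.group(1), m.group(1)))
        pos = m.end()
    return out


def _value(tok):
    """Operand value: a string literal without its quotes, or an integer."""
    return tok[1:-1] if tok.startswith('"') else int(tok)


def eval_conditional(text: str, viewed: AbstractSet[str] = frozenset()) -> bool:
    tokens, i, result = _tokens(text), 0, True
    while True:
        if tokens[i] == "fchk":              # cond_term ::= "fchk" str_expr
            term = _value(tokens[i + 1]) in viewed
            i += 2
        else:                                # comparison ::= expr comp_op expr
            lhs, op, rhs = tokens[i:i + 3]
            term = COMPARE[op](_value(lhs), _value(rhs))
            i += 3
        result = result and term
        if i == len(tokens):
            return result
        if tokens[i] != "&":                 # only "and" is available
            raise SyntaxError(f"expected '&', got {tokens[i]!r}")
        i += 1
```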
The above lexemes and expressions are combined in a fairly similar way to BASIC. Commands are interpreted sequentially, one to a line; multiple commands may be placed on a single line, where required, by separating them with colons.
There are several forms of command:
Labels consist of a label literal, which serves as a name for that point in the script.
There is also a form of anonymous label, represented by a single ~ character, which is used by the jumpf and jumpb commands.
As described above, text commands begin with a text marker (^ in native scripts, ` in legacy scripts). The remainder of the line is then parsed in text mode.
Most characters in text mode represent themselves and are printed verbatim; this includes the newline at the end of each line, unless it is explicitly suppressed with /. It also includes characters with special meanings in command mode, such as colons and semicolons.
However, there are also a fair number of control characters with special meanings. Since text syntax was not so much designed as gradually accumulated, there is very little consistency in how these control characters are chosen, when exactly in the parsing process they are interpreted, and how they are printed literally. Read on for details.
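Single characters with special meanings. These characters may all be printed literally by prefixing them with a single hash character, e.g. #@, #_, etc.
@
Waits for a click before continuing. (The behaviour of @ is not altered by the definition of a textgosub routine.)
\
Waits for a click, then clears the text window and begins a new page.
_
If the following character would cause a pause because of its clickstr nature, prefixing it with an underscore suppresses that behaviour; otherwise it does nothing whatsoever. clickstr is evil, so you should never need to use this. Place your pauses explicitly.
/
At the end of a line, suppresses the newline that would otherwise be printed.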
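A Python sketch of how the single-character controls and their hash escapes could be scanned. The event names are hypothetical; _ is simply dropped because clickstr is not modelled, and / is only treated as a control at the very end of the line.

```python
CONTROL = {"@": "wait_click", "\\": "new_page"}
ESCAPABLE = "@\\_/"   # characters that #-prefixing prints literally


def scan_controls(line: str):
    """Split one text-mode line into ('text', s) runs and control events."""
    events, buf = [], []

    def flush():
        if buf:
            events.append(("text", "".join(buf)))
            buf.clear()

    i = 0
    while i < len(line):
        ch = line[i]
        nxt = line[i + 1] if i + 1 < len(line) else ""
        if ch == "#" and nxt and nxt in ESCAPABLE:
            buf.append(nxt)                  # hash escape: print literally
            i += 2
            continue
        if ch in CONTROL:
            flush()
            events.append((CONTROL[ch],))
        elif ch == "/" and i == len(line) - 1:
            flush()
            events.append(("no_newline",))   # trailing / suppresses the newline
        elif ch != "_":                      # _ only matters with clickstr
            buf.append(ch)
        i += 1
    flush()
    return events
```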
Multi-character control codes controlling text speed. Whitespace after these codes is ignored; you can cause it to be treated literally by adding a trailing separator character, e.g. !sd|.
If one of these sequences would appear in literal text, it can be escaped by prefixing it with a single hash character, e.g. #!sd.
Due to existing conventions for script layout, these codes are also valid as standalone commands without a preceding text marker; in this case they must be the only thing on their line apart from whitespace.
!sNUM
Sets text speed; this is equivalent to the command textspeed NUM but has a more convenient syntax in cases where the speed must change within a single line. Lower values are faster; 0 means there should be no deliberate delay between characters, though (as they are still printed one at a time) it may not quite lead to instantaneous display.
!sd
Resets text speed to the current player-selected default.
!wNUM
Inserts a pause of NUM milliseconds. It cannot be truncated by clicking, but can be skipped with any of the normal skip commands.
!dNUM
As !w, but the pause can also be truncated by clicking.
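A Python sketch of scanning the speed codes, covering the whitespace rule, the | separator, and # escaping. The event names are hypothetical, and only the four codes listed above are recognised.

```python
import re

# !sd, or one of !s / !w / !d followed by a decimal number.
SPEED = re.compile(r"!(?:(sd)|([swd])([0-9]+))")
NAMES = {"s": "speed", "w": "wait", "d": "delay"}


def scan_speed_codes(line: str):
    """Split text into ('text', s) runs and speed-control events."""
    events, buf, i = [], [], 0

    def flush():
        if buf:
            events.append(("text", "".join(buf)))
            buf.clear()

    while i < len(line):
        if line[i] == "#":
            esc = SPEED.match(line, i + 1)
            if esc:                          # hash escape: print the code literally
                buf.append(esc.group())
                i = esc.end()
                continue
        m = SPEED.match(line, i)
        if m is None:
            buf.append(line[i])
            i += 1
            continue
        flush()
        if m.group(1):
            events.append(("reset_speed",))
        else:
            events.append((NAMES[m.group(2)], int(m.group(3))))
        i = m.end()
        if line.startswith("|", i):          # separator: keep following whitespace
            i += 1
        else:                                # otherwise whitespace is ignored
            while i < len(line) and line[i].isspace():
                i += 1
    flush()
    return events
```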
#RRGGBB, where RR, GG, and BB are each two hex digits, modifies the current text foreground colour in the obvious way. A literal hash character can be inserted with ##.
All formatting other than text colour is performed with formatting tag blocks. These are delimited with tildes; a literal tilde can be inserted with ~~ (not #~ ... that would be consistent.)
Any number of tags can be combined within a single block, optionally separated with whitespace.
The tags in this section, with the exception of c, assume that Ponscripter's eight font slots are assigned according to the following convention:
0 - text regular
1 - text italic
2 - text bold
3 - text bold italic
4 - display regular
5 - display italic
6 - display bold
7 - display bold italic
If fonts are assigned in any other way, tags such as b and i will not behave as documented; you should use c in this case. Font slots are assigned using the h_mapfont command, which is documented in Extensions.
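The slot convention above amounts to a three-bit encoding: 4 selects the display family, 2 adds bold, and 1 adds italic. A Python sketch with hypothetical helper names:

```python
STYLES = ("regular", "italic", "bold", "bold italic")


def font_slot(display: bool, bold: bool, italic: bool) -> int:
    """Slot number under the 0-7 convention: 4*display + 2*bold + italic."""
    return 4 * display + 2 * bold + italic


def describe_slot(slot: int) -> str:
    """Inverse mapping, e.g. 5 -> 'display italic'."""
    family = "display" if slot & 4 else "text"
    return f"{family} {STYLES[slot & 3]}"
```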
cN
Selects the font in slot N directly.
d
Selects font slot 0 (equivalent to c0).
r
i
t
i
f
s
In this section, the base size refers to the font size defined for the active window; the current size refers to that selected with previous size control tags.
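=N
Sets the current size to N pixels. (=0 resets it to the base size.)
%N
Sets the current size to N% of the base size.
+N
Increases the current size by N pixels.
-N
Decreases the current size by N pixels.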
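xN
Moves the output position to N pixels right of the left margin.
yN
Moves the output position to N pixels below the top margin.
x+N, y+N
Moves the output position N pixels right or down.
x-N, y-N
Moves the output position N pixels left or up.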
n
Sets the indent to the current output position.
u
Clears the indent.
In addition to these tags, the indent is set automatically when the first character of a page is an indent character.
The set of indent characters can be configured with the h_indentstr command (described in Extensions). By default it includes opening quotes and em dashes.
As an example of the usage of these tags, Narcissu 2's omake mode displays page headings at the top of each screen with code like
^!s0~i %120 x-20 y-40~Heading~i =0~!sd
br2 120
Here the !s0 and !sd are the usual speed control codes. The first tag block selects italic text, 120% of the regular font size, and shifts the output position up and to the left. The second tag block cancels the italic effect and resets the font size to normal.
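A Python sketch that splits text into plain runs and tag blocks, tested on the heading example above. It simplifies in two ways that are assumptions of the sketch: tags are split on whitespace only (the real parser also accepts tags written without separators), and the tags themselves are not interpreted.

```python
def split_tag_blocks(text: str):
    """Split text into ('text', s) runs and ('tags', [tag, ...]) blocks."""
    pieces, buf, i = [], [], 0
    while i < len(text):
        if text.startswith("~~", i):         # ~~ is a literal tilde
            buf.append("~")
            i += 2
        elif text[i] == "~":
            end = text.find("~", i + 1)
            if end == -1:
                raise SyntaxError("unterminated tag block")
            if buf:
                pieces.append(("text", "".join(buf)))
                buf = []
            pieces.append(("tags", text[i + 1:end].split()))
            i = end + 1
        else:
            buf.append(text[i])
            i += 1
    if buf:
        pieces.append(("text", "".join(buf)))
    return pieces
```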
An example of indentation:
^**%.Item 1
^Not indented
^**%.~n~Item 2
^Indented~u~
^Not indented
To assist in typing Unicode scripts with ASCII keyboards, Ponscripter has the ability to replace sequences of characters with Unicode symbols. This facility is also used to implement the hash-escaping of single-character control codes, and can be used to add ligatures automatically. It is only enabled in native scripts; none of this is possible in legacy mode.
A shortcut is a mapping of a sequence of characters to a Unicode codepoint.
A shortcut sequence can be inserted literally by separating the characters with either a Unicode ZWNJ or a | character, e.g. `|` to insert two separate open single quotes. A literal | can be inserted with ||.
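By default, the following character sequences are defined, in addition to the hash escapes described above:
`` becomes “ (open double quote)
'' becomes ” (close double quote)
` becomes ‘ (open single quote)
' becomes ’ (close single quote)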
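A Python sketch of longest-match shortcut replacement, assuming the four default sequences map to the typographic quotes U+201C, U+201D, U+2018, and U+2019. Only the | separator and the || escape are modelled; ZWNJ handling and sequences added with h_ligate are left out.

```python
DEFAULT_SHORTCUTS = {"``": "\u201c", "''": "\u201d", "`": "\u2018", "'": "\u2019"}


def apply_shortcuts(text: str, table=DEFAULT_SHORTCUTS) -> str:
    """Replace shortcut sequences, preferring the longest match at each point."""
    longest = max(len(k) for k in table)
    out, i = [], 0
    while i < len(text):
        if text.startswith("||", i):         # || prints a literal |
            out.append("|")
            i += 2
            continue
        if text[i] == "|":                   # | separates characters, prints nothing
            i += 1
            continue
        for length in range(longest, 0, -1):
            seq = text[i:i + length]
            if len(seq) == length and seq in table:
                out.append(table[seq])
                i += length
                break
        else:                                # no shortcut starts here
            out.append(text[i])
            i += 1
    return "".join(out)
```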
Additional sequences can be defined by use of the h_ligate command: see Extensions.
Unlike in vanilla NScripter, merely including the name of a variable in text does not cause it to be interpolated; this is because frankly it seems to be more common to want something like $500 to be literal text representing a sum of money.
Instead, variables will be interpolated if enclosed in braces: {$foo}, {?100[%index]}, and so forth. This is not to be confused with NScripter's rather less useful brace syntax (variable assignments), which is not supported.
The variable's sigil must immediately follow the opening brace, and only variables can be interpolated, not arbitrary expressions. To include a literal sequence of a left brace followed by a sigil character, use a separator character: {|%.
Certain control codes are recognised after variable interpolation, since they are parsed at a later stage of processing: these are text controls, speed controls, colour tags, and ligatures/shortcuts. In particular, and in contrast to NScripter, things like ^!w{%var} will be interpreted as a command to wait for however long is specified in the given variable. This should be considered undefined behaviour, and will probably change in future; rather than rely on it, you should use the wait command (and so forth) for variable timings, and in the unlikely event that you actually intend to print the literal string !w followed by the value of %var, you should write #!w{%var} to avoid ambiguity.
Other special sequences are not recognised after interpolation. Variable interpolations are not expanded recursively. Likewise, formatting codes are not processed during interpolation; however, if the string literal in which they first appeared was delimited with ^ rather than ", they will have been processed when the string was read, and will therefore work as intended.
That is to say,
mov $var, "~b~"
^foo{$var}bar\
prints foo~b~bar, while
mov $var, ^~b~^
^foo{$var}bar\
prints foobar.
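A Python sketch of brace interpolation for % and $ variables indexed by a number or a numalias name; array interpolation is not modelled, and unset variables are assumed to read as 0 or the empty string. Because re.sub does not rescan its replacements, interpolated text is not expanded again, matching the non-recursive behaviour described above. The function and table names are hypothetical.

```python
import re

# {%N} / {$N} / {%alias} / {$alias}, or "{|" + sigil for a literal "{" + sigil.
INTERP = re.compile(r"\{([%$])([A-Za-z_0-9]+)\}|\{\|([%$?])")


def interpolate(text, int_vars, str_vars, aliases=None):
    aliases = aliases or {}

    def index(word):
        return int(word) if word.isdigit() else aliases[word]

    def replace(m):
        if m.group(3):                       # separator: print brace and sigil
            return "{" + m.group(3)
        sigil, word = m.group(1), m.group(2)
        if sigil == "%":
            return str(int_vars.get(index(word), 0))
        return str_vars.get(index(word), "")

    return INTERP.sub(replace, text)
```

Plain $500 is left alone because it is not enclosed in braces.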