=====================
The OmegaConf grammar
=====================

.. contents::
   :local:

.. testsetup:: *

    from omegaconf import OmegaConf

OmegaConf uses an `ANTLR `_-based grammar to parse string expressions,
where the `lexer rules `_
define the tokens used by the `parser rules `_.
Currently, this grammar is mainly used to parse :ref:`interpolations`, as detailed below.
.. _interpolation-strings:
Interpolation strings
^^^^^^^^^^^^^^^^^^^^^
An interpolation string is any string containing the ``${`` character sequence (denoting the start of an interpolation),
and is parsed using the ``text`` rule of the grammar:

.. code-block:: antlr

    text: (interpolation |
           ANY_STR | ESC | ESC_INTER | TOP_ESC | QUOTED_ESC)+;

Such a string is either a single interpolation or the concatenation of multiple fragments,
each of which is an interpolation or a regular string
(escaped characters are handled specially, see :ref:`escaping-in-interpolation-strings` below).
These are all examples of interpolation strings:
- ``${foo.bar}``
- ``https://${host}:${port}``
- ``Hello ${name}``
- ``${a}${oc.env:B}${c}``
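
As a minimal sketch of how such strings resolve, the example below concatenates
interpolations with regular text. The keys ``host``, ``port``, ``name``, ``url`` and
``greeting`` are made up for illustration and are not part of OmegaConf:

.. code-block:: python

    from omegaconf import OmegaConf

    # Hypothetical config: `url` and `greeting` are interpolation strings.
    cfg = OmegaConf.create(
        {
            "host": "localhost",
            "port": 8080,
            "name": "World",
            "url": "https://${host}:${port}",
            "greeting": "Hello ${name}",
        }
    )

    # Interpolations are resolved when the node is accessed, and the
    # fragments are joined into a single string.
    url = cfg.url
    greeting = cfg.greeting
    print(url)       # https://localhost:8080
    print(greeting)  # Hello World
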
Interpolation types
^^^^^^^^^^^^^^^^^^^
An ``interpolation`` as found in the rule above can either be a :ref:`config-node-interpolation`
(e.g., ``${host}``) or a call to a :ref:`resolver` (e.g., ``${oc.env:B}``).
This is reflected in the following parser rules:

.. code-block:: antlr

    interpolation: interpolationNode | interpolationResolver;

    interpolationNode:
          INTER_OPEN  // ${
          DOT*
          (configKey | BRACKET_OPEN configKey BRACKET_CLOSE)
          (DOT configKey | BRACKET_OPEN configKey BRACKET_CLOSE)*
          INTER_CLOSE;  // }

    interpolationResolver:
          INTER_OPEN  // ${
          resolverName COLON sequence?
          BRACE_CLOSE;  // }

The following are all valid examples of config node interpolations according to the ``interpolationNode`` rule
(note in particular that it supports both dot and bracket notations to access child nodes):
- ``${host}``
- ``${.sibling}``
- ``${..uncle.cousin}``
- ``${some_list[3]}``
- ``${some_deep_dict[key1][subkey2].subsubkey3}``
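
The following sketch exercises a few of these forms on a small config. All key
names and values are assumptions chosen to mirror the examples above:

.. code-block:: python

    from omegaconf import OmegaConf

    cfg = OmegaConf.create(
        {
            "host": "localhost",
            "some_list": [10, 20, 30, 40],
            "some_deep_dict": {"key1": {"subkey2": {"subsubkey3": "deep"}}},
            "parent": {
                "sibling": "s",
                # One leading dot: look up `sibling` at the same level.
                "relative": "${.sibling}",
            },
            "absolute": "${host}",
            "item": "${some_list[3]}",
            "nested": "${some_deep_dict[key1][subkey2].subsubkey3}",
        }
    )

    absolute = cfg.absolute         # "localhost"
    relative = cfg.parent.relative  # "s"
    item = cfg.item                 # 40 (the type of the target node is kept)
    nested = cfg.nested             # "deep"
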
Here are also examples of resolver calls from the ``interpolationResolver`` rule:
- ``${oc.env:B}``
- ``${my_resolver_without_args:}``
- ``${oc.select: missing, default}``
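
These calls can be tried out as follows. This is only a sketch: the environment
variable value and the body of ``my_resolver_without_args`` are invented for the
example, and ``replace=True`` is used so that the snippet can be run more than once:

.. code-block:: python

    import os

    from omegaconf import OmegaConf

    # Assumed environment variable, set here only so that the example is self-contained.
    os.environ["B"] = "value_of_B"

    # Hypothetical resolver that takes no argument.
    OmegaConf.register_new_resolver(
        "my_resolver_without_args", lambda: "no args", replace=True
    )

    cfg = OmegaConf.create(
        {
            "env": "${oc.env:B}",
            "no_args": "${my_resolver_without_args:}",
            # There is no `missing` key, so the default is returned.
            "selected": "${oc.select: missing, default}",
        }
    )

    env_value = cfg.env
    no_args = cfg.no_args
    selected = cfg.selected
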
Resolver arguments must be provided in a comma-separated list as per the following
``sequence`` parser rule:

.. code-block:: antlr

    sequence: (element (COMMA element?)*) | (COMMA element?)+;

*Note that this rule currently supports empty arguments to preserve backward compatibility
with OmegaConf 2.0, but this has been deprecated (see* `#572 `_ *).*
.. _element-types:
Element types
^^^^^^^^^^^^^
As seen in the ``sequence`` rule above, each resolver argument is parsed by an ``element`` rule,
which currently supports four main types of arguments:

.. code-block:: antlr

    element:
          quotedValue
        | listContainer
        | dictContainer
        | primitive
    ;

A ``quotedValue`` is a quoted string that may contain almost anything between matching double or single quotes
(including interpolations, which will be resolved at evaluation time).
For instance:
- ``"Hello World!"``
- ``'Hello ${name}!'``
- ``"I ${can: ${nest}, ${interpolations}, 'and quotes'}"``
The ``quotedValue`` parser rule is formally defined as:

.. code-block:: antlr

    quotedValue:
          (QUOTE_OPEN_SINGLE | QUOTE_OPEN_DOUBLE)
          text?
          MATCHING_QUOTE_CLOSE;

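
To illustrate the effect of quoting, the sketch below registers two hypothetical
resolvers (``count_args`` and ``upper``, neither of which ships with OmegaConf).
A comma inside quotes does not split the argument list, and an interpolation
inside quotes is resolved before the resolver is called:

.. code-block:: python

    from omegaconf import OmegaConf

    # Hypothetical helper resolvers, registered only for this example.
    OmegaConf.register_new_resolver(
        "count_args", lambda *args: len(args), replace=True
    )
    OmegaConf.register_new_resolver("upper", lambda s: s.upper(), replace=True)

    cfg = OmegaConf.create(
        {
            "name": "World",
            "unquoted": "${count_args: a, b}",    # two arguments: "a" and "b"
            "quoted": '${count_args: "a, b"}',    # one argument: "a, b"
            "shout": "${upper: 'Hello ${name}!'}",
        }
    )

    unquoted_count = cfg.unquoted
    quoted_count = cfg.quoted
    shout = cfg.shout
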
``listContainer`` and ``dictContainer`` are respectively lists and dictionaries, using a familiar syntax:
- List examples: ``[]``, ``[1, 2, 3]``, ``[${a}, ${oc.env:B}, c]``
- Dict examples: ``{}``, ``{a: 1, b: 2}``, ``{a: ${a}, b: ${oc.env:B}}``
Their corresponding parser rules are:

.. code-block:: antlr

    listContainer: BRACKET_OPEN sequence? BRACKET_CLOSE;
    dictContainer: BRACE_OPEN
                   (dictKeyValuePair (COMMA dictKeyValuePair)*)?
                   BRACE_CLOSE;

Regarding dictionaries, note that although values can be any ``element``, keys are more
restricted, and in particular quoted strings and interpolations are currently *not* allowed as
dictionary keys (see the definition of ``dictKey`` in the `grammar `_).
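
As a small sketch of containers used as resolver arguments, the hypothetical
resolvers ``total`` and ``n_keys`` below (names chosen for this example) receive
a list and a dictionary respectively, each containing a nested interpolation:

.. code-block:: python

    from omegaconf import OmegaConf

    # Hypothetical resolvers operating on container arguments.
    OmegaConf.register_new_resolver(
        "total", lambda values: sum(values), replace=True
    )
    OmegaConf.register_new_resolver(
        "n_keys", lambda mapping: len(mapping), replace=True
    )

    cfg = OmegaConf.create(
        {
            "a": 10,
            "list_sum": "${total: [1, 2, ${a}]}",       # 1 + 2 + 10
            "dict_size": "${n_keys: {x: 1, y: ${a}}}",  # two keys: x and y
        }
    )

    list_sum = cfg.list_sum
    dict_size = cfg.dict_size
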
Finally, a ``primitive`` is everything else that is allowed, including in particular (see the `full grammar `_
for details):
- Unquoted strings (that support only a subset of characters, contrary to quoted ones): ``foo``, ``foo_bar``, ``hello world 123``
- Integer numbers: ``123``, ``-5``, ``+1_000_000``
- Floating point numbers (with special case-independent keywords for infinity and NaN): ``0.1``, ``1e-3``, ``inf``, ``-INF``, ``nan``
- Other special keywords (also case-independent): ``null``, ``true``, ``false``, ``NULL``, ``True``, ``fAlSe``.
  **IMPORTANT**: ``None`` is *not* a special keyword and will be parsed as an unquoted string.
  Use the ``null`` keyword instead (as in YAML).
- Interpolations (thus allowing for nested interpolations)
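
One way to check how a given primitive is parsed is to pass it to a resolver that
reports the Python type of its argument. The ``type_of`` resolver below is a
hypothetical helper written for this sketch:

.. code-block:: python

    from omegaconf import OmegaConf

    # Hypothetical resolver returning the name of its argument's Python type.
    OmegaConf.register_new_resolver(
        "type_of", lambda x: type(x).__name__, replace=True
    )

    cfg = OmegaConf.create(
        {
            "integer": "${type_of: 123}",
            "floating": "${type_of: 1e-3}",
            "boolean": "${type_of: true}",
            "null": "${type_of: null}",
            "none": "${type_of: None}",  # not a keyword: unquoted string
            "string": "${type_of: foo_bar}",
        }
    )

    types = {key: cfg[key] for key in cfg}
    print(types["null"])  # NoneType
    print(types["none"])  # str
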
Escaped characters
^^^^^^^^^^^^^^^^^^
Some characters need to be escaped, with varying escaping requirements depending on the situation.
In general, however, you can use the following rule of thumb:
*you only need to escape characters that otherwise have a special meaning in the current context*.
.. _escaping-in-interpolation-strings:
Escaping in interpolation strings
+++++++++++++++++++++++++++++++++
In order to define fields whose value is an interpolation-like string, interpolations can be escaped with ``\${``.
For instance:

.. doctest::

    >>> c = OmegaConf.create({"path": r"\${dir}", "dir": "tmp"})
    >>> print(c.path)  # does *not* interpolate into the `dir` node
    ${dir}

If you actually want to follow a ``\`` with a resolved interpolation, this backslash
needs to be escaped into ``\\`` to differentiate it from an escaped interpolation:

.. doctest::

    >>> c = OmegaConf.create({"path": r"C:\\${dir}", "dir": "tmp"})
    >>> print(c.path)  # *does* interpolate into the `dir` node
    C:\tmp

Note that we use Python raw strings here to make the code
more readable: otherwise, all ``\`` characters would need to be doubled because of how Python handles
escaping in regular string literals.
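
The plain Python snippet below (no OmegaConf call involved) shows the equivalence
between the raw string used above and its regular string literal counterpart:

.. code-block:: python

    # Both literals denote the same 10-character string: C:\\${dir}
    raw = r"C:\\${dir}"
    regular = "C:\\\\${dir}"  # each backslash must be doubled without the r prefix

    same = raw == regular
    backslashes = raw.count("\\")  # number of backslash characters in the string
    print(same)         # True
    print(backslashes)  # 2
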
Finally, since the ``\`` character has no special meaning unless followed by ``${``,
it does *not* need to be escaped anywhere else:

.. doctest::

    >>> c = OmegaConf.create({"path": r"C:\foo_${dir}", "dir": "tmp"})
    >>> print(c.path)  # a single \ is preserved...
    C:\foo_tmp
    >>> c = OmegaConf.create({"path": r"C:\\foo_${dir}", "dir": "tmp"})
    >>> print(c.path)  # ... and multiple \\ too (no escape sequence)
    C:\\foo_tmp

Escaping in unquoted strings
++++++++++++++++++++++++++++
Unquoted strings can be found in a number of contexts, including dictionary keys/values,
list elements, etc. As a result, escape sequences are available for the following
special characters
(``\\``, ``\[``, ``\]``, ``\{``, ``\}``, ``\(``, ``\)``, ``\:``, ``\=``, ``\,``),
for instance:
- ``C\:\\$\{dir\}`` resolves to the string ``"C:\${dir}"``
- ``\[a\, b\, c\]`` resolves to the string ``"[a, b, c]"``
In addition, leading and trailing whitespace must be escaped in unquoted strings
if it should not be stripped (inner whitespace is always preserved):

.. doctest::

    >>> c = OmegaConf.create({"esc": r"${oc.decode: \ hi u \ }"})
    >>> c.esc  # one leading whitespace and two trailing ones
    ' hi u  '
    >>> # Tabs are handled similarly (NB: r-strings can't be used below)
    >>> c = OmegaConf.create({"esc": "${oc.decode:\t\\\thi u\t\\\t\t}"})
    >>> c.esc  # one leading tab and two trailing ones
    '\thi u\t\t'

Escaping in unquoted strings can lead to hard-to-read expressions, and it is recommended
to switch to quoted strings instead of relying heavily on the above escape sequences.
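
As a sketch of this recommendation, the example below passes the same value to a
hypothetical identity resolver named ``echo`` (not part of OmegaConf), once as an
escaped unquoted string and once as a quoted string:

.. code-block:: python

    from omegaconf import OmegaConf

    # Hypothetical identity resolver, used only to observe the parsed argument.
    OmegaConf.register_new_resolver("echo", lambda x: x, replace=True)

    cfg = OmegaConf.create(
        {
            # Unquoted: brackets and commas must be escaped.
            "unquoted": r"${echo: \[a\, b\, c\]}",
            # Quoted: the same characters need no escaping.
            "quoted": "${echo: '[a, b, c]'}",
        }
    )

    unquoted = cfg.unquoted
    quoted = cfg.quoted
    print(quoted)  # [a, b, c]

Both forms are expected to yield the same string, and the quoted one is arguably easier to read.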
Escaping in quoted strings
++++++++++++++++++++++++++
As can be seen from the definition of the ``quotedValue`` parser rule above, quoted strings
are just ``text`` fragments surrounded by quotes, and are thus very similar to :ref:`interpolation-strings`.
As a result, the ``\${`` escape sequence can also be used to escape interpolations
in quoted strings (as described in :ref:`escaping-in-interpolation-strings`):
- ``"\${dir}"`` resolves to the string ``"${dir}"``
- ``"C:\\${dir}"`` resolves to the string ``"C:\tmp"`` if ``dir`` resolves to ``"tmp"``
However, one key difference from interpolation strings is that quotes of the same type
as the enclosing quotes must be escaped, unless they are within a nested interpolation.
For instance:
- ``'\'Hi you\', I said'`` resolves to the string ``"'Hi you', I said"``
- ``"'Hi ${concat: 'y', "o", u}', I said"`` also resolves to the string ``"'Hi you', I said"``
if ``concat`` is a :doc:`custom resolver` concatenating its inputs.
The main point to pay attention to in this example is that the quoted strings ``'y'`` and
``"o"`` found within the resolver interpolation ``${concat: ...}`` do *not* need to be
escaped, regardless of existing quotes outside of this interpolation.
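
The two cases above can be reproduced with the sketch below. The ``echo`` and
``concat`` resolvers are assumptions made for this example (``concat`` stands in
for the custom resolver mentioned above):

.. code-block:: python

    from omegaconf import OmegaConf

    # Hypothetical resolvers: `echo` returns its argument unchanged,
    # `concat` joins its string arguments.
    OmegaConf.register_new_resolver("echo", lambda x: x, replace=True)
    OmegaConf.register_new_resolver(
        "concat", lambda *parts: "".join(parts), replace=True
    )

    cfg = OmegaConf.create(
        {
            # Single quotes inside a single-quoted string must be escaped.
            "escaped": r"${echo: '\'Hi you\', I said'}",
            # Quotes inside the nested ${concat: ...} call need no escaping.
            "nested": """${echo: "'Hi ${concat: 'y', "o", u}', I said"}""",
        }
    )

    escaped = cfg.escaped
    nested = cfg.nested
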