syslog-ng 4 theme: typing
As explained in my previous post, we already have some features in mind for syslog-ng 4, even though the work on creating a long-term set of objectives for the syslog-ng project is not finished yet. One of the themes that I already have working code for is typing.
syslog-ng traditionally assumes that log data, even if it comes in a structured form (like RFC5424 structured data or JSON), is primarily textual in nature. For this reason, name-value pairs in syslog-ng are text values, just like the log message as a whole. The need for typing has come up before, however, most notably in cases where we sent data to a consumer that supported typing, such as:
- Elastic, like other similar consumers, uses JSON, where attributes can have non-text types
- SQL columns have types
- Riemann metrics can have types
Typing also has an impact on log routing decisions. In a lot of cases, textual comparisons or regexp matches are fine, but sometimes your routing condition depends on a value being larger or smaller than a numeric value. For example:
log {
    if ("${.apache.bytes}" > "10000") {
        # do something
    }
};
In this case, comparing the values as text is clearly incorrect: if ${.apache.bytes} was “5”, the condition above would pass, because the string “5” sorts after the string “10000”, even though 5 is obviously not larger than 10000 numerically. To allow both numeric and textual comparisons, syslog-ng has two sets of operators: the usual “<”, “==” and “>” do numeric comparisons, while “lt”, “eq” and “gt” do string comparisons. It is pretty easy to mix those up, though; even I make that mistake sometimes.
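The difference between the two kinds of comparison is easy to reproduce outside syslog-ng. The following is a minimal Python sketch, not syslog-ng code: Python's own string and integer comparisons stand in for the string and numeric operator families, and the variable names are made up for the illustration.

```python
# Illustration only: Python comparisons stand in for syslog-ng's
# string operators (lt/eq/gt) and numeric operators (</==/>).
bytes_sent = "5"       # what ${.apache.bytes} would hold, stored as text
threshold = "10000"

# Textual comparison goes character by character: "5" sorts after "1",
# so the shorter string still counts as "greater".
as_text = bytes_sent > threshold

# Numeric comparison converts both sides to numbers first.
as_numbers = int(bytes_sent) > int(threshold)

print(as_text, as_numbers)  # True False
```

The same two values give opposite answers depending on which comparison is applied, which is exactly the kind of mistake that picking the wrong operator leads to.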
To address both problems, type support is being added to syslog-ng. The change by itself is pretty simple:
- we add a “type” value associated with each name-value pair of the log message,
- the value itself continues to be stored internally in its current, text-based format,
- whenever we need type information in a type-aware context (e.g. when we format a JSON document or send a Riemann event), we use this type information,
- whenever we just need the name-value pair as before, in textual context, we would just continue to use the existing string based value
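To make the storage model above a bit more tangible, here is a rough Python sketch of it. This is only a model under my own assumptions, not how syslog-ng implements it internally: the TypedValue class, the helper functions and the type names ("string", "int", "bool", "null") are all hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TypedValue:
    """Hypothetical model of a name-value pair's value."""
    text: str              # the value, still stored as text
    type: str = "string"   # the newly added type tag


def as_text(value: TypedValue) -> str:
    """Textual context: ignore the type and use the stored string."""
    return value.text


def as_json_value(value: TypedValue):
    """Type-aware context: convert the text according to the type tag."""
    if value.type == "int":
        return int(value.text)
    if value.type == "bool":
        return value.text == "true"
    if value.type == "null":
        return None
    return value.text


pairs = {"PID": TypedValue("1234", "int"), "PROGRAM": TypedValue("sshd")}

text_line = " ".join(f"{name}={as_text(v)}" for name, v in pairs.items())
json_line = json.dumps({name: as_json_value(v) for name, v in pairs.items()})

print(text_line)  # PID=1234 PROGRAM=sshd
print(json_line)  # {"PID": 1234, "PROGRAM": "sshd"}
```

The point of the design is visible in the two outputs: the textual rendering is unchanged, while the type-aware rendering emits PID as a number without any explicit hint at formatting time.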
The consequences:
- type-aware consumers (like JSON, Elastic, Riemann, MongoDB, etc.) would use type information automatically, with no need for explicit type hints
- we can implement type aware comparisons, so that syslog-ng does the right comparison, based on types (e.g. like JavaScript).
As always, this is probably easier to understand with examples.
Type aware JSON parsing/reproduction
@version: 4.0

log {
    source { tcp(port(2000) flags(no-parse)); };
    parser { json-parser(prefix('.json.')); };
    destination {
        file("/tmp/json.out" template("$(format-json .json.* --shift-levels 2)\n"));
    };
};
This configuration expects JSON payloads, one per line, on TCP port 2000. It parses the JSON and then reformats it using $(format-json). Let’s run this configuration:
$ /sbin/syslog-ng -Fedvtf /etc/syslog-ng/syslog-ng-typing-demo.conf
Let’s send a JSON payload to this syslog-ng instance:
$ echo '{"text": "string", "number": 5, "bool": true, "thisisnull": null, "list": [5,6,7,8]}' | nc -q0 localhost 2000
syslog-ng reports the parsing process in its debug/trace log levels:
[2022-03-03T08:40:56.408225] json-parser message processing started; input='{"text": "string", "number": 5, "bool": true, "thisisnull": null, "list": [5,6,7,8]}', prefix='.json.', marker='(null)', msg='0x7ffff00141c0'
[2022-03-03T08:40:56.408461] Setting value; name='.json.text', value='string', msg='0x7ffff00141c0'
[2022-03-03T08:40:56.408500] Setting value; name='.json.number', value='5', msg='0x7ffff00141c0'
[2022-03-03T08:40:56.408524] Setting value; name='.json.bool', value='true', msg='0x7ffff00141c0'
[2022-03-03T08:40:56.408545] Setting value; name='.json.thisisnull', value='', msg='0x7ffff00141c0'
[2022-03-03T08:40:56.408592] Setting value; name='.json.list', value='5,6,7,8', msg='0x7ffff00141c0'
Note the individual name-value pairs being set as they are extracted from the JSON format. This is then reproduced on the output side:
{ "thisisnull": null, "text": "string", "number": 5, "list": [ "5", "6", "7", "8" ], "bool": true }
Please note that “number” is numeric and “list” contains a JSON list. One limitation that is still visible here is that list elements are not typed and are always reproduced as strings by $(format-json), because list elements are not name-value pairs.
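The round trip above, including the list limitation, can be imitated with a short Python sketch. Treat it as a model built on assumptions rather than a description of the real code: the flatten() and rebuild() helpers and the type names are hypothetical, and joining list elements with plain commas is a simplification of how a list might be encoded as text.

```python
import json


def flatten(value):
    """Return (text, type) for a parsed JSON value; type names are hypothetical."""
    if value is None:
        return "", "null"
    if isinstance(value, bool):   # check bool first: bool is a subclass of int
        return ("true" if value else "false"), "bool"
    if isinstance(value, int):
        return str(value), "int"
    if isinstance(value, list):
        # The list becomes one text value; the element types are not kept.
        return ",".join(str(item) for item in value), "list"
    return str(value), "string"


def rebuild(text, type_):
    """Turn a stored (text, type) pair back into a JSON-ready value."""
    if type_ == "null":
        return None
    if type_ == "bool":
        return text == "true"
    if type_ == "int":
        return int(text)
    if type_ == "list":
        return text.split(",") if text else []   # elements come back as strings
    return text


payload = '{"text": "string", "number": 5, "bool": true, "thisisnull": null, "list": [5,6,7,8]}'
stored = {name: flatten(value) for name, value in json.loads(payload).items()}
output = {name: rebuild(text, type_) for name, (text, type_) in stored.items()}

print(stored["list"])      # ('5,6,7,8', 'list')
print(json.dumps(output))
```

Because only the name-value pair as a whole carries a type, the sketch knows that “list” is a list, but has nothing that says its elements were numbers, so they come back as "5", "6", "7" and "8".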
Associate type information with name-value pairs
It is not just JSON that can set types for name-value pairs, rewrite rules and db-parser() can also set them. In rewrite rules, set() can now take a type hint, and that type-hint gets associated with the value as its type:
# this makes $PID numeric
rewrite {
    set(int("$PID") value("PID"));
};
Also, db-parser() would set type information depending on which parser we used to extract the specific field. For instance, @NUMBER@ would extract an integer.
Type information returned by macros and templates and template functions
Template functions will be able to return the type, depending on the function they perform. For instance the list handling functions like $(list-slice) would return a list. Numerical functions like $(+) would return numbers. Likewise, some macros are also being annotated with their types.
Template expressions as a whole also become typed. Whenever we use a “simple” template expression (e.g. one with just a single ‘$’ reference, like “$PID”), the type of the template is inferred automatically and that type is propagated. If the inferred type is not correct, you can always use type hints to “cast” the template expression to some other type.
When does it become available? When can I try it?
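The inference rule for templates can be sketched in a few lines of Python. This is only my illustration of the idea: the infer_type() helper, the regular expression and the type names are hypothetical, and falling back to "string" for anything that is not a single reference is an assumption of the sketch.

```python
import re

# A "simple" template is exactly one reference: $NAME or ${NAME}.
SINGLE_REF = re.compile(r"^\$(?:\{([^}]+)\}|([A-Za-z0-9_.]+))$")


def infer_type(template, value_types, hint=None):
    """Return the type of a template expression in this toy model."""
    if hint is not None:
        # An explicit type hint "casts" the expression and always wins.
        return hint
    match = SINGLE_REF.match(template)
    if match:
        name = match.group(1) or match.group(2)
        # Propagate the type of the single referenced value.
        return value_types.get(name, "string")
    # Assumption: anything more complex is treated as plain text.
    return "string"


value_types = {"PID": "int", "HOST": "string"}

print(infer_type("$PID", value_types))                  # int
print(infer_type("${PID}", value_types))                # int
print(infer_type("pid=$PID", value_types))              # string
print(infer_type("$PID", value_types, hint="string"))   # string
```

The last call corresponds to using a type hint such as string("$PID") to override the type that would otherwise be inferred.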
Since the typing behavior has the potential to change the output in certain ways (e.g. producing a numeric value where a string was produced before), we are not turning this feature on automatically. As long as we are in the 3.x release train, it will stay disabled, even as parts of it are being merged. You can evaluate the feature by setting your config version (i.e. @version at the top of the config file) to 4.0, as shown in the example config above.
Then, as we release 4.0, the typing feature will be enabled by default for any configuration that uses @version: 4.0.
Most of the feature is already implemented, but not yet merged to the mainline. There are open PRs on GitHub. 3.36 is expected to contain the first batch (e.g. JSON parser pieces), but not the complete change. I expect the changes to land in mainline in 1 or 2 extra release cycles, e.g. the end of April or end of June.
Stay tuned!