Rounding up syslog-ng 4 and a practical introduction to typing
syslog-ng 4 is right around the corner, and the work on the topics I listed in this blog post is nearing completion. Instead of a pile of breaking changes, we chose to improve syslog-ng in an evolutionary manner, providing fine-grained compatibility with older versions along the way, so that syslog-ng 4 remains a drop-in replacement for any release from the past 15 years.
What evolution means in practice here is that features and changes are merged into 3.x releases as they become ready, but they are all hidden behind a feature flag: they are disabled by default, and to enable them, one needs to use `@version: 4.0` at the top of one’s configuration file. This process is detailed in Peter Czanik’s blog post with a couple of real-world examples.
The first set of changes went into 3.36.1, released in March. Some more followed in 3.37.1, released in June (with a related blog post), and 3.38.1 is due any day now, with most changes already accumulated on the “master” branch (nightly snapshots).
Hopefully, this is going to be the last 3.x release before 4.x is cut, but this also depends on feedback and issues that we might encounter in this cycle.
This release focused primarily on getting 4.0 ready, and as such it concentrated on finishing up the typing feature. If you have already read the linked blog post, you might be aware that we intend to associate runtime type information with any name-value pair that we encounter, so that we can 1) allow type aware comparisons in routing decisions, 2) reproduce the original types when sending a log message to consumers. I also expect this to be an important feature as we implement more of our long-term objectives (observability, app awareness, user friendliness).
Problems that type aware comparisons attempt to solve
Probably the most important change in 3.38.1 is the introduction of type aware comparisons. Traditionally syslog-ng had two kinds of comparison operators, just like shell scripts with the “test” builtin command.
- one to compare strings (eq, ne, gt, lt)
- one to compare numbers (==, !=, >, <)
Here’s an example:
log {
    source { file("/var/log/apache2/access.log"); };
    parser { apache-accesslog-parser(); };
    if ("${.apache.response}" >= "500") {
        file("/logs/apache-errors");
    };
};
This example shows how to route requests with an HTTP response code of 500 or above to a separate file. If you look at the if {} statement, you can see that we compared the HTTP response code to 500 and tried to capture anything that is 500 or higher. But there’s a potential trap in here. ${.apache.response} is a string that contains a number. “500” in syslog-ng 3.x is also a string.
How this comparison is performed depends on the operator that we use: numeric (<, >, ==, !=) or string (lt, gt, eq, ne) focused.
If you look at the example again, you can see that we used the “numeric” operators, which means that the configuration above performs the comparison correctly: it converts both “${.apache.response}” and “500” to numbers and compares them numerically.
But then, let’s see this example:
log {
    source { file("/var/log/apache2/access.log"); };
    parser { apache-accesslog-parser(); };
    if ("${.apache.httpversion}" == "1.0") {
        file("/logs/http-10-logs");
    };
};
Again, we are comparing a name-value pair against a literal string, this time checking for equality, and using “==” as the operator just like before. Version numbers are not strictly numeric in general, but in this specific case they look like numbers.
But this example will do something that is pretty unexpected:
- we used the numeric operators in the example, so syslog-ng would convert both “${.apache.httpversion}” and “1.0” to numbers
- but the numeric operators only support INTEGERS, floating point numbers are not supported (syslog-ng uses the function atoi(3) for this conversion)
- atoi() picks up the digits before the “.” in the version number and converts only those to an integer. This means “1.0” becomes 1 and “1.1” becomes 1 too!
- so the comparison above would evaluate to TRUE for any value that starts with a 1 followed by a non-digit character.
- this means that all HTTP versions starting from 1.0 up to 1.9 end up in our file that we designated as one to hold 1.0 only traffic.
Not really the expected behaviour. But it becomes worse.
log {
    source { file("/var/log/apache2/access.log"); };
    parser { apache-accesslog-parser(); };
    if ("${.apache.request}" == "/wp-admin/login.php") {
        file("/logs/wordpress-logins");
    };
};
This time, we are trying to filter our data based on a string comparison, and we erroneously used the numeric operator. This is what happens:
- Neither “${.apache.request}” nor “/wp-admin/login.php” is numeric. They don’t even have digits in front of them.
- Both values are converted to 0.
- Zero equals zero, so the filter expression above is always TRUE.
There are other similar cases, the ugliest one being the comparison of a name-value pair to an empty string with numeric operators. The results are completely unexpected.
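For configurations that stay on the 3.x behaviour, the last two traps can be avoided by switching to the string operators. A minimal sketch, reusing the filters from the examples above (only the operator changes from “==” to “eq”, and the two checks are placed in one log path purely for brevity):

```
log {
    source { file("/var/log/apache2/access.log"); };
    parser { apache-accesslog-parser(); };

    # "eq" compares both sides as strings, so "1.1" no longer matches "1.0"
    if ("${.apache.httpversion}" eq "1.0") {
        file("/logs/http-10-logs");
    };

    # string comparison again: only this exact request path matches
    if ("${.apache.request}" eq "/wp-admin/login.php") {
        file("/logs/wordpress-logins");
    };
};
```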
Type aware comparisons come to the rescue
I saw numerous cases where someone got the operator wrong when trying to compare or match something in syslog-ng. I felt that this was never a user error. Rather, we did a poor job of providing a user-friendly syntax and thus pushed too much responsibility onto those attempting to make use of these features.
But solving this kind of design mistake is never easy. Some of our users have figured this out already. We don’t want to break their configuration, right? But we want to make this easier and more intuitive for new users and new use-cases.
The solution we implemented was to make the numeric operators (==, !=, <, >) do the right thing. Based on the types of their arguments, these operators can in most cases infer which comparison the user meant, so we let them do exactly that. We took some inspiration from JavaScript (which operates in a similar string-heavy environment) and implemented more intuitive rules for our previously numeric-only comparisons.
Let’s see our previous examples:
@version: 4.0

log {
    source { file("/var/log/apache2/access.log"); };
    parser { apache-accesslog-parser(); };
    if ("${.apache.response}" >= 500) {
        file("/logs/apache-errors");
    };
};
If you compare this to my previous example, I have removed the quotes from around “500”. In syslog-ng 3.x, the quotes were mandatory. In 4.x, they are not. Without the quotes, 500 becomes a numeric literal, and comparing a string to a number compares both sides numerically, just like in JavaScript. We could even improve the Apache parser to make ${.apache.response} a number as it parses its input, but to do the right thing, it is enough that one side of the comparison is numeric.
Next example:
@version: 4.0

log {
    source { file("/var/log/apache2/access.log"); };
    parser { apache-accesslog-parser(); };
    if ("${.apache.httpversion}" == "1.0") {
        file("/logs/http-10-logs");
    };
};
I haven’t changed anything in this example: both “${.apache.httpversion}” and “1.0” are strings, and they are compared as strings. So this time, only HTTP/1.0 would be routed to our logfile. 1.1 or even 1.9 would be filtered out, as expected. If we wanted to, we could use floating point based comparisons by removing the quotes (just like in the previous example) or by using explicit type-casting:
if ("${.apache.httpversion}" == 1.0)
or
if (double("${.apache.httpversion}") == "1.0")
Type casting can be applied wherever we used template strings before, to associate a type with the result of the template expansion.
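As a sketch of what such a cast could look like outside of the version check, the filter below casts the response size before comparing it. The ${.apache.bytes} field, the int64() cast name, the threshold and the destination path are my assumptions for illustration; they do not come from the examples above.

```
@version: 4.0

log {
    source { file("/var/log/apache2/access.log"); };
    parser { apache-accesslog-parser(); };

    # assumption: ${.apache.bytes} holds the response size as a string;
    # the cast marks the expanded template as an integer
    if (int64("${.apache.bytes}") > 1048576) {
        file("/logs/apache-large-responses");
    };
};
```

Since the right-hand side is already a numeric literal, the cast is not strictly needed here. It only makes the intent explicit.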
And here’s the third example:
@version: 4.0

log {
    source { file("/var/log/apache2/access.log"); };
    parser { apache-accesslog-parser(); };
    if ("${.apache.request}" == "/wp-admin/login.php") {
        file("/logs/wordpress-logins");
    };
};
Again, no changes are necessary. Both sides are strings, so we are comparing them as strings. There is no need to use the “eq” operator. Just one set of operators, plus the occasional explicit type-cast, covers all use-cases. For compatibility reasons, the old “string” operators (eq, ne, lt, gt) remain available, but I hope we can forget those eventually.
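To illustrate the “one set of operators” point, here is a minimal sketch that mixes a numeric and a string comparison in a single expression. The destination path is hypothetical, and combining the two checks with “and” is my own example, not one of the cases discussed earlier.

```
@version: 4.0

log {
    source { file("/var/log/apache2/access.log"); };
    parser { apache-accesslog-parser(); };

    # numeric comparison (500 is a numeric literal) combined with a
    # string comparison (both sides are strings)
    if ("${.apache.response}" >= 500 and "${.apache.request}" == "/wp-admin/login.php") {
        file("/logs/wordpress-login-errors");
    };
};
```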
Other typing related changes
This section briefly lists the various components that we needed to adapt to typing. These changes have landed since 3.36.1 was released but were not explicitly announced in those versions. Let me know if you are interested in any of these topics in more detail, as there are probably a couple of blog posts worth of content here:
- type aware comparisons in filter expressions: as detailed above, the previously numeric operators become type aware and the exact comparison performed will be based on types associated with the values that we compare.
- json-parser() and $(format-json): JSON support is massively improved with the introduction of types. For one: type information is retained across input parsing->transformation->output formatting. JSON lists (arrays) are now supported and are converted to syslog-ng lists so they can be manipulated using the $(list-*) template functions. There are other important improvements in how we support JSON.
- set(), groupset(): in any case where we allow the use of templates, support for type-casting was added and the type information is properly promoted.
- db-parser() type support: db-parser() gets support for type casts. <value> assignments within db-parser() rules can associate types with values using the type-casting syntax, e.g. <value name="foobar">int32($PID)</value>. The “int32” is a type-cast that associates $foobar with an integer type. db-parser()’s internal parsers (e.g. @NUMBER@) will also associate type information with a name-value pair automatically.
- add-contextual-data() type support: any new name-value pair that is populated using add-contextual-data() will propagate type information, similarly to db-parser().
- map-value-pairs() type support: propagates type information in the same way.
- SQL type support: the sql() driver gained support for types, so that columns with specific types will be stored as those types.
- template type support: templates can now be cast explicitly to a specific type, but they also propagate type information from macros/template functions and values in the template string.
- value-pairs type support: value-pairs form the backbone of specifying a set of name-value pairs and associated transformations to generate JSON or a key-value pair format. They also gained support for types: the type-hinting feature that was already part of value-pairs was adapted and expanded to other parts of syslog-ng.
- on-disk serialized formats (e.g. disk buffer/logstore): we remain compatible with messages serialized with an earlier version of syslog-ng, and the format we chose remains compatible for “downgrades” as well. E.g. even if a new version of syslog-ng serialized a message, the old syslog-ng and associated tools will be able to read it (sans type information, of course).
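To make a couple of these items more tangible, here is a rough sketch of a JSON round trip with a type-cast in set(). The file paths, the “.json.” prefix and the “.json.processed” name are hypothetical, and the exact cast syntax inside set() is my assumption based on the description above, so treat this as an illustration rather than a reference.

```
@version: 4.0

log {
    # hypothetical input: one JSON document per line
    source { file("/var/log/app/events.json" flags(no-parse)); };

    # parsed JSON members are stored under the ".json." prefix
    parser { json-parser(prefix(".json.")); };

    # assumption: a type-cast is accepted in set() like in any other template
    rewrite { set(boolean("true") value(".json.processed")); };

    destination {
        file("/logs/events-out.json" template("$(format-json --key .json.*)\n"));
    };
};
```

With `@version: 4.0`, the expectation is that numbers and lists from the input keep their JSON types in the output instead of being turned into strings.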