syslog-ng on the long term: a draft on strategic directions

I made a promise some posts ago that I would use this blog both to collect feedback and to provide information about the potential next steps for syslog-ng. In the same post, I also promised that you, the syslog-ng community, would have a chance to steer these directions. Please read on to find out how to do that.

Over the past few weeks I performed a round of discussions and interviews with syslog-ng users. I also spent time looking at other products and at analyst reports on the market. Based on all this information, I’ve come up with a list of potential strategic directions for syslog-ng to tackle. Focusing on these, and prioritizing features that fall into one of these directions, ensures that syslog-ng indeed moves ahead.

When I performed goal-setting exercises like this in my previous CTO role at Balabit, our team followed a process along these lines:

  1. brainstorming on potential directions,
  2. drafting a cleaned-up conclusion document,
  3. validating that the document is a good summary of the discussion, and
  4. validating with customers that the directions are indeed a good summary of what they need.

syslog-ng is an Open Source project, so I wanted to involve the community somehow. Organizing a brainstorming session sounds difficult online (do you know good solutions for this?), so I wanted to create an opportunity to discuss my thoughts with the broader community in a way that leads to a useful conclusion. This is the primary intent behind this post.

Once you have read the directions below, please consider whether you agree with my choice of directions! Are these indeed the most important things? Have I missed something? Do you have something in mind that should be integrated somehow? Which of the directions do you consider the most important?

Please give your feedback via this form https://forms.gle/xJ2heSHeVb7ZHUHH9, write a comment on the blog, or drop me an email. Thanks.

1. The Edge

syslog-ng has traditionally been used as a tool for log aggregation, i.e. working on the server side. That’s why its CPU and memory usage has always been in focus. Being able to consume a million (sometimes millions of!) messages a second is important for server use-cases; however, I think that in exchange for this focus, syslog-ng has neglected the other side of the spectrum: the Edge.

The Edge is where log messages are produced by infrastructure and applications and then sent away to a centralized logging system.

syslog-ng tackles the original “syslogd-like” deployment scenarios on the Edge, but lacks the features/documentation that make it easy to deploy in a more modern setting, e.g. as part of a Kubernetes cluster or a cloud-native application.

Apart from the deployment questions, I also consider the Edge important for improving data quality and thus the usefulness of collected log data. In a lot of cases today, log data is collected without associated meta-information, and without that meta-information it becomes very difficult to understand the originating context of the log data, limiting the ability to extract insights and understanding from logs.

These are the kind of features that fall into this bucket, in no particular order:

  • A transport that transparently carries metadata as well as log data, plus multi-line messages (this is probably achieved by EWMM already)
  • Kubernetes (container logs, pod related meta information, official image)
  • Document GCP/AWS/Azure deployments, log data enrichment
  • non-Linux support (Windows and other UNIXes)
  • Fetch logs from Software as a Service products
  • etc

2. Cloud Native

The cloud is not just a means to deploy our existing applications on rented infrastructure. It is a set of engineering practices that make developing applications faster and more reliable. Applications are deployed as a set of microservices, each running in its own container, potentially distributed across a cluster of compute nodes. The components of the application are managed via some kind of container orchestration system, such as Kubernetes.

Being friendly to these new environments is important, as new applications increasingly adopt this paradigm.

Features in this category:

  • Container images for production
    • as a logging side-car to collect app logs and transfer them to the centralized logging function or
    • as an application specific, local logging repository (e.g. app specific server)
  • HTTP ingestion API
    • these apps tend to communicate using HTTP, so it is more native to use that even for log ingestion
    • maybe provide compatibility with other aggregation solutions (Elastic, Splunk, etc)
  • Object Storage support
  • Stateless & persistent queueing (Kafka?)
  • etc

3. Observability

The term observability has its roots in control theory; however, it is increasingly applied to the operation of IT systems. Being observable in this context means that the IT system provides an in-depth view into its inner behaviours, making it simpler to troubleshoot problems or improve performance. Observability today usually implies three distinct types of data: metrics, traces and logs.

I first encountered this term in relation to Prometheus, an Open Source package that collects and organizes application-specific metrics in a manner that easily adapts to cloud-native, elastic workloads. Traditional monitoring tools (such as Zabbix or Nagios) require top-down, manual configuration, while Prometheus reversed this concept and pushed the responsibility to application authors: applications should expose their important metrics so that application monitoring works “out-of-the-box”. This idea quickly gained momentum, as manually configuring monitoring tools to adapt to automatically scaled application components is pretty much impossible.

Although observability originally comes from the application monitoring space, its basic ideas can be extended to cover traces and logs as well.

Features in this category:

  • Being observable: provide a Prometheus exporter so that we become observable out-of-the-box
  • Interoperate with Observability platforms
    • Loki destination
    • Support for OpenTelemetry (source and destination)
    • convert logs to metrics/traces and vice versa

4. Application awareness

syslog has been a great invention: it has served us for the last 40-45 years, and its importance continues into the future. Operating systems, network devices, IoT devices, applications, containers and container orchestration systems can all push their log data to syslog. For some of them, syslog is the only option.

In a way, syslog is the common denominator of all log-producing IT systems out there, and as such it has become the shared infrastructure for carrying logs in a lot of environments.

In my opinion, the success of syslog stems from the simplicity of using it: just send a datagram to port 514 and you are done. However, this simplicity is also its biggest limitation: it is under-specified. There have been attempts at standardization (RFC3164 and RFC5424), but these serve more as “conventions” than standards.
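
To get a feel for how low that barrier is, here is a minimal illustration of the “datagram to port 514” case (the host name loghost, the program name myapp and the message text are made up for the example, and the exact nc options vary between netcat variants):

# send a single RFC3164-style message over UDP to the standard syslog port;
# <13> encodes facility "user" (1) and severity "notice" (5): 1 * 8 + 5 = 13
echo "<13>Mar  3 08:40:56 myhost myapp[1234]: hello syslog" | nc -u -w1 loghost 514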

The consequence is that incompatible message formats limit the usefulness of log data once it is collected in a central repository. I regularly see issues such as:

  • unparseable and partial timestamps
  • missing or incorrect timezone information
  • missing information about the application’s name (e.g. $PROGRAM) or hostname
  • incorrectly framed multi-line messages
  • key=value data that is in a format downstream systems are unable to parse

Sometimes it’s not an individual log entry that is the problem, but rather an overly verbose logging format that becomes difficult to work with once you start using it for dashboards/queries:

  • The Linux audit system produces very verbose, multi-line logs about a single OS operation
  • Mail systems emit multiple log entries for a single email transaction, sometimes a separate log entry for each attachment.
  • etc

syslog-ng has always been good at applying various heuristics to properly extract information even from incorrectly formatted syslog messages; however, there are extreme cases where applications omit crucial information or use a syntax so far from the spec that even syslog-ng is unable to parse the data correctly.

Application awareness in this context means the ability to fix up syslog parsing using knowledge of the application that produced the message. It is difficult to craft heuristics that work with every incorrect format; however, once we start by identifying the application, we can correctly determine what the log message was intended to look like. Fixing these issues before the message hits a consumer (e.g. a SIEM) helps a lot in actually using the data we store.

Being application aware also implies that log routing decisions can become policy aware. “Forward me all the security logs” is a common request from any security department. However, actually doing this is not simple: what should count as “security”? Being application aware makes it possible to classify based on applications instead of individual log messages (see the sketch after the feature list below).

Features in this category:

  • classifying incoming logs per application (e.g. app-parser() and its associated application adapters)
  • fix up incoming logs and format them in a way that is easier for downstream consumers to handle (timestamps, multi-line messages, etc.)
  • translate incoming logs into a format that a downstream system best understands
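
As a rough sketch of the “forward me all the security logs” case, the configuration below classifies messages by the sending application using the program() filter. The program list and file paths are assumptions made for the example; the real feature would rely on app-parser() and its application adapters rather than a hand-maintained list like this:

# stand-in for application adapters: treat a hand-picked set of programs as "security"
source s_local { system(); };

filter f_security_apps { program("sshd") or program("sudo") or program("su"); };

log {
  source(s_local);
  filter(f_security_apps);
  destination { file("/var/log/security.log"); };
};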

5. User friendliness

syslog-ng is a domain-specific language for log management. Its performance is a crucial characteristic, but the complexity of the operations performed by syslog-ng within the log management layer has grown tremendously. Making syslog-ng easier to understand, and errors and problems easier to diagnose, is important in order to deal with this complexity. Having first-class documentation is also important for succeeding in any of the directions described above.

So, although it is not functionality by itself, I consider user friendliness a top priority for syslog-ng.

Features in this category:

  • syntax improvements can go a long way toward getting a feature adopted. syslog-ng has always been able to do conditional routing of log messages, but if()/elif()/else went a long way in getting that capability adopted (see the sketch after this list). There are other potential improvements to the syntax that could make reading/writing syslog-ng configurations easier.
  • configuration diagnostics: better location reporting in error messages, warnings, etc.
  • interactive debuggability: as syslog-ng is applied to more complex problems, the related configuration becomes more complex too. Today, you have to launch syslog-ng in the foreground, inject a message and try to follow its operation using the built-in trace messages. Interactive debugging would go a long way in making writing and testing these configurations easier.
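
To illustrate the if()/elif()/else point from the list above, here is a minimal sketch of conditional routing; the filters and file names are only examples:

log {
  source { system(); };
  if (level(err..emerg)) {
    destination { file("/var/log/errors.log"); };
  } elif (facility(auth, authpriv)) {
    destination { file("/var/log/auth.log"); };
  } else {
    destination { file("/var/log/other.log"); };
  };
};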

Those are roughly the directions I have in mind for the future of syslog-ng. If you disagree or have some comments, please provide feedback via the form at: https://forms.gle/xJ2heSHeVb7ZHUHH9

syslog-ng 4 theme: typing

As explained in my previous post, we already have some features in mind for syslog-ng 4, even though the work on creating a long-term set of objectives for the syslog-ng project is not finished yet. One of the themes that I already have working code for is typing.

syslog-ng traditionally assumes that log data, even if it comes in a structured form (like RFC5424 structured data or JSON), is primarily textual in nature. For this reason, name-value pairs in syslog-ng are text values, just like the log message as a whole. The need for typing has come up before though, most notably in cases where we send data to a consumer that supports typing, such as:

  • Elastic, like other similar consumers, uses JSON, and attributes can have non-text types
  • SQL columns have types
  • Riemann metrics can have types

Typing also has an impact on log routing decisions. In a lot of cases, textual comparisons or regexp matches are fine; however, sometimes your routing condition depends on a value being larger or smaller than a numeric value. For example:

log {
   if ("${.apache.bytes}" > "10000") {
      # do something
   }
};

In this case, doing the comparison as text is clearly incorrect: if ${.apache.bytes} were “5”, the condition above would pass, as the string “5” is larger than “10000”, which is clearly not the case if we compare them numerically. To allow both numeric and textual comparisons, syslog-ng has two sets of operators: the usual “<”, “==” and “>” do numeric comparisons, while “lt”, “eq” and “gt” do string comparisons. But it’s pretty easy to mix those up; even I make that mistake sometimes.
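
For contrast, here is the same condition written with the string operator; this sketch simply mirrors the example above to show how the two operator families differ:

log {
   # "gt" compares as strings: if ${.apache.bytes} is "5", this condition is true,
   # because the string "5" sorts after "10000", even though 5 < 10000 numerically
   if ("${.apache.bytes}" gt "10000") {
      # do something
   }
};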

To address both problems, type support is being added to syslog-ng. The change by itself is pretty simple:

  • we add a “type” value associated with each name-value pair of the log message,
  • the value itself continues to be stored internally in its current, text-based format,
  • whenever we need type information in a type-aware context (e.g. when we format JSON or send a Riemann event), we use this type information,
  • whenever we just need the name-value pair as before, in a textual context, we continue to use the existing string-based value.

The consequences:

  • type-aware consumers (like JSON, Elastic, Riemann, MongoDB, etc.) would use type information automatically, with no need for explicit type hints
  • we can implement type-aware comparisons, so that syslog-ng does the right comparison based on the types involved (e.g. like JavaScript does).

As always, this is probably easier to understand with examples.

Type aware JSON parsing/reproduction

@version: 4.0
log {
  source { tcp(port(2000) flags(no-parse)); };
  parser { json-parser(prefix('.json.')); };
  destination { file("/tmp/json.out" template("$(format-json .json.* --shift-levels 2)\n")); };
};

This configuration expects JSON payloads, one per line, on TCP port 2000. It parses the JSON and then reformats it using $(format-json). Let’s run this configuration:

$ /sbin/syslog-ng -Fedvtf /etc/syslog-ng/syslog-ng-typing-demo.conf

Let’s send a JSON payload to this syslog-ng instance:

$ echo '{"text": "string", "number": 5, "bool": true, "thisisnull": null, "list": [5,6,7,8]}' | nc -q0 localhost 2000

syslog-ng reports the parsing process in its debug/trace log levels:

[2022-03-03T08:40:56.408225] json-parser message processing started; input='{"text": "string", "number": 5, "bool": true, "thisisnull": null, "list": [5,6,7,8]}', prefix='.json.', marker='(null)', msg='0x7ffff00141c0'
[2022-03-03T08:40:56.408461] Setting value; name='.json.text', value='string', msg='0x7ffff00141c0'
[2022-03-03T08:40:56.408500] Setting value; name='.json.number', value='5', msg='0x7ffff00141c0'
[2022-03-03T08:40:56.408524] Setting value; name='.json.bool', value='true', msg='0x7ffff00141c0'
[2022-03-03T08:40:56.408545] Setting value; name='.json.thisisnull', value='', msg='0x7ffff00141c0'
[2022-03-03T08:40:56.408592] Setting value; name='.json.list', value='5,6,7,8', msg='0x7ffff00141c0'

Note the individual name-value pairs being set as they are extracted from the JSON input. This is then reproduced on the output side:

{
  "thisisnull": null,
  "text": "string",
  "number": 5,
  "list": [
    "5",
    "6",
    "7",
    "8"
  ],
  "bool": true
}

Please note that “number” is numeric and “list” contains a JSON list. One limitation that is still visible here is that list elements are not typed and are always strings when reproduced using $(format-json), because list elements are not name-value pairs.

Associate type information with name-value pairs

It is not just JSON parsing that can set types for name-value pairs; rewrite rules and db-parser() can also set them. In rewrite rules, set() can now take a type hint, and that type hint gets associated with the value as its type:

#this makes $PID numeric
rewrite { set(int("$PID") value("PID")); };

Also, db-parser() sets type information depending on which parser was used to extract the specific field. For instance, @NUMBER@ extracts an integer.

Type information returned by macros, templates and template functions

Template functions will be able to return a type, depending on the function they perform. For instance, list handling functions like $(list-slice) would return a list, while numerical functions like $(+) would return numbers. Likewise, some macros are also being annotated with their types.

Template expressions as a whole also become typed: whenever we use a “simple” template expression (e.g. one with just a single ‘$’ reference, like “$PID”), the type of the template is inferred automatically and that type is propagated. If the inferred type is not correct, you can always use type hints to “cast” the template expression to some other type.
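
As a sketch of such a cast, the snippet below uses the type-hinting syntax that $(format-json) already supports; the destination name and file path are made up for the example, and the exact quoting of the cast argument may vary:

destination d_typed {
  # int64() casts the $PID template expression, so the JSON output contains
  # a number instead of a string
  file("/tmp/typed.out" template("$(format-json program=$PROGRAM pid=int64($PID))\n"));
};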

When does it become available? When can I try it?

Since the typing behaviour has the potential to change the output in certain ways (e.g. produce a numeric value where a string was used before), we are not turning this feature on automatically. As long as we are in the 3.x release train, it will stay disabled, even as parts of it are being merged. You can evaluate the feature by setting your config version (i.e. @version at the top of the config file) to 4.0, as shown in the example config above.

Then, as we release 4.0, the typing feature will be enabled by default for any configuration that uses @version: 4.0.

Most of the feature is already implemented, but not yet merged to mainline. There are open PRs on GitHub. 3.36 is expected to contain the first batch (e.g. the JSON parser pieces), but not the complete change. I expect the changes to land in mainline in one or two more release cycles, i.e. by the end of April or end of June.

Stay tuned!