syslog performance: scaling up before scaling out

The other day I was reading a blog post on handling syslog at scale on Cribl’s blog. As you can imagine, syslog-ng has been used to solve syslog-related challenges for a while now (24 years, to be exact), and with that expertise I wanted to point out a few things in relation to that blog post. This might even become a series.

The blog post linked above gives some advice on how to scale syslog in the section titled “Scaling Syslog the Right Way”. Read the blog post for more details, but here’s my summary:

  • place the receiver (e.g. Cribl Stream) right next to your log sources (e.g. in the same data center)
  • scale out the receiver over many nodes so it becomes a cluster
  • make sure to deploy a load balancer in front

What this means in practice is that you will need a sizeable infrastructure just to consume the logs of syslog-producing devices. Since my assumption is that all data centers have such appliances, you will need to deploy this infrastructure in each of your data centers (to be close to the sources).

While I can see that load-balanced clusters for processing log data are important in some scenarios, I don’t think this use case should be one of them. A single node, potentially in a failover High Availability setup, should suffice.

There’s a choice between scaling out vs scaling up a workload. Cribl recommends scaling it out. My take is that it should be possible to scale it up before scaling it out.

syslog-ng has a bag of tricks to make it fast even on a single node, thereby reducing hardware costs and operational complexities and the need for a load balancer.

  • it is implemented in C (compared to Cribl’s choice of TypeScript; fluentd and logstash are Ruby, IIRC)
  • it avoids/minimizes copying data while processing it; its log routing implementation uses copy-on-write semantics for cases where multiple parallel paths may change the message
  • it avoids/minimizes memory allocations; in the simplest case it allocates 1 block of memory for 1 message
  • it uses an efficient asynchronous architecture, using epoll and one thread per CPU core
  • it uses a message representation where fields (aka name-value pairs or properties) are stored in a packed block of memory that is efficient to look up and serialize
  • it uses internal queueing mechanisms that avoid lock contention and allow back-pressure to be applied selectively
  • it offers alternatives to regular expressions, as regexps are pretty slow to evaluate at volume

syslog-ng offers a domain-specific language to route and manipulate messages, while Cribl uses JavaScript.

I understand that this is an apples to oranges comparison. Cribl seems to have good UX. syslog-ng has good performance.

To illustrate, here are some numbers from my ~2018 laptop (Intel(R) Core(TM) i5-7440HQ CPU @ 2.80GHz, 4 cores), with a single destination writing all messages to disk and syslog parsing enabled.

# single threaded sender
$ loggen -S -r 10000000 -s 300 --active-connections=1 -I 20 -P localhost 2000
average rate = 305506.81 msg/sec, count=6111711, time=20.0052, (average) msg size=304, bandwidth=90697.33 kB/sec

# sending on 10 threads
$ loggen -S -r 10000000 -s 300 --active-connections=10 -I 20 -P localhost 2000
average rate = 561537.95 msg/sec, count=11533347, time=20.5389, (average) msg size=304, bandwidth=166706.58 kB/sec
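
As a sanity check on how to read the loggen figures above, the reported bandwidth is simply the message rate multiplied by the average message size. The Python sketch below redoes that arithmetic. It assumes that loggen’s “kB” means 1024 bytes (an assumption, chosen because it makes the reported numbers line up), and the function names are made up for illustration.

```python
def bandwidth_kb_per_sec(msgs_per_sec: float, avg_msg_size: int) -> float:
    """Bandwidth in kB/sec, assuming 1 kB = 1024 bytes (assumption)."""
    return msgs_per_sec * avg_msg_size / 1024


def speedup(baseline_rate: float, rate: float) -> float:
    """How many times faster `rate` is compared to `baseline_rate`."""
    return rate / baseline_rate


# Figures copied from the two loggen runs above.
single = bandwidth_kb_per_sec(305506.81, 304)
ten = bandwidth_kb_per_sec(561537.95, 304)

print(f"1 connection:   {single:.2f} kB/sec")
print(f"10 connections: {ten:.2f} kB/sec")
print(f"speed-up:       {speedup(305506.81, 561537.95):.2f}x")
```

The two bandwidth values come out as 90697.33 and 166706.58 kB/sec, matching what loggen printed, and ten connections give roughly 1.84 times the single-connection rate on this 4-core machine.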

syslog-ng config:

@version: 3.38

log {
  source { tcp(port(2000) log-iw-size(10000) log-fetch-limit(1000) flags(syslog-protocol)); };
  destination { file("/install/foobar" log-fifo-size(10000)); };
};


Rounding up syslog-ng 4 and a practical introduction to typing

syslog-ng 4 is right around the corner, and the work on the topics I listed in this blog post is nearing completion. Instead of a pile of breaking changes, we chose to improve syslog-ng in an evolutionary manner, providing fine-grained compatibility with older versions along the way, so that syslog-ng 4 remains a drop-in replacement for any earlier release of the past 15 years.

What evolution in this context means in practice is that features/changes are merged into 3.x releases as we are ready with them, but they are all hidden behind a feature flag: they all come disabled by default and to enable them, one needs to use `@version: 4.0` at the top of one’s configuration file. This process is detailed in Peter Czanik’s blog post with a couple of real-world examples.

The first set of changes went into 3.36.1, released in March; some more followed in 3.37.1 (and a related blog post), released in June; and 3.38.1 is due any day now, with most changes already accumulated on the “master” branch (nightly snapshots).

Hopefully, this is going to be the last 3.x release before 4.x is cut, but this also depends on feedback and issues that we might encounter in this cycle.

This release focused primarily on getting 4.0 ready, and as such it concentrated pretty much on finishing up the typing feature. If you have already read the linked blog post, you might be aware that we intend to associate runtime type information with any name-value pair that we encounter, so that we can 1) allow type-aware comparisons in routing decisions and 2) reproduce the original types when sending a log message to consumers. I also expect this to be an important feature as we implement more features in our long-term objectives (observability, app awareness, user friendliness).

Problems that type aware comparisons attempt to solve

Probably the most important change in 3.38.1 is the introduction of type aware comparisons. Traditionally syslog-ng had two kinds of comparison operators, just like shell scripts with the “test” builtin command.

  1. one to compare strings (eq, ne, gt, lt)
  2. one to compare numbers (==, !=, >, <)

Here’s an example:

log {
  source { file("/var/log/apache2/access.log"); };
  parser { apache-accesslog-parser(); };
  if ("${.apache.response}" >= "500") {

This example shows how to route requests with an HTTP status of 500 and above to a separate file. If you look at the if {} statement, you can see that we compare the HTTP response code to 500 and try to capture anything that is 500 or higher. But there’s a potential trap here. ${.apache.response} is a string that contains a number. “500” in syslog-ng 3.x is also a string.

How this comparison is performed depends on the kind of operator that we use: numeric (<, >, ==, !=) or string (lt, gt, eq, ne).

If you look at the example again, you can see that we used a “numeric” operator, which means that the configuration above performs the comparison correctly: it converts both “${.apache.response}” and “500” to numbers and compares them numerically.

But then, let’s see this example:

log {
  source { file("/var/log/apache2/access.log"); };
  parser { apache-accesslog-parser(); };
  if ("${.apache.httpversion}" == "1.0") {

Again, we are trying to compare a name-value pair against a literal string, this time checking for equality. Albeit version numbers are not strictly numeric, they are in this specific case. And, as before, we are using the numeric “==” operator.

But this example will do something that is pretty unexpected:

  • we used the numeric operators in the example, so syslog-ng converts both “${.apache.httpversion}” and “1.0” to numbers
  • but the numeric operators only support INTEGERS; floating point numbers are not supported (syslog-ng actually uses the function atoi(3) for this conversion)
  • atoi() picks up the digits before the “.” in the version number and converts those to an integer; this means “1.0” becomes 1 and “1.1” becomes 1 too!
  • so the comparison above evaluates to TRUE for any value that starts with a 1 followed by a non-digit character
  • this means that all HTTP versions from 1.0 up to 1.9 end up in the file that we designated to hold 1.0-only traffic

Not really the expected behaviour. But it becomes worse.

log {
  source { file("/var/log/apache2/access.log"); };
  parser { apache-accesslog-parser(); };
  if ("${.apache.request}" == "/wp-admin/login.php") {

This time, we are trying to filter our data based on a string comparison, and we erroneously used the numeric operator. This is what happens:

  • Neither “${.apache.request}” nor “/wp-admin/login.php” is numeric, they don’t even have digits in front of them
  • Both values are converted to 0.
  • Zero equals zero, so the filter expression above is always TRUE.

There are other similar cases, the ugliest being the comparison of a name-value pair to an empty string with numeric operators. The results are completely unexpected.
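
The pitfalls above can be reproduced outside syslog-ng. The following Python sketch models the leading-integer behaviour of atoi(3) with a regular expression and uses it to mimic the pre-4.0 numeric “==” operator. It is a model for illustration only, not syslog-ng’s actual code, and the helper names are made up.

```python
import re

# Optional whitespace, optional sign, then leading ASCII digits.
_LEADING_INT = re.compile(r"[ \t\n\v\f\r]*([+-]?[0-9]+)")


def atoi(text: str) -> int:
    """Model of C's atoi(3): parse the leading integer, or return 0 if none."""
    match = _LEADING_INT.match(text)
    return int(match.group(1)) if match else 0


def legacy_numeric_eq(left: str, right: str) -> bool:
    """Mimic the old numeric "==": convert both sides with atoi(), then compare."""
    return atoi(left) == atoi(right)


print(legacy_numeric_eq("1.1", "1.0"))  # both sides become 1
print(legacy_numeric_eq("/index.html", "/wp-admin/login.php"))  # both become 0
print(legacy_numeric_eq("", "0"))  # the empty string becomes 0 as well
```

All three comparisons print True, even though none of the pairs are equal as strings.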

Type aware comparisons come to the rescue

I have seen numerous cases where someone got the operator wrong when trying to compare/match something in syslog-ng. I feel that the issue has never been user error; rather, we did a poor job of providing a user-friendly syntax and thus pushed too much responsibility onto those attempting to make use of these features.

But solving these kinds of design mistakes is never easy. Some of our users have figured this out already, and we don’t want to break their configurations, right? But we want to make this easier and more intuitive for new users and new use-cases.

The solution we implemented was to make the numeric operators (==, !=, <, >) do the right thing. Based on the types of their arguments, these operators can in most cases infer what the right thing to do is. We took some inspiration from JavaScript (which operates in a similar string-heavy environment) and implemented more intuitive rules for our (previously numeric-only) comparisons.

Let’s see our previous examples:

@version: 4.0
log {  
  source { file("/var/log/apache2/access.log"); };
  parser { apache-accesslog-parser(); };
  if ("${.apache.response}" >= 500) {

If you compare this to my previous example, you will see that I have removed the quotes from around “500”. In syslog-ng 3.x, the quotes were mandatory. In 4.x, they are not. Without quotes, 500 becomes a numeric literal, and comparing a string to a number compares the two numerically (just like JavaScript). We could even improve the Apache parser to make ${.apache.response} a number as it parses its input, but to do the right thing, it is enough that one side of the comparison is numeric.

Next example:

@version: 4.0
log {
  source { file("/var/log/apache2/access.log"); };
  parser { apache-accesslog-parser(); };
  if ("${.apache.httpversion}" == "1.0") {

I haven’t changed anything in this example: both “${.apache.httpversion}” and “1.0” are strings and they are compared as strings. So this time, only HTTP/1.0 would be routed to our logfile. 1.1 or even 1.9 would be filtered out, as expected. We could use floating point based comparisons if we wanted to, either by removing the quotes (just like in the previous example) or by using explicit type-casting:

if ("${.apache.httpversion}" == 1.0)

if (double("${.apache.httpversion}") == "1.0")

Type casting can be applied anywhere we used template strings before, to apply a type to the result of the template expansion.

And here’s the third example:

@version: 4.0
log {
  source { file("/var/log/apache2/access.log"); };
  parser { apache-accesslog-parser(); };
  if ("${.apache.request}" == "/wp-admin/login.php") { 

Again, no changes are necessary. Both sides are strings, so we compare them as strings. No need to use the “eq” operator. Just one set of operators and the occasional explicit type-cast will cover all use-cases. For compatibility reasons, the old “string” operators (eq, ne, lt, gt) remain available, but I hope we can forget those eventually.
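
The rules described in this section boil down to: compare as strings when both sides are strings, and compare numerically as soon as one side is a number. Here is a minimal Python sketch of that rule. It is a simplification rather than syslog-ng’s implementation: a Python str stands in for an untyped value, an int or float stands in for a numeric literal or a cast value, and treating a non-numeric string as NaN is an assumption borrowed from JavaScript, since that case is not spelled out here.

```python
import operator


def to_number(value) -> float:
    """Convert a value to a float; non-numeric strings become NaN (assumption)."""
    try:
        return float(value)
    except ValueError:
        return float("nan")


def type_aware_compare(op, left, right) -> bool:
    """Apply `op` as a string comparison only when both sides are strings."""
    if isinstance(left, str) and isinstance(right, str):
        return op(left, right)
    # At least one side is numeric: compare both sides as numbers.
    return op(to_number(left), to_number(right))


print(type_aware_compare(operator.ge, "503", 500))    # numeric comparison
print(type_aware_compare(operator.eq, "1.1", "1.0"))  # string comparison
print(type_aware_compare(operator.eq, "1.0", 1.0))    # numeric comparison
print(type_aware_compare(operator.eq, "/index.html", "/wp-admin/login.php"))
```

This prints True, False, True and False: the three problem cases from the earlier examples now behave the way a reader of the configuration would expect.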

Other typing related changes

This section briefly lists the various components that we needed to adapt to typing. These changes happened since 3.36.1 was released but were not explicitly announced in those versions. Let me know if you are interested in any of these topics in more detail; there are probably a couple of blog posts’ worth of content here:

  • type aware comparisons in filter expressions: as detailed above, the previously numeric operators become type aware and the exact comparison performed will be based on types associated with the values that we compare.
  • json-parser() and $(format-json): JSON support is massively improved with the introduction of types. For one: type information is retained across input parsing->transformation->output formatting. JSON lists (arrays) are now supported and are converted to syslog-ng lists so they can be manipulated using the $(list-*) template functions. There are other important improvements in how we support JSON.
  • set(), groupset(): in any case where we allow the use of templates, support for type-casting was added and the type information is properly promoted.
  • db-parser() type support: db-parser() gets support for type casts; <value> assignments within db-parser() rules can associate types with values using the type-casting syntax, e.g. <value name="foobar">int32($PID)</value>. The “int32” is a type-cast that associates $foobar with an integer type. db-parser()’s internal parsers (e.g. @NUMBER@) will also associate type information with a name-value pair automatically.
  • add-contextual-data() type support: any new name-value pair that is populated using add-contextual-data() will propagate type information, similarly to db-parser().
  • map-value-pairs() type support: propagate type information
  • SQL type support: the sql() driver gained support for types, so that columns with specific types will be stored as those types.
  • template type support: templates can now be cast explicitly to a specific type, but they also propagate type information from macros/template functions and values in the template string
  • value-pairs type support: value-pairs form the backbone of specifying a set of name-value pairs and associated transformations to generate JSON or a key-value pair format. It also gained support for types, the existing type-hinting feature that was already part of value-pairs was adapted and expanded to other parts of syslog-ng.
  • on-disk serialized formats (e.g. disk buffer/logstore): we remain compatible with messages serialized with an earlier version of syslog-ng, and the format we choose remains compatible for “downgrades” as well. E.g. even if a new version of syslog-ng serialized a message, the old syslog-ng and associated tools will be able to read it (sans type information of course)
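
To make the idea of carrying type information next to each value more concrete, here is a small Python model. Values are kept as strings, as in a log message, together with a type tag, and a JSON formatter uses the tag to emit a number instead of a quoted string. The type names int32 and double are the ones that appear in this post; the data layout and function names are invented for illustration and say nothing about syslog-ng’s internals.

```python
import json

# Type tag -> converter from the stored string to a JSON-friendly value.
CONVERTERS = {
    "string": str,
    "int32": int,
    "double": float,
}


def format_json(pairs: dict) -> str:
    """Render {name: (type_tag, string_value)} as JSON, honouring the tags."""
    typed = {name: CONVERTERS[tag](value) for name, (tag, value) in pairs.items()}
    return json.dumps(typed, sort_keys=True)


message = {
    "PID": ("int32", "1234"),
    "httpversion": ("double", "1.0"),
    "request": ("string", "/index.html"),
}
print(format_json(message))
```

The output is {"PID": 1234, "httpversion": 1.0, "request": "/index.html"}; without the tags, every value would have to be emitted as a quoted string.
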
syslog-ng 3.37 released

syslog-ng 3.37 has just been released, with packages becoming available on various platforms this week. You can get the detailed release notes on the GitHub releases page; however, I felt this would be a good opportunity to revisit my draft on the syslog-ng long-term objectives and how this release builds in that direction.

The Edge: deployment and CI/CD

Being better at the edge means that we need to improve support for use-cases where syslog-ng is directly deployed on the node/server or is deployed close to such nodes or servers. One way to deploy syslog-ng is to use a .deb or .rpm package, but more and more syslog-ng is used in a container. Our production docker image is built based on Debian. Creating this image has been a partially manual process with all the issues that this entails.

With the merge of PR #4014 and #4003, Attila Szakács automated the entire workflow in a beautiful set of GitHub Action scripts, so that:

  • Official source and binary packages (for CentOS, Debian, Fedora and Ubuntu) are built automatically, once a syslog-ng release is tagged
  • The production docker image is built and pushed automatically, once the required binary packages are successfully built.

While we have pretty good, automated unit and functional tests, we did not test the installation packages themselves. Until now. András Mitzky implemented smoke tests for the packages, doing an install & upgrade and a start-stop test.

The Edge: Kubernetes

Increasingly, the edge runs on an orchestrated, container-based infrastructure, such as Kubernetes. Using syslog-ng in these systems was possible but required manual integration. With the merge of PR #4015, this is becoming more out of the box; expect another blog post on this in the coming days.

Application awareness

syslog is used as an infrastructure for logging serving a wide variety of applications. For these applications, logging is not a primary concern, unfortunately. The consequence is that they often produce invalid or incorrect data. To handle these applications well, we need to cater for these issues.

For instance, certain Aruba products use a timestamp like this:

2022-03-10 08:04:08,449

Looking at this, the problem might not even be apparent: it uses a comma to separate seconds from the fractions part.

You might argue that this is not an important problem at all, who needs fractions anyway?

There are two issues with this:

  1. Fractions might be important to some (e.g. for ordering with thousands of logs per second).
  2. It breaks the parsing of the message itself (as the timestamp is embedded in a larger message), causing message-related metadata to be extracted incorrectly (e.g. which device to attribute the message to). This means that your dashboard in a SIEM may miss vital information.
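
As an illustration of how small such a fix can be once the problem is known, the Python sketch below accepts either a dot or a comma before the fraction. It only models the idea; it is not how syslog-ng parses timestamps, and the function name is made up.

```python
from datetime import datetime


def parse_timestamp(text: str) -> datetime:
    """Parse 'YYYY-MM-DD HH:MM:SS' followed by '.' or ',' and a fraction."""
    # Normalize the comma used by some devices to the dot strptime() expects.
    normalized = text.strip().replace(",", ".", 1)
    return datetime.strptime(normalized, "%Y-%m-%d %H:%M:%S.%f")


stamp = parse_timestamp("2022-03-10 08:04:08,449")
print(stamp.isoformat(timespec="milliseconds"))
```

This prints 2022-03-10T08:04:08.449, so the fraction survives instead of being dropped or derailing the rest of the parse.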

And this is not the only such case. See this pull request for a similar example.

This is exactly why application awareness is important: fixing these cases means that your log data becomes more usable as a whole.

Usually it is not the programming of the solution that is difficult here; rather, the difficulty lies in learning that the problem exists in the first place. If you have a similar parsing problem, please let us know by opening a GitHub issue. The past few such problems were submitted to us by the Splunk Connect for Syslog team, thanks for their efforts. Btw, sc4s is great if you want to feed syslog to Splunk and it uses syslog-ng internally.

On a similar note, we have improved the cisco-parser() that extracts fields from Cisco gear and added a parser for MariaDB audit logs. Both of these parsers are part of our app-parser() framework.


There are a few other features I find interesting. Just a short summary:

  • Type support is nearing completion. We added support for types in template expressions, groupset() & map-value-pairs().
  • We improved syslog-ng’s own trace messages: we added the unique message ID (e.g. $RCPTID) as a tag in all message related trace messages, so that you can correlate trace messages to a specific message. We also included type information as a part of the type support effort.
  • We improved handling of list/array like data in this pull request.
  • We extended our set of TLS options by adding support for sigalgs & client-sigalgs.


Survey on syslog-ng objectives still open…

In my last post, I enumerated the long-term objectives I distilled from the discussions I had earlier with some syslog-ng users. Thanks to everyone who responded to that and/or filled out the survey: the responses were very insightful and give us something to work from. The survey is still open and, as always, the more the better, so some more responses would be very much appreciated.

You might be operating a logging infrastructure, or you might be in the process of deploying one; it does not matter. If your job or project involves logging and solving the problems it entails, I’d like to know your opinion.


syslog-ng on the long term: a draft on strategic directions

I made a promise some posts ago that I would use this blog both for collecting feedback and to provide information about potential next steps ahead of syslog-ng. In the same post, I also promised that you, the syslog-ng community, would have a chance to steer these directions. Please read on to find out how to do that.

In the past few weeks I performed a round of discussions/interviews with syslog-ng users. I also spent time looking at other products and analyst reports on the market. Based on all this information I’ve come up with a list of potential strategic directions for syslog-ng to tackle. Focusing on these and prioritizing features that fall into one of these directions ensures that syslog-ng indeed moves ahead.

When I performed similar goal-setting exercises in my previous CTO role at Balabit, our team followed a similar process:

  1. brainstorming on potential directions,
  2. drafting up a cleaned up conclusion document,
  3. validating that the document is a good summary of the discussion and
  4. validating via customers that they are indeed a good summary of what the customers need.

syslog-ng is an Open Source project, so I wanted to involve the community somehow. Organizing a brainstorming session sounds difficult on-line (do you know good solutions for this?). So I wanted to create an opportunity to talk with the broad community about my thoughts somehow, in a way that leads to a useful conclusion. This is the primary intent behind this post.

Once you have read the directions below, please think about whether you agree with my choice of directions! Are these indeed the most important things? Have I missed something? Do you have something in mind that should be integrated somehow? Which of the directions do you consider the most important?

Please give your feedback via this form, write a comment on the blog or drop me an email. Thanks.

1. The Edge

syslog-ng has traditionally been used as a tool for log aggregation, e.g. working on the server side. That’s why its CPU and memory usage has always been in focus. Being able to consume a million (sometimes millions!) of messages a second is important for server use-cases; however, I think that in exchange for this focus, syslog-ng has neglected the other side of the spectrum: the Edge.

The Edge is where log messages are produced by infrastructure and applications and then sent away to a centralized logging system.

syslog-ng tackles the original “syslogd-like” deployment scenarios on the Edge, but lacks features/documentation that make it easy to deploy it in a more modern setting, e.g. as a part of a Kubernetes cluster or as a part of a cloud-native application.

Apart from the deployment questions, I consider The Edge to be also important for improving data quality and thus improving the usefulness of collected log data. I see that in a lot of cases today, log data is collected without associated meta-information. And without that meta information it becomes very difficult to understand the originating context of said log data, limiting the ability to extract insights and understanding from logs.

These are the kind of features that fall into this bucket, in no particular order:

  • Transport that is transparently carrying metadata as well as log data, plus multi-line messages (this is probably achieved by EWMM already)
  • Kubernetes (container logs, pod related meta information, official image)
  • Document GCP/AWS/Azure deployments, log data enrichment
  • non-Linux support (Windows and other UNIXes)
  • Fetch logs from Software as a Service products
  • etc

2. Cloud Native

The cloud is not just a means to deploy our existing applications to a rented infrastructure. It is a set of engineering practices that make developing applications faster and more reliable. Applications are deployed as a set of microservices, each running in its own container, potentially distributed across a cluster of compute nodes. Components of the applications are managed via some kind of container orchestration system, such as Kubernetes.

Being friendly to these new environments is important, as new applications are increasingly using this paradigm.

Features in this category:

  • Container images for production
    • as a logging side-car to collect app logs and transfer them to the centralized logging function or
    • as an application specific, local logging repository (e.g. app specific server)
  • HTTP ingestion API
    • these apps tend to communicate using HTTP, so it is more native to use that even for log ingestion
    • maybe provide compatibility with other aggregation solutions (Elastic, Splunk, etc)
  • Object Storage support
  • Stateless & persistent queueing (kafka?)
  • etc

3. Observability

The term observability has its roots in control theory; however, it is increasingly applied to the operations of IT systems. Being observable in this context means that the IT system provides an in-depth view into its inner behaviours, making it simpler to troubleshoot problems or increase performance. Observability today often implies three distinct types of data: metrics, traces and logs.

I first encountered this term in relation to Prometheus, an Open Source package that collects and organizes application-specific metrics in a manner that easily adapts to cloud native, elastic workloads. Traditional monitoring tools (such as Zabbix or Nagios) require a top-down, manual configuration, while Prometheus reversed this concept and pushed this responsibility to application authors. Applications should expose their important metrics so that application monitoring works “out-of-the-box”. This idea quickly gained momentum, as manually configuring monitoring tools to adapt to automatically scaled application components is pretty much impossible.

Albeit observability originally comes from the application monitoring space, its basic ideas can be extended to cover traces and logs as well.

Features in this category:

  • Being observable: provide a prometheus exporter so that we can become observable out-of-the-box
  • Interoperate with Observability platforms
    • Loki destination
    • Support for OpenTelemetry (source and destination)
    • convert logs from metrics/traces and vice-versa

4. Application awareness

syslog has been a great invention: it has served us in the last 40-45 years and its importance continues into the future. Operating systems, network devices, IoT, applications, containers, container orchestration systems can all push their log data to syslog. For some of those, using syslog is the only option.

In a way syslog is the common denominator of all log producing IT systems out there and as such it has become the shared infrastructure to carry logs in a lot of environments.

In my opinion, the success of syslog stems from the simplicity of using it: just send a datagram to port 514 and you are done. However this simplicity is also its biggest limitation: it is under-specified. There have been attempts at standardization (RFC3164 and RFC5424) but these serve more as “conventions” than standards.
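
To show how low the barrier really is, the Python sketch below builds a BSD-style (RFC3164) message and sends it as a single UDP datagram. The priority value is facility * 8 + severity; the host name, program name and target address are placeholders chosen for the example.

```python
import socket
from datetime import datetime

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def format_bsd_syslog(facility: int, severity: int, host: str,
                      program: str, text: str, when: datetime) -> str:
    """Build '<PRI>Mmm dd hh:mm:ss host program: text'."""
    pri = facility * 8 + severity
    # The day of month is space-padded to two characters.
    stamp = f"{MONTHS[when.month - 1]} {when.day:2d} {when:%H:%M:%S}"
    return f"<{pri}>{stamp} {host} {program}: {text}"


def send_syslog(message: str, server: str = "127.0.0.1", port: int = 514) -> None:
    """Fire-and-forget: one UDP datagram, no acknowledgement."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode("utf-8"), (server, port))


if __name__ == "__main__":
    # facility 1 (user), severity 6 (informational) -> priority 14
    line = format_bsd_syslog(1, 6, "myhost", "myapp", "hello",
                             datetime(2022, 3, 10, 8, 4, 8))
    print(line)
```

Calling send_syslog(line) then ships the message to whatever listens on port 514. Note how much is left open: the timestamp carries no year, no timezone and no fractions, which is exactly where the incompatibilities discussed here come from.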

The consequence is that incompatible message formats limit the usefulness of log data, once collected in a central repository. I regularly see issues such as:

  • unparseable and partial timestamps
  • missing or incorrect timezone information
  • missing information about the application’s name (e.g. $PROGRAM) or hostname
  • incorrectly framed multi-line messages
  • key=value data that is in a format downstream systems are unable to parse

Sometimes it’s not the individual log entry that is the problem, rather the overly verbose logging format that becomes difficult to work with once you start using it for dashboards/queries:

  • The Linux audit system produces very verbose, multi-line logs about a single OS operation
  • Mail systems emit multiple log entries for a single email transaction, sometimes a separate log entry for each attachment.
  • etc

syslog-ng has always been good at using various heuristics to properly extract information even from incorrectly formatted syslog messages; however, there are extreme cases where applications omit crucial information or use a syntax so far away from the spec that even syslog-ng is unable to parse the data correctly.

Application awareness in this context means the ability to fix up syslog parsing using knowledge of the application that produced the message. It is difficult to craft heuristics that work with all incorrect formats; however, once we start by identifying the application, we can correctly determine what the log message was intended to look like. Fixing these issues before the message hits a consumer (e.g. SIEM) helps a lot in actually using the data we store.

Being application aware also implies that log routing decisions can become policy aware. “Forward me all the security logs” is a common request from any security department. However, actually doing this is not simple: what should count as “security”? Being application aware means that it becomes possible to classify based on applications instead of individual log messages.

Features in this category:

  • classifying incoming logs per application (e.g. app-parser() and its associated application adapters)
  • fix incoming logs and make them formatted in a way that becomes easier to handle by downstream consumers (timestamps, multi-line messages, etc.)
  • translate incoming logs into a format that a downstream system best understands

5. User friendliness

syslog-ng is a domain specific language for log management. Its performance is a crucial characteristic, but the complexity of operations performed by syslog-ng, still within the log management layer, has grown tremendously. Making syslog-ng easier to understand, and errors and problems easier to diagnose, is important in order to deal with this complexity. Having first-class documentation is also important for it to succeed in any of the directions described above.

So albeit not functionality by itself, I consider User friendliness a top-priority for syslog-ng.

Features in this category:

  • syntax improvements can go a long way toward getting a feature adopted. syslog-ng has always been able to do conditional routing of log messages; however, if()/elif()/else went a long way in getting it adopted. There are other potential improvements in the syntax that could make reading/writing syslog-ng configurations easier.
  • configuration diagnostics: better location reporting in error messages, warnings, etc.
  • interactive debuggability: as syslog-ng is applied to more complex problems, the related configuration becomes more complex too. Today, you have to launch syslog-ng in the foreground, inject a message and try to follow its operations using the builtin trace messages. Interactive debugging would go a long way in making it easier to write and test such configurations.

Those are roughly the directions I have in mind for the future of syslog-ng. If you disagree or have some comments, please provide feedback via the form at: