Select Page
syslog-ng, the next chapter: Axoflow

syslog-ng, the next chapter: Axoflow

I haven’t recently posted here, and the reason for that is that I have started Axoflow, a new startup in the observability space that took a lot of energy to get off the ground. We’ve closed our initial round of funding, which makes this a perfect time to introduce what we are set out to do.

Most of the innovation on how logs are processed and managed happen under the umbrella of “observability” these days: logs becoming one pillar of the logs/metrics/traces triumvirate, a combination often called telemetry data.

What I found is that mainstream observability focuses very much on the operational aspects of running applications and less so on the security element. Also, I find that in a lot of cases, enterprises tend to have separate teams for security (reporting to the CISO/CIO) and observability (reporting to the development organization/CTO).

At Axoflow, we know both sides of these coins: I am very much part of the “security camp”: syslog-ng being the #1 solution to feed the enterprise SIEM (Splunk and the like). The background of my co-founders is the observability space, having created logging operator (at Banzai Cloud and Cisco) and having run global, web-scale applications (at Ustream and IBM).

Based on our combined expertise, we are building a product that helps enterprises run their combined logging/observability pipelines.

Enterprises have invested millions of their security budget into deploying or running their log management infrastructures that feed their SIEM. They are also investing millions into deploying observability products to help running their applications. Yet, these systems are often separate. Their operation often manual and error prone.

This is where Axoflow is set out to help. We are building a management plane that greatly simplifies the operation of these telemetry pipelines (or shows where they are leaking) and also allows the separation between security and observability to be torn down.

You could ask how this relates to syslog-ng and this blog: even though Axoflow is a vendor agnostic management plane for telemetry pipelines, the data plane carrying the actual data is crucial as well. That’s where syslog-ng plays a role: we consider syslog-ng to be our open source, reference data plane that we use to show-case our management functionality. syslog-ng is also a great stepping stone to plug Axoflow in: Axoflow is a simple add-on to your existing pipeline and you immediately get the benefits, without the risks involved in changing a critical, functional system in a major way.

 

syslog-ng 4 improves Python support

syslog-ng 4 improves Python support

It’s been a while since I personally acted as the release manager for a syslog-ng release, the last such release was 3.3.1 back in October 2011. v3.3 was an important milestone, as that was the version that introduced threaded mode and came with a completely revamped core architecture to scale up properly on computers that had multiple CPUs or cores. I released syslog-ng 4.0.1 a couple of weeks ago which brings with it the support for runtime typing, which is a significant conceptual improvement.

Apart from typing, which I have discussed at length already, the release sports important additions and improvements in syslog-ng’s support for Python, which I would like to zoom into a bit in this post.

In case you are not aware, syslog-ng has allowed you to write source and destination drivers, parsers and template functions in Python for a while now. See this post on writing a source in general and this one for writing an HTTP source.

There was one caveat in using Python though: while it was easy to extend an existing configuration and relatively easy to deploy these in a specific environment, syslog-ng lacked the infrastructure to merge such components into syslog-ng itself and expose this functionality as if it was implemented natively. For instance, to use the Python based HTTP source described in the blog post I mentioned above, you needed to write something like this to use the Python based http source:

source s_http {
    python(
      class("httpsource_v2.HTTPSource")
      options("port", "8081")
    );
};

As you can see, this syntax is pretty foreign, at least if you compare this to a native driver that would look like this:

source s_http {
    http(port(8081));
};

A lot simpler, right? Apart from configuration syntax, there was another shortcoming though: Python code usually relies on 3rd party libraries, usually distributed using PyPI and installed using pip. Up to 4.0.0, one needed to take care about these dependencies manually. The http source example above needs you to install the “python3-twisted” package using dnf/apt-get or pip manually and only then would you be able to use it.

These short-comings are all addressed in the 4.0.0 release, so that:

  • 3rd party libraries are automatically managed once you install syslog-ng.
  • you can use native configuration syntax,
  • we can ship Python code as a part of syslog-ng,

Let’s break these down one-by-one.

Managing 3rd party Python dependencies

From now on, syslog-ng automatically creates and populates a Python virtualenv to host such 3rd party dependencies. This virtualenv is located in ${localstatedir}/venv, which expands to /var/lib/syslog-ng/venv normally. The virtualenv is created by a script named syslog-ng-update-virtualenv, which is automatically run at package installation time.

The list of packages that syslog-ng will install into this virtualenv is described by /usr/lib/syslog-ng/python/requirements.txt.

If you want to make further libraries available (for instance because your local configuration needs it), you can simply use pip to install them:

$ /var/lib/syslog-ng/python-venv/bin/pip install <pypi package>

syslog-ng will automatically activate this virtualenv at startup, no need to explicitly activate it before launching syslog-ng.

Using this mechanism, system installed Python packages will not interfere with packages that you need because of a syslog-ng related functionality.

Native configuration syntax for Python based plugins using blocks.

There are two ways of hiding the implementation complexities of a Python based component, in your configuration file:

  • using blocks to wrap the python() low level syntax, described just below
  • using Python based config generators, described in the next section

Blocks have been around for a while, they basically allow you to take a relatively complex configuration snippet and turn it into a more abstract component that can easily be reused. For instance, to allow using this syntax:

source s_http {
    http(port(8081)); 
};

and turn it into a python() based source driver, you just need the following block:

block source http(port(8081)) {
   python(class("httpsource_v2.HTTPSource")
          options("port", "`port`") );
}

The content of the block will be substituted into the configuration, whenever the name of the block is encountered. Parameters in the form of param(value) will be substituted using backticks.

In simple cases, using blocks provides just enough flexibility to hide an implementation detail (e.g. that we used Python as the implementation language) and also hides redundant configuration code.

Blocks are very similar to macros as used in other languages. This term was unfortunately already taken in the syslog-ng context, that’s why it has been named differently.

Blocks are defined in syslog-ng include files, these include files you can store as an “scl” subdirectory of the Python module.

Native configuration syntax for Python based plugins using configuration generators.

Sometimes, blocks are insufficient to properly wrap our desired functionality. Sometimes you need conditionals, in other cases you want to use a more complex mechanism or a template language to generate part of the configuration. That you can do using configuration generators.

Configuration generators have also been around for a while, but until now they were only available using external shell scripts (using the confgen module), or restricted to be used from C, syslog-ng’s base language. The changes in 4.0 allow you to write generators in Python.

Here’s an example:

@version: 4.0
python {

from syslogng import register_config_generator
def generate_foobar(args):
    print(args)
    return "tcp(port(2000))"
#
# this registers a plugin in the "source" context named "foobar"
# which would invoke the generate_foobar() function when a foobar() source
# reference is encountered.
#
register_config_generator("source", "foobar", generate_foobar)

};
log {
    # we are actually calling the generate_foobar() function in this
    # source, passing all parameters as values in the "args" dictionary
    source { foobar(this(is) a(value)); };
    destination { file("logfile"); };
};


syslog-ng will automatically invoke your generate_foobar() function whenever it finds a “foobar” source driver and then takes the return value for that function and substitutes back to where it was found. Parameters are passed around in the args parameter.

Shipping Python code with syslog-ng

Until now, Python was more of an “extended” configuration language, but with the features described above, it can actually become a language to write native-looking and native-behaving plugins for syslog-ng, therefore it becomes important for us to ship these.

To submit a Python implemented functionality to syslog-ng, just open a PR that places the new Python code into the modules/python-modules/syslogng/modules subdirectory. This will get installed as a part of our syslog-ng-python package. If you have 3rd party dependencies, just include them in the setup.py and requirements.txt files.

If you need an example how to use the new Python based facilities, just look at the implementation of our kubernetes() source.

syslog performance: scaling up before scaling out

syslog performance: scaling up before scaling out

The other day I was reading a blog post on handling syslog at scale back on cribl.io’s blog. As you can imagine, syslog-ng has been used to solve syslog related challenges for a while now (24 years to be exact) and with that expertise I wanted to point out a few things in relation to that blog post. This might even become a series.

The blog post linked above, gives some advice how to scale syslog, in the section titled: Scaling Syslog the Right Way. Read the blog post for more details, but here’s my summary:

  • place the receiver (e.g. Cribl Stream) right next to your log sources (e.g. in the same data center)
  • scale out the receiver over many nodes so it becomes a cluster
  • make sure to deploy a load balancer in front

What this means in practice is that you will need a sizeable infrastructure to consume the logs of a syslog producing device. Since my assumption is that all data centers have such appliances, you will need to deploy this infrastructure in each of your data centres (to be close to the sources).

While I can see that load balancing clusters to process log data are important in some scenarios, I don’t think this use case should be one of them. A single node, potentially in a failover High Availability setup should suffice.

There’s a choice between scaling out vs scaling up a workload. Cribl recommends scaling it out. My take is that it should be possible to scale it up before scaling it out.

syslog-ng has a bag of tricks to make it fast even on a single node, thereby reducing hardware costs and operational complexities and the need for a load balancer.

  • it is implemented in C (compared to Cribl’s choice of TypeScript, fluentd and logstash are Ruby IIRC)
  • it avoids/minimizes copying of data while processing them, its log routing implementation uses copy-on-write semantics for cases where multiple paths potentially change the message in parallel paths
  • it avoids/minimizes memory allocations, in the simplest case it allocates 1 block of memory for 1 message
  • it uses an efficient asynchronous architecture, using epoll and one thread per CPU core
  • it uses a message representation where fields (aka: name-value pairs or properties) are stored in a packed block of memory, that is efficient to look up & serialize.
  • it uses internal queueing mechanisms that avoids lock contention and allows back-pressure to be applied selectively
  • it offers alternatives to regular expressions, as regexps are pretty slow to evaluate at volume

syslog-ng offers a domain specific language to route and manipulate messages, Cribl uses JavaScript.

I understand that this is an apples to oranges comparison. Cribl seems to have good UX. syslog-ng has good performance.

This is on my ~2018 laptop (Intel(R) Core(TM) i5-7440HQ CPU @ 2.80GHz, 4 cores), with a single destination writing all messages to disk, with syslog parsing enabled.

# single threaded sender
$ loggen -S -r 10000000 -s 300 –active-connections=1 -I 20 -P localhost 2000
average rate = 305506.81 msg/sec, count=6111711, time=20.0052, (average) msg size=304, bandwidth=90697.33 kB/sec

# sending on 10 threads
$ loggen -S -r 10000000 -s 300 –active-connections=10 -I 20 -P localhost 2000
average rate = 561537.95 msg/sec, count=11533347, time=20.5389, (average) msg size=304, bandwidth=166706.58 kB/sec

syslog-ng config:

@version: 3.38

log {
  source { tcp(port(2000) log-iw-size(10000) log-fetch-limit(1000) flags(syslog-protocol)) ; };
  destination { file("/install/foobar" log-fifo-size(10000)); };
};

 

Survey on syslog-ng objectives still open…

Survey on syslog-ng objectives still open…

In my last post, I enumerated the long term objectives I distilled from the discussions I had earlier with some syslog-ng users. Thanks for everyone who responded to that and/or filled out the survey, very insightful responses, something to work from. The survey is still open, but as always, the more the better so some more responses would be very much appreciated.

You might be operating a logging infrastructure, or you might be in the process of deploying one, it does not matter: if your job or project involves logging and solving the problems it entails I’d like to know your opinion.

Thanks

syslog-ng distribution and support bottleneck

syslog-ng distribution and support bottleneck

I find that a lot of syslog-ng deployments are lagging behind and are using ancient versions. It has become difficult for me to get these deployments to more recent versions. No product is able to improve and cover new ground in a situation like this…

Being ancient is a relative term: for instance, in the JavaScript world it is considered ancient if you are using a framework that was initially released two or more years ago. New hypes and incompatible rewrites are published at a pace which makes the JavaScript ecosystem difficult to follow.

Maintaining this change velocity in the log management space is not feasible. Deploying a log management and processing infrastructure from scratch can literally take years just one time. Swapping out technologies every now and then on a whim would mean that the project never reaches the goals it was set out to achieve.

With that I said I still think that being able to regularly push out updates to deployments is an important bottleneck to solve for any product to be sustainable. This is needed for both the feature front (e.g. addressing new use-cases) and on the support perspective (e.g. fixing bugs).

I often get questions about syslog-ng 3.5.6. This release was originally published 5th August 2014, roughly 8 years ago, and happens to be part of EPEL7. syslog-ng is included in BMW i3 vehicles, this video shows the listing of open source components on the infotainment screen, The BMW Open Source DVD contains syslog-ng 3.4.7, a whooping fresh release from December 2013. There are similar stories with syslog-ng included in products or an OS release, usually with pretty old versions.

Why does this happen?

Due to the early adoption of syslog-ng, it was included in a number of Linux distributions and BSDs/UNIXes, even became default in some of them. I considered this a great success.

For none of these distributions however is log management a central question. They each need some kind of log daemon, but that’s it. Whether that log daemon is syslogd, rsyslog or syslog-ng does not really matter. Neither matters their actual version number. So even though distributions helped initial syslog-ng adoption, they have become a bottleneck in delivering new releases to users.

Users can still upgrade, right?

Enterprise users (and products that embed Linux and syslog-ng) pick an OS version and plan with it for ~10 years. Unfortunately they deploy syslog-ng as a part of the OS and expect the OS vendor to provide support. Often, the sysadmin responsible for log management is not even allowed to upgrade. Some claim that upgrading syslog-ng would violate their support terms, causing the entire OS to become unsupported.

So even though more recent versions of syslog-ng includes functionality or fixes they need, they stick to the old version and try to work around any issues they find.

The support from the OS vendor for the logging component is questionable at best and is restricted only for the most basic use-cases, not cases where syslog-ng would play an important role in one’s infrastructure. Just as log management is not a central focus for the OS, neither is it for the support team behind the OS. They would fix security issues, should they be reported, but otherwise they will just continue to use what they have.

Solution: state of the art binaries to pick from

Building the latest version of syslog-ng for your enterprise distro on your own is not for the faint of heart. Even though 20 years ago, building your own kernel or application was an essential part of a sysadmin’s job on any UNIX, this is not true any more.

Also, it was a lot easier to build syslog-ng in 2001, today we have so many integrations that pulling all the build dependencies (and the right versions) is far from trivial.

We worked hard in the past years to resolve this issue and today syslog-ng is not only available in source format. There are a number of options today to pick from, should you want to use the latest and greatest:

Over time, the building of bespoke/customized packages has become much easier too, this blog post explains it all.

So what’s your excuse? I am really interested if the options above suffice. Do you still use an old syslog-ng version? Why? Would any of the above work for you? If not, What would YOU need to upgrade syslog-ng to recent versions? And what would you need to change your processes to plan for upgrades regularly?

If you have a response to any of these questions, please post it as comment below or drop me an email. Thanks.

syslog-ng-future.blog? Is this a fork or what?

syslog-ng-future.blog? Is this a fork or what?

I mentioned in the previous post that I would like to focus on syslog-ng and put it more into the spotlight. I also mentioned that Balabit, the company I was a founder of and the commercial sponsor behind syslog-ng, was acquired by One Identity ~4 years ago. How does this add up? Who owns the Intellectual Property (IP) for syslog-ng? Who am I or this blog affiliated with?

I felt this post was important to set things straight and make it easier to understand my motivation. If you are not much into Free Software and Open Source licenses or not interested too much in administrative nuisances of FLOSS projects, feel free to skip this post.

First of all, the IP in syslog-ng that Balabit owned originally, was transferred to the acquirer, One Identity. This includes:

  • copyright on the documentation and parts of the codebase
  • trademarks, website, marketing stuff

The good news in here is, that not all of the codebase is copyrighted by its commercial sponsor. Back in the Balabit days we enacted a change of the licensing regime in 2010 as described here: bazsi.blogspot.com/2010/07/syslog-ng-contributions-redefined.html.

With this change we’ve stopped requiring signed CLAs (Contributor’s License Agreements) whenever someone contributed to syslog-ng. This means that the copyright of any outside contributions would be retained by the contributor and not assigned to Balabit or its successors.

Over the years, many such outside contributions were merged into the syslog-ng codebase, meaning that the code today is owned by many different individuals and companies. Copyrights of files are tracked by the tests/copyright/policy file in the source tree: you can note that there are some files that are external contributions in their entirety. Some files have mixed copyrights, partly owned by One Identity, partly by the outside contributor.

Since the license syslog-ng uses is a combination of GPL + LGPL, the combined work as such is free software forever. The GPL/LGPL warrants that anyone can get the source code and be able to change it in any way he or she wishes. The sole requirement of the license is that should you distribute syslog-ng to any 3rd parties, you would need to disclose that you were using GPLed code and offer the source code along with any changes you have made.

The same rules apply to the commercial sponsor as well: as long as it relies on the work that was produced by external contributors over the years, they will need to publish any changes to syslog-ng they create. The only way out would be to rip out all code created by external contributions.

This means that syslog-ng today is a truly open source project, at least from the licensing perspective

But there’s another perspective, namely whether there is an active developer community that adds new features and publishes new releases.

Fortunately, this also holds true. The syslog-ng project lives on https://github.com/syslog-ng/syslog-ng, contributions are transparently managed in GitHub pull requests, either if created by One Identity or someone else in the community. Regular releases are produced on a 8-10 weeks cadence and are published both in source + binary formats and a docker image.

With all of the explanations above, the status of this site and myself can easily be explained: I am not affiliated with One Identity (the successor of Balabit) in any way. I am an individual who contributes time and energy to the open source syslog-ng project, just as I did the same in the last ~24 years.

This is a personal blog, related to syslog-ng and producing useful fixes and features for syslog-ng as an open source project. I am not sponsored or endorsed by One Identity. I am here to help finding out where syslog-ng should go next.

The consensus of the FLOSS community is that a project is only considered truly open source if the developer base/IP is not concentrated within a single company/organization. The reason for this understanding is simple: if there’s only one such entity then that entity has too much control/power over the project. If the company goes bankrupt or changes hands, then priorities might change in a way that causes the open source project to suffer. The long term sustainability of an open source project hinges on the breadth of its contributor base: the broader it is, the more likely it is that the open source project can outlive its creators or commercial sponsors.

The dependence of the syslog-ng project on Balabit as its commercial sponsor has been an issue since ~2009 and probably the reason why it has not become the default logging daemon in Fedora, thus RedHat Enterprise Linux. I don’t think the inclusion in Linux distributions is the prime venue of competition today as it once was. But this decision by Fedora still hurts. 🙂

In a way, the acquisition of Balabit, and my departure from One Identity later allows syslog-ng to become a truly independent-, sustainable open source project.

Stay tuned!