syslog-ng 4 improves Python support

It’s been a while since I personally acted as the release manager for a syslog-ng release: the last one was 3.3.1, back in October 2011. v3.3 was an important milestone, as it introduced threaded mode and came with a completely revamped core architecture to scale up properly on computers with multiple CPUs or cores. A couple of weeks ago I released syslog-ng 4.0.1, which brings support for runtime typing, a significant conceptual improvement.

Apart from typing, which I have discussed at length already, the release sports important additions and improvements in syslog-ng’s support for Python, which I would like to zoom into a bit in this post.

In case you are not aware, syslog-ng has allowed you to write source and destination drivers, parsers and template functions in Python for a while now. See this post on writing a source in general and this one for writing an HTTP source.

There was one caveat in using Python though: while it was easy to extend an existing configuration and relatively easy to deploy these in a specific environment, syslog-ng lacked the infrastructure to merge such components into syslog-ng itself and expose this functionality as if it was implemented natively. For instance, to use the Python based HTTP source described in the blog post I mentioned above, you needed to write something like this:

source s_http {
    python(
      class("httpsource_v2.HTTPSource")
      options("port", "8081")
    );
};

As you can see, this syntax is pretty foreign, at least if you compare this to a native driver that would look like this:

source s_http {
    http(port(8081));
};

A lot simpler, right? Apart from configuration syntax, there was another shortcoming: Python code usually relies on 3rd party libraries, typically distributed via PyPI and installed using pip. Before 4.0.0, you had to take care of these dependencies manually. The http source example above requires the “python3-twisted” package, which you had to install yourself using dnf/apt-get or pip before you could use the source.

These shortcomings are all addressed in the 4.0.0 release, so that:

  • 3rd party libraries are automatically managed once you install syslog-ng,
  • you can use native configuration syntax,
  • we can ship Python code as a part of syslog-ng.

Let’s break these down one-by-one.

Managing 3rd party Python dependencies

From now on, syslog-ng automatically creates and populates a Python virtualenv to host such 3rd party dependencies. This virtualenv is located in ${localstatedir}/python-venv, which normally expands to /var/lib/syslog-ng/python-venv. The virtualenv is created by a script named syslog-ng-update-virtualenv, which is automatically run at package installation time.

The list of packages that syslog-ng will install into this virtualenv is described by /usr/lib/syslog-ng/python/requirements.txt.

If you want to make further libraries available (for instance because your local configuration needs it), you can simply use pip to install them:

$ /var/lib/syslog-ng/python-venv/bin/pip install <pypi package>

syslog-ng will automatically activate this virtualenv at startup, no need to explicitly activate it before launching syslog-ng.

Using this mechanism, system installed Python packages will not interfere with packages that you need because of a syslog-ng related functionality.
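
As a quick sanity check after installing extra libraries, a small Python helper can report which modules are still not importable. This is only a minimal sketch using the standard library: the function name missing_packages() and the module names passed to it are illustrative assumptions, not part of syslog-ng. It tells you something useful only when run with the interpreter that belongs to the virtualenv (presumably the python binary that sits next to the pip shown above).

```python
import importlib.util


def missing_packages(module_names):
    """Return the subset of top-level module names that cannot be imported."""
    missing = []
    for name in module_names:
        # find_spec() returns None when a top-level module is not installed
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # "twisted" is the dependency of the HTTP source mentioned earlier;
    # replace the list with whatever your own Python code imports.
    print(missing_packages(["json", "twisted"]))
```

An empty list means every module in the list can be found by that interpreter.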

Native configuration syntax for Python based plugins using blocks

There are two ways of hiding the implementation complexities of a Python based component in your configuration file:

  • using blocks to wrap the python() low level syntax, described just below
  • using Python based config generators, described in the next section

Blocks have been around for a while. They basically allow you to take a relatively complex configuration snippet and turn it into a more abstract component that can easily be reused. For instance, to allow using this syntax:

source s_http {
    http(port(8081)); 
};

and turn it into a python() based source driver, you just need the following block:

block source http(port(8081)) {
   python(class("httpsource_v2.HTTPSource")
          options("port", "`port`") );
};

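The content of the block is substituted into the configuration whenever the name of the block is encountered. Parameters passed in the form of param(value) are substituted wherever the parameter name appears between backticks; the value given in the block definition (8081 above) serves as the default.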

In simple cases, using blocks provides just enough flexibility to hide an implementation detail (e.g. that we used Python as the implementation language) and also hides redundant configuration code.
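
To make the substitution rule concrete, here is a toy model of it in Python. This is a sketch of the idea only, not how syslog-ng implements blocks: the function expand_block() and its arguments are hypothetical names introduced for illustration.

```python
import re


def expand_block(body, defaults, args=None):
    """Replace `name` references in a block body with parameter values."""
    values = dict(defaults)
    # values passed at the point of use override the defaults
    values.update(args or {})

    def substitute(match):
        name = match.group(1)
        # leave unknown references untouched rather than guessing a value
        return values.get(name, match.group(0))

    return re.sub(r"`([A-Za-z0-9_-]+)`", substitute, body)


BODY = 'python(class("httpsource_v2.HTTPSource") options("port", "`port`"));'

# http() with no arguments falls back to the default from the definition
print(expand_block(BODY, {"port": "8081"}))
# http(port(9000)) overrides it
print(expand_block(BODY, {"port": "8081"}, {"port": "9000"}))
```

The point of the model is that a block is plain text substitution: nothing in the block body is evaluated, so it cannot express conditionals.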

Blocks are very similar to macros as used in other languages. This term was unfortunately already taken in the syslog-ng context, that’s why it has been named differently.

Blocks are defined in syslog-ng include files, which you can store in an “scl” subdirectory of the Python module.

Native configuration syntax for Python based plugins using configuration generators

Sometimes, blocks are insufficient to properly wrap our desired functionality. Sometimes you need conditionals, in other cases you want to use a more complex mechanism or a template language to generate part of the configuration. That you can do using configuration generators.

Configuration generators have also been around for a while, but until now they were only available using external shell scripts (using the confgen module), or restricted to be used from C, syslog-ng’s base language. The changes in 4.0 allow you to write generators in Python.

Here’s an example:

@version: 4.0
python {

from syslogng import register_config_generator
def generate_foobar(args):
    print(args)
    return "tcp(port(2000))"
#
# this registers a plugin in the "source" context named "foobar"
# which would invoke the generate_foobar() function when a foobar() source
# reference is encountered.
#
register_config_generator("source", "foobar", generate_foobar)

};
log {
    # we are actually calling the generate_foobar() function in this
    # source, passing all parameters as values in the "args" dictionary
    source { foobar(this(is) a(value)); };
    destination { file("logfile"); };
};


syslog-ng automatically invokes your generate_foobar() function whenever it finds a “foobar” source driver, then substitutes the return value of that function in place of the foobar() reference. The parameters are passed in the args argument.
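
Since the motivation for generators is that blocks cannot express conditionals, here is a sketch of a generator body that does use one. It is written as a plain function so it can be tried outside syslog-ng. The name generate_listener(), the "listener" plugin name, the "transport" and "port" parameters, the default port of 514 and the assumption that args maps parameter names to string values are all mine, chosen for illustration.

```python
def generate_listener(args):
    """Return a syslog-ng source snippet chosen from the given arguments."""
    # Assumption: args maps parameter names to string values.
    port = int(args.get("port", "514"))  # 514 is an illustrative default
    transport = args.get("transport", "tcp")
    if transport not in ("tcp", "udp"):
        raise ValueError("unsupported transport: %s" % transport)
    return "%s(port(%d))" % (transport, port)


# Inside a syslog-ng python { ... } block this would be registered the same
# way as generate_foobar() above:
#   register_config_generator("source", "listener", generate_listener)

print(generate_listener({"port": "2000"}))
print(generate_listener({"transport": "udp", "port": "2000"}))
```

Raising an exception on bad input is one possible design choice; how syslog-ng reports such a failure at configuration load time is not covered here.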

Shipping Python code with syslog-ng

Until now, Python was more of an “extended” configuration language, but with the features described above, it can actually become a language to write native-looking and native-behaving plugins for syslog-ng, therefore it becomes important for us to ship these.

To submit a Python implemented functionality to syslog-ng, just open a PR that places the new Python code into the modules/python-modules/syslogng/modules subdirectory. This will get installed as a part of our syslog-ng-python package. If you have 3rd party dependencies, just include them in the setup.py and requirements.txt files.

If you need an example of how to use the new Python based facilities, just look at the implementation of our kubernetes() source.

syslog performance: scaling up before scaling out

The other day I was reading a blog post on handling syslog at scale back on cribl.io’s blog. As you can imagine, syslog-ng has been used to solve syslog related challenges for a while now (24 years to be exact) and with that expertise I wanted to point out a few things in relation to that blog post. This might even become a series.

The blog post linked above gives some advice on how to scale syslog in the section titled “Scaling Syslog the Right Way”. Read the blog post for more details, but here’s my summary:

  • place the receiver (e.g. Cribl Stream) right next to your log sources (e.g. in the same data center)
  • scale out the receiver over many nodes so it becomes a cluster
  • make sure to deploy a load balancer in front

What this means in practice is that you will need a sizeable infrastructure to consume the logs of a syslog producing device. Since I assume that all data centers have such appliances, you will need to deploy this infrastructure in each of your data centers (to be close to the sources).

While I can see that load balancing clusters to process log data are important in some scenarios, I don’t think this use case should be one of them. A single node, potentially in a failover High Availability setup should suffice.

There’s a choice between scaling out vs scaling up a workload. Cribl recommends scaling it out. My take is that it should be possible to scale it up before scaling it out.

syslog-ng has a bag of tricks to make it fast even on a single node, thereby reducing hardware costs and operational complexities and the need for a load balancer.

  • it is implemented in C (compared to Cribl’s choice of TypeScript; fluentd and logstash are Ruby IIRC)
  • it avoids/minimizes copying of data while processing it; its log routing implementation uses copy-on-write semantics for cases where multiple parallel paths potentially change the message
  • it avoids/minimizes memory allocations; in the simplest case it allocates 1 block of memory for 1 message
  • it uses an efficient asynchronous architecture, using epoll and one thread per CPU core
  • it uses a message representation where fields (aka name-value pairs or properties) are stored in a packed block of memory that is efficient to look up and serialize
  • it uses internal queueing mechanisms that avoid lock contention and allow back-pressure to be applied selectively
  • it offers alternatives to regular expressions, as regexps are pretty slow to evaluate at volume
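
The copy-on-write point in the list above is easier to picture with a toy model. The sketch below is in Python purely for illustration: syslog-ng does this in C, and the Message class and its methods are hypothetical names, not its actual data structures.

```python
class Message:
    """Toy log message whose fields are shared between paths until written."""

    def __init__(self, fields):
        self._fields = dict(fields)
        self._shared = False

    def fork(self):
        """Hand the message to another log path without copying the fields."""
        other = Message.__new__(Message)
        other._fields = self._fields
        other._shared = True
        self._shared = True
        return other

    def get(self, name):
        return self._fields.get(name)

    def set(self, name, value):
        if self._shared:
            # first write on a shared message: take a private copy.
            # (The other side stays marked as shared, so it may later make
            # one copy that was not strictly needed; a real implementation
            # would track ownership more precisely.)
            self._fields = dict(self._fields)
            self._shared = False
        self._fields[name] = value


original = Message({"HOST": "web1", "MESSAGE": "hello"})
branch = original.fork()       # no data copied yet
branch.set("HOST", "masked")   # the copy happens only here
print(original.get("HOST"), branch.get("HOST"))
```

Paths that only read the message never pay for a copy, which is the common case in log routing.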

syslog-ng offers a domain specific language to route and manipulate messages, Cribl uses JavaScript.

I understand that this is an apples to oranges comparison. Cribl seems to have good UX. syslog-ng has good performance.

The numbers below were measured on my ~2018 laptop (Intel(R) Core(TM) i5-7440HQ CPU @ 2.80GHz, 4 cores), with a single destination writing all messages to disk and syslog parsing enabled.

# single threaded sender
$ loggen -S -r 10000000 -s 300 --active-connections=1 -I 20 -P localhost 2000
average rate = 305506.81 msg/sec, count=6111711, time=20.0052, (average) msg size=304, bandwidth=90697.33 kB/sec

# sending on 10 threads
$ loggen -S -r 10000000 -s 300 --active-connections=10 -I 20 -P localhost 2000
average rate = 561537.95 msg/sec, count=11533347, time=20.5389, (average) msg size=304, bandwidth=166706.58 kB/sec

syslog-ng config:

@version: 3.38

log {
  source { tcp(port(2000) log-iw-size(10000) log-fetch-limit(1000) flags(syslog-protocol)) ; };
  destination { file("/install/foobar" log-fifo-size(10000)); };
};
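
The loggen figures above can be cross-checked with a few lines of arithmetic. The sketch below assumes that loggen's kB means 1024 bytes, which is my assumption, although it is the reading under which the reported numbers agree; the helper name bandwidth_kb_per_sec() is made up for this example.

```python
def bandwidth_kb_per_sec(msg_per_sec, msg_size_bytes):
    """Bandwidth in kB/sec, taking 1 kB as 1024 bytes (an assumption)."""
    return msg_per_sec * msg_size_bytes / 1024


single = 305506.81    # msg/sec reported with 1 connection
parallel = 561537.95  # msg/sec reported with 10 connections
size = 304            # average message size in bytes, as reported

# should reproduce the bandwidth values printed by loggen
print(round(bandwidth_kb_per_sec(single, size), 2))
print(round(bandwidth_kb_per_sec(parallel, size), 2))
# throughput gain from sending on 10 connections instead of 1
print(round(parallel / single, 2))
```

In other words, going from 1 to 10 sender connections gave roughly 1.8 times the throughput on this 4 core machine, with sender and receiver sharing the same CPU.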

 

Survey on syslog-ng objectives still open…

In my last post, I enumerated the long term objectives I distilled from the discussions I had earlier with some syslog-ng users. Thanks to everyone who responded to that and/or filled out the survey: the responses were very insightful and give me something to work from. The survey is still open, and as always, the more the better, so further responses would be very much appreciated.

You might be operating a logging infrastructure, or you might be in the process of deploying one, it does not matter: if your job or project involves logging and solving the problems it entails I’d like to know your opinion.

Thanks

syslog-ng distribution and support bottleneck

I find that a lot of syslog-ng deployments are lagging behind and are using ancient versions. It has become difficult for me to get these deployments to more recent versions. No product is able to improve and cover new ground in a situation like this…

Being ancient is a relative term: for instance, in the JavaScript world it is considered ancient if you are using a framework that was initially released two or more years ago. New hypes and incompatible rewrites are published at a pace which makes the JavaScript ecosystem difficult to follow.

Maintaining this change velocity in the log management space is not feasible. Deploying a log management and processing infrastructure from scratch can literally take years, even if you only do it once. Swapping out technologies every now and then on a whim would mean that the project never reaches the goals it set out to achieve.

With that said, I still think that being able to regularly push out updates to deployments is an important bottleneck to solve for any product to be sustainable. This is needed both on the feature front (e.g. addressing new use-cases) and on the support front (e.g. fixing bugs).

I often get questions about syslog-ng 3.5.6. This release was originally published on 5th August 2014, roughly 8 years ago, and happens to be part of EPEL7. syslog-ng is also included in BMW i3 vehicles: this video shows the listing of open source components on the infotainment screen. The BMW Open Source DVD contains syslog-ng 3.4.7, a whopping fresh release from December 2013. There are similar stories with syslog-ng included in products or an OS release, usually with pretty old versions.

Why does this happen?

Due to the early adoption of syslog-ng, it was included in a number of Linux distributions and BSDs/UNIXes, even became default in some of them. I considered this a great success.

Log management, however, is not a central question for any of these distributions. They each need some kind of log daemon, but that’s it. Whether that log daemon is syslogd, rsyslog or syslog-ng does not really matter, and neither does its actual version number. So even though distributions helped initial syslog-ng adoption, they have become a bottleneck in delivering new releases to users.

Users can still upgrade, right?

Enterprise users (and products that embed Linux and syslog-ng) pick an OS version and plan with it for ~10 years. Unfortunately they deploy syslog-ng as a part of the OS and expect the OS vendor to provide support. Often, the sysadmin responsible for log management is not even allowed to upgrade. Some claim that upgrading syslog-ng would violate their support terms, causing the entire OS to become unsupported.

So even though more recent versions of syslog-ng include functionality or fixes they need, they stick to the old version and try to work around any issues they find.

The support from the OS vendor for the logging component is questionable at best and is restricted to the most basic use-cases, not cases where syslog-ng would play an important role in one’s infrastructure. Just as log management is not a central focus for the OS, neither is it for the support team behind the OS. They would fix security issues, should they be reported, but otherwise they will just continue to use what they have.

Solution: state of the art binaries to pick from

Building the latest version of syslog-ng for your enterprise distro on your own is not for the faint of heart. Even though 20 years ago, building your own kernel or application was an essential part of a sysadmin’s job on any UNIX, this is not true any more.

Also, it was a lot easier to build syslog-ng in 2001. Today we have so many integrations that pulling in all the build dependencies (and the right versions) is far from trivial.

We worked hard in the past years to resolve this issue and today syslog-ng is not only available in source format. There are a number of options today to pick from, should you want to use the latest and greatest:

Over time, the building of bespoke/customized packages has become much easier too, this blog post explains it all.

So what’s your excuse? I am really interested in whether the options above suffice. Do you still use an old syslog-ng version? Why? Would any of the above work for you? If not, what would YOU need to upgrade syslog-ng to recent versions? And what would you need to change in your processes to plan for upgrades regularly?

If you have a response to any of these questions, please post it as comment below or drop me an email. Thanks.

syslog-ng-future.blog? Is this a fork or what?

I mentioned in the previous post that I would like to focus on syslog-ng and put it more into the spotlight. I also mentioned that Balabit, the company I was a founder of and the commercial sponsor behind syslog-ng, was acquired by One Identity ~4 years ago. How does this add up? Who owns the Intellectual Property (IP) for syslog-ng? Who am I or this blog affiliated with?

I felt this post was important to set things straight and make it easier to understand my motivation. If you are not much into Free Software and Open Source licenses or not interested too much in administrative nuisances of FLOSS projects, feel free to skip this post.

First of all, the IP in syslog-ng that Balabit originally owned was transferred to the acquirer, One Identity. This includes:

  • copyright on the documentation and parts of the codebase
  • trademarks, website, marketing stuff

The good news here is that not all of the codebase is copyrighted by its commercial sponsor. Back in the Balabit days we enacted a change of the licensing regime in 2010 as described here: bazsi.blogspot.com/2010/07/syslog-ng-contributions-redefined.html.

With this change we’ve stopped requiring signed CLAs (Contributor’s License Agreements) whenever someone contributed to syslog-ng. This means that the copyright of any outside contributions would be retained by the contributor and not assigned to Balabit or its successors.

Over the years, many such outside contributions were merged into the syslog-ng codebase, meaning that the code today is owned by many different individuals and companies. Copyrights of files are tracked by the tests/copyright/policy file in the source tree: you can note that there are some files that are external contributions in their entirety. Some files have mixed copyrights, partly owned by One Identity, partly by the outside contributor.

Since the license syslog-ng uses is a combination of GPL + LGPL, the combined work as such is free software forever. The GPL/LGPL warrants that anyone can get the source code and be able to change it in any way he or she wishes. The sole requirement of the license is that should you distribute syslog-ng to any 3rd parties, you would need to disclose that you were using GPLed code and offer the source code along with any changes you have made.

The same rules apply to the commercial sponsor as well: as long as it relies on the work that was produced by external contributors over the years, they will need to publish any changes to syslog-ng they create. The only way out would be to rip out all code created by external contributions.

This means that syslog-ng today is a truly open source project, at least from the licensing perspective.

But there’s another perspective, namely whether there is an active developer community that adds new features and publishes new releases.

Fortunately, this also holds true. The syslog-ng project lives on https://github.com/syslog-ng/syslog-ng, and contributions are transparently managed in GitHub pull requests, whether created by One Identity or by someone else in the community. Regular releases are produced on an 8-10 week cadence and are published in both source and binary formats, as well as a docker image.

With all of the explanations above, the status of this site and myself can easily be explained: I am not affiliated with One Identity (the successor of Balabit) in any way. I am an individual who contributes time and energy to the open source syslog-ng project, just as I have done for the last ~24 years.

This is a personal blog, related to syslog-ng and producing useful fixes and features for syslog-ng as an open source project. I am not sponsored or endorsed by One Identity. I am here to help find out where syslog-ng should go next.

The consensus of the FLOSS community is that a project is only considered truly open source if the developer base/IP is not concentrated within a single company/organization. The reason for this understanding is simple: if there’s only one such entity then that entity has too much control/power over the project. If the company goes bankrupt or changes hands, then priorities might change in a way that causes the open source project to suffer. The long term sustainability of an open source project hinges on the breadth of its contributor base: the broader it is, the more likely it is that the open source project can outlive its creators or commercial sponsors.

The dependence of the syslog-ng project on Balabit as its commercial sponsor has been an issue since ~2009 and is probably the reason why it has not become the default logging daemon in Fedora, and thus in Red Hat Enterprise Linux. I don’t think inclusion in Linux distributions is the prime venue of competition today that it once was. But this decision by Fedora still hurts. 🙂

In a way, the acquisition of Balabit and my later departure from One Identity allow syslog-ng to become a truly independent, sustainable open source project.

Stay tuned!

syslog-ng relaunch

syslog-ng has been around for decades: I started coding the first version of syslog-ng in September 1998, circa 24 years ago. The adoption of syslog-ng skyrocketed soon after that: people installed it in place of the traditional syslogd across the globe. It was packaged for Debian, Gentoo, SUSE and even commercial UNIXes. It became a default logging daemon in some of these Linux distributions. Commercial products started embedding it as a system component. Over the years, however, I feel that syslog-ng has become a trusted piece of infrastructure that few people really care about. I set out to change that.

The use of syslog-ng has become so widespread and dominant, needing minimal maintenance, that after a point, people stopped noticing its existence. It became like the printer sitting in an office corner: we know it’s there, we use it regularly, we appreciate the function but we don’t really know or care about the details or the brand providing us with given service.

I see syslog-ng regularly in this spot today: its deployment might have been a big project in its time with its own challenges, but it has been a solved problem ever since.

Not that log management and log processing would be a static, boring field of IT & IT Security. Like all other fields of enterprise IT, there’s been tremendous activity in the last 10-15 years.

Markets and relevant trends:

  • SIEM & User Behavior Analytics (LogLogic, ArcSight, QRadar, Splunk, …)
  • Big Data (Hadoop, Kafka, Storm, Spark, NiFi)
  • Enterprise SaaS services (Office365, Google Workspace, etc.)
  • Containers and orchestration (Kubernetes, OpenShift, cloud & on-prem)
  • Cloud Native Applications

All these changes naturally resulted in an equal frenzy in the tools processing and managing log data. New tools and services emerged, old tools gained new features. I could probably go on and get into details on these trends but that’s not why I am here today.

I started this blog as I wanted to show two things:

  1. That syslog-ng has not been the stoic figure in the corner and has incorporated important improvements over the years that are not widely known and unfortunately not even assumed.
  2. To solicit feedback on my future plans and with that help guide the development of syslog-ng to the future.

The intent behind this blog is to address the 2nd point.

The first point might sound a little strange at first: if there is indeed functionality in syslog-ng that its users don’t know or care about, that can only mean one of two things:

  1. Those features were not needed in the first place.
  2. The marketing/communication of syslog-ng as a project has not been very good.

As one of the engineers behind the changes I firmly believe #1 is not true. The features we added to syslog-ng over the years are important. I believe these features enable syslog-ng to address problems that only few people assume it could address. But I am not here to go into details on those features either.

My take on the marketing issue is different: other projects, open source or commercial, have been better at communicating their value propositions. They were more successful at communicating their release-by-release improvements and with that gained a more significant traction in the marketplace.

The reason behind this failure is an entire post on its own (let me know if you are interested!), my short and simple summary is a single word: focus.

I am the founder of the syslog-ng project. I founded a company that sponsored the syslog-ng project. But neither my nor my company’s primary focus has ever been syslog-ng. Some of you may remember that syslog-ng was hosted on balabit.com. Balabit was a player in the Privileged Access Management space (e.g. the likes of CyberArk, BeyondTrust, e-DMZ, Wallix etc). We made an effort to combine log management with PAM, but truth be told we never really succeeded in doing so. syslog-ng grew from being my personal hobby to become the 2nd product in the Balabit portfolio.

This situation handicapped syslog-ng compared to those projects and companies that had logs as their primary focus.

Balabit was acquired 4 years ago. Since then I took my sabbatical, picked up a couple of new hobbies (electronics mainly, welding is something I still want to learn), implemented home automation in my house (see http://bazsi.blogspot.com/), and became a hobby angel investor and a management consultant. With all that, I am somewhat bored. I love spending time with my family and all these new things, but at the same time I need new challenges. There are too many “small” things I spend my time with and I have an itch to do something “bigger”.

I want to give syslog-ng a chance it never had: I want to make it my primary focus. The foundations and the technology are already there, let’s put the spotlights on, blow the dust off. Engage with users, understand their needs and communicate value. Understand things that are missing and fix them.

In a nutshell, I would like to relaunch syslog-ng as a project. Let’s reboot the process that keeps a product able to adapt to a changing market and continue to be relevant for more decades to come.

I am inviting you to be a part of it. Feedback, new use cases, feature requests and even bug reports are welcome. Strong points that you like, weak spots that you would like to see improved are very interesting.

Subscribe below and help me in this endeavour. Stay tuned!