Select Page
syslog-ng 4 theme: typing

syslog-ng 4 theme: typing

As explained in my previous post, we do have some features already in mind for syslog-ng 4, even though the work on creating a long term set of objectives for the syslog-ng project is not finished yet. One of the themes I that I have working code for already, is typing.

syslog-ng traditionally assumes that log data, even if it comes in a structured form (like RFC5424 structured data or JSON) is primarily textual in nature. For this reason, name-value pairs in syslog-ng are text values just as the log message as a whole. The need for typing however came up previously, most notably in cases where we sent data to a consumer that supported typing, such as:

  • Elastic like other similar consumers use JSON, and attributes can have non-text types
  • SQL columns have types
  • Riemann metrics can have types

Also, it happens that typing has an impact in log routing decisions. In a lot of cases, textual comparisons or regexp matches are fine, however sometimes your routing condition depends on a value being larger than or less than a numeric value. For example:

log {
   if ("${.apache.bytes}" > "10000") {
      # do something
   }
};

In this case, doing the comparison as texts is clearly incorrect, if ${.apache.bytes} was “5”, the condition above would pass, as the string “5” is larger than “10000”, which is clearly not the case if we were to compare these numerically. To allow both numeric and textual comparisons, syslog-ng has two sets of operators, the usual “<“, “=” and “>” are doing numeric comparisons, while “lt”, “eq” and “gt” are doing string comparisons. But it’s pretty easy to mix those up, even I make that mistake sometimes.

To address both problems, type support is being added to syslog-ng. The change by itself is pretty simple:

  • we add a “type” value associated with each name-value pair of the log message,
  • the value itself continues to be stored internally in their current, text based format,
  • whenever we need type information in a type aware context (e.g. when we format  a JSON or send a riemann event), we would use this type information
  • whenever we just need the name-value pair as before, in textual context, we would just continue to use the existing string based value

The consequences:

  • type aware consumers (like: JSON, Elastic, Riemann, MongoDB, etc) would use type information automatically, no need for explicit type hints
  • we can implement type aware comparisons, so that syslog-ng does the right comparison, based on types (e.g. like JavaScript).

As always, this is probably easier to understand with examples.

Type aware JSON parsing/reproduction

@version: 4.0
log {
  source { tcp(port(2000) flags(no-parse)); };
  parser { json-parser(prefix('.json.')); };
  destination { file("/tmp/json.out" template("$(format-json .json.* --shift-levels 2)\n")); };
};

This configuration expects JSON payloads, one by each line, on TCP port 2000. It parses the JSON and then reformats it using $(format-json). Let’s run this configuration:

$ /sbin/syslog-ng -Fedvtf /etc/syslog-ng/syslog-ng-typing-demo.conf

Let’s send a JSON payload to this syslog-ng instance:

$ echo '{"text": "string", "number": 5, "bool": true, "thisisnull": null, "list": [5,6,7,8]}' | nc -q0 localhost 2000

syslog-ng reports the parsing process in its debug/trace log levels:

[2022-03-03T08:40:56.408225] json-parser message processing started; input='{"text": "string", "number": 5, "bool": true, "thisisnull": null, "list": [5,6,7,8]}', prefix='.json.', marker='(null)', msg='0x7ffff00141c0'
[2022-03-03T08:40:56.408461] Setting value; name='.json.text', value='string', msg='0x7ffff00141c0'
[2022-03-03T08:40:56.408500] Setting value; name='.json.number', value='5', msg='0x7ffff00141c0'
[2022-03-03T08:40:56.408524] Setting value; name='.json.bool', value='true', msg='0x7ffff00141c0'
[2022-03-03T08:40:56.408545] Setting value; name='.json.thisisnull', value='', msg='0x7ffff00141c0'
[2022-03-03T08:40:56.408592] Setting value; name='.json.list', value='5,6,7,8', msg='0x7ffff00141c0'

Note the individial name-value pairs being set as they are extracted from the JSON format. And then this is reproduced on the output side:

{
  "thisisnull": null,
  "text": "string",
  "number": 5,
  "list": [
    "5",
    "6",
    "7",
    "8"
  ],
  "bool": true
}

Please note that “numer” is numeric and “list” contains a JSON list. One limitation that is still visible here is that list elements are not typed and are always strings when being reproduced using $(format-json), because list elements are not name-value pairs.

Associate type information with name-value pairs

It is not just JSON that can set types for name-value pairs, rewrite rules and db-parser() can also set them. In rewrite rules, set() can now take a type hint, and that type-hint gets associated with the value as its type:

#this makes $PID numeric
rewrite { set(int("$PID") value("PID")); };

Also, db-parser() would set type information depending on which parser we used extract the specific field. For instance @NUMBER@ would extract an integer.

Type information returned by macros and templates and template functions

Template functions will be able to return the type, depending on the function they perform. For instance the list handling functions like $(list-slice) would return a list. Numerical functions like $(+) would return numbers. Likewise, some macros are also being annotated with their types.

Template expressions as a whole also become typed, whenever we use an “simple” template expression (e.g. with just one ‘$’ reference, like “$PID”), the type of the template is inferred automatically and that type is propagated. If the inferred type is not correct, you can always use type-hints to “cast” the template expression to some other type.

When does it become available? When I can try it?

Since the typing behavior has the potential of changing the output in certain ways (e.g. produce a numeric value which used a string before), we are not turning this feature on automatically. As long as we are in the 3.x release train, it will stay disabled, even as parts of it are being merged. You can evaluate the feature by setting your config version (e.g. @version at the top of the config file), to 4.0, as shown with the example config above.

Then, as we release 4.0, the typing feature will be enabled by default for any configuration that uses @version: 4.0.

Most of the feature is already implemented, but not yet merged to the mainline yet. There are opened PRs on GitHub. 3.36 is expected to contain the first batch (e.g. JSON parser pieces), but not the complete change. I expect the changes to land in mainline in 1 or 2 extra release cycles, e.g. the end of April or end of June.

Stay tuned!

syslog-ng future: the path to syslog-ng 4

syslog-ng future: the path to syslog-ng 4

syslog-ng 3.0.1 was released 17th February 2009, almost exactly 13 years ago. The key feature at that point was to add support for RFC5424, the new “syslog” protocol. The 3.0 release marked a significant conceptual change in syslog-ng as this was where we introduced support for generic “name-value pairs”, a means to encode application or organization specific fields (aka name-value pairs as we named them) associated with a log message.

The 3.x release train has been a long and a busy one. We are right now at 3.35.1 with 3.36.1 right around the corner. Not counting bugfix releases, that’s ~4 releases per year on average. This pace was slower initially (~1 release/year) which then increased due to all the engineering practices that we implemented in the last decade: syslog-ng is a very well tested application today, covered both in terms of unit tests and functional, end-to-end testing. In the last years, the syslog-ng project has produced 5-6 releases per year (every ~2 months), in a rolling model. Apart from features and bugfixes we also had a sharp focus on compatibility and avoiding regressions.

When I started to draft this post, I compiled a list of noteworthy features that were created since 3.0.1 in 2009. My intention with the list was to include it here to back up my previous claim that there are lot of undiscovered and under-communicated aspects of syslog-ng. However, when I finished with the list, I had to realise that even if I trim it down, it is still too long to discuss it in a blog post at one go. For now, I’ve uploaded my raw notes here. I am probably going to use that list to publish technology pieces on the blog or create a survey to map out which are the more interesting items to syslog-ng users. I don’t know yet.

This post however, is not about the past, the title says it all: it is about the path to syslog-ng 4. With the relaunch taking place, I was thinking what else could be better to symbolize a restart than a new major version? With that we can take a moment to reflect on the 3.x series and start anew with fresh energy.

It is very important to state that syslog-ng 4 is not the revolutionary, break-everything kind of release that we see too often in the software world. Rather it is an evolutionary change that will be produced similarly to previous releases, that is:

  • the release will contain both features and bugfixes
  • if a change in behaviour is unavoidable, we keep being compatible using the config version mechanism, e.g. the “@version:” tag in the front of the config file
  • compatibility with old config versions are retained long term (e.g. we are compatible back to 3.0, with compatibility back to 2.0 dropped just a couple of years ago)

But why the fuzz, you may ask, about a new version number if nothing changes and we do exactly as before?

Well, there are some plans scheduled for 4.0 (more on those later), but I consider this release to be an opportunity to set up new, long term objectives. Objectives that will cover the upcoming releases as well and not just 4.0 itself. With the launch of this blog and through interactions with the community, I already have some thoughts of my own, still, I would like to allow community members to contribute even on the strategic level. Let’s find the mission statement for syslog-ng that covers the next 10 years and then guide the project towards those goals with a step in each release. I am posting the specifics and the mechanism of this work in an upcoming post. Until that post, please continue to send me feedback (via Email, gitter.im, GitHub, Reddit, LinkedIn whatever you like), I am truly enjoying each and every one of these interactions and make an effort to respond to all your queries. Also, the syslog-ng project started to use GitHub’s discussion feature, so if you have a suggestion with regards to syslog-ng 4, feel free to submit it here.

Release management and Support

So how would the release of 4.0 happen? Is this a new branch over 3.x? How long would we support 3.x?

These are all valid questions, however the answer is simple: syslog-ng 4 is nothing more than a 3.x release in this respect. We will add features and bugfixes and compatibility will be provided using the config version feature (ie. @version). We will make no breaking changes that we cannot continue to be compatible with. There will be no separate 3.x and 4.x releases going in parallel. If we break something, fixes would be pushed out in upcoming versions (either the scheduled one or an emergency one if the problem is critical). We are confident that our current test coverage gives us a safety net that allows us to use this release strategy.

At the same time, we are scheduling some larger-scale changes that will probably not fit into a normal 8 week release cycle we do these days. We don’t want to stop doing our 3.x releases and we don’t want to publish half-baked features. So how are we going to resolve this conflict?

The regular bugfix/feature flow of 3.x will continue to operate as before. Any 4.0 related functional change will be merged to master (and thus make it into 3.x releases) but any functional change will be disabled.

Once all 4.0 related changes are merged, a 4.0.1 release will be created, effectively turning on the new behaviours, except if the user operates in `@config: 3.x` mode, which is the usual method  to tell syslog-ng to operate in compatibility mode.

All of this basically means the following:

  • the 3.x feature and bugfix flow operates as normal
  • the 4.x related changes get merged and can be evaluated if someone is interested (by using “@version: 3.255” at the top of your configuration file)
  • no half-baked functionality is exposed, even if they take longer to bake than the 8 week release cadence.
  • all protected by our testing infrastructure

Up until now, only the versioning framework was merged with some more queued for merging. Details on some of the plans for 4.0 are coming in separate posts. Stay tuned!

syslog-ng distribution and support bottleneck

syslog-ng distribution and support bottleneck

I find that a lot of syslog-ng deployments are lagging behind and are using ancient versions. It has become difficult for me to get these deployments to more recent versions. No product is able to improve and cover new ground in a situation like this…

Being ancient is a relative term: for instance, in the JavaScript world it is considered ancient if you are using a framework that was initially released two or more years ago. New hypes and incompatible rewrites are published at a pace which makes the JavaScript ecosystem difficult to follow.

Maintaining this change velocity in the log management space is not feasible. Deploying a log management and processing infrastructure from scratch can literally take years just one time. Swapping out technologies every now and then on a whim would mean that the project never reaches the goals it was set out to achieve.

With that I said I still think that being able to regularly push out updates to deployments is an important bottleneck to solve for any product to be sustainable. This is needed for both the feature front (e.g. addressing new use-cases) and on the support perspective (e.g. fixing bugs).

I often get questions about syslog-ng 3.5.6. This release was originally published 5th August 2014, roughly 8 years ago, and happens to be part of EPEL7. syslog-ng is included in BMW i3 vehicles, this video shows the listing of open source components on the infotainment screen, The BMW Open Source DVD contains syslog-ng 3.4.7, a whooping fresh release from December 2013. There are similar stories with syslog-ng included in products or an OS release, usually with pretty old versions.

Why does this happen?

Due to the early adoption of syslog-ng, it was included in a number of Linux distributions and BSDs/UNIXes, even became default in some of them. I considered this a great success.

For none of these distributions however is log management a central question. They each need some kind of log daemon, but that’s it. Whether that log daemon is syslogd, rsyslog or syslog-ng does not really matter. Neither matters their actual version number. So even though distributions helped initial syslog-ng adoption, they have become a bottleneck in delivering new releases to users.

Users can still upgrade, right?

Enterprise users (and products that embed Linux and syslog-ng) pick an OS version and plan with it for ~10 years. Unfortunately they deploy syslog-ng as a part of the OS and expect the OS vendor to provide support. Often, the sysadmin responsible for log management is not even allowed to upgrade. Some claim that upgrading syslog-ng would violate their support terms, causing the entire OS to become unsupported.

So even though more recent versions of syslog-ng includes functionality or fixes they need, they stick to the old version and try to work around any issues they find.

The support from the OS vendor for the logging component is questionable at best and is restricted only for the most basic use-cases, not cases where syslog-ng would play an important role in one’s infrastructure. Just as log management is not a central focus for the OS, neither is it for the support team behind the OS. They would fix security issues, should they be reported, but otherwise they will just continue to use what they have.

Solution: state of the art binaries to pick from

Building the latest version of syslog-ng for your enterprise distro on your own is not for the faint of heart. Even though 20 years ago, building your own kernel or application was an essential part of a sysadmin’s job on any UNIX, this is not true any more.

Also, it was a lot easier to build syslog-ng in 2001, today we have so many integrations that pulling all the build dependencies (and the right versions) is far from trivial.

We worked hard in the past years to resolve this issue and today syslog-ng is not only available in source format. There are a number of options today to pick from, should you want to use the latest and greatest:

Over time, the building of bespoke/customized packages has become much easier too, this blog post explains it all.

So what’s your excuse? I am really interested if the options above suffice. Do you still use an old syslog-ng version? Why? Would any of the above work for you? If not, What would YOU need to upgrade syslog-ng to recent versions? And what would you need to change your processes to plan for upgrades regularly?

If you have a response to any of these questions, please post it as comment below or drop me an email. Thanks.

syslog-ng-future.blog? Is this a fork or what?

syslog-ng-future.blog? Is this a fork or what?

I mentioned in the previous post that I would like to focus on syslog-ng and put it more into the spotlight. I also mentioned that Balabit, the company I was a founder of and the commercial sponsor behind syslog-ng, was acquired by One Identity ~4 years ago. How does this add up? Who owns the Intellectual Property (IP) for syslog-ng? Who am I or this blog affiliated with?

I felt this post was important to set things straight and make it easier to understand my motivation. If you are not much into Free Software and Open Source licenses or not interested too much in administrative nuisances of FLOSS projects, feel free to skip this post.

First of all, the IP in syslog-ng that Balabit owned originally, was transferred to the acquirer, One Identity. This includes:

  • copyright on the documentation and parts of the codebase
  • trademarks, website, marketing stuff

The good news in here is, that not all of the codebase is copyrighted by its commercial sponsor. Back in the Balabit days we enacted a change of the licensing regime in 2010 as described here: bazsi.blogspot.com/2010/07/syslog-ng-contributions-redefined.html.

With this change we’ve stopped requiring signed CLAs (Contributor’s License Agreements) whenever someone contributed to syslog-ng. This means that the copyright of any outside contributions would be retained by the contributor and not assigned to Balabit or its successors.

Over the years, many such outside contributions were merged into the syslog-ng codebase, meaning that the code today is owned by many different individuals and companies. Copyrights of files are tracked by the tests/copyright/policy file in the source tree: you can note that there are some files that are external contributions in their entirety. Some files have mixed copyrights, partly owned by One Identity, partly by the outside contributor.

Since the license syslog-ng uses is a combination of GPL + LGPL, the combined work as such is free software forever. The GPL/LGPL warrants that anyone can get the source code and be able to change it in any way he or she wishes. The sole requirement of the license is that should you distribute syslog-ng to any 3rd parties, you would need to disclose that you were using GPLed code and offer the source code along with any changes you have made.

The same rules apply to the commercial sponsor as well: as long as it relies on the work that was produced by external contributors over the years, they will need to publish any changes to syslog-ng they create. The only way out would be to rip out all code created by external contributions.

This means that syslog-ng today is a truly open source project, at least from the licensing perspective

But there’s another perspective, namely whether there is an active developer community that adds new features and publishes new releases.

Fortunately, this also holds true. The syslog-ng project lives on https://github.com/syslog-ng/syslog-ng, contributions are transparently managed in GitHub pull requests, either if created by One Identity or someone else in the community. Regular releases are produced on a 8-10 weeks cadence and are published both in source + binary formats and a docker image.

With all of the explanations above, the status of this site and myself can easily be explained: I am not affiliated with One Identity (the successor of Balabit) in any way. I am an individual who contributes time and energy to the open source syslog-ng project, just as I did the same in the last ~24 years.

This is a personal blog, related to syslog-ng and producing useful fixes and features for syslog-ng as an open source project. I am not sponsored or endorsed by One Identity. I am here to help finding out where syslog-ng should go next.

The consensus of the FLOSS community is that a project is only considered truly open source if the developer base/IP is not concentrated within a single company/organization. The reason for this understanding is simple: if there’s only one such entity then that entity has too much control/power over the project. If the company goes bankrupt or changes hands, then priorities might change in a way that causes the open source project to suffer. The long term sustainability of an open source project hinges on the breadth of its contributor base: the broader it is, the more likely it is that the open source project can outlive its creators or commercial sponsors.

The dependence of the syslog-ng project on Balabit as its commercial sponsor has been an issue since ~2009 and probably the reason why it has not become the default logging daemon in Fedora, thus RedHat Enterprise Linux. I don’t think the inclusion in Linux distributions is the prime venue of competition today as it once was. But this decision by Fedora still hurts. 🙂

In a way, the acquisition of Balabit, and my departure from One Identity later allows syslog-ng to become a truly independent-, sustainable open source project.

Stay tuned!

syslog-ng relaunch

syslog-ng relaunch

syslog-ng has been around for decades: I started coding the first version of syslog-ng in September 1998, circa 24 years ago. The adoption of syslog-ng skyrocketed soon after that: people installed it in place of the traditional syslogd across the globe. It was packaged for Debian, Gentoo, SUSE and even commercial UNIXes. It became a default logging daemon in some of these Linux distributions. Commercial products started embedding it as a system component. Over the years however I feel that syslog-ng has become a trusted piece of infrastructure, few people really care about. I set out to change that.

The use of syslog-ng has become so widespread and dominant, needing minimal maintenance, that after a point, people stopped noticing its existence. It became like the printer sitting in an office corner: we know it’s there, we use it regularly, we appreciate the function but we don’t really know or care about the details or the brand providing us with given service.

I see syslog-ng regularly in this spot today: its deployment might have been a big project in its time with its own challenges, but it has been a solved problem ever since.

Not that log management and log processing would be a static, boring field of IT & IT Security. Like all other fields of enterprise IT, there’s been tremendous activity in the last 10-15 years.

Markets and relevant trends:

  • SIEM & User Behavior Analytics(LogLogic, ArcSight, QRadar, Splunk, …)
  • Big Data (Hadoop, Kafka, Storm, Spark, NiFi)
  • Enterprise SaaS services (Office365, Google Workspace, etc.)
  • Containers and orchestration (Kubernetes, OpenShift, cloud & on-prem)
  • Cloud Native Applications

All these changes naturally resulted in an equal frenzy in the tools processing and managing log data. New tools and services emerged, old tools gained new features. I could probably go on and get into details on these trends but that’s not why I am here today.

I started this blog as I wanted to show two things:

  1. That syslog-ng has not been the stoic figure in the corner and has incorporated important improvements over the years that are not widely known and unfortunately not even assumed.
  2. To solicit feedback on my future plans and with that help guide the development of syslog-ng to the future.

The intent behind this blog is to address the 2nd point.

The first point might sound a little strange at first: if there are indeed functionality in syslog-ng that its users don’t know or care about, that can only mean one of two things:

  1. Those features were not needed in the first place.
  2. The marketing/communication of syslog-ng as a project has not been very good.

As one of the engineers behind the changes I firmly believe #1 is not true. The features we added to syslog-ng over the years are important. I believe these features enable syslog-ng to address problems that only few people assume it could address. But I am not here to go into details on those features either.

My take on the marketing issue is different: other projects, open source or commercial, have been better at communicating their value propositions. They were more successful at communicating their release-by-release improvements and with that gained a more significant traction in the marketplace.

The reason behind this failure is an entire post on its own (let me know if you are interested!), my short and simple summary is a single word: focus.

I am the founder of the syslog-ng project. I founded a company that sponsored the syslog-ng project. But neither my or my company’s primary focus has ever been syslog-ng. Some of you may remember that syslog-ng was hosted on balabit.com. Balabit was a player in the Privileged Access Management space (e.g. the likes of CyberArk, BeyondTrust, e-DMZ, Wallix etc). Albeit we made an effort to combine log management with PAM, but truth be told we never really succeeded in doing so. syslog-ng grew from being my personal hobby to become the 2nd product in the Balabit portfolio.

This situation handicapped syslog-ng compared to those projects and companies that had logs as their primary focus.

Balabit was acquired 4 years ago: I spent my sabbatical, I learnt a couple of new hobbies (electronics mainly, welding is something I still want to learn), implemented home automation in my house (see http://bazsi.blogspot.com/), became a hobby angel investor and a management consultant. With all that I am somewhat bored. I love spending time with my family all these new things, but at the same time I need new challenges. There are too many “small” things I spend my time with and I have an itch to do something “bigger”.

I want to give syslog-ng a chance it never had: I want to make it my primary focus. The foundations and the technology are already there, let’s put the spotlights on, blow the dust off. Engage with users, understand their needs and communicate value. Understand things that are missing and fix them.

In a nutshell, I would like to relaunch syslog-ng as a project. Let’s reboot the process that keeps a product able to adapt to a changing market and continue to be relevant for more decades to come.

I am inviting you to be a part of it. Feedback, new use cases, feature requests and even bug reports are welcome. Strong points that you like, weak spots that you would like to see improved are very interesting.

Subscribe below and help me in this endeavour.  Stay tuned!