Data Best Practices for Logging
When you set up a tool that lets you collect large amounts of data, sooner or later you have to ask yourself about cost, environmental impact and compliance with legislation. DecaLog is no exception to the rule: its collection capabilities, in both depth and breadth, deserve your attention. To help you work through these questions, here are the best practices you can apply and how to implement them in DecaLog.
What will guide you in this process are the “golden rules” of data and metadata handling:
Collect only what you need, and store it just as long as necessary – and no longer: your wallet and the planet will thank you.
Privacy by default
Respect your users and their choices… And the privacy laws you are subject to.
Security by design
Apply security best practices: encrypt when possible, manage identities and roles of those who can access data, etc.
These rules are particularly actionable and, whether you have decided to store your events, traces and metrics in WordPress or to use a third party provider, DecaLog has all the features required to implement them.
DecaLog offers loggers of different classes – alerting, debugging, crash analytics, events logging, monitoring and tracing – to allow you to segment or partition its activity according to your needs. So, use this capability: only create the loggers you really need; make sure there is no redundancy (in terms of data or usage); pause all loggers that are not needed at any given time.
If you want to reduce the scope of the collected data, enable only the necessary listeners in the second tab of the settings screen. This is a drastic measure that is not recommended in all cases, but it can be very useful.
Carefully set the minimum logging level. Depending on what you want to do, it may not be necessary to collect the debug, info and notice levels. Each level corresponds to a specific type of event. To stay close to other log management systems and to keep all DecaLog listeners consistent, the levels are used as follows:
- DEBUG – Only used for events related to application/system debugging. Must not concern standard, important or critical events. Ex.: “Plugin table xxx updated.”, “Textdomain yyy loaded.”.
- INFO – Simple informational messages which can be forgotten. Ex.: “User xxx is logged-out.”, “New comment on post yyy.”.
- NOTICE – Normal but significant conditions. Ex.: “The configuration of plugin xxx was modified.”, “The database is 70% full.”.
- WARNING – A significant condition indicating a situation that may lead to an error if recurring or if no action is taken. Ex.: “Page not found.”, “Comment flood triggered.”.
- ERROR – Minor operating error which requires investigation and preventive treatment. Ex.: “The file could not be opened.”, “The feature could not be loaded.”.
- CRITICAL – Operating error which requires investigation and corrective treatment. Ex.: “Uncaught Exception!”, “Database error in query xxx.”.
- ALERT – Major operating error which requires immediate corrective treatment. Ex.: “The WordPress database is corrupted.”.
- EMERGENCY – A panic condition (unusable system). Ex.: “The WordPress database is down.”, “Parse error: syntax error, unexpected ‘if’ (T_IF).”.
With these definitions, you can choose exactly what fits your needs.
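To make the effect of the minimum level concrete, here is a minimal sketch in Python. It is not DecaLog code: the numeric values, the `Level` enum and the `should_collect` helper are assumptions made for illustration. Only the level names and their order come from the list above.

```python
from enum import IntEnum


class Level(IntEnum):
    """Severity levels in increasing order. The numbers are arbitrary (assumption)."""
    DEBUG = 0
    INFO = 1
    NOTICE = 2
    WARNING = 3
    ERROR = 4
    CRITICAL = 5
    ALERT = 6
    EMERGENCY = 7


def should_collect(event_level: Level, minimum_level: Level) -> bool:
    """Keep an event only if it is at least as severe as the logger's minimum level."""
    return event_level >= minimum_level


# Hypothetical events, reusing example messages from the list above.
events = [
    (Level.DEBUG, "Textdomain yyy loaded."),
    (Level.INFO, "New comment on post yyy."),
    (Level.WARNING, "Page not found."),
    (Level.CRITICAL, "Database error in query xxx."),
]

# With a minimum level of WARNING, the debug and info events are never stored.
kept = [message for level, message in events if should_collect(level, Level.WARNING)]
print(kept)
```

Raising the minimum level from DEBUG to WARNING drops half of the events in this small sample. On a real site the reduction depends on how chatty your plugins and themes are.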
Events and traces metadata
DecaLog allows you to fine-tune the associated metadata. This metadata is known as fields for events and tags for traces, but it is, by nature, the same type of information.
You select or deselect metadata in the settings screen of each logger, under the “Reported details” section.
If, for some reason, you don’t need some of these details, don’t forget to uncheck them.
In the same way, to preserve the confidentiality of the handled data (do not forget that in many cases this data will be stored with third-party providers), DecaLog lets you apply protective measures to data classified as Personally Identifiable Information by a large number of laws and regulations.
You choose these measures in the settings screen of each logger, under the “Privacy options” section.
By hashing this information, DecaLog offers a minimal level of protection that does not affect the ability to filter or group by these values, whether fields or tags.
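The reason hashing keeps filtering and grouping intact is that identical inputs always produce identical digests. The sketch below illustrates this in Python. The choice of SHA-256, the `hash_value` helper and the sample events are assumptions for illustration; the algorithm DecaLog actually uses is not stated here.

```python
import hashlib
from collections import Counter


def hash_value(value: str) -> str:
    """Return a hex digest of the value (SHA-256 is an assumption, not DecaLog's documented choice)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# Hypothetical events with a "user" field that contains personal data.
events = [
    {"user": "alice@example.com", "message": "Page not found."},
    {"user": "bob@example.com", "message": "Page not found."},
    {"user": "alice@example.com", "message": "Comment flood triggered."},
]

# Replace the clear-text value with its hash before the event leaves the site.
protected = [{**event, "user": hash_value(event["user"])} for event in events]

# Grouping still works: both events from the same user share one hashed value.
per_user = Counter(event["user"] for event in protected)
print(sorted(per_user.values()))
```

Keep in mind why this is only a minimal level of protection: a plain hash of a guessable value, such as an email address, can be matched by hashing candidate values.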
The above setting is only for metadata: fields and tags. If you want to try to expunge potential Personally Identifiable Information in the events content (the message), use the pseudonymization option available in the general plugin settings.
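As a rough idea of what pseudonymizing a message can look like, here is a pattern-based sketch in Python. The two regular expressions, the `{email}` and `{ip}` placeholders and the `pseudonymize` function are all hypothetical. DecaLog's own detection rules are not described here and may differ.

```python
import re

# Deliberately simple patterns (assumptions): real detection needs more cases.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def pseudonymize(message: str) -> str:
    """Replace email addresses and IPv4 addresses with neutral placeholders."""
    message = EMAIL.sub("{email}", message)
    return IPV4.sub("{ip}", message)


print(pseudonymize("User jane.doe@example.com is logged-out from 203.0.113.7."))
# prints: User {email} is logged-out from {ip}.
```

Pattern matching of this kind is best effort, which is why the option only tries to expunge such information: free-text messages can contain personal data in forms that no pattern anticipates.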
Metrics in DecaLog are categorised under two profiles: production and development. Although the development profile exposes many additional metrics, feel free to stay on the production profile if it is sufficient for you. This will reduce the amount of data to be sent and stored.
Rather than forcing a specific profile on each monitoring logger, you can set this parameter to automatic. If your site is in the “local”, “development” or “staging” stage, the development profile is used; otherwise the production profile is used.
This setting is configured on a per-logger basis.
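The automatic mode boils down to a small decision rule, sketched here in Python. The function name, its parameters and the string values used for the setting are hypothetical; only the mapping from stage to profile comes from the description above.

```python
# Stages for which the automatic mode selects the development profile.
DEVELOPMENT_STAGES = {"local", "development", "staging"}


def metrics_profile(setting: str, environment_stage: str) -> str:
    """Resolve the metrics profile for one monitoring logger.

    `setting` is "production", "development" or "automatic" (hypothetical values).
    """
    if setting != "automatic":
        # A forced profile always wins over the site's stage.
        return setting
    return "development" if environment_stage in DEVELOPMENT_STAGES else "production"


print(metrics_profile("automatic", "staging"))     # development
print(metrics_profile("automatic", "production"))  # production
print(metrics_profile("production", "local"))      # production
```

The advantage of the automatic mode is that the same logger configuration can follow a site from staging to production without anyone having to remember to switch the profile.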
Traces and metrics sampling rate
In many cases, since traces and metrics can be quite stable from one request to the next, it is more efficient not to systematically send them.
That’s why DecaLog lets you set, for each relevant logger, a sampling rate from 1‰ (one request per thousand) to 100% (all requests). With a value below 100%, your sample of traces and metrics will not be complete, but if you operate high-traffic sites, this will not really matter. And it will significantly reduce the environmental footprint of your operations.
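A per-request sampling decision can be sketched as follows in Python. This is one common way to sample, assuming an independent random draw per request; the `is_sampled` function and its per-thousand parameter are hypothetical, and DecaLog's actual mechanism may differ.

```python
import random


def is_sampled(rate_per_thousand: int, rng: random.Random) -> bool:
    """Decide whether the current request sends its traces and metrics.

    rate_per_thousand ranges from 1 (1 per thousand) to 1000 (100%).
    """
    if not 1 <= rate_per_thousand <= 1000:
        raise ValueError("rate must be between 1 and 1000")
    # randrange(1000) yields an integer in 0..999, so exactly
    # `rate_per_thousand` of the 1000 possible draws are accepted.
    return rng.randrange(1000) < rate_per_thousand


rng = random.Random(42)  # seeded only to make the sketch reproducible
requests = 100_000
sent = sum(is_sampled(50, rng) for _ in range(requests))
print(f"{sent} of {requests} requests sampled at a 5% rate")
```

On a site with this much traffic, a 5% rate still yields thousands of samples, which is why the loss of completeness rarely matters there, while roughly 95% of the data is never sent or stored.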
Loggers of the debug class are very helpful to “see” what’s going on in a request. Nevertheless, their mode of operation is, in essence, to leak as much data as possible into the user’s browser. Treat them as what they are: a debugging aid! So, never use them on sites in the “production” stage or on sites (even in the “development” stage) that are publicly accessible.
If you take advantage of the built-in visualization tools for events and traces, it is possible to modify the behavior of the corresponding log access rules.
This change is made via the “Logs accesses” option in the general settings of DecaLog. If you decide not to use the most secure mode (Never override privileges), please understand that this can have a strong impact on security and privacy, especially (but not exclusively) for legal reasons (“on the fly” modification of data operators).
For the same reasons as for debug loggers, do not leave a “Prometheus endpoint” logger on a site in “production” stage or publicly accessible without activating the endpoint authentication (this option may be enabled in the general settings of DecaLog). Enabling this option is the safest way to avoid leaking metrics.
As you can see, it is quite possible to log in an efficient and “ethical” way, and DecaLog already offers a number of features for this. However, if you have an additional feature in mind that could help improve data handling, or any other question or suggestion about WordPress observability with DecaLog, you can reach me on Twitter. You can also continue these discussions on GitHub.