Sending errors in a roll up digest email with log4net


On production, there are many alerts in a short period. It's nice to get error emails, but you would want to avoid receiving hundreds of emails in a few minutes, when something fails. Also, people get used to alert emails and later nobody would care about the blizzard of emails.

So what we want are

  1. Roll up notifications of faults occurring in a short period into a summary email
  2. But the first fault should be emailed straight away
  3. The thresholds are two. First, it waits for certain number of faults. Second, though the buffer is not full, after a certain period of time, the logger sends all the faults in the buffer. Practically, it will wait for 5 minutes. Even within 5 minutes, if the nubmer of faults exceeds 100, then the batch will be sent.
  4. Include contextual events. We also want to see 10 preceding events before the error to understand the context of the fault.

We will use log4net, a superb logging framework that handles the most of the features we want, just out of box.

Let me go straight to the code.

[sourcecode language="csharp"]
public class SmtpCachingAppender : SmtpAppender
{
private bool _timedFlush;
System.Threading.Timer _timer;
private string _originalSubject;
private readonly IList<LoggingEvent> _thresholdEvents;
private readonly List<LoggingEvent> _allEvents;

public SmtpCachingAppender()
{
FlushCount = 20;
FlushInterval = new TimeSpan(0, 5, 0);
_thresholdEvents = new List<LoggingEvent>();
_allEvents = new List<LoggingEvent>();
}

public TimeSpan FlushInterval { get; set; }
public int FlushCount { get; set; }

/// <summary>
/// Create a timer that fires to force flushing cached log events
/// via SMTP at a specified interval.
/// </summary>
public override void ActivateOptions()
{
if (FlushInterval > TimeSpan.Zero)
_timer = new System.Threading.Timer(OnTimer, null, FlushInterval, FlushInterval);

base.ActivateOptions();
}

protected override void SendBuffer(LoggingEvent[] events)
{
_thresholdEvents.Add(events.Last());
_allEvents.AddRange(events);

bool isFirstError = _thresholdEvents.Count == 1;
if (isFirstError)
Send(_allEvents);

bool isBatchReady = ((FlushCount != 0) && (_thresholdEvents.Count >= FlushCount)) || _timedFlush;
if (isBatchReady)
{
Send(_allEvents);
ResetCacheBuffer();
}
}

private void Send(IEnumerable<LoggingEvent> events)
{
_originalSubject = Subject;

PrepareSubject();
base.SendBuffer(events.ToArray());

Subject = _originalSubject;
}

private void ResetCacheBuffer()
{
_thresholdEvents.Clear();
_allEvents.Clear();
_timedFlush = false;
}

private void PrepareSubject()
{

var matches = Regex.Matches(Subject, @"%property{(\w+)}");
foreach (Match match in matches)
{
if (!_thresholdEvents[0].Properties.Contains(match.Groups[1].Value))
continue;

Subject = Subject.Replace(match.Value, _thresholdEvents[0].Properties[match.Groups[1].Value].ToString());
}

if (_thresholdEvents.Count > 1)
{
Subject = "SUMMARY " + Subject + " - " + _thresholdEvents.Count + " errors";
}
}

private void OnTimer(Object stateInfo)
{
_timedFlush = true;
Flush(true);
}

}

[/sourcecode]

SmtpCachingAppender inherits SmtpAppender. SmtpAppender already has feature for contextual events. So we can benefit from it.

When the logger logs any event, SendBuffer(LoggingEvent[] events) is called. why is the parameter events collection? It's because of "buffersize" setting. if "buffersize" is set to 0, only 1 event is passed. If it is 10, 10 events are passed, and the last event is the threshold event (in this case, it will be error) and all previous events are contextual ones.

There are two lists, _thresholdEvents and _allEvents. _allEvents will have all events including all previous events even though they are not ERROR event (there would be some DEBUG, INFO, WARN, ...). The batch will only consider the number threshold events, so we need a separate list, _thresholdEvents.

When either of the two conditions are met, we call base.SendBuffer(...), FlushInterval is done or the number of threshold events exceed the specified FlushCount.

The setup for this appender is like this.

[sourcecode language="xml"]
<appender name="RollUpErrorEmail" type="SmtpCachingAppender">
<to value="operation@andrewchaa.me.uk" />
<from value="backend@yourreader.net" />
<subject value="Your Service ERROR (local from homemachine)" />
<smtpHost value="SMTP-HOST" />
<bufferSize value="10" />
<lossy value="true" />
<FlushInterval value="00:05:00" />
<FlushCount value="100" />

<evaluator type="log4net.Core.LevelEvaluator">
<threshold value="ERROR"/>
</evaluator>

<layout type="log4net.Layout.PatternLayout">
<conversionPattern value="%utcdate [%thread] %-5level %logger user=%property{user} - %message%n" />
</layout>
</appender>

[/sourcecode]

The usage of each configuration values are

There was a sample code for cachingsmtpappender. I added the first email bit and also changed it a little bit. Hope this helps anyone interested.