Photo by [NASA](https://unsplash.com/photos/n463SoeSiVY?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText) on [Unsplash](https://unsplash.com/search/photos/rocket?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText)

The scalability and reliability of FaaS services such as AWS Lambda give us priceless peace of mind. Nonetheless, so much flexibility demands careful monitoring to make sure we're operating within safe boundaries of performance and financial cost.

Say a function is deployed to serve a certain customer-facing request. Benchmarking reveals it takes an average of 3 seconds to complete a request, with a very low standard deviation. We apply a 200% markup, charging our customers three times what it costs us to run those 3 seconds.
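The pricing math above can be sketched in a few lines. The Lambda rate and memory allocation below are illustrative assumptions, not quotes of current AWS pricing:

```python
# Illustrative per-request cost and price under a 200% markup.
# GB_SECOND_RATE and MEMORY_GB are assumed values for this sketch.
GB_SECOND_RATE = 0.0000166667  # USD per GB-second (assumed rate)
MEMORY_GB = 1.0                # assumed function memory allocation
AVG_DURATION_S = 3.0           # benchmarked average duration

cost_per_request = GB_SECOND_RATE * MEMORY_GB * AVG_DURATION_S
markup = 2.0                                         # 200% markup...
price_per_request = cost_per_request * (1 + markup)  # ...means 3x our cost

print(f"cost: ${cost_per_request:.7f}  price: ${price_per_request:.7f}")
```

Note how a 200% markup means the price is three times the cost: the original cost plus twice that cost on top.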

Now imagine this function starts taking 10 or 20 seconds to serve requests. Maybe we rely on a third-party API and its network performance starts to smell really bad. Heck, it could last a day or a week; it's out of our control. Depending on our application's scale, this mismatch could lead to a huge operational loss.

Don't burn your precious dollars!

Photo by [Chris](https://pixabay.com/users/intellectual-4717896/) on [Pixabay](https://pixabay.com/photos/burning-money-dollars-cash-flame-2113914/)

To avoid this kind of problem, we need to put thoughtful logic in place to monitor our function's metrics and alert ourselves in case anything starts to derail. In the example above, we could have a policy such as:

> Alert me if invocation duration is above 3.5 seconds, on average, over the last 15 minutes.

The alert could be an email, a message to a Slack channel, an SMS, whatever. The important part is having the ability to set custom monitoring policies in a way that matches our application's performance expectations.

It's not just about monitoring errors or analyzing each invocation individually. Due to the variable nature of the underlying hardware, networking, etc., our function can perform in unpredictable ways. Measuring on a per-invocation basis would lead to many false alarms. Many requests might go above 3.5 seconds, but many others will fall below 2.5, offsetting our costs; those shouldn't take up our precious time. The "last 15 minutes" aggregation in the alerting policy above is key to saving our time and attention for what really matters.
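On AWS, a policy like this maps naturally onto a CloudWatch alarm over the built-in `AWS/Lambda` `Duration` metric (reported in milliseconds). Here's a minimal sketch; the function name and SNS topic ARN are placeholder assumptions:

```python
# Sketch: express "average Duration over the last 15 minutes > 3.5 s"
# as kwargs for CloudWatch's put_metric_alarm API.
def duration_alarm_params(function_name, sns_topic_arn):
    """Build put_metric_alarm kwargs for the alerting policy above."""
    return {
        "AlarmName": f"{function_name}-slow-invocations",
        "Namespace": "AWS/Lambda",
        "MetricName": "Duration",  # built-in Lambda metric, in milliseconds
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Average",    # aggregate, not per-invocation
        "Period": 900,             # over the last 15 minutes
        "EvaluationPeriods": 1,
        "Threshold": 3500.0,       # 3.5 seconds, in milliseconds
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],  # e.g. an SNS topic wired to email
    }

# With AWS credentials configured, this would register the alarm:
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(
#       **duration_alarm_params("my-function", "arn:aws:sns:...:alerts"))
```

Setting `Statistic` to `Average` with a 900-second `Period` is what keeps individual slow invocations from paging us, matching the aggregation logic discussed above.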

That's one of the reasons Dashbird exists. It's a monitoring tool specially designed for serverless applications, providing great flexibility for setting custom monitoring policies such as the one demonstrated above. If you have functions running in production on AWS Lambda, I strongly suggest checking it out. You can try it for free, no credit card required: [Dashbird.io](https://dashbird.io).


Renato Byrro is a Developer Advocate at Dashbird.io. You can follow him on Twitter and Medium.

This post is also available on DEV.