Tag: observability

What are the benefits of Observability?


Posted on November 4, 2022 by Doug Moll
INSIGHTS

Now that we have explored what observability is and what makes up a good observability solution, we can dive a bit deeper into the benefits. This is again not an exhaustive list of benefits, but I consider these to be the most impactful to businesses. Although some of these have been touched on in my previous posts, in this post I will consolidate them and add the missing pieces.

More performance, less downtime

Leaders in the observability space can detect and resolve issues considerably faster than businesses that are still relatively immature in this area. This includes issues relating to application performance or downtime.

Poorly performing applications or applications experiencing downtime have a direct impact on costs for any business. These can be in the form of tangible costs such as a direct loss in revenue or intangible costs such as brand and reputational damage.

Consider an eCommerce store which cannot transact due to a broken payment service, a social application that can no longer serve ads, a real-time trading application with super high latency, or a logistics application with a broken tracking service. There are literally thousands of examples across industries where the costs associated with downtime or poorly performing applications are very tangible.

When a banking application goes down, almost everyone knows about it the minute it happens. Twitter lights up, it appears on everyone’s news feeds and it even ends up on radio and television news broadcasts. Apart from the direct costs, the reputational damage caused by the downtime of an application can also be very costly, leading to increased customer churn, the loss of new customers and a host of other outcomes which impact the bottom line.

Measuring the true cost of downtime or poorly performing applications can be a difficult task, but it typically far outweighs the cost of making sure observability is done right, so that issues are detected early and fixed before they can have a significant impact.

Higher productivity, better customer experience

A properly implemented observability solution provides businesses with massively improved insights across the entirety of the business. These insights improve efficiencies and workflows in detecting and resolving issues across the application landscape. In today’s modern architectures this landscape is distributed and extends to the infrastructure, networks and platforms on which the applications run, both on-prem and in the cloud. These insights and efficiencies ultimately provide multiple benefits across business operations.

One of the more tangible benefits is that if your developers and DevOps engineers are not stuck diagnosing problems all day, they can spend their time developing and deploying applications. This means accelerated development cycles, which ultimately get applications to market quicker and lead to better, more innovative applications.

With businesses being ever more defined by the digital experiences they provide to their customers, observability is one of the edges required to become a leader in the industry. The deeper insights also help to align the different functions of the business. Having visibility into all aspects of the system, from higher-level SLAs to all the frontend and backend processes, enables operations and development teams to optimise processes across the landscape. These insights can even enable businesses to introduce new sources of income.

Observability is also vital in providing businesses with confidence in their cross-functional processes and assurance that the applications that are brought to market are robust. This confidence is even more important in today’s complex distributed systems which stretch across on-prem and cloud environments.

Happy people, better talent retention

One of the often overlooked benefits of observability is talent retention. With highly skilled developers and DevOps engineers in short supply, it stands to reason that businesses would want to do what they can to retain their best talent.

The frustration of sitting in endless war rooms and spending the majority of the day putting out fires is a surefire way to ensure highly skilled talent will look for opportunities elsewhere, where they can do what they enjoy.

Efficient observability practices and workflows drastically reduce the amount of time developers and engineers spend dealing with issues, making them happier and ultimately helping to retain them.

Fewer monitoring tools, look at all those benefits

One of the themes from my previous posts is that using multiple monitoring tools instead of a centralised observability solution creates inefficiencies and has a severe impact on a business’s ability to detect and resolve issues. From this post, it should be apparent that the insights gained from a centralised observability solution across the landscape provide a number of other benefits too.

Although this post deals with the generic benefits of observability without necessarily comparing it to other approaches, I feel that addressing a few drawbacks of the multiple-tool approach will also highlight additional benefits of the central-platform approach to observability. Below are some of these drawbacks:

  • Licensing multiple monitoring tools introduces unnecessary costs as well as complexity in administering multiple different licensing models.
  • Having multiple tools also introduces complexity across your environment with multiple different agents and tools to be managed and operationally maintained.
  • The diverse and often rare skills required to operate multiple different tools either introduce a burden on existing operations teams or cause reliance on multiple different external parties to implement, manage and maintain tools.
  • Data governance is vital in any tool or system that stores data. Monitoring tools are no different and often contain sensitive data. Governance for a single observability solution is far simpler to achieve and less costly than multiple tools.
  • Storing data also has a cost burden which is often far higher when you have multiple tools, each with its own storage requirements.

The main thing to highlight is that the above drawbacks are really secondary to the most important benefit of centralised observability over the multiple monitoring tools approach: detecting and resolving issues as quickly and efficiently as possible. This is best achieved with seamless correlation between your logs, metrics and APM data in a centralised platform.

Realising your benefits

To be a leader in the observability space is a journey. As I mentioned in previous posts, observability is not simply achieved by deploying a tool. It starts with architecture and design to ensure the solution adheres to best practices and can scale and grow as the business needs it to. It then extends to ingesting all the right data, formatted and stored in a way that facilitates efficient correlations and workflows. Then all the other backend and frontend pieces need to fall into place, such as retention management, alerting, security, machine learning and so on.

LSD has been deploying observability solutions for our customers for many years and we help accelerate their journey through our battle-tested solutions and experience in deploying and implementing these solutions. Please follow this link to learn more.

Doug Moll

Doug Moll is a solution architect at LSD, focusing on Observability and Event-Streaming for cloud native platforms.

What is Observability? (Part 2)


Posted on October 28, 2022 by Doug Moll
INSIGHTS

In Part 1, I introduced Observability and what we are looking to achieve with its implementation. In this post, I will discuss the most critical aspects of a good Observability solution that together will help businesses reach the Observability goals I described in Part 1.

What makes up a good Observability solution?

Elements of Observability

Scale

So far, I have discussed that the solution needs support for logs, metrics, traces and other data types. This is very important; equally important, however, is the ability to process this data and deliver results and insights in real time. The scalability of the platform is therefore also of crucial importance.

Integration

Data is often collected by means of one or more agents. A good solution typically provides many out-of-the-box integrations with data sources, as well as the ability to ingest custom sources. The ability to manage a fleet of agents centrally can also simplify the operational burden significantly.

Correlation

I have mentioned the ability to effectively correlate between data sources as being important. There are different ways of achieving this; however, I find the most effective to be a common schema shared between sources.
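
To make the idea of a shared schema concrete, here is a minimal Python sketch. The field names (trace.id, service.name) and the normalise_* helpers are purely illustrative assumptions, not any particular product’s schema; the point is only that once different sources emit the same field names, correlation becomes a simple join.

```python
# Illustrative only: hypothetical field names loosely inspired by common-schema
# conventions. Two sources (an APM span and an application log line) are
# normalised so that they share "trace.id" and "service.name".

def normalise_apm_span(span: dict) -> dict:
    """Map a hypothetical APM span into the shared schema."""
    return {
        "source": "apm",
        "service.name": span["service"],
        "trace.id": span["traceId"],
        "duration.ms": span["durationMs"],
    }

def normalise_log_line(line: dict) -> dict:
    """Map a hypothetical structured log line into the shared schema."""
    return {
        "source": "logs",
        "service.name": line["svc"],
        "trace.id": line["trace"],
        "message": line["msg"],
    }

span = normalise_apm_span({"service": "payments", "traceId": "abc123", "durationMs": 2400})
log = normalise_log_line({"svc": "payments", "trace": "abc123", "msg": "timeout calling card gateway"})

# Because both events now carry the same "trace.id", pivoting from a slow
# span to its logs is a lookup rather than a manual hunt across tools.
assert span["trace.id"] == log["trace.id"]
```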

Visualisation and navigation

A key component of an observability solution is a way to effectively visualise data and provide navigational tools so that users can intuitively navigate and analyse it to determine the root cause of problems. This is the user interface that facilitates correlations, and it is often driven through dashboards which either allow the analyst to easily correlate between sources or visualise data from different sources in the same dashboard. Having the ability to create custom dashboards also adds a lot of value to a solution. Different audiences have different requirements for what they want to view, and customisation allows dashboards to be built around what is directly important to each audience.

Alerting

An effective and intuitive alerting framework is the next feature which adds a lot of value to a solution. In its simplest form, alerting rules are configured based on static thresholds, with an email sent when a threshold is exceeded. Alerting can, however, be the bane of many operational teams, with alert fatigue being as big a problem as new conditions arising for which no rules have been defined. Furthermore, alerts should trigger a managed response and not just an email. This is often achieved through integrations with existing incident management systems. Properly unpacking alerting and how to solve problems such as alert fatigue is beyond the scope of this post, but a good solution will cater for this and provide methods for integration with existing incident management systems.
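
As a rough illustration of the “simplest form” described above, here is a minimal Python sketch of a static-threshold rule. The threshold value and the notify() function are assumptions made for the example; in practice the notification would be an email or an integration with an incident management system.

```python
# A minimal static-threshold alert rule. THRESHOLD_MS and notify() are
# illustrative stand-ins, not part of any real alerting product.

THRESHOLD_MS = 500  # hypothetical latency threshold for a checkout service

def notify(message: str) -> None:
    # In a real system this would send an email, page someone, or open a
    # ticket in an incident management system.
    print(f"ALERT: {message}")

def evaluate_latency_rule(latest_latency_ms: float) -> None:
    # Fire only when the latest observation exceeds the static threshold.
    if latest_latency_ms > THRESHOLD_MS:
        notify(f"checkout latency {latest_latency_ms:.0f}ms exceeded {THRESHOLD_MS}ms")

evaluate_latency_rule(820.0)  # fires an alert
evaluate_latency_rule(120.0)  # stays quiet
```

The weakness is exactly the one described above: every condition needs a hand-written rule, and a noisy threshold quickly produces alert fatigue.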

Machine Learning

Machine Learning is also playing an increasingly vital role in Observability, so much so that a good solution should incorporate some measure of it. This also goes a long way towards solving some of the common alerting problems detailed above. Anomaly detection based on machine learning learns, over time, what normal conditions look like in time series data. For example, it will learn that month-end cycles include many more transactions than is typically expected, and will therefore know that an influx of certain events over that period is normal, preventing alerts from being generated. Static rules cannot achieve this. Similarly, because it monitors for anomalies rather than fixed conditions, it needs no static rule in place to detect rare or anomalous events in log files and alert on them.
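
To contrast with the static rule above, here is a deliberately simple stand-in for machine-learning-based anomaly detection: the “normal” range is learned from recent history rather than hard-coded. Real observability products use far more sophisticated models that account for seasonality and calendar effects such as month-end; this sketch only illustrates the principle.

```python
from statistics import mean, stdev

def is_anomalous(history: list[float], latest: float, sigmas: float = 3.0) -> bool:
    """Flag a value that falls outside a baseline learned from recent history."""
    baseline = mean(history)
    spread = stdev(history) or 1.0  # avoid a zero-width band when history is flat
    return abs(latest - baseline) > sigmas * spread

# Transactions per minute observed recently (illustrative numbers).
transactions_per_minute = [980, 1010, 995, 1020, 1005, 990, 1015]

print(is_anomalous(transactions_per_minute, 1030))  # False: within the learned range
print(is_anomalous(transactions_per_minute, 4500))  # True: flagged, no static rule needed
```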

Distributed architectures

I previously mentioned the critical importance of observability in new distributed architectures and services. Moving workloads to the cloud has further complicated architectures and introduced many more moving parts deployed across hybrid environments, both on-premise and in the cloud, or even across multiple clouds. There are a number of factors to consider in this regard which determine what the right tool is.

Firstly, the capability to analyse and correlate data from all environments, regardless of where it originates, is important. This may entail a single platform hosted either in the cloud or on-premise, or multiple platforms hosted in different environments with the capability to seamlessly link the clusters in a way that makes navigating data across environments transparent to users.

There are multiple factors to consider when architecting an observability solution. Deep diving into these is not in the scope of this article, but it is vital that the deployment models offered by a solution optimally meet all the requirements of the business. This is typically achieved by solutions that are flexible, offer different deployment options, and are able to meet requirements both today and in the future.

Solution management and support

When considering deployment models, it is also important to evaluate the operational effort to manage your observability solution. In other words, is it properly orchestrated according to modern architectural standards? How easy is it to scale or upgrade? How easy is it to support? If any of these factors fall outside of the capabilities of the team supporting and maintaining the Observability solution, consider a managed service approach and let Observability experts take care of it.

Security

Security is critical, with modern standards requiring encryption, role-based access control (RBAC), authentication against the identity provider of choice (such as AD), and so on. Additionally, there are sometimes requirements to secure data down to a field or attribute level. A good solution will cater for enterprise-level security standards.
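
To illustrate what field-level security means in practice, here is a minimal Python sketch. The roles, field lists and redact() helper are all hypothetical; in a real solution this enforcement happens inside the platform itself, driven by its RBAC configuration.

```python
# Hypothetical field-level access control: each role may only see certain fields.
ROLE_VISIBLE_FIELDS = {
    "analyst": {"service.name", "trace.id", "message"},
    "auditor": {"service.name", "trace.id", "message", "user.account_number"},
}

def redact(event: dict, role: str) -> dict:
    """Return only the fields the given role is allowed to see."""
    allowed = ROLE_VISIBLE_FIELDS.get(role, set())
    return {k: v for k, v in event.items() if k in allowed}

event = {
    "service.name": "payments",
    "trace.id": "abc123",
    "message": "payment authorised",
    "user.account_number": "12345678",  # sensitive field
}

print(redact(event, "analyst"))  # the account number is hidden
print(redact(event, "auditor"))  # the auditor sees the full event
```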

Resilience

How different Observability solutions achieve high availability (HA) varies, and the deployment methods supported by a solution also impact the level of HA achievable. For example, a deployment in Kubernetes can take advantage of all the self-healing capabilities inherently available in such a platform. The targeted environment for the deployment also has an impact: deploying to the cloud would, for example, allow distribution across availability zones or even regions. There are also considerations around Disaster Recovery environments, should there be a requirement, and exactly how those may be supported by the solution. Without unpacking all the factors involved, careful consideration should be given to this decision. A good solution will offer you the flexibility to decide on the level of HA required, depending on your deployment destination and method of choice.

Costing model

It is not my intention to deep dive into the costing models of different solutions. Focusing on what makes a good solution, it must however be factored in that a solution will not be effective if cost prevents it from ingesting all the data required. Costing models should be carefully evaluated based on current and future states. Many businesses find themselves in a position where costs are manageable in the beginning but then quickly spiral out of control as soon as features are added or the solution starts scaling.

Skills and Knowledge

Finally, the implementation of the Observability platform decided on is vital. The term ‘Observability’ is just a label printed on the tin of an Observability tool. Installing an Observability product does not mean that a business now has observability. It needs to be deployed and implemented in the right way to be effective. Finding experienced practitioners who understand the ins and outs of how to do this is also key to success.

The above is by no means a comprehensive list of attributes, but to me they are the more important ones to consider.

Where is Observability heading?

Observability is a vital component in modern distributed architectures and I do not see this changing any time soon. Observability solutions will keep expanding in terms of environment and data source coverage, as well as capabilities that help push the Mean Time To Resolve (MTTR) as close to zero as possible. The use of Machine Learning technologies is becoming ever more prominent and I see this continuing to evolve and provide more efficient ways to predict, detect and resolve issues. There is also a lot of work going into scale, with a drive to accommodate more and more data with improved efficiency and cost. I can also see solutions moving to incorporate more automation in the resolution process. Observability solutions simply have to grow and evolve as the world they are trying to gain insights from keeps changing.

Learn more about Observability by reading this blog post by Mark Billett, an Observability engineer at LSD.

If you would like to know more about Observability or a Managed Observability Platform, check out our page.

Doug Moll

Doug Moll is a solution architect at LSD, focusing on Observability and Event-Streaming for cloud native platforms.

What is Observability? (Part 1)


Posted on October 21, 2022 by Doug Moll
INSIGHTS

In part one of this two-part series of posts, I’ll be discussing my views on the fundamentals and key elements of Observability, as opposed to a technical deep dive. There are many great resources out there which already take a closer look at the key concepts. First off, let’s look at what Observability is.

What is Observability?

The CNCF defines Observability as “the capability to continuously generate and discover actionable insights based on signals from the system under observation”.

Essentially the goal of Observability is to detect and fix problems as fast as possible. In the world of monolithic apps and older architectures, monitoring was often enough to accomplish this goal, but with the world moving to distributed architectures and microservices, it is not always obvious why a problem has occurred by merely monitoring an isolated metric which has spiked.

This is where observability becomes a necessity. With observability basically being a measure of how well the internal state of a system can be understood based on its signals, it stands to reason that all the right data is needed! In a distributed system the right data is typically regarded to be logs, metrics and application traces, often referred to as the “three pillars of observability”.

While these are the generally agreed-upon key indicators, it is important in my view to also include user experience data, uptime data and synthetic data to provide an end-to-end observable system.

The analyst’s ability to then gain the relevant insights from this data to detect and fix root cause events in the quickest and most efficient way possible is the measure of how effectively observability has been implemented for the system.

There are a number of aspects which can determine the success of your observability efforts, some of which bear more weight than others. There are also tons of observability tools and solutions to choose from. What is fairly typical amongst customers that LSD engages with is that they have numerous tools in their stable but have not achieved their goals in terms of observability, and therefore haven’t achieved the desired state.

Let’s explore this a bit more by looking at what the desired state may look like.

What is the desired state?

This is best explained by looking at an example: A particular service has a spike in latency which is likely picked up through an alert. How does an analyst go from there to determine the root cause of the latency spike?

Firstly, the analyst may want to trace the transaction causing the latency spike. For this, they would analyse the full distributed trace of the high-latency events. Having identified the transaction, the analyst still does not know the root cause. Some clues may lie in the metrics of the host or container it ran in, so that may be the next course of action. The root cause is mostly determined in the logs, so ultimately the analyst would want to analyse the logs for the specific transaction in question.

The above scenario is fairly simple; achieving it in the most efficient way, however, relies on the ability to optimally correlate between logs, metrics and traces.

Proper correlation means being able to jump directly from a transaction in a trace to the logs for that specific transaction, or being able to jump directly to the metrics of the container it ran in. To me, the most effective way to achieve this is for all the logs, metrics and traces to exist in the same observability platform and to share the same schema.
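
To illustrate what that pivot looks like once everything shares a schema, here is a minimal Python sketch. The in-memory log_store and the trace.id field are assumptions standing in for the observability platform’s query layer; the point is that “show me the logs for this transaction” reduces to a filter on a shared identifier.

```python
from typing import Iterable

def logs_for_trace(log_store: Iterable[dict], trace_id: str) -> list[dict]:
    """Return every log event that carries the given trace id."""
    return [event for event in log_store if event.get("trace.id") == trace_id]

# Stand-in for the platform's log index (illustrative events only).
log_store = [
    {"trace.id": "abc123", "service.name": "payments", "message": "card gateway timeout"},
    {"trace.id": "def456", "service.name": "payments", "message": "payment authorised"},
]

# The analyst jumps straight from the slow trace to its logs:
for event in logs_for_trace(log_store, "abc123"):
    print(event["message"])
```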

In the digital age, customers want a flawless experience when interacting with businesses. Let’s look at a bank, for example. There is no room for error when a service directly interacts with a customer’s finances, so when an online banking service goes down for three days (it happens), the bank will lose customers or at least suffer reputational damage.

The ultimate goal is to detect and fix root cause events as quickly and efficiently as possible, and in this, the approach of using multiple tools fails.

In part two of this series, I will discuss the most critical factors which contribute to a good Observability solution that will help businesses reach the goals set out above.


Learn more about Observability by reading this blog post by Mark Billett, an Observability engineer at LSD.

If you would like to know more about Observability or a Managed Observability Platform, check out our page.

Doug Moll

Doug Moll is a solution architect at LSD, focusing on Observability and Event-Streaming for cloud native platforms.
