What are some common topics covered by observability experts in their content?

Observability experts often discuss various aspects such as observability-driven development (ODD), site reliability engineering (SRE), system monitoring, debugging techniques, and best practices for maintaining resilient systems.

How do observability blogs contribute to the understanding of complex systems?

Observability-focused blogs offer insights into understanding complex systems by providing articles, tutorials, and guides. They cover topics ranging from troubleshooting methodologies to implementing observability tools and techniques effectively.

What is the significance of following experts and blogs in the observability space?

Following observability experts and blogs offers a broad perspective on the evolving landscape of monitoring and observability. It helps in staying updated with industry trends, best practices, and innovative approaches in maintaining system reliability.

In what ways do observability experts contribute to the development community?

Observability experts contribute by sharing their knowledge through talks, interviews, podcasts, articles, and discussions. They provide valuable insights, practical advice, and thought leadership in improving system observability and performance.

How do observability-focused blogs aid in the adoption of observability practices?

Observability-centric blogs play a crucial role in aiding the adoption of observability practices by providing accessible content. They offer resources suitable for various skill levels, helping engineers and teams understand, implement, and optimize observability.

15 Observability-driven Development Experts and Blogs to Follow

At ETEAM, we love learning about new practices and technologies. We’ve been following the evolution of observability since the early days of monitoring and believe that it should be included in all stages of the development lifecycle.

We’ve not only written about observability-driven development (ODD) in detail, but we also use it to improve performance, deploy more frequently, and cut change failure rates threefold. We’ve seen that ODD can reduce incidents and downtime, because applications are built to be resilient from the start, and that it can improve business outcomes.

Our goal is to help engineering teams use this new approach to their advantage. We’ve put together some of the best resources around the web, including a list of experts who are redefining observability and using it to solve challenges in unique ways.

Whether you are a newcomer to observability in custom software development or a veteran, following these experts and blogs will give you insights that you can bring back to your own team and company.

Observability experts you need to follow

1. Charity Majors

Co-founder and CTO of Honeycomb, author, podcaster, and Ops engineer, Charity Majors wears many hats, all related to improving development and delivery cycles through observability.

Why follow: Charity Majors is one of the pioneering voices in modern observability. With a career history spanning Parse and Facebook, she is well-placed to diagnose and discuss the challenges of managing and maintaining complex distributed systems at scale. She has co-authored O’Reilly books such as “Observability Engineering: Achieving Production Excellence”.

Through her talks, interviews, and blog content she highlights the importance of developers taking responsibility for what happens to their code once it hits production. Observability empowers developers to own the full cycle of their code and take a proactive approach to many of the issues faced by complex applications.

Talks about: observability-driven development, site reliability engineering (SRE), database reliability engineering (DBRE), monitoring complex distributed systems, building high-performing development teams

Channels: Personal blog | LinkedIn | X

2. Cindy Sridharan

Known for her book "Distributed Systems Observability," Cindy Sridharan is an author and expert specializing in strategies for building resilient systems and maintainable services regardless of size and load.

Why follow: With a background in infrastructure and API development, Cindy Sridharan’s writings and expertise in distributed systems make her a figure to follow in the observability space. She runs the Prometheus user group and has been on the committee of several leading industry conferences on systems engineering.

From how monitoring has changed in the age of cloud-native to gaining better visibility into system behavior, Cindy is a great resource for learning about the different aspects of observability for developers. If you’re looking for insights to help you choose the best observability strategy for your distributed system, her blog and book are a great place to start.

Talks about: observability for large-scale cloud services, distributed tracing, zero downtime deployments, testing microservices, testing in production, API development

Channels: Personal blog | X

3. Jaana Dogan

Currently a Distinguished Software Engineer at GitHub, Jaana Dogan previously worked at AWS Observability and Google, where she focused on the observability of Go production services.

Why follow: Based on her extensive experience building developer platforms and tools, Jaana Dogan provides an in-depth perspective on how to make systems observable and performant, including optimization techniques, infrastructure considerations, and monitoring tools.

On her blog and Medium, you can find step-by-step tutorials and opinion articles covering everything from working with Go to collecting metrics for observability and best practices for configuration and release management. She also shares insights into the collaboration between engineering teams and how to foster effective development and operations.

Talks about: developer tools, data observability, system performance, system health and debugging, microservices, Go, engineering culture and practices

Channels: Personal blog | Medium | X

4. Liz Fong-Jones

Developer Advocate and Field CTO at Honeycomb, Liz Fong-Jones is another observability veteran who previously worked on products like the Google Cloud Load Balancer.

Why follow: Liz Fong-Jones is an active contributor to the observability community, with numerous talks, publications, videos, and podcasts, mostly focused on site reliability engineering (SRE) and taming complex, distributed systems.

If you are interested in observability beyond just tools and technology, you can follow Liz for insights on how this paradigm is changing who is involved in production, how engineers collaborate, and how they measure success. She also advocates for ethical tech practices, emphasizing the importance of building healthy engineering cultures.

Talks about: observability engineering, troubleshooting and monitoring, SRE, sustainable operations, teamwork, diversity and inclusion in tech, ethical considerations

Channels: Personal blog | LinkedIn

5. Yan Cui

Also known as “The Burning Monk” based on his blog’s name, Yan Cui is an AWS Serverless Hero and independent consultant whose videos, courses, and workshops focus on combining serverless and observability.

Why follow: If you’re not a fan of long technical ebooks and articles, Yan Cui’s YouTube channel provides a host of videos on monitoring, tracing, and debugging techniques specific to serverless applications. He also hosts the “Real-world serverless” podcast and conducts interviews with observability experts.

As an AWS professional, he often provides in-depth tutorials, code samples, and guidance on leveraging AWS services and tools. In his content, he explores ways to gain visibility into serverless systems for better understanding and troubleshooting.

Talks about: observability in serverless architectures, building serverless architectures in AWS, serverless case studies, improving system performance, AI in DevOps

Channels: Personal blog | YouTube | LinkedIn

Blogs and articles on observability you should bookmark

Beginner-friendly

Observability is all about cutting through the complexity to understand how your systems, services, and apps behave and why they act this way. However, the process itself might seem complex, especially for a beginner.

These resources cut through the buzzwords to explain related concepts straightforwardly.

6. o11y.wiki

O11y is the abbreviation for observability and also the name of this wiki covering all the important terms in the field, starting from A to Z. The glossary covers all the definitions you need to know as you start on this journey, from basic concepts like what an alert or a log is to more specific use cases like tail sampling.
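To make one of those glossary terms concrete, here is a minimal sketch of the idea behind tail sampling: the decision to keep or drop a trace is made after the trace has finished, so it can depend on the outcome. The `Trace` class, the `keep_trace` function, and the thresholds are hypothetical names and values chosen for illustration, not part of any particular tool.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Trace:
    """A finished trace, reduced to the fields this sketch needs (hypothetical)."""
    trace_id: str
    duration_ms: float
    has_error: bool


def keep_trace(trace: Trace, slow_ms: float = 1000.0, baseline_rate: float = 0.1) -> bool:
    """Decide whether to keep a completed trace.

    Assumed policy: always keep errors and slow traces, and keep
    only a fraction (baseline_rate) of the unremarkable ones.
    """
    # Interesting traces are always kept.
    if trace.has_error or trace.duration_ms >= slow_ms:
        return True
    # For the rest, hash the trace ID into [0, 1) so the same trace
    # always gets the same decision, then compare with the rate.
    digest = hashlib.sha256(trace.trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < baseline_rate
```

Hashing the trace ID instead of drawing a random number keeps the decision repeatable, which can help when several collectors see the same trace.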

7. ETEAM Blog

At ETEAM, we don’t only enjoy learning new things, but we also love to give back. So we started a series of easy-to-understand articles for newcomers. They begin with general concepts like what observability is and how it compares to Application Performance Monitoring (APM). They progress to more niche topics like combining observability-driven development and security.

Observability best practices

As engineers, we need to ensure custom software development meets business and user needs, while the application is also performing at its peak. Here is where best practices come in.

These blogs cater to varying levels of skill, including experienced DevOps professionals, covering tried-and-tested methods, advanced strategies, and in-depth analysis of observability.

8. Honeycomb Blog

One of the well-regarded voices in the industry, Honeycomb’s blog covers a wide range of best practices, including how to avoid alert fatigue by properly configuring alerts and how to instrument code to generate meaningful telemetry data.

In the series called Ask Miss O11y, readers can send in questions related to observability best practices and the challenges they face implementing them.

9. New Relic Blog

Focused on best practices for troubleshooting and improving software performance, New Relic’s blog is another great resource you can check out.

They recently launched “The Expert Observability series”, a collection of articles where New Relic engineers share tips and real-world scenarios of how they perfected their techniques.

Observability tools and technologies

An important part of implementing observability is not only choosing the right tools but also configuring them correctly and making sure they fit with your tech stack.

A lot of monitoring platforms publish tutorials on how you can implement observability using their solution or complementary ones.

10. Dynatrace Blog

In addition to articles focused on using Dynatrace for observability purposes, you can also find insights into the toolkit ecosystem at large, such as the difference between observability platforms and observability tools. You can get a good grasp of AIOps tools and advancements in AI-driven observability, a growing trend in the industry.

11. ITNEXT on Medium

ITNEXT is a publication on Medium featuring blog articles on next-gen technologies. While not exclusively focused on observability, it does include guides and tutorials on working with tools and standards such as OpenTelemetry, Prometheus, and Service-Level Objective (SLO) generators like Sloth.
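For readers new to SLOs, the arithmetic behind them is small enough to sketch. The example below is not how Sloth or any other generator works internally; it only illustrates, under assumed names (`error_budget`, `burn_rate`), what an availability target implies for allowed failures and how fast that allowance is being used.

```python
def error_budget(slo_target: float, total_requests: int) -> float:
    """Number of requests that may fail in the window without breaking the SLO."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return (1.0 - slo_target) * total_requests


def burn_rate(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Observed error rate divided by the error rate the SLO allows.

    A value near 1.0 means the budget is being used at exactly the
    pace the SLO permits; higher values mean it will run out early.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests == 0:
        return 0.0  # no traffic, so nothing has been burned
    allowed_error_rate = 1.0 - slo_target
    observed_error_rate = failed_requests / total_requests
    return observed_error_rate / allowed_error_rate
```

With a 99.9% target and one million requests, the budget works out to roughly 1,000 failed requests; 2,000 failures in that window would be a burn rate of about 2.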

Industry-specific insights

Each industry not only faces specific risks but also requires a different approach when it comes to what you are monitoring and optimizing for. While financial services might emphasize low latency, healthcare might prioritize data accuracy.

There are a lot of great resources out there to help you delve deeper into industry-specific use cases on observability.

12. Splunk Blog

Besides general observability topics, Splunk's blog also includes content about observability in various sectors, such as finance, healthcare, and the public sector.

Their podcast The Security Detail broadens the discussion on using better system visibility to understand and fight threats across different verticals.

13. Cisco AppDynamics Blog

If you’re wondering how observability practices apply in different domains, the AppDynamics blog discusses industry use cases in higher education and public services, and also how observability can improve user experience as a whole.

Digital Experience Monitoring is an emerging field that analyzes the quality of user interactions to optimize end-to-end experience.

Emerging trends and the future of observability

Observability has grown rapidly over the past few years, and more changes are yet to come. Industry experts offer thoughtful and often controversial predictions on how observability and related technologies will evolve and impact business.

Here are just two examples.

14. APM Digest

APM Digest covers a wide range of topics related to the future of application performance management, also publishing articles on observability practices and emerging trends.

Gathering insights from analysts, consultants, and vendors, they put together an annual list of predictions covering IT performance and observability topics.

15. Grafana Labs Blog

For the past two years, Grafana Labs has been surveying the state of observability and publishing the results on their blog.

Based on feedback from hundreds of industry practitioners, the report highlights tools and data sources, market maturity, and future priorities in the field.

Conclusion

We are all trying to keep our finger on the pulse of application performance monitoring and observability, and at ETEAM we are no different. We hope this curated list will provide you with a useful overview, from beginner-friendly articles, best practices, and future trends to some of the most active voices in the industry.

While systems spread across multi-cloud environments and services make incident response even more complicated, they also emphasize the importance of full-stack visibility. Faster troubleshooting and the ability to easily pinpoint the cause of an incident are invaluable to smooth business operations, making observability a priority.