At ETEAM, we love learning about new practices and technologies. We’ve been following the evolution of observability since the early days of monitoring and believe that it should be included in all stages of the development lifecycle.
We’ve not only written about observability-driven development (ODD) in detail, but we also use it to improve performance, deploy more frequently, and reduce change failure rates threefold. We’ve seen that ODD can cut incidents and downtime by making applications resilient from the start, and that it can improve business outcomes.
Our goal is to help engineering teams use this approach to their advantage. We’ve put together some of the best resources around the web, including a list of experts who are redefining observability and using it to solve challenges in unique ways.
Whether you are a newcomer to observability in custom software development or a veteran, following these experts and blogs will give you insights that you can bring back to your own team and company.
Observability experts you need to follow
1. Charity Majors
Co-founder and CTO of Honeycomb, author, podcaster, and Ops engineer, Charity Majors wears many hats, all related to improving development and delivery cycles through observability.
Why follow: Charity Majors is one of the pioneering voices in modern observability. With a career history spanning Parse and Facebook, she is well-placed to diagnose and discuss the challenges of managing and maintaining complex distributed systems at scale. She has also co-authored O’Reilly books such as “Observability Engineering: Achieving Production Excellence”.
Through her talks, interviews, and blog content she highlights the importance of developers taking responsibility for what happens to their code once it hits production. Observability empowers developers to own the full cycle of their code and take a proactive approach to many of the issues faced by complex applications.
Talks about: observability-driven development, site reliability engineering (SRE), database reliability engineering (DBRE), monitoring complex distributed systems, building high-performing development teams
Channels: Personal blog | LinkedIn | X
2. Cindy Sridharan
Known for her book "Distributed Systems Observability," Cindy Sridharan is an author and expert specializing in strategies for building resilient systems and maintainable services regardless of size and load.
Why follow: With a background in infrastructure and API development, Cindy Sridharan’s writings and expertise in distributed systems make her a figure to follow in the observability space. She runs the Prometheus user group and has been on the committee of several leading industry conferences on systems engineering.
From how monitoring has changed in the age of cloud-native to gaining better visibility into system behavior, Cindy is a great resource for learning about the different aspects of observability for developers. If you’re looking for insights to help you choose the best observability strategy for your distributed system, her blog and book are a great place to start.
Talks about: observability for large-scale cloud services, distributed tracing, zero downtime deployments, testing microservices, testing in production, API development
Channels: Personal blog | X
3. Jaana Dogan
Currently a Distinguished Software Engineer at GitHub, Jaana Dogan previously worked at AWS Observability and Google, where she focused on the observability of Go production services.
Why follow: Based on her extensive experience building developer platforms and tools, Jaana Dogan provides an in-depth perspective on how to make systems observable and performant, including optimization techniques, infrastructure considerations, and monitoring tools.
On her blog and Medium, you can find step-by-step tutorials and opinion articles covering everything from working with Go to collecting metrics for observability and best practices for configuration and release management. She also shares insights into the collaboration between engineering teams and how to foster effective development and operations.
Talks about: developer tools, data observability, system performance, system health and debugging, microservices, Go, engineering culture and practices
Channels: Personal blog | Medium | X
4. Liz Fong-Jones
Developer Advocate and Field CTO at Honeycomb, Liz Fong-Jones is another observability veteran who previously worked on products like the Google Cloud Load Balancer.
Why follow: Liz Fong-Jones is an active contributor to the observability community, with numerous talks, publications, videos, and podcasts, mostly focused on site reliability engineering (SRE) and taming complex, distributed systems.
If you are interested in observability beyond just tools and technology, you can follow Liz for insights on how this paradigm is changing who is involved in production, how engineers collaborate, and how they measure success. She also advocates for ethical tech practices, emphasizing the importance of building healthy engineering cultures.
Talks about: observability engineering, troubleshooting and monitoring, SRE, sustainable operations, teamwork, diversity and inclusion in tech, ethical considerations
Channels: Personal blog | LinkedIn
5. Yan Cui
Also known as “The Burning Monk” based on his blog’s name, Yan Cui is an AWS Serverless Hero and independent consultant whose videos, courses, and workshops focus on combining serverless and observability.
Why follow: If you’re not a fan of long technical ebooks and articles, Yan Cui’s YouTube channel provides a host of videos on monitoring, tracing, and debugging techniques specific to serverless applications. He also hosts the “Real-world serverless” podcast and interviews observability experts.
As an AWS professional, he often provides in-depth tutorials, code samples, and guidance on leveraging AWS services and tools. In his content, he explores ways to gain visibility into serverless systems for better understanding and troubleshooting.
Talks about: observability in serverless architectures, building serverless architectures in AWS, serverless case studies, improving system performance, AI in DevOps
Channels: Personal blog | YouTube | LinkedIn
Blogs and articles on observability you should bookmark
Beginner-friendly
Observability is all about cutting through the complexity to understand how your systems, services, and apps behave and why they act this way. However, the process itself might seem complex, especially for a beginner.
These resources cut through the buzzwords to explain related concepts straightforwardly.
6. o11y.wiki
O11y is the abbreviation for observability and also the name of this wiki, which covers the important terms in the field from A to Z. The glossary gives you the definitions you need as you start on this journey, from basic concepts like what an alert or a log is to more specific techniques like tail sampling.
7. ETEAM Blog
At ETEAM, we don’t only enjoy learning new things, but we also love to give back. So we started a series of easy-to-understand articles for newcomers. They begin with general concepts like what observability is and how it compares to Application Performance Monitoring (APM). They progress to more niche topics like combining observability-driven development and security.
Observability best practices
As engineers, we need to ensure that custom software development meets business and user needs while the application performs at its peak. This is where best practices come in.
These blogs cater to varying levels of skill, including experienced DevOps professionals, covering tried-and-tested methods, advanced strategies, and in-depth analysis of observability.
8. Honeycomb Blog
One of the well-regarded voices in the industry, Honeycomb’s blog covers a wide range of best practices, including how to avoid fatigue by properly configuring alerts and instrumenting code to generate meaningful telemetry data.
In the series called Ask Miss O11y, readers can send in questions related to observability best practices and the challenges they face implementing them.
Recommended articles:
- Ask Miss O11y: As a Developer, How Can I Try Out Observability?
- Ask Miss O11y: How Can I Convince My Organization to Invest in Instrumenting for Observability?
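As a rough illustration of what "instrumenting code to generate meaningful telemetry data" can look like, the sketch below wraps a unit of work and emits one structured event with its duration, outcome, and any context fields. It uses only the Python standard library. The names `emit_event` and `instrumented` and the fields `user_id` and `cart_size` are hypothetical and are not part of Honeycomb's or any other vendor's API.

```python
import json
import time
from contextlib import contextmanager


def emit_event(event, sink=print):
    """Serialize one event as a JSON line; a real setup would ship it to a backend."""
    line = json.dumps(event, sort_keys=True)
    sink(line)
    return line


@contextmanager
def instrumented(name, sink=print, **context):
    """Time a unit of work and emit a single event describing it."""
    event = {"name": name, **context}
    start = time.perf_counter()
    try:
        yield event  # the caller can attach more fields while the work runs
        event["status"] = "ok"
    except Exception as exc:
        # Record the failure on the event, then let the error propagate.
        event["status"] = "error"
        event["error"] = type(exc).__name__
        raise
    finally:
        event["duration_ms"] = round((time.perf_counter() - start) * 1000, 3)
        emit_event(event, sink)


if __name__ == "__main__":
    with instrumented("checkout", user_id="u-123") as event:
        event["cart_size"] = 3
```

Emitting one event per unit of work, with context attached, is only one possible design; it keeps related fields together so they can be queried side by side later.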
9. New Relic Blog
Focused on best practices for troubleshooting and improving software performance, New Relic’s blog is another great resource you can check out.
They recently launched “The Expert Observability series”, a collection of articles where New Relic engineers share tips and real-world scenarios of how they perfected their techniques.
Observability tools and technologies
An important part of implementing observability is not only choosing the right tools but also configuring them correctly and making sure they fit with your tech stack.
A lot of monitoring platforms publish tutorials on how you can implement observability using their solution or complementary ones.
10. Dynatrace Blog
In addition to articles focused on using Dynatrace for observability purposes, you can also find insights into the toolkit ecosystem at large, such as the difference between observability platforms and observability tools. You can get a good grasp of AIOps tools and advancements in AI-driven observability, a growing trend in the industry.
11. ITNEXT on Medium
ITNEXT is a publication on Medium featuring blog articles on next-gen technologies. While not exclusively focused on observability, it does include guides and tutorials on working with tools and standards such as OpenTelemetry, Prometheus, and Service-Level Objective (SLO) generators like Sloth.
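For readers new to SLOs, the arithmetic behind an error budget is simple enough to sketch. The example below assumes a request-based availability SLO; the function names and the numbers are illustrative only and are not taken from Sloth or any other tool.

```python
def error_budget(slo_target, total_requests):
    """Number of requests allowed to fail in the window without breaching the SLO."""
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be a fraction between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return (1 - slo_target) * total_requests


def budget_remaining(slo_target, total_requests, failed_requests):
    """Fraction of the error budget still unspent (negative once the SLO is breached)."""
    budget = error_budget(slo_target, total_requests)
    return 1 - failed_requests / budget


if __name__ == "__main__":
    # Hypothetical window: 99.9% availability target, 1,000,000 requests, 250 failures.
    print(round(error_budget(0.999, 1_000_000)))
    print(round(budget_remaining(0.999, 1_000_000, 250), 2))
```

With these hypothetical numbers, the budget is roughly 1,000 failed requests, so 250 failures spend about a quarter of it.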
Industry-specific insights
Each industry not only faces specific risks but also requires a different approach when it comes to what you are monitoring and optimizing for. While financial services might emphasize low latency, healthcare might prioritize data accuracy.
There are a lot of great resources out there to help you delve deeper into industry-specific use cases on observability.
12. Splunk Blog
Alongside general observability topics, Splunk's blog also includes content about observability in various sectors, such as finance, healthcare, and the public sector.
Their podcast The Security Detail broadens the discussion on using better system visibility to understand and fight threats across different verticals.
Recommended articles:
- Observability for the Public Sector: Greater Visibility for a More Resilient Digital Future
- The Security Detail Podcast: Exploring Cyber Threats Across Different Industries
13. Cisco AppDynamics Blog
If you’re wondering how observability practices apply in different domains, the AppDynamics blog discusses industry use cases in higher education and public services, and also how observability can improve user experience as a whole.
Digital Experience Monitoring is an emerging field that analyzes the quality of user interactions to optimize end-to-end experience.
Emerging trends and the future of observability
Observability has grown rapidly over the past few years, and more changes are yet to come. Industry experts offer thoughtful and often controversial predictions on how observability and related technologies will evolve and impact business.
Here are just two examples.
14. APM Digest
APM Digest covers a wide range of topics related to the future of application performance management, also publishing articles on observability practices and emerging trends.
Gathering insights from analysts, consultants, and vendors, they put together an annual list of predictions covering IT performance and observability topics.
Recommended articles:
- 2024 Application Performance Management Predictions - Part 2: Observability
- 2024 Application Performance Management Predictions - Part 3: Observability
15. Grafana Labs Blog
For the past two years, Grafana Labs has been surveying the state of observability and publishing the results on their blog.
Based on feedback from hundreds of industry practitioners, the report highlights tools and data sources, market maturity, and future priorities in the field.
Conclusion
We are all trying to keep our finger on the pulse of application performance monitoring and observability, and at ETEAM we are no different. We hope this curated list will provide you with a useful overview, from beginner-friendly articles, best practices, and future trends to some of the most active voices in the industry.
While systems spread across multi-cloud environments and services make incident response even more complicated, they also emphasize the importance of full-stack visibility. Faster troubleshooting and the ability to easily pinpoint the cause of an incident are invaluable to smooth business operations, making observability a priority.