Caddy 0.11 Will Have Telemetry
Today, we are formally announcing that the next major release of Caddy, version 0.11, will feature an integrated telemetry client. We're doing this in the interest of transparency (Caddy is an open source project, after all), so that you can know what to expect and know how to offer feedback or ask questions before the release. We're really excited about this feature and hope you will be, too.
Telemetry is a mechanism used by many software projects including Mozilla Firefox, Microsoft Windows, Canonical's Ubuntu, Google Chrome, and others, to report data for diagnostic, performance measurement, future development, and debugging purposes. It is used to gain insights on usage of the software and to guide technical, often-nuanced development decisions. For example, Firefox telemetry is frequently cited as a source when deprecating old TLS versions and observing HTTPS deployment across the Web. It is also used to help Mozilla developers know how to improve the software's stability and performance, and where to focus their development efforts.
We're really interested in the potential Caddy telemetry affords everyone: its developers, its users, and the broader Internet community. In this post, we'll explain the history and plans for Caddy telemetry, and openly invite your feedback and questions before the release. (Please read this whole post before discussing!)
The Caddy Telemetry Project
Caddy Telemetry was initially announced in January of this year, as an academic course that I got permission to work on under faculty mentorship for university credit. The motivation is three-fold:
Client-side scans of the Internet are not uncommon. They probe servers from a client-side perspective and collect information about server configurations, open ports, and even some vulnerabilities. However, there is no good, global data set that observes the behavior and configuration of client software on the Internet from a server perspective.
There is virtually no understanding of how Caddy is currently used or how well it is working. All we have is an estimated download count (currently 500k+ from the website, 1m+ from GitHub, and 5m+ Docker pulls, by the way). Most development decisions have been based largely on speculation or single-user feedback. We desire greater insight into Caddy's use in order to make technical and usability improvements and to improve its security and reliability.
We want to help server operators have resources to improve the reliability and security of their websites. Most monitoring tools operate from the outside, observing the process externally. Logs help, but they are often unstructured, and even if structured, can be difficult to configure and manage because logging tools are hardly specialized for the program they're monitoring. Caddy Telemetry understands Caddy from the inside, creating several interesting possibilities (we'll describe some below).
In essence, we set out to develop a system that could provide the first-ever, near-real-time, global insight into the health and status of the Internet from a server-side perspective. And that's precisely what Caddy Telemetry can do, without being constrained to any single proprietary network or infrastructure.
For the first few months, we called (and called again) on Internet and security researchers and anyone else to get involved by offering their feedback and expressing any research goals they might have. The process was documented openly as work progressed, with code pushed to a public branch fairly early on. We want to be clear that this project was not developed behind closed doors: it was a very collaborative process. Thanks to the collective efforts of several people who participated in our Slack discussions, and with counsel from academic and research institutions, we've been able to get this up and running in just a few months while drawing from the experience of a diverse group of developers and field experts.
How it Works
The first time Caddy 0.11+ runs, it will generate a random, unique ID which it stores in a file called ~/.caddy/uuid. This ID is not generated based on any information in connection with you or your machine; it is just a standard UUID. It is stored with the same permissions as certificates and keys, but the UUID is not considered secret or cryptographically sensitive.
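To make the first-run behavior concrete, here is a minimal Go sketch of generating a random UUID and storing it in a file. It is not Caddy's actual code: the helper names newUUID and loadOrCreateID are made up for illustration, and the 0600/0700 permissions are our assumption about what "same permissions as certificates and keys" means.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// newUUID returns a random (version 4) UUID string. It is built only from
// random bytes; nothing about the user or the machine goes into it.
func newUUID() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // mark as version 4
	b[8] = (b[8] & 0x3f) | 0x80 // mark as RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

// loadOrCreateID (hypothetical helper) returns the ID stored at path, or
// generates and stores a new one if the file is missing or empty.
func loadOrCreateID(path string) (string, error) {
	if data, err := os.ReadFile(path); err == nil {
		if id := strings.TrimSpace(string(data)); id != "" {
			return id, nil
		}
	}
	id, err := newUUID()
	if err != nil {
		return "", err
	}
	// Assumed permissions: owner-only, like private key material.
	if err := os.MkdirAll(filepath.Dir(path), 0700); err != nil {
		return "", err
	}
	if err := os.WriteFile(path, []byte(id), 0600); err != nil {
		return "", err
	}
	return id, nil
}

func main() {
	// Use a temporary directory for the demo instead of ~/.caddy.
	dir, err := os.MkdirTemp("", "caddy-uuid-demo")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "uuid")
	first, err := loadOrCreateID(path)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	second, _ := loadOrCreateID(path) // second run reuses the stored ID
	fmt.Println(first, first == second)
}
```

The point of the sketch is that the ID stays stable across runs only because it is read back from the file, not because it is derived from anything on the machine.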
While Caddy is running, it keeps track of various metrics (mostly counts of things) and sends a payload which includes the UUID to the telemetry server via HTTPS. (Note that, in this case, Caddy is the client, not the server.) In response, the telemetry server tells the client the next time to send an update, which the client then schedules.
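The send-then-reschedule loop described above might look roughly like the following Go sketch. Everything about the wire format is an assumption on our part for illustration: the payload and reply types, their JSON field names, and the one-hour fallback are hypothetical, and a local test server stands in for the real telemetry server.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// payload is a hypothetical wire format: the instance UUID plus counts.
type payload struct {
	InstanceID string         `json:"instance_id"`
	Counts     map[string]int `json:"counts"`
}

// reply is a hypothetical server response that tells the client when to
// send its next update.
type reply struct {
	NextUpdateSeconds int `json:"next_update_seconds"`
}

// nextDelay turns the server's answer into a wait time, using fallback
// when the server gives no usable value.
func nextDelay(r reply, fallback time.Duration) time.Duration {
	if r.NextUpdateSeconds <= 0 {
		return fallback
	}
	return time.Duration(r.NextUpdateSeconds) * time.Second
}

// send posts one payload and returns how long to wait before the next one.
func send(client *http.Client, url string, p payload) (time.Duration, error) {
	body, err := json.Marshal(p)
	if err != nil {
		return 0, err
	}
	resp, err := client.Post(url, "application/json", bytes.NewReader(body))
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return 0, fmt.Errorf("telemetry server returned %s", resp.Status)
	}
	var r reply
	if err := json.NewDecoder(resp.Body).Decode(&r); err != nil {
		return 0, err
	}
	return nextDelay(r, time.Hour), nil
}

func main() {
	// A local HTTPS test server stands in for the telemetry server.
	srv := httptest.NewTLSServer(http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		json.NewEncoder(w).Encode(reply{NextUpdateSeconds: 2})
	}))
	defer srv.Close()

	p := payload{
		InstanceID: "00000000-0000-4000-8000-000000000000", // placeholder ID
		Counts:     map[string]int{"http_request_count": 3},
	}
	delay, err := send(srv.Client(), srv.URL, p)
	if err != nil {
		fmt.Println("send failed:", err)
		return
	}
	// A long-running process could schedule the next update with
	// time.AfterFunc(delay, ...) instead of just printing the delay.
	fmt.Println("next update in", delay)
}
```

Letting the server pick the interval means the reporting rate can be tuned centrally without shipping a new client.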
Data is pooled in the backend and put into a database, keyed by timestamp. A background worker in the telemetry server normalizes the data to reduce excessive duplication and to save space. Another worker can answer queries by reading from the database.
Caddy's telemetry package is designed to be highly efficient and asynchronous. Even high-traffic servers should not notice any significant performance degradation. (If you do, please report a bug with a performance profile!)
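One common way to keep this kind of collection cheap is to buffer counts in memory and hand them off in bulk, so request handlers never wait on the network. The Go sketch below illustrates that general idea only; the buffer type and its methods are hypothetical and not taken from Caddy's telemetry package.

```go
package main

import (
	"fmt"
	"sync"
)

// buffer (hypothetical) accumulates counts in memory between updates.
type buffer struct {
	mu     sync.Mutex
	counts map[string]int
}

func newBuffer() *buffer {
	return &buffer{counts: make(map[string]int)}
}

// Increment adds 1 to the named counter. It only touches memory, so the
// caller (for example a request handler) returns almost immediately.
func (b *buffer) Increment(key string) {
	b.mu.Lock()
	b.counts[key]++
	b.mu.Unlock()
}

// Flush returns everything collected so far and starts a fresh map. A
// background goroutine would call this when an update is due.
func (b *buffer) Flush() map[string]int {
	b.mu.Lock()
	defer b.mu.Unlock()
	out := b.counts
	b.counts = make(map[string]int)
	return out
}

func main() {
	b := newBuffer()
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			b.Increment("http_request_count") // simulated concurrent requests
		}()
	}
	wg.Wait()
	fmt.Println(b.Flush()["http_request_count"]) // prints 100
	fmt.Println(len(b.Flush()))                  // prints 0: buffer was reset
}
```

Swapping the map on flush keeps the lock held for only a moment, no matter how long the upload itself takes.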
What is Collected
Telemetry does NOT collect personal information. No cookies, no session IDs, no way to identify individual clients connecting to your server. Telemetry is concerned with benign, aggregate counts (successful TLS connections, HTTP requests handled, response latency, etc.), technical characteristics (properties of TLS handshakes, software version, User-Agent strings, MITM assessments, etc.), and timestamps.
Most of the data can be observed by anyone watching the network or are things that clients are emitting at a protocol level. Telemetry does NOT peer inside your web site to report things that are secret or personal.
The full list of what is collected will be documented on a new docs page, but for now you can see what is collected by viewing the source code (search the diff for
Accessing the Data
We are pleased to announce that we do not sell the data—we will make the data available for free.
However, we're easing into this and will be making the full data set available in stages over time. There is a lot we need to consider before we just open the firehose and give everyone a big download link. We are learning about privacy, usefulness, storage and bandwidth costs, and all the technical aspects such as database administration, efficiency, protocols and query languages, implementation issues, etc. (I admit, I was intimidated when I first took on this project, but I felt strongly that it could be important and valuable enough to be worth it. I hope I was right.)
At launch, the Caddy website will have a page where you can view some basic global Caddy metrics, such as number of certificates managed, number of requests served, number of TLS handshakes completed, etc. You will also be able to enter your instance's UUID and view details specific to your instance! What would you like to see on that page? We want to know.
Anyway, as the project matures and our understanding of the costs grows, we anticipate making the raw data for individual instances available for free download. In the meantime, if you are an academic or institutional researcher with a specific research objective, you may contact us to request access to query the database.
Your Controls and Privacy
We are very well aware of the emphasis on user privacy in recent months, and indeed, one of Caddy's primary goals is to enhance privacy across the Internet. Privacy has not taken a backseat in this effort, either, and we've consulted with academic and research institutions on the matter.
One nice advantage of server-side telemetry is that the data is naturally aggregated—not just by metric name, but also by entire individual clients/users. Unlike most client-side telemetry implementations, our telemetry server does NOT receive any connections from individual end users (browsers) or information from any one end user.
You will be in control of your telemetry: you may always choose to not participate in it. In fact, the telemetry server has the ability to remotely disable (but NOT enable!) telemetry in Caddy instances at any time if deemed necessary. It can also disable certain metrics if that is needed. We don't yet foresee needing to use these features, but they are there. In addition, we are looking into ways to facilitate authenticated deletion requests in case you want your telemetry removed from our servers.
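The "disable but never enable" rule is easy to express in code. Here is a small Go sketch of a client that honors it; the control and client types and their field names are hypothetical, chosen only to illustrate the one-way nature of the switch.

```go
package main

import "fmt"

// control is a hypothetical set of instructions in a server reply.
type control struct {
	Stop           bool     // turn telemetry off entirely
	DisableMetrics []string // turn off individual metrics
}

// client holds the local telemetry state.
type client struct {
	enabled  bool            // initial value is chosen by the operator only
	disabled map[string]bool // metrics switched off remotely
}

func newClient(enabled bool) *client {
	return &client{enabled: enabled, disabled: make(map[string]bool)}
}

// apply honors instructions from the server. Note that there is no code
// path here that sets enabled to true or re-enables a metric: the server
// can only reduce what is collected.
func (c *client) apply(ctl control) {
	if ctl.Stop {
		c.enabled = false
	}
	for _, m := range ctl.DisableMetrics {
		c.disabled[m] = true
	}
}

// shouldReport says whether a given metric may be sent.
func (c *client) shouldReport(metric string) bool {
	return c.enabled && !c.disabled[metric]
}

func main() {
	c := newClient(true)
	c.apply(control{DisableMetrics: []string{"tls_client_hello"}})
	fmt.Println(c.shouldReport("tls_client_hello"))   // prints false
	fmt.Println(c.shouldReport("http_request_count")) // prints true

	off := newClient(false)
	off.apply(control{Stop: false}) // "do not stop" is not "start"
	fmt.Println(off.shouldReport("http_request_count")) // prints false
}
```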
On or off by default?
I'll be honest: We really want telemetry to be on by default. We are confident it is useful enough and comes at no cost to you, given that it can be turned off by anyone who really doesn't want it on. We hope you'll be open to this idea, considering the benefits that come with it and how easily it can be disabled.
Unfortunately, research goals and best practice are often at odds with each other when it comes to data collection. If telemetry is opt-in (disabled by default), it immediately introduces a strong bias into the data set. It also reduces the size and usefulness of the data because generally people will not take an extra action to enable something that is not absolutely required. Furthermore, having to enable telemetry "every time" for those who want it may be irritating. This path will probably not upset privacy activists, but it will also produce less useful research data, which partially defeats the point.
If telemetry is opt-out (enabled by default), we eliminate a reasonable amount of bias, but to the protest of some privacy advocates. This has the inverse usability frustrations of opt-in: people who do not want telemetry will have to specify it every time they get or use Caddy. However, opt-out telemetry will likely result in a much larger, more reliable data set, and it will help make Caddy (and the Web) that much better for you and everyone.
One worry we have is that an over-emphasis on once-legitimate, but now exaggerated, privacy concerns will forever lead nowhere productive. We of course do not discount privacy, but we will go forward within reason and make adjustments along the way, knowing that no privacy decision (or law) has ever satisfied 100% of the population. We also acknowledge that many users quietly use Caddy every day, perfectly happy with how it works and with its default settings. Many users will not mind participating in telemetry, just as they do not mind their Firefox or Ubuntu software doing similar—and they may never voice that opinion directly.
Instead, people wanting telemetry have asked for it indirectly, by asking us questions time and time again for years that we could not answer: "How many people use Caddy?" "How many TLS certificates does Caddy manage?" "What's the most number of sites a single Caddy instance serves?" "Is my dev instance accidentally publicly reachable?" "How many requests/second is normal, and what constitutes an anomaly?" "Did Caddy instances see any connections from the Mirai botnet?" "Can we deprecate TLS 1.0 and 1.1 yet?" "Which clients fail to adhere to HSTS?"
We hope that instead of getting caught up in fears and doubts about telemetry, the community will come together constructively and actively participate in telemetry, and help make it better. We have an opportunity to do something here that hasn't been done before. It would be tragic if irrational reactions dominated, resulting in a splintered community effort, and prevented something really great from coming of this. We're going to put our hope and trust in the community here.
We accept that there won't be total consensus on this issue. From talking with a variety of people, we know that not everyone will want telemetry on by default, and not everyone will want telemetry off by default, either.
How representative is the data?
Either way, there will always be the question of how representative the data is of the true population. This is tricky, or impossible, to accurately approximate using the obvious switch of a command-line flag to toggle telemetry. In addition, CLI flags have to be specified at every invocation of the server, which can be annoying if you rely on telemetry always being on or off. In Caddy, we generally try to reduce command line flag usage to make it easier to run Caddy. (Indeed, we've shifted a lot of the run-time strain to compile-time by making static binaries, custom builds with our website, etc.)
We have the intuition from other software projects that most users don't frequently change their telemetry settings. Either it's on and it stays that way, or it's off and it stays that way. We assume that's generally true, which makes a CLI flag even less sensible, since it applies to each run.
Following that logic, we can easily approximate an error margin by making the telemetry a compile-time choice. Since we would know how many downloads chose to enable telemetry (just as we already know how many downloads are for which platforms or have which plugins), we can be fairly certain how representative it is. This is desirable from a research perspective, and from a user experience perspective as well.
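As a rough illustration of that arithmetic, the Go sketch below computes what fraction of downloads had telemetry enabled and uses it to scale an observed count up to an estimate for all instances. The function names and every number are made up, and the extrapolation assumes that downloads track running instances and that instances without telemetry behave like those with it, which is a big assumption.

```go
package main

import "fmt"

// coverage returns the fraction of downloads built with telemetry enabled.
func coverage(enabled, total int) float64 {
	if total <= 0 {
		return 0
	}
	return float64(enabled) / float64(total)
}

// scaleUp extrapolates a count observed through telemetry to all
// instances, assuming opted-out instances behave the same on average.
func scaleUp(observed int, cov float64) float64 {
	if cov <= 0 {
		return 0
	}
	return float64(observed) / cov
}

func main() {
	// Hypothetical numbers, for illustration only.
	cov := coverage(250, 1000)
	fmt.Printf("coverage: %.0f%%\n", cov*100)
	fmt.Printf("estimated total: %.0f\n", scaleUp(1200, cov))
}
```

The closer the coverage is to 100%, the less the estimate depends on that behavioral assumption.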
A middle ground
Telemetry doesn't have to be strictly opt-in or strictly opt-out; the default could depend on how you get Caddy. We could make telemetry opt-in on the website but opt-out in the source code, for example. Whether a box is checked or unchecked, or a variable is set to true or false, is all up in the air and open to discussion.
Regardless of whether telemetry is on or off by default, every user who is getting Caddy from our website or our official source code repository will have a clear choice about telemetry. (We can't speak for unofficial, community-maintained packages—you'll have to ask them.) It'd basically work like this: enabling telemetry on the website will be as simple as checking a box, and disabling telemetry in the source will be as simple as changing a boolean variable.
We know this will not satisfy everyone, but we also know that many people are looking forward to this new feature.
Our Vision of Telemetry
We're actually not super interested in focusing on whether telemetry should be on or off by default. We're way more excited to know how we can make telemetry useful for you to use! What would you like to see from it?
We look forward to the incredible possibilities this data set opens up. First, we hope that this data set will enable some great, never-before-possible research. Maybe it can be used to detect emerging botnets as they happen, or to detect DDoS attacks and warn other instances about them. Or to have a comprehensive sweep of common clients and know what capabilities are out there. This data will make it easier to make informed decisions when it comes to protocols and moving the Web forward technically. We hope the community will participate in telemetry and use it to make the Web safer, more reliable, and more secure. (We really are optimistic about the future of the Web!)
When you look up your instance in the telemetry data, what would you like to see? What kinds of charts/plots, tables, numbers, etc., would you like to be included in that report? Tell us.
Our hope is that you will find telemetry a useful resource. On top of telemetry, we look forward to providing you with premium monitoring/alerting services, advanced reports, and data export directly to third-party services and tools (note that we would sell these tools/services, not the data itself). If enough people participate in telemetry, we may be able to do away with paid licenses, which is even more appealing.
Give Us Feedback, Ask Us Questions!
We have a forum thread dedicated to this discussion. Please join us there to ask your questions and give feedback. We look forward to having you participate with us! And thanks for all you do to make Caddy—and the web—better.
Disclaimer: This post is accurate at time of posting, but things discussed here are subject to change as we go forward. Always check the latest announcements, forum posts, releases, and especially the docs for the most up-to-date information.