An open source community is typically very good at reporting problems that they find at any point during their usage of the software. These problems include ones encountered during installation, reading documentation, using or integrating the software, and upgrading to new versions. Once a problem has been found community members are typically good at providing more details and helping to diagnose the problem: after all it is in their best interests to do so. Once a solution has been proposed community members are typically good at indicating that the issue has been resolved.
Open source communities are not so good at providing positive feedback when there are no problems, mainly because they are not typically encouraged to do so. For example, as part of my job I frequently download software from new open source projects, from open source projects that are new to me, and new versions of software from projects I use frequently. Often I download software from multiple projects with similar functionality to see which best suits my needs. If I encounter issues with the software I engage with the project's community. If I do not encounter any problems that prevent me from continuing, I don't provide any feedback. Even if the software worked perfectly but I decided to use a different project's offering for some reason, I don't contribute this knowledge. There is a lot of positive (or at least non-negative) feedback that would be useful to the administrators and core developers of open source projects, including:
- It worked fine in my environment (hardware, operating system, RDBMS, browser, etc.).
- It worked fine for my use case, with my data.
- Its performance is satisfactory under my conditions (transaction throughput, concurrent users, etc.).
- I integrated it with these other open source or proprietary products ...
- I'm using this locale ...
This information is important for open source projects because the determination of when the software is 'ready' is made by the project administrators or by someone within a commercial open source organization. Either way, these people are engaged in the creation of the software and not in its usage. Without feedback from the community that the software is working as designed in all environments, these judgments have to be made 'blind'. The estimation of readiness is typically one of hardening. The question is 'have enough community members used these features in enough environments for enough different uses that the software has been sufficiently tested and battle hardened?'
If you cannot tell how many people are using the software, this determination is very hard. Administrators of open source projects often use a combination of forum posts and download counts to estimate the hardening of a particular release. This estimation makes a number of assumptions:
- The number of downloads of the software is proportional to the number of people using the software. This is not a 1-to-1 ratio. I only end up using a fraction of the open source software that I download. This ratio is impossible to estimate and probably changes over time.
- A larger group of people using the software will apply more use cases to the software. This seems like a reasonable assumption.
- Applying a higher number of use-cases to the software tests more of the logic within the software and finds more defects if they exist. This is generally accepted to be true.
- Community members are, over time and across all geographies and cultures, uniformly likely to report defects that they find. This probably depends on several circumstances of their usage at any one point in time.
- Community members are, over time, uniformly likely to use every feature of the software.
These assumptions are used in conjunction with the largely negative feedback from forums to estimate hardening. For example:
- A certain feature is implemented in and released to the community in Milestone 1 of the software. Milestone 1 is downloaded 30,000 times. Subsequently 22 bugs are reported on the forums against this feature.
- Fixes are attempted for the 22 defects and these are released in Milestone 2 of the software. Milestone 2 is downloaded 20,000 times. Subsequently 8 bugs are reported.
- Fixes are attempted for the 8 defects reported in Milestone 2 and are released in Milestone 3 of the software. Milestone 3 is downloaded 25,000 times. No further bugs are reported.
It seems reasonable to assume that the quality of the feature has improved. However there is no guarantee that any of the 25,000 downloads of Milestone 3 resulted in anyone using the feature in question. The five assumptions above are individually reasonable over a significant period of time; applying them together to one month's worth of data, however, introduces significant variability.
If open source projects were able to use concrete, quantifiable data instead of these assumptions in their estimation of hardening, there would be many benefits:
- Community members who are waiting for the 'declared ready' version of the whole release can help the project administrators to make the determination happen sooner.
- Community members who are waiting for features that are coming in the next release can help the development of those features start sooner.
- Community members who are waiting for individual features to be 'declared ready' so that they can work on localizations can help make the determination happen sooner.
- Potential community members who have not found any bugs, documentation issues, or platform-specific issues, and so have no other way to contribute, are provided with a mechanism for participating.
There are good reasons for open source communities to provide positive feedback but typically they do not. I think there are two reasons for this:
- Open source projects have not asked for this feedback in a reliable or consistent way. This document serves as our request for this feedback.
- Open source projects have not provided mechanisms to enable this feedback. The points below describe some of the mechanisms for this.
Positive Feedback Enablers
For the purposes of this discussion I will use the name Positive Feedback Enabler (PFE) to cover embedded techniques that are used to increase the amount of positive feedback. These techniques include Heartbeat, Phone-Home, Version-Checker, Acceptance-Check, and Test Suite.
Heartbeat
This is a simple signal that is sent from the software to one of the project's servers to indicate that the software is running. The heartbeat does not send any granular or complex data. The results of a heartbeat allow the project administrators to use better installation numbers when estimating the amount of hardening that the software has undergone.
A heartbeat does not provide a direct benefit to the user of the software, only to the project that created the software.
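As a concrete illustration, a heartbeat can be little more than a tiny fire-and-forget message. A minimal sketch in Python, assuming a hypothetical stats endpoint (the URL, field names, and transport are illustrative, not any project's actual API):

```python
import json
import time
import urllib.request

# Hypothetical endpoint -- a real project would host its own.
HEARTBEAT_URL = "https://stats.example.org/heartbeat"

def build_heartbeat(product: str, version: str) -> dict:
    """A heartbeat carries only enough to say 'an instance is running'."""
    return {"product": product, "version": version, "timestamp": int(time.time())}

def send_heartbeat(payload: dict, url: str = HEARTBEAT_URL) -> None:
    """Fire-and-forget POST; failures are swallowed so the host app is unaffected."""
    try:
        req = urllib.request.Request(
            url,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"})
        urllib.request.urlopen(req, timeout=2)
    except OSError:
        pass  # a heartbeat must never break the software it ships with
```

The swallow-all-network-errors design is deliberate: a PFE that can crash or stall the software would be worse than no PFE at all.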
Phone-Home
This is a call from the software that sends information about the environment and usage of the software to one of the project's servers. The user should be able to review the information before it is sent and/or the software should log the information sent for reviewing later.
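The review-before-send requirement can be sketched simply: collect coarse, non-identifying environment facts, and write exactly what would be transmitted to a local log so the user can inspect it first. The field names and log location here are assumptions for illustration:

```python
import json
import platform

def collect_environment() -> dict:
    """Gather coarse, non-identifying environment facts for a phone-home call."""
    return {
        "os": platform.system(),
        "os_version": platform.release(),
        "python": platform.python_version(),
    }

def phone_home(payload: dict, log_path: str = "phone_home.log") -> str:
    """Log exactly what would be sent so the user can review it first.

    Returns the serialized line; the caller transmits it only after the
    user (or the user's configuration) has approved.
    """
    line = json.dumps(payload, sort_keys=True)
    with open(log_path, "a") as f:
        f.write(line + "\n")
    return line
```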
Version-Checker
This is a call from the software to one of the project's servers to determine if a newer version of the software is available. By its very nature a version-checker is an extension of a heartbeat. A version-checker helps focus the community on the latest version of the software and so increases the momentum of forward progress.
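One way to sketch a version-checker: a pure numeric comparison of dotted version strings, plus a small fetch of the latest published version from a hypothetical URL (nothing here is a real project's endpoint):

```python
import urllib.request

# Hypothetical endpoint serving the latest version string as plain text.
VERSION_URL = "https://downloads.example.org/latest-version.txt"

def is_newer(latest: str, current: str) -> bool:
    """Numeric comparison of dotted version strings, so '1.10' beats '1.9'."""
    def parts(v: str) -> list:
        return [int(p) for p in v.split(".")]
    return parts(latest) > parts(current)

def check_for_update(current: str, url: str = VERSION_URL) -> bool:
    """Fetch the latest published version and report whether an upgrade exists."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            latest = resp.read().decode("utf-8").strip()
        return is_newer(latest, current)
    except OSError:
        return False  # if the check fails, stay quiet rather than disturb the user
```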
Acceptance-Check
This is an acknowledgment from community members that the software is sufficient to meet the needs of their use case. The software could allow a voluntary description of the use case, or other voluntary information, to be sent as well. This method requires participation from the community.
Test Suite
This is a test suite that tests the operation of the software in the user's environment and reports the results to one of the project's servers. This method requires participation from the community. The user should be able to review the information before it is sent and/or the software should log the information sent for reviewing later.
Many of these techniques can be combined; for example, a Test Suite with an Acceptance-Check would validate that the software executes as designed in the user's environment and that it meets the user's functionality needs.
Opt-In, Opt-Out, and Opt-Now
Any PFE needs to provide the user with a way to enable and disable it. The question of the default operation (enabled or disabled) of a PFE is a topic that many people are passionate about. The choices are:
- Opt-In: The feature is disabled by default and the user has to take action to enable it.
- Opt-Out: The feature is enabled by default and the user has to take action to disable it.
- Opt-Now: The first time the software is run the user is presented with a dialog asking them their preference. Whether or not one of the options is selected by default the user is proactively informed of the feature and asked to choose a preference.
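The three policies differ only in how an unset preference is resolved, which can be captured in a few lines. This is a sketch; the names and the prompt mechanism are assumptions:

```python
from enum import Enum

class Consent(Enum):
    ENABLED = "enabled"
    DISABLED = "disabled"
    UNSET = "unset"      # the user has never expressed a preference

def resolve_consent(stored: Consent, default_policy: str, ask) -> Consent:
    """Decide whether a PFE may run under each default policy.

    default_policy is 'opt-in' (off until enabled), 'opt-out' (on until
    disabled), or 'opt-now' (prompt on first run); `ask` is a callable
    that presents the dialog and returns the user's Consent choice.
    """
    if stored is not Consent.UNSET:
        return stored                # an explicit user choice always wins
    if default_policy == "opt-in":
        return Consent.DISABLED
    if default_policy == "opt-out":
        return Consent.ENABLED
    return ask()                     # opt-now: prompt the user right away
```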
Despite the usefulness of positive feedback for an open source project there are valid reasons for wanting or needing to disable a PFE:
- No Internet Access: The machine running the software does not have access to the internet.
- Firewall or Security Issues: Access to the project's servers is blocked by or causes problems with a firewall.
- Production Environment or Large-Scale Rollout: The software is being deployed into a mission-critical or large-scale environment where the behavior of the PFE is undesirable.
- Preference: The preference of the group or individual using the software is that positive feedback is not gathered via a PFE.
Regardless of whether the PFE is by default an opt-in, opt-out, or opt-now feature the user of the software should have the freedom to choose whether to allow the PFE to operate or not.
Do PFEs Violate Open Source Principles?
The Open Source Definition from the OSI (http://www.opensource.org/docs/osd) describes 10 criteria for the distribution of open source software: Free Redistribution, Inclusion of Source Code, Derived Works, Integrity of The Author's Source Code, No Discrimination Against Persons or Groups, No Discrimination Against Fields of Endeavor, Distribution of License, License Must Not Be Specific to a Product, License Must Not Restrict Other Software, License Must Be Technology-Neutral.
In addition to these there are many commonly held principles: Openness, transparency, modularity, 'early and often', and the right to fork.
Of these criteria the only ones that could be affected by a heartbeat, phone-home, or version-checker are the Discrimination ones. These criteria apply to the license and not to the software itself, so even these are not clear cases. The technique for gathering the feedback must not discriminate against any person or group or against any field of endeavor. That is to say that the results of using the technique must not be used to discriminate against the users of the software. The best way to ensure this is to make sure that the technique does not gather information that can be used to identify an individual, group, or field of endeavor. The obvious problem here is that of IP address. Any message from the software running on the user's machine to the project's server will be traceable by way of the IP address. This IP address could be used to identify a group and, rarely, an individual (in the case of a one-person organization).
I suggest that the open source project use as many of the open source principles as possible in the creation and operation of the PFE. For example:
- 'Early and Often': The introduction of a PFE should be communicated clearly beforehand and it should not be a complete surprise to the existing community when it shows up in the software.
- Transparency: The purpose and benefits of the PFE should be communicated. The data collected via the PFE should be made available to the community (in summarized form). The interpretation of the data by the project should be communicated. The relation between the PFE data and the methodology used by the project should be communicated. The raw data sent by the PFE should be viewable by the users of the software.
- Openness: The community should be given the opportunity to provide feedback (positive or negative) about the PFE. The source code of the PFE should be easily located and well documented.
What About Freedom?
PFEs do not violate any of the freedoms that come with open source software provided that their introduction, configuration, and operation are done in a solid open source way:
- You have the freedom to disable the PFE
- You have the freedom to view the source code of the PFE
- You have the freedom to voice your opinion about the PFE
I propose this charter for open source projects that embed Positive Feedback Enablers in their software.
Whether the functionality is opt-in, opt-out, or opt-now, it should be possible for a typical user of the software to configure its behavior.
- If the software has a user interface for configuration the feature should be configurable through that user interface.
- If the software does not have a user interface for configuration and is configured from properties files or XML files the feature should be configurable using these.
- If the software only has a code-level API the feature should be configurable through the API.
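For the properties-file case, reading the switch can be this simple. The section and key names below are invented for illustration; a real project would document its own:

```python
import configparser

def pfe_enabled(properties_text: str) -> bool:
    """Read the PFE switch from an INI-style properties file.

    The '[feedback]' section and 'enabled' key are assumptions, not a
    standard; absence of the setting defaults to disabled.
    """
    cfg = configparser.ConfigParser()
    cfg.read_string(properties_text)
    return cfg.getboolean("feedback", "enabled", fallback=False)
```

Defaulting to disabled when the setting is absent is one reasonable reading of the opt-in policy; an opt-out project would flip the fallback.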
It should be possible to remove the PFE entirely from the software without causing any degradation of any kind.
The data collected through the PFE should never be used to discriminate against any group, individual, or field of endeavor.
The purpose of the PFE should be clearly stated by the project. This should include a description of how the data will be used and interpreted.
The PFE data should be made available in summarized form to the community that is providing the raw data. The project should also publish their interpretation of the data.
The data collected through the PFE should never be made available in a format that allows the identification of a group or individual.
Feedback on this document is welcomed.
Chief Geek, Pentaho