Give Credit Where Credit Is Due: Identify Contributors From Commit Messages

Want to identify all contributors who helped with the source code of an open source project but find yourself limited by what is officially captured in the git-log? Maybe your project, like the Linux Kernel, keeps track of who helped with patches by adding their name to the commit message, but tools don’t usually understand how to analyze this properly.

Bitergia Analytics Platform and its open source GrimoireLab twin now provide a way to analyze commit messages to identify contributors who helped in different roles. Start reading below to learn how to do this!

Patches are Created by People who Fulfill Different Roles Involved in Developing Open Source Software

Software development involves many different roles. Large projects allow for specializing and sharing these roles across different contributors. I’ll describe a fictional scenario to show how many different roles and contributors can be involved in creating a patch. 

Let’s assume that Adam has an idea for a new feature. His contribution is to suggest it to the community. Beth likes the feature idea and develops source code to implement it. Charles is a project maintainer, reviews Beth’s code, and suggests changes. Darryl is another developer who jumps in and helps improve the source code, becoming a co-developer. Eva is a user who is also very interested in the new feature and tests it thoroughly. 

Finally, everyone is happy with the new feature, which has been added to the open source software. This addition shows up as a single commit with a single author due to how git works. However, a sophisticated open source project would acknowledge everyone who helped by adding their names to the commit message along with the role each played in creating this patch. This may look like this:

commit 9bf1ab975743807f73779ecd42125d1b6973d758
Author: Beth <beth@example.com>
Date: Tue Aug 2 13:30:02 2022 -0700

    Add new awesome feature

    This new feature adds the capability to do more awesome things that we couldn’t do before.

    Signed-off-by: Beth <beth@example.com>
    Co-Developed-by: Darryl <darryl@example.com>
    Tested-by: Eva <eva@example.com>
    Reviewed-by: Charles <charles@example.com>
    Suggested-by: Adam <adam@example.com>

 

Most tools that help to analyze software development focus only on the commit author, Beth in our case. However, Adam, Charles, Darryl, and Eva all played an important role, and we want to recognize their contributions. Before we discuss how to do this with Bitergia Analytics Platform, let’s review some use cases in which it would be very useful to identify contributors.

It Is Critical to Understand Who Is Driving an Open Source Project: Contributors, Projects, and Organizations

Let’s look at three use cases in which it is important to identify contributors accurately. Each use case focuses on a different stakeholder: the contributor, the project, and the organization.

Use Case 1: Contributors as stakeholders

Contributors to open source projects often like to be recognized for their contributions. This holds regardless of whether they volunteer their free time or contribute as part of a paid job. Relying only on easy-to-collect contributor information, like the author of a commit, ignores the work done by other contributors. This creates a bias toward recognizing only a subgroup of community members.

The CHAOSS metric Types of Contributions highlights the importance of recognizing all contributors and all their contributions. The challenge is how the contributions are recorded and can be analyzed. In the above example, a project can recognize more contributors by making it a standard to include their names in a commit message if they helped in some capacity. 

Questions that a contributor may want to answer with metrics:

  • Am I being recognized for the contributions that I make?
  • How impactful am I in a project?
  • How many patches was I involved in this month or quarter so that I can show my value to my employer?

Use Case 2: Project as stakeholder

When an open source project becomes large, the community tends to divide the responsibility of maintaining the software. Different modules, subsystems, or other logical units are assigned to individual maintainers, each responsible for their part of the project. In the Linux Kernel, these are called deputies and are trusted to make good choices for the project. From a trust perspective, it is therefore important to highlight the involvement of these maintainers and the people they trust. Adding the names and roles of everyone involved to commit messages provides transparency that fosters trust and enables large projects to distribute responsibilities.

Analyzing the collaboration patterns at the project level allows taking action early, before issues become a problem, such as a maintainer becoming inactive and thus a subsystem going unmaintained. This would not be possible if the maintainer’s contributions as a reviewer or approver weren’t tracked.

Questions that a project may want to answer with metrics:

  • Who is regularly involved in the development and maintenance of a specific subsystem and can be trusted to take on leadership roles?
  • How many patches are reviewed by a trusted maintainer of that subsystem?
  • How many patches are reviewed by a trusted maintainer that is not responsible for the subsystem but is dependent on it?

Use Case 3: Organizations as stakeholders

Organizations have vested interests in the open source projects they contribute to. For example, an organization may want to show its contributions to an open source project and, for this, needs to be able to analyze how its employees have contributed. An organization may also want to analyze how a competitor is influencing the open source project and where and how the competitor’s employees are contributing. Or an organization may be doing contract work for customers and want to showcase not just the new features that were added but also other work that was done in the project.

This is important to convince customers of the organization’s qualification when explaining that influence and the ability to make changes to an open source project are earned through long-term engagement and building trust relationships with other project members.

Questions that an organization may want to answer with metrics:

  • How many patches, each month or each quarter, are produced by our employees AND signed off by our employees?
  • How many patches in the upstream git have a Reported-by tag from one of our engineers?
  • How many patches were signed off by engineers in my team but not authored by them?

Bitergia Analytics for Identifying Contributors From Commit Messages

Extract Contributor Information from Commit Messages Using Flags

Now that we understand the use cases, let’s look at how we can get the desired metrics. Bitergia added the ability to analyze commit messages for contributors to both its Bitergia Analytics Platform and GrimoireLab.

This new feature works by reading all git commit messages and extracting information about contributors that appear after these commonly used flags:

  • Signed-off-by
  • Acked-by
  • Co-Developed-by
  • Reported-by
  • Tested-by
  • Reviewed-by
  • Suggested-by
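
To make the extraction step concrete, here is a minimal sketch in Python. It is not the GrimoireLab implementation; it only assumes that each flag sits on its own line in the form "Flag: Name <email>", as in the example commit earlier in this post. The names FLAGS, FLAG_RE, and extract_contributors are made up for this illustration.

```python
import re
from email.utils import parseaddr

# Flag names copied from the list above; matching ignores case.
FLAGS = (
    "Signed-off-by",
    "Acked-by",
    "Co-Developed-by",
    "Reported-by",
    "Tested-by",
    "Reviewed-by",
    "Suggested-by",
)

# One flag per line: optional indentation, the flag name, a colon, the contributor.
FLAG_RE = re.compile(
    r"^[ \t]*(?P<flag>" + "|".join(re.escape(f) for f in FLAGS) + r")"
    r":[ \t]*(?P<who>.+?)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def extract_contributors(message):
    """Map each flag found in a commit message to a list of (name, email) pairs."""
    canonical = {f.lower(): f for f in FLAGS}
    found = {}
    for match in FLAG_RE.finditer(message):
        flag = canonical[match.group("flag").lower()]
        found.setdefault(flag, []).append(parseaddr(match.group("who")))
    return found


# Illustrative input modeled on the example commit in this post.
message = """Add new awesome feature

This new feature adds the capability to do more awesome things.

Signed-off-by: Beth <beth@example.com>
Co-Developed-by: Darryl <darryl@example.com>
Tested-by: Eva <eva@example.com>
Reviewed-by: Charles <charles@example.com>
Suggested-by: Adam <adam@example.com>
"""

for flag, people in extract_contributors(message).items():
    print(flag, people)
```

Splitting each value into a name and an email address with parseaddr is a design choice for this sketch: it gives a later identity-matching step something more stable to compare than the raw string.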

The contributor information from this extraction is connected with the identity information that is maintained in the Bitergia Analytics Platform. This allows dashboard users to filter contributors by company and other available metadata, such as bot vs. human contribution. Organizations may also maintain information about which department or team their employees belong to and can now analyze contributions to an open source project at that finer level.

Now that we have this information, we can answer finer questions, such as which patches were signed off by a different person than the commit author. This and other questions can give a lot of insight into the inner workings of an open source project and the role different contributors play. Because this was very interesting to one of our customers, I would like to give an example:

Here is how we can answer the question, “How many patches were signed off by engineers in my team but not authored by them?”:

  • We find that the author of a commit is “Tony”.
  • The signed-off-by field is a list that might read: [“Homer”, “Tony”, “Marge”]
  • We added a field in our data with the author removed: [“Homer”, “Marge”]
  • This means “Homer” and “Marge” signed off on the commit but were not its author.
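
The steps above can be sketched in a few lines of Python. This is only an illustration: the function name non_authored is hypothetical, and a real platform would compare unified identities rather than plain name strings.

```python
def non_authored(signers, author):
    """Return the signers with the commit author removed, keeping their order."""
    return [signer for signer in signers if signer != author]


author = "Tony"
signed_off_by = ["Homer", "Tony", "Marge"]

print(non_authored(signed_off_by, author))  # ['Homer', 'Marge']
```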

Example

With both flag extraction and the author-removed sign-off field in place, we could apply filters to get, for instance, the following subset of the data:

  • Get all the commits from a project repository
    • which were signed off (but not authored) by people belonging to the following departments:
      • “Foundational Technologies”
      • “Segments and Strategic Initiatives”
      • “Developer Services”
      • “Services”
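
Outside of a dashboard, the same filter could be sketched in Python as follows. Everything about the data layout is an assumption made for this example: the commit dictionaries, the department_of lookup, the person "Lisa", and the "Marketing" department are hypothetical and do not reflect the platform's actual schema.

```python
TARGET_DEPARTMENTS = {
    "Foundational Technologies",
    "Segments and Strategic Initiatives",
    "Developer Services",
    "Services",
}


def filter_commits(commits, department_of, departments):
    """Keep commits signed off by a non-author who belongs to one of the departments."""
    selected = []
    for commit in commits:
        # Signers other than the commit author.
        others = [s for s in commit["signed_off_by"] if s != commit["author"]]
        if any(department_of.get(s) in departments for s in others):
            selected.append(commit)
    return selected


# Hypothetical sample data for illustration only.
commits = [
    {"hash": "a1", "author": "Tony", "signed_off_by": ["Homer", "Tony", "Marge"]},
    {"hash": "b2", "author": "Homer", "signed_off_by": ["Homer"]},
    {"hash": "c3", "author": "Marge", "signed_off_by": ["Marge", "Lisa"]},
]
department_of = {
    "Homer": "Developer Services",
    "Tony": "Services",
    "Marge": "Services",
    "Lisa": "Marketing",
}

selected = filter_commits(commits, department_of, TARGET_DEPARTMENTS)
print([commit["hash"] for commit in selected])  # ['a1']
```

Only the first commit qualifies: the second has no signer besides its author, and the third was signed off by someone outside the listed departments.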

 


Written by Georg Link in collaboration with Miguel Angel Fernández

Georg Link

Director of Sales at Bitergia
