Chapter 9: Best Practices for Professional Git and GitHub Usage
Chapter 9: Best Practices for Professional Git and GitHub Usage¶
Using Git and GitHub isn't just about knowing the commands; it's also about adopting practices that lead to clearer history, better collaboration, higher code quality, and overall project health. This chapter covers essential best practices used by professional development teams.
9.1 Crafting Excellent Commit Messages¶
Commit messages are the narrative of your project's history. A good commit message tells your future self and your collaborators why a change was made, not just what changed (Git's diff can show the what).
9.1.1 The Seven Rules of a Great Commit Message¶
A widely adopted convention for commit messages, originating from the Git community, provides a simple structure:
- Separate subject from body with a blank line. Git commands like git log(without--oneline) treat the first line as the subject and the rest as the body.
- Limit the subject line to 50 characters. This keeps the subject concise and readable in various Git tools.
- Capitalize the subject line.
- Do not end the subject line with a period.
- Use the imperative mood in the subject line. Write the subject as if you are commanding the codebase (e.g., "Add feature", "Fix bug", "Update documentation", not "Adding feature", "Fixed bug", "Updates documentation").
- Wrap the body at about 72 characters. This improves readability in terminals.
- Use the body to explain what and why vs. how. Provide context, explain the problem the commit solves, or discuss alternative approaches considered.
Example of a good commit message:
Fix: User login issue on production
Addresses a bug where users were unable to log in
due to incorrect validation logic introduced in commit ABC123D.
The previous regex for email validation was too strict,
rejecting valid email formats. This commit updates the regex
to accept a broader range of standard email structures.
Also adds a test case for the previously failing email format.
Resolves #45 #46
This message clearly states the fix in the subject, provides context (what bug, where it came from), explains the details (why the regex was updated), and links to related issues.
9.1.2 Why Good Messages Matter¶
- Understand History: Quickly grasp the purpose of changes using git log --oneline.
- Debugging: Easily identify potential commits that might have introduced a bug using git bisect(an advanced command to find the commit that introduced a bug) orgit blame.
- Code Review: Provide context for reviewers in Pull Requests.
- Onboarding: Help new team members understand the evolution of the project.
- Release Notes: Useful for generating summaries of changes for releases.
Investing a little time in writing good commit messages pays significant dividends in the long run.
9.2 Structuring Repositories¶
How you organize files and folders in your repository can impact project maintainability and clarity.
9.2.1 Monorepos vs. Polyrepos (Briefly)¶
- Polyrepository (Polyrepo): The traditional approach, where each project, service, or component lives in its own independent Git repository.
- Pros: Clear boundaries, easier to manage access for individual components, often simpler build/deployment per component.
- Cons: Managing dependencies between repositories can be complex, harder to make atomic changes across multiple components, difficult to maintain consistent tooling/practices.
- Monorepository (Monorepo): A single repository containing code for many distinct projects, services, or libraries, often within a defined directory structure. Examples include projects by Google, Facebook, or tools like Babel.
- Pros: Easier to manage dependencies (all code is local), simpler to refactor across projects, consistent tooling and CI/CD, atomic commits across components.
- Cons: Can become very large, requires specialized tooling for efficient operation (e.g., Lerna, Nx, Bazel), more complex initial setup, potential challenges with commit noise and access control if not managed well.
Choosing between a monorepo and polyrepo strategy depends on the size of your organization, the relationships between your projects, and available tooling. For most standard projects, a polyrepo structure (one repo per main application/service) is sufficient and simpler to start with.
9.2.2 Organizing Files and Folders¶
Regardless of monorepo or polyrepo, keep your repository structure logical and consistent:
- Standard Root Files: Place essential files like README.md,LICENSE,.gitignore, and potentially aCONTRIBUTING.mdandCODE_OF_CONDUCT.mdin the repository root.
- Source Code: Typically in a srcorappdirectory. Organize code logically within this directory based on function, layer (e.g.,frontend,backend), or component.
- Documentation: A docsfolder is common for more extensive documentation than the README.
- Tests: Often in a testsor__tests__directory, mirroring the structure of the source code.
- Build/Config Files: Keep project-level configuration files (e.g., package.json,pom.xml,Dockerfile, CI config files like.github/workflows) at the root or in dedicated config directories (.config).
- Examples/Demos: A separate directory for examples showing how to use the project.
A clear directory structure helps newcomers understand the project layout quickly.
9.3 Branching Strategies Revisited (Deep Dive)¶
We touched upon workflows in Chapter 5. Let's reinforce best practices for branching.
9.3.1 Feature Branch Workflow Implementation¶
- Always branch for new work: Never work directly on mainordevelop. Create a new branch for every feature, bug fix, or significant change.
- Descriptive branch names: Use clear, concise branch names (e.g., feature/user-profile,bugfix/login-api,chore/update-dependencies). Using prefixes (feature/,bugfix/,chore/,release/) helps categorize branches.
- Keep branches short-lived: Finish work on a feature branch and merge it as soon as possible. Long-lived branches can become difficult to merge due to divergence.
- Regularly integrate from the main branch: While working on a feature branch, regularly pull the latest changes from the base branch (e.g., mainordevelop) into your feature branch. You can do this via merging (git merge main) or rebasing (git rebase main). Rebasing keeps your feature branch history linear before the final merge, but be mindful of rewriting history if you've pushed the feature branch (see Rebase vs Merge in 4.4.2).
- Clean up merged branches: Delete merged branches locally (git branch -d <branch-name>) and on the remote (git push origin --delete <branch-name>) to keep the repository tidy. GitHub often automates remote deletion after a PR merge.
9.3.2 Understanding Gitflow¶
While its strictness isn't necessary for all projects, understanding Gitflow's philosophy is valuable. It formalizes the idea of branches for different stages:
- main: Production-ready code.
- develop: Integration branch for the next release.
- feature/<name>: Branches for new features, branched from- develop, merged into- develop.
- release/<version>: Branches created from- developto prepare for a release. Only bug fixes go here. Merged back into- mainand- develop, then tagged.
- hotfix/<name>: Branches created from- mainto fix critical production bugs. Merged back into- mainand- develop, then tagged.
Gitflow provides a clear structure but adds overhead. Many teams use a simpler "GitHub Flow" (main is deployable, work in feature branches, merge via PR) or a variation of Gitflow.
9.4 Code Review Best Practices¶
Pull Requests facilitate code review, a crucial practice for improving code quality, sharing knowledge, and finding bugs early.
9.4.1 For the Reviewer¶
- Understand the Goal: Read the PR description and linked issues to understand why the changes were made.
- Review Incrementally: For large PRs, review commit by commit or focus on specific areas.
- Provide Constructive Feedback: Be specific, explain why something should change, and suggest alternatives if possible. Focus on code correctness, readability, maintainability, style, tests, and potential side effects.
- Ask Questions: If something is unclear, ask the author.
- Don't Block on Trivialities: Distinguish between required changes (bugs, serious issues) and suggestions (style preferences, minor refactors). Use labels like suggestion.
- Approve When Ready: Use GitHub's "Approve" button when you are satisfied with the changes.
- Be Timely: Review PRs promptly to avoid blocking the author and the release pipeline.
9.4.2 For the Author¶
- Write Clear Descriptions: Provide context in the PR description (see 5.3.1).
- Keep PRs Small: Smaller PRs are easier and faster to review. Break down large features into multiple smaller PRs.
- Self-Review First: Before asking for review, review your own changes. Look for typos, logical errors, or areas that might be confusing.
- Respond to Comments: Address all feedback. Explain your reasoning if you disagree or need clarification.
- Push Changes Based on Feedback: Make requested changes in new commits on the same branch and push.
- Thank Reviewers: Appreciate the time and effort reviewers put in.
Effective code review relies on mutual respect and a shared goal of improving the codebase.
9.5 Managing Large Repositories (Introduction)¶
Very large repositories, especially those containing large binary files (images, videos, datasets, executables), can strain Git's performance and repository size.
9.5.1 Git Large File Storage (LFS)¶
- Problem: Git is designed for text files and storing snapshots efficiently by referencing identical content. Large binary files change frequently and cannot be easily diffed or compressed by Git in the same way, leading to bloated repository sizes and slow operations.
- Solution: Git LFS replaces large files with small pointer files in Git, while the actual file content is stored on a remote LFS server (GitHub provides LFS hosting).
- How it works: You install Git LFS and tell it which file types to track (git lfs track "*.psd"). This adds a.gitattributesfile. When you commit, Git stores the pointer, and Git LFS handles storing the large file content when you push. When you clone or fetch, Git gets the pointers, and Git LFS downloads the actual large files as needed.
If your project involves large assets that need versioning, consider using Git LFS from the start.
9.6 Backup and Recovery Strategies¶
While GitHub hosts your repository, relying solely on GitHub as a backup might not be sufficient for all needs (e.g., if the service is temporarily unavailable or for disaster recovery planning outside of GitHub).
9.6.1 Why Backups are Important¶
Although Git's distributed nature means every clone is technically a full copy of the history, having dedicated backups protects against accidental deletion, account compromise, or service outages.
9.6.2 Methods for Backing up Git Repositories¶
- Regular Cloning: Simply cloning your repository to another location or server acts as a basic backup.
- Bare Clones: A bare clone (git clone --bare <repo-url>) creates a copy of the repository's Git data (.gitdirectory content) without a working directory. This is a clean way to back up the repository's history.
- Using git bundle: Create a single binary file containing the entire repository history or a subset of it. This file can be easily transferred and stored.
You can then clone or fetch from this bundle file. * Third-Party Backup Services: Various services specialize in backing up GitHub (and other Git hosting) repositories. * Automated Scripts: Write scripts that regularly clone or bundle your repositories and store them off-site or in cloud storage.
For critical projects, implement an automated backup strategy independent of GitHub.
Chapter 9 Summary:
We covered essential professional practices for using Git and GitHub effectively. We emphasized the importance of writing clear and conventional commit messages, the factors in structuring repositories (monorepos vs. polyrepos, organizing files), and detailed best practices for branching and using the Feature Branch workflow. We provided guidelines for effective code review, introduced Git LFS for managing large binary files, and discussed strategies for backing up your Git repositories beyond relying solely on the hosting provider.
Adopting these practices will lead to a more organized, collaborative, and maintainable project over time.