.. SPDX-License-Identifier: GPL-2.0

=====================================
Handling messy pull-request diffstats
=====================================

Subsystem maintainers routinely use ``git request-pull`` as part of the
process of sending work upstream. Normally, the result includes a nice
diffstat that shows which files will be touched and how much of each will
be changed. Occasionally, though, a repository with a relatively
complicated development history will yield a massive diffstat containing a
great deal of unrelated work. The result looks ugly and obscures what the
pull request is actually doing. This document describes what is happening
and how to fix things up; it is derived from The Wisdom of Linus Torvalds,
found in Linus1_ and Linus2_.

.. _Linus1: https://lore.kernel.org/lkml/CAHk-=wg3wXH2JNxkQi+eLZkpuxqV+wPiHhw_Jf7ViH33Sw7PHA@mail.gmail.com/
.. _Linus2: https://lore.kernel.org/lkml/CAHk-=wgXbSa8yq8Dht8at+gxb_idnJ7X5qWZQWRBN4_CUPr=eQ@mail.gmail.com/

A Git development history proceeds as a series of commits. In a simplified
manner, mainline kernel development looks like this::

  ... vM --- vN-rc1 --- vN-rc2 --- vN-rc3 --- ... --- vN-rc7 --- vN

If one wants to see what has changed between two points, a command like
this will do the job::

  $ git diff --stat --summary vN-rc2..vN-rc3

Here, there are two clear points in the history; Git will essentially
"subtract" the beginning point from the end point and display the resulting
differences. The requested operation is unambiguous and easy enough to
understand.

When a subsystem maintainer creates a branch and commits changes to it, the
result in the simplest case is a history that looks like::

  ... vM --- vN-rc1 --- vN-rc2 --- vN-rc3 --- ... --- vN-rc7 --- vN
                          |
                          +-- c1 --- c2 --- ... --- cN

If that maintainer now uses ``git diff`` to see what has changed between
the mainline branch (let's call it "linus") and cN, there are still two
clear endpoints, and the result is as expected. So a pull request
generated with ``git request-pull`` will also be as expected. But now
consider a slightly more complex development history::

  ... vM --- vN-rc1 --- vN-rc2 --- vN-rc3 --- ... --- vN-rc7 --- vN
               |          |
               |          +-- c1 --- c2 --- ... --- cN
               |                    /
               +-- x1 --- x2 --- x3

Our maintainer has created one branch at vN-rc1 and another at vN-rc2; the
two were subsequently merged into c2. Now a pull request generated
for cN may end up being messy indeed, and developers often end up wondering
why.

What is happening here is that there are no longer two clear endpoints for
the ``git diff`` operation to use. The development culminating in cN
started in two different places; to generate the diffstat, ``git diff``
has to pick one of them and hope for the best. If the diffstat
starts at vN-rc1, it may end up including all of the changes between there
and the second starting point (vN-rc2), which is certainly not what our
maintainer had in mind. With all of that extra junk in the diffstat, it
may be impossible to tell what actually happened in the changes leading up
to cN.

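
One way to check whether a branch is in this situation is to ask Git how
many merge bases it shares with the mainline branch. What follows is a
minimal sketch rather than part of the original advice:
``count_merge_bases`` is a hypothetical helper name, and it assumes that it
is run from inside the repository. A result greater than one means that
Git has found several candidate starting points for the diffstat::

  # Hypothetical helper: count the merge bases shared by two commits.
  # "git merge-base --all" prints one commit ID per line.
  count_merge_bases() {
      git merge-base --all "$1" "$2" | wc -l | tr -d ' '
  }

It could be called as ``count_merge_bases linus my-topic``, where
``my-topic`` is an assumed name for the branch ending at cN.
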
Maintainers often try to resolve this problem by, for example, rebasing the
branch or performing another merge with the linus branch, then recreating
the pull request. This approach tends not to lead to joy at the receiving
end of that pull request; rebasing and/or merging just before pushing
upstream is a well-known way to get a grumpy response.

So what is to be done? The best response when confronted with this
situation is indeed to do a merge with the branch you intend your work
to be pulled into, but to do it privately, as if it were the source of
shame. Create a new, throwaway branch and do the merge there::

  ... vM --- vN-rc1 --- vN-rc2 --- vN-rc3 --- ... --- vN-rc7 --- vN
               |          |                                      |
               |          +-- c1 --- c2 --- ... --- cN           |
               |                    /               |            |
               +-- x1 --- x2 --- x3                 +------------+-- TEMP

The merge operation resolves all of the complications resulting from the
multiple beginning points, yielding a coherent result that contains only
the differences from the mainline branch. Now it will be possible to
generate a diffstat with the desired information::

  $ git diff -C --stat --summary linus..TEMP

Save the output from this command, then simply delete the TEMP branch;
definitely do not expose it to the outside world. Take the saved diffstat
output and edit it into the messy pull request, yielding a result that
shows what is really going on. That request can then be sent upstream.

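
The whole procedure can be sketched as a small shell function. This is an
illustration under stated assumptions, not an official tool:
``clean_diffstat`` is a hypothetical name, the first argument is the branch
ending at cN, the mainline branch defaults to "linus", and no branch called
TEMP may already exist. If the merge hits conflicts, the function stops and
leaves TEMP in place so that they can be resolved by hand::

  # Hypothetical helper: print a diffstat for branch "$1" relative to
  # the mainline branch "$2" (default: linus) using a throwaway merge.
  clean_diffstat() {
      topic=$1
      mainline=${2:-linus}
      # Create the throwaway branch at the tip of the work.
      git checkout -q -b TEMP "$topic" || return 1
      # Do the private merge; on conflicts, stop and leave TEMP behind.
      git merge -q --no-edit "$mainline" >/dev/null || return 1
      # The diffstat now contains only the differences from mainline.
      git diff -C --stat --summary "$mainline"..TEMP
      # Step off the throwaway branch and delete it; never publish it.
      git checkout -q "$topic"
      git branch -q -D TEMP
  }

Running ``clean_diffstat my-topic > diffstat.txt`` (``my-topic`` being an
assumed branch name) saves the output so that it can be edited into the
pull request.
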