bluekeyes / go-gitdiff
Go library for parsing and applying patches created by Git
License: MIT License
Similar to #16: git format-patch often adds stuff after a --- line, and many people add things there too. So you might have a message that looks like this:
Subject: [PATCH] Implement foo bar
Blah blah blah
S-o-b: <[email protected]>
---
CC: [email protected]
CC: [email protected]
xen/common/domain.c | 10 ++++++++++
1 file changed, 10 insertions(+)
git am always ends up removing anything after the ---, because it actually interprets --- as the beginning of the patch.
Would you be open to having PatchHeader separate out this extra information into a separate field? Maybe BodyAppendix or something like that?
If so, I can write something up and send a PR.
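To make the proposal concrete, here is a minimal sketch of the split. The helper name splitBodyAppendix and the placeholder email addresses are assumptions for illustration, not part of go-gitdiff; the sketch only assumes that the appendix starts after the first line consisting of exactly "---".

```go
package main

import (
	"fmt"
	"strings"
)

// splitBodyAppendix is a hypothetical helper. It splits a message body at the
// first line that is exactly "---" and returns the text before it as the
// message and the text after it as the appendix. Both parts are trimmed of
// surrounding whitespace.
func splitBodyAppendix(body string) (message, appendix string) {
	lines := strings.Split(body, "\n")
	for i, line := range lines {
		if strings.TrimRight(line, " \t\r") == "---" {
			message = strings.Join(lines[:i], "\n")
			appendix = strings.Join(lines[i+1:], "\n")
			return strings.TrimSpace(message), strings.TrimSpace(appendix)
		}
	}
	// No separator line: the whole body is the message.
	return strings.TrimSpace(body), ""
}

func main() {
	// Placeholder addresses; the real ones are redacted in the report.
	body := "Blah blah blah\n\nS-o-b: <sob@example.com>\n---\nCC: a@example.com\nCC: b@example.com\n"
	msg, app := splitBodyAppendix(body)
	fmt.Printf("message:  %q\nappendix: %q\n", msg, app)
}
```

In a real patch the diffstat also follows the --- line, so a caller would probably still need to separate user-written notes from the generated file summary.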
To mirror Parse, the library should export an Apply(io.Writer, io.ReaderAt, *File) error function as a convenience wrapper for using Applier with default settings. This will be strict application for now, but would probably change to fuzzy application if that's ever implemented.
I recently encountered a patch containing this file deletion (paths sanitized):
diff --git a/path/to/file/File with Spaces.pdf b/path/to/file/File with Spaces.pdf
deleted file mode 100644
index 6e02dcd4fabc172009aca3a6f78763246c59b8fe..0000000000000000000000000000000000000000
I think I assumed these would be quoted, but Git does not seem to consider spaces special characters when generating patches. This leads to a "git file header: missing filename information" error.
Check behavior against git_header_name in apply.c to see how Git handles this.
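One way to handle unquoted names with spaces is sketched below. It assumes, based on a reading of git_header_name, that an unquoted name is accepted only when the same name appears twice after stripping the first path component of each side. The helper names are hypothetical, and the sketch ignores quoted names and custom prefixes.

```go
package main

import (
	"fmt"
	"strings"
)

// stripFirstComponent drops everything up to and including the first '/',
// e.g. "a/dir/file" becomes "dir/file". It returns "" if there is no '/'.
func stripFirstComponent(p string) string {
	i := strings.IndexByte(p, '/')
	if i < 0 {
		return ""
	}
	return p[i+1:]
}

// sharedHeaderName is a hypothetical helper. It tries every space in the
// "diff --git" line as the separator between the old and new names and
// accepts the first split where both sides name the same file.
func sharedHeaderName(line string) (string, bool) {
	const prefix = "diff --git "
	if !strings.HasPrefix(line, prefix) {
		return "", false
	}
	rest := strings.TrimPrefix(line, prefix)
	for i := 0; i < len(rest); i++ {
		if rest[i] != ' ' {
			continue
		}
		oldName := stripFirstComponent(rest[:i])
		newName := stripFirstComponent(rest[i+1:])
		if oldName != "" && oldName == newName {
			return oldName, true
		}
	}
	return "", false
}

func main() {
	line := "diff --git a/path/to/file/File with Spaces.pdf b/path/to/file/File with Spaces.pdf"
	name, ok := sharedHeaderName(line)
	fmt.Printf("%q %v\n", name, ok)
}
```

This only works when the old and new names are identical, which holds for the deletion above because the header carries no separate rename lines.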
The single Applier type does some messy internal state tracking to avoid mixing ApplyFile, ApplyTextFragment, and ApplyBinaryFragment. I think the following would be better:
- Introduce TextApplier (in apply_text.go) and BinaryApplier (in apply_binary.go). Each of these has methods to apply single fragments (and multiple fragments, in the case of TextApplier).
- Remove the Reset method and rename Flush to Close to better indicate that apply types are single-use.
- Remove the Applier type and the ApplyFile method.
- Move ApplyFile to the global Apply function. This is the convenience function to select an applier based on the file type and execute it.
This should reduce confusion and provide an obvious place for the eventual text-only options for fuzzy apply.
I have a git patch that is triggering this line: go-gitdiff/gitdiff/patch_header.go, line 114 in 13e8639
commit 44b179bf547c84cb588480558de71df1e9243aaf
Author: bot-deploy Github Action <>
Date: Tue Mar 5 17:07:58 2024 +0000
Export updated bot artifact
diff --git a/bot_exports/ba3e9571-b1d9-45cb-be06-a7b4a2e279e7.blob b/bot_exports/ba3e9571-b1d9-45cb-be06-a7b4a2e279e7.blob
index 4ea75f9..f92448c 100644
Binary files a/bot_exports/ba3e9571-b1d9-45cb-be06-a7b4a2e279e7.blob and b/bot_exports/ba3e9571-b1d9-45cb-be06-a7b4a2e279e7.blob differ
The line Author: bot-deploy Github Action <> is causing parsing to fail. Can this be handled gracefully, for example by returning an empty email?
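A lenient parser along these lines might cover this case. This is only a sketch: the identity type and parseIdentityLenient are hypothetical names, and the approach (split at the last '<' instead of validating the address) is an assumption about what "graceful" could mean here, not the library's behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// identity is a stand-in for a parsed "Name <email>" pair.
type identity struct {
	Name  string
	Email string
}

// parseIdentityLenient is a hypothetical helper. It accepts any string that
// ends in "<...>" and does not validate the address, so an empty email
// ("<>") or an address with unusual characters is returned as-is.
func parseIdentityLenient(s string) (identity, error) {
	s = strings.TrimSpace(s)
	open := strings.LastIndexByte(s, '<')
	if open < 0 || !strings.HasSuffix(s, ">") {
		return identity{}, fmt.Errorf("invalid identity %q: missing <email>", s)
	}
	name := strings.TrimSpace(s[:open])
	name = strings.Trim(name, `"`) // drop surrounding quotes, if any
	email := strings.TrimSpace(s[open+1 : len(s)-1])
	return identity{Name: name, Email: email}, nil
}

func main() {
	id, err := parseIdentityLenient("bot-deploy Github Action <>")
	fmt.Printf("%+v %v\n", id, err)
}
```

The trade-off is that malformed addresses pass through silently, so callers that need a valid address would have to check the Email field themselves.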
If a patch is malformed or a File is created directly, various fields may disagree. Add a validate function that checks for these types of issues so clients (e.g. appliers) can rely on the content of the fields.
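A minimal sketch of such a validate function follows. The file and fragment types are simplified stand-ins that borrow field names from the issue, not the library's real definitions, and the sketch covers only a few representative inconsistencies (it does not inspect individual context, addition, or deletion lines).

```go
package main

import (
	"errors"
	"fmt"
)

// fragment and file are simplified stand-ins for illustration only.
type fragment struct {
	OldPosition, OldLines int64
	NewPosition, NewLines int64
}

type file struct {
	OldName, NewName                            string
	IsNew, IsDelete, IsCopy, IsRename, IsBinary bool
	TextFragments                               []*fragment
}

// validate returns the first inconsistency it finds, or nil.
func validate(f *file) error {
	if f.IsRename && f.OldName == f.NewName {
		return errors.New("rename with identical old and new names")
	}
	if !f.IsRename && !f.IsCopy && f.OldName != "" && f.NewName != "" && f.OldName != f.NewName {
		return errors.New("names differ but file is not a rename or copy")
	}
	if (f.IsNew || f.IsDelete) && len(f.TextFragments) > 1 {
		return fmt.Errorf("new or deleted file has %d fragments", len(f.TextFragments))
	}
	if f.IsBinary && len(f.TextFragments) > 0 {
		return errors.New("binary file has text fragments")
	}
	if len(f.TextFragments) == 1 {
		frag := f.TextFragments[0]
		if f.IsDelete && (frag.NewPosition != 0 || frag.NewLines != 0) {
			return errors.New("deleted file has a non-empty new range")
		}
		if f.IsNew && (frag.OldPosition != 0 || frag.OldLines != 0) {
			return errors.New("new file has a non-empty old range")
		}
	}
	return nil
}

func main() {
	bad := &file{OldName: "a.txt", NewName: "a.txt", IsRename: true}
	fmt.Println(validate(bad))
}
```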
Some of the issues to check:
- IsRename is true/false but OldName and NewName are equal/not equal
- IsDelete or IsNew is true but there is more than one fragment
- IsDelete is true but the single fragment has context or addition lines or NewPosition and NewLines are not 0
- IsNew is true but the single fragment has context or deletion lines or OldPosition and OldLines are not 0
- IsBinary is true but TextFragments is not empty
After coming back to it to add some features, I'm not happy with the LineReaderAt interface and to some extent the use of io.ReaderAt. This is mostly for text patches, since io.ReaderAt is actually an ideal interface for the needs of binary patches.
Things I don't like:
- ... io.EOF.
- ... io.EOF.
- LineReaderAt wrapping an io.ReaderAt feels complicated, but maybe this is inevitable when you need to build a line index dynamically
Any solution needs to solve the following constraints:
Things I've considered:
- io.ReaderAt and LineReaderAt: this works well for binary applies (it's the minimal method needed to implement them), but has the problems outlined above for text applies.
- io.ReadSeeker: this enables the same features as io.ReaderAt (and is implemented by the same standard library types) but the position tracking and Read function make some things (like copying) easier. Since I don't plan to support concurrent use of the same source, I'm not sure if there's a major difference between using Read and Seek versus using ReadAt.
- []byte: this is simple and supports random access, but doesn't allow much flexibility. The whole source must be in memory and the apply functions will compute the line index as needed even if there was a more efficient way to get it. On the other hand, it reduces the need for internal buffers, so the number of allocations is probably lower. For what it's worth, git takes this approach and reads the full source file into memory for applies.
In my usage so far, everything is already in memory for other reasons, so the []byte might be the simplest. Or maybe io.ReaderAt is the correct interface and I just need a better abstraction on top of it for line operations.
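For the []byte option, the line index could be as small as the sketch below. The lineIndex type and its methods are hypothetical; the index is built eagerly in one pass, which is an assumption made for brevity and not how LineReaderAt works.

```go
package main

import "fmt"

// lineIndex is a hypothetical helper that records where each line of an
// in-memory source starts, so lines can be fetched by number.
type lineIndex struct {
	src    []byte
	starts []int // byte offset of the first byte of each line
}

// newLineIndex scans src once and records the start offset of every line.
// A trailing newline does not start an extra empty line.
func newLineIndex(src []byte) *lineIndex {
	idx := &lineIndex{src: src}
	if len(src) == 0 {
		return idx
	}
	idx.starts = append(idx.starts, 0)
	for i, b := range src {
		if b == '\n' && i+1 < len(src) {
			idx.starts = append(idx.starts, i+1)
		}
	}
	return idx
}

// Len returns the number of lines in the source.
func (x *lineIndex) Len() int { return len(x.starts) }

// Line returns the zero-based line n, including its trailing newline if it
// has one. The returned slice aliases the source and is not a copy.
func (x *lineIndex) Line(n int) ([]byte, bool) {
	if n < 0 || n >= len(x.starts) {
		return nil, false
	}
	end := len(x.src)
	if n+1 < len(x.starts) {
		end = x.starts[n+1]
	}
	return x.src[x.starts[n]:end], true
}

func main() {
	idx := newLineIndex([]byte("first\nsecond\nthird"))
	line, _ := idx.Line(1)
	fmt.Printf("%d lines, line 1 = %q\n", idx.Len(), line)
}
```

Because Line returns sub-slices of the source, no per-line buffers are allocated, which is the allocation benefit mentioned above.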
From a94db29e472831db7a75ba52e99ab717c17886eb Mon Sep 17 00:00:00 2001
From: "dependabot[bot]" <59619111+dependabot[bot]@users.noreply.github.com>
Date: Mon, 29 Apr 2024 17:31:28 +0000
Subject: [PATCH] =?UTF-8?q?=E2=9C=85=20(deps):=20Bump=20schemas=20fr?=
=?UTF-8?q?om=202.11.20240425191412=20to=202.11.20240429164216=20(#10143)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
---
Gemfile.lock | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Gemfile.lock b/Gemfile.lock
index 5b1718812c..5e83f3d1fa 100644
--- a/Gemfile.lock
+++ b/Gemfile.lock
@@ -109,7 +109,7 @@ GEM
faraday
flipper
jwt
- schemas (2.11.20240425191412)
+ schemas (2.11.20240429164216)
google-protobuf (~> 3.21)
googleapis-common-protos-types
twirp (>= 1.7)
For the above patch, ParsePatchHeader is failing with "mail: missing @ in addr-spec". I completely understand that the email 59619111+dependabot[bot]@users.noreply.github.com is invalid, and that the parsing logic belongs to the net/mail package. I just wanted to post it here anyway to hear your thoughts on it. Please close it as invalid if it's not worth your time. Thank you!
In a patch that includes a mode, but does not change it, only the OldMode field is set. Copy this value into the NewMode field as well for convenience.
Looking through decode_header in the Git source, it looks like there are several possible encodings. Currently, we only support quoted-printable UTF-8 and ignore anything else (implemented in #25).
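For comparison, Go's standard library already decodes RFC 2047 encoded words through mime.WordDecoder. The sketch below uses it in place of a hand-written decoder; this is a swapped-in technique, not what go-gitdiff does. The decodeSubject helper is a hypothetical name, and the CharsetReader shown simply rejects charsets the standard library does not handle itself, which marks the spot where a broader charset lookup could be plugged in.

```go
package main

import (
	"fmt"
	"io"
	"mime"
)

// decodeSubject is a hypothetical helper that decodes every encoded word
// ("=?charset?q?...?=" or "=?charset?b?...?=") in a header value.
func decodeSubject(raw string) (string, error) {
	dec := &mime.WordDecoder{
		// CharsetReader is consulted only for charsets other than
		// utf-8, iso-8859-1, and us-ascii, which are handled by default.
		// This sketch rejects them instead of converting.
		CharsetReader: func(charset string, input io.Reader) (io.Reader, error) {
			return nil, fmt.Errorf("unsupported charset %q", charset)
		},
	}
	return dec.DecodeHeader(raw)
}

func main() {
	subject, err := decodeSubject("=?UTF-8?q?Hello=20W?= =?UTF-8?q?orld?=")
	fmt.Printf("%q %v\n", subject, err)
}
```

Whitespace between two adjacent encoded words is dropped during decoding, which is what lets a subject that was split across header lines join back into one string.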
To support arbitrary encodings, I think we need to:
- Read the charset name between =? and the next ?
- Check for q? or b? to determine if the content until the next ?= is encoded as quoted-printable or base64
- Use ianaindex.MIME to look up the encoding
Hi!
We are using this lib over at https://pr.pico.sh and so far haven't had any issues. So thank you for your hard work!
We are trying to support cover letters in our patch request workflow, but it looks like since the patch itself is empty, go-gitdiff skips over it and returns empty header data. Ideally this lib would still process the empty patch even if there aren't any diffs in it.
I'm curious what you think about adding support for empty patches?
Patches generated with git format-patch will have [PATCH] in the subject line; at the moment, this doesn't seem to be removed by ParsePatchHeader().
It might be a good idea to just implement the equivalent of cleanup_subject() in https://github.com/git/git/blob/master/mailinfo.c, which seems to remove the following things at the beginning of the patch title:
- Re: and variations
I might take a stab at implementing this if I have time.
Hi!
I have a need to convert a patch into a normal diff, is there some functionality within this library that could facilitate that or do you have any recommendations?
I'd rather not parse the patch and then reconstruct it. I could parse out the header information and any trailing git version numbers at the bottom but I feel like this library already does most of that work.
Thanks!
By default, for email formatted patches, ParsePatchHeader will remove all content in square brackets and place it in a separate field. This matches the default behavior of git, but git also includes flags to disable cleaning completely or to only remove content in square brackets that contains the word PATCH.
It should be a backwards-compatible change to have ParsePatchHeader accept option functions that can disable the default behavior or switch to only removing [PATCH] content.
Hi.
Thanks for this library. It saved me lots of time. However, I'm currently facing a problem.
The following code runs indefinitely and eats RAM:
package main

import (
	"bytes"

	"github.com/bluekeyes/go-gitdiff/gitdiff"
)

const (
diff = `
diff --git a/app/controllers/seances_controller.rb b/app/controllers/seances_controller.rb
index 743d0ad..4f4d4e8 100644
--- a/app/controllers/seances_controller.rb
+++ b/app/controllers/seances_controller.rb
@@ -5,8 +5,6 @@ class SeancesController < ApplicationController
if authorization_result.code != 200
return render_by_status_code(code: authorization_result.code, data: authorization_result.data)
end
-
- render_by_status_code(code: 200, data: json)
end
def create
`
)
var (
body = `class SeancesController < ApplicationController
def index
set_auth_operation_id_header
if authorization_result.code != 200
return render_by_status_code(code: authorization_result.code, data: authorization_result.data)
end
render_by_status_code(code: 200, data: json)
end
def create
seance = Seance.new(
movie_id: params[:movie_id],
price: params[:price],
datetime: params[:datetime]
)
if seance.save!
render json: {
data: {
id: seance.id,
type: 'seances',
attributes: { datetime: seance.datetime, price: seance.price },
seats: Seat.pluck(:id).map do |seat_id|
{ id: seat_id, vacant: true }
end
}
}
end
rescue ActiveRecord::RecordInvalid => e
render_invalid_record(message: e.message)
end
def destroy
ActiveRecord::Base.transaction do
Booking.where(seance: params[:id]).destroy_all
Seance.find(params[:id]).destroy
end
render json: { data: [{ id: params[:id], type: 'seances' }] }
end
def json
seats_ids = Seat.pluck(:id)
Seance.includes(:bookings).where(movie: params[:movie_id]).order('datetime').limit(params[:max_results] || 50).map do |seance|
booking_seats_ids = seance.bookings.pluck(:seat_id)
{
id: seance.id,
price: seance.price,
datetime: seance.datetime,
seats: seats_ids.map do |seat_id|
{ id: seat_id, vacant: !(booking_seats_ids.include?(seat_id)) }
end
}
end
end
end
`
)

func main() {
	files, _, err := gitdiff.Parse(bytes.NewBufferString(diff))
	if err != nil {
		panic(err)
	}
	for _, file := range files {
		writer := bytes.NewBuffer(nil)
		reader := bytes.NewReader([]byte(body))
		appl := gitdiff.NewApplier(reader)
		if err := appl.ApplyFile(writer, file); err != nil {
			panic(err)
		}
	}
}
I traced it to somewhere in the Flush() operation. The internals keep copying the same lines over and over.
I would appreciate your help in debugging this.
Ivan
Looks like your library is a bit out-of-date? Line 63 in the sample I was testing with looks like this:
new file mode 100644
Currently, an Applier can only apply patches in "strict" mode, where line numbers and context lines must match exactly. Git supports a more flexible model when applying patches that allows them to work in more situations, such as cherry-picking changes to different branches.
I think copying Git's whitespace normalization could get complicated, but it would be nice to at least support exact matches on different lines or matches with reduced context.
time.Parse isn't guaranteed to return an error: when it sees elements that it doesn't recognize, it zeros them. https://play.golang.org/p/4kbScfG56Ic
Elements omitted from the value are assumed to be zero or, when zero is impossible, one, so parsing "3:04pm" returns the time corresponding to Jan 1, year 0, 15:04:00 UTC (note that because the year is 0, this time is before the zero Time). Years must be in the range 0000..9999. The day of the week is checked for syntax but it is otherwise ignored.
It might be worth checking the time to see if the year is 1 in addition to checking if err is nil, seeing that git was released in 2005 :D I know this would make it unusable for time travelers but 🤷
Personally I would just use something like https://github.com/kierdavis/dateparser https://play.golang.org/p/-yRXt4qPAZo
Hi,
I'm using go-gitdiff to parse git diff patch files in GitLab and derive some metrics; I used to do the same using the diff response of the GitLab diff API.
I see some discrepancies between the results.
If I move and rename a file with minimal changes, the go-gitdiff parsed output shows the file at the old path as deleted -
(*gitdiff.File)(0xc00013e480)({
OldName: (string) (len=59) "adapters/phasedAdapters/executeAdapters/execute_adapters.go",
NewName: (string) "",
IsNew: (bool) false,
IsDelete: (bool) true,
IsCopy: (bool) false,
IsRename: (bool) false,
OldMode: (os.FileMode) -rw-r--r--,
NewMode: (os.FileMode) ----------,
OldOIDPrefix: (string) (len=7) "f08cca6",
NewOIDPrefix: (string) (len=7) "0000000",
Score: (int) 0
and shows the file at new path as new file -
(*gitdiff.File)(0xc00013e2d0)({
OldName: (string) "",
NewName: (string) (len=33) "adapters/phase/execute/execute.go",
IsNew: (bool) true,
IsDelete: (bool) false,
IsCopy: (bool) false,
IsRename: (bool) false,
OldMode: (os.FileMode) ----------,
NewMode: (os.FileMode) -rw-r--r--,
OldOIDPrefix: (string) (len=7) "0000000",
NewOIDPrefix: (string) (len=7) "a1d4924",
Score: (int) 0
whereas in Gitlab MR diff view I get move and rename as -
adapters/phasedAdapters/executeAdapters/execute_adapters.go → adapters/phase/execute/execute.go
In Gitlab diff API, the response for the file I get it as -
{
"old_path": "adapters/phasedAdapters/executeAdapters/execute_adapters.go",
"new_path": "adapters/phase/execute/execute.go",
"a_mode": "100644",
"b_mode": "100644",
"new_file": false,
"renamed_file": true,
"deleted_file": false
}
GitLab determines that the file was just renamed and moved to a different path, whereas go-gitdiff treats it as a new file and considers the file at the old path deleted. Is this expected?