Comments (2)

tfnico commented on June 29, 2024

MacRoman encoding also keeps slipping into some of our repositories. When I push a diff that contains MacRoman characters, I get this:

➜  ~/projects/agnes/[master]>git push
Counting objects: 5, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 336 bytes, done.
Total 3 (delta 1), reused 0 (delta 0)
remote: Sending mail...
remote: /usr/lib/ruby/gems/1.9.1/gems/git-commit-notifier-0.11.1/lib/git_commit_notifier/diff_to_html.rb:341:in `split': invalid byte sequence in UTF-8 (ArgumentError)
remote:     from /usr/lib/ruby/gems/1.9.1/gems/git-commit-notifier-0.11.1/lib/git_commit_notifier/diff_to_html.rb:341:in `extract_commit_info_from_git_show_output'
remote:     from /usr/lib/ruby/gems/1.9.1/gems/git-commit-notifier-0.11.1/lib/git_commit_notifier/diff_to_html.rb:475:in `diff_for_commit'
remote:     from /usr/lib/ruby/gems/1.9.1/gems/git-commit-notifier-0.11.1/lib/git_commit_notifier/diff_to_html.rb:627:in `block in diff_for_branch'
remote:     from /usr/lib/ruby/gems/1.9.1/gems/git-commit-notifier-0.11.1/lib/git_commit_notifier/diff_to_html.rb:626:in `each'
remote:     from /usr/lib/ruby/gems/1.9.1/gems/git-commit-notifier-0.11.1/lib/git_commit_notifier/diff_to_html.rb:626:in `diff_for_branch'
remote:     from /usr/lib/ruby/gems/1.9.1/gems/git-commit-notifier-0.11.1/lib/git_commit_notifier/diff_to_html.rb:675:in `diff_between_revisions'
remote:     from /usr/lib/ruby/gems/1.9.1/gems/git-commit-notifier-0.11.1/lib/git_commit_notifier/commit_hook.rb:118:in `run'
remote:     from /usr/lib/ruby/gems/1.9.1/gems/git-commit-notifier-0.11.1/lib/git_commit_notifier/executor.rb:29:in `block in run!'
remote:     from /usr/lib/ruby/gems/1.9.1/gems/git-commit-notifier-0.11.1/lib/git_commit_notifier/executor.rb:27:in `each_line'
remote:     from /usr/lib/ruby/gems/1.9.1/gems/git-commit-notifier-0.11.1/lib/git_commit_notifier/executor.rb:27:in `run!'
remote:     from /usr/lib/ruby/gems/1.9.1/gems/git-commit-notifier-0.11.1/bin/git-commit-notifier:15:in `<top (required)>'
remote:     from /usr/bin/git-commit-notifier:19:in `load'
remote:     from /usr/bin/git-commit-notifier:19:in `<main>'
To [email protected]:agnes.git
   9f68b34..42c8401  master -> master

It would be nice if it were a bit more fault-tolerant, for example by generating a notification mail saying that the contents could not be handled because of encoding problems.
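
To illustrate the kind of fallback meant here, a minimal sketch follows. It is not code from git-commit-notifier: the helper name notification_body and the constant FALLBACK_NOTICE are made up for illustration, and the block passed in stands in for whatever rendering the notifier actually does.

```ruby
# Hypothetical helper, not part of git-commit-notifier.
FALLBACK_NOTICE = "The contents of this commit could not be displayed " \
                  "because of encoding problems."

# Yields the text to the rendering block only when it is valid UTF-8;
# otherwise returns a short notice instead of letting the hook crash.
def notification_body(raw)
  text = raw.dup.force_encoding(Encoding::UTF_8)
  return FALLBACK_NOTICE unless text.valid_encoding?
  yield text
end

good = notification_body("+hello\n+world") { |t| t.split("\n").join("<br>") }
bad  = notification_body("+caf\xE9\n")     { |t| t.split("\n").join("<br>") }
puts good
puts bad
```

The second call never reaches split, so the push would still produce a mail, just without the diff.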

from git-commit-notifier.

mfn commented on June 29, 2024

Here is some analysis on the topic, also in reference to my pull request 105 (https://github.com/bitboxer/git-commit-notifier/pull/105):

I had another one of these invalid byte sequence errors, which looks like this:

Sending mail...
/usr/local/rvm/gems/ruby-1.9.2-p290/gems/git-commit-notifier-0.11.2/lib/git_commit_notifier/diff_to_html.rb:341:in `split': invalid byte sequence in UTF-8 (ArgumentError)
        from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/git-commit-notifier-0.11.2/lib/git_commit_notifier/diff_to_html.rb:341:in `extract_commit_info_from_git_show_output'
        from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/git-commit-notifier-0.11.2/lib/git_commit_notifier/diff_to_html.rb:475:in `diff_for_commit'
        from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/git-commit-notifier-0.11.2/lib/git_commit_notifier/diff_to_html.rb:627:in `block in diff_for_branch'
        from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/git-commit-notifier-0.11.2/lib/git_commit_notifier/diff_to_html.rb:626:in `each'
        from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/git-commit-notifier-0.11.2/lib/git_commit_notifier/diff_to_html.rb:626:in `diff_for_branch'
        from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/git-commit-notifier-0.11.2/lib/git_commit_notifier/diff_to_html.rb:675:in `diff_between_revisions'
        from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/git-commit-notifier-0.11.2/lib/git_commit_notifier/commit_hook.rb:118:in `run'
        from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/git-commit-notifier-0.11.2/lib/git_commit_notifier/executor.rb:29:in `block in run!'
        from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/git-commit-notifier-0.11.2/lib/git_commit_notifier/executor.rb:27:in `each_line'
        from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/git-commit-notifier-0.11.2/lib/git_commit_notifier/executor.rb:27:in `run!'
        from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/git-commit-notifier-0.11.2/bin/git-commit-notifier:15:in `<top (required)>'
        from /usr/local//rvm/gems/ruby-1.9.2-p290/bin/git-commit-notifier:19:in `load'
        from /usr/local//rvm/gems/ruby-1.9.2-p290/bin/git-commit-notifier:19:in `<main>'

The code in question:

336     def extract_commit_info_from_git_show_output(content)
337       result = { :message => [], :commit => '', :author => '', :date => '', :email => '',
338       :committer => '', :commit_date => '', :committer_email => ''}
339
340       message = []
341       content.split("\n").each do |line|
342         if line =~ /^diff/ # end of commit info
343           break

Some more digging:

  • diff_to_html.rb breaks in extract_commit_info_from_git_show_output
  • extract_commit_info_from_git_show_output gets its data from diff_for_commit
  • diff_for_commit calls Git.show
  • Git.show calls from_shell which forces the encoding to UTF-8:
      def from_shell(cmd)
        r = `#{cmd}`
        raise ArgumentError.new("#{cmd} failed") unless $?.exitstatus.zero?
        r.force_encoding(Encoding::UTF_8) if r.respond_to?(:force_encoding)
        r
      end

from_shell is a general wrapper around all calls to the git binaries. However, my initial patch rested on the flawed assumption that all git commands always return proper UTF-8. They do not; only certain commands can guarantee this. For example, git log provides an option to specify the encoding of the log message; git diff does not. Why? My explanation: it can and does diff arbitrary content, does not know or care about charsets, and thus cannot specify an encoding.
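
A small sketch of why forcing the encoding is not enough: force_encoding only relabels the bytes, it neither converts nor validates them. The sample string is made up; it assumes the diff contains "für" stored as ISO-8859-1.

```ruby
# "ü" in ISO-8859-1 is the single byte 0xFC, which is not valid UTF-8 on its own.
latin1 = "f\xFCr".dup.force_encoding(Encoding::ISO_8859_1)

# What from_shell effectively does: same bytes, new label.
relabelled = latin1.dup.force_encoding(Encoding::UTF_8)
puts relabelled.bytes == latin1.bytes   # bytes are untouched
puts relabelled.valid_encoding?         # the label is now wrong

# Transcoding, by contrast, rewrites the bytes (possible only if the
# source encoding is known, which git diff does not tell us).
transcoded = latin1.encode(Encoding::UTF_8)
puts transcoded.valid_encoding?
puts transcoded.bytes.length
```

Any later operation that needs to interpret the characters of the relabelled string, such as the split in diff_to_html.rb, is then working on an invalid string.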

Now, git show is just a mixture of git log and git diff (oversimplified); git show has an option to specify an encoding but only for the log message part; not for the diff part. Since git show mixes git log and git diff output, it is no longer guaranteed that we only receive UTF-8 and thus my initial assumption is wrong.
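
One way to picture the two parts is to cut the git show output at the first line starting with "diff" (the same marker extract_commit_info_from_git_show_output breaks on) while treating it as raw bytes. This is only a sketch under that assumption; split_show_output is a hypothetical helper and the sample data is invented.

```ruby
# Hypothetical helper, not part of git-commit-notifier.
# Returns [header, diff]: the header is labelled UTF-8 (git can be asked to
# encode the log part), the diff stays binary because its charset is unknown.
def split_show_output(raw)
  bytes = raw.dup.force_encoding(Encoding::BINARY)
  diff_start = bytes.index(/^diff /)
  return [bytes.force_encoding(Encoding::UTF_8), ""] if diff_start.nil?

  header = bytes[0...diff_start].force_encoding(Encoding::UTF_8)
  diff   = bytes[diff_start..-1]
  [header, diff]
end

# Invented sample: UTF-8 log part followed by an ISO-8859-1 diff part.
log_part  = "Author: J\u00F6rg\n\n".dup.force_encoding(Encoding::BINARY)
diff_part = "diff --git a/f b/f\n-gr\xFC\xDFe\n".dup.force_encoding(Encoding::BINARY)
raw = (log_part + diff_part).force_encoding(Encoding::UTF_8)

header, diff = split_show_output(raw)
puts raw.valid_encoding?      # the mixture as a whole is not valid UTF-8
puts header.valid_encoding?   # the log part alone is
puts diff.encoding
```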

The problem is to some extent non-trivial. I could easily patch the respective code in diff_to_html.rb to, e.g., convert to ASCII-8BIT; this would work for git-commit-notifier itself, but in my testing it then choked in premailer later on:

/usr/local/rvm/gems/ruby-1.9.2-p290/gems/premailer-1.7.3/lib/premailer/premailer.rb:332:in `split': invalid byte sequence in UTF-8 (ArgumentError)
        from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/premailer-1.7.3/lib/premailer/premailer.rb:332:in `is_xhtml?'
        from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/premailer-1.7.3/lib/premailer/adapter/nokogiri.rb:110:in `to_inline_css'
        from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/git-commit-notifier-0.11.2/lib/git_commit_notifier/emailer.rb:50:in `mail_html_message'
        from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/git-commit-notifier-0.11.2/lib/git_commit_notifier/emailer.rb:156:in `send'
        from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/git-commit-notifier-0.11.2/lib/git_commit_notifier/commit_hook.rb:162:in `run'
        from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/git-commit-notifier-0.11.2/lib/git_commit_notifier/executor.rb:29:in `block in run!'
        from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/git-commit-notifier-0.11.2/lib/git_commit_notifier/executor.rb:27:in `each_line'
        from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/git-commit-notifier-0.11.2/lib/git_commit_notifier/executor.rb:27:in `run!'
        from /usr/local/rvm/gems/ruby-1.9.2-p290/gems/git-commit-notifier-0.11.2/bin/git-commit-notifier:15:in `<top (required)>'
        from /usr/local//rvm/gems/ruby-1.9.2-p290/bin/git-commit-notifier:19:in `load'
        from /usr/local//rvm/gems/ruby-1.9.2-p290/bin/git-commit-notifier:19:in `<main>'

I'd better leave this to someone with more insight here.

As to "when can this error happen": commit a file in git with latin1 encoding and characters (e.g. German umlauts), then switch the file to UTF-8 and use the respective UTF-8 encoded variants of the umlauts. The git diff (and git show, for that matter) will contain mixed-encoding output, which is not a single valid UTF-8 sequence.
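
The scenario can be reproduced in plain Ruby without git, and it also suggests one possible repair: go through the output line by line and transcode only the lines that are not valid UTF-8. This is a sketch, not a tested patch; to_utf8_by_line is a hypothetical helper, and treating every non-UTF-8 line as ISO-8859-1 is purely a guess that would be wrong for MacRoman and other charsets.

```ruby
# Hypothetical helper, not part of git-commit-notifier.
# Lines that are valid UTF-8 are kept; all others are ASSUMED to be in
# `fallback` (ISO-8859-1 by default) and transcoded to UTF-8.
def to_utf8_by_line(raw, fallback = Encoding::ISO_8859_1)
  raw.dup.force_encoding(Encoding::BINARY).each_line.map do |line|
    utf8 = line.dup.force_encoding(Encoding::UTF_8)
    if utf8.valid_encoding?
      utf8
    else
      line.force_encoding(fallback).encode(Encoding::UTF_8)
    end
  end.join
end

# Removed line in latin1, added line in UTF-8, as in the scenario above.
old_line = "-gr\xFC\xDFe\n".dup.force_encoding(Encoding::BINARY)
new_line = "+gr\u00FC\u00DFe\n".dup.force_encoding(Encoding::BINARY)
mixed = (old_line + new_line).force_encoding(Encoding::UTF_8)

puts mixed.valid_encoding?
fixed = to_utf8_by_line(mixed)
puts fixed.valid_encoding?
puts fixed
```

Since the result is valid UTF-8 throughout, later stages that expect UTF-8 at least receive a well-formed string, although I have not verified this against premailer.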
