Comments (10)

stevendaniels commented on July 22, 2024

I can confirm this issue. It looks like the problem started in this commit: d4dd5d1

-    @encoding = raw_encoding(nil) ||
-                ( if encoding = options.delete(:internal_encoding)
-                    case encoding
-                    when Encoding; encoding
-                    else Encoding.find(encoding)
-                    end
-                  end ) ||
-                ( case encoding = options.delete(:encoding)
-                  when Encoding; encoding
-                  when /\A[^:]+/; Encoding.find($&)
-                  end ) ||
+    internal_encoding = Encoding.find(internal_encoding) if internal_encoding
+    if encoding
+      encoding, = encoding.split(":") if encoding.is_a?(String)
+      encoding = Encoding.find(encoding)
+    end
+    @encoding = raw_encoding(nil) || internal_encoding || encoding ||

On current master, the relevant section looks like this:

  # honor the IO encoding if we can, otherwise default to ASCII-8BIT
  internal_encoding = Encoding.find(internal_encoding) if internal_encoding
  external_encoding = Encoding.find(external_encoding) if external_encoding
  if encoding
    encoding, = encoding.split(":", 2) if encoding.is_a?(String)
    encoding = Encoding.find(encoding)
  end
  @encoding = raw_encoding(nil) || internal_encoding || encoding ||
              Encoding.default_internal || Encoding.default_external

Beyond the issue that @ShockwaveNN raised, I also noticed we aren't using the external_encoding argument anywhere.
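For illustration, here is a minimal sketch of resolving an encoding option written as EXTERNAL:INTERNAL (the same form Ruby's IO mode strings use) without dropping either half. The code above keeps only the part before the colon. encodings_from_option is a hypothetical helper, not part of the csv library.

```ruby
# Hypothetical helper (not part of csv): resolve both halves of an
# "EXTERNAL:INTERNAL" encoding option instead of keeping only the first.
def encodings_from_option(option)
  external, internal = option.split(":", 2)
  # internal is nil when the option has no colon
  [Encoding.find(external), internal && Encoding.find(internal)]
end

p encodings_from_option("utf-8:iso-8859-1") # both halves resolved
p encodings_from_option("utf-8")            # no internal part, so nil
```

The second element could then feed the internal_encoding slot instead of being discarded.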

from csv.

radar commented on July 22, 2024

I am still having issues trying to read a CSV string that has a byte order mark prepended to it:

require 'csv'
puts "CSV VERSION #{CSV::VERSION}" # Shows 3.0.0

bom_character = 65_279
contents = "first_name\nRyan".codepoints.unshift(bom_character).pack("U*")
csv = CSV.parse(contents, headers: true, encoding: 'bom|utf-8')
csv.each do |row|
  p row.to_h.keys.first.codepoints
  p "ROW FIRST NAME IS #{row["first_name"]}"
end

This outputs nil for the first name and indicates that the key also contains the BOM. What am I doing wrong here?
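One workaround for in-memory data, sketched under the assumption that the string is UTF-8: strip a leading U+FEFF (code point 65_279) yourself before calling CSV.parse. strip_bom is a hypothetical helper name.

```ruby
require "csv"

# Hypothetical helper: remove a leading U+FEFF byte order mark from a
# UTF-8 string. In the report above, the BOM ended up in the first header key.
def strip_bom(str)
  str.sub(/\A\uFEFF/, "")
end

contents = "\uFEFFfirst_name\nRyan"
CSV.parse(strip_bom(contents), headers: true).each do |row|
  p row.to_h.keys.first.codepoints
  puts "ROW FIRST NAME IS #{row["first_name"]}"
end
```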

kou commented on July 22, 2024

bom|utf-8 is for opening a file, not for a string you pass to parse.

> What am I doing wrong here?

You should not reuse a closed issue. Please open a new issue instead.
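When the data lives in a file, this is roughly what it looks like: Ruby's "r:bom|utf-8" open mode removes the BOM while reading, so the header comes out clean. A minimal sketch; the temporary file and the first_names_from helper are hypothetical.

```ruby
require "csv"
require "tmpdir"

# Hypothetical helper: read the first_name column from a CSV file,
# letting File.open's "bom|utf-8" mode strip a leading UTF-8 BOM.
def first_names_from(path)
  File.open(path, "r:bom|utf-8") do |f|
    CSV.new(f, headers: true).map { |row| row["first_name"] }
  end
end

names = Dir.mktmpdir do |dir|
  path = File.join(dir, "people.csv")
  # Write the UTF-8 BOM bytes (EF BB BF) followed by the CSV text.
  File.binwrite(path, "\xEF\xBB\xBF".b + "first_name\nRyan\n")
  first_names_from(path)
end
p names
```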

ShockwaveNN commented on July 22, 2024

Is it better to report bugs here, or on https://bugs.ruby-lang.org/?

stevendaniels commented on July 22, 2024

@kou, @hsbt:
It looks like a change in the order of operations is what broke things: raw_encoding(nil) returns <Encoding:UTF-8> because @io.external_encoding is <Encoding:UTF-8>.

  def raw_encoding(default = Encoding::ASCII_8BIT)
    if @io.respond_to? :internal_encoding
      @io.internal_encoding || @io.external_encoding
    elsif @io.is_a? StringIO
      @io.string.encoding
    elsif @io.respond_to? :encoding
      @io.encoding
    else
      default
    end
  end

To me, it looks like raw_encoding will always return an encoding for an IO or a StringIO, so
raw_encoding(nil) || internal_encoding || encoding || Encoding.default_internal || Encoding.default_external never gets as far as internal_encoding.
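To make the precedence concrete, here is a standalone sketch that copies the raw_encoding logic quoted above (taking the IO as an argument instead of reading @io) and feeds it into the same || chain. The method names are hypothetical, and the encoding option is left out for brevity. With a StringIO, the IO's own encoding wins and the internal_encoding argument is ignored; it only takes effect for an object that exposes no encoding at all.

```ruby
require "stringio"

# Copy of the raw_encoding logic above, taking the IO as an argument
# (hypothetical name, for illustration only).
def raw_encoding_of(io, default = Encoding::ASCII_8BIT)
  if io.respond_to?(:internal_encoding)
    io.internal_encoding || io.external_encoding
  elsif io.is_a?(StringIO)
    io.string.encoding
  elsif io.respond_to?(:encoding)
    io.encoding
  else
    default
  end
end

# Mirrors the precedence chain from master (encoding option omitted).
def chosen_encoding(io, internal_encoding: nil)
  internal_encoding = Encoding.find(internal_encoding) if internal_encoding
  raw_encoding_of(io, nil) || internal_encoding ||
    Encoding.default_internal || Encoding.default_external
end

utf8_io = StringIO.new("first_name\nRyan".encode("UTF-8"))
p chosen_encoding(utf8_io, internal_encoding: "ISO-8859-1")    # IO encoding wins
p chosen_encoding(Object.new, internal_encoding: "ISO-8859-1") # falls through
```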

BTW, neither Ruby 2.4.3 nor 2.5.0 recognizes bom|utf-8 as a valid encoding name:

irb(main):001:0> RUBY_VERSION
=> "2.4.3"
irb(main):002:0>  Encoding.find("bom|utf-8")
ArgumentError: unknown encoding name - bom|utf-8
	from (irb):2:in `find'
	from (irb):2
	from /Users/steven/.rbenv/versions/2.4.3/bin/irb:11:in `<main>'

It doesn't look like we actually use the internal_encoding, encoding, or external_encoding arguments. Do we still need them?
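The bom| part is a flag understood by IO open modes such as File.open(path, "r:bom|utf-8"), not part of any encoding name, which is why Encoding.find rejects it. A minimal sketch of a lookup that tolerates the prefix, assuming it should simply be ignored when only an Encoding object is needed; find_encoding_ignoring_bom is a hypothetical name.

```ruby
# Hypothetical helper: drop a leading "bom|" flag (case-insensitive)
# before asking Encoding.find for the encoding itself.
def find_encoding_ignoring_bom(name)
  Encoding.find(name.sub(/\Abom\|/i, ""))
end

p find_encoding_ignoring_bom("bom|utf-8")
p find_encoding_ignoring_bom("UTF-8")

begin
  Encoding.find("bom|utf-8")
rescue ArgumentError => e
  puts "Encoding.find still rejects it: #{e.message}"
end
```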

kou commented on July 22, 2024

@ShockwaveNN Thanks for your report. I've fixed it.
You can use the fix here.

tricknotes commented on July 22, 2024

@kou 4d13339 is so cool!
Could you release a new version including this commit?
I'd like to use this fix from a released version.

kou commented on July 22, 2024

OK.
Can you update news.md? We can release a new version once news.md has release notes for the next version.

tricknotes commented on July 22, 2024

Of course!
Thanks for giving me the chance to contribute.

I sent a PR for news.md: #36

kou commented on July 22, 2024

Great!
