
haskell-hvr / text-short


Memory-efficient representation of Unicode text strings

Home Page: https://hackage.haskell.org/package/text-short

License: BSD 3-Clause "New" or "Revised" License

Languages: Haskell 83.13%, C 16.87%
Topics: haskell, unicode, text


text-short's Issues

unsafeHead

An unsafeHead :: ShortText -> Char function in Data.Text.Short.Unsafe would be useful in a situation I'm in. Would you take a PR for this?
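A minimal sketch of what such a function could do, assuming it works on the UTF-8 bytes that text-short stores inside a ShortText (obtainable with `toShortByteString`). `unsafeHeadUtf8` is a hypothetical name, not part of text-short; like any `unsafe*` function it skips the emptiness and validity checks, so it must only be called on non-empty, well-formed UTF-8.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import qualified Data.ByteString.Short as SBS
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical helper: decode the first code point of a ShortByteString
-- that is ASSUMED to hold non-empty, well-formed UTF-8. No checks are made:
-- empty input fails with an index error, invalid input yields garbage.
unsafeHeadUtf8 :: SBS.ShortByteString -> Char
unsafeHeadUtf8 sbs
  | b0 < 0x80 = chr (w b0)                                  -- 1-byte sequence
  | b0 < 0xE0 = chr (((w b0 .&. 0x1F) `shiftL` 6) .|. cont 1) -- 2-byte sequence
  | b0 < 0xF0 =                                             -- 3-byte sequence
      chr (((w b0 .&. 0x0F) `shiftL` 12) .|. (cont 1 `shiftL` 6) .|. cont 2)
  | otherwise =                                             -- 4-byte sequence
      chr (((w b0 .&. 0x07) `shiftL` 18) .|. (cont 1 `shiftL` 12)
             .|. (cont 2 `shiftL` 6) .|. cont 3)
  where
    b0 = SBS.index sbs 0
    w :: Word8 -> Int
    w = fromIntegral
    -- payload bits (low 6) of the i-th continuation byte
    cont i = w (SBS.index sbs i) .&. 0x3F
```

With text-short this would be used as `unsafeHeadUtf8 . toShortByteString`; an implementation inside the package could read the bytes directly and avoid even that wrapper.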

Binary encode problem with GHC-9.0

    Binary.encode:                        FAIL
      src-test/Tests.hs:168:
      expected: "\NUL@@@@@@\146Hello \226\130\172 & \240\169\184\189!\NUL"
       but got: "\NUL\NUL\NUL\NUL\NUL\NUL\NUL\DC2Hello \226\130\172 & \240\169\184\189!\NUL"
      Use -p '/Binary.encode/' to rerun this test only.
    Binary.decode:                        FAIL
      Exception: Data.Binary.Get.runGet at position 8: not enough bytes
      CallStack (from HasCallStack):
        error, called at src/Data/Binary/Get.hs:345:5 in binary-0.8.9.0-640a90031da49036cea3bba3d7b0d12d9fcffb626f47339815976c15a4892a0e:Data.Binary.Get
      Use -p '/Binary.decode/' to rerun this test only.
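For reference, the "but got" line has the layout that the `binary` package uses for a strict ByteString: the length as an 8-byte big-endian Int (`\DC2` = 18, the UTF-8 byte count of the test string) followed by the raw bytes. The sketch below reproduces that layout with `binary` and `bytestring` only; it does not use text-short, and whether text-short's `Binary` instance is meant to match it exactly is an assumption.

```haskell
import qualified Data.Binary as Binary
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL
import Data.Char (ord)
import Data.Word (Word8)

-- Encode an ASCII-only String as bytes (one byte per Char).
ascii :: String -> [Word8]
ascii = map (fromIntegral . ord)

-- UTF-8 bytes of "Hello € & 𩸽!\NUL", the string used in the failing test.
payload :: BS.ByteString
payload = BS.pack (ascii "Hello " ++ [0xE2, 0x82, 0xAC] ++ ascii " & "
                   ++ [0xF0, 0xA9, 0xB8, 0xBD] ++ ascii "!" ++ [0])

-- binary writes a strict ByteString as an 8-byte big-endian length prefix
-- followed by the bytes themselves.
encoded :: BL.ByteString
encoded = Binary.encode payload

lengthPrefix :: [Word8]
lengthPrefix = take 8 (BL.unpack encoded)
```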

hPut and friends

It would be nice to be able to output a ShortText to a handle more directly than by explicitly converting it to a ByteString and using bytestring's functions for this.
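One possible shape for this, sketched under the assumption that the helper works on the ShortText's underlying UTF-8 ShortByteString (text-short's `toShortByteString`). `hPutShortUtf8` and `roundTrip` are hypothetical names. Going through a `Builder` skips the intermediate strict ByteString, although the bytes are still copied into the handle's buffer.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Short as SBS
import System.IO (Handle, IOMode (WriteMode), withBinaryFile)

-- Hypothetical helper (not part of text-short): write the raw UTF-8 bytes
-- of a ShortByteString to a handle via a Builder, without first building
-- a strict ByteString.
hPutShortUtf8 :: Handle -> SBS.ShortByteString -> IO ()
hPutShortUtf8 h = B.hPutBuilder h . B.shortByteString

-- Usage sketch: write the bytes to a file in binary mode, then read them back.
roundTrip :: FilePath -> SBS.ShortByteString -> IO BS.ByteString
roundTrip path sbs = do
  withBinaryFile path WriteMode (\h -> hPutShortUtf8 h sbs)
  BS.readFile path
```

A ShortText-level `hPut` would then just be `hPutShortUtf8 h . toShortByteString`.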

Documentation of `fromText` should be updated for compatibility with `text-2`

-- | \(\mathcal{O}(n)\) Construct 'ShortText' from 'T.Text'
--
-- This is currently not \(\mathcal{O}(1)\) because currently 'T.Text' uses UTF-16 as its internal representation.
-- In the event that 'T.Text' will change its internal representation to UTF-8 this operation will become \(\mathcal{O}(1)\).
--
-- @since 0.1
fromText :: T.Text -> ShortText

Note that O(1) is true only for the best case where the Text represents the full underlying ByteArray# rather than a slice of it.
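For comparison, here is the portable O(n) route from `Text` to UTF-8 bytes, using only `text` and `bytestring`. With text < 2 `encodeUtf8` transcodes from UTF-16; with text ≥ 2 it reduces to copying the UTF-8 slice. Either way `SBS.toShort` copies again, so an O(1) `fromText` would have to reuse the Text's `ByteArray#` directly, which (as noted above) only works when the Text spans the whole array. `textToShortUtf8` is a hypothetical name for illustration.

```haskell
import qualified Data.ByteString.Short as SBS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Portable O(n) conversion from Text to the UTF-8 bytes a ShortText stores.
-- Works with both text-1.x (UTF-16 internally) and text-2.x (UTF-8 internally).
textToShortUtf8 :: T.Text -> SBS.ShortByteString
textToShortUtf8 = SBS.toShort . TE.encodeUtf8
```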

Tighten bounds on bytestring

Right now, attempting to build with bytestring 0.10.6.0 results in:

Failed to build text-short-0.1.1.
Build log (
/home/vanessa/.cabal/logs/ghc-8.2.2/text-short-0.1.1-cb7a60e704f134de9ba10d40279fe006e0d896ed0ad4e3bc36635e5b2a9184ca.log
):
Warning: text-short.cabal:24:3: The field "default-language" is specified more
than once at positions 24:3, 43:3
Configuring library for text-short-0.1.1..
Preprocessing library for text-short-0.1.1..
Building library for text-short-0.1.1..
[1 of 3] Compiling Data.Text.Short.Internal ( src/Data/Text/Short/Internal.hs, dist/build/Data/Text/Short/Internal.o )

src/Data/Text/Short/Internal.hs:73:43: error:
    • No instance for (Semigroup ShortByteString)
        arising from the 'deriving' clause of a data type declaration
      Possible fix:
        use a standalone 'deriving instance' declaration,
          so you can specify the instance context yourself
    • When deriving the instance for (Semigroup ShortText)
   |
73 |                   deriving (Eq,Ord,Monoid,Semigroup,Hashable,NFData)
   |                                           ^^^^^^^^^
cabal: Failed to build text-short-0.1.1 (which is required by
bench:lzlib-bench from lzlib-1.0.7.0). See the build log above for details.

An unlifted ShortText# type

I've got this defined in one of my projects:

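{-# LANGUAGE DataKinds, GADTSyntax, KindSignatures, MagicHash, UnliftedNewtypes #-}
import Data.ByteString.Short.Internal (ShortByteString (SBS))
import Data.Text.Short (ShortText)
import qualified Data.Text.Short as TS
import qualified Data.Text.Short.Unsafe as TS
import GHC.Exts (ByteArray#, Levity (Unlifted), RuntimeRep (BoxedRep), TYPE)

-- An unlifted wrapper around the UTF-8 bytes of a ShortText (GHC >= 9.2).
newtype ShortText# :: TYPE ('BoxedRep 'Unlifted) where
  ShortText# :: ByteArray# -> ShortText#

-- Re-box; relies on the ByteArray# holding valid UTF-8, which unlift ensures.
lift :: ShortText# -> ShortText
lift (ShortText# x) = TS.fromShortByteStringUnsafe (SBS x)

unlift :: ShortText -> ShortText#
unlift t = case TS.toShortByteString t of
  SBS x -> ShortText# x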

In GHC 9.4, we can apply Array# to boxed unlifted types (previously, there was a nasty ArrayArray# type that was awful to use), so with this type, we can finally write things like:

Array# ShortText#

I don't technically need ShortText# to live in text-short, but I thought it might be a reasonable place for it. It requires UnliftedNewtypes, which is only available since GHC 8.10 (the `'BoxedRep 'Unlifted` kind used above needs GHC 9.2; on 8.10 and 9.0 it is spelled `'UnliftedRep`).

Exception: recoverDecode: invalid argument (invalid byte sequence) (GHC 9.0.1)

$ nix-build -A haskell.packages.ghc901.text-short '<nixpkgs>'
...
Test suite tests: RUNNING...     
Tests                                                                                                     
  Unit-tests                                         
    fromText mempty:                      OK                                                              
    fromShortByteString [0xc0,0x80]:      OK                                                              
    fromByteString [0xc0,0x80]:           OK        
    fromByteString [0xf0,0x90,0x80,0x80]: OK                                                              
    fromByteString [0xf4,0x90,0x80,0x80]: OK                                                              
    IsString U+D800:                      FAIL
      Exception: recoverDecode: invalid argument (invalid byte sequence)                                  
...

How can I help get this resolved?
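Some background on the failing case: U+D800 is a lone UTF-16 surrogate, which is not a Unicode scalar value and has no valid UTF-8 encoding, so a UTF-8-backed type has to reject or replace it when built from a `String`. `Data.Text.pack` replaces such code points with U+FFFD; the sketch below assumes text-short's `IsString` test expects the same replacement. `replaceSurrogates` is a hypothetical helper, not text-short's code, and it does not explain the `recoverDecode` exception itself.

```haskell
import Data.Char (ord)

-- Replace UTF-16 surrogate code points (U+D800..U+DFFF), which cannot be
-- encoded in UTF-8, with U+FFFD REPLACEMENT CHARACTER.
replaceSurrogates :: String -> String
replaceSurrogates = map fix
  where
    fix c
      | ord c >= 0xD800 && ord c <= 0xDFFF = '\xFFFD'
      | otherwise = c
```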

Does not work with GHCJS

This does not work with GHCJS because it uses the C FFI. I'm not sure there's a great workaround for this.
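One direction would be pure-Haskell fallbacks for the C routines on GHCJS. Which routines text-short actually implements in C isn't stated here; as an illustration only, a code-point count over UTF-8 bytes needs no FFI, because each code point has exactly one byte that is not a continuation byte (`10xxxxxx`). `utf8Length` is a hypothetical name, and going through a list is slower than a C loop.

```haskell
import Data.Bits ((.&.))
import qualified Data.ByteString.Short as SBS
import Data.Word (Word8)

-- Count code points in well-formed UTF-8 without C: count every byte
-- that is not a continuation byte (i.e. whose top two bits are not 10).
utf8Length :: SBS.ShortByteString -> Int
utf8Length = length . filter isLead . SBS.unpack
  where
    isLead :: Word8 -> Bool
    isLead b = b .&. 0xC0 /= 0x80
```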

When will the next version be released?

I was about to open a PR to fix some outdated documentation when I found out, while reading the source code, that #36, which was merged ~10 months ago, had already done just that.

When will a new version with the updated documentation be released to Hackage?
