michael105 / codepage_converter

Conversion between different codepages and/or utf8; source encoding can be guessed

License: GNU General Public License v3.0

cpconv

Codepage conversion tool.

Filters stdin to stdout. Without any options, the source charset is guessed and the text is converted to cp1252.

Currently, charmap guessing probably works only for German text: the "algorithm" looks for German umlauts (ä, ö, ...). The source explains how to add other languages.
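
As an illustration of umlaut-based guessing, here is a minimal sketch that counts how often the byte patterns for ä ö ü Ä Ö Ü ß appear when the input is read as UTF-8, cp1252 or cp437, and picks the encoding with the highest count. The function and table names are hypothetical, and it covers only three of the supported charmaps; cpconv's actual heuristic may work differently.

```c
#include <stddef.h>

/* Hypothetical sketch of umlaut-based guessing; not cpconv's actual code. */

/* Byte values of ä ö ü Ä Ö Ü ß in two single-byte codepages. */
static const unsigned char UMLAUTS_CP1252[] = { 0xE4, 0xF6, 0xFC, 0xC4, 0xD6, 0xDC, 0xDF };
static const unsigned char UMLAUTS_CP437[]  = { 0x84, 0x94, 0x81, 0x8E, 0x99, 0x9A, 0xE1 };
/* Second byte of the two-byte UTF-8 sequences 0xC3 0xNN for the same letters. */
static const unsigned char UMLAUTS_UTF8_2ND[] = { 0xA4, 0xB6, 0xBC, 0x84, 0x96, 0x9C, 0x9F };

static int in_set(unsigned char c, const unsigned char *set, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (set[i] == c)
            return 1;
    return 0;
}

/* Returns "utf8", "cp1252" or "cp437", or NULL if no umlaut was found. */
const char *guess_codepage(const unsigned char *buf, size_t len)
{
    size_t utf8 = 0, cp1252 = 0, cp437 = 0;

    for (size_t i = 0; i < len; i++) {
        unsigned char c = buf[i];
        if (c == 0xC3 && i + 1 < len &&
            in_set(buf[i + 1], UMLAUTS_UTF8_2ND, sizeof UMLAUTS_UTF8_2ND)) {
            utf8++;
            i++; /* skip the continuation byte so it is not counted as cp437 */
            continue;
        }
        if (in_set(c, UMLAUTS_CP1252, sizeof UMLAUTS_CP1252))
            cp1252++;
        if (in_set(c, UMLAUTS_CP437, sizeof UMLAUTS_CP437))
            cp437++;
    }

    if (utf8 == 0 && cp1252 == 0 && cp437 == 0)
        return NULL; /* nothing to go on, caller falls back to a default */
    if (utf8 >= cp1252 && utf8 >= cp437)
        return "utf8";
    return cp1252 >= cp437 ? "cp1252" : "cp437";
}
```

Adding another language in this scheme would mean adding that language's characteristic letters to each table.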

Only single-byte "extended ASCII" codepages (byte values 0-255) and UTF-8 are implemented.
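
A single-byte codepage can be converted to UTF-8 with a lookup table for the upper half followed by a UTF-8 encoder. The sketch below does this for cp1252, where only bytes 0x80-0x9F differ from Latin-1. The names are hypothetical, and replacing undefined bytes with '?' is an assumption of this sketch (cpconv itself offers -x and -u for non-convertible characters).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch; cpconv's internal tables and names may differ.
   cp1252 bytes 0x80-0x9F mapped to Unicode; 0 marks an undefined byte. */
static const uint16_t CP1252_80_9F[32] = {
    0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,
    0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178
};

/* Decode one cp1252 byte; returns -1 for the undefined bytes
   0x81, 0x8D, 0x8F, 0x90 and 0x9D. */
int32_t cp1252_to_codepoint(unsigned char c)
{
    if (c < 0x80 || c >= 0xA0)
        return c; /* ASCII and 0xA0-0xFF are identical to Latin-1 */
    uint16_t cp = CP1252_80_9F[c - 0x80];
    return cp ? (int32_t)cp : -1;
}

/* Encode a code point below 0x10000 as UTF-8; returns the byte count. */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}

/* Convert a cp1252 buffer; out must hold at least 3 * len bytes. */
size_t cp1252_to_utf8(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        int32_t cp = cp1252_to_codepoint(in[i]);
        if (cp < 0)
            cp = '?'; /* placeholder for non-convertible bytes */
        n += utf8_encode((uint32_t)cp, out + n);
    }
    return n;
}
```

Converting in the other direction (UTF-8 to a single-byte codepage) needs the inverse mapping, e.g. a search through the same table.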

Conversion from and to the following charmaps is possible:

cpe4002a
cp850
cp437
cp1252
cp1250
cp1251
cp1253
cp1255
cp1256
cp1257
cp1258
atarist
macintosh
mac_centraleurope
iso8859_15
utf8

cpe4002a is a special codepage that I use with slterm (https://github.com/michael105/slterm).

The option -c converts the input to C string notation ("\x84\xef ..."). The input is not converted between codepages first: extended ASCII bytes (128-255) and control characters below ASCII 32 are written as \xnn, while linebreaks ("\n", "\x0a") are left unchanged. The option -C escapes linebreaks as well.
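
A minimal sketch of this kind of escaping, assuming the rules described above; the function name and the escape_newlines parameter (standing in for -c versus -C) are hypothetical. One design choice not stated for cpconv: C reads every hex digit after \x as part of the escape, so "\x84a" would be a single escape. The sketch therefore closes and reopens the literal ("") before a hex digit that follows an escape. Like the description, it passes quotes and backslashes through unchanged.

```c
#include <stddef.h>

/* Hypothetical sketch of -c / -C style output; not cpconv's actual code. */
static int is_hex_digit(unsigned char c)
{
    return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') ||
           (c >= 'A' && c <= 'F');
}

/* Writes a NUL-terminated C-string-notation copy of in[0..len) to out,
   which must hold at least 4 * len + 1 bytes. Bytes >= 0x80 and control
   characters < 0x20 become \xnn; '\n' is kept unless escape_newlines is set.
   Returns the number of characters written, excluding the NUL. */
size_t to_cstring(const unsigned char *in, size_t len, char *out,
                  int escape_newlines)
{
    static const char hex[] = "0123456789abcdef";
    size_t n = 0;
    int after_hex_escape = 0;

    for (size_t i = 0; i < len; i++) {
        unsigned char c = in[i];
        if (c >= 0x80 || (c < 0x20 && (c != '\n' || escape_newlines))) {
            out[n++] = '\\';
            out[n++] = 'x';
            out[n++] = hex[c >> 4];
            out[n++] = hex[c & 0x0F];
            after_hex_escape = 1;
            continue;
        }
        /* Split the literal so a following hex digit is not
           swallowed by the preceding \x escape. */
        if (after_hex_escape && is_hex_digit(c)) {
            out[n++] = '"';
            out[n++] = '"';
        }
        out[n++] = (char)c;
        after_hex_escape = 0;
    }
    out[n] = '\0';
    return n;
}
```

For example, the bytes 0x84 'a' 'b' 'c' '\n' become \x84""abc followed by an unmodified linebreak.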

It's a small hack, but useful if you work at the terminal. Most programs work with, e.g., cp1252 or cp437 (the set with box-drawing border symbols), but I always had a hard time with German umlauts and UTF-8.

BOM characters aren't translated, but I haven't come across them yet. (This may be obvious: they aren't contained in the extended ASCII sets.)
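
If BOMs become an issue, one option is to check for a byte order mark before converting and skip it. The following is a hypothetical pre-filter, not part of cpconv; it recognizes the UTF-8 BOM (EF BB BF) and the two UTF-16 BOMs, and reports which one it found.

```c
#include <stddef.h>

/* Hypothetical BOM check, not part of cpconv. Returns the BOM length
   (3 for UTF-8, 2 for UTF-16) or 0 if none, and sets *kind accordingly.
   Note: a UTF-32 LE BOM (FF FE 00 00) would be reported as utf16le here. */
size_t detect_bom(const unsigned char *buf, size_t len, const char **kind)
{
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF) {
        *kind = "utf8";
        return 3;
    }
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF) {
        *kind = "utf16be";
        return 2;
    }
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE) {
        *kind = "utf16le";
        return 2;
    }
    *kind = NULL;
    return 0;
}
```

A UTF-8 BOM can simply be skipped; UTF-16 input would need a different decoder, since it is neither single-byte nor UTF-8.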

I wouldn't recommend using this in a security-relevant context; it's not written with security in mind, and BOM characters could become a problem. Filtering text should be safe, just don't put it behind a server.

# cpconv -h

convert stdin to stdout
Usage: cpconv [-hslxud] [tocp [fromcp]]

Examples: cat text.txt | cpconv cp437 cp850
         (convert from cp850 to cp437)

          cat text.txt | cpconv
         (convert to cp1252, guessing source charset)

Without any options, try to guess the charset and convert to cp1252
(change the default in the source, if needed)

options: -s : silence, no messages to stderr
         -v : verbose
         -l : list codepages
         -U : dump umlaute, converted
         -c : convert input to cstring notation
         -C : convert input to cstring notation, including linebreaks
         -x : display non convertible chars in hexadecimal
         -u : display non convertible chars as utf8
         -d : print debug information
         -h : show this help

(I'm using this as input filter for vi, and to copy text between terminal and x clipboard
 the conversion is done automatically, if needed)

miSc, Michael Myer, 2023, GPL

github.com/michael105/codepage_converter