xqq / libaribcaption

Portable ARIB STD-B24 Caption Decoder/Renderer

License: MIT License

CMake 4.52% C 6.98% C++ 88.50%
arib aribb24 caption closedcaption cpp dtv ffmpeg

libaribcaption's Introduction

libaribcaption

A portable caption decoder / renderer for handling ARIB STD-B24 based TV broadcast captions.

Background

While the CEA-608/708 closed caption standards are used by the ATSC system in North America, and DVB Subtitles / DVB Teletext defined in the DVB standard are used in Europe and many other parts of the world, Japan established its own TV broadcasting standard, ISDB, which includes a caption service defined in ARIB STD-B24 by the Association of Radio Industries and Businesses (ARIB).

Brazil also adopted ISDB-T International for its broadcasting by establishing a Brazilian version, SBTVD / ISDB-Tb, based on the Japanese standard; it has been widely used in South American countries and around the world. The Brazilian version also includes a caption service for Latin languages defined in ABNT NBR 15606-1, which is modified from the ARIB STD-B24 specification. The Philippines also adopted ISDB-T International based on the Brazilian standards, but uses UTF-8 for caption encoding based on the Japanese specification ARIB STD-B24.

Though ISDB-based TV broadcasting has been operating for about 20 years, ARIB-based captions still lack support in general-purpose players.

Overview

libaribcaption provides a decoder and a renderer for handling ARIB STD-B24 based broadcast captions, making it possible for general-purpose players to render ARIB captions with the same effect as a television (or even better).

libaribcaption is written in C++17 but also provides C interfaces to make it easier to integrate into video players. It is a lightweight library that only depends on libfreetype and libfontconfig in the worst case.

libaribcaption is a cross-platform library that works on various platforms, including but not limited to:

  • Windows 7+
  • Windows XP+ (libfreetype required)
  • Linux (libfreetype and libfontconfig required)
  • Android 2.x+ (libfreetype required)
  • macOS
  • iOS

Screenshot

screenshot0.png

Features

  • Support captions in Japanese (ARIB STD-B24 JIS), Latin languages (ABNT NBR 15606-1) and Philippine (ARIB STD-B24 UTF-8)
  • Full support for rendering ARIB additional symbols (Gaiji) and DRCS characters
  • Lightweight and portable implementation that works on various platforms
  • Performance optimized (SSE2 on x86/x64) graphics rendering
  • Multiple text rendering backends driven by DirectWrite / CoreText / FreeType
  • Zero third-party dependencies on Windows (using DirectWrite) and macOS / iOS (using CoreText)
  • Built-in font fallback mechanism
  • Built-in DRCS conversion table for replacing known DRCS characters with, or rendering them by, alternative Unicode characters

Build

CMake 3.11+ and a C++17 compatible compiler are required for building. Usually you just have to:

cd libaribcaption
mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
cmake --build . -j8
cmake --install .      # Optional

By default libaribcaption is compiled as a static library. Specify ARIBCC_SHARED_LIBRARY:BOOL=ON to build it as a shared library:

cmake .. -DCMAKE_BUILD_TYPE=Release -DARIBCC_SHARED_LIBRARY:BOOL=ON    # or -DBUILD_SHARED_LIBS:BOOL=ON

libaribcaption has several CMake options that can be specified:

ARIBCC_BUILD_TESTS:BOOL            # Compile test codes inside /test. Default to OFF
ARIBCC_SHARED_LIBRARY:BOOL         # Compile as shared library. Default to OFF
ARIBCC_NO_EXCEPTIONS:BOOL          # Disable C++ Exceptions. Default to OFF
ARIBCC_NO_RTTI:BOOL                # Disable C++ RTTI. Default to OFF
ARIBCC_NO_RENDERER:BOOL            # Disable the renderer and leave only the decoder behind. Default to OFF
ARIBCC_IS_ANDROID:BOOL             # Indicate target platform is Android. Detected automatically by default.
ARIBCC_USE_DIRECTWRITE:BOOL        # Enable DirectWrite font provider & renderer. Default to ON on Windows
ARIBCC_USE_GDI_FONT:BOOL           # Enable GDI font provider which is necessary for WinXP support. Default to OFF.
ARIBCC_USE_CORETEXT:BOOL           # Enable CoreText font provider & renderer. Default to ON on macOS / iOS
ARIBCC_USE_FREETYPE:BOOL           # Enable FreeType based renderer. Default to ON on Linux / Android
ARIBCC_USE_EMBEDDED_FREETYPE:BOOL  # Use embedded FreeType instead of searching system library. Default to OFF
ARIBCC_USE_FONTCONFIG:BOOL         # Enable Fontconfig font provider. Default to ON on Linux and other platforms

By default, libaribcaption enables only DirectWrite on Windows and CoreText on macOS / iOS, without any third-party dependencies, but you can still enable the FreeType-based text renderer by specifying -DARIBCC_USE_FREETYPE:BOOL=ON.

For Windows XP support, you have to turn off DirectWrite (leaving it enabled will result in a crash), and enable the GDI font provider and FreeType:

cmake .. -DCMAKE_BUILD_TYPE=Release -DARIBCC_USE_DIRECTWRITE:BOOL=OFF -DARIBCC_USE_GDI_FONT:BOOL=ON -DARIBCC_USE_FREETYPE:BOOL=ON

To enable the FreeType text renderer on Windows, consider using vcpkg or msys2 to obtain third-party libraries.

If you are in an environment (like Android NDK or Windows) where a system-wide FreeType installation is hard to prepare, consider using the embedded FreeType by specifying -DARIBCC_USE_EMBEDDED_FREETYPE:BOOL=ON. This option will automatically fetch and compile a statically linked FreeType library internally.

Usage

libaribcaption can be imported through find_package() if you have installed it system-wide:

cmake_minimum_required(VERSION 3.11)
project(testarib LANGUAGES C CXX)

find_package(aribcaption REQUIRED)

add_executable(testarib main.cpp)

target_compile_features(testarib
    PRIVATE
        cxx_std_17
)

target_include_directories(testarib
    PRIVATE
        ${ARIBCAPTION_INCLUDE_DIR}
)

target_link_libraries(testarib
    PRIVATE
        aribcaption::aribcaption
)

Or use add_subdirectory() to import the source folder directly:

cmake_minimum_required(VERSION 3.11)
project(testarib2 LANGUAGES C CXX)

set(ARIBCC_USE_FREETYPE ON CACHE BOOL "Enable FreeType")    # Indicate options here (optional)
add_subdirectory(thirdparty/libaribcaption)

add_executable(testarib2 main.cpp)

target_compile_features(testarib2
    PRIVATE
        cxx_std_17
)

target_link_libraries(testarib2
    PRIVATE
        aribcaption::aribcaption
)

Or use pkg-config if you have installed it system-wide:

# Link to libaribcaption static library
gcc main.c -o main `pkg-config --cflags --libs --static libaribcaption`

# Link to libaribcaption shared library
gcc main.c -o main `pkg-config --cflags --libs libaribcaption`

Documents

See the comments in public headers, and sample code with ffmpeg

Hints

libaribcaption's C++ headers are also written in C++17. If your environment doesn't support C++17, consider using the C API or switching to a newer compiler.

The C API (public headers with the ".h" extension) can be useful for calling from pure C or other languages; see the capi sample for usage.

Recommended fonts

These fonts are recommended for Japanese ARIB caption rendering:

Windows TV MaruGothic

Hiragino Maru Gothic ProN (macOS)

Rounded M+ 1m for ARIB

和田研中丸ゴシック2004ARIB

License

libaribcaption is released under MIT License. You should include the copyright notice and permission notice in your distribution.

References

ARIB STD-B24

ARIB TR-B14

ABNT NBR 15606-1

ISDB-T Standards (Philippines)

Other implementations

libaribcaption is heavily inspired by the following projects:

aribb24

aribb24.js

TVCaptionMod2

libaribcaption's People

Contributors

aimoff, btbn, otya128, tguillem, vroad, xqq, xtne6f

libaribcaption's Issues

Wrong interpretation of HLC (highlighting control)

The LSB (least significant bit) of the HLC parameter means "bottom" (the side of the next line).

-    kEnclosureStyleTop = 1u << 0,
+    kEnclosureStyleBottom = 1u << 0,
-    kEnclosureStyleBottom = 1u << 2,
+    kEnclosureStyleTop = 1u << 2,
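
To make the corrected bit assignment concrete, here is a minimal self-contained sketch. Only the two values from the diff above are used; the `DescribeEnclosure` helper is hypothetical and not part of libaribcaption, and the remaining sides of the enclosure are left out on purpose.

```cpp
#include <cstdint>
#include <string>

// Values for bottom/top follow the corrected diff above.
// Other enclosure sides are intentionally omitted from this sketch.
enum EnclosureStyle : uint8_t {
    kEnclosureStyleNone   = 0,
    kEnclosureStyleBottom = 1u << 0,  // LSB: the side facing the next line
    kEnclosureStyleTop    = 1u << 2,
};

// Hypothetical helper: lists which of the two sides are set in hlc_bits.
std::string DescribeEnclosure(uint8_t hlc_bits) {
    std::string out;
    if (hlc_bits & kEnclosureStyleBottom) out += "bottom ";
    if (hlc_bits & kEnclosureStyleTop) out += "top ";
    if (out.empty()) return "none";
    return out;
}
```

With this mapping, a value whose only set bit is the LSB describes a line drawn under the text rather than above it.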

Casting DeleteDC to void(*)(HDC) is impossible

In "src/renderer/font_provider_gdi.cpp", the member hdc_ holds DeleteDC as a deleter. Declaration of DeleteDC is the following:

BOOL __stdcall DeleteDC(HDC hdc);

Then, ScopedHolder<HDC> tries to cast the deleter into void (_cdecl*)(HDC) (by default, the _cdecl calling convention is used), which produces a C2664 error on an MSVC 15.9.43 x86 build.
__stdcall and _cdecl are different calling conventions, so a pointer to one generally cannot be converted to a pointer to the other. Such a cast may be possible on AMD64, but not on x86.
As a solution, I suggest using a captureless lambda as the deleter, like this:

[](HDC h) { DeleteDC(h); }
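
The suggestion can be tried without the Windows headers. The sketch below is an assumption-laden stand-in: `ScopedHolder` here is a simplified hypothetical version, not the class from libaribcaption, and `FakeDeleteHandle` plays the role of DeleteDC (a function whose type does not match `void (*)(T)` because it returns a value). A captureless lambda converts implicitly to a plain function pointer, so the wrapper adapts the signature.

```cpp
// Simplified, hypothetical stand-in for ScopedHolder: owns a handle and
// calls a plain function-pointer deleter on destruction.
template <typename T>
class ScopedHolder {
public:
    using Deleter = void (*)(T);

    ScopedHolder(T handle, Deleter deleter) : handle_(handle), deleter_(deleter) {}
    ~ScopedHolder() {
        if (deleter_) deleter_(handle_);
    }

    ScopedHolder(const ScopedHolder&) = delete;
    ScopedHolder& operator=(const ScopedHolder&) = delete;

    T Get() const { return handle_; }

private:
    T handle_;
    Deleter deleter_;
};

using FakeHandle = int;
static int g_release_count = 0;

// Stand-in for DeleteDC: it returns a value, so its type is not void (*)(FakeHandle).
int FakeDeleteHandle(FakeHandle) {
    ++g_release_count;
    return 1;
}

// Returns how many times the deleter ran while the holder went out of scope.
int ReleaseViaLambda() {
    const int before = g_release_count;
    {
        // The captureless lambda converts to void (*)(FakeHandle) and
        // discards the return value of the wrapped function.
        ScopedHolder<FakeHandle> holder(42, [](FakeHandle h) { FakeDeleteHandle(h); });
        (void)holder.Get();
    }
    return g_release_count - before;
}
```

The same pattern sidesteps calling-convention mismatches, because the lambda body makes an ordinary call to the wrapped function instead of reinterpreting its pointer type.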

Issue with G3 special character set

I have been having an issue with displaying a quaver (music note) using Latin schema. A peek into the code shows that the implementation of kLatinSpecialTable does not match the way its index is being calculated in HandleGLGR, where for GraphicSet::kLatinSpecial case we have:

uint32_t index = (uint32_t)ch - 0x21;

So, for ch = 0x21 we will get 0x0021, while we should get 0x266a according to ABNT 15608-3
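
The expected behaviour can be written down as a small lookup. This is a sketch only: the table below is a hypothetical one-entry excerpt holding the single mapping stated in this report (0x21 to U+266A, the eighth note), and `LookupLatinSpecial` is not a function from the library.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical excerpt: only the first entry is known from the report above.
constexpr uint32_t kLatinSpecialTableSketch[] = {
    0x266A,  // ch = 0x21 -> U+266A (eighth note)
};

// Returns the mapped code point, or 0 when ch is outside the sketched table.
uint32_t LookupLatinSpecial(uint8_t ch) {
    if (ch < 0x21) return 0;
    const size_t index = static_cast<size_t>(ch) - 0x21;  // same index formula as in HandleGLGR
    constexpr size_t kCount =
        sizeof(kLatinSpecialTableSketch) / sizeof(kLatinSpecialTableSketch[0]);
    if (index >= kCount) return 0;
    return kLatinSpecialTableSketch[index];
}
```

The point is that the table has to be laid out so that index 0 corresponds to ch = 0x21; otherwise the index formula and the table disagree.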

Tag a release?

Having a stable release tagged is required for inclusion in many package managers. Could you tag a 1.0.0 or 0.1.0 version? Thanks.

ffmpeg's decoder error [libaribcaption @ 0000025ee70e1e00]

I got the following error in the decoder process inside ffmpeg.

[libaribcaption @ 0000025ee70e1e00] DecoderImpl: Cannot convert unrecognized DRCS pattern with MD5 a368b4ce2212ef80e2bf3d68559f5151 to Unicode
[libaribcaption @ 0000025ee70e1e00] DecoderImpl: Cannot convert unrecognized DRCS pattern with MD5 33d4c5243a45503d43fbb858a728664d to Unicode
[libaribcaption @ 0000025ee70e1e00] DecoderImpl: Cannot convert unrecognized DRCS pattern with MD5 33d4c5243a45503d43fbb858a728664d to Unicode
[libaribcaption @ 0000025ee70e1e00] DecoderImpl: Cannot convert unrecognized DRCS pattern with MD5 33d4c5243a45503d43fbb858a728664d to Unicode
Last message repeated 2 times
[libaribcaption @ 0000025ee70e1e00] DecoderImpl: Cannot convert unrecognized DRCS pattern with MD5 33d4c5243a45503d43fbb858a728664d to Unicode
Last message repeated 3 times
[libaribcaption @ 0000025ee70e1e00] DecoderImpl: Cannot convert unrecognized DRCS pattern with MD5 33d4c5243a45503d43fbb858a728664d to Unicode
Last message repeated 5 times
[libaribcaption @ 0000025ee70e1e00] DecoderImpl: Cannot convert unrecognized DRCS pattern with MD5 a368b4ce2212ef80e2bf3d68559f5151 to Unicode
[libaribcaption @ 0000025ee70e1e00] DecoderImpl: Cannot convert unrecognized DRCS pattern with MD5 33d4c5243a45503d43fbb858a728664d to Unicode
Last message repeated 5 times
[libaribcaption @ 0000025ee70e1e00] DecoderImpl: Cannot convert unrecognized DRCS pattern with MD5 33d4c5243a45503d43fbb858a728664d to Unicode
Last message repeated 1 times
[libaribcaption @ 0000025ee70e1e00] DecoderImpl: Cannot convert unrecognized DRCS pattern with MD5 33d4c5243a45503d43fbb858a728664d to Unicode
Last message repeated 1 times
[libaribcaption @ 0000025ee70e1e00] DecoderImpl: Cannot convert unrecognized DRCS pattern with MD5 33d4c5243a45503d43fbb858a728664d to Unicode
[libaribcaption @ 0000025ee70e1e00] DecoderImpl: Cannot convert unrecognized DRCS pattern with MD5 33d4c5243a45503d43fbb858a728664d to Unicode
Last message repeated 2 times
[libaribcaption @ 0000025ee70e1e00] DecoderImpl: Cannot convert unrecognized DRCS pattern with MD5 33d4c5243a45503d43fbb858a728664d to Unicode
Last message repeated 2 times
[libaribcaption @ 0000025ee70e1e00] DecoderImpl: Cannot convert unrecognized DRCS pattern with MD5 33d4c5243a45503d43fbb858a728664d to Unicode
[libaribcaption @ 0000025ee70e1e00] DecoderImpl: Cannot convert unrecognized DRCS pattern with MD5 33d4c5243a45503d43fbb858a728664d to Unicode
[libaribcaption @ 0000025ee70e1e00] DecoderImpl: Cannot convert unrecognized DRCS pattern with MD5 a368b4ce2212ef80e2bf3d68559f5151 to Unicode

I presume that there is a character string that has not been converted, but how should I deal with it?
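
Going by the wording of the log and the README's mention of a built-in DRCS converting table, these messages appear to come from a lookup keyed by the MD5 of the DRCS pattern. The following is a rough sketch of such a lookup, not the library's actual code: the function name is made up, and the single table entry is a placeholder digest rather than a real pattern.

```cpp
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical lookup: maps the MD5 (hex) of a DRCS pattern to a Unicode code point.
std::optional<char32_t> LookupDrcsReplacement(const std::string& md5_hex) {
    static const std::unordered_map<std::string, char32_t> kTable = {
        // Placeholder entry for illustration only; not a real DRCS digest.
        {"00000000000000000000000000000000", U'\u266A'},
    };
    const auto it = kTable.find(md5_hex);
    if (it == kTable.end()) {
        return std::nullopt;  // unrecognized pattern: no Unicode replacement available
    }
    return it->second;
}
```

Under that reading, a digest such as a368b4ce2212ef80e2bf3d68559f5151 is simply one with no entry in the table, so no text replacement is produced for that character.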

DecodeStatus needs enriching

Hello,

The ARIB STD-B24 spec does not specify how or when caption management data should be sent. While analyzing various streams, I've observed two distinct patterns: some streams consistently place management data ahead of caption statements, while others transmit management data sporadically, at intervals of around 10 seconds. During these intervals, numerous caption statements are broadcast, so the user needs to be able to distinguish and ignore them until management data is received.

Currently, users of the libaribcaption library have difficulty determining the type of data received. The DecoderImpl::Decode function, along with the enum DecodeStatus, only covers three cases: error, noCaption, and gotCaption. Therefore, I attempted to treat noCaption as an indication of caption management data. However, this approach proved unreliable, as valid caption statement data was occasionally misinterpreted as noCaption. This occurred particularly when using EncodingScheme::kAuto with a Latin broadcast stream where no regions were recognized, resulting in noCaption from DecoderImpl::Decode. Resolving this required switching to EncodingScheme::kABNT_NBR_15606_1_Latin during decoder initialization.

To address this inconsistency, it would be beneficial to extend the DecodeStatus enumeration with a new value, such as kMgnt, and to enhance DecoderImpl::Decode to return kMgnt specifically for caption management data.
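
As an illustration of how a caller might use such a value, here is a self-contained sketch. Everything in it is hypothetical: the enum is a local mock in which kMgnt is the proposed addition (the other three names are assumed spellings of the cases listed above), and `ManagementGate` is an invented helper, not library code.

```cpp
#include <vector>

// Local mock of the status enum; kMgnt is the value proposed in this issue.
enum class DecodeStatus {
    kError,
    kNoCaption,
    kGotCaption,
    kMgnt,  // hypothetical: caption management data was decoded
};

// Hypothetical helper: drops caption statements until management data has been seen.
class ManagementGate {
public:
    // Returns true when the caption for this status should be displayed.
    bool Accept(DecodeStatus status) {
        if (status == DecodeStatus::kMgnt) {
            seen_management_ = true;
            return false;  // management data itself carries nothing to display
        }
        return seen_management_ && status == DecodeStatus::kGotCaption;
    }

private:
    bool seen_management_ = false;
};

// Counts how many statuses in a sequence would lead to a displayed caption.
int CountAccepted(const std::vector<DecodeStatus>& statuses) {
    ManagementGate gate;
    int accepted = 0;
    for (DecodeStatus s : statuses) {
        if (gate.Accept(s)) ++accepted;
    }
    return accepted;
}
```

With a dedicated status, the caller no longer has to guess from noCaption whether management data arrived.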

Alternatively, it's worth investigating the possibility of a bug within the library where valid caption statement data is erroneously interpreted as noCaption as described above.

Kind regards
Daniel

duplicate symbol: MD5_*

ld.lld: error: duplicate symbol: MD5_Update
>>> defined at libaribcaption/src/base/md5.c
>>>            libaribcaption.a(md5.c.obj)
>>> defined at crypto/md5/md5_dgst.c
>>>            libcrypto.a(libcrypto-lib-md5_dgst.obj)

ld.lld: error: duplicate symbol: MD5_Final
>>> defined at libaribcaption/src/base/md5.c
>>>            libaribcaption.a(md5.c.obj)
>>> defined at crypto/md5/md5_dgst.c
>>>            libcrypto.a(libcrypto-lib-md5_dgst.obj)

ld.lld: error: duplicate symbol: MD5_Init
>>> defined at libaribcaption/src/base/md5.c
>>>            libaribcaption.a(md5.c.obj)
>>> defined at crypto/md5/md5_dgst.c
>>>            libcrypto.a(libcrypto-lib-md5_dgst.obj)
clang: error: linker command failed with exit code 1 (use -v to see invocation)

This problem occurs when linking mpv with statically linked OpenSSL, libaribcaption, and FFmpeg.
