text-clipper.js

Fast and correct clip functions for HTML and plain text.

Why use text-clipper?

text-clipper offers the following advantages over similar libraries that allow clipping HTML:

  • Correctness
    • HTML is processed through a proper state machine, no regular expression hacks.
    • Valid HTML input always produces valid HTML output.
    • Heavily unit-tested to support the above statement.
  • Proper Unicode handling
    • Unicode awareness ensures that characters such as emojis are never clipped halfway.
  • Performance
    • Text-clipper has been carefully optimized and is typically as fast as or faster than its competitors (see: blog).
  • Consistent API and behavior for both HTML and plain text
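
To show what Unicode-aware clipping guards against, the sketch below compares three ways of keeping the first n characters of a string in plain JavaScript. It is not text-clipper's implementation. The function names clipCodeUnits, clipCodePoints and clipGraphemes are hypothetical and exist only for this illustration.

```javascript
// Illustrative only: three ways to keep the first `n` "characters" of a string.
// None of these is text-clipper's own implementation.

// Counts UTF-16 code units: an emoji such as "👍" occupies two units,
// so a cut can land between the halves of a surrogate pair.
function clipCodeUnits(text, n) {
  return text.slice(0, n);
}

// Counts Unicode code points: surrogate pairs stay intact, but sequences made
// of several code points (such as flag emojis) can still be split.
function clipCodePoints(text, n) {
  return Array.from(text).slice(0, n).join("");
}

// Counts grapheme clusters (user-perceived characters) via Intl.Segmenter.
function clipGraphemes(text, n) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return Array.from(segmenter.segment(text), (s) => s.segment)
    .slice(0, n)
    .join("");
}

const sample = "Hi 👍 there";
console.log(clipCodeUnits(sample, 4).endsWith("\uD83D")); // true: a lone surrogate remains
console.log(clipCodePoints(sample, 4)); // Hi 👍
```

Intl.Segmenter is available in current browsers and recent Node.js versions; where it is missing, counting code points with Array.from is the next-best option.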

Usage

Deno

First install the package:

$ deno add @arendjr/text-clipper

Once installed, you can use it as follows:

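import clip from "@arendjr/text-clipper";

// Placeholder inputs; substitute your own content.
const string = "A plain-text string that may be longer than you want to display.";
const htmlString = "<p>Some <em>HTML</em> content.</p><p>More content.</p>";

const clippedString = clip(string, 80); // returns a string of at most 80 characters

const clippedHtml = clip(htmlString, 140, { html: true, maxLines: 5 });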

Bun

Install using the following command instead:

$ bunx jsr add @arendjr/text-clipper

For usage instructions, see above.

Node.js

Install using one of the following commands, depending on your package manager:

$ npx jsr add @arendjr/text-clipper # If using NPM
$ yarn dlx jsr add @arendjr/text-clipper # If using Yarn
$ pnpm dlx jsr add @arendjr/text-clipper # If using PNPM

For usage instructions, see above.

Options

breakWords

By default, text-clipper tries to break only at word boundaries so words don't get clipped halfway. Set this option to true if you want words to be broken up.
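
The difference between the default behavior and breakWords: true can be sketched for plain text as follows. clipPlain is a hypothetical helper, not part of text-clipper. It assumes that the indicator counts toward the length limit (as the "at most 80 characters" comment in the usage example suggests) and treats any whitespace as a word boundary.

```javascript
// Hypothetical helper (not text-clipper's API): clips plain text to at most
// `maxLength` code points, including the indicator.
function clipPlain(text, maxLength, { breakWords = false, indicator = "…" } = {}) {
  const chars = Array.from(text);
  if (chars.length <= maxLength) return text; // nothing to clip

  // Reserve room for the indicator so the result never exceeds maxLength.
  let end = Math.max(0, maxLength - Array.from(indicator).length);

  if (!breakWords) {
    // Walk back until the character right after the cut is whitespace,
    // so the last kept word is complete.
    let i = end;
    while (i > 0 && !/\s/.test(chars[i])) i--;
    // If no whitespace was found, keep the hard break at `end`.
    if (i > 0) end = i;
  }

  return chars.slice(0, end).join("").trimEnd() + indicator;
}

console.log(clipPlain("The quick brown fox", 12)); // The quick…
console.log(clipPlain("The quick brown fox", 12, { breakWords: true })); // The quick b…
```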

html

By default, text-clipper treats the input string as plain text. This is undesirable if the input string is HTML, because it might result in broken HTML tags. Set this option to true to make text-clipper treat the input as HTML, in which case it will try to always return valid HTML, provided the input is valid as well.
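
Part of returning valid HTML is closing every element that is still open at the clipping point. The sketch below shows only that step, using a simple stack. It is not text-clipper's state machine. It assumes that the tag tokens before the cut have already been extracted from well-formed input, and the function name closingTagsFor is hypothetical.

```javascript
// Void elements never take a closing tag (per the HTML standard).
const VOID_ELEMENTS = new Set([
  "area", "base", "br", "col", "embed", "hr", "img",
  "input", "link", "meta", "source", "track", "wbr",
]);

// Hypothetical sketch: given the tags seen before the clipping point (in order,
// from well-formed input), return the closing tags that are still needed.
function closingTagsFor(tags) {
  const stack = [];
  for (const tag of tags) {
    if (tag.startsWith("</")) {
      stack.pop(); // well-formed input: this closes the innermost open element
    } else {
      const name = tag.slice(1).split(/[\s\/>]/)[0].toLowerCase();
      // Void and self-closing tags leave nothing open.
      if (!VOID_ELEMENTS.has(name) && !tag.endsWith("/>")) stack.push(name);
    }
  }
  // Close the innermost element first.
  return stack.reverse().map((name) => `</${name}>`).join("");
}

console.log(closingTagsFor(["<p>", '<a href="#">', "<br>"])); // </a></p>
```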

imageWeight

The number of characters to count for each image. This applies whenever an image is encountered, and also to embedded SVG and MathML content. The default is 2.
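
As a worked example of the arithmetic, assume each image simply counts as imageWeight characters toward maxLength. The token shape and the weightedLength helper below are hypothetical. They illustrate only that assumption and are not text-clipper internals.

```javascript
// Hypothetical token model: text runs and images (img, svg, math) in document order.
function weightedLength(tokens, imageWeight = 2) {
  let total = 0;
  for (const token of tokens) {
    // Images contribute a fixed weight; text contributes its code-point count.
    total += token.type === "image" ? imageWeight : Array.from(token.text).length;
  }
  return total;
}

const tokens = [
  { type: "text", text: "Hello " },
  { type: "image" },
  { type: "text", text: "world" },
];
console.log(weightedLength(tokens)); // 13 (6 + 2 + 5)
console.log(weightedLength(tokens, 10)); // 21 (6 + 10 + 5)
```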

indicator

The string inserted to indicate that the text was clipped. Default: '…'.

Note that the indicator is never inserted if only whitespace remains after the clipping point.
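
The whitespace rule can be expressed as a small check on the part that would be cut off. appendIndicator is a hypothetical helper that models the documented behavior for plain text. It is not taken from text-clipper's source.

```javascript
// Hypothetical helper: decides whether to append the indicator after a cut.
function appendIndicator(kept, removed, indicator = "…") {
  // Only whitespace was cut off, so the text counts as not clipped.
  if (/^\s*$/.test(removed)) return kept;
  return kept + indicator;
}

const text = "Hello world   ";
console.log(appendIndicator(text.slice(0, 11), text.slice(11))); // Hello world
console.log(appendIndicator(text.slice(0, 5), text.slice(5))); // Hello…
```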

insertIndicatorAtLinebreak

Whether the indicator should be inserted when the text is clipped at a linebreak. Default: true.

maxLines

Maximum number of lines allowed. If given, the string will be clipped as soon as either the maximum number of characters is exceeded or maxLines newlines are encountered, whichever comes first.

Note that in HTML mode, block-level elements trigger newlines, and text-clipper assumes the text will be displayed with a CSS white-space setting that treats \n as a line break. The HTML <br> tag is counted as well.
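
For plain text, the "whichever comes first" rule can be sketched as a single scan that stops at either limit. findCutIndex is hypothetical. It ignores the indicator and HTML block elements, and counts only literal \n characters.

```javascript
// Hypothetical sketch: returns the code-point index where plain text would be
// cut under both limits, or -1 if neither limit is reached.
function findCutIndex(text, maxLength, maxLines = Infinity) {
  const chars = Array.from(text);
  let newlines = 0;
  for (let i = 0; i < chars.length; i++) {
    if (i >= maxLength) return i; // character limit reached first
    if (chars[i] === "\n" && ++newlines >= maxLines) return i; // line limit reached first
  }
  return -1;
}

console.log(findCutIndex("one\ntwo\nthree", 100, 2)); // 7: cut at the second newline
console.log(findCutIndex("one\ntwo\nthree", 5, 2)); // 5: cut by the character limit
```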

stripTags

Optional list of tags to be stripped from the input HTML. May be set to true to strip all tags. Only supported in combination with html: true.

Example:

// Strips all images from the input string:
clip(input, 140, { html: true, stripTags: ["img", "svg"] });

Tag names must be specified in lowercase.

Changelog

See CHANGELOG.md.

License

Licensed under the MIT License.

See LICENSE.

Contributors

arendjr, churchs19, daniil4udo


text-clipper's Issues

Option to strip empty tags when used with maxLines

Hey, thanks for the great package, @arendjr!

It would be helpful if text-clipper had an option to strip empty tags.

For example:

clip( '<p><img/></p><p>abc</p>',  3,  {
    html: true,
    stripTags: ['img'],
    maxLines: 1,
  })

returns

'<p></p>'

which isn't what I wanted.

I want this:

'<p>abc</p>'

Could you suggest a workaround, please? Thank you!

Set maxLength in words instead of characters

Hi,
I had a use case where I wanted to clip by number of words and not by characters.
Before I go and implement it, I wanted to ask if you would like that functionality in text-clipper?
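
For reference, word-based clipping of plain text could look like the sketch below. clipWords is hypothetical and not part of text-clipper. It treats any run of non-whitespace as a word and does not handle HTML.

```javascript
// Hypothetical sketch: keep at most `maxWords` words of plain text.
function clipWords(text, maxWords, indicator = "…") {
  // Each match is one word together with the whitespace that precedes it.
  const words = [...text.matchAll(/\s*\S+/g)];
  if (words.length <= maxWords) return text; // nothing to clip
  if (maxWords <= 0) return indicator;
  const last = words[maxWords - 1];
  return text.slice(0, last.index + last[0].length) + indicator;
}

console.log(clipWords("The quick brown fox", 2)); // The quick…
```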

Clipping HTML has unexpected results

Been writing a few tests and getting unexpected results with the following:

  1. When using a length that is less than or equal to the indicator length:
clip('<p>one <a href="#">two - three <br>four</a> five</p>', 0, {
    html: true,
    breakWords: true,
    indicator: '...'
  });
// output - <p>one <a href=\"#\">two - three <br>four</a> five<...

The expected output I imagine would be <p>...</p>

  2. When the last character in the node is a space, but there are remaining tags:
clip('<p>one <a href="#">two - three <br>four</a> five</p>', 6, {
    html: true,
    breakWords: true,
    indicator: '...'
  });
// output - <p>one </p>

The expected output - <p>one...</p>

Please let me know if this is a bug or if it's actually expected behaviour.

Extra space is added after word

This is with html set to true: I am finding that a space is added after the word. Is it possible to change the output so that the space after the word is not added?

Release ESM-compatible version

#22 was merged, but there hasn't been a release since. Would it be possible to make one @arendjr?

I'm trying to migrate a project from CJS to ESM using 2.2.0 (PREreview/prereview.org#1750), and tsc is giving me the error:

'typeof import("[...]/node_modules/text-clipper/dist/index")' has no call signatures

I've had a go at finding a way around it locally, but I'm guessing using the updated build will be simpler/quicker!

Missing indicator on html tag cut

Hi,
given this example:

const tc = require('text-clipper').default
tc('<p>12345</p><p>67890</p>', 8, {
      html: true,
      breakWords: true,
      indicator: '...',
    })

I would expect this output:

<p>12345...</p>

// or
<p>12345</p><p>...</p>

Instead, I got:

<p>12345</p>

What do you think?

<del> tag not supported

The <del></del> tag is not supported, and I had to uninstall because of it. Maybe you would want to add support for it.

Clipping HTML table has unexpected results

Hi Arend,

We have the following table:

<table border="1" cellpadding="1" cellspacing="1" style="width: 500px">
	<tbody>
		<tr>
			<td>
			<ul>
				<li>fb</li>
			</ul>
			</td>
			<td>
			<ul>
				<li>fbfbfb</li>
			</ul>
			</td>
		</tr>
		<tr>
			<td>
			<ul>
				<li>google</li>
			</ul>
			</td>
			<td>
			<ul>
				<li>twitter</li>
			</ul>
			</td>
		</tr>
		<tr>
			<td>
			<ul>
				<li>intel</li>
			</ul>
			</td>
			<td>
			<ul>
				<li>amazon</li>
			</ul>
			</td>
		</tr>
	</tbody>
</table>

If we do

clip(theTableContent, 150, {
	html: true,
	breakWords: true,
});

we get the truncated table

<table border="1" cellpadding="1" cellspacing="1" style="width: 500px">
	<tbody>
		<tr>
			<td>
			<ul>
				<li>fb</li>
			</ul>
			</td>
			<td>
			<ul>
				<li>fbfbfb</li>
			</ul>
			</td>
		</tr>
		<tr>
			<td>
			<ul>
				<li>google</li>
			</ul>
			</td>
			<td>
			<ul>
				<li>twitter</li>
			</ul>
			</td>
		</tr>
		<tr>
			<td>
			<ul>
				<li>intel</li>
			</ul>
			</td>
…</tr></tbody></table>

When rendered, the indicator … is in the wrong position: it ends up directly inside the last <tr>, after the closing </td>, rather than inside a table cell.

Could you please let me know if this is a bug?

Thanks!

Clipped length is different from max length

I am consistently seeing clipped text whose length does not match the requested max length. For example:

var textClipper = require("text-clipper")
var html = "<p>David</p>";
textClipper(html, 4, { indicator: "", html: true, breakWords: true} );

returns the clipped text "Dav". I tried many other short-ish strings with consistent results: the clipped text is always one character shorter than requested.

Am I possibly misusing the API in some way?

Only add closing tag for standard HTML tags that are not self-closing

When clipping HTML, would it be possible to not add a closing tag for strings that aren't actually HTML tags?

So if I do something like this:

clip('<p>The quick brown <fox> jumps over the lazy dog', 30, { html: true })

Since <fox> isn't actually a valid HTML tag, I'm hoping to get:

<p>The quick brown <fox> jumps over …</p>

instead of:

<p>The quick brown <fox> jumps over …</fox></p>

Count the HTML tags in maxLength

I'd like to suggest an option to count the length of HTML tags in maxLength. I need an email template that can't exceed 255 characters, but if template.length - tagsInTemplate.length < 255, the clipper returns my original template, which is longer than 255 characters once the tags are included.
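
Until such an option exists, one possible workaround is to search for the largest maxLength whose output, tags included, still fits the raw limit. This is a sketch under assumptions: clipToRawLength is hypothetical, clipFn stands in for text-clipper's clip() (passed in so the sketch stays self-contained), and the search assumes that a larger maxLength never produces a shorter output.

```javascript
// Hypothetical workaround: binary-search the largest maxLength whose clipped
// output (markup included) is at most `rawLimit` characters long.
// Assumes clipFn's output length never decreases as maxLength grows.
function clipToRawLength(clipFn, input, rawLimit, options) {
  if (input.length <= rawLimit) return input; // already fits, tags and all

  let lo = 0;
  let hi = rawLimit; // search bound: output showing more than rawLimit text characters cannot fit
  let best = ""; // returned if even the shortest clip is too long
  while (lo <= hi) {
    const mid = Math.floor((lo + hi) / 2);
    const candidate = clipFn(input, mid, options);
    if (candidate.length <= rawLimit) {
      best = candidate; // fits: try a larger maxLength
      lo = mid + 1;
    } else {
      hi = mid - 1; // too long: try a smaller maxLength
    }
  }
  return best;
}
```

With text-clipper, this might be called as clipToRawLength(clip, template, 255, { html: true }). Each probe calls clip() again, so the cost is roughly log2(255) ≈ 8 extra clips per template.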

Option to return the text that gets clipped off

Hey Arend!

This library looks awesome! Thanks for putting this together!

I was hoping to use this for a truncated excerpt with a Read More/Read Less toggle button.

This library will return the excerpt, but I don't see an easy way to get the rest of the text when a user clicks "Read More." (I'd prefer not to duplicate the truncated section.)

I don't know if that fits with this project, but I figured I'd mention it, since if it did I would totally use this!

Thanks!
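
For plain text, returning both halves could look like the following. splitExcerpt is a hypothetical helper, not a text-clipper option. It assumes that the indicator counts toward maxLength and that clipping happens at the last whitespace before the limit. For HTML input, the open tags would also need to be re-created at the start of the remainder, which this sketch does not attempt.

```javascript
// Hypothetical helper (not a text-clipper option): splits plain text into a
// clipped excerpt and the remainder, so "Read More" can show the rest.
function splitExcerpt(text, maxLength, indicator = "…") {
  const chars = Array.from(text);
  if (chars.length <= maxLength) return { excerpt: text, rest: "" };

  // Leave room for the indicator, then back up to the last whitespace.
  const budget = Math.max(0, maxLength - Array.from(indicator).length);
  let cut = budget;
  while (cut > 0 && !/\s/.test(chars[cut])) cut--;
  if (cut === 0) cut = budget; // one long word: fall back to a hard break

  return {
    excerpt: chars.slice(0, cut).join("") + indicator,
    rest: chars.slice(cut).join(""),
  };
}

const { excerpt, rest } = splitExcerpt("The quick brown fox", 12);
console.log(excerpt); // The quick…
console.log(JSON.stringify(rest)); // " brown fox"
```

Because the split happens at a single index, dropping the indicator from the excerpt and appending the remainder reproduces the original text, so the truncated part never has to be duplicated.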
