GithubHelp home page GithubHelp logo

rlugojr / stringr Goto Github PK

View Code? Open in Web Editor NEW

This project forked from tidyverse/stringr

0.0 1.0 0.0 1.13 MB

A fresh approach to string manipulation in R

Home Page: http://stringr.tidyverse.org

License: GNU General Public License v2.0

R 99.36% CSS 0.27% JavaScript 0.37%

stringr's Introduction

stringr

Build Status Coverage Status CRAN Status

Overview

Strings are not glamorous, high-profile components of R, but they do play a big role in many data cleaning and preparation tasks. The stringr package provide a cohesive set of functions designed to make working with strings as easy as posssible. If you're not familiar with strings, the best place to start is the chapter on strings in R for Data Science.

stringr is built on top of stringi, which uses the ICU C library to provide fast, correct implementations of common string manipulations. stringr focusses on the most important and commonly used string manipulation functions whereas stringi provides a comprehensive set covering almost anything you can imagine. If you find that stringr is missing a function that you need, try looking in stringi. Both packages share similar conventions, so once you've mastered stringr, you should find stringi similarly easy to use.

Installation

# Install the released version from CRAN:
install.packages("stringr")

# Install the cutting edge development version from GitHub:
# install.packages("devtools")
devtools::install_github("tidyverse/stringr")

Usage

All functions in stringr start with str_ and take a vector of strings as the first argument.

x <- c("why", "video", "cross", "extra", "deal", "authority")
str_length(x) 
#> [1] 3 5 5 5 4 9
str_c(x, collapse = ", ")
#> [1] "why, video, cross, extra, deal, authority"
str_sub(x, 1, 2)
#> [1] "wh" "vi" "cr" "ex" "de" "au"

Most string functions work with regular expressions, a concise language for describing patterns of text. For example, the regular expression "[aeiou]" matches any single character that is a vowel:

str_subset(x, "[aeiou]")
#> [1] "video"     "cross"     "extra"     "deal"      "authority"
str_count(x, "[aeiou]")
#> [1] 0 3 1 2 2 4

There are seven main verbs that work with patterns:

  • str_detect(x, pattern) tells you if there's any match to the pattern.

    str_detect(x, "[aeiou]")
    #> [1] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
  • str_count(x, pattern) counts the number of patterns.

    str_count(x, "[aeiou]")
    #> [1] 0 3 1 2 2 4
  • str_subset(x, pattern) extracts the matching components.

    str_subset(x, "[aeiou]")
    #> [1] "video"     "cross"     "extra"     "deal"      "authority"
  • str_locate(x, pattern) gives the position of the match.

    str_locate(x, "[aeiou]")
    #>      start end
    #> [1,]    NA  NA
    #> [2,]     2   2
    #> [3,]     3   3
    #> [4,]     1   1
    #> [5,]     2   2
    #> [6,]     1   1
  • str_extract(x, pattern) extracts the text of the match.

    str_extract(x, "[aeiou]")
    #> [1] NA  "i" "o" "e" "e" "a"
  • str_match(x, pattern) extracts parts of the match defined by parentheses.

    # extract the characters on either side of the vowel
    str_match(x, "(.)[aeiou](.)")
    #>      [,1]  [,2] [,3]
    #> [1,] NA    NA   NA  
    #> [2,] "vid" "v"  "d" 
    #> [3,] "ros" "r"  "s" 
    #> [4,] NA    NA   NA  
    #> [5,] "dea" "d"  "a" 
    #> [6,] "aut" "a"  "t"
  • str_replace(x, pattern, replacemnt) replaces the matches with new text.

    str_replace(x, "[aeiou]", "?")
    #> [1] "why"       "v?deo"     "cr?ss"     "?xtra"     "d?al"      "?uthority"
  • str_split(x, pattern) splits up a string into multiple pieces.

    str_split(c("a,b", "c,d,e"), ",")
    #> [[1]]
    #> [1] "a" "b"
    #> 
    #> [[2]]
    #> [1] "c" "d" "e"

As well as regular expressions (the default), there are three other pattern matching engines:

  • fixed(): match exact bytes
  • coll(): match human letters
  • boundary(): match boundaries

Compared to base R

R provides a solid set of string operations, but because they have grown organically over time, they can be inconsistent and a little hard to learn. Additionally, they lag behind the string operations in other programming languages, so that some things that are easy to do in languages like Ruby or Python are rather hard to do in R.

  • Uses consistent function and argument names. The first argument is always the vector of strings to modify, which makes stringr work particularly well in conjunction with the pipe:

    letters %>%
      .[1:10] %>% 
      str_pad(3, "right") %>%
      str_c(letters[2:11])
    #>  [1] "a  b" "b  c" "c  d" "d  e" "e  f" "f  g" "g  h" "h  i" "i  j" "j  k"
  • Simplifies string operations by eliminating options that you don't need 95% of the time.

  • Produces outputs than can easily be used as inputs. This includes ensuring that missing inputs result in missing outputs, and zero length inputs result in zero length outputs.

stringr's People

Contributors

andrew avatar christophergandrud avatar craigcitro avatar gaborcsardi avatar gagolews avatar hadley avatar jcheng5 avatar lmullen avatar luckyrandom avatar petehaitch avatar pitakakariki avatar ruderphilipp avatar smbache avatar t-kalinowski avatar tjmahr avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.