regex - rm_between with multiple markers in an observation

- January 15, 2010

There are helpful answers on here about using rm_between when each observation has one instance of the markers. However, I have a dataset where I want to extract the things in double quotes, and some of the observations have multiple instances of that. For example:

fresh or chilled atlantic salmon "salmo salar" , danube salmon "hucho hucho"

When I use the code,

library(qdapRegex)
rf <- data.frame(rm_between_multiple(h2$se_desc_en, c("\"", "\""), c("\"", "\"")))

it creates a data frame, and the same line from earlier is returned as

 "fresh or chilled atlantic salmon , danube salmon"

which is perfect. But I also need the data that was removed. To try to retain it, I changed the code to:

h3 <- rm_between_multiple(h2$se_desc_en, c("\"", "\""), c("\"", "\""), extract = TRUE)

to create a list of the data in quotations. What is returned for the same line is:

c("salmo salar", " , danube salmon ", "hucho hucho",    "salmo salar", " , danube salmon ", "hucho hucho")

which has the data in quotations but also has the text in between the quoted parts, and all of it is repeated. I'm new at programming and was wondering if there is a way to write the code so that the information between these quotations is not included.

I think you don't need rm_between_multiple, just rm_between. There appears to be a regex issue when the same left and right marker is used, but I'm not sure yet whether it's a bug. You can use the following to extract:

x <- 'fresh or chilled atlantic salmon "salmo salar" , danube salmon "hucho hucho"'

rm_default(
    x,
    pattern = S("@rm_between", '"'),
    extract = TRUE
)

## [[1]]
## [1] "\"salmo salar\"" "\"hucho hucho\""

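Edit: I think this is because the default regex of rm_between does not include the left/right bounds. It uses the following regex: "(?<=\").*?(?=\")". The lookarounds (a lookbehind and a lookahead) mean the left/right bounds are not consumed, which leaves the closing quotation mark of one match available as the opening mark of the next, so " , danube salmon " is matched as well. This is (IMO) a bug that I'll address, but I'm unsure how yet.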
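The effect can be reproduced without qdapRegex. The following is a minimal base R sketch, assuming the lookaround pattern quoted above; the variable names are only for illustration. It compares that pattern with one that consumes the quotation marks:

```r
x <- 'fresh or chilled atlantic salmon "salmo salar" , danube salmon "hucho hucho"'

# Lookaround version: the quotes are asserted but never consumed, so the
# closing quote of one match can serve as the opening quote of the next.
lookaround_hits <- regmatches(x, gregexpr('(?<=").*?(?=")', x, perl = TRUE))[[1]]
print(lookaround_hits)

# Consuming version: each match takes both of its quotes, so the text
# between two quoted chunks can no longer be matched.
consumed_hits <- regmatches(x, gregexpr('"[^"]*"', x))[[1]]

# Strip the surrounding quotes from the consumed matches.
inner <- gsub('"', "", consumed_hits, fixed = TRUE)
print(inner)
```

The first result contains the unwanted " , danube salmon " element; the second contains only the two quoted names.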

Edit 2: I incorporated @hwnd's response into rm_between. This is in the dev version of qdapRegex. You can install the dev version via:

if (!require("pacman")) install.packages("pacman"); library(pacman)
p_install_gh("trinker/qdapRegex"); p_load(qdapRegex)

and ...

rm_between(x, '"', '"', extract = TRUE)

## [[1]]
## [1] "salmo salar" "hucho hucho"
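If installing a development version is not convenient, roughly the same two results (the text with the quoted parts removed, and a list of the quoted parts) can be sketched in base R. The helper names extract_quoted and remove_quoted below are hypothetical and not part of qdapRegex, and the sketch assumes the markers are always plain double quotes that appear in pairs:

```r
# Hypothetical helper: return, for each element of a character vector,
# the chunks found between double quotes (without the quotes themselves).
extract_quoted <- function(text) {
  hits <- regmatches(text, gregexpr('"[^"]*"', text))
  lapply(hits, function(h) gsub('"', "", h, fixed = TRUE))
}

# Hypothetical helper: drop the quoted chunks, then collapse the
# whitespace left behind and trim the ends.
remove_quoted <- function(text) {
  out <- gsub('"[^"]*"', "", text)
  out <- gsub("[[:space:]]+", " ", out)
  trimws(out)
}

desc <- c(
  'fresh or chilled atlantic salmon "salmo salar" , danube salmon "hucho hucho"',
  "frozen cod"
)

quoted  <- extract_quoted(desc)  # list with one character vector per observation
cleaned <- remove_quoted(desc)   # character vector, same length as desc
print(quoted)
print(cleaned)
```

Observations without any quoted text yield an empty character vector in the list, so the result stays aligned with the rows of the original data.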
