Regex
A regex, or regular expression (Bavarian: Regulära Ausdruck), is a sequence of characters that defines a search pattern.
Regexes are used in software development and also in text editors, where they serve for searching and replacing strings. For example, in a Wikipedia you can pick out all words that begin with A and end with -bichl, no matter which characters lie in between. A plain text search cannot express this; a regex can.
The regex syntax varies slightly between applications.
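The search described above can be sketched in Python. The sample text and the exact pattern are assumptions made for illustration; other tools may need slightly different syntax.

```python
import re

# Hypothetical sample text; the names are made up for illustration.
text = "Aschbichl, Bichl, Ambichl, Aibling and Hirschbichl"

# \b is a word boundary, A is literal, \w* allows any run of word
# characters in between, and bichl is the required ending.
pattern = re.compile(r"\bA\w*bichl\b")

matches = pattern.findall(text)
print(matches)  # ['Aschbichl', 'Ambichl']
```

"Hirschbichl" is skipped because it does not start with "A", and "Aibling" because it does not end with "bichl".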
Practice
Simple regex
Operator | Effect |
---|---|
. | The dot operator matches any single character. |
[ ] | A bracket expression (box) matches any single character listed inside the brackets. |
[^ ] | A negated bracket expression (complement box) matches any single character not listed inside the brackets. |
^ | The caret anchor matches the start of a line (or of every line in multiline mode). |
$ | The dollar anchor matches the end of a line (or of every line in multiline mode). |
( ) | Parentheses define a marked subexpression. The text it matched can be recalled later. |
\n | n is a digit from 1 to 9; matches what the nth marked subexpression matched. This operator does not exist in the extended regex syntax. |
* | A single-character expression followed by "*" matches zero or more copies of that expression. For example, "ab*c" matches "ac", "abc", "abbbc", etc.; "[xyz]*" matches "", "x", "y", "zx", "zyx", and so on. |
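A minimal sketch of three of these operators, assuming Python's `re` module (which uses a Perl-style syntax, not the POSIX basic one). The sample strings and variable names are made up for illustration.

```python
import re

# "*" repeats the preceding element zero or more times.
star = re.compile(r"ab*c")
star_hits = [s for s in ["ac", "abc", "abbbc", "adc"] if star.fullmatch(s)]

# "." matches any single character, but by default not a newline.
dot_hits = re.findall(r"h.t", "hat hot h\nt hut")

# "( )" marks a subexpression; "\1" matches the same text once more.
doubled = re.search(r"(\w)\1", "Hello World")

print(star_hits)         # ['ac', 'abc', 'abbbc']
print(dot_hits)          # ['hat', 'hot', 'hut']
print(doubled.group(0))  # ll
```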
Examples
- "^[MH]uad": matches Muad and Huad, but only at the start of a line.
- "[MH]uad$": matches Muad and Huad, but only at the end of a line.
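The two anchored patterns can be tried out as follows. This is a sketch in Python; the three-line sample text is an assumption, and `re.MULTILINE` is the flag Python uses for the multiline mode mentioned in the table.

```python
import re

# Hypothetical three-line sample text.
text = "Muad is here\nno Huad\nHuad and Muad"

# With MULTILINE, ^ and $ anchor at every line, not only at the string ends.
start_hits = re.findall(r"^[MH]uad", text, flags=re.MULTILINE)
end_hits = re.findall(r"[MH]uad$", text, flags=re.MULTILINE)

# Without the flag, ^ matches only at the very start of the string.
first_only = re.findall(r"^[MH]uad", text)

print(start_hits)  # ['Muad', 'Huad']
print(end_hits)    # ['Huad', 'Muad']
print(first_only)  # ['Muad']
```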
Character sets
Expression | Meaning |
---|---|
[egh] | one of the characters "e", "g", or "h" |
[0-6] | a digit from "0" to "6" (a hyphen denotes a range) |
[A-Za-z0-9] | any Latin letter or any digit |
[^a] | any character except "a" ("^" at the start of a character set means negation) |
[-A-Z], [A-Z-] (or [A-Z\-a-z], but not under POSIX) | the set also contains the hyphen "-" |
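A short sketch of these character sets in Python; the input strings and variable names are illustrative assumptions.

```python
import re

# One of the listed characters.
egh_hits = re.findall(r"[egh]", "high edge")

# A range of digits.
digits_0_6 = re.findall(r"[0-6]", "0123456789")

# Negation: everything except "a".
not_a = re.findall(r"[^a]", "banana")

# A hyphen at the end of the set is taken literally.
with_hyphen = re.findall(r"[A-Z-]", "A-b-C")

print(egh_hits)     # ['h', 'g', 'h', 'e', 'g', 'e']
print(digits_0_6)   # ['0', '1', '2', '3', '4', '5', '6']
print(not_a)        # ['b', 'n', 'n']
print(with_hyphen)  # ['A', '-', '-', 'C']
```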
Character classes
Some character classes are predefined. Not all implementations support them in the same way, though. Examples:
Class | Name | Meaning |
---|---|---|
\d | digit | a digit, i.e. [0-9] (and possibly further numeric characters, e.g. from Unicode) |
\D | non-digit | a character that is not a digit, i.e. [^\d] |
\w | word character | a letter, a digit, or an underscore, i.e. [a-zA-Z_0-9] (and possibly non-Latin letters, e.g. umlauts) |
\W | non-word character | a character that is neither a letter, a digit, nor an underscore, i.e. [^\w] |
\s | whitespace | usually at least the space and the control characters \f, \n, \r, \t, and \v |
\S | non-whitespace | a character that is not whitespace, i.e. [^\s] |
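How far these classes reach depends on the implementation. The following sketch assumes Python's `re` module, where string patterns are Unicode-aware by default and `re.ASCII` restricts the classes to their ASCII meaning. The sample strings are made up.

```python
import re

# Hypothetical sample text with a tab and a trailing newline.
text = "Room_7 costs 42 EUR\t(net)\n"

digits = re.findall(r"\d+", text)
words = re.findall(r"\w+", text)
spaces = re.findall(r"\s", text)

# Unicode-aware by default: the umlaut counts as a word character.
unicode_words = re.findall(r"\w+", "Bräu")
# With re.ASCII, \w is limited to [a-zA-Z0-9_].
ascii_words = re.findall(r"\w+", "Bräu", flags=re.ASCII)

print(digits)         # ['7', '42']
print(words)          # ['Room_7', 'costs', '42', 'EUR', 'net']
print(unicode_words)  # ['Bräu']
print(ascii_words)    # ['Br', 'u']
```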
Character classes according to the POSIX standard
POSIX | Non-standard | Perl/Tcl | Vim | Java | ASCII | Description |
---|---|---|---|---|---|---|
 | [:ascii:] [1] | | | \p{ASCII} | [\x00-\x7F] | ASCII characters |
[:alnum:] | | | | \p{Alnum} | [A-Za-z0-9] | Alphanumeric characters |
 | [:word:] [1] | \w | \w | \w | [A-Za-z0-9_] | Alphanumeric characters plus "_" |
 | | \W | \W | \W | [^A-Za-z0-9_] | Non-word characters |
[:alpha:] | | | \a | \p{Alpha} | [A-Za-z] | Alphabetic characters |
[:blank:] | | | \s | \p{Blank} | [ \t] | Space and tab |
 | | \b | \< \> | \b | (?<=\W)(?=\w)\|(?<=\w)(?=\W) | Word boundaries |
 | | \B | | | (?<=\W)(?=\W)\|(?<=\w)(?=\w) | Non-word boundaries |
[:cntrl:] | | | | \p{Cntrl} | [\x00-\x1F\x7F] | Control characters |
[:digit:] | | \d | \d | \p{Digit} or \d | [0-9] | Digits |
 | | \D | \D | \D | [^0-9] | Non-digits |
[:graph:] | | | | \p{Graph} | [\x21-\x7E] | Visible characters |
[:lower:] | | | \l | \p{Lower} | [a-z] | Lowercase letters |
[:print:] | | | \p | \p{Print} | [\x20-\x7E] | Visible characters and the space character |
[:punct:] | | | | \p{Punct} | [][!"#$%&'()*+,./:;<=>?@\^_`{\|}~-] | Punctuation characters |
[:space:] | | \s | \_s | \p{Space} or \s | [ \t\r\n\v\f] | Whitespace characters |
 | | \S | \S | \S | [^ \t\r\n\v\f] | Non-whitespace characters |
[:upper:] | | | \u | \p{Upper} | [A-Z] | Uppercase letters |
[:xdigit:] | | | \x | \p{XDigit} | [A-Fa-f0-9] | Hexadecimal digits |
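Python's `re` module does not offer the `[:name:]` classes, so a sketch in Python has to fall back on the ASCII equivalents from the table. The dictionary `POSIX_ASCII` and the helper `members` below are hypothetical names chosen for this illustration; the helper simply lists which of the 128 ASCII characters each equivalent matches.

```python
import re

# ASCII equivalents taken from the table above (only a selection).
# The dictionary and helper names are hypothetical.
POSIX_ASCII = {
    "alnum": r"[A-Za-z0-9]",
    "alpha": r"[A-Za-z]",
    "cntrl": r"[\x00-\x1F\x7F]",
    "digit": r"[0-9]",
    "graph": r"[\x21-\x7E]",
    "lower": r"[a-z]",
    "print": r"[\x20-\x7E]",
    "space": r"[ \t\r\n\v\f]",
    "upper": r"[A-Z]",
    "xdigit": r"[A-Fa-f0-9]",
}


def members(name):
    """Return every ASCII character matched by the named equivalent."""
    pattern = re.compile(POSIX_ASCII[name])
    return "".join(chr(c) for c in range(128) if pattern.fullmatch(chr(c)))


print(members("digit"))       # 0123456789
print(members("xdigit"))      # 0123456789ABCDEFabcdef
print(len(members("graph")))  # 94
print(len(members("print")))  # 95
```

The one-character difference between `graph` and `print` is the space character.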
Quantifiers
Quantifiers (repetition factors) specify how often an expression, that is, the preceding character or group of characters, may occur.
Quantifier | Meaning |
---|---|
? | The preceding expression is optional: it may occur but does not have to, so it occurs zero times or once. (Equivalent to {0,1}.) |
+ | The preceding expression must occur at least once but may occur more often. (Equivalent to {1,}.) |
* | The preceding expression may occur any number of times, including zero. (Equivalent to {0,}.) |
{n} | The preceding expression must occur exactly n times. (Equivalent to {n,n}.) |
{min,} | The preceding expression must occur at least min times. |
{min,max} | The preceding expression must occur at least min times and at most max times. |
{0,max} | The preceding expression may occur at most max times. |
Examples
- a+ matches "a" but also "aaaa"
- [0-9]+ matches "0123456789" but also "072345"
- [ab]+ matches "a", "b", "aa", "bbaab", etc.
- [0-9]{2,5} matches at least two and at most five digits, e.g. "91" or "63091"
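The quantifiers can be checked with a small Python sketch. The helper `full` is a hypothetical name; it keeps only those candidates that the pattern matches completely, so partial matches do not blur the counts.

```python
import re


def full(pattern, candidates):
    """Keep the candidates that the pattern matches from start to end."""
    return [c for c in candidates if re.fullmatch(pattern, c)]


optional = full(r"colou?r", ["color", "colour", "colouur"])
one_or_more = full(r"a+", ["", "a", "aaaa"])
zero_or_more = full(r"a*", ["", "a", "aaaa"])
exact = full(r"[0-9]{3}", ["12", "123", "1234"])
ranged = full(r"[0-9]{2,5}", ["9", "91", "63091", "630911"])

print(optional)      # ['color', 'colour']
print(one_or_more)   # ['a', 'aaaa']
print(zero_or_more)  # ['', 'a', 'aaaa']
print(exact)         # ['123']
print(ranged)        # ['91', '63091']
```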
Practical examples
Operator | Description | Example |
---|---|---|
.
|
Normally matches any character except a newline. Inside square brackets the dot is literal. |
$string1 = "Hello World\n";
if ($string1 =~ m/...../) {
print "$string1 has length >= 5.\n";
}
Output: Hello World
has length >= 5.
|
( )
|
Groups characters into a single element. When an expression in parentheses matches, the matched text can later be accessed through $1, $2, ...
|
$string1 = "Hello World\n";
if ($string1 =~ m/(H..).(o..)/) {
print "We matched '$1' and '$2'.\n";
}
Output: We matched 'Hel' and 'o W'.
|
+
|
Matches the preceding element one or more times. | $string1 = "Hello World\n";
if ($string1 =~ m/l+/) {
print "There are one or more consecutive letter \"l\"'s in $string1.\n";
}
Output: There are one or more consecutive letter "l"'s in Hello World.
|
?
|
Matches the preceding element zero times or once. | $string1 = "Hello World\n";
if ($string1 =~ m/H.?e/) {
print "There is an 'H' and a 'e' separated by ";
print "0-1 characters (e.g., He Hue Hee).\n";
}
Output: There is an 'H' and a 'e' separated by 0-1 characters (e.g., He Hue Hee).
|
?
|
Modifies the preceding *, +, ? or {M,N} so that it matches as few times as possible (non-greedy match).
|
$string1 = "Hello World\n";
if ($string1 =~ m/(l.+?o)/) {
print "The non-greedy match with 'l' followed by one or\n";
print "more characters is 'llo' rather than 'llo Wo'.\n";
}
Output: The non-greedy match with 'l' followed by one or
more characters is 'llo' rather than 'llo Wo'.
|
*
|
Matches the preceding element zero or more times. | $string1 = "Hello World\n";
if ($string1 =~ m/el*o/) {
print "There is an 'e' followed by zero to many ";
print "'l' followed by 'o' (e.g., eo, elo, ello, elllo).\n";
}
Output: There is an 'e' followed by zero to many 'l' followed by 'o' (e.g., eo, elo, ello, elllo).
|
{M,N}
|
Defines a minimum M and a maximum N for the match count. N can be omitted and M can be 0: {M} matches "exactly" M times; {M,} matches "at least" M times; {0,N} matches "at most" N times. x* y+ z? is thus equivalent to x{0,} y{1,} z{0,1}.
|
$string1 = "Hello World\n";
if ($string1 =~ m/l{1,2}/) {
print "There exists a substring with at least 1 ";
print "and at most 2 l's in $string1\n";
}
Output: There exists a substring with at least 1 and at most 2 l's in Hello World
|
[…]
|
Defines a set of possible character matches. | $string1 = "Hello World\n";
if ($string1 =~ m/[aeiou]+/) {
print "$string1 contains one or more vowels.\n";
}
Output: Hello World
contains one or more vowels.
|
|
|
Separates alternative possibilities. | $string1 = "Hello World\n";
if ($string1 =~ m/(Hello|Hi|Pogo)/) {
print "$string1 contains at least one of Hello, Hi, or Pogo.";
}
Output: Hello World
contains at least one of Hello, Hi, or Pogo.
|
\b
|
Matches a zero-width boundary between a word-class character (see below) and either a non-word-class character or an edge.
|
$string1 = "Hello World\n";
if ($string1 =~ m/llo\b/) {
print "There is a word that ends with 'llo'.\n";
}
Output: There is a word that ends with 'llo'.
|
\w
|
Matches an alphanumeric character, including "_"; same as [A-Za-z0-9_] in ASCII; in Unicode the class is correspondingly wider. |
$string1 = "Hello World\n";
if ($string1 =~ m/\w/) {
print "There is at least one alphanumeric ";
print "character in $string1 (A-Z, a-z, 0-9, _).\n";
}
Output: There is at least one alphanumeric character in Hello World
(A-Z, a-z, 0-9, _).
|
\W
|
Matches a non-alphanumeric character, excluding "_"; same as [^A-Za-z0-9_] in ASCII; in Unicode it is the complement of the correspondingly wider word class. |
$string1 = "Hello World\n";
if ($string1 =~ m/\W/) {
print "The space between Hello and ";
print "World is not alphanumeric.\n";
}
Output: The space between Hello and World is not alphanumeric.
|
\s
|
Matches a whitespace character, which in ASCII is a tab, a line feed, a form feed, a carriage return, or a space; in Unicode it also matches no-break spaces, the next-line character, and the variable-width spaces (among others). |
$string1 = "Hello World\n";
if ($string1 =~ m/\s.*\s/) {
print "In $string1 there are TWO whitespace characters, which may";
print " be separated by other characters.\n";
}
Output: In Hello World
there are TWO whitespace characters, which may be separated by other characters.
|
\S
|
Matches anything except a whitespace character. | $string1 = "Hello World\n";
if ($string1 =~ m/\S.*\S/) {
print "In $string1 there are TWO non-whitespace characters, which";
print " may be separated by other characters.\n";
}
Output: In Hello World
there are TWO non-whitespace characters, which may be separated by other characters.
|
\d
|
Matches a digit; same as [0-9] in ASCII; in Unicode, same as \p{Digit} or \p{GC=Decimal_Number}, which in turn is the same as \p{Numeric_Type=Decimal}.
|
$string1 = "99 bottles of beer on the wall.";
if ($string1 =~ m/(\d+)/) {
print "$1 is the first number in '$string1'\n";
}
Output: 99 is the first number in '99 bottles of beer on the wall.'
|
\D
|
Matches a non-digit; same as [^0-9] in ASCII or \P{Digit} in Unicode.
|
$string1 = "Hello World\n";
if ($string1 =~ m/\D/) {
print "There is at least one character in $string1";
print " that is not a digit.\n";
}
Output: There is at least one character in Hello World
that is not a digit.
|
^
|
Matches the beginning of a line or string. | $string1 = "Hello World\n";
if ($string1 =~ m/^He/) {
print "$string1 starts with the characters 'He'.\n";
}
Output: Hello World
starts with the characters 'He'.
|
$
|
Matches the end of a line or string. | $string1 = "Hello World\n";
if ($string1 =~ m/rld$/) {
print "$string1 is a line or string ";
print "that ends with 'rld'.\n";
}
Output: Hello World
is a line or string that ends with 'rld'.
|
\A
|
Matches the beginning of a string (but not an internal line). | $string1 = "Hello\nWorld\n";
if ($string1 =~ m/\AH/) {
print "$string1 is a string ";
print "that starts with 'H'.\n";
}
Output: Hello
World
is a string that starts with 'H'.
|
\z
|
Matches the end of a string (but not an internal line).[2] | $string1 = "Hello\nWorld\n";
if ($string1 =~ m/d\n\z/) {
print "$string1 is a string ";
print "that ends with 'd\\n'.\n";
}
Output: Hello
World
is a string that ends with 'd\n'.
|
[^…]
|
Matches every character except the ones inside brackets. | $string1 = "Hello World\n";
if ($string1 =~ m/[^abc]/) {
print "$string1 contains a character other than ";
print "a, b, and c.\n";
}
Output: Hello World
contains a character other than a, b, and c.
|
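For comparison, a few of the Perl examples from the table can be re-run in Python. This is a sketch under the assumption that Python's `re` module behaves like Perl for these particular patterns; the variable names are chosen for illustration.

```python
import re

string1 = "Hello World\n"

# Greedy: ".+" grabs as much as possible before the last "o" on the line.
greedy = re.search(r"l.+o", string1).group(0)

# Non-greedy: ".+?" stops at the first "o" that completes the match.
lazy = re.search(r"l.+?o", string1).group(0)

# Groups: Perl's $1 and $2 correspond to group(1) and group(2).
groups = re.search(r"(H..).(o..)", string1).groups()

# \d+ picks out the first run of digits.
first_number = re.search(r"(\d+)", "99 bottles of beer on the wall.").group(1)

print(repr(greedy))   # 'llo Wo'
print(repr(lazy))     # 'llo'
print(groups)         # ('Hel', 'o W')
print(first_number)   # 99
```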
References
- [1] 33.3.1.2 Character Classes. Emacs Lisp manual, version 25.1. In: gnu.org. 2016. Retrieved 13 April 2017.
- [2] Damian Conway: Regular Expressions, End of String. In: Perl Best Practices, p. 240, O'Reilly 2005, ISBN 978-0-596-00173-5
External links
- Regular languages, regular expressions
- POSIX specification of regular expressions
- Perl syntax for regular expressions
- Regex course for beginners
- Regex guide
Software
- Online visual regex tester
- Online regex tester
- Online regex tester with visualization