Regex

A regex, or regular expression (Bavarian: Regulära Ausdruck), is a sequence of characters that defines a search pattern.

Regexes are used in software development and also in text editors, where they serve to search for and replace strings. For example, in a Wikipedia you can find all words that begin with A and end in -bichl, no matter which characters lie in between. A plain text search cannot express that; a regex can.
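
The -bichl search described above could look roughly like the following sketch, which uses Python's standard `re` module. The sample sentence and the exact pattern are assumptions made for illustration; other tools may need a slightly different syntax.

```python
import re

# Hypothetical sample text; the words are made up for this example.
text = "Aubichl, Kitzbichl, Achbichl and Abichl are listed, Anger is not."

# \b     word boundary
# A      the literal first letter
# \w*    any number of word characters in between
# bichl  the literal ending, followed by another word boundary
pattern = re.compile(r"\bA\w*bichl\b")

matches = pattern.findall(text)
print(matches)
```

"Kitzbichl" is skipped because it does not start with A, and "Anger" because it does not end in -bichl.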

The syntax of regexes varies slightly between different applications.

Practice

Simple regexes

Operator	Effect
.	The dot operator matches any single character.
[ ]	A bracket expression (box) matches any single one of the characters listed between the brackets.
[^ ]	A complemented bracket expression (complement box) matches any single character that is not listed between the brackets.
^	The caret anchor matches the start of a line (or of every line in multiline mode).
$	The dollar anchor matches the end of a line (or of every line in multiline mode).
( )	Parentheses define a marked subexpression. The text it matched can be recalled later.
\n	n is a digit from 1 to 9; matches what the nth marked subexpression matched. This operator does not exist in the extended regex syntax.
*	A single-character expression followed by "*" matches zero or more copies of that expression. For example, "ab*c" matches "ac", "abc", "abbbc" etc. "[xyz]*" matches "", "x", "y", "zx", "zyx", and so on. \n*, where n is a digit from 1 to 9, matches zero or more iterations of what the nth marked subexpression matched. For example, "(a.)c\1*" matches "abcab" and "abcabab" but not "abcac". An expression enclosed in "\(" and "\)" followed by "*" is deemed invalid.
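
The star and backreference rows can be checked with a short sketch in Python's `re` module. Note the assumption: Python writes groups as plain `( )` rather than the `\(` `\)` form of basic POSIX syntax, so this only illustrates the matching behaviour, not the POSIX notation.

```python
import re

# "ab*c": an "a", zero or more "b", then a "c".
star = re.compile(r"ab*c")
star_results = [star.fullmatch(s) is not None for s in ("ac", "abc", "abbbc", "abd")]

# "(a.)c\1*": group 1 captures "a" plus one character;
# \1* then allows zero or more repetitions of exactly that captured text.
backref = re.compile(r"(a.)c\1*")
backref_results = [
    backref.fullmatch(s) is not None for s in ("abcab", "abcabab", "abcac")
]

print(star_results)
print(backref_results)
```

`fullmatch` is used so that the whole string has to fit the pattern; "abcac" fails because the text after "abc" is "ac", not a repetition of the captured "ab".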

Examples

"^[MH]uad"
- Matches Muad and Huad, but only at the start of a line.
"[MH]uad$"
- Matches Muad and Huad, but only at the end of a line.
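
A minimal sketch of the two anchors in Python's `re` module follows. The three-line sample string is invented for the example, and `re.MULTILINE` is Python's name for the multiline mode mentioned above.

```python
import re

# Made-up sample with three lines.
lines = "Muad one\ntwo Huad\nHuad three"

# Without re.MULTILINE, ^ anchors only at the very start of the string.
start_single = re.findall(r"^[MH]uad", lines)

# With re.MULTILINE, ^ and $ also anchor at every line break.
start_multi = re.findall(r"^[MH]uad", lines, flags=re.MULTILINE)
end_multi = re.findall(r"[MH]uad$", lines, flags=re.MULTILINE)

print(start_single, start_multi, end_multi)
```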

Character sets

`[egh]`	one of the characters "e", "g" or "h"
`[0-6]`	a digit from "0" to "6" (a hyphen indicates a range)
`[A-Za-z0-9]`	any Latin letter or any digit
`[^a]`	any character except "a" ("^" at the start of a character class means negation)
`[-A-Z]`, `[A-Z-]` (or `[A-Z\-a-z]`, though not under POSIX)	The set also contains the hyphen "-"
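
The sets listed above can be tried out as follows. This is only a sketch using Python's `re` module and an invented sample string; each call returns the characters of the sample that belong to the set.

```python
import re

# Made-up sample: letters, a hyphen, digits and a space.
text = "ha-A7 e3"

in_egh = re.findall(r"[egh]", text)           # "e", "g" or "h"
in_0_to_6 = re.findall(r"[0-6]", text)        # digits 0 to 6 only
alnum = re.findall(r"[A-Za-z0-9]", text)      # Latin letters and digits
not_a = re.findall(r"[^a]", text)             # anything except "a"
hyphen_first = re.findall(r"[-A-Z]", text)    # hyphen listed first is literal
hyphen_last = re.findall(r"[A-Z-]", text)     # hyphen listed last is literal

print(in_egh, in_0_to_6, alnum, not_a, hyphen_first, hyphen_last)
```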

Character classes

Some character classes are predefined. Not all implementations support them in the same way, though. Examples of character classes:

`\d`	digit	a digit, i.e. [0-9] (and possibly further numeric characters, e.g. from Unicode)
`\D`	non-digit	a character that is not a digit, i.e. [^\d]
`\w`	word character	a letter, a digit or an underscore, i.e. [a-zA-Z_0-9] (and possibly non-Latin letters as well, e.g. umlauts)
`\W`	non-word character	a character that is neither a letter, a digit nor an underscore, i.e. [^\w]
`\s`	whitespace	usually at least the space and the control characters \f, \n, \r, \t and \v
`\S`	non-whitespace	a character that is not whitespace, i.e. [^\s]
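
How these shorthand classes behave in one particular implementation, Python's `re` module, is sketched below. The sample string is invented; the last two lines illustrate the "possibly non-Latin letters" remark, since Python treats `\w` as Unicode-aware for text patterns unless `re.ASCII` is given.

```python
import re

# Made-up sample string for illustration.
text = "Zoi_1 = 42\t!"

digits = re.findall(r"\d", text)        # single digits
words = re.findall(r"\w+", text)        # runs of word characters
spaces = re.findall(r"\s", text)        # single whitespace characters
non_words = re.findall(r"\W", text)     # everything that is not a word character
non_spaces = re.findall(r"\S+", text)   # runs of non-whitespace characters

# "\u00e4" is the umlaut a-diaeresis; re.ASCII restricts \w to [a-zA-Z0-9_].
umlaut_default = re.findall(r"\w", "\u00e4")
umlaut_ascii = re.findall(r"\w", "\u00e4", flags=re.ASCII)

print(digits, words, spaces, non_words, non_spaces)
```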

Character classes in the POSIX standard

POSIX	Non-standard	Perl/Tcl	Vim	Java	ASCII	Description
	`[:ascii:]`^[1]			`\p{ASCII}`	`[\x00-\x7F]`	ASCII characters
`[:alnum:]`				`\p{Alnum}`	`[A-Za-z0-9]`	Alphanumeric characters
	`[:word:]`^[1]	`\w`	`\w`	`\w`	`[A-Za-z0-9_]`	Alphanumeric characters plus "_"
		`\W`	`\W`	`\W`	`[^A-Za-z0-9_]`	Non-word characters
`[:alpha:]`			`\a`	`\p{Alpha}`	`[A-Za-z]`	Alphabetic characters
`[:blank:]`			`\s`	`\p{Blank}`	`[ \t]`	Space and tab
		`\b`	`\< \>`	`\b`	`(?<=\W)(?=\w)|(?<=\w)(?=\W)`	Word boundaries
				`\B`	`(?<=\W)(?=\W)|(?<=\w)(?=\w)`	Non-word boundaries
`[:cntrl:]`				`\p{Cntrl}`	`[\x00-\x1F\x7F]`	Control characters
`[:digit:]`		`\d`	`\d`	`\p{Digit}` or `\d`	`[0-9]`	Digits
		`\D`	`\D`	`\D`	`[^0-9]`	Non-digits
`[:graph:]`				`\p{Graph}`	`[\x21-\x7E]`	Visible characters
`[:lower:]`			`\l`	`\p{Lower}`	`[a-z]`	Lowercase letters
`[:print:]`			`\p`	`\p{Print}`	`[\x20-\x7E]`	Visible characters and the space character
`[:punct:]`				`\p{Punct}`	``[][!"#$%&'()*+,./:;<=>?@\^_`{|}~-]``	Punctuation characters
`[:space:]`		`\s`	`\_s`	`\p{Space}` or `\s`	`[ \t\r\n\v\f]`	Whitespace characters
		`\S`	`\S`	`\S`	`[^ \t\r\n\v\f]`	Non-whitespace characters
`[:upper:]`			`\u`	`\p{Upper}`	`[A-Z]`	Uppercase letters
`[:xdigit:]`			`\x`	`\p{XDigit}`	`[A-Fa-f0-9]`	Hexadecimal digits
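
Python's `re` module does not provide the POSIX bracket classes, so one way to get the same effect there is to fall back on the ASCII column of the table. The sketch below does that for a handful of classes; the dictionary `POSIX_ASCII` and the helper `find_class` are hypothetical names introduced only for this example.

```python
import re

# ASCII equivalents taken from the ASCII column of the table.
# POSIX_ASCII is a hypothetical helper mapping, not part of any library.
POSIX_ASCII = {
    "alnum": r"[A-Za-z0-9]",
    "alpha": r"[A-Za-z]",
    "blank": r"[ \t]",
    "digit": r"[0-9]",
    "lower": r"[a-z]",
    "upper": r"[A-Z]",
    "xdigit": r"[A-Fa-f0-9]",
}


def find_class(name, text):
    """Return every character of text that belongs to the named class."""
    return re.findall(POSIX_ASCII[name], text)


# Made-up sample: letters, a digit, a space and a tab.
sample = "Ab3 f\tZ"
for name in ("upper", "lower", "digit", "xdigit", "blank"):
    print(name, find_class(name, sample))
```

Because only the ASCII ranges are used, non-ASCII letters and digits are deliberately not covered by this sketch.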

Quantifiers

Quantifiers (repetition factors) specify how often an expression, i.e. the preceding character or group of characters, may occur.

`?`	The preceding expression is optional: it may occur but does not have to, i.e. it occurs zero times or once. (Equivalent to `{0,1}`.)
`+`	The preceding expression must occur at least once and may occur more often. (Equivalent to `{1,}`.)
`*`	The preceding expression may occur any number of times, including not at all. (Equivalent to `{0,}`.)
`{n}`	The preceding expression must occur exactly n times. (Equivalent to `{n,n}`.)
`{min,}`	The preceding expression must occur at least min times.
`{min,max}`	The preceding expression must occur at least min times and at most max times.
`{0,max}`	The preceding expression may occur at most max times.

Examples

a+ matches "a" but also "aaaa"
[0-9]+ matches "0123456789" but also "072345"
[ab]+ matches "a", "b", "aa", "bbaab" etc.
[0-9]{2,5} matches at least two and at most five digits, e.g. "91" or "63091"
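
The quantifier examples above can be verified with a small sketch in Python's `re` module. The helper `full` is a hypothetical convenience function; it uses `re.fullmatch` so that the whole string, not just a part of it, has to fit the pattern.

```python
import re


def full(pattern, s):
    """Return True if the entire string s matches pattern."""
    return re.fullmatch(pattern, s) is not None


checks = {
    "a+ / a": full(r"a+", "a"),
    "a+ / aaaa": full(r"a+", "aaaa"),
    "a+ / empty": full(r"a+", ""),                      # + needs at least one
    "[0-9]+ / 072345": full(r"[0-9]+", "072345"),
    "[ab]+ / bbaab": full(r"[ab]+", "bbaab"),
    "[0-9]{2,5} / 91": full(r"[0-9]{2,5}", "91"),
    "[0-9]{2,5} / 63091": full(r"[0-9]{2,5}", "63091"),
    "[0-9]{2,5} / 7": full(r"[0-9]{2,5}", "7"),         # too short
    "[0-9]{2,5} / 123456": full(r"[0-9]{2,5}", "123456"),  # too long
}

for label, result in checks.items():
    print(label, result)
```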

Practical examples

Operator	Description	Example
`.`	Normally matches any character except a newline. Inside square brackets the dot is literal.	$string1 = "Hello World\n"; if ($string1 =~ m/...../) { print "$string1 has length >= 5.\n"; } Output: Hello World has length >= 5.
`( )`	Groups characters into a single element. When an expression in parentheses has matched, the matched text can later be accessed through `$1`, `$2`, ...	$string1 = "Hello World\n"; if ($string1 =~ m/(H..).(o..)/) { print "We matched '$1' and '$2'.\n"; } Output: We matched 'Hel' and 'o W'.
`+`	Matches the preceding character one or more times.	$string1 = "Hello World\n"; if ($string1 =~ m/l+/) { print "There are one or more consecutive letter \"l\"'s in $string1.\n"; } Output: There are one or more consecutive letter "l"'s in Hello World.
`?`	Matches the preceding character zero times or once.	$string1 = "Hello World\n"; if ($string1 =~ m/H.?e/) { print "There is an 'H' and a 'e' separated by "; print "0-1 characters (e.g., He Hue Hee).\n"; } Output: There is an 'H' and a 'e' separated by 0-1 characters (e.g., He Hue Hee).
`?`	Modifies the preceding `*`, `+`, `?` or `{M,N}` quantifier so that it matches as few times as possible (non-greedy match).	$string1 = "Hello World\n"; if ($string1 =~ m/(l.+?o)/) { print "The non-greedy match with 'l' followed by one or\n"; print "more characters is 'llo' rather than 'llo Wo'.\n"; } Output: The non-greedy match with 'l' followed by one or more characters is 'llo' rather than 'llo Wo'.
`*`	Matches the preceding character zero or more times.	$string1 = "Hello World\n"; if ($string1 =~ m/el*o/) { print "There is an 'e' followed by zero to many "; print "'l' followed by 'o' (e.g., eo, elo, ello, elllo).\n"; } Output: There is an 'e' followed by zero to many 'l' followed by 'o' (e.g., eo, elo, ello, elllo).
`{M,N}`	Specifies a minimum M and a maximum N for the match count. N can be omitted and M can be 0: `{M}` matches "exactly" M times; `{M,}` matches "at least" M times; `{0,N}` matches "at most" N times. `x* y+ z?` is thus equivalent to `x{0,} y{1,} z{0,1}`.	$string1 = "Hello World\n"; if ($string1 =~ m/l{1,2}/) { print "There exists a substring with at least 1 "; print "and at most 2 l's in $string1\n"; } Output: There exists a substring with at least 1 and at most 2 l's in Hello World
`[…]`	Specifies a set of possible character matches.	$string1 = "Hello World\n"; if ($string1 =~ m/[aeiou]+/) { print "$string1 contains one or more vowels.\n"; } Output: Hello World contains one or more vowels.
`|`	Separates alternative possibilities.	$string1 = "Hello World\n"; if ($string1 =~ m/(Hello|Hi|Pogo)/) { print "$string1 contains at least one of Hello, Hi, or Pogo."; } Output: Hello World contains at least one of Hello, Hi, or Pogo.
`\b`	Matches a zero-width boundary between a word-class character (see below) and either a non-word-class character or an edge; same as `(^\w|\w$|\W\w|\w\W)`.	$string1 = "Hello World\n"; if ($string1 =~ m/llo\b/) { print "There is a word that ends with 'llo'.\n"; } Output: There is a word that ends with 'llo'.
`\w`	Matches an alphanumeric character, including "_"; same as `[A-Za-z0-9_]` in ASCII, and `[\p{Alphabetic}\p{GC=Mark}\p{GC=Decimal_Number}\p{GC=Connector_Punctuation}]` in Unicode, where `Alphabetic` covers more than Latin letters and `Decimal_Number` covers more than Arabic digits.	$string1 = "Hello World\n"; if ($string1 =~ m/\w/) { print "There is at least one alphanumeric "; print "character in $string1 (A-Z, a-z, 0-9, _).\n"; } Output: There is at least one alphanumeric character in Hello World (A-Z, a-z, 0-9, _).
`\W`	Matches a non-alphanumeric character, excluding "_"; same as `[^A-Za-z0-9_]` in ASCII, and `[^\p{Alphabetic}\p{GC=Mark}\p{GC=Decimal_Number}\p{GC=Connector_Punctuation}]` in Unicode.	$string1 = "Hello World\n"; if ($string1 =~ m/\W/) { print "The space between Hello and "; print "World is not alphanumeric.\n"; } Output: The space between Hello and World is not alphanumeric.
`\s`	Matches a whitespace character, which in ASCII is a tab, line feed, form feed, carriage return or space; in Unicode it also matches no-break spaces, next line, and the variable-width spaces (among others).	$string1 = "Hello World\n"; if ($string1 =~ m/\s.*\s/) { print "In $string1 there are TWO whitespace characters, which may"; print " be separated by other characters.\n"; } Output: In Hello World there are TWO whitespace characters, which may be separated by other characters.
`\S`	Matches anything BUT a whitespace character.	$string1 = "Hello World\n"; if ($string1 =~ m/\S.*\S/) { print "In $string1 there are TWO non-whitespace characters, which"; print " may be separated by other characters.\n"; } Output: In Hello World there are TWO non-whitespace characters, which may be separated by other characters.
`\d`	Matches a digit; same as `[0-9]` in ASCII; in Unicode, same as `\p{Digit}` or `\p{GC=Decimal_Number}`, which is itself the same as `\p{Numeric_Type=Decimal}`.	$string1 = "99 bottles of beer on the wall."; if ($string1 =~ m/(\d+)/) { print "$1 is the first number in '$string1'\n"; } Output: 99 is the first number in '99 bottles of beer on the wall.'
`\D`	Matches a non-digit; same as `[^0-9]` in ASCII or `\P{Digit}` in Unicode.	$string1 = "Hello World\n"; if ($string1 =~ m/\D/) { print "There is at least one character in $string1"; print " that is not a digit.\n"; } Output: There is at least one character in Hello World that is not a digit.
`^`	Matches the beginning of a line or string.	$string1 = "Hello World\n"; if ($string1 =~ m/^He/) { print "$string1 starts with the characters 'He'.\n"; } Output: Hello World starts with the characters 'He'.
`$`	Matches the end of a line or string.	$string1 = "Hello World\n"; if ($string1 =~ m/rld$/) { print "$string1 is a line or string "; print "that ends with 'rld'.\n"; } Output: Hello World is a line or string that ends with 'rld'.
`\A`	Matches the beginning of a string (but not an internal line).	$string1 = "Hello\nWorld\n"; if ($string1 =~ m/\AH/) { print "$string1 is a string "; print "that starts with 'H'.\n"; } Output: Hello World is a string that starts with 'H'.
`\z`	Matches the end of a string (but not an internal line).^[2]	$string1 = "Hello\nWorld\n"; if ($string1 =~ m/d\n\z/) { print "$string1 is a string "; print "that ends with 'd\\n'.\n"; } Output: Hello World is a string that ends with 'd\n'.
`[^…]`	Matches every character except the ones inside brackets.	$string1 = "Hello World\n"; if ($string1 =~ m/[^abc]/) { print "$string1 contains a character other than "; print "a, b, and c.\n"; } Output: Hello World contains a character other than a, b, and c.
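
For readers who do not use Perl, a few of the rows above are sketched here with Python's `re` module. This is an assumed translation, not part of the original table: Perl's `$1` and `$2` become `group(1)` and `group(2)`, and where the table uses `\z`, the sketch uses `\Z`, which is Python's anchor for the absolute end of the string.

```python
import re

string1 = "Hello World\n"

# Capture groups: (H..) and (o..) are groups 1 and 2.
m = re.search(r"(H..).(o..)", string1)
groups = (m.group(1), m.group(2))

# Greedy vs. non-greedy: .+ takes as much as it can, .+? as little as it can.
greedy = re.search(r"l.+o", string1).group(0)
lazy = re.search(r"l.+?o", string1).group(0)

# Word boundary: "llo" must be followed by the end of a word.
word_end = re.search(r"llo\b", string1) is not None

# String anchors: \A is the start of the string, \Z (Python) its very end.
two_lines = "Hello\nWorld\n"
starts_with_h = re.search(r"\AH", two_lines) is not None
ends_with_d_newline = re.search(r"d\n\Z", two_lines) is not None
# \A does not match at the start of the second line, even in multiline mode.
inner_line = re.search(r"\AW", two_lines, flags=re.MULTILINE) is not None

print(groups, greedy, lazy)
```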

References

[1] 33.3.1.2 Character Classes. Emacs Lisp manual, version 25.1. In: gnu.org. 2016. Retrieved 13 April 2017.
[2] Damian Conway: Regular Expressions, End of String. In: Perl Best Practices, p. 240, O'Reilly 2005, ISBN 978-0-596-00173-5
