Friday, January 28, 2011

Common Metacharacters and Features

Character Representations
Character Shorthands: \n, \t, \a, \b, \e, \f, \r, \v, ...
Octal Escapes: \num
Hex/Unicode Escapes: \xnum, \x{num}, \unum, \Unum, ...
Control Characters: \cchar
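As a rough illustration of the character representations listed above, the sketch below writes one character (a tab) in four different ways. It assumes Python's standard `re` module, which is only one flavor among many: it accepts the shorthand, octal, hex, and `\u` forms shown here, but not `\e` or `\cchar`. The variable names are made up for the example.

```python
import re

# Four spellings of the same tab character (U+0009) inside a pattern.
tab_patterns = [
    r"\t",      # character shorthand
    r"\011",    # octal escape
    r"\x09",    # hex escape
    r"\u0009",  # Unicode escape
]

# Each pattern is tested against a string holding one literal tab.
tab_results = [re.fullmatch(p, "\t") is not None for p in tab_patterns]
print(tab_results)
```

Other flavors may accept a different subset, so it is worth checking each escape against the documentation of the tool in use.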


Character Classes and Class-Like Constructs
Normal classes: [a-z] and [^a-z]
Almost any character: dot
Exactly one byte: \C
Unicode Combining Character Sequence: \X
Class shorthands: \w, \d, \s, \W, \D, \S
Unicode properties, blocks, and categories: \p{Prop}, \P{Prop}
Class set operations: [[a-z]&&[^aeiou]]
POSIX bracket-expression "character class": [[:alpha:]]
POSIX bracket-expression "collating sequences": [[.span-ll.]]
POSIX bracket-expression "character equivalents": [[=n=]]
Emacs syntax classes
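A few of the class constructs above can be tried out with a short sketch, again assuming Python's `re` module. That module has no class set operations, POSIX bracket expressions, `\p{Prop}`, or `\X`, so the last line emulates `[[a-z]&&[^aeiou]]` with a negative lookahead instead. The sample strings and variable names are invented for the example.

```python
import re

sample = "Room 42, floor 7b"

digit_runs = re.findall(r"\d+", sample)           # class shorthand \d
word_runs = re.findall(r"\w+", sample)            # class shorthand \w
non_lower = re.findall(r"[^a-z]+", "abc123def")   # negated class

# Dot matches almost any character: newline is excluded unless DOTALL is set.
dot_default = re.findall(r"a.c", "abc a\nc")
dot_all = re.findall(r"a.c", "abc a\nc", re.DOTALL)

# Emulation of the set operation [[a-z]&&[^aeiou]]:
# "a lowercase letter that is not a vowel".
consonants = re.findall(r"(?![aeiou])[a-z]", "regex")

print(digit_runs, word_runs, non_lower, dot_default, dot_all, consonants)
```

The lookahead trick is a workaround, not an equivalent syntax; flavors such as Java support the `&&` form directly.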


Anchors and Other "Zero-Width Assertions"
Start of line/string: ^, \A
End of line/string: $, \Z, \z
Start of match (or end of previous match): \G
Word boundaries: \b, \B, \<, \>, ...
Lookahead: (?=…), (?!…); Lookbehind: (?<=…), (?<!…)
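The zero-width assertions above match positions rather than characters. The following sketch, assuming Python's `re` module, shows line and string anchors, a word boundary, and the four lookaround forms. Python has no `\G`, `\<`, or `\>`, so those are left out. All sample strings and variable names are hypothetical.

```python
import re

lines = "one two\nthree four"

# \A anchors to the start of the string even in MULTILINE mode,
# while ^ and $ also match at each line boundary in that mode.
string_start = re.findall(r"\A\w+", lines, re.MULTILINE)
line_starts = re.findall(r"^\w+", lines, re.MULTILINE)
line_ends = re.findall(r"\w+$", lines, re.MULTILINE)

# \b requires a word boundary on both sides of "cat".
whole_words = re.findall(r"\bcat\b", "cat concat cats")

# Lookahead: digits followed by " USD", without consuming " USD".
usd_amounts = re.findall(r"\d+(?= USD)", "10 USD, 20 EUR, 30 USD")

# Lookbehind: digits preceded (or not preceded) by a dollar sign.
priced = re.findall(r"(?<=\$)\d+", "$5 and 6 and $70")
unpriced = re.findall(r"(?<!\$)\b\d+", "$5 and 6 and $70")

print(string_start, line_starts, line_ends, whole_words)
print(usd_amounts, priced, unpriced)
```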


Comments and Mode Modifiers
Mode modifier: (?modifier), such as (?i) or (?-i)
Mode-modified span: (?modifier:…), such as (?i:…)
Comments: (?#…) and #…
Literal-text span: \Q…\E
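To make the modifier and comment syntax above more concrete, here is a small sketch assuming Python's `re` module. Python accepts `(?i)` at the start of a pattern, scoped spans such as `(?i:…)`, and `(?#…)` comments; `#` comments need verbose mode. It has no `\Q…\E`, so `re.escape` is used here as the nearest substitute. The names and sample strings are invented for the example.

```python
import re

# Mode modifier applied to the whole pattern.
whole_pattern = re.search(r"(?i)hello", "HELLO") is not None

# Mode-modified span: only the "h" is case-insensitive.
span_hit = re.fullmatch(r"(?i:h)ello", "Hello") is not None
span_miss = re.fullmatch(r"(?i:h)ello", "HELLO") is not None

# Inline comment, ignored by the regex engine.
inline_comment = re.fullmatch(r"\d{3}(?#prefix)-\d{4}", "555-1234") is not None

# "#" comments and free whitespace require verbose mode.
phone = re.compile(r"""
    \d{3}   # prefix
    -       # separator
    \d{4}   # line number
""", re.VERBOSE)
verbose_hit = phone.fullmatch("555-1234") is not None

# Literal text: without escaping, "+" is a quantifier, so "1+1=2" fails
# to match itself; escaping the text first makes it match literally.
raw_hit = re.fullmatch("1+1=2", "1+1=2") is not None
escaped_hit = re.fullmatch(re.escape("1+1=2"), "1+1=2") is not None

print(whole_pattern, span_hit, span_miss, inline_comment)
print(verbose_hit, raw_hit, escaped_hit)
```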


Grouping, Capturing, Conditionals, and Control
Capturing/grouping parentheses: (…), \1, \2, ...
Grouping-only parentheses: (?:…)
Named capture: (?<Name>…)
Atomic grouping: (?>…)
Alternation: …|…|…
Conditional: (?if then|else)
Greedy quantifiers: *, +, ?, {num,num}
Lazy quantifiers: *?, +?, ??, {num,num}?
Possessive quantifiers: *+, ++, ?+, {num,num}+
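The grouping and quantifier constructs above can be sketched as follows, assuming Python's `re` module. Two flavor differences are worth flagging: Python spells named capture as `(?P<name>…)` rather than `(?<Name>…)`, and its conditional tests a group number or name, as in `(?(1)then|else)`. Atomic groups and possessive quantifiers were only added in Python 3.11, so they are not exercised here. The sample strings and variable names are hypothetical.

```python
import re

# Capturing parentheses with a backreference: a repeated word.
repeated = re.search(r"(\w+) \1", "hello hello world").group(1)

# Named capture (Python spelling).
date = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "date: 2011-01-28")
year, month = date.group("year"), date.group("month")

# Grouping-only parentheses: the quantifier applies to the group,
# and findall returns whole matches because nothing is captured.
ab_runs = re.findall(r"(?:ab)+", "ababab ab")

# Alternation.
animals = re.findall(r"cat|dog|bird", "dog cat fish")

# Greedy versus lazy quantifier on the same input.
greedy = re.search(r"<.+>", "<b>bold</b>").group()
lazy = re.search(r"<.+?>", "<b>bold</b>").group()

# Conditional: require ">" only if the optional "<" in group 1 matched.
cond = re.compile(r"(<)?\w+(?(1)>)")
cond_results = [cond.fullmatch(s) is not None for s in ("<tag>", "tag", "<tag", "tag>")]

print(repeated, year, month, ab_runs, animals, greedy, lazy, cond_results)
```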
