Thanks to http://www.emacswiki.org/emacs/RegularExpression. I have lost pages from my bookmarks many times, so I prefer to copy it here for myself and others. I have made some updates and changes where I felt they could help others too. All credit goes to:
http://www.emacswiki.org/emacs/RegularExpression
Regular Expressions in Emacs/XEmacs
A regular expression (abbreviated “regexp” or sometimes just “re”) is a search string with wildcards, and more: it is a pattern that is matched against the text to be searched. See Regexps. Examples:
"alex"
A plain string is a regular expression that matches the string exactly. The above regular expression matches “alex”.
"alexa?"
Some characters have special meanings in a regular expression. The question mark, for example, says that the preceding expression (the character “a” in this case) may or may not be present. The above regular expression matches “alex” or “alexa”.
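You can try these two examples outside a buffer with `string-match-p`. It returns the index where the first match starts, or nil when there is no match. This is a minimal sketch. The \\` and \\' anchors (start and end of string, with backslashes doubled inside a Lisp string) are added only so that the whole string must match.

```elisp
;; Anchored: the pattern must match the entire string.
(string-match-p "\\`alex\\'" "alex")     ; => 0
(string-match-p "\\`alex\\'" "alexa")    ; => nil
(string-match-p "\\`alexa?\\'" "alex")   ; => 0
(string-match-p "\\`alexa?\\'" "alexa")  ; => 0

;; Unanchored: "alexa?" takes the optional "a" when it is present.
(let ((text "alexandra"))
  (when (string-match "alexa?" text)
    (match-string 0 text)))              ; => "alexa"
```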
Regexps are important to Emacs users in many ways, including these:
- We search with them interactively. Try ‘C-M-s’ (command ‘isearch-forward-regexp’).
- Emacs code uses them to parse text. We use regexps all the time, without knowing it, when we use Emacs.
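Lisp code usually parses text by matching a regexp and then reading the groups back with `match-string`. This is a minimal sketch. The helper name `my-parse-version` and the "Version: 27.1" input format are made up for this example.

```elisp
(defun my-parse-version (line)
  "Return (MAJOR . MINOR) parsed from LINE, e.g. \"Version: 27.1\", or nil.
Hypothetical helper, for illustration only."
  (when (string-match "Version: \\([0-9]+\\)\\.\\([0-9]+\\)" line)
    ;; Groups 1 and 2 hold the text matched by each \( \) pair.
    (cons (string-to-number (match-string 1 line))
          (string-to-number (match-string 2 line)))))

(my-parse-version "Version: 27.1")    ; => (27 . 1)
(my-parse-version "no version here")  ; => nil
```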
Regular Expression Syntax
Here is the syntax used by Emacs for regular expressions. Any character matches itself, except for those listed below. The following characters are special:
. * + ? ^ $ \ [
Between brackets [], the following are special: ] - ^
Many characters are special when they follow a backslash; see below.
. any character (but newline)
* previous character or group, repeated 0 or more times
+ previous character or group, repeated 1 or more times
? previous character or group, repeated 0 or 1 time
^ start of line
$ end of line
[...] any character between brackets
[^..] any character not in the brackets
[a-z] any character between a and z
\ prevents interpretation of following special char
\| or
\w word constituent
\b word boundary
\sc character with c syntax (e.g. \s- for whitespace char)
\( \) start/end of group
\< \> start/end of word
\` \' start/end of buffer (or string)
\1 string matched by the first group
\n string matched by the nth group
\{3\} previous character or group, repeated 3 times
\{3,\} previous character or group, repeated 3 or more times
\{3,6\} previous character or group, repeated 3 to 6 times
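Here are a few constructs from this table, tried from Lisp with `string-match` and `string-match-p`. This is a minimal sketch. Inside a Lisp string every regexp backslash is written twice, so \( becomes \\( and so on.

```elisp
;; \( \) group plus \1 back-reference, between word boundaries:
;; finds a doubled word.
(let ((text "this is is a test"))
  (when (string-match "\\<\\(\\w+\\) \\1\\>" text)
    (match-string 1 text)))                               ; => "is"

;; \| alternation inside a group, and \{2\} exact repetition.
(string-match-p "\\`\\(19\\|20\\)[0-9]\\{2\\}\\'" "1984")  ; => 0
(string-match-p "\\`\\(19\\|20\\)[0-9]\\{2\\}\\'" "2150")  ; => nil

;; \{3,6\}: between 3 and 6 repetitions of the previous character.
(string-match-p "\\`a\\{3,6\\}\\'" "aaaa")                 ; => 0
(string-match-p "\\`a\\{3,6\\}\\'" "aaaaaaa")              ; => nil

;; . does not match a newline, but a [^...] set does.
(string-match-p "\\`.+\\'" "a\nb")                         ; => nil
(string-match-p "\\`[^0-9]+\\'" "a\nb")                    ; => 0
```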
*?, +?, and ?? are non-greedy versions of *, +, and ?; see NonGreedyRegexp. Also, \W and \Sc match any character that does not match \w and \sc, and \B matches the empty string anywhere except at a word boundary.
Characters are organized by category. Use C-u C-x = to display the category of the character under the cursor.
\ca ascii character
\Ca non-ascii character (newline included)
\cl latin character
\cg greek character
Here are some character classes that can be used between brackets, as in [[:digit:]]:
[:digit:] a digit, same as [0-9]
[:upper:] a letter in uppercase (when case-fold-search is non-nil, it also matches lowercase letters)
[:space:] a whitespace character, as defined by the syntax table
[:xdigit:] a hexadecimal digit
[:cntrl:] a control character
[:ascii:] an ascii character
Syntax classes:
\s- whitespace character
\s/ character quote character
\sw word constituent
\s$ paired delimiter
\s_ symbol constituent
\s' expression prefix
\s. punctuation character
\s< comment starter
\s( open delimiter character
\s> comment ender
\s) close delimiter character
\s! generic comment delimiter
\s" string quote character
\s| generic string delimiter
\s\ escape character
You can see the current syntax table by typing C-h s. The syntax table depends on the current mode. As expected, letters a..z are listed as word constituents in text-mode. Other word constituents in this mode include A..Z, 0..9, $, %, currency units, accented letters and kanjis. See EmacsSyntaxTable for details.
Idiosyncrasies of Emacs Regular Expressions
- In an interactive search involving a regexp, a space character stands for one or more whitespace characters (tabs are whitespace characters). Enter C-q SPC to get a single space character, or put the following in your InitFile to override this behaviour:
(setq search-whitespace-regexp nil)
- [^…] matches all characters not in the list, even newlines. Put a newline in the list if you do not want newlines to be matched. You can enter a newline character using ‘C-o’, ‘C-q C-j’, or ‘C-q 012 RET’. Note also that \s- matches space, tab, newline and carriage return in text-mode (the exact set depends on the syntax table). This can be handy in a [^…] construct.
- By default, replacement commands perform case conversion. This means that both upper and lower case match in the regexp, whereas the case of the replacement string is chosen according to the case of the matched text. Try, for example, replacing john by harry below. Case sensitivity can be toggled on and off by typing ‘M-c’ in the minibuffer during search. You can also set the variable case-fold-search to nil to make matching case-sensitive; see CaseFoldSearch for more details. In the following example, only the last line would then be replaced.
John => Harry
JOHN => HARRY
john => harry
- Backslashes must be doubled when regexps are written in Lisp code. Regular expressions are often specified using strings in EmacsLisp, where some abbreviations are available: \n for newline, \t for tab, \b for backspace, \u3501 for the character with Unicode code point U+3501 (hexadecimal), and so on. A backslash itself must be entered as \\. Here are two ways to replace the decimal point by a comma (e.g. 1.5 -> 1,5). The first uses an interactive command. The second executes Lisp code, which works from point to the end of the buffer (type C-x C-e after the expression to execute it).
M-x replace-regexp RET \([0-9]+\)\. RET \1, RET
(while (re-search-forward "\\([0-9]+\\)\\." nil t)
(replace-match "\\1,"))
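You can try the Lisp side of these idiosyncrasies directly. This is a minimal sketch. `replace-regexp-in-string` applies the same case conversion as the replacing commands unless its FIXEDCASE argument is non-nil. Every regexp backslash is doubled inside the Lisp strings. The decimal-point loop above is wrapped in a temporary buffer so that it can run on a string. The helper name `my-decimal-point-to-comma` is hypothetical.

```elisp
;; Case conversion: with `case-fold-search' non-nil, "john" matches any
;; capitalization, and the replacement follows the case of the match.
(let ((case-fold-search t))
  (mapcar (lambda (name) (replace-regexp-in-string "john" "harry" name))
          '("John" "JOHN" "john")))
;; => ("Harry" "HARRY" "harry")

;; FIXEDCASE (4th argument) non-nil: insert the replacement as written.
(let ((case-fold-search t))
  (replace-regexp-in-string "john" "harry" "JOHN" t))
;; => "harry"

;; With `case-fold-search' nil, only the lowercase name matches.
(let ((case-fold-search nil))
  (mapcar (lambda (name) (replace-regexp-in-string "john" "harry" name))
          '("John" "JOHN" "john")))
;; => ("John" "JOHN" "harry")

;; Doubled backslashes: the regexp \([0-9]+\)\. written as a Lisp string.
(defun my-decimal-point-to-comma (text)
  "Return TEXT with a comma instead of each decimal point after digits.
Hypothetical helper wrapping the `re-search-forward' loop."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))           ; the loop only searches forward from point
    (while (re-search-forward "\\([0-9]+\\)\\." nil t)
      (replace-match "\\1,"))         ; \1 re-inserts the digits of group 1
    (buffer-string)))

(my-decimal-point-to-comma "1.5 and 3.14")  ; => "1,5 and 3,14"
```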
Some Regexp Examples
[-+[:digit:]] digit or + or - sign
\(\+\|-\)?[0-9]+\(\.[0-9]+\)? decimal number (-2 or 1.5 but not .2 or 1.)
\(\w+\) +\1\> two consecutive, identical words
\<[[:upper:]]\w* word starting with an uppercase letter
SPC+$ trailing whitespace (SPC stands for a literal space character)
\w\{20,\} word with 20 letters or more
\w+phony\> word ending in phony
\(19\|20\)[0-9]\{2\} year 1900-2099
^.\{6,\} line with at least 6 characters
^[a-zA-Z0-9_]\{3,16\}$ decent string for a user name
<tag[^> C-q C-j ]*>\(.*?\)</tag> html tag (C-q C-j stands for a typed newline; the tag content is group 1)
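You can check several of these examples from Lisp with `string-match-p`, with backslashes doubled inside the strings. This is a minimal sketch. The sample strings are made up. The \\` and \\' anchors are added so that the decimal-number pattern must match the whole string.

```elisp
;; Decimal number: -2 and 1.5 match, .2 and 1. do not.
(let ((decimal "\\`\\(\\+\\|-\\)?[0-9]+\\(\\.[0-9]+\\)?\\'"))
  (mapcar (lambda (s) (and (string-match-p decimal s) t))
          '("-2" "1.5" ".2" "1.")))
;; => (t t nil nil)

;; User name: 3 to 16 letters, digits or underscores.
(string-match-p "^[a-zA-Z0-9_]\\{3,16\\}$" "alex_01")   ; => 0
(string-match-p "^[a-zA-Z0-9_]\\{3,16\\}$" "al")        ; => nil

;; Word ending in phony.
(string-match-p "\\w+phony\\>" "symphony")               ; => 0

;; [[:upper:]] only means "uppercase" when `case-fold-search' is nil;
;; when it is non-nil, [:upper:] also matches lowercase letters.
(let ((case-fold-search nil))
  (string-match-p "\\<[[:upper:]]\\w*" "hello World"))   ; => 6

;; Greedy .* runs to the last ">", non-greedy .*? stops at the first.
(let ((html "<b>bold</b>"))
  (list (and (string-match "<.*>" html) (match-string 0 html))
        (and (string-match "<.*?>" html) (match-string 0 html))))
;; => ("<b>bold</b>" "<b>")
```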
Some Emacs Commands that Use Regular Expressions
C-M-s incremental forward search matching regexp
C-M-r incremental backward search matching regexp
replace-regexp replace string matching regexp
query-replace-regexp same, but query before each replacement
align-regexp align, using strings matching regexp as delimiters
highlight-regexp highlight strings matching regexp
occur show lines containing a match
multi-occur show lines in all buffers containing a match
how-many count the number of strings matching regexp
keep-lines delete all lines except those containing matches
flush-lines delete lines containing matches
grep call unix grep command and put result in a buffer
lgrep user-friendly interface to the grep command
rgrep recursive grep
dired-do-copy-regexp copy files with names matching regexp
dired-do-rename-regexp rename files matching regexp
find-grep-dired display files containing matches for regexp with Dired
Note that list-matching-lines is an alias for occur and delete-matching-lines is an alias for flush-lines. The command highlight-regexp is bound to C-x w h. Also, query-replace-regexp is bound by default to C-M-%, although some people prefer using an alias, like M-x qrr. Put the following in your InitFile to create such an alias:
(defalias 'qrr 'query-replace-regexp)
See also: IncrementalSearch, ReplaceRegexp, AlignCommands, OccurBuffer, DiredPower
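Several of these commands can also be called from Lisp. There they work on the text between point and the end of the buffer. This is a minimal sketch on made-up text in a temporary buffer. `how-many` returns the number of matches, and `flush-lines` deletes the lines that contain a match.

```elisp
(with-temp-buffer
  (insert ";; a comment\n(setq a 1)\n;; another\n(setq b 2)\n")
  (goto-char (point-min))             ; both commands start at point
  (let ((count (how-many "^(setq")))  ; an unescaped ( is a literal paren
    (flush-lines "^;;")               ; delete every line starting with ;;
    (list count (buffer-string))))
;; => (2 "(setq a 1)\n(setq b 2)\n")
```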
Tools for Constructing Regexps
- Command ‘re-builder’ constructs a regular expression. You enter the regexp in a small window at the bottom of the frame. The first 200 matches in the buffer are highlighted, so you can see if the regexp does what you want. Use Lisp syntax, which means doubling backslashes and using \\\\ to match a literal backslash.
- Macro ‘rx’ provides user-friendly syntax for regular expressions. For example, (rx (one-or-more blank) line-end) returns a regexp string equivalent to "[[:blank:]]+$". Older Emacs versions wrapped it in a shy group, "\\(?:[[:blank:]]+$\\)", and the exact form varies by version. See rx.
- SymbolicRegexp is similar in aim to ‘rx’.
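Here is the decimal-number regexp from the examples table rewritten with `rx`. This is a sketch assuming a reasonably recent Emacs. The constant name `my-decimal-number-re` is hypothetical. The exact regexp string that `rx` produces may differ between Emacs versions, so only its matching behavior is checked.

```elisp
(require 'rx)

;; Same language as \(\+\|-\)?[0-9]+\(\.[0-9]+\)?, anchored to the string.
(defconst my-decimal-number-re
  (rx string-start
      (opt (any ?+ ?-))               ; optional sign
      (one-or-more digit)             ; integer part
      (opt "." (one-or-more digit))   ; optional fraction; rx quotes the dot
      string-end))

(mapcar (lambda (s) (and (string-match-p my-decimal-number-re s) t))
        '("-2" "1.5" ".2" "1." "+10.25"))
;; => (t t nil nil t)
```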
Study and Practice
- Read about regexps in the Elisp manual (see also RegexpReferences), and study EmacsLisp code that uses regexps.
- Regexp searching (‘C-M-s’) is a great way to learn about regexps; see Regexp Searches. Change your regexp on the fly and see immediately what difference the change makes.
- Some examples of use (see also ReplaceRegexp and EmacsCrashRegexp):
  - Search for trailing whitespace: C-M-s SPC+$
  - Highlight all trailing whitespace: M-x highlight-regexp RET SPC+$ RET RET
  - Delete trailing whitespace: M-x replace-regexp RET SPC+$ RET RET (similar to ‘M-x delete-trailing-whitespace’)
  - Search for open delimiters: C-M-s \s(
  - Search for duplicated words (works across lines): C-M-s \(\<\w+\>\)\s-+\1
  - Count the number of words in the buffer: M-x how-many RET \< RET
  - Align words beginning with an uppercase letter followed by a lowercase letter: M-: (setq case-fold-search nil) RET then M-x align-regexp RET \<[[:upper:]][[:lower:]] RET
  - Replace the word foo by bar (won’t replace fool by barl): M-x replace-regexp RET \<foo\> RET bar RET
  - Keep only the first two words on each line: M-x replace-regexp RET ^\(\W*\w+\W+\w+\).* RET \1 RET
  - Suppress lines beginning with ;; : M-x flush-lines RET ^;; RET
  - Remove the text after the first ; on each line: M-x replace-regexp RET \([^;]*\);.* RET \1 RET
  - Keep only lines that contain an email address: M-x keep-lines RET \w+\(\.\w+\)?@\(\w\|\.\)+ RET
  - Keep only one instance of consecutive empty lines: M-x replace-regexp RET ^C-q C-j\{2,\} RET C-q C-j RET
  - Keep words or letters in uppercase, one per line (with case-fold-search set to nil, as in the align example): M-x replace-regexp RET [^[:upper:]]+ RET C-o RET
  - List lines beginning with Chapter or Section: M-x occur RET ^\(Chapter\|Section\) RET
  - List lines with more than 80 characters: M-x occur RET ^.\{81,\} RET
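Some of these edits have direct Lisp counterparts that work on strings. This is a minimal sketch with `replace-regexp-in-string` on made-up sample text. Backslashes are doubled, and a literal newline is written as \n inside the Lisp string instead of being typed with C-q C-j.

```elisp
;; Delete trailing spaces at the end of each line ($ also matches before \n).
(replace-regexp-in-string " +$" "" "foo   \nbar  ")
;; => "foo\nbar"

;; Replace the whole word foo by bar, but leave fool alone.
(replace-regexp-in-string "\\<foo\\>" "bar" "foo fool foo.")
;; => "bar fool bar."

;; Remove the text after the first ; (group 1 keeps what precedes it).
(replace-regexp-in-string "\\([^;]*\\);.*" "\\1" "x = 1; set x")
;; => "x = 1"

;; Keep only one of several consecutive empty lines.
(replace-regexp-in-string "^\n\\{2,\\}" "\n" "a\n\n\n\nb")
;; => "a\n\nb"
```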
Use Icicles to Learn about Regexps
Icicles provides these interactive ways to learn about regexps:
- ‘C-`’ (‘icicle-search’) shows you regexp matches, as does ‘C-M-s’, but it can also show you (that is, highlight) regexp subgroup matches. Showing matched subgroups is very helpful for learning, and Icicles is unique in this. There are two ways that you can use this feature:
  - You can search for a regexp, but limit the search context, used for further searching, to a particular subgroup match. For example, you can search for and highlight Lisp argument lists by using a regexp subgroup that matches lists, placing that subgroup after ‘defun’: (defun [^(]*\(([^(]*)\), that is, defun, followed by non-`(’ character(s), followed by `(’, possibly followed by non-`)’ character(s), followed by `)’.
  - You can search for a regexp without limiting the search context to a subgroup match. In this case, Icicles highlights each subgroup match in a different color. For example, each subgroup of the complex regexp (\([-a-z*]+\)*\((\(([-a-z]+ *\([^)]*\))\))\).* gets its own highlight color.
- ‘C-`’ also helps you learn by letting you use two simple regexps (search within a search) as an alternative to coming up with a single, complex regexp to do the same job. And, as with incremental search, you can change the second regexp on the fly to see immediately what difference the change makes. See Icicles - Search Commands, Overview.
- ‘S-TAB’ during minibuffer input shows you all matches for your input string, which can be a regexp. So, just type a regexp whenever the minibuffer is active for completion and hit ‘S-TAB’ to see what the regexp matches. Try this with command input (‘M-x’), buffer switching (‘C-x b’), file visiting (‘C-x C-f’), help (‘C-h f’, ‘C-h v’), and so on. Almost any time you type input in the minibuffer, you can type a regexp and use ‘S-TAB’ to see what it matches (and then choose one of the matching candidates to input, if you want).
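You do not need Icicles to see the text matched by each subgroup from Lisp. After a successful `string-match`, `match-string` and `match-beginning` report each group. This is a small sketch using the argument-list regexp from the first item above, applied to a made-up defun.

```elisp
;; Group 1 of the regexp captures the argument list of a defun.
(let ((code "(defun add (a b) (+ a b))")
      (re "(defun [^(]*\\(([^(]*)\\)"))
  (when (string-match re code)
    (list (match-string 0 code)       ; whole match
          (match-string 1 code)       ; subgroup 1: the argument list
          (match-beginning 1))))      ; index where subgroup 1 starts
;; => ("(defun add (a b)" "(a b)" 11)
```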
Icicles is an Emacs library that enhances minibuffer completion, that is, input completion. Its main features are listed below.
Main Icicles Features
Not a bad summary, by one user:
- “In case you never heard of it, Icicles is to ‘TAB’ completion what ‘TAB’ completion is to typing things manually every time.” [1]
- cycle through completion candidates that match your current input *
- use a pattern to match completion candidates, including:
  - use multiple input patterns (e.g., regexps) to match candidates progressively (intersection), chaining these filters together like piped ‘grep’ commands *
  - use multiple input patterns at the same time to match multi-part candidates (multi-completions) piecewise, for example, match a container’s name and/or its contained text, in parallel *
- see all possible complete inputs (pertinent commands, variables, and so on) that match your partial or regexp input – the list is updated dynamically (incrementally) if you change your input *
- see all previous inputs that match your partial or regexp input, and selectively reuse them *
- match input against completion candidates that do not match a given regexp; that is, complement the set of matches and use the result for subsequent matching *
- use multiple regexps to search (and replace) text across multiple buffers, files, or regions *, +
- search areas of text that have a certain text property, such as a face *
- browse Imenu or tags entries that match your partial or regexp input *, +
- create and use multiple-choice menus; that is, menus where you can choose multiple entries any number of times *
- create and use multi-commands – commands that you can use to perform an action on any number of candidate inputs any number of times *
- act on multiple inputs in the minibuffer all at once *
- perform set operations (intersection, union,…) on the fly, using sets of completion candidates or other strings *
- persistently save and later reuse sets of completion candidates (e.g. project file names) *
- complete key sequences, and navigate the key-binding hierarchy (this includes the menu bar menu hierarchy) (see also LaCarte) *
- sort completion candidates on the fly, in multiple, context-dependent ways *
Icicles is very general, and these concepts give it a wide reach. Icicles has lots for Emacs users and lots for EmacsLisp programmers – its application is limited only by your imagination. Have fun!
Obtaining and Installing Icicles
See Icicles - Libraries for how to obtain the Icicles library files. Then:
- Put those files in a directory that is in your ‘load-path’.
- Load Icicles: ‘M-x load-library RET icicles RET’.
- Turn on Icicle mode: ‘M-x icy-mode RET’.
Use ‘M-x icy-mode RET’ at any time to turn Icicle mode on and off. If you want to load Icicles each time you start Emacs, then put code in your init file to set your `load-path' appropriately and load Icicles:
(add-to-list 'load-path "/my/path/to/icicles/")
(require 'icicles)
If you also want to turn on Icicle mode each time you start Emacs, then add this line after the others:
(icy-mode 1)