Thanks to http://www.emacswiki.org/emacs/RegularExpression. I have lost this page from my bookmarks many times, so I am copying it here for myself and others. I have made a few updates and changes where I felt they could help others too. All credit goes to:

http://www.emacswiki.org/emacs/RegularExpression

RegularExpressions in Emacs/XEmacs

A regular expression (abbreviated “regexp” or sometimes just “re”) is a search-string with wildcards – and more. It is a pattern that is matched against the text to be searched. See Regexps. Examples:

    "alex"

A plain string is a regular expression that matches the string exactly. The above regular expression matches “alex”.

    "alexa?"

Some characters have special meanings in a regular expression. The question mark, for example, says that the preceding expression (the character “a” in this case) may or may not be present. The above regular expression matches “alex” or “alexa”.
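
The two examples above can be checked from Lisp. This is a minimal sketch, assuming you evaluate it with C-x C-e or in M-x ielm; the helper name my-regexp-demo-matches-p is made up for this illustration and is not part of Emacs. It anchors the regexp with \` and \' so that the whole string has to match.

```elisp
;; Hypothetical helper for illustration; not part of Emacs.
(defun my-regexp-demo-matches-p (regexp string)
  "Return t if REGEXP matches the whole of STRING, else nil."
  (let ((case-fold-search nil))         ; make the match case-sensitive
    ;; \\` and \\' anchor the match to the start and end of STRING;
    ;; \\(?: ... \\) is a non-capturing ("shy") group around REGEXP.
    (and (string-match-p (concat "\\`\\(?:" regexp "\\)\\'") string) t)))

(my-regexp-demo-matches-p "alex" "alex")     ; => t
(my-regexp-demo-matches-p "alexa?" "alex")   ; => t
(my-regexp-demo-matches-p "alexa?" "alexa")  ; => t
(my-regexp-demo-matches-p "alexa?" "alexaa") ; => nil
```

Note the doubled backslashes: inside a Lisp string, each regexp backslash has to be written as \\.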

Regexps are important to Emacs users in many ways, including these:

We search with them interactively. Try ‘C-M-s’ (command isearch-forward-regexp).
Emacs code uses them to parse text. We use regexps all the time, without knowing it, when we use Emacs.

Regular Expression Syntax

Here is the syntax used by Emacs for regular expressions. Any character matches itself, except for the list below.
The following characters are special : . * + ? ^ $ \ [
Between brackets [], the following are special : ] - ^
Many characters are special when they follow a backslash – see below.

  .        any character (but newline)
  *        previous character or group, repeated 0 or more times
  +        previous character or group, repeated 1 or more times
  ?        previous character or group, repeated 0 or 1 times
  ^        start of line
  $        end of line
  [...]    any character between brackets
  [^..]    any character not in the brackets
  [a-z]    any character between a and z
  \        prevents interpretation of following special char
  \|       or
  \w       word constituent
  \b       word boundary
  \sc      character with c syntax (e.g. \s- for whitespace char)
  \( \)    start/end of group
  \< \>    start/end of word
  \` \'    start/end of buffer
  \1       string matched by the first group
  \n       string matched by the nth group
  \{3\}    previous character or group, repeated 3 times
  \{3,\}   previous character or group, repeated 3 or more times
  \{3,6\}  previous character or group, repeated 3 to 6 times

*?, +?, and ?? are non-greedy versions of *, +, and ? – see NonGreedyRegexp. Also, \W, \B, and \Sc are the negations of \w, \b, and \sc: \W and \Sc match any character that \w and \sc do not match, and \B matches anywhere that is not a word boundary.
Characters are organized by category. Use C-u C-x = to display the category of the character under the cursor.

  \ca      ascii character
  \Ca      non-ascii character (newline included)
  \cl      latin character
  \cg      greek character
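
The difference between the greedy and non-greedy operators mentioned above is easiest to see on a short string. A minimal sketch; the helper name my-regexp-demo-first-match and the sample text are assumptions made for this illustration.

```elisp
;; Hypothetical helper for illustration; not part of Emacs.
(defun my-regexp-demo-first-match (regexp string)
  "Return the text of the first match of REGEXP in STRING, or nil."
  (let ((case-fold-search nil))
    (when (string-match regexp string)
      (match-string 0 string))))

;; Greedy: .* grabs as much as it can, up to the last ">".
(my-regexp-demo-first-match "<.*>" "<b>bold</b>")   ; => "<b>bold</b>"
;; Non-greedy: .*? stops at the first ">" that lets the match succeed.
(my-regexp-demo-first-match "<.*?>" "<b>bold</b>")  ; => "<b>"
```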

Here are some character classes that can be used between brackets, [] (for example, [[:digit:]] or [^[:space:]]).

  [:digit:]  a digit, same as [0-9]
  [:upper:]  a letter in uppercase
  [:space:]  a whitespace character, as defined by the syntax table
  [:xdigit:] a hexadecimal digit
  [:cntrl:]  a control character
  [:ascii:]  an ascii character
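
A class such as [:digit:] only has its special meaning inside a bracket expression, so in practice it is written with two pairs of brackets. A small sketch, assuming the function name my-regexp-demo-classes (invented for this example):

```elisp
;; Hypothetical function for illustration; not part of Emacs.
(defun my-regexp-demo-classes ()
  "Return a list of sample results for bracketed character classes."
  (let ((case-fold-search nil))
    (list
     ;; Index of the first digit in the string.
     (string-match-p "[[:digit:]]" "abc123")
     ;; A string made only of hexadecimal digits matches at index 0.
     (string-match-p "\\`[[:xdigit:]]+\\'" "DEADbeef")
     ;; "x", "y" and "z" are not hexadecimal digits, so no match.
     (string-match-p "\\`[[:xdigit:]]+\\'" "xyz"))))

(my-regexp-demo-classes) ; => (3 0 nil)
```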

Syntax classes:

  \s-   whitespace character        \s/   character quote character
  \sw   word constituent            \s$   paired delimiter         
  \s_   symbol constituent          \s'   expression prefix        
  \s.   punctuation character       \s<   comment starter          
  \s(   open delimiter character    \s>   comment ender            
  \s)   close delimiter character   \s!   generic comment delimiter
  \s"   string quote character      \s|   generic string delimiter 
  \s\   escape character

You can see the current syntax table by typing C-h s. The syntax table depends on the current mode. As expected, letters a..z are listed as word constituents in text-mode. Other word constituents in this mode include A..Z, 0..9, $, %, currency units, accented letters, and kanji. See EmacsSyntaxTable for details.
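
Because \w and \< \> consult the syntax table, the same regexp can give different results under different tables. The sketch below counts words in a temporary buffer, once with the standard table and once with a copy in which the underscore is made a word constituent. The function name and the sample string are assumptions for illustration only.

```elisp
;; Hypothetical helper for illustration; not part of Emacs.
(defun my-regexp-demo-count-words (string &optional underscore-is-word)
  "Count matches of \\=\\<\\w+\\=\\> in STRING.
If UNDERSCORE-IS-WORD is non-nil, treat `_' as a word constituent."
  (with-temp-buffer
    (when underscore-is-word
      ;; `make-syntax-table' returns a table inheriting from the standard one.
      (let ((table (make-syntax-table)))
        (modify-syntax-entry ?_ "w" table)
        (set-syntax-table table)))
    (insert string)
    (how-many "\\<\\w+\\>" (point-min) (point-max))))

(my-regexp-demo-count-words "foo_bar baz")    ; => 3  (foo, bar, baz)
(my-regexp-demo-count-words "foo_bar baz" t)  ; => 2  (foo_bar, baz)
```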

Idiosyncrasies of Emacs Regular Expressions

In an interactive search involving a regexp, a space character stands for one or more whitespace characters (tabs are whitespace characters). Enter C-q SPC to get a single space character. Or put the following in your InitFile to override this behaviour.

                 (setq search-whitespace-regexp nil)

[^ … ] matches all characters not in the list, even newlines. Put a newline in the list if you want it not to be matched. You can enter a newline character using ‘C-o’, ‘C-q C-j’, or ‘C-q 012 RET’. Note also that \s- matches space, tab, newline and carriage return. This can be handy in a [^ … ] construct.
By default, replacement commands perform case conversion. This means that both upper and lower case match in the regexp, whereas the case of the replacement string is chosen according to the case of the matched text. Try for example replacing john by harry below. Case conversion can be toggled on/off by typing ‘M-c’ in the minibuffer during search. You can also set the variable case-fold-search to nil to disable case conversion; see CaseFoldSearch for more details. In the following example, only the last line would then be replaced.

                           John  =>  Harry
                           JOHN  =>  HARRY
                           john  =>  harry
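
The same behaviour can be reproduced from Lisp with re-search-forward and replace-match, which adapts the case of the replacement to the matched text unless told otherwise. This is a sketch only; the function name my-regexp-demo-replace-john is invented for the example, and it works on a temporary buffer so nothing in your own buffers is touched.

```elisp
;; Hypothetical helper for illustration; not part of Emacs.
(defun my-regexp-demo-replace-john (text fold)
  "Replace john by harry in TEXT and return the result.
FOLD is the value to use for `case-fold-search'."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((case-fold-search fold))
      (while (re-search-forward "john" nil t)
        ;; With no FIXEDCASE argument, `replace-match' converts the
        ;; replacement to the case pattern of the matched text.
        (replace-match "harry")))
    (buffer-string)))

(my-regexp-demo-replace-john "John JOHN john" t)   ; => "Harry HARRY harry"
(my-regexp-demo-replace-john "John JOHN john" nil) ; => "John JOHN harry"
```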

Backslashes must be doubled when used in Lisp code. Regular expressions are often specified using strings in EmacsLisp. Some abbreviations are available: \n for newline, \t for tab, \b for backspace, \u3501 for character with unicode value 3501, and so on. Backslashes must be entered as \\. Here are two ways to replace the decimal point by a comma (e.g. 1.5 -> 1,5), first by an interactive command, second by executing Lisp code (type C-x C-e after the expression to get it executed).

           M-x replace-regexp RET \([0-9]+\)\. RET \1, RET
          (while (re-search-forward "\\([0-9]+\\)\\." nil t)
                        (replace-match "\\1,"))
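
To try the Lisp version without modifying a real buffer, the loop can be wrapped in a function that works on a string. A minimal sketch; the function names are made up for this example. The second function does the same job with replace-regexp-in-string, which is often handier when the input is already a string.

```elisp
;; Hypothetical helpers for illustration; not part of Emacs.
(defun my-regexp-demo-decimal-comma (text)
  "Return TEXT with the decimal point after each number replaced by a comma."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    ;; Same loop as above: group 1 holds the digits before the point.
    (while (re-search-forward "\\([0-9]+\\)\\." nil t)
      (replace-match "\\1,"))
    (buffer-string)))

(defun my-regexp-demo-decimal-comma-string (text)
  "Like `my-regexp-demo-decimal-comma', but without a temporary buffer."
  (replace-regexp-in-string "\\([0-9]+\\)\\." "\\1," text))

(my-regexp-demo-decimal-comma "1.5 and 20.25")        ; => "1,5 and 20,25"
(my-regexp-demo-decimal-comma-string "1.5 and 20.25") ; => "1,5 and 20,25"
```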

Some Regexp Examples

 [-+[:digit:]]                     digit or + or - sign
 \(\+\|-\)?[0-9]+\(\.[0-9]+\)?     decimal number (-2 or 1.5 but not .2 or 1.)
 \<\(\w+\) +\1\>                   two consecutive, identical words
 \<[[:upper:]]\w*                  word starting with an uppercase letter
  +$                               trailing whitespace (note the starting SPC)
 \w\{20,\}                         word with 20 letters or more
 \w+phony\>                        word ending in phony
 \(19\|20\)[0-9]\{2\}              year 1900-2099
 ^.\{6,\}                          line with at least 6 characters
 ^[a-zA-Z0-9_]\{3,16\}$            decent string for a user name
 <tag[^> C-q C-j ]*>\(.*?\)</tag>  html tag
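
A few of these examples can be tried from Lisp, with every backslash doubled. This is a sketch under the assumption that you evaluate it in a buffer with an ordinary syntax table; the two helper names are invented for the illustration.

```elisp
;; Hypothetical helpers for illustration; not part of Emacs.
(defun my-examples-whole-p (regexp string)
  "Return t if REGEXP matches all of STRING, else nil."
  (let ((case-fold-search nil))
    (and (string-match-p (concat "\\`\\(?:" regexp "\\)\\'") string) t)))

(defun my-examples-first-match (regexp string)
  "Return the text of the first match of REGEXP in STRING, or nil."
  (let ((case-fold-search nil))
    (when (string-match regexp string)
      (match-string 0 string))))

;; Decimal number: -2 or 1.5, but not .2 or 1.
(my-examples-whole-p "\\(\\+\\|-\\)?[0-9]+\\(\\.[0-9]+\\)?" "-2")   ; => t
(my-examples-whole-p "\\(\\+\\|-\\)?[0-9]+\\(\\.[0-9]+\\)?" "1.5")  ; => t
(my-examples-whole-p "\\(\\+\\|-\\)?[0-9]+\\(\\.[0-9]+\\)?" ".2")   ; => nil
(my-examples-whole-p "\\(\\+\\|-\\)?[0-9]+\\(\\.[0-9]+\\)?" "1.")   ; => nil

;; Year 1900-2099.
(my-examples-whole-p "\\(19\\|20\\)[0-9]\\{2\\}" "1999")            ; => t
(my-examples-whole-p "\\(19\\|20\\)[0-9]\\{2\\}" "2100")            ; => nil

;; Decent string for a user name.
(my-examples-whole-p "^[a-zA-Z0-9_]\\{3,16\\}$" "emacs_user")       ; => t
(my-examples-whole-p "^[a-zA-Z0-9_]\\{3,16\\}$" "ab")               ; => nil

;; Two consecutive, identical words.
(my-examples-first-match "\\<\\(\\w+\\) +\\1\\>"
                         "Paris in the the spring")           ; => "the the"
```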

Some Emacs Commands that Use Regular Expressions

 C-M-s                   incremental forward search matching regexp
 C-M-r                   incremental backward search matching regexp 
 replace-regexp          replace string matching regexp
 query-replace-regexp    same, but query before each replacement
 align-regexp            align, using strings matching regexp as delimiters
 highlight-regexp        highlight strings matching regexp
 occur                   show lines containing a match
 multi-occur             show lines in all buffers containing a match
 how-many                count the number of strings matching regexp
 keep-lines              delete all lines except those containing matches
 flush-lines             delete lines containing matches
 grep                    call unix grep command and put result in a buffer
 lgrep                   user-friendly interface to the grep command
 rgrep                   recursive grep
 dired-do-copy-regexp    copy files with names matching regexp
 dired-do-rename-regexp  rename files matching regexp 
 find-grep-dired         display files containing matches for regexp with Dired

Note that list-matching-lines is an alias for occur and delete-matching-lines is an alias for flush-lines. The command highlight-regexp is bound to C-x w h. Also query-replace-regexp is bound by default to C-M-%, although some people prefer using an alias, like M-x qrr. Put the following in your InitFile to create such an alias.

   (defalias 'qrr 'query-replace-regexp)

Tools for Constructing Regexps

Command ‘re-builder’ constructs a regular expression. You enter the regexp in a small window at the bottom of the frame. The first 200 matches in the buffer are highlighted, so you can see if the regexp does what you want. Use Lisp syntax, which means doubling backslashes and using \\\\ to match a literal backslash.
Macro ‘rx’ provides user-friendly syntax for regular expressions. For example, (rx (one-or-more blank) line-end) returns the regexp string "\\(?:[[:blank:]]+$\\)" (newer Emacs versions may omit the enclosing shy group). See rx.
SymbolicRegexp is similar in aim to ‘rx’.
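
As a small illustration of ‘rx’, the form quoted above can be used directly wherever a regexp string is expected. The sketch below checks behaviour rather than the exact string that ‘rx’ returns, since that string can differ between Emacs versions. The function name is an assumption made for this example.

```elisp
(require 'rx)

;; Hypothetical helper for illustration; not part of Emacs.
(defun my-regexp-demo-trailing-blank-start (line)
  "Return the index where trailing blanks start in LINE, or nil if none."
  ;; `rx' expands at macro-expansion time into an ordinary regexp string.
  (string-match-p (rx (one-or-more blank) line-end) line))

(my-regexp-demo-trailing-blank-start "foo  ") ; => 3
(my-regexp-demo-trailing-blank-start "foo")   ; => nil
```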

Study and Practice

Read about regexps in the Elisp manual (see also RegexpReferences), and study EmacsLisp code that uses regexps.
Regexp searching (‘C-M-s’) is a great way to learn about regexps – see Regexp Searches. Change your regexp on the fly and see immediately what difference the change makes.
Some examples of use (see also ReplaceRegexp and EmacsCrashRegexp):

Search for trailing whitespace: C-M-s SPC+$
Highlight all trailing whitespace: M-x highlight-regexp RET SPC+$ RET RET
Delete trailing whitespace: M-x replace-regexp RET SPC+$ RET RET (same as ‘M-x delete-trailing-whitespace’)
Search for open delimiters: C-M-s \s(
Search for duplicated words (works across lines): C-M-s \(\<\w+\>\)\s-+\1
Count number of words in buffer: M-x how-many RET \< RET
Align words beginning with an uppercase letter followed by a lowercase letter: M-: (setq case-fold-search nil) RET then M-x align-regexp RET \<[[:upper:]][[:lower:]] RET
Replace word foo by bar (won’t replace fool by barl): M-x replace-regexp RET \<foo\> RET bar
Keep only the first two words on each line: M-x replace-regexp RET ^\(\W*\w+\W+\w+\).* RET \1 RET
Suppress lines beginning with ;;: M-x flush-lines RET ^;; RET
Remove the text after the first ; on each line: M-x replace-regexp RET \([^;]*\);.* RET \1 RET
Keep only lines that contain an email address: M-x keep-lines RET \w+\(\.\w+\)?@\(\w\|\.\)+ RET
Keep only one instance of consecutive empty lines: M-x replace-regexp RET ^C-q C-j\{2,\} RET C-q C-j RET
Keep words or letters in uppercase, one per line: M-x replace-regexp RET [^[:upper:]]+ RET C-o RET
List lines beginning with Chapter or Section: M-x occur RET ^\(Chapter\|Section\) RET
List lines with more than 80 characters: M-x occur RET ^.\{81,\} RET
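
Several of the commands in this list can also be called from Lisp. The sketch below mirrors two of the examples (suppressing lines that begin with ;; and deleting trailing spaces) on plain strings; the function names are invented for the illustration, and a temporary buffer is used so your own text is not modified.

```elisp
;; Hypothetical helpers for illustration; not part of Emacs.
(defun my-regexp-demo-strip-comment-lines (text)
  "Return TEXT without the lines that begin with two semicolons."
  (with-temp-buffer
    (insert text)
    ;; Non-interactive counterpart of: M-x flush-lines RET ^;; RET
    (flush-lines "^;;" (point-min) (point-max))
    (buffer-string)))

(defun my-regexp-demo-strip-trailing-spaces (text)
  "Return TEXT with trailing spaces removed from every line."
  ;; Same regexp as SPC+$ in the list above.
  (replace-regexp-in-string " +$" "" text))

(my-regexp-demo-strip-comment-lines ";; a\ncode\n;; b\nmore\n")
;; => "code\nmore\n"
(my-regexp-demo-strip-trailing-spaces "a  \nb \nc")
;; => "a\nb\nc"
```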

Use Icicles to Learn about Regexps

Icicles provides these interactive ways to learn about regexps:

`C-`’ (‘icicle-search’) shows you regexp matches, as does ‘C-M-s’, but it can also show you (that is, highlight) regexp subgroup matches. Showing matched subgroups is very helpful for learning, and Icicles is unique in this. There are two ways that you can use this feature:
- You can search for a regexp, but limit the search context, used for further searching, to a particular subgroup match. For example, you can search for and highlight Lisp argument lists, by using a regexp subgroup that matches lists, placing that subgroup after ‘defun’: (defun [^(]*\(([^)]*)\), that is, defun, followed by non-`(’ character(s), followed by `(’, possibly followed by non-`)’ character(s), followed by `)’.
- You can search for a regexp without limiting the search context to a subgroup match. In this case, Icicles highlights each subgroup match in a different color. An example of a complex regexp whose subgroups are each highlighted this way is (\([-a-z*]+\) *\((\(([-a-z]+ *\([^)]*\))\))\).*

`C-`’ also helps you learn by letting you use two simple regexps (search within a search) as an alternative to coming up with a single, complex regexp to do the same job. And, as with incremental search, you can change the second regexp on the fly to see immediately what difference the change makes. See Icicles - Search Commands, Overview
‘S-TAB’ during minibuffer input shows you all matches for your input string, which can be a regexp. So, just type a regexp whenever the minibuffer is active for completion and hit ‘S-TAB’ to see what the regexp matches. Try this with command input (‘M-x’), buffer switching (‘C-x b’), file visiting (‘C-x C-f’), help (‘C-h f’, ‘C-h v’), and so on. Almost any time you type input in the minibuffer, you can type a regexp and use ‘S-TAB’ to see what it matches (and then choose one of the matching candidates to input, if you want).

More on Icicles

This page and its linked pages describe Icicles, an Emacs library that enhances minibuffer completion, that is, input completion. This page lists the main Icicles features and presents entry points to all of the Icicles doc.

Main Icicles Features

Not a bad summary, by one user:

“In case you never heard of it, Icicles is to ‘TAB’ completion what ‘TAB’ completion is to typing things manually every time.”

Icicles lets you do the following:

cycle through completion candidates that match your current input *
use a pattern to match completion candidates, including:
- regexp matching (including substring) *
- fuzzy matching *
- prefix matching (as in vanilla Emacs)
- command abbreviation matching *
use multiple input patterns (e.g., regexps) to match candidates progressively (intersection), chaining these filters together like piped ‘grep’ commands *
use multiple input patterns at the same time to match multi-part candidates (multi-completions) piecewise — for example, match a container’s name and/or its contained text, in parallel *
see all possible complete inputs (pertinent commands, variables, and so on) that match your partial or regexp input – the list is updated dynamically (incrementally) if you change your input *
see all previous inputs that match your partial or regexp input, and selectively reuse them *
match input against completion candidates that do not match a given regexp; that is, complement the set of matches and use the result for subsequent matching *
use multiple regexps to search (and replace) text across multiple buffers, files, or regions *, +
search areas of text that have a certain text property, such as a face *
browse Imenu or tags entries that match your partial or regexp input *, +
create and use multiple-choice menus; that is, menus where you can choose multiple entries any number of times *
create and use multi-commands – commands that you can use to perform an action on any number of candidate inputs any number of times *
act on multiple inputs in the minibuffer all at once *
perform set operations (intersection, union,…) on the fly, using sets of completion candidates or other strings *
persistently save and later reuse sets of completion candidates (e.g. project file names) *
complete key sequences, and navigate the key-binding hierarchy (this includes the menu bar menu hierarchy) (see also LaCarte) *
sort completion candidates on the fly, in multiple, context-dependent ways *

As you can see, keywords here include match, complete, input, multiple, regexp, cycle, incremental, browse, and sets. You will see these concepts appear over and over in Icicles, with multiple meanings, combinations, and applications. They are the atoms that Icicles combines for chemistry that can help you use Emacs better.
Icicles is very general, and these concepts give it a wide reach. Icicles has lots for Emacs users and lots for EmacsLisp programmers – its application is limited only by your imagination. Have fun!

Obtaining and Installing Icicles

See Icicles - Libraries for how to obtain the Icicles library files. Then:

Put those files in a directory that is in your ‘load-path’.
Load Icicles: ‘M-x load-library RET icicles RET’.
Turn on Icicle mode: ‘M-x icy-mode RET’.

You’re good to go. You can use ‘M-x icy-mode RET’ at any time to turn Icicle mode on and off.
If you want to load Icicles each time you start Emacs, then put code in your init file to set your `load-path' appropriately and load Icicles:

 (add-to-list 'load-path "/my/path/to/icicles/")
 (require 'icicles)

If you also want to turn on Icicle mode each time you start Emacs, then add this line after the others:

 (icy-mode 1)

gyan

Monday, May 27, 2013

Editor:: Emacs basics/advanced