
Regular Expression in Python Part I: Introducing Regex Library

Python is a versatile programming language with a rich set of libraries. In the visualization series we introduced you to different libraries used for data visualization. Now we introduce you to the Regex library in Python (the built-in re module) for handling textual data.

In Python, Regex is a library for pattern matching on textual data. It provides a range of methods that give the desired result when used with the right pattern. For example, if you want to change the spelling of colour to color in your text, you can do so with a single method, provided that you form the pattern correctly.
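As a minimal sketch of the colour-to-color replacement, assuming the standard-library re module and its re.sub function, something like the following should work. The sample sentence is made up for illustration.

```python
import re

# Hypothetical sample text, used only for illustration.
text = "The colour of the sky and the colours of the sea."

# re.sub(pattern, replacement, string) replaces every match of the pattern.
result = re.sub(r"colour", "color", text)

print(result)  # The color of the sky and the colors of the sea.
```

The r prefix marks a raw string, so that Python passes backslashes in a pattern through to the regex engine unchanged. It is not strictly needed here, but it is a common habit when writing patterns.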

Types of textual data in Regex

Literals:- In a regular expression, literals are characters or words that keep their original meaning. For example, the pattern dog matches the literal text dog; there is no hidden meaning behind the word.

Meta-characters:- These are characters or character sequences that hold a special meaning. For example, \n means a new line and \t means a tab character.
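The difference between a literal and a meta-character can be sketched as follows, assuming the standard-library re module. The sample strings are hypothetical.

```python
import re

# A literal pattern matches exactly the characters that are written.
literal_match = re.search(r"dog", "my dog barks")

# \d is a meta-character: it stands for any digit,
# not for a backslash followed by the letter d.
digit_match = re.search(r"\d", "room 7")

print(literal_match.group())  # dog
print(digit_match.group())    # 7
```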

Given below are a few of the meta-characters used in Python with their meanings:-

\d – Matches a single digit, i.e. \d = 1, \d\d = 23, \d\d\d = 345

\w – Matches a single alphanumeric character or underscore, i.e. \w = 1, \w = a, \w\w = a1

\W – Matches a single non-word character, such as a special character or a space, i.e. \W = %

Dog[ogn] – Matches Dog followed by any single character from within the square brackets, i.e. Dogo, Dogg, Dogn

Dog(ogn) – Matches Dog followed by the entire string within the parentheses, i.e. Dogogn

Dog(ogn|aaa) – Matches Dog followed by either ogn or aaa, i.e. Dogogn or Dogaaa

* – Matches 0 or more occurrences of the preceding character, i.e. tre* = tr, tre* = tree, tre* = treeeeee

? – Matches 0 or 1 occurrence of the preceding character, i.e. colou?r = color, colou?r = colour

+ – Matches 1 or more occurrences of the preceding character, i.e. tre+ = tre, tre+ = tree, tre+ = treee, tre+ ≠ tr

. – Matches any single character (alphanumeric or special) except a newline, exactly one time, i.e. tre. = tree, tre. = tre#, tre. = tre1, tre. ≠ tre#1
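One way to try out the meta-characters listed above is to test each pattern against a candidate string. The sketch below assumes the standard-library re module and uses re.fullmatch, which succeeds only when the whole string fits the pattern. The cases list is a hypothetical helper structure for this example.

```python
import re

# (pattern, candidate string, whether the whole string should match)
cases = [
    (r"\d\d\d", "345", True),
    (r"\w\w", "a1", True),
    (r"\W", "%", True),
    (r"Dog[ogn]", "Dogg", True),
    (r"Dog(ogn)", "Dogogn", True),
    (r"Dog(ogn|aaa)", "Dogaaa", True),
    (r"tre*", "tr", True),
    (r"tre*", "treeeeee", True),
    (r"colou?r", "color", True),
    (r"colou?r", "colour", True),
    (r"tre+", "tre", True),
    (r"tre+", "tr", False),
    (r"tre.", "tre#", True),
    (r"tre.", "tre#1", False),
]

results = []
for pattern, candidate, expected in cases:
    # fullmatch returns a match object, or None when the string does not fit.
    matched = re.fullmatch(pattern, candidate) is not None
    results.append(matched)
    print(f"{pattern!r:18} {candidate!r:12} matched={matched} expected={expected}")
```

re.fullmatch is used here instead of re.search because re.search would also report a match when only part of the string fits, which would hide cases such as tre. against tre#1.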

The above meta-characters, alone or in combination, are used to form patterns, which are then used for text mining. For example, tre.* means tre followed by any character repeated 0 or more times, so it matches tre#1 as well as tre.
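The combined pattern tre.* can be checked with a short sketch, again assuming the standard-library re module. The candidate strings and the sample sentence are made up for illustration.

```python
import re

# tre followed by any character, repeated 0 or more times.
pattern = r"tre.*"

whole_string = {}
for candidate in ["tre", "tre#1", "tree", "tr"]:
    whole_string[candidate] = re.fullmatch(pattern, candidate) is not None
    print(candidate, whole_string[candidate])

# Combining tre with \w* pulls out whole words that start with tre.
words = re.findall(r"tre\w*", "a tree, a trek and a tr")
print(words)  # ['tree', 'trek']
```

Only tr fails to match, because the pattern requires the literal characters tre before anything else.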

Watch the video tutorial attached below to learn more about the fundamentals of this library. 

Hopefully you found this discussion of the Regex library helpful and are now familiar with the way it works. To learn more about Python for data analysis, keep exploring the DexLab Analytics blog, where you will always find informative posts.



The post Regular Expression in Python Part I: Introducing Regex Library appeared first on DexLab Analytics | Big Data Hadoop SAS R Analytics Predictive Modeling & Excel VBA.



