Get Even More Visitors To Your Blog, Upgrade To A Business Listing >>

Introducing Automation: Learn to Automate Data Preparation with Python Libraries

In this blog we are discussing automation, a function for automating data preparation using a mix of Python libraries. So let’s start.

Problem statement

You are given a table containing data. The first Row contains column headers and all the other rows contain the data. Some of the rows are defective, a row is defective if it contains at least one cell with a NULL supposed to delete value written in it. You are supposed to delete all the of the defective rows.

In the sample below, the second row is defective, it contains a NULL in the salary column. The first row is never defective. Every cell contains a single word and each word may contain only digit (0-9), lowercase and upper case English letters (a-z, A-Z). for example:

For instance in the above example after removing the defective row the table looks like this:

You cannot change the order of rows. The number of rows and columns may differ in different test case.

The table is given in CSV format. Every two consecutive cells in each row are separated by a single comma ‘,’ symbol and every two consecutive rows are separated by a new-line ‘\n’ symbol. For example, the first table from the task statement, written in CSV format is a single String ‘S. No.,Name,Salary\n1,Niharika,50000\n2,Vivek,NULL\n3,Niraj,55000’ . You may assume that each row has same number of cells.

Write a function:

def Solution(s)

Which, given a string S of length N, returns the table without the defective rows in a CSV format.

Given S=‘S. No.,Name,Salary\n1,Niharika,50000\n2,Vivek,NULL\n3,Niraj,55000’

The table with data from string S looks as follows:

After removing the defective rows the table should look like this:

You can try a number of strings to cross-validate the function you have created.

Let’s begin.

  • First we will store the string in a variable s
  • Now we will start by declaring the function name and importing all the necessary libraries.
  • Creating a pattern to separate the string from ‘\n’ .
  • Creating a loop to create multiple lists within a list.

In the above code the list is converted to an array and then used to create a dataframe and stored as csv file in the default working directory.

  • Now we need to split the string to create multiple columns.

The above code creates a dataframe with multiple columns.

Now after dropping the rows with NaN values data looks like

To reset the index we can now use .reset_index() method.

  • Now the problem with the above dataframe created is that the NULL values are in string format, so first we need to convert them into NaN values and then only we will be able to drop them. For that we will be using the following code.

Now we will be able to drop the NaN values easily by using .dropna() method.

In the above code we first dropped the NaN values  then we used the first row of the data set to create column names and then dropped the original row. We also made the first column as index.


Hence we have managed to create a function that can give us the above data. Once created this function can be used to convert a string into dataframe with similar pattern.

Hopefully, you found the discussion informative enough. For further clarification watch the video attached below the blog. To access more informative blogs on Data science using python training related topics, keep on following the Dexlab Analytics blog.


.

The post Introducing Automation: Learn to Automate Data Preparation with Python Libraries appeared first on DexLab Analytics | Big Data Hadoop SAS R Analytics Predictive Modeling & Excel VBA.



This post first appeared on Discover The Best Industries To Have A Career In Data Science, please read the originial post: here

Share the post

Introducing Automation: Learn to Automate Data Preparation with Python Libraries

×

Subscribe to Discover The Best Industries To Have A Career In Data Science

Get updates delivered right to your inbox!

Thank you for your subscription

×