Exploratory Data Analysis: Data Visualization

Posted on Oct 8

In this article, we’ll use data visualization to explore a dataset from StreetEasy that contains information about housing rentals in New York City.

Exploratory data analysis (EDA) is the process of describing data with statistical and visualization techniques in order to bring its important aspects into focus for further analysis.

Univariate analysis

Univariate analysis focuses on a single variable at a time. Univariate data visualizations can help us answer questions like: What is the typical price of a rental in New York City? What proportion of NYC rentals have a gym?

Depending on whether the variable we want to visualize is quantitative or categorical, we need to use slightly different visualizations.

Quantitative variables

Box plots (or violin plots) and histograms are common choices for visually summarizing a quantitative variable. These plots are useful because they simultaneously communicate information about the minimum and maximum values, the central location, and the spread. Histograms can additionally reveal patterns that can affect an analysis (e.g., skew or multimodality).

For example, suppose we are interested in learning more about the price of apartments in NYC. A good starting place is a box plot of the rent variable. From the box plot, we can see that most rental prices fall between about $2,500 and $5,000; however, there are many outliers, particularly on the high end. For more detail, we can also plot a histogram of the rent variable. The histogram highlights the long right tail of rental prices, and increasing the number of bins gives a more detailed look at this distribution.

Categorical variables

For categorical variables, we can use a bar plot (instead of a histogram) to quickly visualize the frequency (or proportion) of values in each category. For example, suppose we want to know how many apartments are available in each borough.
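The univariate plots in this section (a box plot and a histogram of rent, plus a bar plot of listings per borough) can be sketched with pandas and Matplotlib. This is a minimal sketch, not the author's original code: the DataFrame holds a few invented toy rows rather than the StreetEasy data, the `borough` column name is an assumption about the dataset's schema, and the outlier fences use the conventional 1.5 * IQR whisker rule, which is also Matplotlib's `boxplot` default.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; omit in a notebook or desktop session
import matplotlib.pyplot as plt
import pandas as pd

# Invented toy rows standing in for the StreetEasy data (NOT real listings).
# With the real file you would load it instead, e.g. pd.read_csv(<path to CSV>).
rentals = pd.DataFrame({
    "rent": [2300, 2750, 3100, 3400, 3600, 3950, 4200, 4800, 5500, 12000],
    "borough": ["Brooklyn", "Queens", "Brooklyn", "Manhattan", "Queens",
                "Manhattan", "Brooklyn", "Manhattan", "Manhattan", "Manhattan"],
})

# Box plot whiskers conventionally end at 1.5 * IQR beyond the quartiles;
# points past those fences are drawn individually as outliers.
q1, q3 = rentals["rent"].quantile([0.25, 0.75])
iqr = q3 - q1
lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
is_outlier = (rentals["rent"] < lower_fence) | (rentals["rent"] > upper_fence)
outliers = rentals.loc[is_outlier, "rent"]

fig, (ax_box, ax_hist, ax_bar) = plt.subplots(1, 3, figsize=(12, 4))

# Quantitative variable: box plot of rent (default whis=1.5 matches the fences above).
ax_box.boxplot(rentals["rent"].to_numpy())
ax_box.set_title("Box plot of rent")

# Quantitative variable: histogram of rent; raise `bins` for a finer view of the tail.
ax_hist.hist(rentals["rent"], bins=20)
ax_hist.set_title("Histogram of rent")

# Categorical variable: bar plot of listing counts per borough.
borough_counts = rentals["borough"].value_counts()
ax_bar.bar(borough_counts.index, borough_counts.values)
ax_bar.set_title("Listings per borough")

fig.tight_layout()
# Use plt.show() with an interactive backend, or fig.savefig(...), to view the figure.
```

On these toy rows, only the 12000 rent lands beyond the upper fence, so it is the single point the box plot draws as an outlier. With the real data, replace the toy DataFrame with the loaded CSV.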
A bar plot of the number of listings in each borough represents that information visually.

Bivariate analysis

In many cases, a data analyst is interested in the relationship between two variables in a dataset. For example: Do apartments in different boroughs tend to cost different amounts? What is the relationship between the area of an apartment and how much it costs?

Depending on the types of variables we are interested in, we need to rely on different kinds of visualizations.

One quantitative variable and one categorical variable

Two good options for investigating the relationship between a quantitative variable and a categorical variable are side-by-side box plots and overlapping histograms.

For example, suppose we want to understand whether apartments in different boroughs cost different amounts. We could address this question by plotting side-by-side box plots of rent by borough. The resulting plot indicates that rental prices in Manhattan tend to be higher and more variable than rental prices in other boroughs. We could also investigate the same question in more detail by looking at overlapping histograms of rental prices by borough.

Two quantitative variables

A scatter plot is a great option for investigating the relationship between two quantitative variables. For example, to explore the relationship between rent and size_sqft, we could create a scatter plot of these two variables. The plot indicates a strong positive linear relationship between the cost to rent a property and its square footage: larger properties tend to cost more.

Two categorical variables

Side-by-side (or stacked) bar plots are useful for visualizing the relationship between two categorical variables. For example, suppose we want to know whether rentals that have an elevator are more likely to have a gym.
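The bivariate plots in this section can be sketched the same way. As before, this is a hedged sketch with invented rows, not the author's code: `rent` and `size_sqft` are the column names used in the article, while `borough`, `has_elevator`, and `has_gym` (0/1 flags) are assumptions about the schema. Pearson's r is added only as a numeric companion to the scatter plot.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; omit in a notebook or desktop session
import matplotlib.pyplot as plt
import pandas as pd

# Invented toy rows (NOT real StreetEasy listings). `borough`, `has_elevator`,
# and `has_gym` (0/1 flags) are assumed column names.
rentals = pd.DataFrame({
    "rent": [2300, 2750, 3100, 3400, 3600, 3950, 4200, 4800, 5500, 9000],
    "size_sqft": [550, 600, 700, 650, 800, 750, 900, 850, 1000, 1500],
    "borough": ["Brooklyn", "Queens", "Brooklyn", "Manhattan", "Queens",
                "Manhattan", "Brooklyn", "Manhattan", "Manhattan", "Manhattan"],
    "has_elevator": [0, 0, 1, 1, 0, 1, 0, 1, 1, 1],
    "has_gym": [0, 0, 0, 1, 0, 0, 0, 1, 1, 0],
})
boroughs = sorted(rentals["borough"].unique())
rent_by_borough = [rentals.loc[rentals["borough"] == b, "rent"].to_numpy() for b in boroughs]

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

# Quantitative vs. categorical: side-by-side box plots of rent by borough.
axes[0, 0].boxplot(rent_by_borough)
axes[0, 0].set_xticklabels(boroughs)
axes[0, 0].set_title("Rent by borough (box plots)")

# Quantitative vs. categorical: overlapping, semi-transparent histograms.
for borough, rents in zip(boroughs, rent_by_borough):
    axes[0, 1].hist(rents, bins=10, alpha=0.5, label=borough)
axes[0, 1].legend()
axes[0, 1].set_title("Rent by borough (histograms)")

# Two quantitative variables: scatter plot, plus Pearson's r as a numeric check.
axes[1, 0].scatter(rentals["size_sqft"], rentals["rent"])
axes[1, 0].set_xlabel("size_sqft")
axes[1, 0].set_ylabel("rent")
r = rentals["rent"].corr(rentals["size_sqft"])

# Two categorical variables: count each (has_elevator, has_gym) pair, then draw
# one group of bars per has_elevator value with one bar per has_gym value.
counts = pd.crosstab(rentals["has_elevator"], rentals["has_gym"])
counts.plot.bar(ax=axes[1, 1], rot=0)
axes[1, 1].set_title("Gym availability by elevator")

fig.tight_layout()
```

Because `pd.crosstab` counts every combination of the two flags, the same table can be drawn as a stacked bar plot instead by passing `stacked=True` to `plot.bar`.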
A side-by-side bar plot of gym availability, grouped by whether the building has an elevator, answers this question. It shows that buildings with elevators are about equally likely to have or not have a gym, whereas apartments without elevators are very unlikely to have a gym.

Multivariate analysis

Sometimes a data analyst wants to explore the relationships among three or more variables simultaneously in a single visualization. Many of the visualization methods presented so far can include additional variables through visual cues such as color, shape, and pattern. For example, we can investigate the relationship between rental price, square footage, and borough by using color to introduce borough as a third variable. Another common visualization for multivariate analysis is a heat map of the correlation matrix of all quantitative variables.

Conclusion

In this article, I’ve summarized some of the important considerations for choosing a data visualization based on the question a data analyst wants to answer and the type of data available.
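The two multivariate plots described earlier can be sketched in the same style. This sketch assumes the color-coded plot is a scatter plot of rent against size_sqft (the article does not show its code), uses invented toy rows, and treats `borough` and `bedrooms` as hypothetical column names chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; omit in a notebook or desktop session
import matplotlib.pyplot as plt
import pandas as pd

# Invented toy rows (NOT real StreetEasy listings); `borough` and `bedrooms`
# are hypothetical column names used only for illustration.
rentals = pd.DataFrame({
    "rent": [2300, 2750, 3100, 3400, 3600, 3950, 4200, 4800, 5500, 9000],
    "size_sqft": [550, 600, 700, 650, 800, 750, 900, 850, 1000, 1500],
    "bedrooms": [0, 1, 1, 1, 2, 1, 2, 2, 2, 3],
    "borough": ["Brooklyn", "Queens", "Brooklyn", "Manhattan", "Queens",
                "Manhattan", "Brooklyn", "Manhattan", "Manhattan", "Manhattan"],
})

fig, (ax_scatter, ax_heat) = plt.subplots(1, 2, figsize=(11, 4))

# Three variables in one plot: x = size_sqft, y = rent, color = borough.
for borough, group in rentals.groupby("borough"):
    ax_scatter.scatter(group["size_sqft"], group["rent"], label=borough)
ax_scatter.set_xlabel("size_sqft")
ax_scatter.set_ylabel("rent")
ax_scatter.legend(title="borough")

# Correlation matrix of the numeric columns, drawn as a heat map.
corr = rentals.select_dtypes(include="number").corr()
image = ax_heat.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
ticks = range(len(corr.columns))
ax_heat.set_xticks(ticks)
ax_heat.set_xticklabels(corr.columns, rotation=45, ha="right")
ax_heat.set_yticks(ticks)
ax_heat.set_yticklabels(corr.columns)
fig.colorbar(image, ax=ax_heat, label="Pearson r")

fig.tight_layout()
```

`select_dtypes` keeps only numeric columns, so the text-valued `borough` column is excluded from the correlation matrix automatically.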

This post first appeared on VedVyas Articles; please read the original post there.