
Correlation and Causation

What are correlation and causation and how are they different?

Two or more variables are considered to be related, in a statistical context, if their values change together: as the value of one variable increases or decreases, so does the value of the other variable (although it may be in the opposite direction).

For example, for the two variables “hours walked” and “weight lost” there is a relationship between the two if an increase in hours walked is associated with an increase in weight lost. If we consider the two variables “hours worked” and “salary”, as the number of hours worked increases, a person’s salary will also increase (assuming a constant hourly wage).

Correlation is a statistical measure that describes the size and direction of a relationship between two or more variables. A correlation between variables, however, does not automatically mean that the change in one variable is the cause of the change in the values of the other variable.

Causation indicates that one event is the result of the occurrence of the other event; i.e. there is a causal relationship between the two events. This is also referred to as cause and effect.

Theoretically, the difference between the two kinds of relationship is easy to determine: an action can cause another (e.g. smoking causes an increased risk of developing lung cancer), or it can correlate with another (e.g. smoking can be correlated with alcoholism, but it does not actually cause alcoholism). In practice, however, it remains difficult to clearly establish cause and effect, compared with establishing correlation.

Why are correlation and causation important?

The objective of much research or scientific analysis is to determine the extent to which one variable relates to another variable. For example:

  • Is there a relationship between a person’s education level and their health?
  • Is ownership of a pet associated with living longer?
  • Did a company’s marketing campaign increase its product sales?

These and other questions explore whether a correlation exists between the two variables, and if there is a correlation then this may guide further research into determining whether one action causes the other.

How is correlation measured?

For two variables, a statistical correlation is measured by the use of a correlation coefficient, represented by the symbol r, which is a single number that indicates the degree of relationship between the two variables.

The coefficient’s numerical value ranges from +1.0 to –1.0, which provides a description of the strength and direction of the relationship: the closer the value is to +1.0 or –1.0, the stronger the relationship.

If the correlation coefficient has a negative value (below 0) it indicates a negative relationship between the variables. This means that the variables move in opposite directions, i.e. as one variable increases the other decreases.

If the correlation coefficient has a positive value (above 0) it indicates a positive relationship between the variables, meaning that both variables move in tandem, i.e. as one variable decreases the other also decreases, or when one variable increases the other also increases.

Where the correlation coefficient is 0 this indicates there is no linear relationship between the variables (one variable can remain constant while the other increases or decreases).
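The three cases above can be sketched in a few lines of Python. This is only a minimal illustration, assuming the usual Pearson form of the correlation coefficient; the helper name pearson_r and all of the numbers are made up for the example and do not come from any real study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data, invented purely for illustration.
hours_walked = [1, 2, 3, 4, 5]
weight_lost = [0.5, 1.1, 1.4, 2.1, 2.4]       # rises as hours walked rise
weight_remaining = [9.5, 8.9, 8.6, 7.9, 7.6]  # falls as hours walked rise
unrelated = [3, 1, 2, 1, 3]                   # no linear trend with hours walked

r_positive = pearson_r(hours_walked, weight_lost)
r_negative = pearson_r(hours_walked, weight_remaining)
r_zero = pearson_r(hours_walked, unrelated)

print(f"positive relationship: r = {r_positive:.3f}")
print(f"negative relationship: r = {r_negative:.3f}")
print(f"no linear relationship: r = {r_zero:.3f}")
```

The first value comes out close to +1, the second close to –1, and the third at 0, matching the three descriptions above.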

While the correlation coefficient is a useful measure, it has its limitations:

Correlation coefficients are usually associated with measuring a linear relationship.

For example, if you compare hours worked and income earned for a tradesperson who charges an hourly rate for their work, there is a linear (or straight line) relationship since with each additional hour worked the income will increase by a consistent amount.

If, however, the tradesperson charges based on an initial call out fee and an hourly fee which progressively decreases the longer the job goes for, the relationship between hours worked and income would be non-linear, where the correlation coefficient may be closer to 0.
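The tradesperson example can be sketched roughly as follows. The hourly rate, call out fee and the rate at which the hourly fee drops are all assumed numbers chosen only for illustration, and pearson_r is a hypothetical helper name.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

hours = list(range(1, 11))  # jobs lasting 1 to 10 hours

# Case 1: flat hourly rate (assumed 50 per hour), so income is a straight line.
income_flat_rate = [50 * h for h in hours]

# Case 2: assumed call out fee of 80, plus an hourly fee that starts at 60
# and drops by 5 for every further hour on the job.
CALL_OUT_FEE = 80
income_declining_rate = []
total = CALL_OUT_FEE
for h in hours:
    total += 60 - 5 * (h - 1)  # fee charged for hour number h
    income_declining_rate.append(total)

r_linear = pearson_r(hours, income_flat_rate)
r_nonlinear = pearson_r(hours, income_declining_rate)

print(f"flat hourly rate:      r = {r_linear:.4f}")
print(f"declining hourly rate: r = {r_nonlinear:.4f}")
```

With these particular numbers r drops below 1 for the declining fee but stays fairly high, because income still rises with every hour. How far r moves toward 0 depends on how strongly the relationship bends away from a straight line.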

Care is needed when interpreting the value of ‘r’. It is possible to find correlations between many variables; however, the relationships can be due to other factors and have nothing to do with the two variables being considered.

For example, sales of ice cream and sales of sunscreen lotion can increase and decrease throughout the year in a systematic pattern, but the relationship can be due to the effects of the season (i.e. hotter weather sees an increase in people wearing sunscreen lotion as well as eating ice cream) rather than due to any direct relationship between sales of sunscreen and ice cream.
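A small simulation can make the ice cream and sunscreen example concrete. The sketch below assumes that temperature is the only thing driving both sales figures, uses invented numbers throughout, and brings in the standard partial correlation formula (a technique not mentioned above) to check what is left once temperature is accounted for.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient r for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

random.seed(0)  # fixed seed so the sketch gives the same result every run

# Simulated daily data (all values are assumptions): temperature drives both
# sales figures, and the two sales figures never influence each other.
temperature = [random.uniform(10, 35) for _ in range(1000)]
ice_cream_sales = [2.0 * t + random.gauss(0, 5) for t in temperature]
sunscreen_sales = [1.5 * t + random.gauss(0, 5) for t in temperature]

r_sales = pearson_r(ice_cream_sales, sunscreen_sales)
r_ice_temp = pearson_r(ice_cream_sales, temperature)
r_sun_temp = pearson_r(sunscreen_sales, temperature)

# Partial correlation: the correlation between the two sales figures after
# removing the linear effect of temperature from both of them.
r_partial = (r_sales - r_ice_temp * r_sun_temp) / math.sqrt(
    (1 - r_ice_temp ** 2) * (1 - r_sun_temp ** 2)
)

print(f"ice cream vs sunscreen:               r = {r_sales:.3f}")
print(f"same, with temperature accounted for: r = {r_partial:.3f}")
```

The raw correlation between the two sales figures is strong, yet it shrinks to roughly zero once temperature is taken into account, which is what we would expect when a third factor is responsible for the pattern.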

The correlation coefficient must not be used to say anything about a cause and effect relationship. By examining the value of ‘r’, we can conclude that two variables are related, but the value of ‘r’ does not indicate whether one variable was the cause of the change in the other.

How can causation be established?

Causality is an area of statistics that is commonly misunderstood and misused by people in the mistaken belief that because the data show a correlation then there is definitely an underlying causal relationship.

The use of a controlled study is the most effective way of establishing causality between two variables. In a controlled study, the sample or population is split into two groups, with both groups being comparable in almost every way. The two groups then receive different treatments, and the outcomes of each group are examined.

Example:

In medical research, one group might be given a placebo while the other group is given a new type of medication. If the two groups have noticeably different outcomes, the different treatments may have caused the different outcomes.
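The placebo example can be simulated as a rough sketch. Everything here is an assumption made for illustration: the outcome scores, the size of the medication's effect (ASSUMED_EFFECT), and the use of random assignment as the way of making the two groups comparable.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch gives the same result every run

ASSUMED_EFFECT = 5.0  # hypothetical improvement caused by the medication

# Randomly split 2000 hypothetical participants into two equal groups, so
# that the groups are comparable on average in everything except treatment.
participants = list(range(2000))
random.shuffle(participants)
placebo_ids = participants[:1000]
treatment_ids = participants[1000:]

def simulated_outcome(treated):
    """Made-up outcome score: a random baseline plus the assumed effect."""
    baseline = random.gauss(50, 10)
    return baseline + (ASSUMED_EFFECT if treated else 0.0)

placebo_outcomes = [simulated_outcome(False) for _ in placebo_ids]
treatment_outcomes = [simulated_outcome(True) for _ in treatment_ids]

difference = statistics.mean(treatment_outcomes) - statistics.mean(placebo_outcomes)
print(f"mean outcome, placebo group:   {statistics.mean(placebo_outcomes):.2f}")
print(f"mean outcome, treatment group: {statistics.mean(treatment_outcomes):.2f}")
print(f"difference between groups:     {difference:.2f}")
```

Because the only systematic difference between the two groups is the treatment, the gap between the group averages lands close to the assumed effect. A real study would also need to check whether a gap of that size could plausibly arise by chance.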

For ethical reasons, there are limits to the use of controlled studies; it would not be appropriate to use two comparable groups and have one of them undergo a harmful activity while the other does not. To overcome this situation, observational studies are often used to investigate correlation and causation for the population of interest. These studies can look at the groups’ behaviors and outcomes and observe any changes over time.

The objective of these studies is to provide statistical information to add to the other sources of information that would be required for the process of establishing whether or not causality exists between the two variables.

The post Co-relation and Casualty appeared first on Prwatech.


