Research 1 - Statistics basic notions
Statistics is a branch of mathematics which deals with the analysis, interpretation and presentation of data; it's used in several disciplines such as psychology, business, government, computer science and many more.
When applying statistics, it is convenient to begin by defining the statistical population to be studied.
A statistical population is a set of elements that share some characteristics and are of interest for one or more questions (or experiments). A subset of the population is called a statistical sample, and a dataset is a representation of the data considered during a statistical investigation.
A dataset can be viewed as a table in which every column is labeled with a variable name and each row contains the values those variables take for a particular statistical unit.
For instance, take the students of La Sapienza university as the population, and suppose we would like to analyse their gender, age and height. Below is the dataset gathered after a sample of the students filled in a questionnaire.
Gender | Age | Height (cm) |
---|---|---|
Male | 21 | 178 |
Male | 24 | 185 |
Female | 21 | 175 |
Male | 25 | 170 |
Male | 23 | 190 |
Female | 29 | 167 |
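As a rough sketch, the table above could be represented in Python as a list of rows, one per statistical unit. The key names (`gender`, `age`, `height_cm`) and the helper `column` are made up for this illustration and are not part of any library.

```python
# Each dictionary is one statistical unit (a student);
# each key is a variable of the dataset.
dataset = [
    {"gender": "Male", "age": 21, "height_cm": 178},
    {"gender": "Male", "age": 24, "height_cm": 185},
    {"gender": "Female", "age": 21, "height_cm": 175},
    {"gender": "Male", "age": 25, "height_cm": 170},
    {"gender": "Male", "age": 23, "height_cm": 190},
    {"gender": "Female", "age": 29, "height_cm": 167},
]


def column(rows, variable):
    """Return every value of one variable, in row order (hypothetical helper)."""
    return [row[variable] for row in rows]


ages = column(dataset, "age")
print(len(dataset), "units in the sample")
print("ages:", ages)
```

Reading a column gives all the observed values of one variable, while reading a row gives everything recorded about one student.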
As can be seen above, variables can be measured on different scales. In statistics, the most commonly used scales of measurement are the nominal, ordinal, interval and ratio scales. In the example dataset, the variable gender is measured at the nominal level: it tells us something about an individual (e.g. male or female), but it doesn't make sense to ask whether one individual is more "male/female" than another. The variables age and height, on the other hand, are both measured on a ratio scale: their values can be compared for equality or order, and differences and ratios between them are meaningful. In other words, we can answer questions like "who's the taller of two students?" or "who's the tallest student?", and we can compute the median, the average, and so on.
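To make the distinction concrete, here is a small sketch using Python's standard `statistics` module on the values from the example dataset. The variable names are chosen for this illustration only. For the nominal variable only counts and the mode are computed, while for the ratio variables order-based and arithmetic summaries are also meaningful.

```python
import statistics
from collections import Counter

# Values copied from the example dataset, one entry per student.
genders = ["Male", "Male", "Female", "Male", "Male", "Female"]
ages = [21, 24, 21, 25, 23, 29]
heights_cm = [178, 185, 175, 170, 190, 167]

# Nominal scale: categories can only be counted and compared for equality.
gender_counts = Counter(genders)
most_common_gender = statistics.mode(genders)

# Ratio scale: ordering, differences and ratios are all meaningful.
median_age = statistics.median(ages)
mean_height = statistics.mean(heights_cm)
tallest = max(heights_cm)
tallest_to_shortest = tallest / min(heights_cm)

print("gender counts:", dict(gender_counts))
print("most common gender:", most_common_gender)
print("median age:", median_age)
print("mean height (cm):", mean_height)
print("tallest (cm):", tallest)
print("tallest / shortest:", round(tallest_to_shortest, 3))
```

Asking for the mean of `genders` would raise an error, which mirrors the point that averages are not defined on a nominal scale.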
The main difference between a ratio variable and an interval variable is the presence of a so-called "true zero". For instance, temperature expressed in Fahrenheit or Celsius is not a ratio variable, because a temperature of 0.0 on either of those scales does not mean 'no heat'. Temperature in Kelvin, however, is a ratio variable, as 0.0 Kelvin really does mean 'no heat'.
A classic example of an interval scale, then, is temperature measured in Celsius, whereas typical examples of ratio scales are any that possess an absolute zero (e.g. age, weight, height, income earned and so on).
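The effect of a missing true zero can be checked numerically. The sketch below takes two arbitrary example temperatures (10 °C and 20 °C, not from any dataset) and uses the standard conversion formulas. The function names are made up for this illustration.

```python
def celsius_to_fahrenheit(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def celsius_to_kelvin(celsius):
    """Convert degrees Celsius to kelvins."""
    return celsius + 273.15


# Two arbitrary example temperatures, in degrees Celsius.
cold_c, warm_c = 10.0, 20.0

# The "ratio" of the same two temperatures depends on the unit chosen,
# so it carries no physical meaning on the Celsius or Fahrenheit scales.
ratio_celsius = warm_c / cold_c
ratio_fahrenheit = celsius_to_fahrenheit(warm_c) / celsius_to_fahrenheit(cold_c)

# Only on the Kelvin scale, where zero is a true zero, is the ratio meaningful.
ratio_kelvin = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cold_c)

# Differences, by contrast, are consistent: 10 degrees Celsius apart is 10 K apart.
diff_celsius = warm_c - cold_c
diff_kelvin = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cold_c)

print("ratio in Celsius:   ", ratio_celsius)
print("ratio in Fahrenheit:", round(ratio_fahrenheit, 3))
print("ratio in Kelvin:    ", round(ratio_kelvin, 3))
print("difference in Celsius / Kelvin:", diff_celsius, round(diff_kelvin, 2))
```

The three ratios disagree, so saying that 20 °C is "twice as hot" as 10 °C is not justified, while the difference between the two readings is the same in Celsius and in Kelvin.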