InfluentialPoints.com
Biology, images, analysis, design...
"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)

# Beginners statistics introduction

## What is a statistic?


### Example, with R

Say you have been collecting data - for example the colour - on a number of items.

You could just list the colour of each one like this.

greenishyellow   yellowishgreen   blue   puregreen   bluegreen   blue

Alternatively you could summarize the data in various ways.

For instance:

1. that list has 6 items
2. there are 5 different colours
3. the most common single colour is blue
4. most of that list are shades of green - or greenish
5. if you select an item at random there is a 3/6 chance it will be a shade of green
6. you could also say those colours range from greenishyellow to blue - but that assumes you rank them in spectral order, not in order of brightness, or 'purity', or emotional appeal.

With so few data it is easy to produce such summaries by hand. But if you were summarizing very many results you might prefer a bit of help. Some of those summaries can be obtained with R.
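As a minimal sketch, here is one way the first few summaries in the list above might be obtained in base R. The variable names are our own, and matching on the text "green" is only an assumption about how an item is judged to be a shade of green; it is not necessarily how the figures above were reached.

```r
# The six observations listed above, stored as a character vector
colours <- c("greenishyellow", "yellowishgreen", "blue",
             "puregreen", "bluegreen", "blue")

length(colours)            # how many items are in the list
length(unique(colours))    # how many different colours there are

# Count how often each colour occurs, then pick the most common one
counts <- table(colours)
counts
names(counts)[which.max(counts)]

# Proportion of items whose name contains "green" anywhere
# (assumption: this treats 'greenishyellow' as greenish)
mean(grepl("green", colours))

# Proportion of items whose name ends in "green"
# (assumption: only these are treated as shades of green)
mean(grepl("green$", colours))
```

Notice that the last two lines give different answers to what sounds like the same question. The computer does not decide what counts as 'green'; you do.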

### Definition and Use

The term 'statistic' can refer to several rather different things.
1. A statistic summarizes or represents a set of information - most commonly as a single number. The term statistic is used both for the value and for the mathematical function used to obtain that value.
Many functions are available to summarize information. For example, a salesman could equally truthfully quote the typical cost ('on average...'), the maximum ('up to...') or the minimum ('from just \$ 300'). The 'average', 'maximum' and 'minimum' are all statistics.

Note, summary statistics of a sample are often used as estimates for the population at large - for instance when you are told 'the average man has 1.8 children' that result was found in a sample of men - it is usually impossible to check every man.

2. The term is also used in the plural, 'statistics', to describe the study of the collection, organization, analysis, interpretation and presentation of data.

### Tips and Notes

Whilst simple numerical measures are a useful way to summarize data:
• A single statistic, such as an average, is often a simplistic and misleading way to represent the information.
• Since a picture can provide more information than a thousand words, and because it can be much easier to assess images than numbers, graphs are often a much more powerful way to present and explore data - assuming you and your audience can interpret them.

Since there are innumerable ways to summarize any set of information, and assuming no mistakes are made in making that summary, you should always ask yourself:

1. Which is the most appropriate way to summarize the information at hand - and who decides what is appropriate, and how impartial is that decision?
2. What information is being summarized, and how was it obtained - is the information detailed, consistent, plentiful, and does it comprise all the items of interest, or is it assumed to represent a larger set?

Governments and corporations have particular ideas of what summaries are appropriate, and may select their information and summary measures so as to achieve particular outcomes. Hence 'statistics' are commonly seen as 'lies using numbers'.

Nevertheless, since statistics are used for all sorts of important things, and because we all use statistics (consciously or otherwise) it is wise to understand something of their properties - and we do not mean you merely need to know how to calculate them, or to memorize the results of those calculations!

It is easy to get a computer to calculate a statistic; the hard part is knowing whether the result means anything - and how it may be misleading.

### Test yourself

Consider this set of children's test-scores:
11, 35, 36, 37, 38, 99, 104, 105, 417
• there are 9 scores
• every score is equally common
• the average (arithmetic mean) score is 98
• they range from 11 to 417
• they fall into 2 groups, plus two extreme values
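The summaries above can be checked with a few lines of base R. This is only a sketch: the variable name is our own, and the median and the gaps between sorted scores are extra summaries, added for comparison, that do not appear in the list above.

```r
# The nine test-scores given above
scores <- c(11, 35, 36, 37, 38, 99, 104, 105, 417)

length(scores)   # how many scores there are
table(scores)    # how often each score occurs
mean(scores)     # the arithmetic mean
range(scores)    # the smallest and largest score

# Extra summaries, not in the list above:
median(scores)         # the mid-ranking score
diff(sort(scores))     # gaps between neighbouring scores; large gaps separate the groups
```

Compare the mean with the median, and ask which (if either) you would call a 'typical' score.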

Notice that:

1. Which, if any, of those summaries is appropriate depends upon how the summary is to be used.
2. How those few values were obtained is unstated - are they trustworthy, or supposed to be representative?

Consider this summary of salaries of a small company:

• 1 member of staff received \$ 32708400 that year
• 1104 members of staff received \$ 400 that year

The company said 'since their average salary was \$ 30000 our staff receive far above the industry average of \$ 15000 per year'.

Do you think this is an appropriate way to summarize their data?

Would the mid-ranking salary be a better measure of what their average member of staff gets?

It is easy to work out the numbers with R.
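For instance, assuming the salaries are exactly as summarized above, a minimal sketch in base R (the variable name is our own) might be:

```r
# One salary of 32708400 plus 1104 salaries of 400, as described above
salaries <- c(32708400, rep(400, 1104))

length(salaries)   # total number of staff
mean(salaries)     # the arithmetic mean, which the company quoted
median(salaries)   # the mid-ranking salary
```

Both results are calculated correctly from the same data. Which of them better describes what a member of staff receives is for you to judge.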

Note, on most pages we provide the R-code, and a few comments/notes, and expect you to ask appropriate questions and reach your own conclusions. The aim of these pages is to promote thought, not simply to impart information.

### Useful references

Wikipedia: Statistic.

Wikipedia: Statistics.