TimeWeb

 



Explanation

A guide to what is meant by some common data-related terms:


Variables

Data are constantly being collected about much of what goes on. The death rate, TV viewing, wages or whatever are called "variables". This is because data on all of these things can and usually do vary from one time period to another, or from one individual to another, or from one country to another.

A variable cannot exist on its own; it is always related in some way to other variables. The variable "PC use by Higher Education students" is related to resources at universities, access to PCs in different faculties, home access to PCs, skills and so on.

Number measurements of characteristics like height, weight, amount of time and sum of money are known as quantitative variables, as they measure quantities. When they are processed, they produce parameters which can be used to measure and predict behaviour.

Variables are either dependent or independent. Dependent variables are affected by changes in the independent variable; independent variables are not affected by changes in the dependent variable. An analysis of a person's income growth over the course of their career, for example, would involve the independent variable time and the dependent variable income.

Quantitative or numerical data are produced when counts or measurements are the result of observations. The data are said to be discrete if the measurements are integers or whole numbers (eg number of people in a household, number of cigarettes smoked per day) and continuous if the measurements can take on any value, usually within some range (eg weight).

Aesthetic measures, such as appearance, taste and texture, are known as qualitative variables because they look at qualities, not quantities. They do not provide us with parameters. These non-parametric measures are used increasingly in market research when, for instance, a new product or variety is being launched.

Qualitative data are classified as nominal if there is no natural order between the categories (eg eye colour), or ordinal if an ordering exists (eg exam results, socio-economic status). We can denote data values by x1, x2, ..., xn, where n is the total number of values.


Tables

Statistics often appear in their raw form as tables of data. Tables can carry a great deal of information. However, they can make relationships between the variables covered in the table hard to notice. Look at the example provided below and try to spot relationships or trends.

Date Percentage change
1985 Jan 5
1985 Jun 7
1986 Jan 5.5
1986 Jun 2.5
1987 Jan 3.9
1987 Jun 4.2
1988 Jan 3.3
1988 Jun 4.6
1989 Jan 7.5
1989 Jun 8.3
1990 Jan 7.7
1990 Jun 9.8
1991 Jan 9
1991 Jun 5.8
1992 Jan 4.1
1992 Jun 3.9
1993 Jan 1.7
1993 Jun 1.2
1994 Jan 2.5
1994 Jun 2.6
1995 Jan 3.3
1995 Jun 3.5
1996 Jan 2.9
1996 Jun 2.1
1997 Jan 2.8
1997 Jun 2.9
1998 Jan 3.3
1998 Jun 3.7
1999 Jan 2.4
1999 Jun 1.3

Difficult, isn't it?


Graphs and Charts

More detailed guides to the use of graphs are contained in the "picturing" section of "Buffing".

Graphs are constructed from tables of data. They are drawn to illustrate the relationship between two or more variables.

Axes

Suppose that PC use by HE students is measured on an annual basis and the results are plotted on a graph. Values for the variable "PC use by HE students" are shown by a scale on the vertical axis. Values for the variable "Time" are shown by the scale on the horizontal axis. Where the two axes intersect is known as the origin.

"PC use by HE students" is called the dependent variable, because the values for it are dependent upon when the data was recorded. "Time" is called the independent variable. When drawing graphs you must remember that the dependent variable is always shown on the vertical (y) axis and the independent variable always on the horizontal (x) axis.

Line Graphs plot the relationship between two or more variables by using connected data points. Line graphs are most often used where there is time series data. They are appropriate where the data is continuous. This means that you have statistics over an unbroken period of time, such as monthly, quarterly or annual data.

Here is a graph of the table shown above:

[Graph of example data]

Source: National Statistics

Better, don't you think?

Bar Charts depict data as a series of rectangles, with the height of each rectangle showing the level of the variable. They are appropriate where cross-sectional data is being used.

Histograms appear similar to bar charts. But whereas a bar chart shows the absolute amount in each category on the Y axis (vertical), a histogram marks the frequency of events.

Pie Charts enable you to show the component parts of an item. It may be helpful to imagine a pie that has been divided up into slices.

Z Charts are one-year time graphs comprising three sets of data: monthly sales or production levels, cumulative totals and moving annual totals.

There is an illustration of Z Charts available in the 'Illustration' section.

There is also a worksheet available in the 'Buffing' section to help you practise choosing suitable graphs and charts.


Types of Data Series

Cross section

Cross section data series give information on the different categories within a particular item at a certain time. The item could be anything that can be looked at in terms of its different categories, such as the age profile of students within Further Education colleges in the UK (14 - 17, 18 - 19, 20 - 24 and 25 and over).

Time Series

Time series data looks at the movement over time of a particular variable. Data within a time series can be "raw" (it is presented as it was recorded), or "seasonally adjusted" (peaks and troughs in the figures are smoothed out so that underlying trends can be identified). Sometimes you will need to show the seasonal pattern and will use raw data; at other times the trend over a number of years needs to be shown, and seasonally adjusted data should be used.

Definitional

Definitional series show the "breakdown" of some aggregate variable into its component parts; these constituent parts should add up to the aggregate total. Data on the Balance of Payments on current account for a particular country in a particular year are a good example of a definitional series: the current account is divided into trade in goods (visible trade) and trade in services (invisible trade).

Multiple series

When you have two or more inter-related variables that need to be presented together, it is possible to plot these on just one graph or in the form of a scattergram. Be careful that the scale on the axes is the same for the two, or more, variables analysed. Try to think of reasons why the variables are inter-related. In the example of the link between disposable income and consumption, for instance, it is to be expected that the higher your income level, the greater your consumption of goods and services.


Percentages

What do percentages do?

Data may be expressed as raw numbers or as percentages. It is often simpler to interpret the percentage than the raw value. If the point of the data is to show a trend or to illustrate the differences between variables, then percentages (meaning literally "a value out of a hundred") are a good way to get the main point over to your reader.

If three-quarters of all new-born children in the world are born to families in less-developed countries (LDCs), we could represent this comparison as follows:

A ratio = 3 : 4 (three births in every four)
A fraction = ¾
A decimal = 0.75
A percentage = 75%

The percentage value in this example means that out of every one hundred children born, seventy-five are born in an LDC.

Percentages in cross-sectional data:

With cross-sectional data, percentages are calculated on the total to show the proportion in each category. The percentage in each category is found by taking the number in that category, dividing by the total and multiplying by one hundred:

Percentage = (Number in category / Total in all categories) x 100
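As a rough sketch of this calculation, the Python snippet below turns counts for the four student age bands mentioned earlier into percentages. The counts are invented purely for illustration and do not come from any real survey.

```python
# Hypothetical counts of Further Education students by age band
# (illustrative numbers only, not real data).
counts = {"14-17": 120, "18-19": 300, "20-24": 180, "25 and over": 600}

total = sum(counts.values())

# Percentage = (number in category / total in all categories) x 100
percentages = {band: n * 100 / total for band, n in counts.items()}

for band, pct in percentages.items():
    print(f"{band}: {pct:.1f}%")
```

Because every category is divided by the same total, the percentages should always add up to 100, which is a useful check on the arithmetic.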

There is an illustration of cross-section material available in the 'Illustration' section


Ratios

Ratios tend to be used to show the relative sizes of any number of parts of a total.

If there are £10 million worth of European grants available and the Netherlands receives £2 million of them, the ratio of grants received to those still available is 2:8.

Ratios are cancellable just like fractions, so 2:8 is the same as 1:4.

If there are 14 airport runway slots available at Schiphol Airport and Lufthansa has 2 of them, BA has 4 and KLM has 8, then the ratio of the number of slots taken by Lufthansa to the number taken by BA to the number taken by KLM is 2:4:8, which is the same as 1:2:4.

The ratio 1:2:4 can be seen as comprising 7 parts (1+2+4) so in the example above, Lufthansa has 1/7th of the runway slots, BA has 2/7ths and KLM has 4/7ths.
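The cancelling and the "parts of a total" reasoning above can be sketched in Python as follows. The function names are made up for this example; the figures are the runway slots and grants from the text.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def simplify_ratio(parts):
    """Cancel a ratio down by dividing every part by their common factor."""
    common = reduce(gcd, parts)
    return [p // common for p in parts]

def shares(parts):
    """Express each part as a fraction of the total of all parts."""
    total = sum(parts)
    return [Fraction(p, total) for p in parts]

slots = [2, 4, 8]              # Lufthansa, BA, KLM
print(simplify_ratio(slots))   # [1, 2, 4]
print(simplify_ratio([2, 8]))  # [1, 4]
print(shares(slots))           # the shares 1/7, 2/7 and 4/7
```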

There is an illustration of ratios available in the 'Illustration' section


Exponents

Mathematical calculations often contain powers of numbers, such as squares or cubes, and roots of numbers, such as square roots or cube roots. The rules for doing calculations involving exponents are the same for all numbers whether positive, negative or decimal.

Exponents usually involve using letters to represent numbers. In the illustrations that follow the letters a, m and n are used. These can of course be any numbers, but in this case we are calling the number 4 "m" and the number 2 "n". When letters are used instead of numbers it is usual either to omit the multiplication sign "x" or to replace it with a ".".

a^m means "a to the power m", or in other words m copies of a multiplied together. For example, 3^4 means 3 x 3 x 3 x 3 = 81. "m" is called the "exponent" or "index".

There is an illustration of exponents available in the 'Illustration' section


Sequence of Operations

When doing mathematical calculations it is vital that you carry out multiplications, divisions, additions and subtractions in the correct order.

If asked to calculate 10 + 4 x 6, you may suppose that the correct answer is 84. The correct answer is in fact 34.

This is because the operation of the multiplication must be carried out first, before the addition part of the calculation.

For your first answer of 84 to be correct, the sum must be written as follows:
(10 + 4) x 6 = 14 x 6 = 84.

When a sum requires you to calculate powers, this part of the sum must be calculated first, before carrying out multiplication or division tasks, as is shown below:

4 x 6^2 = 4 x 36 = 144

As before, though, operations within brackets must be done first:

(4 x 6)^2 = 24^2 = 576

A useful way of remembering the correct sequence of mathematical operations is:

BEDMAS, or:

Brackets first
Exponents next
Division and
Multiplication next, then
Addition and
Subtraction
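Most programming languages follow the same sequence, so the four calculations from this section can be checked with a short Python sketch. The variable names are only labels chosen for this example.

```python
# Multiplication is carried out before addition.
no_brackets = 10 + 4 * 6        # 10 + 24
with_brackets = (10 + 4) * 6    # 14 * 6

# Powers (written ** in Python) are evaluated before multiplication.
power_first = 4 * 6 ** 2        # 4 * 36
bracket_first = (4 * 6) ** 2    # 24 ** 2

print(no_brackets, with_brackets, power_first, bracket_first)  # 34 84 144 576
```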

There is an illustration of the Sequence Rule available in the 'Illustration' section


Summations

Handling a whole table of statistics can be daunting. The process of summation involves performing calculations from columns of figures.


Approximations and Errors

As you move around sets of data you'll notice that many numbers are not entirely accurate but are approximations. Because of this many statistics are subject to error. But fear not! This does not mean that the data has been calculated wrongly, only that fully precise numbers are either unnecessary, impractical or impossible to obtain. This section explores the ways in which approximations are made and how resulting errors are presented and manipulated.

Imagine we are considering the value at the accounting year end of the assets of a sole trader business operating from home. The value placed on these is £25 000, to the nearest thousand pounds. This figure can be said to be accurate to two "significant figures" because only the first two figures are known. The zeros at the end of the number are only there to indicate the position of the decimal point.

This figure may have been calculated from two estimates of the asset value, for example £25 350 and £24 950. The mean of the two values is £25 150. But if this figure is quoted as it stands, it gives the impression that the true value of the assets is known to the nearest ten pounds. Of course we do not know that this is the case, so it is more usual and accurate to say that, given the range of the two estimates, the value should be stated to two significant figures (£25 000).

This asset value was given as being accurate to the nearest £1000, implying to the reader that the actual value is closer to £25 000 than to £24 000 or £26 000. The real value, therefore, cannot be said to be more than £500 higher or lower than the approximate value. The true value must lie between £24 500 and £25 500. The estimated value can be said to be subject to a maximum error of £500. This is written as follows: £25 000 +/- £500.

This "error" figure gives the maximum difference between the estimated and the actual value of the business's assets and is called the "absolute" error. The difference can be positive or negative. A possible problem with absolute errors is that they give no indication of the importance of the error. In other words, an error of £500 is insignificant in a large figure of say £250 000, but is likely to be highly significant in the case of an estimated value of £2500. This problem is overcome by using "relative" errors rather than "absolute" ones.

The relative error states the absolute error as a percentage of the estimated value. In this example this can be calculated as follows:

500/25 000 x 100 = 2%

Using the relative error, the value of the business's assets is written as:

£25 000 +/- 2%
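The two kinds of error can be compared with a small Python sketch. The function names are invented for this illustration; the figures are the ones used in the text.

```python
def error_bounds(estimate, absolute_error):
    """Return the lowest and highest values the true figure could take."""
    return estimate - absolute_error, estimate + absolute_error

def relative_error(estimate, absolute_error):
    """Absolute error expressed as a percentage of the estimated value."""
    return absolute_error * 100 / estimate

print(error_bounds(25_000, 500))     # (24500, 25500)
print(relative_error(25_000, 500))   # 2.0
print(relative_error(250_000, 500))  # 0.2
print(relative_error(2_500, 500))    # 20.0
```

The same absolute error of £500 gives very different relative errors, which is why the relative figure says more about how much the error matters.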


Rounding or Truncating?

Certain rules govern the rounding of numbers lying between two estimated values: Imagine that you have a result of 165 that you wish to round to two significant figures (either to 160 or to 170). The dividing line between rounding down and rounding up is precisely at 165, halfway between the two possible answers (160 or 170). The rule is to round such numbers up, so that 165 becomes 170.

Be careful to watch for three potential pitfalls when dealing with rounding and significant figures: firstly, imagine a number like 44.6. Correcting this to two significant figures gives us 45. But if you want to then correct it to one significant figure, you may be tempted to write 50, as 45 is rounded up. But, remember that the original number was 44.6, which is closer to 40 than it is to 50. So, whenever you are doing successive roundings-up, remember to work with the original number.

Secondly, when you're working with numbers like 9.8 and rounding to one significant figure, the number becomes 10. This may look weird, because it is very dissimilar to what you started with. Again, if you have the number 99.5 and want to round it to two significant figures, it becomes 100. Rounding in this case cannot leave the answer as 99, because the digit to be dropped is a 5, which means the rest of the number must be rounded up.

Thirdly, when dealing with a very small number containing many zeroes, such as 0.006572, it can be tempting to write this as 0.00 correct to three significant figures (three s.f.). But this would mean losing all the important details of the measurement, other than that it is very small. The most significant digit in this small number is the 6, since this tells us most about the size of the thing we are measuring. So, the measurement is written as 0.00657 (three s.f.) and 0.0066 (two s.f.).

The final thing to add here is that although we can ignore the leading zeroes in the example given above, we cannot do this when zeroes are surrounded by other digits. The result of this is that a number like 0.03024 becomes 0.0302 (three s.f.) and 0.030 (two s.f.). The zero between the 3 and the 2 is significant, unlike the two zeroes before the 3.

The final approximating practice is that of omitting unwanted digits or "truncating". This would mean changing a figure such as 23.737 to 23.73. This is a simple enough procedure but it does introduce a downward bias into the results. In general, rounding is the preferred method of handling approximate numbers.
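The rounding and truncating rules above can be tried out with the sketch below, which uses Python's decimal module so that halves are always rounded up as the text describes. The helper names round_sig and truncate are made up for this example, and numbers are passed in as text so that no digits are lost on the way in.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

def round_sig(value, figures):
    """Round a number (given as text) to a number of significant figures,
    rounding halves up."""
    d = Decimal(value)
    if d == 0:
        return d
    # adjusted() is the power of ten of the most significant digit.
    exponent = d.adjusted() - figures + 1
    return d.quantize(Decimal(1).scaleb(exponent), rounding=ROUND_HALF_UP)

def truncate(value, places):
    """Drop every digit after the given number of decimal places."""
    return Decimal(value).quantize(Decimal(1).scaleb(-places), rounding=ROUND_DOWN)

for text, figures in [("165", 2), ("44.6", 1), ("99.5", 2), ("0.006572", 3)]:
    print(text, "->", f"{round_sig(text, figures):f}")

print(truncate("23.737", 2))
```

Note that each call starts again from the original number, which avoids the successive-rounding pitfall described earlier.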


Decimal Places:

It is sometimes required that you quote a statistic to a certain number of decimal places, rather than significant figures. Whilst the rules are the same as for significant figures, in this case you write the statistic to a number of figures after the decimal point, no matter what is happening before the decimal point.

Unfortunately, this means that 0.00047 becomes 0.00 (to two decimal places), so clearly you would be asked to quote this number to significant figures, not decimal places. But more relevantly, 15.903 should be written as 15.90 (two d.p.). The zero digit must be included to two d.p. because it shows the degree of accuracy to which you are working.
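A minimal Python sketch of quoting figures to a fixed number of decimal places is given below. The helper name is hypothetical, and the decimal module is used so that the trailing zero is kept when the result is printed.

```python
from decimal import Decimal, ROUND_HALF_UP

def to_decimal_places(value, places):
    """Round a number (given as text) to a fixed number of decimal places."""
    step = Decimal(1).scaleb(-places)   # for example 0.01 for two places
    return Decimal(value).quantize(step, rounding=ROUND_HALF_UP)

print(to_decimal_places("15.903", 2))   # 15.90, the trailing zero is kept
print(to_decimal_places("0.00047", 2))  # 0.00, all the detail is lost
```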

Why not try the worksheet on rounding, significant figures and decimal places?


Summarising Data:

Whilst we can use tables and charts to make sense of a collection of data, we often need to use mathematical summaries, if we are to carry out detailed statistical analysis. The aim of these summarising processes is to allow us to find one or two numbers that sum up the main characteristics of large collections of data. This section contains the following concepts: central tendency (averages), dispersion (spread), and skewness (bunching).

Central tendency: any measure of the central tendency is an average. In practice, three different types of average exist:

The "mode" is the most frequently occurring value in a set of data.

The "median" is the middle value in a set of data, when the data are arranged in ascending order.

The "mean" is the measure of central tendency that takes into account all of the values in a set of data. There are different versions of the mean, but the most commonly used is the "arithmetic" mean which is calculated by summing the values in a dataset and dividing the result by the number of values that the dataset contains. It is a very useful statistic to compare countries, time periods and so on. It is perhaps at its weakest when there are within the dataset a few extreme values at one end of the range of data. The effect of this will be to "pull" the mean towards them, thus making the mean unrepresentative of the dataset as a whole.
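The three averages can be compared using Python's standard statistics module. The values below are hypothetical and include one deliberately extreme observation to show how it pulls the mean away from the other two measures.

```python
from statistics import mean, median, mode

# Hypothetical values; the 40 is a deliberately extreme observation.
values = [2, 3, 3, 4, 5, 6, 40]

print(mode(values))    # 3, the most frequent value
print(median(values))  # 4, the middle value once sorted
print(mean(values))    # 9, pulled upwards by the extreme value
```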

There is an illustration of Central Tendency available in the 'Illustration' section


Indices:

An Index is really an enhanced percentage. Its usefulness can be seen whenever you get your information via the news media; many things are illustrated by way of indices (the plural of index). So share values, exchange rates, and even the rate of inflation (through the Retail Prices Index) are all measured and reported using indices. But the item that is being indexed can be anything that can be given a numerical value, that changes over time. An index usually compares the way things are at one moment with the way they stand or stood at another time.
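A simple index can be sketched as follows, assuming the common convention of setting the value in a chosen base period to 100 and expressing every other period as a percentage of it. The prices are invented for the example.

```python
def index_numbers(values, base_position=0):
    """Express each value as an index with the base period set to 100."""
    base = values[base_position]
    return [v * 100 / base for v in values]

# Hypothetical prices of one item in four successive years.
prices = [80, 84, 90, 100]
print(index_numbers(prices))  # [100.0, 105.0, 112.5, 125.0]
```

An index of 125 in the final year means that the price is 25% higher than in the base year.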

There is an illustration of Indices available in the 'Illustration' section

The "dispersion" of a set of values is also known as the spread of the data. There are five measures of dispersion:

The "range" is simply the difference between the highest and lowest values in the dataset. This only takes account of the two extremes of the dataset.

The "quartile range" is half the range of the middle 50% of values. An associated measure, the "interquartile range", is the difference between the upper quartile and the lower quartile (Q3 - Q1).

The Mean Absolute Deviation takes account of the whole dataset. It is also relatively easy to understand and to calculate.

The Standard Deviation avoids the disadvantage associated with the range and the quartile range in that it takes account of all the values in the dataset.

The variance measures the average squared deviations from the mean. The standard deviation is the square root of this.

There is an illustration of these Dispersion Concepts available in the 'Illustration' section

The "Coefficient of variation" gives us a measure of the amount of variability present in a dataset relative to its mean: it is the standard deviation expressed as a percentage of the mean.
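The measures of dispersion listed above can be worked through on a small hypothetical dataset, as sketched below. Two assumptions are made here: the quartiles are taken as the medians of the lower and upper halves of the sorted data (one of several conventions in use), and the coefficient of variation is taken as the standard deviation expressed as a percentage of the mean.

```python
from statistics import mean, median, pstdev, pvariance

values = sorted([2, 4, 4, 4, 5, 5, 7, 9])  # hypothetical data
n = len(values)
centre = mean(values)

data_range = values[-1] - values[0]

# Quartiles taken as the medians of the lower and upper halves
# (an assumed convention; other methods give slightly different answers).
q1 = median(values[: n // 2])
q3 = median(values[(n + 1) // 2 :])
interquartile_range = q3 - q1
quartile_range = interquartile_range / 2

mean_absolute_deviation = sum(abs(v - centre) for v in values) / n
variance = pvariance(values)          # average squared deviation from the mean
standard_deviation = pstdev(values)   # square root of the variance
coefficient_of_variation = standard_deviation * 100 / centre

print(data_range, interquartile_range, quartile_range)
print(mean_absolute_deviation, variance, standard_deviation)
print(coefficient_of_variation)
```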

The "Coefficient of Skewness" shows the tendency of the dataset values to "bunch" at one end of its distribution, with the values at the other end being relatively dispersed. The mode is the measure indicating the value where most bunching happens.

Moving Averages

A moving average removes from data the short-run fluctuations that often occur in time series statistics, leaving a smooth pattern which helps analyse the general long-run trend of the time series.
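A simple moving average can be sketched in a few lines of Python. The window of three periods is an arbitrary choice for illustration; the figures are the first six percentage changes from the example table earlier on this page.

```python
def moving_average(values, window):
    """Average each run of `window` consecutive values."""
    return [
        sum(values[i : i + window]) / window
        for i in range(len(values) - window + 1)
    ]

# First six percentage changes from the example table (Jan 1985 to Jun 1987).
changes = [5, 7, 5.5, 2.5, 3.9, 4.2]
smoothed = moving_average(changes, 3)
print([round(v, 2) for v in smoothed])
```

The smoothed series is shorter than the original because the first and last periods do not have a full window of neighbours.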

There is an illustration of Moving Averages available in the 'Illustration' section


Sampling:

Often when we are looking at data about things that interest us for one reason or another, we will want to make general statements about a large dataset. We may want, for instance, to predict the average lifespan of a CD-Player, or the batteries that power it; or, to give another example, we might be interested in predicting the outcome of a political election. Usually it is impossible or at least impractical to test the whole population, so the best that we can hope for is to carry out a survey from a sample drawn from the whole population.

It is important here to clarify that when we use the term "population" we are not only talking about people: in statistics any quantity of things is called a population. It doesn't matter if the things are people, or CD-Players, or batteries, or whatever. The total quantity of these things is always known as the population.

Using the examples already referred to, it would be as impossible to get hold of the entire population of people and ask them questions as it would be to commandeer all the CD-Players or batteries that exist. So we are forced to use samples.

We will see elsewhere in TimeWeb that, because of what we know about probability theory, it is not necessary to study a whole population, even if we could do so. By using the theory we can make reliable predictions about the characteristics of the whole population, based on the sample that we draw from it.


Standard Error:

If a sample is selected at random from a population there's a high probability that it will represent the population from which it is drawn. If a measure such as the arithmetic mean is looked at from each of a large number of samples taken from a population, most of these sample means will be the same or very similar, although some may be above or below the other means.

When all these means are placed on a graph it's likely that the graph will describe a normal curve with most of the sample means in the centre and a few on either side. The average of these means will be the best estimate of the population mean. The standard deviation of this distribution is called the standard error and it is found by relating the sample size to the standard deviation.

Standard Error (SE) = standard deviation of sample / square root of sample size
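The formula above translates directly into Python. This sketch assumes that the sample standard deviation is calculated with the usual n - 1 divisor, which is what statistics.stdev does; the measurements are hypothetical.

```python
from math import sqrt
from statistics import stdev

def standard_error(sample):
    """Standard deviation of the sample divided by the square root of its size."""
    return stdev(sample) / sqrt(len(sample))

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements
print(round(standard_error(sample), 3))
```

Because the sample size is in the denominator, larger samples give a smaller standard error for the same spread of values.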


There is an explanation of Sampling available above


Correlation

The correlation between two variables is the degree to which there is a "linear relationship" between them. Correlation is usually expressed as a "coefficient" which measures the strength of that linear relationship between the variables.
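One widely used coefficient is Pearson's product-moment correlation, which runs from -1 (a perfect falling line) through 0 (no linear relationship) to +1 (a perfect rising line). The text does not say which coefficient it has in mind, so treat the sketch below as an assumed example; the income and consumption figures are hypothetical.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical disposable income and consumption figures.
income = [10, 12, 15, 18, 20]
consumption = [8, 10, 11, 15, 16]
print(round(correlation(income, consumption), 3))
```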

There is an illustration of Correlation of Variables available in the 'Illustration' section


Regression Analysis:

Regression techniques are closely related to correlation. The idea is simple enough: the closer two variables are related to one another, the more they conform to a straight line when they are shown on a graph. The concept of the straight line is important and you should go to the section of TimeWeb that provides an explanation of this key concept now.
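A straight line can be fitted to a set of points by ordinary least squares, which is the most common approach, although the text does not name a particular method. The sketch below is therefore an assumed illustration, with hypothetical points chosen to lie exactly on a line.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical points lying exactly on the line y = 1 + 2x.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)  # 2.0 1.0
```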

There is more information available on regression analysis.


The Normal Distribution Curve:

Few ideas in statistics are as important as the normal distribution curve. An understanding of its properties equips you with the basic skills to perform powerful experiments and tests on the data gathered during your studies and beyond.

The discovery of the normal distribution curve and its properties begins in the sixteenth century and takes in the work of Galileo, Quetelet, De Moivre and Newton. A fuller account of the story can be found in the Statistics in History section of this site.

Essentially the normal curve comes from observations from astronomy and art. What Quetelet did was to combine, somewhat unusually, the disciplines of art and mathematics. An artist may spend a great deal of time studying the physical human form - life drawing, as it is sometimes called. Quetelet noticed that despite the wide variety of physical human shapes, they seemed to cluster around an average and ideal shape. What is more, he noted that whilst we tend to remember the more unusual differences in form - the ones that depart radically from the ideal or average shape - the reality is that most differences within a population are small.

So, a population may contain a wide variation in height amongst its members, but most of the variations are grouped around the average height. That is, there are very few extreme differences in height in a population - very tall or very short people - but very many small differences in general. Most people's height is close to the population average.

Quetelet's view was that the human form deviates from an average shape due to chance. The various factors that contribute to making up a particular shape interact to produce a shape that deviates from a standard shape by a lot or a little. But on average humans are a shape and size not much different from the norm. He showed this by producing height distribution charts where the majority of the population clustered around the middle of the height range.

What is more, when Quetelet put this data into a line chart, he showed that the distribution of a characteristic (for example, height, weight, propensity to commit crime, or whatever) produced a "bell-shaped" curve which is symmetrical about its central vertical axis. It is this bell-shaped distribution curve that forms the basis for much of modern statistics.

There is more available on Normal distributions in the 'Crunching' section of TimeWeb.


Probability

When faced with a data set we have a number of choices to make about what to do. Briefly these can be summarised as follows:

Descriptive statistics, which is about:

  • Presentation of the data
  • Calculation of ratios, percentages, averages, dispersion and correlation

and/or

Inferential statistics, used to:

  • Infer the features of a population on the basis of a sample

Inferential statistics is based directly on probability theory.

The probability of an event is also known as its relative frequency of occurrence. If you find this term complicated, just think of it as "how many times out of 100 goes could an outcome happen".

If we think of a football division table, then at the start of the season, theoretically any of the clubs in that division could finish top by the season's end.

So in the Premiership, for instance, there are 20 clubs. Each of the clubs has a 1 in 20 chance of winning the league, in theory. In percentage terms this means that each club's pre-season chance of finishing top is 5%.

Of course, experience tells us that the biggest clubs, generating more income, can often secure the services of the best players, rather unfairly (!) increasing their chances of winning.

So a betting shop assesses these and other aspects, attaching a set of "weights" to these probabilities. These "weights" are known as the "odds" for and against certain clubs' chances. It may seem a very complicated way of doing things, but the starting point is simple probability theory.

The Premiership football example uses discrete data; only one club will finish in each place in the league. When we are dealing with continuous data, the relationship between the value of a variable (x) and its probability of occurrence is usually known as a probability density function.

Imagine that we are taking observations of people's weight. If we measure to the nearest whole kilogram, we could construct a frequency distribution to show how often certain weights were observed. But this frequency chart would change if the measurements we took were correct to the nearest gram.

In fact, the curve that we would obtain on our chart would change each time we changed the measurement accuracy (say to one, two, three or four decimal places). In theory there is no limit to the number of decimal places to which we could measure. This is a literal example of what continuous data means.

So starting our illustration of people's weight with a simple histogram, we could refine the measurements repeatedly until we produce a smooth frequency curve. In the end the whole of the area under the curve would be equal to what is called unity and the chart would illustrate a probability curve. Any value of X must lie within or at the maximum and minimum points of the curve.

The height of the probability curve at any value of X is shown by the expression f(x). Remember that f(x) does not show the probability of observing a particular value of X. With continuous data we can only find the probability of observing X within a certain range.

One of the most useful probability density functions is known as the normal distribution.
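The difference between the height of the curve, f(x), and the probability of a range can be sketched with Python's statistics.NormalDist. The mean of 70 kg and standard deviation of 10 kg are assumed purely for illustration and are not figures from this page.

```python
from statistics import NormalDist

# Assumed for illustration: weights follow a normal curve with a mean of
# 70 kg and a standard deviation of 10 kg.
weights = NormalDist(mu=70, sigma=10)

# f(x): the height of the curve at x = 70. This is not a probability.
height_at_70 = weights.pdf(70)

# Probability of a weight between 60 kg and 80 kg: the area under the
# curve over that range, found from the cumulative distribution.
prob_60_to_80 = weights.cdf(80) - weights.cdf(60)

print(round(height_at_70, 4), round(prob_60_to_80, 4))
```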

There is an explanation of Normal Distribution available above

There is also an illustration of Normal Distribution available in the 'Illustration' section

Statistical significance and probability (p-value)

The statistical significance of a result is expressed in probability terms. When analysing the statistical significance of outcomes, you are interested in whether the observed relationship (for example, between variables) or a difference (between means) in a sample happened by pure chance. You are trying to find out if, in the population from which the sample was drawn, no such relationship or differences exist.

In everyday language, you could say that the statistical significance of a result tells you something about the degree to which the result is "true" (in the sense of being "representative of the population"). More technically, the value of the p-value represents a decreasing index of the reliability of a result. The higher the p-value, the less you can believe that the observed relation between variables in the sample is a reliable indicator of the relation between the respective variables in the population.

Specifically, the p-value is the probability of obtaining a result at least as strong as the one observed if, in the population, there were really no relation at all. For example, a p-value of .05 (or one chance in every twenty goes) indicates that, if no relation existed in the population, a relation as strong as the one found in the sample would still turn up as a "fluke" about 5% of the time.

So, assuming that in the population there was no relation between those variables whatsoever, and you were repeating your experiment one after another, you could expect that approximately in every 20 repeats of the experiment there would be one in which the relation between the variables in question would be equal or stronger than in yours.

This is not the same as saying that, given that there IS a relationship between the variables, you can expect to repeat the results 5% of the time or 95% of the time. In many areas of research, the p-value of .05 is customarily treated as a "border-line acceptable" error level.

Typically, in many sciences, results of probability p ≤ .05 are considered borderline statistically significant, but even this level of significance still involves a fairly high probability of error (5%). Results that are significant at the p ≤ .01 level are usually seen as being statistically significant, and p ≤ .005 or p ≤ .001 levels are regarded as "highly" significant.

It may seem obvious, but the more analyses you run on a data set, the more results will meet the conventional significance level "by chance". For example, if you calculate correlations between ten variables (i.e., 45 different correlation coefficients), then you should expect to find by chance that about two (i.e., one in every 20) correlation coefficients are significant at the p ≤ .05 level, even if the values of the variables were totally random and those variables do not correlate in the population.
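The arithmetic behind this example can be written out as a short sketch. The count of 45 pairs and the expected number of chance "hits" follow directly from the figures above. The last figure, the chance of at least one hit, additionally assumes that the 45 tests are independent of one another, which is only an approximation for correlations that share variables.

```python
from math import comb

variables = 10
alpha = 0.05  # the conventional p <= .05 level

# Every pair of variables gives one correlation coefficient.
pairs = comb(variables, 2)

# Expected number of coefficients that reach the level by chance alone.
expected_by_chance = pairs * alpha

# Chance that at least one coefficient reaches the level by chance
# (assumption: the tests are treated as independent).
at_least_one = 1 - (1 - alpha) ** pairs

print(f"{pairs} correlation coefficients")
print(f"about {expected_by_chance:.2f} expected to be 'significant' by chance")
print(f"chance of at least one: {at_least_one:.2f}")
```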

A commonly quoted example from research on statistical reasoning involves two maternity wards: in the first one, 80 babies are born every day, in the other, only 8. On average, the ratio of girls to boys born every day in each ward is 50/50. However, in one shift in one of those wards twice as many boys were born as girls.

(Number in category ÷ Total in all categories) × 100

Q1. In which maternity ward was the boy baby boom more likely to happen?
(Select one answer)

(a) The larger maternity ward.
(b) Neither of the maternity wards.
(c) The smaller of the two wards.
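One way to check an answer to Q1 is to work out the probabilities directly. The sketch below makes several assumptions that the example does not state: each birth is independently a boy or a girl with probability one half, the period in question covers 8 births in one ward and 80 in the other, and "twice as many boys as girls" is read as "at least twice as many". The function name is hypothetical.

```python
from math import comb


def prob_at_least_twice_as_many_boys(births):
    """Probability that boys number at least twice the girls among
    `births` babies, assuming each baby is a boy with probability 1/2."""
    favourable = sum(
        comb(births, boys)
        for boys in range(births + 1)
        if boys >= 2 * (births - boys)  # girls = births - boys
    )
    return favourable / 2 ** births


for births in (8, 80):
    p = prob_at_least_twice_as_many_boys(births)
    print(f"{births} births: probability {p:.4f}")
```

Comparing the two printed probabilities shows how strongly the size of the sample affects the chance of an extreme result.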



The Number System:

Our number system uses ten as its base. It is a "decimal" system from the Latin word for "tenth". The numbers zero to nine are shown by the digits 0, 1, 2, 3 and so on to 9. To show numbers ten times as large as this, the digits are shifted one position to the left and the digit 0 is used to indicate this new position: 10, 20, 30 and so on to 90. Further increases by a factor of 10 are shown by further shifts in position: 100, 200, 300 and so on to 900.

Position is counted from the decimal point which, although it is not shown in the above examples, is there all the same. The number "one" could be written as 1., ten could be written as 10., and so on.

A decrease by a factor of ten is shown by shifting the digits one position to the right with the digit 0 being used to indicate the new position. One-tenth is 0.1, one hundredth is 0.01, one thousandth is 0.001. Notice that it is usual to prefix the decimal point with the digit 0.

Making Sense of the Number System:

The number 42.527 means 40 plus 2
plus .5 (five tenths)
plus .02 (two hundredths)
plus .007 (seven thousandths)

Multiplying this number by ten means that all the digits in the number are shifted one position to the left: 425.27

Dividing 425.27 by one hundred means that all the digits are shifted two positions to the right: 4.2527
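The place values and digit shifts described above can be reproduced with a short sketch. It uses Python's standard decimal module so that the digits are handled exactly; the helper name place_values is hypothetical and exists only for this illustration.

```python
from decimal import Decimal


def place_values(number):
    """Split a Decimal into one term per digit, e.g. 40, 2, 0.5, ..."""
    _sign, digits, exponent = number.as_tuple()
    highest_position = len(digits) - 1 + exponent  # power of ten of the first digit
    return [
        Decimal(digit).scaleb(highest_position - index)
        for index, digit in enumerate(digits)
    ]


number = Decimal("42.527")
parts = place_values(number)
print(" plus ".join(format(part, "f") for part in parts))

# scaleb(n) moves the digits n positions relative to the decimal point:
# positive n shifts them left (multiply by ten), negative n shifts them right.
times_ten = number.scaleb(1)
divided_by_hundred = times_ten.scaleb(-2)
print(times_ten, divided_by_hundred)
```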

Alternative Number Systems:

The decimal system is used almost universally, but systems based on numbers other than ten are possible. An example of this is the "binary" system, which is used in computing and which has a base of two. In this system there are only two digits, 0 and 1, and any number can be shown by placing these digits in different positions.

In the binary system zero is 0, one is 1, two is 10, three is 11, four is 100, five is 101, six is 110, seven is 111 and eight is 1000. Under this system an increase in a number by a factor of two (doubling it) is shown by shifting the digits one place to the left, with 0 being used to indicate position. Halving the number (decreasing by a factor of two) is shown by shifting the digits one place to the right.
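The binary examples above can be reproduced with a few lines of code. This is a minimal sketch using Python's built-in number formatting and bit-shift operators; the helper name to_binary is hypothetical.

```python
def to_binary(number):
    """Return the binary digits of a non-negative whole number as text."""
    return format(number, "b")


# Zero to eight, as listed in the text.
for number in range(9):
    print(number, "is", to_binary(number))

# Shifting the digits one place to the left doubles the number.
doubled = 5 << 1
print(to_binary(5), "->", to_binary(doubled))

# Shifting one place to the right halves it. For an odd number the
# last digit is dropped, so the result is rounded down.
halved = 6 >> 1
print(to_binary(6), "->", to_binary(halved))
```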

Computer Memory

Computer memory is measured in bytes.

  • 1 byte is equivalent to 8 bits. The information in a byte is equivalent to a letter in a word.
  • 1 kilobyte is roughly 1000 (2^10 or 1024) bytes or characters, approximately equal to one page of double-spaced text.
  • 1 megabyte is roughly 1,000,000 (2^20 or 1,048,576) bytes, approximately equal to one novel.
  • 1 gigabyte is about 1,000,000,000 (2^30 or 1,073,741,824) bytes, approximately equal to 1000 novels.
  • 1 terabyte is about 1,000,000,000,000 (2^40 or 1,099,511,627,776) bytes, approximately equal to 1,000,000 novels.
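The exact byte counts in this list are all powers of two, which a short sketch can confirm. The dictionary and function names below are hypothetical and used only for illustration.

```python
# Power of two behind each unit, as given in the list above.
POWER_OF_TWO = {
    "kilobyte": 10,
    "megabyte": 20,
    "gigabyte": 30,
    "terabyte": 40,
}


def bytes_in(unit):
    """Exact number of bytes in one of the units listed above."""
    return 2 ** POWER_OF_TWO[unit]


for unit, power in POWER_OF_TWO.items():
    print(f"1 {unit} = 2^{power} = {bytes_in(unit):,} bytes")
```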

Time series

Time series analysis inevitably involves analysing historic data such as sales, costs, share prices, interest rates, and so on. This is carried out to predict the future values of the relevant variables.

This data may be about the economic market as a whole, the specific industry within which the organisation operates, or the organisation itself. Time series analysis takes this data and 'breaks it down' into its component parts, from which the organisation can extrapolate, or predict future values. In particular, it is about isolating the underlying trend, which may be precisely what the organisation needs in order to carry out its long-term planning.

Analysing a time series consists of:

  • 'breaking' the series down into its trend and seasonal variation.
  • projecting each characteristic into the future.
  • adding together all individual projections in order to arrive at a forecast figure.
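The three steps above can be sketched in a few lines. This is only one possible approach under stated assumptions, not the method used elsewhere in TimeWeb: the trend is taken to be a straight line fitted by least squares, the seasonal variation is taken to be additive and to repeat every four quarters, and the quarterly sales figures are made up for illustration. All function names are hypothetical.

```python
def linear_trend(values):
    """Fit a straight-line trend (intercept, slope) by least squares."""
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    slope = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values)) / sum(
        (t - t_mean) ** 2 for t in range(n)
    )
    return y_mean - slope * t_mean, slope


def seasonal_effects(values, intercept, slope, period):
    """Average gap between the actual values and the trend for each season."""
    effects = []
    for season in range(period):
        gaps = [
            values[t] - (intercept + slope * t)
            for t in range(season, len(values), period)
        ]
        effects.append(sum(gaps) / len(gaps))
    return effects


def forecast(values, period, steps):
    """Project trend and seasonal variation forward and add them together."""
    intercept, slope = linear_trend(values)                        # step 1: trend
    effects = seasonal_effects(values, intercept, slope, period)   # step 1: seasonal
    n = len(values)
    return [
        (intercept + slope * t) + effects[t % period]              # steps 2 and 3
        for t in range(n, n + steps)
    ]


# Hypothetical quarterly sales for three years (made-up figures).
sales = [105, 92, 109, 106, 113, 100, 117, 114, 121, 108, 125, 122]
next_year = forecast(sales, period=4, steps=4)
print([round(value, 1) for value in next_year])
```

A straight line is the simplest choice of trend; a moving average is a common alternative when the trend is not expected to be linear.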

There is an illustration of time series available in the 'Crunching' section of TimeWeb.
