Statistical Analysis in Excel


Statistical Analysis

Statistical analysis is the study of mathematics used to explain groups of data, or data sets. Statistical methods are used to compare different groups or data sets. The following table lists the basic statistic terms: mean, median, mode, and standard deviation.

StatisticDescriptionExcel Function
Meanarithmetic average of a set of numbersAVERAGE
Medianvalue that occurs in the middle of a data set when arranged from lowest to highest, half the values above and half below the median valueMEDIAN
Modevalue that occurs most frequently in a data setMODE.MULT, MODE.SNGL
Standard Deviationmeasure of how widely the data values are dispersed from the arithmetic meanSTDEV.P, STDEV.S

Excel has many statistical functions. Here are the basic statistical functions in Excel.

Basic Statistical Functions

FunctionDescription
AVERAGEarithmetic mean of a data set
MEDIANmedian or the middle value in the dataset
MODE.SNGLmode or most frequent number in a dataset; can show only one value
MODE.MULTmost frequent values in a data set; can show multiple values
STDEV.Pstandard deviation of a population
STDEV.Sstandard deviation of a sample
LARGEnth largest value
SMALLnth smallest value
MAXlargest value
MINsmallest value
RANK.AVGrank in a list of numbers, any ties are averaged
RANK.EQrank in a list of numbers, ties are shown as first number

Mean, Median, and Mode

AVERAGE

Arithmetic mean of its arguments, which can be numbers or names, arrays, or references that contain numbers.

AVERAGE(number1,number2,…)

number1: number1,number2,… are 1 to 255 numeric arguments for which you want the average.

number2: number1,number2,… are 1 to 255 numeric arguments for which you want the average.

MEDIAN

The median or middle number in a group of supplied numbers. 

=MEDIAN (number1, [number2], …)

number1 – A number or cell reference that refers to numeric values.

number2 – [optional] A number or cell reference that refers to numeric values.

MODE.SNGL

The most frequently occurring, or repetitive, value in an array or range of data.

MODE.SNGL(number1,number2,…)

Number1: number1,number2,… are 1 to 255 numbers, or names, arrays, or references that contain numbers for which you want the mode.

Number2: number1,number2,… are 1 to 255 numbers, or names, arrays, or references that contain numbers for which you want the mode.

MODE.MULT

Returns a vertical array of the most frequently occurring values in an array or range of data. 

MODE.MULT(number1,number2,…)

Number1: number1,number2,… are 1 to 255 numbers, or names, arrays, or references that contain numbers for which you want the mode.

Number2: number1,number2,… are 1 to 255 numbers, or names, arrays, or references that contain numbers for which you want the mode.

Standard Deviation

Standard deviation is a measure of the amount of dispersion or variation in a dataset. To calculate the standard deviation for an entire population use the STDEV.P function. To calculate it for a sample, use the STDEV.S function.

STDEV.P

Calculates the standard deviation for a data set. This is the standard deviation of an entire population. STDEV.P replaces the older STDEV function, with the same behavior. 

=STDEV.P (number1, [number2], …)

number1 – First number or reference in the sample.
number2 – [optional] Second number or reference.

STDEV.P calculates standard deviation using the “n” method. STDEV.S assumes data is an entire population. When data represents a sample, use the STDEV.S function.

STDEV.S

Calculates the standard deviation for a sample set of data. STDEV.S replaces the older STDEV function, with the same behavior. 

=STDEV.S (number1, [number2], …)

number1 – First number or reference in the sample.
number2 – [optional] Second number or reference.

STDEV.S calculates standard deviation using the “n–1” method. STDEV.S assumes data is a sample only. When data represents an entire population, use the STDEV.P function.

Useful Statistical Functions

Largest, Smallest, and Rank

FunctionDescription
MAXlargest value in a dataset
MINsmallest value in a dataset
LARGEnth largest value
SMALLnth smallest value
RANK.AVGrank in a list of numbers, any ties are averaged
RANK.EQrank in a list of numbers, ties are shown as first number

MAX

Returns the largest value in a set of values. Ignores logical values and text.

MAX(number1,number2,…)

number1, number2,…: number1,number2,… are 1 to 255 numbers, empty cells, logical values, or text numbers for which you want the maximum.

MIN

Returns the smallest value in a set of values. Ignores logical values and text.

MIN(number1,number2,…)

number1, number2,…: number1,number2,… are 1 to 255 numbers, empty cells, logical values, or text numbers for which you want the minimum.

LARGE

Returns numeric values based on their position in a list when sorted by value. In other words, it can retrieve “nth largest” values – largest value, 2nd largest value, 3rd largest value, etc.

=LARGE (array, n)

array – The array from which you want to select the kth largest value.
n – An integer that specifies the position from the largest value, i.e. the nth position.

Using LARGE to find the 1st largest, would give the same results as MAX.

SMALL

Returns numeric values based on their position in a list ranked by value. In other words, it can retrieve “nth smallest” values – smallest value, 2nd smallest value, 3rd smallest value, etc.

=SMALL (array, n)

array – A range of cells from which to extract the smallest values.
n – An integer that specifies the position from the smallest value, i.e. the nth position.

Using SMALL to find the 1st smallest, would give the same results as MIN.

RANK.AVG

Returns the rank of a number against a list of other numeric values. When values contain duplicates, the RANK.AVG function will assign an average rank to each set of duplicates. RANK.AVG replaces the older RANK function.

=RANK.AVG (number, ref, [order])

number – The number to rank.
ref – An array that contains the numbers to rank against.
order – [optional] Rank ascending or descending. Default is zero.

RANK.EQ

Returns the rank of a number against a list of other other numeric values. When values contain duplicates, RANK.EQ will assign the higher rank to each set of duplicates. RANK.EQ replaces the older RANK function.

=RANK.EQ (number, ref, [order])

number – The number to rank.
ref – An array that contains the numbers to rank against.
order – [optional] Rank ascending or descending. Default is zero.

Statistical Functions With Logic

FunctionDescription
AVERAGEIFSaverage based on a logical test (replaces AVERAGEIF)
COUNTIFScount based on a logical test (replaces COUNTIF)
SUMIFSsum based on a logical test (replaces SUMIF)

AVERAGEIFS

Computes the average of the numbers in a range that meet one or more criteria. The criteria in AVERAGEIFS supports logical operators (>,<,<>,=) and wildcards (*,?) for partial matching.

=AVERAGEIFS (avg_rng, range1, criteria1, [range2], [criteria2], …)

avg_rng – The range to average.
range1 – The first range to evaluate.
criteria1 – The criteria to use on range1.
range2 – [optional] The second range to evaluate.
criteria2 – [optional] The criteria to use on range2.

COUNTIFS

Counts cells that meet one or more criteria. COUNTIFS can be used with criteria based on dates, numbers, text, and other conditions. COUNTIFS supports logical operators (>,<,<>,=) and wildcards (*,?) for partial matching.

=COUNTIFS (range1, criteria1, [range2], [criteria2], …)

range1 – The first range to evaulate.
criteria1 – The criteria to use on range1.
range2 – [optional] The second range to evaluate.
criteria2 – [optional] The criteria to use on range2.

SUMIFS

Sums cells that meet multiple criteria. SUMIFS can be used to sum values when adjacent cells meet criteria based on dates, numbers, and text. SUMIFS supports logical operators (>,<,<>,=) and wildcards (*,?) for partial matching.

=SUMIFS (sum_range, range1, criteria1, [range2], [criteria2], …)

sum_range – The range to be summed.
range1 – The first range to evaulate.
criteria1 – The criteria to use on range1.
range2 – [optional] The second range to evaluate.
criteria2 – [optional] The criteria to use on range2.

Random Numbers and Rounding

FunctionDescription
RANDrandom number from 0 to 1
RANDBETWEENrandom number between numbers specified
ROUNDrounds a number to a specified number of digits
ROUNDDOWNrounds a number down toward zero
ROUNDUProunds a number up away from zero
EVENrounds a number up to the next even integer
ODDrounds a number up to the next odd integer
INTrounds a number down to the nearest integer
TRUNCtruncates a number to the specified number of decimal places by removing digits to the right
CEILINGround a number up to nearest specified multiple
FLOORround a number down to nearest specified multiple

RAND

Returns a random number between 0 and 1. RAND recalculates when a worksheet is opened or changed.

=RAND ()

The RAND function is volatile: every change in Excel returns a new random number.

RANDBETWEEN

Returns a random integer between given numbers. RANDBETWEEN recalculates when a worksheet is opened or changed.

=RANDBETWEEN (bottom, top)

bottom – An integer representing the lower value of the range.
top – An integer representing the lower value of the range.

ROUND

Returns a number rounded to a given number of digits. The ROUND function can round to the right or left of the decimal point.

=ROUND (number, num_digits)

number – The number to round.
num_digits – The number of digits to which number should be rounded.

ROUNDDOWN

Returns a number rounded down to a given number of decimal places. Unlike ROUND, where only numbers less than 5 are rounded down, ROUNDDOWN rounds all numbers 1–9 down.

=ROUNDDOWN (number, num_digits)

number – The number to round down.
num_digits – The number of digits to which number should be rounded down.

ROUNDUP

Returns a number rounded up to a given number of decimal places. Unlike ROUND, where numbers less than 5 are rounded down, ROUNDUP always rounds numbers 1–9 up.

=ROUNDUP (number, num_digits)

number – The number to round up.
num_digits – The number of digits to which number should be rounded up.

EVEN

Returns the next even integer after rounding a given number up. The EVEN function always rounds numbers up (away from zero) so positive numbers become larger and negative numbers become smaller (i.e. more negative).

=EVEN (number)

number – The number to round up to an even integer.

ODD

Returns the next odd integer after rounding a given number up. The ODD function always rounds numbers up (away from zero) so positive numbers become larger and negative numbers become smaller (i.e. more negative).

=ODD (number)

number – The number to round up to an odd integer.

INT

Returns the integer part of a decimal number by rounding down to the integer. Note the INT function rounds down, so negative numbers become more negative. For example, while INT(10.8) returns 10, INT(–10.8) returns –11.

=INT (number)

number – The number from which you want an integer.

TRUNC

Returns a truncated number based on an (optional) number of digits. The TRUNC function does no rounding, it simply truncates as specified.

=TRUNC (number, [num_digits])

number – The number to truncate.
num_digits – [optional] The precision of the truncation (default is 0).

CEILING

Returns a given number rounded up to the nearest specified multiple. 

=CEILING (number, multiple)

number – The number that should be rounded.
multiple – The multiple to use when rounding.

FLOOR

Rounds a given number down to the nearest specified multiple. 

=FLOOR (number, multiple)

number – The number that should be rounded.
multiple – The multiple to use when rounding.

Excel Formula Wildcards

Excel has 3 wildcards you can use in formulas:

  1. Asterisk (*) – zero or more characters
  2. Question mark (?) – any one character
  3. Tilde (~) – escape for literal character (~*) a literal question mark (~?), or a literal tilde (~~).
UsageDescriptionMatch
?Any one characterA, d, C, r
??Any two charactersAA, BC, dd
???Any three charactersSky, BBB, win
*Any charactersace, David, D211
*stEnds in “st”best, East
w*Starts with “w”William, west, w104
?*At least one charactera, z, ab, DEF
???-???6 characters with a dashARZ–123, 107-BBB
*~?Ends in a question markWhat?, reply?
*ou*Contains “ou”accounting, shout, country

Compatible Functions for Wildcards

Not all functions allow wildcards. Here is a list of the most common functions that allow wildcards:

  • AVERAGEIF, AVERAGEIFS
  • COUNTIF, COUNTIFS
  • SUMIF, SUMIFS
  • HLOOKUP, VLOOKUP, XLOOKUP
  • MATCH
  • SEARCH

Named Ranges

An alternative to absolute cell references is a named range. A single cell or a range can be named and that name can be used directly in a formula or function. For example, a named range can be called ‘Sales’ rather than B7:B45.

To name a cell or a range, first highlight the cell or range and then click the Name Box below the ribbon to the left. The Name Box is just to the left of the Formula Bar. A named range is an absolute cell reference.

Named ranges are located in the Formulas tab on the ribbon under the Define Name button. Named ranges can be added, edited, or deleted.

The INDIRECT function below is useful when working with named ranges. If you have a named range ‘Sales’ it can be referenced by a column heading named Sales.

INDIRECT

Returns the reference specified by a text string.

INDIRECT(ref_text,[a1])

Ref_text: is a reference to a cell that contains an A1- or R1C1-style reference, a name defined as a reference, or a reference to a cell as a text string.

A1: is a logical value that specifies the type of reference in Ref_text: R1C1-style = FALSE; A1-style = TRUE or omitted.


Video 2: Stock and Bond Returns Tutorial


Video 3: Named Ranges and IFS Tutorial


Jeff Mankin

Jeff Mankin teaches financial literacy. His website is FinallyLearn.com.

Recent Posts

link to Expense

Expense

What is an expense? Expense is a cost to operate a business to produce revenue. In accounting, an expense occurs when an asset is used. This could include a cash outflow or consuming an...