NTP-ISO 6658
2008-12-03
1st Edition
R.0037-2008/INDECOPI-CNB. Published on 2009-01-01
Price based on 38 pages
I.C.S.: 01.040.67; 67.240
THIS STANDARD IS RECOMMENDED
Descriptors: Sensory analysis, methodology, general guidelines
INDEX
PREFACE
INTRODUCTION
2. NORMATIVE REFERENCES
4. GENERAL REQUIREMENTS
5. TEST METHODS
6. ANALYSIS OF RESULTS
7. BACKGROUND
ANNEX A
ANNEX B
PREFACE
A. HISTORICAL REVIEW
ENTITY REPRESENTATIVES
BODEGAS VISTA ALEGRE S.A. Rodolfo Vasconi
BODEGAS Y VIÑEDOS TABERNERO S.A.C. Carlos Rotondo
VITIVINÍCOLA EL FUNDADOR
OF CAÑETE Miguel Mirez Crisóstomo
EL ALAMBIQUE SAC José Américo Vargas de la Jara
PRODUCERS' ASSOCIATION
OF WINES AND PISCOS FROM THE VALLEY
FROM ICA - APROPICA Jesus Hernandez
WINE ASSOCIATION OF
LUNAHUANÁ Juan Carlos Alvarado
WINERY EL CATADOR José Carrasco
ASPEC Samuel Ureña
---oooOooo---
Introduction
Definitions;
Application;
Assessors;
Procedure;
Analysis of results.
Chapter 6 deals with some general principles of data collection and the analysis of
sensory data and also briefly discusses the general principles of statistical treatment
of the results.
PERUVIAN TECHNICAL STANDARD NTP-ISO 6658
1 of 38
This Peruvian Technical Standard establishes general guidelines on the use of sensory
analysis. It describes tests for the examination of foods by sensory analysis, and
includes some information on the techniques to be used if statistical analysis of the
results is required.
These tests are generally intended for objective sensory analysis only. However, if a test
can be used to determine preference, this is indicated.
2. NORMATIVE REFERENCES
This standard contains provisions which, when cited in this text, constitute requirements
of this Draft Peruvian Technical Standard. The indicated edition was in force at the
time of this publication. As all standards are subject to revision, those who make
agreements based on them are advised to consider using the most recent
editions of the standards cited below. The Peruvian Standards Body holds, at all times,
information on the Peruvian Technical Standards in force.
For the purposes of this Peruvian Technical Standard, the terms and definitions given in NTP
ISO 5492 and the following apply:
3.1 sensory analysis: examination of organoleptic attributes of a product
through the sense organs.
4. GENERAL REQUIREMENTS
This chapter covers the general requirements common to all situations encountered in sensory
analysis. The basic information on these requirements is as follows:
a) The human response to a stimulus cannot be isolated from previous
experience or from other sensory stimuli received from the environment.
NOTE: However, the influences coming from these two sources can be controlled and the
effect enhanced
d) The validity of the conclusions drawn from the results depends on the test
used and the way it is conducted, including the questions that have been asked.
a) Those where the main purpose of the test is to classify, rank order or
describe the product(s);
b) Those in which the purpose is to distinguish between two or more
products; here it is important to distinguish between the need to know
if there is a difference,
difference,
the influence of that difference, e.g. with respect to preference, or whether
all or part of a population detects a difference;
For each test, chapter 5 attempts to provide guidelines regarding its relevance.
Preliminary tests may be necessary to confirm the applicability of a given test.
Due to sensory fatigue and adaptation effects, depending on the nature of the test and the
type of product, only a limited number of samples can be evaluated during one session.
Some of these effects can be moderated by appropriate rinsing procedures and recovery
between samples.
Although the use of control samples is essential in most cases, it naturally limits the
number of samples that can be assessed during any one session.
The statistical plan should always be determined before testing begins. This is
especially recommended if the number of samples to be tested requires more than one
session. Details of the statistical plans should be selected on the basis of specialised texts.
Whichever testing method is used, the sequential testing approach described in ISO 16820
should be considered when it is desirable to keep the number of samples or the number of
assessors to a minimum.
A sensory analysis panel is a real "measuring instrument" and, therefore, the results of the
analyses carried out depend on its members. Therefore, the selection of persons willing
to participate in a panel has to be carried out with the following criteria in mind.
It should be seen as a real investment, both in terms of time and finances. Management
support in the organisation is necessary if it is to be effective.
Sensory evaluation can be carried out by three types of judges: "judges", "qualified
judges" or "expert judges". Judges can be "novice judges", who do not have to meet precise
selection or training criteria, or people who have already participated in some sensory
tests ("initiated judges"). Qualified judges are judges who have been selected and trained
for a particular sensory test. Expert judges are evaluators who have been selected and
trained for a variety of sensory analysis methods and who demonstrate special acumen in
panel work.
The selection and training methods to be used depend on the tasks and methods that the
selected judges are intended to be given. It should be noted that these methods are only
a way of selecting the best candidates from those available, rather than meeting
predetermined criteria. In addition, the selection of assessors for their ability to
discriminate and describe foods is very different from that used for preference testing.
The former requires selection and training, whereas the latter requires only that the panel
is representative of a specified sector of the population, e.g. a group of consumers.
a) general ability to perform the specific sensory task, which may include a
special sensitivity to the stimuli being studied;
The nature of the product to be tested determines the experimental test protocol and
may also influence the type of test required to meet the test objectives. For example, a
protocol in which the food is to be consumed hot will need to take into account the rate of
cooling of the product and the likely effect on sensory attributes, and the changes in
sensory attributes that may result from keeping the product warm prior to testing.
Sample preparation and submission methods should be appropriate to the product and the
problem involved.
EXAMPLE 1: A product that is normally consumed hot should be prepared in the usual
way and tested hot; however, higher temperatures may be used in some circumstances in
order to make it easier to evaluate some flavours.
EXAMPLE 2: A product that is normally consumed in discrete pieces should not be
homogenised, so that its textural characteristics are preserved. However, care should be
taken to
ensure maximum uniformity between sub-samples for each assessor; this includes similar
portion size and compositional uniformity.
The general principles for product sampling (according to the International Standards
relevant to the product being tested) should be applied to the test samples. In all cases,
documentation of sample identification codes or lot numbers is required. Valid
conclusions can be drawn for a product as a whole only if the samples tested are
representative.
Carriers may sometimes be used for tests related to the evaluation of products for which
direct tasting is not feasible (see ISO 5497), e.g. food ingredients.
Lighting conditions should be specified when assessing appearance. When the test relates
only to differences in taste, the effect of colour differences can be partially masked by
the use of lighting conditions that minimise the colour difference.
Containers should be selected in such a way that they do not affect the test or the product.
These may include washable ceramic or glass containers, or disposable plastic or paper
containers, but should not transfer chemical materials that could cause contamination. In
particular, washable containers should be washed only with detergents free of any odour
and contamination and should be rinsed with water, and polymeric and paper containers,
including insulated containers used for hot or cold samples, should be free of any odour
and contamination.
Judges may use palate cleansers from sample to sample and from session to session, but
care should be taken to ensure that they do not influence the taste of the products to be
evaluated. Still and carbonated water and bland foods (e.g. unsalted biscuits) can be
used from sample to sample and from session to session. Checks of the water supply to
ensure that it is soft are desirable. For particular purposes, deionised water,
glass-distilled water, spring water with low mineral content, carbon-filtered water or
boiled tap water can be used, but it should be noted that they are likely to have
different flavours.
Sensory analysis should be carried out in a dedicated testing room (See NTP ISO 8589
for further details). The aim should be to create, for each judge, a separate environment with
a minimum of distraction, so that the judge can quickly adapt to the nature of the new
task(s). During testing, no extraneous activities, including sample preparation, should
be allowed, as these may lead to biased results. The room should be at a comfortable
temperature and ventilated with odour-free air; limited air flow is desirable to avoid
excessive temperature fluctuations. Persistent odours, such as tobacco or cosmetics,
should not be allowed to contaminate the test room environment.
Noise should be restricted. Low background noise is generally more tolerable than a
fluctuating noise level. Conversation is more distracting than background noise.
Interruptions are most distracting.
It is usually useful to have control over the colour and intensity of the lighting, although
coloured lights rarely manage to completely hide differences in appearance.
Surfaces should be non-absorbent and designed to facilitate a high level of hygiene. The
dimensions of tasting booths are important; very low ceilings and very narrow booths
can be oppressive or produce a feeling of claustrophobia. Comfortable seating is
necessary.
The planning and conduct of the test are determined by the objectives of the programme, the
test selected, and the practical constraints related to the use of human subjects. In
particular, it is important to recognise the biases that may be inherent in the selected test,
and to conduct the test in such a way as to minimise the effects of any biases. Potential
biases may originate from both psychological and physiological sources.
The most serious psychological bias comes from evaluators interacting to influence each
other's judgements, and should be minimised through the use of individual booths or
adequate separation of evaluators. In addition, rigorous management of evaluators'
activities is necessary.
The manner and order of presentation of the samples are important aspects of the test and
may introduce psychological biases. For example, the samples should be coded with
3-digit random numbers, and the codes should be changed for each test. The order of
assessment can also be a source of bias and, in general, the order should be specified.
With a small number of samples and assessors, the order can be balanced so that every
possible order occurs an equal number of times. In larger experiments, the order can be
balanced or randomised.
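The coding and ordering rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample names, the panel size of 12 and the fixed seed are hypothetical, and the helper names `assign_codes` and `balanced_orders` are not taken from the standard.

```python
import itertools
import random

def assign_codes(sample_names, rng):
    """Give each sample a distinct random 3-digit code (100 to 999)."""
    codes = rng.sample(range(100, 1000), k=len(sample_names))
    return dict(zip(sample_names, codes))

def balanced_orders(sample_names, n_assessors, rng):
    """Cycle through every possible order so that each one is used equally often.

    The balance is exact only when n_assessors is a multiple of the number
    of possible orders (6 for three samples).
    """
    orders = list(itertools.permutations(sample_names))
    rng.shuffle(orders)  # avoid always starting with the same order
    return [orders[i % len(orders)] for i in range(n_assessors)]

rng = random.Random(2008)        # fixed seed only to make the sketch reproducible
samples = ["P", "Q", "R"]        # hypothetical products
codes = assign_codes(samples, rng)   # draw fresh codes for each test
plan = balanced_orders(samples, 12, rng)

for assessor, order in enumerate(plan, start=1):
    print(assessor, [codes[s] for s in order])
```

With many samples the number of possible orders grows too large to balance completely, which is where a randomised order, as mentioned above, becomes the practical choice.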
Physiological biases are often related to the nature of the test samples. In particular,
adaptation to a specific taste stimulus may occur on repeated exposure to that stimulus,
and fatigue may be experienced when chewing solid foods. Both of these factors may
impose an upper limit on the number of samples to be tested in one session.
Expectoration of samples may be recommended for trained panels but the loss of
information on specific sensory attributes may be a limitation.
Hunger and satiety can influence an assessor's performance and, if the panels are held too
long, performance may deteriorate. If possible, assessors should be asked to refrain from
smoking or consuming light items such as coffee for 1 h before the test. Raters should not
bring any foreign odours to the session, e.g. tobacco odour or cosmetics, as these may
influence the responses of other raters.
The time of day the test is conducted is important. The schedule should take into account
the usual local meal times as performance is generally considered optimal mid-morning
and mid-afternoon. Testers suffering from emotional disturbances, colds and other
illnesses should be excluded from testing until they recover.
verify that all data have been accurately recorded, whether computerised
or manually;
verify that any additional relevant information has been recorded that
may assist or cast doubt on the interpretation of the results;
verify that evaluators are motivated to continue participating if further testing
is planned.
5. TEST METHODS
5.1 General
For the number of judges, refer to the respective standards, considering the α or β risk
depending on the purpose of the test. Alternatively, sequential analysis (See ISO 16820)
may allow a decision to be made after fewer tests than would be required by
conventional approaches using a predetermined number of assessments.
5.2.1 General
The following tests are commonly used to determine the probability of difference or
similarity between samples:
For all these tests, there are different ways of analysing the results.
5.2.2.1 Definition
This is a test in which samples are presented in pairs in order to compare and detect
differences on the basis of some defined criteria.
5.2.2.2 Application
The advantages of this test over other discrimination tests are simplicity and less
sensory fatigue.
The disadvantage of the pairwise comparison method is that, as the number of samples
to be compared increases, the number of intercomparisons required quickly becomes
unmanageable.
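As a worked count, assuming a design in which each of the n samples is compared once with every other sample (the text does not state the design, so this is an assumption), the number of pairs is:

```latex
N_{\text{pairs}} = \binom{n}{2} = \frac{n(n-1)}{2}, \qquad
n = 4 \Rightarrow 6, \quad n = 6 \Rightarrow 15, \quad n = 10 \Rightarrow 45 .
```

The count grows roughly with the square of the number of samples, which is why the method becomes impractical for more than a few products.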
5.2.2.3 Procedure
The judges receive a set of two samples (the pair). They nominate the sample that they
consider to be the more intense in the attribute under consideration, even if this choice
is only a guess. One of the samples can be a control. The number of times each sample
is selected is counted.
It is necessary to determine, before testing, whether the statistical test that follows is to be
one-sided (i.e. the test supervisor expects a certain direction of the difference and the
alternative hypothesis corresponds to the existence of a difference in that direction) or
two-sided (i.e. the test supervisor has no expected direction of the difference and the
alternative hypothesis corresponds to a difference in either direction).
Questions on difference and preference should not be combined: the criteria for panel
selection are different for these questions.
5.2.3.1 Definition
This is a discrimination test in which three coded samples are presented simultaneously,
two of which are identical. The raters are asked to indicate which is the different sample.
5.2.3.2 Application
The test should not be used for preference determination. Some disadvantages of the test
are that:
5.2.3.3 Procedure
Each assessor is presented with a set of three coded samples, two of which are
identical, and asked to select the different sample.
The samples should be presented an equal number of times in each of the two groups
of three distinct presentation orders, which are:
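For two products A and B, the six possible presentation orders follow directly from the definition of the test (two identical samples and one different). A minimal Python sketch that enumerates them and allocates them evenly is given here; the function names and the panel size of 12 are hypothetical.

```python
import itertools

def triangle_orders():
    """Return the six distinct presentation orders of a triangle test,
    as two groups of three: B is the odd sample, then A is the odd sample."""
    b_is_odd = sorted(set(itertools.permutations("AAB")))
    a_is_odd = sorted(set(itertools.permutations("ABB")))
    return ["".join(p) for p in b_is_odd], ["".join(p) for p in a_is_odd]

def allocate(n_assessors):
    """Cycle through the six orders; each is used equally often
    when n_assessors is a multiple of six."""
    first, second = triangle_orders()
    orders = first + second
    return [orders[i % len(orders)] for i in range(n_assessors)]

first, second = triangle_orders()
print(first)    # orders in which B is the different sample
print(second)   # orders in which A is the different sample
plan = allocate(12)
```

In practice the mapping of orders to individual assessors would also be randomised; the simple cycling here only illustrates the "equal number of times" requirement.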
5.2.4.1 Definition
This is a discrimination test in which a reference sample is first presented. Two samples
are then presented, one of which is identical to the reference sample and the assessors
are asked to identify it.
5.2.4.2 Application
This duo-trio test is used to determine whether there is a sensory difference or similarity
between a given sample and a reference. It is particularly suitable when the reference
sample is well known to the assessors, e.g. a regular production sample.
If there is a residual taste, this test is less suitable than the paired comparison test (5.2.2)
or the "A/not A" test (5.2.6).
5.2.4.3 Procedure
First, the assessors are presented with an identified reference sample. Then, they are
presented with two coded samples, one of which is identical to the reference sample
and the assessors are asked to identify this sample.
5.2.5.1 Definition
This is a discrimination test in which five coded samples are presented, two of which
belong to one type and three to another. The assessors are asked to group the two sets
of samples together.
5.2.5.2 Application
The two out of five test is recommended to establish a difference in a more economical
way than other tests (the method is statistically more efficient).
The disadvantages of this test are similar to those of the triangular test (5.2.3). It is much
more affected by sensory fatigue and memory effects but has greater statistical power.
It is mainly used in visual, auditory or tactile applications.
5.2.5.3 Procedure
Each of the assessors is presented with a set of five coded samples and told that two
belong to one type and three belong to another. Evaluators are asked to group the two
sets of samples together.
When the number of evaluators is less than 20, the order of presentation should be
randomly selected from the following 20 different permutations.
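The count of 20 can be checked by enumeration: there are 10 ways to arrange three A and two B samples, and 10 ways to arrange two A and three B samples. The sketch below lists them and draws orders for a small panel. Drawing without repetition is an assumption of this sketch, and the function names are hypothetical.

```python
import itertools
import random

def two_out_of_five_orders():
    """All distinct orders with three samples of one type and two of the other."""
    orders = set(itertools.permutations("AAABB")) | set(itertools.permutations("AABBB"))
    return sorted("".join(p) for p in orders)

def draw_orders(n_assessors, rng):
    """Randomly pick one order per assessor, without repetition (assumed),
    for a panel of fewer than 20 assessors."""
    orders = two_out_of_five_orders()
    if n_assessors >= len(orders):
        raise ValueError("this sketch only covers panels of fewer than 20 assessors")
    return rng.sample(orders, k=n_assessors)

all_orders = two_out_of_five_orders()
print(len(all_orders))                    # number of distinct orders
panel_orders = draw_orders(12, random.Random(1))
print(panel_orders)
```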
5.2.6.1 Definition
This is a test in which assessors are presented with a series of samples that
can be "A" or "not A" after they have learned to recognise the sample "A". The assessors are
asked to indicate whether each of the samples is "A" or "not A".
5.2.6.2 Application
This is a discrimination test that can be used for the evaluation of samples that have
variations in appearance or leave a lingering aftertaste.
5.2.6.3 Procedure
Evaluators are presented with samples one at a time. First, they are presented with the
reference sample "A" several times, until they can recognise it. Then, they are randomly
presented with several samples, each of which can be either "A" or "not A", and they have to
determine whether each sample is "A" or not. A considerable time interval (e.g. 2-5 min)
should be allowed to elapse between successive samples, and only a few samples should be
examined during one session.
In sensory analysis, measurement methods may try to decide the categories, classes or
grades to which samples should be assigned. They may also try to find numerical
estimates of the magnitude of attributes of samples or differences between samples.
There is no direct relationship between the scale of responses used to obtain numbers
and the scale of measurement that corresponds to the values recorded. Thus, the same
method to obtain numbers (response scale) can lead to values, whose measurement scale
is only ordinal (unequal intervals) or is on an interval scale (equal intervals). With an
ordinal measurement scale, it cannot be assumed that the magnitude of the difference
between two values reflects the difference between the perceived intensities. Nor can it
be assumed that the ratio of two values reflects the ratio of the perceived intensities.
With an interval measurement scale, higher numerical values correspond to higher
perceived intensities (or degrees of pleasure) and the magnitude of the difference
between two values reflects the magnitude of the difference in the perceived intensity of
the property being measured. However, a numerical value equal to zero may not indicate a
total absence of the property and it cannot be assumed that the ratio of two values
reflects the ratio of perceived intensities.
The choice of response scale depends on the objectives of the study and the products
under study. In any specific case, one can choose from a number of equally good scales.
Whatever response scale is adopted, it should be easy to use, discriminating, unbiased and
easily understood by assessors (see NTP ISO 4121).
Regardless of the scale of responses, the quality of the measurements depends on how
they were obtained. Aspects to consider are:
the level of training of assessors (see NTP ISO 8586-1 and NTP ISO 8586-2), and
the method of presentation of the samples (see paragraphs 4.5 and 4.7).
Statistical analysis is influenced by the nature of the measurement scale (ordinal, interval or
ratio) rather than the scale of responses used.
Results measured on an ordinal scale are best analysed using non-parametric methods,
e.g. the Wilcoxon test in the case of two paired samples or the Friedman test with more than
two samples. Measurements on an interval or ratio scale can be analysed using a
parametric test, e.g. analysis of variance, if a normal distribution of residuals can be
assumed.
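As an illustration of the non-parametric route, the Friedman statistic can be computed directly from rank data. The sketch below uses the textbook formula without tie correction; the rank data are invented for the example and the function name is hypothetical.

```python
def friedman_statistic(ranks):
    """Friedman statistic for rank data without ties.

    ranks: one list per assessor, giving the rank (1..k) that the assessor
    assigned to each of the k samples.
    Returns (statistic, rank sum per sample).
    """
    n = len(ranks)       # number of assessors
    k = len(ranks[0])    # number of samples
    rank_sums = [sum(row[j] for row in ranks) for j in range(k)]
    stat = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    return stat, rank_sums

# Hypothetical ranks from 4 assessors for 3 samples
ranks = [
    [1, 2, 3],
    [1, 2, 3],
    [1, 3, 2],
    [2, 1, 3],
]
stat, rank_sums = friedman_statistic(ranks)
print(rank_sums, stat)
```

For large panels the statistic is usually compared with a chi-squared distribution with k - 1 degrees of freedom (critical value about 5.99 for k = 3 at the 5% level); for panels as small as this example, exact tables are normally preferred.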
In general, parametric tests are more powerful than non-parametric tests. That is, if
there is a difference, the parametric test is more likely to detect it. On the other hand,
non-parametric tests are more robust than parametric tests; that is, they are less affected by
anomalies in the data.
In sensory analysis, the perception of a property is assessed, not the property itself, and
it is impossible to be sure that the equality of the intervals has been achieved. Although
it is not unusual to interpret the results as if they correspond to an interval or ratio
measurement scale, this interpretation should be expressed in each specific case as a
working hypothesis.
d) classification with the help of a scale and scoring (See section 5.3.6)
5.3.3 Classification
5.3.3.1 General
Classification is a method of distributing samples (physically or by means of labels
identifying them) into pre-defined categories.
5.3.3.2 Application
5.3.4.1 General
The overall assessment refers to a method of classifying samples into groups that
constitute an ordinal scale of quality.
5.3.4.2 Application
5.3.5 Ranking
5.3.5.1 General
5.3.5.2 Application
5.3.5.3 Procedure
It is necessary to ensure that the judges understand and agree on the attribute or
criterion on the basis of which the samples are to be ranked. Each judge independently
examines the coded samples in the established order and assigns a preliminary ranking.
The evaluators should then review this ranking by re-examining the samples and modify
it, if necessary, by changing the order.
5.3.6.1 General
Scaled ranking is a ranking method in which each sample is assigned to some position on
an ordinal scale. More than one sample can be assigned to the same position on the
scale. The scale may be numerical, verbal, graphical or a combination of these. It may
be continuous or discrete and unipolar or bipolar (see NTP ISO 4121). If the scale is
numerical, the procedure is often referred to as "scoring". It may be useful for
evaluators to have some reference samples to identify specific positions on the scale.
5.3.6.2 Application
Rating with the help of a scale can be used to assess the intensity of one or more
attributes or degrees of liking of the samples.
Although both rank-ordering and scaled ranking invoke only ordinal scales, they are not
equivalent. Rank-ordering places the samples in order and therefore its results only
refer to the group of rank-ordered samples. Ranking with the help of a scale provides an
ordinal estimate of the magnitude of attributes or preferences because the same ordinal scale
is used regardless of the samples being evaluated. Therefore, scaled ranking is
preferable if the results of one set of samples are to be compared with others. But, since
rank-ordering encourages evaluators to use any perceived differences between samples, it
may reveal small distinctions between samples given the same ranking.
5.3.6.3 Procedure
The ranking method to be used should be clearly defined and understood by the assessors.
Each assessor independently examines the samples one by one in a set order and
assigns each sample to a position on a scale.
These tests can be applied to one or more samples to characterise, both qualitatively
and quantitatively, one or more sensory attributes. They can be classified as:
5.4.2.1 Definition
5.4.2.2 Application
identify and describe the attributes of a particular sample or samples; and establish the
sequence in which these attributes are perceived.
5.4.2.3 Procedure
The test can be applied to one or more samples. When more than one sample is presented
during a session, the order in which the samples are presented will have an influence.
The importance of this can be assessed by repeating the test, using a different order of
presentation.
Each assessor assesses the sample independently and the findings are recorded. A
checklist of attributes can be provided. The sensory evaluation may be followed by a
discussion led by the panel leader.
The results should be compared to draw up a list of descriptive terms applicable to the
sample, based on the frequency of use of each of these. An open discussion at the end of
the evaluation is often useful.
5.4.3.1 Definition
See ISO 6564, NTP ISO 8586-1 and ISO 13299 for further details.
5.4.3.2 Application
5.4.3.3 Judges
5.4.3.4 Procedure
A preliminary (or training) series of tests is carried out with the variety of products to be
tested, in order to establish the organoleptic properties important for characterising and
distinguishing them. The results of these tests are used to elaborate the glossary of
descriptive terms to be used and to establish the experimental procedure to present and
examine the samples. A panel is then trained in the methodology and especially in the
use of the glossary. At this stage, it is useful to have a set of reference materials, pure
compounds or natural products, which obtain certain odour or taste scores or have
certain visual or textural properties.
In the test sessions, judges check the samples against the glossary of terms, scoring
each attribute present on an intensity scale.
It is usual to observe the order in which the factors are perceived, including the presence of
a residual flavour, and to rate the overall impression of aroma and flavour.
In consensus profiling methods, immediately after the assessors have completed their
assessments, the panel leader tabulates the results and initiates a discussion to resolve
differences. In the light of the discussion and, if necessary, after further examination of
the samples, the panel reaches a group decision on the profile.
In the other descriptive analysis methods, there may be no discussion and the profile
obtained is a series of averages of the scores assigned to each descriptor by each
assessor.
Averages can be statistically compared, for example, using analysis of variance. In
addition, for all descriptive analysis methods, multivariate analysis techniques are
available.
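For the non-consensus case, the profile as "a series of averages" can be sketched as follows. The descriptors, scores and assessor labels are invented for illustration, and `mean_profile` is a hypothetical helper, not part of any cited standard.

```python
from statistics import mean

def mean_profile(scores):
    """scores: {assessor: {descriptor: intensity score}}
    Returns {descriptor: mean score over the assessors who rated it}."""
    descriptors = sorted({d for s in scores.values() for d in s})
    return {d: mean(s[d] for s in scores.values() if d in s) for d in descriptors}

# Hypothetical intensity scores for one sample
panel = {
    "assessor_1": {"sweet": 6, "acid": 3, "fruity": 7},
    "assessor_2": {"sweet": 5, "acid": 4, "fruity": 6},
    "assessor_3": {"sweet": 7, "acid": 2, "fruity": 8},
}
profile = mean_profile(panel)
print(profile)
```

Comparing such averages between products would then call for analysis of variance or a multivariate technique, as noted above.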
5.4.4.1 Definition
5.4.4.2 Application
These tests are recommended for new product development (especially product space
perception mapping). Their most important advantage is that panel training is avoided.
5.4.4.3 Judges
5.4.4.4 Procedure
Each panel member develops his or her own idiosyncratic list of descriptive terms by
evaluating a wide variety of samples and trying to characterise and distinguish between
them.
Judges then make their individual evaluations of the test products using a traditional
descriptive evaluation sheet developed with their own vocabulary.
6. ANALYSIS OF RESULTS
6.1 General
This chapter provides general indications on the appropriate methods to be used for the
statistical analysis of sensory test results. Further details on specific tests can be found
in the appropriate International Standards mentioned in the Bibliography. Statistical
terms in bold are explained in Annex A and are in accordance with ISO 3534-1, ISO
3534-2 and ISO 3534-3.
6.2.1 General
The purpose of the discrimination tests described in 5.2 is to determine whether there is
a detectable difference between two products, A and B (or a preference for one of
them). The analysis is based on the number of assessors in each particular category, for
example, those who prefer A, those who prefer B, or those who correctly choose the
different sample.
This International Standard, which discusses each method in detail, also describes how to
use it to ensure similarity when required.
There are two possible forms of this test. The first relates to the detection and
determination of the direction of a specified difference between two products; the second
relates to the preference for one of them.
In both cases, the null hypothesis is that no differentiation can be made between two
products (by intensity or by order of preference). In quantitative terms, the null hypothesis is
that there is an equal probability (1/2) that a randomly selected assessor from the panel will
select either sample A or sample B.
The interpretation of results based on the number of participants indicating that A (or B) has
the highest intensity or that they prefer A (or B), depends on the alternative hypothesis as
opposed to the null hypothesis. Depending on the nature of the alternative hypothesis,
which must be specified before testing, it will be either two-tailed or one-tailed.
The two-tailed test is one in which you simply want to find out whether there is a
difference in intensity between the two products (intensity test), or whether one of the
products is preferred over the other (preference test). The alternative hypothesis is written
PA ≠ PB (i.e. PA > PB or PA < PB).
At a 5% significance level, the null hypothesis is rejected if the number of votes for a
sample is at least equal to that in column 2 of Table A.1.
If this is the case, the conclusion will be that there is a significant difference between
the two products and, if the majority of votes are in favour of product A, the conclusion will
be that, for the characteristic in question, A has a significantly higher intensity than B
(or is significantly preferred, if that was the basis of the evaluators' votes).
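The two-tailed decision can also be expressed as an exact binomial p-value rather than a table lookup. The sketch below assumes forced-choice responses and the null hypothesis PA = PB = 1/2; it does not reproduce Table A.1, and the vote counts are invented.

```python
from math import comb

def binomial_upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def paired_two_sided_p(n, votes_a):
    """Exact two-sided p-value for a paired comparison under H0: PA = PB = 1/2.

    n: number of assessors giving a forced choice
    votes_a: number of assessors who chose sample A
    """
    larger = max(votes_a, n - votes_a)   # votes for whichever sample got more
    return min(1.0, 2 * binomial_upper_tail(n, larger, 0.5))

# Hypothetical result: 15 of 20 assessors choose A
p_value = paired_two_sided_p(20, 15)
print(round(p_value, 4))
```

A p-value at or below 0.05 corresponds to rejecting the null hypothesis at the 5% significance level used in the text.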
A one-tailed test is one in which one wishes to discover whether one specifically
designated product (A, for example) has a higher intensity than the other; the
alternative hypothesis is then PA > 1/2 . A directional test is appropriate only if
any result in the opposite direction will not be interpreted as a true effect but simply as a
chance result that does not call into question the null hypothesis.
At a significance level of 5%, the null hypothesis is rejected if the number of votes in favour
of A is at least equal to that in column 4 of Table A.1. If this is the case, the conclusion
will be that the superiority of A over B (in intensity) has been significantly recognised
by the panel.
The null hypothesis is that it is not possible to distinguish between the products. In
this case, the probability P of identifying the sample that is different from the other
two is equal to P0 = 1/3. In statistical terms, the null hypothesis H0 is expressed by P0 =
1/3.
The test is one-tailed. The test supervisor wants to know whether it is possible to distinguish
between the two products, so he will reject the null hypothesis in favour of the
alternative hypothesis P > 1/3 .
If the number of correct answers is greater than or equal to the corresponding number
in column 3 of Table A.1, this corresponds to a proportion of correct answers
significantly higher than P0 = 1/3 at the 5% significance level.
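The comparison with the tabulated value can also be expressed as an exact one-tailed probability. The sketch below assumes a hypothetical triangular test with 12 assessors, 8 of whom identify the odd sample; the function name is illustrative only.

```python
from math import comb

def triangle_test_p(correct: int, n: int) -> float:
    """One-tailed p-value P(X >= correct) under H0: P = 1/3."""
    p0 = 1 / 3
    return sum(comb(n, i) * p0**i * (1 - p0)**(n - i) for i in range(correct, n + 1))

# Hypothetical result: 12 assessors, 8 correct identifications.
p_value = triangle_test_p(8, 12)
print(f"p = {p_value:.4f}; reject H0 at the 5% level: {p_value <= 0.05}")
```

With these assumed figures the proportion of correct answers is significantly higher than 1/3, whereas 7 correct answers out of 12 would not be.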
The null hypothesis is that it is not possible to distinguish between the products. In
this case, the probability of identifying the sample that is identical to the reference
sample is equal to P0 = 1/2. In statistical terms, the null hypothesis H0 is expressed by P = 1/2.
The test is one-tailed. The test supervisor wants to know whether it is possible to distinguish
between the two products, so he will reject the null hypothesis in favour of the
alternative hypothesis P > 1/2 if the number of correct answers is greater than or equal
to the number in column 4 of Table A.1 corresponding to the 5% significance level.
The null hypothesis is P = P0 = 1/10. The test is one-tailed and the alternative hypothesis
is P > 1/10. The number of correct answers is compared with the corresponding number
in column 5 of Table A.1.
The numbers of "A" responses and "not A" responses are summed separately for samples
known to be "A" by the sensory analyst, and for those known to be "not A", giving a 2 x 2
table.
Fisher's "exact" test is used to determine whether the proportions of "A" and "not A" responses are
different for the two types of sample.
The test is one-tailed, with the null hypothesis being that the two proportions are equal
and the alternative hypothesis that the proportion of "A" responses is higher for samples
known to be "A".
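The one-tailed Fisher's exact test on such a 2 x 2 table can be sketched as follows. The layout of the table (a, b for samples known to be "A"; c, d for samples known to be "not A") and the counts used are assumptions made for this illustration.

```python
from math import comb

def fisher_one_tailed(a: int, b: int, c: int, d: int) -> float:
    """One-tailed Fisher exact p-value for the 2 x 2 table

                      response "A"   response "not A"
    sample "A"              a               b
    sample "not A"          c               d

    i.e. the probability, with all margins fixed, of at least `a`
    "A" responses among the samples known to be "A".
    """
    row_a = a + b            # samples known to be "A"
    col_a = a + c            # total "A" responses
    total = a + b + c + d
    upper = min(row_a, col_a)
    favourable = sum(
        comb(col_a, x) * comb(total - col_a, row_a - x)
        for x in range(a, upper + 1)
    )
    return favourable / comb(total, row_a)

# Hypothetical counts: 8 of 10 "A" samples called "A", 3 of 10 "not A" samples called "A".
p_value = fisher_one_tailed(8, 2, 3, 7)
print(f"p = {p_value:.4f}")
```

A p-value at or below 0.05 would lead to rejection of the hypothesis of equal proportions at the 5% level.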
In discrimination tests, "no difference" answers may be permitted. However, it can be stipulated
that these are not allowed (the forced choice technique). In that case, the
responses of assessors who were in some way able to perceive a difference are used.
The disadvantage is that it may upset assessors who honestly wish to record "no
difference".
If "no difference" answers are allowed, the number of assessors responding "no difference" is
reported and the statistical analysis uses only the results of those who reported a difference.
Conclusions are then expressed as relating only to the
assessors who expressed a preference or reported a difference.
Checks should be carried out to see if there are any systematic effects in relation to, for
example:
6.3.1 General
The selection of a statistical method for sensory analysis with any of the tests listed in 5.3
depends on the purpose of the test and the number of products tested. This section
provides information on the statistical methods used. For further details in the specific
context of each test, the relevant statistical textbooks should be consulted or the advice of a
statistician should be sought.
6.3.2 Ranking
The results obtained for a product type can be summarised as frequencies for each
category. Then, the chi-square test (χ²) can be used to compare the distributions of two or
more types of a product in the different categories, i.e. to test the null hypothesis that the
distributions are the same against the alternative hypothesis that the distributions are
different.
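The chi-square comparison of category frequencies can be sketched as below. The frequency table is hypothetical (two product types, three categories), and the sketch assumes that no row or column total is zero.

```python
def chi_square_statistic(table: list[list[int]]) -> tuple[float, int]:
    """Return the chi-square statistic and degrees of freedom for a
    products x categories table of observed frequencies."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected frequency if all products share the same distribution.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof

# Hypothetical frequencies: rows are product types, columns are categories.
observed = [[10, 20, 30],
            [20, 20, 20]]
stat, dof = chi_square_statistic(observed)
print(f"chi-square = {stat:.3f} with {dof} degrees of freedom")
```

With 2 degrees of freedom the tabulated 5% critical value of χ² is 5.991, so for these assumed frequencies the hypothesis of identical distributions would not be rejected.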
When samples have been ordered by several assessors as in 5.3.2, statistical tests can be
performed to determine whether the samples are significantly different (rank sum tests).
Tests can also be performed to determine whether a given sample ranks significantly
higher or lower than the other samples.
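One commonly used rank sum test is the Friedman test; this standard does not name the specific test, so its use here is an assumption. The sketch below computes the Friedman statistic from hypothetical rankings of three samples by five assessors, assuming that no ties occur.

```python
def friedman_statistic(rankings: list[list[int]]) -> tuple[float, list[int]]:
    """Return the Friedman statistic and the rank sum of each sample.

    `rankings` holds one list per assessor, giving the rank (1..k)
    that the assessor assigned to each of the k samples, without ties.
    """
    n = len(rankings)        # number of assessors
    k = len(rankings[0])     # number of samples
    rank_sums = [sum(col) for col in zip(*rankings)]
    statistic = 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)
    return statistic, rank_sums

# Hypothetical rankings: 5 assessors, 3 samples.
rankings = [
    [1, 2, 3],
    [1, 2, 3],
    [1, 3, 2],
    [2, 1, 3],
    [1, 2, 3],
]
stat, rank_sums = friedman_statistic(rankings)
print(f"rank sums = {rank_sums}, Friedman statistic = {stat:.2f}")
```

The statistic is then compared with the critical value given in the relevant statistical tables for the number of assessors and samples concerned.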
When more than one sample is classified, a non-parametric method should be used to
compare the distributions obtained.
If the data meet the scoring conditions, either as is or after being transformed, then the
methods in 6.3.6 can be used.
6.3.6 Score
If only two samples are involved and the hypothesis of normality of the distribution of the
scores is reasonable, a t-test can be used (see ISO 2854). If the scores are obtained from
more than two samples, the normal procedure is the analysis of variance.
If the distribution of scores for each sample appears to be non-normal, the use of
distribution-free methods may be useful.
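For the two-sample case, a t statistic can be sketched as follows. The sketch assumes a paired design, in which each assessor scores both samples; this is an assumption of the example and is not stated in this standard. The scores are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(scores_a: list[float], scores_b: list[float]) -> tuple[float, int]:
    """Return the paired t statistic and its degrees of freedom."""
    differences = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(differences)
    # Standard error of the mean difference, using the sample standard deviation.
    standard_error = stdev(differences) / sqrt(n)
    return mean(differences) / standard_error, n - 1

# Hypothetical scores given by 6 assessors to samples A and B.
scores_a = [7, 6, 8, 5, 7, 6]
scores_b = [5, 6, 6, 4, 6, 5]
t_value, dof = paired_t_statistic(scores_a, scores_b)
print(f"t = {t_value:.3f} with {dof} degrees of freedom")
```

For 5 degrees of freedom the tabulated two-tailed 5% critical value of t is 2.571, so with these assumed scores the difference between the samples would be declared significant.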
7. BACKGROUND
ISO 6658:2005, Sensory analysis - Methodology - General guidance
ANNEX A
(INFORMATIVE)
STATISTICAL TERMS
The null hypothesis is generally one that states that there is no difference between
products with respect to the strength of a characteristic (or that there is no preference
for one of them).
The alternative hypothesis is the clearly stated hypothesis that will be accepted if the null
hypothesis is rejected. If the null hypothesis H0 is P = P0, the alternative hypothesis
H1 can be two-tailed (P ≠ P0) or one-tailed (e.g. P > P0). In 6.2.2, examples of both types are
given.
When analysing the results of a test, there are two possible conclusions: either the null
hypothesis is not rejected, or it is rejected in favour of the alternative hypothesis.
Since any test is conducted by a limited number of testers, the conclusion that rejects
the null hypothesis (in favour of the alternative hypothesis) implies a risk. The
significance level is the probability (or maximum value of the probability) of rejecting
the null hypothesis when it is true. This is called the "alpha risk".
The classical logic of significance testing requires that a decision be made in advance
regarding the acceptable alpha risk. Generally, the preset value for the significance level is
0.05 (5%) or 0.01 (1%). Most statistical tables used to interpret test results include these
two significance levels.
It is important to note that the null hypothesis can be rejected at the "5% level" but not at
the "1% level".
If the null hypothesis is rejected at the "1% level", it is in fact also rejected at the
"5% level". This explains why the expressions "significant" for the 5% level and
"very significant" for the 1% level are sometimes used.
If the test does not lead to the rejection of the null hypothesis, this in no way proves
that this hypothesis is true. It only means that, based on the limited information
available (test with n testers), there is not sufficient reason to reject this hypothesis (at the
chosen significance level). The greater the amount of information (the larger the n), the
more justified it is to reject the null hypothesis when it is false; the efficiency of the test
increases with the number of testers participating in the test. For example, in the
case of a preference test (6.2.2) conducted with 20 raters, the null hypothesis P0 = 1/2
may not be rejected (the conclusion being that there is no significant preference for
either product), whereas, if the test had been conducted with 100 raters, a significant
preference for one of the products could have been demonstrated from the same
proportions of the two choices.
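The effect of the number of raters can be illustrated numerically. The sketch below assumes, purely for illustration, that 65% of the raters choose the same product in both cases (13 of 20 and 65 of 100) and computes the exact two-tailed p-value under P0 = 1/2.

```python
from math import comb

def two_tailed_p(votes: int, n: int) -> float:
    """Exact two-tailed p-value for H0: P = 1/2, given `votes` out of `n`."""
    larger = max(votes, n - votes)
    upper_tail = sum(comb(n, i) for i in range(larger, n + 1)) / 2**n
    return min(1.0, 2 * upper_tail)

# Same hypothetical proportion (65%) observed with 20 and with 100 raters.
p_20 = two_tailed_p(13, 20)
p_100 = two_tailed_p(65, 100)
print(f"20 raters:  p = {p_20:.4f}; significant at 5%: {p_20 <= 0.05}")
print(f"100 raters: p = {p_100:.4f}; significant at 5%: {p_100 <= 0.05}")
```

With these assumed figures the null hypothesis is not rejected for 20 raters but is rejected for 100 raters, although the proportions of the two choices are identical.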
The type two error (which depends on the chosen significance level) is the probability
(denoted by β) of wrongly failing to reject the null hypothesis when the specified
alternative hypothesis is actually true.
If the null hypothesis and the alternative hypothesis can be defined with the values of
one parameter as in discrimination tests (paired comparison test, triangular test, duo-trio
test, etc.), the type two error can be calculated based on this parameter. For tests where
the null hypothesis and the alternative hypothesis cannot be defined using the values of
one parameter (evaluation tests, classification), it is generally not possible to calculate
the type two error.
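For a discrimination test, the calculation of the type two error from a single parameter can be sketched as follows. The example assumes a triangular test (P0 = 1/3) and a hypothetical alternative value P1 = 0.6 for the true proportion of correct answers; neither the value of P1 nor the function names come from this standard.

```python
from math import comb

def upper_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def critical_value(n: int, p0: float, alpha: float = 0.05) -> int:
    """Smallest number of correct answers c with P(X >= c | p0) <= alpha."""
    for c in range(n + 2):   # c = n + 1 gives an empty tail, so a value is always returned
        if upper_tail(c, n, p0) <= alpha:
            return c
    raise AssertionError("unreachable")

def type_two_error(n: int, p0: float, p1: float, alpha: float = 0.05) -> float:
    """Probability of not rejecting H0: P = p0 when the true value is p1."""
    c = critical_value(n, p0, alpha)
    return 1 - upper_tail(c, n, p1)

# Hypothetical alternative P1 = 0.6, triangular test with 12 and with 48 assessors.
beta_12 = type_two_error(12, 1 / 3, 0.6)
beta_48 = type_two_error(48, 1 / 3, 0.6)
print(f"n = 12: beta = {beta_12:.3f}")
print(f"n = 48: beta = {beta_48:.3f}")
```

Under these assumptions the type two error decreases markedly as the number of assessors increases.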
Table A.1 column headings (the tabulated values are not recoverable here):
1. Number of evaluators
2. Paired comparison test (two-tailed)
3. Triangular test
4. Duo-trio test and paired comparison test (one-tailed)
5. Two-out-of-five test
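Values of the kind tabulated for these four tests can be derived from the exact binomial distribution. The sketch below computes, for a given number of evaluators, the minimum number of votes or correct answers needed at the 5% level; it illustrates the principle only, and its output has not been checked against Table A.1 of this standard. The function names are hypothetical.

```python
from math import comb

def upper_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def min_count(n: int, p0: float, alpha: float = 0.05, two_tailed: bool = False):
    """Smallest count that is significant at level alpha, or None if
    even n out of n is not significant."""
    factor = 2 if two_tailed else 1   # a two-tailed test doubles the tail probability
    for c in range(n + 1):
        if factor * upper_tail(c, n, p0) <= alpha:
            return c
    return None

print("n  paired(2-tailed)  triangular  duo-trio/paired(1-tailed)  two-out-of-five")
for n in (5, 10, 20):
    row = [
        min_count(n, 1 / 2, two_tailed=True),
        min_count(n, 1 / 3),
        min_count(n, 1 / 2),
        min_count(n, 1 / 10),
    ]
    print(n, ["-" if c is None else c for c in row])
```

A dash indicates that, for that number of evaluators, no result can reach significance at the chosen level.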
ANNEX B
BIBLIOGRAPHY
13. NTP ISO 8586-1, Sensory analysis - General guidance for the selection,
training and monitoring of assessors - Part 1: Qualified assessors
14. NTP ISO 8586-2, Sensory analysis - General guidance for the selection,
training and monitoring of assessors - Part 2: Experts
17. PNTP ISO 8589, Sensory analysis - General guidance for test room design.
23. Roessler, Pangborn, Sidel and Stone, J. Food Science, 43, 1978, p. 940.