G.W. Hiddink: Educational Multimedia Databases
G.W. Hiddink
2001
Ph.D. thesis
University of Twente; Twente University Press
ISBN 9036516773
EDUCATIONAL
MULTIMEDIA DATABASES
DISSERTATION
by
Five years ago, I embarked on a journey towards a new country. I did not know much about that country, not even where it was or how to find it. I only knew that this country would provide fertile soil for my academic ambitions.
The end of the journey is now in sight. Research questions have been posed,
answers have been sought, and research results have been documented and reported
upon. Doing these activities kept the train in motion. Without realising it, the
destination of the journey came closer with each step. Now it is time to disembark
and to explore the country.
I would like to thank my fellow Idylle researchers for joining me on (parts of)
the journey. Some of you have chosen a different destination or direction to travel
to, but nevertheless your company made the journey more pleasant.
I would also like to thank my supervisors Henk Blanken and Pløn Verhagen for
helping me to find ways to reach the destination; roadmaps of this kind of journey
are difficult or impossible to obtain, so their help was much appreciated.
Also, many thanks to my promotors Peter Apers and Jef Moonen for casting a critical eye, and for checking whether the train was indeed heading towards the chosen destination surely and efficiently.
And last but not least, my gratitude to Joyce. Our roads crossed and then
merged. Thanks for sharing this and many other destinations with me.
Contents
Preface
List of tables
List of figures
1 Introduction
1.1 Computers in education
1.2 Multimedia databases in education
1.2.1 Multimedia
1.2.2 Educational Multimedia
1.2.3 Databases in education
1.2.4 Searching Multimedia Documents and Metadata
1.3 How to store multimedia learning material
1.3.1 Metadata
1.3.2 Architecture
1.4 Research Questions
1.4.1 Evolution of the problem statement
1.5 About this thesis
2.3.3 The Media Debate
2.3.4 The Hypermedia Myth
2.3.5 Discussion
2.3.6 Conclusions
4 Theoretical Framework
4.1 Introduction
4.2 The simple ULM Model
4.2.1 Model
4.2.2 Nesting ULMs
4.2.3 The size of ULMs
4.2.4 Context Adapters
4.3 Summary on the ULM model
4.4 Reusability
4.4.1 Aspects of reuse
4.4.2 Graphical representation
4.5 Conclusion
5 Access and Retrieval Methods
5.1 Introduction
5.1.1 Relational query languages
5.2 Adding Data
5.2.1 Annotating Metadata
5.2.2 Advantages of using metadata
5.3 Extracting Data
5.3.1 Keyword Indexing
5.3.2 Form and Movement recognition
5.3.3 Formal Feature Extraction
5.4 Document Structure
5.5 Conceptual Structure
5.5.1 Knowledge Networks
5.5.2 Naming Hierarchies
5.6 Generic techniques
5.6.1 Inference Networks
5.6.2 Document similarity
5.7 Discussion
5.8 Educational Metadata
5.8.1 Dublin Core
5.8.2 The IEEE Metadata set
5.8.3 “Voluntary Labeling” problem
5.9 Conclusion
6 A Prototype Architecture
6.1 Introduction
6.2 Requirements
6.2.1 Functional requirements
6.2.2 External Interface Requirements
6.2.3 Performance Requirements
6.2.4 Design Constraints
6.3 Common Architectures
6.3.1 Web-enabled databases
6.3.2 Accessing databases via the World Wide Web
6.4 Prototype Architecture
6.5 Implementation
6.5.1 Software components
6.5.2 Generating HTML
6.5.3 Encoding Interaction
6.5.4 Scalability
6.6 Review: Inside or Outside?
6.7 Conclusions
8.5.2 Validity of the distance measure
8.5.3 The effect of the weight vector
8.5.4 Usability of the distance measures
8.5.5 Factors
8.6 Conclusion
References
Summary
Samenvatting (summary in Dutch)
Appendices
A.5.1 Labeling system
A.5.2 Search facilities
A.6 Optical Database Project
A.6.1 Labeling technique
A.7 Eisenhower National Clearinghouse
A.7.1 Labeling system
A.7.2 Search Interface
A.8 The NEEDS database
A.8.1 Search Interface
A.9 ARIADNE Knowledge Pool
A.9.1 Search Interface
A.10 ELECTRA
A.10.1 Search Interface
A.11 Explorer
A.12 Search Interface
A.13 TeLeTOP
A.14 Online databases
A.14.1 Pathlore PHOENIX Web
A.14.2 PedagoNet
A.15 Conclusion
B Questionnaire
C Instruments
C.1 Invitation letter
C.2 Experiment manual
C.3 The IEEE Metadata Standard
C.4 Evaluation form
D Data
E Proofs
E.1 The WTD measure is a distance
E.2 The WED measure is a distance
F Glossary
Index
List of Tables
7.1 An example metadata field and the use of the rank r(x) function.
7.2 Another example metadata field and its rank r(x) function.
7.3 Example of how a teaching conception can be built up using the five dimensions of Samuelowicz and Bain.
7.4 The five characteristics of learning material that have been selected for the experiment.
7.5 The seven variables and the number and type of questions that measure them.
7.6 Distribution (both in numbers and percentage of returned questionnaires) of respondents across the faculties.
7.7 Pearson’s correlation between case-based questions and attitude questions.
7.8 Reliability of the answers, 48 subjects.
7.9 Characteristics that have a significant mean difference between two contexts.
7.10 Correlations between characteristics of learning material and dimensions of teaching conceptions.
8.1 Initial list of factors that are expected to play a role in determining whether a ULM is useful or not.
8.2 Fitness of metadata fields to serve as metric space.
8.3 Specification of the ULM profiles for the validation experiment.
8.4 The actual ULMs and their metadata values that were used in the experiment.
8.5 The framework for the design of the case descriptions.
8.6 The ULMs (vertically) and their fitness for particular cases (horizontally).
8.7 The list of search results: six ULMs varying from ‘useful’ to ‘useless’ for each of the six cases.
8.8 The modified cases (vertical) with six ULMs, both useful and useless (horizontal).
8.9 Spearman’s correlation coefficient for the relationship between the subject’s score S and the three distance measures EWD, WED, and WTD.
8.10 Correlations between the weighted Euclidean distance and the test subjects’ scores with different weight steps (N=90).
List of Figures
8.1 Search form that was designed to be used during the experiments.
8.2 An example of the ‘search results’ screen.
8.3 An example of a ‘metadata overview’ screen.
8.4 The correlations between the weighted Euclidean distance and the test subjects’ scores with different weight steps (N=90).
8.5 Histogram of the times a factor was relevant to a subject (N=90).
A.1 Growth of the GEM database.
A.2 Overview of all projects and their interrelationships.
Chapter 1
Introduction
could be programmed by creating menu structures (Moonen, in press).
In the early years a lot had yet to be learned about the design and use of (educational) software: few generic software components were used, each program was written from scratch, and many programs used proprietary file formats, so that exchanging data between programs was difficult or impossible. Presently, however, many course environments are implemented as software modules (using CGI technology, Active Server Pages, or Java servlets, for example) of a World Wide Web server that stores its data in a database. Examples of these systems are Blackboard¹, WebCT², and TeleTOP³. Using these course environments, learning materials can be offered on the Web. The materials themselves are authored using common word processors, presentation tools, HTML authoring tools and other user-friendly software. These materials are then “uploaded” to the web server, which stores them in a database for later retrieval.
Ideas and concepts for storing learning materials in a database have been proposed since the beginning of the nineties (Persico, Sarti, & Viarengo, 1992). One of the major advantages explored was the opportunity to reuse learning materials (Olimpo, Chioccariello, Tavella, & Trentin, 1990; Rada, 1995; Sarti & Van Marcke, 1995, August). Although the opportunities exist, actual reuse is very hard to achieve. The Ariadne project⁴, for example, has the motto “share and reuse”; it turned out, however, that many teachers wanted to reuse, but only few wanted to share (Ariadne, 1999). A project that achieved “reuse by design” was the Optical Database project (ODB) (Bestebreurtje & Verhagen, 1992). In this project, the curricula (using traditional media) on cheese-making were collected from vocational schools at various levels, and this material was re-engineered into components (learning objects) consisting of video fragments on a videodisc and computer-based instruction software, so that each school could reuse those components that suited its needs (Verhagen & Bestebreurtje, 1995), in a form that was appropriate for the educational level of that school.
Due to the exponential growth of the World Wide Web since the mid nineties,
the world-wide availability of easily accessible learning materials has sparked the
re-emergence of these “old” concepts in the late nineties, and these concepts are
presently being further developed to generate knowledge and insights into storing
and retrieving learning materials.
This thesis tries to contribute to this development by refining the concept of a Unit of Learning Material (ULM), by exploring factors that inhibit the reuse of these materials, and by exploring novel ways of retrieving relevant learning materials from large databases of learning material.
¹ www.blackboard.com
² www.webct.com
³ teletop.edte.utwente.nl
⁴ Alliance of Remote Instructional Authoring and Distribution Networks for Europe; see also http://ariadne.unil.ch
1.2.1 Multimedia
The term “multimedia” can have many different meanings, but this thesis adopts the popular meaning of information that is represented in multiple ‘media’ simultaneously, for example text, motion video, and audio.
During the eighties, some Personal Computers (PCs) were able to play multimedia data; however, they needed dedicated hardware such as videodisc players and graphics overlay cards. These videodiscs contained analogue video fragments of various lengths (Verhagen, 1992) and still pictures, which were “overlaid” onto the computer screen by a special overlay card, without intervention of the computer itself.
During the nineties, however, several technological advancements have enabled PCs to handle multimedia data without additional hardware:
The average CPU speed has increased from about 16 MHz in 1990 to over one GHz (1000 MHz) in 2000. This is more than sufficient computing power to decode and display MPEG, QuickTime and RealVideo⁵ streams in real time without additional hardware (and thus without extra costs). The term “real-time” indicates that the movements in the video are perceived as smooth, and that the audio runs without interruptions.
⁵ See http://www.real.com
Personal Computers are often used to play video games, and at least one positive aspect of this is that the video cards used in PCs have also advanced tremendously during the nineties. The first graphical cards (i.e. cards that can display not only text but also images) had 256 KB of memory; nowadays 8 MB (32 times as much) is not unusual. These cards are optimized to transfer data as quickly as possible from the computer’s memory to the video memory that resides on the card.
The operating systems have also improved in displaying video: the Microsoft Windows series has incorporated Microsoft’s DirectX technology, which effectively means that an application can gain (almost) direct access to the video card instead of having to go through the operating system. The X11 windowing system for Unix and its derivatives also allows an application to access the video card directly using shared memory techniques. This direct access to the video card means that applications can display images much faster.
The ‘invention’ of the World Wide Web made the Internet much easier to use, in fact so easy that non-technical people can use it as well. This caused a tremendous growth of the WWW, and also created a large demand for associated computer supplies: modems to connect to the Internet, scanners to digitize photographs and images, soundcards to digitize one’s voice, webcams to transmit motion video, and colour printers to print images received via the Internet. This large demand lowered the costs of these peripherals, so that even more people could afford them.
Due to these advancements, PCs are now able to display video frames on the screen at sufficient speed, and people can afford a computer and the auxiliary hardware to create and process multimedia data.
The impact of PCs being able to play video “out of the box” (as bought in the shop) is large: information producers can rely on the PC being able to play video, so they use more and more video fragments in their products (in the form of CD-ROMs or websites). But what about the application of multimedia in education? We will discuss this in the next section.
As the typical number of discs for an educational institution ranged from one to ten or twenty, this technology was too expensive for most institutions. Due to the technological advancements explained in the previous section, the cost of multimedia equipment has come within economic reach of many more educational institutions. So, multimedia is becoming more common in educational applications, although its growth is limited by the fact that video files are still quite large, causing download times of several minutes, which is undesirable in many educational scenarios. The rollout of Asymmetric Digital Subscriber Line (ADSL) connections (via the telephone line) and Internet via the TV cable in residential areas has already begun, offering sufficient bandwidth to receive high-quality video streams. Therefore, it can be expected that multimedia information will become more and more common, in general as well as in educational applications.
The pedagogical issues of using multimedia in education are myriad. On the one hand, multimedia seem to motivate learners, increase learning effects, provide unique opportunities to show movements, and provide insights that would otherwise have been difficult to convey to learners (Salomon, 1984; Kozma, 1991). Yet on the other hand, research results seem to indicate that if the same ‘message’ is formulated in traditional media (e.g. text) and other confounding effects are accounted for, then there is no difference in the learning effect (Clark, 1983, 1985). In practice, however, one often creates different messages when using new media, so that different learning effects may be expected. These issues and the debate that surrounds them are known as the “Media debate”, which will be discussed in more detail in Section 2.3.3.
disturb other people. However, the problem of going through directories and files persists, even if the multimedia files were put on CD-ROM.
There are several projects that further explore and test principles for storing and (re-)using multimedia learning material in databases, so that a Database Management System (DBMS) can provide controlled access to centrally stored materials. This control consists of providing high-level search capabilities (the teacher is able to compose structured queries, instead of having to examine filenames), access protection (a student is not allowed to retrieve the answers to exam questions) and ensuring data integrity. We will go deeper into this subject in Section 2.2.
Examples of these projects are the IMS project (Instructional Management Sys-
tem, see also Appendix A.2), which tries to develop methods and techniques of
instructional management systems by building and testing a prototype. The Eu-
ropean Ariadne project (see also Appendix A.9) is targeted at the development
of tools and methodologies for producing, managing and reusing computer-based
pedagogical elements and telematics supported training curricula. Finally, there
are several smaller projects that try to achieve similar goals (Hiddink, 1998).
The participation of large corporations in these projects, and the activities they undertake, indicate that there is a need for construction and implementation methodologies for educational multimedia databases. The IMS project in particular has attracted large software companies as investors, such as Microsoft, Apple Computer, Oracle and Sun Microsystems; educational companies such as Pearson Education, Eduprise.com and Asymetrix, as well as some universities, are also so-called investment members of the IMS project. The fact that the Learning Technology Standards Committee of the IEEE (Institute of Electrical and Electronics Engineers) has a working group on Learning Object Metadata standards⁶ provides further evidence that storing multimedia learning objects in a database and retrieving them using metadata is a key issue in the electronic learning environments that are presently being developed by the software industry.
sist of HTML files, presentation files (such as Powerpoint) or wordprocessor documents. A teacher who is looking for certain material types some keywords into a search engine, clicks “submit”, and is then often presented with a very long list of matching documents. How does this mechanism work?
A WWW search engine indexes the documents that are available on the Web. Crudely, it does this as follows: it assigns each document a unique number, creates a very large list of all words present in all documents, and stores for each word a list of the numbers of the documents in which that word appears. It also stores the URL each document came from. Very common words, such as ‘a’ and ‘is’, are not indexed. For example, if document 15363 at URL http://www.url.com/index.html says “This page is empty”, then the search engine adds 15363 to the document lists of the words ‘this’, ‘page’, and ‘empty’. If a user searches for the word ‘page’, then the search engine looks through the document list of the word ‘page’, finds, amongst others, document 15363, and presents its URL to the user.
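The indexing procedure described above is essentially an inverted index. The following Python sketch illustrates it under simplifying assumptions: the stop-word list contains only the two example words from the text, tokenization is a plain lower-case word split, and the class and method names (`InvertedIndex`, `add`, `search`) are hypothetical rather than taken from any real search engine.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list: the text only names 'a' and 'is' as examples.
STOP_WORDS = {"a", "is"}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Minimal sketch: maps each word to the set of document numbers containing it."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> set of document numbers
        self.urls = {}                    # document number -> URL

    def add(self, doc_id, url, text):
        """Index one document, skipping stop words."""
        self.urls[doc_id] = url
        for word in tokenize(text):
            if word not in STOP_WORDS:
                self.postings[word].add(doc_id)

    def search(self, word):
        """Return the URLs of all documents in the word's document list."""
        ids = self.postings.get(word.lower(), set())
        return [self.urls[i] for i in sorted(ids)]


index = InvertedIndex()
index.add(15363, "http://www.url.com/index.html", "This page is empty")
print(index.search("page"))    # ['http://www.url.com/index.html']
print(sorted(index.postings))  # ['empty', 'page', 'this']
```

Because only the document list of the query word is consulted, a lookup never has to scan the documents themselves; this is what makes the approach workable for a collection the size of the Web.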
This is the basic principle used by most search engines⁷. Many extensions to this mechanism exist that take into account the frequency of words across all documents: a word that is used very frequently, such as ‘this’, does not say much about a document, while an infrequently used word such as ‘ULM’ does.
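One common way to take word frequency into account is inverse document frequency (IDF), which gives a word the weight log(N/df), where N is the number of documents and df the number of documents that contain the word. The text does not say which weighting the search engines actually use, so the sketch below only illustrates this general idea on a small, hypothetical collection.

```python
import math
import re


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def idf_weights(documents):
    """Inverse document frequency log(N / df) for every word.

    N is the number of documents and df the number of documents that
    contain the word at least once, so very frequent words get a weight
    close to zero and rare words a high weight.
    """
    n = len(documents)
    df = {}
    for text in documents:
        for word in set(tokenize(text)):  # count each word once per document
            df[word] = df.get(word, 0) + 1
    return {word: math.log(n / count) for word, count in df.items()}


# Hypothetical mini-collection, for illustration only.
docs = [
    "This page is empty",
    "This page describes a ULM",
    "This is a lecture on databases",
]
w = idf_weights(docs)
print(round(w["this"], 3))  # 0.0   ('this' occurs in all three documents)
print(round(w["ulm"], 3))   # 1.099 ('ulm' occurs in one document: log 3)
```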
This principle works fine with text or HTML documents, but what about other documents? Current search engines are not able to index Microsoft Word or Powerpoint documents, for example. A search engine also cannot ‘see’ what is inside a video clip or an audio clip. Yet a teacher may be looking for a video clip that shows the manual gestures a person makes when giving a presentation. A search engine is then less helpful: it can only find an HTML page that discusses such a clip. The teacher may then have to follow a few links to finally arrive at the video.
It can thus be observed that the more multimedia documents appear on the WWW, the more difficult it becomes to find these documents using current search engines. This problem of poor retrievability can be addressed by adding metadata to multimedia learning objects, so that a search engine is able to ‘look inside’ objects for indexing purposes. We will elaborate on this in Chapter 5.
⁷ Unfortunately, the precise algorithms of the current WWW search engines are often not publicly disclosed, because these are intellectual property of the respective companies.
1.3 How to store multimedia learning material
As early as the beginning of the nineties (Olimpo et al., 1990; Persico et al., 1992) it was proposed to model learning material in certain units, Units of Learning Material (ULMs), and to store them in a database. Many variations on this model have been developed over the years: learning objects, (interactive) multimedia units, etcetera.
Many prototypes that stored multimedia learning material in databases were built in the past decade (Hiddink, 1998). The fact that the database would have to store multimedia data of variable, unknown length in its tables was not the largest problem; this problem is often solved by storing the actual multimedia data elsewhere, and inserting a reference (for example a URL or a filename) into the database tables. A more challenging problem is how to retrieve the learning material again. The field of Information Retrieval (IR) has already provided many insights into the generic problem of retrieving multimedia data, and several approaches have been devised to try to solve the problem (Grossman & Frieder, 1998; de Vries, 1999); we will discuss several approaches in Chapter 5.
1.3.1 Metadata
A database system can make use of so-called metadata to know more about the multimedia data. Metadata are information (data) about these multimedia data objects, such as: who created them, what they are about, how they should be interpreted, how they should be presented to the user when retrieved, and other properties of the data objects. If the multimedia data are stored outside the control of the DBMS as described above, then the metadata also contain a reference to where the real multimedia object can be found (such as a URL or a filename). Finding multimedia data then consists of composing a database query on the metadata, retrieving the metadata, and dereferencing the pointer to the raw multimedia data.
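The three steps above (query the metadata, retrieve it, follow the reference) can be sketched with Python's built-in `sqlite3` module. The table layout, its column names and the example URLs are hypothetical and chosen for illustration only; they are not taken from any of the metadata standards discussed in this thesis.

```python
import sqlite3

# Hypothetical metadata table: the raw multimedia data stay on a separate
# video or web server; the DBMS stores only descriptive metadata plus a
# reference (here a URL) to where the raw object can be found.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE metadata (
           id       INTEGER PRIMARY KEY,
           title    TEXT,
           creator  TEXT,
           subject  TEXT,
           format   TEXT,
           location TEXT  -- reference to the raw object, not the object itself
       )"""
)
conn.executemany(
    "INSERT INTO metadata (title, creator, subject, format, location) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("Presentation gestures", "Teacher A", "presentation skills",
         "video/mpeg", "http://video.example.org/gestures.mpg"),
        ("Introduction to SQL", "Teacher B", "databases",
         "text/html", "http://www.example.org/sql.html"),
    ],
)


def find_learning_material(conn, subject, media_format):
    """Compose a query on the metadata and return (title, location) pairs."""
    cursor = conn.execute(
        "SELECT title, location FROM metadata "
        "WHERE subject = ? AND format = ? ORDER BY title",
        (subject, media_format),
    )
    return cursor.fetchall()


for title, location in find_learning_material(conn, "presentation skills", "video/mpeg"):
    # Dereferencing step: a client would now request the raw video from
    # `location` (for example a dedicated video server); here we only print it.
    print(title, "->", location)
```

Note that the DBMS never handles the video itself; it only returns the stored `location`, which the client then uses to fetch the stream from wherever the raw data are kept.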
Metadata for learning material has been a research focus for the past decade, although the term “metadata” was not used at first. Instead, researchers discussed “attributes for educational database objects” (Olimpo et al., 1990; Persico et al., 1992) or “labels” to be assigned to units of learning material. During the nineties, many groups that researched learning objects (such as NIST, IEEE, IMS and ARIADNE, see Appendix A) realized that standardizing metadata would mean that educational institutions could reuse each other’s learning material, increasing the choice available to educational designers. See Hiddink (1998) for an elaborate overview of projects that researched educational learning objects. During 1998, several of these groups merged, joined forces or disappeared, and currently almost all groups have united in the IEEE P1484.12 group called Learning Objects Metadata Group. These groups are working together to standardize educational metadata, and seem to form a critical mass that may induce a world-wide exchange of learning materials based on a standard that is generally agreed upon by all parties.
Figure 1.1: Architecture of a database system that uses a dedicated video server to deliver video streams. (Diagram elements: the user, the DBMS holding the metadata, and the multimedia video server.)
1.3.2 Architecture
When storing multimedia data, the common practice is to store the “raw” data (the multimedia data streams) separately from their metadata (data about data, see Section 1.3.1), for example on a dedicated video server or on a web server; see Figure 1.1. The rationale is that a video server has been designed especially to deliver video streams. Sometimes, even the filesystem of the video server has been optimized to deliver a certain type of video stream, such as MPEG movies. A database management system has been designed for other purposes than delivering MPEG movies, so it will not be able to perform this task optimally. Sometimes it is even impossible to integrate the video delivery software into the database because this delivery software is proprietary, as for example with the RealVideo format. For these reasons, the architecture shown in Figure 1.1 is often the best (or the only) way to make multimedia materials accessible to the user.
1.4 Research Questions
The problems described above were addressed by a specific research programme of the University of Twente called ‘Idylle’: “Innovative Distributed Learning Environments”. This multidisciplinary project focused on increasing the so-called “study-ability” of academic education by introducing various telematics tools (IDYLLE, 1996). Within this framework, the current research targeted the development of methodologies for building educational multimedia databases.
The research questions were stated rather broadly in the subproject’s description (Verhagen, Blanken, Moonen, & Apers, 1996), and were refined during the first two years of the project as the researchers’ knowledge grew through reading and reasoning. This section tries to give insight into the development of the research questions.
The first issue that arises is: what are these “generic building blocks”? In earlier research, a method to structure knowledge was developed (Bestebreurtje, Verhagen, & Zwart, 1995; Bestebreurtje, 1989) so that certain reusable ‘knowledge elements’ could be identified. Using these elements, new curricula were designed, as well as multimedia learning materials that conveyed these knowledge elements to the learner; these materials were then stored on a laserdisc and tested in practice. From these tests it was concluded that the approach “is only valid for what is called ‘process technologies’ or vocational training” (Bestebreurtje et al., 1995). The current research, however, targets academic education, so a new approach is required. It was decided to try to structure elements of learning material instead of knowledge, because for academic material the instructional design step from subject matter to actual learning material is much larger than for vocational training (Hiddink, 1997). This led to the development of the so-called “Unit of Learning Material” (ULM). This model will be presented in Chapter 4.
The second issue is the reuse of materials for “multiple target groups on certain but multiple educational levels for multiple educational objectives”. As explained earlier, many researchers agree that multimedia learning material can be expensive to produce, especially when video material is involved (Verhagen & Bestebreurtje, 1994; Tan & Nguyen, 1993), and that reusing these expensive materials may reduce the total costs (Rada, 1995; Sarti & Van Marcke, 1995, August; Olimpo et al., 1990). But what properties make material (re)usable for multiple target groups, on multiple educational levels, and for multiple educational objectives? What determines this ‘reusability’ of learning materials? To resolve these issues, four research questions were formulated:
RQ1 What is an appropriate model to store and retrieve multimedia learning mate-
rial so that it can be used by multiple target groups with different information
needs?
RQ2 What factors are of influence on the reusability of learning materials that are
stored in a multimedia database?
Chapter 4 will identify, amongst others, that a factor that influences the reusabil-
ity of learning material is the search method: what methods and techniques are
used to search through the database? As the focus of this research is reusability of
learning material, a third and fourth research question can be formulated:
The answers to these research questions will bring forward many insights into
the storage and retrieval of multimedia learning material, and into software architectures
that are suited to building the applications needed to deliver multimedia learning
materials in a flexible manner.
Looking at the thesis in some more detail, Chapter 2 discusses aspects of mul-
timedia database systems relevant to two different viewpoints: a technology view-
point and an educational viewpoint.
Chapter 3 will then present the state of the art of existing educational database
systems. The major conclusion of this chapter is that most systems do not build upon
well-defined models. This thesis proposes that the functionality of these systems
can be increased if better models are used, so a theoretical framework of the ‘Unit
of Learning Material’ has been developed. Chapter 4 will discuss a model of learning
material that provides many ways to facilitate learning and to enhance browsing
and searching the database using educational metadata. Economically, one of the
most important advantages of using a database system is the opportunity it offers for
reuse of learning material. So, to explore the theoretical context of this concept, the
chapter will also present a model of the factors influencing reuse of learning material,
thus providing a clear view of how actual reuse can be increased.
After that, Chapter 5 will focus on making units of learning material accessi-
ble. The techniques that have been developed so far will be explored, and from
these techniques one will be selected: annotating learning objects with educational
metadata ‘by hand’ as opposed to doing this automatically, and using the metadata
fields in database queries, search forms and advanced retrieval techniques. The
chapter will discuss the development of educational metadata, and show how it can
be used.
Then, Chapter 6 will describe a prototype of an educational database system
that was developed as a ‘proof of concept’ of several concepts introduced in this
thesis. It will be shown how reusability can be increased by decoupling the layout
of learning material from its structure by using a markup language known as
“XML”, and how the model of a Unit of Learning Material can help the learner
navigate through the database.
The prototype also served as a playground for research into improving the way
search results are presented. Often, these are shown as a long list ordered by
characteristics such as word frequencies and document lengths. In the research described in Chapter 7,
the use of a measure of relevance based on educational metadata is proposed. To
be more specific, the measure of relevance is based on the teachers’ preferences for
certain characteristics (metadata fields) of learning material. The chapter describes
the mathematical basis of a distance measure, and the nature of the “preference for
characteristics of learning material”: what factors influence this preference?
In Chapter 8 the distance measure is tested to see if it is able to predict, to
some extent, the relevance of search results to teachers that are looking for learning
material.
Chapter 9 will end this thesis by presenting issues that remain open for future
research, and by providing some reflection upon the conducted research.
Chapter 2
Multimedia Databases in Education
2.1 Introduction
The term “database” can mean different things. In general, a “database” is a col-
lection of related data (Elmasri & Navathe, 1989, p. 3) that can reside on any
digital medium such as a computer disk, a computer’s main memory, or on a CD-
ROM. The collection of data is often managed by a Database Management System
(DBMS). Without a DBMS, the database itself may lose its meaning; it is no
longer clear how to interpret the data, and the “organized collection” is reduced to
meaningless files with ‘unreadable’ binary data. We will therefore often refer to this pair
of the database and its DBMS simply as a “database system”.
A special kind of database is the so-called “relational database”, which is man-
aged by a “Relational Database Management System” (RDBMS). This type of
database finds its roots in a well-known article by Codd (1970) who developed a
relational algebra that formed the basis for relational databases and the Structured
Query Language (SQL). Data in this type of database are represented in tables,
where the rows represent data records and the columns represent record fields. Almost
all current commercial database systems have adopted this model because it
allows database applications to access the data through a formal definition, thus
abstracting from the way the data are physically stored and accessed.
The DBMS provides structured access to the data using a
query language, such as SQL. However, additional software is still needed to
process the data, or to present it in graphical or tabular form. The software needed
to do this is called the “database application”. For the communication between
the database application and the DBMS, special computer languages have been developed.
The first few DBMSs each had their own language, so that
a particular database application ‘A’ could only communicate with a DBMS of
brand ‘X’. If brand ‘Y’ introduced a better, faster, and/or cheaper system,
application ‘A’ could not use it. To solve this problem, Microsoft Corporation
developed an Application Programming Interface (API) called “ODBC”, which stands for
Open Database Connectivity¹; it has become the de facto standard for communicating with databases.
A Java version has been derived by Sun Microsystems for Java applications, called
“JDBC”, which stands for Java Database Connectivity². If a database application
uses one of these interfaces, then it is very easy to move from DBMS brand ‘X’
to DBMS brand ‘Y’. Often, the only changes needed are configuration details and
some SQL syntax.
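The portability argument can be illustrated with Python’s DB-API 2.0 (PEP 249), which plays a role for Python programs comparable to that of ODBC and JDBC. The sketch below is a minimal illustration under assumptions: the function `count_rows`, the factory `sqlite_factory`, and the table `courses` are hypothetical, and SQLite merely stands in for “brand X”. The application logic uses only the common interface (`cursor`, `execute`, `fetchone`, `close`), so moving to another DBMS would mean supplying a different connection factory.

```python
import sqlite3
from typing import Callable


def count_rows(connect: Callable, table: str) -> int:
    """Count the rows of a table using only DB-API 2.0 calls.

    `connect` is any zero-argument factory returning a DB-API connection;
    swapping the DBMS means swapping this factory (hypothetical design).
    """
    conn = connect()
    try:
        cur = conn.cursor()
        # Table names cannot be bound as query parameters, so `table` is
        # assumed to come from trusted application code.
        cur.execute(f"SELECT COUNT(*) FROM {table}")
        (n,) = cur.fetchone()
        return n
    finally:
        conn.close()


def sqlite_factory():
    """SQLite-specific configuration: an in-memory database with sample rows."""
    conn = sqlite3.connect(":memory:")
    # Connection.execute/executemany are sqlite3 shortcuts, used only here.
    conn.execute("CREATE TABLE courses (name TEXT)")
    conn.executemany("INSERT INTO courses VALUES (?)", [("Databases",), ("Statistics",)])
    return conn


print(count_rows(sqlite_factory, "courses"))  # prints 2
```

Drivers may still differ in details such as the placeholder style for query parameters (PEP 249 allows, for example, `?` as well as `%s`), which corresponds to the SQL syntax that may need adapting when switching brands.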
The total system of database, DBMS and database application will be called
“database system” throughout this thesis. The architecture is pictured in Figure 2.1.
If the database application has been designed to store, retrieve, and deliver learning
materials then we will call the system an “educational database system”.
This thesis will take the currently available Database Management Systems as
a starting point. This research project will experiment with innovative ways to
implement search algorithms in an educational setting, so a working, stable DBMS
is needed for experiments. On top of this DBMS, new methods will be explored.
Therefore, this thesis will not concentrate on the internals of DBMSs (such as
query optimizers, or logical and physical storage methods). Instead, it will take a
practical viewpoint with regard to databases and explore techniques that are needed
to build the total system of database, DBMS, and database application as depicted
in Figure 2.1.
When implementing an educational database many issues arise: how will the
teachers and the students cope with the technological innovation (Russell & Bradley,
1997; Tobin & Dawson, 1992; Riel, 1994; Kromhout & Butzin, 1993; Becker,
1999)? What will the total “cost of ownership” be and what will be the (economic)
benefits? What is needed to re-engineer traditional courses into computer-
supported courses? What needs to be done to train the end-users (Collis, 1998b)?
What methods exist to learn more about the degree of success of an implementa-
tion? Answers to many of these questions have been sought elsewhere and in spite
of their relevance they are not within the scope of this thesis.
This chapter will discuss some issues that are very important to the current
research: labeling learning material and how an educational database can be used
by learners. But first some aspects of database technology will be discussed.
¹ See http://www.microsoft.com/data/odbc/
² See http://www.sun.com
[Figure 2.1: Database Application, ODBC/JDBC, Database]
A database not only contains the data, but also data about data (so-called metadata),
such as the data definitions, the relationships between data, etcetera. All information
that is needed to interpret, access, and manage the data is contained within
the database. This centralization can prevent programming errors that would otherwise
arise from having this information distributed over several programs and files.
Suppose, for example, that personnel data are not stored in a database but in a
file on a file system. Let us say that the “address” field is 40 characters long, and
that data typists have noticed that some addresses do not fit properly into this size.
The company decides that the address field size has to be increased to 50 characters,
which has as a consequence that the birthdate field, which started at position 60, now
starts at position 70. Every program that reads the file must then be changed by hand.
In a database, by contrast, this modification is reflected in the metadata,
so that access software immediately knows that the birthdate field now starts at
position 70. The data the DBMS manages thus contain all information needed to
interpret and manage them, and no additional information is stored elsewhere: the DBMS is
self-contained.
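The sketch below shows, under stated assumptions, why keeping the layout as metadata helps: field positions are computed from a stored list of field widths instead of being hard-coded in every program. The 20-character `name` field, the 8-character `birthdate` field, and the helpers `offsets` and `parse` are hypothetical; only the address widths (40 and 50) and the birthdate start positions (60 and 70) come from the example above. Positions are counted from 0.

```python
from itertools import accumulate

# Hypothetical record layouts: (field name, width in characters).
# Only the address widths and the resulting birthdate start positions
# follow the example in the text; the other widths are assumptions.
LAYOUT_V1 = [("name", 20), ("address", 40), ("birthdate", 8)]
LAYOUT_V2 = [("name", 20), ("address", 50), ("birthdate", 8)]


def offsets(layout):
    """Derive each field's (start, end) position from the stored widths."""
    ends = list(accumulate(width for _, width in layout))
    starts = [0] + ends[:-1]
    return {name: (s, e) for (name, _), s, e in zip(layout, starts, ends)}


def parse(record: str, layout) -> dict:
    """Cut a fixed-width record into fields using only the layout metadata."""
    return {name: record[s:e].strip() for name, (s, e) in offsets(layout).items()}


print(offsets(LAYOUT_V1)["birthdate"])  # (60, 68)
print(offsets(LAYOUT_V2)["birthdate"])  # (70, 78)

record = "Jan".ljust(20) + "Main Street 1".ljust(50) + "19700101"
print(parse(record, LAYOUT_V2)["birthdate"])  # 19700101
```

Only the layout description changes when the address field grows; `parse` itself needs no modification, which is the benefit the DBMS provides by keeping such metadata in one place.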
Data Abstraction
A database management system (DBMS) provides users with a conceptual representation
of the data. The user is thus not bothered with technical details of how
the data are stored; instead, data models are expressed in logical concepts such as
objects, their properties, and their relationships with other objects. The user can
use these logical concepts in queries that read naturally, e.g. “select name,
income from persons where (income < 40000) and (children > 3)”, which would
retrieve the name and income of those persons that earn
less than 40000 and have more than 3 children. The user is, fortunately, totally
unaware of how the table ‘persons’ is stored on the file system.
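As a minimal, runnable illustration, the query above can be executed against an in-memory SQLite database using Python’s standard `sqlite3` module. The three persons and their incomes are invented sample data; note that the program never states where or how the table is stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # storage details are left to the DBMS
conn.execute("CREATE TABLE persons (name TEXT, income INTEGER, children INTEGER)")
conn.executemany(
    "INSERT INTO persons VALUES (?, ?, ?)",
    [("Anna", 35000, 4), ("Bert", 52000, 5), ("Carla", 28000, 2)],  # sample data
)

# The query is phrased purely in terms of the logical table and its fields.
rows = conn.execute(
    "SELECT name, income FROM persons WHERE (income < 40000) AND (children > 3)"
).fetchall()
print(rows)  # [('Anna', 35000)]
```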
This property eases the task of the programmer writing the educational
database application. He does not have to worry about writing data to files, organizing
the files, creating programming code for maintaining indexes to find data
again, making sure the data stay consistent, etcetera. Instead, the programmer just
writes code to connect to a DBMS, issue a query, and read the result tables. This
code, furthermore, is very common, so software components that perform these
tasks are readily available. Writing applications that access very large collections of
data thus becomes much easier. The fact that pre-built, well-tested components are
used reduces the number of programming errors that have to be resolved before the
application is ready for use.
Multiple Views
A DBMS is able to present multiple views to support different data needs of dif-
ferent users. The data needs to be entered only once, avoiding redundancy and
inconsistency. This also means that a DBMS must ensure that only one user at a
time can modify a given data item: if two teachers want to add a grade to the same
student’s record simultaneously, the results can otherwise be unpredictable.
To be able to define multiple views is a requirement for educational database
systems: the teacher should be able to view the structure of the learning materials
in all details, and modify them if needed. The student, however, should only be
able to view as much structure as is didactically appropriate. Also, for different
students there could be different views: some students may need to go through a
fixed schedule of learning materials step by step; others, perhaps more advanced,
may be allowed to browse the learning material by subject or search through the
database by subject keyword. Thus, the same materials are accessed from multiple
views.
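A teacher view and a student view over the same stored data can be sketched with SQL views. The table `learning_objects`, its columns (including a hypothetical `released` flag), the sample rows, and both view names are assumptions made for this example; SQLite is used only because it ships with Python. Because both views read the same table, each learning object is entered only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE learning_objects (
        id INTEGER PRIMARY KEY,
        title TEXT,
        subject TEXT,
        teacher_notes TEXT,
        released INTEGER          -- hypothetical flag: 1 if students may see it
    );
    INSERT INTO learning_objects VALUES
        (1, 'Relational algebra', 'databases', 'Use after lecture 2', 1),
        (2, 'Query optimisation', 'databases', 'Draft, not yet reviewed', 0);

    -- Teachers see every object with all of its fields.
    CREATE VIEW teacher_view AS SELECT * FROM learning_objects;

    -- Students see only released objects, without the teacher notes.
    CREATE VIEW student_view AS
        SELECT id, title, subject FROM learning_objects WHERE released = 1;
""")

print(conn.execute("SELECT * FROM student_view").fetchall())
# [(1, 'Relational algebra', 'databases')]
```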
Enforcing Constraints
A DBMS can be given certain constraints that must hold on the data: for example, that the
value of a database object ‘grade’ must lie in the range of 1 to 10 (in the Netherlands),
or that a course must belong to at least one department. A DBMS can
enforce these constraints by warning the user when a data entry or modification would
violate a constraint, and refusing the entry. Thus, the integrity of the data can be
guaranteed.
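The grade constraint can be declared as an SQL `CHECK` constraint, so that the DBMS itself refuses out-of-range values. This is a minimal sketch assuming a hypothetical `grades` table in SQLite; the constraint that a course belongs to at least one department is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE grades (
        student TEXT NOT NULL,
        course  TEXT NOT NULL,
        grade   REAL NOT NULL CHECK (grade BETWEEN 1 AND 10)  -- Dutch grading scale
    )
""")

conn.execute("INSERT INTO grades VALUES ('s001', 'Databases', 7.5)")  # accepted

try:
    conn.execute("INSERT INTO grades VALUES ('s002', 'Databases', 12)")
except sqlite3.IntegrityError as err:
    # The DBMS refuses the entry that would violate the constraint.
    print("rejected:", err)
```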
Search Capabilities
The database of a relational DBMS can be searched using the Structured Query
Language. As mentioned in Section 2.1, the introduction of a relational algebra al-
lows the user to state queries in terms of conceptual entities such as “courses” and
“persons” in SQL. Although this approach works fine for data that can be easily
stored in tables, such as administrative data, it does not work well with multimedia
objects (such as multimedia learning materials). The Information Retrieval disci-
pline has developed many approaches to tackle this problem, and we will discuss
the most important of these in Chapter 5. Most techniques require a DBMS to ex-
ecute the complex tasks that are often required to retrieve multimedia information,
tasks that cannot (and should not) be implemented by a file system.
of DBMSs, see Section 2.2. In the situation where multimedia objects are stored
in the file system, two access systems would interfere: the operating system’s file
access system, and the DBMS’s access system. A user with sufficient write privileges
would be able to erase or rename a multimedia file, causing the DBMS to fail
because it can no longer find the file. Also, it is tempting to make assumptions
about the location or the name of the multimedia file, for example to publish
the directory or the file onto the Web. However, if the DBMS manufacturer decides
to change the mapping of multimedia objects to the file system, then the application
breaks. This violates the data independence that Codd (1970) introduced when
he designed the relational model.
The previous section has shown what functionality a Database Management Sys-
tem offers, and what advantages a DBMS has. But what makes a database ‘educa-
tional’?
By “Educational Multimedia Database” we mean the database application
and the logical contents of a database system that allow users to store and retrieve
multimedia learning materials (for example in the form of certain Units of
Learning Material). The database application can provide many different functionalities:
it can provide Intelligent Tutoring capabilities, drill-and-practice training,
an online encyclopedia of learning materials, a course-materials database, etcetera.
Chapter 6 will go into further detail on the architecture of the database application.
As explained, an educational database is not an end-application. Instead, it is
a component of a larger whole. This ‘larger whole’ can be denoted as a digital
learning environment. It is believed that a thesis about educational multimedia da-
tabases should also cover the most important aspects of introducing digital learning
environments that utilize multimedia materials into education.
This section will also discuss some educational issues that arise when implementing
hypermedia systems in education in general. These are important issues
because introducing an educational multimedia database does not simply mean dig-
itizing the syllabi, or creating digital movies from recordings of classes. Instead,
the courses must be “re-engineered”, that is: redesigned from scratch with a new
pedagogy in mind. Section 2.3.1 will show how an educational database can be
used in education. Three pedagogical issues are important for this: learner control
(Section 2.3.2), the purpose of multiple media in education (Section 2.3.3), and
some issues on hypermedia (Section 2.3.4).
2.3.1 Use modes
In this section, three modes in which an educational database can be used by learn-
ers and teachers will be described: as an encyclopedia of learning materials, as a
courseware database, and as a presentation tool.
The learners are doing a (design) project for which they have to define
information or knowledge needs themselves, and then try to fulfill these
needs. They access the database as a kind of digital encyclopedia, the difference
from a true (digital) encyclopedia being that the material in it is specifically
designed for educational purposes, for a specific target group and a
specific educational level (or specific educational levels). In other words, the
encyclopedia is less general than a regular one.
The learners are engaged in a course that requires certain learning materials
to be studied. These materials can support drill and practice learning, but
can also contain explorative materials like simulations.
Courseware Database
Some institutions, such as the University of Twente, provide online course materials
for some courses³. Students can select a course to study and retrieve learning
material that belongs to this course.
In this mode of use, as well as the previous one, auditing information can be
generated such as: what learning material did which student examine? Which an-
swers were given by the students to questions the material posed? Which simula-
tions did the student study and did he or she reach certain objectives? This auditing
information can be logged into the same (logical) database as the database of learn-
ing material, but in some cases it may be better to create a separate database for it.
If the auditing information is managed by a different system, such as an instruc-
tional management system as specified by the IMS project (see Appendix A.2),
then the auditing information needs to be transferred from the courseware database
system to the instructional management system.
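A minimal sketch of such auditing information, assuming a hypothetical `audit_log` table with one row per student action. The column names, event types, and sample rows are illustrative and are not taken from the IMS specification; the query answers the first of the questions above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # could equally be a separate audit database
conn.executescript("""
    CREATE TABLE audit_log (
        student   TEXT NOT NULL,
        object_id INTEGER NOT NULL,   -- learning material that was used
        event     TEXT NOT NULL,      -- e.g. 'viewed', 'answered'
        detail    TEXT                -- e.g. the answer that was given
    );
    INSERT INTO audit_log VALUES
        ('s001', 1, 'viewed', NULL),
        ('s001', 1, 'answered', 'b'),
        ('s002', 2, 'viewed', NULL);
""")

# "What learning material did which student examine?"
examined = conn.execute(
    "SELECT DISTINCT student, object_id FROM audit_log "
    "WHERE event = 'viewed' ORDER BY student"
).fetchall()
print(examined)  # [('s001', 1), ('s002', 2)]
```

Keeping the log in a separate database would only change which connection the logging code writes to; transferring it to an instructional management system would then be an export from that database.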
³ http://teletop.edte.utwente.nl
Presentation Tool
Sometimes, the teacher may want to present multimedia material in the classroom.
Instead of trying to transfer the learning material from the database system to a
presentation tool such as PowerPoint, it would be easier for the teacher if the database
system could switch to a presentation mode. Teachers sometimes use
a web browser as a presentation tool, so if the database system is able to present
the learning material via the web (which is desirable from a telelearning point of
view), then no extra design effort is necessary.
One of the supposed advantages of hypermedia is the proposition that the control
learners have over their own study pace has a positive effect on the learning
results. This, however, has not yet been empirically proven. Niemiec, Sikorski,
and Walberg (1996) conclude after reviewing a large body of learner control literature:
“Although learner control has theoretical appeal, its effects on learning seem
neither powerful nor consistent”. But Hannafin and Sullivan (1995) conclude from
experiments that “learner control may be more appropriate for high-ability learners
than for low-ability ones” (p. 28), as high-ability learners appeared to choose more
instruction voluntarily, and thus performed better on post-tests. Similarly, Young
(1996) found that learners with good “self-regulated learning strategies” performed
better in learner-controlled situations.
There is, however, also a risk in precisely matching the learners’ needs: they
are not challenged to adapt to the amount of material that is offered, so they will
not learn to filter when there is too much material. This effect is called “learned
helplessness”.
Many experiments seem to show increased learning effects when learners select
and choose their own materials, but at least one confounding variable is the
Amount of Invested Mental Effort (AIME): the higher this amount of effort, the
better the learning effects are (Salomon, 1984). The amount of effort learners invest
in materials appears to depend on how difficult the learners think the learning
task is (“Perceived Demand Characteristics”) and how well they deem themselves
fit to perform this task (“Perceived Self-Efficacy”).
A possibly negative aspect of learner control is the fact that learners have to
focus on learning and on navigating simultaneously. If the navigation task is not
easy (i.e. not intuitive), then the navigation task interferes too much with the
learning task. This type of interference is called retroactive interference (Bower,
Thompson-Schill, & Tulving, 1994).
2.3.3 The Media Debate
As stated in Section 2.2.1, anyone using multimedia data may benefit from a mul-
timedia Database Management System. But what are the consequences of using
multimedia in education?
Clark (1983) wrote a paper that sparked a heated discussion among scientists
about whether or not using multiple media (perceptual channels) would increase learning.
After reviewing the literature, he concluded that effects could not be shown,
and that research results that did show effects were confounded by the newness of
the medium or by methodological flaws, or had other errors. In essence, he states
that if two identical messages are presented in different media, then there will be
no noticeable effect. As watching a movie may sometimes even cost less mental
effort than reading a book, the learning effect may even be smaller when watching a
movie due to the effects of AIME (Salomon, 1984).
His opponents, however, state that new media can bring new types of messages:
a video of a moving piston in a combustion engine can be more insightful than a
paragraph of text that describes this movement. Also, providing complementary
information via multiple “channels” can have a positive effect (Park & Hannafin,
1993). There is, however, a risk of “cognitive overload”: the learner is getting too
much information via too many channels. The point at which this occurs depends,
amongst others, on how familiar the learner already is with the presented materials
(Park & Hannafin, 1993).
Another way to utilize multimedia materials is to exploit the richness of the
media: in some situations, video sequences can show how things work better than
words. For example, a video or animation can show how a piston engine works:
when do the valves open, how is this moment determined, and at which moment is
the gas ignited. Video fragments can also illustrate social cues and gestures during
a conversation, which would have been hard to explain on paper with words or
figures.
Simulations are also a good example of exploiting modern media in education.
Using a simulation, learners can modify parameters themselves and see the results
of their actions. Medical students, for example, can inject various amounts of
medicine into a “virtual” patient, and see the effect on the heart beat rate, blood
pressure, and various concentrations of chemical substances on their screen without
harming human beings or animals.
This thesis will take notice of Clark’s warning that just using multiple perceptual
channels will not guarantee an increased learning effect. It is believed that
using multimedia learning material in education is only useful if the educational
message could not have been formulated as well without the extra medium.
2.3.4 The Hypermedia Myth
2.3.5 Discussion
2.3.6 Conclusions
From the observations made in this chapter, some conclusions can be drawn:
Learners should be ready for self-control. The literature suggests that mul-
timedia learning systems that provide a lot of learner control should be used
with learners that have sufficient meta-cognitive capabilities. An educational
database can also be used in less hyperlinked modes: as a presentation tool
or as a resource of learning materials. The precise amount of learner control
is a matter of professional expertise.
So, the students may benefit from an educational database due to the independence
of time and place, but there may also be disadvantages due to complex hyperstructures,
reduced interaction with the teacher, and inappropriate use of (hyper)media
and learner control. It is up to the expertise of the teacher to find the right
balance of these instruments. An educational database system should not interfere
with this expertise, but should support the teacher wherever possible.
Chapter 3
In the past, many projects have investigated theoretical and practical issues of edu-
cational database systems. In order to build upon the results of these projects, they
have been studied and design methodologies and principles have been extracted.
For a complete overview of the projects, the reader is referred to Appendix A; for
clarity, the abbreviated project names will be used here. This chapter will discuss
the aspects that are of most interest to the current research: the labeling methods,
the search interface, quality and validation techniques, the functionality of the sys-
tems, and the theoretical models of learning material that are used.
Definition 3.1 A metadata field $F$ is a tuple $(V, t)$, where $t$ is the title of the label
and $V$ is a set of values.
Definition 3.2 A metadata value is a set of values $v \subseteq V$, where $V$ is the set of
values of a metadata field $F$.
For example, if $V$ equals {‘primary school’, ‘secondary school’, ‘high school’,
‘university’}, then a metadata value ‘primary school’ can be assigned to learning
material to denote the school type for which it was developed.
Often $V$ is a predefined set, also called a ‘vocabulary’. Some labels, however,
are so-called ‘free text’ labels, which indicates that any text string is allowed. In these
cases, $V$ is the set of all text strings of finite length.
Throughout this thesis, the term ‘labeling system’ will be used to denote the
set $M$ of metadata fields $(V, t)$ that is attached to a learning object to characterize
it.
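Definitions 3.1 and 3.2 and the notion of a labeling system can be mirrored in a small data structure. This is a sketch under assumptions: the class `MetadataField`, the method `is_valid_value`, and the choice to represent a free-text field by `values=None` are illustrative. The fields are stored in (title, values) order rather than the $(V, t)$ order of the definition, purely for readability.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MetadataField:
    """A metadata field F = (V, t): a title t and a value set V.

    values=None represents a free-text field, i.e. V is the set of all
    finite text strings (a representation chosen for this sketch).
    """
    title: str
    values: Optional[frozenset] = None

    def is_valid_value(self, v: frozenset) -> bool:
        """A metadata value is a set v with v a subset of V (Definition 3.2)."""
        if self.values is None:
            return all(isinstance(x, str) for x in v)
        return v <= self.values  # subset test on frozensets


school_type = MetadataField(
    "school type",
    frozenset({"primary school", "secondary school", "high school", "university"}),
)
keywords = MetadataField("keywords")  # free text

# A labeling system M: the set of metadata fields attached to a learning object.
labeling_system = {school_type, keywords}

print(school_type.is_valid_value(frozenset({"primary school"})))  # True
print(school_type.is_valid_value(frozenset({"kindergarten"})))    # False
```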
Table 3.1 presents an overview of what educational fields have been used by
what projects (as the projects tend to update their metadata sets based on their
experiences, this overview provides a snapshot taken in the summer of 2000), so
that insight can be gained into metadata fields that the projects found important or
usable. Only the metadata fields that could be relevant to teachers are shown, and
fields that are too technical or not relevant considering the intended use mode of
the database system are omitted. Table 3.2 shows the more generic metadata fields
that were used by the projects.
Table 3.1: Overview of what educational fields were used by what projects.
Field ESM GEM CSTC ENC ARIADNE PEDAGO
pedagogical quality x
learning objectives x
subject x x x x
completeness x
educational function x x x x x
difficulty x x
school level x x
audience x x
duration x x
semantic density x
grade x x x
Table 3.2: Overview of what generic fields were used by what projects.
Field ESM GEM CSTC ENC ARIADNE PEDAGO
author x x x x x
abstract x x
description x x x
resources needed x x
keywords x x x
copyrights x
costs x x x x
geogr. region of origin x
that were encountered during the study. Note that although a large body of litera-
ture exists on the theory and practice of human-machine interaction, screen design,
and search methods, this literature will not be discussed here as the focus is on
getting insight into what search methods educational technology designers found
appropriate in past research projects, so that the results of the current research can
be put into a perspective.
the result list consists of only ‘perfect matches’: learning objects whose metadata
fields match all of the specified field values. Small deviations are not tolerated,
even though they may be acceptable to the user: a learning object that was developed for
first-year students may well be usable as introductory or refresher material for
third-year students.
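The strict behaviour described above can be made concrete in a few lines of Python. This sketch only illustrates exact (“perfect match”) filtering as used by the surveyed systems; it is not the relevance measure proposed in Chapter 7, and the learning objects, field names, and the function `perfect_matches` are hypothetical.

```python
# Hypothetical learning objects with a few educational metadata fields.
objects = [
    {"title": "Intro to SQL", "level": "first year", "subject": "databases"},
    {"title": "Normal forms", "level": "third year", "subject": "databases"},
    {"title": "Regression", "level": "third year", "subject": "statistics"},
]


def perfect_matches(objects, query):
    """Return only objects whose fields equal *every* queried value."""
    return [o for o in objects if all(o.get(f) == v for f, v in query.items())]


query = {"level": "third year", "subject": "databases"}
print([o["title"] for o in perfect_matches(objects, query)])  # ['Normal forms']
```

“Intro to SQL” is excluded because its level differs, even though it could serve as refresher material for third-year students.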
The main disadvantage of this is that the metadata typists may have problems
identifying representative keywords for the learning objects, or finding appropriate
values for educational metadata fields. In the library world, authors sometimes
write the keywords themselves on the cover pages of the book. Translated to the
educational world, this would mean that teachers add keywords and educational
metadata fields to the learning objects. The current research will therefore
assume that teachers enter these metadata fields.
3.4 Functionalities
Most educational database systems are a repository of learning materials that ed-
ucators can search (GEM, CSTC). Sometimes, the material itself is stored in the
database, sometimes hyperlinks are provided using URLs, and sometimes the metadata
refer to offline materials. Systems that provide hyperlinks as URLs run the
risk of pointing to non-existent URLs, as pages are often published on personal
homepages that disappear whenever the owner changes jobs. Also, reorganizations
of directory structures and websites cause links to break. It is
preferable to copy the learning objects into the database itself rather than merely
linking to them, as broken links can be very frustrating to learners and can undermine their
motivation to learn. Another disadvantage of mixing online and offline materials is
that the user is often unable to indicate that he or she only desires online materials
(online materials can often be downloaded immediately, while offline materials are
much more difficult to obtain).
Some systems also include support for organizing course contents and include
personal and group communication facilities (such as the TeleTOP course manage-
ment system).
particles are (monomedia) pieces of text, audio or video in a particular lan-
guage;
elements are (monomedia) collections of particles in various languages;
molecules are collections of particles, so that they are multimedia and in
multiple languages;
a Unit of Learning Material is a collection of molecules.
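The four-level hierarchy can be sketched as nested data classes. The class and attribute names below are assumptions chosen to mirror the list above; the ESM-BASE schema also stored labels and relations to other ULMs, contexts, topics, and keywords, which are omitted from this sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Particle:
    """Monomedia piece of text, audio or video in one language."""
    medium: str    # e.g. "text", "audio", "video"
    language: str  # e.g. "en", "nl"
    content: str   # placeholder for the actual media data


@dataclass
class Element:
    """Monomedia collection of particles in various languages."""
    particles: List[Particle] = field(default_factory=list)

    def is_monomedia(self) -> bool:
        return len({p.medium for p in self.particles}) <= 1


@dataclass
class Molecule:
    """Collection of particles; may mix media and languages."""
    particles: List[Particle] = field(default_factory=list)


@dataclass
class ULM:
    """A Unit of Learning Material is a collection of molecules."""
    molecules: List[Molecule] = field(default_factory=list)

    def languages(self) -> set:
        return {p.language for m in self.molecules for p in m.particles}


# Usage with hypothetical content.
en = Particle("text", "en", "Definition of a relation")
nl = Particle("text", "nl", "Definitie van een relatie")
clip = Particle("video", "en", "lecture.mpg")
ulm = ULM([Molecule([en, clip]), Molecule([nl])])
print(sorted(ulm.languages()))  # ['en', 'nl']
```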
The ‘pedagogical aspects’ of the ULM are covered by the ULM labels and the
relations of a ULM to other ULMs, contexts, topics, and keywords.
This elaborate structure can quickly become confusing to the user, as was discovered
during the ESM-BASE project. In spite of the users’ problems with
the conceptual database schema, the researchers still considered the complexity of
the schema necessary to enable the database to handle large numbers of learning
objects (Persico et al., 1992).
[Figure: layers ‘instructional characteristics’, ‘interactivity management and sequencing’, ‘raw data and presentation’]
The IEEE Learning Object Metadata group defines a learning object as any entity,
digital or non-digital, that can be used, re-used or referenced during technology-
supported learning (IEEE, 2000). This is a very broad definition, which is under-
standable as the metadata standard this group is preparing should be applicable to
a wide range of learning environments.
3.5.5 Summary
3.6 Conclusion
Many educational databases have been built in the past decade. Also, many different
ways to use these databases in practice have been explored:
A database of learning materials that is labeled upon entry, and that is re-
viewed by peers (CSTC, NEEDS). These reviews are added to the metadata
of the learning materials, and can be used as search criteria.
The database can contain fragments of learning material that can be ac-
cessed and copied online at will (IMS, Ariadne, Explorer), but other data-
bases may contain pre-packaged courses through which the user can search
(OLA, PHOENIX Web). Other databases contain only references to offline
materials and/or lesson plans (ENC).
The systems that have been discussed in this chapter share a more or less com-
mon way to search for materials: the user can choose from a simple, fast form that
usually only consists of a single field, and an advanced method that can be as elab-
orate as choosing how many, and which fields to search (Explorer). The retrieval
capabilities of these systems are rather simple: only “perfect matches” are shown, i.e.
learning objects that match all of the search criteria. As educational metadata often
consists of open vocabularies, or of vocabularies that can be interpreted in many ways,
learning objects that would be “perfect” may not be found at all because they were
labeled improperly. Therefore, it would be better to use a retrieval mechanism that is less strict,
and that also returns results that are “nearly perfect”. This thesis will explore a
method to also retrieve these almost-perfect objects (see Chapter 7).
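The difference between the two retrieval styles can be made concrete with a small Python sketch over a few labeled learning objects. The metadata fields (subject, level, type), the identifiers and the scoring rule (the fraction of search criteria a record satisfies) are assumptions chosen for illustration; this is not the retrieval mechanism presented in Chapter 7.

```python
# Hypothetical metadata records; field names and values are illustrative only.
objects = {
    "ulm-1": {"subject": "physics", "level": "secondary", "type": "exercise"},
    "ulm-2": {"subject": "physics", "level": "university", "type": "exercise"},
    "ulm-3": {"subject": "chemistry", "level": "secondary", "type": "example"},
}


def exact_match(query, records):
    """Return the ids of records that satisfy every search criterion."""
    return [oid for oid, meta in records.items()
            if all(meta.get(f) == v for f, v in query.items())]


def partial_match(query, records, threshold=0.5):
    """Rank records by the fraction of criteria they satisfy and keep those
    at or above the threshold: a simple stand-in for 'nearly perfect' matches."""
    ranked = []
    for oid, meta in records.items():
        score = sum(meta.get(f) == v for f, v in query.items()) / len(query)
        if score >= threshold:
            ranked.append((round(score, 2), oid))
    ranked.sort(key=lambda pair: (-pair[0], pair[1]))
    return ranked


query = {"subject": "physics", "level": "secondary", "type": "example"}
print(exact_match(query, objects))    # []
print(partial_match(query, objects))  # [(0.67, 'ulm-1'), (0.67, 'ulm-3')]
```

Under exact matching, a single mislabeled or differently interpreted field hides a learning object completely; the partial matcher still returns ulm-1 and ulm-3, which each satisfy two of the three criteria.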
The fact that large database companies (such as Oracle) are investing in online
educational databases suggests that (at least from an industrial point of view) this
type of database is considered useful, and has a certain future.
Many metadata or labeling systems have been discussed. Although it is important
to ask ‘which one is the best?’, it is equally important to create a standard
labeling system if learning material is to be exchanged with (and reused by) other
educational institutions. Also, the labeling system has to be easy to understand for
the persons who have to enter the metadata, who are mostly teachers. But as is the
case with so many standards, a learning object metadata standard will be a
compromise between the voices of all parties involved, and hence it will not be the
ultimately best labeling system.
Chapter 4
Theoretical Framework
4.1 Introduction
Objects in a database represent particular aspects of objects in the “real world”,
which in database terminology is also called the Universe of Discourse (Elmasri & Navathe,
1989). The selection of these aspects (‘attributes’) greatly depends on the purpose
of the database system, and the choice of aspects can have a great impact on the
performance of the system, so one has to carefully choose how to model the real-
world objects.
In this chapter, the theoretical background is presented for one of the goals of
a multimedia educational database: to increase the amount of reuse of learning
material by developing an appropriate model of learning material.
Some of the models presented in the previous chapter are very elaborate, and
are very laborious to implement in an actual prototype. Also, the user may become
confused by the complexity of the model. As the model should be comprehended
by non-technical educational staff members, a model was developed that is less
complex, yet provides the necessary functionality of enabling storage and efficient
retrieval.
This thesis will try to fill this gap by using a simple ULM model and applying
a sophisticated retrieval mechanism (Chapter 7) in a prototype which will be field-
tested (Chapter 8).
simpler and thus easier to understand for teachers. In fact, this is a main viewpoint
in this thesis: the concept of Unit of Learning Material should be understandable
for teachers, and they should be the judge of what they consider a “unit” of learning
material. It is the teacher who will eventually have to use the concept to divide
learning material into pieces, so the teacher should be the ultimate authority. This
thesis takes the position that a technological system should assist the user, and not obstruct
his or her natural, intuitive way of working with the system. Inherently, this means
that the model is not very restrictive.
Some researchers believe that modeling learning material implies modeling
domain knowledge (Reigeluth, Merrill, & Bunderson, 1978; Reigeluth, Merrill,
Wilson, & Spiller, 1980; Hendley, Whittington, & Jurascheck, 1993; Horn, 1989;
Broeke, Zwart, Verhagen, & Rhemrev, 1994; Merrill, Li, & Jones, 1990). How-
ever, the name “Unit of Learning Material” indicates that it is not so much a unit
of knowledge, but more an object that has been specifically designed to achieve
learning. It is the message that the teacher uses to transfer that knowledge that we
want to capture in the ULM. Often, the knowledge contained in the message can-
not be distinguished from the message itself, especially in technical subjects: the
knowledge to be transferred is simply described (which is often hard enough as it
is) in the message. However, especially in the affective domain, the knowledge that
the teacher tries to transfer (e.g. insights concerning being tolerant to another hu-
man being) is very abstract, and difficult to describe. The knowledge is transferred
using for example an anecdote, an exercise, or a case study. These types of knowl-
edge cannot be modeled very easily, yet learning material that handles this kind of
knowledge should also ‘fit’ into the model. So, the model should not try to model
knowledge, but instead focus on the educational message that the teacher uses to
transfer the knowledge. This thesis assumes that the teacher is not just looking for
learning material about a certain subject, but instead that the teacher is looking for
educational materials with certain desired educational properties.
So, the current research does not try to model knowledge, or “to obtain com-
pleteness with respect to domain knowledge” (as was stated in the original research
problem, see Chapter 1), which is equally difficult. Instead, this research will focus
on labeling ULMs with educational characteristics based on the contents of the
ULM in such a way that a teacher is able to retrieve them, re-purpose them, and
re-use them in a different educational context than they were originally designed
for.
4.2.1 Model
This thesis conceives of a ULM as a multimedia building block that contains
multimedia content, metadata, relations to other ULMs, a history file and presentation
information (see Figure 4.1). These components will be discussed below.
Multimedia content
The multimedia content consists of the pieces of data that contain the actual content
of the learning material (comparable to the “elements” of the ESM-BASE project, see
Section 3.5.1):
audio clips in various coding formats (e.g. wave formats (.wav), MPEG audio,
RealAudio¹), denoted with the loudspeaker icon in Figure 4.1;
video clips (e.g. QuickTime, MPEG video, animated GIF, RealVideo) and still
pictures (e.g. JPG, GIF, TIFF), denoted by the camera icon;
text (e.g. ASCII, Microsoft Word, Portable Document Format (PDF), marked-up
text (SGML or XML), HTML documents), denoted by the icon with three lines
of dashes;
“active” objects such as Java applets and Authorware applets, represented by
the ‘active’ circle in the figure.
1
see http://www.realaudio.com
Metadata
A Unit of Learning Material not only carries educational content, but also meta-
information, for example who created the ULM, for what subject area it is intended,
or for what educational level it was created. A ULM can best be seen as a container:
on the outside, there is a description of what is in it, while the interior is somewhat
hidden but consists of a unit of some kind. We will call the meta-information
“labels”, as they are intended to help retrieve the ULMs for specific educational
purposes.
In the previous chapter, many different sets of metadata fields were presented.
Examining the number of projects that utilized some form of metadata, it can be
considered “common practice” to add metadata fields to learning objects.
Relations
Units of Learning Material often have some kind of educational relationship: ULM
A can be an example of ULM B, or A can give a deeper understanding of the subject
matter in B, etcetera. These relations have a specific educational purpose. To make
ULMs more accessible, these relations can be stored in the ULM, so that when
accessing ULM A it is also possible to easily access ULM B.
The relations can have more types than ‘example-of’; semantic relations can
also be used, especially if the ULMs represent concepts. For example, a ULM
representing a piston can have a relation “moves inside” to a ULM representing a
cylinder of a combustion engine. This idea was used by Elsom-Cook (1990) to build
a concept map consisting of ULMs and relations. This allows the student to browse
through the concept structure as if wandering through a “knowledge landscape”.
This application also touches on the area of Intelligent Tutoring, enabling ULMs to
be used in a system that can autonomously decide what learning material is to be
presented to a particular student (Merrill, 1987; Li & Merrill, 1991; Brusilovsky
et al., 1996). However, as the current research focuses on the storage of learning
objects as opposed to knowledge objects, this topic will not be elaborated upon
further.
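A minimal Python sketch of how typed relations could be stored alongside ULMs, so that following a relation from ULM A leads directly to ULM B. The dictionary layout, the functions relate and related, and the identifier "moving-engine-part" are hypothetical; the relation names “example-of” and “moves inside” come from the examples above.

```python
from collections import defaultdict

# Hypothetical relation store: ULM id -> list of (relation type, target ULM id).
relations = defaultdict(list)


def relate(source, relation, target):
    """Record a typed, directed relation from one ULM to another."""
    relations[source].append((relation, target))


def related(source, relation=None):
    """ULMs reachable from source, optionally restricted to one relation type."""
    return [target for rel, target in relations.get(source, [])
            if relation is None or rel == relation]


relate("ulm-A", "example-of", "ulm-B")
relate("piston", "moves inside", "cylinder")
relate("piston", "example-of", "moving-engine-part")

print(related("ulm-A"))                   # ['ulm-B']
print(related("piston"))                  # ['cylinder', 'moving-engine-part']
print(related("piston", "moves inside"))  # ['cylinder']
```

Filtering on the relation type is what lets an application offer a student only the examples of a concept, or only its semantic neighbours, from the same store.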
History
A Unit of Learning Material carries a history file which describes in what courses
the ULM has been used in the past. This file thus records for which courses
educational designers have considered the ULM to be useful.
Suppose, for example, that a teacher is preparing material for a course on
recording information on magnetic surfaces for a Computer Science course, and
he or she poses a question to the database application to retrieve some material.
The teacher likes the material he or she gets very much, and would like to find
related information. The teacher then accesses the history information of the re-
trieved ULMs to see in what courses they have been used, and learns that the fac-
ulty of Electrical Engineering also has a course related to magnetic recordings. The
teacher asks the database what other material was used in those courses, and finds
a useful animation of the mechanics of a floppy drive. It will be clear that storing
history information can increase the reuse of learning material.
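The scenario above amounts to two look-ups over the usage history: from the retrieved ULMs to the courses that used them, and from those courses back to all material they used. The sketch below assumes, purely for illustration, that the history is kept as (ULM, course) pairs; the course names and ULM identifiers are hypothetical.

```python
# Hypothetical usage history: (ULM id, course in which it was used).
history = [
    ("magnetic-recording-text", "CS: Storage Devices"),
    ("magnetic-recording-text", "EE: Magnetic Recording"),
    ("floppy-drive-animation", "EE: Magnetic Recording"),
    ("transistor-basics", "EE: Electronics 1"),
]


def courses_using(ulm):
    """Courses in which the given ULM has been used."""
    return {course for u, course in history if u == ulm}


def material_used_in(courses):
    """All ULMs that appear in the history of any of the given courses."""
    return {u for u, course in history if course in courses}


def discover_via_history(retrieved):
    """ULMs sharing a course with any retrieved ULM, excluding the retrieved ones."""
    courses = set().union(*(courses_using(u) for u in retrieved))
    return material_used_in(courses) - set(retrieved)


print(discover_via_history(["magnetic-recording-text"]))
# {'floppy-drive-animation'}
```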
Presentation information
To display all raw data components in the proper sequence, with the proper timing
and at the proper screen locations, presentation information (i.e. layout aspects) is
needed in the form of positions along the space (screen) and time dimensions.
This component stores information that says, for example, “play the second
audio clip two seconds after the first has ended”. Rules of this kind can be expressed
in, for example, MHEG (Meyer-Boudnik & Effelsberg, 1995), SMIL², or
SGML³-derived languages such as XML or HyTime (ISO, 1992).
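A rule such as “play the second audio clip two seconds after the first has ended” is essentially a constraint on start times. The Python sketch below resolves such rules into an absolute timeline along the time dimension only. The rule format (a clip, the clip it waits for, and an offset) is a hypothetical simplification, not MHEG, SMIL or HyTime syntax.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Clip:
    name: str
    duration: float                # seconds
    after: Optional[str] = None    # clip whose end this clip waits for
    offset: float = 0.0            # delay (seconds) after that clip has ended


def schedule(clips):
    """Return {clip name: (start, end)} in seconds, resolving 'after' rules.
    Assumes every clip is listed after the clip it waits for."""
    timeline = {}
    for clip in clips:
        start = 0.0 if clip.after is None else timeline[clip.after][1] + clip.offset
        timeline[clip.name] = (start, start + clip.duration)
    return timeline


clips = [
    Clip("audio-1", duration=5.0),
    Clip("audio-2", duration=3.0, after="audio-1", offset=2.0),
]
print(schedule(clips))  # {'audio-1': (0.0, 5.0), 'audio-2': (7.0, 10.0)}
```

Storing the relative rule rather than the absolute start time means the timeline stays correct if the first clip is replaced by one of a different length.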
Educational Purpose
The purpose of a particular collection of learning material may determine if the
collection can be considered a unit, or should be considered to form several units.
For example, a description of a chemical reaction can consist of many units (bits
of theory, bits of examples, images) in a chemistry course, but it can also serve as
one large example of what descriptions of chemical reactions look like in a course
on scientific communication, and as such the description can be considered to be one
large unit. In the first case, the educational purposes of the respective units are
‘theory’, ‘example’ and ‘visual representation’, while in the second case the
educational purpose is ‘example’.
2
see http://www.w3c.org/AudioVideo
3
Standard Generalized Markup Language
Personal preferences
Personal preferences may influence what an educational designer considers a unit
of learning material. Different designers may use different instructional design the-
ories to design instruction, and designers each have their own unique experiences
in designing instruction which may also influence their opinion about what can be
considered an educational unit.
This flexible definition suggests that almost every ULM will have an unpre-
dictable size, because each is created by different persons for different educational
purposes. This would seem to reduce reuse, as the desired size would not seem
to correspond easily with any ULM that is present in the database. There are two
cases:
the ULM the designer is looking for is larger than the ULMs he or she has
retrieved from the database. In this case, the designer could easily create a
new composed ULM consisting of (a selection of) the retrieved ULMs;
the ULM the designer is looking for is smaller than the ULMs he or she has
retrieved from the database. If one of these is a composed ULM, then perhaps
a sub-ULM fits his or her goals. If not, then a multimedia editor may be
needed to split the data in one of the ULMs into smaller parts and create new
basic ULMs using this data.
So, to accommodate personal preferences and unforeseen requirements, a very
flexible unit size has to be adopted. The above two cases show that, although there
seems to be a danger of reducing the opportunities for reuse, the educational
designer can still select those parts of a ULM that he or she needs, as long as
editing facilities are available. Due to the recursive definition of a ULM, a designer
can compose larger ULMs from smaller ones if he or she thinks this is appropriate.
The freshly created ULM is then also inserted into the database. Note that the raw
data (video fragments, audio clips) do not need to be copied; a simple reference to
these objects suffices.
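A minimal Python sketch of this recursive definition: a ULM either holds references to raw data (a basic ULM) or consists of other ULMs (a composed ULM), and composing never copies the raw data. The class names, fields and file paths are assumptions for illustration; the identity check with "is" confirms that the composed ULM shares the original raw-data object rather than a copy.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RawData:
    uri: str  # location of a video fragment, audio clip, text, ...


@dataclass
class ULM:
    title: str
    raw: List[RawData] = field(default_factory=list)    # used by basic ULMs
    parts: List["ULM"] = field(default_factory=list)    # used by composed ULMs

    def all_raw(self):
        """Collect raw-data references from this ULM and, recursively, its parts."""
        items = list(self.raw)
        for part in self.parts:
            items.extend(part.all_raw())
        return items


clip = RawData("video/clouds.mpg")
sky = ULM("Blue sky", raw=[clip])
cloud_text = ULM("Cloud types", raw=[RawData("text/cloud-types.html")])
lesson = ULM("Weather lesson", parts=[sky, cloud_text])  # composed, nothing copied

assert lesson.all_raw()[0] is clip  # same object: referenced, not duplicated
print([r.uri for r in lesson.all_raw()])
# ['video/clouds.mpg', 'text/cloud-types.html']
```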
There is an interesting trade-off between the size of a ULM and its reusability: a
large ULM is often more specific than a small ULM, so that the large ULM is
less reusable than the small one. Consider for example a ULM consisting of a
video fragment of 2 seconds showing a blue sky with some fluffy clouds, and a
ULM containing a video fragment of 20 seconds showing the same sky crossed
by a swarm of bees chasing the queen bee. The ULM of 2 seconds with the blue
sky can be used in learning material about weather forecasting, cloud types, the
filtering of specific colours by the atmosphere, or the lifecycle of water on this
planet. The ULM with the bees is actually only suited for learning material about
bees. So, the small ULM is more reusable, however it is difficult to put labels on
the 2-second ULM so that it can be retrieved for all courses we mentioned.
The only upper limit on the size of a ULM is the maximum attention span of the
student. A ULM can take as long as one hour to study, or be as small as a picture
plus one sentence, which could take about one minute to study.
are presented before or after a certain ULM to “change” the context for which the
ULM was originally designed. This can be done by briefly introducing the learner
to the subject matter that is presented in the ULM, or by giving instructions on how
to interpret the subject matter, which details to ignore, or which details to pay special
attention to.
Due to time constraints, this concept was not tested in a prototype implemen-
tation in the current research.
the nested nature of ULMs can provide in-depth material as well as material
that gives an overview of a certain topic, so that learners with different needs
can be served; as these ULMs are linked to each other, the user and/or the
computer software can easily move focus from one to another;
to allow users to easily grasp the concept, the model has been kept as simple
as possible while retaining its richest features.
The model borrows parts and pieces of existing models, so it is not very
innovative. It will, however, be the working model of the current research for
building a prototype and testing methods to increase the reusability of learning
materials. But first, the factors that affect the reusability of learning materials will
be discussed in the next section.
4.4 Reusability
Already in the early nineties, some researchers expected that reusability could be
a key factor in reducing the high costs of multimedia learning materials. Persico
et al. (1992) write, citing Olimpo:
Courseware re-usability is one of the most promising approaches to
the solution of key problems in courseware development such as high
development costs, unsatisfactory quality, and the need for tools for
fast prototyping.
Chalon (1994) agrees that the reusability of media and multimedia components
increases the efficiency of authoring and the global cost-effectiveness of courseware
production; see also Bloom (1995). The opportunities for reuse have greatly
increased since the Internet, and especially the World Wide Web, became popular as
a vehicle to deliver instruction during the second half of the nineties.
According to Persico et al. (1992), the following issues have to be addressed to
make a piece of learning material re-usable:
portability: the ability to deal with linguistic and cultural differences while
reusing the learning material for another target audience
Duval (1999, June) identifies five further factors to achieve reuse: designing for
reuse, self-containment, customizability, adaptability, and multilinguality. To achieve
an increase in reusability, insight is needed into how these factors increase reuse.
This thesis will organize the factors mentioned above into a hypothetical model,
called Formula-M (see also Hiddink (2001a)), of how these factors could influence
reuse. This model serves to position the efforts of the current research into
a theoretical framework, and to indicate where further work on increasing reusabil-
ity can find a starting point. Although it would be possible to try to validate this
model by undertaking an empirical study, this is outside the scope of the current
research. Where possible, however, the model will be supported with references to
existing literature.
accessibility The ease with which Units of Learning Material can be accessed; if
access is very difficult, then it is unlikely that ULMs will be reused. As Ver-
hagen and Bestebreurtje (1995) write, the applicability of a large multimedia
database depends on the retrievability of the required information to cover
specific needs that can differ from person to person.
genericity One can imagine that a ULM is very specific to a certain subject area, a
certain educational setting or a certain teacher or class. These ULMs are less
likely to be reused than very generic ULMs. Designing for reuse, as Duval
(1999, June) calls it, will increase the genericity of ULMs.
opportunities This aspect refers to the number of opportunities for reuse that exist
in the educational institution (i.e. the organization complete with the people
working in it and the culture that exists within the organization). One can
imagine an institution in which people are very eager to create and present
their own work, and just consider other people’s learning material inferior.
In such an organization not much reuse will occur.
These three aspects can be further refined; this refinement will be discussed in
the sections below.
Accessibility
The ease of finding ULMs is believed to depend on several hypothetical factors.
The labeling system (metadata fields) with which the ULMs are labeled
needs to correspond to the way teachers think about searching learning ma-
terial. If the teachers, for example, think in terms of pedagogical styles,
learning objectives and didactic strategies, then the labeling system should
allow teachers to use these terms in search criteria. If the labeling system
only allows keyword searches, then the ULMs will not be easily accessible to
teachers, so that not much reuse occurs. As Rada (1995) states, a powerful
yet flexible description and classification schema is needed for the purpose
of efficient retrieval and reuse of ULMs.
If the search facility also uses usage history information to rate the appropri-
ateness of learning material, then the amount of usage of ULMs also affects
the amount of reuse: if no usage history information is present, there is a
lower chance that appropriate ULMs are retrieved, decreasing opportunities
for reuse.
Genericity
Several factors can increase the genericity of a Unit of Learning Material:
Designing for reuse (Duval, 1999, June) can be achieved in many different
ways; one of the design principles that can be used, is to try to avoid refer-
ences to local institutions, companies, persons etcetera. Doing so will make
the material more generic and thus more reusable.
Another design principle is to create the material in different human lan-
guages, in other words: to make it multi-lingual.
The layout of a Unit of Learning Material can make it very specific to the
educational institution where it was developed; for example by the presence
of many logos, color schemes, and other details specific to the graphical
house-style of the institution (Hiddink, 2001a). Chapter 6 will present an
architecture that allows for the separation of content and layout, so that a
teacher who dislikes a ULM for its layout can still reuse the content, and add
his or her own layout.
Opportunities
There are several opportunities “external” to the ULM or the database that can
influence the amount of reuse.
for identical learning material that is applied in slightly differing contexts,
so that learning material for one level can be reused at the other. These
commonalities indicate opportunities for reuse.
The social relationships in an educational organization can inhibit the reuse
of learning material: if people do not wish to publicly show that they reuse
learning material created by others, then less reuse will occur at that organization.
This phenomenon is commonly known as the “not invented here” syndrome.
Also, legal factors such as copyright policies of departments of an organization
can inhibit reuse: if the department has decided to charge royalty fees for
reusing materials, then other departments may be less eager to reuse their
materials.
4.5 Conclusion
The model of Units of Learning Material presented in this chapter has been com-
posed from key aspects of previous models. While not a very novel model, it should
provide sufficient modeling power to allow the entry of a wide range of learning
materials into an educational database. The chapter also presented a hypothetical
model for organizing factors that affect the reusability of online learning materials.
The model can be used to identify strong and weak points of models of learning
material, considered from the viewpoint of increasing reusability of learning mate-
rial.
This thesis hypothesizes that the model of Units of Learning Material facilitates
the reusability of online materials in a number of ways through the Formula-M
model:
the history information enables search methods based on the past and current
use of learning materials, making them better accessible;
the context adapters increase the genericity of ULMs, which in turn makes a
ULM more reusable;
the relations between ULMs further help the user find related ULMs;
(Figure labels: search method; usage; genericity; presence of context adaptors; layout properties; subject matter commonalities; opportunities; social factors; legal factors.)
the metadata that is added to the ULM allows for advanced search methods
to be used; see Chapter 7.
Chapter 5
5.1 Introduction
Chapter 2 introduced the reader to various aspects of using educational multimedia
databases: why are databases in general convenient to use, what are their properties,
and what is the impact of an educational multimedia database and its applications
on its users (students and teachers). After that, Chapter 4 discussed a model of
learning objects to answer Research Question 1 (see Section 1.4.1) and identified
factors that influence the reusability of these learning objects to answer Research
Question 2. As answering Research Question 3 requires insights
into search methods, this chapter will focus on search and retrieval methods that
have been developed in the past. The objective is not to provide an in-depth dis-
cussion of all existing techniques, but rather to provide a broad overview that can
be comprehended by technical as well as by non-technical readers. The purpose
of this chapter is to provide a context of existing techniques so that the retrieval
mechanism presented in Chapter 7 can be viewed in the proper perspective.
Figure 5.1 is proposed¹; it depicts three sample data objects, each with a metadata
record. The figure serves as a framework along which various access methods
will be discussed. The object on the left side shows a certain content structure,
visualized by two rectangles. The object at the lower part of the picture shows three
attributes. The arrows indicate where the various retrieval methods operate:
content-based retrieval methods operate on the contents of the data objects;
(Figure 5.1 labels: content-structure based retrieval; metadata-based retrieval; content-based retrieval; relational query languages; a data object with attributes 1, 2 and 3.)
relational query languages (such as SQL) operate on the attributes and inter-
object relationships;
Note that the figure presents an overly sharp distinction between the various
methods. In practice, almost all methods present themselves to the user as an en-
vironment in which (relational) queries can be formulated. Furthermore, metadata
are often stored as a data object so that there is almost no distinction between
metadata approaches and a normal relational database. Also, the content-based
method resembles the content-structure based methods, because both methods ex-
amine the contents of data objects. Still, for the purposes of organizing the different
approaches into one framework, these similarities will be ignored.
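(Figure 5.2: entity course (number, name) related to entity teacher (name, SSN) through the relationship taught_by, with cardinality n on the course side and 1 on the teacher side.)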
The numbers “1” and “n” next to the taught by relationship diamond indi-
cate that one teacher can give many courses; it is a so-called “1:n” relationship.
This diagram can be checked for certain desired properties in a number of
‘normalization’ phases; the resulting diagram can be in the Third Normal Form
(3NF) (see Elmasri and Navathe (1989, p 376)) or the Boyce-Codd Normal Form
(BCNF, see page 380 of the same book) which are forms that make it easy to
translate the diagram to database tables. The database tables that can be derived
from the diagram in Figure 5.2 look as follows (the keys are marked with an asterisk):
teacher (name, SSN*)
course (number*, name, trimester, taught_by)
The fact that a teacher gives a course is denoted by the appearance of the teacher’s
key SSN as the value of attribute taught_by in the course table; taught_by is a
so-called ‘foreign’ key. Each “row” in a database table now stores the data of one entity.
Data can be retrieved from a relational database using the Structured Query
Language (SQL), in which queries such as the following can be formulated: select
name from course where trimester=1. This query would retrieve the names of all
courses that are taught in the first trimester. A query that would retrieve the names
of all courses given by a particular teacher (here a hypothetical teacher named Jones)
would be:
select course.name
from course, teacher
where course.taught_by = teacher.ssn
  and teacher.name = 'Jones'
This retrieval method uses exact matching: the data that is retrieved matches
the search specification exactly.
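The schema and the two queries above can be tried out with Python’s built-in sqlite3 module. The sample rows (teacher names, SSNs, course numbers and names) are invented for illustration; the table and column names follow the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE teacher (ssn TEXT PRIMARY KEY, name TEXT)")
cur.execute("""CREATE TABLE course (
                   number    TEXT PRIMARY KEY,
                   name      TEXT,
                   trimester INTEGER,
                   taught_by TEXT REFERENCES teacher(ssn))""")

# Hypothetical sample data.
cur.executemany("INSERT INTO teacher VALUES (?, ?)",
                [("111", "Jones"), ("222", "Smith")])
cur.executemany("INSERT INTO course VALUES (?, ?, ?, ?)",
                [("C1", "Databases", 1, "111"),
                 ("C2", "Multimedia", 2, "111"),
                 ("C3", "Didactics", 1, "222")])

# Names of all courses taught in the first trimester.
first = cur.execute(
    "SELECT name FROM course WHERE trimester = 1 ORDER BY name").fetchall()

# Names of all courses given by one particular teacher: a join on the foreign key.
by_jones = cur.execute(
    """SELECT course.name FROM course, teacher
       WHERE course.taught_by = teacher.ssn AND teacher.name = ?
       ORDER BY course.name""", ("Jones",)).fetchall()

print(first)     # [('Databases',), ('Didactics',)]
print(by_jones)  # [('Databases',), ('Multimedia',)]
```

Both results are exact matches: a course whose trimester or teacher differs in any way from the query is simply not returned.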
With the advent of databases that contain multimedia, however, retrieving data
becomes more difficult. If a database consists of a large collection of video, audio,
and text fragments, then there is no elaborate conceptual schema to refer to when
searching for information. For example, one could imagine a database of learning
material with only one entity type: ‘Video Material’, containing a large collection
of assorted video fragments. How can one find a video fragment that shows the
manual gestures a speaker makes to support a course on presentation techniques?
Many retrieval techniques that try to solve this type of problem have been
developed. The next sections will try to classify these methods and techniques. The
first distinction made is between adding new data as metadata on the one hand
(Section 5.2), and deriving data (or features) from existing data on the other hand.
The derived data can be based on the data content
of the document (Section 5.3), on the structure of the document (Section 5.4) or the
conceptual structure of the information contained in the document (Section 5.5).
5.2 Adding Data
The technique that will be discussed here is providing data elements with metadata:
data about data. The metadata tell something about the data objects; for example
who created it, when, for what purpose, who updated the object at what time,
what is in it, or how to use it. The metadata fields can be collected in an entity type,
but sometimes the metadata fields are added to the entity as attributes; indeed,
there is a thin line between what can be considered ‘metadata’ and ‘data attribute’.
The educational level of a Unit of Learning Material can be modeled either as a
regular attribute or as a metadata field. Generally, a field that is only used for
retrieval purposes but that is not made known to the user in an apparent way can be
considered metadata. For example, the average pitch of a sound is a characteristic
of the sound object that is generally not presented to the user, see also Section 5.3.3.
There is a wide variety of types of metadata that can be added to data objects,
and about as many methods have been developed to utilize the metadata for retriev-
ing data.
Annotations about what is happening in a particular video sequence (move-
ments, objects and their relations, meaning of these movements, etcetera
(Wiesman, 1999; Costagliola et al., 1995)). Figuring out “what is happen-
ing” requires human intelligence, as one has to interpret the video scenes
and see relationships between objects and movements. Therefore, this process
cannot currently be done by computers.
Spatial annotations: what is the shape of the forms in the images: rectangles,
circles, lines, etcetera. This process can be done automatically, as it only
requires recognizing simple shapes.
Processing Power
Content-based retrieval (see Section 5.3) requires the database system to analyze
the data, which can be very laborious, especially if the content-based retrieval al-
gorithm involves recognizing geometrical shapes. If, on the other hand, the query
processor only has to inspect metadata fields instead of processing very large raw
data objects, then it can process the query much faster. A metadata record of 1
Kbyte can contain a wealth of metadata elements, while a raw data object can easily
be several megabytes, three orders of magnitude larger.
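The size ratio behind this estimate works out as follows, taking one kilobyte as 2^10 bytes and one megabyte as 2^20 bytes (an assumption; with decimal units the ratio is exactly 1000):

```latex
\frac{1\ \text{MB}}{1\ \text{KB}}
  = \frac{2^{20}\ \text{bytes}}{2^{10}\ \text{bytes}}
  = 2^{10} = 1024 \approx 10^{3}
```

A raw object of several megabytes is therefore at least about a thousand times larger than a 1 Kbyte metadata record, which is why inspecting metadata alone is so much cheaper for the query processor.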
ent keywords for identical phenomena (Wold, Blum, Keislar, & Wheaton, 1996).
This problem is also known as the vocabulary problem (Furnas, Landauer,
Gomez, & Dumais, 1987).
A special type of metadata has been developed by researchers in the field of
Educational Technology: educational metadata. Section 5.8 will elaborate on this
type of metadata.
5.3.3 Formal Feature Extraction
Formal feature extraction is a relatively new technique. The idea is that multimedia
data has many so-called ‘formal features’ such as colour and texture distribution
and variances (Zaniolo et al., 1997; Faloutsos et al., 1994, p. 301), audio frequency
and amplitude distributions and variances, so-called envelopes², etcetera. These
characteristics, or features, can be easily measured. It has been shown that these
formal features can sometimes be mapped onto human, fuzzy characteristics such
as “scratchy noises”. These audio characteristics can help in discriminating video
fragments, for example to distinguish sports videos (a lot of rapid talking and cheering)
from news reports (one voice that talks quite steadily).
These fuzzy characteristics can then be used in a query; the system translates them
into formal features to select multimedia data (Wold et al., 1996).
These features can also be modeled in a multi-dimensional space, where each fea-
ture is a dimension. Below, this will be elaborated upon.
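As an illustration of a formal feature, the Python sketch below computes the RMS amplitude (loudness) of successive short frames of an audio signal and the variance of those values. The sample rate, the frame length, the synthetic test signals and the reading of high loudness variance as “irregular” sound are assumptions for illustration; they are not the feature set used by Wold et al. (1996).

```python
import math


def frame_rms(samples, frame_len):
    """RMS amplitude of each complete, non-overlapping frame of samples."""
    starts = range(0, len(samples) - frame_len + 1, frame_len)
    return [math.sqrt(sum(x * x for x in samples[s:s + frame_len]) / frame_len)
            for s in starts]


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


RATE = 8000   # samples per second (assumed)
FRAME = 400   # 50 ms analysis frames (assumed)

# One second of a steady 220 Hz tone: constant loudness.
steady = [0.5 * math.sin(2 * math.pi * 220 * n / RATE) for n in range(RATE)]
# The same tone, loud and quiet in alternating 0.1 s (800-sample) stretches.
bursty = [s * (1.0 if (n // 800) % 2 == 0 else 0.1) for n, s in enumerate(steady)]

loudness_variance = {
    "steady": variance(frame_rms(steady, FRAME)),
    "bursty": variance(frame_rms(bursty, FRAME)),
}
print(loudness_variance["bursty"] > loudness_variance["steady"])  # True
```

Each such number is one coordinate of a feature vector, so a set of them places every fragment at a point in the multi-dimensional space discussed next.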
Multi-dimensional Spaces
Documents can be mapped into a multidimensional space (sometimes also called
the Vector Space Model, see Grossman and Frieder (1998)), so that mathematical
techniques can be applied to aid the search process. Neural networks, for example,
can be used to classify the space, i.e. find clusters of documents that resemble each
other (Liu, Huang, Wang, & Chen, 1997).
Quite often, a user who requests multimedia documents from a database is looking
for documents that resemble something, or that are as near as possible to an
ideal document. To implement this notion of similarity, distances have been defined
in multi-dimensional feature spaces. Many different methods exist to map a
document into such a space: using concepts, terms, extracted features, etcetera. A
distance function is then defined on the space; a query becomes a point in it,
and a database management system can retrieve the documents that lie within a given
distance of the query point. The problem of finding the nearest documents
within a certain vicinity of a query point is known as the nearest neighbour problem
(Yianilos, 1998, 1992) or similarity search (Weber & Zezula, 1997; Seidl &
Kriegel, 1997).
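A minimal sketch of similarity search by linear scan follows. It assumes that documents have already been mapped to feature vectors and that Euclidean distance is the chosen distance function. The document ids and two-dimensional vectors are hypothetical, and a real system would use an index structure rather than scanning every document.

```python
from math import dist
from typing import Optional

Vector = tuple[float, ...]


def range_query(index: dict[str, Vector], query: Vector, radius: float) -> list[str]:
    """Ids of all documents within `radius` of the query point, nearest first."""
    scored = sorted((dist(vec, query), doc_id) for doc_id, vec in index.items())
    return [doc_id for d, doc_id in scored if d <= radius]


def nearest_neighbours(index: dict[str, Vector], query: Vector, k: int,
                       exclude: Optional[str] = None) -> list[str]:
    """Ids of the k documents closest to the query point, optionally skipping one id."""
    scored = sorted((dist(vec, query), doc_id)
                    for doc_id, vec in index.items() if doc_id != exclude)
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical feature vectors, e.g. (mean energy, zero-crossing rate).
index = {"news-1": (0.2, 0.1), "news-2": (0.3, 0.15),
         "sports-1": (0.9, 0.8), "sports-2": (0.85, 0.9)}
print(range_query(index, (0.25, 0.1), 0.2))                               # ['news-1', 'news-2']
print(nearest_neighbours(index, index["sports-1"], k=1, exclude="sports-1"))  # ['sports-2']
```

The second call also shows how relevance feedback, discussed in the next paragraph, can reuse the same machinery: the vector of a document the user liked becomes the query point, and that document itself is excluded from the answer.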
The notion of similarity can also be used in systems in which the user tells the
system that he or she likes a particular document, a technique called ‘relevance
feedback’ (Boll et al., 1998, p. 183). The system can then try to find similar
documents by finding the nearest neighbours of that document.
2 The Attack, Decay, Sustain, and Release amplitudes and periods, which together describe the
overall envelope of an audio signal.
5.4 Document Structure
This section will examine retrieval techniques that are based on the structure of
a document. This structure can be denoted using structure description languages
such as MHEG (Meyer-Boudnik & Effelsberg, 1995; Chalon, 1994) or SGML
(Goldfarb, 1990). SGML is in fact a meta-language: many markup languages
can be specified in SGML using a so-called Document Type Definition (DTD).
The Hypertext Markup Language (HTML) is one of the best-known markup
languages derived from SGML. The more recent XML is also derived from SGML:
it is a simplified subset of SGML.
A Unit of Learning Material, for example, could be decomposed into an introduction,
some theory, a few examples, and a test. An Educational Markup Language
(EML) could be designed to capture this structure. The user can then refer
to the structure to specify the documents that he or she is seeking, for example to
answer questions such as “give me all units of learning material that have at least two
examples, a test, and a theory part about programming in Pascal”.
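To make the idea concrete, the sketch below invents a tiny EML in which a unit contains introduction, theory, example and test elements, with a subject attribute on the theory element. These element and attribute names are hypothetical, since the text does not define an EML. The example query then becomes a structural check on each unit, written with Python's xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# A hypothetical collection of Units of Learning Material in an invented EML.
COLLECTION = """
<units>
  <unit id="ulm-1">
    <introduction/>
    <theory subject="programming in Pascal"/>
    <example/><example/>
    <test/>
  </unit>
  <unit id="ulm-2">
    <theory subject="programming in Pascal"/>
    <example/>
    <test/>
  </unit>
</units>
"""


def matching_units(xml_text: str, subject: str, min_examples: int = 2) -> list[str]:
    """Ids of units with enough examples, a test, and a theory part on `subject`."""
    root = ET.fromstring(xml_text.strip())
    result = []
    for unit in root.iter("unit"):
        has_theory = any(t.get("subject") == subject for t in unit.findall("theory"))
        enough_examples = len(unit.findall("example")) >= min_examples
        has_test = unit.find("test") is not None
        if has_theory and enough_examples and has_test:
            result.append(unit.get("id"))
    return result


print(matching_units(COLLECTION, "programming in Pascal"))  # ['ulm-1']
```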
The Hypermedia/Time-based Structuring Language (HyTime) can be used to
specify time and space constraints (ISO, 1992), such as: “the second videoclip
has to start 20 seconds after the first one has started, and it should appear to the
left of the text”. These characteristics are very useful when presenting multimedia
objects on the screen, but are less useful when searching a database.
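For illustration, the sketch below resolves relative timing constraints of the kind quoted above (“starts N seconds after X starts”) into absolute start times. It does not use HyTime syntax, it ignores spatial constraints such as “to the left of the text”, and the object names and constraint format are hypothetical.

```python
from typing import Optional

# object -> (reference object, delay in seconds), or None for "starts at time 0".
Constraint = Optional[tuple[str, float]]


def resolve_start_times(constraints: dict[str, Constraint]) -> dict[str, float]:
    """Turn 'starts d seconds after X starts' constraints into absolute start times."""
    times: dict[str, float] = {}

    def start(obj: str, visiting: frozenset[str] = frozenset()) -> float:
        if obj in times:
            return times[obj]
        if obj in visiting:
            raise ValueError(f"cyclic timing constraint involving {obj!r}")
        rule = constraints[obj]
        t = 0.0 if rule is None else start(rule[0], visiting | {obj}) + rule[1]
        times[obj] = t
        return t

    for obj in constraints:
        start(obj)
    return times


print(resolve_start_times({"video-1": None, "video-2": ("video-1", 20.0)}))
# {'video-1': 0.0, 'video-2': 20.0}
```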
specific characteristics is needed (Li & Merrill, 1991).
This approach saves development and maintenance costs, as the author of
learning material can design courseware at a high conceptual level, and auto-
matically generate actual multimedia material that matches the conceptual design
(Hendley et al., 1993).
A slightly different technique is to map all data elements into a naming hierar-
chy. For example, the subject areas of a Computer Science curriculum could be
organised as follows:
Computer Science
    State Machines and Languages
    ...
    Computer Networks
        Network Architectures
        Protocol Design
        Protocol Validation
    Database Systems
        Relational Database Systems
        Hierarchical Database Systems
        Network Database Systems
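One way to use such a hierarchy for retrieval is sketched below. It assumes that each learning object is labelled with a slash-separated path into the hierarchy (the object ids are hypothetical) and lets a query for a subject area also return everything filed under its sub-areas.

```python
# Hypothetical learning objects, each labelled with a path in the naming hierarchy.
OBJECTS = {
    "lecture-7": "Computer Science/Database Systems/Relational Database Systems",
    "exercise-3": "Computer Science/Database Systems/Network Database Systems",
    "slides-2": "Computer Science/Computer Networks/Protocol Design",
}


def objects_under(subject: str, objects: dict[str, str]) -> list[str]:
    """Return ids of objects filed under `subject` or any of its sub-areas."""
    prefix = subject.rstrip("/") + "/"
    return sorted(oid for oid, path in objects.items()
                  if path == subject or path.startswith(prefix))


print(objects_under("Computer Science/Database Systems", OBJECTS))
# ['exercise-3', 'lecture-7']
```

Matching on the prefix plus a trailing slash prevents a query for “Database” from accidentally matching “Database Systems”.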
3
see http://lcweb.loc.gov/marc/
5.6 Generic techniques
There are also some techniques which operate on all locations indicated by the
arrows in Figure 5.1. These will be discussed below.
5.7 Discussion
Globally, there are two options for retrieval mechanisms in educational databases:
adding new information, or extracting information from existing data. Techniques
from content-based retrieval can be used, for example, to index multimedia
objects based on their content. Teachers can then, for example, retrieve video
objects that contain cows and meadows. However, in Section 4.2 it was stated
that the Unit of Learning Material should not be considered to be a unit of subject
matter, or a unit of knowledge. This thesis proposes a different approach: to treat
a learning object as truly an object that has been created for learning purposes,
and to enable the teacher to search for learning objects instead of just for
subject matter about a certain topic, or for (moving) pictures that contain certain
objects. Content-based retrieval techniques are inadequate for this: there
is no known algorithm that can determine, by looking at the content of a learning
object (text, video, images), what educational objectives the object tries to reach,
for what target group the material is designed, or how much interactivity the
object provides.
Currently, these properties can only be determined by human beings, who add
these labels manually. Of course, this is a subjective task, as there are many
different views on learning (such as constructivism and behaviorism), and each
view may have its own set of terms to describe learning objectives or target audi-
ences. Yet, if an agreement between these different views is found, or constructed,
then manually assigning educational metadata labels may be a useful tool for re-
trieval purposes. The next section will discuss educational metadata in some more
detail.
a generic metadata scheme for networked resources, and to reach a consensus on a
core set of metadata elements. The elements that were defined are:
Title: The name of the resource, or a descriptive phrase if the resource does not
have a name.
Author: The person(s) primarily responsible for the intellectual content of the
resource.
Publisher: The agent or agency responsible for making the resource available.
OtherAgent: The person(s), such as editors and transcribers, who have made other
significant intellectual contributions to the resource.
Form: The data representation of the resource, such as a PostScript file or a Windows
executable file.
Source: Objects, either print or electronic, from which this resource is derived, if
applicable.
Coverage: The spatial locations and temporal durations characteristic of the
resource.
Values for each of these elements can be added to an online resource. There is no
fixed vocabulary from which the values should be taken. It is possible, however, to
specify such a vocabulary to avoid ambiguous interpretations of the element values.
The list of elements is extensible, so that other fields (e.g. the educational field)
can add specific elements while still maintaining compatibility with the Dublin
Core. The Dublin Core workshop series is still being held each year, and ongoing
work includes, amongst others, the deployment of the Dublin Core for educational
resources.
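A metadata record of this kind is small and can be searched without touching the raw data. The sketch below represents each record as a Python dictionary keyed by element names from the list above, plus one extra element standing in for an educational extension. The record values and the extension name “EducationalLevel” are hypothetical and not part of the Dublin Core.

```python
# Hypothetical records: element names from the list above, plus one invented extension.
RECORDS = [
    {"Title": "Introduction to SQL", "Author": "J. Smith", "Form": "PostScript file",
     "EducationalLevel": "first-year undergraduate"},
    {"Title": "Protocol Validation", "Author": "A. Jones", "Form": "Windows executable file"},
]


def search(records: list[dict[str, str]], **criteria: str) -> list[str]:
    """Titles of records whose fields contain every criterion (case-insensitive)."""
    hits = []
    for record in records:
        # A missing element counts as an empty value, so it never matches a criterion.
        if all(value.lower() in record.get(field, "").lower()
               for field, value in criteria.items()):
            hits.append(record["Title"])
    return hits


print(search(RECORDS, Form="postscript"))                 # ['Introduction to SQL']
print(search(RECORDS, EducationalLevel="undergraduate"))  # ['Introduction to SQL']
```

Records without the extension element simply never match on it, which illustrates how educational extensions can coexist with plain Dublin Core records.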
Some of the fields that are in the Dublin Core have also been explored by educational
database projects (see Section 3.1.2), and some projects discussed in Chapter
3 extended the Dublin Core with educational metadata fields. In Appendix A
an overview of these projects is given, including a diagram of the relationships
between the projects concerning the metadata fields (see Figure A.2). The figure
shows that many projects were inspired by the Dublin Core, and that many projects
eventually led to the common standardization efforts undertaken by the IEEE
Metadata group.
learning objects to which labels, or metadata, are attached.
The IEEE metadata standard is such a labeling system; it consists of many
mandatory and additional optional labels (see Appendix C.3). It would take
considerable time for a person to assign values to all these labels, which casts doubt
on the feasibility of such a scheme. In the Ariadne project, this task was assigned to
metadata typists who were not necessarily experts on the subject matter. However, the
person entering the metadata should be a subject matter expert, because one of the
most important metadata fields is the list of the main concepts that the learning
material is about. Only a subject matter expert will be able to accurately describe
these concepts, especially if the learning material concerns subject matter at an
academic level. The consequence is that the teacher will have to enter the metadata,
because in academic settings he or she is the subject matter expert for his or her courses.
However, one could propose that teachers will never voluntarily label learning
material, and that database systems based on labeled learning material will
therefore fail in practice. This proposition will be called the Voluntary Labeling
Problem.
It is believed not to be a problem, for the following reasons:
In the world of libraries, objects have been labeled for decades. People are
hired to categorize objects and to make sure they are stored in a logical manner,
which enables library users to find the books again. Similarly, personnel
can be hired to label educational materials and store them in an educational
database, or teachers can be trained to do this.
There are examples of projects that use a system in which the teachers have
to label learning material themselves (such as the GEM⁶ system), and that
show that teachers are willing to do this.
6
see http://www.thegateway.org
5.9 Conclusion
In this chapter, an overview was given of existing retrieval techniques. Globally,
two methods can be identified: adding new information to the data, and extracting
information from existing data. Educational metadata usually add new information
to learning objects, and are in the process of being standardized. This
enables the world-wide exchange of learning objects between large educational
database systems. To generate methods and principles that support this process, and
because educational metadata are a feasible basis for search algorithms,
the remaining part of this thesis will be based on educational metadata as currently
being standardized by the IEEE.
In the next chapter, a prototype of such an educational database system will be
described. This prototype will be used to test (Chapter 7) and validate (Chapter 8) a
method for improving the retrieval of learning objects using educational metadata.
Chapter 6
A Prototype Architecture
6.1 Introduction
The previous chapters have discussed various models of learning objects, and a
novel model was proposed in Section 4.2. Also, various access methods were de-
scribed in Chapter 5, and it was concluded that educational metadata are a suitable
way to make learning materials retrievable.
This chapter will describe a prototype multimedia educational database system
that was built as a “proof-of-concept” for the model of a Unit of Learning Material,
and that provided an experimental environment for research on a distance measure
that uses educational metadata fields; Chapter 7 will elaborate on this distance mea-
sure, and Chapter 8 will describe the experiments performed using the prototype.
The prototype also served as a vehicle to test architectures of online multimedia
delivery applications, so that answers to Research Question 3 can be found. This
architecture allows for the separation of content and layout, which is hypothesized
to increase the reusability of online learning materials (see also Section 4.4.1).
To make sure that the prototype implementation will meet certain goals, a re-
quirements analysis will be made so that an explicit statement about these require-
ments exists; the final design can then be checked against these statements to ensure
that all requirements are met by the design.
The requirements analysis is described in Section 6.2. Section 6.3 will review
some existing architectures: of what functional components do they consist? How
do these components interact with each other? After that, the prototype architecture
will be presented in Section 6.4. The architecture was implemented in a prototype,
which is discussed in Section 6.5. Section 6.6 will review the question whether
large binary objects should be stored inside the database tables or outside, based
on the experiences with the prototype. Finally, Section 6.7 concludes the chapter
with some final remarks.
6.2 Requirements
This section will describe the requirements that the prototype has to meet. It is
structured according to recommendations for software requirements specifications
of the Institute of Electrical and Electronics Engineers (IEEE) as quoted by Be-
hforooz and Hudson (1996, p. 114). First, the functional requirements are pre-
sented in Section 6.2.1: what is the prototype supposed to do? Then, Section 6.2.2
will discuss interface requirements. After that, Section 6.2.3 will discuss perfor-
mance requirements. Finally, Section 6.2.4 will present some design constraints.
Note that as this thesis does not intend to present the entire development process,
only the most important requirements will be discussed.
Requirement 6.1 The prototype’s basic functionality is to allow the entry, config-
uration, and retrieval of Units of Learning Material.
flawlessly with the other installed components. Therefore, it is desirable that the
prototype requires as few extra software components as possible at the end-users’
computers. If the prototype is to be accessed via the World Wide Web, then a
standard web-browser should be the only component to be installed.
Requirement 6.2 The prototype shall require as few software components as pos-
sible to be installed at the computers of the end-users.
Requirement 6.3 The prototype shall be built as a network resource that can be
accessed simultaneously by concurrent users.
Solving Statelessness
A very suitable network-based mechanism to disclose multimedia database con-
tents to many users exists: the Hypertext Transfer Protocol (HTTP). This is the
protocol used by the World Wide Web to transfer hypertext documents. How-
ever, if this protocol is to be used to implement the interaction between a learner
and the application, then problems can be expected: the HTTP protocol was de-
signed as a document retrieval protocol, not as a delivery platform for interactive
network-based applications. For example, the HTTP protocol assumes that the
server (the network host that offers documents for retrieval by clients) is state-
less, which means that it does not store any information about the clients. This is
not needed in the context the protocol was originally meant for: the client sends
a request for a hypertext document to the server, and the server sends that document
back to the client. The HTTP protocol is, however, currently used for much
more advanced applications than just requesting hypertext documents; it is
also used for interactive applications such as online shopping systems.
These applications are not stateless, and the server needs a method to
remember the state of each client. For example, a webserver that keeps track
of the shopping cart of a customer who visits an online shop needs to store a list
of the items that the customer has put in the cart. An online learning system has similar
problems: when a learner starts a session, the learning system has to remember
the username, the current position of the user in the course, the history of what the
user has done (to generate navigation), and so on. Often, this problem is solved by
making use of so-called HTTP cookies. These are small messages that an HTTP
(web) server sends to the client’s web browser, containing a name, a value, an
expiry date and the server’s domain name (such as utwente.nl). The browser
deletes the cookie after the expiry date. Until then, it sends the cookie’s name and
value along with every request to a web server in the cookie’s domain, so that such
a server can read the cookie’s value. Using this mechanism, a web
server can for example store a cookie with name “SessionId” and value “526447”
on the computer of the user when the user logs in on an online shop system and is
assigned shopping cart number 526447. When the user then clicks an item to be
put in his or her shopping cart, the web browser sends “SessionId=526447” along
with the request, so the web server knows in which shopping cart the item is to be
put. This way, the web server is able to track the actions of a user, in spite of HTTP
being a stateless protocol.
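The cookie mechanism just described can be sketched with Python’s standard http.cookies module. The in-memory session store, the function names and the one-hour lifetime are assumptions for illustration; a real server would also handle missing or unknown cookies and discard stale sessions. Note that the server only ever sees the cookie through the Cookie header that the browser attaches to each request.

```python
import secrets
from http.cookies import SimpleCookie

# Hypothetical in-memory session store: session id -> items in the shopping cart.
SESSIONS: dict[str, list[str]] = {}


def login() -> str:
    """Start a session and return the value for the Set-Cookie response header."""
    session_id = secrets.token_hex(8)  # unguessable random id
    SESSIONS[session_id] = []
    cookie = SimpleCookie()
    cookie["SessionId"] = session_id
    cookie["SessionId"]["domain"] = "utwente.nl"
    cookie["SessionId"]["max-age"] = 3600  # browser discards it after one hour
    return cookie["SessionId"].OutputString()


def add_to_cart(cookie_header: str, item: str) -> list[str]:
    """Use the SessionId from the request's Cookie header to find and update the cart."""
    session_id = SimpleCookie(cookie_header)["SessionId"].value
    SESSIONS[session_id].append(item)
    return SESSIONS[session_id]


set_cookie = login()
print(set_cookie)  # e.g. SessionId=3f9c...; Domain=utwente.nl; Max-Age=3600
# In later requests the browser sends back only the name=value pair.
request_header = set_cookie.split(";")[0]
print(add_to_cart(request_header, "Database Systems textbook"))
```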
Although these observations have already indicated how the issue of storing
state information can be solved, a requirement on this issue will be formulated
below:
Requirement 6.5 The prototype will implement state information for each client
to track the client’s session.
Search Facilities
One of the goals of the prototype is to explore new search facilities utilizing ed-
ucational metadata. In one of these search facilities, the prototype will use the
measure of relevance that will be developed in the next chapter to enhance the way
the search results are presented.
Requirement 6.6 The prototype shall be designed so that a distance measure can
be incorporated into the search algorithms, and so that experiments can be done
to validate the distance measure.
Navigation
As mentioned in Section 2.3.2, learners may become distracted by the navigation
task when using an interactive learning environment. To prevent this task from
becoming so difficult that learners get “lost in hyperspace”, it has to be
intuitive and clear. To achieve this, the following requirement is formulated:
Requirement 6.8 The prototype shall provide sufficient navigational clues so that
the user will always know where he is, where he has come from, and where he can
go next.
Modularity
A modular design of a computer system helps prevent errors: if modules are as
independent of each other as possible, then an error in one module is less likely to
lead to errors in other modules. Also, designing independent modules allows for
easy replacement of modules, as long as the new module implements the same function
calls with the same functionality. A design that is sufficiently modular will allow modules to
be replaced without rewriting other modules. So, the next requirement is the
following:
Requirement 6.9 The prototype should be designed in a modular fashion, allow-
ing for easy replacement of modules and easy error localisation.
Portability
A prototype of a learning environment that is supposed to enable the reuse of learning
materials between departments or even between universities should be able to
run on a variety of operating systems. Also, within the Idylle project it was
specified that all software should eventually be able to run on the Microsoft Windows
platform, while the researcher preferred Unix-like platforms as a development
environment. So, the prototype should be able to run on a multitude of operating
systems without major modifications, that is, it should be portable. This leads to
Requirement 6.10:
[Figure: labels recovered from the original diagram: Student, Web browser, World Wide Web, HTML, ODBC, Database]
<html>
<body>
<?php
// Legacy mysql_* API (removed in PHP 7), as was common at the time.
$db = mysql_connect("localhost", "root");
mysql_select_db("mydb", $db);
$result = mysql_query("SELECT * FROM employees", $db);
echo "<table border=1>\n";
echo "<tr><td>Name</td><td>Position</td></tr>\n";
while ($myrow = mysql_fetch_row($result)) {
    printf("<tr><td>%s %s</td><td>%s</td></tr>\n",
           $myrow[1], $myrow[2], $myrow[3]);
}
echo "</table>\n";
?>
</body>
</html>
As can be seen, the HTML tags are interleaved with SQL queries and result
processing code. It is very common to mix SQL queries and programming code;
see for example Elmasri and Navathe (1989, p. 204). The advent of HTML as a
layout language (although intended as a markup language) has made things worse:
now SQL queries are mixed with programming code and layout code. ASP works
similarly; see the following example³:
<%
DO WHILE NOT sessionRecords.EOF
%>
<tr>
<td><p><% =sessionRecords("Name") %> </td>
<td><p><% =sessionRecords("Position") %> </td>
</tr>
<% sessionRecords.MoveNext
LOOP
sessionRecords.Close
%>
The commonly accepted practice of indenting the code for each loop, tag or
procedure breaks down, so that the programmer can no longer see the loops and
tags clearly. In the PHP example, the indentation no longer shows where the
<table> element starts, which elements belong in it, and where it ends. This
increases the chance of mistakes and errors, such as forgotten closing tags (such
as </table>), unmodular PHP code (everything has to be written in a particular
HTML file, although files can be included), and so on. Similarly, in the ASP
example it is impossible to correctly indent the DO WHILE ... LOOP and the
HTML table tags simultaneously. Proper indentation is one of the virtues of the
art of programming.
Observe, for example, that the PHP script uses a mysql_fetch_row call.
This indicates that the MySQL database is used. When the company
decides to use another database, a programmer has to edit the script file,
modifying the queries and the database calls without touching the HTML code.
This is extremely error-prone.
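For contrast with the scripts above, the following sketch keeps the query in one function and the HTML layout in another, so that switching databases touches only the data-access function and the table nesting stays visible. It uses Python’s built-in sqlite3 module as a stand-in for MySQL; the column names and the sample row are hypothetical, chosen to mirror the PHP example.

```python
import sqlite3
from html import escape


def fetch_employees(conn: sqlite3.Connection) -> list[tuple[str, str, str]]:
    """Data access only: the one place that knows about SQL and the DBMS."""
    return conn.execute("SELECT first_name, last_name, position FROM employees").fetchall()


def render_table(rows: list[tuple[str, str, str]]) -> str:
    """Layout only: turns plain tuples into an HTML table."""
    lines = ["<table border=1>", "  <tr><td>Name</td><td>Position</td></tr>"]
    for first, last, position in rows:
        lines.append(f"  <tr><td>{escape(first)} {escape(last)}</td>"
                     f"<td>{escape(position)}</td></tr>")
    lines.append("</table>")
    return "\n".join(lines)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, first_name TEXT, "
             "last_name TEXT, position TEXT)")
conn.execute("INSERT INTO employees (first_name, last_name, position) "
             "VALUES ('Ada', 'Lovelace', 'Analyst')")
print(render_table(fetch_employees(conn)))
```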
Another method that is popular on Unix-like operating systems is to use the
Common Gateway Interface (CGI) to execute a program (written in a
scripting language such as Perl, or an executable program) to process the request.
The CGI program connects to the database, retrieves the data, formats the returned
results, and gives the resulting HTML page back to the Web server. It is entirely up
to the CGI program how the HTML page is generated. The main disadvantage of this
technique is that a new process is created to execute the CGI program (or script)
each time a request to it is made. In most operating systems, process
creation is a relatively slow operation.
3
taken from http://www.bann.co.uk/asp/global/connections.asp, an online ASP tutorial; some
irrelevant code was omitted and the column names were modified to match the PHP example.
Finally, many Database Management Systems allow web integration via extra
modules (sometimes also called plug-ins or datablades). These software modules
enable the DBMS to return query results as HTML code by using a (mostly pro-
prietary) script language. These techniques resemble Server-Side Scripting tech-
niques, but the languages used are less common.
These technologies are used in many application domains, amongst others the
delivery of online learning materials. The Blackboard system version 5, for exam-
ple, uses the MySQL database, the Apache webserver and Perl scripting for Unix
platforms. On Windows platforms, the Microsoft SQL Server and the Microsoft
Internet Information Server are supported.
[Figure: architecture diagram; recovered labels: Presentation Layer, API (XML), Navigation Manager, History Manager, ODBC (SQL)]
The interaction events from the user (menu selections, filled-in forms, navigation
events) are encoded in an XML document by the presentation layer, and
then passed on to the Interaction Processor using the process API call. The
Interaction Processor communicates with functional components that implement the
basic functionality of the prototype (as described in Requirement 6.1): editing and
“playing” courses, and making learning material accessible to the user via an en-
cyclopedia. These components are the Course Editor, the Course Player, and the
Encyclopedia. Other components are the History Manager (which traces the us-
age history so that the user can go back one or more events), and the Navigation
Manager which generates and manages the navigation events (satisfying Require-
ment 6.8).
These components create output in so-called Interaction Elements: tables, fill-
in forms, menus, warnings, errors, and ULMs themselves. These components are
capable of creating an XML representation of themselves (for an example of a
menu, see below). These XML parts are gathered by the Interaction Processor,
and are stored temporarily until the generateOutput call is made; the Interac-
tion Processor will then return the XML document containing the outputs of the
components. The presentation layer will use the XML document to create a screen
which the user can view and interact with. By encoding all interaction into XML
documents, these components are as independent of the presentation layer as pos-
sible (satisfying Requirement 6.9). This means that any presentation technology
can be used to create the user interface itself.
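The flow described above, in which components contribute XML fragments that the Interaction Processor buffers until generateOutput is called, can be sketched as follows. Only the calls process and generateOutput come from the text (written here in Python style as process and generate_output); the MainMenu component, its handle method, the output root element and the identifier attribute on the event are assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Any


class MainMenu:
    """Hypothetical component that reacts to any event by emitting an XML menu."""

    def handle(self, event: ET.Element) -> ET.Element:
        menu = ET.Element("menu", title="Main menu", identifier="main-menu")
        for ref, name in [("do-course", "Do a course"), ("logout", "Logout")]:
            ET.SubElement(menu, "menu-option", ref=ref, name=name)
        return menu


class InteractionProcessor:
    def __init__(self, components: dict[str, Any]):
        self.components = components         # identifier -> object with handle(event)
        self.pending: list[ET.Element] = []  # XML parts gathered since the last output

    def process(self, event_xml: str) -> None:
        """Dispatch an XML-encoded user event to the component it addresses."""
        event = ET.fromstring(event_xml)
        component = self.components[event.get("identifier")]
        self.pending.append(component.handle(event))

    def generate_output(self) -> str:
        """Wrap all gathered parts in one XML document and clear the buffer."""
        root = ET.Element("output")
        root.extend(self.pending)
        self.pending = []
        return ET.tostring(root, encoding="unicode")


ip = InteractionProcessor({"main-menu": MainMenu()})
ip.process('<submission identifier="main-menu"/>')
print(ip.generate_output())
```

Because the components only produce XML elements, the presentation layer can turn the resulting document into HTML or any other user interface without the components knowing about it.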
The aforementioned components interact with a Database Abstraction Layer
that was provided to minimize dependencies on a particular database system,
making it easier to switch to another database system in the future (see
Requirement 6.9). It should be noted in this context that although JDBC provides a uniform
interface to the database management system, it does not hide all specific details:
each DBMS has its own dialect of SQL, which is almost never completely
compatible with other SQL dialects. The purpose of the Database Abstraction Layer is to
hide these dialect differences as well.
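As an example of the kind of dialect difference the Database Abstraction Layer has to hide: fetching only the first n rows is written with LIMIT in MySQL but with TOP in Microsoft SQL Server. The sketch below assumes a hypothetical interface in which a component asks the layer for the first courses and a dialect object produces the matching SQL. It runs against Python’s built-in sqlite3 module, which happens to accept the MySQL form; the table and class names are illustrative.

```python
import sqlite3


class MySQLDialect:
    def first_rows(self, columns: str, table: str, order_by: str, n: int) -> str:
        return f"SELECT {columns} FROM {table} ORDER BY {order_by} LIMIT {int(n)}"


class SQLServerDialect:
    def first_rows(self, columns: str, table: str, order_by: str, n: int) -> str:
        return f"SELECT TOP {int(n)} {columns} FROM {table} ORDER BY {order_by}"


class DatabaseAbstractionLayer:
    """Components call these methods; only the dialect object knows the SQL syntax."""

    def __init__(self, connection, dialect):
        self.connection = connection
        self.dialect = dialect

    def first_courses(self, n: int) -> list:
        # Table and column names come from this code, never from user input.
        sql = self.dialect.first_rows("title", "courses", "title", n)
        # sqlite3 connections offer execute() directly; other DB-API drivers use a cursor.
        return self.connection.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE courses (title TEXT)")
conn.executemany("INSERT INTO courses VALUES (?)",
                 [("Networks",), ("Databases",), ("Compilers",)])
dal = DatabaseAbstractionLayer(conn, MySQLDialect())  # SQLite accepts the LIMIT form
print(dal.first_courses(2))  # [('Compilers',), ('Databases',)]
print(SQLServerDialect().first_rows("title", "courses", "title", 2))
```

Switching to another DBMS would then mean supplying a different dialect object, while the components calling first_courses stay unchanged.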
As an example, the CoursePlayer uses the Database Abstraction Layer to re-
trieve a list of courses that a particular student can do, and generates the following
SGML code:
This code is then given to the Interaction Processing Layer, which will transform
it, for example, into HTML (for the Web), X Window System API calls (for the X Window
System environment), or AWT calls (for the Java Abstract Window Toolkit) to create a
presentation for the user.
6.5 Implementation
In the course of the project, a prototype implementation was continuously revised
and improved to test various ideas. The technical problems that were encountered
during these efforts, and their solutions, will not be discussed here, as the purpose
of this section is to show that the architecture presented above can be
successfully implemented. So only the final implementation will be discussed here.
Section 6.5.1 will present the software components that were used while constructing
the implementation, Section 6.5.2 will discuss how HTML code was generated by
the presentation layer, and Section 6.5.3 will explain how the interaction with the
user was handled. Finally, Section 6.5.4 will present some thoughts on scalability
issues.
and uses almost the same syntax, this decision has no impact on the architecture
itself.
To disclose the database contents via a computer network (to satisfy Require-
ment 6.3), the webserver “Apache”6 was chosen for reasons of stability, free avail-
ability, extensibility, and the fact that it supports Java and XML. Apache is one
of the most popular webservers at the time of writing. The webserver was ex-
tended with a Java Servlet Engine, which is a software component that enables
Java Servlets to process certain types of requests. The Cocoon servlet (also being
developed by the Apache community) was used to format XML documents; see
also Figure 6.4. Cocoon was chosen for its free availability and its compatibility
with the Apache webserver. It was the only XML publishing framework written in
Java that could be found.
The application-specific components, such as the Interaction Processing Layer,
the Course Player and so on, were implemented as a Java Servlet, called “DILE”
which stands for “DIstributed Learning Environment”.
These components were configured to operate together; see Figure 6.3. The
figure shows the WWW client interacting with the Apache server using HTTP.
The Apache server in turn communicates with the Servlet Engine. The Java
implementation in the lowest box implements the
architecture shown in Figure 6.2. Note that the WWW client, the Apache web
server and the Cocoon formatting engine are part of the ‘Presentation Layer’, as
they are responsible for getting data across the network onto the screen of the user.
The advantage of implementing the application as a Java servlet is that the
Servlet Engine is capable of tracking HTTP sessions, satisfying the Statelessness
Requirement 6.5 (see Section 6.2.4). Also, the Servlet Engine runs as one process
for all users interacting with the prototype, so that no process needs to be created
for each request as is the case with CGI technology (see Section 6.3.2).
In the next sections, some more details on the inner working of the prototype
will be presented.
[Figure: WWW client and Presentation Layer exchanging HTTP requests and HTTP responses; MySQL DBMS]
[Figure 6.4 diagram: Interaction Processor → XML → Cocoon → HTML → output stream]
Figure 6.4: Creating HTML output by applying style (XSL) to content (XML).
The fact that the output of the Interaction Processor is partly generated by send-
ing queries to a DBMS is fully hidden from this scheme, so that the presentation
layer does not rely on particular features of a particular DBMS, further increasing
the independence between software modules.
The ‘- O’ sequence indicates that the “opening” tag <menu> may not be
omitted and that the “closing” tag </menu> of the menu may be omitted. The
‘CDATA’ indicates that no further sub-elements are specified and that the element
can contain any (textual) data.
We have created a DTD containing elements for menus, forms, navigation, his-
tory, and other interaction elements. Documents that conform to this DTD should
be able to represent all possible output data of the prototype. Similarly, a DTD
was created for all possible input data of the prototype. Each SGML document
specifies in a “DOCTYPE” declaration the DTD to which it conforms. For example, the
following SGML document describes the main menu of the prototype using the
“output.dtd” DTD:
<manager id="1"/>
<menu title="Main menu" identifier="main-menu">
  <menu-option ref="do-course"
               name="Do a course"/>
  <menu-option ref="encyclopedia"
               name="Go to encyclopedia"/>
  <menu-option ref="edit-course"
               name="Edit a course"/>
  <menu-option ref="view-results"
               name="View student’s results"/>
  <menu-option ref="logout"
               name="Logout"/>
</menu>
Figure 6.5: Appearance of the HTML code in a Netscape browser window
As can be seen, this HTML code is not “polluted” with database queries. The
menu shown does not include any information retrieved from the database. However,
a menu that consists, for example, of the names of some courses retrieved
from the database is specified in XML in the same way, and is translated to HTML
by Cocoon in the same way. The HTML code appears on the computer screen as
shown in Figure 6.5.
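Cocoon performs this translation with an XSL stylesheet. Because an XSLT processor is not part of Python’s standard library, the sketch below imitates the same content-to-layout step with ElementTree, turning a menu element like the one shown above into an HTML list of links. The link format mirrors the URL described in the following paragraph. Since the text does not show the exact HTML the prototype produced, this rendering is an assumption.

```python
import xml.etree.ElementTree as ET
from html import escape
from urllib.parse import urlencode

MENU_XML = """<menu title="Main menu" identifier="main-menu">
  <menu-option ref="do-course" name="Do a course"/>
  <menu-option ref="logout" name="Logout"/>
</menu>"""


def menu_to_html(menu_xml: str, manager_id: str = "1") -> str:
    """Render a <menu> element as an HTML heading plus a list of links."""
    menu = ET.fromstring(menu_xml)
    lines = [f"<h2>{escape(menu.get('title'))}</h2>", "<ul>"]
    for option in menu.findall("menu-option"):
        # Same parameters as the prototype's URLs; urlencode turns spaces into '+'.
        query = urlencode({"managerid": manager_id, "identifier": menu.get("identifier"),
                           "type": "menu", option.get("ref"): option.get("name")})
        lines.append(f'  <li><a href="?{escape(query)}">{escape(option.get("name"))}</a></li>')
    lines.append("</ul>")
    return "\n".join(lines)


print(menu_to_html(MENU_XML))
```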
The user will then click a menu option, which the user’s web browser translates
into a URL that looks something like this:
“?managerid=1&identifier=main-menu&type=menu&do-course=Do+a+course”. This
URL will be sent by the web client to the web server (located in the presentation layer),
which translates it into an abstract representation in XML that in turn conforms to the
“input.dtd” DTD. The XML document then looks as follows:
do-course
</submission>
The interaction processor, which receives SGML documents from the presentation
layer, now knows that the user has selected menu option “do-course” from the
menu that is identified with “main-menu”.
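The translation from the query string to the abstract input document can be sketched with urllib.parse and ElementTree. The submission root element appears in the text, but the attributes and the choice child used here are hypothetical, since the full “input.dtd” is not reproduced.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qsl

# Parameters that describe the interaction itself rather than the user's choice.
CONTROL_FIELDS = {"managerid", "identifier", "type"}


def query_to_submission(query: str) -> str:
    """Turn '?managerid=1&identifier=...&do-course=...' into an XML submission."""
    fields = parse_qsl(query.lstrip("?"))
    control = {k: v for k, v in fields if k in CONTROL_FIELDS}
    submission = ET.Element("submission", control)
    for key, _label in fields:
        if key not in CONTROL_FIELDS:
            ET.SubElement(submission, "choice").text = key  # the selected option
    return ET.tostring(submission, encoding="unicode")


print(query_to_submission("?managerid=1&identifier=main-menu&type=menu&do-course=Do+a+course"))
```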
6.5.4 Scalability
What about Requirement 6.4 on scalability? The prototype implementation can be scaled upwards in several phases. Some scalability is already built into the Apache web server: it creates more clones of itself as the load (number of URL requests) increases, and each clone is capable of handling one request at a time. If the load decreases, Apache disposes of the clones. The maximum number of clones can be specified in a configuration file. If the machine that the web server runs on has multiple CPUs, then the clones can each use a different CPU concurrently, so that multiple requests can be handled at the same time.
All clones use the same Apache Servlet Engine and the same Java Virtual Machine. If this part of the system becomes overloaded, then several instances of Apache can be started, each with its own Servlet Engine (Apache is not designed to run multiple Servlet Engines). Each Apache instance has to run on its own network port, e.g. one on port 8080, the next on port 8081, and so on, because almost all known operating systems do not allow multiple applications to listen on the same port. Note that clients that connect to the system must then specify in the URL (using the protocol://host:port/path notation) which port they want to connect to, which is undesirable because learners do not want to be bothered with network port numbers. This problem can be solved by setting up Apache servers on different machines, and assigning these machines the same Internet hostname8. It would also be possible to run one Apache server for each department. Note that the Apache instances will still all connect to the same MySQL database, thus sharing one large collection of learning materials.
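The round-robin assignment of one hostname to several machines can be illustrated with a toy resolver. This Python sketch only models the behaviour; it is not a DNS server configuration, and the hostname `dile.example.edu` and the addresses (taken from the 192.0.2.0/24 documentation range) are hypothetical.

```python
from collections import deque


class RoundRobinResolver:
    """Toy model of round-robin DNS: one hostname maps to several machines,
    and each lookup returns the address list rotated by one position, so
    successive clients connect to successive machines."""

    def __init__(self, hostname: str, addresses: list):
        self.hostname = hostname
        self._addresses = deque(addresses)

    def resolve(self, hostname: str) -> list:
        if hostname != self.hostname:
            raise KeyError(hostname)
        answer = list(self._addresses)
        self._addresses.rotate(-1)  # the next lookup starts at the next machine
        return answer


# Hypothetical hostname and addresses.
dns = RoundRobinResolver("dile.example.edu",
                         ["192.0.2.10", "192.0.2.11", "192.0.2.12"])
# A client typically connects to the first address in the answer.
print([dns.resolve("dile.example.edu")[0] for _ in range(4)])
# ['192.0.2.10', '192.0.2.11', '192.0.2.12', '192.0.2.10']
```

Because every machine answers on the default port under the same name, learners never see port numbers.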
8
This can be achieved by implementing a round-robin IP number assignment scheme in the Domain Name Server (DNS).
Another bottleneck that can occur is the Database Management System (DBMS), because it has to transmit the multimedia data to the database application, a task that is very IO-intensive. The current version of the MySQL database cannot be replicated, although a “master” DBMS can be set up to which all updates are sent; one or more “slave” DBMSs are updated from the master, and non-updating queries (“read-only” queries) are sent to the slave(s). The Ariadne Knowledge Pool system (see Appendix A.9) also uses this technique. Commercial databases sometimes do implement replication, but this option is often very expensive. A disadvantage of the master/slave configuration is that changes in the database are not immediately visible to the user; true replication does not have this disadvantage, although a slight delay (several minutes) may occur. Another disadvantage of the MySQL database was that the version used did not support transactions9. This means that when a ‘mirror’ (an exact copy of the database contents) has to be made, the entire DBMS has to be shut down (or all tables locked) to avoid update queries that could make the mirror inconsistent. As the prototype implementation uses JDBC, it is not very difficult to switch to a DBMS that does implement replication or transactions.
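The master/slave set-up implies a routing rule in the database application: updates go to the master, read-only queries to one of the slaves. A minimal Python sketch, assuming a naive classification by the first SQL keyword and hypothetical server and table names (the prototype itself used JDBC from Java):

```python
from itertools import cycle

# Naive classification: statements starting with these keywords are treated
# as read-only. Cases such as SELECT ... FOR UPDATE are ignored here.
READ_ONLY = ("SELECT", "SHOW", "DESCRIBE", "EXPLAIN")


class MasterSlaveRouter:
    """Send updates to the master DBMS and spread read-only queries over
    the slave DBMSs in turn."""

    def __init__(self, master: str, slaves: list):
        self.master = master
        self._slaves = cycle(slaves) if slaves else None

    def route(self, sql: str) -> str:
        read_only = sql.lstrip().upper().startswith(READ_ONLY)
        if read_only and self._slaves is not None:
            return next(self._slaves)
        return self.master  # all updates, and reads when there are no slaves


router = MasterSlaveRouter("master-db", ["slave-db-1", "slave-db-2"])
for q in ["UPDATE ulm SET title = 'x' WHERE id = 7",
          "SELECT * FROM ulm",
          "select id FROM ulm"]:
    print(router.route(q))  # master-db, slave-db-1, slave-db-2
```

A read issued right after an update may reach a slave that has not yet received that update, which is exactly the visibility delay mentioned above.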
If all three stages are implemented, multiple complete “stacks” (database, DBMS, Dile, and Apache) can be set up on different computers to spread the load. If a DNS round-robin IP assignment mechanism is used, then all machines appear to be the same Internet host; this technique is also often used for the websites of large corporations (e.g. www.microsoft.com). Another technique is HTTP-level redirection using the META REFRESH tag, which redirects a web client to another (random or least-loaded) web server, so that in the learners’ perception there is only one website.
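The HTTP-level redirection can be sketched in Python: pick the least-loaded server and return a page whose META REFRESH tag sends the browser there. The server URLs and load figures are hypothetical, and how the load is measured (e.g. the number of active requests) is an assumption.

```python
import html


def least_loaded(loads: dict) -> str:
    """Pick the server URL with the smallest current load."""
    return min(loads, key=lambda url: loads[url])


def meta_refresh_page(target_url: str) -> str:
    """HTML page that tells the browser to load target_url immediately."""
    url = html.escape(target_url, quote=True)
    return ("<html><head>"
            f'<meta http-equiv="refresh" content="0; url={url}">'
            "</head><body></body></html>")


# Hypothetical servers and load figures.
loads = {"http://www1.example.edu/": 12, "http://www2.example.edu/": 3}
print(meta_refresh_page(least_loaded(loads)))
```

Choosing a random server instead of the least-loaded one only requires replacing `least_loaded`.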
files, and download both. This defeats the purpose of the cache. Similarly, if a student accesses the materials twice, the MPEG object may again be stored in two files, which will each be downloaded because the web browser’s internal cache cannot detect that the two files are identical. Another, although smaller, problem is that it is difficult to determine when these temporary files are no longer needed. The database application needs a policy for deciding when these files can be safely removed, e.g. when they have not been accessed for one week, or when the original multimedia object in the database has been modified (to avoid inconsistencies).
Although these problems illustrate that consistency problems may arise when
the self-contained nature of a DBMS is abandoned, it may still be better to store
multimedia objects outside the database when accessing the database contents
through the web. Storing objects in files prevents object-to-file mapping prob-
lems, so that WWW cache mechanisms operate as intended, and also avoids the
need to copy the database objects into files before they can be transmitted across
the WWW.
The aforementioned problems can be solved by making sure that the database
application has a 1:1 mapping of multimedia objects onto temporary files, and that
there is an appropriate policy to remove unused files.
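A minimal Python sketch of these two measures, under stated assumptions: the 1:1 mapping is obtained by deriving the file name deterministically from the object identifier and a version string (hashed with SHA-1), and the removal policy is the one-week/modified rule given earlier. The `.mpg` suffix, the function names and the version strings are hypothetical.

```python
import hashlib
import os
import tempfile

ONE_WEEK = 7 * 24 * 3600  # seconds


def temp_file_name(object_id: str, version: str) -> str:
    """1:1 mapping: the same object version always gets the same file name,
    so repeated requests lead to the same URL and the same cache entry."""
    digest = hashlib.sha1(f"{object_id}:{version}".encode()).hexdigest()
    return digest + ".mpg"  # hypothetical suffix for an MPEG object


def export_object(directory: str, object_id: str, version: str,
                  data: bytes) -> str:
    """Write the object to its temporary file unless that file already exists."""
    path = os.path.join(directory, temp_file_name(object_id, version))
    if not os.path.exists(path):
        with open(path, "wb") as f:
            f.write(data)
    return path


def should_remove(last_access: float, now: float,
                  original_modified: bool) -> bool:
    """Removal policy: unused for one week, or stale because the original
    object in the database has been modified."""
    return original_modified or now - last_access > ONE_WEEK


with tempfile.TemporaryDirectory() as d:
    first = export_object(d, "42", "v1", b"mpeg data")
    second = export_object(d, "42", "v1", b"mpeg data")
    print(first == second, len(os.listdir(d)))  # True 1
```

Because a modified object gets a new version string, it also gets a new file name, so a stale copy is never served under the new URL.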
A better solution, however, is to maintain the self-contained nature of a DBMS, and to require that the DBMS is able to send its multimedia objects directly to the web browser without storing them in files, using a unique and permanent URL for each object. This URL can have the form http://dbms.host.name/?id=identifier, with identifier being the unique object identifier of the object within the DBMS. The DBMS has to make sure, however, that only users who have read access to the object are allowed to retrieve it (for example by using HTTP cookies).
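The access check behind such permanent URLs can be sketched as a pure Python function that maps a request to an HTTP status, headers and body. The in-memory object store, the cookie-to-user session table and all names are hypothetical; a real web-enabled DBMS would take these from its own privilege system.

```python
from dataclasses import dataclass
from urllib.parse import parse_qs, urlparse


@dataclass(frozen=True)
class StoredObject:
    data: bytes
    mime_type: str
    readers: frozenset  # user ids that have read access


def serve(url: str, session_cookie, objects: dict, sessions: dict):
    """Return (HTTP status, headers, body) for a request such as
    http://dbms.host.name/?id=identifier."""
    user = sessions.get(session_cookie)  # cookie -> logged-in user
    if user is None:
        return 401, {}, b""
    ids = parse_qs(urlparse(url).query).get("id")
    obj = objects.get(ids[0]) if ids else None
    if obj is None:
        return 404, {}, b""
    if user not in obj.readers:  # the privilege system is retained
        return 403, {}, b""
    return 200, {"Content-Type": obj.mime_type}, obj.data


objects = {"42": StoredObject(b"\x00\x01", "video/mpeg", frozenset({"alice"}))}
sessions = {"c1": "alice", "c2": "bob"}
print(serve("http://dbms.host.name/?id=42", "c1", objects, sessions)[0])  # 200
```

Since the URL of an object never changes, browser and proxy caches can recognise repeated requests for the same object.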
6.7 Conclusions
In this chapter, nine requirements were identified which together, when fulfilled,
should yield an architecture of a modular, scalable, extensible, flexible digital
learning environment that will allow a teacher to efficiently retrieve learning ma-
terials, add his own layout to the content, and make these available to students via
the network.
Three implementations allowed us to test and revise the architecture, eventually yielding an architecture that fulfills all of our requirements. While writing the implementations, it became clear that the database-web integration solutions that are currently commonly accepted (such as PHP and ASP) are not consistent with proper system design. Therefore, this thesis has adopted an approach using XML and Java servlets that makes it possible to separate database issues from content layout issues. This leads to a modular software product that is easier to maintain and extend.
The architecture allows for the separation between content and layout of online learning materials. This thesis proposes that this helps to increase the reusability of learning materials: a teacher who dislikes a ULM for its layout may still be able to reuse it by replacing the layout with his or her own layout.
The question whether to store the objects inside or outside the database was reviewed based on experiences with implementing the architecture. This led to a requirement for “web-enabled” Database Management Systems: they have to be able to disclose their objects via the WWW (i.e. using the HTTP protocol) while retaining the privilege system.
Chapter 7
7.1 Introduction
In Section 4.4 a model for the reusability of online learning materials was pre-
sented. Reuse is a desirable phenomenon, because it saves the costs of producing
(potentially expensive) learning materials. One of the factors that was hypothesized
to affect the reusability was the search method: the better the search method, the
better a teacher is able to find relevant Units of Learning Materials (ULMs), and
the more reuse will occur. For the purpose of organizing the research, Research
Question 3 has been formulated: it concerns the possibility of developing a search method based on educational metadata. To address this question, this chapter will propose a distance measure that tries to predict how useful search results are to a teacher. Using such a prediction, the teacher should be able to search a database of
learning materials effectively, so that the reusability of the learning materials that
are stored in the database is increased.
This chapter is structured as follows: first, Section 7.2 will explain how the
concept of ‘distance’ can be seen in relation to educational metadata. The sec-
tion will explain that the relative importance that teachers assign to characteristics
of learning material should be taken into account, so Section 7.3 will present a
research effort to investigate this relative importance. This knowledge will be in-
corporated into a mathematical distance measure in Section 7.4. Section 7.5 will
summarize the results of this chapter.
The knowledge obtained in this chapter will be used by the next chapter to
construct and validate the distance measure.
7.2 Proximity in Metadata Spaces
Section 5.9 concluded that educational metadata provided a feasible way to im-
plement a search algorithm. However, a problem that may arise is the following:
suppose that a teacher is looking for a Unit of Learning Material on a certain sub-
ject with moderate interactivity, and approximately 20 minutes to work through.
If a search algorithm tried to find a ‘perfect match’, that is a ULM whose metadata fields have exactly the same values as specified by the teacher, then only ULMs whose metadata fields indicate that the ULM takes 20 minutes would be returned to the teacher. But ULMs that take 18 minutes, or 24, could also be of use to the teacher. Similarly, if the ULM has more interactivity than the specified ‘moderate’, then it could still be a useful ULM. In general, the ‘closer’ the values of
the metadata fields of a ULM are to what the teacher has specified, the more useful
the ULM is.
There is, however, another reason why it is difficult to implement a search
method based on a perfect match: ULMs are entered by different people, who each
have their own implicit definitions of ‘interactivity’, ‘difficulty’, and who make
their own estimations of the duration of a ULM. Thus, if two people filled in the metadata values of the same ULM, they might well obtain different results. So some differences in the values of the metadata fields should be tolerated, because the values are only estimates. This tolerance translates into the need for a search method that finds imperfect matches, i.e. ULMs that do not precisely match the search specification, but that come close. The remainder of this thesis will call the (possibly non-existent) ULM that precisely matches the search specification the “ideal ULM”, because this is what the teacher is ultimately looking for.
The following sections present the concept of ‘distance’ in so-called “metadata
spaces” to obtain a measure of closeness of a ULM as compared to what a teacher
has specified. Using this distance measure, a search method can be developed that
is able to find learning objects that ‘match’ or ‘almost match’ the search specifi-
cation entered by the teacher. The search algorithm that will be developed in this
thesis will calculate the closeness (distance) of each ULM to the ideal ULM (the
one that would exactly match the search specification). This distance can be used,
for example, to return the ten ULMs that are closest to the ideal ULM, sorted in
order of increasing distance; or it can present the search results graphically as a
cloud of dots centered around the ideal ULM. The teacher can then start examining these ULMs to find out whether ULMs at some distance from the ideal ULM still fit his or her goals.
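The ranking step described above (compute a distance for every ULM, then return the closest ones in order of increasing distance) can be sketched in Python. The distance function is passed in as a parameter because it is only defined in the following sections; the duration-only distance and the ULM data in the usage example are hypothetical.

```python
import heapq


def closest_ulms(ideal: dict, ulms: dict, distance, k: int = 10) -> list:
    """Return the k ULMs closest to the ideal ULM as (distance, ulm_id)
    pairs, in order of increasing distance."""
    scored = ((distance(ideal, meta), ulm_id) for ulm_id, meta in ulms.items())
    return heapq.nsmallest(k, scored)


# Hypothetical ULMs described only by their duration in minutes.
ulms = {"a": {"duration": 18}, "b": {"duration": 24},
        "c": {"duration": 60}, "d": {"duration": 20}}


def by_duration(x, y):
    return abs(x["duration"] - y["duration"])


print(closest_ulms({"duration": 20}, ulms, by_duration, k=3))
# [(0, 'd'), (2, 'a'), (4, 'b')]
```

Unlike a perfect-match query, which would return only ULM d, the ranking also surfaces the 18- and 24-minute ULMs.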
7.2.1 Defining distances
Many metadata fields can be seen as some sort of sequence with a certain ordering from ‘none’ to ‘many’, or ‘easy’ to ‘difficult’, etcetera. This observation can be
exploited in building a distance measure. To be able to do this, a mathematical
construct is needed to capture the property of values being orderable. This section
and Section 7.2.2 will present the mathematical definitions for this concept.
The metric space is a suitable, well-known construct from the discipline of
discrete mathematics. It is defined as follows:
then it can be modeled using a metric space; and if a metadata field is not orderable,
then it cannot be modeled using a metric space.
One metadata field (e.g. “difficulty”) is modeled using a metric space. But how should the metadata set of a ULM with n fields be modeled? This can be done by combining the n metadata fields in a vector u, so that $u_i$ for $i = 1, \dots, n$ gives the value of metadata field i of ULM u.
But what is $\delta$? It is an operator that models the ordering upon the n-dimensional metadata space $X$. There are a number of definitions of $\delta$ on spaces with n dimensions which fulfill the three demands mentioned in Definition 7.1 (Rudin, 1953):
Definition 7.2 The Euclidean distance (or L2) between two points x, y is defined as
$$\delta(x, y) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \dots + (x_n - y_n)^2}$$
Definition 7.3 The Taxicab distance (or L1) between two points x, y is defined as
$$\delta(x, y) = |x_1 - y_1| + |x_2 - y_2| + \dots + |x_n - y_n|$$
Using this definition of the minus, the terms $x_i - y_i$ that are used in the three distance definitions given above can now properly be calculated, so that in principle the three distance functions are well defined.
However, there is still a problem that needs to be solved: a difference of one
rank position does not necessarily have the same impact on all dimensions. This
problem is solved in the next section.
7.2.2 Normalization
It should be noted that the metadata fields can have a different number of values. The range of possible distances in a metric space (recall from Section 7.2.1 that there is one metric ‘space’ for each metadata field) therefore also varies between metadata fields: a field A with 5 values has distances ranging from 0 to 4, while a metadata field B with 10 values has distances ranging from 0 to 9. These distances on the individual dimensions are combined into one distance measure in a multidimensional space using formulas such as the Euclidean distance (see Definition 7.2). But then the distances in metadata field B would have a greater impact on the total distance than the distances in field A. So, to give each metadata field the same weight, the distances should be normalized: the distances in each individual dimension should lie between 0 and 1. This can be accomplished as follows. Let V be a set of metadata values that can be ordered, and let $\delta_V$ be the distance function that belongs to this set. Let $|V|$ be the cardinality2 of V. Then the normalized metric distance between two values $x, y \in V$ is defined as:
$$\delta_V(x, y) = \frac{|r(x) - r(y)|}{|V|}$$
with r(x) the rank number of x (see Definition 7.5).
For a proof that $\delta_V$ is a distance and as such exhibits the three properties given in Definition 7.1, the reader is referred to Appendix E.
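As a quick plausibility check (a sketch, not a substitute for the proof in Appendix E), assume Definition 7.1 lists the usual metric axioms: $\delta(x, y) = 0$ exactly when $x = y$, symmetry, and the triangle inequality. Symmetry follows from $|a - b| = |b - a|$; $\delta_V(x, y) = 0$ holds exactly when $r(x) = r(y)$, i.e. when $x = y$, provided r gives distinct values distinct ranks (as in Tables 7.1 and 7.2); and the triangle inequality follows from the one for absolute values:

$$
\begin{aligned}
\delta_V(x, z) &= \frac{|r(x) - r(z)|}{|V|} = \frac{|(r(x) - r(y)) + (r(y) - r(z))|}{|V|} \\
&\le \frac{|r(x) - r(y)|}{|V|} + \frac{|r(y) - r(z)|}{|V|} = \delta_V(x, y) + \delta_V(y, z)
\end{aligned}
$$

Dividing by the constant $|V| > 0$ does not affect any of the three properties, which is why normalization preserves the metric.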
7.2.3 Selection
In Section 7.2.1, three distances were defined: the Euclidean, Taxicab, and L∞ (pronounced “el infinity”) distance. But not all three of these are suitable as a distance for the intended goal of comparing educational materials: the L∞ distance only takes into account the dimension with the largest difference, ignoring the other dimensions. However, a distance measure that ignores dimensions also ignores certain choices that teachers have indicated for learning material. This is not desirable, because the teacher did not make these choices for nothing; these choices have to make some difference. For this reason, the L∞ distance will not be considered further.
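The argument against L∞ can be made concrete with a small Python example. The per-dimension differences below are hypothetical normalized distances of two candidate ULMs from the ideal ULM; they differ only in the second dimension, yet L∞ gives both the same distance (0.5), whereas L1 and L2 rank the first candidate as closer.

```python
from math import sqrt


def euclidean(diffs):   # L2
    return sqrt(sum(d * d for d in diffs))


def taxicab(diffs):     # L1
    return sum(abs(d) for d in diffs)


def l_infinity(diffs):  # L-infinity: only the largest difference counts
    return max(abs(d) for d in diffs)


# Hypothetical normalized differences from the ideal ULM.
p = [0.5, 0.0]  # matches the second characteristic exactly
q = [0.5, 0.4]  # deviates in the second characteristic as well
for name, f in [("L2", euclidean), ("L1", taxicab), ("Linf", l_infinity)]:
    print(name, f(p), f(q))
```

Whatever the teacher specified for the second characteristic, L∞ cannot tell p and q apart.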
7.2.4 Example
Consider the following metadata field called ‘difficulty’ as defined in Table 7.1.
2
Recall from set theory that the cardinality of a set is the number of elements in that set.
Table 7.1: An example metadata field and the use of the rank r(x) function.
code label rank r(x)
Da very easy 0
Db easy 1
Dc average 2
Dd a bit difficult 3
De difficult 4
Df very difficult 5
Dg extremely difficult 6
The ‘code’ is a short way of writing the label; so instead of writing ‘a bit difficult’ each time in mathematical formulas, this thesis will just write ‘$D_d$’. This metadata field has seven values, so the normalized metric distance between two values x and y in the ‘difficulty’ dimension is $|r(x) - r(y)| / 7$.
Another metadata field called “educational level” can be introduced; see Table 7.2, describing the educational level a unit of learning material was designed for. This metadata field has 6 values, so that the normalized metric distance between two values x and y in this dimension is $|r(x) - r(y)| / 6$.
Table 7.2: Another example metadata field and its rank r(x) function.
code label rank r(x)
La elementary school 0
Lb high school 1
Lc university first year 2
Ld university second and third year 3
Le university graduate level 4
Lf postgraduate level 5
The distance between two Units of Learning Material can be calculated as fol-
lows: assume that ULM u has been designed for university first year students (Lc ),
and that it is fairly difficult (De ). The rank numbers for ‘difficulty’ and ‘edu-
cational level’ are 4 and 2 respectively. ULM v has been designed for graduate
students (Le ), and is just a bit difficult (Dd ). Its rank numbers are 3 for difficulty
and 4 for educational level. The L2 distance can then be calculated as follows:
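$$
\begin{aligned}
\delta_2(u, v) &= \sqrt{(u_1 - v_1)^2 + (u_2 - v_2)^2} \\
&= \sqrt{\left(\frac{D_e - D_d}{|D|}\right)^2 + \left(\frac{L_c - L_e}{|L|}\right)^2} \\
&= \sqrt{\left(\frac{r(D_e) - r(D_d)}{7}\right)^2 + \left(\frac{r(L_c) - r(L_e)}{6}\right)^2} \\
&= \sqrt{\left(\frac{4 - 3}{7}\right)^2 + \left(\frac{2 - 4}{6}\right)^2} \\
&\approx \sqrt{0.1429^2 + 0.3333^2} \approx \sqrt{0.1315} \approx 0.363
\end{aligned}
$$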
Similarly, the L1 distance can be calculated.
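The worked example, including the corresponding L1 (Taxicab) distance, can be reproduced with the following Python sketch. It encodes Tables 7.1 and 7.2 as ordered lists and uses the list index as the rank r(x); the value labels come from the tables, while the function names are illustrative.

```python
from math import sqrt

# Ordered value lists from Tables 7.1 and 7.2; the rank r(x) is the list index.
DIFFICULTY = ["very easy", "easy", "average", "a bit difficult",
              "difficult", "very difficult", "extremely difficult"]
LEVEL = ["elementary school", "high school", "university first year",
         "university second and third year", "university graduate level",
         "postgraduate level"]


def normalized_distance(values, x, y):
    """delta_V(x, y) = |r(x) - r(y)| / |V|."""
    return abs(values.index(x) - values.index(y)) / len(values)


def euclidean(diffs):   # L2
    return sqrt(sum(d * d for d in diffs))


def taxicab(diffs):     # L1
    return sum(abs(d) for d in diffs)


u = {"difficulty": "difficult", "level": "university first year"}            # De, Lc
v = {"difficulty": "a bit difficult", "level": "university graduate level"}  # Dd, Le
diffs = [normalized_distance(DIFFICULTY, u["difficulty"], v["difficulty"]),
         normalized_distance(LEVEL, u["level"], v["level"])]
print(round(euclidean(diffs), 3))  # 0.363
print(round(taxicab(diffs), 3))    # 0.476
```

For u and v the Taxicab distance is therefore 1/7 + 2/6 ≈ 0.476, larger than the Euclidean distance, as it always is when more than one dimension differs.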
In Figure 7.2, the Euclidean distance is depicted. This is the definition of ‘dis-
tance’ that is used in everyday life, for example to calculate the distance between
two locations on a map. The distance ‘d’ is found graphically by drawing a straight line from u to v and measuring its length.
[Figure: axes “educational level (rank)” and “duration in minutes”]
Problem 2 Some dimensions have an infinite number of values, such as “duration” and “size”: theoretically, any duration or size is possible. This thesis will avoid this problem by assuming that there are practical bounds to these fields, so that each such dimension counts a limited number of values and can also be normalized.
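A minimal Python sketch of such a bounded duration field. The upper bound of 120 minutes and the 5-minute bins are hypothetical choices; bounds alone do not make a continuous field finite, so the sketch also groups durations into bins, which makes, for example, 20 and 24 minutes indistinguishable.

```python
def duration_rank(minutes: float, upper: float = 120, step: float = 5) -> int:
    """Rank of a duration on a bounded, discretized scale.

    Durations are clamped to [0, upper] and grouped into bins of `step`
    minutes, giving ranks 0 .. upper // step (hypothetical bounds)."""
    clamped = min(max(minutes, 0), upper)
    return int(clamped // step)


def duration_cardinality(upper: float = 120, step: float = 5) -> int:
    """Number of distinct ranks, i.e. |V| for the duration field."""
    return int(upper // step) + 1


def duration_distance(x: float, y: float) -> float:
    """Normalized distance |r(x) - r(y)| / |V| for the duration field."""
    return abs(duration_rank(x) - duration_rank(y)) / duration_cardinality()


print(duration_rank(18), duration_rank(24), duration_rank(500))  # 3 4 24
print(duration_distance(18, 24))  # 0.04
```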
Problem 3 Some metadata fields allow more than one value to be assigned; for example, the “educational level” dimension can have the values “fourth grade” and “sixth grade”; however, the rank function r is only defined on one value. To solve this problem, a new function r′ could be defined that calculates the average of the individual rank indices of all values. In the remainder of this thesis, metadata fields with more than one value are avoided, so that this problem can be safely ignored.
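The averaging function r′ could look like the following Python sketch. The ordered list of grade values is hypothetical. The usage example also shows a side effect of averaging: a ULM tagged with both “fourth grade” and “sixth grade” gets the same rank as one tagged only “fifth grade”, so their distance is 0.

```python
from statistics import mean

# Hypothetical ordered values for a multi-valued "educational level" field.
GRADES = ["first grade", "second grade", "third grade", "fourth grade",
          "fifth grade", "sixth grade"]


def rank(value: str) -> int:
    """r(x): position of a single value in the ordered list."""
    return GRADES.index(value)


def rank_multi(values: list) -> float:
    """r'(x): the average of the rank indices of all assigned values."""
    return mean([rank(v) for v in values])


def distance_multi(xs: list, ys: list) -> float:
    """Normalized distance using r' instead of r."""
    return abs(rank_multi(xs) - rank_multi(ys)) / len(GRADES)


print(rank_multi(["fourth grade", "sixth grade"]))  # 4
print(distance_multi(["fourth grade", "sixth grade"], ["fifth grade"]))  # 0.0
```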
not take the relative importance into account, so that each dimension (character-
istic) is weighted equally. This thesis proposes that the distance measure can be
improved (i.e. is able to predict the usability of a ULM better) if it would take the
relative importance that teachers assign to characteristics into account. But what is
the nature of this importance? Is it a constant, or does it depend on ‘environmental’
factors such as the educational context? A context that demands practical material could cause a teacher to find interactivity more important. Another factor could be the teacher him- or herself, or better, the teacher’s own ‘belief system’: some teachers like to include a lot of hands-on experience in their classes, while other teachers find that less important. Such beliefs may lead to preferences for certain characteristics of online learning materials (i.e. for certain metadata fields). If it were possible to discover the teacher’s belief system, then perhaps it would also be possible to predict what characteristics the teacher prefers. And as the teacher’s belief system changes only very slowly, this would only have to be measured once, or at least not very often.
So, two factors are proposed that could have an impact on the teachers’ prefer-
ences for characteristics of learning material: (1) the educational context, and (2)
the teachers’ belief system. The next section will present results of an investigation
targeted at finding the influence that these factors have on the teachers’ preferences.
Insight into the nature of these influences provides answers to the question whether the teachers’ preferences are dynamic or static; this will have a great impact on the way the ‘weights’ of the various dimensions of the distance measure will be implemented.
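One way such ‘weights’ could enter the distance measure is a weighted Euclidean distance, sketched below in Python. This formulation (multiplying each squared per-dimension distance by a weight $w_i$) is an assumption for illustration, not necessarily the measure developed in Section 7.4; the weight values are hypothetical.

```python
from math import sqrt


def weighted_euclidean(diffs, weights):
    """sqrt(sum_i w_i * d_i^2) over normalized per-dimension differences d_i."""
    if len(diffs) != len(weights):
        raise ValueError("one weight per dimension is required")
    return sqrt(sum(w * d * d for d, w in zip(diffs, weights)))


diffs = [1 / 7, 2 / 6]  # difficulty and educational level, as in Section 7.2.4
print(weighted_euclidean(diffs, [1, 1]))    # equal weights: plain L2, about 0.363
print(weighted_euclidean(diffs, [2, 0.5]))  # hypothetical: difficulty counts more
```

If the weights turn out to be static, they can be fixed once; if they depend on the context or the teacher’s belief system, they become an extra input of the search.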
This section will describe research into factors that influence the teachers’ preference for characteristics of learning material. The previous section hypothesized that this preference depends on two factors: (1) the educational context, and (2) the teacher’s belief system. Section 7.3.1 will first discuss a literature review on teachers’ conceptions of teaching to find a suitable model. Using this model, two
research questions are formulated in Section 7.3.2. Then, Section 7.3.3 will verify
these hypotheses by developing an instrument to measure the teacher’s belief sys-
tem and the teacher’s preferences for characteristics of learning materials in various
educational situations. Section 7.3.4 will discuss how the sample population was
chosen, after which Section 7.3.5 will present the results of the survey. Finally,
Section 7.3.6 will discuss the findings and draw conclusions upon the research re-
sults.
7.3.1 Teachers’ Conceptions of teaching
Theoretically, a teacher’s conceptions of teaching would reflect the teacher educa-
tion he (or she) has enjoyed, both in theory and in practice. However, in reality
it seems that teachers base their belief system upon their daily experiences, both
during their own training and their classes. Kagan (1992) reviewed a large body
of research literature and found that the personal beliefs of preservice teachers
changed during their professional formation.
There have been several research efforts to determine aspects of teacher belief
and to measure them, many utilizing different research methodologies. According
to Kagan (1990), the concept of “teachers’ conceptions” itself is too ambiguous to
provide a coherent set of research results. Furthermore, conceptions are often held
unconsciously by the teachers, making it difficult to measure them: time consum-
ing methods have to be used to elicit and assess thoughts. These time constraints
force many researchers to limit the size of the subject population, so that the results
may not be very generalizable, and so that statistical methods are hard to use due
to a limited data set.
Yet, Samuelowicz and Bain (1992) succeeded in analyzing academic teachers’
conceptions of teaching. They synthesized a framework consisting of five dimen-
sions, which are relatively consistent with other research results on teacher belief
(Proser, Trigwell, & Tayler, 1994; Gow & Kember, 1993). To describe a conception of teaching, one value is chosen on each of the five dimensions. On each dimension, three ‘values’ are possible: two extremes A and B, and a value AB ‘in the middle’ which means “both A and B”. The dimensions and their values are:4
Learning Outcome (LO) The expected outcome of the learning process. This dimension ranges from (A) “after the learning process, students know more than before the learning process” to (B) “after the learning process, students’ knowledge has changed rather than increased”.
Nature of Knowledge (NK) The nature of the knowledge to be learned, which ranges from (A) knowing just the subject matter without links to reality, to (B) being able to relate the subject matter to reality.
Students’ Conceptions (SC) The degree to which the teacher takes the conceptions the students have on the subject matter into account. “Not taken into account” is coded as A, while “taken into account” is coded as B.
Bidirectionality (BI) The degree of bidirectionality that the teacher finds most appropriate: none (teaching is unidirectional, i.e. transmitting knowledge, which is coded as A) or a lot (teaching is bidirectional, e.g. engaging in a conversation with the students, which is coded as B).
Content Control (CC) The amount of control students have over the content of teaching. This variable is given the code A when students have no control over the content, while value B is assigned to the variable when students do have control over the content.
4
The dimensions have been renamed to make it easier to define a continuous scale; refer to the original paper for the original names.
Samuelowicz and Bain (1992) use these five dimensions as “building blocks”
for five conceptions of teaching. As an example, they explain how the conception
“teaching is transmitting knowledge” can be built up using the five dimensions as
shown in Table 7.3:
Table 7.3: Example of how a teaching conception can be built up using the five
dimensions of Samuelowicz and Bain.
dimension value meaning
Learning Outcome AB both to know more, or differently
Nature of Knowledge A bound by the curriculum
Students’ Conceptions A not taken into account
Bidirectionality AB education is sometimes bidirectional
Content Control A teacher is in control of the content
This way, many conceptions can be analyzed and described by five values, one for each of these five dimensions. Theoretically, the five dimensions (with three values each) can describe $3^5 = 243$ possible conceptions. Of course, not all conceptions make sense in the real world, but still the dimensions are more powerful than other methods that describe individual conceptions of teaching found by other researchers (Proser et al., 1994; Gow & Kember, 1993). These methods describe a fixed number of conceptions, usually fewer than 10. The behavior of a teacher can then only be described using one of this limited number of conceptions, yielding a very coarse categorization. The model of Bain and Samuelowicz, however, allows a much finer categorization, as there are theoretically 243 conceptions; having more conceptions enables a more accurate description of a teacher’s conceptions of teaching. This model was therefore chosen as the basis of the current research.
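The count of $3^5 = 243$ possible conceptions, and the fact that the conception of Table 7.3 is one of them, can be checked with a short Python enumeration; the dimension abbreviations are the ones introduced above.

```python
from itertools import product

DIMENSIONS = ("LO", "NK", "SC", "BI", "CC")
VALUES = ("A", "AB", "B")

# Every combination of one value per dimension is a possible conception.
conceptions = [dict(zip(DIMENSIONS, combo))
               for combo in product(VALUES, repeat=len(DIMENSIONS))]
print(len(conceptions))  # 243

# "Teaching is transmitting knowledge", as built up in Table 7.3.
transmitting = {"LO": "AB", "NK": "A", "SC": "A", "BI": "AB", "CC": "A"}
print(transmitting in conceptions)  # True
```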
the concept of a metadata space in Section 7.2, Section 7.2.6 noted that the im-
portance that teachers assign to characteristics of learning material (i.e. metadata
fields) could improve the distance measure. Two factors were hypothesized to influence this importance: (1) the teacher’s conceptions of teaching, and (2) the educational context. The model of Samuelowicz and Bain makes it possible to develop an instrument that measures the teacher’s conceptions by assigning values A, B or AB on each of five “dimensions” that can be formulated into questions. The “Content Control” dimension, for example, can be formulated as: who is most often in control of the content of teaching, the teacher or the student? Depending on the answer, one of the three values can be assigned. Doing this for all five dimensions yields a ‘pattern’ of A, B and AB values that describes the teacher’s conceptions of teaching. This thesis will call such a value for one dimension the ‘position’ of a teacher on that dimension.
Having obtained values for the five dimensions, correlations can be calculated between the position of a teacher on a dimension and the preference for certain characteristics of learning material.
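The correlation step can be sketched with a plain Pearson coefficient in Python. The coding of positions as 0 (A), 5 (AB) and 10 (B) follows the scale introduced in Section 7.3.3; the 1 to 5 importance ratings and the teacher data are hypothetical, and the statistical procedure actually used in this research may differ.

```python
from math import sqrt

POSITION = {"A": 0, "AB": 5, "B": 10}  # coding of positions on a dimension


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)  # undefined if either sample is constant


# Hypothetical data: five teachers' positions on "Bidirectionality" and the
# importance (1-5) they assign to "amount of interactivity".
positions = [POSITION[p] for p in ["A", "AB", "B", "B", "A"]]
importance = [2, 3, 5, 4, 1]
print(round(pearson(positions, importance), 3))  # 0.949
```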
The following research question can then be formulated:
The second factor is the educational context in which the teacher is perform-
ing. For example, if the target group of learners does not have sufficient self-
management capabilities such that they can work through large assignments unat-
tended, then they may need other pedagogical material. The teacher will then find
the pedagogical aspects of learning material of particular interest. Or, if the teacher
wants to find extra assignments for fast students, then the subject matter may be
considered less important, while the difficulty of the learning material must be relatively high to make sure the fast students will not become bored. If the educational context does not influence the importance of any of the characteristics, then one distance measure is appropriate for all contexts. If, on the other hand, the educational context does influence the importance that teachers assign to one or more characteristics of learning material, then the distance measure will depend on the educational context. This means that the educational context has to be taken into account by the distance measure, so it is important to know the effect of the educational context on how important teachers find characteristics of learning material. This leads to the second research question:
teristics of learning material depend on one or more educational con-
texts?
The next section will develop an instrument that will be used to find answers
to these two questions.
7.3.3 Instrument
This section will develop an instrument to measure two properties: (1) the teacher’s
conception, and (2) the teacher’s preferences for characteristics of learning material
in a certain educational context. First, both the dimensions of Samuelowicz and Bain and the characteristics of learning material to be measured will be defined properly. Then, the instrument to do these measurements will be discussed, including the pilot test and formative evaluation.
5. Experimental research, which also tries to find cause-effect relationships
by manipulating the subject of the study and examining the results of that
change.
6. Research and Development, which focuses on the development and evaluation of a new product (physical or conceptual).
The research questions talk about concepts such as “importance that teachers
assign to characteristics”, and “conceptions of teaching”. These are properties of
human beings that can be measured. Furthermore, the research questions concern
the supposed existence of dependencies between these concepts. As the “corre-
lational research” method attempts to find relationships (dependencies) between
certain conditions (properties of human beings), this research method is the most
suitable one for the current research.
The following three paragraphs will operationalize the three properties that are
to be measured in the current research: the dimensions of teaching, the importance
that teachers assign to characteristics of learning material, and the educational con-
texts.
[Figure: scale on which A corresponds to 0, AB to 5, and B to 10]
Below, we will give a new formal definition of the dimensions. This section
will describe later how these dimensions can be measured.
SC Students’ Conceptions type: a variable on this dimension takes the value ‘0’ if the teacher, when taking educational decisions, does not take into account what the students’ conceptions are. A variable takes the value ‘10’ if the teacher does take these conceptions fully into account.
LO Learning Outcome: a variable on this dimension takes the value ‘0’ if the teacher has the opinion that the goal of learning is to know more (i.e. to learn new knowledge), and ‘10’ if the teacher finds that the goal is to know differently (i.e. to obtain a deeper and richer understanding of existing knowledge).
material, and that would be easy for the teachers to understand, were selected. As the teachers have to indicate which characteristics are more important to them than others, only a small number of characteristics was used; otherwise, the task would become too difficult for the teachers. The five characteristics that were selected are listed in Table 7.4.
Table 7.4: The five characteristics of learning material that have been selected for
the experiment.
characteristic description
subject the subject of the Unit of Learning Material
pedagogical function the pedagogical function of the material, such as
exercise, theory, test, example
educational level the educational level for which the learning
material was originally designed, such as primary
or secondary education, freshmen, or graduate
student level
duration the time a typical learner needs to work through
the Unit of Learning Material (ULM)
amount of interactivity of the ULM (i.e. is it “lean back and watch”
material with no interactivity, or does the learner
have to interact with the ULM such as a
simulation?)
Research Question RQ 7.2 asks whether the preference of teachers for characteristics
of learning material depends on the educational context, that is, on the educational
aspects of the situation a teacher is in when he or she wants to consult a database
of learning materials. As the purpose of the research question is to test whether the
preferences of teachers for certain characteristics are fairly constant or not, the
contexts should be chosen to represent extremes. If the preferences of the teachers
are found not to vary in ‘extreme’ contexts, then they can be expected not to vary in
milder contexts either. So the first requirement for the contexts is that they should
reflect very different situations.
As the purpose of the contexts is to describe a situation in which a teacher
would feel the need to consult a database of learning materials, a second
requirement was stated: the contexts should describe a need for specific learning
material. The teacher should be able to translate the need for this specific learning
material into characteristics of learning material. So the contexts should ‘drive’
the teachers in a specific direction on one or more of the five chosen characteristics
(see Table 7.4), so that it can be tested whether the teachers’ preference for
characteristics really varies between contexts.
So the contexts were varied by describing a need for learning materials from
different angles: a need for extra material, very practical or theoretical material,
or material for a specific target group. Of course many more combinations are
possible, and it is difficult to say which ones are better than others. In fact, as long
as the requirement of “describing a need for specific material with respect to one
or more of the five characteristics” is met, the context is suited for this research.
Six contexts were created which described a need for the following material:
3. all materials that could be suitable to put online for a specific course (i.e. a
very broad need);
It was expected that the teachers’ preferences for, for example, the characteristic
‘educational level’ would vary between these contexts: in context 5, material for
juniors is explicitly needed, so the teachers were expected to find the characteristic
‘educational level’ important in this context. In context 2, however, the educational
level is of less importance; here, the characteristic ‘interactivity’ was expected to
be found more important by the teachers. Similarly, the other contexts also drive
the teachers in a specific ‘direction’ with respect to one or more characteristics
of learning material. The full text of the context descriptions that were used can
be found in part C of the questionnaire, which is printed in full (in Dutch) in
Appendix B. An English translation is provided at the end of the appendix.
Questionnaire
The previous paragraphs explained the three concepts that are to be measured: the
teachers’ conceptions of teaching, the characteristics of learning material, and the
educational contexts. This paragraph discusses the questionnaire that was designed
to measure these concepts. The questionnaire consists of three parts. Below, we
discuss how these three parts served to measure the two variables: the teachers’
conceptions of teaching, and the teachers’ preference for characteristics of learning
material in different educational contexts.
The first part (A) contains general (demographic) questions, such as age, years
of experience in education, and years of experience with computers. These data were
obtained to get an impression of the target group, and to be able to discover
differences between subgroups if necessary.
The second part (B) was designed to measure the teachers’ positions on the five
dimensions of Samuelowicz and Bain. This was done in two ways. First, six
multiple-choice questions were posed to measure the position of a teacher on a
dimension indirectly (these questions are numbered “B1” to “B6” in the final
version; see Appendix B). The teachers were presented with a case description,
such as: “Suppose that your students are able to reproduce and apply all aspects
contained in the subject matter. Does this mean that your goals as a teacher are
reached?” Then, the teachers were given several possible answers, such as “No,
the knowledge and insights of the students can perhaps be broadened or deepened.”
For each answer, a score on the dimension that the question tried to measure was
assigned on a scale from 0 to 10 by two researchers independently, to reduce
subjectivity, and the average of the two scores was then calculated. If the two
scores differed by more than three points, the researchers discussed their opinions
on the matter and, after exchanging interpretations, determined a score that both
researchers agreed upon.
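The scoring rule for the case-based answers can be summarized as a small function. This is a minimal sketch, not the procedure actually used: the function name `consensus_score` is hypothetical, scores are assumed to be plain numbers on the 0 to 10 scale, and the discussion between the researchers cannot be automated, so the sketch only flags the answers that require it.

```python
def consensus_score(score_a, score_b, max_gap=3):
    """Combine two raters' scores for one answer (hypothetical helper).

    Returns the mean of the two scores and a flag that is True when the
    scores differ by more than max_gap points, i.e. when the raters would
    have to discuss the answer and agree on a score by hand.
    """
    for score in (score_a, score_b):
        if not 0 <= score <= 10:
            raise ValueError("scores must lie on the 0-10 scale")
    needs_discussion = abs(score_a - score_b) > max_gap
    return (score_a + score_b) / 2, needs_discussion


print(consensus_score(7, 8))  # (7.5, False)
print(consensus_score(2, 9))  # (5.5, True): a difference of 7 points
```

A difference of exactly three points is averaged without discussion, since the text only calls for discussion when the difference is more than three points.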
Next, proposition-based attitude questions were used to measure the position of a
teacher on one of the five dimensions directly. Each question measured the position
of a teacher on one dimension. The subjects were asked to indicate to what extent
they agreed with the proposition described in the question, using a 5-point Likert
scale. These questions are numbered “B7.1” to “B7.24” in the final version (see
Appendix B). To avoid response bias, some questions were formulated so that the
more subjects agreed, the higher their score on the question’s dimension would be,
while the other questions were formulated negatively, so that the more subjects
agreed, the lower their score on the dimension would be.
The results on both types of questions will be used to cross-validate the results
of this part of the survey.
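Scoring the negatively formulated propositions amounts to reversing the scale. The sketch below assumes that the five Likert answers are coded 1 (strongly disagree) to 5 (strongly agree); this coding and the helper name `item_score` are assumptions for illustration, as the coding actually used is not given here.

```python
LIKERT_POINTS = 5  # assumed coding: 1 = strongly disagree ... 5 = strongly agree


def item_score(answer, negatively_formulated):
    """Score one attitude answer on its dimension (hypothetical helper).

    For positively formulated propositions, agreeing more gives a higher
    score; for negatively formulated ones the scale is reversed, so that
    agreeing more gives a lower score.
    """
    if not 1 <= answer <= LIKERT_POINTS:
        raise ValueError("answer outside the 5-point scale")
    if negatively_formulated:
        return LIKERT_POINTS + 1 - answer
    return answer


print(item_score(5, negatively_formulated=False))  # 5
print(item_score(5, negatively_formulated=True))   # 1
```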
Finally, the third part (C) of the questionnaire contains six educational contexts
in which a need for learning material is described. For each educational context,
the teacher is asked to rank five characteristics of learning material from most
important to least important. The characteristic ranked most important receives
score ‘5’, the next most important one score ‘4’, and so on, so that the least
important characteristic receives score ‘1’.
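The conversion from a ranking to scores can be sketched as follows; the function name and the use of the characteristic names from Table 7.4 as keys are illustrative assumptions.

```python
def ranking_to_scores(ranking):
    """Map characteristics, listed from most to least important, to scores.

    With five characteristics, the first one receives 5 and the last one 1.
    """
    if len(set(ranking)) != len(ranking):
        raise ValueError("each characteristic must be ranked exactly once")
    n = len(ranking)
    return {characteristic: n - position
            for position, characteristic in enumerate(ranking)}


scores = ranking_to_scores(["subject", "educational level", "pedagogical function",
                            "duration", "interactivity"])
print(scores["subject"], scores["interactivity"])  # 5 1
```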
Validity
According to Tuckman (1994), the validity of an instrument is the extent to which
the instrument measures what it purports to measure. Four types of validity can be
identified (pp. 182-183):
1. Predictive validity: the extent to which the outcomes of a test are able to
predict a certain related behavior. For example, students who are slow learners
can be expected to pass fewer courses per year; the validity of a test that
measures the ‘slowness’ of a student can be established by relating the slowness
of a student to the number of courses he or she passed this year.
2. Concurrent validity: the extent to which the outcomes of a test are comparable
to those of other tests that are known to be valid. For example, many versions of
IQ tests are already available; the concurrent validity of a new test could be
established by relating its outcomes to the outcomes of an IQ test that is known
to be valid.
3. Construct validity: the extent to which the outcomes of a test that measures a
construct (or concept) can be related to known effects of that construct. For
example, teachers who find that the outcome of the learning process is “to
know differently” (see variable LO above) can be expected to often engage
students in a discussion about the subject matter. A relation between LO
and the number of times a teacher engages students in a discussion would
then demonstrate the construct validity of the test.
4. Content validity: the extent to which the sample situations in which the test is
performed represent all situations in which the behavior could be observed.
For example, a test for a certain course should be representative of all topics
covered in the course.
Establishing predictive validity would require developing a new test to measure a
related behavior. For example, to establish the predictive validity of the dimension
‘bidirectionality’, the number of times a teacher interacts with a student during a
lecture could be counted and related to the score of the teacher on this dimension.
The types of events that count as ‘interaction with the student’ would have to be
defined precisely, and then a number of lectures would have to be observed. This
kind of time-consuming research, however, is outside the scope of the current
research.
To establish concurrent validity, a known valid test for measuring the scores of the
teachers on the five dimensions would have to be used. The research of Samuelowicz
and Bain (1992) does not provide such a test, because their goal was to explore
the teachers’ conceptions and then synthesize (as opposed to measure) a
framework of dimensions. Therefore, the current research tried to establish
concurrent validity by using two different test types concurrently: one using
case-based questions, and one using attitude questions.
There is a second way in which concurrent validity was established: during
the pilot test, the results of the questionnaire were compared to the researchers’
own interpretation of the teachers’ conceptions of teaching, measured on the five
dimensions of Samuelowicz and Bain (see the next paragraph). The researchers’
interpretation of the teachers’ conceptions can be seen as a ‘known valid test’ to
which the instrument’s results were compared, so that concurrent validity could be
tested (and improved).
The case-based questions in part B tried to measure the score of a teacher on
a dimension indirectly, by measuring the behavior of a teacher in a hypothetical
situation. A list of possible behaviors was provided from which the teacher could
choose. This thesis assumed that a teacher who scored high on a dimension D
would exhibit a certain behavior and choose a certain answer; this answer would
thus be assigned a high score on dimension D. In other words, the scores of the
answers were based on assumed relationships between the score of a teacher on a
dimension and the behavior exhibited by the teacher, which conforms to the
definition of construct validity.
Content validity could not be established because it is difficult if not impossible
to assess the behavior of the teachers in ‘all situations possible’.
So, to summarize: two types of validity could be established: concurrent valid-
ity and construct validity. The other two types of validity could neither be proven
nor disproven.
Pilot test
The first version of the questionnaire was tested for comprehensibility and
interpretation problems in two rounds.
The first round took place during January 1999, with three fellow researchers
as test subjects. It was found that the variable about Students’ Conceptions (‘SC’)
covered too many different notions, causing the pilot subjects to have difficulties
in answering questions about it. Therefore, this variable was split into three
subvariables: conceptions of the subject matter, conceptions of the teaching process,
and conceptions of society and the world. These variables also have a continuous
scale, which means that any value between 0 and 10 is possible. The three
subvariables are defined as follows:
SCa Students’ Conceptions type ’a’: a variable on this dimension takes the value
’0’ if the teacher, when taking educational decisions, does not take into ac-
count what the students’ conceptions are regarding the subject matter. A
variable takes the value ’10’ if the teacher does take these conceptions fully
into account.
SCb Students’ Conceptions type ’b’: a variable on this dimension is ’0’ if the
teacher does not take into account what the students’ conceptions are re-
garding the educational processes they see happening at their institution; the
variable will take the value ’10’ if the teacher does take these conceptions
fully into account.
SCc Students’ Conceptions type ’c’: a variable on this dimension is ’0’ if the
teacher does not take into account what the students’ conceptions are re-
garding the society around them. The variable will be ’10’ if the teacher
does take these conceptions into account.
These three variables (SCa, SCb, SCc) and the other four variables described
previously (CC, NK, LO and BI) will together be called the “seven variables” in
the remainder of this thesis.
After the questionnaire was revised, it was tested in a second round during
February 1999, using five test subjects (colleagues with a lot of teaching
experience). Before the test, the two researchers (Van der Peet⁶ and Hiddink)
determined the teaching behavior of each test subject in terms of his or her position
on each of the seven dimensions, and wrote it down; for example, a test subject
could score “high” on the dimension “bidirectionality of the education”. The test
subject was then asked to think aloud while filling in the questionnaire, and one of
the researchers (who was present in the room) annotated the scores of the test
subject. These annotated scores were immediately compared with the scores that had
been determined beforehand. If there was a conflict, for example if the subject
scored “2” on the dimension “bidirectionality of the education” while he or she was
expected to score “high” on this dimension, the researcher tried to find the cause of
the conflict by asking the test subject why he or she had chosen those answers.
Using this method, some questions were found to have interpretation problems,
which were solved in the next revision of the questionnaire. This method also
served to check the concurrent validity of the instrument.
6 George van der Peet assisted in creating the questionnaire as a part of a course on
doing educational research.
In the initial versions of the questionnaire, the aim was to have one case-based
question and about four attitude questions for each dimension. However, the
questionnaire evolved through many informal discussion rounds and the two pilot
test rounds. Some questions were dropped, some turned out to be better suited to
measure another of the seven variables than the one they were originally designed
for, and some questions were added. These decisions were based on a combination
of many different considerations, drawing both on questionnaire design principles
and on the creativity involved in such a design process. It would be too laborious
to document all of these (mostly implicit) considerations here. Instead, Table 7.5
shows how many questions remained for each variable in the final version of the
questionnaire after the pilot test; the final version is given in Appendix B.
Table 7.5: The seven variables and the number and type of questions that measure
them.
Variable   case-based   attitude
SCa        1            3
SCb        1            2
SCc        1            2
CC         2            2
NK         2            3
BI         0            5
LO         1            7
total      8            24
After these preparations and revisions, an instrument was obtained that was
believed to have sufficient validity to use for gathering data. Further validity tests
will be done on the basis of the empirical data.
7.3.4 Experiment
In the previous section, an instrument was developed and tested to measure the
teachers’ conceptions of teaching using seven variables, as well as the teachers’
preferences for characteristics of learning material in six different educational
contexts. Ways to ensure the validity of the instrument were also described. This
section describes how this instrument was deployed.
A sample of the teachers of the University of Twente was selected randomly
from the university’s Course Guide by selecting the teacher of the first course
mentioned on each page. If that person had been involved in the formative testing
of the questionnaire, the next teacher on the same page was selected. This process
yielded 196 subjects.
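The selection rule can be written down as a short procedure. In this sketch the Course Guide is assumed to be available as a list of pages, each holding the teachers of the listed courses in order; this representation, the function name, the example names, and the handling of a page on which every teacher took part in the pilot (that page is skipped) are assumptions, since the text does not describe them.

```python
def select_teachers(pages, pilot_participants):
    """Pick one teacher per Course Guide page (hypothetical representation).

    pages: list of pages; each page lists the teachers of its courses in
    the order in which the courses appear. The first teacher on a page is
    taken, unless he or she took part in the pilot test, in which case the
    next teacher on the same page is tried.
    """
    sample = []
    for teachers in pages:
        for teacher in teachers:
            if teacher not in pilot_participants:
                sample.append(teacher)
                break  # at most one teacher per page
    return sample


# Made-up names, for illustration only.
pages = [["Jansen", "De Vries"], ["Bakker"], ["Visser", "Smit"]]
print(select_teachers(pages, pilot_participants={"Jansen"}))
# ['De Vries', 'Bakker', 'Visser']
```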
All subjects were contacted by phone during March and April 1999 and asked
whether they were willing to participate in the research. A paper copy of the final
version of the questionnaire was sent to those who agreed. Several teachers asked
whether they could fill in the questionnaire through the World Wide Web. This
option was made available; the paper copy of the questionnaire that was sent to
each participant mentioned this possibility but left the choice to the subject.
7.3.5 Results
This section describes the results of the experiment. First, the number and nature
of the respondents are analyzed. Then, after standardizing the scores on the
seven variables, the concurrent validity and the reliability of the data are assessed
to get an impression of the quality of the data. After that, an analysis of the
educational contexts is conducted to answer RQ 7.2: are there significant differences
between the contexts? An answer to this question is needed for properly analyzing
the data to answer RQ 7.1: are the teachers’ preferences for characteristics of
learning material related to the teachers’ conceptions of teaching? This analysis is
done by correlating the teachers’ scores on the seven variables (dimensions of
conceptions of teaching) with the teachers’ rankings of characteristics of learning
materials.
Respondents
Of the 196 selected teachers, only 86 (44%) could be reached by phone within
about three tries. Of these 86 teachers, 71 (83%) were willing to participate in
the research. These 71 teachers were sent a paper copy, of which 53 (75%) were
returned. Only three were returned via the World Wide Web; the others were returned
by mail.
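The percentages above follow directly from the counts. The short computation below reproduces them and adds the overall response rate (returned questionnaires as a fraction of all selected teachers), which follows from the same numbers.

```python
selected, reached, willing, returned = 196, 86, 71, 53

rates = {
    "reached by phone (of selected)": reached / selected,
    "willing to participate (of reached)": willing / reached,
    "returned (of questionnaires sent)": returned / willing,
    "overall response (of selected)": returned / selected,
}
for label, rate in rates.items():
    print(f"{label}: {rate:.0%}")
# reached by phone (of selected): 44%
# willing to participate (of reached): 83%
# returned (of questionnaires sent): 75%
# overall response (of selected): 27%
```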
Table 7.6 shows the distribution of the respondents across faculties, which is
fairly even. (We did not perform an analysis of per-faculty means because of the
large number of variables involved and the small number of respondents per
faculty.)
For two questionnaires, the faculty could not be traced. A unique number was
written on the self-addressed envelope that accompanied each copy, and this number
corresponded to an entry in a translation table; however, these two respondents
returned the copy in a different envelope.
Table 7.6: Distribution of respondents across the faculties, both as the number of
returned questionnaires and as the percentage of questionnaires returned.
Faculty                                   Number   Percentage returned
Mechanical Engineering                    3        75
Electrical Engineering                    6        86
Chemical Technology                       3        75
Applied Physics                           5        100
Mathematical Sciences                     5        63
Philosophy and Social Sciences            2        67
Applied Communication Sciences            3        75
Public Administration and Public Policy   7        100
Industrial Technology & Management        4        57
Civil Technology & Management             1        50
Educational Technology                    5        45
Computer Science                          5        83
Business Information Technology           2        67
unknown                                   2
Standardizing
The questions in part B of the questionnaire are targeted at measuring the position
of a teacher on one of the seven variables (SCa, SCb, SCc, CC, NK, LO, BI). To
be able to calculate averages of answers for a particular dimension, the values have
been standardized to so-called Z scores, i.e. scores with a mean of 0 and a standard
deviation of 1 (Ferguson, 1981, p. 449). The averages of the teachers’ standardized
answers on each of the seven dimensions are presented in Appendix D, Figure D.1.
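The standardization step and the averaging per dimension can be sketched as follows. The sketch assumes that each question is standardized over all respondents using the sample standard deviation, and that a teacher's score on a dimension is the mean of his or her standardized answers to that dimension's questions; the text does not specify these details, the helper names are illustrative, and missing answers are not handled.

```python
import statistics


def z_scores(values):
    """Standardize to mean 0 and (sample) standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def dimension_scores(questions):
    """questions: raw answers per question of one dimension, teachers in the same order.

    Each question is standardized separately; a teacher's dimension score is
    the mean of his or her standardized answers.
    """
    standardized = [z_scores(answers) for answers in questions]
    return [statistics.fmean(per_teacher) for per_teacher in zip(*standardized)]


print(dimension_scores([[2, 4, 6], [1, 2, 3]]))  # [-1.0, 0.0, 1.0]
```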
Concurrent Validity
coefficients are significant at the 0.01 level. For variable ’BI’ no correlation could
be calculated because the only case-based question that was present for this vari-
able was removed during the pilot test because it was confounding.
Table 7.7: Pearson’s correlation between case-based questions and attitude
questions.
Variable   Correlation   N
SCa         0.28         28
SCb        -0.09         27
SCc        -0.12         24
CC          0.54         50
NK          0.12         51
LO          0.37         49
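Whether a correlation in Table 7.7 is significant can be checked with the usual t test for Pearson's r, $t = r\sqrt{N-2}/\sqrt{1-r^2}$ with $N-2$ degrees of freedom. The self-contained sketch below evaluates the two-tailed p-value by numerically integrating the Student t density (Simpson's rule) instead of calling a statistics library; it is an illustration, not the software used for the analysis, and it assumes that N in the table is the number of pairs used for each correlation.

```python
import math

# (variable, Pearson's r, N) as reported in Table 7.7
TABLE_7_7 = [("SCa", 0.28, 28), ("SCb", -0.09, 27), ("SCc", -0.12, 24),
             ("CC", 0.54, 50), ("NK", 0.12, 51), ("LO", 0.37, 49)]


def t_from_r(r, n):
    """t statistic for the null hypothesis rho = 0 (n - 2 degrees of freedom)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t, df, steps=2000):
    """P(|T| >= |t|), integrating the density from 0 to |t| with Simpson's rule."""
    t = abs(t)
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3)


for name, r, n in TABLE_7_7:
    t = t_from_r(r, n)
    print(f"{name}: t({n - 2}) = {t:.2f}, p = {two_tailed_p(t, n - 2):.3f}")
```

With these inputs, only CC and LO fall below the 0.01 level, which agrees with the text; LO does so with a small margin (t is about 2.73 with 47 degrees of freedom).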
As Table 7.7 shows, the case-based questions and the attitude questions do not
correlate significantly for all variables; only CC and LO do. This can be explained
by the fact that the case-based questions elicit a certain behavior in a hypothetical
situation, while the attitude questions require subjects to judge their own attitude.
As Kagan (1990) states, teachers’ conceptions are often held unconsciously and
appear to be highly contextualized, i.e. their conceptions depend on many aspects
of the situation the teachers are in. This is consistent with the discrepancies that
were found between the behavioral questions and the attitude questions: in
different situations, the teachers’ conceptions appear to be very different. To
summarize, moderate concurrent validity could be demonstrated, using attitude
questions and case-based questions, for only two of the seven variables.
This means that the two methods, using attitude questions and using case-based
questions, of measuring the position of teachers on the seven dimensions are not
in agreement with each other. There are two possibilities to proceed: (1) merge
the data sets of the case-based questions and the attitude questions and use this
data set for further data analysis, or (2) choose one of the two data sets to use for
the data analysis. Option (2) can be chosen if there are indications that one of the
two methods yields more reliable results than the other. Three observations can be
made with respect to these two possibilities:
1. Merging two data sets that do not agree with each other will result in a data
set with a high standard deviation, which makes it more difficult to find
significant relationships.
2. From Table 7.5 it can be calculated that the data set of the attitude questions
contains more ‘datapoints’ (answers to questions): on average 3.4 datapoints
per variable (24 questions for 7 variables), while the case-based data set
contains on average 1.1 datapoints per variable (8 questions for 7 variables).
Therefore, the attitude data set can be expected to be more reliable.
3. The case-based data set does not contain any data for variable BI (see
Table 7.5). So choosing the case-based data set would mean that variable
BI cannot be analyzed.
For these three reasons it was decided to choose the attitude dataset for the remain-
der of the data analysis.
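The averages used in the second observation can be recomputed from Table 7.5; the sketch also confirms that BI is the only variable without case-based questions.

```python
# (case-based, attitude) question counts per variable, as listed in Table 7.5
TABLE_7_5 = {"SCa": (1, 3), "SCb": (1, 2), "SCc": (1, 2), "CC": (2, 2),
             "NK": (2, 3), "BI": (0, 5), "LO": (1, 7)}

n_variables = len(TABLE_7_5)
case_based = sum(c for c, _ in TABLE_7_5.values())
attitude = sum(a for _, a in TABLE_7_5.values())
print(case_based, attitude)  # 8 24
print(round(case_based / n_variables, 1), round(attitude / n_variables, 1))  # 1.1 3.4
print([v for v, (c, _) in TABLE_7_5.items() if c == 0])  # ['BI']
```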
Reliability
To determine the reliability of the data obtained, the results of the attitude
questions (questions B7.1 to B7.24, see Appendix B) were tested for reliability using
Cronbach’s α (Tuckman, 1994, p. 181). This coefficient indicates, on a scale from
0 to 1, the extent to which a set of datapoints (here, the answers to the questions
measuring one variable) are in agreement with each other.
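For k questions, Cronbach's α compares the sum of the variances of the individual questions with the variance of the respondents' total scores: $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i \sigma_i^2}{\sigma_{\text{total}}^2}\right)$. A minimal sketch, assuming complete answers (one list of scores per question, with respondents in the same order in every list); the function name is illustrative. Note that α can become negative when questions correlate negatively, although values between 0 and 1 are the usual case.

```python
import statistics


def cronbach_alpha(items):
    """Cronbach's alpha for one variable.

    items: one list of scores per question, with the respondents in the same
    order in every list (complete answers assumed).
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two questions")
    totals = [sum(scores) for scores in zip(*items)]
    sum_item_var = sum(statistics.variance(scores) for scores in items)
    return k / (k - 1) * (1 - sum_item_var / statistics.variance(totals))


print(cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4]]))  # identical questions: 1.0
```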
The reliability turned out to be quite low for some variables; one α value was even
lower than 0.10. To improve the reliability, questions that were confounding were
removed from the data set. The decision which question to remove was based on
the item-total correlation: the items that correlated least with the total score
were removed. However, a question was only actually removed if its removal
provided a reasonable improvement of the reliability (an increase of α of more
than 0.10).
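The removal rule can be sketched as follows: compute, for each question, the correlation between that question and the sum of the other questions of the same variable, drop the question with the lowest correlation, and keep the removal only if α rises by more than 0.10. The text does not say whether the item-total correlation was corrected (i.e. computed without the item itself); the corrected form used here is an assumption, as are the helper names and the example scores. At most one question is removed per variable, as in Table 7.8, and a variable with only two questions is left alone because α is undefined for a single question. The block repeats the α function so that it runs on its own.

```python
import math
import statistics


def cronbach_alpha(items):
    """Cronbach's alpha; items holds one list of scores per question."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    sum_item_var = sum(statistics.variance(scores) for scores in items)
    return k / (k - 1) * (1 - sum_item_var / statistics.variance(totals))


def pearson(x, y):
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_weakest_item(items, min_gain=0.10):
    """Remove at most one question, following the rule described above.

    Returns (index of the removed question or None, alpha afterwards).
    """
    alpha = cronbach_alpha(items)
    if len(items) <= 2:
        return None, alpha  # removing one would leave a single question
    item_total = []
    for i, item in enumerate(items):
        others = [it for j, it in enumerate(items) if j != i]
        rest_total = [sum(scores) for scores in zip(*others)]
        item_total.append(pearson(item, rest_total))  # corrected item-total r
    weakest = min(range(len(items)), key=lambda i: item_total[i])
    new_alpha = cronbach_alpha([it for j, it in enumerate(items) if j != weakest])
    if new_alpha - alpha > min_gain:
        return weakest, new_alpha
    return None, alpha


# Made-up answers of five respondents to three questions of one variable.
items = [[1, 2, 3, 4, 5],
         [1, 2, 3, 5, 4],
         [5, 1, 4, 2, 3]]
removed, alpha_after = drop_weakest_item(items)
print(removed, round(alpha_after, 3))  # 2 0.947
```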
Table 7.8 shows the variables and their Cronbach’s α coefficients. For each
variable, the left side of the table gives the number of questions and α; the right
side shows, if applicable, which question was removed and α after removal. As five
questionnaires were incomplete, the data of 48 subjects could be used.
As can be observed, SCb and SCc have very poor reliability. These variables
will therefore no longer be considered in the remainder of this data analysis.
Due to the straightforward nature of the questions in the third part of the
questionnaire, in which the respondents were asked to rank five characteristics of
learning material in order of importance for a certain educational context, no
reliability and validity checks were built into part C of the questionnaire.
Table 7.8: Reliability of the answers (48 subjects).
Variable   nr. of questions   α      removed question   α after removal
SCa        3                  0.13   B7.13              0.47
SCb        2                  0.09   -                  0.09
SCc        2                  0.10   -                  0.10
CC         2                  0.48   -                  0.48
NK         3                  0.42   B7.3               0.70
LO         7                  0.45   -                  0.45
BI         5                  0.43   B7.5               0.52
The remaining data (i.e. what remains after removing the answers to the questions
as described in Table 7.8) will now be analyzed to answer Research Questions 7.1
and 7.2.
RQ 7.1 is about a dependency between the importance that teachers assign to
characteristics of learning material, and their position on the dimensions of teach-
ing (as operationalized by the seven variables SCa, SCb, SCc, CC, NK, LO and
BI). However, the importance that teachers assign to characteristics of learning
material has been measured by part C of the questionnaire in several educational
contexts. The question that now arises is: should this importance be examined on
a per-context basis, or for all contexts at the same time? To answer this question,
it is necessary to investigate Research Question 7.2 first: does the importance that
teachers assign to characteristics of learning material depend on the educational
context? If it does not, then RQ 7.1 can be answered for all contexts at the same
time. But if it does, then RQ 7.1 can only be answered for individual contexts.
The data obtained from part C of the survey consist, per teacher, of a rank number
(1 to 5) for each characteristic (duration, level, function, interaction, subject)
in each of the six contexts (see Appendix D, Figure D.2). To test whether the
teachers’ preferences varied between contexts, the mean rank numbers of each
characteristic are compared between the contexts described in part C of the
questionnaire, to see whether they differ significantly. If they do, the teachers’
preferences for characteristics apparently differ between contexts.
So paired-samples t-tests were performed for each characteristic; for example,
for the characteristic “duration”, paired t-tests were performed for the means in
contexts 1 and 2, contexts 1 and 3, and so forth. Per characteristic, this yields
5 + 4 + 3 + 2 + 1 = 15 t-tests (on the same data set). To obtain an approximate
familywise error rate of $\alpha = 0.05$, a per-comparison error rate of
$\alpha_{PC} = 0.05/15 \approx 0.0033$ was used; the results that meet this criterion
are printed in bold face in Table 7.9. To get some insight into possible other
relations, characteristics that are significant at a familywise error rate of 0.10
($\alpha_{PC} = 0.10/15 \approx 0.0067$) are also listed, in normal face.
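The number of comparisons and the per-comparison error rates follow from the six contexts, and the test statistic itself is the standard paired-samples t. The sketch below is illustrative: the helper name is hypothetical, it computes t and its degrees of freedom for one characteristic and one pair of contexts, and it assumes that only teachers who ranked the characteristic in both contexts are included. Turning t into a p-value requires the t distribution, for which a statistics package would normally be used.

```python
import math
import statistics
from itertools import combinations


def paired_t(x, y):
    """Paired-samples t statistic and degrees of freedom.

    x, y: rank numbers of one characteristic in two contexts, for the same
    teachers in the same order (teachers missing either value are assumed
    to have been dropped beforehand).
    """
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = statistics.fmean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1


context_pairs = list(combinations(range(1, 7), 2))  # (1, 2), (1, 3), ..., (5, 6)
alpha_pc = 0.05 / len(context_pairs)        # criterion for bold face
alpha_pc_loose = 0.10 / len(context_pairs)  # criterion for normal face
print(len(context_pairs), round(alpha_pc, 4), round(alpha_pc_loose, 4))  # 15 0.0033 0.0067
```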
Table 7.9: Characteristics that have a significant mean difference between two
contexts.
characteristic   contexts   t      df   sig.
level            2 and 4    -3.2   45   0.003
level            4 and 6     3.0   44   0.004
function         1 and 3     3.5   44   0.001
function         1 and 4     4.1   44   0.000
function         4 and 6    -3.1   44   0.004
The table shows that for the characteristics “level” and “function” there are pairs
of contexts with significantly different means. This means that in these contexts,
the teachers’ preference for these characteristics differed significantly. So although
significant differences were not found for every characteristic in every pair of
contexts, Research Question 7.2 should still be answered with “yes”: the teachers’
preferences for one or more characteristics of learning material (metadata fields)
depend on the educational context.
Part B of the questionnaire was developed to measure seven variables, using two
methods: case-based questions and attitude questions. In Section 7.3.5 it was de-
cided to only use the data set of the attitude questions. After a reliability analysis in
Section 7.3.5, it was decided to drop variables SCb and SCc because they were in-
sufficiently reliable. The reliability of variables SCa, NK, and BI was improved by
removing a question from the data set. So, the final data set consists of the columns
“at.cc.avg”, “at.lo.avg”, “sca-7.13”, “nk-7.3” and “bi-7.5” from Figure D.1 in Ap-
pendix D.
The data from part C consist of rank numbers for five characteristics in six
educational contexts (see Figure D.2). In the previous paragraph, it was concluded
that the teachers’ preferences sometimes vary between contexts, so the contexts
have to be examined individually. To answer Research Question 7.1, the following
analysis tries to find, for each characteristic, correlations between the scores of a
teacher on the five dimensions of teacher conceptions and the rank number of that
characteristic in each of the six contexts.
This yields six tables (one for each educational context), in which the five
dimensions of teaching conceptions are compared with the rank numbers of the five
characteristics of learning material. Note that for each context-characteristic pair,
unique data are available from the results of part C of the questionnaire. However,
the data for each dimension of teaching conception are re-used five times in each
table, so for each dimension, 30 comparisons are made across the six tables. To
maintain a familywise error rate of 0.05, the per-comparison error rate would have
to be $\alpha_{PC} = 0.05/30 \approx 0.0017$.
As rank variables are involved, Spearman’s ρ will be used as the correlation
coefficient (Charles, 1988, p. 101). Not all six tables are presented here; instead,
Table 7.10 only presents the relations with an error rate of less than 0.05, with
those relations that have an error rate of less than 0.0017 printed in bold face; the
bold relations will therefore have a familywise error rate of 0.05.
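Because a rank number can take only five values, many teachers share the same rank for a characteristic, so ties have to be handled. A common way, sketched below, is to replace every value by its rank (tied values get the mean of the ranks they span) and compute Pearson's correlation on those ranks. The sketch and its helper names are illustrative; it computes ρ only, not its significance.

```python
import math
import statistics


def average_ranks(values):
    """Ranks 1..n; tied values get the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions start..end hold ranks start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as Pearson's correlation of the (tie-averaged) ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = statistics.fmean(rx), statistics.fmean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


ALPHA_PC = 0.05 / 30  # 30 comparisons per dimension: 6 contexts x 5 characteristics
print(round(ALPHA_PC, 4))  # 0.0017
```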
familywise error rate of 0.05. As a basis for these planned comparisons, the
relations from Table 7.10 could be chosen. These 8 comparisons would then require a
per-comparison error rate of $0.05/8 = 0.00625$.
7.3.6 Discussion
The data show that for two characteristics (“level” and “function”) there are
significant differences in means between contexts (see Table 7.9), which means that
in some situations, teachers have a significantly different preference for
characteristics of learning material. So it should be concluded that the educational
context influences the teachers’ preferences for characteristics of learning material
(answering RQ 7.2 affirmatively). For the other three characteristics, “subject”,
“duration”, and “interaction”, no significant differences between contexts were
found; apparently, teachers have a fixed preference for these characteristics, which
is not subject to context changes.
For the development of the distance measure, it would have been convenient if
the teachers’ preference for characteristics of learning material had been
independent of all contexts, so that one measure would be appropriate in all
situations.
There are some small to medium relations between dimensions of teachers’
conceptions of teaching and their preference for characteristics of learning
material. Table 7.10 shows that two relations were found in context 6: one between
duration and Content Control, and one between duration and Bidirectionality. In
the other contexts, however, no significant relations were found. Although the data
suggest that Research Question 7.1 should be answered affirmatively, the number of
significant relations that were found is quite limited.
1. Only in context 6 were significant relations found (see Table 7.10). This
means that in the other five contexts, there are no data to base a prediction
on, and therefore prediction is impossible.
2. Relations were found for only one characteristic of learning material
(‘duration’, see Table 7.10), so no prediction can be made for the other four
characteristics.
3. The relations that were found only have small effect sizes: the effect size
is equal to the square of the correlation coefficient, so dimension CC only
explains $0.47^2 \approx 22.1\%$ of the variance of the preference for the
‘duration’ characteristic, while dimension BI explains $0.59^2 \approx 34.8\%$
of that variance (see Table 7.10).
For these three reasons, it is not possible to predict the weights of the metadata
fields (see end of Section 7.2.6) by measuring the teachers’ conceptions of
teaching. The distance measure will therefore be based on relative “weights” that
the teacher enters for each search question. This gives the teacher the opportunity
to indicate how important he or she thinks the characteristics of learning material
are in the current educational context. The search results can then be sorted on
their distance to the ideal ULM (the ULM that exactly matches the search
specification), or visualized graphically to give insight into the relevance of the
search results.
The next section explains how the relative importance that teachers assign to
characteristics of learning material is translated into a weight factor and how this
factor fits into the distance measure.
$$\mathrm{WTD}(x, y) = \sum_i w_i \, \frac{|r(x_i) - r(y_i)|}{|V_i|}$$
In the remainder of this thesis, this distance measure will be called the Weighted
Taxicab Distance or WTD.
Similarly, dividing the terms in Definition 7.2 by the cardinality of the metadata sets and multiplying them by a weight factor $w_i$ yields the following formula for the Weighted Euclidean Distance (WED):
$$\mathrm{WED}(x, y) = \sqrt{\sum_i w_i \left( \frac{r(x_i) - r(y_i)}{|V_i|} \right)^2}$$
These formulas constitute a ‘distance measure’ in the mathematical sense of the word; for the proofs, the reader is referred to Section E.1 and Section E.2, respectively.
Note that these formulas alone are not sufficient to predict the usability of a Unit of Learning Material: both the “ideal ULM” and the weights $w_i$ for each characteristic are needed. So, to use these formulas, an “ideal ULM” should be known (derived from the search specification a user has entered), as well as the relative importance of the characteristics of learning material. As the questionnaire results (see Table 7.9) showed that this relative importance changes from situation to situation, the actual distance measure (with all weight factors filled in) will also vary from person to person, and from situation to situation.
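The two formulas can be sketched in Python as follows. This is a minimal sketch under stated assumptions: each ULM is represented as a tuple of ranks $r(x_i)$, the cardinalities $|V_i|$ and the weights $w_i$ are given as tuples, and the example ULMs, ranks, cardinalities and weights are invented for illustration only. The search results are then sorted by increasing distance to the ideal ULM.

```python
import math
from typing import Sequence

def wtd(x: Sequence[int], y: Sequence[int],
        card: Sequence[int], w: Sequence[float]) -> float:
    """Weighted Taxicab Distance: sum_i w_i * |r(x_i) - r(y_i)| / |V_i|.

    x and y hold the ranks of each metadata value within its ordered value
    set V_i (an assumed encoding); card holds |V_i|; w holds the weights w_i.
    """
    return sum(wi * abs(xi - yi) / ci for xi, yi, ci, wi in zip(x, y, card, w))

def wed(x: Sequence[int], y: Sequence[int],
        card: Sequence[int], w: Sequence[float]) -> float:
    """Weighted Euclidean Distance: sqrt(sum_i w_i * ((r(x_i) - r(y_i)) / |V_i|)^2)."""
    return math.sqrt(sum(wi * ((xi - yi) / ci) ** 2
                         for xi, yi, ci, wi in zip(x, y, card, w)))

# Hypothetical example: four fields with |V_i| = 2, 3, 2, 2 and weights 1..4.
CARD = (2, 3, 2, 2)
WEIGHTS = (1, 4, 2, 3)
IDEAL = (0, 2, 0, 0)            # ranks of the 'ideal ULM' from a search specification
RESULTS = {
    "ULM A": (0, 2, 1, 0),      # differs only in the third field (weight 2)
    "ULM B": (1, 0, 1, 1),      # differs in every field
    "ULM C": (0, 1, 0, 0),      # one step off in the second field (weight 4)
}

# Sort the hypothetical search results by increasing distance to the ideal ULM.
by_wed = sorted(RESULTS, key=lambda k: wed(RESULTS[k], IDEAL, CARD, WEIGHTS))
by_wtd = sorted(RESULTS, key=lambda k: wtd(RESULTS[k], IDEAL, CARD, WEIGHTS))
print("WED order:", by_wed)     # ['ULM C', 'ULM A', 'ULM B']
print("WTD order:", by_wtd)     # ['ULM A', 'ULM C', 'ULM B']
```

In this made-up example the two measures rank ULM A and ULM C differently: squaring shrinks a normalized difference of 1/3 relatively more than one of 1/2, which illustrates why it is worth comparing both measures empirically.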
7.5 Conclusion
In this chapter, the concept of “distance in metadata spaces” has been explored. Such a distance can be measured, but one first needs to know how important teachers find each characteristic (metadata field). A research effort has been described that tries to learn more about this importance, and in particular whether it can be related to how a teacher thinks about teaching. An instrument was developed to measure this, and an experiment was executed. The results suggest that this importance sometimes depends on the educational context in which the teacher is looking for materials. The results also suggest that this importance relates, in some situations, to one or more conceptions of teaching as measured using seven variables. A distance measure based on this importance may therefore also (in some situations) vary with the educational context and the teacher’s conceptions of teaching. Although the results showed that the teachers’ preferences for characteristics of learning materials varied with these two concepts in only a few situations, these effects may still occur, so it is necessary to take them into account when developing a distance measure.
The results have been used to incorporate the importance that teachers assign to characteristics of learning material into a measure of distance. Two distance measures were defined: the Weighted Euclidean Distance (WED) and the Weighted Taxicab Distance (WTD). These measures depend on the teacher’s search specification, which is translated into a so-called ‘ideal ULM’, a hypothetical ULM in a metadata space. The distance measure is then able to calculate how far a certain search result differs from what the teacher ideally wants. These distances could then, for example, be used to sort the search results in order of increasing distance.
The next chapter will describe a research effort that investigates whether these distance measures are indeed able to give an indication of the “distance”, as perceived by a teacher, between the ULM he or she is ultimately looking for (the ‘ideal ULM’) and certain search results.
Chapter 8
8.1 Introduction
This thesis proposes that the use of a measure of distance in a metadata space
can help the teacher in finding learning materials that fit his or her purposes. The
previous chapter proposed two distance measures: the Weighted Taxicab Distance
(WTD) and the Weighted Euclidean Distance (WED). The weight factor in these
formulas is derived from the relative importance that teachers assign to character-
istics of learning materials (see Section 7.4.1). The distance measures calculate the
distance between the search profile that the teacher has entered (the ‘ideal’ ULM)
and the ULMs in the database; a list of search results can then be sorted by their distance to the ideal ULM.
This chapter will test whether one or both of the distance measures really succeed in predicting the ‘usability’ of ULMs to a teacher; this is called the validation of the distance measures. More specifically, this chapter will compare a teacher’s judgement of the usability of a ULM (given a certain educational situation) with the usability predicted by the two distance measures. This can be worded in the following research question:
The two distance measures are based upon the assumption that the preference of teachers for certain characteristics (that are modelled by the metadata) of a Unit of Learning Material is related to their opinion on the usability of that ULM (see Section 7.4). There may, however, be many other factors that also influence the teacher’s judgement. The greater the influence of these factors, the less the distance measures are able to predict the usability of the learning materials (because these other factors are not taken into account). So, the distance measures can be made more effective by basing them on the most important factors, i.e. those that are the most relevant to the teacher when judging the usability of ULMs. For this purpose the following research question is formulated:
The previous chapter introduced weights into the distance measure, because it was shown that the teachers’ preference for characteristics of a ULM is related to their opinion on the usability of that ULM (see Section 7.4.1). However, that section did not specify how the teachers’ ‘preference’ for characteristics is translated into a number. This is a difficult question. In the current research, it was decided to start with a simple approach and evaluate how effective it is: assign the value ‘1’ to the least important characteristic, ‘2’ to the second least important one, and so forth. With four characteristics, the most important characteristic gets weight ‘4’, four times as much as the least important characteristic. To get an impression of the effects of these weights, two different versions will be tested in this chapter: a Euclidean distance measure with weights that increase by 1, and a Euclidean distance measure that has equal weights, i.e. $w_i = 1$ for all $i$. If the Equal Weights Distance (EWD) performs better than the Weighted Euclidean Distance (WED), then weights with smaller increases are probably better. To be able to compare the Weighted Taxicab Distance with the Weighted Euclidean Distance, both will be used with a weight vector that increases by 1 for each field (this increase will be called the weight step in this thesis). So in total, three distance measures will be tested in this chapter: the Weighted Euclidean Distance (WED), the Weighted Taxicab Distance (WTD), and the Equal Weights Distance (EWD), the last of which makes it possible to assess the impact of the weight vector.
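The weight scheme can be expressed with a single ‘weight step’ parameter. In the sketch below (a hypothetical helper, not the software used in the experiment), a weight step of 1 gives the weights 1, 2, 3, 4 used for WED and WTD, while a weight step of 0 gives the equal weights of EWD. The field names and their ranking are invented for illustration.

```python
from typing import Dict, Sequence

def weight_vector(least_to_most_important: Sequence[str], step: float) -> Dict[str, float]:
    """Give weight 1 to the least important field and add `step` per rank.

    step=1 yields the increasing weights used for WED and WTD;
    step=0 yields w_i = 1 for every field, i.e. the Equal Weights Distance.
    """
    return {field: 1 + step * rank
            for rank, field in enumerate(least_to_most_important)}

order = ["size", "time", "difficulty", "interactivity"]  # hypothetical ranking
print(weight_vector(order, 1))  # {'size': 1, 'time': 2, 'difficulty': 3, 'interactivity': 4}
print(weight_vector(order, 0))  # {'size': 1, 'time': 1, 'difficulty': 1, 'interactivity': 1}
```

Framing EWD as the step-0 case makes the comparison in this chapter a comparison between two values of one parameter.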
This chapter is structured as follows. First, Section 8.2 will discuss the design of the research effort undertaken to answer Research Questions 8.1 and 8.2. Then, Section 8.3 will discuss which instrument was used to collect data. After that, Section 8.4 will present how the instrument was used to gather data. Section 8.5 will present the results of the experiment, and finally Section 8.6 will discuss the results and close the chapter.
8.2 Research Design
Recall the list of six types of educational research methods presented in Sec-
tion 7.3.3: (1) historical, (2) descriptive, (3) correlational, (4) causal-comparative,
(5) experimental, and (6) research and development. In Research Question 8.1, two
variables can be identified: “the prediction of the usability of a ULM” according
to either one of the distance measures, and “the judgement of usability” according
to a teacher. These two measures should be compared to each other, so in this case
the correlational research method again seems most appropriate.
Research Question 8.2 is not about correlations; instead, it attempts to describe the characteristics on the basis of which a teacher selects learning material. To answer this question, a descriptive research method is the most suitable.
The next section will explain how variables that can be measured are derived
from both Research Questions. After that, Section 8.2.2 will explain what proce-
dure is needed to collect data of these variables, and Section 8.2.3 will explain how
this approach relates to the existing research methodology in the field of Informa-
tion Retrieval.
discussed in Section 8.1? As the judgement of the usability of a ULM depends on
the educational context, an educational context has to be described to the teacher
first. The teacher can then determine his (or her) preferences for certain character-
istics, and make these known to the computer. To do this, an operational definition
for these preferences should be determined. In part C of the questionnaire (see
Section 7.3.3) the test subjects were asked to number five characteristics in order
of relevance. From comments that were written on the copies of the questionnaire,
it became clear that the subjects found it very difficult to assign a number that in-
dicates an ‘importance’. So, for the current study a different approach was taken.
On the computer, the (four) characteristics were listed, and for each characteristic so-called ‘radio buttons’¹ were present. Using these buttons, teachers could indicate which characteristic would be “checked first”. In reality, the buttons were not used to indicate an order in which characteristics were to be checked; instead, weights were assigned: the characteristic that the subject said “should be checked first” received weight 4, the characteristic that should be checked second received weight 3, and so on.
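The conversion from the “checked first” answers to weights can be sketched as follows. The sketch assumes, hypothetically, that the form returns for each characteristic the position 1..n at which the subject said it should be checked, and that the radio buttons allow each position to be chosen exactly once; the function name and example answers are illustrative only.

```python
from typing import Dict

def weights_from_check_order(position: Dict[str, int]) -> Dict[str, int]:
    """Map 'checked first' -> weight n, 'checked second' -> weight n - 1, and so on."""
    n = len(position)
    # One selection per radio-button group means every position 1..n must
    # occur exactly once; reject incomplete or duplicate answers.
    if sorted(position.values()) != list(range(1, n + 1)):
        raise ValueError("each position 1..n must be chosen exactly once")
    return {field: n + 1 - pos for field, pos in position.items()}

answers = {"interactivity": 1, "difficulty": 2, "time": 3, "size": 4}
print(weights_from_check_order(answers))
# {'interactivity': 4, 'difficulty': 3, 'time': 2, 'size': 1}
```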
Research Question 8.2 is about finding factors, other than those represented by the metadata fields, that are relevant to the teacher when judging the usability of a Unit of Learning Material. To get an overview of these factors, the test subject could be asked, for each search result, what his or her motives (or factors) were for assigning the grade that he or she assigned. To help the test subject analyze which factors played a role, a list of possible factors is presented to the subject, who is then asked to indicate per factor whether it played a role in the decision or not. Of course, the test subject will also have the opportunity to write down his or her own factors if they are not present in the list. This list will be included on a form called the “ULM Evaluation Form”, which the test subjects are required to fill out for each search result.
The initial list of factors that was created is depicted in Table 8.1. The factors
in this list were gathered through many informal discussions with teachers about
educational databases and the usability of Units of Learning Material.
Table 8.1: Initial list of factors that are expected to play a role in determining whether a ULM is useful or not.
Factor            Description
F_layout          layout characteristics such as font size and use of colours
F_references      number of references to other learning materials
F_textstyle       style of the textual content
F_correctness     correctness of the content
F_pedagogy        the pedagogy used in the material
F_exercises       the way exercises conform to the teacher’s own insights
F_interactivity   the amount of interactivity of the material
F_time            the duration of the material
F_difficulty      the difficulty of the material
F_size            the size (in kilobytes) of the material

¹ A series of buttons on the screen of which at most one at a time can be in the ‘selected’ state (compare listening to one radio station at a time). This mechanism has been incorporated into most graphical user interfaces.

8.2.2 Procedure
Using the operational definitions described above, the procedure for collecting data will be described. For answering RQ 8.1, a correlation needs to be calculated between two variables: a teacher’s judgement of the usability of a ULM in a given situation, and the distance between this ULM and the search specification that the teacher would enter to find learning material. To collect data points, the following procedure will be used:
1. A test subject is given an educational context in which a need for learning
material is described. The educational context will take the form of a case
description.
2. The test subject examines the case description and will fill out a computer form that specifies what characteristics the ULMs should have in the case and how important he or she finds each characteristic. The search specification will specify the ‘ideal ULM’, because it describes what the ULMs would ideally look like.
3. The computer will return a list of ULMs which the test subject is asked to
examine. For each ULM, the subject is asked to evaluate the usability of the
ULM for the educational context by assigning a grade number.
4. For each ULM in the search result list, the computer will calculate the dis-
tance between the ULM and the ‘ideal ULM’ according to the three distance
measures (WTD, WED and EWD).
This procedure yields four numbers for each ULM: the test subject’s grade and the distances according to the three distance measures. For each distance measure, a correlation coefficient between the test subject’s grades and the distances according to that measure will be calculated. Thus, for each of the three distance measures, a correlation coefficient is obtained that indicates how well the distance measure relates to the test subjects’ judgement of the usability of ULMs in certain educational contexts.
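The text does not name the correlation coefficient; as an assumption, the sketch below uses Pearson’s r, computed with the standard library only. Because a small distance should mean high usability, a distance measure that works would be expected to show a negative r between grades and distances. The grades and distances are invented placeholders, not experimental data.

```python
import math
from typing import Dict, List, Sequence

def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson product-moment correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)

# Placeholder data: one grade per judged ULM and the matching distances.
grades = [9, 8, 6, 5, 3, 2]
distances: Dict[str, List[float]] = {
    "WTD": [0.4, 0.9, 1.1, 1.8, 2.0, 2.6],
    "WED": [0.3, 0.5, 0.8, 0.9, 1.4, 1.5],
    "EWD": [0.6, 0.4, 0.9, 1.0, 0.8, 1.2],
}
for measure, d in distances.items():
    print(measure, round(pearson_r(grades, d), 2))
```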
To ensure the external validity (generalisability) of the results, the educational
contexts, the test subjects, and the ULMs should consist of a representative sample
of the respective populations.
This section will discuss two alternative approaches that may at first also seem suitable for the current research, but that turn out to be problematic on closer inspection.
An approach to evaluate the effectiveness of a retrieval method that is well
known in the Information Retrieval discipline is the TREC method. In this method,
a large collection of documents is composed, and a number of queries are formu-
lated by an independent party (the TREC organisation). Research groups that want
to test the performance of a novel search method can obtain a copy of the docu-
ment set and feed the queries to their search algorithm, which will then search the
documents and return a list of search results. The results of each query are then judged by members of the TREC organisation, to see whether the search results are relevant to the query. These results are then published and discussed at the TREC conference. However, as the TREC document set does not consist of educational multimedia documents, this approach is not suited for the current research.
An alternative approach would be to sort the list of search results according to
the distance measure, and also to ask the test subject to sort the search results ac-
cording to his judgement. These two lists of the same search results can be seen as
permutations of a set of ULMs, so that methods from discrete algebra (Kececioglu
& Sankoff, 1993; Foata, 1976) can be used to see how much the two lists differ
(i.e. how much agreement there is between the order generated by the distance
measure and the order determined by the test subject). The main disadvantage of this approach, however, is that small differences in the order of the lists can result in a relatively large amount of ‘disagreement’. Yet the most important issue in the current research is whether the distance measure is able to discriminate between usable and useless ULMs, not whether it is able to precisely predict the order that the test subject would choose. This research method is therefore too sensitive to small errors in the order of the result list, and not sufficiently sensitive to large relevance errors.
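To illustrate this argument, the sketch below contrasts a simple permutation distance with a score for discriminating between usable and useless ULMs. It uses the Kendall tau distance (the number of discordant pairs) as a stand-in; this is not one of the methods cited above, and the ‘discrimination’ score, the item labels and the orders are hypothetical.

```python
from itertools import combinations
from typing import Sequence, Set

def kendall_tau_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of item pairs that the two orders place in opposite order."""
    pos_b = {item: i for i, item in enumerate(b)}
    # combinations() preserves the order of `a`, so u precedes v in `a`.
    return sum(1 for u, v in combinations(a, 2) if pos_b[u] > pos_b[v])

def discrimination(order: Sequence[str], useful: Set[str]) -> float:
    """Fraction of (useful, useless) pairs in which the useful item is ranked higher."""
    pos = {item: i for i, item in enumerate(order)}
    pairs = [(u, v) for u in order if u in useful for v in order if v not in useful]
    return sum(pos[u] < pos[v] for u, v in pairs) / len(pairs)

useful = {"A", "B", "C"}                         # judged usable by the subject
subject = ["A", "B", "C", "X", "Y", "Z"]         # the subject's own order
within_swap = ["B", "A", "C", "X", "Y", "Z"]     # harmless swap inside the useful group
boundary_swap = ["A", "B", "X", "C", "Y", "Z"]   # a useless ULM ranked above a useful one

for name, order in [("within", within_swap), ("boundary", boundary_swap)]:
    print(name, kendall_tau_distance(subject, order), round(discrimination(order, useful), 3))
# within 1 1.0
# boundary 1 0.889
```

Both reorderings are a single swap away from the subject’s order, yet only the second ranks a useless ULM above a useful one; the permutation distance alone cannot tell the two apart.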
8.3 Research instrument
In this section, the “instruments” needed to execute the procedure described in Section 8.2.2 are discussed. First, the dimensions of the distance measure are determined in Section 8.3.1. Then, a set of suitable ULMs to be used during the experiment is derived in Section 8.3.2. Next, the design of the case descriptions is presented in Section 8.3.3. For each case, a predefined list of search results will be presented to the test subjects; Section 8.3.4 explains how an expert was consulted to develop this list. Section 8.3.5 then presents the software that was used to conduct the experiment, and Section 8.3.6 presents the evaluation form that the test subjects used to evaluate the search results. Finally, Section 8.3.7 describes a pilot test that was conducted to assess the effectiveness of the instrument, and how the instrument was improved.
not to find the best set of metadata fields.
Below, the metadata fields of the IEEE metadata set are examined for these two desired properties; an elaborate description of the choices made can be found in Appendix C.3. Fields that have both properties have ‘yes’ in both columns of Table 8.2.
It follows from Table 8.2 that six metadata fields appear to be both orderable and useful. Let us examine them for the two desired properties, “orderability” and “usefulness”:
Size As the size of a ULM is measured in bytes², the “size” field is trivially orderable: it is possible to state unambiguously that a certain ULM is larger than another. The field can be useful to a teacher to determine whether a ULM will cause unnecessary download delays because of its size.
Difficulty Again, the orderability of the difficulty field is obvious: a ULM can
be more difficult to a student (within the intended educational context as
described in element 5.6) than another ULM. Also, it can be very useful to a
teacher to know how difficult a ULM is.
Typical Learning Time The orderability of the typical learning time is trivial. It is important to a teacher because teachers often want to know the learning material’s “time load” on the learners.
Table 8.2: Fitness of metadata fields to serve as a metric space
nr    metadata field                ordering?  useful?
1     GENERAL
1.1   Identifier                    no         no
1.2   Title                         no         yes
1.3   Catalog Entry                 no         no
1.4   Language                      no         yes
1.5   Description                   no         yes
1.6   Keywords                      no         yes
1.7   Coverage                      no         no
1.8   Structure                     no         no
1.9   Aggregation level             no         no
2     LIFECYCLE
2.1   Version                       yes        no
2.2   Status                        yes        no
2.3   Contribute                    no         no
3     METAMETADATA
3.1   Identifier                    no         no
3.2   Catalog Entry                 no         no
3.3   Contribute                    no         no
3.4   Metadata Scheme               no         no
3.5   Language                      no         no
4     TECHNICAL
4.1   Format                        no         no
4.2   Size                          yes        yes
4.3   Location                      no         no
4.4   Requirements                  no         yes
4.5   Installation Remarks          no         no
4.6   Other Platform Requirements   no         no
4.7   Duration                      yes        no
5     EDUCATIONAL
5.1   Interactivity Type            yes        no
5.2   Learning Resource Type        no         yes
5.3   Interactivity Level           yes        yes
5.4   Semantic Density              yes        perhaps
5.5   Intended End Role             no         no
5.6   Context (educational level)   yes        yes
5.7   Typical Age Range             yes        perhaps
5.8   Difficulty                    yes        yes
5.9   Typical Learning Time         yes        yes
5.10  Description                   no         perhaps
5.11  Language                      no         yes
6     RIGHTS
6.1   Cost (y/n)                    yes        no
6.2   Copyright (y/n)               yes        no
6.3   Description                   no         no
7     RELATION
7.1   Kind                          no         no
7.2   Resource                      no         no
8     ANNOTATION
8.1   Person                        no         no
8.2   Date                          yes        no
8.3   Description                   no         no
9     CLASSIFICATION
9.1   Purpose                       no         no
9.2   TaxonPath                     yes        yes
9.3   Description                   no         yes
9.4   Keywords                      no         no
So, these six metadata fields comply with the two requirements, and hence are
sufficiently usable to base the distance measure upon and to conduct the experiment
with.
The consequence of using only a small subset of the metadata fields in the distance measure is that the distance measure will be less able to predict with certainty whether a certain ULM is relevant to the teacher, compared to methods that include all metadata fields. Therefore, small effect sizes should be anticipated, and consequently the predictive power of the distance measure will be relatively small. For practical applications of the distance measure, for example to sort database search results in order of decreasing predicted usability, this means that the order will be less accurate than if more metadata fields were used.
Note, however, that the goal of the current research is to demonstrate that the mechanism of using a distance measure in a metadata space can predict, to some extent, the relevance of ULMs to a teacher. If it can be shown that such a prediction is possible, then the mechanism could be refined and improved to allow for greater effect sizes and more accurately sorted lists of search results.
As said before, the experiment will use Units of Learning Material with varying
values for certain metadata fields. These fields form the dimensions of the distance
measure. Now that the metadata fields are known, the next section will determine
what the ULMs that will be used in the experiment should look like.
results; we will discuss this decision in Section 8.3.4. This also eliminated the
need to vary the “subject” field. Thus, four fields remained: Size, Interactivity Level, Difficulty, and Typical Learning Time. To vary these four fields on two values would require $2 \times 2 \times 2 \times 2 = 16$ ULMs. As it was expected that the teachers would find the “Interactivity” field the most important, it was decided to vary the ‘Interactivity’ field on three values. Thus, a total of $2 \times 2 \times 2 \times 3 = 24$ ULMs are needed. Their specification (“profile”) is listed in Table 8.3.
Table 8.3: Specification of the ULM profiles for the validation experiment.
Nr size interaction difficulty time
1 low low low low
2 low low low high
3 low low high low
4 low low high high
5 low medium low low
6 low medium low high
7 low medium high low
8 low medium high high
9 low high low low
10 low high low high
11 low high high low
12 low high high high
13 high low low low
14 high low low high
15 high low high low
16 high low high high
17 high medium low low
18 high medium low high
19 high medium high low
20 high medium high high
21 high high low low
22 high high low high
23 high high high low
24 high high high high
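Table 8.3 is the full factorial combination of the four fields, with size varying slowest and time fastest. The following sketch regenerates the 24 profiles in the same order; it is only a consistency check on the table, not part of the original instrument.

```python
from itertools import product

SIZE = ("low", "high")
INTERACTION = ("low", "medium", "high")
DIFFICULTY = ("low", "high")
TIME = ("low", "high")

# product() varies the last factor fastest, matching the row order of Table 8.3.
PROFILES = list(product(SIZE, INTERACTION, DIFFICULTY, TIME))

if __name__ == "__main__":
    for nr, (size, interaction, difficulty, duration) in enumerate(PROFILES, start=1):
        print(nr, size, interaction, difficulty, duration)
```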
Now that the profiles of the ULMs are determined, content is needed for ULMs that suit each specification. The subject of computer networks was chosen for the following reasons. First, teachers of the Telematics Systems and Support group of the University of Twente had shown great interest in the concept of “Units of Learning Material”. Secondly, as the group teaches courses to many different target groups, much reuse of materials occurs. Finally, the teachers had already put a lot of their course materials online, including some simulations. The subject area was narrowed down to the “Internet Control Message Protocol” (ICMP), because many different materials on this subject were available on the Internet from many universities.
To ensure that the decisions that the test subjects will make about the learning material are valid and represent real-life situations, it was decided to use existing learning material that is actually being used in education. The Internet was searched³ on keywords such as ‘icmp’, ‘course’, ‘traceroute’, ‘ping’, ‘exercise’, ‘simulation’, ‘computer networks’ and various combinations of these words. The course materials that were found were examined, and in a creative process, parts of courses were combined and edited to comply with one of the profiles listed in Table 8.3. For large ULMs, the table gives video fragments as an example; but as no video fragments related to the subject matter could be found, ULMs with many pictures were created. Also, simulations on ICMP (for highly interactive ULMs) could not be found, so ULMs with many assignments or exercises were created instead.
Interpreting the specification listed in Table 8.3 is a subjective process: at what size should a ULM be considered “small”, and when is a ULM to be considered “large”? The size of the collected ULMs varied from a few kilobytes to tens of kilobytes, so it was determined that ULMs smaller than 5 kilobytes (about 730 words) were considered “small”, while ULMs larger than 15 kilobytes (about 2200 words) were considered “large”. Similarly, ULMs that take less than 10 minutes to work through (the ‘duration’ property) were considered “short”, while ULMs that take longer than 20 minutes were considered “long”. The list of ULMs that were created is presented in Table 8.4. The profile numbers correspond to the profile numbers in Table 8.3. Note that the ranges chosen to distinguish between “short” and “long”, and between “small” and “large”, are subjective. This is unavoidable, as no objective definition exists of how big a “small” ULM typically is. ULMs were considered “difficult” if they were written in a difficult-to-read style, or if difficult words were used that were not explicitly explained first. ULMs that were written in clear, easy-to-understand language were considered “easy”.
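The size and duration cut-offs described above can be written as two small classification functions. This is a sketch of the stated thresholds only: values between the cut-offs are labelled ‘medium’, as in Table 8.4, and the function names are hypothetical.

```python
def size_class(kilobytes: float) -> str:
    """'small' below 5 kB, 'large' above 15 kB, otherwise 'medium'."""
    if kilobytes < 5:
        return "small"
    if kilobytes > 15:
        return "large"
    return "medium"

def duration_class(minutes: float) -> str:
    """'short' below 10 minutes, 'long' above 20 minutes, otherwise 'medium'."""
    if minutes < 10:
        return "short"
    if minutes > 20:
        return "long"
    return "medium"

print(size_class(3.9), size_class(10.6), size_class(29.2))         # small medium large
print(duration_class(7), duration_class(18), duration_class(60))  # short medium long
```

Applied literally, these cut-offs do not reproduce every label in Table 8.4 (for instance, ULM 17 at 13.8 kB is listed as “large” and ULM 2 at 10 minutes as “short”), which is in line with the remark above that the ranges are subjective.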
Note that the 24 profiles in Table 8.3 are only a guideline to make sure that all “quadrants” of the metadata space are represented by at least one ULM. Sometimes, the ULM does not exactly match a profile; ULM 20, for example, has a “medium” duration, while it should be “long” according to its profile (10). It was decided to leave these ULMs as intact as possible, to preserve their representativeness of real-life learning material. The fact that they do not fully comply with the specification has no influence on the outcomes of the current experiment; the only objective of the 24 profiles is to ensure that all metadata fields get a chance to play a role in the distance measure.

³ Using the search engines www.hotbot.com and www.google.com.

Table 8.4: The actual ULMs and their metadata values that were used in the experiment.
ULM  size (kB)  size class  interactivity  difficulty      time (min)  duration  profile
1    3.9        small       medium         easy            20          long      6
2    2.1        small       none           easy            10          short     1
3    0.6        small       a lot          very easy       10          short     9
4    2.9        small       none           difficult       10          short     15
5    29.2       large       none           very difficult  60          long      16
6    36.2       large       none           very easy       20          long      14
7    0.9        small       some           very easy       7           short     5
8    18         large       some           easy            13          short     17
9    18         large       a lot          easy            12          short     21
10   2.3        small       none           difficult       10          short     3
11   10.6       medium      none           very easy       10          short     13
12   1.7        small       medium         difficult       8           short     7
13   4.5        small       medium         very difficult  30          long      8
14   3.0        small       a lot          difficult       45          long      12
15   1.6        small       a lot          difficult       10          short     11
16   12.5       medium      a lot          difficult       50          long      24
17   13.8       large       a lot          difficult       10          short     23
18   4.6        small       none           easy            18          medium    2
19   5.0        small       none           difficult       18          medium    4
20   2.5        small       a lot          very easy       15          medium    10
21   15.2       large       some           very easy       25          long      18
22   20.8       large       some           very difficult  35          long      20
23   22.4       large       some           difficult       15          short     19
24   18.3       large       a lot          very easy       30          long      22
Table 8.5: The framework for the design of the case descriptions
Below, the eight cells in the framework are worked out into a case description:
4. Large size material: it is rather difficult to construct a plausible need for large material. If the same material exists in a smaller size, then it is not logical to choose the large material. For this reason, this combination was found to be insufficiently realistic and no case description was created.
6. Low difficulty: as the material is about a technical subject, it was decided to describe a situation in which non-technical students would have to learn it. This can only be achieved if the material is sufficiently easy: A non-technical faculty has decided to give its students a course on Information Technologies. The students do not have any prior technical knowledge. As the programs ‘ping’ and ‘traceroute’ can be found on any PC and as these programs illustrate how the Internet works, the Internet Control Message Protocol and these two programs are also part of the course.
7. Short duration: the need for material with a short duration can be suggested
by describing a situation in which there is only very little time, such as: The
course that you give already demands too much time from the students. Still,
a subject is missing: the Internet Control Message Protocol. The existing
material cannot be made smaller, but still you want to give some material on
ICMP.
8. Small-sized material is needed when the Internet connections are very slow, for example when the target audience is at a large distance. This observation formed the inspiration for the following case description: Due to a European project, your course has gained an extra target audience: students of a university in the eastern part of Europe want to study parts of your course on computer networks. Their level is comparable to that of your own students, but of course their network connection is much slower. The standard books such as “Computer Networks” by Tanenbaum are not available, so the students have to rely on your online materials. The subject that you are looking for material on is again ICMP.
So there are six cases, each with six ULMs to judge. However, during the pilot test (see Section 8.3.7) it was noted that one case takes about 15 minutes, so the experiment would take $6 \times 15 = 90$ minutes. This was found to be too long, and there would also be a risk that filling in the ULM evaluation form (described in Section 8.3.6) might become an automatism for the subjects. Therefore, it was decided that each subject would do three cases. Two versions of the experiment were created: the first version uses the first three cases, and the second version uses the second three cases. The test subjects were assigned randomly to one of these two versions. Subjects of the first group were numbered 1, 2, 3, etc., while subjects of the second group were numbered 101, 102, and so on.
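The random assignment and numbering scheme can be sketched as follows. Two details are assumptions, since the text does not specify them: the two groups are kept (nearly) equal in size by shuffling and splitting the list of participants, and the participant names are placeholders.

```python
import random
from typing import Dict, List, Optional, Tuple

def assign_versions(participants: List[str],
                    seed: Optional[int] = None) -> Dict[str, Tuple[int, int]]:
    """Randomly split participants over two versions and number them.

    Version 1 subjects are numbered 1, 2, 3, ...; version 2 subjects 101, 102, ...
    Returns {participant: (version, subject_number)}.
    """
    rng = random.Random(seed)
    shuffled = participants[:]          # leave the caller's list untouched
    rng.shuffle(shuffled)
    half = (len(shuffled) + 1) // 2     # version 1 gets the extra subject if odd
    assignment: Dict[str, Tuple[int, int]] = {}
    for i, name in enumerate(shuffled[:half]):
        assignment[name] = (1, 1 + i)
    for i, name in enumerate(shuffled[half:]):
        assignment[name] = (2, 101 + i)
    return assignment

print(assign_versions(["p1", "p2", "p3", "p4", "p5"], seed=42))
```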
subject will be asked to fill out a ULM evaluation form for each of the ULMs. A search result list should be determined for each of the case descriptions that were developed in Section 8.3.3. How should these lists be determined? The current section answers this question.
The purpose of the experiment is to try to relate a teacher’s judgement of the usability of a ULM to the distance between that ULM and that teacher’s ‘perfect ULM’. To properly calculate correlations, the entire range from “useless” to “very useful” should be covered. So there should be some search results that are very useful to a teacher, as well as search results that are useless. This means that somebody or something will have to determine which ULMs are useful in a certain case, and which ULMs are useless. Obviously, the computer by itself is not able to determine the usability of a ULM; this is a decision that ultimately has to be taken by human experts. Also, the distance measure itself cannot be used to check for usability, because it is the subject of this study. The only option that remains is to ask a human expert to select three useful and three useless Units of Learning Material for each case, and to present these six ULMs as “search results” to the test subject. Although the test subject may disagree with the expert on the usability of the search results, it can still be expected that there is a certain agreement between their opinions, so that the test subject will find both useless and useful ULMs in the search results.
The following procedure was used: hardcopies of the 24 ULMs were created (insofar as this was possible), as well as a list with a very short objective description of each ULM. A teacher from the Telematics Systems and Services group was asked to review the list of the 24 ULMs online and, if needed, on hardcopy, and to review the cases. The expert was then asked to take the list of ULMs and to write down for each ULM some cases in which the ULM was very useful, and some cases in which it was not useful (see Table 8.6). The expert chose to denote the usability of a ULM using the signs “+”, “–” and “o” for “useful”, “useless”, and “undecided”, respectively.
For example, the expert judged that ULM 4 was particularly suited to case 2 (+),
and that ULM 2 was useless in all cases (–) except case 5 (undecided).
After that, the expert was asked to take the list of cases and to select, using
Table 8.6, three useful and three useless ULMs per case. The results are shown in
Table 8.7. The table shows, for example, that the expert chose ULM 4 as a ‘useful’
ULM in case 2, and ULM 2 as a ‘useless’ ULM in cases 1, 2, 3 and 6.
Two problems arose, however. First, not all ULMs are used in Table 8.7; ULMs 5
and 7, for example, do not appear in it. Recall that per case, the three ‘useful’
ULMs and the three ‘useless’ ULMs together form the result list that will be
returned to the user. So a ULM that is not in Table 8.7 would not be present in
any result list, and hence would not participate in the experiment. This conflicts
Table 8.6: The ULMs (vertically) and their fitness for particular cases (horizon-
tally)
case case
ULM 1 2 3 4 5 6 ULM 1 2 3 4 5 6
1 0 0 + 13 0
2 – – – – 0 – 14 0 0 + 0 – 0
3 0 0 0 + 15 0 0 + 0 – 0
4 + 16 0 0 0 – – 0
5 0 17 0 0 0 – – –
6 – – – – 0 0 18 – – – – – 0
7 0 19 – – – – – –
8 0 – – 0 0 0 20 – – – 0 0 0
9 0 – 0 + 0 + 21 0 0 0 – – –
10 – – – – – 22 0 – 0 0 – –
11 – 0 – 0 0 – 23 0,+ – + 0 – –
12 – – – – 0 – 24 0 0 0 0 – –
Table 8.7: The list of search results: six ULMs varying from ‘useful’ to ‘useless’
for each of the six cases.
case | useful (rated ‘+’) | undecided (rated ‘0’) | useless (rated ‘–’)
1 | ULM 23 | ULM 24, 16 | ULM 2, 6, 11
2 | ULM 4 | ULM 11, 16 | ULM 2, 22, 6
3 | ULM 14, 23 | ULM 22 | ULM 2, 6, 11
4 | ULM 9, 20 | ULM 24 | ULM 4, 6, 21
5 | ULM 3 | ULM 20 | ULM 14, 18, 23
6 | ULM 1, 9 | ULM 20 | ULM 2, 11, 12
with the design of the 24 experiment-ULMs, which were designed specifically to
cover the ‘metadata space’.
The second problem is that some ULMs are used quite often: ULM 2, for
example, is assigned to cases 1, 2, 3, 5, and 6, and ULM 6 to cases 1, 2, 3, and 4,
so that these ULMs would be present in five and four, respectively, of the six search
result lists. Counting the number of cases in which each ULM is used in Table 8.7
shows that there are eight unused ULMs, five ULMs that are used once, six ULMs
that are used twice, two ULMs that are used three times, two ULMs that are used
four times, and one ULM that is used five times (ULM 2). As the test subject has to
examine each ULM in all search results, he or she would have to judge ULM 2 five
times. This could become tedious for the test subject, which might compromise
the results of the experiment.
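The tally above is easy to reproduce mechanically. The Python sketch below counts how often each ULM occurs across a set of result lists and groups the ULMs by that count, reporting never-used ULMs under 0. The function name and the three small result lists are hypothetical illustrations, not the lists of Table 8.7.

```python
from collections import Counter

def usage_histogram(result_lists, all_ulms):
    """Group ULM ids by the number of result lists they appear in.

    result_lists maps a case number to the ULM ids returned for that case.
    Returns {times_used: sorted ULM ids}; unused ULMs appear under 0.
    """
    uses = Counter(ulm for ulms in result_lists.values() for ulm in ulms)
    histogram = {}
    for ulm in all_ulms:
        # Counter returns 0 for ULMs that never occur
        histogram.setdefault(uses[ulm], []).append(ulm)
    return {count: sorted(ulms) for count, ulms in sorted(histogram.items())}

# Hypothetical example: three cases drawing on ULMs 1..8
lists = {1: [1, 2, 3], 2: [2, 4, 5], 3: [2, 4, 6]}
print(usage_histogram(lists, range(1, 9)))
# {0: [7, 8], 1: [1, 3, 5, 6], 2: [4], 3: [2]}
```

A quick sanity check on any such tally is that the ULM counts add up to the number of ULMs, and the weighted sum (times used multiplied by number of ULMs) equals the total number of result slots.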
To solve both problems, the table of search results developed by the expert
(Table 8.7) had to be modified. This might seem to compromise the expert’s
subjective judgement, but this thesis assumes that as long as the expert’s judgement
concerning the (un)usability of a ULM for a case is respected, the search result lists
still reflect that judgement. So a ULM should only be assigned to a case as a
‘useless’ ULM if the expert indicated that the particular ULM is ‘useless’ for that
particular case. Note that the purpose of the search result list, returning six ULMs
varying from useful to useless as judged by an expert, remains fulfilled.
It was therefore decided to select ULMs that were used often (such as ULM 2)
and to replace them in some cases with ULMs that were unused (such as ULM 5).
As the expert’s judgement of which ULMs are useful in a case and which are not
should not be violated, Table 8.6 had to be respected. For example, the table says
that the expert found ULM 5 “neither useful nor useless” (rated ‘0’) for case 2. In
case 2 (see Table 8.7), ULMs 4, 11, 16, 2, 22 and 6 are returned, and ULM 2 was
already shown to be used relatively often (five times). So by replacing ULM 2 with
ULM 5 in case 2, the number of times ULM 2 is used as a search result is reduced,
while ULM 5 is now incorporated into the experiment. Note that ULM 2 was
judged to be ‘useless’ for case 2, while ULM 5, which replaces it, was judged
‘neither useful nor useless’ for case 2. The ratings of the search results for case 2
are now “+, 0, 0, 0, –, –”, which still fulfills the requirement of returning both
useful and useless ULMs.
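The replacement rule can be stated as a small check. In the sketch below (the function and variable names are hypothetical), a ULM may only enter a case’s result list if the expert rated it for that case, and the list must still contain at least one useful (‘+’) and one useless (‘–’) ULM. The ratings are those given for case 2 in Tables 8.6 and 8.7 and in the text above; ASCII `-` stands for ‘–’.

```python
def replace_in_case(result_lists, ratings, case, old_ulm, new_ulm):
    """Return a copy of result_lists with old_ulm replaced by new_ulm for one case.

    ratings[ulm][case] holds the expert's rating: '+', '0' or '-'.
    Raises ValueError if the swap would ignore the expert's judgement
    or leave the list without a useful or a useless ULM.
    """
    current = result_lists[case]
    if old_ulm not in current:
        raise ValueError(f"ULM {old_ulm} is not in the list for case {case}")
    if new_ulm in current:
        raise ValueError(f"ULM {new_ulm} is already in the list for case {case}")
    if case not in ratings.get(new_ulm, {}):
        raise ValueError(f"the expert did not rate ULM {new_ulm} for case {case}")
    updated = [new_ulm if ulm == old_ulm else ulm for ulm in current]
    labels = {ratings[ulm][case] for ulm in updated}
    if "+" not in labels or "-" not in labels:
        raise ValueError("the list must still range from useful to useless")
    return {**result_lists, case: updated}  # original dict is left untouched

# Expert ratings for case 2
ratings = {4: {2: "+"}, 11: {2: "0"}, 16: {2: "0"}, 2: {2: "-"},
           22: {2: "-"}, 6: {2: "-"}, 5: {2: "0"}}
lists = {2: [4, 11, 16, 2, 22, 6]}

new_lists = replace_in_case(lists, ratings, case=2, old_ulm=2, new_ulm=5)
print(new_lists[2])                           # [4, 11, 16, 5, 22, 6]
print([ratings[u][2] for u in new_lists[2]])  # ['+', '0', '0', '0', '-', '-']
```

Trying to replace ULM 4 (the only ‘+’ in case 2) with ULM 5 would be rejected, because the list would no longer contain a useful ULM.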
There were eight unused ULMs in Table 8.7 (ULMs 5, 7, 8, 10, 13, 15, 17, and
19), so eight replacements had to be made. The resulting table, and the changes
made to Table 8.7 to arrive at it, are shown in Table 8.8. In this revised table, 16
ULMs are used once, four ULMs twice, and four ULMs three times. So a test
subject will have to evaluate a ULM at most three times, instead of up to five times
according to Table 8.7.
Table 8.8: The modified cases (vertical) with six ULMs, both useful and useless
(horizontal).
This table shows per case the search results that will be presented to the test
subjects. Case 3, for example, has as search results ULMs 14, 15, 23, 2, 19 and
11; for an example of the search result list, see Figure 8.2. The experimental
environment was programmed so that the results were presented in a random order.
The search form that was designed to be used during the experiments is shown
in Figure 8.1. It allows the test subject to select desired values for the four
metadata fields that were selected in Section 8.3.1 to be used in the distance
measure. To make the form look more natural, some extra metadata fields were
added, and the ‘keywords’ field was pre-filled with two relevant keywords: “icmp”
and “internet”. The teacher is still able to modify the keywords, but this has no
impact on the search results: these are predetermined, as described in Section 8.3.4.
The search results
Figure 8.2 shows an example of how the search results are presented to the test
subject. The table shows the ID of the Unit of Learning Material, a short descrip-
tion, and two links on which the test subject can click: ‘view’ to view the contents
of a ULM, and ‘metadata’ to get the ‘metadata overview’. Figure 8.3 shows what
the metadata overview looks like. The metadata fields were the same ones as used
in the search specification form.
After having examined the contents and the metadata of the search results,
the user could click a button “to next case” after which a new search form was
presented. When the last case was finished, the subject was thanked for his or her
cooperation, and was logged out.
Manual
A manual was written to introduce the test subject to the experimental environment
and to instruct him or her how to carry out the experiment. Figures 8.1, 8.2, and
8.3 were included to familiarise the test subject with the experimental environment.
The final version of the manual, after the revisions prompted by the pilot test (see
below), is printed in Appendix C.2 (Dutch).
Figure 8.1: Search form that was designed to be used during the experiments.
Figure 8.2: An example of the ‘search results’ screen.
Figure 8.3: An example of a ‘metadata overview’ screen.
8.3.7 Pilot test
In the previous sections, the various parts of the research instrument were described:
the dimensions of the distance measure, the contents of the database, the case
descriptions, how the search result list is composed, the experimental environment,
the manual, and the ULM evaluation form. To verify whether these components
worked together as intended, a pilot test was performed on August 29th, 2000 with
a test subject (a researcher in the area of computer networks). The subject first
read the manual and was asked whether anything was unclear; he indicated that
everything was clear to him. Then the subject was asked to carry out the experiment
and to mention any difficulties. The researcher kept a time log of each step to
estimate how much time a test subject would need to complete the experiment.
The following issues arose:
- The subject found it difficult to indicate whether or not he agreed with the
  propositions on the evaluation form.
- The evaluation form said “number” while the prototype said “ID” when
  indicating ULMs.
- The subject wondered whether he had to fill in factors on the evaluation form
  that were irrelevant to the case. The instructions should mention that this is
  not needed.
- The evaluation form should have “I agree” on the right-hand side and
  “I disagree” on the left-hand side, instead of the other way around.
- The subject wondered whether ULMs could be combined into larger constructs.
  This is not allowed, so the manual should explain this clearly.
- The button “back to search results” in the online environment did not always
  work as expected if the subject had also used the “back” button of the web
  browser.
- The subject needed about 15 minutes to do one case; this was slightly more
  than anticipated, so it was decided to let each subject do only three cases.
During this pilot test, as well as during informal discussions with other teachers,
two more factors were identified that might be of importance when judging the
usability of a ULM: the level of completeness (denoted as Fcompleteness) and the
extent to which the abstraction level of the ULM was appropriate for the
educational situation (denoted as Fabstraction). These two factors were added to
the list presented in Table 8.1 and to the ULM evaluation form.
These issues were resolved in the next version of the evaluation form, which is
depicted in Appendix C.4. The prototype and the manual (see Appendix C.2) were
also updated to address these issues.
8.4 Subjects
To ensure the validity of the experiment, the subjects have to be familiar with the
subject matter so that they can make an informed decision about the usability of a
ULM in the hypothetical situations. As the subject matter, the Internet Control
Message Protocol, is often part of courses on computer networks, the websites of
Dutch technical universities were browsed to search for teachers who teach such
courses. During the search for online learning material (see Section 8.3.2), a
non-technical university was discovered that also offered a course on computer
networks. In total, 11 teachers were found. The teachers were approached by email
to ask whether they were willing to participate in the experiment; the letter used can
be found in Appendix C.1 (Dutch). The teachers of the University of Twente, where
the researcher had his office, were approached in person. Nine teachers agreed
to participate: four from Delft University (TUD), two from Eindhoven University
(TUE), two from Leiden University (LU) and three from Twente University (UT).
8.5 Experiment
This section will first present the results of the experiment concerning the valida-
tion of the distance measures (to answer Research Question 8.1), and then it will
discuss the factors that are relevant to the teachers when selecting online learning
materials (to answer Research Question 8.2).
The second source of data is the ULM evaluation form. The variable S records
the score that the subject assigns to the ULM. The ULM evaluation form also
provides data on the factor variables as listed in Table 8.1. These variables take the
value “1” if the subject indicated that the factor did play a role when judging the
usability of a ULM, and “0” in all other cases.
The third source of data is the set of distances calculated using the three formulas
for the Weighted Euclidean Distance (WED), the Weighted Taxicab Distance
(WTD), and the Equal Weights Distance (EWD), yielding the variables WED,
WTD, and EWD.
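The three formulas are defined earlier in the thesis and are not repeated in this section. As a reading aid only, the Python sketch below assumes the textbook forms: WED = sqrt(sum of w_i * (q_i - x_i)^2), WTD = sum of w_i * |q_i - x_i|, and EWD = the Euclidean form with every weight equal to 1, computed over the four dimensions discussed with Figure 8.5 (interactivity, difficulty, size and time). The numeric encodings and weights are hypothetical.

```python
import math

# Hypothetical numeric encodings on ordered scales (e.g. 1 = very low, 5 = very high)
FIELDS = ("interactivity", "difficulty", "size", "time")

def wed(query, ulm, weights):
    """Weighted Euclidean Distance (assumed form): sqrt(sum_i w_i * (q_i - x_i)^2)."""
    return math.sqrt(sum(weights[f] * (query[f] - ulm[f]) ** 2 for f in FIELDS))

def wtd(query, ulm, weights):
    """Weighted Taxicab Distance (assumed form): sum_i w_i * |q_i - x_i|."""
    return sum(weights[f] * abs(query[f] - ulm[f]) for f in FIELDS)

def ewd(query, ulm):
    """Equal Weights Distance: the Euclidean form with every weight set to 1."""
    return wed(query, ulm, {f: 1 for f in FIELDS})

perfect_ulm = {"interactivity": 4, "difficulty": 2, "size": 3, "time": 2}  # search specification
candidate = {"interactivity": 2, "difficulty": 3, "size": 3, "time": 4}    # one search result
weights = {"interactivity": 4, "difficulty": 3, "size": 1, "time": 2}      # teacher's weights

print(round(wed(perfect_ulm, candidate, weights), 3))  # 5.196
print(wtd(perfect_ulm, candidate, weights))            # 15
print(ewd(perfect_ulm, candidate))                     # 3.0
```

In all three measures a smaller value means the search result lies closer to the teacher’s ‘perfect ULM’, which is why a negative correlation with the score S is expected.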
8.5.2 Validity of the distance measure
Recall Research Question 8.1:
Table 8.9: Spearman’s correlation coefficient for the relationship between the
subject’s score S and the three distance measures EWD, WED, and WTD
Subject | EWD corr | EWD sig | WED corr | WED sig | WTD corr | WTD sig | N
1 (UT) | -0.18 | 0.47 | -0.45 | 0.06 | -0.34 | 0.16 | 18
2 (UT) | -0.07 | 0.79 | -0.05 | 0.85 | -0.09 | 0.71 | 18
3 (TUD) | -0.67 | 0.00 | -0.51 | 0.03 | -0.56 | 0.02 | 18
102 (TUE) | -0.53 | 0.02 | -0.46 | 0.05 | -0.25 | 0.32 | 18
103 (TUE) | -0.56 | 0.02 | -0.44 | 0.07 | -0.44 | 0.07 | 18
All subjects | -0.21 | 0.04 | -0.26 | 0.01 | -0.20 | 0.06 | 90
UT: University of Twente
TUE: University of Eindhoven
TUD: University of Delft
The line “All subjects” in Table 8.9 shows the overall performance of the three
distance measures. It shows that, in general, the Weighted Euclidean Distance
performs best (correlation coefficient -0.26, predicting 6.7% of the variance). The
performance of the Equal Weights Distance, however, is only slightly worse, with a
correlation coefficient of -0.21. Recall that in the Equal Weights Distance, all
metadata fields are assigned an equal weight. So a third conclusion that can be drawn
from the current research is that the “relative importance” (the weight vector w)
that teachers assign to the metadata fields has little effect on the performance
of the distance measure. To get more insight into the effects of the weight vector,
the next paragraph analyses the experimental data further.
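Table 8.9 reports Spearman’s rank correlation. For readers who want to recompute such coefficients, the pure-Python sketch below ranks both variables, giving tied values the average of the ranks they span (subjects’ scores are likely to contain ties), and then computes Pearson’s correlation on the ranks. The scores and distances are invented for illustration, and significance levels are not computed.

```python
def average_ranks(values):
    """Rank values from 1 to n; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # extend the run of tied values
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # sorted positions start..end are ranks start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson's product-moment correlation (ZeroDivisionError if a variable is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (ss_x * ss_y) ** 0.5

def spearman(x, y):
    """Spearman's rho: Pearson's correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical data for one subject: scores S and distances of the same six ULMs
scores = [5, 4, 4, 2, 1, 1]
distances = [0.5, 1.0, 2.0, 2.5, 3.0, 4.0]
print(round(spearman(scores, distances), 3))  # -0.971
```

The coefficient is negative because, in this made-up example as in the experiment, ULMs that lie further from the search specification receive lower usability scores.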
[Plot: corr. (0.210 to 0.270) against weight step (0 to 1.0)]
Figure 8.4: The correlations between the weighted Euclidean distance and the test
subjects’ scores with different weight steps (N=90).
Table 8.10: Correlations between the weighted Euclidean distance and the test
subjects’ scores with different weight steps (N=90).
weight step correlation significance
0.0 -0.213 0.044
0.1 -0.230 0.029
0.2 -0.241 0.022
0.3 -0.249 0.018
0.4 -0.255 0.015
0.5 -0.259 0.014
0.6 -0.262 0.013
0.7 -0.264 0.012
0.8 -0.265 0.012
0.9 -0.265 0.012
1.0 -0.264 0.012
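This section does not restate how a weight step is turned into a weight vector. The sketch below assumes a hypothetical parameterisation that is consistent with the rest of the chapter: the teacher’s metadata fields are ordered from least to most important, and the field at (0-based) position i receives weight 1 + i * step. A step of 0 then yields equal weights (the EWD), and a step of 1 yields weights that increase by 1 (the WED described in the conclusion of this chapter). This reading appears consistent with Table 8.10, where step 0.0 gives the same correlation (-0.213) as the EWD in Table 8.9 (-0.21).

```python
def step_weights(ranking, step):
    """Hypothetical weight vector for a given weight step.

    ranking lists the metadata fields from least to most important;
    the field at (0-based) position i receives weight 1 + i * step.
    """
    return {field: 1.0 + i * step for i, field in enumerate(ranking)}

# Hypothetical importance ranking, least important first
ranking = ["size", "time", "difficulty", "interactivity"]

# The eleven weight steps of Table 8.10: 0.0, 0.1, ..., 1.0
steps = [round(k * 0.1, 1) for k in range(11)]
for step in steps:
    weights = step_weights(ranking, step)
    print(f"step {step:.1f}: " + ", ".join(f"{f}={w:.1f}" for f, w in weights.items()))
```

Feeding each of these weight vectors into a weighted Euclidean distance and correlating the resulting distances with the subjects’ scores would give one row of a table like Table 8.10 per step.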
8.5.5 Factors
Recall that the ULM evaluation form listed 12 factors, and for each of them the
subjects had to indicate whether or not it played a role when judging the usability
of a search result (see Section 8.2.1). These factors were included to find answers
to Research Question 8.2: what factors are relevant to teachers when making a
judgement about the usability of learning materials? Understanding these factors
is important for creating a search method (i.e. a distance measure) that can help
teachers find relevant learning material.
As each subject has judged six ULMs in three cases, there are 18 ULM evaluation
forms per subject; so five subjects yield a total of 90 evaluation forms. For each
factor, the number of times a subject indicated that the factor played a role when
judging the usability of a ULM was counted. The resulting histogram is shown in
Figure 8.5.
Flayout 54
Freferences 44
Ftextstyle 44
Fcorrectness 63
Fpedagogy 53
Fexercises 67
Fcompleteness 61
Finteractivity 66
Ftime 60
Fdifficulty 57
Fsize 9
Fabstraction 47
Figure 8.5: Histogram of the times a factor was relevant to a subject (N=90).
The figure shows that almost all factors are taken into account when judging
the usability of a ULM (44 to 67 times out of 90). The only factor that is not often
considered is Fsize, which represents the size of a ULM. Note that the factors on
which the dimensions of the distance measure are based in the current experiment
(interactivity, difficulty, size and time) were also measured. The figure shows that
there are other factors that are considered more often when judging the usability
of a ULM. These data suggest that the distance measure as used in the current
research can be improved by removing the “size” dimension (which the subjects
rarely considered) and by including other factors that, according to Figure 8.5, the
teachers do consider often and that are semantically orderable as explained in
Section 7.2.1, such as the completeness and the abstraction level of a ULM. Further
research can use this input to refine the distance measure, and to try to increase
its “predictability” (i.e. the correlation between the distance measure and the
teachers’ scores for the usability of a ULM).
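The argument above can be made explicit with the counts from Figure 8.5. The sketch below expresses each count as a share of the 90 evaluation forms and lists the factors that fall below a cut-off. The 25% threshold is an arbitrary, hypothetical choice; at that threshold only Fsize is flagged, since the next-lowest factors (Freferences and Ftextstyle) were still marked relevant on about half of the forms.

```python
# Counts from Figure 8.5: number of the 90 evaluation forms on which a factor was marked relevant
FACTOR_COUNTS = {
    "Flayout": 54, "Freferences": 44, "Ftextstyle": 44, "Fcorrectness": 63,
    "Fpedagogy": 53, "Fexercises": 67, "Fcompleteness": 61, "Finteractivity": 66,
    "Ftime": 60, "Fdifficulty": 57, "Fsize": 9, "Fabstraction": 47,
}
N_FORMS = 90

def rarely_considered(counts, n_forms, threshold=0.25):
    """Factors marked relevant on fewer than threshold * n_forms forms.

    The default threshold of 25% is an arbitrary, hypothetical cut-off.
    """
    return sorted(f for f, c in counts.items() if c / n_forms < threshold)

# Print the factors from most to least often considered, with their share of all forms
for factor, count in sorted(FACTOR_COUNTS.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{factor:<15}{count:>4}{count / N_FORMS:>8.1%}")

print(rarely_considered(FACTOR_COUNTS, N_FORMS))  # ['Fsize']
```

Which of the frequently considered factors can actually become a dimension of the distance measure still depends on whether they are semantically orderable, which a frequency count alone cannot decide.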
8.6 Conclusion
This chapter has explored a novel search method based on educational metadata to
answer Research Question 3. The method uses a distance measure that calculates
how much a search result differs from the search specification (i.e. how relevant it
is to a teacher). Based on the results of the survey described in Chapter 7, it
was decided to base the distance measure (‘how different is a search result from the
search specification’) on weights that had to be indicated by the teacher explicitly
each time he or she submits a search query to the database.
In this chapter, an experiment was devised to validate three distance measures:
a weighted Taxicab distance, a weighted Euclidean distance (in which the weights
increase by 1) and a Euclidean distance with equal weights. The data suggest that
a weight step of about 1 yielded the best results, while the performance decreased
with smaller weight steps, most markedly as the step approached 0 (see Table 8.10).
Therefore, a Weighted Euclidean Distance with a weight step of 1 should give the
best results (i.e. it correlates best with the teachers’ judgement on the usability of
the search results).
It should be noted that for some test subjects the distance measure worked
particularly well (up to a correlation of -0.67), while for other subjects no
significant correlation was found. Further research is needed to study why it works
well for some subjects and not for others.
Finally, this chapter investigated what characteristics (factors) of learning ma-
terial were relevant to the test subjects when judging the usability of the search
results. The research results suggested that only one of the presented factors was
rarely found relevant: the size of learning materials (9 out of 90 times). The other factors were
found relevant many times. This thesis proposes that the more of these factors
are incorporated into the distance measure, the better it will be able to predict the
teachers’ scores on the usability of search results.
Chapter 9
Conclusions and recommendations
9.1 Introduction
This chapter discusses the results of this thesis. Section 9.2 recalls the research
questions and discusses the answers that were found. Section 9.3 then states some
recommendations for other research efforts based on the results of this thesis.
Section 9.4 concludes the chapter.
RQ1 What is an appropriate model to store and retrieve multimedia learning
material so that it can be used by multiple target groups with different information
needs?
RQ2 What factors are of influence on the reusability of learning materials that are
stored in a multimedia database?
The sections below will describe how answers were found for each of these
research questions.
The model includes the concept of ‘context adapters’: small (textual) ele-
ments that are presented before or after a certain ULM to change the context
for which the ULM was originally designed. This makes the ULM more
generic, and thus more reusable.
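The data format of context adapters is not fixed in this section. The sketch below, with hypothetical class and field names, shows one way a ULM could be presented together with adapters placed before or after it, so that the ULM record itself stays unchanged and can be reused by another target group with different adapters.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class ULM:
    ulm_id: int
    body: str

@dataclass(frozen=True)
class ContextAdapter:
    text: str
    position: str  # "before" or "after" the ULM

def present(ulm: ULM, adapters: List[ContextAdapter]) -> str:
    """Return the text shown to one target group: adapters wrapped around the unchanged ULM body."""
    for a in adapters:
        if a.position not in ("before", "after"):
            raise ValueError(f"unknown position: {a.position!r}")
    before = [a.text for a in adapters if a.position == "before"]
    after = [a.text for a in adapters if a.position == "after"]
    return "\n".join(before + [ulm.body] + after)

icmp = ULM(14, "ICMP carries error and control messages between IP hosts and routers.")
network_course = [
    ContextAdapter("In this course, ICMP is treated as part of network management.", "before"),
    ContextAdapter("Next week we will use ICMP messages to diagnose a faulty network.", "after"),
]
print(present(icmp, network_course))
```

Because the adapters are stored separately, a second course could attach its own adapters to the same `icmp` object without copying or editing it.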
This thesis explores only the opportunities that the educational metadata provide.
Although the relationships and the context adapters also offer many opportunities
to enhance electronic learning systems, these characteristics were not studied further
due to time constraints. More research is needed to explore these concepts and to
develop methods and design principles for using them effectively in educational
learning systems.
1. the search method that is used to find appropriate Units of Learning Material
that reside in an educational multimedia database system. In this thesis, a
search method was developed based on educational metadata (see RQ3 and
Chapter 7).
describes an experiment to test if the proposed distance measure is indeed able to
predict the usability of ULMs. The results suggest that for some teachers the dis-
tance measure is very well able to predict the usability of search results, while for
some teachers it is not at all able to predict the usability (see Section 8.5.2).
What is the cause of this? The fact that the distance measure is unable to predict
the usability of learning materials for some teachers suggests that these teachers
base their judgement on characteristics other than those ‘encoded’ in the metadata
used in the experiment. A possible explanation could be that these other
characteristics are related to pedagogical principles that are difficult to describe
explicitly. From conversations with teachers about this subject, it appeared that
teachers sometimes choose a ULM just because it contains a picture that perfectly
explains the relationships between concepts of the subject matter, or because it
contains a very good exercise. Even if other characteristics of the material are less
than optimal (such as the amount of interaction, or the size), the teacher will still
choose the ULM. It could be that teachers for whom the distance measure works
better base their opinion more on the explicit characteristics of the ULMs.
This leads to the proposition that the distance measure is perhaps better suited
for application areas where a user will base his or her judgement more on the char-
acteristics encoded in the metadata fields. For example, on the Internet hundreds of
building plans for model airplanes can be found. Now, imagine an online database
of building plans for model airplanes that uses metadata fields based on semanti-
cally orderable properties of these plans, such as the airplane’s wing span, engine
power, weight, and scale factor of the plane. A distance measure could be based
upon these metadata fields, and a search interface could be built that allows the user
to search for building plans. The user would then be able to find building plans that
suit his basic requirements, after which he would be able to customize the airplane
during the actual building process (using balsa wood, a knife, glue, and paint).
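To make the model-airplane example concrete, the sketch below ranks a few invented building plans by a weighted Euclidean distance over the four properties mentioned above. Field names, values and weights are hypothetical. Each dimension is divided by its range in the collection so that a wing span in centimetres and a scale factor become comparable; this is one common normalisation choice, not necessarily the one used for the educational distance measure.

```python
import math

# Hypothetical building plans; field names and values are invented for illustration
PLANS = [
    {"id": "A", "wing_span_cm": 120, "engine_power_w": 300, "weight_g": 900, "scale": 0.10},
    {"id": "B", "wing_span_cm": 150, "engine_power_w": 450, "weight_g": 1400, "scale": 0.125},
    {"id": "C", "wing_span_cm": 90, "engine_power_w": 150, "weight_g": 500, "scale": 0.05},
]
FIELDS = ("wing_span_cm", "engine_power_w", "weight_g", "scale")

def field_ranges(plans):
    """Range of each field over the collection (1.0 if all values are equal, to avoid dividing by zero)."""
    return {f: (max(p[f] for p in plans) - min(p[f] for p in plans)) or 1.0 for f in FIELDS}

def distance(query, plan, weights, ranges):
    """Weighted Euclidean distance over range-normalised fields."""
    return math.sqrt(sum(weights[f] * ((query[f] - plan[f]) / ranges[f]) ** 2 for f in FIELDS))

def search(query, weights, plans):
    """Return the plans ordered from closest to farthest from the query."""
    ranges = field_ranges(plans)
    return sorted(plans, key=lambda p: distance(query, p, weights, ranges))

QUERY = {"wing_span_cm": 140, "engine_power_w": 400, "weight_g": 1200, "scale": 0.10}
WEIGHTS = {"wing_span_cm": 4, "engine_power_w": 3, "weight_g": 2, "scale": 1}

print([p["id"] for p in search(QUERY, WEIGHTS, PLANS)])  # ['B', 'A', 'C']
```

Plan B is ranked first even though its scale differs from the query, because it is closest on the more heavily weighted wing span and engine power; the builder can then adapt it during construction.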
Stated in more general terms, the distance measure may be well suited for
those application areas where a certain degree of ‘re-purposing’ or manufacturing
is already part of the natural process. In the educational setting, most teachers want
to reuse a ULM unmodified; changing it may sometimes even cost more time than
creating it from scratch.
The distance measure as it was used in Chapter 8 uses the following metadata
fields: size, educational level, difficulty, and typical learning time. Using these
fields, it is able to predict the usability of Units of Learning Material to some
extent. If more metadata fields are taken into account by the distance measure,
then it may be better able to predict the usability of ULMs. Further research is
needed to investigate which combination of metadata fields yields the best results
(i.e. allows the distance measure to best predict the usability of ULMs).
9.2.4 Software Architecture
Chapter 6 describes a software architecture to answer RQ4. To make sure that the
architecture is able to function properly in practice, a list of ten requirements was
developed (see Section 6.2). These requirements express the need for the basic
functionality, the user interface, performance requirements, and design constraints.
Table 9.1 presents the ten requirements and the sections that describe how these
requirements are met.
The architecture has proven to allow implementations that can have many dif-
ferent “faces”. The face of an application can be changed without running the risk
of altering the programming instructions by accident; no HTML, SQL and script
languages need to be interleaved, thus preventing many errors. Encoding the data-
base results, and in fact the entire interaction between the user and the application
in XML has proven to be a suitable way to implement the separation of content,
layout, and programming instructions.
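The separation described above can be illustrated with Python’s standard library: database rows are serialised to XML without any layout, and presentation is left to a separate stylesheet (XSL in the thesis). The table, column names and query below are hypothetical, and this is a sketch of the idea rather than the architecture’s actual code.

```python
import sqlite3
import xml.etree.ElementTree as ET

def results_to_xml(conn, query, params=()):
    """Run a query and return its rows as layout-free XML: one <ulm> element per row."""
    cur = conn.execute(query, params)
    columns = [d[0] for d in cur.description]  # column names of the result set
    root = ET.Element("results")
    for row in cur:
        item = ET.SubElement(root, "ulm")
        for col, value in zip(columns, row):
            ET.SubElement(item, col).text = str(value)
    return ET.tostring(root, encoding="unicode")

# Hypothetical in-memory database with two ULMs
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ulm (id INTEGER, title TEXT)")
conn.executemany("INSERT INTO ulm VALUES (?, ?)",
                 [(14, "ICMP message types"), (23, "Ping and traceroute")])
print(results_to_xml(conn, "SELECT id, title FROM ulm ORDER BY id"))
```

The printed result is a single `<results>` element with one `<ulm>` child per row; no HTML appears anywhere in the data-access code.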
XML was also used to encode the internal structure of ULMs. This concept
was not explored fully, as many good alternatives exist (such as SMIL, HyTime,
MHEG). Still, an Educational Markup Language that is based on XML provides
many advantages such as enabling search methods based on the internal structure
of ULMs, and adding layout to ULMs using XSL. For example, assignments and
questions can automatically be made accessible by clicking a button, or a search
method could automatically show only the introduction of a ULM so that the user
can quickly get an impression of what the ULM is about. The disadvantage is that
dedicated XML-authoring tools are needed which are able to communicate properly
with the educational database application itself. As XML gains more momentum
in industry, these tools will probably become available during the current decade.
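The markup below is a hypothetical Educational Markup Language fragment; its element names are invented for illustration and are not a published schema. With Python’s `xml.etree.ElementTree`, a search interface could show only the introduction as a preview, or collect the assignments so they can be offered behind a button, as described above.

```python
import xml.etree.ElementTree as ET

# Hypothetical ULM markup; the element names are illustrative only
ULM_XML = """<ulm id="14">
  <introduction>ICMP is used to report errors in IP datagram processing.</introduction>
  <section>
    <title>Echo request and echo reply</title>
    <para>The ping utility sends echo requests and waits for echo replies.</para>
  </section>
  <assignment>Use ping to measure the round-trip time to a remote host.</assignment>
  <assignment>Explain which ICMP message traceroute relies on.</assignment>
</ulm>"""

def preview(xml_text: str) -> str:
    """Return only the introduction, e.g. for a quick impression in a result list."""
    root = ET.fromstring(xml_text)
    return (root.findtext("introduction") or "").strip()

def assignments(xml_text: str) -> list:
    """Collect all assignments, wherever they occur in the ULM."""
    root = ET.fromstring(xml_text)
    return [a.text.strip() for a in root.iter("assignment") if a.text]

print(preview(ULM_XML))
for number, task in enumerate(assignments(ULM_XML), start=1):
    print(f"Assignment {number}: {task}")
```

Such structure-aware operations are only possible because the internal structure of the ULM is marked up explicitly rather than mixed with layout.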
This thesis has shown that using the XML/XSL duo, content and layout can
be separated. Due to this property, XML is very useful as a vehicle to transport
certain content to various types of end-user “terminals”: mobile devices, desktop
computers, text-only devices, or as input for automated information processing
devices. XML is better suited for these tasks than HTML; it is, for example, very
difficult to display HTML pages with a rich layout on text-only devices. This thesis
therefore recommends that XML be used in the next generation of the World Wide
Web.
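Continuing the idea, the same XML can be turned into an HTML table for a desktop browser or into plain lines for a text-only device. In the thesis this transformation is done with XSL; the sketch below uses two small Python functions as stand-ins for two stylesheets, and the XML content is hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical search results, independent of any layout
RESULTS_XML = (
    "<results>"
    "<ulm><id>14</id><title>ICMP message types</title></ulm>"
    "<ulm><id>23</id><title>Ping &amp; traceroute</title></ulm>"
    "</results>"
)

def to_html(xml_text: str) -> str:
    """Stand-in for a desktop stylesheet: render the results as an HTML table."""
    root = ET.fromstring(xml_text)
    rows = "".join(
        f"<tr><td>{escape(u.findtext('id'))}</td><td>{escape(u.findtext('title'))}</td></tr>"
        for u in root.findall("ulm")
    )
    return f"<table>{rows}</table>"

def to_text(xml_text: str) -> str:
    """Stand-in for a text-only stylesheet: one line per ULM."""
    root = ET.fromstring(xml_text)
    return "\n".join(f"{u.findtext('id')}: {u.findtext('title')}" for u in root.findall("ulm"))

print(to_html(RESULTS_XML))
print(to_text(RESULTS_XML))
```

Adding a third kind of terminal means adding a third transformation; the XML produced by the application does not change.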
9.3 Recommendations
Based on the results from the current research, a number of recommendations can
be stated. These will be discussed below.
9.3.3 Information Retrieval
In Chapter 8 a novel research method was used to test the effectiveness of a search
mechanism (the distance measures). The method involves developing various case
descriptions, and asking an expert to create a ‘fake’ search result list. Test subjects
then examine the result list and assign a relevance score to each search result.
These scores are then correlated with the ‘prediction’ of the search method that is
to be tested (see Section 8.2.2).
The Information Retrieval discipline should examine this research method to
determine if it can be used to test the effectiveness of other search methods and
algorithms.
2. perform research to find the characteristics that the users of the database
system find most important;
5. perform research (in much the same way as in Chapter 8) to study whether the
distance measure is sufficiently able to predict the usability of the search
results;
found fruitful: educational metadata can be used to develop a novel search method
that is not only relevant to the educational world, but that may also prove useful to
many other disciplines. In this respect, this thesis is truly multidisciplinary.
References
Del Bimbo, A., Vicario, E., & Zingoni, D. (1995). Symbolic description and visual
querying of image sequences using spatio-temporal logic. IEEE Transac-
tions on knowledge and data engineering, 7(4), 609 – 621.
Bestebreurtje, R., & Verhagen, P. W. (1992). ODB project: developing the concept
of an instructional multimedia database for multiple target groups. Presented
at the AECT ’92 annual convention “capture the vision”, Washington, DC,
February 4 – 9, 1992.
Bestebreurtje, R., Verhagen, P. W., & Zwart, W. J. (1995). C/BB approach: a way
to organise multimedia databases. Presented at the 1995 AECT national
convention “information technology: expanding frontiers”, Anaheim, CA,
February 8 – 12, 1995.
Boll, S., Klas, W., & Sheth, A. (1998). Overview on using metadata to manage
multimedia data. In A. Sheth & W. Klas (Eds.), Multimedia data manage-
ment: using metadata to integrate and apply digital media (pp. 1 – 24). New
York: McGraw-Hill.
Broeke, B. A. ten, Zwart, W. J., Verhagen, P. W., & Rhemrev, A. (1994, Novem-
ber). Ontwerp-richtlijn voor ordening en structurering van instructieve mul-
timedia. Enschede: University of Twente.
Brusilovsky, P., Schwarz, E., & Weber, G. (1996). ELM-ART: An intelligent tutor-
ing system on world wide web. In G. Goos, J. Hartmanis, & J. van Leeuwen
(Eds.), Lecture notes in computer science volume 1086: Third international
conference, ITS ’96 (pp. 261 – 269). Berlin: Springer.
Codd, E. F. (1970). A relational model of data for large shared data banks.
Communications of the ACM, 13(6), 377 – 387.
Collis, B. (1998b). WWW-based rapid prototyping as a strategy for training uni-
versity faculty to teach WWW-based courses. In B. Khan (Ed.), Web-based
training. Englewood Cliffs, NJ: Educational Technology Publications.
Collis, B. A., & de Boer, W. (1999). The TeleTOP decision support tool. In
J. van den Akker, R. Branch, K. Gustafson, H. M. Nieveen, & T. Plomp (Eds.),
Design approaches and tools in education and training (pp. 235 – 248).
Dordrecht: Kluwer Academic Publishers.
Costagliola, G., Ferrucci, F., Tortora, G., & Tucci, M. (1995). Non-redundant 2d
strings. IEEE Transactions on knowledge and data engineering, 7(2), 347 –
350.
Davis, M. (1995). Media streams: an iconic visual language for video representa-
tion. In R. M. Baecker, J. Grudin, W. A. S. Buxton, & S. Greenberg (Eds.),
Readings in human-computer interaction: Toward the year 2000 (pp. 854 –
866). San Francisco: Morgan Kaufmann Publishers, Inc.
Duval, E. (1999, June). An open infrastructure for learning - the ARIADNE project,
share and reuse without boundaries. Keynote session at Enable ’99, June
2-5, Espoo, Finland.
Faloutsos, C., Equitz, W., Flickner, M., Niblack, W., Petkovic, D., & Barber, R.
(1994). Efficient and effective querying by image content. Journal of Intel-
ligent Information Systems, 3(3/4), 231–262.
Ferguson, G. A. (1981). Statistical analysis in psychology and education (3rd ed.).
McGraw-Hill Inc.
Gow, L., & Kember, D. (1993). Conceptions of teaching and their relationship to
student learning. British Journal of Educational Psychology, 63, 20 – 33.
Hakala, J., Husby, O., & Koch, T. (1996). Warwick Framework and Dublin Core
set provide a comprehensive infrastructure for network resource description
(Tech. Rep.). Warwick, UK: OCLC.
Hannafin, R. D., & Sullivan, H. J. (1995). Learner control in full and lean CAI
programs. Educational Technology: Research and Development, 43(1), 19
– 30.
Hiddink, G. W. (1998). Educational multimedia databases: past and present
(Tech. Rep. No. TR-CTIT-20). University of Twente, Centre for Telematics
and Information Technology.
Hiddink, G. W. (2001a). Using XML to solve reusability problems of online
learning materials. Campus Wide Information Systems, 18.
Hiddink, G. W. (2001b). ADILE: Architecture of a database-supported learning
environment. Journal of Interactive Learning Research, to appear.
Horn, R. E. (1989). Mapping hypertext: The analysis, organization, and display of
knowledge for the next generation of on-line text and graphics. Lexington,
MA: The Lexington Institute.
IDYLLE. (1996). Werkplan Idylle (Tech. Rep.). Enschede: University of Twente,
Centre for Telematics and Information Technology.
IEEE. (2000). Learning Object Metadata. Standard in preparation; preliminary
versions can be retrieved from: http://ltsc.ieee.org.
ISO. (1992). Hypermedia/time-based structuring language (HyTime), ISO/IEC
10744-1992 (E) edition. Geneva, Switzerland: International Organization
for Standardization.
ISO. (1996). Coding of moving pictures and associated audio for digital storage
media at up to 1.5 Mbit/s, ISO/IEC 11172. Geneva, Switzerland: International
Organization for Standardization.
Jensen, F. V. (1996). An introduction to Bayesian networks. London: UCL Press
Ltd.
Kagan, D. M. (1990). Ways of evaluating teacher cognition: Inferences concerning
the goldilocks principle. Review of Educational Research, 60(3), 419–469.
Kagan, D. M. (1992). Professional growth among preservice and beginning teach-
ers. Review of Educational Research, 62(2), 129–169.
Kececioglu, J., & Sankoff, D. (1993). Exact and approximation algorithms for the
inversion distance between two chromosomes. Lecture Notes in Computer
Science, vol. 684, 87 – 105.
Kobla, V., & Doermann, D. (1997). Compressed domain video indexing techniques
using DCT and motion vector information in MPEG video. In Proceedings
of SPIE – conference on storage and retrieval for image and video databases
V (Vol. 3022, pp. 200–211). San Jose, CA.
Korfhage. (1997). Information storage and retrieval. New York: John Wiley &
Sons.
Kozma, R. B. (1991). Learning with media. Review of Educational Research,
61(2), 179–211.
Kromhout, O. M., & Butzin, S. M. (1993). Integrating computers into the elementary
school curriculum: an evaluation of nine Project CHILD model schools.
Journal of Research on Computing in Education, 26(1), 55–69.
Lang, S. (1989). Linear algebra. New York: Springer-Verlag.
Li, Z., & Merrill, M. D. (1991). ID Expert 2.0: Design theory and process.
Educational Technology: Research and Development, 39(2), 53–69.
Liu, Z., Huang, J., Wang, Y., & Chen, T. (1997). Audio feature extraction & anal-
ysis for scene classification. In Proceedings of the workshop on multimedia
signal processing (pp. 343–348). Princeton, NJ.
Marcke, K. van. (1995). Genericity in instructional knowledge. Presented at
AI-ED ’95, the workshop on authoring shells for ITS, Washington. Retrieved
August 15th 2001 from: http://www.pitt.edu/~al/aied/van_marc.html.
Mast, C. van der, & Rantanen, J. (1990). Next generation authoring systems:
integration of multiple methodologies and tools. In S. A. Cerri & J. Whiting
(Eds.), Learning technology in the European communities: proceedings of
the Delta conference on research and development, the Hague (pp. 519–533).
Dordrecht: Kluwer Academic Publishers.
Merrill, M. D. (1987). An expert system for instructional design. IEEE Expert,
1(2), 25–40.
Merrill, M. D., Li, Z., & Jones, M. K. (1990). Second generation instructional
design (ID2). Educational Technology, 30(2), 7–14.
Meyer-Boudnik, T., & Effelsberg, W. (1995). MHEG explained. IEEE Multimedia,
2(1), 26–38.
Milheim, W. D. (Ed.). (1994). Authoring-systems software for computer-based
training. Englewood Cliffs: Educational Technology Publications, Inc.
Moonen, J. (in press). Design methodologies. In H. Adelsberger, B. Collis, &
J. Pawlowski (Eds.), Handbook of information technologies for education
and training. Berlin: Springer Verlag.
Moonen, J., & Plomp, T. (Eds.). (1987). Eurit 86: Developments in educational
software and courseware. Oxford, England: Pergamon Press.
Olimpo, G., Chioccariello, A., Tavella, M., & Trentin, G. (1990). On the concept
of reusability in educational design. In S. A. Cerri & J. Whiting (Eds.),
Learning technology in the European communities: proceedings of the Delta
conference on research and development, the Hague. Dordrecht: Kluwer
Academic Publishers.
Park, I., & Hannafin, M. J. (1993). Empirically-based guidelines for the design
of interactive multimedia. Educational Technology: Research and Develop-
ment, 41, 63–85.
Persico, D., Sarti, L., & Viarengo, V. (1992). Browsing a database of multimedia
learning material. Interactive Learning International, 8, 213–235.
Reigeluth, C. M., Merrill, M. D., Wilson, B. G., & Spiller, R. T. (1980). The
Elaboration Theory of Instruction: A model for sequencing and synthesizing
instruction. Instructional Science, 9, 195–219.
Resmer, M. (1999, June). Building the Internet architecture for learning: The IMS
project. Keynote session at Enable ’99, June 2-5, Espoo, Finland.
Rudin, W. (1953). Principles of mathematical analysis. New York: McGraw-Hill
Books Co.
Russell, G., & Bradley, G. (1997). Teachers’ computer anxiety: implications for
professional development. Education and Information Technologies, 2(1),
17–30.
Sarti, L., & Van Marcke, K. (1995, August). Reuse in intelligent courseware
authoring. Presented at AI-ED ’95, August 16–19, Washington, USA.
Seidl, T., & Kriegel, H.-P. (1997). Efficient user-adaptable similarity search in
large multimedia databases. In Proceedings of the 23rd VLDB conference,
August 25–29 (pp. 506–515). Athens, Greece.
Tan, W., & Nguyen, A. (1993). Lifecycle costing models for interactive multime-
dia systems. In C. Latchem, J. Williamson, & L. Henderson-Lancett (Eds.),
Interactive multimedia – practice and promise (pp. 151–164). London: Kogan
Page.
Tobin, K., & Dawson, G. (1992). Constraints to curriculum reform: Teachers and
the myths of schooling. Educational Technology: Research and Development,
40(1), 81–92.
Verhagen, P. W., & Bestebreurtje, R. (1995). Towards the architecture of an in-
structional multimedia database. Journal of Computer Assisted Learning,
11(3), 80–91.
Verhagen, P. W., Blanken, H. M., Moonen, J. C. M. M., & Apers, P. M. G. (1996).
Distributed educational databases: design, production and application. En-
schede: University of Twente, Faculty of Educational Science and Technol-
ogy.
Weber, R., & Zezula, P. (1997, September). The theory and practice of similarity
searches in high dimensional data spaces. Retrieved August 15th 2001 from:
http://www-dbs.ethz.ch/weber/paper/DELOS97.ps.
Weibel, S., Godby, J., & Miller, E. (1995). OCLC/NCSA metadata workshop
report. Online Computer Library Center, Inc. Retrieved August 15th 2001 from:
http://www.oasis-open.org/cover/metadata.html.
Wiesman. (1999). Information retrieval by graphically browsing meta-
information. Unpublished doctoral dissertation, Enschede: University of
Twente.
Wold, E., Blum, T., Keislar, D., & Wheaton, J. (1996). Content-based classifica-
tion, search, and retrieval of audio. IEEE Multimedia, 3(3), 27–36.
Yazdani, M. (1990). Multilingual aspects of a multimedia database of learning
materials. In S. A. Cerri & J. Whiting (Eds.), Learning technology in the
European Communities: proceedings of the DELTA conference on research
and development, the Hague. Dordrecht: Kluwer Academic Publishers.
Yianilos, P. N. (1992). Data structures and algorithms for nearest neighbor search
in general metric spaces. In Proceedings of the 4th annual ACM-SIAM symposium
on discrete algorithms (SODA), Orlando, Florida (pp. 311–321).
New York: Association for Computing Machinery.
Yianilos, P. N. (1998). Excluded middle vantage point forests for nearest neighbor
search (Tech. Rep.). Princeton, NJ: NEC Research Institute.
Young, J. D. (1996). The effect of self-regulated learning strategies on performance
in learner-controlled computer-based instruction. Educational Technology:
Research and Development, 44(2), 17–27.
Zaniolo, C., Ceri, S., Faloutsos, C., Snodgrass, R. T., Subrahmanian, V. S., &
Zicari, R. (1997). Advanced database systems. San Francisco: Morgan
Kaufmann Publishers, Inc.
Summary
Research into methods of using computers in education began as early as the sixties.
This research is a continuous process, because information technology is also in
continuous motion: new technologies become available every year, and with them
new possibilities for using these technologies in education.
A technology that grew tremendously during the nineties is the World Wide Web
(WWW): millions of computers distributed over the entire globe that together make
a huge amount of information available in the form of web pages. These pages can
contain text, but also pictures, movies, and interactive elements; in other words,
multimedia data. A standard consumer PC currently on sale is well able to handle
these kinds of data.
Another development that is important for the research described in this thesis
is the development of relational databases. In the eighties these were used
primarily for administrative data, but a number of new developments have made
them suitable for storing multimedia data as well (see also Chapter 2).
The combination of these two developments led to the concept of a multimedia
database of learning materials that students and teachers can use via the WWW.
During the nineties, considerable research effort was put into developing this
concept. However, one problem remained: developing multimedia learning materials
can be quite costly. To reduce these costs, researchers have focused on reusing the
same materials for different target groups at different educational levels. Many
factors can affect the reusability of learning materials; these have been organized
in the Formula-M model in Section 4.4 of this thesis.
A teacher who is looking for learning materials for a particular course can access
the database to check whether another teacher has previously entered suitable
material and, if so, reuse it in his or her course. The pieces of learning material
that are stored in the database are called “Units of Learning Material”, or ULMs
for short.
The question that arises, however, is: which Units of Learning Material are
‘suitable’ in a given educational setting1? The database must be able to determine
this ‘suitability’, but at present a computer is not capable of determining it on its
own. This knowledge must be added to a Unit of Learning Material by human
beings in the form of so-called “educational metadata”. This is information that
describes the object at hand: who made it, with what pedagogy, for what target
group, how large it is, what subject matter it covers, how difficult it is, and so on
(see Section 5.8). A standard that specifies these metadata is in preparation; using
this standard, it is known precisely which characteristics need to be described, and
which ‘values’ are allowed for these characteristics. If a teacher specifies his or her
search criteria in terms of these metadata values, the database can use these values
to check whether a certain ULM in the database fits the teacher’s search criteria.
1
An educational setting comprises all educational aspects of the given situation,
such as the target group, the pedagogy that is being used, the educational level,
and so on.
But there is one problem: most database systems can currently only check
whether the values of metadata fields are exactly identical, which is called “perfect
matching”. For example, the value “very difficult” for the characteristic “difficulty”
of a certain ULM is not exactly identical to the search criterion “a bit difficult”, so
the database does not return this ULM as a search result. However, the other fields
of this ULM might match all other search criteria perfectly, so the ULM might be
very usable to the teacher. This thesis tries to solve this problem by proposing a
measure of conformity between the search criteria and the ULM, based on the
metadata values. The database will then be able to find those ULMs that best
conform to the search criteria.
If the metadata characteristics are modeled as dimensions of a mathematical
space, then mathematical formulae can be used to calculate the distance between
two ULMs (see Section 7.2). The search criteria then also need to be modeled as
a (possibly non-existent) ULM: the “ideal ULM” that exactly matches all search
criteria. The measure of conformity between a particular ULM and the search
criteria is then equal to the distance between that ULM and the ideal ULM, and
is therefore called a distance measure.
It should be noted, however, that not all characteristics are equally important
to a teacher. For example, a characteristic “duration”, indicating how much time
a typical student needs to work with the ULM, can be very important to a teacher
whose course already places high time demands on the students. Similarly, a
characteristic “interactivity” can be very important to a teacher who is convinced
that education is more than just absorbing information, and that there should be
interaction during the educational process. This observation leads to the
proposition that there might be a relation between the importance (“weight”) a
teacher assigns to a characteristic and his or her attitude towards teaching. If these
weights were known, the computer would not have to ask the teacher for them with
each search query, which would simplify the search process.
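The contrast between perfect matching and a distance measure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the measure defined in Section 7.2: the vocabularies, the equal spacing of ordinal terms on a 0 to 1 scale, the weights, and the choice of a weighted Euclidean distance are all hypothetical.

```python
import math

# Hypothetical ordinal vocabularies: a term's position in the list is its rank.
VOCAB = {
    "difficulty": ["very easy", "easy", "medium", "a bit difficult", "very difficult"],
    "interactivity": ["very low", "low", "medium", "high", "very high"],
}


def encode(record):
    """Map each vocabulary term to its rank, scaled to the interval [0, 1]."""
    return {f: VOCAB[f].index(v) / (len(VOCAB[f]) - 1) for f, v in record.items()}


def perfect_match(ulm, criteria):
    """Exact matching: the ULM qualifies only if every field is identical."""
    return all(ulm[f] == v for f, v in criteria.items())


def distance(ulm, ideal, weights):
    """Weighted Euclidean distance between a ULM and the 'ideal ULM'."""
    a, b = encode(ulm), encode(ideal)
    return math.sqrt(sum(w * (a[f] - b[f]) ** 2 for f, w in weights.items()))


ideal = {"difficulty": "a bit difficult", "interactivity": "high"}  # the search criteria
weights = {"difficulty": 1.0, "interactivity": 2.0}                # hypothetical weights
candidates = {
    "A": {"difficulty": "very difficult", "interactivity": "high"},
    "B": {"difficulty": "easy", "interactivity": "low"},
    "C": {"difficulty": "a bit difficult", "interactivity": "very high"},
}

print(perfect_match(candidates["A"], ideal))  # False: A is rejected outright
ranked = sorted(candidates, key=lambda n: distance(candidates[n], ideal, weights))
print(ranked)  # ['A', 'C', 'B']: yet A is the closest to the ideal ULM
```

Perfect matching discards ULM A because of a single field, while the distance ranks it first. Treating ordinal terms as equally spaced points is itself an assumption; which metadata fields admit such a measure is the subject of Section 7.2.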
To attempt to find this relation, an experiment was conducted (see Section 7.3).
The teachers’ attitude towards teaching was described using a variant of the model
proposed by Samuelowicz and Bain (1992), who used five dimensions. In the
variant, one of the dimensions was split into three, resulting in seven dimensions.
The ‘position’ of the test subjects on each of these seven dimensions was
determined using a questionnaire. The questionnaire also presented six
hypothetical educational situations, and asked the test subjects to indicate how
important they found five characteristics of learning material, using the numbers
1 to 5. An analysis of the data revealed only a few very weak relations. The purpose
of the experiment was to predict the weights a teacher assigns to the characteristics,
but too few relations were found for this, and those that were found were too weak.
The implication is that the weights a teacher assigns to characteristics of learning
material have to be asked for with each search query. To investigate how effective
the distance measure is when this method is used, a second experiment was
developed. In this experiment, the test subjects had to work with a prototype of an
educational database. The effectiveness of the distance measure (which indicates
how well a search result conforms to the search criteria) was measured by
comparing the computed distance with a human judgement of the usability of the
search results: the smaller the distance, the better the judgement can be expected
to be. In the experiment, the test subjects were given a hypothetical educational
situation, and they were asked to give search criteria that ULMs should conform
to in that situation. The search criteria consisted of five characteristics of learning
material as specified in the metadata standard. Next, each test subject was asked in
what order the system should check the characteristics; it was assumed that the
characteristic the subject wanted checked first was the most important one, the
characteristic to be checked second was the second most important one, and so on.
The prototype then presented a predetermined list of search results, and the test
subject was asked to evaluate each search result with a score ranging from 1 to 10.
The prototype calculated the distance between each search result and the ‘ideal
ULM’, using the weights that the test subject had indicated. After the experiment,
these distances were compared with the test subjects’ scores. An analysis showed a
strong relation between the computed distance and the scores of some test subjects,
but no relation at all for other test subjects (see Section 8.5).
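The two steps of this validation, turning a check order into weights and relating computed distances to human scores, can be sketched as follows. The summary does not state how the order was converted into numeric weights, so the linear scheme below (the first of n characteristics gets weight n) is an assumption; so is the use of Spearman's rank correlation, which may differ from the analysis reported in Section 8.5. The data are made up for illustration and are not experimental results.

```python
def weights_from_order(order):
    """Assumed linear scheme: the first-checked of n characteristics gets weight n."""
    n = len(order)
    return {field: n - i for i, field in enumerate(order)}


def ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient (undefined if either list is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data for one test subject.
print(weights_from_order(["duration", "difficulty", "interactivity"]))
distances = [0.2, 0.5, 0.9, 1.4]  # computed distance to the ideal ULM
scores = [9, 7, 6, 3]             # the subject's 1-10 usability scores
print(spearman(distances, scores))  # -1.0
```

A coefficient near -1 means that smaller distances go with higher scores, which is the relation the experiment looked for; a coefficient near 0 corresponds to the subjects for whom no relation was found. Rank ties matter here because scores on a 1 to 10 scale often repeat.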
In the experiment described above, only a small number of characteristics of
learning material was used, described in metadata fields (the selection of these
fields can be found in Section 8.3.1). The experiment assumes that these
characteristics can be used to predict whether the teacher finds the search result
usable in the given situation. As such, these characteristics are factors that relate
to the judgement of the teacher. The more factors are known, the better the
prediction will be. The experiment therefore also tried to provide insight into what
other factors play a role in the teacher’s judgement. For each search result that they
evaluated, the test subjects were also asked to indicate for each of twelve factors
whether that factor had played a role in the evaluation. The results showed that
almost all of these factors played a role about equally often; only the factor “size”
($F_{\mathrm{size}}$) rarely played a role. As these twelve factors also included the
five characteristics of learning material that were used in the distance measure, it
can be concluded that there are still $12 - 5 - 1 = 6$ factors that do play a role
when judging learning materials, but that have not been included in the distance
measure.
From the results of this experiment it can be concluded that the distance measure
does have potential, but that there is insufficient insight into which factors play
which role when a teacher judges online learning materials. At least six factors
that do play a role have not been considered in the distance measure, which could
explain why the distance measure appeared to predict the scores well for some
teachers but not for others. This provides many opportunities to improve the
distance measure.
Applying a distance measure based on metadata need not be restricted to the
educational domain. In every domain where metadata is used that fulfills certain
requirements (see Section 7.2), a distance measure can be defined; also, the
research method of comparing a computer judgement (the distance measure) with
a human judgement to validate the computer judgement could prove very useful in
other application areas as well.
Samenvatting
are stored in the database are called “learning objects” or “units of learning
material” (Units of Learning Material, or ULMs).
The question that arises here, however, is: which learning objects are ‘suitable’ in
a given educational context2? The database must be able to judge this, but it is as
yet impossible for computers to judge by themselves which learning material is
most suitable for a given situation. This knowledge is therefore added to a learning
object by people, by means of what are called “educational metadata”. This is
descriptive information about the educational characteristics of the material, such
as: who made it, with what pedagogy, for which target group, how large the
learning object is, what subject the material covers, how difficult it is, and so on
(see Section 5.8). A standard that specifies these metadata is in preparation; when
this standard is used, it is known precisely which characteristics must be described,
and which ‘values’ are allowed for them. When a teacher then tells the database for
which target group and which subject material is needed (the search criteria), the
database can use the metadata to determine which of the learning objects present
meet these criteria.
A problem that can arise here, however, is that a database is only able to check
whether the desired value the teacher has specified for a characteristic, for example
the value “very difficult” for the characteristic “difficulty”, is exactly equal to the
value that the learning objects in the database have for this characteristic; this is
called a “perfect match”. A learning object with the value “difficult” is then no
longer considered, even though this learning object may be highly appropriate as
far as the other characteristics are concerned. This thesis attempts to solve this
problem by defining a measure of the similarity between the specified search
criteria and a learning object. The database will then be able to find the learning
objects that most closely resemble what the teacher has specified.
When the metadata characteristics are regarded as dimensions of a mathematical
space, mathematical formulae can be used to calculate the distance between two
learning objects (see Section 7.2). The search criteria the teacher has specified must
then also be regarded as a fictitious learning object: the ‘ideal learning object’ that
exactly satisfies all specified search criteria. The degree of similarity between a
learning object in the database and the search criteria is then equal to the distance
between the learning object itself and the ideal learning object, and is therefore
called a ‘distance measure’.
2
By ‘educational context’ is meant: all educational aspects of the given situation,
such as the composition of the target group, the pedagogy used, the educational
level, and so on.
A caveat is in order here: not every characteristic will carry equal weight for every
teacher. For example, a characteristic “duration”, which indicates how long an
average student spends on the learning object, can weigh very heavily for a teacher
whose course already demands too much of the students’ time. Likewise, a
characteristic “interactivity” could weigh heavily for teachers who consider it
important that education is more than merely absorbing information, and that there
should therefore be some form of interaction during learning. This gives rise to the
conjecture that there is a certain relation between the weight a teacher assigns to a
characteristic and the teacher’s attitude towards teaching. This relation is
interesting because it might make it possible to predict how heavily certain
characteristics weigh for a particular teacher. And if a teacher’s weights were
known, the computer would not need to ask the teacher for them every time, and
the search process would become simpler.
In an attempt to find the conjectured relation described above, a study was set up
(see Section 7.3). The teacher’s attitude towards teaching was described by means
of a variant of the five-dimensional model described by Samuelowicz and Bain
(1992). In the variant, one dimension was split into three dimensions, so that the
model had seven dimensions. The ‘position’ of the test subjects on each of the
seven dimensions was determined by means of a questionnaire. This questionnaire
then presented six hypothetical educational situations, and asked the test subjects
how heavily each of five characteristics of learning material weighed (using the
numbers 1 to 5). An analysis of the study data revealed only a few weak relations.
The aim of the study was to try to predict the weights a teacher assigns to
characteristics of learning material from the teacher’s attitude; but too few, and too
weak, relations were found for this.
This means that the weights a teacher assigns to characteristics of learning material
must be asked of the teacher anew with every search. To investigate how effective
the distance measure is in that case, a second study was set up in which the test
subjects had to work with a prototype of an educational database. The effectiveness
of the distance measure (which indicates to what extent a search result corresponds
to the search criteria) was measured by comparing the computed distance with a
test subject’s judgement of the usability of the search result: the smaller the
distance, the better the judgement is expected to be. In the experiment, the test
subjects were presented with a number of hypothetical educational situations, after
which they were asked to formulate search criteria that learning objects should
satisfy in such a situation. These search criteria consisted of five characteristics of
learning material as laid down in the metadata standard. Next, the test subjects
were asked in which order the search system should check the characteristics; it
was assumed that the characteristic the test subject wanted checked first weighs the
most. The second characteristic then weighs the second most, and so on. The test
subject was then shown a fictitious list of search results, and had to indicate on an
evaluation form how usable each search result was in the given educational
situation, by means of a score from 1 to 10. For each search result, the prototype
calculated the distance between the search result and the specified search criteria,
using the weights the test subject had indicated. These distances were afterwards
compared with the scores of the test subjects. The analysis showed a strong relation
between the measured distance and the test subject’s score for some test subjects,
while for other test subjects no relation whatsoever could be demonstrated (see
Section 8.5).
The experiment outlined in the paragraph above used a small number of
characteristics of learning material, described in metadata fields (the choice of
these characteristics can be found in Section 8.3.1). The experiment assumes that
these characteristics can be used to partially predict whether or not a teacher finds
a search result usable. As such, these characteristics are factors related to the
teacher’s judgement. The more of these factors are known, the better the teacher’s
judgement can be predicted. During the experiment, an attempt was therefore made
to gain insight into other factors that might also play a role when a teacher judges
learning material. To this end, when judging search results, the test subjects had to
indicate for each of twelve factors whether or not that factor played a role. The
results show that almost all factors play a role about equally often, and that only
the factor “size” ($F_{\mathrm{size}}$) plays a role remarkably rarely. Since these twelve
factors also include the five characteristics that are incorporated in the distance
measure, it can be concluded that there are still at least $12 - 5 - 1 = 6$ factors
that play a role in judging learning material and that are not incorporated in the
distance measure.
From the results of the experiment it can be concluded that the distance measure
does have potential, but that there is still too little insight into which factors play
which role when a teacher judges digital learning material for usability. At least six
factors have so far been left out of consideration, which may explain why the
distance measure can predict the usability of learning material well for some
teachers but not for others. This offers clear starting points for further
improvement of the distance measure.
Incidentally, applying a distance measure based on metadata need not remain
limited to the educational domain. Wherever metadata is used that satisfies certain
requirements (see Section 7.2), a distance measure can be defined. The research
method used, in which the judgement computed by the computer (the distance
measure) is compared with a score assigned by test subjects in order to verify the
correctness of the computer’s judgement, may also prove very useful in other
application areas.
Appendix A
In the past, many projects have investigated theoretical and practical issues of
educational database systems. In order to build upon the results of these projects,
we examine them in this appendix, with an emphasis on the labeling systems and
the search facilities. The projects were found by searching the World Wide Web and
the library, and by following literature references. Only projects that actually use a
database of learning materials are included; a more elaborate overview has been
given elsewhere (Hiddink, 1998). Also, common commercial packages whose
internals are mostly unknown (such as Blackboard, WebCT, and Oracle Learning
Architecture) are not discussed.
Note that many of the references to online materials (URLs) may have become
invalid since the time of writing. If a URL has become invalid, the reader is advised
to use a WWW search engine to locate the document.
The database itself is based on a filesystem (i.e. the multimedia data are stored
outside the reach of the DBMS, see Section 2.2.2) and files are “registered” to the
system. Users have to describe the contents of the file with a few keywords and/or
write a small abstract. Users can then search for a file by providing keywords.
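The register-then-search scheme described above can be sketched as a keyword index kept beside the filesystem. This is a minimal sketch under assumptions: the class `FileRegistry`, its methods, the case-insensitive matching, and the requirement that all given keywords match are hypothetical, since the text does not describe how the original system combined keywords.

```python
from collections import defaultdict


class FileRegistry:
    """Hypothetical registry: files stay on the filesystem, only descriptions are indexed."""

    def __init__(self):
        self.abstracts = {}            # path -> free-text abstract
        self.index = defaultdict(set)  # lower-cased keyword -> set of registered paths

    def register(self, path, keywords, abstract=""):
        """Register a file under user-supplied keywords; the file itself is never opened."""
        self.abstracts[path] = abstract
        for keyword in keywords:
            self.index[keyword.lower()].add(path)

    def search(self, *keywords):
        """Return, sorted, the paths described by all given keywords (case-insensitive)."""
        if not keywords:
            return []
        hits = [self.index.get(k.lower(), set()) for k in keywords]
        return sorted(set.intersection(*hits))


registry = FileRegistry()
registry.register("/media/cell_division.mpg", ["biology", "cell", "video"],
                  "Animation of mitosis")
registry.register("/media/heart.gif", ["Biology", "heart"])
print(registry.search("biology"))          # ['/media/cell_division.mpg', '/media/heart.gif']
print(registry.search("biology", "cell"))  # ['/media/cell_division.mpg']
```

Because only the descriptions are indexed, retrieval quality depends entirely on the keywords users choose at registration time, and nothing in such a registry prevents a registered file from being moved or deleted on disk.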
standard, which is based on the Dublin Core. The IMS project cooperated with
the Ariadne project (see Section A.9) to create the first draft of the IEEE Learning
Object Metadata specification.
[Figure: nr. of items plotted against lifetime (months)]
The labeling system is an extension of the Dublin Core (see also Section 5.8.1)
and consists of 23 elements. The extension consists of the following fields5:
Cataloging: The cataloging agency provides basic information about the agency
that created the GEM catalog record.
Grade: Grade, grade span, educational level, or age of the entity’s audience.
Quality: The Quality Indicators element is a means for assessing the quality of
instructional materials.
Standards: State and/or national academic standards mapped to the entity being
described.
During the project, controlled vocabularies were determined for Audience, Format,
Grade, Language, Pedagogy, Relation, ResourceType and Subject. During 1999, the
GEM project was in the process of harmonizing its metadata with those of the IMS
project, which in turn has been one of the largest sources of input for the IEEE
Learning Object Metadata standard.
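To illustrate how an element set with controlled vocabularies constrains a record, the following sketch checks a metadata record against the 15 Dublin Core elements plus the four extension fields listed above. It is a simplified sketch: the vocabulary terms are hypothetical, and the remaining GEM elements (the text mentions, for example, Audience and Pedagogy vocabularies) are not modeled, so they would be reported as unknown until added.

```python
# The 15 Dublin Core elements, plus the four GEM extension fields named in the text.
DUBLIN_CORE = {
    "Title", "Creator", "Subject", "Description", "Publisher", "Contributor", "Date",
    "Type", "Format", "Identifier", "Source", "Language", "Relation", "Coverage", "Rights",
}
GEM_EXTENSION = {"Cataloging", "Grade", "Quality", "Standards"}
ALLOWED_FIELDS = DUBLIN_CORE | GEM_EXTENSION

# Hypothetical controlled vocabularies; the real GEM term lists are not reproduced here.
VOCABULARIES = {
    "Grade": {"Grade 6", "Grade 7", "Grade 8"},
    "Language": {"en", "nl"},
}


def validate(record):
    """List problems in a record that maps field names to lists of values."""
    problems = []
    for field, values in record.items():
        if field not in ALLOWED_FIELDS:
            problems.append(f"unknown field: {field}")
            continue
        vocabulary = VOCABULARIES.get(field)
        if vocabulary is None:
            continue  # free-text field: any value is accepted
        for value in values:
            if value not in vocabulary:
                problems.append(f"{field}: '{value}' is not in the controlled vocabulary")
    return problems


good = {"Title": ["Fractions"], "Grade": ["Grade 7"], "Language": ["en"]}
bad = {"Grade": ["Year 7"], "Colour": ["red"]}
print(validate(good))  # []
print(validate(bad))   # a non-vocabulary grade and an unknown field
```

Values are kept as lists because Dublin Core elements are repeatable. A fixed vocabulary is what makes a field usable for exact selection (and, ordered, for a distance measure); free-text fields such as Description can only be searched by keyword.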
A.5.1 Labeling system
The resources are labeled with the following fields: author, type of resource (mul-
timedia, laboratories, visualizations), ACM subject classification, Computer Sci-
ence curriculum classification, keywords, and abstract. Also, some optional fields
can be added: language, programming language or markup language, operating
system, platform, tools (necessary to view the resource), publisher,
acknowledgements, source (of the resource), copyrights and multimedia content. It
is not clear what exactly this last field is meant for.
and professional development opportunities on topics important to the regions and
the United States as a whole. The database mostly contains metadata of books,
CD-ROMs and lesson plans, but also metadata of online learning materials.
A.9 ARIADNE Knowledge Pool
The Ariadne project (Duval, 1999, June) has collected a large amount of learning
material into their Knowledge Pool. The Ariadne metadata laid the basis for the
IEEE Learning Object Metadata standard (IEEE, 2000).
A.10 ELECTRA
The universities of Aachen, Liège, Maastricht and Diepenbeek-Hasselt (the “ALMA”
universities) have joined together in a project called “Electronic Learning
Environment for Continual Training and Research in the ALMA Universities”, or
ELECTRA for short. The goal of the project was to realize a sophisticated electronic
learning environment. Part of this project was the Interactive Multimedia Database
(IMM-DB), developed by the Expertise Centre for Digital Media of LUC
(Limburgs Universitair Centrum). These projects have ended, but the technology
that was developed led to the TUTORAID learning environment (Teaching Utility
Through Online RDBMS And Interactive Discussions).10
A.11 Explorer
The Explorer database11 is being developed by the Great Lakes Collaborative and
the University of Kansas UNITE group. It stores K-12 mathematics and science
education resources (learning plans, lab exercises, etcetera). The database contains
about 1000 resources, most of which are downloadable.
10
see http://www.edm.luc.ac.be/edm2/act_4.html
11
see http://unite.rtec.org
The Mathematics Curriculum and the Natural Sciences Curriculum can both
be browsed. These curricula are presented as a large hierarchical list of up
to 7 levels. When the user selects a topic, a list of resources concerning this
topic is shown.
There is also a Quick Search, which consists of a single search field in which
keywords can be typed. The search facility then searches the title, author
and description for these keywords, and then presents a list of resources that
contain the keyword(s).
The Advanced Search facility lets the user select which free-text fields (resource
title, author, publisher, resource description, cost description, ISBN, availability,
series, email address, publication date, cost in US dollars, GeoFocus code and
comments) to use as search parameters. The user can also select zero or more list
fields (resource type, grades, process skills, physical media, curriculum, GeoFocus
location) to use as search parameters; these fields take their values from a fixed
vocabulary. A custom search form is then created.
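The two search facilities can be sketched in Python as follows. This is a minimal sketch, not Explorer's actual implementation: the `Resource` class, the function names and the matching rules (a resource matches Quick Search if it contains any of the keywords; in Advanced Search every chosen free-text field must contain its search string and every chosen list field must share at least one term with the ticked terms) are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    # Hypothetical subset of the Explorer fields listed above.
    title: str
    author: str
    description: str
    grades: set = field(default_factory=set)         # list field (fixed vocabulary)
    resource_type: set = field(default_factory=set)  # list field (fixed vocabulary)


def quick_search(resources, query):
    """Return resources whose title, author or description contains any query keyword."""
    keywords = [k.lower() for k in query.split()]
    hits = []
    for r in resources:
        text = " ".join((r.title, r.author, r.description)).lower()
        if any(k in text for k in keywords):
            hits.append(r)
    return hits


def advanced_search(resources, text_criteria, list_criteria):
    """Custom form: each chosen free-text field must contain its search string,
    and each chosen list field must share at least one term with the ticked terms."""
    hits = []
    for r in resources:
        text_ok = all(value.lower() in getattr(r, name).lower()
                      for name, value in text_criteria.items())
        list_ok = all(getattr(r, name) & allowed
                      for name, allowed in list_criteria.items())
        if text_ok and list_ok:
            hits.append(r)
    return hits
```

Free-text fields are matched by substring because their content is unconstrained, whereas list fields are compared by set intersection because both sides come from the same fixed vocabulary.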
A.13 TeleTOP
The TeleTOP system was developed at the University of Twente (Collis, 1998a)
and is primarily a course management system in which a teacher is guided through
a decision process to generate a custom course environment. This course environment
consists of, among other things, a course schedule with task descriptions and
learning resources for each week. The teacher can add learning materials and
references (URLs) to this schedule.
Although the system is built on a database (it is implemented using the Lotus
Domino Server and the Lotus database management system), it focusses on course
management capabilities instead of the storage and retrieval of learning objects.
The system stores (references to) learning material without any metadata other than
a textual description.
A.14 Online databases
Several other organizations host a database of learning materials. The largest ones
at the time of writing (September 1999) are listed below.
A.14.2 PedagoNet
PedagoNet13 is a learning material and resources center where users can post re-
quests for learning materials, or browse the database to find suitable learning ma-
terials. Some of these are free and online, others have to be paid for and are not
directly accessible. The labels that can be assigned to a resource are the following:
region of origin (states in the USA, provinces of Canada, or elsewhere); subject
(fixed vocabulary), educational level (fixed vocabulary), price, a description, and a
URL if the resource is online. The database can be searched by selecting a subject,
and then the entire list of resources within this subject is displayed. As the
database is not very large (fewer than ten resources per subject), this search
method is adequate.
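Because the database is small, browsing by subject amounts to a plain filter with no ranking. A minimal sketch, assuming a hypothetical `PedagoNetResource` record with one attribute per label listed above (representing a free resource by a price of 0.0 is also an assumption):

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PedagoNetResource:
    # Hypothetical record holding the labels named in the text.
    region: str                # US state, Canadian province, or "elsewhere"
    subject: str               # fixed vocabulary
    level: str                 # educational level, fixed vocabulary
    price: float               # assumption: 0.0 means free
    description: str
    url: Optional[str] = None  # only set when the resource is online


def browse_subject(resources: List[PedagoNetResource], subject: str) -> List[PedagoNetResource]:
    """Return every resource filed under the chosen subject, unranked and unfiltered."""
    return [r for r in resources if r.subject == subject]
```

With fewer than ten resources per subject the user can scan the whole list; a larger database would need ranking or further filtering, for example on level or price.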
A.15 Conclusion
Figure A.2 shows many projects about educational databases and metadata. Not all
projects shown in the figure have been discussed in this appendix; for a complete
overview, see Hiddink (1998).
The figure shows, from left to right, the evolution from theory through research
and development projects to industry; each project is (roughly) positioned on this
scale. A solid line from project A to project B means that project B is explicitly
based on (results of) project A. A dotted line from A to B means that people from
project A participate in project B, so that knowledge is implicitly transferred
from project A to project B.
12 see http://www.pathlore.com
13 see http://www.pedagonet.com
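[Figure A.2: Projects on educational databases and metadata, arranged in the columns Theory & Pragmatism, Research projects, Application projects, Standardization and Industry. Labels recoverable from the figure: GEM, Rosetta, Dublin Core, IMS, PHOENIX, Warwick Framework, IEEE P1484.12, ODB, TopClass, HBLE, ILCE, Mercator, Discourse, Indios, Oscar, and (task-based) pedagogical classification.]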
The figure makes clear that many projects have already worked on educational
databases, and that standardization efforts converge in the IEEE metadata group.
Appendix B
Questionnaire
Questionnaire
Gerrit Hiddink
George van der Peet
May 1999
Introduction
The Centre for Telematics and Information Technology, in cooperation with the
Instrumentation department of the Faculty of Educational Science and Technology
(Toegepaste Onderwijskunde), is investigating possible applications of modern
Information and Communication Technology (ICT). The Idylle project1 studies the
possible use of technology as tools for tele-learning. One of the focus areas within
the Idylle project is the use of a digital database for storing learning material, such
as videos, simulations, texts, computer-assisted instruction (COO) software, etcetera.
Such a database is expected to enable teachers to find high-quality learning material
quickly and effectively, to adapt it, and to integrate it into their existing courses.
This research is carried out by Gerrit Hiddink (PhD student), and this questionnaire is
part of it. It was co-developed by George van der Peet (student of Educational Science
and Technology) as part of the course “Onderzoeksopdracht” (research assignment).
Completing the questionnaire takes about 30 minutes. Your answers will be used for
this research only. If you would like to be kept informed of the results of the
research, please fill in your name and address under question A9.
http://www.wwcn.org:1080
After completing the questionnaire, you can submit your answers electronically at the
press of a button.
1 http://www.ctit.utwente.nl/Docs/projects/idylle/IDYLLE.htm
Part A
[ ] a. teacher training programme
[ ] b. UT didactic induction programme
[ ] c. pedagogical or didactic training
[ ] d. course in oral and presentation skills
[ ] e. course in supervising tutorials
[ ] Other, namely:
..............................................................................................................................
..............................................................................................................................
A5. What is your age?
[ ] a. 20 - 29 years
[ ] b. 30 - 39 years
[ ] c. 40 - 49 years
[ ] d. older than 50 years
A7. How many years of experience do you have with using computers? (for
teaching, research or as a hobby)
[ ] a. 0 to 5 years
[ ] b. 6 to 10 years
[ ] c. 11 to 15 years
[ ] d. 16 to 20 years
[ ] e. more than 20 years
A9. If you would like to receive the results of the study or would like to take part
in a follow-up study, please fill in your name and address below:
Name: ........................................................................
Building: ....................................................................
Department: ...................................................................
Part B
This part of the questionnaire aims to find out your attitude towards teaching. Tick
the box that best matches your opinion. When answering the questions, try to ask
yourself how you yourself would react, without taking into account time pressure,
organisational constraints, faculty policy and the like. We would like to know how
you yourself would act.
First read all the alternatives before ticking the one you consider best. If none of
the alternatives reflects your view, you need not tick anything, but please write your
opinion on the comment line. If you change your mind, fill in the wrong box completely
in black and tick your new choice.
B1a. Suppose you teach a course consisting of a lecture series and a practical
for first-year students. The course is tailored to this target group. Suppose that
another department or faculty asked you to teach this course for them as well, but
for senior students. These students have the prior knowledge required for your
course. Would you change the content of the course?
[ ] a. Yes
[ ] b. No (go to question B2)
[ ] a. minor changes
[ ] b. moderate changes
[ ] c. quite a lot of changes
[ ] d. many changes
[ ] e. complete redesign
[ ] f. Other, namely:
..............................................................................................................................
..............................................................................................................................
B2. Suppose you teach a course for senior students with a practical in which
the students have to choose an assignment from a predefined list. What is your
attitude towards students who devise their own assignment (both form and
content)?
B3. Suppose you are asked to put together a curriculum for a new department.
If it were up to you, roughly what percentage of the courses would allow students
to compose the course content themselves, as in problem-based learning or
project-based education? Do not count final (doctoraal) thesis projects.
[ ] a. 0% to 20%
[ ] b. 21% to 40%
[ ] c. 41% to 60%
[ ] d. 61% to 80%
[ ] e. 81% to 100%
B4. Suppose you are setting up a new course, and you have one study credit (40
hours) left after covering all the compulsory material. What do you do with the
remaining credit? You may assume that your department provides funds for, for
example, excursions. Tick which of the following alternatives you would carry out
(multiple answers possible), and also state what percentage of the credit you would
spend on each.
[ ] a. you spend ..... % of it on having students prepare and carry out an excursion
to a company or organisation that does something with the theory covered in the
course material
[ ] b. you spend ..... % of it on an extra lecture to practise the material
[ ] c. you spend ..... % of it on an extra lecture to explore the material in more depth
[ ] d. you spend ..... % of it on a practical on techniques and methods for applying
the material in practice
[ ] e. Other, namely .... % on:
..............................................................................................................................
..............................................................................................................................
B5. Suppose that during a tutorial a student turns out to master all required
aspects of the material already and to be able to apply them. Have you then
achieved your goal?
[ ] a. Yes, the student has acquired all the knowledge and insight needed to complete
the course successfully. The student can better spend his/her time on another course.
[ ] b. No, the student's knowledge and insight could perhaps become more nuanced
or deeper
[ ] c. Other, namely:
..............................................................................................................................
..............................................................................................................................
B6a. Besides the theory in the textbook and/or lecture notes, do you use any
supplementary material for your lectures?
B6b. What requirements do you have for the supplementary material? (multiple
answers possible)
B7. To what extent do you agree with the statements below?
strongly disagree   somewhat disagree   neutral   somewhat agree   strongly agree
Part C
This part of the questionnaire aims to find out how important you consider the
characteristics of learning material described above in the given situations, by
ranking them by importance each time. This matters for the design of a database for
learning material, because the computer needs to know which characteristics it
should weigh more heavily than others.
Rank the five characteristics above with numbers from 1 to 5, giving the
characteristic you consider most important a 5 and the least important a 1. If you
consider two characteristics equally important, you may give them the same number;
you then number from 1 to 4. If you cannot readily think of an answer, ask yourself:
which characteristic would I drop first as a search criterion if the database finds
nothing? And which one second? And so on.
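The last question above (which characteristic to drop first as a search criterion when the database finds nothing) describes a query-relaxation strategy: when a search returns no results, the least important remaining criterion is dropped and the search is repeated. A minimal sketch, assuming exact-match criteria and hypothetical characteristic keys; it illustrates the idea and is not the prototype's implementation.

```python
def relaxed_search(items, criteria, ranks):
    """Drop the least important criterion until at least one item matches.

    items:    list of dicts mapping characteristic -> value
    criteria: characteristic -> required value
    ranks:    characteristic -> 1..5 (5 = most important; ties allowed)
    Returns (hits, criteria still in force).
    """
    # Least important first; equal ranks are broken alphabetically for determinism.
    drop_order = sorted(criteria, key=lambda c: (ranks[c], c))
    active = dict(criteria)
    while True:
        hits = [it for it in items
                if all(it.get(c) == v for c, v in active.items())]
        if hits or not active:
            return hits, list(active)
        del active[drop_order.pop(0)]
```

For example, with ranks `{"topic": 5, "level": 3, "interactivity": 1}`, a search that fails on all three criteria is retried without `interactivity` first, then without `level`, and `topic` is dropped only as a last resort.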
C1. Suppose you give a lecture to senior students every week for a trimester.
From previous years you know that students have too little practical skill, and
you want to improve this by making “digital learning material” available (for
example via the WWW), which you look up in the database. For this situation, rank
the characteristics below, where 5 indicates the most important and 1 the least
important:
educational level .....
working time .....
interactivity .....
educational function .....
topic .....
Remark:
..............................................................................................................................
..............................................................................................................................
C2. Suppose you have to put together a practical for a certain course, and during
the first session you want the students to do some small preparatory assignments
to lay a foundation for the larger assignment(s), so you consult the database for
learning material. Indicate below how you rank the importance of the
characteristics (5 is the most important):
educational level .....
working time .....
interactivity .....
educational function .....
topic .....
Remark:
..............................................................................................................................
..............................................................................................................................
C3. Suppose you teach a large part of your course via the computer (on the WWW,
on CD-ROM or otherwise), and you consult the database for learning material for
it. How do you rank the characteristics by importance in this situation?
educational level .....
working time .....
interactivity .....
educational function .....
topic .....
Remark:
..............................................................................................................................
..............................................................................................................................
C4. Suppose you want to make extra learning material available for eager or
“fast” students, which you again look up in the database. How do you rank the
characteristics of the learning material in this situation?
educational level .....
working time .....
interactivity .....
educational function .....
topic .....
Remark:
..............................................................................................................................
..............................................................................................................................
C5. Suppose you notice that first-year students were not taught a certain topic
thoroughly enough in pre-university secondary education (VWO), and that you
particularly want to reinforce their theoretical foundation. What does your
ranking look like now?
educational level .....
working time .....
interactivity .....
educational function .....
topic .....
Remark:
..............................................................................................................................
..............................................................................................................................
C6. Suppose you are developing a new elective course for senior students, and
you want to set up a practical for it. What is your ranking then?
(5 is the most important)
educational level .....
working time .....
interactivity .....
educational function .....
topic .....
Remark:
..............................................................................................................................
..............................................................................................................................
G. W. Hiddink, CTIT/INF
Appendix C
Instruments
I am a PhD student at the Centre for Telematics and Information Technology at the
University of Twente. I am researching the application of educational databases,
that is, databases filled with digital multimedia learning material. This can include
video clips, audio fragments, animations, Java applets, texts, HTML documents,
etcetera. My research focuses in particular on the search methods that can be made
available to a teacher. To this end I have built a prototype and filled it with
learning material on a specific topic: introduction to computer networks.
On the World Wide Web I saw that you teach a similar course, and I would therefore
consider it a very valuable addition to my research data if you were willing to
cooperate in the study. It involves carrying out a small number of searches and
filling in short questionnaires for each search. All in all, the study will take
about an hour. The study runs from 14 August to 1 September.
Would you be willing or able to take part in this study? Do you know other teachers
who teach a similar course and who might also want to take part?
Gerrit Hiddink
C.2 Study of selection criteria when searching educational databases
Gerrit Hiddink
Centre for Telematics and Information Technology
Introduction
The Centre for Telematics and Information Technology, together with the Faculty of
Educational Science and Technology, is researching the use of educational databases
in education. Teachers can search such a database for online learning material. One
of the issues under study is how teachers can be supported in the search process. To
test the method developed so far and to gain more insight into the search process,
you are taking part in an experiment. The experiment consists of three hypothetical
teaching situations (cases). Your task is to use an educational database to find
learning material (so-called Units of Learning Material, or ULMs) that is suitable to
offer to the students via the Internet in the teaching situation described. Each time,
the database will present a short list of search results (4 to 6), after which you will
be asked to rate the usefulness of the results for the given situation on a rating
form. This indication of ‘usefulness’ will be compared with an indication that the
prototype itself determines on the basis of the search specification you entered.
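The text above does not say which statistic compares the subject's usefulness scores with the prototype's own indication. One common choice for ordinal scores such as a 1-10 rating is Spearman's rank correlation, sketched below with average ranks for ties. The function names are hypothetical, and this is not necessarily the analysis used in the thesis.

```python
from statistics import mean


def average_ranks(values):
    """Rank values 1..n; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined when all scores are tied")
    return cov / (sx * sy)
```

For instance, subject scores `[7, 5, 9, 4]` for four ULMs and prototype scores `[0.8, 0.6, 0.9, 0.3]` put the ULMs in the same order, so rho is 1 up to floating-point rounding. A rank statistic is used because only the order of the two indications, not their scale, is comparable.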
Instructions
Start a web browser such as Internet Explorer or Netscape Navigator on your computer.
Go to the following URL:
http://demeter.cs.utwente.nl:8080/development/dile/.
Do not forget the final ‘slash’! You can now log in with the name “experiments”. No
password is needed. Then click “Do experiments” on the left of the screen. If you cannot
find this option, you probably made a typing error when logging in. Click “Logout”,
then “Login”, and try again.
After clicking “Do experiments”, enter your “Subject ID”, which is: ...... After entering
the number, click “submit”. This number is used to link the search data you enter to
the rating forms you fill in. Once you have entered this number, the search form for the
first case appears. The instructions are followed by six case descriptions, and for each
case you will fill in a form. The cycle you go through is as follows:
For 3 cases:
1. you read the case description;
2. you fill in the search form in the experiment environment;
3. you get a list of search results;
4. you view each search result and fill in a rating form;
5. you click “to next case”, after which you start again at 1.
A short description of the form follows.
Figure 2: example of the search results list
After you have clicked “submit data”, you get a list of search results, see Figure 2.
By clicking “view”, you can study the learning material. The “back to search results”
button at the bottom of the page takes you back to the overview; you may have to
scroll down before you see this button. You can also use the “back” button of your
browser. The aim of the experiment is that you fill in a rating form for each search
result (“Unit of Learning Material”, or ULM). On the form, fill in the number of the case
and the number of the ULM. You can find this number in the search results screen
under the column “ULM id”. If you first want an overview of all search results in a
case, you may of course view them all first and only then fill in the rating forms.
Do make sure that you fill in one rating form per search result (ULM).
Units of Learning Material cannot be combined in this experiment. So even if you find
two ULMs that together would form a good combination, you must still rate them
separately (and thus conclude that each on its own is less useful).
In the list of search results you will also find a “metadata” hyperlink for each Unit of
Learning Material. Clicking it gives you a list of the most important characteristics of
the ULM, such as: amount of interactivity, keywords, educational function, size,
working time, download time and so on (see Figure 3). You can also consult these
when rating the usefulness of the ULM.
The cases
Three cases follow below. Each of these cases describes a need for learning material.
This need can be translated into a certain preference for characteristics of the
learning material. Try to put yourself in the situation, and from there make sensible
choices for the input fields in the search form (top half) and for the order in which
the system should evaluate them (bottom half).
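The split between the top half of the search form (which characteristics to search on) and the bottom half (the order in which the system should evaluate them) suggests a lexicographic ranking, in which a match on an earlier characteristic outweighs any number of matches on later ones. A minimal sketch under that assumption; the characteristic names and the exact-match rule are hypothetical, and the prototype's actual scoring is not described here.

```python
def rank_ulms(ulms, wanted, priority):
    """Sort ULMs lexicographically by the user's evaluation order.

    ulms:     list of dicts mapping characteristic -> value
    wanted:   characteristic -> desired value (top half of the form)
    priority: characteristics, most important first (bottom half of the form)
    """
    def key(ulm):
        # False (a match) sorts before True (a mismatch), position by position.
        return tuple(ulm.get(c) != wanted[c] for c in priority)

    return sorted(ulms, key=key)  # stable: ties keep their original order
```

Swapping the evaluation order changes the outcome: with ULMs matching on topic only, on level only, and on both, the order `["topic", "level"]` lists the topic-only ULM second, whereas `["level", "topic"]` lists the level-only ULM second.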
The system will show you the search results, which you can view. Fill in a rating form
for each search result (“ULM”), noting the case number and the ULM number on it.
Case 1.
Suppose you give a lecture on computer networks to senior students every week for a
trimester. From previous years you know that students have too little practical skill
in working with network analysis programs such as ‘ping’ and ‘traceroute’. You want
to improve this by making “online learning material” available.
Case 2.
Suppose it becomes clear during the lectures that many students pick up the material
very quickly. To challenge these students more, you want to make extra material about
the Internet Control Message Protocol available on the Internet.
Case 3.
For the practical part of your course you have scheduled 4 hours for the topic
“TCP/IP in the OSI model”, and you now want to spend 1 hour of this on the network
layer and signalling. The intention is that the students get a theory part during this
hour, followed by a practical part.
P.T.O.
The cases
Three cases follow below. Each of these cases describes a need for learning material.
This need can be translated into a certain preference for characteristics of the
learning material. Try to put yourself in the situation, and from there make sensible
choices for the input fields in the search form (top half) and for the order in which
the system should evaluate them (bottom half).
The system will show you the search results, which you can view. Fill in a rating form
for each search result (“ULM”), noting the case number and the ULM number on it.
Case 1.
A non-technical faculty has decided to offer its students a course in Information
Technology. The students have no technical background. Because the programs ‘ping’
and ‘traceroute’ are available on every PC, and because they can make the workings of
the Internet visible, the ICMP protocol and these two programs are part of the course
material.
Case 2.
The course you teach already demands too much of the students' time. Yet there turns
out to be a gap in the material: the Internet Control Message Protocol is missing. The
existing material cannot really be shortened. Still, you want to offer material about
ICMP.
Case 3.
Through a European project, your course has gained an additional target group:
students from universities in Eastern Europe will follow parts of your course
“computer networks”. Their level is comparable to that of your own students, but their
network connection is of course much slower. Standard textbooks such as Tanenbaum
are not available to them, so the students depend on the online material. The topic
you are looking for material on is again ICMP.
P.T.O.
You can now log out by clicking “Logout”.
We are also interested in how difficult you found this experiment. Below are therefore
some questions about the actions you performed during the experiment. Could you
indicate how easy you found these actions?
easy                                   difficult
1. How easy did you find it to determine whether a Unit of Learning Material was
useful for a case? [ ] [ ] [ ] [ ] [ ]
2. How easy did you find it to put yourself in the situations described in the
cases? [ ] [ ] [ ] [ ] [ ]
3. How easy did you find it to translate the situation in the case into choices for
characteristics of the learning material in the search form? [ ] [ ] [ ] [ ] [ ]
4. How easy did you find it to indicate the order in which the system should check
the characteristics? [ ] [ ] [ ] [ ] [ ]
5. Were there factors on the rating form for which you generally found it difficult to
decide whether they counted towards the usefulness? If so, give their numbers below
(see the rating form):
................................................................................
G. W. Hiddink
Centre for Telematics and Information Technology
University of Twente
P.O. Box 217, 7500 AE Enschede
C.3 The IEEE Metadata Standard
The IEEE metadata standard1 consists of nine major categories: ‘General’; ‘Lifecycle’:
how the learning object was created, who contributed to it, etcetera; ‘Metametadata’:
what is known about the metadata record itself; ‘Technical’: technical details about
using the learning object; ‘Educational’: educational characteristics; ‘Rights’: what
is known about the intellectual property rights and the costs of using the learning
object; ‘Relation’: relations that this learning object may have with other learning
objects; ‘Annotation’: additional comments or notes added to the metadata record by
users; and ‘Classification’: where the learning object falls within a particular
classification system.
These nine categories contain various elements or subcategories, which are described
below. This overview does not attempt to explain the precise meaning of each element;
it serves only as a general overview of what constitutes the IEEE metadata standard.
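To make the category structure concrete, a metadata record can be held as a nested dictionary with one slot per category. A minimal sketch: the element keys (`Format`, `Size`, `InteractivityLevel`, `Kind`, `Identifier`) and the identifier value are hypothetical, and which categories are modelled as repeatable lists is a choice made for this sketch only; the standard's own multiplicity rules are not reproduced.

```python
LOM_CATEGORIES = (
    "General", "Lifecycle", "Metametadata", "Technical", "Educational",
    "Rights", "Relation", "Annotation", "Classification",
)

# Sketch-only choice: these categories hold a list of entries.
REPEATABLE = {"Relation", "Annotation", "Classification"}


def empty_lom_record():
    """One slot per category: a list for repeatable ones, a dict otherwise."""
    return {c: ([] if c in REPEATABLE else {}) for c in LOM_CATEGORIES}


def missing_categories(record):
    """Categories of the nine that the record does not contain at all."""
    return [c for c in LOM_CATEGORIES if c not in record]


# Example: fill a few elements of a hypothetical ULM.
record = empty_lom_record()
record["Technical"]["Format"] = ["text/html"]        # 4.1 Format
record["Technical"]["Size"] = 20480                  # 4.2 Size, in bytes
record["Educational"]["InteractivityLevel"] = "low"  # 5.3 Interactivity Level
record["Relation"].append({"Kind": "IsPartOf",       # 7.1 Kind
                           "Resource": {"Identifier": "ulm-17"}})  # 7.2.1
```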
1 http://ltsc.ieee.org/doc/wg12/LOM WD6-1 without tracking.htm
1.8 Structure The internal structure of this learning object (e.g.
linear, or branched).
1.9 Aggregation level The functional granularity of this learning object.
2 LIFECYCLE The category describing the lifecycle of the
learning object.
2.1 Version The current version of the learning object.
2.2 Status The state or condition (e.g. draft, final) of the object.
2.3 Contribute A data element describing the people that
contributed to the object.
2.3.1 Role The kind of contribution that a person made.
2.3.2 Entity The person or organisation that made this
contribution.
2.3.3 Date The date on which the contribution took place.
3 METAMETADATA This category describes what is known about the
metadata record itself.
3.1 Identifier A globally unique label that identifies this
metadata record.
3.2 Catalog Entry This describes a catalog entry in which this
metadata record resides.
3.2.1 Catalog The name of the catalog.
3.2.2 Entry The string value of the entry in the catalog.
3.3 Contribute This subcategory describes the contributions that
have been made to this record.
3.3.1 Role The kind of contribution that a person made.
3.3.2 Entity The person or organisation that made this
contribution.
3.3.3 Date The date on which the contribution took place.
3.4 Metadata Scheme The name and version of the specification used to
create the metadata instance.
3.5 Language The default human language of the texts in the
metadata record.
4 TECHNICAL The technical requirements and characteristics of
the learning object.
4.1 Format The data type(s) of this learning object (e.g.
MIME types).
4.2 Size Size of the digital learning object in bytes.
4.3 Location A string that is used to access this
learning object, such as a URL.
4.4 Requirements A sub-category describing the technical
requirements needed to use the learning
object.
4.4.1 Type The technology type required to use this
learning object, such as hardware, network,
or software.
4.4.2 Name The name of the required technology.
4.4.3 Minimum Version Minimum required version of the
technology needed.
4.4.4 Maximum Version Maximum version known to support the use
of this learning object.
4.5 Installation Remarks Description of how to install the learning
object.
4.6 Other Platform Requirements Information about other software and
hardware requirements.
4.7 Duration Time a continuous learning object takes to
play.
5 EDUCATIONAL The educational characteristics of this
learning object.
5.1 Interactivity Type The flow of interaction between this
learning object and the user (e.g. active, or
expositive).
5.2 Learning resource type The specific kind of the learning object (e.g.
graph, exam, or simulation).
5.3 Interactivity Level The degree of interactivity between the end
user and the object (very low . . . very high).
5.4 Semantic Density The amount of information conveyed by this
learning object as compared to its size or
duration.
5.5 Intended End User Role The intended role of the end user (e.g.
teacher, author, or learner).
5.6 Context The principal environment within which this
learning object was intended to be used.
5.7 Typical Age Range The age range of the typical end user.
5.8 Difficulty This defines how difficult it is for the
intended end user to use this learning object.
5.9 Typical Learning Time The typical time it takes to work with this
learning object.
5.10 Description How this learning object is to be used.
5.11 Language The human language used by the typical
intended user of this learning object.
6 RIGHTS The intellectual property rights and
conditions for using this object.
6.1 Cost Whether or not using this learning object
requires payment.
6.2 Copyright and other restrictions Whether copyright or other restrictions
apply to the use of this object.
6.3 Description Comments on the conditions of use of this
learning object.
7 RELATION This category defines the relationship
between this learning object and others.
7.1 Kind Nature of the relationship (e.g. IsPartOf, or
IsBasedOn).
7.2 Resource The learning object that this relation references.
7.2.1 Identifier The unique identifier of the target learning object.
7.2.2 Description Description of the target learning object.
7.2.3 Catalog Entry See 1.3: Catalog Entry.
8 ANNOTATION A category that describes comments added
to the metadata record.
8.1 Person The person who created this annotation.
8.2 Date The date that this annotation was created.
8.3 Description The content of this annotation.
9 CLASSIFICATION This category describes where this learning
object falls within a particular classification
system.
9.1 Purpose The purpose of classifying this learning object.
9.2 TaxonPath A sub-category that describes a taxonomic path
in the classification system.
9.2.1 Source The name of the classification system.
9.2.2 Taxon A sub-category that describes a particular term
within the taxonomy.
9.2.2.1 ID The identifier of the term.
9.2.2.2 Entry The textual label of the taxon.
9.3 Description The description of the learning objective with
respect to the classification purpose.
9.4 Keywords The keywords and phrases
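The Technical requirement elements (4.4.1 to 4.4.4) lend themselves to an automatic check of whether a learning object can be used in a given environment. A minimal sketch, assuming purely numeric, dot-separated version strings and hypothetical dictionary keys (`Type`, `Name`, `MinimumVersion`, `MaximumVersion`) for those elements; real version strings are often messier than this.

```python
def parse_version(text):
    """'4.0.1' -> (4, 0, 1); trailing zeros are dropped so '4.0' equals '4'.
    Assumes purely numeric, dot-separated versions."""
    parts = [int(p) for p in text.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def within_bounds(installed, minimum=None, maximum=None):
    """True if installed lies in [minimum, maximum]; a missing bound is not checked."""
    v = parse_version(installed)
    if minimum is not None and v < parse_version(minimum):
        return False
    if maximum is not None and v > parse_version(maximum):
        return False
    return True


def usable(requirements, environment):
    """requirements: dicts with hypothetical keys Type, Name, MinimumVersion and
    MaximumVersion (elements 4.4.1-4.4.4); environment: name -> installed version."""
    for req in requirements:
        installed = environment.get(req["Name"])
        if installed is None:
            return False  # the required technology is absent altogether
        if not within_bounds(installed, req.get("MinimumVersion"),
                             req.get("MaximumVersion")):
            return False
    return True
```

Comparing versions as integer tuples avoids the string-comparison pitfall in which "10.0" sorts before "9.0".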
C.4 Evaluation form
Case number .....
ULM ID .....
Subject ID .....
Give an overall score for the usefulness of this learning material (1 - 10) .........
Below are 12 factors that may have played a role in determining the usefulness of the
learning material. For each factor, tick whether or not it counted in determining the
usefulness of this Unit of Learning Material for this case. If you do not know, you can
tick that as well.
Remarks: ............................................................................................................................................
............................................................................................................................................
Figure D.1: Answers of the test subjects on part B of the questionnaire (see Section 7.3.3)
cb.scc(B1c.c)
cb.scb (B1c.
at.scb.avg
at.scc.avg
cb.lo (B5)
cb.nk.avg
cb.cc.avg
at.nk.avg
at.cc.avg
sca-7.13
at.lo.avg
sca.avg
scb.avg
scc.avg
nn.avg
cc.avg
nk-7.3
lo.avg
bi.avg
bi-7.5
ID
2 -0,0345 -0,0916 -0,1128 0,34897 -0,4675 -0,2714 -0,4439 -0,0916 -0,1128 -0,719 1,41689 -0,7035 0,13672 -0,4695 -0,2714 0,33804 -0,7027 -0,4439
3 1,34592 0,67036 0,01376 -0,7641 0,89434 0,25766 0,73975 -0,9309 1,47102 -0,6515 0,34641 -0,719 -0,8092 1,33169 0,59401 0,25766 1,19144 0,66685 0,73975
4 0,12726 -0,0916 0,34641 1,3524 0,46607 0,11804 0,31653 -0,0916 0,34641 1,2879 1,41689 0,26101 0,59401 -0,4695 0,11804 0,17498 0,66685 0,31653
17 0,8049 0,23184 0,34334 0,45742 -0,2812 0,5409 0,47273 0 0,34775 0,34334 0,28447 0,63036 -0,6939 0,13672 -0,4695 0,5409 1,19144 -0,2466 0,47273
144 0,24992 0,65721 1,25559 -0,6707 0,56691 -0,017 0,53284 0,65721 1,25559 -0,139 -1,2025 0,05698 0,59401 -0,4695 -0,017 0,76474 1,12298 0,53284
10 0,3834 -0,0916 0,59013 -0,5841 -0,0303 -0,3887 0,22044 -0,0916 0,17145 0,79947 -0,719 -0,4493 0,14972 -0,5357 -0,4695 -0,3887 -0,0887 0,20956 0,22044
32 0,40434 -0,9704 -0,018 0,45742 -1,0875 -0,3658 -0,1413 -0,9704 0,17145 -0,1128 0,28447 0,63036 -0,6664 -1,4503 -0,4695 -0,3658 0,76474 -1,6173 -0,1413
11 0,10946 0,68775 0,59013 1,64239 0,08413 0,33471 0,27713 0 1,03163 0,17145 0,79947 1,86789 1,41689 -0,6939 0,59401 -0,4695 0,33471 0,17498 0,66685 0,27713
14 -0,6176 -0,0916 0,79947 -0,0808 1,04758 -0,1533 -0,1357 -0,0916 0,79947 -0,139 -0,0227 1,25749 1,0513 -0,4695 -0,1533 -0,0887 1,12414 -0,1357
26 -0,0485 -0,0267 0,80254 -0,9607 0,93246 -0,3175 -0,4742 -0,0267 0,80254 -0,719 -1,2025 1,42814 0,13672 -0,4695 -0,3175 -0,0887 0,66569 -0,4742
18 0,11618 -0,596 -1,472 0,73429 -1,1884 -0,5219 -1,2328 -0,596 -1,472 1,13136 0,33722 -0,7035 -1,6654 -0,4695 -0,5219 0,17498 -1,16 -1,2328
x1 0,20818 -0,4147 0,79947 1,15576 0,27317 0,31278 0,74573 -0,9309 -0,1566 0,79947 1,2879 1,02362 -0,2212 0,59401 -0,4695 0,31278 0,76474 0,66685 0,74573
7 0,60049 0,29254 0,25834 0,87575 -0,2506 -0,7374 0,68904 0,93095 -0,0267 0,99444 -0,1097 1,021 0,73049 -1,0745 0,59401 -0,4695 -0,7374 0,76474 0,21073 0,68904
28 0,09934 -0,4661 0,34948 -0,9909 0,02344 -0,5361 0,16306 -0,4661 0,34948 -1,5658 -0,4159 -0,6293 -0,0784 -0,4695 -0,5361 -0,6784 1,12298 0,16306
30 0,1133 0,43814 1,25559 0,56907 -0,0704 0,65837 0,35956 0 0,65721 1,25559 0,44102 0,69711 -0,4067 -0,5357 2,08656 0,65837 -0,2517 0,66569 0,35956
29 0,66828 1,47102 0,01172 0,52257 0,23911 0,0242 0,37664 1,47102 -0,6515 0,34334 0,70792 0,33722 0,14972 0,59401 -0,4695 0,0242 0,17498 0,21073 0,37664
34 -0,2896 0,37744 0,62193 0,7657 0,78646 0,81871 0,6066 -0,9309 1,03163 -0,6515 1,25866 0,44102 1,09038 0,14972 0,59401 -0,4695 0,81871 0,17498 1,5791 0,6066
x2 0,16706 0,55954 -0,1128 0,34263 -0,1075 0,21564 0,52942 1,8619 -0,0916 -0,1128 0,70792 -0,0227 0,14972 0,47464 -0,4695 0,21564 -0,0887 -1,2042 0,52942
5 -0,0353 -0,2216 -1,0189 -1,3842 0,05666 0,24879 -0,4277 -0,2216 -1,0189 -1,5658 -1,2025 0,14972 0,59401 -0,4695 0,24879 -0,1024 -0,2454 -0,4277
33 0,51522 -0,3107 -0,6221 0,62506 -0,2656 -0,605 0,16921 0 -0,4661 0,17145 -1,0189 0,12793 1,61933 0,71588 -0,7779 -0,4695 -0,605 0,17498 -1,6173 0,16921
25 -0,1432 -0,458 -0,5597 -0,7792 0,74002 0,09286 -0,0731 -0,9309 -0,2216 -0,5597 -1,1424 -0,4159 0,94588 0,59401 -0,4695 0,09286 0,17498 0,66685 -0,0731
38 -0,1144 -0,1477 -0,318 0,1309 -0,1496 0,31265 -0,7041 0 -0,2216 0,17145 -0,5628 0,28447 -0,0227 0,09266 -0,3206 2,08656 0,31265 -0,6784 -0,7039 -0,7041
51 0,42153 -0,621 0,01376 -0,459 0,20986 0,06096 -0,2374 -0,9309 -0,4661 -0,6515 0,34641 0,28447 -1,2025 -0,3795 0,59401 2,08656 0,06096 1,19144 0,66685 -0,2374
55 -0,1432 -0,3714 -0,5628 -0,0808 0,52304 0,42308 -0,0812 -0,9309 -0,0916 -0,5628 -0,139 -0,0227 1,31685 0,13672 -0,4695 0,42308 0,17498 -0,2466 -0,0812
59 -0,6886 -0,6817 -0,2606 -0,341 -0,3439 0,2411 0,09212 -1,8619 -0,0916 -1,4745 0,34641 -0,9859 0,30385 -0,4191 -0,7508 2,08656 0,2411 0,17498 0,66685 0,09212
44 0,49251 0,65721 -0,1128 -0,6707 -0,0007 0,50644 -0,8234 0,65721 -0,1128 -0,139 -1,2025 0,46363 0,13672 -0,4695 0,50644 -0,0887 -0,7027 -0,8234
Figure D.1 (continued): Answers of the test subjects on part B of the questionnaire (see Section 7.3.3)
cb.scc(B1c.c)
cb.scb (B1c.
at.scb.avg
at.scc.avg
cb.lo (B5)
cb.nk.avg
cb.cc.avg
at.nk.avg
at.cc.avg
sca-7.13
at.lo.avg
sca.avg
scb.avg
scc.avg
nn.avg
cc.avg
nk-7.3
lo.avg
bi.avg
bi-7.5
ID
43 0,16706 0,65721 0,83879 1,12559 0,43732 1,03313 0,22044 0,65721 1,81742 0,34948 0,44102 1,81016 -0,267 0,59401 2,08656 1,03313 -0,0887 1,12298 0,22044
42 0,1133 -1,3448 -0,316 -0,3875 0,46607 -0,0307 0,6066 -1,3448 0,17145 -0,5597 -0,719 -0,056 0,26101 0,59401 2,08656 -0,0307 -0,2517 0,66685 0,6066
37 0,99953 0,60285 0,22858 -0,2941 -0,4675 0,34148 -0,2606 1,8619 -0,0267 1,81742 -0,5658 -0,139 -0,4493 -0,7035 0,13672 2,08656 0,34148 0,76474 -0,7027 -0,2606
54 -1,5022 -0,9704 -0,8667 0,1758 -0,4335 -0,2791 0,02859 -0,9704 -1,4745 -0,5628 0,44102 -0,0894 -1,0745 0,13672 -0,4695 -0,2791 -1,8579 -0,2466 0,02859
41 -0,0014 0,73107 0,59013 0,31085 0,40375 -0,143 0,49595 0 1,0966 0,17145 0,79947 0,28447 0,33722 0,10521 0,59401 -0,4695 -0,143 0,76474 0,66685 0,49595
57 0,52043 -0,0611 -0,5658 -0,0673 -0,0619 -0,0588 -0,8092 0 -0,0916 -0,5658 0,70792 -0,8426 0,98298 -0,5357 -0,4695 -0,0588 0,76474 -0,7027 -0,8092
6 0,5085 -0,0437 0,26038 0,28585 0,46607 -0,2632 -0,6099 0,93095 -0,531 0,99444 -0,1066 1,021 -0,4493 0,26101 0,59401 -0,4695 -0,2632 0,17498 0,66685 -0,6099
13 0,16332 -0,8792 -1,0158 -0,5841 -0,3156 -0,188 0,10022 0,93095 -1,7842 -1,0158 -0,719 -0,4493 0,13489 -0,7779 -0,4695 -0,188 -0,5154 -1,1611 0,10022
39 0,22201 0,65721 -0,5646 -0,0824 -0,468 -0,1433 -0,3855 0,65721 -1,4745 -0,1097 0,28447 -0,4493 -0,7035 -0,3206 -0,4695 -0,1433 -0,0887 -0,7039 -0,3855
36 0,18724 -0,751 0,34027 0,81256 -0,433 0,30263 -0,9134 -0,9309 -0,661 0,34027 1,2879 0,33722 -1,0745 0,59401 2,08656 0,30263 -0,0887 -0,2454 -0,9134
9 0,0053 0,12782 -0,5658 -0,5841 -0,0817 0,50945 0,04989 -0,9309 0,65721 -0,5658 -0,719 -0,4493 0,26101 0,13672 -0,4695 0,50945 0,76474 -0,7027 0,04989
35 -0,8488 -0,9704 -0,5944 -0,7641 -0,8334 0,0896 -0,1691 -0,9704 -0,6515 -0,5658 -0,719 -0,8092 -0,7035 -0,7779 -0,4695 0,0896 -1,6949 -1,6173 -0,1691
16 0,19473 0,24922 1,25559 -0,0808 0,11433 -0,1006 0,51576 0,93095 -0,0916 1,25559 -0,139 -0,0227 -1,0745 0,59401 -0,4695 -0,1006 0,76474 1,12298 0,51576
24 -0,0345 0,12782 -0,1097 -1,1875 0,62691 -0,5127 -0,8126 -0,9309 0,65721 -0,1097 -1,5658 -0,8092 1,57652 0,13672 -0,4695 -0,5127 0,33804 -0,2466 -0,8126
15 -0,1572 0,28278 -0,318 -0,0975 0,34298 0,17683 0,85889 0,28278 0,17145 -0,5628 -0,139 -0,056 0,4094 0,59401 0,17683 -0,2517 0,21073 0,85889
50 -1,9057 -2,2056 -1,472 -1,2025 -0,9818 0,00845 0,11372 -2,2056 -1,472 0 -1,2025 -1,0745 -0,7779 -0,4695 0,00845 -2,8744 -1,6173 0,11372
40 0,31234 0,43814 0,28809 1,27412 -0,0681 0,21839 1,0752 0 0,65721 0,17145 0,34641 1,13136 1,41689 -1,0745 0,59401 -0,4695 0,21839 0,17498 0,66685 1,0752
21 -0,5895 -0,531 -1,0158 -0,0109 -0,0564 -1,3071 -0,3452 -0,531 -1,0158 -0,719 0,69711 1,42814 -1,4232 -0,4695 -1,3071 -0,0887 0,21073 -0,3452
61 -0,9124 0,65721 0,03943 -0,5841 -0,1672 -0,3701 0,53284 0,65721 -1,4745 0,7964 -0,719 -0,4493 0,72083 -0,993 -0,4695 -0,3701 -0,6784 -0,7039 0,53284
65 -0,1001 0,74845 -0,1097 0,31085 0,25693 0,21074 -0,1413 0,93095 0,65721 -0,1097 0,28447 0,33722 0,4094 0,37889 -0,4695 0,21074 0,17498 0,66802 -0,1413
56 -1,0216 0,21782 -0,5658 -0,0808 -0,2647 0,22412 -0,7983 0,21782 -0,5658 -0,139 -0,0227 0,26101 -0,3206 2,08656 0,22412 -1,6949 -1,16 -0,7983
66 0,10946 -0,6036 -0,016 0,27907 -0,5912 -0,5693 -0,2777 0 -0,9055 0,17145 -0,1097 -0,139 0,69711 -0,7962 -0,5357 -0,4695 -0,5693 0,17498 -0,2466 -0,2777
68 -1,2672 -0,9055 -1,0158 -0,9607 -1,4578 -1,0809 -0,2408 -0,9055 -1,0158 -0,719 -1,2025 -0,7035 -2,7951 -0,4695 -1,0809 -1,1051 -1,1611 -0,2408
63 -0,5364 -0,2216 1,44287 -0,7641 0,64852 0,42858 -0,316 -0,2216 1,81742 1,25559 -0,719 -0,8092 0,26101 0,59401 -0,4695 0,42858 -0,4148 1,12298 -0,316
47 0,54136 1,3517 0,6652 0,37187 0,27299 0,04716 1,8619 1,0966 0 1,32114 0,33722 0,14972 0,59401 -0,4695 0,27299 0,77848 0,71338 0,04716
Figure D.2: Answers of the test subjects on part C of the questionnaire (see Section 7.3.3)
Columns: subject ID, followed by five columns (interactivity, function, subject, level, time) for each of context 1 to context 6 and for the average over all contexts.
2 4 2 3 5 5 4 2 1 5 4 4 1 2 3 5 4 3 1 2 5 4 1 2 4 5 5 2 1 5 4 4.17 1.83 1.67 4 4.67
3 3 1 2 4 5 3 2 1 4 5 4 1 3 2 5 5 1 3 2 4 1 2 3 4 5 3 1 2 5 4 3.17 1.33 2.33 3.5 4.67
4 3 2 1 4 5 3 2 1 4 5 4 2 1 3 5 4 1 2 3 5 4 2 1 3 5 4 2 1 3 5 3.67 1.83 1.17 3.33 5
17 1 3 2 4 5 2 1 3 4 5 3 2 1 4 5 3 2 4 1 5 5 2 1 3 4 1 2 3 4 5 2.5 2 2.33 3.33 4.83
144 1 4 5 3 2 1 4 3 3 2 1 4 3 2 2 1 3 3 2 2 1 4 5 3 2 1 3 3 4 2 1 3.67 3.67 2.83 2
10 4 1 2 3 5 4 3 2 1 5 4 1 2 3 5 3 2 1 4 5 4 3 1 2 5 4 3 1 2 5 3.83 2.17 1.5 2.5 5
32 5 4 2 1 3 3 4 5 1 2 5 4 3 1 2 2 3 4 1 5 2 4 5 3 1 3 4 5 2 1 3.33 3.83 4 1.5 2.33
11 1 2 4 5 3 5 3 2 1 4 4 2 1 3 5 4 1 2 3 5 5 2 1 3 4 4 1 2 3 5 3.83 1.83 2 3 4.33
14 5 3 2 2 4 4 3 5 2 3 5 3 5 4 3 5 4 5 2 2 3 2 4 4 1 5 2 5 2 1 4.5 2.83 4.33 2.67 2.33
26 3 2 1 4 5 3 2 1 4 5 3 2 1 4 5 3 2 1 4 5 3 2 1 4 5 3 2 1 4 5 3 2 1 4 5
18 1 2 3 5 4 3 1 2 4 5 1 2 5 4 3 1 3 5 4 2 1 2 4 3 5 2 4 5 3 1 1.5 2.33 4 3.83 3.33
x1 4 1 3 5 2 3 4 2 5 1 4 1 2 5 3 4 1 3 2 5 2 1 3 5 4 2 1 3 5 4 3.17 1.5 2.67 4.5 3.17
7 2 1 3 3 3 1 4 2 3 5 1 1 1 1 1 5 1 3 2 4 3 1 2 3 3 2 1 2 2 2 2.33 1.5 2.17 2.33 3
28 4 1 5 2 3 2 1 5 4 3 4 1 5 2 3 5 1 4 2 3 3 1 4 2 5 2 1 3 4 5 3.33 1 4.33 2.67 3.67
30 2 1 5 3 4 2 1 4 5 3 1 2 4 5 3 4 3 5 2 1 4 3 2 1 5 2 1 5 3 4 2.5 1.83 4.17 3.17 3.33
29
34 2 1 5 4 3 1 2 4 5 3 2 1 4 3 5 4 1 3 2 5 4 1 5 3 2 3 1 5 4 2 2.67 1.17 4.33 3.5 3.33
x2 3 1 2 4 5 1 3 2 4 5 4 1 2 3 5 4 1 3 2 5 3 1 2 4 5 2 1 3 4 5 2.83 1.33 2.33 3.5 5
5 1 2 3 4 5 4 2 3 1 5 4 2 3 1 5 4 1 3 2 5 4 3 1 2 5 4 3 1 2 5 3.5 2.17 2.33 2 5
33
25 3 1 2 4 5 5 1 2 4 3 5 1 2 3 4 2 1 4 3 5 4 1 3 5 2 3 1 2 5 4 3.67 1 2.5 4 3.83
38 2 1 5 4 3 2 3 4 1 5 2 1 4 3 5 2 1 5 3 4 2 1 5 4 3 2 1 5 4 3 2 1.33 4.67 3.17 3.83
51 4 3 5 1 2 3 4 5 2 1 3 4 5 2 1 3 4 5 2 1 3 4 5 2 1 3 4 5 2 1 3.17 3.83 5 1.83 1.17
55 4 2 1 3 5 3 2 1 4 5 4 2 1 3 5 4 2 1 3 5 1 3 2 4 5 4 2 1 3 5 3.33 2.17 1.17 3.33 5
59 2 4 1 3 5 3 2 1 4 5 4 1 2 3 5 3 2 1 4 5 3 1 2 4 5 2 1 3 4 5 2.83 1.83 1.67 3.67 5
44 4 3 2 1 5 1 2 3 4 5 2 4 5 1 3 1 5 3 4 2 2 5 4 1 3 4 3 2 1 5 2.33 3.67 3.17 2 3.83
43 2 3 4 5 1 1 3 4 2 5 3 1 2 3 5 5 1 3 4 2 5 1 3 4 2 3 1 2 4 5 3.17 1.67 3 3.67 3.33
42 3 2 1 4 5 3 2 1 4 5 3 1 2 4 5 4 2 1 3 5 4 1 2 3 5 4 1 2 3 5 3.5 1.5 1.5 3.5 5
37 4 3 2 1 5 4 3 2 1 5 4 3 2 1 5 5 3 2 1 4 5 3 2 1 4 4 3 2 1 5 4.33 3 2 1 4.67
54 1 2 4 5 3 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3.17 4.17 4.67
41 3 1 2 2 3 3 1 2 2 3 3 1 2 2 3 3 1 2 2 3 3 1 2 2 3 3 1 2 3 3 3 1 2 2.17 3
57 2 1 5 4 3 5 4 3 1 2 5 1 2 4 3 5 2 1 3 4 5 2 1 3 4 1 2 5 3 4 3.83 2 2.83 3 3.33
6 3 3 2 4 5 3 3 2 4 5 3 3 2 4 5 3 3 2 4 5 3 3 2 4 5 3 3 2 4 5 3 3 2 4 5
13 4 1 2 3 5 3 1 2 4 5 4 2 1 3 5 5 1 3 2 4 4 1 2 3 5 4 1 2 3 5 4 1.17 2 3 4.83
39 5 3 4 5 5 5 3 4 5 5 5 3 4 5 5 5 3 4 5 5 5 3 4 5 5 5 3 4 5 5 5 3 4 5 5
36 5 1 2 4 3 2 4 3 5 1 3 5 4 2 1 4 1 2 3 5 1 4 2 3 5 3 3 2.6 3.4 3
9 2 1 3 5 4 2 1 4 5 3 4 1 2 3 5 4 1 3 1 5 4 1 3 2 5 4 1 2 3 5 3.33 1 2.83 3.17 4.5
35 3 4 1 2 5 3 4 1 2 5 3 4 1 2 5 3 4 1 2 5
16
24 5 1 2 3 4 5 1 3 4 2 4 2 1 3 5 4 1 3 2 5 5 1 2 3 4 3 4 1 2 5 4.33 1.67 2 2.83 4.17
15 5 1 2 4 3 4 3 2 5 1 5 1 3 2 4 3 1 4 5 2 5 1 3 4 2 3 1 2 5 4 4.17 1.33 2.67 4.17 2.67
50
40
21 4 1 2 3 5 4 2 1 3 5 4 2 1 3 5 4 2 1 3 5 3 2 1 4 5 4 1 2 3 5 3.83 1.67 1.33 3.17 5
61 5 1 3 4 2 5 1 3 4 2 5 1 3 4 2 5 3 4 1 2 4 1 2 3 5 5 1 4 2 3 4.83 1.33 3.17 3 2.67
65 4 2 1 3 5 3 1 2 5 4 4 2 3 1 5 4 2 3 1 5 4 3 2 1 5 3 2 1 5 4 3.67 2 2 2.67 4.67
56 4 1 2 5 3 4 3 1 2 4 4 1 2 4 3 5 2 1 4 3 3 2 1 4 4 3 2 1 4 3 3.83 1.83 1.33 3.83 3.33
66 4 1 3 2 5 4 1 3 2 5 4 1 3 2 5 4 1 3 2 5 4 1 3 2 5 4 1 3 2 5 4 1 3 2 5
68 3 2 1 5 4 2 3 1 4 5 3 2 1 5 4 5 2 1 3 4 5 2 1 3 4 3 2 1 4 5 3.5 2.17 1 4 4.33
63 3 1 5 2 4 1 3 4 2 5 3 1 5 2 4 5 1 4 3 2 5 4 1 3 2 1 3 5 2 4 3 2.17 4 2.33 3.5
47 2 5 4 3 5 4 3 1 2 5 4 1 2 3 5 4 1 2 3 5 5 1 2 3 4 4 1 2 3 5 3.83 2 2.17 2.83 4.83
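The five "average over all contexts" columns appear to be the column-wise means of the six per-context columns, rounded to two decimals. A minimal Python check, using the answers of subject 2 copied from the first row of Figure D.2; the function name is hypothetical, and the check does not depend on which factor each column represents.

```python
# Answers of subject 2 in Figure D.2: one row per context (1 to 6),
# five columns per context in the order in which they appear in the table.
SUBJECT_2 = [
    [4, 2, 3, 5, 5],
    [4, 2, 1, 5, 4],
    [4, 1, 2, 3, 5],
    [4, 3, 1, 2, 5],
    [4, 1, 2, 4, 5],
    [5, 2, 1, 5, 4],
]


def average_over_contexts(rows):
    """Column-wise mean over the contexts, rounded to two decimals."""
    return [round(sum(column) / len(rows), 2) for column in zip(*rows)]


print(average_over_contexts(SUBJECT_2))
```

The result matches the averages listed for subject 2 in the table (4.17, 1.83, 1.67, 4 and 4.67).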
Figure D.3: Data of the validation experiment (see Section 8.5)
person
factor10
factor11
factor12
factor1
factor2
factor3
factor4
factor5
factor6
factor7
factor8
factor9
timeW
sizeW
case
score
diffW
intW
time
time
size
size
ulm
taxi
0,1
0,2
0,3
0,4
0,5
0,6
0,7
0,8
0,9
diff
diff
int
int
1
1 1 2 4 2:00:00 1 1 4 3 238 3 3 2 15 3,78 0,84 1,02 1,19 1,37 1,55 1,73 1,91 2,09 2,26 2,44 2,62 4 3 1 2 3 3 3 3 3 3 1 1 3
1 1 2 4 2:00:00 1 1 4 3 242 3 3 4 10 2,7 0,86 1,02 1,18 1,35 1,51 1,68 1,84 2,01 2,18 2,34 2,51 1 3 3 3 3
1 1 2 4 2:00:00 1 1 4 3 231 3 3 4 50 0,7 0,26 0,28 0,31 0,33 0,36 0,39 0,42 0,45 0,48 0,51 0,54 5 2 2 3 2 3 2 3 2 2 2 2
1 1 2 4 2:00:00 1 1 4 3 220 3 4 0 60 3,07 0,78 0,95 1,14 1,33 1,52 1,71 1,91 2,11 2,3 2,5 2,7 1 2 1 1 3 1 3 3 3 1 1 1 1
1 1 2 4 2:00:00 1 1 4 3 221 3 0 0 20 5,07 1,02 1,25 1,47 1,7 1,94 2,17 2,41 2,64 2,88 3,12 3,36 5 3 1 1 3 3 3 1 3 1 1 1 1
1 1 2 4 2:00:00 1 1 4 3 226 3 0 0 10 5,57 1,14 1,38 1,63 1,88 2,14 2,39 2,65 2,9 3,16 3,42 3,68 1 3 1 3 3 1 1 3 3 1 1 1 1
1 2 4 4 2:00:00 1 4 1 1 219 2 3 0 10 2,3 1,09 1,1 1,11 1,13 1,15 1,18 1,21 1,23 1,27 1,3 1,33 8 1 1 1 3 1 3 3 1 1 1 1 1
1 2 4 4 2:00:00 1 4 1 1 222 2 0 2 7 4,42 1,24 1,4 1,59 1,79 2 2,21 2,43 2,65 2,88 3,11 3,34 1 1 1 1 3 1 3 3 1 1 3 1 1
1 2 4 4 2:00:00 1 4 1 1 228 2 4 2 30 0,83 0,6 0,6 0,6 0,6 0,6 0,6 0,6 0,6 0,6 0,6 0,6 1 1 1 1 2 3 3 3 1 1 3 1 1
1 2 4 4 2:00:00 1 4 1 1 217 2 1 0 10 3,9 1,22 1,32 1,44 1,56 1,7 1,84 1,99 2,14 2,3 2,46 2,63 6 1 3 1 3 3 3 3 1 1 1 1 1
1 2 4 4 2:00:00 1 4 1 1 237 3 4 2 35 0,75 0,53 0,53 0,53 0,53 0,53 0,53 0,53 0,53 0,53 0,53 0,53 6 1 3 1 3 1 3 3 1 1 1 1 1
1 2 4 4 2:00:00 1 4 1 1 221 3 0 0 20 4,53 1,24 1,4 1,59 1,79 2 2,21 2,43 2,65 2,88 3,11 3,34 1 1 3 1 3 1 1 3 1 1 1 1 1
1 3 2 4 1:00:00 1 4 4 1 229 2 3 4 45 1,05 0,32 0,36 0,41 0,45 0,51 0,56 0,61 0,67 0,72 0,78 0,84 3 1 3 1 3 3 2 3 3 1 1 1 1
1 3 2 4 1:00:00 1 4 4 1 230 2 3 4 10 1,63 0,86 0,87 0,89 0,92 0,94 0,97 1 1,04 1,08 1,11 1,16 3 1 3 1 3 2 3 3 3 1 1 1 1
1 3 2 4 1:00:00 1 4 4 1 238 3 3 2 15 2,88 0,84 0,9 0,97 1,05 1,14 1,23 1,32 1,42 1,52 1,62 1,73 4 3 3 1 3 2 3 3 3 1 1 1 1
1 3 2 4 1:00:00 1 4 4 1 217 2 1 0 10 4,3 1,09 1,23 1,39 1,56 1,74 1,93 2,12 2,31 2,51 2,71 2,91 2 1 3 1 3 2 3 3 3 1 1 1 1
1 3 2 4 1:00:00 1 4 4 1 234 2 3 0 18 4,17 0,99 1,14 1,32 1,5 1,68 1,88 2,07 2,27 2,47 2,67 2,87 1 3
1 3 2 4 1:00:00 1 4 4 1 226 3 0 0 10 5,1 1,14 1,31 1,5 1,7 1,9 2,11 2,33 2,55 2,77 2,99 3,22 1 3 3
2 1 2 2 0 0:20:00 1 3 2 4 226 3 0 0 10 2,12 0,5 0,59 0,68 0,77 0,87 0,98 1,08 1,2 1,31 1,44 1,56 6 3 1 3 3 2 3 3 1 3 1 1 2
2 1 2 2 0 0:20:00 1 3 2 4 221 3 0 0 20 1,45 0,47 0,54 0,62 0,7 0,79 0,88 0,98 1,08 1,19 1,3 1,42 7 3 1 3 3 3 1 2 1 3 3 1 3
2 1 2 2 0 0:20:00 1 3 2 4 220 3 4 0 60 4,12 0,82 1,02 1,23 1,45 1,67 1,89 2,11 2,33 2,56 2,79 3,02 5 3 2 3 1 3 1 1 2 1 2
2 1 2 2 0 0:20:00 1 3 2 4 231 3 3 4 50 4,18 0,89 1,04 1,2 1,36 1,52 1,69 1,86 2,04 2,22 2,4 2,59 7 3 3 3 3 3 2 3 1 3 3 1 2
2 1 2 2 0 0:20:00 1 3 2 4 242 3 3 4 10 2,85 0,76 0,84 0,93 1,02 1,12 1,21 1,32 1,42 1,54 1,65 1,77 7 3 1 3 3 3 1 2 1 3 3 1 3
2 1 2 2 0 0:20:00 1 3 2 4 238 3 3 2 15 1,85 0,47 0,52 0,57 0,63 0,7 0,77 0,85 0,93 1,02 1,11 1,22 7 3 1 3 3 2 3 3 1 3 1 1 2
2 2 2 3 2 0:30:00 1 4 3 2 221 3 0 0 20 3,98 0,75 0,93 1,12 1,32 1,51 1,72 1,92 2,14 2,35 2,58 2,81 4 3 1 3 3 3 1 2 1 3 3 1 3
2 2 2 3 2 0:30:00 1 4 3 2 237 3 4 2 35 1,22 0,33 0,38 0,44 0,51 0,59 0,68 0,78 0,89 1,01 1,14 1,29 7 3 3 3 3 2 2 1 1 1 1 1
2 2 2 3 2 0:30:00 1 4 3 2 217 2 1 0 10 3,27 0,62 0,75 0,89 1,02 1,16 1,3 1,44 1,58 1,72 1,86 2 3 3 3 3 2 3 2 2 3 2 2 1 3
2 2 2 3 2 0:30:00 1 4 3 2 228 2 4 2 30 0,8 0,2 0,26 0,32 0,38 0,44 0,5 0,56 0,62 0,68 0,74 0,8 1 3 3 3 2 3 2 3 2 1 1 1 3
2 2 2 3 2 0:30:00 1 4 3 2 222 2 0 2 7 3,17 0,71 0,89 1,06 1,24 1,42 1,61 1,79 1,97 2,15 2,34 2,52 1 3 1 3 2 3 3 3 2 3 1 1 3
2 2 2 3 2 0:30:00 1 4 3 2 219 2 3 0 10 1,67 0,47 0,54 0,61 0,69 0,76 0,83 0,91 0,98 1,05 1,13 1,2 6 3 2 3 1 3 3 1 3 1 2
2 3 2 2 3 1:00:00 1 4 3 2 226 3 0 0 10 5,02 1,08 1,24 1,41 1,58 1,76 1,94 2,12 2,32 2,51 2,72 2,93 5 3 3 3 2 2 3 3 3 1 1 1 2
2 3 2 2 3 1:00:00 1 4 3 2 234 2 3 0 18 3,7 0,88 1,01 1,14 1,27 1,4 1,53 1,67 1,8 1,93 2,07 2,2 2 3 3 3 2 3 3 1 1 1 1 3
Figure D.3 (continued): Data of the validation experiment (see Section 8.5)
2 3 2 2 3 1:00:00 1 4 3 2 217 2 1 0 10 3,97 0,99 1,13 1,26 1,4 1,54 1,68 1,82 1,96 2,1 2,24 2,38 2 3 2 3 2 3 3 1 1 1 1 3
2 3 2 2 3 1:00:00 1 4 3 2 238 3 3 2 15 3,05 0,83 0,92 1,02 1,13 1,23 1,35 1,47 1,6 1,73 1,88 2,03 5 3 2 3 2 3 3 3 2 1 1 1 2
2 3 2 2 3 1:00:00 1 4 3 2 230 2 3 4 10 2,97 0,87 0,97 1,08 1,18 1,28 1,39 1,49 1,6 1,7 1,81 1,92 1 1 3 1 2 3 2 3 1 1 2 1 3
2 3 2 2 3 1:00:00 1 4 3 2 229 2 3 4 45 1,8 0,36 0,43 0,5 0,57 0,64 0,71 0,78 0,85 0,92 1 1,07 1 1 1 1 2 3 3 3 1 1 1 1 3
3 1 3 2 2 0:30:00 2 3 4 1 238 3 3 2 15 0,85 0,32 0,35 0,38 0,41 0,44 0,47 0,51 0,54 0,58 0,61 0,65 6 1 2 2 3 2 3 3 3 3 3 1 2
3 1 3 2 2 0:30:00 2 3 4 1 242 3 3 4 10 2,27 0,51 0,6 0,69 0,78 0,88 0,98 1,08 1,19 1,29 1,39 1,5 7 1 2 2 3 2 3 3 3 3 3 1 2
3 1 3 2 2 0:30:00 2 3 4 1 231 3 3 4 50 2,27 0,51 0,6 0,69 0,78 0,88 0,98 1,08 1,19 1,29 1,39 1,5 8 1 2 2 3 2 3 3 3 3 3 1 2
3 1 3 2 2 0:30:00 2 3 4 1 220 3 4 0 60 3,03 0,72 0,82 0,92 1,03 1,14 1,26 1,38 1,5 1,62 1,74 1,86 6 1 2 2 3 2 3 3 3 3 3 1 2
3 1 3 2 2 0:30:00 2 3 4 1 221 3 0 0 20 2,7 0,55 0,67 0,79 0,92 1,04 1,17 1,29 1,42 1,55 1,67 1,8 6 1 2 2 3 2 3 3 3 3 3 1 2
3 1 3 2 2 0:30:00 2 3 4 1 226 3 0 0 10 2,87 0,62 0,73 0,84 0,96 1,08 1,2 1,33 1,45 1,57 1,7 1,82 6 1 2 2 3 2 3 3 3 3 3 1 2
3 2 3 3 0:30:00 1 4 3 2 219 2 3 0 10 2,17 0,6 0,7 0,81 0,91 1,01 1,12 1,22 1,33 1,43 1,54 1,64 7 1 3 2 3 2 3 2 3 3 3 1 2
3 2 3 3 0:30:00 1 4 3 2 222 2 0 2 7 3,67 0,73 0,91 1,09 1,27 1,46 1,64 1,83 2,01 2,2 2,38 2,57 6 1 3 2 3 2 2 2 3 3 3 1 2
3 2 3 3 0:30:00 1 4 3 2 228 2 4 2 30 1,3 0,26 0,33 0,4 0,46 0,53 0,6 0,67 0,74 0,81 0,87 0,94 6 1 3 2 3 2 2 2 3 3 3 1 2
3 2 3 3 0:30:00 1 4 3 2 217 2 1 0 10 3,77 0,72 0,87 1,03 1,19 1,34 1,5 1,66 1,82 1,97 2,13 2,29 7 1 3 2 3 2 2 2 3 3 3 1 2
3 2 3 3 0:30:00 1 4 3 2 237 3 4 2 35 1,47 0,27 0,34 0,41 0,48 0,55 0,61 0,68 0,75 0,82 0,89 0,96 7 1 3 2 3 2 2 2 3 3 3 1 2
3 2 3 3 0:30:00 1 4 3 2 221 3 0 0 20 4,23 0,8 1 1,2 1,41 1,61 1,82 2,03 2,23 2,44 2,64 2,85 6 1 3 2 3 2 2 2 3 3 3 1 3
3 3 2 3 3:00:00 1 3 4 2 229 2 3 4 45 1,77 0,36 0,42 0,49 0,56 0,62 0,69 0,76 0,82 0,89 0,96 1,03 6 1 3 2 3 2 3 2 3 3 3 1 2
3 3 2 3 3:00:00 1 3 4 2 230 2 3 4 10 2,93 0,87 0,97 1,07 1,17 1,27 1,38 1,48 1,58 1,69 1,79 1,89 5 1 3 2 3 2 3 2 3 3 3 1 2
3 3 2 3 3:00:00 1 3 4 2 238 3 3 2 15 2,77 0,79 0,89 0,98 1,07 1,17 1,26 1,36 1,46 1,55 1,65 1,75 5 1 3 2 3 2 3 2 3 3 3 1 2
3 3 2 3 3:00:00 1 3 4 2 217 2 1 0 10 4,27 0,99 1,15 1,31 1,48 1,64 1,81 1,98 2,15 2,33 2,5 2,67 4 1 3 2 3 2 3 2 3 3 3 1 2
3 3 2 3 3:00:00 1 3 4 2 234 2 3 0 18 4 0,88 1,04 1,19 1,35 1,52 1,68 1,85 2,01 2,18 2,35 2,51 4 1 3 2 3 2 3 2 3 3 3 1 2
3 3 2 3 3:00:00 1 3 4 2 226 3 0 0 10 4,87 1,05 1,22 1,4 1,58 1,76 1,94 2,12 2,31 2,49 2,68 2,87 4 1 3 2 3 2 3 2 3 3 3 1 2
102 4 4 2 4 null 1 4 2 3 241 3 1 4 13 1,05 0,32 0,37 0,43 0,5 0,57 0,66 0,76 0,88 1 1,13 1,28 7 3 2 3 3 3 3 3 3 1 3 1 3
102 4 4 2 4 null 1 4 2 3 240 3 0 2 12 2,52 0,58 0,69 0,8 0,93 1,06 1,2 1,35 1,5 1,66 1,82 2 5 3 2 3 3 3 3 3 3 3 3 1 3
102 4 4 2 4 null 1 4 2 3 239 3 0 4 30 1,85 0,47 0,58 0,7 0,82 0,95 1,09 1,23 1,39 1,54 1,71 1,89 9 3 1 3 3 3 3 3 3 3 3 1 3
102 4 4 2 4 null 1 4 2 3 219 2 3 0 10 2,63 0,86 0,93 1,03 1,14 1,27 1,42 1,59 1,79 2,01 2,26 2,53 5 3 1 3 3 3 3 2 3 3 3 1 3
102 4 4 2 4 null 1 4 2 3 221 3 0 0 20 3,18 0,82 0,94 1,06 1,2 1,34 1,48 1,63 1,79 1,96 2,13 2,31 5 3 1 3 1 3 3 2 3 3 3 1 3
102 4 4 2 4 null 1 4 2 3 236 3 0 2 25 2,52 0,58 0,69 0,8 0,93 1,06 1,2 1,35 1,5 1,66 1,82 2 6 3 2 3 1 3 3 1 3 3 3 1 3
102 5 4 2 3 0:30:00 1 3 2 4 217 2 1 0 10 3,43 0,81 0,9 1,01 1,13 1,26 1,41 1,56 1,73 1,91 2,11 2,32 3 3 1 3 1 3 3 3 3 3 3 1 3
102 5 4 2 3 0:30:00 1 3 2 4 218 2 0 4 10 3,37 0,74 0,84 0,96 1,1 1,24 1,4 1,57 1,75 1,94 2,14 2,36 2 1 1 1 1 3 3 3 3 3 1 1 3
102 5 4 2 3 0:30:00 1 3 2 4 235 2 0 4 15 3,03 0,71 0,79 0,9 1,02 1,14 1,29 1,44 1,61 1,79 1,98 2,19 4 3 1 1 1 3 3 3 3 3 3 1 3
102 5 4 2 3 0:30:00 1 3 2 4 229 2 3 4 45 2,43 0,62 0,68 0,75 0,85 0,96 1,08 1,22 1,38 1,54 1,73 1,93 5 3 1 3 1 3 3 3 3 3 3 1 3
102 5 4 2 3 0:30:00 1 3 2 4 233 2 1 0 18 2,9 0,76 0,83 0,91 1,01 1,12 1,24 1,37 1,52 1,69 1,87 2,06 3 3 1 1 3 3 1 3 3 3 1 3
102 5 4 2 3 0:30:00 1 3 2 4 238 3 3 2 15 2,18 0,44 0,51 0,59 0,68 0,77 0,87 0,97 1,08 1,19 1,3 1,43 9 3 1 3 1 3 3 2 3 3 3 1 3
102 6 2 2 2 0:50:00 4 1 2 3 129 2 1 3 20 2,03 0,56 0,66 0,75 0,85 0,95 1,05 1,15 1,25 1,35 1,45 1,55 3 3 1 1 1 3 3 1 3 3 3 1 3
Figure D.3 (continued): Data of the validation experiment (see Section 8.5)
person
factor10
factor11
factor12
factor1
factor2
factor3
factor4
factor5
factor6
factor7
factor8
factor9
timeW
sizeW
case
score
diffW
intW
time
time
size
size
ulm
taxi
0,1
0,2
0,3
0,4
0,5
0,6
0,7
0,8
0,9
diff
diff
int
int
1
102 6 2 2 2 0:50:00 4 1 2 3 241 3 1 4 13 3,72 0,77 0,89 1 1,12 1,25 1,37 1,49 1,62 1,74 1,87 1,99 8 3 1 1 1 3 3 1 3 3 3 1 3
102 6 2 2 2 0:50:00 4 1 2 3 235 2 0 4 15 2,82 0,78 0,89 0,99 1,1 1,22 1,33 1,45 1,56 1,68 1,8 1,91 3 3 1 1 1 3 3 3 3 3 3 1 3
102 6 2 2 2 0:50:00 4 1 2 3 225 2 3 -1 10 3,2 0,86 0,99 1,13 1,27 1,4 1,54 1,68 1,82 1,96 2,1 2,24 2 3 1 1 1 3 3 1 3 3 1 3 3
102 6 2 2 2 0:50:00 4 1 2 3 226 3 0 0 10 4,07 0,88 1 1,12 1,24 1,37 1,5 1,63 1,76 1,89 2,03 2,16 6 3 1 1 1 3 3 3 3 3 3 3 3
102 6 2 2 2 0:50:00 4 1 2 3 227 2 3 2 8 2,3 0,73 0,86 1 1,14 1,28 1,41 1,55 1,69 1,83 1,97 2,11 1
103 4 1 0 2:00:00 1 4 2 3 241 3 1 4 13 4,02 1,14 1,31 1,48 1,66 1,83 2 2,18 2,35 2,53 2,71 2,88 5 3 3 3 3 3 3 3 3 3 3 1 3
103 4 1 0 2:00:00 1 4 2 3 240 3 0 2 12 4,2 0,96 1,14 1,31 1,49 1,66 1,84 2,01 2,19 2,37 2,54 2,72 6 3 3 3 3 3 3 3 3 3 3 1 3
103 4 1 0 2:00:00 1 4 2 3 239 3 0 4 30 3,97 0,99 1,13 1,26 1,4 1,54 1,68 1,82 1,96 2,1 2,24 2,38 7 3 3 3 3 3 3 3 3 3 3 1 3
103 4 1 0 2:00:00 1 4 2 3 219 2 3 0 10 4,43 0,94 1,14 1,35 1,55 1,75 1,96 2,16 2,37 2,58 2,78 2,99 4 3 3 3 3 3 3 3 3 3 3 1 3
103 4 1 0 2:00:00 1 4 2 3 221 3 0 0 20 3,13 0,72 0,86 1,01 1,15 1,3 1,45 1,59 1,74 1,89 2,03 2,18 7 3 3 3 3 3 3 3 3 3 3 1 3
103 4 1 0 2:00:00 1 4 2 3 236 3 0 2 25 3,55 0,79 0,93 1,06 1,2 1,34 1,47 1,61 1,75 1,89 2,03 2,17 8 3 3 3 3 3 3 3 3 3 3 1 3
103 5 3 0 2:00:00 1 4 2 3 217 2 1 0 10 4,43 0,94 1,14 1,35 1,55 1,75 1,96 2,16 2,37 2,58 2,78 2,99 8 3 3 3 3 3 3 3 3 3 3 1 3
103 5 3 0 2:00:00 1 4 2 3 218 2 0 4 10 6,57 1,32 1,56 1,81 2,06 2,31 2,57 2,82 3,08 3,33 3,59 3,85 4 3 3 3 3 3 3 3 3 3 1 3 3
103 5 3 0 2:00:00 1 4 2 3 235 2 0 4 15 6,32 1,27 1,5 1,74 1,98 2,22 2,46 2,71 2,95 3,2 3,44 3,69 4 3 3 3 3 3 3 3 3 3 3 1 3
103 5 3 0 2:00:00 1 4 2 3 229 2 3 4 45 2,42 0,87 0,96 1,06 1,15 1,25 1,35 1,44 1,54 1,63 1,73 1,83 4 3 3 3 3 3 3 3 3 3 3 1 3
103 5 3 0 2:00:00 1 4 2 3 233 2 1 0 18 4,03 0,82 1 1,19 1,37 1,55 1,74 1,92 2,11 2,29 2,48 2,66 8 3 3 3 3 3 3 3 3 3 3 1 3
103 5 3 0 2:00:00 1 4 2 3 238 3 3 2 15 3,25 0,9 1,05 1,21 1,36 1,52 1,68 1,83 1,99 2,15 2,3 2,46 8 3 3 3 3 3 3 3 3 3 3 1 3
103 6 2 0 2:00:00 3 4 1 2 129 2 1 3 20 2 0,94 0,99 1,04 1,09 1,15 1,2 1,26 1,31 1,37 1,43 1,49 7 3 3 3 3 3 3 3 3 3 3 3 3
103 6 2 0 2:00:00 3 4 1 2 241 3 1 4 13 3,15 1,17 1,22 1,27 1,33 1,39 1,45 1,51 1,57 1,64 1,71 1,77 7 3 3 3 3 3 3 3 3 3 3 3 3
103 6 2 0 2:00:00 3 4 1 2 235 2 0 4 15 2,33 1,12 1,17 1,23 1,28 1,34 1,4 1,46 1,52 1,59 1,65 1,72 5 3 3 3 3 3 3 3 3 3 3 3 3
103 6 2 0 2:00:00 3 4 1 2 225 2 3 -1 10 1,67 0,83 0,92 1 1,08 1,17 1,25 1,33 1,42 1,5 1,58 1,67 7 3 3 3 3 3 3 3 3 3 3 3 3
103 6 2 0 2:00:00 3 4 1 2 226 3 0 0 10 2,58 0,89 0,96 1,03 1,11 1,19 1,27 1,35 1,43 1,51 1,59 1,67 7 3 3 3 3 3 3 3 3 3 3 3 3
103 6 2 0 2:00:00 3 4 1 2 227 2 3 2 8 2,23 1 1,08 1,15 1,23 1,31 1,39 1,47 1,56 1,64 1,72 1,8 8 3 3 3 3 3 3 3 3 3 3 3 3
Appendix E
Proofs
To be proven
To prove: the Weighted Taxicab Distance (WTD), given by the formula:

$$WTD(x,y) = \sum_i w_i \frac{|r_i(x_i) - r_i(y_i)|}{|V_i|}$$

is a distance measure.
Proof
Each of the three conditions will be proven separately below.
1. The proof of the first condition is split into two parts: first, it is proven that
the WTD distance is a value greater than or equal to zero; then, it is proven
that $WTD(x,y) = 0$ iff $x = y$.
(a)
$$WTD(x,y) = \sum_i w_i \frac{|r_i(x_i) - r_i(y_i)|}{|V_i|}$$
Observe that $w_i \geq 0$, and that the other terms are absolute values. Thus,
WTD is a sum of terms that are all greater than or equal to zero. QED
(b)
$$WTD(x,y) = \sum_i w_i \frac{|r_i(x_i) - r_i(y_i)|}{|V_i|} = 0$$
A sum of non-negative terms is zero if and only if all terms are zero, that
is: for every $i$, $w_i = 0$ or $|r_i(x_i) - r_i(y_i)| = 0$. As $w_i > 0 \; \forall i$, it follows that
$r_i(x_i) = r_i(y_i) \; \forall i$, which is true if and only if $x = y$. QED
2. To prove: $WTD(x,y) = WTD(y,x)$
$$WTD(x,y) + WTD(y,z) = \sum_i w_i \left( \frac{|r_i(x_i) - r_i(y_i)|}{|V_i|} + \frac{|r_i(y_i) - r_i(z_i)|}{|V_i|} \right)$$
Since $|r_i(x_i) - r_i(y_i)| + |r_i(y_i) - r_i(z_i)| \geq |r_i(x_i) - r_i(z_i)|$, because the absolute
difference $|a - b|$ is itself a distance measure and fulfills condition 3, it follows that:
$$WTD(x,y) + WTD(y,z) \geq \sum_i w_i \frac{|r_i(x_i) - r_i(z_i)|}{|V_i|} = WTD(x,z)$$
QED
From the proof of these three conditions it follows that $WTD(x,y)$ is a distance measure.
QED
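The three conditions can also be checked numerically for a concrete case. A minimal Python sketch, assuming hypothetical ordered value domains $V_i$, $r_i$ taken as the position of a value within $V_i$, and positive weights $w_i$; none of these come from the thesis. It checks, but does not prove, the conditions on all 18 possible descriptions.

```python
import math
from itertools import product

# Hypothetical ordered value domains V_i (illustrative assumptions only).
DOMAINS = [
    ["easy", "medium", "hard"],
    ["low", "medium", "high"],
    ["short", "long"],
]
WEIGHTS = [2.0, 1.0, 0.5]  # assumed positive weights w_i


def rank(i, value):
    """Assumed r_i: the position of `value` in the ordered domain V_i."""
    return DOMAINS[i].index(value)


def wtd(x, y):
    """Weighted Taxicab Distance: sum over i of w_i * |r_i(x_i) - r_i(y_i)| / |V_i|."""
    return sum(
        WEIGHTS[i] * abs(rank(i, x[i]) - rank(i, y[i])) / len(DOMAINS[i])
        for i in range(len(DOMAINS))
    )


def check_metric(dist, points, tol=1e-12):
    """Numerically check the three distance conditions on every triple of points."""
    for x, y, z in product(points, repeat=3):
        d = dist(x, y)
        assert d >= 0                               # condition 1(a)
        assert (d == 0) == (x == y)                 # condition 1(b)
        assert math.isclose(d, dist(y, x))          # condition 2
        assert d + dist(y, z) >= dist(x, z) - tol   # condition 3
    return True


POINTS = list(product(*DOMAINS))  # all 3 * 3 * 2 = 18 possible descriptions
print(wtd(("easy", "high", "long"), ("hard", "low", "short")))
print(check_metric(wtd, POINTS))
```

The small tolerance in condition 3 only absorbs floating-point rounding; the inequality itself holds exactly.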
Proof
Each of the three conditions will be proven separately below.
1. The proof of the first condition is split into two parts: first, it is proven that
the WED distance is a value greater than or equal to zero; then, it is proven
that $WED(x,y) = 0$ iff $x = y$.
(a) Recall the definition of $WED(x,y)$:
$$WED(x,y) = \sqrt{\sum_i w_i \left( \frac{r_i(x_i) - r_i(y_i)}{|V_i|} \right)^2}$$
Observe that $w_i \geq 0$, and that the other terms under the square root
are squared, i.e. always greater than or equal to zero. Thus, the sum under the
square root consists of terms that are all greater than or equal to zero, hence
the sum, and therefore its square root $WED(x,y)$, is also greater than or equal to zero.
QED
(b) The square root is zero if and only if the sum under it is zero, and a sum of
non-negative terms is zero if and only if all terms are zero, that is: for every $i$,
$w_i = 0$ or $(r_i(x_i) - r_i(y_i))^2 = 0$. As $w_i > 0 \; \forall i$, this holds if and only if
$r_i(x_i) = r_i(y_i) \; \forall i$, that is, if and only if $x = y$.
QED
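$$\sqrt{\sum_i w_i \left( \frac{r_i(x_i) - r_i(y_i)}{|V_i|} \right)^2} + \sqrt{\sum_i w_i \left( \frac{r_i(y_i) - r_i(z_i)}{|V_i|} \right)^2} \geq \sqrt{\sum_i w_i \left( \frac{r_i(x_i) - r_i(z_i)}{|V_i|} \right)^2}$$
Define $r'_i(x) = \frac{\sqrt{w_i}}{|V_i|} r_i(x)$, then the equation is equivalent to:
$$\sqrt{\sum_i \left( r'_i(x_i) - r'_i(y_i) \right)^2} + \sqrt{\sum_i \left( r'_i(y_i) - r'_i(z_i) \right)^2} \geq \sqrt{\sum_i \left( r'_i(x_i) - r'_i(z_i) \right)^2}$$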
This is equivalent to proving the triangle inequality in a (metric) space that
is defined using $r'_i(x)$. Usually, the triangle inequality is proven using the
Schwarz inequality (see for example Lang (1989, p. 101)), which is defined on
vectors in $\mathbb{R}^n$. The proof would thus require defining vectors in $\mathbb{N}^n$, defining
inner products, proving the Schwarz inequality, and then proving the triangle
inequality. This was considered inappropriate for this thesis, so only the
triangle inequality on a uni-dimensional metric space defined using $r'_i(x)$
will be proven here, as this proof does not depend on the Schwarz inequality
and thus does not require defining a vector space. Note that due to the
uni-dimensional nature of the space, the index 'i' will be omitted.
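To prove: $\delta(x,y) + \delta(y,z) \geq \delta(x,z)$, where $\delta(x,y) = \sqrt{(r'(x) - r'(y))^2} = |r'(x) - r'(y)|$ denotes the distance in this uni-dimensional space.
Proof:
$$\delta(x,y) + \delta(y,z) = |r'(x) - r'(y)| + |r'(y) - r'(z)| \geq |(r'(x) - r'(y)) + (r'(y) - r'(z))| = \left| \frac{\sqrt{w}}{|V|} r(x) - \frac{\sqrt{w}}{|V|} r(z) \right| = |r'(x) - r'(z)| = \delta(x,z)$$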
QED
From the proof of these three conditions it follows that $WED(x,y)$ is a distance measure.
QED
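The substitution $r'_i(x) = \frac{\sqrt{w_i}}{|V_i|} r_i(x)$ turns WED into the ordinary Euclidean distance between transformed points, which is why the multi-dimensional triangle inequality follows from the Schwarz inequality. A minimal Python sketch (Python 3.8 or later, for `math.dist`) checks this equivalence and the triangle inequality numerically. The domains, the choice of $r_i$ as the position of a value within $V_i$, and the weights are hypothetical assumptions rather than data from the thesis, and a numerical check is not a proof.

```python
import math
from itertools import product

# Hypothetical ordered value domains V_i and positive weights w_i
# (illustrative assumptions, not data from the thesis).
DOMAINS = [
    ["easy", "medium", "hard"],
    ["low", "medium", "high"],
    ["short", "long"],
]
WEIGHTS = [2.0, 1.0, 0.5]


def rank(i, value):
    """Assumed r_i: the position of `value` in the ordered domain V_i."""
    return DOMAINS[i].index(value)


def wed(x, y):
    """Weighted Euclidean Distance: sqrt(sum_i w_i * ((r_i(x_i) - r_i(y_i)) / |V_i|)^2)."""
    return math.sqrt(sum(
        WEIGHTS[i] * ((rank(i, x[i]) - rank(i, y[i])) / len(DOMAINS[i])) ** 2
        for i in range(len(DOMAINS))
    ))


def transform(x):
    """r'_i(x_i) = sqrt(w_i) / |V_i| * r_i(x_i): maps a description to a point in R^n."""
    return [math.sqrt(WEIGHTS[i]) / len(DOMAINS[i]) * rank(i, x[i])
            for i in range(len(DOMAINS))]


def check_wed(points, tol=1e-12):
    """Check the r' equivalence and the triangle inequality on every triple of points."""
    for x, y, z in product(points, repeat=3):
        # WED equals the plain Euclidean distance between the transformed points ...
        assert math.isclose(wed(x, y), math.dist(transform(x), transform(y)), abs_tol=tol)
        # ... and therefore satisfies the triangle inequality (condition 3).
        assert wed(x, y) + wed(y, z) >= wed(x, z) - tol
    return True


POINTS = list(product(*DOMAINS))  # all 3 * 3 * 2 = 18 possible descriptions
print(check_wed(POINTS))
```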
Appendix F
Glossary
Introduction
This appendix will present a glossary of terms that have not been discussed elabo-
rately in the thesis. The keyword index of this thesis can be used to find passages
relevant to a certain keyword. Note that the purpose of this glossary is to clarify
terms as opposed to defining them. This means that in some cases, accuracy has
been sacrificed for better readability.
A
Abstract Windows Toolkit An Application Programming Interface for Java that
allows Java programs to manipulate objects in a graphical windowing envi-
ronment.
Active Server Pages A server-side scripting technology developed by Microsoft that allows an HTML
page to contain active code that is executed by the web server whenever the
page is requested. This is a form of Server-Side Scripting.
ADSR Attack, Decay, Sustain, Release: four phases of the amplitude of an audio
signal that together describe its “envelope”. For these four phases, the
duration and the “steepness” are specified.
applet a small program (often written in Java) that is part of a WWW page. When
the user visits the page, the applet is downloaded from the WWW server to
the user’s computer, where it is executed.
Authoring System A system in which a person can create and/or compose learn-
ing materials by writing texts, and including (or creating) pictures, video
fragments, and audio fragments.
B
bandwidth the capacity of a network to transmit data.
Bayesian network An abstract structure that allows one to calculate the probability
of a certain outcome based on various pieces of ‘evidence’ that are input into the
network.
BLOB Binary Large Object: a data type which allows large binary data (such as
pictures, video and audio fragments) to be stored in database tables.
C
cardinality The cardinality of a set is the number of elements in that set.
CGI script a program, often written in a scripting language such as Perl, that is
called by a web server according to the Common Gateway Interface set of rules.
CD-ROM a storage unit for digital information, such as movies, pictures, audio,
computer data, computer software, etcetera. The name “CD-ROM” can de-
note both the disc itself (resembling an audio CD) as well as the drive that
reads the disc.
Cocoon An XML publishing engine that executes as a Java servlet in the Apache
webserver.
Common Gateway Interface A set of rules that describes how information is passed
from a web server to an auxiliary program. The program will generate a web page
based on this information, and hand the resulting page back to the web server for
propagation to the user that requested the page.
concept map A ‘network’ in which the nodes are concepts, and the edges between
the nodes are relationships between these concepts.
cookie A small message that a web server can store on the harddisk of a web client.
Cookies are often used to overcome the statelessness of the HTTP protocol.
CPU Central Processing Unit: the main processor at the core of a modern computer.
Almost all modern computers are built according to the von Neumann¹
architecture: a central processing unit that executes instructions stored in a
computer memory.
D
DNS Domain Name System: the system (and the protocol) that allows any com-
puter on the Internet to find the ‘IP address’ (in numbers) that belongs to an
Internet host name. Below the surface, the Internet only works with numbers,
but these are difficult for human beings to remember. Therefore, a translation
service is provided.
download To retrieve information from a remote computer onto the user’s local
computer.
Dublin Core A ‘core’ set of generic metadata fields for online resources.
dynamic documents Documents that are generated ‘on the fly’, that is, at the mo-
ment they are requested by the user. Generally, techniques such as Server-
Side Scripting or CGI-scripts are used for generating the document.
E
educational database A database that stores learning material and/or course man-
agement information.
F
file system a subsystem of an Operating System that is responsible for storing,
organizing, and retrieving files.
File Transfer Protocol An Internet protocol that prescribes how files can be trans-
ferred from one computer to another.
front-end a special component of a software system that interfaces with the user.
G
GHz Gigahertz, a measure of frequency (cycles per second). A Gigahertz is $10^9$
cycles per second; see also ‘MHz’.
GIF The name of a picture compression standard that is often used on the Internet.
GUI Graphical User Interface, a user interface that operates using graphical metaphors
such as a desktop, trashcan, movable icons, etcetera.
H
harddisc a storage unit for digital information that consists of a number of mag-
netic discs that are sealed airtight in a housing. A harddisc is used to perma-
nently store information, such as computer software, documents, and other
data. The data are not lost when the power to the computer is switched off.
hardware the components of a computer system that are physical, such as the key-
board, the monitor, but also every component inside the computer cabinet.
Informally, hardware consists of those parts of a computer that can be kicked.
hypermedia Multimedia material through which the user can navigate using hy-
perlinks.
I
IEEE Institute of Electrical and Electronics Engineers, a large organization of
professionals.
Information Retrieval The discipline that studies and develops methods to store
and retrieve information from databases.
Internet domain A part of the Internet ‘namespace’; see also ‘DNS’.
Internet Protocol Suite The set of protocols that are used to operate the Inter-
net and applications on it, such as: the Internet Control Message Protocol
(ICMP), the User Datagram Protocol (UDP), the Transmission Control Protocol (TCP), and the Hypertext Transfer Protocol (HTTP).
J
Java A platform-independent programming language developed by Sun Microsys-
tems, based on the paradigm “write once, run anywhere”.
JDBC Java Database Connectivity: a standard method for Java programs to access
a database; the functionality resembles ODBC.
JPEG Joint Photographic Experts Group: a group that works on standards for image coding and compression. Its best-known standard is the JPEG format (formally ISO/IEC 10918-1), often called JPG after its file extension.
L
LAN Local Area Network: a computer network that usually spans one building,
or one complex of buildings.
learner control The amount and type of control the learner has over the (computer-
supported) learning process.
LOMG Learning Objects Metadata Group; a working group of the IEEE Learn-
ing Technology Standards Committee (LTSC) that focuses on developing
metadata standards for learning objects.
M
metadata data about a data item, such as who created the data item, at what time, for what purposes, and in which natural language it is expressed.
MHz Megahertz: a measure of frequency, named after the German scientist Heinrich Hertz (1857-1894), who demonstrated the existence of radio waves. One MHz denotes a million cycles per second. The unit is often used to indicate the speed of computers; it then denotes the number of clock cycles a CPU performs per second.
MPEG literally Moving Picture Experts Group, but the term often refers to the audio and video compression algorithms developed by this group.
multimedia often used to denote the use of text, graphics, (digital) video and (dig-
ital) audio data in one application. Used by marketing drones to denote a PC
with a CD-ROM drive and a soundcard.
multimedia database A database that stores multimedia data, such as video frag-
ments, audio clips, images, text, or combinations of these.
N
network port On the Internet, computers (so-called “hosts”) can run many dif-
ferent services, such as mail, ftp, and www. To allow another computer to
identify which service is to be accessed, network port numbers are used. For example, a WWW server usually listens on port 80.
O
OCR see Optical Character Recognition
Operating System The software layer that encapsulates the specific hardware of
a computer system, and that makes these resources available to application
programs in a structured manner.
P
PC see Personal Computer
PHP Hypertext Preprocessor: a script language that allows an HTML page to contain active code that is executed by the web server whenever the page is requested. This is a form of Server-Side Scripting.
ping A program that sends an ICMP echo request packet to an Internet host; the host should return an echo reply if it is up and running.
Powerpoint A widely used presentation tool that runs on the Microsoft Windows
platform. The tool generates slides that can be projected on a screen, much like transparencies on an overhead projector.
prototype A program (or device) that is not fully operable, but that serves to
demonstrate that certain ideas can work in practice, or that serves to develop
new methods or techniques.
Q
Quicktime A proprietary video format, developed by Apple Computer. It supports lossy compression methods with a high compression factor, which makes the format suitable for transmitting video fragments across the Internet.
R
radio button A series of buttons (physical or on-screen) of which at most one at a time can be in the ‘pressed’ or ‘selected’ state, just as the station buttons of a radio let the listener select only one station at a time. This mechanism has been incorporated into most Graphical User Interfaces.
Random Access Memory Computer memory that has the physical form of Integrated Circuits, and that acts as “scratch” memory, as it can be written to and read from very quickly. This type of memory cannot be used for long-term
storage, as its contents will disappear when the power to the computer is
switched off.
S
scalability The ability of a system to adapt to a growing load.
Server-Side Scripting A technique in which the web server executes scripts when
an HTTP request for a certain document (type) is received. The script generates, or helps to generate, the HTML output belonging to the request.
servlet A Java program that helps the web server to deal with certain types of
requests, for example to access a database or to process information the user
has sent. This technique is similar to so-called CGI scripts.
SGML Standard Generalized Markup Language (Goldfarb, 1990): a language in
which markup languages can be specified. The best-known language derived from SGML is the Hypertext Markup Language (HTML).
software a series of computer instructions that tell a computer how to perform a certain task, for example how to behave like a typewriter or how to read a document from a disc. Software is stored on a carrier, such as a harddisc, a floppy disc, a CD-ROM, or a tape.
SMIL Synchronized Multimedia Integration Language: a multimedia presenta-
tion language that allows for time-based multimedia delivery over the web.
It is based on XML.
speech recognition The process of recognizing spoken words by a computer.
stateless A property of a system that indicates that the system does not store ‘state
information’ (this should be seen with respect to the model of Finite State
Machines).
Structured Query Language A language that allows the user to specify a query
to a relational database in a structured way.
SQL see Structured Query Language.
SSI see Server-Side Scripting
summative evaluation An evaluation effort that is directed at obtaining a final
evaluation of the object at hand (see also ‘formative evaluation’).
SunOS The Operating System that runs on most computers manufactured by Sun
Microsystems Inc.
T
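t-test A statistical test based on the t statistic, a stochastic variable that follows Student’s t distribution if the null hypothesis holds. The test is typically used to compare the mean of a sample with a hypothesized population mean, or to compare the means of two groups.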
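For the one-sample case, the statistic is t = (x̄ − μ0) / (s / √n), where x̄ is the sample mean, s the sample standard deviation, n the sample size, and μ0 the hypothesized population mean. The sketch below computes it with Python's standard `statistics` module; the function name and the scores are hypothetical. The resulting value would then be compared with a critical value of Student's t distribution with n − 1 degrees of freedom (for 4 degrees of freedom and a two-sided test at the 5% level, about 2.776).

```python
import math
import statistics
from typing import Sequence


def one_sample_t(sample: Sequence[float], mu0: float) -> float:
    """t statistic for the null hypothesis that the population mean equals mu0."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 in the denominator)
    return (mean - mu0) / (s / math.sqrt(n))


# Hypothetical scores; under H0 the statistic follows Student's t
# distribution with n - 1 = 4 degrees of freedom.
scores = [5, 6, 7, 8, 9]
t = one_sample_t(scores, mu0=5)
print(round(t, 3))  # 2.828
```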
temporal structure The time-related relationships between certain objects, such
as: A starts 10 seconds after B has started.
traceroute A program that tracks the route that Internet packets follow to a certain
Internet host.
U
ULM see Unit of Learning Material
upload To send information from the user’s computer to a remote computer sys-
tem.
V
videodisc A disc that contains analogue video sequences and analogue still pic-
tures; a dedicated videodisc-player is needed to play the discs. It is also
known as “laserdisc”.
W
WAN Wide Area Network: a computer network that spans a number of cities, or
a number of countries.
web browser A program that enables the user to retrieve documents from a web server through an Internet connection and to display them.
web server A process (program that is being executed) on a computer that allows
documents to be retrieved through an Internet connection. The pages are
addressed using a Uniform Resource Locator (URL), and are often written
in HTML.
WWW see World Wide Web
WWW cache A temporary storage facility of WWW documents that have been
fetched in the past; the facility is often built into web browsers. If a page is
requested again, then first the cache is examined to see if the page is still
present, and if so, the cached page is presented. The page does not have to
be re-fetched from the network, thus saving bandwidth and WWW server
capacity.
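The look-up order described above (first the cache, then the network) can be sketched as follows. This is a minimal illustration under simplifying assumptions: the class name and the `fake_fetch` function are hypothetical, and real browser caches also honour expiry headers, revalidate stale pages with the server, and limit their size, none of which is modelled here.

```python
from typing import Callable, Dict


class WWWCache:
    """A minimal in-memory WWW cache keyed by URL (no expiry, no size limit)."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch              # retrieves a page from the network
        self._pages: Dict[str, str] = {}
        self.network_fetches = 0         # how often the network was actually used

    def get(self, url: str) -> str:
        # First examine the cache; only on a miss is the page fetched from the network.
        if url not in self._pages:
            self._pages[url] = self._fetch(url)
            self.network_fetches += 1
        return self._pages[url]


def fake_fetch(url: str) -> str:
    """Hypothetical stand-in for an HTTP request, so the sketch runs offline."""
    return "<html><body>" + url + "</body></html>"


cache = WWWCache(fake_fetch)
cache.get("http://www.example.org/ulm/1")
cache.get("http://www.example.org/ulm/1")  # served from the cache
cache.get("http://www.example.org/ulm/2")
print(cache.network_fetches)  # 2
```

Three requests cost only two network fetches; the saving grows with the number of repeated requests for the same page.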
X
XML eXtensible Markup Language: a markup language that can be used to mark up many different types of documents. It is a simplified subset of SGML, and thus a member of the family of SGML languages.
Index
L1 distance, 90
3NF, 52

Abstract Windows Toolkit, 76
accessibility, 44
Active Server Pages, 2
ADSR, 56
AIME, 21
annotation, 53
Apache, 77, 83
Apache webserver, 73
API, 14, 73
applet, 37
Ariadne, 62
ASCII, 37
ASP, 70, 77
attitude questions, 108
attribute, 35, 51, 53
audio track, 53
Authoring System, 1
Authorware, 37
AWT, 76

Bayesian network, 59
BCNF, 52
belief system, 97
Binary Large Object, 24
BLOB, 24, 84
boolean expression, 55
Boyce-Codd Normal Form, 52

CAE, 1
CAI, 1
cardinality, 91
causal-comparative research, 101
CBL, 1
CBR, 55
CBT, 1
CD-ROM, 13
CGI, 2, 72
classification, 26, 44
CMC, 1
CMI, 1
Cocoon, 77, 79, 82
cognitive overload, 22
Computer Based Training, 1
computer networks, 149
Computer-Assisted Education, 1
Computer-Assisted Instruction, 1
Computer-Based Learning, 1
Computer-Managed Instruction, 1
Computer-Mediated Communication, 1
concept map, 57
conceptions of teaching, 98
concurrent validity, 107, 112
construct validity, 107
content validity, 107
Content-Based Retrieval, 55, 59
context adapters, 42, 45
copyright, 46
correlational research, 101
Course Editor, 75
Course Player, 75
courseware, 43
courseware database, 20
cow, 53
CPU, 3
CPU-intensive, 67
Cronbach’s α, 114

Database Abstraction Layer, 75
Database Management System, 13, 16
database row, 52
database tables, 18
DBMS, 13, 14, 16, 83, 85
  master -, 83
  slave -, 83
descriptive research, 101
DILE, 77
discrete algebra, 128
distance, 56
distance measure, 69, 87, 144
DNS, 83, 84
document
  clusters, 56
  ranking, 55
  similarity, 56, 59
Document Type Definition, 57, 80
drill and practice, 19
DTD, 57
Dublin Core, 26, 60, 62
dynamic document, 70

educational
  context, 116
  database, 19
  metadata, 26
Educational Markup Language, 57
entity, 51
  type, 52
entity type, 53
Equal Weights Distance, 124
Euclidean distance, 90, 91, 94
evidence, 59
experimental research, 102

faculty, 111
familywise error, 115, 117
feature extraction, 56
file system, 15, 18
foreign key, 52
formal features, 56
Formula-M, 43, 46

genericity, 45
geometrical shapes, 54

historical research, 101
history, 46
HTML, 71, 73, 81
HTTP, 68
  cookies, 68
Hypercard, 1
hyperspace, 69
Hypertext Preprocessor, 70
HyTime, 39, 57

ICMP, 139, 149
ideal ULM, 88, 119, 120, 150
IEEE, 62
  LTSC LOMG, 129
  metadata, 103
implementation, 67
IMS, 20
Information Retrieval, 8
Intelligent Tutoring, 19, 38
Intelligent Tutoring System, 1
Interaction Processor, 73
Internet Control Message Protocol, 135
internet domain, 68
interoperability, 43
Inverse Document Frequency, 55
IO-intensive, 67
IR, 8, 18

Java Media Framework, 73
Java Servlet Engine, 77
Java servlets, 2
JDBC, 14, 18, 75, 76, 84

key, 51
keyword indexing, 55
keyword search, 28
knowledge landscape, 38

labeling system, 26, 48
LAN, 6
laserdisc, 10
layout, 45
learned helplessness, 21
learner control, 21, 23
learning object, 29, 32
lost in hyperspace, 23
Lotus Domino, 70
Lotus LearningSpace, 70
LTSC, 6, 62

manual gestures, 52
measure of relevance, 68
Medical Subject Headings, 58
META REFRESH, 84
meta-cognitive capabilities, 23
metadata, 8, 38, 53
  educational -, 55, 60
  field, 25, 44, 89, 119, 144
  IEEE -, 62, 130
  labeling, 25
  labels, 38
  record, 49, 54
  search, 28
  space, 88, 90, 142
  validators, 29
  value, 25, 89, 91
  vocabulary, 26
metric space, 89, 91
MHEG, 31, 39
MHz, 3
modularity, 69
Motion Pictures Expert Group, 3
MPEG, 3, 9, 84
multi-dimensional space, 56
multimedia
  database, 5
  editor, 41
  libraries, 55
MySQL, 70, 72, 76, 83

navigation, 69
Navigation Manager, 75
nearest neighbour problem, 56
network port, 83
neural networks, 56
normalization, 91

ODBC, 14, 18
Operating System, 70
operational definition, 125

PC, 3
PDF, 37
Pearson’s correlation coefficient, 112
pedagogical metadata, 32
peer reviewing, 29
perfect match, 29, 33, 88
Perl, 72
permutation, 128
PHP, 70, 77
pilot test, 108, 148
pitch, 53
portability, 43, 70
PostgreSQL, 76
Powerpoint, 7, 21
predictive validity, 107
Presentation Layer, 73
proposition, 59
prototype, 8, 65, 143

query, 52
query by example, 59
query processor, 54
questionnaire, 111
Quicktime, 3

rank number, 90, 91
RDBMS, 17
relational database, 51
reliability, 114
requirements, 66
retroactive interference, 21
reusability, 42, 87
reuse, 2, 35, 36
  designing for -, 45
  opportunities for -, 45
reuse by design, 2
round-robin, 83, 84

scalability, 67, 83
score, 150
search
  engine, 6, 7, 28
  facility, 44
  interface, 27
search engine, 7
semantic network, 57
semantic ordering, 89
semantics, 54
Server-Side Scripting, 70, 73
servlets, 2, 77
SGML, 37, 39, 57, 73, 75, 80
share and reuse, 2
shopping cart, 68
similarity search, 56
simulation, 20
SMIL, 39
spatial annotations, 54
Spearman’s ρ, 117
speech recognition, 53
SQL, 13, 51, 52, 71, 75
stateless, 68
SunOS, 76

t-test, 115
Taxicab distance, 90
TeleTOP, 70
temporal structure, 53
Third Normal Form, 52
transactions, 58
transcript, 53
TREC, 128

ULM, 2, 8, 10, 35
ULM evaluation form, 144
Unit of Learning Material, 2, 30, 59, 65
Universe of Discourse, 35
Unix, 73
upload, 2
URL, 7, 8
usability, 123, 140
USMARC, 58

validation, 149
validity, 107, 150
vector, 90
Vector Space Model, 56
video scenes, 54
videodisc, 2
visual objects, 53

webserver, 68
weight, 118
weight step, 151
weight vector, 124
Weighted Euclidean Distance, 120, 123, 124, 155
Weighted Taxicab Distance, 120, 123, 124
word frequency, 7, 55
word processor, 7
World Wide Web, 67, 111
WWW, 6
WWW cache, 85

XML, 12, 37, 39, 57, 75, 76, 79
XSL, 79