Reinforcement learning is a form of machine learning that allows a controller (or agent) to learn to optimize the behavior of a control system. In this talk we will study how reinforcement learning can be applied to obtain good robot controllers. First, a description of problems in robot control will be given. Then reinforcement learning will be described. Since a robot usually acts in a high-dimensional continuous space, reinforcement learning will be combined with function approximators. Different combinations and their advantages and limitations will be discussed. Finally, some recent work on averaging reinforcement learning, which leads to more stable learning dynamics when combined with function approximators, will be described.
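To make the combination with function approximators concrete, the sketch below shows standard semi-gradient Q-learning with a linear approximator over hand-crafted features; the environment, features and parameters are placeholder assumptions, and this is the textbook scheme rather than the averaging method discussed in the talk.

```python
import numpy as np

# Minimal sketch (assumed toy environment, not the talk's robot tasks) of the
# standard way to combine reinforcement learning with a function approximator:
# semi-gradient Q-learning with a linear approximator over hand-crafted features
# of a continuous one-dimensional state, and a small discrete action set.

n_actions = 3

def features(state):
    # Simple polynomial features of the continuous state (an assumed choice).
    return np.array([1.0, state, state ** 2])

weights = np.zeros((n_actions, 3))        # one weight vector per action
alpha, gamma, epsilon = 0.05, 0.95, 0.1   # learning rate, discount, exploration rate

def q_value(state, action):
    return weights[action] @ features(state)

def step(state, action):
    # Hypothetical dynamics and reward: actions nudge the state left or right,
    # and the reward favours keeping the state near zero.
    next_state = np.clip(state + (action - 1) * 0.1 + np.random.normal(0, 0.01), -1, 1)
    return next_state, -abs(next_state)

state = np.random.uniform(-1, 1)
for _ in range(10000):
    if np.random.rand() < epsilon:
        action = np.random.randint(n_actions)
    else:
        action = int(np.argmax([q_value(state, a) for a in range(n_actions)]))
    next_state, reward = step(state, action)
    target = reward + gamma * max(q_value(next_state, a) for a in range(n_actions))
    td_error = target - q_value(state, action)
    weights[action] += alpha * td_error * features(state)   # semi-gradient update
    state = next_state

print("learned weights per action:\n", weights)
```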
Spiking neural networks offer a more detailed abstraction of real biological neurons. Research into these models shows how efficient computation in spiking neurons can be achieved. Because these networks closely model biology, they both draw functional ideas and insights from that field and, in turn, deliver predictions and models for biological systems. I will discuss what is different about spiking neural networks, and walk through some recent results in the field regarding supervised, unsupervised and Bayesian learning.
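As a minimal illustration of the kind of neuron model involved (an assumed textbook example, not taken from the talk), the sketch below simulates a leaky integrate-and-fire neuron: the membrane potential integrates an input current, leaks back towards rest, and emits a spike whenever a threshold is crossed.

```python
# Leaky integrate-and-fire neuron with a constant input current.
# All parameter values below are assumed, commonly used textbook numbers.

dt, T = 1e-3, 0.5                 # time step (s), total simulated time (s)
tau_m = 20e-3                     # membrane time constant (s)
v_rest, v_thresh, v_reset = -70e-3, -54e-3, -75e-3   # potentials (V)
r_m = 10e6                        # membrane resistance (ohm)
i_input = 2.0e-9                  # constant input current (A)

v = v_rest
spike_times = []
for step in range(int(T / dt)):
    # Membrane potential leaks towards rest while integrating the input current.
    v += (-(v - v_rest) + r_m * i_input) * dt / tau_m
    if v >= v_thresh:             # threshold crossed: record a spike and reset
        spike_times.append(step * dt)
        v = v_reset

print(f"{len(spike_times)} spikes in {T}s, firing rate ~{len(spike_times)/T:.1f} Hz")
```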
Probabilistic networks, also called belief networks or Bayesian networks, are the underlying technology of several decision support systems. These networks model statistical (in)dependencies between a set of (usually discrete) stochastic variables.
In the talk, the notion of probabilistic network will be explained, together with some applications. The central problem for probabilistic networks is to compute the probability distribution of a variable, conditional upon the values of some other variables (the `observations'). In the talk we explain the main ideas of the most widely used algorithm to solve this `inference' problem for probabilistic networks, i.e., the algorithm by Lauritzen and Spiegelhalter. This algorithm works in linear time when the moralized graph of the probabilistic network has bounded treewidth, which appears to be the case in many practical applications. In general, the inference problem is #P-complete.
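For illustration only (this is not the Lauritzen-Spiegelhalter algorithm), the sketch below solves the inference problem by brute-force enumeration on an assumed three-variable chain A -> B -> C; enumeration is exponential in the number of variables, which is exactly what junction-tree methods avoid when the treewidth is bounded.

```python
# Toy Bayesian network A -> B -> C with assumed conditional probability tables;
# we compute P(A | C = True) by summing out B and normalising.

p_a = {True: 0.3, False: 0.7}
p_b_given_a = {(True, True): 0.8, (True, False): 0.2,
               (False, True): 0.1, (False, False): 0.9}   # keyed (a, b): P(B=b | A=a)
p_c_given_b = {(True, True): 0.7, (True, False): 0.3,
               (False, True): 0.2, (False, False): 0.8}   # keyed (b, c): P(C=c | B=b)

def joint(a, b, c):
    return p_a[a] * p_b_given_a[(a, b)] * p_c_given_b[(b, c)]

# P(A=a, C=True) by summing out B, then normalise over a.
unnorm = {a: sum(joint(a, b, True) for b in (True, False)) for a in (True, False)}
z = sum(unnorm.values())
posterior = {a: p / z for a, p in unnorm.items()}
print("P(A | C=True) =", posterior)
```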
In the talk, we also discuss recent work with Linda van der Gaag and Ad Feelders on the monotonicity of probabilistic networks. In many cases, one assumes from the application that a probabilistic network displays certain monotonicity properties. Two notions of monotonicity are introduced. The complexity of these notions and a heuristic anytime algorithm will be discussed.
Joint work with Nikos Vlassis, Sjaak Verbeek, and Wojciech Zajdel (UvA)
Intelligent houses perceive the world with sensors and must take appropriate actions. Since sensor data are noisy, and the effect of actions may not be deterministic, the state of the observed dynamic system will be uncertain. Bayesian statistics provides a framework tailored to real-world state estimation (inference) problems. In this framework, a priori probabilities relate to the system's state of knowledge, its `belief', and are updated after receiving new observational evidence.
In the tutorial we will gently introduce dynamic Bayesian networks (DBNs). We show that the Kalman filter and the hidden Markov model are special cases of the DBN. Then we show some real problems where sensor data have to be used to track and localize moving objects. We introduce approximation techniques such as particle filtering.
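The sketch below gives a minimal bootstrap particle filter for tracking a single object moving in one dimension; the motion model, sensor model and noise levels are assumed placeholders rather than details from the tutorial.

```python
import numpy as np

# Bootstrap particle filter for 1-D object tracking with noisy position readings.
# The motion and sensor models are assumed stand-ins for a real ambient setting.

n_particles = 500
particles = np.random.uniform(0.0, 10.0, n_particles)   # initial belief over position
weights = np.ones(n_particles) / n_particles

def predict(particles, velocity=0.5, noise=0.2):
    # Propagate each particle through the (assumed) motion model.
    return particles + velocity + np.random.normal(0.0, noise, particles.shape)

def update(particles, weights, measurement, sensor_noise=0.5):
    # Re-weight particles by the likelihood of the observed measurement.
    likelihood = np.exp(-0.5 * ((measurement - particles) / sensor_noise) ** 2)
    weights = weights * likelihood
    return weights / weights.sum()

def resample(particles, weights):
    idx = np.random.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.ones(len(particles)) / len(particles)

true_pos = 2.0
for t in range(20):
    true_pos += 0.5
    measurement = true_pos + np.random.normal(0.0, 0.5)
    particles = predict(particles)
    weights = update(particles, weights, measurement)
    particles, weights = resample(particles, weights)
    estimate = np.average(particles, weights=weights)
    print(f"t={t:2d}  true={true_pos:5.2f}  estimate={estimate:5.2f}")
```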
Joint work with Emile Aarts and Jan Korst (Philips Research).
Ambient intelligence opens a world of unprecedented experiences: the interaction of people with electronic devices will change as context awareness, natural interfaces, and ubiquitous availability of information are realized. Distributed applications and their processing on embedded stationary and mobile platforms play a major role in the realization of ambient intelligent environments. Notions such as media at your fingertips, enhanced-media experiences, and ambient atmospheres denote novel and inspiring concepts that are aimed at realizing specific user needs and benefits, such as personal expression, social presence, and well-being, that seem quite obvious from a human perspective but are quite hard to realize because of their intrinsic complexity and ambiguity. Obviously, the intelligence experienced from the interaction with ambient intelligent environments will be determined to a large extent by the software executed by the distributed platforms embedded in the environment and, consequently, by the algorithms that are implemented by the software. Hence, the concept of ambient intelligence poses major challenges for the design and analysis of intelligent algorithms.
Joint work with Mark Vossen (TUE), Wim Verhaegh, Emile Aarts (Philips Research)
Music listening is an everyday leisure activity for many people, in which they select and sort songs for playback. The profusion of digital music files (e.g., MP3 files) on the Internet and on personal hard disks makes it almost impossible for a music listener to know about and to sort out her music. Given the huge amount of music already available, users need support from technology to create their music playlists and to get more fun out of their music collection.
We consider playlist creation as a shared responsibility between a user and an algorithm.
The user indicates what kind of music should be included in a playlist by specifying constraints using an interactive user interface. The task of an algorithm is then to find a `best possible' playlist that satisfies all or most constraints.
Automatic playlist generation (APG) from a given set of constraints turns out to be an NP-hard problem.
A practical way to solve APG problem instances is therefore to apply a general approach such as local search. In this talk, I will present how we have modelled the problem by translating each constraint into a penalty function. Playlist generation then comes down to finding an approximate solution by minimizing the total penalty. Further, the definition of the neighborhood structure and search heuristics will be addressed. Findings from a user evaluation will be presented, in which the performance of the approach in terms of playlist quality is compared to a classical constraint satisfaction approach. In conclusion, user experiences in a near-future application for a consumer electronics device using these algorithms are discussed.
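A minimal sketch of this penalty-minimization view is given below, with two assumed constraints (a target total duration and a minimum share of one genre) standing in for the constraints a user would actually specify; the neighborhood move simply replaces one song.

```python
import random

# Playlist generation as penalty minimisation by simple local search.
# The song library and the two constraints are assumed toy examples.

library = [{"id": i, "genre": random.choice(["jazz", "rock", "pop"]),
            "duration": random.randint(120, 400)} for i in range(200)]

def penalty(playlist):
    p = 0.0
    # Constraint 1: total duration should be close to 1800 seconds.
    p += abs(sum(s["duration"] for s in playlist) - 1800) / 60.0
    # Constraint 2: at least half of the songs should be jazz.
    jazz = sum(1 for s in playlist if s["genre"] == "jazz")
    p += max(0, len(playlist) / 2 - jazz)
    return p

def local_search(size=10, iterations=5000):
    playlist = random.sample(library, size)
    best = penalty(playlist)
    for _ in range(iterations):
        candidate = playlist[:]
        # Neighborhood move: replace one randomly chosen song by another
        # (this toy version does not bother to prevent duplicates).
        candidate[random.randrange(size)] = random.choice(library)
        cand_pen = penalty(candidate)
        if cand_pen <= best:                 # accept non-worsening moves
            playlist, best = candidate, cand_pen
    return playlist, best

playlist, total_penalty = local_search()
print("total penalty of the generated playlist:", total_penalty)
```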
Joint work with Srinivas Gutta and Wim Verhaegh (Philips Research).
As a result of the increasing availability of TV programs, it is becoming quite a challenge to select the programs to watch (or to record). The advent of electronic program guides (EPGs) already makes it possible to apply filtering techniques to focus one's attention on only part of the guide, but it remains a daunting task to make a good selection.
Machine learning techniques provide additional and more advanced ways to aid in the selection of which programs to watch. By analyzing a user's viewing behavior to obtain implicit program ratings, or by having the user explicitly rate a number of programs, one obtains ratings that, together with the EPG metadata associated with the programs, provide a training set from which to generate a user profile. This profile can subsequently be used to recommend new programs, i.e. to distinguish between interesting and uninteresting programs.
Naive Bayesian classification (NBC) is a machine learning algorithm that is well suited to this recommendation task. It is intuitive, which is an advantage for the user when some feedback on the recommendation results is desired; it is computationally efficient; and it is competitive with other machine learning techniques such as decision trees and artificial neural networks.
NBC operates on occurrence counts of classes and feature values in the training set, such as the number of interesting programs of the genre drama or the number of interesting programs starring Richard Gere. However, the occurrence of small counts has a detrimental effect on the accuracy of the classifier. This can be the result of a small training set, which is typically the case in the initial phase of using the recommender, but it may also occur because some feature values simply occur infrequently.
The use of Laplace correction is a well-known solution to deal with zero-counts, but a more general approach is required to deal with small, non-zero counts. By considering the training set as a set of randomly chosen programs, it is possible to statistically analyze the operation of NBC, leading to a notion of confidence in the classification of a single program. This makes it possible to enrich the classifier in several ways.
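The sketch below illustrates this count-based classifier with Laplace correction on an assumed toy training set of programmes; it does not include the confidence notion introduced in the talk.

```python
from collections import defaultdict

# Count-based naive Bayesian classification with Laplace correction.
# The training data below are assumed toy programmes, not real viewing histories.

train = [
    ({"genre": "drama", "channel": "1"}, "interesting"),
    ({"genre": "drama", "channel": "2"}, "interesting"),
    ({"genre": "sports", "channel": "1"}, "uninteresting"),
    ({"genre": "news", "channel": "2"}, "uninteresting"),
]

class_counts = defaultdict(int)
feature_counts = defaultdict(int)            # keyed by (class, feature, value)
for features, label in train:
    class_counts[label] += 1
    for f, v in features.items():
        feature_counts[(label, f, v)] += 1

def classify(features, laplace=1.0):
    n = sum(class_counts.values())
    scores = {}
    for c, cc in class_counts.items():
        score = cc / n                       # class prior
        for f, v in features.items():
            # Laplace correction: add `laplace` to every count so that unseen
            # (zero-count) feature values do not force the whole product to zero.
            n_values = len({val for (_, feat, val) in feature_counts if feat == f})
            score *= (feature_counts[(c, f, v)] + laplace) / (cc + laplace * max(n_values, 1))
        scores[c] = score
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

print(classify({"genre": "drama", "channel": "1"}))
```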
In this talk, I will briefly outline the context and explain how NBC works, mathematically as well as by way of an example, thereby highlighting the problem I will tackle. Then, I will introduce this notion of confidence and show in what ways it can be incorporated into the classifier.
Using viewing histories of eleven users, collected in the period 1999-2001, I will present some results on the effectiveness of one of the proposed approaches.
Joint work with Aukje van Duijnhoven, Jan Korst and Pim Tuyls (Philips Research).
Due to the increased availability of digital content and the explosive growth of the Internet, people are confronted with an overload of information. For instance, the amount of music available through the Internet is far too large for a user to cope with. One of the approaches to help people make good selections of content is the use of recommender systems. These systems estimate to what extent a user would like the available content, based on the user's likes and dislikes for previously encountered content. A well-known technique to do so is collaborative filtering, also known as social filtering, which uses the preferences of a community of users to predict the preference of a particular user for a particular piece of content. Collaborative filtering systems are found, for example, on the Internet at music sites to recommend new music, and at book sites to recommend new books.
One of the issues with collaborative filtering is that it requires the preferences of a community of users for doing the computations. Generally, this information is collected on a server, and web services applying collaborative filtering are usually accompanied by a privacy statement. However, if the preference information is really personal (e.g. in case of medical information), users may not be willing to give this information, because of a lack of trust in the server. Furthermore, there may be other reasons why users do not want to reveal preference information.
Therefore, we developed a system that prevents any information about a user's preferences from becoming known to others. This not only means that we keep the user's ratings for items secret, but even which items he has rated. Furthermore, we do not even reveal this kind of information anonymously, as we do not want to run the risk that the identity of the user is somehow traced back, after which his data would be in the clear. Finally, as similarities between users also give information about a user's preferences, we keep this data secret as well.
The presentation consists of three main parts. First, we discuss the techniques and formulas behind collaborative filtering. Second, we discuss the encryption system that we are going to use, and some special properties it has. Third, we combine the previous two ingredients and show how collaborative filtering can be done on encrypted data.
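For reference, the sketch below shows the standard memory-based collaborative-filtering prediction on plaintext data (Pearson-correlation-weighted deviations from user means), with assumed toy ratings; in the system described in the talk these same computations are carried out on encrypted data instead.

```python
import numpy as np

# Memory-based collaborative filtering: predict a user's unknown rating from the
# ratings of similar users, weighted by Pearson correlation. Toy data assumed.

ratings = {                     # user -> {item: rating}
    "alice": {"song1": 5, "song2": 3, "song3": 4},
    "bob":   {"song1": 4, "song2": 2, "song4": 5},
    "carol": {"song2": 5, "song3": 2, "song4": 1},
}

def pearson(u, v):
    common = set(ratings[u]) & set(ratings[v])
    if len(common) < 2:
        return 0.0
    a = np.array([ratings[u][i] for i in common], dtype=float)
    b = np.array([ratings[v][i] for i in common], dtype=float)
    if a.std() == 0 or b.std() == 0:
        return 0.0
    return float(np.corrcoef(a, b)[0, 1])

def predict(user, item):
    mean_u = np.mean(list(ratings[user].values()))
    num, den = 0.0, 0.0
    for other in ratings:
        if other == user or item not in ratings[other]:
            continue
        sim = pearson(user, other)
        mean_o = np.mean(list(ratings[other].values()))
        num += sim * (ratings[other][item] - mean_o)   # similarity-weighted deviation
        den += abs(sim)
    return mean_u if den == 0 else mean_u + num / den

print("predicted rating of alice for song4:", round(predict("alice", "song4"), 2))
```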
The presentation is concerned with real-time scheduling for media processing in software in high-volume consumer multimedia terminals, such as digital TV sets, digitally enhanced analog TV sets and set-top boxes. Media processing in software is performed using powerful programmable components. In order to be competitive with dedicated hardware solutions, these programmable components have to be used very cost-effectively. Moreover, software solutions should preserve existing qualities of these systems, such as robustness, predictability, and stability. An existing QoS approach to address this challenge is based on a combination of application adaptation and QoS-based resource management, where the latter combines a multi-layer control hierarchy with a reservation-based resource manager that provides (absolutely) guaranteed and enforced resource budgets. Moreover, budget allocation below worst-case is required for cost-effectiveness reasons.
These requirements, combined with the QoS approach, result in a substantial reaction delay of the system upon structural load changes in the media data, which gives rise to perceivable quality degradations of the media output. The presentation addresses the context, analyses the problem, and presents a solution based on so-called conditionally guaranteed budgets (CGBs). This novel concept of a CGB is explained, a so-called `in-the-place-of' design for CGBs is illustrated, and means to analyze systems using CGBs are briefly presented.
Joint work with Clemens Wüst, Reinder Bril, Christian Hentschel and Liesbeth Steffens (Philips Research).
Consumer terminals, such as set-top boxes and digital TV sets, currently apply dedicated hardware components to process video. In the near future, programmable hardware with video processing in software is expected to take over for reasons of flexibility. The problem with this, however, is that software video processing shows highly fluctuating processing requirements, resulting in a worst-case load that is significantly higher than the average-case load. A safe (worst-case) resource allocation is however too costly, so one has to resort to a lower, close to average-case resource allocation.
To ensure that video processing tasks do not interfere with each other, each task gets a guaranteed resource budget. Then the problem for each task is to get by with its budget, while producing timely output at regular intervals. To this end, we introduce two means. The first one is working ahead, for which some buffering of input and output data is required. This allows a task to spread load peaks over time. The second means is scalable processing: if a task risks missing a deadline, it can choose to reduce the quality of the processing of some video frames, in order to save time and meet its deadline.
The problem we now address is how to choose the quality level for each frame such that the experienced quality is as high as possible. This quality of experience is determined by three components: the quality of processing (which should be as high as possible), deadline misses (which should be avoided), and quality changes (which should also be minimized, to avoid the quality jumping up and down).
The above quality control problem can be seen as a stochastic optimization problem. Therefore, we first formulate it as a Markov decision problem. Based on statistical information about processing times, which we derive from processing traces of real video material, we can off-line determine a control strategy that maximizes the average quality of experience. This strategy is then used on-line to select the appropriate quality levels. Secondly, to become more independent of the used video processing statistics, we derive a reinforcement learning approach, which in a sense is an on-line variant of the Markov decision approach. By exploiting the characteristics of the problem area, we can adapt the reinforcement learning approach to learn very fast, thereby achieving nearly the same performance as that of off-line determined strategies.
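The sketch below gives a much-simplified version of the off-line Markov decision formulation: states are discretized slack levels (how far a task is working ahead of its deadline), actions are quality levels, and the per-quality processing-time distributions are assumed numbers rather than statistics from real video traces; the penalty on quality changes is omitted to keep the state one-dimensional.

```python
import numpy as np

# Value iteration for a toy version of the quality-control MDP.
# All distributions, rewards and penalties below are assumed illustrative values.

slack_levels = range(6)                   # 0 = about to miss a deadline, 5 = far ahead
qualities = [0, 1, 2]                     # low, medium, high processing quality
quality_reward = {0: 1.0, 1: 4.0, 2: 6.0}
deadline_penalty, gamma, budget = 50.0, 0.95, 1
# P(frame costs k slack units | quality level), assumed distributions.
cost_dist = {0: {0: 0.7, 1: 0.3},
             1: {0: 0.3, 1: 0.5, 2: 0.2},
             2: {1: 0.4, 2: 0.4, 3: 0.2}}

def action_value(s, q, V):
    val = 0.0
    for cost, p in cost_dist[q].items():
        nxt, reward = s + budget - cost, quality_reward[q]
        if nxt < 0:                       # deadline miss: penalise and drop the frame
            nxt, reward = 0, reward - deadline_penalty
        val += p * (reward + gamma * V[min(nxt, 5)])
    return val

V = np.zeros(6)
for _ in range(500):                      # value iteration
    V = np.array([max(action_value(s, q, V) for q in qualities) for s in slack_levels])

policy = [max(qualities, key=lambda q: action_value(s, q, V)) for s in slack_levels]
print("chosen quality level per slack state:", policy)
```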
In the presentation, we will first discuss the different aspects of the above quality control problem. Next, we discuss the Markov decision approach and the reinforcement learning approach, together with a simple technique to handle long-term load fluctuations. We end with simulation experiments based on real video data, and show that the developed approaches perform close to a theoretical upper bound.
An overview is given of basic algorithmic problems from the field of computational molecular biology. With the possibility of reading the DNA code, a vast amount of data has become available for analysis. The use of computers to handle this information is essential. We also illustrate how discrete models can help in understanding some of these problems.
One of the most recent trends in data mining research is to develop algorithms for the analysis of (semi-)structured databases, i.e., databases that cannot sensibly be stored in a single relational table. One useful step in the analysis of these databases is the discovery of substructures that satisfy user-defined constraints. The best-known such constraint is the frequency constraint. In this talk we will introduce mining algorithms that search for structures satisfying predefined anti-monotone constraints, and discuss the complexity issues that are involved in the development of such algorithms. Most attention will be given to a frequent graph mining algorithm that we recently developed. The issues at hand will be illustrated using an application in chemistry.
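To illustrate the role of anti-monotone constraints, the sketch below performs level-wise frequent itemset mining (the simplest setting) on assumed toy transactions; frequent graph mining follows the same level-wise scheme but additionally requires canonical forms and subgraph-isomorphism tests.

```python
from itertools import combinations

# Level-wise search under the anti-monotone frequency constraint: every subset of
# a frequent set is itself frequent, so candidates are built only from sets that
# were frequent one level below. The transaction data are assumed toy examples.

transactions = [{"C", "H", "O"}, {"C", "H"}, {"C", "O"}, {"C", "H", "O", "N"}, {"H", "O"}]
min_support = 3

def support(itemset):
    return sum(1 for t in transactions if itemset <= t)

items = {i for t in transactions for i in t}
level = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
frequent = list(level)

while level:
    prev = set(level)
    # Join frequent sets from the previous level into candidates one item larger.
    candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == len(a) + 1}
    # Anti-monotone pruning: a candidate can only be frequent if all of its
    # subsets one item smaller are already known to be frequent.
    candidates = [c for c in candidates if all(c - {x} in prev for x in c)]
    level = [c for c in candidates if support(c) >= min_support]
    frequent.extend(level)

for s in frequent:
    print(sorted(s), "support =", support(s))
```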
We present a new method for clustering based on compression. The method does not use subject-specific features or background knowledge, and works as follows. First, we determine a universal similarity distance, the normalized compression distance or NCD, computed from the lengths of compressed data files (singly and in pairwise concatenation). Second, we apply a hierarchical clustering method. The NCD is universal in that it is not restricted to a specific application area and works across application area boundaries. A theoretical precursor, the normalized information distance, co-developed by one of the authors, is provably optimal but uses the non-computable notion of Kolmogorov complexity. We propose precise notions of a similarity metric and of a normal compressor, and show that the NCD based on a normal compressor is a similarity metric that approximates universality.

To extract a hierarchy of clusters from the distance matrix, we determine a dendrogram (binary tree) by a new quartet method and a fast heuristic to implement it. The method is implemented and available as public software, and is robust under the choice of different compressors. To substantiate our claims of universality and robustness, we report evidence of successful application in areas as diverse as genomics, virology, languages, literature, music, handwritten digits, astronomy, and combinations of objects from completely different domains, using statistical, dictionary, and block-sorting compressors. In genomics we present new evidence for major questions in mammalian evolution, based on whole-mitochondrial genomic analysis: the Eutherian orders and the Marsupionta hypothesis versus the Theria hypothesis.
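The sketch below computes the NCD with zlib as an (assumed) stand-in compressor; the published experiments use statistical, dictionary and block-sorting compressors, and the clustering step in the talk uses the quartet-tree heuristic rather than the NCD matrix alone.

```python
import zlib

# Normalized compression distance using zlib (an assumed choice of compressor):
# NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
# where C(.) is the compressed length.

def c(data: bytes) -> int:
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    cx, cy, cxy = c(x), c(y), c(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

# Assumed toy objects from different "domains"; similar objects compress well together.
objects = {
    "english_1": b"the quick brown fox jumps over the lazy dog " * 20,
    "english_2": b"a lazy dog sleeps while the quick brown fox runs " * 20,
    "dna_like":  b"ACGTACGTTTGACCAGT" * 50,
}

names = list(objects)
for a in names:
    row = "  ".join(f"{ncd(objects[a], objects[b]):.3f}" for b in names)
    print(f"{a:>10}: {row}")
```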
Wireless sensor networks offer enormous challenges for researchers and developers, and the same applies to marketers. In this presentation we will discuss the research programme of the TNO Physics and Electronics Laboratory on wireless sensor networks. We will focus on the analysis of the needs of our customers, as well as the application of existing products in the network.
We will focus on possible applications for Defense and Public Safety, as well as examples for the process industry.
Wireless and mobile networks are an excellent playground for researchers with an algorithmic background. Many research problems turn out to be variants of classic graph theory problems. In particular, the rapidly growing areas of ad hoc and sensor networks demand new solutions for timeless graph theory problems, because i) wireless devices have lower bandwidth and ii) wireless devices are mobile, so the topology of the network changes rather frequently. As a consequence, algorithms for wireless and mobile networks should i) use as little communication as possible and ii) run as fast as possible. Both goals can only be achieved by developing algorithms that require only a small number of communication rounds (so-called "local" algorithms). In this talk I present a few algorithmic applications in wireless networking.
In the talk, we introduce the notion of robustness for optimization algorithms on restricted domains. Generally speaking, robust algorithms are designed for restricted classes of instances, e.g. graphs with certain additional properties, but accept a wider range of input, e.g. all undirected graphs. A robust algorithm then either solves the problem for instances from the restricted class, or provides a certificate that the input does not belong to that class.
Specifically, as examples of robust algorithms for wireless communication graphs, we use polynomial-time approximation schemes (PTASs) that compute maximum independent sets and minimum dominating sets in geometric intersection graphs (without a geometric representation or localization). These algorithms accept any undirected graph as input, and either return a (1 + epsilon)-approximate independent or dominating set, or a certificate showing that the input instance is not a wireless communication graph. Note that the corresponding recognition problem of deciding whether a given instance is such a graph is NP-hard.
A mobile ad hoc network is an autonomous collection of mobile nodes communicating over wireless channels. Because no centralized control is present, algorithmic problems are more difficult than for general wireless networks. One way to simplify these challenges is to consider a graph model for mobile ad hoc networks. We describe the (unit) disk graph model and show that several relevant optimization problems can be approximated if a geometric representation of the graph is available. We discuss an existing 5-approximation algorithm and a PTAS for the maximum independent set problem on (unit) disk graphs. The PTAS uses the well-known shifting technique.
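For concreteness, the toy sketch below builds a unit disk graph from random points (an edge joins two nodes whose centers are at most distance 1 apart) and extracts an independent set with a simple leftmost-first greedy scan of the geometric representation; this only illustrates the model, and the 5-approximation and the shifting-based PTAS discussed in the talk are considerably more refined.

```python
import math
import random

# Build a unit disk graph from random node positions and compute an independent
# set greedily, scanning centers from left to right. Point counts, the area and
# the seed are assumed toy values.

random.seed(0)
points = [(random.uniform(0, 5), random.uniform(0, 5)) for _ in range(60)]

def adjacent(p, q):
    # Unit disk graph model: nodes are adjacent iff their centers are within distance 1.
    return math.dist(p, q) <= 1.0

independent = []
for p in sorted(points):                       # leftmost center first
    if all(not adjacent(p, q) for q in independent):
        independent.append(p)

print(f"{len(independent)} mutually non-adjacent nodes out of {len(points)}")
```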
We then propose two new notions for unit disk graphs: thickness and density. Thickness is the maximum number of disk centers in a slab of width 1, and density is the maximum number of disk centers in a 1-by-1 square. We show that a unit disk graph of bounded thickness has bounded pathwidth. We also present an algorithm that solves Maximum Independent Set in polynomial time if the thickness is log n. We combine this algorithm with the shifting technique to construct an asymptotic FPTAS for Maximum Independent Set on unit disk graphs of bounded density.
The self-localization of nodes in ad hoc (and sensor) networks has gained a lot of attention lately. This is a response to the fact that positioning information is essential for providing context awareness in a wireless sensor network, while existing solutions such as global positioning systems are often unsuitable because of their high cost (in terms of money and resources) and may be unavailable or imprecise in indoor environments.
Information such as connectivity, distance estimates based on radio signal strength, sound intensity, time of flight, and angle of arrival has been used successfully to determine the position of each node with varying accuracy, most of the time using localized computation. The position information, once obtained, is used not only for stamping the sensed data, but also in designing the networking protocols, for example leading to more efficient routing schemes based on the estimated positions of the nodes.
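One basic range-based idea is sketched below: an unknown node estimates its position from noisy distance estimates to a few anchor nodes with known positions, via linearized least squares; the anchor layout, true position and noise level are assumed toy values.

```python
import numpy as np

# Range-based localization by linearised least squares (multilateration).
# Anchors, the true position and the noise level are assumed toy values.

anchors = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
true_pos = np.array([3.0, 7.0])
rng = np.random.default_rng(1)
dists = np.linalg.norm(anchors - true_pos, axis=1) + rng.normal(0, 0.2, len(anchors))

# Subtracting the equation of the first anchor from the others linearises the
# system ||x - a_i||^2 = d_i^2 into A x = b.
a0, d0 = anchors[0], dists[0]
A = 2 * (anchors[1:] - a0)
b = (d0 ** 2 - dists[1:] ** 2
     + np.sum(anchors[1:] ** 2, axis=1) - np.sum(a0 ** 2))
estimate, *_ = np.linalg.lstsq(A, b, rcond=None)
print("estimated position:", np.round(estimate, 2), " true position:", true_pos)
```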
In this talk I will present the current state of the art of the field, describing some of the basic ideas used in the existing algorithms (for both the static and mobile scenarios). I will outline the main identified problems and some of the existing bounds for the localization algorithms in general.
The application of Game Theory to networks, particularly to routing in networks, is attracting growing interest in the computer science community. The reason is that large-scale communication networks like the Internet cannot be managed by a central authority. This motivates the analysis of network traffic using models from Game Theory in which single packets or data streams are managed by selfish agents trying to minimize their individual latencies.
One of the common assumptions in this research area is that the routing process should arrive at a Nash equilibrium, in which no network user has an incentive to change its strategy. It is well known (and easy to see) that Nash equilibria do not always optimize the overall performance of the system. The so-called "price of anarchy" or "coordination ratio" measures the degradation of performance due to uncoordinated behavior, by comparing the latency experienced in a worst-case Nash equilibrium with the latency in an optimal solution. We give an overview of the results and techniques in this area.
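A standard textbook illustration (not necessarily the example used in the talk) is Pigou's two-link network with one unit of traffic, one link of constant latency 1 and one link whose latency equals its load:

```latex
\begin{align*}
\text{Nash equilibrium:}\quad & \text{all traffic uses the } \ell(x)=x \text{ link, cost } 1 \cdot 1 = 1,\\
\text{Optimum:}\quad & \text{split the traffic } \tfrac12/\tfrac12, \text{ cost } \tfrac12\cdot\tfrac12 + \tfrac12\cdot 1 = \tfrac34,\\
\text{Price of anarchy:}\quad & \frac{1}{3/4} = \frac{4}{3}.
\end{align*}
```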
Classical Game Theory is based on the assumptions that agents are fully rational and have global knowledge about all details of the game under study. For routing games in large networks like the Internet, these assumptions seem to be quite unrealistic. Evolutionary Game Theory takes a different approach to explain why large populations of agents may or may not "converge" towards equilibrium states: Agents try to improve their situation by replicating the locally observed strategies of other more successful agents. We present an evolutionary model for selfish routing and study questions concerning the stability of equilibria and the time of convergence towards them.
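The canonical dynamics for such imitation-driven populations is the replicator dynamics shown below; whether this exact form is the one analysed in the talk is an assumption, and for routing one typically takes the fitness of a route to be the negative of its latency.

```latex
% Replicator dynamics: the population share x_i of strategy i grows in proportion
% to how much its fitness f_i exceeds the population-average fitness.
\dot{x}_i \;=\; x_i \Bigl( f_i(x) - \textstyle\sum_{j} x_j\, f_j(x) \Bigr),
\qquad x_i = \text{population share using strategy } i .
```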
In the presentation, we introduce the concepts and use of competitive software agents, multiagent systems, and economic games. In particular, we discuss various types of economic games, such as negotiations and auctions, as well as the design of appropriate games (or "mechanisms") for specific application settings. Subsequently, we address the possibilities for developing intelligent algorithmic solutions for individual or multiple agents participating in such games. We also show several concrete application cases and examples.
For various e-commerce applications, autonomous agents can do the actual trading on behalf of their users. We consider an agent that trades repeatedly on behalf of its user, given an overall budget and preferences per time step, both specified at the start. For many e-commerce settings such an agent has limited computational resources, limited prior information concerning price fluctuations, and little time for online learning. We therefore develop an efficient heuristic that requires little prior information to work well from the start, even for very rough, nonsmooth problem instances. Extensive computer experiments conducted for a wide variety of customer preferences show virtually no difference in performance between a dynamic programming (DP) approach and the developed heuristic carrying out the agent's task. The DP approach, however, has the important drawback of generally being too computationally intensive.
We present a novel system for selling bundles of news items. Through the system, customers bargain with the seller over the price and quality of the delivered goods. The advantage of the developed system is that it allows for a high degree of flexibility in the price, quality, and content of the offered bundles. The price, quality, and content of the delivered goods may, for example, differ based on daily dynamics and personal interest of customers.
Autonomous "software agents" execute the negotiation on behalf of the users of the system. To perform the actual negotiation these agents make use of bargaining strategies. We present the novel approach of decomposing bargaining strategies into concession strategies and Pareto efficient search strategies. Additionally, we introduce the orthogonal and orthogonal-DF strategy: two Pareto search strategies. We show through computer experiments that the use of these Pareto search strategies will result in very efficient bargaining outcomes. Moreover, the system is setup such that it is actually in the best interest of the customer to have their agent adhere to this approach of disentangling the bargaining strategy.
Application 1: Intermediaries in an electronic trade network
We address the question whether intermediaries will still exist and be able to make a profit if consumers can make direct connections with producers, as is often the case in electronic commerce. We have performed agent-based simulations to study the performance of intermediaries under different market conditions. We modeled an electronic trade network over which an information good is traded. In each trade period, cost-minimizing consumers have to decide which links to form to sellers (i.e., producers and intermediaries), while the good can only be purchased if a link between the buyer and the seller exists. Links thus represent trading possibilities and flows of relevant information (in our case, price quotes between potential buyers and sellers), and the consumers have to make a strategic decision about which (costly) links to form. We have used an evolutionary algorithm to model the search and learning behavior of the buyers. Our main finding is that if market dynamics are sufficiently complex, intermediaries that have better knowledge about the market than the average consumer are initially able to increase their market share and make a profit.
Application 2: The spread of information on a social network
The network consists of agents that are exposed to the introduction of a new product. Consumers decide whether or not to buy the product based on their own preferences and the decisions of their neighbors in the social network. We have used and extended concepts from the literature on epidemics and herd behavior to study this problem. The central question is whether firms can learn about the network structure and consumer characteristics when only limited information is available, and use this information to evolve a successful directed advertising strategy. To do so, we have extended existing models to allow for heterogeneous agents and for positive as well as negative externalities. The firm can learn a directed advertising strategy that takes into account both the topology of the social consumer network and the characteristics of the consumers. Such directed advertising strategies outperform random advertising.
We study the possibilities of automatically extracting information from the Internet, by combining data that can be found on many potentially relevant web pages. We assume that a knowledge domain is given by an ontology that specifies the classes of objects that are relevant in the domain and the relations between instances of these classes. Given example instances for each of the classes and example statements for each of the relations, we want to automatically extend these example sets by combining data from different web pages.
Experimental results indicate that good results (high precision and recall) can be obtained in this way, provided that enough web pages can be found that relate to the knowledge domain.
Since the Semantic Web was conceived by Tim Berners-Lee, many researchers have become aware of its enormous potential. A mesh of information linked together in a format that machines can efficiently process puts a global-scale knowledge base within reach of a wide range of applications. However, given the size and nature of the Semantic Web, approximation techniques may be needed to unleash its full potential.
First, approximation can be used to reduce complexity: it is well known that there is a trade-off between the expressiveness of a language and the complexity of reasoning in it. Second, information on the Web may not always be correctly defined (e.g., when scraped from directories). Approximate methods can be used to cope with erroneous, inconsistent, or missing information.
In this presentation we will focus on which kinds of approximation techniques can be used in the context of the Semantic Web. In particular we will focus on a technique called "approximate deduction". One of its members, called S-1/S-3 entailment, will be described in more detail. Furthermore, we describe in some detail how S-1/S-3 entailment can be applied to some of the Semantic Web inference tasks.