I am a postdoctoral researcher in the Department of Mathematics and Computer Science at Eindhoven University of Technology. I obtained my PhD in computer science at Télécom Paris and Nokia Bell Labs France, where I was supervised by Thomas Bonald and Fabien Mathieu.
My current research focuses on the design and analysis of resource-management algorithms for large server pools, such as computer clusters and content delivery networks, subject to stochastic demand.
Profiles:
Google Scholar,
HAL,
GitHub,
CV
Recent highlights of my research work:
The growing demand for cloud-based services encourages operators to maximize resource efficiency within computer clusters. This motivates the development of new technologies that make resource management more flexible. However, exploiting this flexibility to reduce the number of computers also requires efficient resource-management algorithms that have a predictable performance under stochastic demand. In this thesis, we design and analyze such algorithms using the framework of queueing theory.
Our abstraction of the problem is a multi-server queue with several customer classes. Servers have heterogeneous capacities and the customers of each class enter the queue according to an independent Poisson process. Each customer can be processed in parallel by several servers, depending on compatibility constraints described by a bipartite graph between classes and servers, and each server applies the first-come-first-served policy to its compatible customers. We first prove that, if the service requirements are independent and exponentially distributed with unit mean, this simple policy yields the same average performance as balanced fairness, an extension of processor-sharing known to be insensitive to the distribution of the service requirements. A more general form of this result, relating order-independent queues to Whittle networks, is also proved. Lastly, we derive new formulas to compute performance metrics.
These theoretical results are then put into practice. We first propose a scheduling algorithm that extends the principle of round-robin to a cluster where each incoming job is assigned to a pool of computers by which it can subsequently be processed in parallel. Our second proposal is a load-balancing algorithm based on tokens for clusters where jobs have assignment constraints. Both algorithms are approximately insensitive to the job size distribution and adapt dynamically to demand. Their performance can be predicted by applying the formulas derived for the multi-server queue.
Efficiently exploiting the resources of data centers is a complex task that requires efficient and reliable load balancing and resource allocation algorithms. The former are in charge of assigning jobs to servers upon their arrival in the system, while the latter are responsible for sharing the server resources between their assigned jobs. These algorithms should adapt to various constraints, such as data locality, that restrict the feasible job assignments. In this presentation, we propose a token-based algorithm that efficiently balances the load between the servers without requiring any knowledge of the job arrival rates and server capacities. Assuming a balanced fair sharing of the server resources, we show that the resulting dynamic load balancing is insensitive to the job size distribution. Its performance is compared to that obtained under the best static load balancing and in an ideal system that would constantly optimize the resource utilization. We also make the connection with other token-based algorithms such as Join-Idle-Queue.
Understanding the performance of a pool of servers is crucial for proper dimensioning. One of the main challenges is to take into account the complex interactions between servers that are pooled to process jobs. In particular, a job can generally not be processed by any server of the cluster due to various constraints like data locality. In this talk, we will represent these constraints by some assignment graph between jobs and servers. We will present a recursive approach to computing performance metrics like mean response times when the server capacities are shared according to balanced fairness. While the computational cost of these formulas can be exponential in the number of servers in the worst case, we will illustrate their practical interest by introducing broad classes of pool structures that can be exactly analyzed in polynomial time. This considerably extends the class of models for which explicit performance metrics are accessible.
We consider a network of multi-class multi-server queues. Each job can be processed in parallel by any subset of servers within a pre-defined set that depends on its class. Each server is allocated in FCFS order at each queue. Jobs arrive according to Poisson processes, have independent exponential service requirements and are routed independently at random. We prove that, when stable, the network has a product-form stationary distribution. From a practical perspective, we propose an algorithm on this basis to allocate the resources of a computer cluster according to balanced fairness. We finally examine further developments of this model to allow for dynamic adaptation to the system occupancy.
The Python Data Analysis Library (pandas) provides data structures and tools for manipulating and analyzing data. It is built on top of NumPy, the core package for numerical computations in the Scientific Computing Tools for Python (SciPy) ecosystem. The objective of this presentation is threefold: explain when one can benefit from using pandas, describe its fundamental data structures, and give an overview of the data analysis tools it provides.
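As a toy illustration of the kind of tooling covered in the presentation (the column names and data below are invented for the example):

```python
import pandas as pd

# Build a small DataFrame from a dictionary of columns.
jobs = pd.DataFrame({
    "server": ["a", "a", "b", "b", "b"],
    "size": [1.0, 2.5, 0.5, 3.0, 1.5],
})

# Group-by/aggregate: mean job size per server.
mean_size = jobs.groupby("server")["size"].mean()

print(mean_size["a"])  # (1.0 + 2.5) / 2 = 1.75
```

The split-apply-combine pattern shown here (`groupby` followed by an aggregation) is one of the main reasons to reach for pandas rather than raw NumPy arrays.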
The need for matching items with one another while meeting assignment constraints or preferences gave rise to several well-known problems like the stable marriage and roommate problems. While most of the literature on matching problems focuses on a static setting with a fixed number of items, several recent works incorporated time by considering stochastic models, in which items of different classes arrive according to independent Poisson processes and assignment constraints are described by an undirected non-bipartite graph on the classes. In this paper, we prove that the continuous-time Markov chain associated with this model has the same transition diagram as in a product-form queueing model called an order-independent loss queue. This allows us to adapt existing results on order-independent queues to stochastic matching models, and in particular to derive closed-form expressions for several performance metrics, like the waiting probability or the mean matching time, that can be implemented using dynamic programming. Both these formulas and the numerical results that they allow us to derive are used to gain insight into the impact of parameters on performance. In particular, we characterize performance in a so-called heavy-traffic regime in which the number of items of a subset of the classes goes to infinity while items of other classes become scarce.
Order-independent (OI) queues, introduced by Berezner, Kriel, and Krzesinski in 1995, expanded the family of multi-class queues that are known to have a product-form stationary distribution by allowing for intricate class-dependent service rates. This paper further broadens this family by introducing pass-and-swap (P&S) queues, an extension of OI queues where the customer that completes service is not necessarily the customer that leaves the system. More precisely, we supplement the OI queue model with an undirected graph on the customer classes, which we call a swapping graph, such that there is an edge between two classes if customers of these classes can be swapped with one another. When a customer completes service, it passes over customers in the remainder of the queue until it finds a customer it can swap positions with, that is, a customer whose class is a neighbor in the graph. In turn, the customer that is ejected from its position takes the position of the next customer it can swap with, and so on. This is repeated until a customer can no longer find another customer to swap with; that customer is the one that leaves the queue. After proving that P&S queues have a product-form stationary distribution, we derive a necessary and sufficient stability condition for (open networks of) P&S queues that also applies to OI queues. We then study irreducibility properties of closed networks of P&S queues and derive the corresponding product-form stationary distribution. Lastly, we demonstrate that closed networks of P&S queues can be applied to describe the dynamics of new and existing load-distribution and scheduling algorithms in machine pools.
One of the key features of small-world networks is the ability to route messages in a few hops, using a decentralized algorithm in which each node has a limited knowledge of the topology. In 2000, Kleinberg proposed a model based on an augmented grid that asymptotically exhibits such a property.
In this paper, we revisit the original model with the help of numerical simulations. Our approach is fueled by a new algorithm that can sample augmenting links in almost constant time. The speed gain allows us to perform detailed numerical evaluations. We first observe that, in practice, the augmented scheme proposed by Kleinberg is more robust than what is predicted by the asymptotic behavior, even in very large finite grids. We also propose tighter bounds on the asymptotic performance of Kleinberg's greedy routing algorithm. We finally show that, if the model is fed with realistic parameters, the results are in line with real-life experiments.
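For readers unfamiliar with the model, greedy routing on an augmented grid can be sketched in a few lines. This is a simplified illustration, not the paper's sampling algorithm: it redraws each node's shortcut lazily when the node is visited, using plain rejection sampling with acceptance probability proportional to d(u, v)^(-r).

```python
import random

def manhattan(u, v):
    """Grid (L1) distance between two nodes."""
    return abs(u[0] - v[0]) + abs(u[1] - v[1])

def sample_shortcut(u, n, r=2.0):
    # Rejection sampling of one augmenting link for node u:
    # candidate v is accepted with probability d(u, v) ** (-r).
    while True:
        v = (random.randrange(n), random.randrange(n))
        d = manhattan(u, v)
        if d > 0 and random.random() < d ** (-r):
            return v

def greedy_route(source, target, n, r=2.0):
    """Number of greedy hops from source to target on an n x n grid:
    at each step, forward to the known neighbor closest to the target."""
    hops, u = 0, source
    while u != target:
        x, y = u
        neighbors = [(x + dx, y + dy)
                     for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                     if 0 <= x + dx < n and 0 <= y + dy < n]
        neighbors.append(sample_shortcut(u, n, r))  # long-range link
        u = min(neighbors, key=lambda v: manhattan(v, target))
        hops += 1
    return hops
```

Since some local neighbor always decreases the distance to the target by one, the routing terminates in at most d(source, target) hops; Kleinberg's result is that with r = 2 the expected hop count is polylogarithmic in n.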
Efficiently exploiting the resources of data centers is a complex task that requires efficient and reliable load balancing and resource allocation algorithms. The former are in charge of assigning jobs to servers upon their arrival in the system, while the latter are responsible for sharing the server resources between their assigned jobs. These algorithms should adapt to various constraints, such as data locality, that restrict the feasible job assignments. In this paper, we propose a token-based algorithm that efficiently balances the load between the servers without requiring any knowledge of the job arrival rates and the server capacities. Assuming a balanced fair sharing of the server resources, we show that the resulting dynamic load balancing is insensitive to the job size distribution. Its performance is compared to that obtained under the best static load balancing and in an ideal system that would constantly optimize the resource utilization. We also make the connection with other token-based algorithms such as Join-Idle-Queue.
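To make the connection with Join-Idle-Queue concrete, here is a minimal sketch of that dispatching rule (a toy illustration of JIQ only, not the token-based algorithm proposed in the paper): idle servers deposit a token at the dispatcher, and an incoming job follows a token when one is available, falling back to a uniformly random server otherwise.

```python
import random
from collections import deque

class JoinIdleQueue:
    """Toy Join-Idle-Queue dispatcher (illustration only)."""

    def __init__(self, n_servers):
        self.n_servers = n_servers
        self.tokens = deque()            # ids of servers that reported idle
        self.queue_len = [0] * n_servers

    def report_idle(self, server):
        # A server deposits a token when it becomes idle.
        self.tokens.append(server)

    def dispatch(self):
        # Follow a token if one is available, else pick a server at random.
        if self.tokens:
            server = self.tokens.popleft()
        else:
            server = random.randrange(self.n_servers)
        self.queue_len[server] += 1
        return server
```

The appeal of such token-based schemes, as in the paper, is that the dispatcher needs no knowledge of arrival rates or server capacities: the tokens themselves carry the relevant state.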
Efficiently exploiting the resources of data centers is a complex task that requires efficient and reliable load balancing and resource allocation algorithms. The former are in charge of assigning jobs to servers upon their arrival in the system, while the latter are responsible for sharing server resources between their assigned jobs. These algorithms should take account of various constraints, such as data locality, that restrict the feasible job assignments. In this paper, we propose a token-based mechanism that efficiently balances load between servers without requiring any knowledge of job arrival rates and server capacities. Assuming a balanced fair sharing of the server resources, we show that the resulting dynamic load balancing is insensitive to the job size distribution. Its performance is compared to that obtained under the best static load balancing and in an ideal system that would constantly optimize the resource utilization.
Stochastic network calculus is a tool for computing error bounds on the performance of queueing systems. However, deriving accurate bounds for networks consisting of several queues or subject to non-independent traffic inputs is challenging. In this paper, we investigate the relevance of the tools from analytic combinatorics, especially the kernel method, to tackle this problem. Applying the kernel method allows us to compute the generating functions of the queue state distributions in the stationary regime of the network. As a consequence, error bounds with an arbitrary precision can be computed. In this preliminary work, we focus on simple examples which are representative of the difficulties that the kernel method allows us to overcome.
Understanding the performance of a pool of servers is crucial for proper dimensioning. One of the main challenges is to take into account the complex interactions between servers that are pooled to process jobs. In particular, a job can generally not be processed by any server of the cluster due to various constraints like data locality. In this paper, we represent these constraints by some assignment graph between jobs and servers. We present a recursive approach to computing performance metrics like mean response times when the server capacities are shared according to balanced fairness. While the computational cost of these formulas can be exponential in the number of servers in the worst case, we illustrate their practical interest by introducing broad classes of pool structures that can be exactly analyzed in polynomial time. This considerably extends the class of models for which explicit performance metrics are accessible.
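For reference, one standard way the balanced-fairness recursion underlying such computations is written (the notation here is mine and the paper's exact formulas may differ): for a state $x = (x_1, \ldots, x_K)$ of per-class job counts, the balance function $\Phi$ satisfies

```latex
\Phi(\mathbf{0}) = 1, \qquad
\Phi(x) = \frac{1}{\mu(\mathcal{A}(x))} \sum_{i \,:\, x_i > 0} \Phi(x - e_i),
\qquad
\pi(x) \propto \Phi(x) \prod_{i} \rho_i^{x_i},
```

where $\mathcal{A}(x)$ is the set of active classes, $\mu(\mathcal{A}(x))$ the aggregate capacity of the servers compatible with them, $e_i$ the $i$-th unit vector, and $\rho_i$ the load of class $i$. The worst-case exponential cost mentioned above comes from the number of states $x$ this recursion may visit.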
We represent a computer cluster as a multi-server queue with some arbitrary graph of compatibilities between jobs and servers. Each server processes its jobs sequentially in FCFS order. The service rate of a job at any given time is the sum of the service rates of all servers processing this job. We show that the corresponding queue is quasi-reversible and use this property to design a scheduling algorithm achieving balanced fair sharing of the computing resources.
We consider a system of processor-sharing queues with state-dependent service rates. These are allocated according to balanced fairness within a polymatroid capacity set. Balanced fairness is known to be both insensitive and Pareto-efficient in such systems, which ensures that the performance metrics, when computable, will provide robust insights into the real performance of the system considered. We first show that these performance metrics can be evaluated with a complexity that is polynomial in the system size if the system is partitioned into a finite number of parts, so that queues are exchangeable within each part and asymmetric across different parts. This in turn allows us to derive stochastic bounds for a larger class of systems which satisfy less restrictive symmetry assumptions. These results are applied to practical examples of tree data networks, such as backhaul networks of Internet service providers, and computer clusters.
Traffic modeling is key to the dimensioning of data networks. Usual models rely on the implicit assumption that each user generates data flows in series, one after the other, with the ongoing flows equitably sharing the network link under consideration. We relax this assumption and consider the more realistic case where users may generate several data flows in parallel, with these flows also having to share the user's access line. We qualify this model as multi-source since each user now behaves as an independent traffic source. Usual performance metrics like mean throughput and congestion rate must now be defined at user level rather than at flow level. We derive explicit expressions for these performance metrics under the assumption that flows share bandwidth according to balanced fairness. These results are compared with those obtained by simulation when max-min fairness is imposed, either at flow level or at user level.
Optimizing the utilization of data centers requires efficient and reliable algorithms that balance the load across machines while adapting to various constraints, such as data locality, that restrict the feasible assignments. In this paper, we propose a simple algorithm that performs this task without requiring any knowledge of the job arrival rates or the machine capacities. Its key property, verified using a queueing model, is its insensitivity to the job size distribution: if each machine applies the processor-sharing service discipline and jobs arrive according to a Poisson process, then performance depends on the job size distribution only through its mean. Preliminary numerical results on job rejection rates and machine occupancy further indicate that this algorithm outperforms the best static algorithm and comes close to an ideal algorithm that would constantly optimize resource utilization.
Stochastic network calculus is a tool for computing error bounds on the performance of queueing networks. Deriving accurate bounds for networks consisting of several queues or subject to non-independent arrivals is a delicate exercise. In this paper, we assess the relevance of tools from analytic combinatorics for tackling this problem. In particular, we apply the kernel method to express the generating functions of the queue state distributions, which allows us to compute error bounds of arbitrary precision on the steady-state behavior. In this preliminary work, we focus on simple examples that are nevertheless representative of the difficulties the kernel method allows us to overcome. These results are validated by simulations.
Cloud computing infrastructures rely on clusters of servers that share incoming jobs. Dimensioning them requires predicting their performance. One of the main challenges is to account for the complex interactions between servers when they are pooled to process jobs in parallel, while integrating constraints, such as data locality, that restrict the feasible assignments. To analyze them, we represent these constraints by an assignment graph between jobs and servers; resources are shared according to balanced fairness, which has the advantage of making the system performance insensitive to the job size distribution. Our main contribution is a new recursive approach for computing performance metrics, which consists in decomposing the system by switching off servers one after the other. Although the complexity of the resulting formulas can be exponential in the cluster size in the worst case, we illustrate their practical interest by identifying broad families of structures for which it becomes polynomial. This considerably extends the set of systems for which explicit performance metrics are accessible.
Long a source of anecdotes, routing in small worlds was studied in 2000 by Kleinberg using a simple model, the Kleinberg grid, that captures the influence of the construction of long-range links, the shortcuts, on the navigability of a graph. While the original model has been enriched by numerous asymptotic analyses and extensions, numerical results on simulated grids have been limited by the complexity of computing the shortcuts. The main contribution of this paper is a new dynamic double-rejection sampling method that enables detailed numerical analysis. It becomes possible to take a Kleinberg grid the size of the universe, and the rest of the paper shows a few applications: evidence of a certain robustness of the routing; new asymptotic bounds; an observation of the six degrees of separation.
We consider a cluster of servers processing jobs in parallel. While customers generally benefit from having their jobs processed by as many servers as possible, the impact of parallelism on the servers is less clear: too little parallelism prevents full use of the available resources, while too much risks needlessly cluttering the servers with waiting jobs. We study this phenomenon using a queueing model in which jobs arrive according to a Poisson process and require amounts of service that follow an exponential distribution. Each new job is assigned to a number of servers chosen uniformly at random, independently of the system state. Each server processes its jobs in their order of arrival. We show that there is a degree of parallelism that minimizes the mean number of jobs at each server. This optimal degree is of the order of the square root of the number of servers under low to moderate load, and decreases down to two under very high load.