Work Package 1

P2P distribution systems analysis and modeling

The project concentrates on content distribution applications (CDAs); standard file-sharing applications are not considered. The difference between file sharing and a CDA is simple. In the former, the system is dominated by the search phase: the number of existing files is very large, their location is not known, and the download phase is assumed to be ideal. In the latter, the search phase is either non-existent, as in push services, or easy because the "content" is popular; the system is instead dominated by the download phase, because the number of clients that want the content is huge, because those clients are widely dispersed, or simply because the network is not ideal. CDAs fall into two major classes: time-sensitive and delay-tolerant applications. This WP focuses on theoretical analyses aimed at finding models for different applications, in order to identify common parts and enable the design of modular systems. The results of the WP will be:

  • Models (behavioral, performance, stochastic, etc.) describing P2P applications;
  • A classification of applications (possible classification axes are sketched below);
  • Decomposition of applications and systems into simpler tasks.
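As a purely illustrative hint of what such a classification could look like, the sketch below (in Python; the class, the boolean axes, and the instance names are ours and merely restate distinctions drawn in this section) records the main dimensions along which this WP separates applications:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class AppProfile:
        """Hypothetical classification axes for a P2P distribution application."""
        search_dominated: bool   # file sharing: locating content is the hard part
        time_sensitive: bool     # streaming/conversational vs. delay-tolerant
        bidirectional: bool      # conversational (meshed) vs. streaming (tree-based)
        loss_tolerant: bool      # voice/video tolerate losses; data does not

    VOICE_CONFERENCE = AppProfile(False, True, True, True)
    VIDEO_STREAMING  = AppProfile(False, True, False, True)
    SOFTWARE_PATCHES = AppProfile(False, False, False, False)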

Task 1.1: Time-sensitive applications

There are two large categories of time-sensitive applications: streaming and conversational. Both include voice and video, while the conversational class can also include distributed computing, mission-critical tasks, etc.: the conversation is no longer between humans but between computers, yet the application requirements remain tight in terms of delay and jitter. Moreover, unlike voice and video, these applications are not loss-tolerant. Fig. 4 illustrates the main difference between streaming and conversational applications from the P2P networking point of view: streaming is mono-directional and tree-based, while conversations are bi-directional and meshed.

Fig. 4: Topological difference between streaming and conversation
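To give a quantitative feel for the contrast in Fig. 4, the toy calculation below (a sketch under our own assumptions: a balanced tree of configurable fanout, and a conference connecting every pair of parties; all function names are ours) counts the overlay links each topology requires:

    import math

    def streaming_tree_links(n_peers: int) -> int:
        """A distribution tree spanning n peers uses n - 1 overlay links:
        every peer except the source receives from a single parent."""
        return n_peers - 1

    def streaming_tree_depth(n_peers: int, fanout: int) -> int:
        """Approximate depth of a balanced fanout-ary tree, i.e. the hop
        count from the source (a rough bound on the added latency)."""
        return math.ceil(math.log(n_peers, fanout))

    def full_mesh_links(n_peers: int) -> int:
        """A full conversational mesh connects every pair of peers."""
        return n_peers * (n_peers - 1) // 2

    # 1000 receivers: the tree needs 999 links (depth ~5 with fanout 4),
    # while a hypothetical 1000-party full mesh needs 499500 links.
    print(streaming_tree_links(1000), streaming_tree_depth(1000, 4),
          full_mesh_links(1000))

The quadratic growth of the mesh is what makes the full-mesh conferencing solution discussed below non-scalable.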

Streaming services are tree-based because the audio/video information propagates from one source to many users. However, even if the service is based on a logical tree multicast, the application providing the service may well use a mesh-based distribution topology, as when using Multiple Description Coding (MDC), where each Description can reach the final user via a different path. To the best of our knowledge, the definition of proper models to represent a mesh-based MDC distribution process, as well as their analysis, has never been undertaken. We stress that research in this project will not focus on signal processing techniques to obtain MDC; rather, it will focus on modeling the protocols, algorithms, and topological properties that allow MDC to be exploited at its best.

Most people think of conversational services as telephony: point-to-point and human-based. Yet, as depicted in Fig. 4 b), a "conversation" can involve many parties, as in any conferencing application. Additionally, many aspects of what is known as "grid computing" imply multi-point conversational services between the computing units. We are interested in defining the fundamental properties of different classes of conversational services. To start with, voice-, video-, and data-based conversations have very different constraints and properties. To focus the point, consider Skype voice conferencing. Skype's choice is to avoid a mesh topology entirely: a "mixing node" is elected as the root of a spanning tree reaching the other conferencing points, and it merges the voice flows from the different users into a single stream. First, this technique does not scale, since the mixing node is heavily loaded (Skype's choice of limiting conferences to four parties is a clear symptom). Second, mixing can be a solution for voice, where superimposed signals remain intelligible, but it is not feasible for data and video.

The first step of the research will be the analysis of different "meshing" strategies to support different conversational applications. Ideally, conferencing implies a full mesh between all end-points. This is neither a scalable nor a feasible solution, thus different strategies must be devised; for voice, for instance, a multi-mixing hierarchical topology might be feasible (see the sketch below). There is far less experience with data conversational applications, but it can be envisioned that these will be based on messaging, so that an application-layer full mesh is conceivable, albeit not simple to obtain.
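The following toy model (our own sketch; the fanin bound and the balanced-tree arrangement are assumptions, not project choices) quantifies why hierarchical multi-mixing may scale where a single mixer does not:

    import math

    def mixing_costs(n_parties: int, fanin: int):
        """Compare a single Skype-style mixer with a hierarchical
        multi-mixing tree in which every mixer merges at most `fanin`
        incoming voice flows into one outgoing stream.

        Returns (single_mixer_load, num_mixers, load_per_mixer, tree_depth).
        """
        single_mixer_load = n_parties      # one node decodes and merges all flows
        num_mixers, depth, level = 0, 0, n_parties
        while level > 1:                   # build the mixing tree level by level
            level = math.ceil(level / fanin)
            num_mixers += level
            depth += 1
        return single_mixer_load, num_mixers, fanin, depth

    # 64 parties with fanin 4: one mixer would carry 64 flows, whereas
    # 21 mixers each carry at most 4 flows over a 3-level tree.
    print(mixing_costs(64, 4))             # (64, 21, 4, 3)

Per-mixer load stays bounded by the fanin, at the price of a logarithmic number of mixing levels, i.e., of extra delay: exactly the kind of trade-off the models developed in this task must capture.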

Task 1.2: Delay-tolerant applications

The RUs working on Task 1.2 are UniTn and UniTo. After an initial common part, UniTn and UniTo will tackle complementary aspects. As a first step, a detailed analysis of the fundamental parameters that influence the distribution process will be carried out. The result of this preliminary study will be a classification of existing delay-tolerant applications. This part is extremely important as input to the design work envisioned in WP2, where it is of the utmost importance to identify common features of applications and reusable protocols and components. In this initial phase the two RUs will also identify specific applications of particular interest for fine-tuning the general model devised, in order to be able to predict behavior and performance. For instance, the same general modeling technique can be applied to the distribution of instant messages and of software patches (both delay-tolerant applications), but the technique might need specialization for the two different cases.

The UniTn RU has recently gained experience in modeling generic distribution architectures. This early experience will be shared with the partners to perform theoretical studies on the properties of fast distribution architectures, which will be the starting point for further research by both RUs. Afterwards, UniTn will focus on a thorough performance evaluation of different schemes using simulations and analytical techniques. Analysis is a means to obtain guidelines for setting the system parameters. Simulation is the tool of choice for accurate performance evaluations, for the comparison of different schemes, as a means to validate the analytical results, and to fine-tune the analytical models themselves.

In parallel, the UniTo RU will focus on a complementary problem: the optimized distribution of a specific popular content, initially available at one or a few peers, to a large set of peers, exploiting the features of the "chunking" mechanism (i.e., the partition of the content into pieces of moderate size, distributed in parallel). Assuming complete knowledge of the system parameters, and using the download time as the performance metric, the research unit will formalize content distribution as a deterministic optimization problem, obtaining a formal definition of the optimal distribution policy. We expect this formalization to fall in the class of NP-hard problems. In this phase, the goal will be the identification of qualitative properties that must be satisfied by optimal distribution policies. In a second phase, the UniTo RU will devise simple and efficient approximations (heuristics) of the optimal content distribution policy that are amenable to distributed implementation (a bridge towards Task 2.3). Finally, the performance of the approximated algorithms and protocols suitable for implementation will be carefully evaluated, and the performance of the heuristics will also be compared with that of the optimal policy (on instances of moderate size). We envision that in this phase of the project the analytical tools developed by UniTn can be used by UniTo.
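As a concrete starting point for this line of work, the following toy discrete-time simulation (a sketch under strong simplifying assumptions: unit-capacity uplinks and downlinks, a single initial seed, global knowledge, and a rarest-first heuristic; all function names are ours) measures the download time of a chunked distribution in rounds:

    import random

    def distribution_rounds(num_peers: int, num_chunks: int, seed: int = 0) -> int:
        """Toy round-based model of chunked content distribution: one
        initial seed holds all chunks; each peer can upload one chunk and
        receive one chunk per round (unit-capacity links); uploaders offer
        their rarest chunk to the most deprived peer that misses it."""
        rng = random.Random(seed)
        have = [set() for _ in range(num_peers)]
        have[0] = set(range(num_chunks))            # peer 0 is the seed
        rounds = 0
        while any(len(h) < num_chunks for h in have):
            rounds += 1
            deliveries, busy = [], set()            # busy: receivers already served
            for p in rng.sample(range(num_peers), num_peers):
                if not have[p]:
                    continue                        # nothing to offer yet
                rarity = {c: sum(c in h for h in have) for c in have[p]}
                for c in sorted(have[p], key=rarity.get):
                    targets = [q for q in range(num_peers)
                               if q != p and q not in busy and c not in have[q]]
                    if targets:
                        q = min(targets, key=lambda t: len(have[t]))
                        deliveries.append((q, c))
                        busy.add(q)
                        break
            for q, c in deliveries:                 # transfers apply at round end
                have[q].add(c)
        return rounds

For instance, distribution_rounds(16, 8) completes in roughly num_chunks + log2(num_peers) rounds, rather than the roughly num_chunks * log2(num_peers) rounds a store-and-forward relay of the whole content would need: this pipelining effect is precisely what makes chunking attractive, and what the optimization model above is meant to capture exactly.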

Task 1.3: Security issues in P2P content distribution

The research units involved in Task 1.3, which addresses security and privacy from an application-modeling point of view, are UniFi and UniTo. Security issues in P2P networks cannot disregard the specific applications and must take into account topology, hierarchy, and trust relationships. A P2P network can offer heterogeneous services, and for each service appropriate security precautions must be taken (access control, authentication, ...). P2P networks also exhibit specific problems related to their cooperative nature. Some of these issues have been tackled in different contexts, such as communication protocols for ad-hoc networks. In particular, issues concerning routing, service availability, message filtering, and resilience against attacks from inside the network have been addressed in the literature on wireless mesh networks.

For any type of offered service, P2P networks should guarantee some form of access control: this can be a strict requirement with strong user identification (e.g., cooperative work on sensitive data) or a loose one requiring just some form of "suitability" control (e.g., video streaming). Access to the service cannot be delegated to a single centralized authentication server without partly relinquishing the P2P paradigm. The research work will concentrate on the development of authentication algorithms taking into account two key factors. First, the choice of the credentials (identification tokens, biometric authentication algorithms, etc.) must trade off ease of access against security. Second, the computational burden of the algorithm must strike a good compromise between the algorithm's robustness and the system's vulnerability.

If centralized authentication is not suited to P2P networks, an entirely distributed approach might raise concerns about the protection of authentication information. In general, highly distributed solutions might have problems related to the integrity of authentication data under high churn rates (churn being the normal process of users joining and leaving a P2P network). In this case, services and applications might become insecure and unavailable at the same time. In order to blend the best properties of the centralized and distributed approaches, resource replication protocols will be investigated (a back-of-the-envelope availability model is sketched below).
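A back-of-the-envelope model (our own illustration, assuming peers go on- and off-line independently of each other) shows why replication is the natural lever against churn:

    def record_availability(p_online: float, replicas: int) -> float:
        """Probability that at least one of `replicas` copies of an
        authentication record sits on a currently-online peer, under
        an i.i.d. peer-availability (churn) model."""
        return 1.0 - (1.0 - p_online) ** replicas

    # Even with peers online only 30% of the time, ten replicas already
    # give ~97% availability: record_availability(0.3, 10) ≈ 0.972

The price, of course, is that every replica is an additional point where the authentication data can be attacked, which is exactly the integrity concern raised above.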

In networks where access control is less important than resilience and dynamism, the advantages achievable by distributed authentication schemes will be thoroughly investigated and evaluated. Distributed Certification Authorities based on threshold cryptography algorithms have been proposed in the literature. Such algorithms allow a single authentication server to be replaced with a group of n cooperating hosts, guaranteeing satisfactory system operation provided that no more than k hosts misbehave ("k over n" models). A specific research topic stemming from this approach concerns the possible exploitation of "k over n" models in P2P networks, both as an alternative to centralized authentication and via the design of appropriate cooperation strategies (a sketch of the underlying secret-sharing primitive is given below).

In the area of security, attention will also be devoted to the robustness of the algorithms for routing and for the association between keys and their corresponding values. In particular, the interaction with routing algorithms can lead to logical partitioning of the network and/or to an excessive load on some nodes, thus making some resources unreachable. A possible approach to these problems, which the UniTo RU is going to pursue, is based on the use of metrics related to the nodes' reputation. Such metrics are updated during the network's lifetime; in this way, each node is judged according to its past behavior and more reliable routes are preferred. This approach also helps in solving the problem of "free riders," i.e., users that do not actively contribute to the network's activities but only try to grab the maximum of the available resources. Clearly, this approach must face issues such as the possible overloading of the nodes with the highest reputation.
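To make the "k over n" idea concrete, here is a minimal sketch of Shamir secret sharing, one standard primitive behind threshold-cryptography distributed CAs (the field size, names, and parameters are illustrative choices, not project decisions): any k of the n shares reconstruct the secret, so a shared CA key survives unavailable hosts as long as k remain reachable, while fewer than k colluding hosts learn nothing.

    import random

    PRIME = 2**127 - 1        # a Mersenne prime; all arithmetic is modulo PRIME

    def make_shares(secret: int, k: int, n: int, rng=None):
        """Split `secret` into n shares such that any k reconstruct it:
        shares are points of a random degree-(k-1) polynomial whose
        constant term is the secret. Illustration only: a real system
        would draw coefficients from a cryptographic RNG."""
        rng = rng or random.Random()
        coeffs = [secret] + [rng.randrange(PRIME) for _ in range(k - 1)]
        def poly(x):
            acc = 0
            for c in reversed(coeffs):    # Horner evaluation mod PRIME
                acc = (acc * x + c) % PRIME
            return acc
        return [(x, poly(x)) for x in range(1, n + 1)]

    def reconstruct(shares):
        """Lagrange interpolation at x = 0 recovers the constant term."""
        secret = 0
        for xi, yi in shares:
            num = den = 1
            for xj, _ in shares:
                if xj != xi:
                    num = num * (-xj) % PRIME
                    den = den * (xi - xj) % PRIME
            secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
        return secret

    shares = make_shares(secret=1234567890, k=3, n=5)
    assert reconstruct(shares[:3]) == 1234567890   # any 3 of 5 shares suffice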