
Finalyearstudentsproject - 2014 PROJECT LISTS


NETWORKING AND SECURITY:

1.           ON WIRELESS SCHEDULING ALGORITHMS FOR MINIMIZING THE QUEUE-OVERFLOW PROBABILITY
In this paper, we are interested in wireless scheduling algorithms for the downlink of a single cell that can minimize the queue-overflow probability. Specifically, in a large-deviation setting, we are interested in algorithms that maximize the asymptotic decay rate of the queue-overflow probability as the queue-overflow threshold approaches infinity. We first derive an upper bound on the decay rate of the queue-overflow probability over all scheduling policies. We then focus on a class of scheduling algorithms collectively referred to as the “α-algorithms”. For a given α ≥ 1, the α-algorithm picks for service at each time the user with the largest product of the transmission rate and the backlog raised to the power α. We show that when the overflow metric is appropriately modified, the minimum cost to overflow under the α-algorithm can be achieved by a simple linear path and can be written as the solution of a vector-optimization problem. Using this structural property, we then show that as α approaches infinity, the α-algorithms asymptotically achieve the largest decay rate of the queue-overflow probability. Finally, this result enables us to design scheduling algorithms that are both close to optimal in terms of the asymptotic decay rate of the overflow probability and empirically shown to maintain small queue-overflow probabilities over queue-length ranges of practical interest.
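
A minimal sketch of the per-slot selection rule described above, assuming the scheduler already knows each user's current transmission rate and queue backlog; the function and variable names are illustrative, not taken from the paper.

```python
# Illustrative sketch of the alpha-algorithm's per-slot selection rule.
# rates, backlogs, and alpha are hypothetical inputs, not the paper's code.

def alpha_algorithm_pick(rates, backlogs, alpha):
    """Return the index of the user to serve in this time slot.

    The alpha-algorithm serves the user with the largest value of
    rate * backlog**alpha; a large alpha weights long queues more heavily.
    """
    scores = [r * (q ** alpha) for r, q in zip(rates, backlogs)]
    return max(range(len(scores)), key=scores.__getitem__)

# Example: with alpha large, user 1 (longer queue) wins despite a lower rate.
print(alpha_algorithm_pick(rates=[5.0, 3.0], backlogs=[10, 14], alpha=4))
```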

2.           EFFICIENT AND DYNAMIC ROUTING TOPOLOGY INFERENCE FROM END-TO-END MEASUREMENTS
Inferring the routing topology and link performance from a node to a set of other nodes is an important component in network monitoring and application design. In this paper we propose a general framework for designing topology inference algorithms based on additive metrics. The framework can flexibly fuse information from multiple measurements to achieve better estimation accuracy. We develop computationally efficient (polynomial-time) topology inference algorithms based on the framework. We prove that the probability of correct topology inference of our algorithms converges to one exponentially fast in the number of probing packets. In particular, for applications where nodes may join or leave frequently, such as overlay network construction, application-layer multicast, and peer-to-peer file sharing/streaming, we propose a novel sequential topology inference algorithm which significantly reduces the probing overhead and can efficiently handle node dynamics. We demonstrate the effectiveness of the proposed inference algorithms via Internet experiments.








KNOWLEDGE AND DATA MINING:

3.           BINRANK: SCALING DYNAMIC AUTHORITY-BASED SEARCH USING MATERIALIZED SUBGRAPHS
Dynamic authority-based keyword search algorithms, such as ObjectRank and personalized PageRank, leverage semantic link information to provide high-quality, high-recall search in databases and on the Web. Conceptually, these algorithms require a query-time PageRank-style iterative computation over the full graph. This computation is too expensive for large graphs and is not feasible at query time. Alternatively, building an index of precomputed results for some or all keywords involves very expensive preprocessing. We introduce BinRank, a system that approximates ObjectRank results by utilizing a hybrid approach inspired by materialized views in traditional query processing. We materialize a number of relatively small subsets of the data graph in such a way that any keyword query can be answered by running ObjectRank on only one of the subgraphs. BinRank generates the subgraphs by partitioning all the terms in the corpus based on their co-occurrence, executing ObjectRank for each partition using the terms to generate a set of random walk starting points, and keeping only those objects that receive non-negligible scores. The intuition is that a subgraph that contains all objects and links relevant to a set of related terms should have all the information needed to rank objects with respect to one of these terms. We demonstrate that BinRank can achieve subsecond query execution time on the English Wikipedia data set, while producing high-quality search results that closely approximate the results of ObjectRank on the original graph. The Wikipedia link graph contains about 10^8 edges, which is at least two orders of magnitude larger than what prior state-of-the-art dynamic authority-based search systems have been able to demonstrate. Our experimental evaluation investigates the trade-off between query execution time, quality of the results, and storage requirements of BinRank.

4.           CLOSENESS: A NEW PRIVACY MEASURE FOR DATA PUBLISHING
The k-anonymity privacy requirement for publishing microdata requires that each equivalence class (i.e., a set of records that are indistinguishable from each other with respect to certain “identifying” attributes) contains at least k records. Recently, several authors have recognized that k-anonymity cannot prevent attribute disclosure. The notion of ℓ-diversity has been proposed to address this; ℓ-diversity requires that each equivalence class has at least ℓ well-represented values for each sensitive attribute. In this article, we show that ℓ-diversity has a number of limitations. In particular, it is neither necessary nor sufficient to prevent attribute disclosure. Motivated by these limitations, we propose a new notion of privacy called “closeness”. We first present the base model, t-closeness, which requires that the distribution of a sensitive attribute in any equivalence class be close to the distribution of the attribute in the overall table (i.e., the distance between the two distributions should be no more than a threshold t). We then propose a more flexible privacy model called (n, t)-closeness that offers higher utility. We describe our desiderata for designing a distance measure between two probability distributions and present two distance measures. We discuss the rationale for using closeness as a privacy measure and illustrate its advantages through examples and experiments.
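
To make the t-closeness requirement concrete, here is a hedged sketch of checking one equivalence class against the overall table. The paper proposes its own distance measures (it presents two); total variation distance is used below purely as a simple stand-in, not necessarily one of them.

```python
# Minimal t-closeness check for one equivalence class with a categorical
# sensitive attribute. The distance measure here is illustrative only.
from collections import Counter

def distribution(values):
    counts = Counter(values)
    total = len(values)
    return {v: c / total for v, c in counts.items()}

def total_variation(p, q):
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in support)

def satisfies_t_closeness(class_values, table_values, t):
    """True if the class's sensitive-attribute distribution is within t of the table's."""
    return total_variation(distribution(class_values), distribution(table_values)) <= t

# Example: one equivalence class vs. the whole table.
table = ["flu", "flu", "cancer", "flu", "hiv", "flu"]
eq_class = ["cancer", "hiv"]
print(satisfies_t_closeness(eq_class, table, t=0.3))  # False: class skews far from the table
```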



5.           DATA LEAKAGE DETECTION
We study the following problem: A data distributor has given sensitive data to a set of supposedly trusted agents (third parties). Some of the data is leaked and found in an unauthorized place (e.g., on the web or somebody’s laptop). The distributor must assess the likelihood that the leaked data came from one or more agents, as opposed to having been independently gathered by other means. We propose data allocation strategies (across the agents) that improve the probability of identifying leakages. These methods do not rely on alterations of the released data (e.g., watermarks). In some cases we can also inject “realistic but fake” data records to further improve our chances of detecting leakage and identifying the guilty party.


6.           PAM: AN EFFICIENT AND PRIVACY-AWARE MONITORING FRAMEWORK FOR CONTINUOUSLY MOVING OBJECTS
Efficiency and privacy are two fundamental issues in moving object monitoring. This paper proposes a privacy-aware monitoring (PAM) framework that addresses both issues. The framework distinguishes itself from the existing work by being the first to holistically address the issues of location updating in terms of monitoring accuracy, efficiency, and privacy, particularly, when and how mobile clients should send location updates to the server. Based on the notions of safe region and most probable result, PAM performs location updates only when they would likely alter the query results. Furthermore, by designing various client update strategies, the framework is flexible and able to optimize accuracy, privacy, or efficiency. We develop efficient query evaluation/reevaluation and safe region computation algorithms in the framework. The experimental results show that PAM substantially outperforms traditional schemes in terms of monitoring accuracy, CPU cost, and scalability while achieving close-to-optimal communication cost.
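
A hedged sketch of the location-update rule implied above: a client reports its position only when it leaves its server-assigned safe region, since the registered query results cannot change while it stays inside. The rectangular region and the names below are illustrative assumptions, not PAM's actual data structures.

```python
# Toy safe-region check: send an update only when the client leaves the region.
from dataclasses import dataclass

@dataclass
class SafeRegion:
    x_min: float
    x_max: float
    y_min: float
    y_max: float

    def contains(self, x: float, y: float) -> bool:
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max

def should_send_update(region: SafeRegion, x: float, y: float) -> bool:
    """Report the location only when the client has left its safe region."""
    return not region.contains(x, y)

region = SafeRegion(0.0, 10.0, 0.0, 10.0)
print(should_send_update(region, 4.0, 5.0))   # False: still inside, no update needed
print(should_send_update(region, 12.0, 5.0))  # True: outside, notify the server
```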

7.           P2P REPUTATION MANAGEMENT USING DISTRIBUTED IDENTITIES AND DECENTRALIZED RECOMMENDATION CHAINS
Peer-to-peer (P2P) networks are vulnerable to peers who cheat, propagate malicious code, leech on the network, or simply do not cooperate. Traditional security techniques developed for centralized distributed systems, such as client-server networks, are insufficient for P2P networks precisely because of their centralized nature. The absence of a central authority in a P2P network poses unique challenges for reputation management in the network. These challenges include identity management of the peers, secure reputation data management, Sybil attacks, and, above all, availability of reputation data. In this paper, we present a cryptographic protocol for ensuring secure and timely availability of the reputation data of a peer to other peers at extremely low cost. The past behavior of the peer is encapsulated in its digital reputation, which is subsequently used to predict its future actions. As a result, a peer’s reputation motivates it to cooperate and desist from malicious activities. The cryptographic protocol is coupled with self-certification and cryptographic mechanisms for identity management and countering Sybil attacks. We illustrate the security and the efficiency of the system analytically and by means of simulations in a completely decentralized Gnutella-like P2P network.

8.           MANAGING MULTIDIMENSIONAL HISTORICAL AGGREGATE DATA IN UNSTRUCTURED P2P NETWORKS
A P2P-based framework supporting the extraction of aggregates from historical multidimensional data is proposed, which provides efficient and robust query evaluation. When a data population is published, data are summarized in a synopsis, consisting of an index built on top of a set of subsynopses (storing compressed representations of distinct data portions). The index and the subsynopses are distributed across the network, and suitable replication mechanisms taking into account the query workload and network conditions are employed that provide the appropriate coverage for both the index and the subsynopses.

9.           Deriving Concept-Based User Profiles from Search Engine Logs
User profiling is a fundamental component of any personalization application. Most existing user profiling strategies are based on objects that users are interested in (i.e., positive preferences), but not on objects that users dislike (i.e., negative preferences). In this paper, we focus on search engine personalization and develop several concept-based user profiling methods that are based on both positive and negative preferences. We evaluate the proposed methods against our previously proposed personalized query clustering method. Experimental results show that profiles which capture and utilize both the user’s positive and negative preferences perform the best. An important result from the experiments is that profiles with negative preferences can increase the separation between similar and dissimilar queries. The separation provides a clear threshold for an agglomerative clustering algorithm to terminate, improving the overall quality of the resulting query clusters.

10.       Parallelizing Itinerary-Based KNN Query Processing in Wireless Sensor Networks
Wireless sensor networks have been proposed for facilitating various monitoring applications (e.g., environmental monitoring and military surveillance) over a wide geographical region. In these applications, spatial queries that collect data from wireless sensor networks play an important role. One such query is the K-Nearest Neighbor (KNN) query that facilitates collection of sensor data samples based on a given query location and the number of samples specified (i.e., K). Recently, itinerary-based KNN query processing techniques, which propagate queries and collect data along a predetermined itinerary, have been developed. Prior studies demonstrate that itinerary-based KNN query processing algorithms are able to achieve better energy efficiency than other existing algorithms developed upon tree-based network infrastructures. However, how to derive itineraries for KNN queries based on different performance requirements remains a challenging problem. In this paper, we propose a Parallel Concentric-circle Itinerary-based KNN (PCIKNN) query processing technique that derives different itineraries by optimizing either query latency or energy consumption. The performance of PCIKNN is analyzed mathematically and evaluated through extensive experiments. Experimental results show that PCIKNN outperforms the state-of-the-art techniques.

11.       Towards an Automatic Detection of Sensitive Information in a Database
In order to validate user requirements, tests are often conducted on real data. However, development and testing are increasingly outsourced, leading companies to provide external staff with real confidential data. A solution to this problem is known as data scrambling. Many algorithms aim at smartly replacing true data with false but realistic data. However, nothing has been developed to automate the crucial task of detecting the data to be scrambled. In this paper we propose an innovative approach - and its implementation as an expert system - to achieve the automatic detection of candidate attributes for scrambling. Our approach is mainly based on semantic rules that determine which concepts have to be scrambled, and on a linguistic component that retrieves the attributes that semantically correspond to these concepts. Since attributes cannot be considered independently from each other, we also address the challenging problem of propagating the scrambling throughout the whole database. An important contribution of our approach is to provide a semantic modeling of sensitive data. This knowledge is made available through production rules, operationalizing the sensitive data detection.

12.       ViDE: A Vision-Based Approach for Deep Web Data Extraction
Deep Web contents are accessed through queries submitted to Web databases, and the returned data records are enwrapped in dynamically generated Web pages (called deep Web pages in this paper). Extracting structured data from deep Web pages is a challenging problem due to the underlying intricate structures of such pages. Until now, a large number of techniques have been proposed to address this problem, but all of them have inherent limitations because they are Web-page-programming-language dependent. Since Web pages are a popular two-dimensional medium, their contents are always displayed regularly for users to browse. This motivates us to seek a different way of performing deep Web data extraction that overcomes the limitations of previous works by utilizing some interesting common visual features of deep Web pages. In this paper, a novel vision-based approach that is Web-page-programming-language independent is proposed. This approach primarily utilizes the visual features of deep Web pages to implement deep Web data extraction, including data record extraction and data item extraction. We also propose a new evaluation measure, revision, to capture the amount of human effort needed to produce perfect extraction. Our experiments on a large set of Web databases show that the proposed vision-based approach is highly effective for deep Web data extraction.


13.       An UpDown Directed Acyclic Graph Approach for Sequential Pattern Mining
Traditional pattern-growth-based approaches for sequential pattern mining derive length-(k+1) patterns from the projected databases of length-k patterns recursively. At each level of recursion, they unidirectionally grow the length of detected patterns by one along the suffix of detected patterns, which requires k levels of recursion to find a length-k pattern. In this paper, a novel data structure, the UpDown Directed Acyclic Graph (UDDAG), is invented for efficient sequential pattern mining. UDDAG allows bidirectional pattern growth along both ends of detected patterns. Thus, a length-k pattern can be detected in ⌊log₂k + 1⌋ levels of recursion at best, which results in fewer levels of recursion and faster pattern growth. When minSup is large, such that the average pattern length is close to 1, UDDAG and PrefixSpan have similar performance because the problem degenerates into a frequent item counting problem. However, UDDAG scales up much better. It often outperforms PrefixSpan by almost one order of magnitude in scalability tests. UDDAG is also considerably faster than Spade and LapinSpam. Except for extreme cases, UDDAG uses memory comparable to that of PrefixSpan and less memory than Spade and LapinSpam. Additionally, the special feature of UDDAG enables its extension toward applications involving searching in large spaces.
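
As a quick worked instance of the recursion-depth claim, assuming the floor form reconstructed above (the numbers are illustrative):

```latex
% For a length-8 pattern, bidirectional growth needs at best
% \lfloor \log_2 k + 1 \rfloor levels of recursion:
\[
  \lfloor \log_2 8 + 1 \rfloor = \lfloor 4 \rfloor = 4
  \quad\text{levels, versus } k = 8 \text{ levels for suffix-only (unidirectional) growth.}
\]
```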

14.       Domain-Driven Classification Based on Multiple Criteria and Multiple Constraint-Level Programming for Intelligent Credit Scoring
Extracting knowledge from the transaction records and the personal data of credit card holders has great profit potential for the banking industry. The challenge is to detect/predict bankruptcy and to keep and recruit profitable customers. However, grouping and targeting credit card customers by traditional data-driven mining often does not directly meet the needs of the banking industry, because data-driven mining automatically generates classification outputs that are imprecise, meaningless, and beyond users’ control. In this paper, we provide a novel domain-driven classification method that takes advantage of multiple-criteria and multiple-constraint-level programming for intelligent credit scoring. The method involves credit scoring to produce a set of customer scores that makes the classification results actionable and controllable by human interaction during the scoring process. Domain knowledge and experts’ experience parameters are built into the criteria and constraint functions of the mathematical programming, and human-machine conversation is employed to generate an efficient and precise solution. Experiments based on various data sets validated the effectiveness and efficiency of the proposed methods.

15.       Feature Selection Using f-Information Measures in Fuzzy Approximation Spaces
The selection of nonredundant and relevant features of real-valued data sets is a highly challenging problem. A novel feature selection method is presented here based on fuzzy-rough sets, which maximizes the relevance and minimizes the redundancy of the selected features. By introducing the fuzzy equivalence partition matrix, a novel representation of Shannon’s entropy for fuzzy approximation spaces is proposed to measure the relevance and redundancy of features, suitable for real-valued data sets. The fuzzy equivalence partition matrix also offers an efficient way to calculate many more information measures, termed f-information measures. Several f-information measures are shown to be effective for selecting nonredundant and relevant features of real-valued data sets. This paper compares the performance of different f-information measures for feature selection in fuzzy approximation spaces. Some quantitative indexes are introduced based on fuzzy-rough sets for evaluating the performance of the proposed method. The effectiveness of the proposed method, along with a comparison with other methods, is demonstrated on a set of real-life data sets.
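
A hedged sketch of the relevance/redundancy idea behind this kind of feature selection, using plain Shannon mutual information on discrete data as a crisp stand-in for the paper's fuzzy-rough f-information measures; the greedy loop and names are illustrative only.

```python
# Greedy selection: prefer features relevant to the class and not redundant
# with features already selected. Crisp/discrete simplification, not the
# paper's fuzzy approximation-space formulation.
import math
from collections import Counter

def mutual_information(xs, ys):
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in pxy.items():
        pj = c / n
        mi += pj * math.log2(pj / ((px[x] / n) * (py[y] / n)))
    return mi

def select_features(features, labels, k):
    """features: dict name -> list of discrete values; labels: class labels."""
    selected = []
    while len(selected) < k:
        best, best_score = None, -float("inf")
        for name, col in features.items():
            if name in selected:
                continue
            relevance = mutual_information(col, labels)
            redundancy = (sum(mutual_information(col, features[s]) for s in selected)
                          / len(selected)) if selected else 0.0
            score = relevance - redundancy   # reward relevance, penalize redundancy
            if score > best_score:
                best, best_score = name, score
        selected.append(best)
    return selected
```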

MOBILE COMPUTING:
16.       SECURE DATA COLLECTION IN WIRELESS SENSOR NETWORKS USING RANDOMIZED DISPERSIVE ROUTES
Compromised-node and denial-of-service are two key attacks in wireless sensor networks (WSNs). In this paper, we study routing mechanisms that circumvent (bypass) black holes formed by these attacks. We argue that existing multi-path routing approaches are vulnerable to such attacks, mainly due to their deterministic nature. So once an adversary acquires the routing algorithm, it can compute the same routes known to the source, and hence endanger all information sent over these routes. In this paper, we develop mechanisms that generate randomized multipath routes. Under our design, the routes taken by the “shares” of different packets change over time. So even if the routing algorithm becomes known to the adversary, the adversary still cannot pinpoint the routes traversed by each packet. Besides randomness, the routes generated by our mechanisms are also highly dispersive and energy-efficient, making them quite capable of bypassing black holes at low energy cost. Extensive simulations are conducted to verify the validity of our mechanisms.
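
A hedged sketch of the randomized dispersive routing idea: each share of a packet first takes an independent random walk before being delivered, so no single deterministic path carries enough shares to reconstruct the packet. The toy topology, hop budget, and share splitting are illustrative assumptions, not the paper's mechanisms.

```python
# Each share of the same packet follows an independently randomized route.
import random

def random_walk_route(start, sink, neighbors, walk_hops):
    """Random-propagation phase followed by (abstracted) delivery to the sink."""
    route, node = [start], start
    for _ in range(walk_hops):
        node = random.choice(neighbors[node])
        route.append(node)
    route.append(sink)  # normal routing from the walk's endpoint to the sink
    return route

def disperse_shares(packet_id, num_shares, start, sink, neighbors, walk_hops=4):
    return {f"{packet_id}-share{i}": random_walk_route(start, sink, neighbors, walk_hops)
            for i in range(num_shares)}

# Toy 4-node topology given as an adjacency list.
neighbors = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
print(disperse_shares("pkt7", num_shares=3, start="A", sink="D", neighbors=neighbors))
```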

17.       A Novel Dual-Index Design to Efficiently Support Snapshot Location-Based Query Processing in Mobile Environments
Location-based services have become increasingly popular in recent years. Many applications aim to support a large number of users in metropolitan areas (i.e., dense networks). To cope with this challenge, we present a framework that supports location-based services on MOVing objects in road Networks (MOVNet, for short) [26]. MOVNet’s dual-index design utilizes an on-disk R-tree to store the network connectivity and an in-memory grid structure to maintain moving object position updates. In this paper, we extend the functionality of MOVNet to support snapshot range queries as well as snapshot k-nearest-neighbor queries. Given an arbitrary edge in the space, we analyze the minimum and maximum number of grid cells that may be affected. We show that the maximum bound can be used in snapshot range query processing to prune the search space. We demonstrate via theoretical analysis and experimental results that MOVNet yields excellent performance with various networks while scaling to a very large number of moving objects.

PARALLEL AND DISTRIBUTED SYSTEMS:

18.       PRIVACY-CONSCIOUS LOCATION-BASED QUERIES IN MOBILE ENVIRONMENTS
In location-based services, users with location-aware mobile devices are able to make queries about their surroundings anywhere and at any time. While this ubiquitous computing paradigm brings great convenience for information access, it also raises concerns over potential intrusion into user location privacy. To protect location privacy, one typical approach is to cloak user locations into spatial regions based on user-specified privacy requirements, and to transform location-based queries into region-based queries. In this paper, we identify and address three new issues concerning this location cloaking approach. First, we study the representation of cloaking regions and show that a circular region generally leads to a small result size for region-based queries. Second, we develop a mobility-aware location cloaking technique to resist trace analysis attacks. Two cloaking algorithms, namely MaxAccu_Cloak and MinComm_Cloak, are designed based on different performance objectives. Finally, we develop an efficient polynomial algorithm for evaluating circular-region-based kNN queries. Two query processing modes, namely bulk and progressive, are presented to return query results either all at once or in an incremental manner. Experimental results show that our proposed mobility-aware cloaking algorithms significantly improve the quality of location cloaking in terms of an entropy measure without compromising much on query latency or communication cost. Moreover, the progressive query processing mode achieves a shorter response time than the bulk mode by parallelizing the query evaluation and result transmission.

SECURE COMPUTING:

19.       Using Web-Referral Architectures to Mitigate Denial-of-Service Threats
The web is a complicated graph, with millions of websites interlinked. In this paper, we propose to use this web sitegraph structure to mitigate flooding attacks on a website, using a new web referral architecture for privileged service (“WRAPS”). WRAPS allows a legitimate client to obtain a privilege URL through a simple click on a referral hyperlink from a website trusted by the target website. Using that URL, the client can get privileged access to the target website in a manner that is far less vulnerable to a distributed denial-of-service (DDoS) flooding attack than normal access would be. WRAPS does not require changes to web client software and is extremely lightweight for referrer websites, which makes its deployment easy. The massive scale of the web sitegraph could deter attempts to isolate a website by blocking all of its referrers. We present the design of WRAPS and the implementation of a prototype system used to evaluate our proposal. Our empirical study demonstrates that WRAPS enables legitimate clients to connect to a website smoothly in spite of a very intensive flooding attack, at the cost of small overheads on the website’s ISP’s edge routers. We discuss the security properties of WRAPS and a simple approach to encourage many small websites to help protect an important site during DoS attacks.

SOFTWARE ENGINEERING:

20.       Vulnerability Discovery with Attack Injection
The increasing reliance placed on networked computer systems demands higher levels of dependability. This is even more relevant as new threats and forms of attack are constantly being revealed, compromising the security of systems. This paper addresses this problem by presenting an attack injection methodology for the automatic discovery of vulnerabilities in software components. The proposed methodology, implemented in AJECT, follows an approach similar to that of hackers and security analysts to discover vulnerabilities in network-connected servers. AJECT uses a specification of the server’s communication protocol and predefined test case generation algorithms to automatically create a large number of attacks. Then, while it injects these attacks through the network, it monitors the execution of the server in the target system and the responses returned to the clients. The observation of unexpected behavior suggests the presence of a vulnerability that was triggered by some particular attack (or group of attacks). This attack can then be used to reproduce the anomaly and to assist in the removal of the error. To assess the usefulness of this approach, several attack injection campaigns were performed with 16 publicly available POP and IMAP servers. The results show that AJECT could effectively be used to locate vulnerabilities, even in well-known servers tested throughout the years.

21.       On Event-Based Middleware for Location-Aware Mobile Applications
As mobile applications become more widespread, programming paradigms and middleware architectures designed to support their development are becoming increasingly important. The event-based programming paradigm is a strong candidate for the development of mobile applications due to its inherent support for the loose coupling between components required by mobile applications. However, existing middleware that supports the event-based programming paradigm is not well suited to supporting location-aware mobile applications in which highly mobile components come together dynamically to collaborate at some location. This paper presents a number of techniques including location-independent announcement and subscription coupled with location-dependent filtering and event delivery that can be used by event-based middleware to support such collaboration. We describe how these techniques have been implemented in STEAM, an event-based middleware with a fully decentralized architecture, which is particularly well suited to deployment in ad hoc network environments. The cost of such location-based event dissemination and the benefits of distributed event filtering are evaluated.

CLOUD COMPUTING:

22.       An Efficient Hybrid Peer-to-Peer System for Distributed Data Sharing
Peer-to-peer overlay networks are widely used in distributed systems. Based on whether a regular topology is maintained among peers, peer-to-peer networks can be divided into two categories: structured peer-to-peer networks, in which peers are connected by a regular topology, and unstructured peer-to-peer networks, in which the topology is arbitrary. Structured peer-to-peer networks usually provide efficient and accurate services but need to spend a lot of effort maintaining the regular topology. On the other hand, unstructured peer-to-peer networks are extremely resilient to frequent peer joining and leaving, but this is usually achieved at the expense of efficiency. The objective of this work is to design a hybrid peer-to-peer system for distributed data sharing which combines the advantages of both types of peer-to-peer networks and minimizes their disadvantages. The proposed hybrid peer-to-peer system is composed of two parts: the first part is a structured core network which forms the backbone of the hybrid system; the second part is made up of multiple unstructured peer-to-peer networks, each of which is attached to a node in the core network. The structured core network can accurately narrow a data lookup down to a particular unstructured network, while the unstructured networks provide a low-cost mechanism for peers to join or leave the system freely. A data lookup operation first checks the local unstructured network and then the structured network. This two-tier hierarchy decouples the flexibility of the system from its efficiency. Our simulation results demonstrate that the hybrid peer-to-peer system can utilize both the efficiency of the structured peer-to-peer network and the flexibility of the unstructured peer-to-peer network, and achieves a good balance between the two types of networks.
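
A hedged sketch of the two-tier lookup described above: check the local unstructured network first, then ask the structured core which attached network should hold the key. The class layout and method names are illustrative assumptions, not the paper's implementation.

```python
# Toy model: a core index stands in for structured (DHT-style) routing, and
# plain dictionaries stand in for the attached unstructured networks.

class HybridP2P:
    def __init__(self):
        self.core_index = {}     # key -> network_id holding the data
        self.unstructured = {}   # network_id -> {key: value}

    def publish(self, network_id, key, value):
        self.unstructured.setdefault(network_id, {})[key] = value
        self.core_index[key] = network_id

    def lookup(self, local_network_id, key):
        # Step 1: flood-style search in the local unstructured network (abstracted).
        local = self.unstructured.get(local_network_id, {})
        if key in local:
            return local[key]
        # Step 2: route through the structured core to the responsible network.
        target = self.core_index.get(key)
        if target is not None:
            return self.unstructured.get(target, {}).get(key)
        return None

p2p = HybridP2P()
p2p.publish("net-3", "song.mp3", "peer-42")
print(p2p.lookup("net-1", "song.mp3"))  # found via the structured core
```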

23.       A Model for Reducing Power Consumption in Peer-to-Peer Systems
Information systems based on the cloud computing model and the peer-to-peer (P2P) model are now becoming popular. In the cloud computing model, a cloud of servers supports thin clients with various types of services such as Web pages and databases. In the P2P model, on the other hand, every computer is a peer and there is no centralized coordinator. It is becoming increasingly important to discuss how to reduce the total electric power consumption of computers in information systems in order to realize an eco-society. In this paper, we consider a Web type of application on P2P overlay networks. First, we discuss a model showing how much electric power each server peer consumes to perform Web requests from client peers. Then, we discuss algorithms by which a client peer selects a server peer from a collection of server peers so that the total power consumption can be reduced while constraints such as deadlines are satisfied. Lastly, we evaluate the algorithms in terms of total power consumption and throughput compared with traditional round-robin algorithms.
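
A hedged sketch of the server-selection idea: among the server peers that can meet the request's deadline, pick the one with the lowest estimated energy for the request. The simple power model and field names are illustrative assumptions, not the paper's model.

```python
from dataclasses import dataclass

@dataclass
class ServerPeer:
    name: str
    requests_per_sec: float    # throughput for this request type
    active_power_watts: float  # extra power drawn while serving

    def response_time(self) -> float:
        return 1.0 / self.requests_per_sec

    def energy_per_request(self) -> float:
        return self.active_power_watts * self.response_time()  # joules

def select_server(servers, deadline_sec):
    # Keep only peers that satisfy the deadline constraint, then minimize energy.
    feasible = [s for s in servers if s.response_time() <= deadline_sec]
    if not feasible:
        return None
    return min(feasible, key=ServerPeer.energy_per_request)

servers = [ServerPeer("p1", 50.0, 120.0), ServerPeer("p2", 20.0, 60.0)]
chosen = select_server(servers, deadline_sec=0.1)
print(chosen.name if chosen else "no feasible server")  # p1: lower energy per request
```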

24.       A cloud oriented approach for people-provided services

Web 2.0 started a paradigm shift concerning content generation. Users spend hours browsing and producing content to share, most of the time freely. During these activities, their laptops and home PCs are considerably underused. By combining the amount of idle processing capacity with increasingly fast internet access, every computer can be considered a “nearby” computational resource. The question is: what to do with them? This paper explores how these resources can be used to give something back to the user in a secure and private way.

INTERNET AND WEB APPLICATION:

25.       A Solution Model and Tool for Supporting the Negotiation of Security Decisions in E-Business Collaborations
Sharing, comparing and negotiating security related actions and requirements across businesses has always been a complicated matter. Issues arise due to semantic gaps, disparity in security documentation and formats, and incomplete security-related information during negotiations, to say the least. As collaborations amongst e-businesses in particular increase, there is a growing, implicit need to address these issues and ease companies’ deliberations on security. Our research has investigated this topic in substantial detail, and in this paper we present a novel solution model and tool for supporting businesses through these tasks. Initial evaluation results and feedback from interviewed security professionals affirm the use and suitability of our proposals in supporting the security actions negotiation process.


26.       WSOTF: An Automatic Testing Tool for Web Services Composition

This paper presents WSOTF (Web Service composition Online Testing Framework), an automatic conformance testing tool that handles timing constraints from a formal specification (TEFSM: Timed Extended Finite State Machine) of a web services composition, implemented by an online testing algorithm. The algorithm combines test generation and execution, so test cases are generated and executed at the same time; the complete test scenario (timed test case suite and data) is built during test execution. The tool focuses on unit testing: only the service composition under test is exercised, and all its partners are simulated by WSOTF. The tool also takes timing constraints and synchronous time delays into account, and can be used for debugging, which is otherwise difficult while developing a composite Web service.

27.       TRIC: An Infrastructure for Trust and Reputation Across Virtual Communities
Virtual communities are gradually becoming an inherent part of modern life. Trust and reputation systems are considered key enablers of virtual communities: they support the accumulation of member reputation information and leverage this information to increase the likelihood of successful member interactions and to better protect the community from fraudulent members. Reputation information is an important part of a user's identity, which makes this data both sensitive and desirable for communities to share. Reputation sharing is also motivated by the individual user, who can leverage her standing in new communities using the reputation she has gained in others. In previous work we presented the Cross-Community Reputation (CCR) model for sharing reputation knowledge among communities. The CCR model identifies the fundamental terms required for a meaningful sharing of reputation information among communities and proposes means to make the information sharing feasible. This paper presents the TRIC infrastructure for enabling CCR systems. The major concerns of TRIC are to protect the user's rights to privacy and control over her data, and to draw architectural guidelines that can be implemented in more than one way. We explain the major components required for this type of infrastructure and discuss known design alternatives and their trade-offs in the case of TRIC.

28.       Crawling Online Social Graphs
Extensive research has been conducted on top of online social networks (OSNs), while little attention has been paid to the data collection process. Due to the large scale of OSNs and their privacy control policies, a partial data set is often used for analysis. The data set analyzed is decided by many factors, including the choice of seeds, node selection algorithms, and the sample size. These factors may introduce biases and further contaminate or even skew the results. To evaluate the impact of different factors, this paper examines the OSN graph crawling problem, where the nodes are OSN users and the edges are the links (or relationships) among these users. More specifically, by looking at various factors in the crawling process, the following problems are addressed in this paper: 1) Efficiency: how fast different crawlers discover nodes/links; 2) Sensitivity: how different OSNs and the number of protected users affect crawlers; 3) Bias: how major graph properties are skewed. To the best of our knowledge, our simulations on four real-world online social graphs provide the first in-depth empirical answers to these questions.
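
A hedged sketch of a breadth-first-search crawler of the kind such studies compare: starting from seed users, repeatedly fetch a discovered user's friend list until a sample budget is reached. The fetch callback, protected-user handling, and budget are illustrative assumptions.

```python
from collections import deque

def bfs_crawl(seeds, fetch_friends, max_nodes):
    """Return the set of discovered users and the list of discovered edges."""
    visited, edges = set(seeds), []
    queue = deque(seeds)
    while queue and len(visited) < max_nodes:
        user = queue.popleft()
        friends = fetch_friends(user)      # returns [] for protected users
        for friend in friends:
            edges.append((user, friend))
            if friend not in visited and len(visited) < max_nodes:
                visited.add(friend)
                queue.append(friend)
    return visited, edges

# Toy social graph standing in for an OSN API.
graph = {"alice": ["bob", "carol"], "bob": ["dave"], "carol": [], "dave": []}
nodes, links = bfs_crawl(["alice"], lambda u: graph.get(u, []), max_nodes=3)
print(nodes, links)
```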





29.       Combining the Missing Link: an Incremental Topic Model of Document Content and Hyperlink
The content and structure of linked information, such as sets of web pages or research paper archives, are dynamic and keep changing. Even though different methods have been proposed to exploit both the link structure and the content information, no existing approach can effectively deal with this evolution. We propose a novel joint model, called Link-IPLSI, to combine texts and links in a topic modeling framework incrementally. The model takes advantage of a novel link updating technique that can cope with dynamic changes of online document streams in a faster and scalable way. Furthermore, an adaptive asymmetric learning method is adopted to freely control the assignment of weights to terms and citations. Experimental results on two different sources of online information demonstrate the time-saving strength of our method and indicate that our model leads to systematic improvements in the quality of classification and link prediction.

30.       Query-Aware Complex Object Buffer Management in XML Information Retrieval

In this paper, we analyse the data access characteristics of a typical XML information retrieval system and propose a new query-aware buffer replacement algorithm based on prediction of the Minimum Reuse Distance (MRD for short). The algorithm predicts an object's next reference distance according to the retrieval system's running status and replaces the objects that have the maximum reuse distances. The factors considered in the replacement algorithm include the access frequency, creation cost, and size of objects, as well as the queries being executed. By taking into account the queries currently running or queuing in the system, the MRD algorithm can predict the reuse distances of index data objects more accurately.
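
A hedged sketch of the MRD replacement rule: when the buffer is full, evict the cached object whose next reuse is predicted to be farthest away. The toy prediction below only scans the pending query queue; the real algorithm also weighs access frequency, creation cost, and object size, which this sketch omits.

```python
def predicted_reuse_distance(obj_id, pending_queries):
    """Position of the next pending query that needs obj_id (inf if none)."""
    for distance, query_objects in enumerate(pending_queries):
        if obj_id in query_objects:
            return distance
    return float("inf")

def choose_victim(buffer_objects, pending_queries):
    """Evict the buffered object with the maximum predicted reuse distance."""
    return max(buffer_objects,
               key=lambda obj: predicted_reuse_distance(obj, pending_queries))

buffer_objects = ["idx:author", "idx:title", "doc:42"]
pending_queries = [{"idx:title"}, {"doc:42"}, {"idx:author"}]
print(choose_victim(buffer_objects, pending_queries))  # evicts "idx:author", reused farthest in the future
```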

31.       Recording How-Provenance on Probabilistic Databases
Tracking data provenance (or lineage) has become increasingly important in many large-scale applications, and a few methods for recording data provenance have been proposed recently. However, most previous works focus mainly on deterministic databases, with the exception of Trio-style lineage, which targets probabilistic databases; the latter setting is much more challenging because of the exponential growth of possible world instances and the dependence among intermediate tuples. This paper proposes an approach, named PHP-tree, to model how-provenance over probabilistic databases. We also show how to evaluate probability based on a PHP-tree. Compared with Trio-style lineage, our approach is independent of intermediate results and can calculate the probability in both the restricted and the complete propagation of data provenance. Detailed experimental results show the effectiveness, efficiency, and scalability of our proposed model.

32.       Efficient Peer-to-Peer Similarity Query Processing for High-dimensional Data
Objects such as digital images, text documents, or DNA sequences are usually represented in a high-dimensional feature space. A fundamental issue in peer-to-peer (P2P) systems is to support efficient similarity search for high-dimensional data in metric spaces. Prior works suffer from some fundamental limitations, such as not being adaptive to a highly dynamic network, poor search efficiency under skewed data scenarios, and large maintenance overhead. In this study, we propose an efficient scheme, Dragon, to support P2P similarity search in metric spaces. Dragon achieves this efficiency through the following designs: 1) Dragon is based on our previously designed P2P network, Phoenix, which has optimal routing efficiency in dynamic scenarios. 2) We design a locality-preserving naming algorithm and a routing tree for each peer in Phoenix to support range queries. A radius-estimation method is proposed to transform a kNN query into a range query. 3) A load-balancing algorithm is given to support strong query processing under skewed data distributions. Extensive experiments verify the superiority of Dragon over existing works.


33.       Two Novel Adaptive Symbolic Representations for Similarity Search in Time Series Databases
Over the last decade, we have seen an increasing level of interest in time series data mining due to its variety of real-world applications. Numerous representation models of time series have been proposed for data mining, including piecewise polynomial models, spectral models, and the recently proposed symbolic models, such as Symbolic Aggregate approXimation (SAX) and its multiresolution extension, indexable Symbolic Aggregate approXimation (iSAX). In spite of the many advantages of dimensionality/numerosity reduction and lower-bounding distance measures, the quality of the SAX approximation is highly dependent on the time series having a Gaussian distribution, especially in the reduced-dimensionality setting. In this paper, we introduce a novel adaptive symbolic approach based on the combination of SAX and the k-means algorithm, which we call adaptive SAX (aSAX). The proposed representation greatly outperforms classic SAX not only on highly Gaussian-distributed datasets but also on datasets lacking a Gaussian distribution, across a variety of dimensionality reductions. In addition to being competitive with, or superior to, classic SAX, we extend aSAX to the multiresolution symbolic representation called indexable adaptive SAX (iaSAX). Our empirical experiments with real-world time series datasets confirm the theoretical analyses as well as the efficiency of the two proposed algorithms in terms of the tightness of lower bounds, pruning power, and number of random disk accesses.
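
A hedged sketch of the aSAX idea: instead of the classic SAX breakpoints derived from a Gaussian assumption, learn breakpoints adapted to the data by running one-dimensional k-means on the (PAA) values and cutting midway between adjacent centroids. The tiny k-means and the example data are illustrative only, not the paper's training procedure.

```python
import random

def kmeans_1d(values, k, iters=50):
    centroids = sorted(random.sample(values, k))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            idx = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[idx].append(v)
        # Keep an old centroid if its cluster turned out empty.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

def adaptive_breakpoints(values, alphabet_size):
    """Breakpoints lie midway between adjacent learned centroids."""
    centroids = kmeans_1d(values, alphabet_size)
    return [(a + b) / 2 for a, b in zip(centroids, centroids[1:])]

def symbolize(value, breakpoints):
    # Map a (PAA) value to a symbol index using the learned breakpoints.
    return sum(value > b for b in breakpoints)

data = [-1.8, -1.2, -0.9, 0.1, 0.2, 0.3, 1.5, 1.9, 2.1]
bps = adaptive_breakpoints(data, alphabet_size=3)
print(bps, [symbolize(v, bps) for v in data])
```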

34.       An Asymmetric Similarity Measure for Tag Clustering on Flickr
Web 2.0 tools and environments have made tagging, the act of assigning keywords to online objects, a popular way to annotate shared resources. The success of now-prominent tagging systems makes tagging "the natural way for people to classify objects as well as an attractive way to discover new material". One of the most challenging problems is to harvest the semantics from these systems, which can support a number of applications, including tag clustering and tag recommendation. We conduct detailed studies on different types of tag relations and tag similarity measures, and propose a scalable measure that we name the Reliability Factor Similarity Measure (RFSM). We compare it with two other measures of similar scalability by integrating them into hierarchical clustering methods and performing tag clustering on a subset of Flickr data. The results suggest that RFSM outperforms those two measures when applied for tag clustering purposes. We also present an alternative way of utilizing the discovered tag relations to set up tag-refining rules in order to deal with noise in the initial tag sets, which can in turn improve the precision of tag relations.

35.       Effective Clustering of Dense and Concentrated Online Communities

Most clustering algorithms tend to separate large-scale online communities into several meaningful sub-communities by extracting cut points and cut edges. However, these algorithms are not effective on dense and concentrated graphs, which do not have any meaningful cut points. Common problems with the previous algorithms are as follows. First, the size of the first cluster is too large, as it may contain many incompatible users. Second, the quality and the purity of the clusters are very low. Third, only the dominant first cluster is found to be meaningful. To address these problems, we first propose a graph transformation to separate large-scale online communities into two different types of meaningful subgraphs: the first subgraph is the intimacy graph and the second is the reputation graph. Then, we present effective algorithms for discovering good sub-communities and for excluding incompatible users in these subgraphs. The experimental results show that our algorithms allow for extracting more suitable and meaningful sub-communities than the previous work in dense online networks.

36.       Graph-based Recommendation on Social Networks
Recommender systems have emerged as an essential response to the rapidly growing digital information phenomenon, in which users find it more and more difficult to locate the right information at the right time. Systems under Web 2.0 allow users not only to rate resources but also to assign tags to them. Tags play a significant role in Web 2.0: they can be used for navigation, browsing, recommendation, and so on. In this paper, we propose a novel recommendation algorithm based on social networks. The social network is established among users and items, taking into account both rating and tag information. We consider users' co-tagging behaviors and add the similarity relationship to the graph to enhance performance. Our algorithm is based on Random Walk with Restarts but provides a more natural and efficient way to represent social networks. Having considered the influence of tags, the transition matrix is denser and the recommendation is more accurate. By evaluating our new algorithm on a real-life dataset and comparing it to the baseline algorithm used in many real-world recommender systems, we conclude that our method performs better than the baseline method.
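
A hedged sketch of Random Walk with Restarts, the ranking primitive the algorithm builds on. The toy adjacency matrix and restart probability are illustrative; the paper's contribution lies in how the user/item/tag graph and its denser transition matrix are constructed, which is not reproduced here.

```python
import numpy as np

def random_walk_with_restarts(adjacency, start_index, restart_prob=0.15, iters=100):
    """Return steady-state relevance scores of all nodes w.r.t. the start node."""
    A = np.asarray(adjacency, dtype=float)
    # Column-normalize to obtain a transition matrix.
    col_sums = A.sum(axis=0)
    col_sums[col_sums == 0] = 1.0
    W = A / col_sums
    restart = np.zeros(A.shape[0])
    restart[start_index] = 1.0
    p = restart.copy()
    for _ in range(iters):
        p = (1 - restart_prob) * W @ p + restart_prob * restart
    return p

# Nodes 0-1: users, 2-4: items; edges stand in for rating/tag links.
adj = [[0, 1, 1, 1, 0],
       [1, 0, 0, 1, 1],
       [1, 0, 0, 0, 0],
       [1, 1, 0, 0, 0],
       [0, 1, 0, 0, 0]]
scores = random_walk_with_restarts(adj, start_index=0)
print(scores)  # rank unseen items for user 0 by their scores
```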

37.       Improving a News Recommendation System in Adapting to Interests of a User with Storage of a Constant Size
It is desirable to have a system that can recommend news articles according to the interests of a user, which change with time. In this paper, we attempt to improve a news recommendation system based on supervised classification by integrating a clustering method into the system in order to adapt flexibly to the variety of a user's interests. To follow the changes of interests over time, we need to make not only the classifier but also the clustering module of the system easily updatable. We construct clusters in the feature space by combining one-dimensional clusters so that the amount of storage needed to update the clusters remains constant. We let the data distribution in each one-dimensional space be influenced by the clustering results from another one-dimensional space, thereby taking into account the data distribution in the original multidimensional space. The main contribution of this paper is to propose a method that achieves both goals: improving the performance of a news recommendation system and keeping the amount of data needed for updating clusters constant. Some experimental results are shown, and the strong and weak points of our proposal are discussed.

38.       Dynamically Analyzing Time Constraints in Workflow Systems with Fixed-Date Constraint
In workflow management systems (WFMSs), time management plays an essential role in controlling the lifecycle of business processes. In particular, run-time analysis of time constraints is necessary to help process managers proactively detect possible deadline violations and handle them appropriately. Traditional time constraint analyses either produce deterministic results, which are too restrictive in highly uncertain workflow processes, or only consider static analysis at workflow build time. To address this issue, this paper proposes a dynamic approach for analyzing time constraints during process execution. Specifically, based on a Petri-net-extended stochastic model, the approach first analyzes activity instances' continuous probabilities of satisfying time constraints when a process instance is initiated. Afterwards, during the execution of this process instance, the approach dynamically updates these probabilities whenever an activity instance is completed. Moreover, an example process instance from a real-world WFMS shows the practicality of our approach.

39.       Retrieving Cultural Heritage Information with Google Earth
In recent years, a great number of digital archives have been built in the field of cultural heritage preservation. Besides providing a systematic way to store and handle digitized versions of valuable artifacts and archival items, these archives allow scientists, preservation officers, and other professionals to access documents and information that may otherwise be unavailable. Opening the cultural archives for interoperation as well as public access is often a desirable objective. For this purpose, a user interface everybody knows and understands would be a good choice. In this paper, we propose to use Google Earth as a search and retrieval client for public access to digital archives. A mapping from a metadata model comprising geospatial, temporal, and thematic tags onto the KML format used by Google Earth is presented. This way, Google Earth can easily be extended with cultural heritage information as stored in one or more digital archives. Any user around the world can use Google Earth to locate, display, and access the desired information by simply downloading and installing the appropriate KML file.

40.       Hash Tree: A New Hybrid Index for Flash Disks
Flash disks have become a popular alternative to magnetic disks due to their fast I/O speed and other features such as small size, shock resistance, energy efficiency, and non-volatility. However, flash disks also have the characteristics of out-of-place updates and asymmetric I/O latencies for read, write, and erase operations, which introduce new challenges into the indexing mechanisms for flash disks. Traditional index structures do not take the flash I/O characteristics into account and therefore exhibit poor performance. To address this problem, we present in this paper a new hybrid index structure for flash disks, called HashTree. HashTree aims at better update performance while keeping relatively high search efficiency. HashTree uses a hash-based index to split the indexed records into several buckets, and then we develop a new tree structure, named FTree, to organize the records in each bucket. Compared with previous tree-based indexes, our policy can reduce the cost of maintaining the hierarchical tree structure. We also introduce a tuning mechanism into HashTree so that we can obtain an appropriate trade-off between search performance and update performance. We conducted an experiment on a commercial SSD to evaluate the performance of HashTree. Compared with its competitors such as BFTL and FD-tree, our experime
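
A hedged sketch of the hybrid layout described above: a hash function routes each key to a bucket, and each bucket keeps its own small ordered structure (a plain sorted list standing in for the paper's FTree). The bucket count, hashing, and per-bucket structure are illustrative assumptions, not the published design.

```python
import bisect

class HashTreeIndex:
    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]  # each: sorted (key, value) list

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, value):
        # Updates stay inside one small bucket, which keeps restructuring cheap.
        bucket = self._bucket(key)
        bisect.insort(bucket, (key, value))

    def search(self, key):
        bucket = self._bucket(key)
        i = bisect.bisect_left(bucket, (key,))
        if i < len(bucket) and bucket[i][0] == key:
            return bucket[i][1]
        return None

idx = HashTreeIndex()
idx.insert("k17", "record-17")
print(idx.search("k17"), idx.search("k99"))  # record-17 None
```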

