SEDE 2010

19th International Conference on Software Engineering and Data Engineering

Prem Devanbu, University of California at Davis, USA

Prem Devanbu is Professor of computer science at UC Davis. He received his B.Tech. in electrical engineering from IIT Madras in 1977, and an M.S. and Ph.D. in computer science from Rutgers University in 1979 and 1994, respectively. He was a Principal Member of Technical Staff at Bell Laboratories, Murray Hill, and its offshoot, AT&T Labs Research, until January 1998, when he left behind AT&T Labs, snow, good bagels, and bad traffic to join UC Davis, where he started what is now a very active software engineering research group. Devanbu has served on the editorial boards of the IEEE Transactions on Software Engineering and the ACM Transactions on Software Engineering and Methodology, and is now on the board of the Journal of Empirical Software Engineering. He was program chair of ACM SIGSOFT 2006, and is now program co-chair of ICSE 2010. He has served on numerous occasions on program committees of various conferences (including SIGSOFT, OOPSLA Onward! and ICSE) and on National Science Foundation panels. Some legume-numerologists may be impressed that he has published over 100 papers, but you shouldn't pay attention to such measures. More to the point, he was rated among the top 10 software engineering researchers in North America in a survey conducted by researchers at the University of California at Irvine. There are a few such surveys, which produce a variety of permutations, but Prof. Devanbu earnestly hopes that this one was particularly objective.

Fair and Balanced? Bias in bug-fix datasets

Prem Devanbu, UC Davis

When a bug is fixed, programmers are supposed to report where it was repaired. The association between the bug and the repair locus is critical to quality-improvement efforts such as bug prediction and defect etiology and avoidance. But what if programmers don't tell us where all the bugs are fixed? If they fail to report bug fixes or, worse, report them selectively, we might get a biased sample of bug-repair data. Open-source bug-repair data can be (and has been) very useful in academic software engineering research. We will describe this question of bias more precisely, and describe our investigations into the prevalence and effects of bias.

Michael Franklin, University of California Berkeley, USA

Michael Franklin is a Professor of Computer Science at UC Berkeley focusing on new approaches for data management and data analysis. His recent projects have spanned systems ranging from dynamic networks of tiny wireless sensor devices to large-scale scientific grid computing and cloud computing infrastructures. He is a co-founder and CTO of Truviso, Inc., a real-time data analytics company that enables customers to quickly make sense of diverse, high-speed, continuous streams of information. He is a Fellow of the Association for Computing Machinery, and a recipient of the National Science Foundation CAREER award and the ACM SIGMOD "Test of Time" award. He received his Ph.D. from the University of Wisconsin-Madison (1993), his M.S.E. from the Wang Institute of Graduate Studies (1986), and his B.S. from the University of Massachusetts, Amherst (1983).

Solving the Scalability Dilemma with Clouds, Crowds, and Algorithms

Michael J. Franklin, UC Berkeley
http://www.cs.berkeley.edu/~franklin

The creation, analysis, and dissemination of data have become profoundly democratized. Social networks spanning hundreds of millions of users enable instantaneous discussion, debate, and information sharing. Streams of tweets, blogs, photos, and videos identify breaking events faster and in more detail than ever before. Deep, on-line datasets enable analysis of previously unreachable information. This sea change is the result of a confluence of Information Technology advances such as intensively networked systems, cloud computing, social computing, and pervasive devices and communication.

The key challenge is that the massive scale and diversity of this continuous flood of information break our existing technologies. State-of-the-art Machine Learning algorithms do not scale to massive data sets. Existing data analytics frameworks cope poorly with incomplete and dirty data and cannot process heterogeneous multi-format information. Current large-scale processing architectures struggle with diversity of programming models and job types and do not support the rapid marshalling and unmarshalling of resources to solve specific problems. All of these limitations lead to a Scalability Dilemma: beyond a point, our current systems tend to perform worse as they are given more data and more processing resources and as they involve more people, which is exactly the opposite of what should happen.

The Berkeley RADLab is a collaborative effort focused on cloud computing, involving nearly a dozen faculty members and postdocs, several dozen students, and fifteen industrial sponsors. The lab is in the final year of a five-year effort to develop the software infrastructure to enable rapid deployment of robust, scalable, data-intensive internet services. In this talk I will give an overview of the RADLab effort and take a deeper dive into several projects, including PIQL, a performance-insightful query language for interactive applications, and SCADS, a self-managing, scalable key-value store. I will also give an overview of a new effort we are starting on next-generation cloud computing architectures (called the "AMPLab", for Algorithms, Machines, and People) focused on large-scale data analytics, machine learning, and hybrid cloud/crowd computing. In a nutshell, the RADLab approach has been to use Statistical Machine Learning in the service of building large-scale systems. The AMPLab is exploring the other side of this relationship, namely, using large-scale systems to support Statistical Machine Learning and other analysis techniques for data-intensive applications. And given the central role of the cloud in a world of pervasive connectivity, a key part of the research agenda is to support collaborative efforts of huge populations of users connected through cloud resources.