Prem Devanbu, University of California at Davis, USA
Prem Devanbu is Professor of computer science at UC Davis. He
received his B.Tech in electrical engineering from IIT Madras in 1977, and an
M.S. and Ph.D. in computer science from Rutgers University in 1979 and 1994,
respectively. He was a Principal Member of Technical Staff at Bell
Laboratories, Murray Hill, and its offshoot, AT&T Labs Research, until
January 1998, when he left behind AT&T Labs, snow, good bagels, and bad
traffic to join UC Davis, where he started what is now a very active software
engineering research group. Devanbu has served on the editorial board of the
IEEE Transactions on Software Engineering and the ACM Transactions on Software
Engineering and Methodology, and is now on the board of the Journal of
Empirical Software Engineering. He was program chair of ACM SIGSOFT 2006, and
is now program co-chair of ICSE 2010. He has served on numerous occasions on
program committees of various conferences (including SIGSOFT, OOPSLA Onward!
and ICSE) and National Science Foundation panels. Some legume-numerologists may
be impressed that he has published over 100 papers, but you shouldn't pay
attention to such measures. More to the point, he was rated among the top 10
software engineering researchers in North America in a survey conducted by
researchers at the University of California at Irvine. There are a few such
surveys, which produce a variety of permutations, but Prof. Devanbu
earnestly hopes that this one was particularly objective.
Fair and Balanced? Bias in bug-fix datasets
Prem Devanbu, UC Davis
When a bug is fixed, programmers are supposed to report where it was repaired.
The association of a bug with its repair locus is critical to
quality-improvement efforts such as bug prediction and defect etiology and
avoidance. But what if programmers don't tell us where all the bugs are fixed?
If they fail to report bug fixes, or worse, report them selectively, we may get
a biased sample of bug-repair data. Open-source bug-repair data can be (and has
been) very useful in academic software engineering research. We will state this
question of bias more precisely, and describe our investigations into the
prevalence and effects of bias.
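As a toy illustration (our own sketch, not taken from the talk), the following Python snippet shows how selective linking of bug fixes can bias the sample. The bug counts and the severity-dependent linking rates are assumed numbers chosen purely for demonstration: if severe bugs are more likely than minor ones to be linked to their fixing commits, the linked sample over-represents severe bugs relative to the full population of fixed bugs.

```python
# Hypothetical sketch: severity-dependent reporting of bug fixes
# produces a biased "linked" sample of bug-repair data.
import random

random.seed(0)

# Assumed population: 10,000 fixed bugs, roughly half severe, half minor.
fixed_bugs = [{"severity": random.choice(["severe", "minor"])} for _ in range(10_000)]

# Assumed (illustrative) probabilities that a fix gets linked to its bug report.
LINK_PROB = {"severe": 0.9, "minor": 0.3}
linked = [b for b in fixed_bugs if random.random() < LINK_PROB[b["severity"]]]

def share_severe(bugs):
    """Fraction of bugs in the sample that are severe."""
    return sum(b["severity"] == "severe" for b in bugs) / len(bugs)

print(f"severe share, all fixed bugs:   {share_severe(fixed_bugs):.2f}")  # ~0.50
print(f"severe share, linked bugs only: {share_severe(linked):.2f}")      # ~0.75
```

Any defect-prediction model trained only on the linked sample would, in this toy setup, see a population of bugs that looks quite different from the one it is meant to predict.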
Michael Franklin, University of California Berkeley, USA
Michael Franklin is a Professor of Computer Science at UC Berkeley focusing on
new approaches for data management and data analysis. His recent projects have
spanned systems ranging from dynamic networks of tiny wireless sensor devices
to large-scale scientific grid computing and cloud computing infrastructures.
He is a co-founder and CTO of Truviso, Inc., a real-time data analytics company
that enables customers to quickly make sense of diverse, high-speed, continuous
streams of information. He is a Fellow of the Association for Computing
Machinery and a recipient of the National Science Foundation CAREER award and
the ACM SIGMOD "Test of Time" award. He received his Ph.D. from the University
of Wisconsin-Madison (1993), his M.S.E. from the Wang Institute of Graduate
Studies (1986), and a B.S. from the University of Massachusetts, Amherst (1983).
Solving the Scalability Dilemma with Clouds, Crowds, and Algorithms
Michael J. Franklin, UC Berkeley
http://www.cs.berkeley.edu/~franklin
The creation, analysis, and dissemination of data have become profoundly
democratized. Social networks spanning hundreds of millions of users enable
instantaneous discussion, debate, and information sharing. Streams of tweets,
blogs, photos, and videos identify breaking events faster and in more detail
than ever before. Deep, on-line datasets enable analysis of previously
unreachable information. This sea change is the result of a confluence of
Information Technology advances such as intensively networked systems, cloud
computing, social computing, and pervasive devices and communication.
The key challenge is that the massive scale and diversity of this continuous
flood of information break our existing technologies. State-of-the-art Machine
Learning algorithms do not scale to massive data sets. Existing data analytics
frameworks cope poorly with incomplete and dirty data and cannot process
heterogeneous multi-format information. Current large-scale processing
architectures struggle with the diversity of programming models and job types
and do not support the rapid marshalling and unmarshalling of resources to
solve specific problems. All of these limitations lead to a Scalability
Dilemma: beyond a point, our current systems tend to perform worse as they are
given more data and more processing resources and involve more people - exactly
the opposite of what should happen.
The Berkeley RADLab is a collaborative effort focused on cloud computing,
involving nearly a dozen faculty members and postdocs, several dozen students,
and fifteen industrial sponsors. The lab is in the final year of a five-year
effort to develop the software infrastructure to enable rapid deployment of
robust, scalable, data-intensive internet services. In this talk I will give an
overview of the RADLab effort and do a deeper dive on several projects,
including PIQL, a performance-insightful query language for interactive
applications, and SCADS, a self-managing, scalable key-value store. I will also
give an overview of a new effort we are starting on next-generation cloud
computing architectures (called the "AMPLab" - for Algorithms, Machines, and
People) focused on large-scale data analytics, machine learning, and hybrid
cloud/crowd computing. In a nutshell, the RADLab approach has been to use
Statistical Machine Learning in the service of building large-scale systems.
The AMPLab is exploring the other side of this relationship, namely, using
large-scale systems to support Statistical Machine Learning and other analysis
techniques for data-intensive applications. And given the central role of the
cloud in a world of pervasive connectivity, a key part of the research agenda
is to support collaborative efforts of huge populations of users connected
through cloud resources.