KC Claffy has played a leading role in Internet research for more than a decade. She is the principal investigator for the Cooperative Association for Internet Data Analysis (CAIDA), which is based at the San Diego Supercomputer Center (SDSC) and provides tools and analyses to promote a robust, scalable global Internet infrastructure. A research scientist at SDSC, she studies the collection, analysis, and visualization of workload, routing, topology, performance, and economic data on the Internet. She has been at SDSC since 1991 and holds a Ph.D. in Computer Science from UC San Diego.
Q: You co-founded the Cooperative Association for Internet Data Analysis, CAIDA, a little over 10 years ago. Can you tell us how CAIDA has evolved, and what you're focusing on today?
A: I founded CAIDA to address a problem that began the year after I earned my Ph.D. and that has since grown into a crisis, despite the best efforts of CAIDA and others: the lack of available empirical data on the public Internet as the infrastructure has been privatized.
I finished my Ph.D. in computer science and engineering at UCSD in 1994 with a thesis that relied on network traffic and performance measurements from the NSFNET backbone network, the general-purpose backbone that supported the U.S. research and education community at the time. (SDSC was a transit node on NSFNET, and Hans-Werner Braun, an architect of the NSFNET, had just arrived at SDSC, so we had access to a lot of data and knowledge about the infrastructure.) Under the terms of its cooperative agreement to provide the backbone, the operator, Merit, made certain measurements available via FTP every month, and we also took packet header trace measurements at SDSC, UCSD, and NCSA, which gave my thesis, "Internet traffic characterization," strong empirical grounding.
The year after my graduation, the NSFNET backbone was decommissioned as commercial provisioning of Internet service by private sector players took off. Publicly available traffic data on or about large-scale IP networks also went away that year, which made me fear for the future of Internet research. I started CAIDA to try to narrow the already growing gap between the Internet research community and the Internet providers and users.
Together, my colleagues and I have managed to grow and maintain a research group through increasingly difficult funding periods for science, but over the last decade our ability to get data from commercial Internet providers has gradually diminished, and so has the quality of science in the field.
Q: Can you explain why the ability to get data diminished?
A: The data that exists within commercial ISPs (Internet Service Providers) is considered proprietary. Providers worry that competitors could use it to steal customers or otherwise harm their business. Other important data is not collected at all, because there is neither an economic incentive to collect it nor any regulation requiring it. Metrics that currently rest on dangerously insubstantial measurement include the volume and patterns of data traffic; the structure and evolution of Internet topology; the extent and locations of congestion; the volume of spam, phishing, and DoS (Denial of Service) attacks and the number of their sources; the patterns and distribution of ISP interconnectivity; and other metrics that are critical to analyzing the security, stability, scalability, and sustainability of the Internet.
With mixed success, CAIDA and many others in the research community have navigated the many obstacles to collection and analysis of traffic data on the commercial Internet -- not only the technical and engineering challenges but also the more daunting legal (privacy), logistical, and proprietary considerations. But the unfortunate reality is that while the Internet has already become critical communications infrastructure for business, education, public safety, health care, and civil society, there is amazingly little rigorous empirical inquiry to inform opinion, much less policy, on how to solve problems of the Internet that have persistently resisted solution for the last decade.
Q: What have we learned from and about Internet measurement, or the lack thereof?
A: Over time it became clear to me that there is a common set of operational problems across the Internet industry, which can be classified into four dimensions of the Internet as emerging critical infrastructure: safety, scalability, sustainability, and stewardship. The bad news is that progress on all of these operational problems, even those that seem technical in nature, is blocked by non-technical issues of economics, ownership, and trust. For ten years CAIDA sought to tackle one problem, measurement, whose biggest obstacles had clearly long been economic (cost of instrumentation and data management), ownership (legal access to data), and trust (privacy and security obstacles to measurement). A more recent, and more painful, insight was that measurement is not unique in this regard: all persistently unsolved operational problems of the Internet are similarly blocked by issues of economics, ownership, and trust.
The economic forces of the industry are a key factor because they drive the policy conversations in Washington right now. Unless we directly confront the economic constraints that network infrastructure providers face, the integrity of network science, of communications policy, and indeed of our own national information infrastructure will always be suspect. Although it is emerging as the essential communications fabric of our professional and personal lives, the Internet has not yet stabilized from the tremendous privatization and commercialization of infrastructure that began in the early to mid-1990s. After a decade of boom and bust, consolidation continues. The largest of the remaining providers publicly insist that they will not be able to make the investment required to build out broadband infrastructure unless they can adopt more flexible pricing strategies to recover costs; that is, they want to implement differential pricing by type of traffic. Legal scholars have long argued that this development threatens First Amendment freedoms, since it would give providers a lever to control how users of their infrastructure communicate.
While such dramatic developments unfold in the policy realm in Washington and around the world, the network science community must sit by, frustrated at being unable to conduct the empirical network investigations that would support not only the scientific and engineering community but also the policymaking community, where the lack of data now carries ominous constitutional implications. The recent controversy over the NSA's access to commercial Internet links only heightened the already well-established paranoia about traffic data collection, further hampering this already stunted field. At this rate, by the end of the decade the network research community will be one of the few groups of people who do not have access to Internet data!
This recognition has led to a change in strategy for CAIDA. It is no longer appropriate to pursue solutions to the Internet's problems without tackling the related economic, ownership, and trust issues. CAIDA's activities have always spanned the four Ss -- security, scalability, sustainability, and stewardship -- but we have begun to refocus current projects and pursue new ones that openly navigate links between technology, economics, and policy.
Q: You've pointed out that the lack of available measurement data on the public Internet, as the infrastructure has been privatized, makes it hard to understand the complexities of the Internet or develop informed policy. Can you tell us why this is important?
A: Well, we should recognize the reality: the United States is facing a worsening information infrastructure crisis. Over the past half-decade the U.S. has fallen behind a growing list of industrialized nations in delivery speeds, price per megabit, broadband penetration rates, and other facets of broadband service provision. Our personal and national security realities are even more disturbing, since the best (but not good) available data shows a formidable growth in the volume and extent of unwanted and malicious traffic and activity, such as DoS attacks, identity theft, spam, phishing, viruses, and worms. The more of our lives we migrate to this digital realm, the more risk we assume. A targeted attack, relying on technical as well as social-engineering capabilities that have already been demonstrated, could cut off for some period of time not only our channels of personal communication and entertainment but also our banking, financial services, e-commerce, and supply chain infrastructure, with devastating economic impacts.
To emphasize my earlier point: regulatory, political, and market constraints on providers have left Internet researchers unable to study mission-critical aspects of the Internet, including its current robustness, capacity, usage, and vulnerabilities. Potential solutions to persistently unsolved problems thus remain a matter of uninformed conjecture rather than rigorous, empirically grounded analysis.
Q: There's a lot of concern about the future of the Internet and whether it will be turned into a private toll road or remain an open public information highway. What do you see as the principal opportunities, and the main challenges, that lie ahead, and how can CAIDA help?
A: The good news is that there's a growing realization in society that the Internet is critical infrastructure for our nation and the world. Historically, new infrastructures such as railroads, the telegraph, and the electric grid started out as the Internet did: "in the wild" and largely unregulated. Once everyone, especially voters, considers Internet access critical to their lives, their elected representatives will take an interest in ensuring stability and universal access as essential services. In fact, our broader reliance on the Internet has already led to discussion in the U.S. Congress and elsewhere about how the Internet should develop. For example, what requirements and incentives should there be to ensure connectivity for the significant still-unconnected segment of our own country's population? The discussion is healthy, but the dearth of empirical data hinders informed debate.
And there are lots of policy issues at stake now. In contrast to many other countries, the U.S. recently removed the policy that required open access for competitors to the pipes into people's homes and businesses. There is no clear path to competition without open access requirements. Facilities-based competition (which assumes that sufficient competition will emerge across entirely independent physical facilities such as DSL, cable, and satellite) has failed to fulfill its promise of recapturing U.S. leadership in the Internet industry. On the contrary, since open access was removed, the best available data suggests a drop in competition and, in what is arguably a related trend, a drop in our international ranking in broadband penetration. We try to help governments base public policy strategies on the best available empirical data and quantitatively measure the performance of those strategies against their intended results. Having good data is essential to good policy.