The National Telecommunications and Information Administration (NTIA) requested public comment on widely available model weights in open-source foundation models. The NTIA’s posture is one of attempting to understand the landscape of the AI frontier so that governance maximizes the benefits of continued AI innovation. The following excerpt highlights one of its assumptions:
Dual use foundation models with widely available weights (referred to here as open foundation models) could play a key role in fostering growth among less resourced actors, helping to widely share access to AI’s benefits… The concentration of access to foundation models into a small subset of organizations poses the risk of hindering such innovation and advancements, a concern that could be lessened by availability of open foundation models.
Neil Chilson and I submitted a comment emphasizing the prominence of open source principles in the history of computer science and software development:
The history of computer science and software development demonstrates that openness in software has significant benefits that outweigh any costs, and NTIA should begin its analysis of foundational models with widely available open weights from that default position.
Open source software democratizes access and empowers contribution.
Circuit Boards
Open source software is not only embedded in the informational infrastructure of our economy; the free and open source software (FOSS) movement was also present for, and partially responsible for, the rapid development of the technologies that enabled the internet as we know it. In 1973 Donald O. Pederson released his program, SPICE (Simulation Program with Integrated Circuit Emphasis), into the public domain. SPICE was significant because the early 1970s were a time when the digital computer was beginning to gain prominence within scientific circles. SPICE was software that taught scientists and hobbyists the fundamentals of the digital computer: the integrated circuit. SPICE enabled contribution through educational access to the technological frontier. Large language models are the current technological frontier, and open model weights, ideally in combination with open training data, give an enormous number of people access to one of society’s most powerful technologies.
In the late 1960s and early 1970s, AT&T’s Bell Labs developed the original Unix operating system. The following decade, researchers at the University of California, Berkeley, equipped with DARPA funding, incorporated open source design and implementation principles into their Berkeley Software Distribution (BSD) Unix. BSD Unix remains widely used today, and its code was used to build the macOS, iOS, and Sony PlayStation operating systems. The BSD team also produced a widely adopted implementation of the TCP/IP protocols, the foundations of the internet. TCP/IP was instrumental in the evolution of ARPANET into the modern internet. The internet would look very different without the adoption of open source principles. Open source foundation models allow for high rates of experimentation; what we will create is yet to be determined.
The internet today is the central medium of human communication and has created an unprecedented repository of human knowledge. This is the case because anyone, including those without a technical background, can easily interface with and contribute to this repository. This wasn’t always the case, however. In the late 1980s and early 1990s, interfacing with the internet was a large technical challenge, which limited who could contribute. To create the World Wide Web as we know it, this barrier had to be lowered.
Mosaic, the first widely popular web browser, was created in 1993 at the National Center for Supercomputing Applications (NCSA) at the University of Illinois Urbana-Champaign. Robert McCool wrote the NCSA HTTPd web server there, and after he left NCSA a group of developers built on his code to release a widely adopted web server in 1995. The server was named Apache and launched as a free open source project, which it remains today. The contributors at Apache believe:
The tools of online publishing should be in the hands of everyone, and that software companies should make their money by providing value-added services such as specialized modules and support, amongst other things.
By 2014 Apache was hosting 1 billion websites, and today Apache and Nginx (another open source web server) run over 60% of the world’s websites.
Apache showcased the viability of building a commercial software business on open source principles, and many companies would follow. Early programmers understood that the internet would create the most value for society if contributing through the creation of software was accessible. Programming is still very difficult today, but in the 1990s and early 2000s it was significantly less accessible to the general public. Programming involves abiding by the strict rules of a programming language, which require the developer to memorize large amounts of syntax, boilerplate code, best practices across varying programming paradigms, and even algorithms. This can make the process of writing a program grueling without assistance. Some of this assistance comes from integrated development environments (IDEs), which give developers access to tools that make programming and developing applications easier. Open source principles are prevalent in the market for IDEs: eight of the ten most popular IDEs for the Java programming language are open source. IDEs made it easier to interface with a computer, and large language models are now revolutionizing that ease of use.
Not only is open source prevalent in the history of software, it is also instrumental for many modern projects. Open source enables those with small means to make a large impact. Although training costs for state-of-the-art large language models are falling, training OpenAI’s GPT-3 was estimated to cost $4.6M, a price that puts such training out of reach for resource-constrained organizations.
Social Change
The NTIA’s request for public comment received 332 comments. Among them are three unique perspectives from organizations that may not regularly participate in activism: JusticeText, Recidiviz, and The Last Mile. All three are mission-driven organizations and startups dedicated to leveraging technology to increase equity in the criminal justice system.
JusticeText
JusticeText is a software platform developed by University of Chicago students in 2019, aimed at improving justice outcomes for low-income defendants by leveraging technology to analyze body-worn camera footage, interrogation videos, and jail calls. Utilizing advanced speech recognition and natural language processing, it automates evidence transcription and the identification of critical moments, thereby addressing the significant challenges faced by overburdened public defenders. Central to JusticeText’s development is the use of open source software, which not only gives the team access to innovative tools that let it compete with better-resourced competitors but also ensures the platform can be adapted and refined to meet the specific needs of criminal justice reform.
Recidiviz
Recidiviz is a nonprofit technology organization dedicated to reforming the criminal justice system by partnering with state corrections departments across the U.S. to leverage data for systemic change. Since its inception in 2019, Recidiviz has developed an open source data platform that processes data from 15 state partners, enabling the identification of individuals eligible for release or those who could benefit from targeted support, thereby facilitating their reintegration into society. This initiative has reached 39% of the U.S. incarcerated population, helping to accelerate the release of over 103,500 people. Open source software is fundamental to Recidiviz’s mission to promote transparency, accountability, and trust between the organization and its partners, ensuring that their tools and algorithms avoid causing unintended consequences.
The Last Mile
The Last Mile is a program aimed at transforming the lives of justice-impacted individuals through education and technology training within correctional facilities. Open source software is instrumental to the educational programs at The Last Mile because correctional facilities do not have access to an internet connection. Given that constraint, the licensing restrictions of closed source software leave The Last Mile reliant on open source software to teach within a correctional facility.
Historical insights reveal the profound creativity unleashed through embracing open source software principles, delivering unparalleled value to society in both tangible and intangible ways. From laying the foundations of the internet to supporting public defenders, facilitating the release of eligible inmates, and offering educational opportunities for inmate reintegration into the workforce, open source has been pivotal. With large language models poised to become the next major software paradigm, regulators should prioritize ensuring open and democratic access.