History of Federal Public Data: Past, Present, and Uncertain Futures
Public data is a quiet cornerstone of U.S. democracy. Whether tracking air quality, monitoring disease outbreaks, or evaluating disparities in criminal sentencing, the ability to access timely and reliable federal data empowers citizens, researchers, and policymakers alike.
The Inkwell Global Report dashboard monitors data availability across key agencies that shape public policy and lived experience. Below is a brief introduction to each agency we track, followed by a look at the history of Data.gov and what’s at stake for its future.
The Agencies We Track
CDC – Centers for Disease Control and Prevention
Maintains data on infectious disease, injury, maternal health, and other health-related topics. Public health officials, journalists, and academic researchers rely on CDC dashboards and surveillance reports to inform both crisis response and long-term policy planning.
Census Bureau
Provides population-level data critical to redistricting, federal funding, education planning, and social science. Census products like the American Community Survey (ACS) are foundational to nearly every demographic model in the U.S.
DOJ – Department of Justice
Collects and reports data on crime, incarceration, and law enforcement practices, often through the Bureau of Justice Statistics (BJS). These datasets are essential for understanding systemic inequities in the criminal legal system.
EPA – Environmental Protection Agency
Publishes environmental data on air and water quality, toxic releases, and climate indicators. EPA tools are used in environmental justice mapping, risk modeling, and compliance monitoring.
HHS – Department of Health and Human Services
Oversees multiple sub-agencies like CMS and SAMHSA that release vital datasets on healthcare utilization, insurance coverage, opioid use, and mental health. Many HHS datasets underpin federal health equity and policy analysis efforts.
NSF – National Science Foundation
Funds scientific research and collects metadata on STEM education, research outputs, and federal science funding. Its National Center for Science and Engineering Statistics (NCSES) is one of the most important sources of STEM workforce data.
NOAA – National Oceanic and Atmospheric Administration
Maintains climate, weather, and marine data. This data is used in everything from hurricane prediction and drought modeling to commercial shipping and fisheries management.
USDA – U.S. Department of Agriculture
Manages agricultural production data, food insecurity statistics, rural economic indicators, and food safety alerts. USDA data supports food policy, market forecasts, and nutritional programs.
🔍 How People Use Federal Data
- Public accountability and journalism
- Scientific research and program evaluation
- Local planning and emergency preparedness
- Business innovation and environmental modeling
Data, when timely and trustworthy, becomes the connective tissue between lived experience and responsive governance. When it’s missing, broken, or hidden, public trust erodes.
🗂️ Data.gov and Other Public Data Sites
Data.gov launched in 2009 as a central portal for federal datasets under the Obama administration’s Open Government Initiative. It aimed to:
- Consolidate access to agency datasets
- Enable civic tech innovation
- Foster data-driven decision-making
At its peak, Data.gov linked to over 250,000 datasets. But as agencies shifted to hosting their own dashboards and funding for data maintenance grew more fragile, the platform became harder to navigate—and less reflective of real-time data health.
Unfortunately, there are many ways to present data, and not all of them are easy to access. Several agencies have excellent data but no API (application programming interface) or metadata that lets other computers easily understand what is there. As a result, despite our aim for completeness, our counts of released datasets are only partial. For example, the CDC lists some of its data on Data.gov but also maintains dedicated sources of its own, which the Inkwell Global Report Transparency Dashboard pulls from instead. Specifically, Inkwell queries CDC and Census Bureau sources directly. Although the CDC publishes many datasets, only its Socrata portal (www.data.cdc.gov) has an API that allows easy access to the data. The other sites are evaluated with BeautifulSoup, a Python library for parsing HTML, which extracts information directly from their web pages.
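To illustrate why an API with usable metadata makes counting easier, here is a minimal Python sketch that tallies catalog entries by the month they were created. It is not Inkwell's actual code. The record shape (a `resource` object with a `createdAt` timestamp) is an assumption modeled on Socrata-style catalog responses, and the sample entries are made up so the example runs without network access.

```python
from collections import Counter
from datetime import datetime


def count_datasets_by_month(results):
    """Count catalog entries per 'YYYY-MM' based on their creation timestamp.

    `results` is assumed to be a list of dicts shaped like
    {"resource": {"name": ..., "createdAt": "2024-11-05T14:30:00.000Z"}}.
    This shape is an assumption for illustration, not a confirmed schema.
    """
    counts = Counter()
    for entry in results:
        created = entry.get("resource", {}).get("createdAt")
        if not created:
            # Entries without a creation date cannot be placed in a month.
            continue
        # Only the leading date portion (YYYY-MM-DD) is needed here.
        month = datetime.strptime(created[:10], "%Y-%m-%d").strftime("%Y-%m")
        counts[month] += 1
    return dict(counts)


# Made-up sample entries; names and dates are hypothetical.
sample_results = [
    {"resource": {"name": "Example dataset A", "createdAt": "2024-11-05T14:30:00.000Z"}},
    {"resource": {"name": "Example dataset B", "createdAt": "2024-11-20T09:00:00.000Z"}},
    {"resource": {"name": "Example dataset C", "createdAt": "2024-12-01T00:00:00.000Z"}},
    {"resource": {"name": "Example dataset D"}},  # no metadata, so it is skipped
]

monthly = count_datasets_by_month(sample_results)
print(monthly)
```

The skipped fourth entry shows the limitation in miniature: a dataset that exists but lacks machine-readable metadata drops out of the count, which is one reason the totals are only partial.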
The CDC’s Socrata portal covers disease surveillance; chronic conditions and the behaviors that lead to them; maternal and child health; environmental and social health, such as social vulnerability indexing and urban heat exposure; injury, violence, and substance abuse; and overall mortality and vital statistics such as life expectancy. The Morbidity and Mortality Weekly Report (MMWR) is the CDC’s primary loudspeaker. Most of its data comes from local reporting, special investigations, and some large national surveys. Its releases take the form of peer-reviewed reports on datasets found elsewhere, which makes it challenging to count new datasets. The last source Inkwell checks for new datasets is the Vaccine Adverse Event Reporting System (VAERS), which is co-managed by the FDA. VAERS is a specialized database designed for on-site queries. It lacks robust metadata and is therefore also very difficult to count. Its updates are yearly and account for the small yearly spike seen in the CDC data.
Other notable spikes on the dashboard mark each agency’s peak month on the normalized scale. Normalization expresses each agency’s monthly count as a percentage of that agency’s own highest monthly total, which allows comparisons over time. It still helps to understand a little of what was happening in those peak months.
- CDC (November 2024): Completed the Public Health Data Authority initiative, which focused on modernizing its data process and emphasized end-of-year publishing.
- HHS (November 2020): Loaded previously separate data onto Data.gov.
- DOJ (August 2021): Updated its metadata across all series.
- USDA (March 2021): Quick Stats database published a large compressed dataset.
- NSF (February 2021): Marked the initial upload of NSF datasets.
- NOAA (October 2, 2024): Updated the metadata of its datasets.
- Census Bureau: Release of the 2024 Census information.
Despite these incongruities, a pattern of data release becomes evident over time. It is also important to remember that CDC data is gathered separately and spans the full 15 years covered, whereas the other organizations’ dataset releases can only be counted for the last 5 years, even though their data collection dates back significantly further.
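The normalization described above, where each agency’s monthly count is expressed as a percentage of that agency’s own highest monthly total, can be sketched in a few lines of Python. This is an illustrative sketch, not the dashboard’s actual code. The function name, the agency labels, and the counts are all made up.

```python
def normalize_to_peak(monthly_counts):
    """Scale monthly counts to a percentage of the series' own peak month."""
    peak = max(monthly_counts.values(), default=0)
    if peak == 0:
        # No releases at all: avoid dividing by zero and report 0% everywhere.
        return {month: 0.0 for month in monthly_counts}
    return {month: 100.0 * count / peak for month, count in monthly_counts.items()}


# Hypothetical monthly dataset counts for two agencies of very different size.
agency_counts = {
    "Agency A": {"2024-09": 10, "2024-10": 40, "2024-11": 20},
    "Agency B": {"2024-09": 500, "2024-10": 250, "2024-11": 1000},
}

normalized = {
    agency: normalize_to_peak(counts) for agency, counts in agency_counts.items()
}

for agency, series in normalized.items():
    print(agency, series)
```

Because every agency is scaled to its own peak, a value of 100 for one agency and 100 for another can represent very different raw volumes. The scale is useful for comparing the timing of release activity, not its size.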
🔮 The Uncertain Future
- Some agencies interpret “open” loosely or inconsistently.
- Political priorities and budget constraints undermine enforcement.
- Tools break, dashboards vanish, and metadata lags.
When dashboards stop updating or links die, the damage is often quiet—but profound. This isn’t just technical neglect. It’s a form of civic decay.
Lately, many of DOGE’s improvements have left well-established data trails underfunded or unfunded. These more recent changes are now what alter the future most significantly.
🧭 Why We Track
Inkwell Global Report watches these systems for signs of disruption. Are updates regular? Are APIs stable? Is metadata available and usable? Are links quietly going dark?
We believe public data is public infrastructure. And we track it like the foundation it is.
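One of the questions above, whether updates are regular, lends itself to a simple check. The following Python sketch flags datasets whose last update is older than a chosen threshold. It is a hypothetical illustration, not Inkwell’s monitoring code. The dataset names, dates, and the 365-day threshold are assumptions.

```python
from datetime import date


def flag_stale(last_updates, today, max_age_days=365):
    """Return the names of datasets not updated within max_age_days, sorted."""
    return sorted(
        name
        for name, updated in last_updates.items()
        if (today - updated).days > max_age_days
    )


# Hypothetical last-update dates for three datasets.
last_updates = {
    "dataset-a": date(2025, 1, 15),
    "dataset-b": date(2023, 6, 1),
    "dataset-c": date(2024, 9, 1),
}

stale = flag_stale(last_updates, today=date(2025, 3, 1))
print(stale)
```

A fixed threshold is a deliberate simplification. A source that only updates yearly would need a longer window than one that updates monthly.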
📚 Source Citations
- CDC
CDC Data & Statistics
Data Modernization Initiative
- Census Bureau
Census Data Tools
American Community Survey
- DOJ / Bureau of Justice Statistics
BJS Data Collections
DOJ Data.gov page
- EPA
EPA Environmental Dataset Gateway
EPA Open Data
- HHS
HHS Protect Public Data Hub
HealthData.gov (CMS, SAMHSA, etc.)
- NSF
NCSES Data Tools
NSF Data.gov page
- NOAA
NOAA National Centers for Environmental Information
NOAA Open Data
- USDA
USDA National Agricultural Statistics Service
USDA Economic Research Service
- Data.gov / Open Government
Data.gov About
Foundations for Evidence-Based Policymaking Act of 2018
Federal Data Strategy
OMB M-19-23: Phase 1 Implementation