Before You Cry ‘Fraud’: What DOGE’s Open-Source Medicaid Data Is (and Isn’t)
DOGE’s Medicaid data drop should have been a good news story, but it is being used to fuel reckless speculation about fraud in Medicaid.
Authors: Jocelyn Guyer, Avi Herring and Laura Nolan
Editors: Patti Boozang and Amanda Eisenberg
tl;dr
On Feb. 13, the Department of Government Efficiency’s (DOGE) Department of Health and Human Services (HHS) team released a large data set on Medicaid hospital outpatient and professional services provided between 2018 and 2024. The data set, which represents about a fifth of total Medicaid spending, marks the first time comprehensive data on Medicaid expenditures for an important subset of services in fee-for-service and managed care delivery systems has been made broadly public.
The release of the data should have been unequivocally good news, but it was paired with disingenuous claims amplified on social media that now anyone can easily use these data to conduct their own fraud investigation, prompting a number of wannabe sleuths to begin making confident but often inaccurate accusations of fraud, waste and abuse (FWA).
The new data can be used for many important things — including understanding trends in certain types of Medicaid utilization and expenditures across states — but it cannot serve as a reliable tool to “easily” detect fraud. Even if you had access to comprehensive Medicaid data that is documented with appropriate caveats — a best-case scenario that unfortunately is not at play here with this DOGE-released data — preventing, detecting and prosecuting fraud requires comprehensive data mining and investigative efforts that are cooperative between the federal government and states.
The notion that you can get away with using a DOGE release of partial Medicaid data, AI bots and a long weekend to uncover FWA undercuts the real work needed to address this issue and ensure that Medicaid funds are used to provide critical services to the program’s 80 million beneficiaries.
The 80 Million Impact
Amid HHS’ escalating rhetoric disparaging state stewardship of Medicaid and the federal–state partnership needed to support robust program integrity, DOGE’s publication of the Medicaid provider spending data file could have been a welcome step toward transparency. Instead, when sharing the data, DOGE HHS and its allies made outlandish claims that it now will be easy to find fraud. As a result, the release now appears less like a transparency initiative to improve Medicaid and more like a data drop designed to fuel a broader narrative that states are permitting unmitigated Medicaid fraud and that Medicaid is weak and ill-managed — language that aims to justify further Medicaid cuts.
The reality is that Medicaid is a resoundingly successful program that provides critical care to, at most recent count, almost 80 million people. Like any public or private insurance plan, it must confront the reality of fraud. Medicaid program integrity is a federal–state partnership. States administer the program, but they operate under federal statute, federal financing rules, federal eligibility standards and federal audit frameworks. Oversight is shared by design, and by necessity.
What’s in the data
The data file published by DOGE HHS is a summary of 2018–2024 Medicaid hospital outpatient and professional claims from the Transformed Medicaid Statistical Information System (T-MSIS) files, the Centers for Medicare & Medicaid Services’ (CMS) comprehensive data files collected from all 50 states, the District of Columbia and U.S. territories on a monthly basis. The fields in the summary data file shared with the public are only a fragment of the T-MSIS data files, but they do include:
National Provider Identifiers (NPI)
Healthcare Common Procedure Coding System (HCPCS) procedure codes
Month of claim
Count of Medicaid beneficiaries
Count of claims
Amount paid by Medicaid
The data shows, by month and service, which providers delivered each service, how many beneficiaries they served and how much Medicaid paid them. Prior to this consequential release, access to detailed Medicaid claims required a Data Use Agreement and a secure data environment via the Research Data Assistance Center (ResDAC). With these difficult-to-obtain credentials, researchers, policymakers and others could gain access to fee-for-service data, but not to data on Medicaid paid amounts for managed care encounters, which represent the majority of Medicaid expenditures.
What’s not in the data
The DOGE data excludes information on almost 80% of total Medicaid spending, including (based on CMS’ 2023 National Health Expenditures data):
Inpatient hospital care ($293.8 billion, representing 33.6% of Medicaid spending)
Nursing facility and continuing care stays ($72 billion, representing 8.2% of Medicaid spending)
Retail pharmacy prescriptions ($51.8 billion, representing 5.9% of Medicaid spending)
Other critical Medicaid services
The (very) partial nature of the data drop — and the failure to include any methodology or documentation with it — has resulted in the data being used to tell highly misleading stories. For example, DOGE HHS used the data to generate a chart that suggests personal care services are the largest driver of Medicaid spending, feeding the Trump administration’s narrative that these expenditures — which allow older adults and individuals with disabilities to stay in their homes — are wildly out of control due to fraud. We, too, would be speechless if personal care was the biggest source of spending in Medicaid, but this notion, simply put, is wrong. According to CMS’ NHE data, the biggest category of spending in Medicaid is — by far — hospital care. The actual percentage of all Medicaid spending represented by personal care services was closer to 2.5% in 2023.
The DOGE data file on its own cannot answer questions about how the included services are driving overall Medicaid spending because it’s an incomplete data set. It’s like leaving your mortgage out of your household budget and then concluding groceries are your biggest expense.
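To see how much the denominator matters, here is a minimal Python sketch of the arithmetic. It uses only the two rounded figures cited above (personal care at roughly 2.5% of all Medicaid spending, and a file covering about a fifth of spending) and assumes, purely for illustration, that every personal care dollar appears in the file. The function and variable names are ours, not anything from the DOGE release.

```python
def share(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Rounded figures from the article, expressed as percent of ALL Medicaid spending.
personal_care_pct_of_total = 2.5   # personal care services, 2023
file_coverage_pct_of_total = 20.0  # the file covers about a fifth of spending

# Illustrative assumption: all personal care spending is inside the file.
share_within_file = share(personal_care_pct_of_total, file_coverage_pct_of_total)

# How much any category's share is inflated when the denominator shrinks
# from total spending to the file's slice of spending.
magnification = 100.0 / file_coverage_pct_of_total

print(share_within_file)  # 12.5
print(magnification)      # 5.0
```

Under those assumptions, a service that accounts for 2.5% of the program shows up as 12.5% of the file. Nothing about the spending changed; only the denominator did.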
This data alone cannot be used to identify FWA
It is also misleading to say that the data file alone can be used to easily identify FWA. While identifying outlier providers with a large number of claims or expenditures can be an important first step in identifying potential FWA, analyses cannot stop there. On the contrary, it is irresponsible to contend that expenditure trends alone can be used to establish that FWA is occurring. Rapid growth in home health care within a state could represent remarkable success in helping people move out of nursing homes and allowing them to live in the community, or it could represent a criminal syndicate stealing the identities of a state’s residents and using them to generate fraudulent claims. Identifying fraud takes analysis of expenditure trends, of course, but also formal investigations at the state and/or federal level to differentiate among possible explanations. A serious effort to uncover FWA requires combining data mining techniques with an informed assessment of other contextual factors, such as changing referral and contracting dynamics, policy changes like program scale-up or changes in eligibility, and seasonal effects.
And, as we have said before, both the federal government and states are responsible for ensuring Medicaid program integrity. The best outcomes occur when they work together to identify FWA.
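For readers curious what the outlier "first step" described above might look like in practice, here is a rough Python sketch. Everything in it is hypothetical: the rows, the provider labels, the procedure code `X0000`, the field names and the threshold `k` are invented for illustration and do not come from the DOGE file. It uses a simple median-based screen, which is one of many possible techniques and not a method attributed to CMS, DOGE or any state.

```python
from collections import defaultdict
from statistics import median


def flag_outliers(rows, k=5.0):
    """Flag (code, provider) pairs whose total paid sits far above peers.

    Uses distance from the median in units of the median absolute
    deviation (MAD). A flag is a lead for review, not a finding of fraud.
    """
    # Sum paid amounts across months for each provider and procedure code.
    totals = defaultdict(float)
    for r in rows:
        totals[(r["hcpcs"], r["npi"])] += r["paid"]

    # Group provider totals by procedure code so peers are compared like for like.
    by_code = defaultdict(dict)
    for (code, npi), paid in totals.items():
        by_code[code][npi] = paid

    flagged = []
    for code, providers in by_code.items():
        values = list(providers.values())
        med = median(values)
        mad = median(abs(v - med) for v in values)
        if mad == 0:
            continue  # no spread among peers, so nothing to compare against
        for npi, paid in providers.items():
            if (paid - med) / mad > k:
                flagged.append((code, npi))
    return sorted(flagged)


# Hypothetical rows; "X0000" and the NPI labels are placeholders.
rows = [
    {"npi": "NPI-A", "hcpcs": "X0000", "month": "2023-01", "paid": 60.0},
    {"npi": "NPI-A", "hcpcs": "X0000", "month": "2023-02", "paid": 40.0},
    {"npi": "NPI-B", "hcpcs": "X0000", "month": "2023-01", "paid": 120.0},
    {"npi": "NPI-C", "hcpcs": "X0000", "month": "2023-01", "paid": 110.0},
    {"npi": "NPI-D", "hcpcs": "X0000", "month": "2023-01", "paid": 90.0},
    {"npi": "NPI-E", "hcpcs": "X0000", "month": "2023-01", "paid": 1000.0},
]

print(flag_outliers(rows))  # [('X0000', 'NPI-E')]
```

The screen says only that one hypothetical provider was paid much more than its peers for the same code. It cannot say whether that provider is a large, legitimate agency, the beneficiary of a policy change, or a bad actor. Answering that question is the investigative work that the data alone cannot do.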
The Bottom Line
Open data can strengthen Medicaid, and independent scrutiny can improve public programs. But partial transparency paired with politicized messaging does the opposite — distorting reality rather than illuminating it. Releasing incomplete claims data without context, then treating utilization as proof of fraud, is not serious program integrity reform. If the goal is to reduce fraud, waste and abuse, the path forward is clear: full transparency, analytical rigor and sustained federal–state partnership — not partial data drops that are released with messaging designed to undermine the program itself.


