Serial Killers, FBI DNA Tracking, and Scientific Methods
Tension of Government and Technology
Lately I’ve found myself fascinated by civic tech, or government’s use of technology to provide social services, enforce the law, or manage general administrative tasks. I like how different these two things feel. The things that make tech move fast are often the things governments try to avoid: for governments, moving fast and breaking things < moving slow and making sure things don’t completely break. This brings an obvious tradeoff when it comes to government adoption of new technologies: how can governments provide services as reliably, efficiently, and transparently as the most innovative actors in the private sector? Governments of sufficient size are often monopsonies (single buyers in a market), setting the foundation for entire industry ecosystems in the private market to flourish; think defense. Recently, agencies like the US Digital Service have sprung up to streamline government processes and help safely integrate new technologies into antiquated government systems. It’s easy to shit on government bureaucracy and stagnation and harder to acknowledge the complexity that comes with budgets that require annual approval, the need for bipartisan support, and the often catastrophic consequences of failing to deliver when your client is the citizenry. I’m not sure if this will be a series or a standalone essay, but I wanted to shed light on a fascinating asset of the federal government’s technology stack.
What the heck is CODIS?
CODIS, or the Combined DNA Index System, is a database maintained by the FBI. The database has three main purposes: exonerating wrongly convicted people, linking individuals to crimes, and identifying missing persons and unidentified remains.
The FBI wanted to discourage states from building competing DNA identification software, so it provided CODIS free of charge to state and local crime laboratories. This acted as a safeguard against disjointed data collection and maintenance methods that would fail to identify criminals and missing persons across state lines (take notes).
Adoption of CODIS has been incredibly successful.
As of 2020, it contained more than 14 million offender profiles, more than four million arrestee profiles and more than one million forensic profiles.[11][12]
All 50 states and over 50 other countries now use CODIS software to catalogue this kind of data. This got me wondering…
When is data collected?
Interestingly, some states only collect DNA for CODIS upon conviction, whereas other states also collect it upon arrest for some or all felonies.
Current laws governing DNA collection as of 2023.
🟦 Collection upon conviction only
🟨 Collection from some felony arrests
🟩 Collection from all felony arrests
Privacy considerations
Before you start mulling over too many dystopian visions of DNA-powered government tyranny, you should understand that there are privacy considerations here.
First of all, the database doesn’t contain personal identifying information (name, city of residence, etc.).
CODIS does not contain full genome sequences. This means that sensitive information like propensity for disease or certain behaviors cannot be ascertained through CODIS.
In 2013, the United States Supreme Court ruled in Maryland v. King that the collection of DNA from those arrested for a crime, but not yet convicted, is part of the police booking procedure and is reasonable when that collection is used for identification purposes.[26]
As of April 2018, twelve states have approved the use of familial searching in CODIS.[30]
The Science Behind CODIS
If CODIS doesn’t have ‘identifying information’, how can we use it to confirm or exonerate suspected murderers or to identify missing people?
Short Tandem Repeats (STRs) are short DNA motifs, typically 2 to 6 base pairs long, that repeat back to back (usually 5 to 50 times) at specific places in an individual’s genome. The number of repeats at each place varies from person to person, and an individual inherits one copy from the mother and one from the father. Currently, as a standard, the national level of CODIS requires STRs at 20 different loci (discrete places in the genome) to be profiled and uploaded in order to be added. These loci sit in ‘non-coding’ regions of DNA, which are not translated into proteins and are not known to determine specific traits of an individual, further enhancing the anonymity. The match rarity of a profile has to be at least 1 in 10 million for it to be sufficiently unique for entry into the database. This is a feature that guards against false positives when identifying alleged perpetrators or missing people.
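To get a feel for how a rarity figure like this can be estimated, here is a rough sketch of the textbook "product rule": estimate how common a genotype is at each locus, then multiply across loci. The locus names are real CODIS core loci, but every allele frequency below is invented for illustration, and the function names are my own. Real casework uses published population frequency tables and statistical corrections that this sketch leaves out.

```python
from math import prod

# HYPOTHETICAL allele frequencies: {locus: {repeat_count: population_frequency}}.
# The numbers are made up for illustration; they are not real population data.
ALLELE_FREQS = {
    "D8S1179": {13: 0.30, 14: 0.20},
    "TH01": {6: 0.23, 9.3: 0.30},
    "FGA": {21: 0.17, 22: 0.19},
}

RARITY_THRESHOLD = 1 / 10_000_000  # the "1 in 10 million" figure from the text


def genotype_frequency(locus, alleles):
    """Expected frequency of a two-allele genotype at one locus,
    assuming Hardy-Weinberg proportions (p^2 or 2pq)."""
    a, b = alleles
    p = ALLELE_FREQS[locus][a]
    q = ALLELE_FREQS[locus][b]
    return p * p if a == b else 2 * p * q


def random_match_probability(profile):
    """Multiply per-locus genotype frequencies, assuming loci are independent."""
    return prod(genotype_frequency(locus, alleles) for locus, alleles in profile.items())


def meets_rarity_threshold(rmp, threshold=RARITY_THRESHOLD):
    """True when the profile is at least as rare as the threshold."""
    return rmp <= threshold


# A made-up three-locus profile: one pair of repeat counts per locus.
profile = {"D8S1179": (13, 14), "TH01": (6, 9.3), "FGA": (22, 22)}
rmp = random_match_probability(profile)
print(f"Random match probability: {rmp:.3e}")
print("Rare enough:", meets_rarity_threshold(rmp))
```

With these made-up numbers, three loci only narrow things down to roughly 1 in 1,700 people, nowhere near 1 in 10 million. Each extra locus multiplies in another fraction, which is the intuition for why profiling many loci makes a full profile so discriminating.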
These short tandem repeats rhyme, to an extent, across generations. In other words, your aunt or uncle’s DNA could be used to help link a DNA sample found at the scene of a crime back to you. The more closely related the person is to you, the more informative their sample will be for deducing a match.
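Here is a toy sketch of what a ‘close match’ can look like in data terms. It counts how many alleles two profiles share at each locus and checks whether they share at least one everywhere, which is what you would expect of a parent and child (barring mutation). The profiles and function names are hypothetical. Real familial searches rely on kinship statistics rather than a simple count like this, so treat it as an illustration of the idea only.

```python
def shared_alleles(pair_a, pair_b):
    """Count alleles shared between two genotypes at one locus (0, 1, or 2)."""
    remaining = list(pair_b)
    shared = 0
    for allele in pair_a:
        if allele in remaining:
            remaining.remove(allele)  # each allele can only be matched once
            shared += 1
    return shared


def compare_profiles(profile_a, profile_b):
    """Summarise allele sharing across the loci present in both profiles."""
    loci = profile_a.keys() & profile_b.keys()
    counts = {locus: shared_alleles(profile_a[locus], profile_b[locus]) for locus in loci}
    return {
        "loci_compared": len(counts),
        "full_match_loci": sum(1 for c in counts.values() if c == 2),
        "loci_sharing_at_least_one": sum(1 for c in counts.values() if c >= 1),
    }


def consistent_with_parent_child(profile_a, profile_b):
    """A parent and child share at least one allele at every locus (ignoring mutation)."""
    result = compare_profiles(profile_a, profile_b)
    return (
        result["loci_compared"] > 0
        and result["loci_sharing_at_least_one"] == result["loci_compared"]
    )


# HYPOTHETICAL three-locus profiles; real CODIS profiles cover many more loci.
crime_scene = {"D8S1179": (13, 14), "TH01": (6, 9.3), "FGA": (22, 22)}
relative = {"D8S1179": (13, 15), "TH01": (9.3, 9.3), "FGA": (21, 22)}
stranger = {"D8S1179": (10, 11), "TH01": (7, 8), "FGA": (22, 24)}

print(compare_profiles(crime_scene, relative))
print("Relative consistent:", consistent_with_parent_child(crime_scene, relative))
print("Stranger consistent:", consistent_with_parent_child(crime_scene, stranger))
```

Note that the stranger still shares an allele at one locus by pure chance. With only a handful of loci, coincidental sharing is common, which is part of why a familial hit is treated as a lead to investigate and not as an identification.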
Solving Crimes
As of April 2018, 12 states have approved the use of familial searching for ‘close matches’ to help solve serious criminal cases. One of the most notable cases is that of the Grim Sleeper, Lonnie Franklin. Franklin had no exact DNA matches in CODIS or the California DNA profile database, but forensic researchers identified a similar match: Franklin’s son, who had been added to the database by virtue of a felony weapons charge some years earlier. A cop went undercover as a waiter at a restaurant Franklin went to often and collected DNA samples from his pizza crust and silverware.
This case makes sense. The cops found a close match from a relative who was in the database because of another felony, then collected the suspect’s DNA from publicly discarded items. Interestingly, law enforcement does not need a warrant to collect DNA samples from objects that an individual has voluntarily abandoned in a public place, like dishes or food remnants in a restaurant.
The case of the Golden State Killer was very different. From 1974 to 1986 he perpetrated at least 13 murders, more than 50 rapes, and over 100 burglaries. As with the Grim Sleeper, none of the state criminal DNA profiles matched the samples taken from the crime scenes, but this time no close matches turned up either. This killer’s fate was sealed through evidence pulled from GEDmatch, an open-access genealogy database where people can upload their results from services like Ancestry.com or 23andMe.
Catching murderers is unarguably a good thing, but it does feel a bit icky that familial DNA from consumer genetic testing could be used to identify people who never uploaded their DNA in the first place. Consumer data protections like the Genetic Information Nondiscrimination Act (GINA) make it so that health insurers can’t use genetic information to make decisions about eligibility, premium rates, or the extent of coverage. For what it’s worth, GINA does not extend to life insurance, long-term care insurance, or disability insurance. Put that fact together with openly searchable consumer genetic data and we have a recipe for misuse.
If you’ve made it this far, let me know what you think!