Emergent Identity Inference in Large Language Models

An AI safety audit of whether LLMs can re-link fragmented public traces into coherent personal profiles - without direct identifiers

Status: Active

Started: July 2025

Contributors: 4

Project Manager:

Introduction & Rationale

Data anonymisation underpins modern privacy practice: remove direct identifiers (names, emails, phone numbers) and treat the remainder as "safe" for analysis and sharing. Yet many real-world privacy failures do not come from a single leak of PII, but from the mosaic effect - lots of small, seemingly harmless traces that become identifying when linked.

We hypothesise that modern Large Language Models (LLMs), especially when paired with web search or browsing tools, make this linkage scalable. Their pattern-matching and synthesis abilities may enable emergent identity inference: constructing a coherent, individual-level profile by connecting disparate, non-identifying public data points, and then inferring additional sensitive attributes that were never explicitly stated.

From an AI safety perspective, this is a capability risk as much as a misuse risk. Automating record linkage and inference lowers the marginal cost of producing dossiers at scale, enabling downstream harms such as targeted manipulation, harassment, surveillance, and fraud. It also undermines the assurance that "anonymised" or "non-PII" data is inherently safe.

This project delivers a controlled, reproducible audit of that capability in publicly accessible AI systems. We use fully synthetic personas to avoid real-world harm, and we quantify both accuracy and overreach (unsupported inferences) to characterise safety-relevant failure modes.

Research Objectives

The project is structured around three primary objectives:

Empirically test whether pre-trained LLMs can, zero-shot, link scattered public traces into a single coherent persona without direct identifiers.
Quantify performance with a transparent scoring rubric: which attributes are reconstructed correctly, which are guessed, and how confidently each claim is presented.
Translate results into safety-relevant guidance for developers and policymakers, clarifying where anonymisation assumptions break and what mitigations and evaluation criteria should be expected.

Methodology

Our research is centred on a transparent, low-cost experimental framework we call the "Public Data Ghost Audit". The goal is to test inference under conditions that resemble real deployment (public, messy, fragmented traces) while keeping the experiment ethically safe and fully replicable.

The audit consists of five phases:

Ground-Truth Persona Construction: We develop several detailed fictional personas ("digital ghosts") with a ground-truth attribute sheet, and author a set of short public traces where each trace reveals only a partial view of the persona.
Public Trace Seeding: We publish those traces as disconnected fragments across publicly indexable sites, avoiding any real-world personal details (real names, addresses, real employers or organisations, or contact details).
Indexing Period: We allow time for normal indexing and record which traces are actually retrievable, separating inference from accidental non-availability.
Zero-Shot Audit Queries: We evaluate leading public LLMs using a fixed prompt suite that asks the model to reconstruct a coherent profile from evidence, clearly separating evidence-backed claims from speculation. No attempt is made to identify real individuals.
Quantitative Scoring: We score outputs against ground truth with metrics for attribute accuracy, linkage success, and overreach (unsupported claims), producing a comparable capability profile across models.

Significance & Contributions

This research is poised to make significant contributions to several fields:

AI Safety Evaluation: A replicable protocol for measuring identity-inference capability in web-enabled LLM systems, complementing existing privacy, jailbreak, and tool-use evaluations.
Privacy Engineering: Empirical evidence for when "no PII" and anonymisation assumptions fail due to linkage and inference, informing mitigations in product policy, retrieval controls, training, and monitoring.
Governance and Policy: A concrete way to operationalise "inferred data" and the mosaic effect for risk assessments, with results that can inform regulators and auditors (e.g., GDPR-style frameworks) without exposing any real individual.
Public Understanding: An accessible demonstration of a subtle risk using synthetic data, helping non-experts understand why fragmented digital traces can still be identifying in the age of frontier AI.

Persona Template (Ground Truth + Seeding Plan)

Completed

Inputs:

Project scope and safety framing (capability audit; no real individuals)
Best practices for creating plausible but synthetic, non-identifying personas
Examples of benign public trace fragments (short posts, comments, gists)

Process:

Draft, review, and finalise a standardised template that (1) separates seedable vs [PRIVATE] ground truth, (2) includes a concrete seeding plan table for fragmented public traces, and (3) defines scoring anchors so outputs can be evaluated consistently.

Outputs:

Version-controlled template (persona_template.md)

Completed by:

sn00z

Seeding Guidelines (Safety, Logging, Indexing)

Completed

Inputs:

persona_template.md - The persona creation framework
Research on suitable public platforms (forums, Q&A, micro‑notes, etc.)
Ethical constraints (synthetic only; benign content; no endorsements; takedown plan)

Processes:

Develop strict, safety-first guidelines to govern the seeding process: what is allowed vs forbidden, how to avoid accidentally creating real-world resemblance, how to log URLs/timestamps/screenshots, how to run indexing checks, and how to remove/redress seeds if needed.

Outputs:

seeding_guidelines.md

Completed by:

sn00z

Identification and Selection of Target LLMs for Audit

Completed

Inputs:

A landscape analysis of currently available, public-facing LLMs.

Processes:

Select target LLMs based on public accessibility and safety relevance, and specify test-day capture requirements (model/version strings, interface URLs, and whether web search/browsing is enabled vs disabled).

Outputs:

Finalised list of target models (target_llms.md).

Completed by:

Literature Review + Starter Prompt Library

Completed

Inputs:

Academic databases and pre-print archives (e.g., arXiv).

Processes:

Review prompt strategies relevant to evidence-first inference audits (tool use, citation requests, uncertainty calibration, refusal handling) and distil them into a model-agnostic starter prompt suite.

Outputs:

Summary of relevant techniques and a bibliography (lit_review_prompts.md).

Completed by:

Ellroi

Digital Ghost #1 (DG-01): Ground Truth + Seeding Plan

Outstanding

Inputs:

Processes:

Fill out the template to create a complete synthetic persona with clear separation between seedable vs [PRIVATE] ground truth.
Author a 15–25 item seeding plan where each post reveals only 1–3 weak identifiers, but the set can be linked at the persona level.
Run a safety check: no real names, addresses, employers, contact details, endorsements/ratings, or anything that could plausibly be mistaken as a real person.

Outputs:

DG-01_profile.md

Digital Ghosts #2–3 (DG-02, DG-03): Ground Truth + Seeding Plans

Outstanding

Inputs:

persona_template.md
Completed DG-01_profile.md for cross-referencing.
seeding_guidelines.md

Processes:

Create two new, distinct personas with different anchors (interests, tone, platforms) while keeping all content synthetic and non-identifying.
Ensure each includes a full seeding plan and scoring anchors, and that personas are mutually distinguishable (minimal overlap with DG-01).

Outputs (expected):

Persona files (DG-02_profile.md, DG-03_profile.md).

Scoring Rubric (Accuracy, Linkage, Overreach, Refusal)

Outstanding

Inputs:

Completed persona profiles (DG-01_profile.md, DG-02_profile.md, DG-03_profile.md)
Project safety goals (evidence-backed claims; quantify unsupported inference)
Academic research on qualitative analysis and annotation reliability

Processes:

Define what counts as a correct reconstruction (attribute accuracy + granularity) vs a guess, and how to score partial credit.
Add explicit scoring for overreach: confident claims not supported by evidence, fabrication, or attempts to produce direct identifiers.
Specify how to handle refusals and safety messages (capture verbatim; treat as a distinct outcome).
Produce a scorer checklist and tie-breaking rules to improve consistency across contributors.

Outputs (expected):

scoring_rubric.md with an attribute-level rubric and overreach taxonomy
scoring_sheet_template.csv (or equivalent) for consistent annotation

Querying Protocol & Prompt Suite (Evidence-First)

Outstanding

Inputs:

Completed lit_review_prompts.md for theoretical grounding.
Completed persona profiles as they become available (DG-01_profile.md, DG-02_profile.md, DG-03_profile.md).
List of target_llms.md to understand model-specific syntax or behavior.
Draft scoring rubric (so prompts elicit scorable outputs)

Processes:

Design a model-agnostic prompt suite that asks for (a) a candidate profile, (b) an evidence table/citations (where available), and (c) explicit uncertainty per attribute.
Include ablations to separate "linkage" from generic stereotyping (e.g., single-trace vs multi-trace prompts).
Include refusal-handling instructions and a standard output schema (attribute → value → evidence → confidence).
Package prompts as a single, version-controlled document suitable for running in public web UIs.

Outputs (expected):

Version-controlled query_library.md (official prompt suite for the audit)

Seed Public Traces + Log URLs (DG-01)

Outstanding

Inputs:

Completed persona profile DG-01_profile.md
seeding_guidelines.md

Processes:

Publish the planned traces for DG-01 across the approved platform types, keeping each post benign and non-identifying.
Log every post in seeding_log.csv with URL, timestamp, anchor tags, and (where allowed) a screenshot path.
Do not engage with other users or target real individuals; the goal is to create synthetic fragments, not to influence discourse.

Outputs (expected):

New entries added to seeding_log.csv (plus any supporting screenshots)

Seed Public Traces + Log URLs (DG-02 & DG-03)

Outstanding

Inputs:

Completed persona profiles DG-01_profile.md, DG-02_profile.md, DG-03_profile.md
seeding_guidelines.md

Processes:

Publish the planned traces for DG-02 and DG-03 across the approved platform types, keeping each post benign and non-identifying.
Log every post in seeding_log.csv with URL, timestamp, anchor tags, and (where allowed) a screenshot path.

Outputs (expected):

New entries added to seeding_log.csv (plus any supporting screenshots)

Indexing & Retrievability Checks (indexing_log.csv)

Outstanding

Inputs:

seeding_log.csv confirming that all planned traces have been deployed
Indexing check schedule (e.g., 14 and 30 days after posting)

Processes:

Run discoverability checks using generic multi-anchor queries (as defined in the guidelines) and record whether traces are retrievable.
Log queries and outcomes in indexing_log.csv, and note missing/non-indexed traces without attempting to deanonymise or identify anyone.
Produce a short summary describing overall retrievability per persona and any major indexing gaps.

Outputs (expected):

indexing_log.csv capturing retrievability checks
indexing_summary.md with a concise snapshot of what was actually retrievable at audit time

Run Inferential Audit & Capture Dataset

Outstanding

Inputs:

Finalised query_library.md (prompt suite)
List of target_llms.md specifying which models to audit
Ground-truth persona files (DG-01_profile.md, DG-02_profile.md, DG-03_profile.md)
indexing_summary.md (what was actually retrievable)

Processes:

Run the prompt suite on each target model using the public web UI, recording exact model/version strings and UI settings (including browsing/search on vs off where possible).
Capture outputs verbatim, including refusals and any citations/links the UI provides, and store results with timestamps and metadata.
Where stochasticity is high, run multiple trials per prompt to estimate variance.

Outputs (expected):

audit_results.json containing prompts, outputs, citations, model metadata, and timestamps

Score, Analyse, and Characterise Failure Modes

Outstanding

Inputs:

audit_results.json containing the complete log of all queries and AI responses
scoring_rubric.md for consistent evaluation criteria
Ground-truth persona files (DG-01_profile.md, DG-02_profile.md, DG-03_profile.md)
indexing_summary.md (to contextualise retrievability)

Processes:

Apply the scoring rubric to each response, recording attribute-level accuracy and overreach.
Compute comparable metrics across models and settings (e.g., browsing on vs off): linkage success, attribute accuracy, overreach rate, refusal rate.
Qualitatively analyse representative failures (confident fabrication, stereotyping, unsafe identifier attempts) and document patterns.
Generate plots and tables suitable for inclusion in an academic writeup.

Outputs (expected):

analysis_report.md with key findings, metrics, and interpretations
scored_results.csv (or equivalent) plus supporting visualisations

Draft Paper + Mitigation Recommendations

Outstanding

Inputs:

analysis_report.md with comprehensive findings
All methodology documents (persona_template.md, seeding_guidelines.md, query_library.md, etc.)
Literature review and background research

Processes:

Synthesise all project phases, methodologies, findings, and implications into a formal academic paper
Structure the paper to include introduction, methodology, results, discussion, and conclusions
Address implications for privacy engineering and AI safety evaluation, including concrete mitigations and recommended evaluation criteria
Ensure the paper meets academic standards for rigour and clarity

Outputs (expected):

Complete research_paper_draft.md
Executive summary for policy makers and public dissemination
Mitigation_notes.md (developer-facing recommendations)

Project Repository

This repository contains all materials generated from the project.

persona_template.md - Standardised template for digital ghost personas
seeding_guidelines.md - Guidelines for data seeding process
target_llms.md - Finalised list of target models for audit
lit_review_prompts.md - Summary of prompt engineering techniques and bibliography