Mission to Protect Intelligent Life | 保护智能生命特派团 | बुद्धिमान जीवन की रक्षा के लिए मिशन | Misión para proteger la vida inteligente | بعثة حماية الحياة الذكية | Mission pour la protection de la vie intelligente

The future of life is under threat...

Let's fix that.

Emergent Identity Inference in Large Language Models

An AI safety audit of whether LLMs can re-link fragmented public traces into coherent personal profiles - without direct identifiers

Status: Active
Started: July 2025
Contributors: 4
Project Manager:
Project Manager

Introduction & Rationale

Data anonymisation underpins modern privacy practice: remove direct identifiers (names, emails, phone numbers) and treat the remainder as "safe" for analysis and sharing. Yet many real-world privacy failures do not come from a single leak of PII, but from the mosaic effect - lots of small, seemingly harmless traces that become identifying when linked.

We hypothesise that modern Large Language Models (LLMs), especially when paired with web search or browsing tools, make this linkage scalable. Their pattern-matching and synthesis abilities may enable emergent identity inference: constructing a coherent, individual-level profile by connecting disparate, non-identifying public data points, and then inferring additional sensitive attributes that were never explicitly stated.

From an AI safety perspective, this is a capability risk as much as a misuse risk. Automating record linkage and inference lowers the marginal cost of producing dossiers at scale, enabling downstream harms such as targeted manipulation, harassment, surveillance, and fraud. It also undermines the assurance that "anonymised" or "non-PII" data is inherently safe.

This project delivers a controlled, reproducible audit of that capability in publicly accessible AI systems. We use fully synthetic personas to avoid real-world harm, and we quantify both accuracy and overreach (unsupported inferences) to characterise safety-relevant failure modes.

Research Objectives

The project is structured around three primary objectives:

  • Empirically test whether pre-trained LLMs can, zero-shot, link scattered public traces into a single coherent persona without direct identifiers.
  • Quantify performance with a transparent scoring rubric: which attributes are reconstructed correctly, which are guessed, and how confidently each claim is presented.
  • Translate results into safety-relevant guidance for developers and policymakers, clarifying where anonymisation assumptions break and what mitigations and evaluation criteria should be expected.

Methodology

Our research is centred on a transparent, low-cost experimental framework we call the "Public Data Ghost Audit". The goal is to test inference under conditions that resemble real deployment (public, messy, fragmented traces) while keeping the experiment ethically safe and fully replicable.

The audit consists of five phases:

  1. Ground-Truth Persona Construction: We develop several detailed fictional personas ("digital ghosts") with a ground-truth attribute sheet, and author a set of short public traces where each trace reveals only a partial view of the persona.
  2. Public Trace Seeding: We publish those traces as disconnected fragments across publicly indexable sites, avoiding any real-world personal details (real names, addresses, real employers or organisations, or contact details).
  3. Indexing Period: We allow time for normal indexing and record which traces are actually retrievable, separating inference from accidental non-availability.
  4. Zero-Shot Audit Queries: We evaluate leading public LLMs using a fixed prompt suite that asks the model to reconstruct a coherent profile from evidence, clearly separating evidence-backed claims from speculation. No attempt is made to identify real individuals.
  5. Quantitative Scoring: We score outputs against ground truth with metrics for attribute accuracy, linkage success, and overreach (unsupported claims), producing a comparable capability profile across models.

Significance & Contributions

This research is poised to make significant contributions to several fields:

  • AI Safety Evaluation: A replicable protocol for measuring identity-inference capability in web-enabled LLM systems, complementing existing privacy, jailbreak, and tool-use evaluations.
  • Privacy Engineering: Empirical evidence for when "no PII" and anonymisation assumptions fail due to linkage and inference, informing mitigations in product policy, retrieval controls, training, and monitoring.
  • Governance and Policy: A concrete way to operationalise "inferred data" and the mosaic effect for risk assessments, with results that can inform regulators and auditors (e.g., GDPR-style frameworks) without exposing any real individual.
  • Public Understanding: An accessible demonstration of a subtle risk using synthetic data, helping non-experts understand why fragmented digital traces can still be identifying in the age of frontier AI.
Persona Template (Ground Truth + Seeding Plan)
Completed
Inputs:
  • Project scope and safety framing (capability audit; no real individuals)
  • Best practices for creating plausible but synthetic, non-identifying personas
  • Examples of benign public trace fragments (short posts, comments, gists)
Process:
  • Draft, review, and finalise a standardised template that (1) separates seedable vs [PRIVATE] ground truth, (2) includes a concrete seeding plan table for fragmented public traces, and (3) defines scoring anchors so outputs can be evaluated consistently.
Outputs:
Completed by:
Completed by sn00z
Seeding Guidelines (Safety, Logging, Indexing)
Completed
Inputs:
  • persona_template.md - The persona creation framework
  • Research on suitable public platforms (forums, Q&A, micro‑notes, etc.)
  • Ethical constraints (synthetic only; benign content; no endorsements; takedown plan)
Processes:
  • Develop strict, safety-first guidelines to govern the seeding process: what is allowed vs forbidden, how to avoid accidentally creating real-world resemblance, how to log URLs/timestamps/screenshots, how to run indexing checks, and how to remove/redress seeds if needed.
Completed by:
Completed by sn00z
Identification and Selection of Target LLMs for Audit
Completed
Inputs:
  • A landscape analysis of currently available, public-facing LLMs.
Processes:
  • Select target LLMs based on public accessibility and safety relevance, and specify test-day capture requirements (model/version strings, interface URLs, and whether web search/browsing is enabled vs disabled).
Outputs:
Completed by:
Completed by BD
Literature Review + Starter Prompt Library
Completed
Inputs:
  • Academic databases and pre-print archives (e.g., arXiv).
Processes:
  • Review prompt strategies relevant to evidence-first inference audits (tool use, citation requests, uncertainty calibration, refusal handling) and distil them into a model-agnostic starter prompt suite.
Outputs:
Completed by:
Completed by Ellroi
Digital Ghost #1 (DG-01): Ground Truth + Seeding Plan
Outstanding
Processes:
  • Fill out the template to create a complete synthetic persona with clear separation between seedable vs [PRIVATE] ground truth.
  • Author a 15–25 item seeding plan where each post reveals only 1–3 weak identifiers, but the set can be linked at the persona level.
  • Run a safety check: no real names, addresses, employers, contact details, endorsements/ratings, or anything that could plausibly be mistaken as a real person.
Outputs:
  • DG-01_profile.md
Digital Ghosts #2–3 (DG-02, DG-03): Ground Truth + Seeding Plans
Outstanding
Inputs:
Processes:
  • Create two new, distinct personas with different anchors (interests, tone, platforms) while keeping all content synthetic and non-identifying.
  • Ensure each includes a full seeding plan and scoring anchors, and that personas are mutually distinguishable (minimal overlap with DG-01).
Outputs (expected):
  • Persona files (DG-02_profile.md, DG-03_profile.md).
Scoring Rubric (Accuracy, Linkage, Overreach, Refusal)
Outstanding
Inputs:
  • Completed persona profiles (DG-01_profile.md, DG-02_profile.md, DG-03_profile.md)
  • Project safety goals (evidence-backed claims; quantify unsupported inference)
  • Academic research on qualitative analysis and annotation reliability
Processes:
  • Define what counts as a correct reconstruction (attribute accuracy + granularity) vs a guess, and how to score partial credit.
  • Add explicit scoring for overreach: confident claims not supported by evidence, fabrication, or attempts to produce direct identifiers.
  • Specify how to handle refusals and safety messages (capture verbatim; treat as a distinct outcome).
  • Produce a scorer checklist and tie-breaking rules to improve consistency across contributors.
Outputs (expected):
  • scoring_rubric.md with an attribute-level rubric and overreach taxonomy
  • scoring_sheet_template.csv (or equivalent) for consistent annotation
Querying Protocol & Prompt Suite (Evidence-First)
Outstanding
Inputs:
  • Completed lit_review_prompts.md for theoretical grounding.
  • Completed persona profiles as they become available (DG-01_profile.md, DG-02_profile.md, DG-03_profile.md).
  • List of target_llms.md to understand model-specific syntax or behavior.
  • Draft scoring rubric (so prompts elicit scorable outputs)
Processes:
  • Design a model-agnostic prompt suite that asks for (a) a candidate profile, (b) an evidence table/citations (where available), and (c) explicit uncertainty per attribute.
  • Include ablations to separate "linkage" from generic stereotyping (e.g., single-trace vs multi-trace prompts).
  • Include refusal-handling instructions and a standard output schema (attribute → value → evidence → confidence).
  • Package prompts as a single, version-controlled document suitable for running in public web UIs.
Outputs (expected):
  • Version-controlled query_library.md (official prompt suite for the audit)
Seed Public Traces + Log URLs (DG-01)
Outstanding
Inputs:
Processes:
  • Publish the planned traces for DG-01 across the approved platform types, keeping each post benign and non-identifying.
  • Log every post in seeding_log.csv with URL, timestamp, anchor tags, and (where allowed) a screenshot path.
  • Do not engage with other users or target real individuals; the goal is to create synthetic fragments, not to influence discourse.
Outputs (expected):
  • New entries added to seeding_log.csv (plus any supporting screenshots)
Seed Public Traces + Log URLs (DG-02 & DG-03)
Outstanding
Inputs:
Processes:
  • Publish the planned traces for DG-02 and DG-03 across the approved platform types, keeping each post benign and non-identifying.
  • Log every post in seeding_log.csv with URL, timestamp, anchor tags, and (where allowed) a screenshot path.
Outputs (expected):
  • New entries added to seeding_log.csv (plus any supporting screenshots)
Indexing & Retrievability Checks (indexing_log.csv)
Outstanding
Inputs:
  • seeding_log.csv confirming that all planned traces have been deployed
  • Indexing check schedule (e.g., 14 and 30 days after posting)
Processes:
  • Run discoverability checks using generic multi-anchor queries (as defined in the guidelines) and record whether traces are retrievable.
  • Log queries and outcomes in indexing_log.csv, and note missing/non-indexed traces without attempting to deanonymise or identify anyone.
  • Produce a short summary describing overall retrievability per persona and any major indexing gaps.
Outputs (expected):
  • indexing_log.csv capturing retrievability checks
  • indexing_summary.md with a concise snapshot of what was actually retrievable at audit time
Run Inferential Audit & Capture Dataset
Outstanding
Inputs:
  • Finalised query_library.md (prompt suite)
  • List of target_llms.md specifying which models to audit
  • Ground-truth persona files (DG-01_profile.md, DG-02_profile.md, DG-03_profile.md)
  • indexing_summary.md (what was actually retrievable)
Processes:
  • Run the prompt suite on each target model using the public web UI, recording exact model/version strings and UI settings (including browsing/search on vs off where possible).
  • Capture outputs verbatim, including refusals and any citations/links the UI provides, and store results with timestamps and metadata.
  • Where stochasticity is high, run multiple trials per prompt to estimate variance.
Outputs (expected):
  • audit_results.json containing prompts, outputs, citations, model metadata, and timestamps
Score, Analyse, and Characterise Failure Modes
Outstanding
Inputs:
  • audit_results.json containing the complete log of all queries and AI responses
  • scoring_rubric.md for consistent evaluation criteria
  • Ground-truth persona files (DG-01_profile.md, DG-02_profile.md, DG-03_profile.md)
  • indexing_summary.md (to contextualise retrievability)
Processes:
  • Apply the scoring rubric to each response, recording attribute-level accuracy and overreach.
  • Compute comparable metrics across models and settings (e.g., browsing on vs off): linkage success, attribute accuracy, overreach rate, refusal rate.
  • Qualitatively analyse representative failures (confident fabrication, stereotyping, unsafe identifier attempts) and document patterns.
  • Generate plots and tables suitable for inclusion in an academic writeup.
Outputs (expected):
  • analysis_report.md with key findings, metrics, and interpretations
  • scored_results.csv (or equivalent) plus supporting visualisations
Draft Paper + Mitigation Recommendations
Outstanding
Inputs:
  • analysis_report.md with comprehensive findings
  • All methodology documents (persona_template.md, seeding_guidelines.md, query_library.md, etc.)
  • Literature review and background research
Processes:
  • Synthesise all project phases, methodologies, findings, and implications into a formal academic paper
  • Structure the paper to include introduction, methodology, results, discussion, and conclusions
  • Address implications for privacy engineering and AI safety evaluation, including concrete mitigations and recommended evaluation criteria
  • Ensure the paper meets academic standards for rigour and clarity
Outputs (expected):
  • Complete research_paper_draft.md
  • Executive summary for policy makers and public dissemination
  • Mitigation_notes.md (developer-facing recommendations)

Project Repository

This repository contains all materials generated from the project.

Contributor Dashboard: Log In

Please enter your credentials to continue.

Invalid username or password. Please try again.

Contributor Registration

Please submit your details for review.

Please provide a brief, informal bio. This may include your interests, any particular skills that you may have, your motivations for wishing to contribute to MPIL projects, or anything else that you feel is relevant.

An error occurred. Please try again.