The Political Technology Awards

The Political Technology Awards is an open evaluation exercise run by the 2025–26 Newspeak House fellowship cohort. Review the iterations, read about the process, and join us at the Showcase on March 31, 2026 to hear all about it!

Attend the Showcase!

Evaluation / Algorithm (timeline + rankings)

Browse the active algorithm timeline and rankings. Each version captures the scoring logic, rationale, and ranked outputs from the same evaluation pipeline.

Fatima Sarah Khalid

Social Choice Round — LiquidFeedback wins committee deliberation (Opus 4.6)

31 Mar 2026 PR #112 merged

Heuristic

Borda count over 38 candidates and 16 agents (a top rank earns 37 points, so the per-project maximum is 37 × 16 = 592 pts). Four phases: preference construction from the existing ranking-table.csv, belief declaration, ballot strategy, and deliberation. All agent calls: claude-opus-4-6.
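
For concreteness, here is a minimal sketch of the tally step, assuming ranking-table.csv carries an agent column plus one rank column per project (1 = best). The file layout, column names, and function are illustrative assumptions, not the pipeline's actual code:

```python
# Minimal Borda tally sketch -- CSV layout and names are assumptions.
import csv
from collections import defaultdict

N_CANDIDATES = 38  # rank r earns N_CANDIDATES - r points, so rank 1 earns 37

def borda_tally(path: str) -> dict[str, int]:
    """Sum Borda points per project across all agent ballots."""
    scores: dict[str, int] = defaultdict(int)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            row.pop("agent", None)          # drop the agent-name column
            for project, rank in row.items():
                if not rank:                # absent from this ballot -> 0 points
                    continue
                scores[project] += N_CANDIDATES - int(rank)
    return dict(scores)

# With 16 complete ballots the per-project maximum is 16 * 37 = 592.
```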

Rationale

The four aggregation PRs (#100–#107) each apply a different mathematical rule to the same preference data, yielding different winners (mean → LiquidFeedback, median rank → ODE, divisive → Gapminder, consensus → Vote for Policies). This round asks: given that each agent knows the honest standings and can reason strategically, what does the committee choose? Borda count is used because it is the most standard social-choice rule for ranked preferences and, crucially, the most theoretically susceptible to strategic manipulation (burying). If the honest result is stable under Borda with strategic reasoning, it is likely stable under any rule.
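
To illustrate how one preference matrix can yield four different winners, here is a hedged sketch of the four rules over an assumed agent-by-project rank matrix (1 = best). The names and input shape are assumptions, and the real v11 averages raw scores rather than ranks:

```python
# Sketch of the four aggregation rules on one rank matrix (assumed shape:
# ranks[agent][project] = int rank, 1 = best). Names are illustrative.
import statistics

def rule_winners(ranks: dict[str, dict[str, int]]) -> dict[str, str]:
    """Apply the four aggregation rules to the same rank matrix."""
    projects = next(iter(ranks.values())).keys()
    cols = {p: [ranks[a][p] for a in ranks] for p in projects}
    return {
        # v11-style: best average standing (the real v11 averages raw scores)
        "mean":      min(cols, key=lambda p: statistics.mean(cols[p])),
        # v12: best median rank
        "median":    min(cols, key=lambda p: statistics.median(cols[p])),
        # v13: most divisive = widest disagreement between agents
        "divisive":  max(cols, key=lambda p: statistics.stdev(cols[p])),
        # v14: consensus = narrowest spread across agents
        "consensus": min(cols, key=lambda p: statistics.stdev(cols[p])),
    }
```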

Data Sources

Changes / Limitations

  • AlgorithmWatch absent from Alexandra's Agent's v3 CSV → assigned 0, costing ~12 Borda points
  • Fundación Ciudadanía Inteligente absent from Nicholas's Agent's v3 CSV → assigned 0
  • Borda count rewards breadth of support over intensity: a project with passionate but isolated advocates (e.g. Worker Info Exchange) finishes last despite being a constitutional Rank 1, since a single Rank-1 ballot earns only 37 points while even a uniform mid-table showing across all 16 agents earns several hundred
  • The 3-point gap between honest and final tallies reflects minor CSV-loading path differences between phases; the ranking is substantively identical

Assessment

Winner: LiquidFeedback — 454 / 592 pts. Unchanged from honest voting.

Phase outcomes:

  • Honest Borda (pre-deliberation): LiquidFeedback 457 pts, ODE 439 pts (+18 lead)
  • Ballot strategy: 0 strategic deviations — all 16 agents voted honestly
  • Deliberation: 13 outcome arguments, 3 passes, 0 ballot revisions
  • Final Borda: LiquidFeedback 454 pts — result unchanged

Every agent independently concluded that the 18-point lead was too large to overcome unilaterally and/or that strategic manipulation would violate their own constitutional commitments. The deliberation phase produced substantive arguments but no revisions.

Final Borda standings (top 5):

  1. LiquidFeedback — 454 pts
  2. CONSUL Democracy — 441 pts
  3. Open Data Editor (ODE) — 436 pts
  3. Alaveteli — 436 pts
  5. Decidim — 422 pts

Comparison with aggregation baselines:

  • v11 mean score → LiquidFeedback
  • v12 median rank → ODE
  • v13 most divisive → Gapminder Worldview Upgrader
  • v14 lowest stdev → Vote for Policies
  • Social Choice Round (Borda) → LiquidFeedback

LiquidFeedback is the only project to win under both mean-score aggregation (v11) and Borda count. Its 18-point honest lead proved robust: no single agent could close it unilaterally, and no credible coalition formed even when 13 agents made public arguments. The most analytically interesting output is the convergence on ODE as the shadow winner — four agents from different constitutional directions independently identified it as the constitutionally better choice for a majority of agents, just 18 Borda points behind.

πŸ†

Top 5

No data for this version.

β–½

All rankings

No data for this version.


Project Assessment Log

Each iteration's ranked projects with assessments. Assessments marked * were inferred from the heuristic — earlier iterations did not produce per-project rationale.

v15 Social Choice Round — LiquidFeedback wins committee deliberation (Opus 4.6) 31 Mar 2026 · 0 projects

Process (meetings + discussions)

This log captures committee meetings, governance decisions, and open tradeoffs so deliberation remains public, inspectable, and contestable.

Committee Meetings

Current view

2026-03-30 (matrix chat synthesis through Mar 30)

  • Type: Matrix synthesis note
  • Participants: cohort thread participants (including Ed, Gamithra, Hannah, Fatima, Nick, Huda, others)
  • Source: matrix export (Awards 2026 room, exported 2026-03-30)
  • Related context: weekly process updates, ranking interpretation, showcase prep

Clustered themes

  1. Data quality and enrichment limits

    • Repeated concern that cached/project-page data is too thin for reliable assessments.
    • Consensus that richer external evidence is needed (usage context, third-party references, clearer project descriptions).
    • Practical implication: enrichment quality should be treated as a first-order constraint on ranking quality.
  2. Values and legitimacy framing

    • Rankings were framed as expressions of committee values rather than objective truth claims.
    • Discussion pointed toward making values explicit and aggregating them across members in an inspectable way.
    • Intermediate outputs were treated as meaningful artifacts, not just pipeline exhaust.
  3. Model behavior and stability

    • Strong interest in distinguishing cross-model disagreement from within-model variance.
    • Winner divergence across juries increased demand for repeatability checks before strong winner claims.
    • Communication need identified: publish clearer reasoning alongside numeric outputs.
  4. Process and event UX

    • Ongoing interest in attendee-facing ranking interactions (pairwise or values-driven interfaces).
    • Mini-workshop format at the event was discussed as a way to expose deliberation complexity.
    • Showcase planning emphasized clear narrative continuity across iterations.
  5. Operational cadence

    • Weekly Wednesday check-ins remained the default coordination rhythm.
    • PR cadence and incremental branch experiments continued to be used for rapid iteration.
    • Operator publishing and interface updates were treated as core delivery tasks.

Action items captured

  • Keep publishing concise weekly synthesis notes that separate evidence, values claims, and decisions.
  • Include an explicit "known limits" statement when presenting rankings publicly.
  • Track repeatability and explanation quality as standing criteria before final showcase claims.


Open Questions & Further Notes

Outstanding questions and unresolved tradeoffs currently shaping the evaluation process.

Q1 What minimum evidence quality is required before a project can be scored with confidence?

Area: Data

Why unresolved: Coverage and citation reliability remain uneven across projects

Potential ideas to resolve: Define publication thresholds (coverage, citation quality, manual spot-check criteria) and enforce them before final scoring

Q2 Which committee values should be explicit in the framework, and how should tradeoffs between values be handled?

Area: Values

Why unresolved: Different iterations encode values differently and can lead to different winners

Potential ideas to resolve: Publish a concise values schema and map each scoring lens to it

Q3 Where should human judgment sit in the decision process versus automated scoring?

Area: Facilitation

Why unresolved: Full automation improves scale but may weaken legitimacy and deliberative quality

Potential ideas to resolve: Test and document one explicit human-in-the-loop decision checkpoint

Q4 What does a "good" evaluation design look like for this context: ranking, deliberation, aggregation, or a hybrid?

Area: Method

Why unresolved: Different methods optimize different goals (consistency, interpretability, participation)

Potential ideas to resolve: Compare a small set of methods on the same shortlist and document tradeoffs

Q5 How stable are outcomes across reruns, models, and aggregation rules?

Area: Model behavior

Why unresolved: Winner sensitivity may reflect model variance as much as project differences

Potential ideas to resolve: Run repeatability and cross-model variance checks and publish uncertainty notes with results
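
As a hedged sketch of what such a check could look like, the snippet below compares ranked outputs across reruns with Kendall's tau; run_pipeline, the input format, and the decision rule in the closing comment are assumptions, not an agreed process:

```python
# Repeatability sketch: compare ranked outputs across reruns. Each
# ranking is a best-to-worst list of project names; producing them
# (e.g. via a hypothetical run_pipeline()) is out of scope here.
from itertools import combinations
from scipy.stats import kendalltau

def stability_report(rankings: list[list[str]]) -> tuple[float, set[str]]:
    """Mean pairwise Kendall tau across reruns, plus the distinct winners."""
    projects = sorted(rankings[0])
    taus = []
    for a, b in combinations(rankings, 2):
        tau, _ = kendalltau([a.index(p) for p in projects],
                            [b.index(p) for p in projects])
        taus.append(tau)
    return sum(taus) / len(taus), {r[0] for r in rankings}

# A mean tau near 1.0 and a single distinct winner across reruns would
# support a stronger winner claim; anything less warrants publishing an
# uncertainty note alongside the result.
```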

Q6 What should rankings be interpreted as: objective winners, value-expressions, or decision aids?

Area: Interpretation

Why unresolved: Public meaning is still ambiguous and risks overclaiming

Potential ideas to resolve: Add explicit interpretation guidance to awards communications and logs

Q7 What level of explanation should accompany each score or ranking decision?

Area: Transparency

Why unresolved: Numbers without rationale reduce contestability and trust

Potential ideas to resolve: Publish concise per-project explanation blocks tied to evidence, values, and method

Q8 How should attendee/cohort participation (for example pairwise or criteria input) inform final outcomes?

Area: Participation

Why unresolved: Participation can improve legitimacy but may conflict with methodological consistency

Potential ideas to resolve: Prototype one participation pathway and define how it affects final decisions


Data Gathering

Track each data collection attempt, what changed in sources and cleaning, and known quality gaps that affect downstream evaluation.

Data Gathering Attempts

Attempt 3 Enriched dossiers era
  • Primary shift: full per-project dossiers with schema, passes, and verification.
  • Key PR: v6
  • Outcome: full coverage and stronger auditability, with explicit evidence-quality limits (see the dossier sketch after this list).
Attempt 2 Flat data-dump era
  • Primary shift: introduce first structured metadata export in flat CSV form.
  • Key PR: v3
  • Outcome: better baseline legibility, still partial and ad-hoc.
Attempt 1 Cache-first era
  • Primary shift: move from URL-only scoring toward cached homepage/pipeline artifacts.
  • Key PRs: v4, v5
  • Outcome: faster reproducibility, but weak structured evidence.
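
As referenced in the Attempt 3 outcome above, here is a hedged sketch of what one per-project dossier record might look like; every field name is an assumption for illustration, not the repository's actual schema:

```python
# Illustrative dossier record -- all field names are assumptions.
from dataclasses import dataclass, field

@dataclass
class EvidenceItem:
    url: str
    kind: str               # e.g. "homepage", "third-party reference", "usage report"
    citation_quality: str   # e.g. "verified", "unverified"

@dataclass
class ProjectDossier:
    name: str
    description: str
    evidence: list[EvidenceItem] = field(default_factory=list)
    passes: list[str] = field(default_factory=list)  # enrichment passes applied
    verified: bool = False                           # manual spot-check outcome
    known_limits: str = ""                           # explicit evidence-quality caveats
```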

What we're doing

The Political Technology Awards is an open evaluation exercise run by the 2025–26 Newspeak House fellowship cohort. We're building a public, inspectable ranking of civic and political technology projects — the kind of tools that help citizens understand institutions, participate in democracy, and hold power to account.

Our evaluation process is iterative and transparent. We use a scoring algorithm that evolves over time. Each version applies different heuristics and produces a ranked list. The algorithm lives in a public GitHub repo; you can inspect the code, the pull requests, and the rationale for every change. We may add written assessments per project as the evaluation matures.

Rankings are political. By making our process transparent and iterative, we hope to surface both strong projects and the tradeoffs inherent in any evaluation framework. A simple tool that empowers marginalized communities ranks higher than a technically impressive platform that reinforces existing power structures.

Make a contribution to the evaluation

Awards Committee

Our committee brings together expertise in civic technology, political science, digital rights, and community organizing. The committee is defined in the CODEOWNERS file in the evaluation repository β€” everyone listed there can approve changes to the algorithm and assessments.

Fred O'Brien
Asil Sidahmed
Fatima Sarah Khalid
Gamithra Marga
Jamie Coombes
Francesca Galli
Alexandra Ciocanel
Davit Jintcharadze
Nick Botti
Huda Abdirahim