The Political Technology Awards

The Political Technology Awards is an open evaluation exercise run by the 2025–26 Newspeak House fellowship cohort. Review the iterations, read about the process, and join us at the Showcase on March 31, 2026 to hear all about it!

Attend the showcase

Fatima Sarah Khalid

Six-jury ITN/A deliberation

9 Mar 2026 PR #15 open

Heuristic

This iteration inherits the approach of v5: an ITN/A multi-agent deliberation heuristic in which six independent AI juries each run an ITN/A deliberation on a shortlist of 183 projects. The jury with the highest confidence score picks the award winner.

v6 inherits the full ITN/A evaluation and deliberation pipeline from v5: the same 4-agent structure (political, relational, experimental), the same multi-argument format with a facilitator, and the same scoring tiers (deliberated: 51–90; 2+ greens: 45; 1 green: 20; none: 5).

The changes are in who runs the jury, how the shortlist is built, and how the winner is selected.

Total pipeline spend: $11.30 USD

Phase 1: Evaluation

Grok 4.1 Fast, Claude Sonnet 4, and Kimi 2.5 independently evaluate all 321 candidates across three lenses: political, relational, and experimental. Each assessment produces a bucket (green / yellow / red / grey) and a rationale per dimension.
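A single Phase 1 record might look like the following sketch. Field names are hypothetical; the real assessments-*.json layout may differ.

```python
# Hypothetical shape of one Phase 1 assessment record.
assessment = {
    "project": "algorithmwatch.org",
    "model": "x-ai/grok-4.1-fast",
    "dimensions": {
        # One bucket (green / yellow / red / grey) plus a rationale per lens.
        "political":    {"bucket": "green",  "rationale": "..."},
        "relational":   {"bucket": "green",  "rationale": "..."},
        "experimental": {"bucket": "yellow", "rationale": "..."},
    },
}
```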

Phase 2: Shortlist

Shortlist rule: at least 2 of the 3 models rated the project green or yellow in any dimension.

This 2-of-3 rule produced 183 candidates.
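One reading of the rule, counting a model as positive if it assigned green or yellow in at least one dimension, can be sketched as:

```python
POSITIVE_BUCKETS = {"green", "yellow"}

def shortlisted(buckets_by_model: dict[str, dict[str, str]]) -> bool:
    """2-of-3 rule: at least two models rated the project green or
    yellow in some dimension.

    `buckets_by_model` maps a model name to its {dimension: bucket}
    assessment for one project.
    """
    positive_models = sum(
        any(bucket in POSITIVE_BUCKETS for bucket in dims.values())
        for dims in buckets_by_model.values()
    )
    return positive_models >= 2
```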

Phase 3: Six Jury Deliberations

The three original models deliberate on their own evaluation data (each model reads only its own assessments). Each jury runs the full ITN/A multi-agent deliberation — agents argue, contest, revise scores, and produce a ranked shortlist with a winner, confidence score, and case for/against.

Original juries:

  • Grok 4.1 Fast
  • Claude Sonnet 4
  • Kimi 2.5

Mixed juries (deliberate from merged assessments):

  • GPT-4o — mainstream / average voter
  • DeepSeek-R1 — adversarial / contrarian
  • Specialist panel — Gemini 2.5 Pro (political), Llama 3.3 70B (relational), Mistral Large (experimental)

Phase 4: Promotion

The six jury verdicts are compared by confidence score. The highest-confidence jury is promoted and selects the final award winner.

Rationale

v5 showed that a single deliberating agent (Grok) could produce a well-reasoned winner, one that made sense in an obvious political-tech-award sort of way, but it left an open question:

Is the result robust to model choice?
Do different AI “worldviews” produce the same answer?

v6 puts that question to the test.

Three original juries read their own evaluation data. The three mixed juries read merged assessments and deliberately introduce different perspectives: GPT-4o as a mainstream / institutional voice, DeepSeek-R1 as a contrarian with different political training data, and a specialist panel routing each evaluation lens (political, relational, experimental) to a purpose-fit model.

Confidence scoring

Winner selection is based on the confidence score — the jury that is most certain of its verdict wins.

Grok 4.1 Fast again had the highest confidence (as in v5). It was also the cheapest and fastest model while still producing interesting analysis.

Data Sources

project URL, scraped content, additional data files

Changes / Limitations

  • Confidence is self-reported by the deliberating model. A model that is overconfident by nature (Grok, possibly) may win regardless of reasoning quality.
  • Claude’s calibration asymmetry skews the Claude jury’s deliberation pool to a very narrow shortlist, which likely explains its low confidence (42/100). This may not be a fair comparison.
  • Merged assessments for mixed juries use the most optimistic bucket per dimension, which biases mixed juries toward charitable readings.
  • Specialist panel (different model per lens) is experimental — no prior evidence that routing lenses to different models improves deliberation quality.
  • All six juries picked a different winner. The result is sensitive to which jury gets promoted. The committee may want to treat the full set of verdicts as equally valid competing perspectives.
  • Technical issues:
    • Pipeline ran on a VPS with a Nanoclaw agent overnight.
    • Kimi 2.5 was the least reliable model (repeated stalling, manual restarts).
    • Several container resets meant partial progress was lost and some evaluations were re-run.
    • Deliberation phase: ~6h 38m (14:40–21:18 UTC).
    • Total cost ~$11.30, mostly from re-runs due to stalling.

Assessment

Human-readable Markdown summaries of all six jury deliberations: jury README.

All six juries disagreed on their winners:

| Jury | Winner | Confidence |
| --- | --- | --- |
| Grok | AlgorithmWatch | 95 |
| Specialist Mixed Jury | Alaveteli | 90 |
| Adversarial | SlopStop | 85 |
| Kimi | Worker Info Exchange | 82 |
| Mixed Jury | Bellingcat Toolkit | 75 |
| Claude | Awesome Gov Datasets | 42 |

Different AI training backgrounds appear to encode different political values:

  • Grok — systemic advocacy, evidence-to-policy pipelines
  • DeepSeek — decentralised infrastructure
  • Specialist panel — civic access tools with proven relational networks

AlgorithmWatch as the promoted winner (score 97, confidence 95) is defensible: it scored near-uniformly high across all three lenses (political 98, relational 97, experimental 97) and was contested, meaning it survived challenge.

Findings & reflections

Why is it always Grok?

Grok performed well in v5; tested against other model juries, it again came out on top by confidence. Several explanations are plausible:

  • Grok may be trained to be opinionated (e.g. if its corpus includes Twitter/X debates).
  • This task structure (high velocity, high conviction, argument format) may suit Grok's design.
  • The confidence score Grok reports (95/100) might reflect a greater willingness to say it's confident, analogous to Claude being less willing to assign green.
  • The ITN/A deliberation format (make a case, contest, revise) may be one Grok is natively good at.

Claude (claude-sonnet-4-6) is structurally conservative as an evaluator

| Model | Greens | Green+Yellow |
| --- | --- | --- |
| Grok | 120 | 191 |
| Kimi | 33 | 230 |
| Claude | 3 | 92 |

Claude assigned green to only 3 / 321 projects, compared with Grok’s 120.

The shortlist rule was originally a union of green projects, but with Claude contributing only 3 greens it had to be redesigned after evaluation. Research into Claude Sonnet 4 suggests this behaviour is related to RLHF calibration around political statements and alignment. The rule was changed to the 2-of-3 model rule, widening the pool to 183 projects.

Shortlist rule design

Goal: roughly 100 projects (~⅓ of the dataset).

| Strategy | Count | Verdict |
| --- | --- | --- |
| Union: any model green or yellow | 242 | Too loose; single-model noise |
| 2-of-3: at least 2 models green or yellow | 183 | Chosen |
| Intersection: all 3 models green or yellow | 88 | Too tight; Claude's calibration removes too many |

Rule chosen: A project enters the shortlist if at least 2 of the 3 models rated it green or yellow in any dimension.
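The three candidate rules differ only in a threshold on how many models are positive on a project, so they can be compared in one pass. In this sketch, `positive_models` (a hypothetical name) maps each project to the set of models that rated it green or yellow in at least one dimension.

```python
def strategy_counts(positive_models: dict[str, set[str]]) -> dict[str, int]:
    """Shortlist sizes under the union, 2-of-3, and intersection rules."""
    sizes = [len(models) for models in positive_models.values()]
    return {
        "union":        sum(n >= 1 for n in sizes),
        "2-of-3":       sum(n >= 2 for n in sizes),
        "intersection": sum(n >= 3 for n in sizes),
    }
```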

Jury structure

Original juries (each reads its own evaluation data):

| Jury | Model | Assessments |
| --- | --- | --- |
| grok | x-ai/grok-4.1-fast | assessments-grok.json |
| claude | anthropic/claude-sonnet-4-6 | assessments-all-claude.json |
| kimi | moonshotai/kimi-k2 | assessments-all-kimi.json |

Mixed juries (read merged assessments):

| Jury | Model(s) | Rationale |
| --- | --- | --- |
| mixed | openai/gpt-4o | mainstream / institutional perspective |
| adversarial | deepseek/deepseek-r1 | contrarian political training data |
| specialist | gemini-2.5-pro + llama-3.3-70b + mistral-large | per-lens specialists |

Winner selection compares winner.confidence across all six deliberation outputs.
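Promotion then reduces to a max over the six verdicts. The verdict shape below is illustrative, not the pipeline's actual output format; the confidence values are those reported in the assessment section.

```python
def promote(verdicts: list[dict]) -> dict:
    """Return the jury verdict with the highest self-reported
    winner confidence."""
    return max(verdicts, key=lambda v: v["winner"]["confidence"])

# The six v6 verdicts (hypothetical record shape, reported values).
verdicts = [
    {"jury": "grok",        "winner": {"name": "AlgorithmWatch",       "confidence": 95}},
    {"jury": "specialist",  "winner": {"name": "Alaveteli",            "confidence": 90}},
    {"jury": "adversarial", "winner": {"name": "SlopStop",             "confidence": 85}},
    {"jury": "kimi",        "winner": {"name": "Worker Info Exchange", "confidence": 82}},
    {"jury": "mixed",       "winner": {"name": "Bellingcat Toolkit",   "confidence": 75}},
    {"jury": "claude",      "winner": {"name": "Awesome Gov Datasets", "confidence": 42}},
]
```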

Open questions

  • Does Claude’s calibration asymmetry reflect a different political epistemology, or just noise? The adversarial (DeepSeek) jury is an interesting contrast.
  • The specialist jury (per-lens model specialisation) is experimental — does routing political/relational/experimental to different models improve deliberation quality?
  • Confidence scores across juries indicate how contested the winner is. High variance = interesting; low variance = robust consensus.
  • Open expansions from the v5 iteration.
  • Trying a different deliberation framework could further test the reflections above on how Grok and Claude differ in deliberation behaviour and confidence.

Top 5

#1
algorithmwatch.org
https://algorithmwatch.org

Score: 97.00

#2
toolkit.bellingcat.gitbook.io
https://bellingcat.gitbook.io/toolkit

Score: 94.00

#3
creativecommons.org
https://creativecommons.org

Score: 83.00

#4
slopstop.blog.kagi.com
https://blog.kagi.com/slopstop

Score: 68.00

#5
adhocracy.plus
https://adhocracy.plus

Score: 20.00

All rankings

  1. #1 algorithmwatch.org 97.00
  2. #2 toolkit.bellingcat.gitbook.io 94.00
  3. #3 creativecommons.org 83.00
  4. #4 slopstop.blog.kagi.com 68.00
  5. #5 adhocracy.plus 20.00
  6. #6 aleph.occrp.org 20.00
  7. #7 bonfirenetworks.org 20.00
  8. #8 civicpress.io 20.00
  9. #9 civicrm.org 20.00
  10. #10 climateaction.tech 20.00
  11. #11 cobudget.com 20.00
  12. #12 consulproject.org 20.00
  13. #13 coralproject.net 20.00
  14. #14 platform.cortico.ai 20.00
  15. #15 datacollective.mozillafoundation.org 20.00
  16. #16 datatrusts.uk 20.00
  17. #17 decelerator.org.uk 20.00
  18. #18 decidim.org 20.00
  19. #19 ethelo.com 20.00
  20. #20 expo.diia.gov.ua 20.00
  21. #21 fixmyblock.org 20.00
  22. #22 cverluise.github.com 20.00
  23. #23 disarmfoundation.github.com 20.00
  24. #24 huridocs.github.com 20.00
  25. #25 i-dot-ai.github.com 20.00
  26. #26 mikekelly.github.com 20.00
  27. #27 openpolitics.github.com 20.00
  28. #28 podemos-info.github.com 20.00
  29. #29 sentinelteam.github.com 20.00
  30. #30 travisbrown.github.com 20.00
  31. #31 vbuterin.github.com 20.00
  32. #32 globalfactcheck.bot 20.00
  33. #33 tools-and-services.hact.org.uk 20.00
  34. #34 humbledata.org 20.00
  35. #35 library.theengineroom.org 20.00
  36. #36 liquidfeedback.com 20.00
  37. #37 matrix.org 20.00
  38. #38 nyaaya.org 20.00
  39. #39 oneproject.org 20.00
  40. #40 opendigitalplanning.org 20.00
  41. #41 openheartmind.org 20.00
  42. #42 openparliament.ca 20.00
  43. #43 openreferraluk.org 20.00
  44. #44 osf.io 20.00
  45. #45 2017.richardpope.org 20.00
  46. #46 secfirst.org 20.00
  47. #47 snowdrift.coop 20.00
  48. #48 societyforhopefultechnologists.org 20.00
  49. #49 sourceafrica.net 20.00
  50. #50 soweego.readthedocs.io 20.00
  51. #51 spacetu.be 20.00
  52. #52 standards.theodi.org 20.00
  53. #53 the-list.uk 20.00
  54. #54 thecircuit.cc 20.00
  55. #55 upgrader.gapminder.org 20.00
  56. #56 agenciesforgood.org 20.00
  57. #57 service-manual.gov.uk 20.00
  58. #58 hotosm.org 20.00
  59. #59 opensanctions.org 20.00
  60. #60 oporaua.org 20.00
  61. #61 organise.org.uk 20.00
  62. #62 otree.org 20.00
  63. #63 shareddigitalguides.org.uk 20.00
  64. #64 teachingpublicservice.digital 20.00
  65. #65 theyworkforyou.com 20.00
  66. #66 ushahidi.com 20.00
  67. #67 watchduty.org 20.00
  68. #68 yourtracka.org 20.00
  69. #69 activisthandbook.org 5.00
  70. #70 all-our-ideas.citizens.is 5.00
  71. #71 responsible-tech-guide-2025.alltechishuman.org 5.00
  72. #72 annas-archive.pm 5.00
  73. #73 landlordtech.antievictionmappingproject.github.io 5.00
  74. #74 aragon.org 5.00
  75. #75 arxiv.org 5.00
  76. #76 abs.arxiv.org 5.00
  77. #77 atlasofsurveillance.org 5.00
  78. #78 benefits-calculator.turn2us.org.uk 5.00
  79. #79 bsky.app 5.00
  80. #80 channel.org 5.00
  81. #81 charitydigitalskills.co.uk 5.00
  82. #82 charmverse.io 5.00
  83. #83 choosealicense.com 5.00
  84. #84 platform.citizenos.com 5.00
  85. #85 ciudadaniai.org 5.00
  86. #86 civiclick.com 5.00
  87. #87 civicmatch.app 5.00
  88. #88 civictech.guide 5.00
  89. #89 ckan.org 5.00
  90. #90 collab.land 5.00
  91. #91 collaborative-data.theodi.org 5.00
  92. #92 methodology.communities.sunlightfoundation.com 5.00
  93. #93 communityrule.info 5.00
  94. #94 conservativepartyfunding.co.uk 5.00
  95. #95 contractsfordatacollaboration.org 5.00
  96. #96 cotsi.org 5.00
  97. #97 data.humdata.org 5.00
  98. #98 data.mysociety.org 5.00
  99. #99 ddocs.new 5.00
  100. #100 dev.reuselibrary.service.justice.gov.uk 5.00
  101. #101 developer.parliament.uk 5.00
  102. #102 api.developers.democracyclub.org.uk 5.00
  103. #103 product.dgc-cgn.org 5.00
  104. #104 product.digitalcharitylab.org 5.00
  105. #105 docs.plus 5.00
  106. #106 dogooder.co 5.00
  107. #107 dovetail.network 5.00
  108. #108 dunadyne.org 5.00
  109. #109 entitledto.co.uk 5.00
  110. #110 product.esper.com 5.00
  111. #111 fairbnb.coop 5.00
  112. #112 farmerchat.digitalgreen.org 5.00
  113. #113 fatebook.io 5.00
  114. #114 filmot.com 5.00
  115. #115 gender-pay-gap.service.gov.uk 5.00
  116. #116 getodk.org 5.00
  117. #117 bluesky-social.github.com 5.00
  118. #118 deepseek-ai.github.com 5.00
  119. #119 fission-codes.github.com 5.00
  120. #120 geeksforsocialchange.github.com 5.00
  121. #121 ideal-postcodes.github.com 5.00
  122. #122 kazad.github.com 5.00
  123. #123 notchia.github.com 5.00
  124. #124 populatetools.github.com 5.00
  125. #125 propublica.github.com 5.00
  126. #126 radicalxchange.github.com 5.00
  127. #127 rahvaalgatus.github.com 5.00
  128. #128 ribenamaplesyrup.github.com 5.00
  129. #129 stanfordcdt.github.com 5.00
  130. #130 thicknavyrain.github.com 5.00
  131. #131 tulir.github.com 5.00
  132. #132 whotargetsme.github.com 5.00
  133. #133 find-local-consultations.gov.uk 5.00
  134. #134 uk.granicus.com 5.00
  135. #135 granitt.io 5.00
  136. #136 grantnav.threesixtygiving.org 5.00
  137. #137 greenpt.ai 5.00
  138. #138 hand-written-petition-scanner.streamlit.app 5.00
  139. #139 harmonica.chat 5.00
  140. #140 idealist.org 5.00
  141. #141 journaliststudio.google.com 5.00
  142. #142 kialo.com 5.00
  143. #143 labourxchange.uk 5.00
  144. #144 ladderhub.org 5.00
  145. #145 landexplorer.coop 5.00
  146. #146 liberopinion.com 5.00
  147. #147 local-deep-researcher-hnmh.vercel.app 5.00
  148. #148 localinsight.org 5.00
  149. #149 logos.co 5.00
  150. #150 manifold.markets 5.00
  151. #151 mapit.mysociety.org 5.00
  152. #152 mapped.commonknowledge.coop 5.00
  153. #153 mapping.kids 5.00
  154. #154 @abscond.medium.com 5.00
  155. #155 metagov.medium.com 5.00
  156. #156 membersinterests.org.uk 5.00
  157. #157 wiki.meta.wikimedia.org 5.00
  158. #158 metaculus.com 5.00
  159. #159 projects.metagov.org 5.00
  160. #160 missingnumbers.org 5.00
  161. #161 monitormamdani.com 5.00
  162. #162 moralmachine.net 5.00
  163. #163 navigator.oii.ox.ac.uk 5.00
  164. #164 nestr.io 5.00
  165. #165 nookcrm.com 5.00
  166. #166 numfocus.org 5.00
  167. #167 nymtech.net 5.00
  168. #168 oa.report 5.00
  169. #169 objector.ai 5.00
  170. #170 en.okfn.org 5.00
  171. #171 bisect.onodi.co 5.00
  172. #172 openaccess.transparency.org.uk 5.00
  173. #173 openaudience.org 5.00
  174. #174 openbudgets.eu 5.00
  175. #175 opencouncil.network 5.00
  176. #176 opencouncildata.co.uk 5.00
  177. #177 opendatacommunities.org 5.00
  178. #178 openletter.earth 5.00
  179. #179 opensupplyhub.org 5.00
  180. #180 orcid.org 5.00
  181. #181 osintframework.com 5.00
  182. #182 overton.io 5.00
  183. #183 p4ai.net 5.00
  184. #184 pageviews.wmcloud.org 5.00
  185. #185 sol3.papers.ssrn.com 5.00
  186. #186 parliamentwatch.ug 5.00
  187. #187 parsethebill.com 5.00
  188. #188 parti.xyz 5.00
  189. #189 planit.org.uk 5.00
  190. #190 plausible.io 5.00
  191. #191 keepitinthecommunity.plunkett.my.site.com 5.00
  192. #192 uk.policyengine.org 5.00
  193. #193 policykit.org 5.00
  194. #194 policymogul.com 5.00
  195. #195 postbug.com 5.00
  196. #196 understanding-your-morality.programs.clearerthinking.org 5.00
  197. #197 app.public.tableau.com 5.00
  198. #198 publicaccountability.org 5.00
  199. #199 publicmediastack.com 5.00
  200. #200 pursuanceproject.org 5.00
  201. #201 quadraticvote.radicalxchange.org 5.00
  202. #202 relationaltechproject.org 5.00
  203. #203 remembertovote.org.uk 5.00
  204. #204 research.localgov.digital 5.00
  205. #205 right-to-know.org 5.00
  206. #206 riversentiment.app 5.00
  207. #207 schema.org 5.00
  208. #208 sci-hub.se 5.00
  209. #209 semanticclimate.github.io 5.00
  210. #210 site.urbanistai.com 5.00
  211. #211 sobre.ejparticipe.org 5.00
  212. #212 strikemap.org 5.00
  213. #213 talktothecity.org 5.00
  214. #214 thegovernmentsays.com 5.00
  215. #215 insights.theodi.org 5.00
  216. #216 timecounts.org 5.00
  217. #217 tracking-template-38b4c.web.app 5.00
  218. #218 trends.whotargets.me 5.00
  219. #219 turkopticon.ucsd.edu 5.00
  220. #220 products.unpaywall.org 5.00
  221. #221 urbit.org 5.00
  222. #222 viewpoints.xyz 5.00
  223. #223 violationtrackeruk.goodjobsfirst.org 5.00
  224. #224 voteforpolicies.org.uk 5.00
  225. #225 wardwatch.uk 5.00
  226. #226 whoisology.com 5.00
  227. #227 whopostedwhat.com 5.00
  228. #228 en.campaigntracker.nl 5.00
  229. #229 consoc.io 5.00
  230. #230 crowdjustice.com 5.00
  231. #231 research.csail.mit.edu 5.00
  232. #232 deliberaide.com 5.00
  233. #233 dfos.com 5.00
  234. #234 discourse.org 5.00
  235. #235 donotpay.com 5.00
  236. #236 publication.dsc.org.uk 5.00
  237. #237 fixmystreet.com 5.00
  238. #238 forms.service.gov.uk 5.00
  239. #239 givefood.org.uk 5.00
  240. #240 govocal.com 5.00
  241. #241 govtrack.us 5.00
  242. #242 en.govwise.ai 5.00
  243. #243 localintelligencehub.com 5.00
  244. #244 marksoutoftenancy.com 5.00
  245. #245 martus.org 5.00
  246. #246 mastodonc.com 5.00
  247. #247 mptwitterbios.co.uk 5.00
  248. #248 mpwatch.org 5.00
  249. #249 myaction.center 5.00
  250. #250 notifications.service.gov.uk 5.00
  251. #251 objector.ai 5.00
  252. #252 openorigins.com 5.00
  253. #253 parallelparliament.co.uk 5.00
  254. #254 payments.service.gov.uk 5.00
  255. #255 plinth.org.uk 5.00
  256. #256 polimonitor.com 5.00
  257. #257 polimorphic.com 5.00
  258. #258 prolific.com 5.00
  259. #259 publiceditor.io 5.00
  260. #260 rightsdd.com 5.00
  261. #261 shineyoureye.org 5.00
  262. #262 sign-in.service.gov.uk 5.00
  263. #263 welivedit.ai 5.00
  264. #264 whatdotheyknow.com 5.00
  265. #265 whatgov.co.uk 5.00
  266. #266 workincharities.co.uk 5.00
  267. #267 writetothem.com 5.00
  268. #268 yoti.com 5.00
  269. #269 yrpri.org 5.00
  270. #270 aisafety.info 0.00
  271. #271 alaveteli.org 0.00
  272. #272 commonslibrary.org 0.00
  273. #273 constituteproject.org 0.00
  274. #274 coops.tech 0.00
  275. #275 cybersecurityfordemocracy.org 0.00
  276. #276 docs.holepunch.to 0.00
  277. #277 publications.ecnl.org 0.00
  278. #278 wiki.en.wikipedia.org 0.00
  279. #279 ai.fullfact.org 0.00
  280. #280 bellingcat.github.com 0.00
  281. #281 blacksky-algorithms.github.com 0.00
  282. #282 cavi-au.github.com 0.00
  283. #283 compdemocracy.github.com 0.00
  284. #284 datakind.github.com 0.00
  285. #285 g0v.github.com 0.00
  286. #286 mastodon.github.com 0.00
  287. #287 ytdl-org.github.com 0.00
  288. #288 frankensteinbill.gordonguthrie.github.io 0.00
  289. #289 guardianproject.info 0.00
  290. #290 joss.theoj.org 0.00
  291. #291 littlesis.org 0.00
  292. #292 murmurations.network 0.00
  293. #293 p.newpublic.substack.com 0.00
  294. #294 oa.works 0.00
  295. #295 opencollective.com 0.00
  296. #296 openprocurement.io 0.00
  297. #297 participedia.net 0.00
  298. #298 privacybadger.org 0.00
  299. #299 publicai.co 0.00
  300. #300 radicle.xyz 0.00
  301. #301 riseup.net 0.00
  302. #302 securedrop.org 0.00
  303. #303 shareyourpaper.org 0.00
  304. #304 spartacus.app 0.00
  305. #305 tech-coops.xyz 0.00
  306. #306 turbophonebank.com 0.00
  307. #307 vframe.io 0.00
  308. #308 web.archive.org 0.00
  309. #309 communitytech.network 0.00
  310. #310 projects.demnext.org 0.00
  311. #311 globaleaks.org 0.00
  312. #312 loomio.org 0.00
  313. #313 meet.coop 0.00
  314. #314 climate.mysociety.org 0.00
  315. #315 open-contracting.org 0.00
  316. #316 opencrvs.org 0.00
  317. #317 openownership.org 0.00
  318. #318 papertree.earth 0.00
  319. #319 torproject.org 0.00
  320. #320 wikidata.org 0.00
  321. #321 workerinfoexchange.org 0.00

What we're doing

The Political Technology Awards is an open evaluation exercise run by the 2025–26 Newspeak House fellowship cohort. We're building a public, inspectable ranking of civic and political technology projects — the kind of tools that help citizens understand institutions, participate in democracy, and hold power to account.

Our evaluation process is iterative and transparent. We use a scoring algorithm that evolves over time. Each version applies different heuristics and produces a ranked list. The algorithm lives in a public GitHub repo; you can inspect the code, the pull requests, and the rationale for every change. We may add written assessments per project as the evaluation matures.

Rankings are political. By making our process transparent and iterative, we hope to surface both strong projects and the tradeoffs inherent in any evaluation framework. A simple tool that empowers marginalized communities ranks higher than a technically impressive platform that reinforces existing power structures.

Read our process documentation →

Awards Committee

Our committee brings together expertise in civic technology, political science, digital rights, and community organizing. The committee is defined in the CODEOWNERS file in the evaluation repository — everyone listed there can approve changes to the algorithm and assessments.

Fred O'Brien
Asil Sidahmed
Fatima Sarah Khalid
Gamithra Marga
Jamie Coombes
Francesca Galli