Contract Spy
Remote
Partnering with a top AI research organization to evaluate and improve how coding assistants reason, act, and communicate during development workflows. We’re seeking technically sharp experts (especially those with experience in code review, testing, or documentation) to assess full transcripts of user–AI coding conversations. This short-term engagement helps shape the future of developer-assisting AI systems.

Key Responsibilities
- Review long-form transcripts between users and AI coding assistants
- Analyze the AI’s logic, execution, and stated actions in detail
- Score each transcript using a 10-point rubric across multiple criteria
- Optionally write brief justifications citing examples from the dialogue
- Detect mismatches between claims and actions (e.g., saying “I’ll run tests” but not doing so)

Ideal Qualifications
Top choices:
- Senior or Staff Engineers with deep code review experience and execution insight
- QA Engineers...