Contract Spy
Remote
Full job description
We’re partnering with a top AI research organization to evaluate and improve how coding assistants reason, act, and communicate during development workflows. We’re seeking technically sharp experts (especially those with experience in code review, testing, or documentation) to assess full transcripts of user–AI coding conversations. This short-term engagement helps shape the future of AI systems that assist developers.
Key Responsibilities
Review long-form transcripts between users and AI coding assistants
Analyze the AI’s logic, execution, and stated actions in detail
Score each transcript using a 10-point rubric across multiple criteria
Optionally write brief justifications citing examples from the dialogue
Detect mismatches between claims and actions (e.g., saying “I’ll run tests” but not doing so)
Ideal Qualifications
Top choices:
Senior or Staff...