Contract Spy
Remote (United Kingdom)
We are hiring experienced Python engineers to support a variety of high-impact research collaborations with leading AI labs. Freelancers will help improve AI systems by extending coding benchmarks that reflect real-world development across diverse languages and domains.
This is a unique opportunity to apply your engineering expertise toward shaping the next generation of intelligent systems.
Key Responsibilities
Develop and validate coding benchmarks in Python by curating issues, solutions, and test suites from real-world repositories
Ensure benchmark tasks include comprehensive unit and integration tests for solution verification
Maintain consistency and scalability of benchmark task distribution
Provide structured feedback on solution quality and clarity
Debug, optimize, and document benchmark code for reliability and reproducibility
Ideal Qualifications
3–10 years of experience as a backend software engineer, ML engineer,...