Data Scientist (Part-Time | Remote | $100–$120/hr)

<p><strong>Data Scientist (AI Task Evaluation & Statistical Analysis Specialist)</strong></p><p><strong>Hourly Contract | Part-Time Remote | $100–$120 per hour</strong></p><p><strong>1. About the Role</strong></p><p>Mercor is partnering with a <strong>leading AI research lab</strong> to hire experienced <strong>Data Scientists</strong> specializing in <strong>AI task evaluation and statistical analysis</strong>.</p><p>In this role, you will conduct <strong>comprehensive failure analysis</strong> on AI agent performance across finance-sector tasks — identifying systemic patterns, diagnosing performance bottlenecks, and improving model evaluation frameworks.</p><p>You’ll work closely with AI engineers and research analysts to transform raw evaluation data into actionable insights, strengthening the quality, fairness, and reliability of large-scale AI systems.</p><p><strong>2. Key Responsibilities</strong></p><ul><li><strong>Statistical Failure Analysis:</strong> Identify recurring patterns in AI agent failures across task components (prompts, rubrics, file types, tags, etc.).</li><li><strong>Root Cause Analysis:</strong> Determine whether issues stem from <strong>task design</strong>, <strong>rubric clarity</strong>, <strong>file complexity</strong>, or <strong>agent limitations</strong>.</li><li><strong>Dimensional Analysis:</strong> Examine performance variations across <strong>finance sub-domains</strong>, file structures, and evaluation criteria.</li><li><strong>Visualization & Reporting:</strong> Build <strong>dashboards and analytical reports</strong> that highlight edge cases, performance clusters, and opportunities for improvement.</li><li><strong>Framework Enhancement:</strong> Recommend refinements to <strong>rubric design, evaluation metrics, and task structures</strong> based on empirical findings.</li><li><strong>Stakeholder Communication:</strong> Present key insights to <strong>data labeling teams, ML engineers, and research 
collaborators</strong>.</li></ul><p><strong>3. Required Qualifications</strong></p><ul><li>Strong foundation in <strong>statistical analysis, hypothesis testing, and pattern recognition</strong>.</li><li>Proficiency in <strong>Python</strong> (pandas, scipy, matplotlib/seaborn) or <strong>R</strong> for data analysis.</li><li>Hands-on experience with <strong>exploratory data analysis (EDA)</strong> and <strong>feature interpretation</strong>.</li><li>Understanding of <strong>AI/ML evaluation methodologies</strong> and <strong>LLM performance metrics</strong>.</li><li>Skilled in using <strong>Excel</strong>, <strong>SQL</strong>, and <strong>data visualization tools</strong> (e.g., Tableau, Looker).</li></ul><p><strong>4. Preferred Qualifications</strong></p><ul><li>Experience with <strong>AI/ML model evaluation</strong> or <strong>quality assurance pipelines</strong>.</li><li>Background in <strong>finance</strong> or interest in learning financial domain structures.</li><li>Familiarity with <strong>benchmark datasets</strong>, <strong>failure mode analysis</strong>, and <strong>evaluation frameworks</strong>.</li><li><strong>2–4 years</strong> of relevant professional experience in data science, analytics, or applied statistics.</li></ul><p><strong>5. More About the Opportunity</strong></p><ul><li><strong>Commitment:</strong> Part-time, 20–25 hours/week</li><li><strong>Schedule:</strong> Fully remote and asynchronous — work on your own time</li><li><strong>Duration:</strong> 1–2 months, with strong potential for extension</li><li><strong>Start Date:</strong> Immediate</li></ul><p><strong>6. Compensation & Contract Terms</strong></p><ul><li><strong>Hourly Rate:</strong> $100–$120/hour (based on experience and region)</li><li><strong>Classification:</strong> Independent Contractor (via Mercor)</li><li><strong>Payments:</strong> Weekly via <strong>Stripe Connect</strong> for approved work</li></ul><p>⚡ <strong>PS: Mercor reviews applications daily. 
Please complete your interview and onboarding steps to be considered for this opportunity.</strong> ⚡</p>
