About Groov:
Groov is a Workplace Science & Analytics platform on a mission to make workplaces better by applying science to workplace data to generate and deliver actionable insights in the flow of work for all people in an organization: individual contributors, managers and leaders. These insights are tailored to the organization’s culture, structure and strategy so that they can effectively improve productivity, morale and job satisfaction. Our team of industrial and organizational psychologists, workplace scientists, data scientists, engineers, product specialists and user experience experts is building the future of workplace science and analytics together.
We work with enterprises to understand their workplace problems and develop strategies and solutions that take advantage of opportunities to improve their core business. Key to this endeavor is the use of large dynamic data sets of passive and active data to build statistical, machine learning and artificial intelligence models grounded in cutting-edge science, to develop insights and interventions, and then to test, learn and optimize the efficacy of those insights and interventions. Groov has demonstrated the power that real-time actionable insights can have on workplaces to improve both performance and employee care.
Role Overview:
We are looking for an experienced Data Scientist to be part of our Data Science team and play a pivotal role in the creation, development and delivery of our product. This would start with helping to execute our 2025 data science roadmap. The role involves significant contributions to our ML- and AI-driven product features, including analyzing the extensive data available, continuing the development of individual workplace science constructs and the Groov Index, modeling Patterns of Work, extracting insights, and optimizing interventions in the form of prompts that are delivered in the flow of work through workplace channels such as Slack and MS Teams.
The position requires a strong foundation in Data Science and, ideally, experience in leveraging workplace data (e.g., MS365, CRM, Slack, servicing platforms, and software engineering & revenue intelligence platforms), as well as experience using test-and-learn strategies for dynamic optimization of user interactions.
The Senior Director of Data Science will be expected to be a player/coach: contributing hands-on to the development of our product, leading our small but growing Data Science team, and acting as a thought leader and partner to our workplace scientists, engineers, product managers and client success team.
Key Responsibilities:
- Develop, train, and deploy statistical, machine learning and AI models to enhance the personalization and effectiveness of Groov's products, such as personalized prompts and intelligent Q&A systems.
- Perform extensive data wrangling and analysis to clean, preprocess, and interpret large dynamic datasets, ensuring data quality and accessibility for modeling.
- Design and implement ETL pipelines using tools like PySpark, Python, AWS Glue, and Airtable to efficiently manage data workflows.
- Collaborate with product and business teams to translate business needs into technical specifications and data-driven solutions that drive product innovation.
- Utilize R and Python for statistical analysis, hypothesis testing, and building predictive models that inform product development and business strategies.
- Engage in business analysis to quantify the impact of product features on user engagement and organizational outcomes, contributing to strategic decision-making.
- Ensure the scalability, reliability, and performance of data models and pipelines, supporting their deployment in a production environment.
- Contribute to the creation of dynamic reports and dashboards using Power BI and R Shiny, facilitating data-driven insights for internal and external stakeholders.
Basic Qualifications:
- Bachelor's degree in a quantitative field such as statistics, mathematics, data science, business analytics, economics, finance, engineering, or computer science
- 5+ years of data science experience, including mentoring junior team members, leading code reviews, and driving projects that require collaboration across teams
- Deep understanding of statistical methods, hypothesis testing, regression analysis, and Bayesian inference. Strong at exploratory data analysis and handling complex data transformations
- Advanced in Python (for ML and data processing) and R (for analysis); comfortable with Jupyter Notebooks and VS Code; proficient with Git, managing code in GitHub or Bitbucket
- 4+ years of experience with machine learning/statistical modeling, data analysis tools and techniques, and the parameters that affect their performance
- Experienced with big data tools like Spark (beyond PySpark in Glue) or Hadoop, plus familiarity with real-time data streaming technologies like Kafka
- Advanced SQL skills for handling complex queries, transforming data, and optimizing performance on large datasets
- Experience applying theoretical models in a real-world environment
Preferred Qualifications:
- Master's or Ph.D. in a quantitative field such as statistics, mathematics, data science, business analytics, economics, finance, engineering, or computer science
- 7+ years of experience with machine learning/statistical modeling, data analysis tools and techniques, and the parameters that affect their performance
- Skill in building and deploying solutions using Large Language Models (LLMs) such as Llama 3 and Claude 3/3.5 on AWS, especially with SageMaker and Bedrock
- Experience with generative AI models, particularly self-hosted, open-source models
- Experience in creating AI agents, anomaly detection, clustering, and time-series models
- Strong AWS data engineering skills, including experience with AWS Glue for ETL, Athena for SQL querying, Step Functions for orchestration, and managing data lakes on S3
- Knowledge of how to build cost-effective, scalable data workflows
- Experience in deploying production-grade ML models and familiarity with MLOps tools like MLflow to manage model versioning, tracking, and monitoring in production