Mental Health Innovations UK

Mental Health Innovations UK — Senior DevOps Engineer (Contract)


About the company

Mental Health Innovations runs Shout, the UK's free 24/7 crisis text service, in partnership with the Duke and Duchess of Cambridge. The charity provides confidential support to anyone in crisis, with a strong focus on data protection and the safe handling of sensitive conversations.

At Mental Health Innovations UK, I played a pivotal role in developing and improving the organisation's de-identification data pipeline, which ensures that sensitive data is handled safely and compliantly. I used Python with PySpark for large-scale processing and spaCy for NLP, applied test-driven development throughout, and ran performance tests on ARM64 AWS Graviton processors to improve efficiency and reduce cost.

Role: Senior DevOps Engineer (Data Engineer focus)
Location: Greater London, England, United Kingdom
Period: May 2023 – Jan 2024

Key contributions

  1. Python development with PySpark and spaCy: Developed and optimised the de-identification pipeline, using PySpark for large-scale data processing and spaCy for NLP tasks such as named entity recognition.
  2. Test-driven development: Implemented TDD with pytest, backed by pylint static analysis, to raise code quality and reliability and reduce defects.
  3. Performance testing on ARM64: Carried out performance testing on AWS Graviton (ARM64) processors, leading to measurable gains in efficiency and cost-effectiveness.
  4. AWS suite: Used Glue, S3, EMR, CodePipeline, CodeBuild, Lambda, and DynamoDB to build and deploy the pipeline.
  5. Infrastructure as code: Used CloudFormation for consistent, repeatable infrastructure deployment.
  6. Containers & EMR: Dockerised pipeline components and integrated them with EMR Serverless for scalability and reliability.
  7. Documentation: Maintained documentation in Confluence and produced UML diagrams to describe and communicate the architecture.
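A minimal sketch of the span-based redaction step behind item 1, assuming entities arrive as (start_char, end_char, label) tuples, the shape you can build from a spaCy Doc via ent.start_char, ent.end_char and ent.label_. The regex fallbacks, the [LABEL] placeholder format and the helper names regex_spans and redact are hypothetical illustrations, not the pipeline's actual code; the sketch uses only the standard library so it runs without a spaCy model.

```python
import re
from typing import Iterable, List, Tuple

# (start_char, end_char, label): the shape you get from a spaCy Doc with
# [(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents].
Span = Tuple[int, int, str]

# Hypothetical regex fallbacks for identifiers an NER model may miss.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
UK_MOBILE_RE = re.compile(r"\b07\d{3}\s?\d{6}\b")


def regex_spans(text: str) -> List[Span]:
    """Find e-mail addresses and UK mobile numbers with regular expressions."""
    spans = [(m.start(), m.end(), "EMAIL") for m in EMAIL_RE.finditer(text)]
    spans += [(m.start(), m.end(), "PHONE") for m in UK_MOBILE_RE.finditer(text)]
    return spans


def redact(text: str, spans: Iterable[Span]) -> str:
    """Replace every span with a [LABEL] placeholder.

    Overlapping spans are merged (keeping the label of the earliest one) so no
    fragment of an identifier leaks, and replacements run right to left so the
    character offsets of earlier spans stay valid.
    """
    merged: List[list] = []
    for start, end, label in sorted(spans):
        if merged and start < merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end, label])
    for start, end, label in reversed(merged):
        text = text[:start] + f"[{label}]" + text[end:]
    return text


if __name__ == "__main__":
    message = "Hi, I'm Sam. Text me on 07700 900123 or sam@example.com"
    person = [(8, 11, "PERSON")]  # stand-in for spaCy NER output
    print(redact(message, person + regex_spans(message)))
    # Hi, I'm [PERSON]. Text me on [PHONE] or [EMAIL]
```

With PySpark, per-record logic like this would typically run inside rdd.mapPartitions or a pandas UDF, so that an expensive spaCy model is loaded once per partition rather than once per row.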
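A sketch of the test-first style from item 2, using a hypothetical mask_postcodes helper and a deliberately simplified UK postcode pattern (both are assumptions for illustration, not taken from the pipeline). pytest collects any function whose name starts with test_ and reports a failing plain assert together with the compared values, so the tests need no extra imports.

```python
import re

# Hypothetical helper under test. The pattern is a simplified approximation
# of UK postcodes (e.g. "SW1A 1AA", "E1 6AN"), not the full official format.
POSTCODE_RE = re.compile(r"\b[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2}\b", re.IGNORECASE)


def mask_postcodes(text: str, placeholder: str = "[POSTCODE]") -> str:
    """Replace anything that looks like a UK postcode with a placeholder."""
    return POSTCODE_RE.sub(placeholder, text)


# In TDD, each test below is written first, seen to fail, then made to pass.
def test_masks_full_postcode():
    assert mask_postcodes("I live near SW1A 1AA.") == "I live near [POSTCODE]."


def test_masks_postcode_without_space():
    assert mask_postcodes("e16an") == "[POSTCODE]"


def test_leaves_ordinary_text_untouched():
    text = "I feel overwhelmed at work."
    assert mask_postcodes(text) == text


def test_masking_is_idempotent():
    once = mask_postcodes("Post to M1 1AE please")
    assert mask_postcodes(once) == once
```

Running pytest against this module would execute the four tests, and pylint can check the same file for style and likely bugs as part of the same CI step.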
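A standard-library benchmarking harness of the kind item 3 implies, assuming throughput (records per second) and an hourly instance price are the quantities compared between Graviton and x86 runs. The function names, the toy workload and the prices are placeholders, not measured results or real AWS rates.

```python
import platform
import statistics
import time
from typing import Callable, Sequence


def benchmark(workload: Callable[[Sequence[str]], object],
              records: Sequence[str], repeats: int = 5) -> dict:
    """Run `workload` over `records` several times and report median throughput."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload(records)
        timings.append(time.perf_counter() - start)
    median_s = statistics.median(timings)  # the median damps one-off noise
    return {
        # e.g. "aarch64" on Linux ARM64 hosts such as Graviton, "x86_64" on Intel/AMD
        "machine": platform.machine(),
        "median_seconds": median_s,
        "records_per_second": len(records) / median_s,
    }


def cost_per_million(records_per_second: float, price_per_hour: float) -> float:
    """Turn throughput and an hourly instance price into cost per million records."""
    hours_per_million = 1_000_000 / records_per_second / 3600
    return price_per_hour * hours_per_million


if __name__ == "__main__":
    sample = [f"message {i}" for i in range(10_000)]
    print(benchmark(lambda rs: [r.upper() for r in rs], sample))
    # Placeholder prices: a 10% slower but 20% cheaper instance still wins.
    print(cost_per_million(1000, 3.6), cost_per_million(900, 2.88))
```

With these placeholder numbers, cost_per_million(1000, 3.6) is 1.0 while cost_per_million(900, 2.88) is about 0.89: an instance can be slower per record yet cheaper overall, which is the trade-off Graviton benchmarking is meant to expose.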
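A sketch of how Lambda and DynamoDB from item 4 might fit together, assuming S3 uploads trigger a function that records each new file awaiting de-identification. The event fields follow the standard S3 event notification format; the tracking attribute names and the bucket name in the example are hypothetical, and the DynamoDB write is described in a comment rather than executed so the sketch runs without AWS credentials.

```python
import json
from urllib.parse import unquote_plus


def parse_s3_event(event: dict) -> list:
    """Return (bucket, key) pairs for objects created in an S3 notification event."""
    objects = []
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue  # ignore deletions and other event types
        bucket = record["s3"]["bucket"]["name"]
        # S3 URL-encodes object keys in event notifications (a space arrives as '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects


def tracking_item(bucket: str, key: str, status: str = "PENDING") -> dict:
    """Build a DynamoDB item in low-level attribute-value format (hypothetical schema)."""
    return {
        "object_uri": {"S": f"s3://{bucket}/{key}"},
        "status": {"S": status},
    }


def handler(event, context):
    """Lambda entry point: build one tracking item per newly uploaded file."""
    items = [tracking_item(b, k) for b, k in parse_s3_event(event)]
    # A deployed function could write each item with
    # boto3.client("dynamodb").put_item(TableName=..., Item=item);
    # the call is omitted so this sketch runs offline.
    print(json.dumps(items))
    return {"queued": len(items)}
```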
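A minimal CloudFormation sketch for item 5. The template is written as a Python dict only to keep these examples in one language (it could equally be authored in YAML); the logical ID, description and choice of resource are assumptions, while the property names are the standard AWS::S3::Bucket ones for KMS encryption, public-access blocking and versioning.

```python
import json


def raw_data_bucket_template(bucket_logical_id: str = "RawDataBucket") -> dict:
    """Return a minimal CloudFormation template for an encrypted, private S3 bucket."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Hypothetical bucket for de-identified pipeline output",
        "Resources": {
            bucket_logical_id: {
                "Type": "AWS::S3::Bucket",
                "Properties": {
                    # Encrypt objects at rest with KMS by default.
                    "BucketEncryption": {
                        "ServerSideEncryptionConfiguration": [
                            {"ServerSideEncryptionByDefault": {"SSEAlgorithm": "aws:kms"}}
                        ]
                    },
                    # Refuse any public ACL or bucket policy.
                    "PublicAccessBlockConfiguration": {
                        "BlockPublicAcls": True,
                        "BlockPublicPolicy": True,
                        "IgnorePublicAcls": True,
                        "RestrictPublicBuckets": True,
                    },
                    "VersioningConfiguration": {"Status": "Enabled"},
                },
            }
        },
        # Ref on an AWS::S3::Bucket resolves to the bucket name.
        "Outputs": {"BucketName": {"Value": {"Ref": bucket_logical_id}}},
    }


if __name__ == "__main__":
    print(json.dumps(raw_data_bucket_template(), indent=2))
```

The printed JSON could be saved as template.json and deployed with aws cloudformation deploy --template-file template.json --stack-name followed by a stack name of your choice; because the same template is produced on every run, deployments stay repeatable.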
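A sketch of the request payloads for item 6, assuming the Dockerised components are packaged as a custom EMR Serverless image running on ARM64. The dictionaries mirror the keyword arguments of boto3's emr-serverless create_application and start_job_run calls; the application name, release label, image URI, role ARN and S3 paths are placeholders, and no AWS call is made.

```python
from typing import Iterable


def emr_serverless_application(name: str, image_uri: str,
                               release_label: str = "emr-6.15.0") -> dict:
    """Keyword arguments for boto3's emr-serverless create_application call.

    The release label is a placeholder; custom images and ARM64 both need a
    sufficiently recent EMR release.
    """
    return {
        "name": name,
        "releaseLabel": release_label,
        "type": "SPARK",
        "architecture": "ARM64",  # run Spark drivers and executors on Graviton
        "imageConfiguration": {"imageUri": image_uri},  # custom Docker image, e.g. in ECR
    }


def spark_job_run(application_id: str, role_arn: str, entry_point: str,
                  args: Iterable[str], executor_memory: str = "4g") -> dict:
    """Keyword arguments for boto3's emr-serverless start_job_run call."""
    return {
        "applicationId": application_id,
        "executionRoleArn": role_arn,
        "jobDriver": {
            "sparkSubmit": {
                "entryPoint": entry_point,  # S3 URI of the PySpark script
                "entryPointArguments": list(args),
                "sparkSubmitParameters": f"--conf spark.executor.memory={executor_memory}",
            }
        },
    }
```

These could be passed as boto3.client("emr-serverless").create_application(**app) and then start_job_run(**job), using the applicationId returned by the first call. The custom image would itself need to be built for linux/arm64 (for example with docker buildx build --platform linux/arm64) to match the ARM64 setting.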

By combining PySpark, spaCy, and AWS Graviton, I helped build a robust, cost-effective de-identification pipeline that met both performance and compliance requirements.

Technologies

Data & NLP: PySpark, spaCy, spark-nlp
Testing: pytest, pylint, TDD
AWS: Glue, S3, EMR, CodePipeline, CodeBuild, Lambda, DynamoDB
IaC & ops: CloudFormation, EMR Serverless, Docker
Documentation: Confluence, MkDocs, UML
