We are seeking a skilled Synthetic Data Engineer to design and generate realistic synthetic datasets that protect privacy and support robust machine learning. The role is ideal for engineers with strong data engineering and generative modelling experience. You will work closely with data scientists, privacy officers and software engineers to deliver production-ready synthetic data solutions.
This job description outlines the core responsibilities, required skills and qualifications for a Synthetic Data Engineer. It is intended for recruiters, HR professionals and staffing agencies who wish to attract top talent in data synthesis and privacy engineering.
Synthetic Data Engineer Job Profile
The Synthetic Data Engineer develops and implements techniques to create synthetic datasets that replicate real-world distributions while preserving individual privacy. They design pipelines, select generative models and establish validation metrics to ensure data utility and safety.
Reporting to the data platform lead or head of machine learning, the role bridges data engineering, machine learning and privacy teams to operationalise synthetic data for testing, model training and analytics.
Synthetic Data Engineer Job Description
A Synthetic Data Engineer is responsible for designing scalable systems to generate high-quality synthetic data across structured, time series and unstructured formats. The role involves evaluating and selecting generative approaches such as variational autoencoders, generative adversarial networks and probabilistic graphical models, and implementing privacy-preserving mechanisms, including differential privacy and k-anonymisation techniques. The engineer will integrate synthetic data workflows into CI/CD pipelines and cloud data platforms to support repeatable production use.
Key tasks include building data transformation pipelines, training and fine-tuning generative models, establishing automated validation suites to compare synthetic and real distributions, and documenting privacy risk assessments. The engineer will collaborate with stakeholders to identify use cases, define utility metrics and iterate on synthetic data strategies to balance fidelity, privacy and performance. Strong coding skills, familiarity with large-scale data tooling and a pragmatic approach to model evaluation are essential.
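As an illustration of the kind of check an automated validation suite might contain, the sketch below computes a two-sample Kolmogorov-Smirnov statistic for one numeric column. It is a minimal example rather than a prescribed method: the function name ks_statistic, the sample values and the 0.2 threshold are all assumptions chosen for demonstration, and a production suite would normally cover many columns and metrics.

```python
from bisect import bisect_right


def ks_statistic(real, synthetic):
    """Return the largest gap between the empirical CDFs of two samples.

    0.0 means the empirical distributions coincide; 1.0 means they do not overlap.
    """
    if not real or not synthetic:
        raise ValueError("both samples must be non-empty")

    real_sorted = sorted(real)
    synth_sorted = sorted(synthetic)

    max_gap = 0.0
    # Empirical CDFs only change at observed values, so it is enough
    # to compare them at every value seen in either sample.
    for x in set(real_sorted) | set(synth_sorted):
        cdf_real = bisect_right(real_sorted, x) / len(real_sorted)
        cdf_synth = bisect_right(synth_sorted, x) / len(synth_sorted)
        max_gap = max(max_gap, abs(cdf_real - cdf_synth))
    return max_gap


# Hypothetical column values, for demonstration only.
real_ages = [23, 31, 35, 42, 47, 51, 58, 64]
synthetic_ages = [25, 30, 36, 41, 46, 52, 57, 66]

# The threshold is an assumed value; a team would agree its own per use case.
MAX_ALLOWED_GAP = 0.2
gap = ks_statistic(real_ages, synthetic_ages)
print("pass" if gap <= MAX_ALLOWED_GAP else "fail", round(gap, 3))
```

A single-column statistic like this says nothing about correlations between columns or about privacy risk, which is why the role also calls for broader utility metrics and documented risk assessments.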
Synthetic Data Engineer Duties and Responsibilities
- Design and implement synthetic data generation pipelines for tabular, time series and image data.
- Select and develop generative models such as GANs, VAEs and autoregressive models.
- Apply privacy-preserving techniques, including differential privacy and anonymisation.
- Integrate synthetic data workflows into cloud platforms and CI/CD pipelines.
- Create validation and utility metrics to assess fidelity and detect mode collapse or bias.
- Collaborate with data scientists, privacy teams and product owners on use cases and requirements.
- Perform data quality testing and maintain documentation for reproducibility and compliance.
- Optimise model training for performance and cost on distributed compute resources.
- Monitor production synthetic datasets and maintain the associated remediation procedures.
Synthetic Data Engineer Requirements and Qualifications
- Bachelor's degree in Computer Science, Statistics, Mathematics or a related field; Master's degree preferred.
- Proven experience in machine learning and data engineering, with hands-on work in generative models.
- Strong programming skills in Python and familiarity with libraries such as PyTorch, TensorFlow and scikit-learn.
- Experience with cloud platforms (AWS, GCP or Azure) and data engineering tools such as Spark or Kafka.
- Knowledge of privacy concepts, including differential privacy, k-anonymisation and privacy risk assessment.
- Experience designing validation metrics and statistical tests to compare real and synthetic data distributions.
- Familiarity with containerisation and CI/CD tools for production deployment.
- Excellent communication skills and ability to work with cross-functional teams.
- Attention to detail, strong analytical skills, and a pragmatic approach to trade-offs between privacy and utility.
