
AI Data Architect

  • Posted on: Apr 16, 2026
  • Genzeon
  • Exton, Pennsylvania
  • Salary: Not Available
  • Full-time


Job Title: AI Data Architect

Job Type: Full-time

Job Location: Exton, Pennsylvania, United States

Remote: No

Job Description

AI Data Architect | Healthcare AI Platform

Genzeon Corporation — Healthcare Division

Exton, PA / Hybrid | 0–4 years | Full-time

AI-native product architect with data engineering experience needed for product build-out


The short version: We run a multi-model AI pipeline that processes 150K Medicare documents/year — faxed PDFs, EDI transactions, FHIR data, clinical notes. You’ll design and build the data architecture that ingests, stores, governs, and serves all of it to AI models and clinical reviewers. On-prem GPUs, hybrid cloud, HIPAA compliance. This is the real thing.


What you’ll do:

Design the end-to-end data architecture for a healthcare AI platform: ingestion, storage, processing, serving, governance

Build pipelines for heterogeneous healthcare data: faxed PDFs, X12 EDI (835/837/278), FHIR R4, HL7v2, CMS files, unstructured clinical notes

Architect the data lake/lakehouse layer (Apache Iceberg, MinIO, DuckDB, PostgreSQL/pgvector)

Design the embedding and vector storage layer that powers RAG: chunking, indexing, retrieval optimization

Build data lineage tracking from source document to AI decision

Implement HIPAA/HITRUST data governance: encryption, access controls, audit logging, PHI handling

Monitor data quality across the pipeline: schema drift, completeness, freshness, anomalies

Optimize for hybrid infrastructure: on-prem GPUs (RTX 5090, L40S), NAS, Azure GovCloud, Azure Commercial


What you need:

A data pipeline you’ve built that ran in production (we’ll ask about it)

SQL fluency and Python proficiency

Experience with at least one of: Spark, dbt, Airflow, Dagster, Prefect

Hands-on work with unstructured or semi-structured data — PDFs, images, OCR outputs, free text

Practical understanding of vector databases, embeddings, and how RAG systems consume data

Comfort with on-premises infrastructure, not just managed cloud services

Data quality and governance as instincts, not afterthoughts


Strong signals:

Healthcare data formats (X12 EDI, FHIR, HL7, CCD/C-CDA)

Apache Iceberg, Delta Lake, or modern table formats

MinIO / S3 / object storage architecture

pgvector, Pinecone, Weaviate, or similar vector stores

DuckDB or embedded analytical engines

HIPAA technical safeguards implementation

ML data pipelines — training data, feature stores, evaluation sets, feedback loops


We don’t require:

A data engineering bootcamp cert

Mastery of the entire “modern data stack”

Prior healthcare experience (but it helps)

A specific degree


To apply, submit:

1. Resume

2. Link to a data project you’ve built (GitHub, architecture diagram, write-up)

3. 200 words max: “Describe the messiest data problem you’ve encountered. How did you solve it?”


Position Details

Posted: Apr 16, 2026

Reference Number: 4cf389e72a6483eb

Employment: Full-time

Salary: Not Available

City: Exton

Job Origin: ziprecruiter

