Data Engineer
DS Technologies Inc
San Francisco, CA | $150,000 a year | Full Time
Job Description
About Us: We provide innovative, transformative IT services and solutions. We are passionate about helping our clients achieve their goals and exceed their expectations, and we strive to provide the best possible experience for our clients and employees. Committed to continuous improvement and innovation, we believe in working collaboratively with our clients and employees to achieve success.
DS Technologies Inc is looking to fill a Data Engineer role for one of our premier clients.
Job Title: Data Engineer
Location: San Francisco Bay Area, CA (or) New York City, NY (Onsite)
Position Type: Full-Time
Experience: 4-7 Years
Note: All visa types are accepted.
Primary Focus: Model reproduction, feature engineering logic, performance validation, and ensuring alignment with the client's established modeling frameworks.
• Rebuild and port the client's existing Python-based models to the customer's Databricks platform.
• Develop, train, and validate predictive models using Python, PySpark, and ML frameworks such as scikit-learn, XGBoost, and Spark MLlib.
• Reproduce and validate feature engineering logic, ensuring parity with the client's models.
• Train, retrain, validate, and benchmark model performance using customer-provided datasets while maintaining performance parity with baseline models (see the sketch after this list).
• Work with data engineers to define feature requirements and ensure datasets support model needs.
• Perform model diagnostics, hyperparameter tuning, bias checks, stability checks, and accuracy assessments.
• Prepare model documentation, validation summaries, and stakeholder-ready insights.
• Support scoring pipeline design and deployment with platform and engineering teams, ensuring reproducibility across Dev/QA/Prod.
• Collaborate with compliance and platform teams to ensure adherence to governance.
• Evaluate model performance across population segments and time periods.
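As a concrete illustration of the rebuild-and-benchmark work described above, the following is a minimal sketch of training a rebuilt model and checking performance parity against a baseline. The dataset path, column names, baseline AUC, and tolerance are all hypothetical placeholders, not details of the actual engagement.

    # Minimal parity-check sketch; the dataset path, columns, baseline AUC,
    # and tolerance below are hypothetical placeholders.
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import roc_auc_score
    from xgboost import XGBClassifier

    df = pd.read_csv("customer_dataset.csv")          # hypothetical customer-provided data
    X, y = df.drop(columns=["target"]), df["target"]  # hypothetical target column
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y
    )

    # Rebuild the model with the same family the original used (here XGBoost)
    model = XGBClassifier(n_estimators=300, max_depth=6, learning_rate=0.05)
    model.fit(X_train, y_train)

    # Benchmark against the baseline metric to confirm parity
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    BASELINE_AUC = 0.85  # hypothetical metric from the original model
    assert abs(auc - BASELINE_AUC) <= 0.01, f"Parity check failed: AUC={auc:.4f}"
    print(f"Rebuilt model AUC={auc:.4f}, within tolerance of baseline {BASELINE_AUC}")

A real engagement would benchmark more than a single metric (accuracy, calibration, and segment-level performance, per the bullets above), but the parity assertion captures the core idea.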
Qualifications:
• 4–6 years of experience in applied machine learning or data science.
• Strong hands-on experience with Python, scikit-learn, XGBoost, LightGBM, CatBoost, or similar libraries.
• Experience developing ML models in Databricks with Python or PySpark.
• Strong knowledge of feature engineering, model training workflows, and evaluation techniques.
• Experience working with large structured datasets (financial or transactional data preferred).
• Ability to write clear documentation and communicate technical results to non-technical stakeholders.
• 4+ years of hands-on experience developing, deploying, and maintaining machine-learning models.
• Advanced proficiency in Python (NumPy, pandas, scikit-learn, PyTorch or TensorFlow).
• Strong statistical and mathematical foundation, including regression, classification, probability, optimization, etc.
• Experience building end-to-end ML pipelines: data ingestion, cleaning, feature engineering, modeling, evaluation, and deployment (see the pipeline sketch after this list).
• Experience working within client environments, including adapting to unfamiliar infrastructure, constraints, and security requirements.
• Experience with cloud platforms (AWS, Azure, or GCP) and on-prem environments.
• Advanced SQL ability and experience with big-data tools (Spark, Databricks, Hadoop).
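To illustrate the end-to-end pipeline experience listed above, here is a minimal Spark MLlib sketch of a reproducible training-and-scoring pipeline as it might run on Databricks. The table name, feature columns, label column, and save path are hypothetical placeholders.

    # Minimal Spark MLlib pipeline sketch; the table, columns, and save path
    # are hypothetical placeholders.
    from pyspark.sql import SparkSession
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import GBTClassifier

    spark = SparkSession.builder.getOrCreate()    # preconfigured on Databricks
    df = spark.table("customer_db.transactions")  # hypothetical source table

    # Chain feature assembly and the model so the exact same transformations
    # are applied at training time and at scoring time
    assembler = VectorAssembler(inputCols=["amount", "tenure", "txn_count"],
                                outputCol="features")
    gbt = GBTClassifier(labelCol="label", featuresCol="features")
    model = Pipeline(stages=[assembler, gbt]).fit(df)

    # Persisting the fitted pipeline keeps scoring reproducible across
    # Dev/QA/Prod environments
    model.write().overwrite().save("/mnt/models/rebuilt_model")

Bundling the feature logic and the estimator in one Pipeline is what makes the Dev/QA/Prod reproducibility requirement tractable: the saved PipelineModel carries its transformations with it.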