⚠️ Only available to applicants residing in Latin America ⚠️
An AI venture providing fundamentally different augmented intelligence software across a variety of healthcare applications, including life sciences, drug discovery, payment integrity and medical claims.
Founded in 2019, Company is made up of an interdisciplinary team with expertise across life sciences, technology, and enterprise business who think differently about technology and its impact on our lives. Validated at Washington University in St. Louis, Company's Life Sciences technology has been used in 20+ publications, with additional publications in progress.
About the role:
We are looking for a mid-level data engineer to join our team. The ideal candidate will have experience working with Python, PySpark, SQL, and Databricks.
In this role, you will be responsible for designing and implementing data pipelines, as well as maintaining and optimizing existing ones. You will work closely with our team of data scientists and analysts to ensure that data is stored, processed, and analyzed efficiently.
Responsibilities:
● Design, build, and maintain data pipelines using PySpark, ensuring data is collected, processed, and made available for analysis in a timely and efficient manner (a brief sketch of this kind of pipeline follows this list). Detailed tasks include:
○ Extracting data from primary and secondary sources
○ Removing corrupted data and fixing coding errors and related problems
○ Performing analysis to assess the quality and meaning of data
○ Filtering data by reviewing reports and performance indicators to identify and correct code problems
● Design, develop, and maintain data-centric software solutions using Python.
● Collaborate with data scientists and product owners to understand requirements and transform them into scalable data engineering solutions.
● Optimize data pipelines for performance, scalability, and reliability, working with large volumes of structured and unstructured data.
● Perform data quality assurance, validation, and debugging to ensure accurate and reliable results.
● Develop and maintain data ingestion processes from various sources (e.g., databases, APIs, flat files).
● Create and manage config-driven data transformation workflows, including data cleansing, aggregation, and enrichment (see the second sketch below).
● Write clean, maintainable, and well-documented code.
● Collaborate within and across teams for knowledge sharing, code reviews, and general tech synchronization.
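To give a concrete flavor of the pipeline work described above, here is a minimal PySpark sketch of an extract-clean-load flow. The paths, column names, and claims example are hypothetical illustrations, not the company's actual stack:

```python
# Minimal sketch of an extract-clean-load pipeline in PySpark.
# All paths and column names below are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("claims_pipeline_sketch").getOrCreate()

# Extract: read raw data from a (hypothetical) landing zone.
raw = spark.read.option("header", True).csv("/mnt/landing/claims/*.csv")

# Clean: remove corrupted rows and fix typing problems.
clean = (
    raw.dropna(subset=["claim_id", "amount"])
       .withColumn("amount", F.col("amount").cast("double"))
       .filter(F.col("amount") > 0)
)

# Quality check: a simple indicator of how much data was filtered out.
total, kept = raw.count(), clean.count()
print(f"Kept {kept}/{total} rows")

# Load: persist the cleaned data for downstream analysis
# (Delta is built into Databricks; use parquet on plain Spark).
clean.write.mode("overwrite").format("delta").save("/mnt/curated/claims")
```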
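And a toy illustration of the config-driven transformation workflows mentioned above; the config schema, column names, and helper function are invented for illustration:

```python
# Toy sketch of a config-driven transformation step.
# The config keys and column names are invented for illustration.
from pyspark.sql import DataFrame
from pyspark.sql import functions as F

transform_config = {
    "cleanse": {"drop_nulls": ["patient_id", "provider_id"]},
    "aggregate": {"group_by": ["provider_id"], "sum_col": "amount"},
}

def apply_transforms(df: DataFrame, config: dict) -> DataFrame:
    """Apply the cleansing and aggregation steps described by a config dict."""
    if "cleanse" in config:
        # Data cleansing: drop rows missing required fields.
        df = df.dropna(subset=config["cleanse"]["drop_nulls"])
    if "aggregate" in config:
        # Aggregation: group and sum a configured column.
        agg = config["aggregate"]
        df = (df.groupBy(*agg["group_by"])
                .agg(F.sum(agg["sum_col"]).alias("total_" + agg["sum_col"])))
    return df
```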
What they’re looking for:
- Bachelor's Degree in Computer Science, Engineering, or a related field.
- 3+ Years of experience working as a Data Engineer.
- 3+ Years of experience working with Python.
- 1+ Years of experience working with SQL.
- 1+ Years of experience working with PySpark.
- 1+ Years of experience working with Databricks.
- Advanced English proficiency.
Nice to have:
Medallion architecture, Data Lakes, Data Modeling, Data Storage, NoSQL, Apache Airflow, Dagster, and CI/CD.
Time Zone:
EST
😎 Ready to embark on an inspiring journey? Become part of our client community today, access global opportunities, and take your technology career to the next level.