Job Description
We are seeking a skilled Big Data Engineer to design, develop, and optimize ETL processes while ensuring data accuracy, completeness, and timeliness. The role involves collaborating with cross-functional teams to implement efficient data solutions and support business needs.
Key Responsibilities
- Design, develop, and optimize big data ETL processes to meet business requirements
- Participate in data warehouse architecture design and develop appropriate ETL solutions
- Develop Spark applications for large-scale data processing, including data cleaning, transformation, and loading
- Optimize Spark job performance to improve efficiency and reduce resource consumption
- Write Python scripts for data collection, preprocessing, and monitoring tasks
- Integrate Python code with Spark applications for complex data workflows
- Develop in a PySpark environment to leverage the combined strengths of Python and Spark (see the PySpark ETL sketch after this list)
- Troubleshoot PySpark issues such as data type conversion errors and performance bottlenecks
- Implement data quality monitoring strategies and conduct ETL quality checks (see the data quality monitoring sketch after this list)
- Establish data quality reporting mechanisms to support decision-making
- Collaborate with data analysts, data scientists, and data warehouse engineers on projects
- Participate in technical knowledge sharing to improve team capabilities
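For illustration, the following is a minimal PySpark ETL sketch of the kind of work this role involves: extract, cleaning, explicit type casts, a simple quality gate, and a partitioned Parquet load. All paths, column names, and the retention threshold are hypothetical placeholders, not a prescribed implementation.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import DecimalType

spark = SparkSession.builder.appName("orders_etl").getOrCreate()

# Extract: raw orders landed as CSV (path and schema are hypothetical)
raw = spark.read.option("header", True).csv("s3://example-bucket/raw/orders/")

# Clean: drop exact key duplicates and rows missing required fields
cleaned = (
    raw.dropDuplicates(["order_id"])
       .dropna(subset=["order_id", "order_ts", "amount"])
)

# Transform: explicit casts avoid the string-vs-numeric type conversion
# issues that commonly surface in PySpark jobs
transformed = (
    cleaned
    .withColumn("amount", F.col("amount").cast(DecimalType(12, 2)))
    .withColumn("order_ts", F.to_timestamp("order_ts"))
    .withColumn("order_date", F.to_date("order_ts"))
)

# Quality gate: fail fast if cleaning rejected too many rows
raw_count = raw.count()
final_count = transformed.count()
if raw_count > 0 and final_count / raw_count < 0.95:  # threshold is illustrative
    raise ValueError(f"ETL quality gate failed: kept {final_count} of {raw_count} rows")

# Load: write partitioned Parquet for downstream warehouse consumption
(
    transformed.write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet("s3://example-bucket/curated/orders/")
)

spark.stop()
```

In practice the quality checks would typically be backed by a dedicated framework such as Deequ or Great Expectations; the count-based gate above only sketches the idea.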
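Similarly, a minimal sketch of the data quality monitoring and reporting responsibility: a single-pass null-count profile of every column, appended to a metrics table that reports and dashboards can query. The dataset path, output location, and report schema are again illustrative assumptions.

```python
from datetime import datetime, timezone

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dq_monitor").getOrCreate()

# Dataset under inspection (path is hypothetical)
df = spark.read.parquet("s3://example-bucket/curated/orders/")

total = df.count()

# Single pass over the data: count nulls in every column at once
null_counts = df.agg(
    *[F.sum(F.col(c).isNull().cast("int")).alias(c) for c in df.columns]
).first().asDict()

run_ts = datetime.now(timezone.utc).isoformat()
rows = [
    (run_ts, col, total, int(nulls or 0), (nulls or 0) / total if total else None)
    for col, nulls in null_counts.items()
]

report = spark.createDataFrame(
    rows,
    schema="run_ts string, column_name string, row_count long, null_count long, null_ratio double",
)

# Append to a metrics table that downstream reports can query over time
report.write.mode("append").parquet("s3://example-bucket/dq/orders_metrics/")

spark.stop()
```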
Job Requirements
- Strong experience in big data ETL process design and optimization
- Proficiency in Spark application development and performance tuning
- Expertise in Python programming for data processing tasks
- Hands-on experience with PySpark integration and development
- Knowledge of data quality assurance methodologies and tools
- Understanding of data warehouse architecture principles
- Ability to troubleshoot complex data processing issues
- Excellent collaboration and communication skills
- Experience working in cross-functional data teams
- A continuous-learning mindset and a willingness to share knowledge
Preferred Qualifications
- Experience with additional big data technologies (Hadoop, Hive, etc.)
- Knowledge of cloud-based data platforms (AWS, Azure, GCP)
- Familiarity with data visualization and reporting tools
- Understanding of machine learning concepts and applications
- Previous experience in implementing data governance frameworks