Job Description
The role involves developing, optimizing, and maintaining the company's big data platform, and participating in the design and implementation of data collection, processing, modeling, and analysis systems that support business decision-making and intelligent applications.
Key Responsibilities
- Participate in big data platform architecture design and technology selection; develop and maintain data processing pipelines.
- Implement data cleaning, ETL, aggregation, and computation tasks using frameworks such as Hadoop/Spark/Flink.
- Develop and optimize offline analytical query systems such as Hive/Presto/ClickHouse.
- Design and implement real-time data stream processing (e.g., Kafka, Flink, Spark Streaming).
- Collaborate with data analysts and algorithm engineers to provide high-quality data interfaces and services.
- Optimize big data cluster performance, monitor resource usage, schedule tasks, and troubleshoot failures.
- Prepare technical documentation, establish development standards, and promote standardization and automation in data engineering.
Job Requirements
- Bachelor's degree or higher in Computer Science, Software Engineering, Data Engineering, or related fields.
- Proficiency with the Hadoop ecosystem (HDFS, YARN, Hive, HBase, Spark, Flink, Kafka, etc.).
- Strong SQL skills and familiarity with at least one programming language (Python/Java/Scala).
- Experience in ETL development and data warehouse modeling (dimensional modeling, star/snowflake schemas).
- Familiarity with Linux environments, Shell scripting, and data scheduling tools (e.g., Airflow/Azkaban/DolphinScheduler).
- Experience with cloud big data platforms (AWS EMR, GCP BigQuery, Azure Synapse, Aliyun MaxCompute) is a plus.
- Strong communication, problem-solving, and independent project execution skills.
Preferred Qualifications
- Experience in real-time computing or log collection systems (e.g., Flink + Kafka + Druid).
- Knowledge of data security, privacy protection, and access control mechanisms.
- Experience supporting machine learning data processing workflows.
- Background in large-scale internet or financial industry projects.
Benefits
Global remote work option, competitive salary, annual leave, a positive team culture, and a supportive work environment.