Job Description
1. Build the company's end-to-end data architecture, covering both offline batch processing and real-time streaming pipelines.
2. Establish and maintain ETL/ELT processes for data collection, synchronization, cleaning, and transformation to ensure accurate and efficient data flow into the system.
3. Implement real-time data collection, cleaning, aggregation, and metric calculation using Flink to guarantee data timeliness and accuracy.
4. Design and implement a layered data warehouse architecture, including data modeling, dimension design, and unified metric definitions to create a reusable and maintainable data foundation.
5. Develop data interfaces, reports, and basic data services to support analytical, decision-making, and operational needs across business departments (operations, sales, product).
6. Monitor data quality, troubleshoot issues, and optimize processes to ensure data accuracy, completeness, and timeliness, and establish basic data governance standards.
7. Respond quickly to business data requests, and optimize existing data workflows and SQL/scripting jobs to improve batch and real-time processing performance and reduce maintenance costs.
Key Responsibilities
- Design and maintain scalable data infrastructure supporting both historical and real-time analytics
- Develop robust data pipelines with proper error handling and monitoring mechanisms
- Collaborate with cross-functional teams to understand data requirements and deliver solutions
- Document data processes, models, and standards for knowledge sharing
- Continuously evaluate and implement new technologies to enhance data capabilities
Job Requirements
- Bachelor's degree or higher in Computer Science, Mathematics, or a related field, with 5+ years of data development and data warehouse experience
- Expert SQL skills, including complex queries and stored procedures, with experience across MySQL, Hive, Paimon, and HBase
- Proficiency in Java/Python for ETL scripting and data processing job development
- Hands-on experience with Spark, Flink, and Kafka for data synchronization and processing optimization
- Strong data warehouse modeling skills and a solid understanding of warehouse design principles
- Excellent problem-solving abilities for troubleshooting data anomalies and performance issues
- Experience at both startups and established companies preferred, with the ability to adapt to multi-role responsibilities
- Familiarity with Alibaba Cloud data platforms (MaxCompute, DataWorks, etc.) is a plus
- Background in finance, trading, or payment domains with relevant data scenarios preferred
Benefits
Remote work options | Minimum 10 days annual leave | 5 days paid sick leave | Positive work environment
Mandatory requirement: a degree from a Double First-Class university