# Data Engineer - Bilingual Mandarin required

**Company:** [CWILL](http://jobs.workable.com/companies/3pBzndN1tf1QvG8WDwNGii.md)
**Location:** Los Angeles, United States
**Workplace:** hybrid
**Employment type:** Full-time

[Apply for this job](http://jobs.workable.com/view/ccafdebb-030d-44a0-b52e-adab7c2b0db5)

## Description

_CWILL (pronounced "quill") is the post-purchase and retention suite built for Shopify._

With strong product-market fit and expanding US operations, we're building out our security and compliance capabilities to meet global data privacy standards.

Learn more: [www.cwill.com](https://www.cwill.com)

**I. Basic Information**

**Work Authorization**

Green Card / U.S. Citizen required (we do not sponsor)

**Job Title**

Data Engineer

**Focus Areas**

Data ingestion, data lakehouse, data warehouse, data platform, data service APIs, data quality & engineering agent development

**Level**

Junior to mid-level with high growth potential

**Location**

United States — on-site, remote, or hybrid (per company requirements)

**Employment Type**

Full-time

**Collaborating Teams**

CWILL Data Engineering, Data Analytics, Business, Product, and Technology teams

**Language**

English required; Mandarin is a strong plus

**Cross-Timezone Work**

Must maintain a regular collaboration window with the China team; strong async communication and documentation skills required (approx. 2 hrs/day overlap needed)

**Collaboration Frequency**

Every 1–2 days; approx. 2 hrs per session. Candidates in western US time zones preferred for scheduling.

**II. Role Positioning**

CWILL is building data infrastructure to support business operations, product capabilities, customer service, analytics, and intelligent applications. As a US-side data engineer, you will participate in multi-source data ingestion, data lakehouse and warehouse development, data quality governance, data platform capability building, and the exploration of AI agent-based engineering automation.

We are looking for candidates with a solid foundation in SQL, Python, and data engineering: someone who can, with guidance from the existing data team, progressively take ownership of data ingestion, modeling, quality, and service tasks while collaborating effectively with the China-based data engineering, analytics, and business teams.

This is not a pure data analysis, BI reporting, or one-off scripting role. It is a comprehensive data engineering position focused on data integration, data warehouse development, data platform capabilities, data services, and engineering automation.

**III. Role Mission**

Through stable, well-structured, and scalable data engineering capabilities, help the company unify, govern, model, and serve data scattered across business systems, SaaS platforms, external channels, and internal systems — improving the usability, accuracy, timeliness, and reusability of CWILL’s data assets.

This role is expected to continuously drive:

• More standardized data source ingestion

• Clearer data lakehouse and warehouse structure

• More automated data quality monitoring

• More platform-driven data service capabilities

• Progressive adoption of agent-based and automated approaches for data development, troubleshooting, documentation, and quality checks

**IV. Key Responsibilities**

**1\. Data Ingestion & Pipeline Development**

• Ingest data from internal and external business systems, third-party platforms, SaaS products, and external data sources; handle data collection, sync, cleansing, and loading

• Participate in building offline and real-time data pipelines using SeaTunnel, Kafka, Flink, Spark, or similar technologies to improve ingestion stability and processing efficiency

• Handle practical challenges in data sync: authentication, pagination, rate limiting, failure retry, incremental sync, backfill, schema changes, and task anomalies

**2\. Data Warehouse & Data Modeling**

• Participate in layered data warehouse development across ODS, DWD, DWS, and ADS layers; build and maintain data models

• Support business domain modeling, metric standardization, shared data model development, and core table maintenance

• Optimize data organization and query performance on OLAP engines such as Doris to provide stable data support for product, operations, growth, customer success, and management analytics

**3\. Data Quality & Data Governance**

• Build and maintain data quality rules for core data pipelines; ensure data accuracy, completeness, consistency, and timeliness

• Participate in data validation, anomaly detection, alerting, and issue resolution; help improve stability of critical data pipelines

• Contribute to data governance capabilities using DataHub or similar tools; improve metadata management, data lineage, data asset catalog, and data standards

**4\. Data Platform & Data Services**

• Participate in building data platform capabilities including data development, task scheduling, monitoring, quality management, governance, and service delivery modules

• Use tools such as DolphinScheduler and StreamPark for task management, scheduling orchestration, and real-time task operations

• Support the data service layer by delivering standardized APIs, metric services, and data capabilities to internal systems, analytics applications, and business tools

• Support underlying data for tools like Superset; ensure data availability for BI dashboards, metric boards, and business monitoring

**5\. AI Agent & Engineering Automation**

• Participate in designing and implementing data development automation tools and engineering agents

• Explore AI agent applications in data development, governance, quality detection, task operations, anomaly diagnosis, and documentation generation

• Leverage large language models and automation tools to improve data engineering efficiency, task stability, and platform intelligence

## Requirements

**Must-Have**

**Experience**

• 1–4 years of experience in data engineering, data platforms, data warehousing, backend development, analytics engineering, or a related role

• Real project experience in data ingestion, data pipelines, data warehouse development, data modeling, data services, or data platform work

• Strong learning ability and execution skills; able to independently drive small-to-medium data engineering tasks with clear objectives

**SQL Skills**

• Proficient in SQL for querying, cleansing, aggregation, deduplication, comparison, validation, and metric calculation

• Familiar with joins, window functions, CTEs, aggregation analysis, incremental logic, and basic performance optimization

• Understands data warehouse layering concepts: fact tables, dimension tables, subject domains, metric definitions, and shared models

**Data Development**

• Proficient in Java or Python for API integration, data processing, automation scripting, and file handling

• Understands common engineering patterns: REST APIs, OAuth/API keys, pagination, rate limiting, retry logic, error handling, logging, and task idempotency

• Good code structure habits; writes clean, maintainable, and reusable code

• Familiar with Git, code review practices, README documentation, logging, testing, and collaborative engineering workflows

**Pipeline & Platform Tools**

• Familiar with one or more of: SeaTunnel, Kafka, Flink, Spark (data integration, real-time, or offline processing)

• Familiar with one or more of: Doris, ClickHouse, Snowflake, BigQuery, Redshift, Databricks, PostgreSQL (data warehouse, OLAP, or lakehouse systems)

• Familiar with one or more of: DolphinScheduler, StreamPark, Airflow, Dagster, Prefect, dbt (scheduling, development, or task management tools)

• Understands data pipeline operations: scheduling, dependencies, monitoring, failure retry, backfill, version management, and deployment processes

• Candidates are not expected to master all tools, but must have a solid data engineering foundation and the ability to quickly learn new tech stacks

**Data Quality & Governance Mindset**

• Understands data quality dimensions: accuracy, completeness, consistency, uniqueness, timeliness, and anomaly detection

• Proactively designs data validation rules and can identify and locate data anomalies

• Familiar with metadata management, data lineage, data asset catalogs, and data standards; experience with DataHub or similar platforms is a plus

**Collaboration & Communication**

• Able to communicate data requirements with analysts, business stakeholders, backend engineers, and product managers

• Clearly describes problems, solutions, risks, progress, and deliverables

• Comfortable with cross-timezone collaboration; strong written and spoken English communication skills

• Willing to participate in regular fixed collaboration sessions with China-based teams and drive work through documentation and async communication

**Nice-to-Have**

• Experience integrating third-party SaaS data: CRM, ERP, marketing platforms, customer service systems, logistics, e-commerce, payment systems, or ad platforms

• Experience building data lakehouses, data middle platforms, data platforms, or enterprise-level data warehouses

• Experience developing data service APIs, metric services, internal data products, or lightweight backend services

• Experience with data quality frameworks, data lineage, metadata management, data catalogs, observability, or monitoring and alerting

• AWS, GCP, or Azure cloud platform experience

• Docker, CI/CD, Terraform, Kubernetes, or basic DevOps experience

• Experience with LLMs, AI Agents, code generation, automated testing, task inspection, data quality agents, or engineering efficiency tooling

• Experience with cross-border teams, international business, supply chain, e-commerce, logistics, marketing, or customer success data scenarios

## Benefits

Starting Pay: 90k–130k, depending on experience; open to negotiation

401(k)

PTO

Paid Holidays

Insurance
