Migration & Transfer

    AWS DataSync

    Automated high-speed data transfer between on-premises storage, other clouds, and AWS

    Moving files from one place to another sounds simple, but at scale it becomes a nightmare. You have 180TB on your old NAS and need it in S3. You could write scripts to copy files one by one, but they break halfway through, skip files silently, and take weeks. DataSync is like a high-speed automated moving truck with a detailed manifest. You tell it where the data lives and where it is going, and it handles everything: it validates that every byte was copied correctly, encrypts the transfer, and moves files in parallel, many lanes at once instead of one. If interrupted, it picks up exactly where it left off. It works with your existing storage (NFS, SMB, HDFS, object storage), lands data in S3, EFS, or FSx, and even moves data cross-cloud from Google Cloud Storage or Azure Blob directly into AWS. You pay per gigabyte transferred, not per hour.

    DataSync uses a purpose-built network protocol and a parallel, multi-threaded architecture capable of saturating a 10Gbps link. You deploy a DataSync agent (a VM or EC2 instance) on-premises or in another cloud, then create a task: a source location, a destination location, and transfer settings. Supported on-premises sources include NFS, SMB, HDFS, and object storage. Supported cloud sources include Google Cloud Storage, Azure Blob, Azure Files, and other providers. Destinations include S3 (all storage classes), EFS, and all FSx variants. DataSync performs automatic checksums on every file before and after transfer. Transfers run over TLS. Tasks can be scheduled or triggered via API or EventBridge.
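    The agent-plus-task setup described here can be sketched in Python. The dicts below mirror the shape of the request parameters for the boto3 datasync calls (create_location_nfs, create_location_s3, create_task); every ARN, hostname, bucket name, and subdirectory is a placeholder, not a real resource.

```python
# Sketch of a DataSync task definition, shaped like boto3 datasync request
# parameters. All ARNs, hostnames, and bucket names are placeholders.

nfs_source = {  # would be passed to datasync.create_location_nfs(**nfs_source)
    "ServerHostname": "nas01.example.internal",
    "Subdirectory": "/exports/footage",
    "OnPremConfig": {
        "AgentArns": ["arn:aws:datasync:us-east-1:123456789012:agent/agent-0example"]
    },
}

s3_destination = {  # would be passed to datasync.create_location_s3(**s3_destination)
    "S3BucketArn": "arn:aws:s3:::raw-footage-archive",
    "Subdirectory": "/incoming",
    "S3Config": {"BucketAccessRoleArn": "arn:aws:iam::123456789012:role/DataSyncS3Access"},
}

task = {  # would be passed to datasync.create_task(**task)
    # Location ARNs come back from the two create_location_* calls above.
    "SourceLocationArn": "arn:aws:datasync:us-east-1:123456789012:location/loc-0source",
    "DestinationLocationArn": "arn:aws:datasync:us-east-1:123456789012:location/loc-0dest",
    "Options": {
        "VerifyMode": "ONLY_FILES_TRANSFERRED",  # checksum-verify transferred files
        "TransferMode": "CHANGED",               # incremental: copy only changed files
    },
    "Schedule": {"ScheduleExpression": "cron(0 2 * * ? *)"},  # nightly at 02:00 UTC
}
```

    Separating locations from tasks is the design DataSync itself uses: a location is defined once, and multiple tasks (bulk run, nightly incremental) can reuse it.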

    Key Capabilities

    • Transfers data from NFS, SMB, HDFS, and object storage on-premises to S3, EFS, and all Amazon FSx variants
    • Cross-cloud transfers from Google Cloud Storage, Azure Blob, Azure Files, and other providers directly into AWS
    • Purpose-built parallel multi-threaded protocol capable of saturating a 10Gbps link
    • Automatic end-to-end checksum verification on every file before and after transfer
    • Scheduled or on-demand tasks; integrates with EventBridge for automated pipeline triggers
    • Encrypts all data in transit over TLS; supports VPC endpoints to avoid the public internet
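    As a sketch of the EventBridge integration mentioned above: a rule matching an event pattern like the one below could kick off a downstream pipeline stage whenever a task execution succeeds. The exact "detail-type" and "State" strings are assumptions to verify against the current DataSync EventBridge documentation.

```python
import json

# Hypothetical EventBridge event pattern: match DataSync task executions
# that finish successfully. The "detail-type" and "State" strings below
# are assumptions; check them against the DataSync docs before use.
event_pattern = {
    "source": ["aws.datasync"],
    "detail-type": ["DataSync Task Execution State Change"],
    "detail": {"State": ["SUCCESS"]},
}

# This JSON string is what you would supply as the EventPattern of an
# EventBridge rule (e.g. via events.put_rule in boto3).
pattern_json = json.dumps(event_pattern, indent=2)
```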

    Gotchas & Constraints

    Gotcha #1: DataSync does not preserve file system permissions by default when crossing storage types (NFS to S3, for example). Review the metadata handling settings for your specific source-destination combination before running production transfers.

    Gotcha #2: The on-premises agent requires outbound HTTPS on port 443. Running the agent on a shared, heavily used server degrades throughput significantly; dedicate a host and a NIC.

    Constraints: Maximum file size is 5TB (the S3 single-object limit). Tasks are region-specific.
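    The 5TB constraint is worth checking before a bulk run rather than discovering mid-transfer. A minimal pre-flight scan, assuming a POSIX-style source tree mounted locally (the function names are illustrative, not part of any DataSync tooling):

```python
import os

# S3's single-object ceiling: 5 TB as documented, 5 * 1024**4 bytes exactly.
S3_MAX_OBJECT_BYTES = 5 * 1024**4

def exceeds_s3_limit(size_bytes: int) -> bool:
    """True if a file of this size cannot land in S3 as a single object."""
    return size_bytes > S3_MAX_OBJECT_BYTES

def oversized_files(root: str):
    """Yield (path, size) for files under root that would fail the
    S3 single-object limit and need splitting before migration."""
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # unreadable entry; skip rather than abort the scan
            if exceeds_s3_limit(size):
                yield path, size
```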

    A media production company stores 180TB of raw video footage on an on-premises NetApp NAS. They are moving to a cloud-based editing pipeline and need everything in S3 before the cutover date in 6 weeks. A manual rsync attempt was abandoned after 3 days when it failed silently on 12,000 files with no error log. They deploy two DataSync agents on dedicated VMs connected to a 10Gbps uplink, configure tasks pointing their NFS mount to an S3 bucket in us-east-1, and enable nightly scheduling for incremental syncs after the initial bulk run. DataSync transfers the 180TB in 9 days using parallel threads, verifies every file with automatic checksums, and produces a transfer report showing 0 errors. Daily incremental syncs during the 6-week cutover window keep the S3 bucket current as editors continue working on-premises. On day 42, they cut over the editing pipeline to S3.

    The Result

    180TB migrated with zero data corruption, ongoing incremental sync automated at $0.0125 per GB, and the original 3-day rsync failure replaced by a 9-day verified transfer with a clean audit trail.
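    The scenario's numbers can be sanity-checked with quick arithmetic (decimal units, 1 TB = 10^12 bytes): a fully saturated 10Gbps link moves 180TB in under two days of pure wire time, so the observed 9 days reflects real-world overhead like checksum verification, source NAS read speed, and shared uplinks; and the per-GB rate implies a one-time transfer cost in the low thousands of dollars.

```python
# Back-of-the-envelope check on the migration scenario (decimal units:
# 1 TB = 1e12 bytes, 1 GB = 1e9 bytes).
TB = 10**12
data_bytes = 180 * TB

# Ideal wire time for 180 TB over a fully saturated 10 Gbps link.
link_bps = 10 * 10**9
ideal_seconds = data_bytes * 8 / link_bps
ideal_days = ideal_seconds / 86_400   # ~1.67 days of pure transfer time

# One-time transfer cost at the $0.0125/GB rate cited in the result.
cost_usd = (data_bytes / 10**9) * 0.0125   # 180,000 GB at $0.0125/GB
```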

    Official AWS Documentation