From Machine Learning
Anyone have an S3-compatible store that actually saturates H100s without the AWS egress tax? [R]
We’re training on a cluster at Lambda Labs, but our main dataset (over 40TB) is sitting in AWS S3. The egress fees are high, so we tried serving it from Cloudflare R2 instead. The problem is that R2’s TTFB is all over the place, and our data loader is constantly waiting on I/O, so the GPUs sit idle for 20% of each epoch.
Is there a zero-egress alternative that actually has the throughput/latency for high-speed streaming? Or are we stuck building a custom NVMe cache layer?
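For anyone wondering what "custom NVMe cache layer" would mean in practice, here is a minimal sketch of a read-through cache, not something we have built. Everything in it is hypothetical: the `ReadThroughCache` class, the `fetch` callable (a stand-in for whatever pulls a shard from the object store), and the cache directory, which is assumed to sit on local NVMe. It has no eviction, so it assumes the working set fits on disk.

```python
import hashlib
import os
import tempfile
from pathlib import Path
from typing import Callable


class ReadThroughCache:
    """Hypothetical read-through cache: serve from local disk, fetch on a miss."""

    def __init__(self, cache_dir: str, fetch: Callable[[str], bytes]):
        # cache_dir is assumed to live on local NVMe; fetch is a stand-in
        # for the remote object-store read (S3, R2, or anything else).
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.fetch = fetch
        self.hits = 0
        self.misses = 0

    def _path(self, key: str) -> Path:
        # Hash the key so any object name maps to a flat, safe filename.
        return self.cache_dir / hashlib.sha256(key.encode()).hexdigest()

    def get(self, key: str) -> bytes:
        path = self._path(key)
        if path.exists():
            self.hits += 1
            return path.read_bytes()

        self.misses += 1
        data = self.fetch(key)

        # Write to a temp file, then rename atomically, so a concurrent
        # data loader worker never reads a partially written shard.
        fd, tmp = tempfile.mkstemp(dir=self.cache_dir)
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, path)
        return data
```

With something like this, only the first epoch pays the remote latency for each shard. Later epochs read from local disk, provided the shards fit.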