Cloud TPU
$4.50 USD per TPU per hour.
(preemptible: $1.35 USD per TPU per hour)

Pricing

AWS Elastic GPU Cost
eg1.2xlarge: 8 GiB GPU memory, $0.400/hour
g3.16xlarge: 64 vCPU, 188 ECU, 488 GiB memory, EBS only, $4.56/hour
NOTES 
NEED TO USE us-central1-f ZONE!! 
ctpu --zone us-central1-f -log-http ls
You need a Google Cloud Storage bucket to store models and whatnot
If you want to delete it: 
gsutil rm -r gs://YOUR-BUCKET-NAME
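For reference, the bucket can be created ahead of time; a minimal sketch, assuming you want it in us-central1 to sit next to the TPU zone:

gsutil mb -l us-central1 gs://YOUR-BUCKET-NAME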
Tutorial Link
Questions 
TensorFlow vs PyTorch
TensorBoard
Setup link for Google Cloud (connected to a Compute Engine VM and Cloud TPU)
Benchmark Comparisons
According to this link, TPUs are faster and also cheaper when accounting for time saved. 
Another link here.
TensorFlow Research Cloud Program
"If you're enrolled in the TFRC program you are granted access to Cloud TPU for a limited period of time free of charge. You are not charged for Cloud TPU as long as your TPUs are running in zone us-central1-f."
How TPUs Work Doc
Link here
TensorFlow
- TPUs are faster
- Drawbacks: you need to adhere to the Estimator framework, which is more complex and definitely not as flexible as PyTorch (see the sketch after this list).
- Static computation graph (vs. dynamic), which means things like variable-sized inputs aren't handled well. You can't just use a for loop, for example.
- The compute graph is created NOT on the host machine but on the TPU device itself, so you can't just initialize a session and inspect intermediate tensors. This is really unfortunate because I rely on that a lot when trying to debug what's going on.
- I think the TF Estimator framework suffers from over-engineering. There's a lot of subclassing going on, so it can be hard to follow.
- tf.decode_raw with int8 is not supported on TPU.
- Really hard to debug when things go wrong.
- Not well documented. For example, the TFRecord format (tf.train.Example, tf.train.Features); the community seems fairly frustrated about this as well ().
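For context, here is a minimal sketch of the Estimator/TPUEstimator pattern referenced above (TF 1.x contrib APIs; the toy model_fn and the names my-tpu and gs://my-bucket/model are placeholders, not the tutorial's actual code):

import tensorflow as tf

def model_fn(features, labels, mode, params):
    # Toy network standing in for the real model.
    logits = tf.layers.dense(tf.layers.flatten(features), 1000)
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    optimizer = tf.train.AdamOptimizer()
    # The optimizer must be wrapped so gradients are aggregated across TPU cores.
    optimizer = tf.contrib.tpu.CrossShardOptimizer(optimizer)
    train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step())
    return tf.contrib.tpu.TPUEstimatorSpec(mode=mode, loss=loss, train_op=train_op)

run_config = tf.contrib.tpu.RunConfig(
    cluster=tf.contrib.cluster_resolver.TPUClusterResolver(tpu='my-tpu'),
    model_dir='gs://my-bucket/model',
    tpu_config=tf.contrib.tpu.TPUConfig(iterations_per_loop=100))

estimator = tf.contrib.tpu.TPUEstimator(
    model_fn=model_fn, config=run_config, train_batch_size=128, use_tpu=True)
# estimator.train(input_fn=my_input_fn, max_steps=1000)  # input_fn must return a tf.data.Dataset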
Commands 
To run ResNet on TPU:
python resnet_main.py --tpu=$TPU_NAME --data_dir=gs://cloud-tpu-test-datasets/fake_imagenet --precision='float32' --model_dir=${STORAGE_BUCKET}/resnet
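The command assumes these shell variables are already set; a minimal sketch (the names are placeholders):

export TPU_NAME=my-tpu                       # the TPU created via ctpu up
export STORAGE_BUCKET=gs://YOUR-BUCKET-NAME  # the GCS bucket from the notes above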
Input Transformations
INPUTS:
<tf.Tensor 'truediv:0' shape=(128, 512, 512, 3) dtype=float32>

INITIAL CONV:
Tensor("initial_conv:0", shape=(128, 256, 256, 64), dtype=float32)

INITIAL POOL:
Tensor("initial_max_pool:0", shape=(128, 128, 128, 64), dtype=float32)

AFTER BLOCK 1 (num filters 64, stride 1):
Tensor("block_group1:0", shape=(128, 128, 128, 256), dtype=float32)

AFTER BLOCK 2 (num filters 128, stride 2):
Tensor("block_group2:0", shape=(128, 64, 64, 512), dtype=float32)

AFTER BLOCK 3 (num filters 256, stride 2):
Tensor("block_group3:0", shape=(128, 32, 32, 1024), dtype=float32)

AFTER BLOCK 4 (num filters 512, stride 2):
Tensor("block_group4:0", shape=(128, 16, 16, 2048), dtype=float32)

AFTER AVERAGE POOLING:
<tf.Tensor 'final_avg_pool:0' shape=(128, 1, 1, 2048) dtype=float32>

FINAL DENSE:
<tf.Tensor 'final_dense:0' shape=(128, 1000) dtype=float32>
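A quick sanity check of the spatial sizes above (assuming the standard ResNet-50 stem of a stride-2 conv followed by a stride-2 max pool):

size = 512                    # input spatial size, from the INPUTS tensor
size //= 2                    # initial 7x7 conv, stride 2  -> 256
size //= 2                    # initial max pool, stride 2  -> 128
for stride in (1, 2, 2, 2):   # block groups 1-4
    size //= stride           # -> 128, 64, 32, 16
print(size)                   # 16, matching block_group4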

Within Block Fns
START 
<tf.Tensor 'initial_max_pool:0' shape=(128, 56, 56, 64) dtype=float32>

<tf.Tensor 'Relu_3:0' shape=(128, 56, 56, 256) dtype=float32>

<tf.Tensor 'Relu_6:0' shape=(128, 56, 56, 256) dtype=float32>
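These shapes match the standard bottleneck residual block (1x1 -> 3x3 -> 1x1 with a 4x channel expansion). A minimal sketch, ignoring batch norm and the exact shortcut-projection logic in the tutorial code:

import tensorflow as tf

def bottleneck_block(inputs, filters, strides):
    # Project the shortcut so channel counts match the expanded output.
    shortcut = tf.layers.conv2d(inputs, 4 * filters, 1, strides=strides, padding='same')
    x = tf.layers.conv2d(inputs, filters, 1, strides=1, padding='same', activation=tf.nn.relu)
    x = tf.layers.conv2d(x, filters, 3, strides=strides, padding='same', activation=tf.nn.relu)
    x = tf.layers.conv2d(x, 4 * filters, 1, strides=1, padding='same')
    return tf.nn.relu(x + shortcut)  # e.g. 56x56x64 in -> 56x56x256 out for filters=64, strides=1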
Google TPU ResNet Tutorial
Encoder
AFTER BLOCK 4 (num filters 512, stride 2):
Tensor("block_group4:0", shape=(128, 16, 16, 2048), dtype=float32)

Translates To:
<tf.Tensor 'conv2d_53/Conv2D:0' shape=(128, 16, 16, 8) dtype=float32>
Decoder
After First Expansion to 128 channels:
<tf.Tensor 'Relu_49:0' shape=(128, 16, 16, 128) dtype=float32>



Now Need To Go To
<tf.Tensor 'truediv:0' shape=(128, 256, 256, 3) dtype=float32>
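A minimal sketch of a decoder that bridges those two shapes, assuming four stride-2 transposed convolutions (the intermediate filter counts are guesses; only the start and end shapes come from the notes):

import tensorflow as tf

def decoder(features):  # features: [128, 16, 16, 128]
    x = features
    for filters in (64, 32, 16):  # spatial size 16 -> 32 -> 64 -> 128
        x = tf.layers.conv2d_transpose(x, filters, 3, strides=2, padding='same',
                                       activation=tf.nn.relu)
    # Final upsample to 256x256 and projection down to 3 channels.
    return tf.layers.conv2d_transpose(x, 3, 3, strides=2, padding='same')  # [128, 256, 256, 3]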
Optimization Performed
- TFRecords are loaded NOT from the instance but by the TPU itself (lazily), straight from the bucket, then deserialized and parsed. I used TFRecords since they are apparently the paradigm here for fast input performance.
- They do a lot of preprocessing to the images before writing them into the records.
- Spent a large chunk of time serializing and deserializing.
- Serializing an image into a numpy array to then be deserialized by TensorFlow on the TPU is a pain (see the sketch below).
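A minimal sketch of that round trip (TF 1.x names; the 512x512x3 uint8 image and the label field are assumptions, not the exact records used here):

import numpy as np
import tensorflow as tf

def write_example(image, path):
    # Serialize a numpy image into a tf.train.Example and write it to a TFRecord file.
    feature = {
        'image': tf.train.Feature(bytes_list=tf.train.BytesList(value=[image.tobytes()])),
        'label': tf.train.Feature(int64_list=tf.train.Int64List(value=[0])),
    }
    example = tf.train.Example(features=tf.train.Features(feature=feature))
    with tf.python_io.TFRecordWriter(path) as writer:
        writer.write(example.SerializeToString())

def parse_example(serialized):
    # Deserialize in the input pipeline; note the int8 caveat above, so decode as uint8.
    parsed = tf.parse_single_example(serialized, {
        'image': tf.FixedLenFeature([], tf.string),
        'label': tf.FixedLenFeature([], tf.int64),
    })
    image = tf.reshape(tf.decode_raw(parsed['image'], tf.uint8), [512, 512, 3])
    return tf.cast(image, tf.float32), parsed['label']

write_example(np.zeros([512, 512, 3], np.uint8), 'example.tfrecord')
dataset = tf.data.TFRecordDataset('example.tfrecord').map(parse_example)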
Memory Issues
Batch size of 8. Finished training up to step 1251. Elapsed seconds 302.
Batch size of 80. 1251 steps. Elapsed seconds 503.
tensorflow.python.framework.errors_impl.ResourceExhaustedError: Compilation failure: Ran out of memory in memory space hbm. Used 10.38G of 8.00G hbm. Exceeded hbm capacity by 2.38G.
New Note
Justin Johnson's setup: 
https://www.reddit.com/r/MachineLearning/comments/8g3akw/d_whats_your_setup_for_training_ml_models_in_the/
Other Cloud Provider
https://www.paperspace.com/
TODO: Google Cloud
- Pricing for comparable p3 instance
- connect with Sasha 
Loss construction
- why are we solving
MultiLoss Project
- a
Task Space Project
- d
- heuristic reductions
- Input: Fixing the first space
- Output: other domains
- Finding the task means finding output spaces 