Although AWS has launched various GPU-powered instances and AMIs with deep learning libraries pre-installed, its lack of support for remote GUI access still blocks some of the debugging jobs that are better done with a GUI. For instance, Ubuntu instances still don’t have cross-platform RDP support: you can’t just install ubuntu-desktop on your instance and expect AWS to hand you a remote desktop tool that lets you log on and hop into your instance hassle-free. I’ve been trying to connect from my Mac to an Ubuntu box on EC2 and, believe me, the setup is a pain in the ass.
In that respect, accessing TensorBoard on your EC2 instance is just messy crap. If your instance has to follow corporate compliance rules and is locked down inside a VPC, you can’t simply open up a port on that GPU instance and connect to TensorBoard from the internet. TensorBoard so far has no HTTPS support (and I don’t believe they intend to add it), so exposing it might also breach your corporate security requirements.
What about syncing the event files back from the GPU box to your local development environment? Ideally that works, but my recent experience with that approach was just ... HORRIBLE. I was rsync-ing the tf.SummaryWriter output from my Ubuntu + Tesla K2 box to my local Mac, and it turned out the model files (.mkt) were not recognized properly by the TensorBoard installed on my Mac. It lost all the data for the “Embedding”s I had logged during training, which made me believe I had written either a bad model or a wrong Projector config, which cost me an entire afternoon of pulling my hair out and losing my mind. So, lesson learned: as far as I can tell, anything produced or run by TensorFlow is system-specific. It possibly overlaps with the GPU setup too, so I’d say it probably also depends on the GPU driver version if you’re training on a GPU.
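If you do go the rsync route anyway, one cheap sanity check is to confirm that everything in the log directory actually made it across before blaming your model. The snippet below is a minimal Python sketch under my own assumptions, not a tested recipe: `build_manifest` and `find_sync_gaps` are hypothetical helper names, and the idea is that you build the manifest on the training box, copy it over (as JSON, say) next to the logs, and compare on the local side. It only catches missing or truncated files; it says nothing about whether the local TensorBoard can actually read them.

```python
import os


def build_manifest(log_dir):
    """Map each file under log_dir (relative path, '/' separated) to its size in bytes.

    Meant to run on the training box, right before syncing.
    """
    manifest = {}
    for root, _dirs, files in os.walk(log_dir):
        for name in files:
            full_path = os.path.join(root, name)
            rel_path = os.path.relpath(full_path, log_dir).replace(os.sep, "/")
            manifest[rel_path] = os.path.getsize(full_path)
    return manifest


def find_sync_gaps(manifest, local_log_dir):
    """Return the sorted relative paths that are missing locally or differ in size.

    Meant to run on the local machine, after syncing.
    """
    gaps = []
    for rel_path, expected_size in manifest.items():
        local_path = os.path.join(local_log_dir, *rel_path.split("/"))
        if not os.path.isfile(local_path):
            gaps.append(rel_path)  # file never arrived
        elif os.path.getsize(local_path) != expected_size:
            gaps.append(rel_path)  # file arrived truncated or changed
    return sorted(gaps)
```

An empty list from `find_sync_gaps` means the copy is at least complete, so whatever TensorBoard is choking on is not a half-finished transfer.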
So what do we do? We need to make sure TensorBoard is running on the training box and access it from somewhere close to that box. My eventual solution is kind of nasty. I ended up kicking off a new Windows instance in the same subnet and logging on to the Windows box with MS RDP, which gives me a lame but usable GUI, and then connecting to the TensorBoard HTTP server from inside that subnet. This alleviates the pain a bit, and I actually found Chrome works well on a Win2016 instance. So at least through this nasty approach I’m finally able to debug my Ubuntu TensorFlow training and, as a plebeian who really needs to visualize neural nets because I don’t have the knack those scientists have for running everything in their heads, get a satisfying training experience.
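To make the moving parts concrete, here is a rough Python sketch of the two halves of that setup: launching TensorBoard on the training box so it listens on the subnet, and checking from the other box that the port is reachable before fighting with the browser. The helper names `tensorboard_command` and `is_reachable` are hypothetical, as are the log directory and private IP in the comments. 6006 is TensorBoard's default port, and `--logdir`, `--host` and `--port` are flags in the TensorBoard versions I know of, but check `tensorboard --help` on your install. I'm also assuming the security group allows that port only from inside the subnet.

```python
import socket

TENSORBOARD_PORT = 6006  # TensorBoard's default port


def tensorboard_command(logdir, host="0.0.0.0", port=TENSORBOARD_PORT):
    """Build the argv for starting TensorBoard on the training box.

    host="0.0.0.0" listens on all interfaces, including the private one,
    so keep the security group restricted to the subnet.
    """
    return ["tensorboard", "--logdir", logdir, "--host", host, "--port", str(port)]


def is_reachable(host, port=TENSORBOARD_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unroutable hosts.
        return False


# On the training box (hypothetical log directory):
#   import subprocess
#   subprocess.Popen(tensorboard_command("/home/ubuntu/tf_logs"))
#
# On the Windows box in the same subnet (hypothetical private IP):
#   print(is_reachable("10.0.1.23"))
```

If `is_reachable` comes back `False`, the problem is the network path (security group, subnet routing, or TensorBoard bound to localhost only) rather than Chrome or RDP.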