This guide complements the quickstart guide on the main Visual Mesh repo by being geared towards those with little experience of the Linux command line (i.e. bash) or of machine learning workflows. If you are already familiar with bash and machine learning, you may prefer to work from the Quickstart guide on the Visual Mesh repo, as it is briefer than this guide.
To get started, run this line to check that the required programs are installed and which versions you have.
python3 --version && pip3 --version && git --version && docker --version
This should produce a result similar to that listed below, possibly with higher version numbers. If any of these programs are missing, run through the relevant installation guides below.
Python 3.8.5
pip 20.0.2 from /usr/lib/python3/dist-packages/pip (python 3.8)
git version 2.25.1
Docker version 20.10.3, build 48d30b5
Installing Python3
The script to build the dataset is written in Python. On Ubuntu, the simplest way to install Python is with the APT package manager. Run
sudo apt install python3
to have APT install the most recent version of Python 3 for your system.
Installing pip
Pip is the default package manager for Python. On systems with both Python 2 and Python 3, the pip3 command specifically invokes the pip tied to Python 3. Because Ubuntu packages pip separately from Python, you may have to install it explicitly by running
sudo apt install python3-pip
Installing Git
While you don't strictly need git to download the Visual Mesh repo (e.g. you could download it from GitHub's web interface), git is the recommended way to get and manage the code from the repo. The git command line interface is installed on Ubuntu with APT by running
sudo apt install git
If you prefer a GUI for git, try GitKraken.
Installing Docker
Docker is the container system we use at NUbots for standardised software environments. To install it, check out the documentation, as the version packaged by Ubuntu often lags behind the official version. Be sure to ask on Slack if you have any questions.
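Once Docker is installed, a quick way to confirm it is working is to run the standard hello-world image. This assumes your user is allowed to run docker; if you get a permission error, you may need to prefix the command with sudo or add your user to the docker group.
docker run hello-world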
Clone the Visual Mesh Repo
Use git to clone the Visual Mesh repo. When using git on the command line, git will default to cloning the repo into a folder under your current working directory. Start by navigating to the folder where you want to store the repo. For example, if you want to store the Visual Mesh under ~/code/, run:
cd ~
mkdir code
cd code
Once you're in that folder (e.g. ~/code/), clone the repository with the following command:
git clone https://github.com/Fastcode/VisualMesh.git
Build the docker image
We specify build steps for docker using a file named Dockerfile. This uses similar syntax to a bash script, but is a format specific to docker. Open the Dockerfile in a text editor/reader and take note of the last line that starts with WORKDIR (at the time of writing, the last line of the file). This line tells us that the working directory of the container will be the /workspace directory within the container (not the host). Take note of this, as it will be the directory from which the program will run.
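If you'd rather check this from the terminal than open an editor, grep can find the line for you (run this from inside the cloned repo, where the Dockerfile lives):
grep -n WORKDIR Dockerfile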
Having done a quick check of the Dockerfile, navigate into the repo (e.g. cd ~/code/VisualMesh) and tell docker to build the image according to the Dockerfile with the following command:
docker build . --pull -t visualmesh:latest
What does this command do?
docker is the call to docker.
build is the main command being sent to docker.
. tells docker that the build context is the current directory. Docker will look for a file named Dockerfile in the build context (so in this case, the current directory) telling it what to build.
--pull tells docker to always attempt to pull a newer version of the base image.
-t visualmesh:latest tells docker that we want to tag the built image as visualmesh:latest. We'll see later that when we tell docker to run a container, we tell docker which image to use by referencing this tag.
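Once the build finishes, you can confirm that the image exists and has the expected tag by listing images whose repository name is visualmesh:
docker image ls visualmesh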
Make a dummy dataset
For the sake of this guide, we'll use a dummy dataset containing the same image repeated over and over again. While this will lead to extreme overfitting, it's an easy way to quickly check that the code is running correctly. Download dummy_data.zip, which contains the example image, mask, and lens file, to your Downloads folder.
Now, we're going to make a dummy dataset by repeating these files. We can do this using a bash loop. Assuming that we have the dummy data in ~/Downloads/dummy_data
, and we're going to put the repeated data in ~/Downloads/repeated_data
, run the four commands below:
cd ~/Downloads
unzip dummy_data.zip
mkdir repeated_data
for i in {1..1000}; do cp ./dummy_data/image.jpg ./repeated_data/image${i}.jpg && cp ./dummy_data/mask.png ./repeated_data/mask${i}.png && cp ./dummy_data/lens.yaml ./repeated_data/lens${i}.yaml; done
To ensure that the data was copied successfully, run the command below, which will print all the filenames to the terminal window. If you would like to clean up your display afterwards, you can use the clear command.
ls ~/Downloads/repeated_data
clear
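Since the loop copies three files on each of its 1000 iterations, you should end up with 3000 files in total. A quick way to count them:
ls ~/Downloads/repeated_data | wc -l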
Install the prerequisites to make a tfrecord
We want to convert our dataset from image and text files to the tfrecord format that the Visual Mesh expects, but to do this we need to install some Python libraries that allow Python to work with tfrecords. Feel free to use something like conda or virtualenv if you're concerned about ending up with conflicting library versions.
pip3 install tensorflow tqdm
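If you do want that isolation, a minimal sketch using Python's built-in venv module might look like this. The environment path ~/visualmesh-env is an arbitrary choice, and on Ubuntu you may first need to install the python3-venv package.
python3 -m venv ~/visualmesh-env
source ~/visualmesh-env/bin/activate
pip3 install tensorflow tqdm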
Make a set of tfrecords
Having made the repeated dataset and installed the libraries required to work with tfrecords, we can now convert it into the tfrecord format the Visual Mesh wants using the make_dataset.py script. By default, the Visual Mesh will look for the dataset inside a folder named dataset, so we'll navigate back to the Visual Mesh repo, make the dataset folder, then make the required .tfrecord files using the make_dataset.py script:
cd ~/code/VisualMesh
mkdir dataset
./training/make_dataset.py ~/Downloads/repeated_data ./dataset
The make_dataset.py
script won't complain if you give it an empty folder, so make sure that the created tfrecord files are not empty.
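A quick way to check is to list the contents of the dataset folder with file sizes and confirm that none of the .tfrecord files are 0 bytes (run from the repo root):
ls -lh dataset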
Configure the training
The Visual Mesh looks for a config.yaml file in the output folder to set the training parameters. An example file, example_net.yaml, can be found in the root of the repo. We'll create an output folder and copy the example parameters into it:
cd ~/code/VisualMesh
mkdir output
cp example_net.yaml ./output/config.yaml
With that, we're almost ready to train the network. To save time, change the configuration to run fewer training epochs (500 epochs can easily take 30 hours to complete on CPU). Open the config file in a text editor (e.g. nano output/config.yaml) and change the line that says epochs: 500 to say epochs: 5.
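If you'd rather make this change from the command line than open an editor, a sed one-liner will do it, assuming the file contains a line reading exactly epochs: 500 (run from the repo root):
sed -i 's/epochs: 500/epochs: 5/' output/config.yaml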
Also note that the paths for the training, testing and validation tfrecords are set in the config file. If you accidentally start training with 500 epochs, hit ctrl+c to interrupt training; if that doesn't work, open another bash terminal, run xkill, then click on the training terminal window.
Train the network
The command to train the network is shown below. It's a long command, but it's straightforward if we break it down into its individual parts:
cd ~/code/VisualMesh
docker run --gpus all -u $(id -u):$(id -g) -it --rm --volume $(pwd):/workspace visualmesh:latest ./mesh.py train output
What does this command do?
docker is the call to docker.
run is the main command being sent to docker.
--gpus all tells docker to use the GPUs from your system (remove --gpus all if you're running on a system without a GPU).
-u $(id -u):$(id -g) tells docker to run as your user and group IDs, so that any files the container writes to the mounted volume are owned by you rather than root.
-it tells docker to run in interactive mode with a terminal.
--rm tells docker to automatically clean up the temporary files associated with this container.
--volume $(pwd):/workspace tells docker to mount the current working directory as /workspace inside the container.
visualmesh:latest tells docker which image we want to run (note that we built this image and gave it this name in an earlier step).
./mesh.py train output is the command to run inside the container, so we're passing the command train with the path output to the mesh.py script. Note that the filesystem inside the container is different to the filesystem outside of the container, but due to the --volume flag the current working directory is mounted at /workspace, so in this example the output folder ends up being the one in the code directory.
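Before starting a long training run, you may want to confirm that the container can actually see your GPU. One way to check, assuming the NVIDIA container toolkit is set up on the host so that nvidia-smi is available inside the container, is:
docker run --gpus all --rm visualmesh:latest nvidia-smi
If this prints a table listing your GPU, the --gpus all flag is working as expected.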
Test the network
Testing the network uses a very similar command to training; we just replace "train" with "test":
cd ~/code/VisualMesh
docker run --gpus all -u $(id -u):$(id -g) -it --rm --volume $(pwd):/workspace visualmesh:latest ./mesh.py test output
Once the test has finished running, you can inspect the output in output/test/metrics
.
Using Tensorboard to inspect the process
TensorBoard is a tool that provides a browser-based dashboard for visualising the training process. To use it, we just pass our output directory to the tensorboard command. (Note: you may have to install TensorBoard separately if your TensorFlow install didn't bundle it.)
tensorboard --logdir output
You should see something like the following:
Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
TensorBoard 2.4.1 at http://localhost:6006/ (Press CTRL+C to quit)
Go to the URL shown in your terminal to see the TensorBoard dashboard.
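If the tensorboard command isn't found on your system, it can be installed with pip (the standalone package on PyPI is named tensorboard):
pip3 install tensorboard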