Budapest 1896 📽 Automatic restoration of old video footage using machine learning

Recently, several videos have been popping up on the internet showing really old footage restored to modern standards. From a contextual perspective, it is a fascinating process – using the knowledge of the past 100 years to create a better image of the even more distant past – but it is also an interesting machine learning application. In a nutshell, the process is the following: you show an algorithm a lot of images of how our world currently looks and a few images of how our world used to look in the past – and then it tries to recreate the feel of the modern images on the old ones (and a video is just a bunch of images chained one after another).

You can also read this post in Hungarian.

Perhaps the most prominent videos in this category are the ones made by Denis Shiryaev: Arrival of a Train at La Ciotat from 1896 and A Trip Through New York City from 1911. Denis only shares the methodology conceptually, but of course anyone in the know suspects that there are several steps involved. So I looked for some old footage of Budapest, and I had a nice weekend project lined up… But before I delve in, allow me to tip my 👒 to Denis – this was a reaaally long and a reaaally difficult process!

📽 Source videos

I quickly found that the hardest part of my project was getting good source videos. Denis’ restorations look amazing partly because he uses source videos that are already of good quality, sometimes restored by hand following a classical process. It was much more challenging to find old videos (I wanted them to be as old as possible) of Budapest and the Hungarian cultural heritage, but eventually I settled on a few very short, very bad quality clips featuring Budapest and Kolozsvár (Cluj-Napoca). Here I must point out that cinematography developed hand-in-hand with the progression of the 20th century, so footage from the late 19th and early 20th centuries is of much worse quality than, say, footage from the 1940s onwards.

After benchmarking the project’s difficulty – I settled on the earliest source videos I could find – I created a 7-step process to produce the restoration effect. The whole process, and all the files needed, can be found in this folder (there is also a mirror in this repo, but the video files are not replicated there). I tried to conduct the entire process in Google Colab notebooks as much as possible. In theory, you can reproduce the steps and apply them to your old family videos 😎 especially if your great-grandpa was an enthusiast and you have an attic full of dusty old video reels.

  1. Budapest in 1896, perhaps the first video footage ever taken in Hungary, shot by the Lumière brothers, the inventors of cinematography, of the Budapest Millennial Parade, just one year after the 1895 birth of the technology
  2. Budapest in 1916, the first ever Hungarian documentary film
  3. Budapest in 1925, (likely) footage from the Hungarian News Agency
  4. Budapest in 1927, excerpt from the Dutch movie Land en volk van Hongarije
  5. Budapest in the 1930s, by British Pathé
  6. Possibly the earliest video of Kolozsvár (today Cluj-Napoca, Romania), from 1930, private footage
  7. Excerpts of footage about Transylvania, from the Hungarian News Agency’s Világhíradó (News of the World) programme, from 1941-1943
  8. A documentary about railway development in the Székelyland region of Transylvania from the 1940s

🎉 The result

🎠 The process

The whole process is called deoldifying and it has several steps. Neither the steps nor their order are anyone’s official recommendation – they are just the result of my experimentation – so feel free to try alternative pathways and suggest improvements in the comments below. But remember, all of this is an automatic process, enabled by artificial intelligence and machine learning. I personally have no idea about restoring videos, I just like to play around with code 🤓. This means that rather than looking at the physics of the videos, the algorithms look at a lot of examples of how a video should look, and then they try to make small modifications to the source video so it looks like that.

0. Prerequisites

You can copy my entire folder, but you may want to start fresh by making your own work folder and creating a subfolder in it called raw. It’s best if you use the mp4 video format.

Second, you might want to run the entire process in the cloud, perhaps on Google Colab or elsewhere, because for most videos of a reasonable resolution (>640×480) you will need more than 16GB of GPU memory – more than the standard offering on today’s high-end graphics cards. Oh yes, you will definitely need a GPU for this. If you have multiple GPUs and can leverage their combined memory, then you’re set with about >40GB of GPU RAM for processing HD (720p) videos in color. Otherwise, the Jupyter notebooks I’ve included contain various precautionary measures to fit into the 16GB memory of the Google Colab GPUs (Nvidia Tesla P100 if you get lucky) – but often this means reducing the video size 😐. The first cell of each workbook checks the GPU you have been allocated by Google. You may also want to consider using NVidia‘s compute cloud.
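If you are curious, that check is just a couple of lines; a minimal version (not necessarily the exact cell in my notebooks) looks like this:

```python
# Check which GPU Colab has allocated to you and how much memory it has
import torch

print(torch.cuda.get_device_name(0))  # e.g. Tesla P100-PCIE-16GB
print(round(torch.cuda.get_device_properties(0).total_memory / 1e9), 'GB total GPU memory')
```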

Every workbook makes use of Google Drive, so the second cell of each workbook mounts your Google Drive onto Colab – following Google’s official process. It asks for an authorization key, and you will need to repeat the process for every workbook.
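That mounting cell is the standard two-liner from Google’s documentation:

```python
# Mount Google Drive into the Colab filesystem (prompts for an authorization code)
from google.colab import drive
drive.mount('/content/drive')

# After this, your work folder is reachable under /content/drive/My Drive/<your work folder>
```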

1. Stabilization

The first step you will want to take is to stabilize the videos. Old videos are often shaky, and that’s not a clean input for our algorithms. For this I used Adam Spannbauer’s vidstab. To follow the steps, run the notebook called 1_stabilize. After the process, you should end up with a folder named 1, filled with the stabilized versions of your videos from raw. Vidstab doesn’t do a terrific job when transitioning between scenes, but it does the trick for our use case.
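The core of that notebook boils down to a single vidstab call; a minimal sketch for one clip (the file names here are placeholders, not necessarily the ones in my folder) looks like this:

```python
# Stabilize one shaky clip with Adam Spannbauer's vidstab (pip install vidstab)
from vidstab import VidStab

stabilizer = VidStab()                  # default keypoint-tracking stabilizer
stabilizer.stabilize(
    input_path='raw/bp1.mp4',           # placeholder: a shaky source clip
    output_path='1/bp1.avi',            # stabilized output (re-encode to mp4 afterwards if needed)
    smoothing_window=30,                # number of frames used to smooth the camera trajectory
    border_type='black'                 # fill the edges exposed by the compensation with black
)
```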

2. Upscaling

The second step is upscaling the video. Remember the fancy zooming algorithm in your typical 90s action movie? That’s the one. Except it was not really possible until very recently, because when you zoom in, you need to fill the newly created pixels somehow. In the past, the standard technique was to just interpolate pixel values – but this created a very blurry image. With machine learning, you can instead try to recognize the features present in the image and fill the gaps that way.
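To make the contrast concrete, here is roughly what the classical, non-ML baseline looks like for a single frame – plain bicubic interpolation with OpenCV; the upscalers below replace exactly this blurry filling-in with learned features (the file name is just an example):

```python
# Classical baseline: upscale one frame 2x with bicubic interpolation (pip install opencv-python)
import cv2

frame = cv2.imread('example_frame.png')                  # any frame exported from the video
upscaled = cv2.resize(frame, None, fx=2, fy=2,
                      interpolation=cv2.INTER_CUBIC)     # interpolate the new pixel values
cv2.imwrite('example_frame_2x.png', upscaled)            # noticeably blurrier than an ML upscale
```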

For this step, you have two options:

  • Either you can try to run 2_upscale from the folder – this runs AlphaAtlas’s VapourSynth-based upscaling scripts. However, they are quite challenging to get running – so be prepared for a lot of sweat.
  • The other option is to run Topaz Video Enhance AI from Topaz Labs. Their trial version should cover your needs for this project.

When you’re finished, upload your upscaled videos into folder 2 in your drive. If you used Topaz (like me), you can keep the automatic filenames.

3. Deoldifying

The next step is to try to artificially color the videos and remove smears. For this we use Jason Antic’s DeOldify algorithm – which gave its name to the entire process. In order to run this algorithm, we have to upload our videos to YouTube and then save the links. In the 3_deoldify notebook, you will see a textbox where you can paste your YouTube link. For restoring old videos, in my experience a render_factor of 10-15 does the best job. Once finished, the notebook uploads your deoldified videos to folder 3.
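Under the hood, the 3_deoldify notebook follows DeOldify’s own video Colab notebook; stripped to its essence it is roughly the following (the YouTube link and output name are placeholders, and the API may have shifted since I ran it):

```python
# Colorize a video from a YouTube link with DeOldify (run inside the cloned DeOldify repo)
from deoldify import device
from deoldify.device_id import DeviceId
device.set(device=DeviceId.GPU0)                        # DeOldify needs a GPU

from deoldify.visualize import get_video_colorizer

colorizer = get_video_colorizer()
result_path = colorizer.colorize_from_url(
    'https://www.youtube.com/watch?v=XXXXXXXXXXX',      # placeholder: link to your uploaded clip
    'bp1_deoldified.mp4',                               # output file name
    render_factor=12                                    # 10-15 worked best for me on old footage
)
print(result_path)
```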

4. Preparation

The next step is remastering – the process you would follow if you were restoring an old video reel by hand. Here, of course, we are doing the remastering with machine learning. What the remastering does is remove the dust particles from the footage that would cause lines and smears on the video.

For this we will be using Satoshi Iizuka’s DeepRemaster algorithm. However, since this is a transfer learning algorithm, you will need to provide it with some good examples of how the remastered video should look. These need to be images, collected in a folder, of good, representative scenes of the video. Ideally, they should be colored, too. We could do this manually, but we can also automate it:

  • The first step in the process is scene detection. For this we will be using Brandon Castellano’s PySceneDetect. This algorithm basically looks for changes between frames and cuts the video up into scenes. It then takes a few snapshots from each scene. Let’s do the scene detection for all videos both before and after the deoldifying takes place – namely, for all files in folders 2 and 3. We will save the scenes in the same folder, under a subfolder named scenes. After the scene detection is finished, we go and look at the contents of our scenes folder, select the good-looking scenes – those that have a clear image and are not tilted – and copy these to the goodscenes subfolder in each of 2 and 3. For doing this, we will be using the workbook 4a_scene_detection (a minimal sketch of the detection step follows after this list).
  • After that, as the input for remastering needs color images, we need to color the scenes. Well, we already have a collection of automatically colored scenes in folder 3 – these are scenes cut from videos that have gone through the deoldifying process. But let us create a second set of colored scenes, using a different algorithm. For this we will be using Richard Zhang’s ideepcolor (or you can also use this alternative). As this is also a transfer learning algorithm, we need to provide example photos for the color composition (color histogram) of our targets. For each of your clips, choose a new-ish photo from the internet that is most representative of the scenes encountered in the video, upload it to the 2/goodscenes folder and name it the same as the video clip, with a .png extension. So if your original clip was called bp1.mp4 (along with a few upscaled versions of it), name your color reference image bp1.png. Then run the workbook called 4b_ideepcolor. This will create folder 4 and two subfolders therein:
    • colorized_auto – here we put the pictures colorized using the automatic colorization parameters
    • colorized_ref – here we put the pictures colorized using the reference images provided, via color histogram transfer learning
  • Finally, we need to combine the colored scenes into one common folder, per video, with the 4c_prepare notebook.
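As promised above, here is a minimal sketch of the scene detection step with PySceneDetect (using the 0.6-style Python API; my 4a_scene_detection workbook may use an older interface, and the paths are placeholders):

```python
# Cut one clip into scenes and export a few snapshots per scene (pip install scenedetect[opencv])
from scenedetect import open_video, SceneManager, ContentDetector
from scenedetect.scene_manager import save_images

video = open_video('2/bp1.mp4')                  # placeholder: an upscaled clip from folder 2
scene_manager = SceneManager()
scene_manager.add_detector(ContentDetector())    # finds cuts from frame-to-frame content changes
scene_manager.detect_scenes(video)

scene_list = scene_manager.get_scene_list()
save_images(scene_list, video,
            num_images=3,                        # three snapshots per detected scene
            output_dir='2/scenes')               # the scenes subfolder described above
```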

5. Remastering

Now everything is ready for the remastering process. Remember that for this we are using Satoshi Iizuka’s DeepRemaster algorithm. As this is a transfer learning algorithm, you need to provide a folder containing good, colored scenes – which is what we have just created in the process above. Run 5_remaster to finish the step. At the end of the process, you will end up with folders 5, 5b, 5c – as we run the remastering on several versions of the video, with and without added colorization. You might also need to scale down the resolution of the output videos here, as this is the most memory-hungry algorithm in the whole process. In my experience, the deoldified videos are the better source (so folder 3), perhaps with a render_factor of 12.

6. Deblurring

The next step is to deblur our videos. This is very similar to the upscaling process: it uses deep neural networks to identify features in the individual frames and tries to sharpen them, using a pre-trained model of how these objects typically look. The best process for this currently is Minyuan Ye’s SIUN algorithm, which you can find in 6_deblur. The output of this step is symmetric to the input structure – so any videos you have in folders 5, 5b, 5c will be deblurred and stored in 6, 6b, 6c.

7. Interpolating

Finally, because older videos usually have much lower frame rates than those of today, we will perform an interpolation to artificially increase the frame rate of our videos. Currently, the best algorithm for this by far is Wenbo Bao’s DAIN, which creates a 3D model of each scene and extracts depth information for a super smooth output. However, I found it quite difficult to get it to run (it works flawlessly against benchmarks, I just couldn’t get it to run on my own videos – UPDATE: apparently, there is a Windows release), so I had to resort to an alternative: Shurui Gui and Chaoyue Wang’s FeatureFlow. You can run this in 7_interpolate. Just like in the previous step, the output is symmetric to the input structure – so any videos you have in folders 6, 6b, 6c will be interpolated and stored in 7, 7b, 7c. The algorithm needs to create scaled-down versions of the input videos (in order to fit into the video memory); these will be the files under the default file names in folders 7, 7b, 7c, while the processed outputs will have a _Sedraw ending, respectively. At the end of the process, the processed outputs are moved into 8, 8b, 8c. If no resizing takes place, then folders 7, 7b, 7c remain empty (likely you will only find files in 7b, as the rest of the folders were already shrunk at the remastering stage).
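If you just want to see what frame interpolation does without the deep learning machinery, ffmpeg’s classical motion-compensated minterpolate filter makes a quick baseline for comparison – it is not what the 7_interpolate notebook uses, and it is much cruder than DAIN or FeatureFlow (the paths are placeholders):

```python
# Classical baseline: motion-compensated interpolation to 60 fps with ffmpeg (not DAIN/FeatureFlow)
import subprocess

subprocess.run([
    'ffmpeg', '-i', '6/bp1.mp4',                   # placeholder: a deblurred clip from folder 6
    '-vf', 'minterpolate=fps=60:mi_mode=mci',      # motion-compensated interpolation to 60 fps
    'bp1_60fps_baseline.mp4'                       # placeholder output name
], check=True)
```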

🏁 Finish

Finally, before showcasing your work to others, you can use any video editing tool to illustrate the difference – I used OpenShot. If you want, you can try running Topaz to upscale one more time, but it might make your final video look overprocessed.

The final folder structure looks like this:

  • raw – source video files
  • 1 – stabilized
  • 2 – upscaled
  • 3 – colored with deoldify
  • 4 – split by scenes for both 2 and 3
  • From here onwards, the number of files multiplies:
    • file1 – stabilized, upscaled
    • file1_12 – stabilized, colored with deoldify, render_factor=12
    • file1_21 – stabilized, colored with deoldify, render_factor=21
  • 5 – remastered, size reduced (to fit into memory), recolored with ideepcolor
  • 5b – remastered, original size
  • 5c – remastered, size reduced
  • From here onwards, the 5 → 6, 5b → 6b, ... conventions are preserved
  • 6, 6b, 6c – deblurred
  • 7, 7b, 7c – size reduced further, if necessary, for interpolation
  • 7req – required model files for interpolation
  • 8, 8b, 8c – interpolated, finished

And that’s it! You’ve just learned how to restore and deoldify old videos using artificial intelligence by applying 7 machine learning algorithms in sequence. One day, if I ever have a lot of time, maybe I’ll combine this into a point-and-click tool – but this field is changing so quickly that most likely I’d have to swap most of it out for newer, better components.

