Overview
As a pipeline TD, especially in 2D department, where is usually responsible for data IO, compositing, and even delivering, we often have to deal with the numerous files that exist on disk. Most of the files are created via DCC tools or open-sources like OpenImageIO, but we sometimes just need to copy them from one location to another.
For example, if we have a comp render sequence that already meets all client requirements(such as formatting, color, slate, and so on), we then copy the files to a directory designated for delivery packaging. Another example we can think of would be the Localization exchanging the same files between network storage.
Copying files
In this case, python libraries like shutil
offer some useful functions for copying:
import shutil
shutil.copy2(source, destination)
This is very useful for sure, but let’s say if there are 100 sequences that need to be copied, and say each sequence has a total file size of approximately 1GB, we’re going to need +100GB of storage space along with corresponding network bandwidth, and it will immediately affect network where artist work. Then people will complain about why we always experience network delays on the busiest days of delivering something. I’m not saying that only this causes network slowness. I’m saying this is certainly one of the things we can optimize.
Remember that this is just to put the same files in a different location.
Hard Links
Some of you may or may not heard about Hard Links which can be used in bash like so:
$ ln [source] [link destination]
In python, it would be:
import os
os.link(source, link destination)
In short, this makes a link to an existing file without copying the actual data. Meaning that you don’t need the network capacity to copy the entire file data or free space as much as the original file size. Furthermore, even if the original file is moved or deleted, you can still access the data until you delete all linked files.
Of course there are limitations. Since each hard link file uses the same ‘inode’ number as the original, it’s unable to cross different file systems. We also cannot create a hard link for a directory.
Soft Links
There’s another concept similar to hard links called Soft Links.
$ ln -s [source] [link destination]
In python:
import os
os.symlink(source, link destination)
Instead of describing soft links in detail, refer to the table below for simple comparison with hard links:
Comparison | Hard Link | Soft Link |
---|---|---|
Inode number | Take the same inode number | Take a different inode number |
Directory links | X | O |
Cross file systems | X | O |
If original file is moved | still available | not available |
If original file is deleted | still available | not available |
Speed | Relatively fast | Relatively slow |
Soft links may look better than hard links as they can be linked between different file systems or even directories.
Hard Links vs Soft Links
I can’t say which is right. Instead, here are some scenarios:
- You downloaded an element source and transferred the file into an element directory through a pipeline tool. Someone accidentally moves the original element file from the download directory to another in order to re-organize the folder structure by date.
- You have a render sequence from an artist and have transferred the sequence to a designated directory for a delivery. The artist unintentionally deletes the sequence from the render directory.
- You are developing a localization tool that needs to copy all file dependencies to the local drive.
In the first and second scenarios, soft links won’t be very helpful because the link breaks as soon as the original file is moved or deleted. But hard links can keep the linked files intact in the same situation.
As for the third, neither method will work - As mentioned above, hard links are not able to link to local drives, and localized files linked by soft links are useless because the file system still reads the original file data located on the network disk. In this case, you need to copy actual data using other commands like shutil
(python), robocopy
(Windows) or rsync
(Linux).