The following is a glossary of terms defined by the OmniBOR project. For the current precise definitions, refer to the specification.
An artifact is any software object of interest.
Examples:
.o object file
.so shared object file
.class Java class file
.jar file
.pyc compiled Python file
What all artifacts have in common is that they are all arrays of bytes.
Two artifacts are equivalent if and only if their byte representations are exactly equal.
Most artifacts are produced by a build tool consuming some set of input artifacts to produce an artifact as an output. Such artifacts are said to be 'derived artifacts'.
Artifacts which are not 'derived artifacts' are said to be 'leaf artifacts'. Leaf artifacts are usually source code files constructed by hand by humans.
Examples:
foo.o is derived from foo.c and bar.h using gcc
foo executable is derived from foo.o and baz.o using ld
foo.class is derived from foo.java using javac
It should be possible to identify each artifact with an Artifact ID.
Artifact IDs should have the following characteristics:
Canonical : Independent parties, presented with equivalent artifacts, derive the same Artifact ID.
Unique : Non-equivalent artifacts have distinct Artifact IDs.
Immutable : An artifact cannot be modified without also changing its Artifact ID.
OmniBOR uses the GitOID of an artifact as its Artifact ID.
Source code leaf artifacts are typically already being stored in Git where they are identified via their GitOID.
The Artifact Dependency Graph (ADG) of an artifact is the DAG (Directed Acyclic Graph) of all the 'leaf artifacts' that are transformed by a build tool into that artifact. This includes the direct input artifacts and, transitively, the inputs of each input artifact, all the way down to source code.
Simple C Executable
Running C Executable with Shared Object
flowchart BT
c1[.c] --> o1[.o]
h1.1[.h] --> o1[.o]
h1.2[.h] --> o1[.o]
c2[.c] --> o2[.o]
h2.1[.h] --> o2[.o]
h2.2[.h] --> o2[.o]
o1 --> executable
o2 --> executable
c3[.c] --> o3[.o]
h3.1[.h] --> o3[.o]
h3.2[.h] --> o3[.o]
c4[.c] --> o4[.o]
h4.1[.h] --> o4[.o]
h4.2[.h] --> o4[.o]
o3 --> .so
o4 --> .so
executable --> running[running executable]
.so --> running[running executable]
Java Example
flowchart BT
java1[.java] --> cls1[.class]
java2[.java] --> cls2[.class]
java3[.java] --> cls3[.class]
java4[.java] --> cls4[.class]
java5[.java] --> cls5[.class]
cls1 --> running[running executable]
cls2 --> running[running executable]
cls3 --> running[running executable]
cls4 --> running[running executable]
cls5 --> running[running executable]
Go Example
flowchart BT
go1[.go] --> o1[.o]
go2[.go] --> o2[.o]
go3[.go] --> o3[.o]
go4[.go] --> o4[.o]
go5[.go] --> o5[.o]
o1 --> executable
o2 --> executable
o3 --> executable
o4 --> executable
o5 --> executable
Python Example
flowchart BT
py1[.py] --> pyc1[.pyc]
py2[.py] --> pyc2[.pyc]
py3[.py] --> pyc3[.pyc]
py4[.py] --> pyc4[.pyc]
py5[.py] --> pyc5[.pyc]
pyc1 --> running[running executable]
pyc2 --> running[running executable]
pyc3 --> running[running executable]
pyc4 --> running[running executable]
pyc5 --> running[running executable]
A build tool is something which reads one or more input artifacts and writes one or more output artifacts.
flowchart LR
input1 --> buildtool[build tool] --> output
input2 --> buildtool[build tool]
input3 --> buildtool[build tool]
Examples:
A compiler reads a .c file and zero or more .h files to produce a .o file
flowchart LR
.c --> compiler[[compiler]]
*.h --> compiler[[compiler]]
compiler --> .o
A linker reads .o files to produce an executable file
flowchart LR
*.o --> linker[[linker]]
linker --> executable
A linker reads .o files to produce a shared object
flowchart LR
*.o --> linker[[linker]]
linker --> .so
A dynamic linker reads an executable and .so files to produce a running executable
flowchart LR
executable --> linker[[dynamic linker]]
*.so --> linker[[dynamic linker]]
linker --> running[running executable]
A compiler reads a .java file to produce a .class file
flowchart LR
.java --> compiler[[compiler]]
compiler --> classfile[.class]
A runtime reads .class files to produce a running process
flowchart LR
classfile[*.class] --> runtime[[runtime]]
runtime --> running[running executable]
A compiler reads a .py file to produce a .pyc file
flowchart LR
.py --> compiler[[compiler]]
compiler --> .pyc
The totality of ancestors for a given artifact may be represented as an Artifact Dependency Graph (ADG).
Typically, source code files are hand written by humans, and as such are leaf artifacts in the Artifact Dependency Graph (ADG).
Source code files can also be generated from other inputs by a code generator.
flowchart LR
input[input] --> codegenerator[[code generator]] --> generatedsrc[generated source code file]
In this scenario, the generated source code file is a derived artifact. This is because the code generator is a build tool and, by definition, the output from the build tool is a derived artifact.
Code generation is very common in many languages. See go generate, Java Xtend, and qtcpp for examples.
Git is an object store masquerading as a source code management system (SCM).
Git's storage model stores source code and metadata using a Merkle tree.
Git Objects are represented as follows:
${type}⎵${size}\0${content}
${type} - Git Object Type as a string
blob - any bytes
tree - represents a filesystem tree
commit - represents a Git commit
tag - represents a Git tag
${size}: size in bytes of ${content} represented as a string base 10.
${content}: the byte content of the object
A Git blob (binary large object) is the type used for file contents in git:
${content} - bytes of the file contents
Git blobs are identified by the SHA-1 hash of the blob object. The GitOID construction first hashes a header string containing the object type, an ASCII space character, the length of the content in bytes, and an ASCII null terminator character, and then hashes the content itself.
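The construction described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `gitoid` and its parameters are chosen here for the example and are not defined by OmniBOR or Git.

```python
import hashlib


def gitoid(content: bytes, object_type: str = "blob", algorithm: str = "sha1") -> str:
    """Return the hex digest of a Git object built from `content`.

    `gitoid` is a hypothetical helper name used only for this sketch.
    """
    # Header: object type, ASCII space, content length in base 10, ASCII NUL.
    header = f"{object_type} {len(content)}".encode("ascii") + b"\0"
    digest = hashlib.new(algorithm)
    digest.update(header)   # the header is hashed in first
    digest.update(content)  # then the raw bytes of the artifact
    return digest.hexdigest()


if __name__ == "__main__":
    # The empty blob and a one-line file, hashed the way Git hashes blobs.
    print(gitoid(b""))
    print(gitoid(b"hello world\n"))
```

Because the identifier depends only on the bytes of the artifact, two parties hashing equivalent artifacts obtain the same value, and any change to the bytes changes the value.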
An artifact dependency graph can be represented as a graph with nodes identified by an Artifact ID. In the examples below, we only show tree structures for simplicity.
flowchart BT
Artifact-2[Artifact-2 ID] --> Artifact-1[Artifact-1 ID]
Artifact-3[Artifact-3 ID] --> Artifact-1[Artifact-1 ID]
Artifact-4[Artifact-4 ID] --> Artifact-2[Artifact-2 ID]
Artifact-5[Artifact-5 ID] --> Artifact-2[Artifact-2 ID]
Artifact-6[Artifact-6 ID] --> Artifact-3[Artifact-3 ID]
Artifact-7[Artifact-7 ID] --> Artifact-3[Artifact-3 ID]
OmniBOR uses the GitOID of an artifact as its Artifact ID.
flowchart BT
Artifact-2[Artifact-2 gitoid] --> Artifact-1[Artifact-1 gitoid]
Artifact-3[Artifact-3 gitoid] --> Artifact-1[Artifact-1 gitoid]
Artifact-4[Artifact-4 gitoid] --> Artifact-2[Artifact-2 gitoid]
Artifact-5[Artifact-5 gitoid] --> Artifact-2[Artifact-2 gitoid]
Artifact-6[Artifact-6 gitoid] --> Artifact-3[Artifact-3 gitoid]
Artifact-7[Artifact-7 gitoid] --> Artifact-3[Artifact-3 gitoid]
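As a rough sketch, the tree in the diagram can be held in memory as a mapping from each Artifact ID to the IDs of its direct inputs. The names `ADG`, `transitive_inputs`, and `leaf_artifacts`, and the placeholder strings standing in for real GitOIDs, are assumptions made for this example only.

```python
from typing import Dict, List, Set

# Hypothetical ADG mirroring the diagram. Each key is an Artifact ID and each
# value lists the IDs of its direct inputs. Leaf artifacts have no inputs.
# Real entries would be GitOIDs; short placeholder strings are used here.
ADG: Dict[str, List[str]] = {
    "artifact-1": ["artifact-2", "artifact-3"],
    "artifact-2": ["artifact-4", "artifact-5"],
    "artifact-3": ["artifact-6", "artifact-7"],
    "artifact-4": [],
    "artifact-5": [],
    "artifact-6": [],
    "artifact-7": [],
}


def transitive_inputs(adg: Dict[str, List[str]], root: str) -> Set[str]:
    """Collect every artifact reachable from `root` by following input edges."""
    seen: Set[str] = set()
    stack = list(adg.get(root, []))
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # a DAG may reach the same input along several paths
        seen.add(node)
        stack.extend(adg.get(node, []))
    return seen


def leaf_artifacts(adg: Dict[str, List[str]], root: str) -> Set[str]:
    """Return the inputs of `root` that have no inputs of their own."""
    return {node for node in transitive_inputs(adg, root) if not adg.get(node)}


if __name__ == "__main__":
    print(sorted(transitive_inputs(ADG, "artifact-1")))
    print(sorted(leaf_artifacts(ADG, "artifact-1")))
```

The `seen` set matters once the graph is a true DAG rather than a tree, since a shared header or library can then be an input to several derived artifacts.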
The parent-child relationship is captured by a set of Input Manifests.
Each artifact has an Input Manifest that describes its immediate children. It consists of a set of newline-delimited records, one for each child, in lexical order.
A child artifact which is itself a leaf artifact would be represented by:
${Artifact ID of child}\n
A child artifact which is itself a derived artifact would be represented by:
${Artifact ID of child}⎵manifest⎵${Artifact ID of child's Input Manifest}\n
Example:
flowchart BT
Artifact-2[Artifact-2 Artifact ID] --> Artifact-1[Artifact-1 Artifact ID]
Artifact-3[Artifact-3 Artifact ID] --> Artifact-1[Artifact-1 Artifact ID]
Artifact-4[Artifact-4 Artifact ID] --> Artifact-2[Artifact-2 Artifact ID]
Artifact-5[Artifact-5 Artifact ID] --> Artifact-2[Artifact-2 Artifact ID]
Artifact-6[Artifact-6 Artifact ID] --> Artifact-3[Artifact-3 Artifact ID]
Artifact-7[Artifact-7 Artifact ID] --> Artifact-3[Artifact-3 Artifact ID]
Artifact-2's Input Manifest:
gitoid:sha256\n
${Artifact ID of Artifact-4}\n
${Artifact ID of Artifact-5}\n
Artifact-3's Input Manifest:
gitoid:sha256\n
${Artifact ID of Artifact-6}\n
${Artifact ID of Artifact-7}\n
Artifact-1's Input Manifest:
gitoid:sha256\n
${Artifact ID of Artifact-2}⎵manifest⎵${Artifact ID of Artifact-2's Input Manifest}\n
${Artifact ID of Artifact-3}⎵manifest⎵${Artifact ID of Artifact-3's Input Manifest}\n
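The manifests in the example above could be assembled along the following lines. This sketch makes several assumptions that should be checked against the specification: that `⎵` denotes a single ASCII space, that Artifact IDs are written as hex digests, and that the Artifact ID of an Input Manifest is the SHA-256 GitOID of the manifest's own bytes. The function names are hypothetical.

```python
import hashlib
from typing import Iterable, Optional, Tuple


def gitoid_sha256(content: bytes) -> str:
    """SHA-256 GitOID of `content` treated as a Git blob (hex digest)."""
    header = f"blob {len(content)}".encode("ascii") + b"\0"
    return hashlib.sha256(header + content).hexdigest()


def build_input_manifest(children: Iterable[Tuple[str, Optional[str]]]) -> bytes:
    """Build manifest bytes from (child ID, child's manifest ID or None) pairs."""
    records = []
    for artifact_id, manifest_id in children:
        if manifest_id is None:
            # Leaf child: only its Artifact ID is recorded.
            records.append(artifact_id)
        else:
            # Derived child: also record the ID of its own Input Manifest.
            records.append(f"{artifact_id} manifest {manifest_id}")
    records.sort()  # records appear in lexical order
    lines = ["gitoid:sha256"] + records
    return "".join(line + "\n" for line in lines).encode("ascii")


if __name__ == "__main__":
    # Placeholder byte strings stand in for the real artifact contents.
    id4 = gitoid_sha256(b"artifact 4 bytes")
    id5 = gitoid_sha256(b"artifact 5 bytes")
    manifest2 = build_input_manifest([(id4, None), (id5, None)])

    id2 = gitoid_sha256(b"artifact 2 bytes")
    manifest1 = build_input_manifest([(id2, gitoid_sha256(manifest2))])
    print(manifest1.decode("ascii"), end="")
```

Since each derived child's record names the ID of that child's manifest, changing any leaf changes every manifest above it, which is the Merkle-tree property the graph relies on.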
OmniBOR advocates for build tools to embed into each derived artifact the Artifact ID of that derived artifact's Input Manifest.
Examples:
ELF Files (Executables and .so, and .o files)
: Embed Input Manifest Artifact ID into an ELF section named .omnibor
ar Files (.a static libraries)
: Embed Input Manifest Artifact ID into an archive entry named .omnibor
General Archive files (tar, gzip, etc.)
: Embed Input Manifest Artifact ID into an archive entry named .omnibor
Java .class file
: Embed Input Manifest Artifact ID into an annotation named @OMNIBOR in the
.class file.
Python .pyc files
: Embed Input Manifest Artifact ID into an __omnibor__ in the .pyc file.
Container Images
: Embed Input Manifest Artifact ID into the image manifest as an annotation
named dot.omnibor
Generated Source Code
: Embed Input Manifest Artifact ID for a generated source code file using a comment
OmniBOR is not a Software Bill of Materials (SBOM). It is designed to complement SBOMs, such as SPDX or CycloneDX.
OmniBOR can help SBOMs be more precise and reliable.
Most SBOMs allow for 'external identifiers' and can thus use
Artifact IDs to reference the artifacts in the OmniBOR
Artifact Dependency Graph (ADG). This allows an
SBOM describing a specific component, e.g.
Component Name: Django and Component Version: 1.11.1, to reference a list
of applicable Artifact IDs.
This is helpful because today two different tools might produce two different SBOMs for the same software artifact. This could occur if the SBOM generation tools use different sources to identify and describe the component. OmniBOR provides a precise software Artifact ID which can be used in SBOMs in situations where naming schemes may be ambiguous.
Example 1: If one SBOM generation tool uses CPEs:
cpe:2.3:a:djangoproject:django:1.11.1:*:*:*:*:*:*:*
and the other uses Package URLs (pURLs):
pkg:pypi/django@1.11.1
… then these two SBOMs might diverge when they define the component
supplier: it could be Component Supplier: djangoproject or
Component Supplier: pypi.
Example 2: In another instance a vendor might choose to use their product's current marketing name for the component name in their SBOM generation tools, whereas third-party SBOM generation tools might use the vendor's product name as listed in a CPE or SWID tag.
By enabling both SBOM generation tools to list the OmniBOR Artifact ID(s) associated with the component, an SBOM consumer can quickly understand that both SBOMs do describe the same artifact, regardless of ambiguities in naming schemes.
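A consumer-side check of this kind might look like the sketch below. The record layout (`identifier`, `supplier`, `artifact_ids`) is invented for illustration and does not correspond to SPDX, CycloneDX, or any other SBOM schema, and the Artifact ID is a placeholder computed from dummy bytes.

```python
import hashlib


def placeholder_artifact_id(content: bytes) -> str:
    """Stand-in Artifact ID: SHA-256 GitOID of `content` as a Git blob."""
    header = f"blob {len(content)}".encode("ascii") + b"\0"
    return hashlib.sha256(header + content).hexdigest()


def describe_same_artifact(sbom_a: dict, sbom_b: dict) -> bool:
    """True if the two component records share at least one Artifact ID."""
    return bool(set(sbom_a["artifact_ids"]) & set(sbom_b["artifact_ids"]))


# Dummy bytes standing in for the packaged component both tools examined.
shared_id = placeholder_artifact_id(b"placeholder bytes of the packaged component")

# Hypothetical records from two SBOM tools that name the component differently.
sbom_from_cpe_tool = {
    "identifier": "cpe:2.3:a:djangoproject:django:1.11.1:*:*:*:*:*:*:*",
    "supplier": "djangoproject",
    "artifact_ids": [shared_id],
}
sbom_from_purl_tool = {
    "identifier": "pkg:pypi/django@1.11.1",
    "supplier": "pypi",
    "artifact_ids": [shared_id],
}

if __name__ == "__main__":
    print(describe_same_artifact(sbom_from_cpe_tool, sbom_from_purl_tool))
```

The comparison ignores the identifier and supplier fields entirely, so disagreements in naming do not affect the result.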