The following is a glossary of terms defined by the OmniBOR project. For the current precise definitions, refer to the specification.
An artifact is any software object of interest.
Examples:
.o - object file
.so - shared object file
.class - Java class file
.jar - Java archive file
.pyc - compiled Python file
What all artifacts have in common is that they are all arrays of bytes.
Two artifacts are equivalent if and only if their byte representations are exactly equal.
Most artifacts are produced by a build tool that consumes a set of input artifacts and writes an artifact as output. Such artifacts are said to be 'derived artifacts'.
Artifacts which are not 'derived artifacts' are said to be 'leaf artifacts'. Leaf artifacts are usually source code files constructed by hand by humans.
Examples:
foo.o is derived from foo.c and bar.h using gcc.
The foo executable is derived from foo.o and baz.o using ld.
foo.class is derived from foo.java using javac.
It should be possible to identify each artifact with an Artifact ID.
Artifact IDs should have the following characteristics:
Canonical: Independent parties, presented with equivalent artifacts, derive the same Artifact ID.
Unique: Non-equivalent artifacts have distinct Artifact IDs.
Immutable: An artifact cannot be modified without also changing its Artifact ID.
OmniBOR uses the GitOID of an artifact as its Artifact ID.
Source code leaf artifacts are typically already stored in Git, where they are identified by their GitOID.
The Artifact Dependency Graph (ADG) of an artifact is the DAG (Directed Acyclic Graph) of all the artifacts that are transformed by build tools into that artifact. This includes the direct input artifacts and, transitively, the inputs of each input artifact, all the way down to the leaf artifacts, which are typically source code.
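To make the definition concrete, here is a minimal Python sketch, assuming a hypothetical in-memory map from each derived artifact to its direct inputs. It reuses the gcc/ld examples above, with file names standing in for Artifact IDs; baz.c is invented so that baz.o has an input. Walking the map from an artifact collects its transitive inputs, and the reachable artifacts with no recorded inputs are its leaf artifacts.

```python
from typing import Dict, List, Set

# Hypothetical build records: derived artifact -> its direct input artifacts.
# File names stand in for Artifact IDs; "baz.c" is an invented input of baz.o.
INPUTS: Dict[str, List[str]] = {
    "foo": ["foo.o", "baz.o"],    # foo executable, linked by ld
    "foo.o": ["foo.c", "bar.h"],  # compiled by gcc
    "baz.o": ["baz.c"],           # hypothetical
}


def transitive_inputs(artifact: str, inputs: Dict[str, List[str]]) -> Set[str]:
    """Return every artifact reachable from `artifact` by following input edges."""
    seen: Set[str] = set()
    stack = list(inputs.get(artifact, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # an input shared by several artifacts is visited once
        seen.add(current)
        stack.extend(inputs.get(current, []))
    return seen


def leaf_artifacts(artifact: str, inputs: Dict[str, List[str]]) -> Set[str]:
    """Leaf artifacts are reachable artifacts that have no recorded inputs."""
    return {a for a in transitive_inputs(artifact, inputs) if a not in inputs}


print(sorted(transitive_inputs("foo", INPUTS)))
# ['bar.h', 'baz.c', 'baz.o', 'foo.c', 'foo.o']
print(sorted(leaf_artifacts("foo", INPUTS)))
# ['bar.h', 'baz.c', 'foo.c']
```

An explicit stack keeps the walk iterative, and the `seen` set covers the case, common in C builds, where one header feeds several object files.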
Simple C Executable
flowchart BT
    c1[.c] --> o1[.o]
    h1.1[.h] --> o1[.o]
    h1.2[.h] --> o1[.o]
    c2[.c] --> o2[.o]
    h2.1[.h] --> o2[.o]
    h2.2[.h] --> o2[.o]
    o1 --> executable
    o2 --> executable
Running C Executable with Shared Object
flowchart BT
    c1[.c] --> o1[.o]
    h1.1[.h] --> o1[.o]
    h1.2[.h] --> o1[.o]
    c2[.c] --> o2[.o]
    h2.1[.h] --> o2[.o]
    h2.2[.h] --> o2[.o]
    o1 --> executable
    o2 --> executable
    c3[.c] --> o3[.o]
    h3.1[.h] --> o3[.o]
    h3.2[.h] --> o3[.o]
    c4[.c] --> o4[.o]
    h4.1[.h] --> o4[.o]
    h4.2[.h] --> o4[.o]
    o3 --> .so
    o4 --> .so
    executable --> running[running executable]
    .so --> running[running executable]
Java Example
flowchart BT
    java1[.java] --> cls1[.class]
    java2[.java] --> cls2[.class]
    java3[.java] --> cls3[.class]
    java4[.java] --> cls4[.class]
    java5[.java] --> cls5[.class]
    cls1 --> running[running executable]
    cls2 --> running[running executable]
    cls3 --> running[running executable]
    cls4 --> running[running executable]
    cls5 --> running[running executable]
Go Example
flowchart BT
    go1[.go] --> o1[.o]
    go2[.go] --> o2[.o]
    go3[.go] --> o3[.o]
    go4[.go] --> o4[.o]
    go5[.go] --> o5[.o]
    o1 --> executable
    o2 --> executable
    o3 --> executable
    o4 --> executable
    o5 --> executable
Python Example
flowchart BT
    py1[.py] --> pyc1[.pyc]
    py2[.py] --> pyc2[.pyc]
    py3[.py] --> pyc3[.pyc]
    py4[.py] --> pyc4[.pyc]
    py5[.py] --> pyc5[.pyc]
    pyc1 --> running[running executable]
    pyc2 --> running[running executable]
    pyc3 --> running[running executable]
    pyc4 --> running[running executable]
    pyc5 --> running[running executable]
A build tool is something which reads one or more input artifacts and writes one or more output artifacts.
flowchart LR
    input1 --> buildtool[build tool] --> output
    input2 --> buildtool[build tool]
    input3 --> buildtool[build tool]
Examples:
A compiler reads a .c file and zero or more .h files to produce a .o file:
flowchart LR
    .c --> compiler[[compiler]]
    *.h --> compiler[[compiler]]
    compiler --> .o

A linker reads .o files to produce an executable file:
flowchart LR
    *.o --> linker[[linker]]
    linker --> executable

A linker reads .o files to produce a shared object:
flowchart LR
    *.o --> linker[[linker]]
    linker --> .so

A dynamic linker reads an executable and .so files to produce a running executable:
flowchart LR
    executable --> linker[[dynamic linker]]
    *.so --> linker[[dynamic linker]]
    linker --> running[running executable]

A compiler reads a .java file to produce a .class file:
flowchart LR
    .java --> compiler[[compiler]]
    compiler --> classfile[.class]

A runtime reads .class files to produce a running process:
flowchart LR
    classfile[*.class] --> runtime[[runtime]]
    runtime --> running[running executable]

A compiler reads a .py file to produce a .pyc file:
flowchart LR
    .py --> compiler[[compiler]]
    compiler --> .pyc
The totality of ancestors for a given artifact may be represented as an Artifact Dependency Graph (ADG).
Typically, source code files are handwritten by humans, and as such are leaf artifacts in the Artifact Dependency Graph (ADG).
Source code files can also be generated from other inputs by a code generator.
flowchart LR
    input[input] --> codegenerator[[code generator]] --> generatedsrc[generated source code file]
In this scenario, the generated source code file is a derived artifact. This is because the code generator is a build tool and, by definition, the output from the build tool is a derived artifact.
Code generation is very common in many languages. See go generate, Java Xtend, and qtcpp for examples.
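As a hypothetical illustration (the JSON input format and the generator itself are invented for this sketch), a code generator is just a build tool whose output happens to be source code. The generated module below is a derived artifact: its bytes are fully determined by the input.

```python
import json


def generate_constants_module(spec_json: str) -> str:
    """Hypothetical code generator: a JSON object of constants -> Python source."""
    constants = json.loads(spec_json)
    lines = ["# Generated file: do not edit by hand."]
    for name in sorted(constants):  # sorted so identical input gives identical bytes
        lines.append(f"{name.upper()} = {constants[name]!r}")
    return "\n".join(lines) + "\n"


spec = '{"service_name": "demo", "max_retries": 3}'
print(generate_constants_module(spec), end="")
# # Generated file: do not edit by hand.
# MAX_RETRIES = 3
# SERVICE_NAME = 'demo'
```

Sorting the keys makes the output deterministic, so equivalent inputs yield equivalent generated artifacts and therefore the same Artifact ID.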
Git is an object store masquerading as a source code management system (SCM).
Git stores source code and metadata using a Merkle tree.
Git Objects are represented as follows:
${type}⎵${size}\0${content}
${type} - Git Object Type as a string, one of:
  blob - any bytes
  tree - represents a filesystem tree
  commit - represents a Git commit
  tag - represents a Git tag
${size} - size in bytes of ${content}, represented as a base 10 string.
${content} - the byte content of the object.
A Git blob (binary large object) is the type used for file contents in Git:
blob⎵${size}\0${content}
${content} - bytes of the file contents
Git blobs are identified by the SHA-1 hash of the blob object, computed with the GitOID construction: the hash first takes in a string containing the object type, an ASCII space character, the length of the content in bytes, and an ASCII null terminator character, and then the content itself:
SHA-1(blob⎵${size}\0${content})
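The GitOID construction is short enough to sketch with Python's standard hashlib module. This is a minimal sketch, not a reference implementation: the function names are hypothetical, and the gitoid:blob:<algorithm>:<hex> URI form is an assumption based on the gitoid URI scheme, so check the specification before relying on it. With SHA-1 the result matches Git's own blob IDs; SHA-256 is the default here because the Input Manifest examples in this glossary use gitoid:sha256.

```python
import hashlib


def gitoid_blob(content: bytes, algorithm: str = "sha256") -> str:
    """Hex GitOID of `content` as a Git blob, using the given hash algorithm.

    The hashed bytes are the object type "blob", an ASCII space, the content
    length in bytes as a base-10 string, a NUL byte, and then the content.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.new(algorithm, header + content).hexdigest()


def gitoid_uri(content: bytes, algorithm: str = "sha256") -> str:
    """GitOID as a URI; the gitoid:blob:<algorithm>:<hex> form is assumed here."""
    return f"gitoid:blob:{algorithm}:{gitoid_blob(content, algorithm)}"


# With SHA-1 this matches what `git hash-object` reports for an empty file.
print(gitoid_blob(b"", "sha1"))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(gitoid_uri(b"hello world\n"))
```

Because the ID depends only on the artifact's bytes, it is canonical and immutable by construction, and unique up to hash collisions.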
An artifact dependency graph can be represented as a graph with nodes identified by an Artifact ID. In the examples below, we only show tree structures for simplicity.
flowchart BT
    Artifact-2[Artifact-2 ID] --> Artifact-1[Artifact-1 ID]
    Artifact-3[Artifact-3 ID] --> Artifact-1[Artifact-1 ID]
    Artifact-4[Artifact-4 ID] --> Artifact-2[Artifact-2 ID]
    Artifact-5[Artifact-5 ID] --> Artifact-2[Artifact-2 ID]
    Artifact-6[Artifact-6 ID] --> Artifact-3[Artifact-3 ID]
    Artifact-7[Artifact-7 ID] --> Artifact-3[Artifact-3 ID]
OmniBOR uses the GitOID of an artifact as its Artifact ID.
flowchart BT
    Artifact-2[Artifact-2 gitoid] --> Artifact-1[Artifact-1 gitoid]
    Artifact-3[Artifact-3 gitoid] --> Artifact-1[Artifact-1 gitoid]
    Artifact-4[Artifact-4 gitoid] --> Artifact-2[Artifact-2 gitoid]
    Artifact-5[Artifact-5 gitoid] --> Artifact-2[Artifact-2 gitoid]
    Artifact-6[Artifact-6 gitoid] --> Artifact-3[Artifact-3 gitoid]
    Artifact-7[Artifact-7 gitoid] --> Artifact-3[Artifact-3 gitoid]
The parent-child relationship is captured by a set of Input Manifests.
Each derived artifact has an Input Manifest that describes its immediate children. The manifest consists of a set of newline-delimited records, one for each child, in lexical order.
A child artifact which is itself a leaf artifact would be represented by:
${Artifact ID of child}\n
A child artifact which is itself a derived artifact would be represented by:
${Artifact ID of child}⎵manifest⎵${Artifact ID of child's Input Manifest}\n
Example:
flowchart BT
    Artifact-2[Artifact-2 Artifact ID] --> Artifact-1[Artifact-1 Artifact ID]
    Artifact-3[Artifact-3 Artifact ID] --> Artifact-1[Artifact-1 Artifact ID]
    Artifact-4[Artifact-4 Artifact ID] --> Artifact-2[Artifact-2 Artifact ID]
    Artifact-5[Artifact-5 Artifact ID] --> Artifact-2[Artifact-2 Artifact ID]
    Artifact-6[Artifact-6 Artifact ID] --> Artifact-3[Artifact-3 Artifact ID]
    Artifact-7[Artifact-7 Artifact ID] --> Artifact-3[Artifact-3 Artifact ID]
Artifact-2's Input Manifest:
gitoid:sha256\n
${Artifact ID of Artifact-4}\n
${Artifact ID of Artifact-5}\n
Artifact-3's Input Manifest:
gitoid:sha256\n
${Artifact ID of Artifact-6}\n
${Artifact ID of Artifact-7}\n
Artifact-1's Input Manifest:
gitoid:sha256\n
${Artifact ID of Artifact-2}⎵manifest⎵${Artifact ID of Artifact-2's Input Manifest}\n
${Artifact ID of Artifact-3}⎵manifest⎵${Artifact ID of Artifact-3's Input Manifest}\n
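The manifest format above can be sketched in Python as follows. This is a minimal, hypothetical sketch: the byte contents of the example artifacts are invented, Artifact IDs are bare SHA-256 GitOID hex strings, ⎵ is rendered as an ordinary space, and the header line copies the gitoid:sha256 line from the examples. The Artifact ID of a manifest is the GitOID of the manifest's own bytes, which is what lets Artifact-1's manifest point at the manifests of Artifact-2 and Artifact-3.

```python
import hashlib
from typing import Dict, Optional


def gitoid_sha256(content: bytes) -> str:
    """SHA-256 GitOID of `content` hashed as a Git blob (header, then content)."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha256(header + content).hexdigest()


def input_manifest(children: Dict[str, Optional[str]]) -> bytes:
    """Build an Input Manifest from {child Artifact ID: child manifest ID or None}.

    A leaf child (None) becomes "<id>\\n"; a derived child becomes
    "<id> manifest <manifest id>\\n". Records are sorted lexically.
    """
    records = []
    for child_id, manifest_id in children.items():
        if manifest_id is None:
            records.append(f"{child_id}\n")
        else:
            records.append(f"{child_id} manifest {manifest_id}\n")
    return ("gitoid:sha256\n" + "".join(sorted(records))).encode("ascii")


# Hypothetical artifact contents for the example tree.
leaf = {n: gitoid_sha256(f"artifact-{n}".encode()) for n in (4, 5, 6, 7)}
manifest_2 = input_manifest({leaf[4]: None, leaf[5]: None})
manifest_3 = input_manifest({leaf[6]: None, leaf[7]: None})
artifact_2 = gitoid_sha256(b"artifact-2")
artifact_3 = gitoid_sha256(b"artifact-3")
manifest_1 = input_manifest({
    artifact_2: gitoid_sha256(manifest_2),
    artifact_3: gitoid_sha256(manifest_3),
})
print(manifest_1.decode("ascii"), end="")
print("Artifact ID of Artifact-1's Input Manifest:", gitoid_sha256(manifest_1))
```

Because each manifest ID is the hash of bytes that already contain its children's manifest IDs, changing any leaf changes every manifest ID on the path up to the root, the same Merkle-tree property Git relies on.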
OmniBOR advocates for build tools to embed into each derived artifact the Artifact ID of that derived artifact's Input Manifest.
Examples:
ELF files (executables, .so, and .o files): Embed the Input Manifest Artifact ID into an ELF section named .omnibor.
ar files (.a static libraries): Embed the Input Manifest Artifact ID into an archive entry named .omnibor.
General archive files (tar, gzip, etc.): Embed the Input Manifest Artifact ID into an archive entry named .omnibor.
Java .class files: Embed the Input Manifest Artifact ID into an annotation named @OMNIBOR in the .class file.
Python .pyc files: Embed the Input Manifest Artifact ID into an __omnibor__ in the .pyc file.
Container images: Embed the Input Manifest Artifact ID into the image manifest as an annotation named dot.omnibor.
Generated source code: Embed the Input Manifest Artifact ID in the generated source code file using a comment.
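Of these cases, the tar archive is the easiest to sketch with a standard library. The Python sketch below uses the tarfile module to append an entry named .omnibor to a tar file and read it back. Only the entry name comes from the list above; the entry's contents (the ID plus a newline) and the function names are assumptions for illustration. tarfile can only append to uncompressed archives, so a gzip-compressed tar would have to be rewritten instead.

```python
import io
import os
import tarfile
import tempfile


def add_omnibor_entry(tar_path: str, manifest_id: str) -> None:
    """Append a `.omnibor` entry holding `manifest_id` to an uncompressed tar.

    Assumption: the entry body is the Input Manifest Artifact ID followed by a
    newline; the specification governs the real encoding.
    """
    data = (manifest_id + "\n").encode("ascii")
    info = tarfile.TarInfo(name=".omnibor")
    info.size = len(data)
    # Mode "a" appends to (or creates) an uncompressed archive only.
    with tarfile.open(tar_path, "a") as tar:
        tar.addfile(info, io.BytesIO(data))


def read_omnibor_entry(tar_path: str) -> str:
    """Return the Input Manifest Artifact ID stored in the `.omnibor` entry."""
    with tarfile.open(tar_path, "r") as tar:
        member = tar.extractfile(".omnibor")  # raises KeyError if absent
        if member is None:
            raise ValueError(".omnibor is not a regular file")
        return member.read().decode("ascii").strip()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.tar")
    add_omnibor_entry(path, "0" * 64)  # placeholder ID, not a real manifest ID
    print(read_omnibor_entry(path) == "0" * 64)  # True
```

Embedding the entry changes the archive's bytes, so the archive's own Artifact ID has to be computed after the .omnibor entry is added.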
OmniBOR is not a Software Bill of Materials (SBOM). It is designed to complement SBOMs, such as SPDX or CycloneDX.
OmniBOR can help SBOMs be more precise and reliable.
Most SBOMs allow for 'external identifiers' and can thus use Artifact IDs to reference the artifacts in the OmniBOR Artifact Dependency Graph (ADG). This allows an SBOM describing a specific component, e.g. Component Name: Django and Component Version: 1.11.1, to reference a list of applicable Artifact IDs.
This is helpful because today two different tools might produce two different SBOMs for the same software artifact. This could occur if the SBOM generation tools use different sources to identify and describe the component. OmniBOR provides a precise software Artifact ID which can be used in SBOMs in situations where naming schemes may be ambiguous.
Example 1: If one SBOM generation tool uses CPEs:
cpe:2.3:a:djangoproject:django:1.11.1:*:*:*:*:*:*:*
and the other uses Package URLs (pURLs):
pkg:pypi/django@1.11.1
… then these two SBOMs might diverge when they define the component supplier: it could be Component Supplier: djangoproject or Component Supplier: pypi.
Example 2: In another instance a vendor might choose to use their product's current marketing name for the component name in their SBOM generation tools, whereas third-party SBOM generation tools might use the vendor's product name as listed in a CPE or SWID tag.
By having both SBOM generation tools list the OmniBOR Artifact ID(s) associated with the component, an SBOM consumer can quickly see that both SBOMs describe the same artifact, regardless of ambiguities in naming schemes.
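A consumer-side check can be sketched in a few lines of Python. Everything here is hypothetical: real SBOM formats such as SPDX and CycloneDX define their own schemas for external identifiers, and the class, field names, and placeholder IDs below are invented to show only the comparison logic, using the Django example above.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class SbomComponent:
    """Hypothetical, simplified SBOM component entry; field names are invented."""
    identifier: str  # e.g. a CPE or a pURL
    supplier: str
    artifact_ids: FrozenSet[str]  # OmniBOR Artifact IDs listed as external identifiers


def same_artifacts(a: SbomComponent, b: SbomComponent) -> bool:
    """Match entries by their non-empty Artifact ID sets, ignoring names."""
    return bool(a.artifact_ids) and a.artifact_ids == b.artifact_ids


# "id-1" and "id-2" are placeholders standing in for real Artifact IDs.
from_cpe_tool = SbomComponent(
    "cpe:2.3:a:djangoproject:django:1.11.1:*:*:*:*:*:*:*",
    "djangoproject",
    frozenset({"id-1", "id-2"}),
)
from_purl_tool = SbomComponent(
    "pkg:pypi/django@1.11.1", "pypi", frozenset({"id-2", "id-1"})
)

print(from_cpe_tool.supplier == from_purl_tool.supplier)  # False
print(same_artifacts(from_cpe_tool, from_purl_tool))  # True
```

Requiring identical, non-empty sets is a deliberately strict choice; a consumer could instead treat any shared Artifact ID as evidence that two entries overlap.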