Transformation
If you want to know more details about transformation passes, please take a look at the section "Transformation Pass" in the Internals chapter.
Submodules
Transformation Passes
Base Class
Guide to writing QONNX transformations
Your transformation must inherit from the Transformation abstract base class.
Your transformation's apply function should take in a ModelWrapper and return a tuple (transformed_model: ModelWrapper, model_was_changed: bool).
The transformations are meant to be applied using the .transform function in ModelWrapper. This makes a deep copy of the input model by default, so you don’t have to.
model_was_changed indicates whether your transformation made any changes to the model. If you know your transformation needs to be called only once and repeated calls have no further effect, you can return False even if the model was changed.
You MUST return model_was_changed=False at some point when your transformation is called multiple times, otherwise apply_repeated() will loop infinitely.
If you cannot guarantee that the transformation will reach a fixed point, you must declare this, return model_was_changed = False and let the user manually re-apply the transform.
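A minimal sketch of a custom transformation following these rules (the pass, its name and its behavior are hypothetical, purely for illustration):

    from qonnx.core.modelwrapper import ModelWrapper
    from qonnx.transformation.base import Transformation

    class DropIdentityNodes(Transformation):
        """Hypothetical example pass: remove Identity nodes by rewiring consumers."""

        def apply(self, model):
            graph_modified = False
            for node in model.graph.node:
                if node.op_type == "Identity":
                    # rewire all consumers of the Identity output to its input
                    consumers = model.find_consumers(node.output[0]) or []
                    for consumer in consumers:
                        for i, inp in enumerate(consumer.input):
                            if inp == node.output[0]:
                                consumer.input[i] = node.input[0]
                    model.graph.node.remove(node)
                    graph_modified = True
                    break  # the node list was mutated, so stop and request a re-run
            # model_was_changed=True makes .transform() call this pass again
            return model, graph_modified

    model = ModelWrapper("model.onnx")  # hypothetical filename
    model = model.transform(DropIdentityNodes())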
- class qonnx.transformation.base.NodeLocalTransformation(num_workers=None)
Bases:
Transformation
Parent class for transformations that can be executed locally on a single node by accessing and modifying the attributes of only that node. This allows the transformation to be automatically parallelized. Transformations subclassing NodeLocalTransformation must implement the abstract method applyNodeLocal(). A read-only copy of the model is available as the member variable ref_input_model, but any modifications there will be disregarded.
To control the degree of parallelization, specify the num_workers argument in the constructor, using one of the following values:
- None: use the NUM_DEFAULT_WORKERS environment variable
- 0: use all available CPU cores
- any other int > 0: set the number of parallel workers
- apply(model)
- abstract applyNodeLocal(node)
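A node-local pass only touches the node it is handed and returns a (node, was_changed) tuple; a minimal hypothetical sketch, assuming model is a ModelWrapper instance:

    from qonnx.transformation.base import NodeLocalTransformation

    class ClearNodeDocStrings(NodeLocalTransformation):
        """Hypothetical example: clear every node's doc_string field, in parallel."""

        def applyNodeLocal(self, node):
            if node.doc_string:
                node.doc_string = ""
            # a single pass suffices, so report that no further changes are needed
            return (node, False)

    # num_workers=0 uses all available CPU cores (see above)
    model = model.transform(ClearNodeDocStrings(num_workers=0))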
qonnx.transformation.batchnorm_to_affine
- class qonnx.transformation.batchnorm_to_affine.BatchNormToAffine
Bases:
Transformation
Replaces any test-time BatchNorm layers with Mul-Add layers.
- apply(model)
qonnx.transformation.bipolar_to_xnor
- class qonnx.transformation.bipolar_to_xnor.ConvertBipolarMatMulToXnorPopcount
Bases:
Transformation
Convert MatMul nodes with all-bipolar inputs to XnorPopcountMatMul and associated result correction.
- apply(model)
qonnx.transformation.change_3d_tensors_to_4d
- class qonnx.transformation.change_3d_tensors_to_4d.Change3DTo4DTensors
Bases:
Transformation
Replaces 3D tensors with 4D tensors assuming the following format: [N, C, H] -> [N, C, H, 1]. The attributes of a (specific) set of supported nodes are changed accordingly. If the graph contains unsupported nodes, a warning is raised and the transformation is not applied.
- apply(model)
qonnx.transformation.change_batchsize
- class qonnx.transformation.change_batchsize.ChangeBatchSize(bsize)
Bases:
Transformation
Change the batch size dimension to the given value for the entire graph by changing it for the global input/output and removing all intermediate shapes (will need a call to shape inference to restore shapes). Will attempt to handle any Reshape nodes with constant shape parameters by changing the batch size dimension value in the parameter.
- apply(model: ModelWrapper)
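Since the pass removes intermediate shapes, it is typically followed by shape inference; a usage sketch with a hypothetical model file:

    from qonnx.core.modelwrapper import ModelWrapper
    from qonnx.transformation.change_batchsize import ChangeBatchSize
    from qonnx.transformation.infer_shapes import InferShapes

    model = ModelWrapper("model.onnx")  # hypothetical filename
    model = model.transform(ChangeBatchSize(16))
    model = model.transform(InferShapes())  # restore the removed intermediate shapes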
qonnx.transformation.change_datalayout
- class qonnx.transformation.change_datalayout.ChangeDataLayoutQuantAvgPool2d
Bases:
Transformation
Replace each QuantAvgPool2d node using the (N,C,H,W) data layout with a QuantAvgPool2dNHWC node using the (N,H,W,C) data layout, surrounded by the appropriate Transpose nodes.
- apply(model)
qonnx.transformation.channels_last
- class qonnx.transformation.channels_last.AbsorbChanFirstIntoMatMul
Bases:
Transformation
Removes a channels-first Transpose node if it sits in front of a Flatten and MatMul (or Gemm) node.
The channels-first Transpose is fused into the initializer of the Quant node acting as the weight tensor for the MatMul/Gemm node. Reshape nodes with shape [1, -1] are also supported instead of Flatten nodes. Independent of whether the flattening operation was performed by a Flatten node or a Reshape node, a Flatten node will be reinserted in front of the MatMul node.
- Note: This transformation removes some of the tensor shapes on the downstream path, so running shape inference afterwards is advised.
- apply(model)
- class qonnx.transformation.channels_last.ConvertToChannelsLastAndClean(make_input_channels_last=False)
Bases:
Transformation
Converts data layout dependent nodes to ChannelsLast nodes and inserts the required Transpose nodes. It then tries to eliminate as many Transposes as possible and moves the remaining ones as far upstream as possible.
- Parameters:
make_input_channels_last (bool) – Also makes the input of the network channels last, otherwise a transpose node will be left at the beginning of the network. Defaults to False
- apply(model)
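A usage sketch, assuming model is a ModelWrapper instance:

    from qonnx.transformation.channels_last import ConvertToChannelsLastAndClean

    # make_input_channels_last=True also converts the graph input itself,
    # so no Transpose node is left at the beginning of the network
    model = model.transform(ConvertToChannelsLastAndClean(make_input_channels_last=True))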
- class qonnx.transformation.channels_last.InsertChannelsLastDomainsAndTrafos
Bases:
Transformation
Inserts the ChannelsLast domain where required and also inserts the required transposes.
- apply(model)
- class qonnx.transformation.channels_last.MoveChanFirstDownstream
Bases:
Transformation
Moves channels-first Transpose nodes further downstream.
- apply(model)
- class qonnx.transformation.channels_last.MoveChanLastUpstream
Bases:
Transformation
Moves channels-last Transpose nodes further upstream.
- apply(model)
- class qonnx.transformation.channels_last.RemoveConsecutiveChanFirstAndChanLastTrafos
Bases:
Transformation
Remove two consecutive transformations that cancel each other out: (ChannelsLast -> ChannelsFirst) followed by (ChannelsFirst -> ChannelsLast). More concretely, the first converts to channels-first and the second back to channels-last.
- apply(model)
qonnx.transformation.create_generic_partitions
- class qonnx.transformation.create_generic_partitions.PartitionFromDict(partitioning={}, partition_dir=None)
Bases:
Transformation
Split a graph into partitions. Each resulting partition node has a model attribute indicating the path to the subordinate onnx file. Cleanup and InferShapes() transformations should be applied first.
This transformation builds on PartitionFromLambda() and takes a dictionary that defines partitions based on node indices.
Argument 0: partitioning
- Dictionary with the following format: { partition_id : node_index_list }
- Example: {0 : [3,4,5], 1 : range(10, 15)}
Argument 1 (optional): partition_dir
- Manually define where to save the partition models
- apply(model)
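A usage sketch based on the dictionary format above (node indices are hypothetical), assuming model is a cleaned and shape-inferred ModelWrapper:

    from qonnx.transformation.create_generic_partitions import PartitionFromDict

    # nodes 3-5 become partition 0, nodes 10-14 become partition 1
    model = model.transform(PartitionFromDict({0: [3, 4, 5], 1: range(10, 15)}))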
- class qonnx.transformation.create_generic_partitions.PartitionFromLambda(partitioning=<function PartitionFromLambda.<lambda>>, partition_dir=None)
Bases:
Transformation
Split a graph into partitions. Each resulting partition node has a model attribute indicating the path to the subordinate onnx file. Cleanup and InferShapes() transformations should be applied first.
Argument 0: partitioning
- Function performing the mapping: node -> partition_id (int or string)
- Partitions may not cover the graph completely (nodes mapped to -1 are retained)
- Mapping must return -1 for GenericPartition nodes
Argument 1 (optional): partition_dir
- Manually define where to save the partition models
- apply(model)
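A usage sketch with a hypothetical mapping function, assuming model is a cleaned and shape-inferred ModelWrapper:

    from qonnx.transformation.create_generic_partitions import PartitionFromLambda

    # map all Conv nodes to partition 0; -1 keeps a node in the main graph
    model = model.transform(PartitionFromLambda(
        partitioning=lambda node: 0 if node.op_type == "Conv" else -1))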
qonnx.transformation.double_to_single_float
- class qonnx.transformation.double_to_single_float.DoubleToSingleFloat
Bases:
Transformation
Convert any float64 initializers to float32.
- apply(model)
qonnx.transformation.expose_intermediate
- class qonnx.transformation.expose_intermediate.ExposeIntermediateTensorsLambda(tensor_filter=<function ExposeIntermediateTensorsLambda.<lambda>>)
Bases:
Transformation
- apply(model: ModelWrapper)
- class qonnx.transformation.expose_intermediate.ExposeIntermediateTensorsPatternList(pattern_list, dynamic_only=True)
Bases:
ExposeIntermediateTensorsLambda
- pattern_filter(tname, model)
qonnx.transformation.extend_partition
- class qonnx.transformation.extend_partition.ExtendPartition(extend_index)
Bases:
Transformation
Extends GenericPartition type nodes by inserting the graph pointed to by the model attribute.
Argument 0: extend_index
- List that contains the node indices of the GenericPartition nodes
- apply(model)
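A usage sketch with hypothetical node indices, assuming model is a ModelWrapper instance:

    from qonnx.transformation.extend_partition import ExtendPartition

    # re-inline the GenericPartition nodes found at node indices 3 and 7
    model = model.transform(ExtendPartition([3, 7]))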
qonnx.transformation.extract_conv_bias
- class qonnx.transformation.extract_conv_bias.ExtractBiasFromConv
Bases:
Transformation
Extracts the (optional) Bias from a Conv(Transpose) node and inserts it behind the Conv(Transpose) node as an Add node.
- apply(model)
qonnx.transformation.extract_quant_scale_zeropt
- class qonnx.transformation.extract_quant_scale_zeropt.ExtractQuantScaleZeroPt
Bases:
Transformation
Extract any non-identity scale and zero-point Quant inputs as separate Div/Mul (for scale) and Add/Sub (for zero-point) nodes, preceding and following the Quant node.
- apply(model: ModelWrapper)
qonnx.transformation.fold_constants
- class qonnx.transformation.fold_constants.FoldConstants(exclude_op_types=['Quant', 'BipolarQuant'])
Bases:
Transformation
Replace the output of a node with const-only inputs with a precomputed result. Skip any op types given in exclude_op_types.
- apply(model)
- class qonnx.transformation.fold_constants.FoldConstantsFiltered(match_filter_fxn)
Bases:
Transformation
Replace the output of a node with const-only inputs with a precomputed result. Use the match_filter_fxn(model, node) function to decide which nodes are eligible for constant folding.
- apply(model)
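A sketch of a hypothetical filter function, assuming model is a ModelWrapper instance:

    from qonnx.transformation.fold_constants import FoldConstantsFiltered

    # match_filter_fxn receives (model, node) and returns True for eligible nodes;
    # this hypothetical filter only folds const-input Reshape nodes
    model = model.transform(FoldConstantsFiltered(
        lambda model, node: node.op_type == "Reshape"))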
qonnx.transformation.gemm_to_matmul
- class qonnx.transformation.gemm_to_matmul.GemmToMatMul
Bases:
Transformation
Converts Gemm nodes into a MatMul and an Add node. This transformation is built to support version 9 of the Gemm node, as documented here: https://github.com/onnx/onnx/blob/master/docs/Changelog.md#Gemm-9 However, earlier and later versions of the node are likely to work as well. Explicitly not supported is the optionality of input C in versions >=11 and the broadcast attribute of versions <=6.
- apply(model)
qonnx.transformation.general
- class qonnx.transformation.general.ApplyConfig(config, node_filter=<function ApplyConfig.<lambda>>)
Bases:
Transformation
Applies node properties (attributes) from either a config dict or its JSON representation given as a filename. The JSON file can specify default values for particular op_types, as well as values for nodes with particular names. Example dict:
{
    # set kernel_size = 3 for all nodes with op_type=Im2Col
    "Defaults": {"kernel_size": [3, ["Im2Col"]]},
    # set kernel_size = 7 for the particular node with name Im2Col_0
    "Im2Col_0": {"kernel_size": 7}
}
- apply(model)
- class qonnx.transformation.general.ConvertDivToMul
Bases:
Transformation
Convert divide by constant nodes to multiply by constant nodes.
- apply(model)
- class qonnx.transformation.general.ConvertSubToAdd
Bases:
Transformation
Convert subtract-a-constant nodes to add-a-constant nodes.
- apply(model)
- class qonnx.transformation.general.GiveRandomTensorNames
Bases:
Transformation
Give random tensor names to all tensors.
- apply(model)
- class qonnx.transformation.general.GiveReadableTensorNames
Bases:
Transformation
Give more human-readable names to all internal tensors. You should apply GiveUniqueNodeNames prior to this transform to avoid empty node names, as the readable names are based on the node names.
- apply(model)
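Since the readable tensor names are derived from node names, the two passes are typically chained in this order (assuming model is a ModelWrapper instance):

    from qonnx.transformation.general import GiveReadableTensorNames, GiveUniqueNodeNames

    model = model.transform(GiveUniqueNodeNames())
    model = model.transform(GiveReadableTensorNames())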
- class qonnx.transformation.general.GiveUniqueNodeNames(prefix='')
Bases:
Transformation
Give unique names to each node in the graph using enumeration, starting with given prefix (if specified in the constructor).
- apply(model)
- class qonnx.transformation.general.GiveUniqueParameterTensors
Bases:
Transformation
Make every parameter tensor unique. The aim is to avoid affecting other nodes apart from the one the system is currently operating on.
- apply(model)
- class qonnx.transformation.general.MovePadAttributeToTensor
Bases:
Transformation
Move padding info from attribute into input tensor for Pad nodes.
- apply(model)
- class qonnx.transformation.general.RemoveStaticGraphInputs
Bases:
Transformation
Remove any top-level graph inputs that have initializers.
- apply(model)
- class qonnx.transformation.general.RemoveUnusedTensors
Bases:
Transformation
Remove any unused tensors in the graph by removing any initializers, ValueInfo and tensor annotations associated with them. Unused tensors do not appear as input/output for any graph nodes.
- apply(model)
- class qonnx.transformation.general.SortGraph
Bases:
Transformation
Returns the model with its node list sorted topologically. Any ONNX graph to be executed must have a topologically sorted node list, as dictated by the ONNX standard.
- apply(model)
qonnx.transformation.infer_data_layouts
- class qonnx.transformation.infer_data_layouts.InferDataLayouts
Bases:
Transformation
Try to infer data layout annotations for all input/intermediate/output tensors based on inputs and node type.
- apply(model)
qonnx.transformation.infer_datatypes
- class qonnx.transformation.infer_datatypes.InferDataTypes
Bases:
Transformation
Infer QONNX DataType info for all intermediate/output tensors based on inputs and node type.
- apply(model)
- qonnx.transformation.infer_datatypes.infer_mac_result_dtype(idtypes, possible_negation)
- qonnx.transformation.infer_datatypes.is_scaled_int(x)
qonnx.transformation.infer_shapes
- class qonnx.transformation.infer_shapes.InferShapes
Bases:
Transformation
Ensure every tensor in the model has a specified shape (ValueInfo).
- apply(model)
qonnx.transformation.insert_topk
- class qonnx.transformation.insert_topk.InsertTopK(k=5, axis=-1, largest=1, sorted=1)
Bases:
Transformation
Add TopK node at the network output and replace the graph output with the TopK indices.
- apply(model)
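A usage sketch for a classification network, assuming model is a ModelWrapper instance:

    from qonnx.transformation.insert_topk import InsertTopK

    # the graph output now yields the indices of the 5 largest scores
    model = model.transform(InsertTopK(k=5))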
qonnx.transformation.lower_convs_to_matmul
- class qonnx.transformation.lower_convs_to_matmul.LowerConvsToMatMul
Bases:
Transformation
Replace Conv layers with pairs of Im2Col-MatMul layers, plus Transpose layers to keep the original data layout.
- apply(model)
qonnx.transformation.make_input_chanlast
- class qonnx.transformation.make_input_chanlast.MakeInputChannelsLast
Bases:
Transformation
For networks with an input using the NCx data layout, add a transpose node at the beginning and mark the input as using NxC (channels-last).
- apply(model)
qonnx.transformation.merge_onnx_models
- class qonnx.transformation.merge_onnx_models.MergeONNXModels(pre_model)
Bases:
Transformation
Merges two models. The model passed to the transformation (pre_model) will be inserted before the model the transformation is applied on, and the resulting merged model is returned. This transformation tries to connect graph.output[0] of the pre model to graph.input[0] of the post model. If more than one input or output exists, a warning is raised.
- apply(model)
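A usage sketch that prepends a hypothetical preprocessing model:

    from qonnx.core.modelwrapper import ModelWrapper
    from qonnx.transformation.merge_onnx_models import MergeONNXModels

    pre_model = ModelWrapper("preproc.onnx")  # hypothetical filename
    # pre_model is executed first, then the model the transform is applied on
    model = model.transform(MergeONNXModels(pre_model))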
qonnx.transformation.pruning
- class qonnx.transformation.pruning.ApplyMasks(prune_spec: Dict)
Bases:
Transformation
Apply the given sparsity masks in prune_spec to the appropriately named tensors in the model. These masks are only annotations, no actual pruning is performed at this stage.
- apply(model: ModelWrapper) Tuple[ModelWrapper, bool]
- class qonnx.transformation.pruning.PropagateMasks(lossy: bool = True)
Bases:
Transformation
Propagate the sparsity masks in the network to relevant upstream and downstream layers. Some initial sparsity masks must have been applied, either manually or with the ApplyMasks transformation. Note that not all layer types are supported; see the update_node_mask function for details.
- apply(model: ModelWrapper) Tuple[ModelWrapper, bool]
- class qonnx.transformation.pruning.PruneChannels(prune_spec: Dict, lossy: bool = True)
Bases:
Transformation
Prune channels from specified tensors and their dependencies from a model, as specified by the dictionary given in prune_spec. This dictionary must be formatted as {tensor_name : {axis : {channels}}}. See test_pruning.py for examples. If lossy is True, the transformation will aggressively prune all relevant upstream/downstream layers around the specified tensors. This is good for maintaining the consistency of layer shapes, but may introduce a larger accuracy penalty. If lossy is False, the pruning will be more conservative to preserve the numerical ranges (e.g. biases won't be pruned in the downstream layers) but this may lead to inconsistent shapes in the network.
- apply(model: ModelWrapper) Tuple[ModelWrapper, bool]
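A sketch of the prune_spec format with a hypothetical tensor name, assuming model is a ModelWrapper instance:

    from qonnx.transformation.pruning import PruneChannels

    # remove channels 0 and 5 along axis 0 of the (hypothetical) tensor "MatMul_0_param0"
    prune_spec = {"MatMul_0_param0": {0: {0, 5}}}
    model = model.transform(PruneChannels(prune_spec, lossy=True))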
- class qonnx.transformation.pruning.RemoveMaskedChannels(lossy: bool = True)
Bases:
Transformation
Remove channels indicated by sparsity masks on the model. The sparsity mask annotations will be removed after they have been processed for each tensor. Does not perform any shape consistency checking and may result in a broken graph.
- apply(model: ModelWrapper) Tuple[ModelWrapper, bool]
- qonnx.transformation.pruning.ensure_masktype_is_dict(mask)
- qonnx.transformation.pruning.merge_dicts_of_sets(dict1, dict2)
- qonnx.transformation.pruning.remove_masked_tensor_channels(tensor_or_shape, mask, axis)
- qonnx.transformation.pruning.update_node_mask(node, masks_in, masks_out, lossy=True)
qonnx.transformation.qcdq_to_qonnx
- class qonnx.transformation.qcdq_to_qonnx.QCDQToQuant
Bases:
Transformation
Fuse a chain of nodes, specifically QuantizeLinear+DequantizeLinear, back into a QONNX Quant node. This transform finds chains of QuantizeLinear followed by DequantizeLinear and fuses them into a single QONNX Quant node. If a Clip node is found between the QuantizeLinear and DequantizeLinear, it is taken into account for the Quant bitwidth calculation.
Input: a model potentially quantized with QuantizeLinear, (optional) Clip and DequantizeLinear nodes.
Output: a model with QuantizeLinear, Clip and DequantizeLinear chains re-fused into QONNX Quant nodes.
- apply(model: ModelWrapper) Tuple[ModelWrapper, bool]
- qonnx.transformation.qcdq_to_qonnx.extract_elem_type(elem_type: int, clip_range=None) Tuple[int, int, bool]
Return Quant attribute specification based on element type and (optional) clipping range. Returns: (bitwidth, signed, is_narrow_qnt)
qonnx.transformation.qonnx_to_qcdq
- class qonnx.transformation.qonnx_to_qcdq.QuantToQCDQ
Bases:
Transformation
Replace QONNX Quant-style quantization nodes with QuantizeLinear -> Clip -> DequantizeLinear (QCDQ)-style quantization nodes. The following restrictions apply to the Quant nodes:
- the scale, zero-point and bitwidth inputs for Quant must be statically specified by an initializer
- the bitwidth must be an integer in the range [2, 8]
- the zero-point tensor must be zero
- the scale must be a scalar value or 1D tensor
- the rounding_mode attribute must be ROUND
BipolarQuant is not (yet) supported.
- apply(model: ModelWrapper)
qonnx.transformation.quant_constant_folding
- class qonnx.transformation.quant_constant_folding.FoldTransposeIntoQuantInit
Bases:
Transformation
Fuses a Transpose node into the initializers of a Quant node.
- apply(model: ModelWrapper)
- qonnx.transformation.quant_constant_folding.is_quant_init(node: NodeProto, model: ModelWrapper)
qonnx.transformation.quantize_graph
- class qonnx.transformation.quantize_graph.QuantizeGraph(quantnode_map)
Bases:
Transformation
This transformation can be used to introduce a Quant node for a specific type of node in the graph. Users can specify the location of the Quant node by providing the input and output index as parameters.
- Expectations:
ONNX model in the ModelWrapper format.
Model must be cleaned using qonnx.util.cleanup.cleanup_model().
Batch size must be set.
- Steps to transform are:
Step 1: Find the input for the Quant node.
Step 2: Find the consumer of the Quant node output.
Step 3: Find the shape for the output tensor of the Quant node.
Note: The output tensor of the Quant node must have the same shape as the consumer of the input to the Quant node.
- Input:
A dict "quantnode_map" specifying the criterion, positions, and input parameters like scale, bitwidth, zeropoint, and others for a specific Quant node.
- Criterion:
- name: This will allow users to add Quant nodes for specific nodes like "Conv_0" and "Gemm_0". Note: using this, users can have Quant nodes with different parameters, e.g. quantizing "Conv_0" and "Conv_1" with bitwidths of 4 and 6, respectively.
- op_type: This will allow users to add Quant nodes for all nodes of a particular op_type, such as "Conv", "Gemm", and others. Note: all Quant nodes created using the op_type criterion will have the same input parameters (scale, zeropoint, bitwidth, and others).
- name and op_type: In this case, Quant nodes will be added with precedence given to "name" over "op_type".
- Positions: ("input", index) or ("output", index)
"input": indicates that the user wants to quantize the input of the selected node.
"output": indicates that the user wants to quantize the output of the selected node.
index: refers to the input/output index to quantize (a node can have multiple inputs and outputs).
Parameters (to quant node) are provided as (scale, zeropoint, bitwidth, narrow, signed, rounding_mode)
Inputs: scale, zeropoint, bitwidth.
Attributes: narrow, signed, rounding_mode.
- Assert:
The input is a dictionary with node names/op_types as keys and lists of quant positions as values.
The input dictionary must contain at least one MAC node (Conv, Gemm, MatMul) for the transformation.
- Return:
Returns a model with new quant nodes created at the positions specified using the “quantnode_map”.
- Example:
quantnode_map = {
    "name": {
        "Conv_0": [(("input", 0), (1, 0, 8, 0, 1, "ROUND")),
                   (("input", 1), (1, 0, 8, 0, 1, "ROUND")),
                   (("output", 0), (1, 0, 8, 0, 1, "ROUND"))],
        "Conv_1": [(("input", 0), (1, 0, 8, 0, 1, "ROUND"))],
        "Conv_2": [(("input", 1), (1, 0, 8, 0, 1, "ROUND")),
                   (("output", 0), (1, 0, 8, 0, 1, "ROUND"))]},
    "op_type": {
        "Gemm": [(("input", 0), (1, 0, 8, 0, 1, "ROUND")),
                 (("input", 1), (1, 0, 8, 0, 1, "ROUND")),
                 (("input", 2), (1, 0, 8, 0, 1, "ROUND")),
                 (("output", 0), (1, 0, 8, 0, 1, "ROUND"))]}}
- apply(model)
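With the quantnode_map from the example above, the transformation is applied as usual (assuming model is a cleaned ModelWrapper):

    from qonnx.transformation.quantize_graph import QuantizeGraph

    model = model.transform(QuantizeGraph(quantnode_map))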
- qonnx.transformation.quantize_graph.adjust_graph(model, input_positions, node_name, quantized_nodes)
- qonnx.transformation.quantize_graph.create_quantnode(model, quantnode_input, quantnode_output_shape, scale_value, zeropoint_value, bitwidth_value, narrow, signed, rounding_mode)
qonnx.transformation.rebalance_conv
- class qonnx.transformation.rebalance_conv.RebalanceIm2Col(extract_channels)
Bases:
Transformation
For certain hardware that prefers channel parallelism over feature map spatial parallelism, it is possible to reshape the inputs to an Im2Col node to move some of the spatial dimension into the channels dimension. This transformation attempts to find such Im2Col nodes, adds a Reshape node in front of each and alters their kernel/stride sizes accordingly. See the implementation for the full list of conditions checked; one example of rebalancing is provided in the unit test for this transformation (test_rebalance_conv.py).
- apply(model)
qonnx.transformation.remove
- class qonnx.transformation.remove.RemoveIdentityOps(atol=1e-05)
Bases:
Transformation
Remove identity ops like Add/Sub with zero or Mul/Div with one. A tolerance value (defaults to 1e-05) can be specified during init for the comparison to zero/one.
- apply(model)
- class qonnx.transformation.remove.RemoveUnusedNodes
Bases:
Transformation
Remove nodes which do not contribute to any top-level output in the graph, either directly or indirectly.
- apply(model: ModelWrapper)
- qonnx.transformation.remove.remove_node_and_rewire(model, node)
qonnx.transformation.resize_conv_to_deconv
- class qonnx.transformation.resize_conv_to_deconv.ResizeConvolutionToDeconvolution(maintain_bit_width: bool = False)
Bases:
Transformation
Replaces resize convolution layers (e.g., nearest neighbor upsample + same-padded convolution) with deconvolution layers using the weight convolution algorithm. Currently does not support resize convolutions that use bilinear or bicubic upsampling.
- apply(model)
qonnx.transformation.subpixel_to_deconv
- class qonnx.transformation.subpixel_to_deconv.SubPixelToDeconvolution
Bases:
Transformation
Replaces sub-pixel convolution layers (i.e., same-padded convolution + depth2space) with deconvolution layers using the weight shuffle algorithm. Currently does not support same-padded convolutions with biases.
- apply(model)
finn.transformation.move_reshape
- class finn.transformation.move_reshape.RemoveCNVtoFCFlatten
Bases:
Transformation
Removes a flatten node if it is between two fpgadataflow nodes. For an NHWC-Conv to FC transition, the preceding transpose is absorbed. The flatten operation can also be implemented by a reshape node.
- apply(model)