DeepStability

A Continuously Growing Database of Numerical Stability Vulnerabilities of Common Numerical Methods in Deep Learning

Index | Library | Commit hash | Language | Type of commit | Root Cause | Manifestation/End User Impact | IEEE arithmetic exception type | Background | Problem | DL Topic - level 1 | DL Topic - level 2 | DL Topic - level 3 | Patch type - level 1 | Patch type - level 2 | Patch type - level 3 | Old Solution | New Solution | Test | Math operation | References
1 PyTorch ac72881f3ff8c46c2a5cf8b09d02babf46bc4c85 CUDA Fix loss of precision inaccurate result of mean in batch normalization Inexact Sync batch norm applies Batch Normalization over an N-dimensional input (a mini-batch of [N-2]D inputs with an additional channel dimension)
y = ((x - E[x])/sqrt(Var[x] + epsilon)) * alpha + beta
numerical issue in CUDA channels-last SyncBatchNorm; apex SBN channels-last also has this issue data processing batch normalization batch normalization, Cuda rewrite math formula rewrite math formula Replace div_roundup() with ATenCeilDiv()
int div_roundup(int x, int y) {
   return lastPow2(1 + (x-1)/y);
}, where lastPow2 returns 2**floor(log2(n))
ATenCeilDiv(T a, T b) {
  return (a + b - 1) / b;
}
int div_roundup(int x, int y) {
   return lastPow2(1 + (x-1)/y);
}
static int lastPow2(unsigned int n) {
  n |= (n >> 1);
  n |= (n >> 2);
  n |= (n >> 4);
  n |= (n >> 8);
  n |= (n >> 16);
  return std::max<int>(1, n - (n >> 1));
}, where |= is a bitwise or operator
ATenCeilDiv(T a, T b) {
  return (a + b - 1) / b;
}
def _batch_norm_stats(self, data):
    mean1, _ = torch.batch_norm_stats(data, 1e-5)
    mean2, _ = torch.batch_norm_stats(data.to(memory_format=torch.channels_last), 1e-5)
    mean_ref = torch.mean(data, (0, 2, 3), keepdim=False)
    self.assertEqual(mean_ref, mean1)
    self.assertEqual(mean_ref, mean2)
division round up https://arxiv.org/abs/1502.03167
https://pytorch.org/docs/stable/generated/torch.nn.SyncBatchNorm.html
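A small Python re-implementation (hypothetical, for illustration only) of the two helpers from this entry, showing that the power-of-two rounding can return a different block count than an exact ceiling division:

def last_pow2(n):
    # largest power of two <= n, mirroring the C lastPow2 helper above
    p = 1
    while p * 2 <= n:
        p *= 2
    return p

def div_roundup(x, y):
    # old helper: ceiling division rounded to a power of two
    return last_pow2(1 + (x - 1) // y)

def aten_ceil_div(a, b):
    # new helper: exact ceiling division
    return (a + b - 1) // b

print(div_roundup(10, 4), aten_ceil_div(10, 4))  # 2 vs 3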
2 PyTorch dfc7fa03e5d33f909b9d7853dd001086f5d782a0 Python Fix loss of precision inaccurate result of gradient Inexact lower–upper (LU) decomposition (also called LU factorization) factors a matrix as the product of a lower triangular matrix and an upper triangular matrix. It is a procedure for decomposing an N×N matrix A into a product of a lower triangular matrix L and an upper triangular matrix U, LU=A.
Matrix A = LU. In the lower triangular matrix all elements above the diagonal are zero, in the upper triangular matrix, all the elements below the diagonal are zero.
LU decomposition is an efficient method used for solving a system of linear equations. Suppose we have B=AX and want to solve for X.  (The solution could be X = inverse(A)B. But a matrix inverse is numerically unstable.) Find LU decomposition of A, A = LU. So, B=AX=LUX. Then solve for X with two equations: (1) LY = B and (2) UX = Y
Matrix inverse is numerically unstable, as a result numerical and analytical gradients for LU decomposition are too different.

gradients for the LU decomposition calculation are unstable; lu_backward is implemented via autograd
torch.det is using LU in forward, while det_backward is using svd_backward (singular value decomposition).
The issue with svd_backward is that it is only stable for inputs with distinct singular values. As a result, TestGradientsCuda::test_fn_gradgrad_linalg_det_cuda_float64, which compares the numerical and analytical gradients, fails on Windows with a GPU. svd_backward is only stable for ranks n - 1 <= r <= n with singular values sufficiently far away from each other.
gradients/derivatives automatic differentiation gradients for the LU decomposition, backward pass, autograd, linear algebra operations, determinant of a square matrix use a different algorithm use a different algorithm Replace the matrix inverse with solutions to systems of linear triangular equations. A "triangular" system of equations is one whose equations form a triangle: each successive equation involves only the later variables.
However, this works only for square matrices of full rank.
-        I = LU_grad.new_zeros(LU_grad.shape)
-        I.diagonal(dim1=-2, dim2=-1).fill_(1)
-        Lt_inv = torch.triangular_solve(I, L, upper=False).solution.transpose(-1, -2)
-        Ut_inv = torch.triangular_solve(I, U, upper=True).solution.transpose(-1, -2)
-
-        phi_L = (L.transpose(-1, -2) @ LU_grad).tril_()
-        phi_U = (LU_grad @ U.transpose(-1, -2)).triu_()
-
-        self_grad_perturbed = Lt_inv @ (phi_L + phi_U) @ Ut_inv
-        return P @ self_grad_perturbed, None, None
+        phi_L = (L.transpose(-1, -2).conj() @ LU_grad).tril_()
+        phi_U = (LU_grad @ U.transpose(-1, -2).conj()).triu_()
+        phi = phi_L + phi_U
+        X = torch.triangular_solve(phi, L.transpose(-1, -2).conj(), upper=True).solution
+        A_grad = torch.triangular_solve(X.transpose(-1, -2).conj() @ P.transpose(-1, -2), U, upper=True) \
+            .solution.transpose(-1, -2).conj()
+
+        return A_grad, None, None
def sample_inputs_lu(op_info, device, dtype, requires_grad=False):
+    # not needed once OpInfo tests support Iterables
+    def generate_samples():
+        batch_shapes = ((), (3,), (3, 3))
+        for batch_shape, get_infos in product(batch_shapes, (True, False)):
+            shape = batch_shape + (S, S)
+            input = make_tensor(shape, device, dtype, requires_grad=requires_grad, low=None, high=None)
+            yield SampleInput(input, args=(True, get_infos))
+
+    return list(generate_samples())
matrix inverse, autograd
3 PyTorch 8e507ad00ebdfd0ae84bc03718e9c2cb74b8573b yaml Fix overflow/underflow/loss of precision Inaccurate result overflow, underflow, inexact This script defines derivative formulas and Python signatures of methods on Variables The division formula in the backward pass is unstable because squaring the divisor can lose precision: when the divisor being squared is very large or very small, precision is lost. For extremely large values the squared divisor may overflow and evaluate to inf; for extremely small values it will underflow and evaluate to 0. gradients/derivatives automatic differentiation backward pass, autograd, division, derivative, higher order gradients rewrite math formula rewrite math formula Instead of dividing by other squared, divide by other twice. Mathematically x / y^2 = x / y / y, but if y is a large finite-precision floating-point number, computing y^2 may lose precision. Successive divisions achieve the same result while losing less precision for large values of y. other: -grad * self / (other * other) other: -grad * (self / other) / other division
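A minimal Python sketch (plain IEEE-754 doubles, illustrative values only) of why dividing by the squared value is riskier than dividing twice:

x, y = 1e300, 1e200
print(x / (y * y))   # 0.0: y * y overflows to inf, so the quotient collapses to 0
print((x / y) / y)   # 1e-100: dividing twice never forms the huge intermediate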
4 PyTorch fe5d23cf4a9d8f673fb1bfc6e84c642fb6a23182 C++ Fix loss of precision incorrect result and NaN Inexact Cosine Similarity measures the cosine of the angle between two non-zero vectors of an inner product space. This similarity measurement is particularly concerned with orientation, rather than magnitude. In short, two vectors that are aligned in the same orientation will have a similarity measurement of 1, whereas two vectors aligned perpendicularly will have a similarity of 0. If two vectors are diametrically opposed, meaning they are oriented in exactly opposite directions (i.e. back-to-back), then the similarity measurement is -1. Often, however, Cosine Similarity is used in positive space, between the bounds 0 and 1. Cosine Similarity is not concerned with, and does not measure, differences in magnitude (length); it is only a representation of similarity in orientation. The cosine similarity implementation may lose precision and return a value greater than 1.0, which is incorrect, because cosine similarity outputs lie in the range -1 to 1. linear algebra distance cosine similarity distance rewrite math formula rewrite math formula Use x / sqrt(x * x) instead of x / (sqrt(x) * sqrt(x)), following the scipy implementation -  Tensor n12 = (w1 * w2).rsqrt_().clamp_max(1.0 / eps);
-  return w12.mul_(n12);
  Tensor n12 = (w1 * w2).clamp_min_(eps * eps).sqrt_();
+  return w12.div_(n12);
  # Check dividing by 0.
+        input1 = torch.randn(10).requires_grad_()
+        input2 = torch.zeros_like(input1).requires_grad_()
+        torch.cosine_similarity(input1, input2, 0).sum().backward()
+        self.assertEqual(input1.grad, torch.zeros_like(input1))
+        self.assertEqual(input2.grad, input1 * 1e8)
reciprocal of square root
5 Tensorflow/Keras 646d25d15910dc5cc3532aebb7e8395487adad4f C++ Fix overflow/underflow softmax output is NaN overflow, underflow Softmax is a normalized exponential function that takes a vector of n real values as input and outputs a vector of n real values that represent a probability distribution and sum up to 1. In deep learning classifiers, softmax is used in the last layer, because it normalizes the output of the prior network layer, a vector with size n, to a probability distribution over n predicted output classes. Direct calculation of the softmax function according to its definition formula is fraught with numerical issues. The single-precision exp(x) function overflows for x > 89 and underflows for x < −104, which in turn causes NaN outputs in naïve implementations. activation functions activation functions softmax, metal GPU acceleration use a different algorithm use a different algorithm Implement a three-pass softmax algorithm; see the algorithm in https://arxiv.org/pdf/2001.04438.pdf softmax https://arxiv.org/pdf/2001.04438.pdf
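For reference, a hedged numpy sketch of the textbook max-subtraction ("safe") softmax -- not necessarily the exact algorithm of the referenced paper or commit, but it shows how the overflow is avoided:

import numpy as np

def softmax_naive(x):
    e = np.exp(x)                # overflows to inf for large x
    return e / e.sum()

def softmax_safe(x):
    e = np.exp(x - x.max())      # arguments to exp are <= 0, so no overflow
    return e / e.sum()

x = np.array([1000.0, 1001.0], dtype=np.float32)
print(softmax_naive(x))          # [nan nan] (with overflow warnings)
print(softmax_safe(x))           # approx. [0.269 0.731]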
6 Tensorflow/Keras a3d726ae8246371515a0f666c38668e9da7765f9 C++ Fix underflow error due to divide by zero invalid operation, underflow In centered RMSProp, the gradient is normalized by an estimate of its variance. The denominator in the centered RMSProp optimizer does not add a small epsilon as the last operation, so the epsilon is not effective at preventing underflow. Given the current formula ms + eps - mg.square, if ms and mg.square are of very similar magnitude, subtracting the two similar numbers leads to loss of significant digits and a risk of underflow; because the epsilon was added to ms before the subtraction, it does not prevent this. optimizers optimizers centered RMSprop optimizer rewrite math formula rewrite math formula Rewrite the order of operations. Reordered the sum (ms - mg^2 + epsilon) to add epsilon last for numerical stability both on CPU and GPU. auto denom = ms + epsilon() - mg.square();

auto denom = epsilon.reshape(single).broadcast(bcast) + ms - mg.square().sqrt()
auto denom = (ms - mg.square()) + epsilon()

auto denom = (ms - mg.square()) + epsilon.reshape(single).broadcast(bcast)
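A minimal float32 sketch (illustrative values only) of the cancellation the patch addresses: adding epsilon before the subtraction lets it be absorbed into ms, while adding it last keeps it:

import numpy as np

ms  = np.float32(1.0)          # second-moment estimate
mg2 = np.float32(1.0)          # squared mean-gradient estimate, nearly equal to ms
eps = np.float32(1e-10)

old = ms + eps - mg2           # 1.0 + 1e-10 rounds back to 1.0 in float32 -> 0.0
new = (ms - mg2) + eps         # epsilon is added last and survives -> 1e-10
print(old, new)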
7 PyTorch 6a458512c22c908b19f49262fd0f32a14425ec80 C++ Fix loss of precision assertion error Inexact static_cast converts the type of a variable
static_cast can perform conversions between pointers to related classes, not only upcasts (from pointer-to-derived to pointer-to-base), but also downcasts (from pointer-to-base to pointer-to-derived). No checks are performed during runtime to guarantee that the object being converted is in fact a full object of the destination type.
The test test_computes_cubic_kernel fails with an assertion error because the value is expected to be less than 1e-5 but is slightly larger: 1.0790e-05. The cause is the precision of a variable returned by a function that performs the power operation (x to the power of y).

On x86_64 a long double will utilize the x87 unit's special and proprietary 80-bit float (the 8087 was the floating point co-processor of the 8086; today it sits on the same die as a modern amd64 processor). This 80-bit floating point type is not one of the standard IEEE 754 interchange formats. Even though it has more bits of precision, its lack of standardization and its niche nature mean that it is often the cause of stability issues, and it is not worth using.
tensor math tensor math power, low level math increase variable precision/change variable type increase variable precision/change variable type Stop using long doubles; they will only cause trouble. Instead, just use the same type as the function input. power https://en.wikipedia.org/wiki/X87#Performance
8 Tensorflow/Keras d4b5c606fc9fbd1a20b5b113b4bc831f31d889a3 Python fix loss of precision Dividing by a number that is squared results in dividing by a very large or small number. The square operation could overflow or underflow respectively, and even if it does not, there is a risk of loss of precision from dividing values of very different magnitudes gradients/derivatives gradients gradient rewrite math formula rewrite math formula Avoid a squared value in the denominator and rewrite the division as (-x/y)/y instead of -x/y^2. They are mathematically equivalent, but the first formula avoids dividing by very large or very small numbers. Proof that they are mathematically equivalent: (-x/y)/y = (-x/y)*(1/y) = -x/(y^2) math_ops.reduce_sum(grad * math_ops.div(-x, math_ops.square(y))) math_ops.reduce_sum(grad * math_ops.div(math_ops.div(-x, y), y)) division
9 Tensorflow/Keras 2411514c726f4ccd98e864e8b2e253e6df99c39d C++ fix loss of precision The formula for dequantization in the quantization range for multiplication is numerically unstable quantization quantization dequantization rewrite math formula rewrite math formula Rewrite the order of operations. Specifically, rewrite q_range_min + (input_array - q_lowest) * q_range_scale to the following: (q_range_min - q_lowest * q_range_scale) + input_array * q_range_scale, which is mathematically equivalent #define DEQUANTIZE_WITH_EIGEN(input_array, q2f)                       \
-  (q2f.range_min +                                                    \
-   (((input_array.template cast<float>() - q2f.lowest_quantized())) * \
-    q2f.range_scale));
#define DEQUANTIZE_WITH_EIGEN(input_array, q2f)                 \
+  ((q2f.range_min - q2f.lowest_quantized() * q2f.range_scale) + \
+   input_array.template cast<float>() * q2f.range_scale)
// Test for signed 32 bit.
+  // Note that we cannot use input mins and maxes that match the range because
+  // there are 7 too few bits of mantissa accuracy in floats to represent
+  // 2**31-1 accurately.  Also there is no good fraction to use because 2**31-1
+  // is a mersenne prime.
+  Tensor input32(DT_QINT32, TensorShape({input_height, input_width}));
+
+  // Use a quantizer centered at 0.
+  float input_range = 1LL << 25;
+  int64 num_levels = (1LL << 32) - 1;
+  float step_size =
+      static_cast<float>(static_cast<double>(input_range) / num_levels);
+  float q_compatible_min_value =
+      roundf(-(input_range / 2.0) / step_size) * step_size;
+  float q_compatible_max_value = q_compatible_min_value + input_range;
+  test::FillValues<qint32>(&input32, {-16384, 0, 16256, -13440, -13312, -13184,
+                                      14720, 14848, 14976});
+
+  Tensor output32 = QuantizedTensorToFloat<qint32>(
+      input32, q_compatible_min_value, q_compatible_max_value);
+  test::FillValues<float>(&expected, {-128.0f, 0.0f, 127.0f, -105.0f, -104.0f,
+                                      -103.0f, 115.0f, 116.0f, 117.0f});
+  // The quantization error in going between 1<<25 and 1<<32 levels.
+  const double kTolerance = .5 / 128.0;
+  test::ExpectTensorNear<float>(expected, output32, kTolerance);
10 PyTorch 43ab91118226b330be6d2274a154b98da233d879 C Fix loss of precision Inaccurate result Inexact Dirichlet distribution is a family of continuous multivariate probability distributions parameterized by a vector Alpha of positive reals. It is a multivariate generalization of the beta distribution, hence its alternative name of multivariate beta distribution (MBD). Dirichlet distributions are commonly used as prior distributions in Bayesian statistics, and in fact the Dirichlet distribution is the conjugate prior of the categorical distribution and multinomial distribution.
In Bayesian probability theory, if the posterior distribution p(θ|x) and the prior distribution p(θ) are from the same probability distribution family, then the prior and posterior are called conjugate distributions, and the prior is the conjugate prior for the likelihood function.

The saddle point technique is a method for deriving an accurate approximation for the probability density function of the mean of a random sample. A point that is not a local extremum yet has zero gradient is called a saddle point; such points can occur in non-convex functions.
low precision of gradient approximation in Dirichlet distribution statistical distributions statistical distributions distributions, Dirichlet distribution, gradient approximation use a different algorithm use a different algorithm Use Taylor expansion and Rice saddle point expansion to approximate gradient and use higher precision types for that computation https://en.wikipedia.org/wiki/Dirichlet_distribution
11 PyTorch ae1a972d78950abc4dab372f496914b5e78b9637 C++ Fix loss of precision inaccurate result Log softmax is an activation function used in the last layer of a neural network that outputs log probabilities loss of precision in log_softmax cpu code when inputs are big but their differences are small activation functions activation functions log softmax rewrite math formula rewrite math formula Rewrite the order of operations to avoid losing significant digits when a small number is added to a much larger one. Change the order of operations so that the large number is first subtracted from the other large number before the small number is added. tmpsum = max_input + std::log(tmpsum);
output_data[d * dim_stride] = input_data[d * dim_stride] - tmpsum;
tmpsum = std::log(tmpsum);
output_data[d * dim_stride] =
+                  input_data[d * dim_stride] - max_input - tmpsum;
log(exp(x_i)/sum(exp(x))) log, exp, division, sum
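A hedged numpy sketch of the reordering above (float32, illustrative inputs): subtracting the large max before adding the small log-sum keeps the small term from being absorbed:

import numpy as np

def log_softmax_old(x):
    tmpsum = x.max() + np.log(np.exp(x - x.max()).sum())   # large + small: the small term is lost
    return x - tmpsum

def log_softmax_new(x):
    tmpsum = np.log(np.exp(x - x.max()).sum())
    return x - x.max() - tmpsum                             # large - large first, then - small

x = np.array([1e8, 1e8], dtype=np.float32)
print(log_softmax_old(x))   # [0. 0.]            -- wrong
print(log_softmax_new(x))   # [-0.6931 -0.6931]  -- correct, log(1/2)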
12 PyTorch 0c588a500b2219c028eefe595cff0829fd982f52 Python Fix loss of precision SigmoidCrossEntropyWithLogits computes sigmoid cross entropy given logits. Sigmoid cross-entropy is a Sigmoid activation plus a Cross-Entropy loss. Using sigmoid followed by a multinomial logistic loss layer can be less stable than a single layer of sigmoid cross entropy with logits loss functions loss functions, activation functions cross entropy, sigmoid use a different algorithm use a different algorithm Use a single layer of sigmoid cross entropy with logits instead. Replace sigmoid + xent loss with SigmoidCrossEntropyWithLogits. The sigmoid layer computes the multinomial logistic loss of the sigmoid of its inputs. It's conceptually identical to a sigmoid layer followed by a multinomial logistic loss layer, but provides a more numerically stable gradient.
13 PyTorch 3d06a1e075ef0e6f4bf862d13e83cdd4b02dbc32 Cuda Fix loss of precision Welford’s method is a usable single-pass method for computing the variance. It can be derived by looking at the differences between the sums of squared differences for N and N-1 samples. Algorithm:
variance(samples):
  M := 0
  S := 0
  for k from 1 to N:
    x := samples[k]
    oldM := M
    M := M + (x-M)/k
    S := S + (x-M)*(x-oldM)
  return S/(N-1)
THCTensor_varInnermostDim numerically unstable tensor math tensor math low level tensor math, variance calculation, GPU use a different algorithm use a different algorithm Make THCTensor_varInnermostDim numerically stable using Welford's algorithm (#3425)
    * Use Welford's algorithm when reducing along inner dimension for THCTensor's variance fn
    * Use accreals in THCTensor's varInnermostDim
variance https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Welford's_online_algorithm
14 PyTorch 638f0b5d78fe5ff2e484dc573c35b97a4bcf4e82 Python Fix invalid input loss = NaN invalid operation Negative log likelihood loss with Poisson distribution of target. The Poisson distribution is used to model the number of events occurring within a given time interval.
target ∼ Poisson(input); loss(input, target) = input − target * log(input) + log(target!)
log(0) produces NaN in the poisson negative log likelihood loss function loss functions loss functions loss, poisson negative log likelihood loss rewrite math formula rewrite math formula Add a small epsilon to prevent log(0), following the Keras implementation; eps defaults to 1e-8 and is used when log_input=False. Old: `input - target * log(input)`.
New: `input - target * log(input + eps)`.
log https://pytorch.org/docs/stable/generated/torch.nn.PoissonNLLLoss.html
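A minimal numpy sketch (illustrative only) of the log(0) case the epsilon guards against when log_input=False:

import numpy as np

inp    = np.array([0.0])
target = np.array([0.0])
eps    = 1e-8

old = inp - target * np.log(inp)         # log(0) = -inf and 0 * -inf = nan (numpy warns)
new = inp - target * np.log(inp + eps)   # finite
print(old, new)                          # [nan] [0.]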
15 PyTorch 81b995514ea908b635d725e11d1b91ac7ad03eb0 C Fix overflow/loss of precision Welford’s method is a usable single-pass method for computing the variance. It can be derived by looking at the differences between the sums of squared differences for N and N-1 samples. Algorithm:
variance(samples):
  M := 0
  S := 0
  for k from 1 to N:
    x := samples[k]
    oldM := M
    M := M + (x-M)/k
    S := S + (x-M)*(x-oldM)
  return S/(N-1)
numerical stability of std and var of THTensor,  formulas for the variance may involve sums of squares, which causes loss of precision or overflow when dealing with large values tensor math tensor math low level tensor math, variance and standard deviation calculation, CPU use a different algorithm use a different algorithm Use Welford’s algorithm for better numerical stability tensor = torch.FloatTensor([1.0, 2.0, 3.0])
        self.assertEqual(tensor.var(unbiased=True), 1.0)
        self.assertEqual(tensor.var(unbiased=False), 2.0 / 3.0)
variance, standard deviation
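A runnable Python version of the Welford recurrence quoted in the two entries above (a sketch, not the THTensor/THCTensor code):

def welford_variance(samples):
    mean, m2 = 0.0, 0.0
    for k, x in enumerate(samples, start=1):
        old_mean = mean
        mean += (x - mean) / k               # running mean
        m2 += (x - mean) * (x - old_mean)    # running sum of squared deviations
    return m2 / (len(samples) - 1)           # unbiased variance

print(welford_variance([1.0, 2.0, 3.0]))     # 1.0, matching tensor.var(unbiased=True) above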
16 PyTorch 455038e470dd60dae45f68948ae876b1931a8bf0 Cuda Fix overflow/underflow Spatial logsoftmax computes the log of spatial softmax. Spatial softmax returns the expected pixel locations of each feature map in a CNN and hence would be better described as spatial soft-argmax. It is defined in https://arxiv.org/pdf/1504.00702.pdf.

Each output channel of the softmax is a probability distribution over the location of a
feature in the image. To convert from this distribution to a coordinate representation
(fcx, fcy), the network calculates the expected image position of each feature, yielding a
2D coordinate for each channel.

s_cij = e^(a_cij) / sum_{i',j'} e^(a_ci'j'), where i and j are coordinates specifying a location in an image
The spatial log softmax in the CUDA backend for the Neural Network Package is not stable activation functions activation functions spatial log softmax, CNN rewrite math formula rewrite math formula The patch subtracts the maximum input value inside exp() while accumulating the sum; this ensures that the argument of exp() is never too large. sum += THCNumerics<T>::exp(input[inputStartIndex + i]);
sum = AccumT(1) / sum;
output[outputIndex] = ScalarConvert<AccumT, T>::to(
-        THCNumerics<AccumT>::log(sum * THCNumerics<T>::exp(input[inputStartIndex + i])));
T maxInput = input[inputStartIndex];
+    for (int i = 1; i < classSize; i++) {
+      T value = input[inputStartIndex + i];
+      maxInput = THCNumerics<T>::ge(maxInput, value) ? maxInput : value;
+    }
+      sum += THCNumerics<T>::exp(input[inputStartIndex + i] - maxInput);
+    T logsum = maxInput + ScalarConvert<AccumT, T>::to(THCNumerics<AccumT>::log(sum));
+      output[outputIndex] = input[inputStartIndex + i] - logsum;

spatial logsoftmax, log, exp, scalar convert https://arxiv.org/pdf/1504.00702.pdf
17 PyTorch c010ef7f0c6d837809a7e973048afac76373e3de Cuda Fix overflow A CUDA block is a group of threads that execute the same task. CUDA blocks are grouped into a grid. A kernel (i.e., a CUDA function) is executed as a grid of blocks of threads. Overflow issue in the GET_BLOCKS CUDA function that returns the number of blocks used for scheduling blocks on the CUDA device (i.e., an Nvidia GPU), because the addition on N can overflow for large N. other Cuda blocks Cuda thread scheduling rewrite math formula rewrite math formula Rather than directly adding to N, rearrange the operations to shrink N first. (N + CUDA_NUM_THREADS - 1) / CUDA_NUM_THREADS; auto block_num = (N - 1) / CUDA_NUM_THREADS + 1; division
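A Python sketch that simulates 32-bit signed arithmetic (C signed overflow is actually undefined behavior, so this is only an illustration) to show why adding to N before dividing can overflow while shrinking N first cannot:

CUDA_NUM_THREADS = 512

def wrap_int32(v):
    # emulate two's-complement wrap-around of a 32-bit signed int
    return (v + 2**31) % 2**32 - 2**31

N = 2**31 - 100                                                  # large but representable element count
old = wrap_int32(N + CUDA_NUM_THREADS - 1) // CUDA_NUM_THREADS   # negative: the addition wrapped
new = (N - 1) // CUDA_NUM_THREADS + 1                            # 4194304 blocks, no overflow
print(old, new)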
18 PyTorch 6be3e5d3bb00a288da51bd368c5342c8676bbcf7 Python Unit test loss of precision One of the basic units of computation in Caffe2 is the Operator. Operators in Caffe2 are similar to functions. Caffe (Convolutional Architecture for Fast Feature Embedding) is a deep learning framework originally developed at the University of California, Berkeley. It is open source, under a BSD license, and is written in C++ with a Python interface. Unstable formula for updating the gradient and momentum in the Adagrad optimizer in the Caffe2 operators test script optimizers optimizers adagrad testing, optimizer, gradients, caffe2, weight decay, momentum rewrite math formula rewrite math formula Rewrite the formula; specifically, update the gradient through a temporary variable (change x += y to temp = x + y) grad += weight_decay * param_in_f32 grad_temp = grad + weight_decay * param_in_f32
19 PyTorch 0b7e8323256e56728e1ffc9ee5d701987af3d06c C++ Unit test overflow The primary difference between const and constexpr variables is that the initialization of a const variable can be deferred until run time. A constexpr variable must be initialized at compile time. signed integer overflow of the variable range other random number generator random number generator testing increase variable precision/change variable type increase variable precision/change variable type Change the type of the variable range from a signed to an unsigned 64-bit int,
and change the type from const auto to constexpr
const int64_t max_val = std::is_floating_point<T>::value ? int64_max_val : static_cast<int64_t>(t_max_val);
range = *to - from;
range = max_val - from + 1;
range = static_cast<uint64_t>(*to) - static_cast<uint64_t>(from);
range = static_cast<uint64_t>(max_val) - static_cast<uint64_t>(from) + 1;
20 PyTorch 470c496eb224bdd735eea1accf7269dfdd87d49f Python Fix loss of precision Cholesky inverse = Compute inverse of Hermitian positive definite matrix using Cholesky factorization
inverse(S) = inverse(LL*)
In multivariate normal distribution class, there is a function for computing the precision matrix that uses inverse, which is numerically unstable statistical distributions statistical distributions multivariate normal distribution, precision matrix use a different algorithm use a different algorithm Replace the naive inverse with a cholesky inverse for improved stability -        scale_tril_inv = torch.inverse(self._unbroadcasted_scale_tril)
-        return torch.matmul(scale_tril_inv.transpose(-1, -2), scale_tril_inv).expand(
identity = torch.eye(self.loc.size(-1), device=self.loc.device, dtype=self.loc.dtype)
+        # TODO: use cholesky_inverse when its batching is supported
+        return torch.cholesky_solve(identity, self._unbroadcasted_scale_tril).expand(
matrix inverse
21 PyTorch 071971476d7431a24e527bdc181981678055a95d Python Fix overflow torch.clamp(input, min, max, *, out=None) → Tensor
Clamp all elements in input into the range [ min, max ].
The Binomial distribution class encounters overflow when logits are large. Note: the binomial distribution is parametrized by logits statistical distributions statistical distributions Binomial distribution, log probability rewrite math formula rewrite math formula Rewrite the equation for the log_prob method and use a custom clamp function on the logits. The custom clamp function works like torch.clamp(x, min=0), except that its gradient at x = 0 is 0.5 instead of 0.
-        return (log_factorial_n - log_factorial_k - log_factorial_nmk +
-                value * self.logits - self.total_count * torch.log1p(self.logits.exp()))
def _clamp_by_zero(x):
+    # works like clamp(x, min=0) but has grad at 0 is 0.5
+    return (x.clamp(min=0) + x - x.clamp(max=0)) / 2

+        normalize_term = (self.total_count * _clamp_by_zero(self.logits)
+                          + self.total_count * torch.log1p(torch.exp(-torch.abs(self.logits)))
+                          - log_factorial_n)
+        return value * self.logits - log_factorial_k - log_factorial_nmk - normalize_term

def test_binomial_stable(self):
+        logits = torch.tensor([-100., 100.], dtype=torch.float)
+        total_count = 1.
+        x = torch.tensor([0., 0.], dtype=torch.float)
+        log_prob = Binomial(total_count, logits=logits).log_prob(x)
+        self.assertTrue(torch.isfinite(log_prob).all())
+
+        # make sure that the grad at logits=0, value=0 is 0.5
+        x = torch.tensor(0., requires_grad=True)
+        y = Binomial(total_count, logits=x).log_prob(torch.tensor(0.))
+        self.assertEqual(grad(y, x)[0], torch.tensor(-0.5))
22 PyTorch 3dcc329746223bc24f8213ccbaa5eba09273e162 C++ Fix loss of precision Inaccurate result Inexact Summation of numbers should be performed from smallest to largest to avoid loss of significant digits Loss of precision and floating point truncation in the summation formula. Summing many floating point values can lead to loss of precision if the values are of different orders of magnitude. tensor math tensor math summation, tensor math use a different algorithm use a different algorithm Use a tree-based approach where items of similar orders of magnitude are summed together to avoid numerical instability.

This algorithm does the summation along a single axis with multiple "levels" of accumulator, each of which is designed to hold the sum of an order of magnitude more values than the previous. e.g. if there are 2^16 elements, the first level will hold the sum of 2^4 elements, and so on in increasing powers of 2: 2^4, 2^8, 2^12 and finally 2^16. This limits the differences in magnitude of the partial results being added together, and so we don't lose accuracy as the axis length increases.
A simplified recursive implementation would look like this:
+
+  scalar_t row_sum(const scalar_t * data, int64_t n) {
+    // Note, in practice the chunk size can increase with n
+    // This allows the recursion depth to be limited to O(1).
+    constexpr int64_t min_chunk_size = 16;
+
+    scalar_t sum = 0;
+    if (n <= min_chunk_size) {
+      // Recursive base case, calculate a simple running sum
+      for (int64_t i = 0; i < n; ++i) {
+        sum += data[i];
+      }
+      return sum;
+    }
+
+    // Recursively sum larger chunks of elements
+    const int64_t chunk_size = std::max(divup(n, min_chunk_size), min_chunk_size);
+    for (int64_t i = 0; i < n; i += chunk_size) {
+      sum += row_sum(data + i, std::min(chunk_size, n - i));
+    }
+    return sum;
+  }
ASSERT_NEAR(norm_after, max_norm, 1e-6); sum
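A hedged float32 illustration of the underlying effect (using numpy's pairwise summation as a stand-in for the multi-level accumulators; this is not the PyTorch code): a single running float32 accumulator drifts badly once the partial sum dwarfs the summands:

import numpy as np

a = np.full(10**6, 0.1, dtype=np.float32)

naive = np.float32(0.0)
for v in a:                 # one running accumulator, the effect the patch avoids
    naive += v

print(naive)                # roughly 100958: far from the true 100000
print(a.sum())              # numpy sums pairwise, so the result stays close to 100000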
23 PyTorch d16c8238e164c6499714de625eb73422382e5ec1 Python Fix overflow/underflow Inaccurate result, NaN overflow, underflow, inexact The softmax function turns a vector of K real values into a vector of K real values that sum to 1. The input values can be positive, negative, zero, or greater than one, but softmax transforms them into values between 0 and 1, so that they can be interpreted as probabilities. The implementation of softmax for certain cases (when the dim argument of softmax does not equal ndim - 1, i.e. the last dimension) is numerically unstable. Large inputs into the exponential function produce infinity and the output of softmax becomes NaN. activation functions activation functions softmax use a different algorithm use a different algorithm Transpose the input to allow using ONNX's numerically stable softmax implementation softmax exp
24 PyTorch b403b10ff98a6bc1a238e7ba4eee6393b6b89048 C++ Fix loss of precision categorical cross entropy yields inaccurate results Inexact When a small float is subtracted from a large float, the large float does not change in value (as it should mathematically). logsoftmax does not work for large logits; as a result nn.CrossEntropyLoss() yields incorrect results for big logits loss functions loss functions logsoftmax, cross entropy loss rewrite math formula rewrite math formula Rewrite the formula considering the maximum input. If we add a very small number to a large one, the small number will be ignored. Example: tmpsum = 1e8 + log(2) = 1e8. Numerically with float precision the log(2) is ignored, so at the end we basically have 1e8 - (1e8 + log(2)) = 0 instead of -log(2). [tmp_sum](Vec x) { return x - Vec(tmp_sum); } [tmp_sum](Vec x) { return x - Vec(tmp_sum); } def test_log_softmax(self):
+        x_small = torch.ones(1, 2, dtype=torch.float32)
+        x_big = x_small + 1e16
+        self.assertEqual(F.log_softmax(x_small, -1), F.log_softmax(x_big, -1))
log softmax subtraction
25 PyTorch f8cab38578a99ad04d23256c2da877db4814f76f Python Fix invalid operation ? Only a positive definite matrix has a unique Cholesky factorization A = R^T R, where R is upper triangular with positive diagonal elements. A positive definite matrix is a symmetric matrix with all positive eigenvalues. Cholesky decomposition is roughly twice as efficient as LU decomposition for solving systems of linear equations. In A = R^T R, R is called the Cholesky factor of A. The matrix inverse triggers a Cholesky error because the matrix is not positive definite. Also, the matrix inverse can cause numerical instability. statistical distributions statistical distributions Gaussian distribution rewrite math formula rewrite math formula only take the inverse of a triangular matrix def _precision_to_scale_tril(P):
+    # Ref: https://nbviewer.jupyter.org/gist/fehiepsi/5ef8e09e61604f10607380467eb82006#Precision-to-scale_tril
+    Lf = torch.cholesky(torch.flip(P, (-2, -1)))
+    L_inv = torch.transpose(torch.flip(Lf, (-2, -1)), -2, -1)
+    L = torch.triangular_solve(torch.eye(P.shape[-1], dtype=P.dtype, device=P.device),
+                               L_inv, upper=False)[0]
+    return L
matrix inverse
26 PyTorch c1790fa202f30e3aca1d1ecb31f26e0b3bb1e69f Cuda, C++ Fix loss of precision Linear interpolation is a method of curve fitting using linear polynomials to construct new data points within the range of a discrete set of known data points. Parameters of lerp: a, b, t; output: a + t * (b - a). The parameter t defines where to estimate the value on the interpolated line; it is 0 at the first point and 1 at the second point, and ranges between 0 and 1 for interpolated values between the two points. https://en.wikipedia.org/wiki/Linear_interpolation#Programming_language_support unstable formula for linear interpolation tensor math linear interpolation linear interpolation rewrite math formula rewrite math formula rewrite formula
// Imprecise method, which does not guarantee v = v1 when t = 1, due to floating-point arithmetic error. This method is monotonic
// This form may be used when the hardware has a native fused multiply-add instruction.
float lerp(float v0, float v1, float t) {
  return v0 + t * (v1 - v0);
}

// Precise method, which guarantees v = v1 when t = 1. This method is monotonic only when v0 * v1 < 0. Lerping between same values might not produce the same value
float lerp(float v0, float v1, float t) {
  return (1 - t) * v0 + t * v1;
}
ret_val = self_val + weight_val * (end_val - self_val);
ret_val = (weight_val < 0.5) ?
            self_val + weight_val * (end_val - self_val) : end_val - (end_val - self_val) * (1 - weight_val);
a + t * (b - a) linear interpolation
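A plain-Python (double precision) illustration of the endpoint issue quoted above: the one-multiply form can miss v1 at t = 1, while the two-multiply form reproduces it exactly:

def lerp_imprecise(v0, v1, t):
    return v0 + t * (v1 - v0)

def lerp_precise(v0, v1, t):
    return (1 - t) * v0 + t * v1

v0, v1 = 1e20, 1.0
print(lerp_imprecise(v0, v1, 1.0))  # 0.0: v1 is rounded away when forming v1 - v0
print(lerp_precise(v0, v1, 1.0))    # 1.0: returns v1 exactly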
27 PyTorch e17b8dea1dd30bef55b314b0217f79ce22a13cf9 C++ Fix overflow In C and C++, integer literals are interpreted as an `int` type unless specified otherwise by using a trailing L for long and LL for long long, e.g., 42 is an int, 42L is a long, 42LL is a long long. On x86_64 systems using GNU toolchains on Linux, this is 32, 64, and 64 bits respectively.
$ cat long.c
#include <stdio.h>

int main(void) {
    printf("sizeof(42) = %lu\n", sizeof(42));
    printf("sizeof(42L) = %lu\n", sizeof(42L));
    printf("sizeof(42LL) = %lu\n", sizeof(42LL));
    return 0;
}
(py38) kyle@fulltower:~
$ gcc long.c
(py38) kyle@fulltower:~
$ ./a.out
sizeof(42) = 4
sizeof(42L) = 8
sizeof(42LL) = 8
The accumulator overflows because the starting value of the accumulation has too small a type (int) to accommodate input sizes that are common in PyTorch. The calculation of the number of elements (e.g., number of batches) overflows because the return type does not have enough precision to hold the result. linear algebra linear algebra linear algebra, distance increase variable precision/change variable type change variable type Use a 64 bit type for the accumulator.
-  int64_t numel = std::accumulate(oldshape.begin(), oldshape.end(), 1,
-                                  std::multiplies<int64_t>());
const int64_t numel = prod_intlist(oldshape);
28 PyTorch 56840f0a81e4460089740d50d3768f37e79a17fc Cuda Fix overflow In binary search, the variables used to represent the indices will often be of fixed size (integers), and this can result in an arithmetic overflow for very large arrays.
If the midpoint of the span is calculated as (L+R)/2, then the value of L+R may exceed the range of integers of the data type used to store the midpoint, even if L and R are within the range.
If L and R are nonnegative, this can be avoided by calculating the midpoint as  L+ ((R-L)/2)

Bucketize bucketizes 'input' based on 'boundaries'.

Summary
For example, if the inputs are boundaries = [0, 10, 100] input = [[-5, 10000] [150, 10] [5, 100]]

then the output will be output = [[0, 3] [3, 2] [1, 3]]
Possible overflow when adding two 32-bit ints in the binary search algorithm when calculating the midpoint other bucketize binary search, bucketize operation rewrite math formula rewrite math formula Subtracting low from high first ensures that the intermediate calculation will not overflow its 32-bit datatype. int32_t median = (high + low) / 2; const int32_t median = low + (high - low) / 2;
    #include <cstdint>
    #include <iostream>
    int32_t mp1(int32_t a, int32_t b){
            return (a+b)/2;
    }
    int32_t mp2(int32_t a, int32_t b){
            return a+(b-a)/2;
    }
    int main(){
            int32_t low=-1;
            for(int32_t high=1;high<10000;high++){
                    if(mp1(low,high)!=mp2(low,high)){
                            std::cout<<"Ahhhh!"<<std::endl;
                    }
            }
    }
29 PyTorch 7f42d1c98a72855806bd35ef27ce6823837e0816 C++ Fix loss of precision Python "floats" are actually doubles internally. Originally a float was used, which has less precision than a double. The JIT only supports double, not float, so in insertConstant we need to cast the Python `float_` to double instead of float. This fixes the incorrect `math.pi` and other high-precision constant values. other other python bindings from C++, low level math, constants increase variable precision/change variable type increase variable precision When converting a PyObject representation into a C++ representation, use a double instead of a float return toSimple(g.insertConstant(py::cast<float>(obj), loc)); return toSimple(g.insertConstant(py::cast<double>(obj), loc));
30 PyTorch c784f847debc6f6a30b41da6853517b2ccd3ddf0 C++ Fix overflow int is 32 bits on amd64/Linux/GNU. Sizes and indexes should use size_t in order to use the word size of the current platform, which allows one to index as many elements as could possibly fit into memory. sparse_adagrad param_size overflow error optimizers optimizers adagrad optimizer increase variable precision/change variable type increase variable precision Correctly replace the data type of a size from "int" to "size_t" int param_size
uint64_t idx_pref = indices[i_pref];
size_t param_size
auto idx_pref = indices[i_pref];
31 PyTorch 76c1b5cd794c44e4fec8da1d87ec8f0ccc045e68 C++ Fix overflow std::numeric_limits is a way to query various properties of arithmetic types. The bug comes from reusing a variable whose data type (precision) depends on the template argument.
Bug: caffe2/caffe2/operators/stats_put_ops.h:66:25: runtime error: 9.22337e+18 is outside the range of representable values of type 'long'. The assignment from int64_t to float loses some precision, and because of that we overflow
other external library Caffe operators increase variable precision/change variable type increase variable precision Increase the precision of the computation to int64, as opposed to converting from float to int64_t at the end
add overflow safeguard using std::numeric_limits
-        input = 0;
-      } else if (input < -bound_value) {
-        input = -bound_value;
-      } else if (input > bound_value) {
-        input = bound_value;


-    int64_t int_value = input * magnitude_expand_;
int_value = 0;
+      } else if (input <= -bound_value) {
+        int_value = std::numeric_limits<int64_t>::min();
+      } else if (input >= bound_value) {
+        int_value = std::numeric_limits<int64_t>::max();
+      } else {
+        int_value = input * magnitude_expand_;
       }
     } else {
       CAFFE_ENFORCE(
           std::abs(static_cast<int64_t>(input)) < bound_value,
           "Input value is too large for the given magnitude expansion!");
       CAFFE_ENFORCE(!isNan(input), "Input value cannot be NaN!");
+      int_value = input * magnitude_expand_;
     }
def test_clamp_with_out_of_bounds(self):
+        put_value = float(1e20)
+        magnitude_expand = 1000000000000
+        stat_name = "stat".encode('ascii')
+        sum_postfix = "/stat_value/sum".encode("ascii")
+        count_postfix = "/stat_value/count".encode("ascii")
+
+        workspace.FeedBlob("value", np.array([put_value], dtype=np.float))
+
+        workspace.RunOperatorOnce(core.CreateOperator(
+            "AveragePut",
+            "value",
+            [],
+            stat_name=stat_name,
+            magnitude_expand=magnitude_expand,
+            bound=True))
+
+        workspace.RunOperatorOnce(core.CreateOperator(
+            'StatRegistryExport', [], ['k', 'v', 't']))
+
+        k = workspace.FetchBlob('k')
+        v = workspace.FetchBlob('v')
+
+        stat_dict = dict(zip(k, v))
+
+        self.assertIn(stat_name + sum_postfix, stat_dict)
+        self.assertIn(stat_name + count_postfix, stat_dict)
+        self.assertEquals(stat_dict[stat_name + sum_postfix],
+            9223372036854775807)
         self.assertEquals(stat_dict[stat_name + count_postfix], 1)
32 PyTorch 08b1324ec26043b1acfaf4b65335c671c8658a3c C Fix overflow integer overflow in remainder operator tensor math tensor math tensor math, remainder operator rewrite math formula rewrite math formula, add overflow check The sign of the result of modulo should be the same as the denominator's. The old code checked this by testing whether the product of the result and the divisor was negative, but that multiplication can itself overflow, and signed overflow is undefined behavior in C, so the compiler is allowed to emit any machine code for it: a compiler upgrade may break the check or, more likely, silently drop it, since per the standard a signed integer cannot overflow and the compiler may ignore that condition. The commit compares the signs directly instead.    TensorRemainderOp(T v) : val(v) {}
   __device__ __forceinline__ void operator()(T* out, T* in) {
     *out = *in % val;
-    if ((*out * val) < 0){
       *out += val;
     }
static inline bool has_different_sign(real a, real b) {
+  return (a < 0) != (b < 0);
+}

   TensorRemainderOp(T v) : val(v) {}
   __device__ __forceinline__ void operator()(T* out, T* in) {
     *out = *in % val;
+    if (has_different_sign<T>(*out, val)){
       *out += val;
     }
def _test_remainder_overflow(self, dtype=torch.int64):
+        # Check Integer Overflows
+        x = torch.tensor(23500, dtype=dtype)
+        q = 392486996410368
+        self.assertEqual(x % q, x)
+        self.assertEqual(-x % q, q - x)
+        self.assertEqual(x % -q, x - q)
+        self.assertEqual(-x % -q, -x)
+
+    def test_remainder_overflow(self):
+        self._test_remainder_overflow(self, dtype=torch.int64)
For CUDA:
+    def test_remainder_overflow(self):
+        TestTorch._test_remainder_overflow(self, dtype=torch.cuda.int64)
33 PyTorch 6185b27cc6645d8055b76f9cc330b010d1c2a258 C++ Fix loss of precision Standard_gamma_grad computes the reparameterized gradient -(d/dalpha cdf(x;alpha)) / pdf(x;alpha) for random number x drawn from a standard Gamma distribution Gamma(alpha)
standard_gamma_grad_one(scalar alpha, scalar x)
low precision of gamma distribution gradient statistical distributions statistical distributions gradients, gamma distribution use a different algorithm use a different algorithm Use Taylor series expansion and Rice saddle point expansion instead of asymptotic approximation for calculating the gamma distribution gradient. In particular, use a Taylor series expansion for small x and a Rice saddle point expansion for large alpha.
34 PyTorch c43b120d4329dbcbed114eae8b4cfb23f11b3779 C Fix loss of precision The linspace operation creates a one-dimensional tensor of size steps whose values are evenly spaced from start to end, inclusive. Low float precision in the linspace operation tensor math tensor math linspace (i.e.: 1D tensor creation) rewrite math formula rewrite math formula Reverse the order of multiplication and division: divide (b-a) by (n-1) first and multiply by i afterwards. However, this formula can yield a slightly different result. *r__data = a + i*(b-a)/((real)(n-1)); *r__data = a + (b-a)/((real)(n-1))*i; start, start + (end-start)/(steps-1), ..., start + (steps - 2) * (end-start)/(steps-1) multiply, divide, add, subtract
35 PyTorch 415658836538d69362ed5482dc5fbfdba39a1c69 C++ Unit test hardware result of log is slightly different on different hardware platforms Inexact Logarithms are easy to compute in some cases, such as log10(1000) = 3. In general, logarithms can be calculated using power series or the arithmetic–geometric mean, or be retrieved from a precalculated logarithm table that provides a fixed precision. Log approximation is not bitwise identical on different hardware platforms. Different processors (in this case Broadwell vs. Skylake) can behave differently when it comes to floating point operations. Log is implemented in software, so different hardware platforms using the same software stack may be using the same algorithm to calculate log, but the primitive floating point operations can differ between FPU implementations. tensor math tensor math testing output accuracy, log approximation, hardware, tensor math relax accuracy test tolerance relax accuracy test tolerance Rather than asserting a bit-by-bit perfect match, compare with a tolerance of the 32-bit floating point epsilon. Epsilon is the smallest number that, when added to the floating point number 1.0, yields a value greater than 1.0.
Allow 1 ULP (unit in the last place) of tolerance by allowing an epsilon relative tolerance error. Epsilon is defined using the C++ standard library's numeric_limits for float, which returns the machine epsilon, that is, the difference between 1.0 and the next value representable by the floating-point type T.
// Results should be bit-identical.
    ASSERT_TRUE(
        memcmp(
            B_ref.data_ptr<float>(), B_t.data_ptr<float>(), B_ref.nbytes()) ==
        0);
// Results should be bit-identical.
    ASSERT_TRUE(torch::allclose(
        B_t, B_ref, /*rtol=*/eps, /*atol=*/0.0f, /*equal_nan=*/true))
        << "Input[:8]\n"
        << A_t.index({Slice(0, 8)}) << "\n"
        << "Test[:8]\n"
        << B_t.index({Slice(0, 8)}) << "\n"
        << "Ref[:8]\n"
        << B_ref.index({Slice(0, 8)}) << diffs(B_t, B_ref);
N/A log log approximation
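A small numpy sketch of the relaxed comparison (illustrative, not the C++ test): allow a relative error of one float32 machine epsilon instead of demanding bit-identical results:

import numpy as np

eps = np.finfo(np.float32).eps       # 2**-23, the float32 machine epsilon
a = np.float32(1.0)
b = a + eps                          # one ULP away from 1.0

print(b == a)                                # False: not bit-identical
print(np.isclose(b, a, rtol=eps, atol=0.0))  # True: within 1 ULP relative tolerance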
36 PyTorch 2e35fe953553247d8a22fc38b039374e426f13b8 C++ Speed optimization inefficient algorithm low speed of model training N/A variational maximum likelihood (VML) is a parametric statistical estimation technique. VML (Beal, 2003)
also referred to as (variational) expectation-maximization (McLachlan and Krishnan, 2007; Barber, 2012), can be considered a semi-Bayesian estimation approach.
VML rests on a decomposition of the log marginal likelihood
The FPU has only one divider, so FP division operations are slow. tensor math tensor math log approximation, tensor math use a different algorithm use a different algorithm Implement a VML-based log approximation, which will be faster. It increases speed by keeping the floating point units busy and by avoiding division operations to allow for better instruction-level parallelism. Use a power series (log vml) instead of sleef. N/A // Generate every single-precision FP value in [1.0, 2.0).
+  auto eps = std::numeric_limits<float>::epsilon();
+  at::Tensor A_t = torch::arange(1.0f, 2.0f, eps);
+  ASSERT_EQ(A_t.numel(), 1 << 23);
+
+  test(A_t);
+
+  test(A_t * 2.0f);
+  test(A_t * 0.5f);
+
+  test(A_t * 4.0f);
+  test(A_t * 0.25f);
+
+  test(A_t * powf(2.0f, 16));
+  test(A_t * powf(2.0f, -16));
+
+  test(A_t * powf(2.0f, 126));
+  test(A_t * powf(2.0f, -126));
+
+  test(torch::full({32}, INFINITY));
+  test(torch::full({32}, NAN));
+
+  auto min = std::numeric_limits<float>::min();
+  auto denorm_min = std::numeric_limits<float>::denorm_min();
+
+  // Denormals aren't bit precise, because sleef isn't bit-precise either.
+  A_t = torch::arange(0.0f, min, denorm_min);
+  ASSERT_EQ(A_t.numel(), 1 << 23);
+  auto B_ref = at::log(A_t);
+  auto B_t = at::empty_like(B_ref);
+  cg.call({A_t.data_ptr<float>(), B_t.data_ptr<float>(), A_t.numel()});
+  ASSERT_TRUE(torch::allclose(B_t, B_ref));
+}
log log
37 PyTorch 1047957831e2ef68d60af90865187e46ba6e5e86 C++ Speed optimization inefficient algorithm low speed of model training N/A SLEEF stands for SIMD Library for Evaluating Elementary Functions. It implements manually vectorized versions of all C99 real floating point math functions. It can utilize SIMD instructions that are available on modern processors. SLEEF is designed to efficiently perform computation with SIMD instructions by reducing the use of conditional branches and scatter/gather memory access. Log can be slow to compute, an optimized algorithm can help. tensor math tensor math log approximation,   tensor math use a different algorithm use a different algorithm add log approximation based on SLEEF. N/A log
38 PyTorch 2572d7a67123fdccef8979520be335c95605cf82 Python Unit test loss of precision Inaccurate result Inexact PyTorch provides two different modes of quantization: Eager Mode Quantization and FX Graph Mode Quantization. Eager Mode Quantization is a beta feature. The user needs to do fusion and specify where quantization and dequantization happen manually; also, it only supports modules and not functionals. https://pytorch.org/docs/stable/quantization.html quantization aware training (weights quantized, activations quantized, quantization numerics modeled during training) Needed a unit test for leaky relu in quantization aware training quantization quantization testing, quantization, leaky relu, eager mode quantization, qat conversion, quantization aware training add test/warning add precision test Add numerical test for conversion in qat (Quantization-aware training) for leaky relu def _test_activation_impl(
            self, float_module, float_op, quantized_module, quantized_op):
        ''' Test for activation op(with inplace options), float_op can be
        torch op or functional op
        '''
        class M(torch.nn.Module):
            def __init__(self, is_module, inplace):
                super(M, self).__init__()
                self.is_module = is_module
                self.inplace = inplace
                if self.is_module:
                    self.op = float_module(self.inplace)
                else:
                    self.op = float_op

            def forward(self, input):
                if self.is_module:
                    return self.op(input)
                else:
                    return self.op(input, self.inplace)

        options = itertools.product([True, False], [True, False], self.static_quant_types)
        quantized_nodes = {
            # is_module
            True: ns.call_module(quantized_module),
            False: ns.call_function(quantized_op),
        }

        for is_module, is_inplace, quant_type in options:
            self.checkGraphModeFxOp(
                M(is_module, is_inplace), self.img_data_2d,
                quant_type, quantized_nodes[is_module])
class TestEagerModeQATOps(QuantizationTestCase):
+    def _test_activation_convert_numerics_impl(self, Act, data):
         class M(torch.nn.Module):
             def __init__(self):
                 super().__init__()
@@ -1321,6 +1321,10 @@ class TestEagerModeQATOps(QuantizationTestCase):
         m = convert(m)
         checkNoFQModule(m)

class TestQATActivationOps(QuantizationTestCase):
    def _test_activation_convert_numerics_impl(self, Act, data):
        class M(torch.nn.Module):
            def __init__(self):
                super().__init__()
                self.act = Act()
                self.quant = QuantStub()
                self.dequant = DeQuantStub()

            def forward(self, x):
                x = self.quant(x)
                x = self.act(x)
                x = self.dequant(x)
                return x

        m = M().train()
        m.qconfig = default_qat_qconfig
        m = prepare_qat(m)
        before_convert = m(data)
        m = convert(m)
        after_convert = m(data)
        self.assertEqual(before_convert, after_convert)

+    def test_leaky_relu(self):
+        data = torch.randn(1, 3, 2, 4)
+        self._test_activation_convert_numerics_impl(nn.LeakyReLU, data)
LeakyReLU(x) = max(0, x) + negative_slope * min(0, x) leaky relu
39 PyTorch c9a8413306312b2f2789dd46d5ac1a947be6b556 Cuda Fix loss of precision NaN, Inf gradients Creating and using character or word embeddings is the mainstream approach for handling most NLP tasks. Each character/word is matched with a numeric vector to create a numerical vector representation of text, which can be input into a model. Intermediate calculations were done in the same type as the output; in the case of float16 this can lead to loss of precision. During FP16 training, char_embeddings.weight gets NaN or Inf gradients other NLP backward pass, character embedding, NLP increase variable precision/change variable type increase variable precision Use higher precision for the variable that holds the intermediate result, i.e. use a `float32` temporary tensor when the input is `float16`
40 PyTorch 699de487db9f2cb6de5cba9588311eed46a8ccb3 C++ New feature N/A trapezoidal rule for integration is an approximation technique for calculating area under a curve based on summing trapezoids under a curve

The estimated integral of a function y of x, sampled at points (y_1, ..., y_n) that are separated by distance (dx_1, ..., dx_{n-1}), is given by the trapezoid rule: sum_{i=1}^{n-1}  dx_i * (y_i + y_{i+1}) / 2
N/A other integration integration other add new algorithm Add numerical integration based on the trapezoidal rule that matches the numpy implementation N/A Tensor do_trapz(const Tensor& y, const Tensor& dx, int64_t dim) {
+    Tensor left = y.slice(dim, 0, -1);
+    Tensor right = y.slice(dim, 1);
+
+    return ((left + right) * dx).sum(dim) / 2.;
+}
+
+// When dx is constant, the above formula simplifies
+// to dx * [(\sum_{i=1}^n y_i) - (y_1 + y_n)/2]
+Tensor do_trapz(const Tensor& y, double dx, int64_t dim) {
+    return (y.sum(dim) - (y.select(dim, 0) + y.select(dim, -1)) * (0.5)) * dx;
+}
+
+Tensor zeros_like_except(const Tensor& y, int64_t dim) {
+    auto sizes = y.sizes().vec();
+    dim = maybe_wrap_dim(dim, y.dim());
+    sizes.erase(sizes.begin() + dim);
+    return at::zeros(sizes, y.options());
+}
+
+}
+
+Tensor trapz(const Tensor& y, const Tensor& x, int64_t dim) {
+    dim = maybe_wrap_dim(dim, y);
+    // asking for the integral with zero samples is a bit nonsensical,
+    // but we'll return "0" to match numpy behavior.
+    if (y.size(dim) == 0) {
+        return zeros_like_except(y, dim);
+    }
+    Tensor x_viewed;
+    if (x.dim() == 1) {
+        TORCH_CHECK(x.size(0) == y.size(dim), "trapz: There must be one `x` value for each sample point");
+        DimVector sizes(y.dim(), 1);
+        sizes[dim] = x.size(0);
+        x_viewed = x.view(sizes);
+    } else {
+        x_viewed = x;
+    }
+    Tensor x_left = x_viewed.slice(dim, 0, -1);
+    Tensor x_right = x_viewed.slice(dim, 1);
+
+    Tensor dx = x_right - x_left;
+    return do_trapz(y, dx, dim);
+}
+
+Tensor trapz(const Tensor& y, double dx, int64_t dim) {
+    // see above
+    if (y.size(dim) == 0) {
+        return zeros_like_except(y, dim);
+    }
+    return do_trapz(y, dx, dim);
def test_trapz(self):
+        f_args_variable = (torch.randn(2, 3, requires_grad=True),
+                           torch.tensor([[1.0, 2.0, 5.5], [2.3, 0.5, 6.2]], requires_grad=True))
+        f_args_tensor = deepcopy(unpack_variables(f_args_variable))
+        run_functional_checks(self, "test_trapz", "trapz",
+                              lambda y, x: torch.trapz(y, x),
+                              True, f_args_variable, f_args_tensor)

@unittest.skipIf(not TEST_NUMPY, "Numpy not found")
+    def test_trapz(self):
+        def test_dx(sizes, dim, dx, device):
+            t = torch.randn(sizes, device=device)
+            actual = torch.trapz(t, dx=dx, dim=dim)
+            expected = np.trapz(t.cpu().numpy(), dx=dx, axis=dim)
+            self.assertEqual(expected.shape, actual.shape)
+            self.assertTrue(np.allclose(expected, actual.cpu().numpy()))
+
+        def test_x(sizes, dim, x, device):
+            t = torch.randn(sizes, device=device)
+            actual = torch.trapz(t, x=torch.tensor(x, device=device), dim=dim)
+            expected = np.trapz(t.cpu().numpy(), x=x, axis=dim)
+            self.assertEqual(expected.shape, actual.shape)
+            self.assertTrue(np.allclose(expected, actual.cpu().numpy()))
+
+        for device in torch.testing.get_all_device_types():
+            test_dx((2, 3, 4), 1, 1, device)
+            test_dx((10, 2), 0, 0.1, device)
+            test_dx((1, 10), 0, 2.3, device)
+            test_dx((0, 2), 0, 1.0, device)
+            test_dx((0, 2), 1, 1.0, device)
+            test_x((2, 3, 4), 1, [1.0, 2.0, 3.0], device)
+            test_x((10, 2), 0, [2.0, 3.0, 4.0, 7.0, 11.0, 14.0, 22.0, 26.0, 26.1, 30.3], device)
+            test_x((1, 10), 0, [1.0], device)
+            test_x((0, 2), 0, [], device)
+            test_x((0, 2), 1, [1.0, 2.0], device)
+            with self.assertRaisesRegex(
+                    IndexError,
+                    'Dimension out of range'):
+                test_x((2, 3), 2, [], device)
+                test_dx((2, 3), 2, 1.0, device)
+            with self.assertRaisesRegex(
+                    RuntimeError,
+                    'There must be one `x` value for each sample point'):
+                test_x((2, 3), 1, [1.0, 2.0], device)
+                test_x((2, 3), 1, [1.0, 2.0, 3.0, 4.0], device)
y = 1/(1+exp(-x)), x = logit(y) integration
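A quick sanity check of the formula above (a usage sketch with made-up sample values, not part of the commit): torch.trapz should reproduce the hand-computed trapezoid sum.
import torch

y = torch.tensor([1.0, 4.0, 9.0])
x = torch.tensor([0.0, 1.0, 3.0])
# sum_i dx_i * (y_i + y_{i+1}) / 2 = 1*(1+4)/2 + 2*(4+9)/2 = 2.5 + 13.0 = 15.5
print(torch.trapz(y, x))   # tensor(15.5000)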
41 PyTorch c5d5d45f40969cbddbb7f87da343dfd422503c1c Python Fix overflow/underflow overflow/underflow The absolute value of the Jacobian determinant at p gives us the factor by which the function f expands or shrinks volumes near p; this is why it occurs in the general substitution rule.
The Jacobian determinant is used when making a change of variables when evaluating a multiple integral of a function over a region within its domain.
According to the inverse function theorem, the matrix inverse of the Jacobian matrix of an invertible function is the Jacobian matrix of the inverse function.
The log absolute determinant of the Jacobian of the sigmoid transformation is unstable and returns NaN/-Inf when y saturates to 0 or 1 statistical distributions statistical distributions Log absolute determinant Jacobian, distribution transformation rewrite math formula rewrite math formula Rewrite the log abs det jacobian method in terms of the pre-sigmoid value x (see the sketch after this entry) -(y.reciprocal() + (1 - y).reciprocal()).log() -F.softplus(-x) - F.softplus(x) reciprocal
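For y = sigmoid(x), log|dy/dx| = log y + log(1-y) = -softplus(-x) - softplus(x), which is what the patch computes. A minimal sketch (illustrative values only) of why the y-based form breaks once y saturates in float32 while the x-based form stays finite:
import torch
import torch.nn.functional as F

x = torch.tensor([0.5, 100.0])
y = torch.sigmoid(x)                                   # second entry rounds to exactly 1.0 in float32

old = -(y.reciprocal() + (1 - y).reciprocal()).log()   # tensor([-1.4482, -inf]): 1/(1 - y) overflows
new = -F.softplus(-x) - F.softplus(x)                  # tensor([-1.4482, -100.]): finite and correct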
42 PyTorch 645ad7ad0c89ecef61e89666745324deba31c8b7 Python Fix underflow NaN LP refers to the Lp norm: at p = 1 one gets sum pooling (which is proportional to average pooling), and p = inf gives max pooling The gradient in LP pooling 1D and 2D becomes NaN when all inputs are zero. If all inputs are zero then the sum of x to the power of p is zero, and the derivative of the pth root at zero involves division by zero, which produces NaN CNN operations pooling layer LP pooling rewrite math formula rewrite math formula Add a ReLU unit to LP pooling to avoid gradient = NaN. After adding this patch the gradient will be set to zero as opposed to NaN. return out.mul(kw * kh).pow(1. / norm_type) return (torch.sign(out) * relu(torch.abs(out))).mul(kw * kh).pow(1. / norm_type) pth root of sum of x^p
43 PyTorch de42542351ad933ada59a4a8cf3b247d75d52917 Python Fix loss of precision Precision matrix (also known as concentration matrix) is the matrix inverse of the covariance matrix. The multivariate normal distribution can be parametrized either by the covariance matrix or the precision matrix. precision matrix computation in the multivariate normal distribution is unstable due to matrix inverse statistical distributions statistical distributions distributions, precision matrix, multivariate normal distribution rewrite math formula rewrite math formula Previously the precision matrix was computed via the inverse of the covariance matrix. Compute the precision matrix from scale_tril instead; scale_tril is a lower-triangular k x k matrix with a non-zero diagonal. -        flat_conv = self.covariance_matrix.reshape((-1,) + self._event_shape * 2)
-        flat_precision = torch.stack([C.inverse() for C in flat_conv], 0)
       scale_tril_inv = _batch_inverse(self.scale_tril)
+        flat_scale_tril_inv = self.scale_tril.reshape((-1,) + self._event_shape * 2)
+        flat_precision = torch.bmm(flat_scale_tril_inv.transpose(-1, -2),
+                                   flat_scale_tril_inv)
matrix inverse
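A minimal sketch of the idea behind the patch (illustrative tensors, and using the current torch.linalg API rather than the _batch_inverse helper from the commit): invert only the well-conditioned triangular factor L and form the precision matrix as L^{-T} L^{-1}, instead of inverting the full covariance matrix.
import torch

cov = torch.tensor([[4.0, 2.0],
                    [2.0, 3.0]])                    # hypothetical covariance matrix
L = torch.linalg.cholesky(cov)                      # scale_tril: cov = L @ L.T

unstable = torch.inverse(cov)                       # direct inverse of the covariance matrix

L_inv = torch.linalg.solve_triangular(L, torch.eye(2), upper=False)
stable = L_inv.T @ L_inv                            # precision = L^{-T} @ L^{-1}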
44 PyTorch 8cff8e93d21142ff42b9d2b1f45b01acde0b9d99 Python Fix loss of precision NaN PyTorch no longer has `from torch.distributions.utils import _finfo`; the equivalent functionality is now provided by torch.finfo
A torch.finfo is an object that represents the numerical properties of a floating point torch.dtype, (i.e. torch.float32, torch.float64, and torch.float16). This is similar to numpy.finfo.
Need a function for checking numerical properties of variables and calculating epsilon, which is used, for example, in softmax. Different floating point types have different characteristics with regard to their precision: what is the smallest positive number they can represent, what is the smallest number that can be added to one without truncation, etc. statistical distributions statistical distributions distributions (Laplace, Gumbel, Gamma, Dirichlet) use a different algorithm use a different algorithm PyTorch has many different datatypes with varying degrees of precision. _finfo allows one to get information about characteristics such as the smallest number that can be added to 1 without truncation (eps) and the smallest positive number greater than zero (tiny) for each floating point type. The newly implemented _finfo is used to clamp the Gamma, Beta, and Dirichlet distributions to avoid NaNs. def _get_clamping_buffer(tensor):
-    clamp_eps = 1e-6
-    if isinstance(tensor, Variable):
-        tensor = tensor.data
-    if isinstance(tensor, (torch.DoubleTensor, torch.cuda.DoubleTensor)):
-        clamp_eps = 1e-15
-    return clamp_eps

eps = _get_clamping_buffer(probs)
# This follows semantics of numpy.finfo.
+_Finfo = namedtuple('_Finfo', ['eps', 'tiny'])
+_FINFO = {
+    torch.HalfStorage: _Finfo(eps=0.00097656, tiny=6.1035e-05),
+    torch.FloatStorage: _Finfo(eps=1.19209e-07, tiny=1.17549e-38),
+    torch.DoubleStorage: _Finfo(eps=2.22044604925e-16, tiny=2.22507385851e-308),
+    torch.cuda.HalfStorage: _Finfo(eps=0.00097656, tiny=6.1035e-05),
+    torch.cuda.FloatStorage: _Finfo(eps=1.19209e-07, tiny=1.17549e-38),
+    torch.cuda.DoubleStorage: _Finfo(eps=2.22044604925e-16, tiny=2.22507385851e-308),
+}

_finfo doc comment:
def _finfo(tensor):
    """
    Return floating point info about a `Tensor` or `Variable`:
    - `.eps` is the smallest number that can be added to 1 without being lost.
    - `.tiny` is the smallest positive number greater than zero
      (much smaller than `.eps`).

    Args:
        tensor (Tensor or Variable): tensor or variable of floating point data.
    Returns:
        _Finfo: a `namedtuple` with fields `.eps` and `.tiny`.
    """

eps = _finfo(probs).eps
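A minimal sketch of how the same clamping is expressed with today's torch.finfo (the probs tensor is hypothetical): clamp probabilities away from 0 and 1 by the dtype-specific eps so that downstream log computations cannot produce -inf or NaN.
import torch

probs = torch.tensor([0.0, 0.3, 1.0], dtype=torch.float32)   # hypothetical distribution parameters
eps = torch.finfo(probs.dtype).eps                            # 1.1921e-07 for float32
tiny = torch.finfo(probs.dtype).tiny                          # 1.1755e-38 for float32

clamped = probs.clamp(min=eps, max=1 - eps)                   # safe input for log(p) and log(1 - p)
print(clamped.log())                                          # finite, no -inf/NaN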
45 PyTorch bc505100167f61ce241f511741794dfe2f89c5f0 Python Fix loss of precision The logit is the natural logarithm of the odds p / (1-p), where p is a probability. Probabilities range from zero to one, i.e., p∈[0,1], whereas logits can be any real number (from minus infinity to infinity) numerical stability of the batch LR loss implementation loss functions loss functions loss, caffe2, batch lr loss use a different algorithm use a different algorithm Delete the code path that uses probabilities and use only logits in batch LR loss -        if schema.is_schema_subset(
-            schema.Struct(
-                ('label', schema.Scalar()),
-                ('logit', schema.Scalar())
-            ), self.input_record
-        ):
-            label = self.input_record.label()
-            # mandatory cast to float32
-            # self.input_record.label.field_type().base is np.float32 but
-            # label type is actually int
-            label = net.Cast(
-                label,
-                net.NextScopedBlob('label_float32'),
-                to=core.DataType.FLOAT)
-            label = net.ExpandDims(label, net.NextScopedBlob('expanded_label'),
-                                    dims=[1])
-            xent = net.SigmoidCrossEntropyWithLogits(
-                [self.input_record.logit(), label],
-                net.NextScopedBlob('cross_entropy'),
-            )
-        # TODO(T23937449): Change all the use cases of BatchLRLoss to the
-        # numerically stable version
-        else:
-            class_probabilities = net.MakeTwoClass(
-                self.input_record.prediction.field_blobs(),
-                net.NextScopedBlob('two_class_predictions')
-            )
-            label = self.input_record.label.field_blobs()
-            label = [net.Cast(
-                label,
-                net.NextScopedBlob('int32_label'),
-                to=core.DataType.INT32)]
-            xent = net.LabelCrossEntropy(
-                [class_probabilities] + label,
-                net.NextScopedBlob('cross_entropy'),
-            )
label = self.input_record.label()
+        # mandatory cast to float32
+        # self.input_record.label.field_type().base is np.float32 but
+        # label type is actually int
+        label = net.Cast(
+            label,
+            net.NextScopedBlob('label_float32'),
+            to=core.DataType.FLOAT)
+        label = net.ExpandDims(label, net.NextScopedBlob('expanded_label'),
+                                dims=[1])
+        xent = net.SigmoidCrossEntropyWithLogits(
+            [self.input_record.logit(), label],
+            net.NextScopedBlob('cross_entropy'),
+        )
logit = ln(p/(1-p)) ln
46 PyTorch 40b783b746b4f5775c97c7fe41dfb011b545665a Python Unit test loss of precision A simple approximation of the first derivative is f'(x) ~ (f(x+h)-f(x))/h, where h is the step size. Unit test failing because the numerical approximation of the derivative (i.e., the gradient) of PReLU uses a step size that is too large, which causes a large approximation error. activation functions activation functions testing accuracy, gradients, caffe2, pReLU rewrite math formula rewrite math formula Improve the gradient check asserts in the test by using a smaller step size self.assertGradientChecks(gc, op, [X, W], 0, [0]) self.assertGradientChecks(gc, op, [X, W], 0, [0], stepsize=1e-2)     def test_prelu(self, X, alpha, inplace, shared, order, seed, gc, dc):
        np.random.seed(seed)
        W = np.random.randn(
            X.shape[1] if order == "NCHW" else X.shape[3]).astype(np.float32)

        if shared:
            W = np.random.randn(1).astype(np.float32)

        # go away from the origin point to avoid kink problems
        X += 0.04 * np.sign(X)
        X[X == 0.0] += 0.04

        def prelu_ref(X, W):
            Y = X.copy()
            W = W.reshape(1, -1, 1, 1) if order == "NCHW" \
                else W.reshape(1, 1, 1, -1)
            assert len(X.shape) == 4
            neg_indices = X <= 0
            assert len(neg_indices.shape) == 4
            assert X.shape == neg_indices.shape
            Y[neg_indices] = (Y * W)[neg_indices]
            return (Y,)

        op = core.CreateOperator(
            "PRelu", ["X", "W"], ["Y" if not inplace else "X"],
            alpha=alpha, order=order)
        self.assertReferenceChecks(gc, op, [X, W], prelu_ref, ensure_outputs_are_inferred=True)
        # Check over multiple devices
        self.assertDeviceChecks(dc, op, [X, W], [0])

        if not inplace:
            # Gradient check wrt X
            self.assertGradientChecks(gc, op, [X, W], 0, [0], stepsize=1e-2, ensure_outputs_are_inferred=True)
            # Gradient check wrt W
            self.assertGradientChecks(gc, op, [X, W], 1, [0], stepsize=1e-2, ensure_outputs_are_inferred=True)
PReLU(x)=max(0,x)+a∗min(0,x) pReLU
47 PyTorch e187ba7a9fb18aba0a0651e05c20e1f491d989fc Python Fix loss of precision inaccurate result Inexact Fmod computes the element-wise remainder of division. When the divisor is zero, returns NaN for floating point dtypes on both CPU and GPU; raises RuntimeError for integer division by zero on CPU; Integer division by zero on GPU may return any value. Unit tests for Fmod/Remainder fail due to the numerical Jacobian check. Previously, tests for Fmod and Remainder added 5e-2 to the denominator tensor (the same as the div tests), which only avoids division by 0, but not issues with computing the numerical Jacobian due to the non-linearity of fmod/remainder when input / divisor is close to an integer. gradients/derivatives automatic differentiation testing accuracy, automatic differentiation, remainder, numerical jacobian rewrite math formula rewrite math formula Ensure that the result of input / divisor is not close to an integer. Add 1.5 to the denominator instead of 5e-2 to make it unlikely that the quotient is close to an integer (see the sketch after this entry). Note: this code is no longer in PyTorch; the change specifically decreases the probability of numerical issues in the numerical Jacobian computation. remainder
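A minimal sketch of the failure mode (illustrative numbers, not the original test; math.fmod on Python floats stands in for the tensor op): when input/divisor is close to an integer, fmod has a jump discontinuity, so a finite-difference estimate of the derivative is wildly wrong even though the analytical derivative with respect to the dividend is simply 1 away from the jump.
import math

a, b, h = 3.9995, 2.0, 1e-3

# central finite difference straddles the jump of fmod at a = 4.0
numerical = (math.fmod(a + h, b) - math.fmod(a - h, b)) / (2 * h)
print(numerical)   # ~ -999: dominated by the jump from ~2 back to ~0, while the true derivative is 1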
48 PyTorch 67968cb60b1d3021834594967d4140a36a8213e3 Python Fix overflow/loss of precision Binary cross entropy with logits measures the probability error in tasks with two outcomes in which each outcome is independent and need not have a fully certain label. For instance, one could perform a regression where the probability of an event happening is known and used as a label. This loss may also be used for binary classification, where labels are either zero or one. Using sigmoid followed by a BCE loss layer can be less stable than a single layer that combines sigmoid with BCE loss loss functions loss functions binary cross entropy loss use a different algorithm use a different algorithm Combine sigmoid and BCE loss into one layer and utilize the log-sum-exp trick. This is more stable than using a plain sigmoid followed by a BCE loss def binary_cross_entropy_with_logits(input, target, weight=None, size_average=True):
+    r"""Function that measures Binary Cross Entropy between target and output logits:
+
+    See :class:`~torch.nn.BCEWithLogitsLoss` for details.
+
+    Args:
+        input: Variable of arbitrary shape
+        target: Variable of the same shape as input
+        weight (Variable, optional): a manual rescaling weight
+                if provided it's repeated to match input tensor shape
+        size_average (bool, optional): By default, the losses are averaged
+                over observations for each minibatch. However, if the field
+                sizeAverage is set to False, the losses are instead summed
+                for each minibatch.
+    """
+    if weight is not None and target.dim() != 1:
+        weight = weight.view(1, target.size(1)).expand_as(target)
+    neg_abs = - input.abs()
+    loss = input.clamp(min=0) - input * target + (1 + neg_abs.exp()).log()
+
+    if weight is not None:
+        loss = loss * weight
+
+    if size_average:
+        return loss.mean()
+    else:
+        return loss.sum()
loss(o, t) = - 1/n \sum_i (t[i] * log(sigmoid(o[i])) + (1 - t[i]) * log(1 - sigmoid(o[i]))) log, sigmoid
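A minimal sketch (hypothetical logit/target values) of why the combined formulation above is more stable than applying sigmoid and then BCE: for a large-magnitude logit, sigmoid saturates to exactly 1.0 in float32 and log(1 - sigmoid(x)) becomes -inf, while the clamp/log1p form stays finite.
import torch

x = torch.tensor([100.0])   # logit
t = torch.tensor([0.0])     # target

naive = -(t * torch.log(torch.sigmoid(x)) + (1 - t) * torch.log(1 - torch.sigmoid(x)))
print(naive)                # tensor([inf]): log(1 - 1.0) becomes log(0)

stable = x.clamp(min=0) - x * t + torch.log1p(torch.exp(-x.abs()))
print(stable)               # tensor([100.]): matches the exact loss value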
49 PyTorch 7ba5e7cea1d2be485d2806ad38608dad9bcc7041 Python Fix loss of precision Pooling layers are used to reduce the dimensions of the feature maps and to summarize them. A max pooling layer returns the maximum values of rectangular regions of its input.

Boundary conditions (b.c.) are constraints necessary for the solution of a boundary value problem. A boundary value problem is a differential equation (or system of differential equations) to be solved in a domain on whose boundary a set of conditions is known.
The VolumetricMaxPooling (in legacy.nn) precision test kept failing:
there was a set of indices in the same pooling window whose values differed by less than epsilon, so the numeric gradient was hitting a boundary condition (max-pooling is discontinuous, of course)
CNN operations pooling layer testing, max pooling rewrite math formula rewrite math formula Modify the test so the input tensor avoids these boundary conditions: generate the input with torch.randn and scale it by 1000, which makes near-ties within a pooling window unlikely. input_size=(2, 3, 5, 5, 5)) input=(torch.randn(2, 3, 5, 5, 5) * 1000)),
50 PyTorch a03692069ebe19038bfccf5a59208ed2989bd4d9 Python Unit test loss of precision Unit test sometimes failing because of numerical gradient approximation error loss functions loss functions caffe2, loss relax accuracy test tolerance relax accuracy test tolerance Increase the tolerance used when comparing the gradient to make the test pass delta=1e-3 delta=1e-2 * abs(np.asscalar(dx[0])))
51 PyTorch 33cc71dc55db073ba46b065e24cff0d26156376f C Fix loss of precision Returns a 1-D tensor of size (end-start)/step + 1 with values from start to end in increments of step. Step is the gap between two values in the tensor. Precision can be lost when floats get very small, causing unexpected behavior tensor math tensor math range (i.e.: 1-D tensor) rewrite math formula rewrite math formula When dividing (xmax - xmin) by step, the numerator can become very small if xmax and xmin are close to each other. It is mathematically equivalent and more stable when xmax and xmin are close to each other to distribute the division and then subtract, i.e. (xmax / step) - (xmin / step). Note: This function is deprecated and will be removed in a future release because its behavior is inconsistent with Python’s range builtin. Instead, use torch.arange(), which produces values in [start, end). void THTensor_(range)(THTensor *r_, real xmin, real xmax, real step)
size = (long)((xmax-xmin)/step+1);
void THTensor_(range)(THTensor *r_, accreal xmin, accreal xmax, accreal step)
size = (long)((xmax/step - xmin/step)+1);
52 PyTorch 87fcf3072ef988b5b2e408cce141b76235929bbd C++ Fix overflow Hsum_sq performs horizontal sum of squares over a range of uint8_t, returns row sum
The quantized version of hsum_sq has an overflow when the input image size is large, such as (H,W,D) = (224,224,160) quantization quantization quantization, sum of squares use a different algorithm use a different algorithm Rewrite the for-loop definition to include an overflow threshold to prevent overflow for (; i < len / 16 * 16; i += 16) {
  }
int overflow_threshold = 262144; // 2147483647(max of int32)/(256*256)*8 = 262144
int loop = len / overflow_threshold + 1;
for(int j=0; j<=loop; j++){
for (; ((i < overflow_threshold * j) && (i < len / 16 * 16)); i += 16) {
53 PyTorch 45aaaef22cdc9d87f2c04762fce9ffeeff290330 Python Unit test overflow exception Python uses arbitrary precision integers, which can scale to be as large as needed, up to the amount of memory available to the computer. C++ primitives are fixed in their precision, and are commonly either 8, 16, 32, or 64 bits. A timing function in code used for benchmarking can overflow when calling C++ code precision tests/speed benchmarks timing benchmarking timing add overflow check add overflow check Check if the operation would overflow a 32-bit signed primitive from Python before using this value in C++. Add a break statement with an overflow threshold condition to prevent overflow # Avoid overflow in C++ pybind11 interface
+                if number * 10 > 2147483647:
+                    break
54 PyTorch c675727adf36bdbb60933c9c7529d3ee34462093 C++ Fix overflow torch.empty(size) returns a tensor filled with uninitialized data. The shape of the tensor is defined by the variable argument size Incorrect error message that fails to indicate an overflow. Overflow occurs when the input into torch.empty is very large tensor math tensor math torch.empty (i.e.: tensor with uninitialized data) fix test/warning correct error message change error message to indicate overflow TypeError: empty(): argument 'size' must be tuple of ints, but found element of type int at pos 1 RuntimeError: Overflow when unpacking long
55 PyTorch a69910868a5962e2d699c6069154836e262a29e2 Python Fix overflow DistributedSampler restricts data loading to a subset of the dataset. DistributedSampler takes a dataset as input and loads a sample of it.
torch.utils.data.distributed.DistributedSampler(dataset, num_replicas=None, rank=None, shuffle=True, seed=0, drop_last=False)
num_replicas (int, optional): Number of processes participating in distributed training. By default, :attr:`world_size` is retrieved from the current distributed group.
DistributedSampler takes a dataset as input and loads a sample of it. When `len(dataset) * 2 < num_replicas`, there is a possibility of overflow (see the sketch after this entry)
statistical distributions data sampling distributions, sampling, data loading rewrite math formula rewrite math formula rewrite formula for indexing data points in dataset and add if else logic indices += indices[:(self.total_size - len(indices))] padding_size = self.total_size - len(indices)
+            if padding_size <= len(indices):
+                indices += indices[:padding_size]
+            else:
+                indices += (indices * math.ceil(padding_size / len(indices)))[:padding_size]
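A minimal sketch of the patched padding logic (hypothetical sizes): when the dataset is much smaller than the padded total size, the old one-shot slice cannot supply enough indices, while repeating the index list enough times before slicing always does.
import math

indices = [0, 1, 2]                        # hypothetical tiny dataset
total_size = 8                             # num_replicas * samples per replica
padding_size = total_size - len(indices)   # 5 > len(indices)

old = indices + indices[:padding_size]     # [0, 1, 2, 0, 1, 2]: only 6 of the 8 required indices

if padding_size <= len(indices):
    new = indices + indices[:padding_size]
else:
    new = indices + (indices * math.ceil(padding_size / len(indices)))[:padding_size]
# new == [0, 1, 2, 0, 1, 2, 0, 1]: exactly total_size indices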
56 PyTorch 6debe825beb36fc8e894a1b0a14bd5b4ebcd6090 GLSL, Python, C++ New feature loss of precision Vulkan is a graphics and compute open standard API. GLSL (OpenGL Shading Language) is a shading language with syntax similar to C. A shader is essentially a function required to draw something on the screen. Shaders run on a GPU.
The RelaxedPrecision allows 32-bit integer and 32-bit floating-point operations to execute with a relaxed precision of somewhere between 16 and 32 bits. More info: https://www.khronos.org/registry/spir-v/specs/1.0/SPIRV.html
Add a new feature to allow a relaxed precision mode via a cmake option non-standard precision non-standard precision GLSL shaders, GPU add new precision option add new precision option Introduces the cmake option USE_VULKAN_RELAXED_PRECISION that controls which precision is used in Vulkan shaders. This option relaxes precision so that operations execute somewhere in the 16- to 32-bit range on Vulkan. Note, the default setting is 32-bit precision. N/A option(USE_VULKAN_RELAXED_PRECISION "Use Vulkan relaxed precision(mediump)" OFF)
+if(USE_VULKAN_RELAXED_PRECISION)
+  string(APPEND CMAKE_CXX_FLAGS " -DUSE_VULKAN_RELAXED_PRECISION")
+endif()
57 PyTorch 324c18fcad579b1afa63ae45528bf598ba8ec4ca Cuda Fix underflow Computes division a/b using the formula a * (1/b) A division where the denominator is a low-precision scalar risks underflow, because the reciprocal 1/b was calculated in the same precision as the non-scalar operands. tensor math tensor math Cuda, division increase variable precision/change variable type change variable type Use the higher-precision accumulation type for the reciprocal instead of the operand type: replace scalar_t with accscalar_t. auto inv_b = scalar_t(1.0) / iter.scalar_value<scalar_t>(2); using accscalar_t = at::acc_type<scalar_t, true>;
auto inv_b = accscalar_t(1.0) / iter.scalar_value<accscalar_t>(2);
@onlyCUDA
+    @dtypes(torch.half)
+    def test_divmul_scalar(self, device, dtype):
+        x = torch.tensor(100., device=device, dtype=dtype)
+        x_ref = x.float()
+        scale = 1e5
+        res = x.div(scale)
+        expected = x_ref.div(scale)
+        self.assertEqual(res, expected.to(dtype), atol=0., rtol=0.)
+        x = torch.tensor(1e-5, device=device, dtype=dtype)
+        x_ref = x.float()
+        res = x.mul(scale)
+        expected = x_ref.mul(scale)
+        self.assertEqual(res, expected.to(dtype), atol=0., rtol=0.)
+        res = scale * x
+        self.assertEqual(res, expected.to(dtype), atol=0., rtol=0.)
division
58 PyTorch 24a8614cac3af1711eccc7294fd47ac30aefa8cc Python Add warning overflow cuFFT = CUDA Fast Fourier Transform library non-standard precision non-standard precision CUDA, half precision, warning disable test/warning add overflow warning Add a warning message to alert the programmer to possible overflow when the operation is performed in half precision. Message: "Due to limited dynamic range of half datatype, performing this operation in half precision may cause the first element of result to overflow for certain inputs"
59 PyTorch fe684679b06f7f2fe7a7e136ea5605c04254b652 C++ disable test overflow runtime error The csrc directory contains all of the code concerned with integration with Python. This is in contrast to lib, which contains the Torch libraries that are Python agnostic. csrc depends on lib, but not vice versa. Runtime error from overflow when unpacking large numbers. The bug is: torch.tensor([0.1, 999999999999999999999]) fails with "Overflow when unpacking double" other other Convert Python float to C++ float, Python integration disable test/warning disable overflow and precision test Delete code that throws an exception on overflow and lost precision -  if (PyLong_Check(obj)) {
-    int overflow;
-    long long value = PyLong_AsLongLongAndOverflow(obj, &overflow);
-    if (overflow != 0) {
-      throw std::runtime_error("Overflow when unpacking double");
-    }
-    if (value > DOUBLE_INT_MAX || value < -DOUBLE_INT_MAX) {
-      throw std::runtime_error("Precision loss when unpacking double");
-    }
-    return (double)value;
-  }

delete old solution def test_unpack_double(self, device, dtype):
+        # Reference: https://github.com/pytorch/pytorch/issues/33111
+        vals = (2 ** 24 + 1, 2 ** 53 + 1,
+                np.iinfo(np.int64).max, np.iinfo(np.uint64).max, np.iinfo(np.uint64).max + 1,
+                -1e500, 1e500)
+        for val in vals:
+            t = torch.tensor(val, dtype=dtype, device=device)
+            a = np.array(val, dtype=torch_to_numpy_dtype_dict[dtype])
+            self.assertEqual(t, torch.from_numpy(a))
60 PyTorch 7417b4c66f5b0901f206bf48b64de07384770724 Cuda Fix overflow ConvTranspose3d applies a 3D transposed convolution operator over an input image composed of several input planes. The transposed convolution operator multiplies each input value element-wise by a learnable kernel, and sums over the outputs from all input feature planes.
torch.nn.ConvTranspose3d(in_channels, out_channels, kernel_size, stride=1, padding=0, output_padding=0, groups=1, bias=True, dilation=1, padding_mode='zeros')
The index in torch.nn.ConvTranspose3d overflows CNN operations convolution convolution transpose add test/warning change variable type, increase variable precision Change the index computation to use int64_t and require that input.numel() <= UINT_MAX and channels * kernel.numel() <= UINT_MAX. Note: this is a second attempt to fix the problem int data_col_index =
-                (((((c_im * kernel_t + t_k) * kernel_h + h_k) * kernel_w +
-                   w_k) *
-                      depth_col +
-                  t_col) *
-                     height_col +
-                 h_col) *
-                    width_col +
-                w_col;
const int64_t idx_k =
+                ((c_im * kernel_t + t_k) * kernel_h + h_k) * kernel_w + w_k;
+            const int64_t data_col_index =
+                ((idx_k * depth_col + t_col) *
+                    height_col + h_col) *
+                  width_col + w_col;
             val += data_col[data_col_i
const auto num_kernels = channels * depth * height * width;
+
+  auto check_fits_in_unsigned =
+    [](int64_t val, const char * name) {
+      constexpr auto umax = std::numeric_limits<unsigned>::max();
+      TORCH_CHECK(val >= 0 && val <= umax,
+                  name, " must fit in a 32-bit unsigned value");
+    };
+  check_fits_in_unsigned(num_kernels, "input size");
+  check_fits_in_unsigned(
+      channels * patch_t * patch_h * patch_w, "channels x kernel size");
61 PyTorch 0a159b0a3a78a80fb0f9082087a98f87f2dea986 C++ Fix loss of precision inaccurate/incorrect result torch.remainder gives the wrong output for very large float dividends due to loss of precision. For example,
x = torch.tensor(2749682432.0)
q = 36
print(torch.remainder(x,q))
actual output is 128.0 whereas the correct output should be 20
tensor math tensor math remainder use a different algorithm use a different algorithm Use the SLEEF library to calculate mod for floats: Sleef_fmodf8 is a vectorized single-precision floating-point remainder. return a - b * at::native::floor_impl(a / b); Vec256<BFloat16> fmod(const Vec256<BFloat16> & q) const {
+    __m256 x_lo, x_hi;
+    cvtbf16_fp32(values, x_lo, x_hi);
+    __m256 q_lo, q_hi;
+    cvtbf16_fp32(q.values, q_lo, q_hi);
+    auto o1 = Sleef_fmodf8(x_lo, q_lo);
+    auto o2 = Sleef_fmodf8(x_hi, q_hi);
+    return cvtfp32_bf16(o1, o2);

scalar_t mod = std::fmod(a, b);
if ((mod != 0) && ((b < 0) != (mod < 0))) mod += b;
return mod;
modulo
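A minimal sketch of the precision loss (reproducing the example above with numpy float32 arithmetic, since plain Python floats are double precision): the quotient a/b is rounded to a representable float32 before floor is applied, so the floor-based formula lands far from the true remainder, while fmod computes it exactly without forming the large quotient.
import numpy as np

a = np.float32(2749682432.0)
b = np.float32(36.0)

q = a / b                           # exact quotient is 76380067.55..., float32 rounds it to 76380064.0
floor_based = a - b * np.floor(q)   # far from the true remainder (the entry above reports 128.0 from this formula)
fmod_based = np.fmod(a, b)          # 20.0: the exact remainder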
62 PyTorch 63b1ae69831cd21bc4d6059a5854bc1155a152c9 Cuda Fix overflow C++ std:: fmod definition: The floating-point remainder of the division operation x/y calculated by this function is exactly the value x - n*y, where n is x/y with its fractional part truncated.

The returned value has the same sign as x and is less than y in magnitude.
If successful, returns the floating-point remainder of the division x/y as defined above.

If a domain error occurs, an implementation-defined value is returned (NaN where supported)

If a range error occurs due to underflow, the correct result (after rounding) is returned.

overflow in torch.remainder when the dividend is very large tensor math tensor math remainder rewrite math formula rewrite math formula Use fmod from the C++ standard library to calculate the remainder instead of a - b * floor(a/b), and account for an edge case: if the result of fmod is not zero (i.e., a is not divisible by b) and either (1) the divisor is less than zero while the remainder is greater than zero, or (2) the divisor is greater than zero while the remainder is less than zero, then increment the result of fmod by the divisor (see the sketch after this entry). return a - b * static_cast<scalar_t>(std::floor(a / b)); auto mod = ::fmod(a, b);
+          if ((mod != 0) && ((b < 0) != (mod < 0))) mod += b;
+          return mod;
def test_remainder_fmod_large_dividend(self, device, dtype):
+        alarge = 1e9
+        pi = 3.14159265358979
+        for avalue in [alarge, -alarge]:
+            for bvalue in [pi, -pi]:
+                a = torch.tensor([avalue], dtype=dtype, device=device)
+                b = torch.tensor([bvalue], dtype=dtype, device=device)
+                c = torch.remainder(a, b)
+                d = torch.fmod(a, b)
+                self.assertTrue((b[0] > 0) == (c[0] > 0))  # remainder has same sign as divisor
+                self.assertTrue((a[0] > 0) == (d[0] > 0))  # fmod has same sign as dividend
+                self.assertTrue(abs(c[0]) < abs(b[0]))     # remainder is within range of divisor
+                self.assertTrue(abs(d[0]) < abs(b[0]))     # fmod is within range of divisor
+                if ((a[0] > 0) == (b[0] > 0)):
+                    self.assertTrue(c[0] == d[0])   # remainder is same as fmod
+                else:
+                    self.assertTrue(abs(c[0] - d[0]) == abs(b[0]))  # differ by one divisor
remainder, division
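A minimal Python-level sketch of the patched logic (math.fmod on Python floats stands in for the CUDA ::fmod call): start from fmod, whose result has the sign of the dividend, and shift it by one divisor when its sign disagrees with the divisor's; this yields the floor-style remainder without ever forming the large quotient a/b.
import math

def remainder(a, b):
    mod = math.fmod(a, b)                     # sign follows the dividend a
    if mod != 0 and (b < 0) != (mod < 0):     # divisor and fmod result have different signs
        mod += b                              # shift into the divisor's sign range
    return mod

print(math.fmod(-7.0, 3.0))   # -1.0
print(remainder(-7.0, 3.0))   #  2.0, same sign as the divisor (matches a - b*floor(a/b))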
63 PyTorch b33e38ec475017868534eb114741ad32c9d3b248 C++ Fix loss of precision arange creates a 1D tensor using start, end, and step size. Step and input previously had the same type, but the step may require higher precision than start and end, for example a fractional step with an integral element type (e.g., T=int, step_t=double). tensor math tensor creation vectorized calculations, low level tensor math, CPU increase variable precision/change variable type increase variable precision Allow a higher-precision step type for Vec256::arange by making the type of step independent of the input type; often a double is required for the step while the input remains single precision. static Vec256<T> arange(T base = static_cast<T>(0), T step = static_cast<T>(1))   template<typename step_t>  // step sometimes requires a higher precision type (e.g., T=int, step_t=double)
  static Vec256<T> arange(T base = static_cast<T>(0), step_t step = static_cast<step_t>(1)) {
N/A
64 PyTorch 5c423cae72b3b720a0857a8237a499d0e07d6b98 Python Unit test loss of precision Linspace creates a 1D tensor of size steps whose values are evenly spaced from start to end, inclusive.
Logspace creates a 1D tensor of size steps whose values are evenly spaced from base^start to base^end inclusive, on a logarithmic scale with base "base".
The CUDA half-precision computation of linspace and logspace has poor precision tensor math tensor creation testing precision, Cuda, half precision, linspace, logspace add test/warning add precision test Adds precision tests for CUDA half (16 bits), float (32 bits), and double (64 bits). Since linspace/logspace are deterministic, we can compute an expected
amount of error (by testing without a precision override), adding a tiny amount (EPS) to that, and using that value as the override.
EPS = 1e-5
LINSPACE_LOGSPACE_EXTRA_EPS = 1e-5
+
# Tests that compare a device's computation with the (gold-standard) CPU's.
class TestDevicePrecision(TestCase):
-    def test_linspace(self, device):
-        a = torch.linspace(0, 10, 10, device=device)
-        b = torch.linspace(0, 10, 10)
+
+    # The implementation of linspace+logspace goes through a different path
+    # when the steps arg is equal to 0 or 1. For other values of `steps`
+    # they call specialized linspace (or logspace) kernels.
+    LINSPACE_LOGSPACE_SPECIAL_STEPS = [0, 1]
+
+    def _test_linspace(self, device, dtype, steps):
+        a = torch.linspace(0, 10, steps=steps, dtype=dtype, device=device)
+        b = torch.linspace(0, 10, steps=steps)
         self.assertEqual(a, b)

-    @dtypes(torch.double)
-    def test_logspace(self, device, dtype):
-        a = torch.logspace(1, 10, 10, dtype=dtype, device=device)
-        b = torch.logspace(1, 10, 10, dtype=dtype, device='cpu')
+    # See NOTE [Linspace+Logspace precision override]
+    @precisionOverride({torch.half: 0.0039 + LINSPACE_LOGSPACE_EXTRA_EPS})
+    @dtypesIfCUDA(torch.half, torch.float, torch.double)
+    @dtypes(torch.float, torch.double)
+    def test_linspace(self, device, dtype):
+        self._test_linspace(device, dtype, steps=10)
+
+    @dtypesIfCUDA(torch.half, torch.float, torch.double)
+    @dtypes(torch.float, torch.double)
+    def test_linspace_special_steps(self, device, dtype):
+        for steps in self.LINSPACE_LOGSPACE_SPECIAL_STEPS:
+            self._test_linspace(device, dtype, steps=steps)
+
+    def _test_logspace(self, device, dtype, steps):
+        a = torch.logspace(1, 1.1, steps=steps, dtype=dtype, device=device)
+        b = torch.logspace(1, 1.1, steps=steps)
         self.assertEqual(a, b)

-        # Check non-default base=2
-        a = torch.logspace(1, 10, 10, 2, dtype=dtype, device=device)
-        b = torch.logspace(1, 10, 10, 2, dtype=dtype, device='cpu')
+    def _test_logspace_base2(self, device, dtype, steps):
+        a = torch.logspace(1, 1.1, steps=steps, base=2, dtype=dtype, device=device)
+        b = torch.logspace(1, 1.1, steps=steps, base=2)
         self.assertEqual(a, b)

+    # See NOTE [Linspace+Logspace precision override]
+    @precisionOverride({torch.half: 0.0157 + LINSPACE_LOGSPACE_EXTRA_EPS})
+    @dtypesIfCUDA(torch.half, torch.float, torch.double)
+    @dtypes(torch.float, torch.double)
+    def test_logspace(self, device, dtype):
+        self._test_logspace(device, dtype, steps=10)
+
+    # See NOTE [Linspace+Logspace precision override]
+    @precisionOverride({torch.half: 0.00201 + LINSPACE_LOGSPACE_EXTRA_EPS})
+    @dtypesIfCUDA(torch.half, torch.float, torch.double)
+    @dtypes(torch.float, torch.double)
+    def test_logspace_base2(self, device, dtype):
+        self._test_logspace_base2(device, dtype, steps=10)
+
+    @dtypesIfCUDA(torch.half, torch.float, torch.double)
+    @dtypes(torch.float, torch.double)
+    def test_logspace_special_steps(self, device, dtype):
+        for steps in self.LINSPACE_LOGSPACE_SPECIAL_STEPS:
+            self._test_logspace(device, dtype, steps=steps)
+            self._test_logspace_base2(device, dtype, steps=steps)
start, start + (end-start)/(steps-1), ..., start + (steps-2) * (end-start)/(steps-1), end

https://pytorch.org/docs/stable/generated/torch.logspace.html
65 PyTorch b9b9fd4fadc4d4fa0b030941a35011956eafa10b C++ Disable warning overflow Warning pragma enables selective modification of the behavior of compiler warning messages. The pragma warning( push ) stores the current warning state for every warning. The pragma warning( push, n ) stores the current state for every warning and sets the global warning level to n. The pragma warning( pop ) pops the last warning state pushed onto the stack. Any changes that you made to the warning state between push and pop are undone. False arithmetic overflow warning in MSVC ( Microsoft Visual compiler for C, C++) results in code not compiling compiler compiler overflow warning, Microsoft compiler disable test/warning disable warning Disable warnings for arithmetic overflow raised by MSVC (Microsoft Visual C Compiler). Add logic to ignore warning using warning pragma: pragma warning(disable : 4146) that allows for ignoring specified warning messages. Also push and pop are used. // Ignore the false warning "Arithmetic overflow" for MSVC
+ #ifdef _MSC_VER
+ # pragma warning(push)
+ # pragma warning(disable : 4146)
+ #endif
+
  /// Gets the minimum value for a N-bit signed integer.
  inline int64_t minIntN(int64_t N) {
    assert(N > 0 && N <= 64 && "integer width out of range");

+   return -(UINT64_C(1) << (N - 1));
  }

+ #ifdef _MSC_VER
+ # pragma warning(pop)
+ #endif
66 PyTorch ec8e75ea92ae2b5ea73b4aeb3ec7cb39e9f95db9 Cuda Fix overflow A histogram is a graphical representation of a data distribution, with many applications in computer vision, data analytics and medical imaging. An int does not have enough bits to represent the necessary values for nbins: the getBin function in CUDA overflows for large bVal and nbins values, so (bVal - minvalue) * nbins wraps to a negative bin offset
other other Cuda histogram increase variable precision/change variable type increase variable precision Patch: increase precision from int (32 bits) to int64   t = torch.zeros([10], dtype=torch.int32, device='cuda')
+        # 35488 * 65536 as int32 would cause overflow to negative value
+        # giving negative bin offset
+        t[0] = 35488
+        counted = t.bincount(minlength=65536)
+        self.assertEqual(torch.sum(counted), 10)
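A minimal arithmetic sketch of the overflow in the test above (the values come from the test; the two's-complement wrap-around is simulated explicitly because Python integers do not overflow):
# bin offset computed as (bVal - minvalue) * nbins in int32 arithmetic
offset = 35488 * 65536                        # 2325741568, exceeds 2**31 - 1 = 2147483647
wrapped = (offset + 2**31) % 2**32 - 2**31    # -1969225728: the negative bin offset seen in the bug
print(wrapped)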
67 PyTorch 17c1b2c7159a0218a69e8486eb4212339253353a Python Fix overflow Saturation arithmetic is a version of arithmetic in which all operations such as addition and multiplication are limited to a fixed range between a minimum and maximum value.
If the result of an operation is greater than the maximum, it is set ("clamped") to the maximum; if it is below the minimum, it is clamped to the minimum. The name comes from how the value becomes "saturated" once it reaches the extreme values; further additions to a maximum or subtractions from a minimum will not change the result.
In the quantization code, the range (i.e., the qmin and qmax values of the fixed range), which is used as a fallback for the default 8-bit qmin/qmax calculation when a dynamic range is not used, can cause overflow quantization quantization quantization, range rewrite math formula rewrite math formula Change the range: relax scale and zero-point for activations (reduce_range) to ensure that fbgemm implementations of conv and linear do not saturate due to 16-bit intermediate accumulation. But now in PyTorch: "Please use quant_min and quant_max to specify the range for observers. reduce_range will be deprecated in a future release of PyTorch."          if self.dtype == torch.qint8:
-            qmin, qmax = -128, 127
         else:
-            qmin, qmax = 0, 255
@@ -59,9 +61,15 @@ class ObserverBase(ABC, nn.Module):
         )

         if self.dtype == torch.qint8:
+            if self.reduce_range:
+                qmin, qmax = -64, 63
+            else:
+                qmin, qmax = -128, 127
         else:
+            if self.reduce_range:
+                qmin, qmax = 0, 127
+            else:
+                qmin, qmax = 0, 255
class ObserverTest(QuantizationTestCase):
     @given(qdtype=st.sampled_from((torch.qint8, torch.quint8)),
-           qscheme=st.sampled_from((torch.per_tensor_affine, torch.per_tensor_symmetric)))
-    def test_minmax_observer(self, qdtype, qscheme):
-        myobs = MinMaxObserver(dtype=qdtype, qscheme=qscheme)
+           qscheme=st.sampled_from((torch.per_tensor_affine, torch.per_tensor_symmetric)),
+           reduce_range=st.booleans())
+    def test_minmax_observer(self, qdtype, qscheme, reduce_range):
+        # reduce_range cannot be true for symmetric quantization with uint8
+        if qdtype == torch.quint8 and qscheme == torch.per_tensor_symmetric:
+            reduce_range = False
+        myobs = MinMaxObserver(dtype=qdtype, qscheme=qscheme, reduce_range=reduce_range)
68 PyTorch c845984271a551ac1c61b9eb06a17fb57aafbd7e Cuda Fix overflow A loop from i to n uses int to store the index i, which overflows after it is incremented. Overflow makes the index negative, which will also cause buffer overflow other other looping, Cuda increase variable precision/change variable type increase variable precision increase precision from int to int 64 #define CUDA_KERNEL_LOOP(i, n) \
for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < (n); i += blockDim.x * gridDim.x)
// CUDA: grid stride looping
+// int64_t _i_n_d_e_x specifically prevents overflow in the loop increment.
+// If input.numel() < INT_MAX, _i_n_d_e_x < INT_MAX, except after the final
+// iteration of the loop where _i_n_d_e_x += blockDim.x * gridDim.x can be
+// greater than INT_MAX.  But in that case _i_n_d_e_x >= n, so there are no
+// further iterations and the overflowed value in i=_i_n_d_e_x is not used.

#define CUDA_KERNEL_LOOP(i, n) \
+  int64_t _i_n_d_e_x = blockIdx.x * blockDim.x + threadIdx.x;                                \
+  for (int i=_i_n_d_e_x; _i_n_d_e_x < (n); _i_n_d_e_x+=blockDim.x * gridDim.x, i=_i_n_d_e_x)
@unittest.skipIf(not TEST_MEDIUM_TENSOR, "not enough memory")
+    def test_cuda_kernel_loop_overflow(self):
+        # Issue #24309: In extreme cases, the loop variable could overflow and continue
+        # the kernel loop with a negative index, causing a RuntimeError (invalid write):
+        x = torch.randn(1, 1, 1, 2**30 + 1, dtype=torch.float16, device="cuda")
+        expected = x[0, 0, 0, 2**30]
+        y = torch.nn.functional.avg_pool2d(x, kernel_size=1)
+        torch.cuda.synchronize()
+        self.assertEqual(y[0, 0, 0, 2**30], expected)
+
+    @unittest.skipIf(not TEST_LARGE_TENSOR, "not enough memory")
+    def test_cuda_kernel_loop_overflow_large(self):
+        # Make sure input.numel() > INT_MAX is handled:
+        x = torch.randn(1, 1, 1, 2**31, dtype=torch.float16, device="cuda")
+        with self.assertRaisesRegex(RuntimeError, "integer out of range"):
+            y = torch.nn.functional.avg_pool2d(x, kernel_size=1)
+
+        # Issue #24309: In extreme cases, the loop variable could overflow and continue
+        # the kernel loop with a negative index, causing a RuntimeError (invalid write):
+        x = torch.randn(1, 1, 1, 2**31 - 1, dtype=torch.float16, device="cuda")
+        expected = x[0, 0, 0, 2**31 - 2]
+        y = torch.nn.functional.avg_pool2d(x, kernel_size=1)
+        torch.cuda.synchronize()
+        self.assertEqual(y[0, 0, 0, 2**31 - 2], expected)
69 PyTorch 4d2bf0b51b71f96929b58c6e23fb71d3e25440ff Python Unit test loss of precision backward pass output in quantization aware training was not accurate enough quantization quantization quantization aware training, testing precision, backward pass increase variable precision/change variable type increase variable precision Increase precision from float to double
70 PyTorch af908d57ea07c593bb7c8db00c3139fc973b2d4c Python Unit test loss of precision The precision test for quantized operations is failing in function def test_adaptive_avg_pool2d(self, X, output_size_h, output_size_w) due to double rounding quantization quantization quantization, precision testing, average pooling relax accuracy test tolerance relax accuracy test tolerance Increase the unittest precision tolerance to 1.0 to avoid failing -            self.assertEqual(X_ref, qX_repr,
-                             message=error_message.format(name, X_ref, qX_repr))
self.assertEqual(X_ref, qX_hat.int_repr(), prec=1.0,
message=error_message.format(name, X_ref, qX_hat))
71 PyTorch 83bfd76b2f7a9b388537eb00022622d9c6989890 Python Unit test loss of precision absolute tolerance (atol). An absolute tolerance is a fixed number that is used to make direct comparisons Test in function make_input(batch_size) in class ONNX Runtime (ONNX=Open Neural Network Exchange) fails
AssertionError:  Not equal to tolerance rtol=0.001, atol=1e-07
other other GRU (Gated Recurrent Unit in RNN) relax accuracy test tolerance relax accuracy test tolerance relax precision tolerance, absolute tolerance (atol) = 1e-5 self.run_test(model, input, batch_size=RNN_BATCH_SIZE,) self.run_test(model, input, batch_size=RNN_BATCH_SIZE, atol=1e-5)
72 PyTorch 77651615c8976b6ad7ddd8abf2a62cd54b573f56 C++ Fix loss of precision CHAR_BIT indicates how many bits are in a char. On almost every architecture today it's 8 bits to a char, but on some historical machines it has been 7. The previous code used std::numeric_limits<T>::digits, which excludes the sign bit for signed types (7 for int8_t), whereas this gemm implementation expects the full bit width quantization quantization quantization, fbgemm increase variable precision/change variable type increase variable precision Use the correct number of bits of precision. Interestingly, while C expresses sizes in bytes, this library expects the precision in bits, so CHAR_BIT must be multiplied by the result of sizeof (sizeof returns the number of bytes) in order to get this number in bits. qparams.precision = std::numeric_limits<typename T::underlying>::digits; qparams.precision = CHAR_BIT * sizeof(typename T::underlying);
73 PyTorch 9b69f21a95fa626522ef371f8557e7286f9db318 C++ Fix loss of precision The Code Generator (codegen.h/cpp) produces the string to be compiled on the device.
Csrc directory in Pytorch repo contains all of the code concerned with integration with Python. This is in contrast to lib, which contains the Torch libraries that are Python agnostic. csrc depends on lib, but not vice versa.
Jit directory contains (most of) the C++ code for the PyTorch JIT, a language and compiler stack for executing PyTorch models portably and efficiently.
The fuser accepts subgraphs wrapped in "fusion nodes" and tries to execute them by just-in-time (JIT) compiling kernels that run all the graph operations.
Just-in-time (JIT) compilation (also dynamic translation or run-time compilation) is a way of executing computer code that involves compilation during execution of a program, at run time, rather than before execution.
Std::scientific
modifies the default formatting for floating-point input/output.
Specifically, write floating-point values in scientific notation
Sets the floatfield format flag for the str stream to scientific. When floatfield is set to scientific, floating-point values are written using scientific notation: the value is represented always with only one digit before the decimal point, followed by the decimal point and as many decimal digits as the precision field (precision). Finally, this notation always includes an exponential part consisting on the letter e followed by an optional sign and three exponential digits.
Std::setprecision
When used in an expression out << setprecision(n) or in >> setprecision(n), sets the precision parameter of the stream out or in to exactly n.
low precision emitted for prim::Constant compiler compiler code generation for compiler, fuser, JIT increase variable precision/change variable type increase variable precision Patch 1: Emit a higher-precision literal for floating-point values v in the fusion kernel using std::setprecision(16) instead of std::scientific.
Patch 2: increase precision in code that sets variable types: int to int64 and float to double
// Note: The NAN, NEG_INFINITY and POS_INFINITY strings map to device-specific
// implementations of these special values. These macros are found in the
// resource strings for each device.
static std::string scalarValue(const double v) {
  std::ostringstream out;
  if (std::isnan(v)) {
    out << "NAN";
  } else if (std::isinf(v)) {
    if (v < 0) {
      out << "NEG_INFINITY";
    } else {
      out << "POS_INFINITY";
    }
  } else {
    out << std::setprecision(16) << v;
  }
  return out.str();
}
@unittest.skipIf(RUN_CUDA, 'This tests the CPU fuser')
+    @unittest.skipIf(IS_WINDOWS or IS_SANDCASTLE, "NYI: fuser support for Windows or Sandcastle")
+    @enable_cpu_fuser
+    def test_fuser_double_literal_precision(self):
+        code = '''
+        graph(%2 : Float(*, *)):
+            %4 : int = prim::Constant[value=1]()
+            %3 : float = prim::Constant[value=1.282549830161864]()
+            %5 : Float(*, *) = aten::add(%2, %3, %4)
+            %1 : Float(*, *) = aten::relu(%5)
+            return (%1)
+        '''
+
+        graph = parse_ir(code)
+        code = torch._C._jit_fuser_get_fused_kernel_code(graph, [torch.rand(3, 4)])
+        FileCheck().check('1.282549830161864').run(code)
74 PyTorch 8e1e29124de99c01d08a2e2c02455c72335a971d Python Fix loss of precision In various distributions (Bernoulli, Binomial, etc.) the expand method chooses to use probabilities over logits, which results in loss of precision statistical distributions statistical distributions distributions rewrite math formula rewrite math formula In the method “expand(self, batch_shape, _instance=None)” of the distribution classes, change the preference between probabilities and logits:
if logits are available, use them over probabilities (not the other way around); see the sketch after this entry
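A minimal sketch (hypothetical extreme logit) of the precision thrown away when a distribution is re-parameterized through probabilities: an extreme logit maps to a probability that rounds to exactly 0.0 in float32, and the original logit can no longer be recovered.
import torch

logits = torch.tensor([-105.0])
probs = torch.sigmoid(logits)                 # tensor([0.]): exp(-105) underflows even float32 subnormals
recovered = torch.log(probs / (1 - probs))    # tensor([-inf]): the logit information is gone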
75 PyTorch 2ed95c58713b45a6a9dac4336135523555bc58a9 C++ Disable warning overflow Error from the Microsoft compiler when building compiler compiler Microsoft C++ compiler, Converter disable test/warning disable warning Disable the warning using pragma warning disable #ifdef _MSC_VER
+#pragma warning( disable : 4146 )
+#endif
76 PyTorch dc72a5e02c1ecb105ea58cafcf10ef3a6f7d9c25 C++ Fix underflow CV refers to OpenCV and rotatedRectangleIntersection is a function in OpenCV library
rotatedRectangleIntersection finds out if there is any intersection between two rotated rectangles.

int cv::rotatedRectangleIntersection ( const RotatedRect & rect1,
const RotatedRect & rect2,
OutputArray intersectingRegion
)
cv::rotatedRectangleIntersection has a known float underflow bug that would cause failure in ```CV_Assert(intersection.size() <= 8)```, Problem reported in OpenCV
data processing image processing OpenCV, rotated rectangle intersection use a different algorithm use a different algorithm Replace rotatedRectangleIntersection with a custom replacement function, cvfix_rotatedRectangleIntersection. When the OpenCV version is upgraded to >= 4.0, this replacement function can be removed.
77 PyTorch 4b97a4642100e26d14c34c07c31643422d60ac48 C++ Disable warning overflow compilation error due to signed overflow compiler compiler compiling disable test/warning disable warning Disable strict-overflow flag to avoid compilation error ADD_COMPILE_OPTIONS(-Wno-strict-overflow)
ADD_COMPILE_OPTIONS(-Wno-error=strict-overflow)
78 PyTorch 55b25365e9e11ee4d9dfb02ff1c79081225c7bd1 C++ New feature loss of precision N/A N/A non-standard precision non-standard precision quantization, low precision computations other add new algorithm Add feature to allow 8 bit precision values (ultra low precision) is_same<T, uint8_t>::value && GetCpuId().avx2(); is_same<T, uint8_t>::value && GetCpuId().avx2() &&
!FLAGS_caffe2_dnnlowp_force_slow_path;
79 PyTorch efd2aeac9e03a8813ba37db98e1a7645fa2902be txt Disable warning overflow The stringop-overflow warning uses Object Size Checking to determine the sizes of destination objects; the stringop-overflow flag is only available in GCC 7 and later compiler compiler GCC compiler flags disable test/warning disable warning Change the logic for the compiler flag Wno-stringop-overflow: set it only if the GCC compiler version is >= 7 if (CMAKE_COMPILER_IS_GNUCXX AND NOT (CMAKE_CXX_COMPILER_VERSION VERSION_LESS 7.0.0))
+    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-stringop-overflow")
+  endif()
80 PyTorch d97c9dd01904ff423554345cd877ebc1e520c21e Python Add warning loss of precision Check gradients computed via small finite differences against analytical
    gradients w.r.t. tensors in :attr:`inputs` that are of floating point or complex type
    and with ``requires_grad=True``.
    The check between numerical and analytical gradients uses :func:`~torch.allclose`.
    For most of the complex functions we consider for optimization purposes, no notion of
    Jacobian exists. Instead, gradcheck verifies if the numerical and analytical values of
    the Wirtinger and Conjugate Wirtinger derivatives are consistent. Because the gradient
    computation is done under the assumption that the overall function has a real-valued
    output, we treat functions with complex output in a special way. For these functions,
    gradcheck is applied to two real-valued functions corresponding to taking the real
    components of the complex outputs for the first, and taking the imaginary components
    of the complex outputs for the second. For more details, check out
    :ref:`complex_autograd-doc`.
Failure of the gradient check between numerical and analytical gradients due to low-precision input gradients/derivatives automatic differentiation gradients, autograd, testing precision add test/warning add precision warning Add a warning that inputs to gradcheck need to be of double precision for the comparison between numerical and analytical gradients to be reliable
81 PyTorch 4d287f90743e09d1fdc6e2b3519b16c2d1ae3fa3 C++ Fix overflow The for-loop index overflows if the input vector is large tensor math tensor math loop index, low level math, summation of scalars increase variable precision/change variable type increase variable precision increase precision from int to int64_t for (int i = k * WIDTH; i != size; i++) Patch: increase precision from int to int64_t
@@ -102,7 +102,7 @@ struct Reduction {
       sum = std::accumulate(buf, buf + WIDTH, scalar_t(ident), ReduceScalar());
     }
+    for (int64_t i = k * WIDTH; i != size; i++) {
       sum = ReduceScalar()(sum, data[i]);
     }
     return sum;
82 PyTorch 7cbe63da8621b6063c864527592db6b1c894804f Cuda Fix loss of precision statistical distributions statistical distributions Distributions (Multinomial), THT Tensor Random,
binarySearchForMultinomial
rewrite math formula rewrite math formula   // first non-zero element by setting start to size-1 here,
+    // the code below will move it to the last non-zero probability
+    // this actually can happen when the random number is 1
start = 0; start = size - 1; # Test a corner case from older PyTorch (Issue #4858)
+        freqs = torch.cuda.FloatTensor([
+            0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
+            0.03178183361887932, 0.027680952101945877, 0.033176131546497345,
+            0.046052902936935425, 0.07742464542388916, 0.11543981730937958,
+            0.14148041605949402, 0.15784293413162231, 0.13180233538150787,
+            0.08271478116512299, 0.049702685326337814, 0.027557924389839172,
+            0.018125897273421288, 0.011851548217236996, 0.010252203792333603,
+            0.007422595750540495, 0.005372154992073774, 0.0045109698548913,
+            0.0036087757907807827, 0.0035267581697553396, 0.0018864056328311563,
+            0.0024605290964245796, 0.0022964938543736935, 0.0018453967059031129,
+            0.0010662291897460818, 0.0009842115687206388, 0.00045109697384759784,
+            0.0007791675161570311, 0.00020504408166743815, 0.00020504408166743815,
+            0.00020504408166743815, 0.00012302644609007984, 0.0,
+            0.00012302644609007984, 4.100881778867915e-05, 0.0, 0.0, 0.0, 0.0,
+            0.0, 0.0])
+
+        torch.cuda.manual_seed(11042)
+        sample = torch.multinomial(freqs, 1000, True)
+        self.assertNotEqual(freqs[sample].min(), 0)
83 PyTorch 0167f76d2a99ced5f4359d8ea77eb6704179b797 Python Unit test loss of precision An absolute tolerance is a fixed number that is used to make direct comparisons.
Rtol = relative tolerance
precision tests/speed benchmarks accuracy testing testing precision, external to pytorch, onnx relax accuracy test tolerance relax accuracy test tolerance Relax precision, specifically, add absolute tolerance (atol). np.testing.assert_allclose(
                ref_outputs[i],
                outputs[i],
                rtol=1e-3)
np.testing.assert_allclose(
                ref_outputs[i],
                outputs[i],
                rtol=1e-3,
                atol=1e-7)
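For reference, np.testing.assert_allclose accepts a value when |actual - desired| <= atol + rtol * |desired|, so a purely relative tolerance can never be satisfied for reference values at or near zero. A minimal NumPy sketch with toy values (not taken from the PyTorch/ONNX test):
import numpy as np

ref = np.array([1.0, 1e-9])
out = ref + 5e-8                       # tiny absolute error on both entries

try:
    # rtol alone fails on the near-zero entry: 5e-8 > 1e-3 * 1e-9
    np.testing.assert_allclose(out, ref, rtol=1e-3)
except AssertionError:
    print("rtol-only check fails near zero")

# adding atol makes the check |out - ref| <= atol + rtol * |ref| pass
np.testing.assert_allclose(out, ref, rtol=1e-3, atol=1e-7)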
84 PyTorch 4b8f4fc25902e3a325b06e2db415bba9fad7c0ef Python New feature loss of precision N/A N/A non-standard precision non-standard precision mixed precision, training add new precision option add new precision option allow mixed precision in distributed training
85 PyTorch 873f1163806c14ae236538f76c44d04b63bef331 Python Unit test loss of precision The STFT computes the Fourier transform of short overlapping windows of the input. This gives frequency components of the signal as they change over time. The interface of this function is modeled after the librosa stft function. STFT (short-time Fourier transform) precision test not passing precision tests/speed benchmarks accuracy testing testing precision, fourier transform relax accuracy test tolerance relax accuracy test tolerance increase precision tolerance in assertEqual from 5e-6 to 7e-6 self.assertEqual(result.data, ref_result, 5e-6, 'stft result') self.assertEqual(result.data, ref_result, 7e-6, 'stft result')
86 PyTorch f9fd82d8933639a8cf20a029c7fa47fff8fdb93d Cuda Fix loss of precision __half2float = defined in cuda: cuda_fp16.h. Converts half number to float.
ScalarConvert = conversion helper defined in PyTorch's CUDA (THC) backend
non-standard precision non-standard precision mixed precision, sigmoid increase variable precision/change variable type increase variable precision Change how to convert half precision variables to float in struct TensorSigmoidOp. Use ScalarConvert instead of __half2float. Change float to accreal, which is int64_t -    float fin = __half2float(*in);
-    *out = __float2half(1.0f / (1.0f + expf(- fin)));

-#define H2F(input) __half2float(input)
-#define F2H(input) __float2half(input)

float fin = ScalarConvert<half, float>::to(*in);
+    *out = ScalarConvert<float, half>::to(1.0f / (1.0f + expf(- fin)));

+#define H2F(input) ScalarConvert<real, accreal>::to(input)
+#define F2H(input) ScalarConvert<accreal, real>::to(input)
87 PyTorch 35abc4efa2d08ef2e9b7d978089fbd98b8d14187 C++ New feature loss of precision N/A N/A torch.digamma(input, *, out=None) → Tensor Computes the logarithmic derivative of the gamma function on input.
torch.polygamma(n, input, *, out=None) → Tensor Computes the nth derivative of the digamma function on input. n≥0 is called the order of the polygamma function.
gradients/derivatives derivatives add new precision option add new precision option Add low-precision digamma() and polygamma() functions def test_digamma(self):
+        def test(use_double=False):
+            cpu_tensor = torch.randn(10, 10, 10)
+            gpu_tensor = cpu_tensor.cuda()
+            zeros = torch.zeros(10, 10, 10)
+            if (use_double):
+                cpu_tensor = cpu_tensor.double()
+                gpu_tensor = gpu_tensor.double()
+                zeros = zeros.double()
+            cpu_out = cpu_tensor.digamma()
+            gpu_out = gpu_tensor.digamma()
+            norm_errors = (gpu_out - cpu_out.cuda()) / gpu_out
+            self.assertEqual(norm_errors, zeros)
+
+        test(True)
+        test(False)
+
+    def test_polygamma(self):
+        def test(use_double=False):
+            cpu_tensor = torch.randn(10, 10, 10)
+            gpu_tensor = cpu_tensor.cuda()
+            zeros = torch.zeros(10, 10, 10)
+            if (use_double):
+                cpu_tensor = cpu_tensor.double()
+                gpu_tensor = gpu_tensor.double()
+                zeros = zeros.double()
+            for n in [0, 1]:
+                cpu_out = cpu_tensor.polygamma(n)
+                gpu_out = gpu_tensor.polygamma(n)
+                norm_errors = (gpu_out - cpu_out.cuda()) / gpu_out
+                self.assertEqual(norm_errors, zeros)
log, derivative
88 PyTorch 0443c11f7e4d14dfe5f5b23f4112a4c443d95a9c Python Fix loss of precision Volta is the codename for a GPU microarchitecture developed by Nvidia
The major revision number is 7 for devices based on the Volta architecture, 6 for devices based on the Pascal architecture, 5 for devices based on the Maxwell architecture, 3 for devices based on the Kepler architecture, 2 for devices based on the Fermi architecture, and 1 for devices based on the Tesla architecture.

torch.cuda.get_device_capability(device=None)
Gets the cuda capability of a device.
Return type: tuple(int, int)
Returns: the major and minor cuda capability of the device

pre-Volta architectures (i.e. compute capability major version below 7) need special handling for half precision non-standard precision non-standard precision GPU, cuDNN, RNN, half precision, hardware other check hardware Fix half precision for older (pre-Volta) NVIDIA cards. Add additional logic for checking the major compute capability of the currently selected device - only enable tensor-op math when the major capability is at least 7. -        if version() >= 7000 and int(cuda[0]) >= 9:
-            lib.cudnnSetRNNMatrixMathType(self, CUDNN_DEFAULT_MATH)
-            if datatype == CUDNN_DATA_HALF:
-                lib.cudnnSetRNNMatrixMathType(self, CUDNN_TENSOR_OP_MATH)
if version() >= 7000 and int(cuda[0]) >= 9 and (
+                    torch.cuda.get_device_capability(torch.cuda.current_device())[0] >= 7):
+                lib.cudnnSetRNNMatrixMathType(self, CUDNN_DEFAULT_MATH)
+                if datatype == CUDNN_DATA_HALF:
+                    lib.cudnnSetRNNMatrixMathType(self, CUDNN_TENSOR_OP_MATH)
89 PyTorch f7a459b28b184dedf265ed8718f85cb483e8284e Cuda Fix overflow MAGMA is a collection of next generation linear algebra (LA) GPU accelerated libraries designed and implemented by the team that developed LAPACK and ScaLAPACK. overflow when using magma
Ints are commonly only 32 bits. An int should not be used for a size. If you index an array of floats with an int, you can only store up to 8 GB of data in that array, much smaller than many workloads require.
other external library external library, linear algebra increase variable precision/change variable type increase variable precision Use a 64-bit value for the size. -  int n = a_->size[0];
-  int nrhs = b_->size[1];
int64_t n = a_->size[0];
+  int64_t nrhs = b_->size[1];
90 PyTorch 30bbeb8b87ef815d8a7ce8ab8af9a301fcedfbe9 C++ Change exception to a warning overflow/loss of precision exception due to a check requiring that type conversions are exact precision tests/speed benchmarks accuracy testing scalars, overflow check, precision check relax accuracy test tolerance relax accuracy test tolerance relax the type conversion check - only throw an exception on overflow
91 PyTorch a92fce18715be7317b5eab1319140899b355eb9f Python Unit test loss of precision cpu and gpu gradients are not exact precision tests/speed benchmarks accuracy testing precision testing, testing that cpu and cuda gradients are equal relax accuracy test tolerance relax accuracy test tolerance Relax precision tolerance from 0 to 5e-5 for comparing cpu and gpu gradients self.assertEqual(grid_cpu.grad, grid_cuda.grad) self.assertEqual(grid_cpu.grad, grid_cuda.grad, prec=5e-5)
92 PyTorch 4c35c630eca9a7a3fbfc8f4bc72ea2fd5ba0dd05 Python Unit test loss of precision precision tests/speed benchmarks accuracy testing gradient check relax accuracy test tolerance relax accuracy test tolerance Enable norm gradgradchecks by lowering precision requirements. Add absolute and relative tolerance precision based on empirical observations gradgradcheck_precision_override = {
+    'test_NormFunction_1_5': {'atol': 1e-2, 'rtol': 1e-2},
+    'test_NormFunction_2': {'atol': 1e-2, 'rtol': 1e-2},
+    'test_NormFunction_3': {'atol': 5e-2, 'rtol': 1e-2},
+}
+            if test_name in gradgradcheck_precision_override:
+                atol = gradgradcheck_precision_override[test_name]['atol']
+                rtol = gradgradcheck_precision_override[test_name]['rtol']
+                self.assertTrue(gradgradcheck(apply_fn, input, grad_y, atol=atol, rtol=rtol))
93 PyTorch eaacfc7e25c0500f933b0c68e63f1d947739df90 Python Speed optimization loss of precision Performance issue in the momentum update of SGD due to repeatedly converting the gradient from 16 bits to 32 and vice versa. CUDA was allocating and freeing chunks of memory frequently because grad was changing sizes optimizers optimizers SGD optimizer, momentum update increase variable precision/change variable type increase variable precision Keep grad in a preallocated fp32 buffer so CUDA no longer needs to allocate and free memory frequently.          net.MomentumSGDUpdate(
             [grad_fp32, momentum_data, lr, param_fp32],
-            [grad, momentum_data, param_fp32],
         net.MomentumSGDUpdate(
             [grad_fp32, momentum_data, lr, param_fp32],
param_fp32],
+            [grad_fp32, momentum_data, param_fp32],
94 PyTorch 1f4317be3f02d84e93303193e782c1cb002b26e3 Cuda, C++ New feature loss of precision N/A N/A Collectives = collective communication in distributed computing
Gloo backend for distributed CPU training
non-standard precision non-standard precision Distributed training, half precision add new precision option add new precision option Add support for collectives over vectors of half-precision floating point values
95 PyTorch aec182ae72d51dad0f46cdfe7ff9a41380d7da35 Cuda New feature loss of precision N/A N/A Performs a batch matrix-matrix product of matrices in batch1 and batch2. input is added to the final result.

batch1 and batch2 must be 3-D tensors each containing the same number of matrices.
linear algebra linear algebra tensor math, linear algebra add new precision option add new precision option add support for half precision in tensormath blas in BADDBMM (batch matrix-matrix product)
96 PyTorch 1bf7bc9768fa3f768419884595e08b3bc25913ea Cuda Fix overflow One often wants the type for the accumulator to be of higher precision than the inputs. When accumulating (summation for example), error will build up more. sum accumulator had insufficient precision statistical distributions statistical distributions distributions (multinomial) increase variable precision/change variable type increase variable precision change data type for accumulator from T to AccT and add assertion to make sure the sum of distribution did not overflow (i.e.: is not inf) assert(!isinf(sum));
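A minimal NumPy sketch (toy values, not the CUDA multinomial kernel) of why the accumulator type matters when many low-precision values are summed: the narrow accumulator stalls long before the true total, while a wider accumulator stays close to it.
import numpy as np

x = np.full(10000, 0.01, dtype=np.float16)   # true sum is 100

acc16 = np.float16(0)                        # accumulator as narrow as the data (the T role)
for v in x:
    acc16 += v                               # stalls near 32: 0.01 is below half the float16 spacing there

acc32 = np.float32(0)                        # wider accumulator (the AccT role)
for v in x:
    acc32 += np.float32(v)

print(acc16)   # ~32.0, far from the true sum
print(acc32)   # ~100.02, the exact sum of the rounded float16 inputs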
97 PyTorch c1ba0fbab3ad3f1a4b2630de9629c4749469eada C++ New feature loss of precision N/A N/A non-standard precision non-standard precision cuDNN, ReLu, mixed precision add new precision option add new precision option Decide at runtime which precision of types to use
98 PyTorch 26516f667e688ed38c8ded71af8e1abc3a56d5ee Python Unit test loss of precision tensor math statistics testing precision, mean, standard deviation relax accuracy test tolerance relax accuracy test tolerance relax precision tolerance in assertEqual -        self.assertEqual(r[:,:50].std(), 4, 0.2)
-        self.assertEqual(r[:,:50].std(), 4, 0.2)
-        self.assertEqual(q.mean(), 2, 0.1)
-        self.assertEqual(q.std(), 3, 0.1)
-        self.assertEqual(q.mean(), 0, 0.1)
-        self.assertEqual(q.std(), 1, 0.1)
  self.assertEqual(r[:,:50].std(), 4, 0.3)
+        self.assertEqual(r[:,:50].std(), 4, 0.3)
+        self.assertEqual(q.mean(), 2, 0.3)
+        self.assertEqual(q.std(), 3, 0.3)
+        self.assertEqual(q.mean(), 0, 0.2)
+        self.assertEqual(q.std(), 1, 0.2)
99 PyTorch cd780eb9ec20827a924c658b5960be452797076d C++ Speed optimization inefficient algorithm AXPBY scales two vectors, adds them to one another, and stores the result in the vector y.
In this case the type is double (daxpby)
?axpby performs the vector-vector operation defined as y := a*x + b*y, where a and b are scalars and x and y are vectors of length n
In caffe2 CPU math using MKL (an optimized Intel math library), the function CAFFE2_SPECIALIZED_AXPBY(double, d) suffers from underflow.
When running caffe2 experiments, calling Exp with many values close to 0 causes MKL's underflow error handler to be called repeatedly, causing significant overhead even though the result is correct (e.g. exp(x) = 0).
other external library external library (MKL), exp, caffe2, disable test/warning disable warning Disable MKL's underflow checker to speed up the operation by setting the error mode to VML_ERRMODE_IGNORE -#define DELEGATE_SIMPLE_UNARY_FUNCTION(T, Funcname, OriginalFunc)              \
-template <>                                                                    \
-void Funcname<T, CPUContext>(                                                  \
-    const int N, const T* x, T* y,                                             \
-    CPUContext* context) {                                                     \
-  OriginalFunc(N, x, y);                                                       \
-}
-DELEGATE_SIMPLE_UNARY_FUNCTION(float, Exp, vsExp)
-DELEGATE_SIMPLE_UNARY_FUNCTION(double, Exp, vdExp)
#define DELEGATE_SIMPLE_UNARY_FUNCTION(T, Funcname, OriginalFunc, ...) \
+  template <>                                                          \
+  void Funcname<T, CPUContext>(                                        \
+      const int N, const T* x, T* y, CPUContext* context) {            \
+    OriginalFunc(N, x, y, ##__VA_ARGS__);                              \
+  }
+DELEGATE_SIMPLE_UNARY_FUNCTION(
+    float,
+    Exp,
+    vmsExp,
+    VML_HA | VML_FTZDAZ_OFF | VML_ERRMODE_IGNORE)
+DELEGATE_SIMPLE_UNARY_FUNCTION(
+    double,
+    Exp,
+    vmdExp,
+    VML_HA | VML_FTZDAZ_OFF | VML_ERRMODE_IGNORE)
100 PyTorch 206029bc5a3f179abe97986641ed3ccd3c414126 C++ Fix overflow Integer literals are of type int; the size index variable overflows if the input tensor is very big (specifically when the input is > 2GB). other external library external library, caffe2 increase variable precision/change variable type increase variable precision Increase precision of the variable that holds the tensor size from int32 to int64. Instead of passing in an integer literal, static_cast the integer literal to the larger datatype so that it becomes the accumulator type      auto newSize = std::accumulate(
-        newDims.begin(), newDims.end(), 1, std::multiplies<TIndex>());

@@ -180,7 +183,10 @@ class Tensor {
   template <class T, class ContextForCopy>
   void Reserve(const std::vector<T>& newCapacity, ContextForCopy* context) {
     auto newSize = std::accumulate(
-        newCapacity.begin(), newCapacity.end(), 1, std::multiplies<TIndex>());
     if (newSize * meta_.itemsize() <= capacity_) {
       return;
     }
@@ -208,7 +214,10 @@ class Tensor {
         "New outer dimension must be smaller than current.");
     dims_[0] = outer_dim;
     size_ = std::accumulate(
-        dims_.begin(), dims_.end(), 1, std::multiplies<TIndex>());
   }
     auto newSize = std::accumulate(
+        newDims.begin(),
+        newDims.end(),
+        static_cast<TIndex>(1),
+        std::multiplies<TIndex>());
     if (newSize * meta_.itemsize() <= capacity_) {
       dims_ = newDims;
       size_ = newSize;
@@ -180,7 +183,10 @@ class Tensor {
   template <class T, class ContextForCopy>
   void Reserve(const std::vector<T>& newCapacity, ContextForCopy* context) {
     auto newSize = std::accumulate(
+        newCapacity.begin(),
+        newCapacity.end(),
+        static_cast<TIndex>(1),
+        std::multiplies<TIndex>());
     if (newSize * meta_.itemsize() <= capacity_) {
       return;
     }
@@ -208,7 +214,10 @@ class Tensor {
         "New outer dimension must be smaller than current.");
     dims_[0] = outer_dim;
     size_ = std::accumulate(
+        dims_.begin(),
+        dims_.end(),
+        static_cast<TIndex>(1),
+        std::multiplies<TIndex>());
   }
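A minimal NumPy analogue of the same pitfall (the dims and dtypes here are illustrative, not from caffe2): the type the product is accumulated in, not the element type, determines whether the size computation wraps.
import numpy as np

dims = [1024, 1024, 1024, 4]                 # a tensor with 2**32 elements

# accumulating the element count in a 32-bit type wraps around to 0,
# analogous to seeding std::accumulate with a plain int literal
print(np.prod(dims, dtype=np.int32))         # 0

# accumulating in a 64-bit type, like the static_cast<TIndex>(1) seed, is correct
print(np.prod(dims, dtype=np.int64))         # 4294967296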
101 PyTorch 5030d76acfcdd48492e988e3fc1aa19bebe9366a Python Fix loss of precision linear algebra linear algebra precision testing for CUDA blas relax accuracy test tolerance relax accuracy test tolerance reduce precision of CUDA blas tests custom_precision = {
     'addbmm': 1e-4,
     'addmm': 1e-4,
+    'addmv': 1e-4,
+    'addr': 1e-4,
+    'baddbmm': 1e-4,
     'rsqrt': 1e-4,
     'cumprod': 1e-4,
}
102 PyTorch a489884da4b63e33ede107261afd6a4a81d9401a Python Unit test loss of precision torch.addmm(input, mat1, mat2, *, beta=1, alpha=1, out=None) → Tensor Performs a matrix multiplication of the matrices mat1 and mat2. The matrix input is added to the final result. alpha and beta are scaling factors on the matrix-matrix product of mat1 and mat2 and on the added matrix input, respectively.
linear algebra linear algebra precision testing for matrix multiply relax accuracy test tolerance relax accuracy test tolerance Reduce precision of addmm CUDA test custom_precision = {
     'addbmm': 1e-4,
+    'addmm': 1e-4,
     'rsqrt': 1e-4,
     'cumprod': 1e-4,
}
out = Beta * input + Alpha * (mat1_i @ mat2_i)
103 PyTorch a0fb1ab86e88d5c98733d7e6e5aa3b5811fe24f4 Python Unit test loss of precision torch.rsqrt(input, *, out=None) → Tensor Returns a new tensor with the reciprocal of the square-root of each of the elements of input. linear algebra linear algebra precision testing for matrix multiply and square root relax accuracy test tolerance relax accuracy test tolerance Reduce precision for addmm and rsqrt CUDA tests out_i = 1/(sqrt(input_i))
104 PyTorch f7fe6cf1a6a58c55335e1b337dbdd23a78a2f74a C Fix overflow statistical distributions statistical distributions multinomial distribution increase variable precision/change variable type increase variable precision Using higher precision type for accumulator void THTensor_(multinomial)(THLongTensor *self, THGenerator *_generator, THTenso
   for (i=0; i<n_dist; i++)
   {
     /* Get normalized cumulative distribution from prob distribution */
-    real sum = 0;
     for (j=0; j<n_categories; j++)
     {
       sum += THStorage_(get)( \
@@ -160,7 +160,7 @@ void THTensor_(multinomial)(THLongTensor *self, THGenerator *_generator, THTenso
         /* update cumulative distribution so that sample cannot be drawn again */
         real diff;
         real new_val = 0;
-        real sum;
         if (sample_idx != 0)
         {
void THTensor_(multinomial)(THLongTensor *self, THGenerator *_generator, THTenso
   for (i=0; i<n_dist; i++)
   {
     /* Get normalized cumulative distribution from prob distribution */
+    accreal sum = 0;
     for (j=0; j<n_categories; j++)
     {
       sum += THStorage_(get)( \
@@ -160,7 +160,7 @@ void THTensor_(multinomial)(THLongTensor *self, THGenerator *_generator, THTenso
         /* update cumulative distribution so that sample cannot be drawn again */
         real diff;
         real new_val = 0;
+        accreal sum;

         if (sample_idx != 0)
         {
105 Tensorflow/Keras 2ccbbdb4b06bf0d60d02c7cf316fce117b77df55 C++ fix overflow/underflow softmax output is NaN overflow/underflow Direct calculation of the softmax function according to its definition formula is fraught with numerical issues. The single-precision exp(x) function overflows
for x > 89 and underflows for x < −104, which, in turn, causes NaN outputs in naïve implementations.
activation functions activation functions softmax, openGL use a different algorithm use a different algorithm Implement a three-pass softmax algorithm, see the algorithm in https://arxiv.org/pdf/2001.04438.pdf softmax
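For illustration only, a NumPy sketch of the standard max-subtraction rewrite that keeps exp's argument non-positive; the committed OpenGL/OpenCL kernels follow the multi-pass algorithm of the paper cited above, not this exact code, and the function names here are illustrative.
import numpy as np

def naive_softmax(x):
    e = np.exp(x)                  # exp overflows to inf for large logits
    return e / e.sum()

def shifted_softmax(x):
    e = np.exp(x - x.max())        # largest exponent is 0, so no overflow
    return e / e.sum()

x = np.array([10.0, 1000.0, 990.0], dtype=np.float32)
print(naive_softmax(x))            # [0., nan, nan] -- inf/inf, plus an overflow warning
print(shifted_softmax(x))          # [~0., ~0.99995, ~4.5e-05]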
106 Tensorflow/Keras 115623e2fc21affeaeee5167daec9c1f0db27069 C++ fix overflow/underflow softmax output is NaN overflow/underflow Direct calculation of the softmax function according to its definition formula is fraught with numerical issues. The single-precision exp(x) function overflows
for x > 89 and underflows for x < −104, which, in turn, causes NaN outputs in naïve implementations.
activation functions activation functions softmax, openCL use a different algorithm use a different algorithm Implement a three-pass softmax algorithm, see the algorithm in https://arxiv.org/pdf/2001.04438.pdf softmax
107 Tensorflow/Keras e665a737f90564cd143fdc1b15420720596d17e1 C++ fix underflow tensor math statistics mean test rewrite math formula rewrite math formula auto input_rng = std::bind(
-      std::uniform_real_distribution<float>(-15.0f, 15.0f), std::ref(rng));
auto input_rng =
+      std::bind(std::uniform_real_distribution<float>(), std::ref(rng));
108 Tensorflow/Keras e60c1ba960e598be9c0e0cdd331cdc10e8919dbb C++ fix overflow/underflow activation functions activation functions LSTM, logistic function rewrite math formula rewrite math formula XlaOp Logistic(XlaOp x) {
-  auto half = xla::ScalarLike(x, 0.5);
-  return half + half * xla::Tanh(half * x);
}
XlaOp Logistic(XlaOp x) {
+  auto one = xla::ScalarLike(x, 1);
+  return xla::Div(one, (one + xla::Exp(xla::Neg(x))));
}
def testFloatOpsDisabledOnMlirBridge(self):
+    for dtype in self.float_types:
+      if dtype != np.float16:
+        self._assertOpOutputMatchesExpected(
+            lambda x: math_ops.sigmoid(x) / math_ops.log1p(math_ops.exp(x)),
+            np.array([-40, 40], dtype=dtype),
+            expected=np.array([1.0, 0.025], dtype=dtype))
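A small float32 sketch (function names are illustrative) of why the two logistic formulations behave differently in the tails: the half-tanh form flushes tiny outputs to exactly 0, while the exp form keeps them representable, which matters once the result feeds a log as in the added test.
import numpy as np

def sigmoid_tanh(x):               # the form removed by the patch
    return np.float32(0.5) + np.float32(0.5) * np.tanh(np.float32(0.5) * x)

def sigmoid_exp(x):                # the form the patch switches to
    return np.float32(1) / (np.float32(1) + np.exp(-x))

x = np.float32(-40.0)
print(sigmoid_tanh(x))             # 0.0 -- tanh(-20) already rounds to -1 in float32
print(sigmoid_exp(x))              # ~4.25e-18, still representable, so a later log() stays finite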
109 Tensorflow/Keras 86fa42f516e4c5ca5ac3b2430aeab9d1a55afb30 python fix loss of precision the output of the derivative of betainc is NaN invalid operation I = betainc(X,Z,W) computes the incomplete beta function for corresponding elements of the arrays X, Z and W. The elements of X must be in the closed interval [0, 1]. The arrays Z and W must be nonnegative and real. All arrays must be the same size, or any of them can be scalar. When calculating the derivative of betainc, if a or b is equal to 1, there is a risk that log(0) occurs gradients/derivatives derivatives derivative of Betainc (incomplete beta function) rewrite math formula rewrite math formula Use xlog1py and xlogy instead of log. The function xlog1py computes x * log1p(y) for given x and y; it safely returns zero when x = 0, no matter what the value of y is. The function xlogy(x, y) returns 0 if x == 0, and x * log(y) otherwise, elementwise. partial_x = math_ops.exp((b - 1) * math_ops.log(1 - x) +
-                           (a - 1) * math_ops.log(x) - log_beta)
# We use xlog1py and xlogy since the derivatives should tend to
+  # zero one one of the tails when a is 1. or b is 1.
+  partial_x = math_ops.exp(math_ops.xlog1py(b - 1, -x) +
+                           math_ops.xlogy(a - 1, x) - log_beta)
exp, log
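scipy.special provides xlogy and xlog1py with the same zero-at-zero convention as the math_ops versions used in the patch (assuming SciPy is available); a minimal sketch of the endpoint the gradient hits:
import numpy as np
from scipy.special import xlog1py, xlogy

a, b, x = 1.0, 3.0, 0.0            # endpoint where the naive gradient evaluates log(0)

naive = (b - 1) * np.log(1 - x) + (a - 1) * np.log(x)   # 0 * (-inf) -> nan
safe = xlog1py(b - 1, -x) + xlogy(a - 1, x)              # 0 by convention when the factor is 0

print(naive, safe)                 # nan 0.0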
110 Tensorflow/Keras ee85e6d230278e763a2784ba86acc747abdb2242 C++ fix loss of precision decreased accuracy tensor math statistics variance use a different algorithm use a different algorithm Use the more numerically stable two-pass algorithm to calculate the variance in MeanStddevNormalization.    for (int batch = 0; batch < n_batch; ++batch) {
     float sum = 0.0f;
-    float sum_sq = 0.0f;
     for (int i = 0; i < v_size; ++i) {
       sum += input_vector[i];
-      sum_sq += input_vector[i] * input_vector[i];
     }
     const float mean = sum / v_size;
-    const float variance = sum_sq / v_size - mean * mean;
   for (int batch = 0; batch < n_batch; ++batch) {
     float sum = 0.0f;
     }
     const float mean = sum / v_size;
-    const float variance = sum_sq / v_size - mean * mean;
+    float sum_diff_sq = 0.0f;
+    for (int i = 0; i < v_size; ++i) {
+      const float diff = input_vector[i] - mean;
+      sum_diff_sq += diff * diff;
+    }
+    const float variance = sum_diff_sq / v_size;
variance, sum of squares
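A minimal float32 sketch (toy data, illustrative function names) contrasting the single-pass E[x^2] - E[x]^2 formula removed here with the two-pass formula the patch introduces:
import numpy as np

def naive_variance(v):             # single pass: E[x^2] - E[x]^2
    s = np.float32(0); s_sq = np.float32(0)
    for x in v:
        s += x; s_sq += x * x      # x*x keeps only ~7 significant digits in float32
    mean = s / len(v)
    return s_sq / len(v) - mean * mean

def two_pass_variance(v):
    mean = v.mean()                # first pass: the mean
    d = np.float32(0)
    for x in v:
        d += (x - mean) * (x - mean)   # second pass: squared deviations
    return d / len(v)

v = np.array([10000.0, 10000.1, 10000.2], dtype=np.float32)
print(naive_variance(v))           # about -16: cancellation wipes out the answer
print(two_pass_variance(v))        # ~0.0067, the correct variance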
111 Tensorflow/Keras fd2d8bc50e9b3143544819bf505326e4ed6db2a5 C++ fix overflow/underflow incorrect result overflow XlaOp = a handle to an operation in an XLA computation. asinh(x) = log(x + sqrt(x^2 + 1)) carries a risk of overflow due to x^2 for large x tensor math tensor math inverse hyperbolic sine rewrite math formula rewrite math formula For positive x, we can approximate x + sqrt(x^2 + 1) as 2*x and return log(2) + log(x). For negative x we utilize asinh(-x) = -asinh(x) XlaOp Asinh(XlaOp x) { return Log(x + Sqrt(x * x + ScalarLike(x, 1.0))); } XlaOp Asinh(XlaOp x) {
+  XlaBuilder* b = x.builder();
+  auto do_it = [&](XlaOp x) -> StatusOr<XlaOp> {
+    TF_ASSIGN_OR_RETURN(auto shape, b->GetShape(x));
+    auto one = ScalarLike(x, 1);
+    if (primitive_util::IsComplexType(shape.element_type())) {
+      return Log(x + Sqrt(x * x + one));
+    }
+    auto a = Abs(x);
+    auto naive_result = Log(a + Sqrt(a * a + one));
+    auto overflow_result = Log(Abs(a)) + Log(ScalarLike(a, 2));
+    auto sqrt_max_value = Sqrt(MaxFiniteValue(b, shape.element_type()));
+    return Sign(x) *
+           Select(Ge(a, sqrt_max_value), overflow_result, naive_result);
+  };
+  // These upcasts are not strictly necessary on all platforms to get within our
+  // error tolerances, so we could relax this if it ever mattered.
+  return DoWithUpcastToF32(x, {BF16, F16}, [&](XlaOp x) {
+    return b->ReportErrorOrReturn(do_it(x));
+  });
+}
log, square root, power
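A NumPy sketch of the same idea (illustrative function names, plain float64 instead of XLA types): switch to log|x| + log 2 once x*x would overflow.
import numpy as np

def naive_asinh(x):
    return np.log(x + np.sqrt(x * x + 1.0))        # x*x overflows for large |x|

def safer_asinh(x):
    a = abs(x)
    if a > np.sqrt(np.finfo(np.float64).max):       # here x + sqrt(x^2 + 1) ~ 2x
        return np.sign(x) * (np.log(a) + np.log(2.0))
    return np.sign(x) * np.log(a + np.sqrt(a * a + 1.0))

x = 1e200
print(naive_asinh(x))                  # inf, after an overflow warning
print(safer_asinh(x), np.arcsinh(x))   # both ~461.21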
112 Tensorflow/Keras f84e8257aa88fa45cc7a15835ad386565cd60237 C++ fix loss of precision In Eigen, a reduction is a function taking a matrix or array, and returning a single scalar value. One of the most used reductions is .sum() , returning the sum of all the coefficients inside a given matrix or array. CNN operations pooling layer eigen reduction, summation, EigenPooling use a different algorithm use a different algorithm use a tree algorithm for summation summation
113 Tensorflow/Keras 18f860fd8e1fdffd80633cf5ac32f895423dfa8d C++ fix underflow/loss of precision other random number generator testing, random number generation rewrite math formula rewrite math formula change input range for random number generator std::uniform_real_distribution<FloatT> generator(-0.9f, 1.0f); std::uniform_real_distribution<FloatT> generator(1.0f, 1.125f);
114 Tensorflow/Keras 35ca57d39b9e368ef43302421db774e4ac3e3625 Python fix overflow/underflow overflow/underflow statistical distributions statistical distributions negative binomial distribution rewrite math formula rewrite math formula Use log_sigmoid instead of log1p and log. Also, use logits instead of probabilities return (self.total_count * math_ops.log1p(-self.probs)
-            + x * math_ops.log(self.probs))
return (self.total_count * math_ops.log_sigmoid(-self.logits)
+            + x * math_ops.log_sigmoid(self.logits))
def testLogProbOverflow(self):
+    with self.test_session() as sess:
+      logits = np.float32([20., 30., 40.])
+      total_count = np.float32(1.)
+      x = np.float32(0.)
+      nb = negative_binomial.NegativeBinomial(
+          total_count=total_count, logits=logits)
+      log_prob_ = sess.run(nb.log_prob(x))
+      self.assertAllEqual(np.ones_like(log_prob_, dtype=np.bool),
+                          np.isfinite(log_prob_))
+
+  def testLogProbUnderflow(self):
+    with self.test_session() as sess:
+      logits = np.float32([-90, -100, -110])
+      total_count = np.float32(1.)
+      x = np.float32(0.)
+      nb = negative_binomial.NegativeBinomial(
+          total_count=total_count, logits=logits)
+      log_prob_ = sess.run(nb.log_prob(x))
+      self.assertAllEqual(np.ones_like(log_prob_, dtype=np.bool),
+                          np.isfinite(log_prob_))
log
115 Tensorflow/Keras 2114fd51e9e4fe3cefc058fe42363f68126a9da6 C++ fix overflow/underflow overflow/underflow softplus(x) = log(exp(x) + 1), softplus is a smooth approximation of relu. Like relu, softplus always takes on positive values.

activation functions activation functions softplus rewrite math formula rewrite math formula XLAJIT_MAKE_UNARY(Softplus,
-                  b->Log(b->Add(b->Exp(x), XlaHelpers::One(b, input_type(0)))));
static xla::ComputationDataHandle Softplus(
+    xla::ComputationBuilder* b, DataType dtype,
+    const xla::ComputationDataHandle& features) {
+  xla::ComputationDataHandle threshold =
+      b->Add(b->Log(XlaHelpers::Epsilon(b, dtype)),
+             XlaHelpers::FloatLiteral(b, dtype, 2.0));
+  // Value above which exp(x) may overflow, but softplus(x) == x
+  // is within machine epsilon.
+  xla::ComputationDataHandle too_large = b->Gt(features, b->Neg(threshold));
+  // Value below which exp(x) may underflow, but softplus(x) == exp(x)
+  // is within machine epsilon.
+  xla::ComputationDataHandle too_small = b->Lt(features, threshold);
+  xla::ComputationDataHandle features_exp = b->Exp(features);
+  xla::ComputationDataHandle output = b->Select(
+      too_large, features,
+      b->Select(too_small, features_exp,
+                b->Log(b->Add(features_exp, XlaHelpers::One(b, dtype)))));
+  return output;
+}
+XLAJIT_MAKE_UNARY(Softplus, Softplus(b, input_type(0), x));
def _assertSoftplusMatchesExpected(self, features, dtype):
+    features = np.array(features, dtype=dtype)
+    zero = np.asarray(0).astype(dtype)
+    expected = np.logaddexp(zero, features)
+    self._assertOpOutputMatchesExpected(
+        nn_ops.softplus, features, expected=expected)
+
+  def testSoftplus(self):
+    for dtype in self.float_types:
+      self._assertSoftplusMatchesExpected([[-2, 0, 8]], dtype)
+      self._assertSoftplusMatchesExpected(
+          [[-9, 7, -5, 3, -1], [1, -3, 5, -7, 9]], dtype)
+      log_eps = np.log(np.finfo(dtype).eps)
+      one = dtype(1)
+      ten = dtype(10)
+      self._assertSoftplusMatchesExpected([
+          log_eps, log_eps - one, log_eps + one, log_eps - ten,
+          log_eps + ten, -log_eps, -log_eps - one, -log_eps + one,
+          -log_eps - ten, -log_eps + ten], dtype)
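A small NumPy illustration of both failure modes of the naive softplus; np.logaddexp(0, x) is the reference the added test compares against, and naive_softplus is an illustrative name.
import numpy as np

def naive_softplus(x):
    return np.log(np.exp(x) + 1.0)     # overflows for large x; returns exactly 0 for very negative x

x = np.array([-40.0, 0.0, 100.0], dtype=np.float32)
print(naive_softplus(x))               # [0.       0.6931  inf]
print(np.logaddexp(0.0, x))            # [4.25e-18 0.6931  100.] -- what the added test expects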
116 Tensorflow/Keras 448de13b1ae2ebc96a49785cee5ae98db1ae7b06 C++ fix overflow/underflow linear algebra determinant log determinant of a matrix use a different algorithm use a different algorithm Compute the log determinant through a Partially Pivoted LU decomposition
117 Tensorflow/Keras 1193b39c9e58545ac35aae19dfa34a06bdfae073 Python fix underflow Poisson is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space if these events occur with a known constant mean rate and independently of the time since the last event. Lambda can be 0 through infinity. For a small rate lambda in the Poisson distribution, e^(-lambda) causes numerical stability issues, because exp of a very small (very negative) argument produces an exponentially smaller number, which leads to a risk of underflow statistical distributions statistical distributions poisson distribution rewrite math formula rewrite math formula Use the log of the rate instead of the plain rate to avoid exponentiating very small numbers with ops.control_dependencies([check_ops.assert_positive(rate)] if
-                                    validate_args else []):
-        self._rate = array_ops.identity(rate, name="rate")
if (rate is None) == (log_rate is None):
+        raise ValueError("Must specify exactly one of `rate` and `log_rate`.")
+      elif log_rate is None:
+        rate = ops.convert_to_tensor(rate, name="rate")
+        if not rate.dtype.is_floating:
+          raise TypeError("rate.dtype ({}) is a not a float-type.".format(
+              rate.dtype.name))
+        with ops.control_dependencies([check_ops.assert_positive(rate)] if
+                                      validate_args else []):
+          self._rate = array_ops.identity(rate, name="rate")
+          self._log_rate = math_ops.log(rate, name="log_rate")
+      else:
+        log_rate = ops.convert_to_tensor(log_rate, name="log_rate")
+        if not log_rate.dtype.is_floating:
+          raise TypeError("log_rate.dtype ({}) is a not a float-type.".format(
+              log_rate.dtype.name))
+        self._rate = math_ops.exp(log_rate, name="rate")
+        self._log_rate = ops.convert_to_tensor(log_rate, name="log_rate")

class PoissonLogRateTest(PoissonTest):
+
+  def _make_poisson(self, rate, validate_args=False):
+    return poisson_lib.Poisson(
+        log_rate=math_ops.log(rate),
+        validate_args=validate_args)
+
+  def testInvalidLam(self):
+    # No need to worry about the non-negativity of `rate` when using the
+    # `log_rate` parameterization.
+    pass
exp
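A minimal sketch (using math.lgamma; the helper names are illustrative and this is not the TF Poisson class) of why keeping log_rate avoids the log(0) that appears once exp(log_rate) underflows:
import numpy as np
from math import lgamma

def log_prob_from_rate(x, rate):           # Poisson log pmf via the rate
    return x * np.log(rate) - rate - lgamma(x + 1.0)

def log_prob_from_log_rate(x, log_rate):   # same quantity via the log rate
    return x * log_rate - np.exp(log_rate) - lgamma(x + 1.0)

log_rate = np.float32(-800.0)
rate = np.exp(log_rate)                    # underflows to 0.0 in float32
print(log_prob_from_rate(np.float32(1.0), rate))          # -inf: log(0)
print(log_prob_from_log_rate(np.float32(1.0), log_rate))  # -800.0, finite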
118 Tensorflow/Keras 0cff60ebb29f5aba5092988c8b7f13c258115e81 Python fix overflow/underflow linear algebra linear algebra log of hermitian matrix determinant use a different algorithm use a different algorithm Use the property that log det(A) = 2*sum(log(real(diag(C)))), where C is the Cholesky decomposition of A. Add a function to compute the natural log of the determinant for hermitian positive definite matrices in a numerically stable way via Cholesky decompositions. Equivalent to numpy.linalg.slogdet, although no sign is returned since only
+  hermitian positive definite matrices are supported.
def logdet(matrix, name=None):
+  """Computes log of the determinant of a hermitian positive definite matrix.
+
+  ```python
+  # Compute the determinant of a matrix while reducing the chance of over- or
+  underflow:
+  A = ... # shape 10 x 10
+  det = tf.exp(tf.logdet(A))  # scalar
+  ```
+
+  Args:
+    matrix:  A `Tensor`. Must be `float32`, `float64`, `complex64`, or
+      `complex128` with shape `[..., M, M]`.
+    name:  A name to give this `Op`.  Defaults to `logdet`.
+
+  Returns:
+    The natural log of the determinant of `matrix`.
+
+  @compatibility(numpy)
+  Equivalent to numpy.linalg.slogdet, although no sign is returned since only
+  hermitian positive definite matrices are supported.
+  @end_compatibility
+  """
+  # This uses the property that the log det(A) = 2*sum(log(real(diag(C))))
+  # where C is the cholesky decomposition of A.
+  with ops.name_scope(name, 'logdet', [matrix]):
+    chol = gen_linalg_ops.cholesky(matrix)
+    return 2.0 * math_ops.reduce_sum(
+        math_ops.log(math_ops.real(array_ops.matrix_diag_part(chol))),
+        reduction_indices=[-1])
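A NumPy sketch of the identity used in the patch, log det(A) = 2*sum(log(diag(chol(A)))), on an assumed toy matrix whose determinant overflows float64 while its log-determinant is perfectly representable:
import numpy as np

n = 500
A = np.diag(np.full(n, 50.0))              # well-conditioned positive definite matrix, det = 50**500

print(np.linalg.det(A))                    # inf: the product of pivots overflows float64

chol = np.linalg.cholesky(A)
logdet_chol = 2.0 * np.sum(np.log(np.diag(chol)))   # the identity used in the patch
sign, logdet_ref = np.linalg.slogdet(A)

print(logdet_chol, logdet_ref)             # both ~1956.01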
119 Tensorflow/Keras b85601b95eba28605d3de076fa70cabf2f2e32b9 Python fix loss of precision incorrect result In probability theory, an ƒ-divergence is a function D_f(P || Q) that measures the difference between two probability distributions P and Q. If the probability distribution Q is not reparameterized, TensorFlow's gradient will be incorrect since the chain rule stops at samples of unreparameterized distributions other probability ƒ-divergence use a different algorithm use a different algorithm Improve the score-trick to be a valid Csiszar f-Divergence yet numerically stable. Using the Score-Gradient trick results in an unbiased gradient nabla E_q[f(X)]
-  = nabla int dx q(x) f(x)
-  = int dx nabla [ q(x) f(x) ]
-  = int dx q'(x) f(x) + q(x) f'(x)
-  = int dx q(x) nabla [ log(q(x)) stopgrad[f(x)] + f(x) ]
-  = E_q[ nabla [ log(q(X)) stopgrad[f(X)] + f(X) ] ]
-  ~= Avg{ log(q(y_i)) stopgrad[f(y_i)] + f(y_i) : y_i = stopgrad[x_i], x_i ~ q}
grad[ E_q[f(X)] ]
+  = grad[ int dx q(x) f(x) ]
+  = int dx grad[ q(x) f(x) ]
+  = int dx [ q'(x) f(x) + q(x) f'(x) ]
+  = int dx q(x) grad[ f(x) q(x) / stop_grad[q(x)] ]
+  = E_q[ grad[ f(x) q(x) / stop_grad[q(x)] ] ]
120 Tensorflow/Keras e6126230200e2ce9c96da5c9e4dc7f104c645d11 Python fix overflow/underflow overflow/underflow for very small or very large numbers naive direct computation of the log of a sum of exponentials has a risk of underflow and overflow respectively other other Gaussian mixture model, log probability rewrite math formula rewrite math formula Use TensorFlow's reduce_logsumexp function to work in log scale, which is more numerically stable than computing exp -> sum -> log directly, for calculating the log probability self._prior_probs[shard_id] = math_ops.log(
-        math_ops.reduce_sum(
-            math_ops.exp(self._probs[shard_id]), 1, keep_dims=True))
self._prior_probs[shard_id] = math_ops.reduce_logsumexp(
+        self._probs[shard_id], axis=1, keep_dims=True)
def test_random_input_large(self):
+    # sklearn version.
+    iterations = 5  # that should be enough to know whether this diverges
+    np.random.seed(5)
+    num_classes = 20
+    x = np.array([[np.random.random() for _ in range(100)]
+                  for _ in range(num_classes)], dtype=np.float32)
+
+    # skflow version.
+    gmm = gmm_lib.GMM(num_classes,
+                      covariance_type='full',
+                      config=run_config.RunConfig(tf_random_seed=2))
+
+    def get_input_fn(x):
+      def input_fn():
+        return constant_op.constant(x.astype(np.float32)), None
+      return input_fn
+
+    gmm.fit(input_fn=get_input_fn(x), steps=iterations)
+    self.assertFalse(np.isnan(gmm.clusters()).any())
log sum of exp
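A minimal NumPy sketch of the max-shift trick that reduce_logsumexp-style functions rely on (illustrative function names, toy inputs):
import numpy as np

def naive_log_sum_exp(x):
    return np.log(np.sum(np.exp(x)))           # exp overflows / underflows first

def shifted_log_sum_exp(x):
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))   # exponents are all <= 0

big = np.array([1000.0, 1001.0])
small = np.array([-1000.0, -1001.0])
print(naive_log_sum_exp(big), naive_log_sum_exp(small))      # inf -inf
print(shifted_log_sum_exp(big), shifted_log_sum_exp(small))  # ~1001.31 ~-999.69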
121 Tensorflow/Keras fdbd02c8d7f07bd1207938662716fad8857dcd55 Python fix loss of precision deals with the shift parameter, but this feature is not available in TF now tensor math statistics mean, variance rewrite math formula rewrite math formula change the shift value for calculating mean shift = math_ops.cast(shift, dtypes.float32) if (
-        shift is not None and x.dtype == dtypes.float16) else shift
if shift is None:
+      # Compute true mean while keeping the dims for proper broadcasting.
+      shift = array_ops.stop_gradient(
+          math_ops.reduce_mean(y, axes, keep_dims=True))
+    else:
+      shift = math_ops.cast(shift, y.dtype)
     
+    # Reshape shift as needed.
+    shift = array_ops.reshape(shift, array_ops.shape(m_ss))
+    shift.set_shape(m_ss.get_shape())
122 Tensorflow/Keras 7c97f13ace37ac73bb820dec941c55ae4d538581 Python fix underflow Student's t-distribution is defined as the distribution of the random variable t which is (very loosely) the "best" that we can do not knowing sigma. statistical distributions statistical distributions student t distribution log probability rewrite math formula rewrite math formula use log1p instead of log. The function log1p computes the natural logarithm of (1 + x) element-wise. def _log_prob(self, x):
    y = (x - self.mu) / self.sigma
    half_df = 0.5 * self.df
    return (math_ops.lgamma(0.5 + half_df) - math_ops.lgamma(half_df) - 0.5 *
            math_ops.log(self.df) - 0.5 * math.log(math.pi) -
            math_ops.log(self.sigma) -
            (0.5 + half_df) * math_ops.log(1. + math_ops.square(y) / self.df))
def _log_prob(self, x):
    return self._log_unnormalized_prob(x) - self._log_normalization()

  def _log_unnormalized_prob(self, x):
    y = (x - self.mu) / self.sigma  # Abs(sigma) superfluous.
    return -0.5 * (self.df + 1.) * math_ops.log1p(y**2. / self.df)

  def _log_normalization(self):
    return (math_ops.log(math_ops.abs(self.sigma)) +
            0.5 * math_ops.log(self.df) +
            0.5 * np.log(np.pi) +
            math_ops.lgamma(0.5 * self.df) -
            math_ops.lgamma(0.5 * (self.df + 1.)))
log
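A one-line illustration of why log1p is preferred for small arguments: forming 1 + x first already discards most of x's digits.
import numpy as np

x = 1e-9
print(np.log(1.0 + x))   # ~1.00000008e-09: the 1 + x addition already lost ~8 digits of x
print(np.log1p(x))       # ~1.0e-09, accurate to full double precision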
123 Tensorflow/Keras de6ce1de08ea97d599687fbbe5196ca4af5232ae C++ fix overflow large logit values were not properly handled in multinomial distribution statistical distributions statistical distributions Multinomial distribution rewrite math formula rewrite math formula subtract a maximum from logits before taking exponentials running_total += std::exp(static_cast<float>(logits_row[j])) // Takes an along-class maximum (for numerical stability).
+        T max = std::numeric_limits<T>::lowest();
+        for (int64 j = 0; j < num_classes; ++j) {
+          if (std::isfinite(static_cast<float>(logits_row[j]))) {
+            max = std::max(max, logits_row[j]);
+          }
+        }
+        const float max_logit = static_cast<float>(max);
running_total += std::exp(static_cast<float>(logits_row[j]) - max_logit);
def testLargeLogits(self):
+    for neg in [True, False]:
+      with self.test_session(use_gpu=self.use_gpu):
+        logits = np.array([[1000.] * 5])
+        if neg:
+          logits *= -1
+        samples = tf.multinomial(logits, 10).eval()
+      # Sampled classes should be in-range.
+      self.assertTrue((samples >= 0).all())
+      self.assertTrue((samples < 5).all())
exponential
124 Tensorflow/Keras e47dc8593d11be8cd82767965b8b75b6307c07e4 Python fix loss of precision     There is evidence that the 'shift' strategy in computing the sufficient statistics of the moments is actually leading to worse numerical stability for batch normalization.
tensor math statistics mean, variance other amend algorithm change the default of the shift parameter from True to False in the sufficient_statistics method, which calculates the mean and variance used by batch normalization def sufficient_statistics(x, axes, shift=True, keep_dims=False, name=None) def sufficient_statistics(x, axes, shift=False, keep_dims=False, name=None) variance, mean
125 Tensorflow/Keras ab1165c4908b70441f1ddea24821a8b84a806ddc C++ fix overflow/underflow Legalization is the phase in code generation that eradicates any instructions that are not supported by the target. Multi-Level IR Compiler Framework activation functions activation functions sigmoid, compiler other amend algorithm This function converts Sigmoid op to HLO ops computing sigmoid class ConvertSigmoidOp : public OpRewritePattern<TF::SigmoidOp> {

-  using OpRewritePattern::OpRewritePattern;
-
-  LogicalResult matchAndRewrite(TF::SigmoidOp op,
class ConvertSigmoidOp : public RewritePattern {
  public:
+  explicit ConvertSigmoidOp(MLIRContext *context)
+      : RewritePattern(
+            TF::SigmoidOp::getOperationName(), 0, context,
+            {mhlo::ConstOp::getOperationName(),
+             shape::ShapeOfOp::getOperationName(),
+             shape::ToExtentTensorOp::getOperationName(),
+             mhlo::DynamicBroadcastInDimOp::getOperationName(),
+             mhlo::MulOp::getOperationName(), mhlo::TanhOp::getOperationName(),
+             mhlo::AddOp::getOperationName()}) {}
+
+  LogicalResult matchAndRewrite(Operation *sigmoid_op,
                                 PatternRewriter &rewriter) const override {
+    auto op = cast<TF::SigmoidOp>(sigmoid_op);
126 Tensorflow/Keras 6acd86d539464b611d37b8dc13251fafab25fb5c C++ fix loss of precision tensor math tensor math argmin rewrite math formula rewrite math formula amend logic for tie breaking
127 Tensorflow/Keras f73e9d61a7c577a5182701d3aa5bba8d6d69f87d C++ fix loss of precision tensor math tensor math argmin, argmax rewrite math formula rewrite math formula amend logic for tie breaking
128 Tensorflow/Keras ee85e6d230278e763a2784ba86acc747abdb2242 C++ fix loss of precision MeanStddevNormalization is numerically unstable tensor math statistics variance use a different algorithm use a different algorithm Use the numerically stable two-pass algorithm to calculate variance in MeanStddevNormalization. float sum_sq = 0.0f;
sum_sq += input_vector[i] * input_vector[i];
     }
const float variance = sum_sq / v_size - mean * mean;
float sum_diff_sq = 0.0f;
+    for (int i = 0; i < v_size; ++i) {
+      const float diff = input_vector[i] - mean;
+      sum_diff_sq += diff * diff;
+    }
+    const float variance = sum_diff_sq / v_size;
test accuracy for
// small mean, small variance
// small mean, large variance
// large mean, zero variance
// large mean, small variance
// large mean, large variance
129 Tensorflow/Keras f42d9846f6942e497645af28b3506e6163bdc8bf C++ fix underflow A mel spectrogram is a spectrogram where the frequencies are converted to the mel scale. It is used in signal processing and involves mapping an audio signal from the time domain to the frequency domain using the fast Fourier transform loss functions loss functions logistic loss, uniform distribution sampling, Mel-Frequency Cepstral Coefficient (MFCC) calculation use a different algorithm use a different algorithm Replace log(1 + x) with the numerically more stable log1p(x) LogisticLossUpdater : public DualLossUpdater {
- return log(1 + exp(-y_wx)) * example_weight;

double MfccMelFilterbank::FreqToMel(double freq) const {
-  return 1127.0 * log(1.0 + (freq / 700.0));

LogUniformSampler::LogUniformSampler(int64 range)
-    : RangeSampler(range), log_range_(log(range + 1)) {}

static float FreqToMel(float freq) {
-  return 1127.0 * log(1.0 + (freq / 700.0));

double MfccMelFilterbank::FreqToMel(double freq) const {
-  return 1127.0 * log(1.0 + (freq / 700.0));
class LogisticLossUpdater : public DualLossUpdater {
+      return log1p(exp(-y_wx)) * example_weight;

double MfccMelFilterbank::FreqToMel(double freq) const {
+  return 1127.0 * log1p(freq / 700.0);

LogUniformSampler::LogUniformSampler(int64 range)
+    : RangeSampler(range), log_range_(log1p(range)) {}

static float FreqToMel(float freq) {
+  return 1127.0 * log1p(freq / 700.0);

double MfccMelFilterbank::FreqToMel(double freq) const {
+  return 1127.0 * log1p(freq / 700.0);

130 Tensorflow/Keras 0fe671dd0a14614edbbd50397777def3bff770cc Cuda fix loss of precision Eigen MeanReducer is numerically unstable due to unstable summation operation. Summing numbers of different magnitude leads to loss of precision, numbers should be sorted tensor math statistics mean use a different algorithm use a different algorithm Don't use the numerically unstable MeanReducer class in Eigen.
131 Tensorflow/Keras f84e8257aa88fa45cc7a15835ad386565cd60237 C++ fix loss of precision Eigen MeanReducer is numerically unstable due to unstable summation operation. Summing numbers of different magnitude leads to loss of precision, numbers should be sorted CNN operations pooling layer mean, average pooling use a different algorithm use a different algorithm Change the Eigen reduction code to use a tree to improve numerical stability.
    This changes the InnerMostDimReducer to use a summation tree, which is more numerically stable than the previous approach of sequential addition into an accumulator.
    This solves the issue for reduction over all or a trailing subset of dimensions.
    This change does not improve the numerical accuracy for MeanReducer, which maintains state.
    
    Benchmarks show a 40% (AVX) to 50% (SSE) slowdown for small row reductions (sum, float). column- and full reductions are unchanged.
132 Tensorflow/Keras fa2132ab65f92ea40c94152dba105a9f86a0a555 Python fix loss of precision unsorted sum is numerically unstable gradients/derivatives gradients gradients, hessians, boosted trees increase variable precision/change variable type increase variable precision Use 64bit aggregation for gradients and hessians since the 32 bit version is numerically unstable for large minibatches.      per_partition_hessians = math_ops.unsorted_segment_sum(
-        hessians, mapped_partitions, array_ops.size(unique_partitions))
# Since unsorted_segment_sum can be numerically unstable, use 64bit
+    # operation.
+    gradients64 = math_ops.cast(gradients, dtypes.float64)
+    hessians64 = math_ops.cast(hessians, dtypes.float64)
     per_partition_gradients = math_ops.unsorted_segment_sum(
+        gradients64, mapped_partitions, array_ops.size(unique_partitions))
     per_partition_hessians = math_ops.unsorted_segment_sum(
+        hessians64, mapped_partitions, array_ops.size(unique_partitions))
+    per_partition_gradients = math_ops.cast(per_partition_gradients,
+                                            dtypes.float32)
+    per_partition_hessians = math_ops.cast(per_partition_hessians,
+                                           dtypes.float32)
sum
133 Tensorflow/Keras 48adc7ba73177f2a9331918b160bc3d0775985b8 Python fix underflow square root linear algebra norm L2 norm rewrite math formula rewrite math formula Avoid the potentially numerically unstable square root in linalg_ops.norm() mean = math_ops.square(linalg_ops.norm(m - m_w))  # This uses the L2 norm. mean = math_ops.reduce_sum(
+      math_ops.squared_difference(m, m_w))  # Equivalent to L2 but more stable.
square root, mean
134 Tensorflow/Keras 18f860fd8e1fdffd80633cf5ac32f895423dfa8d C++ fix underflow In computing, a normal number is a non-zero number in a floating-point representation which is within the balanced range supported by a given floating-point format: it is a floating point number that can be represented without leading zeros in its significand. other random number generator random number generator limit input range limit input range Create uniform numbers between 1 and 1.125  instead of -0.9 and 1.0 to avoid creating denormal numbers. std::uniform_real_distribution<FloatT> generator(-0.9f, 1.0f);
[&](tensorflow::gtl::ArraySlice<int64> /*indices*/) {
-        return generator(engine);
std::uniform_real_distribution<FloatT> generator(1.0f, 1.125f);
[&](tensorflow::gtl::ArraySlice<int64> indices) {
+        // Generate a random uniforma number from -0.0625 and 0.0625 and bias it
+        // with  a position dependent nubmer with mean 0.037109375. These number
+        // should allow for long chains of accumulation without being too close
+        // to zero or to large to accumulate all numbers accurately.
+        return (generator(engine) - 1.0625) +
+               static_cast<FloatT>(Product(indices) % 113 - 47) /
+                   static_cast<FloatT>(256.0f);
135 Tensorflow/Keras 6db014b44863bab616f026beab461fd646fcb505 C++ fix overflow/underflow gradients/derivatives gradients gradients testing other revert commit N/A TEST_F(NaryGradTest, Select) {
+  TensorShape shape({3, 2});
+  auto x1 = Placeholder(scope_, DT_FLOAT, Placeholder::Shape(shape));
+  auto x2 = Placeholder(scope_, DT_FLOAT, Placeholder::Shape(shape));
+  // Use constant values to avoid instability when computing
+  Tensor c =
+      test::AsTensor<float>({-3.5f, 1.5f, -1.2f, 3.0f, -2.5f, 2.8f}, {3, 2});
+  auto zero = Cast(scope_, Const(scope_, 0.0), c.dtype());
+  auto y = Where3(scope_, Greater(scope_, c, zero), x1, x2);
+  RunTest({x1, x2}, {shape, shape}, {y}, {shape});
+}

Status SelectGrad(const Scope& scope, const Operation& op,
+                  const std::vector<Output>& grad_inputs,
+                  std::vector<Output>* grad_outputs) {
+  auto comparator = op.input(0);
+  auto x = op.input(1);
+  auto zeros = ZerosLike(scope, x);
+  auto grad = grad_inputs[0];
+
+  auto gx_1 = Where3(scope, comparator, grad, zeros);
+  auto gx_2 = Where3(scope, comparator, zeros, grad);
+
+  grad_outputs->push_back(NoGradient());
+  grad_outputs->push_back(gx_1);
+  grad_outputs->push_back(gx_2);
+  return scope.status();
+}
+REGISTER_GRADIENT_OP("Select", SelectGrad);
136 Tensorflow/Keras 1bbec9e4e9c5d3fbbc2fa2b58841435e86dbf76a Cuda fix overflow linear algebra linear algebra log determinant use a different algorithm use a different algorithm Compute Determinant from a partially pivoted LU factorization
Change behavior for Determinant on matrices with (numerically) infinite determinants to match the behavior of numpy.linalg.det: Return inf for matrix with infinite determinant.
137 Tensorflow/Keras 265483857be3ca84b992937490ea8f0591b2d4ab Python fix overflow/underflow statistical distributions statistical distributions laplace distribution use a different algorithm use a different algorithm Add more stable calculation of Log of the cumulative distribution function and log survival function
138 Tensorflow/Keras e37e792d3eb2dac7ac627b7d8d56d69360649d19 Python fix loss of precision The raw formulation of cross-entropy, tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(tf.softmax(y)),
reduction_indices=[1])) can be numerically unstable.
loss functions loss functions cross entropy loss rewrite math formula rewrite math formula we apply
+`tf.nn.softmax_cross_entropy_with_logits` on the unnormalized logits (e.g., we
+call `softmax_cross_entropy_with_logits` on `tf.matmul(x, W) + b`), because this
+more numerically stable function internally computes the softmax activation.
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1])) cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(y, y_))
139 Tensorflow/Keras a89c54d57209f91161fa450605f645c9124d89ac Python fix loss of precision statistical distributions statistical distributions Bernoulli distribution use a different algorithm use a different algorithm use logits to create the Bernoulli distribution
140 Tensorflow/Keras 14066c4b84e56c3b86f6152de1bb80df22341aa8 Python fix overflow/underflow statistical distributions statistical distributions log determinant, multivariate normal distribution use a different algorithm use a different algorithm Compute log_determinant instead of determinant in mvn to make stable (w.r.t. under/over flow). -def _determinant_from_sigma_chol(sigma_chol):
   det_last_dim = array_ops.rank(sigma_chol) - 2
   sigma_batch_diag = array_ops.batch_matrix_diag_part(sigma_chol)
-  det = math_ops.square(math_ops.reduce_prod(
-      sigma_batch_diag, reduction_indices=det_last_dim))
-  det.set_shape(sigma_chol.get_shape()[:-2])
-  return det
def _log_determinant_from_sigma_chol(sigma_chol):
   det_last_dim = array_ops.rank(sigma_chol) - 2
   sigma_batch_diag = array_ops.batch_matrix_diag_part(sigma_chol)
+  log_det = 2.0 * math_ops.reduce_sum(
+      math_ops.log(sigma_batch_diag), reduction_indices=det_last_dim)
+  log_det.set_shape(sigma_chol.get_shape()[:-2])
+  return log_det
141 Tensorflow/Keras bce6216610d57f8f4b1e9e79836737df109c4e42 Python fix loss of precision tensor math statistics variance with shifted data use a different algorithm use a different algorithm
142 Tensorflow/Keras 66f452d2217b155b697fc6d6cef5f56599ee2bbc C++ fix overflow overflow Only enable the HoistCommonFactorOutOfAggregation rewrite in aggressive mode, since it changes program behavior w.r.t. over- and underflow. For example, it will rewrite "0.5*x + 0.5*y" to "0.5*(x + y)", which will overflow if x + y > FLT_MAX, while the original expression does not overflow unless x + y > 2*FLT_MAX. optimizers optimizers arithmetic optimizer other allow code rewriting only in agressive mode Only enable the HoistCommonFactorOutOfAggregation rewrite in aggressive mode
143 Tensorflow/Keras 12243e6b65958c2e0c344aa3df4875f472ce5ae0 C++ fix overflow overflow other computational graph analytical cost estimator add overflow check add overflow check Fix integer-overflow in `tensorflow::grappler::AnalyticalCostEstimator::PredictCosts` by using MultiplyWithoutOverflow. MultiplyWithoutOverflow multiplies unsigned ints, since signed overflow is undefined, and checks for integer overflow. Return a null result on overflow size *= std::max<int64>(1, dim.size());       size = MultiplyWithoutOverflow(size, std::max<int64>(1, dim.size()));
      if (size < 0) {
        return errors::InvalidArgument(
            "Integer overflow encountered in dimension size.");
      }
144 Tensorflow/Keras cc464f04caa327d3f62d2f793a428cb7b0f0a5d7 Python unit test overflow overflow linear algebra linear algebra array product limit input range limit input range Limit input values to avoid integer overflow in reduction_ops_test. # overflow, divide the incremental int32 array by 2.
-    for rank in range(1, _MAX_RANK + 1):
-      np_arr = self._makeIncremental((2,) * rank, dtypes.int32) / 2

def testInt64(self):
-    for rank in range(1, _MAX_RANK + 1):
-      np_arr = self._makeIncremental((2,) * rank, dtypes.int64)
  # overflow, limit array values.
+    for rank in range(1, _MAX_RANK):
+      np_arr = self._makeIncremental((2,) * rank, dtypes.int32) % 5 + 1

def testInt64(self):
+    for rank in range(1, _MAX_RANK):
+      # Avoid overflow by limiting array values.
+      np_arr = self._makeIncremental((2,) * rank, dtypes.int64) % 11 + 1
145 Tensorflow/Keras 9d40a1573849b7e21d4f2d359fd9e87c40e33c0e Python Disable test overflow overflow tensor math tensor math division, mod testing disable test/warning disable test for division and mod Temporarily disable div overflow edge case due to ASAN failure.
146 Tensorflow/Keras b47be308c4b5ac7babd6400a8fb40c3d8bf163d6 C++ fix overflow overflow The original implementations of `google_floor_div`, XLA `FloorDiv` and
    MLIR `TF_FloorDivOp` all suffered from overflows for
`abs(x) + abs(y) > INT_MAX`
tensor math tensor math floor division rewrite math formula rewrite math formula Rewrite formula to
T z = x / y
return (z * y != x && (x < 0) != (y < 0)) ? z - 1 : z
def intEdgeTestData(self, dtype):
+    """Edge-case test data for integer types."""
+    nums = np.array([np.iinfo(dtype).min, -1, 1,
+                     np.iinfo(dtype).max],
+                    dtype=dtype).reshape([4, 1])
+    divs = nums.reshape([1, 4])
+    return nums, divs
+
+  def testFloorDivModIntEdges(self):
+    for dtype in [np.int32, np.int64]:
+      x, y = self.intEdgeTestData(dtype)
+      tf_floor_div = math_ops.floor_div(x, y)
+      np_floor_div = self.numpySafeFloorDivInt(x, y)
+      self.assertAllEqual(tf_floor_div, np_floor_div)
+      tf_floor_mod = math_ops.floormod(x, y)
+      np_floor_mod = self.numpySafeFloorModInt(x, y)
+      self.assertAllEqual(tf_floor_mod, np_floor_mod)
+      z = math_ops.add(math_ops.multiply(tf_floor_div, y), tf_floor_mod)
+      # x = floor_div(x, y) * y + floor_mod(x, y)
+      self.assertAllEqual(z, np.broadcast_to(x, z.shape))
+
+  def testTruncateDivModIntEdges(self):
+    for dtype in [np.int32, np.int64]:
+      x, y = self.intEdgeTestData(dtype)
+      tf_truncate_div = math_ops.truncatediv(x, y)
+      np_truncate_div = self.numpySafeTruncateDivInt(x, y)
+      self.assertAllEqual(tf_truncate_div, np_truncate_div)
+      tf_truncate_mod = math_ops.truncatemod(x, y)
+      np_truncate_mod = self.numpySafeTruncateModInt(x, y)
+      self.assertAllEqual(tf_truncate_mod, np_truncate_mod)
+      z = math_ops.add(math_ops.multiply(tf_truncate_div, y), tf_truncate_mod)
+      # x = truncatediv(x, y) * y + truncatemod(x, y)
+      self.assertAllEqual(z, np.broadcast_to(x, z.shape))
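A minimal sketch of the rewritten floor division from entry 146 above (plain C++, an illustration rather than the TensorFlow/XLA kernel): truncating division never overflows for y != 0 except for the INT_MIN / -1 corner case, and the quotient is decremented only when there is a nonzero remainder and the operands have opposite signs.
#include <cstdint>

// Assumes y != 0 and not (x == INT64_MIN && y == -1).
int64_t FloorDivSketch(int64_t x, int64_t y) {
  const int64_t z = x / y;  // truncates toward zero
  // |z * y| <= |x|, so the correction term cannot overflow.
  return (z * y != x && (x < 0) != (y < 0)) ? z - 1 : z;
}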
147 Tensorflow/Keras 4c0ee937c0f61c4fc5f5d32d9bb4c67428012a60 C++ fix overflow overflow other sparse operations sparse operations use a different algorithm use a different algorithm Prevent overflow by constructing the dense shape separately sparse::SparseTensor sparse_tensor;
     OP_REQUIRES_OK(context,
-                   sparse::SparseTensor::Create(
-                       input_indices, input_values,
-                       TensorShape(input_shape.vec<int64>()), &sparse_tensor));
TensorShape dense_shape;
+    const auto input_shape_flat = input_shape.flat<int64>();
+    for (int i = 0; i < input_shape.NumElements(); i++) {
+      OP_REQUIRES_OK(context,
+                     dense_shape.AddDimWithStatus(input_shape_flat(i)));
+    }


     sparse::SparseTensor sparse_tensor;
     OP_REQUIRES_OK(context,
+                   sparse::SparseTensor::Create(input_indices, input_values,
+                                                dense_shape, &sparse_tensor));
148 Tensorflow/Keras 7c8cc4ec69cd348e44ad6a2699057ca88faad3e5 C++ fix overflow overflow Op that looks up items from a sparse tensor in an embedding matrix. The sparse lookup tensor is represented by three individual tensors: lookup, indices, and dense_shape. integer overflow other sparse operations sparse operations, embedding add overflow check add overflow check ensure that output is not a null pointer that indicates overflow N/A TF_LITE_ENSURE(context, output_shape != nullptr);
149 Tensorflow/Keras 37054f9134af917ded7f40c7d663fa490d85c7d4 C++ fix overflow overflow activation functions activation functions range of activation function, quantization add overflow check add overflow check Add extra robustness by adding more overflow checks to CalculateActivationRangeQuantized for cases where output tensor has bad, but still valid quantization parameters, which cause integer overflow.
void CalculateActivationRangeQuantizedImpl(TfLiteFusedActivation activation,
-                                           int32_t qmin, int32_t qmax,
-                                           TfLiteTensor* output,
-                                           int32_t* act_min, int32_t* act_max) {

   if (activation == kTfLiteActRelu) {
-    *act_min = std::max(qmin, quantize(0.0));
   } else if (activation == kTfLiteActRelu6) {
-    *act_min = std::max(qmin, quantize(0.0));
-    *act_max = std::min(qmax, quantize(6.0));
   } else if (activation == kTfLiteActReluN1To1) {
-    *act_min = std::max(qmin, quantize(-1.0));
-    *act_max = std::min(qmax, quantize(1.0));

inline TfLiteStatus Quantize(TfLiteContext* context, float scale,
+                             int32_t zero_point, float f, int32_t& q) {
+  const float tmp = TfLiteRound(f / scale);
+  const bool no_integer_overflow_from_quantization =
+      (tmp >= std::numeric_limits<int32_t>::min() &&
+       tmp <= std::numeric_limits<int32_t>::max());
+  TF_LITE_ENSURE(context, no_integer_overflow_from_quantization);
+  q = zero_point + static_cast<int32_t>(tmp);
+  return kTfLiteOk;
+}
+
+TfLiteStatus CalculateActivationRangeQuantizedImpl(
+    TfLiteContext* context, TfLiteFusedActivation activation, int32_t qmin,
+    int32_t qmax, TfLiteTensor* output, int32_t* act_min, int32_t* act_max) {
+  int32_t tmp_q;
   if (activation == kTfLiteActRelu) {
+    TF_LITE_ENSURE_OK(context,
+                      Quantize(context, scale, zero_point, 0.0, tmp_q));
+    *act_min = std::max(qmin, tmp_q);
     *act_max = qmax;
   } else if (activation == kTfLiteActRelu6) {
+    TF_LITE_ENSURE_OK(context,
+                      Quantize(context, scale, zero_point, 0.0, tmp_q));
+    *act_min = std::max(qmin, tmp_q);
+    TF_LITE_ENSURE_OK(context,
+                      Quantize(context, scale, zero_point, 6.0, tmp_q));
+    *act_max = std::min(qmax, tmp_q);
   } else if (activation == kTfLiteActReluN1To1) {
+    TF_LITE_ENSURE_OK(context,
+                      Quantize(context, scale, zero_point, -1.0, tmp_q));
+    *act_min = std::max(qmin, tmp_q);
+    TF_LITE_ENSURE_OK(context,
+                      Quantize(context, scale, zero_point, 1.0, tmp_q));
+    *act_max = std::min(qmax, tmp_q);
   } else {
     *act_min = qmin;
     *act_max = qmax;
   }
+  return kTfLiteOk;
TEST_F(KernelUtilTest, ActivationRangeQuantizedOverflow) {
+  // Create output.
+  TfLiteTensor output = {};
+  output.type = kTfLiteUInt8;
+  output.allocation_type = kTfLiteArenaRw;
+  output.dims = nullptr;
+  TfLiteQuantizationParams output_quant = {1e-10, -128};
+  output.params = output_quant;
+  output.quantization.type = kTfLiteAffineQuantization;
+  auto* output_params = reinterpret_cast<TfLiteAffineQuantization*>(
+      malloc(sizeof(TfLiteAffineQuantization)));
+  output_params->scale = TfLiteFloatArrayCreate(1);
+  output_params->scale->data[0] = 1;
+  output_params->zero_point = TfLiteIntArrayCreate(1);
+  output_params->zero_point->data[0] = -128;
+  output.quantization.params = reinterpret_cast<void*>(output_params);
+
+  // For bounded activation, a too small scale value may cause overflow.
+  // Make sure overflow error is handled gracefully.
+  int32_t act_min, act_max;
+  ASSERT_EQ(kTfLiteOk,
+            CalculateActivationRangeQuantized(&context_, kTfLiteActRelu,
+                                              &output, &act_min, &act_max));
+  ASSERT_NE(kTfLiteOk,
+            CalculateActivationRangeQuantized(&context_, kTfLiteActRelu6,
+                                              &output, &act_min, &act_max));
+  EXPECT_TRUE(absl::StrContains(
+      context_.error, "no_integer_overflow_from_quantization was not true"));
+  ASSERT_NE(kTfLiteOk,
+            CalculateActivationRangeQuantized(&context_, kTfLiteActReluN1To1,
+                                              &output, &act_min, &act_max));
+  EXPECT_TRUE(absl::StrContains(
+      context_.error, "no_integer_overflow_from_quantization was not true"));
+
+  // Release.
+  TfLiteTensorFree(&output);
150 Tensorflow/Keras 4253f96a58486ffe84b61c0415bb234a4632ee73 C++ fix overflow overflow integer overflow other other concatenate limit input range limit input range TfLiteStatus Prepare(TfLiteContext* context, TfLiteNode* node) {
     TF_LITE_ENSURE_EQ(context, t->type, input_type);
     for (int d = 0; d < t0->dims->size; ++d) {
       if (d == axis) {
+        // Avoid integer overflow in sum_axis below
+        TF_LITE_ENSURE(context, t->dims->data[axis] >= 0);
+        TF_LITE_ENSURE(context, t->dims->data[axis] <=
+                                    std::numeric_limits<int>::max() - sum_axis);
sum
151 Tensorflow/Keras 704866eabe03a9aeda044ec91a8d0c83fc1ebdbe C++ fix overflow overflow other other join segments limit input range limit input range OP_REQUIRES(context, num_segments_tensor.NumElements() != 0,
+                errors::InvalidArgument("Number of segments cannot be empty."));
152 Tensorflow/Keras 87cf4d3ea9949051e50ca3f071fc909538a51cd0 C++ fix overflow overflow other sparse operations sparse operations (concat) add overflow check add overflow check break if overflow occurred bool overflow_ocurred = false;
     for (int i = 0; i < N; i++) {
+      int64 new_num_elements = 1;
       OP_REQUIRES(context, TensorShapeUtils::IsVector(shapes[i].shape()),
                   errors::InvalidArgument(
                       "Input shapes should be a vector but received shape ",
                       shapes[i].shape().DebugString(), " at position ", i));
+      auto input_shape_vector = shapes[i].vec<int64>();
+      for (int j = 0; j < input_shape_vector.size(); j++) {
+        new_num_elements =
+            MultiplyWithoutOverflow(new_num_elements, input_shape_vector(j));
+        if (new_num_elements < 0) {
+          overflow_ocurred = true;
+          break;
+        }
+      }
+
+      if (overflow_ocurred) {
+        break;
+      }
     }

+    OP_REQUIRES(
+        context, !overflow_ocurred,
+        errors::Internal("Encountered overflow from large input shape."));
153 Tensorflow/Keras b432a38fe0e1b4b904a6c222cbce794c39703e87 C++ unit test overflow overflow data processing image processing bounding boxes for image processing limit input range limit input range replace CHECK macros with OP_REQUIRES argument validation in the draw-bounding-box op and add useful error messages when an argument is invalid CHECK_GE(min_box_row_clamp, 0);
-        CHECK_GE(max_box_row_clamp, 0);
-        CHECK_LT(min_box_row_clamp, height);
-        CHECK_LT(max_box_row_clamp, height);
-        CHECK_GE(min_box_col_clamp, 0);
-        CHECK_GE(max_box_col_clamp, 0);
-        CHECK_LT(min_box_col_clamp, width);
-        CHECK_LT(max_box_col_clamp, width);
-        CHECK_LT(min_box_row, height);
-        CHECK_GE(max_box_row, 0);
-        CHECK_LT(min_box_col, width);
-        CHECK_GE(max_box_col, 0);
OP_REQUIRES(
+            context, min_box_row_clamp >= 0,
+            errors::InvalidArgument("Min box row clamp is less than 0."));
+        OP_REQUIRES(
+            context, max_box_row_clamp >= 0,
+            errors::InvalidArgument("Max box row clamp is less than 0."));
+        OP_REQUIRES(context, min_box_row_clamp <= height,
+                    errors::InvalidArgument(
+                        "Min box row clamp is greater than height."));
+        OP_REQUIRES(context, max_box_row_clamp <= height,
+                    errors::InvalidArgument(
+                        "Max box row clamp is greater than height."));
+
+        OP_REQUIRES(
+            context, min_box_col_clamp >= 0,
+            errors::InvalidArgument("Min box col clamp is less than 0."));
+        OP_REQUIRES(
+            context, max_box_col_clamp >= 0,
+            errors::InvalidArgument("Max box col clamp is less than 0."));
+        OP_REQUIRES(context, min_box_col_clamp <= width,
+                    errors::InvalidArgument(
+                        "Min box col clamp is greater than width."));
+        OP_REQUIRES(context, max_box_col_clamp <= width,
+                    errors::InvalidArgument(
+                        "Max box col clamp is greater than width."));

+        OP_REQUIRES(
+            context, min_box_row <= height,
+            errors::InvalidArgument("Min box row is greater than height."));
+        OP_REQUIRES(context, max_box_row >= 0,
+                    errors::InvalidArgument("Max box row is less than 0."));
+        OP_REQUIRES(
+            context, min_box_col <= width,
+            errors::InvalidArgument("Min box col is greater than width."));
+        OP_REQUIRES(context, max_box_col >= 0,
+                    errors::InvalidArgument("Max box col is less than 0."));
154 Tensorflow/Keras 87d2b9751513253058be671313db3e32cc13842a C++ unit test overflow overflow other sparse operations sparse operations (concat) fix test/warning fix overflow check                                          shapes.size()));
-    bool overflow_ocurred = false;
     for (int i = 0; i < N; i++) {
-      int new_num_elements = 1;
       OP_REQUIRES(context, TensorShapeUtils::IsVector(shapes[i].shape()),
                   errors::InvalidArgument(
                       "Input shapes should be a vector but received shape ",
                       shapes[i].shape().DebugString(), " at position ", i));
-      auto input_shape_vector = shapes[i].vec<int64>();
-      for (int j = 0; j < input_shape_vector.size(); j++) {
-        new_num_elements =
-            MultiplyWithoutOverflow(new_num_elements, input_shape_vector(j));
-        if (new_num_elements < 0) {
-          overflow_ocurred = true;
-          break;
-        }
-      }
-
-      if (overflow_ocurred) {
-        break;
-      }
     }

-    OP_REQUIRES(
-        context, !overflow_ocurred,
-        errors::Internal("Encountered overflow from large input shape."));
N/A
155 Tensorflow/Keras 7bb2d255e6d404cbfa528d0ffc2f22248e6c1b21 Cuda fix overflow overflow other sparse operations sparse to dense operation increase variable precision/change variable type change variable type use int64 as index Index output_idx = indices[thread_idx * ndims + ndims - 1]; int64 output_idx = indices[thread_idx * ndims + ndims - 1];
156 Tensorflow/Keras dc4d330cfe25bbb0c3e4759dadfb16d4715f338a C++ fix underflow underflow If a complex value's squared norm was denormal but had a non-zero imaginary part, the Householder reflection computation could yield NaNs. linear algebra norm norm use a different algorithm use a different algorithm By using a more accurate norm, we can avoid the underflow. The new Norm helper computes sqrt(x^2 + y^2 + ...), avoiding overflow/underflow. auto mu = Sqrt(Real(alpha * Conj(alpha)) + sigma); XlaOp Norm(std::vector<XlaOp> xs) {
+  CHECK(!xs.empty());
+  XlaOp w;
+  for (size_t i = 0; i < xs.size(); ++i) {
+    xs[i] = Abs(xs[i]);
+    w = i == 0 ? xs[i] : xla::Max(w, xs[i]);
+  }
+
+  XlaOp out;
+  for (size_t i = 0; i < xs.size(); ++i) {
+    XlaOp t = Square(xs[i] / w);
+    out = i == 0 ? t : xla::Add(out, t);
+  }
+  return Select(Eq(w, ZerosLike(w)), ZerosLike(w), w * Sqrt(out));

auto mu = Norm({Real(alpha), Imag(alpha), Sqrt(sigma)});
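A minimal scalar sketch of the scaled norm used in entry 156 (plain C++ with doubles rather than XlaOps; names are illustrative): factoring out the largest magnitude keeps every squared term in [0, 1], so sqrt(x_0^2 + x_1^2 + ...) neither underflows nor overflows in the intermediate computation.
#include <algorithm>
#include <cmath>
#include <vector>

double StableNorm(std::vector<double> xs) {
  double w = 0.0;
  for (double& x : xs) {
    x = std::fabs(x);
    w = std::max(w, x);
  }
  if (w == 0.0) return 0.0;  // all inputs are zero
  double sum = 0.0;
  for (double x : xs) {
    const double t = x / w;  // in [0, 1], so t * t cannot overflow
    sum += t * t;
  }
  return w * std::sqrt(sum);
}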
157 Tensorflow/Keras ff6601a943db5f71fda09210e67ba8e9fd839ae8 C++ fix overflow overflow CombinedNonMaxSuppression greedily selects a subset of bounding boxes in descending order of score data processing image processing non_max_suppression increase variable precision/change variable type change variable type Use an int32 scalar as the default type for representing the maximum number of boxes retained over all classes. If int32 should overflow, use int64 max_total_size = ops.convert_to_tensor(max_total_size)
class CombinedNonMaxSuppressionTest(test_util.TensorFlowTestCase):
+
+  # NOTE(b/142795960): parameterized tests do not work well with tf.tensor
+  # inputs. Due to failures, creating another test `testInvalidTensorInput`
+  # which is identical to this one except that the input here is a scalar as
+  # opposed to a tensor.
+  def testInvalidPyInput(self):
+    boxes_np = [[[[0, 0, 1, 1], [0, 0.1, 1, 1.1], [0, -0.1, 1, 0.9],
+                  [0, 10, 1, 11], [0, 10.1, 1, 11.1], [0, 100, 1, 101]]]]
+    scores_np = [[[0.9, 0.75, 0.6, 0.95, 0.5, 0.3]]]
+    max_output_size_per_class = 5
+    max_total_size = 2**31
+    with self.assertRaisesRegex(
+        (TypeError, ValueError),
+        "type int64 that does not match expected type of int32|"
+        "Tensor conversion requested dtype int32 for Tensor with dtype int64"):
+      image_ops.combined_non_max_suppression(
+          boxes=boxes_np,
+          scores=scores_np,
+          max_output_size_per_class=max_output_size_per_class,
+          max_total_size=max_total_size)
+
+  # NOTE(b/142795960): parameterized tests do not work well with tf.tensor
+  # inputs. Due to failures, creating another this test which is identical to
+  # `testInvalidPyInput` except that the input is a tensor here as opposed
+  # to a scalar.
+  def testInvalidTensorInput(self):
+    boxes_np = [[[[0, 0, 1, 1], [0, 0.1, 1, 1.1], [0, -0.1, 1, 0.9],
+                  [0, 10, 1, 11], [0, 10.1, 1, 11.1], [0, 100, 1, 101]]]]
+    scores_np = [[[0.9, 0.75, 0.6, 0.95, 0.5, 0.3]]]
+    max_output_size_per_class = 5
+    max_total_size = ops.convert_to_tensor(2**31)
+    with self.assertRaisesRegex(
+        (TypeError, ValueError),
+        "type int64 that does not match expected type of int32|"
+        "Tensor conversion requested dtype int32 for Tensor with dtype int64"):
+      image_ops.combined_non_max_suppression(
+          boxes=boxes_np,
+          scores=scores_np,
+          max_output_size_per_class=max_output_size_per_class,
+          max_total_size=max_total_size)
158 Tensorflow/Keras 94b6db8cc538408cc29d88be13307f9fd8a77120 C++ fix overflow overflow Dynamic stitch interleaves the values from the data tensors into a single tensor. slice_size must not be stored as int for cases of tensors over 2GB. data processing data dynamic_stitch increase variable precision/change variable type change variable type use auto type instead of int const int slice_size = merged_flat.dimension(1); const auto slice_size = merged_flat.dimension(1);
159 Tensorflow/Keras 087859fce9409991164f727735743da4cb310fd4 C++ fix overflow overflow large input size other computational graph bilinear operation, computational graph optimization increase variable precision/change variable type change variable type use int64 instead of int const int output_elements = CalculateTensorElementCount( const int64 output_elements = CalculateTensorElementCount // Cost with very large tensor.
+    op_context.op_info.clear_outputs();
+    // Number of elements in tensor exceeds 2^32.
+    constexpr int64 kLargeOutputImageDim = 40000;
+    DescribeTensor4D(1, kLargeOutputImageDim, kLargeOutputImageDim,
+                     kChannelSize, op_context.op_info.add_outputs());
+    const int64 kInterpWeightCost = 12;
+    // Using half_pixel_centers.
+    AttrValue half_pixel_centers;
+    half_pixel_centers.set_b(true);
+    (*op_context.op_info.mutable_attr())["half_pixel_centers"] =
+        half_pixel_centers;
+
+    const int64 num_ops =
+        kInterpWeightCost * (kLargeOutputImageDim * 2) +
+        kComputeLerpCost *
+            (kLargeOutputImageDim * kLargeOutputImageDim * kChannelSize);
+    const int64 expected_compute_time = std::ceil(
+        num_ops /
+        estimator_.GetDeviceInfo(op_context.op_info.device()).gigaops);
+
+    const int64 expected_memory_time =
+        (kImageDim * kImageDim + kLargeOutputImageDim * kLargeOutputImageDim) *
+        4;
+
+    const auto cost = PredictCosts(op_context);
+    EXPECT_EQ(cost.compute_time, Costs::Duration(expected_compute_time));
+    EXPECT_EQ(cost.memory_time, Costs::Duration(expected_memory_time));
+    EXPECT_EQ(cost.execution_time,
+              Costs::Duration(expected_memory_time + expected_compute_time));
+    EXPECT_FALSE(cost.inaccurate);
+    EXPECT_EQ(cost.num_ops_with_unknown_shapes, 0);
+  }
160 Tensorflow/Keras 90e89339a9bf04fb304129a01ca50f25fdde441d C++ fix overflow overflow potential overflow in 64-bit MultiplyByQuantizedMultiplier function
quantization quantization quantization use a different algorithm use a different algorithm int32_t reduced_multiplier = (quantized_multiplier + (1 << 15)) >> 16;
int32_t reduced_multiplier = (quantized_multiplier < 0x7FFF0000)
+                                   ? ((quantized_multiplier + (1 << 15)) >> 16)
+                                   : 0x7FFF;
161 Tensorflow/Keras dffb0b56192f4c95fbf563a82742b4a3f4881e05 C++ fix overflow overflow     A U16 of 46977 multiplied by a U16 of 53826, when evaluated in the evaluator,
    results in the operands of the multiply getting promoted to the C++ type "int"
    which is signed. The result of the multiply will overflow a signed int and give
    a negative result.
compiler compiler compiler, XLA HLO (high level operations) increase variable precision/change variable type change variable type promote both operands to "unsigned int" which will not suffer
    from any overflow issues
typename std::enable_if<std::is_integral<T>::value &&
-                                  std::is_signed<T>::value>::type* = nullptr>
-typename std::make_unsigned<T>::type ToArithmeticSafeType(T t) {
-  return static_cast<typename std::make_unsigned<T>::type>(t);

-          typename std::enable_if<!std::is_integral<T>::value ||
-                                  !std::is_signed<T>::value>::type* = nullptr>
namespace detail {
+template <typename T>
+using unsigned_promoted_type_t =
+    std::make_unsigned_t<decltype(std::declval<T>() + std::declval<T>())>;
+}

+          typename std::enable_if<std::is_integral<T>::value>::type* = nullptr>
+detail::unsigned_promoted_type_t<T> ToArithmeticSafeType(T t) {
+  return static_cast<detail::unsigned_promoted_type_t<T>>(t);

+          typename std::enable_if<!std::is_integral<T>::value>::type* = nullptr>
162 Tensorflow/Keras 90b80fba1ade0222713b8a33af00858190532075 C++ fix overflow overflow compiler message overflow compiler compiler compiler, XLA HLO (high level operations) limit input range limit input range limit max inputs SummarizeNodeDef(node_def), ".\n");
string SummarizeNodeDef(const NodeDef& node_def) {
SummarizeNodeDef(node_def, /*max_inputs_in_summary=*/10), ".\n");
+string SummarizeNodeDef(const NodeDef& node_def, int max_inputs_in_summary) {
+    if (max_inputs_in_summary-- == 0) {
+      strings::StrAppend(&ret, "...");
+      break;
+// The parameter `max_inputs_in_summary` specifies how many inputs at most to
+// serialize in the output (in order not to get a string which is overly large).
+// The value `-1` specifies that all inputs will be shown.
+string SummarizeNodeDef(const NodeDef& node_def,
+                        int max_inputs_in_summary = -1);
163 Tensorflow/Keras 036b75a818493a30cd25caef1761931a3bc2b074 C++ fix overflow overflow compiler compiler compiler increase variable precision/change variable type increase variable precision increase precision of index from int to int64 int linear_index = j * vector_size + i; int64 linear_index = j * vector_size + i;
164 Tensorflow/Keras 2adf1114d4dc7ca30e5117acd2dc7aeb3279feb7 C++ unit test overflow overflow The Android Neural Networks API (NNAPI) is available on all Android devices running Android 8.1 (API level 27) or higher. It provides acceleration for TensorFlow Lite models on Android devices with supported hardware accelerators including:

Graphics Processing Unit (GPU)
Digital Signal Processor (DSP)
Neural Processing Unit (NPU)
other other NNAPI delegate add overflow check add overflow check restrict the filter-size overflow check to the quantized reference CPU path     // reference CPU path.
-      Expect(is_accelerator_specified ||
-                 (builtin->filter_width * builtin->filter_height <= 256),
-             NNAPIValidationFailureType::kUnsupportedOperandSize,
-             "Large filter window would overflow on the reference CPU path",
-             &val_ctx);
     // quantized reference CPU path.
+      if (IsQuantized(context->tensors[node->inputs->data[0]].type)) {
+        Expect(is_accelerator_specified ||
+                   (builtin->filter_width * builtin->filter_height <= 256),
+               NNAPIValidationFailureType::kUnsupportedOperandSize,
+               "Large filter window would overflow on the reference CPU path",
+               &val_ctx);
+      }
165 Tensorflow/Keras 85f10eb4200b3b3339340943b288da157e9742e7 C++ unit test overflow overflow Compilers are producing different
    code and resulting in bad assumptions.
precision tests/speed benchmarks overflow test overflow test increase variable precision/change variable type change variable type change type of variable y from auto to int64 -  for (auto x : interesting) {
-    for (auto y : interesting) {
     
-  long double dxy = static_cast<long double>(x) * y;
-      if (dxy > std::numeric_limits<int64>::max()) {
-        EXPECT_LT(xy, 0);
bool HasOverflow(int64 x, int64 y) {
+#ifdef PLATFORM_WINDOWS
+  // `long double` on MSVC is 64 bits not 80 bits - use a windows specific API
+  // for this test.
+  return ::MultiplyHigh(x, y) != 0;
+#else
+  long double dxy = static_cast<long double>(x) * static_cast<long double>(y);
+  return dxy > std::numeric_limits<int64>::max();
+#endif
+}

+  for (int64 x : interesting) {
+    for (int64 y : interesting) {
    if (HasOverflow(x, y)) {
+        EXPECT_LT(xy, 0) << x << " " << y;
166 Tensorflow/Keras 171ba06f5e52078e0aa2112797b5a4227370bbd5 C++ unit test overflow overflow Subgraphs are part of the main graph and are themselves computational graphs by nature. other computational graph tensorflow subgraph graph generation add overflow check add overflow check bring back overflow detection for Windows
167 Tensorflow/Keras 2522ce7dd5d28c9733824a66133fc918290e3ed0 C++ fix overflow overflow data processing tensor allocation tensor allocation add overflow check add overflow check Check for overflow in # of bytes computation of tensor allocation.
Check both for product of shape dimensions (# of elements) and number of bytes (elements * sizeof(data_type)).
no overflow check TfLiteStatus MultiplyAndCheckOverflow(size_t a, size_t b, size_t* product) {
+  constexpr size_t overflow_threshold = (8 * sizeof(size_t)) >> 1;
+  *product = a * b;
+  // If neither integers have non-zero bits past 32 bits can't overflow.
+  // Otherwise check using slow division.
+  if (__builtin_expect((a | b) >> overflow_threshold != 0, false)) {
+    if (a != 0 && *product / a != b) return kTfLiteError;
+  }
+  return kTfLiteOk;

+  for (int k = 0; k < dims_size; k++) {
+    size_t old_count = count;
+    TF_LITE_ENSURE_MSG(
+        &context_,
+        MultiplyAndCheckOverflow(old_count, dims[k], &count) == kTfLiteOk,
+        "BytesRequired number of elements overflowed.\n");
+  }
   size_t type_size = 0;
   TF_LITE_ENSURE_OK(&context_, GetSizeOfType(&context_, type, &type_size));

+  TF_LITE_ENSURE_MSG(
+      &context_, MultiplyAndCheckOverflow(type_size, count, bytes) == kTfLiteOk,
+      "BytesRequired number of bytes overflowed.\n");

TEST(BasicInterpreter, TestOverflow) {
+  TestErrorReporter reporter;
+  Interpreter interpreter(&reporter);
+  TfLiteQuantizationParams quantized;
+
+  ASSERT_EQ(interpreter.AddTensors(1), kTfLiteOk);
+  ASSERT_EQ(interpreter.SetInputs({0}), kTfLiteOk);
+  ASSERT_EQ(interpreter.SetOutputs({0}), kTfLiteOk);
+  // Overflow testing is pointer word size dependent.
+  if (sizeof(size_t) == 8) {
+    // #bits for bytecount = 30 + 30 + 2 = 62 < 64
+    ASSERT_EQ(interpreter.SetTensorParametersReadWrite(
+                  0, kTfLiteFloat32, "in1", {1 << 30, 1 << 30}, quantized),
+              kTfLiteOk);
+    // #bits for element count = 30 + 30 + 2 = 62 < 64 (no overflow)
+    // #bits for byte count = 30 + 30 + 2 + 2 = 64 == 64 (overflow)
+    ASSERT_NE(
+        interpreter.SetTensorParametersReadWrite(
+            0, kTfLiteFloat32, "in1", {1 << 30, 1 << 30, 1 << 2}, quantized),
+        kTfLiteOk);
+    EXPECT_THAT(
+        reporter.error_messages(),
+        testing::EndsWith("BytesRequired number of bytes overflowed.\n"));
+    // #bits for element count = 30 + 30 + 2 + 4 = 66 > 64 (overflow).
+    // #bits for byte count = 30 + 30 + 2 + 4 + 2 = 68 > 64 (overflow).
+    reporter.Reset();
+    ASSERT_NE(interpreter.SetTensorParametersReadWrite(
+                  0, kTfLiteFloat32, "in1", {1 << 30, 1 << 30, 1 << 2, 1 << 4},
+                  quantized),
+              kTfLiteOk);
+    EXPECT_THAT(
+        reporter.error_messages(),
+        testing::EndsWith("BytesRequired number of elements overflowed.\n"));
+
+  } else if (sizeof(size_t) == 4) {
+    // #bits for bytecount = 14 + 14 + 2 = 30 < 32
+    ASSERT_EQ(interpreter.SetTensorParametersReadWrite(
+                  0, kTfLiteFloat32, "in1", {1 << 14, 1 << 14}, quantized),
+              kTfLiteOk);
+    // #bits for element count = 14 + 14 + 3 = 31 < 32 (no overflow).
+    // #bits for byte count = 14 + 14 + 3 + 2 = 33 > 32 (overflow).
+    ASSERT_NE(
+        interpreter.SetTensorParametersReadWrite(
+            0, kTfLiteFloat32, "in1", {1 << 14, 1 << 14, 1 << 3}, quantized),
+        kTfLiteOk);
+    EXPECT_THAT(
+        reporter.error_messages(),
+        testing::EndsWith("BytesRequired number of bytes overflowed.\n"));
+    // #bits for element count = 14 + 14 + 4 = 32 == 32 (overflow).
+    // byte count also overflows, but we don't get to that check.
+    reporter.Reset();
+    ASSERT_NE(
+        interpreter.SetTensorParametersReadWrite(
+            0, kTfLiteFloat32, "in1", {1 << 14, 1 << 14, 1 << 4}, quantized),
+        kTfLiteOk);
+    EXPECT_THAT(
+        reporter.error_messages(),
+        testing::EndsWith("BytesRequired number of elements overflowed.\n"));
+  } else {
+    // This test failing means that we are using a non 32/64 bit architecture.
+    ASSERT_TRUE(false);
+  }
+}
168 Tensorflow/Keras 75e5b5d70b6f33bd41fdf07b844c762b23f99d1b C++ fix overflow overflow overflows in accumulation results tensor math tensor math summation increase variable precision/change variable type increase variable precision upcast to an integer type with more bits N/A // Upcast small integer types to 32 bit to avoid overflow.
+  if (dtype == DT_INT8 || dtype == DT_INT16) {
+    return DT_INT32;
+  }
+  if (dtype == DT_UINT8 || dtype == DT_UINT16) {
+    return DT_UINT32;
+  }
N/A
169 Tensorflow/Keras 23fde233bf3210759b5a4453bc39101df9c86d0c C++ fix overflow overflow tensor math statistics mean increase variable precision/change variable type increase variable precision Perform mean reductions for integer types in 64 bit to mitigate overflow in the sum and/or denominator.
I.e.: Upcast int8, int16, int32 into int64
#define CASTING_SPECIALIZATION(ScalarType, IntermediateType)                  \
+  template <typename Device, typename OUT_T, typename IN_T,                   \
+            typename ReductionAxes>                                           \
+  struct ReduceEigenImpl<Device, OUT_T, IN_T, ReductionAxes,                  \
+                         functor::MeanReducer<ScalarType>> {                  \
+    void operator()(const Device& d, OUT_T out, IN_T in,                      \
+                    const ReductionAxes& reduction_axes,                      \
+                    const functor::MeanReducer<ScalarType>& reducer) {        \
+      static_assert(std::is_same<ScalarType, typename OUT_T::Scalar>::value,  \
+                    "");                                                      \
+      Eigen::internal::SumReducer<IntermediateType> sum_reducer;              \
+      out.device(d) = (in.template cast<IntermediateType>().reduce(           \
+                           reduction_axes, sum_reducer) /                     \
+                       static_cast<IntermediateType>(in.size() / out.size())) \
+                          .template cast<ScalarType>();                       \
+    }                                                                         \
+  }
# This tests the issue reported in b/145030710.
+  @test_util.run_deprecated_v1
+  def testSizeOverflowUint8(self):
+    np_arr = self._makeRandom((2**8,), dtypes.uint8)
+    self._compareAllAxes(np_arr)
+
+  @test_util.run_deprecated_v1
+  def testSizeOverflowInt8(self):
+    np_arr = self._makeRandom((2**7,), dtypes.int8)
+    self._compareAllAxes(np_arr)
+
+  @test_util.run_deprecated_v1
+  def testSizeOverflowUint16(self):
+    np_arr = self._makeRandom((2**16,), dtypes.uint16)
+    self._compareAllAxes(np_arr)
+
+  @test_util.run_deprecated_v1
+  def testSizeOverflowInt16(self):
+    np_arr = self._makeRandom((2**15,), dtypes.int16)
+    self._compareAllAxes(np_arr)
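A minimal sketch of the upcast-before-reduce idea from entry 169 (plain C++ rather than the Eigen specialization; names are illustrative): the sum of even a few hundred small integer values can overflow an 8- or 16-bit accumulator, so the reduction is carried out in a 64-bit intermediate type and only the final mean is cast back.
#include <cstdint>
#include <vector>

// Assumes values is non-empty.
uint8_t MeanWithUpcast(const std::vector<uint8_t>& values) {
  int64_t sum = 0;  // wide accumulator: the sum of uint8 elements cannot overflow it
  for (uint8_t v : values) sum += v;
  return static_cast<uint8_t>(sum / static_cast<int64_t>(values.size()));
}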
170 Tensorflow/Keras 79605069321520bd8af318eef92b71070dcc8961 C++ fix overflow overflow strided_slice extracts a strided slice of a tensor (generalized Python array indexing). strided_slice would overflow for end and start slices larger than int16 other other strided slice kernel increase variable precision/change variable type increase variable precision change the StridedSliceParams start_indices and end_indices from int16 to int32 values struct StridedSliceParams {
   int8 start_indices_count;
-  int16 start_indices[4];
   int8 stop_indices_count;
-  int16 stop_indices[4];
   int8 strides_count;
-  int16 strides[4];
struct StridedSliceParams {
   int8 start_indices_count;
+  int32 start_indices[4];
   int8 stop_indices_count;
+  int32 stop_indices[4];
   int8 strides_count;
+  int32 strides[4];
TEST(StridedSliceOpTest, In1D_Int32End) {
+  StridedSliceOpModel<> m({32768}, {1}, {1}, {1}, 0, 0, 0, 0, 0);
+  std::vector<float> values;
+  for (int i = 0; i < 32768; i++) {
+    values.push_back(i);
+  }
+  m.SetInput(values);
+  m.SetBegin({0});
+  m.SetEnd({32768});
+  m.SetStrides({1});
+  m.Invoke();
+  EXPECT_THAT(m.GetOutputShape(), ElementsAreArray({32768}));
+  EXPECT_THAT(m.GetOutput(), ElementsAreArray(values));
171 Tensorflow/Keras eaea3db3be4e27464a0b669bebffe46f2f8b005f C++ fix overflow overflow overflow in quantization if there is a mismatch in scale of weights and biases quantization quantization quantization limit input range limit input range Adjusts the scale of the weight tensor when the scale is small enough to lead to overflow due to a mismatch with the scale of the bias values.
Checks that the bias is quantized to within the middle half of the allowable bit range determined by the scales of the input and weight tensors. If this condition is not satisfied, the scale of the weights is increased in order to prevent overflow.
TfLiteStatus AdjustWeightsForBiasScale(QuantizationParametersT* quant_params,
+                                       const float* bias_data,
+                                       const size_t bias_size,
+                                       const float input_scale,
+                                       ErrorReporter* error_reporter) {
+  // TODO(dmolitor) Allow adjusting activation scale.
+  // TODO(dmolitor) Tighten scale adjustment.
+  // TODO(dmolitor) Test using a separate strategy for scales of 0.
+  const int32_t kScale = std::numeric_limits<int32_t>::max();
+  if (quant_params == nullptr) {
+    error_reporter->Report("Missing max and min values for weight tensor.");
+    return kTfLiteError;
+  }
+  // channel_dim_size is calculated from min.size() to infer whether
+  // quantization is per axis
+  int channel_dim_size = quant_params->min.size();
+  if (channel_dim_size == 0) {
+    error_reporter->Report(
+        "Missing weight scales. Unable to check compatibility with bias "
+        "scale.");
+    return kTfLiteError;
+  }
+
+  std::vector<float> weight_scales(channel_dim_size);
+  TF_LITE_ENSURE_STATUS(GetSymmetricScalesFromMaxMin(
+      quant_params, &weight_scales, error_reporter));
+
+  // Per channel quantization
+  if (channel_dim_size > 1) {
+    for (size_t i = 0; i < channel_dim_size; ++i) {
+      // Current scale is not compatible with bias. Adjust max/min values.
+      if (std::abs(bias_data[i]) >=
+          0.5 * input_scale * weight_scales[i] * kScale) {
+        quant_params->max[i] = 2.0 * std::abs(bias_data[i]) / kScale *
+                               (kMaxQuantizedValue / input_scale);
+        quant_params->min[i] = -quant_params->max[i];
+      }
+    }
+    // Per layer quantization
+  } else if (channel_dim_size == 1) {
+    const auto minmax = std::minmax_element(bias_data, bias_data + bias_size);
+    const float bias_half_range =
+        std::max(std::abs(*minmax.first), std::abs(*minmax.second));
+
+    // Need to adjust weight min/max; not compatible with bias.
+    if (bias_half_range / kScale >= 0.5 * input_scale * weight_scales[0]) {
+      quant_params->min[0] =
+          2.0 * bias_half_range / kScale * (kMinQuantizedValue / input_scale);
+      quant_params->max[0] =
+          2.0 * bias_half_range / kScale * (kMaxQuantizedValue / input_scale);
+    }
+  }
+  return kTfLiteOk;
172 Tensorflow/Keras 676bce388aba376a4e6f7307dc92fdc0a8b3af42 C++ fix overflow overflow Quantized mean and sum have a risk of overflow quantization quantization quantization limit input range limit input range cast input based on numeric limits // Convert to float value.
-        output_data[idx] =
-            static_cast<T>(std::round(float_mean * scale + bias)) +
-            output_zero_point;
float result =
+            std::min(std::round(float_mean * scale + bias) + output_zero_point,
+                     static_cast<float>(std::numeric_limits<T>::max()));
+        result =
+            std::max(result, static_cast<float>(std::numeric_limits<T>::min()));
+        output_data[idx] = static_cast<T>(result);
N/A
173 Tensorflow/Keras e08474a981b87a8c4fdc9d9d08765727fe8d629e C++ fix overflow overflow compiler compiler compiler, variable accessor increase variable precision/change variable type increase variable precision Change shared variables to high precision
174 Tensorflow/Keras c782a538b0b90d93c6070ac177cb1f542272bcce C++ fix overflow overflow overflowing of integer "+" and "-" operations CNN operations convolution convolution transpose rewrite math formula rewrite math formula -        int i = y * $kernel_size.x$ + x;
-        ivec2 idx = gid.xy + ivec2(x, y) - $padding$;
int i = int(float(y * $kernel_size.x$) + float(x));        
+        ivec2 idx = ivec2(vec2(gid.xy + ivec2(x, y)) - vec2($padding$));
175 Tensorflow/Keras ea316ec1827bacae811858a7f681dfac47ef7f47 C++ fix overflow overflow signed overflow compiler compiler compiler, dot interpreter increase variable precision/change variable type change variable type change type to unsigned -                static_cast<ElementwiseT>(lhs_literal.Get<ReturnT>(lhs_index)) *
-                static_cast<ElementwiseT>(rhs_literal.Get<ReturnT>(rhs_index));
ElementwiseT lhs_val(lhs_literal.Get<ReturnT>(lhs_index));
+            ElementwiseT rhs_val(rhs_literal.Get<ReturnT>(rhs_index));
ToArithmeticSafeType(lhs_val) * ToArithmeticSafeType(rhs_val);
176 Tensorflow/Keras 09b8ed34f47dbd6921304f2d4ceb3669c1e089e6 Python fix overflow overflow int32 overflow other other flatten layer increase variable precision/change variable type increase variable precision increase precision of variable shape to int64 if necessary, otherwise keep it as int32 input_shape = inputs.shape
+    if input_shape[1:].is_fully_defined():
+      flattened_dim = tensor_shape.dimension_value(
+          np.prod(input_shape[1:], dtype=int))
+      # Temporary fix for integer overflow issue.
+      if flattened_dim > np.iinfo(np.int32).max:
+        shape_dtype = dtypes.int64
+      else:
+        shape_dtype = dtypes.int32
+      outputs = array_ops.reshape(
+          inputs, constant_op.constant((-1, flattened_dim), shape_dtype))
def testFlattenLargeDim(self):
+    x = array_ops.placeholder(shape=(None, 21316, 21316, 80), dtype='float32')
+    y = core_layers.Flatten()(x)
+    self.assertEqual(y.shape.as_list(), [None, 21316 * 21316 * 80])
177 Tensorflow/Keras dbcb2a5470e40974924cebd0f74d7f117b21bf8e C++ fix overflow overflow compiler compiler compiler, bit cast operation increase variable precision/change variable type increase variable precision increase precision of an integer to int64 auto output_bit_width_mask = (1 << output_bit_width) - 1; auto output_bit_width_mask = (int64(1) << output_bit_width) - 1;
178 Tensorflow/Keras 0d6095963d907e0de1d635842d8ed80759a436ba C++ fix overflow overflow data processing memory allocator ruy allocator, size increase variable precision/change variable type change variable type change from std::size_t to std::ptrdiff_t
179 Tensorflow/Keras 5b4fe5470852d1aea737b194e03727cdedddebca C++ fix underflow underflow exponent smaller than -31 causes underflow quantization quantization quantization rewrite math formula rewrite math formula For exponents smaller than -31, set shift to zero void GuardedQuantizeMultiplier(double effective_output_scale,
-                               int32_t* significand, int* shift) {
-  QuantizeMultiplier(effective_output_scale, significand, shift);
-  // Additional guard to make sure RoundingDivideByPOT does not fail.
-  if (*shift < -31) {
-    // If shift is less than -31, RoundingDivideByPOT fails. This happens when
-    // min and max are close and small. For this particular case, both
-    // significand and shift are set to zero.
-    *significand = 0;
-    *shift = 0;
-  }
-}
void QuantizeMultiplier(double double_multiplier, int32_t* quantized_multiplier,
     ++*shift;
   }

if (*shift < -31) {
+    *shift = 0;
+    q_fixed = 0;
+  }

QuantizeMultiplier(effective_output_scale, &significand, &shift);
TEST(QuantizationUtilTest, QuantizeMultiplierUnderflow) {
+  auto quantize = [](double d) {
+    int32_t q;
+    int s;
+    QuantizeMultiplier(d, &q, &s);
+    return std::pair<int32_t, int>{q, s};
+  };
+
+  EXPECT_THAT(quantize(std::ldexp(1.0f, -31)), Pair(1073741824, -30));
+  EXPECT_THAT(quantize(std::ldexp(1.0f, -32)), Pair(1073741824, -31));
+  EXPECT_THAT(quantize(std::ldexp(0.99f, -32)), Pair(0, 0));
+  EXPECT_THAT(quantize(std::ldexp(1.0f, -33)), Pair(0, 0));
+}
180 Tensorflow/Keras 3af3959377d54414f480d617402274f7e9440316 C++ fix overflow overflow Using sqrt(a^2 + b^2) tensor math tensor math absolute value of a complex number rewrite math formula rewrite math formula use sqrt(a^2 + b^2) = sqrt(a^2 * (1 + b^2/a^2))
                                = |a| * sqrt(1 + (b/a)^2)
With the assumption that |a| >= |b|
case HloOpcode::kAbs: {
-      auto sum_sq = FAdd(
-          FMul(EmitExtractReal(operand_value), EmitExtractReal(operand_value)),
-          FMul(EmitExtractImag(operand_value), EmitExtractImag(operand_value)));
-      return llvm_ir::EmitCallToIntrinsic(llvm::Intrinsic::sqrt, {sum_sq},
StatusOr<llvm::Value*> ElementalIrEmitter::EmitComplexAbs(
+    PrimitiveType prim_type, llvm::Value* operand_value) {
+  auto real = EmitExtractReal(operand_value);
+  auto imag = EmitExtractImag(operand_value);
+  auto abs_real = llvm_ir::EmitCallToIntrinsic(llvm::Intrinsic::fabs, {real},
+                                               {real->getType()}, b_);
+  auto abs_imag = llvm_ir::EmitCallToIntrinsic(llvm::Intrinsic::fabs, {imag},
+                                               {imag->getType()}, b_);
+  auto max = EmitFloatMax(abs_real, abs_imag);
+  auto min = EmitFloatMin(abs_real, abs_imag);
+
+  auto div = FDiv(min, max);
+  auto div_sq = FMul(div, div);
+  auto one = llvm::ConstantFP::get(max->getType(), 1);
+  TF_ASSIGN_OR_RETURN(auto sqrt, EmitSqrt(prim_type, FAdd(one, div_sq)));
+
+  auto zero = llvm::ConstantFP::get(max->getType(), 0);
+  return Select(FCmpOEQ(max, zero), zero, FMul(max, sqrt));
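A minimal scalar sketch of the rewritten complex absolute value from entry 180 (plain C++ rather than the LLVM IR emitter; names are illustrative): dividing the smaller magnitude by the larger keeps the squared ratio in [0, 1], so the intermediate never overflows even when a^2 + b^2 would.
#include <algorithm>
#include <cmath>

double StableComplexAbs(double re, double im) {
  const double a = std::fabs(re);
  const double b = std::fabs(im);
  const double mx = std::max(a, b);
  const double mn = std::min(a, b);
  if (mx == 0.0) return 0.0;  // both parts are zero
  const double r = mn / mx;   // in [0, 1], so r * r cannot overflow
  return mx * std::sqrt(1.0 + r * r);
}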
181 Tensorflow/Keras 840f25bd4623e5a9aedcbe6163332f51ee303784 C++ fix overflow overflow signed integer overflow in HandleCopies when batch_size * indices_size * slice_size is larger than int32 data processing parallelism kernels, gather increase variable precision/change variable type increase variable precision use int64 instead of int32 for large values      bool use_large =
+                      batch_size * indices_size * slice_size >
+                          std::numeric_limits<int32>::max());
182 Tensorflow/Keras 8211365f9e8aed8cec7b63d7eb992ab104422f8c C++, Python fix overflow overflow build error on Windows caused by potential int32 overflow data processing data shard size increase variable precision/change variable type increase variable precision increase precision from int32 to int64 to calculate the default shard size self._shard_size_bytes = (
-        shard_size_bytes
-        if shard_size_bytes is not None else 10 * 1024 * 1024 * 1024)
self._pending_snapshot_expiry_seconds = (
         pending_snapshot_expiry_seconds
-        if pending_snapshot_expiry_seconds is not None else 86400)
// Defaults to 10 GiB per shard.
+const int64 kDefaultShardSizeBytes = 10L * 1024 * 1024 * 1024;

   if (shard_size_bytes_ == -1) shard_size_bytes_ = kDefaultShardSizeBytes;
+
+    // Default to 1 day expiry for snapshots.
+    if (pending_snapshot_expiry_seconds_ == -1) {
+      pending_snapshot_expiry_seconds_ = 86400;
+    }

shard_size_bytes if shard_size_bytes is not None else -1)
183 Tensorflow/Keras 8ac1075eac1ab9072e29c025348f749b43f251cf C++ fix overflow overflow optimizers optimizers experimental optimizer limit input range limit input range add upper bound scaling to prevent overflow element = T(UniformDistribution(RandomType(0), RandomType(1), &gen)); auto upper_bound =
+          RandomType(std::is_same<T, Eigen::half>::value ? 0.1 : 1.0);
+      element = T(UniformDistribution(RandomType(0), upper_bound, &gen));
184 Tensorflow/Keras 737600454df83be02fac46e48b093a8892c7241a C++ unit test overflow overflow linear algebra linear algebra matrix multiply limit input range limit input range Avoid the corner case where both lhs and rhs zero_point's are the lowest representable value in their respective quantized type.  E.g. when both LHS and RHS are uint8 with zero_point=0. if (!use_golden && !std::is_floating_point<LhsScalar>::value) {
-    lhs_params.zero_point = random_engine() % 8;

if (!use_golden && !std::is_floating_point<RhsScalar>::value) {
-    rhs_params.zero_point = random_engine() % 8;
if (!std::is_floating_point<LhsScalar>::value) {
+    lhs_params.zero_point = 1;
+    if (!use_golden) {
+      lhs_params.zero_point += random_engine() % 8;
+    }

if (!std::is_floating_point<RhsScalar>::value) {
+    rhs_params.zero_point = 1;
+    if (!use_golden) {
+      rhs_params.zero_point += random_engine() % 8;
+    }
185 Tensorflow/Keras c38b41d7c813e0dc26fa99cf6495ec474a595542 C++ fix overflow overflow possible float-to-integer-cast overflow precision tests/speed benchmarks timing timing increase variable precision/change variable type change variable type change variable for holding processing time from int64 to double const int64 processing_time = TotalProcessingTime(snapshot);
const int64 output_time = OutputTime(snapshot);
int64 best_delta = -1;
int64 new_output_time = OutputTime(snapshot);
int64 delta = output_time - new_output_time;
int64 Model::OutputTime(std::shared_ptr<Node> node)
const double processing_time = TotalProcessingTime(snapshot);
const double output_time = OutputTime(snapshot);
double best_delta = -1.0L;
double new_output_time = OutputTime(snapshot);
double delta = output_time - new_output_time;
double Model::OutputTime(std::shared_ptr<Node> node) {
186 Tensorflow/Keras 52a6cfddef9b51b608b4a554b77a10e1522d56ec C++ fix overflow overflow overflow of variable size data processing parallelism segmented reduction—a parallel reduction over many irregular-length segments. increase variable precision/change variable type increase variable precision change int to int64 for number of threads, block size, block numbers, compute cycles, compute bytes const int num_threads = cpu_device.numThreads();
const int min_block_size = 64;
-    const int max_block_num = std::min(N / min_block_size + 1, num_reductions);
-    int block_num = std::min(max_block_num, num_threads);
-    const int block_size = N / block_num;
const int compute_cycles = 5 * (N - num_reductions) * inner_dim;
-    const int output_bytes = num_reductions * inner_dim * sizeof(T);
const int64 num_threads = cpu_device.numThreads();
const int64 min_block_size = 64;
+    int64 block_num = std::min(num_reductions, num_threads);
+    int64 block_size = (N - 1) / block_num + 1;
+    if (block_size < min_block_size) {
+      block_size = min_block_size;
+      block_num = (N - 1) / min_block_size + 1;
const int64 compute_cycles = 5 * (N - num_reductions) * inner_dim;
+    const int64 output_bytes = num_reductions * inner_dim * sizeof(T);
187 Tensorflow/Keras aa4765a1417950cf2c29afd0172aebdd31b0725f C++ fix overflow overflow cast overflow undefined behavior tensor math tensor math absolute value of a complex number increase variable precision/change variable type increase variable precision Change the function that computes the absolute value to return a double instead of a float, to avoid cast overflow for inputs of types double and complex128.
float FpAbsoluteValue(NativeT value) double FpAbsoluteValue(NativeT value) TEST(LiteralTestUtilTest, ExpectNearDoubleOutsideFloatValueRange) {
+  auto two_times_float_max =
+      LiteralUtil::CreateR0<double>(2.0 * std::numeric_limits<float>::max());
+  ErrorSpec error(0.001);
+  EXPECT_TRUE(
+      LiteralTestUtil::Near(two_times_float_max, two_times_float_max, error));
188 Tensorflow/Keras d0136d4affebd14fee59ba1865d5f1c8fa64251a C++ fix overflow overflow TensorFlow BFC Allocator is a memory allocator that implements a 'best-fit with coalescing' algorithm. index integer overflow data processing memory allocator TensorFlow BFC Allocator increase variable precision/change variable type change variable type change type of an index from int to size_t int IndexFor(const void* p) const
return static_cast<int>(((p_int - base_int) >> kMinAllocationBits));
size_t IndexFor(const void* p) const
return static_cast<size_t>(((p_int - base_int) >> kMinAllocationBits));
189 Tensorflow/Keras f9ac078ebd0d05b64691e6718d404ee801f80c67 C++ fix overflow overflow conversion to float32 results in overflow other other number casting add overflow check add overflow check return error if overflow and return null pointer if infinity double as_double = PyFloat_AsDouble(v);
+    // Handle infinity.
+    if (as_double == std::numeric_limits<double>::infinity()) {
+      *out = std::numeric_limits<T>::infinity();
+      return nullptr;
+    } else if (as_double == -1 * std::numeric_limits<double>::infinity()) {
+      *out = -1 * std::numeric_limits<T>::infinity();
+      return nullptr;
+    }
+    // Check for overflow.
+    if (as_double > std::numeric_limits<T>::max() ||
+        as_double < std::numeric_limits<T>::lowest()) {
+      return ErrorOutOfRangeDouble;
+    }
+    *out = static_cast<T>(as_double);
190 Tensorflow/Keras c8e8f35f3e39b36d105eb7e43321a9da1362f242 C++ fix overflow overflow integer overflow precision tests/speed benchmarks timing timing increase variable precision/change variable type increase variable precision use unsigned long long int event->set_timestamp_ps(node.all_start_micros() * 1000000);
-      event->set_duration_ps(node.all_end_rel_micros() * 1000000);
static constexpr uint64 kMicrosToPicos = 1000ULL * 1000ULL;
const uint64 profile_start_time_micros
    event->set_timestamp_ps(
+          (node.all_start_micros() - profile_start_time_micros) *
+          EnvTime::kMicrosToPicos);
+      event->set_duration_ps(node.all_end_rel_micros() *
+                             EnvTime::kMicrosToPicos);
191 Tensorflow/Keras f1d0c84f699624382c8d66e2ea10205ac0207868 C++ disable test overflow overflow precision tests/speed benchmarks timing timing disable test/warning disable overflow test Skip overflow testing when running with address sanitizer.
192 Tensorflow/Keras 84337310517914ca4b4d6eb35295a65758bc6d75 C++ unit test overflow overflow AveragePool uses a uint16 accumulator which causes it to overflow for
large images
CNN operations pooling layer average pooling add overflow check add overflow check add overflow test N/A // Send in a white image, expect a white pixel.
+TEST(QuantizedPoolingOpTest, AveragePoolImageSize16) {
+  int image_size = 16;
+  QuantizedPoolingOpModel m(
+      BuiltinOperator_AVERAGE_POOL_2D,
+      /*input=*/{TensorType_UINT8, {1, image_size, image_size, 1}, 0, 16},
+      /*filter_width=*/image_size,
+      /*filter_height=*/image_size,
+      /*output=*/{TensorType_UINT8, {}, 0, 16});
+
+  std::vector<float> input(image_size * image_size, 16.f);
+  m.SetInput(input);
+  m.Invoke();
+
+  EXPECT_THAT(m.GetOutput(), ::testing::ElementsAre(255));
+  EXPECT_THAT(m.GetDequantizedOutput(), ElementsAreArray(ArrayFloatNear({16})));
+}
+
+// Send in a white image, expect something other than a white pixel, due to
+// overflow.
+TEST(QuantizedPoolingOpTest, AveragePoolImageSize17) {
+  int image_size = 17;
+  QuantizedPoolingOpModel m(
+      BuiltinOperator_AVERAGE_POOL_2D,
+      /*input=*/{TensorType_UINT8, {1, image_size, image_size, 1}, 0, 16},
+      /*filter_width=*/image_size,
+      /*filter_height=*/image_size,
+      /*output=*/{TensorType_UINT8, {}, 0, 16});
+
+  std::vector<float> input(image_size * image_size, 16.f);
+  m.SetInput(input);
+  m.Invoke();
+
+  // Ordinarily we would see '255' here. However, the optimized version of
+  // AveragePool uses a uint16 accumulator which causes it to overflow for
+  // images this large.
+  EXPECT_THAT(m.GetOutput(), ::testing::ElementsAre(28));
N/A
193 Tensorflow/Keras 434dbe38970ffc90a5b546780be702e0b5de9a0c C++ fix overflow undefined behavior overflow undefined behavior caused by integer overflow in custom float comparison compiler compiler casting limit input range limit input range consider numeric limits return std::numeric_limits<CastType>::max() - casted_value; return static_cast<UnsignedCastType>(std::numeric_limits<CastType>::max()) -
+           casted_value;
194 Tensorflow/Keras fc44600e5c3ccf1de1e3d4792a00d3578311d3f6 Cuda fix overflow overflow index integer overflow linear algebra linear algebra row reduce rewrite math formula rewrite math formula rewrite formula const int row = (blockIdx.x * blockDim.x + threadIdx.x) / 32;
std::size_t temp_storage_bytes = 0;

-  Tensor temp_storage;
-  // written as a loop because it reduces clutter
-  // first pass allocates memory, second launches kernel(s)
-  for (int i = 0; i < 2; ++i) {
-    auto success = cub::DeviceReduce::Reduce(
-        i == 0 ? nullptr : temp_storage.flat<int8_t>().data(),
-        temp_storage_bytes, in, out, in_size, op, init, cu_stream);
assert(blockDim.x % 32 == 0);
+  int warps_per_block = blockDim.x / 32;
+  int warp_index = threadIdx.x / 32;
+  const int row = blockIdx.x * warps_per_block + warp_index;

size_t temp_storage_bytes = 0;
+  auto reduce = [&](void* temp_storage_ptr) {
+    auto success =
+        cub::DeviceReduce::Reduce(temp_storage_ptr, temp_storage_bytes, in, out,
+                                  in_size, op, init, cu_stream);
195 Tensorflow/Keras e66aea59e0367618f924ffe3bc3b1140be8eaf45 C++ fix underflow underflow underflow if data empty data processing data tf.data / Cloud Bigtable rewrite math formula rewrite math formula change order of operations if (index_ > keys_.size() - 2) {
if (index_ + 2 > keys_.size()) {
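A minimal sketch of the reordering in entry 195 (plain C++; names are illustrative): `keys_.size()` is unsigned, so `keys_.size() - 2` wraps around to a huge value whenever fewer than two keys remain and the old comparison gives the wrong answer; keeping the addition on the index side avoids the wrap.
#include <cstddef>
#include <vector>

// Assumes index is far from SIZE_MAX, so index + 2 itself cannot wrap.
bool PastSecondToLast(std::size_t index, const std::vector<int>& keys) {
  // Old form: index > keys.size() - 2  (underflows when keys.size() < 2).
  return index + 2 > keys.size();
}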
196 Tensorflow/Keras 880390941ce6430996c8f842540f73b53f3d1d8e Python fix overflow overflow int32 overflow data processing parallelism parallelism increase variable precision/change variable type increase variable precision use int64 number of segments to guard against int32 overflow num_segments *= n num_segments = math_ops.cast(num_segments, dtypes.int64) * math_ops.cast(
+      n, dtypes.int64)
197 Tensorflow/Keras f0d7172a30954b6696bdf2f40a5be11e7fdeb39c C++ fix overflow overflow int overflow compiler compiler compiler, shape inference add overflow check add overflow check return invalid argument if number of features is not positive if (feature_group_count <= 0) {
+    return InvalidArgument(
+        "feature_group_count must be a positive number, got %d",
+        feature_group_count);
198 Tensorflow/Keras 63bac283d12899a2d769a768729942c4f64436ea C++ fix overflow undefined behavior overflow undefined behavior due to signed integer overflow data processing image processing bmp image decoding rewrite math formula rewrite math formula rewrite the formula so the 4-byte-aligned row size is computed without forming the intermediate product 8 * channels_ * width const int row_size = (8 * channels_ * width + 31) / 32 * 4; const int row_size = (channels_ * width + 3) / 4 * 4;
199 Tensorflow/Keras 4f7a169a7eb97ea4819217f14705d6c2bd125355 C++ fix overflow overflow Need to handle overflow in division and remainder compiler compiler compiler, elemental emitter, division, remainder add overflow check add overflow check Define integer division overflow for CPU/GPU N/A       X / 0 == -1
      X % 0 == X
      INT_SMIN / -1 = INT_SMIN
      INT_SMIN % -1 = 0
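A minimal sketch of the guarded signed division conventions listed in entry 199 (plain C++; the function names are illustrative): both undefined cases, division by zero and INT_MIN / -1, are mapped to the defined results above instead of invoking undefined behavior.
#include <cstdint>
#include <limits>

int32_t SafeDivSketch(int32_t x, int32_t y) {
  if (y == 0) return -1;  // X / 0 == -1
  if (x == std::numeric_limits<int32_t>::min() && y == -1) {
    return std::numeric_limits<int32_t>::min();  // INT_SMIN / -1 == INT_SMIN
  }
  return x / y;
}

int32_t SafeRemSketch(int32_t x, int32_t y) {
  if (y == 0) return x;  // X % 0 == X
  if (x == std::numeric_limits<int32_t>::min() && y == -1) {
    return 0;  // INT_SMIN % -1 == 0
  }
  return x % y;
}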
200 Tensorflow/Keras d7ebc1f4ca2c677710c5257d30c757f0f8b604c6 Python fix overflow overflow overflow in flops calculations in nn_ops.py CNN operations CNN flops calculation, product increase variable precision/change variable type increase variable precision use int64 for product calculation output_count = np.prod(output_shape.as_list())
output_count = np.prod(output_shape.as_list(), dtype=np.int64)
201 Tensorflow/Keras e7674c09a151cac07bae43f6fe8551e8fec6dfe0 C++ fix overflow overflow array index overflow in TransformFilter functor CNN operations convolution indexing, 2D convolution limit input range limit input range subtract 2 from number of dimensions to iterate over for (int i = 0; i < NDIMS; ++i) {            // spatial dimensions for (int i = 0; i < NDIMS - 2; ++i) {        // spatial dimensions
202 Tensorflow/Keras aec5a0191e21ce022f47d743a4954e13f710cd8f C++ fix overflow overflow very large and branchy models, where the number of paths is exponential in the number of nodes, can overflow; specifically, an overflow in hlo_scheduling when compiling AutoML models compiler compiler compiler, HLO (high level operations) limit input range limit input range set min and max for total number of HLOs N/A int64 total_hlos = computation.parent()->NumUniqueInstructionIds();
extra_users[hlo] = std::min(extra_users[hlo], total_hlos);
203 Tensorflow/Keras 503b7c11b44ee8b238946b345efea503058652c0 Python disable test overflow overflow SinhArcsinh: Y = g(X) = Sinh( (Arcsinh(X) + skewness) * tailweight ) * multiplier. overflow test fails other transformations square, Bijective transformations disable test/warning disable overflow test Skipped the check that fails due to overflow error # Do the numpy calculation in float128 to avoid inf/nan.
-        y_float128 = np.float128(y)
-        self.assertAllClose(
-            np.log(np.cosh(
-                np.arcsinh(y_float128) / tailweight - skewness) / np.sqrt(
-                    y_float128**2 + 1)) -
-            np.log(tailweight),
-            bijector.inverse_log_det_jacobian(y, event_ndims=0).eval(),
-            rtol=1e-4,
-            atol=0.)
  # On IBM PPC systems, longdouble (np.float128) is same as double except that it can have more precision.
+        # Type double being of 8 bytes, can't hold square of max of float64 (which is also 8 bytes) and
+        # below test fails due to overflow error giving inf. So this check avoids that error by skipping square
+        # calculation and corresponding assert.
+
+        if np.amax(y) <= np.sqrt(np.finfo(np.float128).max) and \
+           np.fabs(np.amin(y)) <= np.sqrt(np.fabs(np.finfo(np.float128).min)):
+
+          # Do the numpy calculation in float128 to avoid inf/nan.
+          y_float128 = np.float128(y)
+          self.assertAllClose(
+              np.log(np.cosh(
+                  np.arcsinh(y_float128) / tailweight - skewness) / np.sqrt(
+                      y_float128**2 + 1)) -
+              np.log(tailweight),
+              bijector.inverse_log_det_jacobian(y, event_ndims=0).eval(),
+              rtol=1e-4,
+              atol=0.)
204 Tensorflow/Keras f5dbc1e16622f433f41f195bb33f56d674a004ce C++ fix overflow overflow TensorFlow Lite Converter converts TensorFlow graphs into TensorFlow Lite graphs. Overflow in shape calculation:
TensorFlow's shapes use int64s, while TOCO uses ints.
tensor math tensor shape shape, Tensorflow Lite Converter (TOCO) add overflow check add overflow check
205 Tensorflow/Keras 9f312f32091534bfc115212d2ec7c838180df663 C++ fix overflow overflow overflow due to large values other random number generator random tensor generation limit input range limit input range Updating Generate Random Tensor to generate tensors whose values are small and do not cause overflow for arithmetic operations. tensor.flat<T>() = tensor.flat<T>().random(); for (auto i = 0; i < tensor.NumElements(); i++)
+      tensor.flat<T>()(i) = i + random::New64() % 10;
206 Tensorflow/Keras 6a7779f3384e48012d3e27ae0f48d410f5174d06 C++ fix overflow overflow undefined signed integer overflow statistical distributions statistical distributions random uniform distribution limit input range limit input range impose conditions on random number generation to prevent overflow result[i] = lo_ + static_cast<int32>(sample[i] % range_);
result[i] = lo_ + static_cast<int64>(bits % range_);
template <typename Int>
+PHILOX_DEVICE_INLINE Int SignedAdd(Int a,
+                                   typename std::make_unsigned<Int>::type b) {
+  auto b_div_2 = b >> 1;
+  return a + static_cast<Int>(b_div_2) + static_cast<Int>(b - b_div_2);

result[i] = SignedAdd(lo_, sample[i] % range_);
result[i] = SignedAdd(lo_, bits % range_);
207 Tensorflow/Keras d107fee1e4a9a4462f01564798d345802acc2aef C++ fix overflow overflow other other I/O limit input range limit input range consider numeric limits N/A if (kBlockTrailerSize > std::numeric_limits<size_t>::max() - n) {
+    return errors::DataLoss("handle.size() too big");
+  }
+
N249
208 Tensorflow/Keras 665a4bf664546224c65eeb5a0a52d80e48e2f3e1 C++ fix overflow overflow int64 overflow and low accuracy compiler compiler compiler, HLO (high level operations), size use a different algorithm use a different algorithm The new implementation computes the
    min of the previous overestimate and the sum of all HLO's
    before-and-including the current HLO in a topological sort of the
    graph.
209 Tensorflow/Keras 11f1e50886f91ce2caa6e53b0bc9a1e82abdda8e Python unit test overflow overflow exp() test overflowing tensor math tensor math exponential limit input range limit input range Keep the results below 2^31 in exp(), consider min and max create_tensor_data(parameters["input_dtype"], parameters["input_shape"]) create_tensor_data(parameters["input_dtype"], parameters["input_shape"],
+                           min_value=-100, max_value=9)
210 Tensorflow/Keras 49f73c55d56edffebde4bca4a407ad69c1cae433 C++ fix overflow overflow integer overflow data processing image processing bmp image decoding increase variable precision/change variable type increase variable precision Fix integer overflow in BMP decoder by making the checks in DecodeBmp
more stringent. Total possible pixel bytes must be less than 2^30. Also, increase precision of image size from int to int64. Add fuzzer to improve the robustness of the decoder in the future.
const int last_pixel_offset =
-        header_size + (abs(height) - 1) * row_size + (width - 1) * channels_;
-    const int expected_file_size = last_pixel_offset + channels_;
OP_REQUIRES(context, width > 0 && header_size >= 0,
+                errors::InvalidArgument("Width must be positive"));
+    OP_REQUIRES(context, header_size >= 0,
+                errors::InvalidArgument("header size must be nonnegative"));
+
+    // The real requirement is < 2^31 minus some headers and channel data,
+    // so rounding down to something that's still ridiculously big.
+    OP_REQUIRES(
+        context,
+        (static_cast<int64>(width) * std::abs(static_cast<int64>(height))) <
+            static_cast<int64>(std::numeric_limits<int32_t>::max() / 8),
+        errors::InvalidArgument(
+            "Total possible pixel bytes must be less than 2^30"));
+
+    const int32 abs_height = abs(height);

const int64 last_pixel_offset = static_cast<int64>(header_size) +
+                                    (abs_height - 1) * row_size +
+                                    (width - 1) * channels_;

const int64 expected_file_size = last_pixel_offset + channels_;
211 Tensorflow/Keras 7f88363810e77a39db919fb4000583ad0138e53c C++ fix overflow overflow integer overflow other computational graph shape size propagation in a tf graph increase variable precision/change variable type increase variable precision increase precision from int to int64 for max loops const int num_loops = new_shapes->size();
-  const int max_loop_length = item_.graph.node_size();
-  const int max_rank = 4;
-  const int max_loop_iterations =
-      max_rank * max_loop_length * std::max(1, num_loops * num_loops);
-  const int num_queues = resources.size();
-  const int max_resource_iterations = num_queues * num_queues * max_rank;
-
-  int num_resource_iterations = 0;

int num_loop_iterations = 0;
const int64 num_loops = new_shapes->size();
+  const int64 max_loop_length = item_.graph.node_size();
+  const int64 max_rank = 4;
+  const int64 max_loop_iterations =
+      max_rank * max_loop_length * std::max<int64>(1, num_loops * num_loops);
+  const int64 num_queues = resources.size();
+  const int64 max_resource_iterations = num_queues * num_queues * max_rank;
+
+  int64 num_resource_iterations = 0;
   do {
+    int64 num_loop_iterations = 0;
212 Tensorflow/Keras 192f1c24ec6692342391c03bb620f5de1af9de3b C++ fix overflow overflow integer overflow data processing parallelism parallelism rewrite math formula rewrite math formula rewrite formula for calculating maximum number of elements -         input_shape.num_elements() >=
-             std::max(num_threads, num_split) * 4096 &&
-         input_shape.num_elements() < num_split * 180 * 1024);

-            num_split, kint64max, range_output_func);
-         input_shape.num_elements() >=
-             std::max(num_threads, num_split) * 4096 &&
-         input_shape.num_elements() < num_split * 180 * 1024);
-            num_split, kint64max, range_output_func);
const auto input_element_count = input_shape.num_elements();
input_element_count >= std::max(num_threads, num_split) * 4096 &&
+         input_element_count < num_split * 180 * 1024);
num_split, input_element_count / num_split, range_output_func);
input_element_count >= std::max(num_threads, num_split) * 4096 &&
+         input_element_count < num_split * 180 * 1024);
num_split, input_element_count / num_split, range_output_func);
213 Tensorflow/Keras b1c095a28a7aa9bbee4af4d9a7e9d0c60567765b Python fix underflow underflow underflow in log probability statistical distributions statistical distributions multinomial distribution, log probability use a different algorithm use a different algorithm use log softmax and logits instead of log and probabilities return math_ops.reduce_sum(counts * math_ops.log(self.probs), -1)
return math_ops.reduce_sum(counts * nn_ops.log_softmax(self.logits), -1) def testPmfUnderflow(self):
+    logits = np.array([[-200, 0]], dtype=np.float32)
+    with self.test_session():
+      dist = multinomial.Multinomial(total_count=1., logits=logits)
+      lp = dist.log_prob([1., 0.]).eval()[0]
+      self.assertAllClose(-200, lp, atol=0, rtol=1e-6)
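A hedged NumPy illustration of why entry 213 replaces log(probs) with a log-softmax of the logits: for a very negative logit the probability underflows to 0 in float32, so its log becomes -inf, while the log-softmax form stays finite (the values mirror testPmfUnderflow above).
import numpy as np

logits = np.array([-200.0, 0.0], dtype=np.float32)

probs = np.exp(logits) / np.exp(logits).sum()     # exp(-200) underflows to 0 in float32
naive = np.log(probs)                             # [-inf, 0.0], with a divide-by-zero warning

m = logits.max()
log_softmax = logits - (m + np.log(np.sum(np.exp(logits - m))))   # [-200.0, 0.0]

counts = np.array([1.0, 0.0], dtype=np.float32)
print((counts * naive).sum(), (counts * log_softmax).sum())       # -inf vs. -200.0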
214 Tensorflow/Keras 74137f994faad09593ae2daad6251a4ccf72f558 C++ fix overflow overflow When a node name has a long numeric suffix, e.g.,
    "foo/y_0/gradient_debug_09684b60f2184c67b744721915034528" (as has happened with tfdbg GradientsDebugger),
    
    the parsing algorithm in ParseTensorName() may experience signed int overflow.
other other tensor name parser increase variable precision/change variable type change variable type use unsigned int instead of signed int -  int index = 0;
-  int mul = 1;
unsigned int index = 0;
+  unsigned int mul = 1;
215 Tensorflow/Keras 793fa4e91d3cae77565f753c2b8d769e1a3928f8 Python fix overflow overflow The vimco package provides a Bayesian variable selection method for GWAS data with multiple traits. Unlike in BVSR, where each trait is analyzed separately, vimco performs a joint analysis of the multiple traits while accounting for correlation among them. Csiszar f-Divergence generalized VIMCO objective overflow issue in Csiszar-VIMCO other probability gradient estimator, csiszar divergence rewrite math formula rewrite math formula rewrite formula for log sum - subtract maximum from input log_sum_u = math_ops.reduce_logsumexp(logu, axis=0)
return log_sum_u - log_n, log_soosum_u - log_n
log_max_u = math_ops.reduce_max(logu, axis=0)
+    log_sum_u_minus_log_max_u = math_ops.reduce_logsumexp(
+        logu - log_max_u, axis=0)


is_positive_and_largest = math_ops.logical_and(
+        logu > 0.,
+        math_ops.equal(logu, log_max_u[array_ops.newaxis, ...]))
+    log_lomsum_u = math_ops.reduce_logsumexp(
+        array_ops.where(is_positive_and_largest,
+                        array_ops.fill(array_ops.shape(logu), -inf),
+                        logu),
+        axis=0, keep_dims=True)
+    log_lomsum_u = array_ops.tile(
+        log_lomsum_u,
+        multiples=1 + array_ops.pad([n-1], [[0, array_ops.rank(logu)-1]]))
+
+    d_not_ok_result = array_ops.where(
+        is_positive_and_largest,
+        log_lomsum_u,
+        array_ops.fill(array_ops.shape(d), -inf))
+
+    log_loosum_u = array_ops.where(d_ok, d_ok_result, d_not_ok_result)

  
log_avg_u = log_sum_u_minus_log_max_u + log_max_u - log_n
+    log_sooavg_u = log_soosum_u - log_n
+
+    log_avg_u.set_shape(logu.shape.with_rank_at_least(1)[1:])
+    log_sooavg_u.set_shape(logu.shape)
+
+    return log_avg_u, log_sooavg_u
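A small NumPy sketch of the max-subtraction step entry 215 introduces (the log-values are hypothetical): exponentiating large entries of logu directly overflows to inf, while subtracting the maximum first keeps every exponent at or below zero and only shifts the result.
import numpy as np

logu = np.array([800.0, 750.0, 700.0])            # hypothetical large log-values

naive = np.log(np.sum(np.exp(logu)))              # exp(800) overflows float64 -> inf

log_max_u = np.max(logu)
log_sum_u_minus_log_max_u = np.log(np.sum(np.exp(logu - log_max_u)))
stable = log_sum_u_minus_log_max_u + log_max_u    # ~800.0, finite
print(naive, stable)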
216 Tensorflow/Keras d906c963269dd1522c7693c8f944e6a846b86221 C++ unit test overflow overflow signed integer overflows detected with -fsanitize=signed-integer-overflow compiler compiler compiler, shape inference increase variable precision/change variable type change variable type, add overflow check use unsigned int instead of signed int to prevent undefined behavior and report error if overflow const int64 b = a + 1;
const int64 sum = first_value + second_value;
-  int64 result = 0;
[](int i) { return static_cast<float>(i * i * i); });
const int64 b = a - 1;
const int64 sum = static_cast<uint64>(first_value) + second_value;

uint64 result = 0;
[](int i) { return static_cast<float>(i) * i * i; });
217 Tensorflow/Keras 931fd84bb72df0500f512d5d92ec0bef2ea461be Python fix overflow overflow numpy.prod overflow on Windows gradients/derivatives gradients shape, gradient, tensor increase variable precision/change variable type increase variable precision perform computations in int64 instead of int32 and then convert result to int32 shape_size = np.prod(shape)
num_elements = np.prod(shape)
params_shape = array_ops.shape(params)
shape_size = np.prod(shape, dtype=np.int64)
num_elements = np.prod(shape, dtype=np.int64)
params_shape = array_ops.shape(params, out_type=ops.dtypes.int64)
params_shape = math_ops.to_int32(params_shape)
218 Tensorflow/Keras e8ee5286a686c6fc3057ba7cf9ba9ef7003789a6 C++ fix overflow overflow data processing data tensor shape, multiply, size limit input range limit input range, add overflow check Remove 2**40 size limit on TensorShape, use std::numeric_limits instead. The previous TensorShape code did not check for overflow when multiplying
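A Python sketch, under assumed semantics, of the overflow-checked multiply entry 218 is described as adding: before growing the element count by a new dimension, verify that the product still fits in a signed 64-bit integer rather than relying on the old fixed 2**40 cap. The helper name is hypothetical.
import numpy as np

INT64_MAX = np.iinfo(np.int64).max

def multiply_checked(num_elements: int, dim: int) -> int:
    # Reject the multiply up front instead of letting it silently overflow.
    if dim > 0 and num_elements > INT64_MAX // dim:
        raise OverflowError("TensorShape element count would exceed int64")
    return num_elements * dim

print(multiply_checked(2**40, 1024))              # fine: 2**50
# multiply_checked(2**40, 2**30) would raise OverflowError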
219 Tensorflow/Keras 3c9ba5673cf560ded0739530b673ab0a05d43630 C++ unit test overflow overflow integer overflow, undefined behavior, square other random number generator pseudo-random number generator increase variable precision/change variable type increase variable precision cast from int32 to int64 sum += Square(counts[i] - expected_count); sum += Square(static_cast<int64>(counts[i] - expected_count));
220 Tensorflow/Keras 60e7360dfcf8951c4a269cfddd2a9cf2a05d7f91 Python fix overflow/underflow overflow/underflow Adjust the brightness of RGB or Grayscale images. The current implementation (i.e. without clipping before conversion) introduces different behavior for images with different original data types, i.e. uint8 or float32. data processing image processing images, adjust brightness limit input range limit input range clip image into [0.0, 1.0] before converting back to original data type in 'adjust_brightness' N/A adjusted = clip_ops.clip_by_value(adjusted, 0.0, 1.0) def testNegativeDeltaFloat(self):
+    x_shape = [2, 2, 3]
+    x_data = [0, 5, 13, 10, 135, 226, 37, 8, 245, 90, 255, 1]
+    x_np = np.array(x_data, dtype=np.float32).reshape(x_shape) / 255.
+
+    y_data = [0, 0, 3, 0, 125, 216, 27, 0, 235, 80, 245, 0]
+    y_np = np.array(y_data, dtype=np.float32).reshape(x_shape) / 255.
+
+    self._testBrightness(x_np, y_np, delta=-10. / 255.)
221 Tensorflow/Keras ec58d4042790e71172964383f737b249289d15af Python fix underflow underflow statistical distributions statistical distributions gumbel distribution limit input range limit input range set min value with np.finfo(np_dtype).tiny
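A hedged NumPy sketch of the pattern named in entry 221 (the sampler details are assumed): the Gumbel transform -log(-log(u)) blows up if a uniform sample is exactly 0, so the draw is clamped below by np.finfo(np_dtype).tiny before the logs are taken.
import numpy as np

np_dtype = np.float32
rng = np.random.default_rng(0)

u = rng.random(10_000, dtype=np_dtype)            # uniform samples in [0, 1)
u = np.maximum(u, np.finfo(np_dtype).tiny)        # underflow guard from the patch
gumbel = -np.log(-np.log(u))                      # finite for every sample
print(np.isfinite(gumbel).all())                  # True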
222 Tensorflow/Keras 096ab75275862f973b2fd1a369a9fd25952a6c37 C++ fix overflow overflow text files larger than 2B words overflow the corpus size counter other NLP word2vec embedding, size increase variable precision/change variable type increase variable precision increase precision of corpus size from int32 to int64 int32 corpus_size_ = 0; int64 corpus_size_ = 0;
223 Tensorflow/Keras e6e06b2fc89d41556d159d1181a558f8f5352b87 C++ fix overflow overflow other other strings rewrite math formula rewrite math formula rewrite formula for checking overflow if (new_v < v) { if (new_v / 8 < v) { // (2^64-1)*10+9
+  TestConsumeLeadingDigits("184467440737095516159yz", -1,
+                           "184467440737095516159yz");
224 Tensorflow/Keras 4ad8912996a25136a280312de3801f30dd4d4a74 C++ unit test overflow overflow overflow in float-int32 cast quantization quantization quantization use a different algorithm use a different algorithm const int values_count = sizeof(T) == 256 ? 256 : 50000;
if (sizeof(T) == 256) {
input_array(i) = Eigen::NumTraits<T>::lowest() +
                      static_cast<int32>(q_range / values_count * i);
const int values_count = sizeof(T) == 1 ? 256 : 50000;
if (sizeof(T) == 1) {
int64 offset = static_cast<int64>(q_range / values_count * i);
+          input_array(i) = static_cast<int32>(
+              Eigen::NumTraits<T>::lowest() +
+              std::min<int64>(Eigen::NumTraits<T>::highest(), offset));
225 Tensorflow/Keras f4264cb8e1ea70c612170ed72b9fe0382d1967a0 C++ fix overflow overflow overflow when using float in eigen to quantize to QInt32 quantization quantization quantization limit input range limit input range use bounds that can be converted back to int32 without going outside the range of an int32. static float upper_bound_float() {
+    return Eigen::numext::mini(
+        static_cast<float>(Eigen::NumTraits<T>::highest()), +2.147483520e+09f);

static float lower_bound_float() {
+    return Eigen::numext::maxi(
+        static_cast<float>(Eigen::NumTraits<T>::lowest()), -2.147483648e+09f);
226 Tensorflow/Keras 6047c6977dbc30f018b8b3ea0486ca907901dabb C++ fix overflow overflow data processing data png I/O increase variable precision/change variable type increase variable precision Force height*row_bytes computations to use 64 bits. N/A int64 height = static_cast<int64>(height_in);
227 Tensorflow/Keras f4686d27a705bd547b828693462714d31bfd21ce C++ fix overflow overflow static_cast overflow in WorkSharder Shard data processing data shard, dataset increase variable precision/change variable type increase variable precision cast intermediate variable to higher precision const int num_shards = std::max(
-      1, std::min<int>(num_workers, total * cost_per_unit / kMinCostPerShard));
const int num_shards =
+      std::max<int>(1, std::min(static_cast<int64>(num_workers),
+                                total * cost_per_unit / kMinCostPerShard));
TEST(Shard, OverflowTest) {
+  thread::ThreadPool threads(Env::Default(), "test", 3);
+  mutex mu;
+  for (auto workers : {1, 2, 3}) {
+    const int64 total_elements = 1LL << 32;
+    const int64 cost_per_unit = 10000;
+    int num_shards = 0;
+    int64 num_elements = 0;
+    Shard(workers, &threads, total_elements, cost_per_unit,
+          [&mu, &num_shards, &num_elements](int64 start, int64 limit) {
+            mutex_lock l(mu);
+            ++num_shards;
+            num_elements += limit - start;
+          });
+    EXPECT_EQ(num_shards, workers);
+    EXPECT_EQ(num_elements, total_elements);
+  }
+}
228 PyTorch e6000a7c045cbece5fbfd7d933c39e40b1625037 Python Disable test loss of precision Quantize = convert from float 32 to int 8, dequantize = convert from int 8 to float 32

During training, all calculations are done in floating point, with fake_quant modules modeling the effects of quantization by clamping and rounding to simulate the effects of INT8. After model conversion, weights and activations are quantized, and activations are fused into the preceding layer where possible. It is commonly used with CNNs and yields a higher accuracy compared to static quantization. Quantization Aware Training is also known as QAT.
test_numerical_consistency_per_tensor in test_fake_quant is failing on Windows. The test is comparing numerical consistency between CPU quantize/dequantize op and the CPU fake quantize op. quantization quantization testing, quantization disable test/warning disable precision test Temporarily disables a test for comparing numerical consistency between CPU quantize/dequantize op and the CPU fake quantize op
229 PyTorch 02d318461e5c7bded304c42ed7075de84f71dac6 Python Disable test loss of precision Quantized operations require FBGEMM. FBGEMM (Facebook GEneral Matrix Multiplication) is a low-precision, high-performance matrix-matrix multiplication and convolution library for server-side inference. FBGEMM is only optimized for CPUs with AVX2 instruction set support or newer. In PyTorch, quantization currently supports two backends: fbgemm (for use on x86) and qnnpack (for use on ARM)
Failing test quantization quantization testing, quantization disable test/warning disable precision test Temporarily disable test_numerical_consistency_per_channel due to failure
230 PyTorch b7038f7c37e955f7400459bbfc9382a77b16377d Python Change exception to a warning loss of precision exception This test script compares if two values are “close enough” and handles +inf, -inf, nan numerical differences raise exception precision tests/speed benchmarks accuracy testing tensor compare, testing accuracy, JIT relax accuracy test tolerance relax accuracy test tolerance changes errors to warnings when numerical differences found by replacing self.assertRaisesRegex with assertWarnsRegex
231 PyTorch 032e4f81a8df14fe8b7177957f73567fa04919e8 Python Unit test overflow Test for overflow does not verify that all listed conditions throw, just the first one precision tests/speed benchmarks overflow test testing fix test/warning fix overflow check Update test to check that the correct exceptions are raised when attempting to convert an invalid value to a certain type. Refactor code: add 'with' and 'assert' for every condition.
232 PyTorch 86abc8cd481bfa2b9bb741722770796966778ab1 C++ Change exception to a warning overflow PyTorch has a JIT compiler and a method to allow for inserting instructions as the compiler is compiling on the go. In this case an overflow check is inserted. other other C++ interpreter fix test/warning change variable type, change exception to a warning Change an exception to a non-fatal warning. Also change a cast to use the unsigned variants of the 16- and 64-bit integer types, which doubles the number of positive values these types can represent and gives them well-defined overflow behavior: if they do overflow, they simply wrap around instead of invoking undefined behavior. throw std::runtime_error("safe_narrow_cast<>() failed due to overflow");
safe_narrow_cast<int16_t, int64_t>(N));
TORCH_WARN(
+        "ATTENTION: your model computation is overflowing, safe_narrow_cast<>() failed");
+    return v;

safe_narrow_cast<uint16_t, uint64_t>(N));
233 PyTorch 2171f910531be28f7d5dd8e6ab8bff3a5486e6fd Python Unit test overflow ROCm is the first open-source software development platform for HPC/Hyperscale-class GPU computing. The test was previously turned off because of broken continuous integration on ROCm precision tests/speed benchmarks overflow test testing overflow, Cuda add overflow check add overflow check re-enable cuda_kernel_loop_overflow_large test def test_cuda_kernel_loop_overflow_large(self):
         # Make sure input.numel() > INT_MAX is handled:
         x = torch.randn(1, 1, 1, 2**31, dtype=torch.float16, device="cuda")
234 PyTorch 916eee182c9dc8d335501f6672842c6d29f0af58 Python Unit test overflow A test that checks input shape of 2D convolution prints overflowed integers.
Bug in error message:
RuntimeError: Expected 4-dimensional input for 4-dimensional weight 6 1 5 5 2323362894317625376, but got 5-dimensional input of size [1, 10, 1, 28, 28] instead
Correct error message:
RuntimeError: Expected 4-dimensional input for 4-dimensional weight 6 1 5 5, but got 5-dimensional input of size [1, 10, 1, 28, 28] instead
CNN operations convolution 2D convolution fix test/warning correct error message add unit test to test shape mismatch for 2d convolutions def test_mismatch_shape_conv2d(self):
+        x = torch.randn(1, 10, 1, 28, 28)
+        w = torch.randn(6, 1, 5, 5)
+
+        with self.assertRaisesRegex(RuntimeError,
+                                    r'Expected 4-dimensional input for 4-dimensional weight 6 1 5 5,' +
+                                    r' but got 5-dimensional input of size \[1, 10, 1, 28, 28\] instead'):
+
+            F.conv2d(x, w)
235 PyTorch 3805be62c1bb10b8bf4e645aac30d89efd8f79ab Python Unit test overflow quantization test fails due to overflow when width parameter is specified quantization quantization quantization, testing increase variable precision/change variable type increase variable precision skip test and get rid of width parameter. Note: no longer in pytorch @given(A=hu.tensor(shapes=((3, 4, 5),),
                        qparams=hu.qparams()),
-           b=st.floats(allow_infinity=False, allow_nan=False, width=32))

@unittest.skip("FIXME: Failing due to overflow error without width option")
     @given(A=hu.tensor(shapes=((3, 4, 5),),
                        qparams=hu.qparams()),
+           b=st.floats(allow_infinity=False, allow_nan=False))
236 PyTorch 1ed488da4f88ec7b85ba5f6a4113908dda3681e3 Python Unit test loss of precision non-standard precision non-standard precision custom precision testing fix test/warning fix precision test fix precision test for inplace mode
         for inplace in (True, False):
-            if len(decl) == 3:
-                name, constr, arg_constr = decl
-                desc = ''
-            elif len(decl) == 4:
-                name, constr, arg_constr, desc = decl
             if inplace:
                 name = name + '_'
             if not hasattr(tensor, name):
@@ -335,8 +337,6 @@ for decl in tests:
             if desc:
                 test_name += '_' + desc

-            precision = custom_precision.get(name, TestCuda.precision)
     for t in types:
         tensor = t()
         gpu_tensor = get_gpu_type(t)()
+        if len(decl) == 3:
+            name, constr, arg_constr = decl
+            desc = ''
+        elif len(decl) == 4:
+            name, constr, arg_constr, desc = decl
+
+        precision = custom_precision.get(name, TestCuda.precision)
237 Tensorflow/Keras 37af1b8790d633b9002ab04a0e664ca3c1dbe508 Python fix loss of precision data processing batch normalization batch normalization rewrite math formula rewrite math formula Do not use a moving average in batch normalization, since the moments method that calculates the mean of the input already implements this logic in a numerically stable way
238 Tensorflow/Keras f93960d0afdcf59457b614158ee5575ca2acfe15 Python fix N/A incorrect comment about numerical stability statistical distributions statistical distributions Beta distribution fix test/warning delete incorrect comment
239 PyTorch 8c8918c3412aa1a7a50df02cddfd66be948d2ace C++ Fix overflow non-standard precision non-standard precision half precision, overflow testing fix test/warning fix overflow check make half precision overflow checks consistent with other types template<> bool overflows<Half, double>(double f) {
+  using limit = std::numeric_limits<double>;
+  if (limit::has_infinity && std::isinf(f)) {
+    return false;
+  }
240 PyTorch 79c3ebc040c4bac896477030d8af4ac94bc6f440 Python Unit test loss of precision Unit test was not aware of the precision of inputs activation functions activation functions testing fix test/warning fix overflow check Add argument to make assertion aware of precision of inputs.
241 PyTorch 2b902e9738f5346050814b40db3ec67faf37128a C++ fix loss of precision An offset within an array or other data structure object is an integer indicating the distance (displacement) between the beginning of the object and a given element or point, presumably within the same object. The concept of a distance is valid only if all elements of the object are of the same size (typically given in bytes or words).

For example, in A as an array of characters containing "abcdef", the fourth element containing the character 'd' has an offset of three from the start of A.

In assembly language an offset usually denotes the number of address locations added to a base address in order to get to a specific absolute address.
offset numerical bug when casting quantization quantization quantization, caffe2, type conversion increase variable precision/change variable type change variable type change all_offsets variable type from float to int32_t std::vector<std::vector<float>>* all_offsets) std::vector<std::vector<int32_t>>* all_offsets)
242 PyTorch 5292685d2f144d9781ab8b7991c0a1153098a477 C++ Fix loss of precision -inf, NaN Logarithms of determinants of large positive definite matrices appear ubiquitously in ML. Log-determinant computation involves the Cholesky decomposition. There is a loss of precision when the diagonal matrix contains small values. The log determinant of a square matrix yields -inf when the matrix entries are very small numbers. The result is -inf if the input has zero determinant; if the input has a negative determinant, the result is NaN linear algebra linear algebra linear algebra, log of matrix determinant use a different algorithm use a different algorithm Use the sign of the diagonal of U instead of the matrix determinant when diag_U has many small values.
determinant of a matrix, log of a matrix
243 PyTorch 67f2039f4ce233754910ebc24fbfcc8bc68685ae Python Fix inefficient algorithm slow execution The binomial distribution is used when there are exactly two mutually exclusive outcomes of a trial, e.g., a coin toss has only two outcomes: heads and tails. A single binary outcome has a Bernoulli distribution, and a sequence of binary outcomes has a Binomial distribution. The binomial distribution gives the discrete probability distribution P_p(n|N) of obtaining exactly n successes out of N Bernoulli trials (where the result of each Bernoulli trial is true with probability p and false with probability q=1-p). Log probability in the binomial distribution has numerical stability issues. The issue manifests itself when `total_count` is high and `probs` is very low. step size unreasonably small statistical distributions statistical distributions distributions, log probability, binomial distribution rewrite math formula rewrite math formula log probability method in binomial distribution is unstable max_val = (-self.logits).clamp(min=0.0)
value * self.logits + self.total_count * max_val -
-                self.total_count * torch.log1p((self.logits + 2 * max_val).exp()))
value * self.logits - self.total_count * torch.log1p(self.logits.exp())) @unittest.skipIf(not TEST_NUMPY, "NumPy not found")
    def test_binomial_log_prob_float(self):
        probs = torch.tensor([1e-5, 0.99999], dtype=torch.float)
        total_count = 1000000.
        x = torch.tensor([10, 9999], dtype=torch.float)
        expected = scipy.stats.binom(total_count, probs.numpy()).logpmf(x.numpy())
        log_prob = Binomial(total_count, probs).log_prob(x)
        # Comparison is again scipy distributions which use float64.
        self.assertTrue(np.allclose(log_prob, expected, rtol=0.05))
        logits = probs_to_logits(probs, is_binary=True)
        log_prob = Binomial(total_count, logits=logits).log_prob(x)
        self.assertTrue(np.allclose(log_prob, expected, rtol=0.05))
244 PyTorch a17c0118a52d34c97ab48bae416ae1896ad14e56 C++ Fix overflow NaN loss overflow Binary Cross Entropy (BCE) is a loss function used for binary classification tasks to measure the difference between true labels and predicted labels. BCE with logits takes logits, not predicted labels as input, but serves the same purpose. Binary cross entropy with logits is unstable with the positive weights argument when logits are large negative values and results in an inf. Positive weight is a weight of positive examples and must be a vector with length equal to the number of classes. loss functions loss functions binary cross entropy loss rewrite math formula rewrite math formula instead of multiplying by 1 + exp(-input), add exp(-input-max_val) loss = (1 - target).mul_(input).add_(log_weight.mul_((-max_val).exp_().mul_(1 + (-input).exp_()).log_().add_(max_val))); loss = (1 - target).mul_(input).add_(log_weight.mul_(((-max_val).exp_().add_((-input - max_val).exp_())).log_().add_(max_val)));
def test_bce_with_logits_stability(self):
+        output = torch.tensor([0., -120.])
+        target = torch.tensor([0., 1.])
+        pos_weight = torch.tensor([1., 1.])
+
+        out1 = nn.BCEWithLogitsLoss()(output, target)
+        self.assertTrue(torch.isfinite(out1).all().item())
+
+        out2 = nn.BCEWithLogitsLoss(pos_weight=pos_weight)(output, target)
+        self.assertTrue(torch.isfinite(out2).all().item())
H_p(q) = -1/N * sum_from_i=1_to_N(y_i * log(p(y_i)) + (1-y_i) * log(1-p(y_i))) log, multiply
245 PyTorch 00d2befba11a1e9c85146a4470721eb75596d5b7 Cuda Fix loss of precision TH = TorcH
This is in directory aten/src, which contains the low-level tensor libraries for PyTorch, as well as the new ATen C++ bindings. The low-level libraries trace their lineage from the original Torch. There are multiple variants of the library, summarized here:

TH = TorcH
THC = TorcH Cuda
THCS = TorcH Cuda Sparse (now defunct)
THCUNN = TorcH CUda Neural Network (see cunn)
THNN = TorcH Neural Network (now defunct)
THS = TorcH Sparse (now defunct)
unstable variance over TorcH Cuda tensor outer dimensions (THTensor_varOuterDim) tensor math tensor math low level tensor math, variance calculation, GPU increase variable precision/change variable type change variable type Use Accreal variable type instead of real def test_var_stability(self):
        tensor = torch.FloatTensor([2281.5, 2281.25]).cuda()

        # Stability for inner dim
        self.assertEqual(tensor.var(0)[0], 0.03125)

        # General stability
        self.assertEqual(tensor.var(), 0.03125)

        # Stability for outer dimensions
        tensor = tensor.unsqueeze(1)
        self.assertEqual(tensor.var(0)[0], 0.03125)
variance
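A hedged float32 illustration for entry 245 of the cancellation that motivates accumulating in a wider type (analogous to switching real to accreal): for the two values from test_var_stability, a common single-pass E[x^2] - E[x]^2 formulation computed entirely in float32 loses every significant digit, while float64 accumulation recovers the population variance 0.015625 (the test's 0.03125 is the unbiased, n-1 variant).
import numpy as np

x32 = np.array([2281.5, 2281.25], dtype=np.float32)
x64 = x32.astype(np.float64)

naive32 = np.mean(x32 * x32) - np.mean(x32) ** 2    # ~0.0: catastrophic cancellation
wide64 = np.mean(x64 * x64) - np.mean(x64) ** 2     # 0.015625
print(naive32, wide64)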
246 PyTorch 72a257584efa7fb63b14f09d19efc96caa5d6e4d Cuda Fix overflow/underflow overflow/underflow Log sigmoid is a logistic non-linear activation function. However, softmax is typically preferred over sigmoid numerically unstable logsigmoid activation functions activation functions log sigmoid rewrite math formula rewrite math formula rewrite formula for log sigmoid considering the maximum representable values const float fmax =
      (float)((int32_t)(uint32_t)qmax - (int32_t)(uint32_t)zero_point);

const T z = THCNumerics<T>::exp(- *input);
-    *gradInput = *gradOutput * z / (1.f + z);
const T max = fmaxType(0.f, -*input);
+    const T z = THCNumerics<T>::exp(-max) + THCNumerics<T>::exp(-*input -max);
+    T max_deriv = 0.f;
+    T sign = -1.f;
+    if (*input < 0.f){
+        max_deriv = -1.f;
+        sign = 1.f;
+    }
+    *gradInput = *gradOutput * (-max_deriv - sign*((z - 1.f)/z));
+    *gradInput = *gradOutput * (-max_deriv - sign*((z - 1.f)/z));
log(1/(1+e^(-x))) log sigmoid
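A hedged NumPy rendering of the rewritten forward formula in entry 246 (same max-shift as the CUDA code above): log sigmoid(x) = -(max(0, -x) + log(exp(-max(0, -x)) + exp(-x - max(0, -x)))), which keeps both exponents non-positive so nothing overflows or underflows for large |x|.
import numpy as np

def log_sigmoid(x):
    m = np.maximum(0.0, -x)                       # the "max" term from the patch
    return -(m + np.log(np.exp(-m) + np.exp(-x - m)))

x = np.array([-1000.0, -10.0, 0.0, 10.0, 1000.0])
print(log_sigmoid(x))   # finite everywhere: ~[-1000, -10.00005, -0.693, -4.5e-05, -0.0]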
247 PyTorch f555c6308c534dd3964d106f2551067fad6edaec Cuda, C++ Fix loss of precision Normalized gradient helps to ameliorate issues with gradient descent such as slow convergence and getting stuck in saddle points. The normalized gradient is the gradient divided by its magnitude; it therefore only provides the direction for gradient descent and does not affect the magnitude of the step size. The gradient magnitude is calculated as the square root of the sum of squares of the gradient vector. The normalization operation for the gradient is unstable due to the sum-of-squares operation gradients/derivatives gradients gradient normalization rewrite math formula rewrite math formula rewrite math formula in Cuda:
grad_mat[index] = (y_ij / x_ij) * (dy_ij - y_ij) * row_sum;

in C++:
gradInMat = ((outputMat / inputMat) * (gradOutMat - outputMat)).rowwise() *
      (gradOutMat * inputMat).colwise().sum();
in Cuda:
grad_mat[index] = (dy_ij / row_norm) - ((x_ij / row_norm_3) * row_sum);

in C++:
auto square = inputMat.square();
  auto norm = square.colwise().sum().sqrt();
  gradInMat = gradOutMat.rowwise() * norm.inverse() -
      ((inputMat.rowwise() / norm.pow(3)).rowwise() *
       (gradOutMat * inputMat).colwise().sum());
gradient/||gradient|| sum of squares, square root
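A hedged NumPy version for entry 247 of the rewritten backward formula, shown for the single-vector case of the row-wise C++/CUDA code above: for y = x/||x|| with upstream gradient g, grad_x = g/||x|| - x * (g . x)/||x||^3.
import numpy as np

def normalize_grad(x, g):
    # x: input vector, g: upstream gradient dL/dy for y = x / ||x||
    norm = np.sqrt(np.sum(x * x))
    return g / norm - x * np.dot(g, x) / norm**3

x = np.array([3.0, 4.0])
g = np.array([1.0, 0.0])
print(normalize_grad(x, g))            # [ 0.128 -0.096]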
248 PyTorch 9a153412fd4f78b9a9b59bbf85a358339fb69613 C++, Python, Cuda Fix underflow Rsample offers a reparametrization trick, where the parameterized random variable can be constructed via a parameterized deterministic function of a parameter-free random variable. The reparameterized sample therefore becomes differentiable.
sample and rsample both generate samples from the distribution, but only rsample supports differentiating through the sampler. You should use rsample whenever you need to compute gradients of distribution parameters with respect to functions of samples, e.g. in variational inference. SOURCE: https://forum.pyro.ai/t/sample-vs-rsample/2344
sample is literally rsample wrapped in with torch.no_grad(), so when you don't need gradients. SOURCE: https://github.com/cornellius-gp/gpytorch/issues/764
underflow issue in method rsample of dirichlet distribution class statistical distributions statistical distributions dirichlet distribution, sampling, forward pass use a different algorithm use a different algorithm adds a `torch._sample_dirichlet` method in `Distributions.cpp` def test_beta_underflow(self):
+        # For low values of (alpha, beta), the gamma samples can underflow
+        # with float32 and result in a spurious mode at 0.5. To prevent this,
+        # torch._sample_dirichlet works with double precision for intermediate
+        # calculations.
+        set_rng_seed(1)
+        num_samples = 50000
+        for dtype in [torch.float, torch.double]:
+            conc = torch.tensor(1e-2, dtype=dtype)
+            beta_samples = Beta(conc, conc).sample([num_samples])
+            self.assertEqual((beta_samples == 0).sum(), 0)
+            self.assertEqual((beta_samples == 1).sum(), 0)
+            # assert support is concentrated around 0 and 1
+            frac_zeros = float((beta_samples < 0.1).sum()) / num_samples
+            frac_ones = float((beta_samples > 0.9).sum()) / num_samples
+            self.assertEqual(frac_zeros, 0.5, 0.05)
+            self.assertEqual(frac_ones, 0.5, 0.05)
+
+    @unittest.skipIf(not TEST_CUDA, "CUDA not found")
+    def test_beta_underflow_gpu(self):
+        set_rng_seed(1)
+        num_samples = 50000
+        conc = torch.tensor(1e-2, dtype=torch.float64).cuda()
+        beta_samples = Beta(conc, conc).sample([num_samples])
+        self.assertEqual((beta_samples == 0).sum(), 0)
+        self.assertEqual((beta_samples == 1).sum(), 0)
+        # assert support is concentrated around 0 and 1
+        frac_zeros = float((beta_samples < 0.1).sum()) / num_samples
+        frac_ones = float((beta_samples > 0.9).sum()) / num_samples
+        # TODO: increase precision once imbalance on GPU is fixed.
+        self.assertEqual(frac_zeros, 0.5, 0.12)
+        self.assertEqual(frac_ones, 0.5, 0.12)
249 PyTorch 74819087de17de4c8215a7f631d8d4d18dd13d45 C++ Fix inefficient algorithm Mixed precision training with DDP (distributed data parallelization) randomly hangs. The reason is that take_tensors generates a list of bucketed tensors in a nondeterministic order, because the key to the map is a pointer. non-standard precision non-standard precision distributed data parallelization, mixed precision use a different algorithm use a different algorithm use map instead of unordered_map to generate an ordered list of bucketed tensors for parallel training std::unordered_map<at::Type*, TensorGroup> groups; std::map<TypeID, TensorGroup> groups;
250 PyTorch 73bdb661feb195a8b98366db5750b998c025f709 Python Unit test loss of precision BCELoss's outputs and gradInput computations are accurate to around 1e-6 on float types (as a relative value, not absolute), which is reasonable. However, the tests use absolute thresholds: the accumulation of 5 gradInputs has to have error less than 0.0002. loss functions loss functions binary cross entropy loss, testing precision rewrite math formula rewrite math formula restrict input to [0.028, 1- 0.028]  instead of [0.02, 1- 0.02] to decrease error

The worst case for BCELoss's gradInput for each element may be described as 1 / ( (1-x) * x ). Previously, the input to the test was restricted to [0.02, 1- 0.02], giving a worst-case largest gradInput of about 50, a total accumulated grad of 50*5 = 250, and an error of 250 * 1e-6 = 0.00025, which was too big.
    By restricting x to [0.028, 1- 0.028] we get a worst case of 36.74, resulting in a total accumulated grad of 184, which is less than the 200 needed to keep the error below 0.0002.
input_fn=lambda: torch.rand(15, 10).clamp_(2e-2, 1 - 2e-2) input_fn=lambda: torch.rand(15, 10).clamp_(2.8e-2, 1 - 2.8e-2),
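A quick check of the worst-case arithmetic quoted in entry 250 (the helper is hypothetical, not part of the patch): the per-element bound 1/((1-x)*x) drops from about 51 at x = 0.02 to about 36.7 at x = 0.028, bringing the accumulated error of five elements back under the 0.0002 threshold.
def worst_case_grad(x):
    return 1.0 / ((1.0 - x) * x)

for x in (0.02, 0.028):
    g = worst_case_grad(x)
    # five accumulated gradInputs, each accurate to ~1e-6 relative error
    print(x, round(g, 2), 5 * g * 1e-6)
# 0.02  51.02  ~2.55e-04  (over the 2e-4 threshold)
# 0.028 36.74  ~1.84e-04  (under the threshold)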
251 PyTorch 912ee4e40a9f2f2f156e94a76a521d3ed4f49bd0 Python Unit test loss of precision failing unit test linear algebra linear algebra testing, sparse to dense matrix conversion rewrite math formula rewrite math formula elements=st.floats(min_value=0.5, max_value=10), dtype=dt))
D = np.random.uniform(0, 1, size=(first_dim,) + X.shape[1:])
elements=st.floats(min_value=0, max_value=1), dtype=dt))
D = np.zeros((first_dim,) + X.shape[1:])
252 PyTorch b1fa9d2b06714de099e3ae1141d15dcbaba78dd3 C Fix overflow THFile is for loading data from disk or memory, but this is no longer part of PyTorch
data processing data data loading increase variable precision/change variable type increase variable precision, add overflow check increase precision to long, add logic to check that smaller than long max