How to manually multiply a double (floating point) by an integer type (32-bit, 64-bit, 128-bit, etc.)

Chris Delpire

I am trying to manually implement multiplication between a double and a 128-bit integer type that I built myself from two ulongs.

My understanding is as follows:
1. Decompose the double into its significand and exponent, making sure the significand is normalized.
2. Multiply the significand by my uint128. This gives a 256-bit number.
3. Shift the 256-bit number by the exponent extracted from the double.
4. If the result needs more than 128 bits, the multiplication overflowed.
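The four steps above can be sketched in Python, whose arbitrary-precision integers stand in for the 128-bit and 256-bit types. This is a minimal sketch, assuming a finite, non-negative double and truncation toward zero; the helper name `mul_double_u128` is hypothetical, and `math.frexp` is used here in place of whatever mantissa-extraction code the C# version uses.

```python
import math

U128_LIMIT = 1 << 128  # smallest value that no longer fits in 128 bits

def mul_double_u128(x: float, v: int) -> int:
    """Hypothetical helper: multiply a finite, non-negative double by a
    128-bit unsigned value, truncating the result toward zero."""
    if not (math.isfinite(x) and x >= 0.0):
        raise ValueError("sketch only handles finite, non-negative doubles")
    if not 0 <= v < U128_LIMIT:
        raise ValueError("v must fit in 128 bits")
    # Step 1: x == frac * 2**exp with 0.5 <= frac < 1 (frac == 0 when x == 0).
    frac, exp = math.frexp(x)
    significand = int(frac * (1 << 53))  # exact: a double has at most 53 significant bits
    exponent = exp - 53                   # now x == significand * 2**exponent
    # Step 2: exact product; at most 128 + 53 bits, so a 256-bit type holds it.
    wide = v * significand
    # Step 3: apply the binary exponent (a right shift truncates toward zero).
    product = wide << exponent if exponent >= 0 else wide >> -exponent
    # Step 4: anything at or above 2**128 is an overflow.
    if product >= U128_LIMIT:
        raise OverflowError("product does not fit in 128 bits")
    return product

print(mul_double_u128(8e-6, 1 << 127))
```

For 8E-6, `math.frexp` leads to the same significand (4722366482869645) and exponent (-69) reported below, and the call prints 1361129467683753792259819967610880, the exact product of 2^127 and the double nearest 8E-6.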

I feel like I am very close, but I am missing something. Let's say I store a uint128 with the value 2^127 and want to multiply it by 8E-6:

uint128 myValue = new uint128(2^127); // 2^127 means two to the 127th power here (in C#, ^ is XOR)
double multiplier = 8E-6;
uint128 product = myValue * multiplier;

The exact answer is 1361129467683753853853498429727072.845824, so I would like to get 1361129467683753853853498429727072 as my 128-bit integer.

However, my implementation gives me 1361129467683753792259819967610881.

int exponent; // This value ends up being -69 for 8E-6
uint128 mantissa = GetMantissa(multiplier, out exponent); // This value ends up being 4722366482869645 after normalizing it.
uint256 productTemp = myValue * mantissa; // This value is something like 803469022129495101412490705402148357126451442021826560.
uint128 product = productTemp >> -exponent; // shift right by 69 bits; this value is 1361129467683753792259819967610881
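To narrow down where the extra 1 comes from, the multiply and shift can be replayed with the 128-bit value held as two 64-bit limbs, mirroring a uint128 built from two ulongs. This is a hedged Python sketch, not the asker's code: the little-endian limb layout and the helper names `to_limbs`, `from_limbs`, `mul_limbs` and `shr_limbs` are assumptions. Because productTemp is m × 2^127 and 69 < 127, its low 69 bits are all zero, so a correct right shift by 69 discards only zero bits and involves no rounding.

```python
MASK64 = (1 << 64) - 1

def to_limbs(value: int, count: int) -> list[int]:
    """Split a non-negative int into `count` little-endian 64-bit limbs (like ulongs)."""
    return [(value >> (64 * i)) & MASK64 for i in range(count)]

def from_limbs(limbs: list[int]) -> int:
    """Reassemble little-endian 64-bit limbs into one int."""
    return sum(limb << (64 * i) for i, limb in enumerate(limbs))

def mul_limbs(a: list[int], b: list[int]) -> list[int]:
    """Schoolbook multiply of limb arrays; each step fits in 128 bits."""
    out = [0] * (len(a) + len(b))
    for i, ai in enumerate(a):
        carry = 0
        for j, bj in enumerate(b):
            t = ai * bj + out[i + j] + carry  # at most (2**64 - 1)**2 + 2 * (2**64 - 1)
            out[i + j] = t & MASK64
            carry = t >> 64
        out[i + len(b)] = carry
    return out

def shr_limbs(limbs: list[int], shift: int) -> list[int]:
    """Logical right shift of a little-endian limb array by `shift` bits."""
    word, bit = divmod(shift, 64)
    out = []
    for i in range(len(limbs)):
        lo = limbs[i + word] if i + word < len(limbs) else 0
        hi = limbs[i + word + 1] if i + word + 1 < len(limbs) else 0
        # Handle bit == 0 separately: in C#, ulong << 64 shifts by 0 (the count is
        # masked to 6 bits), so `hi << (64 - bit)` would wrongly OR in the next limb.
        out.append(lo if bit == 0 else ((lo >> bit) | (hi << (64 - bit))) & MASK64)
    return out

m = 4722366482869645       # mantissa reported in the question
exponent = -69             # exponent reported in the question
value = to_limbs(1 << 127, 2)                      # uint128 as two 64-bit limbs
product_temp = mul_limbs(value, to_limbs(m, 2))    # four limbs, i.e. 256 bits
product = shr_limbs(product_temp, -exponent)[:2]   # keep the low 128 bits

print(from_limbs(product))
```

If the C# productTemp differs from `from_limbs(product_temp)`, the 256-bit multiply is a likely suspect; if it matches but the shifted value still ends in 881, the shift is.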

I am using the code from "extracting mantissa and exponent from double in c#" to get my mantissa and exponent, and I can use those values to correctly reconstruct 8E-6 as a double.

Does anyone know what I am getting wrong here? If I use 0.8 instead of 8E-6, my results are closer.

chux - Reinstate Monica

what I am getting wrong here?

double multiplier does not have the arithmetic value 0.000008. It has a dyadic value (an integer times a power of 2) near 0.000008, matching it to about 15-17 significant decimal digits. That difference is why the result does not meet your expectation.
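Python's `fractions.Fraction` converts a float without rounding, which makes the dyadic value described here visible. A short sketch:

```python
from fractions import Fraction

x = 8e-6
exact = Fraction(x)                # exact value of the stored double
intended = Fraction(8, 10**6)      # the decimal value 0.000008

print(exact)                       # numerator / 2**69
print(exact == intended)           # False: 0.000008 is not a dyadic rational
print(float(intended - exact))     # positive, about 3.6e-22: the double sits just below 0.000008
```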

1234567890123456
1361129467683753 853853498429727072.845824 - perceived product
1361129467683753 853853498429727072        - perceived rounded product
1361129467683753 792259819967610881        - product seen.
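The column comparison can be reproduced with exact arithmetic: the perceived product is 2^127 × 8/10^6, and the product seen is the value from the question. A sketch that counts how many leading digits agree:

```python
import os
from fractions import Fraction

perceived = Fraction(2**127) * Fraction(8, 10**6)       # exact decimal intent
rounded = perceived.numerator // perceived.denominator   # truncated to an integer
seen = 1361129467683753792259819967610881                # value reported in the question

# os.path.commonprefix compares plain strings character by character.
common = len(os.path.commonprefix([str(rounded), str(seen)]))
print(rounded)
print(common)   # 16 leading digits match, as the ruler shows
```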

Try a multiplier whose decimal value is exactly representable as a double, such as 0.0625 (1.0/16).
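Because 0.0625 is 2^-4, the double stores it exactly, and its product with 2^127 is 2^123 with nothing lost. A quick check with `Fraction`, comparing the value stored in the double against the decimal value as written:

```python
from fractions import Fraction

# A decimal literal survives conversion to double only if it is a dyadic rational.
for text in ("0.0625", "0.8", "8e-6"):
    as_double = Fraction(float(text))   # exact value actually stored
    as_decimal = Fraction(text)         # exact decimal value that was written
    print(text, as_double == as_decimal)   # True only for 0.0625

# 0.0625 == 2**-4, so multiplying 2**127 by it is exact.
print(Fraction(0.0625) * 2**127 == 2**123)   # True
```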


Notes:

With binary64, the closest double to 8E-6 (per @Patricia Shanahan) is 0.000007999999999999999637984894607090069484911509789526462554931640625.

Multiplying that by 2^127 gives exactly

1361129467683753 792259819967610880.0

So your multiplication appears to be off by one; perhaps a rounding issue?
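The exact value of the double and its product with 2^127 can be confirmed with Python's `decimal` and `fractions` modules. This sketch prints the full decimal expansion of the double nearest 8E-6, checks that its product with 2^127 is an integer, and prints how far that integer is from the value the asker saw:

```python
from decimal import Decimal
from fractions import Fraction

x = 8e-6
print(Decimal(x))   # Decimal(float) gives the exact expansion of the stored double

product = Fraction(x) * 2**127
assert product.denominator == 1   # the exact product is an integer
print(product.numerator)
print(1361129467683753792259819967610881 - product.numerator)   # distance from the value seen
```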
