Re: [PHP-DEV] Proposal: Arbitrary precision native scalar type

On Thu, 7 Dec 2023, Alex Pravdin wrote:

> Accounting for all of the above, I suggest adding a native numeric
> scalar arbitrary precision type called "decimal". Below are the
> preliminary requirements for implementation.

Adding a new native type to PHP will create a large change. It is not
"just" adding a new native type; it also means all of the conversions
between types need to be added. This is not a small task.

> Decimal values can be created from literals by specifying a modifier or using
> the (decimal) typecast:
>
> $v = 0.2d;
> $v = (decimal) 0.2; // Creates a decimal value without intermediary float
>
> It uses the precision and scale defined in php.ini.

If you want to use arbitrary precision natives, then a precision and
scale as defined in php.ini defeats the purpose. Every installation can
then potentially calculate things in a different way.

The only way to prevent that is to have an *actual* Decimal type, such
as the one MongoDB uses (the IEEE 754 decimal128 type):

- https://www.mongodb.com/docs/mongodb-shell/reference/data-types/#std-label-shell-type-decimal
- decimal128 floating-point format - Wikipedia
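
To illustrate the difference between a configurable precision and a fixed format, here is a minimal sketch in Python, whose standard decimal module can be configured with the decimal128 parameters (34 significant digits, exponents from -6143 to 6144). The site_a and site_b contexts are hypothetical stand-ins for two installations with different php.ini settings; none of this is PHP API.

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN

# IEEE 754-2008 decimal128 parameters: 34 significant digits,
# exponent range -6143..6144, round-half-even as the default rounding.
DECIMAL128 = Context(prec=34, Emin=-6143, Emax=6144, rounding=ROUND_HALF_EVEN)

# Hypothetical per-installation settings (assumption: they model an
# ini-driven precision, not any real PHP directive).
site_a = Context(prec=10)
site_b = Context(prec=20)

one, three = Decimal(1), Decimal(3)

# Same expression, different answers depending on local configuration.
print(site_a.divide(one, three))      # 0.3333333333
print(site_b.divide(one, three))      # 0.33333333333333333333

# With one fixed format, every installation gets the same 34 digits.
print(DECIMAL128.divide(one, three))
```

With the fixed context the result no longer depends on where the code runs, which is the property argued for above.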

cheers,
Derick

--
https://derickrethans.nl | https://xdebug.org | https://dram.io

Author of Xdebug. Like it? Consider supporting me: Xdebug: Support

mastodon: @derickr@phpc.social @xdebug@phpc.social

On Tue, Apr 9, 2024 at 7:52 PM Derick Rethans <derick@php.net> wrote:

> Adding a new native type to PHP will create a large change. Not only is
> it "just" adding a new native type, it also means all of the conversions
> between types need to be added. This is not a small task.

I understand this :)

> If you want to use arbitrary precision natives, then a precision and
> scale as defined in php.ini defeats the purpose. Every installation can
> then potentially calculate things in a different way.
>
> The only way how to prevent that, is to have *actual* Decimal type, such
> as the Decimal type in MongoDB uses (the IEEE 754 decimal128 type):
>
> - https://www.mongodb.com/docs/mongodb-shell/reference/data-types/#std-label-shell-type-decimal
> - decimal128 floating-point format - Wikipedia

If PHP core experts think that 128-bit decimal will cover the vast
majority of cases and is worth implementing, I'm totally for it. If we
can implement something standardized, then fine. The current thread is
not an RFC candidate, but more of a discussion to formulate the
design principles and strategy.

On another note, is it possible to make the zval variable in size, 64 or
128 bits? Then the 128-bit decimal could be a struct held on the
stack instead of being reached through a pointer. Can that be achieved
with the help of macros?

--
Best, Alexander

Hello everyone. To continue the discussion, I'm suggesting an updated
version of my proposal. The main change is: to use decimal128 as the
backend for native decimals, so no 3rd-party libraries are required.
Even if we do adopt a library, this looks much easier to implement than
the previous proposal. Also, the principle "if any operand is decimal,
the result is decimal" is adopted, and conversion requirements are
relaxed and more specific.

Decimal values can be created from literals by specifying a modifier
or using the (decimal) typecast:

$v = 0.2d;
$v = (decimal) 0.2; // Creates a decimal value without intermediary float

Native decimals are backed by 128-bit decimals according to IEEE
754-2008. Point to discuss: can we use 128-bit zvals, or should a
decimal be split across a pair of 64-bit zvals holding its lo and hi halves?
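
As a rough illustration of the lo/hi idea, the sketch below splits an arbitrary 128-bit pattern into two unsigned 64-bit words and joins them again. It is written in Python only to show the bit arithmetic; split128 and join128 are hypothetical names, and nothing here reflects how zvals are actually laid out.

```python
MASK64 = (1 << 64) - 1

def split128(bits: int) -> tuple:
    """Split an unsigned 128-bit pattern into (hi, lo) 64-bit words."""
    if not 0 <= bits < (1 << 128):
        raise ValueError("expected an unsigned 128-bit value")
    return bits >> 64, bits & MASK64

def join128(hi: int, lo: int) -> int:
    """Rebuild the 128-bit pattern from its two 64-bit halves."""
    return (hi << 64) | lo

# An arbitrary 128-bit pattern used purely as test data.
pattern = 0x3040_0000_0000_0000_0000_0000_0000_0002

hi, lo = split128(pattern)
print(hex(hi), hex(lo))             # 0x3040000000000000 0x2
print(join128(hi, lo) == pattern)   # True
```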

Decimal to float conversion is allowed and smooth:

function f (float $value) {}
f(0.2);
f(0.2d); // allowed with no warnings or errors

A function "str_to_decimal" is added to convert string representations
of numbers to decimals.

Typecast from string to decimal works the same as the "str_to_decimal" function.

The function "float_to_decimal" is added to explicitly convert floats
to decimals. Internally, it performs explicit float to string
conversions using existing defaults and also accepts an optional
parameter to define the number of fractional digits to round to. After
converting a float to a string, it converts the string to a decimal.
Since the main problem of float to decimal conversion is that we don't
know the exact result until we use some rounding when transforming it
to a human-readable format, it looks like the step of the conversion
to a string is inevitable. Any optimized algorithms are welcome.
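
The float-to-string-to-decimal route described here can be sketched as follows. This is Python rather than PHP, and the helper only borrows the proposed name; repr() stands in for "existing defaults", which is an assumption, since PHP's own float-to-string rules differ in detail.

```python
from decimal import Decimal
from typing import Optional

def float_to_decimal(value: float, digits: Optional[int] = None) -> Decimal:
    """Hypothetical model of the proposed function, not a PHP API."""
    # Step 1: float -> string. Without `digits`, repr() gives the shortest
    # string that round-trips to the same float; with `digits`, the value
    # is rounded to that many fractional digits.
    text = repr(value) if digits is None else f"{value:.{digits}f}"
    # Step 2: string -> decimal, which is exact.
    return Decimal(text)

# Converting the binary value directly exposes the stored approximation.
print(Decimal(0.1))
# 0.1000000000000000055511151231257827021181583404541015625

print(float_to_decimal(0.1))        # 0.1
print(float_to_decimal(2.675, 2))   # 2.67 (the float is slightly below 2.675)
```

The last line shows why the rounding step matters: the result depends on the binary value actually stored, not on the literal the programmer typed.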

Explicit type cast "(decimal) float" is the same as using
float_to_decimal with defaults.

Implicit conversion from float to decimal with type juggling works the
same as "float_to_decimal" with defaults but throws a warning to
encourage users to use explicit conversion instead: "Implicit float to
decimal conversion may cause unexpected results due to possible
rounding errors".

With strict types, implicit conversion is not possible and generates a
TypeError.

Literal numbers in the code are compiled to floats by default. If a
literal is preceded by the "(decimal)" typecast or carries the "d"
modifier (0.2d), the compiler produces the decimal directly, without an
intermediary float or string.

A new declare directive "default_decimal" is added. When used, literals
and math operations return decimal by default instead of float. This
simplifies writing source files that work with decimals only. With
default_decimal, fractional literals are compiled into
decimals.

Without default_decimal:
$var = 5 / 2; // returns float 2.5
$a = 3.02; // compiled to float

With default_decimal:
$var = 5 / 2; // returns decimal 2.5
$a = 3.02; // compiled to decimal

I understand that this point is controversial, so it can be postponed
and decided later.

The (decimal) typecast applied to a math operation produces a decimal
result without intermediary float by converting all operands to
decimals before calculating:

$var = 5 / 2; // returns float 2.5
$var = (decimal)(5 / 2); // returns decimal 2.5
$a = 5;
$b = 2;
$var = (decimal)($a / $b); // returns decimal 2.5
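
The reason for converting operands before calculating, rather than casting the finished result, shows up with values that binary floats cannot represent exactly. A small sketch in Python (used here only because the proposed syntax does not exist in PHP; the variable names are illustrative):

```python
from decimal import Decimal

# Cast applied to the result: the addition happens in binary floating
# point first, so the decimal inherits the float's rounding error.
late = Decimal(repr(0.1 + 0.2))

# Cast applied to the operands: each literal becomes a decimal with no
# intermediary float, and the addition itself is done in decimal.
early = Decimal("0.1") + Decimal("0.2")

print(late)    # 0.30000000000000004
print(early)   # 0.3
```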

If any operand of a math operation is decimal, the result is
decimal. Floats are converted to decimals using the implicit conversion
rules mentioned above:

Without strict types:
$f = (float) 0.2;
$d = (decimal) 0.2;
$r = $f + $d; // produces a warning about implicit conversion, returns decimal
$r = (decimal)$f + $d; // works with no warnings

With strict types:
$f = (float) 0.2;
$d = (decimal) 0.2;
$r = $f + $d; // produces TypeError
$r = (decimal)$f + $d; // works with no warnings
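
These promotion rules can be modelled in a few lines. The sketch below is Python, with add() and its strict_types flag as hypothetical stand-ins for the proposed operator behaviour with and without declare(strict_types=1); the warning text is the one suggested above.

```python
import warnings
from decimal import Decimal

def add(x, y, strict_types=False):
    """Model of the proposed '+' for int, float and decimal operands."""
    operands = [x, y]
    if not any(isinstance(v, Decimal) for v in operands):
        return x + y  # no decimal involved: ordinary arithmetic
    converted = []
    for v in operands:
        if isinstance(v, float):
            if strict_types:
                raise TypeError("implicit float to decimal conversion")
            warnings.warn("Implicit float to decimal conversion may cause "
                          "unexpected results due to possible rounding errors")
            v = Decimal(repr(v))  # float -> string -> decimal
        elif isinstance(v, int):
            v = Decimal(v)        # integers convert exactly
        converted.append(v)
    return converted[0] + converted[1]

d = Decimal("0.2")
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    r = add(0.2, d)

print(r, len(caught))   # 0.4 1
```

Calling add(0.2, d, strict_types=True) raises TypeError instead of warning. Python's own Decimal type happens to behave like the strict case: adding a float to a Decimal raises TypeError.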

All builtin functions that currently accept float also accept decimal.
So users don't need to care about separate function sets, and PHP
developers don't need to maintain separate sets of functions. If any
of the parameters is decimal, they return decimal. Float parameters
are converted to decimals implicitly according to the conversion rules
mentioned above.

I hope this version of the design is closer to being accepted by the
community. Please share your thoughts.

--
Best,
Alexander

--
Best regards,
Alex Pravdin
Interico

On Wed, Apr 10, 2024 at 12:55 AM Alexander Pravdin
<alex.pravdin@interi.co> wrote:

> If PHP core experts think that 128-bit decimal will cover the vast
> majority of cases and is worth implementing, I'm totally for it.

On Sat, Apr 27, 2024 at 11:04 PM Alexander Pravdin <alex.pravdin@interi.co> wrote:

> All builtin functions that currently accept float also accept decimal.
> So users don’t need to care about separate function sets, and PHP
> developers don’t need to maintain separate sets of functions. If any
> of the parameters is decimal, they return decimal. Float parameters
> are converted to decimals implicitly according to the conversion rules
> mentioned above.

So, as I mentioned months ago, this is the reason that, having actually looked into implementing things like this, I was interested in using a library. Proposing this is fine. But doing a fully custom implementation that includes this? You’re going to implement sin and cos and atan for 128-bit decimals? When we could instead use an open source library that has a compatible license and is already proven to work for these, likely with better performance as well?

This is likely to be more work than doing a type backed by a library while also being less capable.

I know that your shift in proposal here is not aimed at me, and also I’m not a voter so in that sense it doesn’t matter. But if this is what the proposal ends up being, I’ll probably just continue my research into an actual arbitrary precision implementation based on MPFR instead of helping with this implementation.

Jordan

On 28 April 2024 07:02:22 BST, Alexander Pravdin <alex.pravdin@interi.co> wrote:

> Hello everyone. To continue the discussion, I'm suggesting an updated
> version of my proposal.

This all sounds very useful ... but it also sounds like several months of full-time expert development.

Before you begin, I think it will be really important to define clearly what use cases you are trying to cater for, and who your audience is. Only then can you define a minimum set of requirements and goals.

It seems to me that the starting point would be an extension with a decimal type as an object, and implementations for all the operations you want to support. You'll probably want to define that more clearly than "anything in the language which takes a float".

What might seem like it would be the next step is converting the object to a "native type", by adding a new case to the zval struct. Not only would this require a large amount of work to start with, it would have an ongoing impact on everyone working with the internals.

I think a lot of the benefits could actually be delivered without it, and as separate projects:

- Optimising the memory performance of the type, using copy-on-write semantics rather than eager cloning. See Gina's recent thread about "data classes".

- Overloading existing functions which accept floats with decimal implementations. Could potentially be done in a similar way to operator overloads and special interfaces like Countable.

- Convenient syntax for creating decimal values, such as 0.2d, declare(default_decimal), or having (decimal) casts affecting the tree of operations below them rather than just the result. This just needs the type to be available to the compiler, not a new zval type - for instance, anonymous function syntax creates a Closure object.

There may be other parts I've not mentioned, but hopefully this illustrates the idea that "a native decimal type" doesn't have to be one all-or-nothing project.
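
As a sketch of the "decimal type as an object" starting point, here is an immutable value object with overloaded operators. It is Python, not a PHP extension; the class name Dec and its of() constructor are hypothetical, and the decimal128 context parameters are the only part taken from the standard.

```python
from dataclasses import dataclass
from decimal import Context, Decimal

# Fixed decimal128-style context shared by every operation.
_CTX = Context(prec=34, Emin=-6143, Emax=6144)

@dataclass(frozen=True)
class Dec:
    """Immutable decimal value object (illustrative only)."""
    value: Decimal

    @classmethod
    def of(cls, text: str) -> "Dec":
        # Created from text, so no intermediary float is involved.
        return cls(_CTX.create_decimal(text))

    def __add__(self, other: "Dec") -> "Dec":
        return Dec(_CTX.add(self.value, other.value))

    def __mul__(self, other: "Dec") -> "Dec":
        return Dec(_CTX.multiply(self.value, other.value))

    def __str__(self) -> str:
        return str(self.value)

price = Dec.of("19.99")
total = price * Dec.of("3")
print(total)   # 59.97
```

Because instances are frozen, they can be shared freely without cloning, which is the kind of behaviour the copy-on-write point is after; syntax such as 0.2d would then only need the compiler to emit Dec.of("0.2").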

Regards,
Rowan Tommins
[IMSoP]

On Sun, Apr 28, 2024 at 11:36 AM Rowan Tommins [IMSoP]
<imsop.php@rwec.co.uk> wrote:

> It seems to me that the starting point would be an extension with a decimal type as an object, and implementations for all the operations you want to support.

I'm not so sure this could be implemented as an extension; there just
aren't the right hooks for it.

Robert Landers
Software Engineer
Utrecht NL

On 28 April 2024 07:47:40 GMT-07:00, Robert Landers <landers.robert@gmail.com> wrote:

> I'm not so sure this could be implemented as an extension, there just
> isn't the right hooks for it.

The whole point of my email was that "this" is not one single feature, but a whole series of them. Some of them can be implemented as an extension right now; some could be implemented as an extension by adding more hooks which would also be useful for other extensions; some would need changes to the core of the language.

If the aim is "everything you could possibly want in a decimal type", it certainly can't be an extension; if the aim is "better support for decimals", then it possibly can.

Regards,
Rowan Tommins
[IMSoP]

I think setting some expectations in the proper context is warranted here.

  1. Would a native decimal type be good for the language? I would say we are probably not going to find many people, if any, who would be against it.
  2. Is there a need for it? Well, the whole world of e-commerce, accounting and all kinds of business systems that deal with money in the PHP world does not leave any room for doubt - https://packagist.org/?query=money . The use case is right there :)
  3. In most cases people need decimal precision math within the bounds of what the decimal128 standard provides. Most of us just do not want the float drift, and while a signed 64-bit integer is big, it has its limitations and needs transformations at the edges, be it presentation, API endpoints, or storage in a database or other medium.
  4. Is it a lot of engine work? Yes, yes it is. Is it worth it? I think yes, especially if we get buy-in from most of the active maintainers and get a project going for it. This is not going to be the first or last big engine project. But this might warrant a PHP 9 release in the end :)
  5. But BCMath/GMP/etc!!! Well, extensions are optional. They are also not as fast, and they deal with strings. You have to rely on the extension's own methods for most math functions. Frankly, I have never had a use case where BCMath was not overkill. And number crunching with BCMath is just slow because the values are strings internally. The use cases are just different: they solve a different problem than a decimal128 would for the language.

I think all the discussions on the subject have shown that the BCMath RFC is its own thing and adding a decimal type to the PHP language/engine is its own thing. They are not mutually exclusive and solve different problems.
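
The float drift and the integer workaround from point 3 can be made concrete with a short sketch. It uses Python's float, int and decimal types as stand-ins for the PHP equivalents, and the amounts are made up for illustration.

```python
from decimal import Decimal

# Float drift: ten payments of 0.10 do not add up to exactly 1.0.
float_total = sum([0.1] * 10)
print(float_total == 1.0)     # False

# Integer minor units avoid the drift but need a conversion at every
# edge (presentation, API payloads, storage).
cents_total = sum([10] * 10)
print(f"{cents_total // 100}.{cents_total % 100:02d}")   # 1.00

# A decimal type keeps the value exact and needs no edge conversion.
decimal_total = sum([Decimal("0.10")] * 10)
print(decimal_total)          # 1.00
```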


Arvīds Godjuks
+371 26 851 664
arvids.godjuks@gmail.com
Telegram: @psihius https://t.me/psihius

On 30 April 2024 11:16:20 GMT-07:00, Arvids Godjuks <arvids.godjuks@gmail.com> wrote:

> I think setting some expectations in the proper context is warranted here.
>
> 1. Would a native decimal type be good for the language? I would say we
> probably are not going to find many if any people who would be against it.

As I said earlier, I don't think that's the right question, because "adding a native type" isn't a defined process. Better questions are: Should a decimal type be always available? Does a decimal type need special features to maximise performance? Should we have special syntax for a decimal type? What functions should support a decimal type, or have versions which do?

> 2. Is there a need for it? Well, the whole world of e-commerce, accounting
> and all kinds of business systems that deal with money in PHP world do not
> leave any room for doubt - https://packagist.org/?query=money . The use
> case is right there :)

That's a great example - would a decimal type make those libraries redundant? Probably not - they provide currency and rounding facilities beyond basic maths. Would those libraries benefit from an always-available, high-performance native type? Certainly.

Would they benefit from it having strong integration into the syntax and standard library of the language? Not really; there's a small amount of actual code dealing with the values.

> 4. Is it a lot of engine work?

Only if we go for the maximum ambition, highly integrated into the language.

> Is it worth it?

I'm actually not convinced.

> 5. But BCMath/GMP/etc!!! Well, extensions are optional.

Extensions are only optional if we decide they are. ext/json used to be optional, but now it's always-on.

> They are also not as fast and they deal with strings.

Not as fast as what? If someone wants to make an extension around a faster library, they can. And only BCMath acts directly on strings; other libraries use text input to create a value in memory - whether that's a PHP string or a literal provided by the compiler doesn't make much difference.

I absolutely think there are use cases for decimal types and functions; but "I want a faster implementation" and "I want to add a new fundamental type to the language, affecting every corner of the engine" are very different things.

Regards,
Rowan Tommins
[IMSoP]