[PHP-DEV] Native decimal scalar support and object types in BcMath - do we want both?

Hello,

There are currently two proposals being discussed: native decimal scalar type support, and "Support object type in BCMath".

I’ve been getting involved in the discussion for the BCMath proposal, but not paying as much attention to the native decimal thread.

But these seem like very similar things, so I’m wondering whether or not it makes sense to do both at once. They both seem like ways to represent and calculate with arbitrary precision decimal numbers.

I’m not sure if they have distinct use cases. Are there some tasks where people would likely prefer one, and different tasks where they would prefer the other? Or should PHP internals choose just one of these options instead of potentially releasing both? It doesn’t seem like a good idea to have two directly competing features for the same use case in one PHP release, unless there’s a reason to favor each one in a different situation.

Best wishes,

Barney

Hi Barney,

(I’m the proposer on the BCMath thread, so my opinion may be a bit biased.)

The “areas” being discussed are certainly close. However, I believe that the goals of the proposals and the time and effort required to realize them will vary greatly.

Regards.

Saki

On Sat, Apr 6, 2024 at 4:07 AM Barney Laurance <barney@redmagic.org.uk> wrote:

The scalar arbitrary-precision discussion is for an implementation that would be in the range of 100x to 1000x faster than BCMath. No matter what improvements are made to BCMath, there will still be strong arguments for that faster native implementation; but until someone actually puts together an RFC for it, the BCMath library is the only thing around.

Internals is just volunteers. The people working on BCMath are doing that because they want to, the people working on scalar decimal stuff are doing that because they want to, and there’s no project planning to tell one group to stop. That’s not how internals works (to the extent it works).

Jordan

On 07/04/2024 11:07, Rowan Tommins [IMSoP] wrote:

On 7 April 2024 01:32:29 BST, Jordan LeDoux<jordan.ledoux@gmail.com> wrote:

Internals is just volunteers. The people working on BCMath are doing that
because they want to, the people working on scalar decimal stuff are doing
that because they want to, and there's no project planning to tell one
group to stop. That's not how internals works (to the extent it works).

I kind of disagree. You're absolutely right that the detailed effort is almost always put in by people working on things that interest them, and I want to make clear up front that I'm extremely grateful for the amount of effort people do volunteer, given how few are paid to work on any of this.

However, the goal of the Internals community as a whole is to choose what changes to make to a language which is used by millions of people. That absolutely involves project planning, because there isn't a marketplace of PHP forks with different competing features, and once a feature is added it's very hard to remove it or change its design.

If - and I stress I'm not saying this is true - IF these two features have such an overlap that we would only want to release one, then we shouldn't just accept whichever is ready first, we should choose which is the better solution overall. And if that was the case, why would we wait for a polished implementation of both, then tell one group of volunteers that all their hard work had been a waste of time?

So I think the question is very valid: do these two features have distinct use cases, such that even if we had one, we would still want to spend time on the other? Or, should we decide a strategy for both groups to work together towards a single goal?

That's not about "telling one group to stop", it's about working together for the benefit of both users and the people volunteering their effort, to whom I am extremely grateful.

Yes, I was going to say the same thing as Rowan. But also, Jordan has shown that there's at least one advantage to each proposal: one would be much more performant, and one might be releasable a lot sooner. That's a possible reason to keep both.

Hi Rowan,

I don't think the two threads can be combined, because they have different goals. If one side of the discussion were "how about adding BCMath?", then perhaps we should merge them. But BCMath already exists, and the agenda is to add an OOP API.

In other words, one is about adding new features, and the other is about improving existing features.

I agree that it would be wise to merge issues that can be merged.

Regards.

Saki

On 7 April 2024 11:44:22 BST, Saki Takamachi <saki@sakiot.com> wrote:

While I appreciate that that was the original aim, a lot of the discussion at the moment isn't really about BCMath at all, it's about how to define a fixed-precision number type. For instance, how to specify precision and rounding for operations like division. I haven't seen anywhere in the discussion where the answer was "that's how it already works, and we're not adding new features".

Is there anything in the proposal which would actually be different if it was based on a different library, and if not, should we be designing a NumberInterface which multiple extensions could implement? Then Jordan's search for a library with better performance could lead to new extensions implementing that interface, even if they have portability or licensing problems that make them awkward to bundle in core.

Finally, there's the separate discussion about making a new "scalar type". As I said in a previous email, I'm not really sure what "scalar" means in this context, so maybe "integrating the type more directly into the language" is a better description? That includes memory/copying optimisation (potentially linked to Ilija's work on data classes), initialisation syntax (which could be a general feature), and accepting the type in existing functions (something frequently requested for custom array-like types).

In other words, looking at how the efforts overlap doesn't have to mean abandoning one of them, it can mean finding how one can benefit the other.

Regards,
Rowan Tommins
[IMSoP]

Hi Rowan,

While I appreciate that that was the original aim, a lot of the discussion at the moment isn't really about BCMath at all, it's about how to define a fixed-precision number type. For instance, how to specify precision and rounding for operations like division. I haven't seen anywhere in the discussion where the answer was "that's how it already works, and we're not adding new features".

Is there anything in the proposal which would actually be different if it was based on a different library, and if not, should we be designing a NumberInterface which multiple extensions could implement? Then Jordan's search for a library with better performance could lead to new extensions implementing that interface, even if they have portability or licensing problems that make them awkward to bundle in core.

Finally, there's the separate discussion about making a new "scalar type". As I said in a previous email, I'm not really sure what "scalar" means in this context, so maybe "integrating the type more directly into the language" is a better description? That includes memory/copying optimisation (potentially linked to Ilija's work on data classes), initialisation syntax (which could be a general feature), and accepting the type in existing functions (something frequently requested for custom array-like types).

In other words, looking at how the efforts overlap doesn't have to mean abandoning one of them, it can mean finding how one can benefit the other.

I agree that the essence of the debate is as you say.
However, a discussion must reach a conclusion based on its purpose, and combining two discussions with different purposes can make it unclear how to reach one.

If we were to merge these two debates, what should be on our agenda? It would probably be reasonable to have a limited joint discussion on the common point between the two arguments, namely, "how to express numbers," and then return to each of their own arguments.

However, it is not desirable for the venue for discussion to change depending on the content of the discussion, so I think it will be difficult to integrate them.

Your hope is probably that by combining the discussions, better ideas will emerge. IMHO, that should really be a "new discussion", perhaps in this thread where we're talking now.

Regards.

Saki

On 7 April 2024 15:38:04 BST, Saki Takamachi <saki@sakiot.com> wrote:

I agree that the essence of the debate is as you say.
However, an argument must always reach a conclusion based on its purpose, and combining two arguments with different purposes can make it unclear how to reach a conclusion.

Well, that's the original question: are they actually different purposes, from the point of view of a user?

I just gave a concrete suggestion, which didn't involve "combining two arguments", it involved splitting them up into three projects which all complement each other.

It feels like both you and Jordan feel the need to defend the work you've put in so far, which is a shame; as a neutral party, I want to benefit from *both* of your efforts. It really doesn't matter to me how many mailing list threads that requires, as long as there aren't two teams making conflicting designs for the same feature.

Regards,
Rowan Tommins
[IMSoP]

On 2024-04-08 12:17, Arvids Godjuks wrote:

Why not have decimal be represented as 2 64-bit ints at the engine level

Just to clarify this point, what's the formula to convert back and forth between a decimal and two integers? Are you thinking of something like scientific notation: decimal = coefficient * 10^exponent?

64 bits for the exponent seems excessive, and it might be nice to have more for the coefficient but maybe that doesn't matter too much.
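For what it's worth, the coefficient/exponent scheme described above can be sketched in userland PHP without ever touching floats. (This is purely illustrative; the helper name is made up, not part of either proposal.)

```php
<?php
// Sketch of the assumed scheme: value = coefficient * 10^exponent,
// with a negative exponent for values that have a fractional part.
function decimalToString(int $coefficient, int $exponent): string
{
    if ($exponent >= 0) {
        return $coefficient . str_repeat('0', $exponent);
    }
    $scale  = -$exponent;
    $digits = (string) abs($coefficient);
    // Ensure at least one digit to the left of the decimal point.
    $digits = str_pad($digits, $scale + 1, '0', STR_PAD_LEFT);
    $int  = substr($digits, 0, -$scale);
    $frac = substr($digits, -$scale);
    return ($coefficient < 0 ? '-' : '') . $int . '.' . $frac;
}

echo decimalToString(12345, -2), "\n"; // 123.45
echo decimalToString(-5, -2), "\n";    // -0.05
```

This also illustrates Barney's point: in practice the exponent stays tiny (it is just the scale), so 64 bits for it is far more than needed.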

Hello everyone. I’ve been following the discussion threads and forming my own opinion to share, since I have done a bunch of financial stuff throughout my career: I used integers only at the application level and DECIMAL(20,8) in the database, due to handling Bitcoin, Litecoin, etc.

My feeling is that the discussion got seriously sidetracked from the core tenet of what is actually needed from the PHP engine/core into developing/upgrading a library that can handle money, scientific calculations, yada yada yada. Basically, in my view, the discussion has been catastrophically scope-creeped to the point where nobody could agree on anything, and it drifted into things that were irrelevant to the initial scope.

To me, the BCMath library stuff is just that - a BCMath library. It’s a tool that can handle any size number. It’s a specialized tool. And, frankly, for the vast majority of use cases, it’s complete overkill.

If we are talking about implementing a decimal type into the language as a first-class citizen alongside int/float/etc., do we really need it to handle numbers outside a fixed-size space? Ints have a size limit (64 bits); floats also have a defined range.

Why not have decimal be represented as two 64-bit ints at the engine level, and, similarly to floating-point numbers, have a php.ini setting where you can define the precision you want? Floats have a default of 14 positions. Set the decimal default to the same 14 positions, and you get a decimal type with roughly 80 bits for the integer part and 48 bits (enough for 14 decimal digits) for the fractional part. In the vast majority of cases, for all practical intents and purposes, that would be enough. For the rest, you have the ext/decimal, BCMath, and GMP extensions (and by all means, improve those as much as needed; make them as powerful as Python’s math libs).

This approach has some major benefits: if done right, it’s just another type that is compatible with float but does integer-precision math, and a precision of 14 is basically overkill for the vast majority of needs already. Ideally, you should be able to just replace your float and int type hints with decimal and do the rounding/formatting via the usual round/ceil/floor/number_format. Normal math just works: if any part of an expression has a decimal type, the result is a decimal.

The only sticking point I see is how to define a decimal-typed variable, since we do not have var/let/const; we can only declare types on class properties and class constants. Do we add a function decimal(int|float $value): decimal? Or do we need prep work to be able to declare variables with types? Another idea I have is to just do $decimal = (decimal) 10.054 when instantiating a variable. Actually, that’s not that uncommon already when you want to ensure the result is of a certain type; the PSL library does a lot of that, and I quite like it.

Long story short, give people a tool that’s simple and works; things like scale and all that we can handle in userland code ourselves, because everyone has different needs, different scales, and so on. It’s the same as right now with integers: if you require an integer bigger than 64 bits, you use GMP/BCMath/etc. You’re also not going to have fun with databases and PDO, because there are going to be some shenanigans there too. Basically, at that point you are running up against various other PHP engine limitations, and software has to be written with those considerations in mind in literally any language anyway. Some are easier than others.
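As an illustration of handling scale in userland with today's tools: BCMath (requires ext-bcmath) already takes the scale per call, so each application picks whatever suits it. A quick sketch, with arbitrary scale values:

```php
<?php
// Money-ish: 2 decimal places is enough.
$price    = '19.99';
$quantity = '3';
$total = bcmul($price, $quantity, 2);

// More precision where needed: 8 decimal places.
$third = bcdiv('1', '3', 8);

echo $total, "\n"; // 59.97
echo $third, "\n"; // 0.33333333
```

The point being: scale policy does not have to live in the engine; the engine just needs a type that does exact decimal arithmetic.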

Sorry for it being a bit long, I’m happy to clarify/expand on any parts you have questions about.

Arvīds Godjuks
+371 26 851 664
arvids.godjuks@gmail.com
Telegram: @psihius https://t.me/psihius

I was thinking of no exponents, just a straightforward integer representation for the fractional part, 14 digits long (48 bits). Taking two 64-bit numbers and combining them into a single 128-bit value would give us a range of “-604,462,909,807,314,587,353,088” to “604,462,909,807,314,587,353,087” for the integer part (80 bits) and “281,474,976,710,655” for the unsigned integer for fractions (48 bits). With this, we can achieve 14 digits of precision without any problem. I would say these numbers are sufficiently large to realistically cover most scenarios that the vast majority of us, PHP developers, will ever encounter. For everything else, extensions that handle arbitrary numbers exist. :)
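A quick sanity check on those ranges (PHP native ints are 64-bit, so the 48-bit fraction bound can be verified directly; the 80-bit bound is just 2^79, computed here with BCMath since it overflows a native int):

```php
<?php
// 48 bits comfortably hold 14 decimal digits:
$maxFraction = 2 ** 48 - 1;   // 281474976710655
$max14Digits = 10 ** 14 - 1;  //  99999999999999
var_dump($maxFraction >= $max14Digits); // bool(true)

// The signed 80-bit integer-part bound quoted above is 2^79
// (needs exact big-number arithmetic, so BCMath rather than float):
echo bcpow('2', '79'), "\n"; // 604462909807314587353088
```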

The ini setting I was considering would function similarly to what it does for floats right now - I assume it changes the exponent, thereby increasing their precision but reducing the integer range they can cover. The same adjustment could be applied to decimals if people really need to tweak those ini settings (I’ve never seen anyone change that from the default in 20 years, but hey, I’m sure someone out there does and needs it).

On Mon, 8 Apr 2024, at 13:42, Arvids Godjuks wrote:

The ini setting I was considering would function similarly to what it does for floats right now - I assume it changes the exponent, thereby increasing their precision but reducing the integer range they can cover.

If you’re thinking of the “precision” setting, it doesn’t do anything nearly that clever; it’s purely about how many decimal digits should be displayed when converting a binary float value to a decimal string. In recent versions of PHP, it has a “-1” setting that automatically does the right thing in most cases. https://www.php.net/manual/en/ini.core.php#ini.precision

The other way around - parsing a string to a float, including when compiling source code - has a lot of different compile-time options, presumably to optimise on different platforms; but no user options at all: https://github.com/php/php-src/blob/master/Zend/zend_strtod.c
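The display-only nature of the setting is easy to demonstrate; the stored binary value never changes, only the string conversion does:

```php
<?php
$x = 1 / 3; // one binary double, stored once

ini_set('precision', '3');
echo $x, "\n";  // 0.333

ini_set('precision', '-1');
echo $x, "\n";  // 0.3333333333333333 (shortest round-trippable form)
```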

Regards,


Rowan Tommins
[IMSoP]

On Mon, Apr 8, 2024, 16:40 Rowan Tommins [IMSoP] <imsop.php@rwec.co.uk> wrote:

Thanks for the info. Then we just specify the value range for the decimal the same way it’s done for integer and float and let developers decide if it fits their needs or they need to use BCMath/Decimal/GMP extensions.

Develop for the common use case for the core, let extensions take the burden of the rest.

On 07/04/2024 23:50, Jordan LeDoux wrote:

By a "scalar" value I mean a value that has the same semantics for reading, writing, copying, passing-by-value, passing-by-reference, and passing-by-pointer (how objects behave) as the integer, float, or boolean types.

Right, in that case it might be more accurate to talk about "value types", since arrays are not generally considered "scalar" but have those same behaviours. And Ilija recently posted a draft proposal for "data classes", which would be objects, but also value types (see the "[RFC][Concept] Data classes (a.k.a. structs)" thread).

As I mentioned in the discussion about a "scalar arbitrary precision type", the idea of a scalar in this meaning is a non-trivial challenge, as the zval can only store a value that is treated in this way of 64 bits or smaller.

Fortunately, that's not true. If you think about it, that would rule out not only arrays, but any string longer than 8 bytes!

The way PHP handles this is called "copy-on-write" (COW), where multiple variables can point to the same zval until one of them needs to write to it, at which point a copy is transparently created.

The pointer for this value would fit in the 64 bits, which is how objects work, but that's also why objects have different semantics for scope than integers. Objects are potentially very large in memory, so we refcount them and pass the pointer into child scopes, instead of copying the value like is done with integers.

Objects are not the only thing that is refcounted. In fact, in PHP 4.x and 5.x, *every* zval used a refcount and COW approach; changing some types to be eagerly copied instead was one of the major performance improvements in the "PHP NG" project which formed the basis of PHP 7.0. You can actually see this in action here: https://3v4l.org/oPgr4
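A rough way to observe COW from userland is to watch memory usage around an array assignment (the exact byte counts vary by PHP version, but the shape is always the same):

```php
<?php
$a = range(1, 100000);

$before = memory_get_usage();
$b = $a;                     // assignment alone: refcount++, no copy
$afterAssign = memory_get_usage();

$b[0] = -1;                  // first write: now the array is duplicated
$afterWrite = memory_get_usage();

var_dump($afterAssign - $before); // essentially zero
var_dump($afterWrite - $before);  // roughly the size of the whole array
```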

This is all completely transparent to the user, as are a bunch of other memory/speed optimisations, like interned string literals, packed arrays, etc.

So, there may be performance gains if we can squeeze values into the zval memory, but it doesn't need to affect the semantics of the new type.

In general I would say that libbcmath is different enough from other backends that we should not expect any work on a BCMath implementation to be utilized in other implementations. It *could* be that we are able to do that, but it should not be something people *expect* to happen because of the technical differences.

Some of the broader language design choices would be transferable though. For instance, the standard names of various calculation functions/methods are something that would remain independent, even with the differences in the implementation.

Yes, that makes sense. Even if we don't have an interface, it would be annoying if one class provided $foo->div($bar) and another provided $foo->dividedBy($bar).

For money calculations, scale is always likely to be a more useful configuration. For mathematical calculations (such as machine learning applications, which I would say is the other very large use case for this kind of capability), precision is likely to be the more useful configuration. Other applications that I have personally encountered include: simulation and modeling, statistical distributions, and data analysis. Most of these can be done with fair accuracy without arbitrary precision, but there are certainly types of applications that would benefit from or even require arbitrary precision in these spaces.

This probably relates quite closely to Arvid's point that for a lot of uses, we don't actually need arbitrary precision, just something that can represent small-to-medium decimal numbers without the inaccuracies of binary floating point. That some libraries can be used for both purposes is not necessarily evidence that we could ever "bless" one for both use cases and make it a single native type.

My intuition at the moment is that a single number-handling API would be challenging to do without an actual proposed implementation on the table for MPDec/MPFR.

I think it would certainly be wise to experiment with how each library can interface to the language as an extension, before spending the extra time needed to integrate it as a new zval type.

But even with these extensions available in PHP, they are barely used by developers at all, at least in part because of the enormous difference between PECL and pip. For PHP, I do not think that extensions are an adequate substitute the way pip-installable modules are for Python.

Yes, this is something of a problem. On the plus side, a library doesn't need to be incorporated into the language to be widely installed, because we have the concept of "bundled" extensions; and in practice, Linux distributions add a few "popular" PECL extensions to their list of installable binary packages. On the minus side, even making it into the "bundled" list doesn't mean it's installed by default everywhere, and userland libraries spend a lot of effort polyfilling things which would ideally be available by default.

This is, essentially, the thesis of the research and work that I have done in the space since joining the internals mailing list.

Thanks, there's some really useful perspective there.

Regards,

--
Rowan Tommins
[IMSoP]

On Mon, Apr 8, 2024 at 12:23 PM Rowan Tommins [IMSoP] <imsop.php@rwec.co.uk> wrote:

I have mentioned before that my understanding of the deeper aspects of how zvals work is very lacking compared to some others, so this is very helpful. I was of course aware that strings and arrays can be larger than 64 bits, but was under the impression that the hashtable structure was in part responsible for those being somewhat different. I confess that I do not understand the technical intricacies of interned strings and packed arrays; I just understood that the zval structure for these arbitrary-precision values would probably be non-trivial, and from what I was able to research, that seemed in part related to the 64-bit zval limit. But thank you for the clarity and the added detail; it’s always good to learn places where you are mistaken, and this is all extremely helpful to know.

This probably relates quite closely to Arvid’s point that for a lot of uses, we don’t actually need arbitrary precision, just something that can represent small-to-medium decimal numbers without the inaccuracies of binary floating point. That some libraries can be used for both purposes is not necessarily evidence that we could ever “bless” one for both use cases and make it a single native type.

Honestly, if you need a scale of less than about 15 and simply want FP error free decimals, BCMath is perfectly adequate for that in most of the use cases I described. The larger issue for a lot of these applications is not that they need to calculate 50 digits of accuracy and BCMath is too slow, it’s that they need non-arithmetic operations, such as sin(), cos(), exp(), vector multiplication, dot products, etc., while maintaining that low to medium decimal accuracy. libbcmath just doesn’t support those things, and creating your own implementation of say the sin() function that maintains arbitrary precision is… challenging. It compounds the performance deficiencies of BCMath exponentially, as you have to break it into many different arithmetic operations.
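To make that concrete, here is a naive Taylor-series sin() on top of BCMath (my own sketch, only reasonable for |x| < ~2). Every term costs several bcmul/bcdiv calls, which is exactly the compounding of arithmetic operations described above:

```php
<?php
// sin(x) = x - x^3/3! + x^5/5! - ... computed term by term with BCMath.
function bc_sin(string $x, int $scale): string
{
    $guard  = $scale + 4;            // extra digits against truncation drift
    $xx     = bcmul($x, $x, $guard);
    $term   = $x;
    $result = '0';
    $sign   = 1;

    for ($n = 1; $n <= 19; $n += 2) {
        $result = ($sign > 0)
            ? bcadd($result, $term, $guard)
            : bcsub($result, $term, $guard);
        // next term = term * x^2 / ((n+1)(n+2))
        $term = bcdiv(bcmul($term, $xx, $guard),
                      (string) (($n + 1) * ($n + 2)), $guard);
        $sign = -$sign;
    }
    return bcadd($result, '0', $scale); // drop the guard digits
}

echo bc_sin('1', 10), "\n"; // 0.8414709848
```

Ten terms, each requiring a multiply and a divide at full scale, just for one sin() call; a C library like MPFR does this internally, correctly rounded, in a fraction of the time.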

To me, while being 100x to 1000x more performant at arithmetic is certainly reason enough on its own, the fact that MPFR (for example) has C implementations for more complex operations that can be utilized is the real selling point. The ext-stats extension hasn’t been maintained since 7.4. And trig is critical for a lot of stats functions. A fairly common use of stats, even in applications you might not expect it, is to generate a Gaussian Random Number. That is, generate a random number where if you continued generating random numbers from the same generator, they would form a normal distribution (a bell curve), so the random number is weighted according to the distribution.

The simplest way to do that is with the sin() and cos() functions (picking a point on a circle). But a lot of really useful such mathematics are mainly provided by libraries that ALSO provide arbitrary precision. So for instance, the Gamma Function is another very common function in statistics. To me, implementing a bundled or core type that utilizes MPFR (or something similar) is as much about getting access to THESE mathematical functions as it is the arbitrary precision aspect.
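For the curious, the sin()/cos() approach alluded to above is the Box-Muller transform; a minimal float-based sketch (mine, not from the thread) looks like this:

```php
<?php
// Illustrative sketch (not from the thread): the Box-Muller transform,
// which turns two uniform random numbers into a normally distributed
// one by picking a point on a circle, as described above.
function gaussian_random(float $mean = 0.0, float $stdDev = 1.0): float
{
    // Two independent uniforms in (0, 1]; the lower bound of 1 keeps
    // $u1 away from 0 so that log($u1) is defined.
    $u1 = mt_rand(1, mt_getrandmax()) / mt_getrandmax();
    $u2 = mt_rand(1, mt_getrandmax()) / mt_getrandmax();

    // Rayleigh-distributed radius times a point on the unit circle.
    $z = sqrt(-2.0 * log($u1)) * cos(2.0 * M_PI * $u2);

    return $mean + $stdDev * $z;
}
```

Using sin() alongside cos() yields a second independent sample for free; doing the same thing at arbitrary precision is exactly where a library like MPFR earns its keep.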

Jordan

Hi Jordan,

To me, while being 100x to 1000x more performant at arithmetic is certainly reason enough on its own, the fact that MPFR (for example) has C implementations for more complex operations that can be utilized is the real selling point. The ext-stats extension hasn't been maintained since 7.4. And trig is critical for a lot of stats functions. A fairly common use of stats, even in applications you might not expect it, is to generate a Gaussian Random Number. That is, generate a random number where if you continued generating random numbers from the same generator, they would form a normal distribution (a bell curve), so the random number is weighted according to the distribution.

The simplest way to do that is with the sin() and cos() functions (picking a point on a circle). But a lot of really useful such mathematics are mainly provided by libraries that ALSO provide arbitrary precision. So for instance, the Gamma Function is another very common function in statistics. To me, implementing a bundled or core type that utilizes MPFR (or something similar) is as much about getting access to THESE mathematical functions as it is the arbitrary precision aspect.

As you say, BCMath is really barebones and slower than other libraries. It would be nice if there were a universal math extension that could handle all use cases, but unfortunately there isn't one today.

The biggest problem is right there: there are several math libraries, and they have slightly different characteristics.

If we could combine all of these to create a new math extension without sacrificing the benefits of each, do you think that would be possible? (It doesn't matter what libraries or technologies are used internally.)

To be honest, whenever I bring up the topic of BCMath on the mailing list, there are always references to speed and to other libraries, so many people probably want that, but unfortunately we probably don't share a common idea of the specifics.

If what I write is off-topic and not appropriate for this thread, I can start a new thread.

Regards.

Saki

On 8 April 2024 21:51:46 BST, Jordan LeDoux <jordan.ledoux@gmail.com> wrote:

I have mentioned before that my understanding of the deeper aspects of how zvals work is very lacking compared to some others, so this is very helpful.

My own knowledge definitely has gaps and errors, and comes mostly from introductions like https://www.phpinternalsbook.com/ and in this case Nikita's blog articles about the changes in 7.0: Internal value representation in PHP 7 - Part 1

I confess that I do not understand the technical intricacies of the interned strings and packed arrays, I just understand that the zval structure for these arbitrary precision values would probably be non-trivial, and from what I was able to research and determine that was in part related to the 64bit zval limit.

From previous discussions, I gather that the hardest part of implementing a new zval type is probably not the memory structure itself - that will mostly be handled in a few key functions and macros - but the sheer number of places that do something different with each zval type and will need updating. Searching for Z_TYPE_P, which is just one of the macros used for that purpose, shows over 200 lines to check: Z_TYPE_P (full) in projects: php-src - OpenGrok search results

That's why it's so much easier to wrap a new type in an object: all of those code paths are already handled for you, and you just have a fixed set of handlers to implement. If Ilija's "data classes" proposal progresses, you'll be able to have copy-on-write for free as well.

Regards,
Rowan Tommins
[IMSoP]

So I’d like to conclude this thread since we have dedicated threads for each of the topics here.

In my opinion, we should go with both. The two topics cover quite different things. Personally, I'm not really interested in the BCMath part, because I do not see myself needing it (I never did before, and don't force me into an industry where numbers that large are required). But I am interested in a native decimal that would cover the vast majority of uses and be on par with the integer and float number types.
If my stance makes sense, I shall join the native decimal thread and continue there.

···

Arvīds Godjuks
+371 26 851 664
arvids.godjuks@gmail.com
Telegram: @psihius https://t.me/psihius