Number prefixes

Poll standings

See below for voting instructions.
Systems
[ ] conforms to ANS Forth.
    SwiftForth, SwiftX (Leon Wagner)
    iForth (Marcel Hendrix)
    Gforth
    Win32Forth 6.xx, STC version 0.02
[ ] already implements the proposal in full since release [ ].
    Win32Forth 6.xx, STC version 0.02
    iForth 3.0
[ ] implements the proposal in full in a development version.
    Gforth
    Win32Forth
    4p (Helmar)
[ ] will implement the proposal in full in release [ ].
    4p 1.2
[ ] will implement the proposal in full in some future release.
    SwiftForth, SwiftX (Leon Wagner)
    kForth (will not support double-cell entry)
There are no plans to implement the proposal in full in [ ].
[ ] will never implement the proposal in full.
    4th (Hans Bezemer)
Programmers
[ ] I have used (parts of) this proposal in my programs:
    Leon Wagner
    George Hubert
    Marcel Hendrix
    Anton Ertl
    Alex McDonald
[ ] I would use (parts of) this proposal in my programs if the systems
    I am interested in implemented it:
    Leon Wagner
    Marcel Hendrix
[ ] I would use (parts of) this proposal in my programs if this
    proposal was in the Forth standard:
    Leon Wagner
    Marcel Hendrix
[ ] I would not use (parts of) this proposal in my programs.
    Hanz Bezemer
    Andrew Haley
Informal results
iForth currently does not support lower-case hex digits.

4p passes the core and core-ext testers, but is not yet 100% ANS Forth conformant.

Proposal

Change History

2007-08-14 CfV
  No change in the actual proposal
  Updated informal straw poll results
  Corrected information of the meaning of a " prefix in CHForth
  Added CfV stuff
2007-08-11 Second RfD
  No significant change in the actual proposal (still waiting on more
    feedback before deciding on whether to keep '').
  Existing Practice: extended with information about ASCII, CONTROL,
    and Forth Inc's '' naming convention, and occurence
    frequencies of character literals and (for comparison) some CORE
    words.
  Remarks: Preliminary informal straw poll results on '' (more
    feedback wanted).
  Comments: added Section and comments from Mitch Bradley and
    Elizabeth Rather.
  Proposal: Rewrote it in a way that makes it optional.  Fixed a
    buglet.

2007-08-03 First RfD

Problem

It is often useful to write numbers in other than the current BASE.
Changing BASE to do this is often inconvenient and has lead to
hard-to-find errors.


Solution

The Forth text interpreter should accept numbers like the following ones:

#-12346789. \ decimal double
$-1234cDeF. \ hex double
%-10101010. \ binary double
'a'         \ equivalent to char a or [char] a

(and of course without "." for single-cell and without - for positive
and unsigned numbers; the text interpreter only has to accept
double-cell numbers if the double-cell wordset is present).


Typical usage:

#99
#-99.
$-ff
$ff.
$FF
$-FF.
%-11
%11.
'a'


Existing Practice

Many Forth systems support various number prefixes already in the
interpreter:

#10 $10 %10 &10 0x10 0X10 10h 10H $f $F 'a 'a' input
 10  16   2   -    -    -   -   -  - 15  -  97 iforth 2.1.2541, CHForth 1.2.5, 1.3.9
 10  16   2  10    -    -   -   - 15 15 97 24871 bigFORTH rev. 2.1.6
  -  16   2  10   16   16   -   - 15 15  -   - PFE 0.33.34
  -  16   2  10    -    -   -   - 15 15 97 24871 Gforth-0.6.2
 10  16   2  10   16   16   -   - 15 15 97  97 Gforth 0.6.9
  -  16   -   -   16   16   -   - 15 15  -  97 Win32Forth version 4.xxx
 10  16   2  10   16   16  16  16 15 15  -  97 Win32Forth version 6.xx, Win32Forth-STC version 0.02.xx
 10  16   2   8    -    -   -   - 15 15  -   - SwiftForth
 10  16   2   -   16   16  16  16 15 15  -   - VFX Forth 4.0.2 build 2428
 10  16   2   8    -    -   -   - 15 15  -   - ntf/lxf (Peter Fälth)

 #9. #9 -#9. -#9 #-9. #-9 $F. $F -$F. -$F $-F. $-F %1. %1 -%1. -%1 %-1. %-1
   9  9    -   -   -9  -9  15 15    -   -  -15 -15   1  1    -   -   -1  -1 iForth 2.1.2541
   9  9   -9  -9    -   -  15 15  -15 -15    -   -   1  1   -1  -1    -   - bigForth 2.1.6
   -  -    -   -    -   -  15 15  -15 -15  -15 -15   1  1   -1  -1   -1  -1 PFE 0.33.34
   9  9   -9  -9    -   -  15 15  -15 -15    -   -   1  1   -1  -1    -   - gforth 0.6.9
   9  9   -9  -9   -9  -9  15 15  -15 -15  -15 -15   1  1   -1  -1   -1  -1 Win32Forth 6.xx
   9  9    -   -   -9  -9  15 15    -   -  -15 -15   1  1    -   -   -1  -1 SwiftForth 3.0.11
   9  9    -   -   -9  -9  15 15    -   -  -15 -15   1  1    -   -   -1  -1 ntf/lxf (Peter Fälth)

The programs used to produce these outputs are: test-num-prefixes.fs
and test-num-prefixes2.fs

None of these systems accept !10, @10 or ^10, and none accept these
prefixes in >NUMBER.

Also:
iForth and CHForth support the following prefixes

^ for control characters, e.g., ^G for 7.
& for characters, e.g., &" for 34 and &1 for 49.
" for single characters, e.g. "a" for 97 (in iForth,
                                          in CHForth this is a string)

ntf/lxf supports the prefix U for a Unicode codepoint, e.g., U20AC for
the Euro sign (to convert from the Unicode number to the on-stack
format of ntf/lxf for xchars).

Open Firmware, eforth, and Swiftforth support the parsing words H# D#
B# O# to determine the base, used as in "H# FF".  Open Firmware also
has the parsing word ASCII (a STATE-smart ancestor of CHAR/[CHAR]) and
CONTROL (a variant of ASCII intended to produce the values of control
characters).

Forth, Inc. uses and promotes (e.g., in Forth Programmer's Handbook) a
convention for word names of the form '' for words that do

: '' [char]  hold ;

There was a claim that there are few uses of CHAR/[CHAR] so the
'' syntax would save nothing.  In response three corpi of code
were searched for uses of [CHAR], CHAR, or '/''; the
number of occurences were:

[CHAR]    CHAR    'x/'x'
 245       141     443    Gforth sources
1087       155     ---    benchmarks collected by Anton Ertl
 711      3982    6969    Marcel Hendrix' corpus <04153616153560@frunobulax.edu>

So '' does not just have a large potential for use, it is
already used a lot.  For comparison, here are some frequencies of some
CORE words that could be easily replaced with slightly longer
sequences in the Gforth sources (829 occurences of character
literals):

298 1+
223 1-
132 2*
345 2DROP
401 2DUP
 79 >
107 NEGATE
211 SPACE


Remarks

Character value syntax

The '<char>' syntax does not have as much support in existing systems
as the others; if there is too much resistance against that, I will
take it out.


Sign or base prefix first?

Some people find it more intuitive to place the sign in front of the
base prefix, some find it more natural to do it the other way round.
That's probably a good reason for systems to accept both variants.
However, currently slightly more systems support the base-prefix-first
approach, so that's what is proposed here.  Systems can also implement
to other variant, but programs making use of that would not conform to
this proposal.  A very informal strawpoll had the following result:

In favour: Marcel Hendrix, Alex McDonald, helmwo@gmail.com, Gerry,
           Stephen Pelc, Anton Ertl
Against:   Ed, Elisabeth Rather


Comments

Mitch Bradley pointed out that the parsing word approach (H# etc.) has
a compilation speed advantage on systems that cache dictionary lookups
but where the dictionary is slow otherwise.

Elizabeth Rather thinks that it's inadvisable to put any kind of
optional extension in the main body of the Standard, and that it
should go in the appropriate *.3.* of TOOLS or whatever wordset, with
a reference to it in the main body.


Proposal

Move "12.2.2.1 Numeric notation" to "2.2.1 Numeric Notation",
with appropriate editing.

Replace "3.4.1.3 Text interpreter input number conversion" with:

|When converting input numbers, the text interpreter shall recognize
|integer numbers in the form <BASEnum>; if X:number-prefixes is
|present, the text interpreter shall recognize integer numbers in the
|form <anynum>.
|
|<anynum>   := { <BASEnum> | <decnum> | <hexnum> | <binnum> | <cnum> }
|<BASEnum>  := [-]<bdigit><bdigit>*
|<decnum>   := #[-]<decdigit><decdigit>*
|<hexnum>   := $[-]<hexdigit><hexdigit>*
|<binnum>   := %[-]<bindigit><bindigit>*
|<cnum>     := '<char>'
|<bindigit> := { 0 | 1 }
|<decdigit> := { 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 }
|<hexdigit> := { <decdigit> | a | b | c | d | e | f | A | B | C | D | E | F }
|
|<bdigit> represents a digit according to the value of BASE (see
|"3.2.1.2 Digit conversion").  For <hexdigit>, the digits a..f have the
|values 10..15.  <char> represents any printable character.
|
|The radix used for number conversion is: for <BASEnum>, the value in
|BASE; for <decnum> 10; for <hexnum> 16; for <binnum> 2.  For <cnum>,
|the number is the value of <char>.


Change "8.3.2 Text interpreter input number conversion" as follows:

|When the text interpreter processes a number except a <cnum> that is
                                              ^^^^^^^^^^^^^^^ new
|immediately followed by a decimal point and is not found as a
|definition name, the text interpreter shall convert it to a
|double-cell number.
|
|For example, entering DECIMAL 1234 leaves the single-cell number 1234
|on the stack, and entering DECIMAL 1234. leaves the double-cell number
|1234 0 on the stack.


Reference implementation

Implementing this proposal requires changes in places that are quite
system-specific, therefore a reference implementation is not useful.


Tests

require test/tester.fs
decimal

{ #1289       -> 1289 }
{ #12346789.  -> 12346789. }
{ #-1289      -> -1289 }
{ #-12346789. -> -12346789. }
{ $12eF       -> 4847 }
{ $12aBcDeF.  -> 313249263. }
{ $-12eF      -> -4847 }
{ $-12AbCdEf. -> -313249263. }
{ %10010110   -> 150 }
{ %10010110.  -> 150. }
{ %-10010110  -> -150 }
{ %-10010110. -> -150. }
{ 'z'         -> 122 }

Voting instructions

Fill out the appropriate ballot(s) below and mail it/them to me <anton@mips.complang.tuwien.ac.at>. Your vote will be published (including your name (without email address) and/or the name of your system) here. You can vote (or change your vote) at any time by mailing to me, and the results will be published here.

Note that you can be both a system implementor and a programmer, so you can submit both kinds of ballots.

Ballot for systems
If you maintain several systems, please mention the systems separately in the ballot. Insert the system name or version between the brackets or in the line below the statement. Multiple hits for the same system are possible (if they do not conflict).
[ ] conforms to ANS Forth.
[ ] already implements the proposal in full since release [ ].
[ ] implements the proposal in full in a development version.
[ ] will implement the proposal in full in release [ ].
[ ] will implement the proposal in full in some future release.
There are no plans to implement the proposal in full in [ ].
[ ] will never implement the proposal in full.
If you want to provide information on partial implementation, please do so informally, and I will aggregate this information in some way.
Ballot for programmers
Just mark the statements that are correct for you (e.g., by putting an "x" between the brackets). If some statements are true for some of your programs, but not others, please mark the statements for the dominating class of programs you write.
[ ] I have used (parts of) this proposal in my programs.
[ ] I would use (parts of) this proposal in my programs if the systems
    I am interested in implemented it.
[ ] I would use (parts of) this proposal in my programs if this
    proposal was in the Forth standard.
[ ] I would not use (parts of) this proposal in my programs.
If you feel that there is closely related functionality missing from the proposal (especially if you have used that in your programs), make an informal comment, and I will collect these, too. Note that the best time to voice such issues is the RfD stage.
Anton Ertl