[ ] conforms to ANS Forth.
Gforth
CForth (Mitch Bradley)
Open Firmware (Mitch Bradley)
SwiftForth (Leon Wagner)
SwiftX (Leon Wagner)
bigForth (Bernd Paysan)
JavaForth (Peter Knaggs)
IronForth (Peter Knaggs)
[ ] already implements the proposal in full since release [ ].
Gforth 0.1alpha
amForth 0.1 (Matthias Trute)
CForth (Mitch Bradley)
Open Firmware (Mitch Bradley)
SwiftForth all (Leon Wagner)
SwiftX all (Leon Wagner)
TurboForth (Mark Wills, for TI-99/4A)
bigForth 1.0 (Bernd Paysan)
4th 3.62.4 (Hans Bezemer)
[ ] implements the proposal in full in a development version.
[ ] will implement the proposal in full in release [ ].
[ ] will implement the proposal in full in some future release.
[ ] There are no plans to implement the proposal in full in [ ].
JavaForth (Peter Knaggs)
IronForth (Peter Knaggs)
[ ] will never implement the proposal in full.
[ ] I have used (parts of) this proposal in my programs.
Bernd Paysan
Leon Wagner
Mitch Bradley
Anton Ertl
Matthias Trute
Mark Wills
Hans Bezemer
[ ] I would use (parts of) this proposal in my programs if the systems
I am interested in implemented it.
Anton Ertl
Matthias Trute
Mark Wills
[ ] I would use (parts of) this proposal in my programs if this
proposal was in the Forth standard.
Mark Wills
[ ] I would not use (parts of) this proposal in my programs.
Tim Partridge
This proposal is insufficient on its own; it should be paired with a byte access library/wordset.
Problem
While Forth-94 and Forth-2012 give systems the option of having
1 CHARS > 1, to my knowledge no mainstream Forth system has
made use of this option. Consequently, many Forth programmers write
programs that (according to the standard documentation requirements)
should declare an environmental dependency on 1 CHARS = 1.
Even those programmers who want to avoid this dependency cannot
easily test that their programs do not have it, because so few Forth
systems are available that do not satisfy this dependency. In any
case, spending effort to avoid an environmental dependency that all
interesting systems satisfy seems wasteful; on the other hand, declaring
the environmental dependency does not benefit anyone, either.
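To make the dependency concrete, here is a small sketch (the buffer names LINE-BUF and LINE-BUF2 are hypothetical, chosen for illustration): the first definition is correct whatever the value of 1 CHARS, while the second, very common form is only correct when 1 CHARS = 1.

```forth
\ Hypothetical names: LINE-BUF and LINE-BUF2, each meant to hold 80 characters.

\ Portable: CHARS converts the character count into address units,
\ so the buffer is large enough even where 1 CHARS > 1.
create line-buf 80 chars allot

\ Common practice: ALLOT 80 address units. This holds 80 characters
\ only if 1 CHARS = 1, so a program that uses it as an 80-character
\ buffer (e.g., passing LINE-BUF2 80 to ACCEPT, whose count is in
\ characters) has an environmental dependency on 1 CHARS = 1.
create line-buf2 80 allot
```

On every system where 1 CHARS = 1, both definitions reserve the same amount of memory, which is why the dependency in the second form normally goes unnoticed.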
Solution
Characters take one address unit (au) in memory.
CHARS can be implemented as a no-op.
CHAR+ is equivalent to 1+.
These words are not removed, so programmers who want to program for
systems (whether existing or not) where 1 CHARS > 1 can still do so.
As a consequence, on byte-addressed machines, characters normally take
one byte; on word-addressed machines, characters and cells take one
machine word.
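A minimal sketch of what this means for a system implementor, assuming char = au. The names MY-CHARS and MY-CHAR+ are hypothetical and are used only so that the sketch does not redefine the system's own CHARS and CHAR+.

```forth
\ Sketch assuming 1 CHARS = 1 (one character per address unit).
: my-chars ( n1 -- n2 ) ;               \ no-op: n characters occupy n address units
: my-char+ ( c-addr1 -- c-addr2 ) 1+ ;  \ the next character is one address unit further
```

For example, `0 my-char+` leaves 1 and `80 my-chars` leaves 80, matching the test cases given for CHAR+ and CHARS.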
Remarks
What about word-addressed machines?
Implementing standard Forth on word-addressed machines already
necessitates that char=au: a character in standard Forth must be at
least one address unit wide, and at most as large as a cell; on
word-addressed machines, cell=au, so char=au follows from that.
There have occasionally been ideas about supporting a packed character
representation (with more than one character per au) for
word-addressed machines, but that is beyond the scope of this
proposal.
What about nibble-addressed/bit-addressed machines?
No implementations of standard Forth on such machines are known to me;
probably the benefits of standard systems do not outweigh the costs on
such restricted machines. Such machines are becoming rarer over time,
so if nobody has created such a system yet, it is unlikely to happen
in the future.
Still, if anybody wants to create such a system, they would have two
options: Declare an environmental restriction of having a non-standard
character size, or implement an address unit size of (e.g.) 8 bits.
Which option is preferable depends on the use of the system.
What about other unusual hardware setups?
I have heard of an embedded system with 16-bit address units and 32-bit
chars. If we standardize on 1 CHARS = 1, the simplest way forward for
such a system would be to declare an environmental restriction of
having a non-standard char size; alternatively, one could try to
change the char size to 16 bits, or the address unit size to 32 bits.
I don't know enough about the system to make any recommendation among
these options.
What about Unicode?
An important motivation for the introduction of CHARS and CHAR+ in
Forth-94 was probably the seeming move to 16-bit characters with
Unicode in the early 1990s, which also led to 16-bit characters in
Windows NT (released 1993) and in Java (released 1995).
However, 16 bits turned out to be too few for storing all Unicode code
points (Unicode 2.0, 1996), so UTF-16 (Unicode based on 16-bit basic
units) is a variable-width encoding; at around the same time UTF-8
was introduced for encoding Unicode with 8-bit basic units, which
requires less conversion effort for software based on 8-bit chars, and
has the same disadvantage of being variable-width as UTF-16 (which
turns out to play little role in most software).
As a result, with a few exceptions (see below) Forth systems have
stayed with 8-bit characters, and they are not alone in this: UTF-8 is
the dominant file representation of Unicode, and most Unix software
also handles Unicode internally as UTF-8 (can anyone fill in here for
Windows?).
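With 8-bit characters, a UTF-8 string's length in chars and its number of code points can differ, because UTF-8 is variable-width. A minimal sketch, assuming 1 CHARS = 1 with 8-bit chars and well-formed UTF-8 input; the word UTF8-LENGTH is hypothetical. (Forth-2012's Extended-Character word set addresses this kind of processing in a standard way; the sketch uses only Core words and the Forth-2012 `$` hex prefix.)

```forth
\ Hypothetical word: count the Unicode code points in a UTF-8 string.
\ Continuation bytes have the bit pattern 10xxxxxx; every other byte
\ starts a new code point.
: utf8-length ( c-addr u -- n )
  0 swap 0 ?do                    \ ( c-addr n )
    over i chars + c@             \ fetch the i-th byte
    $c0 and $80 <> if 1+ then     \ count it unless it is a continuation byte
  loop
  nip ;

\ "é" (U+00E9, bytes C3 A9) followed by "A": 3 chars, 2 code points
create u8-sample $c3 c, $a9 c, $41 c,
u8-sample 3 utf8-length .         \ prints 2
```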
Jax4th and JavaForth
Jax4th was written by Jack Woehr (Jax) as a proof-of-concept
implementation of Forth-94, and implements 16-bit characters on a
byte-addressed platform. It is not maintained (current platform:
Windows NT 3.1).
JavaForth was written in 2014 by Peter Knaggs with primitives in Java.
It uses 32-bit cells, 16-bit chars and 8-bit address units. Because
Java does not allow access to raw memory, JavaForth has to map Forth
memory to a Java array of some Java type anyway, and is pretty free to
choose address unit, character and cell size; it seems to me that it
would be simplest if JavaForth actually implemented
cell=char=au=32bits.
My impression is that Peter Knaggs chose the address unit and
character size to allow testing whether programs have an environmental
dependency on char=au (but that is still difficult for most programs
because JavaForth implements only the core words). While one might
feel that the effort that went into getting this to work would be
wasted if this proposal is accepted, this effort has been expended
already, and that is no good reason not to standardize the common practice
that char=au. JavaForth could adopt the proposal by having 8-bit
chars or 16-bit aus, or in any other way; or it could keep the current
choices (and declare an environmental restriction on char>au) to serve
as testing platform for programs that should be able to cope with that
environmental restriction.
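Conversely, a program that does not cope with char > au could refuse to load on such a system (for example, one with 16-bit chars and 8-bit address units, where 1 CHARS = 2) instead of misbehaving later. A minimal sketch, assuming the system provides `[IF]`/`[THEN]` from the Programming-Tools extension; the message text is illustrative.

```forth
\ Sketch: load-time guard for a program that depends on 1 CHARS = 1.
1 chars 1 <> [IF]
  cr .( This program requires 1 CHARS = 1.) cr
  abort
[THEN]
```

On the systems listed above as implementing the proposal, the condition is false and the guard does nothing.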
Typical use
s" Some string" dup buffer: s s swap move
instead of Forth-2012:
s" Some string" dup chars buffer: s s swap cmove
Proposal
In 3.1.2, replace "be at least one address unit wide" with "be exactly one
address unit wide".
Keep the rest as-is. While some wordings could be simplified (in
particular with respect to character-aligned addresses and the definitions of
CHARS and CHAR+), keeping them as-is gives guidance to programmers who
want to write char>au-capable code, and to system implementors who
implement a Forth system with the environmental restriction char>au.
Reference implementation
Nearly all standard Forth systems, e.g., SwiftForth, VFX, iForth,
Gforth, ...
Test cases
T{ 0 char+ -> 1 }T
T{ 1 chars -> 1 }T
Experience
Almost universally implemented and widely used.
Comments
None yet.
Note that you can be both a system implementor and a programmer, so you can submit both kinds of ballots.
[ ] conforms to ANS Forth.
[ ] already implements the proposal in full since release [ ].
[ ] implements the proposal in full in a development version.
[ ] will implement the proposal in full in release [ ].
[ ] will implement the proposal in full in some future release.
[ ] There are no plans to implement the proposal in full in [ ].
[ ] will never implement the proposal in full.
If you want to provide information on partial implementation, please do so informally, and I will aggregate this information in some way.
[ ] I have used (parts of) this proposal in my programs.
[ ] I would use (parts of) this proposal in my programs if the systems
I am interested in implemented it.
[ ] I would use (parts of) this proposal in my programs if this
proposal was in the Forth standard.
[ ] I would not use (parts of) this proposal in my programs.
If you feel that there is closely related functionality missing from the
proposal (especially if you have used that in your programs), make an
informal comment, and I will collect these, too. Note that the best time
to voice such issues is the RfD stage.