C256 Foenix - Specifications (Last Update: March 10th)

PJW
Posts: 15
Joined: Wed Apr 24, 2019 12:44 am

Re: C256 Foenix - Specifications (Last Update: March 10th)

Post by PJW » Mon May 20, 2019 11:57 pm

It's a tokenizing interpreter for BASIC. It's not going to be a speed demon, I'm afraid. I am considering changing how I handle literals to make those faster to process while running a program. I'm also trying to think of a simple way to make it extensible: something like a foreign function interface or user-defined commands, so that the parts you need to be fast could be written in assembly, loaded, and added to the language for a particular program.
MageMaster
Posts: 3
Joined: Wed Apr 24, 2019 3:37 pm

Post by MageMaster » Tue May 21, 2019 4:18 pm

That sounds like an awesome feature.
PJW wrote:
Mon May 20, 2019 11:57 pm
It's a tokenizing interpreter for BASIC. It's not going to be a speed demon, I'm afraid. I am considering changing how I handle literals to make those faster to process while running a program. I'm also trying to think of a simple way to make it extensible: something like a foreign function interface or user-defined commands, so that the parts you need to be fast could be written in assembly, loaded, and added to the language for a particular program.
Jeff_Birt
Posts: 4
Joined: Fri May 10, 2019 1:05 pm

Post by Jeff_Birt » Tue May 21, 2019 8:03 pm

PJW wrote:
Mon May 20, 2019 11:57 pm
It's a tokenizing interpreter for BASIC. It's not going to be a speed demon, I'm afraid. I am considering changing how I handle literals to make those faster to process while running a program. I'm also trying to think of a simple way to make it extensible: something like a foreign function interface or user-defined commands, so that the parts you need to be fast could be written in assembly, loaded, and added to the language for a particular program.
In the MS/Commodore V2 flavor of BASIC, literals are kept as ASCII text that is re-interpreted every time the line is run, which is slow.

In Forth, a literal is turned into an 'object' whose value is set to the literal value.

You might have a token that means 'Literal_Int' and one for 'Literal_Float' that tells the interpreter the following N bytes are an int or a float. So if token 0x1A is 'Literal_Int', then assuming 16-bit INTs, the interpreter would read the next two bytes as an INT.
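
A minimal sketch of that encoding in Python, for illustration only. The 0x1A token value comes from the post; the 0x1B value for 'Literal_Float', the little-endian byte order, and the 32-bit IEEE float payload are all assumptions made here, not anything the interpreter is known to use.

```python
import struct

TOK_LITERAL_INT = 0x1A    # value used as the example in the post
TOK_LITERAL_FLOAT = 0x1B  # hypothetical value, chosen only for this sketch

def encode_int(value: int) -> bytes:
    """Emit the Literal_Int token followed by a 16-bit little-endian signed int."""
    return bytes([TOK_LITERAL_INT]) + struct.pack("<h", value)

def encode_float(value: float) -> bytes:
    """Emit the Literal_Float token followed by a 32-bit little-endian float (assumed format)."""
    return bytes([TOK_LITERAL_FLOAT]) + struct.pack("<f", value)

def read_literal(code: bytes, pos: int):
    """Decode the literal starting at code[pos]; return (value, position after it)."""
    token = code[pos]
    if token == TOK_LITERAL_INT:
        (value,) = struct.unpack_from("<h", code, pos + 1)
        return value, pos + 3
    if token == TOK_LITERAL_FLOAT:
        (value,) = struct.unpack_from("<f", code, pos + 1)
        return value, pos + 5
    raise ValueError(f"not a literal token: {token:#04x}")
```

With this layout the interpreter never parses digits at run time; it reads a fixed number of bytes after the token and moves on.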
PJW
Posts: 15
Joined: Wed Apr 24, 2019 12:44 am

Post by PJW » Wed May 22, 2019 3:25 pm

Jeff_Birt wrote:
Tue May 21, 2019 8:03 pm
You might have a token that means 'Literal_Int' and one for 'Literal_Float' that tells the interpreter the following N bytes are an int or a float. So if token 0x1A is 'Literal_Int', then assuming 16-bit INTs, the interpreter would read the next two bytes as an INT.
Yes, that's kind of what I have in mind. It would speed up execution, since the interpreter wouldn't have to re-parse numbers every time it executes that part of the code. On the other hand, the Commodore/Microsoft BASIC style of tokenization has the benefit that an end-of-line is easy to determine, as it's just the NULL at the end. Pre-parsed literals would make it more difficult since they'd sometimes (often?) have NULLs in them, so the interpreter can't find the end of the line by just looking for the NULL.

There's another possibility as well. I could go with a model that's more like a Forth direct-threaded VM: get rid of the tokens altogether and instead use pointers to the actual code. The saved files would have to be converted to ASCII, though, and I don't think I could take that approach and still let the programmer keep their formatting (spaces would end up being removed from the code, so

Code:

10 PRINT"Hello"
would come out

Code:

10 PRINT "Hello"
when you did a listing).
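
As a loose illustration of the direct-threaded idea, here is a Python sketch in which the stored program holds references to handler functions rather than token bytes. The handler names, the (handler, operand) pair layout, and the stack-based evaluation are assumptions made for this example; they are not taken from the interpreter being discussed.

```python
# Each handler stands in for a routine the thread would point at directly.
def op_push(vm, operand):
    vm["stack"].append(operand)

def op_add(vm, operand):
    b = vm["stack"].pop()
    a = vm["stack"].pop()
    vm["stack"].append(a + b)

def op_print(vm, operand):
    vm["output"].append(str(vm["stack"].pop()))

def run(program):
    """Execute a threaded program: call each handler in order, no token lookup."""
    vm = {"stack": [], "output": []}
    for handler, operand in program:
        handler(vm, operand)
    return vm["output"]

# Roughly PRINT 2 + 3, already compiled to handler references.
program = [(op_push, 2), (op_push, 3), (op_add, None), (op_print, None)]
print(run(program))  # prints ['5']
```

Nothing in `program` records how the source line was typed, which is why a listing would have to be regenerated with its own spacing.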
Jeff_Birt
Posts: 4
Joined: Fri May 10, 2019 1:05 pm

Post by Jeff_Birt » Wed May 22, 2019 5:25 pm

As for the null byte ending the line: if you have a 'literal' opcode, then you would consume the requisite bytes after it. Only after that would you look for a 'null' opcode.
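
That scan can be sketched in Python as follows. The token values (other than 0x1A from the earlier post) and the payload sizes are assumptions for illustration, and string literals are left out to keep the sketch short.

```python
TOK_END_OF_LINE = 0x00
TOK_LITERAL_INT = 0x1A    # from the earlier post
TOK_LITERAL_FLOAT = 0x1B  # hypothetical

# Payload bytes owned by each literal token (assumed sizes).
PAYLOAD_SIZE = {TOK_LITERAL_INT: 2, TOK_LITERAL_FLOAT: 4}

def find_end_of_line(code: bytes, pos: int = 0) -> int:
    """Return the index of the zero byte that terminates the line starting at pos."""
    while pos < len(code):
        token = code[pos]
        if token == TOK_END_OF_LINE:
            return pos
        # Skip the token itself plus any payload bytes that belong to it,
        # so zero bytes inside a literal are never mistaken for the terminator.
        pos += 1 + PAYLOAD_SIZE.get(token, 0)
    raise ValueError("line is not terminated")

# 0x99 stands in for a keyword token; the literal 256 is stored as 00 01.
line = bytes([0x99, 0x1A, 0x00, 0x01, 0x00])
print(line.index(0))           # prints 2 (a naive search stops inside the literal)
print(find_end_of_line(line))  # prints 4
```

The cost is that finding the end of a line means walking its tokens instead of searching for a single byte.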

Direct threading is an interesting idea. Folks got into the habit of leaving out spaces because they were limited to 80 characters/line. Personally I'm all for making readable 'pretty' code.

Of course, if you really wanted to drive people crazy, you could enforce Forth's whitespace rules :)
tim1724
Posts: 2
Joined: Mon Apr 22, 2019 6:43 pm

Post by tim1724 » Thu May 23, 2019 12:05 am

PJW wrote:
Wed May 22, 2019 3:25 pm
spaces would end up being removed from the code, so

Code:

10 PRINT"Hello"
would come out

Code:

10 PRINT "Hello"
when you did a listing
Doesn't Microsoft BASIC already strip spaces? (I know the Applesoft flavor did; I'd assumed that all the 6502 versions tokenized the same way.)
tomxp411
Posts: 10
Joined: Thu May 09, 2019 11:19 pm
Location: California, USA

Post by tomxp411 » Thu May 30, 2019 10:11 pm

Microsoft BASIC doesn't strip spaces. The parser works a little differently on different platforms, though.

On Commodore BASIC, the parser just does a simple text search against the current word as it reads each character. So PRINTA gets encoded as $99 $41. $99 is the token for PRINT, and $41 is the PETSCII value for A.
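
A rough Python sketch of that style of keyword matching. Only the PRINT token ($99) comes from the post; the one-entry keyword table, the quote handling, and the use of ASCII codes in place of PETSCII (the two agree for unshifted letters such as A) are simplifications assumed here. A real tokenizer also has special cases, such as REM and DATA, that are not modeled.

```python
# Keyword table. Only PRINT ($99) is taken from the post; a real table has many more entries.
KEYWORDS = {"PRINT": 0x99}

def crunch(line: str) -> bytes:
    """Tokenize one line of ASCII text by matching keywords at every position."""
    out = bytearray()
    in_quotes = False
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == '"':
            in_quotes = not in_quotes
        # Outside quotes, test whether any keyword starts at this position.
        word = None
        if not in_quotes:
            word = next((w for w in KEYWORDS if line.startswith(w, i)), None)
        if word is not None:
            out.append(KEYWORDS[word])
            i += len(word)
        else:
            out.append(ord(ch))  # everything else, spaces included, is kept as-is
            i += 1
    return bytes(out)

print(crunch("PRINTA").hex())   # prints 9941
print(crunch("PRINT A").hex())  # prints 992041
```

Because the match does not depend on word boundaries, PRINTA and PRINT A both tokenize, and the space in the second form survives in the stored line.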

On PC BASIC, spaces are required between commands and arguments, so PRINTA should result in a syntax error, and PRINT A encodes to $91 $20 $41. (PC BASIC's token for PRINT is $91)
tomxp411
Posts: 10
Joined: Thu May 09, 2019 11:19 pm
Location: California, USA

Post by tomxp411 » Thu May 30, 2019 10:40 pm

Jeff_Birt wrote:
Tue May 21, 2019 8:03 pm
PJW wrote:
Mon May 20, 2019 11:57 pm
It's a tokenizing interpreter for BASIC. It's not going to be a speed demon, I'm afraid. I am considering changing how I handle literals to make those faster to process while running a program. I'm also trying to think of a simple way to make it extensible: something like a foreign function interface or user-defined commands, so that the parts you need to be fast could be written in assembly, loaded, and added to the language for a particular program.
In the MS/Commodore V2 flavor of BASIC, literals are kept as ASCII text that is re-interpreted every time the line is run, which is slow.

In Forth a literal is turned into an 'object' whose value is set to the literal value.

You might have a token that means 'Literal_Int' and one for 'Literal_Float' that tells the interpreter the following N bytes are an int or a float. So token 0x1A is 'Literal_Int' and assuming 16 bit INTs it would read the next two bytes as an INT.
I proposed something similar on the old forum, but of course that's gone. :)

I've been working on a BASIC interpreter, and that's basically how mine works. Literals and variables would get tokenized using prefix codes, and punctuation would be left in its ASCII code.

So PRINT X + 3 would be tokenized as:
{PRINT}{VARIABLE}{1}{+}{LITERAL INT}{3}

Note that there are no spaces. Spaces would be stripped by the parser and inserted by the LISTer.

The variables would be stored on a table, which would include the variable name, type, and value. When the parser encounters a word that is not in the token table, it automatically creates an entry on the variable table.
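
A small Python sketch of such a table, under assumptions of my own: the class and method names are hypothetical, the type is stored as a plain string, new variables default to an integer 0, and indexes start at 1 only to match the {VARIABLE}{1} example above.

```python
from dataclasses import dataclass

@dataclass
class Variable:
    name: str
    var_type: str   # e.g. "int", "float", "string" (type names are assumptions)
    value: object

class VariableTable:
    def __init__(self):
        self.entries = []
        self.index_by_name = {}

    def lookup_or_create(self, name, var_type="int", value=0):
        """Return the table index for name, creating the entry on first use."""
        if name not in self.index_by_name:
            self.entries.append(Variable(name, var_type, value))
            self.index_by_name[name] = len(self.entries)  # 1-based index
        return self.index_by_name[name]

    def get(self, index):
        """Fetch the entry that a {VARIABLE}{index} token pair refers to."""
        return self.entries[index - 1]

table = VariableTable()
print(table.lookup_or_create("X"))  # prints 1
print(table.lookup_or_create("Y"))  # prints 2
print(table.lookup_or_create("X"))  # prints 1
```

At run time the interpreter can then go straight from the index in the token stream to the value, with no name search.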

To figure out the type of a literal, the parser uses the following rules:
A string starts with a quote, so any literal surrounded by quotes is prefixed with {LITERAL STRING}, then the string text. {END STRING} terminates the string. The starting and trailing quotes are stripped.
A series of digits (0-9) with no decimal point is a signed integer literal.
A series of digits with a decimal point is a float.
$ prefixes a hexadecimal number. Hex numbers are treated as an integer, but printed as a hex string with only enough significant digits to represent the value. Hex numbers are unsigned.
A letter, followed by letters, numbers, a period, or the underscore, is a variable. It will be added to the variable table and represented by the {variable} token and an integer index into the table.
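
The rules above can be sketched as a small classifier in Python. This assumes the input has already been split into words and that keywords were matched against the token table beforehand; the function name and the "LITERAL HEX" label are hypothetical, since the post only says hex numbers are treated as integers but printed as hex.

```python
import re

def classify(word: str) -> str:
    """Return the kind of a single word, following the rules listed above."""
    if len(word) >= 2 and word[0] == '"' and word[-1] == '"':
        return "LITERAL STRING"
    if re.fullmatch(r"[0-9]+", word):
        return "LITERAL INT"
    if re.fullmatch(r"[0-9]+\.[0-9]*|\.[0-9]+", word):
        return "LITERAL FLOAT"
    if re.fullmatch(r"\$[0-9A-Fa-f]+", word):
        return "LITERAL HEX"   # hypothetical label: an integer that lists as hex
    if re.fullmatch(r"[A-Za-z][A-Za-z0-9._]*", word):
        return "VARIABLE"
    raise ValueError(f"cannot classify {word!r}")

for word in ['"HELLO"', "3", "3.14", "$FF", "X"]:
    print(word, "->", classify(word))
```

The order of the checks matters only for readability here, since the patterns do not overlap.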

Longer values, such as 64-bit integers or GUIDs, would need to be encoded as a string.

Finally, my interpreter saves and loads ASCII data. It does not save or load tokenized files; this allows for easier exchange with other platforms.