diff --git a/gazprea/impl/slice_passing.rst b/gazprea/impl/slice_passing.rst new file mode 100644 index 0000000..116ccae --- /dev/null +++ b/gazprea/impl/slice_passing.rst @@ -0,0 +1,59 @@ +.. _sec:impl_slice_passing: + +Slice Passing — Eager Copy vs. Copy-On-Write +============================================ + +The *Gazprea* specification defines slice expressions as rvalues that produce a +**deep copy** of the selected elements (see :ref:`sssec:array_ops`). This is a +*semantic* guarantee: from the programmer's perspective, a slice always behaves +as an independent value with no aliasing relationship to the source array. + +However, the specification does **not** require an *eager* copy to be made at +the point of the slice expression. An implementation is free to use a lazy +strategy such as **Copy-On-Write (COW)**. + +Copy-On-Write Strategy +---------------------- + +Under COW, passing a slice to a function or procedure does not immediately +duplicate the underlying storage. Instead, the implementation shares the same +backing memory and only performs the physical copy when — and if — either the +source array or the slice view is mutated. If no mutation occurs, the copy is +avoided entirely. + +This is safe because: + +1. Slices can only be passed as ``const`` (by-value) parameters. A callee that + receives a slice argument cannot mutate it through that parameter. +2. The source array variable is not accessible from inside the called + function/procedure (functions are pure; procedures have no aliasing with + ``const`` parameters). + +Therefore, a COW implementation is observationally equivalent to an eager deep +copy for all legal *Gazprea* programs. + +Example +------- + +:: + + function sum(integer[*] v) returns integer { ... } + + procedure main() returns integer { + var integer[10] a = 1..10; + + // The slice a[2..6] is an rvalue. An eager-copy implementation + // allocates a new 4-element array here. 
A COW implementation may + // instead pass a lightweight view into a's storage, deferring the + // copy until (if ever) a mutation would make it necessary. + integer total = sum(a[2..6]); + return total; + } + +Implementation Note +------------------- + +Choosing an eager-copy or COW strategy is an internal quality-of-implementation +decision and does not affect language semantics. Implementations that wish to +avoid the overhead of copying large slices are encouraged to consider COW or +similar lazy strategies. diff --git a/gazprea/impl/value_categories.rst b/gazprea/impl/value_categories.rst new file mode 100644 index 0000000..6d50331 --- /dev/null +++ b/gazprea/impl/value_categories.rst @@ -0,0 +1,135 @@ +.. _sec:value_categories: + +Value Categories +================ + +Every expression in *Gazprea* belongs to exactly one **value category**, +which determines how the expression may be used. In essence, value categories +describe whether an expression +can appear on the left-hand side of an assignment and whether it can be passed +as a mutable (``var``) argument to a procedure. + +*Gazprea* recognises two value categories: **lvalue** and **rvalue**. This is +a deliberate simplification of the richer taxonomy found in modern C++ (which +adds *xvalue*, *prvalue*, and *glvalue*); those additional categories exist to +support move semantics and resource transfer, neither of which *Gazprea* +exposes. The two-category model is sufficient for *Gazprea*'s ownership rules, +which are entirely copy-based. + +The full C++ taxonomy is described at +`cppreference: Value categories `_ +and is worth understanding as background, even if *Gazprea* does not expose all +of it. + +.. _ssec:vc_background: + +Background: The Full C++ Taxonomy +---------------------------------- + +C++ characterises expressions along two orthogonal axes: + +- **Identity**: does the expression refer to a persistent object that has an + address and can be named again later? 
+- **Moveability**: can the object's resources be transferred (moved) rather + than copied? + +This gives rise to five named categories, arranged in the following hierarchy: + +.. code-block:: text + + Expression + ├── glvalue (has identity) + │ ├── lvalue (identity, not moveable) + │ └── xvalue (identity, moveable — "expiring value") + └── rvalue (may be moved from) + ├── xvalue (shared with glvalue above) + └── prvalue (no identity — "pure rvalue") + +**glvalue** ("generalised lvalue") + Any expression that determines the identity of an object or function. + Includes both lvalues and xvalues. A glvalue *may* be implicitly converted + to a prvalue. + +**lvalue** + A glvalue that is not an xvalue. Refers to a persistent object with a + stable address — something you can take the address of and use again next + time the same expression is evaluated. Variable names, array element + accesses, and dereferenced pointers are classic lvalues. + +**xvalue** ("expiring value") + A glvalue whose resources can be reused because the object is near the end + of its lifetime. Introduced in C++11 to support ``std::move`` and rvalue + references. *Gazprea* has no equivalent. + +**prvalue** ("pure rvalue") + An rvalue that is not an xvalue. Computes a value or initialises an object + but has no persistent identity of its own. Literals, arithmetic + sub-expressions, and function return values (when returned by value) are + prvalues. + +**rvalue** + The union of xvalues and prvalues, that is, anything that is not an lvalue. + rvalues can generally be moved from (in C++) and cannot be the target of an + ordinary assignment. + +.. _ssec:vc_gazprea: + +Value Categories in Gazprea +----------------------------- + +Because *Gazprea* has no move semantics or reference types, xvalues never +arise. 
The two remaining categories collapse cleanly: + +**lvalue** + An expression that refers to a named, addressable storage location that + persists beyond the expression and can appear on the left-hand side of an + assignment. In *Gazprea*: + + - Named variables (``x``, ``arr``, ``my_tuple``) + - Individual element accesses on mutable arrays (``arr[i]``, ``mat[i, j]``, + ``tup.1``, ``tup.name``) + +**rvalue** + An expression that produces a value but has no persistent, named storage + location. In *Gazprea*, rvalues correspond to what C++ would call + *prvalues*: + + - Literals (``42``, ``true``, ``'a'``, ``"hello"``) + - Arithmetic and logical sub-expressions (``x + 1``, ``a and b``) + - Array and tuple literals (``[1, 2, 3]``, ``(x: 1, y: 2)``) + - Range expressions (``1..10``) + - **Slice expressions** (``arr[2..5]``) — even though slices are derived + from a named array, the result is a fresh deep copy with no stable + address of its own + - Function call results + +.. _ssec:vc_consequences: + +Practical Consequences +----------------------- + +The value category of an expression determines what you can do with it: + ++---------------------------------------------+----------+---------+ +| Operation | lvalue | rvalue | ++=============================================+==========+=========+ +| Appear on left-hand side of ``=`` | ✓ | ✗ | ++---------------------------------------------+----------+---------+ +| Pass as ``var`` (mutable) procedure argument| ✓ | ✗ | ++---------------------------------------------+----------+---------+ +| Pass as ``const`` procedure argument | ✓ | ✓ | ++---------------------------------------------+----------+---------+ +| Use in an expression | ✓ | ✓ | ++---------------------------------------------+----------+---------+ + +In particular, because a slice is an rvalue, the following are both +compile-time errors: + +:: + + var integer[5] a = [10, 20, 30, 40, 50]; + + a[1..3] = [99, 99]; // ERROR: slice is an r-value, not an l-value + + 
procedure mutate(var integer[*] v) { ... } + call mutate(a[1..3]); // ERROR: cannot pass r-value as var argument diff --git a/gazprea/index.rst b/gazprea/index.rst index 54a4f74..c81217b 100644 --- a/gazprea/index.rst +++ b/gazprea/index.rst @@ -20,6 +20,7 @@ Hardware Acceleration Laboratory in Markham, ON. spec/identifiers spec/comments spec/declarations + spec/constexpr spec/type_qualifiers spec/types spec/type_inference @@ -42,5 +43,7 @@ Hardware Acceleration Laboratory in Markham, ON. impl/part_1 impl/part_2 impl/errors + impl/value_categories + impl/slice_passing .. |gazprea_logo| image:: assets/images/GazpreaLogo.png diff --git a/gazprea/spec/built_in_functions.rst b/gazprea/spec/built_in_functions.rst index 3d1fdb0..0dcc7e9 100644 --- a/gazprea/spec/built_in_functions.rst +++ b/gazprea/spec/built_in_functions.rst @@ -14,22 +14,8 @@ If a declaration or a definition with the same name as a built-in function is encountered in a *Gazprea* program, then the compiler should issue an error. Note that although the examples below all use arrays, all the built-ins work -on Vectors and Strings, since they are always compatible with arrays. - -.. _ssec:builtIn_length: - -Length ------- - -``length`` takes an array of any element type, and returns an integer -representing the number of elements in the array. - -:: - - integer[*] v = 1..5; - - length(v) -> std_output; /* Prints 5 */ - +on strings as well, since a ``string`` is structurally compatible with +``character[*]``. .. _ssec:builtIn_rows_cols: @@ -37,12 +23,16 @@ Shape ----- The built-in ``shape`` operates on arrays of any dimension, and returns an -array listing the size of each dimension. +``integer[*]`` listing the size of each dimension. For a 1-dimensional array, +``shape`` returns a single-element array, so ``shape(v)[1]`` gives the number +of elements in ``v``. 
:: - integer[*][*] M = [[1, 2, 3], [4, 5, 6]]; + integer[4] v = 1..5; + shape(v)[1] -> std_output; /* Prints 4 */ + integer[*, *] M = [[1, 2, 3], [4, 5, 6]]; shape(M) -> std_output; /* Prints [2, 3] */ .. _ssec:builtIn_reverse: @@ -50,16 +40,16 @@ array listing the size of each dimension. Reverse ------- -The reverse built-in takes any single dimensional array, Vector, or String, and returns a -reversed version of it. +The ``reverse`` built-in takes any array or ``string``, and returns a +reversed copy of it. :: - integer[*] v = 1..5; - integer[*] w = reverse(v); + integer[4] v = 1..5; + integer[4] w = reverse(v); - v -> std_output; /* Prints 12345 */ - w -> std_output; /* Prints 54321 */ + v -> std_output; /* Prints 1234 */ + w -> std_output; /* Prints 4321 */ .. _ssec:builtIn_format: diff --git a/gazprea/spec/constexpr.rst b/gazprea/spec/constexpr.rst new file mode 100644 index 0000000..ac7a5d0 --- /dev/null +++ b/gazprea/spec/constexpr.rst @@ -0,0 +1,154 @@ +.. _sec:constexpr: + +Constant Expressions +==================== + +A constant expression (sometimes called a constexpr) is an expression that can +be fully +evaluated by the compiler at compile time. This feature is used primarily +for specifying the size of +:ref:`statically-sized arrays `. + +In *Gazprea*, a ``constexpr`` is not a keyword, but a property of a ``const`` +variable. A ``const`` variable is considered a ``constexpr`` if and only if its +initializer expression meets a strict set of criteria: + +.. _ssec:constexpr_rules: + +Rules for Constant Expressions +------------------------------ + +An expression is a valid ``constexpr`` if it is composed exclusively of: + +1. Literals of base types (``boolean``, ``integer``, ``real``, ``character``). +2. Operators, including ``+``, ``-``, ``*``, ``/``, ``not``, ``and``, ``or``, + applied to operands that are themselves ``constexpr``. +3. Constructors for aggregate types, provided that the aggregate is const and + all members are ``constexpr``s. +4. 
Index or field access on ``constexpr`` aggregate types. +5. Other variables that are themselves valid ``constexpr``s. + +An expression is **not** a ``constexpr`` if it contains: + +1. References to ``var`` variables. +2. Function or procedure calls. +3. Any I/O operations (``<-``). + +The compiler must perform this validation recursively. When checking if a variable +is a ``constexpr``, the compiler must trace its entire dependency chain. If the +chain ever depends on a runtime value, the check fails. + +The only expressions that *must* be ``constexpr`` are global constants. Other +constexprs arising from constants inside function scope may also be constexprs +but the implementation does not need to enforce or necessarily identify this. +Students should also note that mlir has a constant propagation pass built in, +so doing constant folding yourself may not be necessary depending on your +implementation. + +**Examples:** + +**Note**: we will annotate the scope explicitly in these examples. Some +'illegal' examples here would be legal within a non-global scope. + +:: + // ---------------------------- + // in global scope + // ---------------------------- + + // Legal Global Constant Expressions + const A = 10; + const B = A * 2; // Depends on another constexpr + const C = B + 5; // C is 25 + + // Illegal Global Constant Expressions + var x = 10; + const Y = x + 5; // Not a constexpr: depends on a 'var' + + function get_val() returns integer { return 100; } + const Z = get_val(); // Not a constexpr: depends on a function call + +.. _ssec:constexpr_aggregates: + +Constant Expressions with Aggregate Types +----------------------------------------- + +Arrays and tuples can also be ``constexpr``s if they meet specific criteria, +allowing them to be used to define other constants. + +#. Arrays + + A ``const`` statically-sized array is a ``constexpr`` if: + + 1. Its size is a valid ``constexpr``. + 2. All of its element initializers are valid ``constexpr``s. + 3. 
Any use of the spread operator (``...``) spreads only arrays that are + themselves ``constexpr``s. + + Dynamically-sized arrays (e.g., ``integer[*]``) cannot be ``constexpr`` + aggregates as their size is not known at compile time. + + :: + + // ---------------------------- + // in global scope + // ---------------------------- + + const WIDTH = 5; + const integer[WIDTH] LOOKUP_TABLE = [10, 20, 30, 40, 50]; // Legal constexpr array + + const ELEMENT = LOOKUP_TABLE[3]; // Legal: ELEMENT is a constexpr with value 30 + integer[ELEMENT] my_array = 0; // Legal: static array of size 30, zero-filled + + const integer[2] BAD_TABLE = [10, get_val()]; // Illegal: initializer is not a constexpr + // also illegal if a procedure, since + // procedure calls are not allowed + // within declarations + + // Spread of a constexpr array is also a constexpr + const integer[3] A = [1, 2, 3]; + const integer[5] B = [0, ...A, 4]; // Legal: spread of constexpr A + + // Spread of a non-constexpr array is not + var integer[*] dyn = [1, 2, 3]; + const integer[5] C = [0, ...dyn, 4]; // Illegal: dyn is not a constexpr + + + A ``constexpr`` can appear anywhere a ``const`` declaration is legal, + including inside functions, procedures, and control-flow blocks. However, + **not every** ``const`` variable is a ``constexpr``. ``const`` means only + that the variable is immutable within its scope; ``constexpr`` is the + stronger property that the value is fully known at compile time. For + example: + + :: + + // ---------------------------------- + // in local/function/non-global scope + // ---------------------------------- + var integer x; + x <- std_input; + const integer y = x; // Legal: y is immutable, but NOT a constexpr + // because its value depends on runtime input. + integer[y] arr; // Legal, but not constexpr: y is not a + // constexpr, so it cannot + // be used as a static array size. 
arr is + // a dynamic-sized array + + The compiler propagates the constexpr property through local scopes + normally; there is no restriction on where in a block the declaration + appears, as long as its entire dependency chain satisfies the rules above. + +#. Tuples + + A ``const`` tuple is a ``constexpr`` if all of its fields are initialized with + valid constant expressions. + + :: + + // ---------------------------- + // in global scope + // ---------------------------- + const CONFIG = (true, 10 * 2); // Legal constexpr tuple + + const IS_ENABLED = CONFIG.1; // Legal: IS_ENABLED is a constexpr with value 'true' + const VALUE = CONFIG.2; // Legal: VALUE is a constexpr with value 20 diff --git a/gazprea/spec/declarations.rst b/gazprea/spec/declarations.rst index a87865d..d1c4e06 100644 --- a/gazprea/spec/declarations.rst +++ b/gazprea/spec/declarations.rst @@ -4,7 +4,7 @@ Declarations ============ Variables must be declared before they are used. Aside from -a few :ref:`special cases `, declarations have the +a few :ref:`special cases `, declarations have the following formats: :: @@ -85,7 +85,6 @@ Special cases Special cases of declarations are covered in their respective sections. #. :ref:`Arrays ` -#. :ref:`Matrices ` #. :ref:`Tuples ` #. :ref:`Globals ` #. :ref:`Functions ` diff --git a/gazprea/spec/expressions.rst b/gazprea/spec/expressions.rst index bde026d..f4a5622 100644 --- a/gazprea/spec/expressions.rst +++ b/gazprea/spec/expressions.rst @@ -43,6 +43,75 @@ associativities of the operators in *Gazprea*. | (Lowest) 13 | ``||`` | right | +----------------+------------------------------------+-------------------+ +.. _ssec:expressions_range: + +Range Operator (``..``) +----------------------- + +The range operator ``..`` produces an ``integer[upper - lower]`` array +containing every integer from the lower bound (inclusive) to the upper bound +(exclusive). Both bounds must be ``integer`` expressions; non-integer bounds +are a compile-time type error. 
Omitting either bound is not supported. + +When both bounds are literals or :ref:`constexprs `, the +resulting array type is statically sized. When either bound is a runtime +value, the size is only known at runtime and the result should be stored in +an ``integer[*]`` variable. + +:: + + integer[4] v = 1..5; // [1, 2, 3, 4] - size known at compile time + integer[0] w = 3..3; // [] - lower equals upper, empty + integer[0] x = 5..1; // [] - lower exceeds upper, empty + + var integer n = 10; + integer[*] y = 1..n; // size only known at runtime + +The result is semantically a deep copy, independent of any variables used to +compute the bounds. + +**Special case: inside an indexing expression.** +When ``..`` appears inside square brackets as part of an index operation, it +takes on a different role: it denotes a *slice* of an existing array rather +than producing a standalone integer array. See :ref:`sssec:array_ops` for the +full slicing semantics. + +.. _ssec:expressions_stride: + +Stride Operator (``by``) +------------------------ + +The ``by`` operator strides through an array, selecting every *step*-th +element starting from the first, and returns a new independent array whose +elements are deep-copied from the source. + +Syntax:: + + by + +Given a source array of ``N`` elements and a step ``s``, the result contains +``N / s`` elements (integer division), selecting elements at positions +1, 1+s, 1+2s, and so on. + +The step must be a positive ``integer``. If the step expression is a +:ref:`constexpr `, a non-positive value is a compile-time +error; otherwise it is a runtime error. + +:: + + integer[8] v = 1..9; + integer[4] a = v by 2; // [1, 3, 5, 7] + integer[2] b = v by 3; // [1, 4] + +The ``by`` operator is most commonly combined with ``..`` to produce +arithmetic sequences. When both bounds and the step are literals, all sizes +are statically known: + +:: + + integer[4] odds = 1..9 by 2; // [1, 3, 5, 7] + integer[4] evens = 2..10 by 2; // [2, 4, 6, 8] + .. 
_ssec:expressions_generators: Generators @@ -62,15 +131,15 @@ This additional expression is used to create the generated values. For example: integer[10] v = [i in 1..10 | i * i]; /* v[i] == i * i */ - integer[2][3] M = [i in 1..2, j in 1..3 | i * j]; - /* M[i][j] == i * j */ + integer[2, 3] M = [i in 1..2, j in 1..3 | i * j]; + /* M[i, j] == i * j */ The expression to the right of the bar (``|``), is used to generate the value at the given index. Let ``T`` be the type of the expression to the right of the bar (``|``). Then, if the domain of the generator is an array of size ``N``, the result will be a array of size ``N`` with element type ``T``. Otherwise, if the domain of the -generator is a matrix of size ``N`` x ``M``, the result will be a matrix of size +generator is an N-D array of size ``N`` x ``M``, the result will be an array of size ``N`` x ``M`` with element type ``T``. Generators may be nested, and may be used within domain expressions. For instance, the generator below diff --git a/gazprea/spec/functions.rst b/gazprea/spec/functions.rst index 5cc1403..1cd01ff 100644 --- a/gazprea/spec/functions.rst +++ b/gazprea/spec/functions.rst @@ -5,11 +5,12 @@ Functions A function in *Gazprea* has several requirements: -1. All of the arguments are implicitly ``const``, and can not be mutable. +1. All of the arguments are implicitly ``const``, and can not be mutable or + mutated within the function. 2. Function arguments cannot contain type qualifiers. Including a type qualifier with a function argument should result in a ``SyntaxError``. -3. Argument types must be explicit. Inferred size arrays are allowed +3. Argument types must be explicit. Dynamic sized arrays are allowed 4. Functions can not perform any I/O. @@ -180,7 +181,10 @@ The arguments and return value of functions can have both explicit and inferred } -Like Rust, array *slices* may be passed as arguments: +Array *slices* (see :ref:`sssec:array_ops`) may be passed as arguments. 
+Since slices semantically produce a deep copy, they are treated as +``const`` values and +may only be passed to ``const`` (by-value) parameters: :: @@ -191,17 +195,22 @@ Like Rust, array *slices* may be passed as arguments: function slicer() returns real[*] { integer[10] a = 1..10; - var vector two_halves = to_real_vec(a[1..5]); - two_halves.append(to_real_vec(a[6..])); - return two_halves; + real[*] first_half = to_real_vec(a[1..5]); + real[*] second_half = to_real_vec(a[5..10]); + return first_half || second_half; } Remember that all function parameters are ``const`` in *Gazprea*, so that all functions are pure. That means that while it is legal to pass arrays and slices -*be reference*, the array contents cannot be modified inside the function, +*by reference*, the array contents cannot be modified inside the function, because the change would be visible outside the function. You must check that the ``const`` requirement is honored. +**Note**: There are ways to get around the restrictions imposed on passing +slices, such as spreading a slice of an array into a new variable declaration. +For details, see :ref:`sec:value_categories` and +:ref:`sssec:array_lvalue`. + .. _ssec:function_namespacing: Function Namespacing @@ -213,8 +222,5 @@ gazprea program, nor can you forward declare the same function twice. Additionally, functions share the following namespaces: -- The ``struct`` namespace: you cannot have a struct and function with the same - name in the same gazprea program. - - The ``procedure`` namespace: You cannot have a procedure and function with the same name in the same gazprea program. 
diff --git a/gazprea/spec/globals.rst b/gazprea/spec/globals.rst index ee294a4..1c3686a 100644 --- a/gazprea/spec/globals.rst +++ b/gazprea/spec/globals.rst @@ -3,32 +3,29 @@ Globals ======= -Valid global scope statements inclulde: +The legal statements in the global scope are: -* Variable Declarations -* Struct Declarations -* Function and Procedure Declarations -* Function and Procedure Prototypes -* Typealias +* Function/Procedure Prototypes +* Function/Procedure Declarations +* Global Constants -All global statements are considered declarations. Global statements may occur -in any order, given respective symbols are defined before being referenced. - -Variable Declarations -===================== - -In *Gazprea* values can be assigned to a global identifier. All globals +In *Gazprea* values can be assigned to a global identifier. This is a Global +Constant. All globals must be immutable (``const``). If a global identifier is declared with the ``var`` specifier, then an error should be raised. This restriction is in place since mutable global variables would ruin functional purity. If functions have access to mutable global state then we can not guarantee their purity. -Globals must be initialized, but the initialization expressions may only contain -a single _scalar_ literal. That means that functions and even previously defined globals may not -appear on the RHS of a global declaration. The reason is because it is very difficult to -evaluate variables and functions at compile time. Global expression evaluation could -be deferred to runtime, but that has the disadvantage of changing errors from compile -time to run time. +Globals must be initialized with a valid :ref:`constant expression `. +This requirement ensures that the value of every global can be determined by +the compiler before the program runs. This restriction is in place to support +functional purity and enable compile-time optimizations. 
As a result of this +rule: +* Functions, procedures, or I/O operations may not appear in a global's + initializer. +* Globals cannot have a dynamically-sized array type (e.g., ``integer[*]``), + as their size cannot be determined at compile time. +* All globals are implicitly ``constexpr``. diff --git a/gazprea/spec/keywords.rst b/gazprea/spec/keywords.rst index 4c83471..573aa85 100644 --- a/gazprea/spec/keywords.rst +++ b/gazprea/spec/keywords.rst @@ -20,8 +20,6 @@ not be used by a programmer. - character -- columns - - const - continue @@ -40,8 +38,6 @@ not be used by a programmer. - integer -- length - - loop - not @@ -58,7 +54,7 @@ not be used by a programmer. - reverse -- rows +- shape - std_input @@ -68,8 +64,6 @@ not be used by a programmer. - string -- struct - - true - tuple @@ -78,8 +72,6 @@ not be used by a programmer. - var -- vector - - while - xor diff --git a/gazprea/spec/procedures.rst b/gazprea/spec/procedures.rst index af95cdb..89ca9d4 100644 --- a/gazprea/spec/procedures.rst +++ b/gazprea/spec/procedures.rst @@ -98,11 +98,22 @@ For example: var z = not p(); /* Legal, depending on the return type of p */ var u = p() + p(); /* Illegal */ +In particular, a procedure call may not appear as a member of a composite +literal or as the operand of an explicit cast inline with other expressions: + +:: + + /* p returns integer, q returns real */ + var t = (p(), 42); /* Illegal: procedure call inside tuple literal */ + var c = as(p()) + 1; /* Illegal: cast result used in binary expression */ + var r = as(p()); /* Legal: cast is the only operation applied */ + These restrictions are made by *Gazprea* in order to allow for more optimizations. Procedures without a return clause may not be used in an expression. *Gazprea* should raise an error in such a case. + :: /* p is some procedure with no return clause */ @@ -114,7 +125,21 @@ Procedure Declarations ---------------------- Procedures can use :ref:`forward declaration ` -just like functions. 
+just like functions. Parameter ``var`` qualifiers are part of +the procedure's type signature and **must** appear in both the prototype and +the definition. A prototype that omits ``var`` on a parameter that the +definition declares ``var`` is a type-signature mismatch and a compile-time +error. Otherwise parameters default to being ``const`` qualified. + +:: + + /* Prototype — var qualifier required where the definition uses it */ + procedure increment(var integer x); + + /* Definition — must match */ + procedure increment(var integer x) { + x = x + 1; + } .. _ssec:procedure_main: @@ -151,17 +176,17 @@ call by reference, and are therefore *l-values* (pointers). :: - procedure byvalue(String x) returns integer { - return len(x); + procedure byvalue(string x) returns integer { + return shape(x)[1]; } - procedure byreference(var String x) returns integer { - return len(x); + procedure byreference(var string x) returns integer { + return shape(x)[1]; } procedure main() returns integer { const character[3] y = ['y', 'e', 's']; - integer size = byvalue(y); // legal - call byreference(y); // illegal + integer size = byvalue(y); // legal: character[3] promotes to string + call byreference(y); // illegal: mutable arguments require exact type match return 0; } @@ -172,9 +197,10 @@ Aliasing Since procedures can have mutable arguments, it would be possible to cause `aliasing `__. -In *Gazprea* aliasing of mutable variables is illegal. The only case -where aliasing of arguments is allowed is through disjoint tuple or struct field access. This -helps *Gazprea* compilers perform more optimizations. However, the compiler must be able +In *Gazprea* aliasing of mutable variables is illegal (the only case +where any aliasing is allowed is that tuple members can be accessed by +name, or by number, but this is easily spotted). This helps *Gazprea* +compilers perform more optimizations. 
However, the compiler must be able to catch cases where mutable memory locations are aliased, and an error should be raised when this is detected. For instance: @@ -226,6 +252,16 @@ aliasing. call p(t1, t1.1); /* p is some procedure with a tuple argument and a real argument */ +**Slices are not subject to aliasing analysis.** A slice expression (e.g. +``v[1..4]``) is semantically an *rvalue* that produces a deep copy with +no persistent +address (see :ref:`sssec:array_lrvalue` and :ref:`sec:value_categories`). +Because passing an rvalue as a mutable parameter is a compile-time error, a +slice cannot be a ``var`` argument, and so it can never be the source of +a mutable +alias. Two slice arguments derived from the same array are therefore always +safe to pass as ``const`` arguments simultaneously. + .. _ssec:procedure_vec_mat: Array Parameters and Returns @@ -248,8 +284,5 @@ gazprea program, nor can you forward declare the same procedure twice. Additionally, procedures share the following namespaces: -- The ``struct`` namespace: you cannot have a struct and function with the same - name in the same gazprea program. - - The ``function`` namespace: You cannot have a procedure and function with the same name in the same gazprea program. diff --git a/gazprea/spec/statements.rst b/gazprea/spec/statements.rst index 9116a7d..40bb9b5 100644 --- a/gazprea/spec/statements.rst +++ b/gazprea/spec/statements.rst @@ -60,13 +60,13 @@ This applies to arrays of any dimension. :: - var integer[*][*] M = [[1, 1], [1, 1]]; + var integer[*, *] M = [[1, 1], [1, 1]]; /* Change the entire matrix M to [[1, 2], [3, 4]] */ M = [[1, 2], [3, 4]]; /* Change a single position of M \*/ - M[1][2] = 7; /* M is now [[1, 7], [3, 4]] */ + M[1, 2] = 7; /* M is now [[1, 7], [3, 4]] */ Tuples also have a special unpacking syntax in *Gazprea*. 
A tuple’s field may be assigned to comma separated variables instead of a tuple diff --git a/gazprea/spec/type_casting.rst b/gazprea/spec/type_casting.rst index bddd522..1eabeaa 100644 --- a/gazprea/spec/type_casting.rst +++ b/gazprea/spec/type_casting.rst @@ -98,17 +98,17 @@ truncation can occur in all dimensions. For example: :: - real[2][2] a = [[1.2, 24], [-13e2, 4.0]]; + real[2, 2] a = [[1.2, 24], [-13e2, 4.0]]; // Convert to an integer matrix. - integer[2][2] b = as(a); + integer[2, 2] b = as(a); // Convert to integers and pad in both dimensions. - integer[3][3] c = as(a); + integer[3, 3] c = as(a); // Truncate in one dimension and pad in the other. - real[1][3] d = as(a); - real[3][1] e = as(a); + real[1, 3] d = as(a); + real[3, 1] e = as(a); .. _ssec:typeCasting_ttot: diff --git a/gazprea/spec/type_inference.rst b/gazprea/spec/type_inference.rst index 8877704..053be33 100644 --- a/gazprea/spec/type_inference.rst +++ b/gazprea/spec/type_inference.rst @@ -46,3 +46,14 @@ at least one of the qualifier or the type to be present: x = 2; // assignment to undeclared variable? - illegal var x; // can't infer type - illegal integer x; // const integer initialized to 0 - legal + +Type inference also applies when the initializer is a procedure call. The +compiler synthesises the variable's type from the procedure's declared return +type: + +:: + + procedure get_count() returns integer { ... } + + var n = get_count(); // n is inferred as var integer + const m = get_count(); // m is inferred as const integer diff --git a/gazprea/spec/type_promotion.rst b/gazprea/spec/type_promotion.rst index c5a6325..948e03b 100644 --- a/gazprea/spec/type_promotion.rst +++ b/gazprea/spec/type_promotion.rst @@ -4,26 +4,120 @@ Type Promotion ============== Type promotion is a sub-problem of casting and refers to casts that happen -implicitly. - -Any conversion that can be done implicitly via promotion can also be done -explicitly via typecast expression. 
-The notable exception is array promotion to a higher dimension, which occurs as -a consequence of scalar to array promotion. +implicitly. Any conversion that can be done implicitly via promotion can also +be done explicitly via a typecast expression. + +.. _ssec:typePromotion_lattice: + +Type Lattice +------------ + +.. graphviz:: + + digraph TypeLattice { + rankdir=BT; + compound=true; + node [shape=box, fontname="Courier", style=filled, fillcolor=white]; + edge [fontname="Courier", fontsize=10]; + + // Scalar types + subgraph cluster_scalars { + label="Scalar Types"; + style=dashed; + boolean [label="boolean"]; + character [label="character"]; + integer [label="integer"]; + real [label="real"]; + } + + subgraph cluster_composite_types { + label="Composite Types"; + style=dashed; + //int_arr [label="integer[*]"]; + //real_arr [label="real[*]"]; + //bool_arr [label="boolean[*]"]; + char_arr [label="char[*]"]; + string [label="string"]; + generic_static_arr [label="U[*] (static)", shape=ellipse, style=dashed] + generic_dynamic_arr [label="U[*] (dynamic)", shape=ellipse, style=dashed] + generic_ragged_arr [label="U[..., *] (ragged)\nmust be explicitly initialized\nonly usable as data", shape=ellipse, style=dashed] + //generic_static_arr -> int_arr [dir=none] + //generic_static_arr -> real_arr [dir=none] + generic_dynamic_arr -> char_arr [dir=none] + //generic_static_arr -> bool_arr [dir=none] + + // String / character[*] - bidirectional + string -> char_arr [label="implicit", dir=both]; + generic_dynamic_arr -> string [dir=none] + + // arrays can implicitly promote to dynamic arrays but not vv + generic_static_arr -> generic_dynamic_arr [label="implicit"] + } + + subgraph cluster_aggregate_types { + label="Aggregate Types"; + style=dashed; + + // Anonymous tuple promotion (field-wise) + tup_tagged [label="tuple(name: U_1, name: T_2, ...)\ntagged"]; + tup_untagged [label="tuple(U_1, U_2, ...)\nuntagged"]; + tup_ptagged [label="tuple(name: U_1, U_2, ...)\npartially 
tagged"]; + tup_untagged -> tup_ptagged [label="implicit element-wise\ntype promotion"] + tup_ptagged -> tup_tagged [label="implicit promotion if:\n- field names match\n- field types match\n- field orders match"] + generic_tup [label="tuple (generic)", shape=ellipse, style=dashed]; + tup_tagged -> generic_tup + tup_ptagged -> generic_tup + tup_untagged -> generic_tup + } + + // The one scalar promotion + integer -> real [label="implicit"]; + + // Scalar-to-array (parametric - shown as a representative edge) + scalar_T [label="T (any scalar)", shape=ellipse, style=dashed]; + boolean -> scalar_T [lhead=cluster_scalars] + real -> scalar_T [lhead=cluster_scalars] + integer -> scalar_T [lhead=cluster_scalars] + character -> scalar_T [lhead=cluster_scalars] + + scalar_T -> generic_static_arr [ + label="broadcast\n(any compatible T)", + style=dashed, + ltail=cluster_composite_types + ]; + + union_type [label="U (union of all types)", shape=ellipse, style=dashed]; + scalar_T -> union_type + generic_static_arr -> union_type + generic_dynamic_arr -> union_type + generic_ragged_arr -> union_type + generic_tup -> union_type + + } + +The diagram above shows every implicit promotion *Gazprea* permits. An arrow +``A -> B`` means a value of type ``A`` can be silently converted to type ``B`` +without an explicit ``as<>`` cast. Paths not shown require an explicit cast or +are entirely forbidden. + +Solid edges represent concrete implicit promotions between named types. Dashed +nodes and edges represent parametric promotion rules that apply to any +conforming type. + +There are no other implicit promotions. In particular: + +- ``real`` does **not** promote to ``integer`` (truncation requires ``as<>``). +- ``boolean`` and ``character`` have no implicit promotions to any other type. +- Array types do not implicitly downcast to scalars. .. _ssec:typePromotion_scalar: Scalars ------- -The only automatic type promotion for scalars is ``integer`` to -``real``. 
This promotion is one way - a ``real`` cannot be automatically -converted to ``integer``. - -Automatic type conversion follows this table where N/A means no implicit -conversion possible, id means no conversion necessary, -``as(var)`` means var of type "From type" is converted to type -"toType" using semantics from . +The only automatic type promotion for scalars is ``integer`` to ``real``. +This promotion is one-way: a ``real`` cannot be automatically converted to +``integer``. +----------+-----------+---------+-----------+---------+---------------+ | | **To type** | @@ -42,88 +136,71 @@ conversion possible, id means no conversion necessary, .. _ssec:typePromotion_stoa: Scalar to Array --------------------------- +--------------- -All scalar types can be promoted to arrays that have an internal type that the -scalar can be :ref:`converted to implicity `. -This can occur when an array is used in an operation with a scalar value. - -The scalar will be implicitly converted to an array of -equivalent dimensions and equivalent internal type. For example: +Any scalar type can be promoted to an array whose element type is compatible +with the scalar (per the scalar lattice above). This occurs when a scalar is +used in an operation with an array: the scalar is broadcast to match the +array's shape. :: integer i = 1; - integer[*] v = [1, 2, 3, 4, 5]; - integer[*] res = v + i; - - res -> std_output; + integer[5] v = [1, 2, 3, 4, 5]; + integer[5] res = v + i; -would print the following: + res -> std_output; // [2 3 4 5 6] -:: - - [2 3 4 5 6] +Other examples:: -Other examples: - -:: + 1 == [1, 1] // true - scalar broadcast into equality check + 1..2 || 3 // [1, 2, 3] - 3 promoted to integer[1] then concatenated - 1 == [1, 1] // True - 1..2 || 3 // [1, 2, 3] +Note that an array can never be downcast to a scalar, even with an explicit +cast. -Note that an array can never be downcast to a scalar, -even if type casting is used.
Also note that matrix multiply imposes strict -requirements on the dimensionality of the the operands. The consequence is -that scalars can only be promoted to a matrix if the matrix multiply -operand is a square matrix (:math:`m \times m`). +.. _ssec:typePromotion_tuple: Tuple to Tuple -------------- -Tuples may be promoted to another tuple type if it has an equal number of -internal types and the original internal types can be implicitly -converted to the new internal types. For example: +An anonymous tuple may be implicitly promoted to another anonymous tuple type +if both tuples have the same number of fields and each source field can be +implicitly promoted to the corresponding destination field type (per the scalar +lattice above). + +Equivalently named fields are necessary, but not sufficient for implicit +promotion. If promoting a named tuple to another named tuple, names and +types must both match. If promoting a partially tagged tuple to another +partially tagged tuple, names, fields +and orders must all match between the two tuples. See +:ref:`sssec:tuple_casting` for additional elaboration, including the behaviour of +mixed (partially-named) tuples. :: tuple(integer, integer) int_tup = (1, 2); - tuple(real, real) real_tup = int_tup; - - tuple(char, integer, boolean[2]) many_tup = ('a', 1, [true, false]); - tuple(char, real, boolean[2]) other_tup = many_tup; + tuple(real, real) real_tup = int_tup; // Legal: anonymous, integer -> real -If initializing a variable with a tuple via :ref:`sec:typeInference`, the -variable is assumed to be the same type. -Therefore, tuple elements also copied accordingly. For example: +Two-sided promotion can occur when comparing anonymous tuples whose element +types differ. 
Each side is promoted to the common type before comparison: :: - tuple(real, real) foo = (1, 2); - tuple(real, real) bar = (3, 4); - - var baz = foo; - baz.1 -> std_output; // 1 - baz.2 -> std_output; // 2 - - baz = bar; - baz.1 -> std_output; // 3 - baz.2 -> std_output; // 4 - - -It is possible for a two sided promotion to occur with tuples. For example: - -:: + boolean b = (1.0, 2) == (2, 3.0); // (real, real) == (real, real) - boolean b = (1.0, 2) == (2, 3.0); +.. _ssec:typePromotion_string: Character Array to/from String ------------------------------- -A ``string`` can be implicitly converted to a vector of ``character``\ s and vice-versa (two-way type promotion). +A ``string`` can be implicitly converted to a ``character[*]`` and vice-versa. +This bidirectional promotion reflects that ``string`` is structurally a +``character[*]`` wrapper (see :ref:`ssec:string`). The compiler preserves the +type distinction for output-formatting purposes. :: - string str1 = "Hello"; /* str1 == "Hello" */ - character[*] chars = str1; /* chars == ['H', 'e', 'l', 'l', 'o'] */ - string str2 = chars || [' ', 'W', 'o', 'r', 'l', 'd']; /* str2 == "Hello World" */ + string str1 = "Hello"; + character[5] chars = str1; // string -> character[5] + string str2 = chars || [' ', 'W', 'o', 'r', 'l', 'd']; // character[*] -> string diff --git a/gazprea/spec/typedef.rst b/gazprea/spec/typedef.rst index 754edad..7d464be 100644 --- a/gazprea/spec/typedef.rst +++ b/gazprea/spec/typedef.rst @@ -40,7 +40,7 @@ consistency: typealias tuple(character[64], integer, real) student_id_grade; student_id_grade chucky_cheese = ("C. 
Cheese", 123456, 77.0); - typealias integer[2][3] two_by_three_matrix; + typealias integer[2, 3] two_by_three_matrix; two_by_three_matrix m = [i in 1..2, j in 1..3 | i + j]; Type aliases of arrays with inferred sizes are allowed, but declarations diff --git a/gazprea/spec/types.rst b/gazprea/spec/types.rst index 80bd4a8..74b4f44 100644 --- a/gazprea/spec/types.rst +++ b/gazprea/spec/types.rst @@ -11,8 +11,5 @@ Types types/integer types/real types/tuple - types/struct types/array - types/vector types/string - types/matrix diff --git a/gazprea/spec/types/array.rst b/gazprea/spec/types/array.rst index a0ad9db..930a0cb 100644 --- a/gazprea/spec/types/array.rst +++ b/gazprea/spec/types/array.rst @@ -1,403 +1,330 @@ .. _ssec:array: Arrays -------- +------ -Arrays are fixed size collections, where each element of the array has the -same type. Arrays can contain any of *Gazprea*'s base types (``boolean``, -``integer``, ``real``, and ``character``). +Arrays are ordered, homogeneous collections of elements. *Gazprea*'s array +system offers a unified syntax for +statically-sized, dynamically-sized, and multi-dimensional arrays. + +An array's elements can be of any single type, including base types ( +``boolean``, +``integer``, ``real``), compound types (``tuple``), and other arrays. + +.. _sssec:array_lrvalue: + +L-values and R-values +~~~~~~~~~~~~~~~~~~~~~ + +Every expression in *Gazprea* has a **value category**: either an *lvalue* +or an *rvalue*, which governs the role the expression may take and +how the result of the expression is stored. A full +discussion of value categories, including their relationship to the richer +C++ taxonomy (glvalue, xvalue, prvalue), is given in +:ref:`sec:value_categories`. + +For arrays the key consequence is that **slice expressions are rvalues**. +A slice such as ``v[2..5]`` produces, semantically, a new, independent +deep copy of the +selected elements. 
Because it is an rvalue, a slice: + +- cannot appear on the left-hand side of an assignment, and +- cannot be passed as a ``var`` (mutable) parameter to a procedure. + +Attempting either is a compile-time error. Because slices semantically produce +deep copies +and carry no persistent address, they do not participate in aliasing analysis +(see :ref:`ssec:procedure_alias`). .. _sssec:array_decl: Declaration ~~~~~~~~~~~ -Aside from any type specifiers, the element type of the array is the first -portion of the declaration. An array is then declared using square brackets -immediately after the element type. +An array type is specified by providing a shape in +square brackets (``[]``) to a type. -If possible, initialization expressions may go through an implicit type -conversion. For instance, when declaring a real array that is -initialized with an integer value the integer will be promoted to a real -value, and then used as a scalar initialization of the array. -Be careful about type inference! If the type of the array is being inferred -from the right had side, the previous example would create an ``integer`` -array instead of a ``real`` array. +#. Static vs. Dynamic Sizing -#. Explicit Size Declarations + *Gazprea* does not distinguish between arrays that are statically-sized + (static, with static memory footprint) and arrays that can change size at + runtime (dynamic) **to the user**. However, there are important differences + to note while implementing the language that can provide optimization + opportunities if correctly identified: - When an array is declared it may be explicitly given a size. This - size can be given as any integer expression, thus the size of the - array may not be known until runtime. + - A **static dimension** is declared using an integer literal or a + :ref:`constant expression `. + - A **dynamic dimension** is declared using an asterisk (``*``). :: - [] ; - [] = ; - [] = ; + // A statically-sized array of 10 integers. 
+ // initialized to 0 elementwise + var integer[10] a; + + // A dynamically-sized array of integers. + // initialized to integer[0], with shape() = [0] + var integer[*] b; + + .. note:: + The ``*`` token is a syntactic marker meaning "size not declared here", + but it is **not** the sole property that makes an array dynamic. An array + is dynamic when its size cannot be determined at compile time: - The size of the array is given by the integer expression between the - square brackets. + - ``integer[x]`` is dynamic whenever ``x`` is not a + :ref:`constant expression `, no ``*`` is required. + - ``integer[*] a = [1, 2, 3]`` may be treated as **static** by the + compiler because the initialiser literal has a known length of 3. + A conforming implementation is free to allocate ``a`` on the stack + just like ``integer[3] a = [1, 2, 3]``. - If the array is given a scalar value (``type-expr``) of the same element type then the - scalar value is duplicated for every single element of the array. + The distinction is opaque to users, but implementations can make + performance gains by identifying arrays that are statically sized + and that do not change size (memory footprint) at runtime. - An array may also be initialized with another array. Initialization occurs element-wise, - with the RHS element type's initialization semantics applying from left to right. - If the LHS array is initialized using a RHS array that is too small then the LHS array will - be padded with zeros. However, if the LHS array is initialized with a RHS - array that is too large then a ``SizeError`` should be thrown at - compile-time or run-time. Check the :ref:`ssec:errors_sizeErrors` section to know when you - should throw the error. + **HINT**: This will figure as a part of performance testing. -#. Inferred Size Declarations +#. N-Dimensional Arrays - If an array is assigned an initial value when it is declared, then - its size may be inferred. 
There is no need to repeat the size in the - declaration because the size of the array on the right-hand side is - known. + Multi-dimensional arrays are declared by providing a comma-separated list of + dimension specifiers (the shape). There are two restrictions on which + dimensions can be dynamic: (i) there may be only one (1) dynamic + dimension per n-d array, and (ii) the last dimension of an n-d array with n > 1 + cannot be dynamic. :: - [*] = ; + // A 3x4 2d-array of real numbers. + var real[3, 4] matrix; + + // A dynamic list of static 3-element integer vectors. + var integer[*, 3] vectors; + + // a jagged array definition + var integer[3, *] jagged; // illegal, compile time error + var tuple(integer[*], integer[*], integer[*]) jagged; // equivalent #. Inferred Type and Size - It is also possible to declare an array with an implied type and - length using the var or const keyword. This type of declaration can only be - used when the variable is initialized in the declaration, otherwise - the compiler will not be able to infer the type or the size of the - array. + When initializing a variable with an array literal, its type and size can + be inferred by the compiler using ``var``. The resulting array is always + statically-sized unless *any* initializer contains a dynamic dimension + or is a dynamically-sized array. :: - integer[*] v = [1, 2, 3]; - var w = v + 1; + // v is inferred as type integer[3]. + var v = [1, 2, 3]; + // w is inferred as type real[2, 2]. + var w = [[1.0, 2.0], [3.0, 4.0]]; - In this example the compiler can infer both the size and the type of - ``w`` from ``v``. The size may not always be known at compile time, so this - may need to be handled during runtime. + // x is inferred as type integer[5]. + var integer[*] dyn = [1, 2, 3, 4, 5]; + var x = [...dyn]; ..
_sssec:array_constr: Construction ~~~~~~~~~~~~ -An array value in *Gazprea* may be constructed using the following -notation: +An array value is constructed using a comma-separated list of expressions +within square brackets. All elements must share a common promotable type. +The element type of an array literal without a declared type is the top-most type in the type +hierarchy that every element can be *implicitly* promoted to. An element that cannot be +promoted to that type results in a compile-time type error. :: - [expr1, expr2, ..., exprN] + [1, 2, 3] // An integer array + [1, 2.5, 3] // A real array (integer 1 is promoted) + [(1, true), (2, false)] // An array of tuples + +*Gazprea* supports empty array literals (``[]``). The literal has no inherent +type and acquires its element type from the declared variable type. +A dynamic array (``integer[*]``) initialised with ``[]`` starts as an empty, +growable array. A static array of size zero (``integer[0]``) is also legal, +though of limited practical use. Any other static size is a compile-time +``SizeError``. -Each ``expK`` is an expression with a compatible type. In the simplest -cases each expression is of the same type, but it is possible to mix the -types as long as all of the types can be promoted to a common type. For -instance it is possible to mix integers and real numbers. +Because ``[]`` carries no element type, **type inference cannot be used with +an empty array literal.** A declaration of the form ``var x = []`` is a +compile-time ``TypeError`` since the compiler has no information from which to +derive the element type of ``x``. :: - real[*] v = [1, 3.3, 5 * 3.4]; + var integer[*] a = []; // Legal: dynamic empty array + integer[0] b = []; // Legal: static array of size zero (not very useful) + var integer[5] c = []; // Illegal: size mismatch, static array needs 5 elements + var d = []; // Illegal: element type cannot be inferred from [] +..
_sssec:array_spread: -It is also possible to construct a single-element array using this -method of construction. +Spread Operator +~~~~~~~~~~~~~~~ -:: +The spread operator (``...``) provides a concise, declarative way to construct +a new array by unpacking elements from existing arrays. It can be used multiple +times within an array literal and can be combined with other elements. - real[*] v = [7]; +The spread operator is a syntactic feature **exclusive** to array literals. +It is +evaluated left-to-right. +:: -*Gazprea* **DOES** support empty arrays. + var integer[2] a = [1, 2]; + var integer[3] b = [3, 4, 5]; -:: + // c becomes [0, 1, 2, 3, 4, 5, 6] + var integer[7] c = [0, ...a, ...b, 6]; - real[*] v = []; /* Should create an empty array */ +When constructing a static array, the compiler must be able to verify the final +size at compile time. Spreading a dynamic array into a static array is a +compile-time size error. See :ref:`sec:constexpr` for more details. .. _sssec:array_ops: Operations ~~~~~~~~~~ -#. Array Operations and functions - - a. length - - The number of elements in an array is given by the built-in - functions ``length``. For instance: - - :: - - integer[*] v = [8, 9, 6]; - integer numElements = length(v); - - - In this case ``numElements`` would be 3, since the array ``v`` - contains 3 elements. - - b. Concatenation - - Two arrays with the same element type may be concatenated into a - single array using the concatenation operator, ``||``. For - instance: - - :: - - [1, 2, 3] || [4, 5] // produces [1, 2, 3, 4, 5] - [1, 2] || [] || [3, 4] // produces [1, 2, 3, 4] - - - Concatenation is also allowed between arrays of different element - types, as long as one element type is coerced automatically to the - other. For instance: - - :: - - integer[3] v = [1, 2, 3]; - real[3] u = [4.0, 5.0, 6.0]; - real[6] j = v || u; - - - would be permitted, and the integer array ``v`` would be promoted to - a real array before the concatenation. 
- - Concatenation may also be used with scalar values. In this case - the scalar values are treated as though they were single element - arrays. - - :: - - [1, 2, 3] || 4 // produces [1, 2, 3, 4] - 1 || [2, 3, 4] // produces [1, 2, 3, 4] - - - An interesting corollary to array-scalar concatenation is that - two scalars can be concatenated to produce an array: - - :: - - integer[3] v = 1 || 2 || 3; // produces [1, 2, 3] - - - Remember that arrays have a fixed length, which means you cannot grow an - array by concatenating elements to the end: - - :: - - var integer[*] growme = [0]; // length is now 1 - var integer i = 1; - loop while (i < 10) { - growme = growme || i; // illegal: SizeError - i = i + 1; - } - - - c. Dot Product - - Two arrays with the same size and a numeric element type(types with - the ``+``, and ``\*`` operator) may be used in a dot product operation. - For instance: +#. Indexing and Slicing - :: + - **Indexing:** Elements of an N-dimensional array are accessed using a + comma-separated list of 1-based integer indices. Negative indices count + from the end of a dimension. + - **Slicing (Deep Copy):** A slice expression creates a **new, independent + array** by performing a **deep copy** of a segment of an existing array. + The resulting array has its own memory, and modifications to it will + never affect the original array. This behavior is consistent with + *Gazprea*'s rule that all assignments are deep copies. - integer[3] v = [1, 2, 3]; - integer[3] u = [4, 5, 6]; + A slice expression is an **r-value**, meaning it produces a value and + cannot be the target of an assignment. For N-D arrays, slicing is only + permitted on the last dimension. - /* v[1] * u[1] + v[2] * u[2] + v[3] * u[3] */ - /* 1 * 4 + 2 * 5 + 3 * 6 &=& 32 */ - integer dot = v ** u; /* Perform a dot product */ + .. note:: - - d. Range - - The ``..`` operator creates an integer array holding the specified range - of integer values. 
- This operator must have an expression resulting in an integer on both - sides of it. These integers mark the *inclusive* upper and lower bounds - of the range. - - For example: - - :: - - 1..10 -> std_output; - (10-8)..(9+2) -> std_output; - - prints the following: - - :: - - [1 2 3 4 5 6 7 8 9 10] - [2 3 4 5 6 7 8 9 10 11] - - The number of integers in a range may not be known at compile time when - the integer expressions use variables. In another example, assuming at - runtime that ``i`` is computed as -4: - - :: - - i..5 -> std_output; - - prints the following: - - :: - - [-4 -3 -2 -1 0 1 2 3 4 5] - - Therefore, it is *valid* to have bounds that will produce an empty - array because the difference between them is negative. - - d. Indexing - - An array may be indexed in order to retrieve the values stored in - the array. An array may be indexed using integers. - *Gazprea* is 1-indexed, so the first element of an array is at index 1 - (as opposed to index 0 in languages like *C*). For instance: - - :: - - integer[3] v = [4, 5, 6]; - integer x = v[2]; /* x == 5 */ - integer y = [4,5,6][3] /* y == 6 */ - - Like Python, *Gazprea* allows negative indices, which are interpreted as - starting from the _back_ of the array instead of the front: - - :: - - integer[3] v = [4, 5, 6]; - integer x = v[-2]; /* x == 5 */ - integer y = [4,5,6][-1] /* y == 6 */ - - Out of bounds indexing should cause an error. - - e. Stride - - The ``by`` operator is used to specify a step-size greater than 1 when - indexing across an array. It produces a new array with the values - indexed by the given stride. For instance: - - :: - - integer[*] v = 1..5 by 1; /* [1, 2, 3, 4, 5] */ - integer[*] u = v by 1; /* [1, 2, 3, 4, 5] */ - integer[*] w = v by 2; /* [1, 3, 5] */ - integer[*] l = v by 3; /* [1, 4] */ - integer[*] s = v by 4; /* [1, 5] */ - - d. Slices - - An array may be indexed by a range to create a new array that is a *slice* - of the original. 
The left hand index is inclusive, while the right is exclusive. - - :: - - integer[*] a = 0..10 by 2; /* a = [0, 2, 4, 6, 8, 10] */ - integer[2] x = a[2..4]; /* x == [2, 4] */ - - Note that for slices only a stride of 1 is allowed. - For indexing purposes three additions are made to range syntax: - - +---------+---------------------------------+ - | | Interpretation | - +---------+---------------------------------+ - + `..` | all elements | - +---------+---------------------------------+ - + `i..` | ith to nth elements | - +---------+---------------------------------+ - + `..-i` | first to n-i-1th elements | - +---------+---------------------------------+ - + `i..j` | i to jth elements | - +---------+---------------------------------+ - Examples: - - :: - - integer[*] a = 0..10 by 2; /* a = [0, 2, 4, 6, 8, 10] */ - integer x = a[..4]; /* x == [0, 2, 4] */ - integer y = a[4..]; /* x == [6, 8, 10] */ - integer z = a[..-1]; /* x == [0, 2, 4, 6, 8] */ - -#. Operations of the Element Type - - Unary operations that are valid for the Element type of an array may be - applied to the array in order to produce an array whose result is - the equivalent to applying that unary operation to each element of - the array. For instance: + Implementations are **not** required to perform an eager copy when a + slice is passed to a function or procedure. A lazy strategy such as + Copy-On-Write is permitted because slices are always passed as + ``const`` parameters and therefore cannot be mutated by the callee. + See :ref:`sec:impl_slice_passing` for guidance. :: - boolean[*] v = [true, false, true, true]; - boolean[*] nv = not v; + var integer[5] a = [10, 20, 30, 40, 50]; + // Legal: Create a new array 'b' from a slice of 'a'. + var integer[3] b = a[2..5]; // b is [20, 30, 40] - ``nv`` would have a value of - ``[not true, not false, not true, not true] = [false, true, false, false]``. 
- - Similarly most binary operations that are valid to the element type of a - array may be also applied to two arrays. When applied to two - arrays of the same size, the result of the binary operation is a - array formed by the element-wise application of the binary operation - to the array operands. - - :: + // 'b' is independent of 'a'. + b[1] = 99; // 'a' remains [10, 20, 30, 40, 50] - [1, 2, 3, 4] + [2, 2, 2, 2] // results in [3, 4, 5, 6] + // Illegal: A slice is not an l-value and cannot be assigned to. + a[1..3] = [1, 2]; // COMPILE-TIME ERROR +#. shape - Attempting to perform a binary operation between two arrays of - different sizes should result in a ``SizeError``. + The built-in function ``shape`` returns the shape of an array as a + dynamically-sized integer array (``integer[*]``). - When one of the operands of a binary operation is an array and the - other operand is a scalar, the scalar value must first - be promoted to an array of the same size as the array operand and - with the value of each element equal to the scalar value. For example: + For n-d arrays, ``shape`` returns the shape of the array using -1 + as a marker value for dynamic dimensions. :: - [1, 2, 3, 4] + 2 // results in [3, 4, 5, 6] + var integer[10] a; + shape(a) // returns [10] + var real[3, 4] b; + shape(b) // returns [3, 4] - Additionally the element types of arrays may be promoted, for instance - in this case the integer array must be promoted to a real array in - order to perform the operation: - - :: - - [1, 2, 3, 4] + 2.3 // results in [3.3, 4.3, 5.3, 6.3] + var character[5, *, 4] c; + shape(c) // returns [5, -1, 4] +#. Concatenation (``||``) - The equality operation is the exception to the behavior of the binary - operations. Instead of producing a boolean array, an equality - operation checks whether or not all of the elements of two arrays - are equal, and return a single boolean value reflecting the result of - this comparison. 
+ The ``||`` operator concatenates two arrays. This operation is primarily + useful for **dynamically-sized arrays**. :: - [1, 2, 3] == [1, 2, 3] + var integer[*] a = [1, 2]; + a = a || [3, 4]; // a is now [1, 2, 3, 4] + var integer[*, 4] b; // integer[0, 4], growable + b = b || [1, 2, 3, 4]; // integer[1, 4] - yields ``true`` + var integer[1, *, 2] c; // integer[1, 0, 2] = [[]]; + c = c || [[[1, 2]]]; // c = [[[1, 2]]] - :: + The :ref:`spread operator ` is the preferred method for + composition of arrays. Note that working with a dynamically-sized array + implies that + the size check must be performed at runtime; however, some arrays will have + a constant size obtainable at compile time. - [1, 1, 3] == [1, 2, 3] +#. Element-wise Operations and Broadcasting + Unary and binary operations (e.g., ``not``, ``+``, ``-``, ``*``) can be applied + element-wise to arrays. - yields ``false`` + - For operations between two arrays, their dimensions must be compatible. + - For operations between an array and a scalar, the scalar is **broadcast** + across the array. - The ``!=`` operation also produces a boolean instead of a boolean array. - The result is the logical negation of the result of the ``==`` operator. + *Gazprea* follows a simple "trailing dimensions" rule for broadcasting: an + array ``A`` can be broadcast over array ``B`` if ``A``'s dimensions are a suffix + of ``B``'s dimensions. + :: -Type Casting and Type Promotion -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + var integer[3, 4] m = ...; + var integer[4] v = [1, 2, 3, 4]; + var s = 10; + + var r1 = m + s; // Legal: scalar broadcast + var r2 = m + v; // Legal: [4] is a suffix of [3, 4]. v is added to each row. + + var integer[3] v2; + var r3 = m + v2; // Illegal: [3] is not a suffix of [3, 4]. + + The equality operators ``==`` and ``!=`` are an exception. They perform a + deep, element-wise comparison and return a single ``boolean`` value.
+ + These element-wise operations are fully supported for dynamic arrays where the + shape is regular (e.g., ``integer[*]``, ``integer[*, 5]``). Compatibility + checks can be performed either at runtime or compile time, and a ``SizeError`` + will be thrown if + the shapes are incompatible. + +.. _sssec:array_taxonomy: + +Array Type Summary +~~~~~~~~~~~~~~~~~~ + +The following table summarises the different array forms in *Gazprea*, their +declaration syntax, the meaning of each wildcard (``*``) position, and the +key restrictions that apply. + ++------------------------+-------------------+----------------------------------------------+------------------------------------+ +| **Form** | **Declaration** | **Description** | **Element-wise ops allowed?** | ++========================+===================+==============================================+====================================+ +| Static | ``T[N]`` | Size fixed at compile time. ``N`` must be a | Yes: size known at compile time. | +| | | literal or :ref:`constexpr `. | | ++------------------------+-------------------+----------------------------------------------+------------------------------------+ +| Static N-D | ``T[N, M]`` | All dimensions fixed at compile time. | Yes: checked at compile time. | ++------------------------+-------------------+----------------------------------------------+------------------------------------+ +| Dynamic 1-D | ``T[*]`` | Size unknown at compile time; grows | Yes: shape checked at runtime. | +| | | or shrinks at runtime. | | ++------------------------+-------------------+----------------------------------------------+------------------------------------+ +| Regular dynamic N-D | ``T[*, N]`` | Leading dimension(s) dynamic; final | Yes: shape checked at runtime. | +| | | dimension(s) static. All rows have the | | +| | | same fixed inner length. 
| | ++------------------------+-------------------+----------------------------------------------+------------------------------------+ -To see the types that an array may be cast and/or promoted to, see -the sections on :ref:`sec:typeCasting` and :ref:`sec:typePromotion` -respectively. diff --git a/gazprea/spec/types/integer.rst b/gazprea/spec/types/integer.rst index 094669e..6ea94ff 100644 --- a/gazprea/spec/types/integer.rst +++ b/gazprea/spec/types/integer.rst @@ -112,9 +112,17 @@ override precedence and create new atoms in an expression. +----------------+----------------+ +Overflow +~~~~~~~~ + +``integer`` arithmetic is checked at runtime. If the result of an operation +exceeds the range of a signed 32-bit integer (i.e. falls outside +−2,147,483,648 to 2,147,483,647), a runtime ``OverflowError`` is raised. +Overflow does **not** wrap silently. + Type Casting and Type Promotion ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ To see the types that ``integer`` may be cast and/or promoted to, see -the sections on :ref:`sec:typeCasting` and :ref:`sec:typePromotion` +the sections on :ref:`sec:typeCasting` and :ref:`sec:typePromotion` respectively. diff --git a/gazprea/spec/types/matrix.rst b/gazprea/spec/types/matrix.rst deleted file mode 100644 index 00ac607..0000000 --- a/gazprea/spec/types/matrix.rst +++ /dev/null @@ -1,130 +0,0 @@ -.. _ssec:matrix: - -Matrices --------- - -*Gazprea* supports two dimensional matrices as arrays of arrays. -Although the syntax and concepts are easily generalizable to many dimensions, -we are restricting the language to two dimensions for now. - -.. _sssec:matrix_decl: - -Declaration -~~~~~~~~~~~ - -Matrix declarations are similar to array declarations, the difference -being that matrices have two dimensions instead of one. 
The following are -valid matrix declarations: - -:: - - integer[*][*] A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]; - integer[3][2] B = [[1, 2], [4, 5], [7, 8]]; - integer[3][*] C = [[1, 2], [4, 5], [7, 8]]; - integer[*][2] D = [[1, 2], [4, 5], [7, 8]]; - integer[*][*] E = [[1, 2], [4, 5], [7, 8]]; - -.. _sssec:matrix_constr: - -Construction -~~~~~~~~~~~~ - -A 2D matrix can be viewed as an array of arrays. -The elements in each array form a single row of the matrix. -All rows with fewer elements than the row of maximum row length are padded with -zeros on the right. Similarly, if the matrix is declared with a row -length larger than the number of rows provided, the bottom rows of the -matrix are zero. If the number of rows or columns exceeds the -amounts given in a declaration an error is to be produced. - -:: - - integer[*] v = [1, 2, 3]; - integer[*][*] A = [v, [1, 2]]; - /* A == [[1, 2, 3], [1, 2, 0]] */ - - -Similarly, we can have: - -:: - - integer[*] v = [1, 2, 3]; - integer[3][3] A = [v, [1, 2]]; - /* A == [[1, 2, 3], [1, 2, 0], [0, 0, 0]] */ - - -Also matrices can be initialized with a scalar value. -Initializing with a scalar value makes every element of the matrix equal -to the scalar. - -Gazprea supports empty matrices. - -:: - - integer[*][*] m = []; /* Should create an empty matrix */ - -.. _sssec:matrix_ops: - -Operations -~~~~~~~~~~ - -Multi-dimensional arrays have binary and unary operations of the element type -defined in the same manner as uni-dimensional arrays. -Unary operations are applied to every element of the matrix, and binary -operations are applied between elements with the same position in the arrays. - -The operators ==, and != also have the same behavior independent of the -dimensionality of the array. -These operations compare whether or not **all** elements of are equal. - -Two dimensional arrays have several special operations defined on them. 
-If the element type is numeric (supports addition and multiplication), -then matrix multiplication is supported using the operator \**. -Matrix multiplication is only defined between matrices with compatible element -types, and the dimensions of the matrices must be valid for performing matrix -multiplication. -Specifically, the number of columns of the first operand must equal the number -of rows of the second operand, e.g. an :math:`m \times n` matrix multiplied by -an :math:`n \times p` matrix will produce an :math:`m \times p` matrix. -If the dimensions are not correct a ``SizeError`` should be raised. - -Arrays of any dimension support the built in functions ``rows`` and ``columns``, -which when passed a 2D array yields the number of rows and columns in the -matrix respectively. For instance: - -:: - - integer[*][*] M = [[1, 1, 1], [1, 1, 1]]; - - integer r = rows(M); /* This has a value of 2 */ - integer c = columns(M); /* This has a value of 3 \*/ - - -Matrix indexing is done similarly to array indexing, however, two -indices must be used. Because matrices are arrays of arrays the indexing is -coposite: - -:: - - M[i][j] -> std_output; - - -The first index specifies the row of the matrix, and the second index -specifies the column of the matrix. The result is retrieved from the row -and column. Both the row and column indices must be integers. - -:: - - integer[*][*] M = [[11, 12, 13], [21, 22, 23]]; - - /* M[1, 2] == 12 */ - -As with arrays, out of bounds indexing is an error on Matrices. - - -Type Casting and Type Promotion -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -To see the types that matrix may be cast and/or promoted to, see -the sections on :ref:`sec:typeCasting` and :ref:`sec:typePromotion` -respectively. 
diff --git a/gazprea/spec/types/string.rst b/gazprea/spec/types/string.rst index 664f9b7..ab9fc3b 100644 --- a/gazprea/spec/types/string.rst +++ b/gazprea/spec/types/string.rst @@ -3,53 +3,52 @@ String ------ -A ``string`` is another object within *Gazprea*. Fundamentally, a ``string`` is -a ``vector`` of ``character``. -This means that, like a vector, a string behaves like a dynamically sized array, -but because it is an object *Gazprea* can provide type specific features. +A ``string`` is a distinct type in *Gazprea* that behaves as a wrapper around a +dynamically-sized ``character`` array. It is structurally equivalent to +``character[*]`` for all operations, but the type is preserved by the compiler +because it affects output formatting: a ``string`` written to an output stream +is printed as a sequence of characters (e.g. ``hello world``), while a +``character[*]`` is printed with array notation (e.g. ``[h e l l o]``). -String vectors behave a lot like character arrays, but there are several -differences between the two types: -an :ref:`extra literal style `, -the :ref:`result of a concatenation ` -and :ref:`behaviour when sent to an output stream `. +Bi-directional promotion between ``string`` and ``character[*]`` is implicit, +meaning a ``string`` can be assigned to a ``character[*]`` variable and vice +versa without an explicit cast. .. _sssec:string_decl: Declaration ~~~~~~~~~~~ -A string may be declared with the keyword ``string``. The same rules of -:ref:`vector declarations ` also apply to strings, which means -that all lenghts are inferred: +A string may be declared with the keyword ``string``. Because strings are +always dynamically sized, no length is specified in the declaration: :: - string = ; + string = ; .. _sssec:string_lit: Literals ~~~~~~~~ -Strings can be constructed in the same way as arrays using character literals. -*Gazprea* also provides a special syntax for string literals. 
A string literal -is any sequence of character literals (including escape sequences) in between -double quotes. For instance: +Strings can be constructed in the same way as character arrays by enclosing a +comma-separated list of character literals in square brackets. *Gazprea* also +provides a special string literal syntax: any sequence of characters (including +escape sequences) enclosed in double quotes. :: - string cats_meow = "The cat said \"Meow!\"\nThat was a good day.\n" + string cats_meow = "The cat said \"Meow!\"\nThat was a good day.\n"; -Although strings and character arrays look similar, they are still treated -differently by the compiler: +Although strings and character arrays look similar, they are treated differently +at output: :: character[*] carray = ['h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', '\n']; - string vec = carray; - carry -> std_output; - vec -> std_output; + string s = carray; + carray -> std_output; + s -> std_output; prints: @@ -65,30 +64,33 @@ prints: Operations ~~~~~~~~~~ -As character vectors, strings have all of the same operations defined on them as -the other array data types. -Remember that because a ``string`` and vector of ``character`` are fundamentally -the same, the concatenation operation may be used to concatenate values of the -two types. You may also append a slice of characters to a string using the -append method. -As well, a scalar character may be concatenated onto a string in the same way -as it would be concatenated onto an array of characters. -Note that because a ``string`` is a sub-type of ``vector``, concatenation may also -be accomplished with ``concat`` and ``push`` methods: +Because a ``string`` is structurally equivalent to ``character[*]``, all array +operations apply to strings. 
Concatenation uses the ``||`` operator: :: - var string letters = ['a', 'b'] || "cd"; - letters.concat("ef"); - letters.push('g'); - letters -> std_output; + var string greeting = "hello"; + var string full = greeting || " world"; + full -> std_output; // Prints: hello world -prints the following: +A ``string`` and a ``character[*]`` may be concatenated directly using ``||``, +since bi-directional promotion makes them compatible, and the result of +concatenating two strings can itself be concatenated further: :: - abcdefg + var string letters = ['h', 'e', 'l'] || "lo "; + var string full = letters || "world"; + full -> std_output; // Prints: hello world +A scalar ``character`` may also be concatenated onto a ``string`` through +scalar-to-array promotion: + +:: + + var string s = "abc"; + s = s || 'd'; + s -> std_output; // Prints: abcd Type Casting and Type Promotion ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/gazprea/spec/types/struct.rst b/gazprea/spec/types/struct.rst deleted file mode 100644 index 5a0a424..0000000 --- a/gazprea/spec/types/struct.rst +++ /dev/null @@ -1,157 +0,0 @@ -.. _ssec:struct: - -Structs -------- - -Like ``tuples``, a ``struct`` is a way of grouping multiple values with -different types into an aggregate data structure. -The main differences between tuples and structs are that the fields of a struct -are named, and the type signature of a struct is named as a user defined type. -Any type except ``tuple``, another ``struct`` and :ref:`streams` -may be stored within a struct. Also like tuples, structs must contain *at least two fields*. - -.. _sssec:struct_decl: - -Declaration -~~~~~~~~~~~ - -A struct is declared with the keyword ``struct`` followed by a *type name*, -followed by a parentheses-surrounded, comma-separated list of -*field declarations*. 
-Field declarations look identical to parameter declarations in functions, -and consist of a ```` pair: - -:: - - struct s1 (integer i, real r, integer[10] iv) t1; - struct Another (character char, real float, string[256] str, s1 struct_field); - var Another t2; - -The examples show two structs declared with types ``s1`` and ``another``. -Struct type ``s`` has three fields: ``i`` of type ``integer``, ``r`` of type -``real``, and ``iv`` of type ``integer[10]``. -Struct type ``another`` has four fields named ``char``, ``float``, ``str``, -and ``struct_field``. -The instance variables are ``t1`` and ``t2`` have types ``s1`` and ``another``, -respectively. - - -.. _sssec:struct_typealias: - - -Type Aliasing -~~~~~~~~~~~~~ - -A struct can be typealiased and used in any context a regular struct declaration may occur. Notably, the alias can only be used in a type positions, not literal constructors. - -:: - - typealias struct S(integer x, integer y) Pair; - - function add(Pair p1, Pair p2) returns Pair { - Pair p3 = S(p1.x + p2.x, p1.y + p2.y); // Pair can not be used in place of S - return p3; - } - - -.. _sssec:struct_acc: - -Access -~~~~~~ - - * ``field`` is a field within struct ``T`` - -For example: -:: - - struct s1 (integer i, real r, integer[10] iv) t1; - t1.i - t1.iv[2] - t1.r - -Struct fields can be used as both LVALs and RVALs, i.e. on either the left -or right hand side of an expression: - -:: - - y = x + t1.r; // Allowed - t1.iv[i] = type-expr; // Allowed - - -.. _sssec:struct_lit: - -Literals -~~~~~~~~ - -A ``struct`` literal is constructed by listing comma separated values for each -field in the struct, in the order defined in the struct's definition. 
-The value list is surrounded by parenthesis and prefaced by the struct type: - -:: - - struct S (integer i, character[5] c, integer[3] a3); - const S cs = S(x, "hello", [1, 2, 3]); - var S vs = S(0, ' ', 0); - struct V (integer i, real r, integer[10] arr) v = V(1, 2.1, [i in 1..10 | i]); - -The type of each value in the list must match the type of the corresponding -field definition in the struct. To save having to explicitly specify a value -for each index in an array, *Gazprea* allows a single scalar to be propagated -across all elements in the array. Finally, note that the field values may need -to be evaluated at run-time. - -.. _sssec:struct_ops: - -Operations -~~~~~~~~~~ - -The following operations are defined on ``struct`` instances. -In all of the usage examples, ``struct-type`` means some struct yielding -expression of a particular type, while ``id`` is a field within the struct. - -+------------+---------------+------------+--------------------------------+-------------------+ -| **Class** | **Operation** | **Symbol** | **Usage** | **Associativity** | -+------------+---------------+------------+--------------------------------+-------------------+ -| Access | dot | ``.`` | ``struct-type.id`` | left | -+------------+---------------+------------+--------------------------------+-------------------+ -| Comparison | equals | ``==`` | ``struct-type == struct-type`` | left | -+ +---------------+------------+--------------------------------+-------------------+ -| | not equals | ``!=`` | ``struct-type != struct-type`` | left | -+------------+---------------+------------+--------------------------------+-------------------+ - -Note that in the above table ``struct-type`` may only refer to a variable -instance for *Access*, while for *Comparison* at least one of the operands must -resolve to a struct type ``T``. 
-This allows struct instances to be compared to struct literals: - -:: - - struct Complex (real r, real i) c = Complex(r, 0.0); - if (c == Complex(0.0, i)) { } - -Two structs are equal when all fields within each struct have the same value. -It is an error to compare two structs of different types. - -Type Casting and Type Promotion -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -A struct itself cannot be cast or promoted. However, the fields within a struct -can be individually cast/promoted, as described in -sections :ref:`sec:typeCasting` and :ref:`sec:typePromotion`. - -.. _ssec:function_namespacing: - -Struct Namespacing --------------------- - -In *Gazprea*, struct declarations can occur in *any* scope. -This means that two struct types with the same name *can* coexist in the same -gazprea program so long as they are not in the same scope - -Additionally, ``structs`` share the following namespaces: - -- The ``procedure`` namespace: You cannot have a procedure and struct with - the same name in the same gazprea program. - -- The ``function`` namespace: You cannot have a function and struct with - the same name in the same gazprea program. diff --git a/gazprea/spec/types/tuple.rst b/gazprea/spec/types/tuple.rst index 99cf664..32ff16d 100644 --- a/gazprea/spec/types/tuple.rst +++ b/gazprea/spec/types/tuple.rst @@ -3,130 +3,198 @@ Tuples ------ -A ``tuple`` is a way of grouping multiple values with potentially different types into an aggregate data structure. Tuples are similar to :ref:`structs`, except that a tuple's fields are indexed instead of named. Tuples are often used to return multiple values from a function or procedure. Any type may be stored within tuples except structs and tuples. Additionally streams can not be stored in tuples. +A ``tuple`` is an ordered collection of values that groups multiple, potentially +different, types into a single compound value. + +The fields within a tuple can be anonymous or can be given explicit names. 
This +allows tuples to be used as simple, lightweight collections or as more descriptive, +self-documenting data structures. .. _sssec:tuple_decl: Declaration ~~~~~~~~~~~ -A tuple value is declared with the keyword ``tuple`` followed by a -parentheses-surrounded, comma-separated list of types. The list must -contain *at least two elements*. Tuples are *mutable*. For example: +A tuple type is declared using the ``tuple`` keyword followed by a +parenthesised, comma-separated list of field type specifiers. Each field may +optionally carry a name: :: - tuple(integer, real, integer[10]) t1; - tuple(character, real, character[256], real) t2; + // Anonymous fields - accessed by index only. + tuple(integer, real) a; -Note that while each tuple declaration defines a new type, the tuple type -is not named explcitly. Rather, it has a type *signature* ``(T1, T2, ...)``, -where ``T1, T2`` are the types of its members. -The number of fields in a ``tuple`` must be known at compile time. -This includes instances of :ref:`type inference`, where a variable is -declared without an explicit type signature using ``var`` or ``const`` -. -In this case, the variable must be initialised immediately with a literal whose -type is known at compile time. + // Named fields - accessed by index or by name. + tuple(integer x, real y) b; -.. _sssec:tuple_acc: +The default qualifier applies: a declaration without ``var`` is ``const``. -Access -~~~~~~ +**Type Identity** -The elements in a tuple are accessed using dot notation. Dot -notation can only be applied to tuple variables and *not* tuple literals. -Dot notation means an identifier followed by a period and then a literal -integer. Spaces are not allowed between elements in dot notation. -Field indices *start at one*, not zero. For example: - -:: +Field names are part of the type. The rules are: - t1.1 - t2.4 +- A **named field** contributes both its name and its type to the type identity. 
+ Two named fields at the same position are compatible only if they share the + same name. +- An **unnamed field** contributes only its type to the type identity. An + unnamed field at position *i* in one tuple is compatible with an unnamed field + at position *i* in another tuple based solely on type compatibility. +- A named field and an unnamed field at the same position are **never** + compatible, even if the underlying types match. -Tuple access can be used either to retrieve the element value for an expression -or to assign a new value to the element. +Therefore, two tuples whose fields have the same underlying types but different +names (or a mix of named and unnamed) are considered different, incompatible types: :: - y = x + t1.1; // Allowed - t1.1 = type-expr; // Allowed + // These three variables have different, incompatible types. + tuple(integer, real) a = (1, 2.0); // fully anonymous + tuple(integer x, real y) b = (x: 1, y: 2.0); // fully named + tuple(integer a, real b) c = (a: 1, b: 2.0); // different names from b + // Mixed: field 1 is named x, field 2 is anonymous, field 3 is named z. + // Mixed tuples are constructed positionally, without field labels. + tuple(integer x, real, character z) mixed = (1, 2.0, 'a'); + + // Incompatible with mixed: field 2 has a name (y) where mixed has none. + tuple(integer x, real y, character z) named = (x: 1, y: 2.0, z: 'a'); + + mixed == named; // ILLEGAL: field 2 is unnamed in mixed, named in named .. _sssec:tuple_lit: Literals ~~~~~~~~ -A tuple literal is constructed by grouping values together between -parentheses in a comma separated list. For example: +A tuple literal is constructed by grouping values together between parentheses +in a comma-separated list.
+ +**Fully named tuples** may use named field syntax, where each value is preceded +by its field name and a colon (``:``). Named literals may appear in any order, +since the names provide an unambiguous mapping to fields: + +:: - tuple(integer, character[5], integer[3]) my_tuple = (x, "hello", [1, 2, 3]); - var my_tuple = (x, "hello", [1, 2, 3]); - const your_tuple = (x, "hello", [1, 2, 3]); - tuple(integer, real, integer[10]) tuple_var = (1, 2.1, [i in 1..10 | i]); + // A literal of type tuple(integer x, real y) - names in order + (x: 10, y: 3.14) + + // Same type, names out of order - legal because all fields are named + (y: 3.14, x: 10) + +**Anonymous or mixed tuples must be constructed positionally.** When any field +in a tuple type is unnamed, the entire literal must list values in declaration +order with no field name labels: + +:: + + // tuple(integer, character, boolean) - all anonymous, positional only + (1, 'a', true) + + // tuple(integer x, real, character z) - mixed: positional only + (1, 2.0, 'a') + +.. note:: + + **Rationale.** Allowing named labels in a mixed-tuple literal would make + ordering ambiguous as soon as more than one field is unnamed. For example, + given ``tuple(integer x, real, character z)``, the literal + ``(z: 'a', 2.0, x: 1)`` looks as though it reorders fields, but the unnamed + ``real`` field has no label to anchor it; it could plausibly bind to position + 1, 2, or 3. Requiring fully positional construction for any tuple that contains + an unnamed field eliminates this ambiguity entirely and keeps the rule simple: + if you need named literals, name all of your fields. + +Duplicate field names within a single tuple literal are not allowed and will +result in a compile-time error. + +.. _sssec:tuple_access: + +Access +~~~~~~ + +Fields in a tuple are accessed using dot notation (``.``). *Gazprea* supports +dual access for named fields: + +1. **By Index:** All fields can be accessed by their 1-based integer index. +2.
**By Name:** If a field is named, it can also be accessed by its name. + +:: + + var point = (x: 10, y: 20); + + // Access by index + point.1 -> std_output; // Prints 10 + point.2 = 30; // Modify the second field + + // Access by name + point.x -> std_output; // Prints 10 + point.y = 40; // Modify the field named 'y' .. _sssec:tuple_ops: Operations ~~~~~~~~~~ -The following operations are defined on tuple values. In all of the -usage examples ``tuple-expr`` means some expression yielding tuples with the same type signature, -while ``int_lit`` is an integer literal as defined in :ref:`Integer Literals ` and ``tuple-inst`` is the -name of tuple instance as defined in :ref:`sec:identifiers`. - -+------------+---------------+------------+------------------------------+-------------------+ -| **Class** | **Operation** | **Symbol** | **Usage** | **Associativity** | -+------------+---------------+------------+------------------------------+-------------------+ -| Access | dot | ``.`` | ``tuple-inst.int_lit`` | left | -+------------+---------------+------------+------------------------------+-------------------+ -| Comparison | equals | ``==`` | ``tuple-expr == tuple-expr`` | left | -+ +---------------+------------+------------------------------+-------------------+ -| | not equals | ``!=`` | ``tuple-expr != tuple-expr`` | left | -+------------+---------------+------------+------------------------------+-------------------+ - -Note that in the above table ``tuple-expr`` may refer to a variable for access. -Accessing a literal could be replaced immediately with the scalar inside the tuple literal, however, ``tuple-expr`` may -refer to a literal in comparison operations to enable shorthand like this: +**Comparison** -:: +The equality (``==``) and inequality (``!=``) operators are defined for tuples. +Two tuples are considered equal if and only if: + +1. They have a compatible type (see :ref:`sssec:tuple_casting`). +2. All corresponding fields are pairwise equal. 
- if ((a, b) == (c, d)) { } +:: -Comparisons are performed pairwise. Two tuples are equal when for every expression pair, the equality operator returns true. -Two tuples are unequal when one or more expression pairs are unequal or the types mismatch. This table describes how the -comparisons are completed, where ``t1`` and ``t2`` are tuple yielding expressions including literals: + tuple(integer x, integer y) p1 = (x: 1, y: 2); + tuple(integer x, integer y) p2 = (x: 1, y: 2); + tuple(integer a, integer b) p3 = (a: 1, b: 2); + tuple(integer, integer) p4 = (1, 2); -============= ========================================= -**Operation** **Meaning** -============= ========================================= -``t1 == t2`` ``t1.1 == t2.1 and ... and t1.n == t2.n`` -``t1 != t2`` ``t1.1 != t2.1 or ... or t1.n != t2.n`` -============= ========================================= + p1 == p2; // true: same type, same values + p1 == p3; // ILLEGAL: incompatible types — field names differ (x/y vs a/b) + p1 == p4; // ILLEGAL: incompatible types — p1 has named fields, p4 has none +.. _sssec:tuple_casting: -.. _sssec:tuple_unpack: +Type Casting and Promotion +~~~~~~~~~~~~~~~~~~~~~~~~~~ -Unpacking -~~~~~~~~~ +**Implicit Promotion (Anonymous Fields Only)** -Any tuple expression may be assigned (unpacked) into multiple lvalues. If the size of -the tuple being unpacked does not match the number of lvalues being asigned, an ``AssignError`` -may be raised. There is no partial unpacking of tuples. +Implicit promotion between tuple types is permitted only at positions where +**both** the source and destination fields are unnamed. At such positions, the +normal scalar promotion rules apply (e.g. ``integer`` promotes to ``real``). +Named fields are never implicitly promoted; if either the source or destination +field carries a name, an explicit ``as<>`` cast is required for that conversion. 
:: - var real a; - var real b; - a, b = (3.14, 1.5); + // Fully anonymous: field-wise promotion applies freely. + tuple(integer, integer) int_tup = (1, 2); + tuple(real, real) real_tup = int_tup; // Legal: both fields anonymous, integer -> real + + // Mixed: the unnamed field (position 2) promotes; named fields must match exactly. + tuple(integer x, integer, character z) src = (1, 2, 'a'); // mixed literals are positional + tuple(integer x, real, character z) dst = src; // Legal: position 2 is unnamed in both + + // Named fields do NOT implicitly promote. + tuple(integer x, integer y) named = (x: 1, y: 2); + tuple(real x, real y) named_real = named; // ILLEGAL: named fields require as<> + // Must use explicit cast: + tuple(real x, real y) named_real = as<tuple(real x, real y)>(named); + +**Explicit Casting with ``as<>``** + +The ``as<>`` operator can be used to explicitly convert between compatible tuple +types. The cast is valid if the source and destination have the same number of +fields and each source field can be cast (per :ref:`sec:typeCasting`) to the +corresponding destination field type. + +:: -Type Casting and Type Promotion -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + // Cast an anonymous tuple to a named type + tuple(integer x, integer y) named = as<tuple(integer x, integer y)>((1, 2)); -To see the types that tuple may be cast and/or promoted to, see the sections on :ref:`sec:typeCasting` -and :ref:`sec:typePromotion`, respectively. + // Cast between named tuple types with compatible field types + tuple(integer a, integer b) ab = (a: 3, b: 4); + tuple(real x, real y) xy = as<tuple(real x, real y)>(ab); diff --git a/gazprea/spec/types/vector.rst b/gazprea/spec/types/vector.rst deleted file mode 100644 index ff42f8a..0000000 --- a/gazprea/spec/types/vector.rst +++ /dev/null @@ -1,104 +0,0 @@ -.. _ssec:vector: - -Vectors -------- - -Vectors are language supported objects that allow for dynamically sized arrays.
-Once created, ``vectors`` in *Gazprea* behave exactly like arrays: they can be -intermixed with arrays in expressions; they can be used on the RHS of array -declarations and initializations; and they can be passed as array arguments to -subroutines and functions. - -.. _sssec:vec_decl: - -Declaration -~~~~~~~~~~~ - -Vectors are declared and (optionally) initialized as follows. -(Note that we have replaced ``<>`` with ``|`` in the notation below since -the literals ``<`` and ``>`` are used in the declaration) - - :: - - vector<|type|> |identifier|; - vector<|type|> |identifier| = |type-expr|; - vector<|type|> |identifier| = |type-array|; - - -Unlike the array type, *Gazprea* vectors do not have an explicit size -specifier, often called *capacity* in other languages. Below are some examples of -`vector` declarations. - - :: - - const vector v1 = 3; // [[3, 3]] - const vector v2 = [4, 5]; // [[4, 5]] - const vector v3 = 42; // [42] - const vector v4 = 1; // [1.0] - - -Vectors of inferred sized arrays assume the size of the *first* array in the vector. -Subsequent array elements of less than the inferred size are padded. -Those greater raise a runtime ``SizeError``. - - :: - - const vector vec = ['a', 'b', 'c']; - const vector ragged_right = [[1.0], [2.0, 2.0]]; // SizeError - const vector paddeded_right = [[1.0, 2.0], [1.0]]; // Padds second element - const vector const_vec = vec; - - -Operations -~~~~~~~~~~~ - -Operations on vectors are identical syntactically and semantically to -operations on arrays. In particular, operand lengths must match for binary -expressions and dot product. 
Vectors can behave as arrays by using slices: - - :: - - var vector v1, v2; - var integer[3] a; - v1.append([1, 2, 3]); - a = v1; // slice of v yields array and can be used to initialize 'a' - v2 = v1 + a; // slice of vector plus array yields result type array - a = v1 + v2; // slice of v1 + slice of v2 still yields array type - - -A vector or vector slice can be passed as a call argument that has been -declared as an array slice of the same size and type. When indexing a vector of arrays, -the first index selects the array element within the vector, and the second index selects -the element within the array: - - :: - - vector ragged_right = [[1.0], [2.1]]; - length(ragged_right[1]) -> std_output; // prints 1 - ragged_right[2][1] -> std_output; // prints 2.1 - - -As a language supported object, *Gazprea* provides several methods for ``vector``: - -- ``push()`` - pushes a new element to the back of the vector - -- ``len()`` - number of elements in the vector - -- ``append(T[*])`` - append another array slice to the vector where `T` is the type of the original vector or a type that can be implicitly cast to it. The following example tracks the elements inside `vec` through various appends. - - :: - - const x = 1..10; - var vector vec; // [] - - // scalar to array promotion - vec.append(1); // [[1.0, 1.0]] - - // array padding - vec.append(3..3); // [[1,0, 1.0], [3.0, 0.0]] - - // slices - vec.append(x[5..7]); // [[1,0, 1.0], [3.0, 0.0], [5.0, 6.0]] - - vec[tvec.len()] -> std_output; // prints 3 -