A standard for generics in PHP

December 14th 2020


Update: See the proposal on GitHub.

Thanks to static analysis, generics in PHP are already a reality. Many developers are reaping considerable rewards from using static analysers and generics on their PHP codebases. A major blocker to increased uptake is the lack of a standard for generics. A standard would give tools (such as IDEs) and libraries clear guidelines for implementing and supporting generics.

There is already an unofficial standard for generics; see the documentation from Psalm and PHPStan.

The scope of this article is to propose how PHP code should be annotated to include the extra information required for generics. This article proposes the existing informal standard becomes the "official" standard. It also proposes expanded support using Attributes.

Example using docblocks (for PHP 7 code):

/** @template T of object */
class Queue
{
    /** @var array<int,T> */
    private array $queue = [];

    /** @param T $item */
    public function add($item): void
    {
        // Implementation
    }

    /** @return T */
    public function next() 
    { 
        // Implementation
    }

    /** @return array<int,T> */
    public function asArray(): array
    {
        // Implementation
    }
}

Example using attributes (for code that is only compatible with PHP 8):

use StaticAnalysis\Generics\v1\Template;
use StaticAnalysis\Generics\v1\Type;

#[Template("T", "object")]
class Queue
{

    #[Type("array<int,T>")] 
    private array $queue = [];

    public function add(
        #[Type("T")] $item,
    ): void {
        // Implementation
    }

    #[Type("T")]   // This is return type
    public function next() 
    { 
        // Implementation
    }

    #[Type("array<int,T>")]
    public function asArray(): array 
    { 
        // Implementation
    }
}

This article investigates the possible methods of annotating PHP code with the additional information required by generics. It then argues why the formats above should be the ones chosen.

Assumed knowledge

It is assumed that the reader is familiar with the concept of generics and attributes. The following articles and videos give further background:

PHP Docblock

There is an existing informal standard for generics. The key element is the @template docblock tag. See the documentation from Psalm and PHPStan. In the context of generics, Psalm and PHPStan follow largely the same standard.

Currently, this is not formally documented as a standard (e.g. as a PSR). PSRs 5 and 19 focus on PHPDoc blocks and tags more generally; neither mentions @template.

The lack of a standard is the reason cited in PHPStorm's 2020.3 release announcement for not fully supporting @template:

We believe that support for generics is an advanced feature that lacks a proper specification and has many edge cases. Yet, we have decided to implement basic support for the @template construct based on the Psalm syntax, to see how it goes.

The next step in advancing this as a standard is to formalise it, probably as a PSR.

Pros

  • An informal standard exists. It just needs formalising, maybe as a PSR.
  • Supported by all advanced static analysers.
  • Partial support by some IDEs.

Cons

  • Uses docblocks (which some object to).
  • Cannot be versioned.

Simple Attributes

PHP 8 has a new feature, attributes. Attributes could be used instead of docblocks for providing additional information for generics.

Instead of @template docblock use the attribute #[Template]. For additional type information (that appears after @var, @param or @return) use the attribute #[Type].

Param

Using @param docblock:

/**
 * @param T $item 
 */
public function add(
    $item,
): void { ... }

Using an attribute instead of @param:

public function add(
    #[Type("T")] $item,
): void { ... }

Return

Using the @return docblock:

/**
 * @return T
 */
public function next() { ... }

Using an attribute instead of @return (NOTE: it is not possible to attach an attribute to a return type. However, because a function or method has only one return type, the attribute is attached to the function or method itself and describes its return type):

#[Type("T")] 
public function next() { ... }

Template

Using the @template docblock:

/** @template T */
class Queue { ... } 

Same information as an attribute:

#[Template("T")]
class Queue { ... } 

To delve a bit deeper...

It is possible to restrict templates to be of a certain type. E.g.

/** @template T of Animal */
interface AnimalGame { /* some code */ }

The #[Template] attribute takes an optional second argument, which is the restriction on the template. The docblock example would become:

#[Template("T", "Animal")]
interface AnimalGame {  /* some code */ }

Definitions of Attributes

Template

namespace StaticAnalysis\Generics\v1;

use Attribute;

#[Attribute(Attribute::TARGET_CLASS|Attribute::TARGET_FUNCTION|Attribute::TARGET_METHOD|Attribute::IS_REPEATABLE)]
class Template
{
    public function __construct(
        public string $name,
        public ?string $of = null,
    ) {}
}

Type

namespace StaticAnalysis\Generics\v1;

use Attribute;

#[Attribute(Attribute::TARGET_FUNCTION|Attribute::TARGET_METHOD|Attribute::TARGET_PARAMETER|Attribute::TARGET_PROPERTY)]
class Type
{
    public function __construct(
        public string $name,
    ) {}
}

Other attributes are also required. These include #[Extends] and #[Implements].

NOTE: The namespace StaticAnalysis\Generics\v1 is a placeholder. Assuming this ends up as a PSR then it could be Psr\Generics\v1.
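To give a feel for how a tool might consume these attributes, here is a minimal sketch using PHP 8's reflection API. The two attribute definitions are repeated so that the snippet runs on its own. The describeClass() helper and the shape of the array it returns are assumptions made for illustration; they are not part of the proposal.

```php
<?php

declare(strict_types=1);

namespace StaticAnalysis\Generics\v1;

use Attribute;
use ReflectionClass;

#[Attribute(Attribute::TARGET_CLASS|Attribute::TARGET_FUNCTION|Attribute::TARGET_METHOD|Attribute::IS_REPEATABLE)]
class Template
{
    public function __construct(
        public string $name,
        public ?string $of = null,
    ) {}
}

#[Attribute(Attribute::TARGET_FUNCTION|Attribute::TARGET_METHOD|Attribute::TARGET_PARAMETER|Attribute::TARGET_PROPERTY)]
class Type
{
    public function __construct(
        public string $name,
    ) {}
}

#[Template("T", "object")]
class Queue
{
    #[Type("array<int,T>")]
    private array $queue = [];

    #[Type("T")]   // This is the return type
    public function next()
    {
        return array_shift($this->queue);
    }
}

/**
 * Hypothetical helper (not part of the proposal): collects the generics
 * metadata declared on a class, its properties and its methods.
 */
function describeClass(string $className): array
{
    $class = new ReflectionClass($className);
    $info = ['templates' => [], 'properties' => [], 'returns' => []];

    // Template name => optional "of" restriction.
    foreach ($class->getAttributes(Template::class) as $attribute) {
        $template = $attribute->newInstance();
        $info['templates'][$template->name] = $template->of;
    }
    // Property name => type string.
    foreach ($class->getProperties() as $property) {
        foreach ($property->getAttributes(Type::class) as $attribute) {
            $info['properties'][$property->getName()] = $attribute->newInstance()->name;
        }
    }
    // Method name => return type string.
    foreach ($class->getMethods() as $method) {
        foreach ($method->getAttributes(Type::class) as $attribute) {
            $info['returns'][$method->getName()] = $attribute->newInstance()->name;
        }
    }

    return $info;
}

print_r(describeClass(Queue::class));
```

A static analyser would more likely read the attributes from the parsed source than from reflection, but the information it recovers would be the same.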

Pros

  • Attributes can be versioned (e.g. by having a version number in the namespace). It is highly unlikely that the first generics standard will be 100% feature complete. Allowing versioning means that 90% of use cases can be covered off in v1 and minor enhancements and corrections can be added later.
  • Does not pollute docblocks with metadata needed for generics.
  • Attributes are, arguably, the correct place to store additional metadata about code.
  • The informal standard for documenting generics is still used; it is just moved into attributes.

Cons

  • The information for generics is stored in strings and so is not part of the AST. Additional tooling is required to convert the information in the strings into some AST-like data. (That said, this is something that happens now anyway.)
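As a rough idea of what that additional tooling involves, the sketch below parses a small subset of the notation (plain names, generics such as array<int,T>, and unions) into nested arrays. The class name TypeStringParser and the output format are assumptions made for this example. Array shapes are deliberately left out, and this is not how Psalm or PHPStan implement their parsers.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical sketch: parses names, generics and unions into nested arrays.
 * Array shapes (array{...}) are not supported.
 */
final class TypeStringParser
{
    private int $pos = 0;

    private function __construct(private string $input) {}

    public static function parse(string $input): array
    {
        // Spaces carry no meaning in this subset, so drop them up front.
        $parser = new self(str_replace(' ', '', $input));
        $type = $parser->parseUnion();
        if ($parser->pos !== strlen($parser->input)) {
            throw new InvalidArgumentException("Unexpected input at offset {$parser->pos}");
        }
        return $type;
    }

    // union := atom ('|' atom)*
    private function parseUnion(): array
    {
        $members = [$this->parseAtom()];
        while ($this->consume('|')) {
            $members[] = $this->parseAtom();
        }
        return count($members) === 1 ? $members[0] : ['union' => $members];
    }

    // atom := name ('<' union (',' union)* '>')?
    private function parseAtom(): array
    {
        // A name may contain backslashes (namespaces) and hyphens (class-string).
        if (!preg_match('/\G[A-Za-z_\\\\][A-Za-z0-9_\\\\-]*/', $this->input, $match, 0, $this->pos)) {
            throw new InvalidArgumentException("Expected a type name at offset {$this->pos}");
        }
        $this->pos += strlen($match[0]);
        $node = ['name' => $match[0], 'args' => []];

        if ($this->consume('<')) {
            do {
                $node['args'][] = $this->parseUnion();
            } while ($this->consume(','));
            if (!$this->consume('>')) {
                throw new InvalidArgumentException("Expected '>' at offset {$this->pos}");
            }
        }
        return $node;
    }

    // Advances past $char if it is the next character.
    private function consume(string $char): bool
    {
        if (($this->input[$this->pos] ?? '') === $char) {
            $this->pos++;
            return true;
        }
        return false;
    }
}

print_r(TypeStringParser::parse('array<int,T>|null'));
```

The same parser could serve both the docblock and the attribute formats, since the notation inside the string is identical in each.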

Non cons

An objection that a developer might have is that, because the information is in a string, autocompletion (e.g. in IDEs) will not be possible. This is an incorrect assumption. IDEs could still understand the context of the string to provide autocompletion and validation. In fact this already happens. PHPStorm provides autocompletion for docblocks. It also provides autocompletion based on the information documented in docblocks. So it would be no problem to extend this to a string within an attribute.

Complex Attributes

One might suggest that instead of using strings:

#[Template("T")]
#[Type("array<int,T>")]
function asArray(
    #[Type("T")] $value,
): array {
    return [$value];
}

the type information is added directly, i.e. without strings. Unfortunately, the following is not valid PHP 8.0 code:

#[Template(T)]
#[Type(array<int,T>)]
function asArray(
    #[Type(T)] $value,
): array {
    return [$value];
}

From php.net:

Arguments to attributes can only be literal values or constant expressions.
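To make that constraint concrete, here is a small sketch of the kinds of argument that PHP 8.0 does accept. The Example attribute, the PREFIX constant and the demo() function are hypothetical names invented for this illustration.

```php
<?php

declare(strict_types=1);

// Hypothetical attribute used only to demonstrate legal argument forms.
#[Attribute(Attribute::TARGET_FUNCTION | Attribute::IS_REPEATABLE)]
final class Example
{
    public function __construct(public mixed $value) {}
}

const PREFIX = 'array<int,';

#[Example('T')]                    // string literal
#[Example(['int' => 'string'])]    // array of literals
#[Example(PREFIX . 'T>')]          // constant expression
#[Example(Example::class)]         // class-name constant
function demo(): void {}

// Instantiate each attribute and collect the argument it was given.
$values = array_map(
    fn (ReflectionAttribute $attribute) => $attribute->newInstance()->value,
    (new ReflectionFunction('demo'))->getAttributes(Example::class),
);

var_dump($values);
```

Anything that has to be evaluated as a type, such as a bare T or array<int,T>, falls outside these forms, which is why the example above is rejected.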

Before suggesting other notation let's consider examples of the type information that needs supporting by the #[Type] attribute:

  1. T|int|null
  2. array<int,string>
  3. array{int: int, name: string}
  4. array{0: int, 1: T}
  5. class-string<T>
  6. Queue<T>
  7. ArrayCollection<K,V>

Attempt 1

Inspired by PHPStorm's ArrayShape attribute, it might be possible to encode information as an array.

So let's start with the first example T|int|null, a union:

#[Type(['T', 'int', null])]

The second example array<int,string> defines the key and value of an array:

#[Type(['int' => 'string'])]

Let's consider the third example, an array shape: array{int: int, name: string}. Following PHPStorm's example (NOTE: the string int is a valid array key):

#[Type(['int' => 'int', 'name' => 'string'])]

⚠️ There are problems with this notation. The second example could be misinterpreted as a single element array shape (with a key of int). Consider the first example T|int|null, which is the same as:

#[Type([0 => 'T', 1 => 'int', 2 => null])]

This too might be misinterpreted as an array shape.
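The ambiguity is easy to confirm, because PHP itself cannot tell the two spellings apart. In the short check below the variable names are only illustrative:

```php
<?php

declare(strict_types=1);

// Intended as the union T|int|null.
$union = ['T', 'int', null];

// Intended as an array shape with the keys 0, 1 and 2.
$shape = [0 => 'T', 1 => 'int', 2 => null];

// Both literals produce exactly the same array.
var_dump($union === $shape); // bool(true)
```

A tool reading the attribute argument receives the same value in both cases, so it has no means of recovering the author's intent.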

There are already many problems with this method and there are still several more example cases to consider. Something more advanced is needed...

Attempt 2

Remember the constraint that arguments to attributes can only be literal values or constant expressions. From attempt 1 we can see that there needs to be a way of distinguishing between unions, array shapes and normal arrays.

Unions could be expressed like this: ['union' => [<type 1>, <type 2>, ...]].

Array shapes could be expressed like this: ['shape' => <array shape>].

Back to our examples. For a union, example 1 T|int|null, would become:

#[Type(['union' => ['T', 'int', null]])]

For now the second example array<int,string> remains as before:

#[Type(['int' => 'string'])]

The third array{int: int, name: string} becomes:

#[Type(['shape' => ['int' => 'int', 'name' => 'string']])]

Example 4 array{0: int, 1: T}:

#[Type(['shape' => [0 => 'int', 1 => 'T']])]

This is progress. There is no ambiguity. Unfortunately the examples are more verbose.

Let's continue with example 5 class-string<T>:

#[Type(['class-string' => 'T'])]

As for example 6 Queue<T>. This could work:

#[Type([Queue::class => 'T'])]

How about example 7 ArrayCollection<K,V>:

#[Type([ArrayCollection::class => ['K', 'V']])]

Contrast this with example 2. These are similar in intent; they define the type for key and value, however they look very different:

#[Type(['K' => 'V'])]
#[Type([ArrayCollection::class => ['K', 'V']])]

Perhaps it could be argued that ['K' => 'V'] is a shortcut for ['array' => ['K', 'V']]?

Let's compare a few examples of the string notation with the array notation.

String version:

#[Type("ArrayCollection<K,V>")]

Becomes:

#[Type([ArrayCollection::class => ['K', 'V']])]

Consider a complex, and somewhat contrived, example:

#[Type("array{0: int, employees: array<string,Person>, type: class-string<T>, data: array<int,T>}|null")]

Becomes:

#[Type(['union' => [['shape' => [0 => 'int', 'employees' => ['string' => Person::class], 'type' => ['class-string' => 'T'], 'data' => ['int' => 'T']]], null]])]

Even with this system there are still ambiguities. shape and union have become reserved words.

#[Type(['shape' => ['shape' => ['string' => 'string'], 'area' => 'int']])]

Does this mean:

array{shape: array<string,string>, area: int}

Or

array{0: array{string: string}, area: int}

Of course, you can pick a different name. Using array-shape instead of shape will probably result in less chance of a name collision, but the fundamental problem still exists. This is probably one of many issues. It's safe to say this is not a viable solution.

Attempt 3

Instead of just the attribute #[Type], have others too, e.g. #[Union], #[ArrayShape] and no doubt others.

This is a non-starter. It would be impossible to describe an array of array shapes, e.g. array<int, array{name: string, age: int}>.

Attempt 4

A more drastic measure is to create an RFC to allow more scope for what can be used as arguments for attributes. There are many disadvantages to this. Firstly, the earliest this could happen is PHP 8.1, which at the time of writing is a year away. Secondly, if the main use case is to support generics, then I think it would be better to add the notation to the language itself, even if it is used only by static analysis and not at run time. E.g.

class Queue<T> 
{
    public function add(T $item): void { /* implementation */ }

    public function next(): T { /* implementation */ }
}

It's a controversial suggestion. I made it at PHP-UK conference in Feb 2020, and a couple of people thought it wasn't wise. I think I agree with them!

Let's disregard this option now.

Complex attributes conclusions

Given the constraints placed on what is a legal argument for an attribute, any attempt at documenting the information required by generics is likely to be complicated and unintuitive.

Conclusions

The only 2 sensible methods for documenting are:

  • docblocks
  • attributes that contain the same information as in the docblock

The notation used for expressing generics, array shapes and unions is intuitive to those who have experience of other programming languages that use generics. It would be sensible to have a standard that follows other languages rather than inventing something entirely new for PHP.

The benefits of the attributes over docblocks include:

  • Versioning is possible.
  • Does not pollute docblocks with metadata needed for generics.

Given that docblocks are the de facto standard, both should be supported going forward.

Comments, corrections, feedback

Drop me a DM on Twitter.