June 5th 2019
Thanks to Alexey Berezuev for providing a Russian translation of this article.
Ask PHP developers which big feature they wished PHP had and many would say generics.
Language-level support for generics in PHP would be the best solution. Adding generics to PHP is hard. Hopefully native support will be part of the language one day, but we'll probably have to wait a few years until it happens.
This article outlines how, using existing tools (in some cases with minimal modifications), we can give PHP the power of generics now.
This is also available as a talk.
This section covers a brief introduction to generics.
As it is not currently possible to specify generics at the language level, we have to do the next best thing, which is to specify generics in the docblock.
In many code bases we already do this. See this example:
/**
* @param string[] $names
* @return User[]
*/
function createUsers(iterable $names): array { ... }
In the code above we do what we can at the language level. We've specified that the parameter $names is something that can be iterated over.
We've also specified that the function will return an array. PHP will throw a TypeError if the parameter or the return value is not of the specified type.
The docblock offers more insight: the elements of $names must be strings, and the function must return an array of User objects.
PHP itself doesn't make these additional type checks.
IDEs like PhpStorm do understand this notation and warn developers if this additional contract is not met.
Static analysis tools like Psalm, PHPStan and Phan can also validate that data of the correct type is passed to and returned from the function.
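To make "PHP itself doesn't make these additional type checks" concrete, here is a minimal runnable sketch, assuming PHP 7.1 or later. The User class and the body of createUsers are hypothetical, invented only for this illustration. PHP happily accepts an array of ints despite the string[] docblock; only the native iterable and array types are enforced.

```php
<?php
declare(strict_types=1);

// Hypothetical User class, assumed for this sketch only.
final class User
{
    /** @var mixed */
    public $name;

    /** @param mixed $name */
    public function __construct($name)
    {
        $this->name = $name;
    }
}

/**
 * @param string[] $names
 * @return User[]
 */
function createUsers(iterable $names): array
{
    $users = [];
    foreach ($names as $name) {
        $users[] = new User($name);
    }
    return $users;
}

// PHP enforces only the native types: iterable in, array out.
// The docblock says string[], yet an array of ints is accepted at runtime;
// Psalm, PHPStan or Phan would report this call instead.
$users = createUsers([1, 2, 3]);

// Violating a native type, by contrast, throws a TypeError.
$threw = false;
try {
    createUsers(42);
} catch (TypeError $e) {
    $threw = true;
}
```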
As far as generics go, the above example is simple. A more advanced case is where we want to specify the type of an array's keys as well as the type of its values. One way of doing this is:
/**
* @return array<string, User>
*/
function getUsers(): array { ... }
This means that the array returned from getUsers has keys of type string and values of type User.
Static analysers like Psalm, PHPStan and Phan understand this notation and validate code against it. Consider the following code:
/**
* @return array<string, User>
*/
function getUsers(): array { ... }
function showAge(int $age): void { ... }
foreach(getUsers() as $name => $user) {
showAge($name);
}
The static analysers would raise an issue on the call to showAge with an error similar to this: Argument 1 of showAge expects int, string provided.
Unfortunately at the time of writing PhpStorm does not do this.
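Static analysis matters here because PHP only detects this mistake when the loop actually runs. The sketch below is runnable, assuming PHP 7.1 or later; the User class and the sample data returned by getUsers are hypothetical. The first call to showAge throws a TypeError at runtime, which is exactly the bug the static analysers report before the code is ever executed.

```php
<?php
declare(strict_types=1);

// Hypothetical User class, assumed for this sketch.
final class User
{
}

/**
 * @return array<string, User>
 */
function getUsers(): array
{
    // Hypothetical data: keys are user names, values are User objects.
    return ['alice' => new User(), 'bob' => new User()];
}

function showAge(int $age): void
{
    echo "Age: {$age}\n";
}

// The static analysers flag showAge($name) without running anything.
// PHP only notices the problem when the loop body actually executes.
$error = null;
try {
    foreach (getUsers() as $name => $user) {
        showAge($name); // $name is a string key such as 'alice'
    }
} catch (TypeError $e) {
    $error = $e;
}
```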
We still might want to go further with generics. Consider an object that represents a stack:
class Stack
{
public function push($item): void { ... }
public function pop() { ... }
}
A stack can take any type of object, but what if we wanted to limit it to a stack that holds only objects of type User?
Psalm and Phan support notation like this:
/**
* @template T
*/
class Stack
{
/**
* @param T $item
*/
public function push($item): void { ... }
/**
* @return T
*/
public function pop() { ... }
}
Docblocks are used to pass in additional type information, e.g.:
/** @var Stack<User> $userStack */
$userStack = new Stack();
This means that $userStack must only contain User objects.
If Psalm analysed this code:
$userStack->push(new User());
$userStack->push("hello");
It would complain about the 2nd line with the error: Argument 1 of Stack::push expects User, string(hello) provided.
Currently PhpStorm does not support this notation.
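Here is a minimal sketch of what a complete, array-backed version of the templated Stack might look like, assuming PHP 7.4 or later (for the typed property). The storage array, the UnderflowException on an empty pop and the isEmpty() helper are my own assumptions, not something any tool requires. The usage at the end shows that PHP runs the push("hello") call without complaint: the @template annotations only help the tools that read them.

```php
<?php
declare(strict_types=1);

/**
 * Array-backed stack. The @template annotations are read by Psalm and Phan;
 * PHP ignores them at runtime.
 *
 * @template T
 */
final class Stack
{
    /** @var T[] */
    private array $items = [];

    /**
     * @param T $item
     */
    public function push($item): void
    {
        $this->items[] = $item;
    }

    /**
     * @return T
     */
    public function pop()
    {
        if ($this->items === []) {
            throw new UnderflowException('Cannot pop from an empty stack.');
        }
        return array_pop($this->items);
    }

    public function isEmpty(): bool
    {
        return $this->items === [];
    }
}

// Hypothetical User class, assumed for this sketch.
final class User
{
}

/** @var Stack<User> $userStack */
$userStack = new Stack();
$userStack->push(new User());

// Psalm reports this line; PHP executes it without complaint.
$userStack->push("hello");

// $top is the string "hello", even though the stack is annotated Stack<User>.
$top = $userStack->pop();
```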
There is even more to generics than the above, but we've covered enough for now.
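The following steps are required: the community needs to agree on a standard notation for generics, the tools need to support it, and 3rd party libraries need to provide generics information.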
The PHP community has already, unofficially, agreed on this form of generics (it's supported by most tools and its meaning is widely understood):
/**
* @return User[]
*/
function getUsers(): array { ... }
However we have problems with something as simple as this:
/**
* @return array<string, User>
*/
function getUsers(): array { ... }
Psalm understands this and knows the type of the key and value in the returned array.
At the time of writing PhpStorm does not understand this notation. Using this notation I would lose the power of the real time static analysis that PhpStorm offers.
Consider the code below. PhpStorm does not know that $user is of type User and that $name is of type string:
foreach(getUsers() as $name => $user) {
...
}
If Psalm is my static analysis tool of choice I could write this:
/**
* @return User[]
* @psalm-return array<string, User>
*/
function getUsers(): array { ... }
Psalm understands everything.
PhpStorm knows that $user is of type User. It still does not know that the array key is of type string.
Phan and PHPStan don't understand the Psalm-specific annotation, so the best information they get is the same as PhpStorm's: the type of $user.
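A runnable sketch of this dual-annotation approach, assuming PHP 7.4 or later. The User class, the sample names and the describe() helper are hypothetical. At runtime both tags are ignored; they only change what each tool can infer about $name and $user inside the loop.

```php
<?php
declare(strict_types=1);

// Hypothetical User class, assumed for this sketch.
final class User
{
    private string $name;

    public function __construct(string $name)
    {
        $this->name = $name;
    }

    public function getName(): string
    {
        return $this->name;
    }
}

/**
 * PhpStorm (and, at the time of writing, PHPStan and Phan) read @return;
 * Psalm prefers the more precise @psalm-return.
 *
 * @return User[]
 * @psalm-return array<string, User>
 */
function getUsers(): array
{
    $users = [];
    foreach (['alice', 'bob'] as $name) {
        $users[$name] = new User($name);
    }
    return $users;
}

// Hypothetical consumer: both parameter types are enforced at runtime.
function describe(string $name, User $user): string
{
    return $name . ' => ' . $user->getName();
}

$lines = [];
foreach (getUsers() as $name => $user) {
    // Psalm knows $name is a string; the other tools only know $user is a User.
    $lines[] = describe($name, $user);
}
```

One caveat: PHP converts numeric-string array keys such as "42" into integer keys, so a user named "42" would reach the loop as the int 42, and with strict_types enabled describe() would then throw a TypeError even though the docblock promises string keys.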
You could argue that PhpStorm should just adopt the convention array<keyType, valueType>. I'd argue it's the job of the language and community to dictate the standards, and for the tools to follow. Tool vendors, for all kinds of good reasons, might not be happy to set standards.
I suspect the convention above would happily be accepted by the majority of the PHP community who care about generics. However, things get more complex when talking about templates. At the time of writing neither PHPStan nor PhpStorm supports templates. Psalm and Phan do support templates. Their goals are the same, but when you delve into the details their implementations are slightly different.
Every option presented is some kind of compromise.
Put simply, there is a need for agreement on the generics notation.
Psalm has all the functionality required for checking generics. Phan looks like it does too.
I'm sure PhpStorm will implement generics as soon as the community agrees on a format.
The final piece of the generics jigsaw is adding support for dealing with 3rd party libraries.
Once a standard for defining generics comes along, hopefully most libraries will start using it. However, there will be a delay. Some libraries might still be in use but no longer actively maintained. To use static analysis tools for validating generics, it is vital that every function that takes or returns generics has its generic types defined.
What happens if your project relies on a 3rd party library that doesn't have generics?
Fortunately this problem has already been solved: the concept is called stubs. Psalm, Phan and PhpStorm all use stubs.
Stubs are regular files that contain function and method signatures but no implementation. Adding docblocks to the stubs gives static analysis tools the extra information they need.
E.g. suppose you had a stack class with no type hints or generics, like this:
class Stack
{
public function push($item)
{
/* some implementation */
}
public function pop()
{
/* some implementation */
}
}
You could create a stub file that has identical method signatures but adds docblocks and no implementation:
/**
* @template T
*/
class Stack
{
/**
* @param T $item
* @return void
*/
public function push($item);
/**
* @return T
*/
public function pop();
}
When the static analyser sees the Stack class, it takes type information from the stub rather than from the actual code.
An easy way of sharing stubs between developers (perhaps via a Composer repository) would be useful, as it means the work can be shared.
As a community we need to get behind agreeing on and defining a standard.
Maybe this would be best done as a generics PSR?
Or maybe the lead developers of the static analysis tools, PhpStorm, other PHP IDEs and someone from internals (as a sanity check) could make a standard for all the tools to work towards.
Once the standard is in place everyone can help add generics to existing libraries and projects by submitting PRs. Where this isn't possible developers can write and share stubs.
With all the above in place we'll be able to use tools like PhpStorm for checking generics in real time as we code. We can use static analysis tools as part of our CI as a safety net.
So we can have generics in PHP (well almost).
There are some limitations. PHP is a dynamic language that can do lots of magical things, examples here. If you use too much PHP magic, static analysis tools might not be able to accurately derive all the types in the system. If any types are unknown, the tools will not be able to verify the correct use of generics in every case.
That said the main application for this kind of analysis should be on your business logic. If you're coding cleanly you probably shouldn't be using too much magic.
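As a small illustration of the kind of magic that defeats type inference, the sketch below (assuming PHP 7.4 or later) resolves findUser() through __call, so the method never appears in the source. The MagicRepository class and its findUser() method are hypothetical. The @method docblock tag, which PhpStorm, Psalm, PHPStan and Phan all recognise, is one common way to describe such a method to the tools; without it they cannot know what findUser() returns.

```php
<?php
declare(strict_types=1);

// Hypothetical User class, assumed for this sketch.
final class User
{
}

/**
 * Hypothetical repository whose findUser() exists only via __call.
 * The @method tag tells static analysers what the magic method returns.
 *
 * @method User|null findUser(string $name)
 */
final class MagicRepository
{
    /** @var array<string, User> */
    private array $users;

    public function __construct()
    {
        $this->users = ['alice' => new User()];
    }

    /**
     * @param array<int, mixed> $arguments
     * @return mixed
     */
    public function __call(string $method, array $arguments)
    {
        if ($method === 'findUser') {
            // Returns null when no user with that name exists.
            return $this->users[$arguments[0]] ?? null;
        }
        throw new BadMethodCallException("Unknown method {$method}");
    }
}

$repo = new MagicRepository();
$user = $repo->findUser('alice');
```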
Generics at the language level would be ideal. PHP is open source, so there is nothing to stop you checking out the PHP source and adding generics! (NOTE: This is hard!)
If you don't care about generics, just ignore all of the above. One of the great things about PHP is that you have the flexibility to choose the appropriate level of engineering for what you're doing. Throwaway code need not bother with features like type hinting. Long-lasting code must use these features.
Drop me a DM on twitter.