The Underlying Principles of PHP Frameworks (Pros and Cons of Various PHP Frameworks) - 原点资讯

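This article covers six topics:

How PHP runs: mechanism and principles
PHP's underlying variable data structure
Copy-on-write (COW) in PHP value assignment
PHP's garbage collection mechanism
The internals of PHP arrays
Categories of PHP array functions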

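How PHP Runs: Mechanism and Principles

Scanning -> Parsing -> Compilation -> Execution -> Output

Execution steps

Scanning

Perform lexical analysis on the source code, cutting it into individual tokens.

Parsing

Discard whitespace, comments, and the like, then perform syntax analysis to turn the remaining tokens into meaningful expressions.

Compilation

Compile the expressions into intermediate code (opcodes).

Execution

Execute the opcodes one by one.

Output

Write the execution result to the output buffer.

Tokenizing code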

// heredoc interpolates $data (undefined here), so it never reaches the tokenizer
$code = <<<EOF
<?php
echo 'hello world'l;
$data = 1+1;
echo $data;
EOF;
print_r(token_get_all($code));

Output

Array (
    [0] => Array ( [0] => 376 [1] => <?php [2] => 1 )
    [1] => Array ( [0] => 319 [1] => echo [2] => 2 )
    [2] => Array ( [0] => 379 [1] => [2] => 2 )
    [3] => Array ( [0] => 318 [1] => 'hello world' [2] => 2 )
    [4] => Array ( [0] => 310 [1] => l [2] => 2 )
    [5] => ;
    [6] => Array ( [0] => 379 [1] => [2] => 2 )
    [7] => =
    [8] => Array ( [0] => 379 [1] => [2] => 3 )
    [9] => Array ( [0] => 308 [1] => 1 [2] => 3 )
    [10] => +
    [11] => Array ( [0] => 308 [1] => 1 [2] => 3 )
    [12] => ;
    [13] => Array ( [0] => 379 [1] => [2] => 3 )
    [14] => Array ( [0] => 319 [1] => echo [2] => 4 )
    [15] => Array ( [0] => 379 [1] => [2] => 4 )
    [16] => ;
)

Three pieces of information can be read from each token array:

The token ID (for example, spaces and newlines are both 379)
The token string
The line number

The token ID is Zend's internal code for the token, defined in zend_language_parser.h
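The raw IDs are hard to read, so here is a small sketch that translates them with token_name(). The sample source and the describeTokens() helper are made up for illustration, and a nowdoc is used so that $data is not interpolated. Numeric IDs differ between PHP versions, so the symbolic names are the safer thing to compare.

```php
<?php
// Hypothetical sample source; the nowdoc keeps $data from being interpolated.
$code = <<<'EOF'
<?php
$data = 1 + 1;
echo $data;
EOF;

/**
 * Return a list of [token name, text, line] for a PHP source string.
 * describeTokens() is an illustrative helper, not part of PHP.
 */
function describeTokens(string $source): array
{
    $result = [];
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            [$id, $text, $line] = $token;
            $result[] = [token_name($id), $text, $line];
        } else {
            // Single-character tokens such as ";" or "=" come back as plain strings.
            $result[] = ['(char)', $token, null];
        }
    }
    return $result;
}

foreach (describeTokens($code) as [$name, $text, $line]) {
    printf("%-16s %-12s %s\n", $name, json_encode($text), $line ?? '-');
}
```

Tokens that arrive as arrays carry the ID, text, and line number described above; the plain strings are why entries such as ; and = have no ID in the dump.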

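Improving PHP execution efficiency

Minify code by stripping unneeded comments and whitespace (as jquery.min.js does for JavaScript)
Prefer PHP built-in functions or extension functions
Cache PHP opcodes with apc, xcache, opcache, or similar
Cache the results of complex, time-consuming computations
Do asynchronously whatever does not need to be synchronous, such as sending email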
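As one possible reading of "cache the results of time-consuming computations", the sketch below keeps results in a static array for the lifetime of the request. slowSquare() and cachedSquare() are hypothetical names, and the call counter exists only so the effect is visible.

```php
<?php
/**
 * Hypothetical stand-in for a slow computation.
 * The counter only exists so the example can show how often real work happens.
 */
function slowSquare(int $n, int &$calls): int
{
    $calls++;
    usleep(1000); // pretend this is expensive
    return $n * $n;
}

/** Return the cached result when available, computing it only on a miss. */
function cachedSquare(int $n, int &$calls): int
{
    static $cache = [];
    if (!array_key_exists($n, $cache)) {
        $cache[$n] = slowSquare($n, $calls);
    }
    return $cache[$n];
}

$calls = 0;
$first  = cachedSquare(12, $calls);
$second = cachedSquare(12, $calls);
echo "$first $second computed $calls time(s)\n";
```

A static array only lives for one request; results that should survive across requests would need an external store instead.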

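Why HHVM is fast

HHVM runs PHP on a virtual machine (similar to Java's). It converts PHP into bytecode and JIT-compiles that into machine code, so the source does not have to be re-parsed on every execution.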

PHP's Underlying Variable Data Structure

Variables are stored in the zval struct. In PHP 5 it is defined in Zend/zend.h:

typedef union _zvalue_value {
    /* The members below describe PHP's eight data types */
    long lval;                /* integer, boolean */
    double dval;              /* float */
    struct {                  /* string */
        char *val;
        int len;              /* strlen() returns this value */
    } str;
    /* NULL needs no member: the union simply holds no value */
    HashTable *ht;            /* arrays are implemented as a hash table */
    zend_object_value obj;    /* object */
} zvalue_value;

struct _zval_struct {
    zvalue_value value;       /* the variable's value */
    zend_uint refcount__gc;
    zend_uchar type;          /* the variable's type */
    zend_uchar is_ref__gc;
};

typedef struct _zval_struct zval;

The types used above are defined in Zend/zend_types.h:

typedef unsigned int zend_uint;
typedef unsigned char zend_uchar;

All eight PHP data types are stored through the zvalue_value union:

null: the union itself holds no value
long: int, bool, and resource (stored as a resource ID)
double: float
str: string
HashTable: indexed and associative arrays
zend_object_value: objects

The variable's type is recorded in zend_uchar type:

#define IS_NULL 0
#define IS_LONG 1
#define IS_DOUBLE 2
#define IS_BOOL 3
#define IS_ARRAY 4
#define IS_OBJECT 5
#define IS_STRING 6
#define IS_RESOURCE 7
#define IS_CONSTANT 8
#define IS_CONSTANT_ARRAY 9

For example, $a = 3 produces the following struct (pseudocode):

struct {
    value.lval = 3;
    refcount__gc = 1;
    type = IS_LONG;
    is_ref__gc = 0;
}

$a then points to this struct, much like a pointer.
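The type tag is not directly visible from PHP code, but gettype() reports it in userland terms. The sketch below feeds it one sample value per type; the sample values are arbitrary, and the resource is an in-memory stream chosen only because it needs no file on disk.

```php
<?php
// One sample value per PHP type; the resource is an in-memory stream.
$handle = fopen('php://memory', 'r');
$samples = [
    'null'     => null,
    'int'      => 3,
    'float'    => 1.5,
    'bool'     => true,
    'string'   => 'hello',
    'array'    => [1, 2],
    'object'   => new stdClass(),
    'resource' => $handle,
];

foreach ($samples as $label => $value) {
    printf("%-8s => %s\n", $label, gettype($value));
}
```

Note that gettype() still uses the historical names "integer", "double", and "boolean", which echo the IS_LONG, IS_DOUBLE, and IS_BOOL tags.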

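Copy-on-Write (COW) in PHP Value Assignment

The _zval_struct data structure also has the following two members:

zend_uint refcount__gc: how many times the value is referenced; each new reference adds 1
zend_uchar is_ref__gc: whether this is an ordinary variable or a reference variable

The code below explores the reference mechanism.

I am using PHP 5.4 here; xdebug must be installed to inspect variable references.

Note that under PHP 7.2 the reference count in these tests is always 0, because PHP 7 stores scalars such as integers directly in the zval and does not reference-count them.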

Installing xdebug

Build xdebug.so

yum -y install php-devel
tar xf xdebug-2.8.0alpha1.tgz
cd xdebug-2.8.0alpha1
phpize
find /usr/ -name "php-config"
./configure --with-php-config=/usr/bin/php-config
make && make install
ls /usr/lib64/php/modules/

Configure xdebug

php --ini
echo 'zend_extension=/usr/lib64/php/modules/xdebug.so' >> /etc/php.ini
systemctl restart php72-php-fpm.service
php -m | grep xdebug

Writing the test code

$a = 3;
xdebug_debug_zval('a');

Output

a: (refcount=1, is_ref=0)=3

refcount: the reference count is 1
is_ref: 0 means an ordinary variable
=3: the value is 3

Assigning to a second variable

$a = 3;
$b = $a;
xdebug_debug_zval('a');
xdebug_debug_zval('b');

Output

a: (refcount=2, is_ref=0)=3
b: (refcount=2, is_ref=0)=3

Assigning a new value

$a = 3;
$b = $a;
$b = 5;
xdebug_debug_zval('a');
xdebug_debug_zval('b');

Output

a: (refcount=1, is_ref=0)=3
b: (refcount=1, is_ref=0)=5

Assigning by reference

$a = 3;
$b = &$a;
xdebug_debug_zval('a');
xdebug_debug_zval('b');

Output

a: (refcount=2, is_ref=1)=3
b: (refcount=2, is_ref=1)=3

is_ref is now 1: the variable has changed from an ordinary variable into a reference variable.

Assigning a new value through the reference

$a = 3;
$b = &$a;
$c = $a;
$b = 5;
xdebug_debug_zval('a');
xdebug_debug_zval('b');
xdebug_debug_zval('c');

Output

a: (refcount=2, is_ref=1)=5
b: (refcount=2, is_ref=1)=5
c: (refcount=1, is_ref=0)=3

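Summary

Assigning one variable to another shares the same zval instead of allocating new memory, which saves resources
When one of those variables is later changed, the value is copied to hold the new value and the sharing ends; this is called copy on write (COW)
Reference variables do not trigger COW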
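The same behavior can be observed without xdebug by watching memory usage. This is a rough sketch, assuming a large array so the copy is easy to see; cowCosts() is a made-up helper and the exact byte counts depend on the PHP version.

```php
<?php
/**
 * Measure what a plain assignment and a later write cost in memory.
 * cowCosts() is an illustrative helper; exact byte counts vary by PHP version.
 */
function cowCosts(int $size): array
{
    $a = range(1, $size);

    $start = memory_get_usage();
    $b = $a;                       // shares the array, no copy yet
    $afterAssign = memory_get_usage();
    $b[] = 0;                      // first write: $b gets its own copy
    $afterWrite = memory_get_usage();

    return [
        'assign' => $afterAssign - $start,
        'write'  => $afterWrite - $afterAssign,
        'a_size' => count($a),
        'b_size' => count($b),
    ];
}

$costs = cowCosts(100000);
printf("assignment: %d bytes, first write: %d bytes\n", $costs['assign'], $costs['write']);

// A reference shares one value: writing through $y changes $x as well.
$x = 3;
$y = &$x;
$y = 5;
echo "x = $x\n";
```

The assignment should cost next to nothing while the first write pays for the whole copy, and $a keeps its original length throughout.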

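PHP's Garbage Collection Mechanism

What is garbage?

As they ask in Shanghai: what kind of garbage are you?
If no variable references a zval, that zval is garbage

?: (refcount=0, is_ref=0)=5

Why clean up garbage?

Some will say that when a PHP process ends it destroys all variables and closes all resource handles automatically, so why clean up at all?

If PHP processes several large files in a short time (say, 1 GB movies) and moves on to the next one without reclaiming memory, it will run out of memory
If PHP runs as a daemon or a long-running script and garbage is never reclaimed, it slowly accumulates until memory runs out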

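How to clean up garbage

Find the garbage
Remove it

Finding garbage

get_defined_vars() lists all defined variables.

In the engine, zend_globals.h defines two hash tables that store all variables:

struct _zend_executor_globals {
    ...
    HashTable *active_symbol_table; /* local variable symbol table */
    HashTable symbol_table;         /* global variable symbol table */
    ...
};

Once all defined variables have been found, look for those whose reference count is 0:

struct _zval_struct {
    ...
    zend_uint refcount__gc;
    zend_uchar is_ref__gc;
    ...
};

Removing garbage

Removing variables whose refcount__gc is 0, as described above, is the approach used in PHP 5.2 and earlier.

Since PHP 5.3, garbage is removed with the synchronous cycle collection algorithm for reference-counted systems.

The new algorithm is still based on refcount__gc, so why is it needed?

A zval whose refcount__gc is 0 is certainly garbage.

But not all garbage has a refcount__gc of 0.

There is also garbage whose refcount__gc is not 0, and the following experiment produces some.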

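An example

$a = ['a'];
$a[] = &$a; // reference to itself
xdebug_debug_zval('a');

Output

a: (refcount=2, is_ref=1)=array (
    0 => (refcount=1, is_ref=0)='a',
    1 => (refcount=2, is_ref=1)=...
)

In the second element, ... denotes recursion: its reference count is 2, and it is a reference that points back to the array itself.

A diagram from the official documentation:

[Figure 1]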

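Now delete $a

$a = ['a'];
$a[] = &$a;
unset($a);
xdebug_debug_zval('a');

Output

a: no such symbol

Because $a has been deleted, xdebug cannot print it. In theory the structure now looks like this:

(refcount=1, is_ref=1)=array (
    0 => (refcount=1, is_ref=0)='a',
    1 => (refcount=1, is_ref=1)=...
)

[Figure 2]

No symbol references this zval any more, yet because it references itself its refcount is still 1, which makes it an odd piece of garbage.

PHP cleans this case up automatically when the script ends, but until then it keeps occupying memory.

So the cleanup approach of PHP 5.2 and earlier cannot handle this case.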

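The synchronous cycle collection algorithm (from "Concurrent Cycle Collection in Reference Counted Systems")

Continuing with the code above as the example.

How the new algorithm works:

Treat $a as suspected garbage and simulate deleting it (refcount--), then simulate restoring it; the restore (refcount++) happens only if some other variable still references the value.

Whatever cannot be restored is garbage and can be deleted.

Take the odd garbage from above:

(refcount=1, is_ref=1)=array (
    0 => (refcount=1, is_ref=0)='a',
    1 => (refcount=1, is_ref=1)=...
)

After the simulated deletion it becomes:

(refcount=0, is_ref=1)=array (
    0 => (refcount=0, is_ref=0)='a',
    1 => (refcount=0, is_ref=1)=...
)

Then comes the simulated restore:

No symbol such as $a points to this zval, so it cannot be restored.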

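When garbage is removed

The algorithm above places suspected garbage in a holding area (the root buffer, or "garbage station"), and collection runs only once that area is full. Note that this requires garbage collection to be enabled.

Two ways to enable garbage collection

zend.enable_gc = On in php.ini (enabled by default)
gc_enable() and gc_disable() turn garbage collection on or off at runtime

gc_collect_cycles() forces a collection cycle immediately.

In the end, understanding the principle is enough: the whole process needs no involvement from the PHP developer beyond calling gc_enable() or gc_collect_cycles() where needed.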
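To see the collector at work, the sketch below recreates the self-referencing array from the earlier example inside a function, so each call leaves one unreachable cycle behind. makeCycle() and the loop count of 100 are arbitrary choices for illustration.

```php
<?php
/** Build the self-referencing array from the text, then let it go out of scope. */
function makeCycle(): void
{
    $a = ['a'];
    $a[] = &$a; // the refcount never reaches 0 on its own
}

gc_enable();
gc_collect_cycles();            // start from a clean state

for ($i = 0; $i < 100; $i++) {
    makeCycle();
}

// gc_collect_cycles() returns the number of collected cycles.
$collected = gc_collect_cycles();
echo "collected: $collected\n";
```

Without the explicit call, the same garbage would wait in the root buffer until it filled up or the script ended.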

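The Internals of PHP Arrays

First, a review of how arrays behave.

Properties of PHP array keys

$arr = [
    1 => 'a',
    '1' => 'b',
    1.5 => 'c',
    true => 'd',
];
print_r($arr);

Output

Array
(
    [1] => d
)

A key can be an integer or a string.

A value can be of any type.

Keys have the following properties:

Strings containing a decimal integer are converted to integers: '1' => 1
Floats and booleans are converted to integers: 1.3 => 1
null is treated as an empty string: null => ''
Objects and arrays cannot be used as keys
When the same key appears more than once, the later value overwrites the earlier one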
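The key rules can be checked with array_keys(), which returns the keys after PHP has normalized them. This sketch only exercises string and boolean keys; a float key with a fractional part is left out because newer PHP versions emit a deprecation notice for it.

```php
<?php
// Each key below is written in a different form to show how PHP normalizes it.
$arr = [
    '8'   => 'decimal integer string',  // becomes int 8
    '08'  => 'leading zero',            // stays the string "08"
    true  => 'boolean true',            // becomes int 1
    false => 'boolean false',           // becomes int 0
];

$keys = array_keys($arr);
var_dump($keys);
```

The '08' case shows that only strings that look exactly like a decimal integer are converted.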

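Accessing array elements

$arr[key]
$arr{key} (curly-brace access was deprecated in PHP 7.4 and removed in PHP 8.0)

Since PHP 5.4, the result of a function call can be indexed directly:

function getArr() {
    return [1, 2, 3, 4];
}
echo getArr()[2];

Deleting array elements

$a = [1, 2, 3, 4];
foreach ($a as $k => $v) {
    unset($a[$k]);
}
$a[] = 5;
print_r($a);

Output

Array
(
    [4] => 5
)

Deleting elements does not reset the index.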
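When consecutive keys are needed again after deletions, array_values() is one way to rebuild them. A minimal sketch with made-up values:

```php
<?php
$a = [1, 2, 3, 4];
unset($a[0], $a[1]);
$a[] = 5;                       // appended at key 4, not key 2

$reindexed = array_values($a);  // keys restart from 0

print_r(array_keys($a));
print_r(array_keys($reindexed));
```

array_values() returns a new array, so $a itself keeps its gaps unless the result is assigned back to it.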

Array traversal

for
foreach
array_walk
array_map
current and next
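The five approaches listed above can be put side by side. The sketch sums the same small array with each of them (array_map is used to build a new array instead, since that is what it is for); the sample values are arbitrary.

```php
<?php
$nums = [10, 20, 30];

// for: works when keys are consecutive integers starting at 0
$viaFor = 0;
for ($i = 0, $n = count($nums); $i < $n; $i++) {
    $viaFor += $nums[$i];
}

// foreach
$viaForeach = 0;
foreach ($nums as $value) {
    $viaForeach += $value;
}

// array_walk: the callback is invoked with each value and key
$viaWalk = 0;
array_walk($nums, function ($value, $key) use (&$viaWalk) {
    $viaWalk += $value;
});

// array_map: returns a new array built from the callback's results
$doubled = array_map(function ($value) {
    return $value * 2;
}, $nums);

// current and next: move the array's internal pointer by hand
$viaPointer = 0;
reset($nums);
while (key($nums) !== null) {
    $viaPointer += current($nums);
    next($nums);
}

echo "$viaFor $viaForeach $viaWalk $viaPointer\n";
print_r($doubled);
```

The pointer loop tests key() rather than current() so that an element holding false would not end the loop early.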

How arrays are implemented internally

The implementation uses two structures, HashTable and bucket.

[Figure 3]

What is a HashTable?

A hash table is a data structure that accesses a storage location in memory directly by key.

A hash function is applied to the key to compute its position in the table, so lookup, insertion, modification, and deletion all complete in O(1) on average.

[Figure 4]

The code below is from Zend/zend_types.h (PHP 7):

typedef struct _zend_array HashTable;

struct _zend_array {
    zend_refcounted_h gc;
    union {
        struct {
            ZEND_ENDIAN_LOHI_4(
                zend_uchar flags,
                zend_uchar nApplyCount,
                zend_uchar nIteratorsCount,
                zend_uchar consistency)
        } v;
        uint32_t flags;
    } u;
    uint32_t nTableMask;
    Bucket *arData;
    uint32_t nNumUsed;
    uint32_t nNumOfElements;
    uint32_t nTableSize;
    uint32_t nInternalPointer;
    zend_long nNextFreeElement;
    dtor_func_t pDestructor;
};

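The older struct

typedef struct _hashtable {
    uint nTableSize;
    uint nTableMask;
    uint nNumOfElements;
    ulong nNextFreeElement;
    Bucket *pInternalPointer;
    Bucket *pListHead;
    Bucket *pListTail;
    Bucket **arBuckets;
    unsigned char nApplyCount;
} HashTable;

Member | Description
nTableSize | Size of the bucket array; minimum 8, grows by doubling
nTableMask | Index mask, equal to nTableSize - 1
nNumOfElements | Number of elements; count() returns this value directly
nNextFreeElement | Next free numeric index, used when appending with $arr[] = ...
pInternalPointer | Pointer to the current element during traversal; used by foreach (given as the reason foreach is faster than for) and by reset() and current()
pListHead | Pointer to the first element of the array
pListTail | Pointer to the last element of the array
arBuckets | The actual storage container
arData | Bucket data (in the newer struct)
nApplyCount | Counts recursive visits, to prevent infinite recursion

typedef struct bucket {
    ulong h;
    uint nKeyLength;
    void *pData;
    void *pDataPtr;
    struct bucket *pListNext;
    struct bucket *pListLast;
    struct bucket *pNext;
    struct bucket *pLast;
    const char *arKey;
} Bucket;

Member | Description
h | Hash of the char *key, or the numeric index given by the user
nKeyLength | Length of the hash key; 0 for a numeric index
pData | Points to the value, usually a copy of the user's data; for pointer data it points to pDataPtr
pDataPtr | For pointer data, holds the pointer to the real value; pData points here
pListNext | Next element in the whole hash table
pListLast | Previous element in the whole hash table
pNext | Next element with the same hash slot
pLast | Previous element with the same hash slot
arKey | The string of the current key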
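To make the roles of h, nTableMask, and the per-slot chain more concrete, here is a toy model written in PHP itself. ToyHashTable is a made-up class, crc32() stands in for the engine's hash function, and the table never grows, so this is only a sketch of the idea and not how the engine is written.

```php
<?php
/**
 * Toy chained hash table, loosely modeled on the older HashTable/Bucket layout.
 * ToyHashTable and its crc32-based hash are illustrative assumptions,
 * not the engine's real code.
 */
final class ToyHashTable
{
    private $tableMask = 7;        // like nTableMask: nTableSize (8) - 1
    private $numOfElements = 0;    // like nNumOfElements
    private $buckets = [];         // slot index => chain of [h, key, value]

    /** Map a key to a slot: hash it, then mask instead of using modulo. */
    private function slot(string $key): array
    {
        $h = crc32($key);
        return [$h, $h & $this->tableMask];
    }

    public function set(string $key, $value): void
    {
        [$h, $index] = $this->slot($key);
        foreach ($this->buckets[$index] ?? [] as $i => $bucket) {
            if ($bucket[0] === $h && $bucket[1] === $key) {
                $this->buckets[$index][$i][2] = $value; // same key: overwrite
                return;
            }
        }
        $this->buckets[$index][] = [$h, $key, $value];  // new key: extend the chain
        $this->numOfElements++;
    }

    public function get(string $key)
    {
        [$h, $index] = $this->slot($key);
        foreach ($this->buckets[$index] ?? [] as $bucket) {
            if ($bucket[0] === $h && $bucket[1] === $key) {
                return $bucket[2];
            }
        }
        return null;
    }

    /** Returns the stored counter, so counting never walks the table. */
    public function count(): int
    {
        return $this->numOfElements;
    }
}

$table = new ToyHashTable();
foreach (range('a', 'p') as $letter) {   // 16 keys into 8 slots forces collisions
    $table->set($letter, strtoupper($letter));
}
$table->set('a', 'again');
echo $table->get('a'), ' ', $table->get('p'), ' ', $table->count(), "\n";
```

Masking with nTableSize - 1 only works because the table size is a power of two, which is why the real table grows by doubling.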

[Figure 5]