Emitters
We currently support four emitters: sync, socket, curl and an out-of-band file emitter. At a minimum, creating an emitter only requires choosing the emitter type and passing the collector's hostname.
All emitters support both GET and POST as methods for sending events to Snowplow collectors.
For the sake of performance, we recommend using POST, as the tracker can then batch many events together into a single request.
Note that, depending on your pipeline architecture, your collector may limit the maximum request size it accepts, and large batches could exceed that limit.
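To make the batching trade-off concrete, here is a minimal sketch, assuming a collector with a maximum request size: events are grouped into POST batches of at most the buffer size, and a batch is closed early if its JSON body would exceed the byte limit. The function name splitIntoBatches and the $maxBytes limit are hypothetical, and the real tracker wraps events in a larger payload than the plain JSON array used here.

```php
<?php
// Hypothetical helper: group events into POST batches of at most $bufferSize
// events whose JSON body stays within $maxBytes (an assumed collector limit).
function splitIntoBatches(array $events, int $bufferSize, int $maxBytes): array
{
    $batches = [];
    $current = [];
    foreach ($events as $event) {
        $candidate = array_merge($current, [$event]);
        $tooMany = count($candidate) > $bufferSize;
        $tooBig = strlen(json_encode($candidate)) > $maxBytes;
        if ($current !== [] && ($tooMany || $tooBig)) {
            // Close the current batch and start a new one with this event.
            $batches[] = $current;
            $current = [$event];
        } else {
            // Either the event fits, or the batch is empty
            // (an oversized event then gets a batch of its own).
            $current = $candidate;
        }
    }
    if ($current !== []) {
        $batches[] = $current;
    }
    return $batches;
}

$events = [];
for ($i = 0; $i < 120; $i++) {
    $events[] = ['e' => 'pv', 'url' => 'www.example.com', 'n' => $i];
}
$batches = splitIntoBatches($events, 50, 100000);
echo count($batches) . " POST requests instead of " . count($events) . " GET requests\n";
// 3 POST requests instead of 120 GET requests
```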
Once you have finished logging all of your events, we recommend calling the following method:
$tracker->flushEmitters();
This empties the event buffers of all emitters associated with your tracker object and sends any leftover events. In future releases this will become automatic; for now, it remains a manual step.
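The buffering behaviour that makes flushEmitters() necessary can be sketched as follows. BufferedEmitter is a hypothetical stand-in, not the tracker's emitter class: it sends automatically whenever the buffer fills, so any events left below the buffer size stay in memory until flush() is called.

```php
<?php
// Hypothetical stand-in for an emitter's buffer; it records batches instead
// of sending HTTP requests.
class BufferedEmitter
{
    private $bufferSize;
    private $buffer = [];
    public $sentBatches = [];

    public function __construct(int $bufferSize)
    {
        $this->bufferSize = $bufferSize;
    }

    public function addEvent(array $event): void
    {
        $this->buffer[] = $event;
        // Send automatically once the buffer reaches its configured size.
        if (count($this->buffer) >= $this->bufferSize) {
            $this->flush();
        }
    }

    public function flush(): void
    {
        if ($this->buffer === []) {
            return; // nothing left over
        }
        $this->sentBatches[] = $this->buffer; // a real emitter sends a request here
        $this->buffer = [];
    }

    public function pending(): int
    {
        return count($this->buffer);
    }
}

$emitter = new BufferedEmitter(50);
for ($i = 0; $i < 120; $i++) {
    $emitter->addEvent(['e' => 'pv', 'n' => $i]);
}
echo $emitter->pending() . " events still buffered\n"; // 20 events still buffered
$emitter->flush(); // plays the role of $tracker->flushEmitters()
```

Without the final flush() call, the last 20 events in this example would never leave the buffer.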
Sync
The Sync emitter is a very basic synchronous emitter that supports both the GET and POST request types.
By default, this emitter uses the POST request type, the HTTP protocol and a buffer size of 50.
As of version 0.7.0, the emitter can retry failed requests. If a connection to the collector cannot be established, or the request fails with a 4xx status code (except 400, 401, 403, 410 and 422) or a 5xx status code, the same request is retried. The number of retries is configurable and defaults to 1. Subsequent retries are separated by a back-off period that starts at 100ms (configurable) and increases exponentially.
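The retry rules above can be expressed as two small functions. This is a sketch, not the emitter's code: shouldRetry() and backoffSchedule() are hypothetical names, null stands for a failed connection, and doubling the delay on each retry is an assumption, since the text only says the back-off increases exponentially.

```php
<?php
// 4xx codes that the text says are NOT retried.
const NON_RETRYABLE_4XX = [400, 401, 403, 410, 422];

// $statusCode === null means no connection could be established.
function shouldRetry(?int $statusCode): bool
{
    if ($statusCode === null) {
        return true;
    }
    if ($statusCode >= 500 && $statusCode < 600) {
        return true;
    }
    if ($statusCode >= 400 && $statusCode < 500) {
        return !in_array($statusCode, NON_RETRYABLE_4XX, true);
    }
    return false; // 2xx/3xx: nothing to retry
}

// Delay before each retry, assuming the back-off doubles every time.
function backoffSchedule(int $maxRetryAttempts = 1, int $retryBackoffMs = 100): array
{
    $delays = [];
    for ($attempt = 0; $attempt < $maxRetryAttempts; $attempt++) {
        $delays[] = $retryBackoffMs * (2 ** $attempt);
    }
    return $delays;
}

echo implode(', ', backoffSchedule(3, 100)) . " ms\n"; // 100, 200, 400 ms
```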
Example emitter creation:
$emitter = new SyncEmitter($collector_uri, "http", "POST", 50);
Whilst you can force the buffer size to be greater than 1 for a GET request, it will not improve performance, as we can still only send one event at a time.
Constructor:
public function __construct($uri, $protocol = NULL, $type = NULL, $buffer_size = NULL, $debug = false, $max_retry_attempts = NULL, $retry_backoff_ms = NULL)
Arguments:
Argument | Description | Required? | Validation |
---|---|---|---|
$uri | Collector hostname | Yes | Non-empty string |
$protocol | Collector Protocol (HTTP or HTTPS) | No | String |
$type | Request Type (POST or GET) | No | String |
$buffer_size | Amount of events to store before flush | No | Int |
$debug | Whether or not to log errors | No | Boolean |
$max_retry_attempts | The maximum number of times to retry a request. Defaults to 1. | No | Int |
$retry_backoff_ms | The number of milliseconds to backoff before retrying a request. Defaults to 100ms, increases exponentially in subsequent retries. | No | Int |
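As an illustration of how $uri, $protocol and $type fit together, this hypothetical helper builds a collector endpoint and applies the validation listed in the table. It assumes the standard Snowplow collector paths (/i for GET, /com.snowplowanalytics.snowplow/tp2 for POST) and the Sync defaults of HTTP and POST; the tracker's own URL building may differ.

```php
<?php
// Hypothetical helper; the endpoint paths are assumed, not taken from the tracker.
function collectorUrl(string $uri, ?string $protocol = null, ?string $type = null): string
{
    if ($uri === '') {
        throw new InvalidArgumentException('Collector hostname must be a non-empty string');
    }
    $protocol = strtolower($protocol ?? 'http'); // Sync default: HTTP
    $type = strtoupper($type ?? 'POST');          // Sync default: POST
    if (!in_array($protocol, ['http', 'https'], true)) {
        throw new InvalidArgumentException("Unsupported protocol: $protocol");
    }
    if (!in_array($type, ['GET', 'POST'], true)) {
        throw new InvalidArgumentException("Unsupported request type: $type");
    }
    $path = $type === 'GET' ? '/i' : '/com.snowplowanalytics.snowplow/tp2';
    return $protocol . '://' . $uri . $path;
}

echo collectorUrl('collector.example.com', 'https', 'GET') . "\n";
// https://collector.example.com/i
```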
Socket
The Socket emitter transmits requests to the collector much faster by writing data directly to the HTTP socket. However, this is still, in essence, a synchronous process and will block the execution of the main script.
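Writing directly to the socket means the emitter assembles the raw HTTP request text itself. The sketch below shows such a request; buildPostRequest() and the chosen headers are assumptions, and the actual network write (shown commented out with PHP's fsockopen() and fwrite()) is left out so the example runs without a collector.

```php
<?php
// Hypothetical builder for the raw HTTP/1.1 request a socket emitter writes.
function buildPostRequest(string $host, string $path, string $body): string
{
    return "POST {$path} HTTP/1.1\r\n"
        . "Host: {$host}\r\n"
        . "Content-Type: application/json; charset=utf-8\r\n"
        . 'Content-Length: ' . strlen($body) . "\r\n"
        . "Connection: close\r\n"
        . "\r\n" // blank line separates headers from body
        . $body;
}

$request = buildPostRequest('collector.example.com', '/com.snowplowanalytics.snowplow/tp2', '{"data":[]}');

// Against a reachable collector the request would be written like this
// (with SSL, the host would be 'ssl://collector.example.com' on port 443):
// $fp = fsockopen('collector.example.com', 80, $errno, $errstr, 30);
// fwrite($fp, $request);
// fclose($fp);
```

Because fwrite() returns as soon as the data is handed to the socket, this avoids waiting on a full HTTP client round trip, but the script still blocks while connecting and writing.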
As of version 0.7.0, the emitter can retry failed requests. If a connection to the collector cannot be established, or the request fails with a 4xx status code (except 400, 401, 403, 410 and 422) or a 5xx status code, the same request is retried. The number of retries is configurable and defaults to 1. Subsequent retries are separated by a back-off period that starts at 100ms (configurable) and increases exponentially.
Example Emitter creation:
$emitter = new SocketEmitter($collector_uri, NULL, "GET", NULL, NULL);
Whilst you can force the buffer size to be greater than 1 for a GET request, it will not improve performance, as we can still only send one event at a time.
Constructor:
public function __construct($uri, $ssl = NULL, $type = NULL, $timeout = NULL, $buffer_size = NULL, $debug = NULL, $max_retry_attempts = NULL, $retry_backoff_ms = NULL)
Arguments:
Argument | Description | Required? | Validation |
---|---|---|---|
$uri | Collector hostname | Yes | Non-empty string |
$ssl | Whether to use SSL encryption | No | Boolean |
$type | Request Type (POST or GET) | No | String |
$timeout | Socket Timeout Limit | No | Int or Float |
$buffer_size | Amount of events to store before flush | No | Int |
$debug | Whether or not to log errors | No | Boolean |
$max_retry_attempts | The maximum number of times to retry a request. Defaults to 1. | No | Int |
$retry_backoff_ms | The number of milliseconds to backoff before retrying a request. Defaults to 100ms, increases exponentially in subsequent retries. | No | Int |
Curl
The Curl emitter provides the closest thing to native asynchronous requests in PHP. It uses a curl_multi_init resource, which allows it to send any number of requests asynchronously. This gives a considerable performance gain over the Sync and Socket emitters, as more than one request can be sent at a time.
On top of this, the emitter uses a modified version of the Rolling Curl library to send the curl requests. This makes asynchronous sending more efficient: multiple requests are in flight at the same time, and as soon as one finishes, a new request is started.
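The rolling-window idea can be simulated without any network access. In this sketch, rollingWindowFinishTime() (a hypothetical name) keeps at most $window requests in flight and starts the next request as soon as the earliest one finishes; the durations are made-up numbers. The Curl emitter's default window sizes are listed under Curl Default Settings.

```php
<?php
// Simulated total time to send requests with the given durations (seconds)
// when at most $window requests may be in flight at once.
function rollingWindowFinishTime(array $durations, int $window): float
{
    $inFlight = new SplMinHeap(); // finish times of running requests
    $now = 0.0;
    foreach ($durations as $duration) {
        if (count($inFlight) >= $window) {
            // Wait for the earliest request to finish, then reuse its slot.
            $now = $inFlight->extract();
        }
        $inFlight->insert($now + $duration);
    }
    $finish = $now;
    while (!$inFlight->isEmpty()) {
        $finish = $inFlight->extract(); // the last one out is the latest finish
    }
    return $finish;
}

// Fixed batches of two would take max(3, 1) + max(1, 1) = 4 seconds;
// the rolling window reuses the freed slot early and finishes at 3.
echo rollingWindowFinishTime([3, 1, 1, 1], 2) . "\n"; // 3
```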
This emitter does not retry failed requests to the collector. Failed requests (e.g., because the collector is not reachable) result in lost events.
Example Emitter creation:
$emitter = new CurlEmitter($collector_uri, false, "GET", 2);
Whilst you can force the buffer size to be greater than 1 for a GET request, it will not yield any performance changes as we can still only send 1 event at a time.
Constructor:
public function __construct($uri, $protocol = NULL, $type = NULL, $buffer_size = NULL, $debug = false, $curl_timeout = NULL)
Arguments:
Argument | Description | Required? | Validation |
---|---|---|---|
$uri | Collector hostname | Yes | Non-empty string |
$protocol | Collector Protocol (HTTP or HTTPS) | No | String |
$type | Request Type (POST or GET) | No | String |
$buffer_size | Amount of events to store before flush | No | Int |
$debug | Whether or not to log errors | No | Boolean |
$curl_timeout | Maximum time the request is allowed to take, in seconds | No | Int |
Curl Default Settings
The internal emitter default settings are as follows:
- Rolling Window (Number of concurrent requests)
- POST: 10
- GET: 30
- Curl Buffer (Number of times the emitter's buffer size must be reached before sending)
- POST: 50
- GET: 250
These settings cannot currently be changed from the constructor; however, the values are stored in Constants.class if you must change them.
File
When running under Windows, PHP cannot spawn truly separate processes and consumes more and more resources as more processes are spawned. As a result, Windows may crash under high load when using the File emitter.
The File emitter is the only truly non-blocking solution. It works by spawning workers that pick up files of logged events from a local temporary folder. The workers then send the events using the same asynchronous curl mechanism as the Curl emitter above.
All worker processes are created as background processes, so none of them delay the execution of the main script. Currently, they are configured to look for files inside their worker folders until none are left and they hit their timeout limit, at which point the process exits.
If a worker fails to send a request for any reason, it renames the entire file to failed and leaves it in the /temp/failed-logs/ folder.
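A single worker pass over its folder might look like the sketch below. processWorkerFolder(), the $send callback standing in for the curl sending code, the one-event-per-line file format and the failed- file prefix are all assumptions modelled on the description above, not the File emitter's implementation.

```php
<?php
// Hypothetical worker pass: send each log file, delete it on success,
// move it to $failedDir with a "failed-" prefix otherwise.
function processWorkerFolder(string $workerDir, string $failedDir, callable $send): array
{
    $sent = 0;
    $failed = 0;
    foreach (glob($workerDir . '/*.log') ?: [] as $file) {
        $events = file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
        if ($events !== false && $send($events)) {
            unlink($file); // delivered: the file is no longer needed
            $sent++;
        } else {
            // Keep the whole file for inspection instead of deleting it.
            rename($file, $failedDir . '/failed-' . basename($file));
            $failed++;
        }
    }
    return ['sent' => $sent, 'failed' => $failed];
}

$base = sys_get_temp_dir() . '/worker-demo-' . uniqid();
mkdir($base . '/worker', 0777, true);
mkdir($base . '/failed-logs', 0777, true);
file_put_contents($base . '/worker/events-1.log', "{\"e\":\"pv\"}\n");
file_put_contents($base . '/worker/events-2.log', "{\"e\":\"se\"}\n");

// Pretend the collector rejects every file containing a structured event.
$result = processWorkerFolder($base . '/worker', $base . '/failed-logs', function (array $events): bool {
    return strpos($events[0], '"se"') === false;
});
echo "sent {$result['sent']}, failed {$result['failed']}\n"; // sent 1, failed 1
```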
This emitter does not retry failed requests to the collector. Failed requests (e.g., because the collector is not reachable) result in lost events.
Example Emitter creation:
$emitter = new FileEmitter($collector_uri, false, "POST", 2, 15, 100, "/tmp/snowplow");
The buffer for the File emitter works a bit differently from the other emitters: here it refers to the number of events needed before an events-random.log file is produced for a worker. If you expect the buffer to take a long time to fill, be aware that a worker exits after 75 seconds by default (15 x 5). Adjust the timeout in the FileEmitter constructor if the default is not suitable.
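The File emitter's buffering can be sketched like this: events accumulate in memory and, once the buffer size is reached, are written out as one events-<random>.log file for a worker to pick up. FileBuffer, the one-JSON-event-per-line format and the hex random suffix are hypothetical choices for illustration.

```php
<?php
// Hypothetical file-backed buffer: writes one log file per full buffer.
class FileBuffer
{
    private $events = [];
    private $bufferSize;
    private $dir;

    public function __construct(string $dir, int $bufferSize)
    {
        $this->dir = $dir;
        $this->bufferSize = $bufferSize;
    }

    // Returns the path of the log file written, or null while still buffering.
    public function add(array $event): ?string
    {
        $this->events[] = json_encode($event);
        if (count($this->events) < $this->bufferSize) {
            return null;
        }
        $path = $this->dir . '/events-' . bin2hex(random_bytes(4)) . '.log';
        file_put_contents($path, implode("\n", $this->events) . "\n");
        $this->events = [];
        return $path;
    }
}

$dir = sys_get_temp_dir() . '/file-emitter-demo-' . uniqid();
mkdir($dir, 0777, true);
$buffer = new FileBuffer($dir, 100); // buffer size 100, as in the example above
for ($i = 0; $i < 250; $i++) {
    $buffer->add(['e' => 'pv', 'n' => $i]);
}
// Two files of 100 events now exist; the remaining 50 are still in memory.
```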
Constructor:
public function __construct($uri, $protocol = NULL, $type = NULL, $workers = NULL, $timeout = NULL, $buffer_size = NULL, $debug = false, $log_dir = NULL)
Arguments:
Argument | Description | Required? | Validation |
---|---|---|---|
$uri | Collector hostname | Yes | Non-empty string |
$protocol | Collector Protocol (HTTP or HTTPS) | No | String |
$type | Request Type (POST or GET) | No | String |
$workers | Amount of background workers | No | Int |
$timeout | Worker Timeout | No | Int or Float |
$buffer_size | Amount of events to store before flush | No | Int |
$debug | Whether or not to log errors | No | Boolean |
$log_dir | The directory for event log and worker log subdirectories to be created in | No | String |
Emitter Debug Mode
Debug mode is enabled on emitters by setting the $debug
argument in the emitter constructor to true
:
$emitter = new SyncEmitter($collector_uri, "http", "POST", 50, true);
By default, debug mode creates a new directory called /debug/ in the root of the tracker's directory. It then creates a log file named with the following structure: sync-events-log-[[random number]].log, i.e. the type of emitter followed by a random number that prevents the file from being accidentally overwritten.
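The naming scheme can be reproduced in a few lines. debugLogPath() is a hypothetical helper, and the range of the random number is an assumption; only the debug/ directory and the [type]-events-log-[number].log pattern come from the text.

```php
<?php
// Hypothetical helper: build (and create the directory for) a debug log path.
function debugLogPath(string $trackerRoot, string $emitterType): string
{
    $dir = $trackerRoot . '/debug';
    if (!is_dir($dir)) {
        mkdir($dir, 0777, true);
    }
    // The random suffix keeps separate runs from overwriting each other.
    return sprintf('%s/%s-events-log-%d.log', $dir, $emitterType, random_int(1, PHP_INT_MAX));
}

$path = debugLogPath(sys_get_temp_dir() . '/tracker-demo-' . uniqid(), 'sync');
file_put_contents($path, "flush OK\n", FILE_APPEND);
echo basename($path) . "\n";
```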
If writing this information to disk is not possible (for example, because of missing write permissions) or simply not wanted, you can turn it off by updating the following value in the Constants class:
const DEBUG_LOG_FILES = false;
Now all debugging information will be printed to the console.
Every time the event buffer is flushed, you can see whether the flush was successful. If an error occurs, the log records the entire event payload the tracker was trying to send, along with the error code.
Event Specific Information
When debug mode is enabled, the emitter also stores information internally: the HTTP response code and the payload of every request it makes, in the following form:
array(
    "code" => 200,
    "data" => '{"e":"pv","url":"www.example.com","page":"example","refr":"www.referrer.com"}'
)
The data value is stored as a JSON-encoded string. To test locally whether your emitters are sending successfully, you can retrieve this information with the following commands:
$emitters = $tracker->returnEmitters(); # Will store all of the emitters as an array.
$emitter = $emitters[0]; # Get the first emitter stored by the tracker
$results = $emitter->returnRequestResults(); # Return the stored results.
# Now that we have results we can work with...
print("Code: ".$results[0]["code"]);
print("Data: ".$results[0]["data"]);
This allows you to debug on a request by request basis to ensure that everything is being sent properly.
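Building on the retrieval commands above, a small hypothetical helper can scan all stored results and decode the payloads of failed requests. It uses sample data in the documented code/data shape rather than a live emitter, and treating any non-2xx code as a failure is an assumption.

```php
<?php
// Hypothetical helper: collect decoded payloads of non-2xx requests, keyed by index.
function summarizeResults(array $results): array
{
    $failed = [];
    foreach ($results as $index => $result) {
        if ($result['code'] < 200 || $result['code'] >= 300) {
            // Decode the payload so the failing event's fields can be inspected.
            $failed[$index] = json_decode($result['data'], true);
        }
    }
    return ['total' => count($results), 'failed' => $failed];
}

// Sample data shaped like the output of returnRequestResults().
$results = [
    ['code' => 200, 'data' => '{"e":"pv","url":"www.example.com","page":"example","refr":"www.referrer.com"}'],
    ['code' => 404, 'data' => '{"e":"pv","url":"www.example.com/missing"}'],
];
$summary = summarizeResults($results);
echo count($summary['failed']) . ' of ' . $summary['total'] . " requests failed\n"; // 1 of 2 requests failed
```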
Turn Debug Off
As debugging stores a lot of information, we can end debug mode by calling the following command:
$tracker->turnOffDebug();
This stops all logging activity, both to the external files and to the local arrays. You can go one step further by passing true to the function. This deletes all of the tracker's associated debug log files on disk and empties the local arrays within each linked emitter.
$tracker->turnOffDebug(true);
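The clean-up that turnOffDebug(true) performs can be pictured with the hypothetical helper below: it deletes files matching the debug log naming pattern and empties an in-memory results array. It is a sketch under those assumptions, not the tracker's implementation.

```php
<?php
// Hypothetical clean-up: remove debug log files and clear stored results.
function purgeDebugData(string $debugDir, array &$requestResults): int
{
    $deleted = 0;
    foreach (glob($debugDir . '/*-events-log-*.log') ?: [] as $file) {
        if (unlink($file)) {
            $deleted++;
        }
    }
    $requestResults = []; // the equivalent of emptying an emitter's local array
    return $deleted;
}

$debugDir = sys_get_temp_dir() . '/debug-demo-' . uniqid();
mkdir($debugDir, 0777, true);
file_put_contents($debugDir . '/sync-events-log-123.log', "flush OK\n");
file_put_contents($debugDir . '/curl-events-log-456.log', "flush OK\n");
$results = [['code' => 200, 'data' => '{"e":"pv"}']];
$deleted = purgeDebugData($debugDir, $results);
echo "$deleted debug log files deleted\n"; // 2 debug log files deleted
```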