FAQ Flashcards
How do you integrate Segment?
All data ingested by Segment is collected through Segment’s collection API. Using this API, Segment has built 16 (and counting) different SDKs for mobile, web and server side data. Segment code is integrated directly into your websites, mobile apps and servers by an engineer, and data is sent to Segment via the track endpoint for user actions, and the identify endpoint for user traits.
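As a rough sketch of what those two calls carry, the snippet below builds a track body and an identify body without sending anything. The field names (`type`, `userId`, `event`, `properties`, `traits`, `timestamp`) follow my understanding of Segment's public spec; the helper names, user ID and values are made up for illustration.

```python
from datetime import datetime, timezone


def _now():
    # ISO 8601 timestamp in UTC, e.g. 2024-01-01T12:00:00+00:00
    return datetime.now(timezone.utc).isoformat()


def build_track(user_id, event, properties=None):
    """Body of a track call: one user action plus its properties."""
    return {
        "type": "track",
        "userId": user_id,
        "event": event,
        "properties": properties or {},
        "timestamp": _now(),
    }


def build_identify(user_id, traits=None):
    """Body of an identify call: who the user is, described by traits."""
    return {
        "type": "identify",
        "userId": user_id,
        "traits": traits or {},
        "timestamp": _now(),
    }


# Hypothetical user and values, for illustration only.
order = build_track("user_123", "Order Completed", {"total": 42.5})
profile = build_identify("user_123", {"plan": "premium"})
```

In a real integration the SDK builds and sends these for you; the point is only that actions go through track and traits go through identify.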
How does Segment send data to downstream tools?
What is device mode?
What is cloud mode?
Segment can send data to downstream tools in one of two ways (called connection modes). The first is device mode. If Segment is collecting data via a client-side library, Segment can load partner SDKs directly onto the client, side by side with the Segment SDK. This is only an option for Segment client-side SDKs (web/mobile). On web, when a destination is turned on in the Segment UI in “device mode”, within about 10 minutes Segment updates that customer’s Segment CDN to include the new destination’s SDK. The next time Segment loads on the customer’s website after that update, the CDN serves both the Segment SDK and the newly added destination SDK. This is done automatically and requires no extra work from the customer.
What is device mode?
On mobile, device mode connections require developers to include a Segment-modified destination package (provided by Segment) within the mobile app, but do not require a full integration of that destination. Once the Segment-specific destination package has been added to the mobile app, it automatically collects data wherever the engineer has specified Segment track and identify calls. While this does require engineering work, it drastically reduces the time it takes to implement new tools.
What is cloud mode?
Cloud mode connections are far simpler. Cloud mode means that data is always being sent to the Segment server first, and then routed from the Segment server to the downstream destination server.
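The difference between the two modes can be pictured with a toy model. This is not how Segment is implemented; `Destination` and `plan_delivery` are hypothetical names, and the only rule encoded is the one stated above: device mode is available on web/mobile sources only, and everything else goes through the Segment server.

```python
from dataclasses import dataclass


@dataclass
class Destination:
    name: str
    mode: str  # "device" or "cloud"


def plan_delivery(destinations, platform):
    """Split destinations into SDKs loaded on the client and
    destinations that Segment's servers forward data to."""
    client_sdks, server_routed = [], []
    for dest in destinations:
        if dest.mode == "device":
            if platform not in ("web", "mobile"):
                # Device mode only exists for client-side sources.
                raise ValueError(f"{dest.name}: device mode needs a client-side source")
            client_sdks.append(dest.name)
        else:
            server_routed.append(dest.name)
    return client_sdks, server_routed


# Illustrative configuration, not a recommendation.
enabled = [Destination("Optimizely", "device"), Destination("Warehouse", "cloud")]
on_client, via_segment = plan_delivery(enabled, "web")
```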
Why use Device Mode?
Destinations need to be in device mode for a couple of reasons. Some destinations need to be on the client because their core features depend on running there and cannot be reproduced through basic data collection. Examples are A/B testing tools like Optimizely / VWO or heatmapping tools like Fullstory / Crazy Egg. Other destinations have no way to receive data server side, so their client-side SDKs are the only way to send them data. This is common in the ad pixel world, where adtech companies purposefully make it difficult to send data to them server side so that their ad pixels must remain on the site (and provide them with more information).
How fast is Segment?
What is the speed of your track API?
How long does it take data to get sent to downstream tools?
Data collected by Segment is near real time. Requests to our collection APIs receive a response in about 50 ms. Once data reaches Segment, it takes ~1 second for it to be processed, queued and successfully sent to a downstream partner. Some destinations also operate in near real time, in which case you can watch the data being collected and federated both in Segment and in the downstream partner. Other destinations, however, either have lag time before you can see processed data (Segment sends it to them in real time, but you can’t view it yet), or receive data from Segment in batches rather than in real time (for example data warehouses, where dynamic schematization must happen before the data is sent downstream).
How is Segment different than a tag manager?
A tag manager is a web-only feature that allows non-technical users to inject code into a website. Tag management exists on mobile, but it is notoriously hard to use and quite buggy. An engineer must first implement the tag manager natively and populate a “data layer” with commonly used properties that are present on the website (e.g. “item name”, “price”, “item category”, etc.). In most cases, the engineer must also specify “hooks” in the website where key actions occur (e.g. “Order Completed”, “Product Added”, etc.). Once the tag manager is implemented, users can log into the UI and start injecting both preset and custom JavaScript functions into those “hooks”, using the data layer to populate the metadata within the functions. A new integration typically requires a user to go into the tag manager and manually insert the new destination’s code into every “hook” that is relevant to the integration, as well as specify which data within the data layer should be sent through. All tag manager integrations are “device mode” integrations. They live on the site.
Segment requires far less work to integrate a new tool. Once Segment is implemented on a website, turning on a new integration only requires you to add a few config options and then flip a switch (with the exception of hard-to-integrate tools like Adobe). Segment is also far more than a web-based collection tool. Data can be collected and combined from mobile, web and server, and you can set tools to cloud mode so they do not need to touch the site at all. Tag managers can be great for the long tail of ad pixels that Segment may not have in its catalogue, but for everything else Segment is a far superior way to integrate tools.
Will Segment slow down my web page?
If you are loading Segment on your website, it will have a nonzero effect on your page’s load time. However, analytics.js is asynchronous and non-blocking. This means that the page won’t wait for Segment to finish loading before it moves on to continue loading your site. So even in the rare case that the Segment library takes a full second to load, the page will only be slowed down by a few ms. In a couple of internal speed tests, Segment slowed down my test website by between 3 and 9 milliseconds.
Segment can potentially load a number of other SDKs side by side with the Segment SDK in “device mode”. This is controlled by the customer within the Segment UI. Libraries loaded in device mode are loaded directly onto the web page, so they too can slow down your site a minuscule amount. These libraries slow down your page exactly as much as they would if you implemented them natively on the site rather than through Segment.
Segment offers cloud mode connections, which take the partner SDKs off the site altogether (whenever possible), making it the fastest possible way to get data to your downstream tools while minimizing the impact to your site as much as possible.
Can I manage my Segment configuration via APIs?
Yes, almost everything that is configurable in the UI is also configurable via our Segment Config API. Segment is built to be scalable and engineer friendly.
Can I apply custom rules for how data is sent to different destinations?
There are a few ways to customize your data for specific downstream destinations.
- Event routing - Within the schema tab of any source, you can route events to downstream tools by event name. A common use case here is excluding page views from martech tools, as they run up the bill heavily.
- Destination Filtering - This gives you the ability to apply much more custom rules and conditionals around when to send an event, or when to strip a property from an event. There are many applications for this, but one example is stripping PII from data payloads when it is not needed in the downstream tool, as a way of reducing risk.
- Custom functions - When all else fails, custom functions allow you to write arbitrary JavaScript functions that can do all sorts of transformations, data filtering and data enrichment on a destination-by-destination basis. This is not going to be the solution to simple problems, but if it is absolutely essential that data is sent to a downstream tool in a very specific way that is not handled by the above, there is a very high chance custom functions can solve it for you (with a decent amount of work).
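To make the first two options concrete, here is a minimal sketch of the logic they apply per destination. It is not Segment's filter syntax; the rule format (`drop_events`, `strip_properties`), the function names and the example event are all assumptions for illustration.

```python
def strip_properties(event, keys):
    """Return a copy of the event without the given property keys."""
    cleaned = dict(event)
    cleaned["properties"] = {
        k: v for k, v in event.get("properties", {}).items() if k not in keys
    }
    return cleaned


def apply_destination_rules(event, rules):
    """Return the event as one destination should receive it,
    or None when the event should not be sent there at all."""
    if event.get("event") in rules.get("drop_events", ()):
        return None
    return strip_properties(event, rules.get("strip_properties", ()))


# Hypothetical rules for one martech destination.
martech_rules = {
    "drop_events": {"Product Viewed"},
    "strip_properties": {"email", "phone"},
}

signup = {"event": "Signed Up", "properties": {"email": "a@example.com", "plan": "pro"}}
viewed = {"event": "Product Viewed", "properties": {"sku": "A1"}}

delivered = apply_destination_rules(signup, martech_rules)
dropped = apply_destination_rules(viewed, martech_rules)
```

Each destination would get its own rule set, so the same event can arrive intact in one tool and trimmed (or not at all) in another.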
What if I don’t see a tool in the catalogue that I want to integrate with? What are my options?
[AS] There are a couple of options. We have a full dev center and partners can create their own connector. We have around 60 (and growing) partner developed integrations and many of them have come from customer requests to those teams asking for a Segment integration. (then pick one of the following)
For Sources: But, if you want more control we have custom source functions that allow your engineers to write a small JavaScript function that can ingest webhooks and translate the data into a Segment API call.
For Destinations: But, if you want more control we do have custom destination functions that let your engineers write a small serverless JavaScript function that can transform your Segment events and send them to other APIs, including your internal services.
Does Segment help filter / prevent bot activity?
Some destinations can be set up to filter out bot user agents. You can also add logic to the instrumentation code so that it does not send track events for bots (based on user agent).
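The second option might look something like the sketch below. The keyword list is a crude heuristic I picked for illustration (it will miss some bots and can misfire on unusual user agents), and `track_unless_bot` and `send` are hypothetical names standing in for whatever wraps your track call.

```python
import re

# Crude, illustrative heuristic; real bot lists are much longer.
BOT_PATTERN = re.compile(r"bot|crawler|spider|headless", re.IGNORECASE)


def is_probable_bot(user_agent):
    """True when the user agent string looks like an automated client."""
    return bool(user_agent) and bool(BOT_PATTERN.search(user_agent))


def track_unless_bot(send, user_agent, payload):
    """Call send(payload) only for traffic that does not look like a bot.
    Returns True when the event was sent."""
    if is_probable_bot(user_agent):
        return False
    send(payload)
    return True


sent = []
track_unless_bot(sent.append, "Mozilla/5.0 (compatible; Googlebot/2.1)", {"event": "Page Viewed"})
track_unless_bot(sent.append, "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)", {"event": "Page Viewed"})
```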
How does Segment integrate with consent managers?
[AS] Segment has an API and UI to delete and suppress data about end users. This allows you to block ongoing data collection about the user, and additionally to delete all historical data across Segment’s systems, connected S3 buckets and warehouses, and supported downstream partners.
You can also use code within the source to control which destinations data is sent to, for example depending on a user’s response to a consent form.
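One way to do this in code is to build the per-call `integrations` object from the user's consent choices. The `integrations` field with an `All` flag is, as far as I know, part of Segment's common fields; the consent categories, the destination-to-category mapping and the destination names below are assumptions and would need to match your own workspace.

```python
# Hypothetical mapping from destination name to consent category.
DESTINATION_CATEGORY = {
    "Google Analytics": "analytics",
    "Facebook Pixel": "advertising",
    "Optimizely": "functional",
}


def integrations_for(consent):
    """Build an integrations object that disables everything by default
    and enables only destinations whose category the user accepted."""
    integrations = {"All": False}
    for destination, category in DESTINATION_CATEGORY.items():
        integrations[destination] = bool(consent.get(category, False))
    return integrations


# A user who accepted analytics cookies only.
call = {
    "type": "track",
    "userId": "user_123",
    "event": "Article Read",
    "integrations": integrations_for({"analytics": True, "advertising": False}),
}
```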
Can Segment work with DMPs?
[AS] Yes, Segment can work with DMPs. Segment works by collecting and organizing data from your website, mobile app and anywhere else someone might interact with your brand, and then federating it to your other tools. We can also pull data back from those tools (things like email opens/clicks, etc.). We also help you connect anonymous user activity to an identified user through deterministic Identity Resolution.
Segment then lets you build audiences based on your 1st and 2nd party data and use them to target your marketing campaigns. These audiences can be used in conjunction with DMPs - for example, you can send an audience to Facebook’s DMP to create look-a-like audiences based on your 1st party data.
This audiencing capability is where the two overlap, but DMPs use almost exclusively 3rd party data. This is useful when you want to build campaigns targeting audiences who have never interacted with your brand, but this 3rd party data is much less accurate and most commonly used only at the top of the acquisition funnel. If you plan on creating highly personalized marketing campaigns based on your own data, you’ll need a CDP like Segment.
How long does it typically take teams to instrument a tool such as for analytics or A/B testing?
Anywhere from 2 weeks to 3 months.
How do we handle low/no connectivity scenarios on mobile?
[AS] We queue up events locally on the device and then send them once the connection is re-established. We also compress them first, so they don’t take up much memory. Additionally, by moving SDKs server-side you’ll get a smlle
Note: If they still have questions about limited memory and losing data: While we’re also limited by memory and the SDK is compressing as much as possible, this coupled with moving SDKs server-side should be fine.
Why do we batch track calls in Mobile?
Increase durability — Immediately persist every message to a disk-backed queue, preventing you from losing data in the event of battery or network loss.
Auto-Retries — If the network is spotty or not available, our SDKs will retry transferring the batch until the request is successful. This drastically improves data deliverability.
Reduce Network Usage — Each batch is gzip compressed, decreasing the amount of bytes on the wire by 10x-20x.
Save Battery — Because of data batching and compression, Segment’s SDKs reduce energy overhead by 2-3x which means longer battery life for your app’s users.
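The queue, retry and compression behavior described above can be sketched in a few lines. This is a simplified model, not the SDK's code: an in-memory list stands in for the disk-backed queue, `send` is a hypothetical callable that returns True on success, and the batch size is arbitrary.

```python
import gzip
import json


class BatchQueue:
    def __init__(self, send, batch_size=20):
        self.send = send              # callable(bytes) -> bool
        self.batch_size = batch_size
        self.pending = []             # stand-in for a disk-backed queue

    def enqueue(self, event):
        """Persist the event first, then try to flush once a batch is full."""
        self.pending.append(event)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        """Send one gzip-compressed batch. Events are removed from the
        queue only after a successful send, so a failure loses nothing."""
        if not self.pending:
            return True
        batch = self.pending[: self.batch_size]
        body = gzip.compress(json.dumps({"batch": batch}).encode("utf-8"))
        try:
            ok = self.send(body)
        except OSError:
            ok = False                # network error: keep events for retry
        if ok:
            del self.pending[: len(batch)]
        return ok
```

While the device is offline every flush fails and events simply accumulate; once the connection is back, repeated calls to `flush()` drain the queue one batch at a time.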
Why is a bundled SDK better than including native SDKs?
[AS] Eliminate Redundant Code
Consolidate repetitive user tracking code with a single, simple API.
Shrink your app
Reduce the size of your app by moving destinations to the server-side.
Boost Performance
Save your customers battery with data queuing, batching, and compression.
Use the tools you want, when you want
Turn cloud mode destinations on and off without having to re-release your app or wait for users to update to a new app version.
Are cloud sources credentials encrypted at rest?
Yes, they are encrypted at rest in our control plane db.
What are the use cases for Page calls? Does every customer use or have to use page calls?
I see this note: “Note: In analytics.js a page call is included in the snippet by default just after analytics.load. We do that because you must call this method at least once per page load.” on the Page Spec Docs.
i would never recommend removing page calls
thats 1/3 of the primary question people use segment to answer – where is the user doing stuff
and many integrations depend on page call explicit to function / instantiate
now if the question is they are firing unnecessary page calls ie. shit SPA setups etc
then could be another discussion but yeah plenty of use cases for page calls
one could argue you could infer page views from “product viewed” events etc. but to me that is additional user behavior they are deciding to track on that specific page
along with other button clicks and potentially actions
so it would be short sighted to scrap page calls
cuz that inherently breaks our api spec model
what about customers moving tracking server side, do they still fire page calls?
i guess you have/want to if you dont have every page viewed as a track event
i guess there is just a lot of overlap between page and track page view
you can theoretically fire virtual page calls like nike
is doing w their mobile apps
but you are making tradeoffs there
overlap maybe in data insight
but very different purpose when it comes to data integration and what features it drives between page and track much diff
firing page calls server side is such a hardo move
you have to basically manually collect our client side SDK that already auto collects a shit ton of context, pass it to your server, send calls
nike did it anyway cuz it was worth not having 3rd party sdk bloat for their apps
but for most folks, its just so much extra work for no meaningful impact/gain
-Han
How do you do on-premise installations?
shaurav.garg -
I haven’t run into this at Segment, but often for on-prem services, external connections are proxied via some middleman service that controls what data can go between the outside world and the on-prem installation behind the firewall. I think Diggory’s comment about existing connections is to gauge whether this model already exists and, if so, whether it could be extended to Segment, since really they need to access a couple of our CDN endpoints. I know we are working on possibly expanding our custom domain/proxying feature, but for now there are some public docs around this (https://segment.com/docs/connections/sources/custom-domains/)
There are also models where VPN interconnects allow communication between a customer’s customer’s on-prem installation and your own network. If that exists, then they could also leverage us by sending data from the customer’s customer’s on-prem app to their own backend and then using a server-side library to send data to us.
Can Protocols help govern device mode connections?
Currently Segment does not automatically govern data sent through device mode connections, but it is on the roadmap (this answer might be old, true as of 11/24/2019). Data is sent as configured by the developer to downstream destinations that are implemented in device mode, but is governed/blocked/transformed on its way to destinations that are in cloud mode with Protocols turned on. Typewriter (https://www.npmjs.com/package/typewriter) is a tool that allows engineers to download the tracking plan into their Segment SDK and can help them integrate correctly the first time. This mitigates potential device mode inconsistencies by enforcing the spec within the developer’s IDE rather than only within Segment.
How does violation alerting work?
Protocols violations can be routed to a separate Segment source. That source can then be used just like any other source in Segment. This allows companies to send Segment violations to internal webhooks, Slack, email tools, etc. and gives them many custom and bespoke options for how they want to be alerted to Segment violations. Additionally, in the Segment user notification settings, all kinds of email alerts can be set up for Protocols (e.g. when a new event is blocked for the first time, new properties are omitted, properties are missing, tracking plans are updated, etc.).
How do I fix bad data that is being blocked by protocols?
There are a couple of options. The best practice is to always fix the bad data at its source within the Segment SDK. This is the safest way to keep your data pipeline clean, but it can result in companies waiting long periods for engineering tickets to be addressed, and during this period bad data is being quarantined and is going unused. In this case customers can use Segment transformations within the Protocols tab. Segment transformations allow you to make simple name changes for events and properties, as well as reorganize JSON objects (e.g. unnest/nest objects). This is the perfect way to clean up your data while you wait for your engineering ticket to be fulfilled, so good data is still flowing downstream to your end destinations.
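The kinds of fixes transformations cover (renaming an event, renaming a property, unnesting an object) are simple enough to sketch. Transformations are configured in the Segment UI, not written as code; the functions below are only an illustration of what each fix does to an event, and the event itself is made up.

```python
import copy


def rename_event(event, old_name, new_name):
    """Rename the event if it matches old_name; leave others untouched."""
    fixed = copy.deepcopy(event)
    if fixed.get("event") == old_name:
        fixed["event"] = new_name
    return fixed


def rename_property(event, old_key, new_key):
    """Move the value stored under old_key to new_key."""
    fixed = copy.deepcopy(event)
    props = fixed.setdefault("properties", {})
    if old_key in props:
        props[new_key] = props.pop(old_key)
    return fixed


def unnest_property(event, key):
    """Lift the contents of properties[key] up one level."""
    fixed = copy.deepcopy(event)
    props = fixed.setdefault("properties", {})
    nested = props.get(key)
    if isinstance(nested, dict):
        del props[key]
        for k, v in nested.items():
            props.setdefault(k, v)  # existing top-level keys win
    return fixed


# A malformed event as it might arrive from the SDK.
bad = {"event": "order_completed", "properties": {"Total": 10, "product": {"sku": "A1"}}}
fixed = rename_event(bad, "order_completed", "Order Completed")
fixed = rename_property(fixed, "Total", "total")
fixed = unnest_property(fixed, "product")
```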
How do I add more complex rules/logic to Segment protocols?
Customers sometimes want to add conditional logic, dependencies and all sorts of complex rules to determine when an event should be blocked or allowed through. While this is not supported directly in the WYSIWYG editor, Segment Protocols uses JSON Schema under the hood. Within the Protocols UI, you can modify the JSON Schema rules directly, which supports all kinds of complex rules to further govern your data. The user will need a decent understanding of raw JSON Schema rules to make this work, but all the tools are at their disposal if they know what they are doing. I recommend using a free online JSON Schema validator when testing, and always testing in development before pushing to a production tracking plan.
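As an example of a conditional rule, the sketch below requires `coupon_code` only when `payment_method` is "coupon". It assumes the JSON Schema draft in use supports `if`/`then` (draft-07 and later); the property names are invented. The small checker understands only `required`, `const` and `if`/`then` and ignores everything else (including `type`), so it is a way to see the rule's behavior, not a substitute for a real validator.

```python
# Hypothetical rule for the properties of an "Order Completed" event.
CONDITIONAL_RULE = {
    "type": "object",
    "properties": {
        "payment_method": {"type": "string"},
        "coupon_code": {"type": "string"},
    },
    "required": ["payment_method"],
    "if": {
        "properties": {"payment_method": {"const": "coupon"}},
        "required": ["payment_method"],
    },
    "then": {"required": ["coupon_code"]},
}


def passes(subschema, props):
    """Evaluate only the `required` and `const` keywords of a subschema."""
    if any(key not in props for key in subschema.get("required", [])):
        return False
    for key, rule in subschema.get("properties", {}).items():
        if key in props and "const" in rule and props[key] != rule["const"]:
            return False
    return True


def is_valid(schema, props):
    """Apply the base rules, then the `then` branch when `if` matches."""
    if not passes(schema, props):
        return False
    if "if" in schema and passes(schema["if"], props):
        return passes(schema.get("then", {}), props)
    return True
```

With this rule, a card payment without a coupon code is allowed through, while a coupon payment without one is a violation.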
If I transform data in Protocols, does it bypass my tracking plan?
Data transforms at the source level do not bypass your tracking plan. Transformed data is passed back in through Protocols to make sure the new transformed format fits your given data schema. Data transforms within protocols are meant to be fixes to your malformed data, not a tool to bypass your data governance.
Data transformations done for specific destinations do in fact bypass your Protocols governance. These transformations are for when you want to make a very specific change to your data for a single downstream tool, and they will not be governed by your tracking plan rules.