Cybersecurity Risk of Documentation

By the end of this year, organizations are projected to have spent US$87 billion on cybersecurity tools like Security Orchestration, Automation and Response (SOAR), Security Information and Event Management (SIEM), and Endpoint Detection and Response (EDR). With attacks ranging from phishing scams to good ol’ fashioned viruses, it’s no wonder there’s such a plethora of tools (and acronyms).

Have you ever considered that your organization’s technical documentation could also pose a cybersecurity risk?

In our Developer Experience and Friction Audits and the documentation improvement assignments that follow them, we often identify security risks in both the content and the underlying developer documentation tools. Repercussions can include exposing employees or customers to spam, losing PCI compliance (e.g., for fintechs), opening up unauthorized access to systems (attack vectors), and many other nefarious issues.

Worried? You should be. To help you, we’ve put together this list.

Seven key considerations to reduce the cybersecurity risk of developer documentation:

Exposed IDs in Content and Images
Tooling
Software and Hardware Development Kits and Baked-out Docs
Additional Learning Resources
Git Repos and Other External Systems
Access Control – Employees and Contractors
Sandbox Environment

1. Exposed IDs in Content and Images

A common issue is the exposure of real-world identifiers in code samples and screenshots. These range from personally identifiable information (PII) like names, emails, phone numbers, and addresses, to IDs and URLs of related resources, and authentication data like API keys, as shown in this example:

APIKEY: 523basdwk28e2d9201o22d88

    {
        "employeeID": "1293745",
        "firstName": "John",
        "lastName": "Smith",
        "phone": "1236451234",
        "email": "johnsmith@yourorg.com",
        "companyID": "1398"
    }

To prevent exposure of real data, technical writers commonly obfuscate values and introduce fake data. For example, you might replace part of an email address with dots and change the domain name to something fake, like this:

"email":"j…@test.com"

In practice, however, teams can easily forget these touch-ups or obfuscate values so lightly that a bad actor can still guess them. For example, can you guess the full address behind this email?

"email":"johnsm..h@test.com"

Keep in mind that the same sample value may require obfuscation in several places, such as matching API requests and responses, tutorial topics, etc. This may seem obvious, but multiple siloed authors often update different topics in different guides. All it takes is one bad actor scouring your guides to piece together values for identity theft, phishing, spamming, etc.
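One way to avoid both guessable partial masks and inconsistent touch-ups across guides is to replace every real value with a fully synthetic one, and to reuse the same substitute wherever that value reappears. The following Python sketch is an illustration under stated assumptions, not a prescribed tool: the `Sanitizer` class and its naming scheme are hypothetical, and fake emails use `example.com`, a domain reserved for documentation by RFC 2606.

```python
class Sanitizer:
    """Hypothetical helper that swaps real sample values for synthetic ones.

    The same real value always maps to the same substitute, so matching
    requests, responses, and tutorials stay consistent across guides.
    """

    def __init__(self) -> None:
        self._mapping: dict[str, str] = {}

    def _make_fake(self, field: str) -> str:
        # The substitute is built only from the field name and a counter,
        # so no character (and not even the length) of the real value leaks.
        n = len(self._mapping) + 1
        if field == "email":
            # example.com is reserved for documentation use (RFC 2606).
            return f"user{n}@example.com"
        return f"sample-{field}-{n}"

    def sanitize(self, record: dict[str, str]) -> dict[str, str]:
        """Return a copy of `record` with every value replaced."""
        clean = {}
        for field, value in record.items():
            if value not in self._mapping:
                self._mapping[value] = self._make_fake(field)
            clean[field] = self._mapping[value]
        return clean


if __name__ == "__main__":
    s = Sanitizer()
    request = {"email": "johnsmith@yourorg.com", "companyID": "1398"}
    response = {"companyID": "1398", "lastName": "Smith"}
    print(s.sanitize(request))
    print(s.sanitize(response))
```

Because one mapping is shared across documents, the `companyID` in a sample request and its matching response receive the same substitute, while nothing of `johnsmith@yourorg.com` survives in the output.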

There’s even greater potential for insufficient obfuscation in screenshots, such as those depicting code snippets, portal admin screens, etc. Image obfuscation requires more effort because it involves art tools to blur values, cover them with filled boxes, or erase pixels. Without careful attention or the right artistic touch, enough data can still show through and enable a bad actor to guess values:

Figure: Top, insufficient box filling. Bottom, insufficient blurring. Together, the two images (poorly) obfuscate different parts of the same API key, enabling a bad actor to piece together the full value.

Tip: Always remember that not everyone in your organization:

has the tooling, artistic skills, or patience
is aware of obfuscation requirements
knows it’s their responsibility to obfuscate certain images.

It’s also easy to forget to re-obfuscate in art pipelines. For example, if you replace a screenshot in a layer (e.g., in Photoshop), verify that the obfuscations in other layers are still correctly aligned and sized over the values before baking out a new image.

To help prevent such issues, be sure to:

have formal documentation policies in place for obfuscation, with clear examples of what’s acceptable versus insufficient.
ensure all docs go through a formal review process by your technical writers.

2. Tooling

For doc tooling, we still see companies roll their own doc systems – something we never advise – usually because off-the-shelf systems lack a desired feature. Not only are these companies reinventing the wheel, but they may not have considered the security implications in their underlying implementation.

In other cases, organizations use Confluence or wikis – another category of tools we don’t recommend for docs. These tools make it too easy for contributors, especially new-hires, to expose internal content through external-facing topics, since both ultimately live in the same tool. Folks not familiar with what’s public versus private can:

accidentally link external docs to internal docs.
set workspace access permissions too broadly.
move topics from internal to customer-facing workspaces.
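A lightweight guard against the first of these mistakes is a pre-publish check that flags any link in a customer-facing topic that points at an internal host. This is a minimal sketch under assumptions: the domains in `INTERNAL_SUFFIXES` are hypothetical placeholders for your own wiki or intranet hosts, and the regex only catches absolute http(s) URLs.

```python
import re
from urllib.parse import urlparse

# Hypothetical internal domains; replace with your own wiki/intranet hosts.
INTERNAL_SUFFIXES = (".corp.example.com", ".internal.example.com")

# Absolute http(s) URLs, stopping at whitespace, quotes, or closing brackets.
URL_PATTERN = re.compile(r'https?://[^\s)<>"\]]+')


def find_internal_links(text: str) -> list[str]:
    """Return every URL in a public topic whose host looks internal."""
    hits = []
    for url in URL_PATTERN.findall(text):
        host = urlparse(url).hostname or ""
        if host.endswith(INTERNAL_SUFFIXES):
            hits.append(url)
    return hits


if __name__ == "__main__":
    page = (
        "See [setup](https://docs.example.com/setup) and "
        "[runbook](https://wiki.corp.example.com/ops/runbook)."
    )
    print(find_internal_links(page))
```

Run against the sample page, only the runbook link is reported; the public `docs.example.com` link passes.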

We recommend putting policies and procedures in place so that all stakeholders understand both your doc system’s requirements and the right level of information to expose.

3. Software and Hardware Development Kits and Baked-out Docs

Be sure to review your HDKs and SDKs, as they often include language-specific APIs with heavy code commenting. Although rich code comments help your target developers, they also make it easy for private data to linger. This is especially prevalent when your public API hasn’t been clearly separated from its implementation, which can expose too much data and potentially too much of your system’s inner workings.

Also, SDKs, and especially HDKs, often include baked-out doc formats like PDFs, CHM files, Word documents, etc. Here, doc reviews are especially important because once these formats are published and distributed, they can never be unpublished.

Modern SaaS-based doc tools like Readme.com, APIMatic, Document360, and Fern go a long way toward building a single source of truth. For starters, if your docs live in a SaaS-based tool, you can update your official published version at any time. Many of these tools also generate SDKs from your API definition, so all published content is built from that single source of truth (e.g., your OpenAPI spec).

Many of these tools also offer live API previews with auto-generated authentication values (e.g., API keys) for test purposes. This enables trial developers to self-evaluate your solution via the docs, giving you more control over how and when you hand out official authentication values for real prospects.
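If you build or configure such a preview yourself, one common pattern is to make trial keys visibly distinct from production keys so they are easy to spot, scope, and revoke. The sketch below assumes a hypothetical `test_` prefix and a fixed time-to-live; it is not any particular vendor’s API.

```python
import secrets
from datetime import datetime, timedelta, timezone

TEST_PREFIX = "test_"  # hypothetical convention marking sandbox-only keys


def issue_trial_key(ttl_hours: int = 24) -> tuple[str, datetime]:
    """Create a random sandbox key and the time it should stop working."""
    key = TEST_PREFIX + secrets.token_urlsafe(24)
    expires_at = datetime.now(timezone.utc) + timedelta(hours=ttl_hours)
    return key, expires_at


def is_allowed_in_production(key: str) -> bool:
    """Production endpoints reject anything carrying the test prefix."""
    return not key.startswith(TEST_PREFIX)
```

The prefix makes a leaked trial key obviously harmless to production, and the expiry keeps trial access from outliving the evaluation.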

And we would be remiss if we didn’t mention CI/CD. Using the built-in linters that many of these tools provide, you can set up automated checks and balances to catch information that shouldn’t be published. For example, set up a rule to ensure an ID’s sample value is significantly shorter than a real value, or tell the linter to check that no IDs appear in code comments.
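Linter rule syntax varies by tool, so here is a tool-agnostic sketch of the two rules above, written as a plain Python check you could run as a CI step. The real-ID length (`REAL_ID_LENGTH`), the field-name pattern, and the assumption that comment lines start with `#` or `//` are all hypothetical and need adapting to your own identifiers and languages.

```python
import re

REAL_ID_LENGTH = 24  # assumed length of production IDs/keys
MAX_SAMPLE_LENGTH = REAL_ID_LENGTH // 2  # samples must be clearly shorter

# "somethingID": "value" or "apiKey": "value" pairs in JSON-like samples.
ID_FIELD = re.compile(r'"(\w*(?:ID|Id|Key|key))"\s*:\s*"([^"]*)"')
# Long runs of letters/digits that look like real identifiers.
ID_LIKE = re.compile(r"\b[A-Za-z0-9]{16,}\b")
# Assumed comment syntax: lines starting with # or //.
COMMENT = re.compile(r"^\s*(#|//)")


def lint(text: str) -> list[str]:
    """Return one message per rule violation found in `text`."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Rule 1: sample ID values must be much shorter than real ones.
        for field, value in ID_FIELD.findall(line):
            if len(value) > MAX_SAMPLE_LENGTH:
                problems.append(
                    f"line {lineno}: sample {field} is too long ({len(value)} chars)"
                )
        # Rule 2: no ID-like values in code comments.
        if COMMENT.match(line) and ID_LIKE.search(line):
            problems.append(f"line {lineno}: ID-like value in a code comment")
    return problems
```

A non-empty result can fail the build, so a doc change that pastes in a real key never reaches the published site.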

4. Additional Learning Resources

Don’t forget to review your other learning resources for information leaks too, including your:

Technical blogs
White papers
Video tutorials
Postman collections
Public Git repos (more on this below)
Community forums
Customer support messages

See our Developer Journey map for a list of dev-focused touchpoints where PII can make its way in.
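Postman collections in particular tend to carry real tokens in saved headers and auth settings. A minimal sketch of a pre-publish check, assuming the Postman Collection v2.1 JSON layout (nested `item` arrays and `{"key": ..., "value": ...}` entries for headers and auth), flags literal values in sensitive entries unless they reference a `{{variable}}`. The `SENSITIVE_NAMES` set is an assumption to tune for your APIs.

```python
import json
import re

# Entry names whose values should never be literal secrets (assumed list).
# "value" is the slot assumed to hold the secret in apikey auth entries.
SENSITIVE_NAMES = {"authorization", "x-api-key", "api-key", "apikey",
                   "token", "password", "value"}
PLACEHOLDER = re.compile(r"\{\{[^{}]+\}\}")  # e.g. {{apiKey}}


def find_hardcoded_secrets(node, path="$"):
    """Yield JSON paths of key/value entries that hold literal secrets."""
    if isinstance(node, dict):
        name = str(node.get("key", "")).lower()
        value = node.get("value")
        if (name in SENSITIVE_NAMES and isinstance(value, str) and value
                and not PLACEHOLDER.search(value)):
            yield path
        for k, v in node.items():
            yield from find_hardcoded_secrets(v, f"{path}.{k}")
    elif isinstance(node, list):
        for i, v in enumerate(node):
            yield from find_hardcoded_secrets(v, f"{path}[{i}]")


def scan_collection(file_path: str) -> list[str]:
    """Load an exported collection file and list suspicious entries."""
    with open(file_path, encoding="utf-8") as f:
        return list(find_hardcoded_secrets(json.load(f)))
```

An `Authorization` header of `Bearer {{token}}` passes, while a pasted key in the same slot is reported with its JSON path.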

5. Git Repos and Other External Systems

Git repos deserve mention because they’re a prime example of an external resource where private information can linger, especially if the history isn’t cleared.

Many folks forget to gitignore credential files and inadvertently push them to their repository. Even when they notice the mistake and delete the file, the information remains available in the repository’s history. And although GitHub scans public repos for leaked GitHub credentials (and for the token formats of participating partner providers), credentials for any platform or API outside that coverage can still leak silently.
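A quick way to see whether credential files ever entered a repository, even if they were later deleted, is to list every path that appears anywhere in its history. The sketch below shells out to `git log --all --name-only --pretty=format:` and flags paths matching a hypothetical list of credential-file patterns; it only checks file names, not file contents.

```python
import fnmatch
import subprocess

# Hypothetical patterns for files that commonly hold credentials.
CREDENTIAL_PATTERNS = [".env", "*.pem", "*.key", "*.p12", "id_rsa",
                       "credentials*.json"]


def flag_credential_paths(paths: list[str]) -> set[str]:
    """Return the paths whose file name matches a credential pattern."""
    flagged = set()
    for path in paths:
        name = path.rsplit("/", 1)[-1]
        if any(fnmatch.fnmatch(name, p) for p in CREDENTIAL_PATTERNS):
            flagged.add(path)
    return flagged


def paths_in_history(repo_dir: str = ".") -> list[str]:
    """List every file path touched by any commit on any ref."""
    out = subprocess.run(
        ["git", "log", "--all", "--name-only", "--pretty=format:"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line.strip()]
```

For example, `sorted(flag_credential_paths(paths_in_history(".")))` run from a repo root lists any suspicious file that was ever committed on any branch. Treat every hit as compromised and rotate the credential; rewriting history afterwards limits further exposure but doesn’t undo a leak that has already been cloned.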

6. Access Control – Employees and Contractors

More generally, third-party cloud services like Git hosting platforms, Confluence, SaaS-based documentation tools, and community forums require a plan for administrator access control. Unfortunately, many companies forget to revoke access for stakeholders who no longer need it (e.g., employees who have left the organization). We recommend putting formal measures in place, such as time-limited access credentials.
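Whatever identity provider or admin console you use, the underlying idea of time-limited access is that every grant carries an expiry, and a scheduled job lists lapsed grants for revocation. This is a minimal, tool-agnostic sketch; `AccessGrant`, `grant`, and `expired` are hypothetical names, and the actual revocation call depends on each service’s admin API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class AccessGrant:
    user: str
    system: str  # e.g. "docs-portal" or "community-forum" (hypothetical)
    expires_at: datetime


def grant(user: str, system: str, days: int, now: datetime) -> AccessGrant:
    """Issue access that lapses automatically after `days` days."""
    return AccessGrant(user, system, now + timedelta(days=days))


def expired(grants: list[AccessGrant], now: datetime) -> list[AccessGrant]:
    """Grants whose window has closed and should be revoked upstream."""
    return [g for g in grants if g.expires_at <= now]
```

Renewal then becomes a deliberate act: anyone who still needs access asks for a new grant, and everyone else drops off the list by default.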

7. Sandbox Environment

And finally, don’t forget about your sandbox – that test environment where prospects can evaluate, learn, and build with your product.

If you have interactive docs involving authentication values or auto-generated API keys for trials, ensure they’re backed by a test sandbox environment that has no production data.

Ensure your sandbox is partitioned for each prospect or customer and not shared. We’ve seen at least one case where a sandbox was shared across trial users, making for messy testing and a potential security gap. Remember that some trial developers and prospects may put real data in their sandbox, expecting to eventually migrate to your production environment once they adopt your product.

So, spend the time to put together quality test data and the infrastructure necessary to quickly spin up new sandbox instances for new developers.
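The two requirements above, a separate sandbox per developer and synthetic seed data, can be sketched in a few lines. This is an in-memory illustration under stated assumptions: `SandboxManager`, `SEED_DATA`, and the `sbx_` ID prefix are hypothetical, and a real implementation would provision separate databases or tenants rather than Python dicts.

```python
import copy
import secrets
from dataclasses import dataclass, field

# Synthetic seed data only: no production records ever go in here.
SEED_DATA = {
    "employees": [
        {"employeeID": "emp-001", "firstName": "Ada", "email": "user1@example.com"},
        {"employeeID": "emp-002", "firstName": "Alan", "email": "user2@example.com"},
    ]
}


@dataclass
class Sandbox:
    owner: str
    sandbox_id: str
    data: dict = field(default_factory=dict)


class SandboxManager:
    """Hypothetical manager that gives every developer an isolated sandbox."""

    def __init__(self) -> None:
        self._by_owner: dict[str, Sandbox] = {}

    def get_or_create(self, owner: str) -> Sandbox:
        if owner not in self._by_owner:
            self._by_owner[owner] = Sandbox(
                owner=owner,
                sandbox_id="sbx_" + secrets.token_hex(8),
                # Deep copy so one developer's edits never reach another's data.
                data=copy.deepcopy(SEED_DATA),
            )
        return self._by_owner[owner]
```

Because each sandbox gets its own copy of the seed data, real records that one trial developer enters stay invisible to every other developer.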

Need help with your Technical Documentation?

We get it – everyone in your organization is busy on the new release, Git branches are sprouting up faster than a tree in spring, and docs are probably the last thing on most people’s minds.

Backed by years of development and coding experience, we can work closely with your team of subject matter experts to help you build best-in-class technical documentation using the latest tools. Contact us today to discuss how we can help you!

Don’t let your Technical Documentation be a Cybersecurity Risk