Friday, 22 April 2022

Writing Strong Front-end Test Element Locators

Automated front-end tests are awesome. We can write a test with code to visit a page — or load up just a single component — and have that test code click on things or type text like a user would, then make assertions about the state of the application after the interactions. This lets us confirm that everything described in the tests work as expected in the application.

Since this post is about one of the building blocks of any automated UI tests, I don’t assume too much prior knowledge. Feel free to skip the first couple of sections if you’re already familiar with the basics.

Structure of a front-end test

There’s a classic pattern that’s useful to know when writing tests: Arrange, Act, Assert. In front-end tests, this translates to a test file that does the following:

  1. Arrange: Get things ready for the test. Visit a certain page, or mount a certain component with the right props, mock some state, whatever.
  2. Act: Do something to the application. Click a button, fill out a form, etc. Or not, for simple state-checks, we can skip this.
  3. Assert: Check some stuff. Did submitting a form show a thank you message? Did it send the right data to the back end with a POST?

In specifying what to interact with and then later what to check on the page, we can use various element locators to target the parts of the DOM we need to use.

A locator can be something like an element’s ID, the text content of an element, or a CSS selector, like .blog-post or even article > div.container > div > div > p:nth-child(12). Anything about an element that can identify that element to your test runner can be a locator. As you can probably already tell from that last CSS selector, locators come in many varieties.

We often evaluate locators in terms of being brittle or stable. In general, we want the most stable element locators possible so that our test can always find the element it needs, even if the code around the element is changing over time. That said, maximizing stability at all costs can lead to defensive test-writing that actually weakens the tests. We get the most value by having a combination of brittleness and stability that aligns with what we want our tests to care about.

In this way, element locators are like duct tape. They should be really strong in one direction, and tear easily in the other direction. Our tests should hold together and keep passing when unimportant changes are made to the application, but they should readily fail when important changes happen that contradict what we’ve specified in the test.

Beginner’s guide to element locators in front-end testing

First, let’s pretend we are writing instructions for an actual person to do their job. A new gate inspector has just been hired at Gate Inspectors, Inc. You are their boss, and after everybody’s been introduced you are supposed to give them instructions for inspecting their first gate. If you want them to be successful, you probably would not write them a note like this:

Go past the yellow house, keep going ‘til you hit the field where Mike’s mother’s friend’s goat went missing that time, then turn left and tell me if the gate in front of the house across the street from you opens or not.

Those directions are kind of like using a long CSS selector or XPath as a locator. It’s brittle — and it’s the “bad kind of brittle”. If the yellow house gets repainted and you repeat the steps, you can’t find the gate anymore, and might decide to give up (or in this case, the test fails).

Likewise, if you don’t know about Mike’s mother’s friend’s goat situation, you can’t stop at the right reference point to know what gate to check. This is exactly what makes the “bad kind of brittle” bad — the test can break for all kinds of reasons, and none of those reasons have anything to do with the usability of the gate.

So let’s make a different front-end test, one that’s much more stable. After all, legally in this area, all gates on a given road are supposed to have unique serial numbers from the maker:

Go to the gate with serial number 1234 and check if it opens.

This is more like locating an element by its ID. It’s more stable and it’s only one step. All the points of failure from the last test have been removed. This test will only fail if the gate with that ID doesn’t open as expected.

Now, as it turns out, though no two gates should have the same ID on the same road, that’s not actually enforced anywhere And one day, another gate on the road ends up with the same ID.

So the next time the newly hired gate inspector goes to test “Gate 1234,” they find that other one first, and are now visiting the wrong house and checking the wrong thing. The test might fail, or worse: if that gate works as expected, the test still passes but it’s not testing the intended subject. It provides false confidence. It would keep passing even if our original target gate was stolen in the middle of the night, by gate thieves.

After an incident like this, it’s clear that IDs are not as stable as we thought. So, we do some next-level thinking and decide that, on the inside of the gate, we’d like a special ID just for testing. We’ll send out a tech to put the special ID on just this one gate. The new test instructions look like this:

Go to the gate with Test ID “my-favorite-gate” and check if it opens.

This one is like using the popular data-testid attribute. Attributes like this are great because it is obvious in the code that they are intended for use by automated tests and shouldn’t be changed or removed. As long as the gate has that attribute, you will always find the gate. Just like IDs, uniqueness is still not enforced, but it’s a bit more likely.

This is about as far away from brittle as you can get, and it confirms the functionality of the gate. We don’t depend on anything except the attribute we deliberately added for testing. But there’s a bit of problem hiding here…

This is a user interface test for the gate, but the locator is something that no user would ever use to find the gate.

It’s a missed opportunity because, in this imaginary county, it turns out gates are required to have the house number printed on them so that people can see the address. So, all gates should have an unique human-facing label, and if they don’t, it’s a problem in and of itself.

When locating the gate with the test ID, if it turns out that the gate has been repainted and the house number covered up, our test would still pass. But the whole point of the gate is for people to access the house. In other words, a working gate that a user can’t find should still be a test failure, and we want a locator that is capable of revealing this problem.

Here’s another pass at this test instruction for the gate inspector on their first day:

Go to the gate for house number 40 and check if it opens.

This one uses a locator that adds value to the test: it depends on something users also depend on, which is the label for the gate. It adds back a potential reason for the test to fail before it reaches the interaction we want to actually test, which might seem bad at first glance. But in this case, because the locator is meaningful from a user’s perspective, we shouldn’t shrug this off as “brittle.” If the gate can’t be found by its label, it doesn’t matter if it opens or not — this is is the “good kind of brittle”.

The DOM matters

A lot of front-end testing advice tells us to avoid writing locators that depend on DOM structure. This means that developers can refactor components and pages over time and let the tests confirm that user-facing workflows haven’t broken, without having to update tests to catch up to the new structure. This principle is useful, but I would tweak it a bit to say we ought to avoid writing locators that depend on irrelevant DOM structure in our front-end testing.

For an application to function correctly, the DOM should reflect the nature and structure of the content that’s on the screen at any given time. One reason for this is accessibility. A DOM that’s correct in this sense is much easier for assistive technology to parse properly and describe to users who aren’t seeing the contents rendered by the browser. DOM structure and plain old HTML make a huge difference to the independence of users who rely on assistive technology.

Let’s spin up a front-end test to submit something to the contact form of our app. We’ll use Cypress for this, but the principles of choosing locators strategically apply to all front-end testing frameworks that use the DOM for locating elements. Here we find elements, enter some test, submit the form, and verify the “thank you” state is reached:

// 👎 Not recommended
cy.get('#name').type('Mark')
cy.get('#comment').type('test comment')
cy.get('.submit-btn').click()
cy.get('.thank-you').should('be.visible')

There are all kinds of implicit assertions happening in these four lines. cy.get() is checking that the element exists in the DOM. The test will fail if the element doesn’t exist after a certain time, while actions like type and click verify that the elements are visible, enabled, and unobstructed by something else before taking an action.

So, we get a lot “for free” even in a simple test like this, but we’ve also introduced some dependencies upon things we (and our users) don’t really care about. The specific ID and classes that we are checking seem stable enough, especially compared to selectors like div.main > p:nth-child(3) > span.is-a-button or whatever. Those long selectors are so specific that a minor change to the DOM could cause a test to fail because it can’t find the element, not because the functionality is broken.

But even our short selectors, like #name, come with three problems:

  1. The ID could be changed or removed in the code, causing the element to go overlooked, especially if the form might appear more than once on a page. A unique ID might need to be generated for each instance, and that’s not something we can easily pre-fill into a test.
  2. If there is more than one instance of a form on the page and they have the same ID, we need to decide which form to fill out.
  3. We don’t actually care what the ID is from a user perspective, so all the built-in assertions are kind of… not fully leveraged?

For problems one and two, the recommended solution is often to use dedicated data attributes in our HTML that are added exclusively for testing. This is better because our tests no longer depend on the DOM structure, and as a developer modifies the code around a component, the tests will continue to pass without needing an update, as long as they keep the data-test="name-field" attached to the right input element.

This doesn’t address problem three though — we still have a front-end interaction test that depends on something that is meaningless to the user.

Meaningful locators for interactive elements

Element locators are meaningful when they depend on something we actually want to depend on because something about the locator is important to the user experience. In the case of interactive elements, I would argue that the best selector to use is the element’s accessible name. Something like this is ideal:

// 👍 Recommended
cy.getByLabelText('Name').type('Mark')

This example uses the byLabelText helper from Cypress Testing Library. (In fact, if you are using Testing Library in any form, it is probably already helping you write accessible locators like this.)

This is useful because now the built-in checks (that we get for free through the cy.type() command) include the accessibility of the form field. All interactive elements should have an accessible name that is exposed to assistive technology. This is how, for example, a screenreader user would know what the form field they are typing into is called in order to enter the needed information.

The way to provide this accessible name for a form field is usually through a label element associated with the field by an ID. The getByLabelText command from Cypress Testing Library verifies that the field is labeled appropriately, but also that the field itself is an element that’s allowed to have a label. So, for example, the following HTML would correctly fail before the type() command is attempted, because even though a label is present, labeling a div is invalid HTML:

<!-- 👎 Not recommended  -->
<label for="my-custom-input">Editable DIV element:</label>
<div id="my-custom-input" contenteditable="true" />

Because this is invalid HTML, screenreader software could never associate this label with this field correctly. To fix this, we would update the markup to use a real input element:

<!-- 👍 Recommended -->
<label for="my-real-input">Real input:</label>
<input id="my-real-input" type="text" />

This is awesome. Now if the test fails at this point after edits made to the DOM, it’s not because of an irrelevant structure changes to how elements are arranged, but because our edits did something to break a part of DOM that our front-end tests explicitly care about, that would actually matter to users.

Meaningful locators for non-interactive elements

For non-interactive elements, we should put on our thinking caps. Let’s use a little bit of judgement before falling back on the data-cy or data-test attributes that will always be there for us if the DOM doesn’t matter at all.

Before we dip into the generic locators, let’s remember: the state of the DOM is our Whole Thing™ as web developers (at least, I think it is). And the DOM drives the UX for everybody who is not experiencing the page visually. So a lot of the time, there might be some meaningful element that we could or should be using in the code that we can include in a test locator.

And if there’s not, because. say, the application code is all generic containers like div and span, we should consider fixing up the application code first, while adding the test. Otherwise there is a risk of having our tests actually specify that the generic containers are expected and desired, making it a little harder for somebody to modify that component to be more accessible.

This topic opens up a can of worms about how accessibility works in an organization. Often, if nobody is talking about it and it’s not a part of the practice at our companies, we don’t take accessibility seriously as front-end developers. But at the end of the day, we are supposed to be the experts in what is the “right markup” for design, and what to consider in deciding that. I discuss this side of things a lot more in my talk from Connect.Tech 2021, called “Researching and Writing Accessible Vue… Thingies”.

As we saw above, with the elements we traditionally think of as interactive, there is a pretty good rule of thumb that’s easy to build into our front-end tests: interactive elements should have perceivable labels correctly associated to the element. So anything interactive, when we test it, should be selected from the DOM using that required label.

For elements that we don’t think of as interactive — like most content-rendering elements that display pieces of text of whatever, aside from some basic landmarks like main — we wouldn’t trigger any Lighthouse audit failures if we put the bulk of our non-interactive content into generic div or span containers. But the markup won’t be very informative or helpful to assistive technology because it’s not describing the nature and structure of the content to somebody who can’t see it. (To see this taken to an extreme, check out Manuel Matuzovic’s excellent blog post, “Building the most inaccessible site possible with a perfect Lighthouse score.”)

The HTML we render is where we communicate important contextual information to anybody who is not perceiving the content visually. The HTML is used to build the DOM, the DOM is used to create the browser’s accessibility tree, and the accessibility tree is the API that assistive technologies of all kinds can use to express the content and the actions that can be taken to a disabled person using our software. A screenreader is often the first example we think of, but the accessibility tree can also be used by other technology, like displays that turn webpages into Braille, among others.

Automated accessibility checks won’t tell us if we’ve really created the right HTML for the content. The “rightness” of the HTML a judgement call we are making developers about what information we think needs to be communicated in the accessibility tree.

Once we’ve made that call, we can decide how much of that is suitable to bake into the automated front-end testing.

Let’s say that we have decided that a container with the status ARIA role will hold the “thank you” and error messaging for a contact form. This might be nice so that the feedback for the form’s success or failure can be announced by a screenreader. CSS classes of .thank-you and .error could be applied to control the visual state.

If we add this element and want to write a UI test for it, we might write an assertion like this after the test fills out the form and submits it:

// 👎 Not recommended
cy.get('.thank-you').should('be.visible')

Or even a test that uses a non-brittle but still meaningless selector like this:

// 👎 Not recommended
cy.get('[data-testid="thank-you-message"]').should('be.visible')

Both could be rewritten using cy.contains():

// 👍 Recommended
cy.contains('[role="status"]', 'Thank you, we have received your message')
  .should('be.visible')

This would confirm that the expected text appeared and was inside the right kind of container. Compared to the previous test, this has much more value in terms of verifying actual functionality. If any part of this test fails, we’d want to know, because both the message and the element selector are important to us and shouldn’t be changed trivially.

We have definitely gained some coverage here without a lot of extra code, but we’ve also introduced a different kind of brittleness. We have plain English strings in our tests, and that means if the “thank you” message changes to something like “Thank you for reaching out!” this test suddenly fails. Same with all the other tests. A small change to how a label is written would require updating any test that targets elements using that label.

We can improve this by using the same source of truth for these strings in front-end tests as we do in our code. And if we currently have human-readable sentences embedded right there in the HTML of our components… well now we have a reason to pull that stuff out of there.

Human-readable strings might be the magic numbers of UI code

A magic number (or less-excitingly, an “unnamed numerical constant”) is that super-specific value you sometimes see in code that is important to the end result of a calculation, like a good old 1.023033 or something. But since that number is not unlabeled, its significance is unclear, and so it’s unclear what it’s doing. Maybe it applies a tax. Maybe it compensates for some bug that we don’t know about. Who knows?

Either way, the same is true for front-end testing and the general advice is to avoid magic numbers because of their lack of clarity. Code reviews will often catch them and ask what the number is for. Can we move it into a constant? We also do the same thing if a value is to be reused multiple places. Rather than repeat the value everywhere, we can store it in a variable and use the variable as needed.

Writing user interfaces over the years, I’ve come to see text content in HTML or template files as very similar to magic numbers in other contexts. We’re putting absolute values all through our code, but in reality it might be more useful to store those values elsewhere and bring them in at build time (or even through an API depending on the situation).

There are a few reasons for this:

  1. I used to work with clients who wanted to drive everything from a content management system. Content, even form labels and status messages, that didn’t live in the CMS were to be avoided. Clients wanted full control so that content changes didn’t require editing code and re-deploying the site. That makes sense; code and content are different concepts.
  2. I’ve worked in many multilingual codebases where all the text needs to be pulled in through some internationalization framework, like this:
<label for="name">
  <!-- prints "Name" in English but something else in a different language -->
  
</label>
  1. As far as front-end testing goes, our UI tests are much more robust if, instead of checking for a specific “thank you” message we hardcode into the test, we do something like this:
const text = content.en.contactFrom // we would do this once and all tests in the file can read from it

cy.contains(text.nameLabel, '[role="status"]').should('be.visible')

Every situation is different, but having some system of constants for strings is a huge asset when writing robust UI tests, and I would recommend it. Then, if and when translation or dynamic content does become necessary for our situation, we are a lot better prepared for it.

I’ve heard good arguments against importing text strings in tests, too. For example, some find tests are more readable and generally better if the test itself specifies the expected content independently from the content source.

It’s a fair point. I’m less persuaded by this because I think content should be controlled through more of an editorial review/publishing model, and I want the test to check if the expected content from the source got rendered, not some specific strings that were correct when the test was written. But plenty of people disagree with me on this, and I say as long as within a team the tradeoff is understood, either choice is acceptable.

That said, it’s still a good idea to isolate code from content in the front end more generally. And sometimes it might even be valid to mix and match — like importing strings in our component tests and not importing them in our end-to-end tests. This way, we save some duplication and gain confidence that our components display correct content, while still having front-end tests that independently assert the expected text, in the editorial, user-facing sense.

When to use data-test locators

CSS selectors like [data-test="success-message"] are still useful and can be very helpful when used in an intentional way, instead of used all the time. If our judgement is that there’s no meaningful, accessible way to target an element, data-test attributes are still the best option. They are much better than, say, depending on a coincidence like whatever the DOM structure happens to be on the day you are writing the test, and falling back to the “second list item in the third div with a class of card” style of test.

There are also times when content is expected to be dynamic and there’s no way to easily grab strings from some common source of truth to use in our tests. In those situations, a data-test attribute helps us reach the specific element we care about. It can still be combined with an accessibility-friendly assertion, for example:

cy.get('h2[data-test="intro-subheading"]')

Here we want to find what has the data-test attribute of intro-subheading, but still allow our test to assert that it should be a h2 element if that’s what we expect it to be. The data-test attribute is used to make sure we get the specific h2 we are interested in, not some other h2 that might be on the page, if for some reason the content of that h2 can’t be known at the time of the test.

Even in cases where we do know the content, we might still use data attributes to make sure the application renders that content in the right spot:

cy.contains('h2[data-test="intro-subheading"]', 'Welcome to Testing!')

data-test selectors can also be a convenience to get down to a certain part of the page and then make assertions within that. This could look like the following:

cy.get('article[data-test="ablum-card-blur-great-escape"]').within(() => {
  cy.contains('h2', 'The Great Escape').should('be.visible')
  cy.contains('p', '1995 Album by Blur').should('be.visible')
  cy.get('[data-test="stars"]').should('have.length', 5)
})

At that point we get into some nuance because there may very well be other good ways to target this content, it’s just an example. But at the end of the day, it’s a good if worrying about details like that is where we are because at least we have some understanding of the accessibility features embedded in the HTML we are testing, and that we want to include those in our tests.

When the DOM matters, test it

Front-end tests bring a lot more value to us if we are thoughtful about how we tell the tests what elements to interact with, and what to contents to expect. We should prefer accessible names to target interactive components, and we should include specific elements names, ARIA roles, etc., for non-interactive content — if those things are relevant to the functionality. These locators, when practical, create the right combination of strength and brittleness.

And of course, for everything else, there’s data-test.


Writing Strong Front-end Test Element Locators originally published on CSS-Tricks. You should get the newsletter.



from CSS-Tricks https://ift.tt/vSF9bM3
via IFTTT

No comments:

Post a Comment

Passkeys: What the Heck and Why?

These things called  passkeys  sure are making the rounds these days. They were a main attraction at  W3C TPAC 2022 , gained support in  Saf...