w3c / webdriver

Remote control interface that enables introspection and control of user agents.

Home Page: https://w3c.github.io/webdriver/

License: Other

webdriver's Introduction

WebDriver Standard

WebDriver is a remote control interface that enables introspection and control of user agents. It provides a platform- and language-neutral wire protocol as a way for out-of-process programs to remotely instruct the behavior of web browsers.

Provided is a set of interfaces to discover and manipulate DOM elements in web documents and to control the behavior of a user agent. It is primarily intended to allow web authors to write tests that automate a user agent from a separate controlling process, but may also be used in such a way as to allow in-browser scripts to control a — possibly separate — browser.

The standard is authored by the W3C Browser Testing and Tools Working Group, which has produced the following documents:

Living Document: https://w3c.github.io/webdriver/
Level 2 (Working Draft): https://www.w3.org/TR/webdriver2/
Level 1 (Recommendation): https://www.w3.org/TR/webdriver1/

Contribute

In short, change index.html and submit a pull request (PR) with a good commit message. Changes that affect behaviour must be accompanied by corresponding test changes to the Web Platform Tests repository.

We use ReSpec to help us maintain referential integrity, bibliographical data, and perform other mundane tasks such as styling. To preview your changes, just load index.html from disk in a browser. To verify the integrity of the document you can run make test.

You may add your name to the Acknowledgements section in your first PR, even for trivial fixes. The names are sorted lexicographically.

See CONTRIBUTING.md for more guidelines.

Vendor status documents

webdriver's Issues

Update timeouts to have a GET HTTP Verb to return details

https://www.w3.org/Bugs/Public/show_bug.cgi?id=25012

Marc Fisher:

I believe this was discussed at the F2F, but regardless, it would be useful to be able to get the current value for a timeout.
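
A minimal sketch of what such a read operation could return, assuming a hypothetical in-memory session dictionary. The timeout names, the default values and the `{"value": ...}` envelope are illustrative assumptions, not taken from the spec.

```python
import json

# Hypothetical defaults in milliseconds; names and values are assumptions.
DEFAULT_TIMEOUTS = {"script": 30000, "pageLoad": 300000, "implicit": 0}


def get_timeouts(session):
    """Return the JSON body a hypothetical GET timeouts endpoint might send."""
    # Values set on the session override the defaults.
    timeouts = {**DEFAULT_TIMEOUTS, **session.get("timeouts", {})}
    return json.dumps({"value": timeouts}, sort_keys=True)
```

A local end could then read back `implicit` after setting it, instead of having to remember the value it last sent.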

Define Element Send Keys

Ability to take screenshots of containers like flash/java

https://www.w3.org/Bugs/Public/show_bug.cgi?id=27177

Alexei Barantsev:

The spec should state explicitly whether it is possible to take screenshots of "external" elements such as Java applets and Flash/Flex objects, and perhaps canvas too.

Missing support for HTTP authentication prompts

https://www.w3.org/Bugs/Public/show_bug.cgi?id=28802

csnover:

There is currently no way to handle HTTP authentication prompts when navigating to a page, only pre-authentication with username/password in the URL works (and, apparently, not without workarounds in some browsers like IE).

Related Selenium issue with links to more background and other information: SeleniumHQ/selenium#453

Getting window position isn't defined

https://www.w3.org/Bugs/Public/show_bug.cgi?id=26708

Andreas Tolfsen:

There is no way currently to get a window's position, you can only set it [1].

[1] https://dvcs.w3.org/hg/webdriver/raw-file/tip/webdriver-spec.html#resizing-and-positioning-windows

Order of error checks in spec incompatible with proxy implementations

https://www.w3.org/Bugs/Public/show_bug.cgi?id=29218

James Graham:

"1. If the current browsing context is no longer open, return error with error code no such window.

2. Handle any user prompts, and return its value if it is an error.

3. Let cookie be the result of getting a property named "cookie" from the parameters argument."

This pattern doesn't work with a proxy implementation since it must read the full request before communicating with the backend that can know things like whether the browsing context is still open. Also the browsing context may close whilst the request is being read. So generally it seems better to delay these checks until after the request is fully processed.
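
The reordering suggested here could look roughly like the sketch below: the request body is read and validated in full before the backend is asked about the browsing context. `handle_add_cookie`, `WebDriverError` and the `context_is_open` callable are hypothetical names used only for illustration.

```python
import json


class WebDriverError(Exception):
    """Carries a WebDriver error code such as 'no such window'."""

    def __init__(self, code):
        super().__init__(code)
        self.code = code


def handle_add_cookie(raw_body, context_is_open):
    """Parse the whole request first, then run the state checks.

    `context_is_open` is a hypothetical callable that asks the backend
    whether the current browsing context is still open.
    """
    try:
        parameters = json.loads(raw_body)
    except ValueError:
        raise WebDriverError("invalid argument")
    if not isinstance(parameters, dict) or "cookie" not in parameters:
        raise WebDriverError("invalid argument")

    # Only now consult the backend; a proxy can do this after reading the body.
    if not context_is_open():
        raise WebDriverError("no such window")
    return parameters["cookie"]
```

With this ordering a malformed request fails the same way whether or not the window has closed in the meantime, which is the property a proxy implementation can actually provide.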

Content Type header should be specified on the response

James Graham:

Presumably application/json. Probably some other headers are also needed.
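
As a rough illustration, a remote end might assemble its response along these lines. The `build_response` helper, the `{"value": ...}` envelope and the choice to add `Content-Length` are assumptions made for the sketch.

```python
import json


def build_response(value, status=200):
    """Return (status, headers, body) for a JSON response."""
    body = json.dumps({"value": value}).encode("utf-8")
    headers = {
        # The header this issue asks the spec to mandate.
        "Content-Type": "application/json; charset=utf-8",
        "Content-Length": str(len(body)),
    }
    return status, headers, body
```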

Describe data structures coming over the wire with a JSON schema instead of WebIDL

https://www.w3.org/Bugs/Public/show_bug.cgi?id=26707

Andreas Tolfsen:

The WindowSize dictionary [1] uses the data type double for height/width but this isn't supported in JSON which only has a number type [2] (which may be extracted to an integer or float in the local end).

(Furthermore ElementRect may use floats for the positioning of elements in the DOM, e.g. .5px, but WindowSize uses integers as no window managers support half pixels.)

[1] https://dvcs.w3.org/hg/webdriver/raw-file/tip/webdriver-spec.html#dictionary-windowsize-members

[2] http://www.ietf.org/rfc/rfc4627.txt
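
To illustrate the point about the local end: JSON carries a single number type, and the decoder decides what it becomes. The payloads below are made-up examples, shown with Python's standard `json` module.

```python
import json

size = json.loads('{"width": 800, "height": 600}')
rect = json.loads('{"x": 10.5, "y": 0.5}')

# Python's decoder picks int or float based on how the number is written.
width_type = type(size["width"]).__name__
x_type = type(rect["x"]).__name__
```

Here `width_type` is `int` and `x_type` is `float`, even though both values were plain JSON numbers on the wire.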

executeScript and executeAsyncScript don't specify how to handle alerts that can appear during execution

https://www.w3.org/Bugs/Public/show_bug.cgi?id=29134

Alexei Barantsev:

executeScript and executeAsyncScript don't specify how to handle alerts that can appear during execution

Extend the find-by-link-text location strategy to apply to all elements

https://www.w3.org/Bugs/Public/show_bug.cgi?id=24847

Wilhelm Joys Andersen:

Many of the interactive elements in web applications these days are not links, but arbitrary elements with event handlers. To test interaction with such elements, test authors often resort to using XPath.

XPath is an undesirable anti-pattern that should be killed with fire. To be able to fry it, we must first cater to its usecases and make it obsolete.

The first step can be to allow any element to be selected by its (visible) text:

https://dvcs.w3.org/hg/webdriver/raw-file/default/webdriver-spec.html#link-text

element.send_keys needs to talk about implicitly unsetting the modifier keys

https://www.w3.org/Bugs/Public/show_bug.cgi?id=26290

seva:

[recorded in http://www.w3.org/2014/02/26-testing-minutes.html#action19]

Switch to Window doesn't specify what to do with the current browsing context

It sets the current top-level browsing context, but not the current browsing context.

The only real ambiguity comes when "Switch to Window" is called with the handle of the current top-level browsing context (i.e., is a no-op):

Marionette does not change the current browsing context.
All other driver implementations I'm aware of (including Microsoft WebDriver and Apple SafariDriver) set the current browsing context to the top-level context, which seems to me like the right approach.
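
A toy model of the two pointers may make the ambiguity concrete. The `Session` class and its attribute names are hypothetical; the sketch implements the behaviour the issue favours, not what any particular driver does.

```python
class Session:
    """Toy model of the two context pointers discussed above."""

    def __init__(self, top_level):
        self.top_level_context = top_level
        self.current_context = top_level

    def switch_to_frame(self, frame):
        # Only the current browsing context changes when entering a frame.
        self.current_context = frame

    def switch_to_window(self, handle):
        self.top_level_context = handle
        # Behaviour favoured in the issue: always reset to the top-level
        # context, even when `handle` is already the current window.
        self.current_context = handle
```

Under this model, switching to the window that is already current still leaves any frame the session had entered.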

Merge getting window size and position to a single endpoint

https://www.w3.org/Bugs/Public/show_bug.cgi?id=26709

Andreas Tolfsen:

Getting window size and position (bug 26708) is currently two different command endpoints. We should consider doing the same as for ElementRect and combine them to a single dictionary and endpoint.

Define what response should be sent when an alert is open

https://www.w3.org/Bugs/Public/show_bug.cgi?id=26962

Andreas Tolfsen:

Section 5.2.1 says that “If any modal dialog box, such as those opened by on window.onbeforeunload or window.alert, is opened at any point in the page load, a response MUST be sent.”

Because we need to do this as a precondition for almost all commands I suggest we make it a definition that each command's algorithm can refer to.

The language also needs to be cleaned up, and I suggest something along the lines of:

Define a global state that signifies whether an alert dialogue is open.

Create a definition of what steps to take when the previous state is true, including the steps to populate the response with the correct status.

Add this as a precondition to all commands where we need to check for this.

I imagine we can use a language like this for the POST /session/{session_id}/url command:

“All alert dialogs created during beforeunload are subject to unexpected alert handling.”

And a definition of alert dialogs:

“Window.alert, Window.confirm, and Window.prompt are considered alert dialogs.”

Then some text on how to handle the dialogs:

“Alert dialogs block document script execution and WebDriver behaves the same way. When alert dialogs are created commands are free to choose if they should affect their response. The following steps may be run when a command requests unexpected alert handling on request:

If the current alert is defined:

Let response's status be unexpected alert open.
Return response and abort the remaining steps.

Otherwise, return.
”
Then the definition of “current alert”:

“The remote end must keep a global state current alert that is an initially left undefined alert. When any of the alert dialogs appear, this state must be updated with a reference to that alert.”

And then we need a definition of an “alert” struct which we can use in the algorithm for interacting with the alert.
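
The proposed pieces (a global "current alert" state, a reusable handling step, and commands that call it as a precondition) might fit together as in this sketch. `RemoteEnd`, its method names and the response dictionaries are assumptions made for illustration.

```python
class RemoteEnd:
    """Toy remote end holding the proposed 'current alert' state."""

    def __init__(self):
        self.current_alert = None  # initially undefined

    def on_dialog_opened(self, kind, text):
        # kind is one of 'alert', 'confirm', 'prompt'
        self.current_alert = {"kind": kind, "text": text}

    def handle_unexpected_alert(self):
        """Return an error response if an alert is open, else None."""
        if self.current_alert is not None:
            return {"status": "unexpected alert open"}
        return None

    def navigate(self, url):
        # Precondition shared by every command that opts in.
        error = self.handle_unexpected_alert()
        if error is not None:
            return error
        return {"status": "success", "value": url}
```

Each command algorithm then needs only one line referring to the shared definition, rather than restating the alert rules.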

Use lower-case for screen orientation arguments

The open source wire protocol found in the Selenium project defines
permitted screen orientation arguments to be PORTRAIT and LANDSCAPE
(in upper casing).

My suggestion is that we specify this section to allow
case-insensitive arguments to this command.

Setting orientation to secondary view angles

https://www.w3.org/Bugs/Public/show_bug.cgi?id=23949

Andreas Tolfsen:

WebDriver currently supports setting the screen orientation to either
portrait or landscape mode on devices that support this configuration.
Many devices support further rotation by 90° so that the top of the
device aligns with the bottom border of the viewport.

In Android this is referred to as secondary orientations. My
suggestion is to use *-primary and *-secondary as optional additions
to specify the type of orientation. This would recognize:
PORTRAIT
LANDSCAPE
PORTRAIT-PRIMARY
LANDSCAPE-PRIMARY
PORTRAIT-SECONDARY
LANDSCAPE-SECONDARY
The PORTRAIT and LANDSCAPE orientations would default to
PORTRAIT-PRIMARY and LANDSCAPE-PRIMARY.
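
The defaulting rule could be expressed as below. `resolve_orientation` is a hypothetical helper, and the case-insensitive matching borrows the suggestion from the related issue about lower-case arguments.

```python
VALID = {"PORTRAIT-PRIMARY", "PORTRAIT-SECONDARY",
         "LANDSCAPE-PRIMARY", "LANDSCAPE-SECONDARY"}


def resolve_orientation(value):
    """Map a requested orientation to its fully qualified form."""
    name = value.upper()  # accept any letter case
    if name in ("PORTRAIT", "LANDSCAPE"):
        name += "-PRIMARY"  # bare names default to the primary angle
    if name not in VALID:
        raise ValueError("unsupported orientation: " + value)
    return name
```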

Investigate defining what a response is

We define what a command is but we are only loosely talking about responses. In fact, we talk about two different types of responses: those that are results of executing WebDriver commands, and those that are returned as part of running the navigate algorithm from HTML.

Order window handles

https://www.w3.org/Bugs/Public/show_bug.cgi?id=29003

Andreas Tolfsen:

Quoting thephilwells from #39:

Using the Java API, one should be able to do this

// getWindowHandles() returns a Set<String>, so copy it into a List to index it.
List<String> handles = new ArrayList<>(driver.getWindowHandles());
driver.switchTo().window(handles.get(1));

...and reliably expect to be viewing the second-oldest open window. This matters most when more than two windows are open, as when one needs to open more than one new window from the original window.
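
On the remote end, the requested guarantee amounts to remembering creation order. A minimal sketch, assuming a hypothetical `WindowRegistry` that is told when windows open and close:

```python
class WindowRegistry:
    """Tracks window handles in the order the windows were opened."""

    def __init__(self):
        self._handles = []

    def opened(self, handle):
        self._handles.append(handle)

    def closed(self, handle):
        self._handles.remove(handle)

    def get_window_handles(self):
        # Oldest first, so index 1 is the second-oldest open window.
        return list(self._handles)
```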

Missing use cases

https://www.w3.org/Bugs/Public/show_bug.cgi?id=26494

Andreas Tolfsen:

At some previous F2F we agreed it would be a good idea to include a section on use cases of WebDriver.

Need a way to enable / disable networking from webdriver

https://www.w3.org/Bugs/Public/show_bug.cgi?id=25179

James Graham:

The new generation of web apps are expected to continue to function when networking is not available for whatever reason. Obviously this is particularly important on mobile devices to close the gap between native apps and web apps.

At present it isn't possible to test the behaviour of an app when it is offline, or the transition between online and offline or vice-versa. This substantially decreases the utility of WebDriver for testing contemporary web applications. Neither is it possible to write testsuites for the features underlying offline support (AppCache, Service Worker), substantially increasing the chance of buggy or non-interoperable implementations.

The most obvious way to provide this would be to expose an API to webdriver that would allow disabling "content" networking i.e. from the point of view of the webpage it would look like the browser was offline, but privileged code (in particular webdriver itself) would still be able to perform network operations.

Provide similar methods for audio and/or video recording as to take screenshots

Gerardo Capiel:

Provide similar methods for audio and/or video recording as to take screenshots.

Does it make sense to require the use of "unknown error"?

https://www.w3.org/Bugs/Public/show_bug.cgi?id=29168

juangj:

From 8. Invalid SSL Certificates:

"In this case, implementations may choose to make accessing a site with bad HTTPS configurations cause a WebDriverException to be thrown. Remote end implementations must return an unknown error error code in this case."

It seems odd to say that the remote end MUST return an "unknown error" in a case where we know exactly what the error is. There doesn't seem to be any other fitting error code, though.

Is "unknown error" intended to just be the catch-all for errors that don't fit any of the other error codes?

specify which headers we expect: cache-control and content-type

https://www.w3.org/Bugs/Public/show_bug.cgi?id=27659

David Burns :automatedtester:

Action item in Santa Clara

http://www.w3.org/2014/10/30-testing-minutes.html#action10

Element location strategies for link text and partial link text references example

https://www.w3.org/Bugs/Public/show_bug.cgi?id=26487

Andreas Tolfsen:

The element location strategies for link text [1] and partial link text [2] reference examples in pseudo code which don't make any sense. Besides, this uses the very misleading term “visible text”, which is not further defined.

It clearly borrows the algorithm used for getting an element's text [3], so perhaps this algorithm should be generalized?

[1] https://dvcs.w3.org/hg/webdriver/raw-file/tip/webdriver-spec.html#link-text

[2] https://dvcs.w3.org/hg/webdriver/raw-file/tip/webdriver-spec.html#partial-link-text

[3] https://dvcs.w3.org/hg/webdriver/raw-file/tip/webdriver-spec.html#widl-WebElement-getElementText-DOMString

maximizeWindow inaccurately talks about resizing a window

https://www.w3.org/Bugs/Public/show_bug.cgi?id=26711

Andreas Tolfsen:

The prose on maximizing the window talks about resizing the window which in the context doesn't make any sense as there is already a precondition that the window manager understands the concept of maximizing the window for the command to succeed, and if it does it's implicitly understood that the window will be resized to what the window manager considers is a maximized window.

Namespacing in capabilities

We support namespacing with extension commands, where UAs are encouraged to use a vendor-specific prefix to separate their additional endpoints from those of the specification. The specification in return makes the guarantee never to specify anything that will conflict with that path namespace.

Similarly we should do this for the capabilities object. For keys such as firefoxOptions and chromeOptions it mostly does not matter, since they are unlikely to ever conflict or protrude on a future reserved keyword. But one can imagine a scenario where different intermediary nodes accept authentication login, and putting username and password fields could conflict.

Clarify whether sessionId is optional from the perspective of the local end before newSession

https://www.w3.org/Bugs/Public/show_bug.cgi?id=27766

Andreas Tolfsen:

Section 2.1 says that sessionId by default is null, but the WebIDL marks it as optional.

It's called out explicitly on the other parameters to the command object which are to be provided by the local end, but it's unclear if this applies to sessionId.

The question is whether sessionId should be undefined or null when calling newSession.

Since the spec allows passing in a sessionId to newSession, it follows that local ends shouldn't send null to mean “undefined” since that (in section 4.1.1) talks about whether the field is “set” or not.

Should be possible to return errors with executeAsyncScript

https://www.w3.org/Bugs/Public/show_bug.cgi?id=28060

jleyba:

The callback provided to the executeAsyncScript function only accepts a single argument that is always treated as a successful completion. It should be possible for users to call this function with an error to indicate their script failed.

Node.js has popularized the "Error-first" callback approach: errors are passed as the first argument, successful values the second.

Another option would be to standardize on the Error-type. If the callback is invoked with an instanceof Error, the script is marked as a failure.
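
The Error-type option can be modelled outside the browser as follows. This is only an analogy in Python: `script_result` is a hypothetical helper, `isinstance(..., Exception)` stands in for JavaScript's `instanceof Error`, and the `"javascript error"` status string is an assumption.

```python
def script_result(callback_args):
    """Classify the arguments passed to the async-script callback.

    Models the Error-type option: an exception instance as the sole
    argument marks the script as failed.
    """
    if len(callback_args) == 1 and isinstance(callback_args[0], Exception):
        return {"status": "javascript error", "message": str(callback_args[0])}
    value = callback_args[0] if callback_args else None
    return {"status": "success", "value": value}
```

Compared with the error-first convention, this keeps the existing single-argument callback signature and changes only how the argument is interpreted.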

Unclear what happens if maximizeWindow maximizes window and is called again

https://www.w3.org/Bugs/Public/show_bug.cgi?id=26710

Andreas Tolfsen:

If maximizing the browser window is supported, and the window is maximized either by a call to maximizeWindow or by some other means, it's unclear from reading the spec what happens if maximizeWindow is called again.

Should calling it again de-maximize it, that is defer to the window manager what position and dimensions to set the window to? In this case, should the command really be called “maximize” when what it does is toggling?

Mandate ordering which required capabilities must be checked

https://www.w3.org/Bugs/Public/show_bug.cgi?id=27097

Andreas Tolfsen:

In section 4.1.1 we should mandate, and not merely suggest, the ordering in which the required capabilities must be checked.

This is important for interoperability so that one remote end doesn't parse browserVersion before browserName.

The mandated order should be:

browserName

browserVersion

platformName

platformVersion
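
A fixed check order means every remote end reports the same first failure for the same request. A small sketch, with `first_mismatch` as a hypothetical helper and plain equality standing in for whatever matching rule the spec defines:

```python
CHECK_ORDER = ["browserName", "browserVersion", "platformName", "platformVersion"]


def first_mismatch(required, actual):
    """Return the first required capability that does not match, or None."""
    for name in CHECK_ORDER:
        if name in required and required[name] != actual.get(name):
            return name
    return None
```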

Touch gestures API doesn't provide a way to do slow swipe

https://www.w3.org/Bugs/Public/show_bug.cgi?id=25293

Andrey Botalov:

Take a look at https://github.com/appium/appium/blob/master/docs/gestures.md.

It provides two different API calls: swipe and flick.

It seems that flick is just a quick swipe (not sure). The spec doesn't currently provide a way to do a slow swipe.

Close window should define which content is active after window closes

https://www.w3.org/Bugs/Public/show_bug.cgi?id=26486

Andreas Tolfsen:

The close window section doesn't define which default content (strange term I think, why isn't this “context”?) is active when the current default content is closed.

Proposal: Shadow DOM support in WebDriver

Marc Fisher:

The current proposal for dealing with Web Components and their corresponding Shadow DOMs in WebDriver treats them similarly to frames; at any point in time the WebDriver session is operating within a particular DOM, either the top-level DOM or one of the Shadow DOMs of a particular Web Component. However, this makes interacting with pages that use Web Components extremely taxing, as the current DOM will have to be switched on a regular basis, and it is unclear over what time frame an element id found within a particular DOM will be considered valid. Instead I propose that the WebDriver wire protocol be extended with commands to get the list of attached Shadow DOMs for an element as opaque IDs and to support new element and elements commands that are scoped to a particular DOM. Additionally, element IDs from Shadow DOMs should remain accessible as long as the corresponding element is attached to the Shadow DOM and the Shadow DOM is attached to the page.

For a more thorough description see:
https://docs.google.com/document/d/1qP7Se3MDUac5P0V1Kfm2yaj3fFhBOFCyWyXLcsVwkTA/edit?usp=sharing

getWindowHandles() should return a list of windows ordered by window age

Using the Java API, one should be able to do this

// getWindowHandles() returns a Set<String>, so copy it into a List to index it.
List<String> handles = new ArrayList<>(driver.getWindowHandles());
driver.switchTo().window(handles.get(1));

...and reliably expect to be viewing the second-oldest open window. This matters most when more than two windows are open, as when one needs to open more than one new window from the original window.

implicit timeout does not return a timeout response

https://www.w3.org/Bugs/Public/show_bug.cgi?id=28756

John Jansen:

CURRENTLY:
"implicit - Set the amount of time the driver should wait when searching for elements. When searching for a single element, the driver should poll the page until an element is found or the timeout expires, whichever occurs first. When searching for multiple elements, the driver should poll the page until at least one element is found or the timeout expires, at which point it should return an empty list."

EXPECTED:
implicit - Set the amount of time the driver SHOULD wait when searching for elements. When searching for a single element, the driver should poll the page until an element is found or the timeout expires, whichever occurs first. When searching for multiple elements, the driver should poll the page until at least one element is found or the timeout expires. If the timeout expires before the driver has finished polling the page, the driver MUST return a "timeout" response.
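
The EXPECTED wording could be sketched as a polling loop that ends in a timeout response instead of an empty list. `find_elements`, `TimeoutResponse`, the `query` callable and the 10 ms poll interval are all assumptions made for the example.

```python
import time


class TimeoutResponse(Exception):
    """Stands in for a 'timeout' error response."""


def find_elements(query, implicit_wait, clock=time.monotonic, sleep=time.sleep):
    """Poll `query` until it returns elements or the implicit wait expires.

    `query` is a hypothetical callable returning a list of matches;
    `implicit_wait` is in seconds.
    """
    deadline = clock() + implicit_wait
    while True:
        found = query()
        if found:
            return found
        if clock() >= deadline:
            # Proposed behaviour: report a timeout, not an empty list.
            raise TimeoutResponse("timeout")
        sleep(0.01)
```

The page is always queried at least once, so a wait of zero still returns elements that are already present.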

Need clarification on JavaScript execution when Content Security Policy is in place

https://www.w3.org/Bugs/Public/show_bug.cgi?id=27223

Jim Evans:

If a page has a Content Security Policy applied (spec: https://w3c.github.io/webappsec/specs/content-security-policy/), it may prevent the execution of user-supplied JavaScript via the executeScript command. This is because the injected JavaScript would have no source which could be validated by the policy. The WebDriver spec should have language describing how a driver should behave in this event.

Change of window close endpoint seems unnecessary

https://www.w3.org/Bugs/Public/show_bug.cgi?id=26528

Marc Fisher:

In Selenium, the window closing endpoint is:
DELETE /session/{sessionId}/window
In the spec it has been changed to:
DELETE /session/{sessionId}/window_handle

Missing text/selection manipulation primitives

https://www.w3.org/Bugs/Public/show_bug.cgi?id=29135

[email protected]:

As far as I could see, the WebDriver spec currently provides very little in terms of emulating textual manipulations.

NOTE: I will use the term "insertion point" to refer to the textual cursor within e.g. a text box, to differentiate it from the "pointer" cursor

Current Provisions

the entire textual content of an element can be retrieved

it is possible to [clear] an element or [sendKeys] to it (emulating keyboard input)

implicitly, the insertion point and selection can be manipulated using actions (click and pointerDown/pointerMove/pointerUp).

Primary Issues

Pointer actions work in terms of offsets, but as far as I could tell

the specification provides no way to perform textual matching and transform that into bounding boxes, thus no way to easily position the insertion point or draw selections

the specification provides no way to query the insertion point or selection for position or bounding box, thus no way to get simple feedback while probing blindly

Use case

Test/demonstrate RTEs or other contenteditable elements, allow cross-platform text insertion within existing textual nodes rather than just around them

Possible solutions?

Rect textRect(needle[, element][, skip])

would return the same thing as Element Rect ({x, y, width, height} relative to the document element).

would only match visible text (so text contained in a visible element)

would generate an error if no matching visible text is found?

needle would be the text to look for, possibly a regex? The specification does not currently use regex anywhere so that might be a bit much.

skip would probably be necessary as the reference text could occur multiple times in the source.

a WebElement "root reference" would probably allow easier precise matching and less skipping.

Testing Chrome, Firefox and Safari on OSX, selecting a glyph requires going through the majority of the glyph so selecting from a textual boundary won't risk selecting the preceding glyph.

It's somewhat inconvenient for single-letter boundary selections though as there might be need for lots of skipping.

It doesn't try to count characters/glyphs and thus might help avoid possible confusion issues with respect to code units, normalisation (maybe?), codepoints and glyphs at the interface-level (these concerns may have to be handled at the spec level though).

Unknowns for this possible solution

would/should it be possible to match text across multiple elements? This is possible for users e.g. my browser's in-page search will find a match for "requests | preferences" on the current page even though that spans two links and a span in two separate list elements.

would/should the rect be augmented with the text's container element(s) in the style of a DOM Range? It doesn't seem to make much sense from a user-interaction perspective.

Unsolved

Should it be possible to query the current selection's span/rect as well, independently from arbitrary text? I don't have a use-case for that right now but a "living" user would see the current text selection displayed in the UI so it could make sense.

Webdriver command batches suggestion

https://www.w3.org/Bugs/Public/show_bug.cgi?id=26266

Anton Zhuravsky:

Hi guys,

I have recently encountered a challenge while implementing automated tests and think we can design a pretty useful generic feature out of it.

The idea is pretty simple: create an interface to be able to create a batch of commands and get a notification upon their completion. So, basically the approach is:

An API consumer requests WebDriver to notice batch start

WebDriver replies with a unique batch id

All the commands executed later on are associated with this batch

An API consumer requests WebDriver to notice batch finish

An API consumer is able to ask if the batched commands have been completed

(Optional) WebDriver can notify API consumer about batch finalization
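
The bookkeeping behind these steps might look like the sketch below. `BatchTracker` and its method names are hypothetical, and how a real driver would detect that a command's side effects have finished is exactly the open question of the proposal, so here it is reduced to an explicit `command_finished` call.

```python
import itertools


class BatchTracker:
    """Toy bookkeeping for the proposed command batches."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._pending = {}   # batch id -> number of unfinished commands
        self._open = None    # id of the batch currently collecting commands

    def start(self):
        self._open = next(self._ids)
        self._pending[self._open] = 0
        return self._open

    def command_started(self):
        # Commands issued while a batch is open are associated with it.
        if self._open is not None:
            self._pending[self._open] += 1
        return self._open

    def command_finished(self, batch_id):
        if batch_id is not None:
            self._pending[batch_id] -= 1

    def finish(self):
        self._open = None

    def is_complete(self, batch_id):
        # Complete once the batch is closed and nothing is outstanding.
        return batch_id != self._open and self._pending[batch_id] == 0
```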

The motivation behind this is pretty simple: executing some actions can produce delayed side effects (send an AJAX request and handling a callback once it finishes; submitting the form into an iframe and handling onload event; etc). Unfortunately, currently there is no way to know that a command (or a set of commands) have finished working completely (including all side effects produced directly or indirectly by them).

Why do we need this in the real world? Well, a number of examples can be provided: the simplest is writing automated tests for JavaScript code, which requires asserting some values only after all asynchronous activity has completed.

Please note it is different from page loading modes as the effects are not limited to network / parser / DOM. One can schedule delayed execution (via setTimeout) and aim to check that it worked properly (and, of course, only after the setTimeout callback has fired), which is handled by the proposed functionality.

If anyone could provide his thoughts on the suggestion it would be great – I am keen to discuss and polish the design of this feature :)

Closing a window refers to "quit" which doesn't exist

https://www.w3.org/Bugs/Public/show_bug.cgi?id=26496

Andreas Tolfsen:

The section on DELETE /session/{sessionId}/window, that is for closing a window, refers to calling the function "quit", which (a) isn't defined and (b) isn't relevant anymore.

Instead it should point to DELETE /session/{sessionId}.

See also bug 26495.

send "invalid argument" error if value for POST /session/{sessionId}/element/{id}/value isn't an array

https://www.w3.org/Bugs/Public/show_bug.cgi?id=28344

Andreas Tolfsen:

POST /session/{sessionId}/element/{id}/value should return an "invalid argument" error if the type of value isn't a sequence/array.

We should probably also figure out what to do with illegal content inside the sequence.
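
A sketch of the validation being asked for. `validate_send_keys` is a hypothetical helper, and requiring every item to be a string is just one possible answer to the open question about illegal content.

```python
def validate_send_keys(parameters):
    """Return the key sequence, or raise for a malformed `value`."""
    value = parameters.get("value")
    if not isinstance(value, list):
        raise ValueError("invalid argument: value must be an array")
    # One possible treatment of illegal content: require string items.
    if not all(isinstance(item, str) for item in value):
        raise ValueError("invalid argument: value items must be strings")
    return value
```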

Script timeout should also apply for synchronous scripts

https://www.w3.org/Bugs/Public/show_bug.cgi?id=27949

Andreas Tolfsen:

Section 13.1.1 says the script timeout set only applies for executeAsyncScript. It should also apply for executeScript.

maximizeWindow needs an algorithm

https://www.w3.org/Bugs/Public/show_bug.cgi?id=26712

Andreas Tolfsen:

The maximize window section needs an algorithm as it's currently very confusing how it should be implemented by the driver.

Specifically the first paragraph talks about whether the window manager understands the concept of maximization; presumably the driver should return with an error if this isn't supported by the WM.

(It also talks about the return type void which isn't accurate. Also bug 26711.)

Use null to indicate unlimited script timeout

https://www.w3.org/Bugs/Public/show_bug.cgi?id=27951

James Graham:

Since we are using JSON we don't have to pack magic values into integers. null seems like a sensible choice for unsetting the script timeout (or, rather, setting it to an indefinite value), and negative values should cause an error.
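
Parsing under this proposal could look like the following sketch. The `script` field name and the `parse_script_timeout` helper are assumptions; note that in this simplified version a missing field is treated the same as an explicit null.

```python
import json


def parse_script_timeout(raw_body):
    """Return the timeout in ms, or None for an indefinite timeout."""
    timeout = json.loads(raw_body).get("script")
    if timeout is None:
        return None  # JSON null: never time out
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(timeout, bool) or not isinstance(timeout, int) or timeout < 0:
        raise ValueError("invalid argument")
    return timeout
```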

Send "invalid argument" error if file does not exist

https://www.w3.org/Bugs/Public/show_bug.cgi?id=28382

Andreas Tolfsen:

On sending keys to an element we should send back an "invalid argument" error with a helpful message if the file you try to set doesn't exist.

As there may be guards against setting invalid/non-existing files in UAs, we should avoid doing so as it may cause the driver to have an internal error.

Default value of parameters in command is array

https://www.w3.org/Bugs/Public/show_bug.cgi?id=27765

Andreas Tolfsen:

In section 2.1 it says that the default value of the parameters field should be an empty array, but the type is an object. Also an array cannot be named:

“The parameters attribute is a map of named parameters to objects representing the value of the parameter. The default value is an empty array and this field MUST NOT be null.”

executeScript and executeAsyncScript algorithms should respect the script timeout

https://www.w3.org/Bugs/Public/show_bug.cgi?id=27950

Andreas Tolfsen:

The executeScript and executeAsyncScript algorithms make no mention of the script timeout global. They should time out when this is reached, or run indefinitely if -1 (negative integer) is given.

Section 13.1.1 mentions that "script timeout" can be used in this way, but the algorithms don't reflect this.

Ordering of array of web elements returned is undefined

https://www.w3.org/Bugs/Public/show_bug.cgi?id=26706

Andreas Tolfsen:

The only element location strategy that mentions ordering of the returned web elements is the CSS selector in section 9.2.1, and it uses a reference to querySelectorAll. If drivers choose to implement this differently it feels to me that we should define the sorting order explicitly or reference the ECMAScript specification's.

This is further complicated by the other location strategies not having ordering defined.

Link to relevant section:

https://dvcs.w3.org/hg/webdriver/raw-file/tip/webdriver-spec.html#element-location-strategies

Treatment of DOM Text Nodes in mixed content nodes

Terminology: "Mixed content nodes" is used in its meaning defined by XML spec 1.0 5th edition, "DOM text nodes" has the meaning defined by DOM Core spec

Problem description

The WebDriver specification addresses retrieving, the visibility of, access to and interaction with WebElements (the representation of DOM element nodes), while all of these remain undefined for DOM text nodes in mixed content.

Some examples where the need to access text nodes in mixed content may arise. The problem: get the text of <div id="one"> without including the content of the child blockquotes:

  This will result in plenty of white-space Text nodes, highly probable 
  without special meaning for testing
  <div id='one'>
    <blockquote id='two'>dolore ipsum</blockquote>
    Ah, the pain itself
    <blockquote id='three'>Ah, the pain</blockquote>
 </div>

  The non-breakable spaces may carry meaning relevant for testing
  <div id='one'>
    <blockquote id='two'>dolore ipsum</blockquote>&nbsp;&nbsp;Ah, the pain itself
 </div>

The fact that retrieving/accessing the content of text nodes in mixed content is conflated into WebElement.innerText as the sole solution makes it hard to separate the content of the text nodes from the content of child elements. Various coping strategies may be found, ranging from:

easy to implement but not exact - e.g. iterate and subtract the inner text of child WebElements from the inner text of the parent (may lead to a great number of whitespace-only text nodes whose content cannot be told apart from spaces that are relevant for testing - see second example),
more effective but harder to implement and maybe not supported by all browsers - e.g. via scripts able to access the actual web page DOM, "injected" via Execute/Execute Async, keeping in mind that XPath document.evaluate is not supported by a wide range of Internet Explorer versions.
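To illustrate why the subtraction strategy is inexact, here is a minimal sketch that uses Python's `xml.dom.minidom` as a stand-in for the page DOM. The markup is the first example above with the whitespace removed for brevity, and all function names are hypothetical.

```python
from xml.dom import minidom

MARKUP = (
    "<div id='one'>"
    "<blockquote id='two'>dolore ipsum</blockquote>"
    "Ah, the pain itself"
    "<blockquote id='three'>Ah, the pain</blockquote>"
    "</div>"
)


def full_text(node):
    """Concatenate all descendant text nodes (a rough inner-text stand-in)."""
    if node.nodeType == node.TEXT_NODE:
        return node.data
    return "".join(full_text(child) for child in node.childNodes)


def own_text_by_subtraction(element):
    """Coping strategy 1: remove each child element's text from the parent's."""
    text = full_text(element)
    for child in element.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            text = text.replace(full_text(child), "", 1)
    return text


def own_text_nodes(element):
    """Direct DOM access: content of the element's own text-node children."""
    return [c.data for c in element.childNodes if c.nodeType == c.TEXT_NODE]


div = minidom.parseString(MARKUP).documentElement
print(repr(own_text_by_subtraction(div)))  # ' itselfAh, the pain'
print(own_text_nodes(div))                 # ['Ah, the pain itself']
```

Because the text of the third blockquote is a prefix of the div's own text, the subtraction removes the wrong occurrence, while direct access to the text nodes returns the expected content.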

Furthermore, the inability to reference individual text nodes in mixed content also restricts the XPath expressions that a WebDriver may accept to those that return a WebElement.

The current specification also fails to address the expected reaction of the WebDriver when presented with a valid XPath expression that does not result in a reference to a WebElement (e.g. an element attribute or a text node).

In the context of the examples above, this is how the Selenium WebDriver (Java API) reacts:

driver.findElement(By.xpath("//*[@id='one']/text()"));

org.openqa.selenium.InvalidSelectorException: invalid selector: 
The result of the xpath expression "//*[@id='one']/text()" is: [object Text].
It should be an element.

The need to access text nodes in mixed content appears to be fairly common in the industry, and all of the workaround approaches mentioned above are in use (including script execution, which extends the range of cross-browser support of document.evaluate by using a TreeWalker).

An enhancement request raised against the Selenium WebDriver got a response indicating that the same request would need to be lodged with all the WebDriver providers, the missing functionality being traced to the lack of clarity in the WebDriver specification for handling such cases.

A minimal-change suggestion

The following is thought to be a possible solution to the need without introducing new concepts/interfaces into the specification, but only by enhancing the behaviour of existing ones.

~~WebElement should implement a way of accessing *text values* of any of the children nodes individually, no matter if text-type or element-type children (maybe supported by `GET /session/{session id}/element/{element id}/child-text/{child-index}` ???) - at least with this functionality in place, applying set difference operations between the texts of all children and the inner text of element children may yield the content of the text nodes on an individual basis (unpleasant as it may be to do it every time one needs to individualize text node content).~~

It is somewhat hard to find a solution to accessing the content of the text nodes in mixed content without introducing new concepts/interfaces, especially because

the interleaving order of text nodes with element nodes (or indeed, other types of nodes) may be significant
there is no guarantee that text content and element content can be distinguished by their textual content alone

As the author of the present issue doesn't have deep enough knowledge of the WebDriver specification, the suggested approach is formulated in terms of Java method specs (assuming the Selenium WebDriver as a reference implementation):

interface WebElement {

// Already specified by Selenium WebDriver.
// Allows obtaining all the WebElement children by, for example, using By.xpath("./*")
java.util.List<WebElement> findElements(By by);

/** Proposed extension: if the parent WebElement is of a mixed-content type and
* there are sibling nodes of text type preceding this node, the method will return
* the content of these text type nodes in the natural order of appearance in the
* document (that is, the last element in this list is the closest to this WebElement).
* Otherwise returns an empty list.
* The method should have the same effect as applying an XPath selection of
* 'preceding-sibling::text()' except for the lack of preceding-sibling::
* axis inversion.
*/
java.util.List<String> getPrecedingTextNodes();

/** Proposed extension: if the parent WebElement is of a mixed-content type and
* there are sibling nodes of text type following this node, the method will return
* the content of these text type nodes. Otherwise returns an empty list.
* The method should have the same effect as applying an XPath selection of
* 'following-sibling::text()'
*/
java.util.List<String> getFollowingTextNodes();

/** Proposed extension: if the (assumed common) WebElement parent of the two
* parameters is of a mixed-content type, the method will return the content of
* the text nodes occurring between the two, in the natural order of
* appearance in the document.
* The value in the first position of the returned list will be the content of
* the closest text node to the element represented by the first parameter, the last
* value of the returned list is the closest to the second one.
* If the two nodes are presented in the reversed order from the order established
* by their natural position in the context of their parent, the return of the method
* is an empty list.
* If the first parameter is null, the result of this method is the same as calling
* the getPrecedingTextNodes method for the second parameter.
* If the second parameter is null, the result of this method is the same as calling
* the getFollowingTextNodes method for the first parameter.
* If the two nodes represented by the parameters are not children of this WebElement,
* the method throws.
*
* Note: except for the test that this is the direct parent, the result of this method
* should be equivalent with
* child1.getFollowingTextNodes().retainAll(child2.getPrecedingTextNodes())
*/
java.util.List<String> getTextNodesBetween(WebElement child1, WebElement child2);

}

Note: even if two consecutive sibling WebElement nodes are presented to
the getTextNodesBetween method, the method can return a list with a size of more
than one for cases in which other node types that are present in the document break
the flow of the text (comment and processing-instruction nodes).

Note: of course, a WebDriver API which would introduce specific representations for text(), comment() and processing-instruction() nodes would be an exact DOM model of the represented Web page and thus open the opportunities for a richer automation logic (not based solely on artefacts producing a visual representation on the screen). But this would make the present proposal go beyond the *minimal-change suggestion* scope announced in this section.
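The semantics proposed for the three methods can be sketched against a plain DOM, here Python's `xml.dom.minidom`. This is only an illustration of the described behaviour under that assumption, not a driver implementation, and the snake_case function names are hypothetical counterparts of the proposed Java methods.

```python
from xml.dom import minidom


def preceding_text_nodes(element):
    """Content of the text-node siblings before `element`, in document order."""
    texts = []
    node = element.previousSibling
    while node is not None:
        if node.nodeType == node.TEXT_NODE:
            texts.append(node.data)
        node = node.previousSibling
    texts.reverse()  # collected walking backwards, so restore document order
    return texts


def following_text_nodes(element):
    """Content of the text-node siblings after `element`, in document order."""
    texts = []
    node = element.nextSibling
    while node is not None:
        if node.nodeType == node.TEXT_NODE:
            texts.append(node.data)
        node = node.nextSibling
    return texts


def text_nodes_between(parent, child1, child2):
    """Content of the text nodes located between two children of `parent`."""
    if child1 is None and child2 is None:
        raise ValueError("at least one child is required")
    if child1 is None:
        return preceding_text_nodes(child2)
    if child2 is None:
        return following_text_nodes(child1)
    if child1.parentNode is not parent or child2.parentNode is not parent:
        raise ValueError("both nodes must be children of the parent element")
    children = list(parent.childNodes)
    start, end = children.index(child1), children.index(child2)
    if start > end:
        return []  # children given in reversed order
    return [n.data for n in children[start + 1:end]
            if n.nodeType == n.TEXT_NODE]
```

For `<p>zero<a/>one<!-- c -->two<b/>three</p>`, calling `text_nodes_between` on the `a` and `b` children yields two entries, `one` and `two`, because the comment node splits the text into separate text nodes.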

The behaviour of WebDriver when referencing non-WebElements through XPath/XPointer should be changed to return the closest parent WebElement rather than signalling an error. Examples of such XPath selectors, where the parent element is to be returned instead: //*[@id=]/@name or //*[@id=]/text() or //*[@id=]/text()[] .
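A minimal sketch of that fallback, assuming the XPath result is available as a DOM node (modelled here with `xml.dom.minidom`). `closest_element` is a hypothetical helper name.

```python
from xml.dom import minidom


def closest_element(node):
    """Map an attribute or text node to the nearest enclosing element."""
    if node.nodeType == node.ATTRIBUTE_NODE:
        # Attributes are not children of their element, so they have
        # no parentNode to walk; use ownerElement instead.
        return node.ownerElement
    while node is not None and node.nodeType != node.ELEMENT_NODE:
        node = node.parentNode
    return node
```

An element node is returned unchanged, so applying the helper to every XPath result would not alter selectors that already resolve to elements.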

POST /session/{id}/timeouts should take an array of timeouts

https://www.w3.org/Bugs/Public/show_bug.cgi?id=26613

Andreas Tolfsen:

Currently POST /session/{id}/timeouts takes a hash map of {"type": TYPE, "ms": N} which allows setting individual timeouts.

If we consider a local end client binding that wants to set them all at once (pseudo code):
driver.timeouts = [{type: "page load", ms: 123},
                   {type: "implicit", ms: 456},
                   {type: "script", ms: 789}]
This will currently require them to make three individual calls to the endpoint.

An optimization is to allow the endpoint to take an array of dicts instead:
[{"type": TYPE, "ms": N}, …]
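A hedged sketch of how an endpoint handler could accept both shapes, assuming the three timeout types named above are the only valid ones. `set_timeouts` and `VALID_TYPES` are hypothetical names.

```python
VALID_TYPES = ("page load", "implicit", "script")


def set_timeouts(current, body):
    """Return a copy of `current` updated from a dict or a list of dicts."""
    # Accept the existing single-object form as well as the proposed array.
    entries = body if isinstance(body, list) else [body]
    updated = dict(current)
    for entry in entries:
        if not isinstance(entry, dict):
            raise ValueError("each timeout must be an object")
        kind, ms = entry.get("type"), entry.get("ms")
        is_number = isinstance(ms, (int, float)) and not isinstance(ms, bool)
        if kind not in VALID_TYPES or not is_number:
            raise ValueError("invalid timeout entry: %r" % (entry,))
        updated[kind] = ms
    return updated
```

Validating every entry before returning the updated copy means a malformed entry leaves the session's existing timeouts untouched, and a client can set all three timeouts in a single request.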
