
testgen-thesis's Introduction

About Me

Hello! My name is Samuel Crawford; I have a B.Eng. in Software Engineering from McMaster University, and I am back at McMaster for my M.A.Sc., working on Drasil. I am the recipient of a Schulich Leader Scholarship for community leadership and academic excellence, and I strive to put my skills to good use wherever I find myself.

Skills

  • My areas of expertise are object-oriented design, functional programming, and software testing
  • My main languages are Python, Haskell, LaTeX, Dart, and Java
  • I am comfortable with common tools and frameworks like Git, Flutter, pytest, and Make
  • I have experience with MATLAB, Bash, C, C++, and SQL

testgen-thesis's Issues

Sam's Thesis Meeting | Jan 29, 2023 - 10:30am - ITB 167

My work this past week has mainly been going through the IEEE sources a second time with a finer-toothed comb to get more information, closing #27 and working towards #25. While these items are slightly less "meaty" than what we've discussed before, here are some things I would want to talk about (along with anything of note on your end):

  1. The possibility of my Master's going into the summer. What would that look like? I'm assuming that a literature review doesn't constitute a Master's project; with this assumption, I doubt I would be finished by April.
  2. From my degree requirements, what would a technical presentation look like? I'm assuming it wouldn't be enough to do this just based on my testing literature review, but I also don't know if I'd have much more done by the competition date (see point 1.)
  3. Potential next steps (extra info: SWEBOK v4 is under review and looking for comments):
    a. Finish going through IEEE2017
    b. Finish the "second pass" of what I've already done of IEEE2017, then pivot to SWEBOK
    c. Go through SWEBOK (v3, the current one)
    d. Go through SWEBOK (v4, under review) - would this be a reasonable source to cite, or would this have to be slightly separated from the rest of my research?
    e. Go through SWEBOK v3 while going through v4 to compare/contrast
  4. Acceptance Tests vs. Acceptance Testing (see IEEE2017) - is this a level of distinction we should make? Is this relevant?

Sam's Thesis Meeting | Dec 5, 2023

1. Discussion of the ISTQB Glossary

The International Software Testing Qualifications Board's glossary may be of use for finding unambiguous definitions. Of particular note:

  • test harness: A collection of stubs and drivers needed to execute a test suite.
  • test automation framework: A tool that provides an environment for test automation. It usually includes a test harness and test libraries.
  • component testing: A test level that focuses on individual hardware or software components. (Synonyms: module testing, unit testing)
  • component: A part of a system that can be tested in isolation. (Synonyms: module, unit)

Takeaways

  • A "test harness" is the collection of stubs and drivers and makes up part of the "testing framework".
  • The current definition of "unit testing" seems to apply to the testing of both modules and specific functions since both could be considered to be "a part of a system that can be tested in isolation".

Do these definitions make sense? Would it be useful to differentiate between the testing of modules and of their functions?
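
To make the stub/driver split concrete, here's a minimal sketch in Python (all names hypothetical): the stub stands in for a real dependency so the component runs in isolation, the driver invokes the component and checks the result, and together they form a tiny test harness.

```python
# A minimal test-harness sketch (hypothetical component and names).

def build_report(fetch_user):
    """Component under test: formats a report for a user record."""
    user = fetch_user(42)
    return f"Report for {user['name']}"

def stub_fetch_user(user_id):
    """Stub: stands in for the real database so the component runs in isolation."""
    return {"id": user_id, "name": "Test User"}

def driver():
    """Driver: exercises the component with the stub and checks the result."""
    assert build_report(stub_fetch_user) == "Report for Test User"
    print("harness check passed")

if __name__ == "__main__":
    driver()
```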

2. Discussion of Taxonomies/Ontologies

I've started looking for testing taxonomies/ontologies (Section 2.7.1 of notes.pdf). I haven't found anything promising as far as software testing taxonomies go, but am currently investigating ontologies.

a. Any advice on finding relevant resources?
b. Any guidance on how to organize this section?
c. "Test factors" vs. "Results of Testing (Area of Confidence)" column

Sam's Thesis Meeting | Dec 19, 2023 - VIRTUAL

My main work this week has been filling in my glossary with information I have already documented. If you think a "peer review" would be beneficial, I'm good with that! There are also some gaps to be filled in later that we could discuss if any stand out. I have some specific questions below, which will likely be answered with time and more research, but figured it wouldn't hurt to provide them to potentially get the ball rolling!

  1. Level of granularity
    1. Data Testing
    2. Reviews
    3. Path Coverage
    4. Mutation
    5. Regression
  2. Parent relations
    1. Regression testing: can be white- or black-box, so child or both or N/A?
    2. Metamorphic testing: seems to be black-box for the code and white-box for the tests; is the "transparency" of the testing defined in relation to the test subject? (See the sketch after this list.)
  3. Confusion on terms
    1. Data testing vs. data coverage in Patton2006 (the same thing, or testing = black-box and coverage = white-box?)
    2. White-box testing (structural analysis/testing) in Patton2006 vs. structural = white-box in other places (see also #1)
    3. Black-box testing ("specification"/functional testing) in Patton2006 vs. black-box in other places (e.g., "functional testing" for FRs)
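
For item 2.2, a minimal metamorphic-testing sketch in Python (using math.sin as a stand-in SUT): no oracle supplies the expected value of sin(x); the test only checks that the metamorphic relation sin(x) = sin(pi - x) holds between source and follow-up cases.

```python
import math
import random

# Metamorphic-testing sketch (hypothetical setup): we never compare sin(x)
# against an expected value; we only check the relation between the source
# test case and its follow-up.

def check_sine_relation(trials=100):
    for _ in range(trials):
        x = random.uniform(-10, 10)
        source = math.sin(x)               # source test case
        follow_up = math.sin(math.pi - x)  # follow-up test case
        assert math.isclose(source, follow_up, abs_tol=1e-9)

if __name__ == "__main__":
    check_sine_relation()
    print("metamorphic relation held on all trials")
```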

Sam's Thesis Meeting | Nov 14, 2023

Since I'm feeling sick, I'll attend virtually if that's alright. With the combination of sickness and TA duties, I didn't make a whole lot of progress: I started some of the rearranging of my testing summary document from our meeting last week and added some (not all) rows for black-box testing. Some specific questions I had:

  1. Am I understanding "test harness" correctly? Any obvious things I'm missing?
  2. Granularity of Checking Method
  3. Is "Equivalence Partitioning/Classing" a method for testing software, for testing tests, or a testing heuristic? I think it's the third: it doesn't specify a specific way to test software (although it is related to (Sub-)Boundary Testing, etc.) or the test suite (although it is related to Fault Seeding and Mutation Testing), but it provides a good rule of thumb to use when developing tests.
  4. Related to 2. above, how should "extra" information like Equivalence Partitioning/Classing (depending on what we decide) and Test Oracles be captured? I had suggested putting Test Oracles in its own page of the document last meeting, which I'm still thinking of doing, but may also add a general "Testing Heuristics" sheet as well in case more of these come up.
  5. Any prior knowledge of "Coverage-Based Testing"? (Note that I haven't really looked into it yet.)
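
For item 3, here's a minimal sketch of equivalence partitioning as a test-design heuristic, written with pytest (the SUT `grade` and its cut-off at 50 are hypothetical): one representative input per partition plus the values around the boundary.

```python
import pytest

# Equivalence-partitioning sketch (hypothetical SUT): pick one representative
# per partition, then add the boundary values between partitions.

def grade(score):
    """Hypothetical SUT: pass/fail with a boundary at 50."""
    if not 0 <= score <= 100:
        raise ValueError("score out of range")
    return "pass" if score >= 50 else "fail"

@pytest.mark.parametrize("score, expected", [
    (25, "fail"),   # representative of the "fail" partition
    (75, "pass"),   # representative of the "pass" partition
    (49, "fail"),   # boundary value just below the cut-off
    (50, "pass"),   # boundary value at the cut-off
])
def test_grade_partitions(score, expected):
    assert grade(score) == expected
```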

Sam's Thesis Meeting | Jan 15, 2023 - 10:30am - ITB 167

I've started a more systematic glossary of types of testing based on ISO/IEC/IEEE 29119-1 (referred to as "IEEE" throughout this issue), referencing other sources as appropriate.

Questions about scope

  1. IEEE talks about "test processes" (p. 24), which I think are more general than what we're looking for and may be out of scope.
  2. IEEE seems to differentiate between "testing approaches" and "test design techniques". Since this confused me before (see point 3 in #11), I'm recording this information (at least for now).
  3. IEEE identifies some types of testing as "test types", meaning they are "focused on specific quality characteristics" (p. 15), and some as "practices" (p. 22). Are these significant things to record/keep track of? My hunch is "no".
  4. Figure 1 on page 17 decomposes "Verification and validation" into "Static testing", "Dynamic testing", "Demonstration", and "V and V analysis"; my gut reaction is that the last two are out of scope, but I wanted to double check.

Other things of note from IEEE

  1. Page 18 gives some purposes of testing (detecting defects, gathering information, and creating confidence) that may serve as a good basis for future categorization/analysis.
  2. Page 31 lists some examples of test metrics: residual risk, cumulative effects, test coverage, defect detection percentage.

"Handled" discussion

  1. Compatibility testing's definition implies the existence of "co-existence testing" and "interoperability testing"; IEEE uses the latter three times (pp. 22,43), although I think they really mean "co-existence testing" the third time. ISTQB seems to support that "compatibility" encompasses "co-existence" and "interoperability" (and may even just be their union!) so it may be worthwhile to introduce a "new" term (or at least investigate this in more depth). I've added both of these terms to the document.
  2. "Branch testing" is listed in IEEE, as opposed to "branch coverage" that I've seen in many other sources. Interesting (and important!) to note that coverage is a metric of the extent of the testing (at least how I understand it). Confirmed by @smiths.

Update definition of "error"

My current definition of "error" as "a human action that produces an incorrect result" is a little incomplete and misleading (at least to me). Tracking the original source led me to IEEE2010, which has the more complete definition of "a human action that produces an incorrect result, such as software containing a fault", although it also defines it as "an incorrect step, process, or data definition" and "an incorrect result".

  • Update definition of "error" in notes
  • Investigate definitions of "fault" and "failure"
  • Add IEEE2010 to the GitLab repo

Document and investigate confusion surrounding "functional testing"

As noted by Dr. Smith in #21, "the definition of Type of testing talks about focusing on a specific quality, but then the example includes functional testing. Functional testing doesn't test qualities, but functionality. That is, it tests what the software does, not the quality of how well it does it." This means the following should be done:

  • "Functional testing" should be removed as an example of "test type" (and probably replaced with a better one)
  • This discrepancy should be noted in the list in the methodology section of my notes (see #24)
    • It also may be a good idea to give this list an explicit header to aid navigation
  • Make note of how "functional testing" is defined from varying sources
    • Add row for "correctness testing" (Pan, 1999)?
    • Double check how van Vliet refers to "functional testing"/"specification-based testing" - design or type?
    • "Black-box testing" as a synonym vs. as a child term

Create new glossary for generic testing terminology

As mentioned in #47, important terms that aren't testing approaches should be captured in a separate glossary. This includes:

  • data flow testing terms (see the wiki page)
    • path
  • atomic condition
  • cause-effect graph
  • classification tree
  • component
  • data analysis
  • decision table
  • defect taxonomy
  • dynamic analysis
  • equivalence partition
  • keyword
  • metamorphic relation
  • mutant
  • prioritization
  • session sheet
  • SUT
  • test charter
  • test driver
  • test stub

Review of SWEBOK V4

I've formalized the feedback on SWEBOK that I had put in the wiki, transferring it into their specific Excel format. This template provided space for 40 comments; I had to add nine rows! If you don't have time/brainpower to devote to looking this over, no worries, but I thought I would provide the option (I would focus on entries with a category of "TH", followed by "TL"); the deadline for submission is Feb 16th, so if I don't hear anything by then, I'll submit it. I did have some small questions:

  1. I'm assuming I should provide my McMaster email (especially given the next field is "affiliation"); would this be an issue if I lose access to this email? Do McMaster emails get "recycled" or will it be mine for life?
  2. One of my points of feedback is "…add a correct definition for 'scalability testing'"; should I provide one? Based on #35, I think we're nailing down what that would look like, but doing so with an "academic" source, like the ones used in SWEBOK, could prove tricky. I'm tempted to leave this in their hands; should I prioritize nailing down a definition with more rigorous sources? Could I suggest one with less rigorous sources and leave it up to them?

Ambiguity about meaning of "path" in testing literature

From my thesis notes (currently p. 28):

Path coverage: “[a]ttempting to cover all the paths in the software” [6, p. 119]; I always thought the “path” in “path coverage” was a path from program start to program end, but van Vliet seems to use the more general definition (which is, albeit, sometimes valid, like in “du-path”) of being any subset of a program’s execution (see [7, p. 420])

What are your definitions of a "path" in software? Is it required to begin at the start of the code? Should we decide on a definition to use consistently, or should I review the literature further to see if that clarifies things? (See what is currently Q4.)
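
To make the two readings concrete, here's a small sketch (hypothetical function): with two independent branches there are four complete start-to-end paths, but many more sub-paths (e.g., just the segment through the first if), which seems to be the more general sense van Vliet uses (as in "du-path").

```python
# "Path" sketch (hypothetical function): four start-to-end paths, but many
# more sub-paths in the more general sense.

def classify(x, y):
    label = ""
    if x > 0:          # branch A
        label += "x+"
    if y > 0:          # branch B
        label += "y+"
    return label or "none"

# Covering all four start-to-end paths needs every combination of branch outcomes:
if __name__ == "__main__":
    for args in [(1, 1), (1, -1), (-1, 1), (-1, -1)]:
        print(args, "->", classify(*args))
```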

Regardless of our path forward (pun unintended), I think this will be interesting to include in my thesis somewhere.

Thesis Meeting | Apr 15, 2023 - 10:30am - ITB 167 (or Teams?)

Starting to go through ISTQB led me on a side quest to go through ISO/IEC, 2023a, which I finished up recently (this source mainly focused on software qualities, but I got some relevant information about testing).

NOTE: Last I checked, I had a bit of a fever (will check again later tonight) and have had a headache the last couple of days. 😔 Depending on how I'm feeling in the morning, I will likely be able to come in person (and will mask up), assuming you are comfortable with it; otherwise, I'm OK to have this meeting virtually.

General Agenda Items

  1. Poster competition: expectations, format (some examples were emailed recently), focus?
    • Rationale
    • Metrics of my research (e.g., hierarchies, static vs. dynamic, qualities, approach categorizations)
    • Highlights of discrepancies
  2. Overview of relevant changes from ISO/IEC, 2023a (from the Foreword):
    • Usability and portability have been replaced with interaction capability and flexibility respectively.
    • Inclusivity and self-descriptiveness, resistance, and scalability have been added as subcharacteristics of interaction capability, security, and flexibility respectively.
    • User interface aesthetics and maturity have been replaced with user engagement and faultlessness respectively.
    • Accessibility has been split into inclusivity and user assistance.

Remove qtodo related to #3

Since we've reached an agreement on the distinction between full path coverage and exhaustive testing in #3, the part referring to this should be updated to be more explicit.

Thesis Meeting | Apr 1, 2023 - 10:30am - ITB 167

I finally finished going through IEEE 2017 (not an April Fools' joke)! There are some "lingering questions", but those require some more digging on my end, so I'll include them in a future agenda. As for next steps, we decided that I would "skim … 'Inventory of Testing Ideas and Structuring of Testing Terms' to capture anything that is new/missing/ambiguous/contradictory, then scrape the types of testing and their structure from A Taxonomy of Testing Types to find more 'shoe testing' to be filled in with definitions. I should also be looking 'forward' on these resources to 'find even more shoe testing'" (#37).

However, this skips over ISTQB, which we discussed in #11 and #14. Its format (an interactive webpage/search engine vs. a traditional document) may prove difficult to go through consistently (or come back to continue research later!), but I think this is still a good resource. It also contains some "mind maps" which will likely prove useful as we enter the analysis phase (skimming them a while ago already clarified a question I had 😁)! I think this should be done next, since it will actually provide definitions/relationships as opposed to A Taxonomy of Testing Types, which only (mainly?) has their relationships. Thoughts?

Minor Observations

Not super related to my specific research

  1. "Data": plural? singular? I've always treated it as a collective plural, so "data" refers to a "group of datums", and such, use it like "the data is correct", although IEEE2017 (and others?) would use it as "the data are correct".
  2. IEEE2017 also defines the "software design technique" of "input-process-output (IPO)" (p. 226) in case we want to pivot to that term instead of "input-calculate-output (ICO)".

For Later

Overflow from previous meeting(s) so I don't forget to revisit; I'm planning on doing more investigation before we discuss!

  1. Is "data use" (IEEE, 2017, p. 120) a standard term? Can I assume its meaning is well-known, at least for my glossary?
    • We concluded that it is not a well-understood term and should be defined somewhere, but made a note to revisit this

Thesis Meeting | Apr 8, 2023 - 10:30am - ITB 167

This week, in the absence of a large block of time to go through ISTQB (I don't know why I thought one would manifest 😅) I've been taking care of smaller tasks, including circling back to some "lingering questions" from IEEE 2017.

Major Questions

  1. Revisiting "recovery" - #40
  2. Data Flow Testing - how to define/explain? See wiki page and potentially #6

Resources

In going through the paper Dr. Smith sent in the Discord recently (Umar2020), I found a textbook that may be useful: DennisEtAl2012. This prompted some questions, mainly since I haven't yet started the "snowball" process (which I will begin after going through ISTQB and "A Taxonomy of Testing Types"):

  1. Would adding this textbook to my resource to-do list be "ad hoc/arbitrary", as starting from McMaster-trusted textbooks was at the beginning?
  2. At what point is adding previous research from those resources worth it? I've kind of already been interspersing my research as applicable; at what point should I revisit this research systematically?
  3. Going through "A Taxonomy of Testing Types" is on my to-do list, and I went through "Inventory of Testing Ideas and Structuring of Testing Terms" (which Dr. Smith recommended) already; are these, as well as Umar2020 and the follow-up of DennisEtAl2012 "arbitrary" decisions as well?

Clarify scope of research

Following up from #24 and #22, I should clarify the scope of my research:

  • Refine scope of error seeding; doing it to "monitor[] the rate of [fault] detection and removal" is verification of verification, but "estimating the number of faults remaining" is verification of code
    • It should be reclassified as a "practice"
    • Likewise, the scope of some test approaches should be explicit:
      • mutation testing (see SWEBOK V4 p. 5-15)
      • fault injection testing (see SWEBOK V4 p. 5-18 vs. IEEE2022 p. 42)
  • Add a note to the "methodology" section about what specifically the scope of this research is; for example, the part of "error seeding" that is in scope
    • Figure 1 and its explaining paragraphs from IEEE2022 may be useful
  • Add "certification", potentially with a note of how it can be used for verification
  • Add "data analysis" and "dynamic analysis" as related concepts to "data flow testing" and "dynamic testing", respectively, as well as "requirements analysis" for "requirements-based testing" and potentially "regression analysis" for "metamorphic testing"?
    • Likely will NOT be made separate (#39)
  • Revisit "gamification", "knowledge-based testing", and "ML-based testing" from SWEBOK V4 (p. 5-14), as well as "usability inspection" from ISO/TR 2023, IEEE2017, and SWEBOK V4 (p. 5-15), including the submethods of the latter (discussed to be "too vague" in #39)
  • Include "product analysis", "nontechnical requirement", and "physical requirement" as terms that are "obviously" out of scope (see #43), as well as "quality audit"

Update README(s)

Pretty low priority, but the README file (or files, if there are others in other directories) should be updated to refer to my thesis rather than Jason's template.

Fix List of Abbreviations and Symbols

  • Should Projectile be present if it isn't really an abbreviation? Does this manifest in the main Drasil repo at all?
  • Acronyms using newacrs shouldn't be grouped separately from newacr; see if there's a way to merge them together

Clarify relation between All-DU-Paths and All-Uses

Based on these lecture slides (which I believe I've already cited), while both require all uses of a variable to be reached, All-Uses only requires some of the paths to each use to be reached while All-DU-Paths requires all of the paths to each use to be reached. This addresses what is currently Q5.

Also note that Peters says on p. 433 that All-DU-Paths and All-P-Uses are only stronger than All-Uses and All-Edges, respectively, for versions that include infeasible paths (e.g., through unreachable code or loops).
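
Here's a minimal sketch of that distinction (hypothetical function): total has one definition and one use, but two definition-clear paths from the definition to the use, one through each branch. All-Uses is satisfied by exercising either path; All-DU-Paths requires exercising both.

```python
# All-Uses vs. All-DU-Paths sketch (hypothetical function): `total` has one
# definition and one use, reachable via two definition-clear paths.

def settle(amount, discounted):
    total = amount           # definition of `total`
    if discounted:
        fee = 0              # path 1 from the definition to the use of `total`
    else:
        fee = 2              # path 2 from the definition to the use of `total`
    return total + fee       # use of `total`

# All-Uses: one test reaching the use suffices, e.g. settle(10, True).
# All-DU-Paths: both settle(10, True) and settle(10, False) are needed.
if __name__ == "__main__":
    print(settle(10, True), settle(10, False))
```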

Thesis Meeting | Feb 5, 2023 - 10:30am - ITB 167

This was a pretty busy week for me, so I don't really have much to discuss. I can see what else I can get done tomorrow. 😅 Some topics of note, focused on differing/unusual definitions of testing types:

  1. Definition of scalability testing (see Washizaki, 2024, p. 5-9)
    a. Comparison to elasticity testing
    b. Minor note: notation for page numbers with dashes?
  2. Different kinds of recovery testing: performance vs. not? (see point 9 in Section 2.7.1 Methodology of my notes)
  3. List that captures incompleteness from IEEE2022's definitions of testing (see itemized list in Section 2.7.1 Methodology of my notes)

BONUS: Discuss other discrepancies/ambiguities from Section 2.7.1 Methodology of my notes

This has the potential to be a shorter meeting, and I'm OK if we decide that it would be better to wait for there to be more meaningful discussion (and therefore decide to SKIP this week's meeting), since there are still more sources for me to cross-reference/get definitions from!

How "exhaustive" is path testing *really*?

From my thesis notes (currently p. 28):

If done completely, it “is equivalent to exhaustively testing the program” [7, p. 421]; however, this seems to overlook the effect of inputs on behaviour as pointed out in [8, pp. 466-467]

To elaborate on Peters [8]*: the example of zero division is given there; this wouldn't be uncovered by path coverage (unless a specific branch was added to check if x == 0, for example), so "exhaustively testing the program" actually requires full path coverage with all possible inputs, which van Vliet seems to have failed to account for. Thoughts? (See what is currently Q5.)
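
A minimal sketch of the zero-division point (hypothetical function): a single test achieves full path coverage here, since there is only one path, yet the program still fails for x == 0, so full path coverage alone is not exhaustive testing over all inputs.

```python
# Full path coverage vs. exhaustive testing sketch (hypothetical function).

def reciprocal(x):
    return 1 / x        # one straight-line path, no branches

if __name__ == "__main__":
    print(reciprocal(2))    # covers the only path: 100% path coverage
    # reciprocal(0) would still raise ZeroDivisionError despite full path coverage
```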


*Don't worry, I have a note to use natbib; just haven't gotten there yet. Apparently the issue with my laptop was the motherboard and they said it could be fixed soon, so I'm holding off on more "intensive" work (like running VS Code/LaTeX 😂) until then.

Track implicit test types

Based on #23, the following should be done:

  • "Adaptability testing" should be removed from the glossary
  • A separate document should be created for tracking these "implicit test types", likely with fields for definition, precedence (for its use as a type of testing; essentially rationale for why it could potentially go in the main document), and notes (for more information, like that added in #23)

"Test approach" vs. "test practice"

Main Point

I think I'll change the "Approach" value to "Practice", "Design" to "Technique", and the "Field of Testing" column header to "Approach Category".

Reasoning

Following up on discussion in #19, I realized that the definition of "test approach" includes the following note: "Typical choices made as test approaches are test level, test type, test technique, test practice and the form of static testing to be used" (IEEE, 2022, p. 10). "Test technique" is a synonym for "test design technique" (p. 11), which I called "design" in the "Field of Testing" column of my glossary. So it seems like the general term for all the entries in my glossary is "approach", and the "generic" ones are actually "test practices": "conceptual framework[s] that can be applied to the organizational test process, the test management process, and/or the dynamic test process to facilitate testing" (p. 14). This is supported by the following uses:

  • "the test strategy specifies that the test practices scripted testing, exploratory testing and test automation are use[d]" (p. 20)
  • Figure 2 on p. 22 supports this breakdown of "test approach"
  • "experience-based test practices are primarily unscripted" and "include (but are not limited to) exploratory testing, tours, attacks, and checklist-based testing" (p. 34)
  • "Mathematical-based test practices can be used...." (p. 36)

I'm not sure where we actually landed on the term "test practice", since we may have glossed over it, but I just wanted to keep you in the loop/get your feedback.

What about Static Testing?

This brings up the question I asked in the meeting of "why did they group static testing separately in Figure 2 on p. 22?" I think the answer is in their understanding of test approach. However, I'm not convinced that this distinction is quite correct; you could argue that a lot of the design techniques are "static", since you don't need to run the code to analyze its equivalence partitions, for example. Therefore, I think the four "Types of Approach" are "Level", "Type", "Design Technique" (likely shortened to "Technique" in the glossary), and "Practice", with "Practice" being a sort of catch-all (including most, if not all, of the "static" approaches).

Sam's Thesis Meeting | Dec 12, 2023 - VIRTUAL

Structure of glossary:

  1. How should sources be captured? Should it be a LaTeX table?
  2. Should the "outcome" of the test be captured at this point, or would that be getting too granular? (See point 4. below)

Review of research on existing ontologies. NOTE: You don't need to read the following papers/excerpts for the meeting (although you can if you want); I just want to highlight where this discussion will come from and keep track of it until I upload the papers!

  1. Relations between software testing concepts (Barbosa, 2006, p. 2)¹
  2. Questions from ROoST (Souza, 2017, pp. 8-9, 12)¹
  3. Term for "intervention" (Engstrom, 2015, pp. 1-3)¹: is it useful? Does it apply to methods like using oracles/equivalence classes?
  4. "Test factors" vs. "Results of Testing (Area of Confidence)" column (Perry, 2006, pp. 40-41)¹; revisiting from #14 but actually writing down our discussion 😅

¹ I'm assuming it would be helpful to upload resources used to my repo somewhere? In a papers/ folder, named like the BibTeX references for them? I'm sure using natbib style would also help with traceability (#7).

Record ambiguity around "acceptance test" and related terms

An "acceptance test" is "usually performed by the purchaser on his premises after installation with the participation of the vendor" (IEEE, 2017, p. 5, emphasis added), while "acceptance testing" is used to "determine whether a system satisfies its acceptance criteria and to enable the customer to determine whether to accept the system" (IEEE, 2017, p. 5), but doesn't seem to imply that the product has been purchased, which seems to suggest that the former is done after the product is released while the latter is done before. However, this difference may not have been "intentional by the SWEBOK authors".

Upon further investigation, a validation test is a "test to determine whether an implemented system fulfils its specified requirements" (IEEE, 2017, p. 499), which seems related to functional testing and/or specification-based testing. This may be worth investigating further, but I'm honestly not sure. I'm tempted to omit this (and "acceptance test") from my table (especially since it is listed as a "test" and not a method of "testing").

This shouldn't be too much of a time sink; just note this information and revisit later if desired.


This is potentially related to the discussion in #39 on the distinction between "product" and "process".

Thesis Meeting | Mar 11, 2023 - 10:30am - ITB 167

Conferences/Publications

See #42 and the wiki page. Don't really have anything specific to ask except for what the next steps are. How much time should I devote to looking for other venues?

Testing Approaches

  1. Difference between "performance testing" and "performance efficiency testing":
    • Performance Testing: Testing "conducted to evaluate the degree to which a test item accomplishes its designated functions within given constraints of time and other resources" (IEEE, 2022, p. 7; 2017, p. 320) or within "specified performance requirements" (Washizaki, 2024, p. 5-9)
    • Performance Efficiency Testing: Testing to evaluate the "performance [of the SUT] relative to the amount of resources used under stated conditions" (IEEE, 2017, p. 319)
    • It seems like the first is essentially a binary test (does this program meet constraints?), while the second is more nuanced (how well does this program meet constraints in specific conditions?)
  2. Product Analysis: The "process of evaluating a product by manual or automated means to determine if the product has certain characteristics" (IEEE, 2017, p. 343). Is this essentially a synonym for nonfunctional testing, or is that a stretch?
  3. Speaking of testing approaches derived from types of requirements, do "nontechnical requirements" imply "nontechnical testing" for them? What about "physical requirements"?
    • Nontechnical Requirement: A "requirement affecting product and service acquisition or development that is not a property of the product or service" (IEEE, 2017, p. 293); is this even in scope? What could this be that isn't functional or nonfunctional?
    • Physical Requirement: A "requirement that specifies a physical characteristic that a system or system component must possess" (IEEE, 2017, p. 322); could potentially be in scope, such as file(s) size. Would this be a subset of nonfunctional testing?

Software Qualities

  1. General check-in: am I doing too much here?
  2. Are the following actually qualities worth tracking?
    a. "diversity": the "realization of the same function by different means" (IEEE, 2017, p. 143)
    b. "effectiveness": the "accuracy and completeness with which users achieve specified goals" (IEEE, 2017, p. 153)
    c. "presentability", implied by "presentable": able to "be retrieved and viewed" (IEEE, 2017, p. 333); probably refers mostly to documentation, etc., but could also apply to code?
    d. satisfaction-related "qualities"; see below (from ISO/IEC 25010:2011)
    [Figure: satisfaction-related subcharacteristics from ISO/IEC 25010:2011]
  3. To what extent should inconsistencies, etc. be "investigated"?
    • e.g., the existence of "context completeness" (IEEE, 2017, p. 96) and "functional completeness" (IEEE, 2017, p. 193) imply the existence of the super-quality of "completeness"

Minor Observations

Not super related to my specific research

  1. "Data": plural? singular? I've always treated it as a collective plural, so "data" refers to a "group of datums", and such, use it like "the data is correct", although IEEE2017 (and others?) would use it as "the data are correct".
  2. IEEE2017 also defines the "software design technique" of "input-process-output (IPO)" (p. 226) in case we want to pivot to that term instead of "input-calculate-output (ICO)".

For Later

Overflow from previous meeting(s) so I don't forget to revisit; I'm planning on doing more investigation before we discuss!

  1. Is "data use" (IEEE, 2017, p. 120) a standard term? Can I assume its meaning is well-known, at least for my glossary?
    • We concluded that it is not a well-understood term and should be defined somewhere, but made a note to revisit this

Definition of "scalability" vs. "elasticity testing"

Scalability Testing

SWEBOK V4 says "scalability testing evaluates the capability to use and learn the system and the user documentation. It also focuses on the system’s effectiveness in supporting user tasks and the ability to recover from user errors" (p. 5-9). This seems to be an amalgamation of usability, functional?, and recovery testing. (Hopefully the formatting helps and doesn't make things more confusing!) This seems particularly odd when an earlier subsection of Software Testing gives two definitions of scalability:

  1. "The software’s ability to increase and scale up on its nonfunctional requirements, such as load, number of transactions, and volume of data"
  2. "Connected to the complexity of the platform and environment in which the program runs, such as distributed, wireless networks and virtualized environments, large-scale clusters, and mobile clouds" (p. 5-5); I'd paraphrase this as "the ability of the system to be built upon"

Cross-checking this definition in other sources:

  • IEEE2022 mentions scalability testing (as a specialized form of testing of Internet of things systems), but doesn't define it
  • ISO/IEC 2023 defines "scalability" as the "capability of a product to handle growing or shrinking workloads or to adapt its capacity to handle variability" - primarily the first definition above
  • ISTQB defines "scalability testing" in terms of "scalability", which it defines as "the degree to which a component or system can be adjusted for changing" - a combination of the above two definitions

Elasticity Testing

However, SWEBOK V4 also says "elasticity testing assesses the ability of the SUT … to rapidly expand or shrink compute, memory, and storage resources without compromising the capacity to meet peak utilization" (p. 5-9). From a cursory Google, this seems to track, and I also remember there was a distinction between the two from my Microservices courses (but not what it is 😬). This definition goes on to say that an "elasticity testing objective [is] … to evaluate scalability" (p. 5-9). This implies that elasticity testing is a parent of scalability testing, but the given definition of "scalability testing" doesn't seem to suggest this.

Wanted to make sure that there isn't some nuance I'm missing before I accuse SWEBOK of being wrong! 😅

Thesis Meeting | Mar 4, 2023 - 10:30am - ITB 167

This was a rough week for me mental health-wise, so I wasn't able to get as much done as I'd hoped… 😬 That being said, there's definitely enough left over from last meeting to discuss, and I was able to do a bit!

Regrets: @JacquesCarette

Guidance on Looking for Conferences/Publications

I found this website that tracks "Calls For Papers in science and technology fields". There are currently over 3500 in software engineering (but unfortunately none with an upcoming deadline in software testing). However, these publications give pretty vague guidance as to what they are looking for, giving a list of possible topics including relevant entries like "Software testing theory and practice" or "Software quality metrics and testing" (or even no specific topics at all)! While this definitely seems like a good resource when looking to publish, I'm not sure how useful it is to investigate at this stage. I think we had discussed that at least having a deadline would be good, but with such a wealth of options (depending on whether we want to limit the scope to venues specifically looking for testing-related research, which probably isn't a bad idea), committing to one now seems a bit arbitrary (at least to me), unless we already have a specific one in mind.

Revisiting from Last Meeting

  1. Is "data use" (IEEE, 2017, p. 120) a standard term? Can I assume its meaning is well-known, at least for my glossary?
    • We concluded that it is not a well-understood term and should be defined somewhere, but made a note to revisit this

Software Qualities

  1. General check-in: am I doing too much here?
  2. Are the following actually qualities worth tracking?
    a. "diversity": the "realization of the same function by different means" (IEEE, 2017, p. 143)
    b. "effectiveness": the "accuracy and completeness with which users achieve specified goals" (IEEE, 2017, p. 153)
    c. "encapsulation": the "technique" of "isolating a system function or a set of data and operations on those data within a module and providing precise specifications for the module" (IEEE, 2017, p. 158). None of its definitions imply that it is a quality (IEEE, 2017, p. 158), but the same is true of "coupling", which makes me wonder if I overlooked something.
    d. "information hiding": the "containment of a design or implementation decision in a single module so that the decision is hidden from other modules" (IEEE, 2017, p. 221)
  3. To what extent should inconsistencies, etc. be "investigated"?
    • e.g., the existence of "context completeness" (IEEE, 2017, p. 96) and "functional completeness" (IEEE, 2017, p. 193) imply the existence of the super-quality of "completeness"

Minor Observations

Not super related to my specific research

  1. "Data": plural? singular? I've always treated it as a collective plural, so "data" refers to a "group of datums", and such, use it like "the data is correct", although IEEE2017 (and others?) would use it as "the data are correct".
  2. IEEE2017 also defines the "software design technique" of "input-process-output (IPO)" (p. 226) in case we want to pivot to that term instead of "input-calculate-output (ICO)" (pinging @balacij for this particular point 😅).

Scope of Testing (vs. V&V)

I believe we skimmed over this for time, but the following was a question I had about IEEE2022:

Figure 1 on page 17 decomposes "Verification and validation" into "Static testing", "Dynamic testing", "Demonstration", and "V and V analysis"; my gut reaction is that the last two are out of scope, but I wanted to double check.

Going through IEEE2017 (I've gone through A-E in the glossary so far 😅), I found similar terminology that seems on the boundary of our scope. Should I include these terms as "test approaches", or are they out of scope?

  • application requirements verification and validation: a "subprocess that confirms that the application specific requirements are consistent and feasible and ensures that the bound variants satisfy the specific product's requirements" (p. 23)
  • architectural design review: a "joint acquirer‐supplier review to evaluate the technical adequacies of the software architectural design as depicted in the software design descriptions" (p. 26)
  • certification: a "third‐party attestation related to products, processes, systems, or persons" or "guarantee that a system or component complies with its specified requirements and is acceptable for operational use" (p. 63)
  • checklist analysis: "a technique for systematically reviewing materials using a list for accuracy and completeness" (p. 67); it seems like this has a context broader than testing, since it is originally from A Guide to the Project Management Body of Knowledge. "Checklist-based testing" is already in my glossary, but this definition narrows down to accuracy and completeness instead of allowing the testing to be "focused on ... [any] particular quality characteristic" (IEEE, 2022, p. 34)
  • code tuning: the "process of making statement‐level changes to a program to make it more efficient" (p. 74)
  • design review: a process "to determine if the design meets the applicable requirements, to identify problems, and to propose solutions" (p. 132); also detailed design review
  • documentation reviews: "the process of gathering a corpus of information and reviewing it to determine accuracy and completeness" (p. 144); also from A Guide to the Project Management Body of Knowledge
  • error seeding: The "process of intentionally adding known faults to those already in a computer program … [to] monitor[] the rate of detection and removal, and estimating the number of faults remaining" (p. 165); SWEBOK classifies this as a "Test-Related Measure" (2014, p. 93)

Semi-related:

  • data analysis: the "systematic investigation of the data and their flow in a real or planned system" (p. 114)
  • domain analysis: the "analysis of systems within a domain to discover commonalities and differences among them" (p. 145)
  • dynamic analysis: the "process of evaluating a system or component based on its behavior during execution" (p. 149)

Names for SWEBOK paper/citation

I'm not sure if I should name the SWEBOK paper/BibTeX citation "SWEBOK", "SWEBOK2014", or "BourqueAndFairley2014" (based on the preferred citation given on the SWEBOK website). I think the last one is the most consistent, but calling it something more recognizable might make it easier to find. Thoughts?

Create Poster

Here are my initial ideas on how I'll structure my poster; any thoughts or ideas? @smiths @JacquesCarette Let me know if anything here isn't quite clear and I can elaborate on what I'm getting at. 😅

Motivation

  • A "complete" collection of software testing approaches is useful when developing formal tools for software testing
    • For example, we wanted to analyze testing approaches to see which we could generate in our framework Drasil
    • Go through my attempts of looking for "complete" glossaries (see Section 2.7.2 Methodology of my notes)

Transition

  • If there isn't a complete collection, one should be created
    • Go through my methodology for this research (see Section 2.7.2 Methodology of my notes)
    • It should be trivial to create one from relevant standardized documents, no? (emphasis on "no")

How Standardized are the Standards?

Good

  • There's agreement between the definitions of data flow testing, although it's not present in some sources (e.g., SWEBOK V4)
  • IEEE provides a nice breakdown of how these testing approaches can be categorized (2022, p. 22) (although this quickly leads to confusion/ambiguity)

Medium

  • There's general agreement on what alpha testing is, but who performs it is ambiguous (in the Discrepancies and Ambiguities section of my notes)
  • Experience-based as both technique and practice (in the Discrepancies and Ambiguities section of my notes)
  • Renaming of qualities in ISO/IEC2023a (see #49) not propagated to sources like SWEBOK V4 (double check this!)
  • Bonus (i.e., if there's reasonable space): unhelpful definitions from IEEE2017 ("evaluation", "product analysis", "quality audit", "software product evaluation")

Bad

  • "empty" definitions from IEEE2017 ("event sequence analysis" and "operable")
  • undefined terms in IEEE2022 (in the discrepancies section of my notes)
  • general note that SWEBOK's template for review provided 40 lines, and I needed to add 23 more!

Unclear

These have the potential to require more space to "fully" explain, so unless one of them stands out, I'm planning on including one only if there is reasonable space, potentially as something like a "case study".

  • operational (acceptance) testing (#36)
    • include its omission from SWEBOK V4 (it was present in SWEBOK v3.0)
  • scalability vs. elasticity testing (#35)
    • include its incorrect definitions in SWEBOK V4
  • recovery/recoverability testing (#40)
  • performance testing (in the Discrepancies and Ambiguities section of my notes)

Conclusions

  • A "complete", systematic glossary, taxonomy, or ontology for software testing approaches would be useful, but doesn't yet exist
  • Its creation is more involved than one might think, based on discrepancies/ambiguities in software testing literature
  • Hopefully, the creation of a systematic glossary like this will not only help others who would benefit from it, but also help future literature be more standardized and consistent
    • Ideally, this glossary will be able to grow alongside the literature as new testing approaches are developed or preexisting ones are better understood

Improve current "Field of Testing" column

As outlined in #21, some changes should be made to the "Field of Testing" column:

  • It should be renamed to "Approach Category"
  • "Approach" should be renamed to "Practice" (where applicable; I decided to keep this as "Approach" for instances without an explicit categorization so we can revisit them later if need/want be)
  • "Design" should be renamed to "Technique"
  • Explicit sources should be added to these values to indicate if they are from the literature or original analysis
    • Other categorizations should also be recorded somehow; e.g., if the Barbosa source classifies something as a "Phase", "Level? (Barbosa ....)" could be added in addition to its existing categorization
  • Ensure that there was no information missed from the first pass of IEEE2017's glossary sections:
    • A
    • B
    • C
    • D
    • E

Explicitly define structural testing

As mentioned here, code structure is tested in both black- and white-box testing: the external structure in the former and the internal structure in the latter. This addresses one of the qtodos (currently Q3), although a more "academic" source should be found if possible.

Thesis Meeting | Feb 12, 2023 - 10:30am - ITB 167

I think this is organized fairly logically and ordered in (roughly) descending priority; feel free to comment on the importance (or lack thereof!) of any discussion topic

Logistics

  1. Looking for journals/conferences - internal deadline?
  2. Funding over summer?
  3. Citing less "academic" sources - valuable? Taboo?

Testing Definitions

  1. #35
  2. Recovery testing: performance vs. not? (see point 9 in Section 2.7.1 Methodology of my notes)

SWEBOK V4

  1. My Feedback on SWEBOK V4
  2. SWEBOK V4 sections of testing in scope?
    a. 3.6 Techniques Based on the Nature of the Application
    b. 3.8 Techniques Based on Derived Knowledge
  3. Minor: notation for citing pages with dashes? I've been doing \cite[pp.~5-6--5-9]{} but ChatGPT suggested \cite[pp.~5-6 to 5-9]{} (thanks Chris for suggesting asking it!)

Next Steps

  1. Short-term: Which resource should I go through next?
    a. IEEE 2017 (started)
    b. A Taxonomy of Testing Types from Carnegie Mellon University in 2015 (which I found by accident after looking for info on operational (acceptance) testing)
    c. "Inventory of Testing Ideas and Structuring of Testing Terms" from 2013 (Dr. Smith's paper in the Discord)
  2. Long-term: List that captures incompleteness from IEEE2022's definitions of testing (see itemized list in Section 2.7.1 Methodology of my notes) - how do we want to capture/format this?

"Implied" Test Types

IEEE 2017 provides definitions of many software qualities. Since a test type is "Testing that is focused on specific quality characteristics" (e.g., functional testing, usability testing, and performance testing) (IEEE, 2022, p. 15) (see #21), it seems reasonable to consider each quality as implying a test type specific to it. Should I be including entries based on these qualities in my glossary? I've found the following (I've gone through A-E in the glossary so far 😅):

  • adaptability (p. 12)
  • availability (p. 38)
  • cohesion (p. 74)
  • comfort (p. 75) e.g., screen brightness
  • compliance (p. 82)
  • conciseness (p. 88)
  • concurrency (p. 88)
  • conformance (p. 92); may be at the process level, not the software level (see #22)
  • connectivity (p. 93)
  • consistency (p. 94)
  • correctability (p. 104); related to maintainability?
  • correctness (p. 104); precedence for this as a type of testing in (Pan, 1999)
  • coupling (p. 107)
  • criticality? (p. 110)
  • efficiency (p. 152); "performance efficiency testing" mentioned in IEEE (2022, p. 9; 2017, pp. 58, 159, 253, 442, 508)
  • error tolerance (p. 166)
  • expandability (p. 173)
  • extendability (p. 174)

And perhaps a quality the software shouldn't have:

  • complexity (p. 81)

Figure out ellipses

Currently, \dots is used. Should \ldots be used? How should the case of four dots be handled (an example is currently in the Generating Test Cases section)?

Formalize "recovery testing" definition(s)

From #39, recovery testing really only makes sense as a semi-subcategory of performance testing, and the distinctions between its different definitions aren't really meaningful. This should be made explicit, probably as part of the "refinement" of my glossary

Sam's Thesis Meeting | Jan 22, 2023 - 10:30am - VIRTUAL

  1. Do we want to introduce a classification of "metric" (see Error Seeding in glossary), or is this out of scope (#22)?
  2. Difference between "functional testing" and "specification-based testing" (see also #21, #25)
  3. Venues for publishing; where should I keep track of these, and what exactly am I looking for? Budget constraints?

BONUS: Discuss discrepancies/ambiguities from Section 2.7.1 Methodology of my notes

Thesis Meeting | Feb 26, 2023 - 10:30am - ITB 167

Scope Questions

  1. SWEBOK V4 sections of testing in scope?
    a. 3.6 Techniques Based on the Nature of the Application
    b. 3.8 Techniques Based on Derived Knowledge
  2. Analysis vs. Testing (see #22)
    a. data analysis: related to data flow testing
    b. domain analysis: out of scope (#22)
    c. dynamic analysis: related to dynamic testing and includes demos (IEEE, 2017)
    d. static analysis: shown in literature as separate from static testing (see IEEE, 2022, p. 17 and SWEBOK V4, pp. 5-1 to 5-2)

Terminology Questions

  1. Recovery testing: performance vs. not? (see point 9 in Section 2.7.1 Methodology of my notes)
  2. How deep should I go for "implied" test types?
    • e.g., since decision table testing exists (SWEBOK V4, 2024, p. 5-11; IEEE, 2022, p. 4) and an extended entry table is a type of decision table (IEEE, 2017, p. 175), this implies (at least to me) the existence of "extended entry table testing" (see the sketch after this list)
  3. My gut reaction is that "evaluation", the "systematic determination of the extent to which an entity meets its specified criteria" (IEEE, 2017, p. 167) is not useful as a testing approach. If so, I feel this may be helpful to capture (although I'm not sure where/if it would even be helpful later…)
  4. Is the distinction between a process and the artifact(s) that result from it a meaningful one to make? (e.g., certification, domain analysis)
  5. Is "data use" (IEEE, 2017, p. 120) a standard term? Can I assume its meaning is well-known, at least for my glossary?

Software Qualities

  1. General check-in: am I doing too much here?
  2. Are the following actually qualities worth tracking?
    a. "diversity": the "realization of the same function by different means" (IEEE, 2017, p. 143)
    b. "effectiveness": the "accuracy and completeness with which users achieve specified goals" (IEEE, 2017, p. 153)
    c. "encapsulation": none of its definitions imply that it is a quality (IEEE, 2017, p. 158), but the same is true of "coupling", which makes me wonder if I overlooked something.
  3. To what extent should inconsistencies, etc. be "investigated"?
    • e.g., the existence of "context completeness" (IEEE, 2017, p. 96) and "functional completeness" (IEEE, 2017, p. 193) imply the existence of the super-quality of "completeness"

Definition of "operational" vs. "operational acceptance testing"

I have come across the terms "Operational Testing" and "Operational Acceptance Testing", but can't nail down their difference. Some sources treat them as synonyms, including:

I have found some differences in usage, though:

Operational Acceptance Testing

  • It isn't defined in IEEE 2022, but is listed as a subset of acceptance testing (as well as on Wikipedia and that testing platform)
  • ISTQB defines it as "a type of acceptance testing performed to determine if operations and/or systems administration staff can accept a system"

Operational Testing

  • SWEBOK v3.0 defines it as reliability testing that is "derived by randomly generating test cases according to the operational profile of the software" (p. 4-6) and mentions it can be used for system testing
  • SWEBOK V4 omits this definition (potentially because this isn't quite the right definition), but includes it as a subset of usage-based testing (which can be used for acceptance testing)
  • ISO/IEC 2018 defines it as "test[ing] to determine the correct installation, configuration and operation of a module and that it operates securely in the operational environment"
  • IEEE2017 defines it as "testing conducted to evaluate a system or component in its operational environment"
  • It is kept generally distinct from acceptance testing in this nuclear fuel project … report?
  • It is implied to be a subset of operational acceptance testing on TutorialsPoint

Not really sure what the action of this issue is, but figured this was a better place to track it than in a meeting issue 😂

Thesis Meeting | Mar 18, 2023 - 10:30am - ITB 167

Conferences/Publications

See #42 and the wiki page. Don't really have anything specific to ask except for what the next steps are. How much time should I devote to looking for other venues? Is there one here that we think we should commit to?

  1. I made some scripts to help with my search, one to scrape/filter CFPs from the resource mentioned in #41 and one to estimate the number of committee members from top-100 universities; would these be helpful to upload somewhere? This repo? A separate repo? Drasil?
  2. I'm assuming natural-language-processing-centric conferences are probably not the best fit for us? 😅

Software Qualities

  1. General check-in: am I doing too much here?
  2. Are the following actually qualities worth tracking?
    a. "diversity": the "realization of the same function by different means" (IEEE, 2017, p. 143)
    b. "effectiveness": the "accuracy and completeness with which users achieve specified goals" (IEEE, 2017, p. 153)
    c. "presentability", implied by "presentable": able to "be retrieved and viewed" (IEEE, 2017, p. 333); probably refers mostly to documentation, etc., but could also apply to code?
    d. satisfaction-related "qualities"; see below (from ISO/IEC 25010:2011)
    [Figure: satisfaction-related subcharacteristics from ISO/IEC 25010:2011]

General Testing Questions

  1. "Safety" is described as an "expectation" (IEEE, 2017, p. 397), not a quality, but "safety requirement" (which "ensure the safety of the product" (IEEE, 2017, p. 398)) seems to imply it could be.…
  2. Could "software quality evaluation" (the "systematic examination of the extent to which a software product is capable of satisfying stated and implied needs" (IEEE, 2017, p. 425)) be considered a synonym of "(non-)requirements-based testing"?
  3. As mentioned before about other terms, "software product evaluation" (a "technical operation that consists of producing an assessment of one or more characteristics of a software product according to a specified procedure" (IEEE, 2017, p. 424)) seems too vague to be useful; thoughts?
  4. Are "safety demonstrations", defined by IEEE as "bod[ies] of evidence and rationale that show[] an item is justified as being safe within allowed limits on risk" (2017, p. 397) included in static testing? It seems like the term refers to the process, but the definition refers to the artifact - should this be recorded, even if static testing will be removed from our scope?

Minor Observations

Not super related to my specific research

  1. "Data": plural? singular? I've always treated it as a collective plural, so "data" refers to a "group of datums", and such, use it like "the data is correct", although IEEE2017 (and others?) would use it as "the data are correct".
  2. IEEE2017 also defines the "software design technique" of "input-process-output (IPO)" (p. 226) in case we want to pivot to that term instead of "input-calculate-output (ICO)".

For Later

Overflow from previous meeting(s) so I don't forget to revisit; I'm planning on doing more investigation before we discuss!

  1. Is "data use" (IEEE, 2017, p. 120) a standard term? Can I assume its meaning is well-known, at least for my glossary?
    • We concluded that it is not a well-understood term and should be defined somewhere, but made a note to revisit this

Sam's Thesis Meeting | Nov 7, 2023

Sorry for the lateness! This is my proposed agenda for tomorrow (likely to be filled in more later):

  1. Refined theories version of Projectile?
  2. How to best organize testing summary document
    a. Does it make sense to have subcategories?
    b. Appropriate level of detail?
    c. Sufficient "citations" (at least for now?)
  3. Assumptions for unit testing in testing summary document
  4. QTODOs from notes document
    a. Big bang testing with Drasil (Q2)
  5. Difference between MT and PBT

NOTE: @JacquesCarette may be a bit late, in which case we'll start with bullet point 1
