Sunday, June 11, 2017

Lessons learned from Google Test Automation Conference 2016

It took me half a year to get to the GTAC 2016 videos, but it was again absolutely worth it. Engineers from top companies present the challenges they have overcome. IMHO, most regular IT conference talks either serve a purely marketing purpose without delivering any value, or are aimed at beginners. GTAC is different! That is why I thought it might again be worth writing down notes from this year. Someone (similar to me - 6 years of experience in Test Engineering) can go through them in 5 minutes and decide more easily which talk suits them.

  • Evolution of Business and Engineering Productivity
    • (34:49) When is duplication technical debt, and when can it be a benefit?
      • flexibility, competition and collaboration when duplication is done in an organized way; when it goes unnoticed, it becomes debt
    • (37:22) Metrics and measurements, e.g. how long does it take for my code to reach production? Senior managers have to support this and develop quarterly goals for employees based on these metrics.
    • (40:12) Test and Release strategy 2.0: continuous deployment, canary testing (testing during canary release), production monitoring, …
    • Build, test, release, repeat will not be sufficient in the near future
    • Machine learning to find out which tests are completely useless, so they do not need to be run.
    • (52:20) Automated vs manual test ratio within Google Ads: 95%:5%
    • (53:00) Other metrics at Google: how long do developers spend running tests before submitting their change (a.k.a. presubmit latency)?
    • in general, 10 days to get a change into production
  • Automating Telepresence Robot Driving
    • Telepresence - refers to a set of technologies which allow a person to feel as if they were present, to give the appearance of being present, or to have an effect, via telerobotics, at a place other than their true location [https://en.wikipedia.org/wiki/Telepresence].
    • Beam Telepresence System - https://suitabletech.com/beampro/
    • Beacon
    • (19:12) LiDAR - measures distance with a laser
    • (20:50) - Hardware stack for testing such a robot in the lab
      • Beam robot with modifications for LiDAR scanning, Beam charging dock, Hokuyo LiDAR, NUC small form factor computer
    • It is difficult to determine the orientation of a symmetric object, so make it asymmetric, but in a way that does not affect the test (do not change the weight much, etc.)
    • (22:08) Lab room considerations - it has to be isolated (also as a safety precaution, the robot weighs 100 pounds), lighting, flooring
  • What’s in your Wallet?
    • Galen - automates look & feel testing for responsive websites
    • Hygieia - a single, easy-to-use dashboard to visualize the near real-time status of the entire software delivery pipeline
  • Using test run automation statistics to predict which tests to run
    • (8:55) Which tests not to run?
      • 100% successful during the last month, with > 100 test runs, and running on all branches (see the sketch after this list)
      • Key point: such tests are disabled only on trunk and stay enabled on the branches from which merges go to trunk; when they fail during the merge process, they are re-enabled and run for at least another month
      • They were able to save about 50% build time.
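
For illustration, the skip rule could look roughly like this (my own sketch of the stated criteria, not their implementation; the class and field names are hypothetical):
import java.util.List;

// Hypothetical per-test history record, for illustration only.
class TestHistory {
  List<Boolean> lastMonthResults; // one entry per run, true = passed
  boolean runsOnAllBranches;

  // The stated criteria: 100% successful during the last month,
  // more than 100 runs, and running on all branches.
  boolean canBeDisabledOnTrunk() {
    return runsOnAllBranches
        && lastMonthResults.size() > 100
        && lastMonthResults.stream().allMatch(passed -> passed);
  }
}
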
  • Selenium-based test automation for Windows and Windows Phone
    • Winium.Mobile - fills a gap left by Appium: support for Windows on mobile devices
    • Winium.Desktop automation - open-sourced; WPF, WinForms, any accessible app
  • The Quirkier Side of Testing
    • Funny one, must see :)
  • ML Algorithm for Setting up Mobile Test Environment
    • (09:50) Machine Learning algorithm to choose devices for test lab
      • Decision tree, random forest classifier
  • “Can you hear me?” - Surviving Audio Quality Testing
    • (06:58) Audio software testing pyramid
    • (16:40) POLQA algorithm for testing audio quality. The inputs for this algorithm are a reference audio file and a recorded one. The result is a Mean Opinion Score (MOS), which is a grade from 0 to 5.
    • (18:08) Frequency analysis - identifies the actors in the audio recording; each person speaks at a different frequency.
    • (18:51) Speech presence - finds the regions in the recording where speech occurs.
    • (19:01) Amplitude analysis - verifies that speakers are neither too loud nor too quiet (a naive version is sketched after this list)
    • (19:40) Live demo of web service which employs those algorithms
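
A naive version of the amplitude analysis could look like this (my own sketch over raw 16-bit PCM samples, not the presented tool; the thresholds would be tuned per use case):
// Naive amplitude check: compute the RMS of 16-bit PCM samples and
// verify it stays between "too silent" and "too loud" bounds.
static boolean amplitudeOk(short[] samples, double minRms, double maxRms) {
  double sumOfSquares = 0;
  for (short s : samples) {
    sumOfSquares += (double) s * s;
  }
  double rms = Math.sqrt(sumOfSquares / samples.length);
  return rms >= minRms && rms <= maxRms;
}
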
  • IATF: A new Automated Cross-platform and Multi-device API Test Framework
    • (21:25) Test steps sequence diagram for testing communication between two clients connected to a server (via WebRTC protocol)
  • Using Formal Concept Analysis in software testing
    • Can be used for finding dependencies among method parameters, in the form of implications
    • (14:27) Can be used for analysis of test reports. Nice example.
    • Lattice usage analysis amounts to finding the most common descriptions of failed tests; in big systems, a lattice is a good representation for finding similar functionality (see the toy example after this list)
    • Possible extension: ML to find the probable reason why a test failed.
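
As a toy illustration of "most common descriptions of failed tests" (my sketch, not the speaker's code): the shared description of a set of failed tests is the intersection of their attribute sets, which in FCA terms is the intent of the concept covering them.
import java.util.*;

// Toy FCA-style report analysis: intersect the attribute sets of all
// failed tests to get their most common description (the concept intent).
static Set<String> commonDescription(Collection<Set<String>> failedTestAttributes) {
  Set<String> common = null;
  for (Set<String> attributes : failedTestAttributes) {
    if (common == null) {
      common = new HashSet<>(attributes);
    } else {
      common.retainAll(attributes);
    }
  }
  return common == null ? Collections.emptySet() : common;
}
// e.g. all failures sharing {"touches DB", "runs on Firefox"} points the
// investigation at that combination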


  • Flaky Tests in Continuous Integration: Current Practice at Google and Future Directions
    • SLA for a dev, from the time he/she commits to getting an answer: 3 hours in general
    • Not every change triggers test jobs right away; about ⅓ do
    • With ML they can be 90% sure that a test is flaky, so they do not have to rerun it 10 times as usual
    • (14:10) how to identify that tests are flaky: patterns, features, correlations (a baseline signal is sketched after this list)
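
One baseline signal behind such detection (my sketch; the talk builds richer ML features on top): a test that both passed and failed at the same code snapshot is flaky, since the code under test did not change.
import java.util.List;
import java.util.Map;

// Baseline flakiness signal: the same test both passed and failed at the
// same commit, so the difference cannot come from the code itself.
static boolean looksFlaky(Map<String, List<Boolean>> resultsByCommit) {
  for (List<Boolean> results : resultsByCommit.values()) {
    if (results.contains(true) && results.contains(false)) {
      return true;
    }
  }
  return false;
}
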
  • Developer Experience, FTW!
    • Firebase Test Lab for Android devices; Espresso, Robotium, or UI Automator 2.0
    • Espresso Test Recorder is available in Android Studio
    • (55:29) Firebase Test Lab may in the future be able to use real user actions to test the application


  • Docker Based Geo Dispersed Test Farm - Test Infrastructure Practice in Intel Android Program
    • Release and deliver test suites as Docker images


  • OpenHTF - The Open-Source Hardware Testing Framework
    • Test harness OSS Python library with Web GUI
    • Plugins for sensors, platforms, chips… any other hardware. Not many plugins are available yet.


  • Directed Test Generation to Detect Loop Inefficiencies
    • Redundant loop traversal as a performance issue
    • Toddler: detects performance problems via similar memory-access patterns
    • Glider - the suggested approach to address redundant traversals
    • Implemented on top of the Soot bytecode framework


  • Need for Speed - Accelerate Automation Tests From 3 Hours to 3 Minutes
    • Enablers:
      • dedicated environment - saved 57 minutes
      • empty DBs instead of shared DBs - saved 34 minutes
      • simulate dependencies (stub external dependencies) - saved 24 minutes and also made tests more stable
      • moved to containers - this initially slowed things down, as the tests did a lot of IO operations
      • run databases in memory - 4 minutes saved (see the H2 example after this list)
      • do not clean test data - when the data lives in containers, the container disappears once the tests end, so there is no need for cleanup; 15 minutes saved
      • run tests in parallel - everybody starts with this, but one should end with it; 41 min saved. One has to find the right number of threads, as too many threads can slow things down.
      • equalize the workload - not every thread executed an equal number of test cases
      • by vertical scaling (RAM and CPU) they were able to run the suite in 1:38
      • they want to get below one minute by scaling horizontally
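
The in-memory database step is easy to reproduce with, for example, H2 (my example, the talk does not name a database; assumes the H2 driver is on the classpath and the snippet sits in a method declaring throws SQLException):
import java.sql.Connection;
import java.sql.DriverManager;

// H2 in-memory database: it lives only as long as the JVM, so test data
// disappears on its own - nothing to clean up afterwards.
Connection connection =
    DriverManager.getConnection("jdbc:h2:mem:testdb;DB_CLOSE_DELAY=-1");
connection.createStatement()
    .execute("CREATE TABLE users(id INT PRIMARY KEY, name VARCHAR(64))");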


  • ClusterRunner: making fast test-feedback easy through horizontal scaling
  • Integration Testing with Multiple Mobile Devices and Services
    • Most frameworks are single-device, while E2E testing brings extra challenges: synchronizing steps between multiple devices, and a large range of equipment - attenuator, call box, power meter, wireless AP
    • Mobly - an OSS Python library from Google, used to test Android; controls a collection of devices/equipment in a test bed (isolated mobile devices, network switch, IoT, etc.)
    • centralized vs. decentralized ways of executing/dispatching test logic; Mobly is centralized - they found it easier to debug
    • (18:24) Cool demonstration - two phones and a watch: phone A gives a voice command to the watch, the watch initiates a call to phone B, phone B gets the call notification
    • Similar frameworks: OpenHTF, Firebase
  • Scale vs Value: Test Automation at the BBC
    • They became overwhelmed by manual regression tests > BDD to define with different stakeholders what to automate > a separate team to ensure devices for testing are available, inventory, status of devices > test lab (lots of smart TVs; the test lab is in a fire corridor :D)
    • PUMA - the plan to adopt the framework broadly within the company
      • Prove core functionality, automated checks for the core value of your product or system. Regularly audited to combat bloat
      • Understood by all - everyone cares, anyone can execute, visibility to all
      • Mandatory - part of delivery pipeline, any fail check stops the build
      • Automated
    • Whatever framework you use, you need to step back and see what value it brings you: only important tests should run on real devices, etc.
  • Finding bugs in C++ libraries using LibFuzzer
    • What to fuzz: anything that consumes untrusted or complicated inputs: parsers of any kind, media codecs, network protocols, crypto, compression, compilers and interpreters, regular expression matchers, databases, browsers, text editors/processors, OS kernels, drivers, supervisors, Chrome UI
    • How to fuzz: generation-based, mutation-based, or guided mutation-based fuzzing
    • Mutation: e.g. bit flipping (a minimal mutation step is sketched after this list)
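
Bit flipping itself is trivial; a single mutation step could look like this (an illustrative sketch in Java, not LibFuzzer's actual C++ code):
import java.util.Random;

// One mutation step: flip a single random bit in a copy of the input
// (assumes a non-empty input).
static byte[] flipRandomBit(byte[] input, Random random) {
  byte[] mutated = input.clone(); // keep the original corpus entry intact
  int bit = random.nextInt(mutated.length * 8);
  mutated[bit / 8] ^= (byte) (1 << (bit % 8));
  return mutated;
}
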
  • How I learned to crash test a server
    • They programmed a power outlet into which you can SSH and turn any of its sockets on/off
    • Crashing virtual machines vs. crashing physical machines - both need to be done
    • Virtual machines: a single command from the host, for both KVM- and VMware-based machines
    • The BIOS has a setting to restore power state on AC power loss
    • On Windows there is the bcdedit utility, with which you can suppress the prompt (Start Windows Normally, ...) after an abrupt Windows restart
    • They did not find a systematic way to crash Windows with an internal command (how ironic :D)

Wednesday, March 30, 2016

Free VRs to schools for a better future

The older I get, the more I feel how much the current education system needs a massive improvement. The weakest point I can see is its inability to motivate students to learn. A classic saying goes: "A teacher can only show you the way; you have to take it." Unfortunately, this works only for students whose motivation comes from someone else (usually from parents). It would never work for students whose parents are not capable of explaining the importance of a proper education (usually such parents lack the time or the knowledge). Therefore, teachers have to motivate students.

They would never succeed with the current means (books from the 80s-90s). Interactive screens and smart devices for each student are just not enough. They need to go beyond! They need means which would let them associate boring facts with emotions.

Virtual reality to rescue us all!

In my dreams, my children would take only one device to school instead of pens and notebooks (or their electronic counterparts): a VR set. During each class, they would connect to a host room created by the teacher, together with all the other students in the class.

With the help of gamification, they would be able to master the material far more quickly. Imagine that in a history class they could profoundly feel the atmosphere of a World War 2 battle (a Call of Duty-like FPS). Or they could quickly meet people from very different cultures. In science class, they could land on the moon. My favorite: in literature class, they would investigate the murder committed by Raskolnikov in a logic adventure.

Educational games can draw inspiration from MMORPGs: the best students would lead the raid on each class's final boss (e.g. Napoleon). They would gain some kind of artifact (William Wallace's sword), which would give them an advantage in the next classes. Gained experience points would also influence their final exam. The VR simulation would be full of logic games and quizzes, which would require preparation at home. Why not have the final exam fully scripted as a test in VR?

Step by step, day by day, we could positively influence children from their very early days to their adulthood. I believe that this way there would be no Neo-Nazism, and young people would find faster what they adore and what they are keen on.

State-of-the-art games and VR headsets show us that this is not science fiction, and that we are already ready for it. Big game studios and VR producers just need to see it as a business opportunity, and maybe cooperate with governments to push this forward.

So, looking forward to a better future :)

Sunday, January 24, 2016

Appium workarounds #1

Appium is a great tool indeed. For Android, for example, it integrates projects like UI Automator & ChromeDriver, provides a server and client APIs, and instruments emulators/real devices. As expected, all of these components have their own bugs. The following is a list of my workarounds, which should work reliably until the bugs are fixed upstream (once I know the root cause, I will report them :)).

Component versions used:

Appium java-client: 3.2.0
Appium server: 1.4.16
ChromeDriver: 2.20
Emulator: Android 6.0, API level 23, with Intel x86 image

Clear password field reliably


Sometimes password fields are not cleared reliably, and you end up with the new password appended to the prefilled one. It is because UI Automator currently cannot read a password field's value, so the automatic fallback, which attempts to clear the text field until its value is empty, fails.

What about having something like a PasswordField widget with a clear method as follows:
public void clear() {
  passwordField.click(); // focus the field
  // passwordValue holds the text we previously typed into the field
  for (int i = 0; i < passwordValue.length(); i++) {
    driver.pressKeyCode(67); // 67 = KEYCODE_DEL (backspace)
  }
}
Clicking in the middle of the password field should be enough when passwords are not too long.

Find an element which recently faded in


Sometimes, Appium was not able to find an element (e.g. android.widget.Button) which had just recently faded in. It was not a timing issue. It was non-deterministic, made tests flaky, and almost drove me nuts.

Calling driver.getPageSource(), before attempting to find such an element solved my problem.
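
In code, the workaround is a single throwaway call right before the lookup (the locator below is just an example):
driver.getPageSource(); // result intentionally ignored; refreshes the UI hierarchy
WebElement button = driver.findElement(By.id("exampleButtonId")); // example locator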

Inspired by this Appium bug report.

Set date picker field more reliably


There are multiple tutorials on how to set a Date Picker on Android. They simply advise calling WebElement#sendKeys on the day, month and year elements. Sometimes it fails to completely clear the previous value and results in a wrong date being set. An easy solution is to set the new value multiple times:
WebElement picker = driver.findElement(PICKER_LOCATOR);

int counter = 0;
while (!picker.getText().equalsIgnoreCase(value) && counter < MAX_TRIES_TO_SET_VALUE) {
  picker.clear();
  picker.sendKeys(value);
  counter++;
}
// re-check the value itself: testing the counter alone would wrongly
// throw when the very last retry succeeded
if (!picker.getText().equalsIgnoreCase(value)) {
  throw new IllegalStateException("It was not possible to set new value: " + value +
      " in Android picker after " + MAX_TRIES_TO_SET_VALUE + " tries.");
}
picker.click(); // confirm entered value
Normally this is not good practice, and I try to avoid repeating an action until it succeeds, as it introduces false positives. However, as long as this is not a problem in our own component, it is OK.

Sunday, January 10, 2016

Lessons learned from Google Test Automation Conference 2015

On November 10-11, 2015, there was the 9th GTAC. Although I was not there, I enjoyed it very much :) How come?

It is because of brilliant recordings:

  • recordings were available very soon after the event (2 weeks later)
  • great video & audio quality
  • audience questions were repeated by the moderator
  • but most important of all was the outstanding content
This way, one can enjoy talks given by professionals from big enterprises such as the Google YouTube team, the Google Chrome OS team, Twitter, Uber, Spotify, Netflix, LinkedIn, Lockheed Martin, and more.

I often try to watch conference videos. However, I always give up on finishing them all, because they usually become publicly available at least half a year after the event, and are thus often outdated. These talks were different though!

Following are notes from each talk. Hopefully, someone will find them useful and feel encouraged to watch the full versions.

Keynote - Jürgen Allgayer (Google YouTube)

  • Cultural change, which consisted of:
    • taking a SNAPSHOT of where we are (how many bugs occur in the staging phase, etc.)
    • making an SLA (what is our goal? a need for a tool which will tell: this team is this effective)
    • and agreeing on it across the whole organisation
    • continuously measuring according to the defined SLA, to see where we are
    • how many manual tests? Where do we find bugs more often?
  • Goal: no manual regression testing, but manual exploratory testing instead.

The Uber Challenge of Cross-Application/Cross-Device Testing - Apple Chow & Bian Jiang

  • biggest challenge: two separate apps (driver and passenger), while the same scenario is completed using both apps
    • solution: an in-house framework called Octopus, which is capable of running two emulators and managing the communication between them
    • Octopus uses signaling to make sure test steps are executed in the right order -> asynchronous timeouts
    • Octopus focus: iOS, Android, parallel execution, signaling, extensible (it does not matter which UI framework is used)
    • the communication is done through USB as the most reliable channel
    • sending files to communicate - most reliable (a toy sketch follows this list)
  • Why is the communication not mocked? Answer: this is part of your happy path, to finally ensure you are good to go. It does not replace your unit tests.
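
The file-based signaling is easy to picture; a toy sketch (mine, not Octopus code; the path is made up, and the snippet belongs inside a method declaring throws Exception):
import java.nio.file.*;

Path signal = Paths.get("/tmp/octopus-signals/ride_requested"); // made-up path

// Sender side: one app's test emits the signal by creating the file.
Files.createDirectories(signal.getParent());
Files.createFile(signal);

// Receiver side: the other app's test polls for the file with a timeout.
long deadline = System.currentTimeMillis() + 30_000;
while (!Files.exists(signal)) {
  if (System.currentTimeMillis() > deadline) {
    throw new IllegalStateException("Timed out waiting for signal");
  }
  Thread.sleep(200);
}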

Robot Assisted Test Automation - Hans Kuosmanen & Natalia Leinonen (OptoFidelity)

  • When is robot-based testing needed?
    • complex interacting components and apps
    • testing reboot or early boot reliability
    • medical industry, safety
    • Chrome OS uses it

Mobile Game Test Automation Using Real Devices - Jouko Kaasila (Bitbar/Testdroid)

  • use OpenCV for image comparison (a template-matching sketch follows this list)
    • side note: OpenCV is capable of a lot of interesting things (object detection, machine learning, video analysis, GPU-accelerated computer vision), BSD licence
  • parallel server side execution
  • Appium server, Appium client, OpenCV - all on one virtual machine instance
    • screenshots do not travel over the internet
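
With OpenCV's Java bindings (3.x), such image comparison can be plain template matching; a sketch of the idea (mine, the talk shows no code):
import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.imgcodecs.Imgcodecs;
import org.opencv.imgproc.Imgproc;

System.loadLibrary(Core.NATIVE_LIBRARY_NAME); // load the native OpenCV library

// Look for a known button image inside a device screenshot.
Mat screenshot = Imgcodecs.imread("screenshot.png");
Mat button = Imgcodecs.imread("button.png");
Mat result = new Mat();
Imgproc.matchTemplate(screenshot, button, result, Imgproc.TM_CCOEFF_NORMED);
Core.MinMaxLocResult match = Core.minMaxLoc(result);
boolean buttonVisible = match.maxVal > 0.9; // the 0.9 threshold is a judgment call
// match.maxLoc holds the top-left corner of the best match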

Chromecast Test Automation - Brian Gogan (Google)

  • testing WiFi functionality in "test beds" (if I heard the name correctly)
    • a small Faraday cage which can block the signal
    • shield rooms
  • Bad WiFi network - software emulated (netem)
  • (6:35) things which went bad
    • test demand exceeded device supply
    • test results varying across devices (e.g. HDMI) - solutions: support groups in the device manager, add allocation wait time & alerts, an SLA of < 5 wait time for any device in any group, full traceability of device and test run
  • (6:44) things that went really wrong
    • unreliable devices, arbitrarily going offline for many reasons
      • fried hardware, overheating, loss of network connection, kernel bugs, broken recovery mechanism, mutable MAC - solutions: monitoring, logging, redundancy, connectivity - sanity checks at device allocation time, static IP, quarantine broken devices, buy good hardware
  • the first prototype of the testing lab was built on cardboard

Using Robots for Android App Testing - Dr.Shauvik Roy Choudhary (Georgia Tech/Checkdroid)

  • 3 ways to navigate / explore app
    • random (Monkey)
    • model based
    • systematic exploration strategy
  • (12:00) - tools comparison

Your Tests Aren't Flaky - Alister Scott (Automattic)

  • A rerun culture is toxic.
  • There is no such thing as flakiness if you have a testable app.
  • Application test-ability is more than IDs for every element.
  • Application test-ability == Application usability.
  • How to kill flakiness
    • do not rerun tests, use flaky tests as an insight -> build test-ability
  • (16:10) - a very strong statement to fight flaky tests - I would make a big poster of it and make it visible to all testers in the QA department :)
    • "What I do have are a very particular set of skills, skills I have acquired over a very long testing career. Skills that make me a nightmare for flaky tests like you. I will look for you, I will find you, and I will kill you" - Liam Neeson, Test Engineer

Large-Scale Automated Visual Testing - Adam Carmi (Applitools)

  • why not pixel-to-pixel comparison?
    • anti-aliasing - it is different on each machine, as different algorithms are used
    • the same goes for pixel brightness
  • screenshot baseline maintenance should be codeless

Hands Off Regression Testing - Karin Lundberg (Twitter) and Puneet Khanduri (Twitter)

  • in-house project Diffy - diffs the responses from two production servers and a new candidate server
    • clearer from this slide
    • an interesting way to deal with the "noise" (timestamps, random numbers) - see the sketch after this list
  • use production traffic by instrumenting clusters
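
The noise handling can be approximated like this (my simplification of the idea, not Diffy's code): any field that already differs between the two identical production servers is noise and gets excluded from the primary-vs-candidate diff.
import java.util.*;

// Simplified Diffy idea: fields differing between two identical production
// instances (primary vs. secondary) are noise (timestamps, random IDs)
// and are ignored when diffing primary vs. candidate.
static Map<String, String> realDiff(Map<String, String> primary,
                                    Map<String, String> secondary,
                                    Map<String, String> candidate) {
  Map<String, String> diff = new HashMap<>();
  for (String field : primary.keySet()) {
    boolean noise = !Objects.equals(primary.get(field), secondary.get(field));
    if (!noise && !Objects.equals(primary.get(field), candidate.get(field))) {
      diff.put(field, candidate.get(field));
    }
  }
  return diff;
}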

Automated Accessibility Testing for Android Applications - Casey Burkhardt (Google)

  • we all face accessibility problems on a daily basis: driving a car, cooking
  • accessibility is about challenging the developer's assumptions that the user can hear and see the content, interact with the app, and distinguish colors
  • Android services: TalkBack, BrailleBack
  • (8:22) - common mistakes
  • (10:34) - accessibility test framework
    • can interact with Espresso

Statistical Data Sampling - Celal Ziftci (Google) and Ben Greenberg (MIT graduate student)

  • getting testing data from production
    • collecting logs from requests and responses
  • need to take the whole production data into consideration
    • they managed to reduce the sample to a minimum

Nest Automation Infrastructure - Usman Abdullah (Nest), Giulia Guidi (Nest) and Sam Gordon (Nest)

  • (4:15) - Challenges of IoT
    • coordinating sensors
    • battery powered devices
  • (4:55) - Solutions
  • motion detection challenges
    • end to end pipeline
    • reproducibility
    • test duration
  • Motion detection was tested with a camera in front of a TV :)

Enabling Streaming Experiments at Netflix - Minal Mishra (Netflix)

  • Canary deployment for Web Apps
    • Canary release is a technique to reduce the risk of introducing a new software version in production by slowly rolling out the change to a small subset of users before rolling it out to the entire infrastructure and making it available to everybody.
    • I knew the process under different names: Android staged rollout of an app, or phased rollout.
    • Danilo Sato describes this in more detail here.

Mock the Internet - Yabin Kang (LinkedIn)

  • Flashback proxy - their in-house project, which acts as a gateway proxy for a three-tier architecture's communication with the outside world (external partners, Google, Facebook, etc.)
  • it works in record and replay modes
  • it can act as a proxy between the components of a three-tier architecture, or as a proxy for the communication of mobile clients
  • mocks the network layer

Effective Testing of a GPS Monitoring Station Receiver - Andrew Knodt (Lockheed Martin)

  • GPS can be divided into three segments:
    • user segment (mobile clients) which receives the signal
    • space segment - the satellites
    • control segment - tells the satellites what to do; 31 satellites currently operating
  • Monitoring station receiver, used in the control segment - measures the distance to each satellite

Automation on Wearable Devices - Anurag Routroy (Intel)

  • (3:00) - how to set up a real Android wearable device to test on
  • (7:00) - how to start an Appium session for a wearable device

Unified Infra and CI Integration Testing (Docker/Vagrant) - Maxim Guenis (Supersonic)

  • using Docker to create a database with pre-populated data (a MySQL snapshot), so each test session starts with fresh data
  • Vagrant + Docker
    • because they need iOS and Windows
  • not using Docker in production - it is not mature enough, and because of legacy code
  • docker plus Selenium
    • it handles Selenium server
    • good for CI
  • Docker runs inside Jenkins slaves and runs smoothly
  • running 100 browser instances simultaneously requires powerful workstations though
  • One Selenium Grid for each stack
  • static vs. dynamic program analysis
  • great book: xUnit Test Patterns
  • copy-and-paste tests increase "test debt"
  • the verify part of a test often helps to find similarities among tests, which can later be refactored
  • Soot framework - an open-source library for analysis of Java bytecode (also Android), used for finding refactorable test methods

Coverage is Not Strongly Correlated with Test Suite Effectiveness - Laura Inozemtseva (University of Waterloo)

  • how can we estimate the fault-detection ability of a test suite? - mutation testing
    • good mutation candidates: change plus to minus, change constant values (see the example after this list)
  • rather well-known and obvious facts were presented
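
The classic worked example (mine, not from the talk): a mutant that changes plus to minus survives any test whose operands make a + b equal to a - b.
// Original method and a mutant with '+' changed to '-'.
static int add(int a, int b) { return a + b; }
static int addMutant(int a, int b) { return a - b; }

// assertEquals(2, add(1, 1)) kills the mutant (addMutant returns 0), while
// assertEquals(1, add(1, 0)) does not (addMutant also returns 1), so the
// second test contributes nothing to fault detection here.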

Fake Backends with RpcReplay - Matt Garrett (Google)

  • the problem with mocks/stubs: we need to ensure they are working, so we have to test them as well
  • they record requests and responses (at the RPC server) and serve them instead of starting expensive servers (a toy sketch follows this list)
  • a continuous job updates the RPC logs
  • as a bonus, there is no problem with broken dependencies: tests run against the last green microservices, so if one microservice is broken, devs are not blocked
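
The core of record/replay is tiny (a toy sketch, not RpcReplay itself): key every request, store the recorded response, and later serve it from the log instead of calling the real backend.
import java.util.HashMap;
import java.util.Map;

// Toy record/replay store: responses are looked up by a request key.
class RpcLog {
  private final Map<String, byte[]> recorded = new HashMap<>();

  void record(String requestKey, byte[] response) {
    recorded.put(requestKey, response);
  }

  byte[] replay(String requestKey) {
    byte[] response = recorded.get(requestKey);
    if (response == null) {
      throw new IllegalStateException("No recorded response for " + requestKey);
    }
    return response;
  }
}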

Chrome OS Test Automation Lab - Simran Basi (Google) and Chris Sosa (Google)

  • Chrome OS development model
    • stable releases from branches, no development on branches, just cherry-picks from always stable trunk
    • all feature implementation and bug fixes on trunk first
  • using BuildBot - a CI framework
  • they are emulating changes in the distance from the WiFi router to the Chrome OS device
  • the types of testing they are doing
  • they are using AutoTest
  • (19:40) Chrome OS partners' goals for testing: OEMs, SoC and HW component vendors, independent BIOS vendors
  • what kinds of bugs real devices found on top of emulators: WiFi, Bluetooth, kernel, touchpad, anything low-level