Sunday, June 11, 2017

Lesson learned from Google Test Automation Conference 2016

It took me half a year to get to the GTAC 2016 videos, but it was again absolutely worth it. Engineers from top companies present the challenges they have overcome. IMHO most talks at regular IT conferences either serve purely marketing purposes without delivering any value, or are too beginner-oriented. GTAC is different! That is why I thought it might again be worth writing down my notes from this year. Someone similar to me (6 years of experience in test engineering) can skim them in 5 minutes and more easily decide which talk suits them.

  • Evolution of Business and Engineering Productivity
    • (34:49) When is duplication technical debt, and when can it be a benefit?
      • when duplication is done in an organized way, it brings flexibility, competition and collaboration; when it goes unnoticed, it becomes debt
    • (37:22) Metrics and measurements, e.g. how long does it take for my code to get to production? Senior managers have to support this and develop quarterly goals for employees based on these metrics.
    • (40:12) Test and Release strategy 2.0: continuous deployment, canary testing (testing during canary release), production monitoring, …
    • Build, test, release, repeat will not be sufficient in near future
    • Machine learning to find out which tests are completely useless, so they do not need to be run.
    • (52:20) Automated vs manual test ratio within Google Ads: 95%:5%
    • (53:00) Other metrics at Google: How long developers spend with running tests before submitting their change (a.k.a presubmit latency)?
    • 10 days in general for getting my change into production
  • Automating Telepresence Robot Driving
    • Telepresence - refers to a set of technologies which allow a person to feel as if they were present, to give the appearance of being present, or to have an effect, via telerobotics, at a place other than their true location.
    • Beam Telepresence System
    • Beacon
    • (19:12) LiDAR - measure distance with laser
    • (20:50) - Hardware stack for testing of such robot in lab
      • Beam robot with modifications for Lidar scanning, Beam charging Dock, Hokuyo Lidar, NUC small form computer
    • Difficult to find orientation of symmetric object, so make it asymmetric but in a way that it does not affect test (do not change weight that much etc.)
    • (22:08) Lab room considerations: it has to be isolated (also for safety reasons, the robot weighs 100 pounds), lighting, flooring
  • What’s in your Wallet?
    • Galen - automate test of look and feel for responsive websites
    • Hygieia - a single, easy to use dashboard to visualize near real time status of entire software delivery pipeline
  • Using test run automation statistics to predict which tests to run
    • (8:55) Which tests not to run?
      • tests that passed 100% of the time during the last month, have more than 100 runs, and run on all branches
      • Key point: such tests are disabled only on trunk and stay enabled on the branches from which merges go to trunk. When they fail during the merge process, they are enabled again and run for at least another month
      • They were able to save about 50% build time.
  • Selenium-based test automation for Windows and Windows Phone
    • Winium.Mobile - an alternative to Appium that supports Windows on mobile devices
    • Winium.Desktop automation - opensourced, WPF, WinForms, any accessible app
  • The Quirkier Side of Testing
    • Funny one, must see :)
  • ML Algorithm for Setting up Mobile Test Environment
    • (09:50) Machine Learning algorithm to choose devices for test lab
      • Decision tree, random forest classifier
  • “Can you hear me?” - Surviving Audio Quality Testing
    • (06:58) Audio software testing pyramid
    • (16:40) POLQA algorithm for testing audio quality. The inputs for this algorithm are a reference audio file and a recorded one. The result is a Mean Opinion Score (MOS), which is a grade from 0 to 5.
    • (18:08) Frequency analysis - it identifies actors in the audio recording. Each person speaks with different frequency.
    • (18:51) Speech presence - Finds out regions in the recording where speech was given.
    • (19:01) Amplitude analysis - Verify some speakers are not too loud or not too silent
    • (19:40) Live demo of web service which employs those algorithms
  • IATF: A New Automated Cross-platform and Multi-device API Test Framework
    • (21:25) Test steps sequence diagram for testing communication between two clients connected to a server (via WebRTC protocol)
  • Using Formal Concept Analysis in software testing
    • Can be used for finding dependencies among method parameters, in the form of implications
    • (14:27) Can be used for analysis of test report. Nice example
    • Lattice usage analysis is equivalent to finding most common descriptions of failed tests. In big systems lattice is a good representation for finding similar functionality.
    • Possible extension: ML to find out possible reason why some test failed.

  • How Flaky Tests in Continuous Integration: Current Practice at Google and Future Directions
    • SLA for developers: the time from a commit to getting an answer is 3 hours in general
    • Not every change triggers test jobs right away, only ⅓ do
    • With ML they can be 90% sure that a test is flaky, so they do not have to rerun it 10 times as usual
    • (14:10) how to identify that tests are flaky, patterns, features, correlations
  • Developer Experience, FTW!
    • Firebase test lab for Android devices,  Espresso, Robotium, or UI Automator 2.0
    • Espresso test recorder available in Android studio
    • (55:29) In the future, Firebase Test Lab may be able to use real user actions to test the application

  • Docker Based Geo Dispersed Test Farm - Test Infrastructure Practice in Intel Android Program
    • Release and deliver test suites as Docker images

  • OpenHTF - The Open-Source Hardware Testing Framework
    • Test harness OSS Python library with Web GUI
    • Plugins for sensors, platforms, chips… any other hardware stuff. Currently not many plugins available.

  • Directed Test Generation to Detect Loop Inefficiencies
    • Redundant traversal of loops performance issue
    • Toddler: detecting performance problems via similar memory-access patterns
    • Glider - suggested approach to address redundant traversal
    • Implemented in Soot bytecode framework

  • Need for Speed - Accelerate Automation Tests From 3 Hours to 3 Minutes
    • Enablers:
      • dedicated environment, saved 57 minutes
      • empty DBs instead of shared DBs, saved 34 minutes
      • simulate dependencies (stub external dependencies), saved 24 minutes, but made tests more stable
      • Moved to containers - this slowed things down, as the tests did a lot of I/O operations
      • Run databases in memory - 4 minutes saved
      • Do not clean test data: when the data lives in containers, the container disappears once the tests end, so there is no need for cleanup, 15 minutes saved
      • Run tests in parallel: everybody starts with this, but it should be the last step, 41 min saved. One has to find the right number of threads; too many threads can slow things down.
      • Equalize workload: not every thread executed an equal number of test cases
      • By vertical scaling (RAM and CPU) they were able to run in 1:38 min
      • They want to go below one minute by scaling horizontally
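
The "equalize workload" step above can be sketched roughly as follows. This is only a minimal illustration, not the approach from the talk: it assumes you already have measured durations per test, and the class WorkloadBalancer, its Bucket type and the greedy "longest test first" strategy are my own hypothetical choices.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical helper: spreads tests over workers so each worker gets a similar total runtime. */
class WorkloadBalancer {

    /** One worker's share of the suite. */
    static final class Bucket {
        final List<String> tests = new ArrayList<>();
        long totalMillis = 0;
    }

    /**
     * Greedy assignment: take tests from the longest to the shortest and always
     * give the next test to the worker with the smallest total so far.
     */
    static List<Bucket> balance(Map<String, Long> durationMillisByTest, int workers) {
        if (workers < 1) {
            throw new IllegalArgumentException("workers must be >= 1");
        }
        PriorityQueue<Bucket> leastLoadedFirst =
                new PriorityQueue<>(Comparator.comparingLong((Bucket b) -> b.totalMillis));
        List<Bucket> buckets = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            Bucket bucket = new Bucket();
            buckets.add(bucket);
            leastLoadedFirst.add(bucket);
        }
        List<Map.Entry<String, Long>> longestFirst = new ArrayList<>(durationMillisByTest.entrySet());
        longestFirst.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        for (Map.Entry<String, Long> test : longestFirst) {
            Bucket target = leastLoadedFirst.poll(); // currently least-loaded worker
            target.tests.add(test.getKey());
            target.totalMillis += test.getValue();
            leastLoadedFirst.add(target);            // re-insert with the updated total
        }
        return buckets;
    }
}
```

Splitting by measured duration instead of by test count is the point here: two workers with the same number of tests can still finish minutes apart.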

  • ClusterRunner: making fast test-feedback easy through horizontal scaling
  • Integration Testing with Multiple Mobile Devices and Services
    • Most frameworks target a single device, while E2E testing brings additional challenges: synchronizing steps between multiple devices, a large range of equipment - attenuator, call box, power meter, wireless AP
    • Mobly - OSS Python Google library, used to test Android, controls a collection of devices/equipment in a test bed (isolated mobile devices, network switch, IoT, etc)
    • centralized vs decentralized way of executing/dispatching test logic. Mobly is centralized - they found it easier to debug
    • (18:24) Cool demonstration - two phones and a watch: phone A gives a voice command to the watch, the watch initiates a call to phone B, phone B gets a call notification
    • Similar frameworks: openHTF, Firebase
  • Scale vs Value: Test Automation at the BBC
    • They became overwhelmed by manual regression tests > BDD to define with different stakeholders what to automate > separate team to ensure devices for testing are available, inventory, status of devices > test lab (lot of smart TVs, test lab in fire corridor :D)
    • PUMA - the plan is to adopt the framework broadly within the company
      • Prove core functionality, automated checks for the core value of your product or system. Regularly audited to combat bloat
      • Understood by all - everyone cares, anyone can execute, visibility to all
      • Mandatory - part of delivery pipeline, any failed check stops the build
      • Automated
    • Whatever framework you use, you need to step back and see what value it brings to you: only important tests should run on real devices, etc
  • Finding bugs in C++ libraries using LibFuzzer
    • What to fuzz: anything that consumes untrusted or complicated inputs: parsers of any kind, media codecs, network protocols, crypto, compression, compilers and interpreters, regular expression matchers, databases, browsers, text editors/processors, OS kernels, drivers, supervisors, Chrome UI
    • How to fuzz: generation based fuzz or mutation based fuzz or guided mutation-based
    • Mutation: e.g. bit flipping
  • How I learned to crash test a server
    • They built a programmable power outlet which you can ssh into and switch any of its sockets on or off
    • Crash virtual machines vs crashing physical machines, both need to be done
    • Virtual machines: from host, single command both for KVM and VMWare based
    • BIOS has setting to restore on AC power loss
    • On Windows there is a utility, bcdedit, with which you can suppress the prompt (Start Windows Normally, ...) shown after an abrupt Windows restart
    • They did not find a systematic way how to crash Windows by internal command (how ironic? :D)

Wednesday, March 30, 2016

Free VRs to schools for a better future

The older I am, the more I feel how the current education system needs a massive improvement. The weakest point I can see, is its inability to motivate students to learn. A classic would say: "Teacher can only show you the way, you have to take it". This is unfortunately working only for students whose source of motivation is coming from someone else (usually from parents). It would never work for students whose parents are not capable of explaining the importance of a proper education (usually such parents do not have time, or knowledge). Therefore, teachers have to motivate students.

They would never be successful with current means (books from 80-90s). Interactive screens, smart devices for each student, it is just not enough. They need to go beyond! They need means which would enable them to associate boring facts with emotions.

Virtual reality to rescue us all!

In my dreams, my children would take to school instead of pens and notebooks (or their electronic counterparts) only one device: a VR set. During each class, they would connect to a host room created by a teacher, together with all other students in the class.

With the help of gamification, they would be able to master the material far more quickly. Imagine that in a history class they would be able to profoundly feel the atmosphere of a World War 2 battle (Call of Duty like FPS). Or, they would be able to quickly meet people from very different cultures. In science class, they would be able to land on the moon. My favorite: in literature class, they would investigate the murder committed by Raskolnikov in a logic adventure game.

The educational games can draw inspiration from MMORPG games: the best students would be able to lead the raid on the final bosses for each class (e.g. Napoleon). They would gain some kind of gems (William Wallace's sword), which would give them some kind of advantage for the next classes. Gained experience points would also influence their final exam. The VR simulation would be full of logic games and quizzes, which would require home preparation. Why not have a final exam fully scripted as a test in VR?

Step by step, day by day, we can positively influence children from their very early days to their adulthood. I believe that this way there will be no Neo-Nazism, and young people would find faster what they adore and what they are keen on.

State of the art games, and VR headsets show us that it is not a science fiction, and that we are already ready for it. Big game studios and VR producers just need to find it as a business opportunity, and maybe to cooperate with governments to push this forward.

So looking forward to a better future :)

Sunday, January 24, 2016

Appium workarounds #1

Appium is a great tool indeed. For Android, for example, it integrates projects like UI Automator & Chrome driver, provides a server and a client API, and instruments emulators/real devices. As expected, each of these components has its own bugs. Following is a list of my workarounds, which should work reliably until the bugs are fixed upstream (once I know the root causes, I will report them :)).

Components versions used:

Appium java-client 3.2.0
Appium server        1.4.16
Chrome driver         2.20
Emulator                 Android 6.0, API level 23 with Intel x86 image

Clear password field reliably

Sometimes, password fields are not cleared reliably, and you just end up with the password appended to the prefilled one. It is because UIAutomator currently cannot get the password field value, and thus the automatic fallback, which attempts to clear the text field until its value is empty, fails.

What about having something like PasswordField widget, which would have clear method as follows:
public void clear() {
  // passwordValue is assumed to be a field of the widget holding the last typed password
  for (int i = 0; i < passwordValue.length(); i++) {
    driver.pressKeyCode(67); // BACKSPACE
  }
}
It should be enough to click to the middle of password field, when passwords are not too long.

Find an element which has just faded in

Sometimes, Appium was not able to find an element (e.g. android.widget.Button) which had just faded in. It was not a timing issue. It was non-deterministic, made tests flaky and almost drove me nuts.

Calling driver.getPageSource(), before attempting to find such an element solved my problem.

Inspired by an Appium bug report.

Set date picker field more reliably

There are multiple tutorials on how to set a Date Picker on Android. They simply advise calling WebElement#sendKeys on the day, month and year elements. Sometimes this fails to completely clear the previous value and results in a wrong date being set. An easy solution is to set the new value multiple times:
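WebElement picker = driver.findElement(PICKER_LOCATOR);

int counter = 0;
// Re-enter the value until the picker really shows it, or until we run out of tries.
while ((!picker.getText().equalsIgnoreCase(value)) && (counter < MAX_TRIES_TO_SET_VALUE)) {
  picker.sendKeys(value);
  counter++;
}
// Fail only if the value is still wrong after the last try.
if (!picker.getText().equalsIgnoreCase(value)) {
  throw new IllegalStateException("It was not possible to set new value: " + value +
                      " in Android picker after " + MAX_TRIES_TO_SET_VALUE + " tries.");
}
// afterwards, confirm the entered value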
Normally this is not a good practice, and I try to avoid repeating an action until it succeeds, as it introduces false positive results. However, as long as this is not a problem of our component, it is OK.

Sunday, January 10, 2016

Lesson learned from Google Test Automation Conference 2015

On November 10-11, 2015, there was the 9th GTAC. Although I was not there, I enjoyed it very much :) How come?

It is because of brilliant recordings:

  • recordings were available very soon after the event (2 weeks later)
  • great video & audio quality
  • audience questions were repeated by the moderator
  • but the most important was the outstanding content
This way, one can enjoy talks performed by professionals from big enterprises such as: Google YouTube team, Google Chrome OS team, Twitter, Uber, Spotify, Netflix, LinkedIn, Lockheed Martin and more.

I often try to watch conference videos. However, I always give up before finishing them all. It is because they become publicly available at least half a year after the event, and are thus often outdated. These talks were different though!

Following are notes from each talk. Hopefully, someone will find them useful and be encouraged to watch the full versions.

Keynote - Jürgen Allgayer (Google YouTube)

  • Cultural change, which consisted of:
    • take a SNAPSHOT of where we are (how many bugs occur in the staging phase, etc.)
    • make SLA (what is our goal?, need for a tool which will tell: this team is this effective)
    • and agree on it in whole organisation
    • continuously measure according to defined SLA, to see where we are
    • how many manual tests? Where do we find bugs more often?
  • Goal: No manual regressions, but instead manual exploratory.

The Uber Challenge of Cross-Application/Cross-Device Testing - Apple Chow & Bian Jiang

  • biggest challenge: two separated apps (driver and passenger), while same scenario can be completed using both apps.
    • solution: in house framework called Octopus, which is capable of running two emulators, and manage communication between them
    • Octopus uses signaling to make sure tests are executed in the right order -> asynchronous timeouts
    • Octopus focus: iOS, Android, parallel, signaling, extensible (does not matter what UI framework is used)
    • the communication is done through USB as most reliable channel
    • sending files to communicate - most reliable
  • Why is the communication not mocked? Answer: This is part of your happy path, to finally ensure you are good to go. It does not replace your unit tests.
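
To make the signaling idea more concrete, here is a minimal sketch of two flows that have to run their steps in a fixed order. It is not Octopus code (I do not know its API): the class SignalingSketch, the step names and the use of a plain CountDownLatch between two threads are all my assumptions, chosen only to show "wait for a signal, but with a timeout".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: a "driver" flow waits for a signal from a "passenger" flow. */
class SignalingSketch {

    static List<String> runScenario(long timeoutMillis) {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch rideRequested = new CountDownLatch(1); // the signal

        Thread passenger = new Thread(() -> {
            log.add("passenger: request ride");
            rideRequested.countDown(); // signal: the driver flow may continue
        });

        Thread driver = new Thread(() -> {
            try {
                // Wait for the signal, but give up after the timeout instead of hanging forever.
                if (!rideRequested.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                    log.add("driver: timed out waiting for ride request");
                    return;
                }
                log.add("driver: accept ride");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // The start order does not matter; the latch enforces the order of the steps.
        driver.start();
        passenger.start();
        try {
            driver.join();
            passenger.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the flows", e);
        }
        return new ArrayList<>(log);
    }
}
```

In a real two-device setup the signal would travel over some channel between the devices (the talk mentions USB and files); the latch only stands in for that channel.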

Robot Assisted Test Automation - Hans Kuosmanen & Natalia Leinonen (OptoFidelity)

  • When is robot-based testing needed?
    • complex interacting components and apps
    • testing reboot or early boot reliability
    • medical industry, safety
    • Chrome OS uses it

Mobile Game Test Automation Using Real Devices - Jouko Kaasila (Bitbar/Testdroid)

  • use OpenCV for image comparison
    • side note: OpenCV is capable of a lot of interesting things (object detection, machine learning, video analysis, GPU accelerated computer vision), BSD licence
  • parallel server side execution
  • Appium server, Appium client, OpenCV - all on one virtual machine instance
    • screenshots do not go through internet

Chromecast Test Automation - Brian Gogan (Google)

  • testing WIFI functionality in ''Test beds'' (if I heard the name correctly)
    • small faraday cage which can block signal
    • shield rooms
  • Bad WIFI network - software emulated (netem)
  • (6:35) things that went bad
    • test demand exceeded device supply
    • test results varying across devices (e.g. HDMI) - solutions: support groups in device manager, add allocation wait time & alerts, SLA < 5 wait time for any device in any group, full traceability of device and test run
  • (6:44) things that went really wrong
    • unreliable devices, arbitrarily going offline for many reasons
      • fried hardware, overheating, loss of network connection, kernel bugs, broken recovery mechanism, mutable MAC - solutions: monitoring, logging, redundancy, connectivity - sanity checks at device allocation time, static IP, quarantine broken devices, buy good hardware
  • the first prototype of the testing lab was built on cardboard

Using Robots for Android App Testing - Dr.Shauvik Roy Choudhary (Georgia Tech/Checkdroid)

  • 3 ways to navigate / explore app
    • random (Monkey)
    • model based
    • systematic exploration strategy
  • (12:00) - tools comparison

Your Tests Aren't Flaky - Alister Scott (Automattic)

  • A rerun culture is toxic.
  • There is no such thing as flakiness if you have a testable app.
  • Application test-ability is more than IDs for every element.
  • Application test-ability == Application usability.
  • How to kill flakiness
    • do not rerun tests, use flaky tests as an insight -> build test-ability
  • (16:10) - very strong statement to fight flaky tests - I would make a big poster and make it visible for all testers in QA department :)
    • ''What I do have are a very particular set of skills, skills I have acquired over a very long testing career. Skills that make me a nightmare for flaky tests like you. I will look for you, I will find you, and I will kill you'' - Liam Neeson, Test Engineer

Large-Scale Automated Visual Testing - Adam Carmi (Applitools)

  • why not pixel to pixel comparison?
    • anti-aliasing - it differs on each machine, as a different algorithm is used
    • same with pixel brightness
  • maintenance of the screenshot baseline should be codeless

Hands Off Regression Testing - Karin Lundberg (Twitter) and Puneet Khanduri (Twitter)

  • in-house project Diffy - diffs the responses of two production servers and a new candidate server
    • more clear from this slide
    • interesting way how to deal with the "noise" (time-stamps, random numbers)
  • use production traffic by instrumenting clusters
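
One way to read the noise handling, as a rough sketch: a field that already differs between the two production servers cannot be blamed on the candidate, so it is ignored. This is my simplified interpretation, not Diffy's actual implementation; the class ResponseDiff and the flat field-to-value maps are hypothetical.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of diffing responses while filtering out noisy fields. */
class ResponseDiff {

    /** Names of fields whose values differ (a field missing on one side counts as different). */
    static Set<String> differingFields(Map<String, String> left, Map<String, String> right) {
        Set<String> allFields = new TreeSet<>(left.keySet());
        allFields.addAll(right.keySet());
        Set<String> differing = new TreeSet<>();
        for (String field : allFields) {
            if (!Objects.equals(left.get(field), right.get(field))) {
                differing.add(field);
            }
        }
        return differing;
    }

    /**
     * Fields that differ between primary and candidate, minus the fields that
     * already differ between two servers running the same production code (noise).
     */
    static Set<String> regressions(Map<String, String> primary,
                                   Map<String, String> secondary,
                                   Map<String, String> candidate) {
        Set<String> noise = differingFields(primary, secondary);
        Set<String> suspicious = differingFields(primary, candidate);
        suspicious.removeAll(noise);
        return suspicious;
    }
}
```

With a timestamp field that changes on every response and a name field that only the candidate returns differently, only the name field would be reported.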

Automated Accessibility Testing for Android Applications - Casey Burkhardt (Google)

  • we all face accessibility problems on a daily basis: while driving a car, cooking
  • accessibility is about challenging developers' assumptions that the user can hear and see the content, interact with the app, distinguish colors
  • Android services: TalkBack, BrailleBack
  • (8:22) - common mistakes
  • (10:34) - accessibility test framework
    • can interact with Espresso

Statistical Data Sampling - Celal Ziftci (Google) and Ben Greenberg (MIT graduate student)

  • getting testing data from production
    • collecting logs from requests and responses
  • need to take into consideration whole production data
    • they managed to reduce the sample to minimum

Nest Automation Infrastructure - Usman Abdullah (Nest), Giulia Guidi (Nest) and Sam Gordon (Nest)

  • (4:15) - Challenges of IoT
    • coordinating sensors
    • battery powered devices
  • (4:55) - Solutions
  • motion detection challenges
    • end to end pipeline
    • reproducibility
    • test duration
  • Motion detection tested with camera in front of TV :)

Enabling Streaming Experiments at Netflix - Minal Mishra (Netflix)

  • Canary deployment for Web Apps
    • Canary release is a technique to reduce the risk of introducing a new software version in production by slowly rolling out the change to a small subset of users before rolling it out to the entire infrastructure and making it available to everybody.
    • I knew the process under different names: staged rollout of an Android app, or phased rollout.
    • Danilo Sato describes this in more detail.
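
The "small subset of users" part of a canary release can be sketched like this. It is only an illustration under my own assumptions (nothing here comes from the Netflix talk): users are assigned to a stable bucket from 0 to 99 by hashing a hypothetical userId string, and the canary percentage decides which buckets get the new version.

```java
/** Hypothetical sketch: decide whether a user belongs to the canary group. */
class CanaryRollout {

    /**
     * Returns true if the user falls into the canary group for the given percentage (0-100).
     * The same user always lands in the same bucket, so raising the percentage
     * only adds users and never removes anybody from the canary group.
     */
    static boolean inCanary(String userId, int percent) {
        if (percent < 0 || percent > 100) {
            throw new IllegalArgumentException("percent must be between 0 and 100");
        }
        // String.hashCode() is deterministic; floorMod keeps the bucket in 0..99 even for negative hashes.
        int bucket = Math.floorMod(userId.hashCode(), 100);
        return bucket < percent;
    }
}
```

String.hashCode() does not spread users perfectly evenly, so a real rollout would probably use a better hash; the sketch only shows the bucketing idea.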

Mock the Internet - Yabin Kang (LinkedIn)

  • Flashback proxy - their in-house project, which acts as a gateway proxy for three tier architecture communication with the outside world (external partners, Google, Facebook, etc.)
  • it works in record, replay mode
  • it can act as a proxy between components of three tier architecture, or as a proxy for the communication of mobile clients
  • mocks the network layer

Effective Testing of a GPS Monitoring Station Receiver - Andrew Knodt (Lockheed Martin)

  • GPS can be divided into three segments:
    • user segment (mobile clients) - receives the signal
    • space segment - satellites
    • control segment - tells satellites what to do, 31 satellites currently operating
  • Monitoring station receiver, user in control segment - measure distance to each satellite

Automation on Wearable Devices - Anurag Routroy (Intel)

  • 3:00 - how to setup android wearable real device to test on
  • 7:00 - how to start Appium session for wearable device

Unified Infra and CI Integration Testing (Docker/Vagrant) - Maxim Guenis (Supersonic)

  • using Docker to create a database with pre-populated data (MySQL snapshot), so each test session starts with fresh data
  • vagrant + docker
    • because they need iOS, Windows
  • Not using Docker in production, it is not mature enough and because of legacy code
  • docker plus Selenium
    • it handles Selenium server
    • good for CI
  • Docker runs inside Jenkins slaves, runs smoothly
  • Running 100 browser instances simultaneously, requires powerful workstations though
  • One Selenium Grid for each stack
  • static vs. dynamic program analysis
  • great book XUnit Test Patterns
  • copy and paste tests increase “test debt”
  • verify part of test often helps to find similarities among tests, and later refactor them
  • Soot framework, opensource library, do analysis on java bytecode (also Android), used for finding refactorable test methods

Coverage is Not Strongly Correlated with Test Suite Effectiveness - Laura Inozemtseva (University of Waterloo)

  • how can we estimate the fault detection ability of a test suite? - mutation testing
    • good mutation candidates: change plus for minus, change constant values
  • kind of well known and obvious facts presented
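
A tiny sketch of mutation testing with the two mutation kinds mentioned above (plus swapped for minus, changed constant). Everything in it is made up for illustration: the code under test (an order total with a flat fee of 5), the hand-written mutants and the two "test suites", which are modelled as simple predicates.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.IntBinaryOperator;
import java.util.function.Predicate;

/** Hypothetical sketch: how many hand-written mutants does a test suite detect? */
class MutationSketch {

    // Code under test: order total with a flat fee of 5.
    static final IntBinaryOperator ORIGINAL = (price, qty) -> price * qty + 5;

    static final List<IntBinaryOperator> MUTANTS = Arrays.<IntBinaryOperator>asList(
            (price, qty) -> price * qty - 5,  // mutant: plus changed to minus
            (price, qty) -> price * qty + 6); // mutant: constant changed

    // Both suites execute the same code, but only one checks the exact result.
    static final Predicate<IntBinaryOperator> WEAK_SUITE = impl -> impl.applyAsInt(10, 1) > 0;
    static final Predicate<IntBinaryOperator> STRICT_SUITE = impl -> impl.applyAsInt(10, 1) == 15;

    /** Fraction of mutants on which the suite fails ("killed" mutants). */
    static double mutationScore(Predicate<IntBinaryOperator> suite) {
        if (!suite.test(ORIGINAL)) {
            throw new IllegalStateException("the suite must pass on the original code");
        }
        long killed = MUTANTS.stream().filter(mutant -> !suite.test(mutant)).count();
        return (double) killed / MUTANTS.size();
    }
}
```

Here the weak suite lets both mutants survive while the strict one kills both, although they exercise exactly the same line, which fits the title of the talk quite well.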

Fake Backends with RpcReplay - Matt Garrett (Google)

  • problem with mocks/stubs: we need to ensure they are working, so we test them as well
  • they record request and responses (RPC server), and they serve them instead of starting expensive servers
  • a continuous job which updates RPC logs
  • as bonus, no problem with broken dependencies. Tests run against last green microservices, so if one microservice is broken, then devs are not blocked.

Chrome OS Test Automation Lab - Simran Basi (Google) and Chris Sosa (Google)

  • Chrome OS development model
    • stable releases from branches, no development on branches, just cherry-picks from always stable trunk
    • all feature implementation and bug fixes on trunk first
  • using BuildBot - a CI framework
  • they are emulating change in distance from WIFI router to chrome OS
  • type of testing they are doing
  • they are using AutoTest
  • (19:40) Chrome OS partners goals for testing: OEM, SoC, HW component vendors, Independent BIOS vendors
  • What kind of bugs real devices found on top of emulators: wifi, bluetooth, kernel, touchpad, anything low level