flashcards (brainscape)
Which of the following is NOT a likely advantage of a persuasive virtual assistant over human persuaders?
1. Can be simultaneously used by lots of users.
2. Ability to continuously monitor the user 24/7 without getting fatigued.
3. Superior understanding of psychological nuances and social contexts.
4. Ability to process large amounts of health data for personalized analysis.
3
Context data must be coupled with the ability to interpret it. While virtual assistants can handle data processing and analysis of the context data, they still lack human-like common sense to fully comprehend psychological nuances, social norms and contexts that human persuaders would naturally possess. This is exactly the reason why we need “human-in-the-loop”. Therefore, 3 is the correct answer.
Which of the following are given as examples of a context-aware app where the app is designed to automatically do a command when certain contexts are met? (Select all that apply)
1. Active Badge system triggering reminders based on location
2. Geonotes for leaving location-based annotations
3. Siren app generating alerts for firefighters based on contexts
4. Cyberguide mobile tour guide providing contextual information
5. Automatic brightness adjustment based on ambient light levels
1 & 3
Based on the question and the context-aware app design dimensions, such an app is categorized as “Context-triggered actions”, whose design choices are “Automatic” and “Command”. Therefore, we should look for examples that correspond to both of those design choices. Options 1 and 3 are correct: the Active Badge system and the Siren app use rules to trigger actions when contextual conditions are met, so they belong to the “Automatic” and “Command” design choices. The other options are incorrect because they make a different choice in at least one design dimension.
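To make the “Automatic + Command” idea concrete, here is a minimal sketch of a context-triggered action engine, assuming a context snapshot is a plain dictionary. The context keys (`location`, `time`, `temperature_c`), the room name, and the thresholds are hypothetical illustrations, not details of the real Active Badge or Siren systems. The point is that rules fire on their own when conditions match, with no user request.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A context snapshot: attribute name -> value (hypothetical keys).
Context = Dict[str, object]

@dataclass
class Rule:
    """An Automatic + Command rule: when `condition` holds, run `action`."""
    name: str
    condition: Callable[[Context], bool]
    action: Callable[[Context], str]

def evaluate(rules: List[Rule], context: Context) -> List[str]:
    """Fire every rule whose condition matches the current context."""
    return [rule.action(context) for rule in rules if rule.condition(context)]

rules = [
    # Active Badge style: a location-triggered reminder (hypothetical room/time).
    Rule(
        name="meeting-reminder",
        condition=lambda c: c.get("location") == "room_35" and c.get("time") == "14:00",
        action=lambda c: "Remind: project meeting now",
    ),
    # Siren style: an alert for firefighters (hypothetical temperature threshold).
    Rule(
        name="heat-alert",
        condition=lambda c: c.get("temperature_c", 0) > 60,
        action=lambda c: f"ALERT: high temperature at {c.get('location')}",
    ),
]

print(evaluate(rules, {"location": "room_35", "time": "14:00", "temperature_c": 22}))
```

A real system would evaluate the rules whenever a sensor reports a context change; the expressiveness of the rule language (what conditions can be stated) is the main design challenge.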
According to Freeman Dyson, which of the following will contribute most to the new trends in science?
a) The development of new concepts
b) The introduction of new tools
c) Collaboration between different scientific disciplines
d) Government and institutional funding policies
b
According to Freeman Dyson, the aspect that most influences new directions in science is the introduction of new tools. Dyson, a renowned theoretical physicist and mathematician, emphasized the significant role that the development of new experimental and computational tools plays in driving scientific progress. While other factors like the development of new concepts, interdisciplinary collaboration, and funding policies are important, Dyson particularly highlighted how new tools can lead to groundbreaking discoveries and open up entirely new fields of study.
What is the correct functional triad for the ‘Baby Think It Over’ persuasive technology, designed to help teenage girls understand the challenges of caring for babies?
a) Tool: User behavior tracking; Medium: Baby simulator robot; Social Actor: Teenage girls
b) Tool: Educational messages; Medium: Baby simulator robot; Social Actor: Teenage girls
c) Tool: User behavior tracking; Medium: Interactive real-world simulation; Social Actor: Baby simulator robot
d) Tool: User behavior tracking; Medium: Baby simulator robot; Social Actor: School teachers
c
The Tool in this case is user behavior tracking, implying that the technology keeps track of how the user interacts with and cares for the simulator. The Medium being an interactive real-world simulation means that the baby simulator provides a realistic experience of caring for a baby, allowing users to understand and respond to various scenarios in real time. The Social Actor as the robot (or the baby simulator) signifies that it is the entity through which the interaction occurs and upon which the user’s actions are focused.
Which of the following best describes the key difference between participatory and opportunistic mobile sensing paradigms?
a) Participatory sensing requires users to actively collect the sensor data, whereas opportunistic sensing relies on automatic sensor data collection without user intervention.
b) Participatory sensing only utilizes built-in smartphone sensors, while opportunistic sensing can incorporate external sensor devices.
c) Participatory sensing collects data automatically while opportunistic sensing relies on user involvement for high-quality data collection.
d) Participatory sensing focuses on collecting data for individual use, whereas opportunistic sensing is designed for large-scale data collection across multiple users or devices.
a
a) Correct
b) The distinction between participatory and opportunistic sensing is based on user involvement in data collection, not on the use of internal versus external sensors.
c) It is the opposite. Opportunistic sensing collects data automatically, and it is participatory sensing that relies on user involvement for high-quality data collection.
d) Both sensing paradigms can be scaled to individual or community levels. The defining factor of the two paradigms is their method of data collection, not the scale of deployment.
20235599
In the context of context-aware computing, which of the following best illustrates the use of context to enhance the effectiveness of persuasive technology?
a) Persuasive technology primarily uses the user’s current activity and environmental conditions to trigger contextually relevant behavioral suggestions.
b) Persuasive technology primarily relies on static user profiles and predetermined schedules for interventions.
c) Persuasive technology needs manual confirmation of the context from the users to be effective.
d) Persuasive technology applications avoid using sensor data from mobile devices to infer context, relying instead on explicit user settings.
a
a) Correct
b) Persuasive technology uses both static user profiles and dynamic contextual information to customize interventions, not just predetermined schedules.
c) While manual inputs can enhance context-aware computing, context-aware persuasive technology primarily relies on automatic sensing and data analysis to determine context.
d) Persuasive technology often utilizes sensor data from mobile devices to infer context, enhancing interventions beyond what is possible with just explicit user settings.
Which of the following explanations is consistent with the definition of the term it describes?
a) Contexts: Contexts in one scenario can have some overlapped area, like persuasive technology being the overlapped area of computers and persuasion.
b) Proximate selection: This technology is aimed at making the located objects “emphasized” or “being easier to choose.”
c) Context-triggered actions: This technology is faced with one challenge in the accuracy of language for rules.
d) Persuasive technology: Nintendo’s Pocket Pikachu’s medium is raising a virtual pet.
b
Option (a) is incorrect: within a given scenario, different contexts cannot overlap (for instance, for a person indoors, “sleeping” and “not sleeping” are mutually exclusive contexts). Option (b) reflects the slide content verbatim and is accurate. Option (c) is incorrect because the challenge for context-triggered actions concerns the expressiveness of the rule language, not its accuracy. Option (d) is incorrect because “raising a virtual pet” pertains to the social actor, not the medium. So the correct answer is b.
Which of the following is the correct order of the sensor data processing pipeline?
a) Data collection -> Model Building -> Segmentation -> Evaluation -> Feature Extraction
b) Data collection -> Model Building -> Segmentation -> Feature Extraction -> Evaluation
c) Model Building -> Segmentation -> Feature Extraction -> Data Collection -> Evaluation
d) Data collection -> Segmentation -> Feature Extraction -> Model Building -> Evaluation
d
The correct order is (d): data is collected first, then segmented, features are extracted from each segment, a model is built, and finally the model is evaluated. This sequence ensures that raw data is processed and refined before a model is built and its performance is evaluated.
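The five stages can be sketched end to end in a few lines of standard-library Python. This is an illustration under stated assumptions, not a reference implementation: the accelerometer data is synthetic (a sine wave plus noise standing in for “walking” versus “sitting”), the window size and overlap are arbitrary, the features are just mean and standard deviation, and a nearest-centroid classifier stands in for whatever model a real study would use.

```python
import math
import random
import statistics

def collect(label, n, rng):
    """Data collection: simulate n accelerometer-magnitude samples (in g)."""
    # Assumption: "walking" oscillates strongly around 1 g, "sitting" barely moves.
    amplitude = 0.8 if label == "walking" else 0.05
    return [1.0 + amplitude * math.sin(i / 2.0) + rng.gauss(0, 0.02) for i in range(n)]

def segment(signal, size=20, step=10):
    """Segmentation: fixed-size sliding windows with 50% overlap."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]

def extract(window):
    """Feature extraction: (mean, standard deviation) of one window."""
    return (statistics.fmean(window), statistics.pstdev(window))

def build_model(examples):
    """Model building: a nearest-centroid classifier, one centroid per label."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {label: tuple(statistics.fmean(dim) for dim in zip(*feats))
            for label, feats in by_label.items()}

def predict(model, features):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(model, key=lambda label: math.dist(model[label], features))

def evaluate(model, examples):
    """Evaluation: classification accuracy on held-out windows."""
    correct = sum(predict(model, features) == label for features, label in examples)
    return correct / len(examples)

rng = random.Random(0)

def labeled_windows(label):
    """Collection, segmentation and feature extraction for one label."""
    return [(extract(w), label) for w in segment(collect(label, 200, rng))]

train_set = labeled_windows("walking") + labeled_windows("sitting")
test_set = labeled_windows("walking") + labeled_windows("sitting")
model = build_model(train_set)
print(f"accuracy = {evaluate(model, test_set):.2f}")
```

Putting model building before segmentation (options a to c) would not work here: `build_model` needs the per-window feature vectors that only exist after segmentation and feature extraction.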
20246104:
Which of the following is incorrect about the advantages of persuasive technology over human persuaders? Persuasive technology:
a) Is less persistent than human beings
b) Offers greater anonymity
c) Scales more easily
d) Goes where humans cannot go or may not be welcomed
a
The incorrect statement is (a): computer technology is more persistent than human beings, because IT systems can operate 24/7 without any change in their behavior. Human beings can be influenced by other factors, whereas machines are pre-programmed and therefore remain persistent throughout their life cycle unless manually altered.
Which of the following statements is incorrect about cyber-physical systems? (select one)
1. Digital transformation is used for managing interconnected systems between their physical assets and computational capabilities.
2. A rich variety of inputs and outputs, such as gesture input, voice commands, and wearable devices, are utilized.
3. The sensors primarily operate indoors to collect real-time data.
4. Data is gathered from cross-domain sensors and IoT devices, enabling data-driven intelligence.
5. Devices are available in diverse form factors, including smartphones, smart bulbs, and smart switches.
3
Transformative technologies are employed in cyber-physical systems to manage interconnected systems, integrating their physical assets with computational capabilities. These systems boast a diverse array of devices, leading to a rich variety of inputs and outputs. The vast amount of data collected from these inputs and outputs facilitates data-driven intelligence within cyber-physical systems. Notably, sensors in such systems can be installed both indoors and outdoors, depending on application requirements. Hence, option 3 is the correct answer.
Which of the following best describes the functional triads of persuasive technology? (select two)
1. The tool makes the target behavior more difficult to perform.
2. The medium provides users with unrealistic experiences that deter motivation.
3. The social actor performs calculations or measurements that motivate.
4. The medium assists users in exploring cause-and-effect relationships.
5. A social actor rewards users with positive feedback and models target behaviors.
4 & 5
The tool makes the target behavior easier to perform and performs calculations or measurements that motivate. The medium provides users with vicarious experiences that motivate and lets them explore cause-and-effect relationships. A social actor rewards people with positive feedback and models target behaviors or attitudes. Options 1 and 2 invert the roles of the tool and the medium, and option 3 assigns the tool's role to the social actor. Therefore, options 4 and 5 are the correct answers.
20246104
20235599
Which statement is incorrect about the Sensing paradigm? (Choose one)
a. “Participatory” sensing is entirely free from privacy issues.
b. Monitoring urban noise pollution by users measuring and sharing ambient noise (using their phone) is an example of “Participatory” sensing.
c. Sensors installed in public transportation to automatically monitor passenger counts are an example of “Opportunistic” sensing.
d. “Opportunistic” sensing minimizes user data collection efforts.
a
“Participatory” sensing involves individuals actively engaging in data collection. However, even though users collect data voluntarily, privacy issues can still arise regarding the protection and use of the collected data. So statement (a) is incorrect, and (a) is the answer.
Having users measure and share ambient noise with their phones requires their active participation, so (b) is correct. “Opportunistic” sensing means automated sensor data collection, which reduces the burden placed on the user, so (c) and (d) are also correct.
Question:
In the context of context-aware computing, which of the following scenarios follow the definition of sensor fusion?
a) Utilizing a sophisticated algorithm to interpret data from a single high-precision accelerometer to determine the user’s physical activity.
b) Analyzing high-resolution video feeds from a single camera to deduce the user’s specific actions and environment.
c) Integrating input from a user’s keyboard strokes with application usage data to predict the user’s next task.
d) Synthesizing data from an array of sensors, including an accelerometer, GPS, and a light sensor, alongside ambient sound recordings to construct a detailed understanding of the user’s current context and environment.
d
Sensor fusion is defined as combining data from multiple sensors to infer a user’s context. Here’s the breakdown of each option:
Option A: This option involves using data from only one sensor, an accelerometer, to predict the user’s activity. Since sensor fusion requires the integration of multiple sensor outputs, relying solely on an accelerometer does not qualify as sensor fusion.
Option B: Similar to Option A, this choice uses data from just one source, a camera, to interpret the user’s context. The absence of integration with other sensor data means this approach does not embody sensor fusion.
Option C: This option focuses on user input and collected data from the user’s interactions, which does not involve the integration of various sensor types. Sensor fusion aims to combine different sensory inputs to create a comprehensive context picture, which is not achieved by analyzing user input alone.
Option D: This is the correct choice for illustrating sensor fusion. It involves combining data from multiple sensors such as accelerometers, GPS, light sensors and ambient sound to form a detailed and nuanced understanding of the user’s environment and activities. This multi-sensor integration follows the definition of sensor fusion, leveraging the strengths of each sensor type to enhance context awareness.
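A minimal sketch of what option D's multi-sensor integration can look like, assuming simple rule-based (decision-level) fusion. The field names and every threshold below are hypothetical values chosen for illustration. The motion rule shows why fusion helps: in a car the phone may barely shake, so the accelerometer alone could say “still”, while GPS speed resolves the ambiguity.

```python
from dataclasses import dataclass

@dataclass
class Readings:
    accel_std: float   # std-dev of accelerometer magnitude over a window (m/s^2)
    gps_speed: float   # speed reported by GPS (m/s)
    light_lux: float   # ambient light level (lux)
    sound_db: float    # ambient sound level (dB)

def fuse(r: Readings) -> dict:
    """Rule-based fusion of four sensors into one context description."""
    # Hypothetical thresholds, for illustration only.
    if r.gps_speed > 7:            # GPS overrides a calm accelerometer
        motion = "in_vehicle"
    elif r.accel_std > 1.0:
        motion = "walking"
    else:
        motion = "still"
    setting = "outdoors" if r.light_lux > 2000 else "indoors"
    ambience = "noisy" if r.sound_db > 70 else "quiet"
    return {"motion": motion, "setting": setting, "ambience": ambience}

print(fuse(Readings(accel_std=1.8, gps_speed=1.4, light_lux=15000, sound_db=75)))
```

Real systems often replace the hand-written rules with a classifier trained on features from all sensors, but the principle of combining complementary sources is the same.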
Question: Which of the following best describes the advantage of persuasive computing over human persuaders? (select two)
A) Persuasive computing technologies cannot scale easily to reach a large audience.
B) They offer greater anonymity and can manage huge volumes of data.
C) They are less persistent than human beings in achieving behavioral changes.
D) They can use various modalities to influence, such as data, graphics, and simulations.
B, D
Persuasive computing has several advantages over human persuaders, including the ability to be more persistent, offer greater anonymity, manage vast amounts of data, and use multiple modalities to influence behavior. These technologies can scale easily and operate in environments where humans may not be welcome or cannot reach. The correct answers are B and D, highlighting the capabilities of persuasive computing to handle data and utilize various communication modalities to influence user behavior effectively.
Question: Which of the following is a primary type of social cue used by persuasive technology acting as social actors? (select one)
A) Offering discounts and rewards unrelated to user behavior.
B) Providing positive feedback and modeling target behavior or attitude.
C) Relying solely on text-based communication without feedback.
D) Avoiding any interaction that simulates human-like exchanges.
B
Persuasive technology can act as a tool, medium, or social actor. When acting as a social actor, it can be persuasive by rewarding users with positive feedback, modeling a target behavior or attitude, and providing social support. This approach leverages social cues, such as language use, social dynamics, and roles, to influence behavior. Therefore, B is the correct answer as it accurately reflects how persuasive technology uses social interactions to encourage changes in behavior or attitude.
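Which of the following combinations consists only of examples that belong to the same sensing paradigm?
A) Citizens voluntarily taking and uploading photos to assess the amount of trash in city parks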
B) Software that automatically collects data to analyze users’ web browsing patterns
C) A mobile app where consumers scan the barcodes on food packaging to share nutritional information
D) A system that collects data from sensors installed in cars to monitor traffic conditions in a smart city
E) A feature on smartwatches that automatically collects data on an individual’s daily activity and sleep patterns
————————————————————————————–
1) A, B
2) A, C
3) A, B, D
4) C, E
5) C, D, E
2
In this question, A) and C) belong to the participatory sensing paradigm, involving activities where users voluntarily collect and share data. Both cases require active participation from the users.
On the other hand, B), D), and E) are examples of the opportunistic sensing paradigm. These represent methods that automatically collect data through sensors or software, rather than requiring direct user intervention.
Therefore, the correct pairing of options that belong to the same sensing paradigm is 2) A, C.
Question: The following are descriptions of elements from the Functional triads of persuasive technology.
Which is the correct order of elements to fill in the blanks?
Providing social support—1)_____
Helping people rehearse a behavior (simulating environment or objects)—2)_____
Making target behavior easier to do—3)_____
A) 1)Social actor, 2)Medium, 3)Tool
B) 1)Medium, 2)Social actor, 3)Tool
C) 1)Tool, 2)Social actor, 3)Medium
D) 1)Social actor, 2)Tool, 3)Medium
A
The correct answer is A) 1)Social actor, 2)Medium, 3)Tool. In persuasive technology, a Social actor provides social support, offering encouragement or empathy. A Medium lets people practice behaviors in a simulated setting, preparing them for real-life scenarios. A Tool simplifies the desired behavior, making it more accessible and easier to adopt. Each plays a unique role in influencing and guiding user behavior towards a targeted outcome.
Which of the following examples best represents the opportunistic sensing paradigm?
A) Residents using a mobile app to record and report noise levels in their neighborhoods.
B) Citizens collecting water samples and using a testing kit to assess water quality in local water bodies.
C) Automatically collecting GPS location traces from users’ smartphones for traffic analysis.
D) Users taking photos of overflowing garbage cans to actively report and manage waste disposal.
C
20235599
The opportunistic sensing paradigm involves automated sensor data collection without requiring active participation from users. In the given options, automatically collecting GPS location traces from users’ smartphones aligns with this definition. This method uses the smartphones’ built-in GPS to passively gather location data as users move about, without requiring them to actively engage with a specific app or device. The data can then be used for traffic analysis, location-based services, or urban planning, making it a prime example of opportunistic sensing.
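As a sketch of the traffic-analysis use, the passively collected fixes can be turned into speeds with the haversine great-circle distance. The trace format `(timestamp_s, lat, lon)` and the helper names are assumptions for illustration; a real deployment would also map-match fixes to road segments and filter GPS noise.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def segment_speeds(trace):
    """Speeds (m/s) between consecutive fixes of one (timestamp_s, lat, lon) trace."""
    speeds = []
    for (t1, la1, lo1), (t2, la2, lo2) in zip(trace, trace[1:]):
        if t2 > t1:  # skip duplicate or out-of-order timestamps
            speeds.append(haversine_m(la1, lo1, la2, lo2) / (t2 - t1))
    return speeds

def mean_speed_kmh(traces):
    """Average speed in km/h over many users' traces (e.g. on one road)."""
    speeds = [s for trace in traces for s in segment_speeds(trace)]
    return 3.6 * sum(speeds) / len(speeds) if speeds else None
```

For example, two fixes 10 s apart that differ by 0.001 degrees of latitude give about 11.1 m/s (roughly 40 km/h).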
In the activity recognition process, which step involves identifying portions of data that are likely to contain information about activities?
A) Data acquisition and pre-processing
B) Data segmentation
C) Feature extraction
D) Model building and classification
B
Data segmentation is the step in the activity recognition process where portions of data likely to contain information about activities are identified. During this step, techniques such as sliding window and energy-based methods are employed to isolate relevant segments of sensor data. These segments are then used for further analysis in subsequent steps, such as feature extraction, to extract meaningful information for activity recognition. Therefore, data segmentation plays a crucial role in identifying and preparing the data for subsequent processing in the activity recognition pipeline.
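A minimal sketch of the sliding-window technique named above, assuming the samples are already in a list. The window size and overlap fraction are parameters; the 50 Hz / 2 s example values are illustrative, not prescribed by the course material.

```python
def sliding_windows(samples, window_size, overlap):
    """Split `samples` into fixed-size windows overlapping by `overlap` (0 <= overlap < 1)."""
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    step = max(1, int(window_size * (1 - overlap)))  # samples to advance per window
    return [samples[start:start + window_size]
            for start in range(0, len(samples) - window_size + 1, step)]

# e.g. a 50 Hz accelerometer with 2 s windows (100 samples) and 50% overlap
windows = sliding_windows(list(range(500)), window_size=100, overlap=0.5)
print(len(windows))
```

Overlap means an activity that starts mid-window still appears fully inside some later window; trailing samples that do not fill a whole window are dropped in this sketch.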
In the context of acquiring context information for context-aware computing, which of the following is NOT listed as a method or tool for acquiring context?
A) Smart environment infrastructure, such as active badge systems for location information.
B) Mobile sensors embedded in devices for sensing motion, light, and other environmental factors.
C) Sensor fusion, combining data from multiple sensors to infer a user’s context.
D) Utilizing social media activity to directly infer a user’s current physical environment.
D
Social media activity is not one of the listed methods for acquiring context. While social media posts can reveal information about a user’s location, this information is unreliable: the data is irregular (if it exists at all), its accuracy depends heavily on what the user posts and on their social media habits in general, and how up to date it is can vary widely.
Which of the following best describes the sensor data processing pipeline in IoT data science processes?
A) Collect -> Analyze -> Implement -> Monitor
B) Collect -> Segment -> Extract -> Classify
C) Identify -> Process -> Store -> Analyze
D) Sense -> Process -> Actuate -> Feedback
B
20246104
First the data needs to be collected. Then the data is segmented into windows. Features are extracted from each window. Finally, a classification algorithm is used to determine the activity.
Which of the following statements is NOT true about Mobile Sensing Architecture? (Select one)
- Mobile Sensing Architecture involves inform, share, and persuasion stages.
- The most labor-intensive work in sensor data science is the integration of sensor data.
- Data visualization is one of the representative methods for the Share stage.
- Supervised learning in mobile sensing requires the data to be hand-labeled.
- Persuasive technology systems aim to change user behavior by providing tailored feedback.
2
Mobile Sensing Architecture involves the sense, learn, inform, share, and persuasion stages. The most labor-intensive work in sensor data science is collecting sensor data and labels in the sense stage, not integrating the data. The representative methods for the Share stage include data visualization, community awareness, and social network use. Supervised learning in mobile sensing requires the data to be hand-labeled, whereas unsupervised learning does not. Persuasive technology’s goal is to change users’ attitudes and behavior with tailored feedback. Therefore, the statement that is NOT true is number 2.
Which of the following pairs correctly match a context-aware application category with its characteristics? (Select two)
- Proximate selection: Automatic, Information.
- Contextual information: Automatic, Information.
- Automatic contextual reconfiguration: Automatic, Information.
- Context-triggered actions: Manual, Command.
- Contextual commands: Manual, Command.
3 & 5
Proximate selection is a user interface technique in which nearby located objects are emphasized or otherwise made easier to choose; it involves entering a “locus” and a “selection”, and it is Manual + Information. Contextual information displays information relevant to the context, and contextual commands execute commands based on the context; both are invoked manually by the user, so they are Manual + Information and Manual + Command respectively. Automatic contextual reconfiguration detects the user’s context automatically and adjusts information accordingly (Automatic + Information). Context-triggered actions automatically execute a command when certain context conditions are met (Automatic + Command). Therefore, the correct answers are numbers 3 and 5.
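The two design dimensions form a 2×2 table (the classification is commonly attributed to Schilit, Adams, and Want). A small lookup makes it easy to check each answer option mechanically; the dictionary and function names are illustrative only.

```python
# Design dimensions: who initiates (manual / automatic) x what is produced (information / command).
TAXONOMY = {
    "proximate selection": ("manual", "information"),
    "contextual information": ("manual", "information"),
    "contextual commands": ("manual", "command"),
    "automatic contextual reconfiguration": ("automatic", "information"),
    "context-triggered actions": ("automatic", "command"),
}

def is_correct_pair(category, initiation, output):
    """True if the (initiation, output) pair matches the category's cell in the table."""
    return TAXONOMY[category.lower()] == (initiation.lower(), output.lower())

options = [
    ("Proximate selection", "Automatic", "Information"),
    ("Contextual information", "Automatic", "Information"),
    ("Automatic contextual reconfiguration", "Automatic", "Information"),
    ("Context-triggered actions", "Manual", "Command"),
    ("Contextual commands", "Manual", "Command"),
]
correct = [i for i, option in enumerate(options, start=1) if is_correct_pair(*option)]
print(correct)
```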
20246104
Which of the following steps is NOT typically part of the sensor data processing pipeline in mobile sensing with smartphones?
A) Data Collection
B) Segmentation
C) Model Deployment
D) Feature Extraction
20235599
Which of the following steps is NOT part of the sensor data processing pipeline in mobile sensing with smartphones?
A) Data Collection
B) Segmentation
C) Receiving User Feedback
D) Feature Extraction
C
The sensor data processing pipeline in mobile sensing with smartphones consists of the following steps:
Data Collection: Gathering sensor data from various sources such as built-in sensors (e.g., accelerometer, GPS) on smartphones.
Segmentation: Organizing the collected data into meaningful segments or chunks for further analysis.
Feature Extraction: Extracting relevant features or characteristics from the segmented data to represent the underlying patterns or trends.
Model Building: Developing machine learning or statistical models using the extracted features to learn from the data.
Evaluation: Assessing the performance and effectiveness of the built models using validation techniques.
However, neither “Model Deployment” (option C in the first version of this card) nor “Receiving User Feedback” (option C in the second version) is typically considered part of the sensor data processing pipeline. Model deployment involves putting the trained model into a production environment, where it makes predictions or decisions on new data; receiving user feedback happens in the system’s interaction with the user rather than in the processing of sensor data. Both matter in the broader process of building a system for real-world use, but neither is directly involved in processing the sensor data itself.
What is the main trade-off when designing energy-efficient algorithms for continuous smartphone sensing?
a) Accuracy vs. speed
b) Accuracy vs. memory usage
c) Accuracy vs. energy consumption
d) Accuracy vs. internet connectivity
c
Continuous smartphone sensing involves constantly collecting data from various sensors like accelerometers, gyroscopes, and microphones. This continuous operation drains the phone’s battery significantly.
Energy-efficient algorithms are designed to minimize the energy used by these sensors while still collecting usable data.
Accuracy refers to how well the sensor data reflects the real world.
The key trade-off lies in balancing these two aspects. Here’s how:
More frequent data collection (higher sampling rate) increases accuracy but consumes more energy.
Less frequent data collection (lower sampling rate) conserves energy but might miss important details, reducing accuracy.
Therefore, the goal is to design algorithms that can achieve an acceptable level of accuracy while minimizing the energy consumption of sensors during continuous data collection.
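One common way to trade accuracy for energy is adaptive sampling: sample slowly while the user seems still and switch to a high rate once motion is detected. The sketch below assumes energy grows linearly with the number of samples taken and uses made-up rates (1 Hz and 20 Hz) and a made-up activity threshold. It makes the trade-off visible: far fewer samples than always sampling at 20 Hz, at the cost of sampling the first second of a burst only at the low rate.

```python
def adaptive_rates(activity_levels, low_hz=1, high_hz=20, threshold=0.5):
    """Choose a sampling rate for each 1 s interval from the previous interval's activity."""
    rates = []
    moving = False  # start in the cheap, low-rate mode
    for level in activity_levels:
        rates.append(high_hz if moving else low_hz)
        moving = level > threshold  # decision applies to the next interval
    return rates

def energy_cost(rates, cost_per_sample=1.0):
    """Assumption: energy is proportional to the number of samples taken."""
    return sum(rates) * cost_per_sample

def missed_bursts(activity_levels, rates, low_hz=1, threshold=0.5):
    """Active intervals sampled only at the low rate (a proxy for lost accuracy)."""
    return sum(1 for level, rate in zip(activity_levels, rates)
               if level > threshold and rate == low_hz)

levels = [0.1, 0.1, 0.9, 0.8, 0.1, 0.1]
rates = adaptive_rates(levels)
print(rates, energy_cost(rates), energy_cost([20] * len(levels)), missed_bursts(levels, rates))
```

Lowering the threshold or the reaction delay recovers accuracy but spends more energy, which is exactly the balance the answer describes.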
The “second-hand smoke problem” in mobile sensing refers to:
a) Sensor data corruption due to physical damage
b) Privacy concerns of users exposed to other people’s sensors
c) Limited battery life of smartphones
d) Incompatibility between different sensor models
b
The answer is b) Privacy concerns of users exposed to other people’s sensors.
Here’s why:
The term “second-hand smoke problem” draws an analogy to involuntary exposure. Just like inhaling smoke from someone else’s cigarette, a mobile sensing system might collect data about people nearby without their consent.
This scenario raises privacy concerns because sensor data can potentially reveal personal information about these bystanders.
What is the best description for purpose of data segmentation?
a) It is for exploratory data analysis, to better understand the data
b) Data segmentation is preprocessing data
c) It is to identify the data segments that are likely to contain information about activities, for feature extraction
d) Data segmentation is labelling data for classification
c
Data segmentation identifies and prepares the data for feature extraction in the activity recognition pipeline. Common techniques include the sliding window and the energy-based approach. The sliding window approach takes a window (frame) of samples and slides it along the signal with a fixed overlap. The energy-based approach relies on the fact that different activities have different intensities (energy), so segments are delimited by changes in signal energy.
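A minimal sketch of energy-based segmentation, under these assumptions: the signal is cut into non-overlapping frames, a frame's energy is the variance of its samples (subtracting the mean removes a constant offset such as gravity), and the threshold is a hypothetical value that would need tuning on real data. Consecutive high-energy frames are merged into one segment.

```python
def frame_energy(frame):
    """Mean squared deviation from the frame mean (i.e. the frame's variance)."""
    mean = sum(frame) / len(frame)
    return sum((x - mean) ** 2 for x in frame) / len(frame)

def energy_segments(signal, frame_size, threshold):
    """Return (start, end) sample indices of runs of frames whose energy exceeds threshold."""
    segments = []
    start = None
    n_frames = len(signal) // frame_size  # trailing partial frame is ignored
    for f in range(n_frames):
        frame = signal[f * frame_size:(f + 1) * frame_size]
        active = frame_energy(frame) > threshold
        if active and start is None:
            start = f * frame_size                    # a high-energy run begins
        elif not active and start is not None:
            segments.append((start, f * frame_size))  # the run ends
            start = None
    if start is not None:
        segments.append((start, n_frames * frame_size))
    return segments

# Quiet, then a burst of movement, then quiet again.
signal = [0.0] * 10 + [1.0, -1.0] * 5 + [0.0] * 10
print(energy_segments(signal, frame_size=5, threshold=0.1))
```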
Context is a vaguely defined term in most cases. However, in the model of Schmidt, Beigl, and Gellersen (1999), context is defined explicitly by four statements. Which of the following is NOT part of their definition of context?
a) A context describes a situation and the environment a device/user is in
b) A context is defined by a unique name
c) For each context a set of features is relevant
d) Context is entirely determined by the user’s preferences and has no relation to the device or environment they are in.
e) For each relevant feature a range of values is determined (implicitly or explicitly) by the context
d
Schmidt, Beigl, and Gellersen define context as follows:
- A context describes a situation and the environment a device/user is in
- A context is defined by a unique name
- For each context a set of features is relevant
- For each relevant feature a range of values is determined (implicitly or explicitly) by the context
Option (d) is not part of this definition and contradicts its first statement: a context describes the situation and environment of the device or user, so it cannot be determined by the user’s preferences alone.
Which of the following is NOT a challenge associated with the continuous sensing capability of smartphones for mobile sensing applications?
A) High computation demand
B) High battery consumption
C) Limited sensor programmability due to operating system and sensor variations
D) Effective data anonymization
D
Continuous smartphone sensing, especially in the context of mobile applications, faces several technical challenges. While high computation demand and battery consumption are direct consequences of such sensing, and sensor programmability issues arise due to hardware and software diversity, user privacy through data anonymization represents a broader, systemic challenge across mobile sensing domains. It’s not inherent to the continuous sensing feature but is crucial for ethical design and deployment. The focus here is on understanding the specific operational hurdles of continuous data collection and processing on smartphones, distinguishing them from overarching privacy considerations which, while vital, are managed through different mechanisms in the context of IoT data science.
Which of the following best exemplifies a context-aware application that utilizes persuasive technology to change a user’s behavior?
A) A GPS application that simply navigates the user from point A to point B
B) A digital calendar that shows the event list of each day
C) A fitness app that tracks the user’s physical activity and encourages more movement based on the user’s location and past behavior
D) A weather app that provides the current weather conditions
C
Persuasive technology aims to change a person’s attitudes or behaviors through the use of interactive technology, while context-aware computing tailors software behavior based on the user’s current context, such as location or activity. A fitness app that tracks physical activity and encourages movement integrates both concepts by using the user’s location and past behavior (context) to motivate increased physical activity (persuasion). Unlike the other options, which may use context-awareness (A, B, D), option C specifically leverages context-aware computing to persuasively encourage a change in user behavior, aligning with the objectives of persuasive technology.
Which aspect of mobile sensing architecture deals with the challenge of achieving fine-grained control over sensors while ensuring compatibility across different operating systems and sensor models?
A) Data integration
B) User interface design
C) Programmability
D) Energy management
C
Programmability focuses on managing smartphone sensors via system APIs, where controlling sensors precisely and ensuring portability across diverse operating systems and sensor models are major challenges.
Consider a bike navigator app that uses sensors to monitor the rider’s speed and adherence to regular roads. It ranks riders based on their safety scores on the same routes and rewards points accordingly. Analyze this bike navigator app from the perspective of the functional triads of persuasive technology. Which roles does it fulfill in encouraging safer riding practices? (Select all that apply)
A) A tool by making target behaviors easier to do through navigation aids and safety monitoring
B) A social actor by rewarding riders with positive feedback and creating a competitive ranking system based on safety scores
C) A medium by providing riders with real riding experiences
D) A database by storing records of all types of bikes
A, B
A) The app acts as a tool by providing navigation aids and monitoring safety-related behaviors (e.g., speed and route adherence), making it easier for riders to engage in safer riding practices.
B) By creating a ranking system based on safety scores and rewarding points for safe riding, the app serves as a social actor, encouraging positive behavior through competition and social reinforcement.
C) While the app assists in route planning and promotes safety, it does not directly provide simulated experiences or scenarios; its primary function is real-time navigation and safety feedback, not simulating different riding experiences.
D) Even though the app might store data on routes and statistics, its persuasive role is not as a database but rather in its interactive features that encourage safer riding practices.
What is a characteristic of the Participatory sensing paradigm?
A) Automated sensor data collection
B) Passive involvement of users
C) Active sensor data collection by users
D) Low burden placed on the user
C
Participatory sensing involves active involvement of users in collecting sensor data, as seen in the example of managing garbage cans by taking photos. Users actively participate in data collection, contributing to the complexity of operations but also influencing the quality of data.
Which of the following is NOT a current solution for addressing privacy issues in mobile sensing systems?
A) Cryptography
B) Privacy-preserving data mining
C) Publicly sharing collected data
D) Processing data locally versus cloud services
C
Current solutions for addressing privacy issues in mobile sensing systems include cryptography, privacy-preserving data mining, and processing data locally versus using cloud services. These solutions aim to protect user privacy by encrypting sensitive information, anonymizing data for analysis, and minimizing the transmission of personal data over the network. However, publicly sharing collected data would contradict the fundamental responsibility of respecting user privacy, as it could lead to unauthorized access or misuse of sensitive information by third parties.
Suppose a mobile sensing application utilizes accelerometer data to detect whether a user is walking or running. The application works well in most cases but struggles to differentiate between walking and running when the user is doing a fast-paced walk or a slow jog. What can we improve during the data gathering and pre-training phase that could enhance the application’s performance in these edge cases? (Select two that apply)
A) Increase the sampling rate of the accelerometer
B) Add a feature that allows users to manually input their activity
C) Increase processing power
D) Use data from the gyroscope in addition to the accelerometer
A, D
The correct options are:
A) Increase the sampling rate of the accelerometer. A higher sampling rate captures more detailed data about the user’s movements, which can make the activity inference more accurate.
D) Use data from the gyroscope in addition to the accelerometer. Gyroscope data can help the inference by providing additional context, such as the rotation and orientation of the device, which reflects the user’s body movement and posture.
Additional notes:
Option B could also be considered, but it is a less preferable option compared to options A and D. Relying on the users to manually label their type of activity every time they go for a walk or run is not practical and there is a good chance they will forget to label it.
Option C mainly reduces training time rather than directly improving accuracy in inferring the user’s type of activity.
Suppose your smartphone is part of a city-wide project to measure noise pollution. Which actions align most closely with the opportunistic sensing approach? (Select two that apply)
A) You decide when to start the noise measurement app.
B) The app automatically starts measuring noise levels when you enter a park.
C) The app periodically prompts you to input the noise levels at various intervals.
D) The app also uses the GPS sensor in your phone while measuring the noise levels as you move around the city.
B, D
The correct options are B and D, as both show automated data collection, which aligns with the opportunistic sensing paradigm.
Meanwhile, options A and C rely on manual action or input from the user, which aligns more closely with the participatory sensing paradigm.
Which of the following terms best explains ‘this’? ‘This’ is used for combining information to gain a more comprehensive and accurate understanding of the environment or the user’s situation.
A) Proximate Selection
B) Intervention
C) Sensor Fusion
D) Human-in-the-loop
C
The correct answer is C (Sensor Fusion).
A. Proximate Selection is a user interface technique that emphasizes nearby objects or makes them easier to choose.
B. Intervention refers to the act of becoming involved in a situation to alter, change, or influence its course or outcome.
C. Sensor fusion means fusing multiple sensors to infer the user’s context.
D. Human-in-the-loop refers to a mode of operation in systems or processes where human involvement is integrated into the workflow.
Therefore, ‘C. Sensor fusion’ best matches the description of ‘this’.
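A minimal sketch of feature-level sensor fusion, assuming two already-segmented windows of 3-axis readings: it summarizes each sensor by the mean and standard deviation of its signal magnitude and concatenates the results, e.g. as input to a walking-vs-running classifier. The function names and the choice of features are illustrative assumptions, not a prescribed method.

```python
import math
from statistics import fmean, pstdev
from typing import List, Sequence, Tuple

Sample = Tuple[float, float, float]  # (x, y, z) reading from a 3-axis sensor


def magnitudes(samples: Sequence[Sample]) -> List[float]:
    # Orientation-independent magnitude of each 3-axis reading
    return [math.hypot(x, y, z) for x, y, z in samples]


def fuse_features(accel: Sequence[Sample], gyro: Sequence[Sample]) -> List[float]:
    """Hypothetical feature-level fusion: per-sensor summary statistics,
    concatenated into one feature vector for a single time window."""
    features: List[float] = []
    for samples in (accel, gyro):
        mags = magnitudes(samples)
        features.append(fmean(mags))   # average intensity of the signal
        features.append(pstdev(mags))  # variability within the window
    return features


if __name__ == "__main__":
    accel_window = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]
    gyro_window = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
    print(fuse_features(accel_window, gyro_window))  # [1.0, 0.0, 2.5, 2.5]
```

Fusing at the feature level keeps each sensor’s preprocessing separate; other designs fuse raw signals or combine the outputs of per-sensor classifiers.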
What is an example of a context-triggered action?
a. Light triggered display
b. Orientation sensitive display
c. Active badge
d. Geonotes
C
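Here C, Active Badge, is a context-triggered action: the system automatically executes a command when its contextual condition is met.
Options a and b adapt the display automatically rather than executing a command, and d is not a context-triggered action.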
Which of the following is a part of “Sense” in Mobile Sensing Architecture?
a. Phone Context
b. Semi supervised learning
c. Profile user preferences
d. Statistical analysis
A
The correct answer is a. Options b and d are part of “Learn,” and c is part of “Inform, Share, Participation.”
The following statements explain sensing paradigms. Which of them is incorrect? (select one)
A) Taking photographs of locations or discussing events is an example of participatory sensing.
B) In opportunistic sensing, users may feel less burdened.
C) Opportunistic sensing allows for automatic data collection from the surrounding environment.
D) In participatory sensing, the quality of data is unrelated to the participants.
D
The correct answer is D.
To solve this problem, knowledge of participatory sensing and opportunistic sensing is required.
With participatory sensing, users consciously opt to meet an application request out of their own will. Therefore, sensor data is actively collected by the user, and the quality of the data is dependent on the participants.
On the other hand, with opportunistic sensing, sensor data is collected automatically, for example through interconnection between devices, which places a lower burden on the user.
Therefore, D is incorrect in claiming that the quality of data in participatory sensing is unrelated to the participants.
In the field of computer science, which of the following definitions inaccurately describes the term ‘context’?
- A context describes both the situation and the environment in which a device/user is situated.
- A context does not possess a unique name.
- Each context has a set of relevant features.
- The context implicitly or explicitly determines a range of values for each relevant feature
2
The correct answer is option 2. In the field of computer science, unique identifiers or names are frequently assigned to distinguish between different contexts within various applications. Therefore, option 2 inaccurately defines the term ‘context’ by suggesting it does not possess a unique name.
Which type of user interface technique is associated with ‘proximate selection,’ making located objects ‘emphasized’ or ‘easier to choose’?
- Siren
- Light sensitive display
- Orientation-sensitive UI
- Nearby printer selection
4
The correct answer to the question is option 4. ‘Proximate selection’ is a user interface technique that helps users choose objects or options physically close to their current location or context. In this specific scenario, selecting a nearby printer for printing tasks exemplifies ‘proximate selection’.
Which of the following strategy games can be classified as persuasive technology? Select all that apply.
a. A strategy game in which players play against NPCs that get progressively smarter.
b. A strategy game in which players play against random players.
c. A strategy game in which players play against players who are of similar rank.
d. A strategy game in which two players are randomly matched as teammates against NPCs that get progressively smarter.
a & c
Persuasive technology is designed to purposefully change a user’s behaviour or attitude. Strategy games whose enemies become progressively harder to beat push the player to think more strategically, so option a can be considered persuasive technology. By the same reasoning, option c (opponents of similar rank) is persuasive technology, while option b (random opponents) is not. In option d, the randomly matched teammate may be highly skilled and carry the game, so the player is not consistently pushed to think more strategically; hence it is not persuasive technology.
Which of the following data collection examples subscribe to the participatory sensing paradigm? Select all that apply.
a. Whenever a user performs some physical activity, they have to log it in an app.
b. Whenever a user performs some physical activity, their phone senses it and logs it into an app.
c. Whenever a user logs a physical activity they performed, the logging app will automatically record the time they logged it at as well as the current temperature.
d. Whenever a user throws their phone in the air, the phone records the temperature, humidity, and atmospheric pressure.
a & d
(c can be included)
Participatory sensing requires users to manually collect or enter data. Using this definition, option a can be trivially recognized as a participatory collection scheme, and option b as an opportunistic collection scheme. In option c, even though the user manually logs something, the collected data is the time at which they logged it and the temperature at that time, which is collected automatically and hence opportunistic. In option d, the user must throw their phone to initiate the data collection every single time, so this is a participatory scheme.
In which scenario is the sliding window not necessary?
A) Real-time traffic monitoring
B) Stock market data analysis
C) Facial Recognition
D) IoT sensor data analysis
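C
Facial recognition on a static image does not require dividing the data into smaller windows for processing. Traffic monitoring, stock market data, and IoT sensor streams are continuous time series, where sliding windows are commonly used.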
A machine’s bearings are starting to wear out. Which of the following monitoring techniques would be most likely to detect this issue early on?
(A) Oil analysis for viscosity changes
(B) Thermal camera inspection for abnormal heat sources
(C) Vibration analysis for changes in amplitude or frequency patterns
(D) Ultrasonic detection for corrosion
C
Bearing wear often leads to increased vibration. Vibration analysis can detect these changes early on, allowing for preventive maintenance before a major breakdown occurs.
According to Klaus Schwab, the Fourth Industrial Revolution is characterized by which of the following?
a. Mechanical production systems
b. Electrical mass production systems
c. Cyber-physical systems
d. Electronics, IT, automated production
C
Cyber-physical systems - This is the defining characteristic of the Fourth Industrial Revolution, which is the focus of the question. Cyber-physical systems integrate computing, networking, and physical processes. With the advent of the internet of things (IoT), artificial intelligence (AI), and machine learning, these systems enable new ways of creating value and are a step beyond the previous revolution.
Question: Which of the following statements about embedded systems and machine learning are false? (Multiple Answers)
a. Embedded systems like Arduino Sense have high resources compared to modern computers.
b. Writing machine learning architecture code is a fraction of the process, with data collection, preprocessing, and feature engineering taking more time.
c. The development of machine learning systems is a non-linear process involving multiple iterations from data collection to deployment.
d. Android’s sensing rate configuration ensures sampling rates by reducing resources allocated to computationally intensive tasks.
A & D
a. This statement is false: embedded systems such as the Arduino Sense have far fewer resources than modern computers. With only 512KB of RAM, the Arduino Sense cannot run even MobileNetv1, which requires 16.9MB, so models considered lightweight on phones may still not fit on such devices.
d. This statement is false as well. According to the lecture notes, Android’s sensing rate configuration does not guarantee the specified rates; instead, the actual sensing rate is device-dependent and varies based on operating conditions.
What role do opportunistic sensing paradigms play in data collection for smart city initiatives?
a) They enable automatic collection of sensor data without user intervention.
b) They rely on active participation of citizens to report environmental data.
c) They primarily utilize external sensor devices for data collection.
d) They focus on individual data collection rather than large-scale analytics.
A
b) This statement describes participatory sensing paradigms, where citizens actively contribute data.
c) This statement describes a specific method of data collection that focuses on external sensors rather than automatic collection from various sources.
d) This statement misrepresents the purpose of opportunistic sensing, which aims to gather data from multiple sources for comprehensive analytics, rather than individualized data collection.
Which of the following statements accurately describes the role of sensors in mobile phones?
A) Sensors in mobile phones primarily focus on enhancing the processing power of the device.
B) The accelerometer in mobile phones is used solely for capturing photos in the correct orientation.
C) Proximity sensors in mobile phones can be used to turn off the screen during phone calls.
D) GPS sensors in mobile phones are mainly utilized for adjusting the brightness of the screen.
E) The gyroscope in mobile phones is used to detect when the user holds the phone to their face during calls.
C
A) False: Processing power is not the main function.
B) False: Accelerometers in mobile phones have multiple uses, not limited to orienting photos.
D) False: GPS sensors in mobile phones are primarily utilized for location-based services, not for adjusting screen brightness.
E) False: Gyroscopes in mobile phones primarily detect device orientation for tasks like gaming and augmented reality, rather than detecting when the phone is held to the user’s face during calls.
What is the preferred method for imputing missing values in a time-series dataset where the order of data points is significant, and why?
A) Mean imputation, because it is the simplest method.
B) Mode imputation, because it uses the most frequent value.
C) Winsorizing, because it limits extreme values.
D) Interpolation, because it provides more natural values by considering the temporal order of the data.
D
In time-series datasets, where the temporal order and continuity of the data points are important, interpolation is a preferred method for imputing missing values. Unlike mean or mode imputation, which might not account for the time-dependent nature of the data, interpolation uses values from neighboring data points to estimate the missing values. This method ensures that the imputed values follow the dataset’s natural flow and variability over time, leading to more accurate and realistic data restoration. Winsorizing is more about limiting extreme values rather than imputing missing ones and might not be suitable for filling gaps in time-series data.
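Linear interpolation for a time series can be sketched as follows, assuming missing readings are stored as `None` and samples are equally spaced in time. The helper `interpolate_missing` is hypothetical; it fills each interior gap on the straight line between the nearest known neighbours and leaves gaps at the start or end untouched.

```python
from typing import List, Optional, Sequence


def interpolate_missing(values: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Fill interior gaps (None) by linear interpolation between the nearest
    known neighbours. Leading/trailing gaps stay None because there is no
    neighbour on one side to interpolate from."""
    filled: List[Optional[float]] = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        gap = right - left
        if gap < 2:
            continue  # adjacent known points: nothing to fill
        lo, hi = values[left], values[right]
        for i in range(left + 1, right):
            frac = (i - left) / gap       # relative position inside the gap
            filled[i] = lo + frac * (hi - lo)
    return filled


if __name__ == "__main__":
    print(interpolate_missing([1.0, None, 3.0, None, None, 6.0]))
```

For example, `[1.0, None, 3.0, None, None, 6.0]` becomes `[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]`, following the trend of the neighbouring readings, whereas mean imputation would insert the same value into every gap.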
In the context of time series analysis, what is the primary purpose of using a sliding window technique for data segmentation?
A) To increase the computational complexity of the model for better accuracy.
B) To transform qualitative data into quantitative data.
C) To apply a fixed-size window that moves over the data points for feature extraction or pattern recognition.
D) To permanently alter the original time series data for storage efficiency.
C
Answer is (C).
A) Increasing computational complexity is not a primary purpose of this technique. The sliding window method is actually a way to manage complexity by analyzing data in manageable, sequential segments.
B) The technique does not inherently transform qualitative data into quantitative data, although it might be used as part of a preprocessing step that includes such transformations.
C) This is correct because the sliding window technique is primarily used to analyze sequential data segments for pattern recognition, feature extraction, or smoothing purposes.
D) The technique does not alter the original time series data; it’s a method for analyzing the data. The original data remains intact.
A researcher is analyzing temperature data from a sensor to identify patterns of temperature fluctuation over time. The sensor records temperature every minute. The researcher decides to use a sliding window technique with an overlap to segment the data before analysis.
The window size is set to 10 minutes, and the overlap between consecutive windows is specified to be 5 minutes. Given this setup:
How many unique readings will be included in two consecutive windows?
A) 10 readings
B) 15 readings
C) 20 readings
D) 5 readings
B
To solve this, you would calculate the number of readings in one window (10 readings, since the window is 10 minutes and the sampling rate is 1 reading per minute) and then consider the overlap (5 minutes, meaning 5 readings from the end of the first window are also at the beginning of the second window). Thus, the first window has 10 unique readings, and the second window also contains these 5 overlapped readings plus 5 new ones, totaling 15 unique readings in two consecutive windows when the overlap is accounted for.
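The window arithmetic can be checked with a short sketch, assuming one reading per minute (so a 10-minute window holds 10 readings) and using list indices as stand-ins for the readings. `sliding_windows` is a hypothetical helper: with a size of 10 and an overlap of 5, each window starts 5 readings after the previous one.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sliding_windows(data: Sequence[T], size: int, overlap: int) -> List[List[T]]:
    """Split a sequence into fixed-size windows; consecutive windows share
    `overlap` elements, so each window starts size - overlap after the last."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    return [list(data[start:start + size])
            for start in range(0, len(data) - size + 1, step)]


if __name__ == "__main__":
    readings = list(range(30))  # 30 minutes of readings, one per minute
    windows = sliding_windows(readings, size=10, overlap=5)
    first, second = windows[0], windows[1]
    print(len(first), len(second))        # 10 10
    print(len(set(first) | set(second)))  # 15
```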
20218257
Which of the following correctly matches the methods of data collection for activity and emotion in Ground Truth Labeling? (Select all that apply.)
A) Activity - Direct Elicitation: User is asked directly to label their current activity.
B) Emotion - Naturalistic: Watching “emotional” videos or performing tasks designed to elicit specific emotional states.
C) Activity - Naturalistic: Asking people to label their current activity whenever there is a change in activity.
D) Emotion - Observation: A third person judges a user’s emotion, for example, by watching facial videos and labeling emotions.
E) Activity - Observation: Real-time following by an observer or video recording with subsequent post-hoc labeling.
C,D,E
A) Activity - Direct Elicitation: Incorrect. Elicitation means asking users to follow predetermined scenarios (e.g., performing a scripted activity). Directly asking users to label their current activity as it happens is the naturalistic approach, as described in option C.
B) Emotion - Naturalistic: Incorrect. Watching emotional videos or performing tasks designed to evoke specific emotions is elicitation; naturalistic labeling captures emotions as they occur in everyday life.
Question: Consider a dataset with the following characteristics:
Mean: 50
Median: 45
Third Quartile (Q3): 60
First Quartile (Q1): 40
Standard Deviation: 10
Determine if the value 78 is considered an outlier based on the following methods:
A. 3σ rule from the mean value
B. Boxplot rule using 1.5 times the Interquartile Range (IQR)
Which of these methods identify the value 78 as an outlier?
1. A only
2. B only
3. Both A and B
4. Neither A nor B
4
The 3σ rule sets an upper limit at 80 (mean + 3 * standard deviation). Since 78 falls below this limit, it does not qualify as an outlier by this method.
The boxplot rule sets an upper limit at 90 (Q3 + 1.5 * IQR, with IQR = 60 - 40 = 20). Since 78 also falls below this threshold, it is not considered an outlier by this criterion either.
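The two thresholds can be reproduced from the summary statistics given in the question. The helper names below are illustrative; the bounds follow the 3σ rule and the 1.5 * IQR boxplot rule described above, including the lower fences for completeness.

```python
from typing import Tuple

Bounds = Tuple[float, float]


def three_sigma_bounds(mean: float, std: float) -> Bounds:
    # Values more than 3 standard deviations from the mean are outliers
    return mean - 3 * std, mean + 3 * std


def iqr_bounds(q1: float, q3: float, k: float = 1.5) -> Bounds:
    # Boxplot fences: k * IQR below Q1 and above Q3
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(value: float, bounds: Bounds) -> bool:
    low, high = bounds
    return value < low or value > high


if __name__ == "__main__":
    sigma = three_sigma_bounds(mean=50, std=10)
    box = iqr_bounds(q1=40, q3=60)
    print(sigma, is_outlier(78, sigma))  # (20, 80) False
    print(box, is_outlier(78, box))      # (10.0, 90.0) False
```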
Question: Which one of the following outlier detection methods considers the local density around each data point?
A) Chauvenet’s criterion
B) Mixture model
C) Distance-based approach
D) Local Outlier Factor (LOF)
D
Chauvenet’s criterion, Gaussian mixture models, and distance-based approaches detect outliers based on rarity or separation, without considering local data clustering or density.
The Local Outlier Factor (LOF) assesses the degree to which a point is an outlier by comparing its local reachability density with that of its neighbors, thus effectively detecting outliers in areas with diverse data densities.
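A minimal pure-Python sketch of LOF for one-dimensional data, following the standard definition (k-distance, reachability distance, local reachability density). It assumes distinct values and `1 <= k < len(xs)`; the function name and example data are illustrative, and a library implementation such as scikit-learn’s `LocalOutlierFactor` would normally be preferred.

```python
from typing import List, Sequence


def local_outlier_factor(xs: Sequence[float], k: int) -> List[float]:
    """LOF score for each 1-D point; assumes distinct values and 1 <= k < len(xs)."""
    n = len(xs)

    def dist(i: int, j: int) -> float:
        return abs(xs[i] - xs[j])

    # k-distance of each point and its k-neighbourhood (ties included)
    k_dist: List[float] = []
    neighbours: List[List[int]] = []
    for i in range(n):
        others = sorted((dist(i, j), j) for j in range(n) if j != i)
        kd = others[k - 1][0]
        k_dist.append(kd)
        neighbours.append([j for d, j in others if d <= kd])

    def reach_dist(i: int, j: int) -> float:
        # Reachability distance of point i from its neighbour j
        return max(k_dist[j], dist(i, j))

    # Local reachability density: inverse of the mean reachability distance
    lrd = [len(neighbours[i]) / sum(reach_dist(i, j) for j in neighbours[i])
           for i in range(n)]

    # LOF: average neighbour density relative to the point's own density
    return [sum(lrd[j] for j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
            for i in range(n)]


if __name__ == "__main__":
    scores = local_outlier_factor([0, 1, 2, 3, 40], k=2)
    print([round(s, 2) for s in scores])  # [1.0, 1.0, 1.0, 1.0, 25.0]
```

Scores close to 1 mean a point is about as dense as its neighbours; the isolated value 40 scores far above 1.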
Question: In a study investigating infant emotional awareness, researchers employ an EDA (Electro-Dermal Activity) sensor that records data at a 2 kHz frequency. This sensor data is synchronized with video recordings of the infants. After recording, child psychology experts analyze the videos to determine emotions like happiness or sadness, and this analysis is used to label the EDA data accordingly. Considering this setup, which combination of the data fetching method and ground truth labeling technique is being applied in this scenario?
A) Event-based fetching and Naturalistic labeling
B) Polling-based fetching and Elicitation labeling
C) Polling-based fetching and Observation labeling
D) Event-based fetching and Observation labeling
C
In this scenario, the EDA sensor is recording data at a consistently high frequency of 2 kHz, which aligns with the concept of polling-based fetching, where data is collected continuously at regular intervals. For ground truth labeling, the method used is Observation labeling, where experts analyze the video recordings after the fact (post-hoc) to determine the emotions of the children. This approach does not rely on real-time labeling or direct elicitation but rather on the expert analysis of observed behavior.
Which of the following statements accurately describes the IQR (Interquartile Range) method in outlier detection?
A. IQR measures the spread of data around the mean of the dataset.
B. IQR is calculated as the difference between the minimum and maximum values in the dataset.
C. First quartile Q1 = the value under which 25% of data points are found when they are arranged in decreasing order.
D. IQR is computed as the difference between the third quartile (Q3) and the first quartile (Q1) of the dataset.
D
Option A is incorrect because IQR doesn’t measure the spread of data around the mean. It measures the spread of data around the median.
Option B is incorrect because IQR is not calculated as the difference between the minimum and maximum values. It focuses on the quartiles.
Option C is incorrect because first quartile Q1 = the value under which 25% of data points are found when they are arranged in increasing order.
Option D is correct because the IQR of a set of values is calculated as the difference between the upper and lower quartiles.
First quartile Q1 = the value under which 25% of data points are found when they are arranged in increasing order.
Third quartile Q3 = the value under which 75% of data points are found when arranged in increasing order.
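Q1, Q3, and the IQR can be computed with Python’s standard `statistics.quantiles`. Quartile conventions differ between tools; this sketch relies on the function’s default `method="exclusive"`, so other software may report slightly different quartiles for the same data. The helper name `quartiles_and_iqr` is illustrative.

```python
from statistics import quantiles
from typing import Sequence, Tuple


def quartiles_and_iqr(data: Sequence[float]) -> Tuple[float, float, float]:
    # quantiles(..., n=4) returns the three cut points [Q1, median, Q3];
    # the data is sorted internally (increasing order)
    q1, _median, q3 = quantiles(data, n=4)
    return q1, q3, q3 - q1


if __name__ == "__main__":
    print(quartiles_and_iqr([1, 2, 3, 4, 5, 6, 7, 8, 9]))  # (2.5, 7.5, 5.0)
```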
Question: Which of the following statements accurately describe wearable sensors?(Select all that apply.)
a) EDA: Shows greater responsiveness to thermal stimuli compared to psychological stimuli.
b) PPG: When measured simultaneously with ECG, PPG shows an earlier peak arrival than ECG.
c) EEG: Commonly utilized in fundamental research concerning neurological and psychiatric disorders.
d) EOG: Measures the electrical potential difference at various positions on the eye.
C,D
EDA is indeed more sensitive to psychological stimuli, making it a powerful method for emotion detection, contradicting statement a.
The claim in statement b is reversed; ECG signals precede those of PPG because the heart’s electrical activity happens before the blood volume changes it causes can be detected.
Statements c and d are accurate, with EEG being a cornerstone in neurological and psychiatric research due to its ability to capture brain electrical activity, and EOG being valuable for measuring eye movement through electrical potential differences.
Which situation is better suited for applying a distance-based outlier detection method?
A) When the data follows a single normal distribution.
B) When the data can be described using K normal distributions (mixture models).
C) When the data exhibits a uniform distribution.
D) When the data contains missing values.
B
Distance-based outlier detection methods, such as the k-nearest neighbors (k-NN) approach, do not assume that the data follows a single distribution; they flag points that are far from most other points. This makes them useful when the data is spread across several groups, such as a mixture of K normal distributions, where a single mean-and-standard-deviation rule would misjudge points. In such cases, outliers are the points that lie far from every dense group, and distance-based methods can identify them without explicitly modeling each mixture component.
Suppose we have a dataset with N data points. We apply a simple distance-based outlier detection method using parameters $f_{min}$ and $d_{min}$. If a fraction of $f_{min}$ of the points are found to be outside the distance threshold $d_{min}$, what can we infer about the remaining points?
A) At least $(1-f_{min}) \times N$ points are close to each other.
B) All points are outliers.
C) The dataset contains no outliers.
D) The number of close points cannot be determined.
A
The condition states that if a fraction of $f_{min}$ of the points are outside the distance threshold, then at least $(1-f_{min}) \times N$ points must be close (i.e., within the distance threshold).
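A sketch of the simple distance-based rule, assuming one-dimensional data and absolute difference as the distance: a point is flagged when the fraction of the other points lying farther than $d_{min}$ from it exceeds $f_{min}$. Computing that fraction over the other N - 1 points (rather than all N) and the function name are assumptions of this sketch.

```python
from typing import List, Sequence


def distance_based_outliers(xs: Sequence[float], d_min: float, f_min: float) -> List[bool]:
    """Flag x_i as an outlier when the fraction of the other points farther
    than d_min from it exceeds f_min."""
    n = len(xs)
    if n < 2:
        return [False] * n
    flags: List[bool] = []
    for i, x in enumerate(xs):
        far = sum(1 for j, y in enumerate(xs) if j != i and abs(x - y) > d_min)
        flags.append(far / (n - 1) > f_min)
    return flags


if __name__ == "__main__":
    print(distance_based_outliers([1, 2, 3, 4, 100], d_min=5, f_min=0.5))
    # [False, False, False, False, True]
```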
How does the Kalman filter effectively handle noise and missing values in sensory data?
A) By applying a high-pass filter to remove noise and interpolate missing values based on the median value of adjacent data points.
B) By predicting the current state based on previous states and measurements, while minimizing the error covariance to handle noise and estimate missing values.
C) By compressing the data to reduce noise and using pattern recognition to fill in missing values.
D) By transforming sensory data into a frequency domain and filtering out frequencies that correspond to noise and missing data.
B
The Kalman filter predicts the current state of the system using a mathematical model of the system’s dynamics. This model accounts for the previous state and any control inputs that might affect the current state.
The filter estimates the uncertainty of its predictions and measurements (error covariance) and uses these estimates to weight predictions against measurements, which reduces the impact of noise in the data. When a measurement is missing, the filter can skip the update step and use its prediction as the estimate for that time step.
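A scalar Kalman filter illustrates both behaviours, assuming a simple random-walk (constant-level) model in which `q` is the process-noise variance and `r` the measurement-noise variance; the model, parameter names, and function name are assumptions for illustration. A missing reading, stored as `None`, skips the update step.

```python
from typing import List, Optional, Sequence


def kalman_1d(measurements: Sequence[Optional[float]], q: float, r: float,
              x0: float = 0.0, p0: float = 1.0) -> List[float]:
    x, p = x0, p0  # state estimate and its error variance
    estimates: List[float] = []
    for z in measurements:
        # Predict: the model keeps the level unchanged, but uncertainty
        # grows by the process-noise variance q
        p = p + q
        if z is not None:
            # Update: the Kalman gain weights the measurement vs. the prediction
            k = p / (p + r)
            x = x + k * (z - x)
            p = (1 - k) * p
        # If z is None (missing), the prediction is kept as the estimate
        estimates.append(x)
    return estimates


if __name__ == "__main__":
    print(kalman_1d([1.0, None, 1.0], q=0.0, r=1.0))
```

The second estimate equals the first because the missing reading leaves the prediction unchanged; the next real measurement then pulls the estimate further toward 1.0.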
Which of the following is most commonly used to measure heart rate variability through sensor data collection?
A) EDA (Electrodermal Activity)
B) ECG (Electrocardiogram)
C) PPG (Photoplethysmogram)
D) EEG (Electroencephalogram)
E) EOG (Electrooculogram)
B
ECG is widely recognized for its ability to measure the electrical activity of the heart, making it especially useful for assessing heart rate variability (HRV), among other cardiac functions. PPG is also used to measure heart rate by detecting blood volume changes in the microvascular bed of tissue, but ECG is more directly associated with heart rate variability due to its detailed capture of the electrical signals that trigger heartbeats.
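HRV is typically derived from the intervals between successive R peaks in the ECG (R-R intervals). The sketch below computes RMSSD, one common time-domain HRV measure, assuming R peaks have already been detected and the intervals are given in milliseconds; the example intervals are made up.

```python
import math
from typing import Sequence


def rmssd(rr_intervals_ms: Sequence[float]) -> float:
    """Root mean square of successive differences between R-R intervals."""
    if len(rr_intervals_ms) < 2:
        raise ValueError("need at least two R-R intervals")
    diffs = [b - a for a, b in zip(rr_intervals_ms, rr_intervals_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


if __name__ == "__main__":
    print(round(rmssd([800, 810, 790, 800]), 2))  # 14.14
```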
Which statement is correct about ground truth labeling? (choose 1)
a. The Experience Sampling Method is typically conducted in a natural setting
b. Elicitation means the observer will follow up and label the object in real-time.
c. In a natural setting, collectors ask users to follow predetermined scenarios to collect data.
d. The “Observation” refers to requesting individuals to label their current state.
a
“Elicitation” involves collectors asking users to adhere to predefined scenarios in order to gather data. “Natural setting” entails prompting individuals to label their current activity. “Observation” refers to directly observing and recording the characteristics or situations of a subject. The Experience Sampling Method (ESM) involves sending messages to users prompting them to input their current state (label), and it is typically conducted in a natural setting. Therefore, option (a) is correct.
Which statements about outliers are incorrect? (choose 2)
a. The mean is more robust than the median, because outliers would pull the median toward them.
b. An outlier refers to data that is significantly distant from other data points
c. A lower reliability of a sensor could cause outliers.
d. It’s always best to get rid of outliers.
a,d
The median is more robust than the mean in the presence of outliers. Outliers can significantly affect the mean, pulling it towards their extreme values. However, the median is less influenced by outliers since it depends only on the middle value of the dataset. Therefore, (a) is incorrect.
An outlier is a data point that significantly differs from other data points in a dataset. Low reliability of a sensor means that precise measurements cannot be guaranteed, thereby diminishing confidence in the measurement results. Therefore, (b) and (c) are correct statements.
It’s not always best to remove outliers. While outliers can distort the distribution of data, they can also provide valuable insights or indicate important anomalies in the data. Removing outliers without proper justification or understanding of their origins can lead to biased or inaccurate analyses. Therefore, (d) is incorrect.
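A quick illustration of why the median is the more robust statistic, using made-up readings: adding one extreme value moves the mean dramatically while the median barely changes.

```python
from statistics import mean, median

readings = [10, 11, 12, 13, 14]   # made-up sensor values
with_outlier = readings + [1000]  # same values plus one extreme reading

print(mean(readings), median(readings))                    # 12 12
print(round(mean(with_outlier), 2), median(with_outlier))  # 176.67 12.5
```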
Which of the following physiological responses is mainly used as an indicator of emotional arousal?
- Electro-Cardio-gram (ECG)
- Photo-Plethysmo-gram (PPG)
- Electro-Encephalo-gram (EEG)
- Electro-Dermal Activity (EDA)
4
Electro-Cardio-gram (ECG): ECG measures the electrical activity of the heart.
Photo-Plethysmo-gram (PPG): PPG measures blood volume changes in the microvascular bed of tissue, which can indirectly reflect changes in emotional states through variations in heart rate. However, like ECG, it is not specifically used to assess emotional arousal.
Electro-Encephalo-gram (EEG): EEG records electrical activity of the brain and is crucial in neurological research and diagnosis.
Electro-Dermal Activity (EDA) primarily measures the changes in the electrical conductance of the skin due to sweat gland activity. When a person experiences emotional arousal, particularly through the activation of the sympathetic branch of the autonomic nervous system, sweat gland activity increases, leading to a higher skin conductance. Therefore, EDA is used as an indicator of emotional arousal and responsiveness to psychologically significant stimuli. This physiological response is more sensitive to emotional changes rather than thermal stimuli, making it a valuable tool in assessing emotional states and reactions. Thus, the correct answer is 4.
Consider the data set consisting of:
{88, 23, 131, 36, 1001, 294, 391, 1, -2, 94, -99, 2, 82, 42, -11, 43} (N = 16, mean = 132.25)
The data below the 5th percentile lies between -99 and 9.75, while the data above the 95th percentile lies between 148.6 and 1001.
Which of the following is the correct result after 90% winsorization?
a) {88, 23, 131, 36, 9.75, 9.75, 9.75, 148.6, 148.6, 94, 148.6, 148.6, 82, 42, 148.6, 43}
b) {88, 23, 131, 36, 148.6, 294, 391, 1, -2, 94, 9.75, 2, 82, 42, -11, 43}
c) {88, 23, 131, 36, 148.6, 148.6, 148.6, 9.75, 9.75, 94, 9.75, 9.75, 82, 42, 9.75, 43}
d) {88, 23, 131, 36, 0, 294, 391, 1, -2, 94, 0, 2, 82, 42, -11, 43}
c
Winsorization is a method used to mitigate the impact of outliers by substituting extreme values with values closer to the rest of the data. Specifically, in a 90% winsorization, the lowest 5% of the data are replaced with the value at the 5th percentile, and the highest 5% of the data are replaced with the value at the 95th percentile.
Given that the data below the 5th percentile lies between -99 and 9.75, and the data above the 95th percentile lies between 148.6 and 1001, every value below 9.75 (-99, -11, -2, 1, 2) is replaced with 9.75 and every value above 148.6 (294, 391, 1001) is replaced with 148.6. All other values are unchanged.
Option c) reflects exactly this substitution, so it is the correct answer.
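The replacement step can be sketched in Python as capping every value at the two thresholds given in the question. This is a minimal sketch that takes 9.75 and 148.6 as given rather than computing them (different percentile conventions give different cut-offs); `winsorize_with_bounds` is a hypothetical helper name.

```python
from typing import List


def winsorize_with_bounds(data: List[float], low: float, high: float) -> List[float]:
    """Replace values below `low` with `low` and values above `high` with `high`."""
    # Values inside [low, high] are kept; extremes are capped at the bounds.
    return [min(max(x, low), high) for x in data]


data = [88, 23, 131, 36, 1001, 294, 391, 1, -2, 94, -99, 2, 82, 42, -11, 43]
# Thresholds are taken from the question, not computed here.
print(winsorize_with_bounds(data, 9.75, 148.6))
```

The printed list matches option c): only the values outside [9.75, 148.6] change, and they keep their original positions.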
Which of the following best describes the Experience Sampling Method (ESM)?
A. ESM typically gives questionnaires to participants in controlled laboratory environments.
B. A method used in qualitative research for participant observation
C. A research approach involving random sampling of experiences in real-time
D. ESM primarily relies on retrospective self-reports to gather data about individuals’ experiences and behaviors.
C
Option A is incorrect because ESM is a research procedure for studying what people do, feel, and think during their daily lives, not in controlled laboratory environments.
Option B is incorrect because ESM is not participant observation: participants report on their own current experiences themselves.
Option C is correct because participants are prompted at random intervals to report on their current experiences in real time.
Option D is incorrect because ESM involves collecting data in real time rather than relying on retrospective self-reports.
The Experience Sampling Method (ESM) involves collecting data on participants’ experiences, behaviors, or thoughts in real time as they occur in their natural environment. Participants are typically prompted at random intervals to report on their current experiences, which gives researchers insight into everyday experiences and behaviors. Because experiences are captured in the moment, ESM reduces the recall bias that can occur with traditional retrospective self-report measures.
Which two of the following are the most likely applications of the data gathered from EDA (Electrodermal Activity) and PPG (Photoplethysmogram) sensors in wearable devices?
A) Personalizing hydration reminders.
B) Detecting nightmares.
C) Tracking the user’s surrounding pressure levels.
D) Measuring the speed of the user’s movement.
A, B
Correct answers: A) and B)
Option A: EDA sensors can detect changes in sweat gland activity. Increased sweating means increased fluid loss, which an application could use as a cue to remind the user to hydrate.
Option B: EDA readings reflect changes in sweat gland activity, which can be correlated with stress during nightmares. PPG provides heart rate (BPM), which tends to rise with emotional stress.
Explanations for the incorrect options:
Option C: This is typically measured with barometers.
Option D: These kinds of measurements typically rely on motion sensors such as accelerometers and gyroscopes.
Which of the following scenarios involving sensor readings from wearable devices is most likely to produce actual outliers that require removal or further processing?
A) Spikes in EDA (measures skin conductivity) readings when a user is doing an intense physical exercise.
B) Low ECG (measures heart rate) readings while a user is asleep.
C) Sudden drop in PPG (measures heart rate) readings when the wearable device briefly loses contact with the user’s skin.
D) Consistently high SpO2 (measures oxygen level) readings of a user
C
Correct answer: C)
Since the sudden drop in the measurement value is caused by mechanical failure or interference while gathering the data, this can be considered as an outlier that does not reflect the wearer’s physiological state and should be removed.
Explanations for the incorrect options:
Option A: Spikes in EDA readings can occur during intense physical activity due to increased sweat production, which is a normal physiological response and not necessarily an outlier.
Option B: A low heart rate on the ECG during sleep is expected, because the heart slows as the body enters a state of rest, so these readings should not be categorized as outliers.
Option D: If the readings are consistently higher than the true value, this is a systematic error rather than an outlier; it can be addressed by recalibrating the device or adjusting the values based on domain knowledge.
Which sensor technology is used to measure electrodermal activity (EDA)?
A) Gyroscope
B) Photo-plethysmo-gram (PPG)
C) Galvanic skin response (GSR)
D) Accelerometer
C
Electrodermal activity (EDA), also known as galvanic skin response (GSR), is a measure of the skin’s conductivity, which changes in response to sweat gland activity. This activity is primarily controlled by the sympathetic nervous system, making EDA a useful indicator of emotional arousal. When a person experiences emotional arousal, the sympathetic branch of the autonomic nervous system becomes more active, leading to increased sweat gland activity and, consequently, higher skin conductance. This physiological basis allows EDA measurements to serve as indicators of psychological or emotional states.
According to the document, which method is described as a way to handle outliers by substituting extreme values with less extreme ones, thereby reducing the influence of potentially spurious outliers?
A) Kalman Filtering
B) Chauvenet’s Criterion
C) Winsorizing
D) Local Outlier Factor
C
A) Kalman Filtering: Kalman Filtering is a recursive algorithm used for estimating the state of linear dynamic systems from a series of incomplete and noisy measurements. It’s not specifically designed for handling outliers in a statistical dataset.
B) Chauvenet’s Criterion: Chauvenet’s Criterion is a rule for identifying and removing outliers from a dataset. It determines whether a data point should be considered an outlier based on the probability of its deviation from the mean, which is not the method described in the question.
C) Winsorizing is a statistical transformation method used to reduce the effect of possibly spurious outliers by substituting extreme data points with less extreme ones. For example, values below the 5th percentile can be replaced with the 5th-percentile value and values above the 95th percentile with the 95th-percentile value, which limits the impact of outliers on the dataset.
D) Local Outlier Factor (LOF): LOF is an algorithm for identifying density-based local outliers, particularly in datasets with clusters. It compares the local density of a data point with the densities of its neighbors; points lying in regions of substantially lower density than their neighbors are flagged as outliers.
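The density comparison behind LOF can be sketched in plain Python. This is a simplified, illustrative version, not a library implementation: it takes exactly k neighbours per point (the original definition also includes ties at the k-distance) and assumes there are no duplicate points, which would make a density infinite. `local_outlier_factor` and the sample points are hypothetical; in practice a library implementation such as scikit-learn's `sklearn.neighbors.LocalOutlierFactor` would normally be used.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def local_outlier_factor(points: Sequence[Point], k: int) -> List[float]:
    """Simplified LOF score per point: about 1 for inliers, well above 1 for outliers."""
    n = len(points)
    # k nearest neighbours of each point (excluding itself); ties broken by index.
    neighbours: List[List[int]] = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j, i=i: math.dist(points[i], points[j]))
        neighbours.append(others[:k])
    # k-distance: distance from each point to its k-th nearest neighbour.
    k_dist = [math.dist(points[i], points[neighbours[i][-1]]) for i in range(n)]

    def reach_dist(i: int, j: int) -> float:
        # Reachability distance of point i with respect to neighbour j.
        return max(k_dist[j], math.dist(points[i], points[j]))

    # Local reachability density: inverse of the mean reachability distance.
    lrd = [k / sum(reach_dist(i, j) for j in neighbours[i]) for i in range(n)]
    # LOF: mean ratio of the neighbours' densities to the point's own density.
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)]


# Hypothetical 2-D data: a small cluster plus one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
scores = local_outlier_factor(points, k=2)
print([round(s, 2) for s in scores])
```

The cluster points score close to 1, while the isolated point scores far above 1 because its neighbourhood is much sparser than that of its neighbours.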
What would be the result of applying an 80% Winsorization to the following dataset: {10, 15, 20, 25, 30, 35, 40, 45, 50, 100}?
A) {10, 15, 20, 25, 30, 35, 40, 45, 50, 50}
B) {15, 15, 20, 25, 30, 35, 40, 45, 50, 50}
C) {20, 20, 20, 25, 30, 35, 40, 45, 50, 50}
D) {20, 20, 25, 25, 30, 35, 40, 45, 50, 50}
B
An 80% Winsorization replaces the lowest 10% and the highest 10% of the values. With N = 10, 10% of the values is exactly one value at each end. So the lowest value (10) is replaced with the next-lowest value (15), and the highest value (100) is replaced with the next-highest value (50), resulting in {15, 15, 20, 25, 30, 35, 40, 45, 50, 50}. Option C is incorrect because it caps two values at the low end (10 and 15 both become 20) but only one at the high end.
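Rank-based winsorization of this dataset can be sketched in Python. This is a minimal sketch of one common convention (to our understanding, the same truncation rule used by SciPy's `scipy.stats.mstats.winsorize`): the k = ⌊N × limit⌋ most extreme values at each end are replaced with the nearest remaining value, rather than with an interpolated percentile. `winsorize_by_rank` is a hypothetical helper name.

```python
from typing import List


def winsorize_by_rank(data: List[float], limit: float) -> List[float]:
    """Cap the lowest and highest `limit` fraction of values (rank-based)."""
    n = len(data)
    k = int(n * limit)            # number of values replaced at each end
    ranked = sorted(data)
    low, high = ranked[k], ranked[n - 1 - k]
    # Anything below the (k+1)-th smallest or above the (k+1)-th largest is capped.
    return [min(max(x, low), high) for x in data]


print(winsorize_by_rank([10, 15, 20, 25, 30, 35, 40, 45, 50, 100], 0.10))
# prints [15, 15, 20, 25, 30, 35, 40, 45, 50, 50]
```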
Which ground truth labeling method would be most appropriate for accurately tracking the sleeping patterns of individuals using wearable devices?
A) Elicitation through predetermined scenarios
B) Natural observation through real-time sensors
C) Emotion observation through facial recognition
D) Experience sampling through random user prompts
B
Natural observation through real-time sensors would be the most suitable method for tracking sleeping patterns using wearable devices. This method involves directly monitoring physiological signals such as heart rate, movement, and sleep stages using sensors embedded in the wearable device. It allows for continuous and accurate tracking of sleeping patterns without relying on user input or predetermined scenarios.
Electrodermal Activity (EDA) is primarily used to measure:
A) Heart rate variability
B) Muscle tension
C) Skin conductivity changes due to sweat gland activity
D) Brain wave patterns
C
Correct answer: C.
Electrodermal Activity (EDA) is a physiological response that measures the electrical conductance of the skin, which varies with the activity of the sweat glands.
Heart rate variability is often measured using electrocardiography (ECG). Electromyography (EMG) sensors are used to measure muscle tension or activity. Brain wave patterns can be monitored using Electroencephalography (EEG).
How does the presence of outliers in a dataset affect the mean?
A) It has no effect on the mean, making it a reliable measure in all cases.
B) It causes the mean to shift towards the outliers, potentially misrepresenting the data’s central tendency.
C) It makes the mean calculation computationally easier.
D) It reduces the mean’s value, making lower values more prevalent.
B
Correct answer: B. The mean is calculated by summing all the values in a dataset and then dividing by the number of values, so every value, no matter how large or small, influences the result. Outliers can therefore significantly skew the mean, leading it to misrepresent the central tendency of the data and giving a distorted view of what is typical within the dataset.
A is incorrect because outliers can significantly shift the mean away from the central mass of the data, making it a less reliable measure of central tendency in distributions with outliers.
C is incorrect because the process for calculating the mean (adding all the values together and then dividing by the number of values) remains the same regardless of whether outliers are present or not.
D is incorrect because outliers can either increase or decrease the mean, depending on whether the outliers are significantly higher or lower than the rest of the data.
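The effect of a single outlier on the mean versus the median can be shown with Python's standard `statistics` module. The numbers below are hypothetical, chosen only to illustrate the shift.

```python
import statistics

typical = [10, 12, 11, 13, 12]
with_outlier = typical + [100]   # one extreme value added (hypothetical data)

for name, values in [("without outlier", typical), ("with outlier", with_outlier)]:
    print(name, "mean =", round(statistics.mean(values), 2),
          "median =", statistics.median(values))
```

The single value of 100 pulls the mean from 11.6 to about 26.3, while the median stays at 12.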
Suppose you need to design an experiment to understand how users interact with a new software application under specific conditions. You plan to have users perform predetermined tasks with the software within a controlled environment to obtain their particular behaviours or responses.
Which of the following activities best represents the case study above?
A) Observation
B) Natural
C) Elicitation
D) Emotion
C
In the described case study, you want to set up a controlled environment with predefined scenarios in which users perform tasks with the new software. This method is known as elicitation because it actively creates situations designed to draw out specific responses or interactions from the participants. Unlike naturalistic observation, where the researcher observes and records behaviours without intervention, or natural labelling, where users report on their activities as they change, elicitation deliberately induces a certain environment or set of circumstances to gather data on how users behave under those particular conditions. Therefore, the answer is C.
Arnold is participating in a study analyzing his movements during outdoor activities using a GPS device and a step counter. However, the data collected contains some irregularities and missing values. The research team decides to use a method that can detect outliers and simultaneously impute missing data points, taking advantage of their understanding of Arnold’s typical movement patterns and the reliability of the devices used. Given the scenario and the tools at hand, which method would be the most suitable for processing Arnold’s movement data?
A) Mean/Median/Mode imputation, because it is a straightforward technique that can replace missing data with the most frequently observed values.
B) Kalman filter, because it not only detects outliers in Arnold’s estimated position and velocity but also imputes missing values using the GPS data and step counter measurements.
C) Interpolation-based imputation, because it can fill in missing values based on the data points immediately before and after the gaps.
D) Winsorizing, because it can adjust the extreme outliers to a specified percentile, thus limiting their impact on the analysis.
B
The correct answer is B. The Kalman filter is an advanced algorithm that excels in situations where there is a need to estimate the state of a dynamic system over time. In Arnold’s case, the dynamic system is his movement through space as tracked by GPS and a step counter. The Kalman filter provides a more precise and tailored approach to cleaning and imputing the data in this context, as opposed to more general methods like mean/median/mode imputation or interpolation, which do not utilize the additional information available about the system’s behavior over time. Winsorizing is not as suitable since it primarily addresses extreme values rather than missing data points and does not take advantage of the dynamic model of Arnold’s movements.
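The predict/update idea can be sketched with a deliberately simplified, scalar Kalman filter. This is a minimal sketch under stated assumptions, not the lecture's implementation: it assumes a one-dimensional random-walk model for a single position coordinate, hand-picked noise variances `q` and `r`, a 3-standard-deviation gate on the innovation to reject outliers, and `None` for missing samples. `kalman_clean` and the readings are hypothetical.

```python
from typing import List, Optional, Tuple


def kalman_clean(
    measurements: List[Optional[float]],
    q: float = 0.01,    # process noise variance (assumed)
    r: float = 1.0,     # measurement noise variance (assumed)
    gate: float = 3.0,  # outlier threshold in innovation standard deviations (assumed)
) -> Tuple[List[float], List[int]]:
    """Scalar random-walk Kalman filter that imputes gaps and flags outliers."""
    # Start from the first available measurement, with measurement-level uncertainty.
    x = next(m for m in measurements if m is not None)
    p = r
    estimates: List[float] = []
    outliers: List[int] = []
    for i, z in enumerate(measurements):
        # Predict: the random-walk model keeps the state; uncertainty grows by q.
        p = p + q
        if z is not None:
            s = p + r                    # innovation variance
            innovation = z - x
            if abs(innovation) > gate * s ** 0.5:
                outliers.append(i)       # implausible under the model: skip the update
            else:
                k = p / s                # Kalman gain
                x = x + k * innovation   # correct the state toward the measurement
                p = (1 - k) * p
        # Missing or rejected samples are imputed with the predicted state.
        estimates.append(x)
    return estimates, outliers


# Hypothetical position readings: one gap (None) and one spike (50.0).
estimates, outliers = kalman_clean([1.0, 1.1, None, 0.9, 50.0, 1.0, 1.05])
print(outliers)                          # index of the rejected spike
print([round(e, 3) for e in estimates])
```

A filter for GPS plus step-counter data would track a state vector (for example position and velocity) with matrix-valued process and measurement models, but the predict, gate, and update structure stays the same.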
In the lecture, the 3MAD (Median Absolute Deviation) and 3sigma (standard deviation) methods for outlier detection were introduced. It is known that 3MAD is generally more robust than 3sigma. Which of the following statements is a correct reason for this?
A) 3sigma is appropriate even if the dataset does not follow a normal distribution.
B) 3sigma uses the mean to calculate the standard deviation, making it more susceptible to outliers.
C) 3MAD is less robust than 3sigma because MAD is more influenced by outliers in the data.
D) 3sigma is more robust to outliers because it squares the differences from the mean, reducing the impact of large deviations compared to MAD.
B
B is correct: “3sigma uses the mean to calculate the standard deviation, making it more susceptible to outliers.” Since the mean is sensitive to outliers and the standard deviation is derived from the mean, the 3sigma method is also sensitive to outliers. This sensitivity makes it less robust than 3MAD, which uses the median.
A is incorrect: 3sigma assumes the data follow a normal distribution. If the data are not normally distributed, using 3sigma for outlier detection may not be appropriate or effective.
C is incorrect because it reverses the established fact that MAD is less influenced by outliers than the standard deviation. MAD uses the median, which is more robust to outliers than the mean, making 3MAD more robust for outlier detection than 3sigma.
D is incorrect. While squaring the differences does penalize larger deviations more, this makes the sigma method more sensitive to outliers, not less. Outliers, which have large deviations from the mean, have an even larger impact after being squared, making 3sigma less robust to outliers than 3MAD.
Which of the following is the cheapest option for measuring heart rate variability?
A) EOG (Electrooculogram)
B) ECG (Electrocardiogram)
C) EDA (Electrodermal Activity)
D) PPG (Photoplethysmogram)
E) EEG (Electroencephalogram)
D
The correct answer is D (PPG). Photoplethysmography (PPG) is a simple and low-cost optical technique that can be used to detect blood volume changes in the microvascular bed of tissue. It is widely used to measure heart rate and heart rate variability (HRV) by detecting the pulse wave that travels through the blood vessels each time the heart beats. PPG sensors are commonly found in consumer-grade wearable devices, such as fitness trackers and smartwatches, because of their cost-effectiveness and ease of integration.
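Once beat times have been detected from the PPG pulse wave, one common time-domain HRV measure, RMSSD (root mean square of successive differences between inter-beat intervals), is straightforward to compute. This is a minimal sketch that assumes peak detection has already been done; the beat timestamps are hypothetical, and a real PPG pipeline would also need motion-artifact handling.

```python
import math
from typing import List


def rmssd(beat_times_s: List[float]) -> float:
    """RMSSD in milliseconds from a list of beat timestamps in seconds."""
    # Inter-beat intervals (IBIs) between consecutive detected pulses.
    ibis = [b - a for a, b in zip(beat_times_s, beat_times_s[1:])]
    # Differences between successive IBIs.
    diffs = [b - a for a, b in zip(ibis, ibis[1:])]
    if not diffs:
        raise ValueError("need at least three beats")
    return 1000 * math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical beat times alternating between 0.80 s and 0.85 s intervals.
print(round(rmssd([0.0, 0.80, 1.65, 2.45, 3.30]), 1))
```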
Alex is participating in an experiment on motion recognition using biosensors in smartwatches. He was told to install an app on his smartwatch that routinely prompts him to select his current activity (sitting still, running, etc.). The app notifies him at random 6 times a day, but not at night. After each notification, he has to select his current activity within 3 minutes. Alex was told that the experiment ends next Friday, so he has to wear the smartwatch for 5 days.
In the experience sampling design above, which parameter should the practitioners additionally consider?
A) notification schedule
B) notification expiry
C) inter-notification time
D) study duration
C
A) Notification schedule: This refers to the timing and frequency of notifications sent to participants. The description states that notifications are sent randomly six times a day, except at night, so the schedule is already specified.
B) Notification expiry: This is the window of time within which participants must respond to a notification, set at 3 minutes in the study.
C) Inter-notification time: This refers to the time between consecutive notifications. Although notifications are said to be random, ensuring a minimum or maximum time between them (to avoid clustering or long gaps) is crucial for balancing data collection throughout the day and reducing participant burden.
D) Study duration: The overall length of the study is mentioned as 5 days, ending next Friday.
The only parameter the practitioners did not specify is the inter-notification time, i.e., the time gap between random notifications. Hence, the answer is C.
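A random schedule that also respects a minimum inter-notification time can be sketched in Python. This is a minimal sketch under assumed settings: the 09:00 to 21:00 waking window and the 60-minute minimum gap are hypothetical (the study only says "except at night"), and the 3-minute response expiry would be handled separately. `schedule_prompts` is a hypothetical helper name.

```python
import random
from typing import List


def schedule_prompts(n: int, start_min: float, end_min: float,
                     min_gap: float, rng: random.Random) -> List[float]:
    """Draw n random prompt times (minutes after midnight) at least min_gap apart."""
    slack = (end_min - start_min) - (n - 1) * min_gap
    if slack < 0:
        raise ValueError("window too short for n prompts with this gap")
    # Sample n points in the reduced window, then shift the i-th one by i * min_gap.
    offsets = sorted(rng.uniform(0, slack) for _ in range(n))
    return [start_min + off + i * min_gap for i, off in enumerate(offsets)]


# Hypothetical settings: 6 prompts between 09:00 and 21:00, at least 60 minutes apart.
times = schedule_prompts(6, 9 * 60, 21 * 60, 60, random.Random(42))
for t in times:
    print(f"{int(t // 60):02d}:{int(t % 60):02d}")
```

Shifting sorted samples by multiples of the gap keeps the times random while guaranteeing the minimum spacing, so no rejection sampling is needed.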
Among the outlier detection methods below, which one does not require a distributional (normality) assumption?
A) 3 sigma rule
B) Interquartile Range (IQR) method
C) Local Outlier Factor
D) Chauvenet’s criterion
C
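A) The 3 sigma rule comes from the fact that if X follows a normal distribution with mean mu and standard deviation sigma, X falls within the 3 sigma interval [mu - 3*sigma, mu + 3*sigma] with a probability of about 99.7%. Chauvenet’s criterion likewise assumes a normal distribution. The Local Outlier Factor, in contrast, compares the local density of each point with that of its neighbors and makes no distributional assumption, so C is the answer.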
Which of the following statements best explains why 3MAD is considered more robust than 3σ?
A) MAD is calculated based on the mean, which is less sensitive to outliers than the median used in σ calculation.
B) MAD is more resistant to deviations from normality and works well with non-normal distributions, unlike σ, which assumes a normal distribution.
C) MAD provides consistent estimates of dispersion even in the presence of outliers, while σ tends to overestimate the spread of the data in the presence of extreme values.
D) MAD focuses on the median, which is less influenced by extreme values, resulting in a more robust estimate of dispersion compared to σ, which can be inflated by outliers.
D
A) MAD (Median Absolute Deviation) is calculated based on the median, not the mean. MAD measures the dispersion of a dataset by calculating the median of the absolute deviations from the median.
B) This statement is partially correct. MAD is indeed more robust against deviations from normality and can work well with non-normal distributions. However, standard deviation (σ) does not necessarily assume a normal distribution; it is commonly used with various distribution types.
C) MAD does provide consistent estimates of dispersion in the presence of outliers, and σ is indeed inflated by extreme values, but this statement describes the effect rather than its cause. The standard deviation itself is not robust to outliers; it becomes less sensitive only when a robust variant such as a Winsorized standard deviation is used. D states the underlying reason.
D) MAD focuses on the median, which is less influenced by extreme values, resulting in a more robust estimate of dispersion compared to σ, which can be inflated by outliers.
Which of the following statements accurately describes the difference between PPG and ECG for heart rate monitoring?
A) ECG directly measures heart activity by detecting electrical signals produced by the heart muscle.
B) PPG relies on optical measurements, capturing blood flow changes via a small LED light.
C) ECG is more reliable than PPG in measuring heart rate.
D) PPG sensors are ideal for average or moving average measurements.
E) Both A and B are correct.
A, B, E
A) ECG (Electrocardiogram) directly measures heart activity by detecting electrical signals produced by the heart muscle. It provides precise information about the heart’s electrical conduction system and is commonly used in medical settings.
B) PPG (Photoplethysmography) relies on optical measurements, capturing blood flow changes via a small LED light. PPG sensors typically measure changes in light absorption caused by blood volume changes, providing an indirect measurement of heart rate and blood flow.
C) While ECG is often considered the standard for measuring heart rate due to its accuracy and direct measurement of cardiac electrical activity, PPG can also be reliable when used properly, especially in consumer-grade devices.
D) PPG sensors can provide real-time heart rate measurements, and while they can calculate moving averages, they are not limited to this type of measurement.
E) As A) and B) are correct, this option is also correct.
What does the Kalman filter technique primarily address?
A) Only outlier detection
B) Only missing value imputation
C) Both outlier detection and missing value imputation
D) Neither outlier detection nor missing value imputation
C
The Kalman filter is a sophisticated method that is used for both detecting outliers and imputing missing values, leveraging prior knowledge about the data’s process and measurement models.
Which of the following is NOT a method for detecting outliers?
A) Using Chauvenet’s criterion
B) Using Gaussian Mixture Models
C) Using the Least Squares Method
D) Using the Local Outlier Factor
C
The Least Squares Method is primarily used for modeling relationships in regression analysis, not for detecting outliers. Chauvenet’s criterion, Gaussian Mixture Models, and the Local Outlier Factor are techniques that can be used for outlier detection.
You are working on a project involving correct exercise form for weighted exercises (deadlift, bicep curl, etc.) and are tasked to collect data (sensor and video).
You are told that not all of the participants know/use the right form during exercise. Which is the best ground truth labelling method for labelling the collected data then?
A: Elicitation (using only participants that know and can execute the proper form)
B: Elicitation (using all participants)
C: Natural
D: Observation
D
In order to collect sufficient data, you need both positive and negative samples (people with good and bad form); thus A is not valid. With elicitation, the label comes from the instructed scenario rather than from what was actually performed, and human movement varies with many factors: someone who knows good form might still execute a movement incorrectly because they don’t feel well, so B is invalid. By similar logic C is also invalid, since participants labelling their own data might believe their form is correct without realizing they are doing the exercise incorrectly. Option D is correct because an observer can see whether a movement was performed correctly, making it the most reliable.
Assume you are processing some temperature data and discover that there are a couple of missing values. Which imputation strategy would work the best? Assume that the temperature data is sinusoidal.
A: Mean
B: Median
C: Mode
D: Interpolation
D
With sinusoidal data, mean and median imputation fill every gap with a constant near the middle of the wave (a flat line across the middle of the sinusoid); although this removes the missing values, it does so poorly. Mode imputation has a similar problem, except that the constant can lie at any point of the wave. Interpolation is the best option because it estimates each missing value from its neighbors and therefore follows the sinusoidal shape much more closely.
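The difference between mean imputation and interpolation can be checked numerically. This is a minimal sketch that assumes hourly samples of a pure sine wave with a 24-sample period as a stand-in for temperature, two missing values near the peak, and plain linear interpolation (spline interpolation would follow the curve even more closely); `interpolate_gaps` is a hypothetical helper name.

```python
import math
from typing import List, Optional


def interpolate_gaps(values: List[Optional[float]]) -> List[float]:
    """Fill interior None values by linear interpolation between the nearest known neighbours."""
    known = [i for i, v in enumerate(values) if v is not None]
    filled: List[float] = []
    for i, v in enumerate(values):
        if v is not None:
            filled.append(v)
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None or right is None:
            raise ValueError("cannot interpolate a gap at the edge of the series")
        t = (i - left) / (right - left)
        filled.append(values[left] + t * (values[right] - values[left]))
    return filled


# Hypothetical hourly signal over two days, with readings 5 and 6 missing.
true = [math.sin(2 * math.pi * h / 24) for h in range(48)]
observed = [None if h in (5, 6) else x for h, x in enumerate(true)]

observed_values = [x for x in observed if x is not None]
mean_fill = [sum(observed_values) / len(observed_values) if x is None else x for x in observed]
interp_fill = interpolate_gaps(observed)

mean_err = max(abs(a - b) for a, b in zip(mean_fill, true))
interp_err = max(abs(a - b) for a, b in zip(interp_fill, true))
print(f"max error, mean imputation: {mean_err:.3f}, interpolation: {interp_err:.3f}")
```

Mean imputation places the missing peak values near zero, while interpolation stays close to the true curve.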
Which of the following statements are incorrect? (Select two)
- Pseudo sensors of smartphones can be used for activity recognition.
- Showing sad videos to obtain ground truth labeling should be avoided as it may affect users’ emotions.
- Users can be instructed to engage in activities such as running or walking to gather ground truth labels for activity.
- According to the Experience Sampling Method, individuals can be asked about their current emotional state to collect emotion ground truth labels.
- Self-labeling is always more accurate than labeling from observation.
2, 5
1 is correct: pseudo sensors of smartphones (software-based sensors) can indicate the type of activity.
2 is incorrect: for ground truth labeling, a video evoking a specific emotion can be shown and the user then asked to rate their current emotion, so showing sad videos is not something to be avoided.
3 is correct: users may be asked to follow predetermined scenarios for ground truth labeling of activities.
4 is correct: according to the Experience Sampling Method, users can be prompted at random times to label their current emotional state.
5 is incorrect: neither self-labeling nor labeling from observation is always more accurate than the other.
Therefore, 2 and 5 are incorrect.
Consider the data set consisting of: {-63, -2, 14, 18, 25, 56, 75, 87, 92, 1028} (N=10)
The following are the ranges of inliers after applying outlier detection to the data:
- 3σ rule: [-771.9, 1037.9]
- 3MAD rule: [-130.7, 211.7]
With the 3σ rule, the number of inliers is (a)___. With the 3MAD rule, the number of inliers is (b)___. Based on this, 3(c)___ is more robust than 3(d)___.
Which of the following is correct for (a), (b), (c), and (d)?
- 10, 10, MAD, σ
- 10, 9, MAD, σ
- 10, 9, σ, MAD
- 9, 9, σ, MAD
2
To solve this problem, we need to know about distribution-based outlier detection. After applying the 3σ rule, the number of inliers is 10 because all values in the data lie within the range of inliers. On the other hand, after applying the 3MAD (Median Absolute Deviation) rule, ‘1028’ is outside the range of inliers, so the number of inliers is 9. Based on this, 3MAD is more robust than 3σ.
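Both inlier ranges can be computed with Python's `statistics` module. This is a minimal sketch assuming the population standard deviation for σ and the conventional scale factor 1.4826 for the MAD (which makes it consistent with σ for normal data); under these assumptions the computed ranges are [-771.9, 1037.9] and [-130.7, 211.7]. The helper names are hypothetical.

```python
import statistics
from typing import List, Tuple


def sigma_bounds(data: List[float]) -> Tuple[float, float]:
    """3-sigma inlier range around the mean (population standard deviation)."""
    mu, sigma = statistics.mean(data), statistics.pstdev(data)
    return mu - 3 * sigma, mu + 3 * sigma


def mad_bounds(data: List[float], scale: float = 1.4826) -> Tuple[float, float]:
    """3-MAD inlier range around the median, with the MAD scaled to estimate sigma."""
    med = statistics.median(data)
    mad = statistics.median([abs(x - med) for x in data])
    return med - 3 * scale * mad, med + 3 * scale * mad


data = [-63, -2, 14, 18, 25, 56, 75, 87, 92, 1028]
for name, (lo, hi) in [("3sigma", sigma_bounds(data)), ("3MAD", mad_bounds(data))]:
    inliers = [x for x in data if lo <= x <= hi]
    print(f"{name}: [{lo:.1f}, {hi:.1f}], inliers = {len(inliers)}")
```

The extreme value 1028 inflates σ enough to fit inside its own 3σ range, whereas the median-based range excludes it.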
In the context of ground truth labeling for sensor data collection, what does the “Natural” method of labeling refer to?
a) Predefining activities for users to perform and label accordingly.
b) Users annotating their current activity or emotion when a change is detected.
c) Observers recording and labeling a user’s activity or emotion from a distance.
d) Implementing automated algorithms to label sensor data outputs.
b
a) Incorrect. This describes the “Elicitation” method where users follow predetermined scenarios, not the “Natural” method which involves in-situ labeling by the users themselves.
b) Correct. The “Natural” method involves in-situ labeling, which means asking people to label their current activity or emotion whenever there’s a change of activity.
c) Incorrect. This option describes the “Observation” method where labeling is done by an observer or through video recording with post-hoc labeling, not by the users themselves in their natural setting.
d) Incorrect. The use of automated algorithms to label sensor data would not involve direct user input and hence does not align with the “Natural” method of ground truth labeling, which relies on user-generated labels.
In sensor data analysis, why might a researcher use the Kalman filter for outlier detection and imputation?
a) To substitute extreme values with more typical ones using Winsorization.
b) To assume a single distribution for attribute noise reduction.
c) To employ prior knowledge of process and measurement models for data calibration.
d) To automate the detection of outliers based on local density and distance factors.
c
a) Incorrect. Winsorizing modifies extreme data points to reduce the impact of outliers but does not incorporate prior knowledge about the data, which is the key aspect of the Kalman filter’s approach.
b) Incorrect. A single distribution assumption for attribute noise reduction is a characteristic of distribution-based outlier detection methods, not the Kalman filter which utilizes a model-based approach.
c) Correct. The Kalman filter uses models of the system’s process and measurement to predict and correct state estimates, which is particularly useful in sensor data calibration and addressing outliers and missing values.
d) Incorrect. Automating outlier detection based on local density and distance factors refers to other methods like the local outlier factor (LOF), which are different from the Kalman filter’s model-based prediction and correction approach.
Which one is not a method for ground truth labeling of sensor dataset?
a. Elicitation: Asking users to follow predetermined scenarios
b. Natural: in-situ labeling - asking people to label a current activity
c. Auto Inference: Using pretrained ML models to assign labels
d. Observation: real time following or video recording with post hoc labeling
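c
Elicitation, natural (in-situ) labeling, and observation are the ground truth labeling methods; labels inferred by a pretrained model are predictions, not ground truth.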
Which outlier detection method assumes the data to be normally distributed?
a. Simple search based method
b. Local outlier factor
c. Chauvenet’s criterion
d. Isolation forest
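c
Chauvenet’s criterion determines a probability band around the mean under the assumption that the data are normally distributed.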
Which of the following is NOT an example of Experience Sampling Method (ESM)?
A) Random notification schedule with a maximum of 10 times a day.
B) Hourly interval notification schedule.
C) Notifications triggered by incoming calls and app use.
D) Daily survey sent at a fixed time each day.
D
ESM typically employs random or interval notification schedules, triggering prompts at various times throughout the day to capture momentary experiences. These prompts can also be event-based, such as when specific actions occur (e.g., incoming calls, app use).
Option D describes a daily survey sent at a fixed time each day, which does not align with the principles of ESM. In ESM, prompts are spread across the day (at random intervals, at regular intervals, or on events) so that they capture experiences as they naturally occur; a single survey at the same fixed time each day samples only one predictable moment. Therefore, it does not fit the criteria of ESM and is not considered an example of this method.
Which of the following scenarios is incorrect when handling missing data through imputation methods?
A) Filling missing values with the median of the observed data.
B) Estimating missing values based on the trend between neighboring data points.
C) Predicting missing values using a regression model trained on the available data.
D) Discarding observations with missing values to maintain data integrity.
D
Imputation methods are techniques used to handle missing data by replacing them with estimated values. Options A, B, and C describe typical scenarios of imputation. Option D is incorrect because it involves removing valuable data points rather than imputing missing values. This approach can lead to biased results and reduced sample sizes, potentially compromising the validity of the analysis. Therefore, discarding observations with missing values is not considered a proper imputation method.
What is the primary purpose of ground truth labelling in sensor data collection?
a) To increase the storage capacity of sensors
b) To calibrate sensor accuracy
c) To create a reference for data analysis
d) To reduce the cost of sensor manufacturing
C
Ground truth labelling involves assigning known labels to data collected from sensors, providing a reference against which the data can be analyzed and the performance of sensing systems can be evaluated.
Considering the use of Winsorizing in data preprocessing, what is the primary goal when applying this technique to a dataset with extreme outliers?
a) To increase the range of data by extending extreme values.
b) To replace all data points with the mean to simplify analysis.
c) To limit extreme values to reduce their influence on the analysis.
d) To evenly distribute all data points across the dataset.
C
Winsorizing limits extreme values in a dataset to reduce the effect of potentially spurious outliers. Values beyond chosen percentiles at both ends of the data range (e.g., the 5th and 95th percentiles) are capped at those percentile values rather than removed. This reduces the influence of outliers on the analysis and leads to more robust statistical estimates.
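The capping step can be sketched in plain Python. The defaults follow the 5th/95th-percentile example above. The percentile helper uses linear interpolation between sorted positions, which is only one of several percentile conventions and is an assumption here. The usage example uses the 10th/90th percentiles so that, with 11 points, the cut-offs fall exactly on data values.

```python
def percentile(sorted_values, p):
    """Linear-interpolation percentile (p in [0, 100]) of an already sorted list."""
    pos = (len(sorted_values) - 1) * p / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * frac


def winsorize(values, lower_pct=5, upper_pct=95):
    """Cap values outside [lower_pct, upper_pct] percentiles at those percentile values."""
    s = sorted(values)
    low = percentile(s, lower_pct)
    high = percentile(s, upper_pct)
    # Values are capped, not dropped, so the sample size is unchanged
    return [min(max(v, low), high) for v in values]


# Hypothetical data with two extreme values (1 and 120)
data = [15, 1, 14, 16, 120, 13, 18, 12, 17, 19, 15]
print(winsorize(data, 10, 90))
```

Here 1 is raised to 12 and 120 is lowered to 19, while every other value is untouched.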
What does EDA (Electro-Dermal Activity) measure and why is it significant? (select one)
A) The electrical conductivity of the skin to track device usage patterns.
B) The skin’s momentary electrical conductivity changes in response to stimuli, indicating emotional arousal.
C) The ambient temperature around the device to adjust screen brightness.
D) The battery life of wearable devices for efficient energy consumption.
B
EDA measures the skin’s electrical conductivity, which changes momentarily in response to various stimuli. These changes are primarily due to the activity of the sweat glands, controlled by the sympathetic nervous system, making EDA a valuable indicator of emotional arousal. This makes it a crucial measure in studies related to stress, excitement, or emotional states.
What is Chauvenet’s criterion used for in the context of outlier detection, and what does it entail? (select one)
A) Assuming data follows a single distribution to identify outliers.
B) Using the k-nearest neighbors to measure the local density around a point.
C) Finding a probability band centered on the mean to reasonably contain all samples.
D) Substituting extreme values with less extreme values to reduce outliers’ effects.
C
Chauvenet’s criterion involves determining a probability band centered on the mean of a normal distribution that should reasonably contain all samples in the dataset. It helps identify outliers by excluding data points that fall outside this band, assuming a normal distribution.
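A minimal sketch of the criterion, assuming normally distributed data. It uses the commonly cited rejection rule: for N samples, a point is rejected if N × P < 0.5, where P is the two-sided probability of a deviation at least that large. The choice of the sample standard deviation and of a single pass, with no re-application after removal, are assumptions of this sketch.

```python
import math
import statistics


def chauvenet_outliers(values):
    """Return the values rejected by Chauvenet's criterion (single pass, normal assumption)."""
    n = len(values)
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation
    if std == 0:
        return []
    rejected = []
    for v in values:
        z = abs(v - mean) / std
        # Two-sided probability of a deviation at least this large under a normal distribution
        p = math.erfc(z / math.sqrt(2))
        # Reject if fewer than half a sample this extreme is expected among n samples
        if n * p < 0.5:
            rejected.append(v)
    return rejected


# Hypothetical repeated measurements with one suspicious reading
values = [10.1, 9.8, 10.0, 10.2, 9.9, 10.1, 10.0, 13.5]
print(chauvenet_outliers(values))  # [13.5]
```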
What could be a potential challenge when measuring physiological signals in a real-world setting?
a) Limited access to advanced data analysis tools
b) Difficulty in securing participants for the study
c) Inability to accurately measure due to movement interference
d) Lack of trained personnel to operate the measurement devices
C
Explanation: Physiological signal measurements can be affected by movement interference, making it challenging to obtain accurate data in real-world settings where participants may be moving or active.
Which of the following statements about the k-nearest neighbors is the least accurate?
a) The value of k should be adjusted according to the distribution and characteristics of the data.
b) A small k value can increase accuracy by making it more sensitive to the distances between data points so that it is less likely to overfit.
c) A large k value can provide more generalized results, but the accuracy may decrease as the classification boundaries become smoother.
d) In dense regions of data, a small k value can be chosen to give more weight to the influence of nearby neighbors, and in sparse regions, a large k value can be chosen to consider a wider area.
B
Explanation: While a small k value can be more sensitive to the local data points and potentially improve accuracy, it also makes the model more susceptible to overfitting the training data. This means the model performs well on the training data but may not generalize well to unseen data.
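The sensitivity of a small k to individual points can be seen in a toy one-dimensional example. The training data are hypothetical and include one mislabeled "B" point inside the "A" region. With k = 1, the prediction follows that noisy point. With k = 3, the majority outvotes it.

```python
from collections import Counter


def knn_predict(train, query, k):
    """Classify a 1-D query point by majority vote among its k nearest training points."""
    # Sort training points by absolute distance to the query and keep the k closest
    neighbors = sorted(train, key=lambda item: abs(item[0] - query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical training data: class "A" near 0 to 2, class "B" near 10 to 12,
# plus one mislabeled (noisy) "B" point at 2.5 inside the A region
train = [(0.0, "A"), (1.0, "A"), (2.0, "A"),
         (10.0, "B"), (11.0, "B"), (12.0, "B"),
         (2.5, "B")]

print(knn_predict(train, 2.4, k=1))  # B: follows the single noisy neighbor
print(knn_predict(train, 2.4, k=3))  # A: the majority outvotes the noise
```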
Which of the following statements is incorrect? (Select two)
- The elicit method for ground truth labeling involves recording sensor data while asking users to follow predetermined scenarios.
- Using recorded videos of the user’s facial expressions is the elicit method for labeling emotion data.
- Allowing users to watch ‘emotional’ videos constitutes the observation method for ground truth labeling.
- Randomly asking a user to label their current emotional state is a natural setting method for labeling ground truth.
2,3
Option 2 is incorrect because labeling emotions from recorded videos of a user's facial expressions is an observation method, not an elicit method. Option 3 is incorrect because showing users "emotional" videos deliberately elicits emotions, so it is an elicit method, not an observation method.
Which of the following statements is incorrect? (Select one)
- An outlier is an observation point that is distant from other observations.
- Distribution-based outlier detection methods assume a certain distribution of the data.
- Distance-based outlier detection methods only consider the distance between data points.
- Chauvenet’s criterion is a distance-based outlier detection method.
4
Chauvenet’s criterion is a distribution-based outlier detection method, because it assumes that the data follow a normal distribution.
What can be measured using both PPG and ECG?
a) Heart rate
b) Blood pressure
c) Blood oxygen saturation
d) Electrical activity of the heart
a
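Both PPG and ECG can be used to measure heart rate. PPG optically measures blood volume changes in the skin, while ECG records the electrical activity of the heart; in either signal, the interval between successive beats gives the heart rate. The electrical activity of the heart (d) is recorded only by ECG, and blood oxygen saturation (c) is estimated from PPG, not from ECG. Therefore, (a) is the correct answer.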
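For either signal, heart rate can be computed from the times of successive beats (R peaks in ECG, pulse peaks in PPG). This sketch assumes peak detection has already been done. The peak timestamps are hypothetical, and averaging the inter-beat intervals over the whole segment is a simplifying choice.

```python
def heart_rate_bpm(peak_times_s):
    """Average heart rate (beats per minute) from successive peak timestamps in seconds."""
    if len(peak_times_s) < 2:
        raise ValueError("need at least two peaks")
    # Inter-beat intervals: R-R intervals for ECG, peak-to-peak intervals for PPG
    intervals = [b - a for a, b in zip(peak_times_s, peak_times_s[1:])]
    mean_interval = sum(intervals) / len(intervals)
    return 60.0 / mean_interval


print(heart_rate_bpm([0.0, 0.75, 1.5, 2.25]))  # 80.0
```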
Which of the following statements is incorrect regarding Chauvenet’s criterion?
a) Chauvenet’s criterion is one of the statistical methods used for outlier detection.
b) Chauvenet’s criterion evaluates the extent to which data points deviate from the range of standard deviation.
c) Chauvenet’s criterion is a method for excluding the largest or smallest values from a data set.
d) Chauvenet’s criterion can be applied to data following a normal distribution.
c
Chauvenet’s criterion is a statistical method for outlier detection. It assesses how far each data point deviates from the mean in units of standard deviation, and it can be applied to data that follow a normal distribution. However, it does not simply exclude the largest or smallest values from a data set. Instead, a point is rejected only when the expected number of samples at least that extreme (N × P, for N samples) falls below a threshold, commonly 0.5. Therefore, (c) is the incorrect statement and the correct answer.
What is a primary advantage of using Electrodermal Activity (EDA) sensors in wearable technology for psychological research?
A) They can directly measure cognitive thoughts and processes.
B) They provide a direct measure of environmental temperature.
C) They can indicate emotional arousal by measuring changes in skin conductance.
D) They are primarily used to measure physical activities such as walking or running.
c
Electrodermal Activity (EDA), also known as Galvanic Skin Response (GSR), measures the electrical conductance of the skin, which varies with its moisture level. This is particularly useful in psychological research: the sweat glands are controlled by the sympathetic nervous system, so changes in skin conductance can indicate emotional arousal or stress. EDA cannot directly measure cognitive thoughts (A), it does not measure environmental temperature (B), and it is not primarily a measure of physical activity such as walking or running (D). Instead, it provides insight into autonomic physiological responses to emotional stimuli that are not directly observable, which makes it valuable for studying the subconscious aspects of human emotion and stress responses.
What is the preferred method for imputing missing values in a time-series dataset where the order of data points is significant, and why?
A) Mean imputation, because it is the simplest method.
B) Mode imputation, because it uses the most frequent value.
C) Winsorizing, because it limits extreme values.
D) Interpolation, because it provides more natural values by considering the temporal order of the data.
d
In time-series datasets, where the temporal order and continuity of the data points are important, interpolation is a preferred method for imputing missing values. Unlike mean or mode imputation, which might not account for the time-dependent nature of the data, interpolation uses values from neighboring data points to estimate the missing values. This method ensures that the imputed values follow the dataset’s natural flow and variability over time, leading to more accurate and realistic data restoration. Winsorizing is more about limiting extreme values rather than imputing missing ones and might not be suitable for filling gaps in time-series data.
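The contrast between interpolation and mean imputation on a time series can be sketched as follows. The sketch assumes equally spaced samples with `None` marking gaps, uses hypothetical temperature readings, and leaves gaps at the very start or end unfilled because they have no neighbor on one side.

```python
def interpolate_missing(series):
    """Fill interior None gaps by linear interpolation between the nearest observed neighbors.

    Assumes equally spaced samples; gaps at the start or end are left as None.
    """
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(known, known[1:]):
        gap = right - left
        for i in range(left + 1, right):
            # Move along the straight line from series[left] to series[right]
            filled[i] = series[left] + (series[right] - series[left]) * (i - left) / gap
    return filled


temps = [20.0, 21.0, None, None, 24.0, 25.0]
print(interpolate_missing(temps))  # [20.0, 21.0, 22.0, 23.0, 24.0, 25.0]

# Mean imputation for comparison: both gaps get the same value and the trend is lost
observed = [v for v in temps if v is not None]
mean_value = sum(observed) / len(observed)
mean_filled = [mean_value if v is None else v for v in temps]
print(mean_filled)  # [20.0, 21.0, 22.5, 22.5, 24.0, 25.0]
```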
Which of the following best describes the impact of using overlapped windowing as opposed to distinct windowing?
A) Overlapped windowing significantly reduces the computational complexity of feature extraction.
B) Overlapped windowing can lead to overfitting due to the high similarity between features generated from adjacent windows.
C) Overlapped windowing eliminates the need for selecting a window size parameter (λ).
D) Overlapped windowing is only useful for numerical data and cannot be applied to categorical data.
B
Overlapped windowing requires choosing how much adjacent windows should overlap, typically to ensure sufficient data coverage and to capture information that would otherwise be split across window boundaries. However, an overly high overlap makes the features generated from adjacent windows too similar, and this limited variation among features can cause overfitting. Overlap also increases rather than reduces the number of windows to process (A), and a window size λ must still be chosen (C). This concept is discussed in the context of handling numerical data in the time domain, where overlapping windows are used to create features from sensory data.
So the answer is B.
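How overlap affects the generated features can be sketched with a toy signal. The window size corresponds to λ. The `overlap` fraction parameter, the helper names, and the use of the window mean as the only feature are assumptions of this sketch. With no overlap, each window yields a distinct mean. With 75% overlap, there are three times as many windows, and neighboring features differ only slightly.

```python
def sliding_windows(signal, window_size, overlap):
    """Split a sequence into fixed-size windows; overlap is the fraction shared by neighbors."""
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    step = max(1, int(window_size * (1 - overlap)))  # overlap 0 -> distinct windows
    return [signal[start:start + window_size]
            for start in range(0, len(signal) - window_size + 1, step)]


def window_means(signal, window_size, overlap):
    """One feature per window: the mean of its samples."""
    return [sum(w) / len(w) for w in sliding_windows(signal, window_size, overlap)]


signal = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
print(window_means(signal, 4, 0.0))   # [2.5, 6.5, 10.5]
print(window_means(signal, 4, 0.75))  # [2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5]
```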
Question: Fill in the blanks with the correct options:
Length is an example of (1)____ data because it can be (2)____ and measured. When variables like length are analyzed using Overlapped Windowing, applying too large a window size can lead to a (3)____ problem.
Options:
A. (1) Numerical, (2) Continuous, (3) Overfitting
B. (1) Categorical, (2) Discrete, (3) Underfitting
C. (1) Numerical, (2) Discrete, (3) Overfitting
D. (1) Categorical, (2) Continuous, (3) Overfitting
E. (1) Numerical, (2) Continuous, (3) Underfitting
A
Length is numerical data because it quantifies an amount. It is continuous because it can take any value within a range, not just integers. When such data are analyzed with Overlapped Windowing, choosing too large a window size can lead to overfitting: with a fixed step between windows, a larger window means adjacent windows share more data points, so the generated features become too similar (limited variation).