Introduction
Here are some study notes I've put together from Trailhead.com and Salesforce.com while preparing for my Salesforce Certified AI Associate Exam. These notes are just for learning and might not be perfect, so double-check everything with other sources. Some parts are copied directly.
Main Information
Content: 40 multiple-choice questions
Time allotted to complete the exam: 70 minutes
Passing score: 65%
Trailmix: Prepare for Your Salesforce AI Associate Credential
AI basics, its different types such as predictive analytics, machine learning, NLP, and computer vision
Salesforce's Trusted AI Principles, particularly in the context of CRM systems like Salesforce and its suite of products
The role of data quality, data preparation/cleansing, and data governance in training and fine-tuning AI models
Ethical and responsible handling of data, including privacy, bias, security, and compliance considerations
Ability to engage in meaningful discussions with stakeholders on how AI can be used to improve their business and differing scenarios, including identifying opportunities for AI-driven improvements and potential challenges
Exam Outline:
AI Fundamentals: 17%
Explain the basic principles and applications of AI within Salesforce.
Differentiate between the types of AI and their capabilities.
AI Capabilities in CRM: 8%
Identify CRM AI capabilities.
Describe the benefits of AI as they apply to CRM.
Ethical Considerations of AI: 39%
Describe the ethical challenges of AI (for example, human bias in machine learning, lack of transparency, etc.).
Apply Salesforce's Trusted AI Principles to given scenarios.
Data for AI: 36%
Describe the importance of data quality.
Describe the elements/components of data quality.
AI Fundamentals (17%)
Main Types of AI:
Numeric Predictions
AI predictions take the value of 0 (not going to happen) to 1 (totally going to happen)
predict next quarter’s sales
Classifications
flag fraudulent transactions
diagnose illnesses
identify toxic comments
Robotic Navigation
the case of autonomous (hands-free) driving
can adapt to changing environmental conditions
rescue robots that can traverse disaster areas, such as a collapsed building
Language Processing
NLP → Natural Language Processing.
NLP is a huge part of generative AI.
→ a subcategory of AI that takes words and turns them into unique images, sounds, and words.
How does machine learning work?
The core driver of AI. A process of using algorithms to tell something interesting about data without explicitly programmed rules. Computers learn from data with minimal programming.
Natural Language Processing (NLP) → systems that handle communication between people and machines.
Natural Language Understanding (NLU) → a machine's ability to understand what humans mean when they speak as they naturally would to another human.
Named Entity Recognition (NER) → data labeling. Breaking apart a sentence into segments that a computer can understand and respond to quickly.
Deep Learning → refers to artificial neural networks being developed between data points in large databases. Connecting dots to give insights, draw conclusions.
All Relies on Data
→ Structured Data: organized, with labels on every column.
→ lets computers do supervised learning
→ Unstructured Data: unorganized, news article, unlabeled image file.
→ unsupervised learning, which is when AI tries to find connections in the data without knowing what it's looking for.
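The distinction can be sketched in plain Python (a toy illustration with made-up numbers, not how real ML libraries work): supervised learning copies labels from known, labeled examples, while unsupervised learning has to find groupings in unlabeled data on its own.

```python
# Supervised: labeled (structured) data -> predict a label for a new point.
labeled = [(1.0, "small"), (1.2, "small"), (8.9, "large"), (9.3, "large")]

def predict(x):
    # 1-nearest-neighbor: copy the label of the closest known example.
    return min(labeled, key=lambda pair: abs(pair[0] - x))[1]

# Unsupervised: unlabeled data -> the algorithm must find the groups itself.
unlabeled = [1.0, 1.2, 8.9, 9.3]

def cluster(points, gap=3.0):
    # Group values that sit close together (a crude clustering heuristic).
    groups, current = [], [sorted(points)[0]]
    for p in sorted(points)[1:]:
        if p - current[-1] > gap:
            groups.append(current)
            current = []
        current.append(p)
    groups.append(current)
    return groups

print(predict(1.1))        # close to the "small" examples
print(cluster(unlabeled))  # two groups emerge without any labels
```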
AI is incomplete without Neural Networks
Products SF
Name | Description |
Einstein Bots | resolve customer issues, collect qualified customer information, and hand off customers to agents. |
Einstein Agent | case routing, automatic triaging, case field prediction. |
Einstein Discovery | take action with predictive service KPIs: real-time analysis of the drivers that impact KPIs, like churn or CSAT, with suggested recommendations and explanations, so managers are empowered to make more strategic decisions for their business. |
Einstein Vision for Field Service | automates image classification to resolve issues. Just by taking a picture of the object, Einstein Vision can instantly identify the part, ensuring accuracy for the technician and boosting first-time fix rates. |
Einstein Language | brings deep learning to developers. Devs can use pretrained models to classify text by sentiment as positive, neutral, or negative, and to classify the underlying intent in a body of text. |
Bots → Transparent, Personable, Thorough, Iterative.
Why we need Neural Networks?
Training AI by adding extra layers to find hidden meaning in data is what’s called deep learning.
designing a neural network involves choosing the number of nodes, layers, and the appropriate math for the task it’s training for.
Imagine a skilled talent scout who’s looking for the next great baseball player. “I’ll know him when I see him,” they might say. They can’t explain how they’ll know, they just will. In the same way, our neural network can’t explain why certain factors are important.
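To make "choosing the number of nodes and layers" concrete, here is a minimal forward pass through a two-layer network with hand-picked weights (a sketch only; real networks learn their weights from data during training):

```python
def relu(x):
    # A common activation function: pass positives through, zero out negatives.
    return max(0.0, x)

def dense(inputs, weights, biases):
    # One layer: each node computes a weighted sum of its inputs plus a bias.
    return [relu(sum(w * i for w, i in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

# A tiny network: 2 inputs -> 2 hidden nodes -> 1 output node.
hidden = dense([1.0, 2.0],
               weights=[[0.5, -0.2], [0.3, 0.8]],
               biases=[0.0, 0.1])
output = dense(hidden, weights=[[1.0, 1.0]], biases=[0.0])
print(output)
```

Adding more `dense` layers between input and output is exactly the "extra layers" that turn this into deep learning.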
AI for Biz
Yes-and-No Predictions and Answers
→ AI helps you answer these questions by analyzing historical data.
→ Scores are used (0 to 100).
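A minimal sketch of how a raw model probability (0 to 1) could be surfaced as a 0 to 100 score and a yes/no answer (the 0.5 cutoff is an assumed default; real products tune the threshold):

```python
def to_score(probability):
    # Map a model probability (0 to 1) onto the 0-100 score shown to users.
    return round(probability * 100)

def yes_no(probability, threshold=0.5):
    # Turn the probability into a yes/no answer at a chosen cutoff.
    return "yes" if probability >= threshold else "no"

print(to_score(0.73), yes_no(0.73))  # 73 yes
```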
Numeric Predictions
→ how much revenue?
→ how many days it will take us to resolve this customer’s issue
Classifications
→ use deep learning to operate on unstructured data like free text or images.
→ clustering: gathers insights from your data that you may not otherwise have noticed. if you are a clothing vendor, AI might learn that both rural older men and urban twentysomethings like to buy a certain type of sweater.
Where your intuition might tell you that these are two totally different groups, the data shows they behave similarly with respect to the products they buy, and you may want to market to those two groups in a similar way.
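The sweater example can be made concrete with a toy similarity measure (segment names and purchase counts here are hypothetical, and the Jaccard index stands in for what real clustering algorithms compute over much richer feature vectors):

```python
# Toy data: which products each customer segment buys.
purchases = {
    "rural_older_men":        {"wool_sweater": 9, "rain_boots": 7},
    "urban_twentysomethings": {"wool_sweater": 8, "sneakers": 6},
    "suburban_families":      {"minivan_organizer": 5, "sneakers": 2},
}

def similarity(a, b):
    # Jaccard index: share of products the two segments have in common.
    pa, pb = set(purchases[a]), set(purchases[b])
    return len(pa & pb) / len(pa | pb)

# Despite different demographics, the sweater buyers behave alike.
print(similarity("rural_older_men", "urban_twentysomethings"))
print(similarity("rural_older_men", "suburban_families"))
```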
Recommendations
→ people who bought a specific pair of shoes also often order a certain pair of socks.
→ recommend content like whitepapers to business users
Workflow And Rules
→ Workflow and rules aren’t technically part of AI, but they’re an essential part of how AI is used. Use workflow when the AI predicts that a customer is unlikely to renew.
Summarization
Gen AI helps summarize chunks of info into quick and easy to digest notes for you.
MARKETING
Send Time Optimization helps predict the best time to send a communication for the highest response rate, specific to each person.
Generative AI
Unlike traditional AI models, generative AI “doesn’t just classify or predict, but creates content of its own […] and, it does so with a human-like command of language.”
“AI is only as good as the data you give it and you have to make sure that the datasets are representative.”
Typically by using one of two types of deep learning models:
Generative Adversarial Networks (GANs)
made up of two neural networks: a generator and a discriminator.
the two networks compete with each other: the generator creates an output based on some input, the discriminator tries to determine whether that output is real or fake, and the generator then fine-tunes its output based on the discriminator’s feedback; the cycle continues until the generator stumps the discriminator.
Transformer Models
like ChatGPT (Chat Generative Pretrained Transformer), create outputs based on sequential data (like sentences or paragraphs) rather than individual data points.
Variational Autoencoders (VAEs)
rely on two neural networks to generate new data based on sample data; a related technique, neural radiance fields (NeRFs), is being used to create 2D and 3D images.
Salesforce’s CodeGen → turn English prompts into executable code.
Salesforce’s ProGen project creates language models based around amino acids instead of letters and words; with it, generative AI was able to produce proteins that have not been found in nature and, in many cases, are more functional.
But IT leaders are on guard: Nearly six in 10 (59%) said they think generative AI outputs are inaccurate.
Generative AI vs Predictive AI
GenAI → utilizes complex modeling to add a creative element. GenAI software creates images, text, video and software code based on user prompts.
PredAI → uses large data repositories to recognize patterns over time. PredAI apps draw inferences and suggest outcomes and future trends.
Parameters | Generative AI | Predictive AI |
Objective | Generates new, original content or data | Predicts and analyzes existing patterns or outcomes |
Function | Creates new information or content | Makes predictions based on existing data |
Training data | Requires diverse and comprehensive data | Requires historical data for learning and prediction |
Examples | Text generation, image synthesis | Forecasting, classification, regression |
Learning process | Learns patterns and relationships in data | Learns from historical data to make predictions |
Use cases | Creative tasks, content creation | Business analytics, financial forecasting |
Challenges | May lack specificity in output | Limited to existing patterns, may miss novel scenarios |
Training complexity | Generally more complex and resource-intensive | Requires less complex training compared to generative models |
Creativity | Generative AI is creative and produces things that have never existed before | Predictive AI lacks the element of content creation |
Different algorithms | Generative AI uses complex algorithms and deep learning to generate new content based on the data it is trained on | Predictive AI generally relies on statistical algorithms and machine learning to analyze data and make predictions |
Limitations of GenAI:
Vulnerability to adversarial attacks
Contextual ambiguity
Potential biases
PredAI use cases: financial services, fraud detection, healthcare, marketing.
Natural Language Processing Basics
NLP combines computer science and linguistics to give computers the ability to understand, interpret, and generate human language in a way that’s meaningful and useful to humans.
An early milestone in NLP was the Turing Test, devised by Alan Turing; the test measures a machine’s ability to answer any questions in a way that’s indistinguishable from a human.
NLP has two subfields
Data processed from unstructured to structured is called natural language understanding (NLU).
Data processed the reverse way–from structured to unstructured–is called natural language generation (NLG).
Elements of natural language in English include:
Vocabulary: The words we use
Grammar: The rules governing sentence structure
Syntax: How words are combined to form sentences according to grammar
Semantics: The meaning of words, phrases, and sentences
Pragmatics: The context and intent behind cultural or geographic language use
Discourse and dialogue: Units larger than a single phrase or sentence, including documents and conversations
Phonetics and phonology: The sounds we make when we communicate
Morphology: How parts of words can be combined or uncombined to make new words
Parsing Natural Language
Parsing involves breaking down text or speech into smaller parts to classify them for NLP.
Parsing includes syntactic parsing, where elements of natural language are analyzed to identify the underlying grammatical structure,
and semantic parsing, which derives meaning.
Syntactic parsing may include:
Segmentation: Larger texts are divided into smaller, meaningful chunks. Segmentation usually occurs at the end of sentences at punctuation marks to help organize text for further analysis.
Tokenization: Sentences are split into individual words, called tokens. In the English language, tokenization is a fairly straightforward task because words are usually broken up by spaces. In languages like Thai or Chinese, tokenization is much more complicated and relies heavily on an understanding of vocabulary and morphology to accurately tokenize language.
Stemming: Words are reduced to their root form, or stem. For example, breaking, breaks, or unbreakable are all reduced to break. Stemming helps to reduce the variations of word forms, but, depending on context, it may not lead to the most accurate stem. Look at these two examples that use stemming:
“I’m going outside to rake leaves.”
Stem = leave
“He always leaves the key in the lock.”
Stem = leave
Lemmatization: Similar to stemming, lemmatization reduces words to their root, but takes the part of speech into account to arrive at a much more valid root word, or lemma. Here are the same two examples using lemmatization:
“I’m going outside to rake leaves.”
Lemma = leaf
“He always leaves the key in the lock.”
Lemma = leave
Part of speech tagging: Assigns grammatical labels or tags to each word based on its part of speech, such as a noun, adjective, verb, and so on. Part of speech tagging is an important function in NLP because it helps computers understand the syntax of a sentence.
Named entity recognition (NER): Uses algorithms to identify and classify named entities–like people, dates, places, organizations, and so on–in text to help with tasks like answering questions and information extraction.
Sentiment analysis: Involves determining whether a piece of text conveys a positive, negative, or neutral attitude.
Intent analysis: Intent helps us understand what someone wants or means based on what they say or write.
Context (discourse) analysis: Natural language relies heavily on context. The interpretation of a statement might change based on the situation, the details provided, and any shared understanding that exists between the people communicating.
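A couple of the syntactic parsing steps above can be sketched in plain Python. The stemmer here is deliberately naive (just suffix stripping), to show both why "leaves" collapses to one stem regardless of meaning and how crude rules produce errors like "always" → "alway":

```python
import re

def tokenize(sentence):
    # Split a sentence into word tokens (fairly straightforward in English).
    return re.findall(r"[a-zA-Z']+", sentence.lower())

def naive_stem(word):
    # Crude stemming: strip common suffixes without looking at context.
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

tokens = tokenize("He always leaves the key in the lock.")
print(tokens)
print([naive_stem(t) for t in tokens])
# "leaves" becomes "leave" whether it means the verb or the plant part,
# and "always" is wrongly chopped to "alway" -- lemmatization, which
# considers part of speech, is what fixes both problems.
```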
AI from A to Z: the GenAI Glossary:
Anthropomorphism | The tendency for people to attribute human motivation, emotions, characteristics or behavior to AI systems. |
Artificial neural network (ANN) | An Artificial Neural Network (ANN) is a computer program that mimics the way human brains process information. |
Augmented intelligence | Augmented intelligence is a human-centered design pattern that uses artificial intelligence to enhance human intelligence, rather than replace it. Artificial intelligence uses machines to mimic human behavior and is intended to operate without human assistance; augmented intelligence uses machines with a different goal in mind: enhancing human intelligence. |
Conversational AI | a type of artificial intelligence that uses machine learning, natural language processing, and foundation models to simulate human conversation. |
Deep learning | An advanced form of AI that helps computers become really good at recognizing complex patterns in data. useful for things like image recognition, speech processing, and natural-language understanding. |
Discriminator (in a GAN) | the discriminator is like a detective. When it’s shown pictures (or other data), it has to guess which are real and which are fake. The “real” pictures are from a dataset, while the “fake” ones are created by the other part of the GAN, called the generator (see Generator below). The discriminator’s job is to get better at telling real from fake, while the generator tries to get better at creating fakes. This is the software version of continuously building a better mousetrap. |
Ethical AI maturity model | a framework that helps organizations assess and enhance their ethical practices in using AI technologies. |
Explainable AI (XAI) | Explainable AI (XAI) should provide insight into what influenced the AI’s results, which will help users to interpret (and trust!) its outputs |
Generative AI | the field of artificial intelligence that focuses on creating new content based on existing data (https://www.salesforce.com/news/stories/what-is-generative-ai/). |
Generative adversarial network (GAN) | GANs are made up of two neural networks: a generator and a discriminator. The two networks compete with each other, with the generator creating an output based on some input, and the discriminator trying to determine if the output is real or fake. The generator then fine-tunes its output based on the discriminator’s feedback, and the cycle continues until it stumps the discriminator. |
Generative pre-trained transformer (GPT) | GPT is a neural network family that is trained to generate content. GPT models are pre-trained on a large amount of text data, which lets them generate clear and relevant text based on user prompts or queries. |
Generator | A generator is an AI-based software tool that creates new content from a request or input. It will learn from any supplied training data, then create new information that mimics those patterns and characteristics. ChatGPT by OpenAI is a well-known example of a text-based generator. |
Grounding | about ensuring that the system understands and relates to real-world knowledge, data, and experiences. It’s a bit like giving AI a blueprint to refer to so that it can provide relevant and meaningful responses rather than vague and unhelpful ones. For example, if you ask an AI, “What is the best time to plant flowers?” an ungrounded response would be, “Whenever you feel like it!” A grounded response would tell you that it depends on the type of flower and your local environment. The grounded answer shows that AI understands the context of how a human would need to perform this task. |
Hallucination | A hallucination happens when generative AI analyzes the content we give it, but comes to an erroneous conclusion and produces new content that doesn’t correspond to reality or its training data. |
Human in the Loop (HITL) | making sure that we offer oversight of AI output and give direct feedback to the model, in both the training and testing phases, and during active use of the system. |
Large language model (LLM) | An LLM is a type of artificial intelligence that has been trained on a lot of text data. It’s like a really smart conversation partner that can create human-sounding text based on a given prompt. Some LLMs can answer questions, write essays, create poetry, and even generate code. |
Machine learning | is how computers can learn new things without being programmed to do them. |
Machine learning bias | Machine learning bias happens when a computer learns from a limited or one-sided view of the world, and then starts making skewed decisions when faced with something new. |
Model | a program that’s been trained to recognize patterns in data. You could have a model that predicts the weather, translates languages, identifies pictures of cats, etc. Just like a model airplane is a smaller, simpler version of a real airplane, an AI model is a mathematical version of a real-world process. |
Parameters | Parameters are numeric values that are adjusted during training to minimize the difference between a model’s predictions and the actual outcomes. They define the LLM’s structure and behavior and help it to recognize patterns, so it can predict what comes next when it generates content. |
Prompt defense | Specifying which terms and topics you don’t want your machine learning model to address. |
Prompt engineering | Prompt engineering means figuring out how to ask a question to get exactly the answer you need. It’s carefully crafting or choosing the input (prompt) that you give to a machine learning model to get the best possible output. |
Red-Teaming | The term “red-teaming” is drawn from a military tactic that assigns a group to test a system or process for weaknesses. When applied to generative AI, red-teamers craft challenges or prompts aimed at making the AI generate potentially harmful responses. By doing this, they are making sure the AI behaves safely and doesn’t inadvertently lead to any negative experiences for the users. It’s a proactive way to ensure quality and safety in AI tools. |
Reinforcement learning | a technique that teaches an AI model to find the best result via trial and error, as it receives rewards or corrections from an algorithm based on its output from a prompt. Think about training an AI as somewhat like teaching your pet a new trick. Your pet is the AI model, the pet trainer is the algorithm, and you are the pet owner. With reinforcement learning, the AI, like a pet, tries different approaches. When it gets it right, it gets a treat or reward from the trainer, and when it’s off the mark, it’s corrected. Over time, by understanding which actions lead to rewards and which don’t, it gets better at its tasks. Then you, as the pet owner, can give more specific feedback, making the pet’s responses refined to your house and lifestyle. |
Transformer | Transformers are a type of deep learning model, and are especially useful for processing language. They’re really good at understanding the context of words in a sentence because they create their outputs based on sequential data (like an ongoing conversation), not just individual data points (like a sentence without context). The name “transformer” comes from the way they can transform input data (like a sentence) into output data (like a translation of the sentence). |
Validation | Validation is a step used to check how well a model is doing during or after the training process. The model is tested on a subset of data (the validation set) that it hasn’t seen during training, to ensure it’s actually learning and not just memorizing answers. |
Zero data retention | prompts and outputs are erased and never stored in an AI model. |
Zone of proximal development (ZPD) | An education concept. For example, each year students progress their math skills from adding and subtracting, then to multiplication and division, and even up to complex algebra and calculus equations. The key to advancing is progressively learning those skills. In machine learning, ZPD is when models are trained on progressively more difficult tasks, so they will improve their ability to learn. |
AI Capabilities in CRM (8%)
Salesforce Einstein
AI Assistants
Voice input
Natural language understanding
Voice output (natural language generation)
Intelligent interpretation
Einstein Out-of-the-box Apps
Einstein Platform → Discover, Predict, Recommend, Automate, Generate
Einstein Prediction Builder → a simple point-and-click wizard that allows you to make custom predictions on your non-encrypted Salesforce data.
Einstein Next Best Action → use rules-based and predictive models to provide anyone in your business with intelligent, contextual recommendations and offers. Actions are delivered at the moment of maximum impact—surfacing insights directly within Salesforce.
Einstein GPT → Salesforce’s generative AI for CRM.
Einstein Bots
Needs to have Service Cloud License.
Einstein Next Best Action
AI Is Only as Good as the Data
Sales Cloud Einstein.
Einstein Lead Scoring - This feature applies the power of AI to analyze your history of lead conversions and find the true patterns in those conversions
Einstein Opportunity Insights offers smart predictions and follow-ups about different opportunities precisely when they’re needed
Einstein Account Insights helps your sales team maintain their relationships with customers by keeping the team informed about key business developments that affect customers.
Einstein Activity Capture, activity-related insights (such as when a contact mentions a competitor or is leaving their company) are identified from recent emails and events related to accounts.
Einstein Email Insights → Actionable intelligence in reps’ inboxes helps identify which customers need attention.
Einstein Automated Contacts → Contact records are automatically added to Salesforce, so reps spend less time on data entry.
Einstein Activity Capture → Having email and event data automatically logged on Salesforce records increases reps’ productivity and visibility into potential customers.
Einstein Readiness Assessor is a tool that tells you whether you meet the requirements for specific Sales Cloud Einstein features.
Einstein Discovery
Salesforce Einstein Discovery augments your business intelligence with statistical modeling and supervised machine learning in a no-code-required, rapid-iteration environment.
Note: Einstein Discovery requires either the CRM Analytics Plus license or Einstein Predictions license, both of which are available for an extra cost.
Target Business Outcomes to Improve
Begin by selecting a business problem you want to solve, typically monitored as a key performance indicator (KPI). Einstein Discovery-powered solutions address these use cases:
Regression for numeric outcomes represented as quantitative data (measures), such as currency, counts, or any other quantity.
Binary classification for text outcomes with only two possible results. These are typically yes or no questions that are expressed in business terms, such as churned or not churned, opportunity won or lost, employee retained or not retained, and so on.
Multiclass classification for text outcomes with 3 to 10 possible results. For example, a manufacturer can predict, based on customer attributes, which of five service contracts a customer is most likely to choose.
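The three use cases amount to a rule of thumb about the outcome column. This sketch is not Einstein Discovery's actual logic, just an illustration of the rules described above:

```python
def suggested_model_type(outcome_values):
    # Pick a use case from the shape of the outcome column:
    # numeric -> regression; 2 text values -> binary; 3-10 -> multiclass.
    if all(isinstance(v, (int, float)) for v in outcome_values):
        return "regression"
    distinct = len(set(outcome_values))
    if distinct == 2:
        return "binary classification"
    if 3 <= distinct <= 10:
        return "multiclass classification"
    return "unsupported"

print(suggested_model_type([1200.0, 540.5, 980.0]))          # regression
print(suggested_model_type(["won", "lost", "won"]))          # binary classification
print(suggested_model_type(["plan_a", "plan_b", "plan_c"]))  # multiclass classification
```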
Ethical Model Development with Einstein Discovery
Disparate Impact → it means that the data reflects discriminatory practices toward a particular group.
Proxy values are other attributes in your dataset that are correlated with sensitive variables.
A model is a sophisticated custom equation based on a comprehensive statistical understanding of past outcomes that's used to predict future outcomes.
Einstein Prediction Service: a public REST API service that lets you programmatically interact with Einstein Discovery–powered models and predictions.
Get predictions on your data.
Get suggested actions to take to improve predicted outcomes.
Manage prediction definitions and models that are deployed in Salesforce.
Manage bulk scoring jobs.
Manage model refresh jobs.
Ethical Considerations of AI (39%)
Audience Targeting, Collect and Respect Preferences, Frequency Capping,
[Behavioral Messaging Scenarios]
Abandon Cart is the canonical retailer use case for behavioral messaging. Abandon Browse.
Price Drop → If you let a consumer know when the product goes on sale, your brand message is meaningful, relevant, and in the consumers’ interest. Also consider adding a feedback mechanism that lets the consumer respond.
New Item in Favorite Category or New Promotional Campaign →
Post purchase → Give customers a reason to come back to you
Frequency Capping
Relationship Design
Human-centered design (HCD)
method of creative problem-solving that leads to a desired outcome.
User experience (UX) design—The creation of meaningful and relevant experiences for users.
Service design—The creation of consistent user or customer experiences across multiple interactions.
UX design and service design describe what someone is designing: a product or a service. HCD is how something is designed. HCD methods have long been applied to UX and service design. It’s a flexible and powerful practice. What you do with it is all about where you focus.
Relationship Design is the creation of experiences that foster ongoing engagement and strengthen connections between people, companies, and communities over time.
Service design focuses on:
How easily someone can accomplish their goals during an interaction, whether online or in person.
How consistent the brand experience is for customers across distinct interactions—whether they’re online, in an app, on different devices, or in person.
How organizations are set up to deliver great experiences to customers.
Relationship Design → Engagement, Connection, Social Values.
Success is at the intersection of desirability, feasibility, and viability.
Mindsets of Relationship Design
**Compassion mindset** — to lead with strengthening connection
**Courage mindset** — to push ourselves to be vulnerable
**Intention mindset** — to engage with clear purpose
**Reciprocity mindset** — to exchange value in service of longevity
Cycle of Exclusion
Why we make: The motivations of the problem solver.
Who makes it: The problem solver.
How we make: The methods and resources the problem solver uses.
Who uses it: The assumptions the problem solver makes about the people who use the solution.
What we make: The solution or product that the problem solver creates.
Risk of a Hero Complex
A common pitfall to avoid is wanting to be the hero who saves the day with a design. Yes, great design can have a meaningful impact and you may even be called a superhero for creating it. But making assumptions about what products or solutions a group might need will likely lead to trouble.
Ability Bias → the habit of using our own abilities as the baseline to solve problems.
Disability About Mismatches
<aside> 💡 Disability ≠ personal health condition → Disability = mismatched human interactions
</aside>
<aside> 💡 The persona spectrum is what we use in inclusive design to solve for one person and then extend to many
</aside>
[x] Relationship Design
[x] Ethics by Design
[x] Inclusive Design
[x] Values-Driven Design
Values → key character traits or codes of conduct that individuals and organizations aspire to or embody.
[x] Design as a Social Practice
[x] Accountability in Design
[x] Relationship Design at Scale
Trusted AI Principles: Responsible, Accountable, Transparent, Empowering, Inclusive
Consumers don’t trust AI systems, but they expect companies to use them responsibly.
Data for AI (36%)
Data Literacy
Tableau Products
Tableau Prep Builder
Tableau Desktop
Tableau Server
Tableau Cloud
Tableau Public
Tableau Desktop Public Edition
Tableau Mobile
Asking Why
The 5 Whys technique, developed by Toyota Industries founder Sakichi Toyoda, proposes asking why? of a problem that’s been identified, and then continuing to ask why for each answer or explanation given. Although the primary goal of the technique is to determine the root cause of a defect, we can use this technique to dig into the causes of any outcome.
Discover Variables and Field Types
Qualitative variables are variables that cannot be measured numerically, such as categories or characteristics. Categorize. Segment.
Nominal: cannot be ranked (banana, grapes, and apples are nominal variables because there is no implied ranked order among them).
Ordinal → can be ranked. Ordinal = ordered.
Quantitative variables are variables that can be measured numerically, such as the number of items in a set. Can be aggregated.
Discrete Variables: individually separate & distinct. E.g., 3 children, 6 children → when you can count them separately.
Continuous Variables: Continuous means forming an unbroken whole, without interruption. Volume of water in the ocean.
Variance: how data points vary from the mean.
Standard deviation: a measure of the spread of statistical data; the square root of the variance.
Here are the major characteristics of a normal distribution.
They are symmetrical around the mean.
The mean and median are equal.
The area under the normal curve is equal to 1.0 (or 100%).
They are denser in the center and less dense in the tails.
They are defined by two parameters, the mean and the standard deviation.
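Python's standard library can illustrate these statistics. For the symmetric sample below, the mean and median coincide, echoing that same property of the normal distribution:

```python
import statistics

# A small, symmetric sample (so mean and median coincide).
sample = [1, 2, 3, 4, 5, 6, 7]

mean = statistics.mean(sample)
median = statistics.median(sample)
variance = statistics.pvariance(sample)  # average squared distance from the mean
std_dev = statistics.pstdev(sample)      # square root of the variance

print(mean, median, variance, std_dev)
```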
Inference is the process of drawing conclusions about a population based on a sample of the data.
Guidelines to Recognize Misleading Charts
The bar charts should have a zero baseline.
get to know source (who, what, where, when, why)
Axes (how the data is displayed in chart)
Horizontal axis → x-axis
Vertical axis → y-axis
A scatterplot shows the relationship between two quantitative variables. The data is plotted as Cartesian coordinates, marking how far over and how far up each data point is.
A line chart connects a series of quantitative values, and is often used to show a time series (where the x-axis is time). Also known as a line graph or line plot.
A histogram depicts a distribution of data and the frequency of values in a data set as connected bars. The width of the bars is tied to the values on the x-axis. Statisticians, scientists, and analysts refer to the widths of each bar as bins.
A box and whisker plot shows the distribution of data using percentiles. Also known as a box plot.
An interval is the distance between the values (tick marks) on a quantitative axis.
use SCAM Checklist: Source, Chart, Axes on charts, Message
EU Privacy Law
General Data Protection Regulation (GDPR)
Term | Definition | Example |
Data Subject | A “natural person” who can be directly or indirectly identified by information such as a name, an identification number, location data, an online identifier (such as a username), or their physical, genetic, or other identity. | Marie Dubois |
Personal Data | Any information relating to an identified or identifiable data subject. | Woman. Age 48. Ph#: 33 1 7210 940. Address: 99 Place de l'Étoile, 75008 Paris, France. Likes hats. Reads Le Monde online every day. |
Sensitive Personal Data | Personal data pertaining to racial or ethnic origin, political opinions, religious or philosophical beliefs, trade-union membership, information about health, sex life and sexual orientation, and genetic or biometric data. | Member of En Marche! Party. Catholic. Broke leg last year. Copy of fingerprints and retinal scan. |
Processing | Anything that is done to or with personal data. | Any collection, storage, transfer, sharing, modification, use, or deletion of personal data. |
Controller | An entity that determines the purposes and means of processing of personal data. | Grande Banque du Nord is a financial institution that is providing Marie with a mortgage to buy a house. When Marie first registers on Grande Banque's website to get more information about mortgages, Grande Banque becomes a controller of the personal data Marie provides. |
Processor | An entity that processes personal data based on the instructions of a controller. | Salesforce becomes a processor of Marie’s personal data when Grande Banque uploads her data to its Sales Cloud instance. |
Pseudonymous Data | Personal data that cannot be tied to a specific data subject without additional information that is stored separately, with technological measures to ensure the data is not combined with that additional information. | When Marie visits the Grande Banque website portal hosted on Experience Cloud to learn more about the mortgage process, the system records her IP address in hashed form and links it to the pages that Marie views. The hashed IP address is considered pseudonymous data, because, although the hashed IP address alone does not identify Marie, it’s still possible to link it to other information that relates to Marie. |
Anonymous Data | Data that cannot ever be connected to an identified or identifiable person. | The Grande Banque website asks people to leave reviews. The system does not collect any information from reviewers—not even IP addresses. The reviews themselves can be considered anonymous. |
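The hashed-IP example from the table can be sketched with Python's hashlib. The salt value is hypothetical, and real pseudonymization also requires that the secret be stored separately with organizational safeguards, but the mechanics look like this:

```python
import hashlib

SALT = "site-specific-secret"  # stored separately from the hashed data

def pseudonymize(ip_address):
    # Hash the IP with a secret salt: the hash alone no longer identifies
    # the visitor, but the site can still link repeat visits together.
    return hashlib.sha256((SALT + ip_address).encode()).hexdigest()

a = pseudonymize("203.0.113.7")
b = pseudonymize("203.0.113.7")
print(a == b)  # True: the same visitor always maps to the same pseudonym
print(a[:12])  # the pseudonym itself reveals nothing about the IP
```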
Data Requirements
Einstein Discovery requires a CRM Analytics dataset with at least 3 columns: one outcome variable plus two explanatory or predictor variables. Einstein Discovery supports datasets with up to 50 variables.
Other considerations: time frames, how much data to get, and time series.
Data Quality Dimensions:
Completeness.
Validity
Uniqueness
Timeliness
Consistency
Accuracy
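Three of these dimensions can be measured directly on raw records. A sketch (field names and sample data are illustrative):

```python
records = [
    {"id": 1, "email": "a@example.com", "country": "FR"},
    {"id": 2, "email": "",              "country": "FR"},      # incomplete
    {"id": 2, "email": "b@example.com", "country": "France"},  # duplicate id, invalid code
]

def completeness(rows, field):
    # Share of rows where the field is actually filled in.
    return sum(1 for r in rows if r[field]) / len(rows)

def uniqueness(rows, field):
    # Share of values in the field that are distinct.
    values = [r[field] for r in rows]
    return len(set(values)) / len(values)

def validity(rows, field, allowed):
    # Share of values that conform to the expected code list.
    return sum(1 for r in rows if r[field] in allowed) / len(rows)

print(completeness(records, "email"))              # one missing email
print(uniqueness(records, "id"))                   # duplicate id detected
print(validity(records, "country", {"FR", "DE"}))  # "France" fails the code list
```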