Category: Social Media

  • How We Built a Facebook Inspector

    The Citizen Browser project seeks to illuminate the content Facebook elevates in its users’ feeds

    By: Surya Mattu, Leon Yin, Angie Waller, and Jon Keegan

    Originally published on themarkup.org

    Social media platforms are the broadcasters of the 21st century.

    Like traditional broadcasters, social media platforms choose—through their algorithms—which stories to amplify and which to suppress. But unlike traditional broadcasters, the social media companies are not held accountable by any standards other than their own ever-changing decisions on the types of speech they will allow on their platforms.

    And their algorithms are not transparent. Unlike on the evening news broadcast, no one can see what they have decided will be the top story of the day. No two people see exactly the same content in their personalized feeds. As a result, it is difficult for independent groups to track the spread of misinformation like the “Plandemic” video, which garnered millions of views on both Facebook and YouTube before being removed.

    So we decided to try to monitor what is being algorithmically broadcast to social media users by bringing together a first-of-its-kind national panel of users who are paid to share their data with us. We started with Facebook, which has more than 2.7 billion monthly active users.

    We built a custom standalone desktop application that was distributed to a panel of more than 1,000 paid participants. These panelists provided us with demographic information about themselves—gender, race, location, age, political leanings, and education level—and connected the Citizen Browser application to their personal Facebook accounts. The application periodically captures data from the panelists’ Facebook feeds.

    To protect the privacy of panelists, we automatically strip potential identifiers from their captured Facebook data. The raw data we collect from them is never seen by a person and is automatically deleted after one month.

    After the redaction process, we store links, news articles, and promoted groups and pages in a database for analysis. The data we collect through the app is used, in combination with demographic and political data provided by the panelists, to determine what information Facebook serves to different people, what news and narratives are amplified, and which online communities people are encouraged to join. The application, data-processing pipeline, and underlying cloud infrastructure were audited by a third-party security research firm, Trail of Bits. It carried out a security assessment and reviewed our code for best practices in securely processing panelists’ data. We took additional steps to protect user data based on the security firm’s recommendations. We describe these privacy-preserving steps in more detail in the Redactors section and Appendix 2.

    Background

    According to a recent study by Pew Research, about one in five Americans say they get their political news primarily through social media. But very little is known about the workings of the algorithms that decide which content to recommend to which people.

    Facebook discloses some general principles about how its algorithm works: It says it prioritizes content based on who posted it, what type of content it is, and whether the post has attracted a lot of shares and reactions. But it has not allowed much independent research to be conducted on its platform.

    In the wake of Cambridge Analytica, Facebook added sweeping restrictions on the use of its core Facebook Open Graph developer API and has increased the use of human reviews to approve developer apps. For example, just a few years ago it was very easy to collect the public posts for any page on Facebook (which was an important way to track news sources on the platform), but that availability has since been restricted to top-level metadata about public pages.

    In 2018, Facebook announced a collaboration with the independent academic researchers at Harvard University’s Social Science One. Facebook committed to sharing more than a petabyte of data with researchers whose proposals were accepted by an independent committee. But after more than 18 months of delay, Facebook did not live up to its promises. Instead, researchers were given access to an extremely limited dataset and CrowdTangle, a social analytics firm owned by Facebook. In response to these shortcomings, the project co-chairs wrote that the “current situation is untenable,” and philanthropic partners began to leave the project.

    In 2020, Facebook announced a new research partnership to better understand the impact of Facebook and Instagram on key political attitudes and behaviors during the 2020 U.S. elections; Social Science One facilitated the start of the project. Facebook said it does not expect the study to result in published findings until mid-2021 at the earliest.

    The main source Facebook makes available for journalists and researchers to understand the patterns on its platform is CrowdTangle, which it bought in 2016. CrowdTangle offers a robust view of Facebook and Instagram engagement for posts, links, videos from public pages, groups, and verified users. Importantly, it does not provide data about the number of times content is shown to users.

    Facebook has publicly criticized journalists who use CrowdTangle to understand what is being amplified on Facebook. In order to measure popularity, Facebook says, you would need to measure the number of people who see the post. However, at the moment Facebook does not make impression data available publicly.

    Citizen Browser is an attempt to examine those algorithms by assembling a demographically diverse panel of Facebook users and monitoring what content is recommended to them.

    Prior Work

    Citizen Browser builds upon other work that attempts to understand the Facebook ecosystem.

    Blue Feed, Red Feed was a 2016 project from Wall Street Journal reporter Jon Keegan (who is now at The Markup and a contributor to this methodology) that used Facebook’s own data to examine the sharing habits of 10 million U.S. users over the course of six months. Based on self-described political beliefs and the users’ sharing habits, the Journal used the news sources most strongly aligned with its most partisan users to display side by side a simulated view of what a liberal and conservative news feed might look like.

    NYU Ad Observatory is a browser-extension-enabled project that archives and shares ads and metadata from Facebook and Google’s political ad libraries as well as targeted ads served to volunteers who’ve downloaded the extension and signed into Facebook on their desktops. In an attempt to squelch third-party data collection, Facebook sent a letter to NYU in the run-up to the U.S. 2020 presidential election demanding an end to the project.

    Nieman Lab used Amazon’s Mechanical Turk platform to survey 173 people about the news sources they saw in their news feed. Surprisingly, for the incredibly busy news cycles of October 2020, it found that a majority of the sampled users saw no news at all (in the first 10 posts of their feeds).

    For an opinion piece discussing Baby Boomers’ exposure to conspiracy theories and misinformation on Facebook, The New York Times’s Charlie Warzel did a similar experiment, observing the Facebook feeds of two strangers who agreed to share their credentials.

    The Citizen Browser Panel

    Description

    Citizen Browser monitors the content Facebook presents to its users in their news feed along with what groups and pages are suggested to them.

    The panel is currently composed of participants from 48 U.S. states. We used a survey research provider to invite a nationally representative sample of U.S. adults to join the project as paid participants. Because we could only accept participants who used a desktop or laptop, had installed the Chrome web browser, and were active users of Facebook, it was difficult to get participants. About 95 percent of the people we approached failed to complete the registration requirements. The panel size also fluctuated: As panelists dropped out for various reasons, we recruited fresh participants.

    To most accurately describe the demographic makeup of this dynamic panel as of the time of publication, we tabulated the demographic composition of our panel group based on panelists who kept the application connected between Nov. 30 and Dec. 30, 2020, and had at least 20 data captures within that period.

    The tables below describe the demographics of our panel during that December time frame, alongside our target demographics based on national averages from the 2016 American Community Survey from the U.S. Census Bureau.

    Despite our best efforts, we failed to reach our targets for Hispanic and Latino panelists, a challenge that other pollsters have faced as well. We also failed to reach our targets for Trump voters, a phenomenon that pollsters similarly faced in the run-up to the presidential election. Our panel is also older and more educated than the U.S. population, which reflects desktop computer usage.

    We describe the challenges we had reaching these target demographics in the Limitations section.

    Panel Makeup

    Citizen Browser Data Collection

    The Application

    The Citizen Browser app is a standalone desktop application based on the open source Electron JavaScript framework. The app is compatible with Windows and macOS operating systems. Panelists download the application through the panel management portal after taking a short demographic survey. When they download the app, the panelists are asked to sign in to Facebook in a browser that is controlled by the application. Once they have successfully signed in, panelists are no longer required to interact with the application.

    The app is designed to run 24/7 and remains open in the background of the user’s computer, minimized in the Start Bar or Finder toolbar. The app performs Facebook captures between one and three times a day using NGFetch, a proprietary browser automation tool developed by Netograph. NGFetch uses a combination of the Chrome Devtools Protocol and JavaScript to load and interact with webpages. NGFetch captures data from the browser including HTML, a screenshot of the page, and all metadata other than large response bodies (HTML/images/CSS), as described here by Netograph.

    To capture data, the application visits the following Facebook URLs with a signed-in browser profile:

    1. https://facebook.com
    2. https://facebook.com/groups
    3. https://facebook.com/pages/?category=top
    4. https://www.facebook.com/groups/discover

    The app collects HTML source code and screenshots from Facebook. This includes the Facebook homepage, suggested groups, and recommended pages. We focus only on items promoted through shared links on these pages. This includes advertisements, public posts, publicly shared video links (not the videos themselves), shared links and reaction counts (no text or usernames are captured), suggested groups, and suggested pages. For a detailed description of what data we collect, see Appendix 3.
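    The capture routine above can be sketched in a few lines of Python. The real NGFetch tool is proprietary and its output layout is not documented here, so the file-naming scheme below is purely hypothetical; only the list of captured URLs comes from this methodology.

```python
from datetime import datetime, timezone
from urllib.parse import urlparse

# The four Facebook pages the app captures, per the methodology above.
CAPTURE_URLS = [
    "https://facebook.com",
    "https://facebook.com/groups",
    "https://facebook.com/pages/?category=top",
    "https://www.facebook.com/groups/discover",
]

def capture_manifest(urls, ts=None):
    """Build (url, html_filename, screenshot_filename) tuples for one run.

    The naming scheme is a hypothetical stand-in for NGFetch's real output.
    """
    ts = ts or datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    manifest = []
    for url in urls:
        # Derive a filesystem-safe slug from the URL path.
        slug = urlparse(url).path.strip("/").replace("/", "_") or "home"
        manifest.append((url, f"{ts}_{slug}.html", f"{ts}_{slug}.png"))
    return manifest
```

    A capture run would then loop over the manifest, load each URL in the signed-in browser profile, and save the page source and screenshot under those names.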

    The app does not collect contents or photos from personal, direct messages. It does collect photos that are shared on users’ homepages, but we use a computer program to identify and discard them without human intervention. We redact and discard photos, comments, and identifiers such as users’ names and friends’ names.

    The app does not track user behavior on Facebook, and it does not collect any user browsing information—even if the user is on Facebook when the app is running.

    Once the capture routine is complete, the collected files are compressed in a zip archive and uploaded to our cloud infrastructure. That data is immediately processed to remove information that could identify panelists and their friends, including account names, usernames, the names of social media connections and contacts, profile pictures, and friend requests. (See The Redactors, below.)

    The Redactors

    To ensure that we protect user privacy, we built redactors that strip potentially identifying information from data collected from panelists. The raw data we collect from panelists is never seen by a person and is automatically deleted after one month. The only data that is ever analyzed is the post-redacted data.

    The automatic redactors operate by finding the XPaths of elements based on accessibility features (ARIA), data attributes, and the content of href links. To account for multiple interface designs available on Facebook, we wrote redactors and parsers customized for each version.

    The redactors block out sections of screenshots and scrub the saved source code of identifiers. We consider the following page elements to be identifiers: Facebook video stories, video chat rooms, comments and replies on posts, friend requests, birthdays, contacts, messenger conversations, notifications, pages or groups that a panelist might manage, private usernames and avatars, posts that contain memories, profile picture updates, “People you may know,” or crisis responses.

    We use the resulting XPaths to identify rectangles to redact in the screenshots and elements to scrub in the source code. We also parse embedded JavaScript found in the head elements of source code to identify and remove all instances of the Facebook ID and the first and last name of each panelist.
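    The source-code scrubbing step can be sketched with Python's standard-library ElementTree, assuming pre-parsed, well-formed markup. This is a simplification: the project's real redactors handle full Facebook HTML, also black out screenshot rectangles, and match far more signals; the aria-label list here is illustrative, not the actual match list.

```python
import xml.etree.ElementTree as ET

# Illustrative subset of the identifier categories listed above.
REDACT_LABELS = {"Friend requests", "Notifications", "Contacts"}

def scrub(fragment):
    """Remove elements whose aria-label marks identifying UI; return clean markup."""
    root = ET.fromstring(fragment)
    # ElementTree has no parent pointers, so walk each parent's children.
    for parent in root.iter():
        for child in list(parent):
            if child.get("aria-label") in REDACT_LABELS:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")
```

    For real-world Facebook pages, an HTML-tolerant parser such as lxml.html would be needed in place of the strict XML parser used here.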

    The original capture is hosted on a cloud storage service with a strict no-access policy. This data can only be accessed by the automated redactors and is deleted after one month. The redacted copy of the file is then processed. There is more information about how we secure user data in Appendix 1. The full list of what we store after redaction is in Appendix 3.

    Limitations

    Citizen Browser’s analysis is limited by the following factors:

    1. Demographic balance
      1. a) System compatibility and smartphone dependency
        The Citizen Browser application requires Mac or Windows computers for security reasons and could not be used by people who access the internet only through mobile devices. In its Internet/Broadband Fact Sheet, Pew Research reports that in 2019, 23 percent of Black and 25 percent of Hispanic households relied on smartphones for internet access, compared with 12 percent of White households. Among adults with less than a high school education, 32 percent relied on smartphones.
      2. b) Trust in surveys and political leanings
        About 95 percent of people contacted for the panel chose not to participate, whether from distrust of having a third-party application installed on their computer or from other privacy concerns. Additionally, we faced a challenge in getting a politically balanced sample, similar to the challenges found in presidential polling research, which has suggested that distrust in polling is correlated with conservative leanings.
    2. Data capture
      1. We try to ensure that our parsers are up to date, but it’s possible that some data is being lost as Facebook is constantly updating its user interface and running A/B tests on its users; our parsers might not work equally well on the different versions. In addition, NGFetch relies on the Chrome Web Browser. It’s possible that there are differences in Facebook’s targeting or behavior for users of non-Chrome browsers.
    3. Obscure algorithms
      1. Due to the variety of signals that go into the content Facebook shows to its users, we cannot determine why any particular piece of content is shown. The only exception to this is when that information is specifically made available by Facebook, as it does with the “Why am I seeing this ad?” feature.

    Appendices

    Appendix 1: Security Features

    The app incorporates code signing, a security technology that adds a unique signature to the app. This allows the end user’s system to know the code was authored by The Markup and detect if the app was compromised during an update. To fulfill the requirements on macOS, the app was registered in the Apple Developer Program. On Windows, the app is signed with a Windows Authenticode certificate.

    The application has highly restricted access to the cloud server. It uses a one-time presigned URL to upload the files to Amazon’s S3 service. This presigned URL forces the uploaded file to conform to its assigned file size and timestamp, preventing unexpected files from being uploaded. Once this zip file of the capture is uploaded, it is deleted from the user’s computer.
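    An upload against a presigned URL is a single authenticated PUT of the archive bytes; the sketch below shows the general shape using only the standard library. The URL string and header choices are hypothetical, and the app's real upload client is not published.

```python
import urllib.request

def build_upload_request(presigned_url, zip_path):
    """Prepare a one-shot PUT of a capture archive to a presigned S3 URL.

    Hypothetical sketch: the presigned URL carries the authorization in its
    query string, so no separate credentials are attached here.
    """
    with open(zip_path, "rb") as f:
        body = f.read()
    return urllib.request.Request(
        presigned_url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/zip"},
    )

# Sending would be: urllib.request.urlopen(build_upload_request(url, path))
```
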

    The Citizen Browser application never sees the panelist’s login information. The application is bundled with a custom capture tool that communicates with a Chrome browser using the Chrome Developer Tools Protocol and JavaScript. This tool acts as a firewall built into the app and prevents it from having any access to the user’s login details entered in the Chrome browser.

    A new browser profile for the Chrome browser is generated during the onboarding process and then used for captures. This browser profile is completely isolated from the application and never leaves the panelist’s computer. When the panelist uninstalls the application, this profile is deleted from the computer.

    The raw data collected from users is automatically deleted after a month. The S3 bucket with the uploaded zip has an access policy allowing only the cloud computer (AWS lambda) running the redactors to view the data. Only a few employees of The Markup have permissions allowing them to modify this policy, and such changes are logged.

    Development and testing of the application and redactors is done on data from an internal group of testers. The screenshots we collect are used only for testing and development purposes from this group. Screenshots collected from panelists are never used in analysis; only the redacted HTML is used from panelists. All testing is done using a copy of the infrastructure in a separate AWS account.

    Appendix 2: Security Audit 

    Trail of Bits, a security research firm, audited the desktop application and cloud infrastructure from Dec. 8 to Dec. 11, 2020. The firm reviewed the source code through a combination of automated and manual techniques. In addition to a security assessment, it also reviewed the code for best practices in securely processing panelists’ data.

    Citizen Browser’s desktop applications code was analyzed using Electronegativity, a tool that identifies security weaknesses in Electron-based applications. With its JavaScript codebase, the Electron framework operates in a manner similar to a website on the user’s machine and shares similar vulnerabilities to web-based applications. This analysis helped us identify ways in which to configure the application to protect it from being hijacked by malware or other malicious actors.

    A static analysis of the backend code base was done using a combination of manual review and Semgrep; the relevant recommendations were implemented.

    The cloud infrastructure powering the project was also audited by Trail of Bits. Using manual review, ScoutSuite, and Cloudsplaining, they identified ways in which we could further limit permissions to resources such as storage, serverless computation, and API gateways to strengthen our security posture.

    Appendix 3: Data We Collect and Store 

    This appendix describes the tables and columns of the database that are stored after redacting and processing the data.

    Appendix 4: How We Categorize Facebook Posts

    Because Facebook pages contain many types of content—for example, the Facebook news feed has several types of posts, including advertisements, group posts, and private posts—we created different categories for our analysis based on the distinguishing features we found in the data. These are listed below:

    Public and private posts (is_public)

    We looked for differences among audience icons and their accessibility tags (“Shared with Public”) to identify whether a post is public or shared privately.

    Shared by Facebook (is_facebook)

    We identified posts that were from Facebook itself—like election information, “People you may know” modules, and unpaid promotions for Facebook features and products—by noticing that none of them include audience icons. In addition, we found Facebook promotional information in the head element of the source code.

    Sponsored posts (is_sponsored)

    Sponsored posts usually had either text or accessibility features that identified the post as an advertisement. However, that was not always the case. Sometimes Facebook obfuscated these signals with random noise in the form of invisible characters that polluted the “Sponsored” text or invisible elements with accessibility tags that mislabeled user posts as advertisements. We correct for these mistakes when we categorize posts.
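    The de-noising idea described above can be sketched by stripping invisible characters before matching the label. The exact character classes the project filters are not published, so dropping Unicode format and control characters is an assumption here.

```python
import unicodedata

def is_sponsored_label(text):
    """Detect a 'Sponsored' label even when padded with invisible characters.

    Assumption: the noise consists of Unicode format (Cf) or control (Cc)
    characters such as zero-width spaces and joiners.
    """
    visible = "".join(
        ch for ch in text
        if unicodedata.category(ch) not in ("Cf", "Cc")
    )
    return visible == "Sponsored"
```

    The inverse problem, invisible elements that mislabel user posts as ads, would additionally require checking that the labeling element is actually rendered.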

    Political advertisements provide attribution to the political group or PAC that paid for the posting. We collected this information, whenever possible.

    Recommended posts (is_suggested)

    Facebook showed posts from pages and groups that our panelists did not “like.” We identified these posts using div elements with the text “Suggested for you.”
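    Taken together, the four flags in this appendix can be derived from a handful of parsed signals. The dict of signals below is a hypothetical stand-in for the project's real schema, which is not published; the flag logic follows the descriptions above.

```python
def categorize(post):
    """Derive the analysis flags described in this appendix for one post.

    `post` is a hypothetical dict of already-parsed signals.
    """
    audience = post.get("audience_tag")  # e.g. "Shared with Public"
    return {
        "is_public": audience == "Shared with Public",
        # Facebook's own modules carry no audience icon at all.
        "is_facebook": audience is None,
        "is_sponsored": post.get("label") == "Sponsored",
        "is_suggested": "Suggested for you" in post.get("texts", ()),
    }
```
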

    This article was originally published on The Markup and was republished under the Creative Commons Attribution-NonCommercial-NoDerivatives license.

  • Video ads, 0,0001 per view-Chris Record

    Video ads, 0,0001 per view-Chris Record

    Video ads, 0,0001 per view, best training ever https://goo.gl/nz5CNN

    Posted by Diamond Puntavicius on Saturday, February 4, 2017

    Video ads, 0,0001 per view-Chris Record – How to create viral videos where you pay as little as one hundredth of a penny per view, and how to monetize these views.

  • Tecademics Free Training

    Tecademics Free Training

    Tecademics Free Training – Insane Internet Marketing Resource Library (over 30 hours of pure value training videos)


    See what is inside:

    • SHOPIFY CASE STUDY Of A Student That Did $35K+ In Sales In 12 Days!    92 min
    • How to Get Facebook Video Views for $0.01 or Less Each!    13 min
    • 7 Steps to Triple Your Facebook Engagement    9 min
    • HOW TO GET A $10,000 WORKING CAPITAL LOAN FROM PAYPAL™    32 min
    • How to Get a $20,000+ Spending Limit for Facebook™ Ads    27 min
    • FREE FB Marketing Hacks By Chris Record    115 min
    • 7 Steps To Build Your Email List    18 min
    • Affiliate Marketing Bonuses – What They Are and Why They Work So Well!    13 min
    • 8 Key Criteria When Choosing An Affiliate Offer To Promote    17 min
    • How To Set Up & Run FB Lead Ads To Build Your Email List [Step By Step Tutorial]    50 min
    • How to Make Money Online With Print on Demand Products    51 min
    • Email Marketing Tips To Building a Loyal Tribe    49 min
    • How To CONVERT Sales Through Email Marketing    9 min
    • Beginners Guide To Making Money Online    136 min
    • How To Create Resources Out of Thin Air & Raise Money For Your Business    10 min
    • Practical Tips to Overcome Common Sales Objections    17 min
    • How To Change Your Mindset To Build, Grow & Scale Your Business! (2.5 Hour Pure Value Training!)    151 min
    • How To Make Money Online With Client Work From Small Business Owners    15 min
    • How to INDEX and RANK Your Blog Post In Under 60 Seconds!    55 min
    • How To Run FB Ads To Promote Offline Events [STEP BY STEP TUTORIAL]    53 min
    • Advanced Shopify Training With Robert Nava & Chris Record    36 min
    • Beginners Guide To Print On Demand Selling    46 min
    • 30+ Strategies To Monetize Your Websites (3+ Hour Pure Value Training)    207 min
    • FSL Method – Find, List, Sell Shopify System + Product Research Training    90 min
    • Facebook Advertising Tips Masterclass With Chris Record & Nicholas Kusmich    40 min
    • Instagram For Beginners (Step By Step Training)    143 min
    • How To Sell Products On Shopify With Chris Record & Guests    86 min
    • Affiliate Marketing 101 Principles & Strategies Training By Chris Record    32 min
    • Mindset Mastery & Overcoming Sales Objections    94 min
    • $100K Affiliate Marketing Gameplan    (coming soon)
    • 13 Elements of a Successful Sales Message – Speech by Chris Record    77 min
    • Gamify Life & Level Up Your Lifestyle – Speech & Transcription by Chris Record    40 min
    • Welcome To The Free Internet Marketing Training Center!    2 min

    This FREE training library contains over 30 hours of pure value that will help you in your business.

    It will also make you an AMBASSADOR of Tecademics, which will allow you to promote their products and make $40 per month per subscriber (residual income), plus one-time $800 and one-time $4,000 commissions per subscriber. Don't miss this opportunity.

    Tecademics Free Training is worth thousands of dollars, so take advantage of it by going through it and implementing it in your business.


    TEC MASTERMIND GLOBAL PRE-LAUNCH!


  • Importing Contacts into LinkedIn Account

    Importing Contacts into LinkedIn Account

    Importing Contacts into LinkedIn Account is a very important activity you should be doing on a regular basis.

    There are a few ways to accomplish importing contacts into your LinkedIn account. LinkedIn is a powerful social media site (and search engine) intended for professionals and businesses to network and grow their influence and sales.

    It is conceptually different from Facebook in the sense that it is not meant for social interaction with family and friends but rather for business interactions and opportunities.

    Here are some stats from 2015:

    • 332 million people in LinkedIn
    • 187 million unique monthly visitors
    • used in over 200 countries
    • available in 20 languages
    • 40% of users check LinkedIn on a daily basis
    • visitors spend 17 minutes per month on average
    • average CEO has 930 connections
    • 56% males and 44% females use LinkedIn
    • has 39 million students and recent graduates
    • 41% of millionaires use LinkedIn
    • 13% LinkedIn users don't have a Facebook account
    • 83% don't have a Pinterest account
    • 59% are not active on Twitter

    Importing Contacts into LinkedIn Account is important. Why am I saying this? Just to show what kind of platform LinkedIn is, and the importance and power that can be harnessed by growing your network and influence there.

    One way of growing your network is by connecting to people individually, slowly growing it one by one. LinkedIn also offers different functions that are supposed to provide some automation for rapidly adding several contacts at once.

    You may be disappointed if you try the different file upload functions, only to find out that they do not work. This is not a recent issue; it has been like that for years.

    Trying the "Other Email" and "Upload File" options will not let you import your email list (if this works for you, please let me know).

    Importing contacts from email providers such as "Hotmail", "Gmail", and "AOL" does not work either. Even if you previously exported your contacts from LinkedIn and try to re-upload them, it may not work, so be careful not to delete your contacts (you may lose them all).

    The only method that worked for me is the "Invite by email" function, and it has a limitation as well.

    Here are a few screenshots showing how to get to that function:

    [Screenshots 1–3: navigating to the "Invite by email" function]

    Importing Contacts into LinkedIn Account is sometimes buggy. I found out that the limit for this function is 10 emails at once. That means if you want to upload 100 emails, you will have to do it in chunks of 10, and you cannot personalize the message.

    There could also be a limit on how much you can upload in one day. A few days ago I uploaded 120 and it would not accept more. The next day it would not let me upload any. The second day after my initial upload I was able to add another 120 and did not see any error, so I assume I could have uploaded more.
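    Splitting a large list into the 10-address batches this function accepts is straightforward. A small Python helper (the batch size of 10 comes from the limit observed above; the daily cap is less certain):

```python
def chunk_emails(emails, size=10):
    """Split an invite list into batches of at most `size` addresses,
    matching LinkedIn's 10-per-invite limit described above."""
    return [emails[i:i + size] for i in range(0, len(emails), size)]
```

    For 100 addresses this yields 10 batches; each batch would be pasted into one "Invite by email" submission.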

    As per one post I have read, the import function worked for "Yahoo" email imports.

    Here are the steps which I did not try out:

    • export Yahoo email address book (first and last name in separate columns) via a .csv file
    • select the "Yahoo! Email" in the LinkedIn import function (see pic 2 above)
    • enter your Yahoo email credentials
    • after successful import go to "Manage your invitations"
    • click "Select all" to select the just imported contacts (they will be grouped in one page section with today's date)
    • select the "Invite selected contacts" to send out the invites
    • you could do this after attending an event and obtaining the email list or if you hosted the event and have the list
    • you could create one separate Yahoo! email account just for this purpose and after you import all contacts to LinkedIn, delete them all from Yahoo mail, repeat this process next time
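    The .csv file the steps above call for (first and last name in separate columns) can be generated with Python's csv module. The column headers below are an assumption; check them against Yahoo's actual export format before relying on this.

```python
import csv
import io

def contacts_to_csv(contacts):
    """Write (first, last, email) tuples as a CSV with names in separate
    columns, the layout the Yahoo import steps above call for.

    Header names are hypothetical, not confirmed against Yahoo's format.
    """
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["First Name", "Last Name", "Email"])
    for first, last, email in contacts:
        writer.writerow([first, last, email])
    return buf.getvalue()
```
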

    There are other strategies and ways to add contacts via third-party tools, so stay tuned for future LinkedIn posts and videos.

    Importing Contacts into LinkedIn Account is just a few clicks away. Do it once now, and keep doing it.

    If you find this post useful please share it with your network.

    Peace.

Nerko Marketing & Tech