ibm-openbmc / openbmc

Home Page: https://github.com

License: Other

PHP 0.26% BitBake 25.85% Assembly 0.02% Shell 6.38% Python 37.20% HTML 9.95% BlitzBasic 0.07% Lua 0.01% C 7.71% M4 0.97% Makefile 0.22% C++ 2.08% Roff 5.69% Perl 1.71% Pawn 0.04% XSLT 0.02% CSS 0.11% JavaScript 1.34% Pascal 0.33% CMake 0.04%

openbmc's Introduction

OpenBMC

OpenBMC is a Linux distribution for management controllers used in devices such as servers, top of rack switches or RAID appliances. It uses Yocto, OpenEmbedded, systemd, and D-Bus to allow easy customization for your platform.

Setting up your OpenBMC project

1) Prerequisite

  • Ubuntu 14.04
sudo apt-get install -y git build-essential libsdl1.2-dev texinfo gawk chrpath diffstat
  • Fedora 28
sudo dnf install -y git patch diffstat texinfo chrpath SDL-devel bitbake \
    rpcgen perl-Thread-Queue perl-bignum perl-Crypt-OpenSSL-Bignum
sudo dnf groupinstall "C Development Tools and Libraries"

2) Download the source

git clone git@github.com:openbmc/openbmc.git
cd openbmc

3) Target your hardware

Any build requires an environment set up according to your hardware target. There is a special script in the root of this repository that can be used to configure the environment as needed. The script is called setup and takes the name of your hardware target as an argument.

The script needs to be sourced while in the top directory of the OpenBMC repository clone, and, if run without arguments, will display the list of supported hardware targets, see the following example:

$ . setup <machine> [build_dir]
Target machine must be specified. Use one of:

bletchley               hr630                   quanta-q71l
centriq2400-rep         hr855xg2                romulus
dl360poc                kudo                    s2600wf
e3c246d4i               lanyang                 stardragon4800-rep2
ethanolx                mihawk                  swift
evb-ast2500             mtjade                  thor
evb-ast2600             neptune                 tiogapass
evb-npcm750             nicole                  transformers
evb-zx3-pm3             olympus                 witherspoon
f0b                     olympus-nuvoton         witherspoon-tacoma
fp5280g2                on5263m5                x11spi
g220a                   p10bmc                  yosemitev2
gbs                     palmetto                zaius
gsj                     qemuarm

Once you know the target (e.g. romulus), source the setup script as follows:

. setup romulus

4) Build

bitbake obmc-phosphor-image

Additional details can be found in the docs repository.

OpenBMC Development

The OpenBMC community maintains a set of tutorials that new users can work through to get up to speed on OpenBMC development.

Build Validation and Testing

Commits submitted by members of the OpenBMC GitHub community are compiled and tested via our Jenkins server. Commits are run through two levels of testing. At the repository level the makefile make check directive is run. At the system level, the commit is built into a firmware image and run with an arm-softmmu QEMU model against a barrage of CI tests.

Commits submitted by non-members do not automatically proceed through CI testing. After visual inspection of the commit, a CI run can be manually performed by the reviewer.

Automated testing is performed against the QEMU model as well as supported systems. The OpenBMC project uses the Robot Framework for all automation. Our complete test repository can be found here.

Submitting Patches

Support of additional hardware and software packages is always welcome. Please follow the contributing guidelines when making a submission. It is expected that contributions contain test cases.

Bug Reporting

Issues are managed on GitHub. It is recommended you search through the issues before opening a new one.

Questions

First, please do a search on the internet. There's a good chance your question has already been asked.

For general questions, please use the openbmc tag on Stack Overflow. Please review the discussion on Stack Overflow licensing before posting any code.

For technical discussions, please see contact info below for Discord and mailing list information. Please don't file an issue to ask a question. You'll get faster results by using the mailing list or Discord.

Features of OpenBMC

Feature List

  • Host management: Power, Cooling, LEDs, Inventory, Events, Watchdog
  • Full IPMI 2.0 Compliance with DCMI
  • Code Update Support for multiple BMC/BIOS images
  • Web-based user interface
  • REST interfaces
  • D-Bus based interfaces
  • SSH based SOL
  • Remote KVM
  • Hardware Simulation
  • Automated Testing
  • User management
  • Virtual media

Features In Progress

  • OpenCompute Redfish Compliance
  • Verified Boot

Features Requested but need help

  • OpenBMC performance monitoring

Finding out more

Dive deeper into OpenBMC by opening the docs repository.

Technical Steering Committee

The Technical Steering Committee (TSC) guides the project. Members are:

  • Brad Bishop (chair), IBM
  • Nancy Yuen, Google
  • Sai Dasari, Facebook
  • James Mihm, Intel
  • Sagar Dharia, Microsoft
  • Samer El-Haj-Mahmoud, Arm

Contact

openbmc's People

Contributors

amboar, anoo1, benjaminfair, bradbishop, devenrao, dhruvibm, dkodihal, eddiejames, edtanous, geissonator, georgehung1210, ghf, gtmills, lxwinspur, mdmillerii, mine260309, msbarth, nest1ing, nkskjames, pstrinkle, rameshiyyar, rfrandse, saqibkh, shenki, spinler, tomjoseph83, vijaykhemka, vishwabmc, wak-google, williamspatrick

openbmc's Issues

Journal persistence not enabled for Mihawk or Mowgli

Most IBM servers enable journal persistence via a config override out in https://github.com/ibm-openbmc/openbmc/blob/OP940/meta-ibm/meta-witherspoon/recipes-core/systemd/systemd_%25.bbappend, by bringing in https://github.com/ibm-openbmc/openbmc/blob/OP940/meta-ibm/meta-witherspoon/recipes-core/systemd/systemd/journald-storage-policy.conf.

It doesn't look like Mihawk or Mowgli do this. Any particular reason? The persistent journal has been very useful for debugging failures.
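
A quick way to check whether a given journald drop-in requests persistent storage is sketched below. It assumes the drop-in follows the standard journald.conf layout (a `[Journal]` section with a `Storage=` key); the function name and the sample text are illustrative and are not the contents of the file referenced above.

```python
import configparser


def journal_is_persistent(conf_text: str) -> bool:
    """Return True if the [Journal] section sets Storage=persistent."""
    parser = configparser.ConfigParser()
    # journald.conf keys are case-sensitive; keep them as written.
    parser.optionxform = str
    parser.read_string(conf_text)
    # journald's documented default is "auto" when Storage= is not set.
    storage = parser.get("Journal", "Storage", fallback="auto")
    return storage.strip().lower() == "persistent"


# Illustrative drop-in content (an assumption, not the actual file in the repo).
sample = "[Journal]\nStorage=persistent\n"
print(journal_is_persistent(sample))
```

Running the same check against the Mihawk and Mowgli layers' systemd overrides, if any exist, would show whether they set the option.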

1030.10.ips: `Input power was lost` appears intermittently after AC

Pre-condition:

  1. The server power cable is connected to a network power controller so that AC on/off can be done through the network.
  2. Create an LPAR and install an OS.
  3. Enable the option “Automatically start when the managed system is powered on” in the LPAR profile.

AC Cycle steps:

  1. A script is executed on a client to monitor and control the server power status.
  2. Power on the server, wait 6 minutes, then power it off with the command “obmcutil poweroff” in the BMC console.
  3. When the script detects that the host is powered off, send a command to the network power controller to do AC off.
  4. After 30 seconds, send a command to do AC on.
  5. Wait 3 minutes for the BMC to be ready, then send the command “obmcutil poweron” in the BMC console to power on the host.
  6. The server then boots to runtime, the LPAR boots to the OS, and the server is powered off again.
  7. Repeat steps 2-6.
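
The cycle above could be driven by a script along the following lines. This is only a sketch: `run_bmc`, `host_is_off` and `set_ac` are hypothetical hooks standing in for the BMC console, the power-state check and the network power controller, none of which are specified in the report. Only the `obmcutil` commands and the wait times come from the steps.

```python
import time
from typing import Callable


def ac_cycle(run_bmc: Callable[[str], None],
             host_is_off: Callable[[], bool],
             set_ac: Callable[[bool], None],
             sleep: Callable[[float], None] = time.sleep,
             poll_s: float = 5.0) -> None:
    """Run one iteration of the cycle; all callables are hypothetical hooks."""
    run_bmc("obmcutil poweron")    # power on the host
    sleep(6 * 60)                  # wait 6 minutes
    run_bmc("obmcutil poweroff")   # power the host off
    while not host_is_off():       # poll until the host is reported off
        sleep(poll_s)
    set_ac(False)                  # AC off via the power controller
    sleep(30)                      # 30 seconds later...
    set_ac(True)                   # ...AC on
    sleep(3 * 60)                  # wait 3 minutes for the BMC to be ready


if __name__ == "__main__":
    # Dry run with stand-in hooks so nothing is actually switched.
    ac_cycle(run_bmc=print, host_is_off=lambda: True,
             set_ac=lambda on: print("AC", "on" if on else "off"),
             sleep=lambda s: None)
```

Passing `sleep` as a parameter keeps the timing testable without waiting for real delays.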

Event log:

{
"Private Header": {
    "Section Version":          "1",
    "Sub-section type":         "0",
    "Created by":               "0x2700",
    "Created at":               "03/23/2023 02:23:11",
    "Committed at":             "03/23/2023 02:23:12",
    "Creator Subsystem":        "BMC",
    "CSSVER":                   "",
    "Platform Log Id":          "0x50001AB7",
    "Entry Id":                 "0x50001AB7",
    "BMC Event Log Id":         "2619"
},
"User Header": {
    "Section Version":          "1",
    "Sub-section type":         "0",
    "Log Committed by":         "0x2000",
    "Subsystem":                "Power/Cooling",
    "Event Scope":              "Entire Platform",
    "Event Severity":           "Critical Error, Scope of Failure unknown",
    "Event Type":               "Not Applicable",
    "Action Flags": [
                                "Service Action Required",
                                "Report Externally"
    ],
    "Host Transmission":        "Acked",
    "HMC Transmission":         "Acked"
},
"Primary SRC": {
    "Section Version":          "1",
    "Sub-section type":         "1",
    "Created by":               "0x2700",
    "SRC Version":              "0x02",
    "SRC Format":               "0x55",
    "Virtual Progress SRC":     "False",
    "I5/OS Service Event Bit":  "False",
    "Hypervisor Dump Initiated":"False",
    "Backplane CCIN":           "2E2F",
    "Terminate FW Error":       "False",
    "Deconfigured":             "False",
    "Guarded":                  "False",
    "Error Details": {
        "Message":              "Input power was lost while the system was powered on."
    },
    "Valid Word Count":         "0x09",
    "Reference Code":           "110000AC",
    "Hex Word 2":               "00080055",
    "Hex Word 3":               "2E2F0010",
    "Hex Word 4":               "00000000",
    "Hex Word 5":               "00000000",
    "Hex Word 6":               "00000000",
    "Hex Word 7":               "00000000",
    "Hex Word 8":               "00000000",
    "Hex Word 9":               "00000000",
    "Callout Section": {
        "Callout Count":        "1",
        "Callouts": [{
            "FRU Type":         "Symbolic FRU",
            "Priority":         "Mandatory, replace all with this type as a unit",
            "Part Number":      "ACMODUL"
        }]
    }
},
"Extended User Header": {
    "Section Version":          "1",
    "Sub-section type":         "0",
    "Created by":               "0x2000",
    "Reporting Machine Type":   "9105-42A",
    "Reporting Serial Number":  "783C4E1",
    "FW Released Ver":          "PL1030_045",
    "FW SubSys Version":        "fw1030.10-17.4",
    "Common Ref Time":          "00/00/0000 00:00:00",
    "Symptom Id Len":           "20",
    "Symptom Id":               "110000AC_2E2F0010"
},
"Failing MTMS": {
    "Section Version":          "1",
    "Sub-section type":         "0",
    "Created by":               "0x2000",
    "Machine Type Model":       "9105-42A",
    "Serial Number":            "783C4E1"
},
"User Data 0": {
    "Section Version": "1",
    "Sub-section type": "1",
    "Created by": "0x2000",
    "BMCLoad": "3.41 0.94 0.32",
    "BMCState": "NotReady",
    "BMCUptime": "0y 0d 0h 1m 10s",
    "BootState": "",
    "ChassisState": "",
    "FW Version ID": "fw1030.10-17.4-ips-1030.2307.20230307i-prod (PL1030_045)",
    "HostState": "",
    "Process Name": "/usr/bin/phosphor-chassis-state-manager",
    "System IM": "50001000"
},
"User Data 1": {
    "Section Version": "1",
    "Sub-section type": "1",
    "Created by": "0x2000",
    "_PID": "1805"
}
}
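
When many such entries have to be compared across cycles, a small helper can pull out the fields that matter for triage. The sketch below assumes each entry is available as a JSON object in the layout shown above; the function name is illustrative, and the sample is a trimmed copy of the entry above.

```python
import json


def summarize_pel(pel_text: str) -> dict:
    """Extract a few triage fields from one PEL entry in the JSON layout above."""
    pel = json.loads(pel_text)
    src = pel["Primary SRC"]
    user_data = pel.get("User Data 0", {})
    return {
        "reference_code": src["Reference Code"],
        "message": src["Error Details"]["Message"],
        "bmc_state": user_data.get("BMCState", ""),
        "bmc_uptime": user_data.get("BMCUptime", ""),
    }


# Trimmed copy of the entry above; only the fields used here are kept.
sample = """
{
"Primary SRC": {
    "Reference Code": "110000AC",
    "Error Details": {
        "Message": "Input power was lost while the system was powered on."
    }
},
"User Data 0": {
    "BMCState": "NotReady",
    "BMCUptime": "0y 0d 0h 1m 10s"
}
}
"""
print(summarize_pel(sample))
```

For the entry above, the summary shows the error was logged while the BMC was still NotReady, about 70 seconds after it booted.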

1030.20.ips: incorrect description of error message

When a read-only user attempts to change their privilege to Administrator, the error message shown is "Error Cannot read property '0' of undefined". It should instead read "No permission to modify this setting", or the Administrator privilege option should be hidden for read-only users so that only the password change function remains.

1030.00.ips: Building p10bmc image failed

elfutils_0.185 seems to have an error. When I replace it with elfutils_0.187, the compilation passes.

| gcc  -D_GNU_SOURCE -DHAVE_CONFIG_H -DLOCALEDIR='"/media/georgeliu/inspur/work/lxwinspur/openbmc_1030.ips/build/p10bmc/tmp/work/x86_64-linux/elfutils-native/0.185-r1/recipe-sysroot-native/usr/share/locale"'  -DDEBUGPRED=0 -DSRCDIR=\"/media/georgeliu/inspur/work/lxwinspur/openbmc_1030.ips/build/p10bmc/tmp/work/x86_64-linux/elfutils-native/0.185-r1/elfutils-0.185/src\" -DOBJDIR=\"/media/georgeliu/inspur/work/lxwinspur/openbmc_1030.ips/build/p10bmc/tmp/work/x86_64-linux/elfutils-native/0.185-r1/build/src\" -I. -I../../elfutils-0.185/src -I..  -I. -I../../elfutils-0.185/src -I../../elfutils-0.185/lib -I.. -I../../elfutils-0.185/src/../libelf -I../../elfutils-0.185/src/../libebl -I../../elfutils-0.185/src/../libdw -I../../elfutils-0.185/src/../libdwelf -I../../elfutils-0.185/src/../libdwfl -I../../elfutils-0.185/src/../libasm -isystem/media/georgeliu/inspur/work/lxwinspur/openbmc_1030.ips/build/p10bmc/tmp/work/x86_64-linux/elfutils-native/0.185-r1/recipe-sysroot-native/usr/include -std=gnu99 -Wall -Wshadow -Wformat=2 -Wold-style-definition -Wstrict-prototypes -Wtrampolines -Wlogical-op -Wduplicated-cond -Wnull-dereference -Wimplicit-fallthrough=5 -Werror -Wunused -Wextra    -D_FORTIFY_SOURCE=2 -isystem/media/georgeliu/inspur/work/lxwinspur/openbmc_1030.ips/build/p10bmc/tmp/work/x86_64-linux/elfutils-native/0.185-r1/recipe-sysroot-native/usr/include -O2 -pipe -c -o elflint.o ../../elfutils-0.185/src/elflint.c
| ../../elfutils-0.185/src/elflint.c: In function ‘check_sections’:
| ../../elfutils-0.185/src/elflint.c:4105:48: error: null pointer dereference [-Werror=null-dereference]
|  4105 |                                  idx < databits->d_size && ! bad;
|       |                                        ~~~~~~~~^~~~~~~~
| cc1: all warnings being treated as errors
| make[2]: *** [Makefile:801: elflint.o] Error 1
| make[1]: *** [Makefile:525: all-recursive] Error 1
| make: *** [Makefile:441: all] Error 2
| ERROR: oe_runmake failed
| WARNING: exit code 1 from a shell command.
ERROR: Task (virtual:native:/media/georgeliu/inspur/work/lxwinspur/openbmc_1030.ips/meta/recipes-devtools/elfutils/elfutils_0.185.bb:do_compile) failed with exit code '1'
NOTE: Tasks Summary: Attempted 404 tasks of which 387 didn't need to be rerun and 1 failed.

SBE dump parse fail

I tried to parse the SBE trace from an SBE dump but ran into some problems.
The error is reported here:
https://github.com/open-power/sbe/blob/c74232131c91c41b418e711f6fc181ff3b881d7a/src/tools/debug/sbe-debug.py#L178
@skumar8j, is the tool version on GitHub consistent with the one you use?

Here is the log:
li@Ubuntu-li:~/works/P10/fw1030-master/output/build/sbe-p10-c74232131c91c41b418e711f6fc181ff3b881d7a/src/tools/debug$ ./sbe-debug.py -l trace -t FILE -r System_Dump_Entry_SBE_30000004 -f plat_dump/30000004.0_0_SbeData_p10_p10_pibmem_dump

Symbol File: [/home/li/works/P10/fw1030-master/output/build/sbe-p10-c74232131c91c41b418e711f6fc181ff3b881d7a/src/tools/debug/sbe_DD1.syms]
Parsing the Dump header
Missing section indicator is 0x0
BMC System: 1
Running command [hexdump -v -e '1/8 "%016x"' -e '"\n"' plat_dump/30000004.0_0_SbeData_p10_p10_pibmem_dump| xxd -r -p > output_file]
hexdump: x: bad byte count
Running command [cp output_file plat_dump/30000004.0_0_SbeData_p10_p10_pibmem_dump]

Trace buffer symbol addr: [fffd2e80] Trace Buffer Length: [00000838]

String File: [/home/li/works/P10/fw1030-master/output/build/sbe-p10-c74232131c91c41b418e711f6fc181ff3b881d7a/src/tools/debug/sbeStringFile_DD1]
Running command [/home/li/works/P10/fw1030-master/output/build/sbe-p10-c74232131c91c41b418e711f6fc181ff3b881d7a/src/tools/debug/ppe2fsp /home/li/works/P10/fw1030-master/output/build/sbe-p10-c74232131c91c41b418e711f6fc181ff3b881d7a/src/tools/debug/DumpPIBMEM /home/li/works/P10/fw1030-master/output/build/sbe-p10-c74232131c91c41b418e711f6fc181ff3b881d7a/src/tools/debug/sbetrace.bin ]
Failed converting ppe trace to fsp trace. rc = 6
PPE trace buffer must be version 2.
ERROR running command: 1536

fsp-trace: [/home/li/works/P10/fw1030-master/output/build/sbe-p10-c74232131c91c41b418e711f6fc181ff3b881d7a/src/tools/debug/fsp-trace]
Running command [/home/li/works/P10/fw1030-master/output/build/sbe-p10-c74232131c91c41b418e711f6fc181ff3b881d7a/src/tools/debug/fsp-trace -s /home/li/works/P10/fw1030-master/output/build/sbe-p10-c74232131c91c41b418e711f6fc181ff3b881d7a/src/tools/debug/sbeStringFile_DD1 /home/li/works/P10/fw1030-master/output/build/sbe-p10-c74232131c91c41b418e711f6fc181ff3b881d7a/src/tools/debug/sbetrace.bin > /home/li/works/P10/fw1030-master/output/build/sbe-p10-c74232131c91c41b418e711f6fc181ff3b881d7a/src/tools/debug/sbe_0_0_tracMERG]
fsp-trace.c is_smartDump [503]: read 40 bytes of /home/li/works/P10/fw1030-master/output/build/sbe-p10-c74232131c91c41b418e711f6fc181ff3b881d7a/src/tools/debug/sbetrace.bin = 0, 19: No such device
fsp-trace.c parse_opt: file /home/li/works/P10/fw1030-master/output/build/sbe-p10-c74232131c91c41b418e711f6fc181ff3b881d7a/src/tools/debug/sbetrace.bin: not an fsp-trace file (Incorrect Version?)
adal_parse.c trace_adal_read_stringfile: stringfile magic cookie not found or corrupted.
fsp-trace.c read_stringfiles: cannot read stringfile '/home/li/works/P10/fw1030-master/output/build/sbe-p10-c74232131c91c41b418e711f6fc181ff3b881d7a/src/tools/debug/sbeStringFile_DD1'
ERROR running command: 512
Running command [mv /home/li/works/P10/fw1030-master/output/build/sbe-p10-c74232131c91c41b418e711f6fc181ff3b881d7a/src/tools/debug/DumpPIBMEM /home/li/works/P10/fw1030-master/output/build/sbe-p10-c74232131c91c41b418e711f6fc181ff3b881d7a/src/tools/debug/dumpPibMem_trace]
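
The `hexdump ... | xxd -r -p` step in the log fails with "bad byte count". On a little-endian host, that pipeline appears intended to reverse the byte order inside each 8-byte word of the dump. A minimal Python sketch of the same transformation, under that assumption, is shown below; the function name is illustrative and this is not part of sbe-debug.py.

```python
def swap_words(data: bytes, word_size: int = 8) -> bytes:
    """Reverse the byte order inside each word_size-byte word of data."""
    if len(data) % word_size:
        raise ValueError("input length must be a multiple of the word size")
    return b"".join(
        data[i:i + word_size][::-1] for i in range(0, len(data), word_size)
    )


# Two 8-byte words, each reversed independently.
print(swap_words(bytes(range(16))).hex())
```

Note that the log shows the `cp output_file ...` step running even though hexdump failed, so the original dump may have been overwritten with incomplete data. Keeping a backup copy of the dump before running the tool is prudent.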

1030.ips: Host failed to power on

We have a Rainier machine that we use to develop eBMC. After updating the BMC firmware several times, the host fails to power on and there is no output on the host console. According to the log, the SBE has executed and the BMC is still running, but the getTID command that the host sends to establish a heartbeat has not been received.
I'm not sure whether this is a communication problem or a problem with the SBE failing to start.
During this period I tried switching to the IBM release image, a factory reset, and an AC cycle, but the problem persists. (I am not sure whether anything is left over after a firmware update. Is there a method similar to eflash to test this?)

After a few days without any further operations, output strangely appeared on the host console again. Do you know the reason?
Attached are the logs of a normal boot and of a failed boot. Please help check them. Thank you.

1040.00.ips: Incorrect web prompt

The prompt on the Operations -> Server power operations page is incorrect.
When the current mode is Manual and Normal is selected, the prompt reads "Current operating mode Normal. On save operating mode will be set to Manual". It should read "Current operating mode Manual. On save operating mode will be set to Normal".

Manual/automatic prompt error 1
Manual/automatic prompt error 2
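
The symptom is consistent with the saved mode and the pending selection being swapped when the message is built. The expected wording can be sketched as below; the function and parameter names are hypothetical and are not taken from the web UI source.

```python
def mode_change_prompt(current_mode: str, selected_mode: str) -> str:
    """Build the prompt from the saved mode and the not-yet-saved selection."""
    return (f"Current operating mode {current_mode}. "
            f"On save operating mode will be set to {selected_mode}")


# Saved mode is Manual; the user has selected Normal but not saved yet.
print(mode_change_prompt("Manual", "Normal"))
```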

A translation error

When the password complexity requirements are not met, the error message reads "Error updating user 'admin' because the password was not accepted." In this case, the prompt should indicate that the password does not meet the complexity requirements, or specify which requirement is not satisfied. The suggested English text is: "Error updating user 'admin' because the password complexity requirements are not met."

1030.ips: SMS: Host cannot boot to PHYP after power on

Problem Description

With Hostfw compiled by IPS, the machine cannot boot to PHYP, and the Host console stays at C7004091.

  • The Hostfw compiled by IPS is based on IBM's Hostfw, with only the following replaced: HBB, HBBL, HBEL, HBI, HBICORE_SYMS, HBOTSTINGFILE, HBRT, HBRT_RT, HB_VOLATILE, HBD-4U, HBD_RT-4U, HBD_RT-4U.

Hostboot

https://github.com/open-power/hostboot

Branch

master-p10

CommitID

559907d6676d180c693afaa9248e94e83abc7553

image
image

Host Console:

image

Also, an error is displayed when collecting a system dump or a BMC dump:

If you don't see any dumps, be sure you have the appropriate policies enabled

image
image

1030.ips: The Health status of Powersupply shows Critical after AC

On a 4U Rainier machine, after a manual AC cycle (the 4 power supplies are inserted in any order), the power supply that is inserted first reports a PSU_Kill fault every time.

The test steps are as follows:

  1. Update the BMC firmware or do a factory reset.
  2. AC cycle the system and leave the host powered off.
  3. Check the Health status of the power supply via the GUI.

Event Log:
issues.tar.gz

image

1030-ips: dev ACF can't initiate dump

We are unable to initiate a dump using the dev ACF file; the web UI reports an "Invalid password" error.

Expected Behavior

Successfully 'Initiate dump' using the generated dev acf file.

Actual Behavior

The dev ACF file can be used to log in to ASMI, but clicking the "Initiate dump" button returns an "Invalid password" error.
A PLDM error is also logged:
Apr 18 11:11:19 p10bmc pldmd[696]: fileAckWithMetaData with token: 4294967295 and status: 1(AcfFileInvalid)

Steps to Reproduce the Problem

  1. We followed this step to create and setup the ACF file: https://github.com/ibm-openbmc/ibm-acf/tree/1030.ips#how-to-setup-this-feature
  2. Log in to the eBMC GUI as the service account.
  3. Navigate to Logs -> Dumps.
  4. Select the "Resource Dump" dump type.
  5. Enter "serialctl -hyp -assign" as the resource selector, enter the ACF password into the "Password" field, and select "Initiate dump" to enable the PHYP console.
  6. The error occurs: "Error initiating dump. Invalid password"

1030.ips:Host failed to power on

We encountered a host power-on failure on a 2U Rainier machine. The related log and dump files are attached:

sbedump.tar.gz
FailedToStart_2-0208.log
os-release

We consulted @ojayanth about this problem, and after analysis we think the problem lies in the SBE SEEPROM images.
We tried switching to the alternate SBE image, and the host powered on successfully.

Could someone from IBM's SBE team please help analyze the cause? Thanks!

1030/1040-IPS: the black dot moves from option B back to option A when option B is selected after option A was saved

Unexpected behavior you saw
After option A is saved and option B is then selected, the black dot in the checkbox of option B moves back to option A.

Expected behavior
Any option can be selected after another option has been selected and saved.

To Reproduce
Steps to reproduce the behavior:
1. Log in to the ASM web interface and navigate to Operations -> Server power operations.
2. Select the option "Stay On" under "server power policy" and click save.
3. Select another option, "Power Off".
4. After 2 seconds, the black dot in the checkbox of "Power off" moves to "Stay On".
5. The same behavior occurs with the other options. Please refer to the attached screen recording for details.
6. This can be reproduced on 1030.20/1040.00.

Screenshots
Please refer to the attached video
https://github.com/ibm-openbmc/openbmc/assets/140057259/d5805510-34ca-4f9a-a830-a02586a03205

OpenBMC Information:
FW1030.20/1040.00

1030 ips: SBE 1 should be automatically switched if the SBE 0 is broken

The current logic is: if the BMC reboot fails three times, the BMC automatically switches to SBE 1 (on the assumption that SBE 0 is broken).

We observed the following:
When the BMC executes a host power on and SBE 0 is broken, the expected behavior is that the BMC automatically restarts and retries three times, and switches to SBE 1 if all attempts fail.
However, when the first power-on attempt fails, the BMC hangs after the SBE 0 startup failure and is not automatically restarted. The BMC reboot never happens, so the switch to SBE 1 never happens either.

Is this a problem?
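
The expected behavior described above can be sketched as a simple retry-then-fall-back loop. This is an illustration of the logic as stated in this report, not the actual BMC implementation; `boot_side` is a hypothetical callable that returns True when the given SBE side starts successfully.

```python
MAX_ATTEMPTS = 3  # threshold described in this report


def select_sbe_side(boot_side, attempts: int = MAX_ATTEMPTS) -> int:
    """Try SBE side 0 up to `attempts` times, then fall back to side 1.

    boot_side(side) is a hypothetical callable returning True on success.
    """
    for _ in range(attempts):
        if boot_side(0):
            return 0
    if boot_side(1):
        return 1
    raise RuntimeError("both SBE sides failed to start")


# Example: side 0 always fails, side 1 works.
print(select_sbe_side(lambda side: side == 1))
```

In these terms, the reported hang corresponds to the first `boot_side(0)` call never returning, so the retry count never advances and side 1 is never tried.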

IPS localized FW Character Modification

Hi @mzipse,
We have some requirements for localized FW.

  1. Change the "IBM" appearing in lshw (Linux) or lsmcode (AIX) results to "Inspur Power Systems"

  2. Products that currently need to be modified: all localized P10 products: Rainier (eBMC), Everest (eBMC), Denali (FSP).

1040.00.ips: Wrong CXP Card Location information

BMC Web and HMC->PCIe Topology show inconsistent information regarding the two CXP cards and their corresponding connected NVMe expansion chassis (FC #ESR0) and two Fanouts' Local Port Location & Remote Port Location.

Issue description: Both the host's two CXP cards and the NVMe Fanout's two CXP cards have Local Port Location and Remote Port Location. For the host's two CXP cards (i.e., Primary), Local Port Location refers to their own two ports, while Remote Port Location refers to the two ports of the corresponding connected CXP card on the NVMe Fanout. For the Fanout's two CXP cards (i.e., Secondary), Local Port Location refers to their own two ports, and Remote Port Location should be the two ports of the host's corresponding connected CXP card.

BMC issue: The Local & Remote Port Location information for the Primary is displayed correctly. However, for the Secondary, the Local Port Location is displayed as its own two ports (correct), but the Remote Port Location is shown as "--", requiring confirmation if this is a WAD (Working As Designed).

HMC issue: The Local & Remote Port Location information for the Primary is displayed as N/A. For the Secondary, the Local Port Location is shown as the host's CXP card's two ports, and the Remote Port Location is displayed as its own two ports.

Summary: Whether in HMC or BMC Web, the information displayed for the same PCIe devices should be consistent (including ID, Parent ID, Speed, Width, Local Port Location, Remote Port Location, etc.). It's necessary to first confirm if the BMC Web information

Display
Display 2

Some PLDM questions about WarmReboot

We've noticed that the current PLDM code only supports WarmReboot[1], which, as we understand it, means that when you reboot, the motherboard's power remains on (it doesn't go through the chassis power-off process).

Why does PLDM have this constraint? Is it because hardware design constraints of the Rainier (p10) platform led to the choice of supporting only WarmReboot?

Additionally, in the code of phosphor-state-manager[2], we see that WarmReboot can be mapped to ColdReboot, and it mentions that "Some systems do not support a warm reboot." Is it due to motherboard design constraints that some of the systems mentioned here do not support warm reboot?

Which systems do not support WarmReboot, which ones do, and how can they be distinguished?

[1] https://github.com/ibm-openbmc/pldm/blob/1050/oem/ibm/libpldmresponder/oem_ibm_handler.cpp#L1534
[2] https://github.com/ibm-openbmc/phosphor-state-manager/blob/1050/host_state_manager.cpp#L100

1030.ips branch: Host Power On fails after several AC cycles

Pre-condition:

  1. The server power cable is connected to a network power controller to do AC on/off through the network.
  2. Create an LPAR and install OS.
  3. Enable the option “Automatically start when the managed system is powered on” in the LPAR profile.
  4. Install the HTX tool in the LPAR and execute “hcl –bootme on mode:hardf period:1” to shut down the LPAR every 20 minutes.
  5. Enable the system option “Power off when the last logical partition is shutdown”.
  6. Setup system “partition start policy” to “Auto-start always”.

AC Cycle steps:

  1. A script is executed on a client to monitor and control server power status.
  2. LPAR shutdown and then host shutdown.
  3. The script keeps monitoring power status with command “obmcutil status” in BMC console.
  4. When the script detects the host is powered off, send a command to the network power controller to do AC off.
  5. After 30 seconds, send a command to do AC on.
  6. Wait 3 minutes for BMC to be ready, and then send the command “obmcutil poweron” in the BMC console to power on the host.
  7. The server then boots to runtime, the LPAR boots to the OS, and the bootme shutdown runs again.
  8. Repeat steps 2-7.

After about ten cycles, the host fails to power on at step 6. The following is the event log:

{
"Private Header": {
    "Section Version":          "1",
    "Sub-section type":         "0",
    "Created by":               "0x2700",
    "Created at":               "02/04/2023 04:25:46",
    "Committed at":             "02/04/2023 04:25:46",
    "Creator Subsystem":        "BMC",
    "CSSVER":                   "",
    "Platform Log Id":          "0x50000112",
    "Entry Id":                 "0x50000112",
    "BMC Event Log Id":         "539"
},
"User Header": {
    "Section Version":          "1",
    "Sub-section type":         "0",
    "Log Committed by":         "0x2000",
    "Subsystem":                "Power/Cooling",
    "Event Scope":              "Entire Platform",
    "Event Severity":           "Critical Error, Scope of Failure unknown",
    "Event Type":               "Not Applicable",
    "Action Flags": [
                                "Service Action Required",
                                "Report Externally"
    ],
    "Host Transmission":        "Not Sent",
    "HMC Transmission":         "Acked"
},
"Primary SRC": {
    "Section Version":          "1",
    "Sub-section type":         "1",
    "Created by":               "0x2700",
    "SRC Version":              "0x02",
    "SRC Format":               "0x55",
    "Virtual Progress SRC":     "False",
    "I5/OS Service Event Bit":  "False",
    "Hypervisor Dump Initiated":"False",
    "Backplane CCIN":           "2E2D",
    "Terminate FW Error":       "False",
    "Deconfigured":             "False",
    "Guarded":                  "False",
    "Error Details": {
        "Message":              "Input power was lost while the system was powered on."
    },
    "Valid Word Count":         "0x09",
    "Reference Code":           "110000AC",
    "Hex Word 2":               "00000055",
    "Hex Word 3":               "2E2D0010",
    "Hex Word 4":               "00000000",
    "Hex Word 5":               "00000000",
    "Hex Word 6":               "00000000",
    "Hex Word 7":               "00000000",
    "Hex Word 8":               "00000000",
    "Hex Word 9":               "00000000",
    "Callout Section": {
        "Callout Count":        "1",
        "Callouts": [{
            "FRU Type":         "Symbolic FRU",
            "Priority":         "Mandatory, replace all with this type as a unit",
            "Part Number":      "ACMODUL"
        }]
    }
},
"Extended User Header": {
    "Section Version":          "1",
    "Sub-section type":         "0",
    "Created by":               "0x2000",
    "Reporting Machine Type":   "9105-22B",
    "Reporting Serial Number":  "783C511",
    "FW Released Ver":          "PL1030_026",
    "FW SubSys Version":        "fw1030.00.ips-2",
    "Common Ref Time":          "00/00/0000 00:00:00",
    "Symptom Id Len":           "20",
    "Symptom Id":               "110000AC_2E2D0010"
},
"Failing MTMS": {
    "Section Version":          "1",
    "Sub-section type":         "0",
    "Created by":               "0x2000",
    "Machine Type Model":       "9105-22B",
    "Serial Number":            "783C511"
},
"User Data 0": {
    "Section Version": "1",
    "Sub-section type": "1",
    "Created by": "0x2000",
    "BMCLoad": "1.33 0.33 0.11",
    "BMCState": "NotReady",
    "BMCUptime": "0y 0d 0h 0m 37s",
    "BootState": "",
    "ChassisState": "",
    "FW Version ID": "fw1030.00.ips-25-1030.2249.20221130i-prod (PL1030_026)",
    "HostState": "",
    "Process Name": "/usr/bin/phosphor-chassis-state-manager",
    "System IM": "50001001"
},
"User Data 1": {
    "Section Version": "1",
    "Sub-section type": "1",
    "Created by": "0x2000",
    "_PID": "551"
}
}
,{
"Private Header": {
    "Section Version":          "1",
    "Sub-section type":         "0",
    "Created by":               "0x3400",
    "Created at":               "02/04/2023 04:46:22",
    "Committed at":             "02/04/2023 04:46:22",
    "Creator Subsystem":        "BMC",
    "CSSVER":                   "",
    "Platform Log Id":          "0x50000129",
    "Entry Id":                 "0x50000129",
    "BMC Event Log Id":         "584"
},
"User Header": {
    "Section Version":          "1",
    "Sub-section type":         "0",
    "Log Committed by":         "0x2000",
    "Subsystem":                "BMC Firmware",
    "Event Scope":              "Entire Platform",
    "Event Severity":           "Critical Error, Scope of Failure unknown",
    "Event Type":               "Not Applicable",
    "Action Flags": [
                                "Service Action Required",
                                "Report Externally",
                                "HMC Call Home"
    ],
    "Host Transmission":        "Not Sent",
    "HMC Transmission":         "Not Sent"
},
"Primary SRC": {
    "Section Version":          "1",
    "Sub-section type":         "1",
    "Created by":               "0x3400",
    "SRC Version":              "0x02",
    "SRC Format":               "0x55",
    "Virtual Progress SRC":     "False",
    "I5/OS Service Event Bit":  "False",
    "Hypervisor Dump Initiated":"False",
    "Backplane CCIN":           "",
    "Terminate FW Error":       "False",
    "Deconfigured":             "False",
    "Guarded":                  "False",
    "Error Details": {
        "Message":              "A critical BMC application has failed on the system, the BMC is in an undefined state"
    },
    "Valid Word Count":         "0x09",
    "Reference Code":           "BD8D3404",
    "Hex Word 2":               "00080055",
    "Hex Word 3":               "00000010",
    "Hex Word 4":               "00000000",
    "Hex Word 5":               "00000000",
    "Hex Word 6":               "00000000",
    "Hex Word 7":               "00000000",
    "Hex Word 8":               "00000000",
    "Hex Word 9":               "00000000",
    "Callout Section": {
        "Callout Count":        "1",
        "Callouts": [{
            "FRU Type":         "Maintenance Procedure Required",
            "Priority":         "Mandatory, replace all with this type as a unit",
            "Procedure":        "BMC0002"
        }]
    }
},
"Extended User Header": {
    "Section Version":          "1",
    "Sub-section type":         "0",
    "Created by":               "0x2000",
    "Reporting Machine Type":   "",
    "Reporting Serial Number":  "",
    "FW Released Ver":          "PL1030_026",
    "FW SubSys Version":        "fw1030.00.ips-2",
    "Common Ref Time":          "00/00/0000 00:00:00",
    "Symptom Id Len":           "20",
    "Symptom Id":               "BD8D3404_00000010"
},
"Failing MTMS": {
    "Section Version":          "1",
    "Sub-section type":         "0",
    "Created by":               "0x2000",
    "Machine Type Model":       "",
    "Serial Number":            ""
},
"User Data 0": {
    "Section Version": "1",
    "Sub-section type": "1",
    "Created by": "0x2000",
    "BMCLoad": "1.45 0.45 0.16",
    "BMCState": "NotReady",
    "BMCUptime": "0y 0d 0h 1m 2s",
    "BootState": "Unspecified",
    "ChassisState": "Off",
    "FW Version ID": "fw1030.00.ips-25-1030.2249.20221130i-prod (PL1030_026)",
    "HostState": "Off",
    "System IM": ""
},
"User Data 1": {
    "Section Version": "1",
    "Sub-section type": "1",
    "Created by": "0x2000",
    "SYSTEMD_RESULT": "failed",
    "SYSTEMD_UNIT": "com.ibm.VPD.Manager.service"
}
}
,{
"Private Header": {
    "Section Version":          "1",
    "Sub-section type":         "0",
    "Created by":               "0x3400",
    "Created at":               "02/04/2023 04:46:22",
    "Committed at":             "02/04/2023 04:46:22",
    "Creator Subsystem":        "BMC",
    "CSSVER":                   "",
    "Platform Log Id":          "0x5000012A",
    "Entry Id":                 "0x5000012A",
    "BMC Event Log Id":         "585"
},
"User Header": {
    "Section Version":          "1",
    "Sub-section type":         "0",
    "Log Committed by":         "0x2000",
    "Subsystem":                "BMC Firmware",
    "Event Scope":              "Entire Platform",
    "Event Severity":           "Critical Error, Scope of Failure unknown",
    "Event Type":               "Not Applicable",
    "Action Flags": [
                                "Service Action Required",
                                "Report Externally",
                                "HMC Call Home"
    ],
    "Host Transmission":        "Not Sent",
    "HMC Transmission":         "Not Sent"
},
"Primary SRC": {
    "Section Version":          "1",
    "Sub-section type":         "1",
    "Created by":               "0x3400",
    "SRC Version":              "0x02",
    "SRC Format":               "0x55",
    "Virtual Progress SRC":     "False",
    "I5/OS Service Event Bit":  "False",
    "Hypervisor Dump Initiated":"False",
    "Backplane CCIN":           "",
    "Terminate FW Error":       "False",
    "Deconfigured":             "False",
    "Guarded":                  "False",
    "Error Details": {
        "Message":              "A critical BMC application has failed on the system, the BMC is in an undefined state"
    },
    "Valid Word Count":         "0x09",
    "Reference Code":           "BD8D3404",
    "Hex Word 2":               "00080055",
    "Hex Word 3":               "00000010",
    "Hex Word 4":               "00000000",
    "Hex Word 5":               "00000000",
    "Hex Word 6":               "00000000",
    "Hex Word 7":               "00000000",
    "Hex Word 8":               "00000000",
    "Hex Word 9":               "00000000",
    "Callout Section": {
        "Callout Count":        "1",
        "Callouts": [{
            "FRU Type":         "Maintenance Procedure Required",
            "Priority":         "Mandatory, replace all with this type as a unit",
            "Procedure":        "BMC0002"
        }]
    }
},
"Extended User Header": {
    "Section Version":          "1",
    "Sub-section type":         "0",
    "Created by":               "0x2000",
    "Reporting Machine Type":   "",
    "Reporting Serial Number":  "",
    "FW Released Ver":          "PL1030_026",
    "FW SubSys Version":        "fw1030.00.ips-2",
    "Common Ref Time":          "00/00/0000 00:00:00",
    "Symptom Id Len":           "20",
    "Symptom Id":               "BD8D3404_00000010"
},
"Failing MTMS": {
    "Section Version":          "1",
    "Sub-section type":         "0",
    "Created by":               "0x2000",
    "Machine Type Model":       "",
    "Serial Number":            ""
},
"User Data 0": {
    "Section Version": "1",
    "Sub-section type": "1",
    "Created by": "0x2000",
    "BMCLoad": "1.45 0.45 0.16",
    "BMCState": "Quiesced",
    "BMCUptime": "0y 0d 0h 1m 2s",
    "BootState": "Unspecified",
    "ChassisState": "Off",
    "FW Version ID": "fw1030.00.ips-25-1030.2249.20221130i-prod (PL1030_026)",
    "HostState": "Off",
    "System IM": ""
},
"User Data 1": {
    "Section Version": "1",
    "Sub-section type": "1",
    "Created by": "0x2000",
    "SYSTEMD_RESULT": "failed",
    "SYSTEMD_UNIT": "com.ibm.panel.service"
}
}
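
For anyone triaging a dump like the one above, here is a minimal sketch of pulling the log id, reference code, severity and failing systemd unit out of each entry. It assumes the entries have been saved as a valid JSON array; `SAMPLE` is a trimmed copy of the last entry above, and `summarize_pels` is a hypothetical helper name, not an existing tool.

```python
import json

# Trimmed copy of the last entry in the dump above; only the fields used below are kept.
SAMPLE = """
[{
  "Private Header": {"Platform Log Id": "0x5000012A", "BMC Event Log Id": "585"},
  "User Header": {"Event Severity": "Critical Error, Scope of Failure unknown"},
  "Primary SRC": {"Reference Code": "BD8D3404"},
  "User Data 0": {"BMCState": "Quiesced"},
  "User Data 1": {"SYSTEMD_RESULT": "failed", "SYSTEMD_UNIT": "com.ibm.panel.service"}
}]
"""


def summarize_pels(entries):
    """Return one small dict per log entry (hypothetical helper for illustration)."""
    rows = []
    for entry in entries:
        # The "User Data N" sections differ between entries, so scan all of
        # them for a SYSTEMD_UNIT key instead of assuming a fixed section.
        unit = None
        for name, section in entry.items():
            if (name.startswith("User Data")
                    and isinstance(section, dict)
                    and "SYSTEMD_UNIT" in section):
                unit = section["SYSTEMD_UNIT"]
        rows.append({
            "id": entry.get("Private Header", {}).get("Platform Log Id"),
            "src": entry.get("Primary SRC", {}).get("Reference Code"),
            "severity": entry.get("User Header", {}).get("Event Severity"),
            "unit": unit,  # stays None when no section names a unit
        })
    return rows


if __name__ == "__main__":
    for row in summarize_pels(json.loads(SAMPLE)):
        print(row)
```

Entries without a `SYSTEMD_UNIT` field, such as the first one above that only carries `_PID`, come back with `unit` set to `None`, which makes it easy to separate the two BD8D3404 service failures from the rest.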
